To exploit the native parallelism and interconnection locality of neural networks, a dedicated parallel hardware implementation is essential for making effective use of their computational power in time-critical applications. The architecture proposed in this paper is a parallel stream processor that can be configured as a number of different neural networks. It consists of a specialized collection of configurable data-flow processing elements combined with a custom FIFO-based cache architecture, designed to bridge the speed gap between the slow external memory and the processor.
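To illustrate the role of the FIFO-based cache described above, the following is a minimal, purely behavioral Python sketch, not the authors' implementation: a small prefetch FIFO streams weights from a slow external store into a data-flow multiply-accumulate processing element. All names (WeightFIFO, processing_element) and parameters are illustrative assumptions.

```python
from collections import deque

class WeightFIFO:
    """Toy FIFO prefetch buffer: hides external-memory latency by
    streaming weights ahead of the processing element's demand.
    (Illustrative assumption, not the paper's actual cache design.)"""
    def __init__(self, external_memory, depth=8):
        self.memory = external_memory   # slow backing store (list of weights)
        self.depth = depth              # FIFO capacity
        self.fifo = deque()
        self.next_addr = 0

    def prefetch(self):
        # Fill the FIFO up to its capacity from external memory.
        while len(self.fifo) < self.depth and self.next_addr < len(self.memory):
            self.fifo.append(self.memory[self.next_addr])
            self.next_addr += 1

    def pop(self):
        # Supply the next weight; refill opportunistically.
        self.prefetch()
        return self.fifo.popleft()

def processing_element(inputs, weight_stream):
    """Data-flow style multiply-accumulate over a streamed weight sequence."""
    acc = 0.0
    for x in inputs:
        acc += x * weight_stream.pop()
    return acc

# Usage: one neuron's weighted sum computed from streamed weights.
weights = [0.5, -1.0, 0.25, 2.0]
inputs = [1.0, 2.0, 3.0, 4.0]
fifo = WeightFIFO(weights, depth=2)
print(processing_element(inputs, fifo))   # 0.5 - 2.0 + 0.75 + 8.0 = 7.25
```

In hardware, the prefetch and the multiply-accumulate would proceed concurrently; the sequential sketch only conveys the buffering idea, namely that the processing element always reads from the fast FIFO rather than stalling on external memory.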