A high-performance, lightweight Verilog implementation of a Neural Network Processing Element (PE) for Multi-Layer Perceptron (MLP) acceleration. Developed as Exercise 3 in the AI Systems Course at the University of Tehran.
- Project Goals
- Features
- Project Structure
- Installation & Simulation
- Usage
- Results
- Contributing
- License
- GitHub Topics (SEO)
- Design & Implementation: Develop a neural network processing unit capable of performing multiply-accumulate (MAC), ReLU activation, and quantization operations.
- Hardware Acceleration: Use Verilog to create an optimized pipelined architecture for low-latency execution.
- Lightweight & Scalable: Minimize execution time and resource usage with parameterizable data widths and pipeline depths.
- Pipelined MAC Unit: Overlaps successive multiply-accumulate operations to achieve high throughput.
- ReLU Activation: Hardware-optimized rectified linear unit for non-linearity.
- Quantizer: Fixed-point quantization to control dynamic range and bit-width.
- SRAM Interface: Dual-port, ping-pong memory for efficient weight & data buffering.
- Control FSM: Manages read/write cycles and pipeline sequencing.
- Parameterizable Design: Configure word width, memory depth, and pipeline stages via Verilog parameters.
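
The MAC, ReLU, and quantizer features listed above can be pictured as one small datapath. The following is a minimal sketch under stated assumptions, not the code in `src/`: the module name `pe_datapath_sketch`, the parameters `DATA_W`, `ACC_W`, and `SHIFT`, the port names, and the choice of a shift-and-saturate quantizer are all hypothetical and chosen only for illustration.

```verilog
// Hypothetical sketch, not the repository's implementation.
// Signed MAC with one pipeline register, then ReLU, then a
// shift-and-saturate quantizer. All names and widths are assumptions.
module pe_datapath_sketch #(
    parameter DATA_W = 8,   // input and weight width (assumed)
    parameter ACC_W  = 24,  // accumulator width (assumed)
    parameter SHIFT  = 4    // right shift applied before saturation (assumed)
) (
    input  wire                     clk,
    input  wire                     rst,    // synchronous, active high
    input  wire                     clear,  // restart the accumulation
    input  wire                     valid,  // x and w are valid this cycle
    input  wire signed [DATA_W-1:0] x,
    input  wire signed [DATA_W-1:0] w,
    output wire        [DATA_W-1:0] y       // unsigned quantized activation
);
    reg signed [2*DATA_W-1:0] prod_q;   // stage 1: registered product
    reg                       valid_q;  // valid flag travelling with prod_q
    reg signed [ACC_W-1:0]    acc;      // stage 2: running sum

    always @(posedge clk) begin
        if (rst) begin
            prod_q  <= 0;
            valid_q <= 1'b0;
            acc     <= 0;
        end else begin
            prod_q  <= x * w;           // signed multiply
            valid_q <= valid;
            if (clear)
                acc <= 0;
            else if (valid_q)
                acc <= acc + prod_q;    // accumulate the previous cycle's product
        end
    end

    // ReLU: negative sums (sign bit set) become zero.
    wire [ACC_W-1:0] relu_out = acc[ACC_W-1] ? {ACC_W{1'b0}} : acc;

    // Quantizer: drop SHIFT low bits, then saturate to DATA_W bits.
    wire [ACC_W-1:0] shifted  = relu_out >> SHIFT;
    wire             overflow = |shifted[ACC_W-1:DATA_W];

    assign y = overflow ? {DATA_W{1'b1}} : shifted[DATA_W-1:0];
endmodule
```

Because the product is registered before it is added, a new `x`/`w` pair can be accepted every cycle while the previous product is being accumulated. With the default parameters, feeding the pairs (3, 4) and (5, 6) accumulates 42, which the quantizer shifts down to 2.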
AI-based-Neural-Network-Processing-Unit/
├── src/
│   ├── PE.v              # Top-level Processing Element module
│   ├── MAC.v             # Multiply-Accumulate Unit
│   ├── ReLU.v            # ReLU Activation Module
│   ├── Quantizer.v       # Fixed-point Quantization Unit
│   ├── SRAM.v            # SRAM Storage Interface
│   └── Controller.v      # Control FSM
├── tests/
│   └── PE_testbench.v    # Functional verification testbench
├── docs/
│   ├── simulation/       # Waveforms and logs
│   └── README.md         # Documentation (this file)
└── reports/
    ├── 403_EAI-CA3.pdf       # Assignment instructions
    └── Gozaresh_final.pdf    # Final project report
- Clone the repository
git clone https://github.com/Alighorbani1380/AI-based-Neural-Network-Processing-Unit
- Parameter Tuning: Edit `src/PE.v` to adjust data widths and pipeline stages.
- SoC Integration: Instantiate the `PE` module in your top-level design for on-chip acceleration.
- Custom Verification: Use `tests/PE_testbench.v` as a template for targeted test scenarios.
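
As a rough illustration of the instantiation and verification steps above, the sketch below overrides a parameter at instantiation and drives the design from a 100 MHz testbench clock. The parameter and port names of the real `PE` module are not listed in this README, so the example uses a hypothetical stand-in module, `pe_standin`, with an assumed `DATA_W` parameter. Swap in the module from `src/PE.v` and its actual names.

```verilog
`timescale 1ns/1ps

// Hypothetical stand-in for the real PE. Replace with the module from
// src/PE.v and its actual parameter and port names.
module pe_standin #(
    parameter DATA_W = 8
) (
    input  wire              clk,
    input  wire [DATA_W-1:0] din,
    output reg  [DATA_W-1:0] dout
);
    always @(posedge clk) dout <= din;  // placeholder behaviour: one register
endmodule

// Testbench template: 100 MHz clock and a parameter override.
module pe_tb_template;
    localparam DATA_W = 16;             // overrides the default of 8

    reg               clk = 1'b0;
    reg  [DATA_W-1:0] din = 0;
    wire [DATA_W-1:0] dout;

    always #5 clk = ~clk;               // 10 ns period = 100 MHz

    // Named parameter override and named port connections.
    pe_standin #(.DATA_W(DATA_W)) dut (
        .clk (clk),
        .din (din),
        .dout(dout)
    );

    initial begin
        @(negedge clk) din = 16'hBEEF;  // drive away from the sampling edge
        @(negedge clk);                 // one rising edge has now passed
        if (dout !== 16'hBEEF) $display("FAIL: dout=%h", dout);
        else                   $display("PASS");
        $finish;
    end
endmodule
```

Driving stimulus on the falling edge keeps it clear of the rising edge on which the design samples, which avoids race conditions in simulation.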
Simulation confirms correct MAC, ReLU, and quantization results at a 100 MHz clock with full throughput.
- verilog
- hardware-acceleration
- neural-network
- MLP
- processing-element
- FPGA
- AI-systems