A BRAM microstructure for sliding window liked pattern access on Xilinx FPGA
Differences between this microstructure and the general memory partition approach are listed as the following
Multi-pixeles (N_in) input per clock;
Put several pixeles in one BANK to increase the standalone BRAM ultilization;
N_x is #pixel per image line; In the multi-pixels input mode, it may not be interger divided by N_in; Then each line will have an offset while writing and fetching. The offset chages periodically.
- matlab: theoritical analysis and test data generation
- rtl: hardware modules with testbench
- testdata: testdata used in verification
- syn: PlanAhead Project folder
The first comparison is to the original video frame buffer scheme.
Used to generate configuration mif file.
generate xilinx memory initialization file format.
N_bram per line can be calcuated with ceil(N_in / N_bram_width)+1, where N_bram_width = 8 with the 72x512 SDP RAMB36E1 configuration. Here, we choose that N_in is multiple of 8 (16).
With N_in =16, we should set 3x3 BRAM for 3x3 sliding window implementation. BRAM0 -> BRAM8; They all have corresponding control signal as following,
- Input: wr_data_en; wr_data_mask; wr_addr; wr_data_in; | rd_addr;
- Output: rd_data_out;
The function of control logic is generating standalone signals for these BRAMs.
At first, a configuration BRAM is set to store some parameters calculated offline that can avoid complex calculation logic. The signal slot definition as following,
slot_name | len(bit) | useage |
---|---|---|
S0,offset | 3 | offset in BRAM, the same for the whole |
image line (because input batch size 16 | ||
is multiple to BRAM size 8 ),from 0 to 7; | ||
Used to do data shuffling and wr_en_mask. | ||
S1,order | 3 | indicate which BRAM working fisrt, |
with S0, decide the addr_inc and shuffling. | ||
S2,cycle | 8 | indicate the cycling times, MAX 255 for 4096 pix/line. |
S4,addr_ret | 1 | Address return to zero, control signal of the outermost |
loop. | ||
S5, split | 4 | indicate #pixel need to be store at the head of next BRAM |
data_in_vld_d0(1st) | d1 | d2 | d3 | d4 | … | data_in_vld | d0(lin1last) | d1 | d2 |
---|---|---|---|---|---|---|---|---|---|
conf_bram_addr0 | … | conf_bram_addr1 | |||||||
conf_data_line0 | … | conf_data_line1 | |||||||
inline_cnt=0 | inline_cnt=79 | inline_cnt=78 | … | inline_cnt=2 | inline_cnt=1 | inline_cnt=0 | inline_cnt=79 | ||
return setted | return setted | ||||||||
inline_cnt | inline_cnt_d0 | inline_cnt_d1 | inline_cnt_d2 | ||||||
ctl_sig_l0_0 | … | ctl_sig_l0_0(end) | ctl_sig_l1_0(start) | ||||||
ctl_sig_l0_1 | ctl_sig_split_0 | ctl_sig_tailappend_0 | |||||||
group_wr_sig | group_wr_sig_d0 | ||||||||
group_wr_en | group_wr_en_d0 | group_wr_en_d1 | group_wr_en_d2 | group_wr_en_d3 | group_wr_en_d4 | ||||
shuf_flag | shuf_flag_d0 | shuf_flag_d1 | shuf_flag_d2 | ||||||
wr_data_0,mask0 | wr_data_1,mask1 | ||||||||
flag_write | |||||||||
wr_data_s0 | wr_data_s1 | BRAM control signal | |||||||
wr_inc_s0 | wr_inc_s1 | ||||||||
shuf_flag_tail | |||||||||
conf_offset_tail | |||||||||
wr_data_mask_tail0 | wr_data_mask_tail1 | ||||||||
wr_data_tail0 | wr_data_tail1 | ||||||||
d5, rd_out_back | d6, deshuffling | d7, deoffset | d8, line_output | ||||||
conf_offset | offset_d0 | offset_d1 | offset_d2 | offset_d3 | offset_d4 | ||||
pix_data_in_d0 | d1 | d2 | d3 | d4 | d5 | d6 | d7 |
pix23 … | … | Pix0 | |
---|---|---|---|
HSB RAM2 LSB | RAM1 | RAM0 | group0 |
HSB RAM5 LSB | RAM4 | RAM3 | group1 |
HSB RAM8 LSB | RAM7 | RAM6 | group2 |
Overlap area will be store in multiple neighboring BRAM groups. Then, ctl_sig will be different while inline_cnt=0 & inline_cnt = MAX;
An implementation of a simple dual port ram with Xilinx Primitive;
RAMB36 | FF | Slice | LUT |
---|---|---|---|
1 | 0 | 0 | 0 |
Name | Dir | BitWidth | Useage |
---|---|---|---|
clk | I | 1 | clock |
rst_n | I | 1 | reset@negedge |
rd_addr | I | 9 | memory read address |
rd_data_out | O | 64 | memory read output |
wr_addr | I | 9 | memory write address |
wr_data_in | I | 64 | memory write input |
wr_data_mask | I | 8 | byte-wide write mask |
wr_data_en | I | 1 | memory write enable |
MIF_FILE: Xilinx styled .mif file used to initialize the BRAM
This module is used to decode the configuration information in the Config BRAM/Distributed RAM and generate control signals for the 3x3 BRAM groups.
Name | Dir | BitWidth | Useage |
---|---|---|---|
clk | I | 1 | clock |
rst_n | I | 1 | reset@negedge |
data_in_vld | I | 1 | vld sig of data input |
pix_data_in | I | 128 | pixel data input |
conf_bram_rd_data_out | I | CONF_DATA_WIDTH | data read out from config RAM |
conf_bram_rd_addr | O | CONF_BRAM_ADDR_WIDTH | read address to conf bram |
wr_data | O | 192*3 | write data to 3 BRAM groups |
wr_data_mask | O | 24*3 | write data mask |
wr_data_group_en | O | 3 | 3 BRAM groups write enable |
wr_addr_inc | O | 9 | address increase sig for each BRAM |
wr_addr_reset | O | 3 | address reset sig for each BRAM group |