
A BRAM for sliding-window-like access patterns on Xilinx FPGAs


rbshi/swin_bram


swin_bram

A BRAM microstructure for sliding-window-like access patterns on Xilinx FPGAs

Features

The differences between this microstructure and the general memory partition approach are listed below.

Ultra-fast

Multiple pixels (N_in) are input per clock.

Pixel Aggregation

Several pixels are placed in one bank to increase the utilization of each standalone BRAM.

Line Alignment

N_x is the number of pixels per image line. In multi-pixel input mode, N_x may not be an integer multiple of N_in, so each line has an offset when writing and fetching. The offset changes periodically.
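The periodic behavior of the line offset can be illustrated with a minimal sketch. It assumes that pixels stream in continuously with no padding between lines, so that line k starts at stream index k * N_x, and it measures the offset relative to the input batch size N_in. The function names `line_offsets` and `offset_period` are hypothetical and are not part of this repository.

```python
from math import gcd


def line_offsets(n_x: int, n_in: int, n_lines: int) -> list[int]:
    """Offset (in pixels) of each line's first pixel inside its input batch.

    Assumes a gapless pixel stream: line k starts at stream index k * n_x.
    """
    return [(line * n_x) % n_in for line in range(n_lines)]


def offset_period(n_x: int, n_in: int) -> int:
    """Number of lines after which the offset sequence repeats."""
    return n_in // gcd(n_x, n_in)


# Example: 1284 pixels per line, 16 pixels per clock.
print(line_offsets(1284, 16, 6))  # [0, 4, 8, 12, 0, 4]
print(offset_period(1284, 16))    # 4
```

When N_x is a multiple of N_in, every offset is zero and the period is 1, which is the aligned case.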

Project Structure

  • matlab: theoretical analysis and test data generation
  • rtl: hardware modules with testbench
  • testdata: testdata used in verification
  • syn: PlanAhead Project folder

Matlab

latency_compare.m

The first comparison is to the original video frame buffer scheme.

parameter_gen.m

Generates the configuration .mif file.

xilinx_mif_gen.m

Generates files in the Xilinx memory initialization file (.mif) format.

3x3 sliding window design method

N_bram per line can be calculated as ceil(N_in / N_bram_width) + 1, where N_bram_width = 8 with the 72x512 SDP RAMB36E1 configuration. Here, N_in is chosen as a multiple of 8 (16).
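As a quick check of the formula above, the following sketch evaluates it for a few input widths. The function names and the `window_rows` parameter are hypothetical; only the formula and N_bram_width = 8 come from this README.

```python
def brams_per_line(n_in: int, n_bram_width: int = 8) -> int:
    """ceil(N_in / N_bram_width) + 1, using integer arithmetic only."""
    return -(-n_in // n_bram_width) + 1


def total_brams(n_in: int, window_rows: int = 3, n_bram_width: int = 8) -> int:
    """BRAM count for a window with `window_rows` buffered lines (assumed one group per line)."""
    return window_rows * brams_per_line(n_in, n_bram_width)


print(brams_per_line(16))  # 3
print(total_brams(16))     # 9
```

With N_in = 16 this gives 3 BRAMs per line and 9 BRAMs for three lines.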

With N_in = 16, a 3x3 sliding window implementation needs 3x3 BRAMs (BRAM0 to BRAM8). Each BRAM has the following control signals:

  • Input: wr_data_en; wr_data_mask; wr_addr; wr_data_in; | rd_addr;
  • Output: rd_data_out;

The control logic generates the standalone signals for these BRAMs.

First, a configuration BRAM stores parameters that are calculated offline, which avoids complex calculation logic in hardware. The signal slots are defined as follows.

Configuration BRAM signal definition

| slot_name | len (bit) | usage |
|---|---|---|
| S0, offset | 3 | Offset in the BRAM, from 0 to 7. It is the same for the whole image line because the input batch size (16) is a multiple of the BRAM size (8). Used for data shuffling and wr_en_mask. |
| S1, order | 3 | Indicates which BRAM works first. Together with S0, it decides addr_inc and the shuffling. |
| S2, cycle | 8 | Indicates the number of cycles, MAX 255 for 4096 pix/line. |
| S4, addr_ret | 1 | Address return to zero; control signal of the outermost loop. |
| S5, split | 4 | Indicates the number of pixels that need to be stored at the head of the next BRAM. |
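To make the slot widths concrete, here is a minimal sketch that packs and unpacks one configuration word. The field widths (3, 3, 8, 1, 4 bits) are taken from the table above. The bit order (offset in the least significant bits) and the names `ConfWord`, `pack`, and `unpack` are assumptions for illustration and may not match the layout produced by parameter_gen.m.

```python
from typing import NamedTuple

# (name, width in bits) from the table; the LSB-first ordering is an assumption.
FIELDS = (("offset", 3), ("order", 3), ("cycle", 8), ("addr_ret", 1), ("split", 4))


class ConfWord(NamedTuple):
    offset: int
    order: int
    cycle: int
    addr_ret: int
    split: int


def pack(word: ConfWord) -> int:
    """Concatenate the fields into one integer, first field in the LSBs."""
    value, shift = 0, 0
    for (name, width), field in zip(FIELDS, word):
        if not 0 <= field < (1 << width):
            raise ValueError(f"{name}={field} does not fit in {width} bits")
        value |= field << shift
        shift += width
    return value


def unpack(value: int) -> ConfWord:
    """Inverse of pack(): slice the fields back out of the integer."""
    fields = []
    for _, width in FIELDS:
        fields.append(value & ((1 << width) - 1))
        value >>= width
    return ConfWord(*fields)


word = ConfWord(offset=4, order=1, cycle=79, addr_ret=0, split=4)
assert unpack(pack(word)) == word
```

Under this assumed layout a configuration word occupies 19 bits in total.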

Hardware Working Timing

Decoder

data_in_vld_d0 (1st) | d1 | d2 | d3 | d4 | data_in_vld d0 (line1 last) | d1 | d2
conf_bram_addr0 | conf_bram_addr1
conf_data_line0 | conf_data_line1
inline_cnt=0 | inline_cnt=79 | inline_cnt=78 | inline_cnt=2 | inline_cnt=1 | inline_cnt=0 | inline_cnt=79
return set | return set
inline_cnt | inline_cnt_d0 | inline_cnt_d1 | inline_cnt_d2
ctl_sig_l0_0 | ctl_sig_l0_0 (end) | ctl_sig_l1_0 (start)
ctl_sig_l0_1 | ctl_sig_split_0 | ctl_sig_tailappend_0
group_wr_sig | group_wr_sig_d0
group_wr_en | group_wr_en_d0 | group_wr_en_d1 | group_wr_en_d2 | group_wr_en_d3 | group_wr_en_d4
shuf_flag | shuf_flag_d0 | shuf_flag_d1 | shuf_flag_d2
wr_data_0, mask0 | wr_data_1, mask1
flag_write
wr_data_s0 | wr_data_s1 | BRAM control signal
wr_inc_s0 | wr_inc_s1
shuf_flag_tail
conf_offset_tail
wr_data_mask_tail0 | wr_data_mask_tail1
wr_data_tail0 | wr_data_tail1
d5, rd_out_back | d6, deshuffling | d7, deoffset | d8, line_output
conf_offset | offset_d0 | offset_d1 | offset_d2 | offset_d3 | offset_d4
pix_data_in_d0 | d1 | d2 | d3 | d4 | d5 | d6 | d7

BRAM Data Arrangment

| pix23 … | … | … pix0 | group |
|---|---|---|---|
| HSB RAM2 LSB | RAM1 | RAM0 | group0 |
| HSB RAM5 LSB | RAM4 | RAM3 | group1 |
| HSB RAM8 LSB | RAM7 | RAM6 | group2 |
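The arrangement above can be read as a mapping from a pixel position inside a 24-pixel group word to a physical RAM and byte lane. The sketch below assumes 8-bit pixels, so that each RAM word holds 8 pixels, and assumes that pix0 sits in the lowest byte lane of the lowest-numbered RAM of the group. The helper name `locate` is hypothetical.

```python
PIXELS_PER_RAM = 8   # assumed: 64 data bits per RAM word / 8-bit pixels
RAMS_PER_GROUP = 3   # RAM0..RAM2, RAM3..RAM5, RAM6..RAM8


def locate(group: int, pix: int) -> tuple[int, int]:
    """Return (RAM index, byte lane) for pixel `pix` (0..23) of group `group` (0..2)."""
    if not 0 <= group < 3:
        raise ValueError("group must be 0, 1 or 2")
    if not 0 <= pix < PIXELS_PER_RAM * RAMS_PER_GROUP:
        raise ValueError("pix must be in 0..23")
    ram = group * RAMS_PER_GROUP + pix // PIXELS_PER_RAM
    lane = pix % PIXELS_PER_RAM
    return ram, lane


print(locate(0, 0))   # (0, 0): pix0 of group0 is in RAM0
print(locate(0, 23))  # (2, 7): pix23 of group0 is in RAM2
print(locate(1, 8))   # (4, 0): pix8 of group1 is in RAM4
```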

Line End & Line Start

The overlap area is stored in multiple neighboring BRAM groups. As a result, ctl_sig differs when inline_cnt = 0 and when inline_cnt = MAX.

RTL

sdp_ram

An implementation of a simple dual-port RAM using a Xilinx primitive.

Resource

| RAMB36 | FF | Slice | LUT |
|---|---|---|---|
| 1 | 0 | 0 | 0 |

Pin

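| Name | Dir | BitWidth | Usage |
|---|---|---|---|
| clk | I | 1 | clock |
| rst_n | I | 1 | reset@negedge |
| rd_addr | I | 9 | memory read address |
| rd_data_out | O | 64 | memory read output |
| wr_addr | I | 9 | memory write address |
| wr_data_in | I | 64 | memory write input |
| wr_data_mask | I | 8 | byte-wide write mask |
| wr_data_en | I | 1 | memory write enable |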

key parameter

MIF_FILE: Xilinx styled .mif file used to initialize the BRAM
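The write behavior implied by the sdp_ram pin list (64-bit data, 9-bit address, byte-wide write mask) can be sketched with a small behavioral model. This is not the RTL. It assumes that a mask bit of 1 enables the write of the corresponding byte lane, and it does not model read latency or MIF initialization. The class name `SdpRamModel` is hypothetical.

```python
class SdpRamModel:
    """Behavioral model of a 512 x 64-bit simple dual-port RAM with byte write mask."""

    DEPTH = 512   # 9-bit address
    LANES = 8     # 64-bit word = 8 byte lanes

    def __init__(self) -> None:
        self.mem = [0] * self.DEPTH

    def write(self, addr: int, data: int, mask: int = 0xFF, en: bool = True) -> None:
        """Write the byte lanes selected by `mask` (assumed active high) when `en` is set."""
        if not 0 <= addr < self.DEPTH:
            raise IndexError("address out of range")
        if not en:
            return
        word = self.mem[addr]
        for lane in range(self.LANES):
            if (mask >> lane) & 1:
                lane_bits = 0xFF << (8 * lane)
                word = (word & ~lane_bits) | (data & lane_bits)
        self.mem[addr] = word

    def read(self, addr: int) -> int:
        """Return the stored word (read latency is not modeled)."""
        if not 0 <= addr < self.DEPTH:
            raise IndexError("address out of range")
        return self.mem[addr]


ram = SdpRamModel()
ram.write(3, 0x1122334455667788)
ram.write(3, 0xFFFFFFFFFFFFFFFF, mask=0x01)  # only byte lane 0 is updated
print(hex(ram.read(3)))  # 0x11223344556677ff
```

A model like this can serve as a reference when checking the masked writes that the decoder issues for partially filled words.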

decoder

This module decodes the configuration information stored in the config BRAM/distributed RAM and generates control signals for the 3x3 BRAM groups.

Resource

Pin

| Name | Dir | BitWidth | Usage |
|---|---|---|---|
| clk | I | 1 | clock |
| rst_n | I | 1 | reset@negedge |
| data_in_vld | I | 1 | vld sig of data input |
| pix_data_in | I | 128 | pixel data input |
| conf_bram_rd_data_out | I | CONF_DATA_WIDTH | data read out from config RAM |
| conf_bram_rd_addr | O | CONF_BRAM_ADDR_WIDTH | read address to conf bram |
| wr_data | O | 192*3 | write data to 3 BRAM groups |
| wr_data_mask | O | 24*3 | write data mask |
| wr_data_group_en | O | 3 | 3 BRAM groups write enable |
| wr_addr_inc | O | 9 | address increase sig for each BRAM |
| wr_addr_reset | O | 3 | address reset sig for each BRAM group |

bram_group

Generalization Update (Reboot this work for the journal paper)

TODO-List
