LUTRAM for CologneChip?! #28
Comments
Where does that number come from? |
There are a couple of things you could do, depending on your needs.
|
Any plans for adding LUTRAM support to future silicon devices? 10K is an arbitrary SWAG, based on the already established datapoint for OPL3 FPGA design:
The trouble is that building RAM from DFFs consumes not only flops, but also the LUTs and wiring. Altogether, it may create routing choke points that restrict access to resources that are nominally "free". |
We've conducted tests on Gowin and CologneChip FPGAs to measure their respective "LUTRAM" capacities. The results and the reproduction steps are published in this repo. The tests indicate that emulating LUTRAM behaviour is costly on GateMate FPGAs: the achievable capacity is about 10x lower than that of a comparable 20k LUT FPGA. Apart from this, there is an underutilization problem: the GateMate chip uses only about a tenth of the available CPEs before routing fails. |
... as predicted, the cost of GateMate LUTRAM emulation in discrete FFs is very high and twofold:
Is there anything that can be improved about the latter?! We are thinking about:
|
@DadoCCAG, please bring this catastrophic result to management's attention for the next spin of the chip
|
@tarik-ibrahimovic any insights you can share on the benefit (or not) of this synthesis switch? |
@chili-chips-ba the experimental -luttree feature has no relation to lutrams. |
... how about helping us fully understand what this experimental switch is trying to achieve! Let's also note that its name is a bit misleading, given that GateMate does not appear to have traditional LUTs, but rather MUX trees. Or, asked differently, if GateMate had the LUTs, why does it not have the LUTRAMs? |
I assumed that this issue is only LUTRAM-related. Nevertheless:
Where does this information come from? It is wrong: it is a tree of LUT2s, with 4 configuration bits each, and is described in detail in the Primitives Library from page 53.
Please see above, since it is LUT2 in a tree structure, the name is already very appropriate.
There is no way to read back the config bits of the LUTs, which we store in latches. Furthermore, there is no decoding logic that the user can access. Therefore, a LUTRAM implementation is not possible. But again: there are alternatives, as I have shown in #28 (comment).
About the LUT-tree itself: we are currently using yosys/abc to map combinatorial logic into typical LUT4s. P&R analyzes the result and maps it into the LUT-trees. This is certainly not the best way to go. Yosys already supports the mapping with the
Why did we decide to go with a LUT-tree? It requires fewer config bits than a standard LUT4 or LUT6. It also requires less space in the silicon.
Not all features are supported in the P&R yet. That's why we marked it experimental. As soon as we finish its implementation, it certainly makes sense to activate the feature in yosys by default. Nextpnr for GateMate will only support L2T4 and L2T5. |
Thank you, this is all very informative. The confusion stems partly from the statements in the press about GateMate falling into
L2T4 indeed needs 12 instead of 16 configuration latches. However, is a true LUT4 otherwise better than the L2T4?
Similarly, are there 8-input logic combinations that cannot be realized with the L2T4 and L2T5 structures shown above? When it comes to the LUTRAM alternatives, they are simply too expensive, or too scarce / too coarse-grained for most cases. Tarik's experiment has shown us the following:
|
Sorry, this is probably off-topic by now, but there is no Cologne Chip press material in which we claim to have a LUT8. The official wording is "4/8-input LUT-tree".
I don't know how to rate "better". In purely mathematical terms, there are obviously fewer functions that can be implemented with it. In general, it's hard to say whether this is a limitation, as not every design utilizes the full scope of the LUTs. Please take a look at the official code, which should clarify many questions.
Once again, under what conditions? If I fill the entire chip with LUTRAM, maybe. But then I would first think about whether it's really that clever to implement it like this, and possibly use the built-in RAM, of which there is a lot more.
Can you open an issue for this? We've been running stress tests with almost full utilization, i.e. with https://github.com/stnolting/fpga_torture. |
How do you feel about publishing an analysis of the 4-input logic functions that the L2T4 cannot implement, along with the corresponding percentage wrt traditional LUT4?! |
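As a rough starting point for such an analysis, here is a minimal Python sketch. It assumes the L2T4 can be modelled as three LUT2s in a strict two-level tree (two leaf LUT2s feeding one root LUT2, 3 x 4 = 12 config bits) with a fixed input pairing of (a, b) and (c, d). That model, the function names and the input pairing are illustrative assumptions, not the vendor's description of the CPE; any extra input muxing or inversion in the real silicon would change the count.

```python
from itertools import product


def lut2(cfg: int, x: int, y: int) -> int:
    """Evaluate a 2-input LUT whose 4 config bits are packed into cfg."""
    return (cfg >> ((y << 1) | x)) & 1


def l2t4_truth_table(cfg_a: int, cfg_b: int, cfg_root: int) -> int:
    """16-bit truth table of root(leaf_a(a, b), leaf_b(c, d)).

    Assumed model: fixed pairing (a, b) / (c, d), no extra input logic.
    """
    table = 0
    for idx in range(16):
        a, b = idx & 1, (idx >> 1) & 1
        c, d = (idx >> 2) & 1, (idx >> 3) & 1
        out = lut2(cfg_root, lut2(cfg_a, a, b), lut2(cfg_b, c, d))
        table |= out << idx
    return table


def truth_table(fn) -> int:
    """16-bit truth table of an arbitrary 4-input Boolean function."""
    table = 0
    for idx in range(16):
        a, b = idx & 1, (idx >> 1) & 1
        c, d = (idx >> 2) & 1, (idx >> 3) & 1
        table |= (fn(a, b, c, d) & 1) << idx
    return table


def reachable_functions() -> set:
    """All distinct 4-input functions over the 16^3 = 4096 configurations."""
    return {
        l2t4_truth_table(ca, cb, cr)
        for ca, cb, cr in product(range(16), repeat=3)
    }


if __name__ == "__main__":
    reachable = reachable_functions()
    total = 1 << 16  # a true LUT4 covers all 65536 functions
    share = 100.0 * len(reachable) / total
    print(f"{len(reachable)} of {total} functions reachable ({share:.2f}%)")
```

Under this simplified model, functions such as a 4-input AND or XOR are reachable, while a 2:1 mux (`a if c else b`) is not, because a single leaf output bit cannot carry both `a` and `b` to the root. Whether the same holds on the actual device depends on details of the CPE that this sketch does not capture.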
Prompted by this result, which puts GateMate at a disadvantage compared to other FPGAs in its class and potentially downgrades it to a 10K LUT device (depending on the design), we wonder if anything can be done about this major shortcoming, both now, with as-is silicon, and in the future.
Given our upcoming utilization and stress tests, as well as comparisons with Gowin, LatticeSemi and Xilinx, we suspect that GateMate may come up short in most of them (while claiming 20-to-40K capacity on paper, it may fill up faster than a standard 20K device from other vendors).
Hence this call to CologneChip experts for help and advice on the approaches users can take for this architectural uniqueness.