Remove Comb Groups Pass Produces Unoptimal Designs #1232
Labels:
AMC (Needed for Andrew's memory compiler)
C: Calyx (Extension or change to the Calyx IL)
S: Discussion needed (Issues blocked on discussion)
I've been experimenting in Calyx to understand cases where even hand-optimizing the program (mostly by chaining operations together) is not enough to match the performance of a Vitis HLS design. One big limiting factor I have found is how `while ... with ...` loops are lowered to pure `while` loops during the `remove-comb-groups` pass. Here is an example of this issue:

This first program is a vadd optimized as much as possible while still using the higher-level `while ... with ...` construct.

This design produces a correct result in 37 cycles. However, this next design is hand-optimized again after running the first design through the `remove-comb-groups` pass:

The main change between these two designs is that the comb group in the body of the loop has been chained together with the rest of the operations in `bb0_5`. This new design runs in 27 cycles, which is exactly the same as a non-pipelined HLS implementation.
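For reference, here is a rough sketch of the lowering that `remove-comb-groups` performs on a `while ... with ...` loop. This is not either of the vadd designs discussed above: the cell and group names (`i`, `lt`, `lt_reg`, `cond`, `incr`), the trip count of 8, and the trivial counter body are made up for illustration, and the actual output of the pass may differ in detail.

```
import "primitives/core.futil";

// Before the pass (shown as a comment for comparison):
//   comb group cond { lt.left = i.out; lt.right = 32'd8; }
//   control { while lt.out with cond { incr; } }

component main() -> () {
  cells {
    i = std_reg(32);       // loop counter (hypothetical)
    lt = std_lt(32);
    add = std_add(32);
    lt_reg = std_reg(1);   // register introduced to hold the condition
  }
  wires {
    // The comb group becomes a normal group that latches the
    // comparison result into lt_reg before signalling done.
    group cond {
      lt.left = i.out;
      lt.right = 32'd8;
      lt_reg.in = lt.out;
      lt_reg.write_en = 1'd1;
      cond[done] = lt_reg.done;
    }
    // Stand-in loop body: increment the counter.
    group incr {
      add.left = i.out;
      add.right = 32'd1;
      i.in = add.out;
      i.write_en = 1'd1;
      incr[done] = i.done;
    }
  }
  control {
    seq {
      cond;                    // evaluate the condition once up front
      while lt_reg.out {
        seq { incr; cond; }    // re-evaluate it after every iteration
      }
    }
  }
}
```

In this form every iteration pays for the separate `cond` group on top of the loop body, which is the cost that chaining the comparison into the body group avoids.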
Although it does not have a huge impact on latency in this case, it will become a big issue when considering large nested loops and pipelines, where the `while` loop may run thousands of times.
Obviously, this optimization cannot always be made: it depends on the target frequency and on how the iter arg of the `while` loop is incremented. If it is incremented with an addition, this optimization should almost always be possible because the comparison delay is small.
I think the correct solution here unfortunately requires some analysis of delay and chaining (or just assuming that a user who wants full performance will not use `while ... with ...` constructs), but I figured it was worth pointing out this limitation with examples.