Use of Equivalent Sites Ambiguates PB Type Usage in Place and Route #2888
Comments
I have attempted to dig through source code to identify options for changing the behavior described above. Here is what I think I know: *Based on studying the placer's use of the
Based on information collected so far, it would appear that an update step needs to be added to placement to modify the netlist block type to whichever of several equivalent sites matches the site at the physical location where a block is placed. A validation step that the netlist block types match the physical types at all placed locations would also be wise so that illegal post-placement netlists are not allowed to continue to routing. I have not yet studied in detail what requirements this would impose in terms of updating timing data, but timing data updates may be required as well based on how the delay lookups are performed.

@AlexandreSinger I'm not sure how much this genuinely overlaps with your recent work, but I seem to at least be poking next door to where you've been lately with the packer. Between that and your placement work I would be curious to get your thoughts.
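To make the proposed update and validation steps concrete, here is a minimal sketch in Python. It is not VPR code: `Block`, `find_type_mismatches`, `resync_block_types`, the `grid` mapping, and the `equivalent` table are all hypothetical names invented for illustration, and the real netlist and device grid structures are considerably richer.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Loc = Tuple[int, int]


@dataclass
class Block:
    """Hypothetical stand-in for a placed cluster block."""
    name: str
    block_type: str  # type recorded in the netlist, e.g. "clb_0"
    loc: Loc         # placed (x, y) location


def find_type_mismatches(blocks: List[Block], grid: Dict[Loc, str]) -> List[Block]:
    """Validation step: return blocks whose netlist type differs from the
    site type found at their placed location."""
    return [b for b in blocks if grid[b.loc] != b.block_type]


def resync_block_types(blocks: List[Block], grid: Dict[Loc, str],
                       equivalent: Dict[str, Set[str]]) -> None:
    """Update step: rewrite each block's type to the site type at its
    location, but only if the two types are declared equivalent."""
    for b in blocks:
        site_type = grid[b.loc]
        if site_type == b.block_type:
            continue
        if site_type not in equivalent.get(b.block_type, set()):
            raise ValueError(
                f"{b.name}: {b.block_type} is not equivalent to {site_type}")
        b.block_type = site_type


# Toy data (assumed): two equivalent site types on a 3x1 grid.
grid = {(0, 0): "clb_0", (1, 0): "clb_1", (2, 0): "clb_0"}
equivalent = {"clb_0": {"clb_1"}, "clb_1": {"clb_0"}}
blocks = [Block("a", "clb_0", (0, 0)), Block("b", "clb_0", (1, 0))]
```

Running `find_type_mismatches` before routing would flag block `b`; calling `resync_block_types` first brings the netlist type in line with the placed site. Whether timing data must also be refreshed after such a resync is the open question raised above and is not addressed by this sketch.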
Hi @petergrossmann21, I am not familiar with how (or if) the Placer handles the logical and physical block types of clusters, but I have become quite familiar with how the Packer handles them! In the Packer, the physical block type (pb_type) is entirely decided by the logical block type chosen for the cluster. From my understanding, the logical block types are the high-level types of the clusters which can be placed on the FPGA grid (each grid tile has sub-tiles which may implement logical block types), while the physical block types are the actual, physical implementation of clusters (as defined in the architecture file, i.e. pb_type).

The logical block type of a cluster is decided in one and only one place within the packer (and is never changed as far as I am aware, unless the cluster is destroyed and restarted): vtr-verilog-to-routing/vpr/src/pack/greedy_clusterer.cpp Lines 370 to 488 in 23b3aa3
This code performs the following steps:
Within the Cluster Legalizer, this logical block type is used to create the actual physical block of this cluster (which resembles a hierarchical graph representing the physical blocks in a sub-tile in the architecture): vtr-verilog-to-routing/vpr/src/pack/cluster_legalizer.cpp Lines 1459 to 1464 in 23b3aa3
This physical block structure is used by the cluster legalizer to decide which molecules can be legally packed into the cluster (by ensuring that a path exists from the molecules' pins to where they need to go).

Based on my understanding of the cluster legalizer, I would be very surprised if the physical block of a cluster were to completely change after a cluster has been legalized, especially in the Placer. Once we reach the placer, the Cluster Legalizer object is destroyed and is not used (but the physical block object remains). Without extra information, I believe that in order to change from one physical block type to another, one would have to rerun the Cluster Legalizer (which basically performs a PathFinder within the cluster to find paths from input pins, to molecules, to output pins).

However, I wonder if equivalent types come with a guarantee that the intra-cluster routing will not change, so that the logical block type of these clusters can mutate between these types without fully running the Cluster Legalizer. If that is the case, it would make sense for the Placer to only change the logical block type, but not actually update the physical block information (since it can take some work to regenerate the t_pb of a cluster, and the type may change again later). In that case, perhaps the physical block of the cluster just needs to be regenerated after the placer; however, I am not sure if that is what is causing the timing analysis issues you are mentioning.

I agree with you completely, though, that a check should be added to verify that the logical block type of the cluster matches the physical type of the cluster and matches the physical type of the site it wants to be placed in.

Hopefully this information is helpful! This is just what logical / physical cluster types look like from the perspective of Packing. I am not sure how different it appears in the Placement / Routing stages, especially considering flat routing.
@AlexandreSinger thanks for the quick reply! I think your comments round out the picture for me somewhat. If all of the legalization is happening before placement even starts (makes sense when site equivalence is not in play), then it's not difficult to imagine there being some opportunity for things to fall out of sync during placement. I will need to study the equivalent type usage within the packer and placer more to confirm their role in placement, but I am at least attempting to use them in a way where the architecture model implicitly provides a guarantee of intra-cluster routing immutability and thereby can always get a correct placement if equivalent sites are swapped. For what it's worth, equivalent site definitions include XML syntax to allow the architect to specify either a direct (1:1) pin mapping between the sub_tile and the site, or a custom pin mapping. I'm willing to speculate that perhaps the custom pin mapping enables the intra-cluster routing solution to be preserved when swapping between top level I will also need to deepen my understanding of the distinction between logical block type and physical block type (mainly which gets used where) to prevent miscommunication. Once I have that straight, I'll be able to comment with more confidence on which of the two types the timing data is tied to. |
@petergrossmann21 Not a problem! Happy to help! Regarding the distinction between logical and physical block types, there is some documentation here that explains it: vtr-verilog-to-routing/libs/libarchfpga/src/physical_types.h Lines 928 to 1040 in 23b3aa3
This comment does try to explain when it is being used: Hopefully this will be helpful! The distinction between logical and physical block types is something I have found very confusing and something I have wanted to refactor at some point; but it would not be easy! My cleanups to the Cluster Legalizer may make that easier, but I am not sure. |
@AlexandreSinger Thanks for the follow-up! These comments confirm what I was seeing in some parts of the code I was studying, and they match the closest thing I have to intuition for the distinction. Loosely speaking, I tend to think of it this way:
It would appear that the notion of a physical block in the packer diverges from this somewhat, as it is not possible during packing to divine which of several equivalent tile types the cluster might be placed to, and so if the term "physical block" is being used during packing it necessarily has a different meaning than the one I'm using above. That said, there is still a notion of physical implementation during packing since it must solve a small routing problem to connect blocks within the cluster, so I can see why such terminology would be adopted. Borrowing my definitions for the moment to avoid confusion, delay parameters reside squarely in the logical block definitions, as they are enumerated in the
@petergrossmann21: This tutorial may be helpful in detailing common uses of the equivalent_sites feature: https://docs.verilogtorouting.org/en/latest/tutorials/arch/equivalent_sites/

You are right that they are distinct from the packer's notion of "physical types". The basic reason they were created is for things like MLABs vs. LABs (some tiles can implement memory, some can't) and things like different I/Os with different pin-outs. You are right that timing could be another use for them, but we have not built that feature in so far.

The timing analyzer makes callbacks to delay calculators to get the delay of the connections between primitives. So rather than 're-syncing' the netlist with the equivalent tile type at which a block has been placed, I think it would be cleaner and easier to upgrade the post-routing delay calculator (or other delay calculators if needed, but perhaps routing is the only one that really needs to be this accurate). We don't have a way to get those additional delays into vpr right now though, so you'd need to make a proposal of how to get them in.

Possibilities would be to add optional delay data to the tile definition and/or complex block type description (which could turn into a fair amount of code and xml unfortunately) or some other separate detailed delay file (which also seems like a significant change to get all the data in the right place). Perhaps the easiest would be to allow a list of delays in the complex block type delay data (per interconnect and per cell), keyed by the equivalent_site type of the tile location, with a default set of delays that applies to all tile locations. Probably worth discussing in a Thursday meeting!
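The last option above (a list of delays per interconnect or cell, keyed by site type, with a default) can be sketched as a simple lookup. This is only an illustration of the idea in Python; the `delays_ps` table, the `ANY_SITE` key, the element names, and `lookup_delay_ps` are all hypothetical and do not correspond to any existing VPR data structure or architecture-file syntax.

```python
from typing import Dict

# Hypothetical key meaning "applies to every tile location".
ANY_SITE = "*"

# Delay table in picoseconds: element -> {site type -> delay}.
# Every element carries a default; site-specific entries override it.
delays_ps: Dict[str, Dict[str, int]] = {
    "lut_to_ff": {ANY_SITE: 120, "clb_1": 135},
    "ff_clk_to_q": {ANY_SITE: 60},
}


def lookup_delay_ps(table: Dict[str, Dict[str, int]],
                    element: str, site_type: str) -> int:
    """Return the delay for `element` when the block sits on `site_type`,
    falling back to the default entry when no override exists."""
    per_site = table[element]
    return per_site.get(site_type, per_site[ANY_SITE])
```

With this shape, a delay calculator would only need to know the site type at the block's placed location at lookup time, and architectures that never specify overrides behave exactly as they do today.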
@vaughnbetz Thanks for weighing in. I'm not too worried about the XML size bloating -- that's inevitable if it's where the delay data is stored, and I will be auto-generating it in any case. A thought ahead of a real discussion on Thursday: is there a case where the original intended use of equivalent sites can create an issue with the post-route simulation flow due to the packed netlist packing to a pb_type that is not what is used in the final netlist? I could see where the answer would probably be no in the common case. Consider the SLICEL/SLICEM example in the tutorial. If the packer produces a netlist that's all SLICELs, I would expect moving a subset of them to SLICEM locations in the grid won't actually change the post-route netlist since the netlist will be comprised of lower-level primitives. I think it might be less clear, though, whether the SDF would be correct since any differences in interconnect delay between the SLICEL and SLICEM would not be captured. Either way, I'll benefit from understanding the nuances of the post-routing delay calculator better; that will make it much easier to frame a proposed solution. |
As we discussed in today's VTR meeting, you can find how the delay for a timing path is retrieved here. This applies when the path is not "cached," meaning either this is the first time the delay for the path is being calculated, or the path has changed since the last computation. The delay is divided into three parts, as explained here.
When the two-stage router is used, the intra-cluster routing results obtained during packing are reused, as implemented in this function. For flat router timing, I had to experiment with the delay calculator and Tatum quite a bit. Let me know if you need any help!
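As a rough illustration of the cached, three-part delay lookup described in this comment, consider the Python sketch below. The comment does not enumerate the three parts here, so the split into driver-side intra-cluster delay, inter-cluster routing delay, and sink-side intra-cluster delay is an assumption, and `PathDelayCache` and its methods are hypothetical names rather than anything in VPR or Tatum.

```python
from typing import Callable, Dict, Hashable


class PathDelayCache:
    """Hypothetical cache: computes a path delay as the sum of three parts
    and reuses the result until the path is invalidated."""

    def __init__(self,
                 driver_delay: Callable[[Hashable], int],
                 net_delay: Callable[[Hashable], int],
                 sink_delay: Callable[[Hashable], int]) -> None:
        self._parts = (driver_delay, net_delay, sink_delay)
        self._cache: Dict[Hashable, int] = {}

    def delay(self, path: Hashable) -> int:
        # Recompute only on first use or after invalidation.
        if path not in self._cache:
            self._cache[path] = sum(part(path) for part in self._parts)
        return self._cache[path]

    def invalidate(self, path: Hashable) -> None:
        # Call when the path changes, e.g. after a block is moved.
        self._cache.pop(path, None)


# Toy usage with constant delays in picoseconds (assumed values).
calls = {"n": 0}


def counted_net_delay(path: Hashable) -> int:
    calls["n"] += 1
    return 300


cache = PathDelayCache(lambda p: 50, counted_net_delay, lambda p: 40)
```

If per-site delay parameters were introduced, the driver-side and sink-side terms are where a dependence on the placed site type would most plausibly enter, and moving a block between equivalent sites would need to invalidate the affected paths.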
In addition to what @AlexandreSinger mentioned about logical and physical blocks, I thought it might be useful to highlight the following points:
When the Cluster Netlist is formed, a cluster block ID can reference both a Each (x, y, layer_number) location corresponds to a I hope this explanation helps clarify the purpose of each data structure. |
A method of uniquifying delay parameters for logic blocks at different (X,Y) locations is to uniquify the block as needed as a function of X and Y, and then annotate each version of the block with a unique set of delay parameters. For example, one might have clb_0 and clb_1 defined as equivalent sites, and then provide pb_type definitions for each that are identical other than their delay parameters. Since they are defined as equivalent sites, VPR can then swap between them during packing/placement.
One could imagine similar scenarios for purposes other than detailed delay modeling.
Expected Behaviour
Extending the above example, if the placement solution used 3 instances of clb_0 and 2 of clb_1, I would expect reporting throughout the flow to reflect this post-placement, and timing analysis to correctly track delay parameters according to which block is located at each (X,Y) location.
Current Behaviour
The observed behavior in a toy test case is that when such an approach is used, VPR treats all placed instances of equivalent sites as being of a single one of the site types, regardless of the specifics of the placement solution. Extending the above example, if the placement solution used 3 instances of clb_0 and 2 of clb_1, but the two clb types are equivalent sites, then VPR's observed behavior is to count all five as being of type clb_0. This then causes the wrong delay parameters to be looked up during timing analysis for blocks placed in tile locations where clb_1 is present.
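The difference between the observed and expected counts can be reproduced with a toy model. The Python sketch below is not VPR code; the `placement` and `grid` dictionaries and both counting functions are hypothetical, and they only contrast tallying by the type recorded in the netlist with tallying by the site type at each placed location.

```python
from collections import Counter
from typing import Dict, Tuple

Loc = Tuple[int, int]

# Site type at each (x, y) location (assumed toy grid).
grid: Dict[Loc, str] = {
    (0, 0): "clb_0", (1, 0): "clb_0", (2, 0): "clb_0",
    (3, 0): "clb_1", (4, 0): "clb_1",
}

# Block name -> (type recorded in the netlist, placed location).
# The packer recorded every block as clb_0.
placement: Dict[str, Tuple[str, Loc]] = {
    f"blk{i}": ("clb_0", (i, 0)) for i in range(5)
}


def count_by_netlist_type(placement: Dict[str, Tuple[str, Loc]]) -> Counter:
    """Tally by netlist type; this mirrors the observed behaviour."""
    return Counter(netlist_type for netlist_type, _ in placement.values())


def count_by_site_type(placement: Dict[str, Tuple[str, Loc]],
                       grid: Dict[Loc, str]) -> Counter:
    """Tally by the site type at each placed location; this mirrors the
    expected behaviour."""
    return Counter(grid[loc] for _, loc in placement.values())
```

In this model, counting by netlist type reports five clb_0 blocks, while counting by site type reports three clb_0 and two clb_1, which is the breakdown the delay lookup would need.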
Possible Solution
It would appear that some additional tracking is needed during placement to maintain the usage of each site type, if for no other reason than so that the correct delay parameters are obtained for the given placed block used at each (X,Y) location.
Steps to Reproduce
Test case data is not yet publicly available; an equivalent (smaller) test case will need to be designed to reproduce. Any architecture that makes use of equivalent sites should be sufficient.
Context
This arose while attempting to refine a delay model for an eFPGA for which layout is generated at the subarray level using automatic place and route software. A side effect of this implementation approach is that each tile (that is, each (X,Y) location) of the subarray has unique delay parameters, and there is not a strong guarantee that these parameters can be approximated as invariant across (X,Y) locations.
Your Environment