Use of Equivalent Sites Ambiguates PB Type Usage in Place and Route #2888

petergrossmann21 · 2025-02-07T20:56:26Z

A method of uniquifying delay parameters for logic blocks at different (X,Y) locations is to uniquify the block as needed as a function of X and Y, and then annotate each version of the block with a unique set of delay parameters. For example, one might have clb_0 and clb_1 defined as equivalent sites, and then provide pb_type definitions for each that are identical other than their delay parameters. Since they are defined as equivalent sites, VPR can then swap between them during packing/placement.

One could imagine similar scenarios for purposes other than detailed delay modeling.

Expected Behaviour

Extending the above example, if the placement solution used 3 instances of clb_0 and 2 of clb_1, I would expect reporting throughout the flow to reflect this post-placement, and timing analysis to correctly track delay parameters according to which block is located at each (X,Y) location.

Current Behaviour

The observed behavior in a toy test case is that when such an approach is used, VPR treats all placed instances of equivalent sites as being of a single one of the site types, regardless of the specifics of the placement solution. Extending the above example, if the placement solution used 3 instances of clb_0 and 2 of clb_1, but the two clb types are equivalent sites, then VPR's observed behavior is to count all five as being of type clb_0. This then causes the wrong delay parameters to be looked up during timing analysis for blocks placed in tile locations where clb_1 is present.
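To make the discrepancy concrete, here is a toy model that is independent of VPR's real data structures. Every name in it (`ToyBlock`, `ToyGrid`, `count_by_netlist_type`, `count_by_grid_type`) is hypothetical and exists only for illustration. It contrasts counting block usage from the type stored in the packed netlist with counting it from the tile type at each placed location.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model; none of these names are VPR's.
struct ToyBlock {
    std::string netlist_type; // type chosen by the packer and stored in the netlist
    int x;                    // placed location
    int y;
};

// (x, y) -> type of the tile at that location
using ToyGrid = std::map<std::pair<int, int>, std::string>;

// Usage as seen through the packed netlist (the behaviour observed in this issue).
std::map<std::string, int> count_by_netlist_type(const std::vector<ToyBlock>& blocks) {
    std::map<std::string, int> counts;
    for (const ToyBlock& b : blocks) counts[b.netlist_type]++;
    return counts;
}

// Usage as seen through the device grid (the behaviour expected in this issue).
std::map<std::string, int> count_by_grid_type(const std::vector<ToyBlock>& blocks,
                                              const ToyGrid& grid) {
    std::map<std::string, int> counts;
    for (const ToyBlock& b : blocks) counts[grid.at({b.x, b.y})]++;
    return counts;
}
```

With five blocks that the packer all labelled clb_0, placed on three clb_0 tiles and two clb_1 tiles, the first function reports five clb_0 while the second reports three clb_0 and two clb_1.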

Possible Solution

It would appear that some additional tracking is needed during placement to maintain the usage of each site type, if for no other reason than so that the correct delay parameters are obtained for the given placed block used at each (X,Y) location.

Steps to Reproduce

Test case data is not yet publicly available; an equivalent (smaller) test case will need to be designed to reproduce. Any architecture that makes use of equivalent sites should be sufficient.

Context

This arose while attempting to refine a delay model for an eFPGA for which layout is generated at the subarray level using automatic place and route software. A side effect of this implementation approach is that each tile (i.e., each (X,Y) location) of the subarray has unique delay parameters, and there is no strong guarantee that these parameters can be approximated as invariant across (X,Y) locations.

Your Environment

VTR revision used: 9.0.0
Operating System and version: Ubuntu 24.04
Compiler version: gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0

petergrossmann21 · 2025-02-08T17:19:22Z

I have attempted to dig through source code to identify options for changing the behavior described above. Here is what I think I know:

Based on studying the placer's use of the clb_nlist object (which appears to store the packed netlist), it looks like when moves are made that would change the physical type of the block from one of several equivalent types to another, the only updates made to clb_nlist are to pins; i.e., the actual netlist block type is not updated. If it were, I am currently assuming that it would happen within or next to where the pin values are updated, at the end of Placer::place():

vtr-verilog-to-routing/vpr/src/place/placer.cpp

Line 382 in 23b3aa3

// Update physical pin values

Consistent with the above, it appears that when the placer correctly reports to the user which of several equivalent sites are used, it is bypassing clb_nlist entirely. Instead, it simply infers what types are used from the XML specification of physical tile locations, by fetching the physical type from the device grid and querying its block type directly. This means that the clb_nlist block types need not be consistent with what is reported.
I have not rummaged through enough code to know in every case whether the placer queries block types from clb_nlist or from a physical tile type query, so I cannot comment on which source of truth for the block type the placer uses in any timing analysis operations it might perform. However, I can infer from the timing graph echo files that the timing graph ground truth comes from the netlist, and as a result can confirm that delay data in the timing graph comes from whichever of several equivalent sites is picked for a given cluster at the conclusion of the packing step.
The netlist timing appears to be used consistently throughout placement and routing. In particular, because of large discrepancies in my test case's delay values between the packer's selected CLB variant (a default variant with arbitrary LUT delay = 2ns) and that in the placed CLB variants (annotated with IC timing data, << 2ns), I know that the reported critical path delay for the circuit must be netlist-based rather than physical type-based.
Reviewing the ClusteredNetlist class, it appears that no provisions are currently made for a public method to perform updates to the block type.
I have not investigated how the packer chooses which of the equivalent sites are used in packing, but this is moot since there is no information available to pick a "correct" site type until an initial placement is performed.

Based on information collected so far, it would appear that an update step needs to be added to placement to modify the netlist block type to whichever of several equivalent sites match the site at the physical location where a block is placed. A validation step that the netlist block types match the physical types at all placed locations would also be wise so that illegal post-placement netlists are not allowed to continue to routing. I have not yet studied in detail what requirements this would impose in terms of updating timing data, but timing data updates may be required as well based on how the delay lookups are performed.
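As a rough illustration of the two steps proposed above, here is a minimal sketch using a simplified stand-in for the netlist and the device grid. `PlacedBlock`, `TileGrid`, `sync_block_types_to_grid`, and `block_types_match_grid` are hypothetical names, not part of ClusteredNetlist or any other VPR class, and the sketch assumes the placer has already checked that every swap between equivalent sites is legal.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified model; not VPR's ClusteredNetlist or DeviceGrid.
struct PlacedBlock {
    std::string type; // block type recorded in the netlist
    int x;            // placed location
    int y;
};

// (x, y) -> type of the site at that location
using TileGrid = std::map<std::pair<int, int>, std::string>;

// Update step: rewrite each block's netlist type to the type of the site it occupies.
// Returns the number of blocks whose type changed.
int sync_block_types_to_grid(std::vector<PlacedBlock>& blocks, const TileGrid& grid) {
    int changed = 0;
    for (PlacedBlock& b : blocks) {
        auto it = grid.find({b.x, b.y});
        if (it == grid.end()) throw std::runtime_error("block placed off-grid");
        if (b.type != it->second) {
            b.type = it->second;
            ++changed;
        }
    }
    return changed;
}

// Validation step: true only if every block's netlist type matches its site type.
bool block_types_match_grid(const std::vector<PlacedBlock>& blocks, const TileGrid& grid) {
    for (const PlacedBlock& b : blocks) {
        auto it = grid.find({b.x, b.y});
        if (it == grid.end() || it->second != b.type) return false;
    }
    return true;
}
```

In a real flow the validation would presumably run once after placement and before routing, and any timing data derived from the old block type would need to be refreshed whenever the update step reports a change.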

@AlexandreSinger I'm not sure how much this genuinely overlaps with your recent work, but I seem to at least be poking next door to where you've been lately with the packer. Between that and your placement work I would be curious to get your thoughts.

AlexandreSinger · 2025-02-08T18:27:42Z

Hi @petergrossmann21 ,

I am not familiar with how / if the Placer handles the logical and physical block types of clusters, but I have become quite familiar with how the Packer handles them!

In the Packer, the physical block type (pb_type) is entirely decided by the logical block type chosen for the cluster. From my understanding, the logical block types are the high-level types of the clusters which can be placed on the FPGA grid (each grid tile has sub-tiles which may implement logical block types), while the physical block types are the actual, physical implementation of clusters (as defined in the architecture file; i.e. pb_type). The logical block type of a cluster is decided in one and only one place within the packer (and is never changed as far as I am aware, unless the cluster is destroyed and restarted):

vtr-verilog-to-routing/vpr/src/pack/greedy_clusterer.cpp

Lines 370 to 488 in 23b3aa3

    
    LegalizationClusterId GreedyClusterer::start_new_cluster(
                PackMoleculeId seed_mol_id,
                ClusterLegalizer& cluster_legalizer,
                const Prepacker& prepacker,
                bool balance_block_type_utilization,
                std::map<t_logical_block_type_ptr, size_t>& num_used_type_instances,
                DeviceContext& mutable_device_ctx) {
        VTR_ASSERT(seed_mol_id.is_valid());
        const t_pack_molecule& seed_mol = prepacker.get_molecule(seed_mol_id);
        /* Allocate a dummy initial cluster and load a atom block as a seed and check if it is legal */
        AtomBlockId root_atom = seed_mol.atom_block_ids[seed_mol.root];
        const std::string& root_atom_name = atom_netlist_.block_name(root_atom);
        const t_model* root_model = atom_netlist_.block_model(root_atom);
        auto itr = primitive_candidate_block_types_.find(root_model);
        VTR_ASSERT(itr != primitive_candidate_block_types_.end());
        std::vector<t_logical_block_type_ptr> candidate_types = itr->second;
        if (balance_block_type_utilization) {
            //We sort the candidate types in ascending order by their current utilization.
            //This means that the packer will prefer to use types with lower utilization.
            //This is a naive approach to try balancing utilization when multiple types can
            //support the same primitive(s).
            std::stable_sort(candidate_types.begin(), candidate_types.end(),
                             [&](t_logical_block_type_ptr lhs, t_logical_block_type_ptr rhs) {
                                 int lhs_num_instances = 0;
                                 int rhs_num_instances = 0;
                                 // Count number of instances for each type
                                 for (auto type : lhs->equivalent_tiles)
                                     lhs_num_instances += mutable_device_ctx.grid.num_instances(type, -1);
                                 for (auto type : rhs->equivalent_tiles)
                                     rhs_num_instances += mutable_device_ctx.grid.num_instances(type, -1);
                                 float lhs_util = vtr::safe_ratio<float>(num_used_type_instances[lhs], lhs_num_instances);
                                 float rhs_util = vtr::safe_ratio<float>(num_used_type_instances[rhs], rhs_num_instances);
                                 //Lower util first
                                 return lhs_util < rhs_util;
                             });
        }
        if (log_verbosity_ > 2) {
            VTR_LOG("\tSeed: '%s' (%s)", root_atom_name.c_str(), root_model->name);
            VTR_LOGV(seed_mol.pack_pattern, " molecule_type %s molecule_size %zu",
                     seed_mol.pack_pattern->name, seed_mol.atom_block_ids.size());
            VTR_LOG("\n");
        }
        //Try packing into each candidate type
        bool success = false;
        t_logical_block_type_ptr block_type;
        LegalizationClusterId new_cluster_id;
        for (auto type : candidate_types) {
            //Try packing into each mode
            e_block_pack_status pack_result = e_block_pack_status::BLK_STATUS_UNDEFINED;
            for (int j = 0; j < type->pb_graph_head->pb_type->num_modes && !success; j++) {
                std::tie(pack_result, new_cluster_id) = cluster_legalizer.start_new_cluster(seed_mol_id, type, j);
                success = (pack_result == e_block_pack_status::BLK_PASSED);
            }
            if (success) {
                VTR_LOGV(log_verbosity_ > 2, "\tPASSED_SEED: Block Type %s\n", type->name.c_str());
                // If clustering succeeds return the new_cluster_id and type.
                block_type = type;
                break;
            } else {
                VTR_LOGV(log_verbosity_ > 2, "\tFAILED_SEED: Block Type %s\n", type->name.c_str());
            }
        }
        if (!success) {
            //Explored all candidates
            if (seed_mol.type == e_pack_pattern_molecule_type::MOLECULE_FORCED_PACK) {
                VPR_FATAL_ERROR(VPR_ERROR_PACK,
                                "Can not find any logic block that can implement molecule.\n"
                                "\tPattern %s %s\n",
                                seed_mol.pack_pattern->name,
                                root_atom_name.c_str());
            } else {
                VPR_FATAL_ERROR(VPR_ERROR_PACK,
                                "Can not find any logic block that can implement molecule.\n"
                                "\tAtom %s (%s)\n",
                                root_atom_name.c_str(), root_model->name);
            }
        }
        VTR_ASSERT(success);
        VTR_ASSERT(new_cluster_id.is_valid());
        VTR_LOGV(log_verbosity_ > 2,
                 "Complex block %zu: '%s' (%s) ", size_t(new_cluster_id),
                 cluster_legalizer.get_cluster_pb(new_cluster_id)->name,
                 cluster_legalizer.get_cluster_type(new_cluster_id)->name.c_str());
        VTR_LOGV(log_verbosity_ > 2, ".");
        //Progress dot for seed-block
        fflush(stdout);
        // TODO: Below may make more sense in its own method.
        // Successfully created cluster
        num_used_type_instances[block_type]++;
        /* Expand FPGA size if needed */
        // Check used type instances against the possible equivalent physical locations
        unsigned int num_instances = 0;
        for (auto equivalent_tile : block_type->equivalent_tiles) {
            num_instances += mutable_device_ctx.grid.num_instances(equivalent_tile, -1);
        }
        if (num_used_type_instances[block_type] > num_instances) {
            mutable_device_ctx.grid = create_device_grid(packer_opts_.device_layout,
                                                         arch_.grid_layouts,
                                                         num_used_type_instances,
                                                         packer_opts_.target_device_utilization);
        }
        return new_cluster_id;
    }

This code performs the following steps:

Get a list of logical block types that this molecule may be a part of.
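Sort that list by the current utilization of each logical block type, lowest first (utilization being the number of clusters of that type created so far in packing, divided by the number of instances of its equivalent tiles on the current device grid). This sort is only performed when balance_block_type_utilization is set.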
Go through each type, and each mode of that type, in order until a valid cluster can be created.

Within the Cluster Legalizer, this logical block type is used to create the actual physical block of this cluster (which resembles a hierarchical graph representing the physical blocks in a sub-tile in the architecture):

vtr-verilog-to-routing/vpr/src/pack/cluster_legalizer.cpp

Lines 1459 to 1464 in 23b3aa3

    
    // Create the physical block for this cluster based on the type.
    t_pb* cluster_pb = new t_pb;
    cluster_pb->pb_graph_node = cluster_type->pb_graph_head;
    alloc_and_load_pb_stats(cluster_pb);
    cluster_pb->parent_pb = nullptr;
    cluster_pb->mode = cluster_mode;

This physical block structure is used by the cluster legalizer to decide which molecules can be legally packed into the cluster (by ensuring that a path exists from the molecule's pins to where they need to go).

Based on my understanding of the cluster legalizer, I would be very surprised if the physical block of a cluster were to completely change after a cluster has been legalized, especially in the Placer. Once we reach the placer, the Cluster Legalizer object is destroyed and is not used (but the physical block object remains). Without extra information, I believe that in order to change from one physical block type to another, one would have to rerun the Cluster Legalizer (which basically performs a PathFinder within the cluster to find paths from input pins, to molecules, to output pins).

However, I wonder if equivalent types come with a guarantee that the intra-cluster routing will not change, so that the logical block type of these clusters can mutate between these types without fully rerunning the Cluster Legalizer. If that is the case, it would make sense for the Placer to only change the logical block type, but not actually update the physical block information (since it can take some work to regenerate the t_pb of a cluster, and the type may change again later). In that scenario, perhaps the physical block of the cluster just needs to be regenerated after the placer; however, I am not sure if that is what is causing the timing analysis issues you are mentioning.

I agree with you completely though that a check should be added to verify that the logical block type of the cluster matches the physical type of the cluster and matches the physical type of the site it wants to be placed in.

Hopefully this information is helpful! This is just what logical / physical cluster types look like from the perspective of Packing. I am not sure how different it appears in the Placement / Routing stages; especially considering flat routing.

petergrossmann21 · 2025-02-08T19:02:05Z

@AlexandreSinger thanks for the quick reply! I think your comments round out the picture for me somewhat. If all of the legalization is happening before placement even starts (makes sense when site equivalence is not in play), then it's not difficult to imagine there being some opportunity for things to fall out of sync during placement.

I will need to study the equivalent type usage within the packer and placer more to confirm their role in placement, but I am at least attempting to use them in a way where the architecture model implicitly provides a guarantee of intra-cluster routing immutability and thereby can always get a correct placement if equivalent sites are swapped.

For what it's worth, equivalent site definitions include XML syntax to allow the architect to specify either a direct (1:1) pin mapping between the sub_tile and the site, or a custom pin mapping. I'm willing to speculate that perhaps the custom pin mapping enables the intra-cluster routing solution to be preserved when swapping between top level <pb_type>s that do not have identical port lists. In my case, this is irrelevant; all of my equivalent sites have direct mappings and identical port lists. A complete test suite for my proposed feature might need to exercise the custom pin mapping case, though.
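As a sketch of the precondition I am relying on, the check below tests whether two top-level blocks expose identical port lists, so that a direct pin mapping lines up pin for pin. `PortSpec` and `have_identical_port_lists` are hypothetical names, and "identical" is assumed here to mean the same name, width, and direction in the same order; VPR's own representation and rules may differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical port description; not VPR's t_port.
struct PortSpec {
    std::string name;
    int width;
    bool is_output;
};

bool operator==(const PortSpec& a, const PortSpec& b) {
    return a.name == b.name && a.width == b.width && a.is_output == b.is_output;
}

// True when both top-level blocks expose the same ports in the same order.
// Assumption: this is sufficient for a direct (1:1) pin mapping to stay valid
// when a cluster is swapped between the two block types.
bool have_identical_port_lists(const std::vector<PortSpec>& a,
                               const std::vector<PortSpec>& b) {
    return a == b; // element-wise comparison using the operator== above
}
```

A test suite for the custom pin mapping case would need a different check, since the port lists are then allowed to differ.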

I will also need to deepen my understanding of the distinction between logical block type and physical block type (mainly which gets used where) to prevent miscommunication. Once I have that straight, I'll be able to comment with more confidence on which of the two types the timing data is tied to.

AlexandreSinger · 2025-02-08T19:43:51Z

@petergrossmann21 Not a problem! Happy to help!

Regarding the distinction between logical and physical block types, there is some documentation here that explains it:

vtr-verilog-to-routing/libs/libarchfpga/src/physical_types.h

Lines 928 to 1040 in 23b3aa3

    
    /* Describes the type for a logical block
     * name: unique identifier for type
     * pb_type: Internal subblocks and routing information for this physical block
     * pb_graph_head: Head of DAG of pb_types_nodes and their edges
     *
     * index: Keep track of type in array for easy access
     * physical_tile_index: index of the corresponding physical tile type
     *
     * pin_logical_num_to_pb_pin_mapping: Contains all the pins, including pins on the root-level block and internal pins, in
     * the logical block. The key of this map is the logical number of the pin, and the value is a pointer to the
     * corresponding pb_graph_pin
     *
     * primitive_pb_pin_to_logical_class_num_mapping: Maps each pin to its corresponding class's logical number. To retrieve the actual class, use this number as an
     * index to logical_class_inf.
     *
     * logical_class_inf: Contains all the classes inside the logical block. The index of each class is the logical number associate with the class.
     *
     * A logical block is the implementation of a component's functionality of the FPGA device
     * and it identifies its logical behaviour and internal connections.
     *
     * The logical block type is mainly used during the packing stage of VPR and is used to generate
     * the packed netlist and all the corresponding blocks and their internal structure.
     *
     * The logical blocks than get assigned to a possible physical tile for the placement step.
     *
     * A logical block must correspond to at least one physical tile.
     */
    struct t_logical_block_type {
        std::string name;
        /* Clustering info */
        t_pb_type* pb_type = nullptr;
        t_pb_graph_node* pb_graph_head = nullptr;
        int index = -1; /* index of type descriptor in array (allows for index referencing) */
        std::vector<t_physical_tile_type_ptr> equivalent_tiles; ///>List of physical tiles at which one could
                                                                ///>place this type of netlist block.
        std::unordered_map<int, t_pb_graph_pin*> pin_logical_num_to_pb_pin_mapping;                    /* pin_logical_num_to_pb_pin_mapping[pin logical number] -> pb_graph_pin ptr} */
        std::unordered_map<const t_pb_graph_pin*, int> primitive_pb_pin_to_logical_class_num_mapping;  /* primitive_pb_pin_to_logical_class_num_mapping[pb_graph_pin ptr] -> class logical number */
        std::vector<t_class> primitive_logical_class_inf;                                              /* primitive_logical_class_inf[class_logical_number] -> class */
        std::unordered_map<const t_pb_graph_node*, t_class_range> primitive_pb_graph_node_class_range; /* primitive_pb_graph_node_class_range[primitive_pb_graph_node ptr] -> class range for that primitive*/
        // Is this t_logical_block_type empty?
        bool is_empty() const;
      public:
        /**
         * @brief Returns the logical block port given the port name and the corresponding logical block type
         */
        const t_port* get_port(std::string_view port_name) const;
        /**
         * @brief Returns the logical block port given the pin name and the corresponding logical block type
         */
        const t_port* get_port_by_pin(int pin) const;
    };
    /*************************************************************************************************
     * PB Type Hierarchy                                                                             *
     *************************************************************************************************
     *
     * VPR represents the 'type' of block types corresponding to FPGA grid locations using a hierarchy
     * of t_pb_type objects.
     *
     * The root t_pb_type corresponds to a single top level block type and maps to a particular type
     * of location in the FPGA device grid (e.g. Logic, DSP, RAM etc.).
     *
     * A non-root t_pb_type represents an intermediate level of hierarchy within the root block type.
     *
     * The PB Type hierarchy corresponds to the tags specified in the FPGA architecture description:
     *
     *      struct              XML Tag
     *      ------              ------------
     *      t_pb_type           <pb_type/>
     *      t_mode              <mode/>
     *      t_interconnect      <interconnect/>
     *      t_port              <port/>
     *
     * The various structures hold pointers to each other which encode the hierarchy.
     */
    /** Describes the type of clustered block if a root (parent_mode == nullptr), an
     *  intermediate level of hierarchy (parent_mode != nullptr), or a leaf/primitive
     *  (num_modes == 0, model != nullptr).
     *
     *  This (along with t_mode) corresponds to the hierarchical specification of
     *  block modes that users provide in the architecture (i.e. <pb_type/> tags).
     *
     *  It is also useful to note that a single t_pb_type may represent multiple instances of that
     *  type in the architecture (see the num_pb field).
     *
     *  In VPR there is a single instance of a t_pb_type for each type, which is referenced as a
     *  flyweight by other objects (e.g. t_pb_graph_node).
     *
     *  Data members:
     *      name: name of the physical block type
     *      num_pb: maximum number of instances of this physical block type sharing one parent
     *      blif_model: the string in the blif circuit that corresponds with this pb type
     *      class_type: Special library name
     *      modes: Different modes accepted
     *      ports: I/O and clock ports
     *      num_clock_pins: A count of the total number of clock pins
     *      num_input_pins: A count of the total number of input pins
     *      num_output_pins: A count of the total number of output pins
     *      num_pins: A count of the total number of pins
     *      timing: Timing matrix of block [0..num_inputs-1][0..num_outputs-1]
     *      parent_mode: mode of the parent block
     *      t_mode_power: ???
     *      meta: Table storing extra arbitrary metadata attributes.
     */
    struct t_pb_type {

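The comment at the top of that excerpt does try to explain when the logical block type is used: mainly during packing, to generate the packed netlist, after which the logical blocks are assigned to physical tiles for placement.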

Hopefully this will be helpful! The distinction between logical and physical block types is something I have found very confusing and something I have wanted to refactor at some point; but it would not be easy! My cleanups to the Cluster Legalizer may make that easier, but I am not sure.

petergrossmann21 · 2025-02-08T20:23:39Z

@AlexandreSinger Thanks for the follow-up.

These comments confirm what I was seeing in some parts of the code I was studying and the closest thing I have to intuition for the distinction. Loosely speaking, I tend to think of it this way:

The <complexblocklist> section of the XML file enumerates the logical block types and their properties.
The <tiles> section of the XML file enumerates the physical block types and links them to 1+ logical block types.

It would appear that the notion of a physical block in the packer diverges from this somewhat, as it is not possible during packing to divine which of several equivalent tile types the cluster might be placed on, and so if the term "physical block" is being used during packing it necessarily has a different meaning than the one I'm using above. That said, there is still a notion of physical implementation during packing since it must solve a small routing problem to connect blocks within the cluster, so I can see why such terminology would be adopted.

Borrowing my definitions for the moment to avoid confusion, delay parameters reside squarely in the logical block definitions, as they are enumerated in the <complexblocklist> section's <pb_type> definitions. In order to form a complete netlist, the packer must select a block type for each cluster, and so it does so. Whether the block type is physical or logical is actually kind of moot. All that really matters is that at the end of packing, the block type remains mutable if it has multiple equivalent sites. Then, during placement, there must be some basis for netlist and timing data to be updated when a move legally changes the block type from one to another. The netlist requires updating so that the correct types are emitted in a new .net file after placement, and the timing data requires updating because each block type may have unique delay parameters. Once placement is fixed, routing may proceed with certainty that the timing information is correct.

vaughnbetz · 2025-02-10T22:45:47Z

@petergrossmann21 : This tutorial may be helpful in detailing common uses of the equivalent_sites feature: https://docs.verilogtorouting.org/en/latest/tutorials/arch/equivalent_sites/

You are right that they are distinct from the packer's notion of "physical types". The basic reason they were created is for things like MLABs vs. LABs (some tiles can implement memory, some can't) and things like different I/Os with different pin-outs.

You are right that timing could be another use for them, but we have not built in that feature so far.

The timing analyzer makes callbacks to delay calculators to get the delay of the connections between primitives. So rather than 're-syncing' the netlist with the equivalent tile type at which a block has been placed, I think it would be cleaner and easier to upgrade the post-routing delay calculator (or other delay calculators if needed, though perhaps post-routing is the only one that really needs to be this accurate). We don't have a way to get those additional delays into VPR right now, though, so you'd need to propose how to get them in. One possibility would be to add optional delay data to the tile definition and/or the complex block type description (which unfortunately could turn into a fair amount of code and XML); another would be a separate detailed delay file (which also seems like a significant change to get all the data in the right place). Perhaps the easiest would be to allow a list of delays in the complex block type delay data, per interconnect and per cell, keyed by the equivalent_site tile type at the block's location, with a default set of delays that applies to all tile locations. Probably worth discussing in a Thursday meeting!
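To illustrate the last option, here is a minimal sketch of a delay entry that carries a default value plus optional overrides keyed by tile type. `KeyedDelay` and its members are hypothetical; nothing like this exists in VPR today, and the way such data would be read from the architecture file is left open.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical container for the delay of one interconnect edge or one cell.
struct KeyedDelay {
    double default_delay_s;                                   // used when a tile type has no override
    std::unordered_map<std::string, double> per_tile_delay_s; // keyed by tile (equivalent site) type name

    // Return the override for the tile type at the block's placed location,
    // falling back to the default when none is given.
    double lookup(const std::string& tile_type) const {
        auto it = per_tile_delay_s.find(tile_type);
        return it != per_tile_delay_s.end() ? it->second : default_delay_s;
    }
};
```

A delay calculator could then call something like `lookup()` with the tile type found at the block's (X,Y) location, which would leave the packed netlist itself untouched.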

petergrossmann21 · 2025-02-11T03:13:41Z

@vaughnbetz Thanks for weighing in.

I'm not too worried about the XML size bloating -- that's inevitable if it's where the delay data is stored, and I will be auto-generating it in any case.

A thought ahead of a real discussion on Thursday: is there a case where the original intended use of equivalent sites can create an issue with the post-route simulation flow, because the packed netlist targets a pb_type that is not the one used in the final netlist? I would guess the answer is no in the common case. Consider the SLICEL/SLICEM example in the tutorial. If the packer produces a netlist that's all SLICELs, I would expect that moving a subset of them to SLICEM locations in the grid won't actually change the post-route netlist, since the netlist is composed of lower-level primitives. It is less clear to me, though, whether the SDF would be correct, since any differences in interconnect delay between the SLICEL and SLICEM would not be captured.

Either way, I'll benefit from understanding the nuances of the post-routing delay calculator better; that will make it much easier to frame a proposed solution.

amin1377 · 2025-02-13T23:32:06Z

Hi @petergrossmann21,

As we discussed in today's VTR meeting, you can find how the delay for a timing path is retrieved here. This applies when the path is not "cached," meaning either this is the first time the delay for the path is being calculated, or the path has changed since the last computation.

The delay is divided into three parts, as explained here.

Launch cluster delay refers to the intra-cluster delay of the source cluster.
Capture cluster delay refers to the intra-cluster delay of the sink cluster.

When the two-stage router is used, the intra-cluster routing results obtained during packing are reused, as implemented in this function.
However, when the flat router is used, the delay is calculated directly here, without breaking the path into the three separate components mentioned above.
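The two-stage flow described above can be pictured with the following Python sketch. It is only a model of the idea, not the VPR delay calculator: `PathDelay` and `DelayCache` are hypothetical names, and the sketch assumes the third component, which the list above does not name, is the inter-cluster routing delay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathDelay:
    launch_cluster: float   # intra-cluster delay of the source cluster
    inter_cluster: float    # assumed third component: routing between clusters
    capture_cluster: float  # intra-cluster delay of the sink cluster

    @property
    def total(self):
        return self.launch_cluster + self.inter_cluster + self.capture_cluster


class DelayCache:
    """Recompute a path delay only on first use or after invalidation."""

    def __init__(self, compute):
        self._compute = compute  # callback: edge -> PathDelay
        self._cache = {}
        self.misses = 0

    def get(self, edge):
        if edge not in self._cache:
            self.misses += 1
            self._cache[edge] = self._compute(edge)
        return self._cache[edge]

    def invalidate(self, edge):
        # Called when the path changes, forcing a recomputation on next use.
        self._cache.pop(edge, None)
```

In this picture, a per-tile delay lookup would only need to change how the launch and capture components are computed inside the `compute` callback.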

For flat router timing, I had to experiment with the delay calculator and Tatum quite a bit. Let me know if you need any help!

amin1377 · 2025-02-14T00:56:21Z

In addition to what @AlexandreSinger mentioned about logical and physical blocks, I thought it might be useful to highlight the following points:

t_logical_block_type (here) and all the data structures instantiated inside it (t_pb_type (here) and t_pb_graph_node (here)) are architecture-dependent. They contain only the information listed under the <complexblocklist> tag in the architecture file. These data structures store all modes of pb_types, routing between sub-blocks, delay information, etc.
t_physical_tile_type (here) is also architecture-dependent and contains information listed under the <tile> tag, such as the number of pins, width, height, sub-tiles, etc.
t_pb (here) is a netlist-dependent data structure. For each block in the cluster netlist, there is one instance of this structure. It contains only the information relevant to that cluster block. For example, it does not store all modes of a block or all intra-cluster routing details. Instead, it only retains the used modes, used routing, and used sub-blocks.

When the Cluster Netlist is formed, a cluster block ID can reference both a t_logical_block_type and a t_pb. However, it is important to note that multiple cluster blocks may reference the same t_logical_block_type, but each cluster block has its own unique t_pb.

Each (x, y, layer_number) location corresponds to a t_physical_tile_type. Multiple locations may reference the same t_physical_tile_type, and each t_physical_tile_type may be compatible with multiple types of t_logical_block_type that can be placed on it.
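These relationships can be summarized with a small Python model. The classes below only stand in for the VPR structures named above and omit almost all of their contents; the field names and the `can_place` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogicalBlockType:      # stands in for t_logical_block_type
    name: str


@dataclass(frozen=True)
class PhysicalTileType:      # stands in for t_physical_tile_type
    name: str
    compatible: frozenset    # names of logical block types placeable here


@dataclass
class Pb:                    # stands in for t_pb: per-cluster, netlist-dependent
    used_modes: list = field(default_factory=list)


@dataclass
class ClusterBlock:
    logical_type: LogicalBlockType  # may be shared by many cluster blocks
    pb: Pb                          # unique to this cluster block


def can_place(cluster, grid, loc):
    """Check whether the tile at (x, y, layer) accepts the cluster's logical type."""
    return cluster.logical_type.name in grid[loc].compatible
```

Here `grid` is a plain dictionary from (x, y, layer_number) to a `PhysicalTileType`, so several locations can share one tile type while each cluster block keeps its own `Pb`.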

I hope this explanation helps clarify the purpose of each data structure.
