- 1. Introduction
- 2. Day 1: Inception of opensource EDA, OpenLANE and Sky130 PDK
- 3. Day 2: Understand importance of good floorplan vs bad floorplan and introduction to library cells
- 4. Day 3: Design library cell using Magic Layout and ngspice characterization
- 5. Day 4: Pre-layout timing analysis and importance of good clock tree
- 6. Day 5: Final steps for RTL2GDS using tritonRoute and openSTA
- 7. Conclusion
- 8. Acknowledgements
Advanced Physical Design using OpenLANE was a five day cloud based workshop organized by VSD. The main objectives of the workshop were:
- Inception of open-source EDA, OpenLANE and Sky130 PDK.
- Designing and characterizing one library cell using the Magic layout tool and ngspice.
- Incorporating this cell in the OpenLANE RTL2GDSII flow.
- Analysing the results of each step and improving the design by changing design strategies and parameters.
- Performing timing analysis and resolving design violations.
- What is an Instruction Set Architecture?
The Instruction Set Architecture (ISA) defines the software specification for a hardware. ISA defines a set of instructions and how they behave, data types, registers, addressing modes, etc.
RISC-V is an open Instruction Set Architecture. It is a well organised ISA divided into categories and extensions. By comparison, the ARM architecture is commonly used in mobile phones and x86 is used in personal computers.
- How it helps in bridging the gap between Software and Hardware?
Suppose we need to run a program written in C or any other high level language on a piece of hardware. The hardware cannot understand the high level language; it understands only machine level language, i.e. binary. This is where the ISA comes to the rescue. The compiler converts the high level program into RISC-V assembly, which is then converted into a machine level binary program. The hardware can execute this program, and thus the ISA helps in bridging the gap between software and hardware.
There are three components of an ASIC design flow: RTL IPs, EDA tools (like SPICE, MAGIC, OpenLANE, etc.) and the PDK. PDK stands for Process Design Kit. It includes the process design rules (DRC, LVS, etc.), device models, digital standard cell libraries and input/output libraries.
- Synthesis: Converts an RTL code written in HDL to a circuit out of components from the standard cell library.
- Floor/Power Planning:
- Chip Floor Planning: It is the process of partitioning the chip die between the different system building blocks and placing the I/O pads.
- Macro Floor Planning: Here, we define the dimensions, pin locations, rows and routing tracks.
- Power Planning: Here, the power network (Vdd and Ground) is constructed.
- Placement: Here, we place the cells on the floorplan rows, aligned with the sites. It is usually done in 2 steps:
- Global Placement: Here, we find the optimal positions of the cells. These positions may not be legal.
- Detailed Placement: The positions obtained from global placement are altered to obtain legal positions.
- Clock Tree Synthesis: Here, we create a clock distribution network to deliver the clock to all sequential elements with minimum skew.
- Routing: Here, we implement the interconnects using the available metal layers. The PDK defines the thickness, pitch and minimum width of each layer, as well as the vias. We use the Sky130 PDK, which has 6 routing layers. The lowest layer is called the local interconnect layer and is made of titanium nitride. The following 5 metal layers are aluminium layers.
- Physical Verification: It includes Design Rule Checking (DRC) and Layout vs Schematic (LVS).
- Timing Verification: Static Timing Analysis
OpenLANE is the open source EDA flow used in the workshop. It is a fully automated RTL2GDSII tool. Its main goal is to produce a GDSII with no human intervention and with no LVS, DRC or timing violations. striVe is a family of SoCs built with an open PDK, open EDA and open RTL.
OpenLANE is tuned for the SkyWater 130 nm open PDK. It can be used to harden macros and chips. It has two modes of operation: autonomous or interactive.
The OpenLANE detailed ASIC design flow along with the tools used is as shown below:
As we have already seen, PDK stands for process design kit. The pdks directory has three subdirectories:
- skywater-pdk : It contains all PDK related files which provide information about timing library, tech lef, cell lef, etc.
- open_pdks : It contains a set of files to convert foundry level PDKs to be compatible with opensource EDA tools.
- sky130A : It contains the PDK which has been made compatible with OpenLANE environment.
The sky130A contains libs.ref (contains all the process specification files) and libs.tech (contains files specific to the tools).
There are two modes in which the tool works.
- Automatic mode: ./flow.tcl -design <design_name>
- Interactive mode:
  1. ./flow.tcl -interactive
  2. package require openlane 0.9 (imports the package)
  3. prep -design <design_name> (prepares the design)
Before synthesis, we need to prepare the design. This design setup stage creates the file system and sets up the data for our design. During preparation, the cell level lef file and the technology level lef file are merged into merged.lef.
This picture shows the runs and the merged.lef file created after preparing the design:
- config.tcl in the runs folder shows the default parameters used by the run.
- cmds.log keeps the record of all the commands.
To run the synthesis, the command run_synthesis must be given.
The below shown screen will appear after a successful synthesis:
The flop ratio is the ratio of the number of D flip flops to the total number of cells.
The obtained number of gates after synthesising the design of picorv32a are as shown:
The flop ratio is found to be 0.09432 .
The buffer ratio is the ratio of the number of buffers to the total number of cells.
The buffer ratio was found to be 0.1322 .
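Both ratios above are simple fractions of the synthesis cell counts. A minimal sketch of the calculation follows; the cell counts in it are hypothetical placeholders, not the picorv32a numbers reported by the tool.

```python
def cell_ratio(matching_cells: int, total_cells: int) -> float:
    """Return the fraction of cells in the netlist that belong to one category."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return matching_cells / total_cells


# Hypothetical cell counts, for illustration only (not the picorv32a results).
total_cells = 10_000
flop_cells = 1_200
buffer_cells = 900

flop_ratio = cell_ratio(flop_cells, total_cells)
buffer_ratio = cell_ratio(buffer_cells, total_cells)

print(f"flop ratio   = {flop_ratio:.4f} ({flop_ratio * 100:.2f} %)")
print(f"buffer ratio = {buffer_ratio:.4f} ({buffer_ratio * 100:.2f} %)")
```

In practice the counts would be read from the synthesis statistics report instead of being typed in by hand.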
The following are the considerations of a Chip Floorplan:
- Utilization Factor
- Aspect Ratio
- Location of preplaced cells
- Decoupling capacitors
- Power planning
- Pin placement
- Logical cell placement blockage
The first step in the physical design flow is to define the height and width of the die and the core.
Utilization Factor is the ratio of area occupied by the netlist to the total area of the core.
A utilization factor of 1 signifies that the entire area of the core is occupied by the netlist. In practice, however, we won't have a utilization factor of 1.
Aspect ratio is the ratio of the height of the core to width of the core.
Aspect ratio signifies whether the core is rectangle or square. An aspect ratio of 1 signifies that the core is a square.
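The two definitions above can be written down directly. The sketch below uses a hypothetical core of 4 x 2 units with a netlist occupying 4 square units; the numbers are chosen only to make the arithmetic easy to follow.

```python
def utilization_factor(netlist_area: float, core_area: float) -> float:
    """Area occupied by the netlist divided by the total core area."""
    return netlist_area / core_area


def aspect_ratio(core_height: float, core_width: float) -> float:
    """Core height divided by core width (1.0 means a square core)."""
    return core_height / core_width


# Hypothetical core dimensions and netlist area (arbitrary units).
core_width, core_height = 4.0, 2.0
netlist_area = 4.0

utilization = utilization_factor(netlist_area, core_width * core_height)
ratio = aspect_ratio(core_height, core_width)

print(f"utilization factor = {utilization}")  # half of the core is occupied
print(f"aspect ratio       = {ratio}")        # the core is a rectangle, not a square
```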
Preplaced cells are macros or IPs: pieces of complex logic that are used multiple times. The arrangement of these IPs is part of floorplanning. These blocks have user defined locations and hence are placed on the chip before automated placement and routing. Where they are placed depends on the design scenario.
Due to the resistance and the inductance of the power supply wires, the supply voltages seen near the cells differ from the nominal Vdd and Vss; call them Vdd' and Vss'. If Vdd' drops outside the noise margin, the logic 1 at the output of the cell will not be detected as logic 1.
Solution: add decoupling capacitors in parallel with the circuit. Every time the circuit switches, it draws current from the capacitor, and the RL network replenishes the charge in the capacitor. Hence the capacitor decouples the circuit from the supply.
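As a rough illustration of the supply drop described above, the voltage seen at the cell can be estimated as the nominal supply minus the resistive and inductive drops of the supply wire. This is a simplified lumped model, and every value in the sketch is a hypothetical assumption chosen for illustration.

```python
def local_supply(vdd: float, current: float, resistance: float,
                 inductance: float, di_dt: float) -> float:
    """Supply voltage seen at the cell: Vdd' = Vdd - (I*R + L*dI/dt)."""
    return vdd - (current * resistance + inductance * di_dt)


# Hypothetical values, chosen only to illustrate the calculation.
VDD = 1.8          # nominal supply in volts
CURRENT = 10e-3    # switching current in amperes
RESISTANCE = 5.0   # supply wire resistance in ohms
INDUCTANCE = 1e-9  # supply wire inductance in henries
DI_DT = 1e8        # current slope in amperes per second (10 mA in 100 ps)

vdd_local = local_supply(VDD, CURRENT, RESISTANCE, INDUCTANCE, DI_DT)
print(f"Vdd' = {vdd_local:.3f} V (drop = {VDD - vdd_local:.3f} V)")
```

A decoupling capacitor placed next to the cell supplies the switching current locally, so far less current flows through the supply wire during the switching event and the drop is reduced.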
There is a chance of noise getting introduced in the circuit if the voltage droop and ground bounce exceed the noise margin. This problem is due to a single power supply. So multiple power supplies are needed so that if any logic demands current, it takes current from the nearest power supply.
The image explaining this is shown below:
The connectivity information between the gates is coded in VHDL or Verilog and is called a netlist. The area between the die and the core is used for placing the I/O pins.
Because this area is reserved for pin placement, it is blocked for the placement and routing tool.
To run the floorplan, the command run_floorplan must be given.
Below is a screenshot of the floorplan.tcl file and the ioPlacer.log file :
We find the area occupied by the cell in the file named picorv32a.floorplan.def . The screenshot of that file is as shown:
Given below are the different variables and their default values in floorplanning. It can be found in the README.md file in the configuration directory.
Here we can find that the Lower Left coordinates of the Die are: (0 0) ; the Upper Right coordinates of the die are: (1057235 806405).
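The die coordinates in a def file are expressed in database units. A minimal sketch of converting them to microns is given below; it assumes 1000 database units per micron, which should be checked against the UNITS line in the header of the def file.

```python
# Assumption: the def header declares 1000 database units per micron.
DBU_PER_MICRON = 1000


def die_size_microns(lower_left, upper_right, dbu_per_micron=DBU_PER_MICRON):
    """Convert the die corner coordinates (database units) to width and height in microns."""
    (x0, y0), (x1, y1) = lower_left, upper_right
    width = (x1 - x0) / dbu_per_micron
    height = (y1 - y0) / dbu_per_micron
    return width, height


# Die corners quoted in the text above.
width_um, height_um = die_size_microns((0, 0), (1057235, 806405))
area_um2 = width_um * height_um

print(f"die width  = {width_um} um")
print(f"die height = {height_um} um")
print(f"die area   = {area_um2:.3f} um^2")
```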
To open the layout in magic, the following command must be typed:
magic -T <tech_file> lef read <merged_lef_file> def read <def_file>
The screenshot of the layout opened in MAGIC tool is as shown:
Here we can observe that the input pins are placed equidistant to each other. The decoupling capacitor cells are arranged in the boundary of the core. Also, the standard cells are kept in the lower left corner.
We now have the floorplan, netlist and the cell library. The next step is to place the netlist in the floorplan using the library.
Next, the placement has to be optimized. This is the stage where we estimate the wire lengths and, based on them, insert repeaters to maintain signal integrity. Repeaters are basically buffers that recondition the original signal by producing a new signal that replicates it. In turn, however, they consume area.
There are two stages of placement namely global placement and detailed placement.
The main goal in placement is to reduce the HPWL, the half-perimeter wire length. For each net, it is half the perimeter of the smallest rectangle that encloses all the pins of the net.
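The half-perimeter wire length of a net is the width plus the height of the bounding box around its pins. The sketch below computes it for a small, hypothetical set of nets; the net names and pin coordinates are made up for illustration.

```python
def hpwl(pins):
    """Half-perimeter wire length of one net: bounding box width plus height."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets):
    """Sum of the half-perimeter wire lengths over all nets."""
    return sum(hpwl(pins) for pins in nets.values())


# Hypothetical nets: net name -> list of (x, y) pin positions.
nets = {
    "n1": [(0, 0), (3, 4)],
    "n2": [(1, 1), (2, 5), (6, 2)],
}

for name, pins in nets.items():
    print(f"{name}: HPWL = {hpwl(pins)}")
print(f"total HPWL = {total_hpwl(nets)}")
```

A placer compares candidate placements by this total: moving connected cells closer together shrinks the bounding boxes and therefore the estimate.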
The command run_placement has to be given.
After a successful placement, we can open the layout in the MAGIC tool using the command:
magic -T <tech_file> lef read <merged_lef_file> def read <def_file>
The screenshot of the layout after placement is as shown below:
A standard cell library has the cells with different sizes, different functionality and also with different threshold voltage. It contains the information about the width, height, delay, etc of each cell in it.
The cell design flow is as follows:
- Inputs: Process Design Kits (PDKs), DRC and LVS rules, SPICE Models, Library and user defined specs.
- Design Steps: Circuit design, Layout Design, Characterization.
- Outputs: CDL (Circuit Description Language), GDSII, LEF, extracted spice netlist (.cir), timing, noise, power .libs, functions.
The characterization flow is as follows:
- Read the model files.
- Read the extracted spice netlist.
- Recognize the behaviour of the circuit.
- Read the subcircuit files.
- Attach the necessary power supplies.
- Apply the stimulus.
- Provide necessary output capacitance.
- Provide necessary simulation commands.
These are to be fed as a configuration file to the characterization software called GUNA. The software will generate timing, noise, power .libs, functions.
There are 3 types of Characterization:
- Timing Characterization
- Power Characterization
- Noise Characterization
- slew_low_rise_thr 20%
- slew_high_rise_thr 80%
- slew_low_fall_thr 20%
- slew_high_fall_thr 80%
- in_rise_thr 50%
- in_fall_thr 50%
- out_rise_thr 50%
- out_fall_thr 50%
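The thresholds listed above can be applied to sampled waveforms to measure transition time and propagation delay. The following sketch uses linear interpolation between samples; the supply voltage and the idealised inverter waveforms are hypothetical and only serve to show the method.

```python
def crossing_time(times, values, level, rising=True):
    """Return the first time the waveform crosses `level`, using linear interpolation."""
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        v0, v1 = values[i], values[i + 1]
        crosses = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crosses:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


VDD = 3.3  # assumed supply voltage in volts

# Idealised, hypothetical inverter waveforms (time in ns, voltage in V).
t = [0.0, 0.5, 1.0, 2.0, 3.0]
vin = [3.3, 3.3, 0.0, 0.0, 0.0]   # input falls between 0.5 ns and 1.0 ns
vout = [0.0, 0.0, 0.0, 3.3, 3.3]  # output rises between 1.0 ns and 2.0 ns

# Rise transition: from slew_low_rise_thr (20%) to slew_high_rise_thr (80%) of the output.
rise_transition = crossing_time(t, vout, 0.8 * VDD) - crossing_time(t, vout, 0.2 * VDD)

# Cell rise delay: from in_fall_thr (50%) of the input to out_rise_thr (50%) of the output.
cell_rise_delay = (crossing_time(t, vout, 0.5 * VDD)
                   - crossing_time(t, vin, 0.5 * VDD, rising=False))

print(f"rise transition = {rise_transition:.2f} ns")
print(f"cell rise delay = {cell_rise_delay:.2f} ns")
```

The fall transition and the cell fall delay follow the same pattern with the falling thresholds and `rising=False` on the output.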
In total, the IO placer supports 4 pin placement strategies.
Initially, the mode was set to 1, where the pins are equidistant from each other. We can find the default setting in the floorplan.tcl file. The snapshots of it are as follows:
To change the mode on the fly, the following command has to be typed: set ::env(FP_IO_MODE) <Mode_number>. On changing the mode number to 2, the following result was obtained:
It was observed that the pins were stacked.
Cloning the Git repo copies all the .lib and .mag files from the repo to the local machine.
The snapshot of the cloning process is shown below:
The first step in characterizing a gate is to create a SPICE deck. A SPICE deck contains information about the following:
- Component connectivity
- Component values (the dimensions of MOSFETs, input gate voltage, supply voltage, capacitor values, etc..)
- Location and names of the nodes.
Vm is the point where Vin = Vout. At this point, both the NMOSFET and the PMOSFET will be in the saturation region.
16 Mask process is the process of fabricating the CMOS gates and ICs using 16 different masks. It consists of the following steps:
- Selection of the substrate.
- Creating active region for transistors.
- N well and P well formation.
- Formation of gate.
- Lightly Doped Drain formation.
- Source & Drain formation.
- Formation of Contacts and Interconnects.
- Higher level metal formation.
To open the inverter layout on the MAGIC tool, the command magic -T <tech_file> <mag_file> has to be given.
Following are the snapshots of the layout:
To extract the .spice file from the MAGIC tool, including all the parasitic capacitances, the following commands have to be given in the MAGIC console:
The extracted spice file is opened and the necessary additions are made so that it can be run in the ngspice tool. The model files are included. The model name is updated. The power supplies are added. A pulse input is given at the gate. The snapshot of the updated spice file is shown below:
Below is the snapshot of the PMOSFET model file:
Below is the snapshot of the NMOSFET model file:
To run it on the Ng SPICE tool, the following command is to be given:
ngspice <file_name.spice>
The transient response is obtained by running ' plot y vs time a '. The transient response is obtained as follows:
The rise transition time is the time the output takes to rise from 20% to 80% of its final value. The snapshot below shows the time at the two positions. It was calculated to be
The cell rise propagation delay is the time difference between the input crossing its 50% point and the rising output crossing its 50% point. The snapshot below shows the time at the two positions. It was calculated to be
Tracks are used in the process of routing. We can find a track.info file at the path: pdks/sky130A/libs.tech/ sky130_fd_sc_hd/ .
A screenshot of the contents of the file is as shown below:
This file contains information about the horizontal and vertical offsets and pitches of each metal layer.
Some guidelines while making the standard cells :
- The input and output ports must lie on the intersection of the vertical and horizontal tracks.
- The width of a standard cell must be an odd multiple of the horizontal track pitch.
- The height must be an odd multiple of the vertical track pitch.
In the below picture, we can see that the ports are located at the intersection of the tracks.
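Whether a port sits on a track intersection can be checked from the offset and pitch of the tracks. The sketch below works in integer nanometres to avoid floating point remainder errors; the offset and pitch values are assumed for illustration and should be replaced with the ones read from the track information file.

```python
def on_track(coord_nm: int, offset_nm: int, pitch_nm: int) -> bool:
    """True when a coordinate lies exactly on a routing track."""
    return (coord_nm - offset_nm) % pitch_nm == 0


def port_on_grid(x_nm: int, y_nm: int, x_track, y_track) -> bool:
    """True when (x, y) lies on the intersection of a vertical and a horizontal track."""
    return on_track(x_nm, *x_track) and on_track(y_nm, *y_track)


# Assumed (offset, pitch) pairs in nanometres for one layer; illustrative values only.
X_TRACK = (230, 460)
Y_TRACK = (170, 340)

print(port_on_grid(690, 510, X_TRACK, Y_TRACK))  # both coordinates fall on a track
print(port_on_grid(700, 510, X_TRACK, Y_TRACK))  # x is 10 nm off the nearest track
```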
For PnR, we will not be needing all the details of the logic of the cell; we would be requiring only their boundary, the power and GND rails, the input and output port information. Here, the LEF file comes into the picture which would contain only the required information.
So, the next step would be to extract LEF file out of the mag file.
For extracting the lef file, the following command must be given in the magic console: lef write
This is the screenshot of the contents of the lef file.
Delay table is the table of the delays of a gate for various input slew and the output load.
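A delay table is indexed by input slew and output load, and values between the table points are usually obtained by interpolation. The sketch below performs a bilinear interpolation on a small table. The axis values and delays are hypothetical and are not taken from any sky130 .lib file.

```python
from bisect import bisect_right


def _bracket(axis, value):
    """Return the indices of the two axis points used for interpolation (clamped to the table)."""
    i = bisect_right(axis, value) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1


def lookup_delay(slews, loads, table, slew, load):
    """Bilinear interpolation of table[slew_index][load_index].

    Points outside the table are linearly extrapolated from the nearest segment.
    """
    i0, i1 = _bracket(slews, slew)
    j0, j1 = _bracket(loads, load)
    ts = (slew - slews[i0]) / (slews[i1] - slews[i0])
    tl = (load - loads[j0]) / (loads[j1] - loads[j0])
    low_slew = table[i0][j0] * (1 - tl) + table[i0][j1] * tl
    high_slew = table[i1][j0] * (1 - tl) + table[i1][j1] * tl
    return low_slew * (1 - ts) + high_slew * ts


# Hypothetical delay table: rows follow input slew (ns), columns follow output load (fF).
SLEWS = [0.02, 0.04, 0.08]
LOADS = [10.0, 20.0, 40.0]
DELAYS = [
    [0.10, 0.15, 0.25],
    [0.12, 0.17, 0.27],
    [0.16, 0.21, 0.31],
]

delay = lookup_delay(SLEWS, LOADS, DELAYS, slew=0.03, load=15.0)
print(f"delay at slew 0.03 ns and load 15 fF = {delay:.3f} ns")
```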
For a 2 level clock buffering, the following were observed:
- At every level, each node must drive the same load.
- At a given level, the buffers must be identical. Only then would the total skew at the clock end points be zero.
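The effect of these two observations on skew can be illustrated with a very simplified model in which the arrival time at each clock end point is the sum of the buffer delays along its path. Wire delays are ignored and the buffer delays are hypothetical.

```python
def clock_arrivals(level1_delay, level2_delays):
    """Arrival time at each end point: root buffer delay plus its level 2 buffer delay."""
    return [level1_delay + delay for delay in level2_delays]


def skew(arrivals):
    """Difference between the latest and the earliest clock arrival."""
    return max(arrivals) - min(arrivals)


# Hypothetical buffer delays in ns.
balanced = clock_arrivals(0.10, [0.20, 0.20])    # identical buffers driving equal loads
unbalanced = clock_arrivals(0.10, [0.20, 0.25])  # one level 2 buffer drives a larger load

print(f"balanced skew   = {skew(balanced):.3f} ns")
print(f"unbalanced skew = {skew(unbalanced):.3f} ns")
```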
The config.tcl file had to be modified as shown:
Then, we had to invoke the OpenLANE tool and prepare the design picorv32a. After this, we had to include the lef file of our custom made inverter sky130_vsdinv in the merged.lef file. So, the following commands had to be given:
set lefs [glob $::env(DESIGN_DIR)/src/*.lef]
add_lefs -src $lefs
We can see the lef file included in the merged.lef :
The following is the screenshot of the terminal after running synthesis:
It was noticed that the slack violation was very high. So, we tried varying the synthesis strategies in order to reduce it.
The following were the observations after changing the strategies:
Floorplan was done using the same command run_floorplan. Then placement was performed using the command run_placement.
After placement was completed, the layout of the placement result was opened in MAGIC and our custom cell was present in it as shown below:
We can see that the cells in the layout appear to overlap each other. This is because the cells are abutted so that they share the power lines.
The my_base.sdc file was added in the src folder of picorv32a. Its content is as shown below:
Then a .conf file was created in the openLANE_flow folder. Its contents are as shown below:
This file was used to run STA in OpenSTA tool. To run STA, the command: sta pre_sta.conf was to be given in openLANE_flow. Below are the results obtained after running STA of this pre_sta.conf file:
Here we again observe that the slack violation is too high. The delay of each cell depends on its input slew and its output load capacitance (which depends on its fanout). So, the maximum fanout was set to 4 using the command: set ::env(SYNTH_MAX_FANOUT) 4. The results obtained are as shown below:
Here, we observe that the slack though improved is still high. It was seen that the below shown buffer was causing a high delay.
So, we upsized it to sky130_fd_sc_hd__buf_4.
The result obtained was as follows:
The slack has improved again, but the violation is still large. So, we looked for further buffers causing this delay in the updated worst delay path. After a series of improvements, we arrived at a slack of -0.69 ns.
Thus, iteratively we had to improve the delay of the worst delay path.
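For reference, setup slack is the data required time minus the data arrival time, and a negative value is a violation. The sketch below uses a simplified single cycle model with an ideal clock; the clock period, setup time, uncertainty and arrival time are hypothetical values, chosen so that the result matches the order of magnitude discussed above.

```python
def setup_slack(clock_period, setup_time, clock_uncertainty, data_arrival):
    """Setup slack = data required time - data arrival time (ideal clock, single cycle path)."""
    data_required = clock_period - setup_time - clock_uncertainty
    return data_required - data_arrival


# Hypothetical numbers in ns, not taken from the picorv32a reports.
slack = setup_slack(clock_period=10.0, setup_time=0.1,
                    clock_uncertainty=0.2, data_arrival=10.39)

status = "violated" if slack < 0 else "met"
print(f"slack = {slack:.2f} ns ({status})")
```

Reducing the delay of cells on the worst path lowers the data arrival time, which is why upsizing the slow buffers improves the slack.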
This STA was done outside the OpenLANE flow. To carry its results into the flow, we had to replace the existing picorv32a.synthesis.v file in the synthesis results with the netlist produced during this STA session. To write that Verilog file, the write_verilog command was given with the path of the file to overwrite.
This replaced the existing Verilog file under the synthesis results, which is then used for the steps that follow synthesis. The floorplan and the placement were done as usual. After the placement, the core area was observed as follows:
After this the CTS was performed using the command: run_cts. The tool TritonCTS was used. The following were the parameters of cts that could be varied:
The results of CTS were as follows:
At the end of CTS, the clock buffers would be added to the netlist and the updated verilog file named picorv32a.synthesis_cts.v was created in the synthesis results of the picorv32a design.
STA in OpenLANE is performed using the OpenROAD tool, which is invoked in the OpenLANE flow by typing openroad. Here, a db file had to be created first, and then the STA was performed. The screenshot after creating the db file is as shown below:
The results of STA are shown below:
This was the end of Day 4.
The power distribution network would normally be generated in the floorplan step, but in the OpenLANE flow it is generated after CTS. To see the current def, the command echo $::env(CURRENT_DEF) should be given; this shows which step was completed last. To generate the PDN, the command gen_pdn had to be given. This creates the grids and the stripes for Vdd and Vss. The following was obtained after successful generation of PDN:
Here we can observe that the pitch of the standard cell rails is 2.72. So, in order to match the power and ground rails with the power and ground ports of the cell, the height of the cell must be an integral multiple of the pitch. The chip gets power from the Vdd and Ground pads. From the pads, power is supplied to the rings and then enters the vertical stripes. From the stripes, it is supplied to the standard cell rails as shown in the figure below:
The next step is to run the routing.
To run routing, the command: run_routing should be given. The different strategies for routing can be found in the README.md file. The strategies are as shown below:
There are 5 routing strategies. The values 0 to 3 use TritonRoute 13, and the value 14 uses TritonRoute 14. TritonRoute 14 ensures that 0 violations occur, but at the cost of runtime and memory utilization.
Routing takes place in two steps:
- Global/Fast Routing: This uses the FastRoute tool to generate a coarse 3D routing. The entire routing region is divided into a grid and the routing guides are generated. This is followed by detailed routing.
- Detailed Routing: This uses the TritonRoute tool. TritonRoute generates the detailed routing with optimized wire length and via count according to the route guides, connectivity constraints and design rules. The inputs to this tool are the lef and def files and the preprocessed route guides.
Given below is the screenshot after successful routing:
This step takes many optimization iterations. The following is the screenshot of the details of seventh iteration:
The total wire length of each metal layer is shown here. The number of violations is also shown. As the strategy used was 0, we do not get 0 violations. These violations can be found in the drc file shown. The contents of this file are shown below:
After routing, the routes are in place. The next step is to extract the parasitics of these routes, which are recorded in a SPEF file. A SPEF extraction tool was not part of the OpenLANE tool, so the extraction was performed outside OpenLANE. The SPEF extraction folder contains a Python file, main.py. To start the extraction, the following command had to be given: python3 main.py <address of the current def file (routing)>. The generated SPEF file is written to the same folder as the def file. Given below is the screenshot of the contents of the spef file.
In the results of the synthesis, we can find 4 verilog files:
- The first one was the one generated after the first synthesis.
- The second one was the one obtained after CTS when the clock buffers were added.
- The third one was generated during routing when the antenna diodes got inserted.
- The final one was the netlist used by the tool before routing.
This completes the RTL to GDSII flow in the OpenLANE tool. We have seen all the steps involved in it. The interactive run of the OpenLANE tool was very useful and interesting. We could do as many iterations as we wanted at any point in the flow.
In this workshop, our aim was to characterize an inverter cell and use it in the picorv32a design. On the first day, we learnt the importance of the Instruction Set Architecture and were introduced to the OpenLANE tool and its RTL2GDSII flow. We also learnt how to run the synthesis step. On the second day, we learnt the importance of floorplanning, and performed floorplanning and placement. We were also introduced to standard cell characterization. On the third day, we designed an inverter standard cell using Magic Layout and characterized it using ngspice. We were also introduced to the 16 Mask CMOS process, which was very interesting. On the fourth day, we merged the lef file of our custom inverter with the lef of the picorv32a design and restarted the RTL2GDSII flow from synthesis, as a new cell had been added. Different strategies were used to minimize the slack violation. We were also introduced to the OpenSTA tool and to improving the slack to meet the timing constraints. Then, we performed CTS in OpenLANE and STA in the OpenROAD tool. On the last day, we generated the power distribution network, performed routing using TritonRoute and extracted the SPEF file.
Thus, we successfully characterized an inverter cell and used it in the picorv32a design using the OpenLANE tool.
- Kunal Ghosh, Co-Founder of VLSI System Design (VSD) Corp. Pvt. Ltd.
- Nickson Jose - VSD VLSI Engineer.