# **ECE 260C, Spring 2025** # Routing Andrew B. Kahng Thanks to: Bangqi Xu, Matt Liberty, Cho Moon, Eder Monteiro, Zhiang Wang, ... #### Physical Design Flow Pictures (old ECE 260B slide) Floorplanning **Placement** Powerplanning Routing ## Final (Detailed) Routing # **Background** - Routing challenges - Complex design rules - Enormous solution space - Physical and circuit considerations - Generic "area routing" flow - Global routing - Produces 3D "route guides" - Detailed routing - Input: route guides = union of gcells - Output: physical nets - Subject to: honoring route guides, honoring design rules http://www.ispd.cc/contests/18/index.htm GCell = global routing grid; Global router will only generate gcell-to-gcell connections #### **Critical Elements** - Must be able to clean up DRC (design rule check) violations! - Without a DRC engine -> can't tell that violations exist! - Without violation filtering have no clue what to ripup - Major source of violations: naive pin access - On-track access assumption - No inter-cell pin access compatibility check - No accurate modeling of design rules - DRC engine and robust pin access are "scuba tanks" - SCUBA: Self-Contained, Underwater, Breathing Apparatus # TritonRoute (2018-2022) – Overall Structure - Best academic detailed router for contest benchmarks - Only academic detailed router capable of delivering DRC-clean solution for commercial foundry nodes # **Geometry-Based Design Rule Checking for Detailed Routing** #### **Motivation** - Design rule checking is critical for EDA enablement - New technology has increasingly complex design rules - Mandatory physical verification for signoff - No end-to-end framework for design rule checking in the open literature - → Missing key enablement for DRC convergence #### Work at UCSD: - (i) Optimized data structures for design rule check for detailed routing - (ii) Industry-format (LEF) based design rule check methodology - (iii) Differentiation between fixable and non-fixable design rule violations for detailed routing - (iv) Foundry nodes: confirmed "clean" by commercial DRC tools # Preliminaries: Basic Geometry Objects - Basic geometry objects in a DRC checking database - Polygon edge - = Edge of a polygon - Max rectangle - = Maximum rectangle inside a polygon - Polygon set - = Union of disjoint polygons # **Preliminaries: Design Rule Syntax** - Typical design rule can have three components - Spacing value - Intrinsic property condition (optional trigger) - Extrinsic property condition (optional trigger) - Example SPACING eolSpacing ENDOFLINE eolWidth WITHIN eolWithin Spacing value Intrinsic property Extrinsic property a) EOL width < eolWidth requires eolSpace beyond EOL to either side by < eolWithin distance. 10 EOL = END-OF-LINE Kahng ECE 260C SP25 # Preliminaries: Design Rule Violation Marker - A design rule violation marker consists of - Bounding box where - Layer - Design rule $\rightarrow$ " - Usage: Give hints to DR where / what to ripup - Example - Rule: SPACING 0.06 - → Object needs to be 0.06 unit away from each other Kahng ECE 260C SP25 #### Preliminaries: Fixable and Non-Fixable DRC - Knowing locations of DRCs is not enough - DRC could happen inside standard cell itself - DRC could happen between PG stripes - → DR cannot help resolve such **non-fixable** DRCs - Need to filter DRCs before give them to DR - I.e., only provide DR with fixable DRCs - Example Fixable violation Non-fixable violation #### **Problem Statement** Goal: Given layout objects, find fixable design rule violations (if any) #### Inputs - Design layout database - Design rules #### Constraints - Design rule checking bounding box - Layer range (e.g., M2-M5) #### Output Fixable design rule violation markers # **Database Objects** | Object | Meaning | Status | Geometries | |--------------|-------------------------|---------|--------------| | instTerm | cell pin | fixed | polygon(s) | | term | block IO pin | fixed | polygon(s) | | instBlockage | cell blockage | fixed | polygon(s) | | blockage | block blockage | fixed | polygon(s) | | pathSeg | regular net wire | routing | rectangle | | pathSeg | special net wire | fixed | rectangle | | via | regular net via | routing | rectangle(s) | | via | special net via | fixed | rectangle(s) | | patchMetal | regular net patch metal | routing | rectangle | | patchMetal | special net patch metal | fixed | rectangle | #### **Data Structure** - Object (fixed) & object (routing) - → Key for fixable / non-fixable differentiation - Polygon set (fixed) - Union of fixed obj. shapes - Polygon set (routing) - Union of routing obj. shapes | Object | Status | |--------------|---------| | instTerm | fixed | | term | fixed | | instBlockage | fixed | | blockage | fixed | | pathSeg | routing | | pathSeg | fixed | | via | routing | | via | fixed | | patchMetal | routing | | patchMetal | fixed | DRC checking database # **Region Query** RTree as underlying container Layout RTree Representation - Two RTrees for each layer - MaxRect RTree - Edge Rtree - Each maxRect / Edge has property indicating whether it is **fixed** or **routing** Thanks: Dr. Bangqi Xu Α D G 16 В E # High-Level Flow for DRC Check (Single Obj.) UCSD # **Design Rules Types** Design rules are divided into two categories **Metal Shape Rule** - → Metal layer and Cut layer - Metal layer rules - Metal short - Non-sufficient-metal-overlap - Parallel run length (PRL) spacing - Minimum width - Minimum step - End-of-line (EOL) spacing - Cut layer rules - Cut short - Cut spacing **Metal Spacing Rule** Many more design rule types in advanced technologies! See the "LEF5.8" standard, e.g., here. #### **Metal Spacing Checking** #### Algorithm 1 Check metal spacing ``` 1: Input: max rectangle m 2: N \leftarrow \text{queryMaxRectangles}(m, maxDist) 3: for all n \neq m in N do 4: if isOverlap(m, n) then 5: if getNet(m) = getNet(n) then checkNSMetal(m, n) 7: else 8: checkMetalShort(m, n) 9: end if 10: else 11: checkPRL(m, n) 12: end if 13: end for ``` https://www.ispd.cc/contests/19/Introduction of ISPD19 Contest Problem.pdf #### **Short Checking** #### Algorithm 2 Check metal short ``` Input: max rectangles m, n shortRect ← getIntersection(m, n) if isFixed(m) AND isFixed(n) then return end if if isCoveredByPin(shortRect) AND isBlockage(m, n) then return end if if not hasRouting(shortRect) then return end if addMarker(MetalShort) ``` #### **NS-Metal Checking** #### Algorithm 3 Check non-sufficient metal overlap ``` Input: max rectangles m, n nsRect ← getIntersection(m, n) if diagLen(nsRect) ≥ minWidth then return end if if width(m) < minWidth OR width(n) < minWidth then</li> return end if if hasValid3rdObj(nsRect) then return end if addMarker(NonSufficientMetalOverlap) ``` #### PRL Spacing Checking ## Algorithm 4 Check parallel run length spacing ``` 1: Input: max rectangles m, n 2: actVal \leftarrow getActualSpacing(m, n) 3: reqVal \leftarrow getRequiredSpacing(m, n) 4: if actVal \ge regVal then return 6: end if 7: if isFixed(m) AND isFixed(n) then 8: return 9: end if 10: prlRect \leftarrow getIntersection(m, n) 11: if not hasPolyEdge(prlRect) then 12: return 13: end if 14: maxWidth \leftarrow getMaxWidth(m, n) 15: if not hasExclusiveRoutingWithin(prlRect, maxWidth) then 16: return 17: end if 18: addMarker(ParallelRunLengthSpacing) ``` #### Min Width Checking #### Algorithm 5 Check minimum width ``` 1: Input: polygon m 2: N \leftarrow \text{slicePolygon}(m, vertical) 3: for all n in N do 4: if ySpan(n) \ge minWidth then 5: return end if 6: if not hasRouting(n) then 8: return 9: end if 10: addMarker(MinimumWidth) 11: end for 12: N \leftarrow \text{slicePolygon}(m, horizontal) 13: for all n in N do 14: if xSpan(n) \geq minWidth then 15: return 16: end if 17: if not hasRouting(n) then 18: return 19: end if 20: addMarker(MinimumWidth) 21: end for ``` #### Min Step Checking #### **Algorithm 6** Check minimum step ``` 1: Input: polygon edge e 2: if length(e) < minStepLength then 3: return 4: end if 5: initializeBBox(bbox, endPoint(e)) 6: beginEdge \leftarrow e 7: numEdges \leftarrow 0 8: while beginEdge \neq nextEdge(e) do 9: e \leftarrow \text{nextEdge}(e) 10: updateBBox(bbox, endPoint(e)) 11: if length(e) < minStepLength then numEdges \leftarrow numEdges + 1 13: else 14: break 15: end if 16: end while 17: if e = beginEdge then 18: return 19: end if 20: if numEdges <= maxEdges then 21: return 22: end if 23: if not hasRoute(bbox) then 24: return 25: end if 26: addMarker(MinimumStep) ``` #### **EOL Spacing Checking** #### **Algorithm 7** Check end-of-line spacing ``` 1: Input: polygon edge e 2: if len(e) \ge eolWidth then 3: return 4: end if 5: if not hasParallelEdge(e) then 6: return 7: end if 8: E \leftarrow \text{queryPolygonEdge}(e, eolWithin, eolSpacing) 9: for all e' in E do 10: eolRect \leftarrow getIntersection(e, e') 11: if not isEmpty(eolRect) then 12: return 13: end if 14: if not hasRoute(e) then 15: return 16: end if 17: addMarker(EndOfLineSpacing) 18: end for ``` Thanks: Dr. Bangqi Xu 25 #### **Cut Spacing Checking** #### **Algorithm 8** Check cut spacing ``` 1: Input: cuts m, n 2: actVal \leftarrow getActualSpacing(m, n) 3: reqVal \leftarrow getRequiredSpacing(m, n) 4: if actVal \ge regVal then 5: return 6: end if 7: if isFixed(m) AND isFixed(n) then 8: return 9: end if 10: if not hasAdjCuts(m) then 11: return 12: end if 13: if not has Parallel Overlap (m, n) then 14: refurn 15: end if 16: if not hasArea(m, n) then 17: return 18: end if 19: addMarker(CutSpacing) ``` # Dynamic Programming-Based Multi-Level Pin Access Analysis #### **Motivation** Pin access = wire / via connection to access a pin Via access - Critical to decrease DRCs in detailed routing - → Failure results in repeating violation patterns - Need robust & scalable pin access analysis (!) #### **Previous Works / Our Work** - Existing work [Han15] assumes on-track access - Usually assume alignment between routing track and placement site - → Not always true (ISPD18/19 contests) - LUT-based abutting cell pair analysis [Xu16] - → Not scalable (> 10M combinations) #### Our work: - (i) Robust pin access point enumeration - (ii) Boundary conflict-aware access pattern enumeration - (iii) Dynamic programming-based access pattern selection for standard cell instance cluster # What Does Pin Access Analysis Do? - Testcase: ISPD18\_test10 - 290K standard cell instances - 992K nets - DRC clean pin access pattern selection in 241s Whole design layout #### **How to Find Such Access Points?** #### Multi-level hierarchical pin access analysis - Unique instance pin level - Unique instance level - Instance cluster level Scalable memory usage On-demand design-based analysis Scalable runtime - DRC check on the fly - → More than 2M DRC engine calls in 8min with single thread Cluster-based access pattern selection Unique instancebased access pattern generation Pin-based access point generation ## **Unique Instance** - To address track-placement site misalignment - Defined by a signature consisting of - Cell master (e.g., BUFFX4) - Orientation - Offsets to all track patterns - Two cell instances point to the same unique instance if they share the same signature #### **Definitions** - Access point (for a pin) - A location (x, y, layer) that detailed router (DR) can make route to - Valid access point - An access point that allows DRC-clean routing - Valid access pattern: combination of mutually DRC-clean access points (one access point per pin) Pin shape × Access point --- Access pattern 33 # **Access Point Quality Assessment** Goal: an evaluation system compatible with a broad range of technology nodes Four coordinate types | Туре | Cost | |--------------------|------| | On-track | 1 | | Half-track | 2 | | Shape-center | 3 | | Enclosure boundary | 4 | Quality of an access point = sum of coordinate type costs for both x and y coordinates ## **Coordinate Types** - On-track: On preferred and non-preferred routing track of upper layer - Half-track: At midpoint between two neighboring routing tracks - Shape-center: At midpoint between left and right (or top and bottom) coordinates of a rectangular pin shape - 4. Enclosure boundary: Via enclosure aligns with pin shape boundary #### **Pin-Based Access Point Generation** #### Algorithm 1 Pin-based access point generation ``` 1: Inputs: pin, track patterns tps, viadefs vias 2: Output: valid access points aps 3: for all nonPreferredDirCoordType t1 \in \{0, 1, 2\} do 4: for all preferredDirCoordType t0 \in \{0, 1, 2, 3\} do 5: tmpAps \leftarrow genAccessPoint(pin, tps, vias, t0, t1) 6: for all ap \in tmpAps do 7: if isValid(ap) then 8: aps += ap 9: end if 10: end for 11: if |aps| \geq k then 12: return 13: end if 14: end for 15: end for ``` ## Unique Instance-Based Access Pattern Generation ## Input Valid access points of pins in a unique instance # Output Valid access patterns # **Pin Ordering** Sort pins according to their average x coordinate of valid access points - Idea: neighbors in sorted pin list are more likely to have conflicts due to DRC - I.e., pin A and pin B are more likely to have conflict compared to pin A and pin C # **Graph Construction** - Vertex = access point - Marked with pin index and access point index - E.g., 23 means the third access point of the second pin - s and t are virtual start and end points - Pin correspond to a "group" of vertices in graph - Edge exists between pair of access points from neighboring groups, weighted by physical distance - Access pattern = path from s to t Kahng ECE 260C SP25 Thanks: Dr. Bangqi Xu ## **Dynamic Programming-Based Pattern Generation** ## Algorithm 2 Access pattern generation ``` 1: Inputs: graph G(V, E) 2: Output: access patterns APs 3: Initialize array dp[m][n] G(V, E) 4: for all currPinIdx m do 5: for all currApIdx n do for all prevApIdx n' do 6: 7: prev \leftarrow aps[m-1][n'] 8: curr \leftarrow aps[m][n] 9: edgeCost \leftarrow getEdgeCost(prev, curr) 10: pathCost \leftarrow prev.cost + edgeCost 11: if pathCost < curr.cost then 12: curr.cost \leftarrow pathCost 13: curr.prev \leftarrow prev 14: end if 15: end for 16: end for 17: end for 18: APs += traceBack() 19: return APs ``` # **Iterative Edge Penalty Method** - Inter-cell pattern conflicts between cell-boundary pins - Need to encourage to choose different boundary pin access point - Pin shape × Access point -- Conflicts - Solution: add penalty cost to boundary pin access points if they have been selected in existing pattern # **Pattern Legality Check** - Use DRC check engine to validate access pattern - Violation can occur between non-neighboring pins - ii. Some design rules check multiple objects (access points) (i) Pin shape × Access point Conflicts (ii) Only DRC-clean patterns will be seen in next stage ## Cluster-Based Access Pattern Selection ## Inputs - Instances in a cluster (of the same row) - Access patterns of each unique instance - Map from instance to corresponding unique instance # Output Access pattern for each instance in the cluster with minimized overall cost ## **Cluster-Based Access Pattern Selection** - Instance ordering - Sort instances in the cluster according to x coordinate of the lower-left corner - Graph construction - Vertex = access pattern - Shortest path from s to t is the best pattern combination WCSD Kahng ECE 260C SP25 # **CTS** CCSD Kahng ECE 260C SP25 # **CTS Main Steps** ## Sink clustering - Sequential elements are grouped into a fixed number of clusters based on their locations - Tree construction and balancing - Buffers are inserted based on some structure, e.g., hierarchical H-Tree - Tree lengths are balanced such that clock skews are minimized - LDRC (electrical rules) repair - LDRC violations are repaired during or after CTS - Max transition, max capacitance, max wire length, etc. # **Sink Clustering** - Group sequential elements based on their locations to produce the best results (e.g., minimum wire length) - These parameters can be specified manually or determined automatically - Cluster size - Cluster diameter - All elements in the cluster will be driven by the same buffer Early "TritonCTS" versions used spacefilling curves to perform sink clustering! ## **Obstruction-Aware CTS** # Clock buffers should not be placed on top of macros, placement blockages or another clock buffers - Detailed placement may displace "illegal" buffers and cause timing to change after CTS - New "legal" buffer locations need to preserve balanced clock tree - Obstruction-aware CTS can reduce legalizer displacement by up to 4X sky130hd/microwatt without obstruction-aware CTS sky130hd/microwatt with obstruction-aware CTS Kahng ECE 260C SP25 # **OpenROAD CTS Commands** | Command | Description | Example Output | | |-------------------------------------------|--------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--| | clock_tree_synthesis | Build a balanced Htree by choosing appropriate clock buffers | [INFO CTS-0050] Root buffer is BUF_X4. [INFO CTS-0051] Sink buffer is BUF_X4. [INFO CTS-0052] The following clock buffers will be used for CTS: BUF_X4 [INFO CTS-0017] Max level of the clock tree: 5. [INFO CTS-0098] Clock net "clk" [INFO CTS-0099] Sinks 2537 [INFO CTS-0100] Leaf buffers 96 [INFO CTS-0101] Average sink wire length 9247.25 um [INFO CTS-0102] Path depth 18 - 19 [INFO CTS-0207] Leaf load cells 62 [INFO RSZ-0058] Using max wire length 693um. [INFO RSZ-0048] Inserted 94 buffers in 33 nets. | | | repair_clock_nets | Fixes LDRC violations including max wire length | [INFO RSZ-0058] Using max wire length 2154um. | | | report_clock_skew | Report worst clock skew for each clock signal in the design | Clock clk 1.26 source latency inst_7_12/clk ^ -1.13 target latency inst_8_12/clk ^ 0.00 CRPR 0.13 setup skew | | | report_checks -format full_clock_expanded | Report timing violations including clock paths | Startpoint: dp.rf.rf[31][3]S_DFFE_PP_ | | UCSD Kahng ECE 260C SP25 ## **Clock Tree Viewer** #### Open GUI gui::show Enable "Clock Tree Viewer" if not enabled Clock tree viewer shows latencies at all sinks - Red sinks = FF/latches - Green sinks = macros - Insertion delays are added to macro sinks ## **Generalized H-Tree Concept** #### Structured clock trees K. Han, A. B. Kahng and J. Li, "Optimal Generalized H-Tree Topology and Buffering for High-Performance and Low-Power Clock Distribution", *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems* https://vlsicad.ucsd.edu/Publications/Journals/j128.pdf. | | H-tree | Fishbone | |------------|--------|----------| | Skew | < | | | Wirelength | > | | | Latency | > | | | Power | | > | Generalized H-tree (GH-tree) Can we mix two clock structures to have better tradeoff between clock power vs. skew or latency? - History: (1) Bakoglu's 1988 book made H-tree approach well-known. Cadence CTGen (Dr. Lars Hagen), mid-1990s, started trend toward "fishbone" style save capacitance! - These days: on-chip variation (OCV) derates are costly, so goal is to reduce insertion delay (== "latency"). ## **Generalized H-Tree Concept** #### Structured clock trees K. Han, A. B. Kahng and J. Li, "Optimal Generalized H-Tree Topology and Buffering for High-Performance and Low-Power Clock Distribution", *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems* <a href="https://vlsicad.ucsd.edu/Publications/Journals/j128.pdf">https://vlsicad.ucsd.edu/Publications/Journals/j128.pdf</a>. | | H-tree | Fishbone | |------------|--------|----------| | Skew | < | | | Wirelength | > | | | Latency | > | | | Power | > | | Generalized H-tree (GH-tree) a balanced tree topology with arbitrary branching factor at each level GH-tree with depth P = 8 and branching factors (4, 2, 2, 2, 4, 2, 2, 2) Kahng ECE 260C SP25 # Idea: Capture/Explore Tradeoff ("Pareto" Frontier) (Recall: floorplan shape functions ?) # Also: Routing to IOs: pad (ICeWall) # SOC Integration and Planning: ICeWall Padring Gen #### Starts with: - Verilog netlist with signal IO pads for simulation and STA - Power/ground IO cells may be present - IO cell data (signal, P/G, fillers, ...) from library documentation - Footprint file defines where each padcell is to be placed in the padring – supports reuse of pre-existing padframes - Signal mapping file defines which signal in the Verilog is to be associated with which padcell in the padring - + Auto-assignment capability in ICeWall - Decouples footprint and signal mapping for padframe reuse 55 Thanks: Colin Holehouse, Arm ### **ICeWall Padring Examples** GF12LP BP-1, staggered pads GF12LP BP-1, as a flipchip #### What designers ask for ... - Determining the number of required P/G pads to be provided as callback functions to allow to encapsulate specs from library documentation - Definition of padring segments for analog signals, PHYs, different IO voltages, etc. - Definition of control cells that are required on a per-IO cell basis # **Also: GPU-Accelerated GRT** # **Summary of Previous Methods** - 2D-GPU-accelerated GRs are based on FastRoute4.1 - SPRoute and SPRoute2.0: implement parallel maze routing - 3D-GPU-accelerated GRs are based on CUGR - FastGR and GGR: implement parallel L/Z-shape pattern routing - GAMER and GGR: implement parallel maze routing - Replace the A\* search algorithm with the parallel n-bend pattern routing algorithm ### Route a batch of non-overlapping nets concurrently! Thanks: Dr. Zhiang Wang 58 # Proposed GPU-Accelerated TritonRoute-GR - Our GPU-GR is based on TritonRoute-GR - TritonRoute-GR adopts a two-step approach (2D + 3D GR) - 2D global routing can effectively reduce the solution space - 3D global routing can further optimize the solution locally - Replace the original 2D-RRR and 3D-RRR with corresponding GPUaccelerated GPU-RRR Thanks: Dr. Zhiang Wang 59 # **Current Approach** 60 # **Three-Level Parallel Maze Routing** - We adopt three-level parallel maze routing - - Parallel A\* search based on multiple priority queues [ZhouZ'15AAAI] Kahng ECE 260C SP25 Thanks: Dr. Zhiang Wang ### **Future Work** - "machine learning alongside optimization algorithms" - Combine the detailed-routability-driven GR with the MLbased DRV prediction models Improve accuracy and robustness compared to "end to end Extract features for DRV learning" **Congestion Map Setup** hotspots prediction **Steiner Tree Construction** & Net Decomposition **Initial Global Routing Iterative L-Shape Pattern Routing** Adjust the cost for the hotspot gcells Clip-Based GPU-Accelerated CNN-Based DRV **Hotspots Prediction** 2D Ripup and Reroute Congestion map from **Layer Assignment** Maze Routing Clip-Based GPU-Accelerated 3D Ripup and Reroute Yes **Maze Routing** No Overflow? Solution Thanks: Dr. Zhiang Wang # **BACKUP** 63