# **ECE 260C, Spring 2025**

#### **Placement**

Andrew B. Kahng

Thanks to: Andrew Kennings, Mingyu Woo, ...

#### Physical Design Flow Pictures (old ECE 260B slide)

- Floorplanning
- **Placement**
- Powerplanning
- Routing

#### **Global Placement**







UCSD Kahng ECE 260C SP25

# **GPL**



# RePlAce: Advancing Solution Quality and Routability Validation in Global Placement

# Chung-Kuan Cheng, Andrew B. Kahng, Ilgweon Kang and Lutong Wang

(based on slides from Ilgweon Kang's Ph.D. defense)

paper

#### **Placement Overview**

#### Placement

- Determines the locations of standard cells and/or logic elements while addressing optimization objectives
- Highly important physical design step in integrated circuit (IC) design flow
  - Directly impacts timing closure, die utilization, routability, design turnaround time (TAT) → operating frequency, yield, power consumption, cost

#### Placement instances

- Hypergraph G = (V, E)
  - V:= a set of vertices, i.e., standard cells and macros
  - E:= a set of *hyperedges*, i.e., nets
- Placement solution v = (x, y), X- and Y-coordinates of all placeable vertices
  - A solution is legal if:
  - 1. Every instance lies inside the placement region.
  - 2. Every standard cell is aligned to the predefined rows.
  - 3. No overlap exists between instances, including both standard cells and macros.
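A minimal C++ sketch of a legality check for the three conditions above. The data model (`Inst`, `Region`, `isLegal`) is hypothetical: it assumes a uniform row height, rows and sites starting at the region's lower-left corner, and macros exempt from row alignment. The pairwise overlap test is quadratic for clarity only.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical data model: axis-aligned rectangles.
struct Inst {
  double x, y;   // lower-left corner
  double w, h;   // width, height
  bool isMacro;  // macros are exempt from the row-alignment rule here
};

struct Region {
  double xl, yl, xh, yh;  // placement region boundary
  double rowHeight;       // uniform row height; rows start at yl
  double siteWidth;       // site width; sites start at xl
};

// True if v is an integer multiple of step (within a small tolerance).
static bool onGrid(double v, double step) {
  double k = std::round(v / step);
  return std::fabs(v - k * step) < 1e-9;
}

bool isLegal(const std::vector<Inst>& insts, const Region& r) {
  assert(r.rowHeight > 0 && r.siteWidth > 0);
  for (const Inst& c : insts) {
    // 1. Inside the placement region.
    if (c.x < r.xl || c.y < r.yl || c.x + c.w > r.xh || c.y + c.h > r.yh)
      return false;
    // 2. Standard cells snapped to rows (and sites).
    if (!c.isMacro &&
        (!onGrid(c.y - r.yl, r.rowHeight) || !onGrid(c.x - r.xl, r.siteWidth)))
      return false;
  }
  // 3. No pairwise overlap (abutting edges are allowed).
  for (size_t i = 0; i < insts.size(); ++i)
    for (size_t j = i + 1; j < insts.size(); ++j) {
      const Inst& a = insts[i];
      const Inst& b = insts[j];
      bool overlapX = a.x < b.x + b.w && b.x < a.x + a.w;
      bool overlapY = a.y < b.y + b.h && b.y < a.y + a.h;
      if (overlapX && overlapY) return false;
    }
  return true;
}
```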



#### **Problem Formulation**

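Placement objective function $f(\boldsymbol{v})$, with $\boldsymbol{v} = (\boldsymbol{x}, \boldsymbol{y})$: $\min_{\boldsymbol{v}} f(\boldsymbol{v}) = W(\boldsymbol{v}) + \sum_{b} \nu_b D_b(\boldsymbol{v})$, where $W$ is the wirelength term and $D_b$ is the density term of bin $b$, weighted by its local Lagrange multiplier $\nu_b$

ePlace: Electrostatics-based, globally smooth density function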
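As a sketch of the wirelength term, the code below evaluates the weighted-average (WA) smooth wirelength model used in ePlace (assumed here to be the form of $W$; the slide does not define it), $W_e^x = \frac{\sum_i x_i e^{x_i/\gamma}}{\sum_i e^{x_i/\gamma}} - \frac{\sum_i x_i e^{-x_i/\gamma}}{\sum_i e^{-x_i/\gamma}}$, next to the exact span $\max - \min$ that HPWL sums per net and per axis. Function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Exact span along one axis (the HPWL contribution of one net): max - min.
double exactSpan(const std::vector<double>& xs) {
  assert(!xs.empty());
  auto [lo, hi] = std::minmax_element(xs.begin(), xs.end());
  return *hi - *lo;
}

// WA smooth approximation of max(x) - min(x) for one net along one axis.
// gamma > 0 controls smoothness (smaller = closer to the exact span).
// Exponents are shifted by the max/min for numerical stability.
double waSpan(const std::vector<double>& xs, double gamma) {
  assert(!xs.empty() && gamma > 0);
  auto [lo, hi] = std::minmax_element(xs.begin(), xs.end());
  double xmax = *hi, xmin = *lo;
  double numMax = 0, denMax = 0, numMin = 0, denMin = 0;
  for (double x : xs) {
    double eMax = std::exp((x - xmax) / gamma);  // weights favor large x
    double eMin = std::exp((xmin - x) / gamma);  // weights favor small x
    numMax += x * eMax;
    denMax += eMax;
    numMin += x * eMin;
    denMin += eMin;
  }
  return numMax / denMax - numMin / denMin;
}

// W summed over nets; each net is given by its pin x- and y-coordinates.
double waWirelength(const std::vector<std::vector<double>>& netX,
                    const std::vector<std::vector<double>>& netY, double gamma) {
  assert(netX.size() == netY.size());
  double w = 0;
  for (size_t e = 0; e < netX.size(); ++e)
    w += waSpan(netX[e], gamma) + waSpan(netY[e], gamma);
  return w;
}
```

Each weighted average lies between the pin minimum and maximum, so the WA value never exceeds the exact span and approaches it as `gamma` shrinks.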



#### ePlace: In Each Iteration



- Cell & macro distribution
- Electric field distribution
- Charge density distribution
- Electric potential distribution
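The four panels correspond to one density evaluation: cells become charges, Poisson's equation gives the potential, and its negative gradient gives the field that spreads cells apart. A naive sketch of that chain on an m × m bin grid, assuming Neumann boundaries and a cosine-series solution with $\omega_u = \pi u / m$ evaluated at bin centers. ePlace/RePlAce use FFT-based transforms, and their exact normalization may differ; `Grid`, `Field` and `solveField` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Grid = std::vector<std::vector<double>>;  // indexed [x-bin][y-bin]

struct Field { Grid psi, xiX, xiY; };  // potential and field components

// Solves lap(psi) = -(rho - mean(rho)) with a naive O(m^4) cosine transform.
Field solveField(const Grid& rho) {
  const size_t m = rho.size();
  assert(m > 0 && rho[0].size() == m);
  const double pi = std::acos(-1.0);
  auto w = [&](size_t u) { return pi * double(u) / double(m); };
  auto c = [&](size_t i, size_t u) { return std::cos(w(u) * (i + 0.5)); };
  auto s = [&](size_t i, size_t u) { return std::sin(w(u) * (i + 0.5)); };

  // Forward transform: coefficients a_uv (DC term a_00 carries no field).
  Grid a(m, std::vector<double>(m, 0.0));
  for (size_t u = 0; u < m; ++u)
    for (size_t v = 0; v < m; ++v) {
      if (u == 0 && v == 0) continue;
      double sum = 0;
      for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < m; ++j) sum += rho[i][j] * c(i, u) * c(j, v);
      double norm = (u ? 2.0 : 1.0) * (v ? 2.0 : 1.0) / double(m * m);
      a[u][v] = norm * sum;
    }

  // Inverse sums at bin centers: psi, and xi = -grad(psi).
  Grid zeros(m, std::vector<double>(m, 0.0));
  Field f{zeros, zeros, zeros};
  for (size_t i = 0; i < m; ++i)
    for (size_t j = 0; j < m; ++j)
      for (size_t u = 0; u < m; ++u)
        for (size_t v = 0; v < m; ++v) {
          if (u == 0 && v == 0) continue;
          double k2 = w(u) * w(u) + w(v) * w(v);
          f.psi[i][j] += a[u][v] / k2 * c(i, u) * c(j, v);
          f.xiX[i][j] += a[u][v] * w(u) / k2 * s(i, u) * c(j, v);
          f.xiY[i][j] += a[u][v] * w(v) / k2 * c(i, u) * s(j, v);
        }
  return f;
}
```

With all charge in the left half of the grid, the x-field is positive everywhere, i.e., cells are pushed toward the empty right half; a uniform density produces no field.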

#### **RePlAce**

- Unified placement engine that solves multiple classes of academic benchmarks (mixed-size, fixed-macro, routability-driven, etc.)
- Local Lagrange multiplier
  - Enables local smoothing w/ awareness of local over-demanded bins
- Meta parameter tuning
  - Adjust step size of numerical method
- Routability optimization
  - Simple but effective metal layer-aware superlinear cell inflation
- Differences from the previous ePlace 2.0
  - Routability optimization techniques
  - Code optimizations for ~4X speedups
  - Various improvements to placement mechanisms
    - E.g., macro legalization using annealing; amount of rollback after fixing macro; handling of overflows; bin size determination and local smoothing; ...



#### RePlAce: Constraint-Driven Placement

#### Local Lagrangian multiplier $\nu_j$

- $\nu_j = e^{\alpha\,(BinDemand_j - BinCapacity_j)}$
- Local-density penalty factor for each bin $b_j$
- $BinDemand_j$ = total cell area in bin $b_j$
- $BinCapacity_j$ = area of bin $b_j$
- $BinDemand_j - BinCapacity_j = overflow_j$ of bin $b_j$
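A direct C++ transcription of the formula, with hypothetical names; `alpha` is a tuning constant whose value is not given here. In practice the overflow would likely be normalized (e.g., by bin area) so that the exponent stays in range; that normalization is an assumption left to the caller.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// nu_j = exp(alpha * (BinDemand_j - BinCapacity_j)) for every bin j.
// Over-demanded bins (positive overflow) get nu_j > 1, under-filled bins < 1.
std::vector<double> localMultipliers(const std::vector<double>& binDemand,
                                     const std::vector<double>& binCapacity,
                                     double alpha) {
  assert(binDemand.size() == binCapacity.size());
  std::vector<double> nu(binDemand.size());
  for (size_t j = 0; j < nu.size(); ++j)
    nu[j] = std::exp(alpha * (binDemand[j] - binCapacity[j]));
  return nu;
}
```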







Local $\nu_j$ $\uparrow$, and smaller HPWL $\uparrow$ [RePlAce with constraint-driven placement]

#### RePlAce: Constraint-Driven Placement

Local-Density Cost Coefficient for each cell

$$\Delta_i^{iter+1} = \Delta_i^{iter} + \beta \cdot \frac{\max(overflow_j, 0)}{\sum_i A_i} \qquad i \in b_j$$

- The previously defined $\nu_j$ was insufficient for local smoothing
- A cell $i$ that moves out of a high-overflow bin loses a large amount of its locally oriented repulsive force
- $\Delta_i$ memorizes the previously obtained local force
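A C++ sketch of the accumulation, with hypothetical names. Because $\Delta_i$ only grows while the cell sits in an overflowed bin, it retains ("memorizes") past local pressure even after $\nu_j$ drops. `areaNorm` stands for the denominator $\sum_i A_i$, and `beta` is a tuning constant whose value the slide does not give.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Delta_i += beta * max(overflow_j, 0) / areaNorm for every cell i in bin b_j.
// binOf[i] is the bin index of cell i; binOverflow[j] is overflow_j.
void updateDelta(std::vector<double>& delta, const std::vector<int>& binOf,
                 const std::vector<double>& binOverflow, double areaNorm,
                 double beta) {
  assert(delta.size() == binOf.size() && areaNorm > 0);
  for (size_t i = 0; i < delta.size(); ++i) {
    double ovf = binOverflow[binOf[i]];
    delta[i] += beta * std::max(ovf, 0.0) / areaNorm;  // never decreases
  }
}
```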



#### RePlAce: Constraint-Driven Placement

Global placement animation with local Lagrangian multiplier



**NEWBLUE1**, [RePlAce with local Lagrangian multiplier, i.e., constraint-driven placement] HPWL = 5.60E+7, #Iter = 762 (623+139), runtime = 9.9 min, target density = 100%

Comparison: ePlace 2.0 with local Lagrangian multiplier HPWL = 5.71E+7, #Iter = 1078 (609 + 469), runtime = 42 min, target density = 100%



#### RePlAce: Dynamic Step Size

General Idea of Dynamic Step Size
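As an illustration only, the sketch below assumes the HPWL-driven penalty update used in RePlAce-style placers (and, to our understanding, in OpenROAD's gpl): each iteration the density penalty factor λ is multiplied by μ = μ_max^(1 − p), clamped below at μ_min, where p = ΔHPWL / ΔHPWL_ref, and μ = μ_max when HPWL decreased. The default bounds 0.95 and 1.05 and all names are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Multiplier for the density penalty factor, driven by the scaled HPWL change.
// HPWL decreasing (p < 0): grow the penalty at the maximum rate.
// HPWL increasing quickly (large p): slow down, but not below muMin.
double penaltyCoef(double deltaHpwl, double refDeltaHpwl,
                   double muMin = 0.95, double muMax = 1.05) {
  assert(refDeltaHpwl > 0 && muMin <= muMax);
  double p = deltaHpwl / refDeltaHpwl;
  if (p < 0) return muMax;
  return std::max(muMin, std::pow(muMax, 1.0 - p));
}

// One update of the density penalty factor lambda.
double updateLambda(double lambda, double deltaHpwl, double refDeltaHpwl) {
  return lambda * penaltyCoef(deltaHpwl, refDeltaHpwl);
}
```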



# RePlAce: Improved Dynamic Step Size

Methodology to capture "transition points" on the HPWL curve.



▲ Flowchart of the trial global placement (tGP) procedure. The red rectangle indicates nonlinear optimization using Nesterov's method. The actual placement procedure follows this tGP procedure.

#### RePlAce: Improved Dynamic Step Size

- Solution quality in terms of the final HPWL
  - ePlace vs. RePlAce-ds (ADAPTEC1)
    - RePlAce-ds dominates ePlace in both runtime and solution quality (red square)
    - For bounds below [0.95, 1.003] (e.g., [0.95, 1.002], [0.95, 1.001]), runs did not converge





- Simple but effective routability optimization
  - A layer-aware cell inflation technique
  - Integrates NCTU-GR [17], the official global router of the DAC-2012 [18] and ICCAD-2012 [19] benchmark suites, for congestion estimation
  - Superlinear cell inflation technique to mitigate global routing congestion during global placement.
  - We further include a post-placement optimization by [20]
    - Following the strategy of recent leading works [21] [22]

[17] NCTU-GR, http://people.cs.nctu.edu.tw/~whliu/NCTU-GR.htm

[18] N. Viswanathan, C. J. Alpert, C. N. Sze, Z. Li and Y. Wei, "The DAC 2012 Routability-driven Placement Contest and Benchmark Suite", *Proc. DAC*, 2012, pp. 774-782.

[19] N. Viswanathan, C. J. Alpert, C. N. Sze, Z. Li and Y. Wei, "ICCAD-2012 CAD Contest in Design Hierarchy Aware Routability-Driven Placement and Benchmark Suite", *Proc. ICCAD*, 2012, pp. 345-348.

[20] W.-H. Liu, C.-K. Koh and Y.-L. Li, "Optimization of Placement Solutions for Routability", *Proc. DAC*, 2013, pp. 1-9.

[21] X. He, T. Huang, L. Xiao, H. Tian and E. F. Y. Young, "Ripple: A Robust and Effective Routability-Driven Placer", *IEEE Trans. on CAD* 32(10) (2013), pp. 1546-1556.

[22] X. He, Y. Wang, Y. Guo and E. F. Y. Young, "Ripple 2.0: Improved Movement of Cells in Routability-Driven Placement", *ACM Trans. on DAES* 22(1) (2016), pp. 10:1-10:26.



Metal layer-aware superlinear cell inflation

$$\mathit{infl\_ratio} = \min\left( \max_{\forall e,\, ml} \left( \frac{demand_{e,ml} + blk_{e,ml}}{cap_{e,ml}} \right)^{\gamma_{super}},\ 2.5 \right)$$

- e = one of the four edges of a given global routing tile
- ml = a specific metal layer
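A C++ sketch of the per-tile computation. Because $(\cdot)^{\gamma_{super}}$ is monotone for nonnegative ratios, the maximum can be taken before the exponent. `EdgeUsage` and `inflRatio` are hypothetical names; skipping edges with zero capacity is an assumption, and a ratio below 1 would mean that no inflation is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Routing usage on one tile edge for one metal layer (hypothetical layout).
struct EdgeUsage { double demand, blockage, capacity; };

// min( max over edges/layers of ((demand + blk) / cap)^gammaSuper, maxRatio ).
// edges holds the 4 tile edges x all metal layers of one tile.
double inflRatio(const std::vector<EdgeUsage>& edges, double gammaSuper,
                 double maxRatio = 2.5) {
  assert(gammaSuper > 0);
  double worst = 0;
  for (const EdgeUsage& u : edges) {
    if (u.capacity <= 0) continue;  // assumption: ignore layers with no tracks
    worst = std::max(worst, (u.demand + u.blockage) / u.capacity);
  }
  return std::min(std::pow(worst, gammaSuper), maxRatio);
}
```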



- Considers the total available whitespace
  - Starting from 90% die utilization, the maximum cell-inflated area is limited to ≤ 10%
  - If this limit is exceeded, an 'inflation ratio adjustment' is performed
    - Divides the inflation ratio of each tile by the inflation ratio of the least-congested tile whose ratio is greater than one
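A C++ sketch of the adjustment, under stated assumptions: the extra area of a tile is its cell area times (ratio − 1), `budget` is a hypothetical parameter standing for the allowed inflated area (e.g., the ≤ 10% limit above), and the division is repeated until the total fits. Ratios that drop to 1 or below mean "no inflation". Each pass brings at least one tile down to exactly 1, so the loop terminates.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Extra area created by inflation: sum of cellArea[t] * (ratio[t] - 1).
double extraArea(const std::vector<double>& ratio, const std::vector<double>& cellArea) {
  double extra = 0;
  for (size_t t = 0; t < ratio.size(); ++t)
    if (ratio[t] > 1) extra += cellArea[t] * (ratio[t] - 1);
  return extra;
}

// While over budget, divide every ratio by the smallest ratio still > 1.
void adjustRatios(std::vector<double>& ratio, const std::vector<double>& cellArea,
                  double budget) {
  assert(ratio.size() == cellArea.size() && budget >= 0);
  while (extraArea(ratio, cellArea) > budget) {
    double minAboveOne = std::numeric_limits<double>::infinity();
    for (double r : ratio)
      if (r > 1) minAboveOne = std::min(minAboveOne, r);
    for (double& r : ratio) r = std::max(1.0, r / minAboveOne);  // clamp to 1
  }
}
```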

IEEE Trans. on CAD paper: routability optimization flow



Uses the detailed placer from NTUplace3

T.-C. Chen, Z.-W. Jiang, T.-C. Hsu, H.-C. Chen and Y.-W. Chang, "NTUplace3: An Analytical Placer for Large-Scale Mixed-Size Designs with Preplaced Blocks and Density Constraints", *IEEE Trans. on CAD* 27(7) (2008), pp. 1228-1240.



Global routing overflow (SUPERBLUE12) during the routability-driven global placement procedure



- Global placement animation with our routability optimization
  - Inflated cell sizes are kept until the end of global placement
  - The red-cell zone keeps growing as routability optimization proceeds







SUPERBLUE18 (ICCAD-2012)
The initial RC value = 135.23
The final RC value = 102.10



# RePlAce (2019) Summary

- Advancing solution quality and routability validation in GPL
  - Local density function
    - Local smoothing comprehending local over-demanded bins
  - Improved dynamic step size adaptation
  - Routability optimization
    - Simple but effective metal layer-aware superlinear cell inflation technique
- Superior solution quality for a range of problem types
  - Standard cell placement: average <u>HPWL reduction of 2.00%</u> over best previous ISPD benchmark results
  - Mixed-size placement: average <u>HPWL reduction of 2.73%</u> over the best previous MMS benchmark results
  - Routability-driven placement: average <u>8.50% to 9.59% scaled</u> <u>HPWL reduction</u> over previous leading academic placers for DAC-2012 and ICCAD-2012 benchmark suites

#### **Notes**

- Formulation is here (calculate demand / capacity) and here
- Dynamic step size updating is here
- Entry point for routability-driven is here
- Calculation of cell inflation for routability-driven placement is here and here



#### **Global Placement Zoom-In**






#### (Legalized) Detailed Placement Zoom-In





#### Region constraints

 Cells assigned to a region have to be placed inside the boundary of the region.



Zoomed area containing a region constraint



Thanks: Nima Darav, AMD

# Fence region constraints

 Cells assigned to a fence region have to be placed inside the boundary of the region while other cells need to be placed outside.
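Both constraint types reduce to rectangle tests. A C++ sketch, assuming a region is a union of axis-aligned rectangles and that a member cell must lie inside one of them (a cell straddling two abutting rectangles is conservatively rejected); `Rect` and `satisfiesRegion` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct Rect { double xl, yl, xh, yh; };

// True if cell rectangle c lies entirely inside r.
bool inside(const Rect& c, const Rect& r) {
  return c.xl >= r.xl && c.yl >= r.yl && c.xh <= r.xh && c.yh <= r.yh;
}

// True if c and r share positive area.
bool overlaps(const Rect& c, const Rect& r) {
  return c.xl < r.xh && r.xl < c.xh && c.yl < r.yh && r.yl < c.yh;
}

// Region constraint: members must be inside. Fence constraint: additionally,
// non-members must stay out of every rectangle of the fence.
bool satisfiesRegion(const Rect& cell, bool isMember,
                     const std::vector<Rect>& region, bool isFence) {
  assert(!region.empty());
  if (isMember) {
    for (const Rect& r : region)
      if (inside(cell, r)) return true;
    return false;
  }
  if (!isFence) return true;  // a plain region does not restrict other cells
  for (const Rect& r : region)
    if (overlaps(cell, r)) return false;
  return true;
}
```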






# **Edge spacing constraints**

Cells placed too close together are prone to pin-access and short problems



Two cells are too close to each other
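A C++ sketch of an edge-spacing check, assuming the rule is given as a table of minimum gaps indexed by the edge types of two neighboring cells. `RowCell` and `checkEdgeSpacing` are hypothetical names. Only adjacent cells in a row are checked, which suffices when every required spacing is smaller than a cell width.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Each cell side carries an edge type (hypothetical small integers).
struct RowCell { double x, w; int leftEdge, rightEdge; };

// spacing[a][b] = minimum gap between a right edge of type a and the next
// cell's left edge of type b. Sorts the row by x, then checks neighbors.
bool checkEdgeSpacing(std::vector<RowCell>& row,
                      const std::vector<std::vector<double>>& spacing) {
  std::sort(row.begin(), row.end(),
            [](const RowCell& a, const RowCell& b) { return a.x < b.x; });
  for (size_t k = 1; k < row.size(); ++k) {
    const RowCell& l = row[k - 1];
    const RowCell& r = row[k];
    assert(l.rightEdge < (int)spacing.size() &&
           r.leftEdge < (int)spacing[l.rightEdge].size());
    double gap = r.x - (l.x + l.w);
    if (gap < spacing[l.rightEdge][r.leftEdge]) return false;
  }
  return true;
}
```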


# **DPL**




# Legalization and detailed placement

- Global placement must be legalized
  - Cell locations typically do not align with power rails
  - Small cell overlaps due to incremental changes, such as cell resizing or buffer insertion
- Legalization seeks to find legal, non-overlapping placements for all placeable modules
- A legalized placement can be further improved by detailed placement techniques, such as
  - Swapping neighboring cells to reduce wirelength
  - Sliding cells to unused space
- Software implementations of legalization and detailed placement are often bundled (dpl)

Thanks: Sang-Gi Do, Samsung
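A minimal Tetris-style greedy legalizer, shown only to make the idea concrete; it is not dpl's algorithm. Assumptions: single-row-height cells, uniform rows starting at y = 0, cell widths that are multiples of the site width, and Manhattan displacement as the cost. `Cell` and `legalize` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct Cell { double x, y, w; };  // single-row-height standard cell

// Cells are processed left to right; each goes to the row whose next free
// position (snapped to the site grid) gives the smallest displacement.
// Returns false if some cell does not fit in any row.
bool legalize(std::vector<Cell>& cells, int numRows, double rowH, double rowW,
              double siteW) {
  assert(numRows > 0 && rowH > 0 && siteW > 0);
  std::vector<double> frontier(numRows, 0.0);  // next free x in each row
  std::vector<size_t> order(cells.size());
  for (size_t i = 0; i < order.size(); ++i) order[i] = i;
  std::sort(order.begin(), order.end(),
            [&](size_t a, size_t b) { return cells[a].x < cells[b].x; });
  for (size_t i : order) {
    Cell& c = cells[i];
    double bestCost = std::numeric_limits<double>::infinity(), bestX = 0;
    int bestRow = -1;
    for (int r = 0; r < numRows; ++r) {
      double x = std::max(frontier[r], std::round(c.x / siteW) * siteW);
      if (x + c.w > rowW) continue;  // row is full
      double cost = std::fabs(x - c.x) + std::fabs(r * rowH - c.y);
      if (cost < bestCost) { bestCost = cost; bestRow = r; bestX = x; }
    }
    if (bestRow < 0) return false;
    c.x = bestX;
    c.y = bestRow * rowH;
    frontier[bestRow] = bestX + c.w;
  }
  return true;
}
```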

#### Mixed-height standard cell legalization

Legalization: seek legal placement + remove overlaps



- Mixed-height standard cells: standard cells whose height spans one or more rows (single/double/triple/quadruple)
- Mixed-height standard cell legalization is challenging
  - Potential overlaps in vertical (+ horizontal) directions
  - Horizontal movement of a multi-height cell disrupts placement across all rows it overlaps
- Presence of fence regions aggravates the problem




#### An example of power rails violation



- Red bars: power (VDD); blue bars: ground (VSS)
- A1, A2: single height standard cells; B1, B2: double height standard cells; C1, C2: triple height standard cells
- B2 and B3 violate power rails alignment



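The alignment rule can be sketched as a parity check. Assumptions (a hypothetical model, not from the slide): row r has VSS at its bottom when r is even and VDD when r is odd, and cells may be flipped vertically. An odd-height cell can then match either rail by flipping, while an even-height cell has the same rail type at top and bottom, so flipping does not help and its bottom row must match.

```cpp
#include <cassert>

enum class Rail { VSS, VDD };

// Assumption: rails alternate, with VSS at the bottom of even rows.
Rail bottomRailOfRow(int r) { return (r % 2 == 0) ? Rail::VSS : Rail::VDD; }

bool railAligned(int heightInRows, Rail cellBottomRail, int bottomRow) {
  assert(heightInRows >= 1 && bottomRow >= 0);
  if (heightInRows % 2 == 1) return true;  // flip to match either rail
  return bottomRailOfRow(bottomRow) == cellBottomRail;
}
```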

# Overall framework in dpl

Fence region-aware (FRA) BFS algorithm (paper)
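As a generic illustration of breadth-first search for legalization (not the FRA-BFS algorithm of the cited paper), the sketch below searches outward from a cell's initial site for the nearest free site inside the cell's own fence. It assumes single-site, single-row cells and a hypothetical grid representation (`occupied`, `fenceId`).

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Returns the nearest (in 4-neighbor hops) free site whose fence id equals
// cellFence, or {-1, -1} if none exists.
std::pair<int, int> nearestFreeSite(const std::vector<std::vector<bool>>& occupied,
                                    const std::vector<std::vector<int>>& fenceId,
                                    int row0, int col0, int cellFence) {
  const int rows = static_cast<int>(occupied.size());
  assert(rows > 0 && row0 >= 0 && row0 < rows);
  const int cols = static_cast<int>(occupied[0].size());
  assert(col0 >= 0 && col0 < cols);
  const int dr[4] = {0, 0, 1, -1};
  const int dc[4] = {1, -1, 0, 0};
  std::vector<std::vector<bool>> seen(rows, std::vector<bool>(cols, false));
  std::queue<std::pair<int, int>> frontier;
  frontier.push({row0, col0});
  seen[row0][col0] = true;
  while (!frontier.empty()) {
    auto [r, c] = frontier.front();
    frontier.pop();
    if (!occupied[r][c] && fenceId[r][c] == cellFence) return {r, c};
    for (int k = 0; k < 4; ++k) {
      int nr = r + dr[k], nc = c + dc[k];
      if (nr < 0 || nr >= rows || nc < 0 || nc >= cols || seen[nr][nc]) continue;
      seen[nr][nc] = true;
      frontier.push({nr, nc});
    }
  }
  return {-1, -1};
}
```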



#### Pre-legalization – an example

 Grey cells outside fence region are pushed into the fence region



**Before pre-legalization** 

After pre-legalization



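Pushing a cell into its fence can be sketched as clamping its lower-left corner, which gives the nearest fully-inside position for a rectangular fence. This assumes a single-rectangle fence that is larger than the cell; `Box` and `pushIntoFence` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

struct Box { double xl, yl, xh, yh; };

// Moves the lower-left corner (x, y) of a w x h cell to the nearest position
// at which the whole cell lies inside fence f.
void pushIntoFence(double& x, double& y, double w, double h, const Box& f) {
  assert(w <= f.xh - f.xl && h <= f.yh - f.yl);
  x = std::clamp(x, f.xl, f.xh - w);
  y = std::clamp(y, f.yl, f.yh - h);
}
```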

#### Corner weight move – an example

Push standard cells towards the corners of the fence region



**Before corner weight moves** 

After corner weight moves




# Local shifting – an example

Set a region around the violating cell and rearrange the peripheral cells to resolve the violation



# Cell swaps – an example



Benchmark: des\_perf\_b\_md2. Cell count: 112,644.

Max cell height: 4 rows. Fence utilization: 96.2%




# **DPO**




# DPO in a nutshell (1/2)

- DPO runs a detailed placement "script" that can perform a sequence of different optimization algorithms (see Optdp.cpp).
- Implemented algorithms (variations of algorithms described in the literature, which exist in many incarnations): maximum independent set matching [1], optimal reordering [2], median improvement (global and vertical swapping) [3], greedy improvement [4]
- Sample script string:

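```
// dtParams and mgr are set up elsewhere in DPO (see Optdp.cpp).
// Each ';'-separated command names the algorithm to run (e.g., mis = maximum
// independent set matching, gs = global swap, vs = vertical swap,
// ro = optimal reordering), followed by its algorithm-specific parameters.
dtParams.script_ = "mis -p 10 -t 0.005; gs -p 10 -t 0.005; vs -p 10 -t 0.005; "
                   "ro -p 10 -t 0.005; default -p 5 -f 20 -gen rng -obj hpwl -cost (hpwl)";

Dpo::Detailed dt(dtParams);
dt.improve(mgr);  // runs the script
```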
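The `ro` command refers to optimal reordering [2]. A brute-force C++ sketch of the idea, under simplifying assumptions (a 1-D model where only x changes, pins at cell centers, window cells packed left to right from the window's left edge); the names are hypothetical, and practical implementations prune the search instead of enumerating all k! orders.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical net: movable cell ids plus fixed pin x-coordinates.
struct Net { std::vector<int> cells; std::vector<double> fixedX; };

// Sum of net x-spans, with each movable pin at its cell's center.
double hpwlX(const std::vector<Net>& nets, const std::vector<double>& x,
             const std::vector<double>& w) {
  double total = 0;
  for (const Net& n : nets) {
    double lo = std::numeric_limits<double>::infinity(), hi = -lo;
    for (int c : n.cells) {
      lo = std::min(lo, x[c] + w[c] / 2);
      hi = std::max(hi, x[c] + w[c] / 2);
    }
    for (double fx : n.fixedX) { lo = std::min(lo, fx); hi = std::max(hi, fx); }
    if (hi > lo) total += hi - lo;
  }
  return total;
}

// Tries every order of the window cells (given left to right) and keeps the
// arrangement with minimum HPWL; the original placement is kept on ties.
void reorderWindow(std::vector<int> window, const std::vector<Net>& nets,
                   std::vector<double>& x, const std::vector<double>& w) {
  assert(!window.empty() && window.size() <= 8);  // O(k!) enumeration
  double left = x[window.front()];
  std::vector<double> bestX = x;
  double best = hpwlX(nets, x, w);
  std::sort(window.begin(), window.end());  // start of next_permutation cycle
  do {
    double cur = left;
    for (int c : window) { x[c] = cur; cur += w[c]; }  // pack left to right
    double cost = hpwlX(nets, x, w);
    if (cost < best) { best = cost; bestX = x; }
  } while (std::next_permutation(window.begin(), window.end()));
  x = bestX;
}
```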



# DPO in a nutshell (2/2)

- Most algorithms optimize HPWL or displacement; other objectives (timing, congestion, etc.) are not well accounted for, which is a "consequence" of the algorithms.
- The greedy algorithm is very flexible: it uses "move generators" that can propose any set of cell movements (cells assigned to new locations), which are then evaluated by "cost objects" that together form the cost function.
  - Proposed moves are accepted or rejected based on cost; move generators and cost objects track the current state of the placement (for incremental computation).
  - When adding a new generator or cost object, be careful to support all required pieces; e.g., routines need accept() and reject() methods.
- DPO generally does not handle multi-height cells well
  - E.g., greedy algorithm can only do simple swaps and moves and is not good at "shifting other cells out of the way".
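A compact C++ sketch of the move-generator / cost-object pattern described above. The class names (`MoveGenerator`, `CostObject`, `RandomSwap`, `TwoPinHpwl`) are hypothetical and do not reproduce DPO's actual interfaces; cells are assumed to have equal widths so that swapping two cells' positions never creates overlap.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <utility>
#include <vector>

struct Move { int a, b; };  // swap the x-positions of cells a and b

class MoveGenerator {
 public:
  virtual ~MoveGenerator() = default;
  virtual Move propose(const std::vector<double>& x) = 0;
};

class CostObject {
 public:
  virtual ~CostObject() = default;
  // Cost change if the move were applied (negative = improvement).
  virtual double delta(const std::vector<double>& x, const Move& m) const = 0;
  virtual void accept(const Move&) {}  // hooks for incremental state
  virtual void reject(const Move&) {}
};

// Proposes swaps of two uniformly random cells.
class RandomSwap : public MoveGenerator {
 public:
  explicit RandomSwap(unsigned seed) : rng_(seed) {}
  Move propose(const std::vector<double>& x) override {
    assert(!x.empty());
    std::uniform_int_distribution<int> pick(0, static_cast<int>(x.size()) - 1);
    return {pick(rng_), pick(rng_)};
  }
 private:
  std::mt19937 rng_;
};

// HPWL of 2-pin nets; delta() re-evaluates only nets touching the moved cells.
class TwoPinHpwl : public CostObject {
 public:
  explicit TwoPinHpwl(std::vector<std::pair<int, int>> nets) : nets_(std::move(nets)) {}
  double total(const std::vector<double>& x) const {
    double t = 0;
    for (auto [u, v] : nets_) t += std::fabs(x[u] - x[v]);
    return t;
  }
  double delta(const std::vector<double>& x, const Move& m) const override {
    std::vector<double> y = x;  // full copy keeps the sketch simple
    std::swap(y[m.a], y[m.b]);
    double d = 0;
    for (auto [u, v] : nets_)
      if (u == m.a || u == m.b || v == m.a || v == m.b)
        d += std::fabs(y[u] - y[v]) - std::fabs(x[u] - x[v]);
    return d;
  }
 private:
  std::vector<std::pair<int, int>> nets_;
};

// Greedy improvement: apply a proposed move only if it lowers the cost.
void greedy(std::vector<double>& x, MoveGenerator& gen, CostObject& cost, int tries) {
  for (int t = 0; t < tries; ++t) {
    Move m = gen.propose(x);
    if (m.a != m.b && cost.delta(x, m) < 0) {
      std::swap(x[m.a], x[m.b]);
      cost.accept(m);
    } else {
      cost.reject(m);
    }
  }
}
```

Because only swaps are generated, this greedy loop cannot "shift other cells out of the way", which mirrors the limitation noted above.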



#### Some References

- 1. K. Doll, F. M. Johannes and K. J. Antreich, "Iterative placement improvement by network flow methods", *IEEE Trans. on CAD* 13(10) (1994), pp. 1189-1200.
- 2. A. E. Caldwell, A. B. Kahng and I. L. Markov, "Optimal partitioners and end-case placers for standard-cell layout", *Proc. ACM Intl. Symp. on Physical Design*, April 1999, pp. 90-96.
- 3. M. Pan, N. Viswanathan and C. Chu, "An efficient and effective detailed placement algorithm", *Proc. ICCAD*, 2005, pp. 48-55.
- 4. A. A. Kennings, N. K. Darav and L. Behjat, "Detailed placement accounting for technology constraints", *Proc. VLSI-SoC*, 2014, pp. 1-6.

# **BACKUP**

