# Unified Logical Effort - A Method for Delay Evaluation and Minimization in Logic Paths with *RC* Interconnect

Arkadiy Morgenshtein, Eby G. Friedman, *Fellow, IEEE*, Ran Ginosar, *Senior Member, IEEE*, and Avinoam Kolodny, *Member, IEEE* 

Abstract—The Unified Logical Effort (ULE) model for delay evaluation and minimization in paths composed of CMOS logic gates and resistive wires is presented. The method provides conditions for timing optimization while overcoming the limitations of standard logical effort (LE) in the presence of interconnect. The condition for optimal gate sizing in a logic path with long wires is also presented. This condition is achieved when the delay component due to the gate input capacitance is equal to the delay component due to the effective output resistance of the gate. The ULE delay model unifies the problems of gate sizing and repeater insertion: In the case of negligible interconnect, the ULE method converges to standard LE optimization, yielding tapered gate sizes. In the case of long wires, the solution converges towards uniform sizing of gates and repeaters. The technique is applied to various types of logic paths to demonstrate the influence of wire length, gate type, and technology.

Index Terms—Interconnect, logical effort, delay minimization, power.

## I. INTRODUCTION

**T**IMING modeling and optimization are fundamental tasks in digital circuit design. The method of logical effort (LE) was first proposed by Sutherland *et al.* [1],[2] for fast evaluation and optimization of delay in CMOS logic paths (see Fig. 1a). Owing to the simplicity and elegance of the model, the technique has since been adopted as the basis of several CAD tools. The optimization rule of logical effort, however, only addresses logic gates and does not consider on-chip wires. As VLSI circuits continue to scale, the contribution of wires to the delay increases and can no longer be neglected. The useful LE rule that path delay is minimized when all stages bear equal effort breaks down, because interconnect has fixed capacitances that do not correlate with the characteristics of the gates (see Fig. 1b). The same issue arises when arbitrary fanouts and fixed branch loads are present in the circuit structure. The authors of the LE method describe this behavior as "one of the most dissatisfying limitations of logical effort" [3].

Manuscript received July 7, 2006.

Arkadiy Morgenshtein was with the Electrical Engineering Department, Technion – Israel Institute of Technology, Haifa, Israel. He is now with the Core CAD Technologies Group, Intel Corporation, Haifa, Israel (e-mail: morgenshtein@gmail.com).

Eby G. Friedman is with the Department of Electrical and Computer Engineering at the University of Rochester, Rochester, NY 14627 USA. He is also a Visiting Professor at the Technion – Israel Institute of Technology, Haifa, Israel (e-mail: friedman@ece.rochester.edu).

Ran Ginosar and Avinoam Kolodny are with the Electrical Engineering Department, Technion – Israel Institute of Technology, Haifa, Israel (e-mail: ran@ee.technion.ac.il, kolodny@ee.technion.ac.il).



Fig. 1. Cascaded strings of logic gates. (a) Logical effort optimization for gates without wires is based on equal stage efforts, e.g., $g_1 h_1 = g_2 h_2$. (b) In the case of gates with wires, the rule of equal effort breaks down due to fixed wire parameters.

The objective of this paper is to develop a simple method for minimizing delay in logic paths containing both gates and interconnect, including any fanout loads. Currently, timing optimization is typically treated separately in two scenarios: (a) logic gates without wires (using the standard LE method), and (b) long wires without logic (using repeater insertion [5]). We introduce the *Unified Logical Effort* (ULE) method for delay evaluation and optimization of logic paths with general logic gates and *RC* wires. ULE treats a broad scope of design problems with a single analytic model, combining logic and interconnect delay optimization.

The paper is organized as follows. Related work is surveyed and discussed in Section II. The Unified Logical Effort model is developed in Section III. Timing optimization based on the ULE model for paths with resistive and capacitive wires is presented in Section IV. Section IV also describes a condition for optimal gate sizing in logic paths with wires, which provides an intuitive view of the problem: the delay component due to the gate capacitance is equal to the delay component due to the effective resistance of the gate. Examples of ULE optimization are presented in Section V, where convergence of the model to existing optimization techniques is shown for specific cases. Gate sizing by ULE for long wires is analyzed in Section VI. In Section VII, simulation results for benchmark circuits compare ULE optimization with the results of an industrial CAD tool optimizer. The extension of ULE to paths with side branches and multiple fanout is presented in Section VIII. Finally, the paper is summarized and topics for future research are outlined in Section IX.

## II. RELATED WORK

Several studies have increased the accuracy of the logical effort model by considering I/O coupling and ramp input effects [9], as well as internodal charge and deep submicrometer effects [10]. While these studies improve the accuracy of the LE method for logic gate delays, they do not address interconnect. In [11], the LE model is extended to relate transistor size to the speed and energy consumption of circuits, without considering the RC wires between the gates. An optimization methodology using logical effort is proposed in [12] for logic blocks driving interconnect with uniform and non-uniform repeaters. This work, however, does not address gate sizing in the presence of interconnect between the logic gates.

Traditional timing optimization procedures have been developed assuming capacitive interconnect [13],[14],[15], focusing on optimally tapered buffers. In [16],[17], the wire capacitance between the gates is assumed to be correlated with the gate size, resulting in a fixed tapering factor similar to the logical effort model. In [15], local interconnect capacitances are considered to be independent of the gate size, and the optimization process is based on constant capacitance-to-current ratio tapering. To accurately consider resistive interconnect, post-routing design steps have been added, involving wire segmentation and repeater insertion [5],[6],[7],[8],[12]. These optimization techniques include equal sizing and spacing of the repeaters [5], as well as tapering of the repeater sizes and wire segments [12]. Most of these techniques for timing optimization of interconnect have been developed independently of the logical effort model, focusing on inverters as repeaters (or buffers) driving long wires rather than on general logic paths with wire segments.

The logical effort delay expression has been combined with the Elmore delay model [21] in [18], [19], and [24]. The combined model is used in [18], [19] for optimal wire segmentation with general logic gates rather than repeaters. These publications, however, do not consider optimal gate sizing. The authors of [24] use the combined delay model to derive the optimal number and size of equally spaced uniform buffers for insertion into long wires. None of these previous publications provides a general method for sizing logic gates for circuit speed in the presence of interconnect. This paper addresses that topic, covering logic circuits with both capacitive and resistive interconnect segments, including arbitrary branch fanout.

## III. A DELAY MODEL OF LOGIC GATES WITH WIRES

The logical effort model is modified here to include the interconnect delay. This change is achieved by adding the wire delay to the logical effort gate delay, establishing a *Unified Logical Effort* (ULE) model.

A circuit comprising logic gates and wires is shown in Fig. 2. The interconnect is represented by a  $\pi$ -model. Following [20], the Elmore delay model [21] is used to describe the wire delay. The total combined delay expression is

$$D_{i} = R_{i} \cdot \left(C_{p_{i}} + C_{w_{i}} + C_{i+1}\right) + R_{w_{i}} \cdot \left(0.5 \cdot C_{w_{i}} + C_{i+1}\right), \tag{1}$$

where  $R_i$  is the effective output resistance of the gate *i*,  $C_{p_i}$  is the parasitic output capacitance of gate *i*,  $C_{w_i}$  and  $R_{w_i}$  are, respectively, the wire capacitance and resistance of segment *i*, and  $C_{i+1}$  is the input capacitance of gate i+1.



Fig. 2. Cascaded logic gates with resistive-capacitive interconnect.

This expression is rewritten, as in [18],[19],[24], by introducing the delay of a minimum-size inverter as a technology constant $\tau = R_0 \cdot C_0$, where $R_0$ and $C_0$ are the output resistance and input capacitance of a minimum-size inverter, respectively:

$$D_{i} = \tau \cdot d_{i} = \tau \cdot \left[ \frac{R_{i}}{R_{0}} \cdot \frac{C_{w_{i}} + C_{i+1} + C_{p_{i}}}{C_{0}} + \frac{R_{w_{i}}}{R_{0} \cdot C_{0}} \cdot \left( 0.5 \cdot C_{w_{i}} + C_{i+1} \right) \right]. \tag{2}$$

The stage delay, normalized with respect to a minimum inverter delay  $\tau$  , is expressed in logical effort (LE) terms,

$$d_{i} = g_{i} \cdot \left(h_{i} + \frac{C_{w_{i}}}{C_{i}}\right) + \frac{R_{w_{i}} \cdot \left(0.5 \cdot C_{w_{i}} + C_{i+1}\right)}{\tau} + p_{i}, \qquad (3)$$

where  $g_i = (R_i \cdot C_i)/(R_0 \cdot C_0)$  is the logical effort related to the gate topology,  $h_i = C_{i+1}/C_i$  is the electrical effort describing the drive capability, and  $p_i = (R_i \cdot C_{p_i})/(R_0 \cdot C_0)$  is the delay factor of the parasitic impedance. The capacitance and resistance of the gate are related to the scaling factor  $x_i$  as  $C_i = C_0 \cdot g_i \cdot x_i$ , and  $R_i = R_0/x_i$ , respectively. The capacitive interconnect effort  $h_w$  and resistive interconnect effort  $p_w$  are, respectively,

$$h_{w_{i}} = \frac{C_{w_{i}}}{C_{i}} , \qquad (4)$$

$$p_{w_{i}} = \frac{R_{w_{i}} \cdot \left(0.5 \cdot C_{w_{i}} + C_{i+1}\right)}{\tau} . \qquad (5)$$

As shown in (4), $h_w$ expresses the influence of the wire capacitance on the electrical effort of the gate. The component $p_w$ in (5) is the delay of the loaded wire normalized to the minimum inverter delay $\tau$. Its unloaded part, $R_w \cdot 0.5 \cdot C_w / \tau$, is technology specific and does not depend on the gate sizes.

The final expression of the ULE delay for a single stage is

$$d = g \cdot (h + h_w) + (p + p_w). \tag{6}$$

The ULE delay expression for an N stage logic path with RC wires is

$$d = \sum_{i=1}^{N} \left[ g_{i} \cdot \left(h_{i} + h_{w_{i}}\right) + \left(p_{i} + p_{w_{i}}\right) \right]. \tag{7}$$
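As a numerical illustration of (1)-(7), the Python sketch below evaluates the normalized ULE delay of a path and checks it against the unnormalized Elmore sum of (1), confirming that (2) and (3) describe the same quantity. It is a minimal sketch under stated assumptions: the `Stage` container and the function names are hypothetical; each gate drives one wire followed by the next gate, as in Fig. 2; the parasitic capacitance is taken to scale with gate size, $C_{p_i} = p_i \cdot C_0 \cdot x_i$ (which follows from the definition of $p_i$); and the example values (NAND2 gates with the textbook LE values $g = 4/3$ and $p = 2$, 100 µm of intermediate wire per stage) are illustrative choices rather than values from this paper.

```python
import math
from dataclasses import dataclass

@dataclass
class Stage:
    g: float    # logical effort g_i
    p: float    # parasitic delay p_i, normalized to tau
    x: float    # sizing factor: C_i = C0*g*x and R_i = R0/x
    R_w: float  # resistance of the wire driven by this gate [ohm]
    C_w: float  # capacitance of the wire driven by this gate [F]

def next_cap(stages, k, C_load, C0):
    """Capacitance C_{i+1} at the far end of wire k (next gate input or final load)."""
    if k + 1 < len(stages):
        nxt = stages[k + 1]
        return C0 * nxt.g * nxt.x
    return C_load

def ule_path_delay(stages, C_load, R0, C0):
    """Normalized ULE path delay d of eq. (7), in units of tau = R0*C0."""
    tau = R0 * C0
    d = 0.0
    for k, s in enumerate(stages):
        C_i = C0 * s.g * s.x
        C_next = next_cap(stages, k, C_load, C0)
        h = C_next / C_i                              # electrical effort h_i
        h_w = s.C_w / C_i                             # eq. (4)
        p_w = s.R_w * (0.5 * s.C_w + C_next) / tau    # eq. (5)
        d += s.g * (h + h_w) + (s.p + p_w)            # eq. (6)
    return d

def elmore_path_delay(stages, C_load, R0, C0):
    """Absolute path delay in seconds: eq. (1) summed over all stages."""
    D = 0.0
    for k, s in enumerate(stages):
        R_i = R0 / s.x
        C_p = s.p * C0 * s.x                          # from p_i = R_i*C_p/(R0*C0)
        C_next = next_cap(stages, k, C_load, C0)
        D += R_i * (C_p + s.C_w + C_next) + s.R_w * (0.5 * s.C_w + C_next)
    return D

R0, C0 = 8800.0, 0.74e-15                  # 65 nm values from the Fig. 5 caption
wire = dict(R_w=100.0, C_w=15e-15)         # 100 um of intermediate wire
path = [Stage(g=4 / 3, p=2.0, x=2.0, **wire), Stage(g=4 / 3, p=2.0, x=6.0, **wire)]
d = ule_path_delay(path, C_load=20 * C0, R0=R0, C0=C0)
assert math.isclose(R0 * C0 * d, elmore_path_delay(path, 20 * C0, R0, C0))  # (2) equals tau*(3)
print(f"d = {d:.2f}, i.e. {R0 * C0 * d * 1e12:.2f} ps")
```

Setting all `R_w` and `C_w` to zero makes `p_w` and `h_w` vanish, so the same function returns the standard LE path delay.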

Note that in the case of short wires, the resistance  $R_w$  of the wire may be neglected, eliminating  $p_w$  and leaving only the capacitive interconnect effort  $h_w$  in the expression. When the wire impedance along the logic path is negligible, the extended delay expression reduces to the standard LE delay equation.

## IV. DELAY MINIMIZATION USING UNIFIED LOGICAL EFFORT

As a first step in the path delay optimization process, consider a two-stage portion of a logic path with wires (as shown in Fig. 2). The ULE expression of the total delay is

$$d = g_i \cdot (h_i + h_{w_i}) + (p_i + p_{w_i}) + g_{i+1} \cdot (h_{i+1} + h_{w_{i+1}}) + (p_{i+1} + p_{w_{i+1}}). \tag{8}$$

Substituting  $C_{i+1} = h_i \cdot C_i$  into (8) in the presence of resistive interconnect, the delay can be expressed in terms of  $h_i$  as

$$d = g_{i} \cdot \left(h_{i} + \frac{C_{w_{i}}}{C_{i}}\right) + p_{i} + \frac{R_{w_{i}} \cdot \left(0.5 \cdot C_{w_{i}} + h_{i} \cdot C_{i}\right)}{R_{0} \cdot C_{0}} + g_{i+1} \cdot \left(\frac{C_{i+2} + C_{w_{i+1}}}{h_{i} \cdot C_{i}}\right) + p_{i+1} + p_{w_{i+1}}. \tag{9}$$

The condition for optimal gate sizing is determined by equating the derivative of the delay with respect to the gate size to zero (see [4] for derivation details),

$$\left(g_i + \frac{R_{w_i} \cdot C_i}{R_0 \cdot C_0}\right) \cdot h_i = g_{i+1} \cdot \left(h_{i+1} + h_{w_{i+1}}\right). \tag{10}$$
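The differentiation step can be written out as follows. This is a sketch of one route to (10), assuming, as in (9), that $C_i$ and all wire parameters are held fixed, so that $h_i$ is the only free variable and the terms $p_i$, $p_{i+1}$, $p_{w_{i+1}}$, and $g_i \cdot C_{w_i}/C_i$ are constants; the full derivation is given in [4].

$$\frac{\partial d}{\partial h_i} = g_i + \frac{R_{w_i} \cdot C_i}{R_0 \cdot C_0} - g_{i+1} \cdot \frac{C_{i+2} + C_{w_{i+1}}}{h_i^2 \cdot C_i} = 0.$$

Multiplying by $h_i$ and noting that $h_i \cdot C_i = C_{i+1}$, so that $(C_{i+2} + C_{w_{i+1}})/(h_i \cdot C_i) = h_{i+1} + h_{w_{i+1}}$, yields (10). The second derivative, $2 \cdot g_{i+1} \cdot (C_{i+2} + C_{w_{i+1}})/(h_i^3 \cdot C_i)$, is positive, so this stationary point is a minimum.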

For a logic path without wires  $(h_{w_i} = 0, R_{w_i} = 0)$ , the optimum condition of ULE (10) converges to the optimum condition of LE [1]:  $g_i \cdot h_i = g_{i+1} \cdot h_{i+1}$ .

To provide an intuitive interpretation, (10) can be rewritten by multiplying both sides by $R_0 \cdot C_0$ and using the relationships $h_i = C_{i+1}/C_i$, $C_i = C_0 \cdot g_i \cdot x_i$, and $R_i = R_0/x_i$. The resulting optimum condition is

$$\left(R_{i}+R_{w_{i}}\right)\cdot C_{i+1}=R_{i+1}\cdot\left(C_{i+2}+C_{w_{i+1}}\right). \tag{11}$$

The meaning of (11) is that the optimum size of gate *i*+1 is achieved when the delay component $(R_i + R_{w_i}) \cdot C_{i+1}$ due to the gate capacitance is equal to the delay component $R_{i+1} \cdot (C_{i+2} + C_{w_{i+1}})$ due to the effective resistance of the gate. Note that the wire parameters $R_w$ and $C_w$ are held fixed when deriving this intuition for gate sizing.

A schematic model describing the related delay components is shown in Fig. 3. Note that the other delay components  $(R_i \cdot C_{w_i}, 0.5 \cdot R_{w_i} \cdot C_{w_i}, R_{w_{i+1}} \cdot (0.5 \cdot C_{w_{i+1}} + C_{i+2}))$  are independent of the size of gate i+1 and do not influence the optimum size. Also note that in the presence of wires, the condition for minimum path delay does not correspond to equal delay or equal effort at every stage along the path.
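Condition (11) can be checked numerically. The Python sketch below minimizes the sum of (1) over stages *i* and *i*+1 with a golden-section search over $C_{i+1}$ and compares the result with (11) solved for $C_{i+1}$, using $R_{i+1} = g_{i+1} \cdot \tau / C_{i+1}$: $C_{i+1}^2 = g_{i+1} \cdot \tau \cdot (C_{i+2} + C_{w_{i+1}})/(R_i + R_{w_i})$. The helper names and the stage values (a gate with $x_i = 4$ driving 1 mm of intermediate wire into a NAND2 gate with $g = 4/3$ and $p = 2$, which drives 100 µm of wire and a $10 \cdot C_0$ load) are illustrative assumptions. As in LE, the parasitic capacitance is assumed proportional to gate size, so $R_{i+1} \cdot C_{p_{i+1}}$ is a constant.

```python
import math

R0, C0 = 8800.0, 0.74e-15            # 65 nm values quoted in the Fig. 5 caption
TAU = R0 * C0

def two_stage_delay(C_next, R_i, C_pi, R_wi, C_wi, g_next, p_next, R_wn, C_wn, C_after):
    """Eq. (1) summed over stages i and i+1, as a function of C_{i+1} = C_next [s]."""
    R_next = g_next * TAU / C_next      # from g = R*C/(R0*C0)
    C_p_next = p_next * TAU / R_next    # from p = R*C_p/(R0*C0)
    D_i = R_i * (C_pi + C_wi + C_next) + R_wi * (0.5 * C_wi + C_next)
    D_n = R_next * (C_p_next + C_wn + C_after) + R_wn * (0.5 * C_wn + C_after)
    return D_i + D_n

def golden_section_min(f, lo, hi, iters=100):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# Illustrative values (assumptions): gate i has x_i = 4 and p_i = 1, so C_pi = p_i*C0*x_i.
params = dict(R_i=R0 / 4, C_pi=1.0 * C0 * 4, R_wi=1000.0, C_wi=150e-15,
              g_next=4 / 3, p_next=2.0, R_wn=100.0, C_wn=15e-15, C_after=10 * C0)

# Search over t = ln(C_{i+1}); the delay a*C + b/C + const is unimodal in t.
t_best = golden_section_min(lambda t: two_stage_delay(math.exp(t), **params),
                            math.log(C0), math.log(1e4 * C0))
C_numeric = math.exp(t_best)
C_closed = math.sqrt(params["g_next"] * TAU * (params["C_after"] + params["C_wn"])
                     / (params["R_i"] + params["R_wi"]))    # eq. (11) solved for C_{i+1}
print(f"numeric: {C_numeric / C0:.4f} C0, closed form (11): {C_closed / C0:.4f} C0")
```

Searching over the logarithm of the capacitance keeps the bracket wide (four decades here) while the objective stays unimodal.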



Fig. 3. Delay components in characterizing ULE for long wires

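The optimum condition (11) can be generalized to any gate *i* by noting that the part of the total delay that depends on the size of gate *i* is the sum of two components: the delay $D_{C_i}$ due to the input capacitance of gate *i*, and the delay $D_{R_i}$ due to the effective output resistance of gate *i*,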

$$D_{C_{i}} = \left(R_{i-1} + R_{w_{i-1}}\right) \cdot C_{i} = \left(R_{i-1} + R_{w_{i-1}}\right) \cdot C_{0} \cdot g_{i} \cdot x_{i},$$

$$D_{R_{i}} = R_{i} \cdot \left(C_{i+1} + C_{w_{i}}\right) = \frac{R_{0}}{x_{i}} \cdot \left(C_{i+1} + C_{w_{i}}\right),$$

$$D_{i} = D_{C_{i}} + D_{R_{i}} + \text{const}. \tag{12}$$

Thus, at minimum total delay, the sum of the derivatives of these delay components with respect to the sizing factor $x_i$ equals zero,

$$\frac{\partial D_{C_i}}{\partial x_i} = \left(R_{i-1} + R_{w_{i-1}}\right) \cdot C_0 \cdot g_i, \qquad \frac{\partial D_{R_i}}{\partial x_i} = -\frac{R_0}{x_i^2} \cdot \left(C_{i+1} + C_{w_i}\right), \tag{13}$$

$$\frac{\partial D_i}{\partial x_i} = \frac{\partial D_{C_i}}{\partial x_i} + \frac{\partial D_{R_i}}{\partial x_i} = 0. \tag{14}$$

The solution of (14) provides an expression for the optimal sizing factor $x_{i_{opt}}$,

$$x_{i_{opt}} = \sqrt{\frac{R_0}{\left(R_{i-1} + R_{w_{i-1}}\right)} \cdot \frac{\left(C_{i+1} + C_{w_i}\right)}{C_0 \cdot g_i}}. \tag{15}$$

When  $x_{i_{opt}}$  is substituted into the expression in (11), a general optimum condition can be determined,

$$\left(R_{i-1} + R_{w_{i-1}}\right) \cdot C_{i} = R_{i} \cdot \left(C_{i+1} + C_{w_{i}}\right) = \sqrt{\left[\left(R_{i-1} + R_{w_{i-1}}\right) \cdot C_{0} \cdot g_{i}\right] \cdot \left[R_{0} \cdot \left(C_{i+1} + C_{w_{i}}\right)\right]}. \tag{16}$$

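An intuitive interpretation of (16) is that the minimum delay is achieved when the delay component due to $C_i$ (incurred in the preceding stage) and the delay component due to $R_i$ (incurred in the following stage) of an optimally sized gate are both equal to their geometric mean. Since the product $D_{C_i} \cdot D_{R_i} = (R_{i-1} + R_{w_{i-1}}) \cdot C_0 \cdot g_i \cdot R_0 \cdot (C_{i+1} + C_{w_i})$ does not depend on $x_i$, this geometric mean is the same as that obtained if the gate (with logical effort $g_i$) is arbitrarily sized,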

$$D_{R_{i_{opt}}} = D_{C_{i_{opt}}} = GM\left[ D_{R_i}, D_{C_i} \right]. \tag{17}$$

The dependence of the delay on the sizing factor is exemplified in Fig. 4. Observe that choosing a sizing factor different from $x_{opt}$ increases the delay. The total delay $D_i$ comprises four components: the constant delays $0.5 \cdot R_{w_{i-1}}C_{w_{i-1}}$ and $0.5 \cdot R_{w_i}C_{w_i}$, and the variable delays $D_{C_i} = (R_{i-1} + R_{w_{i-1}}) \cdot C_i$ and $D_{R_i} = R_i \cdot (C_{i+1} + C_{w_i})$, which depend on the sizing factor $x_i$. The value of the sizing factor $x_{opt}$ is determined by the intersection of the three curves $D_{R_i}$, $D_{C_i}$, and $D^* = GM[D_{R_i}, D_{C_i}]$, as described in (17) and illustrated in Fig. 4.
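Relations (12)-(17) can be evaluated directly. The Python sketch below uses the Fig. 4 scenario (a NAND gate with $L_i = 100\ \mu m$, $L_{i-1} = 1\ mm$, $C_{i-1} = C_0$, and $C_{i+1} = 10 \cdot C_0$) with the device and intermediate-wire parameters from the Fig. 5 caption. The use of intermediate wires, the NAND2 value $g_i = 4/3$, and a minimum-size inverter as the preceding driver ($R_{i-1} = R_0$, consistent with $C_{i-1} = C_0$) are assumptions, so the numbers need not match the plotted curves. The sketch computes $x_{i_{opt}}$ from (15) and checks that $D_{C_i}$ and $D_{R_i}$ meet at the geometric mean of (16), which is the same for every $x_i$.

```python
import math

R0, C0 = 8800.0, 0.74e-15        # device constants from the Fig. 5 caption
r_w, c_w = 1.0, 0.15e-15         # intermediate wire per micrometer (assumed for Fig. 4)
g_i = 4 / 3                      # assumed logical effort of a 2-input NAND
R_prev = R0                      # assumed minimum-size inverter driving gate i
L_prev, L_i = 1000.0, 100.0      # wire lengths before and after gate i [um]
C_next = 10 * C0                 # input capacitance of gate i+1

R_up = R_prev + r_w * L_prev     # R_{i-1} + R_{w_{i-1}}
C_down = C_next + c_w * L_i      # C_{i+1} + C_{w_i}

def D_C(x):
    """Delay component due to the input capacitance of gate i, eq. (12)."""
    return R_up * C0 * g_i * x

def D_R(x):
    """Delay component due to the output resistance of gate i, eq. (12)."""
    return (R0 / x) * C_down

x_opt = math.sqrt(R0 * C_down / (R_up * C0 * g_i))   # eq. (15)
gm = math.sqrt((R_up * C0 * g_i) * (R0 * C_down))    # right-hand side of eq. (16)

for x in (0.5 * x_opt, x_opt, 2.0 * x_opt):
    total = D_C(x) + D_R(x)
    print(f"x = {x:6.2f}: D_C = {D_C(x):.3e} s, D_R = {D_R(x):.3e} s, sum = {total:.3e} s")
```

Because $D_{C_i}$ grows linearly and $D_{R_i}$ falls as $1/x_i$, their sum is smallest exactly where the two are equal, which is the arithmetic-geometric mean inequality applied to (12).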



Fig. 4. Dependence of delay on the sizing factor (for a NAND gate with  $L_i=100 \ \mu m$ ,  $L_{i-1}=1 \ mm$ ,  $C_{i-1}=C_0$ , and  $C_{i+1}=10C_0$ ).

The drive ability of a gate is related to the size of the gate and can be represented by a ratio of input capacitances [1]. The optimum condition in (10) can be rewritten to develop an expression for the input capacitance of each gate based on the ULE model,

$$\begin{aligned} C_{i_{opt}} &= \sqrt{\frac{g_i}{g_{i-1} + \frac{R_{w_{i-1}} \cdot C_{i-1}}{R_0 \cdot C_0}} \cdot C_{i-1} \cdot \left(C_{i+1} + C_{w_i}\right)} \\ &= \underbrace{\sqrt{C_{i-1} \cdot C_{i+1}}}_{\text{LE}} \cdot \underbrace{\sqrt{1 + \frac{C_{w_i}}{C_{i+1}}}}_{\text{wire capacitance}} \cdot \underbrace{\sqrt{\frac{g_i}{g_{i-1} + \frac{R_{w_{i-1}} \cdot C_{i-1}}{R_0 \cdot C_0}}}}_{\text{logical efforts and wire resistance}} \end{aligned} \tag{18}$$

Note that the first part of the resulting expression is similar to the condition described by the LE model for a path of identical gates. The second component expresses the influence of the interconnect capacitance. The last component is related to the resistance of the wire and to the differences among the logical efforts (types of logic gates) along the path. Expression (18) illustrates the quadratic relationship between the sizes of neighboring gates. The ULE gate sizes can be determined by solving a system of N polynomial equations for the N gates along the path. The expressions for optimal ULE sizing are extended to include fixed side branches and multiple fanout in Section VIII.

To simplify the solution, a relaxation method can be used. The technique is based on an iterative calculation along the path while applying the optimum conditions [4]. Each capacitance along the path is iteratively replaced by the capacitance obtained by applying the optimum expression (18) to its two neighboring logic gates.
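The relaxation procedure can be sketched in Python as a Gauss-Seidel style sweep that repeatedly replaces each interior capacitance by the value given by (18). This is a minimal sketch, not the implementation of [4]: the function name `relax_ule` is hypothetical, the boundary conditions (a fixed input capacitance of the first gate and a fixed load after the last wire) are assumptions about how the path is terminated, the NAND2 logical effort $g = 4/3$ is an assumed textbook value, and convergence is declared when no capacitance changes by more than a relative tolerance.

```python
import math

def relax_ule(g, C_w, R_w, C_in, C_load, R0, C0, tol=1e-12, max_iter=100_000):
    """Relaxation on eq. (18) for an N-gate path (hypothetical helper).

    g[k], C_w[k], R_w[k]: logical effort of gate k and of the wire it drives, k = 0..N-1.
    C_in: fixed input capacitance of the first gate.
    C_load: fixed load capacitance at the end of the last wire.
    Returns N + 1 capacitances [C_1 (= C_in), C_2, ..., C_N, C_load].
    """
    tau = R0 * C0
    N = len(g)
    C = [C_in] * N + [C_load]               # initial guess; C[0] and C[N] stay fixed
    for _ in range(max_iter):
        worst = 0.0
        for i in range(1, N):               # interior gates only
            denom = g[i - 1] + R_w[i - 1] * C[i - 1] / tau
            new = math.sqrt(g[i] * C[i - 1] * (C[i + 1] + C_w[i]) / denom)
            worst = max(worst, abs(new - C[i]) / C[i])
            C[i] = new                      # Gauss-Seidel: reuse the updated value at once
        if worst < tol:
            break
    return C

R0, C0 = 8800.0, 0.74e-15                  # Fig. 5 caption values
N = 9
G = [4 / 3] * N                            # assumed NAND2 logical effort
C_none = relax_ule(G, [0.0] * N, [0.0] * N, 10 * C0, 100 * C0, R0, C0)
# 10 mm of intermediate wire per stage: C_w = 0.15 fF/um * 1e4 um, R_w = 1 ohm/um * 1e4 um
C_long = relax_ule(G, [1.5e-12] * N, [1.0e4] * N, 10 * C0, 100 * C0, R0, C0)
print([round(c / C0, 1) for c in C_none])
print([round(c / C0, 1) for c in C_long])
```

With zero wire length the sweep reproduces LE tapering, a geometric progression from $10 \cdot C_0$ to $100 \cdot C_0$. With 10 mm wires the interior sizes settle near $49 \cdot C_0$: substituting $C_{i-1} = C_i = C_{i+1} = C$ into (18) gives the fixed point $C^2 = g \cdot \tau \cdot C_w / R_w$, the flat long-wire behavior discussed in Section V.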

## V. EXAMPLE LOGIC PATHS

The ULE technique is applied to two example logic paths to demonstrate the properties of gate sizing. Parameters from [22] are used for a 65 nm CMOS technology. The first example logic path, shown in Fig. 5, consists of nine identical stages. The input capacitances of the first and last gates are $10 \cdot C_0$ and $100 \cdot C_0$, respectively. The sizes of the logic gates along the path are shown in Fig. 5 for several values of the wire length *L* between stages. The solutions range between two limits (bold lines in the plot): (a) for zero wire length, the solution converges to LE optimization [1], and (b) for long wires, the gate size in the middle stages of the path converges to a fixed value, $x_{opt} \cong 50$ (the dashed line), similar to repeater insertion methods [5],[19]. The concept of equal optimal sizing $x_{opt}$ for long wires is explained in the following section.



Fig. 5. Optimization of ULE sizing (normalized with respect to $C_0$) for a chain of nine NAND gates with equal wire segments for a variety of lengths. For zero wire length, the solution converges to LE optimization. For long wires, the solution converges to a fixed size $x_{opt}$. The parameters of a 65 nm CMOS process include $R_0 = 8800\ \Omega$, $C_0 = 0.74\ fF$; intermediate wires: $r_w = 1.0\ \Omega/\mu m$, $c_w = 0.15\ fF/\mu m$; and global wires: $r_w = 0.04\ \Omega/\mu m$ and $c_w = 0.23\ fF/\mu m$.

A second example is shown in Fig. 6. The logic chain is similar to the previous case, but the input and output gate capacitances are equal to  $10 \cdot C_0$ ; hence, the total electrical effort H = 1. In this case, no gate scaling is performed by LE in the absence of wires. Note that the ULE optimization process provides a sizing solution for a variety of wire lengths: It satisfies LE optimization (no scaling) in the case of zero wire length and converges to a fixed size for long wires.



Fig. 6. Optimization of ULE sizing (normalized to $C_0$) for a chain of NAND gates with total electrical effort $H = 1$ and with equal wire segments for a variety of lengths.

## VI. ULE GATE SIZING FOR LONG WIRES

As described in the previous section, in the case of long wire segments, the gate sizing optimization process converges to the scale factor  $x_{opt}$ . This scale factor is independent of wire length in the case of equal interconnect segments. In this section, the delay model of a logic gate with long wires is investigated in terms of the optimal size.

When long wires are assumed, the impedances  $C_{w_i}$  and  $R_{w_{i-1}}$  of (18) dominate the gate impedances. A schematic model of this case is shown in Fig. 7.



Fig. 7. Delay components of optimum ULE for long wires

The scale factor of a general gate can be derived from (15) for the case of long wires,

$$x_{opt_i} \cong \sqrt{\frac{R_0 \cdot C_{w_i}}{R_{w_{i-1}} \cdot C_0 \cdot g_i}} = \underbrace{\sqrt{\frac{c_w \cdot R_0}{r_w \cdot C_0 \cdot g_i}}}_{\text{constant}} \cdot \sqrt{\frac{L_i}{L_{i-1}}}, \quad (19)$$

using the relationships $C_{w_i} = c_w \cdot L_i$ and $R_{w_i} = r_w \cdot L_i$, where $r_w$ and $c_w$ are the resistance and capacitance of the wire per unit length, and $L_{i-1}$ and $L_i$ are the lengths of the wires before and after the logic gate $g_i$, respectively. Note that, for a given gate type and technology, the scale factor of the gate in the case of long wires depends only upon the ratio of the lengths of the adjacent wires.

A general optimum condition can be derived, similar to (16)

$$R_{w_{i-1}} \cdot C_i = R_i \cdot C_{w_i} = \sqrt{\left[R_{w_{i-1}} \cdot C_0 \cdot g_i\right] \cdot \left[R_0 \cdot C_{w_i}\right]} \,. \tag{20}$$

The meaning of (20) is that the minimum delay is achieved when the downstream and upstream delay components of an optimally sized gate are both equal to the geometric mean of the upstream and downstream delays that would be obtained for an arbitrary sized gate.

In the special case of equal wire segments, the capacitance and resistance of all the segments are equal to  $C_w$  and  $R_w$ , respectively. In this case, the scaling factor  $x_{opt}$  is independent of the wire length and (19) reduces to

$$x_{opt_i} = \sqrt{\frac{R_0 \cdot c_w}{r_w \cdot C_0 \cdot g_i}}. \tag{21}$$

Note that this expression extends the basic repeater sizing equation to an arbitrary logic gate, whose size is determined according to its logical effort. For the special case of inverter-based repeater insertion (with a logical effort g = 1), (21) reduces to

IEEE TVLSI Manuscript No. TVLSI-00230-2007.R3

$$x_{opt} = \sqrt{\frac{R_0 \cdot c_w}{r_w \cdot C_0}} \,. \tag{22}$$

This optimal sizing factor is the same as for optimal repeater scaling [5]. In addition, similar to (20), the optimal sizing condition for a repeater is

$$R_{rep} \cdot C_w = C_{rep} \cdot R_w \,. \tag{23}$$

The best sizing of a repeater is achieved when the delay component  $R_w \cdot C_{rep}$  due to the repeater capacitance is equal to the delay component  $R_{rep} \cdot C_w$  due to the effective resistance of the repeater.
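Equations (21)-(23) can be evaluated with the 65 nm parameters listed in the Fig. 5 caption. In the Python sketch below, the NAND2 logical effort $g = 4/3$ is an assumed textbook LE value and `x_opt` is a hypothetical helper. For a NAND2 on intermediate wires, the resulting input capacitance $C_0 \cdot g \cdot x_{opt}$ comes to roughly $49 \cdot C_0$, in line with the value of about 50 reported in Section V for the sizes plotted (normalized to $C_0$) in Fig. 5. The last lines check condition (23) for an inverter repeater of size $x_{opt}$ on an arbitrary segment length.

```python
import math

R0, C0 = 8800.0, 0.74e-15                      # from the Fig. 5 caption
WIRES = {"intermediate": (1.0, 0.15e-15),     # (r_w [ohm/um], c_w [F/um])
         "global": (0.04, 0.23e-15)}
G_NAND2 = 4 / 3                                # assumed LE of a 2-input NAND

def x_opt(r_w, c_w, g=1.0):
    """Long-wire optimal sizing factor, eq. (21); g = 1 gives eq. (22)."""
    return math.sqrt(R0 * c_w / (r_w * C0 * g))

for name, (r_w, c_w) in WIRES.items():
    x_inv, x_nand = x_opt(r_w, c_w), x_opt(r_w, c_w, G_NAND2)
    print(f"{name:>12}: inverter x_opt = {x_inv:6.1f}, NAND2 x_opt = {x_nand:6.1f}, "
          f"NAND2 C_in = {G_NAND2 * x_nand:6.1f} C0")

# Condition (23) for an inverter repeater of size x_opt driving a segment of length L.
r_w, c_w = WIRES["intermediate"]
x, L = x_opt(r_w, c_w), 500.0                  # L in um (arbitrary choice)
R_rep, C_rep = R0 / x, C0 * x
print(math.isclose(R_rep * c_w * L, C_rep * r_w * L))
```

Since both sides of (23) scale linearly with the segment length, the check holds for any `L`, which is why (22) does not depend on the wire length.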

The application of ULE to repeater insertion provides a solution to some specific design problems. Two examples are presented here:

*Wire layout constraint:* given a wire of total length L comprising two unequal segments of lengths $L_1$ and $L_2$, the optimal size of the repeater located between the segments is

$$x_{rep_{opt}} = \sqrt{\frac{c_w \cdot R_0}{r_w \cdot C_0}} \cdot \sqrt{\frac{L_2}{L_1}}. \tag{24}$$

*Cell size constraint:* given a repeater of size  $x_{rep}$  dividing a wire of total length L into two segments, the optimal segment lengths  $L_{1_{opt}}$  and  $L_{2_{opt}} = L - L_{1_{opt}}$  are related by

$$\frac{L_{2_{opt}}}{L_{1_{opt}}} = x_{rep}^2 / \left( \frac{c_w \cdot R_0}{r_w \cdot C_0} \right). \tag{25}$$
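The two constrained problems of (24) and (25) reduce to a few lines of arithmetic. The Python sketch below is a minimal illustration with hypothetical helper names; like (19), it relies on the long-wire approximation in which the wire impedances dominate the repeater and neighboring gate impedances. The round trip (choose a repeater size, split the wire with (25), then resize with (24)) returns the original size, and a repeater of the uniform optimal size (22) splits the wire into equal halves.

```python
import math

def k_wire(r_w, c_w, R0, C0):
    """Technology constant c_w*R0/(r_w*C0) shared by eqs. (22), (24) and (25)."""
    return c_w * R0 / (r_w * C0)

def repeater_size(L1, L2, r_w, c_w, R0, C0):
    """Wire layout constraint, eq. (24): repeater size for fixed segments L1, L2."""
    return math.sqrt(k_wire(r_w, c_w, R0, C0)) * math.sqrt(L2 / L1)

def segment_split(x_rep, L, r_w, c_w, R0, C0):
    """Cell size constraint, eq. (25): segment lengths (L1, L2) for a fixed repeater."""
    ratio = x_rep ** 2 / k_wire(r_w, c_w, R0, C0)   # L2/L1
    L1 = L / (1.0 + ratio)
    return L1, L - L1

R0, C0 = 8800.0, 0.74e-15          # Fig. 5 caption values
r_w, c_w = 1.0, 0.15e-15           # intermediate wire per um
print(repeater_size(1500.0, 500.0, r_w, c_w, R0, C0))   # 2 mm wire, uneven split
print(segment_split(20.0, 2000.0, r_w, c_w, R0, C0))     # fixed repeater x_rep = 20
```

A repeater smaller than the uniform optimum is placed closer to the far end of the wire (longer $L_1$), consistent with (25).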

## VII. COMPARISON WITH BENCHMARK CIRCUITS

ULE optimization is verified by comparison with the results of Cadence Virtuoso® Analog Optimizer [23], a commercial numerical optimizer that uses a circuit simulator for delay modeling. The Analog Optimizer uses LSQ (least squares) and CFSQP (C code for Feasible Sequential Quadratic Programming) numerical algorithms to determine the values of the design variables that satisfy specific design objectives. The optimal solution is obtained by evaluating the sensitivity of the objective to each design variable, iteratively changing the variables, and performing circuit simulations. The numerical methods in Analog Optimizer can be used to satisfy a variety of design specifications. In this paper, minimum delay is the design goal. The design variables used by Analog Optimizer are the sizes of the gates along the critical path. Two circuits are considered: (a) a four-bit carry-lookahead adder and (b) a four-bit ripple-carry adder, designed for a 65 nm CMOS technology [22]. The critical paths in both circuits are optimized according to (18) for different inter-stage wire lengths. The ULE results are compared with the results of the Analog Optimizer tool.

A comparison of the resulting delay, evaluated by circuit simulation, is presented in Fig. 8. The delay after ULE optimization is close to the results achieved by the Analog Optimizer tool (within 9%), while the standard LE technique becomes increasingly inaccurate as the wire lengths grow.



Fig. 8. Delay of a carry-lookahead adder for various wire segment lengths after gate size optimization by LE, ULE, and Analog Optimizer (AO). Each pair of adder stages is interconnected by a wire segment in a 65 nm CMOS technology. For short wires, all methods yield the same results. For longer wires, LE becomes increasingly inaccurate while ULE optimization is comparable to the numerical results obtained by Analog Optimizer.

The low complexity and short computational time of ULE make the algorithm a competitive alternative for integration into EDA toolsets that optimize complex logic structures with interconnect. ULE and Analog Optimizer are compared in Table I in terms of computational run time as a function of the length of the logic path. Both techniques are used to optimize the critical path in a ripple-carry adder with a varying number of full adder stages. Note that the run time of Analog Optimizer is orders of magnitude longer than the ULE run time.

 
TABLE I

Comparison of computational run time of Analog Optimizer (AO) and ULE for various numbers of stages in a ripple-carry adder.

| Number of stages        | 2     | 4     | 6     | 8     |
|-------------------------|-------|-------|-------|-------|
| AO, 1% precision [min]  | 25    | 43    | 60    | 82    |
| AO, 5% precision [min]  | 18    | 25    | 32    | 39    |
| ULE, 0.1% precision     | < 1 s | < 1 s | < 1 s | < 1 s |

## VIII. ULE OPTIMIZATION IN PATHS WITH BRANCHES

ULE optimization can be extended to address the general design case where the logic path may include branches or gates with multiple fanout. The extended delay model is exemplified by the circuit shown in Fig. 9, defining a theoretical framework for delay minimization in circuits with side branches and multiple fanout. The circuit shows the general structure containing a side branch with *RC* interconnect and/or a fanout load with arbitrary capacitance. A similar circuit can be used to extend the Logical Effort model [1],[2] using only a capacitive load at the branch.



Fig. 9. A logic path segment including *RC* interconnect and two branches.  $R_b$  and  $C_b$  are the resistance and capacitance of branch wires, respectively, and  $C_f$  is the fanout load capacitance.

The ULE expression of the total delay of stages i and i+1 containing branches and fanout can be written similarly to (9),

$$d = g_{i} \cdot \left[ h_{i} + h_{w_{i}} + \frac{C_{b1_{i}} + C_{f1_{i}}}{C_{i}} + \frac{C_{b2_{i}} + C_{f2_{i}}}{C_{i}} \right] + \frac{R_{w_{i}}}{\tau} \cdot \left[ 0.5 \cdot C_{w_{i}} + h_{i} \cdot C_{i} + C_{b2_{i}} + C_{f2_{i}} \right] + g_{i+1} \cdot \left[ \frac{C_{w_{i+1}} + C_{i+2} + C_{b1_{i+1}} + C_{f1_{i+1}} + C_{b2_{i+1}} + C_{f2_{i+1}}}{h_{i} \cdot C_{i}} \right] + \frac{R_{w_{i+1}}}{\tau} \cdot \left[ 0.5 \cdot C_{w_{i+1}} + C_{i+2} + C_{b2_{i+1}} + C_{f2_{i+1}} \right], \tag{26}$$

where  $\tau = R_0 \cdot C_0$  is the minimum inverter delay.

The ULE condition for gate sizing is determined by equating the derivative of the delay with respect to the gate size to zero,

$$\left( g_{i} + \frac{R_{w_{i}} \cdot C_{i}}{\tau} \right) \cdot h_{i} = g_{i+1} \cdot \left( h_{i+1} + h_{w_{i+1}} + \frac{C_{b1_{i+1}} + C_{f1_{i+1}} + C_{b2_{i+1}} + C_{f2_{i+1}}}{C_{i+1}} \right). \tag{27}$$

The branch wire resistance $R_{b}$ is not part of the optimum condition, since this resistance does not lie along the path on which the Elmore delay is calculated. Note that in circuits without multiple fanout or branch interconnects, this general ULE condition for gate sizing converges to (10).

By applying expression (27) to each gate on the path in an iterative procedure, (18) can be replaced by

$$\begin{aligned} C_{i} &= \sqrt{\frac{g_{i} \cdot C_{i-1} \cdot \left(C_{w_{i}} + C_{i+1} + C_{b1_{i}} + C_{f1_{i}} + C_{b2_{i}} + C_{f2_{i}}\right)}{g_{i-1} + \frac{R_{w_{i-1}} \cdot C_{i-1}}{\tau}}} \\ &= \sqrt{C_{i-1} C_{i+1}} \cdot \sqrt{1 + \frac{C_{w_{i}}}{C_{i+1}} + \underbrace{\frac{C_{b1_{i}} + C_{f1_{i}} + C_{b2_{i}} + C_{f2_{i}}}{C_{i+1}}}_{\text{branches and fanouts}}} \cdot \sqrt{\frac{g_{i}}{g_{i-1} + \frac{R_{w_{i-1}} \cdot C_{i-1}}{\tau}}} \end{aligned} \tag{28}$$

From the relationship  $(g_i \cdot \tau)/C_i = R_i$ , an intuitive interpretation of the optimum condition can be derived similar to (11),

$$\left( R_{i-1} + R_{w_{i-1}} \right) \cdot C_i = R_i \cdot \left( C_{w_i} + C_{i+1} + \underbrace{C_{bf1_i} + C_{bf2_i}}_{\text{branches and fanouts}} \right). \tag{29}$$

The load of the side branches is represented by  $C_{bf1}$  and  $C_{bf2}$ . These capacitances are the effective capacitive load of the branch wires and fanout gates shown in Fig. 10. Note that the resistances  $R_{b1}$  and  $R_{b2}$  of the wires on the fanout branches do not affect the Elmore delay of the path.



Fig. 10. Equivalent circuit with the effective branch and fanout capacitances  $C_{bf1}$  and  $C_{bf2}$  in parallel with the path capacitances.

These ULE optimum expressions can be generalized for any combination of side branch wires and fanout gates by determining the total effective capacitance of the fanout branches for each stage of the path,

$$C_{BF} = \sum_{k=1}^{n} C_{b_k} + \sum_{l=1}^{m} C_{f_l}, \tag{30}$$

where n and m are the numbers of branch wires and fanout gates in a path stage, respectively. The general ULE conditions for gate sizing are obtained by substituting (30) for the individual branch and fanout terms in (27)-(29),

$$\left(g_{i} + \frac{R_{w_{i}} \cdot C_{i}}{\tau}\right) \cdot h_{i} = g_{i+1} \cdot \left(h_{i+1} + h_{w_{i+1}} + \frac{C_{BF_{i+1}}}{C_{i+1}}\right), \tag{31}$$

$$C_{i} = \sqrt{C_{i-1}C_{i+1}} \cdot \sqrt{1 + \frac{C_{w_{i}}}{C_{i+1}} + \frac{C_{BF_{i}}}{C_{i+1}}} \cdot \sqrt{\frac{g_{i}}{g_{i-1} + \frac{R_{w_{i-1}} \cdot C_{i-1}}{\tau}}}, \quad (32)$$

$$\left(R_{i-1} + R_{w_{i-1}}\right) \cdot C_i = R_i \cdot \left(C_{w_i} + C_{i+1} + C_{BF_i}\right). \tag{33}$$

Note that in circuits without multiple fanout gates or branch interconnects, these general ULE conditions for gate sizing converge to (10), (11), and (18).
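The general conditions (30)-(33) fit into the same relaxation scheme as (18). The Python sketch below is a minimal illustration under assumptions: the function names are hypothetical, the first gate's input capacitance and the final load are taken as fixed boundary values, each stage's side load is lumped into a single $C_{BF}$ per (30) (branch-wire resistances are ignored, as in the text), and the gate types and branch and fanout values in the example are arbitrary. After convergence it checks that every interior gate satisfies the Elmore-form condition (33), using $R_i = g_i \cdot \tau / C_i$.

```python
import math

def branch_load(C_branch_wires, C_fanout_gates):
    """Eq. (30): total side load C_BF of one stage (branch-wire resistance ignored)."""
    return sum(C_branch_wires) + sum(C_fanout_gates)

def relax_ule_branches(g, C_w, R_w, C_BF, C_in, C_load, R0, C0, tol=1e-12, max_iter=100_000):
    """Relaxation on eq. (32) (hypothetical helper).

    C_BF[k] is the lumped side load of stage k (branches at either end of wire k).
    C_BF[0] does not affect the sizes here because C_in is held fixed.
    """
    tau = R0 * C0
    N = len(g)
    C = [C_in] * N + [C_load]
    for _ in range(max_iter):
        worst = 0.0
        for i in range(1, N):
            denom = g[i - 1] + R_w[i - 1] * C[i - 1] / tau
            new = math.sqrt(g[i] * C[i - 1] * (C[i + 1] + C_w[i] + C_BF[i]) / denom)
            worst = max(worst, abs(new - C[i]) / C[i])
            C[i] = new
        if worst < tol:
            break
    return C

R0, C0 = 8800.0, 0.74e-15
TAU = R0 * C0
g = [1.0, 4 / 3, 4 / 3, 1.0]                 # assumed gate types along the path
C_w = [15e-15] * 4                           # 100 um of intermediate wire per stage
R_w = [100.0] * 4
C_BF = [0.0,
        branch_load([5e-15], [4 * C0]),      # one side wire and one fanout gate
        0.0,
        branch_load([], [2 * C0, 2 * C0])]   # two fanout gates, no side wire
C = relax_ule_branches(g, C_w, R_w, C_BF, C0, 20 * C0, R0, C0)

# Check eq. (33) at every interior gate, with R_i = g_i*tau/C_i.
for i in range(1, len(g)):
    lhs = (g[i - 1] * TAU / C[i - 1] + R_w[i - 1]) * C[i]
    rhs = (g[i] * TAU / C[i]) * (C_w[i] + C[i + 1] + C_BF[i])
    print(f"gate {i}: C = {C[i] / C0:.2f} C0, (33) ratio = {lhs / rhs:.9f}")
```

With every entry of `C_BF` set to zero, the update is the same as (18), matching the convergence statement above.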

## IX. SUMMARY AND FUTURE WORK

Delay minimization in logic paths with wires is an important issue in the design of highly complex integrated circuits. The interconnect is a dominant factor in performance-driven circuits and must be explicitly considered throughout the design process. The characteristics of the wires are not correlated with those of the gates, which prevents the use of the standard logical effort model. In fact, gate sizing in the presence of interconnect does not correspond to equal effort of all of the stages along a path. The Unified Logical Effort (ULE) method is proposed for delay evaluation and minimization of logic paths with general gates and *RC* wires. The ULE method provides conditions to achieve minimum delay. Optimal gate sizing in logic paths with wires is achieved when the delay component due to the gate capacitance is equal to the delay component due to the effective resistance of the gate. The ULE method converges to standard logical effort when wire resistance and capacitance are negligible. The simplicity of the ULE sizing expressions makes the method suitable for both manual calculations and integration into existing EDA tools.

ULE optimization is compared with the industrial Analog Optimizer tool, showing close agreement in terms of delay. Thanks to the simplicity of the delay model, the computational run time of ULE optimization is several orders of magnitude lower than that of the industrial tool. This efficiency, combined with comparable accuracy, demonstrates the high potential of ULE for integration into EDA tools.

The ULE method can be combined with known heuristics for buffering and repeater insertion. This combination is effective because many design flows dictate fixed wire lengths. Further research is required to develop solutions that perform optimal gate sizing and wire segmentation simultaneously.

#### ACKNOWLEDGEMENT

The authors thank Nathanaelle Gibrat-Wormser and David Pedahel for contributing to the ULE evaluation. The helpful suggestions of the reviewers are gratefully acknowledged.

#### REFERENCES

- [1] I. Sutherland, B. Sproull, and D. Harris, *Logical Effort: Designing Fast CMOS Circuits*, Morgan Kaufmann Publishers, 1999.
- [2] I.E. Sutherland and R.F. Sproull, "Logical Effort: Designing for Speed on the Back of an Envelope," *Proceedings of the University of California/Santa Cruz Conference on Advanced Research in VLSI* (ARVLSI), pp. 1-16, 1991.
- [3] I. Sutherland, B. Sproull, and D. Harris, *Logical Effort: Designing Fast CMOS Circuits*, Section 10.4, "Interconnect," p. 175, Morgan Kaufmann Publishers, 1999.
- [4] A. Morgenshtein, E.G. Friedman, R. Ginosar, and A. Kolodny, "Unified Logical Effort - A Method for Delay Evaluation and Minimization in Logic Paths with RC Interconnect," CCIT Technical Report #612, EE Pub. no. 1569, *Technion*, 2007. http://www.ee.technion.ac.il/matrics/papers/UnifiedLogicalEffort-tr.pdf
- [5] H.B. Bakoglu, *Circuits, Interconnections and Packaging for VLSI*, Addison-Wesley, pp. 194-219, 1990.
- [6] H.B. Bakoglu and J.D. Meindl, "Optimal Interconnection Circuits for VLSI," *IEEE Transactions on Electron Devices*, vol. ED-32, no. 5, pp. 903-909, May 1985.
- [7] A. Nalamalpu and W. Burleson, "Repeater Insertion in Deep Submicron CMOS: Ramp-based Analytical Model and Placement Sensitivity Analysis," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 766-769, May 2000.
- [8] V. Adler and E. G. Friedman, "Repeater Design to Reduce Delay and Power in Resistive Interconnect," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 45, no. 5, pp. 607-616, May 1998.
- [9] B. Lasbouygues, S. Engels, R. Wilson, P. Maurine, N. Azemard, and D. Auvergne, "Logical Effort Model Extension to Propagation Delay Representation," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 25, no. 9, pp. 1677-1684, September 2006.

- [10] A. Kabbani, D. Al-Khalili, and A. J. Al-Khalili, "Delay Analysis of CMOS Gates Using Modified Logical Effort Model," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 24, no. 6, pp. 937-947, June 2005.
- [11] J. Ebergen, J. Gainsley, and P. Cunningham, "Transistor Sizing: How to Control the Speed and Energy Consumption of a Circuit," *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems*, pp. 51-61, April 2004.
- [12] S. Srinivasaraghavan and W. Burleson, "Interconnect Effort: A Unification of Repeater Insertion and Logical Effort," *Proceedings of the IEEE Computer Society Annual Symposium on VLSI*, pp. 55-61, February 2003.
- [13] H. C. Lin and L.W. Linholm, "An Optimized Output Stage for MOS Integrated Circuits," *IEEE Journal of Solid-State Circuits*, vol. SC-10, no. 2, pp. 106-109, April 1975.
- [14] R. C. Jaeger, "Comments on 'An Optimized Output Stage for MOS Integrated Circuits'," *IEEE Journal of Solid-State Circuits*, vol. SC-10, no. 2, pp. 185-186, June 1975.
- [15] B. S. Cherkauer and E. G. Friedman, "Design of Tapered Buffers with Local Interconnect Capacitance," *IEEE Journal of Solid-State Circuits*, vol. 30, no. 2, pp. 151-155, February 1995.
- [16] B. S. Cherkauer and E. G. Friedman, "A Unified Design Methodology for CMOS Tapered Buffers," *IEEE Transactions on Very Large Scale Integration Systems*, vol. 3, no. 1, pp. 99-111, March 1995.
- [17] S. R. Vemuru and A. R. Thorbjornsen, "Variable-Taper CMOS Buffer," *IEEE Journal of Solid-State Circuits*, vol. 26, no. 9, pp. 1265-1269, September 1991.
- [18] K. Venkat, "Generalized Delay Optimization of Resistive Interconnections through an Extension of Logical Effort," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 2106-2109, May 1993.
- [19] M. Moreinis, A. Morgenshtein, I. Wagner, and A. Kolodny, "Logic Gates as Repeaters (LGR) for Area-Efficient Timing Optimization," *IEEE Transactions on Very Large Scale Integration Systems*, vol. 14, no. 11, pp. 1276-1281, November 2006.
- [20] C. Chu and D. F. Wong, "Closed Form Solution to Simultaneous Buffer Insertion / Sizing and Wire Sizing," *ACM Transactions on Design Automation of Electronic Systems*, vol. 6, no. 3, pp. 343-371, July 2001.
- [21] W. C. Elmore, "The Transient Response of Damped Linear Networks with Particular Regard to Wide Band Amplifiers," *Journal of Applied Physics*, vol. 19, no. 1, pp. 55-63, January 1948.
- [22] Predictive Technology Model (PTM), http://www.eas.asu.edu/~ptm/.
- [23] Virtuoso Advanced Analysis Tools User Guide, http://www.ece.uci.edu/eceware/cadence/aatoolsuser/chap3.html
- [24] A. Cao, R. Lu, and C. K. Koh, "Post-Layout Logic Duplication for Synthesis of Domino Circuits with Complex Gates," *Proceedings of the Conference on Asia South Pacific Design Automation (ASP-DAC)*, pp. 260-265, January 2005.
- [25] P. V. Buch, H. Savoj, and L. P. P. Van Ginneken, "Timing Optimization in Presence of Interconnect Delays", US Patent No. 6,553,338, April 1999.

## IEEE TVLSI Manuscript No. TVLSI-00230-2007.R3



**Arkadiy Morgenshtein** received the B.S.E.E. degree in 1999, the M.S. degree in biomedical engineering in 2003, the M.B.A. degree in 2006, and the Ph.D. degree in electrical engineering in 2008, all from Technion – Israel Institute of Technology. From 1999 to 2008, he was a Teaching and Research Assistant at the Department of Electrical Engineering, Technion. From 2001 to 2004, he was a research engineer with Rafael, a national research and development organization. Since 2008, he has been with the Core CAD Technologies Group, Intel Corporation, where he is engaged in research and development of power optimization tools. His current research interests include low-power VLSI design and interconnect optimization.



**Eby G. Friedman** received the B.S. degree from Lafayette College in 1979, and the M.S. and Ph.D. degrees from the University of California, Irvine, in 1981 and 1989, respectively, all in electrical engineering. From 1979 to 1991, he was with Hughes Aircraft Company, rising to the position of manager of the Signal Processing Design and Test Department, responsible for the design and test of high performance digital and analog ICs. He has been with the Department of Electrical and Computer Engineering at the University of Rochester since 1991, where he is a Distinguished Professor. He is also a Visiting Professor at the Technion – Israel Institute of Technology. His current research and teaching interests are in high performance synchronous digital and mixed-signal microelectronic design and analysis with application to high speed portable processors and low power wireless communications.

He is the author of more than 320 papers and book chapters, several patents, and the author or editor of ten books in the fields of high speed and low power CMOS design techniques, high speed interconnect, and the theory and application of synchronous clock and power distribution networks. Dr. Friedman is the Regional Editor of the Journal of Circuits, Systems and Computers, Chair of the IEEE Transactions on Very Large Scale Integration (VLSI) Systems steering committee, and a Member of several editorial boards and conference technical program committees. He previously was the Editor-in-Chief of the IEEE Transactions on Very Large Scale Integration (VLSI) Systems, a Member of the editorial board of the Proceedings of the IEEE, a Member of the Circuits and Systems (CAS) Society Board of Governors, Program and Technical chair of several IEEE conferences, and a recipient of the Howard Hughes Masters and Doctoral Fellowships, the University of Rochester Graduate Teaching Award, a College of Engineering Teaching Excellence Award, and several other awards. Dr. Friedman is a Senior Fulbright Fellow and an IEEE Fellow.



**Ran Ginosar** (S'79, M'82, SM'07) received the B.Sc. degree (summa cum laude) in electrical and computer engineering from the Technion – Israel Institute of Technology, Haifa, in 1978, and the Ph.D. degree in electrical engineering and computer science from Princeton University, Princeton, NJ, in 1982. After working with AT&T Bell Laboratories for one year, he joined the Technion faculty in 1983. He was a Visiting Associate Professor with the University of Utah, Salt Lake City, from 1989 to 1990, and a Visiting Faculty Member with the Strategic CAD Laboratory, Intel Corporation, from 1997 to 1999. He is currently the Head of the VLSI Systems Research Center at the Technion. His research interests include asynchronous circuits and systems, synchronization, networks on chip, manycore architecture, neuro-processors, and electronic imaging.



**Avinoam Kolodny** received his doctorate in microelectronics from Technion - Israel Institute of Technology in 1980. He joined Intel Corporation, where he was engaged in research and development in the areas of device physics, VLSI circuits, electronic design automation, and organizational development. He has been a member of the Faculty of Electrical Engineering at the Technion since 2000. His current research is focused primarily on interconnects in VLSI systems, at both physical and architectural levels.