SYNTHESIZABLE DELAY LINE ARCHITECTURES FOR DIGITALLY
CONTROLLED VOLTAGE REGULATORS

A Thesis Submitted to

Electrical Engineering department

in partial fulfillment of the requirements
for the degree of Master of Science

by Omar Haridy

under the supervision of Dr. Yehea Ismail
and co-supervisor Dr. Amr Helmy

July 2012
Cairo, Egypt
To my family
ACKNOWLEDGEMENTS

First, I would like to thank God for everything that he has presented to me in life. I would like to express my gratitude to the people who encouraged and supported me during my MSc. Without their help this work would not have been completed.

I would like to express my gratitude and sincerest appreciation to my advisor Dr. Yehea Ismail, for his continual guidance, support, motivating suggestions and encouragement during my research. His endless energy and enthusiasm in research have motivated all of his students, including me. It is a great honor to be one of his students in the center of nanoelectronics and devices at the American University in Cairo. I wish him all the best. Also, I would like to express my gratitude to Dr. Amr Helmy for his valuable advice and comments on my thesis. It was an honor to have Dr. Ali Darwish, Dr. Hani Ragai, and Dr. Mohamed Kassem on my thesis committee. I am grateful for their comments and objective directions to complete the master’s thesis work.

I would also like to thank my mentor Harish Krishnamurthy from Intel Circuit Research Labs (CRL) for his help and support in initiating this research. I appreciate his advice and enthusiastic help, which lit up my road during my internship at Intel. I would also like to thank Yorgos, George Matthew, and Stefano, with Intel Corporation, for their helpful discussions.

Last but not least, I would like to extend my gratitude to the people who have helped and inspired me during my research. My heartfelt appreciation goes toward my family and my dearest parents. They were always supportive whenever it was a hard time for me. I would like to thank my friends and colleagues especially Ahmed, Moataz, Kareem, Ahmed Mahfouz, and Ramy for their support and for the decent environment that we had in our lab.
ABSTRACT

The American University in Cairo

Synthesizable Delay Line Architectures for Digitally Controlled Voltage Regulators

Omar Fathy Mohamed Haridy
Supervisor: Dr. Yehea Ismail
Co-supervisor: Dr. Amr Helmy

Voltage regulators used in the integrated circuit (IC) industry require precise voltage regulation. In digitally controlled switching converters, this precise voltage regulation is achieved by high resolution digital pulse width modulators (DPWM). Digital delay lines can be used to generate the pulse width modulation (PWM) signal. Conventional delay lines are designed in a full custom design methodology, which is extremely slow and expensive compared to register-transfer level (RTL) based designs. RTL based designs are also technology independent, so the same design can be reused with new technologies. The purpose of this work is to introduce a new architecture for a fully synthesizable digital delay line used in digitally controlled voltage regulators. The proposed scheme and the conventional delay line are compared post synthesis on the key delay line specifications: linearity, area, complexity, and compensation for process, voltage, and temperature (PVT) variations, for multiple clock frequencies. Both schemes are designed using a hardware description language (HDL) and synthesized using Intel 32nm technology. The comparison shows that the proposed architecture has better linearity, smaller area, and a faster calibration time than conventional delay lines. The delay lines are designed in a parameterized way in order to make the design suitable for multiple frequencies.
TABLE OF CONTENTS

1. Introduction
   1.1. Evolution of Microprocessors and Voltage Regulators
   1.2. Voltage Regulator Operation
   1.3. On-chip and Off-chip VRs
      1.3.1. Limitations of Off-Chip Voltage Regulator
      1.3.2. Advantages of On-Chip Voltage Regulator
2. Background and Literature Review
   2.1. On-chip Voltage Regulator Topologies
      2.1.1. The Linear Regulator
      2.1.2. The Switching Regulator
      2.1.3. Relative Merits/Demerits of Each Option
      2.1.4. On-Chip VR Issues
   2.2. DPWM Signal Generation
      2.2.1. Counter Based DPWM
      2.2.2. Delay Line Based DPWM
      2.2.3. Hybrid DPWM
   2.3. Digital Design Methodologies
   2.4. Scope of Work / Thesis Objective
3. Digital Delay Line Architectures
   3.1. Delay Line Calibration Techniques
      3.1.1. Fixed Number of Tunable Cells
      3.1.2. Variable Number of Cells
   3.2. Digital Delay Line Building Blocks
      3.2.1. Conventional Adjustable Cells Delay Line
      3.2.2. Proposed Delay Line
   3.3. Preliminary Comparison
4. Results and Comparison
   4.1. Synthesis Results and Comparison for 100 MHz
   4.2. Design Example
      4.2.1. Conventional Delay Line Design Example
      4.2.2. Proposed Delay Line Design Example
   4.3. Proposed Scheme Analysis
5. Conclusion and Future Work
   5.1. Summary and Conclusion
   5.2. Future work
   5.3. Publications
Bibliography
LIST OF FIGURES

Figure 1: Intel microprocessor architectures [1]
Figure 2: Number of transistors integrated on the die for Intel microprocessors [3]
Figure 3: Intel microprocessor’s power roadmap [4]
Figure 4: Voltage regulator operation [11]
Figure 5: Power distribution noise in a system on a chip [16]
Figure 6: Linear regulator functional diagram [19]
Figure 7: Standard (NPN) regulator [19]
Figure 8: LDO regulator [19]
Figure 9: QUASI-LDO regulator [19]
Figure 10: Typical buck converter [17]
Figure 11: Switching regulator with switch ON [2]
Figure 12: Switching regulator with switch OFF [11]
Figure 13: Inductor current in switching converter [7]
Figure 14: Ideal switched-capacitor voltage regulator [23]
Figure 15: Block diagram of the digitally controlled buck converter [28]
Figure 16: Trailing edge modulation circuit
Figure 17: DPWM signal generation using the Reset signal
Figure 18: Counter based DPWM
Figure 19: Timing diagram for a 2 bits counter based DPWM
Figure 20: Delay line based DPWM
Figure 21: Timing diagram for a 2 bits delay line based DPWM
Figure 22: Hybrid DPWM [31]
Figure 23: Timing diagram of a hybrid DPWM [31]
Figure 24: Full Custom design flow
Figure 25: RTL design flow
Figure 26: Digital delay line architecture
Figure 27: Ideal delay line
Figure 28: Cells delay at different corners
Figure 29: Schematic of starved buffer [30]
Figure 30: Calibration using fixed number of tunable cells
Figure 31: Calibration with variable number of cells
Figure 32: Conventional adjustable cells delay line
Figure 33: Typical tunable delay cell
Figure 34: Delay element structure
Figure 35: Delay line structure
Figure 36: Adjustable cells delay line controller
Figure 37: Controller locking operation
Figure 38: Extra Flops for meta-stability
Figure 39: Meta-stability timing diagram
Figure 40: Shift register
Figure 41: Delay line locking scenarios
Figure 42: Linearity for different scenarios
Figure 43: Proposed scheme block diagram
Figure 44: Delay line structure
Figure 45: Delay cell structure
Figure 46: Proposed delay line controller
Figure 47: Locking timing diagram
Figure 48: Locking timing diagram
Figure 49: Mapping block diagram
Figure 50: Linearity for multiple frequencies at the slow corner
Figure 51: Linearity for multiple frequencies at the fast corner
LIST OF TABLES

Table 1: The characteristics of switching vs. linear regulators [26]
Table 2: DPWM approaches comparison
Table 3: ASIC design methodologies
Table 4: Preliminary comparison
Table 5: Post synthesis results at 100 MHz
Table 6: Proposed scheme synthesis results for multiple frequencies
1. Introduction

1.1. Evolution of Microprocessors and Voltage Regulators

The microprocessor industry is continuously improving performance. The huge market for microprocessors makes it necessary to improve the performance of each new product. Microprocessors are used in many applications such as computer systems, embedded systems, and handheld devices. Figure 1 shows different Intel microprocessor architectures used for different applications [1].

![Figure 1: Intel microprocessor architectures [1]](image)

Providing more functionality and higher performance requires packing more transistors on a chip. As Intel co-founder Gordon E. Moore predicted, the number of transistors on a chip doubles approximately every two years. Intel, which has maintained this pace for decades, uses this golden rule as both a guiding principle and a springboard for technological advancement, driving the expansion of functions on a chip at a lower cost per function and lower power per transistor by introducing and using new materials and transistor structures [2]. Figure 2 shows the historical data for the number of transistors integrated into Intel’s microprocessors [3]. The CPU transistor count has increased by 2X and the feature size has decreased by 0.7X every two years.

![Graph showing CPU Transistor Count and Feature Size](image)

**Figure 2**: Number of transistors integrated on the die for Intel microprocessors [3]

Besides the increase in the number of transistors on a chip, the operating frequency has also increased dramatically. Billions of transistors working at high frequency result in large currents and high total power dissipation. Figure 3 shows the power roadmap of Intel’s microprocessors [4]. Due to factors such as low voltage, high current, and fast load transition speeds, delivering high-quality power to modern processors has become a challenging task. Voltage regulators, as the dedicated power supplies of these devices, face thermal management challenges due to such high current requirements [5]. For a single microprocessor system, the power dissipated is proportional to the clock frequency [6]. Increasing the clock frequency to achieve higher operating speed therefore increases the dissipated power. Starting in 2005, Intel changed the microprocessor structure from a single core to a chip multi-processor (CMP). This product uses multiple cores on the die, and includes dual-core and quad-core processors [3]. With the recent trend towards multi-core systems and system on a chip (SOC), improved power management techniques are required.
Figure 3: Intel microprocessor's power roadmap [4]
1.2. Voltage Regulator Operation

The function of the voltage regulator is to deliver power from the unregulated source to the load. The load requires a regulated voltage and a constant voltage level should be delivered to the microprocessor during transient response [7]. Transient response happens when the load current changes as the circuit moves from one operation mode to another.

The evolution of system on chip design has opened the window for integrating many electronic circuits with different functions on the same chip. From a performance point of view, it is better for each circuit or group of circuits to have its own supply voltage. Thus, many voltage regulators are required in such systems in order to deliver the required voltage to all circuits. Also, even if the input voltage is regulated, a voltage regulator is needed if the circuit operates in different modes, such as a normal operation mode and a power saving mode. Each of these modes can have a different supply voltage, and a voltage regulator is required to provide these different values.

Recently, most designs have been moving towards low supply voltages in order to reduce the power of the active circuits [8] [9]. Therefore, low-voltage power converters are receiving considerable attention. One important parameter of the voltage regulator is its power efficiency across load variations. High power dissipation means that an expensive cooling system is required. The efficiency of a converter having output power $P_{out}$ and input power $P_{in}$ is given by:

$$\eta = \frac{P_{out}}{P_{in}} , \quad (1)$$

and the power loss is given by:

$$P_{loss} = P_{in} - P_{out} = P_{out}(1/\eta - 1) . \quad (2)$$
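
Equations (1) and (2) can be checked numerically with a short sketch. The function names and the 10 W / 12.5 W operating point below are hypothetical values chosen for illustration only; they are not taken from this work.

```python
def efficiency(p_out: float, p_in: float) -> float:
    """Equation (1): eta = P_out / P_in."""
    if p_in <= 0:
        raise ValueError("input power must be positive")
    return p_out / p_in


def power_loss(p_out: float, eta: float) -> float:
    """Equation (2): P_loss = P_out * (1/eta - 1)."""
    if not 0 < eta <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return p_out * (1.0 / eta - 1.0)


# Hypothetical operating point: 10 W delivered to the load from a 12.5 W input.
eta = efficiency(10.0, 12.5)
loss = power_loss(10.0, eta)
print(f"eta = {eta:.2f}, P_loss = {loss:.2f} W")
```

Both forms of Equation (2) agree: the loss computed from the efficiency equals the difference between the input and output power.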
The circuit elements available for building a voltage converter are listed below; the goal is a converter that changes the voltage while dissipating negligible power [10]:

- Resistive elements
- Capacitive elements
- Magnetic devices, including inductors and transformers
- Semiconductor devices operated in the linear mode
- Semiconductor devices operated in the switched mode

Figure 4 shows the operation of the voltage regulator. It takes an unregulated voltage from the source and, through semiconductor switches and control circuitry, provides a regulated output voltage that is suitable for load operation.

![Figure 4: Voltage regulator operation](image)

Inductors are large and difficult to implement on chip, so they are avoided in applications where efficiency is not the primary concern. On the other hand, capacitors ideally do not consume power, so they are important for switching converters. The basic elements that consume power in a voltage regulator are resistors and linear-mode semiconductor devices. The power dissipation of switched-mode semiconductor devices is very small: when the device is off, no current flows, and when it is on, its voltage drop is small. So, capacitive and inductive elements, as well as switched-mode semiconductor devices, are available for the synthesis of high-efficiency converters.
1.3. On-chip and Off-chip VRs

1.3.1. Limitations of Off-Chip Voltage Regulator

There are many types of voltage regulators that are used for off-chip implementations on the motherboard. Important system requirements are considered while selecting the regulator topology, such as the type of the input source, the maximum load current, and the output voltage tolerance.

In recent years, there has been a shift towards on chip voltage regulator implementations. On-chip implementation provides several benefits over off-chip implementation. The size reduction for the filter elements means that higher operating frequencies can be used, and thus the transient response to the load variations is improved.

Due to the large power transistors and large output filter, voltage regulators are placed off-chip. However, off-chip regulators have many problems. They occupy a large area. Also, their low operating frequencies require large inductors and capacitors, which results in a slow transient response. Furthermore, parasitic inductance and capacitance in the power delivery network cause voltage variations. In order to minimize this problem, voltage regulators are placed close to the load.

1.3.2. Advantages of On-Chip Voltage Regulator

On-chip regulators provide faster voltage switching and improved power delivery. An on-chip regulator operating at high switching frequencies avoids large filter components such as inductors and capacitors, allows the filter capacitor to be integrated entirely on-chip, and enables fast voltage transients.

With the growing push towards multi-core system-on-chip implementations, there has been a surge of interest in recent years in building on-chip integrated switching voltage regulators [12] [13]. Tight integration between the VR and the microprocessor results in resonance elimination. These regulators, operating at high switching frequencies, can obviate large-valued inductors and capacitors, allow the filter capacitor to be integrated entirely on-chip, place smaller inductors on the package, and enable fast voltage transitions at nanosecond timescales [14]. There is a direct tradeoff between the switching frequency of a voltage regulator and its power conversion efficiency [14].
2. Background and Literature Review

This chapter covers the background of different topologies of on-chip voltage regulators along with the merits and demerits of each option. Also, the chapter provides a literature review of the Digital Pulse Width Modulation (DPWM) signal generation techniques. The pros and cons of each technique are discussed. Finally, the design flows for digital design methodologies are stated.

2.1. On-chip Voltage Regulator Topologies

In general, voltage regulators are placed off-chip, where one regulator resides between the power source and each load in a star configuration [15]. This regulator delivers current to the load at the appropriate voltage level. The International Technology Roadmap for Semiconductors (ITRS) predicts an increase in the power consumption of microprocessors for future applications [11]. The power delivery network (PDN) provides the power supply to the processors, and when it is not designed properly it may be a major source of noise in the circuit, especially in high-speed electronic systems.

Voltage regulation and power management of integrated circuits are very critical in the nano-scale IC design. Portable electronic devices require higher levels of integration to reduce board space requirements. The specifications of the highly integrated power functions are changing with the requirements for small size, extended battery life, and low cost. On-chip voltage regulators offer many benefits for the multi-core implementation. A major problem with the high level of integration is the noise coupling between the various non-similar blocks constituting the system. The noise coupling is conceptually depicted in Figure 5, where the maximum power supply noise is transmitted through interconnections and vias at resonance in the power/ground planes of the package [16].

In recent years, building on-chip integrated switching voltage regulators has attracted a surge of interest, especially with the growing push towards multi-core system-on-chip implementations [12] [13]. The on-chip placement of these regulators faces many challenges, such as the required size reduction of the filter components and the higher operating frequencies. On-chip regulators allow fast voltage transitions, which were not possible when the regulator was placed off-chip, away from the load. There is a direct tradeoff between the switching frequency of a voltage regulator and its power conversion efficiency [14].

![Power distribution noise in a system on a chip][1]

Figure 5: Power distribution noise in a system on a chip [16]

There are a variety of voltage regulators available. However, the most widely used topologies for on-chip implementation are linear and switching regulators [17]. In this section, the linear regulator, the switched capacitor regulator, and the buck converter are presented with the relative merits/demerits of each option.

### 2.1.1. The Linear Regulator

The linear regulator is widely used in the power supplies of electronic devices. It provides several advantages such as low noise, simple design, low cost, and fast response to load current variations. Therefore, it is well suited to designs that require low output noise and fast input-output reaction. Moreover, it is very suitable for applications requiring multiple voltage islands because of its small size, low cost, and ease of on-chip integration [14].

The linear voltage regulator achieves constant output voltage by dissipating the excess input power delivered by the source. Therefore, a linear regulator has high efficiency only when the incoming power matches the power demanded by the load [18]. A typical linear regulator is shown in Figure 6. The regulator operates by using a voltage-controlled current source in order to force a fixed voltage at the regulator output terminal. The control circuit monitors the output voltage and adjusts the current source to maintain the desired voltage at the output.

![Linear regulator functional diagram](image)

Figure 6: Linear regulator functional diagram [19]

There are three basic types of linear regulator designs: the Standard (NPN Darlington) regulator, the Low Dropout (LDO) regulator, and the Quasi-LDO regulator. The first difference between the three types is the dropout voltage, which is defined as the minimum voltage drop required across the regulator to maintain output voltage regulation. For a highly efficient linear regulator, the dropout voltage should be minimized. The LDO requires the least voltage across it, while the Standard regulator requires the most. The second difference between linear regulator types is the ground pin current required by the regulator. Increased ground pin current is undesirable since it is wasted current. The Standard regulator has the lowest ground pin current, while the LDO generally has the highest.

In a linear regulator, the power delivered to the load is given by:

$$P = (V_{IN} - V_{dropout}) \times I_{load} , \quad (3)$$

and the power extracted from the input source is:

$$P = V_{IN} \times (I_{load} + I_{pin}) , \quad (4)$$

where $I_{pin}$ is the current in the internal linear regulator circuitry. To have a high efficiency regulator, the dropout voltage and the pin current must be minimized. Also, the voltage difference between input and output must be minimized, since the internal power dissipation of the linear regulator accounts for the loss of power efficiency, as given by the equation:

$$P = (V_{IN} - V_{OUT}) \times I_{load} . \quad (5)$$
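
The relations above can be combined into a rough efficiency estimate. The sketch below assumes the output voltage is $V_{OUT} = V_{IN} - V_{dropout}$ at the dropout limit and takes the pass-device dissipation as $(V_{IN} - V_{OUT}) \times I_{load}$. The function names and the 1.8 V to 1.0 V operating point are hypothetical and are not taken from this work.

```python
def linear_regulator_efficiency(v_in: float, v_out: float,
                                i_load: float, i_pin: float) -> float:
    """Ratio of load power to the power drawn from the input source."""
    p_load = v_out * i_load            # power delivered to the load
    p_source = v_in * (i_load + i_pin) # power extracted from the source
    return p_load / p_source


def pass_device_dissipation(v_in: float, v_out: float, i_load: float) -> float:
    """Power dissipated across the pass device (input minus output voltage)."""
    return (v_in - v_out) * i_load


# Hypothetical operating point: 1.8 V input, 1.0 V output,
# 100 mA load current, 1 mA ground pin current.
eta = linear_regulator_efficiency(1.8, 1.0, 0.100, 0.001)
p_diss = pass_device_dissipation(1.8, 1.0, 0.100)
print(f"efficiency = {eta:.1%}, pass-device dissipation = {p_diss * 1e3:.0f} mW")
```

With zero ground pin current the efficiency reduces to $V_{OUT}/V_{IN}$, which is why a large input-output voltage difference is costly for this topology.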

The Standard (NPN) regulator is the first IC voltage regulator. It uses NPN Darlington configuration for the pass device as shown in Figure 7. The pass transistor requires a high minimum voltage (dropout voltage) across it given by:

$$V_{D\min} = 2V_{BE} + V_{CE} \quad \text{(Standard regulator)} . \quad (6)$$

However, the ground pin current of the Standard regulator is very low because the base drive current to the pass transistor (which flows out of the ground pin) is equal to the load current divided by the gain of the pass device. In the Standard regulator, the gain of the pass device is extremely high (the pass device is a network composed of one PNP and two NPN transistors) [19].
In order to improve the linear regulator efficiency, the pass device of the Low-dropout (LDO) regulator is made up of only a single PNP transistor as shown in Figure 8.

The voltage dropout in this case is minimal as given by the equation:

$$V_{D\min} = V_{CE} \quad \text{(LDO regulator)} . \quad (7)$$

This makes the LDO more suitable for battery-powered applications. However, the ground pin current is high because it is equal to the load current divided by the gain of the single PNP transistor.

The third type of linear regulators is the quasi-LDO, which uses an NPN and PNP transistor as the pass device as shown in Figure 9.

The dropout voltage of the Quasi-LDO regulator is given by:

$$V_{D\min} = V_{BE} + V_{CE} \quad \text{(Quasi-LDO regulator)} , \quad (8)$$

and the ground pin current of the quasi-LDO is fairly low.

![Figure 9: QUASI-LDO regulator [19]](image)
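
The three dropout expressions can be compared side by side with a small sketch. The values $V_{BE} = 0.7$ V and $V_{CE} = 0.2$ V are assumed, typical-looking illustration values, not measurements from this work, and the function name is hypothetical.

```python
def dropout_voltages(v_be: float, v_ce: float) -> dict:
    """Minimum dropout voltage for each pass-device structure."""
    return {
        "standard": 2 * v_be + v_ce,  # NPN Darlington pass device
        "quasi_ldo": v_be + v_ce,     # NPN plus PNP pass device
        "ldo": v_ce,                  # single PNP pass device
    }


# Assumed illustration values for the transistor voltages.
drops = dropout_voltages(v_be=0.7, v_ce=0.2)
for name, v_drop in sorted(drops.items(), key=lambda item: item[1]):
    print(f"{name:10s} {v_drop:.2f} V")
```

The ordering matches the text: the LDO needs the least headroom, the Standard regulator the most, and the Quasi-LDO sits in between.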

The major drawbacks of linear regulators are their low conversion efficiency, which falls linearly as the \( V_{out}/V_{in} \) ratio decreases, their inability to provide multiple output voltages, and their inability to step up the voltage [17].

**2.1.2. The Switching Regulator**

Switching regulators provide higher power conversion efficiency because they transfer energy through a low-loss inductor. This regulator topology can regulate a wide range of output voltage levels with better efficiency. It operates as a switch whose duty cycle determines how much charge is transferred to the load [20]. In contrast with linear regulators, switching regulators can provide outputs that are higher than the input. For on-chip implementation, the switching voltage regulator is the preferred topology. Since most of the circuit supply voltages are lower than the voltage of the primary source to the board, these converters usually have to step down the voltage [16].

There are different types of switching regulators, such as the switched capacitor regulator and the buck converter. The most commonly used switching converter is the buck converter, since it provides better efficiency. Figure 10 shows the typical buck converter, which uses a switch to connect and disconnect the input voltage to the inductor; the inductor is connected to the output terminal and carries the output current.

![Figure 10: Typical buck converter [17]](image)

When the switch is on, the input voltage is connected to the inductor, causing a voltage difference to appear across it and thus an increase in the current [21]. This current flows through the inductor and charges the capacitor as shown in Figure 11. When the switch is off, the input voltage is removed. The current through the inductor decreases but cannot change instantly, causing the input terminal of the inductor to have a negative voltage. Thus, the diode is turned on and the current flows through the load and the diode as shown in Figure 12. At the same time, the capacitor discharges into the load. Figure 13 shows the ramp up and ramp down of the inductor current. A more advanced buck converter is covered in more detail in the following sections.

![Diagram of Switching Regulator with Switch ON](image1)

**Figure 11**: Switching regulator with switch ON [11]

![Diagram of Switching Regulator with Switch OFF](image2)

**Figure 12**: Switching regulator with switch OFF [11]

![Diagram of Inductor Current](image3)

**Figure 13**: Inductor current in switching converter [7]
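
The ramp up and ramp down of the inductor current can be estimated with a minimal sketch. It assumes an ideal, lossless buck converter in continuous conduction, for which the standard textbook relations $V_{out} = D \cdot V_{in}$ and $\Delta I_L = (V_{in} - V_{out}) \cdot D / (f_{sw} L)$ hold. These relations, the function names, and the component values are assumptions for illustration and are not taken from this work.

```python
def buck_output_voltage(v_in: float, duty: float) -> float:
    """Ideal steady-state output of a buck converter: V_out = D * V_in."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty cycle must be in [0, 1]")
    return duty * v_in


def ripple_switch_on(v_in: float, duty: float, f_sw: float, l: float) -> float:
    """Inductor current rise while the switch is on."""
    v_out = buck_output_voltage(v_in, duty)
    t_on = duty / f_sw
    return (v_in - v_out) * t_on / l


def ripple_switch_off(v_in: float, duty: float, f_sw: float, l: float) -> float:
    """Inductor current fall while the switch is off (ideal diode)."""
    v_out = buck_output_voltage(v_in, duty)
    t_off = (1.0 - duty) / f_sw
    return v_out * t_off / l


# Hypothetical design point: 12 V input, 50% duty, 1 MHz switching, 1 uH.
up = ripple_switch_on(12.0, 0.5, 1e6, 1e-6)
down = ripple_switch_off(12.0, 0.5, 1e6, 1e-6)
print(f"V_out = {buck_output_voltage(12.0, 0.5):.1f} V, "
      f"ramp up = {up:.2f} A, ramp down = {down:.2f} A")
```

In steady state the rise during the on interval equals the fall during the off interval, which is what produces the triangular waveform of Figure 13.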

The inductor and the capacitor of the buck converter work as a low-pass filter. To obtain a smooth output voltage, the cutoff frequency of this filter should be very low, so that only the DC value passes and the higher frequency components are rejected. The cutoff frequency is determined by:

$$f_c = \frac{1}{2\pi\sqrt{LC}} . \quad (9)$$

Thus, the values of the capacitor and inductor should be large.

Another type of switching regulator is the switched-capacitor circuit (charge pump). Figure 14 shows the ideal switched-capacitor voltage regulator, which consists of switches and energy-transfer capacitors in the power stage.

![Figure 14: Ideal switched-capacitor voltage regulator [22].](image)

The switches are turned on and off so that the converter cycles through a number of switched networks. This topology is easy to implement; however, it has several drawbacks, such as a pulsating input current, weak regulation capability (the output voltage varies with the input voltage), and a DC voltage conversion ratio that is predetermined by the circuit structure [22].

### 2.1.3. Relative Merits/Demerits of Each Option

Important considerations when selecting a voltage regulator include:

1) The desired output voltage level and its regulation capability.
2) The output current capacity.
3) The applicable input voltages.
4) Conversion efficiency (Pout/Pin).
5) The transient response time.
6) Ease of use.
7) If applicable, the ability to step-down or step-up output voltages.
8) In switch-mode regulators, the switching frequency is also a consideration.

Linear regulators provide significant advantages over switching regulators: they are simple (requiring simpler support circuits with fewer external components), cheap, and free of switching noise. However, they suffer from several drawbacks. The main disadvantage of linear regulators is their low power conversion efficiency, since they are constantly conducting [23], especially when the required output voltage is much lower than the input voltage. For on-chip implementations, this power loss translates into heat that cannot be ignored, so a heat sink is required. The heat sink increases area and cost.

On the other hand, switching regulators offer higher power conversion efficiency. However, the large area required for the inductor and capacitor makes them a serious concern for on-chip implementation; these regulators are therefore best suited to off-chip implementation, where they provide high conversion efficiency. To overcome this obstacle on chip, the regulator is operated at high frequencies so that the size of the inductor and capacitor can be reduced, but this causes efficiency degradation [24]. The requirements for high efficiency and high accuracy make the size of the inductor prohibitively large for SoC solutions, where the inductor is embedded on the chip. Another drawback of the switching regulator is its long transient response time compared to the linear regulator, due to the lag caused by the inductor. Table 1 summarizes the characteristics of switching and linear regulators.
<table>
<thead>
<tr>
<th></th>
<th>Ripple/Noise</th>
<th>Total Cost</th>
<th>Size</th>
<th>Complexity</th>
<th>Waste Heat</th>
<th>Efficiency</th>
<th>Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>Linear</td>
<td>Low; no ripple, low noise</td>
<td>Low</td>
<td>Small to medium in portable designs, but may be larger if heat sinking is needed</td>
<td>Low, which usually requires only the regulator and low-value bypass capacitors</td>
<td>High, if average load and/or input/output voltage difference are high</td>
<td>High if VIN - VOUT difference is small</td>
<td>Only steps down; input voltage must be greater</td>
</tr>
<tr>
<td>Switching</td>
<td>Medium to high, due to ripple at switching rate</td>
<td>Medium to high, largely due to external components</td>
<td>Larger than linear at low power, but smaller at power levels</td>
<td>Medium to high, which usually requires an inductor, diode, and filter caps in addition to the IC, for high-power circuits</td>
<td>Low, as components usually run cool for power levels</td>
<td>High, except at very low load currents (μA), where switch-mode quiescent current (IQ) is usually higher</td>
<td>Steps up, steps down, or inverts</td>
</tr>
</tbody>
</table>

### 2.1.4. On-Chip VR Issues

The increasing demand for high-performance processor designs with improved power delivery networks is the motivation for on-chip integration of voltage regulators. Moreover, on-chip integration saves area, leading to cost reduction, and the response time of the regulator is faster because it is placed closer to the load. However, on-chip integration carries considerable overhead. The first challenge is the large area of the filter components of the switching regulator. For on-chip integration, the area of the capacitor and inductor is reduced, resulting in worse regulation efficiency. Also, a smaller capacitor stores less charge, which means less charge is available to the load and the regulator is more vulnerable to large \( \frac{di}{dt} \) events that can cause large voltage fluctuations [20]. Decoupling capacitors are used to reduce these fluctuations, at the cost of additional chip area.

Inductance issues are another concern for on chip integration. With new processor generations, faster transient response time is required, which requires the ability to change the current through the inductor quickly. The magnetic field within an inductor resists change as in the equation:

\[
\frac{di}{dt} = \frac{V_{OUT}}{L}. \tag{10}
\]

Since \( V_{OUT} \) is constant, the only way to speed up the response is to decrease the inductance, which results in a higher inductor ripple current and a more noticeable parasitic inductance in the circuit. Fabricating on-chip inductors is also a challenge. To avoid inductors in on-chip regulators, potential candidates to replace them are gyrators, which consist of a capacitor, resistors, and transistors.

## 2.2. DPWM Signal Generation

Buck converters perform regulation by switching between the input voltage and ground, as explained in the previous section. Buck converter design is moving towards digital circuits because of the demand for high-performance, more flexible, and reconfigurable DC-DC converters [26]. Considerable research effort has gone into replacing the commonly used analog pulse width modulation (PWM) controllers with digital controllers. This section provides a detailed explanation of the digitally controlled buck converter and covers different digital control techniques.

Figure 15 shows the basic building blocks of the digitally controlled buck converter. It is basically divided into two blocks, the converter body and the controller. The body consists of two transistors working as switches and a low pass filter. One switch connects the source voltage \(V_g\) to the input of the filter, and the other switch connects the ground to the filter input. The switches are controlled through a digital pulse width modulation (DPWM) signal. In order to make sure that only one switch is on at a time, one switch is controlled by the DPWM signal and the other switch is controlled by its inverted version. The signal at the input of the low pass filter is alternating between \(V_g\) and the zero voltage with the duty cycle of the DPWM signal. The low pass filter gets the average value of its input signal. The switches are designed with large width in order to allow large current to flow and reduce their resistance to improve the power efficiency.

The DPWM signal is a signal with a variable duty cycle. It is generated by the controller block to adjust the output voltage by controlling the duty cycle. In other words, this signal determines the percentage of time the input voltage is connected to the output filter, and hence it determines the output voltage. The output voltage is fed back to the controller, where it is compared to the reference voltage \(V_{ref}\) to calculate the error voltage. The error voltage \(V_e = V_o - V_{ref}\) is quantized by the ADC, providing a digital word that adjusts the duty cycle of the DPWM signal in order to obtain the required voltage \(V_{ref}\).

This section focuses on the DPWM block that generates the DPWM signal. The output voltage $V_o$ and the resolution of the switching regulator are given by:

$$V_o = V_g \times \text{Duty}, \quad (11)$$

$$\text{Resolution} = \Delta V_o = V_g \times \Delta \text{Duty} , \quad (12)$$

where $V_g$ is the input voltage and Duty is the duty cycle of the DPWM signal. In order to achieve high voltage resolution, precise control of the DPWM signal duty cycle is needed. In other words, the voltage resolution is converted into a time resolution, meaning that having more time accuracy of the duty cycle will result in having better output voltage resolution.
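As a quick numerical check of Equations (11) and (12), the following Python sketch evaluates the output voltage and the voltage step of an ideal DPWM. The 1.8 V input and the 5-bit duty word are illustrative assumptions, the function names are hypothetical, and an n-bit DPWM is assumed to divide the switching period into $2^n$ equal duty steps.

```python
def output_voltage(v_g: float, duty: float) -> float:
    """Eq. (11): ideal buck output voltage for a duty cycle in [0, 1]."""
    return v_g * duty


def voltage_resolution(v_g: float, n_bits: int) -> float:
    """Eq. (12) with delta_duty = 1 / 2**n_bits (assumed uniform duty steps)."""
    return v_g / (2 ** n_bits)


V_G = 1.8      # assumed input voltage in volts (illustrative only)
N_BITS = 5     # assumed DPWM resolution in bits (illustrative only)

print(output_voltage(V_G, 0.5))          # output voltage at 50% duty
print(voltage_resolution(V_G, N_BITS))   # smallest output voltage step
```

Doubling the number of duty steps (one more bit) halves the voltage step, which is why the voltage resolution translates directly into a timing resolution requirement.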

Pulse width modulation can be achieved by the simple trailing-edge modulation circuit shown in Figure 16. The circuit consists of a flip-flop with $V_{dd}$ connected to its input. At the beginning of the switching period, the output pulse is set to one. The DPWM signal stays at one until the Reset signal arrives and sets it to zero, as shown in Figure 17. The time instant at which the Reset signal is asserted is controlled in order to obtain different duty cycles.

Digitally controlled voltage regulators use different techniques to achieve the required resolution. A standard counter-based DPWM is proposed in [28], while [29] presents a pure delay line DPWM. In applications requiring high resolution, a hybrid DPWM is used to arrive at the best compromise between the high clock frequency requirements of the standard counter-based DPWM and the large hardware requirements of the pure delay line DPWM [30] [31] [32].

### 2.2.1. Counter Based DPWM

Figure 18 illustrates the building block of the typical counter based DPWM approach. In this approach, a counter with a high clock frequency is used. The relation between the counter clock frequency and the regulator switching frequency is given by:

\[
 f_{clock} = 2^n \times f_{switching}, \tag{13}
\]

where \( n \) is the number of bits of the counter. An \( n \)-bit counter counts up by one step at each input clock period, \( t_{clk} \). When the counter counts to its maximum, it resets itself and starts counting from zero at the next input clock period \( t_{clk} \). The output of the counter is then compared to the required duty cycle generating the Reset signal that controls the DPWM signal using the trailing edge modulation technique described earlier.

![Counter Based DPWM Diagram](image)

**Figure 18:** Counter based DPWM

The timing diagram of a simple example using a two-bit counter is illustrated in Figure 19. In this example, the counter has four possible values (00, 01, 10, and 11); thus, four different values for the duty cycle can be achieved:

- If duty input word (Duty) is 00, the reset signal goes high generating a DPWM with a 25% duty cycle as shown in Figure 19 (a).
- If duty input word (Duty) is 01, the reset signal goes high generating a DPWM with a 50% duty cycle as shown in Figure 19 (b).
- If duty input word (Duty) is 10, the reset signal goes high generating a DPWM with a 75% duty cycle as shown in Figure 19 (c).
- If duty input word (Duty) is 11, the reset signal goes high generating a DPWM with a 100% duty.
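The four cases above can be reproduced with a small behavioral model. This is a minimal sketch rather than the circuit of Figure 18: it assumes one output sample per counter clock and assumes that the comparator match clears the output starting from the next clock period. The function name `counter_dpwm` is hypothetical.

```python
def counter_dpwm(duty_word: int, n_bits: int = 2) -> list[int]:
    """Return one switching period of the DPWM output, one sample per counter clock."""
    period = 2 ** n_bits
    if not 0 <= duty_word < period:
        raise ValueError("duty word does not fit in the counter width")
    samples = []
    dpwm = 1  # trailing-edge modulation: output is set at the start of the period
    for count in range(period):
        samples.append(dpwm)
        if count == duty_word:  # comparator match generates the Reset signal
            dpwm = 0            # assumed to take effect from the next clock period
    return samples


for word in range(4):
    wave = counter_dpwm(word)
    print(f"Duty={word:02b} waveform={wave} duty cycle={sum(wave) / len(wave):.0%}")
```

Under these assumptions the model yields 25%, 50%, 75%, and 100% for the duty words 00, 01, 10, and 11, matching the list above.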

The resolution of the two-bit counter is 25%. To achieve higher resolution, the number of counter bits must be increased, which requires a higher clock frequency. Since the dynamic power is directly proportional to the clock frequency, a higher clock frequency means higher power consumption:

\[
P_{\text{dynamic}} = \alpha C_{\text{total}} V_{dd}^2 f_{\text{clk}}, \tag{14}
\]

where \( \alpha \) is the activity factor, \( C_{\text{total}} \) is the total switched capacitance, \( V_{dd} \) is the supply voltage, and \( f_{\text{clk}} \) is the clock frequency. The power of the reported PWM controller alone is on the order of milliwatts.

The advantages of the counter based DPWM are its simplicity and linearity. However, a high-resolution DPWM requires a counter with a large number of bits, so the required clock frequency, \( f_{\text{clk}} \), can be unreasonably high. The switching frequency is in the range of 1 MHz as stated in [28], and the typical voltage regulator resolution used in state-of-the-art systems is about 13 bits [30]. Therefore, a counter-based DPWM requires a clock frequency of several GHz, which is very high and not available in all systems.
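The clock requirement follows directly from Equation (13). The sketch below simply evaluates it for the figures quoted above (13 bits, 1 MHz switching frequency); the function name is hypothetical.

```python
def required_clock_hz(n_bits: int, f_switching_hz: int) -> int:
    """Eq. (13): counter clock frequency needed for an n-bit counter based DPWM."""
    return (2 ** n_bits) * f_switching_hz


f_clk = required_clock_hz(n_bits=13, f_switching_hz=1_000_000)
print(f"{f_clk / 1e9:.3f} GHz")  # 2**13 x 1 MHz = 8.192 GHz
```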
Figure 19: Timing diagram for a 2-bit counter based DPWM

### 2.2.2. Delay Line Based DPWM

The delay line based DPWM approach avoids the very high clock frequency required by the counter-based DPWM. Since this approach uses the switching frequency without the need for a higher clock frequency, the power is significantly reduced in comparison to the fast-clocked counter approach, which makes it suitable for low-power systems. In [29], a PWM circuit built from a tapped delay line and a multiplexer is proposed. Figure 20 illustrates the schematic of a typical delay line based DPWM. The delay line consists of a chain of delay cells with a tap after each cell. The total delay of the line is adjusted to be equal to the switching period. The switching frequency pulse propagates down the delay line, and when it reaches the tap selected by the multiplexer, it is used to set the DPWM output low.

![Figure 20: Delay line based DPWM](image)

Figure 21 illustrates the timing diagram of a simple example using a two-bit delay line. In this example, the delay line has four cells and a four-to-one multiplexer. The delay of each cell is equal to one quarter of the switching period. Depending on the duty input word, the duty cycle takes one of four possible values:

- If duty input word (Duty) is 00, the first tap (Tap 0) is selected generating a DPWM with a 25% duty cycle as shown in Figure 21 (a).
- If duty input word (Duty) is 01, the second tap (Tap 1) is selected generating a DPWM with a 50% duty cycle as shown in Figure 21 (b).
- If duty input word (Duty) is 10, the third tap (Tap 2) is selected generating a DPWM with a 75% duty cycle as shown in Figure 21 (c).
- If duty input word (Duty) is 11, the fourth tap (Tap 3) is selected generating a DPWM with a 100% duty cycle.

The resolution achieved by this simple delay line based DPWM is 25%. To achieve better resolution, more cells and a larger multiplexer are needed, which means larger area. For an n-bit resolution DPWM, a delay line with L delay cells and an L-to-1 multiplexer are used, where L is given by:

$$ L = 2^n. \quad (15) $$

So, this scheme suffers from large hardware requirements compared to the counter based approach.

In the counter based approach, an n-bit resolution DPWM requires an n-bit counter, which contains only n flip-flops. Thus, the area of the counter based DPWM scheme is much smaller than the area of the delay line based DPWM architecture. The proper scheme is therefore selected depending on the application and on the available area and clock frequency resources. A simple comparison between the counter based and the delay line based DPWM approaches is summarized in Table 2.

<table>
<thead>
<tr>
<th></th>
<th>Counter</th>
<th>Delay line</th>
</tr>
</thead>
<tbody>
<tr>
<td>Clock frequency/ Power dissipation</td>
<td>High</td>
<td>Low</td>
</tr>
<tr>
<td>Area requirements</td>
<td>Small</td>
<td>Large</td>
</tr>
</tbody>
</table>

Table 2: DPWM approaches comparison

Figure 21: Timing diagram for a 2-bit delay line based DPWM

### 2.2.3. Hybrid DPWM

In order to achieve the best compromise between area and power, a hybrid DPWM is used for applications requiring high resolution [30]. A hybrid DPWM provides a high-resolution, high-frequency DPWM signal without the need for a very high input clock frequency, as in the counter based DPWM, or a very large area, as in the pure delay line based DPWM. In this approach, both a counter and a delay line are used to achieve the required resolution.

Figure 22 shows an example of the hybrid scheme: a 5-bit resolution DPWM. The three most significant bits (msb) are provided by the counter, while the delay line provides the two least significant bits (lsb). In this example, the counter clock is eight times faster than the switching frequency and the line consists of four cells. If only the counter were used to achieve the same resolution, the required clock frequency would be 32 times the switching frequency. If only the delay line were used, the line would need 32 cells. So, using the hybrid approach reduces the requirements for both area and power.
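The trade-off in this example can be tabulated with a short sketch. It only applies the $2^n$ scaling of Equations (13) and (15) to the counter bits and the delay line bits separately; the function name and the tuple layout are hypothetical.

```python
def dpwm_cost(total_bits: int, counter_bits: int) -> tuple[int, int]:
    """Return (clock multiple of the switching frequency, number of delay cells)."""
    if not 0 <= counter_bits <= total_bits:
        raise ValueError("counter_bits must lie between 0 and total_bits")
    delay_bits = total_bits - counter_bits
    clock_multiple = 2 ** counter_bits                 # counter resolves the msb part
    delay_cells = 2 ** delay_bits if delay_bits else 0  # no delay line when 0 bits remain
    return clock_multiple, delay_cells


print("hybrid (3 + 2 bits):", dpwm_cost(5, 3))   # (8, 4)
print("counter only:       ", dpwm_cost(5, 5))   # (32, 0)
print("delay line only:    ", dpwm_cost(5, 0))   # (1, 32)
```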

Figure 22: Hybrid DPWM [30]

The timing diagram of the hybrid DPWM is shown in Figure 23. The counter increments at each period of the fast clock signal (clk). The output DPWM (DPWM_out) is set at the beginning of the switching period (when the counter value is cnt = 000). First, the counter output is compared to the three most significant bits of the input duty word, msb(duty). The result of this comparison is the delclk signal, which then propagates through the delay line. The delay line taps $i_0$, $i_1$, $i_2$, $i_3$ are connected to a four-to-one multiplexer. The two least significant bits of the input duty word, lsb(duty), are used as the multiplexer select lines to choose the appropriate tap, generating the reset signal R. In the example shown in Figure 22 and Figure 23, duty = 10110 and, therefore, $i_2$ is connected to R (lsb(duty) = 10). The signal R resets DPWM_out.
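The split of the duty word between the counter comparator and the multiplexer can be written out explicitly. The sketch below only performs the bit slicing for the 5-bit example; the function name is hypothetical.

```python
def split_duty_word(duty: int, total_bits: int = 5, lsb_bits: int = 2) -> tuple[int, int]:
    """Return (msb part compared with the counter, lsb part used as the tap select)."""
    if not 0 <= duty < 2 ** total_bits:
        raise ValueError("duty word does not fit in total_bits")
    msb = duty >> lsb_bits               # upper bits go to the comparator
    lsb = duty & ((1 << lsb_bits) - 1)   # lower bits select the delay line tap
    return msb, lsb


msb, tap = split_duty_word(0b10110)
print(f"msb(duty) = {msb:03b}, selected tap = i_{tap}")  # msb(duty) = 101, selected tap = i_2
```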

Figure 23: Timing diagram of a hybrid DPWM [30]

## 2.3. Digital Design Methodologies

Very Large Scale Integration (VLSI) circuit design has several design flows. Design flow is a term used to describe the various design phases of an IC design. This section compares different design flows for application-specific integrated circuits (ASICs).

We can distinguish between two design flows: the full custom design methodology and the register transfer level (RTL) design methodology [33]. The characteristics of the full custom design methodology are simply:

- Transistors are the building elements.
- Every transistor in the design can be (and often is) individually sized, regardless of its functional context.

RTL design methodology provides more automation in synthesis and automated place and route processes. Automation reduces design time, but the resulting circuitry and fabrication process may not be optimal. Custom designers can optimize the individual logic cells, the layout and wiring between the cells, and other aspects of the design. Table 3 illustrates a comparison between the two design methodologies. Figure 24 shows the steps of the custom design flow while Figure 25 shows the RTL design flow.

Table 3: ASIC design methodologies

<table>
<thead>
<tr>
<th>ASIC Design Methodologies</th>
<th>Full-custom Design</th>
<th>RTL based design</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>➢ Extremely slow, expensive</td>
<td>➢ Reasonably fast, less expensive</td>
</tr>
<tr>
<td></td>
<td>➢ Only used to design very high performance systems</td>
<td>➢ Most ASICs are currently designed using this method</td>
</tr>
<tr>
<td></td>
<td></td>
<td>➢ Technology independent</td>
</tr>
<tr>
<td></td>
<td></td>
<td>➢ Reliable</td>
</tr>
</tbody>
</table>
Figure 24: Full Custom design flow

Figure 25: RTL design flow

## 2.4. Scope of Work / Thesis Objective

Most of the delay lines used in conventional DPWM circuits are designed using a full custom design methodology. As explained in the previous section, the full custom design methodology is extremely slow and expensive compared to register-transfer level (RTL) based designs. RTL based designs are technology independent, so the same design can be used with new technologies. The purpose of this work is to propose a fully synthesizable RTL digital delay line. Two different designs for the synthesizable digital delay line are implemented. One design uses the conventional adjustable delay cell calibration mechanism, while the second design uses a proposed calibration technique. The two designs are compared with respect to area, linearity, and response over a frequency range from 50 MHz to 200 MHz. The comparison uses results obtained after synthesis and automatic placement and routing, in order to reflect the actual area and delays. Intel 32nm technology is used in the synthesis process.
# 3. Digital Delay Line Architectures

This chapter introduces the implementation of the fully synthesizable digital delay lines used for generating the DPWM signal. First, two calibration techniques are presented: the conventional approach used in the literature and the proposed technique, which takes a different approach. Second, the implementation of two fully synthesizable digital delay lines is discussed. One delay line uses the conventional calibration technique, while the other uses the proposed technique. The building blocks for both delay line schemes are explained in detail, and a preliminary comparison between the two schemes is provided.

## 3.1. Delay Line Calibration Techniques

The simple delay line consists of successive delay cells with a tap after each cell; a multiplexer is used to select one tap (Figure 26). Delay lines can be used in digitally controlled converters to generate the DPWM signal with the required variable duty cycle, as explained earlier. To achieve n-bit delay line resolution, $2^n$ delay taps are required.

![Digital delay line architecture](image)

Under ideal operating conditions, the overall line delay is equal to the clock period, as shown in Figure 27. This is required in order to have full control of the duty cycle of the generated DPWM signal.

![Ideal delay line diagram](image)

Figure 27: Ideal delay line

Since the delay of the elements varies with process and temperature, the DPWM signal will be generated incorrectly and the required resolution will be lost. With technology scaling, the relative variation between process corners increases. For the Intel 32nm technology used here, the delay differs by a factor of 4X between the fast and the slow corners. This means that if the typical delay of a delay cell in this technology is $d$, the delay of the cell will be $d/2$ in the fast corner and $2d$ in the slow corner. Figure 28 shows the delay variations of delay cells for the fast and the slow process corners. As shown in the figure, the same tap at different process corners results in a different value for the duty cycle. Also, in the fast corner, part of the clock period is not covered by the delay line. Thus, a calibration scheme is used to synchronize the overall line delay to the clock period in order to compensate for both process and temperature variations. Process variations require calibration only at the beginning, while temperature variations require continuous real-time calibration. Voltage variations are divided into two components: spike variations and transient variations (white noise). The calibration scheme is able to account for the spike variations, but transient variations have very high frequencies, which may be higher than the clock frequency of the system, so bulk capacitors are used to remove them. The rest of this section describes the calibration concept of the conventional adjustable cells scheme and the calibration concept of the proposed delay line scheme.

Figure 28: Cells delay at different corners

### 3.1.1. Fixed Number of Tunable Cells

Conventional calibration techniques use a fixed number of adjustable delay cells. In these schemes, variations in the delay cells are compensated for by tuning the delay of each cell individually to synchronize the overall line delay to the clock period. Adjustable delay cells are designed in different ways. A starved buffer technique was proposed in [29], where the current available to switch the buffer output is limited by a series MOS device in the sub-threshold or linear region, as illustrated in Figure 29. In [34], each delay cell consists of two cascaded inverters, at the output of which a set of digitally configurable loads, called shunt capacitors, is connected.

Figure 29: Schematic of starved buffer [29]

A simple delay line with four adjustable cells is shown in Figure 30. Each delay cell has several branches (three in this example) with different numbers of delay elements. If the first path is selected, the cell delay is equal to the delay of one delay element. If the second path is selected, the cell delay is equal to the delay of two delay elements, and so on. Thus, the resolution in this case is the delay of one delay element. A delay element can be a buffer or a group of combined buffers. Using a control word for each delay cell, only one branch is selected through the multiplexer. The control word can be different for each cell, resulting in different delays for the delay line cells. This means that the delay between two successive taps is not always the same, which introduces nonlinearity in the output voltage. The objective of this calibration technique is to select the proper delay for each cell so that the overall line delay is locked to the clock period. This locking mechanism is implemented using a Delay Locked Loop (DLL) with feedback, as explained in detail in the next sections.

Figure 30: Calibration using fixed number of tunable cells

### 3.1.2. Variable Number of Cells

The proposed calibration method uses a variable number of delay cells to synchronize to the clock period. Figure 31 illustrates this concept: the number of cells used to lock to the clock period is large in fast corners and small in slow corners. Since the number of cells locked to the clock period is not fixed, the input duty word must be mapped using the number of cells locked to the clock period. Otherwise, the output DPWM will be generated incorrectly. Consider a system with the following parameters:

- Clock period = 20 ns
- The typical delay of the cell = 1 ns
- The cell delay = 0.5 ns in the fast corner
- The cell delay = 2 ns in the slow corner
- The required duty cycle = 50%

In order to achieve a 50% duty cycle in the typical corner, the tap after delay cell number 10 should be selected. However, in the fast and the slow corners, this tap generates a duty cycle of 25% and 100%, respectively. Thus, a mapping is required in order to select the proper tap depending on the process corner. For this example, tap number 20 should be selected in the fast corner and tap number 5 should be selected in the slow corner.
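The mapping in this example can be checked numerically. The sketch below is a simplified illustration rather than the mapping logic of the proposed design: it assumes that the number of locked cells is the clock period divided by the cell delay and that the tap is chosen by scaling the requested duty cycle by that count. The function names are hypothetical.

```python
CLOCK_PERIOD_NS = 20.0
CELL_DELAY_NS = {"typical": 1.0, "fast": 0.5, "slow": 2.0}


def locked_cells(clock_period_ns: float, cell_delay_ns: float) -> int:
    """Number of cells whose total delay spans one clock period."""
    return int(clock_period_ns // cell_delay_ns)


def mapped_tap(duty: float, cells: int) -> int:
    """Tap that yields the requested duty cycle for the given locked cell count."""
    return round(duty * cells)


def unmapped_duty(tap: int, cell_delay_ns: float, clock_period_ns: float) -> float:
    """Duty cycle obtained if a fixed tap is used regardless of the corner."""
    return min(tap * cell_delay_ns / clock_period_ns, 1.0)


for corner, delay in CELL_DELAY_NS.items():
    cells = locked_cells(CLOCK_PERIOD_NS, delay)
    print(f"{corner:8s} cells={cells:2d} "
          f"tap for 50%={mapped_tap(0.5, cells):2d} "
          f"duty at fixed tap 10={unmapped_duty(10, delay, CLOCK_PERIOD_NS):.0%}")
```

With the parameters listed above, this gives taps 10, 20, and 5 for the typical, fast, and slow corners, and shows the 25% and 100% errors that a fixed tap 10 would produce.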

![Figure 31: Calibration with variable number of cells](image-url)

## 3.2. Digital Delay Line Building Blocks

This section explains in detail the building blocks and the operating concepts of two digital delay line schemes. The first scheme is designed using the conventional adjustable cells delay line technique, while the second scheme uses the proposed idea of locking a variable number of cells to the clock period. Both schemes are designed using the Verilog Hardware Description Language (HDL) and synthesized using Intel 32nm technology. The Mentor Graphics QuestaSim tool is used for Verilog simulation and for verification at different stages: high-level verification, gate-level verification, and post-layout verification. Synopsys Design Compiler is used in the synthesis process and Synopsys IC Compiler is used for the Automatic Placement and Routing (APR) process.

### 3.2.1. Conventional Adjustable Cells Delay Line

Figure 32 shows the block diagram of the conventional adjustable cells delay line. The scheme is very simple containing the delay line itself, a multiplexer, and a controller block.

![Figure 32: Conventional adjustable cells delay line](image)

The delay control block is responsible for adjusting the delay of all delay cells individually to lock the overall line delay to the clock period. The input digital word selects the required tap through a multiplexer to generate the DPWM signal with the required duty cycle. The delay line consists of a fixed number of successive tunable delay cells.

A typical delay cell shown in Figure 33 is used; the delay of the cell can be adjusted with the ratio of 1:3. The cell has 3 parallel branches to connect the input to the output. Two control bits for every delay cell coming from the controller are used in a thermometer code to select one of the 3 branches using a multiplexer.

The thermometer code works as follows:

- Control = 00 → the first branch is selected → the cell delay = delay of one delay element
- Control = 01 → the second branch is selected → the cell delay = delay of two delay elements
- Control = 11 → the third branch is selected → the cell delay = delay of three delay elements
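The thermometer code above maps directly to a lookup table. The following is a minimal sketch with hypothetical names; the 100 ps element delay is an illustrative assumption, and the code 10 is rejected because it is not a valid thermometer code.

```python
# Valid 2-bit thermometer codes and the number of delay elements they select.
THERMOMETER_TO_ELEMENTS = {0b00: 1, 0b01: 2, 0b11: 3}


def cell_delay_ps(control: int, element_delay_ps: int) -> int:
    """Delay of one tunable cell for a given 2-bit thermometer control word."""
    if control not in THERMOMETER_TO_ELEMENTS:
        raise ValueError(f"{control:02b} is not a valid thermometer code")
    return THERMOMETER_TO_ELEMENTS[control] * element_delay_ps


ELEMENT_DELAY_PS = 100  # assumed delay of one delay element (illustrative only)
for code in (0b00, 0b01, 0b11):
    print(f"Control={code:02b} -> {cell_delay_ps(code, ELEMENT_DELAY_PS)} ps")
```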

![Figure 33: Typical tunable delay cell](image)

In general, the number of paths inside the delay cell is determined by the relative variation between the fast and the slow corners. In the fast corner, all delay cells are set to the longest branch, while in the slow corner, all delay cells are set to the shortest branch. Thus, if the delay in the slow corner is $m$ times its value in the fast corner, the delay cell should comprise $m$ branches.

The delay element is simply a buffer (two successive inverters) as shown in Figure 34. For different clock frequencies, a group of buffers is combined in one delay element in order to achieve locking while maintaining the same resolution.

![Figure 34: Delay element structure](image)

The block diagram of the delay line is illustrated in Figure 35. In this scheme, the delay line is clearly complex, as it contains sophisticated tunable delay cells. This means that the delay line accounts for a large portion of the overall area of this architecture. The signal Control SR coming from the control shift register contains the control words of all delay cells. The selector block selects 2 bits from the Control SR signal for each delay cell. These 2 bits are then used inside each cell to select one branch, as explained earlier. The number of delay cells in the delay line is fixed in this architecture and is determined by the required resolution of the DPWM signal. The number of buffers inside the delay element is then calculated from the switching frequency.

Figure 35: Delay line structure

Figure 36 shows the block diagram of the controller. Using a shift register, the block generates a digital word that tunes the delay line cells to lock to the clock period. The shift register is initialized to all zeros in order to program all delay line cells to the minimum delay. The last two taps (taps) of the delay line are used as inputs to the controller block. The clock edge is compared to the selected taps, tap (n) and tap (n-1), as shown in Figure 37. The overall line delay locks to the clock period when the clock edge falls between the last two taps. As long as the system is not locked (taps ≠ 01), the controller updates the shift register by shifting a one into it. This affects the delay of only one cell in the delay line. The comparison and shift operations are repeated until locking (taps = 01). A signal Up_lim is used to indicate that the maximum delay has been reached.

The controller block is a synchronous block, in which the circuit changes only in response to a system clock edge. However, the input signal *taps* is an asynchronous signal coming from the delay line. A meta-stability problem therefore appears, as the input signal may change during the setup time of the input flip-flop. In a metastable state, the signal hovers between the stable logic states (zero and one) for an indeterminate amount of time and, as a result, requires unexpectedly long resolving times. Meta-stability causes failures and malfunctions in digital systems [35]. To tackle this issue, extra flip-flops are used at the input. These flip-flops act as a synchronizer to avoid meta-stability problems between the delay line taps and the controller block. As defined in [36], "a synchronizer is a device that samples an asynchronous signal and outputs a version of the signal that has transitions synchronized to a local or sample clock". Although this scheme cannot eliminate the meta-stability problem, it minimizes the probability of synchronization failure. The mean time between failures can be estimated using the equation in [37] [38].

Figure 36: Adjustable cells delay line controller

Figure 37: Controller locking operation

The simplest and most common synchronizer used by digital designers is the two-flip-flop synchronizer shown in Figure 38. The first flip-flop samples the asynchronous input signal into the new clock domain. A full clock cycle is then allowed for any meta-stability on the first-stage output to decay before the same clock samples that output into a second-stage flip-flop, so that the second-stage output is intended to be a stable and valid signal. Figure 39 shows the corresponding meta-stability timing diagram.

![Figure 38: Extra Flops for meta-stability](Image)

(1) If the input changes in this window, the output of FF1 will be in the meta-stable state and FF2 will propagate the old value

(2) Hopefully the output of FF1 is settled, so that FF2 is in a stable state

![Figure 39: Meta-stability timing diagram](Image)
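
To give a feel for how strongly the extra resolving time affects the failure rate, the sketch below evaluates a commonly quoted textbook form of the synchronizer MTBF, $\text{MTBF} = e^{t_r/\tau} / (T_w f_c f_d)$. This form is an assumption here and may differ in detail from the equation in [37], [38]. The function name and all numeric values are hypothetical and are chosen only to exercise the formula; they are not measured data for the technology used in this work.

```python
import math


def synchronizer_mtbf_seconds(t_resolve, tau, t_window, f_clock, f_data):
    """Textbook MTBF estimate for a flip-flop synchronizer, in seconds.

    t_resolve: time allowed for meta-stability to resolve (s)
    tau:       regeneration time constant of the flip-flop (s)
    t_window:  width of the vulnerable sampling window (s)
    f_clock:   sampling clock frequency (Hz)
    f_data:    rate of asynchronous input transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clock * f_data)


SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Hypothetical parameters (assumptions, not values from the thesis).
params = dict(tau=20e-12, t_window=30e-12, f_clock=100e6, f_data=100e6)
half_cycle = synchronizer_mtbf_seconds(t_resolve=5e-9, **params)
full_cycle = synchronizer_mtbf_seconds(t_resolve=10e-9, **params)
print(f"MTBF with 5 ns to resolve:  {half_cycle / SECONDS_PER_YEAR:.3e} years")
print(f"MTBF with 10 ns to resolve: {full_cycle / SECONDS_PER_YEAR:.3e} years")
```

Because the resolving time appears in the exponent, each additional clock cycle of waiting multiplies the estimated MTBF by $e^{T/\tau}$, which is why a second flip-flop stage is usually sufficient.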

The number of bits required to control the delay cell using the thermometer code is given by:

$$\text{Cell control word} = (m - 1) \text{ bits}, \quad (16)$$

where $m$ is the adjustment ratio. The size of the shift register is calculated using the following equation:

$$\text{Size of shift register} = (m - 1) \times 2^n + 1 \text{ bits}, \quad (17)$$

where $n$ is the delay line resolution in bits. The extra one bit is for the $Up\_lim$ signal.

Figure 40 shows an example of a shift register for a design with $m = 3$ and $n = 5$, so the delay line has 32 delay cells and each delay cell has two control bits. As illustrated in the figure, the shift register holds the first control bit of every cell, followed by the second control bit of every cell. With this arrangement, if the overall line delay is not locked to the clock period, the controller increases or decreases the delay of the cells in order. In other words, if the overall line delay is less than the clock period, the controller increases the delay of the first cell, then the second, and so on until locking.

![Shift register diagram](image)

Figure 40: Shift register
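
As a quick check of Equations (16) and (17), the following minimal sketch computes the control word and shift register sizes for the $m = 3$, $n = 5$ example. The function names are hypothetical, and `control_bit_owner` assumes the bit ordering described for Figure 40 (the first control bit of every cell, then the second control bit of every cell).

```python
def cell_control_bits(m: int) -> int:
    """Thermometer-coded control bits per tunable cell, Eq. (16)."""
    return m - 1


def shift_register_bits(m: int, n: int) -> int:
    """Shift register size from Eq. (17); the extra bit is Up_lim."""
    return (m - 1) * 2**n + 1


def control_bit_owner(position: int, n: int) -> tuple[int, int]:
    """Map a shift-register position to (cell index, control-bit index).

    Assumes the ordering of Figure 40 and is only meaningful for
    positions below (m - 1) * 2**n, since the last bit is Up_lim.
    """
    cells = 2**n
    return position % cells, position // cells


m, n = 3, 5
print(cell_control_bits(m))       # 2
print(shift_register_bits(m, n))  # 65
print(control_bit_owner(33, n))   # (1, 1)
```
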
The system linearity depends on how the control signals at the output of the shift register are used to adjust the cells’ delay. From a linearity perspective, the worst case is to group all the cells with high (or low) delays at the beginning of the delay line. The ideal arrangement is to set half of the delay cells to a low delay value and the other half to a high delay value, as proposed in [30]. Figure 41 illustrates this idea with four cells. In scenario 1 (blue), locking is achieved by increasing the delay of the first cell and then the second. In scenario 2, locking is achieved by increasing the delay of the first cell and then the third. As the number of delay line cells grows, scenario 2 gives better linearity, as shown in Figure 42.

![Figure 41: Delay line locking scenarios](image1)

![Figure 42: Linearity for different scenarios](image2)
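
The effect of the two scenarios can be illustrated numerically. The sketch below is not taken from the implementation; it assumes hypothetical cell delays of 1 and 2 units and measures the largest deviation of any tap from the ideal straight line for the four-cell example of Figure 41.

```python
from itertools import accumulate


def max_linearity_error(cell_delays):
    """Largest deviation of any tap delay from the ideal straight line."""
    taps = list(accumulate(cell_delays))   # cumulative delay at each tap
    step = taps[-1] / len(taps)            # ideal delay per tap
    return max(abs(t - step * (k + 1)) for k, t in enumerate(taps))


LOW, HIGH = 1, 2  # hypothetical delay units, for illustration only

scenario_1 = [HIGH, HIGH, LOW, LOW]  # first and second cells increased
scenario_2 = [HIGH, LOW, HIGH, LOW]  # first and third cells increased

print(max_linearity_error(scenario_1))  # 1.0
print(max_linearity_error(scenario_2))  # 0.5
```

Under these assumed delays, spreading the adjusted cells along the line halves the worst-case deviation, which matches the trend described for Figure 42.
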
3.2.2. Proposed Delay Line

The block diagram of the proposed scheme is shown in Figure 43. The block consists of three parts: a delay line, a controller, and a mapping block. Operation has two phases, the locking phase and the mapping phase. In the locking phase, the controller determines the number of delay cells required to lock to the clock period. The mapping phase generates the output signal by mapping the input digital word to the correct tap, using the number of cells determined in the locking phase. Mapping is required in this scheme because the number of cells locked to the clock period differs across process corners and temperatures. The input digital word that generates the DPWM signal must therefore be mapped according to the number of cells that are locked to one clock period.

Figure 43: Proposed scheme block diagram
Similar to the previous scheme, the delay line consists of a number of successive delay cells as illustrated in Figure 44. But in this scheme, delay cells are not tunable. So, the delay cell is very simple compared to the tunable cell that contains many different branches.

![Figure 44: Delay line structure](image)


Figure 45 shows the delay cell structure. Each cell contains only one branch with one or more buffers. The number of buffers combined in each cell is determined by the clock frequency and the required resolution. For lower frequencies, more buffers are combined in one cell in order to maintain the same resolution.

![Figure 45: Delay cell structure](image)


The number of cells in this scheme is determined from the fastest corner, in which the cells’ delay is minimal and the maximum number of cells is required for locking. This is a worst-case design that guarantees locking for all process corners and temperature variations. For the fast process corner, all cells are used for locking and for generating the output DPWM signal, while for slower corners some cells are not used at all.

The controller block diagram is shown in Figure 46. At the beginning, the reset signal Reset is high, so the tap_sel signal selects the first tap through the calibration multiplexer MUX 1. Using a flip-flop, the selected tap is compared to the clock edge, and the up_down signal is generated to decide whether more or fewer cells are required to lock to the clock period. Using the new tap_sel value, a new tap is selected and the same procedure is repeated until locking. Calibration continues even after locking in order to track temperature variations. Extra flip-flops are used to handle meta-stability issues. The controller updates its value every clock cycle, which makes the calibration time very short compared to schemes that update the value only once every several clock cycles.

![Controller Diagram](image)

Figure 46: Proposed delay line controller
The locking operation is done over only half of the clock period. This simplifies the process considerably and avoids ambiguity for the controller. Locking also requires fewer clock cycles, resulting in a faster locking operation. Figure 47 and Figure 48 show the timing diagram of the controller input signal tap at different cycles until locking.

![Figure 47: Locking timing diagram](image)

![Figure 48: Locking timing diagram](image)
When the value of the tap signal is zero, the up_down signal is up and the multiplexer MUX 2 adds one to tap_sel, selecting the next delay line tap. This operation is repeated until the tap signal is one, meaning that the delay of the line at this tap is more than half the clock period, as illustrated in Figure 47. At this instant, the up_down signal is down and MUX 2 subtracts one from tap_sel to select the previous tap, as illustrated in Figure 48. Once locking is achieved, up_down keeps toggling between up and down.
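
The locking behavior can be sketched with a small behavioral model. This is an illustration rather than the RTL: it assumes that tap k is delayed by k identical cells, that the sampled tap reads one whenever its delay exceeds half the clock period, and that tap_sel is clamped to the available taps. It also ignores the latency of the synchronizer flip-flops. The function name is hypothetical.

```python
def simulate_locking(cell_delay_ps, clock_period_ps, n_cells, cycles):
    """Return the tap_sel value seen in each clock cycle."""
    half_period = clock_period_ps / 2
    tap_sel = 1                  # after reset, the first tap is selected
    history = []
    for _ in range(cycles):
        history.append(tap_sel)
        # Sampled tap: 1 if the selected tap is later than half a period.
        tap = 1 if tap_sel * cell_delay_ps > half_period else 0
        if tap == 0:             # up: select the next tap
            tap_sel = min(tap_sel + 1, n_cells)
        else:                    # down: select the previous tap
            tap_sel = max(tap_sel - 1, 1)
    return history


# 100 MHz clock, 256 cells, assumed 40 ps cells (fast) and 160 ps cells (slow).
fast = simulate_locking(40, 10_000, 256, cycles=200)
slow = simulate_locking(160, 10_000, 256, cycles=200)
print(sorted(set(fast[-4:])))  # [125, 126]
print(sorted(set(slow[-4:])))  # [31, 32]
```

In both cases tap_sel climbs by one tap per cycle and then toggles between two adjacent values, which is the locked condition described above.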

Since the number of cells locked to the clock period varies from one process corner to another and also with temperature, the input digital word cannot be used directly to select a tap from the delay line to generate the DPWM signal. The input digital word must first be mapped with respect to the number of cells used in the locking operation in order to generate a correct DPWM signal. The mapping block maps the input digital word to a calibrated selection word Cal_sel using this equation:

\[
\text{Cal\_sel} = \left( \frac{\text{tap\_sel}}{\text{total number of cells}/2} \right) \times \text{Digital Word}. \quad (18)
\]

The tap_sel value represents the number of cells locked to half the clock period, so it is divided by half the total number of cells in the delay line. In the design, the total number of cells is a power of two, so the division is done by shifting the tap_sel signal. Using the output multiplexer MUX 2, the mapped value Cal_sel selects the output as shown in Figure 49.
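
A minimal sketch of the mapping equation follows. The function name is hypothetical, and the sketch multiplies before shifting to keep precision in integer arithmetic; the order of operations in the actual mapper is not specified here, so this is an assumption.

```python
def map_digital_word(digital_word, tap_sel, total_cells):
    """Cal_sel = tap_sel / (total_cells / 2) * digital_word, using a shift."""
    half = total_cells // 2
    if half <= 0 or half & (half - 1):
        raise ValueError("total_cells / 2 must be a power of two")
    shift = half.bit_length() - 1          # log2(total_cells / 2)
    return (tap_sel * digital_word) >> shift


# 256-cell line: tap_sel = 128 means every cell is needed (no scaling),
# while tap_sel = 32 compresses the input range by a factor of four.
print(map_digital_word(200, tap_sel=128, total_cells=256))  # 200
print(map_digital_word(255, tap_sel=32, total_cells=256))   # 63
```

With a small tap_sel, several neighboring input words produce the same Cal_sel, so they select the same tap.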

The linearity of this scheme is very good compared to the conventional tunable cells delay line. Because the cells are not tunable and have only one branch, all cells are identical except for variations due to placement and routing. Also, there is no redundancy within the delay cell, although some cells are unused in some process corners. Instead of a worst-case design that guarantees locking under all conditions and for all corners, statistical analysis can be used to optimize the number of cells in this architecture. This statistical analysis is explained in more detail in the future work section.
Figure 49: Mapping block diagram
3.3. Preliminary Comparison

The block diagrams of both schemes show that the conventional adjustable cells delay line scheme is simpler at the block level: it does not need a mapping block because the number of cells is fixed, and it uses only one multiplexer while the proposed scheme uses two. This extra circuitry in the proposed scheme translates to extra area, but the delay line itself is very simple compared to the conventional delay line. The controller block in the proposed scheme is also simpler, while the shift register in the conventional scheme is an overhead. This information alone is therefore not enough to decide which scheme is better from an area perspective, so synthesis results for both schemes are compared in the following section. From a linearity perspective, the proposed scheme has the best linearity that could be achieved because all delay cells are identical except for process variations. Table 4 summarizes the initial comparison between the two schemes.

Table 4: Preliminary comparison

<table>
<thead>
<tr>
<th>Conventional adjustable cells delay line</th>
<th>Proposed delay line</th>
</tr>
</thead>
<tbody>
<tr>
<td>➢ Complex delay cell</td>
<td>➢ Simple delay cell</td>
</tr>
<tr>
<td>➢ Worse linearity</td>
<td>➢ Better linearity</td>
</tr>
<tr>
<td>➢ No mapper is required</td>
<td>➢ Requires a mapper and extra Multiplexer</td>
</tr>
</tbody>
</table>
4. Results and Comparison

This section discusses the implementation and results of both delay line schemes. First, for a given clock frequency, post-synthesis results are compared in order to decide which scheme is better from an area perspective. Then, further simulations are run for the better scheme over a range of frequencies in order to test scalability, linearity, and the behavior of the system at different frequencies.

4.1. Synthesis Results and Comparison for 100 MHz

Synthesis is the process of translating, or mapping, the RTL code into real gates using a specific technology. Intel 32nm technology is used in this project. RTL simulation only performs functional verification, without any information about the actual delays or area of the system. The Synopsys Design Compiler tool is used for synthesis. After synthesizing the design using the technology files and generating the actual gates that perform the function described by the RTL code, the tool generates reports for the design, such as area, delay, and power reports. These are not the final reports that must meet the system constraints and specifications, because routing has not been done yet. However, the information provided by synthesis gives an indication of the total area, which is enough to compare the two schemes from an area point of view.

The design of both schemes is parameterized. The parameters set the number of buffers in the delay element, the number of elements in the delay cell, and the number of cells in the delay line. To target a different clock frequency, only these parameters are changed; the RTL code itself does not need to be modified.

For a fair area comparison between the schemes, the maximum delay that can be achieved by both schemes should be the same. The longest delay of the proposed scheme is obtained by using all the cells, while the longest delay of the conventional adjustable cells scheme is obtained when the longest branch inside each cell is chosen. Table 5 shows post-synthesis results for the total area and the area distribution of both schemes at a 100 MHz clock frequency. The proposed scheme has 256 delay cells, and each cell is two buffers. The conventional scheme has 64 tunable cells; each tunable cell has 4 branches, so it can be programmed with the ratio 1:4, and each delay element is two buffers. The maximum delay ($m1$ and $m2$) achieved by both schemes is the same, as illustrated in the equations:

\[
m1 = 256 \text{ cells} \times 2 \text{ buffers} = 512 \text{ buffers}, \quad (19)
\]

\[
m2 = 64 \text{ cells} \times 4 \text{ elements} \times 2 \text{ buffers} = 512 \text{ buffers}. \quad (20)
\]

Post-synthesis results show that the proposed scheme has a smaller total area, although it has an extra multiplexer used for calibration and an extra block used for mapping. The calibration multiplexer has a 2-bit output, so it has double the area of the output multiplexer. In the conventional scheme, most of the area is consumed by the delay line itself because of the complex tunable delay cell: only one branch of each delay cell is used at a time, so the other branches are redundant. The large shift register used in the controller adds further area overhead.

Table 5: Post synthesis results at 100 MHz

<table>
<thead>
<tr>
<th>DDL parameters</th>
<th>Proposed Scheme</th>
<th>Conventional Scheme</th>
</tr>
</thead>
<tbody>
<tr>
<td>Number of taps</td>
<td>256</td>
<td>64</td>
</tr>
<tr>
<td>Total area (μm²)</td>
<td>1337</td>
<td>2330</td>
</tr>
<tr>
<td>Area distribution: Line</td>
<td>24.7 %</td>
<td>52.4 %</td>
</tr>
<tr>
<td>Area distribution: Output MUX</td>
<td>14.9 %</td>
<td>3 %</td>
</tr>
<tr>
<td>Area distribution: Cal MUX</td>
<td>30.3 %</td>
<td></td>
</tr>
<tr>
<td>Area distribution: Controller</td>
<td>9.8 %</td>
<td>46.6 %</td>
</tr>
<tr>
<td>Area distribution: Mapper</td>
<td>20.3 %</td>
<td></td>
</tr>
</tbody>
</table>
In [29], a complete DC-DC converter was implemented to test and evaluate the low resolution feedback and digital PWM circuit. The circuits were fabricated on an integrated circuit, along with output switches to create a down converter. The fabrication was done on a 0.6 µm CMOS process. The intended application was a 5mW, 3V in, 1V out DC-DC converter, and the output switches were sized for this load. In order to give about 10mV resolution for the output voltage, a 256 tap delay line was used. The reported chip area was 2.3mm × 2.4mm, and the contribution of the delay line and the multiplexer was about 1mm × 0.5mm.

A digital PWM controller was designed in [27] and implemented in a standard 0.5µm CMOS process. The chip design was described in Verilog HDL. Synopsys synthesis and timing verification tools were used to reduce the design to standard-cell gates. Less than 0.2 mm² was taken by the delay-line A/D converter. The total active chip area was less than 1 mm².

These results cannot be directly compared to the results of the work done in this thesis because the process technology is totally different. So, our approach to have a fair comparison between our proposed delay line scheme and the conventional delay line was to design both schemes using the same process technology (Intel 32nm technology).

### 4.2. Design Example

In a real system design, the system specifications state the required resolution of the delay line, the clock frequency, and the design technology (used in the synthesis process). The design steps for the two schemes are different. The following example walks through the design steps of both schemes:

Design example of a digital delay line with the following specifications:

- 100 MHz clock frequency
- 6 bits resolution
- 32nm Intel technology, which has the following data:
  - Buffer delay in the fast corner ≈ 20 ps.
  - Buffer delay in the slow corner ≈ 80 ps.

### 4.2.1. Conventional Delay Line Design Example

For the conventional delay line design, in order to provide 6-bit resolution, the delay line is designed with \(2^6\) taps. The number of cells in the delay line and the multiplexer size are given by the following equations:

\[
\text{Number of delay cells} = 2^6 = 64, \quad (21)
\]
\[
\text{Multiplexer size} = 64 \text{ to } 1. \quad (22)
\]

For the given technology, the ratio between the delay in the fast and the slow corners is approximately 4. Thus, the adjustment ratio is:

\[
\text{Adjustment ratio} = 4. \quad (23)
\]

The adjustment ratio is the number of branches inside the tunable delay cell. The first branch has one delay element, the second has two, the third has three, and the fourth has four.

For the fast corner, the longest branch is selected for all delay cells and the maximum number of successive delay elements is used. The maximum number of delay elements in the delay line is given by:

\[
\text{Number of elements in the longest path} = 64 \times 4 = 256. \quad (24)
\]

In order to lock to the clock period, the maximum delay achieved by the line should cover the 10 ns clock period, so the delay of the delay element should be:
\[
\text{Delay of the delay element} \geq 10 \text{ ns}/256, \quad (25)
\]
\[
\text{Delay of the delay element} \geq 39 \text{ ps}. \quad (26)
\]

Since the buffer delay in the fast corner is 20 ps, the required number of buffers is rounded up, and each delay element is designed to contain two buffers:

\[
\text{Number of buffers combined in the delay element} = \left\lceil \frac{39}{20} \right\rceil = 2. \quad (27)
\]

In the worst case (the fast corner), the delay of the element and the overall line delay are given by:

\[
\text{Delay of the delay element} = 2 \times 20 \text{ ps} = 40 \text{ ps}, \quad (28)
\]
\[
\text{The overall line delay} = 40 \text{ ps} \times 4 \times 64 = 10.24 \text{ ns}. \quad (29)
\]

The overall line delay is larger than one clock period, so locking is guaranteed for all process corners.
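
The design steps for the conventional scheme can be collected into a short calculation. The sketch below is only an illustration of Equations (21) to (29): the function and key names are hypothetical, delays are handled as integer picoseconds, and every division is rounded up, which is an assumption consistent with the rounding used in the example.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def design_conventional_line(clock_period_ps, resolution_bits,
                             fast_buffer_ps, slow_buffer_ps):
    cells = 2 ** resolution_bits                                 # Eq. (21)
    adjustment_ratio = ceil_div(slow_buffer_ps, fast_buffer_ps)  # Eq. (23)
    longest_path = cells * adjustment_ratio                      # Eq. (24)
    # Eqs. (25)-(27): fewest buffers per element so that the longest
    # path still covers one clock period in the fast corner.
    buffers = ceil_div(clock_period_ps, longest_path * fast_buffer_ps)
    line_delay_ps = buffers * fast_buffer_ps * longest_path      # Eq. (29)
    return {
        "cells": cells,
        "adjustment_ratio": adjustment_ratio,
        "elements_in_longest_path": longest_path,
        "buffers_per_element": buffers,
        "fast_corner_line_delay_ps": line_delay_ps,
    }


print(design_conventional_line(10_000, 6, fast_buffer_ps=20, slow_buffer_ps=80))
```

For the 100 MHz, 6-bit example this reproduces 64 cells, an adjustment ratio of 4, two buffers per element, and a 10240 ps (10.24 ns) fast-corner line delay.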

### 4.2.2. Proposed Delay Line Design Example

For the proposed delay line design, the first step is to determine the number of cells in the delay line. To achieve 6-bit resolution, \(2^6\) different steps are needed. The slow corner uses the minimum number of cells in the locking operation, and the system must still guarantee 6-bit resolution in every corner, so \(2^6\) cells are used in the slow corner. The total number of cells in the proposed scheme is then scaled by the ratio between the buffer delays in the slow and the fast corners:

\[
\text{Number of cells} = 2^6 \times \frac{\text{slow delay}}{\text{fast corner delay}} = 64 \times \frac{80}{20} = 256. \quad (30)
\]
This means that for the fast corner, the maximum number of cells (256) is used to lock to the clock period. Since the number of delay line taps is 256, the output multiplexer and the calibration multiplexer have the same number of inputs, but the calibration multiplexer has a 2-bit output, so it has double the area of the output multiplexer:

$$\text{Multiplexers size} = 256 \text{ to } 1. \quad (31)$$

In order to lock to the clock period, the maximum delay achieved by the line should cover the 10 ns clock period, so the delay of the delay cell should be:

$$\text{Delay of the delay cell} \geq 10 \text{ ns}/256, \quad (32)$$

$$\text{Delay of the delay cell} \geq 39 \text{ ps}. \quad (33)$$

Since the buffer delay in the fast corner is 20 ps, the required number of buffers is rounded up, and the delay cell is designed to contain two buffers:

$$\text{Number of buffers combined in the delay cell} = \left\lceil \frac{39}{20} \right\rceil = 2. \quad (34)$$

In the worst case (the fast corner), the delay of the cell and the overall line delay are given by:

$$\text{Delay of the delay cell} = 2 \times 20 \text{ ps} = 40 \text{ ps}, \quad (35)$$

$$\text{The overall line delay} = 40 \text{ ps} \times 256 = 10.24 \text{ ns}. \quad (36)$$

The overall line delay is larger than one clock period, so locking is guaranteed for all process corners.
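
The corresponding calculation for the proposed scheme is sketched below, following Equations (30) to (36). As before, the names are hypothetical, delays are integer picoseconds, and divisions are rounded up. The select-word width is an added convenience derived from the tap count and is not a figure quoted in the text.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def design_proposed_line(clock_period_ps, resolution_bits,
                         fast_buffer_ps, slow_buffer_ps):
    # Eq. (30): cells needed so the slow corner still sees 2**bits steps.
    cells = 2 ** resolution_bits * ceil_div(slow_buffer_ps, fast_buffer_ps)
    # Eq. (31): both multiplexers have one input per tap.
    mux_select_bits = (cells - 1).bit_length()
    # Eqs. (32)-(34): fewest buffers per cell covering one period when fast.
    buffers = ceil_div(clock_period_ps, cells * fast_buffer_ps)
    line_delay_ps = buffers * fast_buffer_ps * cells             # Eq. (36)
    return {
        "cells": cells,
        "mux_select_bits": mux_select_bits,
        "buffers_per_cell": buffers,
        "fast_corner_line_delay_ps": line_delay_ps,
    }


print(design_proposed_line(10_000, 6, fast_buffer_ps=20, slow_buffer_ps=80))
```

For the same specifications this gives 256 cells of two buffers each and a 10240 ps fast-corner line delay, matching the total of 512 buffers used in the area comparison.
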
4.3. Proposed Scheme Analysis

More simulations are done for the proposed scheme to test its scalability with frequency and also to check its linearity for different process corners. Synthesis results for the area and the area distribution are included in Table 6 for an 8-bit resolution delay line. To achieve the same resolution, more buffers are combined in one delay cell for lower frequencies. The area contribution of the delay line itself increases for lower frequencies as a result of the larger delay cell. As long as the delay line resolution is the same, the only difference between multiple frequencies is the number of buffers combined together in one delay cell.

Table 6: Proposed scheme synthesis results for multiple frequencies

<table>
<thead>
<tr>
<th>Comparison parameters</th>
<th>Clock Frequency</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>50 MHz</td>
</tr>
<tr>
<td>Buffers combined in one cell</td>
<td>4</td>
</tr>
<tr>
<td>Total area (μm²)</td>
<td>1675</td>
</tr>
<tr>
<td>Area distribution</td>
<td></td>
</tr>
<tr>
<td>Delay Line</td>
<td>39.5 %</td>
</tr>
<tr>
<td>Output MUX</td>
<td>11.9 %</td>
</tr>
<tr>
<td>Calibration MUX</td>
<td>24.7 %</td>
</tr>
<tr>
<td>Controller</td>
<td>7.8 %</td>
</tr>
<tr>
<td>Mapper</td>
<td>16.1 %</td>
</tr>
</tbody>
</table>

In order to be able to monitor the linearity at different frequencies, the output delay of the delay line should be multiplied by a factor representing the frequency difference.

Figure 50 and Figure 51 show the linearity performance for multiple frequencies at the slow and fast corners respectively. These results are obtained after the Automatic Placement and Routing (APR) process. The horizontal axis represents the input digital word before calibration. The figures show the delay of the 50 MHz simulation, the delay of the 100 MHz simulation multiplied by a factor of 2, and the delay of the 200 MHz simulation multiplied by a factor of 4. The linearity is better for lower frequencies because more buffers are combined in one cell, so some of the random variations of the cells’ delay cancel each other. For the slow corner, fewer delay cells are used to lock to the clock period than for the fast corner, so many input words are mapped to the same calibrated word and hence the same output delay, as shown in Figure 50. For the fast corner, a larger number of cells is locked to the clock period, and hence different input words result in different output delays, as shown in Figure 51.
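
The frequency scaling used to overlay the curves can be written as a one-line normalization. The sketch below is illustrative only: the function name is hypothetical and the delay values are made up to show the scaling, not taken from the simulations.

```python
def normalize_to_reference(delays_ps, clock_mhz, reference_mhz=50):
    """Scale delays measured at clock_mhz onto the reference clock period."""
    factor = clock_mhz / reference_mhz   # 2 for 100 MHz, 4 for 200 MHz
    return [d * factor for d in delays_ps]


# Hypothetical output delays for the same three input words.
print(normalize_to_reference([40, 80, 120], clock_mhz=100))  # [80.0, 160.0, 240.0]
print(normalize_to_reference([20, 40, 60], clock_mhz=200))   # [80.0, 160.0, 240.0]
```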

Some care is needed during placement in order to achieve better linearity. For example, the delay line cells should be placed next to each other so that the delays of the cells are matched as closely as possible.

![Figure 50: Linearity for multiple frequencies at the slow corner](image-url)
Figure 51: Linearity for multiple frequencies at the fast corner
5. Conclusion and Future Work

5.1. Summary and Conclusion

This work introduces a new scheme for a fully synthesizable digital delay line used in digitally controlled voltage regulators. A conventional adjustable cells delay line scheme is also implemented in the same technology in order to obtain a fair comparison. The proposed delay line scheme performs calibration by using a different number of delay cells, while the conventional adjustable cells scheme uses a fixed number of tunable delay cells and calibrates by adjusting the delay of each cell. Both schemes are parameterized in order to be suitable for a wide range of frequencies. The work also compares the schemes on the key delay line specifications: linearity, area, complexity, and compensation for process and temperature variations, for multiple clock frequencies. Post-synthesis results show that the proposed scheme is better from both area and linearity perspectives. The area and linearity of the proposed scheme are discussed for multiple frequencies and different process corners. Because an RTL design methodology is used, the design is technology independent, so the same design can be used with different technologies.

5.2. Future work

The proposed delay line scheme is designed with a worst-case methodology, meaning that the number of cells in the delay line is determined from the fastest corner. A larger number of cells is therefore used in order to guarantee that 100% of the fabricated chips have an overall line delay greater than one clock period. But if the percentage of chips that fall in this extreme corner is very small, the system is overdesigned, and the overhead is the large area of the extra cells that are only used in this corner. Instead of this worst-case design methodology, a statistical analysis of the behavior of the given technology can be performed. From this analysis, the designer learns the relation between the number of cells used in the delay line and the percentage of the output chips that cover a complete clock period, and can then trade off area against yield. Also, delay lines are used in different applications. The proposed delay line can be used in these applications and compared to existing designs.

5.3. Publications

A paper is to be published out of this work:

- “Synthesizable Delay Line architectures for digitally controlled voltage regulators,” accepted at the International System-on-Chip conference (SOCC), NY, USA, Sept. 2012.
Bibliography


[27] Aleksandar Prodic Benjamin J. Patella, "High-Frequency Digital PWM Controller IC for DC–DC Converters," IEEE transactions on power electronics,


