# Regenerative Breaking: Recovering Stored Energy from Inactive Voltage Domains for Energy-efficient Systems-on-Chip

Ali Najafi, Jacques C. Rudell, and Visvesh Sathe Department of Electrical Engineering, University of Washington, Seattle, USA anajafi@uw.edu, jcrudell@uw.edu, sathe@uw.edu

# ABSTRACT

Modern Systems-on-Chip (SoCs) frequently power off individual voltage domains to save leakage power in applications ranging from large-scale heterogeneous computing to ultra-low-power IoT systems. However, the considerable energy stored in the capacitance of a powered-off domain is lost through leakage. In this paper, we present an approach that leverages existing voltage regulators to recover this energy from the disabled voltage domain back into the supply, using a low-overhead, all-digital runtime control system. Simulation experiments in an industrial 65nm CMOS process indicate that over 90% of the stored energy can be recovered across operating voltages from 0.4V to 1V.

### **Categories and Subject Descriptors**

B.7.1 [Hardware]: VLSI


# General Terms

Design

# Keywords

Heterogeneous computing; Energy-harvesting; Voltage regulators; Energy recovery; Buck converters; All-digital control.

# **1. INTRODUCTION**

Energy efficiency continues to play a central role in determining compute performance in large-scale computing systems [1,2,3] and in dictating the feasibility of many ultra-low-power applications [4,5]. Continued growth in computational performance is predicated upon advances in energy efficiency [6].

A variety of design techniques over the past decade have led to significant progress in energy efficiency. Dynamic Voltage and Frequency Scaling (DVFS), in particular, has emerged as the most effective low-power design approach, maximizing the achievable reduction in dynamic and static power while minimizing the system-level performance impact. The need for continued power reduction has led to increasingly aggressive use of DVFS at finer spatial and temporal resolution.

An emerging trend in low-power design is the implementation of fine-grained DVFS through Integrated Voltage Regulation (IVR), which provides an independent, fine-grained voltage regulator for each power domain in the system [7,8]. Fine-grained DVFS allows each power domain to operate at its own optimal supply voltage rather than being constrained to the maximum supply voltage required across all power domains. Furthermore, reduced parasitics enable a fast transient response to supply-droop events and allow rapid scaling of the supply voltage to meet domain performance needs. Future SoCs are expected to employ IVR widely to leverage DVFS more effectively.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

*ISLPED '16*, August 08-10, 2016, San Francisco Airport, CA, USA © 2016 ACM. ISBN 978-1-4503-4185-1/16/08 \$15.00 DOI: http://dx.doi.org/10.1145/2934583.2934621

Another emerging trend, driven by power constraints and technology parameters, is the increasingly heterogeneous nature of modern SoCs, which leverage accelerators to perform *specialized* tasks efficiently [3] for relatively short durations. One consequence of this trend is that these duty-cycled domains transition frequently between active and sleep modes. Similarly, ultra-low-power systems such as sensors and IoT devices are commonly duty-cycled [4] to achieve energy efficiency in a leakage-dominated environment.



Figure 1. (a) System with one or more independently regulated voltage domains for fine-grain supply-voltage control (b) Energy delivered by the supply during wakeup, active and sleep modes.



Figure 2. Simplified schematic of (a) A Buck converter driving a voltage domain in active mode (b) Reversing power flow from the voltage domain (load) to the supply by operating the same regulator as a Boost converter.

Figure 1 illustrates the energy delivered by the supply to a voltage domain as it transitions from sleep mode (SLEEP) to active operation and back to SLEEP. Before the domain can perform any computation, a sizable amount of energy must first be delivered to charge its output capacitance (both implicit and explicit). In the active mode, the supply continues to deliver energy to the domain as dynamic and static losses are incurred during computation. In conventional systems, however, once the domain returns to SLEEP, most or all of this stored "overhead" energy is lost to leakage. The significance of this wasted overhead energy depends on the durations of the active and sleep modes. When the domain performs limited computation before returning to SLEEP (as in duty-cycled energy-harvesting systems), the wasted overhead energy can dominate the overall energy [4, 12]. In IVR-enabled voltage domains for accelerators, a significant amount of output capacitance is required to ensure regulator stability, even in converters with high switching frequencies [9]. Voltage domains must then remain in SLEEP long enough to offset the substantial energy overhead of charging this output capacitance, which limits the achievable energy savings. As more numerous, specialized accelerators [10, 11] are used to address energy-efficiency challenges in future IVR-enabled systems, this minimum-sleep-duration problem will become increasingly acute.

In this paper, we propose a novel technique, *regenerative breaking (RB)*, that leverages two emerging trends, fine-grained regulation and duty-cycled voltage domains, to address the challenge of wasted overhead energy. Instead of allowing the stored charge in a regulated voltage domain to leak away during *SLEEP*, RB reverses the flow of charge from the voltage domain through the IVR system back to the supply or to other active domains, thereby recovering the delivered energy.

The idea of recovering charge from voltage domains was first proposed in [12] in the context of switched-capacitor circuits. During recovery, capacitors were connected first to the voltage domain and subsequently to the supply over a sequence of capacitor ratios. However, switched-capacitor circuits, with their discrete voltage-conversion ratios, are limited in their ability to recover energy efficiently; [12] reports an efficiency of 67%. Furthermore, the switched-capacitor implementation requires programmable capacitor banks. In contrast, our proposed RB approach leverages existing Buck converters to achieve recovery at significantly higher efficiencies without placing constraints on the regulator design. The contributions of this paper are as follows: (1) we propose a novel circuit architecture that efficiently recovers energy from a voltage domain into the supply using pre-existing voltage-regulator circuits, enhancing energy efficiency in power-gated systems; and (2) we propose a technique for stable and efficient runtime control of energy recovery with a low-overhead implementation.

The remainder of the paper is organized as follows. Section 2 examines the principle of RB and highlights the key challenges facing the approach. Section 3 derives an upper bound on recovery efficiency and proposes a hardware-amenable control law to govern recovery. Section 4 presents the implementation of the recovery circuit, including the system-level runtime control methodology that RB requires to achieve high-efficiency recovery. Finally, Section 5 presents simulation results.

# 2. REGENERATIVE BREAKING (RB)

### 2.1 Principle and Challenges

Figure 2a illustrates a simplified schematic of a Buck converter regulating a voltage domain (also referred to as the "load"). In a Buck converter, the L and C components act as a low-pass filter, efficiently providing an output voltage equal to the DC component of the voltage at V<sub>B</sub>. The DC component of V<sub>B</sub> is determined by the switching duty cycle of the bridge (the NMOS and PMOS transistors in Figure 2) and the constant supply voltage V<sub>DD</sub> [13]. In practice, the inductor L is implemented using off-chip passive components [16] or package inductors [8]. The capacitance *C*<sub>load</sub> comprises the implicit capacitance of the load and a significant amount of explicit capacitance added to stabilize the voltage regulator in both high-performance [8] and ultra-low-power applications [14, 15].

Similarly, a boost converter supplied from $V_{DD}$ controls the duty cycle of its bridge so that the resulting output voltage exceeds $V_{DD}$:

$$V_{load} = \frac{V_{DD}}{D'}, \quad D' = 1 - D \tag{1}$$

where D is the duty cycle, i.e., the fraction of each switching period during which the inductor is connected to $V_{SS}$ through the NMOS device. Comparing Figures 2a and 2b shows that the buck and boost converter topologies are duals of each other: reversing the direction of power flow in one topology yields the other.

RB takes advantage of this duality to enable the regulator to recover energy from the voltage domain. When the load transitions to *SLEEP*, an energy-recovery controller operates the voltage regulator (a Buck converter) as a boost converter. In this arrangement, the load capacitor serves as the input voltage source of the boost converter and the power supply serves as its load, so current flows back from the load to the power supply. Regeneration incurs losses that must be minimized: $I^2R$ dissipation in the resistance of the inductor and the bridge devices (conduction losses), and $CV^2f$ losses in the bridge, its pre-driver, and the control circuitry (switching losses).

An important distinction between a traditional boost converter and RB is that, whereas a boost converter employs a steady input voltage, RB involves a gradually decaying input voltage: as charge is returned to the supply, the load capacitance discharges, reducing the input voltage of the boost converter (Figure 4). Consequently, the duty cycle D, during which the inductor is connected between the load and $V_{SS}$, must be continuously increased to track the evolving $V_{out}/V_{in}$ ratio. A lower-than-optimal duty cycle reduces efficiency due to excessive switching losses in the bridge, or even power transfer out of the supply. An excessive duty cycle drives excess current into the supply, degrading efficiency through increased conduction losses.
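As a rough illustration of how D must evolve, the sketch below evaluates the ideal, lossless boost relation with the roles reversed as in Figure 2b (load capacitor as input, supply as output), so that $V_{DD} = V_{load}/D'$ and therefore $D = 1 - V_{load}/V_{DD}$. The function name, the 1 V supply, and the sample load voltages are illustrative assumptions, not values from this paper.

```python
def ideal_duty_cycle(v_load: float, v_dd: float) -> float:
    """Ideal NMOS on-time fraction D needed to boost v_load up to v_dd.

    Assumes a lossless boost converter in steady state with the load
    capacitor as its input and the supply as its output, so that
    v_dd = v_load / (1 - D), i.e. D = 1 - v_load / v_dd.
    """
    if not 0.0 < v_load <= v_dd:
        raise ValueError("expected 0 < v_load <= v_dd")
    return 1.0 - v_load / v_dd


V_DD = 1.0  # illustrative supply voltage (assumption, not a value from this paper)
for v_load in (1.0, 0.75, 0.5, 0.25, 0.1):
    d = ideal_duty_cycle(v_load, V_DD)
    print(f"V_load = {v_load:.2f} V -> D = {d:.2f}")
```

As $V_{load}$ decays toward zero, D climbs toward 1, which is why the duty cycle must be raised continuously during recovery. Conduction and switching losses shift the required D away from this ideal value, so the formula is only a first-order guide.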

The key to RB, therefore, is the system-level runtime control required to orchestrate efficient energy recovery as the load capacitance continuously discharges. Note that RB is applicable to both buck and boost regulators, although in this paper we only consider repurposing the Buck converter.

### 2.2 System Model and Problem Formulation



Figure 3. Simplified model of typical Buck converter with the bridge transistors and pre-drivers.

In this section, we present a simplified analytical model for RB, and use the model to analyze the load voltage profile, and the incurred energy losses in returning the stored energy to the supply.



Figure 4. Evolution of the duty cycle as V<sub>load</sub> discharges (T = switching period).

As in prior analyses of buck and boost converters [13], we make realistic simplifications in modeling the load voltage $V_{load}$ and the inductor current $I_L$: we assume that $I_L$ rises and falls linearly, and that $V_{load}$ is constant during *a single switching cycle*. These simplifications allow $V_{load}$ and $I_L$ at the beginning of cycle n+1 to be determined from their values at the beginning of the n<sup>th</sup> cycle and the duty cycle D[n] applied to the bridge:

$$V_{load}[n+1] = V_{load}[n] - \frac{Q[n]}{C_{load}} \tag{2}$$

where Q[n] is the total charge drawn from the load capacitor during the n<sup>th</sup> cycle.

$$I_{L}[n+1] = I_{L}[n] + \frac{V_{load}[n]}{L}D[n]T - \frac{V_{DD} - V_{load}[n]}{L}D'[n]T \tag{3}$$

The equations above apply to a converter operating at a fixed frequency, with Pulse-Width Modulation (PWM) of the bridge signals, in Continuous Conduction Mode (CCM). Operating at the boundary between Discontinuous Conduction Mode (DCM) and CCM can sometimes be more advantageous, but we omit this scenario for brevity. The voltages and currents $V_{load}[n]$ and $I_L[n]$, together with the applied duty cycles D[n], determine both the energy recovered in a cycle and the energy lost in that cycle.
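Equations (2) and (3) can be stepped directly. The sketch below advances $V_{load}$ and $I_L$ by one CCM switching cycle, using the trapezoidal charge of Equation (A.1) for Q[n] and the peak current of Equation (A.6). The class and function names and the component values in the example are hypothetical and are not the values of Figure 6.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ConverterParams:
    v_dd: float    # supply voltage V_DD [V]
    l: float       # inductance L [H]
    c_load: float  # load capacitance C_load [F]
    t: float       # switching period T [s]


def step(v_load: float, i_l: float, d: float, p: ConverterParams) -> tuple[float, float]:
    """Advance (V_load, I_L) by one PWM switching cycle in CCM.

    During D*T the inductor sits between the load and V_SS, so I_L ramps up;
    during D'*T it sits between the load and V_DD, so I_L ramps down.
    V_load is treated as constant within the cycle, as assumed in the text.
    """
    d_prime = 1.0 - d
    i_peak = i_l + v_load / p.l * d * p.t                      # I_L'[n], Eq. (A.6)
    i_next = i_peak - (p.v_dd - v_load) / p.l * d_prime * p.t  # I_L[n+1], Eq. (3)
    # Charge drawn from C_load: trapezoidal area under I_L, Eq. (A.1).
    q = ((i_l + i_peak) * d * p.t + (i_peak + i_next) * d_prime * p.t) / 2
    v_next = v_load - q / p.c_load                             # Eq. (2)
    return v_next, i_next


# Hypothetical component values (not the parameters of Figure 6).
params = ConverterParams(v_dd=1.0, l=4.7e-6, c_load=4.7e-6, t=200e-9)
v, i = 0.5, 10e-3
v, i = step(v, i, d=1.0 - v / params.v_dd, p=params)
print(f"V_load = {v:.5f} V, I_L = {i * 1e3:.3f} mA")
```

Choosing $D[n] = 1 - V_{load}[n]/V_{DD}$, as in the example, leaves $I_L$ unchanged from one cycle to the next, while a larger D[n] raises $I_L[n+1]$; this is the lever a recovery controller adjusts each cycle.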

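The energy recovered by the supply in cycle n is

$$E_{rec}[n] = V_{DD}\,\bar{I}_{V_{DD}}[n]\,T = \Delta E_{cap}[n] - E_{loss}[n] \tag{4}$$

where $\bar{I}_{V_{DD}}[n]$ is the average current delivered into the supply during cycle n and $\Delta E_{cap}[n] = \frac{1}{2}C_{load}\left(V_{load}[n]^2 - V_{load}[n+1]^2\right)$ is the energy drawn from the load capacitor.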

The energy losses incurred in recovery, comprising conduction losses in $R_{ind}$ and $R_{sw}$, switching losses incurred in driving the bridge capacitance $C_{sw}$ and the total control-circuitry switching capacitance $C_{control}$, and leakage losses, can be written as:

$$E_{loss,total} = NT(R_{sw} + R_{ind})I_{L,rms}^{2} + N(C_{sw} + C_{control})V_{DD}^{2} + I_{leak}V_{DD}NT \tag{5}$$

where N is the total number of recovery cycles and $I_{L,rms}$ is the rms inductor current over the recovery interval.

# **3. EFFICIENCY OPTIMIZATION**

In this section, we use the analytical model from Section 2.2 to formulate an optimization problem for maximizing the energy efficiency of RB. We identify a heuristic solution that we show to be readily implementable in hardware, with close-to-optimal efficiency.

The efficiency,  $\eta$  of the recovery process can be defined as:

$$\eta = \frac{E_{total} - E_{loss,total}}{E_{total}} = 1 - \frac{E_{loss,total}}{\frac{1}{2}C_{load}V_{load,initial}^2} \tag{6}$$

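where $E_{total} = \frac{1}{2}C_{load}V_{load,initial}^2$ is the energy initially stored on the load capacitance, $E_{loss,total}$ is defined in Equation (A.9), and $V_{load,initial}$ is the load voltage at the beginning of the recovery process.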
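Equations (5) and (6) are simple enough to evaluate directly. The sketch below computes the lumped loss and the resulting efficiency for a hypothetical operating point; the function names and the cycle count, resistances, capacitances, and leakage current are all illustrative assumptions, not the values used in this paper.

```python
def total_loss(n_cycles: int, t: float, r_sw: float, r_ind: float, i_l_rms: float,
               c_sw: float, c_control: float, v_dd: float, i_leak: float) -> float:
    """Lumped recovery loss of Eq. (5) in joules: conduction + switching + leakage."""
    conduction = n_cycles * t * (r_sw + r_ind) * i_l_rms**2
    switching = n_cycles * (c_sw + c_control) * v_dd**2
    leakage = i_leak * v_dd * n_cycles * t
    return conduction + switching + leakage


def recovery_efficiency(e_loss_total: float, c_load: float, v_load_initial: float) -> float:
    """Eq. (6): one minus the loss as a fraction of the initially stored energy."""
    e_total = 0.5 * c_load * v_load_initial**2
    return 1.0 - e_loss_total / e_total


# Hypothetical operating point, chosen only to exercise the formulas.
loss = total_loss(n_cycles=1000, t=200e-9, r_sw=0.5, r_ind=0.1, i_l_rms=10e-3,
                  c_sw=20e-12, c_control=5e-12, v_dd=1.0, i_leak=1e-6)
eta = recovery_efficiency(loss, c_load=4.7e-6, v_load_initial=0.5)
print(f"E_loss = {loss * 1e9:.1f} nJ, eta = {eta:.3f}")
```

Eq. (5) uses a single rms current for the whole recovery and a common $R_{sw}$ for both bridge devices; Appendix A gives the per-cycle refinement.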

Maximizing efficiency for a given amount of recovery requires minimizing losses. The resulting optimization problem (detailed in Appendix A) does not lend itself to an efficient hardware implementation, since storing the optimal sequence of D[n] would be inefficient. We instead develop a more hardware-amenable heuristic solution.

To benchmark the quality of the heuristic solution, we first identify an upper bound on the energy efficiency of the system by assuming $I_L$ to be ripple-free, which minimizes conduction losses for a given average current. Considering the losses incurred in each cycle, the value of $I_L[n]$ that minimizes the loss in the n<sup>th</sup> cycle is (see Appendix B for details):

$$I_{avg,opt} = \sqrt{\frac{(C_{sw} + C_{control})V_{DD}^2 f + I_{leak}V_{DD}}{R_{sw} + R_{ind}}} \tag{7}$$

The optimal ripple-free discharge current therefore does not depend on $V_{load}[n]$ and is *constant* throughout the recovery process. The switching frequency *f* is the only variable design parameter in Eq. 7. In a practical implementation, reducing *f* reduces switching losses at the expense of increased conduction losses due to ripple. In the ripple-free $I_L$ model, however, losses are minimized by setting *f* = 0 (effectively ignoring switching losses). Assuming a ripple-free $I_L$ and ignoring switching losses therefore provides an upper bound on the achievable efficiency.
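Equation (7) balances a fixed power term (switching plus leakage) against $I^2R$ conduction loss: removing a fixed charge at a ripple-free current $I$ costs $(P_{fixed} + RI^2)/I$ per coulomb, which is minimized at $I = \sqrt{P_{fixed}/R}$. The sketch below evaluates Eq. (7) at a few switching frequencies; the function name and every parameter value are hypothetical placeholders.

```python
import math


def optimal_ripple_free_current(c_sw: float, c_control: float, v_dd: float, f: float,
                                i_leak: float, r_sw: float, r_ind: float) -> float:
    """Eq. (7): ripple-free recovery current that minimizes loss per unit charge."""
    fixed_power = (c_sw + c_control) * v_dd**2 * f + i_leak * v_dd  # switching + leakage [W]
    return math.sqrt(fixed_power / (r_sw + r_ind))


# Hypothetical parameters; only the trend with f is of interest here.
for f in (0.0, 1e6, 5e6, 10e6):
    i_opt = optimal_ripple_free_current(c_sw=20e-12, c_control=5e-12, v_dd=1.0, f=f,
                                        i_leak=1e-6, r_sw=0.5, r_ind=0.1)
    print(f"f = {f / 1e6:4.1f} MHz -> I_avg,opt = {i_opt * 1e3:.2f} mA")
```

Because the fixed power grows with f, the optimal current rises with f; at f = 0 only leakage remains, which is the case used for the upper bound in Appendix B.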

Motivated by the optimality of a constant $I_L$ in the ripple-free case, we adopt a constant-$I_L$ control mechanism (now with ripple) to set the bridge duty cycle during recovery.

To quantify the impact of the proposed heuristic on efficiency, we implemented it using our converter model in MATLAB and compared its efficiency to the efficiency upper bound. The resulting recovery efficiency $\eta$ and the upper bound for each initial load voltage are plotted in Figure 5. The constant-current heuristic not only yields a high recovery efficiency of 90%, but also falls within 7% of the theoretical upper bound across domain voltages from 0.4V to 1V.
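The constant-current heuristic can be reproduced in outline with the cycle model of Section 2.2 and the per-cycle losses of Appendix A. The sketch below is not the authors' MATLAB model. It assumes hypothetical component values, regulates the cycle-boundary current $I_L[n]$ (the quantity in Eq. (3)) with an idealized one-cycle choice of D[n], and stops at an arbitrary $V_{load,stop}$ of 20 mV; the function name is likewise illustrative.

```python
from __future__ import annotations


def simulate_constant_current(v_init: float, i_target: float, *, v_dd: float = 1.0,
                              l: float = 4.7e-6, c_load: float = 4.7e-6, f: float = 5e6,
                              r_sw: float = 0.5, r_ind: float = 0.1, c_sw: float = 20e-12,
                              c_control: float = 5e-12, i_leak: float = 1e-6,
                              v_stop: float = 0.02) -> tuple[float, int, float]:
    """Cycle-level sketch of constant-current recovery (all defaults hypothetical).

    Each cycle picks D[n] from Eq. (3) so that I_L[n+1] lands on i_target, an
    idealized model-based stand-in for the hardware control loop, and
    accumulates the per-cycle losses of Eqs. (A.3)-(A.7) with R_N = R_P = R_sw.
    Returns (efficiency per Eq. (6), number of cycles, final V_load).
    """
    if not 0.0 < v_init <= v_dd:
        raise ValueError("expected 0 < v_init <= v_dd")
    t = 1.0 / f
    r = r_sw + r_ind
    v, i = v_init, 0.0  # V_load[1] = V_load,initial and I_L[1] = 0, as in (A.10)
    e_loss, n = 0.0, 0
    while v > v_stop:
        # Solve Eq. (3) for the D[n] that gives I_L[n+1] = i_target, clipped to [0, 1].
        d = (i_target - i + (v_dd - v) * t / l) * l / (v_dd * t)
        d = min(max(d, 0.0), 1.0)
        dp = 1.0 - d
        di1 = v / l * d * t            # current rise during D*T, Eq. (A.5)
        di2 = (v_dd - v) / l * dp * t  # current fall during D'*T, Eq. (A.5)
        i_peak = i + di1               # Eq. (A.6)
        i_next = i_peak - di2          # Eq. (3)
        e_loss += r * d * t * (i**2 + i * di1 + di1**2 / 3)             # Eq. (A.3)
        e_loss += r * dp * t * (i_peak**2 - i_peak * di2 + di2**2 / 3)  # Eq. (A.4)
        e_loss += i_leak * v_dd * t + (c_sw + c_control) * v_dd**2     # Eq. (A.7)
        q = ((i + i_peak) * d * t + (i_peak + i_next) * dp * t) / 2    # Eq. (A.1)
        v -= q / c_load                                                # Eq. (2)
        i = i_next
        n += 1
    eta = 1.0 - e_loss / (0.5 * c_load * v_init**2)  # Eq. (6)
    return eta, n, v


for v0 in (0.4, 0.6, 0.8, 1.0):
    eta, n, v_end = simulate_constant_current(v0, i_target=10e-3)
    print(f"V_init = {v0:.1f} V: eta = {eta:.3f} over {n} cycles (stopped at {v_end:.3f} V)")
```

Energy left on $C_{load}$ below the stop voltage, and in the inductor at the end, is counted neither as recovered nor as lost by Eq. (6). The hardware in Section 4 senses the current at PMOS turn-on rather than at cycle boundaries, so this sketch only illustrates the control idea.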



Figure 5. Efficiency upper bound and proposed method total efficiency vs. Initial load voltage.

# **4. IMPLEMENTATION**

In this section, we outline the design methodology adopted to implement a regulator that supports RB. To illustrate the design of such a system, we consider a converter with the circuit parameters outlined in Figure 6, consistent with prior demonstrations of ultra-low-power systems regulated with Buck converters [14,15] and with available off-chip passive-component models. Our analytical model enables us to determine the converter switching frequency *f* and to choose an appropriate compensator (i.e., the digital block that stabilizes the control loop) for stable operation. The converter is then designed and simulated using device models and standard cells in an industrial 65nm CMOS process.

### 4.1 Overall Architecture

Figure 6 illustrates the overall RB architecture. In the active mode, the system operates as a conventional Buck converter. During recovery in *SLEEP*, a constant-current recovery compensator replaces the conventional Buck compensator. The inductor current is sensed indirectly in a novel, *all-digital* fashion: using a fixed dead-time setting, the voltage difference across the PMOS device is compared at the onset of PMOS turn-on. This sensing technique provides an early-late signal to the recovery compensator, which adjusts the Digital Pulse Width Modulation (DPWM) module to set the current in the next cycle through the duty-cycle setting. Once the predetermined target current is reached, the proposed digital current sensor also provides Zero Voltage Switching (ZVS) [13] for efficient operation.
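To illustrate how a fixed dead time can turn the PMOS turn-on event into a binary current measurement, the sketch below uses a deliberately simple first-order model. It assumes, hypothetically, that the inductor current stays constant over the dead time and charges a lumped, linear switch-node capacitance $C_B$ from 0 V; the names, the model, and the numerical values are illustrative assumptions rather than details of the implemented sensor.

```python
def zvs_reached(i_l: float, t_dead: float, c_b: float, v_dd: float) -> bool:
    """Hypothetical first-order model of the dead-time current sensor.

    i_l is the inductor current when the NMOS turns off. Over the dead time it
    is assumed constant and charges a lumped switch-node capacitance c_b at V_B
    from 0 V. The decision at PMOS turn-on is whether V_B has reached V_DD,
    i.e. whether the PMOS turns on with (near) zero voltage across it.
    """
    v_b = i_l * t_dead / c_b  # switch-node voltage at PMOS turn-on, before clamping
    return v_b >= v_dd


def implied_target_current(t_dead: float, c_b: float, v_dd: float) -> float:
    """Current at which the early/late decision flips under the same model."""
    return c_b * v_dd / t_dead


# Hypothetical values: 20 pF switch node, 2 ns dead time, 1 V supply.
i_th = implied_target_current(t_dead=2e-9, c_b=20e-12, v_dd=1.0)
for i_l in (6e-3, 9e-3, 12e-3):
    print(f"I_L = {i_l * 1e3:.0f} mA -> ZVS reached: {zvs_reached(i_l, 2e-9, 20e-12, 1.0)}")
```

Under this model, the fixed dead time and the switch-node capacitance together set the target current, and the binary outcome plays the role of the early-late signal. A real switch node also involves the PMOS body diode, nonlinear junction capacitance, and comparator offset, none of which are modeled here.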



Figure 6. Block diagram of the implemented system architecture

Unlike its existing counterpart [12], the proposed energy recovery system places no constraints on the design of the voltage regulator for regular operation; it merely augments the regulator with low-overhead control to enable energy recovery.

### 4.2 Switching Frequency

Selecting and adjusting the optimal switching frequency for minimum overall energy loss during recovery is onerous due to the complexity of the resulting system of equations (Eq. A.10). Consequently, for the design in Figure 6, we target a constant recovery current at a fixed switching frequency.



Figure 7. Efficiency contours vs. current (I) and frequency for $V_{load,initial}$ = 0.75V.

Figure 7 shows a contour plot of energy efficiency versus switching frequency and target current using the proposed approach. The efficiency peaks at approximately I = 10.5mA and f = 5MHz.
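A coarse version of the trade-off behind Figure 7 can be explored with a much simpler approximation than the full model: hold the average inductor current constant, take the ripple from the ideal duty cycle $D = 1 - V_{load}/V_{DD}$, and integrate conduction, switching, and leakage losses as $V_{load}$ discharges. The function name and all parameter values below are hypothetical, so the location of the optimum will generally differ from the one reported here. Points where the ripple would drive the inductor current negative are discarded, following the constraint stated in Appendix B.

```python
import numpy as np


def approx_efficiency(i_avg: float, f: float, v_init: float = 0.75, *, v_dd: float = 1.0,
                      l: float = 4.7e-6, c_load: float = 4.7e-6, r: float = 0.6,
                      c_total: float = 25e-12, i_leak: float = 1e-6,
                      steps: int = 2000) -> float:
    """Approximate RB efficiency at constant average current (hypothetical defaults).

    r = R_sw + R_ind and c_total = C_sw + C_control. Returns NaN when the
    ripple would make the inductor current dip below zero at some V_load.
    """
    t = 1.0 / f
    v = np.linspace(v_init, 0.0, steps + 1)
    v_mid = 0.5 * (v[:-1] + v[1:])
    ripple = v_mid * (1.0 - v_mid / v_dd) * t / l  # peak-to-peak ripple with D = 1 - V/V_DD
    if np.any(i_avg - ripple / 2 < 0):
        return float("nan")
    i_rms_sq = i_avg**2 + ripple**2 / 12           # mean square of a triangular ripple
    dt = c_load * (v[:-1] - v[1:]) / i_avg         # time spent discharging each voltage slice
    p_loss = r * i_rms_sq + c_total * v_dd**2 * f + i_leak * v_dd
    e_loss = float(np.sum(p_loss * dt))
    return 1.0 - e_loss / (0.5 * c_load * v_init**2)


currents = np.linspace(2e-3, 20e-3, 37)
freqs = np.linspace(1e6, 10e6, 37)
eta = np.array([[approx_efficiency(i, fs) for i in currents] for fs in freqs])
fi, ii = np.unravel_index(np.nanargmax(eta), eta.shape)
print(f"best: I = {currents[ii] * 1e3:.1f} mA, f = {freqs[fi] / 1e6:.2f} MHz, "
      f"eta = {eta[fi, ii]:.3f}")
```

Lower frequencies cut switching loss but enlarge the ripple, which both raises the rms current and forces a higher minimum current, so the optimum lands at an interior frequency in this approximation as well.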

### 4.3 Compensator Design

Maintaining stable control of RB requires an effective compensator design. To understand why, we first construct a z-domain approximation of the discrete-time system from its difference equations.

Inductor current  $I_L[n]$  can be controlled by the input to the DPWM module, x[n]. Assuming that  $C_{\text{load}}$  is large enough to maintain a constant  $V_{\text{load}}$  during a single cycle, the z-domain small-signal approximation of the inductor current can be written as:

$$I_L(z) = \frac{V_{DD}}{L} \frac{Tz}{M(z - p_{load})} X(z) \tag{8}$$

where M is the maximum control code of a linear delay chain in the DPWM and  $p_{load}$  is a pole that is close to 1.



Figure 8. z-domain representation of the constant current control loop.

Figure 8 shows the resulting z-domain representation of the closed-loop system. In a closed-loop configuration, the recovery control loop must be appropriately compensated to ensure stability. A straightforward accumulator-only approach (corresponding to $k_i$) that updates the DPWM code based on the sensed current results in an unstable loop response. In this design, we therefore implemented a Proportional-Integral (PI) compensator, in which $k_p$ provides the necessary damping.
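The damping argument can be checked numerically. The sketch below computes closed-loop poles under an assumed loop structure: the plant of Equation (8) in negative feedback with a compensator $C(z) = k_p + k_i/(z-1)$. The exact arrangement of Figure 8, the function name, and the gain values are assumptions. With $K = V_{DD}T/(LM)$, the characteristic polynomial is $(1+Kk_p)z^2 + (Kk_i - Kk_p - 1 - p_{load})z + p_{load}$.

```python
import numpy as np


def closed_loop_poles(k_i_loop: float, k_p_loop: float, p_load: float) -> np.ndarray:
    """Closed-loop poles for plant K z/(z - p_load) and compensator k_p + k_i/(z - 1).

    k_i_loop = K*k_i and k_p_loop = K*k_p are loop gains normalized by
    K = V_DD*T/(L*M); their values here are hypothetical.
    Characteristic polynomial: (z - 1)(z - p_load) + K z (k_p (z - 1) + k_i).
    """
    coeffs = [1.0 + k_p_loop, k_i_loop - k_p_loop - (1.0 + p_load), p_load]
    return np.roots(coeffs)


p_load = 0.999  # "close to 1" (hypothetical value)
for name, k_p_loop in (("accumulator only", 0.0), ("PI", 0.5)):
    poles = closed_loop_poles(k_i_loop=0.05, k_p_loop=k_p_loop, p_load=p_load)
    print(f"{name}: |poles| = {np.round(np.abs(poles), 4)}")
```

With $k_p = 0$ and a small $k_i$, the two poles form a complex pair whose product equals $p_{load}$, so they sit at radius $\sqrt{p_{load}}$, essentially on the unit circle when $p_{load}$ is close to 1; unmodeled delays in the sensing path could push them outside it. A nonzero $k_p$ divides the pole product by $1 + Kk_p$, which is the damping the proportional path supplies.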

# 5. SIMULATION RESULTS

The proposed RB system was designed in an industrial 65nm CMOS process technology kit. In the all-digital control loop, a p-type Strong-ARM latch serves as a clocked comparator for current detection, and the compensator was implemented using a Synthesis, Auto-Place and Route (SAPR) design flow. The simulations account for parasitics in the bridge and the control logic. Spice simulations were performed to validate the operation and measure the efficiency of the proposed architecture.

Figure 9 shows simulation results of the RB system during discharge, starting from $V_{load}$ = 0.5V. The PI controller coefficients are selected to achieve sufficient loop bandwidth to track the required adjustments in D[n] as $V_{load}$ discharges. As seen in the figure, the controller maintains a steady inductor current. The inductor current ripple is largest at the start of recovery, since $V_{init}$ is close to $V_{DD}/2$ [13], and gradually decreases as $V_{load}$ approaches $V_{SS}$.
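Substituting the ideal duty cycle $D = 1 - V_{load}/V_{DD}$ into the rising-current term of Equation (3) gives a short check of this ripple behavior, assuming $V_{load}$ is constant within a cycle as in Section 2.2:

$$\Delta I_L = \frac{V_{load}}{L}DT = \frac{T}{L}V_{load}\left(1 - \frac{V_{load}}{V_{DD}}\right), \qquad \frac{d\,\Delta I_L}{dV_{load}} = \frac{T}{L}\left(1 - \frac{2V_{load}}{V_{DD}}\right) = 0 \;\Rightarrow\; V_{load} = \frac{V_{DD}}{2}$$

The peak-to-peak ripple is therefore largest, at $V_{DD}T/(4L)$, when $V_{load} = V_{DD}/2$, and it shrinks toward zero as $V_{load}$ approaches $V_{SS}$.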



Figure 9. Energy-recovery Spice simulation: (a) $I_L$, (b) $V_{load}$, and (c) D[n] vs. time. The recovery occurs over 210 $\mu$s.



Figure 10. Blue: modeled and simulated (parasitics included) efficiency vs. Initial load voltage of RB implementation. Green: Total capacitor energy and recovered capacitor energy vs. Initial load voltage.

Efficiency measurements were performed using Spice simulations incorporating parasitics in the controller and Buck converter modules. The measured efficiency accounts for all losses, including switching, conduction, and controller-overhead losses. The achieved efficiency $\eta$ matches the constant-current model fairly well, ranging from a peak of 96% for an initial $V_{load}$ of 1V (before recovery) down to 91% for an initial $V_{load}$ of 0.4V (Figure 10).

# 6. CONCLUSION

As design and technology trends favor increased use of voltage domains that are active for short periods with longer sleep-mode durations in between, dissipation of the energy stored on the sizeable capacitance of the voltage domain will play a significant role in overall efficiency. The proposed efficient, low-overhead, all-digital power-system architecture, Regenerative Breaking, enables the recovery of this overhead energy back to the supply using pre-existing circuits in the regulator itself. Spice simulations indicate that recovery efficiencies of up to 96% can be achieved with the proposed architecture.

# **APPENDIX A: Energy Losses in Recovery**

In this section we will analyze the losses incurred in energy recovery.

Q[n] in Eq. (2) is equal to:


$$Q[n] = \frac{(I_L[n] + I_L'[n])D[n]T + (I_L'[n] + I_L[n+1])D'[n]T}{2} \tag{A.1}$$

As shown in Figure 4, Q[n] is equal to the area under the inductor current waveform in the  $n^{th}$  cycle.

 $V_{load}[n+1]$  can then be written as:

$$V_{load}[n+1] = V_{load}[n] - \frac{(I_L[n] + I_L'[n])D[n]T + (I_L'[n] + I_L[n+1])D'[n]T}{2C_{load}} \tag{A.2}$$

Conduction losses during the NMOS and PMOS conduction intervals of each cycle, including the inductor resistance, are:

$$E_{R,n}[n] = (R_N + R_{ind}) D[n] T \left(I_L^2[n] + I_L[n] \Delta I_1[n] + \frac{\Delta I_1[n]^2}{3}\right) \tag{A.3}$$

$$E_{R,p}[n] = (R_P + R_{ind})D'[n]T\left(I_L'^2[n] - I_L'[n]\Delta I_2[n] + \frac{\Delta I_2[n]^2}{3}\right) \tag{A.4}$$

where $R_{\text{N}}$ and $R_{\text{P}}$ are the on-resistances of the NMOS and PMOS switches, and

$$\Delta I_{1}[n] = \frac{V_{load}[n]}{L} D[n]T, \quad \Delta I_{2}[n] = \frac{V_{DD} - V_{load}[n]}{L} D'[n]T \tag{A.5}$$

$$I_L'[n] = I_L[n] + \frac{V_{load}[n]}{L} D[n]T \tag{A.6}$$

Note that in our implementation  $R_N=R_P=R_{sw}$ . The leakage and switching losses in each cycle are:

$$E_{Leakage}[n] = I_{leak}V_{DD}(D[n] + D'[n])T, \quad E_{sw}[n] = (C_{sw} + C_{control})V_{DD}^{2} \tag{A.7}$$

respectively. The total losses for each cycle can be written as:

$$E_{loss}[n] = E_{Leakage}[n] + E_{sw}[n] + E_{R,n}[n] + E_{R,p}[n] \tag{A.8}$$

The total recovery loss is the sum of the per-cycle losses:

$$E_{loss,total} = \sum_{n=1}^{N} E_{loss}[n] \tag{A.9}$$

To find the maximum efficiency we minimize losses with respect to D[n] and D'[n]. The resulting optimization problem is:

$$\min_{D[n],\, D'[n] \ge 0} E_{loss,total} \quad \text{subject to} \quad V_{load}[1] = V_{load,initial},\; I_L[1] = 0,\; V_{load}[N] = V_{load,stop} \tag{A.10}$$

where $V_{load}[n]$ and $I_L[n]$ are governed by Equations (A.2) and (3), respectively, and $V_{load,stop}$ is the load voltage at the end of the recovery process.

# **APPENDIX B: Optimal Iavg Determination**

In this section, we determine the optimum current assuming a ripple-free inductor current $I_L$. In this case, the energy loss in each cycle is independent of the load voltage during that cycle, so the per-cycle loss has no cycle (or time) dependence. Consequently, the optimum current waveform is constant and can be found by minimizing the total loss:

$$E_{loss,total} = \left((C_{sw} + C_{control})V_{DD}^2 f + I_{leak}V_{DD} + (R_{sw} + R_{ind})I_{avg}^2\right)\Delta t \tag{A.11}$$

$$\Delta t = \frac{C_{load} V_{load,init}}{I_{avg}} \tag{A.12}$$

Then the optimum  $I_{avg}$  is equal to:

$$I_{avg,opt} = I_{avg,opt}[n] = \sqrt{\frac{(C_{sw} + C_{control})V_{DD}^2 f + I_{leak}V_{DD}}{R_{sw} + R_{ind}}} \tag{A.13}$$

The total loss in this case is equal to:

$$E_{loss,total,opt} = 2C_{load}V_{load,init}\sqrt{\left((C_{sw} + C_{control})V_{DD}^2 f + I_{leak}V_{DD}\right)(R_{sw} + R_{ind})} \tag{A.14}$$

Equation (A.14) shows that a lower switching frequency results in lower losses and a more efficient system. In practice, the frequency must be high enough that the ripple does not drive the inductor current negative, which would cause current to flow from the supply back into $C_{load}$. To find the efficiency upper bound, we assume zero frequency. Figure 5 shows the efficiency upper bound versus initial $V_{load}$.
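Equations (A.11) through (A.14) can be checked numerically: substituting (A.12) into (A.11) gives a loss proportional to $P_{fixed}/I_{avg} + (R_{sw} + R_{ind})I_{avg}$, whose minimum over a fine current grid should match (A.13) and (A.14). The sketch below does this for hypothetical parameter values and then evaluates the resulting upper bound $\eta_{ub} = 1 - E_{loss,total,opt}/(\frac{1}{2}C_{load}V_{load,init}^2)$ with $f = 0$. The function names are illustrative, and the sketch does not reproduce the curve of Figure 5, which depends on the circuit parameters of Figure 6.

```python
import numpy as np


def loss_ripple_free(i_avg, c_load, v_init, p_fixed, r):
    """Eqs. (A.11)-(A.12): total loss when a constant current i_avg drains c_load."""
    dt = c_load * v_init / i_avg
    return (p_fixed + r * i_avg**2) * dt


def loss_opt_closed_form(c_load, v_init, p_fixed, r):
    """Eq. (A.14): minimum of the loss above, reached at i_avg = sqrt(p_fixed / r)."""
    return 2.0 * c_load * v_init * np.sqrt(p_fixed * r)


# Hypothetical values: p_fixed = (C_sw + C_control) V_DD^2 f + I_leak V_DD, r = R_sw + R_ind.
c_load, v_init, r = 4.7e-6, 0.75, 0.6
p_fixed = 25e-12 * 1.0**2 * 5e6 + 1e-6 * 1.0

currents = np.linspace(1e-3, 50e-3, 100_001)
losses = loss_ripple_free(currents, c_load, v_init, p_fixed, r)
i_best = currents[np.argmin(losses)]
print(f"grid optimum {i_best * 1e3:.3f} mA vs. Eq. (A.13) {np.sqrt(p_fixed / r) * 1e3:.3f} mA")

# Upper bound: f = 0, so only leakage remains in p_fixed; C_load cancels out of eta.
p_leak = 1e-6 * 1.0
for v0 in (0.4, 0.6, 0.8, 1.0):
    eta_ub = 1.0 - loss_opt_closed_form(c_load, v0, p_leak, r) / (0.5 * c_load * v0**2)
    print(f"V_init = {v0:.1f} V: efficiency upper bound = {eta_ub:.4f}")
```

Because $C_{load}$ cancels, the bound reduces to $1 - 4\sqrt{I_{leak}V_{DD}(R_{sw} + R_{ind})}/V_{load,init}$ in this ripple-free, zero-frequency limit, which is why it improves as the initial load voltage rises.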

# 7. REFERENCES

[1] Karnik, T.; et al., "Power management and delivery for high-performance microprocessors," in *Design Automation Conference (DAC), 2013 50th ACM/EDAC/IEEE*, vol., no., pp.1-3, May 29 2013-June 7 2013.

[2] Galal, S.; Horowitz, M., "Energy-Efficient Floating-Point Unit Design," in *Computers, IEEE Transactions on*, vol.60, no.7, pp.913-922, July 2011.

[3] Esmaeilzadeh, H.; et al., "Dark silicon and the end of multicore scaling," in *Computer Architecture (ISCA), 2011 38th Annual International Symposium on*, vol., no., pp.365-376, 4-8 June 2011.

[4] Mingoo Seok; et al., "The Phoenix Processor: A 30pW platform for sensor applications," in*VLSI Circuits, 2008 IEEE Symposium on*, vol., no., pp.188-189, 18-20 June 2008.

[5] Dreslinski, R.G.; et al., "Near-Threshold Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits," in *Proceedings of the IEEE*, vol.98, no.2, pp.253-266, Feb. 2010.

[6] Fuller, S.H.; Millett, L.I., "Computing Performance: Game Over or Next Level?," in *Computer*, vol.44, no.1, pp.31-38, Jan. 2011.

[7] Bulzacchelli, J.F.; et al., "Dual-Loop System of Distributed Microregulators With High DC Accuracy, Load Response Time Below 500 ps, and 85-mV Dropout Voltage," in *Solid-State Circuits, IEEE Journal of*, vol.47, no.4, pp.863-874, April 2012.

[8] Burton, E.A.; et al., "FIVR — Fully integrated voltage regulators on 4th generation Intel® Core<sup>™</sup> SoCs," in *Applied Power Electronics Conference and Exposition (APEC), 2014 Twenty-Ninth Annual IEEE*, vol., no., pp.432-439, 16-20 March 2014.

[9] Sugahara, S.; et al., "90% High Efficiency and 100-W/cm High Power Density Integrated DC - DC Converter for Cellular Phones," in *IEEE Transactions on Power Electronics*, vol. 28, no. 4, pp. 1994-2004, April 2013.

[10] Taylor, M.B., "Is dark silicon useful? Harnessing the four horsemen of the coming dark silicon apocalypse," in *Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE*, vol., no., pp.1131-1136, 3-7 June 2012.

[11] Cong, J.; et al., "On-chip interconnection network for accelerator-rich architectures," in *Design Automation Conference (DAC), 2015 52nd ACM/EDAC/IEEE*, vol., no., pp.1-6, 8-12 June 2015.

[12] Alioto, M.; Consoli, E.; Rabaey, J., "EChO power management unit with reconfigurable switched-capacitor converter in 65 nm CMOS," in *Custom Integrated Circuits Conference (CICC), 2012 IEEE*, vol., no., pp.1-4, 9-12 Sept. 2012.

[13] Robert W Erickson, Dragan Maksimovic, "Fundamentals of Power Electronics," 2007, Springer

[14] Shrivastava, A.; et al., "A  $1.2\mu$ W SIMO energy harvesting and power management unit with constant peak inductor current control achieving 83–92% efficiency across wide input and output voltages," in *VLSI Circuits Digest of Technical Papers, 2014 Symposium on*, vol., no., pp.1-2, 10-13 June 2014.

[15] Bandyopadhyay, S.; et al., "20 uA to 100 mA DC–DC Converter With 2.8-4.2 V Battery Supply for Portable Applications in 45 nm CMOS," in *Solid-State Circuits, IEEE Journal of*, vol.46, no.12, pp.2807-2820, Dec. 2011.

[16] http://www.coilcraft.com/mss1278.cfm