A Reconfigurable Non-Uniform Power-Combining V-Band PA With $+17.9$ dBm $P_{\text{sat}}$ and 26.5% PAE in 16-nm FinFET CMOS

Kun-Da Chu, Graduate Student Member, IEEE, Steven Callender, Member, IEEE, Yanjie Wang, Senior Member, IEEE, Jacques Christophe Rudell, Senior Member, IEEE, Stefano Pellerano, Member, IEEE, and Christopher Hull, Senior Member, IEEE

Abstract—This article presents the design of a dual-mode V-band power amplifier (PA) that enhances the efficiency at power back-off (PBO) using load modulation. The PA utilizes a reconfigurable two-/four-way power combiner to enable two discrete modes of operation: full power and back-off power. The power combiner employs two techniques to further improve the PA efficiency at PBO: 1) transformers with non-uniform turns ratios to reduce the difference in impedance presented to the PA cores between the two modes and 2) a proposed switching scheme to eliminate the leakage inductance associated with the disabled path in back-off power mode (BPM). The two-stage PA achieves a peak gain of 21.4 dB with a fractional bandwidth (FBW) of 22.6% (51–64 GHz). At 65 GHz, the PA has a $P_{\text{sat}}$ of $+17.9$ dBm with an $OP_{1\text{dB}}$ of $+13.5$ dBm and a peak power-added efficiency (PAE) of 26.5% in full-power mode. In BPM, the measured $P_{\text{sat}}$, $OP_{1\text{dB}}$, and peak PAE are $+13.8$ dBm, $+9.6$ dBm, and 18.4%, respectively. The PAE is enhanced by 6% points at a 4.5-dB back-off. The PA is capable of amplifying a 6 Gb/s 16-QAM modulated signal with an $EVM_{\text{rms}}$ of $-20.7$ dB at an average $P_{\text{out}}$/PAE of $+13$ dBm/13.6%, respectively. This PA was implemented in 16-nm FinFET, occupies a core area of 0.107 mm$^2$, and operates under a 0.95-V supply.

Index Terms—CMOS, FinFET (FF), load modulation, millimeter-wave (mm-wave), power amplifiers (PAs), power combining.

I. INTRODUCTION

With the introduction of 60 GHz, fifth-generation (5G) communications, and radar systems for autonomous driving, the demand for highly integrated millimeter-wave (mm-wave) wireless front-ends has intensified with an emphasis on reducing the form factor and cost. Millimeter-wave bands provide expanded bandwidth (BW) of several gigahertz, which enables various wireless applications to operate at increased data rates. As the feature size of CMOS technologies continues to scale to allow high-speed operation and high-level system integration, major challenges exist for the development of wireless radio system-on-chips (SoCs). One such challenge is achieving high-efficiency power amplifier (PA) designs with wide BW and high output power. Although a number of mm-wave CMOS PAs [1]–[26] and SiGe PAs [27]–[29] have been published, only a few of these publications were implemented in a FinFET (FF) CMOS technology [3], [4], which is a prime candidate technology to implement next-generation mm-wave SoCs.

Despite the challenges of mm-wave design in an FF process, relatively few publications have discussed the various aspects of mm-wave FF design. Some of the considerations that can be applied to improve PA efficiency are worth mentioning here. First, self-heating is a well-known concern in FF because the confined geometry makes heat dissipation through the substrate difficult. As a result, FF transistors are usually biased at a lower current density, which leads to a lower gain. Second, the high parasitic capacitances contributed by the 3-D FF gate and deep sub-micron interconnect can limit the device $f_T$ and $f_{\text{max}}$ if careful layout optimization is not followed [30]. These challenges limit the attainable gain per unit current, and thus the attainable PA efficiency. To combat this, a capacitively neutralized differential pair is commonly used to boost $G_{\text{max}}$ by 4–5 dB [34]. Another challenge associated with both FF transistors and non-SOI processes in general is the limited output power from a single stage. FF has similar limits on device stacking as bulk CMOS, which limits the maximum output power that can be generated reliably. To circumvent this, power combining is often employed to increase transmit power.

The efficiency of PAs plays an important role in improving battery lifetime, as PAs often consume the majority of power in radio transceivers. However, the average efficiency of the PA is usually significantly lower than its peak efficiency due to the characteristics of the data-modulated signal. As the demand for a high data rate grows, spectrally efficient modulation methods are desired. Unfortunately, these modulation schemes exhibit a high peak-to-average power ratio (PAPR), thereby degrading PA average efficiency. As an example, the probability density function (PDF) of a 16-QAM modulation as a function of normalized PA $P_{\text{out}}$ is shown in Fig. 1 along with the power-added efficiency (PAE) of a typical class-A PA. The average PAE is the sum, over output power, of the product of the PDF and the PAE. As shown in Fig. 1, the PA rarely operates in the peak PAE region and most often operates in the lower PAE region, which leads to an average PAE much lower than its peak PAE. In this example, though the peak PAE is 25%, the average PAE is only 6%. As a result, several techniques have been proposed to enhance efficiency at power back-off (PBO) in order to improve the average PA efficiency.
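The PDF-weighted average described above can be sketched numerically. The following Python sketch is illustrative only and rests on two assumptions that are not taken from Fig. 1: the PA is an ideal class-A stage whose PAE scales linearly with normalized $P_{\text{out}}$, and the signal PDF is built from the 16 ideal constellation points of 16-QAM without pulse shaping. It is therefore not expected to reproduce the 6% average quoted above; it only shows why the average falls below the peak. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def qam16_power_pdf():
    """Discrete PDF of normalized output power for ideal 16-QAM points.

    Assumption: only the 16 constellation points are considered
    (no pulse shaping), and power is normalized to the peak point.
    """
    levels = (-3, -1, 1, 3)
    powers = [i * i + q * q for i, q in product(levels, repeat=2)]
    peak = max(powers)
    counts = Counter(p / peak for p in powers)
    total = len(powers)
    return {p_norm: n / total for p_norm, n in counts.items()}


def class_a_pae(p_norm, peak_pae=0.25):
    """Assumed ideal class-A behaviour: PAE proportional to output power."""
    return peak_pae * p_norm


def average_pae(pdf, pae_fn):
    """Sum of the product of the PDF and the PAE over all power levels."""
    return sum(prob * pae_fn(p_norm) for p_norm, prob in pdf.items())


pdf = qam16_power_pdf()
avg = average_pae(pdf, class_a_pae)
print(f"peak PAE = {class_a_pae(1.0):.1%}, average PAE = {avg:.1%}")
```

Even in this optimistic case the average PAE is only a little over half of the peak value, because most constellation points sit well below the peak power.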

One such effective and popular technique is the Doherty PA. Doherty PAs show impressive back-off efficiencies, with one implementation at 60 GHz exhibiting a peak PAE of 26% with an enhanced PAE of 16.6% at 7-dB PBO [1]. However, the large footprint associated with Doherty PAs complicates SoC integration. In addition, Doherty PAs suffer from narrow BW imposed by the $\lambda/4$ impedance rotation on the auxiliary path. Furthermore, as next-generation systems are likely to utilize several mm-wave bands from 28 to 90 GHz, a wideband PA will be desirable in order to reduce the number of required front-end modules for multi-band operation and thus lower system cost. While some wideband mm-wave PAs have been demonstrated [2], [3], their back-off efficiencies typically drop by more than half at PBOs greater than 3 dB. As a result, it is of interest to develop compact, wideband, high-output-power PAs in deeply scaled FF CMOS with enhanced efficiency at PBO.

This article presents a wideband reconfigurable two-/four-way power-combining PA with a compact form factor implemented in 16-nm FF CMOS [32]. The PA can be configured in two discrete output power modes: full-power mode (FPM) and back-off power mode (BPM).

This article is organized as follows. Section II shows the architecture and design of the proposed PA followed by a detailed discussion of non-uniform power combining and switching scheme in Section III. Section IV presents the measurement results. Section V provides the conclusion with a comparison to state-of-the-art mm-wave CMOS PAs.

II. TWO-/FOUR-WAY POWER-COMBINING PA ARCHITECTURE

Fig. 2 depicts the PA topology. It is composed of two gain stages, an input matching transformer, two interstage power splitters, and a reconfigurable two-/four-way series-parallel power combiner at the output. In FPM, all gain stages are ON with SW1–4 open, thereby placing the PA in its highest $P_{\text{out}}$ mode. In BPM, DRV1–2 and PA2–3 are ON, while PA1 and PA4 are OFF, and SW1–4 are closed. In this configuration, the PA output stage becomes a parallel 2-to-1 combiner and ideally operates at 6-dB PBO as compared to FPM, assuming uniform power combining (i.e., all transformers have identical turns ratios).

A. PA Core

Fig. 3 shows the detailed transistor-level schematic for the bottom-half of the PA. Capacitively neutralized differential pairs are employed in all gain stages for an increased $G_{\text{max}}$ [34]. The capacitances are obtained by overlapping drain and gate routing in layout, similar to [4]. The driver stages are biased in the class-A region for higher gain while the PA stages are biased in the class-AB region with a current density of 125 $\mu A/\mu m$ for better efficiency at PBO [3]–[7]. A common-mode (CM) source degeneration inductor of 145 pH is placed in the driver stage for better CM stability and CM rejection, as the driver stage contributes to the majority of gain and is more susceptible to oscillation.
B. Input Matching and Interstage Power Splitter

The input matching network uses a high-$k$ ($k = 0.6$) transformer for minimal loss while low-$k$ ($k = 0.3$) transformers are used for the interstage power splitters to enhance the BW [36]. Series power splitting is utilized for two reasons. First, the resulting transformer inductance ratio (1.6:1) is much lower than that of a parallel splitter (6.5:1) [2], thereby resulting in lower transformer insertion loss [37]. Second, series power splitting enables the use of shunt switches at the inputs of PA1 and PA4 to disable these paths in BPM (SW1–2 in Fig. 2). In contrast, a parallel power splitter would require a large OFF impedance from PA1 and PA4, which is challenging to achieve at mm-wave frequencies due to the large input capacitance associated with the PA devices. As such, the shunt switch in a series splitter leads to reduced loading from the OFF paths in BPM (PA1 and PA4). Note that a switch is needed to reduce the signal swing at the input of the disabled PA, because the swing (or $V_{\text{rms}}$) accumulated at its gate might otherwise partially turn on the PA and degrade the overall efficiency.

C. Output Matching Network and Power Combiner

The transformer-based output matching network is designed using a holistic optimization approach to improve PA efficiency by performing active/passive device co-design [3]. The reconfigurable two-/four-way series–parallel power combiner applies non-uniform turns ratios to the transformers that load each PA. Applying non-uniform turns ratios improves the PA performance in BPM by reducing the change in PA load impedance that occurs when switching between the two modes. Fig. 4(a) shows the conceptual diagram of a non-uniform power combiner with the 50-$\Omega$ antenna load modeled as two 100-$\Omega$ resistors in parallel. Here, "non-uniform" refers to the different turns ratios used in each pair of transformers that make up the half-circuit of the combiner. As shown in Fig. 4(a), the turns ratios of the transformers for PA1, PA2, PA3, and PA4 are 1:1, $1: \sqrt{2}$, $1: \sqrt{2}$, and 1:1, respectively. Fig. 4(b) shows the configuration of the combiner in FPM. When all the paths are ON, the voltages across the transformer secondaries are $V_{\text{in}}$, $\sqrt{2}V_{\text{in}}$, $\sqrt{2}V_{\text{in}}$, and $V_{\text{in}}$, respectively, assuming each PA outputs the same $V_{\text{in}}$. Moreover, the currents flowing through each transformer's secondary are equal. As a result, each of the 100-$\Omega$ terminations is distributed as 59- and 41-$\Omega$ impedances across the secondaries of the transformers of PA2/PA3 and PA1/PA4, respectively. These impedances are then transformed, via the respective turns ratios, to 29- and 41-$\Omega$ loads, which are presented to the PA2/PA3 and PA1/PA4 cores, respectively. In BPM, PA2 and PA3 see a load impedance of 50 $\Omega$, as shown in Fig. 4(c), where PA1 and PA4 are OFF, and SW3 and SW4 are ON.

By contrast, with conventional uniform power combining, where the turns ratio is 1:1 for all transformers, the impedance presented to each PA is 50 $\Omega$/100 $\Omega$ in FPM/BPM. Now, assuming the impedance presented to the PA in FPM is its optimal load, \( r_{\text{opt}} \), this impedance should ideally also be presented to PA2 and PA3 in BPM for optimal performance. By applying non-uniform power combining, the impedance change between FPM and BPM is reduced to \( 1.72\times \) (29 \( \Omega \):50 \( \Omega \)), as compared to \( 2\times \) (50 \( \Omega \):100 \( \Omega \)) in uniform combining, which improves the output power and efficiency in BPM.

Note that the impedance change between FPM and BPM can be further minimized by choosing more aggressive non-uniform turns ratios. For instance, turns ratios of 1:1, 1: \( \sqrt{3} \), 1: \( \sqrt{3} \), and 1:1 can reduce the impedance mismatch to \( 1.57\times \) (21 \( \Omega \):33 \( \Omega \)), thereby improving \( P_{\text{sat}} \) and PAE in BPM further. However, implementing a turns ratio of 1: \( \sqrt{3} \) (or a 1:3 inductance ratio) is challenging and exhibits higher loss at mm-wave frequencies [37].
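The load impedances quoted above follow from ideal-transformer arithmetic, which can be checked with a short Python sketch. The sketch assumes lossless, ideal transformers and a purely resistive 100-$\Omega$ half-circuit termination; the function and variable names are hypothetical. As an additional assumption that is not taken from the text, the last function estimates the ideal back-off between the two modes by treating every ON PA as voltage limited with the same swing, so that its output power scales as the inverse of its load resistance.

```python
import math


def half_circuit_loads(n_on, n_sw=1.0, r_half=100.0):
    """Loads (ohms) seen by the PAs in one ideal series-combiner half-circuit.

    n_on : secondary turns (1:n_on) of the always-ON path (PA2 or PA3)
    n_sw : secondary turns (1:n_sw) of the switched path (PA1 or PA4)
    Returns (z_fpm_on, z_fpm_sw, z_bpm_on).
    """
    # FPM: both secondaries carry the same current, so r_half divides
    # in proportion to the secondary voltages n_on : n_sw.
    z_sec_on = r_half * n_on / (n_on + n_sw)
    z_sec_sw = r_half * n_sw / (n_on + n_sw)
    # Reflect each secondary impedance to its primary by 1/n^2.
    z_fpm_on = z_sec_on / n_on ** 2
    z_fpm_sw = z_sec_sw / n_sw ** 2
    # BPM: the switched secondary is shorted, so the ON path sees r_half.
    z_bpm_on = r_half / n_on ** 2
    return z_fpm_on, z_fpm_sw, z_bpm_on


def ideal_pbo_db(n_on, n_sw=1.0, r_half=100.0):
    """Ideal FPM-to-BPM back-off, assuming voltage-limited PAs (P ~ 1/R)."""
    z_fpm_on, z_fpm_sw, z_bpm_on = half_circuit_loads(n_on, n_sw, r_half)
    p_fpm = 1.0 / z_fpm_on + 1.0 / z_fpm_sw  # two PAs ON per half-circuit
    p_bpm = 1.0 / z_bpm_on                   # one PA ON per half-circuit
    return 10.0 * math.log10(p_fpm / p_bpm)


for label, n in (("1:1", 1.0), ("1:sqrt(2)", math.sqrt(2)), ("1:sqrt(3)", math.sqrt(3))):
    z_on, z_sw, z_bpm = half_circuit_loads(n)
    print(f"{label:10s} FPM {z_on:5.1f}/{z_sw:5.1f} ohm, BPM {z_bpm:5.1f} ohm, "
          f"ratio {z_bpm / z_on:4.2f}x, ideal PBO {ideal_pbo_db(n):4.2f} dB")
```

With unrounded values the 1: \( \sqrt{2} \) case gives a ratio of about \( 1.71\times \); the \( 1.72\times \) quoted above follows from the rounded 29-\( \Omega \) value. Under the voltage-limited assumption, the ideal back-off shrinks from about 6 dB for uniform combining to roughly 4.6 dB for the 1: \( \sqrt{2} \) case.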

Finally, it is worth noting that the back-off efficiency can also be improved by reducing the drive strength of each PA while simultaneously adjusting the PA load line [5]. The PA published in [5] is segmented into a few PA cells and is capable of adjusting the PA load line to accommodate the impedance at PBO. In this scenario, an increase in the impedance presented to the PA in BPM is desirable so that the PA can utilize the full voltage swing in BPM, which is why a uniform combiner was adequate for previous designs such as [5]. However, the PA design in [5] is a digital PA. Applying the same technique to linear PAs would require the insertion of a tail switch device into the PA unit cell, which has implications for performance. Simulations show that although insertion of such a switch device would not significantly affect \( P_{\text{sat}} \) and linearity, the PAE would degrade by approximately 5% points. As a result, instead of adjusting the PA load line by using a tail switch, the load line of each PA device in this design remains constant between FPM and BPM, and the passive combiner is reconfigured to reduce the impedance difference between the two modes.

B. Comparison of Non-Uniform and Uniform Power Combining

This section compares non-uniform and uniform power combining by presenting transistor-level simulation results. As shown in Fig. 5, we will only consider the bottom-half PA and an ideal switch for simplicity. Fig. 5(a) shows the schematic for the uniform power combining which is comprised of two PAs (PA3 and PA4), an ideal switch, and a two-way series power combiner with uniform turns ratios of 1:1.2 for both transformers. The quality factor of the inductors (\( Q = 15 \)) and coupling factor of the transformers (\( k = 0.65 \)) are applied to emulate the passive loss of the combiner. Since it is the half-circuit of the series–parallel combiner, the load presented to the bottom-half PA is now a 100-\( \Omega \) resistor in parallel with a 12-fF pad parasitic. Fig. 5(b) shows the schematic for non-uniform power combining which is the same as that of Fig. 5(a) except that the transformer turns ratios for PA3 and PA4 are 1: \( \sqrt{2} \) and 1:1, respectively.

Fig. 6(a) plots the drain efficiency (DE) versus \( P_{\text{out}} \) for uniform and non-uniform combiners in both FPM and BPM. For uniform power combining, the simulated DE drops by 7.1% points (from 52.1% to 45%) when switching from FPM to BPM. This is expected as the impedance presented to PA3 is increased and shifted away from \( r_{\text{opt}} \) of the PA, as discussed in Section III-A. By contrast, the simulated DE of non-uniform power combining shows a difference of only 2.3% points (from 51.3% to 49%) between the two modes.

We can further break down the DE of each PA in the non-uniform combining case. As shown in Fig. 6(b), PA3 and PA4 contribute slightly different DEs to the total DE of 51.3% in FPM, where the peak DEs for PA3 and PA4 are \( \sim 52.8\% \) and \( \sim 50.8\% \), respectively. This is also expected since the impedances presented to PA3 and PA4 are different [Fig. 5(b)]. Note that the DEs for PA3 and PA4 are the same in uniform power combining.

The efficiency difference between FPM and BPM is mainly contributed by PA3, as it is always ON but is presented with different impedances in the two modes. By applying non-uniform power combining, the simulated DE degradation of PA3 between the two modes can be improved from 7.1% points to 2.9% points, see Fig. 6(a) and (b), respectively.

To provide another view of how non-uniform power combining improves the efficiency, Fig. 7(a) plots the simulated load–pull DE contours for uniform output combining using the half-circuit schematic depicted in Fig. 5(a). The peak DE occurs at a real 100 \( \Omega \) in FPM since the included combiner network should transform the 100-\( \Omega \) resistance and present \( r_{\text{opt}} \) to both PA3 and PA4. In BPM, as the impedance presented to PA3 [Fig. 5(a)] is now \( 2\times r_{\text{opt}} \), the peak DE is obtained at a real 50 \( \Omega \), which is half of 100 \( \Omega \). With the load impedance fixed at 100 \( \Omega \) in the two modes, the DEs in FPM and BPM are 52% and 45%, respectively, based on the contours shown in Fig. 7(a). These results agree with Fig. 6(a).

In contrast, Fig. 7(b) presents the load–pull contours with non-uniform combining as shown in Fig. 5(b). The peak DE in FPM still occurs at a real 100 Ω, but the peak DE in BPM is now closer to 100 Ω. With a fixed 100-Ω load, a DE of 51.3% in FPM and an improved DE of 49% in BPM are obtained.

Note that the purpose of this simplified example is to illustrate how non-uniform power combining reduces the change in load impedance presented to the PA output stage (PA1–4 in Fig. 2) between the two modes. The change in peak efficiency between FPM and BPM will be larger in the final design than what is shown in Fig. 6 due to several non-idealities associated with practical implementations (e.g., loss introduced by the switch on the combiner secondary side in BPM and power loss caused by the non-ideal short at the PA1 and PA4 inputs in BPM).

C. Proposed Switching Scheme

The proposed load modulation is implemented by placing the switch at the transformer’s secondary side to eliminate both the coupling and leakage inductances for the OFF path of the combiner. To understand the switching scheme of the power combiner, we will focus on the bottom half of the combiner as shown in Fig. 8(a) which is composed of two transformers and a switch. The simplified transformer model uses an ideal 1-to-n transformer, coupling inductance $kL$, and leakage inductance $(1 - k)L$ [38]. Looking at Fig. 8(b) where an ideal switch is placed at the secondary side, both the coupling inductance, $kL_2$, and the leakage inductance, $(1 - k)L_2$ will be shunted to ground. In contrast, Fig. 8(c) shows a technique commonly used to implement load modulation which places a shunt switch at the outputs of the PAs (transformer’s primary side). In this configuration, the switch can short the $kL_2$ term, but not the $(1 - k)L_2$ term. As a result, the leakage inductance becomes an undesired reactance in series with the secondary of the ON path to ground, thereby degrading the performance and frequency response in BPM. This effect is more severe at mm-wave frequencies where the transformer’s coupling factor is usually lower and thus leakage inductance is non-negligible.
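To give a feel for the magnitude of the leakage term, the following Python sketch evaluates the reactance of \( (1 - k)L \) at 65 GHz. The secondary inductance of 100 pH is a hypothetical value chosen only for illustration and is not taken from this design; $k = 0.65$ is the coupling factor used in the Section III-B simulations, while $k = 0.9$ is an assumed tightly coupled reference point.

```python
import math


def leakage_reactance_ohm(freq_hz, l_henry, k):
    """Reactance of the leakage inductance (1 - k) * L at freq_hz."""
    return 2.0 * math.pi * freq_hz * (1.0 - k) * l_henry


FREQ_HZ = 65e9            # operating frequency used in the measurements
L_SECONDARY_H = 100e-12   # hypothetical secondary inductance (assumption)

for k in (0.9, 0.65):
    x_leak = leakage_reactance_ohm(FREQ_HZ, L_SECONDARY_H, k)
    print(f"k = {k:.2f}: leakage reactance = {x_leak:.1f} ohm")
```

For these assumed values, the residual series reactance left by a primary-side switch grows from a few ohms at $k = 0.9$ to more than 10 $\Omega$ at $k = 0.65$, which is no longer negligible against load impedances of a few tens of ohms.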

The switches are implemented using thick-oxide devices with both gate and bulk terminals biased through kilo-ohm-order resistors, $R_G$ and $R_B$, to form a high-pass response which stabilizes the switch on-resistance under a high-voltage swing, see Fig. 8(d). This technique is commonly used in T/R switch designs [39].

Fig. 8. Comparison of implementing the switching scheme. (a) Power combiner model. (b) Proposed switch placement at secondary side. (c) Conventional switch placement at transformer’s primary. (d) Thick-oxide switch architecture which accommodates a high-voltage swing.

D. Power Combiner

Fig. 9 shows a sketch of the proposed two-/four-way non-uniform power combiner implemented using RDL, ultra-thick metal (UTM), and 4×-thick metal (MZ) layers of the process. The combiner occupies a drawn area of 210 × 50 μm² with a drawn metal width of 3.4 μm. The simulated power combiner insertion loss is 2.7/2.9 dB in FPM/BPM. This loss is higher than what was reported in [32] due to a calculation error in the earlier publication. Asymmetry between the differential terminals of each primary coil (e.g., primary-to-secondary capacitive coupling) can be observed in Fig. 9 and is most pronounced for PA1/PA4. Fig. 10 shows the magnitudes of the series impedances seen from the two single-ended outputs (+/−) of PA1–4 in (a) FPM and (b) BPM.

In FPM, SW3 and SW4 are OFF and their drain terminals see the most voltage stress. The simulated instantaneous peak voltages of $V_D$ and $V_{DG}$ are 1.35 and 1.23 V, respectively, which are well within the reliability margin for 18ud12 (1.8 V underdrive to 1.2 V) devices. In BPM, SW3 and SW4 are ON, and the drain terminals are pulled close to ground. The peak $V_D$ is 139 mV, and $V_{GD}$ has a quiescent voltage of ~1.2 V in BPM.

IV. MEASUREMENT RESULTS

This PA is fabricated in 16-nm FF CMOS technology and operates from a 0.95-V supply. The die photos are shown in Fig. 11. The core area of the PA is 0.107 mm².

The measured and simulated S-parameters in FPM and BPM are shown in Fig. 12. In FPM, the PA achieves a measured peak gain of 21.4 dB at 54 GHz and a 13-GHz BW (51–64 GHz), see Fig. 12(a). In BPM [Fig. 12(b)], the PA achieves a measured peak gain of 18.5 dB at 55 GHz and a 14-GHz BW (52–66 GHz). $S_{11} < -5.5$ dB and $S_{22} < -5.2$ dB are achieved with $S_{12} < -45$ dB (not shown) over the band of interest. The measured results show good agreement with the simulations for $S_{21}$ and $S_{22}$ while the measured $S_{11}$ null is shifted $\sim 6$ GHz lower.
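The fractional bandwidth follows directly from the band edges quoted above. The Python sketch below assumes that the center frequency is the arithmetic midpoint of the 3-dB band edges, an assumption that reproduces the 22.6% FBW reported for FPM; the function name is hypothetical.

```python
def fractional_bw(f_low_ghz, f_high_ghz):
    """3-dB bandwidth divided by the band-center frequency.

    Assumption: the center frequency is the arithmetic mean of the edges.
    """
    bandwidth = f_high_ghz - f_low_ghz
    center = 0.5 * (f_low_ghz + f_high_ghz)
    return bandwidth / center


fbw_fpm = fractional_bw(51.0, 64.0)  # FPM band edges from Fig. 12(a)
fbw_bpm = fractional_bw(52.0, 66.0)  # BPM band edges from Fig. 12(b)
print(f"FPM FBW = {fbw_fpm:.1%}, BPM FBW = {fbw_bpm:.1%}")
```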

Fig. 13 shows the measured and simulated large-signal performance at 65 GHz. In FPM, the PA delivers a $P_{\text{sat}}$ of $+17.9$ dBm with a $+13.5$-dBm $\text{OP}_{1\text{dB}}$ and a 26.5% peak PAE. In BPM [Fig. 13(b)], the measured $P_{\text{sat}}$, $\text{OP}_{1\text{dB}}$, and peak PAE are $+13.8$ dBm, $+9.6$ dBm, and 18.4%, respectively. A reasonable agreement is achieved between measurements and simulations. Upon closer inspection, there appears to be a larger than expected difference between the measured PA \( P_{\text{sat}} \) and the cascaded compression point (OP1\(_{\text{dB}}\)). This may be attributed to two factors. First, the first stage of the two-stage PA is biased closer to class-A to boost the gain, thereby introducing a non-negligible impact on the overall linearity. Second, PA1 and PA4 see a higher load impedance as compared to PA2 and PA3 in FPM, which contributes to the soft compression of the PA.

Fig. 14 plots the measured and simulated PAE curves versus \( P_{\text{out}} \) at 65 GHz. In FPM, the PA can deliver an output power of +12 to +18 dBm with >12% PAE. For output powers below +12 dBm, the PA can be switched to BPM for enhanced efficiency. The PAE is \( \sim 6\% \) points higher in BPM over an output power range of +8 to +12 dBm.

Fig. 15 shows key large-signal performance versus frequency, including \( P_{\text{sat}} \), OP1\(_{\text{dB}}\), peak PAE, and PAE at OP1\(_{\text{dB}}\). The PA maintains good performance within the BW of 60–70 GHz. The lowest frequency of large-signal test is limited to 60 GHz due to the band-limited test setup. However, the PA is expected to still maintain good performance down to 52 GHz since it is within the 3-dB BW.

The PA was also tested with modulated signals at 65 GHz. Fig. 16 shows constellations for two test cases. The measured 65-GHz spectrum shown in Fig. 16 was down-converted to a 3.5-GHz IF and captured by a VSA. In Fig. 16(a), the PA has an average EVM\(_{\text{rms}}\) of \(-21.9\) dB with an average \( P_{\text{out}} \) of +10.5 dBm and an average PAE of 7.2% for 1.5 GSym/s 16-QAM. For 1 GSym/s 64-QAM shown in Fig. 16(b), an average EVM\(_{\text{rms}}\) of \(-23.2\) dB with an average \( P_{\text{out}} \) of +9.8 dBm and an average PAE of 8.2% is achieved. Fig. 17(a) shows the EVM\(_{\text{rms}}\) versus \( P_{\text{out}} \) in FPM and BPM for various modulations. The measurement setup has an EVM\(_{\text{rms}}\) floor of \(-22\) dB/\(-24\) dB for 6 Gb/s 16-/64-QAM, respectively. Therefore, the true PA performance is expected to be better than what is reported. Fig. 17(b) plots the PAE versus \( P_{\text{out}} \) with 4 Gb/s 16-QAM modulation in FPM and BPM, which is similar to Fig. 14, but in this case, it is for modulated signals at 65 GHz. As shown in Fig. 17(b), the average PAE can be improved by 4.5% points at \( P_{\text{out}} \) of +9 dBm when switched to BPM while maintaining reasonable EVM\(_{\text{rms}}\) of \(-20\) dB for 4 Gb/s 16-QAM modulation. Note that the simulated AM–PM distortion was below 3°/1° for FPM/BPM up to OP1\(_{\text{dB}}\) (13.5 dBm/9.6 dBm), respectively. Therefore, the AM–PM conversion is not suspected to be limiting the overall EVM performance.
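The symbol rates and data rates quoted above are related through the number of bits per symbol, and the EVM values in decibels can be converted to a percentage with the usual \( 20\log_{10} \) definition. The Python sketch below illustrates both conversions; the function names are hypothetical, and the QAM order is assumed to be a power of two.

```python
def bit_rate_gbps(symbol_rate_gsym, qam_order):
    """Data rate in Gb/s for an M-QAM signal (M assumed a power of two)."""
    bits_per_symbol = qam_order.bit_length() - 1  # log2(M) for M = 2**n
    return symbol_rate_gsym * bits_per_symbol


def evm_db_to_percent(evm_db):
    """Convert rms EVM in dB (20*log10 of the rms ratio) to percent."""
    return 100.0 * 10.0 ** (evm_db / 20.0)


print(bit_rate_gbps(1.5, 16), "Gb/s for 1.5 GSym/s 16-QAM")
print(bit_rate_gbps(1.0, 64), "Gb/s for 1 GSym/s 64-QAM")
print(f"{evm_db_to_percent(-21.9):.1f}% rms EVM at -21.9 dB")
print(f"{evm_db_to_percent(-23.2):.1f}% rms EVM at -23.2 dB")
```

Both test cases in Fig. 16 therefore correspond to 6 Gb/s, which matches the data rate for which the EVM floor of the setup is quoted.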

Fig. 18 plots selected prior art [1]–[20], [22]–[26], [40] of advanced 50–75 GHz (V-band) CMOS PAs, with technologies


Fig. 14. Measured versus simulated PAE versus \( P_{\text{out}} \) in FPM and BPM at 65 GHz.

Fig. 15. Large-signal measurements in FPM and BPM across 60–70 GHz. (a) \( P_{\text{sat}} \) and OP1\(_{\text{dB}}\). (b) Peak PAE and PAE at OP1\(_{\text{dB}}\).

Fig. 16. Measured spectrums and constellations for (a) 1.5 GSym/s 16-QAM and (b) 1 GSym/s 64-QAM at 65 GHz.

Fig. 17. Measurements of modulated signals. (a) EVM\(_{\text{rms}}\) versus \( P_{\text{out}} \) for various modulations. (b) PAE versus \( P_{\text{out}} \) with 4 Gb/s 16-QAM modulation in FPM and BPM.

Fig. 18. Performance comparison of mm-wave (50–75 GHz) PA prior art. (a) Peak PAE versus PA $P_{\text{sat}}$. (b) Peak PAE versus gain-fBW product [38].

### TABLE I
CONTINUOUS WAVE PERFORMANCE COMPARISON TO PRIOR-ART MM-WAVE PAs

<table>
<thead>
<tr>
<th>Technology</th>
<th>This Work</th>
<th>[1]</th>
<th>[2]</th>
<th>[4]</th>
<th>[3]</th>
<th>[18]</th>
<th>[22]</th>
<th>[25]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Topology</td>
<td>2/4-Way Non-Uni Power Comb.</td>
<td>Doherty</td>
<td>4-Way Power Comb</td>
<td>3-Stage CS</td>
<td>2-Stage CS</td>
<td>8-Stage Power Comb</td>
<td>4-Stage CS</td>
<td>Cascaded Asym. DAT</td>
</tr>
<tr>
<td>$V_{\text{dd}}$ (V)</td>
<td>0.95</td>
<td>0.9</td>
<td>0.9</td>
<td>1</td>
<td>1.1</td>
<td>4.8</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>Frequency (GHz)</td>
<td>65</td>
<td>60</td>
<td>80</td>
<td>71</td>
<td>74</td>
<td>60</td>
<td>71</td>
<td>60</td>
</tr>
<tr>
<td>Peak Gain (dB)</td>
<td>21.4</td>
<td>12.9</td>
<td>18.1</td>
<td>16.7</td>
<td>16.6</td>
<td>24.4</td>
<td>12.5</td>
<td>23.9</td>
</tr>
<tr>
<td>Frac. BW (%)</td>
<td>22.6</td>
<td>19.1</td>
<td>10.4</td>
<td>32</td>
<td>19</td>
<td>39.4</td>
<td>13.5</td>
<td></td>
</tr>
<tr>
<td>$P_{\text{sat}}$ (dBm)</td>
<td>17.9</td>
<td>20.1</td>
<td>20.9</td>
<td>7.4</td>
<td>12.8</td>
<td>19.8</td>
<td>18</td>
<td>29.1</td>
</tr>
<tr>
<td>$OP_{1\text{dB}}$ (dBm)</td>
<td>13.5</td>
<td>19.3</td>
<td>17.8</td>
<td>2</td>
<td>5.7</td>
<td>15.8</td>
<td>15</td>
<td>24.7</td>
</tr>
<tr>
<td>Peak PAE (%)</td>
<td>26.5</td>
<td>26</td>
<td>22.3</td>
<td>8.9</td>
<td>26.3</td>
<td>18.4</td>
<td>20</td>
<td>18.4</td>
</tr>
<tr>
<td>PAE @ $OP_{1\text{dB}}$ (%)</td>
<td>15</td>
<td>25.9</td>
<td>10</td>
<td>4.8</td>
<td>11.6</td>
<td>9</td>
<td>20</td>
<td>8</td>
</tr>
</tbody>
</table>

### TABLE II
MODULATION PERFORMANCE COMPARISON TO PRIOR-ART MM-WAVE PAs

<table>
<thead>
<tr>
<th>Technology</th>
<th>This Work</th>
<th>[1]</th>
<th>[2]</th>
<th>[3]</th>
<th>[25]</th>
</tr>
</thead>
<tbody>
<tr>
<td>$V_{\text{dd}}$ (V)</td>
<td>0.95</td>
<td>2</td>
<td>0.9</td>
<td>1</td>
<td>1.1</td>
</tr>
<tr>
<td>Frequency (GHz)</td>
<td>65</td>
<td>60</td>
<td>80</td>
<td>71</td>
<td>74</td>
</tr>
<tr>
<td>Modulation (M-QAM)</td>
<td>16</td>
<td>16</td>
<td>16</td>
<td>16</td>
<td>16</td>
</tr>
<tr>
<td>Data Rate (Gb/s)</td>
<td>4</td>
<td>6</td>
<td>4</td>
<td>6</td>
<td>4</td>
</tr>
<tr>
<td>RMS EVM (dB)</td>
<td>-20.7</td>
<td>-21.6</td>
<td>-23.2</td>
<td>-23.1</td>
<td>-25</td>
</tr>
<tr>
<td>Avg. $P_{\text{out}}$ (dBm)</td>
<td>13</td>
<td>13</td>
<td>13.8</td>
<td>11.9</td>
<td>11</td>
</tr>
<tr>
<td>Avg. PAE (%)</td>
<td>7.2</td>
<td>10.6</td>
<td>4.5</td>
<td>11</td>
<td>5</td>
</tr>
</tbody>
</table>

1 Estimated from figures. 2 Fractional BW is defined as $\frac{\text{BW}_{3\text{dB}}}{\text{Center Frequency}}$. 3 Tone-based tests. 4 PBO from $P_{\text{dd}}$. 5 Achieved by switching between two static power modes. 6 Achieved dynamically without switching modes.

varying from 45-nm SOI to 14-nm FF. The PAE versus $P_{\text{sat}}$ is shown in Fig. 18(a), while the PAE versus gain-fractional BW (fBW) product is shown in Fig. 18(b), where the fBW is defined as the small-signal 3-dB BW divided by the center frequency. The desired performance is in the upper right corner, meaning high PAE, high $P_{\text{sat}}$, and high gain-fBW product. As seen from Fig. 18, the FPM of this design improves upon prior art for FF mm-wave PA designs [3] while also obtaining a respectable performance in BPM. Tables I and II compare the PA performance with prior art.

V. CONCLUSION

This article presented the design of a reconfigurable V-band two-/four-way non-uniform power-combining PA implemented in 16-nm FF CMOS. The PA achieves a high gain with large fractional BW while also demonstrating back-off efficiency enhancement when switching from FPM to BPM. This work demonstrates the viability of high-power PA design in deeply scaled FF CMOS, thus enabling the development of mm-wave SoCs for next-generation wireless communication systems.

ACKNOWLEDGMENT

The authors would like to thank A. Agrawal, W. Shin, P. Sagagio, R. Bhat, M. Chakravarti, T. Palomino, and C. Paulino for technical discussion, layout, and CAD support.

REFERENCES


Kun-Da Chu (Graduate Student Member, IEEE) received the B.S. degree in electronic engineering from National Taiwan University of Science and Technology, Taipei, Taiwan, in 2005, and the M.S. degree in electronic engineering from National Taiwan University, Taipei, in 2008. He is currently pursuing the Ph.D. degree with the University of Washington, Seattle, WA, USA.

From 2008 to 2016, he was a Principal Engineer with the RFIC Design Division, MStar Semiconductor Inc., and MediaTek Inc., Hsinchu, Taiwan, with a focus on RF transceiver circuit design. In 2018, he joined Intel Labs, Hillsboro, OR, USA, as an RFIC Design Intern. His research interests include RF/millimeter-wave (mm-wave) circuits.

Mr. Chu was a recipient of the Analog Devices Outstanding Student Designer Award in 2018 and 2019.

Steven Callender (Member, IEEE) was born in Brooklyn, NY, USA, in 1986. He received the B.S. degree in electrical engineering from Columbia University, New York, NY, USA, in 2008, and the M.S. and Ph.D. degrees in electrical engineering from the University of California at Berkeley, Berkeley, CA, USA, in 2010 and 2015, respectively.

In 2015, he joined the Intel Labs, Hillsboro, OR, USA, as a Research Scientist, with a focus on the development of next-generation wireless systems. His research interests include RF/millimeter-wave (mm-wave) circuits and wideband mixed-signal systems.

Dr. Callender was a co-recipient of the ISSCC 2019 Lewis Winner Award for Outstanding Paper and the ISSCC 2010 Jack Kilby Outstanding Student Paper Award. He was also a recipient of the Robert Noyce Memorial Fellowship in Microelectronics, in 2012, the ADI Outstanding Student Designer Award, in 2013, the William L. Everitt Student Award from Columbia University, in 2008, and the UC Berkeley EECS Chair’s Excellence Award, in 2008.

Jacques Christophe Rudell (Senior Member, IEEE) received the B.S. degree in electrical engineering from the University of Michigan, Ann Arbor, MI, USA, in 1994, and the M.S. and Ph.D. degrees from the University of California at Berkeley, Berkeley, CA, USA, in 1997 and 2000, respectively.

After completing his Ph.D., he was an RF IC Designer with Berkana Wireless (now Qualcomm), and Intel Corporation, Hillsboro, OR, USA, for several years. In 2009, he joined the University of Washington, Seattle, WA, USA, as a Faculty Member, where he is currently an Associate Professor of electrical engineering. He is an Active Member with the Center for Neural Technology (CNT), NSF Engineering Research Center (ERC), University of Washington, Seattle, WA, USA. He is a Co-Director of the Center for Design of Analog-Digital Integrated Circuits (CDADIC). His research interests include topics in RF and millimeter-wave (mm-wave) integrated circuit design for communication systems, in addition to biomedical electronics for imaging and neural interface applications.

Dr. Rudell received the Demetri Angelakos Memorial Achievement Award, during his Ph.D., a citation given to one student per year by the EECS Department. He has twice been a co-recipient of best paper awards at the IEEE International Solid-State Circuits Conference (ISSCC), the first of which was the 1998 Jack Kilby Award, followed by the 2001 Lewis Winner Award, and the 2011 and 2014 RFIC Symposium Best Student Paper Awards. He received the 2008 ISSCC Best Evening Session Award. He was a recipient of the 2015 NSF CAREER Award. He served on the ISSCC Technical Program Committee (2003–2010) and on the IEEE Radio Frequency Integrated Circuits (RFIC) Symposium Steering Committee (2002–2013), where he also served as the 2013 General Chair. He currently serves on the technical program committees for the IEEE European Solid-State Circuits Conference (ESSCIRC) and the IEEE Custom Integrated Circuits Conference (CICC). He also serves as the Founding Seattle Chapter Chair for the IEEE Solid-State Circuits Society. He was an Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS (2009–2015).

Stefano Pellerano (Member, IEEE) was born in Bari, Italy, in 1977. He received the Laurea degree (summa cum laude) and the Ph.D. degree in electronics engineering from the Politecnico di Milano, Milan, Italy, in 2000 and 2004, respectively. During his Ph.D., his activity was focused on the design of fully integrated low-power frequency synthesizers for WLAN applications.

In 2003, he joined Agere Systems, Allentown, PA, USA, as a Consultant. Since 2004, he has been with Intel Labs, Hillsboro, OR, USA. He is currently a Principal Engineer leading the Next Generation Radio Integration Lab at Intel, Hillsboro, OR, where he drives several research activities focused on enabling radio circuit integration in deeply scaled CMOS technologies. He has authored or coauthored more than 40 IEEE conference and journal articles, one book chapter, and more than 15 issued patents. His main research contributions include multiple-input multiple-output (MIMO) transceivers for WiFi, digital phase-locked loops (PLLs), high-efficiency digital architectures for polar and outphasing transmitters, millimeter-wave (mm-wave) radio transceiver and phased-array systems, and low-power radios. Recently, he has also been exploring cryogenic CMOS integrated electronics for qubit control in fault-tolerant scalable quantum computers.

Dr. Pellerano is currently serving as the Wireless Subcommittee Chair for the IEEE International Solid-State Circuits Conference (ISSCC). He served as the Technical Program Chair and the General Chair for the IEEE Radio Frequency Integrated Circuits (RFIC) Symposium in 2018 and 2019, respectively. He is currently part of the RFIC Executive Committee.

Christopher Hull (Senior Member, IEEE) received the Ph.D. degree from the University of California at Berkeley, Berkeley, CA, USA, in 1992.

In 1992, he joined Rockwell Semiconductor Systems, Newport Beach, CA, USA. In 1998, he joined Silicon Wave, San Diego, CA, USA. In 2001, he joined Innocomm Wireless, which was subsequently acquired by National Semiconductor. In 2003, he joined the Wireless Networking Group, Intel, San Diego, CA. In 2005, he moved to Intel, Hillsboro, OR, USA. In 2013, he was on international assignment with Intel Mobile Communications, Munich, Germany, where he worked closely with his colleagues on 4G cellular transceivers. Since 2015, he has been the Director and a Senior Principal Engineer with Intel Labs.