# Fully Digital Feedforward Background Calibration Of Clock Skews For Sub-sampling TIADCs Using The Polyphase Decomposition

Han Le Duc, Duc Minh Nguyen, Chadi Jabbour, Patricia Desgreys, Olivier Jamin, and Van Tam Nguyen

Abstract—This paper presents a low-power, fully digital feedforward background calibration technique for clock skews in sub-sampling Time-Interleaved Analog-to-Digital Converters (TIADCs). The estimation and correction algorithms share a common derivative filter, which saves filter hardware. Furthermore, both algorithms use polyphase filtering and do not require adaptive digital synthesis filters. Thus, the proposed calibration can be implemented with moderate hardware cost and low power dissipation. The adopted feedforward approach eliminates the stability issues encountered with adaptive techniques. The Hardware Description Language (HDL) design of the proposed calibration is synthesized in a 28nm FD-SOI process for a 60dB SNR TIADC clocked at 2.7GHz. The calibration is designed for both baseband and sub-sampling TIADC applications. The synthesized calibration system occupies $0.04 \text{mm}^2$ and dissipates 33.2 mW of total power for a sub-sampling ADC with the input in any of the first four Nyquist Bands (NBs); it occupies $0.02 \text{mm}^2$ and consumes 15.5 mW for Nyquist TIADCs.

*Index Terms*—All-digital feedforward calibration, Subsampling and undersampling TIADCs, FPGA/ASIC implementation, polyphase filtering.

# I. INTRODUCTION

In modern communication systems such as broadband satellite receivers, cable TV tuners and Software-Defined Radios, Analog-to-Digital Converters (ADCs) play an essential role. Such ADCs require very high sampling rates with high resolution and low power dissipation. A Time-Interleaved ADC (TIADC), which is formed by several slow but accurate ADCs in parallel, is a promising solution to achieve these goals [1], [2]. Unfortunately, the TIADC performance suffers from channel mismatches, including offset, gain and timing mismatches, due to process, voltage, and temperature variations. These channel mismatches produce frequency aliases in the output signal and hence significantly degrade the Signal-to-Noise and Distortion Ratio (SNDR) and Spurious-Free Dynamic Range (SFDR) of the converter. Among these errors, sample-time errors are the most critical, as the timing mismatch errors increase with the input frequency and overshadow the effect of the other mismatches for broadband inputs [3]. For these reasons, this work focuses on the timing mismatch problem.

The timing mismatches in TIADCs can be effectively handled by analog and/or mixed-signal calibration techniques [1], [4]–[9]. Mixed-signal calibration techniques mitigate the clock skew effect by adjusting a variable-delay line in the clock buffers. They exhibit good performance, but the analog correction schemes limit the overall ADC resolution due to process, supply voltage and temperature variations and thermal noise. Moreover, they require additional analog circuitry with longer development time and are not portable between CMOS technology nodes.

With continued technology scaling, fully digital techniques appear to be a more promising way to overcome the above issues of analog and mixed-signal calibration. They can be developed faster, benefit from CMOS technology scaling, and are easier to port to the next technology generation. Most calibration approaches [10]–[17] are derived assuming an input signal band-limited to the Nyquist frequency, i.e., an input located in the first Nyquist Band (NB), also commonly known as Nyquist Zone (NZ). However, these techniques cannot be directly applied to undersampling (or subsampling) TIADCs, which sample band-limited signals in a higher NB and are an interesting solution for next-generation direct-sampling receivers such as sub-sampling receivers, software-defined radios, and broadband satellite receivers [18], [19]. A few all-digital calibration techniques coping with mismatches in undersampling TIADCs have been proposed in [19]–[22]. Nevertheless, they require either an additional channel [21] or a pilot input signal [22] for calibration. The work in [20] assumes narrow-band signals and applies to only two channels. For very high speed applications, the calibration in [19] used polyphase filtering in order to enhance the working frequency of the Digital Signal Processing (DSP) units. However, using two identical derivative filters (one for correction and one for estimation) is not optimal in terms of power consumption. To save filter hardware, this work proposes sharing one differentiator filter between the correction and estimation schemes.

Regarding the implementation of all-digital calibration, calibration schemes are performed either with the aid of a feedback loop as shown in Fig. 1(a) or in a feedforward manner as shown in Fig. 1(b) [23]. The former is therefore defined as *all-digital feedback* calibration and the latter as *all-digital feedforward* calibration. Most prior works [11]–[17], [19]–[22] used adaptive feedback techniques, which suffer from potential stability issues [10], [24]. In order to overcome these issues and to achieve low power dissipation, the calibration algorithms in [10], [25] are performed in a feedforward manner. This approach computes the cross-correlations among the adjacent sub-ADCs and the cross-correlations between the sub-ADC output samples and their corresponding derivative samples. Using the orthogonality of the input signal and its derivative, these cross-correlations enable a direct estimation of the clock skews. The above pairwise cross-correlations are obtained from the overall TIADC output and an FIR filter whose coefficients are (-1, 0, +1). The sub-ADC derivative is obtained by feeding the TIADC gain-corrected samples into an FIR baseband derivative filter [10]. With such an implementation, the FIR filters are required to run at the full sampling rate, which is very challenging and power hungry for very high speed ADC design. Moreover, this technique does not work for inputs in an arbitrary NZ.

Fig. 1: All-digital calibration of timing mismatch in TIADCs: (a) all-digital feedback calibration and (b) all-digital feedforward calibration.

Inspired by the all-digital feedforward calibration scheme of timing mismatch proposed in [10], [25], this paper presents its theoretical analysis for sub-sampling TIADCs in any Nyquist Band. In addition, the complexity of the estimation scheme is reduced by using one cross-product between the sub-ADC output and the output derivative of the adjacent sub-ADC, instead of two cross-products, in the computation of the derivative of the input autocorrelation. Furthermore, we build on our previous work [19] by using a polyphase implementation for both the estimation and correction algorithms to reduce the power. Finally, the digital feedforward background calibration system is synthesized in a 28nm FD-SOI process.

In detail, a linear equation system of the clock skews is derived from the pairwise cross-correlations of the sub-ADC outputs, which are obtained directly from the sub-ADC output samples. The solutions of these equations are the clock skew estimates. In addition, both the correction and estimation schemes require the derivatives of the sub-ADC outputs, which are computed using the polyphase technique. Note that the estimation and correction algorithms run at the sub-ADC rate, leading to lower power consumption. They also share the same derivative filter, i.e., the proposed algorithm saves one derivative filter in comparison with our previous works proposed in [19], [26]. Furthermore, they do not use adaptive filter banks. As a result, the proposed calibration reduces the power consumption and chip die area. It is also suitable for very high speed (beyond several gigahertz) TIADC design.

The rest of the paper is organized as follows. Section II reviews time-interleaved ADCs, their limitations and timing-skew-induced errors. Section III presents the proposed feedforward calibration of the timing skew, including correction and estimation. To show the efficiency of the proposed calibration, Section IV analyzes the simulation results, the ASIC synthesis results, and the results obtained by post-processing real data captured from an ADC chip. Conclusions are finally drawn in Section V.

## **II. TIME-INTERLEAVED CONVERTERS**

Fig. 2 shows a simplified block diagram of an M-channel TIADC. It encompasses an analog demultiplexer at the input, a digital Multiplexer (MUX) at the output and M channel converters with the same sampling period of $MT_s$. The sampling instants of neighboring sub-ADCs are offset by $T_s$. During operation, each sub-ADC selected by the demultiplexer sequentially samples and digitizes the analog input signal x(t) to form a digital stream. The digital data streams from the M sub-ADCs are then periodically multiplexed by the MUX to generate the TIADC digital output y[n]. With time interleaving, the overall sampling rate $f_s$ of the TIADC is ideally M times higher than the sampling rate of the sub-ADCs. The equivalent sampling period of the TIADC is $T_s$.



Fig. 2: *M*-channel TIADCs: (a) Block diagram and (b) timing diagram

However, channel mismatches among the constituent sub-ADCs, including offset, gain and timing mismatches, significantly degrade the linearity of TIADCs [3]. Offset mismatches create additive tones at frequencies $k\frac{f_s}{M}$, where k is an integer. Gain mismatch causes amplitude modulation of the input samples, producing scaled replicas of the input spectrum centered around integer multiples of $\frac{f_s}{M}$ (i.e., at $\pm f_{in} + k\frac{f_s}{M}$), where $f_{in}$ is the input frequency. Both offset and gain errors are static and independent of the input frequency [5], [27]. The timing mismatch results in a phase shift (phase modulation) of the input samples. To first order, when the timing mismatches are small, clock skew mismatch creates scaled copies of the spectrum of the derivative of the input signal. These copies are located at the same frequencies as the spurious components stemming from gain mismatch, and hence considerably degrade the SNDR/SFDR.

As stated in Section I, this work focuses on the timing mismatch problem. We assume that there are no gain or offset mismatches and that $\delta t_i$ is the clock skew (or deterministic relative timing deviation) of the $i^{th}$ channel ADC. Ignoring quantization effects, the digital output sequence $y_m[k]$ of the $m^{th}$ channel ADC is expressed by

$$y_m[k] = y_m(kT_s) = x \left( kMT_s + mT_s + t_0 + \delta t_m \right), \tag{1}$$

where $t_0$ is the initial sampling phase.

The  $m^{th}$  channel sub-ADC output can be expressed as a sum of an ideal signal and an error term proportional to the timing offset and the signal derivative as [4], [10], [13]

$$y_m[k] \approx x_m[k] + \delta t_m x'_m[k], \qquad (2)$$

where  $x'_m[k]$  is the derivative of the input signal at the nominal sampling time (or the derivative of sub-ADC output in the case without timing skews). The timing skew induced error is a product of the derivative signal of sub-ADC output samples and its corresponding clock skews.

#### **III. PROPOSED CALIBRATION TECHNIQUE**

The proposed calibration for an input in the first NZ consists of two main algorithms, correction and estimation, presented hereafter. Its extension to an input in any NZ is then described.

#### A. Digital Correction

Once the estimates $\hat{\delta}t_m$ are known, they can be used in various ways to recover the correct value of the input signal at the nominal sampling time. Some approaches use a digital synthesis filter bank for the correction, as in [14], [20]; such a filter bank runs at full speed all the time and requires a large amount of computation resources. These approaches are therefore not well suited to low-power specifications. In this work, the adopted approach is simply to subtract the error given by the second term of (2) from the conversion result. This requires knowledge of the derivatives $x'_m[k]$ of the ideal sub-ADC outputs. In blind calibration, the input is unknown, i.e., $x'_m[k]$ is also unknown. Thus, the derivatives $y'_m[k]$ of the distorted sub-ADC output samples are used for correction, as done in [10], [11], [13], [24], [27]. The derivatives $y'_m[k]$ are computed in the digital domain using polyphase filters of the digital differentiators proposed in [19], [26]. With the 25-tap FIR differentiator used in [26], the derivative response extends up to 90% of the Nyquist zone before it rolls off.

## **B.** Digital Estimation

Let the analog input signal x(t) be a Wide-Sense Stationary (WSS) random process band-limited to the Nyquist frequency, i.e., its mean $E \{x(t)\}$ is constant and its autocorrelation depends only on the time difference $\tau$:

$$E\{x(t)\} = \eta = \text{constant}, \tag{3a}$$

$$R_x(t_1, t_2) = R_x(\tau) = E\{x(t+\tau)x(t)\}, \text{ for all } t. \tag{3b}$$

In discrete-time domain, the cross-correlation  $R_{fg}[l]$  of two signals f[n] and g[n] is defined by [29], [30]

$$\begin{aligned} R_{fg}[l] &= E\{f[n+l]g[n]\} \\ &= \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} f[n+l]g[n], \end{aligned} \tag{4}$$

where index l is the (time) shift (or *lag*) parameter and the notation  $R_{fg}[l]$  is extensively used for writing convenience.

*Basic idea:* To estimate the clock skews of sub-ADCs, the cross-correlation of two adjacent channels of the TIADC is computed. The difference between two cross-correlation functions produces a linear equation system whose variables are timing skews of the individual channel ADCs. A solution of this linear equation system is the timing skews of sub-ADCs. The estimation technique will be interpreted in more detail as follows.

1) Linear Equation System With Clock Skew Variables: From (4), (1) and (3b), the cross-correlation between two consecutive channel ADCs (ADC<sub>m</sub> and ADC<sub>m-1</sub> for  $1 \le m \le M - 1$ ) is expressed by

$$\begin{aligned} R_{y_m y_{m-1}}[0] &= E\{y_m[k]\,y_{m-1}[k]\} \\ &= E\{x(kMT_s + mT_s + t_0 + \delta t_m) \\ &\qquad \times x(kMT_s + mT_s - T_s + t_0 + \delta t_{m-1})\} \\ &= R_x(T_s + \delta t_m - \delta t_{m-1}). \end{aligned} \tag{5}$$

Analogously, the cross-correlation of two adjacent sequence outputs from sub-channel  $ADC_m$  and  $ADC_{m+1}$  for  $0 \le m \le M-2$  is written by

$$\begin{aligned} R_{y_m y_{m+1}}[0] &= E\{y_m[k]\,y_{m+1}[k]\} \\ &= E\{x(kMT_s + mT_s + t_0 + \delta t_m) \\ &\qquad \times x(kMT_s + mT_s + T_s + t_0 + \delta t_{m+1})\} \\ &= R_x(T_s + \delta t_{m+1} - \delta t_m). \end{aligned} \tag{6}$$

In order to estimate sample-time errors, the timing error function is defined as the difference of pairwise crosscorrelations of sub-ADC output samples and is expressed by

$$\Gamma_m = R_{y_m y_{m-1}} [0] - R_{y_m y_{m+1}} [0], 1 \le m \le M - 2.$$
(7)

By applying the Taylor series expansion up to the first derivative around a zero point, the autocorrelation function  $R_x (T_s + \delta t)$  as a function of the variables  $\delta t$  is expressed by

$$\begin{aligned} R_{x}(T_{s} + \delta t) &\approx R_{x}(T_{s}) + \delta t \times \left.\frac{dR_{x}(T_{s} + \delta t)}{d\delta t}\right|_{\delta t=0} \\ &\approx R_{x}(T_{s}) + \delta t \times R'_{x}(T_{s}), \end{aligned} \tag{8}$$
where  $R'_{x}(T_{s})$  is the derivative of autocorrelation of the input signal. In practice, the clock skews arising in time-interleaved ADCs are typically small as compared to the sampling interval  $T_{s}$ . By Taylor series expansion to (5) and (6) as done in (8), the timing error function  $\Gamma_{m}$  in (7) can be simplified as follows.

$$\Gamma_m \approx R'_x \left(T_s\right) \left(2\delta t_m - \delta t_{m-1} - \delta t_{m+1}\right), 1 \le m \le M - 2.$$
(9)

Obviously, (9) is a linear equation system that has (M - 2) linear equations and M variables being the timing skews of M channel sub-ADCs. In order to derive the linear equation system having number of linear equations the same as the number of variables, we introduce additionally two timing error functions of  $\Gamma_0$  and  $\Gamma_{M-1}$  as follows.

$$\begin{aligned} \Gamma_{0} &= R_{y_{M-1}y_{0}}[-1] - R_{y_{0}y_{1}}[0], \\ \Gamma_{M-1} &= R_{y_{M-1}y_{M-2}}[0] - R_{y_{M-1}y_{0}}[-1]. \end{aligned} \tag{10}$$

By substituting m = 0 into (6), we have

$$R_{y_0y_1}[0] = R_x \left( T_s + \delta t_1 - \delta t_0 \right).$$
(11)

By substituting m = M - 1 into (5), we have

$$R_{y_{M-1}y_{M-2}}[0] = R_x \left( T_s + \delta t_{M-1} - \delta t_{M-2} \right).$$
(12)

Analogously to (5) and (6), we compute

$$\begin{aligned} R_{y_{M-1}y_0}[-1] &= E\{y_{M-1}[k-1]\,y_0[k]\} \\ &= E\{x((k-1)MT_s + (M-1)T_s + t_0 + \delta t_{M-1}) \\ &\qquad \times x(kMT_s + 0 \times T_s + t_0 + \delta t_0)\} \\ &= R_x(T_s + \delta t_0 - \delta t_{M-1}). \end{aligned} \tag{13}$$

From (8), (10), (11), (12) and (13), we have

$$\Gamma_{0} \approx R'_{x}(T_{s}) \left(2\delta t_{0} - \delta t_{1} - \delta t_{M-1}\right).$$

$$\Gamma_{M-1} \approx R'_{x}(T_{s}) \left(2\delta t_{M-1} - \delta t_{M-2} - \delta t_{0}\right).$$
(14)

From (9) and (14), we obtain the following system of M linear equations, in which the generic row holds for $1 \le m \le M - 2$.

$$\begin{cases} \Gamma_{0} \approx R'_{x}(T_{s}) \left(2\delta t_{0} - \delta t_{1} - \delta t_{M-1}\right) \\ \qquad \vdots \\ \Gamma_{m} \approx R'_{x}(T_{s}) \left(2\delta t_{m} - \delta t_{m-1} - \delta t_{m+1}\right) \\ \qquad \vdots \\ \Gamma_{M-1} \approx R'_{x}(T_{s}) \left(2\delta t_{M-1} - \delta t_{M-2} - \delta t_{0}\right). \end{cases} \tag{15}$$

In general, if the derivative of the input autocorrelation  $R'_x(T_s)$  is determined, straightforwardly solving the equation system (15) provides the sample-time error estimates of sub-ADCs.
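As an illustration, the linear model (15) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a unit-amplitude sine input (for which $R_x(\tau) = \frac{1}{2}\cos(\omega\tau)$), a normalized $T_s = 1$, $t_0 = 0$, and hypothetical skew values; the names `c`, `gamma` and `gamma_lin` are introduced here only for the example.

```python
import numpy as np

M, Ts = 4, 1.0                    # channel count and full-rate period (normalized, assumption)
f_in = 0.21                       # hypothetical tone frequency in units of fs
w = 2 * np.pi * f_in
dt = np.array([0.5e-3, -1.0e-3, 2.0e-3, -1.5e-3]) * Ts   # hypothetical clock skews
N = 1000                          # averaging length; a multiple of 25 averages this tone cleanly

k = np.arange(N + 1)
# y[m, k] = x(k*M*Ts + m*Ts + dt[m]) for a unit sine, following (1) with t0 = 0
y = np.array([np.sin(w * (k * M * Ts + m * Ts + dt[m])) for m in range(M)])

# c[m]: time-averaged product of channel m with the next sample in time,
# i.e. an estimate of R_x(Ts + dt[m+1] - dt[m]); the last channel wraps
# to the next frame, as in (13)
c = np.array([np.mean(y[m, 1:] * y[m + 1, 1:]) for m in range(M - 1)]
             + [np.mean(y[M - 1, :-1] * y[0, 1:])])

gamma = np.roll(c, 1) - c         # Gamma_m = c[m-1] - c[m], covering (7) and (10)

# Right-hand side of (15), using the analytic R'_x(Ts) of a unit sine
dR = -0.5 * w * np.sin(w * Ts)
gamma_lin = dR * (2 * dt - np.roll(dt, 1) - np.roll(dt, -1))

print(np.max(np.abs(gamma - gamma_lin) / np.abs(gamma_lin)))   # relative linearization error
```

Under these assumptions the measured error functions agree with the first-order model up to terms of second order in the skews, and their sum vanishes, which is the redundancy exploited later when solving the system.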

2) Estimates Of The Input Autocorrelation Derivative: From (3b), we have

$$R'_{x}(T_{s}) = \left. \frac{dR_{x}(\tau)}{d\tau} \right|_{\tau=T_{s}} = E\left\{ x'(t+T_{s})x(t) \right\}.$$
 (16)

Without clock skews, the ideal sampling instant (or the ideal sampling edge) of the  $m^{th}$  sub ADC is at

$$t_{s_m} = kMT_s + mT_s + t_0, 0 \le m \le M - 1.$$
 (17)

By substituting $t = t_{s_m}$ into (16), we have

$$\begin{aligned} R'_{x}(T_{s}) &= E\left\{\underbrace{x'(kMT_{s} + (m+1)T_{s} + t_{0})}_{x'_{m+1}[k]} \times \underbrace{x(kMT_{s} + mT_{s} + t_{0})}_{x_{m}[k]}\right\} \\ &= E\left\{x'_{m+1}[k]x_{m}[k]\right\}, \quad 0 \le m \le M - 1, \end{aligned} \tag{18}$$

where  $x_m[k]$  is the ideal output samples of the  $m^{th}$  channel ADC; and  $x'_{m+1}[k]$  is the derivative of the desired outputs of the  $(m+1)^{th}$  channel ADC. In a blind calibration mechanism, the input signal is not available, i.e.,  $x_m[k]$  and  $x'_{m+1}[k]$  are unknown. Thus, the output of the sub ADC<sub>m</sub> and the derivative of the channel ADC<sub>m+1</sub> are used to approximately compute the autocorrelation of the input signal. Thus, the first order derivative of the autocorrelation of the input signal can be computed approximately by

$$\begin{aligned} R'_{x}(T_{s}) &\approx E\left\{y'_{m+1}[k]y_{m}[k]\right\} \\ &= R_{y'_{m+1}y_{m}}[0], \quad 0 \le m \le M - 1. \end{aligned} \tag{19}$$

In other words, the first order derivative of the autocorrelation of the input signal at the time of  $T_s$  is computed by using one cross-correlation between the sampled output of the  $m^{th}$ channel ADC and the derivative of the output of the  $(m + 1)^{th}$  sub-ADC. It has less computation complexity than the approach in [10], [25] where a sum of many cross-correlations is used.
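The single cross-product estimate (19) can be sketched as follows. This is only an illustration under stated assumptions: a unit-amplitude sine input, a normalized $T_s = 1$, hypothetical skews, and an ideal differentiator (the analytic derivative stands in for the polyphase derivative filter); `dR_est` and `dR_true` are names introduced for the example.

```python
import numpy as np

M, Ts = 4, 1.0
w = 2 * np.pi * 0.21                                # hypothetical tone at 0.21 fs
dt = np.array([0.5e-3, -1.0e-3, 2.0e-3, -1.5e-3])   # hypothetical skews (units of Ts)
k = np.arange(1000)

def t(m):
    """Actual sampling instants of channel m, following (1) with t0 = 0."""
    return k * M * Ts + m * Ts + dt[m]

y0 = np.sin(w * t(0))              # output of sub-ADC 0
dy1 = w * np.cos(w * t(1))         # derivative of the sub-ADC 1 output (ideal differentiator assumed)

dR_est = np.mean(dy1 * y0)                 # (19) with m = 0
dR_true = -0.5 * w * np.sin(w * Ts)        # analytic R'_x(Ts), since R_x(tau) = 0.5*cos(w*tau)

print(dR_est, dR_true)
```

With small skews the two values differ only by a term proportional to $\delta t_1 - \delta t_0$, which is why the distorted outputs can replace the unknown ideal ones in this estimate.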

3) Solution Of The Linear Equation System: The linear equation system (15) can be written in matrix notation as follows.

$$\frac{1}{R'_{x}(T_{s})} \begin{bmatrix} \Gamma_{0} \\ \Gamma_{1} \\ \vdots \\ \Gamma_{M-1} \end{bmatrix} \approx \mathbf{H} \times \delta \mathbf{T}, \qquad (20)$$

where  $\delta \mathbf{T}$  is a column vector whose elements are individual clock skews of sub-ADCs, and expressed by

$$\delta \mathbf{T} = [\delta t_0, \delta t_1, \cdots, \delta t_{M-1}]^T \tag{21}$$

and

$$\mathbf{H} = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 & -1 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ -1 & 0 & 0 & \cdots & -1 & 2 \end{bmatrix}_{M \times M}$$
(22)

is a constant circulant matrix of size  $M \times M$  (M rows and M columns) having the following properties.

- If the column index is equal to the row index, its corresponding coefficient is equal to 2.
- If  $ind_{col} = (ind_{row} \pm 1) \mod M$ , its corresponding coefficients are equal to -1, where  $ind_{col}$ ,  $ind_{row}$  are the index of column and row of matrix **H**, respectively.
- The coefficients are equal to 0 for all other indexes of the rows and columns.
- **H** is a symmetric matrix, i.e.,  $\mathbf{H} = \mathbf{H}^T$ , where symbol T notates the transpose operator.
- The circulant matrix $\mathbf{H}$ is fully specified by its first row vector. The remaining rows are generated by cyclic permutations of the first row. The final row is the first row in reverse order. Moreover, both the sum of all rows and the sum of all columns are zero. As a result, the rank of the circulant matrix $\mathbf{H}$ follows from [31] and is equal to $(M - 1)$.

Taking a sum of all equations of the above linear equation system results in

$$\frac{1}{R'_x(T_s)} \left( \Gamma_0 + \Gamma_1 + \ldots + \Gamma_{M-1} \right) = 0,$$
 (23)

which does not depend on the variables $\delta t_m$, $0 \leq m \leq M - 1$. In other words, one equation of the system is redundant. As a result, the linear system (20) is simplified and expressed in matrix notation by

$$\mathbf{A}\delta\mathbf{T} = \frac{1}{R'_x\left(T_s\right)}\mathbf{\Gamma},\tag{24}$$

where

$$\boldsymbol{\Gamma} = \left[\Gamma_1, \cdots, \Gamma_{M-1}\right]^T \tag{25}$$

and

$$\mathbf{A} = \begin{bmatrix} -1 & 2 & -1 & \cdots & 0 & 0\\ 0 & -1 & 2 & \cdots & 0 & 0\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\ -1 & 0 & 0 & \cdots & -1 & 2 \end{bmatrix}_{(M-1) \times M} \tag{26}$$

is also a constant circulant matrix of size $(M - 1) \times M$. The matrix $\mathbf{A}$ is generated by deleting the first row of the constant matrix $\mathbf{H}$. From the properties of $\mathbf{H}$, the matrix $\mathbf{A}$ has *full row rank*, i.e., $rank(\mathbf{A}) = M - 1$ [31]. Equivalently, the square $(M - 1) \times (M - 1)$ matrix $\mathbf{AA}^{\mathrm{T}}$ is invertible, i.e., it has full rank $(M - 1)$ [32]. $\mathbf{\Gamma}$ is a vector of size $(M - 1) \times 1$ on the right-hand side of the linear system (24). An $M \times 1$ vector $\delta \mathbf{T}$ is a solution of the equation system (24) if $\mathbf{A}\delta \mathbf{T} = \frac{1}{R'_x(T_s)}\mathbf{\Gamma}$ holds.

The system of linear equations (24) is underdetermined because there are more unknowns (variables  $\delta t_m$ ,  $0 \le m \le M - 1$ ) than equations. This occurs once there are more columns than rows with linearly independent rows. In this case, the least-norm (or least square) solution of the underdetermined system of linear equations is given by authors in [32]–[35]. The unique minimum norm solution (or the unique solution with minimal Euclidean norm) of the system can then be expressed as [32]–[35].

$$\delta \mathbf{T} \approx \mathbf{A}^{\mathbf{T}} \left( \mathbf{A} \mathbf{A}^{\mathbf{T}} \right)^{-1} \frac{1}{R'_{x} \left( T_{s} \right)} \mathbf{\Gamma}.$$
 (27)

 $\mathbf{A}^{\dagger} = \mathbf{A}^{\mathbf{T}} (\mathbf{A}\mathbf{A}^{\mathbf{T}})^{-1}$  is called the *pseudo-inverse* constant circulant matrix of full rank of M.  $\mathbf{A}^{\dagger}$  is also called a right inverse matrix of  $\mathbf{A}$ . The elements (or coefficients) of a minimum norm solution vector of the underdetermined equations are the clock skew estimates of sub-ADCs  $\delta t_m, 0 \leq m \leq M-1$ .
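The minimum-norm solution (27) can be illustrated with a few lines of linear algebra. This is a sketch under stated assumptions, not the synthesized MPU: the value of $R'_x(T_s)$ and the skews are hypothetical, and the error functions are generated directly from the linear model (15) rather than measured. Because the null space of $\mathbf{A}$ is spanned by the all-ones vector, the minimum-norm estimate equals the true skews with their common mean removed; a common offset of all channels is only an overall delay of the converter.

```python
import numpy as np

M = 4
I = np.eye(M)
# Circulant H of (22): 2 on the diagonal, -1 on the two cyclic neighbours
H = 2 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)
A = H[1:, :]                                  # (26): H without its first row
A_pinv = A.T @ np.linalg.inv(A @ A.T)         # right inverse used in (27)

dR = -0.6                                     # hypothetical value of R'_x(Ts)
dt_true = np.array([0.0, 1.0e-3, -2.0e-3, 1.5e-3])   # hypothetical skews (units of Ts)
gamma = dR * (H @ dt_true)                    # error functions from the linear model (15)

dt_hat = A_pinv @ (gamma[1:] / dR)            # minimum-norm estimate (27), using Gamma_1..Gamma_{M-1}

print(dt_hat)
print(dt_true - dt_true.mean())               # same values: skews relative to their mean
```

Since $\mathbf{A}^{\dagger}$ depends only on M, it can be precomputed once, which is consistent with treating it as a constant input parameter of the calibration.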

## C. Architecture Of The Proposed Calibration

The overall architecture of the proposed calibration technique is shown in Fig. 3. It mainly consists of a *Derivative Polyphase Filter (DPF)* block proposed in [19], a *Cross-correlation Computation Unit (CCU)*, a *Gamma computation Unit (* $\Gamma$ U) block, and a *Matrix Processing Unit (MPU)* block.

Fig. 3: The overall architecture of the proposed calibration

• The Derivative Polyphase Filters (DPF): From (19) and (2), the derivatives of the sub-ADC outputs are required to estimate the timing mismatch coefficients. The signal derivatives of the sub-ADC outputs are obtained by a DPF block. The DPF block encompasses polyphase filter sub-systems subPF<sub>i</sub>, $0 \le i \le M - 1$, described in [19], [26]. Each subPF<sub>i</sub> is formed by delay elements $z^{-1}$ and the type-1 polyphase filter components $P_i(z)$ of the Nyquist derivative FIR filter. The impulse response of the $i^{th}$ polyphase filter is given by

$$p_i[n] = h_d[nM+i], i = \{0, 1, \cdots, M-1\},$$
 (28)

where  $h_d[n]$  is the impulse response of ideal derivative filter and expressed as [29], [30]

$$h_d[n] = \begin{cases} \frac{\cos(n\pi)}{n} & (n \neq 0) \\ 0 & (n = 0) \end{cases}.$$
 (29)

The architecture of $subPF_i$ is shown in detail in Fig. 4 for the example of a two-channel TIADC. This architecture can be easily extended to *M*-channel TIADCs [19], [26].



Fig. 4: Equivalent polyphase structure of the two-channel TIADCs [19], [26].
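The polyphase decomposition (28) can be sketched as follows. This is an illustrative model rather than the structure of [19], [26]: it assumes M = 4, a 25-tap Hann-windowed version of (29) made causal by a delay, and zero initial conditions; `polyphase_derivative` is a hypothetical helper name. The check confirms that combining the branch filters at the sub-ADC rate reproduces the same filter run on the interleaved stream at the full rate.

```python
import numpy as np

M, L = 4, 25                                   # channels; 25 taps
c = (L - 1) // 2
n = np.arange(-c, c + 1)
h = np.zeros(L)
h[n != 0] = np.cos(np.pi * n[n != 0]) / n[n != 0]   # ideal differentiator (29), delayed by c samples
h *= np.hanning(L)                                  # Hann window against truncation error

# Type-1 polyphase components (28): p[i, r] = h[r*M + i], after zero-padding to a multiple of M
h_pad = np.concatenate((h, np.zeros(-L % M)))
p = h_pad.reshape(-1, M).T

def polyphase_derivative(ych, p):
    """Derivative of every sub-ADC stream, computed entirely at the sub-ADC rate."""
    M, K = ych.shape
    d = np.zeros((M, K))
    for m in range(M):
        for i in range(M):
            u = np.convolve(ych[(m - i) % M], p[i])[:K]   # branch filter P_i on channel (m - i) mod M
            if i > m:                                      # sample comes from the previous frame: z^-1
                u = np.concatenate(([0.0], u[:-1]))
            d[m] += u
    return d

rng = np.random.default_rng(0)
y = rng.standard_normal(M * 64)                # interleaved full-rate stream (arbitrary test data)
ych = y.reshape(-1, M).T                       # ych[m, k] = y[k*M + m]
d_poly = polyphase_derivative(ych, p)
d_full = np.convolve(y, h)[:len(y)]            # same filter run at the full rate

print(np.max(np.abs(d_poly.T.ravel() - d_full)))
```

Each branch convolution operates on one sub-ADC stream only, so every multiplier runs at $f_s/M$, which is the motivation for the polyphase form.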

• Cross-correlation Computation Unit (CCU): The CCU computes the cross-correlation between two signals defined in (4). The limit in (4) cannot be evaluated in practice. Instead, for a given lag l, the cross-correlation $R_{fg}[l]$ is estimated by averaging the product of the two signals over N samples as [29], [30]

$$\begin{aligned} R_{fg}[l] &= E\{f[n+l]g[n]\} \\ &\approx \frac{1}{N}\sum_{n=0}^{N-1} f[n+l]g[n]. \end{aligned} \tag{30}$$

The computation of the expected values in (30) is realized by the CCU block as shown in Fig. 5 for lag l = 0; it consists of a multiplier and a *Modified Moving Average* (MMA) filter [25]. Note that if the time shift $l \neq 0$, a delay is added to the corresponding input of the multiplier in order to obtain the signal f[n+l] with lag l.

Fig. 5: Cross-correlation Computation Unit (CCU).

The MMA filter (or *Smoothed Moving Average* (SMMA) filter) is defined by

$$s[n] = \frac{N-1}{N}s[n-1] + \frac{1}{N}q[n],$$
 (31)

where s[n] and q[n] are its output and input, respectively. It is used to calculate the average of the cross-product between the two input signals in (30). The difference equation (31) can be rewritten as

$$s[n] = \frac{1}{N} \left( q[n] - s[n-1] \right) + s[n-1].$$
(32)

As a result, the MMA filter is drawn in Fig. 6.

Fig. 6: The MMA filter.

The coefficient $\alpha = \frac{1}{N}$ represents a constant smoothing factor of the MMA filter. The number of samples N is selected to be $2^k$, where k is a positive integer, since multiplying the signal by $\alpha$ is then equivalent to an arithmetic right shift by k places, which removes the need for a hardware multiplier.
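The recursion (32) is easy to model in software. The sketch below is a floating-point illustration with assumed test signals (two correlated Gaussian sequences with $E\{fg\} = 0.5$), not the fixed-point hardware; `mma_filter` is a name chosen for the example.

```python
import numpy as np

def mma_filter(q, k, s0=0.0):
    """Run the MMA recursion (32) with N = 2**k on the sequence q."""
    alpha = 2.0 ** -k
    s = np.empty(len(q))
    prev = s0
    for n, qn in enumerate(q):
        prev = prev + alpha * (qn - prev)     # (32); in fixed point the product is a k-bit right shift
        s[n] = prev
    return s

# Step response: a constant input of 1 gives s[n] = 1 - (1 - alpha)**(n + 1)
step = mma_filter(np.ones(10), k=3)

# Cross-correlation estimate as in the CCU: average the product of two signals
rng = np.random.default_rng(0)
f = rng.standard_normal(20000)
g = 0.5 * f + rng.standard_normal(20000)      # hypothetical pair of signals with E{f g} = 0.5
s = mma_filter(f * g, k=8)

print(step[-1], s[-1])
```

A larger k reduces the variance of the estimate at the cost of a slower response, which is the compromise between convergence speed and estimation precision controlled by $\alpha$.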

- Gamma computation Unit ( $\Gamma$U): The $\Gamma$U consists of the Gamma computation sub-units $\Gamma$U<sub>m</sub> that compute the timing error functions $\Gamma_m, 1 \leq m \leq M - 1$. From (4) and (7), sub-unit $\Gamma$U<sub>m</sub> is illustrated in Fig. 7 for $1 \leq m \leq M - 2$. From (4) and (10), the sub-unit $\Gamma$U<sub>M-1</sub> is shown in Fig. 8. Sub-unit $\Gamma$U<sub>0</sub> is not shown herein because the timing error function $\Gamma_0$ is not used for the timing skew estimation, see (25). However, the architecture of $\Gamma$U<sub>0</sub> can be easily derived based on (4) and (10).
- Matrix Processing Unit (MPU): The MPU performs the matrix product in (27). The pseudo-inverse constant circulant matrix $\mathbf{A}^{\dagger}$ is supplied as an input parameter of the proposed calibration algorithm.

Fig. 7: Sub-Unit $\Gamma U_m$, $1 \le m \le M - 2$.

• Correction Scheme: The corrected samples $\hat{y}_m[k]$ of the $m^{th}$ sub-ADC are computed by first-order Taylor approximation according to (33). This principle is realized in Fig. 9.

$$\hat{y}_m[k] \approx y_m[k] - \hat{\delta}t_m y'_m[k]. \tag{33}$$

Fig. 9: Correction Scheme
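The effect of the correction (33) can be illustrated with a short numerical experiment. It is a sketch under stated assumptions: a unit-amplitude sine input, hypothetical skews, a perfect skew estimate, and an ideal differentiator applied to the distorted output (the analytic derivative replaces the polyphase filter); `improvement_db` is a name introduced for the example.

```python
import numpy as np

M, Ts = 4, 1.0
w = 2 * np.pi * 0.21                              # hypothetical tone at 0.21 fs
dt = np.array([0.0, 1.0e-3, -2.0e-3, 1.5e-3])     # hypothetical skews (units of Ts)
k = np.arange(2000)

err_before, err_after = [], []
for m in range(M):
    t = (k * M + m) * Ts
    x = np.sin(w * t)                    # ideal samples x_m[k]
    y = np.sin(w * (t + dt[m]))          # skewed samples y_m[k], as in (1)
    dy = w * np.cos(w * (t + dt[m]))     # derivative of the distorted output (ideal differentiator assumed)
    y_hat = y - dt[m] * dy               # correction (33) with a perfect skew estimate
    err_before.append(y - x)
    err_after.append(y_hat - x)

def rms(parts):
    """Root-mean-square value over all channels."""
    return np.sqrt(np.mean(np.concatenate(parts) ** 2))

improvement_db = 20 * np.log10(rms(err_before) / rms(err_after))
print(improvement_db)
```

The residual after correction is of second order in $\omega\,\delta t_m$, so using $y'_m[k]$ in place of the unknown $x'_m[k]$ costs little as long as the skews are small compared with the signal period.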

# D. Calibration For Input At Any Nyquist Band

As elaborated in Sections III-A, III-B and III-C, in order to cancel out the timing skew error, the derivative of the WSS input band-limited to the Nyquist frequency is determined by using the polyphase decomposition of the ideal differentiator filter with impulse response $h_d[n]$. Let us consider a continuous-time WSS Bandpass (BP) input signal inside the $k_{NB}^{th}$ NB. Its frequency content is then confined to the two frequency bands defined by

$$(k_{NB} - 1) \frac{f_s}{2} < f_L \le |f| \le f_H < k_{NB} \frac{f_s}{2}, k_{NB} \ge 1,$$
 (34)

where $f_L$, $f_H$ are the low and high cutoff frequencies of the input, respectively. This BP input occupies the $k_{NB}^{th}$ Nyquist zone. If condition (34) is fulfilled, there will be no aliases after sub-sampling the original input [36]. In [19], [27], a filter is proposed to compute the derivative of the original BP input of an undersampling (or sub-sampling) TIADC. The filter is called a BP Derivative (BD) filter and is re-sketched in Fig. 10.

Fig. 10: Bandpass derivative filter.

It encompasses a scaling factor dependent on the NB order $k_{NB}$ and two Finite Impulse Response (FIR) filters with constant coefficients: a differentiator $h_d[n]$ and a Hilbert filter. The Hilbert filter is an all-pass filter that shifts the input signal phase by 90 degrees [36], and its impulse response $h_h[n]$ is expressed by [30], [36]

$$h_h[n] = \begin{cases} \frac{2}{\pi} \frac{\sin^2\left(\frac{\pi n}{2}\right)}{n} & (n \neq 0)\\ 0 & (n = 0). \end{cases}$$
(35)

The impulse response of the BD filter is expressed by

$$h_{bd}[n] = h_d[n] + h_h[n] \times (-1)^{k_{NB}} \left\lfloor \frac{k_{NB}}{2} \right\rfloor 2\pi.$$
 (36)

The constant (or scale factor) $(-1)^{k_{NB}} \times \lfloor \frac{k_{NB}}{2} \rfloor \times 2\pi$ is supplied as an input parameter of the proposed calibration algorithm. Given the NB order, the BD filter has constant coefficients. By decomposing the BD filter $h_{bd}[n]$ into its *M*-component polyphase structure, as done for the aforementioned differentiator $h_d[n]$, the overall architecture of the proposed calibration illustrated in Fig. 3 is applicable to inputs in any NB.
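The coefficients of the BD filter follow directly from (29), (35) and (36). The sketch below only builds the tap values; it assumes an odd tap count centered at n = 0 and an optional Hann window, and `bd_filter` is a hypothetical helper name rather than part of the original design.

```python
import numpy as np

def bd_filter(num_taps, k_nb, window=True):
    """Taps of the bandpass derivative filter (36) for Nyquist band k_nb (odd num_taps assumed)."""
    c = (num_taps - 1) // 2
    n = np.arange(-c, c + 1)
    nz = n != 0
    h_d = np.zeros(num_taps)
    h_h = np.zeros(num_taps)
    h_d[nz] = np.cos(np.pi * n[nz]) / n[nz]                          # differentiator (29)
    h_h[nz] = (2 / np.pi) * np.sin(np.pi * n[nz] / 2) ** 2 / n[nz]   # Hilbert filter (35)
    scale = (-1) ** k_nb * (k_nb // 2) * 2 * np.pi                   # scale factor of (36)
    h_bd = h_d + scale * h_h
    if window:
        h_bd = h_bd * np.hanning(num_taps)                           # Hann window (assumption)
    return h_bd

h1 = bd_filter(25, k_nb=1, window=False)   # first NB: the scale factor is zero, so h_bd = h_d
h2 = bd_filter(25, k_nb=2, window=False)   # second NB: scale factor 2*pi
print(h1[12:15], h2[12:15])
```

For $k_{NB} = 1$ the floor term vanishes and the BD filter reduces to the baseband differentiator, so the same hardware structure covers the Nyquist case with only a change of coefficients.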

# IV. EXPERIMENTAL RESULTS

#### A. Simulation Results

To verify the efficiency of the proposed technique, simulations were carried out on an undersampling four-channel TIADC with 60dB SNR (thermal noise level) clocked at $f_s = 2.7$GHz. The timing skews are modeled as Gaussian random variables with zero mean and a standard deviation $\delta_{te}$ of 0.33ps. The constant smoothing factor $\alpha$ of the MMA filters is chosen as $2^{-15}$ to achieve a good compromise between convergence speed and parameter estimation precision [37]. The NB order $k_{NB}$ of the input is designed up to the fourth Nyquist band for next-generation sub-sampling radio receivers in order to allow a proper and feasible sampling rate [29], [38]. The simulations demonstrating the efficiency of the proposed calibration are therefore performed up to the fourth NZ, i.e., with $k_{NB} \in \{1, 2, 3, 4\}$.

The number of FIR taps is designed to be equal for both the derivative filter and the Hilbert filter. The coefficients of these FIR filters are obtained by multiplying the exact coefficients by a Hanning window to mitigate the influence of the truncation error. Their polyphase filters are causal, linear-phase FIR filters. The coefficients of the polyphase filters are explicitly computed based on equations (28), (29), (35) and (36). Note that the constant group delay of the polyphase filter, $\frac{\text{number of FIR taps}-1}{2M}$, is added to all signal paths to balance the delay. In order to avoid a non-integer delay, the number of FIR taps of the derivative and Hilbert filters should be an integer multiple of 2M plus 1 [19]. The SNDR and SFDR performance versus the number of FIR taps over the first four NBs is drawn in Fig. 11. The solid and dash-dot curves show the performance with and without calibration, respectively. The curves with $\bigcirc, \Box, \bigstar$ and $\triangleright$ markers show the SNDR/SFDR performance for the first four NZs, respectively. As can be noticed, the optimal number of FIR taps is 25, at which the performance saturates.

Fig. 11: SNDR/SFDR performance vs. number of FIR taps.
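The tap-count rule can be checked with simple arithmetic. The sketch assumes that the group delay of a linear-phase FIR filter, (taps - 1)/2 full-rate samples, corresponds to (taps - 1)/(2M) samples at the sub-ADC rate, which is consistent with the "integer multiple of 2M plus 1" rule stated above; the function names are hypothetical.

```python
from fractions import Fraction

def branch_delay(num_taps, M):
    """Group delay of a linear-phase FIR filter, expressed in sub-ADC-rate samples."""
    return Fraction(num_taps - 1, 2 * M)

def valid_tap_counts(M, max_taps):
    """Tap counts of the form 2*M*j + 1, for which the delay above is an integer."""
    return [n for n in range(1, max_taps + 1) if branch_delay(n, M).denominator == 1]

print(branch_delay(25, 4))        # delay of the 25-tap filter for M = 4
print(valid_tap_counts(4, 40))    # admissible tap counts up to 40 for M = 4
```

For the four-channel case, 25 taps satisfies the rule, whereas a neighboring odd length such as 23 or 27 would require a fractional delay in the other signal paths.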

Fig. 12 shows the output spectrum of the TIADC before and after calibration for a single-tone sinusoidal input signal generated at  $f_{in} = 0.45 \times f_s + \frac{f_s}{2}$  in the second NB. As illustrated in Fig. 12(b), the spurs due to the timing skews are mitigated significantly in comparison with the output spectrum before calibration shown in Fig. 12(a). The SFDR is improved by almost 35dB. The SNDR after calibration is 60dB, which is equal to its value in the no-mismatch case. Fig. 13(a) illustrates the convergence speed



Fig. 12: Output spectrum of four channel TIADCs with input  $f_{in} = 0.45 \times f_s + \frac{f_s}{2}$  in second Nyquist band (Due to undersampling TIADC,  $f_{in}$  maps to  $0.05 \times f_s$  in baseband).

of timing mismatch estimates. Monte Carlo simulations are used to generate many realizations of the clock skews in order to show the convergence speed of the SNDR during calibration for a single-tone sinusoidal input at a frequency of  $0.45 \times f_s + (k_{NB} - 1) \times f_s, k_{NB} \in \{1, 2, 3, 4\}$ . The SNDR curves as functions of time are shown in Fig. 13(b). By comparing Fig. 13(a) and Fig. 13(b), it can be seen that the clock skew estimates converge to their expected values after 5K samples (or after  $1.8\mu$s). Note that the convergence time



(a) Convergence speed of skews (b) Convergence speed of SNDR

Fig. 13: Convergence speed behavior during calibration.

depends on the simulated parameters such as the input frequency, the number of channels, and the channel mismatches. To analyze the key differences between the proposed calibration and the state of the art, Table I presents the reported convergence times and the main characteristics of the prior-art techniques.

TABLE I: Comparison with the state of the art techniques.

| Characteristics              | ISSCC<br>2014 [10] | TCAS-I<br>2012 [21] | SPAWC<br>2011 [22] | TCAS-II<br>2015 [19] | TCAS-I<br>This work |
|------------------------------|--------------------|---------------------|--------------------|----------------------|---------------------|
| Background                   | Yes                | Yes                 | Yes                | Yes                  | Yes                 |
| Blind                        | Yes                | Yes                 | semi-blind         | Yes                  | Yes                 |
| Input in any NB              | No                 | Yes                 | Yes                | Yes                  | Yes                 |
| Add ref. channel             | No                 | Yes                 | No                 | No                   | No                  |
| Pilot input injection        | No                 | No                  | Yes                | No                   | No                  |
| M (# of Channels)            | 12                 | 4                   | 2                  | 4                    | 4                   |
| Clock freq. of filter        | $f_s$              | $f_s$               | $f_s$              | $f_s/M$              | $f_s/M$             |
| Cal. Manner                  | FF                 | FB                  | FB                 | FB                   | FF                  |
| Conv. time<br>[# of samples] | 10K                | 1.5K                | 4K                 | 10K                  | 5K                  |

Fig. 14 illustrates the SNDR and SFDR versus the baseband frequencies to which the subsampling TIADC folds an input frequency located in a higher NB. As can be seen, the SNDR and SFDR before calibration decrease as the input frequency and the NZ order increase. This is because the impact of timing skew increases with input frequency [3]. After calibration, the SFDR remains smaller for higher NZs. This is due to the fact that (i) (8) and (2) are obtained by truncating the Taylor series after the first derivative, assuming that the sample-time error is small; and (ii) the derivative signals in (2) and (19) are computed using the uncorrected sub-ADC outputs instead of the input. These approximations become less accurate when the input frequency increases because the distortion level rises. Nevertheless, the proposed technique achieves an SFDR improvement of at least 28dB and 60dB SNDR over the first four NBs, which proves the efficiency and added value of the proposed technique.
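The frequency dependence of approximation (i) can be checked numerically. The sketch below only evaluates the residual of a first-order Taylor expansion of a unit-amplitude sinusoid for a fixed skew; it does not model the calibration itself. The skew is fixed to the 0.33ps standard deviation quoted above, and the test frequencies $0.45 f_s + (k_{NB}-1) f_s/2$ are an assumption chosen to place one tone in each of the first four NBs.

```python
import numpy as np

FS = 2.7e9        # TIADC sampling rate [Hz]
SKEW = 0.33e-12   # one skew value, set to the quoted standard deviation [s]

def first_order_residual(f_in, skew, num_points=4096):
    """Peak error of x(t + skew) ~ x(t) + skew * x'(t) for a unit sinusoid."""
    t = np.linspace(0.0, 1.0 / f_in, num_points, endpoint=False)
    w = 2.0 * np.pi * f_in
    exact = np.sin(w * (t + skew))
    approx = np.sin(w * t) + skew * w * np.cos(w * t)
    return np.max(np.abs(exact - approx))

residuals, bounds = [], []
for k_nb in (1, 2, 3, 4):
    f_in = 0.45 * FS + (k_nb - 1) * FS / 2.0   # assumed test tone in NB k_nb
    residual = first_order_residual(f_in, SKEW)
    bound = (2.0 * np.pi * f_in * SKEW) ** 2 / 2.0   # second-order Taylor remainder bound
    residuals.append(residual)
    bounds.append(bound)
    print(f"NB{k_nb}: residual = {20 * np.log10(residual):.1f} dB re full scale")
```

The neglected term scales with $(2\pi f_{in}\,\Delta t)^2/2$, so it grows by roughly 12dB each time the input frequency doubles, which is in line with the lower post-calibration SFDR reported for higher NZs.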

Fig. 15 shows the SNDR and SFDR versus the standard deviation  $\delta_{te}$  of the clock skews for an input at frequency  $f_{in} = 0.45f_s + \frac{f_s}{2}$  in the second NZ. The solid and dash-dot curves show the performance after and before calibration, respectively. The SNDR and SFDR before calibration decrease when the timing skews increase. This is because the impact



Fig. 14: Minimum SNDR/SFDR vs. input frequencies over the first four NBs: (a) SNDR; (b) SFDR.

of timing mismatch rises with clock skews. The proposed calibration significantly improves the linearity up to 97dB SFDR for  $\delta_{te}$  less than 0.2ps.

The presented feedforward calibration is also validated for a band-limited bandpass multitone input signal. Fig. 16 shows the output spectrum with and without calibration for a 47-tone sinusoidal input in the second NB with  $f_L = 0.05f_s + \frac{f_s}{2}$  and  $f_H = 0.4f_s + \frac{f_s}{2}$ . As illustrated in Fig. 16, the undersampling TIADC directly down-converts the bandpass input to the baseband  $(0, \frac{f_s}{2})$ , i.e.,  $f_H$  and  $f_L$  map to  $0.1f_s$  and  $0.45f_s$ , respectively, due to the second-NZ sub-sampling. It can be seen that the spurs due to timing skews are reduced to the noise floor. The proposed technique obtains an SNDR improvement of approximately 14dB.
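The frequency mapping quoted for the second NZ follows from ordinary aliasing, which the short sketch below reproduces with frequencies normalized to $f_s$. The helper names are hypothetical.

```python
import math

def fold_to_baseband(f_in, fs):
    """Frequency in [0, fs/2] at which a tone at f_in appears after sampling at fs."""
    f = f_in % fs
    return f if f <= fs / 2.0 else fs - f

def nyquist_band(f_in, fs):
    """1-based index of the Nyquist band (width fs/2) that contains f_in."""
    return int(f_in // (fs / 2.0)) + 1

fs = 1.0                           # work in units of f_s
single_tone = 0.45 * fs + fs / 2   # tone used for Fig. 12
f_low = 0.05 * fs + fs / 2         # lower band edge of the multitone input
f_high = 0.40 * fs + fs / 2        # upper band edge of the multitone input

for name, f in (("single tone", single_tone), ("f_L", f_low), ("f_H", f_high)):
    print(f"{name}: NB{nyquist_band(f, fs)}, folds to {fold_to_baseband(f, fs):.2f} f_s")
```

Even-numbered bands are spectrally inverted by the folding, which is why the upper band edge $f_H$ lands below the lower band edge $f_L$ in baseband.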

## B. Hardware Implementation and Validation

The FPGA and ASIC design flow using Matlab/Simulink in [19], [24], [26] is applied in this framework. The hardware architecture of the proposed calibration is designed and optimized in terms of the fixed-point representation of signals, which is characterized by the signal ranges and the signal *Word-Length* (WL). In the Optimal Fixed-point Simulink (OFpS) model, the parameters to be optimized are the order of the FIR filters and the signal WLs. The signal ranges of the block signals in the OFpS model are mathematically computed based on the transfer functions of the DSP blocks and on simulations, as done in [24]. The signal ranges



Fig. 15: SNDR and SFDR vs. the standard deviation  $\delta_{te}$  of clock skews for the input at frequency  $f_{in} = 0.45f_s + \frac{f_s}{2}$  in the second NZ.



Fig. 16: Output spectra for the band-limited bandpass multitone input x(t) with  $f_L = 0.05f_s + \frac{f_s}{2}$ ,  $f_H = 0.4f_s + \frac{f_s}{2}$  in the second NZ (a) before calibration: SNDR = 43.9dB and (b) after calibration: SNDR = 57.4 dB. (cutoff frequencies of  $f_H$ ,  $f_L$  map to  $0.1f_s$  and  $0.45f_s$  in baseband, respectively).

determine the fractional factors used to convert signal values into a binary representation, while the WLs impact the SNDR/SFDR performance. Thus, the FIR orders and the WLs of all block signals are optimized based on SNDR/SFDR metrics, as presented in Fig. 11 for the FIR order optimization and in Fig. 17 for the corrected sub-ADC outputs. As can be seen in Fig. 17, 14 bits are assigned to the compensated sub-ADC outputs.



Fig. 17: Minimum SNDR/SFDR vs. WLs of the corrected sub-ADC outputs: (a) SNDR; (b) SFDR.

The OFpS model (or hardware architecture) processes real-time signal data in a sample-by-sample manner. Therefore, the delays of all signal datapaths are balanced. Note that pipeline registers are inserted to reduce the combinational path length and improve the global working frequency and throughput [19]. Fig. 18 shows the pipelined OFpS architecture of



Fig. 18: The OFpS architecture of the proposed calibration for four-channel TIADCs.

the proposed calibration. The  $z^{-k}$  blocks are delay/pipeline registers. The fixed-point data type 13.11 means that the signal WL is 13 bits with a fractional factor of  $2^{-11}$  [19], [26]. The sub-ADC outputs are represented on 13 bits to decrease the sensitivity to digital truncation errors. The system-level simulation reveals that the corrected sub-ADC outputs should be represented on 14 bits for minimum digital truncation errors. Note that since the delay/pipeline registers are manually inserted at the outputs and inputs of the polyphase filters subPF<sub>i</sub>, the total number of delay/pipeline registers of the DPF is 8. As shown in Fig. 18, the latency between the distorted output and the compensated output is 13 clock cycles.
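The 13.11 notation can be made concrete with a small quantizer. The sketch assumes a signed two's-complement format with round-to-nearest and saturation; the actual rounding and overflow modes of the OFpS model are not stated in this section, and the function name is hypothetical.

```python
import numpy as np

def quantize(x, word_length, frac_bits):
    """Map x onto a signed fixed-point grid with step 2**-frac_bits (assumed format)."""
    scale = 2.0 ** frac_bits
    lowest_code = -(2 ** (word_length - 1))
    highest_code = 2 ** (word_length - 1) - 1
    # np.round rounds to the nearest code (ties to even); np.clip saturates on overflow.
    codes = np.clip(np.round(np.asarray(x, dtype=float) * scale), lowest_code, highest_code)
    return codes / scale

step = 2.0 ** -11
value = quantize(0.3, 13, 11)      # a value inside the representable range
top = quantize(5.0, 13, 11)        # saturates at the largest positive code
bottom = quantize(-5.0, 13, 11)    # saturates at the most negative code
print(f"0.3 -> {value}, range = [{bottom}, {top}], step = {step}")
```

Under these assumptions a 13.11 signal covers the range $[-2, 2 - 2^{-11}]$ in steps of $2^{-11}$, so the rounding error of any in-range value is at most $2^{-12}$.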

The OFpS model is converted into a Hardware Description Language (HDL) using the Matlab HDL Coder toolbox. In this framework, Verilog code is generated automatically for the FPGA/ASIC design target. The Hardware-In-Loop (HIL) emulation methodology in [19], [24], [26] is applied to validate the proposed calibration on the Altera FPGA board. The emulation shows that the synthesized circuit operates properly on the FPGA and uses only a few percent of the hardware resources of the FPGA chip.

1) ASIC synthesis: The ASIC design flow proposed in [19], [26] is applied in this framework. The HDL design is synthesized to a gate-level netlist using Cadence RTL Compiler (RC) targeting the ST-28nm FD-SOI technology. The RC synthesis tool uses automatic datapath retiming techniques in order to improve the working frequency. To save more power, the low-power (RC-LP) engine automatically inserts clock-gating logic for register banks. The RC-LP engine identifies when the registers are inactive and disables the clock during these periods. Once the timing requirements are fulfilled, automatic placement and routing are performed using the Cadence Encounter Place and Route (P&R) tool. During the P&R phase, clock tree synthesis is also performed and the clock buffers are placed to ensure correct clock propagation and synchronization in the design [19]. The P&R logic simulations are performed using the P&R Verilog gate-level netlist and the Verilog testbench generated by the Matlab HDL Coder toolbox. The Verilog Value-Change Dump (VCD) file generated by the P&R logic simulations contains information about value changes on selected signals. Reading the switching activity from the VCD file into Encounter RTL Compiler provides detailed information about the switching behavior of nets and ports, which leads to a more accurate power estimation [19].

Note that the fixed-point coding, the optimization of the filter coefficients and the optimization of the signal-path Word-Lengths (WLs) were performed using Matlab/Simulink. The Verilog code was generated from this model, so the output of the ModelSim simulation is identical to the output of the Matlab/Simulink simulation as long as there are no timing errors in the ModelSim simulation. Thus, if there is no error in the post-P&R logic simulations, the output of the synthesized circuit after the layout simulations is the same as the output of the fixed-point Simulink model.

Using the polyphase implementation, the working clock frequency of the synthesized circuit is  $\frac{f_s}{4} = 675$ MHz. The post-P&R logic simulations are executed without any error messages at 2.7GHz, with the timing report showing a Total Negative Slack (TNS) of 0, a Worst Negative Slack (WNS) of 0.78ns, and no violating paths. The overall speed of the calibration system is 2.7GHz. It consumes a total power of 33.2 mW at an input frequency of 5.3 GHz in the fourth NZ and occupies an area of  $0.04 \text{mm}^2$. The synthesized calibration system uses 26032 logic gates in total, as shown in Table II.

TABLE II: The targeted logic gates for sub-sampling ADCs

| Type       | Instances | Area [$\mu m^2$] | Area % |
|------------|-----------|------------------|--------|
| Sequential | 4185      | 18367.181        | 41.6   |
| Inverter   | 5905      | 1963.459         | 4.4    |
| Buffer     | 449       | 240.067          | 0.5    |
| Logic      | 15493     | 23600.106        | 53.4   |
| Total      | 26032     | 44170.813        | 100.0  |

Fig. 19(a) shows the pie chart of the power consumption at the hierarchy level of the synthesized system. As can be seen, the DPF block dissipates 45% of the total power, significantly more than the other sub-modules. Fig. 19(b) shows the physical location of the sub-modules on the die of the synthesized digital circuit. The DPF block, formed from the digital polyphase filter components of the BD FIR filter, occupies the largest part of the die area.



Fig. 19: (a) Power pie chart and (b) Module locality of the chip die area

The above calibration system design works properly for the sub-sampling ADC with inputs up to NZ4. For baseband (or Nyquist) ADC applications, the calibration is designed specifically for inputs in the first NB. In this case, the digital background calibration unit dissipates 15.5mW total power at an input frequency of 1.2GHz in the first NZ and occupies  $0.02 \text{mm}^2$  of chip area. The digital unit uses 11214 logic gates, as presented in Table III. The power dissipation of

TABLE III: The targeted logic gates for baseband ADCs

| Type       | Instances | Area [$\mu m^2$]   | Area % |
|------------|-----------|--------------------|--------|
| Sequential | 2255      | 9864.787           | 47.1   |
| Inverter   | 992       | 338.477            | 1.6    |
| Buffer     | 55        | 34.272             | 0.2    |
| Logic      | 7912      | 10716.528          | 51.1   |
| Total      | 11214     | 20954.064          | 100.0  |

the digital calibration unit for the baseband ADC is roughly 50% lower than that of the circuit dedicated to sub-sampling ADCs. The latter requires larger signal WLs and adds an FIR Hilbert filter and an NZ-order-dependent scale factor to the Nyquist derivative filter.

2) Measurement Set-up: This section is dedicated to the results achieved by post-processing the data measured on the TIADC board provided by NXP Semiconductors and published in [6]. The chip is a 64-channel 11-b time-interleaved SAR ADC clocked at 2.7GHz. The 64 channels are divided into 4 quarters. Each quarter consists of 16 interleaved SAR ADCs which share the same sample-and-hold. The effect of timing mismatch in this TIADC is therefore equivalent to the timing mismatch in a four-channel TIADC. This is the reason why the above simulations were carried out for a four-channel TIADC in order to demonstrate the efficiency of our solution. Note that offset and gain mismatches are static and frequency independent. Their impacts are the same for both subsampling and regular baseband TIADCs [19], [27]. Moreover, the offset

and gain calibrations integrated on the ADC chip can be turned on/off by software. Thus, all-digital offset and gain mismatch calibrations are proposed in this framework in order to tackle all channel mismatches in subsampling TIADCs with the real samples captured from the chip.

A simple way of canceling the offset error is to calculate the modified moving average of each sub-ADC output and then subtract the average from the respective sub-ADC output samples [27]. The offset-corrected samples are then passed to the gain mismatch calibration unit, where the gain mismatch is mitigated. The relative gain of each sub-ADC with respect to a reference sub-ADC is determined by computing the ratio between the average power of that sub-ADC and the average power of the reference channel. The relative gain estimate is multiplied with the corresponding sub-ADC output samples to generate the corrected sub-ADC output [26]. The gain-corrected channels have the same gain as the reference channel, which equalizes the gain mismatch among the ADC channels.
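The offset and gain steps described above can be sketched in block form. This is a simplified illustration, not the background implementation of [26], [27]: a block average replaces the modified moving average, the amplitude correction is assumed to be the square root of the power ratio, and the function name, mismatch values and test tone are hypothetical.

```python
import numpy as np

def offset_gain_correct(channels, ref=0):
    """Remove per-channel offsets, then equalize each channel's power to channel `ref`."""
    channels = np.asarray(channels, dtype=float)
    offsets = channels.mean(axis=1, keepdims=True)          # block-average offset estimate
    centered = channels - offsets
    power = np.mean(centered ** 2, axis=1, keepdims=True)   # average power per channel
    rel_gain = np.sqrt(power[ref] / power)                  # assumed amplitude correction
    return centered * rel_gain, offsets.ravel(), rel_gain.ravel()

# Hypothetical four-channel TIADC with small gain and offset mismatches.
M, N = 4, 4096
true_gain = np.array([1.0, 1.02, 0.97, 1.01])
true_offset = np.array([0.0, -0.02, 0.005, 0.01])
n = np.arange(M * N)
x = np.sin(2.0 * np.pi * 0.0371 * n)       # test tone, frequency in cycles per sample
# Row i holds the samples x[M*k + i] taken by sub-ADC i.
raw = true_gain[:, None] * x.reshape(N, M).T + true_offset[:, None]

corrected, est_offset, rel_gain = offset_gain_correct(raw)
powers = np.mean(corrected ** 2, axis=1)
print("estimated offsets:", np.round(est_offset, 4))
print("relative gains   :", np.round(rel_gain, 4))
```

Power-based gain estimation relies on every channel seeing the same average input power, which holds for inputs such as this test tone when enough samples are averaged.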

The offset- and gain-corrected samples are then passed to the pipelined OFpS architecture, where the clock skews are corrected by our proposed calibration. The pipelined OFpS model is in fact equivalent to the synthesized digital circuit, as explained in Section IV-B1. Thus, the emulated results also reflect the efficiency of our solution when it is integrated on the ADC chip.

The measurement set-up used to capture the TIADC output is described in Fig. 20. The equipment is set up as follows:

- Arbitrary signal generators up to 6GHz.
- A 2.7GHz external clock generator supplies the clock signal to the ADC chip. The input signal generators and the clock signal generator are synchronized through a reference signal (10MHz).
- Narrow band filters are used for the input signal frequency to remove the harmonics (or distortions) created by the input signal generators and achieve a pure sinewave at the input of the ADC.
- Because of the limited Random-Access Memory (RAM) of the ADC chip (limited to 16K samples), the TIADC is connected to an FPGA DE4 board via a SATA connection. The SATA interface is made of two differential links connected to the FPGA board. Because of the limited speed of the SATA bus, the ADC output is downsampled by 5. The downsampled output sequence is saved in the FPGA RAM, which can store up to  $2^{18} = 262144$  samples. This determines the resolution of the power spectral density (PSD).
- Measured data is transferred to a Personal Computer (PC) via an I2C connection for post processing calibrations.

For the measurement, the offset and gain calibrations embedded on the chip are turned off. A two-tone sinusoidal input is created at frequencies  $f_{in} = [2300 \text{MHz}, 2301 \text{MHz}]$  in the second NB. Both tones have the same power of -1 dBm. The frequency gap between the two tones is 1 MHz so that the generated input is guaranteed to lie inside the bandwidth of a narrow-band filter.
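The numbers implied by this set-up can be worked out in a few lines. The sketch assumes that the downsampling by 5 is a plain decimation (keeping every fifth sample without filtering) and that the PSD is computed from a single FFT over all stored samples; the variable and function names are hypothetical.

```python
FS = 2.7e9                 # ADC sampling rate [Hz]
DECIMATION = 5             # downsampling factor imposed by the SATA link
NUM_SAMPLES = 2 ** 18      # capacity of the FPGA RAM

def fold_to_baseband(f_in, fs):
    """Frequency in [0, fs/2] at which a tone at f_in appears after sampling at fs."""
    f = f_in % fs
    return f if f <= fs / 2.0 else fs - f

effective_rate = FS / DECIMATION            # rate of the stored sample stream
bin_width = effective_rate / NUM_SAMPLES    # spacing of the FFT bins
tones = [fold_to_baseband(f, effective_rate) for f in (2300e6, 2301e6)]

print(f"effective rate : {effective_rate / 1e6:.0f} MHz")
print(f"PSD resolution : {bin_width / 1e3:.2f} kHz per bin")
print(f"tone positions : {[t / 1e6 for t in tones]} MHz")
```

Under these assumptions the stored stream runs at 540MHz, so the two input tones appear at 140MHz and 141MHz in the captured spectrum.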

Fig. 21(a) and (b) show the output spectrum before and after executing the proposed all-digital offset and gain calibrations,



Fig. 20: Lab environment and setup measurement.



Fig. 21: Measured output spectrum before and after calibration (due to sub-sampling and downsampling, the two original input tones map to 140MHz and 141MHz, respectively).

respectively. The output of the ADC chip is distorted by channel mismatches (offset, gain and clock skew) and by nonlinear harmonic distortion coming from the electrical components of the ADC front-end. Note that due to the sub-sampling and downsampling processes, the input tones map to 140MHz and 141MHz, respectively, as shown in Fig. 21. The offset tones are completely removed after calibration. The spurs due to timing skew and gain mismatches appear at the same positions in the output spectrum [3]. As can be seen, the gain calibration reduces the level of some (not all) of the spurious tones induced by clock skews and gain mismatches by around 3dB. The output samples corrected by the offset and gain calibration are fed into the optimal hardware architecture illustrated in Fig. 18 to mitigate the clock skew errors. Fig. 21(c) illustrates the output spectrum after performing the proposed calibration algorithm. As can be seen, the proposed clock skew calibration significantly reduces the skew and gain spur levels from -66dBFS to -97dBFS, i.e., down to the noise floor. During calibration, the timing mismatch coefficients converge to their expected values after 5K samples (or  $1.8\mu s$ ) as shown in Fig. 22. From the above



Fig. 22: Convergence speed of clock skews with postprocessing the real samples.

measurement results, it is clear that the ADC chip also suffers from nonlinear distortions coming from the front-end of the ADC. The effects of these distortion errors can be removed by applying the digital distortion compensation techniques presented in [39], [40].

According to a survey of the TIADC chips published at the ISSCC and VLSI conferences from 1997 to 2016 [41], when clock skew calibration is needed, most TIADC chips use mixed-signal calibration. Table IV compares the performance of the synthesized digital logic, for clock skew calibration only, with the state-of-the-art techniques. The designed

TABLE IV: Performance Comparison

| Ref.         | This Work                         | Our previous<br>work [26] | ISSCC<br>2002 [42] | ISSCC<br>2014 [10] |
|--------------|-----------------------------------|---------------------------|--------------------|--------------------|
| Technology   | 28nm FD-SOI                       | 28nm FD-SOI               | 0.35-µm            | 40nm               |
| M            | 4                                 | 4                         | 2                  | 12                 |
| Rate [GS/s]  | 2.7                               | 2.7                       | 0.12               | 1.6                |
| Resolution   | 11 bits                           | 11 bits                   | 10 bits            | 12 bits            |
| Input        | Up to NZ4                         | Up to NZ4                 | NZ1                | NZ1                |
| Spurs [dBFS] | 97 @ $f_{in}^1$                   | 97 @ $f_{in}^1$           | 90.3 @ 0.99MHz     | 70 @ 750MHz        |
| Power [mW]   | 33.2                              | 41                        | 171                | 35.3               |

<sup>1</sup> $f_{in} = \{2300 \text{MHz}, 2301 \text{MHz}\}$ 

digital logic of the proposed calibration system successfully tackles the timing skew problem for an input in any NB. In post-processing simulations with the real data captured from the ADC chip, it keeps the skew tones at -97dBFS for a two-tone sinusoidal input located in the second NZ at frequencies of 2300MHz and 2301MHz. Moreover, it dissipates less power and works at a higher sampling rate than the prior art. With a 31dB improvement of the clock-skew-induced spur levels, the proposed calibration has the same emulation performance as our previous work reported in [19], [26]. However, after post-P&R logic simulations it consumes 7.5mW less power than our previous design in [19], [26], i.e., a 19% reduction in power consumption. This is because the proposed calibration saves one more BD filter in the design. Furthermore, with 15.5mW power consumption and 0.02mm<sup>2</sup> chip area of the synthesized calibration circuit for the baseband ADCs clocked at 2.7GHz presented in Section IV-B1, our solution outperforms the state of the art in terms of low power consumption and high sampling rate.

## V. CONCLUSION

This framework has presented an all-digital clock skew feedforward background calibration for sub-sampling TIADCs. The technique requires neither a pilot input nor an additional reference channel. It is implemented using the polyphase filtering technique in order to enhance the working frequency of the DSP, and it does not use adaptive filter banks, which enables an implementation with moderate hardware and lower power consumption. Moreover, the correction and estimation algorithms in the feedforward calibration scheme share a common BD FIR filter, which saves filter hardware and reduces the power dissipation by 19%. Simulations demonstrate the efficiency of the proposed calibration, which leads to an SFDR improvement of at least 28dB over the first four NZs. The pipelined hardware architecture of the proposed calibration is optimized using the fixed-point optimization methodology. The HDL design of the pipelined hardware architecture has been synthesized using RC targeting the ST-28nm FD-SOI process. The designed digital circuit successfully addresses the clock skew problem for 60dB SNR sub-sampling and baseband TIADC applications clocked at 2.7GHz. For undersampling TIADC applications, it occupies an area of 0.04mm<sup>2</sup> and dissipates a total power of 33.2mW at an input frequency of 5.3GHz; for baseband ADC applications, it occupies  $0.02 \text{mm}^2$  and consumes 15.5 mW at an input frequency of 1.2GHz. In comparison with the state of the art, our solution achieves a small chip area, lower power consumption and a higher sampling rate. In post-processing of the measured data from the ADC chip, the clock skew calibration is combined with the classical calibration algorithms for gain and offset mismatches to successfully address all channel mismatches. The clock skew calibration keeps the timing mismatch tones below -97dBFS for a two-tone sinusoidal input in the second Nyquist Zone, demonstrating that our solution outperforms the prior art.

#### ACKNOWLEDGMENT

The authors would like to thank the European CATRENE CORTIF project for supporting this work. The authors also would like to thank Dr. Nicolas Le Dortz for his discussion.

#### REFERENCES

- [1] B. Razavi, "Design Considerations for Interleaved ADCs," Solid-State Circuits, IEEE Journal of, vol. 48, no. 8, pp. 1806–1817, 2013.
- [2] W. C. Black, Jr. and D. A. Hodges, "Time interleaved converter arrays," *Solid-State Circuits, IEEE Journal of*, vol. 15, no. 6, pp. 1022–1029, Dec 1980.

- [3] M. El-Chammas and B. Murmann, "General Analysis on the Impact of Phase-Skew in Time-Interleaved ADCs," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 56, no. 5, pp. 902–910, May 2009.
- [4] D. Stepanovic and B. Nikolic, "A 2.8 GS/s 44.6 mW Time-Interleaved ADC Achieving 50.9 dB SNDR and 3 dB Effective Resolution Bandwidth of 1.5 GHz in 65 nm CMOS," *Solid-State Circuits, IEEE Journal* of, vol. 48, no. 4, pp. 971–982, April 2013.
- [5] M. El-Chammas and B. Murmann, "A 12-GS/s 81-mW 5-bit time-interleaved flash ADC with background timing skew calibration," in *VLSI Circuits (VLSIC), 2010 IEEE Symposium on*, June 2010, pp. 157–158.
- [6] K. Doris et al., "A 480mW 2.6GS/s 10b 65nm CMOS time-interleaved ADC with 48.5dB SNDR up to Nyquist," in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2011 IEEE International, Feb 2011, pp. 180–182.
- [7] D. Camarero et al., "Mixed-Signal Clock-Skew Calibration Technique for Time-Interleaved ADCs," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 55, no. 11, pp. 3676–3687, 2008.
- [8] A. Haftbaradaran and K. W. Martin, "A sample-time error compensation technique for time-interleaved adc systems," in *Custom Integrated Circuits Conference*, 2007. CICC '07. IEEE, Sept 2007, pp. 341–344.
- [9] M. Straayer et al., "A 4GS/s time-interleaved RF ADC in 65nm CMOS with 4GHz input bandwidth," in 2016 IEEE International Solid-State Circuits Conference (ISSCC), Jan 2016, pp. 464–465.
- [10] N. Le Dortz et al., "22.5 A 1.62GS/s time-interleaved SAR ADC with digital background mismatch calibration achieving interleaving spurs below 70dBFS," in *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE International*, Feb 2014, pp. 386–388.
- [11] J. Matsuno et al., "All-Digital Background Calibration Technique for Time-Interleaved ADC Using Pseudo Aliasing Signal," Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 60, no. 5, pp. 1113–1121, 2013.
- [12] P. Satarzadeh et al., "A parametric polyphase domain approach to blind calibration of timing mismatches for M-channel time-interleaved ADCs," in Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, May 2010, pp. 4053–4056.
- [13] V. Divi and G. W. Wornell, "Blind Calibration of Timing Skew in Time-Interleaved Analog-to-Digital Converters," *Selected Topics in Signal Processing, IEEE Journal of*, vol. 3, no. 3, pp. 509–522, 2009.
- [14] S. Huang and B. Levy, "Blind Calibration of Timing Offsets for Four-Channel Time-Interleaved ADCs," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 54, no. 4, pp. 863–876, April 2007.
- [15] T. Oshima *et al.*, "LMS calibration of sampling timing for time-interleaved A/D converters," *Electronics Letters*, vol. 45, no. 12, pp. 615–617, June 2009.
- [16] J. A. McNeill *et al.*, "'Split ADC' Calibration for All-Digital Correction of Time-Interleaved ADC Errors," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 56, no. 5, pp. 344–348, May 2009.
- [17] C. Vogel et al., "Adaptive blind compensation of gain and timing mismatches in M-channel time-interleaved ADCs," in *Electronics, Circuits* and Systems, 2008. ICECS 2008. 15th IEEE International Conference on, 2008, pp. 49–52.
- [18] S. Louwsma et al., "A Time-Interleaved Track & Hold in 0.13µm CMOS sub-sampling a 4 GHz signal with 43 dB SNDR," in *Custom Integrated Circuits Conference*, 2007. CICC '07. IEEE, Sept 2007, pp. 329–332.
- [19] H. Le Duc et al., "All-Digital Calibration of Timing Skews for TIADCs Using the Polyphase Decomposition," *IEEE Transactions on Circuits* and Systems II: Express Briefs, vol. 63, no. 1, pp. 99–103, Jan 2016.
- [20] S. Jamal et al., "Calibration of Sample-Time Error in a Two-Channel Time-Interleaved Analog-to-Digital Converter," Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 52, no. 4, pp. 822–822, 2005.
- [21] F. Centurelli *et al.*, "Efficient Digital Background Calibration of Time-Interleaved Pipeline Analog-to-Digital Converters," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 59, no. 7, pp. 1373–1383, 2012.
- [22] F. Harris et al., "Two channel TI-ADC for communication signals," in 2011 IEEE 12th International Workshop on Signal Processing Advances in Wireless Communications, June 2011, pp. 576–580.
- [23] P. Benabes *et al.*, "Mismatch calibration methods for high-speed time-interleaved ADCs," in *New Circuits and Systems Conference (NEWCAS)*, 2014 IEEE 12th International, June 2014, pp. 49–52.
- [24] H. Le Duc *et al.*, "Hardware implementation of all digital calibration for undersampling TIADCs," in *Circuits and Systems (ISCAS)*, 2015 IEEE International Symposium on, May 2015, pp. 2181–2184.

- [25] N. Dortz et al., "Method and Device for use with Analog to Digital Converter," Aug. 21 2014, US Patent App. 14/179,993. [Online]. Available: http://www.google.com/patents/US20140232575
- [26] H. Le Duc, "All-digital Calibration Techniques of Timing Skews for Undersampling Time-Interleaved ADCs," Ph.D. dissertation, COMELEC Department, Telecom-ParisTech, 46 Rue Barrault, 75013 Paris, Dec 2015.
- [27] H. Le Duc et al., "A Fully Digital Background Calibration of Timing Skew in Undersampling TI-ADC," in New Circuits and Systems Conference (NEWCAS), 2014 IEEE 12th International, June 2014.
- [28] A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic Processes, 4th ed. McGraw-Hill Higher Education, 2002.
- [29] J. G. Proakis and D. G. Manolakis, *Digital Signal Processing: Principles, Algorithms and Applications*. Upper Saddle River, New Jersey, 2007.
- [30] A. V. Oppenheim et al., Discrete-time Signal Processing (2nd ed.). Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1999.
- [31] A. W. Ingleton, "The Rank of Circulant Matrices," J. London Math. Soc. 1956 s1-31:445-460.
- [32] L. El Ghaoui, "Optimization models and applications," http://livebooklabs.com/keeppies/c5a5868ce26b8125, 2015, livebook visited July 2015.
- [33] S. Boyd and L. Vandenberghe, *Convex Optimization*. New York, NY, USA: Cambridge University Press, 2004.
- [34] —, Introduction to Matrix Methods and Applications (working Title). Stanford University, 2014. [Online]. Available: http://stanford.edu/class/ee103/mma.pdf
- [35] C. L. Lawson and R. J. Hanson, *Solving least squares problems*, ser. Classics in applied mathematics. Philadelphia (Pa.): SIAM (Society for Industrial and Applied Mathematics), 1995. [Online]. Available: http://opac.inria.fr/record=b1080804
- [36] J. G. Proakis and D. K. Manolakis, *Digital Signal Processing (4th Edition)*. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 2006.
- [37] S. Smith, The Scientist and Engineer's Guide to Digital Signal Processing. California Technical Pub., 1997. [Online]. Available: https://books.google.fr/books?id=rp2VQgAACAAJ
- [38] R. Vaughan et al., "The theory of bandpass sampling," Signal Processing, IEEE Transactions on, vol. 39, no. 9, pp. 1973–1984, Sep 1991.
- [39] R. Vansebrouck et al., "Performance study of nonlinearities blind correction in wideband receivers," in *Electronics, Circuits and Systems* (ICECS), 2014 21st IEEE International Conference on, Dec 2014, pp. 335–338.
- [40] —, "Digital distortion compensation for wideband direct digitization RF receiver," in *New Circuits and Systems Conference (NEWCAS)*, 2015 *IEEE 13th International*, June 2015, pp. 1–4.
- [41] B. Murmann, "ADC Performance Survey 1997-2015," [Online]. Available: http://web.stanford.edu/~murmann/adcsurvey.html.
- [42] S. Jamal et al., "A 10-b 120-Msample/s time-interleaved analog-to-digital converter with digital background calibration," Solid-State Circuits, IEEE Journal of, vol. 37, no. 12, pp. 1618–1627, Dec 2002.



Han Le Duc received the M.Sc. degree in Communication Engineering from RWTH Aachen University, Aachen, Germany, in 2012, and the Ph.D. degree in Electronics and Communications from the Institute Mines Telecom, France, in 2015. Since January 2016, he has been a postdoctoral researcher at the Institute Mines Telecom.

His main research interests include advanced digital signal processing algorithms for communication systems, digital IC design, and channel mismatch calibration in time-interleaved ADCs.



**Duc Minh Nguyen** obtained a Ph.D. in Electrical Engineering from the University of Kaiserslautern, Germany, in 2009. He worked as a member of the scientific staff at the University of Kaiserslautern and is currently a Researcher and Lecturer in the School of Electronics and Telecommunications at Hanoi University of Science and Technology. His research activities involve digital hardware design, embedded system design, and formal verification of digital designs and embedded systems.



Olivier Jamin received the M.Sc. degree in Electronic Systems from the University of Nantes in 1999, and the Ph.D. degree in Electronics & Communication from Telecom ParisTech in 2013. He is currently a Senior Principal System & IC architect at NXP Semiconductors. His interests are AMS design techniques for highly digitized front-ends & RF transceivers. From 1999 to 2004, he worked as a mixed-signal IC designer and design leader at Philips Semiconductors in Caen (France), designing low-power analog front-ends for imaging & ultrasound applications. From 2004 to 2006, he was with the Philips high-speed data conversion group as an RF system engineer, architecting SDR products for wireless cellular infrastructure transceivers. From 2006 to 2014, he worked in the RF Transceivers group of NXP as an RF & mixed-signal IC architect for TV tuners and low-power IoT transceivers, and as a lead architect of the product line designing multi-channel full-band-capture transceivers for cable modems. Since 2014, he has been working on NFC controllers, and he currently leads a team focused on system aspects of NFC transceivers.



**Chadi Jabbour** received the M.Sc. degree from *Ecole Nationale Supérieur de physique et de chimie industrielles (ESPCI)*, France, in 2007 and the Ph.D. degree from Telecom ParisTech, France, in 2010. From 2011 to 2014, he worked at the Institute Mines Telecom as a research engineer. Between 2014 and 2015, he was a visiting fellow for eight months at Nokia Technologies, Berkeley, working on novel architectures for RF-to-digital receivers. In 2015, he became an associate professor at Institut Mines-Télécom. His research interests include ADC design and calibration, post/pre-distortion for communication systems, and flexible receiver architectures.



Van Tam Nguyen was born in Tinh Gia, Thanh Hoa, Vietnam, in 1975. He received the Diplôme d'Ingenieur from Ecole Superieure d'Electricite (Supelec), an M.Sc. degree in automatic and signal processing from University Paris XI, and graduated in image processing from EPFL, Switzerland, in 2000, and received a Ph.D. degree in Communications and Electronic from Ecole Nationale Superieure des Telecommunications (Telecom ParisTech) in 2004. From 2000 to 2005, he worked mainly on the European Project SPRING (Scientific Multidisciplinary Network for metering - IST-1999-12342) with Schlumberger and Ecole Nationale Superieure des Telecommunications. Since 2005, he has been on the faculty of Telecom ParisTech, where he is currently an Associate Professor in the Communications and Electronic Department. Since 2015, he has been a Senior Marie Curie Fellow at UC Berkeley. He held visiting positions at the University of Aizu, Japan, in 2012 and at UC Berkeley in 2013 and 2014. He was also a rank A guest researcher of NICT, Japan, from July 2012 to June 2013, where he proposed a surveillance game for the reliability and worked on radio resource management for cognitive radio systems.

At UC Berkeley, he has proposed "COGNICOM", a brain-inspired software-hardware paradigm, to support the future growth of the IoT. COGNICOM brings computing closer to the end user and focuses on optimal use of the local Smart Application Gateway and cloud computing. COGNICOM consists of two key components: the Cognitive Engine and Smart Connectivity. The cognitive engine is powered by deep-learning algorithms integrated with game-theoretic decision analytics, implemented on a low-power Network Multi-Processor System on Chip. The cognitive engine provides cognitive functions (e.g. anomaly detection and decision making) to smart objects. The smart connectivity integrates neural-network-inspired designs of cognitive radio, transceivers and baseband processors. It provides flexible and reliable connections to IoT objects and optimally distributes communication resources. The designs of both the cognitive engine and the smart connectivity will leverage his past success in designing cognitive radios and the surveillance game.

He moved to Stanford University on July 1, 2016.



Patricia Desgreys (M'00-M'12) received the M.Sc. and Ph.D. degrees in Microelectronics from the University of Bordeaux in 1996 and 1999, respectively. Since 2000, she has been with Institute Mines Telecom-TELECOM ParisTech, where she is currently a professor in the Communications and Electronics Department, heading the Circuit and Communication Systems group. She is in charge of research on Wireless Systems and Design Techniques for Nanoscale Circuits, with special emphasis on Software and Cognitive Radio System & Smart AMS Systems for IoT. So far, she has co-authored more than 100 technical publications, mainly in international journals (19) and international conference proceedings (52), and has been involved in many collaborative projects. Patricia Desgreys is a board member of the LTCI (Processing and Information Communication Laboratory), a CNRS research laboratory associated with Telecom ParisTech. Since 2007, she has been a member of the Steering Committee of the CNRS Research group on SoC-SiP, in charge of coordinating the community of 600 French researchers on SoC-SiP design. In this context, she has organized more than 10 workshops on hot topics with internationally renowned speakers. Patricia is an IEEE SM, involved in the international activities of the CAS community, in particular as President of IEEE-CAS France since 2015 and as Technical Program Chair of IEEE NEWCAS for 2012 & 2013.