
HLAWATSCH Ch01-9780123744838 2011/2/21 21:31 #1

CHAPTER 1

Fundamentals of Time-Varying Communication Channels

Gerald Matz, Franz Hlawatsch
Vienna University of Technology, Vienna, Austria

1.1 INTRODUCTION

Wireless communication systems, i.e., systems transmitting information via electromagnetic (radio) or acoustic (sound) waves, have become ubiquitous. In many of these systems, the transmitter or the receiver is mobile. Even if both link ends are static, scatterers – i.e., objects that reflect, scatter, or diffract the propagating waves – may move with significant velocities. These situations give rise to time variations of the wireless channel due to the Doppler effect. Nonideal local oscillators are another source of temporal channel variations, even in the case of wireline channels. Because of their practical relevance, linear time-varying (LTV) channels have attracted considerable interest in the fields of signal processing, communications, propagation, information theory, and mathematics. In their most general form, LTV channels are also referred to as time-frequency (TF) dispersive or doubly dispersive, as well as TF selective or doubly selective.

In this chapter, we discuss the fundamentals of wireless channels from a signal processing and communications perspective. In contrast to existing textbooks (e.g., Jakes, 1974; Molisch, 2005; Parsons, 1992; Vaughan & Bach Andersen, 2003), our focus will be on LTV channels. Many of the theoretical foundations of LTV channels were laid in the 1950s and 1960s. Zadeh (1950) proposed a “system function” that characterizes an LTV system in a joint TF domain. Driven by increasing interest in ionospheric channels, Kailath complemented Zadeh’s work by introducing a dual system function, discussing sampling models, and addressing measurement issues (Kailath, 1959, 1962). A related discussion focusing on the concept of duality (an important notion in TF analysis) was provided by Gersho (1963). In a seminal paper on random LTV channels, Bello (1963) introduced the assumption of wide-sense stationary uncorrelated scattering (WSSUS), which has been used almost universally since. The estimation of channel statistics was addressed by Gallager (1964) and a few years later by Gaarder (1968). A fairly comprehensive coverage of the modeling of and communication over random LTV channels was provided by Kennedy (1969). Information-theoretic aspects of LTV channels were addressed in Biglieri, Proakis and Shamai (1998) and in Gallager (1968) (see also Chapter 2).

This chapter provides a review of this early work and a discussion of several more recent results. In Section 1.2, we summarize the most important physical aspects of LTV channels. Some basic tools for a deterministic description of LTV channels are discussed in Section 1.3, while the statistical description of random LTV channels is considered in Section 1.4. Section 1.5 is devoted to the important class of underspread channels and their properties. Parsimonious channel models are reviewed in Section 1.6.

Wireless Communications Over Rapidly Time-Varying Channels. Copyright © 2011 Elsevier Ltd. All rights reserved.


Finally, Section 1.7 discusses the measurement of LTV channels and of their statistics. Throughout this chapter, we will consider noise-free systems since our focus is on the signal distortions caused by LTV wireless channels and not on noise effects. The equivalent complex baseband representation of signals and systems (channels) will be used in most cases.

1.2 THE PHYSICS OF TIME-VARYING CHANNELS

In this section, we briefly describe some of the physical phenomena associated with wireless channels. We shall concentrate on radio channels, although much of our discussion is also relevant to acoustic channels. The term “wireless channel” will be understood as an abstraction of all effects on the transmit signal caused by the transmission. This typically includes the effects of antennas and radio-frequency front ends in addition to the propagation environment affecting the electromagnetic waves.

1.2.1 Wave Propagation

In wireless communications, information is transmitted by radiating a modulated electromagnetic wave at a certain carrier frequency by means of a transmit antenna and picking up energy of the radiated wave by means of a receive antenna. The behavior of the radio waves is determined by the propagation environment according to Maxwell’s equations. For most scenarios of interest, solving Maxwell’s equations is infeasible (even if the propagation environment is completely known, which rarely happens in practice). This is due to the fact that except for free-space propagation, the wave interacts with dielectric or conducting objects. These interactions are usually classified as reflection, transmission, scattering, and diffraction. We will follow the prevailing terminology and refer to interacting objects simply as “scatterers,” without distinguishing between the different types of interaction. While the behavior of radio waves strongly depends on the carrier frequency $f_c$ (or, equivalently, wavelength $\lambda_c$), there are a number of common phenomena that lead to a high-level characterization valid for all wireless channels.

1.2.2 Multipath Propagation and Time Dispersion

The presence of multiple scatterers (buildings, vehicles, hills, and so on) causes a transmitted radio wave to propagate along several different paths that terminate at the receiver. Hence, the receive antenna picks up a superposition of multiple attenuated copies of the transmit signal. This phenomenon is referred to as multipath propagation. Due to different lengths of the propagation paths, the individual multipath components experience different delays (time shifts). The receiver thus observes a temporally smeared-out version of the transmit signal. Even though the medium itself is not physically dispersive (in the sense that different frequencies propagate with different velocities), such channels are termed time-dispersive. The following example considers a simple idealized scenario.

Example 1.1
Consider two propagation paths in a static environment. The receive signal in the equivalent complex baseband domain is given by

\[ r(t) = h_1\, s(t-\tau_1) + h_2\, s(t-\tau_2). \]

Here, $h_p = |h_p|e^{j\varphi_p}$ and $\tau_p$ are, respectively, the complex attenuation factor and delay associated with the $p$th path. The magnitude of the Fourier transform $R(f) \triangleq \int_{-\infty}^{\infty} r(t)\,e^{-j2\pi ft}\,dt$ of the receive signal follows as

\[ |R(f)| = |S(f)|\sqrt{|h_1|^2 + |h_2|^2 + 2|h_1||h_2|\cos\bigl(2\pi(\tau_1-\tau_2)f - (\varphi_1-\varphi_2)\bigr)}. \]

As can be seen from Example 1.1, a time-dispersive channel has a multiplicative effect on the transmit signal in the frequency domain (this, of course, is a basic equivalence in Fourier analysis). Therefore, time-dispersive channels are frequency-selective in the sense that different frequencies are attenuated differently; see Fig. 1.1 for illustration. These differences in attenuation become more severe when the difference of the path delays is large and the difference between the path attenuations is small.
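The frequency selectivity of Example 1.1 can be checked numerically. The following is a minimal sketch, assuming NumPy and illustrative path parameters that are not taken from the text; the function names are hypothetical. It evaluates the two-path gain $|R(f)|/|S(f)|$ directly and via the closed-form expression of Example 1.1.

```python
import numpy as np

def two_path_gain(f, h1, tau1, h2, tau2):
    """Magnitude of h1*exp(-j2*pi*f*tau1) + h2*exp(-j2*pi*f*tau2)."""
    H = h1 * np.exp(-2j * np.pi * f * tau1) + h2 * np.exp(-2j * np.pi * f * tau2)
    return np.abs(H)

def two_path_gain_closed_form(f, h1, tau1, h2, tau2):
    """Closed-form |R(f)|/|S(f)| from Example 1.1."""
    dphi = np.angle(h1) - np.angle(h2)              # phi_1 - phi_2
    arg = 2 * np.pi * (tau1 - tau2) * f - dphi
    return np.sqrt(abs(h1)**2 + abs(h2)**2 + 2 * abs(h1) * abs(h2) * np.cos(arg))

# Illustrative (assumed) parameters: 1 microsecond delay difference
f = np.linspace(-5e6, 5e6, 1001)
h1, tau1 = 1.0, 0.0
h2, tau2 = 0.8 * np.exp(1j * 0.7), 1e-6

direct = two_path_gain(f, h1, tau1, h2, tau2)
closed = two_path_gain_closed_form(f, h1, tau1, h2, tau2)
max_dev = float(np.max(np.abs(direct - closed)))    # numerical agreement of both forms
notch_spacing_hz = 1.0 / abs(tau1 - tau2)           # period of the ripple along f
```

The ripple along $f$ has period $1/|\tau_1-\tau_2|$, and the gain swings between $\bigl||h_1|-|h_2|\bigr|$ and $|h_1|+|h_2|$, which matches the qualitative statement that the attenuation differences grow when the path attenuations are similar.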

Multipath propagation is not the only source of time dispersion. Further potential sources are transmitter and receiver imperfections, such as transmit/receive pulses not satisfying the Nyquist criterion, imperfect timing recovery, or sampling jitter. In the following example, we consider an equivalent discrete-time baseband representation that includes pulse amplitude modulation (PAM), analog-to-digital and digital-to-analog conversion, and demodulation.

FIGURE 1.1

Illustration of frequency selectivity in a receive spectrum (thick solid line) for a two-path channel and a raised-cosine transmit spectrum (thin dotted line); each panel shows $|R(f)|$ (dB) versus $f$: (a) small $|\tau_1-\tau_2|$ and $|h_1| \gg |h_2|$, (b) large $|\tau_1-\tau_2|$ and $|h_1| \gg |h_2|$, (c) small $|\tau_1-\tau_2|$ and $|h_1| \approx |h_2|$, and (d) large $|\tau_1-\tau_2|$ and $|h_1| \approx |h_2|$.

Example 1.2
Consider a single propagation path with complex attenuation factor $h$ and delay 0 and a digital PAM system with symbol period $T$ and transmit and receive pulses whose convolution yields a Nyquist pulse $p(t)$. Assuming a timing error $\Delta T$, the received sequence in the equivalent discrete-time (symbol-rate) baseband domain equals

\[ r[k] = \sum_{l=-\infty}^{\infty} h_l\, a[k-l], \quad \text{with } h_k = h\,p(kT-\Delta T), \]

where $a[k]$ denotes the sequence of transmit symbols. Note that in spite of a single propagation path, there is significant temporal dispersion unless $\Delta T = 0$ (in which case $h_k = 0$ for $k \neq 0$).
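To illustrate Example 1.2, the sketch below computes the symbol-rate taps $h_k = h\,p(kT-\Delta T)$. It assumes a sinc pulse as one particular Nyquist pulse, a normalized symbol period $T = 1$, and a truncation to $|k| \le 5$; none of these choices come from the text, and the function name is hypothetical.

```python
import numpy as np

def isi_taps(h, delta_T, T=1.0, k_max=5):
    """Symbol-rate taps h_k = h * p(k*T - delta_T) for a sinc Nyquist pulse (assumed)."""
    k = np.arange(-k_max, k_max + 1)
    # np.sinc(x) = sin(pi*x)/(pi*x), which vanishes at all nonzero integers
    return k, h * np.sinc((k * T - delta_T) / T)

k, taps_ideal = isi_taps(h=1.0, delta_T=0.0)    # perfect timing
_, taps_off = isi_taps(h=1.0, delta_T=0.25)     # timing error of T/4

# Energy in the taps k != 0, i.e., the intersymbol interference
isi_power_ideal = float(np.sum(np.abs(taps_ideal[k != 0])**2))
isi_power_off = float(np.sum(np.abs(taps_off[k != 0])**2))
```

With perfect timing, only the tap at $k = 0$ is nonzero (up to rounding); with a timing error, energy leaks into neighboring taps even though there is only one propagation path.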

Although multipath propagation has traditionally been viewed as a transmission impairment, nowadays there is a tendency to consider it as beneficial since it provides additional degrees of freedom that are known as delay diversity or frequency diversity and that can be exploited to realize diversity gains or, in the context of multiantenna systems, even multiplexing gains (Tse & Viswanath, 2005).

1.2.3 Doppler Effect and Frequency Dispersion

In many wireless systems, the transmitter, receiver, and/or scatterers are moving. In such situations, the emitted wave is subject to the Doppler effect and hence experiences frequency shifts. We first restrict our discussion to a simple scenario with a static transmitter, no scatterers, and a receiver moving with velocity $\upsilon$. In this case, a purely sinusoidal carrier wave of frequency $f_c$ is observed by the receiver as a sinusoidal wave of frequency

\[ \left(1 - \frac{\upsilon\cos(\phi)}{c_0 + \upsilon\cos(\phi)}\right) f_c \approx \left(1 - \frac{\upsilon\cos(\phi)}{c_0}\right) f_c, \tag{1.1} \]

where $\phi$ is the angle of arrival of the wave relative to the direction of motion of the receiver and $c_0$ is the speed of light. The above approximation on the right-hand side of (1.1) holds for the practically predominant case $\upsilon \ll c_0$. For a general transmit signal $s(t)$ with Fourier transform $S(f)$, one can then show the following expressions of the receive signal ($h$ is the complex attenuation factor):

\[ R(f) = h\,S(\alpha f), \quad r(t) = \frac{h}{\alpha}\, s\!\left(\frac{t}{\alpha}\right), \quad \text{with } \alpha = 1 - \frac{\upsilon\cos(\phi)}{c_0}. \tag{1.2} \]

This shows that the Doppler effect results in a temporal/spectral scaling (i.e., compression or dilation). In many practical cases, the transmit signal is effectively band-limited around the carrier frequency $f_c$, i.e., $S(f)$ is effectively zero outside a band $[f_c - B/2, f_c + B/2]$, where $B \ll f_c$. The approximation $\alpha f = f - \frac{\upsilon\cos(\phi)}{c_0} f \approx f - \frac{\upsilon\cos(\phi)}{c_0} f_c$ (whose accuracy increases with decreasing normalized bandwidth $B/f_c$) then implies

\[ R(f) \approx h\,S(f-\nu), \quad r(t) \approx h\,s(t)\,e^{j2\pi\nu t}, \quad \text{with } \nu = \frac{\upsilon\cos(\phi)}{c_0}\, f_c. \tag{1.3} \]

Here, the Doppler effect essentially results in a frequency shift, with the Doppler shift frequency $\nu$ being proportional to both the velocity $\upsilon$ and the carrier frequency $f_c$. The relations (1.2) and (1.3) are often referred to as wideband and narrowband Doppler effect, respectively, even though the “narrowband” approximation (1.3) holds true also for systems usually considered wideband by communication engineers (e.g., for a WLAN at carrier frequency $f_c = 2.4$ GHz and with bandwidth $B = 20$ MHz, there is $B/f_c = 8.3 \cdot 10^{-3}$).
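As a quick numerical illustration of (1.3) and of the WLAN figure quoted above, the following sketch evaluates the Doppler shift $\nu = (\upsilon\cos(\phi)/c_0) f_c$ and the normalized bandwidth $B/f_c$. The receiver speed of 30 m/s is an assumed example value, and the function names are hypothetical.

```python
import math

C0 = 299_792_458.0  # speed of light in m/s

def doppler_shift_hz(speed_mps, angle_rad, carrier_hz):
    """Narrowband Doppler shift nu = (v*cos(phi)/c0)*fc, as in (1.3)."""
    return speed_mps * math.cos(angle_rad) / C0 * carrier_hz

def normalized_bandwidth(bandwidth_hz, carrier_hz):
    """Ratio B/fc that controls the accuracy of the narrowband approximation."""
    return bandwidth_hz / carrier_hz

# Assumed example: receiver at 30 m/s (108 km/h), carrier at 2.4 GHz
nu_max = doppler_shift_hz(speed_mps=30.0, angle_rad=0.0, carrier_hz=2.4e9)
nu_side = doppler_shift_hz(speed_mps=30.0, angle_rad=math.pi / 2, carrier_hz=2.4e9)
ratio = normalized_bandwidth(20e6, 2.4e9)   # WLAN example from the text
```

The shift is largest in magnitude when the wave arrives along the direction of motion ($\cos(\phi) = \pm 1$) and vanishes for arrival perpendicular to it.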

In the general case of multipath propagation and moving transmitter, receiver, and/or scatterers, the received multipath components (echoes) experience different Doppler shifts since the angles of arrival/departure and the relative velocities associated with the individual multipath components are typically different. Hence, the transmit signal is spread out in the frequency domain – it experiences frequency dispersion.

Example 1.3
Consider two propagation paths with equal delay $\tau_0$ but different Doppler frequencies $\nu_1$ and $\nu_2$. Here, the Fourier transform of the receive signal is obtained as

\[ R(f) = \bigl[h_1\, S(f-\nu_1) + h_2\, S(f-\nu_2)\bigr]\, e^{-j2\pi\tau_0 f}, \tag{1.4} \]

where $h_p = |h_p|e^{j\varphi_p}$ denotes the complex attenuation factor of the $p$th path. The magnitude of the receive signal follows as

\[ |r(t)| = |s(t-\tau_0)|\sqrt{|h_1|^2 + |h_2|^2 + 2|h_1||h_2|\cos\bigl(2\pi(\nu_1-\nu_2)(t-\tau_0) + (\varphi_1-\varphi_2)\bigr)}. \tag{1.5} \]

While (1.4) illustrates the frequency dispersion, (1.5) shows that the Doppler effect leads to time-varying multiplicative modifications of the transmit signal in the time domain. Thus, channels involving Doppler shifts are also referred to as being time-selective. With the replacements $R(f) \to r(t)$, $f \to t$, $\tau_1 \to \nu_1$, and $\tau_2 \to \nu_2$, Fig. 1.1 can also be viewed as an illustration of time selectivity. Depending on the system architecture, time selectivity may be viewed as a transmission impairment or as a beneficial effect offering Doppler diversity (also termed time diversity).
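The step from (1.4) to (1.5) can be made explicit. The following short derivation is a sketch that uses only the quantities of Example 1.3 and the Fourier transform convention of Example 1.1:

```latex
% Inverse Fourier transform of (1.4), term by term:
%   S(f-\nu_p)\,e^{-j2\pi\tau_0 f} \;\longleftrightarrow\; s(t-\tau_0)\,e^{j2\pi\nu_p(t-\tau_0)}
\begin{align*}
r(t) &= s(t-\tau_0)\,\bigl[h_1 e^{j2\pi\nu_1(t-\tau_0)} + h_2 e^{j2\pi\nu_2(t-\tau_0)}\bigr], \\
|r(t)|^2 &= |s(t-\tau_0)|^2\,\bigl[|h_1|^2 + |h_2|^2
            + 2\,\mathrm{Re}\{h_1 h_2^{*}\, e^{j2\pi(\nu_1-\nu_2)(t-\tau_0)}\}\bigr] \\
         &= |s(t-\tau_0)|^2\,\bigl[|h_1|^2 + |h_2|^2
            + 2|h_1||h_2|\cos\bigl(2\pi(\nu_1-\nu_2)(t-\tau_0) + (\varphi_1-\varphi_2)\bigr)\bigr].
\end{align*}
```

Taking the square root gives (1.5). The multiplicative envelope is periodic in $t$ with period $1/|\nu_1-\nu_2|$, in the same way that the frequency-domain ripple of Example 1.1 is periodic in $f$ with period $1/|\tau_1-\tau_2|$.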

Apart from the Doppler effect due to mobility, imperfect local oscillators are another cause of frequency dispersion because they result in carrier frequency offsets (i.e., different and possibly time-varying carrier frequencies at the transmitter and receiver), oscillator drift, and phase noise.

Example 1.4
Consider a static scenario with line-of-sight propagation without multipath, i.e., the transmit and receive signals in the (complex) bandpass domain are related as $r(t) = h\,s(t)$. The transmitter uses a perfect local oscillator, i.e., $s(t) = s_B(t)\,e^{j2\pi f_c t}$ ($s_B(t)$ denotes the baseband signal). The local oscillator at the receiver is characterized by $m(t) = p(t)\,e^{-j2\pi(f_c+\Delta f)t}$, where $\Delta f$ is a carrier frequency offset and $p(t)$ models phase noise effects that broaden the oscillator’s line spectrum. In this case, the received baseband signal $r_B(t) = r(t)\,m(t)$ and its Fourier transform are given by

\[ r_B(t) = h\,s_B(t)\,p(t)\,e^{-j2\pi\Delta f t}, \quad R_B(f) = h\int_{-\infty}^{\infty} P(\nu+\Delta f)\,S_B(f-\nu)\,d\nu, \]

where $P(f)$ denotes the Fourier transform of $p(t)$, so that $p(t)\,e^{-j2\pi\Delta f t}$ has Fourier transform $P(f+\Delta f)$. Clearly, in spite of the static scenario (no Doppler effect), the transmit signal experiences temporal selectivity and frequency dispersion.

1.2.4 Path Loss and Fading

Wireless channels are characterized by severe fluctuations in the receive power, i.e., in the strength of the electromagnetic field at the receiver position. The receive power is usually modeled as a combination of three phenomena: path loss, large-scale fading, and small-scale fading.

The path loss describes the distance-dependent power decay of electromagnetic waves. Let us model the attenuation factor as $d^{-\beta}$, where $d$ is the distance the wave has traveled and $\beta$ denotes the path loss exponent, which is typically assumed to lie between 2 and 4. The path loss in decibels is then obtained as $P_L = 10\,\beta\log_{10}(d)$.
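The path loss formula is easy to evaluate. The sketch below assumes that $d$ is measured relative to a unit reference distance (an assumption, since the text does not state units); the function name and the example distances are illustrative.

```python
import math

def path_loss_db(distance, beta):
    """Path loss P_L = 10*beta*log10(d) for a power attenuation factor d**(-beta)."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return 10.0 * beta * math.log10(distance)

# Distance 100 (in units of the assumed reference distance)
pl_beta2 = path_loss_db(100.0, beta=2.0)   # exponent at the lower end of the typical range
pl_beta4 = path_loss_db(100.0, beta=4.0)   # exponent at the upper end of the typical range

# Additional loss incurred each time the distance doubles, for beta = 4
extra_loss_per_doubling = path_loss_db(2.0, beta=4.0)
```

For $\beta = 2$ and $\beta = 4$, the loss at $d = 100$ is 40 dB and 80 dB, respectively, and each doubling of the distance costs roughly $3\beta$ dB.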

Two receivers located at the same distance $d$ from the transmitter may still experience significantly different receive powers if the radio waves have propagated through different environments. In particular, obstacles like buildings or dense vegetation can block or attenuate propagation paths and result in shadowing and absorption loss, respectively. This type of fading is referred to as large-scale fading since its effect on the receive power is constant within geographic regions whose dimensions are on the order of $10\lambda_c \ldots 100\lambda_c$, i.e., large relative to the wavelength $\lambda_c$. Experimental evidence indicates that for many systems, large-scale fading can be accurately modeled as a random variable with log-normal distribution (Molisch, 2005).

Finally, constructive and destructive interference of field components corresponding to different propagation paths causes receive power fluctuations within “small” regions whose dimensions are on the order of a few wavelengths. This small-scale fading can vary over several decades and is usually modeled stochastically by channel coefficients with Gaussian distribution. The magnitude of the channel coefficients is then Rayleigh distributed for zero-mean channel coefficients and Rice distributed for nonzero-mean channel coefficients. Rician fading is often assumed when there exists a line of sight.
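The difference between Rayleigh and Rician fading can be illustrated by drawing complex Gaussian channel coefficients. This is a minimal sketch assuming circularly symmetric scattering with unit power and an arbitrary line-of-sight mean of 2 for the Rician case; the function name, sample size, seed, and the deep-fade threshold of 0.1 are all illustrative choices.

```python
import numpy as np

def fading_coefficients(n, mean=0.0, scatter_power=1.0, seed=0):
    """Draw n complex Gaussian channel coefficients with the given mean."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(scatter_power / 2.0)       # std per real/imaginary component
    diffuse = sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return mean + diffuse

rayleigh_mag = np.abs(fading_coefficients(200_000, mean=0.0))   # zero mean
rice_mag = np.abs(fading_coefficients(200_000, mean=2.0))       # nonzero mean

# For Rayleigh fading with unit scatter power, E{|h|} = sqrt(pi)/2
rayleigh_mean_error = abs(rayleigh_mag.mean() - np.sqrt(np.pi) / 2)

# Fraction of samples in a deep fade (magnitude below 0.1)
deep_fade_rayleigh = float(np.mean(rayleigh_mag < 0.1))
deep_fade_rice = float(np.mean(rice_mag < 0.1))
```

Deep fades are far less frequent in the Rician case because the nonzero mean (line-of-sight component) keeps the magnitude away from zero.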

Path loss and large-scale fading change only gradually and are relevant to the link budget and average receive signal-to-noise ratio (SNR); they are often combated by a feedback loop performing power control at the transmitter. In contrast, small-scale fading causes the receive power to fluctuate so rapidly that adjusting the transmit power is infeasible. Hence, small-scale fading has a direct impact on system performance (capacity, error probability, and so on). The main approach to mitigating small-scale fading is the use of diversity techniques (in time, frequency, or space).

1.2.5 Spatial Characteristics

In a multipath scenario, the angle of departure (AoD) of a propagation path indicates the direction in which the planar wave corresponding to that path departs from the transmitter. Similarly, the angle of arrival (AoA) indicates from which direction the wave arrives at the receiver. AoA and AoD are spatial channel characteristics that can be measured using an antenna array at the respective link end. The angular resolution of an antenna array is determined by the number of individual antennas, their arrangement, and their distance. The transformation between the array signal vector and the angular domain is based on the array steering vector. In the case of a uniform linear array, this transformation is a discrete Fourier transform (Tse & Viswanath, 2005).

1.3 DETERMINISTIC DESCRIPTION

We next discuss some basic deterministic¹ characterizations of LTV channels. We consider a wireless system operating at carrier frequency $f_c$. We will generally describe this system in the equivalent complex baseband domain for simplicity (an exception being the wideband system considered in Section 1.3.2). The LTV channel will be viewed and denoted as a linear operator (Naylor & Sell, 1982) $H$ that acts on the transmit signal $s(t)$ and yields the receive signal $r(t) = (Hs)(t)$.

1.3.1 Delay-Doppler Domain – Spreading Function

As mentioned before, the physical effects underlying LTV channels are mainly multipath propagation and the Doppler effect. Hence, a physically meaningful and intuitive characterization of LTV channels is in terms of time delays and Doppler frequency shifts. Let us first assume an LTV channel $H$ with $P$ discrete propagation paths. The receive signal $r(t) = (Hs)(t)$ is here given by

\[ r(t) = \sum_{p=1}^{P} h_p\, s(t-\tau_p)\, e^{j2\pi\nu_p t}, \tag{1.6} \]

where $h_p$, $\tau_p$, and $\nu_p$ denote, respectively, the complex attenuation factor, time delay, and Doppler frequency associated with the $p$th path. Equation (1.6) models the effect of $P$ discrete specular scatterers (ideal point scatterers). This expression can be generalized to a continuum of scatterers as (Bello, 1963; Molisch, 2005; Proakis, 1995)

\[ r(t) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} S_H(\tau,\nu)\, s(t-\tau)\, e^{j2\pi\nu t}\, d\tau\, d\nu. \tag{1.7} \]

The weight function $S_H(\tau,\nu)$ is termed the (delay-Doppler) spreading function of the LTV channel $H$ since it describes the spreading of the transmit signal in time and frequency. The value of the spreading function $S_H(\tau,\nu)$ at a given delay-Doppler point $(\tau,\nu)$ characterizes the overall complex attenuation and scatterer reflectivity associated with all paths of delay $\tau$ and Doppler $\nu$, and it describes how the delayed and Doppler-shifted version $s(t-\tau)\,e^{j2\pi\nu t}$ of the transmit signal $s(t)$ contributes to the receive signal $r(t)$. Thus, the spreading function expresses the channel’s TF dispersion characteristics. As such, it is a generalization of the impulse response of time-invariant systems, which describes the time dispersion. An example is shown in Fig. 1.2. Note that (1.6) is reobtained as a special case of (1.7) for

\[ S_H(\tau,\nu) = \sum_{p=1}^{P} h_p\, \delta(\tau-\tau_p)\,\delta(\nu-\nu_p). \tag{1.8} \]
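The discrete-path model (1.6) translates directly into a sampled simulation. The following sketch assumes a sampling rate `fs`, path delays that are nonnegative integer multiples of the sampling period, and a transmit signal that is zero before the first sample; these simplifications and the function name are assumptions of the example, not part of the text.

```python
import numpy as np

def apply_multipath_channel(s, fs, paths):
    """Evaluate r[n] = sum_p h_p * s[n - d_p] * exp(j*2*pi*nu_p*n/fs), cf. (1.6).

    s     : complex baseband samples of the transmit signal
    fs    : sampling rate in Hz
    paths : iterable of (h_p, d_p, nu_p) with complex gain h_p, delay d_p in
            samples (nonnegative integer), and Doppler frequency nu_p in Hz
    """
    s = np.asarray(s, dtype=complex)
    n = np.arange(len(s))
    r = np.zeros(len(s), dtype=complex)
    for h_p, d_p, nu_p in paths:
        delayed = np.zeros(len(s), dtype=complex)
        delayed[d_p:] = s[:len(s) - d_p]           # s(t - tau_p), zero before the start
        r += h_p * delayed * np.exp(2j * np.pi * nu_p * n / fs)
    return r

fs = 1e6
s = np.ones(1000, dtype=complex)                   # constant test signal
paths = [(1.0, 0, 0.0), (0.5j, 3, 200.0)]          # two specular paths (assumed values)
r = apply_multipath_channel(s, fs, paths)
```

Each tuple in `paths` corresponds to one Dirac impulse $h_p\,\delta(\tau-\tau_p)\,\delta(\nu-\nu_p)$ of the spreading function in (1.8).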

¹ We call these characterizations “deterministic” because they do not assume a stochastic model of the channel; however, for a random channel, they are themselves random, i.e., nondeterministic – see Section 1.4.

FIGURE 1.2

Example of a spreading function (magnitude), plotted over delay $\tau$ and Doppler frequency $\nu$.

A dual representation of the channel’s TF dispersion in the frequency domain, again in terms of the spreading function $S_H(\tau,\nu)$, is

\[ R(f) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} S_H(\tau,\nu)\, S(f-\nu)\, e^{-j2\pi\tau(f-\nu)}\, d\tau\, d\nu. \]

We can also obtain a representation of the TF dispersion in a joint TF domain. We will use the short-time Fourier transform (STFT) of a signal $x(t)$, which is a linear TF signal representation defined as $X^{(g)}(t,f) \triangleq \int_{-\infty}^{\infty} x(t')\,g^*(t'-t)\,e^{-j2\pi ft'}\,dt'$, where $g(t)$ denotes a normalized analysis window (Flandrin, 1999; Hlawatsch & Boudreaux-Bartels, 1992; Nawab & Quatieri, 1988). The STFT of the receive signal in (1.7) can be expressed as

\[ R^{(g)}(t,f) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} S_H(\tau,\nu)\, S^{(g)}(t-\tau, f-\nu)\, e^{-j2\pi\tau(f-\nu)}\, d\tau\, d\nu. \tag{1.9} \]

Apart from the phase factor, this is the two-dimensional (2D) convolution of the STFT of the transmit signal with the spreading function of the channel. This again demonstrates that the spreading function describes the channel’s TF dispersion.

Example 1.5
For a two-path channel with delays $\tau_1, \tau_2$ and Doppler frequencies $\nu_1, \nu_2$, the spreading function is given by

\[ S_H(\tau,\nu) = h_1\, \delta(\tau-\tau_1)\,\delta(\nu-\nu_1) + h_2\, \delta(\tau-\tau_2)\,\delta(\nu-\nu_2). \]

Inserting this expression into (1.7) yields

\[ r(t) = h_1\, s(t-\tau_1)\, e^{j2\pi\nu_1 t} + h_2\, s(t-\tau_2)\, e^{j2\pi\nu_2 t}. \]

We note that Examples 1.1 and 1.3 are essentially reobtained as special cases with $\nu_1 = \nu_2 = 0$ and $\tau_1 = \tau_2 = \tau_0$, respectively.

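Writing (1.7) as

\[ r(t) = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} S_H(\tau,\nu)\, s(t-\tau)\, d\tau\right] e^{j2\pi\nu t}\, d\nu = \int_{-\infty}^{\infty} r_\nu(t)\, d\nu, \]

we see that the LTV channel $H$ can be viewed as a continuous (infinitesimal) parallel connection of systems parameterized by the Doppler frequency $\nu$. The output signals of these systems are given by

\[ r_\nu(t) = \tilde{r}_\nu(t)\, e^{j2\pi\nu t} \quad \text{with } \tilde{r}_\nu(t) = \bigl(S_H(\cdot,\nu) * s\bigr)(t) = \int_{-\infty}^{\infty} S_H(\tau,\nu)\, s(t-\tau)\, d\tau. \]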

Thus, each system consists of a time-invariant filter with impulse response $S_H(\tau,\nu)$, followed by a modulator (mixer) with frequency $\nu$.

For a time-invariant channel with impulse response $h(\tau)$, the spreading function equals $S_H(\tau,\nu) = h(\tau)\,\delta(\nu)$ so that (1.7) reduces to the convolution of $s(t)$ with $h(\tau)$. This correctly indicates the absence of Doppler shifts (frequency dispersion). In the dual case of a channel without frequency selectivity, i.e., $r(t) = h(t)\,s(t)$, there is $S_H(\tau,\nu) = H(\nu)\,\delta(\tau)$ with $H(\nu) = \int_{-\infty}^{\infty} h(t)\,e^{-j2\pi\nu t}\,dt$, which correctly indicates the absence of time dispersion.

1.3.2 Delay-Scale Domain – Delay-Scale Spreading Function

Whereas for narrowband systems ($B/f_c \ll 1$), the Doppler effect can be represented as a frequency shift, it must be characterized by a time-scaling (compression/dilation) in the case of (ultra)wideband systems. Relation (1.6) is here replaced by (Molisch, 2005)

\[ r(t) = \sum_{p=1}^{P} a_p\, \frac{1}{\sqrt{\alpha_p}}\, s\!\left(\frac{t-\tau_p}{\alpha_p}\right), \quad \text{with } \alpha_p = 1 - \frac{\upsilon\cos(\phi_p)}{c_0}. \]

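Generalizing to a continuum of scatterers, we obtain

\[ r(t) = \int_{-\infty}^{\infty}\int_{0}^{\infty} F_H(\tau,\alpha)\, \frac{1}{\sqrt{\alpha}}\, s\!\left(\frac{t-\tau}{\alpha}\right) d\tau\, d\alpha. \tag{1.10} \]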

Here, $F_H(\tau,\alpha)$ denotes the delay-scale spreading function of the LTV channel $H$ (Margetts, Schniter, & Swami, 2007; Ye & Papandreou-Suppappola, 2003). Like (1.7), expression (1.10) can represent any LTV channel, but it is most efficient (parsimonious) for wideband channels. The delay-scale description of LTV channels will be discussed in more detail in Chapter 9.

FIGURE 1.3

Example of a TF transfer function (magnitude, in decibel), plotted over time $t$ and frequency $f$.

1.3.3 Time-Frequency Domain – Time-Varying Transfer Function

As explained in Section 1.2, time-dispersiveness corresponds to frequency selectivity, and frequency-dispersiveness corresponds to time selectivity. The joint TF selectivity of an LTV channel is characterized by the TF (or time-varying) transfer function (Bello, 1963; Zadeh, 1950)

\[ L_H(t,f) \triangleq \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} S_H(\tau,\nu)\, e^{j2\pi(t\nu - f\tau)}\, d\tau\, d\nu. \tag{1.11} \]

This 2D Fourier transform relation between shift (dispersion) domain and weight (selectivity) domain extends the 1D Fourier transform relation $H(f) = \int_{-\infty}^{\infty} h(\tau)\,e^{-j2\pi f\tau}\,d\tau$ of time-invariant channels to the time-varying case. According to (1.11), the TF echoes described by $S_H(\tau,\nu)$ correspond to TF fluctuations of $L_H(t,f)$, which are a TF description of small-scale fading. For underspread channels (to be defined in Section 1.5), the TF transfer function $L_H(t,f)$ can be interpreted as the channel’s complex attenuation factor at time $t$ and frequency $f$, and it inherits many properties of the transfer function (frequency response) defined in the time-invariant case. An example is shown in Fig. 1.3.

Example 1.6
We reconsider the two-path channel from Example 1.5. Using (1.11), it can be shown that the squared magnitude of the channel’s TF transfer function is given by

\[ |L_H(t,f)|^2 = |h_1|^2 + |h_2|^2 + 2|h_1||h_2|\cos\bigl(2\pi[t(\nu_1-\nu_2) - f(\tau_1-\tau_2)] + \varphi_1 - \varphi_2\bigr). \]

Clearly, this channel is TF selective in the sense that $L_H(t,f)$ fluctuates with time and frequency. The rapidity of fluctuation with time is proportional to the “Doppler spread” $|\nu_1-\nu_2|$, whereas the rapidity of fluctuation with frequency is proportional to the “delay spread” $|\tau_1-\tau_2|$.
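The closed form of Example 1.6 can be verified on a TF grid. The sketch below evaluates $L_H(t,f) = \sum_p h_p\,e^{j2\pi(t\nu_p - f\tau_p)}$, which follows from inserting the specular spreading function of Example 1.5 into (1.11); the path parameters, the grid, and the function name are assumptions made for the example.

```python
import numpy as np

def tf_transfer_function(t, f, paths):
    """L_H(t,f) = sum_p h_p * exp(j*2*pi*(t*nu_p - f*tau_p)) for specular paths."""
    t = np.asarray(t, dtype=float)[:, None]        # time along rows
    f = np.asarray(f, dtype=float)[None, :]        # frequency along columns
    L = np.zeros((t.shape[0], f.shape[1]), dtype=complex)
    for h_p, tau_p, nu_p in paths:
        L += h_p * np.exp(2j * np.pi * (t * nu_p - f * tau_p))
    return L

# Assumed two-path parameters
h1, tau1, nu1 = 1.0, 0.0, 50.0
h2, tau2, nu2 = 0.6 * np.exp(0.3j), 2e-6, -120.0
t = np.linspace(0.0, 0.02, 201)
f = np.linspace(-1e6, 1e6, 301)
L = tf_transfer_function(t, f, [(h1, tau1, nu1), (h2, tau2, nu2)])

# Closed form from Example 1.6
phase = (2 * np.pi * (t[:, None] * (nu1 - nu2) - f[None, :] * (tau1 - tau2))
         + np.angle(h1) - np.angle(h2))
closed = abs(h1)**2 + abs(h2)**2 + 2 * abs(h1) * abs(h2) * np.cos(phase)
max_dev = float(np.max(np.abs(np.abs(L)**2 - closed)))
```

Plotting `np.abs(L)` over the grid would show ripples whose period is $1/|\nu_1-\nu_2|$ along time and $1/|\tau_1-\tau_2|$ along frequency.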

Inserting (1.11) into (1.7) and developing the integrals with respect to $\tau$ and $\nu$ leads to the channel input–output relation

\[ r(t) = \int_{-\infty}^{\infty} L_H(t,f)\, S(f)\, e^{j2\pi tf}\, df. \tag{1.12} \]

In spite of its apparent similarity to the relation $r(t) = \int_{-\infty}^{\infty} H(f)\,S(f)\,e^{j2\pi ft}\,df$ valid for time-invariant channels, (1.12) has to be interpreted with care. Specifically, (1.12) is not a simple inverse Fourier transform since $L_H(t,f)\,S(f)$ also depends on $t$.

For the special case of a time-invariant channel (no frequency dispersion), the TF transfer function reduces to the frequency response, i.e., $L_H(t,f) = H(f)$, and (1.12) corresponds to $R(f) = H(f)\,S(f)$. This correctly reflects the channel’s pure frequency selectivity. In the dual case of a channel without time dispersion, the TF transfer function simplifies according to $L_H(t,f) = h(t)$, and (1.12) thus reduces to the relation $r(t) = h(t)\,s(t)$, which describes the channel’s pure time selectivity.

1.3.4 Time-Delay Domain – Time-Varying Impulse Response

While the spreading function was motivated by a specific physical model (multipath propagation, Doppler effect), it actually applies to any LTV system. To see this, we develop (1.7) as

\[ r(t) = \int_{-\infty}^{\infty} \underbrace{\int_{-\infty}^{\infty} S_H(\tau,\nu)\, e^{j2\pi t\nu}\, d\nu}_{h(t,\tau)}\; s(t-\tau)\, d\tau = \int_{-\infty}^{\infty} h(t,\tau)\, s(t-\tau)\, d\tau, \tag{1.13} \]

where $h(t,\tau) = \int_{-\infty}^{\infty} S_H(\tau,\nu)\,e^{j2\pi t\nu}\,d\nu$ is the (time-varying) impulse response of the LTV channel $H$. An example is depicted in Fig. 1.4. Defining the kernel of $H$ as $k_H(t,t') \triangleq h(t,t-t')$, (1.13) can be rewritten as

\[ r(t) = \int_{-\infty}^{\infty} k_H(t,t')\, s(t')\, dt', \tag{1.14} \]

FIGURE 1.4

Example of a time-varying impulse response (magnitude), plotted over time $t$ and delay $\tau$.

which is the integral representation of a linear operator (Naylor & Sell, 1982). This shows that the input–output relation (1.7) is completely general, i.e., any LTV system (channel) can be characterized in terms of its spreading function. The spreading function and TF transfer function can be written in terms of the impulse response $h(t,\tau)$ as

\[ S_H(\tau,\nu) = \int_{-\infty}^{\infty} h(t,\tau)\, e^{-j2\pi\nu t}\, dt, \tag{1.15} \]

\[ L_H(t,f) = \int_{-\infty}^{\infty} h(t,\tau)\, e^{-j2\pi f\tau}\, d\tau. \tag{1.16} \]

From (1.13) and (1.14), it follows that for a transmit signal that is an impulse, i.e., $s(t) = \delta(t-t_0)$, the receive signal equals $r(t) = k_H(t,t_0) = h(t,t-t_0)$. The impulse response can also be interpreted in terms of a “continuous tapped delay line”: for a fixed tap (delay) $\tau$, $h(t,\tau)$ as a function of $t$ describes the time-varying tap weight function that multiplies the delayed transmit signal $s(t-\tau)$. In the special case of a time-invariant channel, $h(t,\tau)$ simplifies to a function of $\tau$ only, and for a frequency-nonselective channel, it simplifies as $h(t,\tau) = h(t)\,\delta(\tau)$ with some $h(t)$.
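The equivalence of the time-delay, TF, and delay-Doppler descriptions can be checked in a discrete, cyclic analog of (1.7), (1.12), and (1.13). This is only a sketch under stated assumptions: a block of $N$ samples, $M$ taps, cyclic delays, and DFTs in place of the Fourier integrals (hence the $1/N$ factors); the random tap weights are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 64, 4                                   # block length, number of delay taps
h = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))  # h[n, m]
s = rng.standard_normal(N) + 1j * rng.standard_normal(N)
n = np.arange(N)
k = np.arange(N)

# Time-delay domain, cf. (1.13): r[n] = sum_m h[n, m] * s[n - m] (cyclic delays)
r_delay = sum(h[:, m] * np.roll(s, m) for m in range(M))

# Time-frequency domain, cf. (1.16) and (1.12)
L = np.fft.fft(h, n=N, axis=1)                 # L[n, k]: DFT over the delay index
S = np.fft.fft(s)
r_tf = (L * S[None, :] * np.exp(2j * np.pi * np.outer(n, k) / N)).sum(axis=1) / N

# Delay-Doppler domain, cf. (1.15) and (1.7)
S_H = np.fft.fft(h, axis=0)                    # S_H[l, m]: DFT over the time index
r_dd = np.zeros(N, dtype=complex)
for m in range(M):
    for l in range(N):
        r_dd += S_H[l, m] * np.roll(s, m) * np.exp(2j * np.pi * l * n / N) / N

err_tf = float(np.max(np.abs(r_tf - r_delay)))
err_dd = float(np.max(np.abs(r_dd - r_delay)))
```

All three computations yield the same output up to rounding errors. Note that `r_tf` cannot be computed with a single inverse FFT because `L[n, k] * S[k]` depends on the output index `n`, which mirrors the remark on (1.12) in Section 1.3.3.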

Example 1.7
A popular model of an LTV channel with specular scattering is specified in terms of the impulse response as

\[ h(t,\tau) = \sum_{p=1}^{P} h_p\, e^{j2\pi\nu_p t}\, \delta\bigl(\tau-\tau_p(t)\bigr). \tag{1.17} \]

Apart from the time dependence of the delays $\tau_p(t)$, this is just the inverse Fourier transform of (1.8) with respect to $\nu$. The time-varying delays $\tau_p(t)$ account for a “delay drift” that is due to changing path lengths caused by the movement of the transmitter, receiver, and/or scatterers. However, these changes are much slower than the phase fluctuations resulting from small-scale fading (which are described by the exponential functions in (1.17)).

1.3.5 Extension to Multiantenna Systems

Consider a multiple-input multiple-output (MIMO) wireless system with $M_T$ transmit antennas and $M_R$ receive antennas (Bolcskei, Gesbert, Papadias, & van der Veen, 2006; Paulraj, Nabar & Gore, 2003). The signals emitted by the $j$th transmit antenna and captured by the $i$th receive antenna will be denoted by $s_j(t)$ and $r_i(t)$, respectively. Each receive signal $r_i(t)$ is a superposition of distorted versions of all transmit signals, i.e.,

\[ r_i(t) = \sum_{j=1}^{M_T} (H_{ij}\, s_j)(t), \quad i = 1, \ldots, M_R, \tag{1.18} \]

where $H_{ij}$ denotes the LTV channel between transmit antenna $j$ and receive antenna $i$. (For a survey of MIMO channel modeling aspects, see Almers et al., 2007 and references therein.) Defining the length-$M_T$ transmit signal vector $\mathbf{s}(t) = (s_1(t) \cdots s_{M_T}(t))^T$ and the length-$M_R$ receive vector $\mathbf{r}(t) = (r_1(t) \cdots r_{M_R}(t))^T$, all input–output relations (1.18) can be combined as

\[ \mathbf{r}(t) = (\mathbf{H}\mathbf{s})(t), \quad \text{with } \mathbf{H} \triangleq \begin{pmatrix} H_{11} & \cdots & H_{1M_T} \\ \vdots & \ddots & \vdots \\ H_{M_R 1} & \cdots & H_{M_R M_T} \end{pmatrix}. \tag{1.19} \]

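The delay-Doppler, TF, and time-delay characterizations of single-antenna channels can be easily generalized to the MIMO case. Let us define the $M_R \times M_T$ matrices $\mathbf{S}_{\mathbf{H}}(\tau,\nu)$, $\mathbf{L}_{\mathbf{H}}(t,f)$, and $\mathbf{H}(t,\tau)$ whose $(i,j)$th elements equal the delay-Doppler spreading function, TF transfer function, and impulse response of $H_{ij}$, respectively. Then,

\[ \mathbf{r}(t) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathbf{S}_{\mathbf{H}}(\tau,\nu)\, \mathbf{s}(t-\tau)\, e^{j2\pi\nu t}\, d\tau\, d\nu = \int_{-\infty}^{\infty} \mathbf{H}(t,\tau)\, \mathbf{s}(t-\tau)\, d\tau, \tag{1.20} \]

with

\[ \mathbf{S}_{\mathbf{H}}(\tau,\nu) = \int_{-\infty}^{\infty} \mathbf{H}(t,\tau)\, e^{-j2\pi\nu t}\, dt. \tag{1.21} \]

Furthermore,

\[ \mathbf{L}_{\mathbf{H}}(t,f) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathbf{S}_{\mathbf{H}}(\tau,\nu)\, e^{j2\pi(t\nu - f\tau)}\, d\tau\, d\nu = \int_{-\infty}^{\infty} \mathbf{H}(t,\tau)\, e^{-j2\pi f\tau}\, d\tau. \tag{1.22} \]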

The main difference between these relations and their single-antenna counterparts is the spatial resolution offered by the aperture of the antenna arrays. The transmit array and receive array make it possible to resolve to a certain extent the AoD and AoA, respectively, of the individual paths. This spatial or directional resolution is essentially determined by the array steering vectors that define a transformation to the angular domain (Sayeed, 2002; Tse & Viswanath, 2005).

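Example 1.8
Assuming uniform linear arrays at the transmitter and the receiver with antenna separation $\Delta_T\lambda_c$ and $\Delta_R\lambda_c$, respectively, the array steering vectors for given AoD $\phi$ and AoA $\psi$ are (Molisch, 2005)

\[ \mathbf{a}_T(\phi) = \frac{1}{\sqrt{M_T}} \begin{pmatrix} 1 \\ e^{-j2\pi\Delta_T\cos(\phi)} \\ \vdots \\ e^{-j2\pi(M_T-1)\Delta_T\cos(\phi)} \end{pmatrix}, \quad \mathbf{a}_R(\psi) = \frac{1}{\sqrt{M_R}} \begin{pmatrix} 1 \\ e^{-j2\pi\Delta_R\cos(\psi)} \\ \vdots \\ e^{-j2\pi(M_R-1)\Delta_R\cos(\psi)} \end{pmatrix}. \]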

Further assuming purely specular scattering with $P$ paths, where each path has its distinct AoD $\phi_p$ and AoA $\psi_p$, the matrix-valued spreading function is given by (cf. (1.8))

\[ \mathbf{S}_{\mathbf{H}}(\tau,\nu) = \sum_{p=1}^{P} \mathbf{H}_p\, \delta(\tau-\tau_p)\,\delta(\nu-\nu_p), \quad \text{with } \mathbf{H}_p = h_p\, \mathbf{a}_R(\psi_p)\, \mathbf{a}_T^T(\phi_p). \]

Note that here the MIMO matrices $\mathbf{H}_p = h_p\,\mathbf{a}_R(\psi_p)\,\mathbf{a}_T^T(\phi_p)$ describing the individual paths in the delay-Doppler domain (i.e., determining $\mathbf{S}_{\mathbf{H}}(\tau,\nu)$ at the corresponding delay-Doppler points $(\tau_p,\nu_p)$) all have rank equal to one. The TF transfer function is obtained as

\[ \mathbf{L}_{\mathbf{H}}(t,f) = \sum_{p=1}^{P} \mathbf{H}_p\, e^{j2\pi(t\nu_p - f\tau_p)}; \]

it involves a superposition of all matrices $\mathbf{H}_p$ at each TF point $(t,f)$ and hence in general will have full rank everywhere, provided that more than $\min\{M_T, M_R\}$ paths have sufficiently distinct AoA/AoD (rich scattering). We note that because of the finite aperture of the antenna arrays ($M_T\Delta_T$ and $M_R\Delta_R$), no more than, respectively, $M_T$ and $M_R$ orthogonal directions can be effectively resolved in the angular domain (Sayeed, 2002; Tse & Viswanath, 2005).
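The rank statements of Example 1.8 can be explored numerically. The sketch below builds the steering vectors and the rank-one path matrices $h_p\,\mathbf{a}_R(\psi_p)\,\mathbf{a}_T^T(\phi_p)$ for randomly drawn path parameters; the half-wavelength spacing, the 3×3 array size, the six paths, and all function names are assumptions of the example.

```python
import numpy as np

def ula_steering(num_antennas, spacing_wavelengths, angle_rad):
    """Unit-norm ULA steering vector with entries exp(-j*2*pi*m*Delta*cos(angle))."""
    m = np.arange(num_antennas)
    phase = -2j * np.pi * m * spacing_wavelengths * np.cos(angle_rad)
    return np.exp(phase) / np.sqrt(num_antennas)

def path_matrix(h_p, aod, aoa, MT, MR, dT=0.5, dR=0.5):
    """Rank-one matrix H_p = h_p * a_R(psi_p) * a_T(phi_p)^T (no conjugation)."""
    return h_p * np.outer(ula_steering(MR, dR, aoa), ula_steering(MT, dT, aod))

MT, MR, P = 3, 3, 6
rng = np.random.default_rng(2)
gains = rng.standard_normal(P) + 1j * rng.standard_normal(P)
aods = rng.uniform(0, np.pi, P)                # angles of departure
aoas = rng.uniform(0, np.pi, P)                # angles of arrival
taus = rng.uniform(0, 1e-6, P)                 # path delays in seconds
nus = rng.uniform(-100, 100, P)                # Doppler frequencies in Hz
H_paths = [path_matrix(g, a, b, MT, MR) for g, a, b in zip(gains, aods, aoas)]

def mimo_tf_transfer(t, f):
    """L_H(t,f) = sum_p H_p * exp(j*2*pi*(t*nu_p - f*tau_p))."""
    return sum(Hp * np.exp(2j * np.pi * (t * nu - f * tau))
               for Hp, tau, nu in zip(H_paths, taus, nus))

rank_single_path = np.linalg.matrix_rank(H_paths[0])
rank_tf = np.linalg.matrix_rank(mimo_tf_transfer(t=1e-3, f=2e5))
```

Each individual path matrix has rank one, whereas the superposition in the TF transfer function has higher rank when the paths have distinct angles, which is the property exploited for spatial multiplexing.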

The spatial/angular dispersion of MIMO channels – i.e., the mixing of the signals emitted from all transmit antennas – can be viewed as an inconvenience necessitating spatial equalization. However, spatial dispersion actually provides additional degrees of freedom that can be exploited to realize spatial diversity. This diversity is analogous to the delay diversity due to time dispersion and the Doppler diversity due to frequency dispersion.

1.4 STOCHASTIC DESCRIPTION

A complete deterministic characterization of LTV channels (e.g., based on Maxwell’s equations) is infeasible in virtually all scenarios of practical relevance. Even if such a characterization were possible, it would only apply to a specific environment, whereas wireless systems need to be designed for a wide variety of operating conditions. This motivates stochastic characterizations, which consider an LTV channel as a random quantity whose statistics describe common properties of an underlying ensemble of wireless channels.

We will restrict our discussion to the common case of Rayleigh fading, where the channel’s system functions $S_H(\tau,\nu)$, $L_H(t,f)$, and $h(t,\tau)$ are 2D complex Gaussian random processes with zero mean. For Rayleigh fading, the stochastic characterization of a channel reduces to the specification of its second-order statistics.

1.4.1 WSSUS Channels

1.4.1.1 The WSS, US, and WSSUS Properties

The second-order statistics of the 2D system functions of an LTV channel (spreading function, TF transfer function, and impulse response) generally depend on four variables. In his seminal paper, Bello (1963) provided a simplified description in terms of only two variables by introducing the assumption of wide-sense stationary uncorrelated scattering (WSSUS). The WSSUS property is also discussed, e.g., in Matz and Hlawatsch (2003); Molisch (2005); Proakis (1995).

A random LTV channel is said to feature uncorrelated scattering (US) if different channel taps (delay coefficients) are uncorrelated (Bello, 1963), i.e.,

$$E\{h(t,\tau)\, h^*(t',\tau')\} = r'_h(t,t';\tau)\, \delta(\tau-\tau'),$$

with some correlation function $r'_h(t,t';\tau)$. Note that different taps can be interpreted as belonging to different scatterers. Furthermore, a channel is said to be wide-sense stationary (WSS) if the channel taps are jointly wide-sense stationary with respect to the time variable $t$ (Bello, 1963), i.e.,

$$E\{h(t,\tau)\, h^*(t',\tau')\} = r_h(t-t';\tau,\tau'),$$

with some correlation function $r_h(\Delta t;\tau,\tau')$. Combining the WSS and US properties, we obtain the WSSUS property

$$E\{h(t,\tau)\, h^*(t',\tau')\} = r_h(t-t';\tau)\, \delta(\tau-\tau'). \tag{1.23}$$

This shows that the second-order statistics of a WSSUS channel are fully described by the 2D function $r_h(\Delta t;\tau)$, which is a correlation function in the time-difference variable $\Delta t$.

1.4.1.2 Scattering Function and TF Correlation Function

US channels have uncorrelated delay coefficients, and it can be shown that for WSS channels, different Doppler frequency coefficients are uncorrelated. Taken together, this implies that the spreading function $S_H(\tau,\nu)$ of a WSSUS channel is a 2D white (but nonstationary) process, i.e.,

$$E\{S_H(\tau,\nu)\, S_H^*(\tau',\nu')\} = C_H(\tau,\nu)\, \delta(\tau-\tau')\, \delta(\nu-\nu'). \tag{1.24}$$

The rationale here is that each delay-Doppler pair $(\tau,\nu)$ corresponds to a scatterer with reflectivity $S_H(\tau,\nu)$, and the reflectivities of any two distinct scatterers (i.e., scatterers with different delay $\tau$ or different Doppler $\nu$) are uncorrelated. The mean intensity of the 2D white spreading function process, $C_H(\tau,\nu) \ge 0$, is known as the channel’s scattering function (Bello, 1963; Matz & Hlawatsch, 2003; Molisch, 2005). The scattering function characterizes the average strength of scatterers with delay $\tau$ and Doppler frequency $\nu$, and thus, it provides a statistical characterization of the TF dispersion produced by a WSSUS channel. A wideband scattering function based on the delay-scale spreading function $F_H(\tau,\alpha)$ in (1.10) can be defined in a similar manner (see Balan, Poor, Rickard, & Verdu, 2004; Margetts et al., 2007; Ye & Papandreou-Suppappola, 2003; and Chapter 9).



By definition, WSS channels are stationary in time; furthermore, it can be shown that US channels are stationary in frequency. It follows that the statistics of a WSSUS channel do not change with time or frequency, and hence, the TF transfer function $L_H(t,f)$ is a 2D stationary process, i.e.,

$$E\{L_H(t,f)\, L_H^*(t',f')\} = R_H(t-t', f-f'). \tag{1.25}$$

Here, $R_H(\Delta t,\Delta f)$ denotes the channel’s TF correlation function. The stationarity of $L_H(t,f)$ as expressed by the above equation is consistent with the fact that the spreading function $S_H(\tau,\nu)$, which is the 2D Fourier transform of $L_H(t,f)$, is a white process.

Using the inverse of the Fourier transform relation (1.11) in (1.24), we obtain a similar Fourier transform relation between the TF correlation function $R_H(\Delta t,\Delta f)$ and the scattering function $C_H(\tau,\nu)$:

$$C_H(\tau,\nu) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R_H(\Delta t,\Delta f)\, e^{-j2\pi(\nu\Delta t - \tau\Delta f)}\, d\Delta t\, d\Delta f. \tag{1.26}$$

By inspecting (1.25) and (1.26), it is seen that the scattering function is the 2D power spectral density of the 2D stationary process $L_H(t,f)$. This observation suggests the use of spectrum estimation techniques to measure the scattering function (Kay & Doyle, 2003; see also Section 1.7.4).

1.4.1.3 Statistical Input–Output Relations

The scattering function is a statistical characterization of the TF dispersion produced by a WSSUS channel. This interpretation can be made more explicit by considering the Rihaczek spectra (Flandrin, 1999; Matz & Hlawatsch, 2006) of transmit signal $s(t)$ and receive signal $r(t)$. The Rihaczek spectrum of a (generally nonstationary) random process $x(t)$ with correlation function $R_x(t,t') = E\{x(t)\, x^*(t')\}$ is defined as

$$\Gamma_x(t,f) \triangleq \int_{-\infty}^{\infty} R_x(t, t-\Delta t)\, e^{-j2\pi f\Delta t}\, d\Delta t.$$

With some precautions, $\Gamma_x(t,f)$ can be interpreted as a mean energy distribution of $x(t)$ over the TF plane. Thus, it generalizes the power spectral density of stationary processes.

Starting from (1.7), it can be shown that

$$\Gamma_r(t,f) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} C_H(\tau,\nu)\, \Gamma_s(t-\tau, f-\nu)\, d\tau\, d\nu. \tag{1.27}$$

This means that the TF energy spectrum of the receive signal is a superposition of TF-translated versions of the TF energy spectrum of the transmit signal, weighted by the corresponding values of the scattering function. The “statistical input–output relation” (1.27) thus represents a second-order statistical analogue of the deterministic linear input–output relation (1.9).

Performing a 2D Fourier transform of the convolution relation (1.27) yields

$$A_r(\Delta t,\Delta f) = R_H(\Delta t,\Delta f)\, A_s(\Delta t,\Delta f), \tag{1.28}$$

where $A_x(\Delta t,\Delta f) \triangleq \int_{-\infty}^{\infty} R_x(t, t-\Delta t)\, e^{-j2\pi t\Delta f}\, dt$, the 2D Fourier transform of $\Gamma_x(t,f)$, is the expected ambiguity function of a random process $x(t)$. The simple multiplicative input–output relation (1.28) is the basis of certain methods for estimating the scattering function (Artes, Matz, & Hlawatsch, 2004; Gaarder, 1968). Specifically, $A_s(\Delta t,\Delta f)$ is known by design, and $A_r(\Delta t,\Delta f)$ can be estimated from the receive signal. According to (1.28), an estimate of the TF correlation function $R_H(\Delta t,\Delta f)$ can then be obtained by a (regularized) division of the estimate of $A_r(\Delta t,\Delta f)$ by $A_s(\Delta t,\Delta f)$, and an estimate of the scattering function $C_H(\tau,\nu)$ is finally obtained by a 2D Fourier transform according to (1.26). Further details of this approach are provided in Section 1.7.4.
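The regularized division mentioned above can be sketched on a sampled $(\Delta t,\Delta f)$ grid. This is only an illustration under an assumed Tikhonov-style regularization with a hypothetical parameter `reg`; the cited estimators may regularize differently, and the function name and test values are not taken from the source.

```python
import numpy as np

def estimate_tf_correlation(A_r_hat, A_s, reg=1e-3):
    """Regularized division of A_r_hat by A_s on a common (dt, df) grid.

    Returns A_r_hat * conj(A_s) / (|A_s|^2 + reg); for reg -> 0 this tends to the
    plain quotient A_r_hat / A_s wherever A_s is nonzero, and it stays finite
    (equal to zero) at grid points where A_s vanishes.
    """
    return A_r_hat * np.conj(A_s) / (np.abs(A_s) ** 2 + reg)

# Toy check with made-up values: A_r is generated exactly from relation (1.28).
A_s = np.array([[1.0 + 0.0j, 0.5j], [0.0, -0.25]])      # known by design
R_true = np.array([[1.0, 0.8], [0.3, 0.1 + 0.2j]])      # "true" TF correlation
A_r_hat = R_true * A_s                                   # noise-free A_r
R_hat = estimate_tf_correlation(A_r_hat, A_s, reg=1e-6)
```

Larger values of `reg` suppress noise amplification at lags where $|A_s|$ is small, at the cost of a bias toward zero at those lags.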

1.4.1.4 Delay and Doppler Profiles, Time and Frequency Correlation Functions

In some situations, only the delays or only the Doppler shifts of a WSSUS channel are of interest. An example is the exploitation of delay diversity in orthogonal frequency division multiplexing (OFDM) systems – see Chapter 7 – by (pre)coding across tones; here, the channel’s Doppler characteristics are irrelevant. In such cases, the 2D descriptions provided by the scattering function $C_H(\tau,\nu)$ or TF correlation function $R_H(\Delta t,\Delta f)$ may be too detailed, and it is sufficient to use one of the “marginals” of the scattering function. These marginals are defined as

$$c_H^{(1)}(\tau) \triangleq \int_{-\infty}^{\infty} C_H(\tau,\nu)\, d\nu, \qquad c_H^{(2)}(\nu) \triangleq \int_{-\infty}^{\infty} C_H(\tau,\nu)\, d\tau, \tag{1.29}$$

and termed delay power profile and Doppler power profile, respectively. The name “delay power profile” for $c_H^{(1)}(\tau)$ is motivated by the relation $c_H^{(1)}(\tau) = E\{|h(t,\tau)|^2\}$, which shows that $c_H^{(1)}(\tau)$ is the mean power of the channel tap with delay $\tau$ (this does not depend on $t$ since WSSUS channels are stationary with respect to time). A similar relation and interpretation hold for $c_H^{(2)}(\nu)$. Because of (1.26), the (1D) Fourier transforms of the delay power profile $c_H^{(1)}(\tau)$ and Doppler power profile $c_H^{(2)}(\nu)$ are given by the frequency correlation function and time correlation function defined, respectively, as

$$r_H^{(1)}(\Delta f) \triangleq R_H(0,\Delta f) = E\{L_H(t,f)\, L_H^*(t, f-\Delta f)\},$$

$$r_H^{(2)}(\Delta t) \triangleq R_H(\Delta t, 0) = E\{L_H(t,f)\, L_H^*(t-\Delta t, f)\}.$$

The 1D channel statistics discussed above can be used to formulate statistical input–output relations for stationary or white input (transmit) processes. For a stationary transmit signal with power spectral density $P_s(f)$, it can be shown that the receive signal produced by a WSSUS channel is stationary as well and its power spectral density and correlation function are respectively given by

$$P_r(f) = \int_{-\infty}^{\infty} c_H^{(2)}(\nu)\, P_s(f-\nu)\, d\nu, \qquad r_r(\Delta t) = r_H^{(2)}(\Delta t)\, r_s(\Delta t).$$

Dual relations involving $c_H^{(1)}(\tau)$ and $r_H^{(1)}(\Delta f)$ hold for white transmit signals. Furthermore, similar relations exist for cyclostationary processes (Matz & Hlawatsch, 2003).
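The convolution relation for the receive power spectral density can be illustrated on a discrete frequency grid. The sketch below is an assumption-laden toy example: the flat Doppler power profile, the Gaussian transmit spectrum, the grid, and all names are hypothetical choices made only to keep the numerics simple.

```python
import numpy as np

def received_psd(doppler_profile, transmit_psd, d_f):
    """Riemann-sum approximation of P_r(f) = int c2(nu) P_s(f - nu) d nu.

    Both inputs are sampled on the same odd-length grid centered at zero frequency.
    """
    return np.convolve(doppler_profile, transmit_psd, mode="same") * d_f

d_f = 1.0                                   # grid spacing (Hz)
f = np.arange(-500, 501) * d_f              # centered frequency grid
nu_max, rho2 = 100.0, 0.5                   # assumed max Doppler and path gain

# Hypothetical flat Doppler power profile on |nu| <= nu_max with volume rho2.
c2 = np.where(np.abs(f) <= nu_max, 1.0, 0.0)
c2 *= rho2 / (c2.sum() * d_f)

# Hypothetical narrowband transmit PSD with unit total power.
P_s = np.exp(-0.5 * (f / 20.0) ** 2)
P_s /= P_s.sum() * d_f

P_r = received_psd(c2, P_s, d_f)
received_power = P_r.sum() * d_f            # approximately rho2 * transmit power
```

The receive spectrum is the transmit spectrum smeared over the Doppler support, and its total power is the transmit power scaled by the volume of the Doppler power profile.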

1.4.1.5 Global Channel Parameters

For many design and analysis tasks in wireless communications, only global channel parameters are relevant. These parameters summarize important properties of the scattering function and TF correlation function such as overall strength, position, and extension (spread).

As discussed in Section 1.2.4, the path loss (average power attenuation) is important, e.g., for link budget considerations. In the case of WSSUS channels, the path loss is equal to the volume of the scattering function or, equivalently, the maximum amplitude of the TF correlation function. That is, we have $P_L = -10\log_{10}(\rho_H^2)$ with

$$\rho_H^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} C_H(\tau,\nu)\, d\tau\, d\nu = \int_{-\infty}^{\infty} c_H^{(1)}(\tau)\, d\tau = \int_{-\infty}^{\infty} c_H^{(2)}(\nu)\, d\nu = R_H(0,0) = E\{|L_H(t,f)|^2\}.$$

Note that the last expression, $E\{|L_H(t,f)|^2\}$, does not depend on $(t,f)$ due to the TF stationarity of WSSUS channels.

Further useful parameters are the mean delay and mean Doppler shift of a WSSUS channel, which are defined by the first moments (centers of gravity)

$$\bar\tau \triangleq \frac{1}{\rho_H^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \tau\, C_H(\tau,\nu)\, d\tau\, d\nu = \frac{1}{\rho_H^2} \int_{-\infty}^{\infty} \tau\, c_H^{(1)}(\tau)\, d\tau, \tag{1.30a}$$

$$\bar\nu \triangleq \frac{1}{\rho_H^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \nu\, C_H(\tau,\nu)\, d\tau\, d\nu = \frac{1}{\rho_H^2} \int_{-\infty}^{\infty} \nu\, c_H^{(2)}(\nu)\, d\nu. \tag{1.30b}$$

In particular, $\bar\tau$ describes the distance-dependent mean propagation delay. For physical channels, causality implies that $C_H(\tau,\nu) = 0$ for $\tau < 0$ and consequently that $\bar\tau \ge 0$. Assuming that the receiver’s timing recovery unit locks to the center of gravity of the delay power profile, the subsequent receiver stages will see an equivalent channel $\tilde{H}$ where the mean delay $\bar\tau$ is split off. That is, $H = \tilde{H} D_{\bar\tau}$, where $D_{\bar\tau}$ is a pure time-delay operator acting as $(D_{\bar\tau} s)(t) = s(t-\bar\tau)$ and $\tilde{H}$ is an equivalent channel whose mean delay is zero. In many treatments, the equivalent channel $\tilde{H}$ is considered even though this is not stated explicitly. Similar considerations apply to the mean Doppler shift $\bar\nu$, which can also be split off by frequency offset compensation techniques, resulting in an equivalent channel with mean Doppler shift equal to zero.

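The mean delay $\bar\tau$ and mean Doppler shift $\bar\nu$ describe the overall location of the scattering function $C_H(\tau,\nu)$ in the $(\tau,\nu)$ plane. The extension of $C_H(\tau,\nu)$ about $(\bar\tau,\bar\nu)$ can be measured by the delay spread and Doppler spread, which are defined as the root-mean-square (RMS) widths of delay power profile $c_H^{(1)}(\tau)$ and Doppler power profile $c_H^{(2)}(\nu)$, respectively:

$$\sigma_\tau \triangleq \frac{1}{\rho_H} \sqrt{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (\tau-\bar\tau)^2\, C_H(\tau,\nu)\, d\tau\, d\nu} = \frac{1}{\rho_H} \sqrt{\int_{-\infty}^{\infty} (\tau-\bar\tau)^2\, c_H^{(1)}(\tau)\, d\tau}, \tag{1.31a}$$

$$\sigma_\nu \triangleq \frac{1}{\rho_H} \sqrt{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (\nu-\bar\nu)^2\, C_H(\tau,\nu)\, d\tau\, d\nu} = \frac{1}{\rho_H} \sqrt{\int_{-\infty}^{\infty} (\nu-\bar\nu)^2\, c_H^{(2)}(\nu)\, d\nu}. \tag{1.31b}$$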



Sometimes, it is more convenient to work with the reciprocals of Doppler spread and delay spread,

$$T_c \triangleq \frac{1}{\sigma_\nu}, \qquad F_c \triangleq \frac{1}{\sigma_\tau}, \tag{1.32}$$

which are known as the coherence time and coherence bandwidth, respectively. These two parameters can be used to quantify the duration and bandwidth within which the channel is approximately constant (or, at least, strongly correlated). This interpretation is supported by two arguments. First, it can be shown that the curvatures in the $\Delta t$ and $\Delta f$ directions of the squared magnitude of the TF correlation function $R_H(\Delta t,\Delta f)$ at the origin are inversely proportional to the squared coherence time and squared coherence bandwidth, respectively. This corresponds to the following second-order Taylor series approximation of $|R_H(\Delta t,\Delta f)|^2$ about the origin:

$$|R_H(\Delta t,\Delta f)|^2 \approx \rho_H^4 \left[ 1 - \left(\frac{2\pi\Delta t}{T_c}\right)^2 - \left(\frac{2\pi\Delta f}{F_c}\right)^2 \right].$$

(The first-order terms vanish since $|R_H(\Delta t,\Delta f)|^2 \le |R_H(0,0)|^2 = \rho_H^4$, i.e., $|R_H(\Delta t,\Delta f)|^2$ assumes its maximum at the origin.) Thus, within durations $|\Delta t|$ smaller than $T_c$ and bandwidths $|\Delta f|$ smaller than $F_c$, the channel will be strongly correlated. In addition, it can be shown that

$$\frac{1}{\rho_H^2}\, E\{|L_H(t+\Delta t, f+\Delta f) - L_H(t,f)|^2\} \le 2\pi \left[ \left(\frac{\Delta t}{T_c}\right)^2 + \left(\frac{\Delta f}{F_c}\right)^2 \right]. \tag{1.33}$$

This implies that within TF regions of duration $|\Delta t|$ smaller than $T_c$ and bandwidth $|\Delta f|$ smaller than $F_c$, the channel is approximately constant (in the mean-square sense). More specifically, within the local $\varepsilon$-coherence region $\mathcal{B}_c^{\varepsilon}(t,f) \triangleq [t, t+\varepsilon T_c] \times [f, f+\varepsilon F_c]$, the RMS error of the approximation $L_H(t+\Delta t, f+\Delta f) \approx L_H(t,f)$ is of order $\varepsilon$. An illustration of this coherence region will be shown in Fig. 1.8 in Section 1.4.3.3.

We finally illustrate the characterization of WSSUS channels with a simple example.

Example 1.9

A popular WSSUS channel model uses a separable scattering function $C_H(\tau,\nu) = \frac{1}{\rho_H^2}\, c_H^{(1)}(\tau)\, c_H^{(2)}(\nu)$ with an exponential delay power profile and a so-called Jakes Doppler power profile (Molisch, 2005; Proakis, 1995):

$$c_H^{(1)}(\tau) = \begin{cases} \dfrac{\rho_H^2}{\tau_0}\, e^{-\tau/\tau_0}, & \tau \ge 0, \\ 0, & \tau < 0, \end{cases} \qquad c_H^{(2)}(\nu) = \begin{cases} \dfrac{\rho_H^2}{\pi\sqrt{\nu_{\max}^2 - \nu^2}}, & |\nu| < \nu_{\max}, \\ 0, & |\nu| > \nu_{\max}. \end{cases}$$

Here, $\tau_0$ is a delay parameter, and $\nu_{\max}$ is the maximum Doppler shift. The exponential delay profile is motivated by the exponential decay of receive power with path length (which is proportional to delay), and the Jakes Doppler profile results from the assumption of uniformly distributed AoA (Jakes, 1974). Note that $c_H^{(1)}(\tau)$ ignores the fundamental propagation delay of the first (and, in this case, also strongest) multipath component. The corresponding TF correlation function is separable as well, i.e., $R_H(\Delta t,\Delta f) = \frac{1}{\rho_H^2}\, r_H^{(2)}(\Delta t)\, r_H^{(1)}(\Delta f)$, with time correlation function and frequency correlation function given by

$$r_H^{(2)}(\Delta t) = \rho_H^2\, J_0(2\pi\nu_{\max}\Delta t), \qquad r_H^{(1)}(\Delta f) = \frac{\rho_H^2}{1 + j2\pi\tau_0\Delta f}.$$

FIGURE 1.5

WSSUS channel following the Jakes-exponential model: scattering function (left; axes $\tau$ and $\nu$) and magnitude of TF correlation function (right; axes $\Delta t$ and $\Delta f$).

Here, $J_0(\cdot)$ denotes the zeroth-order Bessel function of the first kind. The scattering function and TF correlation function for this WSSUS channel are depicted in Fig. 1.5.

Assuming a delay parameter $\tau_0 = 10\ \mu\text{s}$ and maximum Doppler $\nu_{\max} = 100$ Hz, we obtain the mean delay $\bar\tau = \tau_0 = 10\ \mu\text{s}$ and mean Doppler $\bar\nu = 0$ Hz. Furthermore, the delay spread and Doppler spread follow as

$$\sigma_\tau = \tau_0 = 10\ \mu\text{s}, \qquad \sigma_\nu = \frac{\nu_{\max}}{\sqrt{2}} = 70.71\ \text{Hz},$$

and the corresponding coherence time and coherence bandwidth are

$$T_c = \frac{\sqrt{2}}{\nu_{\max}} = 14.14\ \text{ms}, \qquad F_c = \frac{1}{\tau_0} = 100\ \text{kHz}.$$

For a narrowband system with bandwidth 10 kHz and frame duration 1 ms, the bound (1.33) implies that the mean-square difference between any two values of $L_H(t,f)$ within the frame duration and transmit band is at most roughly 1.5%. Thus, the TF transfer function can be assumed constant, i.e., $L_H(t,f) \approx h$. Within one frame, the input–output relation (1.12) then simplifies to $r(t) \approx h\, s(t)$, a model known as block flat fading.
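The numbers of Example 1.9 can be reproduced by evaluating the moment definitions (1.30)–(1.32) numerically. The sketch below assumes $\rho_H^2 = 1$, truncates the exponential profile at $40\tau_0$, and handles the integrable singularities of the Jakes profile through the substitution $\nu = \nu_{\max}\sin\theta$, for which $c_H^{(2)}(\nu)\,d\nu = (\rho_H^2/\pi)\,d\theta$; these numerical choices and all variable names are illustrative assumptions. The last line evaluates only the bracketed term of (1.33) for $\Delta t = 1$ ms and $\Delta f = 10$ kHz.

```python
import numpy as np

tau0, nu_max, rho2 = 10e-6, 100.0, 1.0     # values of Example 1.9, rho_H^2 assumed 1

# Exponential delay power profile, midpoint rule on [0, 40 tau0].
n = 400_000
d_tau = 40 * tau0 / n
tau = (np.arange(n) + 0.5) * d_tau
c1 = rho2 / tau0 * np.exp(-tau / tau0)
power = np.sum(c1) * d_tau
mean_delay = np.sum(tau * c1) * d_tau / power
delay_spread = np.sqrt(np.sum((tau - mean_delay) ** 2 * c1) * d_tau / power)

# Jakes Doppler power profile via nu = nu_max sin(theta), theta in (-pi/2, pi/2).
m = 100_000
d_theta = np.pi / m
theta = -np.pi / 2 + (np.arange(m) + 0.5) * d_theta
nu = nu_max * np.sin(theta)
weight = rho2 / np.pi * d_theta            # equals c2(nu) d nu at each node
mean_doppler = np.sum(nu) * weight / rho2
doppler_spread = np.sqrt(np.sum((nu - mean_doppler) ** 2) * weight / rho2)

T_c = 1.0 / doppler_spread                 # coherence time (1.32)
F_c = 1.0 / delay_spread                   # coherence bandwidth (1.32)

# Bracketed term of (1.33) for a 1 ms frame and 10 kHz bandwidth.
bracket = (1e-3 / T_c) ** 2 + (10e3 / F_c) ** 2
```

With these settings the computed spreads agree with the closed-form values $\sigma_\tau = \tau_0$ and $\sigma_\nu = \nu_{\max}/\sqrt{2}$ up to discretization error.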

1.4.2 Extension to Multiantenna Systems

We next outline the extension of the WSSUS property to multiple-antenna (MIMO) channels, mostly following Matz (2006). The main difference from the single-antenna case is the need for joint statistics of the individual links.

1.4.2.1 Scattering Function Matrix and Space-Time-Frequency Correlation Function Matrix

Extending Bello (1963) (see also Section 1.4.1), we call a MIMO channel WSSUS if all $M_T M_R$ elements of the TF transfer function matrix $\mathbf{L}_H(t,f)$ in (1.22) are jointly (wide-sense) stationary. Defining the length-$M_T M_R$ vector $\mathbf{l}_H(t,f) \triangleq \mathrm{vec}\{\mathbf{L}_H(t,f)\}$, this condition can be written as (cf. (1.25))

$$E\{\mathbf{l}_H(t,f)\, \mathbf{l}_H^H(t',f')\} = \mathbf{R}_H(t-t', f-f'). \tag{1.34}$$



Here, the $M_T M_R \times M_T M_R$ matrix $\mathbf{R}_H(\Delta t,\Delta f)$ is referred to as the space-time-frequency correlation function matrix of the channel. This matrix-valued function describes the correlation of the transfer functions $L_{H_{ij}}(t,f)$ and $L_{H_{i'j'}}(t',f')$ of any two component channels $H_{ij}$ and $H_{i'j'}$ at time lag $t-t' = \Delta t$ and frequency lag $f-f' = \Delta f$.

Equivalently, the MIMO-WSSUS property expresses the fact that the elements of the spreading function matrix $\mathbf{S}_H(\tau,\nu)$ in (1.21) are jointly white (cf. (1.24)), i.e.,

$$E\{\mathbf{s}_H(\tau,\nu)\, \mathbf{s}_H^H(\tau',\nu')\} = \mathbf{C}_H(\tau,\nu)\, \delta(\tau-\tau')\, \delta(\nu-\nu'),$$

with $\mathbf{s}_H(\tau,\nu) \triangleq \mathrm{vec}\{\mathbf{S}_H(\tau,\nu)\}$. The $M_T M_R \times M_T M_R$ matrix $\mathbf{C}_H(\tau,\nu)$ will be referred to as the scattering function matrix (SFM). The SFM is nonnegative definite for all $(\tau,\nu)$. It summarizes the mean spatial characteristics and strength of all scatterers with delay $\tau$ and Doppler $\nu$. The SFM and the space-time-frequency correlation function matrix are related via a 2D Fourier transform (cf. (1.26)):

$$\mathbf{C}_H(\tau,\nu) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathbf{R}_H(\Delta t,\Delta f)\, e^{-j2\pi(\nu\Delta t - \tau\Delta f)}\, d\Delta t\, d\Delta f.$$

Thus, recalling (1.34), the SFM $\mathbf{C}_H(\tau,\nu)$ can be interpreted as the 2D power spectral density matrix of the 2D stationary multivariate random process $\mathbf{l}_H(t,f)$.

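1.4.2.2 Canonical Decomposition

While the SFM provides a decorrelated representation with respect to delay and Doppler, it still features spatial correlations. For a spatially decorrelated representation, consider the $(\tau,\nu)$-dependent eigendecomposition of the SFM,

$$\mathbf{C}_H(\tau,\nu) = \sum_{i=1}^{M_R} \sum_{j=1}^{M_T} \lambda_{ij}(\tau,\nu)\, \mathbf{u}_{ij}(\tau,\nu)\, \mathbf{u}_{ij}^H(\tau,\nu).$$

Here, $\lambda_{ij}(\tau,\nu) \ge 0$ and $\mathbf{u}_{ij}(\tau,\nu)$ denote the eigenvalues and eigenvectors of $\mathbf{C}_H(\tau,\nu)$, respectively (we use 2D indexing for later convenience). For each $(\tau,\nu)$, the $M_T M_R$ vectors $\mathbf{u}_{ij}(\tau,\nu)$ form an orthonormal basis of $\mathbb{C}^{M_T M_R}$. Using the $M_R \times M_T$ matrix form of this basis, $\mathbf{U}_{ij}(\tau,\nu) \triangleq \mathrm{unvec}\{\mathbf{u}_{ij}(\tau,\nu)\}$, the channel’s spreading function can be expanded as

$$\mathbf{S}_H(\tau,\nu) = \sum_{i=1}^{M_R} \sum_{j=1}^{M_T} \alpha_{ij}(\tau,\nu)\, \mathbf{U}_{ij}(\tau,\nu), \tag{1.35}$$

with the random coefficients $\alpha_{ij}(\tau,\nu) \triangleq \mathbf{u}_{ij}^H(\tau,\nu)\, \mathbf{s}_H(\tau,\nu)$. It can be shown that these coefficients are orthogonal with respect to delay, Doppler, and space, i.e.,

$$E\{\alpha_{ij}(\tau,\nu)\, \alpha_{i'j'}^*(\tau',\nu')\} = \lambda_{ij}(\tau,\nu)\, \delta_{i,i'}\, \delta_{j,j'}\, \delta(\tau-\tau')\, \delta(\nu-\nu').$$

The expansion (1.35) entails the following representation of the MIMO channel (see (1.19) and (1.20)):

$$\mathbf{r}(t) = (\mathbf{H}\mathbf{s})(t) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \sum_{i=1}^{M_R} \sum_{j=1}^{M_T} \alpha_{ij}(\tau,\nu)\, \mathbf{U}_{ij}(\tau,\nu)\, \mathbf{s}(t-\tau)\, e^{j2\pi\nu t}\, d\tau\, d\nu. \tag{1.36}$$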



It is seen that the eigenvector matrices $\mathbf{U}_{ij}(\tau,\nu)$ describe the spatial characteristics of deterministic atomic MIMO channels associated with delay $\tau$ and Doppler frequency $\nu$. The expansion (1.36) is “doubly orthogonal” since for any $(\tau,\nu)$, the matrices $\mathbf{U}_{ij}(\tau,\nu)$ are (deterministically) orthonormal and the coefficients $\alpha_{ij}(\tau,\nu)$ are stochastically orthogonal. Thus, (1.36) represents any MIMO-WSSUS channel as a superposition of deterministic atomic MIMO channels weighted by uncorrelated scalar random coefficients. In this representation, the channel transfer effects (space-time-frequency dispersion/selectivity) are separated from the channel stochastics.

Example 1.10

For spatially i.i.d. MIMO-WSSUS channels, the SFM is given by $\mathbf{C}_H(\tau,\nu) = C(\tau,\nu)\,\mathbf{I}$, i.e., all component channels $H_{ij}$ are independent WSSUS channels with identical scattering function $C(\tau,\nu)$. In this case, $\lambda_{ij}(\tau,\nu) = C(\tau,\nu)$ and $\mathbf{U}_{ij}(\tau,\nu) = \mathbf{e}_i \mathbf{e}_j^H$, with $\mathbf{e}_i$ denoting the $i$th unit vector. Here, the action of the atomic channels is $\mathbf{U}_{ij}(\tau,\nu)\,\mathbf{s}(t) = s_j(t)\,\mathbf{e}_i$, which means that for the atomic channels, the $i$th receive antenna observes the signal emitted by the $j$th transmit antenna. Since due to the i.i.d. assumption the individual spatial links are independent, there is $\alpha_{ij}(\tau,\nu) = \mathbf{u}_{ij}^H(\tau,\nu)\,\mathbf{s}_H(\tau,\nu) = S_{H_{ij}}(\tau,\nu)$, and (1.36) simplifies to

$$\mathbf{r}(t) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \sum_{i=1}^{M_R} \sum_{j=1}^{M_T} S_{H_{ij}}(\tau,\nu)\, s_j(t-\tau)\, e^{j2\pi\nu t}\, \mathbf{e}_i\, d\tau\, d\nu.$$

The i.i.d. MIMO-WSSUS model is an extremely simple model in that it is already completely decorrelated in all domains. Slightly more complex – but more realistic – MIMO-WSSUS models are discussed in the next example.

Example 1.11

An extension of the flat-fading MIMO model of Weichselberger, Herdin, Ozcelik, & Bonek (2006) assumes that the spatial eigenmodes (but not necessarily the associated powers $\lambda_{ij}(\tau,\nu)$) are separable, i.e., $\mathbf{U}_{ij}(\tau,\nu) = \mathbf{v}_i(\tau,\nu)\, \mathbf{w}_j^H(\tau,\nu)$. Here, the spatial modes $\mathbf{v}_i(\tau,\nu)$ and $\mathbf{w}_j(\tau,\nu)$ can be interpreted as receive and transmit beamforming vectors, respectively. The resulting simplified version of (1.35) can be written as

$$\mathbf{S}_H(\tau,\nu) = \mathbf{V}(\tau,\nu)\, \boldsymbol{\Sigma}(\tau,\nu)\, \mathbf{W}^H(\tau,\nu), \tag{1.37}$$

with the deterministic matrices $\mathbf{V}(\tau,\nu) = (\mathbf{v}_1(\tau,\nu) \cdots \mathbf{v}_{M_R}(\tau,\nu))$ (dimension $M_R \times M_R$) and $\mathbf{W}(\tau,\nu) = (\mathbf{w}_1(\tau,\nu) \cdots \mathbf{w}_{M_T}(\tau,\nu))$ (dimension $M_T \times M_T$) and the random matrix $\boldsymbol{\Sigma}(\tau,\nu)$ (dimension $M_R \times M_T$) given by $[\boldsymbol{\Sigma}(\tau,\nu)]_{ij} = \alpha_{ij}(\tau,\nu)$. In this context, the average powers $\lambda_{ij}(\tau,\nu)$ are referred to as coupling coefficients since they describe how strongly the spatial transmit modes $\mathbf{w}_j(\tau,\nu)$ and spatial receive modes $\mathbf{v}_i(\tau,\nu)$ are coupled on average.

A special case of the above model for uniform linear arrays is obtained by assuming that the spatial modes equal the array steering vectors, i.e., $\mathbf{V}(\tau,\nu) = \mathbf{F}_{M_R}$ and $\mathbf{W}(\tau,\nu) = \mathbf{F}_{M_T}$, where $\mathbf{F}_N$ denotes the $N \times N$ DFT matrix. This model is known in the literature as the virtual MIMO model (Sayeed, 2002).

Another simplification of (1.37) is obtained by assuming that also the SFM eigenvalues are spatially separable, i.e., $\lambda_{ij}(\tau,\nu) = \kappa_i(\tau,\nu)\, \mu_j(\tau,\nu)$. This corresponds to a WSSUS extension of the so-called Kronecker model (Kermoal, Schumacher, Pedersen, Mogensen, & Frederiksen, 2002).
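The structure of the SFM under the separability assumptions of Example 1.11 can be verified numerically at a single, fixed delay-Doppler point. The sketch below assumes column-stacking for $\mathrm{vec}\{\cdot\}$, unitary DFT modes as in the virtual model, and made-up values for $\kappa_i$ and $\mu_j$; the helper names are hypothetical. Under these assumptions, $\mathbf{u}_{ij} = \mathrm{vec}\{\mathbf{v}_i \mathbf{w}_j^H\} = \mathbf{w}_j^* \otimes \mathbf{v}_i$, so separable eigenvalues give an SFM that factors into a Kronecker product of a (conjugated) transmit-side and a receive-side correlation matrix.

```python
import numpy as np

def vec(matrix):
    """Column-stacking vectorization (assumed convention for vec{.})."""
    return matrix.reshape(-1, order="F")

def dft_matrix(n):
    """Unitary n x n DFT matrix."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

M_R, M_T = 3, 2
V, W = dft_matrix(M_R), dft_matrix(M_T)     # receive and transmit modes
kappa = np.array([1.0, 0.5, 0.1])           # assumed receive-side powers
mu = np.array([2.0, 0.3])                   # assumed transmit-side powers
lam = np.outer(kappa, mu)                   # separable coupling coefficients

# SFM at one (tau, nu): sum of lam_ij u_ij u_ij^H with u_ij = vec(v_i w_j^H).
C = np.zeros((M_R * M_T, M_R * M_T), dtype=complex)
for i in range(M_R):
    for j in range(M_T):
        u = vec(np.outer(V[:, i], W[:, j].conj()))
        C += lam[i, j] * np.outer(u, u.conj())

# Kronecker form built from per-side correlation matrices.
C_rx = V @ np.diag(kappa) @ V.conj().T
C_tx = W @ np.diag(mu) @ W.conj().T
kron_form = np.kron(C_tx.conj(), C_rx)

# Eigen-relation for one mode pair (i, j) = (0, 1).
u01 = vec(np.outer(V[:, 0], W[:, 1].conj()))
```

For non-separable coupling coefficients the eigenvectors `u` remain unchanged, but the Kronecker factorization of `C` no longer holds in general.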



An analysis of channel measurements in Matz (2006) revealed that for a channel having a dominant scatterer with delay $\tau_0$ and Doppler frequency $\nu_0$, the SFM at $(\tau,\nu) = (\tau_0,\nu_0)$, $\mathbf{C}_H(\tau_0,\nu_0)$, has a single dominant eigenvalue (i.e., effective rank one), and the same holds true for the associated atomic MIMO channel matrix $\mathbf{U}_{11}(\tau_0,\nu_0)$. That is, $\mathbf{U}_{11}(\tau_0,\nu_0) = \mathbf{v}_1(\tau_0,\nu_0)\, \mathbf{w}_1^H(\tau_0,\nu_0)$, where the spatial signatures $\mathbf{v}_1(\tau_0,\nu_0)$ and $\mathbf{w}_1(\tau_0,\nu_0)$ essentially capture the AoA and AoD, respectively, associated with that dominant scatterer. It follows that in the delay-Doppler domain, a rank-one Kronecker model sufficiently characterizes the channel, i.e., $\mathbf{S}_H(\tau_0,\nu_0) = \alpha_{11}(\tau_0,\nu_0)\, \mathbf{v}_1(\tau_0,\nu_0)\, \mathbf{w}_1^H(\tau_0,\nu_0)$. Note, however, that the Kronecker model is not necessarily applicable to the TF transfer function matrix $\mathbf{L}_H(t,f)$. This is because the spatial averaging effected by the Fourier transform (1.22) will generally build up full-rank matrices.

1.4.3 Non-WSSUS Channels

The WSSUS assumption greatly simplifies the statistical characterization of LTV channels. However, it is satisfied by practical wireless channels only approximately within certain time intervals and frequency bands. A similarly simple and intuitive framework for non-WSSUS channels is provided next, following to a large extent Matz (2005). This framework includes WSSUS channels as a special case.

A fundamental property of WSSUS channels is the fact that different scatterers (delay-Doppler components) are uncorrelated, i.e., the spreading function $S_H(\tau,\nu)$ is a white process. In practice, this property will not be satisfied because channel components that are close to each other in the delay-Doppler domain often result from the same physical scatterer and will hence be correlated. In addition, filters, antennas, and windowing operations at the transmit and/or receive side are often viewed as part of the channel; they cause some extra time and frequency dispersion that results in correlations of the spreading function of the overall channel.

Example 1.12

Consider a channel with a single specular scatterer with delay $\tau_0$, Doppler shift $\nu_0$, and random reflectivity $h$. The transmitter uses a filter with impulse response $g(\tau)$, and the receiver multiplies the receive signal by a window $\gamma(t)$. It can be shown that the spreading function of the effective channel (including transmit filter and receiver window) is given by

$$S_H(\tau,\nu) = h\, g(\tau-\tau_0)\, \Gamma(\nu-\nu_0),$$

where $\Gamma(\nu)$ is the Fourier transform of $\gamma(t)$. Clearly, the spreading function exhibits correlations in a neighborhood of $(\tau_0,\nu_0)$ that is determined by the effective duration of $g(\tau)$ and the effective bandwidth of $\gamma(t)$.

An alternative view of non-WSSUS channels builds on the TF transfer function, which is no longer TF stationary. The physical mechanisms causing $L_H(t,f)$ to be nonstationary include shadowing, delay and Doppler drift due to mobility, and changes in the propagation environment. These effects occur at a much larger scale than small-scale fading.



Example 1.13

Consider a receiver approaching the transmitter with changing speed $\upsilon(t)$ so that their distance decreases according to $d(t) = d_0 - \int_0^t \upsilon(t')\, dt'$. In this case, the transmit signal is delayed by $\tau_1(t) = d(t)/c_0$ and Doppler shifted by $\nu_1(t) = f_c\, \upsilon(t)/c_0$. The impulse response and TF transfer function are here given by $h(t,\tau) = h\, e^{j2\pi\nu_1(t)t}\, \delta(\tau-\tau_1(t))$ and $L_H(t,f) = h\, e^{j2\pi(\nu_1(t)t - \tau_1(t)f)}$, respectively. The correlation function of $L_H(t,f)$ can be shown to depend explicitly on time. Hence, the channel is temporally nonstationary.

1.4.3.1 Local Scattering Function and Channel Correlation Function

In Matz (2005), the local scattering function (LSF) was introduced as a physically meaningful second-order statistic that extends the scattering function $C_H(\tau,\nu)$ of WSSUS channels to the case of non-WSSUS channels. The LSF is defined as a 2D Fourier transform of the 4D correlation function of the TF transfer function $L_H(t,f)$ or the spreading function $S_H(\tau,\nu)$ with respect to the lag variables, i.e.,

$$\begin{aligned} C_H(t,f;\tau,\nu) &\triangleq \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E\{L_H(t, f+\Delta f)\, L_H^*(t-\Delta t, f)\}\, e^{-j2\pi(\nu\Delta t - \tau\Delta f)}\, d\Delta t\, d\Delta f \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E\{S_H(\tau, \nu+\Delta\nu)\, S_H^*(\tau-\Delta\tau, \nu)\}\, e^{j2\pi(t\Delta\nu - f\Delta\tau)}\, d\Delta\tau\, d\Delta\nu. \end{aligned}$$

For WSSUS channels, $C_H(t,f;\tau,\nu) = C_H(\tau,\nu)$ (cf. (1.26)). It was shown in Matz (2005) that the LSF describes the power of multipath components with delay $\tau$ and Doppler shift $\nu$ occurring at time $t$ and frequency $f$. This interpretation can be supported by the following channel input–output relation extending (1.27):

$$\Gamma_r(t,f) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} C_H(t, f-\nu; \tau,\nu)\, \Gamma_s(t-\tau, f-\nu)\, d\tau\, d\nu,$$

where, as before, $\Gamma_x(t,f)$ denotes the Rihaczek spectrum of a random process $x(t)$. This relation shows that the LSF $C_H(t,f;\tau,\nu)$ describes the TF energy shifts from $(t-\tau, f)$ to $(t, f+\nu)$.

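The LSF is a channel statistic that reveals the nonstationarities (in time and frequency) of a channel via its dependence on $t$ and $f$. A dual second-order channel statistic that is better suited for describing the channel’s delay-Doppler correlations (in addition to TF correlations) is provided by the channel correlation function (CCF) defined as

$$\begin{aligned} R_H(\Delta t,\Delta f;\Delta\tau,\Delta\nu) &\triangleq \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E\{L_H(t, f+\Delta f)\, L_H^*(t-\Delta t, f)\}\, e^{-j2\pi(t\Delta\nu - f\Delta\tau)}\, dt\, df \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E\{S_H(\tau, \nu+\Delta\nu)\, S_H^*(\tau-\Delta\tau, \nu)\}\, e^{j2\pi(\nu\Delta t - \tau\Delta f)}\, d\tau\, d\nu. \end{aligned}$$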



The CCF can be shown to characterize the correlation of multipath components separated by $\Delta t$ in time, by $\Delta f$ in frequency, by $\Delta\tau$ in delay, and by $\Delta\nu$ in Doppler. It generalizes the TF correlation function $R_H(\Delta t,\Delta f)$ of WSSUS channels to the non-WSSUS case: for WSSUS channels, $R_H(\Delta t,\Delta f;\Delta\tau,\Delta\nu) = R_H(\Delta t,\Delta f)\, \delta(\Delta\tau)\, \delta(\Delta\nu)$, which correctly indicates the absence of delay and Doppler correlations. The CCF is symmetric and assumes its maximum at the origin. It is related to the LSF via a 4D Fourier transform,

$$C_H(t,f;\tau,\nu) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R_H(\Delta t,\Delta f;\Delta\tau,\Delta\nu)\, e^{-j2\pi(\nu\Delta t - \tau\Delta f - t\Delta\nu + f\Delta\tau)}\, d\Delta t\, d\Delta f\, d\Delta\tau\, d\Delta\nu,$$

in which time $t$ and Doppler lag $\Delta\nu$ are Fourier dual variables, and so are frequency $f$ and delay lag $\Delta\tau$. Once again, this indicates that delay-Doppler correlations manifest themselves as TF nonstationarities (and vice versa).

1.4.3.2 Reduced-Detail Channel Descriptions

Several less detailed channel statistics for non-WSSUS channels can be obtained as marginals of the LSF or as cross sections of the CCF. Of specific interest are the average scattering function,

$$\bar{C}_H(\tau,\nu) \triangleq \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} C_H(t,f;\tau,\nu)\, dt\, df = E\{|S_H(\tau,\nu)|^2\},$$

and its Fourier dual,

$$R_H(\Delta t,\Delta f;0,0) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \bar{C}_H(\tau,\nu)\, e^{j2\pi(\nu\Delta t - \tau\Delta f)}\, d\tau\, d\nu,$$

which characterize the global delay-Doppler dispersion and TF correlations much in the same way as the scattering function $C_H(\tau,\nu)$ and TF correlation function $R_H(\Delta t,\Delta f)$ of WSSUS channels. Dual channel statistics describing particularly the nonstationarities and delay-Doppler correlations of non-WSSUS channels are given by the TF-dependent path loss

$$\rho_H^2(t,f) \triangleq \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} C_H(t,f;\tau,\nu)\, d\tau\, d\nu = E\{|L_H(t,f)|^2\}$$

and its Fourier dual

$$R_H(0,0;\Delta\tau,\Delta\nu) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \rho_H^2(t,f)\, e^{-j2\pi(t\Delta\nu - f\Delta\tau)}\, dt\, df.$$

In addition, it is possible to define TF-dependent delay and Doppler power profiles,

$$c_H^{(1)}(t,f;\tau) \triangleq \int_{-\infty}^{\infty} C_H(t,f;\tau,\nu)\, d\nu, \qquad c_H^{(2)}(t,f;\nu) \triangleq \int_{-\infty}^{\infty} C_H(t,f;\tau,\nu)\, d\tau,$$

whose usefulness straightforwardly generalizes from the WSSUS case (see (1.29)).



FIGURE 1.6

Temporal snapshots of the LSF estimate $C_H(t,f_c;\tau,\nu)$ for a car-to-car channel (axes: delay $\tau$ and Doppler $\nu$). Panels: top row, t = 5.31 s, t = 6.42 s, t = 6.97 s; middle row, t = 7.44 s, t = 7.60 s, t = 7.75 s; bottom row, t = 8.07 s, t = 8.62 s, t = 9.72 s.

Example 1.14

We consider measurement data of a mobile radio channel for car-to-car communications. The channel measurements were recorded during 10 s at $f_c = 5.2$ GHz with the transmitter and receiver located in two cars that moved in opposite directions on a highway. (We note that channel sounding is addressed in Section 1.7.) Details of the measurement campaign are described in Paier et al. (2009), and the measurement data are available at http://measurements.ftw.at. Figure 1.6 shows nine snapshots of the estimated LSF $C_H(t,f;\tau,\nu)$ at different time instants $t$ and with frequency $f$ fixed to $f_c$ (see Section 1.7.4 for an estimator of the LSF). The figure depicts three successive phases: during phase I (top row), the cars approach each other; during phase II (middle row), they drive by each other; and during phase III (bottom row), they move away from each other. At each time instant, the LSF is seen to consist of only a small number of dominant components. These components correspond to (1) the direct path between the two cars, (2) a path involving a reflection by a building located sideways of the highway, and (3) further paths corresponding to reflections by other vehicles on the highway. The direct path has a large delay and large positive Doppler frequency during phase I, a small delay and near-zero Doppler frequency during phase II, and a large delay and large negative Doppler frequency during phase III. Similar observations apply to the other multipath components.

FIGURE 1.7

Estimate of the average LSF $\bar{C}_H(\tau,\nu)$ for the car-to-car channel (axes: delay $\tau$ and Doppler $\nu$).

Figure 1.7 shows an estimate of the average LSF $\bar{C}_H(\tau,\nu)$. While this representation indicates the maximum delay and Doppler frequency, it suggests a continuum of scatterers, and thus fails to indicate that at each time instant there are only a few dominant multipath components.

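1.4.3.3 Global Channel Parameters

As in the WSSUS case, it is desirable to be able to characterize non-WSSUS channels in terms of a few global scalar parameters. In particular, the transmission loss is given by

$$\begin{aligned} \mathcal{E}_H^2 &\triangleq \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} C_H(t,f;\tau,\nu)\, dt\, df\, d\tau\, d\nu \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \bar{C}_H(\tau,\nu)\, d\tau\, d\nu \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \rho_H^2(t,f)\, dt\, df \\ &= E\{\|H\|^2\}, \end{aligned}$$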



where $\|H\|^2 \triangleq \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |k_H(t,t')|^2\,dt\,dt'$. The transmission loss quantifies the mean received energy for a normalized stationary and white transmit signal. Furthermore, non-WSSUS versions of the mean delay $\bar{\tau}$, mean Doppler $\bar{\nu}$, delay spread $\sigma_\tau$, and Doppler spread $\sigma_\nu$ can be defined by replacing the scattering function $C_H(\tau,\nu)$ and path loss $\rho_H^2$ in the respective WSSUS-case definitions (1.30), (1.31) with the average scattering function $\bar{C}_H(\tau,\nu)$ and transmission loss $\mathcal{E}_H^2$, respectively. Time-dependent or frequency-dependent versions of these parameters can also be defined. As an example, we mention the time-dependent delay spread given by

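$$
\sigma_\tau(t) \triangleq \frac{1}{\rho_H(t)} \sqrt{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [\tau-\bar{\tau}(t)]^2\, C_H(t,f;\tau,\nu)\,df\,d\tau\,d\nu}
$$

with

$$
\rho_H^2(t) \triangleq \int_{-\infty}^{\infty} \rho_H^2(t,f)\,df, \qquad
\bar{\tau}(t) \triangleq \frac{1}{\rho_H^2(t)} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \tau\, C_H(t,f;\tau,\nu)\,df\,d\tau\,d\nu.
$$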
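As a rough numerical illustration of the time-dependent delay spread, the following minimal Python sketch replaces the three integrals by Riemann sums on a uniform grid. The function name `time_dependent_delay_stats`, the nested-list layout of the LSF samples, and the two-path test values are assumptions made for this illustration only; they are not taken from the measurement data discussed above.

```python
import math

def time_dependent_delay_stats(lsf, taus, d_f, d_tau, d_nu):
    """Riemann-sum sketch of rho_H^2(t), mean delay, and delay spread at one time t.

    lsf[i_f][i_tau][i_nu] holds samples of C_H(t, f; tau, nu) for a fixed t
    (hypothetical data layout); taus[i_tau] is the delay of grid point i_tau.
    """
    cell = d_f * d_tau * d_nu
    rho2 = 0.0    # integral of the LSF over f, tau, nu
    first = 0.0   # integral of tau * LSF
    for plane in lsf:                          # loop over frequency
        for i_tau, row in enumerate(plane):    # loop over delay
            mass = sum(row) * cell             # sum over Doppler
            rho2 += mass
            first += taus[i_tau] * mass
    mean_delay = first / rho2

    second = 0.0  # integral of (tau - mean delay)^2 * LSF
    for plane in lsf:
        for i_tau, row in enumerate(plane):
            second += (taus[i_tau] - mean_delay) ** 2 * sum(row) * cell
    delay_spread = math.sqrt(second) / math.sqrt(rho2)
    return rho2, mean_delay, delay_spread

# Toy LSF at one time instant: two equally strong components at delays 1 and 3
# (arbitrary units), one frequency bin, one Doppler bin.
taus = [1.0, 2.0, 3.0]
lsf = [[[0.5], [0.0], [0.5]]]
rho2, mean_delay, delay_spread = time_dependent_delay_stats(lsf, taus, 1.0, 1.0, 1.0)
```

For this toy input the mean delay lies midway between the two components and the delay spread equals half their separation, as expected for two equally weighted paths.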

As in the case of WSSUS channels, a coherence time $T_c$ and a coherence bandwidth $F_c$ can be defined as the reciprocal of Doppler spread $\sigma_\nu$ and delay spread $\sigma_\tau$, respectively (Matz, 2005). These coherence parameters can be combined into a local $\varepsilon$-coherence region $\mathcal{B}_c^{\varepsilon}(t,f) \triangleq [t, t+\varepsilon T_c] \times [f, f+\varepsilon F_c]$. It can then be shown that the TF transfer function is approximately constant within $\mathcal{B}_c^{\varepsilon}(t,f)$ in the sense that the normalized RMS error of the approximation $L_H(t',f') \approx L_H(t,f)$ is maximally of order $\varepsilon$ for all $(t',f') \in \mathcal{B}_c^{\varepsilon}(t,f)$.

While much of the above discussion involved concepts familiar from WSSUS channels, delay-Doppler correlations and channel nonstationarity are phenomena specific to non-WSSUS channels. The amount of delay correlation and Doppler correlation can be measured by the following moments of the CCF:

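$$
\overline{\Delta\tau} \triangleq \frac{1}{\|R_H\|_1} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |\Delta\tau|\,|R_H(\Delta t,\Delta f;\Delta\tau,\Delta\nu)|\,d\Delta t\,d\Delta f\,d\Delta\tau\,d\Delta\nu, \tag{1.38a}
$$

$$
\overline{\Delta\nu} \triangleq \frac{1}{\|R_H\|_1} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |\Delta\nu|\,|R_H(\Delta t,\Delta f;\Delta\tau,\Delta\nu)|\,d\Delta t\,d\Delta f\,d\Delta\tau\,d\Delta\nu. \tag{1.38b}
$$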

These parameters quantify the delay lag and Doppler lag spans within which there are significant correlations. Delay-Doppler correlations correspond to channel nonstationarity in the (dual) TF domain. The amount of (non-)stationarity can be measured in terms of a stationarity time and a stationarity bandwidth that are respectively defined as

$$
T_s \triangleq \frac{1}{\overline{\Delta\nu}}, \qquad F_s \triangleq \frac{1}{\overline{\Delta\tau}}. \tag{1.39}
$$

These two stationarity parameters can be combined into a local $\varepsilon$-stationarity region $\mathcal{B}_s^{\varepsilon}(t,f) \triangleq [t, t+\varepsilon T_s] \times [f, f+\varepsilon F_s]$. It can be shown that the LSF is approximately constant within $\mathcal{B}_s^{\varepsilon}(t,f)$ in the sense that the normalized error magnitude of the approximation $C_H(t',f';\tau,\nu) \approx C_H(t,f;\tau,\nu)$ is maximally of order $\varepsilon$ for all $(t',f') \in \mathcal{B}_s^{\varepsilon}(t,f)$. The stationarity region essentially quantifies the duration and bandwidth within which the channel can be approximated with good accuracy by a WSSUS channel. The relevance of the stationarity region to wireless system design was discussed in Matz (2005). For example, the ratio of the size of the stationarity region and the size of the coherence region, $T_sF_s/(T_cF_c)$, is crucial for the operational meaning of ergodic capacity. An illustration of the stationarity region and the coherence region is provided in Fig. 1.8.

FIGURE 1.8

Illustration of coherence region $\mathcal{B}_c^{\varepsilon}(t_0,f_0)$ and stationarity region $\mathcal{B}_s^{\varepsilon}(t_1,f_1)$; the gray-shaded background corresponds to the magnitude of the channel’s TF transfer function. (Axes: time $t$ and frequency $f$; the coherence region has side lengths $\varepsilon T_c$ and $\varepsilon F_c$, the stationarity region has side lengths $\varepsilon T_s$ and $\varepsilon F_s$.)
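The moments (1.38) and the stationarity parameters (1.39) can be sketched numerically as follows. This is only an illustration under stated assumptions: the CCF magnitude is taken to be available on a uniform four-dimensional lag grid (so that the cell volume cancels in the ratios), $\|R_H\|_1$ is read as the integral of $|R_H|$, and the sample values, the delay spread, and the Doppler spread used below are hypothetical.

```python
def stationarity_parameters(samples):
    """samples: iterable of (d_t, d_f, d_tau, d_nu, magnitude) on a uniform lag grid,
    where magnitude is |R_H| at that lag point (hypothetical data layout)."""
    samples = list(samples)
    l1_norm = sum(mag for *_, mag in samples)
    mean_dtau = sum(abs(d_tau) * mag for _, _, d_tau, _, mag in samples) / l1_norm
    mean_dnu = sum(abs(d_nu) * mag for _, _, _, d_nu, mag in samples) / l1_norm
    t_s = 1.0 / mean_dnu    # stationarity time, cf. (1.39)
    f_s = 1.0 / mean_dtau   # stationarity bandwidth, cf. (1.39)
    return mean_dtau, mean_dnu, t_s, f_s

# Hypothetical CCF magnitude samples: a main peak at zero lag plus weak
# correlations at delay lags of +/- 1 microsecond and Doppler lags of +/- 10 Hz.
ccf = [
    (0.0, 0.0, 0.0, 0.0, 1.0),
    (0.0, 0.0, 1e-6, 0.0, 0.5),
    (0.0, 0.0, -1e-6, 0.0, 0.5),
    (0.0, 0.0, 0.0, 10.0, 0.25),
    (0.0, 0.0, 0.0, -10.0, 0.25),
]
mean_dtau, mean_dnu, t_s, f_s = stationarity_parameters(ccf)

# Coherence parameters as reciprocals of hypothetical Doppler and delay spreads.
sigma_nu, sigma_tau = 100.0, 1e-6
t_c, f_c = 1.0 / sigma_nu, 1.0 / sigma_tau
region_ratio = (t_s * f_s) / (t_c * f_c)   # size of stationarity vs. coherence region
```

With these made-up numbers the stationarity region is about two orders of magnitude larger than the coherence region, which is the regime in which a locally WSSUS description is useful.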

1.5 UNDERSPREAD CHANNELS

So far, some of the system functions (e.g., TF transfer function and LSF) have been defined only formally without addressing their theoretical justification or practical applications.

Example 1.15

If a pure carrier signal $s(t) = e^{j2\pi f_c t}$ is transmitted over a linear time-invariant (purely time-dispersive/frequency-selective) channel with impulse response $h(\tau)$, the receive signal is given by the transmit signal multiplied by a complex factor, i.e.,

$$
r(t) = H(f_c)\,e^{j2\pi f_c t}, \tag{1.40}
$$

where $H(f) = \int_{-\infty}^{\infty} h(\tau)\,e^{-j2\pi f\tau}\,d\tau$ is the conventional channel transfer function. In mathematical language, complex exponentials are eigenfunctions of time-invariant channels.

For an LTV channel with spreading function $S_H(\tau,\nu)$, we have

$$
r(t) = L_H(t,f_c)\,e^{j2\pi f_c t}, \tag{1.41}
$$

where $L_H(t,f) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} S_H(\tau,\nu)\,e^{j2\pi(t\nu - f\tau)}\,d\tau\,d\nu$ is the channel’s TF transfer function. In spite of the formal similarity to (1.40), the receive signal in (1.41) is not a complex exponential in general; the time-dependence of the complex factor $L_H(t,f_c)$ may result in strong amplitude and frequency modulation.
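The relation (1.41) can be checked numerically for a simple discrete-path channel. The sketch below assumes a spreading function consisting of two point scatterers, so that the channel output is a sum of delayed and Doppler-shifted copies of the input; the path gains, delays, Doppler shifts, and the carrier frequency are hypothetical values chosen for illustration.

```python
import cmath

# Hypothetical two-path LTV channel: (complex gain, delay in s, Doppler shift in Hz).
# This corresponds to S_H(tau, nu) = sum_p a_p * delta(tau - tau_p) * delta(nu - nu_p).
PATHS = [(1.0, 0.0, 50.0), (0.8, 1e-6, -50.0)]
FC = 1e6  # carrier frequency in Hz (illustrative value)

def carrier(t):
    """Pure carrier s(t) = exp(j 2 pi fc t)."""
    return cmath.exp(2j * cmath.pi * FC * t)

def channel_output(s, t):
    """r(t) = sum_p a_p * s(t - tau_p) * exp(j 2 pi nu_p t)."""
    return sum(a * s(t - tau) * cmath.exp(2j * cmath.pi * nu * t)
               for a, tau, nu in PATHS)

def tf_transfer(t, f):
    """L_H(t, f) = sum_p a_p * exp(j 2 pi (t nu_p - f tau_p))."""
    return sum(a * cmath.exp(2j * cmath.pi * (t * nu - f * tau))
               for a, tau, nu in PATHS)

times = [k * 1e-3 for k in range(11)]  # 0 to 10 ms in 1 ms steps

# Largest deviation between the channel output and the right-hand side of (1.41).
mismatch = max(abs(channel_output(carrier, t) - tf_transfer(t, FC) * carrier(t))
               for t in times)

# Time-varying envelope |L_H(t, fc)| seen by the carrier.
envelope = [abs(tf_transfer(t, FC)) for t in times]
```

The mismatch is at the level of rounding errors, whereas the envelope swings between roughly the sum and the difference of the two path gains, which illustrates the amplitude modulation mentioned in the example.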

HLAWATSCH Ch02-9780123744838 2011/2/22 17:32 Page 65 #1

CHAPTER 2

Information Theory of Underspread WSSUS Channels

Giuseppe Durisi¹, Veniamin I. Morgenshtern², Helmut Bolcskei², Ulrich G. Schuster³, Shlomo Shamai (Shitz)⁴

¹Chalmers University of Technology, Gothenburg, Sweden
²ETH Zurich, Switzerland
³Robert Bosch GmbH, Stuttgart, Germany
⁴Technion – Israel Institute of Technology, Haifa, Israel

2.1 THE ROLE OF A SYSTEM MODEL

2.1.1 A Realistic Model

In this chapter, we are interested in the ultimate limit on the rate of reliable communication through Rayleigh-fading channels that satisfy the wide-sense stationary (WSS) and uncorrelated scattering (US) assumptions and are underspread (Bello, 1963; Kennedy, 1969). Therefore, the natural setting is an information-theoretic one, and the performance metric is channel capacity (Cover & Thomas, 1991; Gallager, 1968).

The family of Rayleigh-fading underspread WSSUS channels (reviewed in Chapter 1) constitutes a good model for real-world wireless channels: their stochastic properties, like amplitude and phase distributions, match channel measurement results (Schuster, 2009; Schuster & Bolcskei, 2007). The Rayleigh-fading and the WSSUS assumptions imply that the stochastic properties of the channel are fully described by a two-dimensional power spectral density (PSD) function, often referred to as scattering function (Bello, 1963). The underspread assumption implies that the scattering function is highly concentrated in the delay-Doppler plane.

To analyze wireless channels with information-theoretic tools, a system model, not just a channel model, needs to be specified. A system model is more comprehensive than a channel model because it defines, among other parameters, the transmit-power constraints and the channel knowledge available at the transmitter and the receiver. The choice of a realistic system model is crucial for the insights and guidelines provided by information theory to be useful for the design of practical systems. Two important aspects need to be accounted for by a model that aims at being realistic:

1. Neither the transmitter nor the receiver knows the realization of the channel: In most wireless systems, channel state information (CSI) is acquired by allocating part of the available resources to channel estimation. For example, pilot symbols can be embedded into the data stream, as explained in Chapters 4 and 5, to aid the receiver in the channel-estimation process. From an information-theoretic perspective, pilot-based channel estimation is just a special case of coding. Hence, the rate achievable with training-based schemes cannot exceed the capacity in the absence of CSI at transmitter and receiver.

We refer to the setting where no CSI is available at transmitter and receiver, but both know the statistics of the channel, as the noncoherent setting (Sethuraman, Hajek, & Narayanan, 2005; Sethuraman, Wang, Hajek, & Lapidoth, 2009; Zheng & Tse, 2002), in contrast to the coherent setting, where a genie provides the receiver with perfect CSI (Biglieri, Proakis, & Shamai (Shitz), 1998, Section III.B). Furthermore, we denote capacity in the noncoherent setting as noncoherent capacity.

2. The peak power of the transmit signal is limited: Every power amplifier has finite gain and every mobile transmitter has limited battery resources. In addition, regulatory bodies often constrain the admissible radiated power. Hence, in a realistic system model, a peak constraint should be imposed on the transmit signal.

Motivated by these two aspects, we provide an information-theoretic analysis of Rayleigh-fading underspread WSSUS channels in the noncoherent setting, under the additional assumption that the transmit signal is peak-constrained.

2.1.2 A Brief Literature Survey

The noncoherent capacity of fading channels is notoriously difficult to characterize analytically, even for simple channel models (Abou-Faycal, Trott, & Shamai (Shitz), 2001; Marzetta & Hochwald, 1999). Most of the results available in the literature pertain to either the large-bandwidth or the high-SNR regime. In the following, large-bandwidth regime refers to the case where the average power $P$ is fixed and the bandwidth $B$ is large, so that the SNR, which is proportional to $P/B$, is small. High-SNR regime refers to the case of fixed $B$ and large $P$ and, hence, large SNR. In this section, we briefly review the relevant literature; two elements will emerge from this review:

1. The modeling aspects we identified in the previous section (WSSUS, noncoherent setting, peak constraint) are fundamental in that the capacity is highly sensitive to these aspects.

2. In spite of the large number of results available in the literature, several questions of practical engineering relevance about the design of wireless systems operating over fading channels are still open.

Large-bandwidth (low-SNR) regime

The noncoherent capacity of fading channels has been a subject of investigation in information theory for several decades. The first contributions, which date back to the sixties (Gallager, 1968; Kennedy, 1969; Pierce, 1966; Viterbi, 1967) (see Biglieri et al., 1998 for a more complete list of references), mainly deal with the characterization of the asymptotic behavior of noncoherent capacity in the large-bandwidth limit.

The results in Gallager (1968), Kennedy (1969), Pierce (1966) and Viterbi (1967) illustrate the sensitivity of noncoherent capacity to the presence of a peak constraint. Specifically, the outcome of this analysis is the following rather surprising result: in the large-bandwidth limit, the noncoherent capacity of a fading channel coincides with that of an additive white Gaussian noise (AWGN) channel with the same receive power. However, the signaling schemes that achieve the noncoherent capacity of a fading channel in the wideband limit have unbounded peak power, a result recently formalized by Verdu (2002). Hence, these signaling schemes are not practical.

If a peak constraint is imposed on the transmit signal, AWGN channel capacity cannot be achieved in the infinite-bandwidth limit, and the actual behavior of noncoherent capacity in the wideband regime depends on the specific form of the peak constraint (Durisi, Schuster, Bolcskei, & Shamai (Shitz), 2010; Medard & Gallager, 2002; Subramanian & Hajek, 2002; Telatar & Tse, 2000; Viterbi, 1967). In particular, when the transmit signal is subject to a peak constraint both in time and frequency (as in most practical systems), noncoherent capacity vanishes as the bandwidth grows large (Durisi et al., 2010; Medard & Gallager, 2002; Subramanian & Hajek, 2002; Telatar & Tse, 2000), provided that the number of independent diversity branches of the channel scales linearly with bandwidth,¹ which is the case for WSSUS channels. Intuitively, under a peak constraint on the transmit signal, the receiver is no longer able to resolve the channel uncertainty as the bandwidth, and, hence, the number of independent diversity branches, increases. This result implies that, for a large class of fading channels, the corresponding noncoherent capacity has a global maximum at a certain finite bandwidth, commonly referred to as the critical bandwidth. Computing this critical bandwidth is obviously of great practical interest. Moreover, it is important to understand the role played by the spatial degrees of freedom provided by multiple antennas at the transmitter and/or the receiver: can they be used to increase capacity, or do they merely lead to a rate loss because of the increase in channel uncertainty?

High-SNR regime

Characterizing noncoherent capacity in the high-SNR regime is of great practical interest for systems that operate over narrow frequency bands. As no closed-form expressions for the noncoherent capacity are known, not even for memoryless channels, one typically resorts to analyzing the asymptotic behavior of capacity as SNR goes to infinity. Differently from the large-bandwidth regime, where capacity results are robust with respect to the underlying channel model, in the high-SNR regime the capacity behavior is highly sensitive to the fine details of the channel model (Lapidoth, 2005; Lapidoth & Moser, 2003; Liang & Veeravalli, 2004; Marzetta & Hochwald, 1999; Telatar, 1999; Zheng & Tse, 2002). The following results support this claim.

In the coherent setting, capacity grows logarithmically with SNR (Telatar, 1999); logarithmic growth also holds in the noncoherent setting for block-fading channels (Marzetta & Hochwald, 1999; Zheng & Tse, 2002).² An alternative, more general approach to modeling the time variation of a wireless channel is to assume that the fading process is stationary.³ Surprisingly, if the fading process is stationary, the noncoherent capacity does not necessarily grow logarithmically with SNR: other scaling behaviors are possible (Lapidoth, 2005). For example, consider two stationary discrete-time Rayleigh-fading channels subject to additive Gaussian noise. The fading process of the first channel has PSD equal to $1/\Delta$ on the interval $[-\Delta/2, \Delta/2]$, where $0 < \Delta < 1$, and 0 else. The fading process of the second channel has PSD equal to $(1-\varepsilon)/\Delta$ on the interval $[-\Delta/2, \Delta/2]$ and $\varepsilon/(1-\Delta)$ else ($0 < \varepsilon < 1$). These two channels have completely different high-SNR capacity behavior, no matter how small $\varepsilon$ is: the noncoherent capacity of the first channel grows logarithmically in SNR, with pre-log factor equal to $1-\Delta$ (Lapidoth, 2005); the noncoherent capacity of the second channel grows double-logarithmically in SNR (Lapidoth & Moser, 2003).

¹ The independent diversity branches of a fading channel are sometimes referred to as stochastic degrees of freedom (Schuster and Bolcskei, 2007). We will not use this convention here. Instead, we will use the term degrees of freedom to refer to signal-space dimensions. Information-theoretic analyses of wireless channels for which the number of independent diversity branches scales sublinearly with bandwidth can be found, for example, in Porrat, Tse, and Nacu (2007); Raghavan, Hariharan, and Sayeed (2007).

² The block-fading model is the simplest model that captures the time variation of a wireless channel. In this model, the channel is taken to be constant over a given time interval, called block, and assumed to change independently from one such block to the next. The independence assumption across blocks can be justified, e.g., for systems employing frequency hopping or if the data symbols are interleaved.
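The two PSDs of the preceding example can be compared numerically. The sketch below evaluates both on a midpoint grid over the normalized frequency interval $[-1/2, 1/2)$ and confirms that they carry the same total power while the measure of their supports differs. The values chosen for $\Delta$ and $\varepsilon$, and all function names, are assumptions made for illustration.

```python
def psd_bandlimited(theta, delta):
    """First channel: PSD 1/delta on [-delta/2, delta/2], zero elsewhere."""
    return 1.0 / delta if abs(theta) <= delta / 2 else 0.0

def psd_leaky(theta, delta, eps):
    """Second channel: a fraction eps of the power is spread outside the band."""
    return (1.0 - eps) / delta if abs(theta) <= delta / 2 else eps / (1.0 - delta)

def midpoint_grid(n):
    """Midpoints of n equal cells covering [-1/2, 1/2)."""
    return [-0.5 + (k + 0.5) / n for k in range(n)]

def total_power(psd, n=4000):
    """Midpoint-rule approximation of the integral of the PSD over one period."""
    return sum(psd(theta) for theta in midpoint_grid(n)) / n

def support_measure(psd, n=4000):
    """Fraction of the frequency interval on which the PSD is nonzero."""
    return sum(1 for theta in midpoint_grid(n) if psd(theta) > 0.0) / n

DELTA, EPS = 0.25, 1e-3  # hypothetical values

def first(theta):
    return psd_bandlimited(theta, DELTA)

def second(theta):
    return psd_leaky(theta, DELTA, EPS)

power_first, power_second = total_power(first), total_power(second)
support_first, support_second = support_measure(first), support_measure(second)
prelog_first = 1.0 - support_first  # pre-log 1 - Delta quoted in the text for the first channel
```

Both PSDs integrate to one, but an arbitrarily small $\varepsilon$ changes the support measure from $\Delta$ to 1, which is the parameter the high-SNR capacity behavior hinges on.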

Such a result is unsatisfactory from an engineering point of view because the support of a PSD cannot be determined through measurements (measurement noise is one of the reasons, another one is the finite time duration of any physically meaningful measurement process). In other words, capacity turns out to be highly sensitive to a parameter (the measure of the support of the PSD) that has, in the words of Slepian (1976), “... no direct meaningful counterparts in the real world ...”. Such a dependency of the capacity behavior on fine details of the channel model suggests that the stationary model is not robust in the high-SNR regime. An engineering-relevant problem is then to establish the SNR value at which this lack of robustness starts to manifest itself.

2.1.3 Capacity Bounds Answering Engineering-Relevant Questions

The purpose of this chapter is to present tight upper and lower bounds on the noncoherent capacity of Rayleigh-fading underspread WSSUS channels. On the basis of these bounds, answers to the following engineering-relevant questions can be given:

1. How does the noncoherent capacity of this class of channels differ from the corresponding coherent capacity and from the capacity of an AWGN channel with the same receive power?

2. How much bandwidth and how many antennas should be used to maximize capacity?

3. How robust is the Rayleigh-fading WSSUS underspread channel model? More specifically, at which SNR values does the noncoherent capacity start being sensitive to the fine details of the channel model?

The capacity bounds presented in this chapter make use of information-theoretic tools recently developed in Guo, Shamai (Shitz), & Verdu (2005); Lapidoth (2005); Lapidoth & Moser (2003); Sethuraman et al. (2009). One of the difficulties we shall encounter is to adapt these tools to the continuous-time setting considered in this chapter. Harmonic analysis plays a fundamental role in this respect: it provides an effective method for converting a general continuous-time channel into a discretized channel that can be analyzed using standard information-theoretic tools. The discretization is accomplished by transmitting and receiving on a highly structured set of signals, similar to what is done in pulse-shaped (PS) orthogonal frequency-division multiplexing (OFDM) systems (Kozek & Molisch, 1998). This ensures that the resulting discretized channel inherits the statistical properties of the underlying continuous-time channel (in particular, stationarity), a fact that is crucial for the ensuing analysis. As a byproduct, our results yield a novel information-theoretic criterion for the design of PS-OFDM systems (see Matz et al. (2007) for a recent overview on this topic).

³ Stationarity in time, together with the assumption that scatterers corresponding to paths of different delays are uncorrelated, is the fundamental feature of the WSSUS model we focus on in this chapter.

2.2 A DISCRETIZED SYSTEM MODEL

2.2.1 The Channel Model

2.2.1.1 Continuous-Time Input–Output Relation

The input–output (I/O) relation of a general continuous-time stochastic linear time-varying (LTV) channel $H$ can be written as

$$
r(t) = \int_{-\infty}^{\infty} h(t,\tau)\,s(t-\tau)\,d\tau + w(t). \tag{2.1}
$$

Here, $s(t)$ is the (stochastic) input signal, whose realizations can be taken as elements of the Hilbert space $L^2(\mathbb{R})$ of square-integrable functions over the real line $\mathbb{R}$; $r(t)$ is the output signal; and $w(t)$ is a zero-mean unit-variance proper AWGN random process. Finally, the time-varying channel impulse response $h(t,\tau)$ is a zero-mean jointly proper Gaussian (JPG) random process that satisfies the WSSUS assumption (Bello, 1963), i.e., it is stationary in time $t$ and uncorrelated in delay $\tau$:

$$
\mathrm{E}\{h(t,\tau)\,h^*(t',\tau')\} \triangleq r_h(t-t',\tau)\,\delta(\tau-\tau').
$$

As a consequence of the JPG and WSSUS assumptions, the time-delay correlation function $r_h(t,\tau)$ fully characterizes the channel statistics. The JPG assumption is empirically supported for narrowband channels (Vaughan & Bach Andersen, 2003); even ultrawideband (UWB) channels with bandwidth up to several gigahertz can be modeled as Gaussian-distributed (Schuster & Bolcskei, 2007). The WSSUS assumption is widely used in wireless channel modeling (Bello, 1963; Biglieri et al., 1998; Kennedy, 1969; Matz & Hlawatsch, 2003; Proakis, 2001; Vaughan & Bach Andersen, 2003). It is in good agreement with measurements of tropospheric scattering channels (Kennedy, 1969), and it provides a reasonable model for many types of mobile radio channels (Cox, 1973a,b; Jakes, 1974), at least when observed over a limited time duration and limited bandwidth (Bello, 1963). A more detailed review of the WSSUS channel model can be found in Chapter 1.

Often, it will be convenient to describe $H$ in domains other than the time-delay domain. The time-varying transfer function $L_H(t,f) = \mathcal{F}_{\tau\to f}\{h(t,\tau)\}$ and the delay-Doppler spreading function $S_H(\tau,\nu) = \mathcal{F}_{t\to\nu}\{h(t,\tau)\}$ can be used for this purpose. As a consequence of the WSSUS assumption, $L_H(t,f)$ is WSS both in time $t$ and in frequency $f$, and $S_H(\tau,\nu)$ is uncorrelated in delay $\tau$ and Doppler $\nu$:

$$
\mathrm{E}\{L_H(t,f)\,L_H^*(t',f')\} \triangleq R_H(t-t',f-f'). \tag{2.2}
$$

$$
\mathrm{E}\{S_H(\tau,\nu)\,S_H^*(\tau',\nu')\} \triangleq C_H(\tau,\nu)\,\delta(\tau-\tau')\,\delta(\nu-\nu'). \tag{2.3}
$$

The function $R_H(t,f)$ is usually referred to as the channel time-frequency correlation function, and $C_H(\tau,\nu)$ is called the channel scattering function. The two functions are related by the two-dimensional Fourier transform $C_H(\tau,\nu) = \mathcal{F}_{t\to\nu,\,f\to\tau}\{R_H(t,f)\}$. We assume throughout that

$$
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} C_H(\tau,\nu)\,d\tau\,d\nu = 1 \tag{2.4}
$$

for simplicity.

2.2.1.2 The Underspread Assumption

Almost all WSSUS channels of practical interest are underspread, i.e., they have a scattering function $C_H(\tau,\nu)$ that is highly concentrated around the origin of the delay-Doppler plane (Bello, 1963).

A mathematically precise definition of the underspread property is available for the case when $C_H(\tau,\nu)$ is compactly supported within a rectangle in the delay-Doppler plane. In this case, a WSSUS channel is said to be underspread if the support area of $C_H(\tau,\nu)$ is much less than one (Durisi et al., 2010; Kozek, 1997). In practice, it is not possible to determine through channel measurements whether $C_H(\tau,\nu)$ is compactly supported or not. Hence, in the terminology introduced in Section 2.1.2, the measure (area) of the support of the scattering function is a fine detail of the channel model. To investigate the sensitivity of noncoherent capacity to this fine detail, and assess the robustness of the Rayleigh-fading WSSUS model, we replace the compact-support assumption by the following physically more meaningful assumption: $C_H(\tau,\nu)$ has a small fraction of its total volume supported outside a rectangle of an area that is much smaller than 1. More precisely, we have the following definition:

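Definition 2.1 Let $\tau_0, \nu_0 \in \mathbb{R}_+$, $\varepsilon \in [0,1]$, and let $\mathcal{H}(\tau_0,\nu_0,\varepsilon)$ be the set of all Rayleigh-fading WSSUS channels $H$ with scattering function $C_H(\tau,\nu)$ that satisfies

$$
\int_{-\nu_0}^{\nu_0}\int_{-\tau_0}^{\tau_0} C_H(\tau,\nu)\,d\tau\,d\nu \ge 1-\varepsilon.
$$

We say that the channels in $\mathcal{H}(\tau_0,\nu_0,\varepsilon)$ are underspread if $\Delta_H \triangleq 4\tau_0\nu_0 \ll 1$ and $\varepsilon \ll 1$.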

Remark 2.1 Definition 2.1 is inspired by Slepian’s treatment of $L^2(\mathbb{R})$ signals that are approximately limited in time and frequency (Slepian, 1976). Note that $\varepsilon = 0$ in Definition 2.1 yields the compact-support underspread definition of Durisi et al. (2010) and Kozek (1997). An alternative definition of the underspread property, also not requiring that $C_H(\tau,\nu)$ is compactly supported, was used in Matz and Hlawatsch (2003). The definition in Matz and Hlawatsch (2003) is in terms of moments of the scattering function.

Typical wireless channels are (highly) underspread, with most of the volume of $C_H(\tau,\nu)$ supported within a rectangle of area $\Delta_H \approx 10^{-3}$ for typical land-mobile channels, and as low as $10^{-7}$ for some indoor channels with restricted mobility of the terminals (Hashemi, 1993; Parsons, 2000; Rappaport, 2002). In the remainder of the chapter, we refer to $\Delta_H$ as the channel spread.
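To make Definition 2.1 concrete, the following sketch evaluates the channel spread $\Delta_H = 4\tau_0\nu_0$ and the smallest admissible $\varepsilon$ for a scattering function that is not compactly supported. A separable Gaussian shape is assumed purely for illustration (it satisfies the normalization (2.4)); the standard deviations and the rectangle dimensions are hypothetical and are not taken from the cited measurement literature.

```python
import math

def gaussian(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

SIGMA_TAU, SIGMA_NU = 1e-6, 50.0  # hypothetical delay (s) and Doppler (Hz) scales

def scattering_function(tau, nu):
    """Assumed separable Gaussian scattering function with unit volume."""
    return gaussian(tau, SIGMA_TAU) * gaussian(nu, SIGMA_NU)

def volume_in_rectangle(c, tau0, nu0, n=200):
    """Midpoint-rule integral of c over [-tau0, tau0] x [-nu0, nu0]."""
    d_tau, d_nu = 2.0 * tau0 / n, 2.0 * nu0 / n
    total = 0.0
    for i in range(n):
        tau = -tau0 + (i + 0.5) * d_tau
        for j in range(n):
            nu = -nu0 + (j + 0.5) * d_nu
            total += c(tau, nu)
    return total * d_tau * d_nu

tau0, nu0 = 3e-6, 150.0           # rectangle half-widths (three standard deviations)
spread = 4.0 * tau0 * nu0          # channel spread Delta_H
inside = volume_in_rectangle(scattering_function, tau0, nu0)
eps_min = 1.0 - inside             # smallest eps for which Definition 2.1 holds
```

For these values the spread is of the order of $10^{-3}$ and less than one percent of the volume lies outside the rectangle, so the channel would be regarded as underspread in the sense of Definition 2.1 even though its scattering function has unbounded support.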

2.2.1.3 Continuous-Time Channel Model and Capacity

The goal of this chapter is to provide a characterization of the capacity of the continuous-time channel with I/O relation (2.1), under the assumptions that:



1. Neither the transmitter nor the receiver knows the realizations of $h(t,\tau)$, but both are aware of the channel statistics. For the Rayleigh-fading WSSUS channel model, knowledge of the channel statistics amounts to knowledge of the channel scattering function.

2. The input signal $s(t)$ is subject to a bandwidth constraint, an average-power constraint, and a peak-power constraint.

A formal definition of the capacity of the continuous-time channel (2.1) can be found in Durisi, Morgenshtern, and Bolcskei (2010). This definition is along the lines of Wyner’s treatment of the capacity of a bandlimited continuous-time AWGN channel (Gallager, 1968; Wyner, 1966). The key element in this definition is the precise specification of the set of constraints (approximate time duration, bandwidth, average power, ...) that are imposed on the input signal $s(t)$. For the sake of simplicity of exposition, we refrain from presenting this definition here (the interested reader is referred to Durisi, Morgenshtern, and Bolcskei (2009); Durisi, Morgenshtern, and Bolcskei (2011)). We take, instead, a somewhat less rigorous approach, which has the advantage of simplifying the exposition drastically and better illustrates the harmonic analysis aspects of the problem: we first discretize the channel I/O relation, and then impose a set of constraints on the resulting discretized input signal. These constraints “mimic” the ones that are natural to impose on the underlying continuous-time input signal. We emphasize that all the results stated in the next sections can be made precise in an information-theoretic sense following the approach in Durisi et al. (2010), and Durisi, Morgenshtern, and Bolcskei (2011).

2.2.2 Discretization of the Continuous-Time Input–Output Relation

Different ways to discretize LTV channels have been proposed in the literature (some are reviewed in Chapter 1); not all the induced discretized I/O relations are, however, equally well suited for an information-theoretic analysis. Stationarity of the discretized system functions and statistically independent noise samples are some of the desiderata regarding the discretization step.

The most common approach to the discretization of random LTV channels is based on sampling (Bello, 1963; Medard, 2000; Medard & Gallager, 2002), often combined with a basis expansion model (BEM) (see Chapter 1 for a detailed discussion). A different approach, particularly well suited for information-theoretic analyses (Gallager, 1968; Wyner, 1966), is based on a channel operator eigendecomposition or, more generally, singular-value decomposition. We briefly review these two approaches and discuss their shortcomings when used for the problem we are dealing with in this chapter. These shortcomings will motivate us to pursue a different approach detailed in Section 2.2.2.3.

2.2.2.1 Sampling and Basis Expansion

Under the assumption that $C_H(\tau,\nu)$ is compactly supported in $\nu$ over the interval $[-\nu_0,\nu_0]$, and the input signal $s(t)$ is strictly bandlimited with bandwidth $B$ (i.e., $S(f) \triangleq \mathcal{F}\{s(t)\} = 0$ for $|f| > B$), the I/O relation (2.1) can be discretized by means of the sampling theorem (see, for example, Artes, Matz, and Hlawatsch (2004) for a detailed derivation). The resulting discretized I/O relation is given by

$$
r\!\left(\frac{n}{f_o}\right) = \frac{1}{f_i} \sum_{m=-\infty}^{\infty} h_c\!\left(\frac{n}{f_o},\frac{m}{f_i}\right) s\!\left(\frac{n}{f_o}-\frac{m}{f_i}\right) + w\!\left(\frac{n}{f_o}\right), \tag{2.5}
$$



where $f_o = 2(B+\nu_0)$, $f_i = 2B$, and

$$
h_c(t,\tau) \triangleq \int_{-\infty}^{\infty} h(t,z)\,\frac{\sin[2\pi B(\tau-z)]}{\pi(\tau-z)}\,dz. \tag{2.6}
$$

The discretized I/O relation (2.5) can be further simplified if the input signal is oversampled, i.e., if $f_i$ is chosen to be equal to $f_o$. In this case, (2.5) can be rewritten as

$$
r[n] = \sum_{m=-\infty}^{\infty} h_c[n,m]\,s[n-m] + w[n]. \tag{2.7}
$$

One evident limitation of the sampling approach just discussed is the compact-support assumption on $C_H(\tau,\nu)$ (cf. Definition 2.1). Furthermore, as a consequence of (2.6), $h_c[n,m]$ does not inherit the US property of $h(t,\tau)$, a fact that makes the information-theoretic analysis of (2.7) more involved. Finally, the apparently harmless oversampling step imposes an implicit constraint on the set of the input-signal samples; this constraint is hard to account for in an information-theoretic analysis. More specifically, if the input-signal samples are chosen in an arbitrary way, the resulting continuous-time input signal may have a bandwidth as large as $B+\nu_0$, rather than just $B$.

Often, the sampling approach is combined with a BEM. In the most common form of the BEM, the basis consists of complex exponentials and $h_c[n,m]$ is given by

$$
h_c[n,m] = \sum_{i=0}^{I-1} c_i[m]\,e^{j2\pi\nu_i n}, \tag{2.8}
$$

where $\{\nu_i\}_{i=0}^{I-1}$ is a set of Doppler frequencies. For general WSSUS underspread channels, the validity of the modeling assumption underlying (2.8) is difficult to assess, and information-theoretic results obtained on the basis of (2.8) might lack generality.
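A minimal sketch of the noise-free part of (2.7) with the complex-exponential BEM (2.8) is given below. It assumes causal channels with a finite number of taps and an input that is zero outside its support; the coefficient values, the normalized Doppler frequencies, and the function names are hypothetical. The second function uses the equivalent view of the BEM as a bank of time-invariant filters, each followed by a modulator.

```python
import cmath

def bem_tap(coeffs, dopplers, n, m):
    """h_c[n, m] = sum_i c_i[m] * exp(j 2 pi nu_i n), cf. (2.8)."""
    return sum(c[m] * cmath.exp(2j * cmath.pi * nu * n)
               for c, nu in zip(coeffs, dopplers))

def ltv_output(coeffs, dopplers, s):
    """Noise-free part of (2.7): r[n] = sum_m h_c[n, m] * s[n - m]."""
    num_taps = len(coeffs[0])
    out = []
    for n in range(len(s)):
        acc = 0j
        for m in range(num_taps):
            if 0 <= n - m < len(s):  # s is taken to be zero outside its support
                acc += bem_tap(coeffs, dopplers, n, m) * s[n - m]
        out.append(acc)
    return out

def filter_bank_output(coeffs, dopplers, s):
    """Same output, computed as sum_i exp(j 2 pi nu_i n) * (c_i convolved with s)[n]."""
    out = []
    for n in range(len(s)):
        acc = 0j
        for c, nu in zip(coeffs, dopplers):
            conv = sum(c[m] * s[n - m] for m in range(len(c)) if 0 <= n - m < len(s))
            acc += cmath.exp(2j * cmath.pi * nu * n) * conv
        out.append(acc)
    return out

# Hypothetical BEM with I = 2 basis functions and 2 delay taps.
coeffs = [[1.0, 0.5], [0.2j, -0.1]]
dopplers = [0.01, -0.02]            # normalized Doppler frequencies (cycles per sample)
s = [1.0, -1.0, 1j, 2.0]

r_direct = ltv_output(coeffs, dopplers, s)
r_bank = filter_bank_output(coeffs, dopplers, s)

# With a single basis function at zero Doppler, the model reduces to an LTI filter.
r_lti = ltv_output([[1.0, 0.5]], [0.0], [1.0, 2.0, 3.0])
```

In the special case of one basis function at zero Doppler the output is an ordinary convolution, which is a quick sanity check on the indexing.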

2.2.2.2 Discretization through Channel Eigendecomposition

As discussed in Chapter 1, the kernel $k_H(t,t') \triangleq h(t,t-t')$ associated with a general LTV channel $H$ (under the assumption that the channel operator $H$ is normal and compact (Durisi et al., 2010; Naylor & Sell, 1982)) can be decomposed as (Naylor & Sell, 1982, Theorem 6.14.1)

$$
k_H(t,t') = \sum_{k=0}^{\infty} \lambda_k\,u_k(t)\,u_k^*(t'), \tag{2.9}
$$

where $\lambda_k$ and $u_k(t)$ are the channel eigenvalues and eigenfunctions, respectively. The set $\{u_k(t)\}$ is orthonormal and complete in the input space and in the range space of $H$ (Naylor & Sell, 1982, Theorem 6.14.1). Hence, any input signal $s(t)$ and any output signal $r(t)$ can be expressed, without loss of generality, in terms of its projections onto the set $\{u_k(t)\}$ according to

$$
s(t) = \sum_k \underbrace{\langle s(t),u_k(t)\rangle}_{\triangleq\, s[k]}\,u_k(t) \tag{2.10}
$$

and

$$
r(t) = \sum_k \underbrace{\langle r(t),u_k(t)\rangle}_{\triangleq\, r[k]}\,u_k(t). \tag{2.11}
$$

The decomposition (2.9), together with (2.10) and (2.11), yields a particularly simple I/O relation, which we refer to as channel diagonalization:

$$
r[k] = \langle (Hs)(t)+w(t),\,u_k(t)\rangle
= \sum_{k'} s[k']\,\underbrace{\langle (Hu_{k'})(t),u_k(t)\rangle}_{\lambda_{k'}\delta_{k'k}} + \underbrace{\langle w(t),u_k(t)\rangle}_{\triangleq\, w[k]}
= \lambda_k\,s[k] + w[k]. \tag{2.12}
$$

Note that in (2.12), the channel acts on the discretized input through scalar multiplications only. To summarize, it follows from (2.10) and (2.11) that channel diagonalization is achieved by transmitting and receiving on the channel eigenfunctions.

The discretization method just described is used in Wyner (1966) to compute the capacity of bandlimited AWGN channels and in Gallager (1968) to compute the capacity of (deterministic) linear time-invariant (LTI) channels. This approach, however, is not applicable to our setting. Transmitting and receiving on the channel eigenfunctions $\{u_k(t)\}$ requires perfect knowledge of $\{u_k(t)\}$ at the transmitter and the receiver. But as $H$ is random, its eigenfunctions $u_k(t)$ are (in general) random as well, and, hence, not known at the transmitter and the receiver in the noncoherent setting.

In contrast, if the eigenfunctions of the random channel $H$ did not depend on the particular realization of $H$, we could diagonalize $H$ without knowledge of the channel realizations. This is the case, for example, for LTI channels, where (deterministic) complex sinusoids are always eigenfunctions, independently of the realization of the channel impulse response. This observation is crucial for the ensuing analysis. Specifically, our discretization strategy is based on the following fundamental property of underspread LTV channels (see Chapter 1 for details): the eigenfunctions of a random underspread WSSUS channel can be well approximated by deterministic functions that are well localized in time and frequency.
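The LTI special case mentioned above has a simple finite-dimensional analogue that can be checked numerically. In the sketch below, a cyclic convolution with a random impulse response stands in for a random LTI channel (an assumption made to obtain a finite matrix, which is circulant and hence normal), and the fixed DFT vectors play the role of the deterministic eigenfunctions. The dimension, the random seed, and all function names are hypothetical.

```python
import cmath
import random

N = 8  # signal-space dimension of the toy model

def dft_vector(k):
    """Unit-norm complex sinusoid u_k[n] = exp(j 2 pi k n / N) / sqrt(N)."""
    return [cmath.exp(2j * cmath.pi * k * n / N) / N ** 0.5 for n in range(N)]

def circulant_apply(h, x):
    """Cyclic convolution (H x)[n] = sum_m h[(n - m) mod N] * x[m]."""
    return [sum(h[(n - m) % N] * x[m] for m in range(N)) for n in range(N)]

def inner(x, y):
    """Inner product <x, y> = sum_n x[n] * conj(y[n])."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def eigenvalues(h):
    """lambda_k = sum_m h[m] * exp(-j 2 pi k m / N), i.e., the DFT of h."""
    return [sum(h[m] * cmath.exp(-2j * cmath.pi * k * m / N) for m in range(N))
            for k in range(N)]

basis = [dft_vector(k) for k in range(N)]
rng = random.Random(0)
worst_off_diagonal = 0.0
worst_diagonal_error = 0.0
for _ in range(5):  # five independent channel realizations
    h = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(N)]
    lam = eigenvalues(h)
    for k in range(N):
        for kp in range(N):
            coupling = inner(circulant_apply(h, basis[kp]), basis[k])  # <H u_k', u_k>
            if k == kp:
                worst_diagonal_error = max(worst_diagonal_error, abs(coupling - lam[k]))
            else:
                worst_off_diagonal = max(worst_off_diagonal, abs(coupling))
```

The same basis diagonalizes every realization: only the eigenvalues change from draw to draw, so transmitting and receiving on this basis requires no knowledge of the channel realization.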

Before illustrating the details of the discretization step, we would like to point out two difficulties that arise from the approach we are going to pursue. The discretized I/O relation induced by replacing the channel eigenfunctions in (2.10) and (2.11) with approximate eigenfunctions (or, in other words, induced by transmitting and receiving on the approximate channel eigenfunctions) will, in general, not be as simple as (2.12), because of the presence of “off-diagonal” terms. Furthermore, the set of approximate eigenfunctions may not be complete in the input and range spaces of the channel operator $H$. This results in a loss of dimensions in signal space, i.e., of degrees of freedom, which will need to be accounted for in our information-theoretic analysis.

2.2.2.3 Discretization through Transmission and Reception on a Weyl-Heisenberg Set

We accomplish the discretization of the I/O relation (2.1) by transmitting and receiving on the highly structured Weyl-Heisenberg (WH) set of time-frequency shifts of a pulse $g(t)$. We denote such a WH set as $(g(t),T,F) \triangleq \{g_{k,l}(t) = g(t-kT)\,e^{j2\pi lFt}\}_{k,l\in\mathbb{Z}}$, where $T,F>0$ are the grid parameters of the WH set (Kozek & Molisch, 1998). The triple $g(t),T,F$ is chosen so that $g(t)$ has unit energy and that the resulting set $(g(t),T,F)$ is orthonormal.⁴ Note that we do not require that the set $(g(t),T,F)$ is complete in $L^2(\mathbb{R})$. The time-frequency localization of $g(t)$ plays an important role in our analysis, because the functions in a WH set constructed from a pulse that is well localized in time and frequency are approximate eigenfunctions of underspread WSSUS channels (Kozek, 1997; Matz & Hlawatsch, 1998, 2003).

We consider input signals of the form

$$s(t) = \sum_{k=0}^{K-1}\sum_{l=0}^{L-1} s[k,l]\,g_{k,l}(t). \tag{2.13}$$

Whenever g(t) is well localized in time and frequency, we can take $D \triangleq KT$ to be the approximate time duration of s(t) and $B \triangleq LF$ to be its approximate bandwidth. Note that the lack of completeness of (g(t), T, F) implies that there exist signals $s(t) \in L^2(\mathbb{R})$ that cannot be written in the form (2.13), even when $K, L \to \infty$. In other words, we may lose degrees of freedom.

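The received signal r(t) is projected onto the signal set $\{g_{k,l}(t)\}_{k=0,\,l=0}^{K-1,\,L-1}$ to obtain

$$\underbrace{\langle r(t), g_{k,l}(t)\rangle}_{\triangleq\, r[k,l]} = \underbrace{\langle (Hg_{k,l})(t), g_{k,l}(t)\rangle}_{\triangleq\, h[k,l]}\, s[k,l] + \mathop{\sum_{k'=0}^{K-1}\sum_{l'=0}^{L-1}}_{(k',l')\neq(k,l)} \underbrace{\langle (Hg_{k',l'})(t), g_{k,l}(t)\rangle}_{\triangleq\, z[k',l',k,l]}\, s[k',l'] + \underbrace{\langle w(t), g_{k,l}(t)\rangle}_{\triangleq\, w[k,l]},$$

that is,

$$r[k,l] = h[k,l]\,s[k,l] + \mathop{\sum_{k'=0}^{K-1}\sum_{l'=0}^{L-1}}_{(k',l')\neq(k,l)} z[k',l',k,l]\,s[k',l'] + w[k,l]. \tag{2.14}$$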

We refer to the channel with I/O relation (2.14) as the discretized channel induced by the WH set (g(t), T, F). The I/O relation (2.14) satisfies the desiderata we listed at the beginning of Section 2.2.2. Specifically, the channel coefficients h[k, l] inherit the two-dimensional stationarity property of the underlying continuous-time system function $L_H(t,f)$ [see (2.2)]. Furthermore, the noise coefficients w[k, l] are i.i.d. $\mathcal{CN}(0,1)$ as a consequence of the orthonormality of (g(t), T, F). These two properties are crucial for the ensuing analysis.

A drawback of (2.14) is the presence of (self-)interference [the second term in (2.14)], which makes the derivation of capacity bounds involved, as will be seen in Section 2.4.

The signaling scheme (2.13) can be interpreted as PS-OFDM (Kozek & Molisch, 1998), where the input symbols s[k, l] are modulated onto a set of orthogonal signals, indexed by discrete time (symbol index) k and discrete frequency (subcarrier index) l. From this perspective, the interference term in (2.14) can be interpreted as intersymbol and intercarrier interference (ISI and ICI). Figure 2.1 provides a qualitative representation of the PS-OFDM signaling scheme.

⁴ A systematic approach to constructing orthonormal WH sets is described in Section 2.2.2.6.



FIGURE 2.1

[Figure: time-frequency plane with axes t (symbols, grid spacing T) and f (subcarriers, grid spacing F).]

Pulse-shaped OFDM interpretation of the signaling scheme (2.13). The shaded areas represent the approximate time-frequency support of the pulses $g_{k,l}(t)$.

2.2.2.4 Outline of the Information-Theoretic Analysis

The program pursued in the next sections is to tightly bound the noncoherent capacity of the discretized channel (2.14) under an average-power and a peak-power constraint on the input symbols s[k, l]. We refer the interested reader to Durisi et al. (2010) and Durisi, Morgenshtern, and Bolcskei (2011) for a discussion on the relation between the capacity of the discretized channel (2.14) and the capacity of the underlying continuous-time channel (2.1).

The derivation of capacity bounds is made difficult by the presence of the interference term in (2.14). Fortunately, to establish our main results in Sections 2.4 and 2.5, it will be sufficient to compare a trivial capacity upper bound, i.e., the capacity of an AWGN channel with the same receive power as in (2.14), with simple capacity lower bounds obtained by treating the interference term in (2.14) as noise. The corresponding results are of practical interest, as receiver algorithms that take the structure of interference explicitly into account are, in general, computationally expensive. The power of the interference term in (2.14) depends on the choice of the WH set. As will be shown in Section 2.2.2.7, good time-frequency localization of the pulse g(t) from which the WH set is generated is required for the interference term to be small. But good time-frequency localization entails a loss of degrees of freedom.

Before discussing the trade-off between interference minimization and maximization of the number of degrees of freedom, we review some important theoretical results on the construction of WH sets.

2.2.2.5 Orthonormality, Completeness, and Localization

Orthonormality, completeness, and time-frequency localization are desirable properties of the WH set (g(t), T, F). It is, therefore, sensible to ask whether complete orthonormal WH sets generated by a g(t) with prescribed time-frequency localization exist. The answer is as follows:

1. A necessary condition for the set (g(t), T, F) to be orthonormal is TF ≥ 1 (Grochenig, 2001, Cor. 7.5.1, Cor. 7.3.2).

2. For TF = 1, it is possible to find orthonormal sets (g(t), T, F) that are complete in $L^2(\mathbb{R})$ (Christensen, 2003, Th. 8.3.1). These sets, however, do not exhibit good time-frequency localization, as a consequence of the Balian-Low Theorem (Christensen, 2003, Th. 4.1.1), which states that if (g(t), T, F) is orthonormal and complete in $L^2(\mathbb{R})$, then

$$\left(\int_{-\infty}^{\infty} |t\,g(t)|^2\,dt\right)\left(\int_{-\infty}^{\infty} |f\,G(f)|^2\,df\right) = \infty,$$

where $G(f) \triangleq \mathcal{F}\{g(t)\}$.

3. For TF > 1, it is possible to have orthonormality and good time-frequency localization concurrently, but the resulting set (g(t), T, F) is necessarily incomplete in $L^2(\mathbb{R})$. Lack of completeness entails a loss of degrees of freedom.

4. For TF < 1, it is possible to construct WH sets generated by a well-localized g(t), which are also (over)complete in $L^2(\mathbb{R})$. However, as a consequence of overcompleteness, the resulting input signal (2.13) cannot be recovered uniquely at the receiver, even in the absence of noise.

Our choice will be to privilege localization and orthonormality over completeness. The information-theoretic results in Sections 2.4 and 2.5 will show that this choice is sound. In the next two sections, we review the mathematical framework that enables the construction of (noncomplete) orthonormal WH sets that are well localized in time and frequency. We furthermore review a classic criterion for the design of WH sets, which is based on the maximization of the signal-to-interference ratio (SIR) in (2.14). The information-theoretic analysis in Section 2.5 will yield a design criterion that is more fundamental.

2.2.2.6 Construction of WH Sets

We next discuss how to construct (noncomplete) WH sets with prescribed time-frequency localization. Frame theory plays a fundamental role in this context.

A WH set (g(t), T, F) is called a WH frame for $L^2(\mathbb{R})$ if there exist constants A and B (frame bounds) with $0 < A \le B < \infty$ such that for all $s(t) \in L^2(\mathbb{R})$, we have

$$A\,\|s(t)\|^2 \le \sum_{k}\sum_{l} \bigl|\langle s(t), g_{k,l}(t)\rangle\bigr|^2 \le B\,\|s(t)\|^2.$$

When A = B, the frame is called tight. A necessary condition for (g(t), T, F) to be a frame for $L^2(\mathbb{R})$ is TF ≤ 1 (Grochenig, 2001, Cor. 7.5.1). For specific pulses, sufficient conditions for the corresponding WH set to be a frame are known. For example, for the Gaussian pulse $g(t) = 2^{1/4}e^{-\pi t^2}$, the condition TF < 1 is necessary and sufficient for (g(t), T, F) to be a frame (Grochenig, 2001, Th. 7.5.3).



A frame (g(t), T, F) for $L^2(\mathbb{R})$ can be transformed into a tight frame $(g^{\perp}(t), T, F)$ for $L^2(\mathbb{R})$ through standard frame-theoretic methods (Christensen, 2003, Th. 5.3.4). Furthermore, the so-obtained $g^{\perp}(t)$ inherits the decay properties of g(t) (see Bolcskei & Janssen, 2000; Matz et al., 2007 for a mathematically precise formulation of this statement).

The key result that makes frame theory relevant for the construction of orthonormal WH sets is the so-called duality theorem (Daubechies, Landau, & Landau, 1995; Janssen, 1995; Ron & Shen, 1997), which states that (g(t), T, F) with TF ≥ 1 is an orthonormal WH set if and only if (g(t), 1/F, 1/T) is a tight frame for $L^2(\mathbb{R})$.

The results summarized above are used in the following example to construct an orthonormal WH set that will be used throughout this chapter.

Example 2.1 (Root-raised-cosine WH set)

For later use, we present an example of an orthonormal WH set for the case $T = F = \sqrt{c}$, with $1 \le c \le 2$. Let $G(f) = \mathcal{F}\{g(t)\}$, $\zeta = \sqrt{c}$, and $\mu = c - 1$. We choose G(f) as the (positive) square root of a raised-cosine pulse:

$$G(f) = \begin{cases} \sqrt{\zeta}, & \text{if } |f| \le \dfrac{1-\mu}{2\zeta} \\[2mm] \sqrt{\dfrac{\zeta}{2}\,[1 + S(f)]}, & \text{if } \dfrac{1-\mu}{2\zeta} \le |f| \le \dfrac{1+\mu}{2\zeta} \\[2mm] 0, & \text{otherwise,} \end{cases}$$

where $S(f) = \cos\Bigl[\dfrac{\pi\zeta}{\mu}\Bigl(|f| - \dfrac{1-\mu}{2\zeta}\Bigr)\Bigr]$. As $(1+\mu)/(2\zeta) = \zeta/2$, the function G(f) has compact support of length $\zeta = \sqrt{c}$. Furthermore, G(f) is real-valued and even, and satisfies

$$\sum_{l=-\infty}^{\infty} G(f - l/\zeta)\,G(f - l/\zeta - k\zeta) = \zeta\,\delta_{k0}.$$

By (Christensen, 2003, Th. 8.7.2), we can, therefore, conclude that the WH set $(g(t), 1/\sqrt{c}, 1/\sqrt{c})$ is a tight WH frame for $L^2(\mathbb{R})$, and, by duality, the WH set $(g(t), \sqrt{c}, \sqrt{c})$ is orthonormal. Note that, for c = 1 (i.e., TF = 1), the pulse G(f) reduces to the rectangular pulse $\mathbb{1}_{[-1/2,1/2]}(f)$ and, consequently, g(t) reduces to a sinc function, which has poor time localization, as expected from the Balian-Low Theorem.
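The tight-frame condition of Example 2.1 can be checked numerically. The following Python sketch evaluates G(f) as defined in the example and a truncated version of the sum over l; the function names `rrc_spectrum` and `frame_sum` and the truncation to |l| ≤ 5 are illustrative assumptions, not part of the original example. Because G(f) has compact support, the truncation is harmless for frequencies of moderate magnitude.

```python
import math

def rrc_spectrum(f, c):
    """Root-raised-cosine spectrum G(f) of Example 2.1, for 1 <= c <= 2."""
    zeta = math.sqrt(c)
    mu = c - 1.0
    lo = (1.0 - mu) / (2.0 * zeta)  # edge of the flat region
    hi = (1.0 + mu) / (2.0 * zeta)  # edge of the support, equal to zeta / 2
    a = abs(f)
    if a <= lo:
        return math.sqrt(zeta)
    if a <= hi:  # roll-off region (empty when c = 1, i.e., mu = 0)
        s = math.cos(math.pi * zeta / mu * (a - lo))
        return math.sqrt(max(0.0, 0.5 * zeta * (1.0 + s)))
    return 0.0

def frame_sum(f, c, k=0, max_shift=5):
    """Truncated sum over l of G(f - l/zeta) * G(f - l/zeta - k*zeta)."""
    zeta = math.sqrt(c)
    return sum(
        rrc_spectrum(f - l / zeta, c) * rrc_spectrum(f - l / zeta - k * zeta, c)
        for l in range(-max_shift, max_shift + 1)
    )

# The k = 0 sum should be close to zeta = sqrt(c); the k = 1 sum close to 0.
for c in (1.0, 1.2, 2.0):
    print(c, math.sqrt(c), frame_sum(0.3, c, k=0), frame_sum(0.3, c, k=1))
```

For c = 1 the roll-off region is empty and the check reduces to the rectangular pulse; frequencies exactly at ±1/2 are a measure-zero boundary case that the sketch does not treat specially.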

2.2.2.7 Diagonalization and Loss of Degrees of Freedom

By choosing g(t) to be well localized in time and frequency and TF sufficiently large, we can make the variance of the interference term in the I/O relation (2.14) small. The drawback is a loss of degrees of freedom, as formalized next.

Let $A_g(\tau,\nu)$ denote the ambiguity function of g(t), defined as

$$A_g(\tau,\nu) \triangleq \int_{-\infty}^{\infty} g(t)\,g^{*}(t-\tau)\,e^{-j2\pi\nu t}\,dt.$$
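As a small illustration of this definition, the sketch below approximates $A_g(\tau,\nu)$ by a Riemann sum for the unit-energy Gaussian pulse $g(t) = 2^{1/4}e^{-\pi t^2}$ mentioned in Section 2.2.2.6. For this pulse, $|A_g(\tau,\nu)|^2 = e^{-\pi(\tau^2+\nu^2)}$ in closed form, which gives a reference value. The function names, the integration window, and the number of samples are assumptions made for this example.

```python
import cmath
import math

def gaussian_pulse(t):
    """Unit-energy Gaussian pulse g(t) = 2^(1/4) exp(-pi t^2)."""
    return 2.0 ** 0.25 * math.exp(-math.pi * t * t)

def ambiguity(g, tau, nu, t_max=8.0, n=4001):
    """Riemann-sum approximation of A_g(tau, nu) on [-t_max, t_max]."""
    dt = 2.0 * t_max / (n - 1)
    acc = 0j
    for i in range(n):
        t = -t_max + i * dt
        shifted = complex(g(t - tau)).conjugate()
        acc += g(t) * shifted * cmath.exp(-2j * math.pi * nu * t)
    return acc * dt

# Compare the numerical value with the closed form for the Gaussian pulse.
value = ambiguity(gaussian_pulse, 0.5, 0.3)
print(abs(value) ** 2, math.exp(-math.pi * (0.5 ** 2 + 0.3 ** 2)))
```

The same routine accepts any pulse supplied as a Python function, provided the integration window covers its effective support.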



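The variance of both $h[k,l] = \langle (Hg_{k,l})(t), g_{k,l}(t)\rangle$ and $z[k',l',k,l] = \langle (Hg_{k',l'})(t), g_{k,l}(t)\rangle$ can be expressed in terms of $A_g(\tau,\nu)$. In fact, as a consequence of the WSSUS property of H, we have that

$$\mathbb{E}\bigl\{h[k,l]\,h^{*}[k',l']\bigr\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} C_H(\tau,\nu)\,|A_g(\tau,\nu)|^2\,e^{j2\pi[(k-k')T\nu - (l-l')F\tau]}\,d\tau\,d\nu \tag{2.15}$$

and, in particular,

$$\sigma_h^2 \triangleq \mathbb{E}\bigl\{|h[k,l]|^2\bigr\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} C_H(\tau,\nu)\,|A_g(\tau,\nu)|^2\,d\tau\,d\nu. \tag{2.16}$$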

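Because $|A_g(\tau,\nu)|^2 \le \|g(t)\|^4 = 1$ and because the scattering function $C_H(\tau,\nu)$ was assumed to be of unit volume [see (2.4)], we have that $\sigma_h^2 \le 1$. The relation (2.15) implies that $R_H[k,k',l,l'] \triangleq \mathbb{E}\{h[k,l]\,h^{*}[k',l']\} = R_H[k-k', l-l']$, i.e., that h[k, l] is WSS both in discrete time k and discrete frequency l.

The variance of the interference term in (2.14), under the assumption that the s[k, l] are i.i.d. with zero mean and unit variance,⁵ can be upper-bounded as follows:

$$\begin{aligned}
\mathbb{E}\Biggl\{\Biggl|\mathop{\sum_{k'=0}^{K-1}\sum_{l'=0}^{L-1}}_{(k',l')\neq(k,l)} z[k',l',k,l]\,s[k',l']\Biggr|^2\Biggr\}
&= \mathop{\sum_{k'=0}^{K-1}\sum_{l'=0}^{L-1}}_{(k',l')\neq(k,l)} \mathbb{E}\bigl\{|z[k',l',k,l]|^2\bigr\} \\
&\le \mathop{\sum_{k'=-\infty}^{\infty}\sum_{l'=-\infty}^{\infty}}_{(k',l')\neq(k,l)} \mathbb{E}\bigl\{|z[k',l',k,l]|^2\bigr\} \\
&= \mathop{\sum_{k'=-\infty}^{\infty}\sum_{l'=-\infty}^{\infty}}_{(k',l')\neq(k,l)} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \bigl|A_g\bigl(\tau + (k'-k)T,\ \nu + (l'-l)F\bigr)\bigr|^2\,C_H(\tau,\nu)\,d\tau\,d\nu \\
&= \mathop{\sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}}_{(k,l)\neq(0,0)} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |A_g(\tau,\nu)|^2\,C_H(\tau - kT,\ \nu - lF)\,d\tau\,d\nu \\
&\triangleq \sigma_I^2.
\end{aligned} \tag{2.17}$$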

The "infinite-horizon" interference variance $\sigma_I^2$ will turn out to be of particular importance in the information-theoretic analysis in Sections 2.4 and 2.5. When $\sigma_I^2 \approx 0$, the I/O relation (2.14) can be well approximated by the following diagonalized I/O relation:

$$r[k,l] = h[k,l]\,s[k,l] + w[k,l]. \tag{2.18}$$

⁵ In Sections 2.4 and 2.5, we will use this assumption to obtain capacity lower bounds explicit in $\sigma_I^2$.



This simplification eases the derivation of bounds on capacity. As the received signal power in (2.18) is proportional to $\sigma_h^2$, it is also desirable to choose WH sets that result in $\sigma_h^2 \approx 1$ (recall that $\sigma_h^2 \le 1$).

Next, we investigate the design criteria a WH set (g(t), T, F) needs to satisfy so that $\sigma_h^2 \approx 1$ and $\sigma_I^2 \approx 0$. We assume, for simplicity, that $C_H(\tau,\nu)$ is compactly supported within the rectangle $[-\tau_0,\tau_0]\times[-\nu_0,\nu_0]$. Referring to (2.16), (2.17), and to Fig. 2.2, we conclude the following:

1. $\sigma_h^2 \approx 1$ if the ambiguity function of g(t) satisfies $|A_g(\tau,\nu)|^2 \approx 1$ over the support of the scattering function.

2. $\sigma_I^2 \approx 0$ if the ambiguity function of g(t) takes on small values on the rectangles $[-\tau_0 + kT, \tau_0 + kT]\times[-\nu_0 + lF, \nu_0 + lF]$, for $(k,l) \in \mathbb{Z}^2 \setminus \{(0,0)\}$.
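Condition 1 can be made concrete with a closed-form special case of (2.16). The sketch below assumes the Gaussian pulse $g(t) = 2^{1/4}e^{-\pi t^2}$, for which $|A_g(\tau,\nu)|^2 = e^{-\pi(\tau^2+\nu^2)}$, and a scattering function that is constant on $[-\tau_0,\tau_0]\times[-\nu_0,\nu_0]$ with unit volume. Both choices, the normalized (dimensionless) units, and the function name are assumptions made for illustration; in particular, a Gaussian pulse does not generate an orthonormal WH set, so the sketch only illustrates how $\sigma_h^2$ approaches 1 as the spread shrinks.

```python
import math

def sigma_h_squared_gaussian(tau0, nu0):
    """sigma_h^2 from (2.16) for |A_g|^2 = exp(-pi (tau^2 + nu^2)) and a
    scattering function equal to 1/(4 tau0 nu0) on [-tau0, tau0] x [-nu0, nu0]."""
    spread = 4.0 * tau0 * nu0
    # Integral of exp(-pi x^2) over [-a, a] equals erf(sqrt(pi) * a).
    int_tau = math.erf(math.sqrt(math.pi) * tau0)
    int_nu = math.erf(math.sqrt(math.pi) * nu0)
    return int_tau * int_nu / spread

# Smaller spread (4 * tau0 * nu0) brings sigma_h^2 closer to 1.
for half_width in (0.5, 0.05, 0.005):
    print(4.0 * half_width ** 2, sigma_h_squared_gaussian(half_width, half_width))
```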

For these two conditions to be satisfied, the spread of the channel $\Delta_H = 4\tau_0\nu_0$ needs to be small, and the grid parameters need to be chosen such that the ambiguity function $A_g(\tau,\nu)$ takes on small values outside the solid gray-shaded rectangle centered at the origin in Fig. 2.2. Let $D_g$ and $B_g$ be the root-mean-square duration and the root-mean-square bandwidth, respectively, of the pulse g(t) (see Chapter 1 for a definition of these two quantities). Then, condition 2 above holds if $T > \tau_0 + D_g$ and $F > \nu_0 + B_g$. These two inequalities illustrate the importance of good time-frequency localization of the pulse g(t) (i.e., small $D_g$ and $B_g$). In fact, large $D_g$ and $B_g$ imply large T and F, and, consequently, a significant loss of degrees of freedom.

For fixed TF, a simple rule on how to choose the grid parameters T and F follows from the observation that for given $\tau_0$ and $\nu_0$, the area of the solid rectangle centered at the origin in Fig. 2.2 is maximized (Kozek, 1997; Kozek & Molisch, 1998; Matz et al., 2007) if

$$\nu_0 T = \tau_0 F. \tag{2.19}$$

This rule is commonly referred to as the grid matching rule.
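For a prescribed product TF, the grid matching rule (2.19) determines T and F uniquely: solving $TF = \text{const}$ together with $\nu_0 T = \tau_0 F$ gives $T = \sqrt{TF\,\tau_0/\nu_0}$ and $F = \sqrt{TF\,\nu_0/\tau_0}$. The following sketch carries out this arithmetic; the function name and the channel parameters in the usage lines are hypothetical values chosen only for illustration.

```python
import math

def matched_grid(tf_product, tau0, nu0):
    """Return (T, F) with T * F = tf_product and nu0 * T = tau0 * F, cf. (2.19)."""
    T = math.sqrt(tf_product * tau0 / nu0)
    F = math.sqrt(tf_product * nu0 / tau0)
    return T, F

# Hypothetical channel: maximum delay 5 microseconds, maximum Doppler shift 50 Hz.
T, F = matched_grid(1.2, tau0=5e-6, nu0=50.0)
print(T, F, T * F)
```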

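FIGURE 2.2

[Figure: delay-Doppler plane with grid points at multiples of T and F; each rectangle has side lengths 2τ0 and 2ν0.]

Support of $C_H(\tau - kT, \nu - lF)$ for some (k, l) pairs. From (2.16), it follows that $\sigma_h^2 \approx 1$ if $|A_g(\tau,\nu)|^2 \approx 1$ over the support of $C_H(\tau,\nu)$. Furthermore, from (2.17) it follows that $\sigma_I^2 \approx 0$ if $|A_g(\tau,\nu)|^2 \approx 0$ outside the area shaded in gray.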



Remark 2.2 Whenever T and F are chosen according to the grid matching rule, it is possible to assume, without loss of generality, that $C_H(\tau,\nu)$ in (2.16) and (2.17) is supported on a square rather than a rectangle. A proof of this claim follows from a simple coordinate transformation.

Common approaches for the optimization of WH sets, such as the ones described in Matz et al. (2007), aim at finding, for fixed TF, the pulse g(t) that maximizes the ratio $\sigma_h^2/\sigma_I^2$ (which can be thought of as an SIR). To understand the trade-off between degrees of freedom and SIR, we proceed in a different way, and study how $\sigma_h^2/\sigma_I^2$ varies as a function of TF, for fixed g(t). In Fig. 2.3, we plot a lower bound on $\sigma_h^2/\sigma_I^2$ for the root-raised-cosine WH sets of Example 2.1, as a function of TF and for different channel spreads $\Delta_H$ (see Example 2.2 for more details). As expected, the larger TF, the larger the SIR, but the larger also the loss of degrees of freedom. A common compromise between loss of degrees of freedom and maximization of SIR is to take TF ≈ 1.2 (Matz et al., 2007).

The limitation of the analysis we just outlined is that, although it sheds light on how $\sigma_h^2/\sigma_I^2$ depends on TF, it does not reveal the influence that $\sigma_h^2$, $\sigma_I^2$, and TF have on the rate achievable when interference is treated as noise. An information-theoretic analysis of the trade-off between maximization of SIR and minimization of the loss of degrees of freedom is called for. Such an analysis is provided in Section 2.5.

FIGURE 2.3

[Plot: $m_g/M_g$ in dB (axis range 20 to 70 dB) versus TF (axis range 1 to 1.5), with one curve for $\Delta_H = 10^{-6}$ and one for $\Delta_H = 10^{-4}$.]

Trade-off between the product TF of the grid parameters and the signal-to-interference ratio $\sigma_h^2/\sigma_I^2$ for the root-raised-cosine WH sets constructed in Example 2.1. The two cases $\Delta_H = 10^{-4}$ and $\Delta_H = 10^{-6}$ are considered. The ratio $m_g/M_g$ is a lower bound on $\sigma_h^2/\sigma_I^2$, as detailed in Example 2.2.



Example 2.2 (Trade-off between TF and $\sigma_h^2/\sigma_I^2$)

Let $\mathcal{D} \triangleq [-\tau_0,\tau_0]\times[-\nu_0,\nu_0]$ and assume that $C_H(\tau,\nu)$ is compactly supported within $\mathcal{D}$ and has unit volume. Then,

$$\sigma_h^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} C_H(\tau,\nu)\,|A_g(\tau,\nu)|^2\,d\tau\,d\nu \ \ge\ \min_{(\tau,\nu)\in\mathcal{D}} |A_g(\tau,\nu)|^2 \triangleq m_g. \tag{2.20}$$

Furthermore,

$$\begin{aligned}
\sigma_I^2 &= \mathop{\sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}}_{(k,l)\neq(0,0)} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |A_g(\tau + kT, \nu + lF)|^2\,C_H(\tau,\nu)\,d\tau\,d\nu \\
&\le \max_{(\tau,\nu)\in\mathcal{D}} \mathop{\sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}}_{(k,l)\neq(0,0)} |A_g(\tau + kT, \nu + lF)|^2 \triangleq M_g.
\end{aligned} \tag{2.21}$$

The ratio $m_g/M_g$ is obviously a lower bound on the SIR $\sigma_h^2/\sigma_I^2$, and is easier to compute than $\sigma_h^2/\sigma_I^2$, because it depends on the scattering function only through its support area and not its shape. Under the assumption that T and F are chosen according to the grid matching rule (2.19), it is sufficient to study the ratio $m_g/M_g$ exclusively for the case when the scattering function is compactly supported within a square in the delay-Doppler plane (see Remark 2.2). In Fig. 2.3, the ratio $m_g/M_g$ is plotted as a function of TF and for different values of $\Delta_H$ for the family of root-raised-cosine WH sets defined in Example 2.1. The curves in Fig. 2.3 can be used to determine the value of TF needed to achieve a prescribed SIR for a given channel spread. The ratio $m_g/M_g$ increases significantly when TF is taken only slightly larger than 1. A further increase of TF produces a much less pronounced increase of the ratio $m_g/M_g$.

2.2.2.8 Large-Bandwidth and High-SNR Regimes

The effect of a loss of degrees of freedom on capacity depends on the operating regime of the system. To illustrate this point, let us consider, for simplicity, an AWGN channel and input signals subject to an average-power constraint only. Two operating regimes can be identified, where the impact of a loss of degrees of freedom is drastically different: the large-bandwidth (or power-limited, or low-SNR) regime and the high-SNR (or bandwidth-limited) regime (Tse & Viswanath, 2005).

In the large-bandwidth regime, capacity is essentially proportional to the receive power, and only mildly sensitive to the number of degrees of freedom. Hence, a loss of degrees of freedom is irrelevant in this regime. In contrast, in the high-SNR regime, capacity grows linearly with the number of degrees of freedom and is only mildly sensitive to the receive power. Therefore, a loss of degrees of freedom is critical in this regime.
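The two regimes can be illustrated with the standard AWGN capacity formula. The sketch below assumes unit noise spectral density and models a dimension loss by using only B/(TF) degrees of freedom per second, each at SNR P·TF/B, in analogy with the first term of (2.26) for a nonfading channel. The function names and the numerical values in the usage lines are illustrative assumptions.

```python
import math

def awgn_capacity(power, bandwidth, tf_product=1.0):
    """Capacity in nats/s of an AWGN channel with unit noise spectral density
    when only bandwidth / tf_product degrees of freedom per second are used."""
    dof = bandwidth / tf_product
    return dof * math.log1p(power / dof)

def retained_fraction(power, bandwidth, tf_product):
    """Fraction of the full-dimension capacity that survives the dimension loss."""
    return awgn_capacity(power, bandwidth, tf_product) / awgn_capacity(power, bandwidth)

# Large-bandwidth (low-SNR) operating point versus high-SNR operating point.
print(retained_fraction(1.0, 1000.0, 1.2))
print(retained_fraction(1e6, 1.0, 1.2))
```

In the first case almost all of the capacity is retained despite the loss of degrees of freedom, whereas in the second case the capacity drops by a factor close to 1/(TF).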

One of the main contributions of this chapter is to illustrate that for all SNR values of practical interest, the noncoherent capacity of a Rayleigh-fading underspread WSSUS channel is close to the capacity of an AWGN channel with the same receive power. A key element to establish this result is an appropriate choice of the WH set (g(t), T, F) that is used to discretize the channel. Not surprisingly, this choice turns out to depend on the operating regime of the system. In particular,

1. In the large-bandwidth regime, where capacity is only mildly sensitive to a loss of degrees of freedom, it is sensible to choose (g(t), T, F) so that $\sigma_h^2 \approx 1$ and $\sigma_I^2 \approx 0$, and replace the discretized I/O relation (2.14) by the much simpler diagonalized I/O relation (2.18). In Section 2.3, we study the noncoherent capacity of the diagonalized channel (2.18) in the large-bandwidth regime. Then, in Section 2.4, we assess how well the noncoherent capacity of (2.18) approximates that of (2.14) in this regime. The analysis in these two sections sheds light on the impact of bandwidth, number of antennas, and shape of the scattering function on capacity, and allows us to derive guidelines on the choice of the WH set.

2. In the high-SNR regime, where capacity is sensitive to a loss of degrees of freedom, the choice of (g(t), T, F) that leads to $\sigma_h^2 \approx 1$ and $\sigma_I^2 \approx 0$ may be inadequate, as it could entail a dimension loss that is too high. We, therefore, have to work directly with the I/O relation (2.14), which accounts for ISI/ICI. In Section 2.5, we study the noncoherent capacity of (2.14) in the high-SNR regime. In particular, we address the dependency of noncoherent capacity on the support of the scattering function, and we discuss, from an information-theoretic perspective, the trade-off between maximization of the number of degrees of freedom and maximization of SIR.

2.3 THE LARGE-BANDWIDTH REGIME: DIAGONALIZED I/O RELATION

As a first step toward the characterization of the noncoherent capacity of the channel (2.14) in the large-bandwidth regime, we present in this section tight bounds on the noncoherent capacity of the diagonalized channel (2.18). In Section 2.4, we will then discuss how well the noncoherent capacity of (2.18) approximates that of (2.14). We chose this two-level approach because bounds on the noncoherent capacity of (2.18) are much easier to derive than for (2.14). Furthermore, the information-theoretic tools presented in this section will also turn out to be useful for the analysis of the noncoherent capacity of (2.14) presented in Sections 2.4 and 2.5.

Throughout this section, we shall then focus on the diagonalized I/O relation (2.18), which we recall is obtained by discretizing the continuous-time channel I/O relation (2.1) by means of a WH set (g(t), T, F) for which $\sigma_h^2 \approx 1$ and $\sigma_I^2 \approx 0$. As a consequence, the capacity bounds we shall derive in this section will depend on (g(t), T, F) only through the grid parameters T and F. A more refined analysis, which is based on the I/O relation with ISI/ICI (2.14) and, hence, leads to bounds explicit in g(t), will be presented in Section 2.4.

We shall assume throughout that the scattering function of the underlying Rayleigh-fading underspread WSSUS channel is compactly supported within $[-\tau_0,\tau_0]\times[-\nu_0,\nu_0]$, and that the grid parameters satisfy the Nyquist condition $T \le 1/(2\nu_0)$ and $F \le 1/(2\tau_0)$. These assumptions are not fundamental: they merely serve to simplify the analytic expressions for our capacity bounds.

It is convenient for our analysis to rewrite the I/O relation (2.18) in vector form. As discussed in Section 2.2.2.3, we let D = KT be the approximate duration of the continuous-time input signal s(t) in (2.13) and B = LF be its approximate bandwidth. We denote by $\mathbf{s}$ the KL-dimensional input vector that contains the input symbols s[k, l]. The exact way the input symbols s[k, l] are organized into the vector $\mathbf{s}$ is of no concern here, as we will only provide a glimpse of the proof of the capacity bounds. The detailed proof can be found in Durisi et al. (2010). Similarly, the vector $\mathbf{r}$ contains the output-signal samples r[k, l], the vector $\mathbf{h}$ contains the channel coefficients h[k, l], and $\mathbf{w}$ contains the AWGN samples w[k, l]. With these definitions, we can now express the I/O relation (2.18) as

$$\mathbf{r} = \mathbf{h} \odot \mathbf{s} + \mathbf{w}, \tag{2.22}$$

where $\odot$ denotes the Hadamard (element-wise) product.

2.3.1 Power Constraints

We assume that the average power and the peak power of the input signal are constrained as follows:

1. The average power satisfies

$$\frac{1}{T}\,\mathbb{E}\bigl\{\|\mathbf{s}\|^2\bigr\} \le KP, \tag{2.23}$$

where P denotes the admissible average power.

2. Among the several ways in which the peak power of the input signal in (2.18) can be constrained (see Durisi et al., 2010 for a detailed discussion), here, we exclusively analyze the case where a joint limitation in time and frequency is imposed, i.e., the case where the amplitude of the input symbols s[k, l] in each time-frequency slot (k, l) is constrained:

$$\frac{1}{T}\,|s[k,l]|^2 \le \beta\,\frac{P}{L}. \tag{2.24}$$

Here, β ≥ 1 is the nominal peak-to-average-power ratio (PAPR). This type of peak constraint is of practical relevance. It models, e.g., a limitation of the radiated peak power in a given frequency band; it also mimics regulatory peak constraints, such as those imposed on UWB systems.

Note that, according to (2.24), the admissible peak power per time-frequency slot goes to zero as the bandwidth [and, hence, L in (2.24)] goes to infinity. An important observation is that the peak constraint in (2.24) depends on the total available bandwidth, rather than the bandwidth that is effectively used by the input signal. As a consequence, it can be shown (Durisi et al., 2010) that the peak constraint (2.24) enforces the use of the total available bandwidth. Consequently, the PAPR of the signal transmitted in each time-frequency slot is effectively limited, and signals with unbounded PAPR, like flash signals (Verdu, 2002), are ruled out. Input alphabets commonly used in current systems, like phase-shift keying (PSK) and quadrature-amplitude modulation (QAM), satisfy the constraint (2.24). In contrast, Gaussian inputs, which are often used in information-theoretic analyses, do not satisfy (2.24).

We note that the power constraints (2.23) and (2.24) are imposed on the input symbols s[k, l], rather than on the continuous-time signal s(t). While for the average-power constraint, it is possible to make our analysis more rigorous and let the constraint on s[k, l] follow from an underlying constraint on s(t) (Durisi et al., 2010), the same does not seem to hold true for the peak-power constraint: a limit on the amplitude of s[k, l] does not generally imply a limit on the amplitude of s(t).
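The two constraints can be checked on a concrete symbol grid. The sketch below builds a K × L grid of constant-modulus PSK symbols with $|s[k,l]|^2 = PT/L$ and tests empirical versions of (2.23) and (2.24), where the expectation in (2.23) is replaced by the energy of the single grid at hand. The function names, the deterministic phase pattern, the tolerance, and the parameter values are assumptions made for this example.

```python
import cmath
import math

def psk_grid(K, L, T, P, order=4):
    """K x L grid of PSK symbols with |s[k,l]|^2 = P*T/L (deterministic phases)."""
    amp = math.sqrt(P * T / L)
    return [[amp * cmath.exp(2j * math.pi * ((k + l) % order) / order)
             for l in range(L)] for k in range(K)]

def satisfies_constraints(symbols, T, P, beta, rel_tol=1e-9):
    """Check empirical versions of (2.23) and (2.24) on one K x L symbol grid."""
    K, L = len(symbols), len(symbols[0])
    energy = sum(abs(s) ** 2 for row in symbols for s in row)
    average_ok = energy / T <= K * P * (1.0 + rel_tol)
    peak_ok = all(abs(s) ** 2 / T <= beta * P / L * (1.0 + rel_tol)
                  for row in symbols for s in row)
    return average_ok, peak_ok

grid = psk_grid(K=8, L=16, T=1e-3, P=2.0)
print(satisfies_constraints(grid, T=1e-3, P=2.0, beta=1.0))
```

Constant-modulus symbols meet the peak constraint even for β = 1, whereas any input with unbounded amplitude, such as a Gaussian input, would violate it with positive probability.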



2.3.2 Definition of Noncoherent Capacity

Let $\mathcal{Q}$ be the set of probability distributions on $\mathbf{s}$ that satisfy the average-power constraint (2.23) and the peak-power constraint (2.24). The noncoherent capacity of the channel (2.18) is given by (Sethuraman and Hajek, 2005)

$$C \triangleq \lim_{K\to\infty} \frac{1}{KT}\,\sup_{\mathcal{Q}}\, I(\mathbf{r};\mathbf{s}), \tag{2.25}$$

where $I(\mathbf{r};\mathbf{s})$ denotes the mutual information between the KL-dimensional output vector $\mathbf{r}$ and the KL-dimensional input vector $\mathbf{s}$ (Cover & Thomas, 1991). The noncoherent capacity C is notoriously hard to characterize analytically. In the next subsections, we present the following bounds on C:

1. An upper bound $U_c$, which we refer to as coherent-capacity upper bound, that is based on the assumption that the receiver has perfect knowledge of the channel realizations. The derivation of this bound is standard (see, e.g., Biglieri et al., 1998, Sec. III.C.1).

2. An upper bound $U_1$ that is explicit in the channel scattering function and extends the upper bound (Sethuraman et al., 2009, Prop. 2.2) on the capacity of frequency-flat time-selective channels to general underspread WSSUS channels.

3. A lower bound $L_1$ that extends the lower bound (Sethuraman et al., 2005, Prop. 2.2) to general underspread WSSUS channels. This bound is explicit in the channel scattering function only for large bandwidth.

Our focus here will be on the engineering insights that can be obtained from the bounds; we will give a flavor of the derivations and refer the reader to Durisi et al. (2010) and Schuster, Durisi, Bolcskei, and Poor (2009) for detailed derivations.

2.3.3 A Coherent-Capacity Upper Bound

The assumptions that the receiver perfectly knows the instantaneous channel realizations and that the input vector $\mathbf{s}$ is subject only to an average-power constraint furnish the following standard capacity upper bound (Biglieri et al., 1998, Sec. III.C.1):

$$U_c(B) \triangleq \frac{B}{TF}\,\mathbb{E}_h\left\{\ln\left(1 + \frac{PTF}{B}\,|h|^2\right)\right\}, \tag{2.26}$$

where $h \sim \mathcal{CN}(0,1)$. As the upper bound $U_c$ increases monotonically with bandwidth, this bound does not reflect the noncoherent capacity behavior for large bandwidth accurately. Nevertheless, we shall see in Section 2.3.6, by means of numerical examples, that $U_c$ is quite useful over a large range of bandwidth values of practical interest.
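The bound (2.26) is easy to evaluate numerically, because $|h|^2$ is exponentially distributed with unit mean when $h \sim \mathcal{CN}(0,1)$, so that the expectation is the integral of $e^{-x}\ln(1+\rho x)$ over $x \ge 0$ with $\rho = PTF/B$. The sketch below uses a plain trapezoidal rule; the function names, the integration range, the step count, and the parameter values are assumptions made for illustration.

```python
import math

def mean_log_rayleigh(rho, x_max=60.0, n=60000):
    """Trapezoidal estimate of E{ln(1 + rho |h|^2)} for h ~ CN(0, 1),
    i.e., of the integral of exp(-x) ln(1 + rho x) over x >= 0."""
    dx = x_max / n
    total = 0.0
    for i in range(1, n):
        x = i * dx
        total += math.exp(-x) * math.log1p(rho * x)
    # Endpoint terms: the integrand is 0 at x = 0 and negligible at x = x_max.
    total += 0.5 * math.exp(-x_max) * math.log1p(rho * x_max)
    return total * dx

def coherent_upper_bound(bandwidth, power, tf_product):
    """U_c(B) from (2.26), in nats per second."""
    dof = bandwidth / tf_product
    return dof * mean_log_rayleigh(power / dof)

# U_c grows with bandwidth and stays below the receive power P.
for B in (1.0, 10.0, 100.0):
    print(B, coherent_upper_bound(B, power=10.0, tf_product=1.2))
```

By Jensen's inequality, each value is also upper-bounded by the corresponding AWGN expression (B/TF) ln(1 + PTF/B), which in turn is at most P.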

2.3.4 An Upper Bound on Capacity That Is Explicit in $C_H(\tau,\nu)$

In this section, we derive an upper bound on C that will help us to better understand the dependency of noncoherent capacity on bandwidth in the large-bandwidth regime. An important feature of this bound is that it is explicit in the channel scattering function. To simplify the exposition of the key steps used to obtain this bound, we first focus on a very simple channel model.



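2.3.4.1 Bounding Idea

Let $h \sim \mathcal{CN}(0,1)$ denote the random gain of a memoryless flat-fading channel with input s, output r, and additive noise $w \sim \mathcal{CN}(0,1)$, i.e., with I/O relation r = hs + w. Let $\mathcal{Q}$ denote the set of all probability distributions on s that satisfy the average-power constraint $\mathbb{E}\{|s|^2\} \le P$ and the peak constraint $|s|^2 \le \beta P$. We obtain an upper bound on the capacity $\sup_{\mathcal{Q}} I(r;s)$ along the lines of Sethuraman et al. (2009, Prop. 2.2) (see also Biglieri et al., 1998, p. 2636) as follows:

$$\begin{aligned}
\sup_{\mathcal{Q}} I(r;s) &\overset{(a)}{=} \sup_{\mathcal{Q}} \bigl\{ I(r;s,h) - I(r;h\,|\,s) \bigr\} \\
&\overset{(b)}{\le} \sup_{\mathcal{Q}} \Bigl\{ \ln\bigl(1 + \mathbb{E}\{|s|^2\}\bigr) - \mathbb{E}\bigl\{\ln(1+|s|^2)\bigr\} \Bigr\} \\
&= \sup_{0\le\alpha\le 1}\ \sup_{\substack{\mathcal{Q} \\ \mathbb{E}\{|s|^2\} = \alpha P}} \Bigl\{ \ln\bigl(1 + \mathbb{E}\{|s|^2\}\bigr) - \mathbb{E}\bigl\{\ln(1+|s|^2)\bigr\} \Bigr\} \\
&\overset{(c)}{\le} \sup_{0\le\alpha\le 1} \Biggl\{ \ln(1+\alpha P) - \inf_{\substack{\mathcal{Q} \\ \mathbb{E}\{|s|^2\} = \alpha P}} \mathbb{E}\Biggl\{ \Biggl[\inf_{|u|^2\le\beta P} \frac{\ln(1+|u|^2)}{|u|^2}\Biggr] |s|^2 \Biggr\} \Biggr\} \\
&\overset{(d)}{=} \sup_{0\le\alpha\le 1} \Bigl\{ \ln(1+\alpha P) - \frac{\alpha}{\beta}\,\ln(1+\beta P) \Bigr\}.
\end{aligned} \tag{2.27}$$

Here, (a) follows from the chain rule for mutual information (Cover & Thomas, 1991); the inequality (b) results because we take hs as JPG with variance $\mathbb{E}\{|hs|^2\} = \mathbb{E}\{|s|^2\}$. To obtain the inequality (c), we multiply and divide $\ln(1+|s|^2)$ by $|s|^2$ and lower-bound the resulting term $\ln(1+|s|^2)/|s|^2$ by its infimum over all inputs s that satisfy the peak constraint. Finally, (d) results because $\ln(1+|u|^2)/|u|^2$ is monotonically decreasing in $|u|^2$, so that its infimum is achieved for $|u|^2 = \beta P$.

If the supremum in (2.27) is achieved for α = 1, the upper bound simplifies to

$$C \le \ln(1+P) - \frac{1}{\beta}\,\ln(1+\beta P).$$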

This bound can be interpreted as the capacity of an AWGN channel with SNR equal to P minus a penalty term that quantifies the capacity loss due to channel uncertainty. The higher the allowed peakness of the input, as measured by the PAPR β, the smaller the penalty.

In spite of its simplicity, the upper bound (2.27) is tight in the low-SNR regime we are interested in (in this section). More precisely, the Taylor-series expansion of the upper bound (2.27) around the point P = 0 matches that of capacity up to second order (Sethuraman et al., 2009, Prop. 2.1). In contrast, at high SNR, the upper bound (2.27) exhibits an overly optimistic behavior. In fact, the bound scales logarithmically in P, while the high-SNR capacity scaling for memoryless channels is doubly logarithmic (Lapidoth & Moser, 2003; Taricco & Elia, 1997).
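The supremum over α in the last line of (2.27) can be evaluated in closed form, because the objective is concave in α. Setting its derivative to zero gives $\alpha = \beta/\ln(1+\beta P) - 1/P$, which is then clipped to the interval [0, 1]. This closed form is derived here for illustration and is not stated in the text above. The following sketch implements it; the function name is an assumption.

```python
import math

def memoryless_upper_bound(P, beta):
    """Right-hand side of (2.27): sup over 0 <= alpha <= 1 of
    ln(1 + alpha P) - (alpha / beta) ln(1 + beta P). Returns (bound, alpha)."""
    penalty_slope = math.log1p(beta * P) / beta
    # Stationary point of the concave objective, clipped to [0, 1].
    alpha = min(1.0, max(0.0, 1.0 / penalty_slope - 1.0 / P))
    return math.log1p(alpha * P) - alpha * penalty_slope, alpha

# Low SNR with small PAPR: the maximizing alpha can be smaller than 1.
print(memoryless_upper_bound(0.1, 2.0))
# Higher SNR: alpha = 1 and the bound equals ln(1 + P) - ln(1 + beta P) / beta.
print(memoryless_upper_bound(5.0, 4.0))
```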

2.3.4.2 The Actual Bound

For the I/O relation of interest in this chapter, namely (2.18), the derivation of a bound similar to (2.27) is complicated by the correlation that h[k, l] exhibits in k and l. A key element in this derivation is the relation between mutual information and minimum mean-square error (MMSE) estimation (Guo et al., 2005), which leads through the classic formula for the infinite-horizon noncausal prediction error for stationary Gaussian processes (Poor, 1994, Eq. (V.D.28)) to a closed-form expression that is explicit in the channel scattering function. The resulting upper bound on C is presented in Theorem 2.1 below. A detailed derivation of this upper bound can be found in Durisi et al. (2010).

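Theorem 2.1 The noncoherent capacity of the channel (2.18), under the assumption that the input signal satisfies the average-power constraint (2.23) and the peak-power constraint (2.24), is upper-bounded as $C \le U_1$, where

$$U_1(B) \triangleq \frac{B}{TF}\,\ln\left(1 + \alpha(B)\,P\,\frac{TF}{B}\right) - \alpha(B)\,\psi(B) \tag{2.28a}$$

with

$$\alpha(B) \triangleq \min\left\{1,\ \frac{B}{TF}\left(\frac{1}{\psi(B)} - \frac{1}{P}\right)\right\} \tag{2.28b}$$

and

$$\psi(B) \triangleq \frac{B}{\beta}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \ln\left(1 + \frac{\beta P}{B}\,C_H(\tau,\nu)\right) d\tau\,d\nu. \tag{2.28c}$$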

The bound $U_1$ approaches zero for $B \to \infty$ (Durisi et al., 2010, Sec. III.E), a behavior that is to be expected because the peak constraint (2.24) forces the input signal to be spread out over the total available bandwidth. This behavior is well known (Medard & Gallager, 2002; Subramanian & Hajek, 2002; Telatar & Tse, 2000). However, as the bound (2.28) is explicit in B, it allows us to characterize the capacity behavior also for finite bandwidth. In particular, through a numerical evaluation of $U_1$ (and of the lower bound we shall derive in Section 2.3.5), it is possible to (coarsely) identify the critical bandwidth for which capacity is maximized (see Section 2.3.6).

By means of a Taylor-series expansion, the bound $U_1$ can be shown to be tight in the large-bandwidth regime. More precisely, the Taylor-series expansion of $U_1$ around the point 1/B = 0 matches that of capacity up to first order (Durisi et al., 2010).
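The bound of Theorem 2.1 can be evaluated directly once a scattering function is fixed. The sketch below assumes a scattering function that is constant ($C_H = 1/\Delta_H$) on a support of area $\Delta_H$, for which the double integral in (2.28c) reduces to $\Delta_H \ln(1 + \beta P/(B\Delta_H))$. The function names and all numerical values are assumptions chosen for illustration; they do not reproduce the numerical results of Section 2.3.6.

```python
import math

def psi_brick(B, P, beta, spread):
    """Penalty term (2.28c) for C_H = 1/spread on a support of area spread."""
    return B * spread / beta * math.log1p(beta * P / (B * spread))

def upper_bound_u1(B, P, beta, spread, tf_product):
    """U_1(B) from (2.28a)-(2.28b) for the constant scattering function above."""
    dof = B / tf_product
    psi = psi_brick(B, P, beta, spread)
    alpha = min(1.0, dof * (1.0 / psi - 1.0 / P))
    return dof * math.log1p(alpha * P / dof) - alpha * psi

# U_1 first increases with bandwidth and then decays toward zero.
for B in (1e6, 1e8, 1e10, 1e12):
    print(B, upper_bound_u1(B, P=1e7, beta=1.0, spread=1e-3, tf_product=1.2))
```

With these illustrative parameters, the bound is largest at an intermediate bandwidth, in line with the notion of a critical bandwidth mentioned above.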

2.3.4.3 Some Simplifications

Similarly to the very simple memoryless flat-fading channel analyzed in Section 2.3.4.1, if the minimum in (2.28b) were attained for α(B) = 1, the first term of the upper bound $U_1$ in (2.28a) could be interpreted as the capacity of an effective AWGN channel with power P and B/(TF) degrees of freedom, whereas the second term could be seen as a penalty term that characterizes the capacity loss due to channel uncertainty. It turns out, indeed, that α(B) = 1 minimizes (2.28b) for virtually all wireless channels and SNR values of practical interest. In particular, a sufficient condition for α(B) = 1 is (Durisi et al., 2010)

$$\Delta_H \le \frac{\beta}{3TF} \tag{2.29a}$$

and

$$0 \le \frac{P}{B} < \frac{\Delta_H}{\beta}\left[\exp\left(\frac{\beta}{2TF\,\Delta_H}\right) - 1\right]. \tag{2.29b}$$

As virtually all wireless channels are highly underspread, as β ≥ 1, and as, typically, TF ≈ 1.2 (see Section 2.2.2.7), condition (2.29a) is satisfied in most real-world application scenarios, so that the only relevant condition is (2.29b); but even for large channel spread $\Delta_H$, this condition holds for all SNR values⁶ P/B of practical interest. As an example, consider a system with β = 1 and spread $\Delta_H = 10^{-2}$; for this choice, (2.29b) is satisfied for all SNR values less than 153 dB. As this value is far in excess of the SNR encountered in practical systems, we can safely claim that a capacity upper bound of practical interest results if we substitute α(B) = 1 in (2.28a).
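The threshold in (2.29b) is simple to compute, provided the exponential is handled in the log domain to avoid overflow for small spreads. The sketch below does this. The value of TF used for the 153 dB example is not stated in the text; TF = 1.25 is an assumption under which the formula gives roughly 153.7 dB, whereas TF = 1.2 gives roughly 161 dB. Either value is far above practical SNRs. The function names are likewise assumptions.

```python
import math

def snr_threshold_db(beta, spread, tf_product):
    """Right-hand side of (2.29b), in dB, evaluated in the log domain."""
    x = beta / (2.0 * tf_product * spread)
    # ln(exp(x) - 1); for large x this equals x up to a negligible correction.
    log_term = x if x > 50.0 else math.log(math.expm1(x))
    return 10.0 * (math.log(spread / beta) + log_term) / math.log(10.0)

def spread_condition_holds(beta, spread, tf_product):
    """Condition (2.29a)."""
    return spread <= beta / (3.0 * tf_product)

# beta = 1 and spread 1e-2 as in the text; TF = 1.25 is an assumed value.
print(spread_condition_holds(1.0, 1e-2, 1.25), snr_threshold_db(1.0, 1e-2, 1.25))
```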

2.3.4.4 Impact of Channel Characteristics

The spread $\Delta_H$ and the shape of the scattering function $C_H(\tau,\nu)$ are important characteristics of wireless channels. As the upper bound (2.28) is explicit in the scattering function, we can analyze its behavior as a function of $C_H(\tau,\nu)$. We restrict our discussion to the practically relevant case $\alpha(B)=1$.

Channel spread

For fixed shape of the scattering function, the upper bound $U_1$ decreases for increasing spread $\Delta_H$. To see this, we define a normalized scattering function $\tilde{C}_H(\tau,\nu)$ supported on a square with unit area, so that

$$C_H(\tau,\nu) = \frac{1}{\Delta_H}\,\tilde{C}_H\!\left(\frac{\tau}{2\tau_0},\frac{\nu}{2\nu_0}\right).$$

By a change of variables, the penalty term can now be written as

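$$\begin{aligned}
\psi(B) &= \frac{B}{\beta}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \ln\!\left(1+\frac{\beta P}{B}\,C_H(\tau,\nu)\right) d\tau\, d\nu \\
&= \frac{B\Delta_H}{\beta}\int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2} \ln\!\left(1+\frac{\beta P}{B\Delta_H}\,\tilde{C}_H(\tau,\nu)\right) d\tau\, d\nu .
\end{aligned}$$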

Because $\Delta_H \ln(1+\rho/\Delta_H)$ is monotonically increasing in $\Delta_H$ for any positive constant $\rho > 0$, the penalty term $\psi(B)$ increases with increasing spread $\Delta_H$. Consequently, as the first term in (2.28a) does not depend on $\Delta_H$, the upper bound $U_1$ decreases with increasing spread. Because of the Fourier relation $C_H(\tau,\nu) = \mathbb{F}_{t\to\nu,\, f\to\tau}\{R_H(t,f)\}$, a larger spread implies less correlation in time, frequency, or both; but a channel with less correlation is harder for the receiver to learn; hence, channel uncertainty increases, which ultimately reduces capacity. In a typical system that uses pilot symbols to estimate the channel, a larger spread means that more pilots are required to reliably estimate the channel, so that fewer degrees of freedom are left to transmit data.
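The monotonicity argument can be illustrated with a short numerical sketch. It assumes a brick-shaped scattering function, for which the normalized function equals one on the unit square and the double integral in the penalty term collapses to a single logarithm. The function name and the chosen bandwidth are illustrative; the value of P is the one used in the numerical example of Section 2.3.6.

```python
import math

def brick_penalty(bandwidth, power, delta_h, beta=1.0):
    """Penalty term psi(B) in nat/s, assuming a brick-shaped scattering function."""
    bd = bandwidth * delta_h
    return (bd / beta) * math.log1p(beta * power / bd)

P = 2.42e7   # receive power over noise spectral density, in 1/s
B = 1e9      # bandwidth in Hz (illustrative choice)
spreads = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2]
penalties = [brick_penalty(B, P, d) for d in spreads]
for d, psi in zip(spreads, penalties):
    print(f"spread {d:.0e}: penalty {psi:.3e} nat/s")
```

The printed penalties grow with the spread, so the upper bound shrinks accordingly for a fixed first term.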

6 Recall that the noise variance was normalized to 1.



Shape of the scattering function

For fixed spread $\Delta_H$, the scattering function that minimizes the upper bound $U_1$ is the “brick-shaped” scattering function: $C_H(\tau,\nu) = 1/\Delta_H$ for $(\tau,\nu) \in [-\tau_0,\tau_0]\times[-\nu_0,\nu_0]$. This observation follows from Jensen’s inequality applied to the penalty term in (2.28c), the normalization of $C_H(\tau,\nu)$ in (2.4), and the fact that a brick-shaped scattering function achieves the resulting upper bound.

First design sketches of a communication system often rely on simple channel parameters like the maximum multipath delay $\tau_0$ and the maximum Doppler shift $\nu_0$. These two parameters completely specify a brick-shaped scattering function, which we just saw to provide the minimum capacity upper bound among all WSSUS channels with a scattering function of prescribed $\tau_0$ and $\nu_0$. Hence, a design on the basis of $\tau_0$ and $\nu_0$ alone is implicitly targeted at the worst-case channel and thus results in a robust design.
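The Jensen argument can be made concrete with a small sketch. It assumes a piecewise-constant normalized scattering function on equal-area cells of the unit square, so that the double integral in the penalty term becomes an average over cells. The cell values, the constant `rho`, and the function name are hypothetical and serve only as an illustration.

```python
import math

def penalty_per_unit(cells, rho):
    """Average of ln(1 + rho * c) over equal-area cells of the unit square.

    `cells` holds the values of a piecewise-constant normalized scattering
    function; they are rescaled so that the function integrates to one.
    """
    mean = sum(cells) / len(cells)
    return sum(math.log1p(rho * c / mean) for c in cells) / len(cells)

rho = 2420.0                         # stands for beta * P / (B * spread), illustrative
brick = [1.0] * 16                   # flat (brick-shaped) profile
peaky = [8.0] + [8.0 / 15.0] * 15    # same total mass, concentrated in one cell
print(penalty_per_unit(brick, rho), penalty_per_unit(peaky, rho))
```

The flat profile yields the larger penalty, and hence the smaller upper bound, which is why a design based on $\tau_0$ and $\nu_0$ alone targets the worst case.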

2.3.5 A Lower Bound on Capacity

Typically, lower bounds on capacity are easier to obtain than upper bounds because it is sufficient to evaluate the mutual information in (2.25) for an input distribution that satisfies the power constraints (2.23) and (2.24). The main difficulty here is to find input distributions that lead to tight lower bounds. As we are going to show in Section 2.3.6, a good trade-off between analytical tractability and tightness of the resulting bound follows from the choice of i.i.d. zero-mean constant modulus input symbols. Constant modulus input symbols are often found in practical systems in the form of PSK constellations, especially in systems designed to operate at low SNR.

2.3.5.1 Bounding Idea

As with the upper bound in Theorem 2.1, we illustrate the main steps in the derivation of the lower bound by considering a simple memoryless channel with I/O relation $r = hu + w$. Here, we pick $u$ to be of zero mean and constant modulus $|u|^2 = P$. A word of warning is appropriate at this point. The choice of transmitting constant-modulus signals on a memoryless fading channel (with the channel not known at the receiver) is a bad one. It is easy to show that the corresponding mutual information $I(r;u)$ (which is a lower bound on the capacity of the memoryless channel) equals zero. The underlying reason is that in constant modulus signals, the information is encoded in the phase of the signal. But for the setting we just described, the conditional probability of $r$ given $u$ depends on the amplitude of $u$ only. Disturbing as this observation might seem, the bounding steps presented below lead to a perfectly sensible lower bound on the capacity of the channel (2.18) we are interested in. In fact, as the discretized channel $h[k,l]$ in (2.18) exhibits correlation both in $k$ and $l$, the conditional probability of $\mathbf{r}$ given $\mathbf{s}$ [see (2.22)] depends on both the phase and the amplitude of the entries of $\mathbf{s}$. Consequently, as we shall see in Section 2.3.6, we have that $I(\mathbf{r};\mathbf{s}) > 0$ also when constant-modulus inputs are used.

We now derive a lower bound on $I(r;u)$, which not surprisingly will turn out to be negative. We use the chain rule to split the mutual information $I(r;u)$ and obtain

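$$\begin{aligned}
I(r;u) &= I(r;u,h) - I(r;h\,|\,u) \\
&= I(r;h) + I(r;u\,|\,h) - I(r;h\,|\,u) \\
&\overset{(a)}{\ge} I(r;u\,|\,h) - I(r;h\,|\,u) \\
&\overset{(b)}{=} I(r;u\,|\,h) - \mathbb{E}\bigl\{\ln\bigl(1+|u|^2\bigr)\bigr\} \\
&\overset{(c)}{=} I(r;u\,|\,h) - \ln(1+P).
\end{aligned} \tag{2.30}$$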

To get the inequality (a), we used the chain rule for mutual information twice, and then dropped the nonnegative term $I(r;h)$. This essentially splits the mutual information into a first component that corresponds to the case when perfect channel knowledge at the receiver is available, and a second component that [like in the upper bound (2.27)] can be interpreted as a penalty term, and quantifies the impact of channel uncertainty. Next, (b) follows because, given the input $u$, the output $r$ is JPG with variance $1+|u|^2$. Finally, (c) results as $u$ is of constant modulus with $|u|^2 = P$. As expected, the lower bound we arrived at is negative. In fact, the first term in the lower bound (2.30), which is a “coherent” mutual information, is always less than the second term, which is the capacity of an AWGN channel with the same receive power.

If the input signal is subject to a peak-power constraint, with PAPR $\beta$ strictly larger than 1, we can improve upon (2.30) by time sharing (Sethuraman et al., 2005, Cor. 2.1): we let the input signal have squared magnitude $\gamma P$ during a fraction $1/\gamma$ of the total transmission time, where $1 \le \gamma \le \beta$; that is, we set the channel input $s$ to be $s = \sqrt{\gamma}\,u$ during this time. For the remaining time, the transmitter is silent, so that the constraint on the average power is satisfied. The resulting bound is

$$\sup_{Q} I(r;s) \ge \max_{1\le\gamma\le\beta} \frac{1}{\gamma}\Bigl[I\bigl(r;\sqrt{\gamma}\,u\,|\,h\bigr) - \ln(1+\gamma P)\Bigr].$$

Because a closed-form expression for the coherent mutual information $I(r;\sqrt{\gamma}\,u\,|\,h)$ does not exist for constant modulus $u$, numerical methods are needed for evaluating the lower bound just derived.
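One possible numerical method is plain Monte Carlo integration. The sketch below estimates the coherent mutual information for a QPSK alphabet and then evaluates the bracketed term of the time-sharing bound for one value of $\gamma$. The choice of QPSK, the unit-variance Rayleigh fading and noise, the sample count, and all function names are assumptions made for illustration; this is not the evaluation procedure of Durisi et al. (2010).

```python
import cmath
import math
import random

def qpsk_coherent_mi(snr, n_samples=20000, seed=1):
    """Monte Carlo estimate (in nats) of I(r; u | h) for r = h*u + w, QPSK u."""
    rng = random.Random(seed)
    amp = math.sqrt(snr)
    points = [amp * cmath.exp(1j * math.pi * (0.25 + 0.5 * k)) for k in range(4)]

    def cn01():
        # Circularly symmetric complex Gaussian sample with unit variance.
        s = math.sqrt(0.5)
        return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

    total = 0.0
    for _ in range(n_samples):
        h, w = cn01(), cn01()
        u = rng.choice(points)
        r = h * u + w
        d_true = abs(r - h * u) ** 2
        # Sum of likelihood ratios over the alphabet, relative to the true symbol.
        ratio_sum = sum(math.exp(d_true - abs(r - h * v) ** 2) for v in points)
        total += math.log(4.0) - math.log(ratio_sum)
    return total / n_samples

def memoryless_lower_bound(power, gamma=1.0):
    """(1/gamma) * [I(r; sqrt(gamma) u | h) - ln(1 + gamma * P)], in nats."""
    snr = gamma * power
    return (qpsk_coherent_mi(snr) - math.log1p(snr)) / gamma

mi = qpsk_coherent_mi(1.0)
bound = memoryless_lower_bound(1.0)
print(mi, bound)
```

The estimate of the coherent term stays below $\ln(1+P)$, so the resulting bound for the memoryless channel is negative, as stated in the text.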

2.3.5.2 The Actual Bound

Two difficulties arise when trying to derive a bound similar to (2.30) on the capacity of the channel (2.18): we need to account for the correlation that $h[k,l]$ exhibits in $k$ and $l$, and we need to compute the limit $K\to\infty$ in (2.25). We choose the input symbols to be i.i.d. of zero mean and constant modulus $|s[k,l]|^2 = PT/K$. The coherent mutual information $I(\mathbf{r};\mathbf{s}\,|\,\mathbf{h})$ is then simply given by $KL \times I(r;s\,|\,h)$, i.e., $KL$ times the coherent mutual information of the scalar memoryless Rayleigh-fading channel we analyzed previously.

In the limit $K\to\infty$, the penalty term $I(\mathbf{r};\mathbf{h}\,|\,\mathbf{s})$ is explicit in the matrix-valued spectrum $\mathbf{P}_h(\theta)$ of the multivariate channel process $\{\mathbf{h}[k] \triangleq (h[k,0]\ h[k,1]\ \cdots\ h[k,L-1])^T\}$,

$$\mathbf{P}_h(\theta) \triangleq \sum_{k=-\infty}^{\infty} \mathbf{R}_h[k]\, e^{-j2\pi k\theta}, \qquad |\theta| \le \frac{1}{2}, \tag{2.31}$$

where $\mathbf{R}_h[k] \triangleq \mathbb{E}\bigl\{\mathbf{h}[k'+k]\,\mathbf{h}^H[k']\bigr\}$. This result follows from a generalization of Szegő’s theorem on the asymptotic distribution of Toeplitz matrices (see Durisi et al., 2010 for further details).

The final lower bound on capacity is stated in the following theorem. As before, a detailed derivation can be found in Durisi et al. (2010).



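Theorem 2.2 The noncoherent capacity of the channel (2.18), under the assumption that the input signal satisfies the average-power constraint (2.23) and the peak-power constraint (2.24), is lower-bounded as $C \ge L_1$, where

$$L_1(B) = \max_{1\le\gamma\le\beta}\left\{ \frac{B}{\gamma TF}\, I\bigl(r;\sqrt{\gamma}\,u\,|\,h\bigr) - \frac{1}{\gamma T}\int_{-1/2}^{1/2} \ln\det\!\left\{\mathbf{I}_L + \frac{\gamma PTF}{B}\,\mathbf{P}_h(\theta)\right\} d\theta \right\}. \tag{2.32}$$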

2.3.5.3 Some Approximations

The lower bound (2.32) differs from the upper bound in Theorem 2.1 in two important aspects. (i) The first term inside the braces in (2.32) cannot be expressed in closed form but needs to be evaluated numerically because of the constant modulus signaling assumption. (ii) The penalty term depends on the scattering function only indirectly through $\mathbf{P}_h(\theta)$, which again complicates the evaluation of the lower bound. The complicated structure is due to edge effects caused by the finite bandwidth $B = LF$. In the large-bandwidth regime, however, we can approximate the penalty term by exploiting the asymptotic equivalence between Toeplitz and circulant matrices (Gray, 2005). This yields

$$L_1(B) \approx L_a(B) \triangleq \max_{1\le\gamma\le\beta}\left\{ \frac{B}{\gamma TF}\, I\bigl(r;\sqrt{\gamma}\,u\,|\,h\bigr) - \frac{B}{\gamma}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \ln\!\left(1+\frac{\gamma P}{B}\,C_H(\tau,\nu)\right) d\tau\, d\nu \right\}. \tag{2.33}$$

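Furthermore, we can replace the mutual information $I(r;\sqrt{\gamma}\,u\,|\,h)$ in (2.33) by its first-order Taylor series expansion for $B\to\infty$ (Verdu, 2002, Th. 14), to obtain the approximation

$$L_a(B) \approx L_{aa}(B) \triangleq \max_{1\le\gamma\le\beta}\left\{ P - \frac{\gamma P^2 TF}{B} - \frac{B}{\gamma}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \ln\!\left(1+\frac{\gamma P}{B}\,C_H(\tau,\nu)\right) d\tau\, d\nu \right\}. \tag{2.34}$$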

Both $L_a$ and $L_{aa}$ are no longer true lower bounds, yet they agree with $L_1$ in (2.32) for large $B$. More details on how well $L_a$ and $L_{aa}$ approximate $L_1$ can be found in Durisi et al. (2010).

2.3.6 A Numerical Example

We evaluate the bounds presented previously for the following set of practically relevant system parameters:

1. Brick-shaped scattering function with maximum delay $\tau_0 = 0.5\,\mu$s, maximum Doppler shift $\nu_0 = 5$ Hz, and corresponding spread $\Delta_H = 4\tau_0\nu_0 = 10^{-5}$.

2. Grid parameters $T = 0.35$ ms and $F = 3.53$ kHz, so that $TF \approx 1.25$ and $\nu_0 T = \tau_0 F$, as suggested by the design rule (2.19).

3. Receive power normalized with respect to the noise spectral density $P/(1\,\text{W/Hz}) = 2.42\cdot 10^{7}\,\text{s}^{-1}$.

These parameter values are representative of several different types of systems. For example:

1. An IEEE 802.11a system with transmit power of 200 mW, path loss of 118 dB, and receiver noise figure (Razavi, 1998) of 5 dB; the path loss is rather pessimistic for typical indoor link distances and includes the attenuation of the signal, e.g., by a concrete wall.

2. A UWB system with transmit power of 0.5 mW, path loss of 77 dB, and receiver noise figure of 20 dB.
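The two link budgets can be checked with a few lines of arithmetic. The sketch below assumes that the noise spectral density equals the thermal noise $k_B T_0$ scaled by the noise figure, with a reference temperature of 300 K; the temperature and the function name are assumptions, not values stated for this example.

```python
import math

BOLTZMANN = 1.380649e-23   # J/K

def power_over_noise_density(tx_power_w, path_loss_db, noise_figure_db,
                             temperature_k=300.0):
    """Receive power divided by the noise spectral density, in 1/s.

    The 300 K reference temperature is an assumed value.
    """
    rx_power = tx_power_w * 10.0 ** (-path_loss_db / 10.0)
    noise_density = BOLTZMANN * temperature_k * 10.0 ** (noise_figure_db / 10.0)
    return rx_power / noise_density

wlan = power_over_noise_density(200e-3, 118.0, 5.0)
uwb = power_over_noise_density(0.5e-3, 77.0, 20.0)
print(f"802.11a-like link: {wlan:.3g} 1/s")
print(f"UWB-like link:     {uwb:.3g} 1/s")
```

Under these assumptions, both links land within about one percent of the normalized receive power listed in item 3 of the parameter set.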



[Figure 2.4: rate (Mbit/s) versus bandwidth (GHz), from 0.01 GHz to 1000 GHz; curves $U_1$, $U_c$, $L_1$, $L_a$, $L_{aa}$, and AWGN.]

FIGURE 2.4

The coherent-capacity upper bound $U_c$ in (2.26), the upper bound $U_1$ in (2.28), as well as the lower bound $L_1$ in (2.32) (evaluated for a QPSK input alphabet) and its large-bandwidth approximations $L_a$ and $L_{aa}$ in (2.33) and (2.34), respectively, for $\beta = 1$ and a brick-shaped scattering function with spread $\Delta_H = 10^{-5}$. The noncoherent capacity is confined to the hatched area. The AWGN-capacity upper bound (2.35) is also plotted for reference.

As illustrated in Fig. 2.4, the upper bound $U_1$ and the lower bound $L_1$ (which is evaluated for a QPSK input alphabet) take on their maximum at a large but finite bandwidth; above this critical bandwidth, additional bandwidth is detrimental, and the capacity approaches zero as the bandwidth increases further. In this regime, the rate gain resulting from the additional degrees of freedom is offset by the resources required to resolve channel uncertainty. In particular, we can see from Fig. 2.4 that many current wireless systems operate well below the critical bandwidth.

For bandwidth values smaller than the critical bandwidth, $L_1$ comes quite close to the coherent-capacity upper bound $U_c$; we show in Section 2.4.2 that this gap can be further reduced by a more sophisticated choice of the input distribution.

We finally note that, for a large range of bandwidth values of practical interest, both $U_1$ and $L_1$ (and, hence, the noncoherent capacity of (2.18)) are close to the capacity of an AWGN channel with the same receive power (a trivial upper bound on (2.25)), given by$^7$

$$C_{\text{awgn}}(B) = B \ln\!\left(1+\frac{P}{B}\right). \tag{2.35}$$

This observation will be helpful in Section 2.4, where we quantify how well the noncoherent capacity of the diagonalized channel (2.18) approximates that of the channel with ISI/ICI (2.14) in the large-bandwidth regime.
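The qualitative behavior described above can be reproduced with a short sketch for the parameters of this example. It assumes $\alpha(B) = 1$ and a brick-shaped scattering function, so that the upper bound takes the form "effective AWGN term minus penalty term" described in Section 2.3.4.3; since (2.28a) is not restated here, that closed form is an assumption of the sketch, as are the function names and the bandwidth grid. The approximation $L_{aa}$ follows (2.34) with $\gamma = 1$.

```python
import math

P = 2.42e7              # receive power over noise spectral density, 1/s
DELTA_H = 1e-5          # channel spread
TF = 0.35e-3 * 3.53e3   # grid product T * F
BETA = 1.0

def penalty(bw):
    """Penalty term for a brick-shaped scattering function (gamma = alpha = 1)."""
    bd = bw * DELTA_H
    return (bd / BETA) * math.log1p(BETA * P / bd)

def upper_bound(bw):
    """Assumed form of U1 with alpha(B) = 1, in nat/s."""
    return (bw / TF) * math.log1p(P * TF / bw) - penalty(bw)

def lower_approx(bw):
    """Laa from (2.34) with gamma = 1, in nat/s."""
    return P - P * P * TF / bw - penalty(bw)

def awgn(bw):
    """AWGN-capacity upper bound (2.35), in nat/s."""
    return bw * math.log1p(P / bw)

# Logarithmic bandwidth grid from 10 MHz to 1 THz.
grid = [10.0 ** (7.0 + 5.0 * i / 500.0) for i in range(501)]
critical_bw = max(grid, key=upper_bound)
to_mbit = 1e-6 / math.log(2.0)
print(f"critical bandwidth of U1: {critical_bw / 1e9:.2f} GHz")
print(f"U1 at that bandwidth: {upper_bound(critical_bw) * to_mbit:.1f} Mbit/s")
```

On this grid the maximum of the upper bound falls in the low-gigahertz range, and the ordering $L_{aa} \le U_1 \le C_{\text{awgn}}$ holds at every grid point.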

7 In the remainder of the chapter, we will refer to $C_{\text{awgn}}$ as the AWGN-capacity upper bound.



2.3.7 Extension to the Multiantenna Setting

Bandwidth is the main source of degrees of freedom in wireless communications, but it also results in channel uncertainty. Multiple antennas at the transmitter and the receiver can be used to increase the number of degrees of freedom even further. Hence, questions of engineering relevance are (i) how channel uncertainty grows with the number of antennas, (ii) if there is a regime where the additional spatial degrees of freedom are detrimental just as the degrees of freedom associated with bandwidth are, and (iii) what is the impact on capacity of spatial correlation across antennas.

2.3.7.1 Modeling Multiantenna Channels – A Formal Extension

To answer the questions above, we need an appropriate model for multiantenna underspread WSSUS channels. As the accurate modeling of multiantenna LTV channels goes beyond the scope of this chapter, we limit our analysis to a multiantenna channel model that is a formal extension of the single-input single-output (SISO) model used so far. More precisely, we extend the diagonalized SISO I/O relation (2.18) to a multiple-input multiple-output (MIMO) I/O relation with $M_T$ transmit antennas, indexed by $q$, and $M_R$ receive antennas, indexed by $r$. We assume that all component channels, which we denote by $h_{r,q}[k,l]$, are identically distributed, although not necessarily statistically independent.

For a given slot $(k,l)$, we arrange the corresponding component channels $h_{r,q}[k,l]$ in an $M_R \times M_T$ matrix $\mathbf{H}[k,l]$ with entries $[\mathbf{H}[k,l]]_{r,q} = h_{r,q}[k,l]$. The diagonalized I/O relation of the MIMO channel is then given by

r[k, l]=H[k, l]s[k, l]+w[k, l], (2.36)

where, for each slot $(k,l)$, $\mathbf{w}[k,l]$ is the $M_R$-dimensional noise vector, $\mathbf{s}[k,l]$ is the $M_T$-dimensional input vector, and $\mathbf{r}[k,l]$ is the output vector of dimension $M_R$. We allow for spatial correlation according to the separable (Kronecker) correlation model (Chuah, Tse, Kahn, & Valenzuela, 2002; Kermoal, Schumacher, Pedersen, Mogensen, & Frederiksen, 2002), i.e.,

$$\mathbb{E}\bigl\{h_{r,q}[k'+k,\,l'+l]\; h^{*}_{r',q'}[k',l']\bigr\} = B[r,r']\,A[q,q']\,R_H[k,l].$$

The $M_T \times M_T$ matrix $\mathbf{A}$ with entries $[\mathbf{A}]_{q,q'} = A[q,q']$ is called the transmit correlation matrix, and the $M_R \times M_R$ matrix $\mathbf{B}$ with entries $[\mathbf{B}]_{r,r'} = B[r,r']$ is the receive correlation matrix. We normalize $\mathbf{A}$ and $\mathbf{B}$ so that $\operatorname{tr}\{\mathbf{A}\} = M_T$ and $\operatorname{tr}\{\mathbf{B}\} = M_R$. Finally, we let $\sigma_0 \ge \sigma_1 \ge \cdots \ge \sigma_{M_T-1}$ be the eigenvalues of $\mathbf{A}$, and $\lambda_0 \ge \lambda_1 \ge \cdots \ge \lambda_{M_R-1}$ the eigenvalues of $\mathbf{B}$.

A detailed discussion and formal description of the MIMO extension just outlined can be found in Schuster (2009) and Schuster et al. (2009).

2.3.7.2 Capacity Bounds for MIMO Channels in the Large-Bandwidth Regime

Upper and lower bounds on the capacity of the MIMO channel (2.36), under the average-power constraint

$$\frac{1}{T}\,\mathbb{E}\left\{\sum_{k=0}^{K-1}\sum_{l=0}^{L-1} \bigl\|\mathbf{s}[k,l]\bigr\|^2\right\} \le KP \tag{2.37}$$

and the peak-power constraint

$$\frac{1}{T}\,\bigl\|\mathbf{s}[k,l]\bigr\|^2 \le \beta\,\frac{P}{L} \tag{2.38}$$

can be obtained using the same techniques as in the SISO case. The resulting upper and lower bounds are presented in Theorems 2.3 and 2.4 below. A detailed derivation of these bounds can be found in Schuster (2009) and Schuster et al. (2009).

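2.3.7.3 The Upper Bound

Theorem 2.3 The noncoherent capacity of the channel (2.36), under the assumption that the input signal satisfies the average-power constraint (2.37) and the peak-power constraint (2.38), is upper-bounded as $C \le U_1^{\text{mimo}}$, where

$$U_1^{\text{mimo}}(B) \triangleq \sup_{0\le\alpha\le\sigma_0}\ \sum_{r=0}^{M_R-1}\left[\frac{B}{TF}\ln\!\left(1+\alpha\lambda_r\frac{PTF}{B}\right) - \alpha\,\psi_r(B)\right] \tag{2.39a}$$

with

$$\psi_r(B) \triangleq \frac{B}{\sigma_0\beta}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \ln\!\left(1+\frac{\sigma_0\lambda_r\beta P}{B}\,C_H(\tau,\nu)\right) d\tau\, d\nu. \tag{2.39b}$$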

Similar to the single-antenna case (see Section 2.3.4.3), it can be shown that for virtually all SNR values of practical interest, the supremum in (2.39a) is attained for $\alpha = \sigma_0$. Hence, the upper bound can be interpreted as the capacity of a set of $M_R$ parallel AWGN channels with $B/(TF)$ degrees of freedom and received power $\sigma_0\lambda_r P$, minus a penalty term that quantifies the capacity loss because of channel uncertainty. The observations on the impact of the shape and spread of the scattering function made in Section 2.3.4.4 remain valid.
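As a rough illustration, the bound (2.39) can be evaluated in closed form when the scattering function is brick-shaped, since the double integral in (2.39b) then reduces to a single logarithm. In the sketch below the supremum is replaced by a coarse grid search over $\alpha$; the value TF = 1.25, the bandwidth, the grid resolution, and the function name are assumptions, while P, the spread, and the receive eigenvalues are those used in the numerical examples of Section 2.3.8.

```python
import math

def u1_mimo(bw, power, tf, delta_h, rx_eigs, sigma0=1.0, beta=1.0, n_alpha=200):
    """Upper bound (2.39) in nat/s, assuming a brick-shaped scattering function."""
    bd = bw * delta_h

    def objective(alpha):
        total = 0.0
        for lam in rx_eigs:
            awgn_term = (bw / tf) * math.log1p(alpha * lam * power * tf / bw)
            psi = (bd / (sigma0 * beta)) * math.log1p(sigma0 * lam * beta * power / bd)
            total += awgn_term - alpha * psi
        return total

    # Coarse grid search over 0 <= alpha <= sigma0 in place of the supremum.
    alphas = [sigma0 * i / n_alpha for i in range(n_alpha + 1)]
    return max(objective(a) for a in alphas)

P, TF, DELTA_H = 1.26e8, 1.25, 1e-3   # TF = 1.25 is an assumed value
B = 1e8                               # 0.1 GHz, illustrative
uncorrelated = u1_mimo(B, P, TF, DELTA_H, [1.0, 1.0, 1.0])
correlated = u1_mimo(B, P, TF, DELTA_H, [2.6, 0.3, 0.1])
print(uncorrelated, correlated)
```

At this comparatively small bandwidth, the bound for the receive-correlated eigenvalue profile is lower than for the uncorrelated one, which matches the trend reported for receive correlation in Section 2.3.8.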

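2.3.7.4 The Lower Bound

Theorem 2.4 The noncoherent capacity of the channel (2.36), under the assumption that the input signal satisfies the average-power constraint (2.37) and the peak-power constraint (2.38), is lower-bounded as

$$C(B) \ge \max_{1\le Q\le M_T} L_1^{\text{mimo}}(B,Q),$$

where

$$L_1^{\text{mimo}}(B,Q) = \max_{1\le\gamma\le\beta}\left\{ \frac{B}{\gamma TF}\, I\bigl(\mathbf{r};\sqrt{\gamma}\,\mathbf{s}\,|\,\mathbf{H}\bigr) - \frac{1}{\gamma T}\sum_{q=0}^{Q-1}\sum_{r=0}^{M_R-1}\int_{-1/2}^{1/2} \ln\det\!\left(\mathbf{I}_L + \sigma_q\lambda_r\frac{\gamma PTF}{QB}\,\mathbf{P}_h(\theta)\right) d\theta \right\}$$

with

$$\mathbf{r} = \boldsymbol{\Lambda}^{1/2}\,\mathbf{H}\,\boldsymbol{\Sigma}^{1/2}\,\mathbf{s} + \mathbf{w}.$$

Here, $\mathbf{s}$ is an $M_T$-dimensional vector whose first $Q$ elements are i.i.d. of zero mean and constant modulus $|[\mathbf{s}]_q|^2 = PT/(QL)$, and the remaining $M_T - Q$ elements are equal to zero. Both the $M_R \times M_T$ matrix $\mathbf{H}$ and the $M_R$-dimensional vector $\mathbf{w}$ have i.i.d. JPG entries of zero mean and unit variance. Finally, $\boldsymbol{\Sigma} = \operatorname{diag}(\sigma_0\ \sigma_1\ \cdots\ \sigma_{M_T-1})^T$ and $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_0\ \lambda_1\ \cdots\ \lambda_{M_R-1})^T$.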



The lower bound $L_1^{\text{mimo}}$ needs to be optimized with respect to the number of active transmit eigenmodes $Q$ (i.e., the number of eigenmodes of the transmit correlation matrix $\mathbf{A}$ being signaled over, see Schuster et al. (2009) for more details). Note that when the channel is spatially uncorrelated at the transmitter, $Q$ simply denotes the number of active transmit antennas.

It can be shown that, at very large bandwidth, it is optimal to signal along the strongest eigenmode only (Schuster et al., 2009), a scheme often referred to as rank-one statistical beamforming or eigen-beamforming (Verdu, 2002). In particular, when the channel is spatially uncorrelated at the transmitter, at very large bandwidth, it is optimal to use a single transmit antenna, an observation previously made in Sethuraman et al. (2009) for frequency-flat time-selective channels. At intermediate bandwidth values, the number of transmit antennas to use (for uncorrelated component channels at the transmitter side), or the number of transmit eigenmodes to signal over (if correlation is present), can be determined by numerical evaluation of the bounds, as shown in Section 2.3.8.

2.3.8 Numerical Examples

For a 3 × 3 MIMO system, we show in Figs 2.5–2.7 plots of the upper bound $U_1^{\text{mimo}}$ and, for a number of active transmit eigenmodes $Q$ ranging from 1 to 3, plots of the lower bound $L_1^{\text{mimo}}$ and of a large-bandwidth approximation of $L_1^{\text{mimo}}$ denoted as $L_{aa}^{\text{mimo}}$.$^8$

Parameter settings

All plots are obtained for receive power normalized with respect to the noise spectral density $P/(1\,\text{W/Hz}) = 1.26\cdot 10^{8}\,\text{s}^{-1}$; this corresponds, e.g., to a transmit power of 0.5 mW, thermal noise level at the receiver of −174 dBm/Hz, free-space path loss corresponding to a distance of 10 m, and a rather conservative receiver noise figure of 20 dB. Furthermore, we assume that the scattering function is brick-shaped with $\tau_0 = 5\,\mu$s, $\nu_0 = 50$ Hz, and corresponding spread $\Delta_H = 10^{-3}$. We also set $\beta = 1$. For this set of parameter values, we analyze three different scenarios: a spatially uncorrelated channel, spatial correlation at the receiver only, and spatial correlation at the transmitter only.

Spatially uncorrelated channel

Figure 2.5 shows the upper bound $U_1^{\text{mimo}}$ and the lower bound $L_1^{\text{mimo}}$ for the spatially uncorrelated case. For comparison, we also plot a standard capacity upper bound $U_c^{\text{mimo}}$ obtained for the coherent setting and with input subject to an average-power constraint only (Tse & Viswanath, 2005). We can observe that $U_c^{\text{mimo}}$ is tighter than our upper bound $U_1^{\text{mimo}}$ for small bandwidth; this holds true in general, as for small bandwidth and spatially uncorrelated channels, $U_1^{\text{mimo}}(B) \approx [(M_R B)/(TF)]\ln(1+PTF/B)$, which is the Jensen upper bound on $U_c^{\text{mimo}}$.

For small and medium bandwidth, the lower bound $L_1^{\text{mimo}}$ increases with $Q$ and comes surprisingly close to the coherent capacity upper bound $U_c^{\text{mimo}}$ for $Q = 3$.

8 The analytic expression for $L_{aa}^{\text{mimo}}$, which is similar to (2.34), can be found in Schuster et al. (2009), Eq. (19).



[Figure 2.5: rate (Mbit/s, 0 to 600) versus bandwidth (GHz, 0.01 to 1000); curves $U_1^{\text{mimo}}$, $U_c^{\text{mimo}}$, and $L_1^{\text{mimo}}$, $L_{aa}^{\text{mimo}}$ for $Q = 1, 2, 3$.]

FIGURE 2.5

Upper and lower bounds on the noncoherent capacity of a spatially uncorrelated MIMO underspread WSSUS channel with $M_T = M_R = 3$, $\beta = 1$, and $\Delta_H = 10^{-3}$. The bounds confine the noncoherent capacity to the hatched area.

[Figure 2.6: rate (Mbit/s, 0 to 600) versus bandwidth (GHz, 0.01 to 1000); curves $U_1^{\text{mimo}}$, and $L_1^{\text{mimo}}$, $L_{aa}^{\text{mimo}}$ for $Q = 1, 2, 3$.]

FIGURE 2.6

Upper and lower bounds on the noncoherent capacity of a MIMO underspread WSSUS channel that is spatially uncorrelated at the transmitter but correlated at the receiver with eigenvalues of the receive correlation matrix given by {2.6, 0.3, 0.1}; $M_T = M_R = 3$, $\beta = 1$, and $\Delta_H = 10^{-3}$. The bounds confine the noncoherent capacity to the hatched area.



[Figure 2.7: rate (Mbit/s, 0 to 900) versus bandwidth (GHz, 0.01 to 1000); curves $U_1^{\text{mimo}}$, and $L_1^{\text{mimo}}$, $L_{aa}^{\text{mimo}}$ for $Q = 1, 2, 3$.]

FIGURE 2.7

Upper and lower bounds on the noncoherent capacity of a MIMO underspread WSSUS channel that is spatially correlated at the transmitter with eigenvalues of the transmit correlation matrix {1.7, 1.0, 0.3} and uncorrelated at the receiver; $M_T = M_R = 3$, $\beta = 1$, and $\Delta_H = 10^{-3}$. The bounds confine the noncoherent capacity to the hatched area.

As for the SISO case, when the bandwidth exceeds a certain critical bandwidth, both $U_1^{\text{mimo}}$ and $L_1^{\text{mimo}}$ start to decrease, because the rate gain due to the additional degrees of freedom is offset by the increase in channel uncertainty. The same argument holds in the wideband regime for the degrees of freedom provided by multiple transmit antennas: in this regime, using a single transmit antenna is optimal (Schuster et al., 2009).

Impact of receive correlation

Figure 2.6 shows the same bounds as before but evaluated for spatial correlation at the receiver according to a correlation matrix with eigenvalues {2.6, 0.3, 0.1} and a spatially uncorrelated channel at the transmitter. The curves in Fig. 2.6 are very similar to the ones shown in Fig. 2.5 for the spatially uncorrelated case, yet they are shifted toward higher bandwidth and the maximum is lower. In general, receive correlation reduces capacity at small bandwidth but is beneficial at large bandwidth (Schuster et al., 2009).

Impact of transmit correlation

We evaluate the same bounds once more, but this time for spatial correlation at the transmitter according to a correlation matrix with eigenvalues {1.7, 1.0, 0.3} and a spatially uncorrelated channel at the receiver. The corresponding curves are shown in Fig. 2.7. The maximum of both the upper and the lower bound is higher than the corresponding maxima in the previous two examples. This rate increase at large bandwidth is caused by the power gain due to statistical beamforming (Schuster et al., 2009). The impact of transmit correlation at small bandwidth is more difficult to characterize because the distance between upper and lower bound is larger than for the spatially uncorrelated case.

An observation of practical importance is that both $U_1^{\text{mimo}}$ and $L_1^{\text{mimo}}$ are rather flat over a large range of bandwidth values around their respective maxima. Further numerical results (not presented here) point at the following: (i) for smaller values of the channel spread $\Delta_H$, these maxima broaden and extend toward higher bandwidth; (ii) an increase in $\beta$ increases the gap between upper and lower bounds.

2.4 THE LARGE-BANDWIDTH REGIME: I/O RELATION WITH INTERFERENCE

So far, we based our analysis on the diagonalized I/O relation (2.18). The goal of this section is to determine how well the noncoherent capacity of (2.18) approximates that of the channel with ISI and ICI (2.14) in the large-bandwidth regime. The presence of interference makes the derivation of tight capacity bounds (in particular, upper bounds) technically challenging. Nevertheless, our analysis will be sufficient to establish that for a large range of bandwidth values of practical interest, the presence of interference does not change the noncoherent capacity behavior significantly, whenever the channel is underspread.

To establish this result, we derive a lower bound on the capacity of (2.14) by treating interference as noise. We then show that, whenever the channel is underspread, this lower bound, evaluated for an appropriately chosen root-raised-cosine WH set (see Example 2.1), is close for a large range of bandwidth values of practical interest both to the lower bound $L_1$ we derived in the previous section (for the diagonalized I/O relation (2.18)) and to the AWGN-capacity upper bound $C_{\text{awgn}}$.

To get an I/O relation in vector form, similar to the one we worked with in the previous section, we arrange the intersymbol and intercarrier interference terms $\{z[k',l',k,l]\}$ in (2.14) in a $KL \times KL$ matrix $\mathbf{Z}$ with entries

$$[\mathbf{Z}]_{(l'+k'L),\,(l+kL)} = \begin{cases} z[k',l',k,l], & \text{if } (k',l') \ne (k,l) \\ 0, & \text{otherwise.} \end{cases}$$

This definition, together with the definitions in Section 2.3, allows us to compactly express (2.14) as

$$\mathbf{r} = \mathbf{h} \odot \mathbf{s} + \mathbf{Z}\mathbf{s} + \mathbf{w}.$$
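The index mapping behind this vector form can be sketched as follows. The code places the interference term for the slot pair $(k',l')$, $(k,l)$ at row $l'+k'L$ and column $l+kL$, leaves the diagonal at zero, and then evaluates the compact I/O relation, reading the product of $\mathbf{h}$ and $\mathbf{s}$ as an elementwise product (an assumption, since its definition lies in Section 2.3). The interference kernel `toy_z` and both function names are hypothetical.

```python
def interference_matrix(z, K, L):
    """Arrange z(k', l', k, l) into the KL x KL matrix Z with a zero diagonal."""
    size = K * L
    Z = [[0j] * size for _ in range(size)]
    for kp in range(K):
        for lp in range(L):
            row = lp + kp * L
            for k in range(K):
                for l in range(L):
                    if (kp, lp) != (k, l):
                        Z[row][l + k * L] = z(kp, lp, k, l)
    return Z

def apply_channel(h, Z, s, w):
    """Evaluate r = h (elementwise) s + Z s + w for length-KL vectors."""
    n = len(s)
    return [h[i] * s[i] + sum(Z[i][j] * s[j] for j in range(n)) + w[i]
            for i in range(n)]

# Hypothetical interference kernel that decays with time-frequency distance.
toy_z = lambda kp, lp, k, l: 0.1 / (abs(kp - k) + abs(lp - l))
Z = interference_matrix(toy_z, K=2, L=3)
r = apply_channel([1.0] * 6, Z, [1.0] * 6, [0.0] * 6)
print(r)
```

Keeping the diagonal of $\mathbf{Z}$ at zero separates the desired term, carried by $\mathbf{h}$, from the interference that is later treated as noise.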

For a given WH set, the noncoherent capacity $C$ of the induced discretized channel (2.14) is defined as in (2.25). We next derive a lower bound on $C$. As in Section 2.3, we first illustrate the main steps in the derivation of the lower bound using a simplified setting.

2.4.1 A Lower Bound on Capacity

2.4.1.1 The Bounding Idea

We consider, for simplicity, a block-fading channel with block length 2 and I/O relation

$$\begin{aligned}
r[1] &= h\,s[1] + z\,s[2] + w[1] \\
r[2] &= h\,s[2] + z\,s[1] + w[2],
\end{aligned} \tag{2.40}$$


CHAPTER 6

Equalization of Time-Varying Channels

Philip Schniter (1), Sung-Jun Hwang (2), Sibasish Das (3), Arun P. Kannu (4)

(1) The Ohio State University, Columbus, OH, USA
(2) Qualcomm, Inc., Santa Clara, CA, USA
(3) Qualcomm, Inc., San Diego, CA, USA
(4) Indian Institute of Technology, Madras, Chennai, India

6.1 INTRODUCTION

As discussed in Chapter 1, the wireless communication channel can be modeled as a time-varying (TV) linear$^1$ system whose output is corrupted by additive noise. To reliably recover the transmitted information from the channel output, the receiver must address the effects of both linear distortion and additive noise. Although, in theory, the mitigation of linear distortion and additive noise should be done jointly, in practice the task is often partitioned into two tasks, equalization and decoding, in order to reduce implementation complexity.

Roughly speaking, equalization leverages knowledge of channel structure to mitigate the effects of the linear distortion, whereas decoding leverages knowledge of code structure to mitigate the channel’s additive noise component. The equalizer might be well informed about the channel (e.g., knowing the complete channel impulse response) or relatively uninformed (e.g., knowing only the maximum channel length). In some cases, knowledge of symbol structure (e.g., the symbol alphabet or, if applicable, the fact that the symbols have a constant modulus) is assumed to be in the domain of the equalizer, whereas in other cases, it is assumed to be in the domain of the decoder; because the equalizer and decoder work together to infer the transmitted information from the channel output, the role of equalization versus decoding is somewhat a matter of definition. For this chapter, however, we assume that exploitation of code structure is not in the domain of the equalizer.

Generally speaking, the output of the equalizer is a sequence of symbol (or bit) estimates which have been, to the best of the equalizer’s ability, freed of channel corruption. These estimates are then passed to the decoder for further refinement and final decision making. In so-called turbo equalization schemes (Douillard et al., 1995; Koetter, Singer, & Tuchler, 2004), the decoder passes refined soft bit estimates back to the equalizer for further refinement, and the equalizer passes further refined soft bit estimates to the decoder. The process is then iterated until the equalizer and decoder “agree” on the soft bit estimates. Note that the use of soft bit estimates implies that the equalizer treats the bits as (a priori) independent. Turbo equalization is illustrated in Fig. 6.1 and will be discussed in more detail later.

1 Some channels are better modeled as nonlinear, but such channels are not the focus of this book.



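[Figure 6.1: block diagram. Block labels: SISO equalizer, deinterleaver $\Pi^{-1}$, interleaver $\Pi$, SISO decoder; input $\mathbf{y}$; output "Estimates of coded bits"; signal labels $L_{c|y}[j]$, $L_{c,\text{ext}}[j]$, $L_c[j]$, $L'_{\text{pri}}[j]$, $L'_{\text{pos}}[j]$, $L'_{\text{ext}}[j]$.]

FIGURE 6.1

Soft-input soft-output (SISO) equalizer connected with a SISO decoder in a turbo configuration. Note the presence of deinterleaver $\Pi^{-1}$ fed by the extrinsic equalizer LLRs $L_{c,\text{ext}}[j] = L_{c|y}[j] - L_c[j]$, and interleaver $\Pi$ fed by the extrinsic decoder LLRs $L'_{\text{ext}}[j] = L'_{\text{pos}}[j] - L'_{\text{pri}}[j]$.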

The inputs to an equalizer depend on its design. So-called coherent equalizers are assumed to know the parameters describing the state of the TV linear system that they are trying to mitigate, or an estimate thereof. Typical examples of channel state parameters include impulse response coefficients or intercarrier interference coefficients. Coherent equalization requires the simultaneous operation of a channel estimator, whose main purpose is to provide accurate and up-to-date estimates of the TV channel state to the equalizer. Channel estimation is discussed in Chapter 4. The idea to separate channel estimation from equalization can be traced back to early work by Kailath (1960).

So-called noncoherent equalizers operate without explicit knowledge of the channel state, and therefore are not dependent on the implementation of a channel estimator. Noncoherent equalizers, however, are sometimes assumed to know the channel statistics (e.g., the scattering function) or an estimate thereof. In the case of a nonstationary channel, the statistics themselves would need to be tracked. In their most general form, noncoherent equalizers treat the channel parameters as “nuisance parameters” that complicate data estimation. In some cases, they explicitly estimate the channel state parameters in conjunction with the data (i.e., joint channel/symbol estimation), whereas in other cases they compute data estimates without ever computing a channel estimate.

The equalization of rapidly TV communication channels is much more challenging than the equalization of their time-invariant (or slowly TV) counterparts. This can be understood intuitively as follows. From the perspective of coherent equalization, a rapidly TV channel implies that the channel state is constantly changing, which implies that the equalizer must be constantly redesigned in order to stay well matched to the channel. From the perspective of noncoherent equalization, a rapidly TV channel has more degrees of freedom (over a given bandwidth and signaling epoch) than a slowly varying channel, and thus more nuisance parameters to contend with.

Beyond these intuitive considerations, there is another important reason why rapidly TV channels are more difficult to equalize than slowly TV ones. For time-invariant linear channels, information can be split up and transmitted in parallel on noninterfering subcarriers. In this case, equalization becomes a simple matter of adjusting the gain and phase on each received subcarrier. This is, in fact, the main idea behind multicarrier modulation schemes like orthogonal frequency division multiplexing (OFDM) (Cimini, 1985). For slowly TV channels, the same approach can be easily extended: to mimic a time-invariant channel, the OFDM symbol duration can be chosen shorter than the channel’s coherence time. But, as now explained, such an approach turns out to be impractical for rapidly TV channels. To prevent interference between adjacent OFDM symbols, guard intervals are typically inserted. For time-invariant or slowly TV channels, the loss in spectral efficiency due to the inclusion of these guards can be made small because the channel delay spread (and hence the guard interval) is much smaller than the channel coherence time (and hence the OFDM symbol length). For rapidly TV channels, the OFDM symbol length would need to be made extremely short, at which point the loss of spectral efficiency resulting from guard insertion would be severe. If one tried to optimize the modulation strategy, one would find that it is, in fact, impossible to prevent interference among the subcarriers without significant compromise in spectral efficiency (Strohmer & Beaver, 2003), a consequence of the Balian-Low theorem (Daubechies, 1992). To summarize: while the equalization of slowly TV channels can be trivialized through suitable choice of the transmission scheme, the equalization of rapidly TV channels cannot.

The remainder of this chapter is organized as follows. In Section 6.2, we outline the system model assumed throughout the chapter and detail the essential features that result from rapid channel time-variation. In Section 6.3, we describe coherent approaches to equalization of rapidly TV channels and, in Section 6.4, we describe noncoherent approaches. In Section 6.5, we conclude.

6.2 SYSTEM MODEL

We now outline the system model used in the remainder of the chapter. In this chapter, we focus on systems that use a single transmitter antenna and a single receiver antenna; multiantenna systems will be discussed in Chapter 8.

6.2.1 Basic Assumptions

As discussed in Chapter 1, the time-domain received sample r[n] can be written in terms of the transmitted sequence $(s[n])_{n\in\mathbb{Z}}$, the TV time-$n$ length-$M$ impulse response $(h[n,m])_{m=0}^{M-1}$, and the additive white Gaussian noise process $(w[n])_{n\in\mathbb{Z}}$ of variance $\sigma_w^2$ as follows:

\[ r[n] = \sum_{m=0}^{M-1} h[n,m]\, s[n-m] + w[n]. \tag{6.1} \]
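As an illustration of (6.1), the following is a minimal NumPy sketch rather than code from the original text. The function name `tv_channel_output`, the array layout `h[n, m]`, and the convention that s[n] is zero outside the supplied samples are assumptions made for the example.

```python
import numpy as np

def tv_channel_output(s, h, noise_var=0.0, rng=None):
    """Evaluate r[n] = sum_m h[n, m] * s[n - m] + w[n] as in (6.1).

    s : transmitted samples (assumed zero outside 0..len(s)-1)
    h : array of shape (num_outputs, M); row n is the impulse response at time n
    """
    s = np.asarray(s, dtype=complex)
    h = np.asarray(h, dtype=complex)
    num_outputs, M = h.shape
    r = np.zeros(num_outputs, dtype=complex)
    for n in range(num_outputs):
        for m in range(M):
            if 0 <= n - m < len(s):
                r[n] += h[n, m] * s[n - m]
    if noise_var > 0:
        rng = np.random.default_rng() if rng is None else rng
        # circular complex white Gaussian noise with total variance noise_var
        w = rng.normal(size=num_outputs) + 1j * rng.normal(size=num_outputs)
        r += np.sqrt(noise_var / 2) * w
    return r
```

When every row of `h` is identical, the channel is time-invariant and the noiseless output reduces to an ordinary linear convolution.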

In this chapter, we assume that the transmitted sequence $(s[n])_{n\in\mathbb{Z}}$ is generated from the finite-alphabet symbol sequence $(a[k])_{k\in\mathbb{Z}}$ using a generic finite-memory linear modulation scheme, and that the demodulated sequence $(y[k])_{k\in\mathbb{Z}}$ is generated from the received sequence $(r[n])_{n\in\mathbb{Z}}$ using a corresponding finite-memory linear demodulation scheme. Prior to modulation, the symbol sequence $(a[k])_{k\in\mathbb{Z}}$ is mapped from a coded-bit sequence $(c[j])_{j\in\mathbb{Z}}$, which is generated from an information-bit sequence $(b[i])_{i\in\mathbb{Z}}$ through rate-$R_c$ coding and interleaving. We denote the symbol alphabet by $\mathcal{A}$, its cardinality by $|\mathcal{A}|$, and the set of admissible symbol sequences (as allowed by coding/interleaving) by AA.



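For ease of notation, we find it convenient to assume block transmission with block length K, where the symbols

\[ \mathbf{a} \triangleq (a[0]\; a[1]\; \cdots\; a[K-1])^T \in \mathcal{A}^K \]

can be related to the demodulated channel outputs

\[ \mathbf{y} \triangleq (y[0]\; y[1]\; \cdots\; y[K-1])^T \in \mathbb{C}^K \]

through the matrix/vector equation

\[ \mathbf{y} = \underbrace{\boldsymbol{\Gamma}\mathbf{H}\mathbf{G}}_{\triangleq\,\mathbf{Q}}\,\mathbf{a} + \mathbf{z}. \tag{6.2} \]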

We note, however, that the block length K can be arbitrarily large and that the receiver might not be able to store/process the entire vector y. In (6.2), $\boldsymbol{\Gamma}$, $\mathbf{H}$, and $\mathbf{G}$ are matrix representations of the linear demodulation operator, the linear TV channel, and the linear modulation operator, respectively, and

\[ \mathbf{z} \triangleq (z[0]\; z[1]\; \cdots\; z[K-1])^T \in \mathbb{C}^K \]

represents the noise after demodulation. We note that the effective channel matrix $\mathbf{Q} = \boldsymbol{\Gamma}\mathbf{H}\mathbf{G} \in \mathbb{C}^{K\times K}$ represents the combined effects of modulation, channel propagation, and demodulation, and will be used extensively throughout the chapter. Finally, we collect, in the vector c, the $K\log_2|\mathcal{A}|$ coded bits that determine the K symbols in a. Note that, with a block length of K, we have AA $\subset \mathcal{A}^K$.

In writing (6.2), we have assumed that the K demodulated samples in y are sufficient for equalization/decoding of the K symbols in a (i.e., that $(y[k])_{k<0}$ and $(y[k])_{k\ge K}$ can be ignored), and that interblock interference (IBI) is negligible. These assumptions will be satisfied for any well-designed block transmission scheme. Furthermore, we will assume that the noise z, the symbols a, and the effective channel Q are mutually independent, and that (unless otherwise noted) the symbols a are zero-mean (i.e., $\boldsymbol{\mu}_a = \mathbf{0}$) and white (i.e., $\mathbf{C}_a = \sigma_a^2\mathbf{I}$; see footnote 2). Finally, it should be noted that the demodulated noise z is not assumed to be white unless otherwise noted; although $(w[n])_{n\in\mathbb{Z}}$ is white, the demodulation process does not necessarily guarantee white $(z[k])_{k\in\mathbb{Z}}$.

Throughout the chapter, we assume that the equalizer knows the symbol alphabet $\mathcal{A}$ but not the code structure, i.e., AA. Thus, the topic of joint equalization/decoding lies outside the scope of this chapter. Turbo equalization, where separate equalization and decoding steps are iterated (as illustrated in Fig. 6.1), will, however, be discussed.

6.2.2 The Structure of the Effective Channel Matrix Q

In block equalization, if it can be assumed that certain coefficients of Q will be negligible for nearly all realizations of Q, then it is reasonable to conclude that an equalizer that ignores these coefficients will perform nearly as well as an equalizer that incorporates them. However, the equalizer that ignores these coefficients may be significantly cheaper to implement, especially if the proportion of negligible coefficients is large. This is, in fact, the guiding principle behind the design of practical equalization algorithms for rapidly TV channels.

Footnote 2: Throughout the chapter, we use subscripted versions of C to denote covariance matrices.



FIGURE 6.2

Support region of (a) "widely quasibanded" and (b) "narrowly quasibanded" matrices. While M is often large (e.g., in the hundreds), D is usually very small (e.g., 1 or 2). [Figure not reproduced. Dimension labels appearing in the original panels: K, M, M − 1, D, and D + 1.]

FIGURE 6.3

Example of (a) a TV channel's propagation matrix and the corresponding effective channel matrices that result from (b) CP-SCM, (c) CP-OFDM with max-SINR receiver windowing (Schniter, 2004), and (d) CP-OFDM with rectangular receiver windowing. The dot size is proportional to the coefficient magnitude. [Figure not reproduced. Panel titles: (a) TV convolution, (b) Widely quasibanded, (c) Narrowly quasibanded, (d) Fully populated.]

Based on the characteristics of rapidly TV channels and commonly used modulation/demodulation schemes, we partition effective channel matrices Q into three classes based on the support region of non-negligible coefficients within the matrix: (1) widely quasibanded, (2) narrowly quasibanded, and (3) fully populated matrices. The support regions of widely quasibanded and narrowly quasibanded matrices are defined in Fig. 6.2, and illustrative examples of Q based on a randomly generated channel impulse response and several modulation/demodulation schemes are given in Fig. 6.3 (the construction of which will be detailed later). Note that we use the term "quasibanded" as opposed to "banded" due to the corner support regions in Fig. 6.2 (see footnote 3). Banded matrices, like that illustrated in Fig. 6.5(b) on page 248, will also be discussed in the sequel.

To understand how these patterns manifest in $\mathbf{Q} = \boldsymbol{\Gamma}\mathbf{H}\mathbf{G}$, we must consider the composite effect of linear modulation $\mathbf{G}$, propagation through the TV linear channel $\mathbf{H}$, and demodulation $\boldsymbol{\Gamma}$. As implied by (6.1), the channel propagation matrix $\mathbf{H}$ is a TV convolution matrix whose nth row contains the impulse response coefficients $(h[n,m])_{m=0}^{M-1}$. For example, Fig. 6.3(a) shows a TV channel propagation matrix for M = 8 that was randomly generated according to the WSSUS Jakes (Stuber, 2001) fading assumption with $\nu_{\max}T_s = 0.03$, where $\nu_{\max}$ denotes the maximum (single-sided) Doppler spread in Hz and $T_s$ the channel-use interval (i.e., the symbol period in a single-carrier system) in seconds. If the channel were time-invariant, the propagation matrix would have a Toeplitz structure. But here, since the channel is rapidly TV, each coefficient's magnitude varies smoothly along its diagonal of the propagation matrix. Given the construction of $\mathbf{H}$, the characteristics of $\mathbf{Q}$ will depend on the choices of $\mathbf{G}$ and $\boldsymbol{\Gamma}$ and their interaction with $\mathbf{H}$, as discussed next.

Footnote 3: Note that the one-corner support of the widely quasibanded matrix in Fig. 6.2(a) can be transformed into the two-corner support of the narrowly quasibanded matrix in Fig. 6.2(b) by simply rotating the columns of the former matrix right by M/2 places. Thus, the essential difference between these matrices is really the width of the support region (i.e., M versus 2D + 1).

6.2.2.1 Single-Carrier Modulation/Demodulation

For single-carrier modulation/demodulation schemes, $\mathbf{G}$ and $\boldsymbol{\Gamma}$ accomplish little more than insertion and removal of a guard interval (of length $N_g \ge M-1$). In this case, $\mathbf{Q}$ is created from the propagation matrix $\mathbf{H}$ by simply cutting the first $N_g$ columns of $\mathbf{H}$ out and superimposing them onto the last $N_g$ columns of $\mathbf{H}$. This operation was used, e.g., to create the widely quasibanded matrix in Fig. 6.3(b) from the TV convolution matrix in Fig. 6.3(a). More precisely, when $\mathbf{H}$ has dimensions $K \times (K+M-1)$, cyclic-prefixed single-carrier modulation (CP-SCM) (Falconer, Ariyavisitakul, Benyamin-Seeyar, & Eidson, 2002) uses

\[ \mathbf{G} = \begin{pmatrix} \mathbf{0} & \mathbf{I}_{M-1} \\ \mathbf{I}_{K-M+1} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_{M-1} \end{pmatrix} \quad\text{and}\quad \boldsymbol{\Gamma} = \mathbf{I}_K, \tag{6.3} \]

whereas zero-padded single-carrier modulation (ZP-SCM) (Wang, Ma, & Giannakis, 2004) uses a slightly different construction of $\mathbf{G}$, $\mathbf{H}$, and $\boldsymbol{\Gamma}$ that results in an equivalent $\mathbf{Q}$ matrix. We consider the effective channel matrix generated from SCM to be widely quasibanded because M, the width of the non-negligible band in Q, is typically large: because $M \triangleq \lceil \tau_{\max}/T_s \rceil$ is the discrete delay spread of the channel, it is not unusual for M to be in the hundreds (e.g., delay spread $\tau_{\max} = 20\ \mu\text{s}$ and bandwidth $1/T_s = 10$ MHz yield M = 200). Although small-M applications do exist, they yield equalization problems that are not very challenging, and hence not very interesting, especially in the coherent setting. Hence, we focus on the case of large M.
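The CP-SCM construction in (6.3) can be sketched numerically as follows. This is a hedged illustration, not the authors' code: the function name `cp_scm_effective_channel`, the guard length $N_g = M-1$, and the column indexing of $\mathbf{H}$ (row n holds h[n, m] in column n + M − 1 − m) are assumptions chosen for the example.

```python
import numpy as np

def cp_scm_effective_channel(h, K):
    """Build Q = Gamma @ H @ G for CP-SCM with Gamma = I_K, as in (6.3).

    h : array of shape (K, M); h[n, m] is the m-th tap at received sample n
        (sample indices counted after prefix removal; an assumed convention).
    """
    h = np.asarray(h, dtype=complex)
    M = h.shape[1]
    # K x (K + M - 1) TV convolution matrix acting on the prefixed block
    H = np.zeros((K, K + M - 1), dtype=complex)
    for n in range(K):
        for m in range(M):
            H[n, n + M - 1 - m] = h[n, m]
    # G prepends the last M - 1 symbols (cyclic prefix) to the K-symbol block
    G = np.vstack([
        np.hstack([np.zeros((M - 1, K - M + 1)), np.eye(M - 1)]),
        np.eye(K),
    ])
    return H @ G
```

Under these assumptions, the resulting matrix has entry h[n, m] at row n and column (n − m) mod K, which gives an M-wide band below the main diagonal plus a wrapped corner, i.e., the widely quasibanded support.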

6.2.2.2 Time-Frequency Concentrated Modulation/Demodulation

The effect, on the transmitted signal {s[n]}, of propagation through the linear TV channel {h[n,m]} can be understood as simultaneous delay and Doppler spreading. Thus, if each symbol a[k] is modulated on a time-frequency concentrated waveform $\mathbf{g}_k \triangleq (g_k[0]\, \cdots\, g_k[N-1])^T$ for suitable N (see footnote 4), so that

\[ s[n] = \sum_{k=0}^{K-1} a[k]\, g_k[n] \quad\text{for } n = 0, \dots, N-M, \tag{6.4} \]

where $\mathbf{g}_k$ is sufficiently "isolated" from the other waveforms $\{\mathbf{g}_{k'}\}_{k'\neq k}$ in the time-frequency domain, then propagation through the delay/Doppler spreading channel should cause only mild interference between these {a[k]}. Extraction of the kth symbol's contribution from the received signal {r[n]} would then be accomplished through the linear demodulation operation

\[ y[k] = \sum_{n=0}^{N-1} r[n]\, \gamma_k^*[n] \quad\text{for } k = 0, \dots, K-1, \tag{6.5} \]

for $\boldsymbol{\gamma}_k = (\gamma_k[0]\, \cdots\, \gamma_k[N-1])^T$ concentrated at the same time and frequency as $\mathbf{g}_k$. This is the main idea behind pulse-shaped multicarrier schemes (Bolcskei, 2002; Das & Schniter, 2007; Haas & Belfiore, 1997; Kozek & Molisch, 1998; Le Floch, Alard, & Berrou, 1995; Matheus & Kammeyer, 1997; Matz, Schafhuber, Grochenig, Hartmann, & Hlawatsch, 2007; Rugini, Banelli, & Leus, 2006; Schniter, 2004; Strohmer & Beaver, 2003) as well as Slepian schemes (Sigloch, Andrews, Mitra, & Thomson, 2005).

Footnote 4: If N exceeds the time period between consecutive block transmissions, then interblock interference (IBI) can result. In this case, the model (6.2) can be generalized to $\mathbf{y} = \mathbf{Q}\mathbf{a} + \mathbf{Q}_{\text{pre}}\mathbf{a}_{\text{pre}} + \mathbf{Q}_{\text{pst}}\mathbf{a}_{\text{pst}} + \mathbf{z}$, where $\mathbf{Q}_{\text{pre}}\mathbf{a}_{\text{pre}}$ accounts for precursor IBI and $\mathbf{Q}_{\text{pst}}\mathbf{a}_{\text{pst}}$ accounts for postcursor IBI. The IBI can be made negligible, however, with suitable design of modulation/demodulation pulses $\{\mathbf{g}_k\}$ and $\{\boldsymbol{\gamma}_k\}$.

With suitably designed modulation/demodulation waveforms $\{\mathbf{g}_k\}$ and $\{\boldsymbol{\gamma}_k\}$, the combined channel matrix Q under (6.4)–(6.5) can be ensured to have the narrowly quasibanded structure illustrated in Fig. 6.2(b). There, D can be interpreted as the (single-sided) discrete Doppler spread of the effective channel and 2D + 1 can be recognized as the width of the non-negligible interference band. Typically D is chosen as

\[ D = \lceil \nu_{\max} T_s K + D_0 \rceil, \tag{6.6} \]

where $D_0$ is a small non-negative constant (e.g., $0 \le D_0 \le 2$ for a well-designed modulation/demodulation scheme), as discussed in the sequel. We can see that Q will be narrowly quasibanded, so that $2D+1 \ll M$, by plugging the typical block-length choice of K = 4M into (6.6) and then using the definition $M = \tau_{\max}/T_s$ to see that (Hwang & Schniter, 2006)

\[ D \le \lceil 4\nu_{\max}\tau_{\max} \rceil + \lceil D_0 \rceil \tag{6.7} \]

\[ \phantom{D} = 1 + \lceil D_0 \rceil \quad\text{when } 0 < 2\nu_{\max}\tau_{\max} \le 0.5. \tag{6.8} \]

The quantity $2\nu_{\max}\tau_{\max}$, sometimes referred to as the "spreading index," describes the total severity of delay-Doppler spreading. The boundary between underspread and overspread channels occurs at $2\nu_{\max}\tau_{\max} = 1$, and it can be safely assumed that $2\nu_{\max}\tau_{\max} \ll 1$ for practical applications. Thus, from (6.8), we conclude that the width of the non-negligible coefficient band is $2D+1 \le 3 + 2\lceil D_0 \rceil$ when suitable modulation/demodulation waveforms are used. In summary, $2D+1 \ll M$ is a reasonable claim for the values of M that are of interest in this chapter.
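The rule (6.6) is easy to evaluate for concrete parameters. The helper below is a small sketch; the function name `interference_band` and its return convention are assumptions made for illustration.

```python
import math

def interference_band(nu_max_Ts, K, D0=0.0):
    """Return (D, 2D + 1) with D = ceil(nu_max * Ts * K + D0), as in (6.6)."""
    D = math.ceil(nu_max_Ts * K + D0)
    return D, 2 * D + 1

# Parameters quoted for Fig. 6.4 in the text: nu_max * Ts = 0.003, K = 64, D0 = 0
D, width = interference_band(0.003, 64)
```

With the Fig. 6.4 parameters, $\nu_{\max}T_sK = 0.192$, so D = 1 and the band is 3 coefficients wide, which is small compared with M = 16 in that example.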

As an example, Fig. 6.3(c) shows Q constructed through cyclic-prefixed orthogonal frequency division multiplexing (CP-OFDM) (Cimini, 1985) with max-SINR receiver pulse-shaping (Schniter, 2004) using the TV convolution matrix shown in Fig. 6.3(a). Although the channel has an extremely high spreading index of $2\nu_{\max}\tau_{\max} = 0.8$, all coefficients in Q outside of the 3-wide band are negligible. As another example, Fig. 6.4 shows $E\{|[\mathbf{Q}]_{k,k+d}|^2\}$ (in dB) versus d for several modulation/demodulation schemes and a channel with a spreading index of 0.1. ($E\{|[\mathbf{Q}]_{k,k+d}|^2\}$ is invariant to k.) As can be seen in Fig. 6.4, the JOMS scheme from Das & Schniter (2007) suppresses coefficients outside the band of radius $D = \lceil \nu_{\max}T_sK \rceil = 1$ by at least 44 dB.

At this point, we make one final observation about a narrowly quasibanded matrix Q. If we upper-triangularize Q, e.g., through the QR decomposition $\mathbf{Q} = \mathbf{V}\bar{\mathbf{Q}}$ where $\mathbf{V}$ is unitary and $\bar{\mathbf{Q}}$ is upper triangular, then $\bar{\mathbf{Q}}$ will have the "V-shaped" structure shown in Fig. 6.5(c) on page 248. Such upper-triangularization of Q occurs before decision feedback and tree-search based equalization, as discussed in Section 6.3.2.



FIGURE 6.4

For the IBI model $\mathbf{y} = \mathbf{Q}\mathbf{a} + \mathbf{Q}_{\text{pre}}\mathbf{a}_{\text{pre}} + \mathbf{Q}_{\text{pst}}\mathbf{a}_{\text{pst}} + \mathbf{z}$, with K = 64, $N_g$ = 15, M = 16, $\nu_{\max}T_s = 0.003$, and WSSUS Jakes fading (see Chapter 1), subplot (b) shows the mean-square value (in decibel) of a coefficient in $\mathbf{Q}$ versus its distance "d" from the main diagonal of $\mathbf{Q}$, while subplots (a) and (c) show the same for the coefficients in $\mathbf{Q}_{\text{pre}}$ and $\mathbf{Q}_{\text{pst}}$, respectively. The dashed vertical line indicates $D = \lceil \nu_{\max}T_sK \rceil$. JOMS refers to Das & Schniter's joint transmitter/receiver optimization max-SINR scheme (Das & Schniter, 2007) while S-OFDM refers to Strohmer and Beaver's orthogonal scheme (Strohmer & Beaver, 2003). [Figure not reproduced. Each panel plots values from −100 to 10 (dB) against d from 0 to 15; legend: JOMS, CP-OFDM, ZP-OFDM, S-OFDM.]

6.2.2.3 Other Modulation/Demodulation Schemes

When the modulation and demodulation pulses $\{\mathbf{g}_k\}_{k=0}^{K-1}$ and $\{\boldsymbol{\gamma}_k\}_{k=0}^{K-1}$ are not designed to curb the effects of delay/Doppler spreading, the support of non-negligible coefficients within Q can be widespread, to the point where Q must be considered as fully populated. Examples of such modulation/demodulation schemes include the wavelet-based schemes (Martone, 2000; Wornell, 1996), the chirp-based schemes (Barbarossa & Torti, 2001; Kannu & Schniter, 2008; Martone, 2001), a scheme designed to maximize a lower bound on capacity (Yun, Chung, & Lee, 2007), and the diversity maximizing schemes (Hwang & Schniter, 2007; Ma & Giannakis, 2003).

Even popular multicarrier schemes like CP-OFDM, when used with a rectangular receiver pulse, yield a near-fully populated Q when the channel varies rapidly enough. Figure 6.3 shows this by example: the effective channel matrix in Fig. 6.3(d) was constructed from the TV convolution matrix in Fig. 6.3(a) through standard CP-OFDM. Notice that the non-negligible coefficients in Q are not all located in the central band of the matrix. Figure 6.4 shows a similar phenomenon: for CP-OFDM, $E\{|[\mathbf{Q}]_{k,k+d}|^2\}$ decays very slowly with d, the distance from the main diagonal of Q.

6.3 COHERENT EQUALIZATION

In this section, we focus on coherent equalization, i.e., equalization under the assumption that the channel matrix H, and thus the effective channel matrix Q, is known. The noncoherent case will be discussed in Section 6.4.

In Section 6.3.1, we discuss several criteria (i.e., notions of optimality) under which coherent equalizers are designed, and, in Section 6.3.2, we describe classical equalization algorithms for generic Q. Then, in Sections 6.3.3–6.3.4, we focus on coherent equalization techniques for the specific types of Q anticipated for rapidly TV channels in Section 6.2.2.

6.3.1 Coherent Equalization Criteria

Referring to (6.2), the goal of coherent block equalization is estimation of the symbol vector a, or the corresponding coded-bit vector c, from the linearly distorted and noisy demodulator output vector y, assuming knowledge of the channel matrix Q and the noise statistics. Note that this may or may not include a hard-decision or quantization step, as explained later. In any case, we are fundamentally interested in identifying the "optimal" method to generate these symbol estimates. The answer, however, depends on how optimality is defined, i.e., which equalization criterion is used.

In organizing the criteria that are most often used for equalizer design, it helps to consider how the equalizer outputs will be used by the receiver (e.g., by the decoder).

6.3.1.1 Hard Symbol or Bit Estimates

If there is no decoder or if the decoder wants hard estimates of the symbols or bits, then the goal is to produce a finite-alphabet estimate $\hat{\mathbf{a}} \in \mathcal{A}^K$. (Recall that the equalizer is assumed to know the symbol alphabet $\mathcal{A}$ but not the set of coded symbol sequences AA.)

Maximum a posteriori (MAP) sequence detection (SD) (Poor, 1994) minimizes the probability of sequence error. By definition, the MAPSD estimate is

\[ \hat{\mathbf{a}}_{\text{cMAPSD}} \triangleq \arg\max_{\mathbf{a}' \in \mathcal{A}^K} \Pr\{\mathbf{a} = \mathbf{a}' \mid \mathbf{y}, \mathbf{Q}\}. \tag{6.9} \]

In (6.9), we use the notation "cMAPSD" to emphasize that this is the coherent version of the MAP criterion applied to sequence detection. In contrast, coherent MAP symbol and bit detection takes the form

\[ \hat{a}_{\text{cMAP}}[k] \triangleq \arg\max_{a \in \mathcal{A}} \Pr\{a[k] = a \mid \mathbf{y}, \mathbf{Q}\} \quad\text{for } k \in \{0, \dots, K-1\} \tag{6.10} \]

\[ \hat{c}_{\text{cMAP}}[j] \triangleq \arg\max_{c \in \{0,1\}} \Pr\{c[j] = c \mid \mathbf{y}, \mathbf{Q}\} \quad\text{for } j \in \{0, \dots, K\log_2|\mathcal{A}| - 1\}. \tag{6.11} \]

In writing (6.9)–(6.11), we have treated the channel Q as a random quantity.



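If we assume that each of the symbol sequences in $\mathcal{A}^K$ has equal prior probability, i.e., $\Pr\{\mathbf{a} = \mathbf{a}'\} = 1/|\mathcal{A}|^K\ \forall \mathbf{a}' \in \mathcal{A}^K$, then coherent MAPSD reduces to coherent maximum likelihood (ML) SD (Poor, 1994):

\[ \hat{\mathbf{a}}_{\text{cMLSD}} \triangleq \arg\max_{\mathbf{a} \in \mathcal{A}^K} f(\mathbf{y} \mid \mathbf{a}, \mathbf{Q}), \tag{6.12} \]

where $f(\mathbf{y} \mid \mathbf{a}, \mathbf{Q})$ denotes the probability density function of y conditioned on a and Q, also known as the likelihood function. To see this, notice from Bayes rule that

\[ \Pr\{\mathbf{a} = \mathbf{a}' \mid \mathbf{y}, \mathbf{Q}\} = \frac{f(\mathbf{y} \mid \mathbf{a}', \mathbf{Q}) \Pr\{\mathbf{a} = \mathbf{a}' \mid \mathbf{Q}\}}{f(\mathbf{y} \mid \mathbf{Q})} = \frac{1}{|\mathcal{A}|^K} \frac{f(\mathbf{y} \mid \mathbf{a}', \mathbf{Q})}{f(\mathbf{y} \mid \mathbf{Q})}, \tag{6.13} \]

from which it becomes clear that maximizing $\Pr\{\mathbf{a} = \mathbf{a}' \mid \mathbf{y}, \mathbf{Q}\}$ over $\mathbf{a}'$ is equivalent to maximizing $f(\mathbf{y} \mid \mathbf{a}', \mathbf{Q})$ over $\mathbf{a}'$. Due to our assumption of zero-mean Gaussian noise with covariance $\mathbf{C}_z$, we have $f(\mathbf{y} \mid \mathbf{a}', \mathbf{Q}) = \frac{1}{\pi^K \det\{\mathbf{C}_z\}} \exp\bigl(-\|\mathbf{y} - \mathbf{Q}\mathbf{a}'\|^2_{\mathbf{C}_z^{-1}}\bigr)$, so that coherent MLSD reduces to

\[ \hat{\mathbf{a}}_{\text{cMLSD}} = \arg\min_{\mathbf{a} \in \mathcal{A}^K} \|\mathbf{y} - \mathbf{Q}\mathbf{a}\|^2_{\mathbf{C}_z^{-1}}. \tag{6.14} \]

Above, we used the quadratic-form notation $\|\mathbf{z}\|^2_{\mathbf{A}} \triangleq \mathbf{z}^H \mathbf{A} \mathbf{z}$, where $\mathbf{A}$ is any positive semidefinite Hermitian matrix.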
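To make the cost of (6.14) concrete, the following sketch evaluates the MLSD metric by exhaustive enumeration. It is an illustrative brute-force search only (the function name `mlsd_brute_force` and the default of white noise when no covariance is given are assumptions), and it visits all $|\mathcal{A}|^K$ candidates, so it is usable only for very small K.

```python
import itertools
import numpy as np

def mlsd_brute_force(y, Q, alphabet, Cz=None):
    """Exhaustive search for argmin_a ||y - Q a||^2 weighted by inv(Cz), cf. (6.14)."""
    y = np.asarray(y, dtype=complex)
    K = Q.shape[1]
    # Assumption for the sketch: white noise (identity weighting) if Cz is omitted
    Cz_inv = np.eye(len(y)) if Cz is None else np.linalg.inv(Cz)
    best, best_metric = None, np.inf
    for cand in itertools.product(alphabet, repeat=K):
        a = np.array(cand, dtype=complex)
        e = y - Q @ a
        metric = np.real(e.conj() @ Cz_inv @ e)
        if metric < best_metric:
            best, best_metric = a, metric
    return best
```

For a BPSK alphabet and K = 10 this already requires 1024 metric evaluations, which illustrates why the structured algorithms discussed later are needed.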

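6.3.1.2 Complex-Field Symbol Estimates

If the decoder prefers or tolerates complex-valued symbol estimates, rather than finite-alphabet symbol estimates, then one can consider equalization schemes that yield $\hat{\mathbf{a}} \in \mathbb{C}^K$. Note, however, that we still assume $\mathbf{a} \in \mathcal{A}^K$.

A popular criterion for this case is minimum mean-squared error (MMSE) (Poor, 1994). The coherent unconstrained MMSE sequence estimate is defined as

\[ \hat{\mathbf{a}}_{\text{cMMSE}} \triangleq \arg\min_{\mathbf{a}' \in \mathbb{C}^K} E\{\|\mathbf{a} - \mathbf{a}'\|^2 \mid \mathbf{y}, \mathbf{Q}\}. \tag{6.15} \]

Because the MMSE estimate equals the conditional mean (Poor, 1994), we have

\[ \hat{\mathbf{a}}_{\text{cMMSE}} = E\{\mathbf{a} \mid \mathbf{y}, \mathbf{Q}\} = \sum_{\mathbf{a} \in \mathcal{A}^K} \mathbf{a}\, p(\mathbf{a} \mid \mathbf{y}, \mathbf{Q}) \tag{6.16} \]

\[ \phantom{\hat{\mathbf{a}}_{\text{cMMSE}}} = \frac{\sum_{\mathbf{a} \in \mathcal{A}^K} \mathbf{a}\, f(\mathbf{y} \mid \mathbf{a}, \mathbf{Q})\, p(\mathbf{a})}{\sum_{\mathbf{a}' \in \mathcal{A}^K} f(\mathbf{y} \mid \mathbf{a}', \mathbf{Q})\, p(\mathbf{a}')}. \tag{6.17} \]

If we assume that $p(\mathbf{a})$ is uniform over $\mathcal{A}^K$, then

\[ \hat{\mathbf{a}}_{\text{cMMSE}} = \frac{\sum_{\mathbf{a} \in \mathcal{A}^K} \mathbf{a} \exp\bigl(-\|\mathbf{y} - \mathbf{Q}\mathbf{a}\|^2_{\mathbf{C}_z^{-1}}\bigr)}{\sum_{\mathbf{a}' \in \mathcal{A}^K} \exp\bigl(-\|\mathbf{y} - \mathbf{Q}\mathbf{a}'\|^2_{\mathbf{C}_z^{-1}}\bigr)}. \tag{6.18} \]

Notice from (6.18) that the finite-alphabet nature of a makes the conditional mean difficult to evaluate because it requires the evaluation of $|\mathcal{A}|^K$ terms.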
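The conditional mean (6.18) can be written down directly for tiny problems. The sketch below is a hedged illustration (the function name `mmse_conditional_mean` and the identity default for the noise covariance are assumptions); it enumerates every candidate sequence and is therefore exponential in K, exactly as noted above.

```python
import itertools
import numpy as np

def mmse_conditional_mean(y, Q, alphabet, Cz=None):
    """Evaluate (6.18) by enumerating all |A|^K sequences (uniform prior)."""
    y = np.asarray(y, dtype=complex)
    K = Q.shape[1]
    Cz_inv = np.eye(len(y)) if Cz is None else np.linalg.inv(Cz)
    # One candidate sequence per row: shape (|A|^K, K)
    cands = np.array(list(itertools.product(alphabet, repeat=K)), dtype=complex)
    resid = y[None, :] - cands @ Q.T
    # Quadratic form e^H Cz^{-1} e for every candidate
    metrics = np.real(np.einsum('ij,jk,ik->i', resid.conj(), Cz_inv, resid))
    # Subtract the minimum before exponentiating for numerical stability
    weights = np.exp(-(metrics - metrics.min()))
    weights /= weights.sum()
    return weights @ cands
```

At high SNR the weights concentrate on the ML sequence, so the output approaches a finite-alphabet point; at very low SNR the weights become nearly uniform and the output approaches the alphabet mean.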

To reduce complexity, the MMSE criterion is often used in conjunction with particular constraints on how the symbol estimates are generated from y. The most common examples are MMSE linear equalization (6.22) and MMSE decision feedback equalization (6.27). Note that, if one assumes that $\mathbf{a} \mid \mathbf{y}, \mathbf{Q}$ is Gaussian distributed, then the unconstrained MMSE estimator (6.15) itself becomes a linear function of y (Poor, 1994).

As the signal-to-noise ratio (SNR) increases, the effect of linear channel distortion overwhelms that of additive noise, motivating the so-called zero-forcing (ZF) criterion. Effectively, ZF equalizers "invert" the effect of the linear channel distortion while ignoring the presence of additive channel noise. The most common examples are ZF linear equalization and ZF decision feedback equalization, both described in Section 6.3.2. In the absence of additive noise, ZF equalizers are equivalent to their MMSE counterparts.

6.3.1.3 Soft Bit Estimates

If the decoder prefers soft bit estimates, then the goal is to produce reliability information on each of the coded bits in c. Typically, bit reliabilities are expressed in the form of a log likelihood ratio (LLR) for each bit. The goal of coherent equalization, thus, becomes the computation of the coherent posterior LLRs (see footnote 5)

\[ L_{c \mid \mathbf{y}, \mathbf{Q}}[j] \triangleq \ln \frac{\Pr\{c[j] = 1 \mid \mathbf{y}, \mathbf{Q}\}}{\Pr\{c[j] = 0 \mid \mathbf{y}, \mathbf{Q}\}} \quad\text{for } j = 0, \dots, K\log_2|\mathcal{A}| - 1, \tag{6.19} \]

given the a priori LLRs

\[ L_c[j] \triangleq \ln \frac{\Pr\{c[j] = 1\}}{\Pr\{c[j] = 0\}} \quad\text{for } j = 0, \dots, K\log_2|\mathcal{A}| - 1. \tag{6.20} \]

When nothing is a priori known about the bit c[j], the value $L_c[j] = 0$ is used. Nonzero a priori LLRs are used, e.g., when the equalizer is fed by the outputs of a soft decoder, as in turbo equalization (see Fig. 6.1), or when certain bits are known pilots. If c[j] is a pilot (or otherwise known with complete confidence), then $L_c[j] = \pm\infty$. Recall that the use of a priori LLRs implies that the equalizer treats the coded bits as independent.

Hard MAP bit estimates can be generated by quantizing the posterior LLRs as follows:

\[ \hat{c}_{\text{cMAP}}[j] = \frac{1}{2}\bigl(1 + \operatorname{sign}(L_{c \mid \mathbf{y}, \mathbf{Q}}[j])\bigr). \tag{6.21} \]
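The posterior LLRs (6.19) and the hard decisions (6.21) can be sketched by brute force for a toy setting. The example below assumes BPSK with the mapping c = 1 to +1 and c = 0 to −1, white noise of variance `noise_var`, and finite a priori LLRs; these choices, and the function names, are assumptions made for illustration and are not taken from the text.

```python
import itertools
import numpy as np

def posterior_llrs_bpsk(y, Q, noise_var, prior_llrs=None):
    """Brute-force posterior LLRs (6.19) for BPSK, one coded bit per symbol."""
    y = np.asarray(y, dtype=complex)
    K = Q.shape[1]
    prior_llrs = np.zeros(K) if prior_llrs is None else np.asarray(prior_llrs, float)
    bits = np.array(list(itertools.product([0, 1], repeat=K)))  # (2^K, K)
    symbols = 2.0 * bits - 1.0                                   # assumed mapping
    resid = y[None, :] - symbols @ Q.T
    log_lik = -np.sum(np.abs(resid) ** 2, axis=1) / noise_var
    # Independent bits: log prior equals sum_j c[j] * L_c[j] up to a constant
    log_post = log_lik + bits @ prior_llrs
    llrs = np.empty(K)
    for j in range(K):
        llrs[j] = (np.logaddexp.reduce(log_post[bits[:, j] == 1])
                   - np.logaddexp.reduce(log_post[bits[:, j] == 0]))
    return llrs

def hard_bits(llrs):
    """Quantize LLRs as in (6.21); an LLR of exactly zero is mapped to bit 0 here."""
    return ((1 + np.sign(llrs)) / 2).astype(int)
```

The enumeration over $2^K$ bit patterns is again exponential in K, so this serves only to illustrate the definitions.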

6.3.2 Coherent Equalization Tools

The coherent equalization criteria discussed in Section 6.3.1 each describe a particular goal for equalization, but not how equalization would be practically implemented. For example, the MAPSD, MLSD, and (unconstrained) MMSE estimates described in Section 6.3.1 require the evaluation of $O(|\mathcal{A}|^K)$ metrics if computed through brute force, which is not practical for typical values of K. In this section, we review classical equalization implementations whose designs are guided by the various criteria in Section 6.3.1.

Footnote 5: Sometimes, it is more practical to calculate posteriors using only a limited number of (say J) future observations (Li, Vucetic, & Sato, 1995). In this so-called "fixed lag" case, the conditioning in (6.19) is performed on $\bigl(y[0], \dots, y\bigl[\lceil j/\log_2|\mathcal{A}| \rceil + J\bigr]\bigr)^T$ instead of y.



FIGURE 6.5

Support region of (a) "quasibanded," (b) "banded," and (c) "V-shaped" matrices. From quasibanded (a), banded (b) is obtained by deleting the first and last D columns, while V-shaped (c) is obtained by upper triangularization. [Figure not reproduced. Dimension labels appearing in the original panels: K, D, D + 1, 2D, 2D + 1, and N − 2D.]

6.3.2.1 Trellis-Based Equalization

Trellis methods can be used to implement MLSD and MAP equalization when Q is a banded matrix. As illustrated in Fig. 6.5, a banded matrix differs from its quasibanded counterpart because of the lack of corner elements. A banded matrix manifests when, e.g., the first and last few elements of a are known or zero-valued (see footnote 6). If Q is a banded matrix with a 2D + 1 wide band, then the Viterbi algorithm (Forney, 1972) can perform MAPSD/MLSD equalization using $O(KD|\mathcal{A}|^{2D+1})$ operations. Similarly, the forward–backward (or BCJR) algorithm (Bahl, Cocke, Jelinek, & Raviv, 1974) can be used to accomplish MAP symbol/bit equalization with a complexity of $O(KD|\mathcal{A}|^{2D+1})$ operations (Forney, 1973, Appendix). Lower-complexity trellis-based approximate MAP equalizers include fixed-lag approaches (Li et al., 1995) and the soft-output Viterbi algorithm (SOVA) (Hagenauer & Hoeher, 1989). In all cases, the complexity is linear in the block length K and exponential in the effective channel length 2D + 1. Thus, these techniques will be practical if and only if 2D + 1 is very small.

Trellis methods can be modified to work on quasibanded Q using, e.g., a "tail-biting" approach. Here, from an arbitrary location within the block, the Viterbi algorithm is initialized from each of the possible $|\mathcal{A}|^{2D+1}$ states and forced to terminate in the same state; the initialization leading to the optimum sequence metric is then chosen. This approach requires running the Viterbi algorithm $|\mathcal{A}|^{2D+1}$ times, for a total cost of $O(KD|\mathcal{A}|^{4D+2})$ operations.
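To get a feel for these orders of growth, the quoted expressions can be evaluated for sample parameters. The helper below is only a rough sketch: the function name `equalizer_op_counts` is hypothetical, and the returned numbers drop all constant factors hidden by the O(·) notation, so they indicate scaling rather than actual operation counts.

```python
def equalizer_op_counts(K, D, alphabet_size):
    """Order-of-magnitude figures from the complexity expressions in the text."""
    return {
        "brute_force": alphabet_size ** K,                       # O(|A|^K)
        "viterbi_banded": K * D * alphabet_size ** (2 * D + 1),  # O(K D |A|^(2D+1))
        "tail_biting": K * D * alphabet_size ** (4 * D + 2),     # O(K D |A|^(4D+2))
    }

# Example: K = 64 symbols, D = 1, and a 4-point alphabet (e.g., QPSK)
counts = equalizer_op_counts(64, 1, 4)
```

For this example the banded Viterbi figure is 4096 and the tail-biting figure is 262144, whereas the brute-force figure is $4^{64}$.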

Footnote 6: More precisely, consider the system model (6.2). If Q is as illustrated in Fig. 6.5(a) and the last M − 1 elements of a are zero-valued, then we can write $\mathbf{y} = \bar{\mathbf{Q}}\bar{\mathbf{a}} + \mathbf{z}$ where $\bar{\mathbf{a}} = (a[0]\, \cdots\, a[K-M])^T$ and where $\bar{\mathbf{Q}}$ is a banded matrix (as illustrated in Fig. 6.5(b)) with an M-wide band. Or, if Q is as illustrated in Fig. 6.5(b) and the first and last D elements of a are zero-valued, then we can write $\mathbf{y} = \bar{\mathbf{Q}}\bar{\mathbf{a}} + \mathbf{z}$ where $\bar{\mathbf{a}} = (a[D]\, \cdots\, a[K-D-1])^T$ and where $\bar{\mathbf{Q}}$ is a banded matrix (as illustrated in Fig. 6.5(b)) with a 2D + 1 wide band.

6.3.2.2 Linear Equalization

In linear equalization, the symbol estimates are a linear function of the observation y, i.e.,

\[ \hat{\mathbf{a}}_{\text{LIN}} = \mathbf{E}\mathbf{y}, \tag{6.22} \]

for a suitably chosen matrix $\mathbf{E} \in \mathbb{C}^{K\times K}$. In some cases, such as when it is impractical to process the entire observation y at once, additional constraints are placed on E. Because linear equalization ignores the finite-alphabet property of a, its performance is generally much worse than that of techniques that leverage the finite-alphabet property.

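The coherent linear MMSE (LMMSE) equalizer (see footnote 7) uses, for E in (6.22),

\[ \mathbf{E}_{\text{LMMSE}} \triangleq \arg\min_{\mathbf{E} \in \mathbb{C}^{K\times K}} E\{\|\mathbf{a} - \hat{\mathbf{a}}_{\text{LIN}}\|^2 \mid \mathbf{Q}\}. \tag{6.23} \]

Given the symbol and noise statistics assumed in Section 6.2, it can be shown (Verdu, 1998) that

\[ \mathbf{E}_{\text{LMMSE}} = \mathbf{Q}^H(\mathbf{Q}\mathbf{Q}^H + \sigma_a^{-2}\mathbf{C}_z)^{-1} \tag{6.24} \]

\[ \phantom{\mathbf{E}_{\text{LMMSE}}} = (\mathbf{Q}^H\mathbf{C}_z^{-1}\mathbf{Q} + \sigma_a^{-2}\mathbf{I}_K)^{-1}\mathbf{Q}^H\mathbf{C}_z^{-1}. \tag{6.25} \]

The matrix inversion lemma (see footnote 8) can be used to relate (6.24) and (6.25). The linear ZF (LZF) estimator uses (6.22) with E set to

\[ \mathbf{E}_{\text{LZF}} = \mathbf{Q}^{-1}, \tag{6.26} \]

assuming that Q is invertible. When Q is not invertible, the LZF equalizer is said not to exist.

Due to the matrix inversions in (6.24)–(6.26), the complexity of LMMSE and LZF equalization is $O(K^3)$, which is much less than the $O(|\mathcal{A}|^K)$ complexity of unconstrained MMSE estimation in (6.18). Still, $O(K^3)$ may be impractical when K is large.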
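The equivalence of (6.24) and (6.25) can be checked numerically. The sketch below is illustrative only (the function name `lmmse_equalizers` is hypothetical); it uses explicit matrix inverses for readability, whereas a practical implementation would normally solve linear systems instead.

```python
import numpy as np

def lmmse_equalizers(Q, Cz, sigma_a2=1.0):
    """Return E_LMMSE computed via (6.24) and via (6.25)."""
    K = Q.shape[1]
    QH = Q.conj().T
    # (6.24): Q^H (Q Q^H + sigma_a^{-2} Cz)^{-1}
    E_624 = QH @ np.linalg.inv(Q @ QH + Cz / sigma_a2)
    # (6.25): (Q^H Cz^{-1} Q + sigma_a^{-2} I)^{-1} Q^H Cz^{-1}
    Cz_inv = np.linalg.inv(Cz)
    E_625 = np.linalg.inv(QH @ Cz_inv @ Q + np.eye(K) / sigma_a2) @ QH @ Cz_inv
    return E_624, E_625
```

The two forms agree to numerical precision whenever $\mathbf{C}_z$ is positive definite, and as the noise covariance shrinks both approach the LZF solution $\mathbf{Q}^{-1}$ for invertible Q.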

6.3.2.3 Decision Feedback Equalization

Decision feedback equalization (DFE) exploits the finite-alphabet symbol property while keeping complexity close to that of linear equalization. Essentially, it makes hard symbol decisions sequentially and leverages past decisions for future symbol estimates.

The DFE generates complex-valued symbol estimates as follows (see footnote 9):

\[ \hat{\mathbf{a}}_{\text{DFE}} = \mathbf{E}\mathbf{y} - (\mathbf{U} - \mathbf{I}_K)\, D_{\mathcal{A}}(\hat{\mathbf{a}}_{\text{DFE}}). \tag{6.27} \]

In (6.27), $D_{\mathcal{A}}(\cdot) : \mathbb{C}^K \to \mathcal{A}^K$ denotes element-wise quantization w.r.t. the symbol alphabet $\mathcal{A}$, $\mathbf{U} \in \mathbb{C}^{K\times K}$ is monic upper triangular (to ensure that decision feedback is strictly causal), and $\mathbf{E} \in \mathbb{C}^{K\times K}$. Keeping the monic upper-triangular property of U in mind, (6.27) can be understood as follows: the estimate $\hat{a}_{\text{DFE}}[K-1]$ is linearly computed from y using the last row in E; then, the estimate $\hat{a}_{\text{DFE}}[K-2]$ is linearly computed from y and quantized $\hat{a}_{\text{DFE}}[K-1]$ using the second-to-last rows in E and U, respectively; then, the estimate $\hat{a}_{\text{DFE}}[K-3]$ is linearly computed from y and quantized $\{\hat{a}_{\text{DFE}}[K-2], \hat{a}_{\text{DFE}}[K-1]\}$ using the third-to-last rows in E and U, respectively; and so on.

Footnote 7: Note that the LMMSE equalizer described here is a generalization of the classical tapped delay-line LMMSE equalizer (Proakis, 2001).

Footnote 8: The matrix inversion lemma can be stated as $(\mathbf{A}^{-1} + \mathbf{B}\mathbf{C}^{-1}\mathbf{B}^H)^{-1} = \mathbf{A} - \mathbf{A}\mathbf{B}(\mathbf{C} + \mathbf{B}^H\mathbf{A}\mathbf{B})^{-1}\mathbf{B}^H\mathbf{A}$, assuming the inverses exist.

Footnote 9: The DFE described here is sometimes referred to as a "generalized" DFE to distinguish it from the classical DFE implemented using tapped delay-line forward and feedback filters (Al-Dhahir & Cioffi, 1995).



The DFE matrices E and U are typically designed according to the MMSE or ZF criteria. As with linear equalization, additional constraints may be placed on E and/or U. The coherent MMSE-DFE (Cioffi & Forney, 1997) uses (6.27) with {E, U} set to

\[ \{\mathbf{E}_{\text{MMSE-DFE}}, \mathbf{U}_{\text{MMSE-DFE}}\} = \arg\min_{\mathbf{E},\mathbf{U}} E\{\|\mathbf{a} - \hat{\mathbf{a}}_{\text{DFE}}\|^2 \mid \mathbf{Q}\} \quad\text{assuming } D_{\mathcal{A}}(\hat{\mathbf{a}}_{\text{DFE}}) = \mathbf{a}, \tag{6.28} \]

i.e., set to minimize the MSE of $\hat{\mathbf{a}}_{\text{DFE}}$ under the assumption of perfect decision feedback. It can be shown that $\mathbf{U}_{\text{MMSE-DFE}}$ and $\mathbf{E}_{\text{MMSE-DFE}}$ can be computed with the aid of an LDU decomposition (Al-Dhahir & Sayed, 2000):

\[ \mathbf{U}_{\text{MMSE-DFE}}^H \boldsymbol{\Delta}_{\text{MMSE-DFE}} \mathbf{U}_{\text{MMSE-DFE}} = \mathbf{Q}^H\mathbf{C}_z^{-1}\mathbf{Q} + \sigma_a^{-2}\mathbf{I} \tag{6.29} \]

\[ \mathbf{E}_{\text{MMSE-DFE}} = \mathbf{U}_{\text{MMSE-DFE}}\mathbf{E}_{\text{LMMSE}}, \tag{6.30} \]

with $\mathbf{E}_{\text{LMMSE}}$ given by (6.24)–(6.25). The ZF-DFE takes the form of (6.27) with $\mathbf{U}_{\text{ZF-DFE}}$ computed through the LDU decomposition $\mathbf{U}_{\text{ZF-DFE}}^H \boldsymbol{\Delta}_{\text{ZF-DFE}} \mathbf{U}_{\text{ZF-DFE}} = \mathbf{Q}^H\mathbf{Q}$, and with $\mathbf{E}_{\text{ZF-DFE}} = \mathbf{U}_{\text{ZF-DFE}}\mathbf{Q}^{-1}$.

In practice, the hard decisions in (6.27) are not always perfect, which leads to the phenomenon known as error propagation (O'Reilly & de Oliviera Duarte, 1985). There, a decision error on a[k] has the effect of amplifying, rather than canceling, the interference that a[k] causes to the not-yet-estimated symbols $\{a[k']\}_{k'=0}^{k-1}$. Although error propagation can be somewhat alleviated by detecting symbols with higher signal-to-interference-plus-noise ratio (SINR) first (e.g., by V-BLAST detection ordering (Wolniansky, Foschini, Golden, & Valenzuela, 1998)), error propagation is better avoided through tree search or iterative soft equalization, as discussed in Sections 6.3.2.4 and 6.3.2.5.
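The MMSE-DFE design (6.29)–(6.30) and the backward recursion implied by (6.27) can be sketched as follows. This is an assumption-laden illustration rather than the authors' implementation: the function names are hypothetical, and the LDU factors are obtained from a Cholesky factorization by normalizing the rows of the upper-triangular factor, which is one of several ways to compute them.

```python
import numpy as np

def mmse_dfe_design(Q, Cz, sigma_a2=1.0):
    """Return (E, U, Delta) satisfying (6.29)-(6.30); Delta holds the diagonal."""
    K = Q.shape[1]
    QH_Czinv = Q.conj().T @ np.linalg.inv(Cz)
    A = QH_Czinv @ Q + np.eye(K) / sigma_a2
    R = np.linalg.cholesky(A).conj().T   # upper triangular with A = R^H R
    d = np.real(np.diag(R))
    U = R / d[:, None]                   # monic upper triangular
    Delta = d ** 2                       # so that U^H diag(Delta) U = A
    E = U @ np.linalg.solve(A, QH_Czinv) # U @ E_LMMSE, using form (6.25)
    return E, U, Delta

def dfe_detect(y, E, U, alphabet):
    """Run (6.27) from the last symbol to the first, feeding back hard decisions."""
    alphabet = np.asarray(alphabet, dtype=complex)
    K = U.shape[0]
    ff = E @ np.asarray(y, dtype=complex)
    soft = np.zeros(K, dtype=complex)
    hard = np.zeros(K, dtype=complex)
    for k in range(K - 1, -1, -1):
        soft[k] = ff[k] - U[k, k + 1:] @ hard[k + 1:]
        hard[k] = alphabet[np.argmin(np.abs(alphabet - soft[k]))]
    return soft, hard
```

Because U − I is strictly upper triangular, the estimate for index k only uses decisions on indices larger than k, which is why the loop runs backward.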

Finally, we note that hybrid trellis/DFE techniques have been proposed with complexities and performances that lie between trellis and DFE methods. Two of the more well-known techniques are reduced state sequence estimation (Eyuboglu & Qureshi, 1988) and delayed decision feedback estimation (Duel-Hallen & Heegard, 1989).

6.3.2.4 Equalization Based on Tree Search

In DFE, a single hypothesis of the sequence (a[k + 1], . . . , a[K − 1]) is used to aid the estimation of a[k]. Tree-search methods (see footnote 10) improve on this idea by keeping and using several hypotheses of the sequence (a[k + 1], . . . , a[K − 1]) until it is clear which is the single best hypothesis.

Tree-search algorithms can be partitioned into optimal and suboptimal approaches. Optimal tree search methods are capable of implementing MLSD with a complexity that is on average much less than that of brute-force search (Mow, 1994). Although this average complexity has been claimed to grow as roughly $O(K^3)$ at sufficiently high SNR (Hassibi & Vikalo, 2005), a careful analysis shows that, in fact, the average complexity of optimal tree search is exponential in K (Jalden & Ottersten, 2005). To circumvent the potentially high complexity of exact tree search (especially at low SNR), suboptimal tree search may be considered because a very small performance sacrifice can often lead to a huge reduction in complexity. In fact, a well-designed suboptimal tree search can achieve near-ML performance with near-DFE complexity (Murugan, El Gamal, Damen, & Caire, 2006). When assessing a suboptimal tree search algorithm, it is most appropriate to think in terms of its performance/complexity tradeoff.

Footnote 10: What we call "tree search" is sometimes referred to as closest lattice point search, lattice decoding, sequential decoding, or sphere decoding.

Before conducting a tree search, the observations y in (6.2) must be preprocessed to yield a causal observation model of the form

\[ \bar{\mathbf{y}} = \bar{\mathbf{Q}}\bar{\mathbf{a}} + \bar{\mathbf{z}}, \tag{6.31} \]

where $\bar{\mathbf{Q}}$ is upper triangular and $\bar{\mathbf{a}}$ is some permutation of $\mathbf{a}$. Ignoring permutation for the moment (so that $\bar{\mathbf{a}} = \mathbf{a}$), the standard approach to upper-triangularization of (6.2) is QR decomposition: if $\mathbf{Q} = \mathbf{V}_{\text{QR}}\mathbf{Q}_{\text{QR}}$, where $\mathbf{V}_{\text{QR}}$ is unitary and $\mathbf{Q}_{\text{QR}}$ is upper triangular, then preprocessing according to $\bar{\mathbf{y}} = \mathbf{V}_{\text{QR}}^H\mathbf{y} \triangleq \mathbf{y}_{\text{QR}}$ yields (6.31) with $\bar{\mathbf{Q}} = \mathbf{Q}_{\text{QR}}$ and $\bar{\mathbf{z}} = \mathbf{V}_{\text{QR}}^H\mathbf{z} \triangleq \mathbf{z}_{\text{QR}}$. Notice that $\mathbf{z}_{\text{QR}}$ is statistically equivalent to $\mathbf{z}$. As we show below, QR preprocessing is closely related to the feedforward filtering operation in ZF-DFE. Because the MMSE-DFE is known to outperform the ZF-DFE in noisy environments, it has been suggested (Damen, El Gamal, & Caire, 2003) to replace the QR preprocessing step with its MMSE-DFE equivalent, at least for suboptimal tree search. To see this from another perspective, imagine for the moment that suboptimal tree search is conducted according to the most greedy method possible, i.e., with a single surviving hypothesis per stage. Then, if QR preprocessing is used, this suboptimal tree search is exactly the ZF-DFE, whereas, if MMSE-DFE preprocessing is used, this suboptimal tree search is exactly the MMSE-DFE.
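The greedy, single-survivor case described above can be sketched with QR preprocessing followed by back-substitution with hard decisions. This is a minimal illustration under the assumption of no symbol permutation; the function name `qr_greedy_detect` is hypothetical.

```python
import numpy as np

def qr_greedy_detect(y, Q, alphabet):
    """QR-preprocess as in (6.31), then decide symbols from last to first."""
    alphabet = np.asarray(alphabet, dtype=complex)
    V, R = np.linalg.qr(Q)                 # Q = V R, V unitary, R upper triangular
    y_bar = V.conj().T @ np.asarray(y, dtype=complex)
    K = R.shape[1]
    a_hat = np.zeros(K, dtype=complex)
    for k in range(K - 1, -1, -1):
        # Cancel interference from already-decided symbols, then normalize
        soft = (y_bar[k] - R[k, k + 1:] @ a_hat[k + 1:]) / R[k, k]
        a_hat[k] = alphabet[np.argmin(np.abs(alphabet - soft))]
    return a_hat
```

A tree search with more than one survivor would keep several partial candidates for the decided symbols at each stage instead of the single `a_hat` maintained here.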

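We will now provide the technical link between the QR decomposition and the ZF-DFE, as well as the details of the MMSE-DFE preprocessor. Comparing the LDU decomposition $Q^H Q = U_{\text{ZF-DFE}}^H \Delta_{\text{ZF-DFE}} U_{\text{ZF-DFE}}$ to the QR decomposition $Q = V_{\text{QR}} Q_{\text{QR}}$, it becomes evident that $Q_{\text{QR}} = \Delta_{\text{ZF-DFE}}^{1/2} U_{\text{ZF-DFE}}$, from which it follows that $V_{\text{QR}}^H = Q_{\text{QR}} Q^{-1} = \Delta_{\text{ZF-DFE}}^{1/2} U_{\text{ZF-DFE}} Q^{-1} = \Delta_{\text{ZF-DFE}}^{1/2} E_{\text{ZF-DFE}}$. Thus, $y_{\text{QR}} = \Delta_{\text{ZF-DFE}}^{1/2} E_{\text{ZF-DFE}}\, y$ can be recognized as a scaled version of the ZF-DFE feedforward filter output $E_{\text{ZF-DFE}}\, y$. If we repeat the same steps with MMSE-DFE quantities in place of ZF-DFE quantities, we obtain the MMSE-DFE preprocessed observation (Damen et al., 2003)

$$y_{\text{MMSE-DFE}} \triangleq \Delta_{\text{MMSE-DFE}}^{1/2} E_{\text{MMSE-DFE}}\, y, \tag{6.32}$$

and the corresponding causal model

$$y_{\text{MMSE-DFE}} = \Delta_{\text{MMSE-DFE}}^{1/2} U_{\text{MMSE-DFE}}\, a + z_{\text{MMSE-DFE}}, \tag{6.33}$$

where $z_{\text{MMSE-DFE}} \triangleq y_{\text{MMSE-DFE}} - \Delta_{\text{MMSE-DFE}}^{1/2} U_{\text{MMSE-DFE}}\, a$.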

MMSE-DFE preprocessed tree search proceeds from the causal model (6.33), where the interference $z_{\text{MMSE-DFE}}$ is treated as (signal-independent) additive white Gaussian noise (AWGN). Although it can be shown that $z_{\text{MMSE-DFE}}$ is white (in fact, $C_{z_{\text{MMSE-DFE}}} = I$ for any $C_z$), it can readily be seen that $z_{\text{MMSE-DFE}}$ is signal-dependent (and hence non-Gaussian) (Damen et al., 2003). Thus, treating $z_{\text{MMSE-DFE}}$ as if it were AWGN will produce suboptimal¹¹ sequence estimates. However, it turns out that the increase in prequantization SINR (from the use of MMSE-DFE in place of ZF-DFE) more than compensates for the loss in optimality (due to non-AWGN $z_{\text{MMSE-DFE}}$). Thus, relative to QR preprocessing, MMSE-DFE preprocessing has been observed to yield significant improvements in the performance/complexity tradeoff of suboptimal tree search (Murugan et al., 2006).

¹¹ Interestingly, it has been shown that $z_{\text{MMSE-DFE}}$ can be treated as AWGN when $\mathcal{A}$ is constant modulus. In other words, $\hat{a}_{\text{cMLSD}} = \arg\min_{a \in \mathcal{A}^K} \| y_{\text{MMSE-DFE}} - \Delta_{\text{MMSE-DFE}}^{1/2} U_{\text{MMSE-DFE}}\, a \|^2$ for constant modulus $\mathcal{A}$ (Hwang & Schniter, 2005).

Other types of preprocessing include lattice reduction (e.g., the method of Lenstra, Lenstra, & Lovasz (1982)) and column permutation (e.g., reordering of $a$ so that stronger symbols are decided first, as in V-BLAST ordering (Wolniansky et al., 1998)). Because these techniques would destroy the quasibanded structure of $Q$, however, we will not elaborate on them further.

Tree search algorithms (whether optimal or suboptimal) can be categorized as breadth-first, depth-first, or best-first (Anderson & Mohan, 1984; Murugan et al., 2006). Breadth-first search algorithms include, e.g., the M-algorithm (Anderson & Mohan, 1984), the T-algorithm (Simmons, 1990), statistical pruning algorithms (Gowaikar & Hassibi, 2003), the Wozencraft sequential decoder (Wozencraft & Reiffen, 1961), and the Pohst sphere decoder (Fincke & Pohst, 1985). Depth-first search algorithms include, e.g., the Schnorr–Euchner sphere decoder and its variants (Agrell, Eriksson, Vardy, & Zeger, 2002; Damen et al., 2003; Viterbo & Boutros, 1999). Best-first search algorithms include, e.g., the stack and Fano algorithms (Fano, 1963; Murugan et al., 2006; Viterbi & Omura, 1979). Because a thorough description and comparison of these approaches are outside the scope of this chapter, we make only a few remarks. The Fano algorithm was recently found to yield a superior complexity/performance tradeoff when $Q$ was either a convolution matrix or fully populated (Murugan et al., 2006). The superiority of the Fano algorithm does not appear to hold when $Q$ is quasibanded, though (Hwang & Schniter, 2006). The M-algorithm is popular for two reasons: simplicity and fixed complexity (i.e., complexity invariant to channel/noise realizations and SNR).
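To make the breadth-first idea concrete, the following is a minimal sketch of an M-algorithm-style search on an upper-triangular model, written in plain Python. It is an illustration under stated assumptions rather than any cited author's implementation: the function names `m_algorithm` and `brute_force`, the BPSK alphabet, the squared-error path metric (white noise, no priors), and the toy matrix are all hypothetical choices.

```python
import itertools

def m_algorithm(Q, y, alphabet=(-1.0, 1.0), M=2):
    """Breadth-first search keeping the M best partial paths per stage.

    Q is K x K upper triangular (list of rows), y has length K.
    Symbols are decided from a[K-1] up to a[0], as the causal model allows.
    Returns (best_path, best_metric) with best_path = [a[0], ..., a[K-1]].
    """
    K = len(y)
    survivors = [(0.0, [])]  # (accumulated metric, decisions for a[k+1..K-1])
    for k in range(K - 1, -1, -1):
        candidates = []
        for metric, tail in survivors:
            # Contribution of already-hypothesized symbols to row k.
            interference = sum(Q[k][k + 1 + i] * s for i, s in enumerate(tail))
            for s in alphabet:
                err = y[k] - Q[k][k] * s - interference
                candidates.append((metric + abs(err) ** 2, [s] + tail))
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:M]  # fixed number of survivors per stage
    best_metric, best_path = survivors[0]
    return best_path, best_metric

def brute_force(Q, y, alphabet=(-1.0, 1.0)):
    """Exhaustive minimization of ||y - Q a||^2, for comparison only."""
    K = len(y)
    best_path, best_metric = None, float("inf")
    for a in itertools.product(alphabet, repeat=K):
        metric = sum(
            abs(y[k] - sum(Q[k][i] * a[i] for i in range(k, K))) ** 2
            for k in range(K)
        )
        if metric < best_metric:
            best_path, best_metric = list(a), metric
    return best_path, best_metric

# Toy example (hypothetical numbers): 3 BPSK symbols, mildly noisy observation.
Q = [[2.0, 0.5, -0.3], [0.0, 1.5, 0.4], [0.0, 0.0, 1.0]]
y = [1.3, -1.3, 1.05]
print(m_algorithm(Q, y, M=1))  # M = 1: greedy, decision-feedback-like search
print(m_algorithm(Q, y, M=8))  # M = |A|^K: nothing is pruned, so exhaustive
print(brute_force(Q, y))
```

With `M=1` the search keeps a single hypothesis per stage, and with `M` at least as large as the number of leaves nothing is pruned, so the sketch spans the range between the two extremes. The work per stage is bounded by `M` times the alphabet size regardless of the data, which is the fixed-complexity property noted above.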

Although so far we have focused on tree-search implementations of MLSD, we now describe how tree search can be used to find (approximate) posterior LLRs, and thus MAP symbol and bit estimates, using the method of Hochwald & ten Brink (2003). First, we define the coherent MAP sequence metric

$$\zeta_{\text{coh}}(c) \triangleq \ln f(y \mid c, Q) + l_c^T c \tag{6.34}$$

$$= -\| y - Qa \|_{C_z^{-1}}^2 - \ln\!\big(\pi^K \det\{C_z\}\big) + l_c^T c, \tag{6.35}$$

where $l_c \triangleq (L_c[0] \cdots L_c[K-1])^T$, with $L_c[k]$ being a priori LLRs from (6.20), and where the symbols $a$ are determined by the hypothesized bit vector $c$. As previously remarked, the use of a priori LLRs implies that the coded bits $\{c[j]\}$ are treated as independent. It is straightforward to show (see Appendix 6.A) that the posterior LLR defined in (6.19) can be written as

$$L_{c|y,Q}[j] = \ln \frac{\sum_{c:\,c[j]=1} e^{\zeta_{\text{coh}}(c)}}{\sum_{c:\,c[j]=0} e^{\zeta_{\text{coh}}(c)}}. \tag{6.36}$$

Note that, in the summations of (6.36), all possibilities of $c \in \{0,1\}^{K \log_2 |\mathcal{A}|}$ are considered, not only those in the codebook. (The same holds true in related equations throughout the chapter.) This reflects our assumption that the equalizer does not use knowledge of the code structure to generate posterior LLRs; code structure is exploited only by the decoder.

Computing $L_{c|y,Q}[j]$ through (6.36) would require $2^{K \log_2 |\mathcal{A}|}$ evaluations of the MAP metric $\zeta_{\text{coh}}(c)$, and hence would be impractical. However, as suggested in Hochwald & ten Brink (2003), the “max-log” approximation $\ln \sum_c e^{\zeta(c)} \approx \max_c \zeta(c)$ can be applied to yield

$$L_{c|y,Q}[j] \approx \max_{c:\,c[j]=1} \zeta_{\text{coh}}(c) - \max_{c:\,c[j]=0} \zeta_{\text{coh}}(c). \tag{6.37}$$

Suboptimal tree search can then be used to find the set of all bit vectors $c \in \{0,1\}^{K \log_2 |\mathcal{A}|}$ which yield non-negligible coherent MAP metrics $\zeta_{\text{coh}}(c)$, as detailed in de Jong & Willink (2005). Once the posterior LLRs have been calculated, it is possible to generate hard bit estimates through (6.21), if needed. In a turbo configuration, though, the equalizer passes the posterior LLRs to a soft-input/soft-output decoder. After decoding, the refined LLRs are passed back to the equalizer to be used as priors, i.e., $l_c$. (Recall Fig. 6.1.)
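The relationship between the exact posterior LLR (6.36) and its max-log approximation (6.37) can be checked numerically on a toy problem. The sketch below enumerates every bit vector, which is only feasible for very small $K$; a practical equalizer would restrict the maximizations to the candidates returned by a tree search. The function names, the BPSK mapping $a = 2c - 1$, the white-noise assumption $C_z = \sigma^2 I$, and the example numbers are all assumptions made here. The constant $\ln(\pi^K \det\{C_z\})$ is dropped because it cancels in the LLR.

```python
import itertools
import math

def map_metric(c, y, Q, sigma2, prior_llr):
    """Coherent MAP metric of (6.35), up to a constant, for BPSK and C_z = sigma2*I."""
    a = [2 * b - 1 for b in c]  # assumed bit-to-symbol mapping
    resid = sum(
        abs(y[i] - sum(Q[i][k] * a[k] for k in range(len(a)))) ** 2
        for i in range(len(y))
    )
    return -resid / sigma2 + sum(l * b for l, b in zip(prior_llr, c))

def logsumexp(values):
    """Numerically stable ln(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def posterior_llrs(y, Q, sigma2, prior_llr):
    """Return (exact LLRs per (6.36), max-log LLRs per (6.37)) by enumeration."""
    K = len(prior_llr)
    metrics = {
        c: map_metric(c, y, Q, sigma2, prior_llr)
        for c in itertools.product((0, 1), repeat=K)
    }
    exact, maxlog = [], []
    for j in range(K):
        m1 = [m for c, m in metrics.items() if c[j] == 1]
        m0 = [m for c, m in metrics.items() if c[j] == 0]
        exact.append(logsumexp(m1) - logsumexp(m0))
        maxlog.append(max(m1) - max(m0))
    return exact, maxlog

# Toy example (hypothetical numbers): bits (1, 0, 1) sent without noise.
Q = [[1.0, 0.3, 0.0], [0.2, 1.0, 0.3], [0.0, 0.2, 1.0]]
y = [0.7, -0.5, 0.8]
exact, maxlog = posterior_llrs(y, Q, sigma2=0.5, prior_llr=[0.0, 0.0, 0.0])
print(exact)
print(maxlog)
```

Because each log-sum-exp exceeds the corresponding maximum by at most the logarithm of the number of terms, the two LLRs in this sketch can differ by no more than $(K-1)\ln 2$ in magnitude for BPSK.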

6.3.2.5 Iterative Soft Equalization

For approximate symbol/bit MAP equalization, one can consider using iterative soft equalization techniques (Tuchler, Koetter, & Singer, 2002; Wang & Poor, 1999) as an alternative to the trellis and tree-search approaches described earlier. The iterative soft equalization techniques described here use linear estimation strategies in conjunction with evolving beliefs of the interfering bits. After estimating a given bit, the equalizer updates its belief about that bit to better estimate the other bits. Once all bit beliefs (e.g., LLRs) have been updated, the process repeats. The equalizer may itself iterate several times and/or it may trade soft bit information with a soft-input/soft-output decoder in a turbo configuration.

Below we detail the main concepts behind iterative soft equalization for the simple case of BPSK.¹² This simplification allows us to make a direct mapping between each bit and a corresponding symbol, e.g., $a[j] = 2b[j] - 1$ for $j = 0, \ldots, K-1$, where $b[j] \in \{0,1\}$ and $a[j] \in \mathcal{A} = \{-1,+1\}$. In this case, the a priori LLR from (6.20) can be rewritten as

$$L_c[j] = \ln \frac{\Pr\{a[j] = +1\}}{\Pr\{a[j] = -1\}}. \tag{6.38}$$

Suppose that we are interested in estimating the $j$th bit, $c[j]$, or equivalently the $j$th symbol, $a[j]$. Suppose further that, when doing so, we have prior information on the other bits, and thus on the other symbols $\bar{a}_j \triangleq (a[0] \cdots a[j-1] \;\, 0 \;\, a[j+1] \cdots a[K-1])^T$, in the form of a priori LLRs. To facilitate the use of linear operations, the symbol estimation stage treats the elements in $\bar{a}_j$ as independent Gaussian with means and variances that are calculated from the respective LLRs. In particular, the calculated mean of $a[k]$ (for $k \neq j$) is computed through $\mu_a[k] \triangleq \sum_{a \in \{-1,+1\}} a \Pr\{a[k] = a\}$ using the identity

$$\Pr\{a[k] = a\} = \frac{\exp\!\big((a-1) L_c[k]/2\big)}{1 + \exp(-L_c[k])} \quad \text{for } a \in \{-1,+1\}, \tag{6.39}$$

from which it can be shown that

$$\mu_a[k] = \frac{1 - \exp(-L_c[k])}{1 + \exp(-L_c[k])} = \tanh(L_c[k]/2). \tag{6.40}$$

Similarly, the calculated variance of $a[k]$ (for $k \neq j$) is computed through

$$v_a[k] \triangleq -\mu_a[k]^2 + \sum_{a \in \{-1,+1\}} a^2 \Pr\{a[k] = a\} = 1 - \mu_a[k]^2. \tag{6.41}$$

¹² The case of nonbinary alphabets follows similar principles but is more tedious to describe.
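The LLR-to-moment conversion in (6.39)–(6.41) is easy to verify numerically. The following sketch (function names are hypothetical) computes the BPSK symbol probabilities from an LLR and checks that the mean and variance obtained by direct summation agree with the closed forms $\tanh(L_c[k]/2)$ and $1 - \mu_a[k]^2$.

```python
import math

def bpsk_prob(a, llr):
    """Pr{a[k] = a} for a in {-1, +1}, per (6.39).

    Very negative LLRs can overflow math.exp; moderate values are assumed here.
    """
    return math.exp((a - 1) * llr / 2) / (1 + math.exp(-llr))

def bpsk_soft_stats(llr):
    """Closed-form mean (6.40) and variance (6.41) of a BPSK symbol."""
    mean = math.tanh(llr / 2)
    return mean, 1 - mean * mean

for llr in (-3.0, 0.0, 0.5, 4.0):
    # Moments by direct summation over the two symbol values.
    mean_sum = sum(a * bpsk_prob(a, llr) for a in (-1, 1))
    var_sum = sum(a * a * bpsk_prob(a, llr) for a in (-1, 1)) - mean_sum ** 2
    print(llr, (mean_sum, var_sum), bpsk_soft_stats(llr))
```

An LLR of zero gives mean 0 and variance 1 (no prior knowledge), while a large-magnitude LLR drives the mean toward $\pm 1$ and the variance toward 0, so a confidently known symbol is almost entirely cancelled in the subsequent soft interference cancelation step.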

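The estimation of $a[j]$ proceeds by writing the observation as

$$y = q_j\, a[j] + Q \bar{a}_j + z, \tag{6.42}$$

where $q_j$ denotes the $j$th column of $Q$ and $\bar{a}_j$ is the vector of the other symbols (i.e., $a$ with a zero in position $j$). For convenience, we collect the calculated means into $\mu_j \triangleq (\mu_a[0] \cdots \mu_a[j-1] \;\, 0 \;\, \mu_a[j+1] \cdots \mu_a[K-1])^T$ and the calculated variances into $v_j \triangleq (v_a[0] \cdots v_a[j-1] \;\, 0 \;\, v_a[j+1] \cdots v_a[K-1])^T$.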

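In the classical iterative soft equalization approach proposed by Wang & Poor (1999), soft interference cancelation:

$$x_j = y - Q \mu_j \tag{6.43}$$

is followed by LMMSE combining:

$$\hat{a}_{\text{LMMSE}}[j] = e_j^H x_j \quad \text{with} \quad e_j = \arg\min_{e \in \mathbb{C}^K} E\big\{ \big| a[j] - e^H x_j \big|^2 \big\}. \tag{6.44}$$

Writing the interference-canceled vector as

$$x_j = q_j\, a[j] + r_j, \tag{6.45}$$

with residual interference vector

$$r_j = Q(\bar{a}_j - \mu_j) + z, \tag{6.46}$$

where $\bar{a}_j$ is the vector of the other symbols (i.e., $a$ with a zero in position $j$), it can be seen that $C_{r_j} = Q\, \text{diag}\{v_j\}\, Q^H + C_z$. Withholding prior belief on $a[j]$, so that $E\{a[j]\} = 0$ and $\text{var}\{a[j]\} = 1$, the LMMSE combiner in (6.44) becomes

$$e_j = C_{x_j}^{-1} C_{x_j, a[j]} = \big( q_j q_j^H + C_{r_j} \big)^{-1} q_j = \frac{1}{1 + q_j^H C_{r_j}^{-1} q_j}\, C_{r_j}^{-1} q_j, \tag{6.47}$$

where the matrix inversion lemma was used to obtain the right side of (6.47). Thus, $\hat{a}_{\text{LMMSE}}[j]$ becomes

$$\hat{a}_{\text{LMMSE}}[j] = \frac{q_j^H C_{r_j}^{-1} x_j}{1 + q_j^H C_{r_j}^{-1} q_j}. \tag{6.48}$$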

From straightforward arguments,¹³ one can conclude that any scaled version of the statistic

$$g[j] \triangleq q_j^H C_{r_j}^{-1} x_j, \tag{6.49}$$

including the LMMSE estimate $\hat{a}_{\text{LMMSE}}[j]$, is sufficient (Poor, 1994) for ML¹⁴ detection of $a[j]$ (and thus of $b[j]$) from $x_j$. In fact, the ML symbol decision is simply the sign of

$$L_g[j] \triangleq \ln \frac{f(g[j] \mid a[j] = +1)}{f(g[j] \mid a[j] = -1)} = \ln \frac{f(g[j] \mid c[j] = 1)}{f(g[j] \mid c[j] = 0)}. \tag{6.50}$$

¹³ Sufficiency can be understood as follows. After constructing the interference-canceled/whitened observation $C_{r_j}^{-1/2} x_j = C_{r_j}^{-1/2} q_j\, a[j] + C_{r_j}^{-1/2} r_j$, the application of the matched filter $C_{r_j}^{-1/2} q_j$, or any scaling thereof, yields a sufficient statistic for the detection of $a[j]$ (Poor, 1994). These two steps are combined in writing (6.49).

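Expanding $g[j]$ as

$$g[j] = q_j^H C_{r_j}^{-1} q_j\, a[j] + q_j^H C_{r_j}^{-1} r_j, \tag{6.51}$$

it can be seen that $g[j] \mid a[j]$ is circular Gaussian with mean $a[j]\, \mu_g[j]$ and variance $\sigma_g^2[j]$, where $\mu_g[j] = q_j^H C_{r_j}^{-1} q_j = \sigma_g^2[j]$. Hence,

$$L_g[j] = \ln \frac{\exp\!\big( -\big| g[j] - \mu_g[j] \big|^2 / \sigma_g^2[j] \big)}{\exp\!\big( -\big| g[j] + \mu_g[j] \big|^2 / \sigma_g^2[j] \big)} \tag{6.52}$$

$$= -\big| g[j] - \mu_g[j] \big|^2 / \sigma_g^2[j] + \big| g[j] + \mu_g[j] \big|^2 / \sigma_g^2[j] \tag{6.53}$$

$$= 4\, \text{Re}\{ g[j] \}. \tag{6.54}$$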
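The step from (6.53) to (6.54) can be spelled out as follows. Dropping the index $[j]$ for brevity, and using only the facts stated above that $\mu_g$ is real-valued and equal to $\sigma_g^2$:

$$
\begin{aligned}
|g + \mu_g|^2 - |g - \mu_g|^2
&= \big(|g|^2 + 2\mu_g \text{Re}\{g\} + \mu_g^2\big) - \big(|g|^2 - 2\mu_g \text{Re}\{g\} + \mu_g^2\big)
= 4 \mu_g\, \text{Re}\{g\},\\
L_g &= \frac{4 \mu_g\, \text{Re}\{g\}}{\sigma_g^2} = 4\, \text{Re}\{g\}.
\end{aligned}
$$

The factor 4 relies on the circular complex Gaussian model; for a purely real-valued Gaussian model with variance $\sigma_g^2$ the exponent carries an extra factor $1/2$, and the same steps would give $2g$ instead.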

Finally, a posterior LLR on $a[j]$ (and hence on $c[j]$) can be generated through

$$\ln \frac{\Pr\{a[j] = +1 \mid g[j]\}}{\Pr\{a[j] = -1 \mid g[j]\}} = \ln \frac{\Pr\{c[j] = 1 \mid g[j]\}}{\Pr\{c[j] = 0 \mid g[j]\}} = L_g[j] + L_c[j], \tag{6.55}$$

where (6.50) and Bayes rule were used to obtain the right side of (6.55). The posterior LLR (6.55) can then be used in place of $L_c[j]$ in (6.40)–(6.41) to calculate the mean $\mu_a[j]$ and variance $v_a[j]$ for subsequent estimation of $\{c[k]\}_{k \neq j}$.
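The chain (6.40)–(6.55) can be put together in a short sketch of one serial pass over the bits. This is a minimal illustration under stated assumptions, not the implementation of any cited paper: the function names `solve` and `soft_equalizer_pass` are hypothetical, the noise is taken to be white ($C_z = \sigma^2 I$), the alphabet is BPSK, and a small Gaussian-elimination routine stands in for the linear-algebra library a real system would use.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def soft_equalizer_pass(y, Q, sigma2, prior_llr):
    """One serial pass of soft cancelation + LMMSE combining for BPSK.

    Returns the posterior LLRs of (6.55); assumes C_z = sigma2 * I.
    """
    N, K = len(y), len(prior_llr)
    llr = list(prior_llr)  # current beliefs, refined bit by bit
    for j in range(K):
        mu = [math.tanh(l / 2) for l in llr]   # (6.40)
        var = [1 - m * m for m in mu]          # (6.41)
        mu[j], var[j] = 0.0, 0.0               # withhold belief on a[j]
        # Soft interference cancelation, (6.43).
        x = [y[i] - sum(Q[i][k] * mu[k] for k in range(K)) for i in range(N)]
        # Residual covariance C_r = Q diag{v_j} Q^H + sigma2 * I.
        C = [[sum(Q[i][k] * var[k] * Q[m][k].conjugate() for k in range(K))
              + (sigma2 if i == m else 0.0) for m in range(N)] for i in range(N)]
        q_j = [Q[i][j] for i in range(N)]
        w = solve(C, q_j)                      # w = C_r^{-1} q_j
        # g = q_j^H C_r^{-1} x_j, (6.49); C_r is Hermitian, so g = w^H x.
        g = sum(wi.conjugate() * xi for wi, xi in zip(w, x))
        llr[j] = 4 * g.real + prior_llr[j]     # (6.54) and (6.55)
    return llr

# Toy example (hypothetical numbers): symbols (+1, -1, +1), noiseless observation.
Q = [[1.0, 0.3j, 0.0], [0.2, 1.0, 0.3j], [0.0, 0.2, 1.0]]
a = [1, -1, 1]
y = [sum(Q[i][k] * a[k] for k in range(3)) for i in range(3)]
print(soft_equalizer_pass(y, Q, sigma2=0.1, prior_llr=[0.0, 0.0, 0.0]))
```

Each bit in this sketch requires one $K \times K$ linear solve, so a full pass costs on the order of $K^4$ operations when written this naively.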

6.3.2.6 Remarks on Complexity

The coherent equalization tools described in this section are quite general; they apply to any $Q$ and, thus, to any type of linear modulation/demodulation combined with any type of linear channel propagation (whether the channel is rapidly TV or not). In fact, when structure in $Q$ is lacking or ignored, equalization can be viewed as a form of CDMA multiuser detection (Moshavi, 1996; Verdu, 1998) where the code matrix (in this case $Q$) changes from one bit to the next, or as a form of MIMO decoding (Tse & Viswanath, 2005) for communication over a flat-fading channel with $K$ transmit and $K$ receive antennas. For the case of generic $Q$, however, the cost of implementing the equalization criteria rises rapidly with $K$, the block size. For example, we saw that linear and DFE schemes consume $O(K^3)$ operations per block, and that more sophisticated schemes can be significantly more expensive. Because, for the applications we envision, typical values of $K$ can be in the hundreds or thousands, equalization is made practical only by leveraging the structural properties of $Q$ discussed in Section 6.2.2. Using these properties, Sections 6.3.3–6.3.4 describe equalization algorithms specifically tailored to rapidly TV channels.

¹⁴ Because we assume a uniform prior on $a[j]$, ML detection is equivalent to MAP detection.



6.3.3 Coherent Equalization for Time-Frequency Concentrated Modulation/Demodulation

Recall from Section 6.2.2 that, when sufficiently time-frequency concentrated modulation/demodulation pulses are used, the effective channel matrix $Q$ falls into the “narrowly quasibanded” class. Here, $Q$ contains only negligible coefficients outside of the shaded region in Fig. 6.2(b), for some $D \ll K$. The main idea behind the equalization algorithms discussed in this section is that, by ignoring these negligible coefficients, the complexity of equalization can be significantly reduced without a significant loss in performance.

In this section, we will treat the interference caused by the negligible coefficients in $Q$ as if it were part of the additive noise $z$, allowing us to regard the negligible coefficients in $Q$ as if they were zero-valued. In doing so, we will assume that the interference radius $D$ has been chosen large enough so that these additional contributions to $z$ are relatively small (for the SNRs of interest). In particular, we will assume that the value of $D$ allows us to continue treating $z$ as statistically independent of $a$, as assumed in Section 6.2. With suitably designed modulation/demodulation schemes like the max-SINR schemes in Das & Schniter (2007), these assumptions have been shown (Hwang & Schniter, 2006)¹⁵ to be satisfied with $D = \lceil \nu_{\max} T_s K \rceil + 1$ at SNRs up to at least 10 dB and with $D = \lceil \nu_{\max} T_s K \rceil + 2$ at SNRs up to at least 30 dB. Less time-frequency-concentrated schemes require larger values of $D$, making equalization more expensive to implement for the same level of residual interference. For example, the interference profiles in Fig. 6.4 suggest that Strohmer & Beaver’s scheme (Strohmer & Beaver, 2003) requires a radius $D$ at least 2 higher than the max-SINR scheme of Das & Schniter (2007) for the same level of residual interference.

In the remainder of this section, we provide some insight into how the narrowly quasibanded structure of $Q$ can be leveraged to lower the complexity of the equalization strategies described in Section 6.3.2. In particular, we identify two principal approaches to this problem: fast serial equalization and fast joint equalization. We keep our description brief because equalization for narrowly quasibanded $Q$ is related to particular forms of equalization for OFDM, which is the topic of Chapter 7.

6.3.3.1 Fast Serial Equalization

Many techniques that leverage the narrowly quasibanded structure of $Q$ can be classified as fast serial equalization techniques. These techniques avoid the $K \times K$ matrix operations (e.g., inversion and LDU decomposition) specified in Section 6.3.2 for, e.g., linear equalization (6.24)–(6.26), decision feedback equalization (6.29)–(6.30), tree-search-based equalization (6.35), and iterative soft equalization (6.49). Instead, the fast serial techniques work on the local observation model

$$y_k = Q_k a_k + z_k \tag{6.56}$$

when estimating the symbol $a[k]$ (or any coded bits represented by $a[k]$) for $k = 0, \ldots, K-1$. Here, $y_k \triangleq (y[k-D] \cdots y[k+D])^T$ and $a_k \triangleq (a[k-2D] \cdots a[k+2D])^T$ are illustrated in Fig. 6.6, along with $Q_k$ and $z_k$. The principal idea behind the local model is the following. Because $a[k]$ affects only the local observations $y_k \in \mathbb{C}^{2D+1}$ within $y \in \mathbb{C}^K$, use only these local observations to estimate $a[k]$. We can, thus, think of $Q_k \in \mathbb{C}^{(2D+1) \times (4D+1)}$ as the “local effective channel matrix.” It is usually convenient to increment the index $k$ in steps of 1, so that estimation is performed serially, i.e., one symbol at a time. It can also help to start over at $k = 0$ after $k = K-1$ has been reached. Notice that, due to the corner support regions of the quasibanded $Q$ in Fig. 6.6, the local observation window shifts cyclically within $y$.

FIGURE 6.6
The local observation model used for fast serial equalization. (Labels in the figure: $y_k$, $Q_k$, $q_k$, $a_k$, $a[k]$, $z_k$; dimensions $2D+1$ and $4D+1$.)

¹⁵ For the specified $D$ and SNR range, MLSD performance was found to be identical whether the out-of-band $Q$ coefficients were treated as part of the channel or as part of the noise. We note that the variable “D” in Hwang & Schniter (2006) is defined to have twice the value of $D$ in this chapter because in Hwang & Schniter (2006) the effective channel matrix is real-valued.
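The cyclic index bookkeeping behind the local model (6.56) can be sketched as follows. The helper names `local_indices` and `local_model` are hypothetical, and the toy matrix is an assumed example of a quasibanded $Q$ whose only nonzero entries lie within circular distance $D$ of the main diagonal.

```python
def local_indices(k, K, D):
    """Cyclic index sets for the local model around symbol k.

    Returns (observation indices k-D..k+D, symbol indices k-2D..k+2D), mod K.
    """
    if K < 4 * D + 1:
        raise ValueError("need K >= 4D + 1 so that the local indices are distinct")
    obs = [(k + d) % K for d in range(-D, D + 1)]
    sym = [(k + d) % K for d in range(-2 * D, 2 * D + 1)]
    return obs, sym

def local_model(Q, y, k, D):
    """Extract y_k (length 2D+1) and Q_k ((2D+1) x (4D+1)) from y and Q."""
    obs, sym = local_indices(k, len(y), D)
    y_k = [y[i] for i in obs]
    Q_k = [[Q[i][m] for m in sym] for i in obs]
    return y_k, Q_k, sym

# Toy quasibanded matrix (hypothetical numbers): K = 8, D = 1.
K, D = 8, 1
def circ_dist(i, m):
    return min((i - m) % K, (m - i) % K)
Q = [[1.0 / (1 + circ_dist(i, m)) if circ_dist(i, m) <= D else 0.0
      for m in range(K)] for i in range(K)]
a = [1, -1, -1, 1, 1, -1, 1, 1]
y = [sum(Q[i][m] * a[m] for m in range(K)) for i in range(K)]  # noise-free

y_0, Q_0, sym_0 = local_model(Q, y, k=0, D=D)
print(local_indices(0, K, D))   # window wraps around the block edges
print(len(Q_0), len(Q_0[0]))    # dimensions (2D+1) and (4D+1)
```

For this noise-free toy case, `y_0` equals `Q_0` applied to the symbols indexed by `sym_0`, because every observation in the window depends only on symbols within the $4D+1$ local indices. Working with $(2D+1)$-sized quantities instead of $K$-sized ones is what allows per-symbol costs that depend on $D$ rather than on $K$.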

To our knowledge, Jeon, Chang, & Cho (1999) were the first to apply this fast serial approach to the equalization of rapidly TV channels. In particular, they proposed an LMMSE approximation that required only $O(KD^3)$ operations per block. Note that, when $D \ll K$, their approach is much cheaper than standard $O(K^3)$ LMMSE, i.e., (6.24)–(6.25). Cai & Giannakis (2003) proposed LMMSE and MMSE-DFE extensions of Jeon et al. (1999), where the inversion of the $(2D+1) \times (2D+1)$ covariance matrix $C_{y_k}$ was accomplished using a rank-one update. However, their schemes require $O(K^2 D)$ operations per block, where the quadratic dependence on $K$ remained as a result of not fully exploiting the quasibanded property of $Q$. Barhumi, Leus, & Moonen (2004, 2006) proposed generalized $O(K^2 D^2)$ per-tone linear equalization schemes that allowed oversampling in the frequency domain.

Hunziker & Dahlhaus (2003) proposed an iterative approximation to ML symbol detection in which the likelihoods of individual symbols were serially maximized assuming tentative hard decisions on the other symbols. To reduce error propagation, they initialized using an approximation of LZF that was implemented serially using Gauss–Seidel iterations. Das & Schniter (2007) and Schniter (2004) proposed iterative soft equalization based on the local observation model (6.56), requiring only $O(KD^3)$ operations per block iteration. As discussed in Section 6.3.2, a well-designed iterative soft equalizer is effective at preventing error propagation and can be used in a turbo configuration, as in Das & Schniter (2007). A similar iterative soft equalization scheme was proposed later by Peng & Ryan (2006).



6.3.3.2 Fast Joint Equalization

Fast joint equalization techniques have also been proposed for the coherent equalization of narrowly quasibanded versions of $Q$ that result when time-frequency concentrated modulation/demodulation is used with rapidly TV channels. As opposed to serial equalization schemes, which estimate the symbols in $a$ one at a time, joint equalization schemes estimate the $K$ symbols in $a$ jointly.

Early joint techniques assumed not only that $Q$ is narrowly quasibanded, but also that the off-diagonal coefficients within the support region of $Q$ are themselves relatively small. For example, iterative LZF approximation techniques that require $O(KD)$ operations per block iteration were proposed by Toeltsch & Molisch (2001) and Guillaud & Slock (2003). Gorokhov & Linnartz (2004) proposed $O(KD)$ approximate LMMSE and DFE-like schemes using a first-order Taylor series approximation of the $K \times K$ LMMSE matrix inverse. Tomasin, Gorokhov, Yang, & Linnartz (2005) extended the techniques in Gorokhov & Linnartz (2004) to incorporate iterative hard interference cancelation. Hou & Chen (2005) proposed an $O(KD^2)$ nonlinear estimator of the form $\hat{a} = E_{\text{FF}}\, y - E_{\text{FB}}\, \mathcal{D}_{\mathcal{A}}(E_{\text{FF}}\, y)$, where $E_{\text{FF}}$ and $E_{\text{FB}}$ are both narrowly banded. Note that, in Hou & Chen (2005), quantization is performed on the linear estimates $E_{\text{FF}}\, y$ rather than (causally) on the final estimates $\hat{a}$, as in DFE (6.27).

More recently, Rugini, Banelli, & Leus proposed $O(KD^2)$ exact LMMSE (Rugini, Banelli, & Leus, 2005) and MMSE-DFE (Rugini et al., 2006) schemes for narrowly banded¹⁶ $Q$ based on fast LDU decomposition. Furthermore, they showed how to design a receiver window to ensure that the noise covariance $C_z$ is quasibanded, making the observation covariance $C_y = \sigma_a^2 Q^H Q + C_z$ banded as well. These nonapproximate LMMSE and MMSE-DFE equalizers are expected to outperform their approximate counterparts. (See Chapter 7 for more details.)

Joint MLSD-based schemes exploiting the quasibanded structure of $Q$ have also been proposed. For example, Matheus & Kammeyer (1997) applied the Viterbi algorithm to implement exact MLSD on a $(2D+1)$-banded $Q$ with complexity $O(KD|\mathcal{A}|^{2D+1})$. The same idea was reinvented in later works, e.g., (Sigloch et al., 2005). For typical values of $|\mathcal{A}|$ and $D$, however, the complexity of Viterbi equalization can be orders-of-magnitude higher than that of MMSE-DFE. Thus, Hwang & Schniter (2006) investigated tree-search approaches to approximate MLSD. Their techniques use fast MMSE-DFE preprocessing, costing $O(KD^2)$ operations, followed by a tree-search that uses a fast metric update and is tuned to the V-shaped structure of the upper triangular matrix $\Delta_{\text{MMSE-DFE}}^{1/2} U_{\text{MMSE-DFE}}$ in the causal model (6.33). (Recall the V-shaped illustration in Fig. 6.5(c).) The resulting scheme has approximately the same complexity as the fast MMSE-DFE from Rugini et al. (2006), yet results in performance that is almost indistinguishable from MLSD.
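To give a rough feel for the "orders-of-magnitude" gap between Viterbi equalization and fast MMSE-DFE, the two operation counts can be compared directly. The values $D = 2$, QPSK ($|\mathcal{A}| = 4$), and 16-QAM ($|\mathcal{A}| = 16$) are assumptions chosen here for illustration, and the constants hidden by the $O(\cdot)$ notation are ignored, so the ratios are only indicative:

$$
\frac{K D\, |\mathcal{A}|^{2D+1}}{K D^2} = \frac{|\mathcal{A}|^{2D+1}}{D}
= \begin{cases}
4^5 / 2 = 512, & |\mathcal{A}| = 4,\\[2pt]
16^5 / 2 = 524{,}288, & |\mathcal{A}| = 16.
\end{cases}
$$

Both counts are linear in $K$, so the ratio does not depend on the block size; it is driven entirely by the alphabet size and the interference radius.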

MAP schemes exploiting the quasibanded structure of $Q$ have also been proposed. For $(2D+1)$-banded $Q$, e.g., Liu & Fitz (2007) used reduced-state sequence estimation (Eyuboglu & Qureshi, 1988) to compute approximate soft bit estimates while Hwang & Schniter (2009) applied tree-search (with a fast metric update). Finally, using the technique of Tuchler, Koetter, & Singer (2002), it is straightforward to translate any set of LMMSE estimates into soft bit estimates. Leveraging this idea, Fang, Rugini, & Leus (2008) turned the fast joint LMMSE estimation scheme of Rugini et al. (2005) into a soft bit estimation scheme.

¹⁶ With minor modifications, the fast LDU decomposition that Rugini, Banelli, & Leus developed for banded matrices (i.e., those matching Fig. 6.5(b)) can be extended to quasibanded matrices (i.e., those matching Fig. 6.5(a)) (Hwang & Schniter, 2006).



6.3.3.3 Other Approaches to Equalization for Time-Frequency Concentrated Schemes

For completeness, we mention two other schemes proposed for the equalization of channels yielding a narrowly quasibanded $Q$. The article by Choi, Voltz, & Cassara (2001) was among the first to consider equalization for multicarrier modulation over doubly selective (i.e., time- and frequency-selective) channels, and it proposed ZF, LMMSE, and ZF-DFE schemes for doing so. However, these schemes consumed $O(K^3)$ operations per block because the narrowly quasibanded structure of $Q$ was not leveraged. For the same application, Stamoulis, Diggavi, & Al-Dhahir (2002) proposed an $O(K^2)$ LMMSE approximation where the matrix to be inverted during each block interval is replaced by its time average. As we have seen, however, near-optimal schemes can be designed with complexities that are linear in $K$.

6.3.4 Coherent Equalization for Single-Carrier Modulation/Demodulation

Recall from Section 6.2.2 that, when single-carrier modulation/demodulation is used, the effective channel matrix $Q$ falls into the “widely quasibanded” class. Here, $Q$ has only negligible coefficients outside the shaded region in Fig. 6.2(a), where $M$ denotes the discrete channel delay spread. Because, in this case, $Q$ is quasibanded, the equalization techniques described in Section 6.3.3 can in principle be applied here as well (e.g., Ahmed, Sellathurai, Lambotharan, & Chambers, 2006). However, this approach will only be practical when $M$ is small. Because $M$ is often large (e.g., in the hundreds), there is good reason to study equalization schemes whose complexities are robust to large $M$.

6.3.4.1 Frequency-Domain Equalization

Frequency-domain equalization (FDE) (Falconer et al., 2002) is one approach to make the equalization complexity of single-carrier schemes reasonable when $M$ is large. To describe FDE, we focus on the case of CP-SCM modulation/demodulation, assuming adequate guard length (i.e., $N_g \geq M - 1$) and white noise (i.e., $C_z = \sigma_z^2 I_K$). The first step of FDE is transformation of the observations $y$ to the frequency domain. Denoting the $K \times K$ unitary discrete Fourier transform (DFT) matrix by $W$ and using under-bars to identify frequency-domain vectors (e.g., $\underline{y} \triangleq Wy$, $\underline{a} \triangleq Wa$, and $\underline{z} \triangleq Wz$), it follows from (6.2) that

$$\underline{y} = \underline{Q}\,\underline{a} + \underline{z}, \tag{6.57}$$

where $\underline{Q} = W Q W^H$ and where $\underline{z}$ also has covariance $\sigma_z^2 I_K$. The second step of FDE is estimation of $\underline{a}$ from $\underline{y}$. Although the elements of $a$ belong to a finite alphabet, the elements of $\underline{a}$ do not, and hence the estimator must be linear. One option is LMMSE estimation, i.e., $\hat{\underline{a}}_{\text{LMMSE}} = \big( \underline{Q}^H \underline{Q} + \tfrac{\sigma_z^2}{\sigma_a^2} I_K \big)^{-1} \underline{Q}^H \underline{y}$. The third and final step of FDE is transformation of $\hat{\underline{a}}$ back to the time domain, yielding the symbol estimates $\hat{a} \triangleq W^H \hat{\underline{a}}$. If the DFTs are implemented using radix-2 FFTs, they will consume only $O(K \log_2 K)$ operations.

With a time-invariant channel, the use of CP-SCM makes $Q$ circulant (recalling Section 6.2.2) and hence $\underline{Q}$ diagonal. In this case, the LMMSE estimation step consumes only $O(K)$ operations (because the matrix to invert is diagonal), and FDE consumes $O(K \log_2 K)$ operations in total. Note that, for large $M$, FDE would be significantly cheaper than LMMSE estimation of $a$ from $y$ through fast LDU (Rugini et al., 2005) (as discussed in Section 6.3.3), which consumes $O(KM^2)$ operations, where typically $M \approx K/4$.
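The three FDE steps for the time-invariant case can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the channel taps and symbols are toy values, and a naive $O(K^2)$ DFT is used for clarity where a real implementation would use an FFT. For a circulant $Q$ whose first column is the zero-padded impulse response, the diagonal of $WQW^H$ is $\sqrt{K}$ times the unitary DFT of that column.

```python
import cmath

def dft(x, inverse=False):
    """Unitary DFT or inverse DFT (naive O(K^2) sum, for clarity only)."""
    K = len(x)
    sign = 1 if inverse else -1
    return [
        sum(x[n] * cmath.exp(sign * 2j * cmath.pi * i * n / K) for n in range(K))
        / K ** 0.5
        for i in range(K)
    ]

def circular_conv(h, a):
    """Noise-free CP-SCM observation over a time-invariant channel."""
    K = len(a)
    return [sum(h[m] * a[(n - m) % K] for m in range(len(h))) for n in range(K)]

def fde_lmmse(y, h, sigma2_z, sigma2_a=1.0):
    """Frequency-domain LMMSE equalization for a circulant channel matrix."""
    K = len(y)
    h_pad = list(h) + [0.0] * (K - len(h))
    lam = [K ** 0.5 * v for v in dft(h_pad)]      # diagonal of W Q W^H
    y_f = dft(y)                                  # step 1: to frequency domain
    a_f = [l.conjugate() * v / (abs(l) ** 2 + sigma2_z / sigma2_a)
           for l, v in zip(lam, y_f)]             # step 2: per-bin scalar LMMSE
    return dft(a_f, inverse=True)                 # step 3: back to time domain

# Toy example (hypothetical numbers): 8 BPSK symbols, 3-tap channel.
a = [1, -1, -1, 1, 1, 1, -1, 1]
h = [1.0, 0.5, 0.2]
y = circular_conv(h, a)
a_hat = fde_lmmse(y, h, sigma2_z=1e-9)
print([round(v.real, 3) for v in a_hat])
```

The per-bin step is a scalar operation per frequency, which is where the $O(K)$ cost of the LMMSE step comes from. The sketch does not carry over directly to a time-varying channel, since the frequency-domain matrix is then no longer diagonal.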



With a TV channel, $Q$ will not be circulant, and thus $\underline{Q}$ will not be diagonal. In this case, the off-diagonal terms of $\underline{Q}$ will be nonzero, complicating the estimation of $\underline{a}$ from $\underline{y}$. In fact, the interference power profile of $\underline{Q}$ with CP-SCM is identical to that of $Q$ with CP-OFDM, which (as shown in Fig. 6.4) decays quite slowly with distance from the diagonal. However, through the application of time-domain windowing¹⁷ at the demodulator (Schniter & Liu, 2003), it is possible to give $\underline{Q}$ the narrowly quasibanded support of Fig. 6.2(b), in which case any of the fast linear equalization techniques described in Section 6.3.3 can be used to estimate $\underline{a}$ from $\underline{y}$. For example, Tang & Leus (2008) proposed a method to equalize a single-carrier system using the OFDM fast LMMSE technique (Rugini et al., 2005).

Because $\underline{a}$ does not have a finite-alphabet structure, the trellis, DFE, and tree-search-based techniques discussed in Section 6.3.3 are not directly applicable to the estimation of $\underline{a}$. Iterative soft equalization, however, is applicable. We now summarize the approach proposed by Schniter & Liu (2003). First, the fast serial iterative soft equalization technique of Schniter (2004) is used to compute the LMMSE interference-canceled estimate $\hat{\underline{a}}$ from the frequency-domain windowed observations $\underline{y}$ (given current estimates of the time-domain symbol means and variances). Next, the estimates $\hat{\underline{a}}$ are transformed to the time domain through $\hat{a} = W^H \hat{\underline{a}}$, from which posterior LLRs are calculated for each of the bits $c[j]$. The posterior LLR computation is more complicated than (6.54)–(6.55), though, due to the correlation that results from the time-frequency transformation. Finally, the posterior LLRs are used as priors in the next iteration, which begins by recalculating the time-domain symbol means and variances. In Schniter & Liu (2003), a fast algorithm for the entire procedure was derived that consumes only $O(D^2 K \log K)$ operations per block iteration. Ng & Falconer (2004) later extended the technique of Schniter & Liu (2003) to include widely linear estimation (although they neglected the receiver windowing step).

Although the windowed FDE method above focuses on CP-SCM, similar techniques can be applied to ZP-SCM under appropriate processing of the received guard samples. For single-carrier modulation without a prefix, the use of IBI-cancellation and cyclic-prefix reconstruction (Kim & Stuber, 1998) enables the application of the CP-SCM-based windowed FDE methods discussed earlier, as demonstrated by Schniter & Liu (2004).

6.3.4.2 Other Approaches to Equalization for Single-Carrier Schemes

Barhumi, Leus, & Moonen (2005) proposed a CP-SCM equalization technique based on linear TV filters whose time-variations were constrained to obey a (possibly oversampled) complex-exponential basis expansion model of order $I - 1$. Under these constraints, LZF and LMMSE equalizers, requiring $O(K I^3 M^3)$ operations per block, were designed. However, because of the cubic complexity in $M$, these schemes are much more expensive than frequency-domain equalization when $M$ is large.

6.4 NONCOHERENT EQUALIZATION

In Section 6.3, we discussed the coherent approach to equalization, i.e., estimation of $a$ from $y$ in (6.2), where the channel $H$, and hence the effective channel $Q$, was assumed to be known. Here, we discuss noncoherent equalization, where the channel realization $H$ is unknown but its statistics may be known.

¹⁷ With time-domain windowing, $\underline{y} = W \Delta y$ and $\underline{Q} = W \Delta Q W^H$ for suitably chosen diagonal $\Delta$.


CHAPTER 7

OFDM Communications over Time-Varying Channels

Luca Rugini (1), Paolo Banelli (1), Geert Leus (2)

(1) University of Perugia, Perugia, Italy
(2) Delft University of Technology, Delft, The Netherlands

7.1 OFDM SYSTEMS

Orthogonal frequency-division multiplexing (OFDM), also known as multicarrier modulation (Bingham, 1990; Cimini Jr, 1985; Keller & Hanzo, 2000; Le Floch, Alard, & Berrou, 1995; Sari, Karam, & Jeanclaude, 1995; Wang & Giannakis, 2000; Zou & Wu, 1995), relies on the concept of parallel data transmission in the frequency domain and mainly owes its success to the easy equalization for linear time-invariant (LTI) frequency-selective channels. In OFDM systems, the data symbol stream is split into $L$ parallel flows, which are transmitted on equispaced frequencies called subcarriers, each one characterized by a transmission rate that is $1/L$ times lower than the original data rate. This is obtained by splitting the original data stream into multiple blocks, which are transmitted in consecutive time intervals, where each symbol of a block is associated to a specific subcarrier. This frequency-domain multiplexing can be efficiently performed by means of fast Fourier transform algorithms.

Due to the use of orthogonal (equispaced) subcarriers, OFDM systems with LTI frequency-selective channels avoid the so-called intercarrier interference (ICI) among the data symbols of the same OFDM block. Differently from conventional frequency-division multiplexing, a frequency overlapping among the spectra associated to different substreams is permitted, resulting in a significant reduction of the bandwidth requirements. Moreover, for LTI frequency-selective channels, the absence of ICI allows an easy channel equalization, which can be performed on a per-subcarrier basis by means of scalar divisions. The intersymbol interference (ISI)¹ among data symbols of different OFDM blocks, induced by multipath propagation, is avoided by a suitable cyclic extension of each OFDM block, usually referred to as cyclic prefix (CP) (Sari et al., 1995; Wang & Giannakis, 2000; Zou & Wu, 1995).

However, when the channel experiences a nonnegligible time variation, each subcarrier undergoes a Doppler spreading effect that destroys the subcarrier orthogonality, producing significant ICI (Robertson & Kaiser, 1999; Russell & Stuber, 1995; Stantchev & Fettweis, 2000). Dually to the ISI in single-carrier systems, the ICI power reduces the signal-to-interference-plus-noise ratio (SINR) and, when left uncompensated, impairs the performance of OFDM systems. A simple method that reduces the ICI is the shortening of the OFDM block duration. This way the channel becomes (almost) constant over each block. However, the block-length shortening is capacity inefficient, because the CP has to be inserted more frequently. Therefore, other ICI mitigation techniques are necessary. These techniques are reviewed in Section 7.2. In addition, the rapid time variation of the channel makes its estimation more complicated. This issue is discussed in Section 7.3.

¹ The ISI is also known as interblock interference, while the OFDM blocks are also known as OFDM symbols.

In this section, we first set up the system model and review the behavior of OFDM systems with LTI channels, focusing on the most popular OFDM wireless standards. Subsequently, we show the effects of rapidly time-varying channels on conventional OFDM systems by analyzing the ICI power, the SINR degradation, and the bit-error rate (BER) performance loss. Finally, we extend the system model to multiantenna OFDM systems.

7.1.1 System Model

We consider an OFDM system with $L$ equispaced subcarriers, where $F$ is the subcarrier separation and $L_{\text{CP}}$ is the size of the CP that is prepended to each OFDM block. The whole OFDM system we are going to describe is depicted in Fig. 7.1.

FIGURE 7.1

OFDM system model. Top: transmitter and channel. Bottom: receiver. [Block diagram. Transmitter and channel: data a[k] and pilots p[k] form x[k], followed by the IDFT (W^H), CP insertion (L_CP) giving s[k], the LTV channel (H_0[k], H_1[k]), and AWGN giving r[k]. Receiver: CP removal (L_CP), windowing (Δ), and DFT (W) giving y_w[k], with y[k] = y_w[k] for Δ = I_L, followed by the channel estimator and the equalizer giving the data estimate.]

After serial-to-parallel conversion, the stream of symbols is split into data blocks. Each OFDM block, of size $L$, can contain either data symbols or pilot symbols, or both data and pilots, depending on the training pattern. The pilot symbols may be used at the receiver side for time and frequency synchronization, channel estimation, phase offset correction, and so on. Virtual carriers, which are included in every OFDM system as guard bands to prevent adjacent-channel interference, are considered as pilot symbols. The generic symbol transmitted on the $l$th subcarrier of the $k$th OFDM block is denoted by $x[l,k]$. Defining $\mathbf{x}[k] \triangleq (x[0,k] \cdots x[L-1,k])^T$ as the vector that collects the data $\mathbf{a}[k]$ and pilots $\mathbf{p}[k]$ of the $k$th block, i.e., $\mathbf{x}[k] = \mathbf{a}[k] + \mathbf{p}[k]$, the $k$th transmitted block $\mathbf{s}[k]$, of size $N = L + L_{\mathrm{CP}}$, can be expressed as (Wang & Giannakis, 2000)

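\[
\mathbf{s}[k] = \mathbf{T}_{\mathrm{CP}} \mathbf{W}^H \mathbf{x}[k]. \tag{7.1}
\]

Here, $\mathbf{W}$ is the $L \times L$ unitary discrete Fourier transform (DFT) matrix, defined by $[\mathbf{W}]_{i,l} \triangleq \frac{1}{\sqrt{L}} e^{-j2\pi il/L}$, $0 \le i, l \le L-1$, and $\mathbf{T}_{\mathrm{CP}} \triangleq (\mathbf{I}_{\mathrm{CP}}^T \;\; \mathbf{I}_L)^T$ is the $N \times L$ matrix that inserts the CP, where $\mathbf{I}_{\mathrm{CP}}$ contains the last $L_{\mathrm{CP}}$ rows of the identity matrix $\mathbf{I}_L$. Thus, OFDM can be seen as a particular linearly precoded block transmission, with precoding matrix $\mathbf{T}_{\mathrm{CP}} \mathbf{W}^H$.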
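As an illustration of (7.1), the following minimal sketch applies the unitary IDFT and then prepends the last L_CP time-domain samples. The function names, the block length L = 8, and the symbol values are assumptions made for this example only; only the Python standard library is used.

```python
import cmath

def idft_unitary(x):
    """Apply W^H (the inverse of the unitary DFT matrix W) to the list x."""
    L = len(x)
    return [sum(x[l] * cmath.exp(2j * cmath.pi * i * l / L) for l in range(L)) / L ** 0.5
            for i in range(L)]

def ofdm_modulate(x, L_cp):
    """Return s = T_CP W^H x: unitary IDFT followed by cyclic-prefix insertion."""
    t = idft_unitary(x)
    return t[len(t) - L_cp:] + t   # the last L_cp samples are prepended

x = [1, -1, 1j, -1j, 1, 1, -1, -1]   # illustrative block of symbols, L = 8
s = ofdm_modulate(x, L_cp=2)         # transmitted block of size N = L + L_cp = 10
```

The first L_cp entries of `s` repeat its last L_cp entries, which is the property that turns the linear channel convolution into a circular one after CP removal.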

After parallel-to-serial conversion, the signal stream $s[kN+n] \triangleq [\mathbf{s}[k]]_n$ is transmitted through a linear time-varying (LTV) multipath channel with discrete-time impulse response $h[n,m]$, where $n$ is the time index and $m$ is the time-delay (lag) index. We assume a finite impulse response LTV channel, i.e., $h[n,m]$ has zero entries outside $0 \le m \le M-1$. Assuming time and frequency synchronization at the receiver side, the received samples can be expressed as

\[
r[n] = \sum_{m=0}^{M-1} h[n,m]\, s[n-m] + w[n],
\]

where $w[n]$ represents additive white Gaussian noise (AWGN). The $N$ received samples relative to the $k$th OFDM block are grouped in the vector $\mathbf{r}[k]$, with $[\mathbf{r}[k]]_n = r[kN+n]$, thus obtaining

\[
\mathbf{r}[k] = \mathbf{H}_0[k]\,\mathbf{s}[k] + \mathbf{H}_1[k]\,\mathbf{s}[k-1] + \mathbf{w}[k]. \tag{7.2}
\]

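Here, $\mathbf{H}_0[k]$ and $\mathbf{H}_1[k]$ are $N \times N$ matrices with elements $[\mathbf{H}_0[k]]_{n,m} = h[kN+n, n-m]$ and $[\mathbf{H}_1[k]]_{n,m} = h[kN+n, N+n-m]$:

\[
\mathbf{H}_0[k] \triangleq
\begin{pmatrix}
h[kN,0] & 0 & \cdots & \cdots & 0 \\
\vdots & \ddots & \ddots & & \vdots \\
h[kN+M-1,M-1] & & \ddots & \ddots & \vdots \\
\vdots & \ddots & & \ddots & 0 \\
0 & \cdots & h[kN+N-1,M-1] & \cdots & h[kN+N-1,0]
\end{pmatrix},
\]

\[
\mathbf{H}_1[k] \triangleq
\begin{pmatrix}
0 & \cdots & h[kN,M-1] & \cdots & h[kN,1] \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & & \ddots & & h[kN+M-2,M-1] \\
\vdots & \ddots & & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0
\end{pmatrix}.
\]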

In obtaining (7.2), we have implicitly assumed that the block length $N$ is greater than the channel order $M-1$ so that ISI is possible only from the previous data block $\mathbf{s}[k-1]$. At the receiver, $\mathbf{r}[k]$ in (7.2) is left-multiplied by the matrix $\mathbf{R}_{\mathrm{CP}} \triangleq (\mathbf{0}_{L \times L_{\mathrm{CP}}} \;\; \mathbf{I}_L)$ that removes the CP. In what follows, we assume $L_{\mathrm{CP}} \ge M-1$. Then, the ISI is completely eliminated, since $\mathbf{R}_{\mathrm{CP}} \mathbf{H}_1[k]\, \mathbf{s}[k-1] = \mathbf{0}_{L \times 1}$ (Wang & Giannakis, 2000).
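The ISI-elimination property can be checked numerically. The sketch below builds H_1[k] from its element definition, drops the first L_CP rows (the action of R_CP), and tests whether anything is left. The impulse response `h`, the helper names, and the sizes are hypothetical choices for illustration.

```python
def build_h1(h, k, N, M):
    """N x N matrix with [H1]_{n,m} = h(kN + n, N + n - m) for lags 0..M-1, zero elsewhere."""
    H1 = [[0.0] * N for _ in range(N)]
    for n in range(N):
        for m in range(N):
            lag = N + n - m
            if 0 <= lag <= M - 1:
                H1[n][m] = h(k * N + n, lag)
    return H1

def remove_cp(H, L_cp):
    """Left-multiply by R_CP = (0  I_L), i.e., drop the first L_cp rows."""
    return H[L_cp:]

# Hypothetical LTV impulse response; every tap is nonzero so nothing cancels by accident.
h = lambda n, m: 1.0 / (1 + m) + 0.01 * n

L, L_cp, M = 8, 3, 4                      # L_cp >= M - 1
H1 = build_h1(h, k=1, N=L + L_cp, M=M)
isi_free = all(v == 0.0 for row in remove_cp(H1, L_cp) for v in row)

# With a CP that is too short (L_cp = 2 < M - 1), some ISI survives CP removal.
short_rows = remove_cp(build_h1(h, k=1, N=L + 2, M=M), 2)
short_cp_leaks = any(v != 0.0 for row in short_rows for v in row)
```

Only the first M-1 rows of H_1[k] are nonzero, so discarding at least M-1 rows removes the contribution of s[k-1] entirely.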

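Next, the received signal is converted to the frequency domain by applying a DFT, as expressed by $\mathbf{y}[k] \triangleq \mathbf{W}\mathbf{R}_{\mathrm{CP}}\mathbf{r}[k]$, which by (7.1) and (7.2) can be rewritten as

\[
\mathbf{y}[k] = \mathbf{W}\mathbf{H}_T[k]\mathbf{W}^H\mathbf{x}[k] + \mathbf{W}\mathbf{R}_{\mathrm{CP}}\mathbf{w}[k] = \mathbf{H}_F[k]\,\mathbf{x}[k] + \mathbf{z}[k]. \tag{7.3}
\]

Here, $\mathbf{H}_T[k] \triangleq \mathbf{R}_{\mathrm{CP}}\mathbf{H}_0[k]\mathbf{T}_{\mathrm{CP}}$ is the $L \times L$ matrix that summarizes the LTV channel in the time domain, including CP insertion and removal, with elements expressed by

\[
[\mathbf{H}_T[k]]_{n,m} = h[kN+L_{\mathrm{CP}}+n,\,(n-m) \bmod L], \tag{7.4}
\]

while $\mathbf{H}_F[k] \triangleq \mathbf{W}\mathbf{H}_T[k]\mathbf{W}^H$ is the $L \times L$ frequency-domain channel matrix, with elements expressed by

\[
\begin{aligned}
[\mathbf{H}_F[k]]_{l+d,l} &= \frac{1}{L}\sum_{n=0}^{L-1}\sum_{m=0}^{L-1} [\mathbf{H}_T[k]]_{n,m}\, e^{-j2\pi(l(n-m)+dn)/L} \\
&= \frac{1}{L}\sum_{n=0}^{L-1}\sum_{m=0}^{M-1} h[kN+L_{\mathrm{CP}}+n,m]\, e^{-j2\pi(lm+dn)/L},
\end{aligned}
\tag{7.5}
\]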

where $l$ represents the subcarrier index and $d$ is the discrete Doppler index. Specifically, the off-diagonal elements of the $l$th column of $\mathbf{H}_F[k]$ represent the discrete Doppler spread associated with the $l$th subcarrier, which is responsible for the ICI induced by the $l$th symbol of the OFDM block on the other symbols.

In summary, $\mathbf{H}_F[k]$ plays a crucial role, since it describes how the transmitted frequency-domain block $\mathbf{x}[k]$ is modified by the LTV channel. In addition, in (7.3), $\mathbf{z}[k] \triangleq \mathbf{W}\mathbf{R}_{\mathrm{CP}}\mathbf{w}[k]$ is the frequency-domain noise, which is AWGN because $\mathbf{W}$ is unitary.

7.1.1.1 LTI Channels and One-Tap Equalizers

When the channel time variation is either absent, i.e., for LTI multipath channels, or negligible, the channel impulse response (CIR) is constant over time. Hence, (7.4) becomes $[\mathbf{H}_T[k]]_{n,m} = h[0,(n-m) \bmod L]$, i.e., $\mathbf{H}_T[k] = \mathbf{H}_T$ is circulant and constant over the OFDM blocks. In this scenario, the CP not only eliminates the ISI, which could be removed by any kind of sufficiently long guard interval, e.g., by trailing zeros (Wang & Giannakis, 2000), but also induces a time-domain circular convolution of the transmitted signal with the CIR, which corresponds to a scalar multiplication in the discrete frequency domain. Because the columns of the DFT matrix, which linearly precodes the OFDM data, are eigenvectors of circulant matrices, the eigenvalue decomposition of $\mathbf{H}_T$ is given by $\mathbf{H}_T = \mathbf{W}^H \boldsymbol{\Lambda} \mathbf{W}$. Consequently, $\mathbf{H}_F[k] = \mathbf{H}_F = \boldsymbol{\Lambda}$ is diagonal, which shows that in LTI channels there is no ICI. A continuous-time interpretation of OFDM systems is that, for every OFDM block, the $l$th symbol is transmitted in the frequency domain by a sinc function centered on the $l$th subcarrier. The zeros of this sinc function are located on the other equispaced subcarriers, which guarantees ICI-free reception by DFT spectrum sampling. From (7.5), it is easy to derive

\[
\lambda_{ll} \triangleq [\boldsymbol{\Lambda}]_{l,l} = \sum_{m=0}^{M-1} h[0,m]\, e^{-j2\pi lm/L},
\]

i.e., $\mathbf{H}_F$ contains on its diagonal the DFT of the CIR. Due to the diagonal frequency-domain channel matrix, the input–output relation can be expressed as

\[
y[k,l] = \lambda_{ll}\, x[k,l] + z[k,l].
\]

Hence, in OFDM systems, the equalization of LTI channels is rather simple and may be computed as $\hat{x}[k,l] = y[k,l]/\lambda_{ll}$ (Sari et al., 1995). This is usually referred to as one-tap equalization.
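The one-tap equalizer can be illustrated with a small noise-free simulation. This is a minimal sketch under stated assumptions: the three-tap CIR `h`, the symbol block `x`, and the helper names are invented for the example, and only a single block is transmitted, so no ISI from a previous block is present.

```python
import cmath

def dft(v, sign=-1):
    """Unitary DFT (sign=-1, i.e., W) or unitary inverse DFT (sign=+1, i.e., W^H)."""
    L = len(v)
    return [sum(v[n] * cmath.exp(sign * 2j * cmath.pi * n * l / L) for n in range(L)) / L ** 0.5
            for l in range(L)]

def ofdm_lti_link(x, h, L_cp):
    """Noise-free OFDM block over an LTI FIR channel h; returns the DFT output y."""
    t = dft(x, +1)
    s = t[len(t) - L_cp:] + t                          # CP insertion
    r = [sum(h[m] * s[n - m] for m in range(len(h)) if n - m >= 0)
         for n in range(len(s))]                       # linear convolution with the CIR
    return dft(r[L_cp:], -1)                           # CP removal, then DFT

h = [1.0, 0.5, 0.25]                                   # hypothetical CIR, M = 3
x = [1, -1, 1j, -1j, 1, 1, -1, -1]                     # illustrative symbols, L = 8
L = len(x)
lam = [sum(h[m] * cmath.exp(-2j * cmath.pi * l * m / L) for m in range(len(h)))
       for l in range(L)]                              # lambda_ll: DFT of the CIR
y = ofdm_lti_link(x, h, L_cp=2)                        # L_cp = M - 1
x_hat = [y[l] / lam[l] for l in range(L)]              # one-tap equalization
```

In this noise-free setting `x_hat` reproduces `x` up to rounding; with noise, the division by a small λ_ll would amplify the noise on that subcarrier.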

In general, the channel transfer function is estimated, for pilot locations, by $\hat{\lambda}_{ll} = y[k,l]/p[k,l]$ or by $\hat{\lambda}_{ll} = \frac{1}{K}\sum_{k=0}^{K-1} \frac{y[k,l]}{p[k,l]}$ when the pilot positions are constant for $K$ OFDM blocks. Estimates of $\lambda_{ll}$ for the data subcarriers are usually obtained by interpolating the channel values estimated for the pilot subcarriers (Ozdemir & Arslan, 2007).

7.1.1.2 OFDM Standards

In this section, we compare some popular wireless OFDM standards to appreciate their sensitivity to the Doppler spread induced by LTV channels. The OFDM standards under investigation are DVB-T/H (ETSI, 2004), DAB (ETSI, 2005), IEEE 802.11a (IEEE, 1999), and IEEE 802.16e (WiMAX) (IEEE, 2006). Clearly, the time variability of the channel, summarized by the channel coherence time $T_c$, should be compared with the duration of the OFDM block: standards with longer OFDM block duration are more sensitive to the Doppler effect; they feature bigger channel matrices $\mathbf{H}_T[k]$, whose diagonals display a larger time variability of the channel.

Dually, we can compare the maximum Doppler frequency $\nu_{\max}$ with the subcarrier separation $F$: indeed, as will be clarified in the next section, the ICI power is roughly a quadratic function of the normalized maximum Doppler shift $\vartheta_{\max} \triangleq \nu_{\max}/F$. As a consequence, robust standards have a small $\vartheta_{\max}$. This quantity can be calculated as

\[
\vartheta_{\max} = \frac{f_c}{F}\,\frac{\upsilon}{c_0},
\]

where $f_c$ is the carrier frequency, $\upsilon$ is the relative speed between transmitter and receiver, and $c_0$ is the speed of light.

For the IEEE 802.11a WLAN standard (IEEE, 1999), $F = 312.5$ kHz, while the frequency band is around 5 GHz. Specifically, for the maximum carrier frequency $f_c = 5.825$ GHz and speed $\upsilon = 100$ km/h, we obtain $\vartheta_{\max} \approx 0.0017$.
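The IEEE 802.11a value can be reproduced with a few lines of code. The function name and the exact value used for the speed of light are assumptions of this sketch; the carrier frequency, subcarrier separation, and speed are the ones quoted in the text.

```python
C0 = 299_792_458.0   # speed of light in m/s

def normalized_doppler(fc_hz, F_hz, speed_kmh):
    """theta_max = (fc / F) * (v / c0), with the speed given in km/h."""
    return (fc_hz / F_hz) * (speed_kmh / 3.6) / C0

theta_wlan = normalized_doppler(5.825e9, 312.5e3, 100.0)   # IEEE 802.11a figures from the text
```

Since the expression is linear in both the carrier frequency and the speed, the same helper can be reused for the other standards once their subcarrier separation is known.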

For the IEEE 802.16e WiMAX standard (IEEE, 2006), the subcarrier separation depends on the ratio between the allocated bandwidth $B$ and the number $L$ of subcarriers as $F \approx nB/L$, where $n$ is a rational scaling factor between 1.12 and 1.152. Typical low values of $B$ and $L$ give $F \approx 9.77n \approx 10$ kHz, which equals the value obtained for typical high values of $B$ and $L$. Since the maximum carrier frequency is $f_c = 10.68$ GHz, for $\upsilon = 100$ km/h, we obtain $\vartheta_{\max} \approx 0.089$. This value, which is roughly 50 times higher than for WLAN, explains why the channel time variation could be a problem for WiMAX, while it can be ignored in WLAN systems. Indeed, the emerging IEEE 802.11p standard amendment for vehicular communications further increases the number of subcarriers with respect to IEEE 802.11a, without significantly degrading the Doppler resistance. The CP length is increased to guarantee ISI-free transmission in outdoor environments. The resulting loss in spectral efficiency is kept down by an increased duration of the IEEE 802.11p OFDM block.

For DVB-T/H (ETSI, 2004), the value of $\vartheta_{\max}$ is highly dependent on the channel bandwidth $B$, which ranges from 5 to 8 MHz for different countries, on the carrier frequency $f_c$, and on the transmission mode, which determines the number of subcarriers $L$ for a given bandwidth $B$. The available modes are Mode 2k ($L = 2048$), Mode 4k ($L = 4096$), and Mode 8k ($L = 8192$). In the following, we will focus on the 8k mode, which is the most sensitive to the Doppler spread. Assuming again $\upsilon = 100$ km/h, for $f_c = 230$ MHz, the normalized maximum Doppler shift is between $\vartheta_{\max} \approx 0.019$ (for $B = 8$ MHz) and $\vartheta_{\max} \approx 0.031$ (for $B = 5$ MHz). Hence, in this case, the Doppler effect for DVB-T/H 8k is less pronounced than for WiMAX. For $f_c = 862$ MHz, we have $\vartheta_{\max} \approx 0.071$ for $B = 8$ MHz and $\vartheta_{\max} \approx 0.11$ for $B = 5$ MHz. Therefore, in this second case, the sensitivity of DVB-T/H 8k to the channel time variation is similar to WiMAX. In addition, for $f_c = 1.492$ GHz, the performance degradation of DVB-T/H 8k is even higher than for WiMAX, since $\vartheta_{\max} \approx 0.12$ for $B = 8$ MHz and $\vartheta_{\max} \approx 0.20$ for $B = 5$ MHz. The results for $B = 6$ MHz and $B = 7$ MHz can be found in Table 7.1. With respect to Mode 8k, the subcarrier separation $F$ of Modes 4k and 2k is double and quadruple, respectively, and consequently, the values of $\vartheta_{\max}$ are one-half and one-quarter of those for the 8k mode listed in Table 7.1.

Also for DAB (ETSI, 2005), and for its evolution known as T-DMB, we have to distinguish among different cases, depending on the carrier frequency $f_c$ and the transmission mode. However, the transmission bandwidth is fixed to $B = 1.536$ MHz. The values of $\vartheta_{\max}$, listed in Table 7.2, show that the sensitivity of DAB Mode I to Doppler is similar to that of DVB-T/H Mode 8k with $B = 7$ MHz. The sensitivity of DAB Mode IV is similar to that of DVB-T/H Mode 4k, and the sensitivity of DAB Mode II is similar to that of DVB-T/H Mode 2k. Indeed, in all cases, the ratio of the number of subcarriers of DVB-T/H versus DAB is constant (equal to 4), and approximately equal to the bandwidth ratio.

Table 7.1 Normalized Maximum Doppler Shift ϑmax for DVB-T/H (Mode 8k), Assuming υ = 100 km/h.

                 B = 5 MHz       B = 6 MHz       B = 7 MHz       B = 8 MHz
fc = 230 MHz     ϑmax ≈ 0.031    ϑmax ≈ 0.025    ϑmax ≈ 0.022    ϑmax ≈ 0.019

Table 7.2 Normalized Maximum Doppler Shift ϑmax for DAB, Assuming υ = 100 km/h.

                 Mode I, L = 2048    Mode IV, L = 1024    Mode II, L = 512    Mode III, L = 256

7.1.2 Effects of Rapidly Time-Varying Channels

When the channel is LTV, the frequency-domain channel matrix $\mathbf{H}_F[k]$ in (7.5) is neither diagonal nor constant over successive OFDM blocks. Therefore, both the useful channel part and the ICI change from block to block. Let us split the frequency-domain channel matrix into two parts, as expressed by

\[
\mathbf{H}_F[k] = \mathbf{Q}[k] + \boldsymbol{\Phi}[k],
\]

where $\mathbf{Q}[k]$ is the diagonal part of $\mathbf{H}_F[k]$ and $\boldsymbol{\Phi}[k] \triangleq \mathbf{H}_F[k] - \mathbf{Q}[k]$ is the corresponding off-diagonal matrix. Then, (7.3) can be rewritten as

\[
\mathbf{y}[k] = \mathbf{Q}[k]\,\mathbf{x}[k] + \boldsymbol{\Phi}[k]\,\mathbf{x}[k] + \mathbf{z}[k], \tag{7.6}
\]

where the three terms on the right-hand side of (7.6) represent the useful signal, the ICI, and the AWGN, respectively. Since $q_{ll}[k] \triangleq [\mathbf{Q}[k]]_{l,l} = [\mathbf{H}_F[k]]_{l,l}$, from (7.5), the useful channel can be written as

\[
q_{ll}[k] = \sum_{m=0}^{M-1} \left[ \frac{1}{L} \sum_{n=0}^{L-1} h[kN+L_{\mathrm{CP}}+n,m] \right] e^{-j2\pi lm/L}. \tag{7.7}
\]

This is obtained as the DFT of the time-averaged CIR, which is the expression within the square brackets in (7.7). When the CIR varies rapidly with time, the time-averaged CIR in (7.7) decreases, because the elements $\{h[kN+L_{\mathrm{CP}}+n,m]\}$ add incoherently. As a result, the average power of the frequency-domain useful channel (i.e., the power of the elements of $\mathbf{Q}[k]$) decreases. In addition, a rapid time variation of the CIR also leads to an increased ICI power in $\boldsymbol{\Phi}[k]$, as detailed in the next subsection.
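The relation between (7.5) and (7.7) can be verified on a toy channel. The sketch below builds H_F[k] element by element from (7.5) and compares its diagonal with the DFT of the time-averaged CIR. The two-tap channel `h`, the sizes, and the function name are hypothetical, and the row index l + d is taken modulo L, which is an assumption about how the index in (7.5) wraps around.

```python
import cmath

def freq_channel_matrix(h, L, M, offset=0):
    """L x L matrix H_F built from (7.5); h(n, m) is the LTV impulse response and
    offset plays the role of kN + L_CP. Row index is (l + d) mod L, column index is l."""
    HF = [[0j] * L for _ in range(L)]
    for l in range(L):
        for d in range(L):
            acc = sum(h(offset + n, m) * cmath.exp(-2j * cmath.pi * (l * m + d * n) / L)
                      for n in range(L) for m in range(M))
            HF[(l + d) % L][l] = acc / L
    return HF

# Hypothetical two-tap channel whose taps drift linearly in time.
h = lambda n, m: (1.0 + 0.02 * n) if m == 0 else (0.5 - 0.01 * n)

L, M = 8, 2
HF = freq_channel_matrix(h, L, M)
h_avg = [sum(h(n, m) for n in range(L)) / L for m in range(M)]      # time-averaged CIR
q = [sum(h_avg[m] * cmath.exp(-2j * cmath.pi * l * m / L) for m in range(M))
     for l in range(L)]                                             # useful channel, (7.7)
ici_energy = sum(abs(HF[r][c]) ** 2 for r in range(L) for c in range(L) if r != c)
```

For a time-invariant `h`, the same function returns a diagonal matrix, so `ici_energy` vanishes up to rounding.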

7.1.2.1 ICI and SINR Analysis

Since conventional one-tap equalizers do not take the ICI into account, it is important to quantify the effect of the ICI on the decision variable. Herein, we present a statistical analysis of both the ICI and the SINR. For simplicity, in this subsection, we assume that only data symbols are transmitted, i.e., $x[k,l] = a[k,l]$, $\forall l, \forall k$. From (7.6), we obtain

\[
y[k,l] = q_{ll}[k]\, a[k,l] + \sum_{d=0,\, d \ne l}^{L-1} \varphi_{dl}[k]\, a[k,d] + z[k,l]. \tag{7.8}
\]

Assuming that

1. data and noise terms have zero mean;
2. all data symbols on different subcarriers and in different OFDM blocks are uncorrelated and have equal mean power $\sigma_a^2$;
3. the LTV channel is wide-sense stationary with uncorrelated scattering (WSSUS) with normalized path loss, i.e., $\rho_H^2$ defined in Chapter 1 is equal to one;
4. the noise is independent of the data and the channel;

then, the mean power received on the $l$th subcarrier of the $k$th OFDM block is expressed as

\[
\sigma_y^2 \triangleq E\{|y[k,l]|^2\} = E\{|q_{ll}[k]|^2\}\,\sigma_a^2 + P_{\mathrm{ICI}}\,\sigma_a^2 + \sigma_z^2, \tag{7.9}
\]

where $P_{\mathrm{ICI}} \triangleq \sum_{d=0,\, d \ne l}^{L-1} E\{|\varphi_{dl}[k]|^2\} = 1 - E\{|q_{ll}[k]|^2\}$ is the ICI power normalized by the data power $\sigma_a^2$.

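The value of $P_{\mathrm{ICI}}$ can be well approximated by assuming an infinite number of subcarriers. When the time-frequency correlation function $R_H(\Delta t, \Delta f)$, defined in Chapter 1, is separable, i.e., $R_H(\Delta t, \Delta f) = r_H^{(2)}(\Delta t)\, r_H^{(1)}(\Delta f)$, or equivalently when the scattering function $C_H(\tau,\nu)$, defined in Chapter 1, is separable, i.e., $C_H(\tau,\nu) = c_H^{(1)}(\tau)\, c_H^{(2)}(\nu)$, $P_{\mathrm{ICI}}$ can be expressed by (Li & Cimini Jr, 2001)

\[
\begin{aligned}
P_{\mathrm{ICI}} &= 1 - 2\int_0^1 (1-x)\, r_H^{(2)}\!\left(\frac{x}{F}\right) dx \\
&= 1 - \int_{-\nu_{\max}}^{\nu_{\max}} c_H^{(2)}(\nu)\, \mathrm{sinc}^2\!\left(\frac{\pi\nu}{F}\right) d\nu.
\end{aligned}
\tag{7.10}
\]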

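In this case, the ICI power does not depend on the delay power profile of the channel, whereas it depends on the Doppler power profile. For instance, in case of Jakes' Doppler power profile with $r_H^{(2)}(\Delta t) = J_0(2\pi\vartheta_{\max}F\Delta t)$, where $\vartheta_{\max}$ is the normalized maximum Doppler shift, (7.10) becomes (Robertson & Kaiser, 1999)

\[
\begin{aligned}
P_{\mathrm{ICI}} &= 1 - {}_1F_2\!\left(\tfrac{1}{2};\, \tfrac{3}{2}, 2;\, -(\pi\vartheta_{\max})^2\right) \\
&= 1 - 2\sum_{i=0}^{\infty} \frac{(-1)^i (\pi\vartheta_{\max})^{2i}}{(i!)^2\,(2i+1)(2i+2)} \\
&\approx \frac{\pi^2}{6}\vartheta_{\max}^2 - \frac{\pi^4}{60}\vartheta_{\max}^4 + \frac{\pi^6}{1008}\vartheta_{\max}^6,
\end{aligned}
\tag{7.11}
\]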

where ${}_pF_q$ stands for the generalized hypergeometric function (Gradshteyn & Ryzhik, 1994). The ICI power can be calculated also for different Doppler power profiles, such as uniform, Gaussian, and the two-path model (Robertson & Kaiser, 1999). In particular, the two-path model characterizes the Doppler shift caused by a carrier frequency offset (CFO). In this case, we have

\[
P_{\mathrm{ICI}} = 1 - \mathrm{sinc}^2(\pi\vartheta_{\max}) \approx \frac{\pi^2}{3}\vartheta_{\max}^2 - \frac{2\pi^4}{45}\vartheta_{\max}^4 + \frac{\pi^6}{315}\vartheta_{\max}^6. \tag{7.12}
\]

It is noteworthy that the ICI power produced by a CFO is roughly twice the ICI power related to a classical Jakes' Doppler power profile and is quite close to the universal upper bound $P_{\mathrm{ICI}} \le (\pi\vartheta_{\max})^2/3$ (Li & Cimini Jr, 2001).
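The comparison between (7.11), (7.12), and the upper bound can be reproduced numerically. In this sketch the function names, the number of series terms, and the test value ϑmax = 0.1 are illustrative choices; sinc(x) is taken as sin(x)/x, consistent with the series expansion in (7.12).

```python
import math

def p_ici_jakes(theta, terms=30):
    """Series form of (7.11) for Jakes' Doppler power profile."""
    z = (math.pi * theta) ** 2
    s = sum((-z) ** i / (math.factorial(i) ** 2 * (2 * i + 1) * (2 * i + 2))
            for i in range(terms))
    return 1.0 - 2.0 * s

def p_ici_cfo(theta):
    """Closed form (7.12) for the two-path (CFO) model, with sinc(x) = sin(x)/x."""
    x = math.pi * theta
    return 1.0 - (math.sin(x) / x) ** 2 if x else 0.0

def p_ici_bound(theta):
    """Universal upper bound (pi * theta)^2 / 3."""
    return (math.pi * theta) ** 2 / 3.0

theta = 0.1
jakes, cfo, bound = p_ici_jakes(theta), p_ici_cfo(theta), p_ici_bound(theta)
ratio = cfo / jakes   # close to 2 for small theta
```

The ratio approaches 2 as ϑmax decreases, because the leading terms of (7.12) and (7.11) are π²ϑ²/3 and π²ϑ²/6, respectively.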

When guard bands are present, the ICI power $P_{\mathrm{ICI}}[l] \triangleq \sum_{d\ \mathrm{active},\, d \ne l} E\{|\varphi_{dl}[k]|^2\}$ depends on the subcarrier index $l$. For subcarriers far away from the guard subcarriers, $P_{\mathrm{ICI}}[l] \approx P_{\mathrm{ICI}}$, expressed for instance by (7.11) or (7.12), while $P_{\mathrm{ICI}}[l] \approx P_{\mathrm{ICI}}/2$ for the edge subcarriers, since they receive most interference from a single side only. The exact value of $P_{\mathrm{ICI}}[l]$ can be determined by summing up the elements $E\{|\varphi_{dl}[k]|^2\}$ for all the indices $d \ne l$ corresponding to the active subcarriers, where (Schniter, 2004)

\[
\begin{aligned}
E\{|\varphi_{dl}[k]|^2\} &= \frac{1}{L^2}\sum_{m=-L+1}^{L-1} (L-|m|)\, r_{t,H}[m]\, e^{-j2\pi dm/L} \\
&= \left.\left( \frac{\sin^2(\omega L/2)}{L^2 \sin^2(\omega/2)} * s_{\nu,H}(\omega) \right)\right|_{\omega = 2\pi d/L}.
\end{aligned}
\tag{7.13}
\]

Observe that (7.13) is the DFT of the product of the triangular function $L-|m|$ and the discrete-time correlation $r_{t,H}[m] \triangleq r_H^{(2)}(m/(LF))$ of the channel, or, dually, the sampled version of the frequency-domain convolution of a squared digital-sinc function with the Doppler power profile of the discrete-time channel $s_{\nu,H}(\omega) \triangleq \sum_{m=-\infty}^{\infty} r_{t,H}[m]\, e^{-j\omega m}$. This convolution destroys the zeros of the squared digital-sinc function and hence generates ICI. From (7.13), an important result is that $E\{|\varphi_{dl}[k]|^2\}$ rapidly decreases with increasing Doppler index $d$, because the squared-sinc function tends to zero quadratically. Hence, most of the ICI is due to only a few subcarriers, especially for small values of $\vartheta_{\max}$. Therefore, when the number $L$ of subcarriers is large, $P_{\mathrm{ICI}}[l] \approx P_{\mathrm{ICI}}$ for almost all the subcarriers. It should be noted that (7.13) does not depend on the subcarrier index $l$, but only on the Doppler index $d$.
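The decay of the ICI coefficients with the Doppler index can be observed by evaluating the first line of (7.13) directly. The sketch below assumes Jakes' correlation, so that r_{t,H}[m] = J0(2πϑmax m/L), and evaluates J0 through its power series, which is adequate only for the small arguments occurring here. The values L = 64 and ϑmax = 0.1, as well as the function names, are illustrative assumptions.

```python
import cmath
import math

def bessel_j0(x, terms=40):
    """Power series of the zeroth-order Bessel function; adequate for the small |x| used here."""
    return sum((-1) ** i * (x / 2) ** (2 * i) / math.factorial(i) ** 2 for i in range(terms))

def ici_coefficient_power(d, L, theta):
    """First line of (7.13) with Jakes' correlation r_t[m] = J0(2 pi theta m / L)."""
    acc = sum((L - abs(m)) * bessel_j0(2 * math.pi * theta * m / L)
              * cmath.exp(-2j * cmath.pi * d * m / L)
              for m in range(-L + 1, L))
    return acc.real / L ** 2

L, theta = 64, 0.1
power = [ici_coefficient_power(d, L, theta) for d in range(L)]
```

Here `power[0]` is the mean useful power E{|q_ll|²}, the remaining entries sum to the ICI power, and the entries for d = 1 and d = L - 1 (the two adjacent subcarriers) carry most of it.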

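From (7.9), the SINR is expressed by

\[
\rho \triangleq \frac{E\{|q_{ll}[k]|^2\}\,\sigma_a^2}{P_{\mathrm{ICI}}\,\sigma_a^2 + \sigma_z^2} = \frac{1 - P_{\mathrm{ICI}}}{P_{\mathrm{ICI}} + \sigma_z^2/\sigma_a^2}.
\]

Hence, when the ICI is left uncompensated, the SINR cannot exceed the maximum value $\rho_{\max} = P_{\mathrm{ICI}}^{-1} - 1$. When there are virtual subcarriers, the SINR on the $l$th subcarrier is expressed by $\rho_l = \frac{1 - P_{\mathrm{ICI}}}{P_{\mathrm{ICI}}[l] + \sigma_z^2/\sigma_a^2}$, and the maximum SINR is $\rho_{\max} \approx 2(P_{\mathrm{ICI}}^{-1} - 1)$ for the edge subcarriers.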
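To give a feeling for the SINR ceiling, the sketch below evaluates the SINR expression and its noise-free limit. It uses only the leading term of (7.11) for P_ICI, evaluated at ϑmax = 0.089 (the WiMAX value quoted earlier); the function names and the 30 dB operating point are assumptions made for illustration.

```python
import math

def sinr(p_ici, snr_linear):
    """SINR of a one-tap receiver: (1 - P_ICI) / (P_ICI + sigma_z^2 / sigma_a^2)."""
    return (1.0 - p_ici) / (p_ici + 1.0 / snr_linear)

def sinr_ceiling_db(p_ici):
    """Noise-free limit rho_max = 1 / P_ICI - 1, expressed in dB."""
    return 10.0 * math.log10(1.0 / p_ici - 1.0)

p_ici = (math.pi * 0.089) ** 2 / 6.0       # leading term of (7.11) at theta_max = 0.089
ceiling_db = sinr_ceiling_db(p_ici)        # SINR ceiling when the ICI is left uncompensated
sinr_at_30db = 10.0 * math.log10(sinr(p_ici, 10 ** 3.0))
```

Under these assumptions the ceiling lies a little below 19 dB, so increasing the SNR beyond roughly 30 dB brings almost no further SINR improvement.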

7.1.2.2 BER Performance with One-Tap Equalizers

While the analysis of the ICI power is relatively straightforward, a theoretical BER analysis is quite difficult, apart from some specific cases. As a consequence, we assume that

1. a linear modulation scheme (e.g., PSK or QAM) is used;
2. the channel $h[n,m]$ is WSSUS with Rayleigh fading statistics;
3. a receiver with perfect time and frequency synchronization is used;
4. the one-tap equalizer for the $l$th subcarrier has perfect knowledge of the useful channel coefficient $q_{ll}[k]$.

First, we review some theoretical models for the uncoded BER, and then, we extend the discussion to the coded BER, which is usually investigated by simulations.

For theoretical purposes, the power series model of an LTV channel is often used (Bello, 1963). With this model, the time variation of the channel is represented by a Taylor series expansion, usually truncated to the first term, as expressed by

\[
h(t,\tau) \approx h(t_0,\tau) + h'(t_0,\tau)(t - t_0), \tag{7.14}
\]

where $t_0$ is the time instant in the center of the OFDM block, and $h'(t_0,\tau) \triangleq \left.\frac{\partial}{\partial t} h(t,\tau)\right|_{t=t_0}$. In the linear model (7.14), $h(t_0,\tau)$ stands for the useful component, and $h'(t_0,\tau)$ represents the slope of the channel time variability, assumed linear during the block interval. The approximation (7.14) is very accurate for relatively small time variability, e.g., when $\vartheta_{\max} \le 0.1$ (Chiavaccini & Vitetta, 2000), but can also be used when the Doppler spread is larger (Wang, Proakis, Masry, & Zeidler, 2006).

Since for Rayleigh fading $h(t_0,\tau)$ and $h'(t_0,\tau)$ are complex Gaussian and independent, the useful signal and the ICI will be independent, too. By dropping the block index $k$ for simplicity, (7.8) becomes

\[
y_l = q_{ll}\, a_l + i_l + z_l,
\]

where the useful coefficient $q_{ll}$ induced by $h(t_0,\tau)$ is Gaussian, and the ICI $i_l \triangleq \sum_{d=0,\, d \ne l}^{L-1} \varphi_{dl}\, a_d$, related to $h'(t_0,\tau)$, is a Gaussian mixture.

When the number $L$ of subcarriers is sufficiently high, due to the central limit theorem, the probability density function (pdf) of the ICI $i_l$ can be well approximated as Gaussian, and hence, also $i_l + z_l$ is Gaussian. By means of this Gaussian ICI approximation, the BER can be obtained with standard approaches. For instance, for QPSK with Gray coding, the conditional bit error probability $\Pr\{\mathrm{Re}\{\hat{a}_l\} \ne \mathrm{Re}\{a_l\} \mid q_{ll}\}$ can be expressed as

\[
\Pr\{\mathrm{Re}\{\hat{a}_l\} \ne \mathrm{Re}\{a_l\} \mid q_{ll}\} = Q\!\left( \sqrt{\frac{|q_{ll}|^2\, \sigma_a^2}{\sigma_i^2[l] + \sigma_z^2}} \right), \tag{7.15}
\]

where $Q(x) \triangleq \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-u^2/2}\, du$. The average of (7.15) over the Rayleigh pdf of $|q_{ll}|$ leads to

\[
\Pr\{\mathrm{Re}\{\hat{a}_l\} \ne \mathrm{Re}\{a_l\}\} = \frac{1}{2}\left( 1 - \sqrt{\frac{\rho_l}{\rho_l + 2}} \right), \tag{7.16}
\]

where $\rho_l$ is the SINR per symbol on the $l$th subcarrier. The same expression is also valid for the imaginary part. According to (7.16), the BER only depends on the SINR and does not depend on the delay power profile of the channel. Chiavaccini & Vitetta (2000) have shown that this approach is very accurate for QPSK when $L = 1024$. A similar approach has been used by Russell & Stuber (1995) to evaluate the symbol-error rate for 16-QAM. However, the numerical approximation of the symbol-error rate, expressed by $\Pr\{\hat{a}_l \ne a_l\} \approx 6.48/\rho_l$, is valid only for large SINR. Al-Gharabally & Das (2006) have used a Gaussian ICI approximation that also incorporates the effect of channel estimation errors.
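The step from (7.15) to (7.16) can be checked numerically. Under Rayleigh fading, the instantaneous SINR in (7.15) is exponentially distributed with mean ρ_l, so the average of Q(√γ) over that distribution should match the closed form. The integration routine, its truncation point, and the value ρ_l = 10 are assumptions of this sketch.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_closed_form(rho):
    """Average QPSK bit error probability (7.16) for average SINR rho (linear scale)."""
    return 0.5 * (1.0 - math.sqrt(rho / (rho + 2.0)))

def ber_numerical(rho, u_max=12.0, steps=4000):
    """Average Q(sqrt(gamma)) over an exponential gamma with mean rho, using the
    substitution gamma = u^2 and composite Simpson integration on [0, u_max]."""
    f = lambda u: q_func(u) * (2.0 * u / rho) * math.exp(-u * u / rho)
    step = u_max / steps
    total = f(0.0) + f(u_max)
    total += sum((4 if i % 2 else 2) * f(i * step) for i in range(1, steps))
    return total * step / 3.0

rho = 10.0                                  # average SINR of 10 dB, chosen for illustration
closed, numeric = ber_closed_form(rho), ber_numerical(rho)
```

The two values agree to within the integration accuracy, which supports the reconstruction of (7.16) from (7.15).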

An improved BER approximation can be obtained by avoiding the Gaussian ICI approximation. By denoting with $v_l \triangleq i_l/q_{ll}$ the ICI after equalization, the Gaussian mixture conditional pdf $f_{v_l|q_{ll}}(\mathrm{Re}\{v_l\}, \mathrm{Im}\{v_l\} \mid q_{ll})$ can be expressed as a two-dimensional Gram–Charlier series, whose coefficients depend on the joint moments of $\mathrm{Re}\{v_l\}$ and $\mathrm{Im}\{v_l\}$ (Wang, Proakis, Masry, & Zeidler, 2006); then, the conditional BER is obtained after series truncation, and the average over the statistics of $q_{ll}$ can be done by means of semianalytical computation. Wang et al. (2006) have shown that a series truncation order equal to 4 produces a good accuracy for 16-QAM when $L = 128$. Interestingly, the Gram–Charlier series approach highlights that the uncoded BER is moderately dependent on the frequency selectivity of the channel. When truncated up to the second order, the Gram–Charlier series expansion reduces to the Gaussian ICI approximation.



For the coded BER, a theoretical characterization is rather difficult even for LTI channels, apart from some specific cases. Consequently, we only discuss some results obtained by simulations by Poggioni, Rugini, & Banelli (2008). We assume that the information bit sequence $b[i]$ is convolutionally encoded to obtain the coded bit sequence $c[j]$, whose length is $KA \log_2(N_a)$, where $N_a$ is the constellation size, $A$ is the number of data subcarriers, and $K$ is the number of OFDM blocks within the interleaver time span. After interleaving and mapping, $P = L - A$ pilot symbols per block are added, and the $KL$ resulting symbols are transmitted within $K$ OFDM blocks. While for the uncoded BER the delay power profile of the channel has little importance, its effect on the coded BER is relevant, since channel coding is able to exploit the frequency selectivity of the channel. Moreover, when the interleaver time-span $T_{\mathrm{int}} \triangleq K(1 + L_{\mathrm{CP}}/L)/F$ is greater than the channel coherence time $T_c$, the OFDM system can benefit from the time selectivity of the channel.

While the coded BER performance highly depends on the specific channel encoder and interleaver, only a few channel parameters have a significant impact on the coded BER. To explain this point, we introduce the equivalent frequency-domain OFDM model (EFDOM) of Poggioni et al. (2008), which, combined with the specific channel encoder and interleaver, produces the same BER as the original OFDM model with LTV channels. Basically, the EFDOM is a simple approximate model obtained using only a reduced number of parameters, which are the most important for the coded BER. First, (7.8) is rewritten as

\[
\underline{\mathbf{y}} = \underline{\mathbf{Q}}\,\underline{\mathbf{a}} + \underline{\mathbf{i}} + \underline{\mathbf{z}}, \tag{7.17}
\]

where the underlined vectors, of size $KL$, are obtained by collecting the elements on the $L$ subcarriers of the $K$ OFDM blocks. In (7.17), the diagonal matrix $\underline{\mathbf{Q}}$ contains the useful part of the LTV channel, $\underline{\mathbf{a}}$ is the data vector, $\underline{\mathbf{i}}$ is the ICI vector, and $\underline{\mathbf{z}}$ stands for the AWGN vector. In order to speed up simulations for the coded BER, the EFDOM replaces (7.17) with

\[
\underline{\mathbf{y}}^{(E)} = \underline{\mathbf{Q}}^{(E)}\underline{\mathbf{a}} + \sqrt{\phi^{(E)}}\;\underline{\mathbf{i}}^{(E)} + \underline{\mathbf{z}}, \tag{7.18}
\]

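where $\underline{\mathbf{Q}}^{(E)}$ has the same statistical properties as $\underline{\mathbf{Q}}$, dictated by the delay power profile and by the Doppler power profile, $\underline{\mathbf{i}}^{(E)}$ is a Gaussian random vector with the same mean and the same covariance as $\underline{\mathbf{i}}$, and $\phi^{(E)}$ is a real positive random variable that models the energy variability of the ICI with respect to its mean value. Specifically, $\phi^{(E)}$ is a computer-generated random variable that has approximately the same pdf as the random variable

\[
\phi \triangleq \frac{\|\underline{\mathbf{i}}\|^2}{E\{\|\underline{\mathbf{i}}\|^2\}} \approx \frac{\|\underline{\mathbf{i}}\|^2}{KL\,P_{\mathrm{ICI}}\,\sigma_a^2},
\]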

whose pdf is well approximated by the pdf of the sum of exponential random variables (Poggioni et al., 2008). In the coded case, since $K$ OFDM blocks are processed together, the time variability of the channel has a greater impact than in the uncoded case, where single blocks are separately considered. Therefore, in the coded case, the linear approximation (7.14) is not valid in general, and hence, the useful part of the channel $\underline{\mathbf{Q}}$ can be correlated with the ICI $\underline{\mathbf{i}}$. The EFDOM generates $\underline{\mathbf{Q}}^{(E)}$ and $\underline{\mathbf{i}}^{(E)}$ in (7.18) in such a way that $\rho_P^{(E)}$, defined as the correlation coefficient between $\|\underline{\mathbf{Q}}^{(E)}\underline{\mathbf{a}}\|^2$ and $\|\underline{\mathbf{i}}^{(E)}\|^2$, is equal to $\rho_P$, which is the correlation coefficient between $\|\underline{\mathbf{Q}}\,\underline{\mathbf{a}}\|^2$ and $\|\underline{\mathbf{i}}\|^2$. Indeed, simulation results have shown that the single coefficient $\rho_P$ is able to summarize the whole correlation effect over the $K$ blocks (Poggioni et al., 2008). For $K = 1$ (frequency-domain-only interleaver), $\rho_P$ is practically zero for $\vartheta_{\max} \le 0.5$. Due to the EFDOM, fast simulation of coded OFDM standards is enabled.

FIGURE 7.2

BER performance of DVB-H. In the legend, the first term indicates the type of channel estimation, the second term represents the code rate of the convolutional code, and the third term is the speed of the mobile receiver expressed in kilometer/hour. [Plot: BER (logarithmic axis, 10^-5 to 10^0) versus SNR (0 to 25 dB). Curves: Ideal, 1/2, 150; Ideal, 1/2, 300; LS, 1/2, 300; Linear, 1/2, 150; Linear, 1/2, 300; LS, 2/3, 300; Linear, 2/3, 150; Linear, 2/3, 300.]

Figure 7.2 shows the BER performance of DVB-H at the output of the Viterbi decoder. We consider Mode 2k ($L = 2048$) with carrier frequency $f_c = 800$ MHz and channel bandwidth $B = 8$ MHz. For a mobile receiver with speed $\upsilon = 150$ km/h, this corresponds to a normalized maximum Doppler shift $\vartheta_{\max} \approx 0.025$. We assume QPSK modulation, a Rayleigh fading multipath channel with Jakes' Doppler power profile (Poggioni, Rugini, & Banelli, 2009), soft Viterbi decoding, and perfect time and frequency synchronization. The receiver assumes a time-invariant channel within the OFDM block. The CIR estimation is performed by interpolation or fitting of the frequency-domain channel estimates obtained on equispaced pilot subcarriers: in Fig. 7.2, Linear stands for linear interpolation, LS stands for least-squares fitting, and Ideal stands for perfect knowledge of the average CIR. Figure 7.2 shows that when the code rate of the convolutional encoder is 1/2, increasing the mobile speed from $\upsilon = 150$ km/h to $\upsilon = 300$ km/h produces a small performance degradation. On the contrary, when the code rate is 2/3, the performance degradation due to the increased mobile speed is significant, especially if the channel estimator employs linear interpolation. Using least-squares fitting instead of linear interpolation, a big performance improvement can be obtained, at the price of increased complexity.

Figure 7.3 illustrates the BER performance of DAB at the output of the Viterbi decoder. We consider Mode III ($L = 256$) with carrier frequency $f_c = 800$ MHz and channel bandwidth $B = 1.536$ MHz. For a mobile receiver with speed $\upsilon = 150$ km/h, this corresponds to a normalized maximum Doppler shift $\vartheta_{\max} \approx 0.014$. We assume π/4-DQPSK modulation, a Rayleigh fading multipath channel with Jakes' Doppler power profile, and soft Viterbi decoding (Poggioni et al., 2009). Differential demodulation is used. When increasing the mobile speed from $\upsilon = 0$ km/h to $\upsilon = 150$ km/h, the performance improves due to the time diversity gathered by the interleaver. However, when the mobile speed increases from $\upsilon = 150$ km/h to $\upsilon = 300$ km/h, the ICI causes a performance loss.

FIGURE 7.3

BER performance of DAB. In the legend, the first term represents the code rate of the convolutional code, and the second term is the speed of the mobile receiver expressed in kilometer/hour. [Plot: BER (logarithmic axis, 10^-5 to 10^0) versus SNR (0 to 25 dB). Curves: 1/2, 150; 1/2, 300; 2/3, 150; 2/3, 300; 2/3, 0.]

Additionally, Reed–Solomon encoding is incorporated in DVB-T/H and in T-DMB as outer code. A detailed performance comparison of DVB-T/H and T-DMB has been presented by Poggioni et al. (2009). For DVB-T/H, differently from the uncoded BER, the coded BER highly depends on the delay power profile of the channel (Poggioni et al., 2009). On the other hand, for T-DMB, the delay power profile of the channel has only a slight impact, because the effect of the time-domain interleaver is dominant (Poggioni et al., 2009).

7.1.3 MIMO-OFDM

We now extend the OFDM model with LTV channels to multiple-input multiple-output (MIMO) OFDM systems with $M_T$ transmit antennas and $M_R$ receive antennas. Denoting by $\mathbf{x}^{(j)}[k]$ the frequency-domain vector containing data and pilots of the $j$th transmit antenna, the vector transmitted from the $j$th antenna can be expressed as (see (7.1))

\[
\mathbf{s}^{(j)}[k] \triangleq \mathbf{T}_{\mathrm{CP}}\mathbf{W}^H\mathbf{x}^{(j)}[k].
\]

The signal transmitted from the $j$th antenna arrives at the $i$th receive antenna after passing through an LTV channel with impulse response $h^{(i,j)}[n,m]$. We denote by $M$ the maximum of the $M_T M_R$ maximum discrete-time delay spreads. The vector received at the $i$th antenna can be expressed by (see (7.2))

\[
\mathbf{r}^{(i)}[k] = \sum_{j=1}^{M_T} \left( \mathbf{H}_0^{(i,j)}[k]\,\mathbf{s}^{(j)}[k] + \mathbf{H}_1^{(i,j)}[k]\,\mathbf{s}^{(j)}[k-1] \right) + \mathbf{w}^{(i)}[k].
\]

After CP removal and DFT, this becomes $\mathbf{y}^{(i)}[k] \triangleq \mathbf{W}\mathbf{R}_{\mathrm{CP}}\mathbf{r}^{(i)}[k]$, which is expressed by (see (7.3))

\[
\mathbf{y}^{(i)}[k] = \sum_{j=1}^{M_T} \mathbf{H}_F^{(i,j)}[k]\,\mathbf{x}^{(j)}[k] + \mathbf{z}^{(i)}[k].
\]

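We now stack the vectors related to all the receive antennas in a single vector, denoted as $\mathbf{y}[k] \triangleq \left( \mathbf{y}^{(1)T}[k] \cdots \mathbf{y}^{(M_R)T}[k] \right)^T$, and similarly for the transmit antennas, i.e., $\mathbf{x}[k] \triangleq \left( \mathbf{x}^{(1)T}[k] \cdots \mathbf{x}^{(M_T)T}[k] \right)^T$, and we define the $LM_R \times LM_T$ matrix

\[
\mathbf{H}_F[k] \triangleq
\begin{pmatrix}
\mathbf{H}_F^{(1,1)}[k] & \cdots & \mathbf{H}_F^{(1,M_T)}[k] \\
\vdots & & \vdots \\
\mathbf{H}_F^{(M_R,1)}[k] & \cdots & \mathbf{H}_F^{(M_R,M_T)}[k]
\end{pmatrix}.
\]

The MIMO-OFDM system can then be described as

\[
\mathbf{y}[k] = \mathbf{H}_F[k]\,\mathbf{x}[k] + \mathbf{z}[k]. \tag{7.19}
\]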

Expression (7.19) shows that in MIMO-OFDM systems, the ICI increases due to the presence of multiple transmit antennas. The ICI power, whose analysis has been presented by Stamoulis, Diggavi, & Al-Dhahir (2002), can be roughly estimated as $M_T$ times the ICI for the single antenna case. In addition, as usual in MIMO schemes, there exists some inter-antenna interference (IAI). Despite the increased interference, multiple receive antennas provide additional degrees of freedom in order to mitigate both ICI and IAI.

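In addition, we can stack the vectors related to $K$ successive OFDM blocks, resulting in $\underline{\mathbf{y}} \triangleq \left( \mathbf{y}^T[0] \cdots \mathbf{y}^T[K-1] \right)^T$ and $\underline{\mathbf{x}} \triangleq \left( \mathbf{x}^T[0] \cdots \mathbf{x}^T[K-1] \right)^T$, and define the block diagonal matrix

\[
\underline{\mathbf{H}}_F \triangleq
\begin{pmatrix}
\mathbf{H}_F[0] & & \mathbf{0} \\
 & \ddots & \\
\mathbf{0} & & \mathbf{H}_F[K-1]
\end{pmatrix}.
\]

We then obtain

\[
\underline{\mathbf{y}} = \underline{\mathbf{H}}_F\,\underline{\mathbf{x}} + \underline{\mathbf{z}}. \tag{7.20}
\]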

In (7.20), the elements are ordered in such a way that first there is a change in the subcarrier index, then in the antenna index, and finally in the OFDM block index. However, this order can be changed by using suitable permutation matrices.



7.2 ICI MITIGATION TECHNIQUES

In this section, we present some common techniques for reducing the ICI produced by LTV channels. Some of these techniques have also been discussed in Chapter 6. Throughout this section, we assume that the LTV channel is unknown at the transmitter and perfectly known at the receiver. We first present techniques that make use of receiver processing only. These receiver-only techniques, which will be further divided into linear and nonlinear, are similar to those used for multiuser detection for code-division multiple-access (CDMA) systems. However, the specific structure of the ICI allows for some specific methods. Subsequently, we describe ICI mitigation techniques that employ transmitter preprocessing. These transmitter methods can be powerful, but in general are not compliant with the current OFDM standards. Finally, we extend the ICI mitigation techniques to MIMO-OFDM systems.

7.2.1 Linear Equalization

Among the receiver equalization methods, linear algorithms construct a soft data estimate by a linear combination of the received samples. For convenience, we rewrite (7.3) by dropping the OFDM block index as

\[
\mathbf{y} = \mathbf{H}_F\,\mathbf{x} + \mathbf{z}, \tag{7.21}
\]

where $\mathbf{y}$ represents the $L$-dimensional frequency-domain received vector, $\mathbf{H}_F$ is the $L \times L$ frequency-domain nondiagonal matrix that induces ICI, $\mathbf{x}$ is the frequency-domain transmitted vector, and $\mathbf{z}$ stands for the AWGN. Since the channel matrix $\mathbf{H}_F$ is assumed known at the receiver, to simplify the explanation, we assume that no pilot symbols are transmitted except for $P$ guard subcarriers, which are consecutive and typically present in any OFDM standard. Here, $P$ is assumed even. These guard bands correspond to the edge positions of the analog bandpass frequency-domain transmitted signal and hence to the central positions of the corresponding discrete-time baseband signal. For convenience, we reorder the subcarriers by a cyclic shift in such a way that the $A = L - P$ data positions are in the center. Denoting by $\mathbf{T}_{GB} \triangleq \left( \mathbf{0}_{A \times P/2} \;\; \mathbf{I}_A \;\; \mathbf{0}_{A \times P/2} \right)^T$ the $L \times A$ matrix that inserts the guard subcarriers, and by $\mathbf{a}$ the $A$-dimensional subvector of $\mathbf{x}$ containing the data symbols, we obtain $\mathbf{x} = \mathbf{T}_{GB}\mathbf{a}$. At the receiver, we can exclude the $P$ virtual subcarriers by applying $\mathbf{R}_{GB} \triangleq \mathbf{T}_{GB}^T$, as expressed by $\mathbf{y}_A \triangleq \mathbf{R}_{GB}\mathbf{y}$. This becomes

\[ \mathbf{y}_A = \mathbf{H}_A \mathbf{a} + \mathbf{z}_A, \tag{7.22} \]

where $\mathbf{H}_A \triangleq \mathbf{R}_{GB}\mathbf{H}_F\mathbf{T}_{GB}$ is the $A \times A$ ICI matrix relative to the data subcarriers and $\mathbf{y}_A$ ($\mathbf{z}_A$) is the $A$-dimensional received (AWGN) vector. Equalizers designed using the model (7.22) will be referred to as block equalizers, since the data subcarriers of the whole OFDM block are jointly equalized.
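The guard-band bookkeeping behind (7.22) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names `insert_guards`, `remove_guards`, and `data_block` are hypothetical, matrices are stored as lists of rows, and the subcarriers are assumed to be already reordered so that the data positions sit in the center.

```python
def insert_guards(a, P):
    """x = T_GB a: pad the A data symbols with P/2 zero guard subcarriers per side."""
    half = P // 2
    return [0j] * half + list(a) + [0j] * half


def remove_guards(y, P):
    """y_A = R_GB y with R_GB = T_GB^T: keep only the A central entries."""
    half = P // 2
    return list(y[half:len(y) - half])


def data_block(H_F, P):
    """H_A = R_GB H_F T_GB: the central A x A block of the L x L matrix H_F."""
    half = P // 2
    L = len(H_F)
    return [row[half:L - half] for row in H_F[half:L - half]]
```

Because $\mathbf{R}_{GB}\mathbf{T}_{GB} = \mathbf{I}_A$, removing the guards after inserting them returns the original data vector.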

As explained by (7.13), due to the structure of the Doppler spreading, the ICI on the $l$th subcarrier mainly comes from a few subcarriers. This means that the matrix $\mathbf{H}_A$ can be well approximated by a banded matrix $\mathbf{B}_A^{(D_b)}$, where $D_b$ denotes the number of retained subdiagonals and, at the same time, superdiagonals of $\mathbf{H}_A$. An intuitive example of $\mathbf{B}_A^{(D_b)}$ is given in Fig. 7.4. Therefore, in the banded case, the block model (7.22) becomes

\[ \mathbf{y}_A = \mathbf{B}_A^{(D_b)} \mathbf{a} + \mathbf{z}_A. \tag{7.23} \]



FIGURE 7.4

Possible approximations of the frequency-domain channel matrix. The gray intensity is proportional to the magnitude of the corresponding element.

The integer parameter $D_b$ represents the (single-sided) discrete Doppler support that is used for equalization. Since the ICI coefficients $E\{|\phi_{dl}[k]|^2\}$ in (7.13) have a rapid decay, the significant discrete Doppler support $D$ is usually quite low. Nevertheless, we can select $D_b < D$ to reduce complexity. The value of $D_b$ is usually chosen according to some empirical rules, such as proportionally to $\vartheta_{\max}$, or as the value that reduces the Frobenius norm $\|\mathbf{H}_A - \mathbf{B}_A^{(D_b)}\|_F^2$ below a given threshold. A common choice is $D_b = \lceil \vartheta_{\max} + D_0 \rceil$, where $D_0$ is a small nonnegative number (Schniter, 2004; Hwang & Schniter, 2006) (see also (6.6) in Chapter 6). This rule usually leads to $2D_b + 1 \ll L$, which allows for low-complexity equalization algorithms.²
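The two selection rules for $D_b$ mentioned above can be illustrated with a short sketch. The helper names (`choose_db`, `band`, `oob_energy`, `smallest_db`) are hypothetical, and the threshold rule is implemented here as "the smallest $D_b$ whose out-of-band energy is at or below the threshold", which is one possible reading of the empirical rule.

```python
import math


def choose_db(theta_max, d0):
    """Rule D_b = ceil(theta_max + D_0)."""
    return math.ceil(theta_max + d0)


def band(H, db):
    """Keep the main diagonal plus db sub- and superdiagonals of H; zero the rest."""
    n = len(H)
    return [[H[i][j] if abs(i - j) <= db else 0 for j in range(n)] for i in range(n)]


def oob_energy(H, db):
    """Squared Frobenius norm of H - band(H, db)."""
    n = len(H)
    return sum(abs(H[i][j]) ** 2 for i in range(n) for j in range(n) if abs(i - j) > db)


def smallest_db(H, threshold):
    """Smallest D_b whose discarded (out-of-band) energy does not exceed the threshold."""
    for db in range(len(H)):
        if oob_energy(H, db) <= threshold:
            return db
    return len(H) - 1
```

For instance, with $\vartheta_{\max} = 0.12$ and $D_0 = 1$, the first rule gives $D_b = 2$.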

It is noteworthy that the relations (7.22) and (7.23) only consider the $A$ active subcarriers. When the equalizer considers all the $L$ subcarriers, the frequency-domain channel matrix $\mathbf{H}_F$ can be approximated by a matrix with cyclically banded structure,³ since the upper-right and the lower-left corners are significant (see Fig. 6.2(b) in Chapter 6). This effect, due to frequency-domain aliasing, disappears in the presence of guard subcarriers, which cancel the first and the last columns of the channel matrix.

² When the ICI mitigation support $2D_b + 1$ exceeds the channel length $M$, time-domain equalizers are less complex than frequency-domain equalizers (Hrycak & Matz, 2006).
³ Cyclically banded matrices are also known as quasi-banded matrices.

7.2.1.1 Serial Equalizers

As an alternative to the block models expressed by (7.22) and (7.23), a reduced model for the $l$th subcarrier can be exploited. Indeed, due to the banded structure of the channel matrix, the energy of the $l$th data symbol $a_l$ mostly falls onto a subvector of $\mathbf{y}_A$ with size $2D_b + 1$, denoted by $\mathbf{y}_A^{(D_b)}[l] \triangleq \left( y_{l-D_b} \cdots y_l \cdots y_{l+D_b} \right)^T$, which can be expressed by

\[ \mathbf{y}_A^{(D_b)}[l] = \mathbf{H}_A^{(D_b)}[l]\, \mathbf{a} + \mathbf{z}_A^{(D_b)}[l]. \tag{7.24} \]

Here, $\mathbf{H}_A^{(D_b)}[l]$ is the $(2D_b + 1) \times A$ submatrix of $\mathbf{H}_A$ that contains the rows with index from $l - D_b$ to $l + D_b$, as shown in Fig. 7.4, and $\mathbf{z}_A^{(D_b)}[l]$ is the AWGN subvector, defined similarly to $\mathbf{y}_A^{(D_b)}[l]$. The equalizers designed using (7.24) will be referred to as serial equalizers. Indeed, since (7.24) is valid for the $l$th subcarrier only, the data have to be equalized serially (sequentially). Including the band approximation, the serial model (7.24) becomes

\[ \mathbf{y}_A^{(D_b)}[l] = \mathbf{B}_A^{(D_b)}[l]\, \mathbf{a}^{(D_b)}[l] + \mathbf{z}_A^{(D_b)}[l], \tag{7.25} \]

where $\mathbf{B}_A^{(D_b)}[l]$ is the $(2D_b + 1) \times (4D_b + 1)$ submatrix of $\mathbf{H}_A$ with row index from $l - D_b$ to $l + D_b$ and column index from $l - 2D_b$ to $l + 2D_b$ (see Fig. 7.4), and $\mathbf{a}^{(D_b)}[l] \triangleq \left( a_{l-2D_b} \cdots a_l \cdots a_{l+2D_b} \right)^T$.

In the context of LTV channel equalization for OFDM, different linear serial equalizers have been proposed so far. Indeed, although the reduced model (7.24) is suboptimal with respect to the full one (7.22), serial equalization deals with matrices and vectors with smaller dimension and hence reduces the memory requirements of the equalizer. One of the most popular serial equalizers is the zero-forcing (ZF) or least-squares (LS) banded approach of Jeon, Chang, & Cho (1999), which estimates the soft data as

\[ \tilde{a}_l = \mathbf{e}_{2D_b+1,\,D_b+1}^T\, \bar{\mathbf{B}}_A^{(D_b)-1}[l]\, \mathbf{y}_A^{(D_b)}[l]. \tag{7.26} \]

Here, $\mathbf{e}_{m,n}$ is the $n$th column of $\mathbf{I}_m$, and $\bar{\mathbf{B}}_A^{(D_b)}[l]$ is the $(2D_b + 1) \times (2D_b + 1)$ central block of $\mathbf{B}_A^{(D_b)}[l]$. Therefore, the ICI is completely eliminated, at the price of some noise enhancement, quantitatively summarized by the condition number of $\bar{\mathbf{B}}_A^{(D_b)}[l]$. The computational complexity of banded linear serial equalizers can be reduced from $O(D_b^3 A)$ to $O(D_b^2 A)$ per block, using recursive inversion algorithms that compute $\bar{\mathbf{B}}_A^{(D_b)-1}[l+1]$ by updating the already calculated $\bar{\mathbf{B}}_A^{(D_b)-1}[l]$ (Cai & Giannakis, 2003).

To reduce the noise enhancement, serial equalizers based on the linear minimum mean-squared error (MMSE) criterion have been proposed. For instance, the nonbanded approach of Cai & Giannakis (2003) is expressed by

\[ \tilde{a}_l = \mathbf{e}_{A,l}^T\, \mathbf{H}_A^{(D_b)H}[l]\, \mathbf{R}_A^{(D_b)-1}[l]\, \mathbf{y}_A^{(D_b)}[l], \tag{7.27} \]

where $\mathbf{R}_A^{(D_b)}[l] = \mathbf{H}_A^{(D_b)}[l]\, \mathbf{H}_A^{(D_b)H}[l] + \gamma \mathbf{I}_{2D_b+1}$ and $\gamma = \sigma_z^2 / \sigma_a^2$ is the noise-to-signal ratio. With respect to (7.26), the approach in (7.27) produces an improved performance for two reasons: (1) unlike a ZF equalizer, an MMSE equalizer balances ICI reduction and noise enhancement; (2) there is no band approximation error. Since nonbanded approaches model the out-of-band (OOB) elements of the ICI matrix, they have a larger computational complexity, which is $O(D_b A^2)$ per block when recursive inversion is employed to obtain $\mathbf{R}_A^{(D_b)-1}[l+1]$ from $\mathbf{R}_A^{(D_b)-1}[l]$ (Cai & Giannakis, 2003).
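The two serial rules (7.26) and (7.27) can be contrasted in a small dependency-free sketch. It is only an illustration under simplifying assumptions: each subcarrier is solved from scratch with a generic Gaussian-elimination helper (the recursive inversion of Cai & Giannakis (2003) is not reproduced), the index $l$ is assumed to satisfy $D_b \le l < A - D_b$ (0-based), and the names `solve`, `serial_zf`, and `serial_mmse` are hypothetical.

```python
def solve(M, b):
    """Solve the square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def serial_zf(H, y, l, db):
    """(7.26): invert the central (2db+1) x (2db+1) block around subcarrier l."""
    idx = range(l - db, l + db + 1)
    block = [[H[i][j] for j in idx] for i in idx]
    return solve(block, [y[i] for i in idx])[db]   # middle entry estimates a_l


def serial_mmse(H, y, l, db, gamma):
    """(7.27): use the full rows l-db..l+db of H and regularize with gamma."""
    rows = [H[i] for i in range(l - db, l + db + 1)]
    m, A = len(rows), len(H[0])
    R = [[sum(rows[i][k] * rows[j][k].conjugate() for k in range(A))
          + (gamma if i == j else 0) for j in range(m)] for i in range(m)]
    u = solve(R, [y[i] for i in range(l - db, l + db + 1)])
    return sum(rows[i][l].conjugate() * u[i] for i in range(m))
```

The ZF variant only sees the square central block, whereas the MMSE variant keeps all $A$ columns of the selected rows, which is where the absence of band approximation error comes from.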

A different linear serial equalizer has been proposed by Barhumi, Leus, & Moonen (2004), exploiting a basis expansion model (BEM) for both the LTV channel and the equalizer.⁴ Using complex exponential basis functions, the linear equalizer of Barhumi et al. (2004) is modeled as banded with bandwidth parameter $D_b > D$, i.e., greater than the bandwidth of the channel matrix. The resulting complexity is $O(D_b^2 A^2)$ per block.

7.2.1.2 Block Equalizers

In the literature, many linear block equalizers have been proposed, relying on either the LS or the MMSE criterion, sometimes exploiting the band approximation. LS and linear MMSE equalizers based on the full (nonbanded) model (7.22) have been proposed by Choi, Voltz, & Cassara (2001). However, due to the high complexity ($O(A^3)$ per block), nonbanded block equalizers have limited applicability in OFDM systems with many subcarriers, such as DVB-T/H.

Indeed, in block equalization, a structured model of the frequency-domain channel matrix is essential to reduce the computational complexity of the equalizer and is instrumental for LTV channel estimation, too. For instance, by exploiting the band approximation, a linear block MMSE equalizer based on (7.23) can be expressed by (Rugini, Banelli, & Leus, 2005)

\[ \tilde{\mathbf{a}} = \mathbf{B}_A^{(D_b)H} \left( \mathbf{B}_A^{(D_b)} \mathbf{B}_A^{(D_b)H} + \gamma \mathbf{I}_A \right)^{-1} \mathbf{y}_A. \tag{7.28} \]

Since in (7.28) the matrix to be inverted is banded, the estimated data can be obtained by exploiting banded linear system solving techniques (such as band $\mathbf{L}\mathbf{D}\mathbf{L}^H$ factorization), whose complexity is $O(D_b^2 A)$, as in the corresponding serial case. Figure 7.4 summarizes the four possible combinations that can be obtained by selecting a block or a serial equalizer and a banded or a full (nonbanded) equalizer. For each of the four models, different equalization criteria and structures are possible, including ZF, MMSE, and nonlinear equalizers.
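To make the structure of (7.28) concrete, the following sketch evaluates $\mathbf{B}^H(\mathbf{B}\mathbf{B}^H + \gamma\mathbf{I})^{-1}\mathbf{y}$ with plain Python lists. It deliberately uses a dense Gaussian-elimination helper instead of a band $\mathbf{L}\mathbf{D}\mathbf{L}^H$ factorization, so it shows the arithmetic of the equalizer but not the $O(D_b^2 A)$ complexity; the names `solve` and `block_mmse` are hypothetical.

```python
def solve(M, b):
    """Solve the square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def block_mmse(B, y, gamma):
    """Soft estimate B^H (B B^H + gamma I)^{-1} y for a square (banded) matrix B."""
    n = len(B)
    BH = [[B[j][i].conjugate() for j in range(n)] for i in range(n)]
    G = [[sum(B[i][k] * BH[k][j] for k in range(n)) + (gamma if i == j else 0)
          for j in range(n)] for i in range(n)]
    u = solve(G, y)                      # u = (B B^H + gamma I)^{-1} y
    return [sum(BH[i][k] * u[k] for k in range(n)) for i in range(n)]
```

With $\gamma = 0$ and an invertible $\mathbf{B}$ the expression collapses to the ZF solution $\mathbf{B}^{-1}\mathbf{y}$, which is the sense in which the MMSE equalizer acts as a regularized ZF equalizer.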

As an alternative to direct equalization, block (and serial) equalization can be performed relying on iterative linear equalization. In contrast to iterative nonlinear equalization, which will be discussed in Section 7.2.2, iterative linear equalizers do not use hard decisions or nonlinearly modified (e.g., hyperbolic tangent) soft decisions of the data. Specifically, the matrix inversion in ZF or MMSE equalizers is avoided by performing an iterative procedure that produces an increasingly improved approximation of the exact result. For example, Li, Yang, Cai, & Gui (2003) have presented an iterative banded block ZF equalizer based on Jacobi iterations. Denoting by $\kappa$ the iteration index, the data vector is estimated as

\[ \tilde{\mathbf{a}}^{(\kappa)} = \mathbf{Q}_A^{-1} \left[ \mathbf{y}_A - \left( \mathbf{B}_A^{(D_b)} - \mathbf{Q}_A \right) \tilde{\mathbf{a}}^{(\kappa-1)} \right] = \tilde{\mathbf{a}}^{(\kappa-1)} + \mathbf{Q}_A^{-1} \left( \mathbf{y}_A - \mathbf{B}_A^{(D_b)} \tilde{\mathbf{a}}^{(\kappa-1)} \right), \]

where $\mathbf{Q}_A$ is a diagonal matrix that contains the main diagonal of $\mathbf{B}_A^{(D_b)}$. The term $\left( \mathbf{B}_A^{(D_b)} - \mathbf{Q}_A \right) \tilde{\mathbf{a}}^{(\kappa-1)}$ represents the soft ICI reconstructed from the previous iteration. Therefore, the ZF equalizer of Li et al. (2003) implements a linear parallel ICI cancelation scheme. Since the matrix to be inverted is diagonal, each iteration requires very few computations. Moreover, the algorithm converges to the exact solution $\tilde{\mathbf{a}}^{(\infty)} = \mathbf{B}_A^{(D_b)-1} \mathbf{y}_A$ provided that the spectral radius of the iteration matrix $\mathbf{I}_A - \mathbf{Q}_A^{-1}\mathbf{B}_A^{(D_b)}$ is smaller than one (e.g., when $\mathbf{B}_A^{(D_b)}$ is strictly diagonally dominant). The speed of convergence can be slow, especially for some bad channel realizations. However, an acceleration of convergence can be achieved (Molisch, Toeltsch, & Vermani, 2007). The number of iterations and the choice of the initial estimate $\tilde{\mathbf{a}}^{(0)}$ highly affect the approximation error of the final data estimate.

⁴ We will discuss BEM techniques in the context of channel estimation in Section 7.3.1.
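The Jacobi recursion described above can be written in a few lines. This is a minimal sketch rather than the implementation of Li et al. (2003): the function name `jacobi_zf` is hypothetical, the default starting point (a one-tap estimate $y_l / [\mathbf{B}]_{l,l}$) is an assumption made here for illustration, and the example only behaves well when the iteration is convergent, e.g., for a strictly diagonally dominant matrix.

```python
def jacobi_zf(B, y, iterations, a0=None):
    """Iterate a <- a + Q^{-1} (y - B a), with Q the main diagonal of B."""
    n = len(B)
    # Assumed initial estimate: one-tap equalization using only the main diagonal.
    a = list(a0) if a0 is not None else [y[i] / B[i][i] for i in range(n)]
    for _ in range(iterations):
        residual = [y[i] - sum(B[i][j] * a[j] for j in range(n)) for i in range(n)]
        a = [a[i] + residual[i] / B[i][i] for i in range(n)]
    return a
```

Each pass needs only one matrix-vector product with the banded matrix and $A$ scalar divisions, which is why the per-iteration cost is so low.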

Undoubtedly, in block equalization, the system matrix is bigger than for serial approaches. As a consequence, a high condition number can be a significant issue. This problem can be reduced by using Tikhonov regularization, which adds a small term to $\gamma$ in (7.28) to improve conditioning. Alternatively, $\gamma$ can be replaced by the inverse modified SINR $\rho^{-1}$, where the modified SINR

\[ \rho \triangleq \frac{1 - P_{OOB}^{(D_b)}}{P_{OOB}^{(D_b)} + \gamma} \]

is obtained by considering the elements of $\mathbf{H}_A$ within the main band as useful terms, and the OOB elements as effective ICI, as expressed by $P_{OOB}^{(D_b)} \triangleq \|\mathbf{H}_A - \mathbf{B}_A^{(D_b)}\|_F^2 / A$. A third option is to employ an iterative equalization with implicit regularization: Taubock, Hampejs, Matz, Hlawatsch, & Grochenig (2007) have used an LSQR algorithm to perform iterative banded block ZF equalization constrained to the Krylov subspace generated by $\mathbf{B}_A^{(D_b)H}\mathbf{B}_A^{(D_b)}$ and $\mathbf{B}_A^{(D_b)H}\mathbf{y}_A$. In the LSQR algorithm, the conditioning improvement is obtained by early termination of the iterative algorithm, which also helps in saving complexity.
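The modified SINR can be computed directly from its definition, as the following sketch shows. The function names are hypothetical, and the numerator $1 - P_{OOB}^{(D_b)}$ is taken at face value, which implicitly assumes a channel normalized to unit average power per subcarrier.

```python
def oob_power(H, db):
    """P_OOB = ||H - B||_F^2 / A, where B keeps the 2*db+1 central diagonals of H."""
    A = len(H)
    energy = sum(abs(H[i][j]) ** 2
                 for i in range(A) for j in range(A) if abs(i - j) > db)
    return energy / A


def modified_sinr(H, db, gamma):
    """rho = (1 - P_OOB) / (P_OOB + gamma); 1/rho then replaces gamma in (7.28)."""
    p = oob_power(H, db)
    return (1.0 - p) / (p + gamma)
```

When the band captures all the ICI ($P_{OOB}^{(D_b)} = 0$), $\rho^{-1}$ reduces to $\gamma$ and the original equalizer (7.28) is recovered.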

7.2.1.3 Receiver Windowing

Banded equalizers sometimes employ time-domain receiver windowing techniques to concentrate the ICI into the main band of $\mathbf{H}_A$ so that the band approximation is more accurate. This ICI shortening technique can be viewed as the dual of ISI channel shortening for single-carrier systems with LTI multipath channels. Receiver windowing is compatible with both serial (Schniter, 2004) and block (Rugini, Banelli, & Leus, 2006) approaches and can be used also in conjunction with nonlinear equalization. To examine receiver windowing, we define $\mathbf{y}_W \triangleq \mathbf{W}\boldsymbol{\Delta}\mathbf{R}_{CP}\mathbf{r}$, where $\boldsymbol{\Delta}$ is the $L \times L$ diagonal matrix representing the time-domain windowing operation, performed before the DFT at the receiver. The OFDM signal model of (7.3) can then be replaced by

\[ \mathbf{y}_W = \mathbf{W}\boldsymbol{\Delta}\mathbf{H}_T\mathbf{W}^H\mathbf{x} + \mathbf{W}\boldsymbol{\Delta}\mathbf{R}_{CP}\mathbf{w} = \mathbf{H}_W\mathbf{x} + \mathbf{z}_W, \tag{7.29} \]

where $\mathbf{H}_W \triangleq \mathbf{W}\boldsymbol{\Delta}\mathbf{H}_T\mathbf{W}^H$ is the frequency-domain windowed channel matrix and $\mathbf{z}_W \triangleq \mathbf{W}\boldsymbol{\Delta}\mathbf{R}_{CP}\mathbf{w}$ is the noise after windowing. It is interesting to note that $\mathbf{H}_W = \boldsymbol{\Gamma}\mathbf{H}_F$ and $\mathbf{z}_W = \boldsymbol{\Gamma}\mathbf{z}$, where $\boldsymbol{\Gamma} \triangleq \mathbf{W}\boldsymbol{\Delta}\mathbf{W}^H$ is a circulant filtering matrix that models receiver windowing (i.e., ICI shortening) in the frequency-Doppler domain. As a result of $\boldsymbol{\Gamma}$, the noise $\mathbf{z}_W$, though Gaussian, is no longer white. Obviously, by selecting $\boldsymbol{\Delta} = \mathbf{I}_L$, (7.29) reduces to classical OFDM and coincides with (7.3).
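The statement that a diagonal time-domain window becomes a circulant matrix in the frequency domain can be checked numerically. The sketch below assumes that $\mathbf{W}$ is the unitary $L$-point DFT matrix with entries $e^{-j2\pi kn/L}/\sqrt{L}$ (the normalization is an assumption, since it is defined elsewhere in the chapter); the function name `window_matrix` is hypothetical.

```python
import cmath
import math


def window_matrix(delta):
    """Return W diag(delta) W^H for the unitary L-point DFT matrix W.

    Entry (k, m) equals (1/L) * sum_n delta[n] * exp(-j 2 pi (k - m) n / L),
    so it depends on k and m only through (k - m) mod L, i.e., it is circulant.
    """
    L = len(delta)
    return [[sum(delta[n] * cmath.exp(-2j * math.pi * (k - m) * n / L)
                 for n in range(L)) / L
             for m in range(L)] for k in range(L)]
```

A rectangular window (all ones) yields the identity matrix, which corresponds to the classical OFDM receiver without windowing.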

Receiver windowing does not affect the performance of nonbanded linear block equalizers, since it only performs a linear operation on the received signal. Nevertheless, when windowing is coupled with the band approximation, the OOB ICI energy, which is neglected by banded equalizers, can be greatly reduced, thereby improving performance considerably. From a performance viewpoint, a good window design criterion could be the minimization of the mean-squared error (MSE) on the decision variable. However, a closed-form solution to this minimization problem is hard to find. Therefore, common design criteria target the windowed matrix $\mathbf{H}_W$ rather than the MSE on the data. For instance, the Max-Average SINR criterion of Schniter (2004) maximizes the average input SINR, expressed by

\[ \rho_{IN}^{(D_b)}(\boldsymbol{\Delta}) = \frac{E\{\|\mathbf{B}_W^{(D_b)}\|_F^2\}}{E\{\|\mathbf{H}_W - \mathbf{B}_W^{(D_b)}\|_F^2\} + \sigma_z^2\, \mathrm{tr}\{\boldsymbol{\Gamma}\boldsymbol{\Gamma}^H\}}, \tag{7.30} \]

where $\mathbf{B}_W^{(D_b)}$ is the cyclically banded matrix that contains the $2D_b + 1$ central diagonals of $\mathbf{H}_W$. Of course, the maximization of (7.30) is subject to the window energy constraint $\mathrm{tr}\{\boldsymbol{\Delta}\boldsymbol{\Delta}^H\} = L$. Similarly, the minimum band approximation error (MBAE) criterion of Rugini et al. (2006) looks for the window that minimizes the OOB ICI energy $E\{\|\mathbf{H}_W - \mathbf{B}_W^{(D_b)}\|_F^2\}$, with the additional constraint that the window is the sum of $2D_b + 1$ exponential (SOE) functions, as expressed by

\[ \boldsymbol{\delta} \triangleq \mathrm{diag}\{\boldsymbol{\Delta}\} = \mathbf{W}^{(D_b)} \boldsymbol{\eta}^{(D_b)}, \]

where $\mathbf{W}^{(D_b)}$ is an $L \times (2D_b + 1)$ matrix that contains the first $D_b + 1$ and the last $D_b$ columns of $\mathbf{W}$, and $\boldsymbol{\eta}^{(D_b)}$ is a vector of size $2D_b + 1$ containing the window coefficients. The MBAE solution $\boldsymbol{\eta}_{MBAE}^{(D_b)}$ with the SOE constraint is the eigenvector that corresponds to the maximum eigenvalue of $\mathbf{W}^{(D_b)H}\mathbf{A}\mathbf{W}^{(D_b)}$, where $\mathbf{A}$ is an $L \times L$ Toeplitz matrix defined by

\[ [\mathbf{A}]_{m,n} \triangleq r_{t,H}[n-m]\, \frac{\sin\!\left(\pi (2D_b + 1)(n-m)/L\right)}{L \sin\!\left(\pi (n-m)/L\right)}. \tag{7.31} \]

Hence, the window depends on the selected parameter $D_b$, and on the Doppler power profile (of the discrete-time channel) through the time-domain autocorrelation $r_{t,H}[m]$ (see comments after (7.13)).

Criteria other than Max-Average SINR and MBAE-SOE are possible. For instance, different types of input SINR could be defined. The Max-SINR criterion of Schniter (2004) considers the instantaneous input SINR rather than the average input SINR. This translates into a window that depends on the LTV channel realization rather than on the LTV channel statistics. In this case, the window design must be repeated for each OFDM block. Das & Schniter (2007) have proposed a window design that considers: the elements on the main diagonal as useful signal; the other elements on the dominant diagonals as don’t-care values; and the elements on the other diagonals as interference. The interference power also includes other disturbances, such as the ISI coming from the previous OFDM block when the CP is short or absent.

A nice feature of the window design with the SOE constraint is that the circulant matrix $\boldsymbol{\Gamma}$, which represents the frequency-domain noise after windowing, is cyclically banded (with bandwidth $2D_b + 1$). This can be exploited for low-complexity equalization. The banded linear block MMSE equalizer (Rugini et al., 2006) can be expressed by (see (7.28))

\[ \tilde{\mathbf{a}} = \mathbf{B}_{WA}^{(D_b)H} \left( \mathbf{B}_{WA}^{(D_b)} \mathbf{B}_{WA}^{(D_b)H} + \gamma\, \boldsymbol{\Gamma}_A \boldsymbol{\Gamma}_A^H \right)^{-1} \mathbf{y}_{WA}, \tag{7.32} \]

where $\mathbf{B}_{WA}^{(D_b)} \triangleq \mathbf{R}_{GB}\mathbf{B}_W^{(D_b)}\mathbf{T}_{GB}$, $\boldsymbol{\Gamma}_A \triangleq \mathbf{R}_{GB}\boldsymbol{\Gamma}$, and $\mathbf{y}_{WA} \triangleq \mathbf{R}_{GB}\mathbf{y}_W$ are obtained by excluding the guard bands. Since $\mathbf{B}_W^{(D_b)}$ and $\boldsymbol{\Gamma}$ are cyclically banded, when the guard band on each side has size $P/2 \ge D_b$, $\mathbf{B}_{WA}^{(D_b)}$ is banded with bandwidth $2D_b + 1$, and the matrix to be inverted in (7.32) is banded with bandwidth $4D_b + 1$. Therefore, as in the absence of windowing, simple equalizers can be employed, with linear complexity in the number of subcarriers (Rugini et al., 2006).



The main advantage of receiver windowing lies in its extremely low additional complexity, despite the significant performance improvement. We note that good window designs require the knowledge of the channel statistics, such as the normalized maximum Doppler shift $\vartheta_{\max}$ and the shape of the Doppler power profile. In the absence of channel statistics, suboptimal windows can be employed, such as those used for spectral estimation (e.g., Hamming, Bartlett, Gaussian) (Harris, 1978), at the price of a reduced performance improvement. A performance comparison of different windows has been presented by Peiker, Dominicus, Teich, & Lindner (2008), assuming one-tap equalization and an additional cyclic extension (postfix).

7.2.1.4 Performance-Complexity Trade-Off

We now compare some representative linear equalizers in terms of simulated BER performance and computational complexity. We consider an OFDM system with $L = 128$ subcarriers, of which $A = 96$ are active, and QPSK modulated data. We assume a WSSUS Rayleigh fading channel with truncated exponential delay power profile $E\{|h[n,m]|^2\} = \alpha e^{-0.6m}$, where $\alpha$ is a normalization constant. The channel length is chosen as $M = 9$, and consequently, the CP length is set to $L_{CP} = 8$. Regarding the time variation of the channel, we assume a Jakes’ Doppler power profile with $\vartheta_{\max} = 0.12$, i.e., the maximum Doppler frequency $\nu_{\max}$ is 12% of the subcarrier spacing $F$.

Figure 7.5 shows the BER performance of the following linear equalizers:
• Conventional one-tap equalizer;
• Full block ZF and MMSE equalizers (Choi et al., 2001);
• Banded serial ZF equalizer (Jeon et al., 1999);
• Full serial MMSE equalizer (Cai & Giannakis, 2003);
• Banded block MMSE equalizer (Rugini et al., 2005) and its window-aided version (Rugini et al., 2006).

FIGURE 7.5

BER performance comparison of linear equalizers. [Plot: BER ($10^{-4}$ to $10^{-1}$) versus $E_b/N_0$ (0 to 30 dB). Curves: one-tap equalization; ZF (full, block) (Choi et al., 2001); ZF (banded, serial) (Jeon et al., 1999); MMSE (banded, block) (Rugini et al., 2005); MMSE (windowing) (Rugini et al., 2006); MMSE (full, serial) (Cai and Giannakis, 2003); MMSE (full, block) (Choi et al., 2001).]

The matrix bandwidth parameter of banded equalizers is $D_b = 2$, i.e., only $2D_b + 1 = 5$ diagonals are considered. Similarly, serial equalizers only consider $D_b = 2$ subcarriers for each side, and hence, the observation vector length is $2D_b + 1 = 5$. The receiver window is designed using the MBAE-SOE criterion (Rugini et al., 2006), assuming perfect knowledge of the Doppler power profile. To avoid ill-conditioning problems at high SNR, in the absence of windowing, the banded block MMSE equalizer (Rugini et al., 2005) exploits a Tikhonov regularization, i.e., when the SNR $E_s/N_0 = \log_2(N_a)\, E_b/N_0$ exceeds 20 dB, the equalizer assumes a virtual SNR of 20 dB. All the equalizers exploit perfect channel-state information (CSI) at the receiver.

From the results of Fig. 7.5, it is clear that there exists a big performance gap between the ZF and MMSE equalizers. This confirms that doubly selective channels are ill conditioned, since an MMSE equalizer can be interpreted as a regularized ZF equalizer. Among the MMSE equalizers, the best performance is obtained by the full block approach of Choi et al. (2001), whose complexity per OFDM block is however cubic in the number of subcarriers. Therefore, the complexity for the full block MMSE equalizer of Choi et al. (2001) is $O(A^2)$ per symbol, where $A^2 \approx 10^4$. The full serial MMSE equalizer of Cai & Giannakis (2003) is able to reduce the computational complexity to about $O(D_b A)$ per symbol, with $(2D_b + 1)A \approx 500$, at a price of a modest performance loss. The banded block MMSE equalizer is able to significantly reduce complexity, since the number of complex operations per equalized symbol is $C = 8D_b^2 + 22D_b + 4 = 80$ (Rugini et al., 2005), plus $2D_b + 1 = 5$ additional complex operations per symbol when windowing is included (Rugini et al., 2006).

Despite the lower complexity, the banded block MMSE equalizers maintain a good BER performance: specifically, due to the statistical CSI knowledge (summarized by the Doppler power profile), the window-aided banded block MMSE equalizer (Rugini et al., 2006) is able to outperform the full serial MMSE equalizer (Cai & Giannakis, 2003) with respect to both performance and complexity.

Figure 7.6 presents a BER performance comparison of banded block MMSE equalizers as a function of the normalized maximum Doppler shift $\vartheta_{\max}$, for the same scenario previously described, when $E_b/N_0 = 20$ dB. For comparison purposes, the conventional one-tap equalizer and the full block MMSE equalizer (Choi et al., 2001) are also considered. Clearly, to maintain a fixed performance, the matrix bandwidth size $D_b$ should be increased as $\vartheta_{\max}$ grows, especially when receiver windowing is not used. However, the computational complexity increases quadratically with $D_b$, ranging from $C = 8D_b^2 + 22D_b + 4 = 34$ complex operations per symbol when $D_b = 1$ to $C = 220$ complex operations per symbol when $D_b = 4$. Moreover, when $D_b$ increases, more matrix parameters have to be estimated, and hence, a more powerful channel estimator is required.
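The operation counts quoted in this subsection follow from the formula $C = 8D_b^2 + 22D_b + 4$, plus $2D_b + 1$ operations when windowing is used. A short helper (the name `ops_per_symbol` is hypothetical) makes the quadratic growth with $D_b$ easy to tabulate:

```python
def ops_per_symbol(db, windowing=False):
    """Complex operations per equalized symbol for the banded block MMSE equalizer."""
    ops = 8 * db ** 2 + 22 * db + 4
    if windowing:
        ops += 2 * db + 1   # extra cost of the receiver window
    return ops


for db in (1, 2, 4):
    print(db, ops_per_symbol(db), ops_per_symbol(db, windowing=True))
```

The values for $D_b = 1$, $2$, and $4$ without windowing reproduce the figures 34, 80, and 220 given in the text.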

7.2.2 Nonlinear Equalization

A nonlinear equalizer estimates the data symbols by applying a nonlinear operation on the received vector. A typical configuration of a nonlinear equalizer consists of a first linear stage that produces some tentative data decisions and a second nonlinear stage that cancels the ICI using the tentative decisions. This configuration includes decision-feedback equalization, parallel ICI cancelation, successive ICI cancelation, and many other types of interference cancelation techniques. In addition, similarly to multipath channel equalization for single-carrier systems, there exists a large variety of other nonlinear equalizer structures, including maximum-likelihood (ML) methods and turbo approaches. In all cases, as for linear equalizers, nonlinear equalizers can be classified as serial or block methods, banded or nonbanded approaches, window-aided or nonwindow-aided techniques.

FIGURE 7.6

BER performance comparison of banded block MMSE equalizers. [Plot: BER ($10^{-3}$ to $10^{-1}$) versus normalized maximum Doppler shift $\vartheta_{\max}$ (0.1 to 0.3). Curves: one-tap equalization; $D_b = 1$; $D_b = 2$; $D_b = 4$; $D_b = 1$, windowing; $D_b = 2$, windowing; full block MMSE.]

Generally, a nonlinear equalizer performs better than a linear equalizer, although in many cases the computational complexity increases, especially for ML approaches. In the following, we review the most common techniques for OFDM systems with LTV channels.

7.2.2.1 Decision-Feedback Equalizers

Decision-feedback equalization (DFE) is characterized by a feedforward filter that reduces the ICI produced by the not-yet-detected symbols and a feedback filter that cancels the ICI produced by the already-detected symbols. For block equalizers, the soft-detected data can be expressed by

\[ \tilde{\mathbf{a}} = \mathbf{F}_F \mathbf{y}_A - \mathbf{F}_B \hat{\mathbf{a}}, \tag{7.33} \]

where $\mathbf{F}_F$ is the $A \times A$ feedforward filter matrix, $\mathbf{F}_B$ is the $A \times A$ feedback filter matrix, and $\hat{\mathbf{a}}$ contains the already-estimated hard-detected data symbols. Usually, the data symbols are detected sequentially, starting from the first (last) subcarrier; in this case, $\mathbf{F}_B$ should be strictly lower (upper) triangular, which guarantees that the not-yet-detected symbols are not fed back.

To design the DFE filters, the ZF or MMSE criterion can be used. Since a linear (ZF or MMSE) equalizer can be obtained as a degenerate case of DFE with $\mathbf{F}_B = \mathbf{0}_{A \times A}$, DFE approaches generally outperform their corresponding linear counterparts. The main drawback of DFE is error propagation: an incorrectly detected symbol leads to a wrong ICI cancelation, which in turn affects the detection of the subsequent symbols. Moreover, often the filter design optimistically assumes perfect (error-free) feedback, neglecting the error propagation.
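The sequential structure implied by (7.33) with a strictly lower triangular feedback matrix can be sketched as follows. The filter design itself (ZF or MMSE) is not covered: `FF` and `FB` are assumed to be given, the slicer assumes an unnormalized QPSK alphabet $\{\pm 1 \pm j\}$, and the function names are hypothetical.

```python
def qpsk_slice(z):
    """Hard decision onto the (unnormalized) QPSK alphabet {+-1 +- 1j}."""
    return complex(1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)


def block_dfe(FF, FB, y):
    """Detect symbols from the first subcarrier on, feeding back past hard decisions.

    FB is assumed strictly lower triangular, so only already-detected symbols
    (indices k < l) contribute to the feedback term of subcarrier l.
    """
    A = len(y)
    ff_out = [sum(FF[l][k] * y[k] for k in range(A)) for l in range(A)]
    soft, hard = [], []
    for l in range(A):
        feedback = sum(FB[l][k] * hard[k] for k in range(l))
        soft.append(ff_out[l] - feedback)
        hard.append(qpsk_slice(soft[l]))
    return soft, hard
```

Setting `FB` to the all-zero matrix turns the routine into a plain linear equalizer followed by a slicer, which is the degenerate case mentioned above.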

For DFE, both serial and block approaches are possible. In block approaches, the two filters are jointly designed for all the subcarriers. Rugini et al. (2006) have presented some block MMSE DFE receivers that also incorporate a band approximation and receiver windowing. As in linear equalization, the band approximation is used to obtain $\mathbf{F}_F\mathbf{y}_A$ with reduced complexity. Moreover, $\mathbf{F}_B$ is also banded so that only $2D_b$ symbols are fed back, thereby reducing the error propagation. As a result, the computational complexity of the block DFE of Rugini et al. (2006) is $O(D_b^2 A)$ per block. This complexity, which is lower compared to nonbanded approaches, is balanced by a performance loss that increases the error floor. However, also in the banded case, DFE outperforms linear equalization with basically the same complexity. The banded block DFE can also be coupled with receiver windowing, leading to a significant reduction of the error floor. However, the complexity is approximately doubled with respect to the nonwindowed DFE (Rugini et al., 2006).

In the serial case, the feedforward filter (different from subcarrier to subcarrier) acts on a few elements of the received vector, e.g., on $\mathbf{y}_A^{(D_b)}[l]$ in (7.25). One example is the serial MMSE DFE proposed by Cai & Giannakis (2003), where the filters are computed recursively from the filters used for the previously detected subcarrier. This recursive procedure reduces the complexity to $O(D_b A^2)$ per block. A specific feature of the DFE of Cai & Giannakis (2003) is its cyclic ordering for successive cancelation. Consequently, the “best” subcarrier can be chosen as starting point instead of one of the edge subcarriers. This produces a clear connection with SIC equalizers, discussed in the following subsection.

7.2.2.2 ICI Cancellers

The concept of ICI cancelation, introduced for DFE above, is exploited also by other equalization structures, such as successive interference cancelation (SIC) equalizers. SIC equalizers also have two stages, with the first producing tentative decisions and the second subtracting the regenerated ICI. However, unlike DFE, SIC equalizers perform an ordered ICI cancelation, in such a way that the most reliable subcarriers are detected first. Due to this subcarrier ordering, the probability of error propagation is reduced, especially for the first subtractions. However, a sorting procedure is necessary to establish the subcarrier ordering. This can be problematic for banded equalizers, since subcarrier sorting destroys the banded structure of the frequency-domain channel matrix.

In the technical literature, different options have been considered for the first tentative data detection: conventional one-tap equalization (Leung & Ho, 1998), nonbanded linear block MMSE equalization (Choi et al., 2001), and banded linear serial MMSE equalization (Kim & Park, 2006; Lu, Kalbasi, & Al-Dhahir, 2006). Also the detection order can be chosen using different criteria: postdetection SINR (Choi et al., 2001; Lu et al., 2006), magnitude of the diagonal elements of the channel matrix $\mathbf{H}_F$ (Kim & Park, 2006), and distance between soft and hard estimates produced by the first stage (Leung & Ho, 1998). The subcarrier order can be updated during the ICI subtraction, as in the nulling-canceling approach of Choi et al. (2001). This implies an increased complexity due to multiple sorting. In addition, many cancelation stages can be employed, as proposed by Leung & Ho (1998), who basically used an iterative nonlinear equalizer.



A closely related technique is parallel interference cancelation (PIC), where the ICI of all the symbols is jointly canceled in a block fashion. The first estimate is typically obtained by one-tap MMSE equalization (Chen & Yao, 2004; Gorokhov & Linnartz, 2004) or by serial approaches (Chang, Han, Ha, & Kim, 2006; How & Chen, 2005). Banded cancelation is used to save complexity with small performance loss, since only the relevant ICI produced by a few subcarriers is subtracted. An improved PIC approach can be obtained by replacing the hard cancelation by reliability-based nonlinear soft cancelation (Molisch et al., 2007), where the hyperbolic tangent function is used to control the amount of ICI cancelation. Molisch et al. (2007) have also included a performance comparison with a SIC scheme.

Huang, Letaief & Lu (2005) have applied bit-interleaved coded modulation over multiple OFDM blocks. This scheme employs a reduced ML decoder obtained by approximating the LTV channel as constant over a single OFDM block. Since the reduced ML decoder neglects the ICI, its effectiveness is limited to low $\vartheta_{\max}$. For high Doppler spreads, the uncoded BER floor is too high, and the channel decoder even increases the BER. Therefore, Huang et al. (2005) have also included a PIC equalizer driven by a linear MMSE equalizer that works over multiple OFDM symbols.

Hybrid approaches that combine PIC with SIC are also possible, such as in groupwise interference cancelation. Basically, the set of subcarriers is split into a certain number of groups of subcarriers. Then, ICI cancelation within the group is performed in a parallel way, whereas the ICI among different groups is subtracted in a successive way. This reduces the ordering problem, because the number of groups is smaller than the number of subcarriers. Commonly, the groups contain consecutive subcarriers, but reliability-based subcarrier grouping criteria are sometimes used. Some examples of these techniques have been presented by Vogeler, Brotje, Klenner, Kuhn, & Kammeyer (2004); Tran & Fujino (2005); Song, Kim, Nam, Yu, & Hong (2008); and Hampejs et al. (2009).

7.2.2.3 Near-ML Equalizers

Assuming the block model (7.22), the ML equalizer is expressed by

\[ \hat{\mathbf{a}} = \arg\min_{\mathbf{s}} \|\mathbf{y}_A - \mathbf{H}_A \mathbf{s}\|, \tag{7.34} \]

where $\mathbf{s}$ is a generic possible data vector. Among the various approaches, the ML equalizer gives the best performance, since (7.34) minimizes the conditional probability of block error $\Pr\{\hat{\mathbf{a}} \neq \mathbf{a} \,|\, \mathbf{H}_A\}$. On the other hand, the ML approach is characterized by the worst complexity $O(N_a^A)$, where $N_a$ is the constellation size, i.e., the complexity is exponential in the number of active subcarriers $A$. Hence, a major goal is to find a low-complexity yet good approximation of the exact ML equalizer. In theory, most of the methods already developed for multiuser detection of CDMA signals could be employed, but the specific structure of the OFDM channel matrix and the potentially large number of subcarriers should be taken into account to avoid prohibitive complexity.
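The exponential cost of (7.34) is easiest to appreciate from a brute-force implementation. The sketch below enumerates all $N_a^A$ candidate vectors with `itertools.product`; it is meant for toy sizes only, the function name `ml_equalize` is hypothetical, and an unnormalized QPSK alphabet is assumed.

```python
from itertools import product

QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]


def ml_equalize(H, y, alphabet=QPSK):
    """Exhaustive search for the data vector s minimizing ||y - H s||^2."""
    A = len(H[0])
    best, best_cost = None, float("inf")
    for s in product(alphabet, repeat=A):        # len(alphabet) ** A candidates
        cost = sum(abs(y[i] - sum(H[i][j] * s[j] for j in range(A))) ** 2
                   for i in range(len(H)))
        if cost < best_cost:
            best, best_cost = list(s), cost
    return best
```

Already for $A = 96$ QPSK subcarriers the loop would run over $4^{96}$ candidates, which motivates the banded, local, and tree-search approximations reviewed in this subsection.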

Using the band approximation $\mathbf{H}_A \approx \mathbf{B}_A^{(D_b)}$, Ohno (2005) proposed a banded block ML equalizer that reduces the equalization complexity down to $O(D_b N_a^{2D_b+1} A)$. This is achieved by employing a Viterbi algorithm with a reduced number of surviving paths. However, in the case of a large constellation size $N_a$, complexity is still an issue. In addition, the Viterbi equalizer proposed by Ohno (2005) assumes white noise and therefore is not compatible with receiver windowing.

A second type of ML approximation consists in performing, for a specific subcarrier, a local ML search that only considers the neighboring subcarriers, with a philosophy that is similar to the blind time-domain equalizer of Cui & Tellambura (2007). This approach can be regarded as the serial version of the banded ML equalizer.

Similarly, a third type of quasi-ML equalizers can be established by employing sphere decoding (see Section 3.8, Section 8.3.4 and references therein) or other tree-search techniques (see Section 6.3.2.4). For instance, Hwang & Schniter (2006) have applied a breadth-first search based on the T-algorithm, in a multicarrier system with transmitter and receiver windowing. This specific tree-search algorithm is coupled with a banded block MMSE DFE preprocessing with cyclic ordering, practically leading to ML performance with reduced complexity (below $O(L^{2.4})$ per block) (Hwang & Schniter, 2006). Another tree-search algorithm has been investigated by Chow & Jeremic (2006).

As a fourth option, the optimization problem (7.34) can be relaxed to an equality-constrained quadratic programming problem, which is solved iteratively (Kou, Lu, & Antoniou, 2005). This approach can be extended to QAM as proposed by Zhang, Lu, & Gulliver (2007), who also reduce the equalization complexity by using a subspace constraint.

Mixed approaches are also possible. For instance, the groupwise approach of Feng, Minn, Yan, & Jinhui (2010) employs semidefinite relaxation to mitigate the ICI within a group of adjacent subcarriers and a PIC technique to reduce the ICI coming from the other groups of subcarriers.

7.2.2.4 Iterative and Turbo Approaches

Unlike their linear counterparts, iterative nonlinear equalizers perform nonlinear operations to iteratively update the data estimate. From this viewpoint, many ICI cancelation schemes that we described previously are also iterative. As a consequence, in this section, we describe those iterative nonlinear equalizers that also have other specific features.

As an example, equalization can be combined with channel estimation and with forward error correction decoding. Tomasin, Gorokhov, Yang, & Linnartz (2005) have presented an iterative channel estimator and a PIC equalizer that reuses the output of the convolutional decoder. The frequency-diversity gain provided by channel coding allows for a reliable ICI estimate, which produces improved performance (at least at medium-to-high SINR) but also increases the decoding delay. Joint equalization and channel estimation is also performed by Mostofi & Cox (2005).

A remarkable iterative structure is the turbo equalizer proposed by Schniter (2004), which is based on a window-assisted serial linear MMSE equalizer. In the equalizer of Schniter (2004), the symbol $a_l$ is iteratively estimated using a linear MMSE (LMMSE) criterion, as expressed by

\[
\hat{a}_l = \mu_l + \mathbf{e}^T_{4D_b+1,\,2D_b+1} \left(\mathbf{B}_W^{(D_b)}[l]\right)^{H} \left(\mathbf{R}_W^{(D_b)}[l]\right)^{-1} \left( \mathbf{y}_W^{(D_b)}[l] - \mathbf{B}_W^{(D_b)}[l]\, \boldsymbol{\mu}^{(D_b)}[l] \right). \tag{7.35}
\]

Here, $\mu_l$ is the a priori mean of $a_l$; $\boldsymbol{\mu}^{(D_b)}[l] \triangleq (\mu_{l-2D_b} \cdots \mu_{l-1} \;\, 0 \;\, \mu_{l+1} \cdots \mu_{l+2D_b})^T$ is the a priori mean of the data vector $\mathbf{a}^{(D_b)}[l]$ (except for the middle symbol, which is set to 0 instead of $\mu_l$); and

\[
\mathbf{R}_W^{(D_b)}[l] \triangleq \mathbf{B}_W^{(D_b)}[l]\, \mathbf{V}^{(D_b)}[l] \left(\mathbf{B}_W^{(D_b)}[l]\right)^{H} + \gamma\, \boldsymbol{\Gamma}^{(D_b)}[l] \left(\boldsymbol{\Gamma}^{(D_b)}[l]\right)^{H}
\]

is the matrix to be inverted, with $\mathbf{V}^{(D_b)}[l] \triangleq \mathrm{diag}(v_{l-2D_b} \cdots v_{l-1} \;\, 1 \;\, v_{l+1} \cdots v_{l+2D_b})$ the diagonal matrix that contains the a priori variances (except for $a_l$) and $\boldsymbol{\Gamma}^{(D_b)}[l]$ the $(2D_b+1)\times L$ matrix obtained by selecting the rows of $\boldsymbol{\Gamma}$ from index $l-D_b$ to $l+D_b$. After LMMSE symbol estimation, the iterative procedure of Schniter (2004) updates the log-likelihood ratio (LLR) $L(a_l|\hat{a}_l)$, which is used to update the a priori means and variances (for QPSK, $\mu_l = \tanh(L(a_l|\hat{a}_l)/2)$ and $v_l = 1 - |\mu_l|^2$), to be used for symbol estimation in the next iteration. Different algorithms are possible depending on how the a priori quantities are updated. For instance, in order to obtain $\hat{a}_l^{(\kappa)}$ in the $\kappa$th iteration, (7.35) can use the a priori quantities calculated in the previous iteration,

\[
\left\{ \mu^{(\kappa-1)}_{l-2D_b}, v^{(\kappa-1)}_{l-2D_b}, \ldots, \mu^{(\kappa-1)}_{l-1}, v^{(\kappa-1)}_{l-1}, \mu^{(\kappa-1)}_{l+1}, v^{(\kappa-1)}_{l+1}, \ldots, \mu^{(\kappa-1)}_{l+2D_b}, v^{(\kappa-1)}_{l+2D_b} \right\},
\]

or it can also employ some a priori values already calculated in the current iteration,

\[
\left\{ \mu^{(\kappa)}_{l-2D_b}, v^{(\kappa)}_{l-2D_b}, \ldots, \mu^{(\kappa)}_{l-1}, v^{(\kappa)}_{l-1}, \mu^{(\kappa-1)}_{l+1}, v^{(\kappa-1)}_{l+1}, \ldots, \mu^{(\kappa-1)}_{l+2D_b}, v^{(\kappa-1)}_{l+2D_b} \right\}.
\]

These two updating strategies correspond to a block-wise update and a serial-wise update, respectively. The simulation results of Schniter (2004) show that the serial-wise update produces better performance, since the newly acquired a priori information is used as soon as it is available, thereby improving the convergence of the iterative algorithm. In both cases, the computational complexity is linear in the number of subcarriers and in the number of iterations.
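The difference between the two update schedules can be illustrated with a minimal Python sketch. It is not the algorithm of Schniter (2004): the callback `llr_of` is a hypothetical stand-in for the LMMSE estimate (7.35) followed by the LLR computation, the LLRs are treated as real scalars, and `toy_llr` is an invented toy rule used only to make the two schedules produce visibly different results.

```python
import math

def update_prior(llr):
    """Map an LLR to the a priori mean and variance (rule quoted in the text)."""
    mu = math.tanh(llr / 2.0)
    return mu, 1.0 - abs(mu) ** 2

def turbo_pass(mu, v, llr_of, serial=True):
    """Run one iteration over all subcarriers, updating mu and v in place.

    llr_of(l, mu, v) is a hypothetical callback standing in for the
    LMMSE estimate (7.35) plus the LLR computation for symbol l.
    """
    if serial:
        src_mu, src_v = mu, v              # new values are visible immediately
    else:
        src_mu, src_v = list(mu), list(v)  # frozen copy of the previous iteration
    for l in range(len(mu)):
        mu[l], v[l] = update_prior(llr_of(l, src_mu, src_v))
    return mu, v

def toy_llr(l, mu, v):
    # Invented toy rule: confidence grows with the prior mean of the left neighbor.
    return 1.0 + (mu[l - 1] if l > 0 else 0.0)

block_mu, block_v = turbo_pass([0.0] * 4, [1.0] * 4, toy_llr, serial=False)
serial_mu, serial_v = turbo_pass([0.0] * 4, [1.0] * 4, toy_llr, serial=True)
```

In this toy setting, the serial-wise pass already benefits from the updated left neighbor within the same iteration, whereas the block-wise pass sees only the values of the previous iteration.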

Alternatively, the serial MMSE equalizer (7.35) can be replaced by a block MMSE equalizer that jointly calculates all the a priori values (Fang, Rugini, & Leus, 2008). In this case, only the block-wise a priori update is possible. Turbo block MMSE equalization can be related to probabilistic data association, which is commonly considered a quasi-ML technique. Indeed, in the presence of receiver windowing, the turbo block MMSE equalizer (Fang et al., 2008) outperforms the corresponding serial version, improving the BER performance at medium SNR. The computational complexity is similar to that of Schniter (2004).

In summary, many iterative ICI mitigation techniques have been presented in the literature, exploiting serial or block MMSE equalization, receiver windowing, and serial or block a priori LLR updating, sometimes incorporating channel estimation or channel decoding into the turbo loop. Obviously, performance and complexity depend strongly on the type of iterative scheme and on the number of iterations. Therefore, a thorough comparison is difficult, except for some specific cases. A comparison between some selected schemes has been performed by Schniter (2004) and Fang et al. (2008). A general drawback of iterative nonlinear equalizers is the difficulty of a theoretical convergence analysis. Usually, the number of iterations is selected heuristically or by means of EXIT charts (ten Brink, 2001).

7.2.2.5 Performance-Complexity Trade-Off

First, we compare a linear block equalizer with a nonlinear block DFE approach. Both equalizers are designed using the MMSE criterion. The simulation scenario is that of Fig. 7.5, i.e., L = 128 subcarriers (A = 96 active), QPSK modulation, WSSUS Rayleigh channel with Jakes' Doppler power profile with ϑmax = 0.12, and with truncated exponential delay power profile $E\{|h[n,m]|^2\} = \alpha e^{-0.6m}$ of length M = 9. Banded equalizers use perfect CSI, Db = 2, and Tikhonov regularization. MBAE-SOE windowing is employed.
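As a small worked example of the truncated exponential delay power profile, the following Python sketch computes the tap powers for M = 9. The text does not state how α is chosen; the sketch assumes, purely for illustration, that α normalizes the total channel power to one. The function name `delay_profile` is hypothetical.

```python
import math

def delay_profile(M=9, decay=0.6):
    """Tap powers alpha * exp(-decay * m) for m = 0, ..., M-1.

    Assumption (not stated in the text): alpha is chosen so that
    the tap powers sum to one.
    """
    raw = [math.exp(-decay * m) for m in range(M)]
    alpha = 1.0 / sum(raw)
    return alpha, [alpha * x for x in raw]

alpha, profile = delay_profile()
```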

Figure 7.7 shows the BER performance of three linear block MMSE equalizers (banded, window-aided, and full) and of three block MMSE DFE receivers (banded, window-aided, and full). The use of DFE produces a noticeable performance improvement. This improvement is more evident in the presence of windowing, e.g., when the computational complexity of DFE is roughly doubled with respect to linear approaches (Rugini et al., 2006).

We next compare two nonlinear approaches, focusing on iterative (turbo) MMSE equalizers. We again assume the simulation parameters of Fig. 7.7, except the channel length, which is M = 17 in this case, and the CP length LCP = 16. Figure 7.8 displays the BER performance of two window-aided turbo banded MMSE equalizers using either a block approach (Fang et al., 2008) or a serial approach (Schniter, 2004). For both cases, Db = 2 and MBAE-SOE windowing is adopted. The block equalizer employs Tikhonov regularization when the SNR Es/N0 exceeds 25 dB. For the serial equalizer, the serial-wise LLR updating is considered, since it outperforms the block-wise LLR updating (Schniter, 2004). Figure 7.8 shows that the serial equalizer exhibits a slight performance loss at low SNR. This is mainly caused by the presence of windowing: since the window produces noise correlation across the subcarriers, considering only a few subcarriers simultaneously is suboptimal. Block approaches jointly equalize all the subcarriers and therefore do not suffer from this loss. By focusing on the first iteration, obtained by linear equalization, it can be observed that the serial equalizer outperforms the block equalizer at very high SNR. This is mainly due to the ill-conditioning of the frequency-domain channel matrix, which is more problematic in the block case because of the larger size of the channel matrix. However, after the second iteration, the block equalizer is able to close the gap and to outperform the corresponding serial version. More iterations improve the performance only slightly, in both cases. As far as complexity is concerned, both versions have linear complexity with respect to the number of subcarriers and the number of iterations. However, serial equalizers are more complex, with Cserial ≈ 1.75 Cblock when Db = 2 and Cserial ≈ 2.50 Cblock when Db = 4. On the other hand, serial equalizers deal with matrices of smaller size and hence are generally characterized by reduced memory requirements.

FIGURE 7.7

Block MMSE equalizers: BER performance comparison between linear and DFE versions. [Plot: BER (10^−4 to 10^−1) versus Eb/N0 (0 to 30 dB). Curves: MMSE (banded, block); MMSE (windowing); MMSE (full, block); DFE (banded, block); DFE (windowing); DFE (full, block).]

FIGURE 7.8

Iterative (turbo) banded MMSE equalizers: BER performance comparison between serial and block versions. [Plot: BER (10^−4 to 10^−1) versus Eb/N0 (10 to 30 dB). Curves: serial and block equalizers, each with 1, 2, and 3 iterations.]

7.2.3 Transmitter Preprocessing

In Sections 7.2.1 and 7.2.2, we presented a broad overview of receiver processing techniques that are able to reduce the ICI power in CP-based OFDM systems with LTV channels. Current OFDM systems, which are designed for LTI channels, allow for the use of alternative receiver techniques when the channel is LTV. More refined approaches modify the OFDM transmission scheme in order to counteract the Doppler effect before the signal is received. As a result of transmitter processing, the need for equalization is greatly reduced.

When the time variation of the channel is significant, it is not reasonable to assume full knowledge of the CSI at the transmitter. Therefore, CSI transmitter processing techniques used for LTI channels, such as ZF pre-equalization or Tomlinson–Harashima precoding, are not appropriate for LTV channels. However, statistical CSI knowledge of the WSSUS channel can be helpful.

In this section, we describe some common techniques to cope with the channel time variation at the transmitter. Some of these techniques were originally proposed to counteract the ICI produced by a CFO, but have also been successfully applied to other types of Doppler distortions. We split these transmitter techniques into two categories: (1) data precoding, which only acts on the data to be transmitted, and (2) pulse shaping, which instead operates on the transmitted signal waveform. In general, data precoding requires a minor modification of current OFDM standards, since the precoded data can be transmitted by standard OFDM systems. On the other hand, pulse-shaping techniques require additional filtering at the transmitter and at the receiver.

7.2.3.1 Data Precoding

In OFDM, linear precoding of the frequency-domain data vector can result in a multipath diversity gain even in the absence of Galois-field channel coding (Wang & Giannakis, 2003). Similarly, data precoding across tones can be exploited for ICI reduction (Zhang & Li, 2003; Zhao & Haggman, 2001). Two types of data precoding methods exist: redundant and nonredundant. Redundant linear precoding is performed by means of a tall L×A precoding matrix $\mathbf{P}$ applied to the data vector $\mathbf{a}$. By using (7.21), we can express the received data as

\[
\mathbf{y} = \mathbf{H}_F \mathbf{P} \mathbf{a} + \mathbf{z}.
\]

In other words, A data symbols share L > A subcarriers. The most popular redundant precoding approach, known as self-cancelation or polynomial cancelation coding (Armstrong, 1999; Zhao & Haggman, 2001), exploits the ICI correlation of nearby subcarriers. For instance, the rate-1/2 scheme uses $\mathbf{P} = \mathbf{I}_{L/2} \otimes (1 \;\, {-1})^T$, where $\otimes$ denotes the Kronecker product. Hence, the same symbol is transmitted on two consecutive subcarriers, with opposite signs. At the receiver, the data symbols are estimated as

\[
\hat{\mathbf{a}} = \mathbf{P}^H \mathbf{Q}^{-1} \mathbf{y} = \mathbf{P}^H \mathbf{Q}^{-1} \mathbf{H}_F \mathbf{P} \mathbf{a} + \mathbf{P}^H \mathbf{Q}^{-1} \mathbf{z},
\]

where $\mathbf{Q} = \mathrm{diag}\{\mathbf{H}_F\}$, i.e., one-tap equalization is used. We note that the data received on two consecutive subcarriers are subtracted to recover the original data. Since there exists a strong correlation among nearby ICI elements of $\mathbf{H}_F$, significant ICI cancelation is achieved. In other words, the off-diagonal elements of the L/2×L/2 system matrix $\mathbf{P}^H \mathbf{Q}^{-1} \mathbf{H}_F \mathbf{P}$ are far smaller than those of $\mathbf{Q}^{-1} \mathbf{H}_F$, which results in reduced ICI. Consequently, significant performance gains are possible compared to convolutional coding (Zhao & Haggman, 2001), especially at high Doppler spread and at low SNR. As explained by Armstrong (1999) for CFO, this technique effectively eliminates the constant and linear components of the ICI variation over the rows of the frequency-domain channel matrix.
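The effect of the rate-1/2 scheme on the system matrix can be checked numerically. The following Python sketch assumes the simplest time-varying distortion, a pure CFO with normalized offset `eps`, for which the frequency-domain channel matrix has a closed form as a DFT-domain leakage sum. The helper names (`cfo_channel`, `one_tap`, `self_cancel`, `sir_db`) and the values L = 16 and eps = 0.1 are illustrative choices, not taken from the cited papers. The sketch compares the signal-to-ICI power ratio of the one-tap equalized matrix with that of the self-cancelation system matrix.

```python
import cmath
import math

def cfo_channel(L, eps):
    """Frequency-domain channel matrix H_F of a pure CFO (assumed toy channel)."""
    def coeff(d):  # leakage between two subcarriers that are d bins apart
        return sum(cmath.exp(2j * math.pi * n * (d + eps) / L) for n in range(L)) / L
    return [[coeff(j - i) for j in range(L)] for i in range(L)]

def one_tap(HF):
    """Q^{-1} H_F with Q = diag{H_F}."""
    L = len(HF)
    return [[HF[i][j] / HF[i][i] for j in range(L)] for i in range(L)]

def self_cancel(G):
    """P^H G P for P = I_{L/2} kron (1, -1)^T, written out element-wise."""
    half = len(G) // 2
    sign = (1.0, -1.0)
    return [[sum(sign[a] * sign[b] * G[2 * k + a][2 * m + b]
                 for a in range(2) for b in range(2))
             for m in range(half)] for k in range(half)]

def sir_db(S):
    """Ratio of diagonal power to off-diagonal (ICI) power, in dB."""
    sig = sum(abs(S[k][k]) ** 2 for k in range(len(S)))
    total = sum(abs(x) ** 2 for row in S for x in row)
    return 10.0 * math.log10(sig / (total - sig))

G = one_tap(cfo_channel(16, 0.1))
plain_db = sir_db(G)                # one-tap equalization only
coded_db = sir_db(self_cancel(G))   # rate-1/2 self-cancelation
```

Under these assumptions, `coded_db` exceeds `plain_db` by a wide margin, which reflects the much smaller off-diagonal elements of the self-cancelation system matrix.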

Even higher performance gains can be obtained by self-canceling more than two subcarriers. In general, $\mathbf{P} = \mathbf{I}_A \otimes \mathbf{p}$ is used, where $\mathbf{p}$ is a size-G vector obtained using the coefficients of the polynomial $p(x) = (1 - x)^{G-1}$ and $G \triangleq L/A$ (assumed integer) denotes the number of subcarriers per symbol (Zhao & Haggman, 2001). For instance, in the CFO case, G = 3 permits the elimination of the ICI up to the cubic component (Armstrong, 1999). The main drawback is the information rate reduction, which, however, can be eliminated using higher-order constellations.
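The construction of the general precoder can be sketched in a few lines of Python. The coefficients of $(1 - x)^{G-1}$ are the binomial coefficients with alternating signs, and the Kronecker product with the identity places one copy of the weight vector in each column. The function names `pcc_weights` and `precoding_matrix` are hypothetical.

```python
from math import comb

def pcc_weights(G):
    """Coefficients of the polynomial (1 - x)^(G-1), lowest order first."""
    return [(-1) ** i * comb(G - 1, i) for i in range(G)]

def precoding_matrix(A, G):
    """P = I_A kron p, returned as an (A*G) x A list of lists."""
    p = pcc_weights(G)
    P = [[0] * A for _ in range(A * G)]
    for k in range(A):
        for i in range(G):
            P[k * G + i][k] = p[i]
    return P
```

For G = 2 the weights reduce to (1, −1), i.e., the rate-1/2 scheme described above. The weights always sum to zero for G ≥ 2.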

An extension of the self-cancelation method consists in precoding each data symbol over G consecutive carriers, which are used to transmit several frequency-shifted replicas of the same data symbol (Seyedi & Saulnier, 2005). The frequency-shift values can be integer (Zhao & Haggman, 2001) or noninteger numbers. Noninteger frequency-shifted data precoding can be implemented using frequency-domain upsampling and time-domain windowing, producing an additional computational complexity of only O(L) (Seyedi & Saulnier, 2005). The main drawback of this approach lies in the complexity of the window design, which requires a numerical maximization (Seyedi & Saulnier, 2005). Other ICI cancelation codes, based on capacity maximization, are investigated by Yun, Chung, & Lee (2007).

Nonredundant precoding techniques are commonly based on correlative coding (Zhao, Leclercq, & Haggman, 1998) and partial response coding (Zhang & Li, 2003), applied in the frequency domain. This corresponds to a square precoder $\mathbf{P}$ with triangular and banded structure applied to (modulo-precoded) data. Precoder designs that approximately minimize the ICI power have been presented by Zhang & Li (2003). Similarly to the time-domain case, the data detection can be performed using a per-subcarrier detector (Zhao et al., 1998) or a joint (ML-based) detector (Zhang & Li, 2003). The latter is more complex but provides significant ICI reduction (more than 4 dB).



7.2.3.2 Pulse Shaping and Transmitter Windowing

Pulse-shaping techniques use transmitter (and receiver) windows in order to reduce the sensitivity to ICI (Bolcskei, 2003; Haas & Belfiore, 1997; Hunziker & Dahlhaus, 2003; Kozek & Molisch, 1998; Matz, Schafhuber, Grochenig, Hartmann, & Hlawatsch, 2007; Strohmer & Beaver, 2003). In this case, the signal waveform is modified without changing the transmitted data symbols. In addition to the reduced ICI, pulse-shaping techniques also provide additional robustness to frequency synchronization errors and reduce the spurious emissions into adjacent channels. To describe this set of techniques, it is useful to adopt a continuous-time model of the OFDM transmission. Similarly to (2.13) in Chapter 2, the transmitted signal can be expressed by

\[
s(t) = \sum_{k=0}^{K-1} \sum_{l=0}^{L-1} a[k,l]\, g(t - kT)\, e^{j2\pi lFt},
\]

where a[k, l] is the data symbol transmitted on the lth subcarrier of the kth OFDM block, F is the subcarrier separation, T is the OFDM block duration, including a possible cyclic extension, and g(t) is the transmitted pulse, which is rectangular in conventional OFDM systems. For simplicity, we assume that only data are transmitted. After passing through an LTV channel with continuous-time impulse response h(t, τ), the received signal is obtained as

\[
r(t) = \int_{-\infty}^{\infty} h(t,\tau)\, s(t - \tau)\, d\tau + w(t),
\]

which is demodulated by computing the inner products with the receiver pulse-shaped waveforms $\{\gamma(t - kT)\, e^{j2\pi lFt}\}$, as expressed by

\[
y[k,l] = \int_{-\infty}^{\infty} r(t)\, \gamma^{*}(t - kT)\, e^{-j2\pi lFt}\, dt.
\]
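A discrete-time counterpart of this modulator and demodulator pair can be sketched in Python. The sketch assumes critical sampling with N samples per block (T = N samples, F = 1/N), rectangular pulses of length N, no cyclic extension, and an ideal noiseless channel; none of these choices comes from the text, and the function names are hypothetical. A second run applies a small frequency shift to the received samples as a crude stand-in for Doppler distortion.

```python
import cmath
import math

def modulate(a, g, N):
    """s[n] = sum_k sum_l a[k][l] g[n - kN] exp(j 2 pi l n / N), pulse length <= N."""
    K, L = len(a), len(a[0])
    s = [0j] * (K * N)
    for k in range(K):
        for l in range(L):
            for m, gm in enumerate(g):       # m = n - kN
                n = k * N + m
                s[n] += a[k][l] * gm * cmath.exp(2j * math.pi * l * n / N)
    return s

def demodulate(r, gamma, N, K, L):
    """y[k][l] = sum_n r[n] gamma*[n - kN] exp(-j 2 pi l n / N)."""
    y = [[0j] * L for _ in range(K)]
    for k in range(K):
        for l in range(L):
            for m, gm in enumerate(gamma):
                n = k * N + m
                y[k][l] += r[n] * gm.conjugate() * cmath.exp(-2j * math.pi * l * n / N)
    return y

def max_error(y, a):
    return max(abs(y[k][l] - a[k][l]) for k in range(len(a)) for l in range(len(a[0])))

K, N = 2, 4
a = [[1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], [-1 - 1j, 1 + 1j, 1 - 1j, -1 + 1j]]
g = [1.0] * N            # rectangular transmit pulse
gamma = [1.0 / N] * N    # rectangular receive pulse, scaled for unit gain

s = modulate(a, g, N)
y_ideal = demodulate(s, gamma, N, K, N)

# Frequency shift of 0.1 subcarrier spacings (assumed toy distortion).
shifted = [x * cmath.exp(2j * math.pi * 0.1 * n / N) for n, x in enumerate(s)]
y_shift = demodulate(shifted, gamma, N, K, N)
```

On the ideal channel the demodulator returns the transmitted symbols up to rounding error, whereas the frequency-shifted signal produces a clearly nonzero error, consisting of a common phase rotation plus ICI.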

Conventional OFDM systems, which employ rectangular pulse shapes for both the transmitter pulse g(t) and the receiver pulse γ(t), are orthogonal for both ideal and LTI channels. Orthogonality means that in the absence of noise, y[k, l] is a scaled version of a[k, l] and hence does not contain any unwanted contribution from symbols a[k′, l′] with (k′, l′) ≠ (k, l). In other words, ISI is absent, since y[k, l] does not depend on a[k′, l′] with k′ ≠ k, and intra-block ICI is also avoided,⁵ because y[k, l] does not depend on a[k, l′] with l′ ≠ l. A necessary condition for orthogonality is TF ≥ 1. For noiseless ideal channels, where the CP is not necessary, OFDM systems avoid the ISI by rectangular windowing, i.e., g(t) is constant when t ∈ [0, T] and g(t) = 0 elsewhere, and avoid the ICI by choosing F = 1/T. Indeed, the time-domain truncated complex exponentials $\{e^{j2\pi lFt} g(t),\ l = 0, \ldots, L-1\}$ with F = 1/T lead to a frequency-domain sinc-shaped waveform centered at lF = l/T with zeros on a regular grid, at l′F = l′/T, l′ ≠ l, so that only a single sinc waveform contributes to the signal components at the frequency lF = l/T. In this case, TF = 1, i.e., the spectral efficiency is maximum. OFDM signals maintain their orthogonality even for LTI multipath channels, provided that a cyclic extension (or a guard interval) is inserted to guarantee ISI-free reception and F = 1/TU, where TU is the duration of the useful (CP-free) part of the signal. Note that in CP-based OFDM, orthogonality is maintained at the price of reduced spectral efficiency, due to the insertion of a guard period of duration TCP = T − TU, which makes TF > 1. However, the orthogonality of OFDM systems is lost in the case of LTV channels: the Doppler spread modifies the sinc waveforms by a frequency-domain convolution. Therefore, as explained after (7.13), the zeros of the resulting function no longer fall on the regular frequency grid, and consequently ICI arises. This ICI is significant for time-domain rectangular windowing in the case of LTV channels, since the sinc function decays only as 1/f. In contrast, with pulse-shaping (windowing) functions that have better frequency-domain decay properties, less ICI is introduced by Doppler spread, thereby decreasing the need for complex LTV equalization. Note that for LTV channels, OFDM systems with rectangular pulses avoid ISI, provided that a sufficiently long guard interval is inserted.

⁵ Note that we have included the inter-block ICI in the ISI component.

Let us first assume that the pulses g(t) and γ(t) have the same shape, as suggested by optimal (matched) filtering in AWGN. A first pulse-shaping approach to reduce the ICI for LTV channels, while maintaining orthogonality in the ideal case, is to use the Nyquist criterion (Muschallik, 1996) to design the pulses g(t) and γ(t). For instance, dually to the ISI mitigation principle for single-carrier transmissions, a time-domain raised cosine window, which decays as $1/f^3$, can be evenly split between transmitter and receiver. However, the decay rate is not the only factor that affects ICI mitigation (Tan & Beaulieu, 2004). Different kinds of orthogonal pulses have been proposed. The use of time-domain sinc pulse shapes makes the subcarrier spectrum rectangular and therefore is dual to conventional OFDM with rectangular pulses. This is the idealized version of the filtered multitone approach (Amini & Farhang-Boroujeny, 2009; Tonello & Pecile, 2008; Wang, Proakis, & Zeidler, 2007), which can use guard bands between subcarriers to completely eliminate the ICI in the case of LTV channels. Other designs adopt pulses that are well localized in both time and frequency. Some examples include the quasi-orthogonal pulses of Haas & Belfiore (1997), which are based on Hermite functions, and the scale-adapted pulses of Liu, Kadous, & Sayeed (2004), which are matched to the spread factor of the channel. Although the orthogonal approach is optimum for (nondispersive) AWGN channels, there is a price to be paid for the obtained ICI reduction. Indeed, any window with good spectral properties has a larger duration than the rectangular window, and consequently, an additional guard period is necessary to avoid ISI, thus reducing the spectral efficiency. Otherwise, inter-block orthogonality is lost, and ISI equalization is required even for nondispersive channels.
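The difference in spectral decay between a rectangular window and a raised cosine window can be checked with a short Python sketch. It evaluates the discrete-time spectrum at a fractional bin offset, midway between two zeros of the rectangular window's spectrum. The sketch uses a full roll-off raised cosine (a Hann window) as an illustrative special case; the window length N = 64, the offset of 10.5 bins, and the helper name `leakage_db` are assumptions made here, not values from the text.

```python
import cmath
import math

def leakage_db(window, offset_bins):
    """Spectrum magnitude at a (possibly fractional) bin offset, relative to DC, in dB."""
    N = len(window)

    def spectrum(f):
        return abs(sum(w * cmath.exp(-2j * math.pi * f * n / N)
                       for n, w in enumerate(window)))

    return 20.0 * math.log10(spectrum(offset_bins) / spectrum(0.0))

N = 64
rect = [1.0] * N
# Full roll-off raised cosine (Hann) window, sampled symmetrically.
hann = [0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / N) for n in range(N)]

rect_db = leakage_db(rect, 10.5)
hann_db = leakage_db(hann, 10.5)
```

With these settings, the raised cosine window leaks far less energy at the chosen offset than the rectangular window, in line with the faster spectral decay discussed above.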

Another orthogonal scheme is the lattice OFDM approach of Strohmer & Beaver (2003), where the temporal locations of the OFDM blocks at a given subcarrier are staggered with respect to those at the two adjacent subcarriers. This gives rise to hexagonal lattices in the time-frequency plane, which is consistent with the sphere-packing principle. Well-localized pulses are designed by orthogonalization of Gaussian pulses, taking into account the shape of the channel scattering function CH(τ, ν). Since this scheme uses well-localized orthogonal pulses, TF > 1.

Let us now assume that the pulses g(t) and γ(t) have different shapes. Biorthogonal approaches rely on transmit and receive pulses g(t) and γ(t) whose cross-ambiguity function satisfies

\[
A_{g\gamma}(kT, lF) \triangleq \int_{-\infty}^{\infty} g(t)\, \gamma^{*}(t - kT)\, e^{-j2\pi lFt}\, dt = \delta[k]\,\delta[l]. \tag{7.36}
\]

The key idea is that different transmit and receive pulses provide more degrees of freedom for ICI and ISI mitigation, at the price of a slightly reduced performance for idealized AWGN channels due to mismatched receive filtering. Kozek & Molisch (1998) have proposed a transmit pulse design based on the maximization of the useful signal, leading to a large $|A_{g\gamma}(\tau,\nu)|^2$ wherever CH(τ, ν) is large, while the receive pulse is chosen to fulfill the biorthogonality constraint. Matz et al. (2007) have proposed a receive pulse design that minimizes the ICI power (for a fixed transmit pulse).
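The biorthogonality condition (7.36) can be verified numerically for a simple pulse pair. The Python sketch below uses a discrete-time toy example that is an assumption of this illustration rather than one of the cited designs: a rectangular transmit pulse spanning a cyclic prefix of `Ncp` samples plus `N` useful samples, and a shorter rectangular receive pulse that skips the prefix, with T = N + Ncp samples and F = 1/N. The function name `cross_ambiguity` is hypothetical.

```python
import cmath
import math

def cross_ambiguity(g, gamma, shift, l, N):
    """Discrete counterpart of (7.36): sum_n g[n] gamma*[n - shift] exp(-j 2 pi l n / N)."""
    total = 0j
    for n, gn in enumerate(g):
        m = n - shift
        if 0 <= m < len(gamma):
            total += gn * gamma[m].conjugate() * cmath.exp(-2j * math.pi * l * n / N)
    return total

N, Ncp = 8, 2
T = N + Ncp                              # block duration in samples
g = [1.0] * T                            # transmit pulse: prefix plus useful part
gamma = [0.0] * Ncp + [1.0 / N] * N      # receive pulse: useful part only

# Sampled cross-ambiguity function on the lattice (kT, l/N).
table = {(k, l): cross_ambiguity(g, gamma, k * T, l, N)
         for k in (-1, 0, 1) for l in range(N)}
```

For this pair, the sampled cross-ambiguity function equals one at (k, l) = (0, 0) and vanishes at the other lattice points considered, even though the two pulses differ.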

Alternatively, nonbiorthogonal approaches are possible. In this case, different pulses g(t) and γ(t) that do not satisfy (7.36) are chosen. For instance, the joint transmit-receive pulse design of Matz et al. (2007) is obtained by disregarding the biorthogonality constraint. On the other hand, the pulse designs of Das & Schniter (2007) adopt an input SINR criterion that neglects the ICI due to nearby subcarriers, which is subsequently mitigated by an iterative banded nonlinear equalizer.

A different class of techniques is based on offset QAM, with real-valued pulses. The main advantage of pulse-shaped OFDM with offset QAM is the existence of well-localized functions even for TF = 1, which gives the maximum spectral efficiency. Some examples have been presented by Bolcskei (2003), Jung & Wunder (2007), Le Floch et al. (1995), and Vahlin & Holte (1995).

A totally different approach relies on using chirp waveforms for the “pulses” g(t) and γ(t). In this case, perfect ICI elimination is possible when the delay-Doppler spreading function of the channel is a straight line in the time-frequency plane (Barbarossa & Torti, 2001). Chirp approaches based on the fractional Fourier transform and on the affine Fourier transform have been presented by Martone (2001) and by Erseghe, Laurenti, & Cellini (2005), respectively.

7.2.4 Extension to MIMO-OFDM

The transmitter-based and receiver-based ICI mitigation methods discussed in the previous sections can be extended to MIMO-OFDM systems, using suitable methods to deal with the inter-antenna interference (IAI) arising from multiple transmit antennas. The research on ICI mitigation in MIMO-OFDM systems is quite recent, and relatively few techniques have been proposed so far. Therefore, unlike in the single-antenna case, a detailed categorization of the proposed techniques is not yet warranted. In the following sections, we distinguish between ICI mitigation techniques proposed for spatial multiplexing systems, which aim at increasing the data rate, and techniques proposed for space-time-frequency coding systems, which seek to improve the performance.

7.2.4.1 OFDM with Spatial Multiplexing

In spatial multiplexing systems, different data symbols are transmitted from different antennas using shared frequency-time slots. Since (7.19) is formally similar to its single-antenna version (7.3), the equalization methods previously described can be used for MIMO-OFDM with minor modifications. For instance, a block linear MMSE equalizer has been investigated by Stamoulis et al. (2002), and a banded version that includes receiver windowing in the MBAE sense has been considered by Rugini & Banelli (2006). We note that linear equalization is appropriate only when MR ≥ MT; otherwise there are not enough degrees of freedom for data recovery. Therefore, when MR < MT, nonlinear equalization or ICI cancelation is necessary.

Among the ICI cancelation techniques proposed in the literature, a simple approach consists in applying an iterative interference canceller with two stages: the first counteracts the ICI due to the Doppler spread, while the second reduces the IAI due to multiple transmit antennas. In the first stage, a banded PIC is usually adopted (Li, Li, & Vucetic, 2008; Song & Lim, 2006), while the second stage could employ a VBLAST-like nulling-canceling method (Song & Lim, 2006) or a PIC with linear combining of the outputs of different iterations (Li et al., 2008). Alternatively, joint cancelation of both ICI and IAI can be performed by turbo banded approaches, as in the window-assisted equalizer of Rugini, Banelli, Fang, & Leus (2009) and in the turbo decoder of Liu & Fitz (2008).

Correlative coding has been extended to MIMO-OFDM by Zhang & Liu (2006). However, the data recovery by means of ML sequence estimation can be quite expensive when the number of subcarriers (and the number of transmit antennas) is large.

7.2.4.2 OFDM with Space-Time-Frequency Coding

It is well known that the BER performance for MIMO channels can be improved by space-time coding (STC). In MIMO-OFDM, there is an additional domain, and hence, space-frequency coding (SFC) or space-time-frequency coding (STFC) is also possible. The choice between STC and SFC depends on the channel selectivity: intuitively, STC is able to collect the time diversity, while SFC can gather the frequency diversity. STFC can collect both gains, but it requires additional processing.

OFDM is a type of block transmission, and consequently, most of the techniques proposed for MIMO-OFDM in LTV channels are based on block codes. Usually, the employed block codes are orthogonal, such as the well-known Alamouti code (Alamouti, 1998). However, the ICI induced by the channel Doppler spread destroys the code orthogonality, and consequently, Alamouti combining at the receiver is no longer equivalent to the ML detector. Lin, Chiang, & Li (2005) have compared different receiver combining methods (namely Alamouti, ZF, decision-feedback, and ML) for space-time block coding (STBC) and space-frequency block coding (SFBC).
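The loss of orthogonality can be illustrated with a minimal Python sketch of Alamouti encoding and combining for two transmit antennas and one receive antenna. The sketch is a narrowband, noiseless toy model on a single subcarrier: it captures only the channel variation between the two coded slots, not the ICI itself, and it assumes that the receiver combines using the channel of the first slot. The function name and the channel values are hypothetical.

```python
import cmath

def alamouti_combine(s1, s2, h_slot1, h_slot2):
    """Transmit (s1, s2), then (-s2*, s1*); combine with the slot-1 channel."""
    h1, h2 = h_slot1
    g1, g2 = h_slot2
    r1 = h1 * s1 + h2 * s2                               # first slot
    r2 = -g1 * s2.conjugate() + g2 * s1.conjugate()      # second slot
    z1 = h1.conjugate() * r1 + h2 * r2.conjugate()
    z2 = h2.conjugate() * r1 - h1 * r2.conjugate()
    return z1, z2

s1, s2 = 1 + 1j, -1 + 1j
h = (0.8 + 0.3j, -0.5 + 0.9j)
gain = abs(h[0]) ** 2 + abs(h[1]) ** 2

# Static channel: the two symbols separate perfectly.
z1_static, z2_static = alamouti_combine(s1, s2, h, h)

# Time-varying channel: both gains rotate by 0.3 rad between the slots.
rot = cmath.exp(0.3j)
z1_varying, z2_varying = alamouti_combine(s1, s2, h, (h[0] * rot, h[1] * rot))
```

For the static channel, `z1_static / gain` and `z2_static / gain` recover the transmitted symbols. For the time-varying channel, each combiner output contains a residual contribution from the other symbol, so simple Alamouti combining is no longer optimal.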

Kim, Heath Jr, & Powers (2005) have proposed an STBC receiver that switches between decision-feedback and Alamouti combining. Fang, Leus, & Rugini (2006) have investigated a banded MMSE approach for STBC. For CFO distortions, a Tomlinson–Harashima precoder with partial CSI at the transmitter has been proposed by Fu, Tellambura, & Krzymien (2007). SFBC ICI self-cancelation codes are proposed in Dao & Tellambura (2005). The SFBC of Park & Cho (2005) is an Alamouti technique applied to a group of consecutive subcarriers, which contain redundant frequency-domain precoded data. Zhu, Wen, & Du (2008) have shown that space-time-frequency block coding (STFBC) outperforms STBC and SFBC, as intuition suggests.

7.3 TIME-VARYING CHANNEL ESTIMATION

Most of the described ICI mitigation techniques assume that the receiver knows the LTV channel. In rapidly time-varying scenarios, the channel estimation task is rather cumbersome because the CIR is not constant within the OFDM block. As a consequence, multiple parameters must be estimated for each channel path, and the estimation has to be repeated (or updated) for each OFDM block. Therefore, we include an overview of LTV channel estimation for OFDM (this issue is treated in more detail in Chapter 4). In this section, we first review the basis expansion model (BEM), which is one of the most popular channel models used for LTV channel estimation. Then, we describe some pilot-aided and data-aided channel estimation methods. An iterative channel estimation method based on turbo processing will also be considered.
