signal shaping for bicm at low snr
TRANSCRIPT
2396 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 4, APRIL 2013
Signal Shaping for BICM at Low SNRErik Agrell, Senior Member, IEEE, and Alex Alvarado, Member, IEEE
Abstract—The generalized mutual information (GMI) of bit-in-terleaved coded modulation (BICM) systems, sometimes called theBICM capacity, is investigated at low signal-to-noise ratio (SNR).The combinations of input alphabet, input distribution, and binarylabeling that achieve the Shannon limit are completelycharacterized. The main conclusion is that a BICM system withprobabilistic shaping achieves the Shannon limit at low SNR if andonly if it can be represented as a zero-mean linear projection of ahypercube. Hence, probabilistic shaping offers no extra degrees offreedom to optimize the low-SNR BICM-GMI, in addition to whatis provided by geometrical shaping. The analytical conclusions areconfirmed by numerical results, which also show that for a fixedinput alphabet, probabilistic shaping can improve the BICM-GMIin the low and medium SNR range.
Index Terms—Binary labeling, bit-interleaved codedmodulation(BICM), generalizedmutual information (GMI), Hadamard trans-form, probabilistic shaping, Shannon limit (SL), wideband regime.
I. INTRODUCTION
T HE most important breakthrough for coded modulation(CM) in fading channels came in 1992, when Zehavi
introduced the so-called bit-interleaved coded modulation(BICM) [1], usually referred to as a pragmatic approach forCM [2], [3]. Despite not being fully understood theoretically,BICM has been rapidly adopted in commercial systems suchas wireless and wired broadband access networks, 3G/4G tele-phony, and digital video broadcasting, making it the de factostandard for current telecommunications systems [3, Ch. 1].Signal shaping refers to the use of nonequally spaced and/or
nonequally likely symbols, i.e., geometrical shaping and prob-abilistic shaping, respectively. Signal shaping has been studiedduring many years, see [4] and [5] and references therein. In thecontext of BICM, geometrical shaping was studied in [6]–[8],and probabilistic shaping, i.e., varying the probabilities of thebit streams, was first proposed in [9] and [10] and developedfurther in [11]–[14]. Probabilistic shaping offers another degreeof freedom in the BICM design, which can be used to make thediscrete input distribution more similar to the optimal distribu-tion (which is in general unknown). This is particularly advan-tageous at low and medium SNR.
Manuscript received February 28, 2012; revised September 13, 2012;accepted November 07, 2012. Date of publication December 04, 2012; dateof current version March 13, 2013. This work was supported in part byThe British Academy and The Royal Society (via the Newton InternationalFellowship scheme), U.K., and in part by the European Community’s Seventh’sFramework Programme (FP7/2007-2013) under Grant 271986. This paper waspresented in part at the 2012 Information Theory and Applications Workshopand in part at the 2012 IEEE International Symposium on Information Theory.E. Agrell is with theDepartment of Signals and Systems, Chalmers University
of Technology, SE-41296 Göteborg, Sweden (e-mail: [email protected]).A. Alvarado is with theDepartment of Engineering, University of Cambridge,
Cambridge, CB2 1PZ, U.K. (e-mail: [email protected]).Communicated by R. F. H. Fischer, Associate Editor for Communications.Digital Object Identifier 10.1109/TIT.2012.2231900
For the additive white Gaussian noise (AWGN) channel, theso-called Shannon limit (SL) represents the averagebit energy-to-noise ratio needed to transmit information reliablywhen the signal-to-noise ratio (SNR) tends to zero [15], [16],i.e., in the wideband regime. When discrete input alphabets areconsidered at the transmitter and a BICM decoder is used at thereceiver, the SL is not always achieved as first noticed in [17].This was later shown to be caused by the selection of the binarylabeling [18]. The behavior of BICM in the wideband regimewas studied in [17]–[21] as a function of the alphabet andthe binary labeling , assuming a uniform input distribution.First-order optimal (FOO) constellations were defined in [21] asthe triplet that make a BICM system achieve the SL,where represents the input distribution.In this paper, the results of [21] are generalized to nonuni-
form input distributions and give a complete characterization ofFOO constellations for BICM in terms of . More par-ticularly, the geometrical and/or probabilistic shaping rules thatshould be applied to a constellation to make it FOO are found.The main conclusion is that probabilistic shaping offers no extradegrees of freedom in addition to what is provided by geomet-rical shaping for BICM in the wideband regime.
II. PRELIMINARIES
A. Notation
Bold italic letters denote row vectors. Block letters de-note matrices or sometimes column vectors. The identity ma-trix is . The inner product between two row vectors andis denoted by and their elementwise product by .The Euclidean norm of the vector is denoted by . Randomvariables are denoted by capital letters and random vectorsby boldface capital vectors . The probability density function(pdf) of the random vector is denoted by and the con-ditional pdf by . A similar notation applies to prob-ability mass functions of a random variable, which are denotedby and . Expectations are denoted by .The empty set is denoted by and the binary set by
. The negation of a bit is denoted by .Binary addition (exclusive-OR) of two bits and is denotedby . The same notation denotes the integer that resultsfrom taking the bitwise exclusive-or of two integers and .
B. System Model
We consider transmissions over a discrete-time memorylessvectorial fast fading channel. The received vector at any discretetime instant is
(1)
where is the channel input and is Gaussian noise with zeromean and variance in each dimension [1], [3, App. 2.A].The channel is represented by the -dimensional vector . It
0018-9448/$31.00 © 2012 IEEE
AGRELL AND ALVARADO: SIGNAL SHAPING FOR BICM AT LOW SNR 2397
Fig. 1. Generic BICM system, consisting of a BICM transmitter, the channel, and a BICM receiver.
contains the real fading coefficients , which are random, pos-sibly dependent, with the same pdf . We assume thatand are perfectly known at the receiver or can be perfectlyestimated, and that the technical requirements on and in[21, Sec. I-D] are satisfied.The conditional transition pdf of the channel in (1) is
(2)
The SNR is defined as
(3)
where is the average transmitted symbol energy,is the transmission rate in information bits per symbol, and
is the average received energy per informa-tion bit.The generic BICM scheme in Fig. 1 is considered. The trans-
mitter is, in the simplest case, a single binary encoder concate-nated with an interleaver and a memoryless mapper . Multipleencoders and/or interleavers may be needed to achieve prob-abilistic shaping [10]–[13]. At the receiver, using the channeloutput , the demapper computes metrics for the indi-vidual coded bits with , usually in the formof logarithmic likelihood ratios. These metrics are then passedto the deinterleaver(s) and decoder(s) to obtain an estimate ofthe information bits.The mapper is defined via the input alphabet
, where bits are used to indexthe symbols vectors for .We associate with each symbol the codeword (binarylabeling) and the probability
, where . The binary labeling isdenoted by and the inputdistribution by .In the following, the labeling used throughout this paper is
defined. This can be done without loss of generality, as will beexplained in Section II-C.
Definition 1 (Natural Binary Code): The natural binary code(NBC) is the binary labeling ,where denotes the base-2 rep-resentation of the integer , with beingthe most significant bit.This definition of the NBC is different from the one in [21].
The difference lies only in the bit ordering, i.e., in this paper,
we consider the last column of to contain the most sig-nificant bits of the base-2 representation of the integers
. It follows from Definition 1 that
(4)
for and , and
(5)
for , , and .
C. Probabilistic Shaping in BICM
Assuming independent, but possibly nonuniformly dis-tributed, bits at the input of the modulator (seeFig. 1), the symbol probabilities are given by [21, eq. (30)],[13, eq. (8)] [22, eq. (9)]
for , where for is the prob-ability of . Since , the dis-tribution is fully specified by the vector of bit probabilities
.Throughout this paper, we assume that for
all ; i.e., all constellation points are used with anonzero probability. This can be done without loss of generality,because if or for some , then half ofthe constellation points will never be transmitted. If this is thecase, the corresponding branches in Fig. 1 are removed, isreduced by one, and the mapper is redefined accordingly.1
The result is another BICM scheme with identical performance,which satisfies for all .For any constellation , a set of equivalent constella-
tions can be constructed by permuting the rows of , , and ,provided that the same permutation is applied to all three ma-trices. Specifically, denote the permutation that maps the NBCinto the desired labeling by , i.e., . The BICMsystem defined by the alphabet , the distribution , andthe labeling is entirely equivalent to the systemwith alphabet , distribution , and labeling . Without lossof generality, the analysis in this paper is therefore restricted tothe latter case.Based on the previous discussion, from now on, we use the
name constellation to denote the pair , where the NBClabeling is implicit. Thus, and for all and
1Constellations with for some can yield counter-intuitive results,such as Gray-labeled constellations being FOO (see [13], [14], and Example 7).
2398 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 4, APRIL 2013
, which simplifies the analysis. Note that cannot be chosenarbitrarily in BICM; only distributions that satisfy
(6)
for some vector of bit probabilities will be considered in thepaper. An important special case is the uniform distribution, forwhich and .
D. Hadamard Transform
The Hadamard transform (HT), or Walsh–Hadamard trans-form, is a discrete, linear, orthogonal transform, whose coeffi-cients take values in . It is popular in image processing [23]and can be used to analyze various aspects of binary labelingsin digital communications and source coding [21], [24]–[26].
Definition 2: The HT of a matrix (orvector) with rows is
(7)
where for all and
(8)
Because for , setting in(7)–(8) shows that the first HT vector
(9)
can be interpreted as the uniformly weighted mean of the al-phabet. This is a property that the HT shares with, e.g., the dis-crete Fourier transform.It can be shown from (8) that
(10)
for all and . Therefore,the inverse transform is identical to the forward transform, apartfrom a scale factor:
(11)
E. New Transform
In this section, we define a linear transform between vectorsor matrices, which depends on the input distribution via thebit probabilities . Its usage will become clear in Section III-C.
Definition 3: Given the bit probabilities, the transform
of a matrix (or vector) is
(12)
where is given by (6). The coefficients are defined as
(13)for all and , where the barsrepresent negation ( , see Section II-A).
Remark 1: For equally likely symbols, i.e., , thetransform becomes the identity operation , because then
for and for .
The transform coefficients are nonsymmetric in the sensethat in general . They have some appealing propertiesgiven by the following lemma, which will be used in the proofsof Theorems 3, 4, and 8.
Lemma 1: For any and
(14)
(15)
where is given by (6) and is defined in (8).Proof: See the Appendix.
We pay particular attention to two important special casesof (15). First, if , then andfor . Second, if for any integer
, then by (8), for anyand by (6)
Using first (5) and then (4), we obtain
Substituting these two cases ( and ) into (15) provesthe following corollary.
AGRELL AND ALVARADO: SIGNAL SHAPING FOR BICM AT LOW SNR 2399
Corollary 2: For any
(16)
(17)
The fact that the sums in (14) are zero when-ever , independently of the input distribution, implies thatthe coefficients form an orthogonal basis. As a consequence,the transform is invertible, as shown in the next theorem.
Theorem 3: The inverse transformof a matrix (or vector) is, given the bitprobabilities
(18)
Proof: For
Applying (14) and dividing both sides by , which bySection II-C is nonzero, completes the proof.
Example 1: If the bit probabilities are, then the symbol probabilities (6) are
. The transform coefficientsin (13) are the elements at row , column of
(19)
It is readily verified that , which is (14) in ma-trix notation. The mean values in each column of (19) are
, which in agreement with (16)are the square roots of the elements in . Similarly, it can beshown that in (19) satisfies (15) and (17).If the Gray-labeled 4-ary pulse amplitude modulation
(PAM) constellation is considered, .Rewriting (12) in matrix notation, the transform can be cal-culated as ,where . This nonequally spaced 4-PAM alphabetwill be illustrated and analyzed in Example 3. The inversetransform (18) can be written as . For auniform distribution, , which agrees withRemark 1.In Section IV-B, we will need to apply the HT and the new
transform after each other to the same alphabet. However, thetwo transforms do not commute, and the result will therefore
Fig. 2. Relations between the alphabet , its transform , and their respectiveHadamard transforms and . The transform matrices , , , and aredefined in Examples 1 and 2.
depend on in which order the transforms are applied. Of partic-ular interest for our analysis is the setup in Fig. 2, where andare related via the transform defined previously. Their HTs
and are however not related via the same transform. Instead,a relation between and can be established via the followingtheorem.
Theorem 4: If , then their HTs and satisfy
(20)
(21)
where
(22)
and a product over is defined as 1.Proof: See the Appendix.
Remark 2: The summation in (20) can be confined to ,because whenever , there exists at least one bit positionfor which . Analogously, the summation in (21)can be confined to .
Example 2: Expression (20) can be written as , or. The element at row , column of and are
given by (20)–(21) as, respectively,and .
With , and from Example 1, we obtainand
(23)which, as predicted by Remark 2, are upper triangular.Another relation between and can be deduced from Fig. 2.
Defining the Hadamard matrix as the matrix with elementsfor , , the HT relations (7) and (11)
yield and . Since from Example 1
2400 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 4, APRIL 2013
, we conclude that ,which implies that . Because
(see (10)) and , the inverse relationis . It is straightforward to verifythat and calculated in this manner, using the numericalvalues of and in Example 1, indeed yield (23).
III. BICM AT LOW SNR
A. Mutual Information
The mutual information (MI) in bits per channel use betweenthe random vectors and for an arbitrary channel parameterperfectly known at the receiver is defined as
where the expectation is taken over the joint pdf , andis given by (2).
The MI between and conditioned on the value of the thbit at the input of the modulator is defined as
where the expectation is taken over the joint pdf .
Definition 4 (BICMGeneralizedMI): The BICM generalizedMI (BICM-GMI) is defined as [2], [17], [18], [27]
(24)
where the second line follows by the chain rule. We will analyzethe right-hand side of (24) as a function of , for a given pdf .According to (3), can be varied in two ways, either by varyingfor a fixed constellation or, equivalently, by rescaling
the alphabet linearly for fixed and input distribution .Martinez et al. [27] recognized the BICM decoder in Fig. 1
as a mismatched decoder and showed that the BICM-GMI in(24) corresponds to an achievable rate of such a decoder. Thismeans that reliable transmission using a BICM system at rateis possible if . Since from (3) ,
the inequality gives2
(25)
2The definition of the related function in [21, eq. (37)] is erroneousand should read “ is bounded from below by , where
.”
for any . Focusing on the wideband regime, i.e., asymptoticallylow SNR, we make the following definition.
Definition 5 (Low-GMI Parameters): The low-GMI param-eters of a constellation are defined as , where
In the wideband regime, the average bit energy-to-noise rationeeded for reliable transmission is, using (25) and the definitionof , lower bounded by
(26)
Furthermore, since in the wideband regime[15], .
The first-order behavior of the BICM-GMI in (24) is fullydetermined by , which, as we shall see later (e.g., in (36)), inturn depends on and . This is why we designate this tripletas low-GMI parameters. The same definitions can be applied toother MI functions such as the CM-MI [21]. In this paper,however, we are only interested in the BICM-GMI.The main contributions of this paper are to characterize the
low-GMI parameters for arbitrary constellations, includingthose with nonuniform distributions (see Section III-C), and toidentify the set of constellations for BICM that maximize , i.e.,minimize in the wideband regime (see Section IV-B).
B. Low-GMI Parameters for Uniform Distributions
The low-GMI parameters have been analyzed indetail for arbitrary input alphabets under the assumption ofuniform probabilities [21]. Under this assumption, they can beexpressed as given by the following theorem.
Theorem 5: For a constellation , the low-GMI param-eters are
(27)
(28)
(29)
Proof: Expressions (27) and (28) follow directly fromDef-inition 5, while (29) was proved in [21, eq. (50)].
The low-GMI parameters can be conveniently expressed asfunctions of the HT of the alphabet , as shown in the fol-lowing theorem.
AGRELL AND ALVARADO: SIGNAL SHAPING FOR BICM AT LOW SNR 2401
Theorem 6: The low-GMI parameters can be expressed as
(30)
(31)
(32)
Proof: The expression (30) is obtained from (9), (31) from[21, eq. (16)], and (32) from [21, Th. 11].
C. Low-GMI Parameters for Nonuniform Distributions
The next theorem is analogous to Theorem 5 but applies toan arbitrary input distribution.
Theorem 7: For a constellation , the low-GMI param-eters are
(33)
(34)
(35)
Proof: Again, (33) and (34) follow from Definition 5,while (35) requires some analysis. It was shown in [21, Th. 10]that
(36)
Substituting (33) and writing the squared norms as the innerproducts of two identical vectors yields
The expression in brackets can be simplified as
which completes the proof of (35).
Theorem 7 shows that the low-GMI parameters depend on theinput alphabet , the binary labeling (via in the expressionfor ), and the input distribution (via and ). Whilethe low-GMI parameters of an alphabet with uniform prob-abilities are conveniently expressed in terms of its HT (seeTheorem 6), no similar expressions are known for the low-GMIparameters of a general constellation in (33)–(35). This has sofar prevented the analytic optimization of such constellations.The new transform introduced in Section II-E, however, solvesthis problem by establishing an equivalence between an arbi-trary constellation, possibly with nonuniform probabilities, andanother constellation with uniform probabilities.
Theorem 8: The low-GMI parameters of anyconstellation are equal to the low-GMI parameters of
.Proof: Let the low-GMI parameters of be denoted
by . First, (27) and (12) yield
(37)
Applying (16) to the inner sum in (37) reveals that
Second, (28) and (12) yield
Evaluating the inner sum using (14) gives
For the third and last part of the theorem, (29) yields
(38)
where is given by (12). The inner sum can be expanded as
2402 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 4, APRIL 2013
TABLE ILOW-GMI PARAMETERS AND FOO CONDITIONS FOR BICM USING UNIFORM AND NONUNIFORM INPUT DISTRIBUTIONS. THE RESULTS FOR ARE FROM
[21] (CF., THEOREMS 6 AND 9) AND THE ONES FOR OR ARE FROM THEOREMS 7, 8, AND 11
Applying (17) to the inner sum, we obtain
We take the inner product of this vector with itself and substitutethe obtained expression for the squared norm in (38). This yieldsafter rearranging terms
(39)
(40)
The square root in (39) is ifor 1 if . In both cases, it can be expressed as
(or, equivalently, ).Comparing this result with (35) gives (40).
Theorem 8 shows that the constellation can be mappedto another constellation with the same low-GMI pa-rameters, where is related to via (12) and (18). This rela-tion between and will be applied in Section IV-Bto prove Theorem 10, which is the main result of the paper.To summarize, Table I lists the low-GMI parameters for BICMgiven by Theorems 6 and 7. The equivalence of the parametersfor or comes from Theorem 8.
D. Numerical Examples
In this section, we show examples of how the transformdefined in Section II-E works and we also present equiva-lent constellations and . All results are for theAWGN channel. The (G)MIs are numerically evaluated usingGauss–Hermite quadratures following [28, Sec. III].
Example 3: Consider the equally spaced 16-ary squarequadrature amplitude modulation (16-QAM) alphabet
labeled by the binary reflected Gray code(BRGC) [29] with bit probabilities ,shown with black circles in Fig. 3. The input distribution
given by (6) is ,, and
. These symbol probabilities
Fig. 3. 16-QAM constellation (black circles) with bit probabili-ties . Each symbol is marked with its index, and its probability is proportional to the area of the corresponding circle.White circles represent the transformed constellation , which hasthe same low-GMI parameters.
are indicated in Fig. 3, where the area of the circle representingis proportional to the corresponding probability .Another alphabet is obtained by applying the trans-
form in (12) to the constellation . The white circlesin Fig. 3 represent the symbols in using a uniform distri-bution . The alphabet is still a rectangular 16-QAMconstellation, but a nonuniformly spaced one. Every row inFig. 3 can be regarded as a probabilistically shaped (black) or ageometrically shaped (white) 4-PAM constellation; in fact, thesame 4-PAM constellations as in Example 1.The low-GMI parameters of the two constellations
and , given by Theorems 7 and 5,respectively, are identical, as predicted by Theorem 8. Theseare
(41)
The BICM-GMI for the constellations andare shown in Fig. 4. In this figure, we also show
the capacity of the AWGN channel [21, eq. (22)], andthe CM-MI and BICM-GMI for 16-QAM using a uniforminput distribution, i.e., . The results show that theBICM-GMIs of the original constellation and thetransformed constellation are in general different;however, they converge in the low-SNR regime. The endpointsof the BICM-GMI curves for and are
AGRELL AND ALVARADO: SIGNAL SHAPING FOR BICM AT LOW SNR 2403
Fig. 4. BICM-GMI for probabilistically shaped 16-QAM and its transform.The CM-MI and BICM-GMI for uniform input distributions are also shown, asis the AWGN capacity . The white circle and square indicate (26) withgiven by (41) and by [21, eq. (55)], respectively.
Fig. 5. 8-PSK constellation with bit probabilities (black cir-cles) and the transformed constellation (white circles), which havethe same low-GMI parameters. The circle areas are proportional to the symbolprobabilities . Dashed lines join symbols whose labels differ in one bit only.
shown with a white circle, whose value follows from (41) and(26). The endpoint for the BICM-GMI curve for the constella-tion is shown with a white square [17, eq. (18)].
Example 4: Consider the NBC-labeled -ary phase-shiftkeying (PSK) alphabet , where
with . The con-stellation for for two input distributions are shown inFigs. 5 and 6, where again the circle areas are proportional to thesymbol probabilities . We denote these input distributions byand , which are generated by and
, respectively. The transforms are irregular,not resembling a PSK alphabet. Nevertheless, the low-GMI pa-rameters for pairs of constellations andare again equal. Particularly, for and
for .The BICM-GMI for the two 8-PSK constellations in Figs. 5
and 6 is shown in Fig. 7. Also shown are the BICM-GMI andCM-MI for 8-PSK with uniform input distributions, for which
[21, eq. (56)] and , respectively, andthe capacity of the AWGN channel. Again, the results show
Fig. 6. 8-PSK constellation with bit probabilities (black circles)and its transform (white circles).
Fig. 7. BICM-GMI of the two probabilistically shaped 8-PSK constellations inFigs. 5 and 6 and their transforms . The CM-MI and BICM-GMI foruniform input distributions are also shown. The white circles and square showthe endpoints .
that the BICM-GMIs of the original constellationand the transformed constellation are in general dif-ferent but converge in the low-SNR regime. The endpoints ofthe BICM-GMI curves are obtained from (26).
Example 5: Consider the eight-level star-shaped QAMalphabet shown in Fig. 8 (black circles), which we denoteby . This alphabet is used with bit probabilities
, giving an input symbol probability .In this figure, we also show the transformed constellation
, which according to Theorem 8 has the samelow-GMI parameters as . This can be appreciatedin Fig. 9, where the corresponding BICM-GMIs are shown.Fig. 9 also shows how probabilistic shaping improves theBICM-GMI considerably over a wide range of SNRs.The results in Figs. 4, 7, and 9 also show other inter-
esting properties of probabilistic shaping for BICM. In thehigh-SNR regime, the use of a nonuniform distribution resultsin a loss in GMI, i.e., the curves flatten out at a value below
, but for a wide range of moderately high SNR,the BICM-GMI is higher with probabilistic shaping. For the16-QAM alphabet in Example 3, in the medium SNR regime,
2404 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 4, APRIL 2013
Fig. 8. Star-shaped 8-QAM constellation with bit probabilities(black circles) and its transform (white circles).
Fig. 9. BICM-GMI for , , and . TheAWGN capacity is also shown. The white circle and square show the endpoints
with and 1.18.
the use of nonequally likely symbols even gives a larger GMIthan the CM-MI obtained with a uniform distribution.
IV. FOO CONSTELLATIONS
Having characterized the low-SNR behavior of theBICM-GMI of an arbitrary constellation, the next step isto search for optimal constellations in terms of the BICM-GMIat low SNR. The following definition formally defines BICMsystems that achieve the SL.
Definition 6 (FOO Constellation): The constellationis said to be FOO if a BICM system using achieves theSL , i.e., .As discussed in Section II-C, we have fixed the labeling to
be the NBC, and thus, an FOO constellation is fully character-ized by only two parameters, the input alphabet and its inputdistribution , where the latter satisfies (6). The analysis is nev-ertheless, without loss of generality, applicable to an arbitrarylabeling by permuting the constellation, see Section II-C.
A. FOO Constellations for Uniform Distributions
In this section, we review results on FOO constellations forBICM for uniform input distributions. The next theorem givesnecessary and sufficient conditions for an input alphabet to be
FOO if the binary labeling is the NBC and input distribution isuniform.
Theorem 9: The constellation is FOO if and only if
(42)
Proof: From (31) and (32), if and only if
which gives (42).
This theorem was given in [21, Th.12], where it was usedto find FOO constellations for BICM when . It offersan appealing intuitive geometrical interpretation: An input al-phabet is FOO for a uniform input distribution if and only if itis a zero-mean linear projection of a hypercube. This behavioris illustrated in Example 6 and also in [21, Fig. 4].
B. FOO Constellations for Nonuniform Distributions
In this section, we derive necessary and sufficient conditionsfor a BICM system, with an arbitrary input alphabet and proba-bility distribution, to achieve the SL, i.e., we find FOO constel-lations for BICM. The conditions are derived by transformingan arbitrary constellation into another constellation with uni-form probabilities using Theorem 8 and applying Theorem 9to this transformed constellation. Since Theorem 9 is expressedin terms of the HT, a relation between the HTs of and isneeded, which is illustrated by the bottom arrow in Fig. 2. Sucha relation is provided by Theorem 4 and will be applied in theproofs of Theorems 10 and 11.
Theorem 10: The constellation is FOO if and only ifthe HT of satisfies both the following conditions:
(43)
(44)
Proof: We will prove the theorem in two steps. First, weprove the “if” part by showing that (43) and (44) imply that
is FOO. Second, we prove the “only if” part by showingthat if is FOO, then (43) and (44) hold.For the “if” part, suppose that (43) and (44) hold for a given
constellation . Applying (44) in (20) yields for the HTof
(45)Since the bits for , the first product in(45) is always zero, except when . (Recall that a productover in (20) is defined as 1.) Furthermore, because of (4), thesecond product in (45) is zero whenever for someinteger . We can therefore identify three cases for (45). First
(46)
AGRELL AND ALVARADO: SIGNAL SHAPING FOR BICM AT LOW SNR 2405
Second, yields
(47)
because of (43). And third, letting for an integer
(48)
When , the product in (48) includes two factors,and . For , and the whole product is0. Therefore, only contributes to the sum in (48). When
, the product in (48) is again over and (48) becomes
(49)
Combining the three cases (46), (47), and (49) yields
otherwise.
Using Theorem 9, we conclude that the constellation isFOO. Finally, Theorem 8 implies that is also FOO, whichcompletes the proof of the “if” part.For the “only if” part, assume that is FOO. By Theorem
8, is also FOO, and by Theorem 9, for any thatis not a power of two, where is the HT of . We will nowuse Theorem 4 to translate the condition on into conditions on.If for , then the summation over
in (21) can be reduced to a summation overfor
(50)
Due to (4), the product in (50) is nonzero only if the product isover or over the single-element set , i.e., ifor for some integer , respectively. Again, three casescan be identified. First, if , then theproduct in (50) includes at least one for every and
(51)
Second, for , the product comprises only one factor, ,and
(52)
And third, setting
(53)
As explained after (48), the product in (53) is 1 if (productover ) and 0 otherwise. Thus, the summation in (53) can bereduced to just one term, , and
(54)
Combining the three cases (51), (52), and (54) yields
otherwise
which satisfies (43) and (44). This completes the proof of the“only if” part.
Remark 3: Only (43) depends on the input distribution, not(44). In view of Theorem 9, the only difference between FOOconstellations with uniform and nonuniform distributions liesin . The final theorem gives this statement a more intuitiveinterpretation.
Theorem 11: The constellation is FOO if and only ifboth the following conditions hold:
(55)
(56)
Proof: We wish to prove that if (44) (or equivalently (56))holds, then (43) and (55) are equivalent.For any constellation , the mean , where is the
HT of . This follows from Theorem 8 and (30). The meancan be calculated by letting in (20) and (22) as
(57)
where .In this theorem, we are only interested in constellations that
satisfy (44) or equivalently (56). For such constellations, thesum in (57) includes at most nonzero terms, namely
This expression makes (43) and (55) equivalent.
Based on Theorem 9, the result in Theorem 11 can be under-stood as follows. If a constellation with a uniform input distribu-tion is FOO, it will still be FOO for any other input distributionprovided that the input alphabet is translated to be zero mean.In view of the geometrical interpretation of Theorem 9 given in[21, Th. 12], the result in Theorem 11 also states that a constella-tion is FOO if and only if its input alphabet is a linear projectionof a hypercube and it has zero mean.We also note that the zero-mean condition in Theorem 11 is
the same that guarantees FOO for the CM-MI [21, Footnote12]3. This implies that the only difference between FOO con-stellations for the CM-MI and the BICM-GMI lies in the extra
3The parameter for the CM-MI is [21, Th. 7] .
2406 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 4, APRIL 2013
constraint on the input alphabet to be a linear projection of ahypercube.
C. Numerical Examples
In this section, we give numerical examples to illustrate theanalytical results presented in this paper.
Example 6: Consider the so-called 8-AMPM alphabet [30]
(58)
which corresponds to a projected hypercube. The constellationwas shown to be FOO for in [21, Ex-
ample 4]. In view of Theorem 11, it is FOO for any if it haszero mean. Using (58) and (6) in (33), we find (after some al-gebra) that
(59)
For example, the bit probabilitiesgive an input distribution for which the mean (59) is
. We define another alphabetby subtracting from each element in . Fig. 10shows the translated constellation along with
, where is the transform offor the distribution . They are both zero-mean projectedhypercubes and thus FOO according to Theorem 11. This canbe observed from the BICM-GMI curve for inFig. 12.The exemplified method holds in full generality: Any al-
phabet that is FOO with a uniform input distribution is FOOalso with an arbitrary nonuniform distribution, if it is translatedto zero mean. Furthermore, all nonuniform FOO constellationscan be constructed in this manner.For certain distributions, the mean (59) is zero without trans-
lation. Specifically, if and only if
(60)
Clearly, the uniform caseanalyzed in [21] fulfills (60). More interestingly, when any othervector of bit probabilities fulfilling (60) is used, the resultingconstellation will be FOO. This is the case for instance with
. The obtained constellation, denoted by, is illustrated in Fig. 11 along with its transform. Graphically, both alphabets look like cubes, al-
though viewed from different angles, which is precisely whatTheorems 9 and 11 predict.In Fig. 12, we show the BICM-GMI for the zero-mean
constellations and as well asfor the constellations and . Asexpected, all the GMIs converge at the SL for low SNR. Forthese two cases, the transformed alphabets with uniform inputdistributions give larger GMI for all SNRs compared to thecorresponding nonuniform ones. This is however not alwaysthe case, cf. Fig. 7 with .
Example 7: -PAM alphabets have been shown to be FOOif the NBC is used with a uniform input distribution, i.e., the
Fig. 10. FOO constellations (black circles) and(white circles).
Fig. 11. Constellations (black circles) and(white circles). They both have zero mean and look like cubes, which indicatesthat they are FOO.
Fig. 12. BICM-GMI for the four FOO constellations in Figs. 10 and 11. TheSL is shown with a white circle.
constellation withis FOO [18], [21, Th. 14]. In this example, we
study the first-order behavior of the 8-PAM alphabet, where the order of the points repre-
sents the BRGC. The constellation is known not tobe FOO for [17, Th. 3].The BICM-GMI of is shown in Fig. 13 for the set of
bit probabilities for different values of . For, the uniform distribution is obtained. As decreases,
the Gray-labeled constellation approaches a zero-mean binaryalphabet, which is FOO [13, Fig. 2] [21, Fig. 3 (b)]. Fig. 13illustrates the tradeoff between the low- and high-SNR regimes:
AGRELL AND ALVARADO: SIGNAL SHAPING FOR BICM AT LOW SNR 2407
Fig. 13. BICM-GMI for the 8-PAM alphabet labeled by the BRGCwithuniform input distribution and bit probabilities for differentvalues of . The constellations approach a binary FOO constellation as. The NBC-labeled 8-PAM alphabet , although FOO with a uniformdistribution, is considerably weaker than for a wide range of SNRs.
The SL can be approached by decreasing , but this causes adecrease in GMI in the high-SNR regime. Alternatively, the SLcan be attained by switching from the BRGC to the NBC, butthis also comes with a heavy penalty at higher SNRs.
V. CONCLUSION
There exists a closed-form mapping between any probabilis-tically shaped constellation and a constellation with uniforminput distribution, such that the two systems have the samelow-SNR first-order behavior (Definition 3 and Theorem 8).Thus, the combination of probabilistic and geometric shapingis equivalent to pure geometric shaping at low SNR.We are particularly interested in BICM systems that attain the
SL at asymptotically low SNR, i.e., in FOO constel-lations. Somewhat disappointingly, the set of probabilisticallyshaped FOO constellation is no larger than the set of FOO con-stellations with uniform distributions, disregarding translationsof the whole input alphabet. Both sets can be fully characterizedas the set of linear projections of a hypercube, translated to havezero mean for the considered input distribution (Theorems 9 and10; cf. Figs. 10 and 11). Although non-FOO constellations forBICM can be improved by probabilistic shaping (see Fig. 13),it is impossible to make them FOO except in degenerate cases(by setting some probabilities equal to zero).
APPENDIXPROPERTIES OF THE TRANSFORM
In this Appendix, some theoretical properties of the newtransform defined in Section II-E are proved. In the first section,the “sum-product lemma” is established, which will be usedextensively throughout the appendix. In the following twosections, Lemma 1 and Theorem 4 are proved.
1) Sum–Product Lemma: Many properties of NBC-la-beled constellations and their transforms can be expressed as
the sum of products of certain functions, where each functiondepends on one bit position. Such expressions, which occurfrequently in the two subsequent sections, can be resolvedusing this general lemma.
Lemma 12: Let for and beany real numbers. Then
Proof: A summation over is equivalentto sums over , where and
. With this notation, and
2) Transform Coefficients: Lemma 1, which lists two fun-damental properties of the transform coefficients , was givenin Section II-E.
Proof of Lemma 1: The two parts of Lemma 1 will now beproved separately. First, the definition of in (13) yields
2408 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 4, APRIL 2013
where the last equality follows by repeatedly using the identitiesand for . Lemma 12 now
yields
(61)
where the last step follows because .The factors in (61) are either 2 or 0, depending on whether
or for the particular bit position . Thus,is either or 0, depending on whether
and have all bits equal or not. This completes the proof of(14).To prove the second part of Lemma 1, which is (15), we ob-
serve from (8) and (13) that
(62)
where
Intending to apply Lemma 12 to (62), we first calculate the quan-tity
for . For reasons that will soon become clear,we extract a common factor from all terms, ob-taining
The coefficient in front of is 2 if and 0otherwise. Similarly, the coefficient in front of is 0 if
and 2 otherwise. Thus, for every , the expressiondepends on either or but not both, namely
(63)
according to (5).Using (63), we are now ready to apply Lemma 12 to (62),
which yields
Applying (6) and (8), we obtain finally
which completes the proof of (15).
3) Two Consecutive Transforms: Theorem 4, which char-acterizes the vectors obtained by applying the new transformand the HT sequentially to a given alphabet, was given inSection II-E.
Proof of Theorem 4: To prove (20), we first write as afunction of using (7) and (12), as illustrated in Fig. 2. For
using (15) for the last equality. To obtain as a function of ,we express in terms of its inverse HT in (11), to obtain
(64)
where
AGRELL AND ALVARADO: SIGNAL SHAPING FOR BICM AT LOW SNR 2409
The coefficient can be expressed using (8) and (5) as
(65)
(66)
where to pass from (65) to (66), we used Lemma 12 with. Depending
on the value of , this product can be partitioned into twosubproducts
(67)
making use of defined in (22). Since the two factors in (67)depend on as
they can be expanded into four factors as
(68)
The two products of ones can be omitted. The product of zerosmay look strange but is perfectly legitimate; its value is by def-inition 1 if the set is empty and 0otherwise. Furthermore, since for allin this set, the second and third factors of (68) can be merged
into
(69)
which together with (64) completes the proof of (20).To prove the reverse relationship (21), we first multiply both
sides of (64) with and sum over to obtain
(70)
The Hadamard coefficient can be factorized using (8)with , as
(71)
Using (71) in (70) and replacing the ratios andwith their factorizations according to (69), we obtain for theinner sum on the right-hand side of (70)
(72)
where
The fourth case is always 0, because eitheror is 0. Furthermore, the second and thirdcases can be combined into
(73)We wish to apply Lemma 12 to the right-hand side of (72). In
order to do so, we first calculate from (73)
(74)
Now, by Lemma 12, (72) can be expressed as
By (74), this product will be nonzero only if and match inall bit positions , i.e., if . Thus, againutilizing (71)
(75)
2410 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 4, APRIL 2013
Finally, we combine (70) and (75) into
(76)
Dividing both sides by yields on the left-hand side thecoefficient , which can be expressedusing (8) and (69) as
This expression, substituted into (76), completes the proof of(21).
REFERENCES
[1] E. Zehavi, “8-PSK trellis codes for a Rayleigh channel,” IEEE Trans.Commun., vol. 40, no. 3, pp. 873–884, May 1992.
[2] G. Caire, G. Taricco, and E. Biglieri, “Bit-interleaved coded modula-tion,” IEEE Trans. Inf. Theory, vol. 44, no. 3, pp. 927–946, May 1998.
[3] A. Guillén i Fàbregas, A. Martinez, and G. Caire, “Bit-interleavedcoded modulation,” Found. Trends Commun. Inf. Theory, vol. 5, no.1–2, pp. 1–153, 2008.
[4] A. R. Calderbank and L. H. Ozarow, “Nonequiprobable signaling onthe Gaussian channel,” IEEE Trans. Inf. Theory, vol. 36, no. 4, pp.726–740, Jul. 1990.
[5] R. F. H. Fischer, Precoding and Signal Shaping for Digital Transmis-sion. New York: Wiley, 2002.
[6] D. Sommer and G. P. Fettweis, “Signal shaping by non-uniform QAMfor AWGN channels and applications using turbo coding,” in Proc. Int.ITG Conf. Source Channel Coding, Munich, Germany, Jan. 2000, pp.81–86.
[7] S. Y. Le Goff, “Signal constellations for bit-interleaved coded modula-tion,” IEEE Trans. Inf. Theory, vol. 49, no. 1, pp. 307–313, Jan. 2003.
[8] M. Barsoum, C. Jones, and M. Fitz, “Constellation design via capacitymaximization,” inProc. IEEE Int. Symp. Inf. Theory, Nice, France, Jun.2007, pp. 1821–1825.
[9] S. Y. Le Goff, B. S. Sharif, and S. A. Jimaa, “A new bit-interleavedcoded modulation scheme using shaping coding,” in Proc. IEEEGlobal Telecommun. Conf., Dallas, TX, Nov.–Dec. 2004.
[10] D. Raphaeli and A. Gurevitz, “Constellation shaping for pragmaticturbo-coded modulation with high spectral efficiency,” IEEE Trans.Commun., vol. 52, no. 3, pp. 341–345, Mar. 2004.
[11] S. Y. Le Goff, B. S. Sharif, and S. A. Jimaa, “Bit-interleaved turbo-coded modulation using shaping coding,” IEEE Commun. Lett., vol. 9,no. 3, pp. 246–248, Mar. 2005.
[12] S. Y. Le Goff, B. K. Khoo, C. C. Tsimenidis, and B. S. Sharif, “Con-stellation shaping for bandwidth-efficient turbo-coded modulation withiterative receiver,” IEEE Trans. Wireless Commun., vol. 6, no. 6, pp.2223–2233, Jun. 2007.
[13] A. Guillén i Fàbregas and A. Martinez, “Bit-interleaved coded mod-ulation with shaping,” in Proc. IEEE Inf. Theory Workshop, Dublin,Ireland, Aug.–Sep. 2010.
[14] L. Peng, A. Guillén i Fàbregas, and A. Martinez, “Mismatched shapingschemes for bit-interleaved coded modulation,” in Proc. IEEE Int.Symp. Inf. Theory, Cambridge, MA, Jul. 2012, pp. 2416–2420.
[15] S. Verdú, “Spectral efficiency in the wideband regime,” IEEE Trans.Inf. Theory, vol. 48, no. 6, pp. 1319–1343, Jun. 2002.
[16] V. V. Prelov and S. Verdú, “Second-order asymptotics of mutual infor-mation,” IEEE Trans. Inf. Theory, vol. 50, no. 8, pp. 1567–1580, Aug.2004.
[17] A. Martinez, A. Guillén i Fàbregas, and G. Caire, “Bit-interleavedcoded modulation in the wideband regime,” IEEE Trans. Inf. Theory,vol. 54, no. 12, pp. 5447–5455, Dec. 2008.
[18] C. Stierstorfer and R. F. H. Fischer, “Asymptotically optimal mappingsfor BICM with -PAM and -QAM,” IET Electron. Lett., vol. 45,no. 3, pp. 173–174, Jan. 2009.
[19] A. Alvarado, E. Agrell, A. Guillén i Fàbregas, and A. Martinez,“Corrections to ‘Bit-interleaved coded modulation in the widebandregime’,” IEEE Trans. Inf. Theory, vol. 56, no. 12, p. 6513, Dec. 2010.
[20] C. Stierstorfer and R. F. H. Fischer, “Mappings for BICM in UWBscenarios,” in Proc. Int. ITG Conf. Source Channel Coding, Ulm, Ger-many, Jan. 2008.
[21] E. Agrell and A. Alvarado, “Optimal alphabets and binary labelingsfor BICM at low SNR,” IEEE Trans. Inf. Theory, vol. 57, no. 10, pp.6650–6672, Oct. 2011.
[22] T. Nguyen and L. Lampe, “Bit-interleaved coded modulation with mis-matched decoding metrics,” IEEE Trans. Commun., vol. 59, no. 2, pp.437–447, Feb. 2011.
[23] W. K. Pratt, J. Kane, and H. C. Andrews, “Hadamard transform imagecoding,” Proc. IEEE, vol. 57, no. 1, pp. 58–72, Jan. 1969.
[24] R. Calderbank and J. E. Mazo, “A new description of trellis codes,”IEEE Trans. Inf. Theory, vol. IT-30, no. 6, pp. 784–791, Nov. 1984.
[25] P. Knagenhjelm and E. Agrell, “The Hadamard transform—A toolfor index assignment,” IEEE Trans. Inf. Theory, vol. 42, no. 4, pp.1139–1151, Jul. 1996.
[26] F. A. Khan, E. Agrell, and M. Karlsson, “Electronic dispersion com-pensation by Hadamard transformation,” inProc. Opt. Fiber Conf., SanDiego, CA, Mar. 2010, paper OWV4.
[27] A. Martinez, A. Guillén i Fàbregas, and G. Caire, “Bit-interleavedcoded modulation revisited: A mismatched decoding perspective,”IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. 2756–2765, Jun. 2009.
[28] A. Alvarado, F. Brännström, and E. Agrell, “High SNR bounds for theBICM capacity,” in Proc. IEEE Inf. Theory Workshop, Paraty, Brazil,Oct. 2011, pp. 360–364.
[29] E. Agrell, J. Lassing, E. G. Ström, and T. Ottosson, “On the optimalityof the binary reflected Gray code,” IEEE Trans. Inf. Theory, vol. 50,no. 12, pp. 3170–3182, Dec. 2004.
[30] G. Ungerboeck, “Channel coding with multilevel/phase signals,” IEEETrans. Inf. Theory, vol. 28, no. 1, pp. 55–67, Jan. 1982.
Erik Agrell (M’99–SM’02) received the Ph.D. degree in information theory in1997 from Chalmers University of Technology, Sweden.From 1997 to 1999, he was a Postdoctoral Researcher with the University
of Illinois at Urbana-Champaign and the University of California, San Diego.In 1999, he joined the faculty of Chalmers University of Technology, firstas an Associate Professor and since 2009 as a Professor in CommunicationSystems. In 2010, he cofounded the Fiber-Optic Communications ResearchCenter (FORCE) at Chalmers, where he leads the signals and systems researcharea. His research interests belong to the fields of information theory, codingtheory, and digital communications, and his favorite applications are foundin optical communications. More specifically, his current research includespolarization-multiplexed coding and modulation, optical intensity modulation,nonlinear channel capacity, cognitive radio, bit-interleaved coded modulationand multilevel coding, bit-to-symbol mappings in coded and uncoded systems,lattice theory and sphere decoding, and multidimensional geometry.Prof. Agrell served as Publications Editor for the IEEE TRANSACTIONS ON
INFORMATION THEORY from 1999 to 2002 and is an Associate Editor for theIEEE TRANSACTIONS ON COMMUNICATIONS since 2012. He is a recipient ofthe 1990 John Ericsson Medal, the 2009 ITW Best Poster Award, and the 2011GlobeCom Best Paper Award.
Alex Alvarado (S’06–M’11) was born in 1982 in Quellón, on the island ofChiloé, Chile. He received his Electronics Engineer degree (Ingeniero CivilElectrónico) and his M.Sc. degree (Magíster en Ciencias de la Ingeniería Elec-trónica) from Universidad Técnica Federico Santa María, Valparaíso, Chile,in 2003 and 2005, respectively. He obtained the degree of Licentiate of En-gineering (Teknologie Licentiatexamen) in 2008 and his PhD degree in 2011,both of them from Chalmers University of Technology, Göteborg, Sweden.Since 2006, he has been involved in a research collaboration with the In-
stitut National de la Recherche Scientifique (INRS), Montreal, QC, Canada, onthe analysis and design of BICM systems. In 2008, he was holder of the MeritScholarship Program for Foreign Students, granted by the Ministère de l’Édu-cation, du Loisir et du Sports du Québec. From 2011 to 2012, he was a NewtonInternational Fellow at the University of Cambridge, United Kingdom, fundedby The British Academy and The Royal Society. He is currently a Marie CurieIntra-European Fellow at the same institution. His research interests are in theareas of digital communications, coding, and information theory.