SUBMITTED TO IEEE WIRELESS COMMUNICATIONS LETTERS, VOL. XX, NO. X, JUNE 2021 1
Opening the Black Box of Deep Neural Networks in Physical Layer Communication
Jun Liu, Kai Mei, Dongtang Ma, Senior Member, IEEE and Jibo Wei, Member, IEEE
Abstract—Deep neural network (DNN)-based physical layer techniques are attracting considerable interest due to their potential to enhance communication systems. However, most studies in the physical layer have focused on applying DNN models to wireless communication problems rather than on theoretically understanding how a DNN works in a communication system. In this letter, we aim to quantitatively analyse why DNNs can achieve performance in the physical layer comparable with traditional techniques, and what this costs in terms of computational complexity. We further investigate, and experimentally validate, how information flows in a DNN-based communication system under information-theoretic concepts.
Index Terms—Deep neural network (DNN), physical layer communication, information theory.
I. INTRODUCTION
DEEP neural networks (DNNs) have recently drawn a lot of attention as a powerful tool for science and engineering problems such as protein structure prediction, image recognition, speech recognition and natural language processing, which are virtually impossible to formulate explicitly. Although the mathematical theory of communication systems has developed dramatically since Claude Elwood Shannon's monograph "A mathematical theory of communication" [1] provided the foundation of digital communication, the wireless channel-related gap between theory and practice motivates researchers to implement DNNs in existing physical layer communication. In order to mitigate the gap, a natural thought is to let a DNN jointly optimize a transmitter and a receiver for a given channel model, without being limited to component-wise optimization. In [2], a purely data-driven end-to-end communication system is proposed to jointly optimize transmitter and receiver components. The authors then consider the linear and nonlinear steps of processing the received signal as a radio transformer network (RTN), which can be integrated into the end-to-end training process. The ideas of end-to-end learning of a communication system and an RTN through DNNs are extended to orthogonal frequency division multiplexing (OFDM) in [3]. Another natural idea is to recover channel state information (CSI) and estimate the channel as accurately as possible by implementing a DNN, so that the effects of fading can be
Manuscript received June 2, 2021; revised X X, 2021; accepted XX, 2021. This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants 61931020, 61372099 and 61601480. (Corresponding author: Jun Liu.)
Jun Liu, Kai Mei, Dongtang Ma, and Jibo Wei are with the College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China (E-mail: {liujun15, meikai11, dongtangma, wjbhw}@nudt.edu.cn).
reduced. The authors of [4] propose an end-to-end DNN-based CSI compression feedback and recovery mechanism, which is further extended with long short-term memory (LSTM) in [5]. In [6], a residual learning based DNN designed for OFDM channel estimation is introduced. Furthermore, in order to mitigate disturbances beyond Gaussian noise, such as channel fading and nonlinear distortion, [7] proposes an online fully complex extreme learning machine-based symbol detection scheme.
Compared with traditional physical layer communication systems, the above-mentioned DNN-based techniques show competitive performance. However, what has been missing is an understanding of the dynamics behind the DNN in physical layer communication.
In this paper, we first attempt to give a mathematical explanation revealing the mechanism of end-to-end DNN-based communication systems. Then, we try to unveil the role of DNNs in the tasks of CSI recovery, channel estimation and symbol detection. We believe that we have developed a concise way to open, as well as understand, the "black box" of DNNs in physical layer communication. To summarize, the main contributions of this paper are twofold:
• Instead of proposing a scheme combining a DNN with a typical communication system, we analyse the behaviours of a DNN-based communication system from the perspectives of the whole DNN (communication system), the encoder (transmitter) and the decoder (receiver). Our simulation results verify that the constellations produced by autoencoders are equivalent to the (locally) optimum constellations obtained by the gradient-search algorithm, which minimize the asymptotic probability of error in Gaussian noise under an average power constraint.
• We consider the tasks of CSI recovery, channel estimation and symbol detection as a typical inference problem. The information flow in the DNNs for these tasks is estimated by using the matrix-based functional of Rényi's α-entropy to approximate Shannon's entropy.
The remainder of this paper is organized as follows. In Section II, we give the system model and formulate the problem. Next, simulation results are presented in Section III. Finally, conclusions are drawn in Section IV.
Notations: The notations adopted in this paper are as follows. We use boldface lowercase letters x, capital letters X and calligraphic letters 𝒳 to denote column vectors, matrices and sets, respectively. In addition, ⊙ and E{·} denote the Hadamard product and the expectation operation, respectively.
arXiv:2106.01124v2 [eess.SP] 6 Jun 2021
Fig. 1. Schematic diagram of a general communication system and its corresponding autoencoder representation.
II. SYSTEM MODEL AND PROBLEM FORMULATION
In this section, we first describe the considered system model and then provide a detailed explanation of the problem formulation from three different perspectives.
A. System Model
As shown in the upper part of Fig. 1, let us consider the process of message transmission from the perspectives of a typical communication system and an autoencoder, respectively. We assume that an information source generates a sequence of k-bit message symbols s ∈ {1, 2, ···, M} to be communicated to the destination, where M = 2^k. The modulation modules inside the transmitter then map each symbol s to a signal x ∈ R^N, where N denotes the dimension of the signal space. The signal alphabet is denoted by x_1, x_2, ···, x_M. During channel transmission, the N-dimensional signal x is corrupted to y ∈ R^N with conditional probability density function (PDF) p(y|x) = ∏_{n=1}^{N} p(y_n|x_n). In this paper, we use N/2 bandpass channels, each with separately modulated in-phase and quadrature components, to transmit the N-dimensional signal [8]. Finally, the received signal is mapped by the demodulation module inside the receiver to ŝ, which is an estimate of the transmitted symbol s. The procedures mentioned above have been exhaustively presented by Shannon.
From the point of view of filtering or signal inference, the idea of an autoencoder-based communication system matches Norbert Wiener's perspective [9]. As shown in the lower part of Fig. 1, the autoencoder consists of an encoder and a decoder, each of which is a feedforward neural network (NN) with parameters (weights) θ_f and θ_g, respectively. Note that each symbol s from the information source usually needs to be encoded as a one-hot vector s ∈ R^M before being fed into the encoder. Under a given constraint (e.g., an average signal power constraint), the PDF of a wireless channel and a loss function minimizing the symbol error probability, the encoder and decoder are able to learn, respectively, to appropriately represent s as z = f_{θ_f}(s) and to map the corrupted signal v to an estimate of the transmitted symbol ŝ = g_{θ_g}(v), where z, v ∈ R^N. Here, we use z_1, z_2, ···, z_M to denote the transmitted signals from the encoder, in order to distinguish them from the transmitted signals of the transmitter.
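To make the notation concrete, the chain s → one-hot s → z = f_{θ_f}(s) → channel → v → ŝ can be sketched in a few lines of numpy. Everything below (the random linear encoder standing in for a trained f_{θ_f}, the nearest-neighbour rule standing in for g_{θ_g}, and the noise level) is an illustrative assumption, not the trained networks themselves:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 16, 2                     # alphabet size and signal-space dimension

def one_hot(s, M):
    """Encode symbol index s in {0, ..., M-1} as a length-M one-hot vector."""
    v = np.zeros(M)
    v[s] = 1.0
    return v

# Toy linear "encoder" f: one-hot -> R^N (random weights stand in for theta_f).
W_f = rng.normal(size=(N, M))

def encode(s):
    z = W_f @ one_hot(s, M)
    # Enforce the average power constraint E||z||^2 <= P_av = 1/N per symbol.
    return z / np.linalg.norm(z) * np.sqrt(1.0 / N)

def channel(z, n0=0.05):
    """AWGN channel: v = z + noise."""
    return z + rng.normal(scale=np.sqrt(n0), size=z.shape)

# Toy "decoder" g: nearest constellation point.
Z = np.stack([encode(s) for s in range(M)])     # the signal set z_1, ..., z_M

def decode(v):
    return int(np.argmin(np.linalg.norm(Z - v, axis=1)))

s = 5
s_hat = decode(channel(encode(s)))              # estimate of the transmitted symbol
```

With a real autoencoder, encode and decode would be trained jointly; here they only fix the shapes and the power normalization used throughout Section II.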
B. Understanding Autoencoders on Message Transmission
From the perspective of the whole autoencoder (communication system), the aim is to transmit information to the destination with low error probability. The symbol error probability, i.e., the probability that the wireless channel has shifted a signal point into another signal's decision region, is

P_e = \frac{1}{M} \sum_{i=1}^{M} \Pr(\hat{\mathbf{s}} \neq \mathbf{s}_i \mid \mathbf{s}_i \text{ transmitted}).  (1)
The autoencoder can use the cross-entropy loss function

\mathcal{L}_{\log}(\hat{\mathbf{s}}, \mathbf{s}; \boldsymbol{\theta}_f, \boldsymbol{\theta}_g) = -\frac{1}{B} \sum_{i=1}^{B} \sum_{m=1}^{M} \mathbf{s}^{(i)}[m] \log\left(\hat{\mathbf{s}}^{(i)}[m]\right) = -\frac{1}{B} \sum_{i=1}^{B} \log\left(\hat{\mathbf{s}}^{(i)}[s]\right)  (2)
to represent the price paid for inaccuracy of prediction, where s^{(i)}[m] denotes the m-th element of the i-th symbol in a training set with B symbols. In order to train the autoencoder to minimize the symbol error probability, the optimal parameters can be found by optimizing the loss function

(\boldsymbol{\theta}_f^*, \boldsymbol{\theta}_g^*) = \arg\min_{(\boldsymbol{\theta}_f, \boldsymbol{\theta}_g)} \left[\mathcal{L}_{\log}(\hat{\mathbf{s}}, \mathbf{s}; \boldsymbol{\theta}_f, \boldsymbol{\theta}_g)\right] \quad \text{subject to } \mathbb{E}\left[\|\mathbf{z}\|_2^2\right] \leq P_{\mathrm{av}}  (3)
where P_av denotes the average power. In this paper, we set P_av = 1/N. A natural question is what the mappings z = f_{θ_f}(s) look like after training is done.
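A compact numpy sketch of the optimization (3), under simplifying assumptions made for illustration only: a linear encoder whose columns are the constellation points, a linear-softmax decoder, an AWGN channel, and a projected-gradient step that renormalizes the constellation to the power constraint after every update. Sizes, learning rate and noise level are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, n0, lr, B = 8, 2, 0.02, 0.1, 256

W_f = rng.normal(size=(N, M)) * 0.1   # encoder: columns are constellation points z_i
W_g = rng.normal(size=(M, N)) * 0.1   # decoder: R^N -> M class logits

def project_power(W, p_av=0.5):
    """Rescale so the average constellation power meets E||z||^2 = P_av = 1/N."""
    return W * np.sqrt(p_av * W.shape[1] / np.sum(W ** 2))

W_f = project_power(W_f)
for step in range(2000):
    s = rng.integers(0, M, size=B)
    Y = np.eye(M)[s]                                       # one-hot targets (B, M)
    Z = W_f[:, s].T                                        # encoded signals (B, N)
    V = Z + rng.normal(scale=np.sqrt(n0), size=Z.shape)    # AWGN channel
    logits = V @ W_g.T
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)                      # softmax outputs s_hat

    # Gradient of the cross-entropy loss (2), backpropagated through both maps.
    dlogits = (P - Y) / B
    dV = dlogits @ W_g
    W_g -= lr * (dlogits.T @ V)
    np.add.at(W_f.T, s, -lr * dV)     # accumulate into the columns that were used
    W_f = project_power(W_f)          # projected-gradient step for the constraint

# Noiseless accuracy of the learned encoder/decoder pair on the last batch.
acc = np.mean(np.argmax(W_f[:, s].T @ W_g.T, axis=1) == s)
```

Renormalizing after each step is one simple way to respect the constraint in (3); an alternative is a normalization layer inside the encoder, as in [2].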
C. Encoder: Finding a Good Representation
Let us pay attention to the encoder (transmitter). In the domain of communication, an encoder needs to learn a robust representation z = f_{θ_f}(s) to transmit s against channel disturbances such as thermal noise, channel fading, nonlinear distortion and phase jitter. This is equivalent to finding a coded (or uncoded) modulation scheme with a signal set of size M that maps a symbol s to a point z for a given transmitted power, maximizing the minimum distance between any two constellation points. Usually, the problem of finding good signal constellations for a Gaussian channel¹ is associated with the search for lattices with high packing density, which is an old and well-studied problem in the mathematical literature [11].
We use the algorithm proposed in [12] to obtain the optimum constellations. Consider a zero-mean stationary additive white Gaussian noise (AWGN) channel with one-sided spectral density 2N_0. For large signal-to-noise ratio (SNR), the asymptotic approximation of (1) can be written as

P_e \sim \exp\left(-\frac{1}{8N_0} \min_{i \neq j} \|\mathbf{z}_i - \mathbf{z}_j\|_2^2\right).  (4)
¹The problem of constellation optimization is usually considered under the condition of a Gaussian channel. Although the problem under the condition of a Rayleigh fading channel has been studied in [10], its prerequisite is that the side information is perfectly known.
To minimize P_e, the problem can be formulated as

\{\mathbf{z}_i^*\}_{i=1}^{M} = \arg\min_{\{\mathbf{z}_i\}_{i=1}^{M}} P_e \quad \text{subject to } \mathbb{E}\left[\|\mathbf{z}\|_2^2\right] \leq P_{\mathrm{av}}  (5)
where \{\mathbf{z}_i^*\}_{i=1}^{M} denotes the optimal signal set. This optimization problem can be solved by using a constrained gradient-search algorithm. We denote \{\mathbf{z}_i\}_{i=1}^{M} as an M × N matrix

\mathbf{Z} = [\mathbf{z}_1, \mathbf{z}_2, \cdots, \mathbf{z}_M]^T.  (6)
Then, the k-th step of the constrained gradient-search algorithm can be described by

\mathbf{Z}'_{k+1} = \mathbf{Z}_k - \eta \nabla P_e(\mathbf{Z}_k)  (7a)

\mathbf{Z}_{k+1} = \frac{\mathbf{Z}'_{k+1}}{\sqrt{\sum_i \sum_j \left(\mathbf{Z}'_{k+1}[i,j]\right)^2}}  (7b)
where η denotes the step size and \nabla P_e(\mathbf{Z}_k) \in \mathbb{R}^{M \times N} denotes the gradient of P_e with respect to the current constellation points. It can be written as

\nabla P_e(\mathbf{Z}_k) = [\mathbf{g}_1, \mathbf{g}_2, \cdots, \mathbf{g}_M]^T  (8)
where

\mathbf{g}_j \sim -\sum_{i \neq j} \exp\left(-\frac{\|\mathbf{z}_j - \mathbf{z}_i\|_2^2}{8N_0}\right) \left(\frac{1}{\|\mathbf{z}_j - \mathbf{z}_i\|_2^2} + \frac{1}{4N_0}\right) \mathbf{1}_{\mathbf{z}_j - \mathbf{z}_i}.  (9)

The vector \mathbf{1}_{\mathbf{z}_j - \mathbf{z}_i} denotes the N-dimensional unit vector in the direction of \mathbf{z}_j - \mathbf{z}_i.
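Equations (7a), (7b) and (9) translate almost line by line into numpy. In the sketch below the normalization (7b) is taken up to the scale factor that keeps the total constellation energy at M·P_av = M/N; the values of η, N_0 and the step count are illustrative:

```python
import numpy as np

def optimize_constellation(M=8, N=2, n0=0.01, eta=2e-4, steps=1000, seed=0):
    """Constrained gradient search of [12] on the asymptotic P_e of (4)."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(M, N))
    Z *= np.sqrt(M / N / np.sum(Z ** 2))        # project onto E||z||^2 = P_av = 1/N
    for _ in range(steps):
        G = np.zeros_like(Z)
        for j in range(M):
            d = Z[j] - Z                        # z_j - z_i for all i, shape (M, N)
            dist = np.linalg.norm(d, axis=1)
            dist[j] = np.inf                    # exclude the i = j term
            w = np.exp(-dist ** 2 / (8 * n0)) * (1 / dist ** 2 + 1 / (4 * n0))
            G[j] = -np.sum((w / dist)[:, None] * d, axis=0)   # eq. (9)
        Z = Z - eta * G                         # eq. (7a): gradient step
        Z *= np.sqrt(M / N / np.sum(Z ** 2))    # eq. (7b): renormalize the energy
    return Z

Z = optimize_constellation()
d_min = min(np.linalg.norm(Z[i] - Z[j]) for i in range(8) for j in range(i))
```

Whether a run reaches the global optimum depends on the random start, which is why [12] and Section III speak of (locally) optimum constellations.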
Comparing (3) with (5), the mechanism of the encoder in an autoencoder-based communication system is unveiled. The mapping function of the encoder can be represented as

\{f_{\boldsymbol{\theta}_f^*}(\mathbf{s}_i)\}_{i=1}^{M} \rightarrow \{\mathbf{z}_i^*\}_{i=1}^{M}  (10)

when the PDF used for generating training samples is a multivariate zero-mean normal distribution, v − z ∼ N_N(0, Σ), where 0 denotes the N-dimensional zero vector and Σ = (2N_0/N) I is an N × N diagonal matrix.
D. Decoder: Inference
Finally, it is time to zoom into the lower right corner of Fig. 1 to investigate what happens inside the decoder (receiver). As shown in Fig. 2(a), for the tasks of DNN-based CSI recovery, channel estimation and symbol detection, the problem can be formulated as an inference model. For the sake of convenience, we can denote the target output of the decoder as z instead of s, because we can assume z = f_{θ_f}(s) is a bijection. If the decoder is symmetric, it can also be seen as a sub-autoencoder which consists of a sub-encoder and a sub-decoder; the code of its bottleneck (or middlemost) layer is denoted as u. Here we use z to denote the CSI or transmitted symbol which we desire to predict. The decoder infers a prediction ẑ = g_{θ_g}(v) according to its corresponding measurable variable v.
Fig. 2. (a) An inference model with a DNN decoder of (2S − 1) hidden layers for learning. (b) The graph representation of the decoder with (S − 1) hidden layers in both the sub-encoder and sub-decoder. The solid arrows denote the direction of input feedforward propagation and the dashed arrows denote the direction of information flow in the error back-propagation phase.
If the joint distribution p(v, z) is known, the expected (population) risk C_{p(v,z)}(g_{θ_g}, L_log) can be written as

\mathbb{E}\left[\mathcal{L}_{\log}(\hat{\mathbf{z}}, \mathbf{z}; \boldsymbol{\theta}_g)\right] = \sum_{\mathbf{v} \in \mathcal{V}, \mathbf{z} \in \mathcal{Z}} p(\mathbf{v}, \mathbf{z}) \log\left(\frac{1}{\hat{p}(\mathbf{z}|\mathbf{v})}\right)
= \sum_{\mathbf{v} \in \mathcal{V}, \mathbf{z} \in \mathcal{Z}} p(\mathbf{v}, \mathbf{z}) \log\left(\frac{1}{p(\mathbf{z}|\mathbf{v})}\right) + \sum_{\mathbf{v} \in \mathcal{V}, \mathbf{z} \in \mathcal{Z}} p(\mathbf{v}, \mathbf{z}) \log\left(\frac{p(\mathbf{z}|\mathbf{v})}{\hat{p}(\mathbf{z}|\mathbf{v})}\right)
= H(\mathbf{z}|\mathbf{v}) + D_{\mathrm{KL}}\left(p(\mathbf{z}|\mathbf{v}) \,\|\, \hat{p}(\mathbf{z}|\mathbf{v})\right) \geq H(\mathbf{z}|\mathbf{v})  (11)

where p̂(·|v) = g_{θ_g}(v) ∈ P(𝒵) and D_KL(p(z|v) || p̂(z|v)) denotes the Kullback-Leibler divergence between p(z|v) and p̂(z|v) [13]². If and only if the decoder is given by the conditional posterior, g_{θ_g}(v) = p(z|v), does the expected (population) risk reach its minimum, min_{g_{θ_g}} C_{p(v,z)}(g_{θ_g}, L_log) = H(z|v).
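The decomposition in (11) is easy to verify numerically on a toy discrete joint distribution (the probabilities below are arbitrary illustrative numbers, not from the letter):

```python
import numpy as np

# Hypothetical joint p(v, z) over 2 observations x 2 symbols.
p_vz = np.array([[0.30, 0.10],
                 [0.15, 0.45]])
p_v = p_vz.sum(axis=1, keepdims=True)
p_z_given_v = p_vz / p_v                       # true posterior p(z|v)

q = np.array([[0.70, 0.30],                    # some decoder posterior p_hat(z|v)
              [0.20, 0.80]])

risk = -(p_vz * np.log(q)).sum()               # expected log-loss, in nats
H_cond = -(p_vz * np.log(p_z_given_v)).sum()   # conditional entropy H(z|v)
kl = (p_vz * np.log(p_z_given_v / q)).sum()    # expected KL term of (11)

# The bound is met with equality when the decoder outputs the true posterior.
risk_opt = -(p_vz * np.log(p_z_given_v)).sum()
```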
In physical layer communication, instead of perfectly knowing the channel-related joint distribution p(v, z), we only have a set of B i.i.d. samples \mathcal{D}_B := \{(\mathbf{v}^{(i)}, \mathbf{z}^{(i)})\}_{i=1}^{B} drawn from p(v, z). In this case, the empirical risk is defined as

\hat{C}_{p(\mathbf{v},\mathbf{z})}(g_{\boldsymbol{\theta}_g}, \mathcal{L}, \mathcal{D}_B) = \frac{1}{B} \sum_{i=1}^{B} \mathcal{L}\left[\mathbf{z}_i, g_{\boldsymbol{\theta}_g}(\mathbf{v}_i)\right].  (12)
Practically, D_B drawn from p(v, z) is usually a finite set. This leads to a difference between the empirical and expected (population) risks, which can be defined as

\mathrm{gen}_{p(\mathbf{v},\mathbf{z})}(g_{\boldsymbol{\theta}_g}, \mathcal{L}, \mathcal{D}_B) = C_{p(\mathbf{v},\mathbf{z})}(g_{\boldsymbol{\theta}_g}, \mathcal{L}_{\log}) - \hat{C}_{p(\mathbf{v},\mathbf{z})}(g_{\boldsymbol{\theta}_g}, \mathcal{L}, \mathcal{D}_B).  (13)
We can now preliminarily conclude that the DNN-based receiver is an estimator with minimum empirical risk for a given set D_B, whereas its performance is inferior to the
²If X is a continuous random variable, the sum becomes an integral when its PDF exists.
Fig. 3. Comparisons of (a) optimum constellations obtained by the gradient-search technique and (b) constellations produced by autoencoders, for (M = 8, N = 2), (M = 16, N = 2) and (M = 16, N = 3).
optimal one with minimum expected (population) risk under a given joint distribution p(v, z).
Furthermore, it is necessary to quantitatively understand how information flows inside the decoder. Fig. 2(b) shows the graph representation of the decoder, where t_i and t'_i (1 ≤ i ≤ S) denote the i-th hidden layer representations counted from the input layer and from the output layer, respectively. Here, we use the method proposed in [14] to illustrate layer-wise mutual information by three kinds of information planes (IPs), where Shannon's entropy is estimated by the matrix-based functional of Rényi's α-entropy [15]. Details are given in the Appendix.
III. SIMULATION RESULTS

In this section, we provide simulation results to illustrate the behaviour of DNNs in physical layer communication.
A. Constellation Comparison
Fig. 3(a) shows the optimum constellations obtained by the gradient-search technique proposed in [12]. For N = 2 and 3, the algorithm was run for 1000 and 3000 steps, respectively, with step size η = 2 × 10^-4. Fig. 3(b) shows the constellations produced by autoencoders which have the same network structures and hyperparameters as the autoencoders in [2]. The autoencoders were trained for 10^6 epochs, each of which contains M different symbols.

When N = 2, the two-dimensional constellations produced by the autoencoders have a similar pattern to the optimum constellations, which form an (almost) hexagonal lattice. Specifically, in the case of (M = 8, N = 2), one of the constellations found by the autoencoder can be obtained by rotating the optimum constellation found by the gradient-search technique. In the case of (M = 16, N = 2), the constellation found by the autoencoder differs from the optimum constellation but still forms a lattice of (almost) equilateral triangles. In the case of (M = 16, N = 3), one signal point of the optimum constellation is almost at the origin while the other 15 signal points lie almost on the surface of a sphere with radius P_av and centre 0. This pattern is similar to the surface of a truncated
Fig. 4. The entropy S_α(z|z_LS) with respect to different values of SNR and N, as a function of the training set size B.
icosahedron, which is composed of pentagonal and hexagonal faces. However, the three-dimensional constellation produced by an autoencoder is a local optimum formed by 16 signal points lying almost in a plane.
From the perspective of computational complexity, the cost of training an autoencoder is significantly higher than that of the traditional algorithm. Specifically, an autoencoder with 4 dense layers of M, M, N and M neurons, respectively, needs to train (2M + 1)(M + N) + 2M parameters for 10^6 epochs, whereas the gradient-search algorithm only needs 2M trainable parameters for 10^3 steps.
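The parameter count above can be checked directly by summing weights and biases over four dense layers with M, M, N and M neurons acting on a length-M one-hot input (bias terms are assumed to be included in the count):

```python
M, N = 16, 2
# (fan_in, fan_out) of each dense layer in the M -> M -> M -> N -> M autoencoder.
layers = [(M, M), (M, M), (M, N), (N, M)]
params = sum(f_in * f_out + f_out for f_in, f_out in layers)   # weights + biases
closed_form = (2 * M + 1) * (M + N) + 2 * M
```

Both expressions evaluate to 626 for (M, N) = (16, 2).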
B. Information Flow
We consider a common channel estimation problem for an OFDM system with N subcarriers. Let z ≜ [H[0], H[1], ···, H[N − 1]]^T denote the frequency impulse response (FIR) vector of a channel. For the sake of convenience, we denote the measurable variable as v ≜ z_LS, where z_LS represents the least-squares (LS) estimate of z. Usually, it can be obtained by training symbol-based channel estimation. In this paper, we use linear interpolation, and the number of pilots is N_p = N/4 = 16.
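A sketch of obtaining z_LS with N_p = N/4 = 16 equispaced pilots and linear interpolation. The 4-tap channel, QPSK pilots and noise level below are hypothetical stand-ins; only the pilot count and the interpolation rule come from the text:

```python
import numpy as np

rng = np.random.default_rng(2)
N, Np = 64, 16
pilot_idx = np.arange(0, N, N // Np)         # equispaced pilot subcarriers

# Hypothetical 4-tap channel; z = [H[0], ..., H[N-1]]^T is its frequency response.
taps = (rng.normal(size=4) + 1j * rng.normal(size=4)) / np.sqrt(8)
z = np.fft.fft(taps, N)

# Known QPSK pilot symbols and the received pilot observations.
x_p = np.exp(1j * 2 * np.pi * rng.integers(0, 4, Np) / 4)
noise = 0.05 * (rng.normal(size=Np) + 1j * rng.normal(size=Np))
y_p = z[pilot_idx] * x_p + noise

# LS estimate on the pilot tones, then linear interpolation to all N subcarriers
# (np.interp holds the edge value beyond the last pilot).
z_ls_p = y_p / x_p
k = np.arange(N)
z_ls = np.interp(k, pilot_idx, z_ls_p.real) + 1j * np.interp(k, pilot_idx, z_ls_p.imag)
mse = np.mean(np.abs(z_ls - z) ** 2)
```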
According to (11), the minimum logarithmic expected (population) risk for this inference problem is H(z|z_LS), which can be estimated by the Rényi α-entropy S_α(z|z_LS) = S_α(z, z_LS) − S_α(z_LS) with α = 1.01. Fig. 4 illustrates the entropy S_α(z|z_LS) with respect to different values of SNR and N. As can be seen, S_α(z|z_LS) monotonically decreases as the size of the training set increases. When B → ∞, S_α(z|z_LS) decreases slowly, because the joint distribution p(z, z_LS) can then be learned almost perfectly and the empirical risk approaches the expected risk. Interestingly, when B > 580, the lower the SNR or the larger the input dimension N, the smaller the B needed to obtain the same value of S_α(z|z_LS).
Fig. 5(a), (b) and (c) illustrate the behaviour of IP-I, IP-II and IP-III in a DNN-based OFDM channel estimator with topology "128-64-32-16-8-16-32-64-128", where
Fig. 5. The three IPs and the loss curve in a DNN-based channel estimator.
a linear activation function is considered and the training samples are constructed by concatenating the real and imaginary parts of the complex channel vectors. The batch size is 100 and the learning rate is η = 0.001. We use V and V′ to denote the input and output of the decoder, respectively. The number of iterations is indicated by a colour bar. From IP-I, it can be seen that the final value of the mutual information I(T; V′) in each layer tends to equal the final value of I(T; V), which means that the information from V has been learnt and transferred to V′ by each layer. In IP-II, I(T′; V′) < I(T; V) in each layer, which implies that none of the layers is overfitting. The tendency of I(T; V) to approach the value of I(T′; V) can be observed from IP-III. Finally, from all the IPs, it is easy to notice that the mutual information does not change significantly once the number of iterations exceeds 200. Meanwhile, according to Fig. 5(d), the MSE reaches a very low value and likewise does not change sharply. This means that 200 iterations are enough for the task of 64-subcarrier channel estimation using a DNN with the above-mentioned topology.
IV. CONCLUSION

In this paper, we propose a framework for understanding the behaviour of DNNs in physical layer communication. We find that a DNN-based transmitter essentially tries to produce a good representation of the information source. We then quantitatively analyse the information flow in a DNN-based communication system. We believe that this framework has potential for the design of DNN-based physical layer communication.
APPENDIX A
MATRIX-BASED FUNCTIONAL OF RÉNYI'S α-ENTROPY

For a random variable X in a finite set 𝒳, its Rényi entropy of order α is defined as

H_\alpha(X) = \frac{1}{1-\alpha} \log \int_{\mathcal{X}} f^{\alpha}(x)\, dx  (14)
where f(x) is the PDF of the random variable X. Let \{x^{(i)}\}_{i=1}^{B} be an i.i.d. sample of B realizations from the random variable X with PDF f(x). The Gram matrix K can be defined as \mathbf{K}[i, j] = \kappa(x_i, x_j), where \kappa : \mathcal{X} \times \mathcal{X} \mapsto \mathbb{R} is a real-valued
positive definite and infinitely divisible kernel. Then, a matrix-based analogue to Rényi's α-entropy for a normalized positive definite matrix A of size B × B with trace 1 can be given by the functional

S_\alpha(\mathbf{A}) = \frac{1}{1-\alpha} \log_2\left[\sum_{i=1}^{B} \lambda_i(\mathbf{A})^{\alpha}\right]  (15)
where \lambda_i(\mathbf{A}) denotes the i-th eigenvalue of A, a normalized version of K:

\mathbf{A}[i, j] = \frac{1}{B} \frac{\mathbf{K}[i, j]}{\sqrt{\mathbf{K}[i, i]\, \mathbf{K}[j, j]}}.  (16)
Now, the joint entropy can be defined as

S_\alpha(\mathbf{A}, \mathbf{B}) = S_\alpha\left[\frac{\mathbf{A} \odot \mathbf{B}}{\mathrm{tr}(\mathbf{A} \odot \mathbf{B})}\right].  (17)
Finally, the matrix notion of Rényi's mutual information can be defined as

I_\alpha(\mathbf{A}; \mathbf{B}) = S_\alpha(\mathbf{A}) + S_\alpha(\mathbf{B}) - S_\alpha(\mathbf{A}, \mathbf{B}).  (18)
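Equations (15)-(18) amount to a few lines of numpy. The RBF kernel below is one common choice of infinitely divisible kernel; its width, the sample size and the test variables are illustrative assumptions, with α = 1.01 as in Section III:

```python
import numpy as np

def gram_rbf(X, sigma=1.0):
    """Gram matrix K[i, j] = kappa(x_i, x_j) with an RBF kernel."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def normalize(K):
    """Eq. (16): trace-one normalization of the Gram matrix."""
    d = np.sqrt(np.diag(K))
    return K / (K.shape[0] * np.outer(d, d))

def S_alpha(A, alpha=1.01):
    """Eq. (15): matrix-based Renyi alpha-entropy from the eigenvalues of A."""
    lam = np.maximum(np.linalg.eigvalsh(A), 0.0)   # clip tiny negative eigenvalues
    return np.log2(np.sum(lam ** alpha)) / (1.0 - alpha)

def I_alpha(A, B, alpha=1.01):
    """Eqs. (17)-(18): joint entropy via the Hadamard product, then mutual information."""
    J = A * B
    J = J / np.trace(J)
    return S_alpha(A, alpha) + S_alpha(B, alpha) - S_alpha(J, alpha)

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 1))
Y = X + 0.1 * rng.normal(size=(100, 1))     # strongly dependent on X
W = rng.normal(size=(100, 1))               # independent of X
A, By, Bw = normalize(gram_rbf(X)), normalize(gram_rbf(Y)), normalize(gram_rbf(W))
i_dep, i_ind = I_alpha(A, By), I_alpha(A, Bw)
```

The estimator should assign a larger mutual information to the dependent pair (X, Y) than to the independent pair (X, W), which is the behaviour exploited in Section III-B.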
REFERENCES
[1] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, no. 3, pp. 379-423, 1948.
[2] T. O'Shea and J. Hoydis, "An introduction to deep learning for the physical layer," IEEE Transactions on Cognitive Communications and Networking, vol. 3, no. 4, pp. 563-575, 2017.
[3] A. Felix, S. Cammerer, S. Dörner, J. Hoydis, and S. Ten Brink, "OFDM-autoencoder for end-to-end learning of communications systems," in 2018 IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). IEEE, 2018, pp. 1-5.
[4] C.-K. Wen, W.-T. Shih, and S. Jin, "Deep learning for massive MIMO CSI feedback," IEEE Wireless Communications Letters, vol. 7, no. 5, pp. 748-751, 2018.
[5] T. Wang, C.-K. Wen, S. Jin, and G. Y. Li, "Deep learning-based CSI feedback approach for time-varying massive MIMO channels," IEEE Wireless Communications Letters, vol. 8, no. 2, pp. 416-419, 2018.
[6] L. Li, H. Chen, H.-H. Chang, and L. Liu, "Deep residual learning meets OFDM channel estimation," IEEE Wireless Communications Letters, vol. 9, no. 5, pp. 615-618, 2019.
[7] J. Liu, K. Mei, X. Zhang, D. Ma, and J. Wei, "Online extreme learning machine-based channel estimation and equalization for OFDM systems," IEEE Communications Letters, vol. 23, no. 7, pp. 1276-1279, 2019.
[8] B. Sklar and P. K. Ray, Digital Communications: Fundamentals and Applications. Pearson Education, 2014.
[9] S. Yu, M. Emigh, E. Santana, and J. C. Príncipe, "Autoencoders trained with relevant information: blending Shannon and Wiener's perspectives," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017, pp. 6115-6119.
[10] J. Boutros, E. Viterbo, C. Rastello, and J.-C. Belfiore, "Good lattice constellations for both Rayleigh fading and Gaussian channels," IEEE Transactions on Information Theory, vol. 42, no. 2, pp. 502-518, 1996.
[11] G. C. Jorge, A. A. de Andrade, S. I. Costa, and J. E. Strapasson, "Algebraic constructions of densest lattices," Journal of Algebra, vol. 429, pp. 218-235, 2015.
[12] G. Foschini, R. Gitlin, and S. Weinstein, "Optimization of two-dimensional signal constellations in the presence of Gaussian noise," IEEE Transactions on Communications, vol. 22, no. 1, pp. 28-38, 1974.
[13] A. Zaidi, I. Estella-Aguerri et al., "On the information bottleneck problems: models, connections, applications and information theoretic views," Entropy, vol. 22, no. 2, p. 151, 2020.
[14] S. Yu and J. C. Principe, "Understanding autoencoders with information theoretic concepts," Neural Networks, vol. 117, pp. 104-123, 2019.
[15] L. G. S. Giraldo, M. Rao, and J. C. Principe, "Measures of entropy from data using infinitely divisible kernels," IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 535-548, 2014.