blind cognitive mac protocols

Blind Cognitive MAC ProtocolsOmar Mehanna, Ahmed Sultan

Wireless Intelligent NetworksCenter (WINC)

Nile University, Cairo, [email protected], [email protected]

Hesham El GamalDepartment of Electrical

and Computer EngineeringOhio State University, Columbus, USA

[email protected]

AbstractWe consider the design of cognitive Medium AccessControl (MAC) protocols enabling an unlicensed (secondary)transmitter-receiver pair to communicate over the idle periodsof a set of licensed channels, i.e., the primary network. Theobjective is to maximize data throughput while maintaining thesynchronization between secondary users and avoiding interfer-ence with licensed (primary) users. No statistical informationabout the primary traffic is assumed to be available a-priori tothe secondary user. We investigate two distinct sensing scenarios.In the first, the secondary transmitter is capable of sensing allthe primary channels, whereas it senses one channel only inthe second scenario. In both cases, we propose MAC protocolsthat efficiently learn the statistics of the primary traffic on-line. Our simulation results demonstrate that the proposed blindprotocols asymptotically achieve the throughput obtained whenprior knowledge of primary traffic statistics is available.

I. INTRODUCTIONMost of licensed spectrum resources are under-utilized.

This observation has encouraged the emergence of dynamicand opportunistic spectrum access concepts, where unlicensed(secondary) users equipped with cognitive radios are allowedto opportunistically access the spectrum as long as they donot interfere with licensed (primary) users. To achieve thisgoal, secondary users must monitor the primary traffic inorder to identify spectrum holes or opportunities which canbe exploited to transfer data [1].

The main goal of a cognitive MAC protocol is to sense theradio spectrum, detect the occupancy state of different primaryspectrum channels, and then opportunistically communicateover unused channels (spectrum holes) with minimal interfer-ence to the primary users. Specifically, the cognitive MAC pro-tocol should continuously make efficient decisions on whichchannels to sense and access in order to obtain the most benefitfrom the available spectrum opportunities. Several cognitiveMAC protocols have been proposed in previous studies. Forexample, in [2], MAC protocols were constructed assumingeach secondary user is equipped with two transceivers, acontrol transceiver tuned to a dedicated control channel anda software defined radio SDR-based transceiver tuned toany available channels to sense, receive, and transmit sig-nals/packets. On the other hand, [3] proposed a sensing-periodoptimization mechanism and an optimal channel-sequencingalgorithm, as well as an environment adaptive channel-usagepattern estimation method.

The slotted Markovian structure for the primary networktraffic, adopted here, was also considered in [5] where the

optimal policy was characterized and a simple greedy policyfor secondary users was constructed. The authors of [5], how-ever, assumed that the primary traffic statistics (i.e., Markovchain transition probabilities) were available a-priori to thesecondary users. Here, our focus is on the blind scenariowhere the cognitive MAC protocol must learn the transitionprobabilities on-line.

In this work, we differentiate between two scenarios. Thefirst assumes that the secondary transmitter can sense all theavailable primary channels before making the decision onwhich one to access. The secondary receiver, however, doesnot participate in the sensing process and can wait to decodeon only one channel. This is the model adopted in [4]. Inthe sequel, we propose an efficient algorithm that optimizesthe on-line learning capabilities of the secondary transmitterand ensures perfect synchronization between the secondarypair. The proposed protocol does not assume a separatecontrol channel, and hence, piggybacks the synchronizationinformation on the same data packet. Our numerical resultsdemonstrate the superiority of the proposed protocol over theone in [4] where the primary transmitter and receiver areassumed to access the channel in a predetermined sequence,which they agreed upon a-priori.

The second scenario assumes that both the secondary trans-mitter and receiver can sense only one primary channel ineach time slot. This problem can be re-casted as a restlessmulti-armed bandit problem where the optimal algorithm muststrike a balance between exploration and exploitation [8].Unfortunately, finding the optimal solution for this problemremains an elusive task [10]. Inspired by the recent resultsof [8] and [9], an efficient MAC protocol is constructed whichcan be viewed as the Whittle index strategy of [8] augmentedwith a similar learning phase to the one proposed in [9] forthe multi-armed bandit scenario. Our numerical results showthat the performance of this protocol converges to the Whittleindex strategy with known transition probabilities [8].

II. NETWORK MODELA. Primary Network

We consider a primary network consisting of N indepen-dent channels with its users communicating according to asynchronous slot structure. We use i to refer to the channelindex i {1, , N}, and j to refer to the time-slot indexj {1, , T}. The ith primary channel has a bandwidth

Fig. 1. The Gilber-Elliot channel model

of Bi. The traffic statistics of the primary network are suchthat the occupancy of each of the N channels follows adiscrete-time Markov process with two states. The state ofthe ith channel at time slot j, S(j)i , is equal to 1 if thechannel is free, and to 0 if it is busy. The state diagram fora single Markov channel model is illustrated in Figure 1. Thechannel state transition matrix of the Markov chain is given

by P i =[P i00 P

i01

P i10 Pi11

]. We assume that P i remains fixed

for a block of T time slots and is unknown a-priori to thesecondary user.

B. Secondary Pair

It is assumed that the secondary transmitter can sense L1channels (L1 N) and can access L2 = 1 channel in eachslot. The secondary transmitter can only transmit if the channelit chooses to access is sensed to be free. Here, we only reportour results for the two special cases L1 = N and L1 = 1. Themore general case will be addressed in the journal version.

The secondary receiver does not participate in channelsensing and is assumed to be capable of accessing onlyone channel [4]. This assumption is intended to limit thedecoding complexity needed by the secondary receiver. An-other motivation behind restricting channel sensing to thetransmitter is the potentially different sensing outcomes at thesecondary transmitter and receiver due to the spatial diversityof the primary traffic which can lead to the breakdown of thesecondary transmitter-receiver synchronization.

Conceptually, our proposed cognitive MAC protocol can bedecomposed into the following stages: Decision stage: The secondary transmitter decides whichL1 channels to sense. Also, both transmitter and receiverdecide which channel to access.

Sensing stage: The transmitter senses the L1 selectedprimary channels.

Learning stage: The transmitter updates the estimatedprimary channels statistics, P i.

Access stage: If the access channel is sensed to be free,a data packet is transmitted to the secondary receiver.This packet contains the information needed to sustainsynchronization between secondary terminals and, hence,synchronization does not require a dedicated controlchannel. The length of the packet is assumed to be largeenough such that the loss of throughput resulting fromthe synchronization overhead is marginal.

ACK stage: The receiver sends an ACK to the transmitterupon successful reception of sent data.

The performance of the sensing stage is limited by twotypes of errors. If the secondary transmitter decides that anempty channel is busy, it will refrain from transmitting, anda spectrum opportunity is overlooked. This is the false alarmsituation, which is characterized by probability of false alarmPFA. On the other hand, if the detector fails to sense a busychannel as busy, a miss detection occurs resulting in interfer-ence with primary user. The probability of miss detection isdenoted by PMD. In the rest of the paper, S

(j)i denotes the

state of channel i at time slot j as sensed by the transmitter,which might not be the actual channel state S(j)i . Overall,successful communication between the secondary transmitterand receiver occur only when: 1) they both decide to accessthe same channel, and 2) the channel is sensed to be free andis actually free from primary transmissions.

III. FULL SENSING CAPABILITY: L1 = N

In this section it is assumed that the secondary transmittercan sense all N primary channels at the beginning of eachtime slot. The initial packet sent to the receiver includesestimates for the transition probabilities, and the belief vector(1), where (j) = [(j)1 , , (j)N ], and (j)i is the commontransmitters and receivers estimate of the prior probabilitythat channel i is free at the beginning of time slot j, onthe basis of the sensing history of channel i. Once theinitial communication is established, the secondary transmitterand receiver implement the same spectrum access strategydescribed below for j 1.

1) Decision: At the beginning of time slot j, and usingbelief vector (j), the secondary transmitter and receiverdecide to access channeli(j) = arg max

i=1, ,N

[

(j)i Bi

]2) Sensing: The secondary transmitter senses all channels

and captures the sensing vector (j) = [S(j)1 , , S(j)N ],where S(j)i = 1 if the ith channel is sensed to be free,and S(j)i = 0 if it is found busy.

3) Learning: Based on the sensing results, the transmitterupdates the estimates P i01 and P

i11 for all primary

channels as explained below.4) Access: If S(j)i = 1, the transmitter sends its data packet

to the receiver. The packet includes (j), P i01 and Pi11.

In addition, if the transmission at slot j 1 has failed,the transmitter sends (j), which is the belief vectorcomputed at the transmitter based on its observations. Ifthe receiver successfully receives the packet, it sendsan ACK back to the transmitter. Parameter K(j)i isequal to unity if an ACK is received by the transmitter,and zero otherwise. If the channel is free, the forwardtransmission and the feedback channel are assumed tobe error-free.

5) Finally, the transmitter and receiver update the commonbelief vector (j+1) such that:

(j+1)i =

P i11 if K(j)i = 1, i = i

(j)AiP

i11 +

(1 Ai

)P i01 if K

(j)i = 1, i 6= i(j), S

(j)i = 1

CiPi11 +

(1 Ci

)P i01 if K

(j)i = 1, i 6= i(j), S

(j)i = 0

DiPi11 +

(1 Di

)P i01 if K

(j)i = 0, i = i

(j)(j)i P

i11 + (1 (j)i )P i01 if K(j)i = 0, i 6= i(j)

(1)where:Ai = Pr(S

(j)i = 1|S(j)i = 1) = (1PFA)

(j)i

(1PFA)(j)i +PMD(1(j)i ),

Ci = Pr(S(j)i = 1|S(j)i = 0) = PFA

(j)i

PFA(j)i +(1PMD)(1(j)i )

,

Di = Pr(S(j)i = 1|K(j)i = 0) = PFA

(j)i

PFA(j)i +(1(j)i )

,

P i01 and Pi11 are the most recent shared estimates of ith channel

transition probabilities. Obviously, in case of perfect sensing,Ai = 1, Ci = 0 and Di = 0.

In addition, the transmitter computes another belief vector,(j+1), based on its observations:

(j+1)i =

(j+1)i if K

(j)i = 1

AiPi11 + (1Ai) P i01 if K(j)i = 0, i 6= i(j), S

(j)i = 1

CiPi11 + (1 Ci) P i01 if K(j)i = 0, i 6= i(j), S

(j)i = 0

DiPi11 + (1Di) P i01 if K(j)i = 0, i = i(j)

(2)where Ai, Ci, and Di are the same as Ai, Ci and Di with

(j)i

replaced by (j)i . Note that (1) = (1), and (j+1) differs

from (j+1) only when K(j)i = 0. If transmission succeeds atthe jth time slot after one or more failures, the transmitter andreceiver set (j) = (j) before computing (j+1).

Since we assume that traffic statistics on primary chan-nels (P i) are unknown to the secondary users a-priori, thesecondary users need to estimate these probabilities. Whencontinuous observations of each channel are available, eachchannel can be modeled as a hidden Markov model (HMM).An optimal learning algorithm for HMM is described in [7]using which the transition probabilities, PFA, and PMD canbe estimated. However, we propose a much less complexalgorithm based on simple counting, which approximates theestimated probabilities by the optimal HMM algorithm. Thealgorithm we propose works as follows. After sensing allthe primary channels at the beginning of each time slot, thesecondary transmitter keeps track of the following metrics foreach channel: Number of times each channel was sensed to be free:

N i1(j) =j1l=1

S(l)i

Number of times each channel was sensed to be busy:

N i0(j) =j1l=1

(1 S(l)i ) Number of state transitions from free to free:

N i11(j) =j1l=1

S(l)i S

(l+1)i

Number of state transitions from busy to free:

N i01(j) =j1l=1

(1 S(l)i )S(l+1)iThe transition probabilities are estimated:P i01(j) =

Ni01(j)

Ni0(j), P i11(j) =

Ni11(j)

Ni1(j)

In order to share channel transition probabilities betweensecondary transmitter and receiver as dictated by the strategy

for the L1 = N case, values of N i1(j), Ni0(j), N

i11(j) and

N i01(j) for each channel are sent within the transmitted packet.If K(j)i = 1, the transmitter and receiver update P

i01(j) and

P i11(j). Otherwise, the transmitter only updates Ni1(j) , N

i0(j),

N i11(j) and Ni01(j), but uses the old values since the last

successful transmission in order to determine which channelto access at the beginning of a time slot.

In a nutshell, the proposed algorithm uses the full sensingcapability of the secondary transmitter to decouple the ex-ploration (i.e., learning) task from the exploitation task. Afteran ACK is received, both nodes use the common observation-based belief vector to make the optimal access decision. On theother hand, in the absence of the ACK, both nodes can not usethe optimal belief vector in order to maintain synchronization.In this case, the proposed algorithm opts for a greedy strategyin order to minimize the time between two successive ACKs.At this point, we only conjecture the optimality of this strategyand continue to work on the proof for the journal version ofthis work.

As an analytical benchmark, we have the following upper-bound on the achievable throughput in this scenario. Assumingthat the delayed side information of all the primary channelsstates S(j1)i is given to the secondary transmitter and receiver,to decide on the channel to access at time j, an upper boundexpected throughput per slot is given by:

R =

1SN=0

1

S2=0

1S1=0

[(Ni=1

PSi

)(maxi

[PSi1Bi])]

(3)

where, PSi1 denotes the state transition probability forchannel i from state Si = (0, 1) to the free state. PSi isthe Markov steady state probability of channel i being freeor busy. The first term in the summation corresponds to theprobability that the N channels are in one of the 2N states, andthe second term represents the highest expected throughputgiven the current joint state for the N channels.

A final remark is now in order. Assuming that P i11 = Pi01,

a channels probability of being free, PSi=1, becomes inde-pendent of the previous state, i.e., PSi=1 = P

i11 = P

i01. In

this case, the optimal strategy, assuming that the transitionprobabilities are known, is for the secondary transmitter toaccess the channel i = arg max

i=1, ,N[PSi=1Bi] and the ex-

pected throughput becomes maxi=1, ,N

[PSi=1Bi] [9]. Assuming,

however, that the transition probabilities are unknown but bothnodes know that P i11 = P

i01, one can estimate each channels

free probability PSi=1 as PSi=1 = Ni1(j)/j. In Section V,

we quantify the value of this side information by comparingthe performance of this strategy with our universal algorithmthat does not make any prior assumptions about the transitionprobabilities.

IV. THE RESTLESS BANDIT SCENARIO: L1 = 1

Assuming that the transition probabilities are known a-priori by the secondary users, the medium access scenario inthis case can be formulated as a partially observable Markovdecision process (POMDP) [5]. The optimal policy, in this

scenario, must strike a balance between gaining instantaneousreward by exploiting channels based on already known infor-mation, and gaining information for future use by exploringnew spectrum opportunities. Motivated by the prohibitivecomputational complexity of the optimal strategy, the authorsfurther proposed a reduced complexity strategy based on thegreedy approach that maximizes the per-slot throughput basedon already known information (exploitation only) [5]. In amore recent work [8], the problem was re-casted as a restlessbandit problem and the Whittles index approach was used toconstruct a more efficient medium access policy [10].

Here, we relax the assumption of the a-priori known tran-sition probabilities by the secondary transmitter/receiver. Thisadds another interesting dimension to the problem since theblind cognitive MAC protocol must now learn this statisticalinformation on-line in order to make the appropriate accessdecisions. Inspired by previous results of Lai et al. in themulti-armed bandit setup [9], we propose the following simplestrategy. At the beginning of the T slots, each of the N primarychannels is continuously monitored for an initial learningperiod (LP ) to get an estimate for P i11 and P

i01. Then, by

assigning Whittles index T (j)i to each channel, we are able tochoose which channel to access at each time slot. In summary,the strategy works as follows.

1) Initial learning period: Each channel is continuouslysensed for LP time slots. At the end of the learn-ing period, the transition probabilities are estimated asP i01 =

Ni01Ni0

, P i11 =Ni11Ni1

2) Decision: At the beginning of any time slot (j > N LP ), the secondary transmitter and receiver decide toaccess channel i(j) = arg max

i=1, ,N

[T

(j)i Bi

].

3) Sensing: The secondary transmitter senses channel i(j).4) Learning: if i(j) = i(j 1), update N i11, N i1, N i01,

N i0, Pi11, and P

i01.

5) Access: If S(j)i = 1, the transmitter sends its data packetto the receiver. If the receiver successfully receives apacket, it sends an ACK back to the transmitter.

6) The transmitter and receiver calculate (j+1) given that:

(j+1)i =

P i11 if i(j) = i

(j),K(j)i = 1DiP

i11 +

(1 Di

)P i01 if i(j) = i

(j),K(j)i = 0(j)i P

i11 + (1 (j)i )P i01 if i(j) 6= i(j)

(4)

where P i11 and Pi01 are the latest successfully shared P

i11 and

P i01 between the secondary transmitter-receiver pair. Finally,(j+1) is used to update Whittles index T (j+1)i of eachchannel as detailed in [8].

In the case of time-independent channel states, i.e., P i11 =P i01, the problem reduces to the a multi-armed bandit scenarioconsidered in [9]. The difference, here, is the lack of thededicated control channel, between the cognitive transmitterand receiver, as assumed in [9]. The following strategy, whichis applied as soon as the initial synchronization is established,avoids this drawback by ensuring synchronization using theACK feedback over the same data channel.

Fig. 2. Throughput comparison between: the upper bound from equation [3],the proposed blind strategy proposed for L1 = N , the Whittle index strategyfor L1 = 1, the greedy strategy for L1 = 1, and the maximum achievableoffline bound.

1) Decision: At the beginning of any time slot j, thesecondary transmitter and receiver decide to access thechannel i(j) = arg max

i=1, ,N

[

(j)i Bi

], where (j)i =

X(j)i

Y(j)i

+

2lnj

Y(j)i

, X(j)i is the number of time slots where

successful communication occurs on channel i, and Y (j)iis the number of time slots where channel i is chosento sense and access.

2) Sensing: The secondary transmitter senses channel i(j).3) Access: If S(j)i = 1, the transmitter sends its data packet

to the receiver. If the receiver successfully receives apacket, it sends an ACK back to the transmitter.

4) The transmitter and receiver update the following:Y

(j+1)i = Y

(j)i + 1, if i(j) = i

(j)X

(j+1)i = X

(j)i + 1, if K

(j)i = 1, i(j) = i

(j)

(j+1)i =

X(j+1)i

Y(j+1)i

+

2lnj

Y(j+1)i

V. NUMERICAL RESULTS

In this section we present simulation results for the twoscenarios discussed earlier. Throughout this section, we as-sume that the number of primary channels N = 5, eachwith bandwidth Bi = 1. The spectrum usage statistics of theprimary network were assumed to remain unchanged for ablock of T = 104 time slots for Figures 2, 3, and 4, and fora block of T = 105 time slots for Figure 5. The transitionprobabilities for each channel P i11 and P

i01, were generated

randomly between 0.1 and 0.9. The plotted results are theaverage over 1000 simulation runs. The discount factor used toobtain the Whittle index is 0.9999. In all reported simulations,perfect sensing is assumed, and the average throughput pertime slot is plotted.

Figure 2 reports the throughput comparison between thedifferent cognitive MAC strategies, all with prior knowledgeabout the channels transition probabilities. The loss in through-put between the upper bound and the proposed strategy for theL1 = N case is shown and the gain offered by the full sensingcapability as compared with the L1 = 1 scenario is apparent.It is seen also that the strategies we proposed achieve higherthroughput than the best offline bound described in [4], in

Fig. 3. Throughput comparison between the proposed strategy for (L1 = N)with and without known transition probabilities.

Fig. 4. Throughput comparison for the blind cognitive MAC protocol (withand without the prior knowledge that P i11 = P

i01) and the genie-aided

scenario.

which the channel with highest steady state probability of be-ing free is always chosen. Figure 3 illustrates the convergenceof the throughput of the proposed blind strategy for L1 = N ,with no prior information, to the case with prior knowledgeof the transition probabilities as T grows. In Figure 4, weassume that P i11 = P

i01 for all channels. It is shown that even

if the secondary users are unaware of this fact, and applythe proposed strategy, the achievable throughput convergesasymptotically to the achievable performance when the factthat P i11 = P

i01 is known a-priori, albeit at the expense

of a longer learning phase. Interestingly, both strategies areshown to converge asymptotically to genie-aided upper bound(when the transition probabilities are known). Finally, Figure 5demonstrates the tradeoff between the learning time overheadin the blind strategy of Section IV and the final achievablethroughput at the end of the T slots. Clearly, this figuresupports the intuitive conclusion that for large T blocks, onecan tolerate a longer learning phase in order to maximize thesteady state achievable throughput.

VI. CONCLUSION

In this work, we propose blind cognitive MAC protocols thatdo not require any prior knowledge about the statistics of theprimary traffic. We differentiate between two distinct scenar-ios, based on the complexity of the cognitive transmitter. In thefirst, the full sensing capability of the secondary transmitteris fully utilized to learn the statistics of the primary trafficwhile ensuring perfect synchronization between the secondary

Fig. 5. Throughput comparison between the proposed blind strategy for(L1 = 1), when LP = 20 and LP = 200, and the genie-aided case.

transmitter and receiver in the absence of a dedicated con-trol channel. The second scenario focuses on low-complexitycognitive transmitter capable of sensing one channel only atthe beginning of each time slot. For this case, we proposean augmented Whittle index MAC protocol that allows for aninitial learning phase to estimate the transition probabilitiesof the primary traffic. Our numerical results demonstrate theconvergence of the blind protocols performance to that ofthe genie-aided scenario where the primary traffic statistic areknown a-priori by the secondary transmitter and receiver.

REFERENCES[1] S. Haykin, Cognitive radio: brain-empowered wireless communica-

tions, IEEE JSAC, vol. 23, no. 2, pp. 201-220, Feb 2005.[2] H. Su and X. Zhang, Opportunistic MAC Protocols for Cognitive

Radio, Proc. 41st Conference on Information Sciences and Systems(CISS 2007), March 2007

[3] H. Kim and K. Shin, Efficient Discovery of Spectrum Opportunitieswith MAC-Layer Sensing in Cognitive Radio Networks, IEEE Trans-actions on Mobile Computing, vol. 7, no. 5, pp. 533-545, May 2008

[4] S. Srinivasa, S. Jafar and N. Jindal, On the Capacity of the Cogni-tive Tracking Channel, IEEE International Symposium on InformationTheory, July 2006

[5] Q. Zhao, L. Tong, A. Swami, and Y. Chen, Decentralized Cogni-tive MAC for Opportunistic Spectrum Access in Ad Hoc Networks:A POMDP Framework, IEEE JSAC, vol. 25, no. 3, pp. 589-600,April 2007.

[6] A. Sahai, N. Hoven, S. Mishra and R. Tandra, Fun-damental tradeoffs in robust spectrum sensing for op-portunistic frequency reuse, Tech. Rep., 2006. Available:www.eecs.berkeley.edu/ smm/CognitiveTechReport06.pdf

[7] L. Rabiner and H. Juang, An introduction to hidden Markov models,IEEE ASSP Magazine, vol. 3, no. 1, Jan. 1986

[8] K. Liu and Q. Zhao, A Restless Bandit Formulation of OpportunisticAccess: Indexablity and Index Policy, 5th IEEE Annual Communica-tions Society Conference on Sensor, Mesh and Ad Hoc Communicationsand Networks Workshops (SECON Workshops 08), June 2008

[9] L. Lai, H. El-Gamal, H. Jiang and H. Poor, Cognitive Medium Access:Exploration, Exploitation and Competition, submitted to the IEEETransactions on Networking, Oct. 2007

[10] P. Whittle, Restless Bandits: Activity Allocation in a Changing World,Journal of Applied Probability, vol. 25, 1988

blind cognitive mac protocols

Documents