
arXiv:1103.1401v3 [math.OC] 9 Apr 2011

Opportunistic Cooperation in Cognitive Femtocell Networks

Rahul Urgaonkar, Michael J. Neely

Abstract—We investigate opportunistic cooperation between unlicensed secondary users and legacy primary users in a cognitive radio network. Specifically, we consider a model of a cognitive network where a secondary user can cooperatively transmit with the primary user in order to improve the latter's effective transmission rate. In return, the secondary user gets more opportunities for transmitting its own data when the primary user is idle. This kind of interaction between the primary and secondary users is different from the traditional dynamic spectrum access model, in which the secondary users try to avoid interfering with the primary users while seeking transmission opportunities on vacant primary channels. In our model, the secondary users need to balance the desire to cooperate more (to create more transmission opportunities) with the need to maintain sufficient energy levels for their own transmissions. Such a model is applicable in the emerging area of cognitive femtocell networks. We formulate the problem of maximizing the secondary user throughput subject to a time average power constraint under these settings. This is a constrained Markov Decision Problem, and conventional solution techniques based on dynamic programming require either extensive knowledge of the system dynamics or learning-based approaches that suffer from large convergence times. However, using the technique of Lyapunov optimization, we design a novel greedy and online control algorithm that overcomes these challenges and is provably optimal.

Index Terms—Resource Allocation, Opportunistic Cooperation, Cognitive Radio, Femtocell Networks, Optimal Control

I. INTRODUCTION

Much prior work on resource allocation in cognitive radio networks has focused on the dynamic spectrum access model [1], [2], in which the secondary users seek transmission opportunities for their packets on vacant primary channels in frequency, time, or space. Under this model, the primary users are assumed to be oblivious of the presence of the secondary users and transmit whenever they have data to send. Further, a collision model is assumed for the physical layer: if a secondary user transmits on a busy primary channel, then there is a collision and both packets are lost. We considered a similar model in our prior work [3], where the objective was to design an opportunistic scheduling policy for the secondary users that maximizes their throughput utility while providing tight reliability guarantees on the maximum number of collisions suffered by a primary user over any given time interval. We note that this formulation does not consider the possibility of any cooperation between the primary and secondary users. Moreover, it assumes that the secondary user activity does not affect the primary user channel occupancy process.

Rahul Urgaonkar and Michael J. Neely are with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089. Web: http://www-scf.usc.edu/~urgaonka

This material is supported in part by one or more of the following: the DARPA IT-MANET program grant W911NF-07-0028, the NSF Career grant CCF-0747525, and continuing through participation in the Network Science Collaborative Technology Alliance sponsored by the U.S. Army Research Laboratory.

There is a growing body of work that investigates alternative models for the interaction between the primary and secondary users in a cognitive radio network. In particular, the idea of cooperation at the physical layer has been considered from an information-theoretic perspective in many works (see [4] and the references therein). These are motivated by the work on the classical interference and relay channels [5]–[8]. The main idea in these works is that the resources of the secondary user can be utilized to improve the performance of the primary transmissions. In return, the secondary user can obtain more transmission opportunities for its own data when the primary channel is idle.

These works mainly treat the problem from a physical layer/information-theoretic perspective and do not consider upper layer issues such as queueing dynamics, higher priority for the primary user, etc. Recent work that addresses some of these issues includes [9]–[13]. Specifically, [9] considers the scenario where the secondary user acts as a relay for those packets of the primary user that it receives successfully but which are not received by the primary destination, and derives the stable throughput of the secondary user under this model. References [10], [11] use a Stackelberg game framework to study spectrum leasing strategies in cooperative cognitive radio networks, where the primary users lease a portion of their licensed spectrum to secondary users in return for cooperative relaying. References [12], [13] study and compare different physical layer strategies for relaying in such cognitive cooperative systems. An important consequence of this interaction between the primary and secondary users is that the secondary user activity can now potentially influence the primary user channel occupancy process. However, there has been little work studying this scenario. An exception is [14], which considers a two-user setting where collisions caused by the opportunistic transmissions of the secondary user result in retransmissions by the primary user.

In this paper, we study the problem of opportunistic cooperation in cognitive networks from a network utility maximization perspective, specifically taking into account the above-mentioned higher-layer aspects. To motivate the problem and illustrate the design issues involved, we first consider a simple network consisting of one primary and one secondary user and their respective access points in Sec. II. This can model a practical scenario of recent interest, namely a cognitive femtocell [15], [16], as discussed in Sec. II. We assume that the secondary user can cooperatively transmit with the primary user to increase the latter's transmission success probability. In return, the secondary user can get more opportunities for transmitting its own data when the primary user is idle. We formulate the problem of maximizing the secondary user throughput subject to time average power constraints in Sec. II-B.

Unlike most of the prior work on resource allocation in cognitive radio networks, the evolution of the system state for this problem depends on the control actions taken by the secondary user. Here, the system state refers to the channel occupancy state of the primary user. Because of this dependence, this problem becomes a constrained Markov Decision Problem (MDP), and the greedy “drift-plus-penalty” minimization technique of Lyapunov optimization [17] that we used in [3] is no longer optimal. Such problems are typically tackled using Markov Decision Theory and Dynamic Programming [23], [24]. For example, [14] uses these tools to derive structural results on optimal channel access strategies in a similar two-user setting where collisions caused by the opportunistic transmissions of the secondary user cause the primary user to retransmit its packets. However, this approach requires either extensive knowledge of the dynamics of the underlying network state (such as state transition probabilities) or learning-based approaches that suffer from large convergence times.

Instead, in Sec. III, we use the recently developed framework of maximizing the ratio of the expected total reward over the expected length of a renewal frame [19]–[21] to design a control algorithm. This framework extends the classical Lyapunov optimization method [17] to tackle a more general class of MDP problems where the system evolves over renewals and where the length of a renewal frame can be affected by the control decisions during that period. The resulting solution has the following structure: rather than minimizing a “drift-plus-penalty” term every slot, it minimizes a “drift-plus-penalty ratio” over each renewal frame. This can be achieved by solving a sequence of unconstrained stochastic shortest path (SSP) problems and implementing the solution over every renewal frame.

While solving such SSP problems can be simpler than solving the original constrained MDP, it may still require knowledge of the dynamics of the underlying network state. Learning-based techniques for solving such problems by sampling from past observations have been considered in [18]. However, these may suffer from large convergence times. Remarkably, in Sec. IV, we show that for our problem, the “drift-plus-penalty ratio” method results in an online control algorithm that does not require any knowledge of the network dynamics or explicit learning, yet is optimal. In this respect, it is similar to the traditional greedy “drift-plus-penalty” minimizing algorithms of [17]. We then extend the basic model to incorporate multiple secondary users as well as time-varying channels in Sec. VI. Finally, we present simulation results in Sec. VII.

[Fig. 1 (diagram): a macrocell with a Macro BS and a primary user (PU), and a femtocell with a Femto BS and a secondary user (SU).]

Fig. 1. Example femtocell network with primary and secondary users.

II. BASIC MODEL

We consider a network with one primary user (PU), one secondary user (SU), and their respective base stations (BS). The primary user is the licensed owner of the channel, while the secondary user tries to send its own data opportunistically when the channel is not being used by the primary user. This model can capture a femtocell scenario where the primary user is a legacy mobile user that communicates with the macro base station over licensed spectrum (Fig. 1). The secondary user is the femtocell user that does not have any licensed spectrum of its own and tries to send data opportunistically to the femtocell base station over any vacant licensed spectrum. Similar models of cooperative cognitive radio networks have been considered in [9]–[13]. This can also model a single-server queueing system with two classes of arrivals, where one class has strictly higher priority than the other.

We consider a time-slotted model in which the system operates over a frame-based structure. Specifically, the timeline can be divided into successive non-overlapping frames of duration $T[k]$ slots, where $k \in \{1, 2, 3, \ldots\}$ represents the frame number (see Fig. 2). The start time of frame $k$ is denoted by $t_k$, with $t_1 = 0$. The length of frame $k$ is given by $T[k] \triangleq t_{k+1} - t_k$. For each $k$, the frame length $T[k]$ is a random function of the control decisions taken during that frame. Each frame can be further divided into two periods: PU Idle and PU Busy. The “PU Idle” period corresponds to the slots when the primary user does not have any packet to send to its base station and is idle. The “PU Busy” period corresponds to the slots when the primary user is transmitting its packets to its base station over the licensed spectrum. As shown in Fig. 2, every frame starts with the “PU Idle” period, which is followed by the “PU Busy” period, and ends when the primary user becomes idle again. In the basic model, we assume that the primary user receives new packets every slot according to an i.i.d. Bernoulli arrival process $A_{pu}(t)$ with rate $\lambda_{pu}$ packets/slot. This means that the length of the “PU Idle” period of any frame is a geometric random variable with parameter $\lambda_{pu}$ (and mean $1/\lambda_{pu}$). However, the length of the “PU Busy” period depends on the secondary user control decisions, as discussed below.

In any slot $t$, if the primary user has a non-zero queue backlog, it transmits one packet to its base station. We assume that the transmission of each packet takes one slot. If the transmission is successful, the packet is removed from the primary user queue. However, if the transmission fails, the packet is retained in the queue for future retransmissions. The secondary user cannot transmit its packets when the channel is being used by the primary user. It can transmit its packets only during the “PU Idle” period of the frame and must stop its transmission whenever the primary user becomes active again. However, the secondary user can transmit cooperatively with the primary user in the “PU Busy” period to increase the primary user's transmission success probability. This has the effect of decreasing the expected length of the “PU Busy” period. In order to cooperate, the secondary user must allocate its power resources to help relay the primary user packet. This cooperation can take place in several ways depending on the cooperative protocol being used (see [12] for some examples). In this simple model, these details are captured by the resulting probability of successful transmission.

[Fig. 2 (timeline): frame start times t[1], t[2], t[3], t[4] and frame lengths T[1], T[2], T[3]; each frame consists of a PU Idle period followed by a PU Busy period.]

Fig. 2. Frame-based structure of the problem under consideration. Each frame consists of two periods: PU Idle and PU Busy.

The secondary user may want to cooperate because this can potentially increase the number of future time slots in which the primary user does not have any data to send, compared to a non-cooperative strategy. This can create more opportunities for the secondary user to transmit its own packets. However, the trivial strategy of cooperating whenever possible may lead to a scenario where the secondary user does not have enough power for its own data transmission. Thus, the secondary user needs to decide whether or not to cooperate by weighing these two opposing factors.

The probability of a successful primary transmission depends on the control actions of the secondary user, such as its power allocation and cooperative transmission decisions. This is discussed in detail in the next subsection. In this model, we assume that the network controller cannot control the primary user actions. However, it can control the secondary user decisions on cooperation and the associated power allocation.

A. Control Decisions and Queueing Dynamics

Let $Q_{pu}(t), Q_{su}(t) \in \{0, 1, 2, \ldots\}$ represent the primary and secondary user queues, respectively, in slot $t$. New packets arrive at the secondary user according to an i.i.d. process $A_{su}(t)$ of rate $\lambda_{su}$ packets/slot. We assume that there exists a finite constant $A_{max}$ such that $A_{su}(t) \le A_{max}$ for all $t$. Every slot, an admission control decision determines $R_{su}(t)$, the number of new packets to admit into the secondary user queue. Further, every slot, depending on whether the primary user is busy or idle, resource allocation decisions are made as follows. When $Q_{pu}(t) > 0$, the decision concerns cooperative transmission and the corresponding power allocation $P_{su}(t)$. When $Q_{pu}(t) = 0$, it concerns the secondary user's own transmission and the corresponding power allocation $P_{su}(t)$. We assume that in each slot, the secondary user can choose its power allocation $P_{su}(t)$ from a set $\mathcal{P}$ of possible options. Further, this power allocation is subject to a long-term average power constraint $P_{avg}$ and an instantaneous peak power constraint $P_{max}$. For example, $\mathcal{P}$ may contain only two options $\{0, P_{max}\}$, which represent “Remain Idle” and “Cooperate/Transmit at Full Power”. As another example, $\mathcal{P} = [0, P_{max}]$, so that $P_{su}(t)$ can take any value between $0$ and $P_{max}$.

Suppose the primary user is active in slot $t$ and the secondary user allocates power $P(t)$ for cooperative transmission. Then the random success/failure outcome of the primary transmission is given by an indicator variable $\mu_{pu}(P(t))$, and the success probability is given by $\phi(P(t)) = \mathbb{E}\{\mu_{pu}(P(t))\}$. The function $\phi(P)$ is known to the network controller and is assumed to be non-decreasing in $P$. However, the value of the random outcome $\mu_{pu}(P(t))$ may not be known beforehand. Note that setting $P(t) = 0$ corresponds to a non-cooperative transmission; the success probability in this case is $\phi(0)$, which we denote by $\phi_{nc}$. Likewise, we denote $\phi(P_{max})$ by $\phi_c$. Thus, $\phi_{nc} \le \phi(P(t)) \le \phi_c$ for all $P(t) \in \mathcal{P}$.

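We assume that $\lambda_{pu}$ is such that it can be supported even when the secondary user never cooperates, i.e., $\lambda_{pu} < \phi_{nc}$. This means that the primary user queue is stable even if there is no cooperation. Further, for all $k$, the frame length $T[k] \ge 1$, and there exist finite constants $T_{min}, T_{max}$ such that under all control policies, we have:

$$1 \le T_{min} \le \mathbb{E}\{T[k]\} \le T_{max}$$

Specifically, $T_{min}$ can be chosen to be the expected frame length when the secondary user always cooperates with full power, while $T_{max}$ can be chosen to be the expected frame length when the secondary user never cooperates. Since the “PU Idle” period has mean $1/\lambda_{pu}$, the expected “PU Busy” period in these two cases is $T_{min} - 1/\lambda_{pu}$ and $T_{max} - 1/\lambda_{pu}$, respectively. Using Little's Theorem, the fraction of time the primary user is busy equals $\lambda_{pu}/\phi_c$ under full cooperation, so that:

$$\frac{T_{min} - 1/\lambda_{pu}}{T_{min}} = \frac{\lambda_{pu}}{\phi_c}$$

Similarly, we have:

$$\frac{T_{max} - 1/\lambda_{pu}}{T_{max}} = \frac{\lambda_{pu}}{\phi_{nc}}$$

Using these, we have:

$$T_{min} \triangleq \frac{\phi_c}{(\phi_c - \lambda_{pu})\lambda_{pu}}, \qquad T_{max} \triangleq \frac{\phi_{nc}}{(\phi_{nc} - \lambda_{pu})\lambda_{pu}} \tag{1}$$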
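The closed forms in (1) can be checked numerically. The following Python sketch (standard library only) evaluates $\phi/((\phi - \lambda_{pu})\lambda_{pu})$ and compares it with a Monte Carlo estimate obtained by simulating one “PU Idle” period followed by one “PU Busy” period, assuming a constant per-slot success probability $\phi$ in every busy slot. The helper names `expected_frame_length` and `simulate_frame` and the values $\lambda_{pu} = 0.3$, $\phi_{nc} = 0.5$, $\phi_c = 0.8$ are illustrative assumptions, not part of the model.

```python
import random


def expected_frame_length(lam_pu: float, phi: float) -> float:
    """Expected frame length phi / ((phi - lam_pu) * lam_pu), as in (1)."""
    if not 0.0 < lam_pu < phi <= 1.0:
        raise ValueError("need 0 < lam_pu < phi <= 1 for a stable primary queue")
    return phi / ((phi - lam_pu) * lam_pu)


def simulate_frame(lam_pu: float, phi: float, rng: random.Random) -> int:
    """Length in slots of one simulated frame with constant success prob. phi."""
    slots, q_pu = 0, 0
    # "PU Idle" period: ends with the slot in which the first packet arrives.
    while q_pu == 0:
        slots += 1
        if rng.random() < lam_pu:
            q_pu = 1
    # "PU Busy" period: one transmission per slot until the queue empties, eq. (3).
    while q_pu > 0:
        slots += 1
        if rng.random() < phi:
            q_pu -= 1
        if rng.random() < lam_pu:
            q_pu += 1
    return slots


lam_pu, phi_nc, phi_c = 0.3, 0.5, 0.8           # illustrative values (assumption)
T_min = expected_frame_length(lam_pu, phi_c)    # always cooperate at P_max
T_max = expected_frame_length(lam_pu, phi_nc)   # never cooperate

rng = random.Random(1)
n_frames = 50_000
empirical = sum(simulate_frame(lam_pu, phi_c, rng) for _ in range(n_frames)) / n_frames
print(f"T_min = {T_min:.3f}, simulated = {empirical:.3f}, T_max = {T_max:.3f}")
```

With these values, $T_{min} = 0.8/(0.5 \cdot 0.3) \approx 5.33$ and $T_{max} = 0.5/(0.2 \cdot 0.3) \approx 8.33$ slots, and the simulated mean should land close to $T_{min}$.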

Finally, there exists a finite constant $D$ such that the second moment of the frame size, $\mathbb{E}\{T^2[k]\}$, satisfies the following for all $k$, regardless of the policy:

$$\mathbb{E}\{T^2[k]\} \le D \tag{2}$$

This follows from the assumption that the primary user queue is stable even if there is no cooperation. In Appendix C, we exactly compute such a $D$ that satisfies (2).

When the primary user is idle in slot $t$ and the secondary user allocates power $P(t)$ for its own transmission, it gets a service rate given by $\mu_{su}(P(t))$. This can represent the success probability of a secondary transmission with a Bernoulli service process, but it can also be used to model more general service processes. We assume that there exists a finite constant $\mu_{max}$ such that $\mu_{su}(P) \le \mu_{max}$ for all $P \in \mathcal{P}$.

Given these control decisions, the primary and secondary user queues evolve as follows:

$$Q_{pu}(t+1) = \max[Q_{pu}(t) - \mu_{pu}(P(t)), 0] + A_{pu}(t) \tag{3}$$

$$Q_{su}(t+1) = \max[Q_{su}(t) - \mu_{su}(P(t)), 0] + R_{su}(t) \tag{4}$$

where $R_{su}(t) \le A_{su}(t)$.
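A direct transcription of (3) and (4) as a single-slot update is sketched below in Python. The function name `slot_update` and its argument names are hypothetical; the guard that zeroes the secondary service while $Q_{pu}(t) > 0$ encodes the rule of the basic model that the secondary user may transmit its own data only in “PU Idle” slots.

```python
def slot_update(q_pu: int, q_su: int, a_pu: int, r_su: int,
                mu_pu: int, mu_su: int) -> tuple[int, int]:
    """Return (Q_pu(t+1), Q_su(t+1)) from eqs. (3) and (4).

    mu_pu: 0/1 outcome of the primary transmission (irrelevant if q_pu == 0).
    mu_su: secondary service offered in this slot.
    r_su:  admitted secondary packets; the caller must ensure r_su <= A_su(t).
    """
    if q_pu > 0:
        # PU busy: the SU power is spent on cooperation, so no own service.
        mu_su = 0
    next_q_pu = max(q_pu - mu_pu, 0) + a_pu
    next_q_su = max(q_su - mu_su, 0) + r_su
    return next_q_pu, next_q_su


# PU busy, its packet succeeds, one new PU arrival, SU admits two packets.
print(slot_update(q_pu=2, q_su=5, a_pu=1, r_su=2, mu_pu=1, mu_su=1))  # (2, 7)
# PU idle: the SU serves one of its own packets and admits one new packet.
print(slot_update(q_pu=0, q_su=5, a_pu=0, r_su=1, mu_pu=0, mu_su=1))  # (0, 5)
```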

B. Control Objective

Consider any control algorithm that makes an admission control decision $R_{su}(t)$ and a power allocation $P(t)$ every slot, subject to the constraints described in Sec. II-A. Note that if the primary queue backlog $Q_{pu}(t) > 0$, then this power is used for cooperative transmission with the primary user. If $Q_{pu}(t) = 0$, then this power is used for the secondary user's own transmission. Define the following time-averages under this algorithm:

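$$\overline{R}_{su} \triangleq \lim_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} \mathbb{E}\{R_{su}(\tau)\}$$

$$\overline{P}_{su} \triangleq \lim_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} \mathbb{E}\{P(\tau)\}$$

$$\overline{\mu}_{su} \triangleq \lim_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} \mathbb{E}\{\mu_{su}(P(\tau))\}$$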

where the expectations above are with respect to the potential randomness of the control algorithm. Assuming for the time being that these limits exist, our goal is to design a joint admission control and power allocation policy that maximizes the throughput of the secondary user subject to its average and peak power constraints and the scheduling constraints imposed by the basic model. Formally, this can be stated as the following stochastic optimization problem:

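$$\begin{aligned}
\text{Maximize:} \quad & \overline{R}_{su} \\
\text{Subject to:} \quad & 0 \le R_{su}(t) \le A_{su}(t) \quad \forall t \\
& P(t) \in \mathcal{P} \quad \forall t \\
& \overline{R}_{su} \le \overline{\mu}_{su} \\
& \overline{P}_{su} \le P_{avg}
\end{aligned} \tag{5}$$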

It will be useful to define the primary queue backlog $Q_{pu}(t)$ as the “state” for this control problem, because the state of this queue (zero or nonzero) determines the available control options, as described before. Note that the control decisions on cooperation affect the dynamics of this queue. Therefore, problem (5) is an instance of a constrained Markov decision problem [24]. It is well known that in order to obtain an optimal control policy, it is sufficient to consider only the class of stationary, randomized policies that take control actions only as a function of the current system state (and independently of past history). A general control policy in this class is characterized by a stationary probability distribution over the control action set for each system state. Let $\upsilon^*$ denote the optimal value of the objective in (5). Then, using standard results on constrained Markov Decision problems [24]–[26], we have the following:

Lemma 1 (Optimal Stationary, Randomized Policy): There exists a stationary, randomized policy STAT that takes control decisions $R^{stat}_{su}(t)$, $P^{stat}_{su}(t)$ every slot purely as a (possibly randomized) function of the current state $Q_{pu}(t)$ while satisfying the constraints $R^{stat}_{su}(t) \le A_{su}(t)$, $P^{stat}_{su}(t) \in \mathcal{P}$ for all $t$, and provides the following guarantees:

$$\overline{R}^{stat}_{su} = \upsilon^* \tag{6}$$

$$\overline{R}^{stat}_{su} \le \overline{\mu}^{stat}_{su} \tag{7}$$

$$\overline{P}^{stat}_{su} \le P_{avg} \tag{8}$$

where $\overline{R}^{stat}_{su}$, $\overline{\mu}^{stat}_{su}$, $\overline{P}^{stat}_{su}$ denote the time-averages under this policy.

We note that the conventional techniques to solve (5) that are based on dynamic programming [23] require either extensive knowledge of the system dynamics or learning-based approaches that suffer from large convergence times. Motivated by the recently developed extension of the Lyapunov optimization technique in [19]–[21], we take a different approach to this problem in the next section.

III. SOLUTION USING THE “DRIFT-PLUS-PENALTY” RATIO METHOD

Recall that the start of the $k$th frame, $t_k$, is defined as the first slot in which the primary user becomes idle after the “PU Busy” period of the $(k-1)$th frame. Let $Q_{su}(t_k)$ denote the secondary user queue backlog at time $t_k$. Also, let $P(t)$ be the power expenditure incurred by the secondary user in slot $t$. For notational convenience, in the following we denote $\mu_{su}(P(t))$ by $\mu_{su}(t)$, the dependence on $P(t)$ being implicit. Then the queueing dynamics of $Q_{su}(t_k)$ satisfy the following:

$$Q_{su}(t_{k+1}) \le \max\Big[Q_{su}(t_k) - \sum_{t=t_k}^{t_{k+1}-1} \mu_{su}(t),\, 0\Big] + \sum_{t=t_k}^{t_{k+1}-1} R_{su}(t) \tag{9}$$

where $R_{su}(t)$ denotes the number of new packets admitted in slot $t$ and $t_{k+1}$ denotes the start of the $(k+1)$th frame. The above expression is an inequality because packets admitted in the $k$th frame may be served during that frame itself.

In order to meet the time average power constraint, we make use of a virtual power queue $X_{su}(t_k)$ [22], which evolves over frames as follows:

$$X_{su}(t_{k+1}) = \max\Big[X_{su}(t_k) - T[k]P_{avg} + \sum_{t=t_k}^{t_{k+1}-1} P(t),\, 0\Big] \tag{10}$$

where $T[k] = t_{k+1} - t_k$ is the length of the $k$th frame. Recall that $T[k]$ is a (random) function of the control decisions taken during the $k$th frame.
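The frame-level recursions (9) and (10) can be sketched as follows in Python; all function names are hypothetical. The sketch also replays the per-slot recursion (4) over the same frame, illustrating why (9) is only an upper bound: packets admitted early in the frame can already be served before $t_{k+1}$.

```python
from typing import Sequence


def virtual_power_queue_update(x_su: float, powers: Sequence[float],
                               p_avg: float) -> float:
    """X_su(t_{k+1}) from X_su(t_k), eq. (10); len(powers) is T[k]."""
    return max(x_su - len(powers) * p_avg + sum(powers), 0.0)


def su_queue_bound(q_su: float, services: Sequence[float],
                   admitted: Sequence[float]) -> float:
    """Right-hand side of (9): an upper bound on Q_su(t_{k+1})."""
    return max(q_su - sum(services), 0.0) + sum(admitted)


def su_queue_per_slot(q_su: float, services: Sequence[float],
                      admitted: Sequence[float]) -> float:
    """Exact Q_su(t_{k+1}) obtained by applying (4) slot by slot."""
    for mu, r in zip(services, admitted):
        q_su = max(q_su - mu, 0.0) + r
    return q_su


services = [1, 1, 1, 1]   # mu_su(t) over a 4-slot frame (illustrative)
admitted = [2, 0, 0, 0]   # R_su(t): two packets admitted in the first slot
print(su_queue_per_slot(3, services, admitted), su_queue_bound(3, services, admitted))
print(virtual_power_queue_update(2.0, [1.0, 0.0, 1.0], p_avg=0.5))
```

With $Q_{su}(t_k) = 3$, unit service in each of four slots and two packets admitted in the first slot, the per-slot recursion ends the frame with 1 packet, while the bound (9) gives 2.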


In order to construct an optimal dynamic control policy, we use the technique of [19]–[21], in which a ratio of “drift-plus-penalty” to the expected frame length is minimized over every frame. Specifically, let $Q(t_k) = (Q_{su}(t_k), X_{su}(t_k))$ denote the queueing state of the system at the start of the $k$th frame. As a measure of the congestion in the system, we use the Lyapunov function $L(Q(t_k)) \triangleq \frac{1}{2}[Q_{su}^2(t_k) + X_{su}^2(t_k)]$. Define the drift $\Delta(t_k)$ as the conditional expected change in $L(Q(t_k))$ over frame $k$:

$$\Delta(t_k) \triangleq \mathbb{E}\{L(Q(t_{k+1})) - L(Q(t_k)) \mid Q(t_k)\} \tag{11}$$

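Then, using (9) and (10), we can bound $\Delta(t_k)$ as follows:

$$\Delta(t_k) \le B - Q_{su}(t_k)\,\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}[\mu_{su}(t) - R_{su}(t)] \,\Big|\, Q(t_k)\Big\} - X_{su}(t_k)\,\mathbb{E}\Big\{T[k]P_{avg} - \sum_{t=t_k}^{t_{k+1}-1}P(t) \,\Big|\, Q(t_k)\Big\} \tag{12}$$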

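where $B$ is a finite constant that satisfies the following for all $k$ and $Q(t_k)$ under any control algorithm:

$$B \ge \frac{1}{2}\mathbb{E}\Big\{\Big(\sum_{t=t_k}^{t_{k+1}-1}\mu_{su}(t)\Big)^2 + \Big(\sum_{t=t_k}^{t_{k+1}-1}R_{su}(t)\Big)^2 + \Big(\sum_{t=t_k}^{t_{k+1}-1}P(t) - T[k]P_{avg}\Big)^2 \,\Big|\, Q(t_k)\Big\}$$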

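Using the facts that $\mu_{su}(t) \le \mu_{max}$, $R_{su}(t) \le A_{max}$, and $0 \le P(t) \le P_{max}$ for all $t$ (so that $|\sum_{t=t_k}^{t_{k+1}-1}P(t) - T[k]P_{avg}| \le T[k]\max[P_{avg}, P_{max} - P_{avg}]$), together with (2), it follows that choosing $B$ as follows satisfies the above:

$$B = \frac{D\big[\mu_{max}^2 + A_{max}^2 + \max[P_{avg}^2, (P_{max} - P_{avg})^2]\big]}{2} \tag{13}$$

which reduces to $D[\mu_{max}^2 + A_{max}^2 + (P_{max} - P_{avg})^2]/2$ whenever $P_{avg} \le P_{max}/2$.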

Adding a penalty term $-V\mathbb{E}\{\sum_{t=t_k}^{t_{k+1}-1}R_{su}(t) \mid Q(t_k)\}$ (where $V > 0$ is a control parameter that affects a utility-delay trade-off, as shown in Theorem 1) to both sides and rearranging yields:

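$$\begin{aligned}
\Delta(t_k) - V\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R_{su}(t) \,\Big|\, Q(t_k)\Big\} \le\; & B + (Q_{su}(t_k) - V)\,\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R_{su}(t) \,\Big|\, Q(t_k)\Big\} - X_{su}(t_k)\,\mathbb{E}\{T[k]P_{avg} \mid Q(t_k)\} \\
& - \mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}\big(Q_{su}(t_k)\mu_{su}(t) - X_{su}(t_k)P(t)\big) \,\Big|\, Q(t_k)\Big\}
\end{aligned} \tag{14}$$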

Minimizing the ratio of an upper bound on the right-hand side of the above expression to the expected frame length over all control options leads to the following Frame-Based-Drift-Plus-Penalty-Algorithm. In each frame $k \in \{1, 2, 3, \ldots\}$, do the following:

1) Admission Control: For all $t \in \{t_k, t_k + 1, \ldots, t_{k+1} - 1\}$, choose $R_{su}(t)$ as follows:

$$R_{su}(t) = \begin{cases} A_{su}(t) & \text{if } Q_{su}(t) \le V \\ 0 & \text{else} \end{cases} \tag{15}$$

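2) Resource Allocation: Choose a policy that maximizes the following ratio:

$$\frac{\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}\big(Q_{su}(t_k)\mu_{su}(t) - X_{su}(t_k)P(t)\big) \,\Big|\, Q(t_k)\Big\}}{\mathbb{E}\{T[k] \mid Q(t_k)\}} \tag{16}$$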

Specifically, in every slot $t$ of the frame, the policy observes the queue values $Q_{su}(t_k)$ and $X_{su}(t_k)$ at the beginning of the frame and selects a secondary user power $P(t)$ subject to the constraint $P(t) \in \mathcal{P}$ and the constraint on transmitting its own data vs. cooperating, depending on whether slot $t$ is in the “PU Idle” or “PU Busy” period of the frame. This is done in such a way that the above frame-based ratio of expectations is maximized. Recall that the frame size $T[k]$ is influenced by the policy through the success probabilities that are determined by the secondary user power selections. Further, recall that these success probabilities are different during the “PU Idle” and “PU Busy” periods of the frame. An explicit policy that maximizes this ratio is given in the next section.

3) Queue Update: After implementing this policy, update the queues as in (4) and (10).

From the above, it can be seen that the admission control part (15) is a simple threshold-based decision that does not require any knowledge of the arrival rates $\lambda_{su}$ or $\lambda_{pu}$. In the next section, we present an explicit solution to the maximizing policy for the resource allocation in (16) and show that, remarkably, it also does not require knowledge of $\lambda_{su}$ or $\lambda_{pu}$ and can be computed easily. We then analyze the performance of the Frame-Based-Drift-Plus-Penalty-Algorithm in Sec. V.

IV. THE MAXIMIZING POLICY OF (16)

The policy that maximizes (16) uses only two numbers, which we call $P^*_0$ and $P^*_1$, defined as follows. $P^*_0$ is given by the solution to the following optimization problem:

$$\begin{aligned}
\text{Maximize:} \quad & Q_{su}(t_k)\mu_{su}(P_0) - X_{su}(t_k)P_0 \\
\text{Subject to:} \quad & P_0 \in \mathcal{P}
\end{aligned} \tag{17}$$

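Let $\theta^* \triangleq Q_{su}(t_k)\mu_{su}(P^*_0) - X_{su}(t_k)P^*_0$ denote the value of the objective of (17) under the optimal solution. Then, $P^*_1$ is given by the solution to the following optimization problem:

$$\begin{aligned}
\text{Minimize:} \quad & \frac{\theta^* + X_{su}(t_k)P_1}{\phi(P_1)} \\
\text{Subject to:} \quad & P_1 \in \mathcal{P}
\end{aligned} \tag{18}$$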
Note that both (17) and (18) are simple optimization problems in a single variable and can be solved efficiently. Given $P^*_0$ and $P^*_1$, on every slot $t$ of frame $k$, the policy that maximizes (16) chooses power $P(t)$ as follows:

$$P(t) = \begin{cases} P^*_0 & \text{if } Q_{pu}(t) = 0 \\ P^*_1 & \text{if } Q_{pu}(t) > 0 \end{cases} \tag{19}$$

That is, the secondary user uses the constant power $P^*_0$ for its own transmission during the “PU Idle” period of the frame, and uses the constant power $P^*_1$ for cooperative transmission during all slots of the “PU Busy” period of the frame. Note that $P^*_0$ and $P^*_1$ can be computed easily based on the weights $Q_{su}(t_k)$, $X_{su}(t_k)$ associated with frame $k$, and do not require knowledge of the arrival rates $\lambda_{su}$, $\lambda_{pu}$.
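When $\mathcal{P}$ is finite, (17), (18), and (19) reduce to enumeration. The Python sketch below assumes that $\mu_{su}(\cdot)$ and $\phi(\cdot)$ are available as callables with $\phi(P) > 0$ (which holds here since $\phi(P) \ge \phi_{nc} > \lambda_{pu}$). The function names and the two-point example ($P_{max} = 1$, $\mu_{su}(P) = 0.9P$, $\phi(P) = 0.5 + 0.3P$) are illustrative assumptions, not values from the model.

```python
from typing import Callable, Sequence


def optimal_powers(q_su: float, x_su: float, power_set: Sequence[float],
                   mu_su: Callable[[float], float],
                   phi: Callable[[float], float]) -> tuple[float, float]:
    """Return (P0*, P1*) for one frame by enumerating a finite power set.

    q_su, x_su: the weights Q_su(t_k) and X_su(t_k) observed at the frame start.
    mu_su:      SU service rate during "PU Idle" slots.
    phi:        PU success probability during "PU Busy" slots (must be > 0).
    """
    # (17): maximize Q_su * mu_su(P0) - X_su * P0 over the power set.
    p0 = max(power_set, key=lambda p: q_su * mu_su(p) - x_su * p)
    theta = q_su * mu_su(p0) - x_su * p0
    # (18): minimize (theta + X_su * P1) / phi(P1) over the power set.
    p1 = min(power_set, key=lambda p: (theta + x_su * p) / phi(p))
    return p0, p1


def power_for_slot(q_pu: int, p0: float, p1: float) -> float:
    """Rule (19): P0* in "PU Idle" slots, P1* in "PU Busy" slots."""
    return p0 if q_pu == 0 else p1


# Illustrative two-point example: P = {0, P_max} with P_max = 1 (assumption).
power_set = [0.0, 1.0]


def example_mu_su(p: float) -> float:
    return 0.9 * p          # hypothetical SU service rate


def example_phi(p: float) -> float:
    return 0.5 + 0.3 * p    # hypothetical phi_nc = 0.5, phi_c = 0.8


print(optimal_powers(10, 2, power_set, example_mu_su, example_phi))  # (1.0, 1.0)
print(optimal_powers(10, 8, power_set, example_mu_su, example_phi))  # (1.0, 0.0)
```

In the second call the larger power-queue weight $X_{su}(t_k) = 8$ makes cooperation too expensive, so the secondary user still transmits its own data at full power but stops cooperating. For a continuous set such as $\mathcal{P} = [0, P_{max}]$, the same enumeration could be run over a fine grid of powers, at the cost of a discretization error; neither function needs $\lambda_{pu}$ or $\lambda_{su}$.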

Our proof that the above decisions maximize (16) has the following parts. First, we show that the decisions that maximize the ratio of expectations in (16) are the same as the optimal decisions in an equivalent infinite horizon Markov decision problem (MDP). Next, we show that the solution to the infinite horizon MDP uses a fixed power $P_i$ for each queue state $Q_{pu}(t) = i$ (for $i \in \{0, 1, 2, \ldots\}$). Then, we show that the $P_i$ are the same for all $i \ge 1$. Finally, we show that the optimal powers $P^*_0$ and $P^*_1$ are given as above. The detailed proof is given in the next subsection.

A. Proof Details

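Recall that the Frame-Based-Drift-Plus-Penalty-Algorithm chooses a policy that maximizes the following ratio over every frame $k \in \{1, 2, 3, \ldots\}$:

$$\frac{\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}\big(Q_{su}(t_k)\mu_{su}(t) - X_{su}(t_k)P(t)\big) \,\Big|\, Q(t_k)\Big\}}{\mathbb{E}\{T[k] \mid Q(t_k)\}} \tag{20}$$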

subject to the constraints described in Sec. II. Here we examine how to solve (20) in detail. First, define the state $i$ in any slot $t \in \{t_k, t_k + 1, \ldots, t_{k+1} - 1\}$ as the value of the primary user queue backlog $Q_{pu}(t)$ in that slot. Now let $\mathcal{R}$ denote the class of stationary, randomized policies, where every policy $r \in \mathcal{R}$ chooses a power allocation $P_i(r) \in \mathcal{P}$ in each state $i$ according to a stationary distribution. It can be shown that it is sufficient to consider only policies in $\mathcal{R}$ to maximize (20). Now suppose a policy $r \in \mathcal{R}$ is implemented on a recurrent system with fixed $Q_{su}(t_k)$ and $X_{su}(t_k)$ and with the same state dynamics as our model. Note that $\mu_{su}(t) = 0$ for all $t$ when the state $i \ge 1$. Then, by basic renewal theory [27], maximizing the ratio in (20) is equivalent to the following optimization problem:

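$$\begin{aligned}
\text{Maximize:} \quad & Q_{su}(t_k)\mathbb{E}\{\mu_{su}(P_0(r))\}\pi_0(r) - X_{su}(t_k)\sum_{i\ge0}\mathbb{E}\{P_i(r)\}\pi_i(r) \\
\text{Subject to:} \quad & r \in \mathcal{R}
\end{aligned} \tag{21}$$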

where $\pi_i(r)$ is the resulting steady-state probability of being in state $i$ in the recurrent system under the stationary, randomized policy $r$, and the expectations above are with respect to $r$. Note that well-defined steady-state probabilities $\pi_i(r)$ exist for all $r \in \mathcal{R}$ because we have assumed that $\lambda_{pu} < \phi_{nc}$, so that even if no cooperation is used, the primary queue is stable and the system is recurrent. Thus, solving (20) is equivalent to solving the unconstrained time average maximization problem (21) over the class of stationary, randomized policies. Note that (21) is an infinite horizon Markov decision problem (MDP) over the state space $i \in \{0, 1, 2, \ldots\}$. We study this problem in the following.
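The equivalence between (20) and (21) rests on the renewal-reward theorem. As a rough numerical illustration, the Python sketch below evaluates (21) in closed form for a fixed-power policy, using $\pi_0 = 1 - \lambda_{pu}/\phi(P_1)$, and compares it with a ratio of sample means over simulated frames. It assumes a constant cooperative success probability `phi1` $= \phi(P_1)$ and treats `mu0` $= \mu_{su}(P_0)$ as the expected secondary service per idle slot; the function names and parameter values are hypothetical.

```python
import random


def frame_ratio_closed_form(q_su, x_su, lam_pu, mu0, p0, p1, phi1):
    """Objective of (21) for fixed powers P0 (idle slots) and P1 (busy slots)."""
    pi0 = 1.0 - lam_pu / phi1          # steady-state fraction of "PU Idle" slots
    return (q_su * mu0 - x_su * p0) * pi0 - x_su * p1 * (1.0 - pi0)


def frame_ratio_simulated(q_su, x_su, lam_pu, mu0, p0, p1, phi1,
                          n_frames, seed=0):
    """Ratio of sample means for (20), over n_frames simulated frames."""
    rng = random.Random(seed)
    reward, slots = 0.0, 0
    for _ in range(n_frames):
        q_pu = 0
        while q_pu == 0:   # "PU Idle" slots: SU sends its own data at power p0
            reward += q_su * mu0 - x_su * p0
            slots += 1
            if rng.random() < lam_pu:
                q_pu = 1
        while q_pu > 0:    # "PU Busy" slots: SU cooperates at power p1
            reward -= x_su * p1
            slots += 1
            if rng.random() < phi1:
                q_pu -= 1
            if rng.random() < lam_pu:
                q_pu += 1
    return reward / slots


exact = frame_ratio_closed_form(10, 2, 0.3, 0.9, 1.0, 1.0, 0.8)
estimate = frame_ratio_simulated(10, 2, 0.3, 0.9, 1.0, 1.0, 0.8, n_frames=50_000)
print(f"closed form {exact:.4f}, simulated {estimate:.4f}")
```

Here $\pi_0 = 1 - 0.3/0.8 = 0.625$, so the closed form equals $7 \cdot 0.625 - 2 \cdot 0.375 = 3.625$; the simulated ratio should agree up to Monte Carlo error.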

[Fig. 3 (state diagram): states 0, 1, 2, ..., i, i+1, ... of the primary user queue, with transition labels λpu, 1−λpu, λpu(1−μi), and (1−λpu)μi+1.]

Fig. 3. Birth-Death Markov Chain over the system state, where the system state represents the primary user queue backlog.

Consider the optimal stationary, randomized policy that maximizes the objective in (21). Let $\chi_i$ denote the probability distribution over $\mathcal{P}$ that is used by this policy to choose a power allocation $P_i$ in state $i$. Let $\mu_i$ denote the resulting effective probability of successful primary transmission in state $i \ge 1$. Then we have that $\mu_i = \mathbb{E}_{\chi_i}\{\phi(P_i)\}$, where $\phi(P_i)$ denotes the probability of successful transmission in state $i$ when the secondary user spends power $P_i$ in cooperative transmission with the primary user. Since the system is stable and has a well-defined steady-state distribution, we can write down the detailed balance equations for the Markov Chain that describes the state transitions of the system as follows (see Fig. 3):

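$$\pi_0\lambda_{pu} = \pi_1(1-\lambda_{pu})\mu_1$$

$$\pi_1\lambda_{pu}(1-\mu_1) = \pi_2(1-\lambda_{pu})\mu_2$$

$$\vdots$$

$$\pi_i\lambda_{pu}(1-\mu_i) = \pi_{i+1}(1-\lambda_{pu})\mu_{i+1} \quad \forall i \ge 1$$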

where $\pi_i$ denotes the steady-state probability of being in state $i$ under this policy. Summing over all $i$ and using $\sum_{i\ge0}\pi_i = 1$ yields:

$$\lambda_{pu} = \sum_{i\ge1}\pi_i\mu_i \tag{22}$$
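Identity (22) states that, in steady state, the primary departure rate equals its arrival rate. A small Python sketch can confirm it for arbitrary state-dependent $\mu_i$ by building the stationary distribution of the chain in Fig. 3 from the detailed balance equations above. The chain is truncated to a finite number of states (an approximation that is accurate when the tail probabilities are negligible), and the function name and example values are assumptions.

```python
from typing import Sequence


def stationary_distribution(lam_pu: float, mu: Sequence[float]) -> list[float]:
    """Stationary distribution of the Fig. 3 chain, truncated to len(mu) states.

    mu[i] is the effective primary success probability in state i >= 1;
    mu[0] is unused because no primary transmission occurs in state 0.
    """
    # Unnormalized weights from the detailed balance equations:
    #   pi_1 (1 - lam) mu_1         = pi_0 lam
    #   pi_{i+1} (1 - lam) mu_{i+1} = pi_i lam (1 - mu_i),  i >= 1
    weights = [1.0, lam_pu / ((1 - lam_pu) * mu[1])]
    for i in range(1, len(mu) - 1):
        weights.append(weights[i] * lam_pu * (1 - mu[i])
                       / ((1 - lam_pu) * mu[i + 1]))
    total = sum(weights)
    return [w / total for w in weights]


lam_pu = 0.3
# Illustrative state-dependent success probabilities (assumption).
mu = [0.0] + [0.9 if i % 2 else 0.6 for i in range(1, 400)]
pi = stationary_distribution(lam_pu, mu)
departure_rate = sum(p_i * mu_i for p_i, mu_i in zip(pi, mu))
print(f"lambda_pu = {lam_pu}, sum_i pi_i mu_i = {departure_rate:.9f}")  # (22)
```

For a constant $\mu_i = \mu$, the same function also reproduces $\pi_0 = 1 - \lambda_{pu}/\mu$, which is the relation used later when applying Little's Theorem.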

The average power incurred in cooperative transmissions under this policy is given by:

$$\overline{P} = \sum_{i\ge1}\pi_i\mathbb{E}_{\chi_i}\{P_i\} \tag{23}$$

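Now consider an alternate stationary policy that uses the following fixed distribution $\chi'$ for choosing the control action $P'$ in all states $i \ge 1$:

$$\chi' \triangleq \begin{cases}
\chi_1 & \text{with probability } \frac{\pi_1}{\sum_{j\ge1}\pi_j} \\
\chi_2 & \text{with probability } \frac{\pi_2}{\sum_{j\ge1}\pi_j} \\
\quad\vdots & \\
\chi_i & \text{with probability } \frac{\pi_i}{\sum_{j\ge1}\pi_j} \\
\quad\vdots &
\end{cases} \tag{24}$$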

Let $\mu'$ denote the resulting effective probability of a successful primary transmission in any state $i \ge 1$. Note that this is the same for all such states by the definition (24). Then, we have that:

$$\mu' = \sum_{i\ge1}\mu_i\frac{\pi_i}{\sum_{j\ge1}\pi_j} \tag{25}$$
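The averaging construction (24) and (25) can likewise be checked numerically: replacing a state-dependent policy by its $\pi$-weighted mixture should leave both $\pi_0$ and the average cooperative power unchanged, as shown in (26) and (27). The Python sketch below uses a truncated version of the chain in Fig. 3 and a hypothetical policy that alternates between mean powers 1.0 and 0.2 (with success probabilities 0.9 and 0.6) in odd and even busy states; all names and values are illustrative.

```python
from typing import Sequence


def stationary(lam_pu: float, mu: Sequence[float]) -> list[float]:
    """Truncated stationary distribution of the Fig. 3 chain (mu[0] unused)."""
    w = [1.0, lam_pu / ((1 - lam_pu) * mu[1])]
    for i in range(1, len(mu) - 1):
        w.append(w[i] * lam_pu * (1 - mu[i]) / ((1 - lam_pu) * mu[i + 1]))
    total = sum(w)
    return [x / total for x in w]


lam_pu, n_states = 0.3, 400
# Hypothetical policy: mean cooperative power p[i] and resulting success
# probability mu[i] in busy state i (index 0 is the idle state, unused here).
p = [0.0] + [1.0 if i % 2 else 0.2 for i in range(1, n_states)]
mu = [0.0] + [0.9 if i % 2 else 0.6 for i in range(1, n_states)]

pi = stationary(lam_pu, mu)
busy = sum(pi[1:])
mu_mix = sum(mu[i] * pi[i] for i in range(1, n_states)) / busy   # (25)
p_mix = sum(p[i] * pi[i] for i in range(1, n_states)) / busy     # E_{chi'}{P'}

# Alternate policy (24): the same mixture chi' in every busy state.
pi_alt = stationary(lam_pu, [0.0] + [mu_mix] * (n_states - 1))
p_bar = sum(p[i] * pi[i] for i in range(1, n_states))            # (23)
p_bar_alt = p_mix * sum(pi_alt[1:])                              # (27)

print(f"pi_0 = {pi[0]:.6f}, pi'_0 = {pi_alt[0]:.6f}")
print(f"P_bar = {p_bar:.6f}, P_bar' = {p_bar_alt:.6f}")
```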

Let $\pi'_i$ denote the steady-state probability of being in state $i$ under this alternate policy. Note that the system is stable under this alternate policy as well. Thus, using the detailed balance equations for the Markov Chain that describes the state transitions of the system under this policy yields

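$$\lambda_{pu} = \sum_{k\ge1}\pi'_k\mu' = \sum_{k\ge1}\pi'_k\Big(\sum_{i\ge1}\mu_i\frac{\pi_i}{\sum_{j\ge1}\pi_j}\Big) = \sum_{k\ge1}\pi'_k\Big(\frac{\sum_{i\ge1}\mu_i\pi_i}{\sum_{j\ge1}\pi_j}\Big) = \sum_{k\ge1}\pi'_k\Big(\frac{\lambda_{pu}}{\sum_{j\ge1}\pi_j}\Big) \tag{26}$$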

where we used (22) in the last step. This implies that $\sum_{k\ge1}\pi'_k = \sum_{j\ge1}\pi_j$ and therefore $\pi'_0 = \pi_0$. Also, the average power incurred in cooperative transmissions under this alternate policy is given by:

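$$\overline{P}' = \sum_{k\ge1}\pi'_k\mathbb{E}_{\chi'}\{P'\} = \sum_{k\ge1}\pi'_k\Big(\sum_{i\ge1}\mathbb{E}_{\chi_i}\{P_i\}\frac{\pi_i}{\sum_{j\ge1}\pi_j}\Big) = \sum_{k\ge1}\pi'_k\Big(\frac{\overline{P}}{\sum_{j\ge1}\pi_j}\Big) = \overline{P} \tag{27}$$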

where we used (23) in the second-to-last step and $\sum_{k\ge1}\pi'_k = \sum_{j\ge1}\pi_j$ in the last step. Thus, if we choose $\chi_0$ in state $i = 0$ and $\chi'$ as defined in (24) in all other states, the alternate policy achieves the same time average value of the objective (21) as the optimal policy: the terms of (21) that depend on state 0 are unchanged because $\pi'_0 = \pi_0$ and the same distribution is used in that state, and the remaining power term is unchanged by (27). This implies that to maximize (21), it is sufficient to optimize over the class of stationary policies that use the same distribution for choosing $P_i$ for all states $i \ge 1$. Denote this class by $\mathcal{R}'$. Then for all $i > 1$, we have that $\mathbb{E}\{P_i(r)\} = \mathbb{E}\{P_1(r)\}$ for all $r \in \mathcal{R}'$. Using this and the fact that $1 - \pi_0(r) = \sum_{i\ge1}\pi_i(r)$, (21) can be simplified as follows:

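$$\begin{aligned}\text{Maximize:}\quad & \big[Q_{su}(t_k)\mathbb{E}\{\mu_{su}(P_0(r))\} - X_{su}(t_k)\mathbb{E}\{P_0(r)\}\big]\pi_0(r) - X_{su}(t_k)\mathbb{E}\{P_1(r)\}(1-\pi_0(r))\\ \text{Subject to:}\quad & r\in\mathcal{R}'\end{aligned} \tag{28}$$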

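where $\pi_0(r)$ is the resulting steady-state probability of being in state 0 and $\mathbb{E}\{P_1(r)\}$ is the average power incurred in cooperative transmission in state $i = 1$ (the same for all states $i \geq 1$). Next, note that the control decisions taken by the secondary user in state $i = 0$ do not affect the length of the frame and therefore do not affect $\pi_0(r)$. Further, the bracketed term is the expectation of $Q_{su}(t_k)\mu_{su}(P_0) - X_{su}(t_k)P_0$ under the distribution used to choose $P_0$, so it is maximized by a deterministic choice of $P_0$ and the expectations can be removed. Therefore, the first term in the problem above can be maximized separately as follows: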

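$$\begin{aligned}\text{Maximize:}\quad & Q_{su}(t_k)\mu_{su}(P_0) - X_{su}(t_k)P_0\\ \text{Subject to:}\quad & P_0\in\mathcal{P}\end{aligned} \tag{29}$$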

This is the same as (17). Let $P^*_0$ denote the optimal solution to (29) and let $\theta^* = Q_{su}(t_k)\mu_{su}(P^*_0) - X_{su}(t_k)P^*_0$ denote the value of the objective of (29) under the optimal solution. Note that we must have $\theta^* \geq 0$ because the value of the objective when the secondary user chooses $P_0 = 0$ (i.e., stays idle) is 0. Then, (28) can be written as:

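$$\begin{aligned}\text{Maximize:}\quad & \theta^*\pi_0(r) - X_{su}(t_k)\mathbb{E}\{P_1(r)\}(1-\pi_0(r))\\ \text{Subject to:}\quad & r\in\mathcal{R}'\end{aligned} \tag{30}$$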
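The reduction to (28)-(30) rests on the averaging step (24)-(27): replacing the state-dependent distributions $\chi_i$ by the single mixture $\chi'$ leaves $\pi_0$ and the average cooperative power unchanged. The sketch below checks this numerically on a truncated birth-death model of the primary queue. The slot model (in a busy slot the head packet departs with probability $\mu_i$ and, independently, a new packet arrives with probability $\lambda_{pu}$), the truncation level and the particular state-dependent policy are illustrative assumptions, not taken from Sec. II.

```python
def stationary_distribution(lam, mu, n_max):
    """Stationary law of a birth-death model of the primary queue, truncated at n_max.

    Assumed slot model (illustrative): from state 0 a packet arrives w.p. lam;
    from state i >= 1 the head packet departs w.p. mu(i) and, independently,
    a new packet arrives w.p. lam.
    """
    weights = [1.0]
    for i in range(n_max):
        up = lam if i == 0 else lam * (1.0 - mu(i))  # probability of i -> i+1
        down = mu(i + 1) * (1.0 - lam)                # probability of i+1 -> i
        weights.append(weights[-1] * up / down)       # detailed balance
    total = sum(weights)
    return [w / total for w in weights]


LAM_PU, PHI_NC, PHI_C, P_MAX, N_MAX = 0.5, 0.6, 0.8, 1.0, 400


def coop_power(i):
    """Hypothetical state-dependent policy: cooperate only while the backlog is small."""
    return P_MAX if 1 <= i <= 3 else 0.0


def mu_state(i):
    """Effective success probability mu_i in busy state i under that policy."""
    return PHI_C if coop_power(i) > 0 else PHI_NC


pi = stationary_distribution(LAM_PU, mu_state, N_MAX)
busy = sum(pi[1:])
mu_mix = sum(mu_state(i) * pi[i] for i in range(1, N_MAX + 1)) / busy  # as in (25)
power = sum(coop_power(i) * pi[i] for i in range(1, N_MAX + 1))        # as in (23)

# Alternate policy: the single mixture chi' in every busy state.
pi_alt = stationary_distribution(LAM_PU, lambda i: mu_mix, N_MAX)
power_alt = sum(pi_alt[1:]) * power / busy                              # as in (27)

print(pi[0], pi_alt[0], 1 - LAM_PU / mu_mix)  # agree up to rounding
print(power, power_alt)                        # agree up to rounding
```

The truncation at `N_MAX` only discards probability mass of order $(2/3)^{400}$ here, so the identities $\pi'_0 = \pi_0$ and $\overline{P}' = \overline{P}$ hold to machine precision.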

The effective probability of a successful primary transmission in any state $i \geq 1$ is given by $\mathbb{E}\{\phi(P_1(r))\}$. Using Little's Theorem, we have $\pi_0(r) = 1 - \frac{\lambda_{pu}}{\mathbb{E}\{\phi(P_1(r))\}}$. Using this, rearranging the objective in (30), and ignoring the constant terms, we obtain the following equivalent problem:

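$$\begin{aligned}\text{Minimize:}\quad & \frac{\theta^* + X_{su}(t_k)\mathbb{E}\{P_1(r)\}}{\mathbb{E}\{\phi(P_1(r))\}}\\ \text{Subject to:}\quad & r\in\mathcal{R}'\end{aligned} \tag{31}$$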
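For completeness, the rearrangement that turns (30) into (31) can be written out using only $1 - \pi_0(r) = \lambda_{pu}/\mathbb{E}\{\phi(P_1(r))\}$:

$$\begin{aligned}
\theta^*\pi_0(r) - X_{su}(t_k)\mathbb{E}\{P_1(r)\}(1-\pi_0(r))
&= \theta^* - \big(\theta^* + X_{su}(t_k)\mathbb{E}\{P_1(r)\}\big)(1-\pi_0(r))\\
&= \theta^* - \lambda_{pu}\,\frac{\theta^* + X_{su}(t_k)\mathbb{E}\{P_1(r)\}}{\mathbb{E}\{\phi(P_1(r))\}}
\end{aligned}$$

Since $\theta^*$ and $\lambda_{pu} > 0$ do not depend on $r$, maximizing the left-hand side over $r \in \mathcal{R}'$ is the same as minimizing the ratio in (31).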


It can be shown that it is sufficient to consider only deterministic power allocations to solve (31) (see, for example, [21, Section 7.3.2]). This yields the following problem:

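$$\begin{aligned}\text{Minimize:}\quad & \frac{\theta^* + X_{su}(t_k)P_1}{\phi(P_1)}\\ \text{Subject to:}\quad & P_1\in\mathcal{P}\end{aligned} \tag{32}$$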

This is the same as (18). Note that this problem does not require knowledge of $\lambda_{pu}$ or $\lambda_{su}$ and can be solved easily for general power allocation options $\mathcal{P}$. We present an example that admits a particularly simple solution.

Suppose $\mathcal{P} = \{0, P_{max}\}$, so that the secondary user can either cooperate with the primary user at full power $P_{max}$ or not cooperate (with power expenditure 0). Then, the optimal solution to (32) can be calculated by comparing the value of its objective for $P_1 \in \{0, P_{max}\}$, namely $\theta^*/\phi_{nc}$ versus $(\theta^* + X_{su}(t_k)P_{max})/\phi_c$. This yields the following simple threshold-based rule:

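$$P^*_1 = \begin{cases} 0 & \text{if } X_{su}(t_k) \geq \frac{\theta^*(\phi_c - \phi_{nc})}{P_{max}\phi_{nc}} \\ P_{max} & \text{else}\end{cases} \tag{33}$$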

We also note that this threshold can be computed without any knowledge of the input rates $\lambda_{pu}$, $\lambda_{su}$.
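A minimal sketch of the threshold rule (33), assuming $\mathcal{P} = \{0, P_{max}\}$, $\phi(0) = \phi_{nc}$ and $\phi(P_{max}) = \phi_c$. It also evaluates the objective of (32) directly, so the rule can be compared against a brute-force minimization; the function names are illustrative.

```python
def coop_ratio(p1, theta, x_su, phi):
    """Objective of (32): (theta* + X_su(t_k) * P1) / phi(P1)."""
    return (theta + x_su * p1) / phi(p1)


def threshold_coop_power(x_su, theta, phi_nc, phi_c, p_max):
    """Threshold rule (33) for the two-option set {0, p_max}."""
    if x_su >= theta * (phi_c - phi_nc) / (p_max * phi_nc):
        return 0.0  # power queue too large: cooperation is not worth its cost
    return p_max


# Example with phi_nc = 0.6, phi_c = 0.8, P_max = 1 (the values used in Sec. VII).
phi = {0.0: 0.6, 1.0: 0.8}.get
theta = 3.0
for x_su in (0.5, 2.0):
    rule = threshold_coop_power(x_su, theta, 0.6, 0.8, 1.0)
    brute = min((0.0, 1.0), key=lambda p: coop_ratio(p, theta, x_su, phi))
    print(x_su, rule, brute)
```

With $\theta^* = 3$ the threshold is $3 \cdot 0.2 / 0.6 = 1$, so the rule cooperates for $X_{su}(t_k) = 0.5$ and does not for $X_{su}(t_k) = 2$, matching the brute-force choice in both cases.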

To summarize, the overall solution to (16) is given by the pair $(P^*_0, P^*_1)$, where $P^*_0$ denotes the power allocation used by the secondary user for its own transmission when the primary user is idle and $P^*_1$ denotes the power used by the secondary user for cooperative transmission. Note that these values remain fixed for the entire duration of frame $k$. However, they can change from one frame to another depending on the values of the queues $Q_{su}(t_k)$, $X_{su}(t_k)$. The computation of $(P^*_0, P^*_1)$ can be carried out using a two-step process as follows:

1) First, compute $P^*_0$ by solving problem (29). Let $\theta^*$ be the value of the objective of (29) under the optimal solution $P^*_0$.

2) Then compute $P^*_1$ by solving problem (32).
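The two-step process above can be sketched for any finite set $\mathcal{P}$. The sketch assumes $\phi(P) > 0$ for every $P \in \mathcal{P}$ and that $\mu_{su}$ and $\phi$ are supplied as functions; `frame_decision` and the example numbers are hypothetical, not part of the paper.

```python
from typing import Callable, Sequence, Tuple


def frame_decision(
    q_su: float,
    x_su: float,
    powers: Sequence[float],
    mu_su: Callable[[float], float],
    phi: Callable[[float], float],
) -> Tuple[float, float, float]:
    """Return (P0*, P1*, theta*) for frame k from the queue values at t_k."""
    # Step 1, problem (29): own-transmission power used in the "PU Idle" slots.
    p0 = max(powers, key=lambda p: q_su * mu_su(p) - x_su * p)
    theta = q_su * mu_su(p0) - x_su * p0
    # Step 2, problem (32): cooperative power used in the "PU Busy" slots.
    p1 = min(powers, key=lambda p: (theta + x_su * p) / phi(p))
    return p0, p1, theta


powers = [0.0, 0.5, 1.0]
mu_su = lambda p: 1.0 if p > 0 else 0.0    # hypothetical rate function
phi = {0.0: 0.6, 0.5: 0.7, 1.0: 0.8}.get   # hypothetical success probabilities
print(frame_decision(10.0, 4.0, powers, mu_su, phi))  # large X_su: no cooperation
print(frame_decision(10.0, 1.0, powers, mu_su, phi))  # small X_su: full cooperation
```

Both steps are plain searches over $\mathcal{P}$, so the per-frame cost is $O(|\mathcal{P}|)$, and only the secondary user's own queue values enter the computation.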

It is interesting to note that in order to implement this algorithm, the secondary user does not require knowledge of the current queue backlog of the primary user. Rather, it only needs to know the values of its own queues and whether the current slot is in the "PU Idle" or "PU Busy" part of the frame. This is quite different from the conventional solution to the MDP (5), which is typically a different randomized policy for each value of the state (i.e., the primary queue backlog).

V. PERFORMANCE ANALYSIS

To analyze the performance of the Frame-Based-Drift-Plus-Penalty-Algorithm, we compare its Lyapunov drift with that of the optimal stationary, randomized policy STAT of Lemma 1. First, note that by basic renewal theory [27], the performance guarantees provided by STAT hold over every frame $k \in \{1, 2, 3, \ldots\}$. Specifically, let $t_k$ be the start of the $k$th frame, and suppose STAT is implemented over this frame. Then the following hold:

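$$\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R^{stat}_{su}(t)\Big\} = \mathbb{E}\{T[k]\}\,\upsilon^* \tag{34}$$

$$\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R^{stat}_{su}(t)\Big\} \leq \mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}\mu^{stat}_{su}(t)\Big\} \tag{35}$$

$$\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}P^{stat}_{su}(t)\Big\} \leq \mathbb{E}\{T[k]\}\,P_{avg} \tag{36}$$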

where $t_{k+1}$ and $T[k]$ denote the start of the $(k+1)$th frame and the length of the $k$th frame, respectively, under the policy STAT. Similarly, $R^{stat}_{su}(t)$, $P^{stat}_{su}(t)$, $\mu^{stat}_{su}(t)$ denote the admission control and resource allocation decisions under STAT.

Next, we define an alternate control algorithm ALT that will be useful in analyzing the performance of the Frame-Based-Drift-Plus-Penalty-Algorithm.

Algorithm ALT: In each frame $k \in \{1, 2, 3, \ldots\}$, do the following:

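1) Admission Control: For all $t \in \{t_k, t_k+1, \ldots, t_{k+1}-1\}$, choose $R_{su}(t)$ as follows:

$$R_{su}(t) = \begin{cases} A_{su}(t) & \text{if } Q_{su}(t_k) \leq V \\ 0 & \text{else}\end{cases} \tag{37}$$

2) Resource Allocation: Choose the power allocation decisions over frame $k$ to maximize the following ratio:

$$\frac{\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}\big(Q_{su}(t_k)\mu_{su}(t) - X_{su}(t_k)P_{su}(t)\big)\,\Big|\,\mathbf{Q}(t_k)\Big\}}{\mathbb{E}\{T[k]\,|\,\mathbf{Q}(t_k)\}} \tag{38}$$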

3) Queue Update: After implementing this policy, update the queues as in (9), (10).

By comparing with the Frame-Based-Drift-Plus-Penalty-Algorithm, it can be seen that this algorithm differs only in the admission control part, while the resource allocation decisions are exactly the same. Specifically, under ALT, the queue backlog $Q_{su}(t_k)$ at the start of the $k$th frame is used for making admission control decisions for the entire duration of that frame. However, under the Frame-Based-Drift-Plus-Penalty-Algorithm, the queue backlog $Q_{su}(t)$ at the start of each slot is used for making admission control decisions. Note that since the length of the frame depends only on the resource allocation decisions and they are the same under the two algorithms, implementing them with the same starting backlog $\mathbf{Q}(t_k)$ yields the same frame lengths.

The following lemma compares the value of the second term in the Lyapunov drift bound (14), which corresponds to the admission control decisions, under these two algorithms.

Lemma 2: Let $R^{fab}_{su}(t)$ and $R^{alt}_{su}(t)$ denote the admission control decisions made by the Frame-Based-Drift-Plus-Penalty-Algorithm and the ALT algorithm, respectively, for all $t \in \{t_k, t_k+1, \ldots, t_{k+1}-1\}$. Then we have:

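$$\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q_{su}(t_k)-V)R^{alt}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} \geq \mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q_{su}(t_k)-V)R^{fab}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} - C \tag{39}$$

where $C \triangleq \frac{D(A_{max}+\mu_{max})A_{max}}{2}$ is a constant that does not depend on $V$.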

Proof: See Appendix A.

We are now ready to characterize the performance of the Frame-Based-Drift-Plus-Penalty-Algorithm.

Theorem 1: (Performance Theorem) Suppose the Frame-Based-Drift-Plus-Penalty-Algorithm is implemented over all frames $k \in \{1, 2, 3, \ldots\}$ with initial condition $Q_{su}(0) = 0$, $X_{su}(0) = 0$ and with a control parameter $V > 0$. Let $\mu^{fab}_{su}(t)$, $P^{fab}_{su}(t)$ denote the resource allocation decisions under this algorithm. Then, we have:

1) The secondary user queue backlog $Q_{su}(t)$ is upper bounded for all $t$:

$$Q_{su}(t) \leq Q_{max} \triangleq A_{max} + V \tag{40}$$

2) The virtual power queue $X_{su}(t_k)$ is mean rate stable, i.e.,

$$\lim_{K\to\infty}\frac{\mathbb{E}\{X_{su}(t_K)\}}{K} = 0 \tag{41}$$

Further, we have:

$$\limsup_{K\to\infty}\;\frac{1}{K}\sum_{k=1}^{K}\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}\big(P^{fab}_{su}(t) - P_{avg}\big)\Big\} \leq 0 \tag{42}$$

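$$\limsup_{K\to\infty}\;\frac{\frac{1}{K}\sum_{k=1}^{K}\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}P^{fab}_{su}(t)\Big\}}{\frac{1}{K}\sum_{k=1}^{K}\mathbb{E}\{T[k]\}} \leq P_{avg} \tag{43}$$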

3) The time-average secondary user throughput (defined over frames) satisfies the following bound for all $K > 0$:

$$\frac{\sum_{k=1}^{K}\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R^{fab}_{su}(t)\Big\}}{\sum_{k=1}^{K}\mathbb{E}\{T[k]\}} \geq \upsilon^* - \frac{B+C}{V T_{min}} \tag{44}$$

where $B = \frac{D[\mu_{max}^2 + A_{max}^2 + (P_{max}-P_{avg})^2]}{2}$ and $C = \frac{D(A_{max}+\mu_{max})A_{max}}{2}$.

Theorem 1 shows that the time-average secondary user throughput can be pushed to within $O(1/V)$ of the optimal value, with a trade-off in the worst-case queue backlog, which by (40) grows as $O(V)$. By Little's Theorem, this leads to an $O(1/V, V)$ utility-delay tradeoff.

Proof: Part (1): We argue by induction. First, note that (40) holds for $t = 0$. Next, suppose $Q_{su}(t) \leq Q_{max}$ for some $t \geq 0$. We will show that $Q_{su}(t+1) \leq Q_{max}$. There are two cases. First, suppose $Q_{su}(t) \leq V$. Then, by (9), $Q_{su}(t)$ can increase by at most $A_{max}$ in one slot, so that $Q_{su}(t+1) \leq A_{max} + V = Q_{max}$. Next, suppose $Q_{su}(t) > V$. Then, the admission control decision (15) chooses $R_{su}(t) = 0$. Thus, by (9), we have $Q_{su}(t+1) \leq Q_{su}(t) \leq Q_{max}$ in this case as well. Combining these two cases proves the bound (40).
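Part (1) is a sample-path argument, so it can be checked on any arrival and service sequence. The sketch below assumes that (9) has the usual form $Q_{su}(t+1) = \max[Q_{su}(t) - \mu_{su}(t), 0] + R_{su}(t)$ and draws integer arrivals and services uniformly at random; both are illustrative assumptions, and `peak_backlog` is a hypothetical helper.

```python
import random


def peak_backlog(v, a_max, mu_max, n_slots, seed=0):
    """Largest Q_su(t) reached under admission rule (15) on a random sample path."""
    rng = random.Random(seed)
    q = peak = 0
    for _ in range(n_slots):
        arrivals = rng.randint(0, a_max)      # A_su(t) <= A_max (hypothetical law)
        service = rng.randint(0, mu_max)      # mu_su(t) <= mu_max (hypothetical law)
        admitted = arrivals if q <= v else 0  # admission rule (15)
        q = max(q - service, 0) + admitted    # assumed form of (9)
        peak = max(peak, q)
    return peak


for v in (0, 10, 100):
    print(v, peak_backlog(v, a_max=3, mu_max=2, n_slots=50_000), 3 + v)
```

Arrivals (mean 1.5) outpace services (mean 1) here, so the backlog is driven up to the threshold $V$, yet it never exceeds $A_{max} + V$.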

Parts (2) and (3): See Appendix B.


VI. EXTENSIONS TO BASIC MODEL

We consider two extensions to the basic model of Sec. II.

A. Multiple Secondary Users

Consider the scenario with one primary user as before, but with $N > 1$ secondary users. The primary user channel occupancy process evolves as before, and the secondary users can transmit their own data only when the primary user is idle. However, they may cooperatively transmit with the primary user to increase its transmission success probability. In general, multiple secondary users may cooperatively transmit with the primary user in one timeslot. However, for simplicity, here we assume that at most one secondary user can take part in a cooperative transmission per slot. Further, we also assume that at most one secondary user can transmit its data when the primary user is idle.

Our formulation can be easily extended to this scenario. Let $\mathcal{P}_i$ denote the set of power allocation options for secondary user $i$. Suppose each secondary user $i$ is subject to average and peak power constraints $P_{avg,i}$ and $P_{max,i}$, respectively. Also, let $\phi_i(P)$ denote the success probability of the primary transmission when secondary user $i$ spends power $P$ in cooperative transmission. Now consider the objective of maximizing the sum total throughput of the secondary users subject to each user's average and peak power constraints and the scheduling constraints of the model. In order to apply the “drift-plus-penalty” ratio method, we use the following queues:

$$Q_i(t_{k+1}) \leq \max\Big[Q_i(t_k) - \sum_{t=t_k}^{t_{k+1}-1}\mu_i(t),\,0\Big] + \sum_{t=t_k}^{t_{k+1}-1}R_i(t) \tag{45}$$

$$X_i(t_{k+1}) = \max\Big[X_i(t_k) - T[k]P_{avg,i} + \sum_{t=t_k}^{t_{k+1}-1}P_i(t),\,0\Big] \tag{46}$$

where $Q_i(t_k)$ is the queue backlog of secondary user $i$ at the beginning of the $k$th frame, $\mu_i(t)$ is the service rate of secondary user $i$ in slot $t$, and $R_i(t)$ and $P_i(t)$ denote the number of new packets admitted and the power expenditure incurred by secondary user $i$ in slot $t$. Finally, $t_{k+1}$ denotes the start of the $(k+1)$th frame and $T[k] = t_{k+1} - t_k$ is the length of the $k$th frame as before.
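(45) is an inequality because packets admitted inside a frame may also be served within that frame. A small sketch, assuming per-slot updates of the form $Q_i(t+1) = \max[Q_i(t) - \mu_i(t), 0] + R_i(t)$, checks on random inputs that the slot-by-slot backlog never exceeds the frame-level bound on the right-hand side of (45), and implements the frame-level virtual queue update (46). The helper names and random inputs are illustrative.

```python
import random


def per_slot_backlog(q0, services, admits):
    """Backlog at t_{k+1} when the queue is updated slot by slot."""
    q = q0
    for mu, r in zip(services, admits):
        q = max(q - mu, 0) + r
    return q


def frame_bound(q0, services, admits):
    """Right-hand side of (45)."""
    return max(q0 - sum(services), 0) + sum(admits)


def virtual_power_update(x0, powers, p_avg):
    """Frame-level update (46); len(powers) plays the role of T[k]."""
    return max(x0 - len(powers) * p_avg + sum(powers), 0.0)


rng = random.Random(1)
for _ in range(5):
    t_len = rng.randint(1, 20)
    services = [rng.randint(0, 2) for _ in range(t_len)]
    admits = [rng.randint(0, 3) for _ in range(t_len)]
    q0 = rng.randint(0, 10)
    print(per_slot_backlog(q0, services, admits), frame_bound(q0, services, admits))
print(virtual_power_update(1.0, [1.0, 0.0, 1.0, 0.0], 0.5))
```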

Let $\mathbf{Q}(t_k) = (Q_1(t_k), \ldots, Q_N(t_k), X_1(t_k), \ldots, X_N(t_k))$ denote the queueing state of the system at the start of the $k$th frame. Using the Lyapunov function $L(\mathbf{Q}(t_k)) \triangleq \frac{1}{2}\big[\sum_{i=1}^{N}Q_i^2(t_k) + \sum_{i=1}^{N}X_i^2(t_k)\big]$ and following the steps in Sec. III yields the following Multi-User Frame-Based-Drift-Plus-Penalty-Algorithm. In each frame $k \in \{1, 2, 3, \ldots\}$, do the following:

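1) Admission Control: For all $t \in \{t_k, t_k+1, \ldots, t_{k+1}-1\}$ and for each secondary user $i \in \{1, 2, \ldots, N\}$, choose $R_i(t)$ as follows:

$$R_i(t) = \begin{cases} A_i(t) & \text{if } Q_i(t) \leq V \\ 0 & \text{else}\end{cases} \tag{47}$$

where $A_i(t)$ is the number of new arrivals to secondary user $i$ in slot $t$.

2) Resource Allocation: Choose the power allocation decisions of all secondary users over frame $k$ to maximize the following ratio:

$$\frac{\sum_{i=1}^{N}\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}\big(Q_i(t_k)\mu_i(t) - X_i(t_k)P_i(t)\big)\,\Big|\,\mathbf{Q}(t_k)\Big\}}{\mathbb{E}\{T[k]\,|\,\mathbf{Q}(t_k)\}} \tag{48}$$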

3) Queue Update: After implementing this policy, update the queues as in (45) and (46).

Similar to the basic model, this algorithm can be implemented without any knowledge of the arrival rates $\lambda_i$ or $\lambda_{pu}$. Further, using the techniques developed in Sec. IV, it can be shown that the solution to (48) can be computed in two steps as follows. First, we solve the following problem for each $i \in \{1, 2, \ldots, N\}$:

$$\begin{aligned}\text{Maximize:}\quad & Q_i(t_k)\mu_i(P) - X_i(t_k)P\\ \text{Subject to:}\quad & P\in\mathcal{P}_i\end{aligned} \tag{49}$$

Let $\theta^*$ denote the largest optimal objective value of (49) over all users, let $i^*$ be a user that attains it, and let $P^*_0$ denote the corresponding optimal solution of (49). This means user $i^*$ transmits on all idle slots of frame $k$ with power $P^*_0$. Next, to determine the optimal cooperative transmission strategy, we solve the following problem for each $i \in \{1, 2, \ldots, N\}$:

$$\begin{aligned}\text{Minimize:}\quad & \frac{\theta^* + X_i(t_k)P}{\phi_i(P)}\\ \text{Subject to:}\quad & P\in\mathcal{P}_i\end{aligned} \tag{50}$$

Let $j^*$ be a user whose optimal objective value in (50) is the smallest over all users, and let $P^*_1$ denote its optimal solution. This means user $j^*$ cooperatively transmits on all busy slots of frame $k$ with power $P^*_1$.
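The two-step computation for (49)-(50) can be sketched as follows. The sketch assumes finite power sets, $\phi_i(P) > 0$, and per-user rate and success-probability functions supplied as callables; all names and numbers are hypothetical.

```python
def best_own_transmission(q, x, power_sets, mu):
    """Problem (49) for every user; return (theta*, i*, P0*) of the best user."""
    candidates = []
    for i, powers in enumerate(power_sets):
        p = max(powers, key=lambda pw: q[i] * mu[i](pw) - x[i] * pw)
        candidates.append((q[i] * mu[i](p) - x[i] * p, i, p))
    return max(candidates)


def best_cooperation(theta, x, power_sets, phi):
    """Problem (50) for every user; return (value, j*, P1*) of the best user."""
    candidates = []
    for j, powers in enumerate(power_sets):
        p = min(powers, key=lambda pw: (theta + x[j] * pw) / phi[j](pw))
        candidates.append(((theta + x[j] * p) / phi[j](p), j, p))
    return min(candidates)


on = lambda pw: 1.0 if pw > 0 else 0.0  # hypothetical rate: 1 packet/slot if transmitting
q, x = [10.0, 5.0], [4.0, 1.0]
power_sets = [[0.0, 1.0], [0.0, 1.0]]
theta, i_star, p0 = best_own_transmission(q, x, power_sets, [on, on])
phi = [{0.0: 0.6, 1.0: 0.8}.get, {0.0: 0.6, 1.0: 0.9}.get]
_, j_star, p1 = best_cooperation(theta, x, power_sets, phi)
print(i_star, p0, j_star, p1, theta)
```

In this example user 0 has the larger backlog and transmits in the idle slots, while user 1, with a cheaper power queue and a more effective relay channel, is the one that cooperates in the busy slots.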

B. Fading Channels

Next, suppose there is an additional channel fading process $S(t)$ that takes values from a finite set $\mathcal{S}$ in an i.i.d. fashion every slot. We assume that in every slot, $\text{Prob}[S(t) = s] = q_s$ for all $s \in \mathcal{S}$. The success probability with cooperative transmission is now a function of both the power allocation and the fading state in that slot. Specifically, suppose the primary user is active in slot $t$ and the secondary user allocates power $P(t)$ for cooperative transmission. Also suppose $S(t) = s$. Then the random success/failure outcome of the primary transmission is given by an indicator variable $\mu_{pu}(P(t), s)$ and the success probability is given by $\phi_s(P(t)) = \mathbb{E}\{\mu_{pu}(P(t), s)\}$. The function $\phi_s(P)$ is known to the network controller for all $s \in \mathcal{S}$ and is assumed to be non-decreasing in $P$ for each $s \in \mathcal{S}$. For simplicity, we assume that the secondary user transmission rate $\mu_{su}(t)$ depends only on $P(t)$.

By applying the “drift-plus-penalty” ratio method to this extended model, we get the following control algorithm. The admission control remains the same as (15). The resource allocation part involves maximizing the ratio in (16). Using the same arguments as before in Sec. IV, it can be shown that maximizing this ratio is equivalent to the following optimization problem:

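$$\begin{aligned}\text{Max:}\quad & Q_{su}(t_k)\mathbb{E}\{\mu_{su}(P_0(r))\}\pi_0(r) - X_{su}(t_k)\mathbb{E}\{P_0(r)\}\pi_0(r) - X_{su}(t_k)\sum_{i\geq 1}\sum_{s\in\mathcal{S}}\mathbb{E}\{P_{i,s}(r)\}\pi_{i,s}(r)\\ \text{Subject to:}\quad & r\in\mathcal{R}\end{aligned} \tag{51}$$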

where $\pi_{i,s}(r)$ is the resulting steady-state probability of being in state $(i,s)$ in the recurrent system under the stationary, randomized policy $r$, and where the expectations above are with respect to $r$. We study this problem next.

Consider the optimal stationary, randomized policy that maximizes the objective in (51). Let $\chi_{i,s}$ denote the probability distribution over $\mathcal{P}$ that is used by this policy to choose a control action $P_{i,s}$ in state $(i,s)$. Let $\mu_{i,s} = \mathbb{E}_{\chi_{i,s}}\{\phi_s(P_{i,s})\}$ denote the resulting effective probability of successful primary transmission in state $(i,s)$, where $i \geq 1$. Since the system is stable under any stationary policy, the total incoming rate equals the total outgoing rate. Thus, we get:

$$\lambda_{pu} = \sum_{i\geq 1}\sum_{s\in\mathcal{S}}\pi_{i,s}\mu_{i,s} \tag{52}$$

where $\pi_{i,s}$ denotes the steady-state probability of being in state $(i,s)$ under this policy. Note that the system is stable and has a well-defined steady-state distribution. The average power incurred in cooperative transmissions under this policy is given by:

$$\overline{P} = \sum_{i\geq 1}\sum_{s\in\mathcal{S}}\pi_{i,s}\,\mathbb{E}_{\chi_{i,s}}\{P_{i,s}\} \tag{53}$$

Now consider an alternate stationary policy that, for each $s \in \mathcal{S}$, uses the following fixed distribution $\chi'_s$ for choosing the control action $P'_s$ in all states $(i,s)$ where $i \geq 1$:

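$$\chi'_s \triangleq \begin{cases} \chi_{1,s} & \text{with probability } \frac{\pi_{1,s}}{\sum_{j\geq 1}\pi_{j,s}} \\ \chi_{2,s} & \text{with probability } \frac{\pi_{2,s}}{\sum_{j\geq 1}\pi_{j,s}} \\ \vdots & \\ \chi_{i,s} & \text{with probability } \frac{\pi_{i,s}}{\sum_{j\geq 1}\pi_{j,s}} \\ \vdots & \end{cases} \tag{54}$$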

For each $s \in \mathcal{S}$, let $\mu'_s$ denote the resulting effective probability of a successful primary transmission in any state $(i,s)$ where $i \geq 1$ under this policy. Note that this is the same for all states $(i,s)$ where $i \geq 1$ by the definition (54). Then, we have that:

$$\mu'_s = \sum_{i\geq 1}\mu_{i,s}\,\frac{\pi_{i,s}}{\sum_{j\geq 1}\pi_{j,s}} \tag{55}$$

Let $\pi'_{i,s}$ denote the steady-state probability of being in state $(i,s)$ under this alternate policy. Since the system is stable under any stationary policy, the total incoming rate equals the total outgoing rate. Thus, we get:

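$$\lambda_{pu} = \sum_{s\in\mathcal{S}}\sum_{k\geq 1}\pi'_{k,s}\mu'_s = \sum_{s\in\mathcal{S}}\mu'_s\Big(\sum_{k\geq 1}\pi'_{k,s}\Big) = \sum_{s\in\mathcal{S}}\Big[\sum_{i\geq 1}\mu_{i,s}\frac{\pi_{i,s}}{\sum_{j\geq 1}\pi_{j,s}}\Big]\Big(\sum_{k\geq 1}\pi'_{k,s}\Big) \tag{56}$$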

where we used (55) in the last step. Since $S(t)$ is i.i.d., for any $s_1, s_2 \in \mathcal{S}$, we have that

$$\pi_0 q_{s_1} + \sum_{j\geq 1}\pi_{j,s_1} = q_{s_1}, \qquad \pi_0 q_{s_2} + \sum_{j\geq 1}\pi_{j,s_2} = q_{s_2}$$

Similarly, we have:

$$\pi'_0 q_{s_1} + \sum_{j\geq 1}\pi'_{j,s_1} = q_{s_1}, \qquad \pi'_0 q_{s_2} + \sum_{j\geq 1}\pi'_{j,s_2} = q_{s_2}$$

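Using this, for any $s_1, s_2 \in \mathcal{S}$, we have (both sides equal $(1-\pi_0)/(1-\pi'_0)$):

$$\frac{\sum_{j\geq 1}\pi_{j,s_1}}{\sum_{j\geq 1}\pi'_{j,s_1}} = \frac{\sum_{j\geq 1}\pi_{j,s_2}}{\sum_{j\geq 1}\pi'_{j,s_2}} \tag{57}$$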

Using this in (56), we have for each $s \in \mathcal{S}$:

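$$\lambda_{pu} = \Big[\sum_{s\in\mathcal{S}}\sum_{i\geq 1}\mu_{i,s}\pi_{i,s}\Big]\frac{\sum_{k\geq 1}\pi'_{k,s}}{\sum_{j\geq 1}\pi_{j,s}} = \lambda_{pu}\,\frac{\sum_{k\geq 1}\pi'_{k,s}}{\sum_{j\geq 1}\pi_{j,s}} \tag{58}$$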

where we used (52) in the last step. This implies that $\sum_{k\geq 1}\pi'_{k,s} = \sum_{j\geq 1}\pi_{j,s}$ for every $s \in \mathcal{S}$ and therefore $\pi'_0 = \pi_0$. Also, the average power incurred in cooperative transmissions under this alternate policy is given by:

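$$\overline{P}' = \sum_{k\geq 1}\sum_{s\in\mathcal{S}}\pi'_{k,s}\,\mathbb{E}_{\chi'_s}\{P'_s\} = \sum_{k\geq 1}\sum_{s\in\mathcal{S}}\pi'_{k,s}\Big(\sum_{i\geq 1}\mathbb{E}_{\chi_{i,s}}\{P_{i,s}\}\frac{\pi_{i,s}}{\sum_{j\geq 1}\pi_{j,s}}\Big) = \sum_{s\in\mathcal{S}}\sum_{i\geq 1}\mathbb{E}_{\chi_{i,s}}\{P_{i,s}\}\pi_{i,s} = \overline{P} \tag{59}$$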

where we used the fact that $\sum_{k\geq 1}\pi'_{k,s} = \sum_{j\geq 1}\pi_{j,s}$ for all $s$. Thus, if we choose $\chi' = \chi_0$ in state $i = 0$ and choose $\chi'_s$ as defined in (54) in all states $(i,s)$ where $i \geq 1$, the alternate policy achieves the same time-average value of the objective (51) as the optimal policy. This implies that to maximize (51), it is sufficient to optimize over the class of stationary policies that, for each $s \in \mathcal{S}$, use the same distribution for choosing $P_{i,s}$ in all states $(i,s)$ where $i \geq 1$. Denote this class by $\mathcal{R}'$. Using this and the fact that $\sum_{i\geq 1}\pi_{i,s}(r) = (1-\pi_0(r))q_s$ for all $s$, (51) can be simplified as follows:

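$$\begin{aligned}\text{Maximize:}\quad & \big[Q_{su}(t_k)\mathbb{E}\{\mu_{su}(P_0(r))\} - X_{su}(t_k)\mathbb{E}\{P_0(r)\}\big]\pi_0(r) - X_{su}(t_k)\sum_{s\in\mathcal{S}}\mathbb{E}\{P_s(r)\}(1-\pi_0(r))q_s\\ \text{Subject to:}\quad & r\in\mathcal{R}'\end{aligned} \tag{60}$$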

where $\pi_0(r)$ is the resulting steady-state probability of being in state 0 and $\mathbb{E}\{P_s(r)\}$ is the average power incurred in cooperative transmission in any state $(i,s)$ with $i \geq 1$. Using the same arguments as before, the solution to (60) can be obtained in two steps as follows. We first compute the solution to (29) as before. Denoting its optimal value by $\theta^*$, (60) can be written as:

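$$\begin{aligned}\text{Maximize:}\quad & \theta^*\pi_0(r) - X_{su}(t_k)\sum_{s\in\mathcal{S}}\mathbb{E}\{P_s(r)\}(1-\pi_0(r))q_s\\ \text{Subject to:}\quad & r\in\mathcal{R}'\end{aligned} \tag{61}$$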

[Figure: average secondary user throughput (packets/slot) vs. V; curves: Optimal Cooperation, No Cooperation, Counter Based Policy.]

Fig. 4. Average Secondary User Throughput vs. V.

Using Little's Theorem, we have $\pi_0(r) = 1 - \frac{\lambda_{pu}}{\sum_{s\in\mathcal{S}}q_s\mathbb{E}\{\phi_s(P_s(r))\}}$. Using this, rearranging the objective in (61), and ignoring the constant terms, we obtain the following equivalent problem:

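$$\begin{aligned}\text{Maximize:}\quad & \frac{-\theta^* - X_{su}(t_k)\sum_{s\in\mathcal{S}}q_s\mathbb{E}\{P_s(r)\}}{\sum_{s\in\mathcal{S}}q_s\mathbb{E}\{\phi_s(P_s(r))\}}\\ \text{Subject to:}\quad & r\in\mathcal{R}'\end{aligned} \tag{62}$$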

It can be shown that it is sufficient to consider only deterministic power allocations to solve (62) (see, for example, [21, Section 7.3.2]). This yields the following problem:

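$$\begin{aligned}\text{Maximize:}\quad & \frac{-\theta^* - X_{su}(t_k)\sum_{s\in\mathcal{S}}q_sP_s}{\sum_{s\in\mathcal{S}}q_s\phi_s(P_s)}\\ \text{Subject to:}\quad & P_s\in\mathcal{P}\ \text{for all } s\in\mathcal{S}\end{aligned} \tag{63}$$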

Note that this problem does not require knowledge of $\lambda_{pu}$ or $\lambda_{su}$ and can be solved efficiently for general power allocation options $\mathcal{P}$.
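One standard way to solve a ratio problem of this form, used here as an illustration and not necessarily the authors' method, is Dinkelbach's parametric iteration. Minimizing $(\theta^* + X_{su}(t_k)\sum_s q_sP_s)/(\sum_s q_s\phi_s(P_s))$ (the negative of the objective of (63)) for a fixed guess $r$ of the optimal ratio reduces to minimizing $\theta^* + \sum_s q_s\big(X_{su}(t_k)P_s - r\,\phi_s(P_s)\big)$, which separates across fading states. The sketch assumes a finite $\mathcal{P}$ and $\phi_s(P) > 0$; the fading states and success probabilities are hypothetical.

```python
from itertools import product


def solve_fading_coop(theta, x_su, q, powers, phi, tol=1e-12, max_iter=100):
    """Minimize (theta + x_su * sum_s q_s P_s) / sum_s q_s phi_s(P_s) over P_s in powers."""
    states = list(q)

    def ratio(alloc):
        num = theta + x_su * sum(q[s] * alloc[s] for s in states)
        den = sum(q[s] * phi[s](alloc[s]) for s in states)
        return num / den

    alloc = {s: powers[0] for s in states}  # any feasible starting point
    r = ratio(alloc)
    for _ in range(max_iter):
        # Parametric subproblem: separable, one search over powers per fading state.
        alloc = {s: min(powers, key=lambda p, s=s: x_su * p - r * phi[s](p)) for s in states}
        new_r = ratio(alloc)
        if r - new_r <= tol:  # no further decrease: alloc attains the optimal ratio
            return alloc, new_r
        r = new_r
    return alloc, r


q = {"good": 0.7, "bad": 0.3}
powers = [0.0, 0.5, 1.0]
phi = {
    "good": {0.0: 0.6, 0.5: 0.75, 1.0: 0.85}.get,
    "bad": {0.0: 0.5, 0.5: 0.55, 1.0: 0.6}.get,
}
alloc, best = solve_fading_coop(5.0, 0.5, q, powers, phi)

# Brute-force check over all |P|^|S| allocations.
brute = min(
    (5.0 + 0.5 * (0.7 * pg + 0.3 * pb)) / (0.7 * phi["good"](pg) + 0.3 * phi["bad"](pb))
    for pg, pb in product(powers, repeat=2)
)
print(alloc, best, brute)
```

Each iteration costs $O(|\mathcal{S}|\,|\mathcal{P}|)$ instead of the $|\mathcal{P}|^{|\mathcal{S}|}$ combinations enumerated by the brute-force check, and for a finite $\mathcal{P}$ the ratio strictly decreases until it stops at the optimum.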

VII. SIMULATIONS

In this section, we evaluate the performance of the Frame-Based-Drift-Plus-Penalty-Algorithm using simulations. We consider the network model discussed in Sec. II with one primary and one secondary user. The set $\mathcal{P}$ consists of only two options, $\{0, P_{max}\}$. We assume that $P_{avg} = 0.5$ and $P_{max} = 1$. We set $\phi_{nc} = 0.6$ and $\phi_c = 0.8$. For simplicity, we assume that $\mu_{su}(P_{max}) = 1$.

In the first set of simulations, we fix the input rates $\lambda_{pu} = \lambda_{su} = 0.5$ packets/slot. For these parameters, we can compute the optimal offline solution by linear programming. This yields a maximum secondary user throughput of 0.25 packets/slot. We now simulate the Frame-Based-Drift-Plus-Penalty-Algorithm for different values of the control parameter $V$ over 1000 frames. In Fig. 4, we plot the average throughput achieved by the secondary user over this period. It can be seen that the average throughput increases with $V$ and converges to the optimal value of 0.25 packets/slot, with the difference exhibiting an $O(1/V)$ behavior as predicted by Theorem 1. In Fig. 5, we plot the average queue backlog of the secondary user over this period. It can be seen that the average queue backlog grows linearly in $V$, again as predicted by Theorem 1. Also, for all $V$, the average secondary user power consumption over this period was found not to exceed $P_{avg} = 0.5$ units/slot.
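The optimal value of 0.25 packets/slot and the "No Cooperation" value can be reproduced by a short calculation, under the assumption (in the spirit of the averaging argument of Sec. IV) that it suffices to cooperate with a fixed probability $p$ in every busy slot, so that the effective success probability is $\phi_{nc} + p(\phi_c - \phi_{nc})$. Little's Theorem then gives the idle fraction, and the power left over after cooperation limits the secondary user's own transmissions (with $\mu_{su}(P_{max}) = 1$). This is a sketch for checking the numbers, not the linear program mentioned above; `su_throughput` is a hypothetical helper.

```python
def su_throughput(p_coop, lam_pu, lam_su=0.5, p_avg=0.5, p_max=1.0,
                  phi_nc=0.6, phi_c=0.8):
    """Secondary throughput (packets/slot) when cooperating w.p. p_coop in busy slots."""
    phi_eff = phi_nc + p_coop * (phi_c - phi_nc)    # effective primary success prob.
    idle = 1.0 - lam_pu / phi_eff                   # Little's Theorem
    coop_power = p_max * p_coop * (1.0 - idle)      # average power spent cooperating
    budget = p_avg - coop_power                     # power left for own packets
    if budget < 0:
        return float("-inf")                        # violates the P_avg constraint
    return min(idle, budget / p_max, lam_su)        # mu_su(P_max) = 1 packet/slot


grid = [k / 10_000 for k in range(10_001)]
best_p = max(grid, key=lambda p: su_throughput(p, lam_pu=0.5))
print(best_p, su_throughput(best_p, 0.5))  # near p = 1/3 and 0.25 packets/slot
print(su_throughput(0.0, 0.5))             # No Cooperation: 1 - 0.5/0.6 = 1/6
```

The optimum balances the two limits: cooperating more enlarges the idle fraction but drains the power budget, and the two meet at $p = 1/3$. With `lam_pu=0.2` and `lam_su=0.8` the best grid point is $p = 0$, consistent with the second set of simulations, where cooperation stops once the primary user is idle often enough.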

[Figure: average secondary user backlog (packets) vs. V.]

Fig. 5. Average Secondary User Queue Occupancy vs. V.

For comparison, we also simulate three alternate algorithms. In the first algorithm, “No Cooperation”, the secondary user never cooperates with the primary user and only attempts to maximize its throughput over the resulting idle periods. The secondary user throughput under this algorithm was found to be 0.166 packets/slot, as shown in Fig. 4. Note that by Little's Theorem, the resulting fraction of time the primary user is idle is $1 - \lambda_{pu}/\phi_{nc} = 1 - 0.5/0.6 = 1/6 \approx 0.167$. This limits the maximum secondary user throughput under the “No Cooperation” case to $1/6$ packets/slot.

In the second algorithm, we consider the “Always Cooperate” case, where the secondary user always cooperates with the primary user. For the example under consideration, this uses up all the secondary user power, and thus the secondary user achieves zero throughput.

In the third algorithm, “Counter Based Policy”, a running average of the total secondary user power consumption so far is maintained. In each slot, the secondary user decides to transmit/cooperate only if this running average is smaller than $P_{avg}$. The maximum secondary user throughput under this algorithm was found to be 0.137 packets/slot. This demonstrates that simply satisfying the average power constraint is not sufficient to achieve maximum throughput. For example, it may be the case that under the “Counter Based Policy”, the running average condition is usually satisfied when the primary user is busy. This causes the secondary user to cooperate. However, by the time the primary user next becomes idle, the running average exceeds $P_{avg}$, so that the secondary user does not transmit its own data. In contrast, the Frame-Based-Drift-Plus-Penalty-Algorithm is able to find the opportune moments to cooperate/transmit optimally.

In the second set of simulations, we fix the input rate $\lambda_{su} = 0.8$ packets/slot and $V = 500$, and simulate the Frame-Based-Drift-Plus-Penalty-Algorithm over 1000 frames. At the start of the simulation, we set $\lambda_{pu} = 0.4$ packets/slot. The values of the other parameters remain the same. However, during the course of the simulation, we change $\lambda_{pu}$ to 0.2 packets/slot after the first 350 frames and then to 0.55 packets/slot after the first 700 frames. In Figs. 6 and 7, we plot the running average (over 100 frames) of the secondary user throughput and of the average power used for cooperation. These show that the Frame-Based-Drift-Plus-Penalty-Algorithm automatically adapts to the changes in $\lambda_{pu}$. Further, it quickly approaches the optimal performance corresponding to the new $\lambda_{pu}$ by adaptively spending more or less power (as required) on cooperation. For example, when $\lambda_{pu}$ drops to 0.2 packets/slot after frame number 350, the fraction of time the primary user is idle even with no cooperation is $1 - 0.2/0.6 = 2/3 \approx 0.67$. With $P_{avg} = 0.5$ and $P_{max} = 1$, the secondary user can transmit in at most half of the slots on average, which is less than this idle fraction, so there is no need to cooperate anymore. This is precisely what the Frame-Based-Drift-Plus-Penalty-Algorithm does, as shown in Fig. 7. Similarly, when $\lambda_{pu}$ increases to 0.55 packets/slot after frame number 700, the Frame-Based-Drift-Plus-Penalty-Algorithm starts to spend more power on cooperative transmissions.

[Figure: running average of the secondary user throughput vs. frame number.]

Fig. 6. Moving Average of Secondary User Throughput over Frames.

VIII. CONCLUSIONS

In this paper, we studied the problem of opportunistic cooperation in a cognitive femtocell network. Specifically, we considered the scenario where a secondary user can cooperatively transmit with the primary user to increase its transmission success probability. In return, the secondary user can get more opportunities for transmitting its own data when the primary user is idle. A key feature of this problem is that the evolution of the system state depends on the control actions taken by the secondary user. This dependence makes it a constrained Markov Decision Problem, whose traditional solutions require either extensive knowledge of the system dynamics or learning-based approaches that suffer from large convergence times. However, using the technique of Lyapunov optimization, we designed a novel greedy and online control algorithm that overcomes these challenges and is provably optimal.

APPENDIX A: PROOF OF LEMMA 2

Let $Q^{fab}_{su}(t)$ denote the queue backlog under the Frame-Based-Drift-Plus-Penalty-Algorithm for all $t \in \{t_k, t_k+1, \ldots, t_{k+1}-1\}$. Then, since the admission control decision (15) of the Frame-Based-Drift-Plus-Penalty-Algorithm minimizes the term $(Q_{su}(t) - V)R_{su}(t)$ for all $Q_{su}(t)$, we have:

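$$\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q^{fab}_{su}(t)-V)R^{alt}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} \geq \mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q^{fab}_{su}(t)-V)R^{fab}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} \tag{64}$$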

[Figure: running average of the power used by the secondary user for cooperation vs. frame number.]

Fig. 7. Moving Average of Power used by the Secondary User for Cooperative Transmissions over Frames.

Note that we are not implementing the admission control decisions of ALT in the left-hand side of the above.

Next, we make use of the following sample path relations in (64) to prove (39). For all $t \in \{t_k, t_k+1, \ldots, t_{k+1}-1\}$, the following hold under any control algorithm:

$$Q_{su}(t_k) \geq Q_{su}(t) - (t-t_k)A_{max} \tag{65}$$

$$Q_{su}(t_k) \leq Q_{su}(t) + (t-t_k)\mu_{max} \tag{66}$$

(65) follows by noting that the number of arrivals to the secondary user queue in the interval $[t_k, t)$ is at most $(t-t_k)A_{max}$. Similarly, (66) follows by noting that the number of departures from the secondary user queue in the interval $[t_k, t)$ is at most $(t-t_k)\mu_{max}$.

Using (65) in the left hand side of (64) yields:

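$$\begin{aligned}\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q^{fab}_{su}(t)-V)R^{alt}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} \leq{}& \mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q_{su}(t_k)-V)R^{alt}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\}\\ &+ \mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(t-t_k)A_{max}R^{alt}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\}\end{aligned}$$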

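Using the facts that $R^{alt}_{su}(t) \leq A_{max}$ and $\sum_{t=t_k}^{t_{k+1}-1}(t-t_k) = \frac{T[k](T[k]-1)}{2} \leq \frac{T^2[k]}{2}$, together with the bound $D$ on the second moment of the frame length from (2), we get: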

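$$\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q^{fab}_{su}(t)-V)R^{alt}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} \leq \mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q_{su}(t_k)-V)R^{alt}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} + \frac{DA_{max}^2}{2} \tag{67}$$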

Next, using (66) in the right hand side of (64) yields:

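$$\begin{aligned}\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q^{fab}_{su}(t)-V)R^{fab}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} \geq{}& \mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q_{su}(t_k)-V)R^{fab}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\}\\ &- \mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(t-t_k)\mu_{max}R^{fab}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\}\end{aligned}$$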

Again using the facts that $R^{fab}_{su}(t) \leq A_{max}$ and $\sum_{t=t_k}^{t_{k+1}-1}(t-t_k) = \frac{T[k](T[k]-1)}{2}$, we get:

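$$\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q^{fab}_{su}(t)-V)R^{fab}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} \geq \mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q_{su}(t_k)-V)R^{fab}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} - \frac{D\mu_{max}A_{max}}{2} \tag{68}$$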

Using (67) and (68) in (64), we have:

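$$\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q_{su}(t_k)-V)R^{alt}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} \geq \mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q_{su}(t_k)-V)R^{fab}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} - C$$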
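The constant in (39) is the sum of the two slack terms in (67) and (68). Each comes from the arithmetic-series identity below combined with the second-moment bound, assuming, as Appendix C indicates, that (2) bounds $\mathbb{E}\{T^2[k]\}$ by $D$:

$$\sum_{t=t_k}^{t_{k+1}-1}(t-t_k) = \sum_{m=0}^{T[k]-1}m = \frac{T[k](T[k]-1)}{2} \leq \frac{T^2[k]}{2},
\qquad
\frac{DA_{max}^2}{2} + \frac{D\mu_{max}A_{max}}{2} = \frac{D(A_{max}+\mu_{max})A_{max}}{2} = C.$$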

APPENDIX B: PROOF OF THEOREM 1, PARTS 2 AND 3

We prove parts (2) and (3) of Theorem 1 using the technique of Lyapunov optimization. Using (14), a bound on the Lyapunov drift under the Frame-Based-Drift-Plus-Penalty-Algorithm is given by:

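$$\begin{aligned}\Delta(t_k) - V\,\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R^{fab}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} \leq{}& B + (Q_{su}(t_k)-V)\,\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R^{fab}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} - X_{su}(t_k)\,\mathbb{E}\{T[k]P_{avg}\,|\,\mathbf{Q}(t_k)\}\\ &- \mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}\big(Q_{su}(t_k)\mu^{fab}_{su}(t) - X_{su}(t_k)P^{fab}_{su}(t)\big)\,\Big|\,\mathbf{Q}(t_k)\Big\}\end{aligned} \tag{69}$$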

Using Lemma 2, we have that:

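$$\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q_{su}(t_k)-V)R^{fab}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} \leq C + \mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q_{su}(t_k)-V)R^{alt}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\}$$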

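Next, note that under the ALT algorithm, we have:

$$\frac{\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q_{su}(t_k)-V)R^{alt}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\}}{\mathbb{E}\{T[k]\,|\,\mathbf{Q}(t_k)\}} \leq \frac{\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}(Q_{su}(t_k)-V)R^{stat}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\}}{\mathbb{E}\{T[k]\,|\,\mathbf{Q}(t_k)\}}$$

where each ratio is normalized by the expected frame length under its own policy (ALT on the left, STAT on the right).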

To see this, we consider two cases:

1) $Q_{su}(t_k) > V$: Then, $R^{alt}_{su}(t) = 0$ for all $t \in \{t_k, t_k+1, \ldots, t_{k+1}-1\}$, so that the left-hand side above is 0 while the right-hand side is $\geq 0$. Hence, the inequality follows.

2) $Q_{su}(t_k) \leq V$: Then, $R^{alt}_{su}(t) = A_{su}(t)$ for all $t \in \{t_k, t_k+1, \ldots, t_{k+1}-1\}$, so that the left-hand side becomes $(Q_{su}(t_k)-V)\lambda_{su}$, while the right-hand side cannot be smaller than $(Q_{su}(t_k)-V)\lambda_{su}$ because $R^{stat}_{su}(t) \leq A_{su}(t)$ and $Q_{su}(t_k) - V \leq 0$.

Combining these, we get:

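$$(Q_{su}(t_k)-V)\,\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R^{fab}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} \leq C + (Q_{su}(t_k)-V)\,\frac{\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R^{stat}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\}}{\mathbb{E}\{T[k]\,|\,\mathbf{Q}(t_k)\}}\,\mathbb{E}\{T[k]\,|\,\mathbf{Q}(t_k)\}$$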

Finally, since the resource allocation part of the Frame-Based-Drift-Plus-Penalty-Algorithm maximizes the ratio in (16), we have:

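$$\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}\big(Q_{su}(t_k)\mu^{fab}_{su}(t) - X_{su}(t_k)P^{fab}_{su}(t)\big)\,\Big|\,\mathbf{Q}(t_k)\Big\} \geq \frac{\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}\big(Q_{su}(t_k)\mu^{stat}_{su}(t) - X_{su}(t_k)P^{stat}_{su}(t)\big)\,\Big|\,\mathbf{Q}(t_k)\Big\}}{\mathbb{E}\{T[k]\,|\,\mathbf{Q}(t_k)\}}\,\mathbb{E}\{T[k]\,|\,\mathbf{Q}(t_k)\}$$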

Using these in (69), we have:

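$$\begin{aligned}\Delta(t_k) - V\,\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R^{fab}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} \leq{}& B + C + (Q_{su}(t_k)-V)\,\frac{\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R^{stat}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\}}{\mathbb{E}\{T[k]\,|\,\mathbf{Q}(t_k)\}}\,\mathbb{E}\{T[k]\,|\,\mathbf{Q}(t_k)\}\\ &- \frac{\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}\big(Q_{su}(t_k)\mu^{stat}_{su}(t) - X_{su}(t_k)P^{stat}_{su}(t)\big)\,\Big|\,\mathbf{Q}(t_k)\Big\}}{\mathbb{E}\{T[k]\,|\,\mathbf{Q}(t_k)\}}\,\mathbb{E}\{T[k]\,|\,\mathbf{Q}(t_k)\} - X_{su}(t_k)\,\mathbb{E}\{T[k]P_{avg}\,|\,\mathbf{Q}(t_k)\}\end{aligned}$$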

Using (34)-(36) in the inequality above, we get:

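$$\Delta(t_k) - V\,\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R^{fab}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\} \leq B + C - V\upsilon^*\,\mathbb{E}\{T[k]\,|\,\mathbf{Q}(t_k)\} \tag{70}$$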

To prove (41), we rearrange (70) to get:

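$$\begin{aligned}\Delta(t_k) &\leq B + C - V\upsilon^*\,\mathbb{E}\{T[k]\,|\,\mathbf{Q}(t_k)\} + V\,\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R^{fab}_{su}(t)\,\Big|\,\mathbf{Q}(t_k)\Big\}\\ &\leq B + C + VT_{max}A_{max}\end{aligned}$$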

(41) now follows from Theorem 4.1 of [21]. Since $X_{su}(t_k)$ is mean rate stable, (42) follows from Theorem 2.5(b) of [21]. To prove (44), we take expectations of both sides of (70) to get:

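$$\mathbb{E}\{L(\mathbf{Q}(t_{k+1}))\} - \mathbb{E}\{L(\mathbf{Q}(t_k))\} - V\,\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R^{fab}_{su}(t)\Big\} \leq B + C - V\upsilon^*\,\mathbb{E}\{T[k]\}$$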

Summing over $k \in \{1, 2, \ldots, K\}$, dividing by $V$, and rearranging yields:

$$\sum_{k=1}^{K}\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R^{fab}_{su}(t)\Big\} \geq \upsilon^*\sum_{k=1}^{K}\mathbb{E}\{T[k]\} - \frac{(B+C)K}{V}$$

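where we used the fact that $\mathbb{E}\{L(\mathbf{Q}(t_{K+1}))\} \geq 0$ and $\mathbb{E}\{L(\mathbf{Q}(t_1))\} = 0$. From this, we have:

$$\frac{\sum_{k=1}^{K}\mathbb{E}\Big\{\sum_{t=t_k}^{t_{k+1}-1}R^{fab}_{su}(t)\Big\}}{\sum_{k=1}^{K}\mathbb{E}\{T[k]\}} \geq \upsilon^* - \frac{(B+C)K}{V\sum_{k=1}^{K}\mathbb{E}\{T[k]\}} \geq \upsilon^* - \frac{B+C}{VT_{min}}$$

since $\sum_{k=1}^{K}\mathbb{E}\{T[k]\} \geq KT_{min}$. This proves (44).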

APPENDIX C: COMPUTING D

Here, we compute a finite $D$ that satisfies (2). First, note that $\mathbb{E}\{T^2[k]\}$ is largest when the secondary user never cooperates. Next, let $I[k]$ and $B[k]$ denote the lengths of the primary user idle and busy periods, respectively, in the $k$th frame. Thus, we have $T[k] = I[k] + B[k]$.

In the following, we drop $[k]$ from the notation for convenience. Using the independence of $I$ and $B$, we have:

$$\mathbb{E}\{T^2\} = \mathbb{E}\{I^2\} + \mathbb{E}\{B^2\} + 2\,\mathbb{E}\{I\}\,\mathbb{E}\{B\}$$

We note that $I$ is a geometric r.v. with parameter $\lambda_{pu}$. Thus, $\mathbb{E}\{I\} = 1/\lambda_{pu}$ and $\mathbb{E}\{I^2\} = (2-\lambda_{pu})/\lambda_{pu}^2$. To calculate $\mathbb{E}\{B\}$, we apply Little's Theorem to get:

$$\mathbb{E}\{I\} = \Big(1 - \frac{\lambda_{pu}}{\phi_{nc}}\Big)\big(\mathbb{E}\{I\} + \mathbb{E}\{B\}\big)$$

This yields $\mathbb{E}\{B\} = 1/(\phi_{nc} - \lambda_{pu})$. To calculate $\mathbb{E}\{B^2\}$, we use the observation that changing the service order of packets in the primary queue to preemptive LIFO does not change the length of the busy period $B$. However, with LIFO scheduling, $B$ now equals the duration that the first packet stays in the queue. Next, suppose there are $N$ packets that interrupt the service of the first packet. Let these be indexed as $\{1, 2, \ldots, N\}$. We can relate $B$ to the service time $X$ of the first packet and the durations for which all these other packets stay in the queue as follows:

$$B = X + \sum_{i=1}^{N}B_i \tag{71}$$

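Here, $B_i$ denotes the duration for which packet $i$ stays in the queue. Using the memoryless property of the i.i.d. arrival process of the primary packets as well as the i.i.d. nature of the service times, it follows that the r.v.s $B_i$ are i.i.d. with the same distribution as $B$. Further, they are independent of $N$ and $X$. Squaring (71) and taking expectations, we get:

$$\mathbb{E}\{B^2\} = \mathbb{E}\{X^2\} + 2\,\mathbb{E}\{XN\}\,\mathbb{E}\{B\} + \mathbb{E}\Big\{\Big(\sum_{i=1}^{N}B_i\Big)^2\Big\} \tag{72}$$

Note that $X$ is a geometric r.v. with parameter $\phi_{nc}$. Thus, $\mathbb{E}\{X\} = 1/\phi_{nc}$ and $\mathbb{E}\{X^2\} = (2-\phi_{nc})/\phi_{nc}^2$. Also, given $X = x$, $N$ is a binomial r.v. with parameters $(x, \lambda_{pu})$. Hence $\mathbb{E}\{N\} = \lambda_{pu}\mathbb{E}\{X\} = \lambda_{pu}/\phi_{nc}$ and, since $N$ is not independent of $X$, $\mathbb{E}\{XN\} = \mathbb{E}\{X\,\mathbb{E}\{N|X\}\} = \lambda_{pu}\mathbb{E}\{X^2\}$. Using these and $\mathbb{E}\{B\} = 1/(\phi_{nc} - \lambda_{pu})$ in (72), we have:

$$\mathbb{E}\{B^2\} = \frac{2-\phi_{nc}}{\phi_{nc}^2} + \frac{2\lambda_{pu}(2-\phi_{nc})}{\phi_{nc}^2(\phi_{nc}-\lambda_{pu})} + \mathbb{E}\Big\{\Big(\sum_{i=1}^{N}B_i\Big)^2\Big\}$$

To calculate the last term, we expand the square; the cross terms run over the $N(N-1)$ ordered pairs $i \neq j$:

$$\mathbb{E}\Big\{\Big(\sum_{i=1}^{N}B_i\Big)^2\Big\} = \mathbb{E}\Big\{\sum_{i=1}^{N}B_i^2\Big\} + \mathbb{E}\Big\{\sum_{i\neq j}B_iB_j\Big\} = \mathbb{E}\{N\}\,\mathbb{E}\{B^2\} + (\mathbb{E}\{B\})^2\big(\mathbb{E}\{N^2\} - \mathbb{E}\{N\}\big)$$

Since $N$ given $X = x$ is binomial with parameters $(x, \lambda_{pu})$, we have:

$$\begin{aligned}
\mathbb{E}\{N^2\} &= \sum_{x\geq 1}\mathbb{E}\{N^2\,|\,X=x\}\,\text{Prob}[X=x] = \sum_{x\geq 1}\big[(x\lambda_{pu})^2 + x\lambda_{pu}(1-\lambda_{pu})\big](1-\phi_{nc})^{x-1}\phi_{nc}\\
&= \lambda_{pu}^2\sum_{x\geq 1}x^2\phi_{nc}(1-\phi_{nc})^{x-1} + \lambda_{pu}(1-\lambda_{pu})\sum_{x\geq 1}x\,\phi_{nc}(1-\phi_{nc})^{x-1}\\
&= \lambda_{pu}^2\,\frac{2-\phi_{nc}}{\phi_{nc}^2} + \lambda_{pu}(1-\lambda_{pu})\,\frac{1}{\phi_{nc}}
\end{aligned}$$

so that $\mathbb{E}\{N^2\} - \mathbb{E}\{N\} = 2\lambda_{pu}^2(1-\phi_{nc})/\phi_{nc}^2$. Using this, we have:

$$\mathbb{E}\Big\{\Big(\sum_{i=1}^{N}B_i\Big)^2\Big\} = \frac{\lambda_{pu}}{\phi_{nc}}\,\mathbb{E}\{B^2\} + \Big(\frac{1}{\phi_{nc}-\lambda_{pu}}\Big)^2\frac{2\lambda_{pu}^2(1-\phi_{nc})}{\phi_{nc}^2}$$

Substituting this into the expression for $\mathbb{E}\{B^2\}$ above, we have:

$$\mathbb{E}\{B^2\} = \frac{2-\phi_{nc}}{\phi_{nc}^2} + \frac{2\lambda_{pu}(2-\phi_{nc})}{\phi_{nc}^2(\phi_{nc}-\lambda_{pu})} + \frac{\lambda_{pu}}{\phi_{nc}}\,\mathbb{E}\{B^2\} + \frac{2\lambda_{pu}^2(1-\phi_{nc})}{\phi_{nc}^2(\phi_{nc}-\lambda_{pu})^2}$$

Simplifying this yields:

$$\mathbb{E}\{B^2\} = \frac{2-\phi_{nc}}{\phi_{nc}(\phi_{nc}-\lambda_{pu})} + \frac{2\lambda_{pu}(2-\phi_{nc})}{\phi_{nc}(\phi_{nc}-\lambda_{pu})^2} + \frac{2\lambda_{pu}^2(1-\phi_{nc})}{\phi_{nc}(\phi_{nc}-\lambda_{pu})^3} = \frac{2\phi_{nc} - \phi_{nc}^2 - \lambda_{pu}^2}{(\phi_{nc}-\lambda_{pu})^3}$$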
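The following sketch evaluates $\mathbb{E}\{B\} = 1/(\phi_{nc}-\lambda_{pu})$, $\mathbb{E}\{B^2\} = (2\phi_{nc}-\phi_{nc}^2-\lambda_{pu}^2)/(\phi_{nc}-\lambda_{pu})^3$ (the value implied by (71) once the dependence of $N$ on $X$ is accounted for) and the resulting $D = \mathbb{E}\{T^2\}$ for a frame without cooperation, and cross-checks $\mathbb{E}\{B^2\}$ by Monte Carlo. The simulated busy period assumes that in every busy slot a new packet arrives with probability $\lambda_{pu}$ and, independently, the packet in service departs with probability $\phi_{nc}$, which is the branching structure behind (71). Function names, the seed and the sample size are arbitrary choices.

```python
import random


def busy_period_moments(lam, phi):
    """E{B} = 1/(phi - lam) and E{B^2} = (2 phi - phi^2 - lam^2) / (phi - lam)^3."""
    gap = phi - lam
    return 1.0 / gap, (2 * phi - phi ** 2 - lam ** 2) / gap ** 3


def frame_second_moment(lam, phi):
    """D = E{T^2} = E{I^2} + E{B^2} + 2 E{I} E{B} for a frame without cooperation."""
    e_i, e_i2 = 1.0 / lam, (2 - lam) / lam ** 2
    e_b, e_b2 = busy_period_moments(lam, phi)
    return e_i2 + e_b2 + 2 * e_i * e_b


def sample_busy_period(lam, phi, rng):
    """Length in slots of one busy period started by a single packet."""
    backlog, slots = 1, 0
    while backlog > 0:
        slots += 1
        backlog += int(rng.random() < lam)  # arrival during a busy slot
        backlog -= int(rng.random() < phi)  # successful service of one packet
    return slots


rng = random.Random(2011)
lam, phi = 0.2, 0.6
samples = [sample_busy_period(lam, phi, rng) for _ in range(200_000)]
mc_b = sum(samples) / len(samples)
mc_b2 = sum(b * b for b in samples) / len(samples)
print(busy_period_moments(lam, phi), (mc_b, mc_b2))
print(frame_second_moment(0.5, 0.6))  # D for the parameters of Sec. VII
```

For $\lambda_{pu} = 0.2$ and $\phi_{nc} = 0.6$, the closed forms give $\mathbb{E}\{B\} = 2.5$ and $\mathbb{E}\{B^2\} = 12.5$. For the simulation parameters of Sec. VII ($\lambda_{pu} = 0.5$, $\phi_{nc} = 0.6$), $D = 6 + 590 + 40 = 636$.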

REFERENCES

[1] I. F. Akyildiz, W.-Y. Lee, M. C. Vuran, and S. Mohanty. NeXt generation/dynamic spectrum access/cognitive radio wireless networks: A survey. Comput. Netw., 50:2127-2159, Sept. 2006.

[2] Q. Zhao and B. Sadler. A survey of dynamic spectrum access. IEEE Signal Processing Magazine, 24(3):79-89, May 2007.

[3] R. Urgaonkar and M. J. Neely. Opportunistic scheduling with reliability guarantees in cognitive radio networks. IEEE Trans. Mobile Computing, 8(6):766-777, June 2009.

[4] A. J. Goldsmith, S. A. Jafar, I. Maric, and S. Srinivasa. Breaking spectrum gridlock with cognitive radios: An information theoretic perspective. Proc. of the IEEE, 97(5):894-914, May 2009.


[5] A. Carleial. Interference channels. IEEE Trans. Inform. Theory, 24(1):60-70, Jan. 1978.

[6] T. Han and K. Kobayashi. A new achievable rate region for the interference channel. IEEE Trans. Inform. Theory, 27(1):49-60, Jan. 1981.

[7] T. Cover and A. E. Gamal. Capacity theorems for the relay channel. IEEE Trans. Inform. Theory, 25(5):572-584, Sep. 1979.

[8] T. M. Cover and J. A. Thomas. Elements of Information Theory. New York: John Wiley & Sons, Inc., 1991.

[9] O. Simeone, Y. Bar-Ness, and U. Spagnolini. Stable throughput of cognitive radios with and without relaying capability. IEEE Trans. Communications, 55(12):2351-2360, Dec. 2007.

[10] O. Simeone, I. Stanojev, S. Savazzi, Y. Bar-Ness, U. Spagnolini, and R. Pickholtz. Spectrum leasing to cooperating secondary ad hoc networks. IEEE JSAC Special Issue on Cognitive Radio: Theory and Applications, 26(1):203-213, Jan. 2008.

[11] J. Zhang and Q. Zhang. Stackelberg game for utility-based cooperative cognitive radio networks. Proc. ACM MobiHoc, May 2009.

[12] I. Krikidis, J. N. Laneman, J. Thompson, and S. McLaughlin. Protocol design and throughput analysis for multi-user cognitive cooperative systems. IEEE Trans. Wireless Commun., 8(9):4740-4751, Sept. 2009.

[13] B. Rong, I. Krikidis, and A. Ephremides. Network-level cooperation with enhancements based on the physical layer. IEEE Information Theory Workshop, Cairo, Egypt, Jan. 2010.

[14] M. Levorato, U. Mitra, and M. Zorzi. Cognitive interference management in retransmission-based wireless networks. Proc. 47th Allerton Conference on Communication, Control, and Computing, Sept. 2009.

[15] G. Gur, S. Bayhan, and F. Alagoz. Cognitive femtocell networks: An overlay architecture for localized dynamic spectrum access. IEEE Wireless Communications, 17(4):62-70, Aug. 2010.

[16] J. Jin and B. Li. Cooperative resource management in cognitive WiMAX with femto cells. Proc. IEEE INFOCOM, March 2010.

[17] L. Georgiadis, M. J. Neely, and L. Tassiulas. Resource allocation and cross-layer control in wireless networks. Foundations and Trends in Networking, 1(1):1-149, 2006.

[18] M. J. Neely. Stochastic optimization for Markov modulated networks with application to delay constrained wireless scheduling. IEEE Conference on Decision and Control, Dec. 2009.

[19] C. Li and M. J. Neely. Network utility maximization over partially observable Markovian channels. arXiv:1008.3421v1, Aug. 2010.

[20] M. J. Neely. Dynamic optimization and learning for renewal systems. Proc. Asilomar Conference, Nov. 2010.

[21] M. J. Neely. Stochastic Network Optimization with Application to Communication & Queueing Systems. Morgan & Claypool, 2010.

[22] M. J. Neely. Energy optimal control for time varying wireless networks. IEEE Trans. Inform. Theory, 52(7):2915-2934, July 2006.

[23] D. P. Bertsekas. Dynamic Programming and Optimal Control, vols. 1 & 2, Belmont, MA: Athena Scientific, 2007.

[24] E. Altman. Constrained Markov Decision Processes. Boca Raton, FL: Chapman and Hall/CRC Press, 1999.

[25] M. L. Puterman. Markov Decision Processes. John Wiley & Sons, 2005.

[26] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.

[27] R. Gallager. Discrete Stochastic Processes. Kluwer Academic Publishers, Boston, 1996.
