communication networks: pricing, congestion control, routing, … · 2019-12-18 · communication...

Communication Networks: Pricing, Congestion Control, Routing, and Scheduling

Srinivas Shakkottai and R. Srikant

Contents

1 Introduction
2 Congestion Control
2.1 VCG Mechanism
2.2 Kelly Mechanism
2.3 Strategic or Price-Anticipating Users
3 Flow Routing
3.1 Braess’ Paradox
3.2 Flow Routing Game
4 Pricing Approach to Scheduling
4.1 Mean Field Games
4.2 Mean Field Auction Model
4.3 Properties of Optimal Bid Function
4.4 Existence of MFE
4.5 Properties of MFE
5 Notes
References

Abstract

This chapter considers three fundamental problems in the general area of communication networks and their relationship to game theory. These problems are (i) allocation of shared bandwidth resources, (ii) routing across shared links, and (iii) scheduling across shared spectrum. Each problem inherently involves agents that experience negative externalities under which the presence of one degrades the utility perceived by others. Two approaches to solving such problems are (i) to find a globally optimal allocation and simply implement it in a fait accompli fashion, and (ii) to request information from the competing agents (traffic flows) and construct a mechanism to allocate resources. Often, only the second option is viable, since a centralized solution using complete information might be impractical (or impossible) with many millions of competing flows, each one having private information about the application that it corresponds to. Hence, a game theoretical analysis of these problems is natural. In what follows, we will present results on each problem and characterize the efficiency loss that results from the mechanism employed.

Keywords

Communication networks · Utility maximization · Congestion control · Traffic routing · Packet scheduling

S. Shakkottai (✉)
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
e-mail: [email protected]

R. Srikant
Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Champaign, IL, USA
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
T. Basar, G. Zaccour (eds.), Handbook of Dynamic Game Theory, https://doi.org/10.1007/978-3-319-27335-8_29-2

1 Introduction

Communication networks are possibly the largest control systems in existence. They consist of many millions of flows interacting with each other as well as the network infrastructure, and competing for available capacity on wired and wireless links. The most commonly used communication network today is the Internet, which is illustrated in schematic form in Fig. 1. Here, we have flows, each of which is between two hosts, commonly a Web server and an end user. A flow typically consists of data packets from the server end, and acknowledgements back from the end user. The packets traverse a route that consists of communication links, with the direction of packet forwarding being determined by routers. The network infrastructure itself is owned by several different Internet service providers, each of which implements different traffic shaping, scheduling, and pricing policies in its particular network.

In Fig. 1, the server must choose the rate of transmission of packets to each of the flows based on the available capacity in the end-to-end route of the flow. It obtains information on the state of the links through feedback from the routers, which could either mark or drop packets if the rate of packet arrivals on a particular link is too high. This feedback is returned to the server using the acknowledgement packets, and the server then reduces or increases its rate based on the feedback received.

Control actions at different network routers are generally implemented via simple algorithms. The individual routers usually do not maintain per-flow information and take decisions on which packets to forward at each instant of time based on their perception of fairness and stability across packets arriving from different directions. These decisions result in packets being dropped or marked, which forms the feedback returned to the server. With the increasing prevalence of software-defined networking, however, it is increasingly possible to take decisions on a per-flow basis.


[Figure 1: schematic showing a server, Internet service providers, an Internet exchange point, and end users]

Fig. 1 Internet flows between a server and end users

Wireless links usually form the last hop of a flow. The number of competing users in the case of WiFi is usually low, and simple randomized access is employed. In the cellular data context, however, the usage of wireless links across competing flows is carefully scheduled due to the limited availability of wireless spectrum. Such scheduling decisions again may result in dropped or marked packets.

Communication networks have usually been designed with the idea of distributedly achieving an overall goal of fair resource allocation and good quality of service to all flows assuming a cooperative setup. For a more comprehensive study of Internet control systems, the reader is referred to Srikant (2004) and Shakkottai and Srikant (2007). However, given the inherent resource constraints in the system and the desire for each end user to get the best possible quality of service, it is natural to try to understand the system from a game theoretic perspective. This approach is becoming increasingly popular, particularly in the case of wireless resource allocation due to the perceived scarcity of the resource.

This chapter deals with the analysis of communication networks from the perspective of strategic agents. Our focus is on two main problems, namely, (i) allocation of capacity to competing flows and (ii) routing decisions by flows. We consider three questions with different interaction models between the users as follows:

1. Nash equilibrium. Resource allocation across a finite number of agents. Here, we consider the problem of resource sharing via an auction mechanism that requests bids from a finite set of agents and performs an allocation based on the responses. The efficient solution is to allocate resources in such a way that overall utility of the system is maximized. Our objective will be to quantify the efficiency loss in this system, comparing the case where the agents are nonstrategic, in that they do not consider the existence of other agents, with the case where they are strategic.

2. Wardrop equilibrium. Routing with infinite agents. Here, we consider the problem encountered in choosing between different routes on a per-packet basis. A single packet has effectively no impact on any other packet, but the total traffic along each route would impact the delay seen by each packet using that route. Each packet desires to reach the destination in the shortest possible time, while the overall efficient solution is to minimize the total delay in the system.

3. Mean Field Equilibrium. Repeated resource allocation with infinite agents. Finally, we consider the problem of a repeated auction of resources between agents that only compete against random subsets of other agents for any particular resource. Here, agents are unaware of whom they would compete against in the next auction, and hence model their competitors via a belief about what they are likely to bid. As before, we are interested in the question of whether an efficient allocation can be achieved at each step.

2 Congestion Control

As we described in the previous section, a communication network can be identified with a set of sources of traffic (or users) $R$ and a set of links $L$. Each link $l \in L$ has a finite capacity $c_l$. Each source desires to communicate with a destination in the network and uses a route $r \subset L$ to reach its destination. Thus, we can equivalently associate each source with the route it uses, and we will interchangeably refer to both by $r \in R$. We denote the utility that a user obtains from transmitting data on route $r$ at rate $x_r$ by $U_r(x_r)$. The typical assumption is that the utility function is continuously differentiable, nondecreasing, and strictly concave. We further assume that $U_r(0) \ge 0$. The assumption on concavity of the utility function represents the fact that a user's quality of experience has diminishing returns per unit of rate allocated on the links that it uses. For instance, the perceived value of a rate increase of 1 Mbps is much greater when the user has a low rate than when it has a high rate.

Consider a network planner who is interested in allocating resources to users with the goal of maximizing the sum of the users’ utilities. The network planner can do this only if he knows the utility functions of all the users in the network, or if there is an incentive for the users to reveal their utility functions truthfully. In this section, we will first discuss an incentive mechanism called the Vickrey-Clarke-Groves (VCG) mechanism, which makes it profitable for the users to reveal their true utilities to the central network planner. However, the amount of information that needs to be conveyed by the users and the amount of computation required on the part of the network planner make it difficult to implement the VCG mechanism. One can instead design a mechanism based on the idea of distributed resource allocation using a gradient approach. We call this the Kelly mechanism. However, this mechanism is truth-revealing only under the assumption that the network is very large, so that it is difficult for each user to estimate its impact on the price decided by the network. Users that are unaware of their effect on the price are called price taking. On the other hand, in a small network such as a single link, a user may be able to assess its impact on the network price. In such a case, the user may act strategically, i.e., act in a manner that influences the price to maximize its own benefit. We will review results that show that the inefficiency of the Kelly mechanism with strategic users is bounded by 25%, i.e., the Kelly mechanism loses at most a factor of $1/4$ compared to the maximum possible network utility.

2.1 VCG Mechanism

Consider a network planner who wants to solve a utility maximization problem, where each user $r$ is associated with a route $r$:

$$\max_{x \ge 0} \sum_r U_r(x_r)$$

subject to

$$\sum_{r : l \in r} x_r \le c_l, \quad \forall l.$$

Here, $x_r$ is the rate allocated to user $r$, who has a strictly concave utility function given by $U_r$, and $c_l$ is the capacity of link $l$. Also, we use the notation $l \in r$ to denote the fact that link $l$ is part of route $r$.

Suppose that the network planner asks each user to reveal their utilities and user $r$ reveals its utility function as $\tilde{U}_r(x_r)$, which may or may not be the same as $U_r(x_r)$. Users may choose to lie about their utility function to get a higher rate than they would get by revealing their true utility function. Let us suppose that the network solves the maximization problem

$$\max_{x \ge 0} \sum_r \tilde{U}_r(x_r)$$

subject to

$$\sum_{r : l \in r} x_r \le c_l, \quad \forall l$$

and allocates the resulting optimal solution $\tilde{x}_r$ to user $r$. In return for allocating this rate to user $r$, the network charges a certain price $p_r$. The price is calculated as follows. The network planner calculates the reduction in the sum of the utilities obtained by other users in the network due to the presence of user $r$ and collects this amount as the price from user $r$. Specifically, the network planner first obtains the optimal solution $\{\bar{x}_s\}$ to the following problem:

$$\max_{x \ge 0} \sum_{s \ne r} \tilde{U}_s(x_s)$$

subject to

$$\sum_{s \ne r : l \in s} x_s \le c_l, \quad \forall l.$$

In other words, the network planner first solves the utility maximization problem without including user $r$. The price $p_r$ is then computed as

$$p_r = \sum_{s \ne r} \tilde{U}_s(\bar{x}_s) - \sum_{s \ne r} \tilde{U}_s(\tilde{x}_s),$$

which is the difference between the sum of the utilities of all other users without ($\{\bar{x}\}$) and with ($\{\tilde{x}\}$) the presence of user $r$. The network planner announces this mechanism to the users of the network, i.e., the network planner states that once the users reveal their utilities, it will allocate resources by solving the utility maximization problem and will charge a price $p_r$ to user $r$. Now the question for the users is the following: what utility function should user $r$ announce to maximize its payoff? The payoff is the utility minus the price:

$$U_r(\tilde{x}_r) - p_r.$$

We will now see that an optimal strategy for each user is to truthfully reveal its utility function. We will show this by proving that announcing a false utility function cannot increase the payoff for user $r$.

Suppose user $r$ reveals its utility function truthfully, while the other users may or may not. In this case, the payoff for user $r$ is given by

$$U^t = U_r(\tilde{x}_r^t) - \left( \sum_{s \ne r} \tilde{U}_s(\bar{x}_s^t) - \sum_{s \ne r} \tilde{U}_s(\tilde{x}_s^t) \right),$$

where $\{\tilde{x}_s^t\}$ is the allocation given to the users by the network planner and $\{\bar{x}_s^t\}$ is the solution of the network utility maximization problem when user $r$ is excluded from the network. The superscript $t$ indicates that user $r$ has revealed its utility function truthfully. Next, suppose that user $r$ lies about its utility function and denote the network planner’s allocation by $\tilde{x}^l$. The superscript $l$ indicates that user $r$ has lied. Now, the payoff for user $r$ is given by

$$U^l = U_r(\tilde{x}_r^l) - \left( \sum_{s \ne r} \tilde{U}_s(\bar{x}_s^t) - \sum_{s \ne r} \tilde{U}_s(\tilde{x}_s^l) \right).$$

If truth-telling were not optimal, we would have $U^l > U^t$ for some false report. If this were true, then by comparing the two expressions for $U^t$ and $U^l$, we would get

$$U_r(\tilde{x}_r^l) + \sum_{s \ne r} \tilde{U}_s(\tilde{x}_s^l) > U_r(\tilde{x}_r^t) + \sum_{s \ne r} \tilde{U}_s(\tilde{x}_s^t),$$

which contradicts the fact that $\tilde{x}^t$ is the optimal solution to

$$\max_{x \ge 0} \; U_r(x_r) + \sum_{s \ne r} \tilde{U}_s(x_s)$$

subject to the capacity constraints. Thus, truth-telling is optimal under the VCG mechanism. Note that truth-telling is optimal for user $r$ independent of the strategies of the other users. A strategy which is optimal for a user independent of the strategies of other users is called a dominant strategy in game theory. Thus, truth-telling is a dominant strategy under the VCG mechanism.
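The dominant-strategy argument can be checked numerically on a toy instance. The sketch below is only illustrative and rests on assumptions that are not part of the chapter: a single link, utilities of the hypothetical form $U_r(x) = a_r \log(1 + x)$ so that a "report" is just a weight, and helper names (`allocate`, `vcg_payoff`) invented for this example.

```python
import math

def allocate(weights, capacity):
    """Maximize sum_r a_r*log(1 + x_r) s.t. sum_r x_r <= capacity, x >= 0.

    The KKT conditions give x_r = max(a_r/lam - 1, 0); we bisect on the
    multiplier lam until the link is (numerically) exactly full.
    Assumes at least one positive weight.
    """
    lo, hi = 1e-12, max(weights)
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        used = sum(max(a / lam - 1.0, 0.0) for a in weights)
        if used > capacity:
            lo = lam  # multiplier too small: demand exceeds capacity
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return [max(a / lam - 1.0, 0.0) for a in weights]

def vcg_payoff(true_weight, r, reports, capacity):
    """Payoff U_r(x~_r) - p_r of user r, whose true weight is true_weight,
    when the planner sees the reported weights in `reports`."""
    x_tilde = allocate(reports, capacity)            # allocation with user r
    others = reports[:r] + reports[r + 1:]
    x_bar = allocate(others, capacity)               # allocation without user r
    welfare_without = sum(a * math.log(1.0 + x) for a, x in zip(others, x_bar))
    welfare_with = sum(a * math.log(1.0 + x)
                       for s, (a, x) in enumerate(zip(reports, x_tilde))
                       if s != r)
    price = welfare_without - welfare_with           # VCG price p_r
    return true_weight * math.log(1.0 + x_tilde[r]) - price
```

For example, with reports `[2.0, 1.0, 1.5]` on a link of capacity 3, comparing `vcg_payoff(2.0, 0, [2.0, 1.0, 1.5], 3.0)` with `vcg_payoff(2.0, 0, [b, 1.0, 1.5], 3.0)` for other values of `b` shows that user 0 never gains by misreporting its weight.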

In the above discussion, note that $\bar{x}$ is somewhat irrelevant to the pricing mechanism. One could have chosen any $\bar{x}$ that is a function of the strategies of all users other than $r$ in computing the price for user $r$, and the result would still hold, i.e., truth-telling would still be a dominant strategy. The reason for this is that the expression for $U^t - U^l$ is independent of $\bar{x}$, since the computation of $\bar{x}$ does not use either $U_r(\cdot)$ or $\tilde{U}_r(\cdot)$. Another point to note is that truth-telling is an optimal strategy, but not necessarily the only one. Given the strategies of all the other users, there may be other strategies for user $r$ that are optimal as well. Such user strategies may result in allocations that are not optimal from the point of view of the network planner.

We have thus established that truth-telling is optimal under a VCG mechanism. However, the VCG mechanism is not used in networks. The reason for this is twofold:

• Each user is asked to reveal its utility function. Thus, an entire function has to be revealed by each user, which imposes a significant communication complexity in the information exchange required between the users and the network planner.

• The network planner has to solve many maximization problems: one to compute the resource allocation and one for each user to compute that user’s price. Each of these optimization problems can be computationally quite expensive to solve in a centralized manner.

In the next subsection, we show how one can design a distributed mechanism for utility maximization.

2.2 Kelly Mechanism

One simple method to reduce the communication burden required to exchange information between the users and the network planner is to ask the users to submit bids, which are amounts that the users are willing to pay for the resource, i.e., the link capacities in the network. We refer to the mechanism of submitting bids for resources in the context of network resource allocation as the Kelly mechanism. We will describe the Kelly mechanism only for the case of a single link with capacity $c$.

Let the bid of user $r$ be denoted by $w_r$. Given the bids, suppose that the network computes a price per unit amount of the resource as

$$q \triangleq \frac{\sum_k w_k}{c} \tag{1}$$

and allocates an amount of resource $x_r$ to user $r$ according to $x_r = w_r/q$. This is a weighted proportionally fair allocation since it is equivalent to maximizing $\sum_r w_r \log x_r$ subject to the resource constraint.

The payoff that the user obtains is given by

$$U_r\!\left(\frac{w_r}{q}\right) - w_r. \tag{2}$$

Since the user is rational, it would try to maximize the payoff. We assume that users are price taking and hence are unaware of the effect that their bids have on the price per unit resource. As far as they know, the central authority is selling them a resource at a price $q$, regardless of what their bid might be. The system is illustrated in Fig. 2.
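The equivalence between the allocation rule $x_r = w_r/q$ and weighted proportional fairness can be verified with a short worked derivation. This is a sketch assuming all bids are strictly positive, with $q$ playing the role of the Lagrange multiplier of the capacity constraint:

```latex
\begin{align*}
&\max_{x \ge 0} \; \sum_r w_r \log x_r
   \quad \text{subject to} \quad \sum_r x_r \le c, \\
&\text{stationarity: } \frac{w_r}{x_r} = q \;\; \forall r
   \quad \Longrightarrow \quad x_r = \frac{w_r}{q}, \\
&\text{tight capacity: } \sum_r x_r = \frac{\sum_r w_r}{q} = c
   \quad \Longrightarrow \quad q = \frac{\sum_k w_k}{c}.
\end{align*}
```

The multiplier recovered in the last line coincides with the price defined in (1).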

What would a user bid given that the price per unit resource is $q$? Clearly, the user would try to maximize the payoff and try to solve the problem:

$$\max_{w_r \ge 0} \; U_r\!\left(\frac{w_r}{q}\right) - w_r.$$

Assuming, as we did earlier, that $U_r(x_r) \to -\infty$ as $x_r \to 0$, the optimal value of $w_r$ is strictly greater than zero and is given by

$$U_r'\!\left(\frac{w_r}{q}\right) = q. \tag{3}$$

[Figure 2: Players 1, 2, and 3 submit bids $w_1$, $w_2$, $w_3$ to a weighted proportionally fair system, which returns $(x_1, q)$, $(x_2, q)$, and $(x_3, q)$]

Fig. 2 In the price-taking paradigm, users are unaware of both how the price is calculated and of each others’ bids. As far as the users are concerned, the system is a fair black box that accepts bids and outputs the price and their respective allocations


Since we know that $x_r = w_r/q$, the above equation can be equivalently written in two other forms:

$$q = U_r'(x_r) \tag{4}$$

and

$$w_r = x_r U_r'(x_r). \tag{5}$$

Now the utility maximization problem that the network planner wants to solve is given by

$$\max_{x \ge 0} \sum_r U_r(x_r),$$

subject to $\sum_r x_r \le c$. The solution to the problem satisfies the KKT optimality conditions given by

$$U_r'(x_r) = q, \qquad \sum_r x_r = c. \tag{6}$$

The weighted proportionally fair mechanism ensures that the capacity condition in (6) is satisfied (we do not allocate more resource than is available). Also, we have just seen that, coupled with rational price-taking users, the mechanism results in an allocation that satisfies (4), which is identical to the first condition in (6). Thus, there exists a solution to the system of price-taking users, which we call $(x^T, q^T)$ (using $T$ to denote “taking”), that achieves the desired utility maximization by using the weighted proportionally fair mechanism.
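As a small illustration of this fixed point, the sketch below computes the price-taking outcome for a hypothetical family of utilities $U_r(x) = a_r \log x$ (an assumption made here only because the bid equation (5) then has the closed form $w_r = a_r$). The function name `price_taking_equilibrium` is likewise invented for this example.

```python
def price_taking_equilibrium(a, c):
    """Kelly mechanism on one link of capacity c, assuming U_r(x) = a_r*log(x).

    A price taker maximizes a_r*log(w_r/q) - w_r, whose maximizer is
    w_r = a_r for every price q > 0 (this is equation (5) for log utilities).
    """
    bids = list(a)                      # rational price-taking bids
    q = sum(bids) / c                   # price rule (1)
    rates = [w / q for w in bids]       # allocation x_r = w_r / q
    return bids, q, rates

def kkt_residual(a, c, q, rates):
    """Largest violation of the conditions (6) for U_r(x) = a_r*log(x)."""
    marginal = max(abs(ar / x - q) for ar, x in zip(a, rates))  # U_r'(x_r) = q
    capacity = abs(sum(rates) - c)                              # sum_r x_r = c
    return max(marginal, capacity)
```

Calling `price_taking_equilibrium([1.0, 2.0, 5.0], 4.0)` and passing the result to `kkt_residual` gives a residual of essentially zero, i.e., the price-taking outcome satisfies the planner's optimality conditions for this choice of utilities.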

2.2.1 Relation to Decentralized Utility Maximization

Now, suppose that the network wants to reduce its computational burden. Then, it can compute the price according to the following dual algorithm:

$$\dot{q} = \left( \sum_r x_r - c \right)^+_q.$$

Here, $(g(x))^+_y$ denotes

$$(g(x))^+_y = \begin{cases} g(x), & y > 0, \\ \max(g(x), 0), & y = 0. \end{cases}$$

We use projection above to ensure that the price $q$ never goes negative. The algorithm can be interpreted as the system responding to resource usage by giving back a price, which it increases or decreases depending on whether the resource is overutilized or underutilized, respectively. User $r$ is then allocated a rate $w_r/q$ in proportion to its bid $w_r$. Given the price, we have already seen that user $r$’s rational bid $w_r$, assuming that it is a price-taking user, is given by

$$U_r'\!\left(\frac{w_r}{q}\right) = q,$$

which is the same as $U_r'(x_r) = q$.

It is easy to show that the price update along with such a user response converges to the optimal solution of the network utility maximization problem. Hence, the algorithm can be thought of as a decentralized implementation of the Kelly mechanism. In summary, the Lagrange multiplier is a pricing incentive for users to behave in a socially responsible manner assuming that they are price taking.
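The price dynamics can be simulated with a simple Euler discretization. The sketch below is not the chapter's algorithm verbatim: it assumes utilities $U_r(x) = a_r \sqrt{x}$ (so that $U_r'(x_r) = q$ gives $x_r = (a_r/(2q))^2$), a fixed step size, and a small floor `q_min` standing in for the projection at $q = 0$. All names and parameter values are hypothetical.

```python
import math

def dual_price_dynamics(a, c, q0=1.0, step=0.05, iters=5000, q_min=1e-9):
    """Euler discretization of dq/dt = (sum_r x_r - c)^+_q on a single link.

    Assumes U_r(x) = a_r*sqrt(x); a price-taking user then responds to the
    price q with the rate x_r = (a_r / (2 q))**2, which solves U_r'(x_r) = q.
    """
    q = q0
    for _ in range(iters):
        rates = [(ar / (2.0 * q)) ** 2 for ar in a]   # user responses to q
        excess = sum(rates) - c                       # over/under-utilization
        q = max(q + step * excess, q_min)             # projected price update
    rates = [(ar / (2.0 * q)) ** 2 for ar in a]
    return q, rates

def optimal_price(a, c):
    """Closed-form price solving sum_r (a_r/(2q))**2 = c for these utilities."""
    return math.sqrt(sum(ar * ar for ar in a) / (4.0 * c))
```

With `a = [1.0, 2.0, 3.0]` and `c = 1.0`, the simulated price settles at the value returned by `optimal_price`, and the resulting rates fill the link exactly. The step size matters: a much larger `step` can make the discrete iteration overshoot even though the continuous-time dynamics converge.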

2.3 Strategic or Price-Anticipating Users

In the price-anticipating paradigm, the users are aware of the effect that their bid has on the price of the resource. In this case the problem faced by the users is a game in which they attempt to maximize their individual payoffs, anticipating the price change that their bid would cause. Each user strategically tries to maximize its payoff given by

$$P_r(w_r; w_{-r}) = \begin{cases} U_r\!\left( \dfrac{w_r}{\sum_k w_k}\, c \right) - w_r & \text{if } w_r > 0, \\[2mm] U_r(0) & \text{if } w_r = 0, \end{cases} \tag{7}$$

where $w_{-r}$ is the vector of all bids, except $w_r$.

Our game is a system wherein there are $R$ users and each user $r$ can make a bid $w_r \in \mathbb{R}^+ \cup \{0\}$, i.e., the game consists of deciding a nonnegative real number. A strategy is a rule by which a user would make his bid. In our case, the strategy that users might use could be $S_r =$ “set the bid $w_r = w_r^S$”, where $w_r^S \in \mathbb{R}^+ \cup \{0\}$ is some constant. Since the strategy recommends playing the same bid all the time, it is called a pure strategy. A strategy profile is an element of the product space of the strategy spaces of each user, denoted by $\mathcal{S}$. We denote the strategy profile corresponding to all users using the strategy of setting their bids based on the vector $w^S$ by $S \in \mathcal{S}$. We would like to know if a particular strategy profile is stable in some sense. For example, we might be interested in knowing whether, if everyone knew everyone else’s strategy, anyone would want to change in some way. The concept of the Nash equilibrium formally defines one kind of stability.

Definition 1. A pure strategy Nash equilibrium is a strategy profile from which no user has a unilateral incentive to change his strategy.

Note the term “unilateral”. This means that users do not collude with each other; they are interested solely in their own payoffs. If they find that there is no point changing from the strategy that they are currently using, they would continue to use it and so remain at equilibrium. How do we tell if the strategy profile $S$ defined above is actually a Nash equilibrium? The answer is to check the payoff obtained by using it. We denote the bid recommended by strategy $S_r$ to user $r$ as $w_r^S$ and similarly the bid recommended by any other strategy $G$ by $w_r^G$. So the users would not unilaterally deviate from the strategy profile $S$ if

$$P_r(w_r^S; w_{-r}^S) \ge P_r(w_r^G; w_{-r}^S), \tag{8}$$

which would imply that the strategy profile $S$ is a Nash equilibrium. We would like to know if there exists a vector $w^S$ such that the strategy profile $S$ that recommends playing that vector would be a Nash equilibrium, i.e., whether there exists $w^S$ that satisfies (8). We first find the conditions that need to be satisfied by the desired $w^S$ and then show that there indeed exists a unique such vector.

Now, our first observation is that $w^S$ must have at least two positive components. On the one hand, if there were exactly one user with a positive bid, it would want to decrease the bid towards zero and yet have the whole resource to itself (so there can’t be exactly one positive bid). On the other hand, if there were no users with a positive bid, there would be an incentive for all users to increase their bids to some nonzero value to capture the resource. The next observation is that since $w^S$ has at least two positive components, and since $w_r / (\sum_{k \ne r} w_k + w_r)$ is strictly increasing and strictly concave in $w_r$ whenever some other bid is positive, the payoff function is strictly concave in $w_r$. Since the utility functions are assumed to be continuously differentiable, this means that for each user $r$, the maximizer $w^S$ of (7) satisfies the conditions

$$U_r'\!\left( \frac{w_r^S}{\sum_k w_k^S}\, c \right) \left( 1 - \frac{w_r^S}{\sum_k w_k^S} \right) = \frac{\sum_k w_k^S}{c}, \quad \text{if } w_r^S > 0, \tag{9}$$

$$U_r'(0) \le \frac{\sum_k w_k^S}{c}, \quad \text{if } w_r^S = 0, \tag{10}$$

which are obtained by simply differentiating (7) with respect to $w_r$, setting the derivative to 0 (or $\le 0$ if $w_r^S = 0$), and multiplying by $\sum_k w_k^S / c$.
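For readers who want the differentiation step spelled out, here is a sketch for the case $w_r > 0$, writing $W = \sum_k w_k$ as shorthand (a symbol introduced only for this derivation):

```latex
\begin{align*}
\frac{\partial}{\partial w_r}\left[ U_r\!\left(\frac{c\, w_r}{W}\right) - w_r \right]
  &= U_r'\!\left(\frac{c\, w_r}{W}\right) \cdot \frac{c\,(W - w_r)}{W^2} - 1 = 0, \\
\text{and multiplying by } \frac{W}{c}: \qquad
U_r'\!\left(\frac{c\, w_r}{W}\right) \left(1 - \frac{w_r}{W}\right) &= \frac{W}{c}.
\end{align*}
```

At $w_r = 0$ the same derivative reduces to $U_r'(0)\, c / W - 1$, and requiring it to be nonpositive gives the inequality form of the condition.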

We now have the set of conditions (9) and (10) that must be satisfied by the bids that the Nash strategy profile suggests. But we don’t know yet if there actually exists any such vector $w^S$ that would satisfy the conditions. How do we go about showing that such a vector actually exists? Consider the conditions again. They look just like the KKT first-order conditions of a constrained optimization problem. Perhaps we could construct the equivalent optimization problem to which these are indeed the KKT conditions? Then if the optimization problem has a unique solution, the solution would be the desired vector of bids $w^S$. Consider the constrained optimization problem of maximizing

$$\sum_k \hat{U}_k(x_k), \tag{11}$$

subject to the constraints

$$\sum_k x_k \le c, \qquad x_k \ge 0 \quad \forall k = 1, 2, 3, \ldots, R, \tag{12}$$

where the utility function $\hat{U}_k(\cdot)$ is defined as

$$\hat{U}_k(x_k) = \left( 1 - \frac{x_k}{c} \right) U_k(x_k) + \left( \frac{x_k}{c} \right) \left( \frac{1}{x_k} \int_0^{x_k} U_k(z)\, dz \right). \tag{13}$$

It is easy to see that $\hat{U}_k(\cdot)$ is concave and increasing in $0 \le x_k \le c$ by differentiating it, which yields $\hat{U}_k'(x_k) = U_k'(x_k)(1 - x_k/c)$. Since $U_k(\cdot)$ is concave and strictly increasing, we know that $U_k'(x_k) > 0$ and that $U_k'(\cdot)$ is nonincreasing. Hence, we conclude that $\hat{U}_k'(x_k)$ is nonnegative and strictly decreasing in $x_k$ over the region $0 \le x_k \le c$ as required.
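The derivative quoted above follows from the product rule and the fundamental theorem of calculus; as a brief worked check, note that the second term of (13) simplifies to $\frac{1}{c}\int_0^{x_k} U_k(z)\,dz$:

```latex
\begin{align*}
\hat{U}_k'(x_k)
  &= \frac{d}{dx_k}\left[ \left(1 - \frac{x_k}{c}\right) U_k(x_k)
       + \frac{1}{c} \int_0^{x_k} U_k(z)\, dz \right] \\
  &= -\frac{1}{c}\, U_k(x_k) + \left(1 - \frac{x_k}{c}\right) U_k'(x_k)
       + \frac{1}{c}\, U_k(x_k) \\
  &= \left(1 - \frac{x_k}{c}\right) U_k'(x_k).
\end{align*}
```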

We verify that the KKT first-order conditions are identical in form to the conditions (9) and (10). Directly from the optimization problem above, we have that there exists a unique vector $x$ and a scalar (the Lagrange multiplier) $\lambda$ such that

$$U_k'(x_k) \left( 1 - \frac{x_k}{c} \right) = \lambda, \quad \text{if } x_k > 0, \tag{14}$$

$$U_k'(0) \le \lambda, \quad \text{if } x_k = 0, \tag{15}$$

$$\sum_k x_k = c. \tag{16}$$

We check that at least two components of $x$ above are positive. We have from (16) that at least one of the $x_k > 0$. If only a single component $x_r > 0$ with all others being 0, then $x_r = c$ and from (14) we have $\lambda = 0$, which in turn means from (15) that $U_k'(0) \le 0$ for some $k$. This is impossible since $U_k(\cdot)$ was assumed to be concave and strictly increasing for all $k$. Then we see that, as desired, the above conditions are identical to the Nash conditions with $\lambda = \sum_k w_k^S / c$ and $x_k = c\, w_k^S / \sum_k w_k^S$. Thus, we see that even in the case of price-anticipating users, there exists a unique solution. We will denote the resulting allocation $x^S$, the $S$ being used to denote “strategy.”
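The equivalence between the Nash conditions and (14), (15), and (16) suggests a direct way to compute the equilibrium. The sketch below assumes linear utilities $U_k(x) = \alpha_k x$ with $\alpha_k > 0$, which are a hypothetical choice made for illustration, and the helper names `nash_allocation` and `payoff` are invented here. It solves the conditions by bisection on $\lambda$ and recovers the bids from $w_k = \lambda x_k$.

```python
def nash_allocation(alpha, c=1.0):
    """Solve (14)-(16) for linear utilities U_k(x) = alpha_k * x, alpha_k > 0.

    For a given multiplier lam, (14)-(15) give x_k = c * max(1 - lam/alpha_k, 0);
    we bisect on lam until the allocations sum to the capacity c.
    """
    lo, hi = 0.0, max(alpha)
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        used = sum(c * max(1.0 - lam / a, 0.0) for a in alpha)
        if used > c:
            lo = lam   # multiplier too small: allocations exceed capacity
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    x = [c * max(1.0 - lam / a, 0.0) for a in alpha]
    # lam = sum(w)/c and x_k = c*w_k/sum(w) together give w_k = lam * x_k.
    bids = [lam * xk for xk in x]
    return x, bids, lam

def payoff(alpha_r, w_r, others_total, c=1.0):
    """Payoff (7) of a user with linear utility when the other bids sum to
    others_total > 0 (with U_r(0) = 0 for a zero bid)."""
    if w_r == 0.0:
        return 0.0
    return alpha_r * c * w_r / (w_r + others_total) - w_r
```

For `alpha = [1.0, 0.8, 0.5]`, one can hold the other users' equilibrium bids fixed and scan `payoff` over a grid of alternative bids for each user; no deviation improves on the equilibrium bid, which is the defining property in Definition 1.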

Notice that the solution found applies to an optimization problem that is different from the one in the price-taking case, so we would expect that in terms of utility maximization, the price-anticipating solution would probably be worse than the price-taking case. We will show how to bound the worst-case performance of the price-anticipating case in what follows. Since in the price-anticipating case, all users play strategically with complete knowledge, and what the central authority does is to allocate resources in a weighted proportionally fair manner, the system can be likened to anarchy with users operating with minimal control. We refer to the inefficiency in such a system as the price of anarchy.

We now examine the impact of price-anticipating users on the network utility, i.e., the sum of the utilities of all the users in the network. We use the superscript $T$ to denote the solution to the network utility maximization problem (we use $T$ since the optimal network utility is also the network utility assuming price-taking users), and the superscript $S$ to denote the solution for the case of price-anticipating users. We will show the following result.

Theorem 1. Under the assumptions on the utility function given at the beginning of Sect. 2,

$$\sum_r U_r\!\left( x_r^S \right) \ge \frac{3}{4} \sum_r U_r\!\left( x_r^T \right). \tag{17}$$

Hence, the price of playing a game versus the optimal solution is no greater than $1/4$ of the optimal solution.

Proof. The proof of this theorem consists of two steps:

• Showing that the worst-case scenario for the price-anticipating paradigm is when the utility functions $U_r$ are all linear.

• Minimizing the value of the game under the above condition.

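Step 1

Using concavity of $U_k(\cdot)$ for any user $k$, we have for a general allocation vector $z$ with $\sum_k z_k \le c$ that $U_k(x_k^T) \le U_k(z_k) + U_k'(z_k)(x_k^T - z_k)$. This means that

$$\frac{\sum_k U_k(z_k)}{\sum_k U_k(x_k^T)} \ge \frac{\sum_k \left( U_k(z_k) - U_k'(z_k) z_k \right) + \sum_k U_k'(z_k) z_k}{\sum_k \left( U_k(z_k) - U_k'(z_k) z_k \right) + \sum_k U_k'(z_k) x_k^T}.$$

Since we know that $\sum_k x_k^T = c$, we know that $\sum_k U_k'(z_k) x_k^T \le (\max_k U_k'(z_k))\, c$. Using this fact in the above equation, we obtain

$$\frac{\sum_k U_k(z_k)}{\sum_k U_k(x_k^T)} \ge \frac{\sum_k \left( U_k(z_k) - U_k'(z_k) z_k \right) + \sum_k U_k'(z_k) z_k}{\sum_k \left( U_k(z_k) - U_k'(z_k) z_k \right) + (\max_k U_k'(z_k))\, c}.$$

The term $\sum_k (U_k(z_k) - U_k'(z_k) z_k)$ is nonnegative by concavity of $U_k$ and the assumption that $U_k(0) \ge 0$. Moreover, $\sum_k U_k'(z_k) z_k \le (\max_k U_k'(z_k))\, c$ because $\sum_k z_k \le c$, so removing this common nonnegative term from the numerator and the denominator can only lower the ratio, which means that

$$\frac{\sum_k U_k(z_k)}{\sum_k U_k(x_k^T)} \ge \frac{\sum_k U_k'(z_k) z_k}{(\max_k U_k'(z_k))\, c}. \tag{18}$$

The above inequality is of interest as it compares the utility function with its linear equivalent. If we substitute $z = x^S$ in (18), we obtain

$$\frac{\sum_k U_k(x_k^S)}{\sum_k U_k(x_k^T)} \ge \frac{\sum_k U_k'(x_k^S) x_k^S}{(\max_k U_k'(x_k^S))\, c}. \tag{19}$$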


Now, we notice that the left-hand side of the expression above is the price of anarchy that we are interested in, while the right-hand side of the expression looks like the price of anarchy for the case where the utility function is linear, i.e., when $U_r(x_r) = U_r'(x_r^S)\, x_r \triangleq \bar{U}_r(x_r)$. We verify this observation by noting that since the conditions in (14), (15), and (16) are linear, the numerator is the aggregate utility with price-anticipating users when the utility function is $\bar{U}(\cdot)$. Also, the denominator is the maximum aggregate utility that can be achieved with the utility function $\bar{U}(\cdot)$, which means that it corresponds to the price-taking case for utility $\bar{U}(\cdot)$. Thus, we see that the linear utility function of the sort described above necessarily has a lower total utility than any other type of utility function.

Step 2

Since the worst-case scenario is for linear utility functions, we may take $U_r(x_r) = \alpha_r x_r$. From Step 1, the price of anarchy, i.e., the ratio of aggregate utility at the Nash equilibrium to the aggregate utility at social optimum, is then given by

$$\frac{\sum_k \alpha_k x_k^S}{\{\max_k \alpha_k\}\, c}. \tag{20}$$

Without loss of generality, we may take $\max_k \alpha_k = \alpha_1 = 1$ and $c = 1$. Since this means that the denominator of the above expression is 1, to find the worst-case ratio, we need to find $\alpha_2, \alpha_3, \ldots, \alpha_R$ such that the numerator is minimized. This would directly give the price of anarchy. So the objective is to

$$\min_{\{x^S, \alpha\}} \; x_1^S + \sum_{r=2}^{R} \alpha_r x_r^S \tag{21}$$

$$\text{subject to} \quad \alpha_k \left( 1 - x_k^S \right) = 1 - x_1^S, \quad \text{if } x_k^S > 0, \tag{22}$$

$$\alpha_k \le 1 - x_1^S, \quad \text{if } x_k^S = 0, \tag{23}$$

$$\sum_k x_k^S = 1, \tag{24}$$

$$0 \le \alpha_k \le 1, \quad k = 2, \ldots, R, \tag{25}$$

$$x_k^S \ge 0, \quad k = 1, \ldots, R. \tag{26}$$

Notice that the constraints on the above optimization problem follow from (14), (15), and (16) to ensure that $x^S$ is an allocation vector that a Nash strategy profile would suggest. Since only the users with nonzero allocations contribute to the utility, we can consider the system with $N \le R$ users, with every user getting a nonzero allocation. Equivalently, we could just assume that all $R$ users get a nonzero allocation and observe what happens as $R$ increases. Then $\alpha_k (1 - x_k^S) = 1 - x_1^S$ holds for all users in the new system, which implies $\alpha_k = (1 - x_1^S)/(1 - x_k^S)$. Let us fix $x_1^S$ for now and minimize over the remaining $x_r^S$. Then we have

$$\min_{\{x_r^S : r \ne 1\}} \; x_1^S + \sum_{k=2}^{R} \frac{x_k^S (1 - x_1^S)}{1 - x_k^S} \tag{27}$$

$$\text{subject to} \quad \sum_{k=2}^{R} x_k^S = 1 - x_1^S, \tag{28}$$

$$0 \le x_k^S \le x_1^S, \quad k = 2, \ldots, R. \tag{29}$$

The above problem is well defined only if $x_1^S \ge 1/R$ (otherwise the last constraint will be violated upon minimization). If we assume this condition, by symmetry, the minimum value occurs for all users $2, \ldots, R$ getting the same allocation equal to $(1 - x_1^S)/(R - 1)$. Substituting this value into the problem, we only have to minimize over $x_1^S$, i.e., the problem is now given by

\begin{align}
\min_{x_1^S} \quad & x_1^S + \left(1 - x_1^S\right)^2 \left(1 - \frac{1 - x_1^S}{R - 1}\right)^{-1} \tag{30}\\
\text{subject to} \quad & 0 \le x_1^S \le 1. \tag{31}
\end{align}

The objective function above is decreasing in $R$ and so the lowest value would occur as $R \to \infty$. So we finally have a very simple problem to solve, namely,

\begin{align}
\min_{x_1^S} \quad & x_1^S + \left(1 - x_1^S\right)^2 \tag{32}\\
\text{subject to} \quad & 0 \le x_1^S \le 1. \tag{33}
\end{align}

By differentiation we can easily see that the solution to the above problem is $x_1^S = 1/2$, which yields a worst-case aggregate of $3/4$. Thus, we see that the aggregate utility falls by no more than 25% when the users are price anticipating. □
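The limiting argument above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it grid-searches the objective in (30) over $x_1^S \in [1/R, 1]$ for a few values of $R$ and compares the result with the limiting problem (32). The function names and the grid resolution are illustrative assumptions.

```python
def objective(x1: float, R: int) -> float:
    """Objective of problem (30) for R users when user 1 receives x1."""
    return x1 + (1.0 - x1) ** 2 / (1.0 - (1.0 - x1) / (R - 1))


def worst_case_ratio(R: int, grid: int = 20_000) -> float:
    """Grid-search minimum of (30) over x1 in [1/R, 1] (the well-defined range)."""
    lo = 1.0 / R
    return min(objective(lo + (1.0 - lo) * i / grid, R) for i in range(grid + 1))


# Worst-case ratios for a growing number of users R.
ratios = {R: worst_case_ratio(R) for R in (2, 5, 50, 5000)}

# Limiting problem (32): minimize x1 + (1 - x1)^2 over [0, 1].
limit = min(i / 1000 + (1.0 - i / 1000) ** 2 for i in range(1001))

for R, value in ratios.items():
    print(f"R = {R:5d}: worst-case ratio ~ {value:.4f}")
print(f"limit as R grows: {limit:.4f}")
```

The computed ratios decrease with $R$ and approach the value $3/4$ obtained from (32), which is consistent with the claim that the worst case arises as $R \to \infty$.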

3 Flow Routing

In the last section, we considered the problem of strategic agents that try to maximize their individual throughputs, and saw that the price of anarchy can be tightly bounded in a game of such strategic behavior. We consider a different game involving routing of flows through a network in this section. As we saw in the last section, the load on a link can be thought of as imposing a cost on all the flows on that link. This cost can be thought of as the delay experienced by each packet of the flow, which in turn causes a reduction in the quality of the service being supported by that flow. Suppose that a set of routes is available between a source and destination and each packet makes a decision on which route to take based on the perceived delay on each alternative. What would be the effects of such per-packet selfish routing?

Fig. 3 A Pigovian network

Figure 3 illustrates the setup of an example routing problem proposed by Pigou. Here, there is a total flow of 1 unit between $S$ and $D$. We can think of this flow as the total number of packets per second that are being injected into the network. Let the delay per unit flow on link $A$ be $p_A(y_A) = 1$, where $y_A$ is the flow on link $A$. Similarly, the delay per unit flow on link $B$ is $p_B(y_B) = y_B$, with the corresponding flow being $y_B$. Thus, the upper route has a fixed delay regardless of the flow on it, whereas the lower one has a delay proportional to the flow on it. The total flow is $y_A + y_B = 1$.

We can characterize the total delay experienced by an average packet under selfish routing by deriving the equilibrium flows on the links under such selfish dynamics. We make the assumption that the flows are infinitely divisible, in which case the decision of each infinitesimal unit of flow has a vanishingly small impact on the delay on any link. Effectively, this is as if each packet is price taking and simply greedily chooses the route that shows the smallest delay. Then the equilibrium conditions would simply be that any routes that are in use would have the same per-unit delay, since otherwise some of the flow would be diverted to a route that shows a smaller delay until equalization occurs. Hence, for selfish routing at equilibrium

• if $A$ and $B$ are both used, then $p_A(y_A) = p_B(y_B)$;
• if only $A$ is used, then $p_A(y_A) \le p_B(y_B)$;
• if only $B$ is used, then $p_B(y_B) \le p_A(y_A)$.

Such a pair $(y_A, y_B)$ is said to be in Wardrop equilibrium. In this example, the Wardrop equilibrium is $(y_A, y_B) = (0, 1)$. The average delay experienced by a packet is

\[ \frac{y_A\, p_A(y_A) + y_B\, p_B(y_B)}{y_A + y_B} = y_A\, p_A(y_A) + y_B\, p_B(y_B) = 1. \tag{34} \]

Now, let us consider what would happen if a network planner were to route the flows in such a way that the average delay of the system as a whole is minimized. We refer to this case as socially optimal routing. We can determine the socially optimal flow assignment by choosing $(y_A, y_B)$ to solve

\[ \min\; y_A + y_B^2 \quad \text{subject to} \quad y_A + y_B = 1,\; y_A \ge 0,\; y_B \ge 0. \]

The problem is easy to solve by substituting $y_A = 1 - y_B$, differentiating the objective function, and setting the derivative equal to zero as follows:

\begin{align}
& \min_{0 \le y_B \le 1}\; (1 - y_B) + y_B^2 \tag{35}\\
\Rightarrow\; & -1 + 2 y_B = 0 \;\Rightarrow\; y_B = 1/2 \tag{36}\\
\Rightarrow\; & \text{optimal cost} = 1/2 + 1/4 = 3/4. \tag{37}
\end{align}

Combining the two results in (34) and (37), we have a characterization of the PoA (price of anarchy) as

\[ \frac{\text{Cost under selfish routing}}{\text{Optimal cost}} = \frac{1}{3/4} = \frac{4}{3}. \]
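The Pigou example can be reproduced with a few lines of code. The sketch below is illustrative only: it evaluates the average delay for the two link delay functions given in the text, takes the Wardrop equilibrium $(y_A, y_B) = (0, 1)$ from the discussion above, and finds the social optimum by a simple grid search (the grid resolution and function names are assumptions).

```python
def average_delay(y_b: float) -> float:
    """Average delay when y_b flows on link B (delay y_b) and 1 - y_b on link A (delay 1)."""
    y_a = 1.0 - y_b
    return y_a * 1.0 + y_b * y_b


# Wardrop equilibrium: link B is never slower than link A for y_b <= 1,
# so all of the flow ends up on link B.
selfish_cost = average_delay(1.0)

# Social optimum: grid search over y_b in [0, 1].
grid = [i / 1000 for i in range(1001)]
y_opt = min(grid, key=average_delay)
optimal_cost = average_delay(y_opt)

poa = selfish_cost / optimal_cost
print(f"selfish cost = {selfish_cost}, optimal cost = {optimal_cost} at y_B = {y_opt}")
print(f"price of anarchy = {poa:.4f}")
```

The grid search recovers $y_B = 1/2$ with cost $3/4$, so the ratio of the selfish cost to the optimal cost is $4/3$.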

The question arises as to whether the same kind of result would apply to flow routing in a general network. For instance, one might think that in a network with a large number of routes to choose from, the delay incurred even with selfish routing might be comparatively small. We next present an example showing that this intuition can fail: adding a link, even one whose cost function is zero, can actually increase the delay of a system with selfish routing. This apparently paradoxical result is known as Braess’ paradox, named after its discoverer.

3.1 Braess’ Paradox

Consider the network shown in Fig. 4. We have the typical problem of routing a unit flow in a network with four links and a combination of fixed and linear link delay functions. Using the same logic that we used earlier, the Wardrop equilibrium must be such that all routes with a nonzero flow have the same unit delay. By symmetry, the Wardrop equilibrium is $(y_1, y_2) = (1/2, 1/2)$. The average delay per packet is $3/2 \times 1/2 + 3/2 \times 1/2 = 3/2$.

Now, let us add a link with zero delay (regardless of the flow on it) between $A$ and $B$, as shown in Fig. 5. We see immediately that the Wardrop equilibrium is to route all the flow on $S \to A \to B \to D$. The average delay corresponding to this flow routing is $1 + 1 = 2$. We observe that the addition of a link with zero delay has increased the average delay under selfish routing!
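The two equilibria can be verified mechanically. The sketch below is an illustration under a stated assumption: the link delay functions are not written out in the text, so they are inferred from the quoted equilibrium delays as $S \to A$: $y$, $A \to D$: $1$, $S \to B$: $1$, $B \to D$: $y$, plus the added link $A \to B$ with delay $0$. All function names are hypothetical.

```python
def route_delays(x_top: float, x_bottom: float, x_cross: float):
    """Per-unit delays of routes S->A->D, S->B->D and S->A->B->D.

    Assumed link delays (inferred, not stated in the text):
    S->A: y, A->D: 1, S->B: 1, B->D: y, A->B: 0.
    """
    y_sa = x_top + x_cross      # flow on link S->A
    y_bd = x_bottom + x_cross   # flow on link B->D
    return (y_sa + 1.0, 1.0 + y_bd, y_sa + 0.0 + y_bd)


def average_delay(flows) -> float:
    """Flow-weighted average delay for a unit total flow."""
    return sum(x * d for x, d in zip(flows, route_delays(*flows)))


def is_wardrop(flows, available, tol: float = 1e-9) -> bool:
    """True if every used route among `available` has minimum delay."""
    delays = route_delays(*flows)
    best = min(delays[i] for i in available)
    return all(flows[i] <= tol or delays[i] <= best + tol for i in available)


before = (0.5, 0.5, 0.0)   # four-link network: only routes 0 and 1 exist
after = (0.0, 0.0, 1.0)    # with the zero-delay link: route 2 is also available

print(average_delay(before), is_wardrop(before, (0, 1)))
print(average_delay(after), is_wardrop(after, (0, 1, 2)))
print(is_wardrop(before, (0, 1, 2)))  # the old split is no longer an equilibrium
```

Under these assumed delays, the even split is a Wardrop equilibrium with average delay $3/2$ before the link is added, but stops being one afterwards, and the new equilibrium has average delay $2$.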


Fig. 4 A flow routing example with four links

Fig. 5 Braess’ Paradox. Adding a link with zero delay increases average delay

3.2 Flow Routing Game

We now consider the problem of tightly bounding the price of anarchy of selfish flow routing in a general network. We will focus only on linear (more precisely, affine) cost functions. Surprisingly, the bound of $4/3$ that we derived for the Pigovian network with some specific cost functions is not only accurate for a Pigovian network with general affine cost functions but is also the bound for a general network with affine cost functions. This subsection is devoted to proving that result.

Let $F$ be the total flow between a source $S$ and a destination $D$ in a network. Let $R$ be the set of routes between $S$ and $D$. As before, we will use the notation $l \in r$ to indicate that link $l$ is a part of route $r$. Also, $x_r$ denotes the flow on route $r$, while $y_l$ denotes the flow on link $l$ with

\[ y_l = \sum_{r:\, l \in r} x_r. \]

The cost (or delay) of a route $r$ per unit of flow on it is denoted by

\[ q_r(y) = \sum_{l \in r} p_l(y_l), \]

where $p_l$ is the cost of link $l$, which is a nondecreasing function, and $y = (y_1, y_2, \ldots, y_l, \ldots)$.

We define the socially optimal routing problem as follows:

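\begin{align}
\min \quad & \sum_r \Big( \sum_{l \in r} p_l(y_l) \Big) x_r \tag{38}\\
\text{s.t.} \quad & \tag{39}\\
& \sum_{r:\, l \in r} x_r = y_l \tag{40}\\
& \sum_r x_r = F \tag{41}\\
& x_r, y_l \ge 0 \quad \forall r, l \tag{42}
\end{align}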

Definition 2. A Wardrop equilibrium is a vector $y = \{y_l\}$ such that, if $x_r > 0$ for some $r \in R$, then $q_r(y) \le q_{r'}(y)$, $\forall r' \in R$.

In other words, a route has nonzero flow only if it is a minimum-cost route. Does a Wardrop equilibrium exist in the flow routing game? In order to answer this question, we will rewrite the conditions of the Wardrop equilibrium as the solution of a convex optimization problem to which a solution has to exist. First, according to the definition, the Wardrop equilibrium is equivalent to the condition that there exists a $\lambda \ge 0$ such that

\begin{align}
q_r(y) &= \lambda \quad \text{if } x_r > 0 \tag{43}\\
\text{and} \quad q_r(y) &\ge \lambda \quad \text{if } x_r = 0. \tag{44}
\end{align}

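We will now show that the above condition is identical to solving the following problem.

\begin{align}
\min \quad & \sum_l \int_0^{y_l} p_l(y)\, dy \tag{45}\\
\text{s.t.} \quad & \sum_{r:\, l \in r} x_r = y_l \tag{46}\\
& \sum_{r \in R} x_r = F \tag{47}\\
& x_r \ge 0,\; y_l \ge 0 \tag{48}
\end{align}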


In order to show the equivalence between the conditions defined by (43), (44) and (45), (46), (47), (48), we will characterize the solution of the latter problem and show that it exactly satisfies the former. Using (46), $y_l$ can be eliminated from the problem, yielding

\begin{align}
\min \quad & \sum_l \int_0^{\sum_{r:\, l \in r} x_r} p_l(y)\, dy \tag{49}\\
\text{s.t.} \quad & \sum_{r \in R} x_r = F \tag{50}\\
& x_r \ge 0 \quad \forall r. \tag{51}
\end{align}

First, we note that since $p_l(y)$ is a nondecreasing function of $y$,

\[ \int_0^{y_l} p_l(y)\, dy \]

is a convex function. Hence, the above formulation is a convex optimization problem. The Lagrange dual of the above problem is

\[ \min_{x_r \ge 0}\; \underbrace{\sum_l \int_0^{\sum_{r:\, l \in r} x_r} p_l(y)\, dy}_{V(x)} \;-\; \lambda \Big( \sum_r x_r - F \Big), \tag{52} \]

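where we have used the notation $x = (x_1, x_2, \ldots, x_r, \ldots)$. The first-order KKT conditions corresponding to the solution are, therefore,

\begin{align}
\frac{\partial V}{\partial x_r} &= \lambda \quad \text{if } x_r > 0 \tag{53}\\
\frac{\partial V}{\partial x_r} &\ge \lambda \quad \text{if } x_r = 0 \tag{54}\\
\frac{\partial V}{\partial x_r} &= \sum_{l \in r} p_l(y_l). \tag{55}
\end{align}

Since the right-hand side of (55) is exactly $q_r(y)$, these conditions are identical to the definition of the Wardrop equilibrium in (43) and (44). Thus, (45), (46), (47), and (48) can be used as an alternative definition of a Wardrop equilibrium.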
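To make the equivalence concrete, the following sketch applies the optimization problem (45)-(48) to the Pigou network of Fig. 3, where $p_A(y) = 1$ and $p_B(y) = y$. It is only an illustration: the objective (45) reduces to $(1 - y_B) + y_B^2/2$ after eliminating $y_A = 1 - y_B$, and a grid search (an assumed, deliberately simple solver) is used in place of a convex optimization routine.

```python
def potential(y_b: float) -> float:
    """Objective (45) for the Pigou network: int_0^{1-y_b} 1 dy + int_0^{y_b} y dy."""
    return (1.0 - y_b) + 0.5 * y_b ** 2


def social_cost(y_b: float) -> float:
    """Total delay (38) for the same network: y_A * 1 + y_B * y_B."""
    return (1.0 - y_b) + y_b ** 2


grid = [i / 1000 for i in range(1001)]
y_wardrop = min(grid, key=potential)    # minimizer of (45)
y_social = min(grid, key=social_cost)   # minimizer of (38)

print(f"minimizer of (45): y_B = {y_wardrop}")
print(f"minimizer of (38): y_B = {y_social}")
```

The minimizer of (45) places all of the flow on link $B$, matching the Wardrop equilibrium $(0, 1)$ found earlier, whereas the minimizer of the social cost (38) splits the flow evenly.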

3.2.1 PoA for Linear Latency Functions: Pigovian Network

We now reconsider the Pigovian network with two routes that we saw earlier, except with more general (but still affine) cost functions and any nonnegative flow $F$, as shown in Fig. 6. Let $y_l$ be the flow on any link $l$ in a graph. Assume that $p_l(y_l) = a_l y_l + b_l$ $(a_l, b_l \ge 0)$. Also, let $\alpha$ be the worst-case PoA in a Pigou network. It is easily seen that the Wardrop equilibrium places all the flow on the top link and the cost is $(aF + b)F = aF^2 + bF$.

Fig. 6 A Pigovian network with linear delay functions

In order to find the socially optimal flows, we need to solve

\[ \min_{0 \le y \le F}\; (ay + b)\,y + (aF + b)(F - y). \tag{56} \]

Note that we immediately have $\min_{0 \le y \le F} \Leftrightarrow \min_{y \ge 0}$ in this case. Let us ignore the fact that $y \ge 0$ and simply try to find the global minimum. Differentiating, we have

\begin{align}
& 2ay + b - aF - b = 0 \tag{57}\\
\Rightarrow\; & y = F/2, \tag{58}
\end{align}

i.e., the solution satisfies $y \ge 0$. The cost of the social optimum is

\begin{align}
&= (aF/2 + b)\,F/2 + (aF + b)\,F/2 \tag{59}\\
&= F\left( \frac{3aF}{4} + b \right). \tag{60}
\end{align}

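Finally, we can identify the Price of Anarchy as

\begin{align}
\alpha &= \max_{F \ge 0,\, a \ge 0,\, b \ge 0}\; \frac{(aF + b)F}{F\left( \frac{3aF}{4} + b \right)} \tag{61}\\
&= 4/3. \tag{62}
\end{align}

Note that the PoA is achieved when $F \to \infty$ (more generally, whenever $aF/b \to \infty$, for instance when $b = 0$).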
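The behavior of the ratio in (61) is easy to tabulate. The sketch below is illustrative only: it evaluates the selfish cost $(aF + b)F$ and the optimal cost from (56) at $y = F/2$ for a few assumed parameter values.

```python
def poa_ratio(a: float, b: float, F: float) -> float:
    """Selfish cost over optimal cost for the affine Pigou network of Fig. 6."""
    selfish = (a * F + b) * F
    y = F / 2.0  # minimizer of (56), from (58)
    optimal = (a * y + b) * y + (a * F + b) * (F - y)
    return selfish / optimal


# Illustrative parameters: a = b = 1 and a growing total flow F.
flows = (1.0, 10.0, 100.0, 1e6)
ratios = [poa_ratio(1.0, 1.0, F) for F in flows]
for F, ratio in zip(flows, ratios):
    print(f"F = {F:>9}: ratio = {ratio:.6f}")

# With b = 0 the bound is attained for every positive F.
print(poa_ratio(1.0, 0.0, 1.0))
```

The ratio increases with $F$ and stays below $4/3$ when $b > 0$, while it equals $4/3$ when $b = 0$.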

3.2.2 PoA for Linear Latency Functions: General Network

We now extend the results of the Pigovian network to a general network. The main result is stated below.

Theorem 2. Consider any network with affine link delay costs, with $a_l, b_l \ge 0$. The PoA is upper bounded by $4/3$.


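Proof. Let $\hat{x}$ be the WE and $x^*$ be socially optimal for a network with a flow of $F$ between a source and destination. If $\hat{x}_r > 0$, then $q_r(\hat{x}) \le q_{r'}(\hat{x})$ $\forall r'$, which implies that the costs $q_r(\hat{x})$ are equal for all used routes. Call this value $M$. Thus,

\[ \sum_r q_r(\hat{x})\, \hat{x}_r = \sum_{r:\, \hat{x}_r \ne 0} q_r(\hat{x})\, \hat{x}_r = MF. \tag{63} \]

Now, since the allocation $x^*$ might use routes not used in $\hat{x}$, and any such route costs at least $M$,

\begin{align}
& \sum_r q_r(\hat{x})\, x_r^* \ge M \sum_r x_r^* = MF \tag{64}\\
\Rightarrow\; & \sum_r q_r(\hat{x}) \left( x_r^* - \hat{x}_r \right) \ge 0, \tag{65}
\end{align}

i.e., if all route costs are fixed as $q_r(\hat{x})$, then the Wardrop equilibrium has the smallest cost.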

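Recall from our discussion on the Pigovian network that

\begin{align}
\frac{4}{3} &= \max_{F \ge 0,\, a \ge 0,\, b \ge 0}\; \frac{F p(F)}{\min_{0 \le y \le F}\; y\,p(y) + (F - y)\,p(F)} \tag{66}\\
&= \max_{F \ge 0,\, a \ge 0,\, b \ge 0}\; \max_{0 \le y \le F}\; \frac{F p(F)}{y\,p(y) + (F - y)\,p(F)}. \tag{67}
\end{align}

Recall that the upper bound on $y$ is slack, and hence we can equivalently write

\[ \frac{4}{3} = \max_{F \ge 0,\, a \ge 0,\, b \ge 0,\, y \ge 0}\; \frac{F p(F)}{y\,p(y) + (F - y)\,p(F)}. \tag{68} \]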

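Since the maximum in (68) upper-bounds the ratio for every admissible choice of the arguments, we can substitute $\hat{y}_l$ for $F$, $y_l^*$ for $y$, and $p_l$ for $p$ to obtain

\begin{align}
\frac{4}{3} &\ge \frac{\hat{y}_l\, p_l(\hat{y}_l)}{y_l^*\, p_l(y_l^*) + (\hat{y}_l - y_l^*)\, p_l(\hat{y}_l)} \tag{69}\\
\Rightarrow\; y_l^*\, p_l(y_l^*) &\ge \frac{3}{4}\, \hat{y}_l\, p_l(\hat{y}_l) + (y_l^* - \hat{y}_l)\, p_l(\hat{y}_l). \tag{70}
\end{align}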

Now, summing over all links we get

\[ \sum_l y_l^*\, p_l(y_l^*) \ge \frac{3}{4} \sum_l \hat{y}_l\, p_l(\hat{y}_l) + \underbrace{\sum_l (y_l^* - \hat{y}_l)\, p_l(\hat{y}_l)}_{(A)}. \tag{71} \]


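If the term labelled $(A)$ above is nonnegative, the theorem immediately follows. In order to show that this is indeed the case, we use the following argument. We have

\begin{align}
& \sum_l (y_l^* - \hat{y}_l)\, p_l(\hat{y}_l) \tag{72}\\
&= \sum_l p_l(\hat{y}_l) \Big( \sum_{r:\, l \in r} x_r^* - \sum_{r:\, l \in r} \hat{x}_r \Big) \tag{73}\\
&= \sum_r \Big( \sum_{l \in r} p_l(\hat{y}_l) \Big) (x_r^* - \hat{x}_r) \tag{74}\\
&= \sum_r q_r(\hat{x})\, (x_r^* - \hat{x}_r) \tag{75}\\
&\ge 0, \tag{76}
\end{align}

where the final inequality follows from (65). Hence, from (71), the optimal cost is at least $3/4$ of the cost at the Wardrop equilibrium, i.e.,

\[ \text{PoA} \le \frac{4}{3}. \]

We may note in conclusion that the socially optimal routing problem and the WE problem may both have multiple solutions but have unique costs, since they are both convex programs. □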
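The per-link inequality (70), on which the proof rests, can be sanity-checked numerically. The sketch below is not a proof: it samples random affine costs $p(y) = ay + b$ and random nonnegative flows (the sampling ranges and the seed are arbitrary assumptions) and evaluates the left side of (70) minus its right side. Expanding that difference for an affine cost gives $a(y^* - \hat{y}/2)^2 + b\hat{y}/4$, which is why it should never be negative.

```python
import random


def slack(a: float, b: float, y_hat: float, y_star: float) -> float:
    """Left side minus right side of (70) for the affine cost p(y) = a*y + b."""
    def p(y: float) -> float:
        return a * y + b

    lhs = y_star * p(y_star)
    rhs = 0.75 * y_hat * p(y_hat) + (y_star - y_hat) * p(y_hat)
    return lhs - rhs


rng = random.Random(1)
worst = min(
    slack(rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 10))
    for _ in range(100_000)
)
print(f"smallest slack over random samples: {worst:.3e}")

# The inequality is tight when b = 0 and y* = y_hat / 2.
print(slack(1.0, 0.0, 2.0, 1.0))
```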

4 Pricing Approach to Scheduling

In the last two sections, we considered two kinds of agent models in resource sharing games and characterized the efficiency loss incurred in each case. In Sect. 2, the focus was on a finite set of agents, with each having an impact on others’ payoffs, whereas in Sect. 3, the number of agents was infinite, and each agent had an infinitesimal impact on the payoffs of others. In this section, we will consider a model in which the number of agents is infinitely large, but each agent only interacts with a finite subset of these agents at each time. Further, we now consider a repeated game in which an agent’s state changes at each step according to a discrete time Markov process.

Our context is a system consisting of smartphone users whose apps are modeled as queues that come into existence when the user starts the app and vanish when the user terminates that app. The app generates packets (either uplink or downlink) that get buffered in the corresponding queue and which need to be served by a cellular base station. If a user is scheduled for service, the queue is decremented by a unit amount. We can associate a cost with the instantaneous queue length, which captures the idea that the quality of service of the app suffers with increasing packet delays. The user moves around, meaning that it changes the cell that it is present in. The shared resource across users in a particular cell is the spectrum allocated to that cell, and we assume for simplicity that this is such that only one queue can be served at any time. Then a question arises as to how to schedule queues in each cell in such a way that a good performance is obtained.

It is well known that the longest-queue-first (LQF) algorithm has attractive properties, such as minimizing the expected value of the longest queue in the system. One would expect that such an algorithm would ensure good performance across all apps. While the queues corresponding to downlink are present at the base station itself, the uplink queues are present at the user devices themselves. However, if the base station announces an LQF policy and then polls the queues seeking their lengths, there is clearly an incentive for each queue to try and receive additional service by misreporting its value. How should we design a system wherein the queues accurately reveal their information to the base stations?

Suppose that we hold an auction in which each queue places a bid and the highest bidder gets service. The mechanism used can be chosen as a second-price scheme in which the winning bidder pays the value of the second highest bid. It is well known that such an auction promotes truth-telling about valuation in the single-step case. But in our system, the queue makes bids in each time step during its lifetime, and so we have a repeated game setting. Further, the queue has to estimate the likely bids made by the other queues at each step and choose its bid by trading off the value of winning (in terms of decrementing its queue length) and the payment to be made. What kind of bids would be seen in such a system? Would conducting such an auction repeatedly over time with queues arriving and departing result in some form of equilibrium? Would the scheduling decisions resulting from such auctions resemble those of LQF?
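As a concrete reference point for the single-step case, the following sketch implements a second-price auction and checks, on one hand-picked example, that bidding one's true valuation is never worse than a set of alternative bids. The function names, the tie-breaking rule (lowest index wins), and the example numbers are all illustrative assumptions and are not taken from the chapter.

```python
from typing import Sequence, Tuple


def second_price_auction(bids: Sequence[float]) -> Tuple[int, float]:
    """Return (winner index, payment): the highest bidder wins and pays the second-highest bid."""
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    # Stable sort: among equal bids, the lowest index is ranked first (assumed tie-break).
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    return order[0], bids[order[1]]


def payoff(value: float, bid: float, other_bids: Sequence[float]) -> float:
    """Single-step payoff of bidder 0 with valuation `value` who submits `bid`."""
    winner, price = second_price_auction([bid, *other_bids])
    return value - price if winner == 0 else 0.0


others = [2.0, 3.5, 1.0]   # hypothetical opposing bids
value = 4.0                # hypothetical valuation of service
truthful = payoff(value, value, others)
deviations = [payoff(value, d / 4.0, others) for d in range(0, 41)]

print(second_price_auction([3.0, 7.5, 5.0]))
print(truthful, max(deviations))
```

In this example no deviation improves on the truthful payoff, in line with the single-step truth-telling property mentioned above. The repeated setting studied in this section is different, because the value of winning depends on future queue evolution.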

4.1 Mean Field Games

We will investigate the existence of an equilibrium using the theory of mean field games (MFGs). In this setup, the players assume that each opponent would play an action drawn independently from a fixed distribution over its action space. This distribution is called the belief of the players. The player then chooses an action that, given its own state, is the best response against actions drawn in this fashion. We say that the system is at Mean Field Equilibrium (MFE) if this best response action is itself a sample drawn from the assumed distribution. Thus, the player’s action should be consistent with the assumed distribution.

The MFG framework offers a structure to approximate the so-called Perfect Bayesian Equilibrium (PBE) in dynamic games. PBE requires each player to keep track of their beliefs on the future plays of every other opponent in the system and play the best response to that belief. This makes the computation of PBE intractable when the number of players is large. In our case, the state of each player is the current queue length, while the action is the bid made to the base station in which the phone is currently located. A PBE would require each app to estimate the queue lengths and the associated bids of every other queue that it is competing against, something that is clearly quite hard. The MFG approximation would assume that the bids made by the opponents are drawn independently from some bid distribution and place a bid in response. The MFG framework simplifies computation of the best response and often turns out to be asymptotically optimal.

Analysis of a typical mean field game problem involves the following steps:

1. Characterization of the best response policy under a fixed belief. We view the system from the perspective of a single agent that takes an action at each time step based on its state, its belief about other agents’ actions, and its knowledge about the transition probabilities of its state. In the repeated game case, this process involves characterizing the optimal action in a Markov decision process (MDP).

2. Showing the existence of the MFE using fixed point arguments. This step involves verifying that the map between the belief distribution and the invariant distribution of the mean field agent’s actions has a fixed point. In general, the properties to be verified are continuity of the map and compactness of the range space.

3. Showing asymptotic independence of the queues. One of the assumptions in computing the MFE is that, when a given agent interacts with another agent, their queue length distributions are independent. The first step in justifying this assumption is to prove that the queue lengths of any finite set of queues are independent when the number of users goes to infinity. This independence result is proved for a class of policies which includes the policy obtained using the MFE assumption. Then, the result is extended to show that independence continues to hold in steady state.

4. Showing that the MFE is an $\epsilon$-Nash equilibrium asymptotically. Here, the idea is that if the mean field policy were employed by all agents except a particular agent, that agent would gain no more than $\epsilon$ payoff by a unilateral deviation employing any other policy.

5. Dynamics of the many particle system. One hurdle to implementing the policy obtained from the MFE approach is that the equilibrium distribution of each user’s queue length is unknown. In principle, this is estimated by the stations from a histogram of the users’ queue lengths. Indeed, simulations indicate that a policy computed from such an empirical estimate of the distribution converges to the MFE policy when the number of users is large. However, proving such a result is an open problem at this time.

In what follows, we will focus only on Steps 1 and 2 above and will provide full results on the first and a sketch of the second. In the problem that we consider, Step 4 and the independence result in Step 3 over finite time horizons follow in a straightforward fashion, while research into the independence assumption in steady state in Step 3 and all of Step 5 is ongoing. We provide some references on these aspects at the end of the chapter.


4.2 Mean Field Auction Model

We consider a system consisting of $N$ cells, each with a base station, and a total of $NM$ agents. The agents are the smartphone apps, each of which is associated with a queue. These agents are randomly permuted in the cells at each discrete time instant $k$, with each cell having exactly $M$ agents. The model can be thought of as representing the idea that each cell in a cellular system typically has about 1000 devices, of which about 10 devices might be active at any time. The devices are mobile, which means that the pool from which active devices appear is constantly changing. Each cell contains a base station, which conducts a second-price auction to choose which agent to serve. Each agent must choose its bid in response to its state and its belief over the bids of its competitors.

Figure 7 illustrates the MFG approximation, which is accurate in the limit as $N$ becomes large. As mentioned earlier, an MFG is described from the perspective of a single agent, which assumes that the actions of all its competitors are drawn independently from some distribution called the belief.

Auction system: The agent of interest competes in a second-price auction against $M - 1$ other agents, whose bids are assumed to be independently drawn from a continuous, finite mean (cumulative) bid distribution $\rho$, with support $\mathbb{R}^+$. The state of the agent is its current queue length $q$ (the random variable is represented by $Q$). The queue length induces a holding cost $C(q)$, where $C(\cdot)$ is a strictly convex and increasing function. Suppose that the agent bids an amount $w \in \mathbb{R}^+$. The outcomes of the auction are that the agent would obtain a unit of service with probability $p_\rho(w)$ and would have to pay an expected amount of $r_\rho(w)$ when all the other bids are drawn independently from $\rho$. Further, the queue has future job arrivals according to distribution $\Phi$, with the random job size being denoted by $A$. Finally, the app can terminate at any time instant with probability $1 - \beta$. Based on these inputs, the agent needs to determine the value of its current state $\hat{V}_\rho(q)$, and the best response bid to make, $w = \hat{\theta}_\rho(q)$.

Fig. 7 The game consists of an auction part (inner loop) and a queue dynamics part (outer loop). The system is at MFE if the resultant bid distribution $\gamma$ is the same as the assumed bid distribution $\rho$

Queueing system: The queueing dynamics are driven by the arrival process $\Phi$ and the probability of obtaining service $p_\rho(w)$ as described above. When the user terminates an app, he/she immediately starts a fresh app, i.e., a new queue takes the place of a departing queue. The initial condition of this new queue is drawn from a regeneration distribution $\Psi$, whose support is $\mathbb{R}^+$. The invariant distribution associated with this queueing system (if it exists) is denoted by $\Pi_\rho$.

Mean field equilibrium: The probability that the agent’s bid (represented by the random variable $W$) lies in the interval $[0, w]$ is equal to the probability that the agent’s queue length lies in the set of states whose best response is to bid in $[0, w]$. Thus, the probability of the bid lying in the interval $[0, w]$, denoted by the cumulative probability distribution $\gamma(w)$, is $\Pi_\rho(\hat{\theta}_\rho^{-1}([0, w]))$. According to the assumed (cumulative) bid distribution, the probability of the same event is $\rho(w)$. If $\rho(w) = \gamma(w)$, it means that the assumed bid distribution is consistent with the best response bid distribution, and we have an MFE.
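The consistency condition can be phrased as a simple empirical check. The sketch below is a hypothetical illustration rather than the chapter's procedure: given samples of the queue length (standing in for draws from $\Pi_\rho$) and a bid function (standing in for $\hat{\theta}_\rho$), it estimates $\gamma(w)$ as the fraction of samples whose bid is at most $w$ and measures the largest gap to an assumed distribution $\rho$ on a grid. All names and the toy inputs are assumptions.

```python
from typing import Callable, Sequence


def induced_bid_cdf(queue_samples: Sequence[float],
                    bid_fn: Callable[[float], float],
                    w: float) -> float:
    """Empirical estimate of gamma(w): fraction of sampled states whose bid lies in [0, w]."""
    return sum(1 for q in queue_samples if bid_fn(q) <= w) / len(queue_samples)


def consistency_gap(queue_samples: Sequence[float],
                    bid_fn: Callable[[float], float],
                    assumed_cdf: Callable[[float], float],
                    grid: Sequence[float]) -> float:
    """Largest difference |gamma(w) - rho(w)| over the grid; zero at a (sampled) MFE."""
    return max(abs(induced_bid_cdf(queue_samples, bid_fn, w) - assumed_cdf(w)) for w in grid)


# Toy inputs: four queue-length samples and a linear bid function.
samples = [0.0, 1.0, 2.0, 3.0]
bid = lambda q: 0.5 * q                      # bids 0, 0.5, 1.0, 1.5
uniform_cdf = lambda w: min(max(w / 2.0, 0.0), 1.0)

print(induced_bid_cdf(samples, bid, 1.0))
print(consistency_gap(samples, bid, uniform_cdf, [0.0, 0.5, 1.0, 1.5, 2.0]))
```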

4.2.1 Agent’s Decision Problem

Under the mean field regime, we are interested in the decision and state evolution of a particular agent $i$ that has a belief that the bid of each other agent (opponent) has cumulative distribution $\rho$, independent of each other. We assume that $\rho \in \mathcal{P}$, where $\mathcal{P}$ is the set of distributions with a continuous c.d.f. and a finite mean, upper bounded by some $E < \infty$. Suppose that the random variable representing the bid made by agent $i$ at time $k$ is denoted by $W_{i,k}$, with the realized value being $w$. Also, let

\[ \bar{W}_{-i,k} = \max_{j \in M_{i,k}} W_{j,k}, \]

where $M_{i,k}$ indexes the opponents of agent $i$ at time $k$, represent the maximum value of $M - 1$ draws from the distribution $\rho$. Thus, $\bar{W}_{-i,k}$ is the value of the highest opposing bid.

Since the time of regeneration $T_i^k$ is a geometric random variable, the expected cost of agent $i$ can be written as

\[ V_{i,\rho}(H_{i,k}; \theta_i) = E\left[ \sum_{t=k}^{\infty} \beta^t \left[ C(Q_{i,t}) + r_\rho(W_{i,t}) \right] \right], \tag{77} \]

where $H_{i,k}$ is the history observed by agent $i$ until time $k$, $\theta_i$ is the bid function that it employs, and the expectation is over future state evolutions. Also,

\[ r_\rho(w) = E\left[ \bar{W}_{-i,k}\, I\{\bar{W}_{-i,k} \le w\} \right] \]

is the expected payment when $i$ bids $w$ under the assumption that the bids of other agents are distributed according to $\rho$. Given $\rho$, the probability that agent $i$ wins in the auction is

\[ p_\rho(w) = P(\bar{W}_{-i,k} \le w) = \rho(w)^{M-1}. \tag{78} \]

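The expected payment when bidding $w$ is

\begin{align}
r_\rho(w) &= E\left[ \bar{W}_{-i,k}\, I\{\bar{W}_{-i,k} \le w\} \right] \nonumber\\
&= w\, p_\rho(w) - \int_0^w p_\rho(u)\, du. \tag{79}
\end{align}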
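Expressions (78) and (79) can be compared against a direct simulation of the auction. The sketch below is only illustrative: it assumes an exponential belief $\rho(w) = 1 - e^{-w}$ and $M = 5$ agents per cell (neither value comes from the chapter), evaluates (79) with a trapezoidal rule, and estimates the winning probability and the expected payment by Monte Carlo.

```python
import math
import random

M = 5  # agents per auction (assumed value)


def rho(w: float) -> float:
    """Assumed belief: c.d.f. of an Exp(1) bid distribution."""
    return 1.0 - math.exp(-w) if w > 0 else 0.0


def p_win(w: float) -> float:
    """Winning probability (78): all M - 1 opposing bids are at most w."""
    return rho(w) ** (M - 1)


def expected_payment(w: float, steps: int = 2000) -> float:
    """Expected payment (79), with the integral computed by the trapezoidal rule."""
    h = w / steps
    integral = h * (0.5 * p_win(0.0) + sum(p_win(i * h) for i in range(1, steps)) + 0.5 * p_win(w))
    return w * p_win(w) - integral


def simulate(w: float, n: int = 100_000, seed: int = 0):
    """Monte Carlo estimates of P(max bid <= w) and E[max bid * 1{max bid <= w}]."""
    rng = random.Random(seed)
    wins, paid = 0, 0.0
    for _ in range(n):
        top = max(rng.expovariate(1.0) for _ in range(M - 1))
        if top <= w:
            wins += 1
            paid += top  # second-price rule: the winner pays the highest opposing bid
    return wins / n, paid / n


w = 1.5
p_mc, r_mc = simulate(w)
print(f"p(w): formula {p_win(w):.4f}, simulation {p_mc:.4f}")
print(f"r(w): formula {expected_payment(w):.4f}, simulation {r_mc:.4f}")
```

The second line of (79) is an integration by parts of $\int_0^w u\, dp_\rho(u)$, which is why only the winning probability $p_\rho$ is needed to compute the expected payment.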

The state process $Q_{i,k}$ is Markov and has a transition kernel

\begin{align}
P(Q_{i,k+1} \in B \mid Q_{i,k} = q, W_{i,k} = w) ={}& \beta\, p_\rho(w)\, P((q - 1)^+ + A_k \in B) \nonumber\\
& + \beta\, (1 - p_\rho(w))\, P(q + A_k \in B) + (1 - \beta)\, \Psi(B), \tag{80}
\end{align}

where $B \subseteq \mathbb{R}^+$ is a Borel set and $x^+ \triangleq \max(x, 0)$. Recall that $A_k \sim \Phi$ is the arrival between the $k$th and $(k+1)$th auctions and $\Psi$ is the regeneration distribution. In the above expression, the first term corresponds to the event that the agent wins the auction at time $k$, while the second corresponds to the event that it does not. The last term captures the event that the agent regenerates after auction $k$. The agent’s decision problem can be modeled as an infinite horizon discounted cost MDP. Standard results can be used to show that there exists an optimal Markov deterministic policy for our MDP (Strauch 1966). Then, from (77), the optimal value function of the agent can be written as

\[ \hat{V}_{i,\rho}(q) = \inf_{\theta_i \in \Theta} E\left[ \sum_{t=1}^{\infty} \beta^t \left[ C(Q_{i,t}) + r_\rho(W_{i,t}) \right] \,\Big|\, Q_{i,0} = q \right], \tag{81} \]

where $\Theta$ is the space of Markov deterministic policies. Once we have the above formulation, the index of the agent is redundant as we are concerned with a single agent’s decision problem. Hence, we will omit the agent subscript $i$ in what follows.

4.2.2 Existence of a Stationary Distribution

Given a cumulative bid distribution $\rho$ and a Markov policy $\theta \in \Theta$, the transition kernel given by (80) can be rewritten as

\begin{align}
P(Q_{k+1} \in B \mid Q_k = q) ={}& \beta\, p_\rho(\theta(q))\, P((q - 1)^+ + A_k \in B) \nonumber\\
& + \beta\, (1 - p_\rho(\theta(q)))\, P(q + A_k \in B) + (1 - \beta)\, \Psi(B). \tag{82}
\end{align}

A basic question is whether a stationary distribution $\Pi_\rho$ exists under an arbitrary Markov policy $\theta$. This is critical if we are to characterize the map between the assumed bid distribution $\rho$ and the resultant bid distribution $\gamma$. It turns out that under our formulation, the existence of the invariant state distribution follows immediately from Meyn et al. (2009).

4.2.3 Mean Field Equilibrium

The mean field equilibrium is essentially a consistency check that the bid distribution $\gamma$ induced by the stationary distribution $\Pi_{\rho, \theta_\rho}$ is identical to the bid distribution that forms the belief of the agent, i.e., $\rho$. Hence, we have the following definition of MFE:

Definition 3 (Mean field equilibrium). Let $\rho$ be a bid distribution and $\theta_\rho$ be a stationary policy for an agent. Then, we say that $(\rho, \theta_\rho)$ constitutes a mean field equilibrium if

1. $\theta_\rho$ is an optimal policy of the decision problem in (81), given bid distribution $\rho$.

2. $\rho(w) = \gamma(w) \triangleq \Pi_\rho(\theta_\rho^{-1}([0, w]))$, $\forall w \in \mathbb{R}^+$, where $\Pi_\rho = \Pi_{\rho, \theta_\rho}$.

We now characterize the best response policy and describe the steps involved in proving existence of the MFE.

4.3 Properties of Optimal Bid Function

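The decision problem given by (81) is an infinite horizon, discounted Markov decision problem (MDP). The optimality equation or Bellman equation corresponding to the decision problem is

\begin{align}
\hat{V}_\rho(q) ={}& C(q) + \beta E_A\big( \hat{V}_\rho(q + A) \big) \nonumber\\
& + \inf_{w \in \mathbb{R}^+} \Big[ r_\rho(w) - p_\rho(w)\, \beta E_A\big( \hat{V}_\rho(q + A) - \hat{V}_\rho((q - 1)^+ + A) \big) \Big], \tag{83}
\end{align}

where $A \sim \Phi$, and we use the notation $\max(0, z) = z^+$.

We define the set of functions

\[ \mathcal{V} = \left\{ f : \mathbb{R}^+ \mapsto \mathbb{R}^+ : \sup_{q \in \mathbb{R}^+} \left| \frac{f(q)}{w(q)} \right| < \infty \right\}, \tag{84} \]

where $w(q) = \max\{C(q), 1\}$ is a weight function (not to be confused with the bid $w$). Clearly, $\mathcal{V}$ is a Banach space with the $w$-norm,

\[ \| f \|_w = \sup_{q \in \mathbb{R}^+} \left| \frac{f(q)}{w(q)} \right| < \infty. \]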

We define the Bellman operator $T_\rho$ as

\begin{align}
(T_\rho f)(q) ={}& C(q) + \beta E_A f(q + A) \nonumber\\
& + \inf_{w \in \mathbb{R}^+} \Big[ r_\rho(w) - p_\rho(w)\, \beta \big( E_A( f(q + A) - f((q - 1)^+ + A) ) \big) \Big], \tag{85}
\end{align}

where $f \in \mathcal{V}$. It is straightforward to show that the infimum in the above operator occurs at

\[ \beta\, \Delta f(q)^+, \tag{86} \]

where $\Delta f(q) = E_A( f(q + A) - f((q - 1)^+ + A) )$. Then, substituting from (78), (79) and (86), (85) can be rewritten as

\[ (T_\rho f)(q) = C(q) + \beta E_A f(q + A) - \int_0^{\beta \Delta f(q)^+} p_\rho(u)\, du. \tag{87} \]

Our first step is to show that an optimal solution exists for this problem. The MDP is in discrete time, but the state space consists of all nonnegative real numbers. There exist standard regularity conditions under which such an MDP has a solution. For instance, our problem setup can be posed as a slightly modified version of that in Theorem 8.3.6 of Hernández-Lerma and de Ozak (1992). The result is as follows:

Lemma 1. Given a cumulative bid distribution $\rho$,

1. There exists a $j \in \mathbb{N}$ such that $T_\rho^j : \mathcal{V} \to \mathcal{V}$ is a contraction mapping. Hence, there exists a unique $f_\rho^* \in \mathcal{V}$ such that $T_\rho f_\rho^* = f_\rho^*$, and for any $f \in \mathcal{V}$, $T_\rho^n f \to f_\rho^*$ as $n \to \infty$.

2. The fixed point $f_\rho^*$ of operator $T_\rho$ is the unique solution to the optimality Eq. (83), i.e., $f_\rho^* = \hat{V}_\rho$.

3. Letting $\hat{\theta}_\rho(q) = \beta\, \Delta \hat{V}_\rho(q)^+$, $\hat{\theta}_\rho$ is an optimal policy.

Corollary 1. An optimal policy of the agent’s decision problem (81) is given by

\[ \hat{\theta}_\rho(q) = \beta E_A \left[ \hat{V}_\rho(q + A) - \hat{V}_\rho((q - 1)^+ + A) \right]. \]
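Lemma 1 and Corollary 1 suggest a simple computational recipe: iterate the operator in (87) until it stops changing, then read off the bid from the resulting value function. The sketch below illustrates this on a toy instance and is not the chapter's algorithm. Everything concrete in it is an assumption made for illustration: integer queue lengths truncated at `Q_MAX`, arrivals of 0, 1 or 2 jobs, holding cost $C(q) = q^2$, $\beta = 0.9$, $M = 3$, and an exponential belief $\rho(w) = 1 - e^{-w}$, for which the integral in (87) has a closed form.

```python
import math

BETA = 0.9                            # survival probability per step (assumed)
M = 3                                 # agents per auction (assumed)
Q_MAX = 30                            # truncation of the state space (assumed)
ARRIVALS = {0: 0.5, 1: 0.3, 2: 0.2}   # assumed arrival distribution


def holding_cost(q: int) -> float:
    """Assumed strictly convex, increasing holding cost C(q) = q^2."""
    return float(q * q)


def integral_p(x: float) -> float:
    """Closed form of int_0^x (1 - e^{-u})^(M-1) du for the assumed Exp(1) belief."""
    n = M - 1
    return x + sum(math.comb(n, j) * (-1) ** j * (1.0 - math.exp(-j * x)) / j
                   for j in range(1, n + 1))


def expect(f, q: int) -> float:
    """E_A f(q + A), with the queue length clamped at Q_MAX."""
    return sum(pr * f[min(q + a, Q_MAX)] for a, pr in ARRIVALS.items())


def delta(f, q: int) -> float:
    """Delta f(q) = E_A[f(q + A) - f((q - 1)^+ + A)]."""
    return expect(f, q) - expect(f, max(q - 1, 0))


def bellman(f):
    """One application of the operator in (87) on the truncated state space."""
    return [holding_cost(q) + BETA * expect(f, q)
            - integral_p(max(BETA * delta(f, q), 0.0))
            for q in range(Q_MAX + 1)]


def value_iteration(tol: float = 1e-9, max_iter: int = 10_000):
    """Iterate the operator from f = 0 until successive iterates differ by less than tol."""
    f = [0.0] * (Q_MAX + 1)
    for _ in range(max_iter):
        g = bellman(f)
        if max(abs(x - y) for x, y in zip(f, g)) < tol:
            return g
        f = g
    return f


V = value_iteration()
bids = [max(BETA * delta(V, q), 0.0) for q in range(Q_MAX + 1)]  # bid of Corollary 1
residual = max(abs(x - y) for x, y in zip(bellman(V), V))
print(f"fixed-point residual: {residual:.2e}")
print("bids for q = 0..5:", [round(b, 3) for b in bids[:6]])
```

On this truncated toy model the iteration converges because each application of the operator shrinks sup-norm differences by a factor of $\beta$. The truncation at `Q_MAX` distorts values near the boundary, so only states well below it should be interpreted.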

We now establish that $\hat{V}_\rho$ and $\hat{\theta}_\rho$ are continuous and increasing functions.

Lemma 2. Given a cumulative bid distribution function $\rho$,

1. $\hat{V}_\rho$ is a continuous increasing function.

2. $\hat{\theta}_\rho$ is a continuous strictly increasing function.


Proof. Let $f \in \mathcal{V}$. Suppose $f$ is a continuous monotone increasing function. We first prove that $T_\rho f$ is also a continuous monotone increasing function. Since the $n$-step Bellman operator satisfies $T_\rho^n f \to \hat{V}_\rho$ according to Statement 2 of Lemma 1, we conclude that $\hat{V}_\rho$ also has the same property.

Let $q > q'$. Then,

\begin{align*}
T_\rho f(q) - T_\rho f(q') ={}& C(q) - C(q') + \beta E_A( f(q + A) - f(q' + A) )\\
& + \inf_w \big[ r_\rho(w) - \beta p_\rho(w) E_A( f(q + A) - f((q - 1)^+ + A) ) \big]\\
& - \inf_w \big[ r_\rho(w) - \beta p_\rho(w) E_A( f(q' + A) - f((q' - 1)^+ + A) ) \big]\\
\overset{(a)}{\ge}{}& \beta E_A( f(q + A) - f(q' + A) ) + \beta \inf_w\, p_\rho(w) E_A\big( f(q' + A)\\
& \quad - f((q' - 1)^+ + A) - f(q + A) + f((q - 1)^+ + A) \big)\\
\ge{}& \beta \min\big\{ E_A( f(q + A) - f(q' + A) ),\; E_A( f((q - 1)^+ + A) - f((q' - 1)^+ + A) ) \big\}\\
\overset{(b)}{\ge}{}& 0,
\end{align*}

where (a) follows from the assumption that $C(\cdot)$ is an increasing function, and (b) follows from the assumption that $f(\cdot)$ is an increasing function.

To prove that $T_\rho f$ is continuous, consider a sequence $\{q_n\}$ such that $q_n \to q$. Since $f$ is a continuous function, $f(q_n + a) \to f(q + a)$. Then, by using the dominated convergence theorem, we have $E_A f(q_n + A) \to E_A f(q + A)$ and $E_A f((q_n - 1)^+ + A) \to E_A f((q - 1)^+ + A)$. Also, $\Delta f(q_n) \ge 0$ as $f$ is an increasing function. Then, from (87), we get that

\begin{align*}
T_\rho f(q_n) &= C(q_n) + \beta E_A f(q_n + A) - \int_0^{\beta \Delta f(q_n)} p_\rho(u)\, du\\
&\to C(q) + \beta E_A f(q + A) - \int_0^{\beta \Delta f(q)} p_\rho(u)\, du = T_\rho f(q).
\end{align*}

Hence, $T_\rho f$ is a continuous function. This yields Statement 1 in the lemma.

Now, to prove the second part, assume that $\Delta f$ is an increasing function. First, we show that $\Delta T_\rho f$ is an increasing function. Let $q > q'$. From (87), for any $a < \bar{A}$ we can write

\begin{align*}
& (T_\rho f)(q + a) - (T_\rho f)((q - 1)^+ + a) - (T_\rho f)(q' + a) + (T_\rho f)((q' - 1)^+ + a)\\
&= C(q + a) - C((q - 1)^+ + a) - C(q' + a) + C((q' - 1)^+ + a)\\
&\quad + \beta E_A f(q + a + A) - \beta E_A f((q - 1)^+ + a + A)\\
&\quad - \beta E_A f(q' + a + A) + \beta E_A f((q' - 1)^+ + a + A)\\
&\quad - \int_{\beta \Delta f(q' + a)}^{\beta \Delta f(q + a)} p_\rho(u)\, du + \int_{\beta \Delta f((q' - 1)^+ + a)}^{\beta \Delta f((q - 1)^+ + a)} p_\rho(u)\, du\\
&= C(q + a) - C((q - 1)^+ + a) - C(q' + a) + C((q' - 1)^+ + a)\\
&\quad + \beta E_A f((q + a - 1)^+ + A) - \beta E_A f((q - 1)^+ + a + A)\\
&\quad - \beta E_A f((q' + a - 1)^+ + A) + \beta E_A f((q' - 1)^+ + a + A)\\
&\quad + \int_{\beta \Delta f(q' + a)}^{\beta \Delta f(q + a)} \big( 1 - p_\rho(u) \big)\, du + \int_{\beta \Delta f((q' - 1)^+ + a)}^{\beta \Delta f((q - 1)^+ + a)} p_\rho(u)\, du.
\end{align*}

It can be easily verified that

\begin{align*}
& E_A( f((q + a - 1)^+ + A) ) - E_A( f((q - 1)^+ + a + A) ) - E_A( f((q' + a - 1)^+ + A) )\\
&\quad + E_A( f((q' - 1)^+ + a + A) ) \ge 0,
\end{align*}

as $f$ is increasing (due to Statement 1 of this lemma). From the assumption that $\Delta f$ is increasing, the last two terms in the above expression are also nonnegative. Now, taking expectation on both sides, we obtain $\Delta T_\rho f(q) - \Delta T_\rho f(q') \ge \Delta C(q) - \Delta C(q') > 0$. Therefore, from Statements 2 and 3 of Lemma 1, we have

\[ \hat{\theta}_\rho(q) - \hat{\theta}_\rho(q') = \beta \big( \Delta \hat{V}_\rho(q) - \Delta \hat{V}_\rho(q') \big) \ge \beta \big( \Delta C(q) - \Delta C(q') \big) > 0. \]

Here, the last inequality holds since $C$ is a strictly convex increasing function. □

4.4 Existence of MFE

We now describe the steps involved in showing the existence of the MFE. In many cases we will only provide proof sketches to show how the argument proceeds.

Theorem 3. There exists an MFE $(\rho, \hat\theta_\rho)$ such that

\[
\rho(x) = \gamma(x) \triangleq \Pi_\rho\big(\hat\theta_\rho^{-1}[0, x]\big), \quad \forall x \in \mathbb{R}^+.
\]

We first introduce some useful notation. Let $\Theta = \{\theta : \mathbb{R} \to \mathbb{R},\ \sup_{q \in \mathbb{R}^+} |\theta(q)/w(q)| < \infty\}$. Note that $\Theta$ is a normed space with the $w$-norm. Also, let $\Omega$ be the space of absolutely continuous probability measures on $\mathbb{R}^+$. We endow this probability space with the topology of weak convergence.

We define $\theta^* : \mathcal{P} \to \Theta$ as $(\theta^*(\rho))(q) = \hat\theta_\rho(q)$, where $\hat\theta_\rho(q)$ is the optimal bid given by Corollary 1. It can be easily verified that $\hat\theta_\rho \in \Theta$. Also, define the mapping $\Pi^*$ that takes a bid distribution $\rho$ to the invariant workload distribution $\Pi_\rho(\cdot)$. Later, using Lemma 3 we will show that $\Pi_\rho(\cdot) \in \Omega$. Therefore, $\Pi^* : \mathcal{P} \to \Omega$. Finally, define $F$ as $(F(\rho))(x) = \gamma(x) = \Pi_\rho(\hat\theta_\rho^{-1}([0, x]))$. Lemma 5 will show that $F$ maps $\mathcal{P}$ into itself.

In order to prove the above theorem, we need to show that $F$ has a fixed point, i.e., $F(\rho) = \rho$.

Theorem 4 (Schauder Fixed Point Theorem). Suppose $F(\mathcal{P}) \subset \mathcal{P}$. If $F(\cdot)$ is continuous and $F(\mathcal{P})$ is contained in a convex and compact subset of $\mathcal{P}$, then $F(\cdot)$ has a fixed point.

We will show that the mapping $F$ satisfies the conditions of the above theorem, and hence it has a fixed point. Note that $\mathcal{P}$ is a convex set. Therefore, we only need to verify the other two conditions.

To prove the continuity of the mapping $F$, we first show that $\theta^*$ and $\Pi^*$ are continuous mappings. To that end, we will show that for any sequence $\rho_n \to \rho$ in uniform norm, we have $\theta^*(\rho_n) \to \theta^*(\rho)$ in $w$-norm and $\Pi^*(\rho_n) \Rightarrow \Pi^*(\rho)$ (where $\Rightarrow$ denotes weak convergence). Finally, we use the continuity of $\theta^*$ and $\Pi^*$ to prove that $F(\rho_n) \to F(\rho)$.

Step 1: Continuity of the map $\theta^*$

Theorem 5. The map $\theta^*$ is continuous.

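Proof. Define the map $V^* : \mathcal{P} \to \mathcal{V}$ that takes $\rho$ to $\hat V_\rho(\cdot)$. We begin by showing that $\|\hat\theta_{\rho_1} - \hat\theta_{\rho_2}\|_w \le K \|\hat V_{\rho_1} - \hat V_{\rho_2}\|_w$, which means that the continuity of the map $V^*$ implies the continuity of the map $\theta^*$. Next, we show two simple properties of the Bellman operator. The first is that for any $\rho \in \mathcal{P}$ and $f_1, f_2 \in \mathcal{V}$,

\[
\|T_\rho f_1 - T_\rho f_2\|_w \le \hat K \|f_1 - f_2\|_w \tag{88}
\]

for some large $\hat K$, independent of $\rho$.

Second, let $T_{\rho_1}$ and $T_{\rho_2}$ be the Bellman operators corresponding to $\rho_1, \rho_2 \in \mathcal{P}$ and let $f \in \mathcal{V}$. We show that

\[
\|T_{\rho_1} f - T_{\rho_2} f\|_w \le 2(M-1) K_1 \|f\|_w \|\rho_1 - \rho_2\|. \tag{89}
\]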

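We then have

\[
\begin{aligned}
\|T_{\rho_1}^{j} \hat V_{\rho_2} - T_{\rho_2}^{j} \hat V_{\rho_2}\|_w
&\le \|T_{\rho_1}^{j} \hat V_{\rho_2} - T_{\rho_1}^{j-1} T_{\rho_2} \hat V_{\rho_2}\|_w
 + \|T_{\rho_1}^{j-1} T_{\rho_2} \hat V_{\rho_2} - T_{\rho_1}^{j-2} T_{\rho_2}^{2} \hat V_{\rho_2}\|_w + \cdots \\
&\quad + \|T_{\rho_1} T_{\rho_2}^{j-1} \hat V_{\rho_2} - T_{\rho_2}^{j} \hat V_{\rho_2}\|_w \qquad (90) \\
&\le \hat K^{j-1} \|T_{\rho_1} \hat V_{\rho_2} - T_{\rho_2} \hat V_{\rho_2}\|_w + \cdots
 + \|T_{\rho_1} T_{\rho_2}^{j-1} \hat V_{\rho_2} - T_{\rho_2}^{j} \hat V_{\rho_2}\|_w \qquad (91) \\
&\le (\hat K^{j-1} + \cdots + 1) \|T_{\rho_1} \hat V_{\rho_2} - T_{\rho_2} \hat V_{\rho_2}\|_w \\
&\le 2(M-1) K_1 \|\rho_1 - \rho_2\| (\hat K^{j-1} + \cdots + 1) \|\hat V_{\rho_2}\|_w. \qquad (92)
\end{aligned}
\]

Here, (91) and (92) follow from (88) and (89), respectively. The step between them uses the fact that $\hat V_{\rho_2}$ is a fixed point of $T_{\rho_2}$, so that $T_{\rho_2}^{k} \hat V_{\rho_2} = \hat V_{\rho_2}$ for every $k$.

Now, let $j < \infty$ be such that $T_{\rho_1}^{j}$ is an $\alpha$-contraction; Statement 1 of Lemma 1 guarantees that such a $j$ exists. Then we have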

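\[
\begin{aligned}
\|\hat V_{\rho_1} - \hat V_{\rho_2}\|_w &= \|T_{\rho_1}^{j} \hat V_{\rho_1} - T_{\rho_2}^{j} \hat V_{\rho_2}\|_w \\
&\le \|T_{\rho_1}^{j} \hat V_{\rho_1} - T_{\rho_1}^{j} \hat V_{\rho_2}\|_w + \|T_{\rho_1}^{j} \hat V_{\rho_2} - T_{\rho_2}^{j} \hat V_{\rho_2}\|_w \\
\Longrightarrow\ (1 - \alpha) \|\hat V_{\rho_1} - \hat V_{\rho_2}\|_w &\le \|T_{\rho_1}^{j} \hat V_{\rho_2} - T_{\rho_2}^{j} \hat V_{\rho_2}\|_w. \qquad (93)
\end{aligned}
\]

Finally, from (92) and (93), we get

\[
\begin{aligned}
\|\hat V_{\rho_1} - \hat V_{\rho_2}\|_w
&\le \frac{2(M-1) K_1 (\hat K^{j-1} + \cdots + 1) \|\rho_1 - \rho_2\|}{1 - \alpha} \|\hat V_{\rho_2}\|_w \\
&\le \frac{2(M-1) K_1 (\hat K^{j-1} + \cdots + 1) \|\rho_1 - \rho_2\|}{1 - \alpha} \big( \|\hat V_{\rho_1}\|_w + \|\hat V_{\rho_1} - \hat V_{\rho_2}\|_w \big).
\end{aligned}
\]

Therefore, if $\frac{2(M-1) K_1 (\hat K^{j-1} + \cdots + 1)}{1 - \alpha} \|\rho_1 - \rho_2\| < \frac{1}{2}$, then

\[
\|\hat V_{\rho_1} - \hat V_{\rho_2}\|_w \le \frac{4(M-1) K_1 (\hat K^{j-1} + \cdots + 1)}{1 - \alpha} \|\hat V_{\rho_1}\|_w \|\rho_1 - \rho_2\|.
\]

Hence, the maps $V^*$ and $\theta^*$ are continuous. □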
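The final step of this proof absorbs the difference term on the right-hand side into the left-hand side. A short sketch of that algebra follows; the shorthand $c$, $D$ and $N$ is introduced here only for illustration.

```latex
% Shorthand (not in the original):
%   c = the coefficient multiplying the norm of \hat V_{\rho_2} in the bound
%       obtained by combining (92) and (93); it is proportional to
%       \|\rho_1 - \rho_2\| / (1 - \alpha),
%   D = \| \hat V_{\rho_1} - \hat V_{\rho_2} \|,   N = \| \hat V_{\rho_1} \|.
D \;\le\; c\,(N + D)
\;\Longrightarrow\; (1 - c)\,D \;\le\; c\,N
\;\Longrightarrow\; D \;\le\; \frac{c}{1 - c}\,N \;\le\; 2c\,N
\qquad \text{whenever } c < \tfrac{1}{2}.
```

Since $c$ is proportional to the distance between the two bid distributions, $D$ vanishes as that distance goes to zero with $\rho_1$ held fixed, which is the continuity claim.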


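Step 2: Continuity of the map $\Pi^*$

Let $\Pi_{\rho,\theta}(\cdot)$ be the invariant distribution generated by any $\theta$. Recall that $\Pi^*$ takes $\rho \in \mathcal{P}$ to the probability measure $\Pi_\rho(\cdot) = \Pi_{\rho,\hat\theta_\rho}(\cdot)$. First, we show that $\Pi_{\rho,\theta}(\cdot) \in \Omega$, where $\Omega$ is the space of absolutely continuous measures (with respect to Lebesgue measure) on $\mathbb{R}^+$.

Lemma 3. For any $\rho \in \mathcal{P}$ and any $\theta \in \Theta$, $\Pi_{\rho,\theta}(\cdot)$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^+$.

Proof. $\Pi_{\rho,\theta}(\cdot)$ can be expressed as the invariant queue-length distribution of the dynamics

\[
q \to
\begin{cases}
Q' + A & \text{with probability } \beta, \\
R & \text{with probability } (1 - \beta),
\end{cases}
\]

where $A \sim \Phi$ and $R \sim \Psi$, and $Q'$ is a random variable with distribution generated by the conditional probabilities

\[
\begin{aligned}
P(Q' = q \mid q) &= 1 - p_\rho(\theta(q)), \\
P(Q' = (q-1)^+ \mid q) &= p_\rho(\theta(q)).
\end{aligned}
\]

Let $\Pi'$ be the distribution of $Q'$. Then for any Borel set $B$, $\Pi_{\rho,\theta}$ can be expressed using the convolution of $\Pi'$ and $\Phi$:

\[
\Pi_{\rho,\theta}(B) = \beta \int_{-\infty}^{\infty} \Phi(B - y)\, d\Pi'(y) + (1 - \beta) \Psi(B). \tag{94}
\]

If $B$ is a Lebesgue null set, then so is $B - y$ for all $y$. So, $\Phi(B - y) = 0$ and $\Psi(B) = 0$, and therefore $\Pi_{\rho,\theta}(B) = 0$. □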
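The dynamics used in this proof can be made concrete with a small simulation. The sketch below is only illustrative: the winning probability, the arrival distribution and the regeneration distribution are hypothetical stand-ins chosen for the example, not the ones in the chapter.

```python
import random

def step(q, win_prob, beta, draw_arrival, draw_regen, rng):
    """One transition: regenerate w.p. 1 - beta, otherwise serve-then-arrive."""
    if rng.random() >= beta:
        return draw_regen(rng)                    # regeneration: fresh state R
    served = rng.random() < win_prob(q)           # agent wins the auction
    q_prime = max(q - 1.0, 0.0) if served else q  # Q' is q or (q - 1)^+
    return q_prime + draw_arrival(rng)            # Q' + A

def simulate(n_steps, beta=0.9, seed=0):
    """Simulate the chain under hypothetical model ingredients."""
    rng = random.Random(seed)
    win_prob = lambda q: q / (1.0 + q)            # assumed stand-in for p(theta(q))
    draw_arrival = lambda r: r.uniform(0.0, 1.0)  # assumed arrival law
    draw_regen = lambda r: r.expovariate(1.0)     # assumed regeneration law
    q, path = draw_regen(rng), []
    for _ in range(n_steps):
        q = step(q, win_prob, beta, draw_arrival, draw_regen, rng)
        path.append(q)
    return path

path = simulate(10_000)
```

Because both the assumed arrival and regeneration laws have densities, every state reached after one step is drawn from a distribution with a density, which mirrors the argument around (94).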

We now develop a useful characterization of $\Pi_{\rho,\theta}$. Let

\[
\Upsilon^{(k)}_{\rho,\theta}(B \mid q) = P(Q_k \in B \mid \text{no regeneration},\ Q_0 = q)
\]

be the distribution of queue length $Q_k$ at time $k$ induced by the transition probabilities given in (82), conditioned on the event that $Q_0 = q$ and that there are no regenerations until time $k$. We can now express the invariant distribution $\Pi_{\rho,\theta}(\cdot)$ in terms of $\Upsilon^{(k)}_{\rho,\theta}(\cdot \mid q)$ as in the following lemma.

Lemma 4. For any bid distribution $\rho \in \mathcal{P}$ and for any stationary policy $\theta \in \Theta$, the Markov chain described by the transition probabilities given in (82) has a unique invariant distribution $\Pi_{\rho,\theta}(\cdot)$. Also, $\Pi_{\rho,\theta}$ and $\Upsilon^{(k)}_{\rho,\theta}$ are related as follows:

\[
\Pi_{\rho,\theta}(B) = \sum_{k \ge 0} (1 - \beta) \beta^k E_\Psi\big(\Upsilon^{(k)}_{\rho,\theta}(B \mid Q)\big), \tag{95}
\]

where $E_\Psi\big(\Upsilon^{(k)}_{\rho,\theta}(B \mid Q)\big) = \int \Upsilon^{(k)}_{\rho,\theta}(B \mid q)\, d\Psi(q)$.

Proof. $\Upsilon^{(k)}_{\rho,\theta}(B \mid q)$ is the queue length distribution assuming no regeneration has happened yet, and the regeneration event occurs with probability $1 - \beta$ independently of the rest of the system. It is then easy to find $\Pi_{\rho,\theta}(B)$ in terms of $\Upsilon^{(k)}_{\rho,\theta}(B \mid q)$ by simply using the properties of the conditional expectation, and the lemma follows. Note that in $E_\Psi\big(\Upsilon^{(k)}_{\rho,\theta}(B \mid Q)\big)$, the random variable is the initial condition of the queue, as generated by $\Psi$. □
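The decomposition in (95) can be checked numerically on a toy finite-state analogue. In the sketch below, the state space, the no-regeneration kernel `P0`, the regeneration law `psi` and the value of `beta` are all made-up illustrative numbers; the point is only that the geometric series is invariant for the full chain.

```python
def mat_vec_left(v, P):
    """Row vector v times matrix P."""
    n = len(P[0])
    return [sum(v[i] * P[i][j] for i in range(len(v))) for j in range(n)]

def regeneration_series(psi, P0, beta, terms=2000):
    """Truncated sum over k of (1 - beta) * beta**k * psi * P0**k."""
    pi = [0.0] * len(psi)
    v = psi[:]                # psi * P0**k, starting at k = 0
    weight = 1.0 - beta       # (1 - beta) * beta**k
    for _ in range(terms):
        pi = [p + weight * x for p, x in zip(pi, v)]
        v = mat_vec_left(v, P0)
        weight *= beta
    return pi

def full_chain(psi, P0, beta):
    """W.p. beta follow P0 (no regeneration); w.p. 1 - beta restart from psi."""
    n = len(psi)
    return [[beta * P0[i][j] + (1.0 - beta) * psi[j] for j in range(n)]
            for i in range(n)]

beta = 0.9                     # illustrative continuation probability
psi = [0.5, 0.3, 0.2, 0.0]     # illustrative regeneration distribution
P0 = [                         # illustrative no-regeneration kernel
    [0.6, 0.4, 0.0, 0.0],
    [0.3, 0.3, 0.4, 0.0],
    [0.0, 0.3, 0.3, 0.4],
    [0.0, 0.0, 0.3, 0.7],
]
pi = regeneration_series(psi, P0, beta)
P = full_chain(psi, P0, beta)
# Largest violation of the invariance equation pi P = pi.
residual = max(abs(a - b) for a, b in zip(mat_vec_left(pi, P), pi))
```

The term with index $k$ corresponds to a history in which the last regeneration happened $k$ steps ago, which is the conditioning argument the proof refers to.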

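We next prove the continuity of $\Pi_\rho$ in $\rho$.

Theorem 6. The mapping $\Pi^* : \mathcal{P} \to \Omega$ is continuous.

Proof. By the Portmanteau theorem (Billingsley 2009), we only need to show that for any sequence $\rho_n \to \rho$ in $w$-norm and any open set $B$, $\liminf_{n \to \infty} \Pi_{\rho_n}(B) \ge \Pi_\rho(B)$. By Fatou's lemma,

\[
\begin{aligned}
\liminf_{n \to \infty} \Pi_{\rho_n}(B)
&= \liminf_{n \to \infty} \sum_{k=0}^{\infty} (1 - \beta) \beta^k E_{\Psi_R}\big[\Upsilon^{(k)}_{\rho_n}(B \mid Q)\big] \\
&\ge \sum_{k=0}^{\infty} (1 - \beta) \beta^k E_{\Psi_R}\big[\liminf_{n \to \infty} \Upsilon^{(k)}_{\rho_n}(B \mid Q)\big], \qquad (96)
\end{aligned}
\]

where $Q \sim \Psi_R$. Let $\Upsilon^{(k)}_\rho = \Upsilon^{(k)}_{\rho,\hat\theta_\rho}$. We finally show that $\liminf_{n \to \infty} \Upsilon^{(k)}_{\rho_n}(B \mid q) \ge \Upsilon^{(k)}_\rho(B \mid q)$ for every $q \in \mathbb{R}^+$, and the proof follows. □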

Step 3: Continuity of the mapping $F$

Now, using the results from Step 1 and Step 2, we establish continuity of the mapping $F$. First, we show that $F(\rho) \in \mathcal{P}$.

Lemma 5. For any $\rho \in \mathcal{P}$, let $\gamma(w) = (F(\rho))(w) = \Pi_\rho(\hat\theta_\rho^{-1}([0, w]))$, $w \in \mathbb{R}^+$. Then, $\gamma \in \mathcal{P}$.

Proof. From the definition of $\Pi_\rho$, it is easy to see that $\gamma$ is a distribution function. Since $\hat\theta_\rho$ is a continuous and strictly increasing function as shown in Lemma 2, $\hat\theta_\rho^{-1}(\{w\})$ is either empty or a singleton. Then, from Lemma 3, we get that $\Pi_\rho(\hat\theta_\rho^{-1}(\{w\})) = 0$. Together, we get that $\gamma(w)$ has no jumps at any $w$ and hence it is continuous.

To complete the proof, we need to show that the expected bid under $\gamma(\cdot)$ is finite. In order to do this, we construct a new random process $\tilde Q_k$ that is identical to the original queue length dynamics $Q_k$, except that it never receives any service. We show that this process stochastically dominates the original, and use this property to bound the mean of the original process by a finite quantity independent of $\rho$. □

We now have the main theorem.

Theorem 7. The mapping $F : \mathcal{P} \to \mathcal{P}$ given by $(F(\rho))(w) = \Pi_\rho(\hat\theta_\rho^{-1}([0, w]))$ is continuous.

Proof. Let $\rho_n \to \rho$ in uniform norm. From previous steps, we have $\hat\theta_{\rho_n} \to \hat\theta_\rho$ in $w$-norm and $\Pi_{\rho_n} \Rightarrow \Pi_\rho$. Then, using Theorem 5.5 of Billingsley (2009), one can show that the push forwards also converge:

\[
\Pi_{\rho_n}\big(\hat\theta_{\rho_n}^{-1}(\cdot)\big) \Rightarrow \Pi_\rho\big(\hat\theta_\rho^{-1}(\cdot)\big).
\]

Then, $F(\rho_n)$ converges point-wise to $F(\rho)$ as it is continuous at every $w$, i.e., $(F(\rho_n))(w) \to (F(\rho))(w)$ for all $w \in \mathbb{R}^+$. Finally, it is easy to show that in the normed space $\mathcal{P}$, point-wise convergence implies convergence in uniform norm, which completes the proof. □
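To make the push forward in the definition of $F$ concrete, the following sketch computes an empirical version of it: given samples of the queue length and a bid function, it returns the distribution function of the resulting bids. The sample values and the bid function `theta` are hypothetical, chosen only so that the numbers are easy to follow.

```python
import bisect

def pushforward_cdf(queue_samples, bid_fn):
    """Empirical distribution function of bid_fn(Q) for Q drawn from the samples."""
    bids = sorted(bid_fn(q) for q in queue_samples)
    n = len(bids)

    def cdf(w):
        # Fraction of samples whose bid is at most w, i.e. the mass that the
        # queue-length distribution puts on the preimage of [0, w].
        return bisect.bisect_right(bids, w) / n

    return cdf

samples = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]  # hypothetical queue lengths
theta = lambda q: q * q   # hypothetical strictly increasing bid function
gamma = pushforward_cdf(samples, theta)
```

With these inputs, `gamma(1.0)` counts the three samples with queue length at most 1 out of eight, because the preimage of $[0, 1]$ under the assumed bid function is $[0, 1]$.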

Step 4: $F(\mathcal{P})$ is contained in a compact subset of $\mathcal{P}$

We show that the closure of the image of the mapping $F$, denoted by $\overline{F(\mathcal{P})}$, is compact in $\mathcal{P}$. As $\mathcal{P}$ is a normed space, sequential compactness of any subset of $\mathcal{P}$ implies that the subset is compact. Hence, we just need to show that $\overline{F(\mathcal{P})}$ is sequentially compact. Sequential compactness of a set $\overline{F(\mathcal{P})}$ means the following: if $\{\rho_n\} \subset \overline{F(\mathcal{P})}$ is a sequence, then there exists a subsequence $\{\rho_{n_j}\}$ and $\rho \in \overline{F(\mathcal{P})}$ such that $\rho_{n_j} \to \rho$. We use the Arzelà-Ascoli theorem and uniform tightness of the measures in $F(\mathcal{P})$ to show the sequential compactness. The version that we will use is stated below:

Theorem 8 (Arzelà-Ascoli Theorem). Let $X$ be a $\sigma$-compact metric space. Let $G$ be a family of continuous real valued functions on $X$. Then the following two statements are equivalent:

1. For every sequence $\{g_n\} \subset G$ there exists a subsequence $g_{n_j}$ which converges uniformly on every compact subset of $X$.

2. The family $G$ is equicontinuous on every compact subset of $X$, and for any $x \in X$, there is a constant $C_x$ such that $|g(x)| < C_x$ for all $g \in G$.

Suppose a family of functions $D \subset \mathcal{P}$ satisfies the equivalent conditions of the Arzelà-Ascoli theorem and in addition satisfies the uniform tightness property, i.e., $\forall \epsilon > 0$, there exists an $x_\epsilon$ such that for all $f \in D$, $1 \ge f(x_\epsilon) \ge 1 - \epsilon$. Then, for any sequence $\{\rho_n\} \subset D$, there exists a subsequence $\{\rho_{n_j}\}$ that converges uniformly on every compact set to a continuous increasing function $\rho$ on $\mathbb{R}^+$. As $D$ is uniformly tight, it can be shown that $\rho_{n_j}$ converges uniformly to $\rho$ and that $\rho \in \mathcal{P}$. Therefore, $D$ is sequentially compact in the topology of uniform norm.
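The step from convergence on compact sets to uniform convergence can be sketched as follows. This is a standard argument, written out here under the assumption that every function involved is nondecreasing and bounded above by 1, as distribution functions are.

```latex
% Fix \epsilon > 0 and let x_\epsilon be the tightness point, so that
%   1 - \epsilon \le \rho_{n_j}(x_\epsilon) \le 1   for all j.
% Letting j \to \infty (pointwise convergence at x_\epsilon) gives
%   1 - \epsilon \le \rho(x_\epsilon) \le 1.
% By monotonicity, \rho_{n_j}(x) and \rho(x) both lie in [1 - \epsilon, 1]
% for every x \ge x_\epsilon, hence
\sup_{x \ge 0} \bigl| \rho_{n_j}(x) - \rho(x) \bigr|
  \;\le\; \max\Bigl\{ \sup_{0 \le x \le x_\epsilon} \bigl| \rho_{n_j}(x) - \rho(x) \bigr|,\; \epsilon \Bigr\}.
% The first term tends to 0 because [0, x_\epsilon] is compact,
% and \epsilon > 0 was arbitrary.
```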

In the following, we show that $F(\mathcal{P})$ satisfies the uniform tightness property and condition 2 in the Arzelà-Ascoli theorem. First verifying the conditions of the Arzelà-Ascoli theorem, note that the functions in consideration are uniformly bounded by 1. To prove equicontinuity, consider a $\gamma = F(\rho)$ and let $x > y$. Then

\[
\gamma(x) - \gamma(y) = \Pi_\rho(\hat\theta_\rho(q) \le x) - \Pi_\rho(\hat\theta_\rho(q) \le y) = \Pi_\rho(y < \hat\theta_\rho(q) \le x). \tag{97}
\]

Lemma 6. For any interval $[a, b]$, $\Pi_\rho([a, b]) < c \cdot (b - a)$, for some large enough $c$.

Proof. The proof follows easily from our characterization of $\Pi_\rho$ in terms of $\Upsilon^{(k)}_\rho$. □

The above lemma and Eq. (97) imply that $\gamma(x) - \gamma(y) \le c\,(\hat\theta_\rho^{-1}(x) - \hat\theta_\rho^{-1}(y))$. To show equicontinuity, it is enough to show that $\limsup_{y \uparrow x} \frac{\gamma(x) - \gamma(y)}{x - y} \le K(x)$ for some $K$ independent of $\rho$. This property follows from our characterization of the optimal bid function.

Finally, we have the following lemma showing that F.P/ is uniformly tight.

Lemma 7. $F(\mathcal{P})$ is uniformly tight, i.e., for any $\epsilon > 0$ there exists an $x_\epsilon \in \mathbb{R}$ such that $1 - \epsilon \le f(x_\epsilon) \le 1$ for all $f \in F(\mathcal{P})$.

Proof. From Lemma 5, we have $F(\mathcal{P}) \subset \mathcal{P}$. Hence, the expectation of the bid distributions in $F(\mathcal{P})$ is bounded uniformly. An application of the Markov inequality will give uniform tightness. □

4.5 Properties of MFE

As we showed above, the bid function $\hat\theta_\rho(q)$ is monotone increasing in $q$ regardless of $\rho$. This property implies that the service regime corresponding to MFE is identical to the LQF policy. The result essentially says that there is no price of anarchy induced by the auction-based scheduling policy! In other words, the desirable properties of LQF are a natural result of auction-based scheduling.
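The equivalence with LQF (longest-queue-first) is easy to see in a small example: if every agent applies the same strictly increasing bid function to its own queue length, the highest bidder is always the agent with the longest queue. The queue lengths and the bid function below are hypothetical and serve only as an illustration.

```python
def auction_winner(queues, bid_fn):
    """Index of the highest bidder when each agent bids bid_fn(own queue length)."""
    bids = [bid_fn(q) for q in queues]
    return max(range(len(queues)), key=lambda i: bids[i])

def lqf_winner(queues):
    """Index of the agent served by longest-queue-first."""
    return max(range(len(queues)), key=lambda i: queues[i])

queues = [2.0, 7.5, 0.3, 4.1]              # hypothetical queue lengths
bid_fn = lambda q: 1.0 - 1.0 / (1.0 + q)   # hypothetical strictly increasing bid function
```

Any other strictly increasing choice of `bid_fn` selects the same agent, since a strictly increasing map preserves the ordering of the queue lengths.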

5 Notes

The VCG mechanism was developed by Clarke (1971) and Groves (1973) as a generalization of an auction called the second-price auction due to Vickrey (1961). The Kelly mechanism is due to Kelly (1997). The price of anarchy for strategic users using the Kelly mechanism was computed by Johari and Tsitsiklis (2004). The interest in the Kelly mechanism is due to the fact that it has a simple decentralized implementation when the users are price taking. If one is more generally interested in truth-revealing mechanisms using one-dimensional bids, then there is recent work on the design of such mechanisms: the interested reader is referred to the works of Maheswaran and Basar (2006), Yang and Hajek (2007) and Johari and Tsitsiklis (2005).

The general class of selfish routing with infinitesimal agents (called non-atomic) was first discussed by Pigou (1920). Braess' paradox was discovered by Braess (1968). The results on selfish routing discussed in this chapter are a simplified version of the work of Roughgarden and Tardos (2002). They also showed in the same work that the result can be generalized to networks with affine cost functions, with a price of anarchy of at most 4/3. Our development follows the presentation in Roughgarden (2016). More recent developments in this area and further generalizations can be found in Roughgarden (2015) and Roughgarden and Schoppmann (2015).

The topic of “mean field games” has been covered in general terms in this Handbook in a chapter by that name, where detailed information on historical developments can be found. More relevant to the topic of this chapter, the MFG approach has recently been used in several different problems on games with a large number of agents, each subset of which meets infrequently. Examples include Tembine et al. (2009), Borkar and Sundaresan (2012), Xu and Hajek (2012), Adlakha and Johari (2013), Iyer et al. (2014), Manjrekar et al. (2014), and Li et al. (2015, 2017). The framework lends itself readily to the modeling and analysis of many realistic systems. For example, Iyer et al. (2014) consider advertisers competing via a second-price auction for spots on a webpage. The bid must lie in a finite real interval, and the winner can place an ad on the webpage. In the space of queueing systems, Xu and Hajek (2012) consider the game of sampling a number of queues and joining one. The mean field results on scheduling presented in this chapter are based primarily on work by Manjrekar et al. (2014). The idea of infrequent interactions between subsets of players is exploited in a recent application of the mean field game framework for mechanism design to incentivize truth-telling about one's ability to help peer devices in a device-to-device (D2D) network setting by Li et al. (2017). Another application is on designing “nudge systems” for modifying societal behavior through providing incentives such as lottery tickets. An application of this kind in the context of electricity networks is studied by Li et al. (2015), where the objective is demand response, i.e., modifying one's usage pattern when demand is high; demand response management in power networks in the large population regime has been covered in another chapter in the Handbook. Here, the agents are electricity consumers who must trade off the cost of modifying their usage (say by resetting their air conditioner temperature) against the probability of a reward by obtaining many lottery tickets, under some belief about what the other consumers would do.

The asymptotic validity of the mean field assumption usually follows from a so-called chaos hypothesis, which essentially says that the correlation between the states of any finite subset of agents decays as the number of agents becomes large. Results of this nature are available in work by Graham and Méléard (1994) and can be used in the context of our scheduling game. There has been recent work studying the conditions required to ensure that the mean field model is indeed the limiting case of the finite system. Work by Benaïm and Le Boudec (2008) and Borkar and Sundaresan (2012) provides regularity conditions under which the passage to the mean field is valid.

Conclusion

In this chapter, we considered three applications of game theory to problems related to routing and resource allocation in communication networks. In doing so, we explored three game theoretic equilibrium concepts in different settings, namely, (i) Nash Equilibrium in the problem of resource allocation to a finite set of agents, (ii) Wardrop Equilibrium in the context of selfish routing by an infinite number of agents, and (iii) Mean Field Equilibrium in the setting of repeated resource allocation to an infinite number of agents. In each case, we presented a model that pertains to a particular layer of the networking stack, and attempted to characterize the effects of strategic decision making on system performance as relevant to that layer. We noted that strategic decision making by agents can degrade system performance, and showed tight bounds on the performance degradation in these cases.

References

Adlakha S, Johari R (2013) Mean field equilibrium in dynamic games with strategic complementarities. Oper Res 61(4):971–989
Benaïm M, Le Boudec J-Y (2008) A class of mean field interaction models for computer and communication systems. Perform Eval 65(11–12):823–838
Billingsley P (2009) Convergence of probability measures. Wiley-Interscience, Hoboken
Borkar V, Sundaresan R (2012) Asymptotics of the invariant measure in mean field models with jumps. Stoch Syst 2(2):322–380
Braess D (1968) Über ein paradoxon aus der verkehrsplanung. Unternehmensforschung 12(1):258–268
Clarke EH (1971) Multipart pricing of public goods. Public Choice 11(1):17–33
Graham C, Méléard S (1994) Chaos hypothesis for a system interacting through shared resources. Probab Theory Relat Fields 100(2):157–174
Groves T (1973) Incentives in teams. Econometrica J Econom Soc 41(4):617–631
Hernández-Lerma O, de Ozak MM (1992) Discrete-time Markov control processes with discounted unbounded costs: optimality criteria. Kybernetika 28(3):191–212
Iyer K, Johari R, Sundararajan M (2014) Mean field equilibria of dynamic auctions with learning. Manag Sci 60(12):2949–2970
Johari R, Tsitsiklis JN (2004) Efficiency loss in a network resource allocation game. Math Oper Res 29(3):407–435
Johari R, Tsitsiklis JN (2005) Communication requirements of VCG-like mechanisms in convex environments. In: Proceedings of the Allerton Conference on Control, Communications and Computing, pp 1191–1196
Kelly FP (1997) Charging and rate control for elastic traffic. Eur Trans Telecommun 8:33–37
Li J, Xia B, Geng X, Ming H, Shakkottai S, Subramanian V, Xie L (2015) Energy coupon: a mean field game perspective on demand response in smart grids. ACM SIGMETRICS Perform Eval Rev 43(1):455–456
Li J, Bhattacharyya R, Paul S, Shakkottai S, Subramanian V (2017) Incentivizing sharing in realtime D2D streaming networks: a mean field game perspective. IEEE/ACM Trans Netw 25(1):3–17
Maheswaran RT, Basar T (2006) Efficient signal proportional allocation (ESPA) mechanisms: decentralized social welfare maximization for divisible resources. IEEE J Sel Areas Commun 24(5):1000–1009
Manjrekar M, Ramaswamy V, Shakkottai S (2014) A mean field game approach to scheduling in cellular systems. In: Proceedings IEEE INFOCOM, 2014, Toronto, pp 1554–1562
Meyn SP, Tweedie RL, Glynn PW (2009) Markov chains and stochastic stability, vol 2. Cambridge University Press, Cambridge
Pigou AC (1920) The economics of welfare. Palgrave Macmillan, New York
Roughgarden T (2015) Intrinsic robustness of the price of anarchy. J ACM 62(5):32
Roughgarden T (2016) Twenty lectures on algorithmic game theory. Cambridge University Press, Cambridge
Roughgarden T, Schoppmann F (2015) Local smoothness and the price of anarchy in splittable congestion games. J Econ Theory 156:317–342
Roughgarden T, Tardos E (2002) How bad is selfish routing? J ACM 49(2):236–259
Shakkottai S, Srikant R (2007) Network optimization and control. Found Tr Netw 2(3):271–379
Srikant R (2004) The mathematics of internet congestion control. Birkhauser, New York
Strauch RE (1966) Negative dynamic programming. Ann Math Stat 37(4):871–890
Tembine H, Le Boudec JY, El-Azouzi R, Altman E (2009) Mean field asymptotics of Markov decision evolutionary games and teams. In: International Conference on Game Theory for Networks (GameNets), pp 140–150
Vickrey W (1961) Counterspeculation, auctions, and competitive sealed tenders. J Financ 16(1):8–37
Xu J, Hajek B (2012) The supermarket game. In: IEEE International Symposium on Information Theory (ISIT, 2012), Cambridge, pp 2511–2515
Yang S, Hajek B (2007) VCG-Kelly mechanisms for allocation of divisible goods: adapting VCG mechanisms to one-dimensional signals. IEEE J Sel Areas Commun 25(6):1237–1243
