the mathematical parallels between packet switching and ... · message = 0101 (b) transmission...

arX

iv:c

s/06

1005

0v2

[cs.

IT]

22 N

ov 2

006

1

The Mathematical Parallels Between PacketSwitching and Information Transmission

Tony T. Lee,Fellow, IEEE

Abstract— All communication networks comprise of transmis-sion systems and switching systems, even though they are usuallytreated as two separate issues. Communication channels aregenerally disturbed by noise from various sources. In circuitswitched networks, reliable communication requires the error-tolerant transmission of bits over noisy channels. In packetswitched networks, however, not only can bits be corruptedwith noise, but resources along connection paths are also subjectto contention. Thus, quality of service (QoS) is determinedbybuffer delays and packet losses. The theme of this paper is toshow that transmission noise and packet contention actually havesimilar characteristics and can be tamed by comparable means toachieve reliable communication. The following analogies betweenswitching and transmission are identified.

1. Buffering against contention is a process that is similartothe error correction of noise corrupted signals. A signal-to-noiseratio that represents the carried load of packet switches can bededuced from the Boltzmann model of packet distribution.

2. When deflection routing is applied to Clos networks theloss probability decreases exponentially, which is similar to theexponential behavior of the error probability of binary symmetricchannels with random channel coding. In information theory, thisresult is stated as the noisy channel coding theorem.

3. The similarity between Hall’s condition of bipartite match-ing and expander graph manifests the resemblance betweennonblocking route assignments and error-correcting codes. Anextension of the Sipser-Speilman decoding algorithm of expandercodes to route assignments of Benes networks is given to illustratetheir correspondence.

4. Scheduling in packet switching serves the same functionas noiseless channel coding in digital transmission. The smooth-ness of scheduling, like source coding, is bounded by entropyinequalities.

5. The sampling theorem of bandlimited signals providesthe cornerstone of digital communication and signal processing.Recently, the Birkhoff-von Neumann decomposition of trafficmatrices has been widely applied to packet switches. With respectto the complexity reduction of packet switching, we show that thedecomposition of a doubly stochastic traffic matrix plays a similarrole to that of the sampling theorem in digital transmission.

We conclude that packet switching systems are governed bymathematical laws that are similar to those of digital transmissionsystems as envisioned by Shannon in his seminal 1948 paper,AMathematical Theory of Communication.

Index Terms— Clos network, complete matching, channel cod-ing, source coding, sampling theorem, scheduling

I. I NTRODUCTION

A LL communication networks comprise of transmissionsystems and switching systems, even though they are

Manuscript received xxxx, xxxx. This work was supported by xxxx.The authors are with the Department of Information Engineering, the

Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China.

Time

Delay due to contention

Connection request ( )f i j=

(a) Packet switching with contention

0101

0111

0100

0001

1111

Message = 0101

(b) Transmission channel with noise

Fig. 1. The parallel characteristics between packet switching and digitaltransmission

usually treated as two separate issues. Communication chan-nels are generally disturbed by noise from various sources.In circuit switched networks, resources are dedicated to con-nections and reliable communication only requires the error-tolerant transmission of bits over noisy channels. In packetswitched networks, however, not only can bits be corruptedwith noise, but resources along connection paths are alsosubject to contention. Despite the great achievements of infor-mation theory in dealing with transmission noise [1]–[3], thereare still many networking problems in modern communicationsystems that information theory is unable to provide solutionsfor. A particular problem that has come to the fore is the delaythat is induced by contention in packet switched networks, thesolution for which lies in extending the theory to go beyondthe bit-rate of transmission [4].

The theme of this paper is to show that transmission noiseand packet contention actually have similar mathematicalcharacteristics and can be tamed by comparable means toachieve reliable communication. The source information ofthe transmission channel is a function of time, and errors arecorrected by coding scheme, which is an expansion of signalspace. For switching systems, source information is a spacefunctionf(i) = j, for i = 1, 2, ···, N form inputsVI to outputsVo, which represents a set of connection requests. Packets lostin contention are usually buffered or deflected to stretch outtime and defer their requests. As shown in Fig. 1, the twoprocesses are anti-symmetric with respect to time and spaceand they are both governed by the law of probability.

In transmission systems, the fundamental QoS parameteris bit-rate, or channel capacity, which can be pinpointedby signal-to-noise ratio. In packet switched networks, QoSparameters, buffer delay and packet loss, are all determined byloading. In order to provide a common ground for comparison,the carried load of a packet switch is converted into a pseudosignal-to-noise ratio (PSNR) induced by Boltzmann’s modelof packet distribution. Based on this correspondence betweencarried load and PSNR, we show that a packet switched Clos

http://arxiv.org/abs/cs/0610050v2

2

TransmitterChannel

Capacity CReveiver DestinationSource

Noise source

Temporal

information source:

function f(t) of time t

Message SignalReceived

signal

(a) Shannon’s general communication system

DestinationSource

Spatial information

source: function f(i)

of space i=0,1,..., N-1

0

k-1

0

m-1

0

k-1

Internal contention

0

n-1

0

n-1

N-n

N-1

N-n

N-1

Input module

n m×Central module

k k×Output module

m n×

Channel capacity = m

(b) A packet-switched Clos network C(m,n, k)

Fig. 2. Comparison of transmission systems and switching systems

network with random routing can be modelled as an abstractchannel with additive Gaussian noise.

A schematic diagram of transmission channel and a three-stage Clos network [5] are shown in Fig. 2. The inputmodules in the first stage and output modules in the thirdstage of the Clos network correspond to transmitters andreceivers, respectively, of the transmission system. The numberof central modules is the bandwidth of the Clos network.The disturbances due to internal contentions in the middlestage of a packet switched Clos network mimic the noise ofa channel. In this paper, we demonstrate that various routingschemes applied to a packet switched Clos network to copewith contentions are comparable to coding schemes of atransmission channel. The routing schemes of packet switchingand their counterparts in digital transmission are listed in Fig.3.Depending on the implementation of routing schemes, andsimilar to the separation of channel coding and source codingin information theory, the Clos network can either be modelledas a noisy channel with nonblocking routing, analogous tochannel coding, or a noiseless channel with route scheduling,analogous to source coding, as explained below.

The noisy channel model of the Clos network. We firstcompare the deflection routing of packet switching with therandom coding over noisy channel. Errors introduced by noisesin transmission are subdued by redundant bits, while packetcontentions can be relieved by redundant links in switching.When deflection routing is applied to Clos networks, the lossprobability decreases exponentially with respect to the networklength, which is similar to the exponential behavior of the errorprobability of binary symmetric channels with random channelcoding. In information theory, this result is stated as the noisychannel coding theorem [1], [2], [6], [7] .

In contrast to deflection routing, nonblocking routing inthe Clos network will avoid internal contentions completelyby conflict-free route assignments according to input requests[8]–[10]. The route assignment algorithms are based on thematching of bipartite graphs. In the error-correcting code, thelow density parity check (LDPC) code developed by Gallager

Packet Switched Clos Network

Random routing

Deflection routing

Route assignment

BvN decomposition

Path switching

Scheduling

Noisy channel capacity theorem

Noisy channel coding theorem

Error-correcting code

Sampling theorem

Noiseless channel

Noiseless coding theorem

Transmission Channel

Fig. 3. Analogies between packet switching and informationtransmission

can also be represented by a bipartite graph [11], [12], and asubclass called the expander code was constructed by Sipserand Speilman [13]. The condition on the expander graph isa generalization of Hall’s condition on complete matching,which suggests the resemblance between nonblocking routeassignments and error-correcting codes. An extension of theSipser and Speilman decoding algorithm of expander codesto route assignments of Benses networks is given to illustratetheir correspondence.

The noiseless channel model of the Clos network. Adrastically different approach, calledpath switching, to dealwith contention in a Clos network is proposed in [14]. A setof connection patterns of central modules is determined by thetraffic matrix decomposition for this purpose. Path switchingperiodically uses this finite set of predetermined connectionsin the central stage of the Clos network to avoid online com-putation of route assignments. Once the connection patterns ofcentral modules are fixed in each time slot, incoming packetscan be scheduled accordingly in the input buffer. Regardingthe predetermined connections as a code book, scheduling isaprocess similar to source coding in digital transmission [1], [2],[6], [7]. The smoothness of scheduling, like noiseless coding,is bounded by entropy inequalities.

The decomposition of traffic matrices, sometimes calledthe Birkhoff-von Neumann decomposition, has been widelyused in SS/TDMA satellite communications [15], [16]. Pathswitching adopted this scheduling scheme in packet switchingsystems to guarantee the capacity of virtual paths in Closnetworks. The same approach was applied to the crossbarswitch, also called the Birkhoff-von Neumann switch, in[17]. Mathematically, the series expansion and reconstructionof doubly stochastic capacity matrices are similar to theFourier series expansion and interpolation of sampling theoremof bandlimited signals [18]–[21]. They also serve the samefunction in terms of complexity reduction of communicationsystems. The capacity matrix decomposition employed in pathswitching will reduce the dimension of permutation space ofa Clos network fromN ! to O(N2) or even lower, whilethe sampling theorem in transmission reduces the infinitedimensional signal space of any durationT to a finite numberof samples.

The remainder of the paper is organized as follows. Insection II, we propose the definition of pseudo signal-to-noiseratio (PSNR) of packet switch, and prove that the trade-off

3

between the bandwidth and PSNR of a Clos network is thesame as that given by noisy channel capacity theorem intransmission. In section III, we show that the loss probabilityof the Clos network with deflection routing is similar tothe exponential error probability of binary symmetric chan-nels with random coding. In section IV, we demonstrate thecorrespondence between nonblocking route assignments anderror-correcting codes. In section V, we address the capacityallocation and capacity matrix decomposition issues related tothe path switching implemented on a Clos network. In sectionVI, the entropy inequalities of smoothness of schedulingare derived, and comparisons of scheduling algorithms arediscussed. Finally, the conclusion and future research aresummarized in section VII.

II. PARALLEL CHARACTERISTICS OFCONTENTION AND

NOISE

The apparent causes of contention and noise are quite dif-ferent, but they both limit the performance of communicationsystems. In transmission, the trade-off between bandwidthandsignal-to-noise ratio is given by Shannon-Hartley’s noisychan-nel capacity theorem [19]. In packet switching, the differencebetween offered load and carried load reflects the degree ofcontention. In order to provide a common ground to comparecontention and noise, a pseudo signal-to-noise ratio is definedto represent the carried load of a packet switch. We show thatthe packet-switched Clos network with random routing can bemathematically modelled as a noisy channel.

A. Pseudo Signal-to-Noise Ratio of Packet switch

Output port contention occurs in a crossbar switch whenseveral packets are destined for the same output in a timeslot. Packets lost in contention will be dropped as shown inFig. 4. Consider anN ×N crossbar switch, without any priorknowledge of input traffic. We assume that input loadingρis homogeneous and output address is uniformly distributed,then the carried loadρ′ is the probability that an output isbusy in any given time slot

ρ′ = 1− (1− ρ

N)N

N→+∞−→ 1− e−ρ (1)

The difference between offered load and carried load iscaused by packet dropping due to contention at the outputs.Benes considered the number of calls in progress as the energyof the connecting network in the thermodynamics theory oftraffic in a telephone system [22]. A similar traffic model wasalso explored in the broadband network [23]. If we regardpackets as energy quantum, this proposition can be adopted in

01

N-1

01

N-1

ρ ρ ′/ Nρ

/ Nρ

Fig. 4. The carried load of anN ×N crossbar switch

packet switching and stated as follows.

Proposition1: (Benes): Thesignal powerSp of anN ×Ncrossbar switch is the number of packets carried by the system,and thenoise powerequals toNp = N − Sp. Furthermore,both the signal power and noise power are Gaussian randomvariables whenN is large.

This proposition is verified by the Boltzmann model ofpacket distribution given in Appendix I [24]. It follows thatthe carried load of a crossbar switch can be converted topseudo signal-to-noise ratio, a concept that is comparable tothe signal-to-noise ratio (SNR) in transmission.

Definition 1: The pseudo signal-to-noise ratio (PSNR) ofa crossbar switch is the ratio of mean signal power to meannoise power

PSNR=E[Sp]

E[Np]=

Nρ′

N −Nρ′=

ρ′

1− ρ′(2)

It should be noted that the ratio of signal power to noisepower SNR in transmission is defined by the ratio of thesecond moment of signal to that of noise, but PSNR is theratio of their mean power here. Next, we will investigate thetrade-off between bandwidth and PSNR of the Clos network.

B. Clos Network with Random Routing as A Noisy Channel

In a three-stage Clos network, each pair of adjacent switchmodules are interconnected by a unique link, and each moduleis a crossbar switch. As shown in Fig. 5, theN × N Closnetwork withk input/output modules andm central modulesis denoted byC(m,n, k) [5], [8], wheren is the number ofinput ports or output ports of each input module or outputmodule, whereN = kn. The dimensions of the modules inthe input, middle and output stages aren×m, k×k andm×n,respectively. The Clos networkC(m,n, k) has the followingcharacteristics:

1) Any central module can only assign tooneinput of eachinput module, andoneoutput of each output module.

2) Source addressS and destination addressD can beconnected through any central module.

3) The number of alternative paths between inputS andoutputD is equal to the number of central modulesm.

0 00

1k − 1m− 1k −

1m−

1m−

OutputInput0

1n−

0

1n−

n m×0

0

k k× m n×

Input stage Middle stage Output stage

SQDQ

0

1n−

SnQ

1SnQ n+ −S SnQ R+

GG

1k−

1k −

0

1k −

0

1k −

0

0

0

1n−

( 1)n k−

1nk−

( 1)n k−

1nk−

RS

QS QD

G RD

D DnQ R+DnQ

1DnQ n+ −

1m−

1m−

0

0

0

1n−

0

1n−

Fig. 5. The Clos network C(m,n, k)

4

In circuit switching theory, it is known that the Clos networkC(m,n, k) is rearrangeably nonblocking ifm ≥ n and strictlynonblocking if m ≥ 2n − 1 [8], [25]. Consider the numberof cental modulesm as the bandwidth, the trade-off betweenbandwidth m and PSNR ofC(m,n, k) with nonblockingrouting is given by

E[Sp]

E[Np]=

n

m− n(3)

In a packet switched Clos network, the maximal data rate ofeach input module isn packets per time slot. Thus we define:

Definition 2: The ratioσ = n/m is themaximal utilizationof a Clos networkC(m,n, k).

The route of each packet can be expressed by the numberingscheme of the Clos network [26]. The switch modules in eachstage of the network, as well as the links associated witheach module, are independently labelled from top to bottom.According to this numbering scheme, the source addressS isrepresented by the 2-tupleS(QS , RS), indicating the sourceSis the linkRS of the input moduleQS , whereRS = [S]n andQS = ⌊S/n⌋ are the remainder and quotient ofS divided byn.Similarly, the destination addressD can be represented by the2-tupleD(QD, RD). The path of an input packet from sourceS to destinationD is determined by the choice of centralmodule. Suppose the path goes through the central moduleG,then the routing tag is(G,QD, RD) and the path is expressedby:

S(QS, RS)→ QSG→ G

QD→ QDRD→ D(QD, RD)

In a packet switched Clos networkC(m,n, k), contentionsmay occur in the middle stage even if destination addresses ofinput packets are all different. With random routing scheme,the trade-off between bandwidthm and PSNR is given asfollows:

Theorem1: The maximal data rate of each input moduleof the packet switched Clos networkC(m,n, k) with randomrouting is given by

n = m ln(1 +E[Sp]

E[Np]) (4)

Proof:The maximal data rate is achieved when the input loading

ρ = 1 and destination addresses of input packets are alldifferent in every time slot, in which case input modules andoutput modules are contention-free. If the central module isselected randomly for each input packet, then the loading oneach input link of a central module is given by

σ = n/m

Since each switch module of the Clos network is a crossbarswitch, the carried load on each output link of a central moduleis

σ′ = 1− (1 − n/m

k)k

k→+∞−→ = 1− e−n/m (5)

n 2n-1 m

[ ]

[ ]p

p

E S

E N

1.11

0.69

Rearrangeablly nonblocking Strictly nonblocking

Circuit switchingwith nonblocking routing

Packet switchingwith random routing

Fig. 6. Trade-off between bandwidthm and PSNR of C(m, n, k)

Substituting the following PSNR for the carried loadσ′ in (5)

E[Sp]

E[Np]=

kmσ′

km− kmσ′ =σ′

1− σ′

and taking logarithm, we obtain (4).

The trade-offs given in (3) and (4) for constant data ratenpackets per time slot of each input/output module are plottedin Fig. 6, which shows the improvement of PSNR that can beachieved by contention-free routing. The same kind of trade-off between bandwidth and SNR in the transmission channelis stated in the Shannon-Hartley theorem [19] on noisy cannelcapacity as follows:

Noisy Channel Capacity TheoremThe channel capacity ofa bandlimited Gaussian channel in the presence of additiveGaussian noise is given by

C = W log(1 +S

N) (6)

whereC is the capacity in bits per second,B is the bandwidthof the channel in Hertz, andSN is the signal-to-noise ratio.

With the understanding that the contention is mathemati-cally analogous to noise, the rest of paper seeks to investi-gate the connections between the various routing schemes ofthe Clos network and the corresponding coding schemes ofthe transmission channel. We first demonstrate the similaritybetween deflection routing and random coding in the nextsection.

III. C LOS NETWORK WITH DEFLECTION ROUTING

Packet contention is inevitable in packet switching. One wayof solving this problem without having to buffer the losingpacket is to use deflection routing. In contrast to ”store andfor-ward” routing where the network allows buffering, deflectionrouting, also known as hot potato routing, is a routing schemewithout buffering and can only be implemented for packet-switched networks [27]. In case the individual communicationlinks cannot support more than one packet at a time, excessivepackets will be transferred to other available links. Deflectionrouting actually utilizes idle or redundant links of the networkas temporary storages. Redundancy is built into the switch

5

design so that deflected packets can use extra stages to correctthe deviated routes.

A. Cascaded Clos Network

A cascaded Clos network is constructed by a sequenceof alternateC(n, n, k) and C(k, k, n), as illustrated in Fig.7. Each output link of a switch module is connected to amodule in the next stage and an output concentrator (notshown in Fig. 7). A packet will be sent to the concentrator if itsdestination address matches the output numbering, otherwise itwill continue with the remaining journey. The loss probabilitycan be made arbitrarily small by providing a large enoughnumber of stages.

In the cascaded Clos network, the destination addressDof a packet can either be expressed byD = nQ1 + R1 inthe C(n, n, k) network, orD = kQ2 + R2 in the C(k, k, n)network, as shown in the example given in Fig. 7, and thereforethe routing tag in the packet header comprises a 4-tuple(Q1, R1, Q2, R2). A packet can reach its destinationD inany two consecutively successful steps, either(Q1, R1) or(Q2, R2). The Markov chain shown in Fig. 8 describes thecomplete journey of a packet, wherepi and qi = 1 − pi arerespective probabilities of success and deflection inC(n, n, k)andC(k, k, n).

B. Analysis of Deflection Clos Network

For the sake of simplicity, we assume that the cascaded Closnetwork is formed byC(n, n, n) network uniformly. Since wewill consider the worst case deflection probability when bothnandk are sufficiently large, the assumption thatn = k does notlose the generality of analysis, and the packet loss probabilitywill be a conservative estimate. Under this presumption, asimplified Markov chain is depicted in Fig. 9.

The probability of successp in an n× n switch module isa function of input loadingρ. Under the homogeneous traffic

012

345

01

23

45

1

1

2

012

012

012

012

01

01

01

1

1

2 2

1

Deflection Output

1

C(3,3,2)C(2,2,3)

Destination Q1 R1

0 0 0

1 0 1

2 0 2

3 1 0

4 1 1

5 1 2

Routing coding in C(3,3,2)

Destination Q2 R2

0 0 0

1 0 1

2 1 0

3 1 1

4 2 0

5 2 1

Routing coding in C(2,2,3)

D = 3Q1+R1 D = 2Q2+R2

Output

Fig. 7. Cascaded Clos network with deflection routing

2q1Q

2Q

O

1R

2R

1p

2p

1p1q

2p

2q 1q Output

Input

Fig. 8. The Markov chain of deflection routing

Input

Q R O

q

q

p p

Output

Fig. 9. The simplified Markov chain of deflection routing

assumption, we have

1− pρ = (1 − ρ

n)n

n→∞−−−−→ e−ρ.

Thus, the worst probability of success is given by

p =1− e−ρ

ρ

The probabilities of successp and deflectionq = 1−p versusthe loadingρ are plotted in Fig. 10(a). LetGi(k) be theprobability that the packet in statei will reach the output stateO in exactlyk steps. The following equations can be derivedfrom the Markov chain shown in Fig.9.

GO(k) =

{1 if k = 00 otherwise.

GR(k) = pG0(k − 1) + qGQ(k − 1), k = 1, 2, . . .

GQ(k) = pGR(k − 1) + qGQ(k − 1), k = 1, 2, . . . (7)

The generating functionGQ(k) is given by

GQ(z) =

∞∑

k=0

GQ(k)zk =

p2z2

1− qz − pqz2, (8)

from which we obtain the following steady state probabilities

GQ(k) =

0 k = 0, 1pvq (pq)

(k+1)/2 cosh(k − 1)θ k = 2, 4, 6 · · · ·pvq (pq)

(k+1)/2 sinh(k − 1)θ k = 3, 5, 7 · · · ·(9)

where v =

√q2+4pq

2 and θ = lnq+√

q2+4qp

2√pq . For given

network lengthL, a conservative estimate of packet lossprobability is given as follows.

Ploss ≤∞∑

k=L+1

GQ(k), (10)

which can be explicitly expressed

Ploss ≤{

1vq (pq)

(L+2)/2 sinh(L+ 2)θ for even lengthL ;1vq (pq)

(L+2)/2 cosh(L+ 2)θ for odd lengthL.

6

0 10 20 30 40 50 60

-8

-6

-4

-2

0

L

10 losslog P

0.1ρ = 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

ρ

ac

(b) Parameters in equation (7)

(c) Loss probabilities versus network length

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

11

0.8

0.6

0.4

0.2

0.0

0 0.2 0.4 0.6 0.8 1

p

q

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

11

0.8

0.6

0.4

0.2

0.00 0.2 0.4 0.6 0.8 1

ρ

(a) Deflection probability

Fig. 10. Loss probabilities of deflection routing

WhenL is large, the logarithm ofPloss is linear inL as shownby the curves in Fig.10(c)

lnPloss ≤ m(L + 2) + b, (11)

wherem = ln(q+√

q2+4pq

2 ) and b = ln( 1

q√

q2+4pq). In the

worst case whenρ = 1, p = 0.6321, q = 0.3679, m =−0.3566, andb = 0.9683.

Theorem2: If the offered loadρ ≤ 1 on each input of theClos network with deflection routing, then the loss probabilitycan be arbitrarily small and the carried loadρ′ on each outputcan be arbitrarily close to the carried loadρ.

Proof: It follows from (11) that the loss probability interms of offered loadρ and carried loadρ′ is bounded by

Ploss =ρ− ρ′

ρ≤ ca−L. (12)

The two parametersa andc are given by following functionsof deflection probabilityq

a =2

q +√

q2 + 4pq> 1,

and

c =(q +

√

q2 + 4pq)2

4q√

q2 + 4pq> 1.

The parametersa andc versus the offered loadρ are plotted inFig. 10(b), in whicha = 1.4285 andc = 1.2906 whenρ = 1.

Conversely, if the offered loadρ > 1, then the lossprobability is unbounded, because the number of packets inputto a switch module could be more than the total number ofoutputs and there is not enough space to deflect packets, whichcan be lost halfway through.

Compare the deflection Clos network with the binary sym-metric channel (BSC) with random coding, as illustrated inFig. 11. Table I provides the parallels between the abovetheorem and Shannon’s main theorem [1], [2], [6], [7] statedas follows.

Noisy Channel Coding TheoremGiven a noisy channelwith capacityC and information transmitted at a rateR,then if R ≤ C, there exists a coding technique which allowsthe probability of error at the receiver to be made arbitrarilysmall. This means it is possible to transmit information withouterror up to a limit C. However, ifR > C, the probability oferror at the receiver increases without bound.

The capacity of BSC with cross probabilityq = 1 − p isgiven by

C = 1+ p log p+ (1 − p) log(1− p) (13)

p 0

1

0

1 p

q

q

p 0

1

0

1

0

1

0

1 p

q

q

Fig. 11. Binary Symmetric Channel (BSC)

The noisy channel coding theorem states that there existsan encoding functionE : {0, 1}k → {0, 1}n and a decodingfunctionD : {0, 1}n → {0, 1}k, such that the error probabilitythat the receiver gets a wrong message is bounded by

Pe ≤ a−n + c−n a > 1, c > 1 (14)

if the code rateR = kn = C−δ for someδ > 0 [28]. Thus, the

error probabilityPe can be arbitrarily small for a sufficientlylarge n. Conversely, if the rateR = k

n = C + δ for someδ > 0, then the error probability is unbounded.

Deflection Clos Network Binary Symmetric ChannelDeflection Probabilityq < 1

2Cross Probabilityq < 1

2

Deflection Routing Random Codingρ ≤ 1 R ≤ C

Exponential Loss Probability Exponential Error ProbabilityPloss ≤ ca−L Pe ≤ a−n + c−n

Complexity increases Complexity increaseswith network lengthL with code lengthn

Equivalent set of outputs Typical set decoding

TABLE I

COMPARISON OF DEFLECTIONCLOS NETWORK AND BINARY SYMMETRIC

CHANNEL

The similar behavior of the deflection routing of the Closnetwork and the random coding of noisy channel suggests the

7

connection between nonblocking route assignments and error-correcting codes. This analogy can be extended to multistageinterconnection networks as well. The deflection routing ofthe tandem banyan network proposed in [29] is equivalent tothe concept of error-detection and retransmission, while thedeflection routing of the dual-shuffle exchange network is adistributed error-correction algorithm [30]. These points willbe further elaborated on in the next section.

IV. ROUTE ASSIGNMENTS ANDERROR-CORRECTING

CODES

The Clos networkC(m,n, k) is rearrangeable ifm ≥ n,meaning that it can realize connections of any permutationsbetween inputs and outputs with the possibility of rearrangingthe calls in progress. Both Slepian-Duguid’s nonblockingtheorem [8], [25] and M. C. Paull’s rearrangeable theorem[31] on route assignments of the Clos network are rooted inmatching theory, which starts with Hall’s marriage theoremthat gives the existence condition of a complete matching ina given bipartite graph [32]. In the error-correcting code,thelow density parity check (LDPC) code can also be representedby a bipartite graph [11], [12]. The connection between routeassignment and graphic code in the context of bipartite graphis the main point addressed in this section.

A. Complete Matching in Bipartite Graphs

A bipartite graphG = (VL, VR, E) consists of two finitesetsVL, VR of vertices, and a collection of edgesE connectingvertices inVL to vertices inVR.

Definition 3: A complete matchingin a bipartite graphGis an injective functionf : VL → VR so that for everyx ∈ VL,there is an edge inE whose endpoints arex andf(x).

Definition 4: The neighborhoodof any subsetA ⊂ VL isdefined by

NA = {b|(a, b) ∈ E, a ∈ A} ⊆ VR

which are endpoints of an edge inE whose other endpointslie in A.

The necessary and sufficient condition for a bipartite graphto have a complete matching is given by Hall’s marriagetheorem [32], [33]. Because it is the most fundamental theoremin switching theory, we state the theorem and give R. Rado’selegant proof [34] below.

Theorem3: (Hall) Let G = (VL, VR, E) be a bipartitegraph, there exists a complete matching forG if and onlyif

|NA| ≥ |A| (15)

for any subsetA ⊆ VL.

Proof: It is clear that the condition is essential for theexistence of a complete matching, it remains to show that it issufficient. LetVL = {1, 2, ..., n}, andF = {N1, N2, ..., Nn},by way of contradiction, supposeN1 containsx, y, the removalof eitherx or y violates Hall’s condition. There existsA,B ⊆{2, ..., n} with the property

RA = NA ∪ (N1 − {x}), |RA| ≤ |A|

0

1

2

3

0

1

2

3

1 1 0 1

1 1 1 0

1 1 0 1

0 0 2 1

1 0 0 0 0 0 0 1 0 1 0 0

0 1 0 0 1 0 0 0 0 0 1 0

0 0 0 1 0 1 0 0 1 0 0 0

0 0 1 0 0 0 1 0 0 0 0 1

= + +

Fig. 12. An edge colored regular bipartite graph and decomposition of integerdoubly stochastic matrix

RB = NB ∪ (N1 − {y}), |RB| ≤ |B|

Then |RA ∪RB | = |NA∪B ∪N1|, and|RA ∩RB| ≥ |NA∩B|.It follows that

|A|+ |B| ≥ |RA|+ |RB| = |RA ∪RB|+ |RA ∩RB|≥ |NA∪B ∪N1|+ |NA∩B|. (16)

On the other hand, from Hall’s condition, we have

|NA∪B∪N1|+|NA∩B| ≥ |A∪B|+1+|A∩B| = |A|+|B|+1.(17)

Combining (16) and (17) together will lead to

|A|+ |B| ≥ |A|+ |B|+ 1,

a contradiction. Hence, the removal of eitherx or y doesnot violate Hall’s condition, and a complete matching can beobtained by repeating this procedure until eachNi containsonly one element.

The nonblocking route assignment of the Clos network andthe edge coloring of the bipartite graph stated in the follow-ing theorem are equivalent consequences of Hall’s marriagetheorem.

Theorem4: The following statements are equivalent:1. A regular bipartite graphG with degreen can be edge

colored bym colors if m ≥ n.2. (Slepian-Duguid)The Clos networkC(m,n, k) is rear-

rangeably nonblocking ifm ≥ n.

Proof: 1. For any subsetA ⊆ VL, the edges terminate onvertices in A must be terminated on vertices in its neighbor-hood setNA at the other end, Hall’s condition holds because

n|NA| ≥ n|A| ⇒ |NA| ≥ |A|.

The bipartite graphG can be reduced ton completematching by repeatedly applying Hall’s theorem. It is clearthat edge coloring can be satisfied ifm ≥ n. The edgecolored bipartite graph is equivalent to the decompositionofan integer doubly stochastic matrix into permutation matrices.An example to illustrate this point is shown in Fig.12.

2. Consider a set of call requests{(S0, D0), · · · , (SN−1, DN−1)}, in which the destinationaddresses are all different, and suppose the central moduleGi is assigned to the request(Si, Di) for i = 0, · · · , N − 1.

8

0

1

2

3

0

1

2

3

0

1

2

0

1

23

4

5

6

7

0

1

23

4

5

6

7

Inputmodules

Outputmodules

Centralmodules

Inputmodules

Outputmodules

0

1

2

3

0

1

2

3

(a) Three-stage Clos network (b) Equivalent regular bipartite graph

Fig. 13. Correspondence between Clos network and bipartitegraph

S 0 1 2 3 4 5 6 7D 1 3 2 0 6 4 7 5G 0 2 0 2 2 1 0 2Q 0 1 1 0 3 2 3 2R 1 1 0 0 0 0 1 1

TABLE II

THE SET OF ROUTING TAGS(G,Q,R)

This set of assignments is nonblocking, or contention-free, ifand only if

⌊Si/n⌋ = ⌊Sj/n⌋ ⇒ Gi 6= Gj

and⌊Di/n⌋ = ⌊Dj/n⌋ ⇒ Gi 6= Gj

for all i 6= j. It simply means that if either of the twosourcesSi, Sj are on the same input modules, or the twodestinationsDi, Dj are on the same output module, thenthe central modules assigned to(Si, Di), (Sj , Dj) must bedifferent.

The route assignments can be formulated as the edge color-ing of a bipartite graphG(VL, VR, E), in which vertices ofVL

represent input modules and vertices ofVR represent outputmodules, and each connection request(Si, Di) is representedby an edge(⌊Si/n⌋, ⌊Di/n⌋) in E. This bipartite graph isregular with degreen, and can be colored bym colors ifm ≥ n, such that each color represents a central module andthe edge coloring is the set of route assignments.

An example to show the equivalence between route assign-ment of the Clos network and edge coloring of the bipartitegraph is given in Fig.13. The resulting route assignments ofthe call request are listed in Table II. The connection betweenroute assignment and error-correcting code will be addressedin the sequel.

B. Graphical Codes

The low-density parity-check (LDPC) code is a class oflinear block code that can be represented by bipartite graphsG(VL, VR, E), called Tanner graph [12], in whichVL is theset of variable-vertices andVR the set of constraint-vertices.The parity matrix specifies the edge setE. Each variable canassume the value of0 or 1, and a constraint is satisfied if thesum of all the variables adjacent to it is zero mod 2. Fig. 14

0x

1x

2x

3x

+

+

0

0

0

1

: variablesLV n : constraintsRV m

1 3 4 7 1

Unsatisfied

x x x x+ + + =

0 1 2 5 0

Satisfied

x x x x+ + + =

2Closed Under (+)

4x

5x

6x

7x

+

+

1

0

1

1

2 5 6 7 0

Satisfied

x x x x+ + + =

0 3 4 6 1

Unsatisfied

x x x x+ + + =

Fig. 14. Low density parity check code

illustrates an example of such a bipartite graph that representsthe following matrix

0 1 0 1 1 0 0 11 1 1 0 0 1 0 00 0 1 0 0 1 1 11 0 0 1 1 0 1 0

(18)

A vectorx ∈ {0, 1}n is a code word if and only ifx satisfiesall constraints. The set of all code words is closed with respectto mod 2 sum, and forms a linear subspace of{0, 1}n.

A subclass of graphical codes based on expander graphs wasconstructed by Sipser and Speilman [13]. The expander codehas the following linear time sequential decoding algorithm.

Sipser-Speilman decoding algorithm:If there is a variable vertexv such that most of its neighboringconstraints are unsatisfied, flip the value ofv. Repeat.

The decoding algorithm of expander codes guarantees thatthe number of unsatisfied vertices will be monotonicallydecreasing until all constraints are satisfied. The main resultis stated in the following theorem.

Theorem5: (Sipser-Speilman)Let G(VL, VR, E) be thebipartite graph of size|VL| = n, |VR| = m that is k−regularon the left. Assume that for any subsetA ⊆ VL of size|A| ≤ αn, |NA| > 3k

4 |A|, then the decoding algorithm willcorrect up toαn

2 errors.

The similarity between Hall’s condition (15) on completematching and the condition on expander graphs in the abovetheorem is quite obvious. The resemblance between thesetwo structure conditions on bipartite graphs suggests the con-nection between error-correcting code and route assignmentof Clos network. The Sipser-Speilman decoding algorithmis extended to the nonblocking route assignment of Benesnetwork to illustrate their correspondence.

C. Route Assignments of Bense Network

A Bense network is a multi-stage connecting network thatcan be recursively constructed from Clos networkC(2, 2, k)

9

0

1

01

23

45

67

01

23

5

67

4

Bipartite graph of call requests

0 1 2 3 4 5 6 7

1 6 0 5 7 2 4 3

� ��

0

1

2

3

0

1

2

3

+

0 1

2 3

4 5

6 7

1

1

1

1

x x

x x

x x

x x

+ = ��+ =

+ = �� + = �

+

+

+

+

+

+

+

Input ModuleConstraints

0 2

5 7

3 6

1 4

1

1

1

1

x x

x x

x x

x x

+ = ��+ = ��

+ = ��+ = �

Output ModuleConstraints

0 unsatisfied Module 2 num =

1 satisfied

��

Not closed under +

0x

2x

3x

4x

5x

6x

7x

1x

Fig. 15. Nonblocking constraints of Benes network

[8], [25]. A set of call requests is a mapping from inputs tooutputs. An example is given in (19), which displays a set ofconnection requests in an8×8 Benes network. The upper rowof the matrix is the ordered inputs from 0 to 7, and the lowerrow consists of the respective outputs.

π =

(0 1 2 3 4 5 6 71 6 0 5 7 2 4 3

)

(19)

The two central modules inC(2, 2, N2 ), as shown in Fig.18(a), are labelled respectively by 0 and 1. Letxi be thebinary variable associated with the request(i, π(i)), for i =0, · · · , N − 1. The value ofxi ∈ {0, 1} defines the followingroute assignment

xi =

{

0 if module 0 assigned to(i, π(i))

1 otherwise

(20)

Since the same central module cannot be assigned to twoinputs or two outputs on the same module, the necessary andsufficient condition on nonblocking route assignment can besimply formulated as a set of linear constraints

xi + xj = 1 (21)

for all i 6= j such that either two inputsi and j are on thesame input module, or two outputsπ(i)) andπ(i)) are on thesame output module.

As shown in Fig.15, letVL = {x0, · · · , xN−1} be theset of variables andVR be the set of constraints (21), abipartite graphG(VL, VR, E) similar to that of LDPC can beconstructed to solve for the values ofx′

is, which determinethe nonblocking routes of the set of call requests inπ. Itshould be noted that the set of solutions to (21) is not closed

0x

1x

2x

3x

+

+

+

+

4x

5x

6x

7x+

+

+

+

0

1

0

1

0

1

0

1

0 1 1x x+ =

2 3 1x x+ =

4 5 1x x+ =

6 7 1x x+ =

0 2 0x x+ =

5 7 0x x+ =

3 6 1x x+ =

1 4 1x x+ =

Input moduleconstraints

Output moduleconstraintsVariables

Fig. 16. The flip algorithm for nonblocking route assignments

01

23

45

67

01

23

5

67

4

0 1 2 3 4 5 6 7Call requests

1 6 0 5 7 2 4 3

� ��

Assignment = 0 1 1 0 0 1 1 0

0

1

2

3

0

1

2

3

Fig. 17. The connection requests and and route assignments

with respect to mod 2 sum. In fact, two solutions in eachconnected component of the bipartite graphG(VL, VR, E) arecomplementary to each other, and the total number of solutionsof (21) should be2g, whereg is the number of connectedcomponents ofG. The following route assignment algorithmis a modification of Sipser-Speilman decoding algorithm.

The flip algorithm for nonblocking route assignments:

Step 1 Initially assignx0 = 0, x1 = 1, x2 = 0, x3 = 1, · · ·to satisfy all input module constraints.

Step 2 In each cycle of the bipartite graph, unsatisfiedvertices divide the cycle into line segments. Labelthemα andβ alternately and flip the values of allvariables located inα segments.

The initial values of all variables can be arbitrarily assigned.Compared with the Sipser-Speilman decoding algorithm, theinitial assignment can be considered as the received signal,and errors are systematically removed by the flip algorithm tosatisfy all constraints.

Theorem6: All constraints are satisfied when the flip al-gorithm terminates.

Proof: The proof of this theorem is illustrated by theexample displayed in Fig. 16. First, we want to show thatevery cycle has an even number of unsatisfied vertices. It isclear that the total number of constraints is even because thereis an equal number of input modules and output modules ineach cycle. Assume by way of contradiction that the numberof unsatisfied vertices is odd, then the number of satisfiedvertices must also be odd. Letu0 and u1 be the number ofunsatisfied vertices whose neighboring variables both equal0 and 1, respectively, and lets be the number of satisfiedvertices. Then, the number of variables equal to0 and 1,

10

0

1

2

3

4

5

6

7

0

1

2

3

4

5

6

7

Fig. 18. The complete route assignments of Benes network

respectively, should beu0+s2 andu1+

s2 , which are impossible

if s is odd.Next, we want to show that all constraints are satisfied

when the flip algorithm terminates as shown in Fig.16. Foran arbitrary constraint vertexv ∈ VR, let xi, xj ∈ VL bethe two neighboring variables ofv. If v is unsatisfied, the twovariablesxi andxj should have the same value but are locatedin segments bearing different labels. Only one located in theαsegment will flip the value, and the vertexv should be satisfiedwhen the algorithm terminates.

On the other hand, if the constraintv is initially satisfied,then xi and xj flip simultaneously if they are both in anαsegment, or else they keep the same value if they are locatedin a β segment. Hence, the constraintv remains satisfied ineither case when the algorithm terminates.

The route assignments of the set of call requests (19)resulting from the flip algorithm are displayed in Fig.17, andthe complete assignments shown in Fig. 18 can be determinediteratively. If both N and n are of the power of 2, theset of equations (21) can also be solved by the parallelalgorithm proposed in [35] with time complexity on the orderof O(log2(N)).

V. CLOS NETWORK AS NOISELESSCHANNEL-PATH

SWITCHING

Path switching proposed in [14] is a compromise of staticrouting and dynamic routing schemes for a Clos network. A setof connection patterns of the central modules is predeterminedaccording to the traffic between pairs of input and outputmodules. This set of connection patterns is repeatedly usedina cyclic manner such that online path hunting can be avoidedand bandwidth requirements can be satisfied in the long run.

The scheduling of path switching is based on the followingrelationships between the edge colored bipartite graph andtheconnection pattern in the middle stage of the Clos network asdepicted in Fig. 13 earlier:

1) The number of edgeseij represents the number ofpackets that can be sent from input moduleIi to outputmoduleOj , the virtual pathVij , in one time slot,

2) Each color of the bipartite graph corresponds to acentral module of the Clos network.

This correspondence suggests that the scheduling of switchac-cording to predetermined connection patterns if traffic matrixis expressed as weighted sum of a finite number of permutationmatrices as shown in Fig. 19.

Times slot 0

Times slot 1

1G

2G

2 1 0

(1) 0 2 1

1 0 2

ije

=

0

1

2

0

1

2

0

1

2

0

1

2

1 0 2

(2) 1 2 0

1 1 1

ije

=

Virtual path

1 2G G∪

0

1

2

0

1

2

(1) (2)

2

1.5 0.5 1

0.5 2 0.5

1 0.5 1.5

ij ij

ij

e ec

+ =

=

1.5

0.51

0.52

0.5

10.5

1.5

Fig. 19. Capacity of virtual path as average number of edges

0

k-1

0

m-1

0

k-1

Predetermined

connection pattern

in every time slot

0

n-1

0

n-1

N-n

N-1

N-n

N-1

Input module

(input queued

switch)

n m× k k× m n×

Central module

(nonblocking

switch)

Output module

(output queued

switch)

Output bufferInput buffer

Buffer

and

scheduler

Input

module i

Buffer

and

scheduler

Output

module jVirtual Path

Destination Sourceij

λ

Scheduling to combat

channel noise

Buffering to combat

source noise

Fig. 20. Noiseless virtual path of Clos network

For any given traffic matrixT = [λij ], whereλij is thenumber of packets per time slot from input moduleIi to outputmoduleOj , such that

∑

i λij < n ≤ m, and∑

j λij < n ≤ m,if there exists a finite numberF of regular bipartite graphssuch that the capacitycij of the virtual pathVij betweenIiandOj satisfies

cij =

∑F−1t=0 eij(t)

F> λij , (22)

whereeij(t) is the number of edges from nodei to nodej inthe tth bipartite graph. The bandwidth requirementT = [λij ]can be satisfied if the system periodically provides connectionsaccording to the edge coloring of theseF bipartite graphs.Adopting convention in the TDMA system, each cycle is calleda frameand the periodF frame size. The routing informationcan be stored in the local memory of each input module toavoid the slot-by-slot computation of route assignments [36],[37].

Since the connection patterns of switch modules in the mid-dle stage are fixed, the virtual path between any pair of inputmodule and output module is contention-free. As shown inFig. 20, input/output modules are scaled to a much smaller sizeand routing decisions in these modules become independent.Each input module is an independent input-queued switch,

11

arrival packets can be scheduled in the input buffer accordingto predetermined routes in every time slot by some matchingalgorithm similar to the scheduling algorithm for input-queuedcrossbar switch [38], while each output module could bean output-queued switch similar to a knockout switch [39].Consider the scheduled Clos network as a noiseless channel,the scheduling of path switching maps central modules andtime slots into incoming packets, a process similar to thesource coding of transmission [1], [2], [6], [7] if the setof predetermined connection patterns is regarded as a codebook. In this section, we address the capacity allocation andtraffic matrix decomposition issues, and the smoothness ofscheduling will be discussed in the next section.

A. Capacity Allocation

The capacity allocation problem seeks to find the capacitycij > λij for each virtual pathVij betweenIi and Oj

such that non-overbooking condition∑

i cij =∑

j cij = mfor each input/output module is observed. The problem canbe formulated as constrained optimization of some objectivefunction [40]. The choice of the objective function dependsonthe stochastic characteristic of the traffic on virtual paths andthe quality of service requirements of connections. FollowingKleinrock’s independency assumption [41], [42], each virtualpath can be modelled as anM/M/1 queue with arrival rateλij and service ratecij for all i, j, then the average delay forthe packets from input moduleIi to output moduleOj is givenby

Dij =1

cij − λij(23)

and the objective is to minimize the total weighted delay [41]∑

i,j

λij

cij − λij(24)

subject to cij > λij and∑

i cij =∑

j cij = m. Thecomputation of the optimal allocation is quite involved in gen-eral. Nevertheless, the following heuristic algorithm based onKleinrock’s square root rule [41] can always yield suboptimalcapacity allocation that is close to the optimal solution.

Theorem7: For any given traffic matrixT = [λij ] withnon-zero entries,λij > 0 for all i andj, then

n∑

j=1

λij ≤n∑

j=1

cij = m (25)

andn∑

i=1

λij ≤n∑

i=1

cij = m (26)

when the capacity allocation algorithm terminates.

Proof: Sinceλij > 0, it is obvious that

c(k)ij > c

(k−1)ij (27)

in kth iteration of the algorithm for allk. Also, we have

n∑

j=1

c(k)ij ≤

n∑

j=1

c(k−1)ij +

n∑

j=1

A(k−1)i

√λij

∑nj=1

√λij

= m. (28)

Capacity Allocation Algorithm

Step 1 Initialization: Setk = 1 andc(0)ij = λij .Step 2 Calculate slack capacity of each input module (row)

and output module (column).

A(k−1)i = m−

∑nj=1 c

(k−1)ij

B(k−1)j = m−∑n

i=1 c(k−1)ij

Step 3 Remove any saturated rowi if A(k−1)i = 0, and

column j if B(k−1)j = 0. The algorithm terminates

if there is no slack capacity left, otherwise do

c(k)ij = c

(k−1)ij +min(

A(k−1)i

√λij

∑nj=1

√λij

,B

(k−1)j

√λij

∑ni=1

√λij

),

and setk = k + 1, goto step 2.

The combination of (27) and (28) assures that∑n

j=1 cij =m, and similarly, that

∑ni=1 cij = m, when the algorithm

terminates.

Notice that if some entriesλij are zero, then the strictinequality (27) does not hold in general, and there is noguarantee that the algorithm will halt. It is also true for anycapacity allocation algorithms that ifλij = 0 implies cij = 0,then the doubly stochastic matrixC = [cij ] that satisfies (25)and (26) may not exist. Nevertheless, this problem can beeliminated in practice if a small amount of capacitycij = ǫallocated to a virtual path withλij = 0 is allowed.

B. Capacity Matrix Decomposition

The time axis is divided into frames of time-slots, andthe frame size is denoted byF . Within each frame, the raterequirement will be satisfied by a set of connection patterns,which are predetermined by the decomposition of the capacitymatrix [cij ]. In a time-slotted system, it is reasonable toassume that all entriescij are rational numbers with a leastcommon denominatorF . Then [F · cij ] is an integer doublystochastic matrix with constantmF row sum and column sum.According to theorem 4, the bipartite graph correspondingto the matrix [F · cij ] can be colored bymF colors. Thecolored bipartite graph can also be expressed by the genralizedBirkhoff-von Neumann decomposition [43]–[45] stated in thefollowing theorem.

Theorem8: (Generalized Birkhoff-von Neumann Decom-position) The capacity matrixC has the following expansions

C =1

F

mF∑

i=1

Mi =1

F

F∑

i=1

Gi =

K∑

i=1

φiPi, (29)

such that Gi =∑mi

j=m(i−1)+1 Mj , i = 1, · · · , F and∑K

i=1 φi = 1.

With respect to the scheduling of path switching, the cor-respondence between connection patterns in the Clos network

12

1.4 0.5 1.1

0.6 1.8 0.6

1.0 0.7 1.3ijc

! "# $% & =' ( ) *+ ,- .

capacity matrix (per slot)

5 2 4

2 7 2

4 2 5ije

/ 01 23 4 =5 6 7 89 :; <

11 4 8

4 14 4

8 5 10ije

= >? @A B =C D E FG HI J

F = 4

R.M.S. error = 0.104

F = 8

R.M.S. error = 0.0608

Fig. 21. Round-off errors with different frame sizeF .

C(m,n, k) and these matrices in the series expansions (29) isgiven as follows:

1) EachMi is a permutation matrix, or complete matching.2) Each matrixGi is a sum ofm permutation matrices,

Mm(i−1)+1, · · · ,Mmi, which represent connection pat-terns of thosem central modules ofC(m,n, k) in timeslot i of every frame. That is, the matrixGi is anedge colored regular bipartite graph with degreem.The combination of permutation matrices inGi can bearbitrary.

3) Each matrixPi is astateof the switch such thatPi 6= Pj

for all i 6= j, and{P1, · · · , PK} = {G1, · · · , GF }. Thecoefficientφi is the frequency of the statePi within eachtime frame. Since the total number of constraints in thedoubly stochastic matrixC equals to(N − 1)2 + 1, thenumber of states is bounded byK ≤ min{F,N2−2N+2}.

The scheduling information stored in the memory is linearlyproportional toF , the frame size is limited by the accessspeed and the memory space of input modules. In the capacitymatrix decomposition (29), bothK and F could be toolarge in practice, because the number of statesK is in theorder ofO(N2) and the frame sizeF is determined by theleast common denominator ofcij . The following limitationscompromise between complexity and QoS:

1) If the capacity matrixC is bandlimited such thatcij ≤BF , then it is easy to show thatK ≤ F ≤ BN

m .2) The frame sizeF can be a constant independent of

switch sizeN if [F · cij ] is rounded off into an integermatrix, and the round-off error in the order ofO(1/F ) isacceptable. The error can be arbitrarily small if the framesize F is sufficiently large, as shown in the examplesgiven in Fig. 21.

Consider the traffic matrix as the aggregate of signalsinput to the packet switch, the series expansion of doublystochastic capacity matrix into permutation matrices (29),and the reconstruction of capacity by weighted running sumof scheduled connections are mathematically similar to thatof the Nyquist-Shannon sampling theorem of bandlimitedsignals [18]–[21] input to the transmission channel.

Sampling theorem of bandlimited signalIf a functionf(t)contains no frequencies higher thanW cps, it is completely

determined by its ordinates at a series of equally spacedsampling intervals of1/2W sec., called Nyquist intervals. If|F (ω)| = 0, for |ω| ≥ 2πW , then

f(t) =

+∞∑

−∞

fnsinπ(2Wt− n)

π(2Wt− n)

wherefn = f( n2W ) is thenth sample off(t).

An important common characteristic of both series expan-sions is the complexity reduction of communication systems.Without scheduling, the total number of possible permutationsof a packet switch for a given capacity matrix isN !. Thecomplexity reduction is achieved by limiting the space tothose permutations only involved in the expansion (29). Hence,the dimension of permutation space is reduced toO(N2) bythe traffic matrix decomposition. It can further be reduced toO(N) if each connection is bandwidth limited. The complexityof packet level problems can be significantly reduced if theswitch fabric is limited to this set of predetermined permu-tations only, operating in a much smaller subspace, yet thecapacity of each connection is still guaranteed. In general, theminimal number of permutations required is determined by theedge coloring of the corresponding bipartite graphs as statedin theorem 4.

In digital transmission, the sampling theorem reduces theinfinite dimensional signal space in any time intervalT to afinite number2TW of samples, and the minimal number ofsamples is determined by the Nyquist sampling rate for limitedbandwidthW of the input signal function. The comparisons ofthe these two expansions and their respective roles in packetswitching and digital transmission are listed in Table III.

VI. SCHEDULING AND SOURCE CODING

The process and function of a scheduling algorithm aresimilar to those of source coding in transmission. If we regardthe set of predetermined connection patterns as a code book,scheduling incoming packets in the input buffer according topredetermined connection patterns is the same as encoding thesource signal. This point will be elaborated by the constructionof a scheduling algorithm based on the Huffman tree. We firstshow that the smoothness of scheduling, like source coding,is bounded by the source entropy, or the entropy of capacitydecomposition. The results can then be generalized to thesmoothness of two-dimensional scheduling of tokens assignedto each virtual path of the Clos network.

A. Smoothness of Scheduling

In light of the fairness of services, the scheduling is expectedto be as smooth as possible. In a Clos network with pathswitching, in addition to guarantee capacity for each virtual

Pi Pi Pi Pi Pi( )1

ix ( )2ix ( )

3ix ( )

4ix

F

Fig. 22. Interstate time forith state within a frame of sizeF

13

Packet switching Digital transmissionNetwork environment Time-slotted switching system Time-slotted transmission system

Bandwidth limitationCapacity limited traffic matrix Bandwidth limited signal function

NX

i=1

λij ≤N

X

i=1

cij = m f(t) =1

2π

Z 2πw

−2πw

F (ω)dω

NX

j=1

λij ≤N

X

i=1

cij = m F (ω) = 0, |ω| ≥ 2πw

λij ≤ cijSamples (0, 1) Permutation matrices (0, 1) Binary sequences

BvN decomposition (Hall’s marriage theorem) Fourier series

Expansion C =FmX

i=1

Mi

F=

KX

i=1

φiPi f(t) =

+∞X

−∞

fnsinπ(2Wt− n)

π(2Wt− n)

Frame size=F=lcm of denominators ofcij Nyquist interval= T = 1

2W

Inversion Reconstruct the capacity by running sum Reconstruct the signal by interpolation

Complexity reduction

• Reduce number of permutations fromN ! to O(N2) • Reduce infinite dimensional signal space• Reduce toO(N) if bandwidth is limited to finite number2tW in any durationt• Reduce to constantF if truncation error of orderO(1/F ) is acceptable

QoS Capacity guarantee, Scheduling, Delay bound Error-correcting code, Data compression, DSP

TABLE III

COMPARISON OF TRAFFIC MATRIX DECOMPOSITION AND SAMPLING THEOREM

path, smooth scheduling also reduce delay jitters and alleviatehead-of-line(HOL) blocking at input buffer [25].

Consider a scheduling of a given capacity decomposition

C =K∑

i=1

φiPi with frame sizeF . Let X(i)1 , X

(i)2 , · · · , X(i)

ni be

a sequence of interstate times of statePi within a frame. It iseasy to see from Fig. 22 that

ni = φiF ; (30)

X(i)1 + · · ·+X(i)

ni= F, for all i = 1, · · · , k. (31)

Since delay jitter is one of the major concerns of scheduling,it is natural to define the smoothness of scheduling by thesecond moment of interstate time.

Definition 5: The smoothness of statePi is defined by

Li = log

√∑ni

k=1(X(i)k )2

ni, (32)

and the smoothness of a scheduling is the weighted averagegiven by

L =

K∑

i=1

φiLi. (33)

The following theorem reveals the fact that the smoothnessof scheduling is parallel to the code length of source codingin information theory.

Theorem9: For any scheduling of a given capacity decom-positionC =

∑Ki=1 φiPi with frame sizeF , we have

K∑

i=1

2−Li ≤ 1 (Kraft’s inequality) (34)

and the average smoothness is bounded by

L =

K∑

i=1

φiLi ≥K∑

i=1

φi log1

φi= H. (35)

Both equalities hold whenX(i)k = 1

φi, for all i = 1, · · · ,K,

andk = 1, · · · , ni.Proof: The inequality (34) can be obtained by

substituting (30) and (31) into the Cauchy-Schwartz inequality

K∑

i=1

2−Li =

K∑

i=1

√ni

∑ni

k=1(X(i)k )2

≤K∑

i=1

ni

F=

K∑

i=1

φi = 1.

Again, using Cauchy-Schwartz inequality, we have

L =

K∑

i=1

φiLi =

K∑

i=1

φi log

√∑ni

k=1(X(i)k )2

ni

=

K∑

i=1

φi

2log

∑ni

k=1(X(i)k )2

ni≥

K∑

i=1

φi log1

φi= H.

Suppose equalities in the above theorem hold, then thescheduling is obviously optimal. First, we consider a simpleexample whenK = F, φi = 1

F , andni = 1 for all i, then

X(i)1 = F for all i = 1, · · · , F , in which case the smoothness

equals entropy.

L =1

F

F∑

i=1

log√

(F 2) = logF = H (36)

When all weights are reciprocals of the power of 2, forexampleφi =

12 ,

14 ,

18 ,

18 , an optimal scheduling can be easily

found. An example is shown in Fig 23(a), whereF = 8, K =4, andni = 4, 2, 1, 1 with respect to statePi for i = 1, · · · , 4.

P1 P2 P1 P3 P1 P2 P1 P4 P1 P1 P2 P1 P1 P2 P3 P4

(a) Optimal Scheduling (b) WFQ Scheduling

Fig. 23. Examples of Scheduling

14

The scheduling with constant interstate timeX(i) = 2, 4, 8, 8satisfies both equalities of theorem 10

H = L =1

2·+1

4· 2 + 1

8· 3 + 1

8· 3 = 1.75. (37)

Another scheduling based on weighted fair queueing (WFQ)for the same decomposition is shown in Fig 23(b). ThesmoothnessL = 1.8758 is greater than the decomposition en-tropyH = 1.75, and the superiority of the optimal schedulingis quite obvious by comparing Fig. 23(a) and (b). An upperbound of smoothness is given in the following theorem.

Theorem10: For any capacity decompositionC =∑K

i=1 φiPi, it is always possible to device a scheduling whosesmoothnessL is within 1

2 of decomposition entropy

H ≤ L < H +1

2(38)

Proof: Consider a random scheduling without framestructure such that the statePi of each time slot is randomlyselected with probabilityφi. The interstate timeX(i) of statePi is a geometric random variable withE[X(i)] = 1

φi, and

V ar[X(i)] = 1−φi

φ2i

. The second moment ofX(i) and thesmoothness of statePi are given respectively as follows

E[(X(i))2] = V ar[X(i)] + (E[X(i)])2 =2− φi

φ2i

, (39)

and

Li = log√

E[(X(i))2] =1

2log (2− φi) + logφi. (40)

It is counter intuitive that the smoothnessL of randomscheduling does not equal to the decomposition entropyH ;in fact we have

L =

K∑

i=1

φiLi =1

2

K∑

i=1

φi log (2− φi) +H. (41)

It is easy to show that the Kullback-Leibler distanceL −Hreaches the maximum value whenφ1 = φ2 = · · · = φK = 1

K ,in which case we have

L−H =1

2

K∑

i=1

φi log (2− φi) ≤1

2log (2− 1

K) <

1

2. (42)

The random scheduling is actually no scheduling at all. Hence,there should exist a scheduling algorithm, with or withoutframe, that satisfies the bound given in (38).

It is obvious that theorem 9 and 10 are counterparts of thesource coding theorem stated as follows.

Noiseless coding theoremLet random variableX takethe possible valuesx1, · · · , xK with respective probabilitiesp(x1), · · · , p(xK). Then, the necessary and sufficient condi-tion to encode the values ofX in binary prefix code ( noneof which is an extension of another) of respective lengthsL1, · · · , LK is

K∑

i=1

2−Li ≤ 1 (Kraft’s inequality) (43)

The average code length is bounded by

L =

K∑

i=1

p(xi)Li ≥ H(X) =

K∑

i=1

p(xi) log1

p(xi)(44)

and it is always possible to device an optimal prefix code forX whose average code lengthL is within 1 of entropy

H(X) ≤ L < H(X) + 1. (45)

B. Comparison of Scheduling Algorithms

The scheduling algorithm for an arbitrary set of decom-position weightsφi achieves optimal smoothness ifL = H ,which requires the constant interstate time that equals to theinverse of weightX(i)

k = Fni

= 1φi

for all statePi. Mostproposed scheduling algorithms are indeed constructed fromthe reciprocal of the normalized weights. A comparison ofseveral stereotype algorithms is given below.

1) Weighted Fair Queueing (WFQ) Scheduling Algorithm:The WFQ scheduling algorithm [46] is constructed from thesequence of virtual finish times1φi

, 2φi· · · of each statePi with

the initial finish time 1φi

. The algorithm is given as follows.

The WFQ scheduling algorithm:Select the state with the smallest finish time and increase itsfinish time by the inverse of its weight. Repeat this processuntil the frame size is reached

We will use the running example of a set of five statesP1, P2, P3, P4, P5 with respective weightsφ1 = 0.5, φ2 =φ3 = φ4 = φ5 = 0.125 to illustrate the performance ofthose scheduling algorithms presented in this section. Thesequence generated byWFQ in each time slotτ within aframe is given in the Table IV, and the final sequence isP1P1P1P1P2P3P4P5.

τ P1 P2 P3 P4 P5 Selection1 2 8 8 8 8 P1

2 4 8 8 8 8 P1

3 6 8 8 8 8 P1

4 8 8 8 8 8 P1

5 10 8 8 8 8 P2

6 10 16 8 8 8 P3

7 10 16 16 8 8 P4

8 10 16 16 16 8 P5

TABLE IV

AN EXAMPLE OF WFQ ALGORITHM

Although theWFQ algorithm properly utilizes the inverseweights in constructing the scheduling, the above exampleshows that the sequence generated byWFQ is still far fromthe achievable optimal smoothness. Possible amendments toWFQ are discussed below.

15

2) WF 2Q Scheduling Algorithm:WF 2Q is a schedulingalgorithm [47] that incorporates the stringent rate requirementof generalized processor sharing (GPS)[48] in WFQ. LetTi(τ) be the number of time slots assigned to statePi upto time τ , theWF 2Q will select the statePi that satisfies

Ti(τ − 1) < τ · φi (46)

in every time slotτ = 1, 2, · · · . That is, only the states thathave started its service in the correspondingGPSsystem willbe selected. The set of states qualified for selection in timeslot τ is defined by

Qτ = {Pi | Ti(τ − 1) < τ · φi} (47)

Hence, all states are qualified in the first time slot (see TableIV), becauseTi(0) = 0 for all i. The set of qualified statesQ1 = {P1P2P3P4P5} is then ordered by their finish timeand the stateP1 is selected. In the second time slot, however,T1(1) = 1 and2 · φ1 = 1, so the stateP1 cannot be selectedandQ2 = {P2P3P4P5}. In each time slot, whenever a stateis selected, like inWFQ, its finish time will be increased bythe inverse of its weight.

The following table provides the whole procedure ofWF 2Q. The final sequence isP1P2P1P3P1P4P1P5, whichis better than the sequence produced byWFQ. Anotherapproach to amendWFQ based on the Huffman tree is givennext.

τ P1 P2 P3 P4 P5 Qτ Selection1 2 8 8 8 8 P1P2P3P4P5 P1

2 4 8 8 8 8 P2P3P4P5 P2

3 4 8 8 8 8 P1P3P4P5 P1

4 4 16 8 8 8 P3P4P5 P3

5 6 16 8 8 8 P1P4P5 P1

6 6 16 16 8 8 P4P5 P4

7 8 16 16 8 8 P1P5 P1

8 10 16 16 16 8 P5 P5

TABLE V

AN EXAMPLE OF WF 2Q ALGORITHM

3) Huffman Round Rabin (HuRR) Algorithm :It is knownthat the Huffman code [49] is the optimal source code con-structed from the binary probability tree, called theHuffmantree, in a hierarchical manner. When theWFQ is applied toonly two states, the average interstate time of each state isveryclose to the optimum because the two-state Huffman binarytree only has one level.

For example, consider the two statesP1 and P2 withrespective weightsφ1 = 0.25 andφ2 = 0.75, the WFQ willproduce the sequenceP2P2P1P2. The interstate time of stateP1 is 4, which is exactly the reciprocal of 0.25, and stateP2

has interstate time of 1 and 2, which are the two integersnearest1/0.75. However, if there are more than two states,the WFQ fails to achieve the optimality because the Huffmantree has multiple levels.

The optimality of Huffman code is closely related to Kraft’sinequality, which is the necessary and sufficient conditionofprefix source coding. In scheduling, the optimal smoothness

P Y P X 0.25 0.25

P 1

0.5

P 2

0.125

P 3

0.125

P 4

0.125 0.125

P 5

P Z 0.5

1

Fig. 24. A Huffman tree of HuRR Algorithm

L = H also impliesLi = log 1φi

for all statePi. Hence, theequality of Kraft’s inequality holds

K∑

i=1

2−Li = 1

Similar to the Huffman code, theHuRRscheduling algorithmproposed in [50] is a hierarchical scheduling algorithm syn-thesized from the Huffman tree.

Consider each state of the decomposition as a symbol andthe weight of the state as its probability, a Huffman treecan be constructed from ordered sequence of probabilities asusual. The probability of the root node is 1, the probabilityofeach leave node is the probability of the symbol representedby that leave, and the probability of each intermediate nodeis the sum of the probabilities of two successors. TheHuRRalgorithm comprises the following steps.

The HuRRscheduling algorithm:

Step 1 Initially set the root to be temporary nodePX , andS = PX · · ·PX to be a temporary sequence.

Step 2 Apply theWFQ to the two successors ofPX toproduce a sequenceT , and substituteT for thesubsequencePX · · ·PX of S.

Step 3 If there is no intermediate node in the sequenceS,then terminate the algorithm. Otherwise select anintermediate nodePX appearing inS and goto step2.

The Huffman tree of the previous example is shown in Fig.24, the HuRR scheduling algorithm will generate the followingsequences

P1PZP1PZP1PZP1PZ → P1PXP1PY P1PXP1PY

→ P1P2P1P4P1P3P1P5

In this particular example, the following Huffman code can begenerated from the tree depicted in Fig.(24)

P1 ←− 0, P2 ←− 100, P3 ←− 101, P4 ←− 110, P5 ←− 111

16

and the logarithm of interstate time of each state is equalto the length of its Huffman code. The smoothness of thesequence of this example is the same as that generated byWF 2Q earlier. However, theHuRR outperformsWF 2Q ingeneral as demonstrated by the comparison given next.

4) Comparison of Smoothness of Scheduling Algorithms:The smoothness of preceding scheduling algorithms are com-pared with random scheduling in Table VI to show their pro-gressive improvements. It is clear that the sequences generatedby HuRRare, in general, closer to the entropy than the othertwo. The performance ofWF 2Q is comparable toHuRR,while WFQ is not satisfactory in some cases.

P1 P2 P3 P4 Random WFQ WF2Q HuRR Entropy0.1 0.1 0.1 0.7 1.628 1.575 1.414 1.414 1.3570.1 0.1 0.2 0.6 1.894 1.734 1.626 1.604 1.5710.1 0.1 0.3 0.5 2.040 1.784 1.724 1.702 1.6860.1 0.2 0.2 0.5 2.123 1.882 1.801 1.772 1.7610.1 0.1 0.4 0.4 2.086 1.787 1.745 1.745 1.7220.1 0.2 0.3 0.4 2.229 1.903 1.903 1.884 1.8470.2 0.2 0.2 0.4 2.312 2.011 1.980 1.933 1.9220.1 0.3 0.3 0.3 2.286 1.908 1.908 1.908 1.8960.2 0.2 0.3 0.3 2.370 2.016 2.016 1.980 1.971

TABLE VI

THE SMOOTHNESSCOMPARISON OFSCHEDULING ALGORITHMS

The significance of the Huffman coding scheme lies in thestructure of the Huffman Tree. By the same token,HuRR,which is also implemented with the Huffman tree, can providethe best performance in terms of smoothness among thescheduling algorithms discussed in this section. However,HuRRhas not yet been proven to be the optimal scheduling al-gorithm. It is clear that the key to construct optimal schedulingis to explore the structure of the smoothnessLi of each statePi in a frame of sizeF and Kraft’s inequality that is satisfiedby Li, which is mathematically equivalent to the codewordlength in source coding theorem.

C. Two-dimensional Scheduling

The smoothness of scheduling described in the precedingsection is for a single-server system. However, the capacitymatrix decomposition of the packet switch system is a two-dimensional scheduling aimed at multiple inputs and outputs[51]. The generalization of the smoothness to two-dimensionalscheduling is discussed in this section.

Recall that the capacity matrixC = [cij ] of a C(m,n, k)Clos network is ak × k matrix, wherecij is the averagenumber of tokens assigned to the virtual pathVij betweeninput modulei and output modulej. If the frame size isF ,then the matrixF · C is integer doubly stochastic, such that∑k

i cij = m,∑k

j cij = m.In the following discussion of two-dimensional smoothness,

we will consider, without loss of generality, capacity matrixC = [cij ] and

∑Ni cij = 1,

∑Nj cij = 1 of a crossbar switch

for the sake of simplicity. The two-dimensional entropies ofthe capacity matrixC are defined as follows.

Definition 6: Let C = [cij ] be a doubly stochastic capacitymatrix; the entropy of input modulei is defined by

Hi = −N∑

j=1

cij log cij , (48)

and the the set of entropies of input modules is representedby the column vector

H =

H1

H2

...HN

=

[H1, H2, · · · , HN

]T. (49)

Similarly, the entropy of output modulej is defined by

Hj = −N∑

i=1

cij log cij , (50)

and the set of entropies of output modules is represented bythe row vector

H = [H1, H2, · · · , HN ] . (51)

The entropy of the capacity matrixC is defined by

H(C) =

N∑

i=1

Hi =

N∑

j=1

Hj = −N∑

j=1

N∑

i=1

cij log cij . (52)

The smoothness of a two-dimensional scheduling is definedby the inter-token time in the same manner as the interstatetime of single server scheduling. Letnij be the number oftokens assigned to the virtual pathVij within a frameF , andy(ij)1 , y

(ij)2 , · · · , y(ij)nij be the sequence of the inter-token times.

We havenij = cijF ; (53)

y(ij)1 + y

(ij)2 + · · ·+ y(ij)nij

= F (54)

for all i, j = 1, · · · , N. It should be noted that the number oftokens assign to a virtual pathVij in a time slot can be morethan one if the number of central modulesm > 1, in whichcase the degenerate inter-token timey

(ij)k = 0 will be allowed.

The smoothness of two-dimensional scheduling is defined asfollows.

Definition 7: For a given scheduling of the capacity matrixdecompositionC =

∑Ki=1 φiPi, the smoothness of the virtual

pathVij is defined by

dij = log

√√√√

∑nij

k=1

(

y(ij)k

)2

nij. (55)

The smoothness of input modulei is defined by

Di =

N∑

j=1

cijdij =

N∑

j=1

cij log

√∑nij

k=1(y(ij)k )2

nij, (56)

and the input smoothness is the column vector

D =

D1

D2

...DN

=

[D1, D2, · · · , DN

]T. (57)

17

Similarly, the smoothness of output modulej is defined by

Dj =

N∑

i=1

cijdij =

N∑

i=1

cij log

√∑nij

k=1(y(ij)k )2

nij, (58)

and the output smoothness is the row vector

D = [D1, D2, · · · , DN ] (59)

The smoothness of the two-dimensional scheduling is definedby

D =

N∑

i=1

Di =

N∑

j=1

Dj =

N∑

j=1

N∑

i=1

cijdij . (60)

The properties of two-dimension smoothness are similar tothat of single server scheduling. The proof of the followingtheorem is the same as that of theorem 9.

Theorem11: For capacity matrix decompositionC =∑K

i=1 φiPi with frame sizeF , any scheduling satisfies thefollowing smoothness inequalities

1) The matrixKr =[2−dij

], called Kraft’s Matrix, is

doubly sub-stochastic such that

N∑

i=1

2−dij ≤ 1;

N∑

j=1

2−dij ≤ 1. (61)

2) For input modulei = 1, · · · , N, we have

Di ≥ Hi. (62)

3) For output modulej = 1, · · · , N, we have

Dj ≥ Hj . (63)

4) The overall smoothness is bounded by

D ≥ H(C). (64)

The above equalities hold wheny(ij)k = 1cij

, for all i, j =1, 2, · · · , N andk = 1, 2, · · · , nij .

The difference between smoothness and entropy can beconsidered as the distortion of delay jitter introduced byscheduling. For circuit switch, the capacity matrixC is apermutation matrix, in which case we haveH(C) = D = 0.Another extreme case is the scheduling of uniform capacitymatrix C = [ 1N ] = 1

N

∑Ni=1 Pi with frame sizeF = N .

In this case, the entropyH(C) = N logN also equals to thesmoothnessD = N logN because inter-token timey(ij)k = N ,for all i, j = 1, 2, · · · , N and k = 1, 2, · · · , nij for anyscheduling, in which case the entropyH(C) is the maximumbecause the capacity matrix is completely uniform, and thesmoothness of any scheduling cannot make it any worse.

It is clear that if the Kraft matrix

Kr =[2−dij

]

is doubly stochastic then the two-dimensional scheduling isoptimal. However, the existence of the optimal scheduling thatcan reach the entropyH(C) of any capacity matrixC is stillan open issue. Nevertheless, a rule of thumb is to minimizeK, the number of permutation matrices in the decomposition

[52], since the maximal delay bound is on the order ofO(K)for most known scheduling algorithms [19], [53].

For example, given a doubly stochastic matrixC,

C =

0.75 0 0.125 0.1250.125 0.5 0.375 00.125 0.125 0.5 0.250 0.375 0 0.625

The input entropyH and output entropyH of matrix C aregiven respectively by

H =[1.0613 1.4056 1.7500 0.9544

]T

and

H =[1.0613 1.4056 1.4056 1.2988

]

The entropy of the capacity matrixC is H(C) = 5.1714. Onepossible decomposition is given by

8 · C =

6 0 1 11 4 3 01 1 4 20 3 0 5

= 4

1 0 0 00 1 0 00 0 1 00 0 0 1

+

1 0 0 00 0 1 00 1 0 00 0 0 1

+

1 0 0 00 0 1 00 0 0 10 1 0 0

+

0 0 1 01 0 0 00 0 0 10 1 0 0

+

0 0 0 10 0 1 01 0 0 00 1 0 0

in which K = 5 and the permutation matrices are denotedby {P1, P2, · · · , P5}, respectively. If theWFQ schedulingalgorithm is applied to this set of permutation matrices, theresulting sequence is

P1P1P1P1P2P3P4P5

which gives rise to the following two-dimensional scheduledtokens

a a a a a a b cb b b b c d d dc c c c b b a bd d d d d c c a

where each row corresponds to an output module and the sym-bol a, b, c and d represents the respective tokens assigned toinput module1 to 4. The input smoothness of this schedulingis given by

D =[1.2084 1.6997 2.0323 1.3118]

]T

and the output smoothness is

D =[1.2084 1.7636 1.6997 1.5805

]

which sum up toD = 6.2522. When theHuRRschedulingalgorithm is applied to the same set of permutation matrices,the resulting sequence is

P1P2P1P4P1P3P1P5

18


a a a b a a a cb c b d b d b dc b c a c b c bd d d c d c d a

This result is obviously smoother than the previous one asevident by the following smoothness measures

D =[1.1250 1.4375 1.7902 1.0267

]T

andD =

[1.1250 1.4375 1.4375 1.3794

]

which sum up toD = 5.3794, an improvement consistentwith the comparison of single server scheduling described inthe preceding section.

Suppose we consider another capacity matrix decomposition

8 · C =

0 0 1 01 0 0 00 1 0 00 0 0 1

+

0 0 0 10 0 1 01 0 0 00 1 0 0

+2

1 0 0 00 0 1 00 0 0 10 1 0 0

+ 4

1 0 0 00 1 0 00 0 1 00 0 0 1

in which the number of permutation matricesK = 4 isreduced. Again, applying theHuRRscheduling, the resultingsequence is

P1P4P3P4P2P4P3P4


b a a a c a a ac b d b d b d ba c b c b c b cd d c d a d c d

The smoothness measures of this two-dimensional schedulingare given by

D =[1.1250 1.4375 1.7500 1.0267

]T

andD =

[1.125 1.4375 1.4375 1.3392

]

which sum up to the overall smoothnessD = 5.3392. Bycomparing the output smoothnessD with that of the previousdecomposition, we find some improvement in the fourthoutput. Similar improvement can be found at input module3, where the optimal smoothnessD3 = 1.7500 = H3 isachieved. The overall smoothnessD = 5.3392 is also betterthan the previousD = 5.3794 but still greater than the entropyof capacity matrixH(C) = 5.1714.

Preceding examples indicate that the performance of pathswitching depends not only on the scheduling algorithm butalso on the number of permutation matrices in the capacitymatrix decomposition. As we mentioned above that the num-ber of permutation matrices in the case of uniform capacity

matrix with maximal entropy isK = N , and the worst delaybound is on the order ofO(K) for most known schedulingalgorithm, it is therefore reasonable to arrive at the conjecturethatK is on the order ofO(N) for the optimal decompositionof any capacity matrix, even though the known bound ofK inthe Birkhoff von-Neumann decomposition isN2−2N+2. TheKraft matrix and the smoothness measure introduced in thissection will provide the objective of optimal two-dimensionalscheduling to be explored in the future, which is expected tobea very hard problem because of the large number of possiblecombinations.

VII. C ONCLUSION

The parallels between packet switch subject to the con-tention and transmission channel in the presence of noiseare consequences of the law of probability. Input signal totransmission is a function of time, the main theorem on noisychannel coding being based on the law of large number. Onthe other hand, the input signal to switch is a function ofspace, and both theorems on deflection routing and smoothnessof scheduling are proved on the ground of randomness. Inboth systems, the random disturbances are tamed by means ofsimilar mathematical tools aimed for reliable communication.

The communication network has gone through two phasesof quantization in the last century. The first phase was thequantization of transmission channels, from analog to digitalbased on the sampling theorem of bandlimited signal. Thesecond phase is the quantization of the switching system, fromcircuit switching to packet switching. The comparisons pro-vided in this paper demonstrate that the scheduling based oncapacity matrix decomposition serves the same function as thesampling theorem to reduce the complexity of communication.

The concept of path switching can further be extendedto the network level to cope with resources contentions. Aconnection-oriented sub-network with predetermined topologyand bandwidth can be embedded in the current IP networkfor supporting QoS of real-time services such as voice orvideo over IP. The scheduling of path switching in conjunctionwith the routing scheme of multiprotocol label switching(MPLS) will provide a platform to support different trafficclasses of differentiated services (DiffServ) [54], [55].Thepredetermined paths of label switched sub-network are similarto the subway system embedded in the public transportationnetwork. It will provide a coherent end-to-end QoS solutionto the multilevel resource allocation [14] issues arising fromreal-time services. First, the predetermined connection patternsat each router are contention-free at the packet level. Next, theperiodic connection patterns are frame based, the bandwidthassigned to each session is fixed at burst level within eachframe. Finally, the stable capacity of each session is guaranteedat call level. The traffic engineering of this internet subwaysystem could be a challenge networking research area in thefuture.

ACKNOWLEDGEMENT

The development of this work was helped by discussionsover the years with my students Cheuk H. Lam, Philip To, Man

19

Chi Chan, S. Y. Liew, Yun Deng, Manting Choy, JianmingLiu, Sichao Ruan, and Li Pu, all members of BroadbandCommunication Laboratory at the Chinese University of HongKong.

APPENDIX IBOLZMANN MODEL OF PACKET DISTRIBUTION

Consider anN×N switch as a thermal system. The outputsare particles and the energy level of an output, denoted byεi, equals to the number of packets destined for that output.The distribution of packets over outputs can be determined bymaximizing the Boltzmann entropy [24].

Suppose there areM packets sending fromN input portsto N output ports. Letni be the number of output ports withenergy levelεi. An example withN = 8 andM = 4 is shownin Fig. 25, wheren0 = 5, n1 = 2, andn3 = 1. The number ofpossible divisions ofN outputs intor distinct energy levelsof respective sizesn0, · · · , nr is

N !

n0!n1! · · · nr!,

and the number of possible divisions ofM packets intoNdistinct outputs of respective sizes0, · · · , 0

︸︷︷︸

n0

, · · · , r, · · · , r︸︷︷︸

nr

is

M !

(0!)n0(1!)n1 · · · (r!)nr

Suppose each input can send one packet at most in any timeslot, then the total number of states of the entire switch isgiven by

W =N !

(N −M)!M !· N !

n0!n1! · · · nr!· M !

(0!)n0(1!)n1 · · · (r!)nr

(65)subject to the following constraints:

N = n0 + n1 + n2 · · ·+nr, (66)

andM = 0n0 + 1n1 + 2n2 · · ·+rnr. (67)

The Boltzmann entropyS of the system is given by:

S = C lnW (68)

where C is the Boltzmann constant. The maximal entropy canbe obtained from the following function formed by Lagrangemultipliers

f(ni) = lnW + α(∑

i

ni −N) + β(∑

i

ini −M). (69)

0123

0 5n =

1 2n =

2 1n =

1

0 5

2 b,c

a d

ab

c

d

Output ports: Particles

Packets: Energy Quantum

4567

01234567

3 4 6 7

Fig. 25. Blotzmann model of packet distribution

Using Stirling’s approximationlnx! ≈ x ln x − x for thefactorials, we have

f(ni).= N lnN −N − (N −M) ln(N −M) + (N −M)

+N lnN −N −∑

i

(ni lnni − ni)−∑

i

ni ln(i!)

+ α(∑

i

ni −N) + β(∑

i

ini −M).

(70)

Taking the derivatives with respect toni, and setting the resultto zero and solving forni yields the population numbers:

ni =e(α+βi)

i!. (71)

If the offered loadρ is uniform on each input, then we have

ρ =M

N=

∑

i ini∑

i ni= eβ . (72)

The probability that there arei packets destined for a particularoutput has the Poisson distribution:

pi =ni

N=

e(α+βi)

i!∑

i ni= e−ρ ρ

i

i!, i = 1, 2, 3, ... (73)

The carried load equals the probability that an output port isbusy

ρ′ = 1− p0 = 1− e−ρ (74)

which is consistent with (1). Hence, our proposition on thesignal power of switch is verified.

In the above derivation, the assumption that each input canonly send at most one packet in any time slot can be relaxed.As long as packets are distinguishable, the total number ofstates of outputs is

W =N !

n0!n1! · · · nr!· M !

(0!)n0(1!)n1 · · · (r!)nr(75)

and the distribution of packets is still Poisson. On the otherhand, suppose all packets are indistinguishable, then the totalnumber of states of outputs becomes

W =N !

n0!n1!n2! · · · nr!. (76)

Following the same procedure, we obtain the geometric dis-tribution of packets

pi =1

1 + ρ· ( ρ

1 + ρ)i, i = 0, 1, 2, ... (77)

which will be inconsistent with (1).We next show that both the signalSp and noiseNp of an

N × N crossbar switch are normally distributed under theassumptions of homogeneous input loadingρ and uniformoutput address.

The signal powerSp is the sum of the following i.i.d.random variables:

Xi =

{

1 if output i is busy

0 otherwise

with meanE[Xi] = ρ′ and varianceV ar[Xi] = ρ′(1 − ρ′),where ρ′ is the carried load given in (1). It follows from

20

central limit theorem that the signal powerSp =∑n

i=1 Xi

is normally distributed with meanE[Sp] = Nρ′ and varianceV ar[Sp] = Nρ′(1 − ρ′). The noise powerNp = N − SP

becomes independent of the signal powerSp whenN is large.By using the similar argument, the noiseNP is also normallydistributed with meanE[Np] = N(1 − ρ′), and varianceV ar[Np] = Nρ′(1 − ρ′).

REFERENCES

[1] C. E. Shannon, “A mathematical theory of communication,” Bell SystemTech. J., vol. 27, pp. 379–423, 623–656, July, October 1948.

[2] D. J. C. MacKay,Information Theory, Inference, and Learning Algo-rithms. Cambridge: Cambridge University Press, 2003.

[3] J. Massey, “Information theory: The Copernican system of communi-cations,” Communications Magazine, IEEE, vol. 22, no.12, pp. 26–28,Dec 1984.

[4] A. Ephremides and B. Hajek, “Information theory and communicationnetworks: An unconsummated union,”IEEE Trans. on InformationTheory, vol. 44, pp. 2416–2434, Oct. 1998.

[5] C. Clos, “A study of non-blocking switching networks,”Bell Syst. Tech.J., vol. 32, pp. 406–424, Mar. 1953.

[6] R. W. Hamming,Coding and Information Theory. Englewood Cliffs,New Jersey: Prentice-Hall, 1986.

[7] N. Abramson,Information Theory and Coding. New York: McGraw-Hill, 1963.

[8] V. E. Benes,Mathematical Theory of Connecting Networks and Tele-phone Traffic. New York: Academic Press, 1965.

[9] H. Chao, Z. Jing, and S. Liew, “Matching algorithms for three-stagebufferless Clos network switches,”Communications Magazine, IEEE,vol. 41, no.10, pp. 46–54, Oct 2003.

[10] A. Jajszczyk, “Nonblocking, repackable, and rearrangeable Clos net-works: fifty years of the theory evolution,”Communications Magazine,IEEE, vol. 41, no.10, pp. 28–33, Oct 2003.

[11] R. Gallager, “Low-density parity-check codes,”Information Theory,IEEE Transactions on, vol. 8, no.1, pp. 21–28, Jan 1962.

[12] R. M. Tanner, “A recursive approach to low complexity codes,” Infor-mation Theory, IEEE Transactions on, vol. 27, No. 5, pp. 533–547,1981.

[13] M. Sipser and D. Spielman, “Expander codes,”Information Theory,IEEE Transactions on, vol. 42, no.6, pp. 1710–1722, Nov 1996.

[14] T. T. Lee and C. H. Lam, “Path switching – a quasi-static routing schemefor large-scale ATM packet switches,”Selected Areas in Communica-tions, IEEE Journal on, vol. 15, pp. 914–924, June 1997.

[15] T. Inukai, “An efficient SS/TDMA time slot assignment algorithm,”Communications, IEEE Transactions on, vol. 27, No. 10, pp. 1449–1455, October 1979.

[16] A. Bertossi, G. Bongiovanni, and M. Bonuccelli, “Time slot assignmentin SS/TDMA systems with intersatellite links,”Communications, IEEETransactions on, vol. 35, No. 6, pp. 602–608, June 1987.

[17] C.-S. Chang, W.-J. Chen, and H.-Y. Huang, “Birkhoff-von Neumanninput buffered crossbar switches,”INFOCOM 2000. Nineteenth AnnualJoint Conference of the IEEE Computer and Communications Societies.Proceedings. IEEE, vol. 3, pp. 1614–1623, Mar 2000.

[18] H. Nyquist, “Certain topics in telegraph transmissiontheory,” Trans.AIEE, vol. 47, pp. 617–644, Apr. 1928.

[19] C. E. Shannon, “Communication in the presence of noise,” Proc.Institute of Radio Engineers, vol. 37, pp. 10–21, Jan. 1949.

[20] E. T. Whittaker, “On the functions which are represented by theexpansions of the interpolation theory,”Proc. Royal Soc. Edinburgh,vol. 35, pp. 181–194, 1915.

[21] V. A. Kotelnikov, “On the carrying capacity of the etherand wire intelecommunications,”Material for the First All-Union Conference onQuestions of Communication, Izd. Red. Upr. Svyazi RKKA, Moscow(Russian), p. 1933.

[22] V. E. Benes, “A thermodynamics theory of traffic in telephone systems,”Bell System Tech. J., vol. 42, pp. 567–607, 1963.

[23] J. Y. Hui and E. Karasan, “A thermodynamic theory of broadbandnetworks with application to dynamic routing,”IEEE J. on SelectedAreas in Commu., vol. 13, pp. 991–1003, Aug. 1995.

[24] C. Kittel and H. Kroemer,Thermal Physics. New York: W. H. Freeman,1980.

[25] J. Y. Hui, Switching and Traffic Theory for Intergrated BroadbandNetworks. Boston: Kluwer, 1990.

[26] T. T. Lee and P. P. To, “Non-blocking routing propertiesof Closnetworks,” in ”Advances in Switching Networks”, DIMACS, AmericanMathematical Society, vol. 42, pp. 181–196, 1998.

[27] A. S. Tanenbaum,Computer Networks. Prentice-Hall, EnglewoodCliffs, New Jersey, 1981.

[28] R. Gallager, “A simple derivation of the coding theoremand someapplications,” Information Theory, IEEE Transactions on, vol. 11, No.1, pp. 3 – 18, Jan 1965.

[29] F. Tobagi, T. Kwok, and F. Chiussi, “Architecture, performance, andimplementation of the tandem banyan fast packet switch,”Selected Areasin Communications, IEEE Journal on, vol. 9, No. 8, pp. 1173–1193,October 1991.

[30] S. C. Liew and T. Lee, “N logN dual shuffle-exchange network witherror-correcting routing,”Communications, IEEE Transactions on, vol.42, No. 2,3,4, pp. 754–766, February-April 1994.

[31] M. C. Paull, “Reswitching of connection networks,”Bell System Tech.J., vol. 41, pp. 833–855, 1962.

[32] P. Hall, “On representatives of subsets,”J. London Math. Soc., vol. 10,pp. 26–30, 1935.

[33] R. J. Wilson, Introduction to Graph Theory. New York: AcademicPress, 1972.

[34] R. Rado, “On the number of systems of distinct representatives of sets,”J. London Math. Soc., vol. 42, pp. 107–109, 1967.

[35] T. T. Lee and S. Y. Liew, “Parallel routing algorithms inBenes-Closnetworks,” IEEE Trans. Commun., vol. 50, pp. 1841–1847, 2002.

[36] K. Y. Eng, M. J. Karol, and Y. S. Yeh, “A growable packet (ATM) switcharchitecture: Design principles and applications,”in Proc. Globecom,Nov. 1989.

[37] R. Melen and J. S. Turner, “Nonblocking multirate distribution net-works,” Communications, IEEE Transactions on, vol. 41, pp. 362–369,Feb. 1993.

[38] N. McKeown, “The iSLIP scheduling algorithm for input-queuedswitches,” Networking, IEEE/ACM Transactions on, vol. 7, no.2, pp.188–201, Apr 1999.

[39] Y. S. Yeh, M. G. Hluchyj, and A. S. Acampora, “The knockout switch:A simple architecture for high-performance packet switching,” IEEE J.on Selected Areas in Commu., vol. sac-5, pp. 1274–1283, Oct. 1987.

[40] G. Strang, Introduction to Applied Mathematics. Wellesley MA:Wellesley-Cambridge Press, 1986.

[41] L. Kleinrock, Queueing Systems Vol 2: Computer Applications. NewYork: John Wiley and Sons, 1975.

[42] D. Bertsekas and R. Gallager,Data Networks. Englewood Cliffs, NewJersey: Prentice-Hall, 1992.

[43] G. Birkhoff, “Tres observaciones sobre el algebra lineal,” UniversidadNacional de Tucuman Revista (Serie A), vol. 5, pp. 147–151, 1946.

[44] J. Lewandowski, J. Liu, and C. Liu, “SS/TDMA time slot assignmentwith restricted switching modes,”Communications, IEEE Transactionson, vol. 31, no.1, pp. 149–154, Jan 1983.

[45] J. L. Lewandowski, C. L. Liu, and J. W. S. Liu, “An algorithmic proofof a generalization of the Birkhoff-von Neumann theorem,”Journal ofAlgorithms, vol. 7, pp. 323–330, 1986.

[46] A. Demers, S. Keshav, and S. Shenker, “Analysis and simulation of afair queueing algorithm,”in Proc. ACM Sigcomm, pp. 3–12, 1989.

[47] H. Zhang, “Service disciplines for guaranteed performance service inpacket-switching networks,”Proceedzngs of the IEEE, vol. 83, No. 10,pp. 1374–1399, October 1995.

[48] A. Parekh and R. Gallager, “A generalized processor sharing approachto flow control in integrated services networks: the single-node case,”Networking, IEEE/ACM Transactions on, vol. 1, No. 3, pp. 344–357,June 1993.

[49] R. Gallager, “Variations on a theme by Huffman,”Information Theory,IEEE Transactions on, vol. 24, No. 6, pp. 668–674, November 1978.

[50] M.-T. Choy and T. T. Lee, “Huffman fair queueing: A schedulingalgorithm providing smooth output traffic,”ICC 06, IEEE, June 2006.

[51] S. Y. Liew and T. T. Lee, “Bandwidth assignment with QoS guarantee ina class of scalable ATM switches,”Communications, IEEE Transactionson, vol. 48, pp. 377–380, 2000.

[52] I. Keslassy, M. Kodialam, T. V. Lakshman, and D. Stiliadis, “Onguaranteed smooth scheduling for input-queued switches,”Networking,IEEE/ACM Transactions on, vol. 13, No. 6, pp. 1364–1375, December2005.

[53] M. C. Chan and T. Lee, “Statistical performance guarantees in large-scale cross-path packet switch,”Networking, IEEE/ACM Transactionson, vol. 11, p. 325 C 337, April 2003.

[54] J. Lawrence, “Designing multiprotocol label switching networks,”Com-munications Magazine, IEEE, vol. 39, Issue 7, pp. 134–142, July 2001.

21

[55] P. Trimintzios, I. Andrikopoulos, G. Pavlou, P. Flegkas, D. Griffin,P. Georgatsos, D. Goderis, Y. T’Joens, L. Georgiadis, C. Jacquenet,and R. Egan, “A management and control architecture for providingip differentiated services in mpls-based networks,”CommunicationsMagazine, IEEE, vol. 39, Issue 5, pp. 80–88, May 2001.

the mathematical parallels between packet switching and ... · message = 0101 (b) transmission...

Documents