
Reinforcement learning for congestion-avoidance in packet flow

Tsuyoshi Horiguchi (a,*), Keisuke Hayashi (a), Alexei Tretiakov (b)

(a) Department of Computer and Mathematical Sciences, Graduate School of Information Sciences, Tohoku University, Aoba-ku, Sendai 980-8579, Japan
(b) Department of Information Systems, Massey University, Private Bag 11222, Palmerston North 5301, New Zealand

Physica A 349 (2005) 329–348
Received 5 August 2004; available online 10 November 2004

Abstract

Congestion of packet flow in computer networks is an unfavorable phenomenon in packet communication, and hence its avoidance should be investigated. We use the neural network model for packet routing control in a computer network proposed in a previous paper by Horiguchi and Ishioka (Physica A 297 (2001) 521). If we assume that packets are not sent to nodes whose buffers are already full, we find that traffic congestion occurs when the number of packets in the network exceeds some critical value. In order to avoid the congestion, we introduce reinforcement learning for a control parameter in the neural network model. We find that the congestion is avoided by the reinforcement learning and that, at the same time, good throughput is maintained. We investigate the packet flow on computer networks with various types of topology, such as a regular network, a network with fractal structure, a small-world network and a scale-free network.

© 2004 Elsevier B.V. All rights reserved.

PACS: 05.90.+m; 05.50.+q; 07.05.Kf; 84.35.+i

Keywords: Congestion control; Reinforcement learning; Computer network; Mean-field approximation; Packet flow; Small-world network; Scale-free network

doi:10.1016/j.physa.2004.10.015

*Corresponding author. Tel.: +81 22 217 5842; fax: +81 22 217 5851. E-mail address: [email protected] (T. Horiguchi).


1. Introduction

In present-day society, anyone can be connected through the Internet and communicate with others if they wish. We can also easily obtain information necessary for our daily lives through Web pages. The Internet and the Web are expanding rapidly day by day, and the available information is increasing exponentially [1]. Thus it is important to have effective packet flow control on the Internet and in computer networks generally. Packet flow control in computer networks includes routing control, traffic control, congestion control, sequence control and so on; among these, we consider routing control and congestion control in the present paper. A computer network is assumed to consist of nodes, links and a process. A node is a personal computer, a router, a workstation or the like. A link is a communication line. A process is a mathematical model for the network layer [2].

We need decentralized, autonomous and adaptive control of packet routing for large-scale computer networks. One of the main issues in packet routing control is to find a suitable route (or path) along which a packet is sent from a source node to its destination node. It is known that the shortest path is not always the best solution for packet routing control: owing to a trade-off between the queue length and the distance from the packet's present node to its destination node, a next-shortest path may be preferable for sending the packet. This is a kind of optimization problem. Hence, the control of packet flow has been investigated by using neural networks [3–6], which are a powerful method for optimization problems, and techniques developed in statistical physics are very useful for optimization problems, especially those formulated in terms of neural networks [7]. Horiguchi and Ishioka proposed a neural network model for the routing control of packet flow in large-scale computer networks within the framework of statistical physics [8]. Horiguchi et al. also proposed a neural network model for the routing control of packet flow under priority links; priority links were introduced for cases in which some links are more reliable than others for sending packets, some links have higher capacity than others, and/or some nodes can process packets faster than others [9,10]. For environments in which the conditions for packet flow change with time, we introduced goal-directed learning by using two neural network models at each node together with the concept of priority links [11].

In those previous investigations, a packet disappears when it arrives at its destination node, or is discarded when it is sent to a node whose buffer is already full of packets. If a packet is discarded before reaching its destination, then information is lost in the computer network, although no serious congestion occurs except when the average number of packets is close to the buffer size. This setting, in which packets may be discarded before arriving at their destination, leads to a loss of information, which is not acceptable in some situations. We expect that, if a packet is not discarded but instead stays at its node until the next node has a vacancy in its buffer, serious traffic congestion may arise in the computer network.

Congestion control in computer networks is one of the important problems to be investigated for better control of packet flow, but it has not been studied in detail so far. For this reason, we change the setting for the packet flow used in the previous papers in order to investigate the congestion problem: instead of discarding packets, packets are not sent to nodes whose buffers are full but stay at their current nodes until vacancies appear in the buffers. We then find that serious congestion occurs. Next, we investigate a method by which the congestion can be avoided; namely, we introduce reinforcement learning for this purpose. We find that the reinforcement learning works well irrespective of the network topology and that the throughput is improved considerably.

In Section 2, we describe a neural network model for routing control of packet flow with the setting that packets are not sent to nodes whose buffers are full but stay at the nodes where they currently reside; we then show by numerical simulation that serious congestion occurs. In Section 3, we introduce reinforcement learning in order to avoid the congestion. In Section 4, we investigate the packet flow on computer networks with various types of topology, such as a network with fractal structure, a small-world network and a scale-free network. Concluding remarks are given in Section 5.

2. A neural network model for packet flow

We use the simple neural network model for optimal packet routing control in a decentralized, autonomous and adaptive way proposed in a previous paper [8]. We first explain the computer network considered in the present paper. The computer network is assumed to consist of nodes, links and a process, and to contain $N$ nodes. At each node $i$, we place a neural network consisting of $N_i$ neurons, namely $\{n_{ik} \mid k \in \{i_1, i_2, \ldots, i_{N_i}\}\}$, where $N_i$ is the number of neurons at node $i$. We assume that neuron $n_{ik}$ at node $i$ controls the sending of a packet from node $i$ to node $k$, and that when there is a neuron $n_{ik}$ at node $i$, there is a neuron $n_{ki}$ at node $k$. There is thus a link between node $i$ and node $k$ through neurons $n_{ik}$ and $n_{ki}$; each link is assumed to be full duplex. We apply the first-in-first-out (FIFO) rule for packets when one of the neurons fires at a node.

[Fig. 1. An example of a computer network with 49 nodes on a square lattice.]

An example of a computer network is shown in Fig. 1, in which a node is drawn as an open circle and a link as a solid line: we assume that the computer network is an arrangement of nodes on a square lattice with nearest-neighbor and next-nearest-neighbor connections. Within the neural network at a node, each neuron is assumed to be fully connected to the other neurons at the same node. Considering the trade-off between the queue length at each node and the shortest path of a packet to its destination node, we define an energy function for the neural network:

$$E = \frac{1}{2} \sum_i^N \sum_k^{N_i} \sum_l^{N_i} J_{ik,il}\, s_{ik} s_{il} - \eta \sum_i^N \sum_l^{N_i} \left\{ 1 - \frac{1}{b_l} \left( q_l + \frac{1}{2} \sum_{j \neq i}^{N_l} s_{jl} \right) \right\} s_{il} - (1-\eta) \sum_i^N \sum_l^{N_i} \left( 1 - \frac{d_{l,g_i}}{d_c} \right) s_{il} + \xi \sum_i^N \left( \sum_l^{N_i} s_{il} - 1 \right)^2 , \qquad (1)$$

where $s_{ik}$ is the state variable of neuron $n_{ik}$, $s_{ik} \in \{0, 1\}$, and $J_{ik,il}$ is the connection weight between neurons $n_{ik}$ and $n_{il}$; we assume $J_{ik,il} = 1$, $J_{ik,il} = J_{il,ik}$ and $J_{ik,ik} = 0$ in the present paper. Here $b_l$ is the buffer size at node $l$, $d_c$ a constant related to a characteristic path length of the computer network, $q_l$ the queue length at node $l$, and $d_{l,g_i}$ the shortest distance from node $l$, at which a packet is ready to go out, to the packet's destination node $g_i$. The last term in Eq. (1) is a constraint term which forces only one neuron to fire at each node. The two control parameters are denoted by $\eta$ and $\xi$. A small sketch of this energy function in code is given below.
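To make the bookkeeping concrete, the following is a minimal sketch, not taken from the paper, of how the Eq. (1) energy terms contributed by a single node's neurons could be evaluated; all argument names are our own assumptions.

```python
import numpy as np

def node_energy(s, J, q, b, d, d_c, eta, xi, s_other):
    """Contribution of one node i's neurons to the energy of Eq. (1).
    Illustrative sketch; argument names are ours, not the paper's.

    s       : 0/1 state vector over the node's N_i neurons (s_il)
    J       : N_i x N_i connection weights (1 off-diagonal, 0 on it)
    q, b    : queue length and buffer size of each candidate next node l
    d       : shortest distance d_{l,g_i} from each candidate l to the
              destination g_i of the packet at the head of the queue
    d_c     : characteristic path-length constant
    eta, xi : the two control parameters
    s_other : per candidate l, (1/2) * sum over other nodes j of s_jl
    """
    e_inhibit = 0.5 * s @ J @ s                              # first term
    e_queue = -eta * np.sum((1.0 - (q + s_other) / b) * s)   # queue term
    e_path = -(1.0 - eta) * np.sum((1.0 - d / d_c) * s)      # distance term
    e_constraint = xi * (np.sum(s) - 1.0) ** 2               # one-fire term
    return e_inhibit + e_queue + e_path + e_constraint
```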

To introduce soft control of the packet flow, we use a mean-field approximation. The energy given by Eq. (1) is then expressed as follows [10]:

$$E_{\mathrm{mf}} = \frac{1}{2} \sum_i^N \sum_k^{N_i} \sum_l^{N_i} J_{ik,il}\, v_{ik} v_{il} - \eta \sum_i^N \sum_l^{N_i} \left\{ 1 - \frac{1}{b_l} \left( q_l + \frac{1}{2} \sum_{j \neq i}^{N_l} v_{jl} \right) \right\} v_{il} - (1-\eta) \sum_i^N \sum_l^{N_i} \left( 1 - \frac{d_{l,g_i}}{d_c} \right) v_{il} + \xi \sum_i^N \left( \sum_l^{N_i} v_{il} - 1 \right)^2 . \qquad (2)$$

Here $v_{il}$ is calculated in terms of an internal effective field $h_{il}$ as

$$v_{il} = \langle s_{il} \rangle = \frac{1}{1 + \exp\{-\beta h_{il}\}} , \qquad (3)$$


where $\beta = 1/kT$ as usual; $T$ is the absolute temperature of statistical physics and serves as the parameter for the soft control in the present system.

For numerical simulations, we assume that the state of a neuron, say $s_{il}$, depends on time $t$, and hence the corresponding $h_{il}$ and $v_{il}$, as well as $E_{\mathrm{mf}}$, are functions of $t$. An equation for the dynamics is derived in such a way that the energy $E_{\mathrm{mf}}(t)$ decreases as a function of time $t$. The time dependence of the internal effective field $h_{il}(t)$ is then given as follows [10]:

$$\frac{d}{dt} h_{il}(t) = - \sum_k^{N_i} J_{ik,il}\, v_{ik}(t) + \eta \left\{ 1 - \frac{1}{b_l} \left( q_l + \sum_{j \neq i}^{N_l} v_{jl}(t) \right) \right\} + (1-\eta) \left( 1 - \frac{d_{l,g_i}}{d_c} \right) - 2\xi \left( \sum_k^{N_i} v_{ik}(t) - 1 \right) . \qquad (4)$$

In numerical simulations, we use a discrete-time renewal process for Eq. (4) in the calculation of the internal effective fields; namely, we approximate Eq. (4) as follows:

$$h_{il}(t+1) = h_{il}(t) - \sum_k^{N_i} J_{ik,il}\, v_{ik}(t) + \eta \left\{ 1 - \frac{1}{b_l} \left( q_l + \sum_{j \neq i}^{N_l} v_{jl}(t) \right) \right\} + (1-\eta) \left( 1 - \frac{d_{l,g_i}}{d_c} \right) - 2\xi \left( \sum_k^{N_i} v_{ik}(t) - 1 \right) . \qquad (5)$$

The average value of the neuron state is then obtained from Eqs. (3) and (5) at each iteration step. We determine whether neuron $n_{il}$ fires by applying a threshold $\theta$ to the thermal average $v_{il}$: the state of neuron $n_{il}$ is taken to be 1 if $v_{il} \geq \theta$ and 0 if $v_{il} < \theta$, where we use $\theta = 0.9$. When neuron $n_{il}$ fires, a packet ready to go out from node $i$ is sent to node $l$ according to the FIFO rule.

Numerical simulations have been performed for the computer network with 49 nodes shown in Fig. 1, as an example of a computer network with regularly arranged nodes. As an initial condition, we create $N_p$ packets at random on the nodes of the computer network, with $q_i < b_i$; each packet is assigned a destination node drawn from the uniform distribution. In the simulations, a packet is sent from node $i$ to the node determined by calculating $\{v_{il}\}$; a packet is not sent if the buffer of the node indicated by the neuron is already full. We describe this situation by saying that the packet is not accepted by the node even though it was sent there. A packet disappears from the computer network only when it arrives at its destination node; a new packet is then created at a randomly chosen node, with a randomly assigned destination, so as to keep the total number of packets constant. We perform 30 iterations of each neural network to obtain $\{v_{il}\}$ and 250 iterations of packet sending; packets are sent synchronously. We carried out 20 simulation runs for each parameter set in order to obtain statistical averages of the results.

We define the average number of packets $N_p$ existing in the computer network and the average number of packets moved from one node to another, $M_r$, respectively, as follows:

$$N_p = \frac{1}{N} \sum_i^N q_i , \qquad M_r = \frac{N_f - N_r}{N_t} , \qquad (6)$$

where $N_f$ is the total number of nodes at which a neuron has fired, $N_r$ the total number of packets not accepted by a node even though they were sent to it, and $N_t$ the total number of nodes with at least one packet. Note that $M_r$ corresponds to the average movement of packets. We also calculate the total number of packets which have arrived at their destination, denoted by $N_a$; $N_a$ corresponds to the throughput.

The parameters are chosen as $N = 49$, $d_c = 6$, $\eta = 0.7$, $\xi = 0.1$, $\beta = 2.0$ for the computer network shown in Fig. 1, and we assume $b_i = 50$ for all nodes $i$. We show the results for $M_r$, $N_a$ and $N_r$ in Fig. 2, as examples, as functions of $N_p$. We find that the average movement of packets $M_r$ and the throughput $N_a$ drop suddenly at some value $N_p^c$ of the average number of packets $N_p$. At the same time, the number of packets not accepted by a node, $N_r$, increases suddenly at $N_p^c$. These results indicate that traffic congestion occurs in the computer network for $N_p > N_p^c$. We have found from the obtained results that serious congestion occurs more easily for smaller values of $\eta$, but that the throughput is larger for smaller values of $\eta$ when there is no congestion.
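Before turning to the learning scheme, here is a minimal sketch, with naming conventions of our own rather than the authors', of one routing update at a node, combining Eqs. (3) and (5) with the threshold rule $\theta = 0.9$.

```python
import numpy as np

def routing_step(h, v, J, q, b, v_other, d, d_c, eta, xi, beta, theta=0.9):
    """One discrete-time update of a node's effective fields, Eq. (5),
    followed by the sigmoid of Eq. (3) and the firing threshold.
    Names are ours; v_other holds, per candidate l, sum_{j != i} v_jl(t).
    """
    drive = (eta * (1.0 - (q + v_other) / b)          # queue preference
             + (1.0 - eta) * (1.0 - d / d_c))         # distance preference
    h_new = h - J @ v + drive - 2.0 * xi * (np.sum(v) - 1.0)   # Eq. (5)
    v_new = 1.0 / (1.0 + np.exp(-beta * h_new))                # Eq. (3)
    fired = v_new >= theta   # neuron n_il fires when v_il >= theta (= 0.9)
    return h_new, v_new, fired
```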

3. Reinforcement learning for packet flow

As seen in Section 2, serious congestion occurs for $N_p > N_p^c$. Congestion is undesirable for the efficiency of the packet flow. We observe that $N_p^c$ is larger for larger values of $\eta$, since several paths to the destination remain possible; on the other hand, $N_a$ is smaller for larger values of $\eta$. From these observations, we conclude that when no congestion occurs we should use smaller values of $\eta$ in order to increase the throughput $N_a$, but once congestion has started to occur, it is better to use larger values of $\eta$ in order to relieve it. Hence it is important to adjust the parameter $\eta$ adaptively. We introduce reinforcement learning, as proposed by Sutton and Barto [12], for this purpose.

Reinforcement learning is a kind of unsupervised learning, characterized by the absence of an external supervisor. There is a trade-off between exploration and exploitation: exploitation uses the knowledge an agent (here, each node) has already acquired from past rewards, while exploration tries other actions in the hope of better selections in the future. In reinforcement learning, the agent must choose actions so as to obtain as much reward as possible through interactive learning; reinforcement learning can also be regarded as a kind of goal-directed learning in an uncertain environment.

[Fig. 2. The average number of packets moved from a node to other nodes, $M_r$; the total number of packets that arrived at their destination, $N_a$; and the total number of packets not accepted by a node even though sent to it, $N_r$; each as a function of the average number of packets $N_p$ in the network. Results without reinforcement learning for the network with $N = 49$ on the square lattice shown in Fig. 1. The parameters are set as $\xi = 0.3$ and $\beta = 3.0$. The symbols for the curves are given at the top of the figure.]

We now let the value of $\eta$ depend on the node $i$ and on the number of times $T_s$ that the node has sent a packet. Hence we rewrite the energy of the neural network as follows:

$$E = \frac{1}{2} \sum_i^N \sum_k^{N_i} \sum_l^{N_i} J_{ik,il}\, s_{ik} s_{il} - \sum_i^N \eta_i(T_s) \sum_l^{N_i} \left\{ 1 - \frac{1}{b_l} \left( q_l + \frac{1}{2} \sum_{j \neq i}^{N_l} s_{jl} \right) \right\} s_{il} - \sum_i^N \left( 1 - \eta_i(T_s) \right) \sum_l^{N_i} \left( 1 - \frac{d_{l,g_i}}{d_c} \right) s_{il} + \xi \sum_i^N \left( \sum_l^{N_i} s_{il} - 1 \right)^2 . \qquad (7)$$

Eq. (5) is now rewritten as follows:

$$h_{il}(t+1) = h_{il}(t) - \sum_k^{N_i} J_{ik,il}\, v_{ik}(t) + \eta_i(T_s) \left\{ 1 - \frac{1}{b_l} \left( q_l + \sum_{j \neq i}^{N_l} v_{jl}(t) \right) \right\} + \left( 1 - \eta_i(T_s) \right) \left( 1 - \frac{d_{l,g_i}}{d_c} \right) - 2\xi \left( \sum_k^{N_i} v_{ik}(t) - 1 \right) . \qquad (8)$$

The problem is how to improve the value of $\eta_i(T_s)$. We propose the following algorithm:

1. Set $\eta_i(1) = C$ for every $i$, where $C$ is a constant with $0 \leq C \leq 1$.
2. Node $i$ sends a packet at the $T_s$-th time step using $\eta_i(T_s)$.
3. $\eta_i(T_s + 1)$ is determined by
$$\eta_i(T_s + 1) = (1 - \alpha)\, \eta_i(T_s) + \alpha \left\{ \gamma\, \eta_{\tilde{\imath}(T_s)}(T_s) + r_i(T_s) \right\} . \qquad (9)$$
4. Go back to step 2.

The parameters $\alpha$ and $\gamma$ are a learning rate and a discount rate, respectively, with $\alpha \in [0, 1]$ and $\gamma \in [0, 1]$. In the present numerical simulations we use $\alpha = 0.1$ and $\gamma = 0.9$, after checking several values of these parameters. In Eq. (9), $\tilde{\imath}(T_s)$ is the node chosen to receive the packet sent from node $i$ at the $T_s$-th step. The reward $r_i(T_s)$ is defined as follows:

$$r_i(T_s) = \begin{cases} 0 & \text{if } q_{\tilde{\imath}(T_s)} < b_{\tilde{\imath}(T_s)} , \\ 1 - \gamma\, \eta_{\tilde{\imath}(T_s)}(T_s) & \text{if } q_{\tilde{\imath}(T_s)} = b_{\tilde{\imath}(T_s)} . \end{cases} \qquad (10)$$

In this way, if the buffer at node $\tilde{\imath}(T_s)$ is full of packets, the value of $\eta_i(T_s + 1)$ increases, and if the buffer is not full, it decreases; note that $0 \leq \eta_i(T_s) \leq 1$. Each node, acting as a learning agent, thus determines its own value of $\eta_i(T_s)$ according to the environment surrounding it.
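A minimal sketch of this per-node update, Eqs. (9) and (10), with illustrative names of our own:

```python
def update_eta(eta, i, j, queue, buf, alpha=0.1, gamma=0.9):
    """Update the control parameter of node i after it sends a packet
    toward node j, following Eqs. (9) and (10). Names are ours.

    eta   : per-node control parameters (mutable array or dict)
    queue : current queue length at each node
    buf   : buffer size of each node
    """
    # Eq. (10): the reward is nonzero only when the chosen node's buffer
    # is full, which pushes eta_i toward 1 (queue-avoiding routing)
    if queue[j] < buf[j]:
        r = 0.0
    else:
        r = 1.0 - gamma * eta[j]
    # Eq. (9): eta_i relaxes toward gamma*eta_j + r, staying within [0, 1]
    eta[i] = (1.0 - alpha) * eta[i] + alpha * (gamma * eta[j] + r)
    return eta
```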

We show the results obtained by numerical simulations in Fig. 3 for the computer network of Section 2, using the same setting except that $\eta_i(T_s)$ is used instead of $\eta$.

[Fig. 3. $M_r$, $N_a$ and $N_r$ as functions of $N_p$, with the reinforcement learning, for the network on the square lattice. See the caption of Fig. 2 for details.]

As seen in Fig. 3, the reinforcement learning works well in avoiding the occurrence of traffic congestion in the packet flow. We also found that the results do not depend much on the initial value $\eta_i(1)$, which is another favorable property.


4. Packet flow on computer networks with various types of topology

In this section, we investigate the packet flow on computer networks with various types of topology; we simply say "network" instead of "computer network" in this section. First, we investigate the packet flow on the network given by the fractal lattice shown in Fig. 4; this is a square lattice obtained from the Sierpinski Carpet of step 2 [13]. In Fig. 5, we show the results for $M_r$, $N_a$ and $N_r$ obtained without reinforcement learning. The results for $\eta = 0.1$ are very poor for all these quantities: for small values of $\eta$ the effect of shortest-path routing is overemphasized, so there are nodes at which many packets concentrate and which act as bottlenecks for the packet flow. In Fig. 6, we show the results for $M_r$, $N_a$ and $N_r$ obtained with reinforcement learning; as seen there, the packet flow performs well once reinforcement learning is introduced.

[Fig. 4. An example of a square lattice obtained from the Sierpinski Carpet of step 2.]

[Fig. 5. $M_r$, $N_a$ and $N_r$ as functions of $N_p$, without the reinforcement learning, for the network shown in Fig. 4. See the caption of Fig. 2 for details.]

[Fig. 6. $M_r$, $N_a$ and $N_r$ as functions of $N_p$, with the reinforcement learning, for the network shown in Fig. 4. See the caption of Fig. 2 for details.]

Next, we consider the packet flow on a small-world network, which is highly clustered and yet has small characteristic path lengths. We construct a small-world network according to the algorithm proposed by Watts and Strogatz [14] (a code sketch follows the two steps below):

1. First, $N$ nodes are placed on a ring and each node is connected to its $K$ nearest neighbors by undirected links, where $1 < K \ll N$.
2. Then each link is reconnected at random with probability $p$.
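A minimal sketch of this construction, not the authors' code; we assume the common convention in which k counts a node's total ring neighbors (the paper's K = 3 may instead count neighbors per side).

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring of n nodes, each joined to k/2 neighbors on either side,
    then every original ring edge rewired with probability p.
    Returns an adjacency set per node."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):                      # step 1: ring lattice
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    for i in range(n):                      # step 2: random rewiring
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            if j in adj[i] and rng.random() < p:
                new = rng.randrange(n)
                while new == i or new in adj[i]:   # no self-loops/duplicates
                    new = rng.randrange(n)
                adj[i].discard(j)
                adj[j].discard(i)
                adj[i].add(new)
                adj[new].add(i)
    return adj
```

For instance, `watts_strogatz(100, 6, 0.03)` produces a network of the scale studied below (the paper uses N = 100, K = 3, p = 0.03).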

We calculate the average shortest path length $L(p)$ and the clustering coefficient $C(p)$, defined as follows:

$$L(p) = \frac{2}{N(N+1)} \sum_{i > j} l_{ij} , \qquad (11)$$


where $l_{ij}$ is the shortest path length between node $i$ and node $j$, defined as the smallest number of hops along links, and

$$C(p) = \frac{\sum_i^N c_{i,3}}{\sum_i^N c_{i,2}} , \qquad (12)$$

where

$$c_{i,n} = \sum_{j > k} \delta_{(m_{i,j} + m_{j,k} + m_{k,i}),\, n} , \qquad (13)$$

and $m_{i,j}$ takes the value 1 when node $i$ and node $j$ are connected directly by a link and 0 otherwise; $\delta_{a,b}$ is the Kronecker delta. A code sketch of these two measures is given below. Fig. 7 shows $L(p)/L(0)$ and $C(p)/C(0)$ for one of the small-world networks obtained by the above algorithm, with $N = 100$ and $K = 3$.

[Fig. 7. The average shortest path length, $L(p)/L(0)$, and the clustering coefficient, $C(p)/C(0)$, as functions of $p$, for the small-world network with $N = 100$ and $K = 3$.]

free-networks by using the algorithm proposed by Barabasi and Albert [15]. See [16]and [17] for a general introduction to scale-free networks.

1. We start with $M_0$ nodes and no links; at every subsequent time step we add a new node with $M$ links.
2. The $M$ links of the newly added node are connected to nodes already in the network with probability $P_i$ given by
$$P_i = \frac{k_i + 1}{\sum_j (k_j + 1)} , \qquad (14)$$
where $k_i$ is the connectivity of node $i$.
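A minimal sketch of this growth rule, with names of our own; the +1 in the weights mirrors Eq. (14), which keeps the initially isolated nodes reachable.

```python
import random

def barabasi_albert(n, m, m0, seed=None):
    """Grow a network to n nodes: start from m0 isolated nodes, then add
    one node per step with m links whose targets are drawn with
    probability proportional to k_i + 1, as in Eq. (14)."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(m0)}
    for new in range(m0, n):
        nodes = list(adj)
        weights = [len(adj[u]) + 1 for u in nodes]   # k_i + 1 of Eq. (14)
        targets = set()
        while len(targets) < m:                      # m distinct targets
            (pick,) = rng.choices(nodes, weights=weights)
            targets.add(pick)
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
    return adj
```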

We note that the resulting network is a kind of random network with $t + M_0$ nodes and $Mt$ links after $t$ time steps. Barabasi and Albert have shown that networks obtained by their algorithm exhibit scale-free behavior. An example of the obtained networks is shown in Fig. 10 for $M = M_0 = 3$ and $N = 100$. We have confirmed that the degree exponent $\gamma$ is 2.92 for $N = 10^5$ and tends to 3 in the limit $N \to \infty$.

In Fig. 11, we show the results for $M_r$, $N_a$ and $N_r$ for the scale-free network of Fig. 10 with $N = 100$ and $M = M_0 = 3$, without reinforcement learning. Here we choose $\xi = 0.6$, a value different from the one used for the other networks. The value of $\xi$ is usually not so important; in a scale-free network, however, there exist hub nodes, which in our model carry many neurons. Many of these neurons try to fire at a hub node, and we therefore use a rather large value of $\xi$ to force only one neuron to fire there and thus save simulation time.

[Fig. 10. An example of the scale-free network with $N = 100$ and $M = M_0 = 3$ by Barabasi and Albert.]

[Fig. 11. $M_r$, $N_a$ and $N_r$ as functions of $N_p$, without the reinforcement learning, for the scale-free network shown in Fig. 10. See the caption of Fig. 2 for details.]


Of course, there is no big difference in the obtained results for different values of $\xi$. We find that traffic congestion occurs easily for $\eta = 0.1$ and $0.3$; this is because the hub nodes act as bottlenecks in the packet flow.


In the throughput, we find that there exist two peaks in Fig. 11. This behavior is completely different from that of the other networks. For example, for $\eta = 0.5$ we see two peaks, around $N_p = 5$ and $N_p = 37$, and one valley around $N_p = 20$. In order to understand this behavior, we have calculated the average usage of each link for packet communication and found that around $N_p = 20$ there exist local loops which trap packets.

When $N_p$ is not very large, the hub nodes are not yet full of packets. Around $N_p = 20$, however, the hub nodes hold many packets and are almost full, so packets try to move around them; this produces the local loops which trap packets. Around $N_p = 37$, there is little difference in the remaining buffer vacancies between hub and non-hub nodes, since the number of packets any node can hold is limited, and the throughput therefore increases again until the buffers become completely full. We give the results for $M_r$, $N_a$ and $N_r$ obtained with reinforcement learning in Fig. 12; we see once more that good performance of the packet flow is achieved when reinforcement learning is introduced, even for the scale-free network.

[Fig. 12. $M_r$, $N_a$ and $N_r$ as functions of $N_p$, with the reinforcement learning, for the scale-free network shown in Fig. 10. See the caption of Fig. 2 for details.]

5. Concluding remarks

In the present paper, we have proposed reinforcement learning in order to avoid traffic congestion in packet flow. As the basic model of packet routing, we have used the model proposed by Horiguchi and Ishioka [8]. We have found that if packets are not sent to nodes whose buffers are full of packets, serious congestion is observed in the computer network. By using the proposed reinforcement learning algorithm, we have succeeded in avoiding this serious congestion and, at the same time, found that the throughput is improved considerably. We have investigated the packet flow in networks with various types of topology, such as a network with fractal structure, a small-world network and a scale-free network. For all of these networks, the proposed reinforcement learning works very well.

As the usage of computer networks such as the Internet grows, the design of network structure becomes important for better communication of information; that is, the search for network structures that give efficient packet flow is important. One method for this purpose is as follows: first, define a cost function for the efficiency of packet communication; then obtain an optimized network structure by reconnecting links in the network so as to minimize the defined cost function. This problem is left for future work.

Acknowledgements

We are grateful to K. Tanaka, K. Katayama, C. Yamaguchi and J. Okubo for valuable discussions. This work was partially supported by Grant-in-Aid for Scientific Research no. 14084202 from MEXT of Japan and also by the Ishida foundation.


References

[1] P. Baldi, P. Frasconi, P. Smyth, Modeling the Internet and the Web, Wiley, Chichester, West Sussex, 2003.
[2] A.S. Tanenbaum, Computer Networks, third ed., Prentice-Hall, Englewood Cliffs, NJ, 1998.
[3] H.E. Rauch, T. Winarske, IEEE Control Syst. Mag. (1988) 26.
[4] I. Iida, A. Chungo, R. Yatsuboshi, Proc. IEEE Int. Conf. SMC (1989) 194.
[5] M.K. Mehmet Ali, F. Kamoun, IEEE Trans. Neural Networks 4 (1993) 941.
[6] H. Kurokawa, C.Y. Ho, S. Mori, Neural Networks 11 (1998) 347.
[7] J. Hertz, A. Krogh, R.G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Redwood City, CA, 1991.
[8] T. Horiguchi, S. Ishioka, Physica A 297 (2001) 521.
[9] T. Horiguchi, H. Takahashi, ICANN/ICONIP 2003 International Conference, Supplementary Proceedings, 2003, p. 358.
[10] T. Horiguchi, H. Takahashi, K. Hayashi, C. Yamaguchi, Proceedings of the 2003 Joint Workshop of Hayashibara Foundation and SMAPIP, 2003, p. 115.
[11] T. Horiguchi, H. Takahashi, K. Hayashi, C. Yamaguchi, Physica A 339 (2004) 653.
[12] R.S. Sutton, A.G. Barto, Reinforcement Learning, MIT Press, Cambridge, MA, 1998.
[13] H.-O. Peitgen, H. Jürgens, D. Saupe, Chaos and Fractals, Springer, New York, 1992.
[14] D.J. Watts, S.H. Strogatz, Nature 393 (1998) 440.
[15] A.-L. Barabasi, R. Albert, Science 286 (1999) 509.
[16] R. Albert, A.-L. Barabasi, Rev. Mod. Phys. 74 (2002) 47.
[17] S.N. Dorogovtsev, J.F.F. Mendes, Evolution of Networks, Oxford University Press, Oxford, 2003.