ARTICLE IN PRESS
Physica A 349 (2005) 329–348
www.elsevier.com/locate/physa
Reinforcement learning for congestion-avoidance in packet flow
Tsuyoshi Horiguchi a,*, Keisuke Hayashi a, Alexei Tretiakov b
a Department of Computer and Mathematical Sciences, Graduate School of Information Sciences, Tohoku University, Aoba-ku, Sendai 980-8579, Japan
b Department of Information Systems, Massey University, Private Bag 11222, Palmerston North 5301, New Zealand
Received 5 August 2004
Available online 10 November 2004
Abstract
The occurrence of congestion in packet flow is one of the unfavorable problems in packet communication over computer networks, and hence its avoidance should be investigated. We use the neural network model for packet routing control in a computer network proposed in a previous paper by Horiguchi and Ishioka (Physica A 297 (2001) 521). If we assume that packets are not sent to nodes whose buffers are already full, then we find that traffic congestion occurs when the number of packets in the network exceeds some critical value. In order to avoid the congestion, we introduce reinforcement learning for a control parameter in the neural network model. We find that the congestion is avoided by the reinforcement learning and that, at the same time, good throughput performance is obtained. We investigate the packet flow on computer networks of various topologies, such as a regular network, a network with fractal structure, a small-world network, and a scale-free network.
© 2004 Elsevier B.V. All rights reserved.
PACS: 05.90.+m; 05.50.+q; 07.05.Kf; 84.35.+i
Keywords: Congestion control; Reinforcement learning; Computer network; Mean-field approximation;
Packet flow; Small-world network; Scale-free network
0378-4371/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.physa.2004.10.015
*Corresponding author. Tel.: +81 22 217 5842; fax: +81 22 217 5851.
E-mail address: [email protected] (T. Horiguchi).
1. Introduction
In today's society, anyone who wishes can be connected through the Internet and communicate with others, and we can easily obtain information necessary for daily life through Web pages. The Internet and the Web are expanding rapidly day by day, and the available information is also increasing exponentially [1]. It is therefore important to have effective packet-flow control through the Internet and within computer networks. Packet-flow control in computer networks includes routing control, traffic control, congestion control, sequence control, and so on. Among these, we consider routing control and congestion control in the present paper. A computer network is assumed to consist of nodes, links, and a process. A node is a personal computer, a router, a workstation, or the like. A link is a communication line. A process is a mathematical model for the network layer [2]. We should have decentralized, autonomous, and adaptive control of packet
routing in large-scale computer networks. One of the main issues in packet routing control is to find a suitable route (or path) along which a packet is sent from its source node to its destination node. It is known that the shortest path is not always the best solution for packet routing. Owing to a trade-off between the queue length and the distance from the present node of a packet to its destination node, a next-shortest path may be preferable for sending the packet. This is a kind of optimization problem. Hence, the control of packet flow has been investigated by using neural networks [3–6], which are a powerful method for optimization problems. Techniques developed in statistical physics are very useful for optimization problems, especially those formulated in terms of neural networks [7]. Horiguchi and Ishioka have proposed a neural network model for the routing control of packet flow in large-scale computer networks within the framework of statistical physics [8]. Horiguchi et al. have also proposed a neural network model for routing control under priority links; priority links were introduced for cases in which some links have higher reliability or higher capacity than others, and/or some nodes can process packets faster than others [9,10]. When the environment for packet flow changes with time, goal-directed learning using two neural network models at each node, together with the concept of priority links, has been introduced [11]. In those previous investigations, a packet disappears when it arrives at its
destination node, or is discarded when it is sent to a node whose buffer is already full. If a packet is discarded before reaching its destination, information is lost in the computer network, although no serious congestion occurs except when the average number of packets is close to the buffer size. This setting, in which packets are discarded before arriving at their destinations, leads to a loss of information, which is unfavorable in some situations. We expect that, if a packet is not discarded but instead stays at its node until the next node has a vacancy in its buffer, serious traffic congestion might occur in the computer network. The congestion
control in the computer network is one of the important problems to be investigated for better control of packet flow, and it has not been investigated in detail so far. For this reason, we change the setting used in the previous papers in order to investigate the congestion problem: instead of being discarded, packets are not sent to nodes whose buffers are full, but stay at their current nodes until vacancies appear in the buffers. We then find that serious congestion occurs. Next, we investigate a method by which the congestion is avoided, namely we introduce reinforcement learning for this purpose. We find that the reinforcement learning works well regardless of the network topology, and that the throughput is improved considerably. In Section 2, we describe a neural network model for routing control of packet
flow under the setting that packets are not sent to nodes whose buffers are full, but stay at their current nodes. We then investigate the packet flow and find by numerical simulation that serious congestion occurs. In Section 3, we introduce reinforcement learning in order to avoid the congestion. In Section 4, we investigate the packet flow on computer networks with various types of topology, such as a network with fractal structure, a small-world network, and a scale-free network. Concluding remarks are given in Section 5.
2. A neural network model for packet flow
We use the simple neural network model for optimal packet routing control in a decentralized, autonomous, and adaptive way proposed in a previous paper [8]. We first explain the computer network considered in the present paper. The computer network is assumed to consist of nodes, links, and a process. We consider N nodes. At each node i, we put a neural network consisting of N_i neurons, namely {n_ik | k ∈ {i_1, i_2, ..., i_{N_i}}}, where N_i is the number of neurons at node i. We assume that neuron n_ik at node i controls the sending of a packet from node i to node k. When there is a neuron n_ik at node i, there is a neuron n_ki at node k. We assume that there is a link between node i and node k through neurons n_ik and n_ki; each link is assumed to be full duplex. We apply the first-in-first-out (FIFO) rule for packets if one of the neurons fires at a node. An example of a computer network is shown in Fig. 1, in which a node is shown by an open circle and a link by a solid line; we assume that the computer network is an arrangement of nodes on a square lattice with nearest-neighbor and next-nearest-neighbor connections. In the neural network at a node, each neuron is assumed to be fully connected with the other neurons within the same node. By considering the trade-off between the queue length at each node and the shortest path of a packet to its destination node, we define an energy function for the neural network
E = \frac{1}{2}\sum_{i}^{N}\sum_{k}^{N_i}\sum_{l}^{N_i} J_{ik,il}\, s_{ik} s_{il}
  - \eta \sum_{i}^{N}\sum_{l}^{N_i} \left\{ 1 - \frac{1}{b_l}\left( q_l + \frac{1}{2}\sum_{j \ne i}^{N_l} s_{jl} \right) \right\} s_{il}
  - (1-\eta) \sum_{i}^{N}\sum_{l}^{N_i} \left( 1 - \frac{d_{l,g_i}}{d_c} \right) s_{il}
  + \xi \sum_{i}^{N} \left( \sum_{l}^{N_i} s_{il} - 1 \right)^2 ,   (1)

Fig. 1. An example of a computer network with 49 nodes on a square lattice.
where s_{ik} is the state variable of neuron n_{ik}, with s_{ik} ∈ {0, 1}, and J_{ik,il} is the connection weight between neurons n_{ik} and n_{il}. We assume that J_{ik,il} = 1, J_{ik,il} = J_{il,ik}, and J_{ik,ik} = 0 in the present paper. Here b_l is the buffer size at node l, d_c a constant related to a characteristic path length of the computer network, q_l the queue length at node l, and d_{l,g_i} the shortest distance of a packet, ready to go out from node l, to its destination node g_i. The last term in Eq. (1) is a constraint term which forces only one neuron to fire at each node. The two control parameters are denoted by η and ξ. In order to introduce soft control of the packet flow, we use a mean-field approximation. The energy given by Eq. (1) is then expressed in the following way by using a mean-field approximation [10]:
E_{\mathrm{mf}} = \frac{1}{2}\sum_{i}^{N}\sum_{k}^{N_i}\sum_{l}^{N_i} J_{ik,il}\, v_{ik} v_{il}
  - \eta \sum_{i}^{N}\sum_{l}^{N_i} \left\{ 1 - \frac{1}{b_l}\left( q_l + \frac{1}{2}\sum_{j \ne i}^{N_l} v_{jl} \right) \right\} v_{il}
  - (1-\eta) \sum_{i}^{N}\sum_{l}^{N_i} \left( 1 - \frac{d_{l,g_i}}{d_c} \right) v_{il}
  + \xi \sum_{i}^{N} \left( \sum_{l}^{N_i} v_{il} - 1 \right)^2 .   (2)
Here v_{il} is calculated by

v_{il} = \langle s_{il} \rangle = \frac{1}{1 + \exp\{-\beta h_{il}\}}   (3)
in terms of an internal effective field h_{il}, where β = 1/kT as usual; T is the absolute temperature in statistical physics and serves as a parameter for the soft control in the present system. For numerical simulations, we assume that the state of a neuron, say s_{il}, depends on time t, and hence the corresponding h_{il} and v_{il}, as well as E_mf, are functions of t. An equation for the dynamics has been derived in such a way that the energy E_mf(t) decreases as a function of time t. The time dependence of the internal effective field h_{il}(t) is then given as follows [10]:
\frac{d}{dt} h_{il}(t) = - \sum_{k}^{N_i} J_{ik,il}\, v_{ik}(t)
  + \eta \left\{ 1 - \frac{1}{b_l}\left( q_l + \sum_{j \ne i}^{N_l} v_{jl}(t) \right) \right\}
  + (1-\eta)\left( 1 - \frac{d_{l,g_i}}{d_c} \right)
  - 2\xi \left( \sum_{k}^{N_i} v_{ik}(t) - 1 \right) .   (4)
In numerical simulations, we use a discrete-time renewal process for Eq. (4) in the calculation of the internal effective fields. Namely, we approximate Eq. (4) as follows:
h_{il}(t+1) = h_{il}(t) - \sum_{k}^{N_i} J_{ik,il}\, v_{ik}(t)
  + \eta \left\{ 1 - \frac{1}{b_l}\left( q_l + \sum_{j \ne i}^{N_l} v_{jl}(t) \right) \right\}
  + (1-\eta)\left( 1 - \frac{d_{l,g_i}}{d_c} \right)
  - 2\xi \left( \sum_{k}^{N_i} v_{ik}(t) - 1 \right) .   (5)
The average state of each neuron is then obtained by using Eqs. (3) and (5) at each iteration step. We determine whether neuron n_{il} fires or not by applying a threshold θ to the thermal average v_{il}: namely, we assume that the state of neuron n_{il} takes 1 if v_{il} ≥ θ and 0 if v_{il} < θ, where we use θ = 0.9. When neuron n_{il} fires, a packet ready to go out at node i is sent to node l according to the FIFO rule. The numerical simulations have been done for the computer network with 49 nodes shown in Fig. 1, as an example of a computer network with regularly arranged nodes. As an initial condition, we create N_p packets randomly on the nodes of the network, with q_i < b_i; each packet has a destination node chosen randomly, namely from the uniform distribution. We have performed numerical simulations in which a packet is sent from node i to the node determined by calculating {v_{il}}; a packet is not sent if the node indicated by its neuron is already full of packets in its buffer. We say in this situation that a packet is not accepted by a node even though it is sent to the node. A packet disappears from the network only when it arrives at its destination node; a new packet is then created at a randomly chosen node in order to keep the total number of packets constant, and a destination node is assigned to it randomly. We have made 30 iterations of each neural network in order to obtain {v_{il}}, and 250 iterations of sending packets in the network; the sending of packets has been done
synchronously. We have made 20 simulations for each set of parameters to obtain the statistical average of the results. We define the average number of packets N_p existing in the computer network and the average number of packets moved from one node to another, M_r, respectively, as follows:
N_p = \frac{1}{N} \sum_{i}^{N} q_i , \qquad M_r = \frac{N_f - N_r}{N_t} ,   (6)
where N_f is the total number of nodes at which a neuron has fired, N_r the total number of packets not accepted by a node even though they were sent to it, and N_t the total number of nodes with at least one packet. We note that M_r corresponds to the average movement of packets. We also calculate the total number of packets which have arrived at their destinations and denote it by N_a; N_a corresponds to the throughput. The parameters are chosen as N = 49, d_c = 6, η = 0.7, ξ = 0.1, and β = 2.0 for the computer network shown in Fig. 1. We assume b_i = 50 for all nodes i. We show the results for M_r, N_a, and N_r in Fig. 2, as examples, as functions of N_p. We find that the average movement of packets M_r and the throughput N_a drop suddenly at some value N_p^c of the average number of packets N_p. At the same time, the number of packets N_r not accepted by a node increases suddenly at N_p^c. These results indicate that there is traffic congestion of packet flow in the network for N_p > N_p^c. We have found from the obtained results that serious congestion occurs more easily for smaller values of η, but the throughput is larger for smaller values of η if there is no congestion.
3. Reinforcement learning for packet flow
As seen in Section 2, serious congestion occurs for N_p > N_p^c. The congestion is not preferable for the efficiency of the packet flow. We observe that N_p^c is larger for larger values of η, since several paths to the destination then become possible. On the other hand, N_a is smaller for larger values of η. From these observations, we consider that, as long as no congestion is occurring, we should use smaller values of η in order to increase the throughput N_a; but once congestion has started to occur, it is better to use larger values of η in order to avoid it. Hence it is important to adjust the parameter η, and we introduce reinforcement learning as proposed by Sutton and Barto [12] for this purpose. Reinforcement learning is a kind of unsupervised learning, characterized by the absence of an external supervisor. There is a trade-off between exploration and exploitation: exploitation uses the knowledge already acquired by an agent, namely each node in the present case, from rewards obtained in the past, while exploration tries actions that may lead to better selections in the future. In reinforcement learning, the agent must take actions in order to get a lot of
Fig. 2. The average number of packets moved from a node to other nodes, M_r, the total number of packets arrived at their destinations, N_a, and the total number of packets not accepted by a node even though they were sent to it, N_r, respectively, as functions of the average number of packets N_p in the network. These are the results without the reinforcement learning for the network with N = 49 on the square lattice shown in Fig. 1. The parameters are set as ξ = 0.3 and β = 3.0. The symbols for the curves are given at the top of the figure.
reward through interactive learning. We can also say that reinforcement learning is a kind of goal-directed learning for an uncertain environment. We now let the value of η in the reinforcement learning depend on the node i and on the number of times T_s that each node has sent a packet. Hence we rewrite the energy for the neural network as follows:
E = \frac{1}{2}\sum_{i}^{N}\sum_{k}^{N_i}\sum_{l}^{N_i} J_{ik,il}\, s_{ik} s_{il}
  - \sum_{i}^{N} \eta_i(T_s) \sum_{l}^{N_i} \left\{ 1 - \frac{1}{b_l}\left( q_l + \frac{1}{2}\sum_{j \ne i}^{N_l} s_{jl} \right) \right\} s_{il}
  - \sum_{i}^{N} \left( 1 - \eta_i(T_s) \right) \sum_{l}^{N_i} \left( 1 - \frac{d_{l,g_i}}{d_c} \right) s_{il}
  + \xi \sum_{i}^{N} \left( \sum_{l}^{N_i} s_{il} - 1 \right)^2 .   (7)
Eq. (5) is now rewritten as follows:
h_{il}(t+1) = h_{il}(t) - \sum_{k}^{N_i} J_{ik,il}\, v_{ik}(t)
  + \eta_i(T_s) \left\{ 1 - \frac{1}{b_l}\left( q_l + \sum_{j \ne i}^{N_l} v_{jl}(t) \right) \right\}
  + \left( 1 - \eta_i(T_s) \right) \left( 1 - \frac{d_{l,g_i}}{d_c} \right)
  - 2\xi \left( \sum_{k}^{N_i} v_{ik}(t) - 1 \right) .   (8)
The problem is how to improve the value of η_i(T_s). We propose the following algorithm:

1. Set η_i(1) = C for every i, where C is a constant with 0 ≤ C ≤ 1.
2. Node i sends a packet at the T_s-th time step by using η_i(T_s).
3. η_i(T_s + 1) is determined by

\eta_i(T_s+1) = (1-\alpha)\,\eta_i(T_s) + \alpha \left\{ \gamma\, \eta_{\tilde{\imath}(T_s)}(T_s) + r_i(T_s) \right\} .   (9)

4. Go back to step 2.

The parameters α and γ are a learning rate and a discount rate, respectively, with α ∈ [0, 1] and γ ∈ [0, 1]. In the present numerical simulations, we use α = 0.1 and γ = 0.9, after checking several values of these parameters. In Eq. (9), ĩ(T_s) is the number of the node chosen for a packet to be sent from node i at the T_s-th step, and r_i(T_s) is defined as follows:

r_i(T_s) = \begin{cases} 0 & \text{if } q_{\tilde{\imath}(T_s)} < b_{\tilde{\imath}(T_s)} , \\ 1 - \gamma\, \eta_{\tilde{\imath}(T_s)}(T_s) & \text{if } q_{\tilde{\imath}(T_s)} = b_{\tilde{\imath}(T_s)} . \end{cases}   (10)

In this way, if the buffer at node ĩ(T_s) is full of packets, the value of η_i(T_s + 1) increases, whereas if the buffer is not full, it decreases. Note that 0 ≤ η_i(T_s) ≤ 1. The learning agent, namely each node i, determines the value of η_i(T_s) assigned to it according to the environment surrounding it. We show the results obtained by numerical simulations in Fig. 3 for the computer network given in Section 2, using the same setting except that η_i(T_s) is used instead
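Steps 1–4 amount to an exponential moving average driven by the buffer state of the chosen node. A minimal sketch in plain Python, with the buffer occupancy fed to the rule being a toy value assumed for illustration, is:

```python
# Sketch of the update rule of Eqs. (9) and (10) for the control parameter
# eta_i(T_s). The buffer occupancies below are toy values for illustration.

alpha, gamma = 0.1, 0.9   # learning rate and discount rate, as in the text
b = 50                    # buffer size, assumed equal for all nodes

def update_eta(eta_i, eta_next, q_next):
    """One step of Eq. (9); eta_next and q_next refer to the chosen node i~(T_s)."""
    # Eq. (10): zero reward while the chosen buffer has room,
    # r = 1 - gamma * eta_next when it is full.
    r = 0.0 if q_next < b else 1.0 - gamma * eta_next
    return (1.0 - alpha) * eta_i + alpha * (gamma * eta_next + r)

eta = 0.5                 # eta_i(1) = C
# Repeatedly sending toward a full buffer drives eta_i up toward 1,
# i.e., toward congestion-avoiding routing.
for _ in range(100):
    eta = update_eta(eta, eta_next=eta, q_next=b)
print(round(eta, 3))      # approaches 1
```

Conversely, while the chosen buffers have vacancies the reward vanishes and η_i relaxes downward, toward γ·η_ĩ, favoring shortest-path routing, which matches the qualitative behavior described in the text.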
Fig. 3. M_r, N_a, and N_r as functions of N_p, respectively, with the reinforcement learning, for the network on the square lattice. See the caption of Fig. 2 for details.
of η. As seen in Fig. 3, the reinforcement learning works well in avoiding the occurrence of traffic congestion in the packet flow. We also found that the obtained results do not depend much on the initial value η_i(1), which is another preferable property.
4. Packet flow on computer networks with various types of topology
In this section, we investigate the packet flow on computer networks with various types of topology; we simply say a network instead of a computer network in this section. First, we investigate the packet flow on the network given by the fractal lattice shown in Fig. 4; this is a square lattice obtained from the Sierpinski carpet of step 2 [13]. In Fig. 5, we show the results for M_r, N_a, and N_r obtained without introducing the reinforcement learning. We find that the results for η = 0.1 are very poor for these quantities: there are nodes at which many packets concentrate and which act as bottlenecks for the packet flow, since the effect of the shortest-path routing is overemphasized for smaller values of η. We show the results for M_r, N_a, and N_r obtained by using the reinforcement learning in Fig. 6. As seen in Fig. 6, we obtain a good performance of the packet flow when the reinforcement learning is introduced. Next we consider the packet flow on a small-world network, which is highly clustered and yet has small characteristic path lengths. We have constructed a small-world network according to the algorithm proposed by Watts and Strogatz [14]:
1. First, N nodes are put on a ring and each node is connected to its K nearest neighbors by undirected links, where 1 < K ≪ N.
2. Then each link is reconnected at random with probability p.

We calculate the average shortest path length L(p) and the clustering coefficient C(p), defined as follows:
L(p) = \frac{2}{N(N+1)} \sum_{i > j} l_{ij} ,   (11)
Fig. 4. An example of a square lattice obtained from the Sierpinski Carpet of step 2.
Fig. 5. M_r, N_a, and N_r as functions of N_p, respectively, without the reinforcement learning, for the network shown in Fig. 4. See the caption of Fig. 2 for details.
Fig. 6. M_r, N_a, and N_r as functions of N_p, respectively, with the reinforcement learning, for the network shown in Fig. 4. See the caption of Fig. 2 for details.
Fig. 7. The average shortest path length, L(p)/L(0), and the clustering coefficient, C(p)/C(0), as functions of p, respectively, for the small-world network with N = 100 and K = 3.
where l_{ij} is the shortest path length between node i and node j, defined by the smallest number of hops along links, and
C(p) = \frac{\sum_{i}^{N} c_{i,3}}{\sum_{i}^{N} c_{i,2}} ,   (12)
where

c_{i,n} = \sum_{j > k} \delta_{(m_{i,j} + m_{j,k} + m_{k,i}),\, n}   (13)
and m_{i,j} takes the value 1 when node i and node j are connected directly by a link and 0 otherwise; δ_{a,b} is the Kronecker delta. We give the results for L(p)/L(0) and C(p)/C(0) for one of the small-world networks obtained by the above algorithm in Fig. 7, for N = 100 and K = 3. In Fig. 8, we show the results for M_r, N_a, and N_r for a typical case of the small-world network, namely p = 0.03, without the reinforcement learning. We see that serious congestion occurs in the packet flow for η = 0.1. This is due to the fact that there exist some shorter paths, and packets try to use those paths. In Fig. 9, we give the results for M_r, N_a, and N_r obtained with the reinforcement learning. As seen in Fig. 9, a good performance of the packet flow is again achieved when the reinforcement learning is introduced for the small-world network. Finally, we consider the packet flow on scale-free networks. We construct the scale-free networks by using the algorithm proposed by Barabasi and Albert [15]; see [16] and [17] for a general introduction to scale-free networks.
1. We first put M_0 nodes without any links; then, at every time step, we add a new node with M links.
2. We connect the M links of the newly added node to nodes already existing in the network with the probability P_i given by

P_i = \frac{k_i + 1}{\sum_j (k_j + 1)} ,   (14)

where k_i is the connectivity of node i.

Fig. 8. M_r, N_a, and N_r as functions of N_p, respectively, without the reinforcement learning, for the small-world network with p = 0.03. See the caption of Fig. 2 for details.

Fig. 9. M_r, N_a, and N_r as functions of N_p, respectively, with the reinforcement learning, for the small-world network with p = 0.03. See the caption of Fig. 2 for details.
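A minimal sketch of this growth rule in plain Python (not the authors' code) might look as follows; the step count is chosen so that the final network has M_0 + t = 100 nodes, as in Fig. 10, and the "+1" in Eq. (14) is what allows the initially isolated nodes to be selected.

```python
import random

# Sketch of the Barabasi-Albert-type growth rule of Eq. (14):
# P_i proportional to k_i + 1, so even degree-0 nodes can be chosen.

def grow_network(M0=3, M=3, T=97, seed=1):
    random.seed(seed)
    degree = [0] * M0                        # M0 initial nodes, no links
    edges = []
    for _ in range(T):                       # add one node with M links per step
        new = len(degree)
        weights = [k + 1 for k in degree]    # Eq. (14): P_i ~ k_i + 1
        targets = set()
        while len(targets) < min(M, new):    # M distinct attachment targets
            targets.add(random.choices(range(new), weights=weights)[0])
        degree.append(0)
        for t in targets:
            edges.append((new, t))
            degree[new] += 1
            degree[t] += 1
    return degree, edges

degree, edges = grow_network()               # N = M0 + T = 100 nodes, M*T links
print(len(degree), len(edges))               # 100 291
```

After t time steps the sketch indeed yields t + M_0 nodes and Mt links, as noted in the text, and repeated growth concentrates links on early, high-degree nodes (hubs).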
We note that the resulting network is a kind of random network with t + M_0 nodes and Mt links after t time steps. Barabasi and Albert have shown that networks obtained by this algorithm show scale-free behavior. An example of the obtained networks is shown in Fig. 10 for M = M_0 = 3 and N = 100. We have confirmed that the degree exponent γ is 2.92 for N = 10^5 and tends to 3 in the limit N → ∞. In Fig. 11, we show the results for M_r, N_a, and N_r for the scale-free network shown
in Fig. 10 with N = 100 and M = M_0 = 3, when there is no reinforcement learning. We choose the parameter ξ = 0.6; this value of ξ is different from the one used for the other networks. The value of ξ is usually not so important. However, in the scale-free network there exist hub nodes, and at a hub node there are many neurons in our model. These neurons all try to fire at the hub node, and hence we force only one neuron to fire there, in order to save simulation time, by using a rather large value of
Fig. 10. An example of the scale-free network with N = 100 and M = M_0 = 3, constructed by the algorithm of Barabasi and Albert.
Fig. 11. M_r, N_a, and N_r as functions of N_p, respectively, without the reinforcement learning, for the scale-free network shown in Fig. 10. See the caption of Fig. 2 for details.
ξ. Of course, there is no big difference in the obtained results for different values of ξ. We find that traffic congestion easily occurs for η = 0.1 and 0.3, because the hub nodes act as bottlenecks in the packet flow. In the throughput,
we find that there exist two peaks in Fig. 11. This behavior is completely different from that of the other networks. For example, we see two peaks around N_p = 5 and 37 and one valley around N_p = 20 for η = 0.5. In order to understand this situation, we
Fig. 12. M_r, N_a, and N_r as functions of N_p, respectively, with the reinforcement learning, for the scale-free network shown in Fig. 10. See the caption of Fig. 2 for details.
have calculated the average usage of links for packet communication and found that there exist local loops which trap packets around N_p = 20. When N_p is not so large, the hub nodes are not yet full of packets. Around N_p = 20, however, the hub nodes hold many packets and are almost full, so packets try to move while avoiding those hub nodes; thus local loops arise which trap packets. Around N_p = 37, there is no longer a big difference in the remaining buffer vacancies between hub and non-hub nodes, since the number of packets in the network is nearly at its limit; the throughput then increases again until the buffers become full of packets. We give the results for M_r, N_a, and N_r obtained with the reinforcement learning in Fig. 12, where we see again that a good performance of the packet flow is achieved when the reinforcement learning is introduced, even for the scale-free network.
5. Concluding remarks
In the present paper, we have proposed reinforcement learning in order to avoid traffic congestion in packet flow. As the basic model of packet routing, we have used the model proposed by Horiguchi and Ishioka [8]. We have found that if packets are not sent to nodes whose buffers are full, then serious congestion is observed in the computer network. By using the proposed reinforcement-learning algorithm, we have succeeded in avoiding this serious congestion and, at the same time, have found that the throughput is improved considerably. We have investigated the packet flow in networks with various types of topology, such as a network with fractal structure, a small-world network, and a scale-free network. For all of these networks, the proposed reinforcement learning works very well.
As the usage of computer networks such as the Internet increases, the design of the network structure becomes important for better communication of information; namely, the search for a network structure which gives efficient packet flow is important. One method for this purpose is as follows: first, define a cost function for the efficiency of packet communication; then obtain an optimized network structure by reconnecting links in the network so as to minimize the defined cost function. This problem is left for future work.
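Purely as an illustration of how such a search could be organized (the paper leaves this as future work), the sketch below runs a greedy rewiring loop with a placeholder cost function, here the variance of node degrees; a realistic study would replace this placeholder with a measured packet-flow efficiency.

```python
import random

# Schematic link-rewiring search: repeatedly replace one link by a random
# new link and keep the move if a (placeholder) cost does not increase.

def cost(degree):
    m = sum(degree) / len(degree)
    return sum((k - m) ** 2 for k in degree)   # placeholder: degree variance

def degrees(es, N):
    d = [0] * N
    for a, b in es:
        d[a] += 1
        d[b] += 1
    return d

def rewire_search(N=20, steps=200, seed=2):
    random.seed(seed)
    # start from a random graph (toy initial condition)
    edges = {tuple(sorted(random.sample(range(N), 2))) for _ in range(2 * N)}
    best = cost(degrees(edges, N))
    for _ in range(steps):
        old = random.choice(sorted(edges))
        new = tuple(sorted(random.sample(range(N), 2)))
        if new in edges:
            continue
        trial = (edges - {old}) | {new}        # reconnect one link
        c = cost(degrees(trial, N))
        if c <= best:                          # greedy acceptance
            edges, best = trial, c
    return edges, best

edges, best = rewire_search()
print(len(edges), round(best, 2))
```

A simulated-annealing acceptance rule would be a natural refinement of the greedy step, since the rewiring landscape is likely to have many local minima.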
Acknowledgements
We are grateful to K. Tanaka, K. Katayama, C. Yamaguchi and J. Okubo for valuable discussions. This work was partially supported by a Grant-in-Aid for Scientific Research (no. 14084202) from MEXT of Japan and also by the Ishida Foundation.
References
[1] P. Baldi, P. Frasconi, P. Smyth, Modeling the Internet and the Web, Wiley, Chichester, West Sussex, 2003.
[2] A.S. Tanenbaum, Computer Networks, third ed., Prentice-Hall, Englewood Cliffs, NJ, 1998.
[3] H.E. Rauch, T. Winarske, IEEE Control Syst. Mag. (1988) 26.
[4] I. Iida, A. Chungo, R. Yatsuboshi, Proc. IEEE Int. Conf. SMC (1989) 194.
[5] M.K. Mehmet Ali, F. Kamoun, IEEE Trans. Neural Networks 4 (1993) 941.
[6] H. Kurokawa, C.Y. Ho, S. Mori, Neural Networks 11 (1998) 347.
[7] J. Hertz, A. Krogh, R.G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Redwood City, CA, 1991.
[8] T. Horiguchi, S. Ishioka, Physica A 297 (2001) 521.
[9] T. Horiguchi, H. Takahashi, ICANN/ICONIP 2003 International Conference, Supplementary
Proceedings, 2003, p. 358.
[10] T. Horiguchi, H. Takahashi, K. Hayashi, C. Yamaguchi, Proceedings of 2003 Joint Workshop of
Hayashibara Foundation and SMAPIP, 2003, p. 115.
[11] T. Horiguchi, H. Takahashi, K. Hayashi, C. Yamaguchi, Physica A 339 (2004) 653.
[12] R.S. Sutton, A.G. Barto, Reinforcement Learning, MIT Press, Cambridge, MA, 1998.
[13] H.-O. Peitgen, H. Jurgens, D. Saupe, Chaos and Fractals, Springer, New York, 1992.
[14] D.J. Watts, S.H. Strogatz, Nature 393 (1998) 440.
[15] A.-L. Barabasi, R. Albert, Science 286 (1999) 509.
[16] R. Albert, A.-L. Barabasi, Rev. Mod. Phys. 74 (2002) 47.
[17] S.N. Dorogovtsev, J.F.F. Mendes, Evolution of Networks, Oxford University Press, Oxford, 2003.