UNIVERSITY OF JYVÄSKYLÄ
Building NeuroSearch – Intelligent Evolutionary Search Algorithm For Peer-to-Peer Environment
Master’s Thesis by Joni Töyrylä, 3.9.2004
Mikko Vapa, researcher student
InBCT 3.2 Cheese Factory / P2P Communication
Agora Center
http://tisu.it.jyu.fi/cheesefactory
2004
Contents
• Resource Discovery Problem
• Related Work
• Peer-to-Peer Network
• Neural Networks
• Evolutionary Computing
• NeuroSearch
• Research Environment
• Research Cases
  – Fitness
  – Population
  – Inputs
  – Resources
  – Queriers
  – Brain Size
• Summary and Future
Resource Discovery Problem
• In the peer-to-peer (P2P) resource discovery problem, a P2P node decides, based only on local knowledge, which neighbors (if any) would be the best targets for the query in order to find the needed resource
• A good solution locates the predetermined number of resources using the minimal number of packets
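To make the problem concrete, the sketch below evaluates one query under purely local decisions: the query spreads from the querier, every node decides per neighbor whether to forward it, and the run reports how many resources were located and how many packets were spent. The graph representation, the hypothetical `decide` callback signature and the way the packet limit is applied are assumptions for illustration, not the simulator used in the thesis.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[int, List[int]]
# Hypothetical local decision rule:
# decide(graph, current, neighbor, hops, seen_before) -> True to forward.
Decide = Callable[[Graph, int, int, int, bool], bool]


def run_query(graph: Graph, resources: Dict[int, int], querier: int,
              decide: Decide, packet_limit: int = 300) -> Tuple[int, int]:
    """Spread one query using local decisions; return (found, packets)."""
    visited: Set[int] = {querier}
    found = 0    # resources located at nodes other than the querier
    packets = 0  # every forwarded message costs one packet
    queue = deque([(querier, 0, False)])  # (node, hops so far, seen before)
    while queue:
        node, hops, seen = queue.popleft()
        for neighbor in graph[node]:
            if not decide(graph, node, neighbor, hops, seen):
                continue
            if packets >= packet_limit:  # the search must always stop
                return found, packets
            packets += 1
            first_visit = neighbor not in visited
            if first_visit:
                visited.add(neighbor)
                found += resources.get(neighbor, 0)
            queue.append((neighbor, hops + 1, not first_visit))
    return found, packets


# Example: a 4-node ring and a flooding rule limited to 2 hops.
GRAPH: Graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
RESOURCES = {1: 1, 2: 1, 3: 2}


def flood_two_hops(graph: Graph, current: int, neighbor: int,
                   hops: int, seen_before: bool) -> bool:
    return hops < 2 and not seen_before


print(run_query(GRAPH, RESOURCES, 0, flood_two_hops))  # (4, 6)
```

Here flooding finds all 4 resources but spends 6 packets, two of which go back to the querier; a better rule would reach the same resources with fewer packets.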
NeuroSearch
• The NeuroSearch resource discovery algorithm uses neural networks and evolution to adapt its behavior to the given environment:
  – a neural network decides whether or not to pass the query further down each link
  – evolution breeds candidate networks and finds the best neural network within a large class of local search algorithms
[Figure: a node receives a query and, for each neighbor node, decides whether to forward the query]
NeuroSearch’s Inputs
• The internal structure of the NeuroSearch algorithm is a neural network with two hidden layers
• Multiple layers enable the algorithm to express non-linear behavior
• With enough neurons, the network can in principle approximate any decision function
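A minimal sketch of such a network, assuming seven inputs (bias included), two hidden layers sized 25:10 as in the brain-size experiments, and a single output passed through a threshold. The tanh hidden activation and the Gaussian initial weights are assumptions; the slides do not state which activation the thesis used.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def layer(weights: Matrix, inputs: List[float]) -> List[float]:
    """One hidden layer: each row holds one neuron's incoming weights."""
    # tanh is an assumed activation; the thesis may use another one.
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]


class NeuroNet:
    """Two hidden layers followed by a single thresholded output neuron."""

    def __init__(self, n_inputs: int, hidden1: int, hidden2: int,
                 rng: random.Random) -> None:
        self.w1 = random_matrix(hidden1, n_inputs, rng)
        self.w2 = random_matrix(hidden2, hidden1, rng)
        self.w_out = random_matrix(1, hidden2, rng)[0]

    def activation(self, inputs: List[float]) -> float:
        h1 = layer(self.w1, inputs)
        h2 = layer(self.w2, h1)
        return sum(w * h for w, h in zip(self.w_out, h2))

    def forward_query(self, inputs: List[float]) -> bool:
        # Threshold output: forward the query when the activation is >= 0.
        return self.activation(inputs) >= 0.0


net = NeuroNet(n_inputs=7, hidden1=25, hidden2=10, rng=random.Random(42))
print(net.forward_query([1.0, 2.0, 3.0, 5.0, 0.5, 0.0, 0.0]))
```

The non-linear hidden layers are what let the output depend on combinations of inputs rather than on a single weighted sum.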
NeuroSearch’s Inputs
• Bias is always 1 and gives a neuron a means to produce a non-zero output even when all other inputs are zero
• Hops is the number of links the message has traveled so far
• Neighbors (also known as currentNeighbors or MyNeighbors) is the number of neighbor nodes this node has
• Target’s neighbors (also known as toNeighbors) is the number of neighbor nodes the message’s target has
• Neighbor rank (also known as NeighborsOrder) tells how the target’s number of neighbors compares with that of the current node’s other neighbors
• Sent is a flag telling whether this node has already forwarded this message to the target node
• Received (also known as currentVisited) is a flag telling whether the current node has received this message earlier
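The seven inputs above can be assembled for one forwarding decision roughly as follows. This is a sketch: the exact Neighbor rank encoding is not given on the slide, so the share-of-neighbors formula here is a hypothetical choice (1.0 meaning the target has the most neighbors), and any input scaling the thesis may apply is omitted.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]


def degree(graph: Graph, node: int) -> int:
    return len(graph[node])


def build_inputs(graph: Graph, current: int, target: int, hops: int,
                 sent: bool, received: bool) -> List[float]:
    """Hypothetical encoding of the seven inputs, in slide order."""
    others = [n for n in graph[current] if n != target]
    target_degree = degree(graph, target)
    # Assumed rank encoding: the share of the other neighbors that have
    # fewer neighbors than the target (1.0 = target has the most).
    if others:
        rank = sum(degree(graph, n) < target_degree for n in others) / len(others)
    else:
        rank = 1.0
    return [
        1.0,                            # Bias
        float(hops),                    # Hops
        float(degree(graph, current)),  # Neighbors (currentNeighbors)
        float(target_degree),           # Target's neighbors (toNeighbors)
        rank,                           # Neighbor rank (NeighborsOrder)
        1.0 if sent else 0.0,           # Sent
        1.0 if received else 0.0,       # Received (currentVisited)
    ]


graph = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
print(build_inputs(graph, current=0, target=2, hops=1, sent=False, received=True))
# [1.0, 1.0, 3.0, 2.0, 0.5, 0.0, 1.0]
```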
NeuroSearch’s Training Program
• The neural network weights define how the neural network behaves, so they must be adjusted to the right values
• This is done using an iterative optimization process based on evolution and Gaussian mutation
[Figure: training process: define the network conditions → define the quality requirements for the algorithm → create candidate algorithms randomly → select the best ones for the next generation → breed a new population → iterate thousands of generations → finally select the best algorithm for these conditions]
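The loop in the figure can be sketched as a simple evolutionary algorithm over weight vectors. The population size of 24 comes from the thesis findings; keeping the better half unchanged, breeding only by Gaussian mutation (no crossover), sigma = 0.1 and uniform initial weights are assumptions. `toy_fitness` is a hypothetical stand-in for running the P2P queries.

```python
import random
from typing import Callable, List, Tuple

Weights = List[float]


def evolve(fitness: Callable[[Weights], float], n_weights: int,
           population_size: int = 24, generations: int = 200,
           sigma: float = 0.1, seed: int = 0) -> Tuple[Weights, float]:
    """Truncation selection plus Gaussian mutation (a sketch)."""
    rng = random.Random(seed)
    # Create candidate algorithms randomly.
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Select the best ones for the next generation (kept unchanged).
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]
        # Breed a new population: mutated copies of randomly chosen parents.
        children = [[w + rng.gauss(0.0, sigma) for w in rng.choice(parents)]
                    for _ in range(population_size - len(parents))]
        population = parents + children
    # Finally select the best algorithm for these conditions.
    best = max(population, key=fitness)
    return best, fitness(best)


def toy_fitness(weights: Weights) -> float:
    # Hypothetical stand-in objective; NeuroSearch would run queries here.
    return -sum((w - 0.5) ** 2 for w in weights)


best, score = evolve(toy_fitness, n_weights=5)
print(round(score, 4))
```

Because the best parents survive unchanged, the best fitness can never decrease from one generation to the next.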
Research Environment
• The peer-to-peer network being tested contained:
  – 100 power-law distributed P2P nodes with 394 links and 788 resources
  – Resources were distributed based on the number of connections each node has, meaning that high-connectivity nodes were more likely to answer queries
  – The topology was static, so nodes were not disappearing or moving
  – The querier and the queried resource were selected randomly, and 10 different queries were used in each generation (this was found to be enough to determine the overall performance of the neural network)
• Requirements for the fitness function were:
  – The algorithm should locate half of the available resources for every query (each obtained resource increased fitness by 50 points)
  – The algorithm should use as few packets as possible (each used packet decreased fitness by 1 point)
  – The algorithm should always stop (the stop limit for the number of packets was set to 300)
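The setup and scoring rules can be sketched as follows. Note that 788 = 2 × 394, the sum of all node degrees; the random degree-proportional placement below is only one assumed reading of "distributed based on the number of connections". The fitness function follows the stated rules (+50 per obtained resource, −1 per packet, at most 300 packets per query); summing over per-query `(replies, packets)` pairs is an assumption about how the 10 queries are combined.

```python
import random
from typing import Dict, Iterable, List, Tuple

Graph = Dict[int, List[int]]

REPLY_REWARD = 50   # each obtained resource: +50 points
PACKET_COST = 1     # each used packet: -1 point
PACKET_LIMIT = 300  # per-query stop limit


def place_resources(graph: Graph, total: int,
                    rng: random.Random) -> Dict[int, int]:
    """Assumed scheme: each resource goes to a node with probability ~ degree."""
    nodes = list(graph)
    degrees = [len(graph[n]) for n in nodes]
    counts = dict.fromkeys(nodes, 0)
    for node in rng.choices(nodes, weights=degrees, k=total):
        counts[node] += 1
    return counts


def fitness(results: Iterable[Tuple[int, int]]) -> int:
    """Sum 50 * replies - packets over (replies, packets), one pair per query."""
    total = 0
    for replies, packets in results:
        if packets > PACKET_LIMIT:
            raise ValueError("a query exceeded the packet stop limit")
        total += REPLY_REWARD * replies - PACKET_COST * packets
    return total


ring = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2], 4: []}
counts = place_resources(ring, total=8, rng=random.Random(1))
print(sum(counts.values()), counts[4])  # 8 0 (an isolated node gets nothing)

# Ten queries totalling 239 replies and 1290 packets:
print(fitness([(24, 129)] * 9 + [(23, 129)]))  # 10660
```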
Research Cases - Fitness
• The fitness value determines how good a neural network is compared to the others
• Even the smallest and simplest neural networks manage to reach fitness values over 10000
• For example, the fitness of a poorly performing NeuroSearch network is calculated as follows:
Fitness = 50 * replies - packets = 50 * 239 - 1290 = 10660
Note: Because of a bug, the Steiner tree does not locate half of the replies and thus gets a lower fitness than HDS (Highest Degree Search)
Research Cases – Random Weights
• 10 million new neural networks were randomly generated
• It seems that fitness values over 16000 cannot be obtained purely by guessing, and therefore an optimization method is needed
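The random-weights experiment amounts to pure random search: draw independent weight vectors and keep the best. A minimal sketch, assuming weights drawn uniformly from [-1, 1] (the slides do not give the distribution) and a caller-supplied fitness function. Every guess is independent, so nothing learned from a good guess is reused, which is why an optimization method is needed.

```python
import random
from typing import Callable, List, Optional, Tuple

Weights = List[float]


def random_search(fitness: Callable[[Weights], float], n_weights: int,
                  samples: int, seed: int = 0) -> Tuple[Optional[Weights], float]:
    """Guess weight vectors independently and keep the best one found."""
    rng = random.Random(seed)
    best: Optional[Weights] = None
    best_fit = float("-inf")
    for _ in range(samples):
        # Assumed sampling range for each weight.
        candidate = [rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
        score = fitness(candidate)
        if score > best_fit:
            best, best_fit = candidate, score
    return best, best_fit


def neg_sq(w: Weights) -> float:
    return -sum(x * x for x in w)


best, score = random_search(neg_sq, n_weights=5, samples=1000)
print(round(score, 4))
```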
Research Cases - Inputs
• Different inputs were tested individually and together to get a sense of which inputs are important
Using Hops we can, for example, design rules such as: “I have travelled 4 hops, I will not send further”
“Target node contains 10 neighbors, I will send further”
“Target node has the most neighbors compared to all my other neighbors, I will not send further”
“I have received this query earlier, I will not send further”
“I have 7 neighbors, I will send further”
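The example rules can be written as single-input decision functions, roughly what a network restricted to one input can end up expressing. Reading each quoted number as a cut-off (for instance "forward only while hops < 4") and encoding neighbor rank as 1.0 for "target has the most neighbors" are assumptions; the rule names are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical single-input rules; True means "send further".
# Reading each quoted number as a cut-off is an assumption.
RULES: Dict[str, Callable[[float], bool]] = {
    # "I have travelled 4 hops, I will not send further"
    "hops": lambda hops: hops < 4,
    # "Target node contains 10 neighbors, I will send further"
    "target_neighbors": lambda count: count >= 10,
    # "Target node has the most neighbors ..., I will not send further"
    # (assumed encoding: rank 1.0 = target has the most neighbors)
    "neighbor_rank": lambda rank: rank < 1.0,
    # "I have received this query earlier, I will not send further"
    "received": lambda received: received == 0.0,
    # "I have 7 neighbors, I will send further"
    "neighbors": lambda count: count >= 7,
}


def decide(rule_name: str, value: float) -> bool:
    return RULES[rule_name](value)


print(decide("hops", 3), decide("hops", 4))  # True False
```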
The results indicate that using a single topological input is more efficient than combining it with other topological inputs (the explanation for this behavior is still unclear)
The results also indicate that using a single query-related input is more efficient than combining it with other query-related inputs (the explanation for this behavior is also unclear)
Research Cases - Resources
• The required percentage of resources was varied and the results were compared to other local search algorithms (Highest Degree Search and Breadth-First Search) and to near-optimal search trees (Steiner)
Note: The Breadth-First Search curve needs to be halved because its percentage was calculated relative to half of the resources and not to all available resources
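For comparison, here are simplified sketches of the two local baselines. The flooding variant forwards once from each node to all neighbors except the sender, up to a hop limit `ttl`. The Highest Degree Search variant walks to the highest-degree unvisited neighbor and, as a simplification, does not backtrack. Both are illustrative assumptions, not the implementations used in the thesis.

```python
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]


def bfs_search(graph: Graph, resources: Dict[int, int], querier: int,
               ttl: int) -> Tuple[int, int]:
    """Flood up to `ttl` hops; each node forwards once, not back to its sender."""
    visited = {querier}
    found = packets = 0
    queue = deque([(querier, None, 0)])  # (node, sender, hops)
    while queue:
        node, sender, hops = queue.popleft()
        if hops >= ttl:
            continue
        for neighbor in graph[node]:
            if neighbor == sender:
                continue
            packets += 1
            if neighbor not in visited:
                visited.add(neighbor)
                found += resources.get(neighbor, 0)
                queue.append((neighbor, node, hops + 1))
    return found, packets


def hds_search(graph: Graph, resources: Dict[int, int], querier: int,
               max_steps: int) -> Tuple[int, int]:
    """Walk to the highest-degree unvisited neighbor (no backtracking)."""
    visited = {querier}
    found = packets = 0
    node = querier
    for _ in range(max_steps):
        candidates = [n for n in graph[node] if n not in visited]
        if not candidates:
            break
        node = max(candidates, key=lambda n: len(graph[n]))
        visited.add(node)
        packets += 1
        found += resources.get(node, 0)
    return found, packets


GRAPH: Graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
RESOURCES = {1: 1, 2: 1, 3: 2}
print(bfs_search(GRAPH, RESOURCES, 0, ttl=2))        # (4, 4)
print(hds_search(GRAPH, RESOURCES, 0, max_steps=5))  # (4, 3)
```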
Research Cases - Queriers
• The effect of reducing the number of queriers per generation used to calculate a neural network’s fitness value was examined
• It was found that the number of queriers can be reduced from 50 to 10 while still giving reliable fitness values, which speeds up the optimization process significantly
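One way to reason about "reliable" is the standard error of the fitness estimate. The sketch below averages per-query fitness over randomly chosen queriers and reports the standard error. `query_fitness` is a hypothetical callback that runs one query from the given node. Statistically, with the same per-query spread, dropping from 50 to 10 queriers increases the standard error by a factor of √5 ≈ 2.24 while cutting the simulation cost fivefold.

```python
import math
import random
import statistics
from typing import Callable, List, Tuple


def estimate_fitness(query_fitness: Callable[[int], float], nodes: List[int],
                     n_queriers: int, seed: int = 0) -> Tuple[float, float]:
    """Mean per-query fitness over random queriers, plus its standard error."""
    rng = random.Random(seed)
    scores = [query_fitness(rng.choice(nodes)) for _ in range(n_queriers)]
    mean = statistics.fmean(scores)
    if n_queriers < 2:
        return mean, 0.0
    return mean, statistics.stdev(scores) / math.sqrt(n_queriers)


# Hypothetical per-query fitness that depends on the querier's id.
mean, stderr = estimate_fitness(lambda q: float(q), list(range(100)), 10)
print(round(mean, 1), round(stderr, 1))
```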
Research Cases – Brain Size
• The number of neurons on the first and second hidden layers was varied
• It was found that many different kinds of NeuroSearch algorithms exist
Research Cases – Brain Size
• Optimizing larger neural networks also takes more time
Research Cases – Brain Size
• There is also an interesting breadth-first search vs. depth-first search dilemma:
  – small networks obtain the best fitness values with a breadth-first search strategy,
  – medium-sized networks obtain the best fitness values with a depth-first search strategy, and
  – large networks obtain the best fitness values with a breadth-first search strategy
• Overall, it seems that the best fitness, 18091.0, can be obtained with a breadth-first strategy using 5 hops and a 25:10 network (25 neurons on the first hidden layer and 10 on the second hidden layer)
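Every connection weight is one more dimension the evolutionary search has to tune, which is one reason larger networks take longer to optimize. A small count, assuming 7 inputs (bias included), one output, full connectivity between consecutive layers, and no separate per-neuron bias terms (an assumption).

```python
def n_weights(n_inputs: int, hidden1: int, hidden2: int, n_outputs: int = 1) -> int:
    """Connection weights in a fully connected net with two hidden layers.

    Assumes the bias enters as one of the inputs (no per-neuron bias terms).
    """
    return n_inputs * hidden1 + hidden1 * hidden2 + hidden2 * n_outputs


for h1, h2 in [(20, 10), (25, 10), (50, 10)]:
    print(f"{h1}:{h2} -> {n_weights(7, h1, h2)} weights")
# 20:10 -> 350 weights
# 25:10 -> 435 weights
# 50:10 -> 860 weights
```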
25:10 had the greatest fitness value
Would more than 100,000 generations increase the fitness when the 1st hidden layer contains more than 25 neurons?
20:10 had the greatest average hops value
What happens if the number of neurons on the 2nd hidden layer is increased? Will the average number of hops decrease?
Summary and Future
• The main findings of the thesis were that:
  – A population size of 24 and a query amount of 10 are sufficient
  – An optimization algorithm needs to be used, because randomly guessing neural network weights does not give good results
  – Individual inputs give better results than a combination of two inputs (however, the best fitnesses can be obtained by using all 7 inputs)
  – By choosing a specific set of inputs, NeuroSearch may imitate any existing search algorithm or behave as a combination of any of them
  – The optimal algorithm (Steiner) has an efficiency of 99%, whereas the best known local search algorithm (HDS) achieves 33% and NeuroSearch 25%
  – A breadth-first search vs. depth-first search dilemma exists, but no good explanation can be given yet
Summary and Future
• In addition to the problems shown so far, the following are suggested for future work on NeuroSearch:
  – More inputs should be designed such that they provide useful information, e.g., the number of received replies, the inputs used by the Highest Degree Search algorithm, and inputs that define how many forwarding decisions have already been made in the current decision round and how many are still left
  – A probability-based output instead of a threshold function could also be tested
  – The neural network architecture and the size of the population could be adjusted dynamically during evolution to find an optimal structure more easily
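The difference between the current threshold output and a probability-based output can be sketched as two decision functions on the network's final activation. Using the logistic sigmoid to turn the activation into a forwarding probability is an assumption; the slides only suggest that some probability-based output could be tested.

```python
import math
import random


def threshold_output(activation: float) -> bool:
    """Deterministic: forward whenever the activation is non-negative."""
    return activation >= 0.0


def probabilistic_output(activation: float, rng: random.Random) -> bool:
    """Stochastic: forward with probability sigmoid(activation) (assumed)."""
    # Numerically stable logistic sigmoid (avoids overflow in math.exp).
    if activation >= 0.0:
        probability = 1.0 / (1.0 + math.exp(-activation))
    else:
        z = math.exp(activation)
        probability = z / (1.0 + z)
    return rng.random() < probability


rng = random.Random(0)
share = sum(probabilistic_output(0.0, rng) for _ in range(10_000)) / 10_000
print(threshold_output(0.0), share)
```

With a threshold, the same inputs always give the same decision; with a probabilistic output, an activation near zero forwards the query about half of the time, which lets evolution explore borderline decisions.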