A Model of Information Foraging via Ant Colony Simulation
Matthew Kusner
Information Foraging
Theory Background
– People search for information in roughly the same way that animals search for food in their surroundings.
Information Scent
– Ex: “the text associated with Web links” (Fu & Pirolli, 2007)
– Background knowledge
– Recommendations
Ant Colony Simulation
Pheromone trails
– Laid by ants who've found food.
– Followed by other ants with probability p.
– Path Evaporation
– Path Optimization
Simulation specifics
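The pheromone mechanics listed above can be sketched roughly as follows. This is a minimal illustration, not the simulation used in the talk: the function names, the deposit rule (stronger trails for nearer food), the evaporation rate, and the choice to explore uniformly at random when an ant does not follow a trail are all assumptions.

```python
import random

def choose_food(pheromone, p, rng):
    """Pick a food index: follow pheromone with probability p, else explore."""
    total = sum(pheromone)
    if total > 0 and rng.random() < p:
        # Follow existing trails in proportion to their strength.
        return rng.choices(range(len(pheromone)), weights=pheromone, k=1)[0]
    # No trail yet, or the ant ignores it: uniform random choice.
    return rng.randrange(len(pheromone))

def evaporate(pheromone, rate):
    """Path evaporation: every trail loses a fixed fraction per step."""
    return [level * (1.0 - rate) for level in pheromone]

def simulate(distances, n_ants, n_steps, p=0.8, rate=0.1, seed=0):
    """Return the number of ant visits per food source."""
    rng = random.Random(seed)
    pheromone = [0.0] * len(distances)
    visits = [0] * len(distances)
    for _ in range(n_steps):
        for _ in range(n_ants):
            food = choose_food(pheromone, p, rng)
            visits[food] += 1
            # Assumed deposit rule: nearer food leaves a stronger trail.
            pheromone[food] += 1.0 / distances[food]
        pheromone = evaporate(pheromone, rate)
    return visits

# Hypothetical food distances, chosen only for illustration.
ant_visits = simulate([1.0, 5.0, 10.0], n_ants=20, n_steps=50)
```

Each call to `simulate` yields an ant-visit vector with one entry per food source, which is the kind of vector the talk later compares against user visits.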
AOL Data Set
– 21 million queries (March 1 – May 31, 2006)
– 650k users
– 19 million click-through events
– Quantities: query, time of query, click URL, user ID, clicked link rank
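As a rough sketch of how click-through records like these could be reduced to a visit count per website for one query: the record field names (`user_id`, `query`, `click_url`, `rank`) and the sample rows below are hypothetical and do not reflect the actual file layout of the data set.

```python
from collections import Counter
from urllib.parse import urlparse

def visits_per_website(records, query):
    """Count clicks per website (host name) for a single query string."""
    counts = Counter()
    for rec in records:
        # Skip other queries and query events that had no click.
        if rec["query"] == query and rec["click_url"]:
            site = urlparse(rec["click_url"]).netloc
            counts[site] += 1
    return counts

# Made-up sample rows, for illustration only.
sample_records = [
    {"user_id": 1, "query": "vacation", "click_url": "http://www.example.com/a", "rank": 1},
    {"user_id": 2, "query": "vacation", "click_url": "http://www.example.com/b", "rank": 1},
    {"user_id": 3, "query": "vacation", "click_url": "http://www.example.org/", "rank": 2},
    {"user_id": 3, "query": "vacation", "click_url": "", "rank": None},
    {"user_id": 4, "query": "rhino", "click_url": "http://www.example.net/", "rank": 1},
]
site_counts = visits_per_website(sample_records, "vacation")
```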
Information Foraging → Ant Colony
– user → ant
– clicked link → food
– information scent → pheromone path
– website importance → food distance
where website importance is defined by:
– 1. Rank
– 2. Popularity of website
– 3. Combination of above methods
Distancing Methods
• Ranking
• Popularity
• Combination
[based on data in Joachims et al., 2005]
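The slides do not give formulas for the three distancing methods, so the following is only one hypothetical way to turn website importance into a food distance. All three functions, the inverse-share rule for popularity, and the `weight` parameter are assumptions made for illustration.

```python
def rank_distance(rank):
    """Assumed rule: a link at rank 1 is closest; distance grows with rank."""
    return float(rank)

def popularity_distance(site_clicks, total_clicks):
    """Assumed rule: distance is the inverse of the site's share of clicks."""
    share = site_clicks / total_clicks
    return 1.0 / share

def combined_distance(rank, site_clicks, total_clicks, weight=0.5):
    """Assumed rule: weighted average of the rank and popularity distances."""
    by_rank = rank_distance(rank)
    by_popularity = popularity_distance(site_clicks, total_clicks)
    return weight * by_rank + (1.0 - weight) * by_popularity
```

Under these assumed rules, a website that is ranked high and clicked often ends up as nearby food, so ants are more likely to reach it.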
Results
• AOL user-visit per website vector
– [numWvisits_1, numWvisits_2, ..., numWvisits_n]
• Simulation ant-visit per food vector
– [numAvisits_1, numAvisits_2, ..., numAvisits_n]
• Pearson Correlation Score (PCS)
• Bootstrapping → 95% Coverage Interval
– (AOL_data_i, simulation_data_i) selection with replacement
• Permutation Test → p-value
– Shuffle AOL vector
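The comparison steps above can be sketched in plain Python: a Pearson correlation between the two visit vectors, a 95% interval obtained by resampling (AOL, simulation) pairs with replacement, and a p-value obtained by shuffling the AOL vector. The function names, the number of resamples, the one-sided p-value, and the convention of returning 0.0 for a constant vector are assumptions, not details from the talk.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumed convention when one vector is constant
    return cov / (sx * sy)

def resampled_interval(x, y, n_resamples=2000, seed=0):
    """95% interval of the correlation from resampling pairs with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(x, y))
    scores = []
    for _ in range(n_resamples):
        sample = [rng.choice(pairs) for _ in pairs]
        xs, ys = zip(*sample)
        scores.append(pearson(xs, ys))
    scores.sort()
    low = scores[int(0.025 * n_resamples)]
    high = scores[int(0.975 * n_resamples) - 1]
    return low, high

def shuffle_p_value(x, y, n_shuffles=2000, seed=0):
    """One-sided p-value: how often a shuffled x correlates at least as well."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(x)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if pearson(shuffled, y) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)
```

Shuffling breaks the pairing between websites and food sources, so the p-value estimates how likely the observed correlation would be if the simulation carried no information about real user visits.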
Query       Type of distancing  # of users  # of clicked links  # of distinct websites visited  Average PCS  Average 95% CI Start  Average 95% CI End  Significant p-val?
vacation    ranking             125         59                  19                              0.8182       0.3203                0.9364              Yes
vacation    popularity          125         59                  19                              0.1296       -0.1768               0.6624
vacation    combination         125         59                  19                              0.1488       -0.3819               0.3920
rhino       ranking             39          25                  6                               0.7631       -0.4781               0.9854
rhino       popularity          39          25                  6                               0.3906       -0.2484               0.9919
rhino       combination         39          25                  6                               0.2013       -0.7389               0.9657
zebra       ranking             53          61                  12                              -0.1825      -0.5426               0.4706
zebra       popularity          53          61                  12                              -0.0110      -0.4667               0.5079
zebra       combination         53          61                  12                              0.1558       -0.3655               0.6754
lion        ranking             52          39                  9                               0.6118       -0.1797               0.9214
lion        popularity          52          39                  9                               0.0699       -0.5776               0.7296
lion        combination         52          39                  9                               0.0304       -0.6170               0.6609
football    ranking             194         56                  21                              0.5358       -0.0952               0.9301
football    popularity          194         56                  21                              0.2693       -0.1583               0.6722
football    combination         194         56                  21                              0.4149       -0.0223               0.7612
basketball  ranking             220         74                  16                              0.7137       -0.4225               0.9529
basketball  popularity          220         74                  16                              0.2228       -0.1755               0.6455
basketball  combination         220         74                  16                              0.1415       -0.3470               0.6661
Results
• Queries with significant p-values:
– “vacation” (ranking), “baseball” (ranking), “reebok” (ranking), “adidas” (ranking), “marbles” (ranking), “helicopter” (ranking), “car” (ranking), “potatoes” (ranking), “coffee” (ranking), “farming” (ranking), “rock” (popularity), “shirts” (ranking), “playstation” (ranking), “sega” (popularity), “tom cruise” (ranking), “mel gibson” (ranking), “burger king” (ranking), “chicago” (ranking), “los angeles” (ranking), and “paris” (ranking)
• Distancing methods without 95% CI overlap:
– Ranking:
• “potatoes”: overlaps with neither popularity nor combination
• “shirts”: no overlap with popularity
• “playstation”: no overlap with popularity
• “burger king”: no overlap with combination
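The overlap check behind this list can be sketched as a simple interval test. The function name is an assumption; the example intervals are the average 95% CI values reported for “vacation” in the table above.

```python
def intervals_overlap(a, b):
    """True if closed intervals a = (start, end) and b = (start, end) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Average 95% CI values for "vacation" from the results table.
vacation_ranking = (0.3203, 0.9364)
vacation_popularity = (-0.1768, 0.6624)
vacation_combination = (-0.3819, 0.3920)

ranking_vs_popularity = intervals_overlap(vacation_ranking, vacation_popularity)
ranking_vs_combination = intervals_overlap(vacation_ranking, vacation_combination)
```

Both comparisons come out as overlapping, which is consistent with “vacation” not appearing in the list of queries without overlap.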
Discussion
• Disadvantages of popularity and combination methods
– “vacation” example
• Possible reasons for 95% CI overlap
– Randomness
– Disregard of structure
• Significance of queries with low p-values
– Search engine matching
• Future directions
– Different Simulation
– Other similarity metrics
– Random beginnings
References
• Fu, W., & Pirolli, P. (2007). SNIF-ACT: a cognitive model of user navigation on the World Wide Web. Human-Computer Interaction, 22(4), 355-412.
• Joachims, T., Granka, L., Pan, B., Hembrooke, H., & Gay, G. (2005). Accurately interpreting clickthrough data as implicit feedback. Proceedings of the ACM Conference on Research and Development in Information Retrieval (SIGIR).