Modelling the Web: Examples of Modelling Text, Knowledge Networks and Physical-Social Systems

Posted on 11 August 2014

TRANSCRIPT

<ul>
<li> Steffen Staab, staab@uni-koblenz.de, WeST Web Science &amp; Technologies, University of Koblenz-Landau, Germany. Modelling the Web: Examples of Modelling Text, Knowledge Networks and Physical-Social Systems </li>
<li> What do people want from the Web? The Web as storage (library, memory); as a tool (search, transaction); as a social medium (communication, cooperation); as a mirror of the self (identification, outreach). </li>
<li> What are some of the footprints people leave? </li>
<li> My agenda in the large: Web content (discovering patterns, building tools, understanding); Web interaction (monitoring, exploiting, guiding, understanding); Web evolution (monitoring, predicting, guiding, understanding). </li>
<li> My agenda for today, covering Web content, Web interaction and Web evolution: 1. Modelling text; 2. Modelling network evolution; 3. Modelling physical-social data. </li>
<li> Autocompletion of queries: what completes "UK is ..."? </li>
<li> Language models: what follows "UK is"? The model estimates the conditional probability $P(w_n \mid w_1 \ldots w_{n-1})$ of the next word given the preceding words. Issue: long word sequences are rarely observed. </li>
<li> Modified Kneser-Ney smoothing of n-grams: if a sequence is hard to observe, approximate its probability recursively, observing the marginal frequencies of shorter (lower-order) sequences. Problem: if the last word in the sequence is rare, the overall sequence will be rare, and the approximation will be of low quality. </li>
<li> Generalized language models [ACL14]: if a sequence is too hard to observe, approximate it recursively based on marginal probabilities of ... Core idea of the formal solution: recursively applicable, commutative skip operators. </li>
<li> Improvement of GLMs [ACL14]. Evaluation measure: perplexity. Data set: English Wikipedia, different sample sizes. Relative improvement: 2.6% (most training data, smallest model) to 13.9% (least training data, largest model). (Chart: normalized perplexity.) </li>
<li> Outlook for generalized language models: correcting mistakes that all tools make (lack of appropriate models); other operators, e.g. for "the wild black cat": delete ("the black cat") or part-of-speech ("the ADJ ADJ cat"); applications, e.g. next-word prediction; other data structures: tree-like data, graph data (proposal for Google); current focus: Semantic Web. </li>
<li> Evolution of networks [ICWSM13]: given a training network, predict additions (link prediction problem) and removals (unlink prediction problem). Markov assumption: history is irrelevant. </li>
<li> Related work in brief: a prediction feature $f$ assigns a score to a node pair $(i, j)$; $f(i, j) &gt; f(i, k)$ implies that $(i, j)$ is ranked above $(i, k)$. Link prediction: the edge is likelier to be added. Unlink prediction: the edge is likelier to be removed. </li>
<li> Related work in brief (continued): static features such as degree, common neighbours, path3, local clustering coefficient/embeddedness, ... </li>
<li> The snapshot view: link and unlink prediction [ICWSM13]. Unlink prediction is much more difficult than link prediction. </li>
<li> Related work in brief: additions and removals are predicted from the training network under the Markov assumption (history irrelevant). Advantage: a general model. Disadvantage: a general model. Idea: keep the generality, improve the prediction. </li>
<li> Our approach, part 1. Hypothesis: temporal information generally improves prediction. Idea: take into account (1) the nodes concerned and (2) their neighbourhood. </li>
<li> Our approach, part 2. Dynamic features: recency and longevity; extrapolation for temporal preferential attachment. </li>
<li> Evaluation &amp; discussion (excerpt): temporal link prediction is significantly better, but only slightly; temporal unlink prediction is always significantly improved; temporal preferential attachment performs best. (Chart: AUC for baseline, qualitative, quantitative and extrapolation.) </li>
<li> Outlook for the evolution of networks: temporal dynamics are still underexplored (lack of datasets!); next experiments: Twitter followers, Xing.de. Unlinks lead to link recommendations: a new Wikipedia link (reorganization of Wikipedia pages!), a new job, a new friend. </li>
<li> Tagged photos with geo-coordinates from Flickr (map with tag sets such as "fish, rice", "seafood, shrimp", "lobster, wine", "pizza, wine", "pasta, wine", "coffee"). </li>
<li> Tasks: discovering topics, finding clusters (map with topic words such as seafood, fish, lobster, shrimp, crab, salmon, wine, pizza, italian, pasta, coffee). </li>
<li> Challenge: cultural areas, country borders, geographical features and other geographical observations exhibit complex spatial distributions (map source: wikipedia.org). </li>
<li> Existing approaches: Gaussian regions (A. Ahmed, L. Hong and A. Smola, 2013, following Yin et al. 2011 and Sizov 2010). </li>
<li> MGTM 1: global topic clustering. </li>
<li> MGTM 2: determining neighbourhoods. </li>
<li> MGTM 3: derived topic model, with cluster adjacency, dependencies between document-specific topic distributions, and exchange of topic information between clusters. </li>
<li> MGTM 4: exchange of topic information between clusters. </li>
<li> Evaluation: anecdotal, perplexity, gaming. Gaming study: intrusion detection, precision over 8 topics (average / median): LGTA 0.60 / 0.58; basic model 0.64 / 0.58; MGTM 0.78 / 0.75. </li>
<li> Outlook for LDA with structure: texts + social network structures (scientometry, xing.de); Web pages + user visits (chefkoch.de). </li>
<li> Future: knowledge about social aspects is needed; CS-style models for the social sciences. </li>
<li> References: [ACL14] R. Pickhardt, T. Gottron, M. Körner, P. G. Wagner, T. Speicher, S. Staab. A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser Ney Smoothing. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Baltimore, June 22-27, 2014. [WSDM14] C. Kling, J. Kunegis, S. Sizov, S. Staab. Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections. In: Proceedings of the 7th ACM Conference on Web Search and Data Mining (WSDM 2014), New York, USA, February 24-28, 2014. [ICWSM13] J. Preusse, J. Kunegis, M. Thimm, T. Gottron, S. Staab. Structural Changes in Collaborative Knowledge Networks. In: Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media (ICWSM 2013), Boston, July 8-10, 2013. </li>
<li> WeST Web Science &amp; Technologies, team and research: Semantic Web; Social Web &amp; Web Retrieval; Interactive Web &amp; Human Computing; Web &amp; Economy; Software &amp; Services; Computational Social Science. Thank you! </li>
<li> Maslow's pyramid of needs </li>
</ul>
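The language-model slide concerns the conditional probability of the next word given its predecessors. Below is a minimal sketch of the maximum-likelihood estimate $P(w_n \mid w_1 \ldots w_{n-1}) = c(w_1 \ldots w_n) / \sum_w c(w_1 \ldots w_{n-1} w)$. It also shows how completions for a prompt such as "UK is" can be ranked. The toy corpus and the names `ngram_counts`, `mle_prob` and `complete` are hypothetical illustrations, not part of the talk.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def mle_prob(word, history, tokens):
    """Maximum-likelihood P(word | history) = c(history + word) / sum_w c(history + w)."""
    counts = ngram_counts(tokens, len(history) + 1)
    # Denominator: how often the history is followed by any word at all.
    total = sum(c for gram, c in counts.items() if gram[:-1] == history)
    return counts[history + (word,)] / total if total else 0.0


def complete(history, tokens, k=3):
    """Return up to k next-word candidates for the history, most probable first."""
    counts = ngram_counts(tokens, len(history) + 1)
    followers = Counter()
    for gram, c in counts.items():
        if gram[:-1] == history:
            followers[gram[-1]] += c
    return [w for w, _ in followers.most_common(k)]


# Hypothetical toy corpus standing in for a query log.
corpus = "the uk is a country the uk is an island the uk is a monarchy".split()
print(complete(("uk", "is"), corpus))  # ['a', 'an']
print(mle_prob("a", ("uk", "is"), corpus))
```

On real query logs, most long histories have zero counts. This is the sparsity issue the slide raises, and it is why smoothing is needed.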
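Kneser-Ney smoothing interpolates two estimates. The first is a discounted higher-order estimate. The second is a lower-order "continuation" distribution that counts how many distinct contexts a word follows. The sketch below is the simpler interpolated Kneser-Ney for bigrams with a single absolute discount `discount` (assumed to be 0.75). It is not the modified variant with three discounts discussed in the talk, and `train_kn_bigram` is a hypothetical name.

```python
from collections import Counter, defaultdict


def train_kn_bigram(tokens, discount=0.75):
    """Interpolated Kneser-Ney bigram model with one absolute discount.

    Modified Kneser-Ney would instead use three discounts (for counts 1, 2, 3+).
    Returns a function prob(word, prev) giving P_KN(word | prev).
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    history_total = Counter()        # c(v): how often v is followed by some word
    followers = defaultdict(set)     # distinct words seen after v
    predecessors = defaultdict(set)  # distinct words seen before w
    for (v, w), c in bigrams.items():
        history_total[v] += c
        followers[v].add(w)
        predecessors[w].add(v)
    bigram_types = len(bigrams)

    def p_continuation(word):
        # Share of distinct bigram types that end in `word`.
        return len(predecessors.get(word, ())) / bigram_types

    def prob(word, prev):
        c_prev = history_total[prev]
        if c_prev == 0:
            return p_continuation(word)  # unseen history: lower-order model only
        discounted = max(bigrams[(prev, word)] - discount, 0.0) / c_prev
        # Probability mass freed by discounting, spread via the continuation model.
        weight = discount * len(followers[prev]) / c_prev
        return discounted + weight * p_continuation(word)

    return prob


tokens = "the cat sat on the mat the cat ate".split()
p = train_kn_bigram(tokens)
print(round(sum(p(w, "the") for w in set(tokens)), 6))  # 1.0
```

The discounted mass is redistributed rather than thrown away, so each conditional distribution still sums to one. An unseen pair such as ("the", "ate") also receives some probability.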
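The generalized-language-model slide builds on commutative skip operators. This sketch assumes that a skip operator replaces the word at one position with a wildcard. The placeholder `_` and the helper names are hypothetical. The code checks commutativity on the example phrase from the outlook slide, then enumerates the skipped variants of an n-gram. The last, predicted word is never skipped.

```python
from itertools import combinations

WILDCARD = "_"  # hypothetical placeholder for a skipped position


def skip(ngram, i):
    """Skip operator: replace the word at position i by the wildcard."""
    return ngram[:i] + (WILDCARD,) + ngram[i + 1:]


def skipped_variants(ngram):
    """All variants obtained by skipping any subset of the history positions.

    The last (predicted) word is never skipped in this sketch, so an n-gram
    yields 2 ** (n - 1) variants, including the unskipped original.
    """
    history_positions = range(len(ngram) - 1)
    variants = set()
    for k in range(len(ngram)):
        for positions in combinations(history_positions, k):
            gram = ngram
            for i in positions:
                gram = skip(gram, i)
            variants.add(gram)
    return variants


phrase = ("the", "wild", "black", "cat")
# Commutativity: the order in which positions are skipped does not matter.
assert skip(skip(phrase, 0), 2) == skip(skip(phrase, 2), 0)
print(sorted(skipped_variants(phrase)))
```

Because the operators commute, a set of skipped positions identifies a variant uniquely. Enumerating subsets is therefore enough.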
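Perplexity is the evaluation measure on the GLM results slide. It is the exponentiated average negative log-probability a model assigns to held-out tokens, and lower is better. The sketch assumes that "relative improvement" means the relative reduction of perplexity with respect to the baseline. The numbers in the usage lines are illustrative only.

```python
import math


def perplexity(token_probs):
    """exp of the average negative log-probability the model gave each test token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


def relative_improvement(baseline_ppl, new_ppl):
    """Relative perplexity reduction, e.g. 0.026 corresponds to 2.6 %."""
    return (baseline_ppl - new_ppl) / baseline_ppl


# A model that spreads probability uniformly over 4 words has perplexity 4.
print(round(perplexity([0.25] * 4), 6))  # 4.0
print(round(relative_improvement(200.0, 194.8), 3))  # 0.026
```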
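The related-work slides rank node pairs by a feature score $f(i, j)$. The sketch computes some of the listed static features on an undirected graph stored as adjacency sets. Two readings are assumptions:

- "degree" is taken as the product of both endpoint degrees (preferential attachment);
- "path3" is taken as the number of length-3 walks, $(A^3)_{ij}$.

All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Undirected graph as a dict mapping each node to its set of neighbours."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def degree_product(g, i, j):
    """Degree feature read as preferential attachment: deg(i) * deg(j)."""
    return len(g[i]) * len(g[j])


def common_neighbours(g, i, j):
    return len(g[i] & g[j])


def path3(g, i, j):
    """Number of walks i-u-v-j of length 3, i.e. the (i, j) entry of A^3."""
    return sum(len(g[u] & g[j]) for u in g[i])


def local_clustering(g, i):
    """Fraction of pairs of i's neighbours that are themselves linked."""
    k = len(g[i])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(g[i], 2) if b in g[a])
    return 2 * links / (k * (k - 1))


def rank_candidates(g, i, candidates, feature):
    """Order candidate partners j of i by descending feature score f(i, j)."""
    return sorted(candidates, key=lambda j: feature(g, i, j), reverse=True)


g = build_graph([("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")])
print(rank_candidates(g, "a", ["e", "d"], common_neighbours))  # ['d', 'e']
```

For unlink prediction, the same scores would be computed on existing edges. The ranking is then read as the likelihood of removal.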
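The dynamic-feature slide names recency, longevity and an extrapolated temporal preferential attachment, but the transcript gives no formulas. The sketch uses one plausible reading, which is an assumption and not necessarily the definition in [ICWSM13]:

- recency is the time since a node's latest edge event;
- longevity is the time since its first edge event;
- each degree is extrapolated linearly from its growth over a recent window, and the two extrapolated degrees are multiplied.

Only edge additions are tracked, for brevity. `EdgeEvent` and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EdgeEvent:
    """Hypothetical record of one edge addition at time t."""
    u: str
    v: str
    t: float


def event_times(events, node):
    return [e.t for e in events if node in (e.u, e.v)]


def recency(events, node, now):
    """Time since the node's most recent edge event (assumed definition)."""
    times = event_times(events, node)
    return now - max(times) if times else float("inf")


def longevity(events, node, now):
    """Time since the node's first edge event (assumed definition)."""
    times = event_times(events, node)
    return now - min(times) if times else 0.0


def degree_at(events, node, t):
    """Degree of the node counting only additions up to time t."""
    return sum(1 for e in events if e.t <= t and node in (e.u, e.v))


def extrapolated_degree(events, node, now, window=1.0, horizon=1.0):
    """Linear extrapolation of the degree `horizon` time units ahead."""
    past = degree_at(events, node, now - window)
    current = degree_at(events, node, now)
    return current + (current - past) / window * horizon


def temporal_preferential_attachment(events, i, j, now, window=1.0, horizon=1.0):
    """Product of both nodes' extrapolated degrees (assumed definition)."""
    return (extrapolated_degree(events, i, now, window, horizon)
            * extrapolated_degree(events, j, now, window, horizon))


events = [EdgeEvent("a", "b", 1.0), EdgeEvent("b", "c", 1.0),
          EdgeEvent("a", "c", 2.0), EdgeEvent("a", "d", 3.0)]
print(temporal_preferential_attachment(events, "a", "b", now=3.0))  # 8.0
```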
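The network evaluation slide reports AUC. AUC equals the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair, with ties counted as one half. A positive pair is an edge that was actually added, or, for unlink prediction, actually removed. A quadratic-time sketch of that definition follows; the scores are made up.

```python
def auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Scores of pairs that did gain an edge vs. pairs that did not (made-up values).
print(auc([0.9, 0.4, 0.7], [0.3, 0.4, 0.1]))
```

An AUC of 0.5 corresponds to random ranking. This makes it a natural yardstick for comparing the baseline with the temporal features.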
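The existing approaches on the "Gaussian regions" slide describe each geographical region by a two-dimensional Gaussian over photo coordinates. The sketch scores a point under such regions and picks the most likely one. The region names and parameters are made up, and treating latitude and longitude as planar coordinates is a simplifying assumption.

```python
import math


def gaussian_logpdf(point, mean, cov):
    """Log-density of a 2-D Gaussian; cov is ((var_x, cov_xy), (cov_xy, var_y))."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Mahalanobis term using the closed-form inverse of a 2x2 matrix.
    quad = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return -0.5 * quad - math.log(2 * math.pi) - 0.5 * math.log(det)


def most_likely_region(point, regions):
    """Name of the region whose Gaussian gives the point the highest density."""
    return max(regions, key=lambda name: gaussian_logpdf(point, *regions[name]))


# Made-up regions: (mean, covariance) in planar (lon, lat)-like coordinates.
regions = {
    "coast": ((0.0, 0.0), ((4.0, 0.0), (0.0, 0.25))),  # elongated ellipse
    "inland": ((3.0, 3.0), ((1.0, 0.0), (0.0, 1.0))),
}
print(most_likely_region((2.0, 0.1), regions))  # coast
```

Every Gaussian has elliptical contours, so such a region cannot follow the complex shapes named on the "Challenge" slide. This limitation motivates the non-Gaussian model.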
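The gaming study reports intrusion-detection precision per topic, summarized as the average and median over 8 topics. The sketch below assumes a word-intrusion setup:

- players see a topic's top words plus one intruder;
- precision is the fraction of players who spot the intruder.

The toy answers and function names are invented.

```python
from statistics import mean, median


def intrusion_precision(answers, intruder):
    """Fraction of players whose answer is the true intruder word for one topic."""
    return sum(1 for a in answers if a == intruder) / len(answers)


def summarize(topics):
    """topics: list of (answers, intruder) pairs; returns (average, median) precision."""
    precisions = [intrusion_precision(answers, intruder) for answers, intruder in topics]
    return mean(precisions), median(precisions)


# Invented answers for three topics (the study itself used 8 topics).
topics = [
    (["coffee", "coffee", "shrimp"], "coffee"),
    (["wine", "wine"], "wine"),
    (["pasta", "fish", "fish", "fish"], "pasta"),
]
print(summarize(topics))
```

Reporting the median alongside the average, as the slide does, shows whether a few very clear or very confusing topics dominate the mean.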
