an epistemic dynamic model for tagging systems
DESCRIPTION
This presentation is about our paper which was presented at the Hypertext conference 2008. Abstract: In recent literature, several models were proposed for reproducing and understanding the tagging behavior of users. They all assume that the tagging behavior is influenced by the previous tag assignments of other users. But they are only partially successful in reproducing characteristic properties found in tag streams. We argue that this inadequacy of existing models results from their inability to include user’s background knowledge into their model of tagging behavior. This paper presents a generative tagging model that integrates both components, the background knowledge and the influence of previous tag assignments. Our model successfully reproduces characteristic properties of tag streams. It even explains effects of the user interface on the tag stream. Links to the paper: http://dx.doi.org/10.1145/1379092.1379109 http://www.uni-koblenz.de/~staab/Research/Publications/2008/DellschaftStaabHypertext08.pdfTRANSCRIPT
<is web> Information Systems & Semantic Web
University of Koblenz ▪ Landau, Germany
An Epistemic Dynamic Model for Tagging Systems
Klaas Dellschaft and Steffen Staab{klaasd, staab}@uni-koblenz.de
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 20082 of 24
Delicious Tagging Interface
How do users influence each other in tagging systems? Hypothesis: Tagging involves imitation of other users AND
selection of tags from background knowledge of users.
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 20083 of 24
Understanding Dynamics in Tagging Systems
Conceptualization
Own Background Knowledge
Shared terminology
Something else?User interface
Tagging Behavior
Joint Stochastic Model
Model of Own Background Knowledge
Model of Sharing
Model of User Interface Influence
Simulated Tagging Behavior
Co
mp
ariso
n o
f Sta
tistics
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 20084 of 24
Folksonomies
Vertexes: Users, tags, resources Hyperedges: Tag assignments (user X tag X resource) Postings:
Tag assignments of a user to a single resource Can be ordered according to their time-stamp
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 20085 of 24
Co-occurrence Streams
Co-occurrence Streams: All tags co-occurring with a given tag in a posting Ordered by posting time
Example tag assignments for ‘ajax': {mackz, r1, {ajax, javascript}, 13:25} {klaasd, r2, {ajax, rss, web2.0}, 13:26} {mackz, r2, {ajax, php, javascript}, 13:27}
Resulting co-occurrence stream:
Tag |Y| |U| |T| |R|ajax 2.949.614 88.526 41.898 71.525blog 6.098.471 158.578 186.043 557.017xml 974.866 44.326 31.998 61.843
javascript rss web2.0 php javascript
time
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 20086 of 24
Co-occurrence Streams – Tag Growth
Linear scaling of axes
Logarithmic scaling of axes
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 20087 of 24
Co-occurrence Streams – Tag Frequencies
Logarithmic scaling of axes
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 20088 of 24
Resource Streams
Resource Streams: All tags assigned to a resource Ordered by posting time
Example tag assignments: {mackz, r1, {ajax, javascript}, 13:25} {klaasd, r2, {ajax, rss, web2.0}, 13:26} {mackz, r2, {ajax, php, javascript}, 13:27}
Resulting resource stream:
ajax rss web2.0 ajax php javascripttime
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 20089 of 24
Resource Streams – Tag Frequencies
Logarithmic scaling of axes
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 200810 of 24
Resource Streams – Tag Frequencies
Taken from Halpin, Robu and Shepard, WWW 2007
Logarithmic scaling of axes
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 200811 of 24
Existing Models
Frequency Rank
Co-occur. Streams Resource Streams Tag Growth
Polya Urn Model O O Fixed size
Simon Model O O Linear
YS Model w/ Memory + O Linear
Halpin et al. Model O O Linear
Observed Properties Long tail w/ cut-off Drop around tag 7 Sublinear
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 200812 of 24
Stochastic Models of Influence
Conceptualization
Own Background Knowledge
Shared terminology
Something else?User interface
Tagging Behavior
Joint Stochastic Model
Model of Own Background Knowledge
Model of Sharing
Model of User Interface Influence
Simulated Tagging Behavior
Co
mp
ariso
n o
f Sta
tistics
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 200813 of 24
Modeling Tag Imitation
Delicious Tagging Interface Imitation of previous tag assignments Free input of new tags (taken from background knowledge of a user)
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 200814 of 24
Simulating a Tag Stream
p(w|t): Probability of selecting word w for topic t. Modeled by word distributions in a topic centered text corpus.
n: Number of visible previous tags.
h: Maximal number of previous tag assignments used for determining ranking of the n distinct tags.
P BK
1-PBK
Start with empty tag stream Each simulation step appends a new tag assignment:
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 200815 of 24
Modeling Background Knowledge
PBK: Probability of selecting from background knowledge p(w|t): Probability of selecting word w for topic t. Modeled by
word distributions in a topic centered text corpus. p(w|r): Probability of selecting word w for resource r.
Text Corpora Del.icio.usText Corpora
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 200816 of 24
Modeling Tag Imitation
PI = 1 - PBK: Probability of imitating a previous tag assignment n: Number of visible top-ranked tags h: Maximal number of previous tag assignments used for determining
ranking of the n distinct tags
PBK t t-1 t-2 t-3 t-4 t-5 … …t-h
1-PBK
1 2 3 … n
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 200817 of 24
Simulation Results
Conceptualization
Own Background Knowledge
Shared terminology
Something else?User interface
Tagging Behavior
Joint Stochastic Model
Model of Own Background Knowledge
Model of Sharing
Model of User Interface Influence
Simulated Tagging Behavior
Co
mp
ariso
n o
f Sta
tistics
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 200818 of 24
Simulating Co-occurrence Streams
Tag growth: Influenced by PBK and p(w|t)
Tag Frequencies: Influenced by PBK, p(w|t), n, h n: Semantic breadth of a topic (blog: 100 tags,
ajax: 50 tags, xml: 50 tags; Cattuto et al. 2007) h: No hint for realistic values. Good guess may be a value
around 1000.
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 200819 of 24
Co-occ. Streams – Simulated Tag Growth
Logarithmic scaling of axes
PI PBKPI: Imitation
PBK: Background Knowl.
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 200820 of 24
Co-occ. Streams – Simulated Tag Frequencies
Logarithmic scaling of axes
PI PBK
PI PBK
PI PBK
PI: Imitation
PBK: Background Knowl.
n: Visible Tags
h: Available History
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 200821 of 24
Simulating Ajax, Blog and XML
blog
ajax
xml
Logarithmic scaling of axes
ObservationSimulation
(PI
(PI
(PI
PI: Imitation
PBK: Background Knowl.
n: Visible Tags
h: Available History
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 200822 of 24
Res. Streams – Simulated Tag Frequencies
Logarithmic scaling of axes
PI PBK
PI PBK
PI: Imitation
PBK: Background Knowl.
n: Visible Tags
h: Available History
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 200823 of 24
Conclusions
• Imitation AND background knowledge are needed for explaining properties of tag streams
• Probability of imitating previous tag assignments: ~60-90%
Frequency Rank
Co-occur. Streams Resource Streams Tag Growth
Polya Urn Model O O Fixed size
Simon Model O O Linear
YS Model w/ Memory + O Linear
Halpin et al. Model O O Linear
Epistemic Model + + Sublinear
Observed Properties Long tail w/ cut-off Drop around tag 7 Sublinear
<is web>
ISWeb - Information Systems & Semantic Web
Klaas [email protected]
Hypertext 200824 of 24
Data sets and simulation software:http://isweb.uni-koblenz.de/Research/Tagdataset
http://www.tagora-project.eu