Thomas van der Elsen, Richard Lawrence,
Jumi Oladimeji, Alastair Smith
Introduction
People increasingly publish their reactions to public events using a blog
  A tool that enables this information to be published quickly
  A journal that is available on the web
Need for effective data-mining techniques specific to blogs and similar tools (e.g. the Semantic Web)
“Our goal is to develop a method of capturing hot conversations by automating readers’ processes for characterizing and monitoring blogs.”
Overview
Data-mining techniques
  Creation of blog link structure
  Analysing link structure
Types of important bloggers
  Agitators
  Summarisers
Applications, analysis and conclusions
  Real-world applications and extensions
  Pros and cons of the paper
Crawling blogs
Extracting hyperlinks
Extracting blog threads
Crawling blogs
System crawls through the RSS list, registering for each entry:
  Title
  Permalink
  List entry date
Aggregator: gathers RSS feeds from multiple sources and organises them
OPML: file format used to share RSS feed lists
RSS: A format for distributing content on the web
[Diagram labels: Aggregators, RSS list, RSS feeds, OPML]
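The crawling step can be sketched roughly as follows. This is a minimal illustration rather than the system described in the paper: it assumes RSS 2.0 feeds, uses an inline sample feed (`SAMPLE_RSS`, with made-up URLs) in place of a real download, and the function name `register_entries` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real crawler would fetch every feed on the RSS list.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://blog.example/2005/01/first</link>
      <pubDate>Mon, 03 Jan 2005 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example/2005/01/second</link>
      <pubDate>Tue, 04 Jan 2005 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def register_entries(rss_text):
    """Return one record (title, permalink, date) per <item> in the feed."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            # Assumption: the item's <link> element serves as the permalink.
            "permalink": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
        })
    return entries


entries = register_entries(SAMPLE_RSS)
```

Each record could then be appended to the entry list that the later thread-extraction step consults.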
Extracting hyperlinks
Problem: Different tag structures per server
[Diagram labels: RSS feed from list, Description, Blog entries, Hyperlink list]
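One possible way around per-server tag differences is to ignore the surrounding markup and collect every anchor's `href` with a generic HTML parser. The sketch below takes that approach, which is an assumption and not necessarily what the paper's system does; `LinkCollector` and `extract_hyperlinks` are illustrative names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag, whatever markup surrounds it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_hyperlinks(entry_html):
    """Return the hyperlinks found in one blog entry's HTML, in document order."""
    collector = LinkCollector()
    collector.feed(entry_html)
    collector.close()
    return collector.links


sample = '<p>See <a href="http://a.example/1">this</a> and <A HREF="http://b.example/2">that</A>.</p>'
links = extract_hyperlinks(sample)
```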
Extracting blog threads
For each hyperlink:
If sourceLink: check links exist in thread data, then add
If replyLink: check both conditions (&&)
  Check departure URL exists in thread data
  Check destination URL points to entry on list
Outcomes of the two checks:
  11: Add destination entry to thread
  10: Add destination entry to entry list and add to thread
  01: Add departure entry to thread
  00: Create new thread
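The four replyLink outcomes can be read as a two-bit decision. The sketch below assumes that the first digit reports whether the departure URL is already in the thread data and the second whether the destination URL is on the entry list. That reading, the set-based data structures, the function name `handle_reply_link`, and the contents of a newly created thread are all assumptions made for illustration.

```python
def handle_reply_link(departure, destination, thread, entry_list):
    """Apply the four-way replyLink rule.

    thread and entry_list are sets of entry URLs.
    Returns (code, thread); thread is a new set in the "00" case.
    """
    departure_in_thread = departure in thread
    destination_on_list = destination in entry_list
    code = f"{int(departure_in_thread)}{int(destination_on_list)}"

    if code == "11":
        thread.add(destination)
    elif code == "10":
        # Destination is unknown so far: register it, then add it to the thread.
        entry_list.add(destination)
        thread.add(destination)
    elif code == "01":
        thread.add(departure)
    else:
        # "00": create a new thread. Seeding it with both URLs is an assumption.
        thread = {departure, destination}
    return code, thread


known_entries = {"a", "b"}
code_11, thread_11 = handle_reply_link("a", "b", {"a"}, known_entries)
code_10, thread_10 = handle_reply_link("a", "c", {"a"}, known_entries)
code_01, thread_01 = handle_reply_link("x", "b", {"a"}, {"a", "b"})
code_00, thread_00 = handle_reply_link("x", "y", {"a"}, {"a"})
```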
Example Results
Agitators
Summarisers
Joe Bloggs
Agitators
Discussion stimulator
Threads often grow after an agitator’s entry
Three discriminants for an agitator
  Link (Agi1)
  Popularity (Agi2)
  Topic (Agi3)
The three discriminants can be weighted using the following formula:
Link-based discriminant
$e_x$ is an agitator if
$k_x > \theta_1$
$e_x$ = a blog entry
$k_x$ = number of entries in $\text{thread}_i$ with a replyLink to $e_x$
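A minimal sketch of the link-based discriminant, assuming the thread's replyLinks are available as (source, target) pairs where the source entry contains a link to the target entry. The pair representation, the function names, and the threshold value used in the example are illustrative assumptions.

```python
def count_reply_links_to(entry, reply_links):
    """Number of distinct entries in the thread with a replyLink to the entry."""
    return len({source for source, target in reply_links if target == entry})


def is_link_agitator(entry, reply_links, theta1):
    """True when the count of replying entries strictly exceeds the threshold."""
    return count_reply_links_to(entry, reply_links) > theta1


# Hypothetical thread: b and c reply to a (c links twice), d replies to b.
thread_links = [("b", "a"), ("c", "a"), ("c", "a"), ("d", "b")]
k_a = count_reply_links_to("a", thread_links)
```

Counting distinct source entries, rather than raw links, keeps an entry that links twice from being counted twice.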
Popularity-based discriminant
$e_x$ is an agitator if
$l_x / m_x > \theta_2$
$e_x$ = a blog entry
$l_x$ = number of entries in $\text{thread}_i$ published t days after $e_x$
$m_x$ = number of entries in $\text{thread}_i$ published t days before $e_x$
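The popularity-based discriminant compares thread activity after an entry with activity before it. The sketch below assumes "t days after" and "t days before" mean windows of t days on either side of the entry's publication date, and it returns `None` when no entries precede the entry, since the slide does not say how a zero denominator is handled. Function names, dates, and thresholds are illustrative.

```python
from datetime import date, timedelta


def popularity_ratio(entry_date, other_dates, t_days):
    """Ratio of entries published within t days after vs. before entry_date.

    Returns None if nothing was published in the preceding window (assumption).
    """
    window = timedelta(days=t_days)
    after = sum(1 for d in other_dates if entry_date < d <= entry_date + window)
    before = sum(1 for d in other_dates if entry_date - window <= d < entry_date)
    if before == 0:
        return None
    return after / before


def is_popularity_agitator(entry_date, other_dates, t_days, theta2):
    ratio = popularity_ratio(entry_date, other_dates, t_days)
    return ratio is not None and ratio > theta2


# Hypothetical thread: one entry before, three within a week after, one much later.
published = date(2005, 1, 10)
others = [date(2005, 1, 8), date(2005, 1, 11), date(2005, 1, 12),
          date(2005, 1, 13), date(2005, 1, 30)]
ratio = popularity_ratio(published, others, t_days=7)
```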
Topic-based discriminant
$e_x$ is an agitator if
$e_x$ = a blog entry
n = number of entries
Summarisers
Publish entries that collate and compact previous posts
Provide a convenient way of digesting an entire thread
The discriminant for summarisers is link-based:
$e_x$ is a summariser if
$p_x > \theta_4$
$e_x$ = a blog entry
$p_x$ = number of entries in $\text{thread}_i$ that have a replyLink from $e_x$
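The summariser test looks at links going out of an entry rather than links coming in. A minimal sketch, assuming replyLinks are (source, target) pairs in which the source entry links to the target entry; `find_summarisers` and the sample data are hypothetical.

```python
from collections import defaultdict


def find_summarisers(reply_links, theta4):
    """Return entries whose number of distinct replyLink targets exceeds theta4."""
    targets_by_source = defaultdict(set)
    for source, target in reply_links:
        targets_by_source[source].add(target)
    return sorted(
        source
        for source, targets in targets_by_source.items()
        if len(targets) > theta4
    )


# Hypothetical thread: s links back to three earlier entries, b to one.
thread_links = [("s", "a"), ("s", "b"), ("s", "c"), ("b", "a")]
summarisers = find_summarisers(thread_links, theta4=2)
```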
Applications
Pros and Cons
Conclusions
Applications
Supplementary info, e.g. TV, news sites, etc.
  Home and Away – who shot Josh West (Agitator)
  Sports, etc. – used by studios and media to highlight points of interest in a match (Summariser)
Analysis – Pros
Basis for future research – a brief intro to the subject
  Multiple thread analysis
  Identification of areas of bloggers’ expertise
Highly effective in certain specific areas
  News and reviews
Implementation of theory (feature vector)
Analysis – Cons
Only 25 sites used in sample (but 1000s of blogs)
Does not take context into consideration
  E.g., an agitator may be posting offensive entries
No measurement of summary success
Comments are not analysed
Inappropriate for certain areas
  MySpace, Bebo, et al. (due to target audience)
Conclusions
Created a data-mining framework for future research
May instigate research into further work
Nice idea and potentially useful but needs to be extended
Thank you for your time