Integrating and Interpreting Social Data from Heterogeneous Sources

Integrating and Interpreting Social Data from Heterogeneous Sources – LUPAS 2010. Matthew Rowe, Organisations, Information and Knowledge Group, University of Sheffield; Suvodeep Mazumdar, Department of Information Studies, University of Sheffield.

Uploaded by matthew-rowe on 26-Jan-2015


TRANSCRIPT

Page 1

Integrating and Interpreting Social Data from Heterogeneous Sources

Matthew Rowe
Organisations, Information and Knowledge Group, University of Sheffield

Suvodeep Mazumdar
Department of Information Studies, University of Sheffield

Page 2

Outline

• Information overload
  – Increase in social data publication

• Interlinking social data
  – Metadata generation
  – Integrating social data

• Application: interpreting social data
  – Cumbrian floods use case
  – Interacting with social data

• Conclusions

Page 3

Information Overload

• Masses of social data are published every day
  – E.g. 50 million tweets per day (around 600 per second)
    • Source: http://blog.twitter.com
  – 22 million Facebook users in the UK
    • Source: http://www.clickymedia.co.uk/2009/10/uk-facebook-user-statistics-october-2009/

• Too much information to deal with!

• Social data is multi-faceted:
  – Provenance
  – Topic
  – Geo

• Trend services (e.g. Trendistic, BlogPulse):
  – Focus on majority consensus
  – Require listening in to a specific topic
  – Concentrate on a single source/platform
  – Do not consider the geo facet
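The per-second figure can be checked against the daily total. A minimal sketch, assuming the 50 million tweets are spread evenly over 24 hours, so it yields an average rather than a peak rate:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(items_per_day):
    """Average publication rate, assuming volume is spread evenly over the day."""
    return items_per_day / SECONDS_PER_DAY


rate = per_second(50_000_000)
print(f"{rate:.0f} tweets per second")  # prints "579 tweets per second"
```

The result, about 579, is the value the slide rounds up to 600.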

Page 6

Interlinking Social Data

• Consider the multi-faceted nature of social data:
  – Allows fine-grained analysis
  – Shows geo-localised social data
  – Surfaces relevant past social data

• Solution: interlink social data from heterogeneous sources
  – Use semantics!
  – Consistent data interpretation

Page 7

Metadata Generation

• Web 2.0 platforms return data using:
  – Proprietary formats
  – Heterogeneous data schemas

• Need to link data together from disparate sources

• A social data fragment = a single piece of social data
  – E.g. a tweet, an image, a video

• Lift each social data fragment to RDF:
  1. Create an instance of sioc:Post and itr:LocalizedResource
     • Assign it a URI
  2. Assign the content to the instance (topic)
     • Use the hashtags of the microblog
  3. Create an instance of gml:Geometry (geo)
     • Capture the geo facet
  4. Assign the timestamp of fragment creation (provenance)
     • Using dcterms:created
  5. Assign the fragment to its owner (provenance)
     • Create a foaf:Person instance

Page 8

Metadata Generation

<status>
  <created_at>Sun Feb 28 12:22:47 +0000 2010</created_at>
  <id>9774519667</id>
  <text>Writing up our Geovation work for #lupas2010.</text>
  <truncated>false</truncated>
  <in_reply_to_status_id></in_reply_to_status_id>
  <in_reply_to_user_id></in_reply_to_user_id>
  <favorited>false</favorited>
  <in_reply_to_screen_name></in_reply_to_screen_name>
  <geo xmlns:georss="http://www.georss.org/georss">
    <georss:point>53.3833,-1.4722</georss:point>
  </geo>
</status>

<photo id="949406913" media="photo">
  <owner nsid="54948696@N00"/>
  <title>DSC00171.JPG</title>
  <description></description>
  <dates posted="1205398307" taken="2009-01-09 09:16:31" lastupdate="1257421561" />
  <tags>
    <tag id="24539622-2330113101-400" author="54948696@N00" raw="arctic">arctic</tag>
    <tag id="24539622-2330113101-401" author="54948696@N00" raw="monkeys">monkeys</tag>
  </tags>
  <location latitude="53.4813" longitude="-2.2392" place_id="R8vDw_abBpSzUA">
    <locality place_id="R8vDw_abBpSzUA" woeid="27872">Manchester</locality>
    <region place_id="pn4MsiGbBZlXeplyXg" woeid="24554868">England</region>
    <country place_id="DevLebebApj4RVbtaQ" woeid="23424975">United Kingdom</country>
  </location>
</photo>
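The same five lifting steps apply to an image. Below is a minimal sketch that maps the Flickr response onto the shared vocabulary using only the standard library. It is not the authors' implementation: the photo and owner URIs follow Flickr's photo-page URL pattern and are a hypothetical choice, the XML is an abridged copy of the response above, and tags are treated like hashtags.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the Flickr photo XML above
FLICKR_XML = """<photo id="949406913" media="photo">
  <owner nsid="54948696@N00"/>
  <title>DSC00171.JPG</title>
  <dates posted="1205398307" taken="2009-01-09 09:16:31" lastupdate="1257421561" />
  <tags>
    <tag id="24539622-2330113101-400" author="54948696@N00" raw="arctic">arctic</tag>
    <tag id="24539622-2330113101-401" author="54948696@N00" raw="monkeys">monkeys</tag>
  </tags>
  <location latitude="53.4813" longitude="-2.2392" place_id="R8vDw_abBpSzUA"/>
</photo>"""


def lift_flickr_photo(xml_text):
    """Map one Flickr <photo> element onto (subject, predicate, object) triples."""
    photo = ET.fromstring(xml_text)
    photo_id = photo.get("id")
    owner = photo.find("owner").get("nsid")
    # Hypothetical URI scheme based on Flickr photo page URLs
    post = f"<http://www.flickr.com/photos/{owner}/{photo_id}>"
    triples = [
        (post, "rdf:type", "sioc:Post"),
        (post, "rdf:type", "itr:LocalizedResource"),
    ]
    # Topic facet: every tag becomes a dcterms:subject literal, like a hashtag
    for tag in photo.iter("tag"):
        triples.append((post, "dcterms:subject", f'"{tag.text}"'))
    # Provenance facet: capture time and owner
    taken = photo.find("dates").get("taken")
    triples.append((post, "dcterms:created", f'"{taken}"'))
    triples.append((post, "sioc:hasCreator", f"<http://www.flickr.com/photos/{owner}>"))
    # Geo facet: latitude and longitude folded into one gml:pos string
    location = photo.find("location")
    if location is not None:
        geo = f"_:geo{photo_id}"
        triples.append((post, "itr:has_Localization", geo))
        triples.append((geo, "rdf:type", "gml:Geometry"))
        lat, lon = location.get("latitude"), location.get("longitude")
        triples.append((geo, "gml:pos", f'"{lat},{lon}"'))
    return triples


for s, p, o in lift_flickr_photo(FLICKR_XML):
    print(s, p, o, ".")
```

Because the photo ends up with the same predicates as a tweet (dcterms:subject, itr:has_Localization, gml:pos), one query can cover both sources.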


Page 14

Metadata Generation

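@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix itr: <http://www.dcs.shef.ac.uk/~gregoire/interaction/ns#> .
@prefix gml: <http://www.opengis.net/gml/> .

<http://twitter.com/mattroweshow> rdf:type foaf:Person ;
    rdf:type itr:LocalizedResource ;
    foaf:name "Matthew Rowe" ;
    foaf:homepage <http://www.dcs.shef.ac.uk/~mrowe> .

<http://twitter.com/mattroweshow/9774519667> rdf:type sioc:Post ;
    rdf:type itr:LocalizedResource ;
    sioc:content "Writing up our Geovation work for #lupas2010." ;
    dcterms:subject "lupas2010" ;
    dcterms:created "2010-2-28 12:22:47.0" ;
    sioc:hasCreator <http://twitter.com/mattroweshow> ;
    itr:has_Localization _:a2 .

_:a2 rdf:type gml:Geometry ;
    gml:pos "53.3833,-1.4722" .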
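The five lifting steps for a tweet can be sketched end to end with the standard library. This is a hedged illustration, not the authors' code. Assumptions: the screen name is passed in because the status XML shown omits the user element; the post URI copies the pattern in the example above; the timestamp is emitted as a typed UTC xsd:dateTime instead of the plain "2010-2-28 12:22:47.0" string, a deliberate swap so that ORDER BY on dates sorts correctly; and foaf:name and foaf:homepage are omitted since the status alone does not carry them.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

GEORSS = "{http://www.georss.org/georss}"

PREFIXES = """@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix itr: <http://www.dcs.shef.ac.uk/~gregoire/interaction/ns#> .
@prefix gml: <http://www.opengis.net/gml/> .
"""

# Abridged copy of the status XML above (empty reply fields dropped)
TWEET_XML = """<status>
  <created_at>Sun Feb 28 12:22:47 +0000 2010</created_at>
  <id>9774519667</id>
  <text>Writing up our Geovation work for #lupas2010.</text>
  <geo xmlns:georss="http://www.georss.org/georss">
    <georss:point>53.3833,-1.4722</georss:point>
  </geo>
</status>"""


def turtle_literal(value):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def lift_tweet(xml_text, screen_name):
    """Lift one Twitter <status> element to Turtle, following steps 1-5."""
    status = ET.fromstring(xml_text)
    text = status.findtext("text")
    tweet_id = status.findtext("id")
    person = f"<http://twitter.com/{screen_name}>"
    post = f"<http://twitter.com/{screen_name}/{tweet_id}>"
    # Step 4: parse Twitter's created_at and emit it as a UTC xsd:dateTime
    created = datetime.strptime(status.findtext("created_at"), "%a %b %d %H:%M:%S %z %Y")
    stamp = created.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    props = ["rdf:type sioc:Post", "rdf:type itr:LocalizedResource"]  # step 1
    props.append(f"sioc:content {turtle_literal(text)}")               # step 2: content
    props += [f"dcterms:subject {turtle_literal(tag)}"
              for tag in re.findall(r"#(\w+)", text)]                  # step 2: hashtags
    props.append(f'dcterms:created "{stamp}"^^xsd:dateTime')            # step 4
    props.append(f"sioc:hasCreator {person}")                           # step 5
    point = status.findtext(f"geo/{GEORSS}point")
    geo_node = f"_:geo{tweet_id}"
    if point:
        props.append(f"itr:has_Localization {geo_node}")                # step 3

    out = [PREFIXES, f"{person} rdf:type foaf:Person .", ""]
    out.append(f"{post} " + " ;\n    ".join(props) + " .")
    if point:
        out.append(f"{geo_node} rdf:type gml:Geometry ;\n    gml:pos {turtle_literal(point)} .")
    return "\n".join(out)


print(lift_tweet(TWEET_XML, "mattroweshow"))
```

A tweet without a geotag simply gets no itr:has_Localization triple, so the same function covers geocoded and non-geocoded fragments.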

Page 15

Integrated Social Data

• Triplify social data from multiple platforms
  – Flickr XML response -> RDF
  – Picasa XML response -> RDF

• Use common semantics
  – Can perform SPARQL queries

PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?item
WHERE {
  ?item dcterms:subject "iranelections" .
  ?item dcterms:created ?date
}
ORDER BY DESC(?date)

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX itr: <http://www.dcs.shef.ac.uk/~gregoire/interaction/ns#>
PREFIX gml: <http://www.opengis.net/gml/>
SELECT DISTINCT ?post ?tag
WHERE {
  ?post dcterms:subject ?tag .
  ?post itr:has_Localization ?geo .
  ?geo gml:pos "53.4813,-2.2392"
}
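To show why shared vocabulary makes the geo query work across sources, here is a toy evaluator for a basic graph pattern (nested-loop joins over variable bindings). It only illustrates the matching a SPARQL engine performs; in practice the triples would sit in a triple store. The tw: and fl: prefixes are hypothetical shorthand for the full tweet and Flickr URIs, and the data are the two example fragments from the Metadata Generation slides.

```python
def match(pattern, triple, binding):
    """Try to extend binding so that pattern equals triple; None on conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):           # variable
            if extended.get(term, value) != value:
                return None
            extended[term] = value
        elif term != value:                # constant must match exactly
            return None
    return extended


def select_distinct(triples, patterns, variables):
    """Evaluate a basic graph pattern by nested-loop joins; return DISTINCT rows."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    rows = []
    for binding in bindings:
        row = tuple(binding[v] for v in variables)
        if row not in rows:
            rows.append(row)
    return rows


# Merged graph: one tweet and one Flickr photo (tw:/fl: are hypothetical prefixes)
graph = [
    ("tw:9774519667", "dcterms:subject", '"lupas2010"'),
    ("tw:9774519667", "itr:has_Localization", "_:a2"),
    ("_:a2", "gml:pos", '"53.3833,-1.4722"'),
    ("fl:949406913", "dcterms:subject", '"arctic"'),
    ("fl:949406913", "dcterms:subject", '"monkeys"'),
    ("fl:949406913", "itr:has_Localization", "_:g1"),
    ("_:g1", "gml:pos", '"53.4813,-2.2392"'),
]

# Same patterns as the second query: posts and tags at one position
query = [
    ("?post", "dcterms:subject", "?tag"),
    ("?post", "itr:has_Localization", "?geo"),
    ("?geo", "gml:pos", '"53.4813,-2.2392"'),
]
print(select_distinct(graph, query, ["?post", "?tag"]))
# → [('fl:949406913', '"arctic"'), ('fl:949406913', '"monkeys"')]
```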

Page 16

Interpreting Social Data

• Cumbrian use case
  – UK region suffered its worst floods in centuries
  – Observe the effects in social data
    • Rise in publication
    • Fine-grained geocoded social data

• Dataset:
  – Microblogs from 200 Cumbrian Twitter users
    • Published during 2009
    • 3513 microblogs
    • Produced 475,043 triples
  – Images from Flickr taken in Cumbria
    • 6663 images
    • Produced 182,304 triples
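Dividing triples by fragments gives a rough sense of how much metadata each source contributes. A small arithmetic sketch, assuming the Flickr count of 182,304 is also a number of triples; the deck does not break down which triples make up these totals, so treat the ratios only as an indicator of scale.

```python
def triples_per_fragment(fragments, triples):
    """Average number of RDF triples produced per lifted fragment."""
    return triples / fragments


# Figures from the Cumbrian dataset slide
for source, fragments, triples in [("Twitter", 3513, 475_043), ("Flickr", 6663, 182_304)]:
    print(f"{source}: {triples_per_fragment(fragments, triples):.1f} triples per fragment")
# Twitter: 135.2 triples per fragment
# Flickr: 27.4 triples per fragment
```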

Page 17

Interacting with Social Data

• Built a visualisation application to analyse social data fragments: http://www.dcs.shef.ac.uk/~suvodeep/ViziSocial

• Filter by date
  – Lower slider

• Fine-grained focus
  – Zoom in

• Tag cloud
  – Shows fragment topics
  – The visible window controls the tag cloud topics

• Markers show the number of fragments at each location
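The interactions above (date slider, zoomed window, tag cloud, counted markers) amount to filtering fragments on their provenance and geo facets, then aggregating the topic facet. A minimal sketch under stated assumptions: the Fragment class, the bounding-box parameters and grouping markers by rounded coordinates are all hypothetical choices, not how the application itself is built. The two fragments are the examples from the Metadata Generation slides, and the Flickr URI is a hypothetical shorthand.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fragment:
    """Hypothetical flattened view of one lifted fragment, as a map client might hold it."""
    uri: str
    created: date
    lat: float
    lon: float
    tags: tuple


def in_view(fragments, start, end, south, west, north, east):
    """Keep fragments inside the date slider range and the current map window."""
    return [f for f in fragments
            if start <= f.created <= end
            and south <= f.lat <= north
            and west <= f.lon <= east]


def tag_cloud(fragments):
    """Count topics across the visible fragments only, so the window drives the cloud."""
    return Counter(tag for f in fragments for tag in f.tags)


def markers(fragments, places=2):
    """Bucket fragments by rounded coordinates; each count is the number on a marker."""
    return Counter((round(f.lat, places), round(f.lon, places)) for f in fragments)


fragments = [
    Fragment("http://twitter.com/mattroweshow/9774519667", date(2010, 2, 28),
             53.3833, -1.4722, ("lupas2010",)),
    Fragment("flickr:949406913", date(2009, 1, 9), 53.4813, -2.2392, ("arctic", "monkeys")),
]

# Date slider set to 2009, map window over northern England (arbitrary example bounds)
visible = in_view(fragments, date(2009, 1, 1), date(2009, 12, 31), 53.0, -3.0, 55.0, -1.0)
print(tag_cloud(visible))
print(markers(visible))
```

Recomputing the tag cloud from the filtered list, rather than from all fragments, is what lets the window control which topics appear.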

Page 18

Conclusions

• Consistent interpretation of social data
  – Across heterogeneous sources

• Application
  – Allows analyses of social data
    • To fine-grained detail
  – Utilises multiple facets of social data
  – Requires metadata
    • Issue of scalability

• Future work
  – Adapting to real-time data acquisition
    • Focussing on the South Yorkshire region at present
    • Assess the scalability issue

Page 19

Questions?

Twitter: @mattroweshow
Web: http://www.dcs.shef.ac.uk/~mrowe
Email: [email protected]