Integrating and Interpreting Social Data from Heterogeneous Sources – LUPAS 2010
Integrating and Interpreting Social Data from Heterogeneous Sources
Matthew Rowe, Organisations, Information and Knowledge Group, University of Sheffield
Suvodeep Mazumdar, Department of Information Studies, University of Sheffield
Outline
• Information overload
  – Increase in social data publication
• Interlinking social data
  – Metadata Generation
  – Integrating Social Data
• Application: Interpreting Social Data
  – Cumbrian Floods Use Case
  – Interacting with Social Data
• Conclusions
Information Overload
• Masses of social data are published every day
  – E.g. 50 million tweets a day (roughly 600 per second)
    • http://blog.twitter.com
  – 22 million Facebook users in the UK
    • http://www.clickymedia.co.uk/2009/10/uk-facebook-user-statistics-october-2009/
• Too much information to deal with!
• Social data is multi-faceted:
  – Provenance
  – Topic
  – Geo
• Trend services (e.g. trendistic, blogpulse):
  – Focus on majority consensus
  – Require listening in to a specific topic
  – Concentrate on a single source/platform
  – Do not consider the geo facet
Interlinking Social Data
• Consider the multi-faceted nature of social data:
  – Allows fine-grained analysis
  – Shows geo-localised social data
  – Surfaces relevant past social data
• Solution: interlink social data from heterogeneous sources
  – Use semantics!
  – Consistent data interpretation
Metadata Generation
• Web 2.0 platforms return data using:
  – Proprietary formats
  – Heterogeneous data schemas
• Need to link data together from disparate sources
• A social data fragment = a single piece of social data
  – E.g. a tweet, an image, a video
• Lift each social data fragment to RDF:
  1. Create an instance of sioc:Post and itr:LocalizedResource
     • Assign it a URI
  2. Assign the content to the instance (topic)
     • Use the microblog's hashtags as topics (dcterms:subject)
  3. Create an instance of gml:Geometry (geo)
     • Capture the geo facet (linked via itr:has_Localization)
  4. Assign the timestamp of fragment creation (provenance)
     • Using dcterms:created
  5. Assign the fragment to its owner (provenance)
     • Create a foaf:Person instance (linked via sioc:hasCreator)
<status>
  <created_at>Sun Feb 28 12:22:47 +0000 2010</created_at>
  <id>9774519667</id>
  <text>Writing up our Geovation work for #lupas2010.</text>
  <truncated>false</truncated>
  <in_reply_to_status_id></in_reply_to_status_id>
  <in_reply_to_user_id></in_reply_to_user_id>
  <favorited>false</favorited>
  <in_reply_to_screen_name></in_reply_to_screen_name>
  <geo xmlns:georss="http://www.georss.org/georss">
    <georss:point>53.3833,-1.4722</georss:point>
  </geo>
</status>

<photo id="949406913" media="photo">
  <owner nsid="54948696@N00"/>
  <title>DSC00171.JPG</title>
  <description></description>
  <dates posted="1205398307" taken="2009-01-09 09:16:31" lastupdate="1257421561" />
  <tags>
    <tag id="24539622-2330113101-400" author="54948696@N00" raw="arctic">arctic</tag>
    <tag id="24539622-2330113101-401" author="54948696@N00" raw="monkeys">monkeys</tag>
  </tags>
  <location latitude="53.4813" longitude="-2.2392" place_id="R8vDw_abBpSzUA">
    <locality place_id="R8vDw_abBpSzUA" woeid="27872">Manchester</locality>
    <region place_id="pn4MsiGbBZlXeplyXg" woeid="24554868">England</region>
    <country place_id="DevLebebApj4RVbtaQ" woeid="23424975">United Kingdom</country>
  </location>
</photo>
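The two responses above describe the same facets (topic, geo, provenance) using different element names and formats. As a first step before generating RDF, both can be mapped onto a single common record. The Python sketch below shows one way to do that. The `Fragment` record and the `from_tweet`/`from_flickr` functions are hypothetical names and are not part of the authors' system. The sketch uses only the standard-library `xml.etree.ElementTree` and reads only the elements that appear in the two samples, not the full Twitter or Flickr API schemas. Flickr's `taken` value is kept as a naive datetime because the sample gives no time zone.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

GEORSS_POINT = "{http://www.georss.org/georss}point"


@dataclass
class Fragment:
    """Hypothetical platform-neutral record for one social data fragment."""
    source: str
    fragment_id: str
    text: str
    created: datetime
    owner: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    lat: Optional[float] = None
    lon: Optional[float] = None


def from_tweet(xml_text: str) -> Fragment:
    """Map a Twitter <status> response onto a Fragment."""
    status = ET.fromstring(xml_text)
    text = status.findtext("text", default="")
    # Twitter timestamps look like "Sun Feb 28 12:22:47 +0000 2010"
    created = datetime.strptime(status.findtext("created_at"), "%a %b %d %H:%M:%S %z %Y")
    fragment = Fragment("twitter", status.findtext("id"), text, created,
                        tags=re.findall(r"#(\w+)", text))  # hashtags = topic facet
    point = status.find(f"geo/{GEORSS_POINT}")
    if point is not None and point.text:
        # The sample writes "lat,lon"; GeoRSS Simple normally separates with a space
        lat, lon = point.text.replace(",", " ").split()
        fragment.lat, fragment.lon = float(lat), float(lon)
    return fragment


def from_flickr(xml_text: str) -> Fragment:
    """Map a Flickr <photo> response onto a Fragment."""
    photo = ET.fromstring(xml_text)
    owner = photo.find("owner")
    dates = photo.find("dates")
    # 'taken' carries no time zone, so this datetime stays naive
    created = datetime.strptime(dates.get("taken"), "%Y-%m-%d %H:%M:%S")
    fragment = Fragment("flickr", photo.get("id"), photo.findtext("title", default=""),
                        created,
                        owner=owner.get("nsid") if owner is not None else None,
                        tags=[t.text for t in photo.iter("tag") if t.text])
    location = photo.find("location")
    if location is not None:
        fragment.lat = float(location.get("latitude"))
        fragment.lon = float(location.get("longitude"))
    return fragment
```

When both sources share this shape, the lifting steps only need to be written once, against `Fragment`, instead of once per platform.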
<http://twitter.com/mattroweshow> rdf:type foaf:Person ;
    rdf:type itr:LocalizedResource ;
    foaf:name "Matthew Rowe" ;
    foaf:homepage <http://www.dcs.shef.ac.uk/~mrowe> .

<http://twitter.com/mattroweshow/9774519667> rdf:type sioc:Post ;
    rdf:type itr:LocalizedResource ;
    sioc:content "Writing up our Geovation work for #lupas2010." ;
    dcterms:subject "lupas2010" ;
    dcterms:created "2010-2-28 12:22:47.0" ;
    sioc:hasCreator <http://twitter.com/mattroweshow> ;
    itr:has_Localization _:a2 .

_:a2 rdf:type gml:Geometry ;
    gml:pos "53.3833,-1.4722" .
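The following Python sketch applies the five lifting steps to a single tweet and emits Turtle. It is illustrative only and is not the authors' implementation. `lift_tweet`, `turtle_literal` and `hashtags` are hypothetical helpers, and the URI pattern is copied from the example above. Two choices differ from the slide on purpose:

- The timestamp is written as an `xsd:dateTime` literal. This makes `ORDER BY DESC(?date)` compare dates instead of strings such as "2010-2-28".
- The content literal is escaped, so quotes or line breaks inside a tweet cannot break the Turtle.

```python
import re
from datetime import datetime, timezone
from typing import List

# rdf, sioc, foaf, dcterms and xsd are the standard namespace URIs;
# itr and gml are copied from the SPARQL query on the slides.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "sioc": "http://rdfs.org/sioc/ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "itr": "http://www.dcs.shef.ac.uk/~gregoire/interaction/ns#",
    "gml": "http://www.opengis.net/gml/",
}


def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslash, quote and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def hashtags(text: str) -> List[str]:
    """Topic facet: hashtag words without the leading '#'."""
    return re.findall(r"#(\w+)", text)


def lift_tweet(screen_name: str, status_id: str, text: str,
               created: datetime, lat: float, lon: float) -> str:
    """Return Turtle for one tweet. `created` must be timezone-aware."""
    owner = f"<http://twitter.com/{screen_name}>"
    post = f"<http://twitter.com/{screen_name}/{status_id}>"
    out = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    # Step 5 (provenance): the owner as a foaf:Person
    out.append(f"{owner} rdf:type foaf:Person .")
    # Step 1: the post as sioc:Post / itr:LocalizedResource
    props = ["rdf:type sioc:Post", "rdf:type itr:LocalizedResource"]
    # Step 2 (topic): full text plus one dcterms:subject per hashtag
    props.append(f"sioc:content {turtle_literal(text)}")
    props += [f"dcterms:subject {turtle_literal(tag)}" for tag in hashtags(text)]
    # Step 4 (provenance): creation time as a typed xsd:dateTime in UTC
    stamp = created.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    props.append(f'dcterms:created "{stamp}"^^xsd:dateTime')
    props.append(f"sioc:hasCreator {owner}")
    props.append("itr:has_Localization _:geo")
    out.append(post + " " + " ;\n    ".join(props) + " .")
    # Step 3 (geo): a blank-node gml:Geometry holding "lat,lon" as on the slide
    out.append(f'_:geo rdf:type gml:Geometry ;\n    gml:pos "{lat},{lon}" .')
    return "\n".join(out)
```

Example call: `lift_tweet("mattroweshow", "9774519667", "Writing up our Geovation work for #lupas2010.", datetime(2010, 2, 28, 12, 22, 47, tzinfo=timezone.utc), 53.3833, -1.4722)`.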
Integrated Social Data
• Triplify social data from multiple platforms
  – Flickr XML response -> RDF
  – Picasa XML response -> RDF
• Use common semantics
  – Can perform SPARQL queries
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?item
WHERE {
  ?item dcterms:subject "iranelections" .
  ?item dcterms:created ?date
}
ORDER BY DESC(?date)

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX itr: <http://www.dcs.shef.ac.uk/~gregoire/interaction/ns#>
PREFIX gml: <http://www.opengis.net/gml/>
SELECT DISTINCT ?post ?tag
WHERE {
  ?post dcterms:subject ?tag .
  ?post itr:has_Localization ?geo .
  ?geo gml:pos "53.4813,-2.2392"
}
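The second query can be made concrete with a small Python sketch that evaluates its WHERE clause as a basic graph pattern over an in-memory list of triples. The evaluation works in three stages:

1. Each pattern is unified with every triple.
2. The resulting variable bindings are joined left to right.
3. A set provides the DISTINCT projection.

This is a toy stand-in for a real SPARQL engine. It does no IRI expansion and compares terms as prefixed-name strings. The triples are a hand-written excerpt of the lifted tweet and Flickr photo above, and the `ex:flickr949406913` URI is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hand-written excerpt of the integrated data; "ex:flickr949406913" is hypothetical.
TRIPLES: List[Triple] = [
    ("<http://twitter.com/mattroweshow/9774519667>", "dcterms:subject", '"lupas2010"'),
    ("<http://twitter.com/mattroweshow/9774519667>", "itr:has_Localization", "_:a2"),
    ("_:a2", "gml:pos", '"53.3833,-1.4722"'),
    ("ex:flickr949406913", "dcterms:subject", '"arctic"'),
    ("ex:flickr949406913", "dcterms:subject", '"monkeys"'),
    ("ex:flickr949406913", "itr:has_Localization", "_:b1"),
    ("_:b1", "gml:pos", '"53.4813,-2.2392"'),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one triple, extending the current binding."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):  # variable
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # already bound to a different term
        elif p_term != t_term:
            return None  # constant mismatch
    return extended


def evaluate_bgp(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern by joining the patterns left to right."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# WHERE clause of the second query: posts and tags at one position
QUERY: List[Triple] = [
    ("?post", "dcterms:subject", "?tag"),
    ("?post", "itr:has_Localization", "?geo"),
    ("?geo", "gml:pos", '"53.4813,-2.2392"'),
]

# SELECT DISTINCT ?post ?tag: project, de-duplicate, sort for stable output
results = sorted({(b["?post"], b["?tag"]) for b in evaluate_bgp(QUERY, TRIPLES)})
```

`results` holds the two (photo, tag) pairs for the Manchester photo. The tweet drops out because its geometry has a different position.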
Interpreting Social Data
• Cumbrian use case
  – UK region suffered its worst floods in centuries
  – Observe the effects in social data
    • Rise in publication
    • Fine-grained geocoded social data
• Dataset:
  – Microblogs from 200 Cumbrian Twitter users
    • Published during 2009
    • 3513 microblogs
    • Produced 475,043 triples
  – Images from Flickr taken in Cumbria
    • 6663 images
    • Produced 182,304 triples
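The reported totals can be put on a per-fragment footing by dividing each triple count by its fragment count. This is only arithmetic on the slide's numbers. The slides do not say which triples are counted in each total (for example, user profiles or geometry nodes), so treat the ratios as rough indicators.

```python
def triples_per_fragment(fragments: int, triples: int) -> float:
    """Average number of RDF triples produced per lifted fragment."""
    return triples / fragments


# Totals reported on the slide for the Cumbrian dataset: (fragments, triples)
DATASET = {
    "Twitter microblogs": (3513, 475_043),
    "Flickr images": (6663, 182_304),
}

for source, (fragments, triples) in DATASET.items():
    print(f"{source}: {triples_per_fragment(fragments, triples):.1f} triples per fragment")
# Twitter microblogs: 135.2 triples per fragment
# Flickr images: 27.4 triples per fragment
```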
Interacting with Social Data
• Built a visualisation application to analyse social data fragments: http://www.dcs.shef.ac.uk/~suvodeep/ViziSocial
• Filter by date
  – Lower slider
• Fine-grained focus
  – Zoom in
• Tag cloud
  – Shows fragment topics
  – Window controls tag cloud topics
• Markers display the number of fragments they contain
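The interactions listed above come down to filtering and counting over the lifted fragments. The Python sketch below is a hypothetical model of that logic, not ViziSocial's actual code. `Fragment`, `MapWindow`, `visible`, `tag_cloud` and `markers` are made-up names. Grouping markers by rounded coordinates is an assumed stand-in for whatever marker clustering the application uses.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Iterable, List


@dataclass
class Fragment:
    """Hypothetical minimal view of a lifted fragment: date, position, topics."""
    created: date
    lat: float
    lon: float
    tags: List[str] = field(default_factory=list)


@dataclass
class MapWindow:
    """Hypothetical bounding box for the visible map area (no antimeridian handling)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, f: Fragment) -> bool:
        return self.south <= f.lat <= self.north and self.west <= f.lon <= self.east


def visible(fragments: Iterable[Fragment], start: date, end: date,
            window: MapWindow) -> List[Fragment]:
    """Date slider + zoom: keep fragments inside both the date range and the window."""
    return [f for f in fragments if start <= f.created <= end and window.contains(f)]


def tag_cloud(fragments: Iterable[Fragment]) -> Counter:
    """Tag cloud: case-folded topic frequencies over the given fragments only."""
    return Counter(tag.lower() for f in fragments for tag in f.tags)


def markers(fragments: Iterable[Fragment], precision: int = 2) -> Counter:
    """Markers: number of fragments per position, rounded to `precision` decimals."""
    return Counter((round(f.lat, precision), round(f.lon, precision)) for f in fragments)
```

Because `tag_cloud` and `markers` run on the output of `visible`, moving the date slider or zooming the map changes both the topics and the marker counts shown.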
Conclusions
• Consistent interpretation of social data
  – Across heterogeneous sources
• Application
  – Allows analyses of social data
    • To fine-grained detail
  – Utilises multiple facets of social data
  – Requires metadata
• Issue of scalability
• Future Work
  – Adapting to real-time data acquisition
    • Focussing on South Yorkshire region at present
  – Assess scalability issue
Questions?
Twitter: @mattroweshow
Web: http://www.dcs.shef.ac.uk/~mrowe