Highlights from Day 3* in the Big Data House
* ±1
Wednesday’s theme
• It's not just the scale and volume of data that characterises data-intensive research, but also the complexity within and across datasets
• May be in one discipline or across many
My motivation: understanding the scholarly data ecosystem
• Data collections are growing in number, volume and complexity
• Overall there is growing heterogeneity
• The scholarly process seems to be making people more and more expert in smaller and smaller areas
• Grand challenges need researchers to cut across the silos:
– Data
– Technology
– Community
– Funding
Before
• I know people want to do data integration – linkage – different info about same thing/place/person/time
• e.g. Google maps
• e.g. Longitudinal studies
• I wanted to know what it really means, inside and across disciplines
• NAR 1000+ databases
• e.g. Climate change
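To make "different info about same thing/place/person/time" concrete, here is a minimal sketch of linkage as a join on shared keys. The function name `link_records`, the field names and all values are hypothetical and purely illustrative; real linkage usually also has to cope with inconsistent identifiers and fuzzy matches, which this sketch ignores.

```python
def link_records(left, right, keys):
    """Inner-join two lists of dicts on the shared key fields."""
    # Index the right-hand source by its key tuple for quick lookup.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    linked = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            # Merge the two descriptions of the same place and time.
            linked.append({**row, **match})
    return linked


# Two hypothetical sources describing places "A", "B" and "C" in one year.
survey = [
    {"place": "A", "year": 2008, "responses": 12},
    {"place": "B", "year": 2008, "responses": 7},
]
weather = [
    {"place": "A", "year": 2008, "rainfall_mm": 610},
    {"place": "C", "year": 2008, "rainfall_mm": 480},
]

linked = link_records(survey, weather, keys=("place", "year"))
print(linked)
```

Only place "A" appears in both sources, so only that record is linked. Deciding what to do with the unmatched records is part of what integration means in practice.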
http://isabel-drost.de/hadoop/slides/fosdem2010.pdf
MapReduce: Where is it applicable?
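As a rough guide to the shape of problem the pattern suits, here is a single-process sketch of the map, shuffle and reduce steps. It is not Hadoop and is not taken from the linked slides; the function names and the sample input are hypothetical. The pattern fits when each record can be processed independently and the results combined per key.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to each record independently, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework would do between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Combine the values for each key into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def tokenize(line):
    # Mapper: emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def total(_key, values):
    # Reducer: sum the counts for one word.
    return sum(values)


lines = ["big data house", "big data"]
counts = reduce_phase(shuffle(map_phase(lines, tokenize)), total)
print(counts)
```

Because the mapper sees one record at a time and the reducer sees one key at a time, both phases could be spread across many machines. Analyses that need every record to interact with every other fit the pattern less naturally.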
http://www.maptube.org/lookeast
July, August, September 2008
6,902 responses
BBC Look East: Anti-Social Behaviour
Mike Batty
Ideas on the future of social science research data
• Enduring challenges of documentation for replication, and coordination
• More and more comparative analysis
• Harmonisation and standardisation
• Data linkage and data enhancement
• Models for complex multiprocess systems
• Fluency – increasing uptake by more users
17/MAR/2010 DIR workshop: Handling Social Science Data
Paul Lambert
Andrey Rzhetsky
Linked Open Data
Linked data
• Lightweight
• Doesn’t mandate a technology
• Small investment, potential big return
• Sometimes misunderstood
– Hugh Glaser didn’t use the O-word or the I-word
• Well positioned for effect in the ecosystem
• I’m worried about handling data that changes over time
• “Publish and be damned” can be cultural obstacle
What we didn’t discuss enough (or I wasn’t in the room)
• Provenance working across silos
• Map-Reduce
• Arts and humanities
• ...
SysMO summary
• Providing an environment where every data-driven researcher will thrive
• Reality is messy.
– Extreme Technology Determinism vs Voluntarist Sociocultural shaping
• Extreme and continuous partnership with users.
– Act Local Think Global
• Agile development environment facilitated stream of features to tackle pain points.
– Leverage other e-Laboratories, Maintaining scientists’ buy-in.
• Socio-Political Axis dominates the Technical Axis.
– Collaboration evolutions, Confidence in exchange.
Carole Goble
Socio-technical perspective strong
• Carole’s talk:
– Reputation, incentives, sharing
• New forms of data for digital social research
– Loyalty cards
– Traffic cameras
– Smart electricity meters
– Facebook
• Privacy vs. inference
• Sociology of digital entities?
• Social simulation
• Crowd sourcing and citizen-sensing
• Citation
Digging into Data
• Structural Analysis of Large Amounts of Music Information
– University of Illinois, Urbana-Champaign; University of Southampton; McGill University
• Digging Into the Enlightenment: Mapping the Republic of Letters
– University of Oklahoma; University of Oxford; Stanford University
• Data Mining with Criminal Intent
– George Mason University; University of Alberta; University of Hertfordshire
• Towards Dynamic Variorum Editions
– Mount Allison University; Imperial College, London; Tufts University
• Digging into Image Data to Answer Authorship Related Questions
– Michigan State University; University of Illinois, Urbana-Champaign; University of Sheffield
• Harvesting Speech Datasets for Linguistic Research on the Web
– McGill University; Cornell University
• Railroads and the Making of Modern America – Tools for Spatio-Temporal Correlation, Analysis, and Visualization
– University of Portsmouth; University of Nebraska-Lincoln
• Mining a Year of Speech
– University of Oxford; University of Pennsylvania
Thanks to everyone!