e-Science: Understanding Research Data
Malcolm Atkinson & David De Roure
20 October 2009
RCUK fact-finding mission
Reporting Back
Data Quest: 7-30 Sept. 2009
http://blog.openwetware.org/deroure/
http://wikis.nesc.ac.uk/escienvoy/Main_Page#Fact-finding_mission_to_the_USA_-_September_2009
Preview: Digital Revolution
• Abundant data
• Intense activity
• All forms of data
• All disciplines
• Range of maturity
• From raw numbers in files to linked data sites in RDF
• No boundary between documents and data
• New architectures
• Beacons of good practice
• Trusted centres
• New “instruments”
• Intellectual “ramps”
• Going the “last mile”
• Niches in a digital ecosystem
• Co-evolution
• Multiple large investments
• Software & data projects
Data’s time has come
‘twas ever thus
data + algorithm = computation
Niklaus Wirth
Data data everywhere
• Digital data key to global communication
  • Between people
  • Between machines
  • Between software components
  • The universal connecting glue
• 1.8 zettabytes by 2011 (1.8×10^21 bytes)
  • in 20×10^15 data containers
• Research data a small part
  • in a much bigger ecosystem
• Triggering the Digital-Data Revolution
  • higher stress than the previous revolutions
  • because simultaneous impact on every nation
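As a rough sanity check on the two figures quoted above, dividing the projected volume by the number of containers gives an average container size. This is only a sketch: it assumes both figures are decimal (SI) quantities, and the variable names are illustrative rather than taken from the talk.

```python
# Figures quoted on the slide (projection for 2011), kept as exact integers.
total_bytes = 18 * 10**20   # 1.8 zettabytes = 1.8 x 10^21 bytes
containers = 20 * 10**15    # 20 x 10^15 data containers

# Average size of one container, in bytes and in kilobytes (SI: 1 kB = 1000 B).
avg_bytes = total_bytes // containers
avg_kilobytes = avg_bytes // 1000

print(f"average container size: {avg_bytes} bytes (about {avg_kilobytes} kB)")
```

On these numbers the average container holds about 90 kB.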
Rich sources of understanding – untapped
Technology & Researchers
Co-evolution
Tech. display
Researchers choose?
Niches?
Fastest at adaptation wins
Is your data technology of interest to researchers?
Access ramp for the mind
Easy and low risk to start
Progress to advanced skills
For research data users
Bringing researchers up to speed safely
Leading to specialised data use
Giving opportunities to do sophisticated research
Ride a professionally delivered data service
and routine research effortlessly
New “instruments”
NRAO/AUI/NSF
To reveal to the “naked mind” information it cannot see unaided
Changed our place in the universe
Data to Information to Knowledge to Wisdom
The ‘whole mile’ from data to influence
Global cloud of existing data
• Find, Acquire, Prepare, Access Data
• Clean, Filter, Transform, Combine Data
• Normalise, Analyse, Review, Compare Data and Models
• Present, Publish, Embed, Archive Evidence On Demand
Raw data → Selected & cleaned data → Analysis results → Evidence → Insight + scholarly publication → Presentation → Impact = behavioural change
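The four stages named on this slide (find/acquire/prepare/access, clean/filter/transform/combine, normalise/analyse/review/compare, present/publish/embed/archive) can be read as a simple pipeline. The sketch below is purely illustrative: the function names, the in-memory "cloud" and the sample readings are all hypothetical and not part of the talk.

```python
from statistics import mean

def acquire(source):
    """Find/acquire/prepare/access: here, just copy records out of an in-memory source."""
    return list(source)

def clean(records):
    """Clean/filter/transform/combine: drop missing values and convert the rest to float."""
    return [float(r) for r in records if r is not None]

def analyse(values):
    """Normalise/analyse/review/compare: summarise the cleaned values."""
    return {"n": len(values), "mean": mean(values)}

def present(results):
    """Present/publish/embed/archive: render the results as a line of evidence."""
    return f"n={results['n']}, mean={results['mean']:.2f}"

# Hypothetical raw data; None marks a missing reading.
raw = [1.0, None, 2.0, 3.0]
evidence = present(analyse(clean(acquire(raw))))
print(evidence)
```

Each function stands in for a whole family of tools; the point is only that every stage consumes the previous stage's product, from raw data through to evidence.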
All enrich pool of data
Global cloud of existing data
• Find, Acquire, Prepare, Access Data
• Clean, Filter, Transform, Combine Data
• Normalise, Analyse, Review, Compare Data and Models
• Present, Publish, Embed, Archive Evidence On Demand
Data-Intensive Research: The UK played a leading role; what will it do now?
Principles
• Research data support should be in harmony with evolving digital ecosystem
• Increase investment in using data to balance investment in collecting it
• Co-evolve research practices, new methods and their supporting software
• Democratise research by improving education and access
• Align foundational research, pioneering and support
• Users of data need to be aware of costs and environmental impact
Draft – still under development
Recommendations
• Stimulate new thinking in next generation of researchers
• Invest in creating and sharing methods and software for exploiting data
• Increase access and use by building ‘intellectual ramps’ and improved education
• Invest in foundational research into data-intensive methods linked with the ‘field’ experience of use and support
• Support an innovation-to-pioneering-to-supported-facility life cycle
Draft – still under development
Possible DIR Actions
• Run summer school/“bootcamp” on data-intensive research (DIR)
• Open discussion workshop on UK DIR requirements, including UK ESFRI projects
• Sandpit (cross RCUK & charities) initiating foundational research programme
• Inject DIR into current Doctoral programmes + new ones – trickle down into rest of HEIs
• Ensure that a proportion of UK research infrastructure is in services with Amdahl number ~1 or better
• Establish broad coordination body to plan UK DIR software, service and international activity
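For readers unfamiliar with the term, the Amdahl number is usually taken to be the ratio of a system's sequential I/O rate in bits per second to its instruction rate in instructions per second, so a value near 1 indicates an I/O-balanced machine. A minimal sketch under that reading follows; the function name and the example node figures are hypothetical.

```python
def amdahl_number(io_bytes_per_sec: float, instructions_per_sec: float) -> float:
    """Bits of sequential I/O per second divided by instructions per second."""
    return (io_bytes_per_sec * 8) / instructions_per_sec

# Hypothetical node: 500 MB/s sequential I/O and 10^10 instructions per second.
example = amdahl_number(500e6, 1e10)
print(f"Amdahl number: {example:.2f}")
```

Under these assumed figures the node falls short of 1, meaning it can compute faster than it can be fed with data.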
Draft – still under development
Take home message
Survival in the digital-data revolution
depends on speed and appropriateness
of adaptation
ADMIRE – Framework 7 ICT 215024
?
Picture composition by Luke Humphry, based on prior art by Frans Hals
www.omii.ac.uk
www.admire-project.eu
www.ogsadai.org.uk
www.nesc.ac.uk