Research data: what is being done
Kevin Ashley Digital Curation Centre
www.dcc.ac.uk @kevingashley
Reusable with attribution: CC-BY
The DCC is supported by Jisc
My home – the DCC
• Mission – to increase capability and capacity for research data services in UK institutions
• Not just a UK problem – an international one
• Training, shared services, guidance, policy, standards, futures
2014-10-01 Kevin Ashley –DEIC-2014 - CC-BY 2
Before what -
WHY?
Data reuse stories
• The palaeontologist who saved years of work with archaeological data
What a palaeontologist looks at
[Figure: timeline from now back to 100 million years ago, marked at 1m, 25m, 50m and 75m years]
[Figure: the same timeline with the most recent 1 million years expanded, marked at 100,000, 500,000 and 750,000 years]
What an archaeologist looks at
[Figure: timeline from now back to 1 million years ago, with the most recent 100,000 years expanded, marked at 25,000, 50,000 and 75,000 years]
• The 19th-century ships' logs that help us model climate change
The Old Weather project
Data for research, not from research
• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull
Data reuse - messages
• Often your data tells stories that your publications do not
• Not all data comes from other researchers
• One person’s noise is another person’s signal
• Discipline-bounded data discovery doesn’t give us all we need or want
Data reuse from Hubble
G8 UK – endorses OA
Open Data Charter (policy paper, 18 June 2013)
Why care?
• Data is expensive – an investment
• Reuse:
– More research
– Teaching & Learning
– Planning
• Impact – with or without publication
• Accountability
• Legal & regulatory requirements
Why does this matter?
• Research quality – How close can we get to the truth?
• Research speed – How quickly can we get to the truth?
• Research finance – How much does the truth cost?
• Improving one or more of these is of interest to all actors:
– Researchers as data creators
– Researchers as data reusers
– Research institutions
– Funders – hence government and society
FUNDER POLICY, UNIVERSITY RESPONSE
Funders are making demands
Funder requirements
• UK
• USA – NSF, NEH, NIH
• Europe
• Denmark – in development
• Most place burden on researcher – some on the institution
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx
RCUK policy - The 1-minute version
• Research data are a public good – make openly available in timely & responsible way
• Have policies & plans. Data with long-term value should be preserved & usable
• Metadata for discovery & reuse. Link publications & data
• Sometimes law, ethics get in the way. We understand.
• Limited embargoes OK. Recognition is important – always cite data sources
• OK to use public money to do this. Do it efficiently.
EPSRC policy points
• Awareness of regulatory environment
• Data access statement
• Policies and processes
• Data storage
• Structured metadata descriptions
• DOIs for data
• Securely preserved for a minimum of 10 years from last use
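The retention point above counts from last use, so every access restarts the clock. A minimal sketch of that calculation, assuming "10 years" means ten calendar years and that a last-use date is recorded for each dataset; the function name and constant are hypothetical and not part of any EPSRC or DCC tool:

```python
from datetime import date

# Minimum retention period taken from the policy point above.
RETENTION_YEARS = 10


def earliest_disposal_date(last_use: date, years: int = RETENTION_YEARS) -> date:
    """Return the first date on which the minimum retention period has elapsed.

    Hypothetical helper: assumes retention is counted in calendar years
    from the most recent recorded use of the dataset.
    """
    try:
        return last_use.replace(year=last_use.year + years)
    except ValueError:
        # last_use is 29 February and the target year is not a leap year;
        # round forward to 1 March so the full period is always covered.
        return date(last_use.year + years, 3, 1)


# A dataset last used on the date of this talk must be kept until at least:
print(earliest_disposal_date(date(2014, 10, 1)))  # 2024-10-01
```

Because the period runs from last use rather than from creation, a dataset that keeps being accessed never reaches its disposal date.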
Compliance expected by 2015
DCC Policy Summary
http://www.dcc.ac.uk/resources/policy-and-legal
Research data centres are good value!
• See Jisc reports on ADS, BADC, UKDA:
• Returns on investment between 400% and 1200%
http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx
Research Data Centres – the solution!
MANY AREAS OF RESEARCH HAVE NO DATA CENTRE TO SERVE THEM
164 universities in UK*
*2011 HESA data
71 (43%) > 5% research income
115 (70%) > £1m income from research
£4.4 billion total research grants
BIS business case: £1.5m annual investment in national research data services pays back 2.5 times after 5 years.
DCC ‘institutional engagement’
DCC support team:
• Assess needs
• Make the case
• Develop support and services
• RDM policy development
• Customised Data Management Plans
• DAF & CARDIO assessments
• Guidance and training
• Workflow assessment
• Advocacy with senior management
• Institutional data catalogues
• Pilot RDM tools
• …and support policy implementation
DCC guidance
Roles and Responsibilities
What data to keep
Some institutional roles
• Leadership – coordinate action
• Audit – who has what, where does it go?
• Advice on access – data, wherever it is
• Preservation – permanence
• Citability
• Data/publication linking
• Promoting data in teaching
• Selection
• Education – early career researchers
Who (in the UK) is leading RDM work?
Library
IT
Research Office
RESEARCHERS
Understanding Data Requirements
http://www.dcc.ac.uk/
“Departments don’t have guidelines or norms for personal back-up and researcher procedure, knowledge and diligence varies tremendously. Many have experienced moderate to catastrophic data loss”
Incremental Project Report, June 2010
http://www.flickr.com/photos/mattimattila/3003324844/
INSTITUTIONAL SERVICES
Some example services
• Storage – persistent, shareable
• Permanent, citable identifiers
• Database as a service (e.g. Oxford ORDS)
• Embed tools in Excel – DataUp, others
• Workflow management - Taverna
Make data creation easier
Make data citable
• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240%, social sciences *
– Piwowar, Vision – 9% (microarray data) †
– Henneken, Accomazzi – 20% (astronomy) #
* Pienta A, Alter G, Lyle J. (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
† Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1. http://dx.doi.org/10.7287/peerj.preprints.1v1
# Henneken E, Accomazzi A. (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
Make data discoverable
• Data must be discoverable to be reused
• Alone, or in conjunction with publication
• Institutional catalogues, national data registries – Jisc is piloting through DCC
• We are copying the Australian approach
Pimp your data – make it findable & reusable
Gking.harvard.edu/data
Cloud for storage – sorted!
• Sorry, but it isn’t.
• See David Rosenthal’s analysis of the economics of Amazon for preservation
“Distributed digital preservation in the cloud”
IJDC 8(1), 2013 doi:10.2218/ijdc.v8i1.248
The cloud has uses – long-term data retention is not one.
[Figure: cost of data for 100 years – local vs Amazon S3]
[Figure: cost of data for 100 years – local vs Amazon S3 and Glacier]
Data from blog.dshr.org/2013/01/talk-at-idcc2013.html
© David Rosenthal, used under CC-BY-SA licence
What about collaboration?
• Collaborate within the university
• Collaborate with partners
• Collaborate with regional, national services
• Not everything can be done well locally
• Infrastructure needed at research group, institution, national, (discipline) & international level
• Internationally – look to Research Data Alliance
Commercial services
SWEDEN
DENMARK?
CANADA
Closing thoughts
• Library/data centre roles:
– selecting content
– protecting it
– enabling and encouraging reuse
– assisting with data management planning
• Library:
– helping users find the most relevant content
– much research data does not come from research
• Data centre:
– setting standards
– enabling uptake
– providing services
Infrastructure levels
• Truly international – instruments, standards
• National variation, international core:
– Training
– Data management planning
– Policy
– …
My message to researchers
• The credit belongs to you
• The data belongs to all of us
• Share, and we all reap the benefits