Opening up data: a UK perspective – Jisc and CNI conference, 10 July 2014
Kevin Ashley, Director, Digital Curation Centre
Opening up data: A UK perspective
Kevin Ashley Digital Curation Centre
www.dcc.ac.uk
@kevingashley
Reusable with attribution: CC-BY The DCC is supported by Jisc
A summary
• Policy background
• The end point – why it matters
• UK reaction & developments
• Infrastructure
• Costs
• Joining up internationally
• More than data…
2014-07-10 Kevin Ashley -Jisc/CNI 2014 - CC-BY
My home – the DCC
• Mission – to increase capability and capacity for research data services in UK institutions
• Not just a UK problem – an international one
• Training, shared services, guidance, policy, standards, futures
SWEDEN
DENMARK
CANADA
Data reuse stories
• The palaeontologist who saved years of work with archaeological data
• The 19th-century ships’ logs that help us model climate change
• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull
Data reuse - messages
Often your data tells stories that your publications do not
Not all data comes from other researchers
One person’s noise is another person’s signal
Discipline-bounded data discovery doesn’t give us all we need or want
Why does this matter?
• Research quality – How close can we get to the truth?
• Research speed – How quickly can we get to the truth?
• Research finance – How much does the truth cost?
• Improving one or more of these is of interest to all actors:
• Researchers as data creators
• Researchers as data reusers
• Research institutions
• Funders – hence government and society
The Policy
G8
UK – Endorses OA
Open Data Charter
Policy paper, 18 June 2013
Funder requirements
• UK – RCUK (generic), NERC, STFC, ESRC, BBSRC, EPSRC, MRC
• USA – NSF, NEH, NIH
• Europe – Denmark, Germany, Netherlands…
• Most place burden on researcher – some on the institution
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx
RCUK policy - The 1-minute version
• Research data are a public good – make openly available in timely & responsible way
• Have policies & plans. Data with long-term value should be preserved & usable
• Metadata for discovery & reuse. Link publications & data
• Sometimes law, ethics get in the way. We understand.
• Limited embargoes OK. Recognition is important – always cite data sources
• OK to use public money to do this. Do it efficiently.
EPSRC policy points
• Awareness of regulatory environment
• Data access statement
• Policies and processes
• Data storage
• Structured metadata descriptions
• Permanent identifiers for data
• Securely preserved for a minimum of 10 years from last use
Compliance expected by 2015
DCC Policy Summary
http://www.dcc.ac.uk/resources/policy-and-legal
The Response
DCC guidance
Roles and Responsibilities
What data to keep
Compliance
Benefits
DCC ‘institutional engagement’
Assess needs
Make the case
Develop support and services
RDM policy development
Customised Data Management Plans
DAF & CARDIO assessments
Guidance and training
Workflow assessment
DCC support team
Advocacy with senior management
Institutional data catalogues
Pilot RDM tools
…and support policy implementation
Who (in the UK) is leading on RDM?
Library
IT
Research Office
Survey of UK HE RDM readiness
• 61 of 69 institutions responded (institutions with > 10% of funding from research)
• 90% using internal funding for staff, training
• 57% filling all or most roles through restructuring
• Russell Group: 4.7 FTE -> 9.5 FTE within a year
• Others: 2.6 FTE -> 3 FTE
• Lack of clarity on staff outside central services
• Research Support & Commercialisation: 31%
• Library or Information Service: 38%
• IT/Research computing: 14%
• Others: 17%
Data & charts from Angus Whyte, DCC
Drivers – UK institutions
(% agreeing)
• UK Research Council data policies: 92%
• Government policy on open data: 57%
• Governance of research integrity / academic conduct: 54%
• Strategy to expand support for research: 54%
• EU Horizon2020 policy on data management: 53%
Least progress (% indicating piloting or live)
• Business planning & sustainability
• Digital preservation & continuity planning
• Governance of data access & reuse
What kind of external support is needed?
• Advice on retention, selection
• Advice on metadata creation for discovery
• Specifying tools & infrastructure
• Costing
• Advocacy to senior management
• Developing data catalogues/registers
EPSRC asked its researchers…
• 75% know of funder’s policy (25% in detail)
• 55% know their institution has a policy
• 70% are not aware of institutional training or services for RDM
• Some contradictory responses
Thanks to Ben Ryan, EPSRC, for quotes & data
Services researchers are aware of
• Help with data management planning
• Help with metadata creation
• Training
• Backup of research data
• Dedicated storage
Some selected observations
The nature of my work is such that it generates no data that doesn't end up in my papers, so I'm unlikely to know about these policies.
This is irrelevant to me. I deal with no sensitive data
RDM sounds like a gigantic waste of time and I intend to spend as little time on it as possible
I am on the point of retiring so taking less interest in these things
Infrastructure
Data centres are good value!
• See Jisc reports on ADS, BADC, UKDA:
http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx
Returns on investment between 400% and 1200%
Research Data Registry & Discovery Service
• Modelled on Research Data Australia
• Gain visibility of small data collections
• Help drive home distinction between discoverable data & open data
• Get evidence on which metadata items deliver reuse potential
• Idea from UKRDS report in 2010
• RDA working group coordinating international work
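A discovery service indexes metadata rather than data, which is why a dataset can be discoverable without being open. The Python sketch below illustrates that distinction under stated assumptions: the record fields (`identifier`, `title`, `keywords`, `access`), the helper names `discover` and `openly_available`, and the example records are all hypothetical, and none of this reflects the actual schema of the Jisc registry or of Research Data Australia.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    # Hypothetical minimal registry record: illustrative fields only,
    # not the schema of any real registry.
    identifier: str
    title: str
    keywords: tuple
    access: str  # e.g. "open", "restricted", "on request"


def discover(records, term):
    """Return every record whose title or keywords mention `term`.

    Access conditions are deliberately ignored: discoverability depends
    only on the metadata being published, not on the data being open.
    """
    term = term.lower()
    return [
        r for r in records
        if term in r.title.lower() or any(term in k.lower() for k in r.keywords)
    ]


def openly_available(records):
    """Keep only the records whose data can be obtained without restriction."""
    return [r for r in records if r.access == "open"]


if __name__ == "__main__":
    # Hypothetical example records.
    registry = [
        DatasetRecord("example-001", "Volcanic ash radar returns", ("radar", "ash"), "open"),
        DatasetRecord("example-002", "Interview transcripts on ash clean-up", ("interviews",), "restricted"),
        DatasetRecord("example-003", "Ships' logbook weather observations", ("climate",), "open"),
    ]
    found = discover(registry, "ash")
    print([r.identifier for r in found])                    # ['example-001', 'example-002']
    print([r.identifier for r in openly_available(found)])  # ['example-001']
```

Searching for “ash” finds both matching records, but only one of them can be downloaded; the restricted dataset is still visible, so a potential reuser knows it exists and can ask for access.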
Pimp your data – make it findable & reusable
gking.harvard.edu/data
On costs
• Costs of data curation relatively simple to measure: see work of 4C (4cproject.eu)
• Charging and payment are more complex
• Funder rules can lead to perverse, inefficient payment systems
• Fundamental question is ‘who pays’. This changes the answer to ‘what does it cost’
Commercial services
What it means
• Project funding can only be spent during projects on direct project costs
• Project funding comes with overheads, which universities must use for research infrastructure
• Ongoing (‘QR’) money is continuous, relates to research ranking
• Important to distinguish business-as-usual from exceptional requirements
A research lifecycle
[Diagram: axes Time and Resources; labels Normal zone, Exceptional zone, Business as usual threshold, Project end point, Eligible for project funding]
Being clever with costs
• Ongoing costs beyond project end cannot be charged to a grant, but…
• ‘Pay once, store forever’ charges are acceptable.
• Thus, incentive to outsource long-term curation
• Yet universities are only acting as the last-resort option in any case – discipline data archives preferred
Many of these are run by funders
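The ‘pay once, store forever’ point rests on a simple piece of arithmetic: a one-off charge can cover storage indefinitely if it is treated as an endowment equal to the present value of all future storage costs. The Python sketch below assumes, hypothetically, that storing a dataset costs c this year, that this cost falls by a constant fraction d each year, and that future spending is discounted at a constant rate r; the charge is then the geometric series c·Σ((1 − d)/(1 + r))^t = c(1 + r)/(r + d). The function names and the figures are illustrative and are not the charging model of any particular data centre or funder.

```python
def pay_once_charge(annual_cost: float, cost_decline: float, discount_rate: float) -> float:
    """Up-front charge covering storage 'forever' under an endowment model.

    Hypothetical assumptions: storage costs `annual_cost` this year, that
    cost falls by the fraction `cost_decline` each year, and future payments
    are discounted at `discount_rate` per year. The charge is the present
    value of the infinite stream of payments:

        sum_{t>=0} c * q**t  with  q = (1 - d) / (1 + r)
        = c / (1 - q) = c * (1 + r) / (r + d)
    """
    if discount_rate + cost_decline <= 0:
        raise ValueError("discounted costs do not fall over time: the charge would be unbounded")
    return annual_cost * (1 + discount_rate) / (discount_rate + cost_decline)


def truncated_charge(annual_cost: float, cost_decline: float,
                     discount_rate: float, years: int) -> float:
    """Same present value, summed explicitly over a finite number of years."""
    q = (1 - cost_decline) / (1 + discount_rate)
    return sum(annual_cost * q**t for t in range(years))


if __name__ == "__main__":
    # Illustrative figures only: 100 (any currency) per year today,
    # costs falling 20% a year, 3% discount rate.
    print(round(pay_once_charge(100.0, 0.20, 0.03), 2))  # 447.83
    print(round(truncated_charge(100.0, 0.20, 0.03, 10), 2))
```

Under these assumptions the perpetual charge is a modest multiple of one year’s cost; as r + d approaches zero it grows without bound, so any such charge is only as sound as its assumptions about falling costs.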
What stops data reuse
• Loss
• Destruction
• Pride
• Gluttony
• Ineptitude
• Concealment
• Bureaucracy
• Complexity
• Procrastination
• Lack of potential
“Departments don’t have guidelines or norms for personal back-up and researcher procedure, knowledge and diligence varies tremendously. Many have experienced moderate to catastrophic data loss”
Incremental Project Report, June 2010
http://www.flickr.com/photos/mattimattila/3003324844/
Excuses – and responses
• “People will ask questions”
– So use a data centre or repository
• “It will be misinterpreted”
– Stuff happens. Also, openness encourages correction
• “It’s not interesting”
– Let others be the judge – your noise is my signal
• “I might get another paper out of it”
– Up to a point. We might get more research out of it
• “I don’t have permission”
– A real problem. But solvable at senior level
• “It’s too bad/complicated” – see above
• “It’s not a priority”
– Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well
See e.g. Carly Strasser’s blog: http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/
Citability
• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240%, social sciences *
– Piwowar, Vision – 9% (microarray data) †
– Henneken, Accomazzi – 20% (astronomy) #
* Pienta A, Alter G, Lyle J. (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
† Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1. http://dx.doi.org/10.7287/peerj.preprints.1v1
# Henneken E, Accomazzi A. (2011) Linking to Data – Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
Open scholarly communication
• It’s not just publications and/or data
• Software, methods, workflows, instruments…
• Need to resist the urge to make everything look like a publication