A centre of expertise in digital information management
www.ukoln.ac.uk
UKOLN is supported by:
Changing Roles, Responsibilities and Relationships
Dr Liz Lyon, Director, UKOLN
Associate Director, UK Digital Curation Centre
Opening the research data lifecycle, JISC Conference 2007
This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0
Preliminary findings from a JISC study
• Terms of Reference for UKOLN: To define how institutions (collectively and individually) and scientific data centres can together effectively achieve
– Preservation
– Access: managed and open
– Re-use: data citation, data mining and re-interpretation
• October 2006 – March 2007
• N.B. Work in progress!
Some of the data stakeholders?
Funders
• Interviews: 4 Research Councils + 1 charity
• Support for data curation is (still) patchy
• Mixed approaches: proactive to passive
• Gaps in infrastructure support for data outputs
• Limited formal links between programme planning and support infrastructure
• Some data management and sharing policies
• Some use of Data Management Plans
• Wellcome Trust – Policy + Q&A January 2007
Wellcome Trust policy, January 2007
A Data Management and Sharing Plan is required “if creating or developing a resource for the research community as the primary goal” or where proposals “involve the generation of a significant quantity of data that could potentially be shared for added benefit”
Funders 2
• Limited advocacy work
• Funding models for infrastructure support vary
• Funding models for research programmes vary
• Some productive partnerships e.g. MRC and Wellcome Trust, CCLRC and Wellcome
• Some examples of good practice
Hierarchy of drivers (for data sharing)
Acknowledgement: Mark Thorley, NERC
• Level 0: deliver project.
• Level 1: meet ‘good scientific practice’.
• Level 2: support own science.
• Level 3: employer’s requirements.
• Level 4: funder’s requirements.
• Level 5: public policy requirements.
NATURAL ENVIRONMENT RESEARCH COUNCIL
NERC has:
• 7 designated data centres
• Data Management Co-ordinator
• DataGrid
MRC developing a data support plan
Acknowledgement: Alan Sudlow
Data centres & Data services
• Interviews with 5 data services
• Deep levels of expertise and subject knowledge
• Exemplars of good practice: standards, policies, manuals, robust curation / preservation practice
• Limited sharing of expertise between centres
• Some effective partnerships:
– AHDS Stormont Papers with Queen's University Belfast
– BADC with CLADDIER Project
• Wide range of community awareness
• Use of licences but IPR issues: performing arts
• Technical issues: complexity of data sets, version control, identifiers, application profiles
Data centres & Data services 2
• Exemplar of good practice
– European Bioinformatics Institute
– Microarray data to inform gene expression
– Consensus on community standards: MIAME
– Data pipelines at source via Laboratory Information Management Systems (LIMS)
– User tools (MIAMExpress) & value-added services
– Annotation of data using the Gene Ontology
– Submission & deposit is embedded in community culture: requirement for publication
– Training programme, eLearning materials coming
– This level of data curation is expensive!!
[Figure: EBI core biomolecular resources. Reactome; EnsEMBL (Genome Annotation); EMBL-Bank (DNA sequences); UniProt (Protein Sequences); Array-Express (Microarray Expression Data); EMSD (Macromolecular Structure Data); IntAct (Protein Interactions). Source: Graham Cameron, EBI]
[Figure: Core biomolecular resources in relation to other data resources. Category labels: Model organism resource examples; Specialist biomolecular data resource examples; Large resources in related disciplines; Chemical data resources; Medical data resources; Biodiversity data resources. Named resources: Flybase, MGD, SGD, BRENDA, IMGT, Pasteur DBs, Eumorphia/Phenotypes, Mutants, Mouse Atlas. Source: Graham Cameron, EBI]
General Data Selection Criteria
• Usability
– Quality of data
– Usable data format
– Conditions of Use
– Reputable Author
– Documentation
• Usefulness
– Data quality
– Uniqueness of data
– Potential Strategic Use
– Usefulness of parameters
Institutions & Data Repositories
• Not much data… or duplication… (yet?)
• Departmental audits of research data practice at University of Southampton to inform developing institutional data & curation policy
• Barriers to data sharing:
– IPR and geospatial data
– Lack of awareness amongst researchers
– Cultural roots and resistance to change
• Exemplars of good practice: eBank Project
eCrystals ‘Global Federation’ Model
[Diagram. Components: Data creation & capture in “Smart lab”; Laboratory repository; Institutional data repositories; Subject Repository; Aggregator services; Presentation services / portals; Publishers (peer-review journals, conference proceedings, etc); Institution Library & Information Services. Flow and function labels: Deposit; Validation; Publication; Data analysis; Search, harvest; Data discovery, linking, citation; Curation; Preservation]
Roles, Rights & Responsibilities
• ‘Scientist’: Creation and use of data.
• ‘Data centre’: Curation of and access to data.
• ‘User’: Use of 3rd party data.
• ‘Funder’: Set / react to public policy drivers.
• ‘Publisher’: Maintain integrity of the scientific record.
Acknowledgement: Mark Thorley, NERC
Closing thoughts
• Co-ordination and join up
– High level and strategic: Funders
– Operational level and practical: JISC data services & research council data centres
• Funding
– Are current economic models for preservation & data sharing infrastructure a) appropriate? b) adequate? c) sustainable?
– Should inform prioritisation and investment
Closing thoughts 2
• Good Practice requirements
– Data management and sharing Policies
– Data Management Plans (peer-reviewed)
– Institutional data curation policies & planning
• Technical interoperability and integration
– Data are diverse and complex
– JISC IIE vision of discovery across repositories
– Contextual linking offers opportunity for data centres and institutional repositories to realise synergies and work more closely together
Closing thoughts 3
• Advocacy
– Programmes to reach across sectors
– Harmonisation and consistent messages
– Tailored & targeted to disciplines
– Researcher has some curatorial responsibility
• Training
– Lack of skills
– eLearning opportunity
– Data scientists? Recognition and career development
– “Native” data scientists are coming…
“Dealing with the Data Deluge”
• JISC Repositories Programme
• Supporting Institutions in the Digital Age
• Digital Repositories Conference
• 5-6 June 2007
• University of Manchester
• Research Data Strand