
UKOLN is supported by:

Enhanced support for eScience: the role of Digital Libraries

Digital Libraries Go eScience, ECDL, Alicante September 2006

Rachel Heery

Deputy Director R&D, UKOLN

www.ukoln.ac.uk

A centre of expertise in digital information management

                                                             

Summary

• New modes of scholarship
– eScience service portfolio
– Emerging eResearch ecology

• Infrastructural elements
– Data creation and capture
– Data curation and preservation
– Data citation, discovery and use
– Adding value and knowledge extraction

                                                             

Vision 2010

• Richer scholarly communication based on open access to and re-use of scholarly materials

• Integrated life-cycle of knowledge from research to learning

• Access and re-use of scholarly materials

• Added value services on scholarly materials (involving HE and commercial sectors)

                                                             

More repositories and more content!

• Working papers, primary data, audiovisual, images

• Hardware in research labs will automatically deposit experimental data

• Desktop tools will deposit content

• Rich data flows between networks of repositories

• Rich data flows between repositories and other components in information landscape

• National and institutional preservation strategies in place!

                                                             

Repositories interworking with other eResearch components

Repositories

Experimental equipment

Authoring tools

Name authority services

Field study capture tools

Terminology services

Content Packaging tools

                                                             

Where are we now?

Scholarship today? OA landscape

                                                             

http://www.flickr.com/photos/97797311@N00/61648107/

23 June 2006

Architecture of Participation?

Data-centric 2020 vision

Reference datasets as infrastructure?

                                                             

New forms of publication: integration of data and journals

                                                             

Emerging ecology

                                                             

Defining workflows and dataflows

• Analyse roles and interactions within and between repositories

• What does the user want?

• Identify and define services
– Potential for ‘shared services’, re-use of services

• Explore potential dataflows
– Aggregation, data exchange, metadata extraction and enhancement

                                                             

Dataflows and Workflows

• How is primary research data captured in faculty and academic departments?

• Where and how is primary research data stored? Made accessible?

• What are the processes for deriving further data, and how is this structured and stored? Made accessible?

• How is data curated for the long term?

Understanding the research process

• Project StORe: Source-to-Output Repositories (Edinburgh)
– Primary data : research publications
– Survey questionnaire

• RepoMMan: Repository Metadata and Management (Hull)
– Survey questionnaire and interviews
– Activity diagram and workflow

• DCC SCARP
– Curation staff working within research teams

Repository ecology

Institutional Repository

Departmental repository

Authoring tool

Subject repositories

Institutional research system

Data Centres

Learned society repositories

Laboratory repository

Experimental machine

Aggregators:

OAIster, Google

Regional, national

Text mining tools

Terminology services

Research council repositories

                                                             

Digital libraries & eScience Infrastructure

                                                             

Data capture

Digital repositories, OA & preservation

• Long-term access: trust, responsibility, policy

• Trusted DR Audit Checklist for Certification (draft), Research Libraries Group-NARA Taskforce 2005

• Defined criteria under 4 categories
– Organisation
– Functions, processes & procedures
– Designated community & usability
– Technologies & technical infrastructure

• UK Digital Curation Centre: advice, tools & services

• RepInfo Registry

• EU CASPAR Integrated Project

• Task Force on the Permanent Access to the Records of Science

http://www.dcc.ac.uk/

http://www.casparpreserves.info/pages/1/index.htm

http://tfpa.kb.nl/

Data, metadata and discovery

• Validation, publication & discovery of data models & schema

• Metadata packaging standards
– METS, MPEG-21 DIDL
– Complex object model?

• Semantic descriptions
– Formal high-level and domain ontologies
– Inter-disciplinary discovery

• ePrints DC Application Profile

• UK Intute IR search service (eprints)

• Informal social network approaches: “folksonomies”

• What data models and metadata schema are in place?

• Have librarians been involved in their development?

Persistent identifiers for data citation

• How will they be used? We need use cases: depositor, author, service provider, researcher, publisher?

• Schemes: DOI, Handle, ARK, PURL

• Publication & citation of scientific primary data project: National Library for Science & Technology (TIB), University of Hanover, Germany. STD-DOI Project: DOI registry for datasets http://www.std-doi.de

• What persistent identifiers have been assigned to your data?

• Is there a data citation policy?

• Was the Library involved?
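
One of the use cases above, a service provider turning a cited DOI into a resolvable link, can be sketched as follows. This is a minimal illustration, not part of the talk: the function names are hypothetical, the regular expression is only a rough syntactic check, and the resolver base `https://doi.org/` is the current form of the `http://dx.doi.org/` address used elsewhere in these slides.

```python
import re

# Rough syntactic check only (an assumption for this sketch):
# "10.", a registrant code, "/", then a non-empty suffix.
DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")

RESOLVER = "https://doi.org/"

# Common ways a DOI is written in citations and links.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw: str) -> str:
    """Strip a resolver or 'doi:' prefix and return the bare DOI."""
    candidate = raw.strip()
    lowered = candidate.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            candidate = candidate[len(prefix):]
            break
    if not DOI_SHAPE.match(candidate):
        raise ValueError(f"does not look like a DOI: {raw!r}")
    return candidate


def resolver_url(raw: str) -> str:
    """Build a resolvable link for a DOI written in any of the known forms."""
    return RESOLVER + normalise_doi(raw)
```

Keeping the bare DOI as the stored value and building the link on demand means a citation stays valid if the resolver address changes.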

Adding value: repository services

• Tools: for deposit, normalisation, manipulation, transformation…

• Linking, annotation, visualisation

• Aggregators: generic, (sub-) disciplinary

Knowledge extraction:

• Mining (data, text, structures)

• Modelling (economic, climate, mathematical, biological…)

• Analysis (statistical, lexical, gene…)

Is your data OA?

How is your data being used and re-used?

Nature 23 March 2006 OTMI: Open Text Mining Interface

NaCTeM http://www.nactem.ac.uk/

Emerging tools: TerMine, GENIA, Cafetiere

                                                             

A Case Study in Crystallography

                                                             

Data capture

R4L Deposit scenario (…part of….)

1. Produce strategy for synthesis (=idea)

2. Submit plan to SmartTea system (incl. identifiers)

3. Retrieve and follow instructions (sub-workflow?)

4. Experimental synthesis metadata automatically recorded on instruments (Smart Lab)

5. Create record for synthesised sample (+ proposed chemical identifier) in R4L laboratory data management system

6. Run spectral analyses on sample capturing further analysis metadata (incl. time-stamp, analysis software version, researcher details etc.)

7. Save spectrum in native and common formats

8. Invoke R4L data capture service and deposit files + metadata in laboratory repository…

RAW DATA DERIVED DATA RESULTS DATA
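
Steps 6 to 8 of the scenario (capture analysis metadata, save the spectrum in two formats, deposit files plus metadata) can be sketched as a small data structure. Everything here is hypothetical: the class, field and function names and the file names are illustrative assumptions, not the R4L system's actual interface.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

# The three data stages named on the slide.
STAGES = ("RAW", "DERIVED", "RESULTS")


def _utc_now() -> str:
    """Time-stamp recorded automatically when the record is created."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AnalysisRecord:
    """Hypothetical record for one spectral analysis of a sample."""
    sample_id: str           # identifier assigned when the sample record was created
    researcher: str          # researcher details
    software_version: str    # analysis software version
    native_file: str         # spectrum in the instrument's native format
    common_file: str         # same spectrum in a common exchange format
    stage: str = "RAW"
    recorded_at: str = field(default_factory=_utc_now)


def build_deposit(record: AnalysisRecord) -> dict:
    """Bundle files and metadata into one package for the laboratory repository."""
    if record.stage not in STAGES:
        raise ValueError(f"unknown data stage: {record.stage!r}")
    metadata = asdict(record)
    # Files travel alongside the metadata rather than inside it.
    files = [metadata.pop("native_file"), metadata.pop("common_file")]
    return {"metadata": metadata, "files": files}
```

Recording the time-stamp in a default factory, rather than asking the researcher for it, mirrors the scenario's emphasis on metadata being captured automatically at the instrument.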

                                                             

eBank UK Project

• Promote open access crystallography data

• Aggregator service harvests OAI metadata from institutional data repository (e-Crystals archive)

• Service linking from data to derived research publication

• Embedding eBank service in learning workflows: pedagogy

• Future federation plans for crystallography data repositories

UKOLN (lead), University of Southampton, University of Manchester

http://www.ukoln.ac.uk/projects/ebank-uk/
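
To illustrate what harvesting OAI metadata involves on the aggregator side, the sketch below parses an OAI-PMH `ListRecords` response carrying `oai_dc` records. It is an assumption-laden illustration, not the eBank aggregator's code: the sample response, its identifiers and the function name are invented, and a real harvester would fetch the XML over HTTP and follow resumption tokens.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response; a real one would come from the repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:145</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example crystal structure</dc:title>
          <dc:creator>Example, A.</dc:creator>
          <dc:creator>Example, B.</dc:creator>
          <dc:identifier>https://example.org/145</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text: str) -> list[dict]:
    """Extract identifier, titles, creators and links from each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("./oai:ListRecords/oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            # Records flagged as deleted carry a header but no metadata.
            continue
        records.append({
            "oai_identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS
            ),
            "titles": [e.text for e in dc.findall("dc:title", NS)],
            "creators": [e.text for e in dc.findall("dc:creator", NS)],
            "links": [e.text for e in dc.findall("dc:identifier", NS)],
        })
    return records
```

Each Dublin Core element is kept as a list because simple DC allows every element to repeat.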

                                                             

A data repository entry ecrystals.chem.soton.ac.uk

Access to the underlying data: complex objects

eBank Metadata Publication

• Using simple Dublin Core
– Crystal structure
– Title (Systematic IUPAC Name)
– Authors
– Affiliation
– Creation Date

• Additional chemical information through Qualified Dublin Core
– Empirical formula
– International Chemical Identifier InChI
– Compound Class & Keywords

• Specifies which ‘datasets’ are present in an entry

• Application Profile

• DOIs from TIB http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145

• Data citation policy http://ecrystals.chem.soton.ac.uk/rights.html

http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
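
The simple Dublin Core part of such a record can be sketched with the standard library. This is an illustration only: the mapping of title, authors, creation date and DOI onto `dc:title`, `dc:creator`, `dc:date` and `dc:identifier` is an assumption, the function name is hypothetical, and the chemistry-specific elements of the eBank application profile are not reproduced because their exact names are defined in the project schemas, not here.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Use the conventional prefixes when the record is serialised.
ET.register_namespace("dc", DC)
ET.register_namespace("oai_dc", OAI_DC)


def simple_dc_record(title: str, creators: list[str], date: str, identifier: str) -> str:
    """Serialise a minimal simple Dublin Core description of a dataset entry."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    ET.SubElement(root, f"{{{DC}}}title").text = title
    for creator in creators:
        # One dc:creator element per author, since the element is repeatable.
        ET.SubElement(root, f"{{{DC}}}creator").text = creator
    ET.SubElement(root, f"{{{DC}}}date").text = date
    ET.SubElement(root, f"{{{DC}}}identifier").text = identifier
    return ET.tostring(root, encoding="unicode")
```

A call such as `simple_dc_record("Example structure", ["Example, A."], "2006-01-01", "https://example.org/145")` returns an `oai_dc:dc` element that a harvester can read without knowing anything about crystallography, which is the point of exposing a simple DC layer alongside the richer profile.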

Discovering data:

Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol. Chem., 2005, (10),1832-1834. DOI: 10.1039/b502828k

• Domain identifier: International Chemical Identifier (InChI) code

• Google molecule using InChI

Slide from Simon Coles

Adding value: eBank linking data to publications

Linking research to learning - embedding eBank aggregator service in a science portal for student learners

Integration into the curriculum and e-Learning workflows

• MChem course

• Assess role in Undergraduate Chemical Informatics courses

• Pedagogic evaluation

• April – June 2006

• Report to follow.

                                                             

Roles & responsibilities: new challenges?

Workforce development and capacity building

• NSF Draft Report 2005 “Data scientist” - hybrid skills

• Facilitate collaboration
– Multidisciplinary teams: computer scientists, domain scientists, digital library experts, statisticians/modellers, e.g. eBank project
– Lessons learnt: e-Science Human Factors Audit Report (to be published 2006), Roy Kalawsky, Loughborough

• CURL/SCONUL e-Research Taskforce

Has your (digital) library engaged with the e-Research agenda?

                                                             

Repositories roadmap: vision 2010

• Richer scholarly communication based on open access to and re-use of scholarly materials

• Integrated life-cycle of knowledge from research to learning

• Available metadata about scholarly materials

• Added value services on scholarly materials (involving HE and commercial sectors)

                                                             


Repository interworking with other components

Repository

Virtual Learning Environment

Authoring tool

Name authority service

Institutional research system

Automated classification service

Packaging tool

                                                             


Defining workflows and dataflows

• Analyse roles and interactions within and between repositories

• What does the user want?

• Identify and define services
– Potential for ‘shared services’, re-use of services
– In context of JISC e-Framework

• Explore potential dataflows
– Aggregation, data exchange, metadata extraction and enhancement

                                                             

Deposit a priority!

• To enable users to populate repositories simply, effectively and preferably automatically

• To capture content from desktop applications, experimental equipment (smart labs), learning content development tools etc

• To enable repository of deposit to exchange data with further repositories in predictable manner

• To hide complexity from end-user

• To be compatible with follow-on added value services layered on repository content

• Deposit API Working group meeting July 11/12, Warwick

http://www.ukoln.ac.uk/repositories/digirep/

                                                             

Thank you!
