UKOLN is supported by:
Enhanced support for eScience: the role of Digital Libraries
Digital Libraries Go eScience, ECDL, Alicante September 2006
Rachel Heery
Deputy Director R&D, UKOLN
www.ukoln.ac.uk
A centre of expertise in digital information management
Summary
• New modes of scholarship
  – eScience service portfolio
  – Emerging eResearch ecology
• Infrastructural elements
• Data creation and capture
• Data curation and preservation
• Data citation, discovery and use
• Adding value and knowledge extraction
Vision 2010
• Richer scholarly communication based on open access to and re-use of scholarly materials
• Integrated life-cycle of knowledge from research to learning
• Access and re-use of scholarly materials
• Added value services on scholarly materials (involving HE and commercial sectors)
More repositories and more content!
• Working papers, primary data, audiovisual, images
• Hardware in research labs will automatically deposit experimental data
• Desktop tools will deposit content
• Rich data flow between networks of repositories
• Rich data flows between repositories and other components in the information landscape
• National and institutional preservation strategies in place!
Repositories interworking with other eResearch components
Repositories
Experimental equipment
Authoring tools
Name authority services
Field study capture tools
Terminology services
Content Packaging tools
Where are we now?
Scholarship today? OA landscape
http://www.flickr.com/photos/97797311@N00/61648107/
23 June 2006
Architecture of Participation?
Data-centric 2020 vision
Reference datasets as infrastructure?
New forms of publication: integration of data and journals
Emerging ecology
Defining workflows and dataflows
• Analyse roles and interactions within and between repositories
• What does the user want?
• Identify and define services
  – Potential for ‘shared services’, re-use of services
• Explore potential dataflows
  – Aggregation, data exchange, metadata extraction and enhancement
Dataflows and Workflows
• How is primary research data captured in faculty and academic departments?
• Where and how is primary research data stored? Made accessible?
• What are the processes for deriving further data, and how is this data structured and stored? Made accessible?
• How is data curated for the long term?
Understanding the research process
• Project StORe: Source-to-Output Repositories (Edinburgh)
  – Primary data : research publications
  – Survey questionnaire
• RepoMMan: Repository Metadata and Management (Hull)
  – Survey questionnaire and interviews
  – Activity diagram and workflow
• DCC SCARP
  – Curation staff working within research teams
Repository ecology
Institutional Repository
Departmental repository
Authoring tool
Subject repositories
Institutional research system
Data Centres
Learned society repositories
Laboratory repository
Experimental machine
Aggregators:
OAIster, Google
Regional, national
Text mining tools
Terminology services
Research council repositories
Digital libraries & eScience Infrastructure
Data capture
Digital repositories, OA & preservation
• Long-term access: trust, responsibility, policy
• Trusted Digital Repository Audit Checklist for Certification (draft), Research Libraries Group-NARA Taskforce 2005
  – Defined criteria under 4 categories: Organisation; Functions, processes & procedures; Designated community & usability; Technologies & technical infrastructure
• UK Digital Curation Centre: advice, tools & services
• RepInfo Registry
• EU CASPAR Integrated Project
• Task Force on the Permanent Access to the Records of Science
http://www.dcc.ac.uk/
http://www.casparpreserves.info/pages/1/index.htm
http://tfpa.kb.nl/
Data, metadata and discovery
• Validation, publication & discovery of data models & schema
• Metadata packaging standards
  – METS, MPEG-21 DIDL
  – Complex object model?
• Semantic descriptions
  – Formal high-level and domain ontologies
  – Inter-disciplinary discovery
• ePrints DC Application Profile
• UK Intute IR search service (eprints)
• Informal social network approaches: “folksonomies”
• What data models and metadata schema are in place?
• Have librarians been involved in their development?
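METS can bundle a data-repository entry (for example its raw, derived and results files) as a single complex object. A minimal sketch using only Python's standard library, assuming a bare METS document containing just a file section and a structural map (no descriptive or administrative metadata sections). The function name, the `USE` values and the example URLs are hypothetical; MPEG-21 DIDL is not shown.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(label: str, files: list[tuple[str, str, str]]) -> bytes:
    """Package files as one complex object.

    `files` holds (use, mimetype, url) triples; `use` names the file's role,
    e.g. "raw", "derived" or "results" (hypothetical role names).
    """
    root = ET.Element(m("mets"), {"LABEL": label})
    # METS requires fileSec to precede structMap, so create them in that order.
    file_sec = ET.SubElement(root, m("fileSec"))
    struct_map = ET.SubElement(root, m("structMap"))
    top_div = ET.SubElement(struct_map, m("div"), {"LABEL": label})

    groups: dict[str, ET.Element] = {}
    for i, (use, mimetype, url) in enumerate(files, start=1):
        # One fileGrp per role, created the first time that role appears.
        if use not in groups:
            groups[use] = ET.SubElement(file_sec, m("fileGrp"), {"USE": use})
        file_id = f"F{i}"
        f = ET.SubElement(groups[use], m("file"), {"ID": file_id, "MIMETYPE": mimetype})
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url})
        # The structural map points back at each file by its ID.
        ET.SubElement(top_div, m("fptr"), {"FILEID": file_id})
    return ET.tostring(root, encoding="utf-8")
```

Grouping files by role keeps "which datasets are present in an entry" explicit in the package, while the structural map preserves the order in which they were supplied.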
Persistent identifiers for data citation
• How will they be used? We need use cases: depositor, author, service provider, researcher, publisher?
• Schemes: DOI, Handle, ARK, PURL
• Publication & citation of scientific primary data project
  National Library for Science & Technology (TIB), University of Hanover, Germany. STD-DOI Project: DOI registry for datasets http://www.std-doi.de
• What persistent identifiers have been assigned to your data?
• Is there a data citation policy?
• Was the Library involved?
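Assigning a DOI is only part of data citation: services also have to recognise the identifier in its various written forms and resolve it. A sketch assuming DOIs appear bare, with a `doi:` prefix, or as a resolver URL. The function names are hypothetical, and the citation layout (`Creators (Year): Title. Publisher. URL`) is an assumption loosely in the style of common dataset-citation practice, not the eCrystals or STD-DOI policy.

```python
# Resolver and URI prefixes that may precede a DOI name (matched case-insensitively).
DOI_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip whitespace and any known prefix, returning the bare DOI name."""
    doi = raw.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Every DOI name starts with a "10." directory prefix and has a "/" before its suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI name: {raw!r}")
    return doi


def doi_url(raw: str, resolver: str = "https://doi.org/") -> str:
    """Turn any accepted DOI form into an actionable resolver URL."""
    return resolver + normalize_doi(raw)


def cite_dataset(creators: list[str], year: int, title: str, publisher: str, raw_doi: str) -> str:
    """Hypothetical data-citation layout: Creators (Year): Title. Publisher. URL"""
    return f"{'; '.join(creators)} ({year}): {title}. {publisher}. {doi_url(raw_doi)}"
```

Normalising first means a citation policy can store one canonical form while still accepting identifiers pasted from papers, for example `DOI: 10.1039/b502828k`.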
Adding value: repository services
• Tools: for deposit, normalisation, manipulation, transformation…
• Linking, annotation, visualisation
• Aggregators: generic, (sub-)disciplinary
Knowledge extraction:
• Mining (data, text, structures)
• Modelling (economic, climate, mathematical, biological…)
• Analysis (statistical, lexical, gene…)
Is your data OA?
How is your data being used and re-used?
Nature 23 March 2006 OTMI: Open Text Mining Interface
NaCTeM http://www.nactem.ac.uk/
Emerging tools: TerMine, GENIA, Cafetiere
A Case Study in Crystallography
Data capture
R4L Deposit scenario (…part of…)
1. Produce strategy for synthesis (= idea)
2. Submit plan to SmartTea system (incl. identifiers)
3. Retrieve and follow instructions (sub-workflow?)
4. Experimental synthesis metadata automatically recorded on instruments (Smart Lab)
5. Create record for synthesised sample (+ proposed chemical identifier) in R4L laboratory data management system
6. Run spectral analyses on sample capturing further analysis metadata (incl. time-stamp, analysis software version, researcher details etc.)
7. Save spectrum in native and common formats
8. Invoke R4L data capture service and deposit files + metadata in laboratory repository…
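Steps 5 to 8 of the scenario amount to accumulating metadata around a sample until it can be deposited as one package. The sketch below models that in plain Python; the class and function names, the dictionary layout of the deposit package and the file names are hypothetical, and nothing here reflects the actual R4L, SmartTea or Smart Lab software.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Analysis:
    """Step 6: one spectral analysis plus the metadata captured alongside it."""
    technique: str
    software_version: str
    researcher: str
    timestamp: str
    files: dict = field(default_factory=dict)  # role ("native"/"common") -> file name


@dataclass
class SampleRecord:
    """Step 5: record for a synthesised sample in the lab data management system."""
    sample_id: str
    proposed_identifier: str  # a proposed chemical identifier, e.g. an InChI
    analyses: list = field(default_factory=list)

    def run_analysis(self, technique: str, software_version: str, researcher: str) -> Analysis:
        # Time-stamp, software version and researcher are recorded with the analysis.
        analysis = Analysis(technique, software_version, researcher,
                            datetime.now(timezone.utc).isoformat())
        self.analyses.append(analysis)
        return analysis


def save_spectrum(analysis: Analysis, native_file: str, common_file: str) -> None:
    """Step 7: keep the instrument's native file and a common-format copy."""
    analysis.files["native"] = native_file
    analysis.files["common"] = common_file


def build_deposit(sample: SampleRecord) -> dict:
    """Step 8: bundle files + metadata into one package for the laboratory repository."""
    return {
        "sample_id": sample.sample_id,
        "identifier": sample.proposed_identifier,
        "files": [f for a in sample.analyses for f in a.files.values()],
        "analyses": [
            {"technique": a.technique, "software_version": a.software_version,
             "researcher": a.researcher, "timestamp": a.timestamp}
            for a in sample.analyses
        ],
    }
```

Capturing metadata at the moment each step runs, rather than asking the researcher to type it in at deposit time, is what makes the final deposit automatic.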
RAW DATA DERIVED DATA RESULTS DATA
eBank UK Project
• Promote open access crystallography data
• Aggregator service harvests OAI metadata from institutional data repository (e-Crystals archive)
• Service linking from data to derived research publication
• Embedding eBank service in learning workflows: pedagogy
• Future federation plans for crystallography data repositories
UKOLN (lead), University of Southampton, University of Manchester
http://www.ukoln.ac.uk/projects/ebank-uk/
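The aggregator's harvesting of OAI metadata can be illustrated with a minimal OAI-PMH loop, assuming the standard `ListRecords` verb and the `oai_dc` metadata format. The base URL is hypothetical, and the HTTP fetch is passed in as a function so the parsing can be exercised offline; this is a sketch, not eBank's aggregator code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one page of a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)  # absent for deleted records
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [] if dc is None else [e.text for e in dc.findall("dc:title", NS)],
            "creators": [] if dc is None else [e.text for e in dc.findall("dc:creator", NS)],
        })
    # A missing or empty resumptionToken means this was the last page.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


def harvest(base_url, fetch):
    """Yield every record, following resumptionTokens; fetch(url) returns the XML."""
    token = None
    while True:
        records, token = parse_list_records(fetch(list_records_url(base_url, token)))
        yield from records
        if token is None:
            break
```

For a live repository, `fetch` could be `lambda url: urllib.request.urlopen(url).read()`; `ET.fromstring` accepts the returned bytes directly.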
A data repository entry: ecrystals.chem.soton.ac.uk
Access to the underlying data: complex objects
eBank Metadata Publication
• Using simple Dublin Core
  – Crystal structure
  – Title (Systematic IUPAC Name)
  – Authors
  – Affiliation
  – Creation Date
• Additional chemical information through Qualified Dublin Core
  – Empirical formula
  – International Chemical Identifier (InChI)
  – Compound Class & Keywords
• Specifies which ‘datasets’ are present in an entry
• Application Profile
• DOIs from TIB http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
• Data citation policy http://ecrystals.chem.soton.ac.uk/rights.html
http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
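The core of an eBank entry is published in simple Dublin Core. A minimal sketch of serialising such a record in the `oai_dc` container used by OAI-PMH; the function name is hypothetical, and carrying keywords as `dc:subject` is an assumption. Affiliation, empirical formula and InChI are handled by the eBank application profile (see the schemas page), which is not reproduced here.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def dc(tag: str) -> str:
    """Qualify a tag name with the Dublin Core element namespace."""
    return f"{{{DC}}}{tag}"


def crystal_dc_record(iupac_name: str, authors: list[str], creation_date: str,
                      keywords: tuple[str, ...] = ()) -> str:
    """Serialise a crystal-structure entry as a simple Dublin Core record."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    ET.SubElement(root, dc("title")).text = iupac_name  # systematic IUPAC name
    for author in authors:
        ET.SubElement(root, dc("creator")).text = author  # one element per author
    ET.SubElement(root, dc("date")).text = creation_date  # e.g. "YYYY-MM-DD"
    for kw in keywords:
        # Assumption: compound class and keywords carried as dc:subject.
        ET.SubElement(root, dc("subject")).text = kw
    return ET.tostring(root, encoding="unicode")
```

Because simple Dublin Core is the lowest common denominator, a generic harvester such as OAIster can index the record even if it ignores the chemistry-specific qualified elements.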
Discovering data:
Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol. Chem., 2005, (10), 1832-1834. DOI: 10.1039/b502828k
• Domain identifier: International Chemical Identifier (InChI) code
• Google molecule using InChI
Slide from Simon Coles
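Here the InChI serves as a domain identifier that general web search can index. A short sketch, assuming the published InChI layout (`InChI=` plus a version, then `/`-separated layers with the molecular formula first), that extracts the formula layer and builds a search URL. The function names and the Google URL pattern are illustrative assumptions, not part of eBank.

```python
from urllib.parse import quote_plus


def inchi_layers(inchi: str) -> dict:
    """Split an InChI string into its version, formula and remaining layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("InChI strings start with 'InChI='")
    parts = inchi[len("InChI="):].split("/")
    if len(parts) < 2:
        raise ValueError("missing formula layer")
    return {
        "version": parts[0],        # "1S" for standard InChI, "1" in early releases
        "formula": parts[1],        # main layer: the molecular formula
        "other_layers": parts[2:],  # connectivity (c...), hydrogens (h...), etc.
    }


def search_url(inchi: str) -> str:
    """Build a web-search URL for an InChI ('Google molecule')."""
    # quote_plus escapes '=', '/' and ',' so the whole identifier is one query term.
    return "https://www.google.com/search?q=" + quote_plus(inchi)
```

For ethanol, `inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")` gives version `1S` and formula `C2H6O`, which can be cross-checked against the formula held in the repository's own metadata.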
Adding value: eBank linking data to publications
Linking research to learning - embedding eBank aggregator service in a science portal for student learners
Integration into the curriculum and e-Learning workflows
• MChem course
• Assess role in undergraduate Chemical Informatics courses
• Pedagogic evaluation
• April – June 2006
• Report to follow.
Roles & responsibilities: new challenges?
Workforce development and capacity building
• NSF Draft Report 2005: “Data scientist”, hybrid skills
• Facilitate collaboration
  – Multidisciplinary teams: computer scientists, domain scientists, digital library experts, statisticians/modellers, e.g. eBank project
  – Lessons learnt: e-Science Human Factors Audit Report (to be published 2006), Roy Kalawsky, Loughborough
• CURL/SCONUL e-Research Taskforce
Has your (digital) library engaged with the e-Research agenda?
Repositories roadmap: vision 2010
• Richer scholarly communication based on open access to and re-use of scholarly materials
• Integrated life-cycle of knowledge from research to learning
• Available metadata about scholarly materials
• Added value services on scholarly materials (involving HE and commercial sectors)
Repository interworking with other components
Repository
Virtual Learning Environment
Authoring tool
Name authority service
Institutional research system
Automated classification service
Packaging tool
Defining workflows and dataflows
• Analyse roles and interactions within and between repositories
• What does the user want?
• Identify and define services
  – Potential for ‘shared services’, re-use of services
  – In context of JISC e-Framework
• Explore potential dataflows
  – Aggregation, data exchange, metadata extraction and enhancement
Deposit a priority!
• To enable users to populate repositories simply, effectively and preferably automatically
• To capture content from desktop applications, experimental equipment (smart labs), learning content development tools, etc.
• To enable the repository of deposit to exchange data with further repositories in a predictable manner
• To hide complexity from the end user
• To be compatible with follow-on added value services layered on repository content
• Deposit API Working group meeting July 11/12, Warwick
http://www.ukoln.ac.uk/repositories/digirep/
Thank you!