JISC Joint Programmes Meeting 2005 1
eBank UK: linking research data, learning and scholarly communications.
Dr Liz Lyon, UKOLN, University of Bath
Dr Simon Coles, School of Chemistry, University of Southampton
The wider context
Why create the e-Framework? The JISC strategic context
Sarah Porter, 2005
JISC-funded content providers
institutional content providers
external content providers
brokers, aggregators, catalogues, indexes
institutional portals
subject portals
learning management systems
media-specific portals
end-user desktop/browser
presentation
fusion
provision
OpenURL link servers
shared infrastructure
authentication/authorisation (Athens)
institutional profiling services
terminology services
service registries
identifier services
metadata schema registries
© Andy Powell (UKOLN, University of Bath), 2005
This work is licensed under a Creative Commons License Attribution-ShareAlike 2.0
JISC Information Environment architecture
Learning & Teaching workflows
Research & e-Science workflows
Aggregator services: national, commercial
Repositories: institutional, e-prints, subject, data, learning objects
Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
Harvesting metadata
Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
Resource discovery, linking, embedding
Deposit / self-archiving
Peer-reviewed publications: journals, conference proceedings
Publication
Validation
Data analysis, transformation, mining, modelling
Resource discovery, linking, embedding
Deposit / self-archiving
Learning object creation, re-use
Searching, harvesting, embedding
Quality assurance bodies
Validation
Presentation services: subject, media-specific, data, commercial portals
Resource discovery, linking, embedding
The scholarly knowledge cycle.
Liz Lyon, Ariadne, July 2003.
This work is licensed under a Creative Commons License Attribution-ShareAlike 2.0
© Liz Lyon (UKOLN, University of Bath), 2005
Data Overload!
How do we disseminate?
EPSRC National Crystallography Service
eScience - the data deluge
Learning & Teaching workflows
Research & e-Science workflows
Aggregator services: eBank UK
Repositories: institutional, e-prints, subject, data, learning objects
Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
Harvesting metadata
Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
Resource discovery, linking, embedding
Deposit / self-archiving
Peer-reviewed publications: journals, conference proceedings
Publication
Validation
Data analysis, transformation, mining, modelling
Resource discovery, linking, embedding
Deposit / self-archiving
Learning object creation, re-use
Searching, harvesting, embedding
Quality assurance bodies
Validation
Presentation services: subject, media-specific, data, commercial portals
Resource discovery, linking, embedding
The eBank UK Project
eBank UK: background
• JISC-funded September 2003, Phase 2 February 2005
• UKOLN at the University of Bath (lead), University of Southampton, University of Manchester
• Exemplar: e-Science testbed ‘CombeChem’
– Grid-enabled combinatorial chemistry
– Crystallography, laser and surface chemistry examples
– Development of an e-Lab using pervasive computing technology
– National Crystallography Service
• Resource Discovery Network / PSIgate physical sciences portal
• http://www.ukoln.ac.uk/projects/ebank-uk/
The project team
• UKOLN
– Michael Day
– Monica Duke
– Rachel Heery
– Traugott Koch
– Liz Lyon
– + Andy Powell
• Southampton
– Les Carr
– Simon Coles
– Jeremy Frey
– Chris Gutteridge
– Mike Hursthouse
– Andrew Milstead
• Manchester
– John Blunden-Ellis
Data Flow in eBank UK
Submit
Store/link
Data files
Metadata
Present
HTML
Institutional repository
OAI-PMH
Harvest (XML)
Index and Search
Present
HTML
eBank aggregator
Create
Deposition Interface
Local archive search interface
Service Provider interfaces e.g. Subject Portal
Deposit
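The harvest step in this data flow can be sketched in a few lines of Python. This is a minimal illustration, not the eBank implementation: the repository address, the sample response and the function names are assumptions, and only the standard OAI-PMH ListRecords request with the oai_dc metadata prefix is shown.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def index_titles(response_xml):
    """Map each harvested record's OAI identifier to its dc:title values."""
    root = ET.fromstring(response_xml.strip())
    index = {}
    for record in root.iter(f"{{{OAI_NS}}}record"):
        oai_id = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        index[oai_id] = [t.text for t in record.iter(f"{{{DC_NS}}}title")]
    return index


# Hypothetical response from a hypothetical repository, for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example crystal structure</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

print(list_records_url("http://example.org/oai"))
print(index_titles(SAMPLE_RESPONSE))
```

An aggregator would fetch the URL, index the returned records and present them as HTML; fetching and resumption tokens are left out of the sketch.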
ebank_dc record (XML)
Crystal structure (data holding)
Crystal structure report (HTML)
Dataset
Dataset
Institutional repository
eBank UK aggregator service
ePrint UK aggregator service
Subject service
Deposit
Harvesting OAI-PMH ebank_dc
Harvesting OAI-PMH oai_dc
Harvesting OAI-PMH oai_dc
Searching, linking and embedding
Searching, linking and embedding
Searching, linking and embedding
Dataset
dc:identifier
dcterms:references
Linking
dc:type=“CrystalStructure” and/or “Collection”
Model input Andy Powell, UKOLN.
PSIgate portal
Eprint oai_dc record (XML)
dcterms:isReferencedBy
dc:type=“Eprint” and/or “Text”
eBank data model
Eprint “jump-off” page (HTML)
dc:identifier
Eprint manifestation (e.g. PDF)
Linking
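One possible reading of the dataset side of this model, sketched in Python. The mapping is an assumption drawn from the diagram labels: dc:identifier points at the crystal structure report, dc:type carries “CrystalStructure”, and each dcterms:references points at a dataset file. The wrapper element name, the URLs and the function names are hypothetical, and the real ebank_dc schema is not reproduced here.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"


def build_dataset_record(report_url, dataset_urls):
    """Build a simplified metadata record for one crystal structure."""
    record = ET.Element("record")  # hypothetical wrapper, not the ebank_dc root
    ET.SubElement(record, f"{{{DC}}}identifier").text = report_url
    ET.SubElement(record, f"{{{DC}}}type").text = "CrystalStructure"
    for url in dataset_urls:
        # One dcterms:references element per dataset file linked to the report.
        ET.SubElement(record, f"{{{DCTERMS}}}references").text = url
    return record


def referenced_datasets(record):
    """Return the dataset links carried by a record, in document order."""
    return [el.text for el in record.findall(f"{{{DCTERMS}}}references")]


record = build_dataset_record(
    "http://example.org/report/1",
    ["http://example.org/data/1.cif", "http://example.org/data/1.hkl"],
)
print(ET.tostring(record, encoding="unicode"))
print(referenced_datasets(record))
```

A service that harvests such records can follow the dcterms:references links to reach the underlying data from the report page.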
CombeChem: An EPSRC pilot project
X-Ray e-Lab
Analysis
Properties
Properties e-Lab
Simulation
Video
Diffractometer
Grid Middleware
Structures Database
Crystallography data: The publication problem
[Figure: chemical structure diagrams]
25,000,000
2,000,000
300,000
Crystallography workflow
RAW DATA, DERIVED DATA, RESULTS DATA
• Initialisation: mount new sample, set up data collection
• Collection: collect data
• Processing: process and correct images
• Solution: solve structures
• Refinement: refine structure
• CIF: produce CIF (Crystallographic Information File)
• Validation: chemical & crystallographic checks
• Report: generate Crystal Structure Report
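The eight workflow stages above can be held as an ordered enumeration, which is one simple way for a deposition tool to track how far an experiment has progressed. This is a sketch only: the class and function names are assumptions, and the stage descriptions are taken from the slide.

```python
from enum import Enum


class Stage(Enum):
    """Crystallography workflow stages, in the order given on the slide."""
    INITIALISATION = "mount new sample, set up data collection"
    COLLECTION = "collect data"
    PROCESSING = "process and correct images"
    SOLUTION = "solve structures"
    REFINEMENT = "refine structure"
    CIF = "produce CIF (Crystallographic Information File)"
    VALIDATION = "chemical & crystallographic checks"
    REPORT = "generate Crystal Structure Report"


STAGES = list(Stage)  # Enum iteration preserves definition order


def next_stage(stage):
    """Return the stage that follows `stage`, or None after the last one."""
    position = STAGES.index(stage)
    return STAGES[position + 1] if position + 1 < len(STAGES) else None


for stage in STAGES:
    print(f"{stage.name.title()}: {stage.value}")
```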
A data repository entry
Access to the underlying data
ecrystals.chem.soton.ac.uk
Harvesting: OAIster
Aggregating: search & discover
Linking data to publications
eBank embedded in a science portal
Current Developments: Deposition and validation tools
Validation
File format manipulation
Current Developments: Integration into crystallographic publishing practices
Publishers’ seal of approval
Current Developments: Ontologies for aggregating, linking & discovery
• Transform the ‘list’ into an ‘ontology’
• Embed ontology into the deposition process
• Publish keywords in OAI
• Aggregators use keywords for linking with the broader literature
• Researchers use keyword ontology in search and discovery services
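To illustrate what moving from a keyword ‘list’ to an ‘ontology’ buys a discovery service, here is a small Python sketch. The terms and their broader-term relations are invented for illustration and are not the project's vocabulary; the function names are also assumptions.

```python
# Hypothetical broader-term relations: narrower term -> broader term.
BROADER = {
    "single-crystal diffraction": "x-ray diffraction",
    "powder diffraction": "x-ray diffraction",
    "x-ray diffraction": "crystallography",
}


def with_broader_terms(keyword):
    """Return the keyword followed by all of its broader terms."""
    terms = [keyword]
    while terms[-1] in BROADER:
        terms.append(BROADER[terms[-1]])
    return terms


def matches(query, record_keywords):
    """True if any record keyword equals the query or falls under it."""
    return any(query in with_broader_terms(k) for k in record_keywords)


print(with_broader_terms("powder diffraction"))
print(matches("crystallography", ["single-crystal diffraction"]))
```

With a flat list, a search for a broad term would miss records tagged only with a narrower one; with the hierarchy, the broad query reaches them.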
eBank: linking to learning
• Embedding in e-Learning processes
• Evaluating the pedagogical benefits
– MChem course
– Chemical informatics course
Issues and challenges
1. Issues: research data as content
• Sharing it!
• Data diversity
– Homo- or heterogeneous
– Raw and derived / processed
– Sensitivity
– Fast or slow growth in volume
• Repository evolution:
– Likelihood to scale up (from bytes to petabytes)
– Quality assurance (from the start)
– Community-based standards development (“folksonomies”)
– Build robust services
2. Issues: generic data models, metadata schema & terminology
• Validation against other schema
– CCLRC Scientific Data Model Version 2
• Complex digital objects and packaging options
– METS
– MPEG 21 DIDL
• Terminologies
– Domain: crystallography
– Inter-disciplinary e.g. biomaterials
– Metadata enhancement: subject keyword additions to datasets based on knowledge of keywords in related publications
– Meaningful resource discovery?
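The metadata enhancement idea in the list above, adding subject keywords to a dataset from the publications related to it, can be sketched as follows. The function name and the sample keywords are assumptions for illustration; how related publications are identified is outside the sketch.

```python
def enhance_keywords(dataset_keywords, related_publication_keywords):
    """Append keywords from related publications that the dataset lacks.

    Comparison is case-insensitive; the dataset's own keywords come first
    and the original order of appearance is preserved.
    """
    enhanced = list(dataset_keywords)
    seen = {k.lower() for k in enhanced}
    for keywords in related_publication_keywords:
        for keyword in keywords:
            if keyword.lower() not in seen:
                seen.add(keyword.lower())
                enhanced.append(keyword)
    return enhanced


# Hypothetical keywords for one dataset and two related publications.
print(enhance_keywords(
    ["crystal structure"],
    [["Crystal structure", "hydrogen bonding"], ["hydrogen bonding", "polymorphism"]],
))
```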
3. Issues: linking and identifiers
• Links to individual datasets within an experiment
• Links to all datasets associated with an experiment or a data collection
• Links to derived eprints and published literature
• Context sensitive linking: find me
– Datasets by this author / creator
– Datasets related to this subject
– Learning objects by this author / creator
– Learning objects related to this subject
• Identifiers and persistence
– “generic”
– domain: International Chemical Identifier (InChI code)
• Resource discovery: Google Scholar?
• Provenance: authenticity, authority, integrity?
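The “find me” queries for context sensitive linking amount to filtering aggregated records by type, creator and subject. A minimal Python sketch, with an invented record structure and invented sample data:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical aggregated record; field names are illustrative."""
    title: str
    kind: str        # "dataset" or "learning object"
    creator: str
    subjects: tuple


def find(records, kind, creator=None, subject=None):
    """Return records of one kind, optionally narrowed by creator or subject."""
    return [
        r for r in records
        if r.kind == kind
        and (creator is None or r.creator == creator)
        and (subject is None or subject in r.subjects)
    ]


RECORDS = [
    Record("Structure A", "dataset", "A. Author", ("crystallography",)),
    Record("Structure B", "dataset", "B. Author", ("crystallography", "biomaterials")),
    Record("Tutorial on CIF", "learning object", "A. Author", ("crystallography",)),
]

print([r.title for r in find(RECORDS, "dataset", creator="A. Author")])
print([r.title for r in find(RECORDS, "learning object", subject="crystallography")])
```

Matching on a creator's name string is fragile in practice, which is one reason the list above raises identifiers and persistence as an issue.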
4. Issues: embedding and workflow
• Into the crystallographic publishing community: International Union of Crystallography
• Into the chemistry research workflow
– SMART TEA Digital Lab Book, e-synthesis Lab
– Other analytical techniques and instrumentation
– RAE procedures?
• Into the curriculum and e-Learning workflows
– MChem course
– Undergraduate Chemical Informatics courses
Next in Phase 2…
• Full embedding into the crystallographic research and publishing communities
• Chemistry workflow embedding
– R4L: Repository for the Laboratory
– Related sub-domains of chemistry: SPECTRa
• e-Learning embedding and pedagogic evaluation
– Assess role in u/g chemical informatics courses
– Introducing school children to e-research
• Enabling interdisciplinary research
– Physical, mathematical, earth, environmental and engineering sciences
Thank you.
Questions?