Open Repositories 2007, San Antonio, 23 Jan. 2007
Funded by:
© AHDS
Grid activities at the Arts and Humanities Data Service
Mark Hedges
Arts and Humanities Data Service
King’s College London
OGF 20, Manchester, 7 May 2007
Overview
• What is the AHDS?
• Grid applications at the AHDS
• Next steps
What is the Arts and Humanities Data Service?
What does the AHDS do?
• Preserves and distributes digital resources for research and teaching in arts and humanities subjects
• These resources are free for educational and private use
• Generally available online
How is the AHDS Organised?
• Established in 1996
• Evolved considerably over 10 years
• Managing Executive
• Geographically distributed centres for particular disciplines: History, Visual Arts, Performing Arts, Archaeology, Literature/Languages/Linguistics
• Virtual centres for other disciplines
Collections
History
Archaeology
Literature/Linguistics
Visual Arts
Performing Arts
• Highly diverse in terms of type and size
• Images, text, databases, video, sound, multimedia
• Complex internal structures
• Require discipline-specific knowledge to process
• Increasing acquisition rate
Museum of London Archaeological Archive
New Survey of London Life and Labour, 1929-1931
Corpus of Romanesque Sculpture in Britain
Imperial War Museum
Electronic Corpus of Tyneside English
Grids and the AHDS
Grid activities at the AHDS
• DARIAH
• Preservation environment
• Repositories and grids
Digital Preservation
“Ensuring the usability of a digital resource through changing technological regimes with a minimum loss of the resource’s intellectual content.”
AHDS Preservation Glossary
AHDS preservation approach
• Complies with the OAIS (Open Archival Information System) reference model
• Preservation actions on ingest:
  - Capture preservation metadata set
  - Format normalisation
• Post-ingest: monitoring format/tool obsolescence and format migration
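The ingest and post-ingest steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the AHDS implementation: the normalisation table, the record fields and the function names (`ingest`, `migration_candidates`) are hypothetical, and the actual format conversion is left to external tools.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List, Set

# Hypothetical normalisation policy (source MIME type -> preservation format).
# Illustrative only; this is not the AHDS's actual format policy.
NORMALISATION_TABLE = {
    "image/bmp": "image/tiff",
    "application/msword": "application/pdf",
}


@dataclass
class PreservationRecord:
    identifier: str
    checksum: str             # SHA-256 of the bytes as received
    size: int                 # size in bytes
    original_format: str
    preservation_format: str  # target format after normalisation


def ingest(identifier: str, data: bytes, mime_type: str) -> PreservationRecord:
    """Capture a minimal preservation metadata set and pick a normalised format.

    Only the target format is chosen here; the conversion itself would be done
    by a format-specific tool, which this sketch does not model.
    """
    target = NORMALISATION_TABLE.get(mime_type, mime_type)  # unchanged if no rule applies
    return PreservationRecord(
        identifier=identifier,
        checksum=hashlib.sha256(data).hexdigest(),
        size=len(data),
        original_format=mime_type,
        preservation_format=target,
    )


def migration_candidates(records: Iterable[PreservationRecord],
                         at_risk_formats: Set[str]) -> List[str]:
    """Post-ingest monitoring: identifiers whose preservation format is now at risk."""
    return [r.identifier for r in records if r.preservation_format in at_risk_formats]


rec = ingest("obj-1", b"example bytes", "image/bmp")
print(rec.preservation_format)                      # image/tiff
print(migration_candidates([rec], {"image/tiff"}))  # ['obj-1']
```

Keeping obsolescence monitoring as a separate query over stored records, rather than part of ingest, lets the list of at-risk formats change over time without re-ingesting anything.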
Data grid based preservation
• First approach: based on the Storage Resource Broker (SRB)
• Virtualisation of storage
• Distributed across heterogeneous resources (within AHDS and elsewhere)
• Multiple replicas
• Metadata associated with each data object
• Transparent migration to new media
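The data grid properties listed above (a logical namespace over heterogeneous storage, multiple replicas, attached metadata, and migration that clients do not see) can be illustrated with a toy in-memory model. This is a sketch under assumptions, not the SRB API: the class and method names, logical paths and storage resource names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class LogicalObject:
    logical_path: str
    metadata: Dict[str, str] = field(default_factory=dict)
    replicas: Dict[str, bytes] = field(default_factory=dict)  # storage resource -> copy


class DataGrid:
    """Toy model of storage virtualisation: clients address objects by logical path only."""

    def __init__(self) -> None:
        self.catalogue: Dict[str, LogicalObject] = {}

    def put(self, logical_path: str, data: bytes, resources: Iterable[str],
            metadata: Optional[Dict[str, str]] = None) -> LogicalObject:
        obj = LogicalObject(logical_path, dict(metadata or {}))
        for resource in resources:  # one replica per storage resource
            obj.replicas[resource] = data
        self.catalogue[logical_path] = obj
        return obj

    def get(self, logical_path: str) -> bytes:
        obj = self.catalogue[logical_path]
        if not obj.replicas:
            raise LookupError(f"no replicas left for {logical_path}")
        # Any replica will do; the caller never names a physical location.
        return next(iter(obj.replicas.values()))

    def migrate(self, logical_path: str, old_resource: str, new_resource: str) -> None:
        """Move one replica onto new media; the logical path and metadata are unchanged."""
        obj = self.catalogue[logical_path]
        obj.replicas[new_resource] = obj.replicas.pop(old_resource)


grid = DataGrid()
grid.put("/ahds/history/survey.xml", b"<record/>", ["tape-a", "disk-b"],
         {"title": "example"})
grid.migrate("/ahds/history/survey.xml", "tape-a", "tape-c")
print(sorted(grid.catalogue["/ahds/history/survey.xml"].replicas))  # ['disk-b', 'tape-c']
print(grid.get("/ahds/history/survey.xml"))                         # b'<record/>'
```

Because clients only ever use the logical path, moving a replica from `tape-a` to `tape-c` is invisible to them, which is what "transparent migration to new media" amounts to in this model.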
Drawbacks
• Difficult to integrate specialised preservation requirements
• Implemented as external client code
• Metadata limited in scope (compared to what we want to store for our complex objects)
Enhanced approach
• Based on iRODS (integrated Rule-Oriented Data System)
• Data management or preservation actions encoded as rules built up from atomic services
• Rules integrated with the system, yet easily changeable
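The idea of rules composed from atomic services can be shown with a small Python analogue. This is not iRODS rule-language syntax or its engine. It is a sketch that assumes a rule is a named list of alternative bodies, each body a sequence of services that succeed or fail, with alternatives tried in order. The rule name `post_ingest` and the services `known_format` and `record` are hypothetical.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Service = Callable[[Context], bool]  # an atomic service either succeeds or fails


class RuleEngine:
    """Each rule name maps to alternative bodies; a body is a sequence of atomic services."""

    def __init__(self) -> None:
        self.rules: Dict[str, List[List[Service]]] = {}

    def define(self, name: str, *services: Service) -> None:
        # Defining a name again adds another alternative, tried in definition order.
        self.rules.setdefault(name, []).append(list(services))

    def apply(self, name: str, ctx: Context) -> bool:
        for body in self.rules.get(name, []):
            # all() stops at the first failing service, so later steps are skipped.
            if all(service(ctx) for service in body):
                return True
        return False  # no alternative succeeded (or the rule is undefined)


# Hypothetical atomic services for illustration.
def known_format(ctx: Context) -> bool:
    return ctx.get("format") in {"image/tiff", "text/xml"}


def record(label: str) -> Service:
    def service(ctx: Context) -> bool:
        ctx.setdefault("log", []).append(label)
        return True
    return service


engine = RuleEngine()
engine.define("post_ingest", known_format, record("replicated"))
engine.define("post_ingest", record("queued for manual review"))

ctx = {"format": "image/bmp"}
engine.apply("post_ingest", ctx)
print(ctx["log"])  # ['queued for manual review']
```

Changing the policy here means redefining rules rather than editing client code, which mirrors the "integrated, yet easily changeable" point. One simplification to note: side effects of a partially executed body are not rolled back in this sketch.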
Simple example
• In the preservation archive, files are periodically checked for fixity and repaired as necessary.
• Define a rule set implementing this, with multiple possibilities for corrective action.
• An advantage: the services can access more complex metadata held in external digital repository systems.
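A single pass of the fixity check described above might look like the sketch below. It assumes a SHA-256 checksum was recorded at ingest and that replicas can be read and overwritten by resource name. The function `check_and_repair` and the resource names are hypothetical. Scheduling the periodic run, and the fallback action when no intact replica remains, are left to the surrounding rule set.

```python
import hashlib
from typing import Dict, List, Tuple


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def check_and_repair(replicas: Dict[str, bytes], expected: str) -> Tuple[List[str], List[str]]:
    """Verify every replica against the recorded checksum.

    Returns (repaired, unrecoverable) lists of resource names. Corrupt replicas
    are overwritten from the first intact one; if no replica is intact, all are
    reported as unrecoverable so that another corrective action can be tried.
    """
    good = [r for r, data in replicas.items() if sha256(data) == expected]
    bad = [r for r in replicas if r not in good]
    if not good:
        return [], bad
    source = replicas[good[0]]
    for r in bad:
        replicas[r] = source  # repair from a verified copy
    return bad, []


replicas = {"site-a": b"page scan", "site-b": b"page scam"}
repaired, lost = check_and_repair(replicas, sha256(b"page scan"))
print(repaired, lost)  # ['site-b'] []
```

The second return value is where "multiple possibilities for corrective action" would come in: an unrecoverable object could, for example, be restored from a copy referenced in the external repository's metadata.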
Nature of humanities research data
• “Hard” sciences: requirements derived from the need for fast access to large distributed data sets and for simulations
• Humanities: requirements derived from the complexity and context dependency of research material
Digital repositories
• Need to represent humanities digital content so as to reflect its complexity and context.
• So: store it using flexible digital repository systems (Fedora at AHDS).
• Need seamless integration between these highly structured repositories.
• So: integrate repository software with grid middleware.
Repository-Grid Integration
Two broad approaches:
• Grid as virtualised distributed storage
• Repositories as data resources on the grid
Grid storage for Fedora
• Fedora has been integrated with SRB, providing virtualised storage.
• Currently looking at iRODS integration.
• Will be able to make use of the complex metadata stored within Fedora (for discovery as well as preservation).
Fedora repositories as grid resources
• Use grid technologies to allow access to distributed Fedora repositories belonging to different administrative domains.
• Registries to store information about repositories and their contents.
• Grid authentication (AuthN) and authorisation (AuthZ) mechanisms providing uniform access.
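As a rough illustration of a registry combined with uniform access control, the sketch below assumes each repository is described by an endpoint, its administrative domain, the collections it holds, and the user groups it admits. In a real deployment those groups would come from grid credentials, which are not modelled here. All class names, URLs, collection identifiers and group names are hypothetical, and no Fedora API is used.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class RepositoryEntry:
    repo_id: str
    endpoint: str     # hypothetical service URL
    domain: str       # administrative domain operating the repository
    collections: Set[str] = field(default_factory=set)
    allowed_groups: Set[str] = field(default_factory=set)


class Registry:
    """Toy registry of repositories and their contents, with a group-based access check."""

    def __init__(self) -> None:
        self._entries: Dict[str, RepositoryEntry] = {}

    def register(self, entry: RepositoryEntry) -> None:
        self._entries[entry.repo_id] = entry

    def find(self, collection: str, user_groups: Set[str]) -> List[str]:
        """Endpoints that hold `collection` and admit at least one of the user's groups."""
        return sorted(
            e.endpoint for e in self._entries.values()
            if collection in e.collections and e.allowed_groups & user_groups
        )


registry = Registry()
registry.register(RepositoryEntry("repo-1", "https://fedora1.example.org", "domain-a",
                                  {"coll-x"}, {"humanities-vo"}))
registry.register(RepositoryEntry("repo-2", "https://fedora2.example.org", "domain-b",
                                  {"coll-x", "coll-y"}, {"archaeology-vo"}))
print(registry.find("coll-x", {"humanities-vo"}))  # ['https://fedora1.example.org']
```

Putting the access check in the registry lookup gives users one query across domains, while each domain still decides which groups it admits.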
Contact