Managing digital objects and their metadata: challenges and responses Douglas Campbell and Adrienne Kebbell National Library of New Zealand Te Puna Mātauranga o Aotearoa DC-2004 Conference, 12 October 2004



Managing digital objects and their metadata:

challenges and responses

Douglas Campbell and Adrienne Kebbell

National Library of New Zealand Te Puna Mātauranga o Aotearoa

DC-2004 Conference, 12 October 2004


Agenda

• Our situation

• Digital Preservation Frameworks

• Digital Objects

– Complex objects

– Identifiers

– File naming

• Metadata

– Frameworks

– Descriptive metadata

– Preservation metadata

– Structural metadata

– Automatic extraction

– Modularity

• Integration

– Business process workflows


National Library of New Zealand Te Puna Mātauranga o Aotearoa

• Collect, maintain, and make accessible literature and information resources that relate to New Zealand and the Pacific

• Alexander Turnbull Library: Preserve New Zealand's documentary heritage for generations to come

• Develop and deliver services for schools to support teaching and learning

• Apply the partnership responsibilities of the Treaty of Waitangi to all activities


National Digital Heritage Archive

• National Library Act 2003 gives legal deposit of electronic materials to the National Library

• Archive development funded by Government

• Working towards “Trusted Digital Repository” certification


Part 1 Digital Preservation Framework


Open Archival Information System (OAIS) Model

KEY:

SIP – Submission Information Package (Ingest)

AIP – Archival Information Package (Archive)

DIP – Dissemination Information Package (Access)


Applying OAIS – building our framework

[Diagram with side labels "Digital Objects" and "Metadata". Labels shown: Selection; Harvest or Digitise; acquire; legal deposit or donated; describe; extract; manage; load; retrieve; Catalogues; Technical Info; Preservation Info; Rights; Digital Store; Access; metadata conversion; search; export; manage.]

Digital Object Workbench

• Archive

• Migrate

• Manage media

• Identity

• Prepare

• Arrange

• Authenticate

• Create derivatives


Part 2 Digital Objects


Digital objects are complex

• Website – hundreds of files

• CD-ROM – hard-coded operation

• Diskette of accounts spreadsheets and correspondence – dissimilar but related

• Self-contained single file, eg. MS Excel

• Dependent multiple files, eg. HTML + GIFs, or EXE + DLLs

• Self-contained multiple files, eg. Series of MS Word letters


Classifying the “conceptual object”

• Simple digital object

– A single file

– MS Word document, TIFF image

• Digital object group

– A set of independent but related files described as a group

– Disk of 100 MS Word letters

• Complex digital object

– A group of dependent files intended to be viewed as a single conceptual object, often with only one entry point

– Website, CD-ROM
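The three classes above can be read as a small decision rule. The following is a minimal sketch of that rule, not the Library's implementation: the `Deposit` type and the `files_depend_on_each_other` flag are hypothetical names, and in practice deciding whether files are dependent takes human judgement or format analysis.

```python
from dataclasses import dataclass
from enum import Enum


class ObjectClass(Enum):
    SIMPLE = "simple digital object"
    GROUP = "digital object group"
    COMPLEX = "complex digital object"


@dataclass
class Deposit:
    """Hypothetical description of an incoming set of files."""
    files: list[str]
    # Assumption: someone has already decided whether the files need each
    # other to render (eg. HTML + GIFs) or merely belong together.
    files_depend_on_each_other: bool = False


def classify(deposit: Deposit) -> ObjectClass:
    """Apply the three-way classification from the slide."""
    if not deposit.files:
        raise ValueError("a deposit needs at least one file")
    if len(deposit.files) == 1:
        return ObjectClass.SIMPLE
    if deposit.files_depend_on_each_other:
        return ObjectClass.COMPLEX
    return ObjectClass.GROUP
```

A disk of independent letters would come out as `ObjectClass.GROUP`, while a website flagged as dependent would come out as `ObjectClass.COMPLEX`.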


Complexity of components

Simple Digital Object

• 1 Simple Object, eg. text document

• 1 Descriptive Record

• 1 Preservation Object Record (for PM Word file)

• 1 Original file [Word]

• 1 Preservation Master file [Word]

• 2 Access files [PDF + XML]

• 1 PID for 4 files

Object Group

• 1 Object Group, eg. 200 letters from a donor

• 1 Descriptive Record for 800 files [Word, XML, PDF]

• 1 Object Pres Data, 200 File Data, NN Process Data, NN Metadata Modification Data

• 200 Original files [Word]

• 200 Preservation Master files [Word]

• 400 Access files [PDF + XML]

• 1 PID for 800 files

Complex Digital Object

• 1 Complex Object, eg. Web Site of 80 html files + 20 gifs

• 1 Descriptive Record for 300 files [HTML + gif]

• 1 Object Pres Data, 100 File Data, NN Process Data, NN Metadata Modification Data

• 100 Original files [HTML + gif]

• 100 Preservation Master files [processed for local delivery]

• 100 Access files [HTML + gif]

• 1 PID for 300 files


Identifiers

Key characteristics of identifiers to consider:

• Granularity – Question: What do we need to identify? Answer: Whatever we need to identify!

• Intelligence – Unanticipated changes may render intelligent identifiers inaccurate, though dumb identifiers place a reliance on external metadata

• Actionable – Need to separate identity from location, eg. two URLs may be two locations of the same entity

• Persistence – Depends mostly on your commitment

• Extensibility – Be generic, follow standards, application independent


Persistent Identifiers

Persistence means different things to different communities, so we separate it into:

• Persistent Identifier (PID) – assigned at the “conceptual” level of an object, persists in perpetuity

• Persistent Locator (PL) – file locator, persists only for the life of the file

We guarantee PIDs, but PLs to the “best current format” will become inoperative over the decades as formats become obsolescent
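One way to picture the PID/PL split is a conceptual object that keeps a single fixed identifier while its file locators come and go. This is a sketch under assumptions: the class and method names are hypothetical, and "most recently added operative locator" is only one possible reading of "best current format".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PersistentLocator:
    """Locator for one file; valid only while that file is kept."""
    url: str
    file_format: str
    operative: bool = True


@dataclass
class ConceptualObject:
    """Carries the PID, which never changes, plus its file locators."""
    pid: str
    locators: list[PersistentLocator] = field(default_factory=list)

    def add_file(self, url: str, file_format: str) -> PersistentLocator:
        locator = PersistentLocator(url, file_format)
        self.locators.append(locator)
        return locator

    def retire_file(self, url: str) -> None:
        # The locator stays on record but no longer resolves.
        for locator in self.locators:
            if locator.url == url:
                locator.operative = False

    def best_current(self) -> Optional[PersistentLocator]:
        # Assumption: the newest operative locator is the best current format.
        for locator in reversed(self.locators):
            if locator.operative:
                return locator
        return None
```

Retiring an obsolete file and adding a migrated one changes what `best_current()` returns, while `pid` stays the same throughout.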


File naming conventions – Plan “A”

Plan A: Make filenames unique by including role code, eg:

• DO – Digital Original

• DD – Digital Derivative

• PM – Preservation Master (best attempt to replicate in a currently accessible format)

• AF – Access Format

• TN – Thumbnail

Filename: IID_role_instance.extension, eg. 1234_af_01.doc
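The Plan A pattern can be built and parsed mechanically. The sketch below assumes, from the single example `1234_af_01.doc`, that the IID is numeric, the role code is written in lower case, and the instance is zero-padded to two digits; the function names are hypothetical.

```python
import re

# Role codes listed on the slide.
ROLE_CODES = {
    "DO": "Digital Original",
    "DD": "Digital Derivative",
    "PM": "Preservation Master",
    "AF": "Access Format",
    "TN": "Thumbnail",
}

# Assumed shape: <digits>_<two letters>_<two digits>.<extension>
_FILENAME = re.compile(
    r"^(?P<iid>\d+)_(?P<role>[A-Za-z]{2})_(?P<instance>\d{2})\.(?P<extension>[A-Za-z0-9]+)$"
)


def build_filename(iid: int, role: str, instance: int, extension: str) -> str:
    """Compose IID_role_instance.extension."""
    code = role.upper()
    if code not in ROLE_CODES:
        raise ValueError(f"unknown role code: {role!r}")
    return f"{iid}_{code.lower()}_{instance:02d}.{extension}"


def parse_filename(name: str) -> dict:
    """Split a Plan A filename back into its parts."""
    match = _FILENAME.match(name)
    if match is None:
        raise ValueError(f"not a Plan A filename: {name!r}")
    code = match["role"].upper()
    if code not in ROLE_CODES:
        raise ValueError(f"unknown role code in {name!r}")
    return {
        "iid": int(match["iid"]),
        "role": code,
        "role_name": ROLE_CODES[code],
        "instance": int(match["instance"]),
        "extension": match["extension"],
    }
```

The weakness that motivates Plan B is visible here: the filename carries meaning, so any change to roles or formats forces a rename.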


File naming conventions – Plan “B”

Plan B: “Virtualisation”

• Decouple locator and location

• Location and disk partitioning managed dynamically internally, delivered externally via persistent locator

– /1234 (to access the default format)

– /1234?role=TN&size=150

• Locator may be HTTP, SOAP, etc.

• Provides additional opportunities such as transparent “on the fly” format conversions or correcting the MIME type reported
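Decoupling locator from location amounts to a lookup step between the two. The following sketch resolves the two locator forms shown above against an in-memory table; the table contents, the storage paths, and the choice of `AF` as the default role are all assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: the access format is what "/1234" alone should return.
DEFAULT_ROLE = "AF"

# Hypothetical internal table: (object id, role, size) -> current location.
# Files can move between volumes without the external locator changing.
LOCATIONS = {
    ("1234", "AF", None): "/store/vol07/1234_af_01.pdf",
    ("1234", "TN", "150"): "/store/vol02/1234_tn_01.jpg",
}


def resolve(locator: str) -> str:
    """Map an external persistent locator to the current internal location."""
    parts = urlsplit(locator)
    object_id = parts.path.strip("/")
    query = parse_qs(parts.query)
    role = query.get("role", [DEFAULT_ROLE])[0].upper()
    size = query.get("size", [None])[0]
    try:
        return LOCATIONS[(object_id, role, size)]
    except KeyError:
        raise LookupError(f"no file registered for {locator!r}") from None
```

Because every request passes through `resolve`, this is also the natural place to hook in on-the-fly format conversion or a corrected MIME type.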


• FRBR

[Diagram: FRBR hierarchy. Levels: Work, Expression, Manifestation, Component, Item. Labels shown: Novel; Manuscript; Published; Book; Word v5, PDF, XML; Chap 1, Chap 2 (three times); XML, XSL (twice); DO, PM, AS, AF; Preservation; Lending.]


Part 3 Metadata


Metadata Framework

Four key categories of metadata for digital objects:

• Resource discovery – finding and identifying

• Structural – presenting in context (eg. pages in a book rather than a bunch of files, navigation, etc)

• Rights management and Access control – protection of property rights, authentication and authorisation

• Technical and Administrative – properties of the objects, how they were created, changes made, etc.


Metadata Framework

[Diagram: Metadata Standards Framework for National Library of New Zealand. Labels shown: Generic or Global Access; Dublin Core; RDF; XML; Community / Sector Specific Application Profiles; Following International Guidelines; Local. Sectors: Library, Education, Archival, Government. Standards shown: NZGLS, DC-Gov, GILS, AGLS, MARC, DCQ, MODS, METS, DC-Ed, LOM, EAD, ISAD(G).]


Descriptive metadata

Digital Resource Description (DRD) Application Profile

• Lightweight alternative to METS for simple objects based on Qualified DC

• XLink extensions to differentiate links to the multiple derivative files

• Local refinements for different identifier types, eg. local id, persistent id, locator

• RDF/XML encoding syntax

• Used in our “Discover” and “Matapihi” products


Preservation metadata

NLNZ Preservation Metadata (2002)

– Object – preservation info for object, eg. ID, software needed

– File – preservation info for a file, eg. format, size

– Process – record of actions taken, eg. format migration

– Metadata modification – record of changes to above metadata
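The four entities can be pictured as linked records, with the metadata-modification entity acting as an audit trail over the others. This is a sketch only: the field names below are hypothetical and far smaller than the actual NLNZ schema, which is not reproduced in these slides.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FileRecord:
    """File entity: preservation info for one file."""
    filename: str
    mime_type: str
    size_bytes: int


@dataclass
class ProcessRecord:
    """Process entity: one action taken, eg. a format migration."""
    action: str
    performed_at: datetime


@dataclass
class MetadataModification:
    """Metadata modification entity: one change to the metadata itself."""
    field_name: str
    old_value: str
    new_value: str
    modified_at: datetime


@dataclass
class ObjectRecord:
    """Object entity: preservation info for the object as a whole."""
    object_id: str
    software_needed: str
    files: list[FileRecord] = field(default_factory=list)
    processes: list[ProcessRecord] = field(default_factory=list)
    modifications: list[MetadataModification] = field(default_factory=list)

    def update(self, field_name: str, new_value: str) -> None:
        """Change an object-level value and log the change."""
        old_value = getattr(self, field_name)
        setattr(self, field_name, new_value)
        self.modifications.append(
            MetadataModification(
                field_name, str(old_value), str(new_value),
                datetime.now(timezone.utc),
            )
        )
```

Routing every edit through `update` means the modification history cannot drift out of step with the values it describes.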


Structural metadata

Metadata Encoding & Transmission Standard (METS)

METS record

• Header

• Descriptive

• Administrative

• Content Files

• Structural Map

• Structural Links

• Behaviour


Metadata Pieces for a Single TIFF Image

Preservation

DCQ Description

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:nlnzdl="http://www.natlib.govt.nz/dl#"
         xmlns:ead="http://www.natlib.govt.nz/dl#"
         xmlns:dcq="http://purl.org/dc/terms/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="hdl:1727.11/00002170">
    <dc:title>Blue smoke = Kohu auwahi</dc:title>
    <dc:creator>Karaitiana, Rangi Ruru Wananga, 1909-1970</dc:creator>
    <dc:subject><dcq:LCSH><rdf:value>Popular music - 1941-1950</rdf:value></dcq:LCSH></dc:subject>
    <dc:subject>Maori music - Waiata</dc:subject>
    <nlnzdl:category>
      <nlnzdl:NZCT>
        <rdf:value>A-M-02-03</rdf:value>
        <rdfs:label>Music Score Covers and Record Sleeves</rdfs:label>
      </nlnzdl:NZCT>
    </nlnzdl:category>
    <dc:description>For voices and piano.</dc:description>
    <dc:description>"Arranged with tonic sol-fa, chord symbols and two part vocal harmony"--Cover.</dc:description>
    <dc:description>Words in English and Maori.</dc:description>
    <dc:description>Caption title.</dc:description>
    <dc:publisher>C. Begg, [Dunedin] N.Z.</dc:publisher>
    <dc:contributor>Winchester, George, ca. 1900- , arranger</dc:contributor>
    <dcq:issued>c1947.</dcq:issued>
    <dc:type><dcq:DCMIType><rdf:value>Text</rdf:value></dcq:DCMIType></dc:type>
    <dc:type>
      <nlnzdl:LCSHFormOfComposition>
        <rdf:value>pp</rdf:value>
        <rdfs:label>Popular music</rdfs:label>
      </nlnzdl:LCSHFormOfComposition>
    </dc:type>
    <dc:format>1 score cover ([1]) p. ; 31 cm.</dc:format>
    <dcq:extent>17,640KB</dcq:extent>
    <dcq:extent>81KB</dcq:extent>
    <dc:format><dcq:IMT><rdf:value>image/tiff</rdf:value></dcq:IMT></dc:format>
    <dc:format><dcq:IMT><rdf:value>image/jpeg</rdf:value></dcq:IMT></dc:format>
    <dc:format><dcq:IMT><rdf:value>image/jpeg</rdf:value></dcq:IMT></dc:format>
    <dc:format><dcq:IMT><rdf:value>image/jpeg</rdf:value></dcq:IMT></dc:format>
    <dc:identifier rdf:resource="00182451_00002170_ds.tif"/>
    <dc:identifier rdf:resource="00182451_00002170_df.jpg"/>
    <dc:identifier rdf:resource="00182451_00002170_pv.jpg"/>
    <dc:identifier rdf:resource="00182451_00002170_tn.jpg"/>
    <nlnzdl:pid rdf:resource="hdl:1727.11/00002170"/>
    <nlnzdl:object rdf:resource="http://hdl.handle.net/1727.11/00002170"/>
    <dc:language><dcq:ISO639-2><rdf:value>eng</rdf:value></dcq:ISO639-2></dc:language>
    <dc:language><dcq:ISO639-2><rdf:value>eng</rdf:value></dcq:ISO639-2></dc:language>
    <dc:language><dcq:ISO639-2><rdf:value>mao</rdf:value></dcq:ISO639-2></dc:language>
    <dcq:hasFormat>Also available as an electronic resource.</dcq:hasFormat>
    <dcq:spatial>New Zealand</dcq:spatial>
    <dcq:temporal>1947</dcq:temporal>
    <dc:rights>Permission of the National Library of New Zealand, Te Puna Matauranga o Aotearoa must be obtained before any re-use of this item.</dc:rights>
    <ead:daoloc ead:behavior="image/tiff" ead:href="http://digital.natlib.govt.nz/source/20020605/00182451_00002170_ds.tif" ead:role="source"/>
    <ead:daoloc ead:behavior="image/jpeg" ead:href="http://digital.natlib.govt.nz/20020604/00182451_00002170_df.jpg" ead:role="reference" ead:title="Digital image of the cover of the score for Blue smoke. (81KB)"/>
    <ead:daoloc ead:behavior="image/jpeg" ead:href="http://digital.natlib.govt.nz/20020604/00182451_00002170_pv.jpg" ead:role="display"/>
    <ead:daoloc ead:behavior="image/jpeg" ead:href="http://digital.natlib.govt.nz/20020604/00182451_00002170_tn.jpg" ead:role="thumbnail"/>
  </rdf:Description>
</rdf:RDF>

METS File Group and structural Map

<fileSec>
  <fileGrp ID="FG2170_pm" USE="Preservation Master">
    <file ID="F2170_pm" MIMETYPE="image/tiff" SIZE="17379652" CREATED="1997-04-13T14:51:14">
      <FLocat ID="FL2170_pm" LOCTYPE="URL" xlink:href="objects/preservation/1/2170_pm.tif" xlink:actuate="onRequest"/>
    </file>
  </fileGrp>
  <fileGrp ID="FG2170_ds" USE="Dissemination Source">
    <file ID="F2170_ds" MIMETYPE="image/tiff" SIZE="17379652" CREATED="2002-09-11T09:12:20">
      <FLocat ID="FL2170_ds" LOCTYPE="URL" xlink:href="objects/source/1/2170_ds.tif" xlink:actuate="onRequest"/>
    </file>
  </fileGrp>
  <fileGrp ID="FG2170_df" USE="Dissemination Format">
    <file ID="F2170_df" MIMETYPE="image/jpeg" SIZE="123394" CREATED="2002-10-31T15:32:26">
      <FLocat ID="FL2170_df" LOCTYPE="URL" xlink:href="objects/access/1/2170_df.jpg" xlink:actuate="onRequest"/>
    </file>
  </fileGrp>
  <fileGrp ID="FG2170_pv" USE="Preview Image">
    <file ID="F2170_pv" MIMETYPE="image/jpeg" SIZE="99725" CREATED="2003-04-08T10:56:22">
      <FLocat ID="FL2170_pv" LOCTYPE="URL" xlink:href="objects/access/1/2170_pv.jpg" xlink:actuate="onRequest"/>
    </file>
  </fileGrp>
  <fileGrp ID="FG2170_tn" USE="Thumbnail Image">
    <file ID="F2170_tn" MIMETYPE="image/jpeg" SIZE="23162" CREATED="2003-04-07T11:33:13">
      <FLocat ID="FL2170_tn" LOCTYPE="URL" xlink:href="objects/access/1/2170_tn.jpg" xlink:actuate="onRequest"/>
    </file>
  </fileGrp>
</fileSec>
<structMap ID="SM2170" TYPE="LOGICAL">
  <div ID="DIV2170" LABEL="Blue smoke = Kohu auwahi" TYPE="Image">
    <div ID="DIV2170_pm" LABEL="Preservation master" TYPE="tiff image">
      <fptr FILEID="F2170_pm"/>
    </div>
    <div ID="DIV2170_ds" LABEL="Dissemination Source" TYPE="tiff image">
      <fptr FILEID="F2170_ds"/>
    </div>
    <div ID="DIV2170_df" LABEL="Dissemination Format" TYPE="jpeg image">
      <fptr FILEID="F2170_df"/>
    </div>
    <div ID="DIV2170_pv" LABEL="Preview Image" TYPE="jpeg image">
      <fptr FILEID="F2170_pv"/>
    </div>
    <div ID="DIV2170_tn" LABEL="Thumbnail Image" TYPE="jpeg image">
      <fptr FILEID="F2170_tn"/>
    </div>
  </div>
</structMap>
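The structural map points at files only by ID, so reading a METS record means joining `fptr` references back to the file section. The sketch below does that join with Python's standard library. The sample is a cut-down version of the fragment above; the `<mets>` root element and the two namespace declarations are assumptions added so the fragment parses on its own, since the slide shows neither.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Two of the five file groups from the slide, wrapped in an assumed root.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp ID="FG2170_pm" USE="Preservation Master">
      <file ID="F2170_pm" MIMETYPE="image/tiff" SIZE="17379652">
        <FLocat LOCTYPE="URL" xlink:href="objects/preservation/1/2170_pm.tif"/>
      </file>
    </fileGrp>
    <fileGrp ID="FG2170_tn" USE="Thumbnail Image">
      <file ID="F2170_tn" MIMETYPE="image/jpeg" SIZE="23162">
        <FLocat LOCTYPE="URL" xlink:href="objects/access/1/2170_tn.jpg"/>
      </file>
    </fileGrp>
  </fileSec>
  <structMap TYPE="LOGICAL">
    <div LABEL="Blue smoke = Kohu auwahi" TYPE="Image">
      <div LABEL="Preservation master" TYPE="tiff image"><fptr FILEID="F2170_pm"/></div>
      <div LABEL="Thumbnail Image" TYPE="jpeg image"><fptr FILEID="F2170_tn"/></div>
    </div>
  </structMap>
</mets>"""


def files_by_label(mets_xml: str) -> dict:
    """Return {structMap div LABEL: file href} by following fptr/FILEID."""
    root = ET.fromstring(mets_xml)
    ns = {"m": METS_NS}
    # Index every file by its ID, keeping the location from FLocat.
    hrefs = {}
    for file_elem in root.iterfind(".//m:file", ns):
        flocat = file_elem.find("m:FLocat", ns)
        hrefs[file_elem.get("ID")] = flocat.get(f"{{{XLINK_NS}}}href")
    # Walk the second-level divs and resolve each pointer.
    result = {}
    for div in root.iterfind(".//m:structMap/m:div/m:div", ns):
        fptr = div.find("m:fptr", ns)
        result[div.get("LABEL")] = hrefs[fptr.get("FILEID")]
    return result
```

Calling `files_by_label(SAMPLE)` gives one entry per role, keyed by the human-readable label from the structural map.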


NLNZ Metadata Extraction Tool

Automatic metadata extraction is essential

• Extracts embedded metadata from 15 common file formats (eg. TIFF, JPEG, MS Word, PDF) and file details for other formats

• Built in Java, outputs in XML (customisable using XSLT)

• Graphical interface or command line batch

• 10,000 JPEG files per hour

• Finalist in UK Pilgrim Trust’s 2004 Preservation Awards
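The "file details for other formats" fallback is the simplest part to illustrate: details the file system already knows, written out as XML. The sketch below is in Python rather than the tool's Java, it is not the NLNZ tool, and the element names are hypothetical rather than the tool's output schema.

```python
import mimetypes
import os
import tempfile
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def describe_file(path: str) -> ET.Element:
    """Collect basic file details into a small XML element."""
    stat = os.stat(path)
    # Guess from the extension only; no embedded metadata is read here.
    mime_type, _encoding = mimetypes.guess_type(path)
    elem = ET.Element("file")
    ET.SubElement(elem, "name").text = os.path.basename(path)
    ET.SubElement(elem, "size").text = str(stat.st_size)
    ET.SubElement(elem, "mimeType").text = mime_type or "application/octet-stream"
    ET.SubElement(elem, "modified").text = datetime.fromtimestamp(
        stat.st_mtime, tz=timezone.utc
    ).isoformat()
    return elem


if __name__ == "__main__":
    # Demonstrate on a throwaway file so the example is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "cover.jpg")
        with open(sample, "wb") as handle:
            handle.write(b"not really a JPEG")
        print(ET.tostring(describe_file(sample), encoding="unicode"))
```

Format-specific extraction (TIFF tags, Word properties and so on) would replace the extension-based guess with a parser per format, which is where most of the real tool's work lies.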


Metadata Conversion Engine

Metadata modularity

[Diagram labels: Descriptive Records; Additional Data; MARC; ISAD(G); CROSSWALK; DC XML; METS; DC RDF/XML (twice); DRD RDF AP; NZGLS; Picture Australia; Matapihi; Govt Portal; Digital Archive; Discover.]
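A crosswalk is, at its core, a mapping table applied field by field. The following sketch maps a handful of MARC tags to Dublin Core elements and serialises the result; it illustrates the idea only and is not the Library's conversion engine. The flattened `(tag, subfield, value)` input shape and the `<record>` wrapper element are assumptions.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# A few widely used MARC-to-DC correspondences; a real crosswalk is longer.
MARC_TO_DC = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("650", "a"): "subject",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
}


def crosswalk(marc_fields: list) -> dict:
    """Map (tag, subfield, value) triples to {dc element: [values]}."""
    dc = {}
    for tag, code, value in marc_fields:
        element = MARC_TO_DC.get((tag, code))
        if element is not None:  # unmapped fields are silently dropped
            dc.setdefault(element, []).append(value)
    return dc


def to_dc_xml(dc: dict) -> str:
    """Serialise crosswalked values as simple DC XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for element, values in dc.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")
```

Keeping the mapping as data rather than code is what makes the approach modular: each target (DC XML, NZGLS, METS and so on) needs its own table and serialiser, but the source records are untouched.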


Part 4 Business Processes


Integration into the business

• We’re moving from an era of “pilots” to implementation

• Integrating into existing staff workflows rather than establishing a separate unit

• Documenting the business process workflows


Part 5 Tying it all together


The Digital Archive Environment

[Diagram with side labels "Digital Objects" and "Metadata". Labels shown: Selection; Harvest or Digitise; acquire; legal deposit or donated; describe; extract; manage; load; retrieve; Catalogues; Technical Info; Preservation Info; Rights; Digital Store; Access; metadata conversion; search; export; manage.]

Digital Object Workbench

• Archive

• Migrate

• Manage media

• Identity

• Prepare

• Arrange

• Authenticate

• Create derivatives


Digital Preservation Report Card 2004

Digital preservation has come a long way in 5 years:

• From “overwhelmingly daunting” to “potentially achievable”

• A lot of thought, pilots, developments around the world

Improvements needed:

• Tools are still at the emerging stage

• Workflows/social side is sometimes forgotten

• Identifier scheme for PIDs – major outstanding issue


Questions…?


Managing digital objects and their metadata:

challenges and responses

Douglas Campbell and Adrienne Kebbell

National Library of New Zealand Te Puna Mātauranga o Aotearoa

DC-2004 Conference, 12 October 2004