![Page 1: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/1.jpg)
Managing digital objects and their metadata:
challenges and responses
Douglas Campbell and Adrienne Kebbell
National Library of New Zealand Te Puna Mātauranga o Aotearoa
DC-2004 Conference, 12 October 2004
![Page 2: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/2.jpg)
Agenda
• Our situation
• Digital Preservation Frameworks
• Digital Objects
– Complex objects
– Identifiers
– File naming
• Metadata
– Frameworks
– Descriptive metadata
– Preservation metadata
– Structural metadata
– Automatic extraction
– Modularity
• Integration
– Business process workflows
![Page 3: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/3.jpg)
National Library of New Zealand Te Puna Mātauranga o Aotearoa
• Collect, maintain, and make accessible literature and information resources that relate to New Zealand and the Pacific
• Alexander Turnbull Library: Preserve New Zealand's documentary heritage for generations to come
• Develop and deliver services for schools to support teaching and learning
• Apply the partnership responsibilities of the Treaty of Waitangi to all activities
![Page 4: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/4.jpg)
![Page 5: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/5.jpg)
National Digital Heritage Archive
• National Library Act 2003 gives legal deposit of electronic materials to the National Library
• Archive development funded by Government
• Working towards “Trusted Digital Repository” certification
![Page 6: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/6.jpg)
Part 1 Digital Preservation Framework
![Page 7: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/7.jpg)
Open Archival Information System (OAIS) Model
KEY:
• SIP – Submission Information Package (Ingest)
• AIP – Archival Information Package (Archive)
• DIP – Dissemination Information Package (Access)
![Page 8: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/8.jpg)
Applying OAIS – building our framework
[Diagram; layout lost in extraction. Labels recovered from the slide:]
• Row labels: Digital Objects, Metadata
• Components: Selection, Catalogues, Technical Info, Preservation Info, Rights, Digital Store, Digital Object Workbench, Access
• Functions listed: Archive, Migrate, Manage media, Identity, Prepare, Arrange, Authenticate, Create derivatives
• Flow labels: Harvest or Digitise, acquire, donated or legal deposit, load, retrieve, describe, extract, manage, metadata conversion, search, export
![Page 9: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/9.jpg)
Part 2 Digital Objects
![Page 10: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/10.jpg)
Digital objects are complex
• Website – hundreds of files
• CD-ROM – hard-coded operation
• Diskette of accounts spreadsheets and correspondence – dissimilar but related
• Self-contained single file, eg. MS Excel
• Dependent multiple files, eg. HTML + GIFs, or EXE + DLLs
• Self-contained multiple files, eg. Series of MS Word letters
![Page 11: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/11.jpg)
Classifying the “conceptual object”
• Simple digital object
– A single file
– MS Word document, TIFF image
• Digital object group
– A set of independent but related files described as a group
– Disk of 100 MS Word letters
• Complex digital object
– A group of dependent files intended to be viewed as a single conceptual object, often with only one entry point
– Website, CD-ROM
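The three classes above can be sketched as a small decision rule. This is an illustration only, not the Library's implementation: the `ConceptualObject` type and its `interdependent` flag are hypothetical names, and in practice deciding whether files depend on each other is a curatorial judgement rather than a stored boolean.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ObjectClass(Enum):
    SIMPLE = "simple digital object"
    GROUP = "digital object group"
    COMPLEX = "complex digital object"


@dataclass
class ConceptualObject:
    files: List[str]       # file names that make up the object
    interdependent: bool   # hypothetical flag: True if the files only work together


def classify(obj: ConceptualObject) -> ObjectClass:
    """Apply the slide's three-way classification to one conceptual object."""
    if not obj.files:
        raise ValueError("a conceptual object needs at least one file")
    if len(obj.files) == 1:
        return ObjectClass.SIMPLE
    # Several files: dependent files form a complex object,
    # independent but related files form a group.
    return ObjectClass.COMPLEX if obj.interdependent else ObjectClass.GROUP


website = ConceptualObject(["index.html", "logo.gif"], interdependent=True)
letters = ConceptualObject(["letter1.doc", "letter2.doc"], interdependent=False)
print(classify(website).value, "/", classify(letters).value)
```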
![Page 12: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/12.jpg)
Complexity of components

Simple Digital Object
• 1 Simple Object, eg. text document
• 1 Descriptive Record
• 1 Preservation Object Record (for PM Word file)
• 1 Original file [Word]
• 1 Preservation Master file [Word]
• 2 Access files [PDF + XML]
• 1 PID for 4 files

Object Group
• 1 Object Group, eg. 200 letters from a donor
• 1 Descriptive Record for 800 files [Word, XML, PDF]
• 1 Object Pres Data
• 200 File Data
• NN Process Data
• NN Metadata Modification Data
• 200 Original files [Word]
• 200 Preservation Master files [Word]
• 400 Access files [PDF + XML]
• 1 PID for 800 files

Complex Digital Object
• 1 Complex Object, eg. Web Site of 80 html files + 20 gifs
• 1 Descriptive Record for 300 files [HTML + gif]
• 1 Object Pres Data
• 100 File Data
• NN Process Data
• NN Metadata Modification Data
• 100 Original files [HTML + gif]
• 100 Preservation Master files [processed for local delivery]
• 100 Access files [HTML + gif]
• 1 PID for 300 files
![Page 13: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/13.jpg)
Identifiers
Key characteristics of identifiers to consider:
• Granularity – Question: What do we need to identify? Answer: Whatever we need to identify!
• Intelligence – Unanticipated changes may render intelligent identifiers inaccurate, though dumb identifiers place a reliance on external metadata
• Actionable – Need to separate identity from location, eg. two URLs may be two locations of the same entity
• Persistence – Depends mostly on your commitment
• Extensibility – Be generic, follow standards, application independent
![Page 14: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/14.jpg)
Persistent Identifiers
Persistence means different things to different communities; we separate it into:
• Persistent Identifier (PID) – assigned at the “conceptual” level of an object, persists in perpetuity
• Persistent Locator (PL) – file locator, persists only for the life of the file
We guarantee PIDs, but PLs to the “best current format” will become inoperative over the decades as formats become obsolescent
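The PID/PL split can be illustrated with a small registry. This is a sketch under stated assumptions, not the Library's resolver: the `PidRegistry` class, the identifier strings and the file paths are hypothetical, and "best current format" is approximated here as the most recently added file that is still live.

```python
class PidRegistry:
    """Maps each persistent identifier (PID) to the file locators (PLs) issued for it."""

    def __init__(self):
        self._entries = {}  # pid -> list of [locator, is_live], oldest first

    def add_file(self, pid, locator):
        """Register a new file locator under an existing or new PID."""
        self._entries.setdefault(pid, []).append([locator, True])

    def retire(self, locator):
        """Mark a locator inoperative, eg. once its format is obsolete."""
        for entries in self._entries.values():
            for entry in entries:
                if entry[0] == locator:
                    entry[1] = False

    def is_live(self, locator):
        return any(
            entry[0] == locator and entry[1]
            for entries in self._entries.values()
            for entry in entries
        )

    def resolve(self, pid):
        """Return the newest live locator; the PID itself never changes."""
        for locator, live in reversed(self._entries[pid]):
            if live:
                return locator
        raise LookupError(f"no live file for {pid}")


registry = PidRegistry()
registry.add_file("pid:1234", "/files/1234_word5.doc")
registry.add_file("pid:1234", "/files/1234.pdf")   # later migration
registry.retire("/files/1234_word5.doc")           # old format withdrawn
print(registry.resolve("pid:1234"))
```

The point of the design is that citations carry only the PID; locators can be added and retired behind it without breaking references.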
![Page 15: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/15.jpg)
File naming conventions – Plan “A”
Plan A: Make filenames unique by including role code, eg:
• DO – Digital Original
• DD – Digital Derivative
• PM – Preservation Master (best attempt to replicate in a currently accessible format)
• AF – Access Format
• TN – Thumbnail
Filename: IID_role_instance.extension, eg. 1234_af_01.doc
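The Plan A pattern can be sketched as a pair of helpers that build and parse such names. The details beyond the slide are assumptions inferred from the single example `1234_af_01.doc`: a numeric IID, a lower-case two-letter role code and a two-digit instance number. The function names are hypothetical.

```python
import re

# Role codes as listed on the slide.
ROLES = {
    "do": "Digital Original",
    "dd": "Digital Derivative",
    "pm": "Preservation Master",
    "af": "Access Format",
    "tn": "Thumbnail",
}

# Assumed shape: digits, two-letter role, two-digit instance, extension.
_PATTERN = re.compile(
    r"^(?P<iid>\d+)_(?P<role>[a-z]{2})_(?P<instance>\d{2})\.(?P<ext>[A-Za-z0-9]+)$"
)


def build_filename(iid, role, instance, extension):
    """Compose IID_role_instance.extension, rejecting unknown role codes."""
    role = role.lower()
    if role not in ROLES:
        raise ValueError(f"unknown role code: {role!r}")
    return f"{iid}_{role}_{instance:02d}.{extension}"


def parse_filename(name):
    """Split a Plan A filename back into its parts."""
    match = _PATTERN.match(name)
    if match is None or match["role"] not in ROLES:
        raise ValueError(f"not a Plan A filename: {name!r}")
    return {
        "iid": int(match["iid"]),
        "role": match["role"],
        "instance": int(match["instance"]),
        "extension": match["ext"],
    }


print(build_filename(1234, "AF", 1, "doc"))
print(parse_filename("1234_af_01.doc"))
```

Because the role is baked into the name, any change of role or format means renaming the file, which is the kind of coupling Plan B sets out to remove.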
![Page 16: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/16.jpg)
File naming conventions – Plan “B”
Plan B: “Virtualisation”
• Decouple locator and location
• Location and disk partitioning managed dynamically internally, delivered externally via persistent locator
– /1234 (to access the default format)
– /1234?role=TN&size=150
• Locator may be HTTP, SOAP, etc.
• Provides additional opportunities such as transparent “on the fly” format conversions or correcting the MIME type reported
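A minimal sketch of how a locator of this shape might be resolved, using only the Python standard library. The storage table, its paths and the choice of AF as the default role are assumptions for illustration; the slide does not say how the mapping is held.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical storage table: (object id, role) -> current physical path.
# Paths can change freely without affecting the external locator.
STORAGE = {
    ("1234", "AF"): "/vol07/a9/1234_af_01.pdf",
    ("1234", "TN"): "/vol02/3c/1234_tn_01.jpg",
}
DEFAULT_ROLE = "AF"  # assumption: the access format is the default


def parse_locator(locator):
    """Turn '/1234?role=TN&size=150' into a request dictionary."""
    parts = urlsplit(locator)
    object_id = parts.path.strip("/")
    if not object_id:
        raise ValueError(f"no object id in locator: {locator!r}")
    query = parse_qs(parts.query)
    request = {
        "id": object_id,
        "role": query.get("role", [DEFAULT_ROLE])[0].upper(),
    }
    if "size" in query:
        request["size"] = int(query["size"][0])
    return request


def resolve(locator, storage=STORAGE):
    """Look up the current physical location for a persistent locator."""
    request = parse_locator(locator)
    return storage[(request["id"], request["role"])]


print(resolve("/1234"))
print(parse_locator("/1234?role=TN&size=150"))
```

A `size` parameter is parsed but not acted on here; in a real service it is the hook where an "on the fly" derivative could be generated.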
![Page 17: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/17.jpg)
FRBR
[Diagram: a novel modelled against FRBR levels; layout lost in extraction. Labels recovered from the slide:]
• Levels: Work, Expression, Manifestation, Component, Item
• Example labels: Novel, Manuscript, Published, Book, Word v5, PDF, XML, XSL, Chap 1, Chap 2, Preservation, Lending
• File role codes shown: DO, PM, AS, AF
![Page 18: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/18.jpg)
Part 3 Metadata
![Page 19: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/19.jpg)
Metadata Framework
Four key categories of metadata for digital objects:
• Resource discovery – finding and identifying
• Structural – presenting in context (eg. pages in a book rather than bunch of files, navigation, etc)
• Rights management and Access control – protection of property rights, authentication and authorisation
• Technical and Administrative – properties of the objects, how they were created, changes made, etc.
![Page 20: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/20.jpg)
Metadata Framework
[Diagram: Metadata Standards Framework for National Library of New Zealand; layout lost in extraction. Labels recovered from the slide:]
• Generic or Global Access: Dublin Core, RDF, XML
• Community / Sector Specific Application Profiles, Following International Guidelines
• Local
• Sectors: Library, Education, Archival, Government
• Standards shown: NZGLS, DC-Gov, GILS, AGLS, MARC, DCQ, MODS, METS, DC-Ed, LOM, EAD, ISAD(G)
![Page 21: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/21.jpg)
Descriptive metadata
Digital Resource Description (DRD) Application Profile
• Lightweight alternative to METS for simple objects based on Qualified DC
• XLink extensions to differentiate links to the multiple derivative files
• Local refinements for different identifier types, eg. local id, persistent id, locator
• RDF/XML encoding syntax
• Used in our “Discover” and “Matapihi” products
![Page 22: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/22.jpg)
Preservation metadata
NLNZ Preservation Metadata (2002)
– Object – preservation info for object, eg. ID, software needed
– File – preservation info for a file, eg. format, size
– Process – record of actions taken, eg. format migration
– Metadata modification – record of changes to above metadata
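The four entities can be pictured as linked records. The sketch below is illustrative only: the class and field names are hypothetical and are not the element names of the NLNZ schema; they simply reuse the examples given on the slide (ID, software needed, format, size, format migration).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileData:
    name: str
    format: str
    size_bytes: int


@dataclass
class ProcessData:
    action: str   # eg. "format migration"
    date: str


@dataclass
class MetadataModification:
    field_changed: str
    old_value: str
    new_value: str


@dataclass
class ObjectPresData:
    object_id: str
    software_needed: str
    files: List[FileData] = field(default_factory=list)
    processes: List[ProcessData] = field(default_factory=list)
    modifications: List[MetadataModification] = field(default_factory=list)

    def set_software(self, new_value):
        """Change a metadata value and keep a record of the change."""
        self.modifications.append(
            MetadataModification("software_needed", self.software_needed, new_value)
        )
        self.software_needed = new_value


record = ObjectPresData("1234", "MS Word 5")
record.files.append(FileData("1234_pm_01.doc", "MS Word", 48_128))
record.processes.append(ProcessData("format migration", "2004-10-12"))
record.set_software("MS Word 2003")
print(len(record.modifications), record.software_needed)
```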
![Page 23: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/23.jpg)
Structural metadata
Metadata Encoding & Transmission Standard (METS)
METS record:
• Header
• Descriptive
• Administrative
• Content Files
• Structural Map
• Structural Links
• Behaviour
![Page 24: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/24.jpg)
Metadata Pieces for a Single TIFF Image
Preservation
DCQ Description
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:nlnzdl="http://www.natlib.govt.nz/dl#"
  xmlns:ead="http://www.natlib.govt.nz/dl#"
  xmlns:dcq="http://purl.org/dc/terms/"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
 <rdf:Description rdf:about="hdl:1727.11/00002170">
  <dc:title>Blue smoke = Kohu auwahi</dc:title>
  <dc:creator>Karaitiana, Rangi Ruru Wananga, 1909-1970</dc:creator>
  <dc:subject> <dcq:LCSH> <rdf:value>Popular music - 1941-1950</rdf:value> </dcq:LCSH> </dc:subject>
  <dc:subject>Maori music - Waiata</dc:subject>
  <nlnzdl:category> <nlnzdl:NZCT> <rdf:value>A-M-02-03</rdf:value> <rdfs:label>Music Score Covers and Record Sleeves</rdfs:label> </nlnzdl:NZCT> </nlnzdl:category>
  <dc:description>For voices and piano.</dc:description>
  <dc:description>Arranged with tonic sol-fa, chord symbols and two part vocal harmony"--Cover.</dc:description>
  <dc:description>Words in English and Maori.</dc:description>
  <dc:description>Caption title.</dc:description>
  <dc:publisher>C. Begg, [Dunedin] N.Z.</dc:publisher>
  <dc:contributor>Winchester, George, ca. 1900- , arranger</dc:contributor>
  <dcq:issued>c1947.</dcq:issued>
  <dc:type> <dcq:DCMIType> <rdf:value>Text</rdf:value> </dcq:DCMIType> </dc:type>
  <dc:type> <nlnzdl:LCSHFormOfComposition> <rdf:value>pp</rdf:value> <rdfs:label>Popular music</rdfs:label> </nlnzdl:LCSHFormOfComposition> </dc:type>
  <dc:format>1 score cover ([1]) p. ; 31 cm.</dc:format>
  <dcq:extent>17,640KB</dcq:extent>
  <dcq:extent>81KB</dcq:extent>
  <dc:format> <dcq:IMT> <rdf:value>image/tiff</rdf:value> </dcq:IMT> </dc:format>
  <dc:format> <dcq:IMT> <rdf:value>image/jpeg</rdf:value> </dcq:IMT> </dc:format>
  <dc:format> <dcq:IMT> <rdf:value>image/jpeg</rdf:value> </dcq:IMT> </dc:format>
  <dc:format> <dcq:IMT> <rdf:value>image/jpeg</rdf:value> </dcq:IMT> </dc:format>
  <dc:identifier rdf:resource="00182451_00002170_ds.tif"/>
  <dc:identifier rdf:resource="00182451_00002170_df.jpg"/>
  <dc:identifier rdf:resource="00182451_00002170_pv.jpg"/>
  <dc:identifier rdf:resource="00182451_00002170_tn.jpg"/>
  <nlnzdl:pid rdf:resource="hdl:1727.11/00002170"/>
  <nlnzdl:object rdf:resource="http://hdl.handle.net/1727.11/00002170"/>
  <dc:language> <dcq:ISO639-2> <rdf:value>eng</rdf:value> </dcq:ISO639-2> </dc:language>
  <dc:language> <dcq:ISO639-2> <rdf:value>eng</rdf:value> </dcq:ISO639-2> </dc:language>
  <dc:language> <dcq:ISO639-2> <rdf:value>mao</rdf:value> </dcq:ISO639-2> </dc:language>
  <dcq:hasFormat>Also available as an electronic resource.</dcq:hasFormat>
  <dcq:spatial>New Zealand</dcq:spatial>
  <dcq:temporal>1947</dcq:temporal>
  <dc:rights>Permission of the National Library of New Zealand, Te Puna Matauranga o Aotearoa must be obtained before any re-use of this item.</dc:rights>
  <ead:daoloc ead:behavior="image/tiff" ead:href="http://digital.natlib.govt.nz/source/20020605/00182451_00002170_ds.tif" ead:role="source"/>
  <ead:daoloc ead:behavior="image/jpeg" ead:href="http://digital.natlib.govt.nz/20020604/00182451_00002170_df.jpg" ead:role="reference" ead:title="Digital image of the cover of the score for Blue smoke. (81KB)"/>
  <ead:daoloc ead:behavior="image/jpeg" ead:href="http://digital.natlib.govt.nz/20020604/00182451_00002170_pv.jpg" ead:role="display"/>
  <ead:daoloc ead:behavior="image/jpeg" ead:href="http://digital.natlib.govt.nz/20020604/00182451_00002170_tn.jpg" ead:role="thumbnail"/>
 </rdf:Description>
</rdf:RDF>
METS File Group and Structural Map
<fileSec>
 <fileGrp ID="FG2170_pm" USE="Preservation Master">
  <file ID="F2170_pm" MIMETYPE="image/tiff" SIZE="17379652" CREATED="1997-04-13T14:51:14">
   <FLocat ID="FL2170_pm" LOCTYPE="URL" xlink:href="objects/preservation/1/2170_pm.tif" xlink:actuate="onRequest"/>
  </file>
 </fileGrp>
 <fileGrp ID="FG2170_ds" USE="Dissemination Source">
  <file ID="F2170_ds" MIMETYPE="image/tiff" SIZE="17379652" CREATED="2002-09-11T09:12:20">
   <FLocat ID="FL2170_ds" LOCTYPE="URL" xlink:href="objects/source/1/2170_ds.tif" xlink:actuate="onRequest"/>
  </file>
 </fileGrp>
 <fileGrp ID="FG2170_df" USE="Dissemination Format">
  <file ID="F2170_df" MIMETYPE="image/jpeg" SIZE="123394" CREATED="2002-10-31T15:32:26">
   <FLocat ID="FL2170_df" LOCTYPE="URL" xlink:href="objects/access/1/2170_df.jpg" xlink:actuate="onRequest"/>
  </file>
 </fileGrp>
 <fileGrp ID="FG2170_pv" USE="Preview Image">
  <file ID="F2170_pv" MIMETYPE="image/jpeg" SIZE="99725" CREATED="2003-04-08T10:56:22">
   <FLocat ID="FL2170_pv" LOCTYPE="URL" xlink:href="objects/access/1/2170_pv.jpg" xlink:actuate="onRequest"/>
  </file>
 </fileGrp>
 <fileGrp ID="FG2170_tn" USE="Thumbnail Image">
  <file ID="F2170_tn" MIMETYPE="image/jpeg" SIZE="23162" CREATED="2003-04-07T11:33:13">
   <FLocat ID="FL2170_tn" LOCTYPE="URL" xlink:href="objects/access/1/2170_tn.jpg" xlink:actuate="onRequest"/>
  </file>
 </fileGrp>
</fileSec>
<structMap ID="SM2170" TYPE="LOGICAL">
 <div ID="DIV2170" LABEL="Blue smoke = Kohu auwahi" TYPE="Image">
  <div ID="DIV2170_pm" LABEL="Preservation master" TYPE="tiff image"> <fptr FILEID="F2170_pm"/> </div>
  <div ID="DIV2170_ds" LABEL="Dissemination Source" TYPE="tiff image"> <fptr FILEID="F2170_ds"/> </div>
  <div ID="DIV2170_df" LABEL="Dissemination Format" TYPE="jpeg image"> <fptr FILEID="F2170_df"/> </div>
  <div ID="DIV2170_pv" LABEL="Preview Image" TYPE="jpeg image"> <fptr FILEID="F2170_pv"/> </div>
  <div ID="DIV2170_tn" LABEL="Thumbnail Image" TYPE="jpeg image"> <fptr FILEID="F2170_tn"/> </div>
 </div>
</structMap>
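To show how the file section and structural map fit together, the sketch below reads a trimmed copy of the fragment with Python's standard `xml.etree.ElementTree` and follows each `fptr` to its file. Two things are assumptions added so the fragment parses on its own: the `<mets>` wrapper element and the `xmlns:xlink` declaration. Elements are left without a namespace, as on the slide; a complete METS document would normally declare the METS namespace as well.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Trimmed copy of the slide's fragment (two of the five file groups).
FRAGMENT = f"""
<mets xmlns:xlink="{XLINK}">
  <fileSec>
    <fileGrp ID="FG2170_pm" USE="Preservation Master">
      <file ID="F2170_pm" MIMETYPE="image/tiff" SIZE="17379652">
        <FLocat LOCTYPE="URL" xlink:href="objects/preservation/1/2170_pm.tif"/>
      </file>
    </fileGrp>
    <fileGrp ID="FG2170_tn" USE="Thumbnail Image">
      <file ID="F2170_tn" MIMETYPE="image/jpeg" SIZE="23162">
        <FLocat LOCTYPE="URL" xlink:href="objects/access/1/2170_tn.jpg"/>
      </file>
    </fileGrp>
  </fileSec>
  <structMap TYPE="LOGICAL">
    <div LABEL="Blue smoke = Kohu auwahi" TYPE="Image">
      <div LABEL="Preservation master"><fptr FILEID="F2170_pm"/></div>
      <div LABEL="Thumbnail Image"><fptr FILEID="F2170_tn"/></div>
    </div>
  </structMap>
</mets>
"""


def files_by_id(root):
    """Index every <file> by its ID, carrying the USE of its file group."""
    files = {}
    for group in root.iter("fileGrp"):
        for file_el in group.iter("file"):
            locat = file_el.find("FLocat")
            files[file_el.get("ID")] = {
                "use": group.get("USE"),
                "mimetype": file_el.get("MIMETYPE"),
                "size": int(file_el.get("SIZE")),
                "href": locat.get(f"{{{XLINK}}}href"),
            }
    return files


def resolve_struct_map(root):
    """Pair each structMap div label with the location of the file it points to."""
    files = files_by_id(root)
    return [
        (div.get("LABEL"), files[fptr.get("FILEID")]["href"])
        for div in root.iter("div")
        for fptr in div.findall("fptr")
    ]


root = ET.fromstring(FRAGMENT)
for label, href in resolve_struct_map(root):
    print(label, "->", href)
```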
![Page 25: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/25.jpg)
NLNZ Metadata Extraction Tool
Automatic metadata extraction is essential
• Extracts embedded metadata from 15 common file formats (eg. TIFF, JPEG, MS Word, PDF) and file details for other formats
• Built in Java, outputs in XML (customisable using XSLT)
• Graphical interface or command line batch
• 10,000 JPEG files per hour
• Finalist in UK Pilgrim Trust’s 2004 Preservation Awards
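The "file details" case can be sketched in a few lines. This is not the NLNZ Metadata Extraction Tool (which is a Java application) and does not reproduce its output schema: the function name and the XML element names are hypothetical, and only file-system details and a MIME type guessed from the file extension are recorded.

```python
import mimetypes
import os
import xml.etree.ElementTree as ET


def extract_file_details(path):
    """Return basic file details as an XML element (hypothetical layout)."""
    stat = os.stat(path)
    mime, _encoding = mimetypes.guess_type(path)  # guess from the extension only
    element = ET.Element("file")
    ET.SubElement(element, "name").text = os.path.basename(path)
    ET.SubElement(element, "size").text = str(stat.st_size)
    ET.SubElement(element, "mimetype").text = mime or "application/octet-stream"
    return element


def extract_batch(paths):
    """Wrap the details of several files in one XML document string."""
    root = ET.Element("files")
    for path in paths:
        root.append(extract_file_details(path))
    return ET.tostring(root, encoding="unicode")
```

Extracting embedded metadata (eg. TIFF tags or Word document properties) needs a format-specific reader per file type, which is where most of the work in a real tool lies.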
![Page 26: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/26.jpg)
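Metadata modularity
[Diagram: Metadata Conversion Engine; layout lost in extraction. Labels recovered from the slide:]
• Metadata Conversion Engine (CROSSWALK)
• Descriptive Records: MARC, ISAD(G)
• Additional Data
• Formats: DC XML, METS, DC RDF/XML, DRD RDF AP, NZGLS
• Services: Picture Australia, Matapihi, Govt Portal, Digital Archive, Discover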
![Page 27: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/27.jpg)
Part 4 Business Processes
![Page 28: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/28.jpg)
Integration into the business
• We’re moving from an era of “pilots” to implementation
• Integrating into existing staff workflows rather than establishing a separate unit
• Documenting the business process workflows
![Page 29: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/29.jpg)
Part 5 Tying it all together
![Page 30: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/30.jpg)
The Digital Archive Environment
[Diagram; layout lost in extraction. Labels recovered from the slide:]
• Row labels: Digital Objects, Metadata
• Components: Selection, Catalogues, Technical Info, Preservation Info, Rights, Digital Store, Digital Object Workbench, Access
• Functions listed: Archive, Migrate, Manage media, Identity, Prepare, Arrange, Authenticate, Create derivatives
• Flow labels: Harvest or Digitise, acquire, donated or legal deposit, load, retrieve, describe, extract, manage, metadata conversion, search, export
![Page 31: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/31.jpg)
Digital Preservation Reportcard 2004
Digital preservation has come a long way in 5 years:
• From “overwhelmingly daunting” to “potentially achievable”
• A lot of thought, pilots, developments around the world
Improvements needed:
• Tools are still at the emerging stage
• Workflows/social side is sometimes forgotten
• Identifier scheme for PIDs - major outstanding issue
![Page 32: Managing digital objects and their metadata: challenges and responses](https://reader035.vdocuments.mx/reader035/viewer/2022062315/56815827550346895dc58a13/html5/thumbnails/32.jpg)
Questions…?