Digitization with Millennium & CONTENTdm Stuart Hunt IUG17 Anaheim May 2009


Page 1: Digitization with Millennium &  CONTENTdm

Digitization with Millennium & CONTENTdm

Stuart Hunt

IUG17, Anaheim, May 2009

Page 2: Digitization with Millennium &  CONTENTdm

Overview
• Background
• Digitisation
• Metadata
• Workflows
• Now

Page 3: Digitization with Millennium &  CONTENTdm

University of Warwick
• Royal Charter 1965
• Russell Group
• 16,000 FTE students
• 5000 staff

Page 4: Digitization with Millennium &  CONTENTdm

University Library
• Approx 1.1 million volumes
• 170 staff (110 FTE)
• Millennium 2003
• Approx 100,000 issues/renewals per yr
• Approx 28,000 new books per yr
• RLUK member
• OCLC member

Page 5: Digitization with Millennium &  CONTENTdm

Content
• Marandet Collection
• 4000+ French plays 1720 to 1900
• Acquired 1970s
• Guide published 1979
• Bibliographic records in Millennium, RLUK, COPAC, & WorldCat
• No IPR issues

Page 6: Digitization with Millennium &  CONTENTdm
Page 7: Digitization with Millennium &  CONTENTdm

Projects
• Revolutionary Drama (1789-1800)
  – 339 plays
• Empire Period Drama (1801-1815)
  – 123 plays
• JISC Digitisation Programme: Enriching Digital Resources
• ‘Exposing Marandet’
  – 1500 plays/75,000 pages

Page 8: Digitization with Millennium &  CONTENTdm

Objectives
• Cross-searching
• Full-text searching
• Integration with existing & future systems
  – Millennium
  – Web
  – Vertical search solution

Page 9: Digitization with Millennium &  CONTENTdm

Options
• Existing solutions
  – Millennium
  – In-house web publishing tool
• Separate product
  – Digital collection management software
  – CONTENTdm
• Solution would drive approach taken

Page 10: Digitization with Millennium &  CONTENTdm

Digital production
• Image files
  – TIFF & JPEG derivative
  – Full colour & greyscale
  – Outsourced
• Text files/full-text transcripts
  – OCR quality initially not acceptable
  – Re-keying
  – Outsourced

Page 11: Digitization with Millennium &  CONTENTdm

Media Management
• Tried & tested solution
• Quick & easy
• Link digital content
• D2D process simplified
• Existing bibs
• New bibs
• Use existing authentication if required

Page 12: Digitization with Millennium &  CONTENTdm
Page 13: Digitization with Millennium &  CONTENTdm

Media Management
• No full-text searching
• No cross-collection searching (unless in separate scope)
• Tied to MARC metadata
• Metadata enrichment difficult
• Image file format
• Not a total solution

Page 14: Digitization with Millennium &  CONTENTdm

CONTENTdm
• Full-text & cross-collection searching
• Not tied to MARC metadata
• Metadata enrichment simple
• Local Windows server
• Initial licence <50K images
• Upgraded to unlimited licence 2008

Page 15: Digitization with Millennium &  CONTENTdm

Local metadata context
• Separate bibs
  – Print vs electronic
  – Describes what is
  – Supports better (future) FRBRisation
  – Ease of maintenance
  – Location & format based scoping
• 793 for local added entry/uniform title
  – Collection name

Page 16: Digitization with Millennium &  CONTENTdm

Metadata option 1
• Create metadata within CONTENTdm
• Play-by-play
• Metadata already present in Millennium

Page 17: Digitization with Millennium &  CONTENTdm

Metadata option 1
• Assumes that metadata is already available
• Not scalable
• Poor use of resources
• Does not allow data to work harder or smarter

Page 18: Digitization with Millennium &  CONTENTdm

Metadata option 2
• Create metadata outside of Millennium
• Metadata not already present in Millennium
• Play-by-play
• Harvest from CONTENTdm into Millennium via XML Harvester

Page 19: Digitization with Millennium &  CONTENTdm

XML Harvester
• Single configuration file
• Needs to be edited for each separate resource
• Uses XSLT not load table(s)
• Major changes (e.g. harvest different schema) may need to be done by III

Page 20: Digitization with Millennium &  CONTENTdm

Configuration file triggers
@XML_TYPE=DC (or MARCXML)
@OAI_FORMAT=oai_dc
@DBNAME=[Repository name]
@URL=[url for OAI-PMH]
@USEOAI=true (or false)
@OAISET=[Name of set]
@RECID_MARCTAG=001
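To make the role of these triggers concrete, the Python sketch below parses lines of the form `@KEY=value` and assembles the standard OAI-PMH `ListRecords` request that such settings describe. The helper names, the example URL, and the set name are assumptions made for illustration. This is not how the XML Harvester itself is implemented.

```python
from urllib.parse import urlencode


def parse_triggers(text):
    """Parse '@KEY=value' lines into a dict (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("@") or "=" not in line:
            continue  # skip blank lines and anything that is not a trigger
        key, value = line[1:].split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def list_records_url(settings):
    """Build an OAI-PMH ListRecords request URL from the parsed settings."""
    params = {"verb": "ListRecords", "metadataPrefix": settings["OAI_FORMAT"]}
    if settings.get("OAISET"):
        params["set"] = settings["OAISET"]  # restrict the harvest to one set
    return settings["URL"] + "?" + urlencode(params)


# Placeholder values: the URL and set name are invented for this example.
example = """
@XML_TYPE=DC
@OAI_FORMAT=oai_dc
@DBNAME=Example repository
@URL=https://cdm.example.org/oai/oai.php
@USEOAI=true
@OAISET=exampleset
@RECID_MARCTAG=001
"""

settings = parse_triggers(example)
url = list_records_url(settings)
print(url)
```

The `verb`, `metadataPrefix`, and `set` parameters come from the OAI-PMH protocol. How the harvester actually uses `@DBNAME` and `@RECID_MARCTAG` is not covered here.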

Page 21: Digitization with Millennium &  CONTENTdm

XML Harvester

Page 22: Digitization with Millennium &  CONTENTdm

Harvested metadata
• Loaded through Data Exchange
• Significant re-editing
• Tags & indicators
• Diacritics
• Creating attached items or holdings records
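One way to see why harvested records need re-editing is to look at what a simple Dublin Core record carries. The Python sketch below reads an `oai_dc` record with the standard library and lists its elements: each is a plain string with no MARC tag, indicators, or subfield coding. The sample record is invented, and the function name is an assumption for illustration only.

```python
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Invented sample record; real harvested records will differ.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Exemple de pièce</dc:title>
  <dc:creator>Auteur, Exemple</dc:creator>
  <dc:date>1790</dc:date>
  <dc:subject>French drama</dc:subject>
  <dc:subject>Example subject</dc:subject>
</oai_dc:dc>"""


def dc_elements(xml_text):
    """Return {element name: [values]} for every dc:* child element."""
    root = ET.fromstring(xml_text)
    found = {}
    for child in root:
        if child.tag.startswith(DC_NS):
            name = child.tag[len(DC_NS):]
            found.setdefault(name, []).append((child.text or "").strip())
    return found


record = dc_elements(SAMPLE)
for name, values in record.items():
    print(name, values)
```

Nothing in this output says which MARC field, indicators, or subfields a value belongs to, so that detail has to be supplied by the stylesheet or added by hand after loading.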

Page 23: Digitization with Millennium &  CONTENTdm

Harvested metadata

Page 24: Digitization with Millennium &  CONTENTdm

Metadata option 3
• Batchload into CONTENTdm via delimited file from Create Lists
• Cross-walk MARC21 to DC
• Directory structure

Page 25: Digitization with Millennium &  CONTENTdm

MARC to Simple DC crosswalk
Record#      dc:identifier
008/07-10    dc:language
100          dc:creator
245          dc:title
260|ab       dc:publisher
260|c        dc:date
300          dc:format
5XX          dc:description
6XX          dc:subject
700          dc:contributor
700|t        dc:relation
793          dc:source
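The crosswalk above can be sketched in Python as follows. This is a minimal illustration, not the workflow used in the project: the record shape (a list of tag and subfield pairs), the function name, and the sample data are all assumptions. The 008 positions are left out for brevity, and treating a 700 with |t as yielding both a contributor and a relation is one possible reading of the table.

```python
def crosswalk(record_number, fields):
    """Map MARC-like fields to (DC element, value) pairs.

    fields is a list of (tag, {subfield code: value}) pairs, an assumed
    shape chosen for this example.
    """
    dc = [("dc:identifier", record_number)]
    for tag, subs in fields:
        text = " ".join(subs.values())
        if tag == "100":
            dc.append(("dc:creator", text))
        elif tag == "245":
            dc.append(("dc:title", text))
        elif tag == "260":
            publisher = " ".join(subs[c] for c in "ab" if c in subs)
            if publisher:
                dc.append(("dc:publisher", publisher))
            if "c" in subs:
                dc.append(("dc:date", subs["c"]))
        elif tag == "300":
            dc.append(("dc:format", text))
        elif tag == "700":
            name = " ".join(v for c, v in subs.items() if c != "t")
            if name:
                dc.append(("dc:contributor", name))
            if "t" in subs:
                dc.append(("dc:relation", subs["t"]))
        elif tag == "793":
            dc.append(("dc:source", text))
        elif tag.startswith("5"):
            dc.append(("dc:description", text))
        elif tag.startswith("6"):
            dc.append(("dc:subject", text))
    return dc


# Invented sample data for illustration.
sample_fields = [
    ("100", {"a": "Auteur, Exemple."}),
    ("245", {"a": "Exemple de pièce :", "b": "comédie en un acte."}),
    ("260", {"a": "Paris :", "b": "Chez l'éditeur,", "c": "1790."}),
    ("300", {"a": "32 p."}),
    ("650", {"a": "French drama."}),
    ("793", {"a": "Example collection."}),
]

dc_record = crosswalk("b1234567", sample_fields)
for element, value in dc_record:
    print(element, "=", value)
```

Returning a list of pairs rather than a dict keeps repeated elements, such as several `dc:subject` values, in record order.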

Page 26: Digitization with Millennium &  CONTENTdm

MARC – DC Crosswalk

Page 27: Digitization with Millennium &  CONTENTdm

Additional DC elements
• dc:rights
• dc:type
• Transcript mapped to dc:description

Page 28: Digitization with Millennium &  CONTENTdm

Metadata workflow
• Create separate bibs for e-versions
• Export print records via Data Exchange
• MarcEdit to remove extraneous tags (907, etc)
• Insert 006, 007, 008/23, GMD, 533
• Re-import into Millennium as new bibs
• [856 CONTENTdm reference url added]
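The derivation step in this workflow was done with MarcEdit. Purely as an illustration of its logic, the Python sketch below drops unwanted tags from a print record and adds a 533 and an 856 to produce an e-version. The record shape, the function name, the 533 wording, and the URL are assumptions. Only 907 is dropped by default because it is the only tag the slide names, and the 006, 007, 008/23, and GMD changes are omitted.

```python
def make_e_version(print_fields, reference_url, drop_tags=("907",)):
    """Derive an e-version from a print record (hypothetical helper).

    print_fields is a list of (tag, {subfield code: value}) pairs, an
    assumed shape chosen for this example.
    """
    # Copy every field except the tags to be removed.
    kept = [(tag, dict(subs)) for tag, subs in print_fields
            if tag not in drop_tags]
    # Placeholder reproduction note; real wording is a local decision.
    kept.append(("533", {"a": "Electronic reproduction."}))
    # Link to the digital object.
    kept.append(("856", {"u": reference_url}))
    # Stable sort keeps fields with the same tag in their original order.
    return sorted(kept, key=lambda field: field[0])


# Invented sample data for illustration.
print_bib = [
    ("245", {"a": "Exemple de pièce."}),
    ("260", {"a": "Paris,", "c": "1790."}),
    ("907", {"a": ".b12345678"}),
]

e_bib = make_e_version(print_bib, "https://cdm.example.org/item/1")
for tag, subs in e_bib:
    print(tag, subs)
```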

Page 29: Digitization with Millennium &  CONTENTdm

Metadata workflow
• Review file of newly loaded bibs exported from Create Lists
• Cross-walked from MARC to DC
• Additional DC elements added
• Item level metadata added
• Loaded to CDM as delimited files with directory structure
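The last step, producing a delimited file for loading, might look like the Python sketch below. It assumes a tab-delimited layout with a header row. The column names, the `Directory` column, and the sample row are invented for illustration and are not taken from the project's actual load files.

```python
import csv
import io

# Hypothetical column headings for the load file.
COLUMNS = ["Title", "Creator", "Date", "Rights", "Directory"]


def write_delimited(rows, handle):
    """Write rows (dicts keyed by column name) as tab-delimited text."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing values become empty cells so every line has all columns.
        writer.writerow({column: row.get(column, "") for column in COLUMNS})


# Invented sample data for illustration.
rows = [
    {
        "Title": "Exemple de pièce",
        "Creator": "Auteur, Exemple",
        "Date": "1790",
        "Rights": "Example rights statement",
        "Directory": "play0001",
    },
]

buffer = io.StringIO()
write_delimited(rows, buffer)
output = buffer.getvalue()
print(output)
```

Writing to an in-memory buffer keeps the example self-contained. In practice the same function could be given a file opened with `newline=""` and a UTF-8 encoding so that diacritics survive.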

Page 30: Digitization with Millennium &  CONTENTdm

Metadata in CONTENTdm
• Compound objects
• Document level
• Page level
  – Less rich than document level
• Hospitable to multiple schemas
• Deliberate attempt to stay close to DC
• Administrative metadata
  – Later feature

Page 31: Digitization with Millennium &  CONTENTdm

Document level
• AACR in DC wrapper
• All descriptive metadata from bib (except LDR, 006, 007, 008, GMD)
• Authority control (names, subjects, uniform titles)
• Rights (dc:rights)
• Identifier (.b number)
• Mapped to DC for OAI harvesting

Page 32: Digitization with Millennium &  CONTENTdm

Page level
• Basic descriptive metadata (creator, title, publisher, date)
• Rights (dc:rights)
• Identifier (.b number)
• Transcript (dc:description)
• No OAI harvesting at page level
  – Local decision

Page 33: Digitization with Millennium &  CONTENTdm

Access & availability
• Availability across local → global continuum
• Metadata contribution
• Collection level descriptions
• OAI
• Collapse D2D

Page 34: Digitization with Millennium &  CONTENTdm

Metadata in WorldCat
• Local CDM server – not able to use Connexion Digital Import
• Bug between WorldCat and CDM for compound objects
• FRBRized display in worldcat.org potentially impedes discovery

Page 35: Digitization with Millennium &  CONTENTdm

Now
• ‘Exposing Marandet’ completes 9/2009
• Established service 4 collections
  – Ancien Régime Drama
  – Revolutionary Drama
  – Empire Period Drama
  – Restoration Drama
• Integration with course delivery
• Metadata enrichment to/from CÉSAR