Adding Structural Metadata to Locally Digitized Content in HathiTrust (Poster)

DESCRIPTION

Poster for the 2013 DLF Forum Community Idea Exchange by Jason Colman and Kathryn Horne

TRANSCRIPT

Adding Structural Metadata to Locally Digitized Content in HathiTrust

Jason Colman & Kathryn Horne
University of Michigan Library

(1) The process accepts three standard digitization formats: PDF, JPEG2000, and TIFF.

(2) Using PageTag software, structural metadata is added to a relational database to drive the HathiTrust pageturner interface. Options include title pages, chapters, page numbers, indices, and other sections.

(3) The page images undergo OCR processing with PrimeOCR software.

(4) Using a combination of Perl and shell scripts, the database outputs a text file with structural metadata and OCR confidence ratings. This is combined with the original page images for HathiTrust ingest.

(5) After ingest, the images and metadata come together in the HathiTrust pageturner.
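
The export described in step (4) might look roughly like the sketch below. It is an illustration only, written in Python rather than the Perl and shell scripts the poster mentions. The `pages` table, its column names, the feature labels, the confidence scale, and the tab-separated layout are all assumptions made for the example; they are not the actual PageTag schema or the file format HathiTrust ingests.

```python
import csv
import io
import sqlite3

# Hypothetical column layout for the exported text file.
COLUMNS = ["seq", "filename", "page_number", "feature", "ocr_confidence"]


def export_page_metadata(conn, volume_id, out):
    """Write one tab-separated row per page image of a volume; return the row count."""
    rows = conn.execute(
        "SELECT seq, filename, page_number, feature, ocr_confidence "
        "FROM pages WHERE volume_id = ? ORDER BY seq",
        (volume_id,),
    )
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    count = 0
    for row in rows:
        # Untagged pages and pages without an OCR score become empty fields.
        writer.writerow(["" if value is None else value for value in row])
        count += 1
    return count


def build_demo_db():
    """Create an in-memory database with made-up sample rows (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages (volume_id TEXT, seq INTEGER, filename TEXT, "
        "page_number TEXT, feature TEXT, ocr_confidence INTEGER)"
    )
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("vol1", 2, "00000002.tif", "1", "CHAPTER_START", 870),
            ("vol1", 1, "00000001.tif", None, "TITLE", 910),
            ("vol2", 1, "00000001.tif", None, None, None),
        ],
    )
    return conn


if __name__ == "__main__":
    buffer = io.StringIO()
    export_page_metadata(build_demo_db(), "vol1", buffer)
    print(buffer.getvalue(), end="")
```

Ordering by a per-page sequence number keeps each metadata row aligned with its page image, which is what lets the text file be paired with the image files at ingest time.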