Adding Structural Metadata to Locally Digitized Content in HathiTrust (Poster)

Post on 29-Nov-2015

DESCRIPTION

Poster for the 2013 DLF Forum Community Idea Exchange by Jason Colman and Kathryn Horne

TRANSCRIPT

Adding Structural Metadata to Locally Digitized Content in HathiTrust

Jason Colman & Kathryn Horne, University of Michigan Library

(1) The process accepts three standard digitization formats: PDF, JPEG2000, & TIFF.

(2) Using PageTag software, structural metadata is added to a relational database to drive the HathiTrust pageturner interface. Options include title pages, chapters, page numbers, indices, and other sections.

(3) The page images undergo OCR processing with PrimeOCR software.

(4) Using a combination of Perl and shell scripts, the database outputs a text file with structural metadata and OCR confidence ratings. This is combined with the original page images for HathiTrust ingest.

(5) After ingest, the images and metadata come together in the HathiTrust pageturner.
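The database-to-text-file export described in steps (2) and (4) can be illustrated with a minimal sketch. The poster states that the actual workflow uses Perl and shell scripts; Python with an in-memory SQLite database is swapped in here purely for illustration. The table name `pages`, its columns, the feature labels, the 0-100 confidence values, and the tab-delimited output layout are all hypothetical and are not taken from PageTag, PrimeOCR, or HathiTrust documentation.

```python
import csv
import io
import sqlite3
import sys

# Hypothetical column layout for one row per page image.
COLUMNS = ["seq", "image_file", "page_number", "feature", "ocr_confidence"]


def build_demo_db():
    """Create an in-memory database with a made-up per-page schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE pages (
               seq INTEGER PRIMARY KEY,   -- order of the image in the volume
               image_file TEXT NOT NULL,  -- page image file name
               page_number TEXT,          -- printed page number, if any
               feature TEXT,              -- structural tag, e.g. a title page
               ocr_confidence INTEGER     -- assumed 0-100 scale
           )"""
    )
    # Illustrative rows only; not real data from the project.
    rows = [
        (1, "00000001.tif", None, "TITLE", 97),
        (2, "00000002.tif", "1", "CHAPTER_START", 94),
        (3, "00000003.tif", "2", None, 91),
        (4, "00000004.tif", "3", "INDEX", 88),
    ]
    conn.executemany("INSERT INTO pages VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def export_page_metadata(conn, out):
    """Write one tab-delimited line per page, in sequence order.

    Returns the number of page rows written (the header is not counted).
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    cursor = conn.execute(
        "SELECT seq, image_file, page_number, feature, ocr_confidence "
        "FROM pages ORDER BY seq"
    )
    count = 0
    for row in cursor:
        # Missing values (NULL in the database) become empty fields.
        writer.writerow(["" if value is None else value for value in row])
        count += 1
    return count


if __name__ == "__main__":
    export_page_metadata(build_demo_db(), sys.stdout)
```

Keeping one row per page image, ordered by sequence, means the exported file can be matched line by line against the page images when both are packaged for ingest.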
