digitization projects tech con 2006
Digitization Projects at the State Library of Pennsylvania: Where the Past and Future Meet
Bill Nork, Head of Systems & Preservation
William Fee, Digital Collections Librarian
Kurt Bodling, Digital Resources Cataloger
Pennsylvania Department of Education, State Library of Pennsylvania
www.statelibrary.state.pa.us/digital_projects
Or
Visit the State Library of PA Website
www.statelibrary.state.pa.us
Select Digital Projects of the State Library
Digitization things learned the hard way
Or why do I drink so much coffee?
By Bill Fee
•Try to plan things out as much as you can before starting a project
•No matter how much you plan, something will blow up in your face.
•It’s often better to throw people at a problem than equipment (if they hit just right, this also counts as percussive maintenance)
•Loud, obnoxious and driving punk rock and techno really improve the workflow (though that could be just a personal preference)
Hardware & Software
We run a Dell Optiplex GX260 with a 2.26 GHz non-hyperthreaded processor. Alas, we’re a PC shop.
Scanner-wise, we have a $25,000 Minolta PS 7000 overhead engine book scanner and an HP ScanJet 7400C that’s up for replacement.
Hardware & Software- again
We scan directly into Photoshop. I can save the archival TIFF, then edit it and create the access JPEG right there. As a library you should be able to get an educational license, which is a heck of a lot cheaper. The program itself may seem more full-featured than you need, but things like batch processing, when you're doing a whole directory of images with the same edits and sizing, really save time. Get them to pay for classes, though: about $200 per class, but well worth it.
Still More Hardware & Software
We use Omnipage for OCR. You'll save yourself a heck of a lot of correction time by doing a dual scan: one into Photoshop, one directly into the OCR program, whichever you use. Omnipage has about 98 or 99 percent accuracy for anything but newspapers, but there are others just as good. Hit up ComputerShopper.com and read reviews.
If I'm doing a web page, I use the Composer feature in Mozilla or Netscape.
I’ve been using these programs and essentially the same hardware since the bad old pre-standards Dark Ages of 5 years ago, and they seem to work.
What criteria do you use to have an item digitized?
•Must be PA related.
•Usually in such poor shape that it cannot circulate, or from the Rare Book Room, or ordered by the Director or Commissioner.
•Must have fewer than 5-10 holding libraries in FirstSearch (not counting us).
•Usually fits a theme. The current one is the VLaT project: Violence, Labor and Transportation = riots, train wrecks, mine accidents, etc.
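The criteria above can be sketched as a simple screening check. This is only an illustration: the `Candidate` type, its field names, and the cutoff of 10 holdings are assumptions made for the example (the slide gives a range of 5-10), and the "usually" conditions are treated as required here even though in practice they are judgment calls.

```python
from dataclasses import dataclass


# Hypothetical record type; the field names are illustrative, not from the talk.
@dataclass
class Candidate:
    title: str
    pa_related: bool
    too_fragile_to_circulate: bool = False
    in_rare_book_room: bool = False
    ordered_by_administration: bool = False  # Director or Commissioner
    other_holding_libraries: int = 0         # FirstSearch holdings, not counting us


# The slide says "5-10"; this cutoff is an assumed default, not a stated rule.
MAX_OTHER_HOLDINGS = 10


def qualifies(item: Candidate, max_holdings: int = MAX_OTHER_HOLDINGS) -> bool:
    """Return True when an item passes this reading of the selection criteria."""
    if not item.pa_related:
        return False
    has_reason = (
        item.too_fragile_to_circulate
        or item.in_rare_book_room
        or item.ordered_by_administration
    )
    is_scarce = item.other_holding_libraries < max_holdings
    return has_reason and is_scarce
```

Theme fit (such as VLaT) is left out on purpose, since it is more of an editorial decision than a yes/no test.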
Other problems you will find
•Bureaucracy
•Shipment
•File and folder nomenclature
•Poor scans and OCR
•Storage
•Personnel
•“High-priority” projects
•New software, new uses for software, new problems with software that only come up because it’s a new project.
Metadata Considerations
Kurt A.T. Bodling, Digital Resources Cataloger, State Library of Pennsylvania
The Starting Place
•What is the digital object?
– Something newly created?
– Already cataloged?
– A collection?
– A single item?
– A selection from an item?
•Who is it for?
Ben Franklin solutions
•Easy call: siphon data from OPAC
•Tougher: dealing with chapters and single letters
General solution to obit challenges
•Sampling and testing
•Hunting down exceptions
•Creating a data dictionary
•And, of course, going back later to make changes
Data Dictionary defined
MARC : AACR2 :: Dublin Core : Data Dictionary
Data Dictionary for the Pennsylvania Scrap Book Necrology collection.
Label in CONTENTdm | Dublin Core Mapping | Content | Description and Instructions
Title | Title | Pennsylvania Scrap Book Necrology, Volume ##, p. ##. | Metadata crew replaces the first ## with the actual volume numbers (Arabic, not Roman) in the template before loading images of each volume. Add page numbers for each page as part of uploading process.
Creator | Creator | State Library of Pennsylvania |
Surname(s) included | Subject | | Metadata crew enters the surnames of the deceased individuals on each page. Separate surnames with commas.
Description | Description | Microfilmed scrapbooks of obituaries clipped from Pennsylvania newspapers from 16 October 1891 to 3 March 1904. Many Civil War veterans included. |
Publisher | Publisher | State Library of Pennsylvania |
Contributor | Contributor | |
Date | Date | | Metadata crew enters the year(s) of obituaries included in each volume.
Type | Type | text |
Format | Format | image/jpeg |
Identifier | Identifier | |
Source | Source | PHAK 929.3748 P384mi |
Language | Language | eng |
Relation | Relation | |
Coverage | Coverage | |
Rights | Rights | Digital images copyright State Library of Pennsylvania. All rights reserved. May be used for educational purposes as long as a credit statement is included. For all other uses, contact the State Library of Pennsylvania, Digital Rights Office, 333 Market Street, Harrisburg, PA 17126-1745. Phone: (717) 783-5969 |
Audience | Audience | |
Transcripts | None | | This field is a full-text searchable field into which the OCR for each page will be loaded. It will not be viewable by users, only searchable. The uploading process, if followed correctly, should do this automatically.
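To show how a data dictionary like this turns into per-page records, here is a minimal sketch that fills a template with the fixed values and adds the fields the metadata crew supplies. The `build_record` function and its parameters are hypothetical; in practice this work happened inside CONTENTdm's own template and upload process, which this code does not model. The trailing period after the title on the slide is treated as punctuation and left out.

```python
# Fixed values taken from the data dictionary; only Title has placeholders.
TEMPLATE = {
    "Title": "Pennsylvania Scrap Book Necrology, Volume {volume}, p. {page}",
    "Creator": "State Library of Pennsylvania",
    "Description": (
        "Microfilmed scrapbooks of obituaries clipped from Pennsylvania "
        "newspapers from 16 October 1891 to 3 March 1904. "
        "Many Civil War veterans included."
    ),
    "Publisher": "State Library of Pennsylvania",
    "Type": "text",
    "Format": "image/jpeg",
    "Source": "PHAK 929.3748 P384mi",
    "Language": "eng",
}


def build_record(volume, page, surnames, years, ocr_text=""):
    """Return one page-level record (hypothetical helper, not a CONTENTdm API)."""
    record = dict(TEMPLATE)  # copy so the shared template is never modified
    record["Title"] = TEMPLATE["Title"].format(volume=volume, page=page)
    record["Surname(s) included"] = ", ".join(surnames)  # comma-separated
    record["Date"] = years
    record["Transcripts"] = ocr_text  # searchable only, not shown to users
    return record
```

For example, `build_record(12, "45", ["Smith", "Smyth"], "1891-1892")` yields a record whose Title is "Pennsylvania Scrap Book Necrology, Volume 12, p. 45" (the volume, page, names, and years here are made up for illustration).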
Creating the data dictionary
Simple issues first:
– Steal data from the catalog
– Use boilerplate ‘rights management’ statement
– Get repeated data into a template
Creating the data dictionary
More difficult challenges
– Names of the deceased
– Citation to original source newspapers
– Omissions
– Enhancements
– Difficulties caused by original scrapbooking
Names of the deceased
•Not authority controlled
•Variations between two obit versions
•Variations within one obit
•Lacking first name
Name variations:
Anonymous child:
Names of the deceased
Solutions:
– Enter only surname, but
– Enter all spellings that appear
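The "surname only, all spellings" rule could be applied with something like the sketch below. The `surname_entry` helper is hypothetical; it assumes the crew has already read the surnames off the page and simply removes exact repeats (ignoring case and stray spaces) while keeping every distinct spelling.

```python
def surname_entry(spellings):
    """Join every distinct spelling seen on a page, in order of first appearance."""
    seen = set()
    kept = []
    for raw in spellings:
        name = " ".join(raw.split())  # trim and collapse stray whitespace
        key = name.casefold()         # compare without regard to case
        if name and key not in seen:
            seen.add(key)
            kept.append(name)
    return ", ".join(kept)
```

Variant spellings are deliberately not merged, since the point is that a searcher using either spelling should find the page.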
Citations to original sources
•Visible on microfilm, but NOT in jpeg
•Easily recoverable
Citations to original sources
Solution:
– Leave this information out of metadata
Omissions
•Blank pages
•Pages glued together
•Military unit information
Military unit info:
Omissions
Solutions:
– Record page numbers as they appear
– Note when pages don’t appear
– Omit unit information
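Noting which pages don't appear can be checked mechanically once the page numbers have been recorded as they appear. A minimal sketch, assuming the recorded page numbers for a volume are plain integers (the `missing_pages` helper is hypothetical):

```python
def missing_pages(recorded):
    """Return page numbers absent between the lowest and highest recorded page."""
    present = set(recorded)
    if not present:
        return []
    first, last = min(present), max(present)
    return [n for n in range(first, last + 1) if n not in present]
```

With recorded pages 1, 2, 3, 6, 7 and 9, this reports 4, 5 and 8 as the gaps to note, for instance pages that were blank or glued together.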
Enhancements
•Geographic info
•Occupational info
•Marital status
•And on and on and on.
Enhancements
Solutions:
– Forego most enrichment
– Include “former slave”
– Include some terms like “suicide” and “murder”
Scrapbook difficulties
•Running on to second page
•Running on to 3rd, 4th, 5th … pages
Multiple page obit:
Scrapbook difficulties
Repeated obituaries
Scrapbook difficulties
Label at bottom of page, obit on next
Text and title split:
Scrapbook difficulties
•Year-end cumulative death notice
•Articles that were not obits at all
•Volumes containing two years
Cumulative notice:
Not an obit:
My Lessons Learned
•Metadata isn’t (aren’t?) scary
•Patience and perseverance win out
•Small crew = quick decisions
What Did We Learn?
More man-hours than we thought
More staffing to complete task
Decisions about how deep to go with metadata
Questions?
Call or email one of us
Bill Fee 717-783-7014 [email protected]
Kurt Bodling 717-783-5996 [email protected]
Bill Nork [email protected]