Columbia Spectator Archive Progress Report on Phase 1 Stephen Paul Davis Columbia University Libraries Digital Program June 27, 2012


Columbia Spectator Archive

Progress Report on Phase 1

Stephen Paul Davis

Columbia University Libraries

Digital Program

June 27, 2012

The Plan

• Partnership between Columbia Libraries /

Information Services and the Spectator

• High quality scanning of original Spectator

issues from Columbia University Archives and

the Spectator Editorial Offices

• State-of-the-art text processing (OCR) of

scanned images to allow searching at the article level

• Feature-rich online presentation

• Permanent, long-term digital preservation

The Players

• The Spectator staff and board

• University Archives

• Libraries’ Preservation & Digital Conversion Division

• Libraries’ Digital Program Division

• Libraries’ Information Technology Division

• Digital Divide Data

• Brechin Imaging Services

• Digital Library Consulting (Veridian provider)

• Cornell University Libraries [behind the scenes]

The Context

Columbia Libraries Digital Program’s mission:

• To carry out digitization and access projects chiefly

from Columbia’s rare and special collections (2002-)

• To build and support Columbia’s long-term digital

preservation infrastructure (2010-)

• To develop and support preservation of and access

to born-digital archival collections (2011-)

Columbia Libraries Digitization Program

• Digitization Projects (Digital Scriptorium, APIS (papyrus project), John Jay Papers,

Herbert Lehman Papers, etc.)

• Digital Exhibitions (See especially: Core Curriculum: CC, Core Curriculum: LitHum,

1968: Columbia in Crisis, Varsity Show)

• ‘Born-Digital’ & Web Archives (Columbia University, Human Rights Organizations, etc.)

Columbia’s Technology Platforms

Columbia University Libraries / Information Services

has a:

• robust repository infrastructure that follows

• national and international standards and

• ‘best practices’ to support

• digital publishing and

• long-term digital preservation

Columbia’s Repository & Preservation Infrastructure: Schematic Overview

Newspaper Access …

• Providing flexible access to newspaper

content is complicated and expensive

• Not cost-effective for single institutions to

build custom, newspaper-oriented software

• Only two major vendors provide software

optimized for newspapers

• DL Consulting’s Veridian is by far the better and

more frequent choice for research libraries

Spectator Stats

Spectator run from 1877-2009:

Number of volumes = 155

Estimated no. of pages = 79,145

Average pages per volume = 500 (wide variation!)

Est. vols. requiring disbinding = 100

Est. vols. unable to be digitized = 10

NB: Most volumes contain severely brittle paper; only 24 volumes have flexible paper
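As a quick sanity check on the figures above, the following Python sketch recomputes the average and a few proportions. The constant names are illustrative assumptions; the numbers are the estimates from this slide, so the results are only as good as those estimates.

```python
# Estimates copied from the slide; constant names are illustrative only.
VOLUMES = 155
ESTIMATED_PAGES = 79_145
NEED_DISBINDING = 100
CANNOT_DIGITIZE = 10
FLEXIBLE_PAPER = 24

# Mean pages per volume implied by the two estimates.
mean_pages = ESTIMATED_PAGES / VOLUMES

# Shares of the run affected by each condition.
share_disbinding = NEED_DISBINDING / VOLUMES
share_undigitizable = CANNOT_DIGITIZE / VOLUMES
share_flexible = FLEXIBLE_PAPER / VOLUMES

print(f"Mean pages per volume: {mean_pages:.1f}")
print(f"Volumes requiring disbinding: {share_disbinding:.0%}")
print(f"Volumes unable to be digitized: {share_undigitizable:.0%}")
print(f"Volumes with flexible paper: {share_flexible:.0%}")
```

The implied mean is a little over 510 pages, consistent with the rounded figure of 500 given the wide variation between volumes.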

Why Scan From Originals?

Scanning from originals retains visual content

6 May 1968

Tiny sampler of Spec Archive images

[Images dated: 19 February 1957; 11 October 1956; 29 September 1959; 27 October 1961; 3 December 1973; 2 October 1972; 7 March 1974]

Challenges of Scanning from Originals

Disbinding fragile pages

Repairing and Conserving

Preservation Boxing (for shipping & long-term storage)

Phase 1 Completion

• Prep, rehouse, digitize & encode Spec volumes

for 1955-1992: completed June 15th

• Load into Veridian Test System: June 29th

• Design Spectator Archive website: July 15th

• Move test system to production environment:

July 30th

• Do user testing and quality review: August 15th

• Launch new public site: September 4th

Demo of Test System

• 1964: http://tinyurl.com/78hhypj

• 1968: http://tinyurl.com/7jk6ynz

• 1973: http://tinyurl.com/7gu55p6

• 1983: http://tinyurl.com/7dq8zly

• Searching “coeducation”: http://tinyurl.com/7cwd95g

• Partial content list: http://tinyurl.com/7q8w4nq

[Note that these are all temporary links that work as of 6/28/2012 but which

will stop working altogether at some point in the next few weeks.]

Phase 2 Goals

Finish the Project!

(Prep, rehouse, repair, digitize & encode Spec

volumes for 1877-1954 and 1992-2009)

Phase 2 Costs (for ca. 55,000 pages)

• Preparation, rehousing, repair

= will be covered by CU Libraries

• Scanning of 55,000 pages

= $55,000 + $5,000 contingency

• OCR, segmentation, selective text correction

= $55,000 + $5,000 contingency

• Load into host system, license, maintenance

= already covered by CU Libraries

• Long term preservation of master image (tiff) files

= may require additional fundraising
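As a rough check on the Phase 2 figures above, the short Python sketch below totals the two line items that carry dollar amounts. The variable and function names are illustrative assumptions; the amounts come from the slide, and the per-page rate is derived rather than quoted. Items covered by CU Libraries or pending fundraising are left out because no amounts are given for them.

```python
# Amounts copied from the slide; names and structure are illustrative only.
PAGES = 55_000

line_items = {
    "scanning": {"base": 55_000, "contingency": 5_000},
    "ocr_segmentation_text_correction": {"base": 55_000, "contingency": 5_000},
}


def total_with_contingency(items):
    """Sum base cost plus contingency across all line items."""
    return sum(item["base"] + item["contingency"] for item in items.values())


def base_rate_per_page(item, pages):
    """Derive the base cost per page for one line item (contingency excluded)."""
    return item["base"] / pages


total = total_with_contingency(line_items)
scan_rate = base_rate_per_page(line_items["scanning"], PAGES)

print(f"Funded externally, with contingency: ${total:,}")
print(f"Implied base scanning rate: ${scan_rate:.2f} per page")
```

Under these assumptions the two priced items come to $120,000 including contingency, which works out to a base rate of $1.00 per page for scanning and the same again for OCR and segmentation.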

Final, key points

• The Spectator Archive project is extremely important

for preservation of and access to Columbia

University’s history

• This is an archival preservation project as well as an

information access project

• Columbia Libraries is making a major, long-term

investment to ensure the success of this project

• The Libraries and the Spec have made a great start,

but additional funding is needed to complete the job

Questions

Stephen Paul Davis, Director

Libraries Digital Program

Columbia University
