A Framework for Publishing Oral History Interviews to the Web
Stephen Paul Davis
Director, Libraries Digital Program
Columbia University
OCLC Western Digital Forum, August 2006
rev. 10/2011
The Players
Columbia's Libraries Digital Program
Columbia Center for Oral History (formerly: Oral History Research Office)
Columbia's Digital Knowledge Ventures (ceased operations)
Backstage Library Works (formerly: OCLC Preservation Services)
George Blood, L.P. (formerly: Safe Sound Archive)
OCLC Digital Archive
The Characters
Bennett Cerf – publisher
Kenneth Clark – psychologist, social activist
Mamie Clark – psychologist, social activist
Moe Foner – labor activist
Andrew Heiskell – publisher
Edward I. Koch – political figure
Mary Lasker – philanthropist
John B. Oakes – newspaper editor
Frances Perkins – political figure
Frank Stanton – leader in broadcasting
The Script
Sessions: 10 interviewees in 193 individual interview sessions
Recordings: 205 hours on 170 Tapes (109 Cassettes, 53 Five-inch Reels, 8 Seven-inch Reels)
Transcriptions
◦ 11,064 pages of typescript in 72 notebook binders
◦ 2,644 pages in MS Word format
Related material: name indexes, biographies, tables of contents, photos
The Plot
Online audio in Real & MP3 format, both downloadable & streaming
Audio segments directly correlated with transcriptions at the paragraph level
Page images of transcriptions in PDF
OCR'd transcriptions plus TEI/XML markup
Full-text search and retrieval
Name index entries linked back to references in text
Abstract of each interview
A general introduction
A few pictures
Rights and permissions cleared in advance
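Full-text search and linking name-index entries back to the text both rest on an inverted index that maps each term to the places it occurs. The project used Lucene for this; what follows is not Lucene but a minimal, dependency-free Python sketch of the same idea, assuming each transcript paragraph carries a hypothetical ID of the form "s01-p04" (session, paragraph). The sample text is placeholder content, not taken from the interviews.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(paragraphs):
    """Map each term to the sorted IDs of the paragraphs that contain it."""
    index = defaultdict(set)
    for pid, text in paragraphs.items():
        for term in tokenize(text):
            index[term].add(pid)
    return {term: sorted(pids) for term, pids in index.items()}

def search(index, query):
    """Return the IDs of paragraphs containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], []))
    for term in terms[1:]:
        hits &= set(index.get(term, []))
    return sorted(hits)

# Hypothetical paragraph IDs (session-paragraph) and placeholder text,
# not drawn from the actual transcripts.
paragraphs = {
    "s01-p04": "The mayor called about the subway strike.",
    "s02-p11": "After the strike the mayor offered me the job.",
    "s05-p02": "We argued about television ratings.",
}

index = build_index(paragraphs)
print(search(index, "strike"))     # ['s01-p04', 's02-p11']
print(search(index, "mayor job"))  # ['s02-p11']
```

A name index entry can then be linked back to its references by storing the paragraph IDs returned for that name (or a hand-checked version of the list) with the entry. A real engine such as Lucene adds relevance ranking, stemming, phrase queries, and on-disk storage, all of which this sketch omits.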
The Revised Plot
Online audio in Real & MP3 format, both downloadable & streaming
Audio segments directly correlated with transcriptions at the session level (originally: paragraph level)
Page images of transcriptions in PDF
Re-keyed (originally: OCR'd) transcriptions plus TEI/XML markup
Full-text search and retrieval
Name index entries linked back to references in text
Abstract of each interview
Three general introductory essays (originally: a general introduction) & a video interview with the OHRO director emeritus
Ten introductions for the interviewees
50 pictures (originally: a few)
Ten new, detailed tables of contents
Ten audio & text 'excerpts' to provide interview lead-ins
Rights and permissions cleared in advance
◦ Dropped: Robert F. Wagner, Kitty Carlisle Hart, Alice Hartley Neel, Schuyler Garrison Chapin, Ed Koch (1997)
◦ Almost dropped: Foner (bad language)
◦ Added: Mamie Clark, Mary Lasker, Frances Perkins, John Oakes
Cataloging & Metadata
Cataloging options:
◦ Audio: the original audio collection, the complete WAV files, the complete MP3 files, the segmented Real files
◦ Transcriptions: the original typescripts and/or Word files; the converted XML files; the generated HTML files
Cataloging decisions
Previous catalog records for oral history transcripts left intact under “Reminiscences of …”
New collection-level catalog record created for the entire Notable New Yorkers (NNY) site
New “analytic” catalog records created for each Notable New Yorker subsite as a component of the NNY collection site: 773 0_ |7 nnbc |a Notable New Yorkers |h [electronic resource]. |w (OCoLC65181290)
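The 773 (Host Item Entry) field is what ties each subsite record to the collection-level record; its subfield $w carries the control number (here the OCLC number) of the host record. A minimal Python sketch that splits the field in the display form shown on the slide, assuming `|` as the subfield delimiter and `_` for a blank indicator; a production workflow would read binary MARC or MARCXML with a MARC library rather than parse this display string.

```python
def parse_marc_display_field(line):
    """Parse a MARC field written as 'TAG IND |c value |c value ...'.

    Returns (tag, indicators, subfields), where subfields is a list of
    (code, value) pairs in their original order. '_' in the indicators
    is treated as a blank, following the slide's notation.
    """
    head, *parts = line.split("|")
    tag, indicators = head.split()
    indicators = indicators.replace("_", " ")
    subfields = []
    for part in parts:
        part = part.strip()
        if part:
            # First character is the subfield code; the rest is its value.
            subfields.append((part[0], part[1:].strip()))
    return tag, indicators, subfields

field = "773 0_ |7 nnbc |a Notable New Yorkers |h [electronic resource]. |w (OCoLC65181290)"
tag, ind, subs = parse_marc_display_field(field)
print(tag, repr(ind))  # 773 '0 '
for code, value in subs:
    print(f"${code} {value}")
host = dict(subs)["w"]
print(host)  # (OCoLC65181290)
```

Keeping the subfields as an ordered list of pairs, rather than a dict, preserves order and allows repeated subfield codes, both of which MARC permits; the `dict(subs)` lookup is only safe here because $w occurs once.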
Ticket Prices
Scanning, keying & XML markup: $12,200
Audio transfers, file header edits, MP3 creation & media: $13,720
Audio time coding & post-processing: $9,000
Web site (outsource): $17,150
◦ Pre-production, $2,600
◦ Rights research & permissions, $1,000
◦ Web site design, $3,850
◦ Web programming, $7,500
◦ Copy editing & QA, $1,400
◦ XSLT generation of HTML from METS/TEI, $2,000
Additional site content: $12,800
◦ Introductory essays, $5,700
◦ Tables of contents, etc., $5,900
◦ Video shoot & post-production, $1,200
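Tallying the line items is a quick check on the slide's arithmetic. A Python sketch using only the figures listed above: the five top-level amounts sum to $64,870 and the additional-content items match their $12,800 subtotal, but the six web-site items as transcribed sum to $18,350 rather than the stated $17,150. One of those figures may have been altered in transcription; the sketch flags the mismatch rather than guessing which number is wrong.

```python
# Figures as they appear on the "Ticket Prices" slide (US dollars).
top_level = {
    "Scanning, keying & XML markup": 12_200,
    "Audio transfers, file header edits, MP3 creation & media": 13_720,
    "Audio time coding & post-processing": 9_000,
    "Web site (outsource)": 17_150,
    "Additional site content": 12_800,
}

components = {
    "Web site (outsource)": {
        "Pre-production": 2_600,
        "Rights research & permissions": 1_000,
        "Web site design": 3_850,
        "Web programming": 7_500,
        "Copy editing & QA": 1_400,
        "XSLT generation of HTML from METS/TEI": 2_000,
    },
    "Additional site content": {
        "Introductory essays": 5_700,
        "Tables of contents, etc.": 5_900,
        "Video shoot & post-production": 1_200,
    },
}

def check_subtotals(top_level, components):
    """Return {line: (stated, itemized_sum)} for each line whose items don't add up."""
    mismatches = {}
    for line, items in components.items():
        itemized = sum(items.values())
        if itemized != top_level[line]:
            mismatches[line] = (top_level[line], itemized)
    return mismatches

total = sum(top_level.values())
print(f"Total of top-level lines: ${total:,}")  # Total of top-level lines: $64,870
for line, (stated, itemized) in check_subtotals(top_level, components).items():
    print(f"{line}: stated ${stated:,}, items sum to ${itemized:,}")
# Web site (outsource): stated $17,150, items sum to $18,350
```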
Oral History Research Office Contributions: “Priceless”
◦ Text preprocessing
◦ Audio inventory
◦ Rights and permissions clearances
◦ Editorial review
Digital Library Program Contributions: “Ditto”
◦ Project and vendor coordination
◦ Text QC, post-processing, METS file creation
◦ Text indexing & retrieval system (Lucene)
◦ Application integration
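METS files act as the wrapper that ties each interview's transcript, page images, and audio together. A minimal Python sketch of a METS skeleton for one session, built with the standard library's ElementTree. The object ID, file IDs, USE values, and example.org URLs are hypothetical placeholders; a real record would also carry descriptive (dmdSec) and administrative (amdSec) metadata and should be validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)

def q(ns, tag):
    """Build a namespace-qualified ElementTree name, e.g. '{ns}tag'."""
    return f"{{{ns}}}{tag}"

def build_session_mets(session_id, files):
    """Build a minimal METS document for one interview session.

    `files` is a list of (file_id, use, mimetype, url) tuples (all hypothetical).
    Files sharing a USE value go into the same fileGrp.
    """
    mets = ET.Element(q(METS, "mets"), {"OBJID": session_id})
    file_sec = ET.SubElement(mets, q(METS, "fileSec"))
    groups = {}
    for file_id, use, mimetype, url in files:
        if use not in groups:
            groups[use] = ET.SubElement(file_sec, q(METS, "fileGrp"), {"USE": use})
        f = ET.SubElement(groups[use], q(METS, "file"), {"ID": file_id, "MIMETYPE": mimetype})
        ET.SubElement(f, q(METS, "FLocat"), {"LOCTYPE": "URL", q(XLINK, "href"): url})
    # The structMap groups all of the session's files under one logical division.
    struct_map = ET.SubElement(mets, q(METS, "structMap"))
    div = ET.SubElement(struct_map, q(METS, "div"), {"TYPE": "session", "LABEL": session_id})
    for file_id, *_ in files:
        ET.SubElement(div, q(METS, "fptr"), {"FILEID": file_id})
    return mets

files = [
    ("tei01", "transcript", "text/xml", "http://example.org/nny/s01.xml"),
    ("pdf01", "page images", "application/pdf", "http://example.org/nny/s01.pdf"),
    ("mp301", "audio", "audio/mpeg", "http://example.org/nny/s01.mp3"),
]
root = build_session_mets("nny-example-s01", files)
print(ET.tostring(root, encoding="unicode"))
```

Keeping the file inventory (fileSec) separate from the logical structure (structMap) mirrors how METS itself is organized, which makes it straightforward for an XSLT or application layer to find every file belonging to a session.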
Challenges I
Problems with Rights & Permissions
◦ Permission status uncertain
◦ Permission withdrawn
◦ Permission equivocal
Problems with Source Material
◦ Incomplete / outdated inventory of original media
◦ Missing tapes, audio files
◦ Patrons using the only (single) copy of transcripts
◦ Misnumbered pages in transcriptions
◦ Missing pages in transcriptions
Scanning & Keying Vendor / Digital Program Relations
◦ Novelty of / unfamiliarity with oral history content
◦ Delays in providing vendor with source material
◦ Recognition that typescripts could not be OCR’d because of poor quality; instead, 100% rekeying of originals
◦ Clarity, interpretation, accuracy of markup specs
Challenges II
Web Design Vendor / Digital Program Relations
◦ Outsourcing the design of a web site intended to be maintained in-house afterwards
◦ Differences in development process and methodology
◦ Differences between a “one-shot” site and an ongoing, collection-driven site
◦ Differences in design “values,” e.g., aesthetics versus usability; a “teaching & learning” ethos versus an “easy & effective access” ethos; the role of branding
◦ Differences in familiarity and experience with full-text / cross-text search and retrieval
◦ Availability of time to meet & discuss issues; project management by email; deadlines
Curatorial / Digital Program Relations
◦ Curatorial time and staffing constraints
◦ Curatorial enthusiasm leading to requirements creep
◦ Assumptions about the feasibility of “last-minute changes”
Textual Issues
◦ Identity of the “master file” after online publication?
◦ “Fixity” of transcriptions in MS Word
◦ Retaining consistency of references / citations between the paper and online versions
Challenges III
Issues Relating to the Practice of Oral History
◦ Publishing oral history interviews reflecting older, “outdated” practice along with those reflecting current practice
◦ Making available original, unedited audio files in conjunction with transcriptions reviewed & edited by the interviewees
◦ Web exposure of interviews that were originally to be available onsite to scholars and researchers
◦ Influence on current and prospective interview subjects who know that their comments will be published on the Web
The Moral (Lessons Learned) I
Commit to doing more planning up front than you think you need to do;
Set up a rigorous schedule of face-to-face meetings with key stakeholders even if they don't think you need to;
Make sure all content pieces are agreed to, in hand, fixed, and have clear permissions to publish before agreeing to do the project (or at least before contracting with vendors);
Oral Histories are by their nature fuzzy in their fixity;
Widows often object to their husbands' bad language long after their husbands are gone;
Keep detailed inventories of all content pieces before, during and after the project (good asset management);
Enthusiasm can often lead to scope creep;
The Moral (Lessons Learned) II
Push off non-essential scope creep to Phase 2;
Don't try to edit Emeritus' prose;
Many people don't like RealMedia / RealPlayer any more (I blame Microsoft);
Curators often have other things to do than what you're interested in having them do;
Library Digital Program staff always have other things to do than the project the curator is interested in;
If a Digital Project is successful it becomes a permanent part of your life and will always need care and feeding even if you think you're finished with it, so get used to it;
There are less expensive ways to do projects like Notable New Yorkers but not that much less expensive.