Text Collections and CONTENTdm Using Monograph Compound Objects to display the Orleans Parish School Board Minute Books

Upload: gena-chattin

Post on 29-Jun-2015


Page 1: Text Collections and CONTENTdm

Text Collections and CONTENTdm

Using Monograph Compound Objects to display the Orleans Parish School Board Minute Books

Page 2: Text Collections and CONTENTdm

Agenda

• The Collection: the history, the data, the images, the deadline

• The Strategy: monograph compound objects with a tab-delimited text file

• The Results: what went well, what didn’t, next steps

• Some Alternatives: considering other digital text collection display methods

Page 3: Text Collections and CONTENTdm

The Orleans Parish School Board Minute Books

Bound volumes of Orleans Parish School Board meeting minutes.

Dates Covered: 1841-1996

Includes the Civil War and Desegregation. Scholars from as far away as Japan consult this collection on site.

Thanks to UNO history professor Al Kennedy, who rescued many of the documents from being discarded.

Page 4: Text Collections and CONTENTdm

“Unclaimed History” Indexing/Summaries Pilot Project

A Board of Regents grant allowed UNO Midlo Center for New Orleans Studies historians to summarize and index approximately 900 pages of meeting minutes from just before, during, and right after the Civil War.

Indexed by:

• VOLUME

• MEETING
  • Meeting Title
  • Meeting Date
  • Board Members present, absent
  • Keywords
  • Meeting Summary

• PAGE
  • Page Summaries
  • Page Dates

Page 5: Text Collections and CONTENTdm

Data Considerations

Based on what was indexed as part of the grant, our data structure would have to support the following:

• Data
  • Volume-Level Metadata (Title: Municipality, District, and Dates Covered)
  • "Chapter"-Level Metadata (Meeting Information, Keywords, Dates)
  • Page-Level Metadata (Page Summaries, Dates)

Given CONTENTdm as our repository tool, how would we make this happen?

Page 6: Text Collections and CONTENTdm

CONTENTdm Monograph Compound Object

"Monograph" is the compound object structure that would allow us to keep the volume-meeting-page structure and retain the index data created for all those levels (incl. page level).

Page 7: Text Collections and CONTENTdm

From Excel to a Compound Object

Ultimately there will be data on 300-600 pages per volume for dozens of volumes of minute books. The UNO History Dept. provided data in Excel for the first three indexed volumes. Verdict: convert the Excel file into tab-delimited text for import into CONTENTdm.

What is a tab-delimited text file?

• a plain text file without formatting where data fields (Excel cells) are separated by a "tab" character

• file is saved with extension ".txt"

• similar to a CSV file where a comma separates the values instead of a tab
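To make the difference between the two formats concrete, here is a minimal sketch using Python's standard csv module. The sample rows and column names are made up for illustration; only the delimiter changes between the two outputs.

```python
import csv
import io

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    ["Title", "Creator", "Description"],
    ["Meeting, June 1, 1847", "Orleans Parish School Board",
     "Discussion of attendance rules."],
]

def to_delimited(rows, delimiter):
    """Serialize rows to text, one record per line, fields split by `delimiter`."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

tab_text = to_delimited(rows, "\t")   # tab-delimited text
csv_text = to_delimited(rows, ",")    # comma-separated values

print(tab_text)
print(csv_text)
```

Because the meeting title itself contains commas, the CSV writer has to wrap it in quotation marks, while the tab-delimited version needs no quoting. That is one practical reason a tab delimiter suits descriptive metadata.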

Page 8: Text Collections and CONTENTdm
Page 9: Text Collections and CONTENTdm

From Excel to a Compound Object (cont.)

How do you make a tab-delimited text file from an Excel file?

When saving your Excel spreadsheet, choose "Text (Tab delimited)" from the "Save as type" drop down box under the file name 

 Remember where you save it. You will need to tell CONTENTdm where to find it later.

More information:

• Microsoft instructions: http://office.microsoft.com/en-us/excel-help/import-or-export-text-txt-or-csv-files-HP010099725.aspx#BMexport

• CONTENTdm Help: "Using Tab-Delimited Text Files": http://www.contentdm.org/help6/projectclient/entering5.asp

Page 10: Text Collections and CONTENTdm

Building the Tab-Delimited Text File: Part 1

What kind of columns are necessary to tell CONTENTdm how to structure your "monograph"?

• Which rows are chapters?

• Which rows are pages?

Some terminology:

• Object: Book-level; the entire bound volume of minutes; contains chapters, etc.

• Item: Page-level; an individual page within a book/object.

• CONTENTdm Field = Excel Column

• CONTENTdm Record = Excel Row

Page 11: Text Collections and CONTENTdm

Building the Tab-Delimited Text File: Part 2

Our "Object": Minute Book Volume 1, City of Lafayette, June 1, 1847 - July 5, 1854
Unique Identifier: op000001

Our "Items": 347 pages (op000001_0001.jpg, op000001_0002.jpg, etc.)

Our "Chapters":
• Meeting, June 1, 1847 (Pages 1-4)
• Meeting, June 11, 1847 (Pages 5-10)
• Meeting, June 24, 1847 (Pages 11-14)
• etc.

We have data at all these levels.
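The item file names above follow a visible pattern: the object identifier, an underscore, and a zero-padded page sequence number. The helper below is a hypothetical sketch of that pattern, assuming four-digit padding and a .jpg extension as inferred from the two examples; it is not part of CONTENTdm.

```python
def page_file_names(object_id, page_count, extension="jpg"):
    """Build file names like op000001_0001.jpg for pages 1..page_count.

    The four-digit zero padding is an assumption inferred from the
    example file names, not a CONTENTdm requirement.
    """
    return [
        f"{object_id}_{number:04d}.{extension}"
        for number in range(1, page_count + 1)
    ]

names = page_file_names("op000001", 347)
print(names[0], names[1], names[-1])
```

Generating the expected names this way gives a list to compare against the scanned image directory before ingest.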

Page 12: Text Collections and CONTENTdm

Building the Tab-Delimited Text File: Part 3

After creating a column for each field you want to populate in CONTENTdm (e.g. Title, Creator), you need two columns at the start of the Excel spreadsheet:

1. CDM_LVL - tells CONTENTdm where you want this row to fall in the book-chapter-page hierarchy.

2. CDM_LVL_NAME - this is what will display as the title of this row in the table of contents (e.g. "Chapter 9" or "Page 135")

Page 13: Text Collections and CONTENTdm

Building the Tab-Delimited Text File: Part 4

Some libraries will not add a separate row for the "Chapter," but since we have metadata at that level, here is how we assigned levels for the OPSB project:

CDM_LVL | Assigned Level
0 | Book / Object
1 | Meeting / Chapter
2 | Page / Item

NOTE: CONTENTdm will allow up to nine levels in a monograph compound object.

Page 14: Text Collections and CONTENTdm

CDM_LVL | CDM_LVL_NAME | TITLE | CREATOR | PAGE | DESCRIPTION | KEYWORDS | FILE NAME
 | City of Lafayette Meeting Minutes, 1847-1854 | City of Lafayette Meeting Minutes, 1847-1854 | Orleans Parish School Board | | | |
0 | City of Lafayette Meeting Minutes, 1847-1854 | Front Cover | Orleans Parish School Board | | Public Board of Administrators meeting minutes, 1847-1854 | | op1_0001.jpg
1 | City of Lafayette Meeting Minutes, 1847-1854 | Meeting, June 1, 1847 | Orleans Parish School Board | 1-4 | Discussion of whipping, Superintendent's monthly report, discussion of library, and discussion of attendance rules. | discipline; attendance; expenses | op1_0002.jpg
2 | Meeting, June 1, 1847 | Page 1 | Orleans Parish School Board | 1 | Charges were leveled against Mrs. Smith for severely whipping a student. | | op1_0002.jpg
2 | Meeting, June 1, 1847 | Page 2 | Orleans Parish School Board | 2 | Monthly Superintendent report discussion | | op1_0003.jpg
2 | Meeting, June 1, 1847 | Page 3 | Orleans Parish School Board | 3 | Monthly Superintendent report discussion cont. | | op1_0004.jpg
2 | Meeting, June 1, 1847 | Page 4 | Orleans Parish School Board | 4 | Discussion of attendance rules. | | op1_0005.jpg
1 | City of Lafayette Meeting Minutes, 1847-1854 | Meeting, June 11, 1847 | Orleans Parish School Board | 5-10 | Results of whipping investigation was sole topic of discussion. | Discipline | op1_0006.jpg
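The row pattern in the sample above (one level-1 meeting row followed by a level-2 row per page) could also be generated by a script instead of typed by hand. The sketch below mirrors the column layout of the sample rows; the function name and arguments are hypothetical, descriptions and keywords are left blank, and how CONTENTdm interprets each column should be checked against its own documentation.

```python
import csv
import io

COLUMNS = ["CDM_LVL", "CDM_LVL_NAME", "TITLE", "CREATOR", "PAGE",
           "DESCRIPTION", "KEYWORDS", "FILE NAME"]

def meeting_rows(book_title, meeting_title, creator, first_page, page_files):
    """Return one level-1 meeting row followed by a level-2 row per page.

    page_files lists the image file names for the meeting's pages, in order.
    As in the sample rows, the meeting row reuses the first page's file name.
    """
    last_page = first_page + len(page_files) - 1
    rows = [["1", book_title, meeting_title, creator,
             f"{first_page}-{last_page}", "", "", page_files[0]]]
    for offset, file_name in enumerate(page_files):
        page_number = first_page + offset
        rows.append(["2", meeting_title, f"Page {page_number}", creator,
                     str(page_number), "", "", file_name])
    return rows

rows = [COLUMNS] + meeting_rows(
    "City of Lafayette Meeting Minutes, 1847-1854",
    "Meeting, June 1, 1847",
    "Orleans Parish School Board",
    1,
    ["op1_0002.jpg", "op1_0003.jpg", "op1_0004.jpg", "op1_0005.jpg"],
)

# Serialize as tab-delimited text, the format used for the import file.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
tab_text = buffer.getvalue()
print(tab_text)
```

A script like this is mainly useful once the number of volumes grows, since the level and page-range columns are derived rather than hand-entered.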

Page 15: Text Collections and CONTENTdm

CONTENTdm Ingest with Project Client

Once you have created a project in Project Client, add a compound object:

Page 16: Text Collections and CONTENTdm

Choose “Compound Object Wizard” in the “Add using” drop-down box and click “Add.”

Page 17: Text Collections and CONTENTdm

Choose “Monograph” from the list of compound object types.

Yes, we will be using a tab-delimited text file.

Page 18: Text Collections and CONTENTdm

Browse to find your tab-delimited text file.

Browse to find the directory where your page (item) files are saved.

NOTE: All image (page) files for an object (book) must be saved in the same directory.
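Since every page file must sit in one directory, it can help to check the tab-delimited file against that directory before starting the wizard. The following is a hedged sketch of such a check: the function name, the "FILE NAME" column heading (taken from the sample table), and the UTF-8 encoding are assumptions, and the demonstration uses a throwaway directory with made-up files.

```python
import csv
import tempfile
from pathlib import Path

def missing_page_files(tab_file, image_dir, file_column="FILE NAME"):
    """Return file names listed in the tab-delimited file but absent from image_dir."""
    image_dir = Path(image_dir)
    missing = []
    # Encoding is an assumption; a file saved from Excel may use another one.
    with open(tab_file, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            name = (row.get(file_column) or "").strip()
            if name and name not in missing and not (image_dir / name).is_file():
                missing.append(name)
    return missing

# Demonstration with a temporary directory and made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    (tmp_path / "op1_0001.jpg").write_bytes(b"")  # only one image exists
    tab_file = tmp_path / "book.txt"
    tab_file.write_text(
        "TITLE\tFILE NAME\nFront Cover\top1_0001.jpg\nPage 1\top1_0002.jpg\n",
        encoding="utf-8",
    )
    result = missing_page_files(tab_file, tmp_path)
    print(result)
```

An empty result means every file named in the spreadsheet was found in the chosen directory.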

Page 19: Text Collections and CONTENTdm
Page 20: Text Collections and CONTENTdm

“Label pages using tab-delimited text file” will label each page with its actual title as opposed to something like “op000005_0039”…

Page 21: Text Collections and CONTENTdm

Click through the summaries and click “Finish” to upload the files to CONTENTdm.

Notice how it is adding more items than you have pages?

“But I only had 347 pages!!!”

Page 22: Text Collections and CONTENTdm

This is because of all the added structure rows (chapters, etc.), which CONTENTdm counts as items:

571 rows in Excel = 347 page rows plus all the chapter/meeting-level rows.
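A quick tally of rows per CDM_LVL value shows where the extra items come from. The sketch below counts only the eight rows of the sample table shown earlier, not the real 571-row spreadsheet; the function name is hypothetical.

```python
from collections import Counter

# CDM_LVL values copied from the eight sample rows shown earlier
# (the first row, for the object itself, has a blank level).
sample_levels = ["", "0", "1", "2", "2", "2", "2", "1"]

def count_rows_by_level(levels):
    """Tally how many rows sit at each CDM_LVL value."""
    return Counter(levels)

counts = count_rows_by_level(sample_levels)
total_rows = sum(counts.values())

for level, count in sorted(counts.items()):
    print(repr(level), count)
print("total rows:", total_rows)
```

Running the same tally over the full spreadsheet before ingest gives the item count to expect in the wizard summary.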

Page 23: Text Collections and CONTENTdm

The Results: What went well, what didn’t, and next steps

Table of Contents Navigation is Confusing

• Multiple expansions are necessary to get to page links

• The "plus" (+) expansion icon is very tiny. It is difficult to see, so users may not realize it should be clicked, and it is hard to hit with the mouse pointer.

Page 24: Text Collections and CONTENTdm

The Results: What went well, what didn’t, and next steps

Book metadata ("Object Description") is difficult to tell apart from page metadata ("Description")

• not clear to user what these terms mean

• helped by suppressing certain repeated fields

Page 25: Text Collections and CONTENTdm

The Results: What went well, what didn’t, and next steps

Users give up before they find “Search by Date”

• "Narrow your search by Date" only gives a few options, which seem random.

• After "Advanced Search," the user must find and click another tiny link to “Search by Date.”

• “Search by Date” returns every individual page in a date range - quite a few results, given that each volume is 300 to 600 pages long. Either need a better way to filter or need to take date off page records.

Page 26: Text Collections and CONTENTdm

The Results: Next Steps

• Have since added many more unindexed books to the original three indexed as part of the grant. We hope there will be support to index these as well.

• Would like to ask historians or library staff to further index these by Municipality / District. This information is in the title but is not split out as data. Complicated because it changed over time…

• Would like to add CQRs, other search mechanisms to supplement CDM search and take advantage of rich data.

• PAGE TURNER!!!!!

• Logical way for users to also download complete PDF of minute books…

Page 27: Text Collections and CONTENTdm

Some Alternatives to Text as Monograph Compound Object

TEI Encoding

What it is: Not page images. Take the text of a work, encode it in XML using the TEI standard, and write a Web app to output the XML file(s).

In Action: Folger Digital Texts: http://www.folgerdigitaltexts.org/

METS

What it is: An XML "wrapper" that builds a structure around other metadata records (e.g. Dublin Core page records). This structure could include such levels as chapter, page, paragraph, sentence, headline, caption, and much more.

In Action: The (CUA) Tower Online: http://tower.lib.cua.edu/

NOTE: You can encode Dublin Core records, TEI transcriptions, and more within a METS wrapper. CONTENTdm can handle METS through the Flex Loader (usually via a vendor).
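To give a rough idea of what a METS structure looks like, the sketch below builds a bare structural map for one meeting using Python's standard XML library. The TYPE and LABEL values are illustrative choices, and a real METS record would also need file and metadata sections; this is not what the Flex Loader requires.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"

# A volume containing one meeting, which in turn contains four pages.
mets = ET.Element(q("mets"))
struct_map = ET.SubElement(mets, q("structMap"), TYPE="logical")
book = ET.SubElement(struct_map, q("div"), TYPE="volume",
                     LABEL="Minute Book Volume 1")
meeting = ET.SubElement(book, q("div"), TYPE="meeting",
                        LABEL="Meeting, June 1, 1847")
for number in range(1, 5):
    ET.SubElement(meeting, q("div"), TYPE="page",
                  LABEL=f"Page {number}", ORDER=str(number))

xml_text = ET.tostring(mets, encoding="unicode")
print(xml_text)
```

The nested div elements play the same role as the CDM_LVL numbers in the monograph approach: each level of nesting is one level of the volume-meeting-page hierarchy.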

Page 28: Text Collections and CONTENTdm

Resources

• Creating Compound Objects (Documents, Monographs, Postcards, and Picture Cubes): http://www.contentdm.com/USC/tutorials/compound-wizard.pdf

• Adding Compound Objects with Tab-Delimited Text: http://www.contentdm.com/help6/objects/adding3a.asp

• Clemson University documentation (more detailed instruction and uses more levels): http://library-web.clemson.edu/wiki/images/9/92/Using_a_tab-delimited_for_mongraphs.pdf

Page 29: Text Collections and CONTENTdm

Answers to Worksheet:

CDM_LVL | CDM_LVL_NAME | TITLE | CREATOR | PAGE | DESCRIPTION | KEYWORDS | FILE NAME
 | A Very Exciting Tale | A Very Exciting Tale | Smith, Joe | | | |
0 | A Very Exciting Tale | Front Cover | Smith, Joe | 1 | Cover of the book | fiction; excitement | js000001_0001.jpg
1 | A Very Exciting Tale | Chapter 1 | Smith, Joe | 2-4 | Our hero wakes up | | js000001_0002.jpg
2 | Chapter 1 | Page 2 | Smith, Joe | 2 | Joe gets out of bed. | | js000001_0002.jpg
2 | Chapter 1 | Page 3 | Smith, Joe | 3 | Joe has breakfast. | | js000001_0003.jpg
2 | Chapter 1 | Page 4 | Smith, Joe | 4 | Joe goes to work. | | js000001_0004.jpg