Page 1: From Texts to eTexts: Thematic Research Collections & Text Encoding

Lecture 2: From Texts to eTexts: Thematic Research Collections and Text Encoding

Emma Clarke & Tomás Ó Murchú

Theory and Practice of Digital Humanities, MPhil Digital Humanities

Page 2

Why Thematic Research Collections?

Libraries as Laboratories (Palmer)

Exaggeration?

Limitations of scattered content

TRCs are digital aggregations of primary sources and related materials that support research on a theme (Palmer).

TRCs come closer to the laboratory ideal, bringing source material, tools, and expertise together to advance the production of new knowledge.

PART 1: THEMATIC RESEARCH COLLECTIONS

Page 3

THEMATIC RESEARCH COLLECTIONS

Many shapes and sizes…

May contain manuscripts, images, commentary, audio, letters, translations, versions, etc.

Page 4

Digital Libraries/Archives & TRCs

Digital libraries and archives differ from TRCs in mission and method.

Library collections are amassed for preservation, dispensing, bibliographic, and symbolic purposes.

Digital libraries have diverse collections.

Perseus Collection: a digital archive.

Bolles Collection on the History of London: a TRC within a digital archive (the Perseus Collection).

www.perseus.tufts.edu/ or perseus.mpiwg-berlin.mpg.de/

DIFFERENCES BETWEEN THEMATIC RESEARCH COLLECTIONS AND DIGITAL LIBRARIES AND ARCHIVES

Page 5

John Unsworth (2000)

1. Necessarily electronic (because of the cost of 2, 3, and 8)

2. Constituted of heterogeneous datatypes (multimedia)

3. Extensive but thematically coherent

4. Structured but open-ended

5. Designed to support research

6. Authored or multi-authored

7. Interdisciplinary

8. Collections of digital primary resources (and are themselves second-generation digital resources)

CHARACTERISTICS OF THEMATIC RESEARCH COLLECTIONS

Page 6

CHARACTERISTICS OF THEMATIC RESEARCH COLLECTIONS

Palmer (2004)

Content

* Basic elements: Digital; Thematic

* Variable characteristics: Coherent; Heterogeneous; Structured; Open-ended

Function

* Research support: Scholarly contribution; Contextual mass; Interdisciplinary platform; Activity support

Page 7

Two Basic Elements of a TRC

Digital: content is in digital format, even though the sources may exist as manuscripts, images, etc.

Thematic: contents are focused on particular research themes, e.g.:

• Author-oriented: Walt Whitman Archive, Thomas MacGreevy Archive

• Historical event/period: Salem Witch Trials Archive, 1641 Depositions, September 11 Digital Archive

• Specific focused theme: Hamlet on the Ramparts

CHARACTERISTICS OF THEMATIC RESEARCH COLLECTIONS

Page 8

Variable Characteristics

Coherent: a coherent set of primary resources that relate directly to the theme.

Heterogeneous: a mix of material types, such as manuscripts, letters, critical essays, reviews, biographies, and bibliographies.

Structured: organised to permit searching and analysis, with interrelated materials grouped together (images together, letters together, etc.).

Open-ended: has the potential to grow and change as new sources are added and existing ones improved, along with annotations, links, etc. (e.g. the September 11 Digital Archive).

CHARACTERISTICS OF THEMATIC RESEARCH COLLECTIONS

Page 9

What goes into the TRC?

In both physical and digital libraries, materials are usually separated for reasons unimportant to a researcher. For example, primary texts may be part of a special collection, while secondary works may be in separate book and journal collections.

A TRC has a mix of heterogeneous but closely associated materials.

For example, the Digital Dante Archive: http://dante.ilt.columbia.edu/

CONTENT DECISIONS IN TRCS

Page 10

The Interdisciplinary nature of TRCs

TRCs usually contain resources from different fields within the humanities.

For example, the Thomas MacGreevy Archive aims to promote inquiry into the interconnections between literature, culture, history, and politics by blurring the boundaries that separate the different fields of study.

http://www.macgreevy.org

CONTENT DECISIONS IN TRCS

Page 11

1. TRCs contain their own digital primary resources rather than basing their work on digital primary resources produced by libraries or publishers, because of issues with permissions and copyright and with the ability to edit, intervene in, comment on, and contextualize materials produced and controlled by others.

2. Libraries are reluctant to collect scholars' "second-generation" digital publications, which would allow them to become someone else's digital primary resources.

PROBLEMS FOR TRCS

Page 12

3. “Do-it-yourselfism”: each scholar or team builds its own digital library (and acts as its own publisher). This leads to wasted and duplicated effort, loss of materials, and loss of confidence in digital scholarship because, most importantly, it produces a more or less immediate breakdown in referential integrity.

4. The marketing, design, and editorial skills and services of publishers are not connecting with born-digital scholarly publications: editorial standards are not always what they should be, documentation is sometimes sloppy, problems of rights and permissions are frequently ignored, etc.

PROBLEMS FOR TRCS

Page 13

5. The genre of the thematic research collection is largely developing outside of publishing institutions. As a consequence, publishers seem of questionable relevance to it.

6. Historically, publishers have been the conduit connecting authors to libraries, but that connection is not being made for thematic research collections. As a consequence, publications of this sort are not making their way into library collections.

PROBLEMS FOR TRCS

Page 14

TEI Consortium

TEI Guidelines

Website: TEI

TEXT ENCODING INITIATIVE

Page 15

An encoded text is more organised and searchable than a scan.

It contains more information than a transcript, for example:

• page layout

• line breaks

• material qualities

• physical properties

• other metadata

WHY ENCODE?
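As a rough illustration of what encoding adds, the Python sketch below (standard library only) parses a short TEI-style fragment. The text is invented; `<pb/>` (page break) and `<lb/>` (line break) are real TEI elements, while the helpers `lines_by_page`, `plain_transcript` and `pages_containing` are hypothetical names chosen for this example. Because the breaks are encoded, the layout can be recovered and searched; the flattened transcript loses it.

```python
import xml.etree.ElementTree as ET

# Invented TEI-style fragment: <pb/> marks a page break and <lb/> a line
# break, so the physical layout is recorded alongside the words.
FRAGMENT = """<p>
<pb n="1"/><lb/>This is the first line
<lb/>and this the second,
<pb n="2"/><lb/>then a new page begins.
</p>"""


def lines_by_page(xml_text):
    """Return {page number: [lines]} by walking the break elements in order."""
    pages = {}
    page = None
    for el in ET.fromstring(xml_text).iter():
        if el.tag == "pb":
            page = el.get("n")
            pages[page] = []
        elif el.tag == "lb" and page is not None:
            # The words of a line are the tail text that follows its <lb/>.
            pages[page].append((el.tail or "").strip())
    return pages


def plain_transcript(xml_text):
    """Flatten the fragment to one string, as a plain transcript would."""
    words = "".join(ET.fromstring(xml_text).itertext()).split()
    return " ".join(words)


def pages_containing(pages, phrase):
    """Return the page numbers whose lines contain phrase."""
    return [n for n, lines in pages.items() if any(phrase in ln for ln in lines)]


layout = lines_by_page(FRAGMENT)
print(layout)
print(plain_transcript(FRAGMENT))            # line and page breaks are gone
print(pages_containing(layout, "new page"))  # ['2']
```

The tail-text approach assumes each line is plain text with no further markup inside it; a real TEI file with inline elements would need a fuller traversal.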

Page 16

By markup language we mean a set of markup conventions used together for encoding texts.

A markup language must specify:

• what markup is allowed

• what markup is required

• how markup is to be distinguished from text

• what the markup means

“Markup is an act of interpretation” (Cummings)

The following examples are from the University of Michigan Library.

MARKUP LANGUAGES
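The four requirements above can be sketched in a few lines of Python. The mini vocabulary for letters (`letter`, `date`, `salute`, `p`, `signed`) and the functions `check` and `describe` are hypothetical and are not TEI. XML syntax handles how markup is distinguished from text, while the sets and dictionary state what is allowed, what is required, and what each tag means.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini markup language for letters (not TEI), used only to show
# the four things a markup language must specify.
ALLOWED = {"letter", "date", "salute", "p", "signed"}  # what markup is allowed
REQUIRED = {"date", "signed"}                          # what markup is required
MEANING = {                                            # what the markup means
    "letter": "the whole letter",
    "date": "date the letter was written",
    "salute": "opening greeting",
    "p": "paragraph of the body",
    "signed": "closing signature",
}
# How markup is distinguished from text: XML syntax (<tag>...</tag>),
# which the parser below enforces.


def check(doc):
    """Return a list of problems found in doc against the rules above."""
    used = {el.tag for el in ET.fromstring(doc).iter()}
    problems = [f"<{tag}> is not allowed" for tag in sorted(used - ALLOWED)]
    problems += [f"<{tag}> is required" for tag in sorted(REQUIRED - used)]
    return problems


def describe(doc):
    """Pair each element in doc with its declared meaning."""
    return [(el.tag, MEANING.get(el.tag, "undefined")) for el in ET.fromstring(doc).iter()]


good = "<letter><date>1 January</date><p>All well here.</p><signed>M.</signed></letter>"
bad = "<letter><p>All well here.</p><b>urgent</b></letter>"
print(check(good))  # []
print(check(bad))   # ['<b> is not allowed', '<date> is required', '<signed> is required']
print(describe(good)[0])
```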

Page 17

Page 18

Page 19

According to the TEI, three characteristics distinguish XML from other markup languages:

• its emphasis on descriptive rather than procedural markup

• its document type concept

• its independence of any one hardware or software system
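A small sketch of the difference between procedural and descriptive markup, assuming Python's standard `xml.etree.ElementTree`. The sentences are invented; `<i>` is the HTML italics tag, while `<title>` and `<foreign>` are TEI elements. Procedural markup only records that both words are italicised, so a title cannot be told apart from a foreign word; descriptive markup lets each be retrieved separately, however it is later displayed. The helper `texts_of` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Procedural markup says how to render text (italicise it); descriptive
# markup says what the text is. Both snippets are invented examples.
procedural = "<p>She read <i>Hamlet</i> and said <i>adieu</i>.</p>"
descriptive = (
    "<p>She read <title>Hamlet</title> and said "
    '<foreign xml:lang="fr">adieu</foreign>.</p>'
)


def texts_of(xml_text, tag):
    """Return the text content of every element with the given tag."""
    root = ET.fromstring(xml_text)
    return ["".join(el.itertext()) for el in root.iter(tag)]


# With procedural markup a title and a foreign word look identical...
print(texts_of(procedural, "i"))          # ['Hamlet', 'adieu']
# ...with descriptive markup each can be retrieved on its own.
print(texts_of(descriptive, "title"))     # ['Hamlet']
print(texts_of(descriptive, "foreign"))   # ['adieu']
```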

Compared with HTML, XML has some other important characteristics:

• it is extensible (customisable): it does not contain a fixed set of tags

• its documents must be well-formed according to a defined syntax, and may be formally validated

• it focuses on the meaning of data, not its presentation

WHY XML? WHY NOT HTML?
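The well-formedness requirement can be seen by handing a few snippets to an XML parser. This sketch assumes Python's `xml.etree.ElementTree`; the element names in `custom` are invented to show that XML has no fixed tag set, and `is_well_formed` is a hypothetical helper. It checks syntax only: formal validation against a schema (such as a TEI schema) needs a separate validating tool, which `ElementTree` does not provide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the text parses as XML; this checks syntax only, not validity."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Extensible: XML has no fixed tag set, so a project may coin its own names.
# These element names are invented for illustration.
custom = "<diaryEntry><weather>wet</weather><visitor>a neighbour</visitor></diaryEntry>"

# HTML-style habits that browsers tolerate but XML rejects:
unclosed = "<p>First paragraph<p>Second paragraph"  # tags never closed
overlap = "<p><b>bold <i>both</b> italic</i></p>"   # improper nesting

print(is_well_formed(custom))    # True
print(is_well_formed(unclosed))  # False
print(is_well_formed(overlap))   # False
```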

Page 20

XML EXAMPLE

Page 21

Official title: Guidelines for Electronic Text Encoding and Interchange

A continually revised set of proposals for methods of text encoding.

The Guidelines describe the principles that should be used when marking up texts.

They will inevitably evolve and change, but overall they will stay true to the initial design goals:

TEI GUIDELINES

Page 22

INITIAL DESIGN GOALS OF TEI GUIDELINES

1.  suffice to represent the textual features needed for research

2.  be simple, clear, and concrete

3.  be easy for researchers to use without special-purpose software

4.  allow the rigorous definition and efficient processing of texts

5.  provide for user-defined extensions

6.  conform to existing and emergent standards

Page 23

The Guidelines apply to texts in any natural language, of any date, in any literary genre or text type, without restriction on form or content.

They are customisable.

Examples of document content (tags)

THE GUIDELINES

Textual elements: titles, paragraphs, headings, dedications

Non-textual elements: graphics, illustrations, cover, binding material, line breaks

Metadata: publication dates, prices, page counts, history
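To show where these kinds of content sit in an encoded file, here is a pared-down TEI document read with Python's `xml.etree.ElementTree`. The content is invented and the file has not been checked against the TEI schema. The element names (`teiHeader`, `sourceDesc`, `head`, `figure`, `graphic`, `lb`, etc.) and the TEI namespace are standard, but this is a sketch rather than a recommended template: metadata lives in the `<teiHeader>`, and textual and non-textual features in the `<text>`.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A pared-down TEI document (invented content, not schema-checked) showing
# metadata, textual elements and non-textual elements.
DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>An Example Book: a digital edition</title></titleStmt>
      <publicationStmt><p>Unpublished teaching example.</p></publicationStmt>
      <sourceDesc>
        <!-- metadata about the printed source -->
        <bibl><title>An Example Book</title> <date when="1900">1900</date></bibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="dedication"><p>To my readers.</p></div>
      <div type="chapter">
        <head>Chapter One</head>
        <!-- non-textual features: an illustration and a line break -->
        <figure><graphic url="plate1.jpg"/><figDesc>Frontispiece</figDesc></figure>
        <p>The first line of the chapter<lb/>continues on the next.</p>
      </div>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(DOC)
source_date = root.find(".//tei:sourceDesc//tei:date", TEI_NS).get("when")
headings = [h.text for h in root.iterfind(".//tei:body//tei:head", TEI_NS)]
images = [g.get("url") for g in root.iterfind(".//tei:graphic", TEI_NS)]
dedications = root.findall(".//tei:div[@type='dedication']", TEI_NS)

print(source_date)       # 1900
print(headings)          # ['Chapter One']
print(images)            # ['plate1.jpg']
print(len(dedications))  # 1
```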

Page 24

If marking up texts is “an act of interpretation”, then the markup records one person's (or one group's) interpretation of what information is important.

By marking up documents and creating online scholarly editions, we are using historical texts and documents in ways their creators never intended.

“Because (TEI) … treats the humanities corpus … as informational structures, it ipso facto violates some of the most basic reading practices of the humanities community, scholarly as well as popular.” (McGann 2001: 139)

CRITICISM (?) OF TEI

Page 25

A Family At War: The Diary of Mary Martin

1 January – 25 May 1916

Written as letters to her son Charlie, who went missing in action during WW1, the diary chronicles the daily activities of Mary, her family, friends, and relatives.

Diary of Mary Martin site

TEI PROJECTS

Page 26

Autour d’une séquence et des notes du Cahier 46: enjeu du codage dans les brouillons de Proust

Around a sequence and the notes of Notebook 46: encoding issues in Proust's drafts

Proust Prototype

TEI PROJECTS

