Talk 13: Customising and Documenting Your TEI project

James Cummings

28 January 2014

@jamescummings – [email protected] – http://tei.it.ox.ac.uk/Talks/2014-01-toronto/ 1/35


Customising the TEI

- How the TEI is constructed
- Making a TEI schema
- Specifying your profile of the TEI
- Generating your own documentation

Every use of the TEI involves making use of a customisation of the TEI.


Some terminology

- The TEI encoding scheme consists of a number of modules
- Each module contains a number of element specifications
- Each element specification contains:
  - a canonical name (<gi>) for the element, and optionally other names in other languages
  - a canonical description (also possibly translated) of its function
  - a declaration of the classes to which it belongs
  - a definition for each of its attributes
  - a definition of its content model
  - usage examples and notes
- A TEI schema specification (<schemaSpec>) is made by selecting modules or elements and (optionally) modifying their contents
- A TEI document containing a schema specification is called an ODD (One Document Does it all)
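
As a rough sketch of how these pieces fit together, a schema specification sits inside an ordinary TEI document alongside the prose that explains it. The identifier myProject, the title, and the prose below are placeholders and are not taken from the talk:

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Example project customisation (placeholder title)</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished sketch</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- discursive prose: the human-readable half of the ODD -->
      <p>Prose describing the encoding choices of the project goes here.</p>
      <!-- schema specification: the machine-readable half -->
      <schemaSpec ident="myProject" start="TEI">
        <moduleRef key="tei"/>
        <moduleRef key="core"/>
        <moduleRef key="header"/>
        <moduleRef key="textstructure"/>
      </schemaSpec>
    </body>
  </text>
</TEI>
```

Keeping prose and specification in one file is what lets both the schema and the documentation be generated from the same source.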


What is a module?

- A convenient way of grouping together a number of element declarations
- These are usually on a related topic or specific application
- Most chapters of P5 focus on elements drawn from a single module, which that chapter then defines
- A TEI schema is created by selecting modules and adding or removing elements from them as needed


Which modules exist?

Module name     Chapter
analysis        Simple Analytic Mechanisms
certainty       Certainty and Responsibility
core            Elements Available in All TEI Documents
corpus          Language Corpora
dictionaries    Dictionaries
drama           Performance Texts
figures         Tables, Formulae, and Graphics
gaiji           Representation of Non-standard Characters and Glyphs
header          The TEI Header
iso-fs          Feature Structures
linking         Linking, Segmentation, and Alignment
msdescription   Manuscript Description
namesdates      Names, Dates, People, and Places
nets            Graphs, Networks, and Trees
spoken          Transcriptions of Speech
tagdocs         Documentation Elements
tei             The TEI Infrastructure
textcrit        Critical Apparatus
textstructure   Default Text Structure
transcr         Representation of Primary Sources
verse           Verse


How do you choose?

- Just choose everything (not really a good idea)
- The TEI provides a small set of predefined combinations (TEI Lite, TEI Bare...)
- Or you could roll your own (but then you need to know what you’re choosing)

Roma: a command-line script, with a web front end, designed to make this process much easier

http://www.tei-c.org/Roma/


Roma: New


Roma: Customize


Roma: Schema


Roma: Documentation


What did we just do?

We processed a pre-existing ODD file which contained (as well as some discursive prose) the following schema specification:

<schemaSpec ident="tei_bare" start="TEI">
  <moduleRef key="core"/>
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="abbr" mode="delete" module="core"/>
  <elementSpec ident="add" mode="delete" module="core"/>
  <!-- ... -->
  <elementSpec ident="trailer" mode="delete" module="textstructure"/>
  <elementSpec ident="title" mode="change" module="core">
    <attList>
      <attDef ident="level" mode="delete"/>
    </attList>
  </elementSpec>
  <!-- ... -->
</schemaSpec>

We selected four modules, deleted loads of elements, and also deleted an attribute.
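
The same kind of selection can usually be written more compactly with the @include or @except attributes of <moduleRef>, which name the elements to keep or to drop. A minimal sketch, assuming a hypothetical schema called myLittleSchema; the element lists are arbitrary and have not been tested as a working schema:

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="myLittleSchema" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <!-- keep only the named elements from the core module -->
  <moduleRef key="core" include="p head title list item label hi"/>
  <!-- keep everything from textstructure except the named elements -->
  <moduleRef key="textstructure" except="trailer epigraph"/>
</schemaSpec>
```

Listing what you want with @include tends to be easier to read than a long run of mode="delete" specifications, at the cost of having to know the module contents in advance.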


Roma provides an interface to the detail

- The [Modules] tab shows the modules available
- Selecting a module from it shows the elements within that module, and gives you the choice to
  - include all of them (and then remove some)
  - exclude all of them (and then put back the ones you want)
- You can also change an element’s attribute list, and the values they permit


Roma: Modules


Roma: Change Module


What does our REED corpus need?

A simple selection of elements, but also
- we want to allow only certain values for @type on <div>
- we may want to create a new element (can you think of something?)

Other constraints are possible: we might want to insist that a <div type="record"> contains a date, for example.
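
To make the last point concrete, here is what a division satisfying such a rule might look like. The heading, date, and text are invented for illustration and are not drawn from any REED volume:

```xml
<div xmlns="http://www.tei-c.org/ns/1.0" type="record">
  <!-- the date the constraint would look for, anywhere inside the record -->
  <head>Invented record heading, <date when="1535-06-24">24 June 1535</date></head>
  <p>Invented record text, for illustration only.</p>
</div>
```

A record division with no <date> anywhere inside it is the case the constraint is meant to catch.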


The ODD advantage

We can express these constraints in our ODD meta-schema, and then generate a formal schema to enforce them using whichever schema language we like.

TEI schemas can be generated in
- ISO RELAX NG language
- W3C Schema Language
- XML DTD language

- ODD itself defines an element’s content models using a subset of RELAX NG syntax
- Datatypes are defined in terms of W3C datatypes
- Some facilities (e.g. alternation, namespaces) cannot be expressed in DTDs; a RELAX NG schema is recommended
- Additional constraints can be expressed in Schematron


Roma: selecting attributes


Roma: constraining attribute values


What did we just do?

Our ODD now includes something like this:

<elementSpec ident="div" module="textstructure" mode="change">
  <attList>
    <attDef ident="type" mode="change" usage="req">
      <valList type="closed" mode="replace">
        <valItem ident="prose"/>
        <valItem ident="verse"/>
        <valItem ident="drama"/>
        <!-- ... -->
      </valList>
    </attDef>
  </attList>
</elementSpec>

Note that we can also add documentation to the ODD:

<valItem ident="verse">
  <gloss>contains (parts of) a poem</gloss>
</valItem>
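
The effect on encoded documents can be sketched as follows. The value shoppingList is a made-up example, assumed not to be among the elided items of the closed list:

```xml
<body xmlns="http://www.tei-c.org/ns/1.0">
  <!-- accepted: "verse" appears in the closed value list -->
  <div type="verse">
    <p>...</p>
  </div>
  <!-- rejected by the generated schema: value not in the closed list -->
  <div type="shoppingList">
    <p>...</p>
  </div>
  <!-- rejected by the generated schema: @type is now required (usage="req") -->
  <div>
    <p>...</p>
  </div>
</body>
```

The fragment is well-formed XML throughout; only the first division would be valid against a schema generated from the customisation.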


Defining a new element

When defining a new element, we need to consider
- its name and description
- what attributes it can carry
- what it can contain
- where it can appear in a document

The TEI class system helps us answer all these questions (except the first).


The TEI Class System

- The TEI distinguishes over 500 elements
- Having these organised into classes aids comprehension, modularity, and modification
- Attribute class: the members share common attributes
- Model class: they can appear in the same locations (and are often semantically related)
- Classes may contain other classes
- An element can be a member of any number of classes, irrespective of the module it belongs to


Attribute Classes

- Attribute classes are given (usually adjectival) names beginning with att.; e.g. att.naming, att.typed
- All members of att.naming inherit from it attributes @key and @ref; all members of att.typed inherit from it @type and @subtype
- If we want an element to carry the @type attribute, therefore, we add the element to the att.typed class, rather than define those attributes explicitly
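
In ODD terms, adding an element to an attribute class might look like the sketch below. The choice of <signed> is only a placeholder; substitute any element in your schema that does not already inherit @type:

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             ident="signed" module="textstructure" mode="change">
  <!-- mode="change" keeps the existing class memberships and adds one more -->
  <classes mode="change">
    <memberOf key="att.typed"/>
  </classes>
</elementSpec>
```

The element then picks up @type and @subtype with the class definitions and documentation, with no <attDef> of its own.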


A very important attribute class: att.global

All elements are usually members of att.global; this class provides, among others:

@xml:id     a unique identifier
@xml:lang   the language of the element content
@n          a number or name for an element
@rend       how the element in question was rendered or presented in the source text
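
A small illustration of these four attributes on a single paragraph; the identifier, number, and rendition value are invented:

```xml
<p xmlns="http://www.tei-c.org/ns/1.0"
   xml:id="p12"
   xml:lang="la"
   n="12"
   rend="italic">In principio erat verbum.</p>
```

Here xml:id gives the paragraph an address other elements can point to, xml:lang records that the content is Latin, n carries a reference number, and rend notes how the source presented the text.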


Model Classes

- Model classes contain groups of elements which are allowed in the same place, e.g. if you are adding an element which is wanted wherever <bibl> is allowed, add it to the model.biblLike class
- Model classes are usually named with a Like or Part suffix:
  - members of model.pLike are all things that ‘behave like’ paragraphs, and are permitted in the same places as paragraphs
  - members of model.pPart are all things which can appear within paragraphs. This class is subdivided into
    - model.pPart.edit: elements for simple editorial intervention such as <corr>, <del> etc.
    - model.pPart.data: ‘data-like’ elements such as <name>, <num>, <date> etc.
    - model.pPart.msdesc: extra elements for manuscript description such as <seal> or <origPlace>
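
The <bibl> case could be sketched like this. The element name sourceNote, its description, and the example.org namespace are hypothetical, and the content model is deliberately reduced to plain text:

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:rng="http://relaxng.org/ns/structure/1.0"
             ident="sourceNote"
             ns="http://www.example.org/ns/nonTEI"
             mode="add">
  <desc>a hypothetical project-specific citation</desc>
  <classes>
    <!-- membership of model.biblLike decides WHERE the element may appear -->
    <memberOf key="model.biblLike"/>
  </classes>
  <content>
    <!-- plain text only, to keep the sketch short -->
    <rng:text/>
  </content>
</elementSpec>
```

Nothing else in the schema has to change: every content model that refers to model.biblLike admits the new element automatically.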


Basic Model Class Structure

Simplifying wildly, one may say that the TEI recognises three kinds of element:

divisions               high level major divisions of texts
chunks                  elements such as paragraphs appearing within texts or divisions, but not other chunks
phrase-level elements   elements such as highlighted phrases which can occur only within chunks

There are ‘base model classes’ corresponding with each of these, and also with the following groupings:

inter-level elements    elements such as lists which can appear either in or between chunks
components              elements which can appear directly within texts or text divisions

And yes, there is a class model.global for elements that can appear anywhere inside a text, at any hierarchic level.
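
These levels are easier to see in a fragment of encoded text. The content below is invented; the comments label which kind of element each one is:

```xml
<div xmlns="http://www.tei-c.org/ns/1.0">
  <!-- <div> is a division; <p> is a chunk; <hi> is phrase-level -->
  <p>A paragraph with a <hi rend="italic">highlighted phrase</hi>
    and a list inside it:
    <!-- <list> is inter-level: here it sits within a chunk -->
    <list>
      <item>first item</item>
    </list>
  </p>
  <!-- <pb/> is a member of model.global and can turn up almost anywhere -->
  <pb n="2"/>
  <!-- the same inter-level element, this time between chunks -->
  <list>
    <item>second item</item>
  </list>
  <p>Another paragraph.</p>
</div>
```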


Defining a new element

- What other elements is it like?
- What other elements can contain it?
- What can it contain?

Conclusions:
- What model do we select?
- What content model do we select?


Roma: Defining a new element


Defining a content model

- A typical TEI element defines its content by referencing classes of element which it can contain, rather than using specific elements.
- Content models are defined using the RELAX NG vocabulary
- Here are some very common predefined content models:

macro.paraContent         content of paragraphs and similar elements
macro.limitedContent      content of prose elements that are not used for transcription of extant materials
macro.phraseSeq           a sequence of character data and phrase-level elements
macro.phraseSeq.limited   a sequence of character data and those phrase-level elements that are not typically used for transcribing extant documents
macro.specialPara         the content model of elements which either contain a series of component-level elements or else contain a series of phrase-level and inter-level elements
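
As a sketch of how these macros are referenced, in the RELAX NG-based style this talk uses, here are two hypothetical elements. The names remark and keyword, their descriptions, and the example.org namespace are all invented for illustration:

```xml
<specGrp xmlns="http://www.tei-c.org/ns/1.0"
         xmlns:rng="http://relaxng.org/ns/structure/1.0"
         xml:id="contentModelSketches">
  <!-- a paragraph-like element: reuse the content model of paragraphs -->
  <elementSpec ident="remark" ns="http://www.example.org/ns/nonTEI" mode="add">
    <desc>a hypothetical editorial remark that behaves like a paragraph</desc>
    <classes>
      <memberOf key="model.pLike"/>
    </classes>
    <content>
      <rng:ref name="macro.paraContent"/>
    </content>
  </elementSpec>
  <!-- a phrase-level element: text mixed with other phrase-level elements -->
  <elementSpec ident="keyword" ns="http://www.example.org/ns/nonTEI" mode="add">
    <desc>a hypothetical phrase-level marker for keywords</desc>
    <classes>
      <memberOf key="model.hiLike"/>
    </classes>
    <content>
      <rng:ref name="macro.phraseSeq"/>
    </content>
  </elementSpec>
</specGrp>
```

Referring to a macro rather than spelling out the content means the new element keeps pace with whatever modules and elements the rest of the customisation includes.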


Roma: Defining a new element 2


What did we just do?

We added a new element specification to our ODD, like this:

<elementSpec ident="something" ns="http://www.example.org/ns/nonTEI" mode="add">
  <desc>contains something division like.</desc>
  <classes>
    <memberOf key="model.divPart"/>
    <memberOf key="att.typed"/>
  </classes>
  <content>
    <rng:ref name="something"/>
    <rng:oneOrMore>
      <rng:ref name="model.pLike"/>
    </rng:oneOrMore>
  </content>
</elementSpec>

Note that this new element is not in the TEI namespace. It belongs to this specific project only!


Other kinds of constraints

- You can also constrain the content of an element or the value of an attribute to be of a particular datatype (for example, to insist that the @when attribute of the element <date> contains only a date)
- This can be done by using one of a set of predefined macros to define the content. Examples include

  data.word           a single word or token
  data.name           an XML Name
  data.enumerated     a single XML name taken from a documented list
  data.temporal.w3c   a W3C date
  data.truthValue     a truth value (true/false)
  data.language       a human language
  data.sex            human or animal sex

- Or you can define a more complex constraint, e.g. using Schematron
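
A datatype macro is referenced from inside an attribute definition. The sketch below adds a hypothetical @checked attribute to <div>; the attribute, its description, and the example.org namespace are assumptions made for illustration, written in the RELAX NG-based style this talk uses:

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:rng="http://relaxng.org/ns/structure/1.0"
             ident="div" module="textstructure" mode="change">
  <attList>
    <!-- @checked is a hypothetical attribute, kept outside the TEI namespace -->
    <attDef ident="checked" ns="http://www.example.org/ns/nonTEI"
            mode="add" usage="opt">
      <desc>indicates whether this division has been proofread</desc>
      <datatype>
        <!-- restrict the value to a truth value -->
        <rng:ref name="data.truthValue"/>
      </datatype>
    </attDef>
  </attList>
</elementSpec>
```

Swapping the reference for data.temporal.w3c or data.word changes what the generated schema accepts without touching the rest of the definition.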


Schematron constraints

An element specification can also contain a <constraintSpec> element which contains rules about its content expressed as ISO Schematron constraints

<elementSpec ident="div" module="textstructure" mode="change"
             xmlns:s="http://purl.oclc.org/dsdl/schematron">
  <constraintSpec ident="div" scheme="isoschematron">
    <constraint>
      <s:assert test="not(@type='record') or .//tei:date">a record
        division must include a date</s:assert>
    </constraint>
  </constraintSpec>
</elementSpec>

However...
- You can only add such rules by editing your ODD file: Roma doesn’t know about them.
- Not all schema languages can implement these constraints.
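
A constraint can also carry its own Schematron rule with an explicit context, which keeps the test short. This is a sketch rather than a tested customisation: the identifier recordHasDate is invented, and it assumes the ODD processor binds the tei: prefix to the TEI namespace in the generated Schematron:

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:s="http://purl.oclc.org/dsdl/schematron"
             ident="div" module="textstructure" mode="change">
  <constraintSpec ident="recordHasDate" scheme="isoschematron">
    <constraint>
      <!-- the rule fires only for divisions typed as records -->
      <s:rule context="tei:div[@type='record']">
        <s:assert test=".//tei:date">a record division must include a date</s:assert>
      </s:rule>
    </constraint>
  </constraintSpec>
</elementSpec>
```

Divisions of any other type are never tested by this rule, so they cannot fail it by accident.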


TEI customization for fun and profit

With the Stationers’ Register project the keying company would:
- do double-keying data entry matching any XML schema
- charge per kilobyte of output!

So we:
- used TEI ODD to create a highly compressed / byte-reduced renaming schema
- and up-converted the result back to pure TEI!
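
Renaming in ODD is done with <altIdent>, which gives an element a different name in the generated schema while its TEI identity stays the same. The sketch below is not the actual tei_corset customisation, which the talk does not show; the schema identifier and the short names are assumptions, and the real schema evidently did more than rename elements:

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="shortNamesSketch" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="namesdates"/>
  <!-- <persName> is keyed as <n> in documents valid against this schema -->
  <elementSpec ident="persName" module="namesdates" mode="change">
    <altIdent>n</altIdent>
  </elementSpec>
  <!-- <num> is keyed as <nm> -->
  <elementSpec ident="num" module="core" mode="change">
    <altIdent>nm</altIdent>
  </elementSpec>
</schemaSpec>
```

Because the ODD records both names, the mapping from short names back to canonical TEI names can be derived from the same file.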


tei_corset example

<nm r="rm">iijli vjs viijd</nm> from
<n r="b">Thomas barthelett</n>

was up-converted to

<seg type="notFee" rend="roman-numerals">
  <num type="totalPence" value="800">
    <num type="poundsAsPence" value="720"> iij<hi rend="superscript">li</hi></num>
    <num type="shillingsAsPence" value="72"> vj<hi rend="superscript">s</hi></num>
    <num type="pence" value="8"> viij<hi rend="superscript">d</hi></num>
  </num>
</seg> from
<persName role="stationer" rend="bold">
  <forename>Thomas</forename>
  <surname>barthelett</surname>
</persName>


Next

Any Questions? That is a quick look at some of the basic things one can do with the TEI ODD language, and the Roma web tool. Now let’s do an exercise where we try out customising the TEI!
