Customizing the IMDI metadata schema for endangered languages Heidi Johnson (AILLA) Arienne Dwyer (DOBES)


Introduction

IMDI: International Standards for Language Engineering Metadata Initiative

DOBES: Volkswagen Foundation’s Documentation of Endangered Languages initiative

AILLA: the Archive of the Indigenous Languages of Latin America


Types of resources

Audio and video recordings in various digital formats

Annotation text files, e.g. transcriptions and translations

Standalone texts, e.g. dictionaries, poetry

Wide range of genres: from verbal art to scholarly analyses


Bundles of resources

Session (IMDI, 2001): resources resulting from a linguistic elicitation session, i.e. recordings and annotations.

This models only one kind of resource production: a recording session.

Collections will include a greater variety of resources, in sets of related materials.


Types of bundles

Canonical bundle: the original session. A digitized recording, in different formats, and some textual annotation files, also in different formats.

Minimal bundle: a single file. Examples: dictionary, poem, recording of uninterpretable chants.

Meta-bundle: a bundle containing other bundles. Example: a book about a set of annotated recordings.
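The three bundle types can be told apart by what a bundle contains. The following is a minimal sketch in Python, not part of the IMDI schema: the `Bundle` class, its field names, the `bundle_type` function, and the file names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bundle:
    """A set of related resources (hypothetical model, not the IMDI schema)."""
    name: str
    files: List[str] = field(default_factory=list)             # resource file names
    sub_bundles: List["Bundle"] = field(default_factory=list)  # nested bundles


def bundle_type(bundle: Bundle) -> str:
    """Classify a bundle using the three types named on the slide."""
    if bundle.sub_bundles:
        return "meta"       # contains other bundles
    if len(bundle.files) == 1:
        return "minimal"    # a single file
    return "canonical"      # assumption: everything else is treated as a session


# Made-up example data.
session = Bundle("session", files=["story.wav", "story.mp3", "story.txt"])
poem = Bundle("poem", files=["poem.pdf"])
book = Bundle("book", files=["book.pdf"], sub_bundles=[session])
```

Treating every multi-file bundle as canonical is a simplification; a real archive would probably also check that the files are a recording plus its annotations.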


Bundle elements

Current:
– Name of bundle
– Date and place of production

Proposed:
– Resource relations
– Date archived
– Last modified


Major subschemas

Project

Collector

Content

Participants

Resources

References


The Content Subschema

Genre is the top-level category:
– Interaction: conversation, interview …
– Explanation: description, recipe …
– Performance: narrative, poem, oratory …
– Teaching: primer, textbook …
– Analysis: grammar, dictionary …
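The genre hierarchy above is a two-level controlled vocabulary. As a rough sketch, it could be held in a lookup table like the one below. The table contains only the examples shown on the slide (the lists there are open-ended), and the names `GENRES` and `top_level_genre` are hypothetical.

```python
from typing import Optional

# Only the subgenres listed on the slide; the real vocabulary is longer.
GENRES = {
    "Interaction": ["conversation", "interview"],
    "Explanation": ["description", "recipe"],
    "Performance": ["narrative", "poem", "oratory"],
    "Teaching": ["primer", "textbook"],
    "Analysis": ["grammar", "dictionary"],
}


def top_level_genre(subgenre: str) -> Optional[str]:
    """Return the top-level genre for a subgenre, or None if it is not listed."""
    wanted = subgenre.strip().lower()
    for genre, subgenres in GENRES.items():
        if wanted in subgenres:
            return genre
    return None
```

For example, `top_level_genre("Recipe")` returns `"Explanation"`, while a subgenre that is not in the table returns `None`.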


Other Content categories

Modality: speech, writing, gesture

Communication context:
– Interactivity
– Planning
– Involvement

Languages

Task

Description

Keys


AILLA’s Content Keys

Register: a characterization of how the discourse reflects the social context. Example: honorific speech

Style: about poetic and stylistic effects. Examples: parallelism, metered verse.
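Content Keys such as Register and Style are local additions expressed as key/value pairs. A minimal sketch of that idea follows. It assumes that a key may carry more than one value, which the slides do not state, and the helper `add_key` and the `content_keys` dictionary are hypothetical names.

```python
from typing import Dict, List


def add_key(keys: Dict[str, List[str]], name: str, value: str) -> None:
    """Attach a value to a named key, creating the key on first use."""
    keys.setdefault(name, []).append(value)


# Values taken from the examples on the slide.
content_keys: Dict[str, List[str]] = {}
add_key(content_keys, "Register", "honorific speech")
add_key(content_keys, "Style", "parallelism")
add_key(content_keys, "Style", "metered verse")
```

Because the keys are plain strings, an archive can add its own categories without changing the fixed part of the schema.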


The Project subschema

Current elements:
– Name: a nickname or acronym
– Title: official title
– ID: a unique identifier
– Contact information

Proposed element:
– Funder: name of funding organization


The Collector subschema

AILLA renames this Depositor, since this is the individual we have to keep track of (e.g. for Level 3 access permission). When the Depositor is not also the Collector, Collector can be listed under Participants.


The Participants subschema

Type: functional role, e.g. creator

Role: family relationship

Name/Full name

Language(s)

Ethnic group, age, sex

Education

Anonymous: True if participant’s Full name is reserved; False otherwise
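The Anonymous flag controls whether a participant's Full name may be shown. The sketch below illustrates one way a display routine might honor it. It assumes that the short Name is safe to show when the Full name is reserved, which the slides do not say; the `Participant` class and `public_name` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """A few Participants elements (hypothetical model, not the IMDI schema)."""
    name: str                 # short name or code
    full_name: str
    anonymous: bool = False   # True if the full name is reserved


def public_name(p: Participant) -> str:
    """Return the name that may be displayed publicly."""
    # Assumption: fall back to the short name when the full name is reserved.
    return p.name if p.anonymous else p.full_name


# Made-up example data.
speaker = Participant(name="SP1", full_name="Example Speaker", anonymous=True)
collector = Participant(name="EC", full_name="Example Collector")
```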


AILLA additions to Participants

Origin: Place (country, region, etc.) of origin of the creator of the primary resource in the bundle (e.g. the speaker whose voice is recorded).

Occupation: Can be relevant in assessing accuracy of some kinds of data.


The Resources subschema

Resources contains information about formats and provenance of files in a bundle.

Media Files: audio, video, etc.

Annotation Files: text files.

Proposal: call them all Media Files, to reduce redundancy in the database. (All have URL, size, etc. elements.)
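The proposal to treat all files as Media Files amounts to using one record type with the shared elements (URL, size, etc.) rather than two near-identical ones. A minimal sketch, assuming a hypothetical `kind` field is what distinguishes recordings from annotations; the class, field names, and example URLs are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MediaFile:
    """One record type for every file in a bundle (hypothetical model)."""
    url: str
    size_bytes: int
    kind: str   # assumption: e.g. "audio", "video", "annotation"


def files_of_kind(files: List[MediaFile], kind: str) -> List[MediaFile]:
    """Select the files of one kind from a bundle's file list."""
    return [f for f in files if f.kind == kind]


# Made-up example data.
bundle_files = [
    MediaFile("file:///example/story.wav", 48_000_000, "audio"),
    MediaFile("file:///example/story.txt", 12_000, "annotation"),
]
```

With a single table, a query such as `files_of_kind(bundle_files, "annotation")` replaces a lookup in a separate annotation-file table.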


Text resources

Current elements:
– Type: type of annotation, e.g. phonetic transcription.
– Content encoding: annotation encoding scheme, e.g. EUROTYP.
– Character encoding: character set(s) used in a text file.


Text resources 2

Proposed elements:
– Transcription type
– Translation (aka Glossing) type
– Software: used to produce transcriptions, translations, other annotations (e.g. Shoebox)

Describe Annotator in Participants (along with Translator, etc.)


Proposed subschema

Place: composed of several elements:
– Continent
– Country
– Region
– Subregion (address)

Repeated at least twice, in Bundle and in Participants (Origin).

Might also be useful in the Language subschema.
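Because Place recurs in several parts of the schema, it is natural to define it once and reuse it. The sketch below shows that reuse under stated assumptions: the class and field names are hypothetical, Region and Subregion are assumed to be optional, and the example values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    """The four Place elements from the slide (hypothetical model)."""
    continent: str
    country: str
    region: Optional[str] = None      # assumption: optional
    subregion: Optional[str] = None   # assumption: optional (address)

    def label(self) -> str:
        """Join the elements that are present, from broadest to narrowest."""
        parts = [self.continent, self.country, self.region, self.subregion]
        return ", ".join(p for p in parts if p)


@dataclass
class BundleRecord:
    name: str
    place: Place          # place of production


@dataclass
class ParticipantRecord:
    name: str
    origin: Place         # the Origin element, reusing Place


# Made-up example data: one Place value shared by two records.
oaxaca = Place("North America", "Mexico", "Oaxaca")
bundle = BundleRecord("story session", place=oaxaca)
speaker = ParticipantRecord("SP1", origin=oaxaca)
```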


Conclusion

IMDI schema is a flexible tool.

Customization through Key/Value pairs allows local modifications.

Most of the proposed changes are terminological, moving from the DOBES in-house terminology to more general usage.