
Problems in the annotation of spoken language corpora

Joaquim Llisterri

Grup de Fonètica, Departament de Filologia Espanyola, Universitat Autònoma de Barcelona

[email protected]

http://liceu.uab.cat/~joaquim

Sonderforschungsbereich “Mehrsprachigkeit”, Universität Hamburg

25 July 2007

http://liceu.uab.cat/~joaquim/language_resources/Hamburg_07/Hamburg_07.html


Problems in the annotation of spoken language corpora

Levels of annotation

Orthographic representation

Segmental annotation

Suprasegmental annotation

Final remarks







Levels of annotation



“Corpus annotation is the practice of adding interpretative linguistic information to a corpus”

LEECH, G. (2005) “Adding linguistic annotation”, in WYNNE, M. (Ed.) Developing Linguistic Corpora: a Guide to Good Practice. Oxford: Oxbow Books. pp. 17-29. http://www.ahds.ac.uk/creating/guides/linguistic-corpora/chapter2.htm



• The annotation of a corpus can be conceived as a set of hierarchically organized layers

• Layers usually represent linguistic levels of analysis
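The idea of layers can be illustrated with a small data structure. The following is only a sketch under stated assumptions: the class names `Interval` and `Tier`, the time values and the example word are all hypothetical, and real tools use richer formats. It shows two layers linked through shared time stamps.

```python
from dataclasses import dataclass, field


@dataclass
class Interval:
    """One labelled stretch of the recording, in seconds."""
    start: float
    end: float
    label: str


@dataclass
class Tier:
    """One annotation layer, e.g. orthographic, segmental or prosodic."""
    name: str
    intervals: list[Interval] = field(default_factory=list)

    def labels_between(self, start: float, end: float) -> list[str]:
        """Return labels of intervals lying entirely inside [start, end]."""
        return [i.label for i in self.intervals
                if i.start >= start and i.end <= end]


# Hypothetical example: the word "yes" annotated on two layers.
words = Tier("orthographic", [Interval(0.00, 0.45, "yes")])
phones = Tier("segmental", [
    Interval(0.00, 0.10, "j"),
    Interval(0.10, 0.30, "E"),
    Interval(0.30, 0.45, "s"),
])

# The segmental layer is related to the orthographic one via shared times.
word = words.intervals[0]
print(phones.labels_between(word.start, word.end))  # prints ['j', 'E', 's']
```

Keeping each level in its own tier means that a new level (for instance a prosodic one) can be added later without touching the existing ones.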


Levels of annotation: musical score


Levels of annotation: EXMARaLDA Partitur Editor





• The levels of annotation are established according to the aims of the research to be carried out with the corpus

• Pragmatics and discourse analysis

• Grammar of spoken language

• Phonetics and phonology

• …



• Different kinds of labels are used for different annotation levels

• Phonetic symbols: phonetic labelling

• Morphological tags: POS (Part of Speech) tagging

• Syntactic labels: Parsing



• Phonetic labelling



• POS tagging

CLiC, Centre de Llenguatge i Computació: http://clic.fil.ub.es



• Parsing

CLiC, Centre de Llenguatge i Computació: http://clic.fil.ub.es







Orthographic representation



• The orthographic representation corresponds to a representation of the speakers' utterances using the standard spelling of a given language

• Also known as transliteration

• The orthographic level is common to all speech and spoken corpora



“[…] the words that appear in an orthographic transcription of a speech event constitute only a partial representation of the original speech event. To supplement this record of the event, the analyst can capture other features, by making either a prosodic or phonetic transcription, and can also record contextual features. However, […] the record remains inevitably partial.”

THOMPSON, P. (2005) "Spoken language corpora", in WYNNE, M. (Ed.) Developing Linguistic Corpora: a Guide to Good Practice. Oxford: Oxbow Books. pp. 59-70.






Problems in spontaneous speech

• Punctuation

• Adding punctuation implies a segmentation decided by the transcriber

• Lack of punctuation decreases legibility of the text

• Avoidance of more difficult punctuation marks like “;”



Problems in spontaneous speech

• Non-standard forms (situational, social or geographic variation)

• Vocal semi-lexical forms

• Disfluencies: self-repairs, word fragments

• Unintelligible fragments



Recommendations

Preliminary Recommendations on Spoken Texts. EAGLES Document EAG-TCWG-STP/P, May 1996. http://www.ilc.cnr.it/EAGLES96/spokentx/spokentx.html



• Use conventional spelling forms as they appear in a standard dictionary. This also applies to contractions, reduced word forms, apostrophes, dialect forms, interjections and vocalised semi-lexical events



• If more than one orthographic form is possible or if non-standard spellings or spelling variations are necessary, maintain a lexicon of the spelling forms used in the transcription



• Represent numbers, abbreviations, acronyms and spelled words in full orthographic form as pronounced by the speaker



Recommendations

SENIA, F. - van VELDEN, J.G. (1997) Specifications of orthographic transcription and lexicon conventions. LRE-4001 SpeechDat Technical Report SD1.3.2, Final version, 10 January 1997. http://www.speechdat.org/speechdat/deliverables/public/SD132V24.PDF



• Normal lexical items will be represented by their spellings in the normal way

• It is possible to include a very restricted number of markings for regular variations in pronunciation, provided that they are documented and no more than two or three regular variations are indicated



• Abbreviations should be represented by their full orthographic forms, unless they are spoken in their abbreviated form

• Number sequences will be spelled out to reflect what was said



• If a speaker pronounces letters, acronyms or abbreviations as a word, then these should be spelled out as words

• No punctuation will be provided in the transcription other than those symbols used for special transcription purposes
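Some of the recommendations above (no punctuation, numbers and abbreviations spelled out as spoken) can be sketched as a small normalisation step. This is a minimal illustration, not part of the EAGLES or SpeechDat specifications: the `SPOKEN_FORMS` lexicon and the `normalise` function are hypothetical, lower-casing is an added assumption, and the spoken form of each item still has to be decided by the transcriber listening to the recording.

```python
import string

# Hypothetical lexicon: maps written forms to the spoken form to transcribe.
SPOKEN_FORMS = {
    "dr": "doctor",
    "3": "three",
    "25": "twenty five",
}


def normalise(text: str) -> str:
    """Lower-case, strip edge punctuation and spell out listed forms."""
    tokens = []
    for raw in text.split():
        # Only punctuation at the edges is removed, so "don't" is kept whole.
        token = raw.strip(string.punctuation).lower()
        if token:
            tokens.append(SPOKEN_FORMS.get(token, token))
    return " ".join(tokens)


print(normalise("Dr. Smith arrives at 3, not 25."))
# prints: doctor smith arrives at three not twenty five
```

Keeping the lexicon in one place also serves the recommendation to document every spelling form used in the transcription.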


Orthographic representation

• Enriched orthographic representation

• Incorporates information which cannot be represented with conventional spelling

• Used in pragmatics, discourse and conversation analysis, among other fields



• Phenomena included in an enriched orthographic representation need to be encoded

SPERBERG-McQUEEN, C.M. - BURNARD, L. (Eds.) (2007) "7 Transcriptions of Speech", in TEI P5: Guidelines for Electronic Text Encoding and Interchange. The TEI Consortium: Oxford, Providence, Charlottesville, Nancy. http://www.tei-c.org/release/doc/tei-p5-doc/html/TS.html





• <u> (utterance) a stretch of speech usually preceded and followed by silence or by a change of speaker.

• <pause/> a pause either between or within utterances.

• <vocal> (Vocalized semi-lexical) any vocalized but not necessarily lexical phenomenon, for example voiced pauses, non-lexical backchannels, etc.



• <kinesic> (Non-vocalized quasi-lexical) any communicative phenomenon, not necessarily vocalized, for example a gesture, frown, etc.

• <event> any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication.



• <writing> (Writing) a passage of written text revealed to participants in the course of a spoken text.

• <shift/> marks the point at which some paralinguistic feature of a series of utterances by any one speaker changes.



<shift/> in tempo

• a - allegro (fast)

• aa - very fast

• acc - accelerando (getting faster)

• l - lento (slow)

• ll - very slow

• rall - rallentando (getting slower)



<shift/> in loud (loudness)

• f - forte (loud)

• ff - very loud

• cresc - crescendo (getting louder)

• p - piano (soft)

• pp - very soft

• dimin - diminuendo (getting softer)



<shift/> in pitch (pitch range)

• high - high pitch-range

• low - low pitch-range

• wide - wide pitch-range

• narrow - narrow pitch-range

• asc - ascending

• desc - descending

• monot - monotonous

• scand - scandent, each succeeding syllable higher than the last, generally ending in a falling tone



<shift/> in tension

• sl - slurred

• lax - lax, a little slurred

• ten - tense

• pr - very precise

• st - staccato, every stressed syllable being doubly stressed

• leg - legato, every syllable receiving more or less equal stress


<shift/> in rhythm

• rh - beatable rhythm

• arrh - arrhythmic, particularly halting

• spr - spiky rising, with markedly higher unstressed syllables

• spf - spiky falling, with markedly lower unstressed syllables

• glr - glissando rising, like spiky rising but the unstressed syllables, usually several, also rise in pitch relative to each other

• glf - glissando falling, like spiky falling but with the unstressed syllables also falling in pitch relative to each other



<shift/> in voice (voice quality)

• whisp - whisper

• breath - breathy

• husk - husky

• creak - creaky

• fals - falsetto

• reson - resonant



<shift/> in voice (voice quality)

• giggle - unvoiced laugh or giggle

• laugh - voiced laugh

• trem - tremulous

• sob - sobbing

• yawn - yawning

• sigh - sighing



• “A full definition of the sense of the values provided for each feature should be provided in the encoding description section of the text header”
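As an illustration of how the TEI elements listed above combine, the following sketch builds a single utterance with Python's standard `xml.etree.ElementTree` module. The speaker identifier `#spk1` and the words are invented, the TEI namespace and header are omitted for brevity, and the attribute names `who`, `feature` and `new` are taken from the TEI guidelines rather than from these slides.

```python
import xml.etree.ElementTree as ET

# Build one utterance; the speaker id "#spk1" and the words are invented.
u = ET.Element("u", {"who": "#spk1"})
u.text = "well "

# A pause inside the utterance, followed by more speech.
pause = ET.SubElement(u, "pause")
pause.tail = " I think "

# Tempo changes to "a" (allegro, fast) from this point on.
shift = ET.SubElement(u, "shift", {"feature": "tempo", "new": "a"})
shift.tail = "we should go "

# A vocalized semi-lexical event described in prose.
vocal = ET.SubElement(u, "vocal")
desc = ET.SubElement(vocal, "desc")
desc.text = "laughs"

xml_text = ET.tostring(u, encoding="unicode")
print(xml_text)
```

The printed string is a well-formed XML fragment; a complete TEI document would additionally declare the feature values in its header, as the quotation above requires.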



• “Keep it simple”

• “Document everything adequately”

SENIA, F. - van VELDEN, J.G. (1997) Specifications of orthographic transcription and lexicon conventions. LRE-4001 SpeechDat Technical Report SD1.3.2, Final version, 10 January 1997. http://www.speechdat.org/speechdat/deliverables/public/SD132V24.PDF







Segmental annotation



• Segmental annotation concerns the phonetic representation of the utterances pronounced by the speakers

• Levels of segmental annotation

GIBBON, D. - MOORE, R. - WINSKI, R. (Eds.) (1998) Spoken Language System and Corpus Design. Berlin: Mouton De Gruyter. (Handbook of Standards and Resources for Spoken Language Systems, I)



• Citation or canonical form

• Words are transcribed in their canonical form, as pronounced in isolation in careful speech



• Broad transcription or phonotypical transcription

• Phonological transcription plus regular or predictable contextual phonetic phenomena

SAMPA



• Narrow transcription

• Phonetic transcription with allophones closely representing the phonetic realization

X-SAMPA



• Acoustic-phonetic transcription

• Representation of acoustic-phonetic events which can be observed in the waveform



SAMPA

• SAM (Speech Assessment Methods) Phonetic Alphabet (1987-1989)

http://www.phon.ucl.ac.uk/home/sampa/home.htm

John Wells, University College London



SAMPA

• Only 7-bit ASCII characters

• Phonological transcription: only contrastive symbols are used

• Some symbols for allophones have been introduced for certain languages



Catalan SAMPA

http://liceu.uab.es/~joaquim/language_resources/SAMPA_Catalan.html







X-SAMPA

• Extended SAM (Speech Assessment Methods) Phonetic Alphabet

http://www.phon.ucl.ac.uk/home/sampa/x-sampa.htm

John Wells, University College London



X-SAMPA

• ASCII equivalents of all IPA symbols, including diacritics and tonal marks
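Because SAMPA and X-SAMPA are ASCII encodings of IPA symbols, conversion between them and IPA can be automated with a lookup table. The sketch below covers only a handful of symbols, the function name `sampa_to_ipa` is hypothetical, and the table is far from a complete SAMPA or X-SAMPA inventory. Two-character symbols are tried before single characters.

```python
# Partial SAMPA-to-IPA table; only a handful of symbols are listed here.
SAMPA_TO_IPA = {
    "tS": "tʃ",
    "dZ": "dʒ",
    "S": "ʃ",
    "Z": "ʒ",
    "T": "θ",
    "D": "ð",
    "N": "ŋ",
    "@": "ə",
    "I": "ɪ",
    "E": "ɛ",
    "O": "ɔ",
}


def sampa_to_ipa(sampa: str) -> str:
    """Convert a SAMPA string, trying two-character symbols first."""
    out = []
    i = 0
    while i < len(sampa):
        pair = sampa[i:i + 2]
        if len(pair) == 2 and pair in SAMPA_TO_IPA:
            out.append(SAMPA_TO_IPA[pair])
            i += 2
        else:
            # Symbols missing from the table are copied unchanged.
            out.append(SAMPA_TO_IPA.get(sampa[i], sampa[i]))
            i += 1
    return "".join(out)


print(sampa_to_ipa("TIN"))   # prints θɪŋ
print(sampa_to_ipa("tSIp"))  # prints tʃɪp
```

A production converter would need the full symbol table for the language in question and an explicit segment separator, since some symbol sequences are ambiguous without one.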










Suprasegmental annotation



• Prosodic or suprasegmental phenomena

• Stress / Accent

• Melody / Intonation

• Rate

• Rhythm

• Pauses

• Voice quality



• Some of the suprasegmental elements are annotated in enriched orthographic representations



THOMPSON, P. (2005) "Spoken Language Corpora", in WYNNE, M. (Ed.) Developing Linguistic Corpora: a Guide to Good Practice. Oxford: Oxbow Books. pp. 59-70. http://ahds.ac.uk/guides/linguistic-corpora/chapter5.htm



SAMPROSA

• SAM (Speech Assessment Methods) Prosodic Alphabet

http://www.phon.ucl.ac.uk/home/sampa/samprosa.htm

John Wells, University College London



• SAMPROSA - Local tone



• Most of the problems are found in the transcription of intonation (melody + stress)

• Continuous variations of three physical parameters have to be transformed into a linguistically meaningful symbolic (discrete) representation



“The standard system for annotating prosody (stress, intonation, etc.) is ToBI (= Tones and Break Indices), which comes with its own speech-processing platform. Its phonological model originated with Pierrehumbert (1980). The system is partially automated, but needs to be substantially adapted for fresh languages and dialects.”

LEECH, G. (2005) “Adding linguistic annotation”, in WYNNE, M. (Ed.) Developing Linguistic Corpora: a Guide to Good Practice. Oxford: Oxbow Books. pp. 17-29. http://www.ahds.ac.uk/creating/guides/linguistic-corpora/chapter2.htm








“ToBI is well supported by dedicated software and a committed research community. On the other hand, it has met with criticism, and two alternative annotation systems worth examining are INTSINT (see Hirst 1991) and TSM — tonetic stress marks (see Knowles et al. 1996).”

LEECH, G. (2005) “Adding linguistic annotation”, in WYNNE, M. (Ed.) Developing Linguistic Corpora: a Guide to Good Practice. Oxford: Oxbow Books. pp. 17-29. http://www.ahds.ac.uk/creating/guides/linguistic-corpora/chapter2.htm




ToBI (Tones and Break Indices)

• Phonological representation based on the metrical autosegmental model

BECKMAN, M. E. - HIRSCHBERG, J. - SHATTUCK-HUFNAGEL, S. (2005) "The original ToBI system and the evolution of the ToBI framework", in JUN, S.-A. (Ed.), Prosodic Typology. The Phonology of Intonation and Phrasing. Oxford: Oxford University Press. pp. 9-54. http://www.ling.ohio-state.edu/~tobi/JunBook/BeckHirschShattuckToBI.pdf

http://www.ling.ohio-state.edu/~tobi/



ToBI (Tones and Break Indices)

• Orthographic tier

• Break index tier

• Tone tier: phrasal tones, pitch accents, boundary tones

• Miscellaneous tier






ToBI (Tones and Break Indices)

• Heavily dependent on a phonological model

• Needs adaptation for particular languages

• To some extent, it presupposes prior knowledge of the expected intonational phenomena



INTSINT (International Transcription System for Intonation)

Daniel Hirst, Laboratoire Parole et Langage, Université de Provence, Aix-en-Provence

CAMPIONE, E. - HIRST, D. - VÉRONIS, J. (2000) "Automatic stylisation and symbolic coding of F0: Implementations of the INTSINT model", in BOTINIS, A. (Ed.) Intonation: Analysis, Modelling and Technology. Dordrecht: Kluwer Academic Publishers. pp. 185-208. http://www.up.univ-mrs.fr/~veronis/pdf/2000Campione.pdf



INTSINT

• F0 detection

• Stylization with target points

• Coding with INTSINT labels

• Absolute tones

• Relative iterative tones

• Relative non-iterative tones



INTSINT

• Symbolic representation of the F0 contour in discrete categories

• Based on target points with values for time and F0 which are coded with INTSINT symbols

• Perceptual equivalence between the stylized and the actual melodic contour
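The step from target points (time, F0) to discrete symbols can be illustrated with a toy example. This is not the INTSINT coding procedure itself, only a simple threshold rule written for illustration: it labels the first target M and each later target H (higher), L (lower) or S (same) relative to the previous one. The function names, the 1-semitone threshold and the F0 values are all assumptions.

```python
import math


def semitones(f0_from: float, f0_to: float) -> float:
    """Interval between two F0 values (in Hz), expressed in semitones."""
    return 12 * math.log2(f0_to / f0_from)


def code_targets(targets, threshold=1.0):
    """Label (time, F0) targets as M (first), then H, L or S.

    `threshold` (in semitones) is an arbitrary illustrative value.
    """
    labels = []
    previous_f0 = None
    for time, f0 in targets:
        if previous_f0 is None:
            label = "M"
        else:
            step = semitones(previous_f0, f0)
            if step > threshold:
                label = "H"
            elif step < -threshold:
                label = "L"
            else:
                label = "S"
        labels.append((time, label))
        previous_f0 = f0
    return labels


# Invented target points: (time in seconds, F0 in Hz).
targets = [(0.10, 120.0), (0.35, 180.0), (0.60, 178.0), (0.90, 100.0)]
print(code_targets(targets))
# prints [(0.1, 'M'), (0.35, 'H'), (0.6, 'S'), (0.9, 'L')]
```

Working in semitones rather than Hz keeps the rule comparable across speakers with different pitch ranges, which is the kind of normalisation a symbolic coding needs.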



INTSINT

• Praat implementation

C. Auran, Laboratoire Parole et Langage, Université de Provence: http://www.lpl.univ-aix.fr/~auran/english/ressources.html

G. Rolland, Institut de la Communication Parlée, Grenoble: http://www.icp.inpg.fr/~loeven/Praat/momel_english.html









Final remarks

• Orthographic representation

• Standards for encoding: TEI (Text Encoding Initiative)

• Different transcription/transliteration practices


Final remarks

• Segmental annotation

• Choice of annotation levels

• Standard for transcription symbols: IPA and computer-readable equivalents (SAMPA, X-SAMPA)


Final remarks

• Suprasegmental annotation

• Different systems for different approaches to annotation: phonological (ToBI) or phonetic (INTSINT)

• Not in conflict, but complementary


Final remarks

• Choices in annotation depend on the research objectives

• Be eclectic if needed, but ensure reusability

• Automatic conversion between systems

• Document everything

