Post on 21-Dec-2015
![Page 1: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/1.jpg)
A Corpus Search Methodology for Focus Realization
Jonathan Howell and Mats Rooth
Linguistics and CIS
Cornell University
![Page 2: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/2.jpg)
Goals
Study phonetic realization of focus in cases where formal-semantic theories make clear predictions.
Natural data from podcasts, radio, etc.
Find data using speech search engine based on speech recognition (Everyzing)
Automate all of the workflow
Today: preliminary data from pilot
![Page 3: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/3.jpg)
he stayed longer than I did
-er [[ he stayed x long ]2
than [ IF stayed x long ]~2]
[ y stayed x-long ] antecedent clause
[ speaker stayed x-long ] scope of focus
![Page 4: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/4.jpg)
… I should have liked that song a lot more than I did.
[more
x[[should w[ I like that song x well in w]]
than [I like that song x well in w0]]]
![Page 5: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/5.jpg)
I understand even less than I did before
even less [[ I prs understand x much]2
than [ I understood x much beforeF ]~2]
![Page 6: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/6.jpg)
Focus in comparative clauses
• Coherent syntactic-semantic theory about where focus should go
• Possibilities are constrained, because the main clause is usually the antecedent for focus interpretation in the comparative clause
• On a theoretical basis, we often think we know the correct grammatical analysis of sentences people use
![Page 7: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/7.jpg)
![Page 8: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/8.jpg)
![Page 9: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/9.jpg)
![Page 10: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/10.jpg)
![Page 11: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/11.jpg)
![Page 12: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/12.jpg)
![Page 13: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/13.jpg)
![Page 14: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/14.jpg)
Result
Hundreds of tokens that form a minimal set varying the position of focus
Speech files for short and 10-second intervals spanning “than I did”
The Everyzing HTML contains time offsets for the beginnings of words. A program converts these into a Praat representation.
Alignments are not good enough to use without correction.
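The conversion step can be sketched roughly as follows. This is a minimal sketch, not the authors' program. It assumes the word-start offsets have already been pulled out of the Everyzing HTML (whose layout is not described here) as (start, word) pairs in seconds, and that each word ends where the next one begins. The function names `offsets_to_intervals` and `to_textgrid` are hypothetical; the output follows Praat's long TextGrid text format.

```python
def offsets_to_intervals(words, duration):
    """Turn sorted (start_seconds, label) pairs into (xmin, xmax, text) intervals.

    Assumption: a word lasts until the next word starts; the last word
    lasts until the end of the snippet.
    """
    if not words:
        # Praat interval tiers need at least one interval.
        return [(0.0, duration, "")]
    intervals = []
    if words[0][0] > 0.0:
        # Pad the stretch before the first word with an empty interval.
        intervals.append((0.0, words[0][0], ""))
    for i, (start, label) in enumerate(words):
        end = words[i + 1][0] if i + 1 < len(words) else duration
        intervals.append((start, end, label))
    return intervals


def to_textgrid(intervals, duration, tier_name="words"):
    """Serialize one interval tier in Praat's long TextGrid text format."""
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        '',
        'xmin = 0',
        f'xmax = {duration}',
        'tiers? <exists>',
        'size = 1',
        'item []:',
        '    item [1]:',
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        '        xmin = 0',
        f'        xmax = {duration}',
        f'        intervals: size = {len(intervals)}',
    ]
    for n, (xmin, xmax, text) in enumerate(intervals, start=1):
        escaped = text.replace('"', '""')  # Praat doubles quotes inside strings
        lines += [
            f'        intervals [{n}]:',
            f'            xmin = {xmin}',
            f'            xmax = {xmax}',
            f'            text = "{escaped}"',
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Invented offsets for illustration only.
    words = [(0.4, "than"), (0.6, "I"), (0.7, "did")]
    print(to_textgrid(offsets_to_intervals(words, 1.0), 1.0))
```

Because the intervals come only from word onsets, the resulting boundaries would still need hand correction in Praat.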
![Page 15: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/15.jpg)
Classification
Listen to sound snippet to determine if there is an actual token of “than I did”.
True in 56% of cases in a sample of 179 tokens.
![Page 16: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/16.jpg)
Classify correct tokens into three grammatical-semantic classes
s: Comparing the than-clause and the main clause, reference varies in the position of “I”. This licenses focus on the subject “I”.
[ he looked younger than I did. ]
21/40 tokens
![Page 17: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/17.jpg)
d: Comparing the than-clause and the main clause, reference is constant in the position of “I” but varies in the possible-world or temporal index of “did”, and not in any following position.
Depending on details of the representation of modality and time, this could license a focus on “did”.
5/40 tokens
![Page 18: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/18.jpg)
f: Comparing the than-clause and the main clause, reference is constant in the position of “I” but varies in some position following “did”, often a temporal phrase.
I actually look younger now than I did 5 years ago
13/40 tokens
![Page 19: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/19.jpg)
![Page 20: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/20.jpg)
![Page 21: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/21.jpg)
Mark vowel intervals in “I” and “did” by hand.
Both pitch and duration in the vowel region contribute positively to the area under the pitch curve (the definite integral of pitch).
Number of glottal pulses in the vowel region.
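The two measures above can be illustrated with a short sketch. It assumes a pitch track is available as parallel lists of frame times (seconds) and F0 values (Hz, with `None` for unvoiced frames), plus a list of glottal pulse times, all of which would come from a tool such as Praat. The function names `pitch_area` and `pulse_count` are hypothetical, and the trapezoidal rule is one plausible way to approximate the integral, not necessarily the one used for the slides.

```python
def pitch_area(times, f0, start, end):
    """Approximate the integral of F0 over [start, end] with the trapezoidal rule.

    Only pairs of adjacent frames that are both voiced and both inside the
    interval contribute, so unvoiced gaps add nothing to the area.
    """
    points = [(t, f) for t, f in zip(times, f0) if start <= t <= end]
    area = 0.0
    for (t_a, f_a), (t_b, f_b) in zip(points, points[1:]):
        if f_a is not None and f_b is not None:
            area += 0.5 * (f_a + f_b) * (t_b - t_a)
    return area


def pulse_count(pulse_times, start, end):
    """Count glottal pulses whose time stamp falls inside [start, end]."""
    return sum(1 for t in pulse_times if start <= t <= end)


if __name__ == "__main__":
    # Invented values: a flat 100 Hz track sampled every 10 ms.
    times = [0.00, 0.01, 0.02, 0.03]
    f0 = [100.0, 100.0, 100.0, 100.0]
    print(pitch_area(times, f0, 0.0, 0.03))
    print(pulse_count([0.005, 0.015, 0.025, 0.05], 0.0, 0.03))
```

Since frequency is cycles per second, integrating F0 over time yields a number of cycles, which is why the area measure and the pulse count track each other: higher pitch and longer duration both raise them.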
![Page 22: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/22.jpg)
![Page 23: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/23.jpg)
![Page 24: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/24.jpg)
![Page 25: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/25.jpg)
NLP vs. Acoustic Phonetics
Classification based on signal
NLP classifier based on correct sentence (or speech recognition output), using parsing and machine learning on text features
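As a rough illustration of what text features could look like, the sketch below is a toy rule-based baseline over the three classes s, d, and f. It is a stand-in for, not an implementation of, a parser plus machine-learning classifier. It assumes a correct transcript of a single sentence, and it uses two crude surface cues: whether “I” occurs before “than I did”, and whether any words follow it. The function name `classify_than_i_did` is hypothetical.

```python
import re


def classify_than_i_did(sentence):
    """Guess the class (s, d, or f) of a sentence containing "than I did".

    Heuristic assumptions, not a real syntactic analysis:
      s  no earlier "I", so the subject is taken to differ between clauses
      f  an earlier "I" and material after "did"
      d  an earlier "I" and nothing after "did"
    Returns None when the string "than I did" is not found.
    """
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    target = ["than", "i", "did"]
    for idx in range(len(tokens) - 2):
        if tokens[idx:idx + 3] == target:
            break
    else:
        return None
    before, after = tokens[:idx], tokens[idx + 3:]
    # Treat contractions such as "i'm" or "i've" as occurrences of "I".
    has_earlier_i = any(tok.split("'")[0] == "i" for tok in before)
    if not has_earlier_i:
        return "s"
    return "f" if after else "d"


if __name__ == "__main__":
    for text in [
        "he looked younger than I did.",
        "I actually look younger now than I did 5 years ago",
        "I should have liked that song a lot more than I did.",
    ]:
        print(classify_than_i_did(text), "|", text)
```

A surface heuristic like this will misclassify many real tokens (for example, when an earlier “I” sits in an unrelated clause), which is the kind of case where parsing and learned features would be expected to help.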
![Page 26: A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022032522/56649d6c5503460f94a4c189/html5/thumbnails/26.jpg)
Multiple focus
Issues: marking of multiple foci with different scopes, and the prominence of focus relative to accents that do not mark focus.
You made a very small amount more than I did. Now I make muchF more than youF do.