The interface between model-theoretic and corpus-based semantics
Sebastian Pado
Natural language semantics
• Model-theoretic semantics
  – Compositional calculation of sentence meaning
  – Formal descriptions of ambiguities
  – Inference
• Corpus-based semantics
  – Distributional, graded meaning representation
  – Probabilistic knowledge acquisition from corpora
  – Prediction of linguistic behaviour based on context
Complementary benefits
How to divide work between the approaches?
Model-theoretic semantics
• Good for sentence level (closed word classes)
• Limited coverage
• Correct
Corpus-based semantics
• Good for lexical level (open word classes)
• High coverage, robustness
• Approximative
Strategies
1. More expressive representations for corpus-based models of meaning: compositionality in vector spaces
   – Ongoing collaboration with Katrin Erk (Dept. of Linguistics, U. Texas at Austin)
2. Corpus-based methods for enrichment of formal meaning representations
   – Core of SFB project proposal
Strategy 1
More expressive representations for corpus-based models of meaning
Compositionality in Vector Spaces
• Vector space: Representation of word meaning by context co-occurrences
• What is the representation of a phrase?
  – Centroid of two vectors?
  – No: the mode of combination must be taken into account
    • “a horse draws…”: pull
    • “draw a horse”: sketch
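A minimal sketch of why the centroid answer fails, assuming a tiny invented corpus of lemmatized "sentences" and raw co-occurrence counts as vector components (both are illustrative assumptions, not the setup of any particular model). Averaging is symmetric, so "a horse draws" and "draw a horse" receive exactly the same phrase vector, even though they call for different readings of draw.

```python
from collections import Counter

# Toy corpus, invented for illustration: each "sentence" is a bag of lemmas.
CORPUS = [
    ["horse", "draw", "cart", "pull"],
    ["horse", "pull", "wagon"],
    ["draw", "picture", "sketch", "pencil"],
    ["draw", "horse", "sketch", "paper"],
]


def cooccurrence_vector(target, corpus):
    """Sparse vector: how often each context lemma shares a sentence with `target`."""
    counts = Counter()
    for sentence in corpus:
        if target in sentence:
            counts.update(w for w in sentence if w != target)
    return counts


def centroid(u, v):
    """Component-wise mean of two sparse vectors (dict: dimension -> value)."""
    return {dim: (u.get(dim, 0) + v.get(dim, 0)) / 2 for dim in set(u) | set(v)}


horse = cooccurrence_vector("horse", CORPUS)
draw = cooccurrence_vector("draw", CORPUS)

# The centroid ignores which word is the subject and which is the object,
# so both phrases end up with exactly the same vector.
a_horse_draws = centroid(horse, draw)
draw_a_horse = centroid(draw, horse)
print(a_horse_draws == draw_a_horse)  # True
```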
A first step
• Structured vector space model [Erk & Pado 2008]
  – Covers Verb+Object and Verb+Subject combinations
  – Word meaning consists of a lexical vector plus selectional preferences (= expectations) for dependents/governors
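A minimal sketch of such a structured word meaning, assuming that each expectation vector is the count-weighted sum of the lexical vectors of fillers observed in a given relation. The three toy dimensions, the lexical vectors, the triples, and their counts are all invented for illustration; the exact weighting used by Erk & Pado (2008) may differ.

```python
# Hypothetical lexical vectors over three toy dimensions: [motion, depiction, animal].
LEX = {
    "draw": [0.5, 0.5, 0.0],
    "pull": [1.0, 0.0, 0.0],
    "sketch": [0.0, 1.0, 0.0],
    "picture": [0.0, 1.0, 0.0],
    "cart": [1.0, 0.0, 0.0],
    "horse": [0.5, 0.0, 1.0],
}

# Hypothetical corpus counts of (head, relation, dependent) triples.
TRIPLES = {
    ("draw", "obj", "picture"): 5,
    ("draw", "obj", "cart"): 2,
    ("pull", "obj", "cart"): 4,
    ("draw", "subj", "horse"): 3,
    ("sketch", "obj", "horse"): 1,
}

DIM = 3


def weighted_sum(pairs):
    """Sum of lexical vectors, each scaled by its corpus count."""
    total = [0.0] * DIM
    for word, count in pairs:
        for i, x in enumerate(LEX[word]):
            total[i] += count * x
    return total


def selectional_preference(head, rel):
    """Expectation of `head` about its dependents in relation `rel`."""
    return weighted_sum((d, c) for (h, r, d), c in TRIPLES.items() if h == head and r == rel)


def inverse_preference(dep, rel):
    """Expectation of `dep` about the heads that govern it via `rel`."""
    return weighted_sum((h, c) for (h, r, d), c in TRIPLES.items() if d == dep and r == rel)


def structured_meaning(word, relations=("subj", "obj")):
    """A word = lexical vector + one expectation vector per relation and direction."""
    return {
        "lex": LEX[word],
        "sel": {r: selectional_preference(word, r) for r in relations},
        "inv": {r: inverse_preference(word, r) for r in relations},
    }


draw = structured_meaning("draw")
print(draw["sel"]["obj"])  # [2.0, 5.0, 0.0]: expected objects lean towards depiction
```

Keeping the expectations separate from the lexical vector is what later allows the same word to be filtered differently depending on the relation it enters.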
A first step
• Structured vector space model [Erk & Pado 2008]
  – Covers Verb+Object, Verb+Subject combinations
  – Phrase meaning consists of two vectors:
    • Verb meaning modified by the noun's expectations about its governor
    • Noun meaning modified by the verb's expectations about its dependent
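The two-vector phrase meaning can be sketched as follows, assuming component-wise multiplication as the operation that lets expectations filter a lexical vector (one plausible choice, stated here as an assumption). All vectors and counts are invented. The point is that "a horse draws" and "draw a horse" now yield different vectors for draw, because the noun contributes a different expectation for each relation.

```python
# Hypothetical lexical vectors over toy dimensions [motion, depiction, animal].
LEX = {
    "draw": [0.5, 0.5, 0.0],
    "pull": [1.0, 0.0, 0.0],
    "sketch": [0.0, 1.0, 0.0],
    "picture": [0.0, 1.0, 0.0],
    "horse": [0.5, 0.25, 1.0],
}

# Hypothetical counts of (head, relation, dependent) triples.
TRIPLES = {
    ("pull", "subj", "horse"): 4,
    ("sketch", "obj", "horse"): 2,
    ("draw", "subj", "horse"): 3,
    ("draw", "obj", "picture"): 5,
}


def weighted_sum(pairs):
    """Count-weighted sum of lexical vectors."""
    total = [0.0] * 3
    for word, count in pairs:
        total = [t + count * x for t, x in zip(total, LEX[word])]
    return total


def selectional_preference(head, rel):
    """What `head` expects of its dependent in relation `rel`."""
    return weighted_sum((d, c) for (h, r, d), c in TRIPLES.items() if h == head and r == rel)


def inverse_preference(dep, rel):
    """What `dep` expects of a head that governs it via `rel`."""
    return weighted_sum((h, c) for (h, r, d), c in TRIPLES.items() if d == dep and r == rel)


def hadamard(u, v):
    """Component-wise product: keeps the dimensions that both vectors support."""
    return [a * b for a, b in zip(u, v)]


def compose(verb, rel, noun):
    """Phrase meaning as a pair of contextualized vectors (verb', noun')."""
    verb_in_context = hadamard(LEX[verb], inverse_preference(noun, rel))
    noun_in_context = hadamard(LEX[noun], selectional_preference(verb, rel))
    return verb_in_context, noun_in_context


print(compose("draw", "subj", "horse"))  # "a horse draws": verb' leans towards motion
print(compose("draw", "obj", "horse"))   # "draw a horse": verb' leans towards depiction
```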
Current state
• Evaluation: Better distinction between contextually appropriate and inappropriate paraphrases (WSD-style task)
• Further research questions
  – Generalisation to longer phrases
    • More expressive model of expectations
  – Modelling of phrases involving closed word classes
    • E.g. negation
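A WSD-style evaluation of this kind can be sketched as ranking candidate paraphrases by cosine similarity to the contextualized verb vector and checking whether the contextually appropriate one comes first. The paraphrase vectors, the contextualized vectors for draw, and the gold labels below are hypothetical toy values, not the data or results of the actual evaluation.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_paraphrases(context_vector, candidates):
    """Candidate paraphrases sorted from most to least similar to the context vector."""
    return sorted(candidates, key=lambda w: cosine(context_vector, candidates[w]), reverse=True)


# Hypothetical paraphrase vectors over toy dimensions [motion, depiction, animal].
PARAPHRASES = {"pull": [1.0, 0.0, 0.0], "sketch": [0.0, 1.0, 0.0]}

# Hypothetical contextualized vectors for "draw" and the appropriate paraphrase for each.
ITEMS = [
    ([2.75, 0.75, 0.0], "pull"),   # "a horse draws"
    ([0.0, 1.0, 0.0], "sketch"),   # "draw a horse"
]

accuracy = sum(rank_paraphrases(v, PARAPHRASES)[0] == gold for v, gold in ITEMS) / len(ITEMS)
print(accuracy)  # 1.0 on this toy data
```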
Strategy 2
Corpus-based methods for enrichment of formal meaning representations
Formal models of meaning in context
• Lexicon entries cannot provide the full range of readings for words/phrases
  – Readings are often productively negotiated in text
  – Type/sort conflict
• Examples:
  – Metonymy/Metaphor
  – Telic adjectives (“fast typist”)
  – Coercion/Reinterpretation
Example: Coercion
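• Wegen einer 15-jährigen kam es zu einem Streit, in dessen Verlauf sie verletzt wurde.
  (“Because of a 15-year-old girl, a fight broke out, in the course of which she was injured.”)
• […] Sie hatte sich mit einem 21-jährigen unterhalten.
  (“[…] She had been talking with a 21-year-old.”)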
• The red and blue expressions corefer, but the red expression has the wrong type (wegen takes <e,t>; the expression is of type <e>).
• Here, the context overtly provides the missing event
• Often, this is not the case: the operator must be recovered from general knowledge
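The resolution step can be sketched as function application that detects the type conflict and, if a reinterpretation operator of the right type is available, inserts it between functor and argument. This is a toy illustration under stated assumptions: types are encoded as nested tuples, the result type <t,t> for wegen and the operator name TALK_WITH_21_YEAR_OLD are hypothetical, and the operator is simply handed in rather than recovered from context or general knowledge.

```python
from dataclasses import dataclass

E, T = "e", "t"


def fn(arg, result):
    """Functional type <arg, result>."""
    return (arg, result)


@dataclass
class Expr:
    text: str
    semtype: object


def apply(functor, arg, operator=None):
    """Function application; on a type conflict, try to insert a reinterpretation operator."""
    expected, result = functor.semtype
    if arg.semtype == expected:
        return Expr(f"{functor.text}({arg.text})", result)
    if operator is not None and operator.semtype == fn(arg.semtype, expected):
        # Coercion: the operator maps the argument to the type the functor expects.
        coerced = f"{operator.text}({arg.text})"
        return Expr(f"{functor.text}({coerced})", result)
    raise TypeError(f"type conflict: {functor.text} expects {expected}, got {arg.semtype}")


# Per the slide, wegen takes <e,t>; the result type <t,t> is an assumption.
wegen = Expr("wegen", fn(fn(E, T), fn(T, T)))
girl = Expr("15-year-old", E)
# Hypothetical operator read off the context ("sich mit einem 21-jährigen unterhalten").
talk_with_21 = Expr("TALK_WITH_21_YEAR_OLD", fn(E, fn(E, T)))

try:
    apply(wegen, girl)
except TypeError as err:
    print(err)  # reports the type conflict

print(apply(wegen, girl, talk_with_21).text)
# wegen(TALK_WITH_21_YEAR_OLD(15-year-old))
```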
The role of corpus methods
• Acquisition of general reinterpretation operators from corpora
• Recovery/prediction of operators for instances with type/sort conflict
  – Making implicit meaning explicit: can be seen as context-driven semantic specification
• Interest primarily empirical
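One simple way to acquire and predict an operator from a corpus, sketched here as an assumption rather than as the project's method, is to rank the events most frequently associated with the noun and pick the top one. For the telic adjective example “fast typist”, the missing operator is the activity a typist typically performs. The triple counts are invented, and the verb inflection is deliberately naive.

```python
from collections import Counter

# Invented counts of (verb, relation, noun) triples from a hypothetical parsed corpus.
COUNTS = Counter({
    ("type", "subj", "typist"): 40,
    ("sit", "subj", "typist"): 6,
    ("work", "subj", "typist"): 10,
    ("run", "subj", "runner"): 55,
    ("train", "subj", "runner"): 12,
})


def candidate_operators(noun, rel="subj", k=3):
    """Top-k verbs ranked by relative frequency P(verb | noun, rel)."""
    scores = Counter({v: c for (v, r, n), c in COUNTS.items() if n == noun and r == rel})
    total = sum(scores.values())
    return [(v, c / total) for v, c in scores.most_common(k)] if total else []


def reinterpret(adjective, noun):
    """Spell out the most likely implicit event, e.g. 'fast typist' -> 'typist who types fast'."""
    ranked = candidate_operators(noun)
    if not ranked:
        return None
    verb, _ = ranked[0]
    return f"{noun} who {verb}s {adjective}"  # naive 3rd-person inflection


print(reinterpret("fast", "typist"))  # typist who types fast
```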
Project Steps
• Creation of a multilingual corpus of type/sort conflict cases with human annotations
  – Informed by formal considerations
• Development of CL methods to predict operators for conflict resolution
• Ideally, task-based evaluation (to be determined)
• Consequences/insights for formal descriptions
Research Questions
• When can operators be found overtly in context, and when must general operators be recovered?
  – Influence of local discourse?
• CL methods for efficient and accurate prediction of operators
  – What linguistic levels are helpful? Semantic classes, semantic roles, dependency relations, …?
  – Focus on more than one language: Can bilingual processing help?
• What is the level of generality of acquired operators?
  – What shape do people’s expectations have?
  – Do people’s judgments of recovered operators agree?
• Can empirical results have an impact on formal descriptions?
  – E.g. do sort and type conflicts behave differently or similarly?
• Relation to work on textual entailment?
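The question of whether people's judgments of recovered operators agree could be quantified with a chance-corrected agreement score. Cohen's kappa for two annotators is one standard choice (an assumption here, since the project does not specify a measure). The operator labels below are hypothetical annotations for six invented conflict instances.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if each annotator labelled at random with their own label distribution.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical operator labels assigned by two annotators to six conflict instances.
annotator_1 = ["talk", "argue", "talk", "meet", "talk", "argue"]
annotator_2 = ["talk", "argue", "meet", "meet", "talk", "talk"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.478
```

Here raw agreement is 4/6, but since both annotators favour "talk", much of that agreement is expected by chance, which kappa discounts.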
Collaborations
• D1 (Representation of ambiguities)
  – Formal descriptions as information source for corpus development
  – Attempt to transfer empirical results back into theory
• B5 (Polysemy in a conceptual system)
  – Ontological information as knowledge source for CL operator models
  – Entailment as shared evaluation task
• Open for other ideas