The interface between model-theoretic and corpus-based semantics Sebastian Pado

Post on 04-Jan-2016

Page 1

The interface between model-theoretic and corpus-based semantics

Sebastian Pado

Page 2

Natural language semantics

• Model-theoretic semantics
  – Compositional calculation of sentence meaning
  – Formal descriptions of ambiguities
  – Inference

• Corpus-based semantics
  – Distributional, graded meaning representation
  – Probabilistic knowledge acquisition from corpora
  – Prediction of linguistic behaviour based on context

Page 3

Complementary benefits

How to divide work between the approaches?

• Model-theoretic semantics
  – Good for sentence level (closed word classes)
  – Limited coverage
  – Correct

• Corpus-based semantics
  – Good for lexical level (open word classes)
  – High coverage, robustness
  – Approximate

Page 4

Strategies

1. More expressive representations for corpus-based models of meaning: Compositionality in vector spaces
   – Ongoing collaboration with Katrin Erk (Dept. of Linguistics, U. Texas at Austin)

2. Corpus-based methods for enrichment of formal meaning representations
   – Core of SFB project proposal

Page 5

Strategy 1

More expressive representations for corpus-based models of meaning

Page 6

Compositionality in Vector Spaces

• Vector space: Representation of word meaning by context co-occurrences

• What is the representation of a phrase?
  – Centroid of two vectors?
  – No: Must take mode of combination into account
    • “a horse draws…” : pull
    • “draw a horse” : sketch
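The objection to the centroid can be made concrete with a small sketch. The context dimensions and counts below are invented for illustration and do not come from any corpus; the point is only that averaging is symmetric and therefore blind to which word is the governor and which the dependent.

```python
# Toy co-occurrence vectors over three hypothetical context dimensions:
# (cart, pencil, stable). All counts are invented for illustration.
draw = [4.0, 5.0, 1.0]
horse = [3.0, 1.0, 6.0]


def centroid(u, v):
    """Component-wise average of two equal-length vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]


# "a horse draws ..." (horse as subject) vs. "draw a horse" (horse as object)
subject_reading = centroid(horse, draw)
object_reading = centroid(draw, horse)

# The centroid ignores the syntactic relation, so both phrases receive the
# same representation although the verb means "pull" in one and "sketch"
# in the other.
print(subject_reading == object_reading)  # True
```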

Page 7

A first step

• Structured vector space model [Erk & Pado 2008]
  – Covers Verb+Object, Verb+Subject combinations
  – Word meaning consists of lexical vector plus selectional preferences (=experiences) for dependents/governors

Page 8

A first step

• Structured vector space model [Erk & Pado 2008]
  – Covers Verb+Object, Verb+Subject combinations
  – Phrase meaning consists of two vectors:
    • Verb meaning modified by nominal expectations about governor
    • Noun meaning modified by verbal expectations about dependent
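A minimal sketch of this two-vector idea follows. It assumes that a lexical vector and an expectation vector are combined by component-wise multiplication; the context dimensions, the numbers, and the names `combine` and `phrase_meaning` are all hypothetical and are not taken from the published model.

```python
# Sketch of a structured-vector-space style combination.
# Assumption: lexical vectors and expectations are combined by
# component-wise multiplication. All numbers are invented.
# Hypothetical context dimensions: (pull, sketch, animal)


def combine(u, v):
    """Component-wise product of two equal-length vectors."""
    return [a * b for a, b in zip(u, v)]


# Lexical vectors
draw_lex = [5.0, 5.0, 1.0]
horse_lex = [2.0, 1.0, 6.0]

# What the verb "draw" expects of its dependents, per relation
draw_expects_dependent = {
    "subj": [1.0, 1.0, 4.0],
    "obj": [0.5, 2.0, 1.0],
}

# What the noun "horse" expects of its governors, per relation
horse_expects_governor = {
    "subj": [3.0, 0.5, 1.0],
    "obj": [0.5, 3.0, 1.0],
}


def phrase_meaning(relation):
    """Return (verb in context, noun in context) for draw + horse."""
    verb_in_context = combine(draw_lex, horse_expects_governor[relation])
    noun_in_context = combine(horse_lex, draw_expects_dependent[relation])
    return verb_in_context, noun_in_context


verb_subj, noun_subj = phrase_meaning("subj")
verb_obj, noun_obj = phrase_meaning("obj")
print(verb_subj)  # [15.0, 2.5, 1.0]
print(verb_obj)   # [2.5, 15.0, 1.0]
```

With these toy numbers, the verb vector shifts towards the "pull" dimension when "horse" is the subject and towards the "sketch" dimension when it is the object, which a centroid cannot express.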

Page 9

Current state

• Evaluation: Better distinction between contextually appropriate and inappropriate paraphrases (WSD-style task)

• Further research questions
  – Generalisation to longer phrases
    • More expressive model of expectations
  – Modelling of phrases involving closed word classes
    • E.g. Negation
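The paraphrase evaluation can be pictured as a ranking problem. The sketch below assumes that candidates are ranked by cosine similarity to the verb's vector in context; the vectors, the candidate set, and the choice of cosine are illustrative assumptions, not the actual experimental setup.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical context dimensions: (pull, sketch, animal). Invented values.
draw_in_context = [15.0, 2.5, 1.0]  # "draw" with "horse" as its subject
candidates = {
    "pull": [6.0, 0.5, 1.0],
    "sketch": [0.5, 6.0, 1.0],
}

# Rank paraphrase candidates by similarity to the contextualised verb.
ranking = sorted(
    candidates,
    key=lambda word: cosine(draw_in_context, candidates[word]),
    reverse=True,
)
print(ranking)  # ['pull', 'sketch']
```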

Page 10

Strategy 2

Corpus-based methods for enrichment of formal meaning representations

Page 11

Formal models of meaning in context

• Lexicon entries cannot provide the full range of readings for words/phrases
  – Readings often productively negotiated in text
  – Type/sort conflict

• Examples:
  – Metonymy/Metaphor
  – Telic adjectives (“fast typist”)
  – Coercion/Reinterpretation

Page 12

Example: Coercion

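• Wegen einer 15-jährigen kam es zu einem Streit, in dessen Verlauf sie verletzt wurde. (“Because of a 15-year-old girl, a fight broke out, in the course of which she was injured.”)

• […] Sie hatte sich mit einem 21-jährigen unterhalten. (“[…] She had been talking to a 21-year-old man.”)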

• Red and blue expressions are coreferring, but red expression has wrong type (wegen takes <e,t>; expression is <e>).

• Here, context overtly provides missing event

• Often, this is not the case: Operator must be recovered from general knowledge
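The two cases above (event found in context vs. operator recovered from general knowledge) can be sketched as a simple fallback. The type labels, the function name `resolve_complement`, and the default operator are hypothetical simplifications, not a formal analysis.

```python
# Toy sketch: a preposition such as "wegen" selects an event-denoting
# complement; an entity-denoting complement triggers reinterpretation.
# Type labels and the default operator are hypothetical.
EVENT, ENTITY = "event", "entity"


def resolve_complement(complement_type, context_events, default_operator):
    """Return (recovered event or operator, where it came from)."""
    if complement_type == EVENT:
        # Types match: nothing to recover.
        return None, "no conflict"
    if context_events:
        # Context overtly provides the missing event.
        return context_events[0], "context"
    # Otherwise the operator must be recovered from general knowledge.
    return default_operator, "general knowledge"


print(resolve_complement(ENTITY, ["talk_to(girl, man)"], "involve"))
# ('talk_to(girl, man)', 'context')
print(resolve_complement(ENTITY, [], "involve"))
# ('involve', 'general knowledge')
```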

Page 13

The role of corpus methods

• Acquisition of general reinterpretation operators from corpora

• Recovery/prediction of operators for instances with type/sort conflict
  – Making implicit meaning explicit: can be seen as context-driven semantic specification

• Interest primarily empirical
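One simple way to picture acquisition and prediction is relative-frequency estimation over instances where the event is spelled out. The triples below are invented stand-ins for corpus observations, and `acquire_operators` and `predict_operator` are hypothetical names; the project itself leaves the actual method open.

```python
from collections import Counter, defaultdict

# Invented observations: (trigger, noun, overt event verb) triples, standing
# in for corpus instances in which the event is made explicit.
observations = [
    ("begin", "book", "read"),
    ("begin", "book", "read"),
    ("begin", "book", "write"),
    ("begin", "sandwich", "eat"),
    ("fast", "typist", "type"),
]


def acquire_operators(triples):
    """Count which event verbs occur with each (trigger, noun) pair."""
    table = defaultdict(Counter)
    for trigger, noun, verb in triples:
        table[(trigger, noun)][verb] += 1
    return table


def predict_operator(table, trigger, noun):
    """Return the most frequent operator and its relative frequency."""
    counts = table.get((trigger, noun))
    if not counts:
        return None  # unseen pair: no prediction
    verb, freq = counts.most_common(1)[0]
    return verb, freq / sum(counts.values())


operators = acquire_operators(observations)
print(predict_operator(operators, "begin", "book"))
print(predict_operator(operators, "fast", "typist"))  # ('type', 1.0)
```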

Page 14

Project Steps

• Creation of multilingual corpus of type/sort conflict cases with human annotations
  – Informed by formal considerations

• Development of CL methods to predict operators for conflict resolution

• Ideally, task-based evaluation (to be determined)

• Consequences/insights for formal descriptions

Page 15

Research Questions

• When can operators be found overtly in context; when must general operators be recovered?
  – Influence of local discourse?

• CL methods for efficient and accurate prediction of operators
  – What linguistic levels are helpful? Semantic classes, semantic roles, dependency relations, …?
  – Focus on more than one language: Can bilingual processing help?

• What is the level of generality of acquired operators?
  – What shape do people’s expectations have?
  – Do people’s judgments of recovered operators agree?

• Can empirical results have impact on formal descriptions?
  – E.g. do sort and type conflicts behave differently or similarly?

• Relation to work on textual entailment?
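The question of whether judgments agree is typically made measurable with a chance-corrected agreement coefficient. The sketch below uses Cohen's kappa for two annotators as one common choice; the source does not say which measure would be used, and the labels are invented.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Undefined (division by zero) if expected agreement is exactly 1.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented operator judgments for four conflict instances
annotator_1 = ["read", "read", "write", "eat"]
annotator_2 = ["read", "write", "write", "eat"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.636
```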

Page 16

Collaborations

• D1 (Representation of ambiguities)
  – Formal descriptions as information source for corpus development
  – Attempt to transfer empirical results back into theory

• B5 (Polysemy in a conceptual system)
  – Ontological information as knowledge source for CL operator models
  – Entailment as shared evaluation task

• Open for other ideas