

Source: people.cs.georgetown.edu/nschneid/p/psstcorpus-poster.pdf

A Corpus of Preposition Supersenses

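[Figure: frequency counts of preposition types and supersenses. Preposition types: to 592, of 517, in 505, for 496, with 316, on 238, at 234; from, about, like, by, after, as, back, and before are also shown, without legible counts. Supersenses: Location 596, Purpose 298, Theme 200, Time 162, Destination 153, Attribute 143. Five further counts (135, 134, 133, 133, 131) lost their labels in extraction.]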

Prepositional polysemy is rampant.

They ran to/DESTINATION the roof for/PURPOSE a quick escape.
They made for/DESTINATION the roof to/PURPOSE escape the cops.

• Preposition supersenses: semantic functions/thematic roles marked by English prepositions (Schneider et al., 2015)

• Comprehensive annotation: manually tagged all preposition tokens & types

• Analysis of distribution of semantic functions

• Corpus release

English preposition corpus @ http://tiny.cc/prepwiki

Linguists group together related functions with terms like “locative”. Building on Schneider et al. (LAW 2015), we apply a hierarchy of 75 supersenses for comprehensive annotation of English prepositions in a 55,000-word corpus of web reviews.

The English Web Treebank Reviews corpus had already been analyzed for comprehensive multiword expressions (Schneider et al. 2014) and noun and verb supersenses (Schneider & Smith 2015).

PrepWiki (Schneider et al. 2015)

In PrepWiki, example sentences are grouped by TPP sense but mapped individually to supersenses.

Nathan Schneider (University of Edinburgh/Georgetown University) · Jena D. Hwang (IHMC) · Vivek Srikumar (University of Utah)
Meredith Green · Abhijit Suresh · Kathryn Conger · Tim O’Gorman · Martha Palmer (University of Colorado at Boulder)

LAW 2016 • Berlin

Preposition types (114) · supersenses (63) · N = 4250

I had been a patient of/POSSESSOR Dr. Olbina for/DURATION 9 years and had spent thousands of/QUANTITY dollars on/THEME crowns etc .

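[Figure: two bar-chart panels titled NUMBERED and ARGM. Legend of PropBank function tags: ADV, CAU, COM, DIR, GOL, LOC, MNR, PAG, PPT, PRD, PRP, TMP, VSP. First panel (x-axis 0 to 150), supersenses: Theme, Destination, Location, Recipient, Topic, Purpose, Direction, Stimulus, Comparison/Contrast, State, Beneficiary, Agent, Co-Agent, ProfessionalAspect.]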

Function tags mapped to fewer than 20 supersense-tagged prepositions overall are not displayed. (This accounts for why the bars are not strictly decreasing in width.) Numbered arguments tagged with VPC are mapped to Direction in 8 instances. AM-LVB is mapped to Purpose in 8 instances, while AM-CXN is the dominant function tag mapped to Scalar/Rank (18 instances).

Distribution of PropBank function tags for the most frequent mapped supersenses (numbered args at left, AM at right).
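As a rough illustration of the display rule described above (function tags mapped to fewer than 20 supersense-tagged prepositions are hidden), the following sketch cross-tabulates tag/supersense pairs and drops rare tags. The function name `tabulate`, its `min_tag_total` parameter, and the toy data are assumptions made for this example; the poster does not describe the actual analysis code.

```python
from collections import Counter


def tabulate(pairs, min_tag_total=20):
    """Cross-tabulate (function_tag, supersense) pairs.

    Function tags whose overall count falls below min_tag_total are
    dropped, mirroring the display threshold stated in the caption.
    """
    pairs = list(pairs)
    # Overall frequency of each function tag across all supersenses.
    tag_totals = Counter(tag for tag, _ in pairs)
    table = {}
    for tag, supersense in pairs:
        if tag_totals[tag] < min_tag_total:
            continue  # rare tag: not displayed
        table.setdefault(supersense, Counter())[tag] += 1
    return table


# Toy data (hypothetical counts), with a lowered threshold for illustration.
toy = [("LOC", "Location")] * 3 + [("TMP", "Time")] * 2 + [("CXN", "Scalar/Rank")]
result = tabulate(toy, min_tag_total=2)
```

Because the threshold applies to a tag's overall total rather than to each supersense separately, a supersense whose only tags are rare disappears from the table entirely.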

[Figure, second panel (x-axis 0 to 200), supersenses: Purpose, Location, Time, RelativeTime, Explanation, Duration, DeicticTime, Attribute, Circumstance, Direction, Manner, Comparison/Contrast, Scalar/Rank, StartTime.]

PropBank analysis

Preposition supersenses

Annotation

• All sentences were independently annotated twice, then adjudicated by an expert annotator
• Original inter-annotator agreement (IAA) rates varied considerably; mostly 60%–78%
• IAA between experts: 88%
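The poster reports IAA as percentages but does not say how they were computed. Assuming simple token-level percent agreement (an assumption, not the authors' stated measure), a minimal sketch looks like this; the function name and the toy label sequences are invented for illustration.

```python
def percent_agreement(labels_a, labels_b):
    """Percentage of tokens on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same tokens")
    if not labels_a:
        raise ValueError("no tokens to compare")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


# Hypothetical labels from two annotators for the same five tokens.
annotator_1 = ["Location", "Purpose", "`i", "Time", "Theme"]
annotator_2 = ["Location", "Function", "`i", "Time", "Topic"]
agreement = percent_agreement(annotator_1, annotator_2)  # 3 of 5 tokens match
```

Raw agreement does not correct for chance; a chance-corrected statistic would give lower figures on the same data.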

Special (non-supersense) labels: `d = discourse, `i = infinitival complement


Identifying preposition tokens. TPP, and therefore PrepWiki, contains senses for canonical prepositions, i.e., those used transitively in the [PP P NP] construction. Taking inspiration from Pullum and Huddleston (2002), PrepWiki further assigns supersenses to spatiotemporal particle uses of out, up, away, together, etc., and subordinating uses of as, after, in, with, etc. (including infinitival to and infinitival-subject for, as in It took over 1.5 hours for our food to come out).⁷

Non-supersense labels. These are used where the preposition serves a special syntactic function not captured by the supersense inventory. The most frequent is `i, which applies only to infinitival to tokens that are not PURPOSE or FUNCTION adjuncts.⁸ The label `d applies to discourse expressions like On the other hand; the unqualified backtick (`) applies to miscellaneous cases such as infinitival-subject for and both prepositions in the as-as comparative construction (as wet as water; as much cake as you want).⁹

Multiword expressions. Figure 3 shows how prepositions can interact with multiword expressions (MWEs). An MWE may function holistically as a preposition: PrepWiki treats these as multiword prepositions. An idiomatic phrase may be headed by a preposition, in which case we assign it a preposition supersense or tag it as a discourse expression (`d: see the previous paragraph). Finally, a preposition may be embedded within an MWE (but not its head): we do not use a preposition supersense in this case, though the MWE as a whole may already be tagged with a verb supersense.
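The Figure 3 examples use an inline notation in which a label follows a slash (to/ENDSTATE) and underscores join the words of a strong multiword expression (Because_of/EXPLANATION). The sketch below reads that notation from a plain string. It is an illustration only: the function name is invented, the corpus release may use a different file format, and weak-MWE links written with "~" and gappy expressions are not handled.

```python
def parse_tagged(sentence):
    """Return (expression, label) pairs for every labeled token.

    The label is everything after the FIRST slash, so labels that
    themselves contain a slash (e.g. SCALAR/RANK) stay intact.
    """
    tagged = []
    for token in sentence.split():
        form, slash, label = token.partition("/")
        if not slash:
            continue  # token carries no label
        # Underscores join the words of a strong multiword expression.
        expression = " ".join(form.split("_"))
        tagged.append((expression, label))
    return tagged


example = "Because_of/EXPLANATION the ants I dropped them to/ENDSTATE a 3_star ."
parsed = parse_tagged(example)
```

Here `parsed` holds two pairs, one for the multiword preposition "Because of" and one for "to"; the unlabeled MWE 3_star is skipped because it has no slash.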

(4) Because_of/EXPLANATION the ants I dropped them to/ENDSTATE a 3_star .

(5) I was told to/`i take my coffee to_go/MANNER if I wanted to/`i finish it .

(6) With/ATTRIBUTE higher than/SCALAR/RANK average prices to_boot/`d !

(7) I worked~with/PROFESSIONALASPECT Sam_Mones who took_ great _care_of me .

Figure 3: Prepositions involved in multiword expressions. (4) Multiword preposition because of (others include in front of, due to, apart from, and other than). (5) PP idiom: the preposition supersense applies to the MWE as a whole. (6) Discourse PP idiom: instead of a supersense, expressions serving a discourse function are tagged as `d. (7) Preposition within a multiword expression: the expression is headed by a verb, so it receives a verb supersense (not shown) rather than a preposition supersense.

⁷ PrepWiki does not include subordinators/complementizers that cannot take NP complements: that, because, while, if, etc.

⁸ Because the word to is ambiguous between infinitival and prepositional usages, and because infinitivals, like PPs, can serve as PURPOSE or FUNCTION modifiers, we allow infinitival to to be so marked. E.g., a shoulder to cry on would qualify as FUNCTION. By contrast, I want/love/try to eat cookies and To love is to suffer would qualify as `i. See figure 1 for examples from the corpus.

⁹ Annotators used additional non-supersense labels to mark tokens that were incorrectly flagged as prepositions by our heuristics: e.g., price was way to high was marked as an adverb. We ignore these tokens for purposes of this paper.

Heuristics. The annotation tool uses heuristics to detect candidate preposition tokens in each sentence given its POS tagging and MWE annotation. A single-word expression is included if: (a) it is tagged as a verb particle (RP) or infinitival to (TO), or (b) it is tagged as a transitive preposition or subordinator (IN) or adverb (RB), and it is listed in PrepWiki (or the spelling variants list). A strong MWE instance is included if: (a) the MWE begins with a word that matches the single-word criteria (idiomatic PP), or (b) the MWE is listed in PrepWiki (multiword preposition).

Annotation task. Annotators proceeded sentence by sentence, working in a custom web interface (figure 4). For each token matched by the above heuristics, annotators filled in a text box with the contextually appropriate label. A dropdown menu showed the list of preposition supersenses and non-supersense labels, starting with labels known to be associated with the preposition being annotated. Hovering over a menu item would show example sentences to illustrate the usage in question, as well as a brief definition of the supersense. This preposition-specific rendering of the dropdown menu, supported by data from PrepWiki, was crucial to reducing the overhead of annotation (and annotator training) by focusing the annotator’s attention on the relevant categories/usages. New examples were added to PrepWiki as annotators spotted coverage gaps. The tool also showed the multiword expression annotation of the sentence, which could be modified if necessary to fit PrepWiki’s conventions for multiword prepositions.
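The candidate-detection heuristics described in this section can be sketched roughly as follows. This is not the annotation tool's code: the function names, the representation of PrepWiki as a set of lowercase strings, and the toy lexicon are all assumptions, and the spelling-variants list is simply folded into the same set.

```python
PARTICLE_TAGS = {"RP", "TO"}  # verb particle, infinitival "to"
LISTED_TAGS = {"IN", "RB"}    # preposition/subordinator, adverb


def is_single_word_candidate(word, pos, lexicon):
    """Criteria (a) and (b) for single-word expressions."""
    if pos in PARTICLE_TAGS:
        return True
    return pos in LISTED_TAGS and word.lower() in lexicon


def is_mwe_candidate(words, tags, lexicon):
    """Criteria (a) and (b) for a strong MWE given as parallel lists."""
    # (a) Idiomatic PP: the first word meets the single-word criteria.
    if is_single_word_candidate(words[0], tags[0], lexicon):
        return True
    # (b) Multiword preposition: the whole expression is listed.
    return " ".join(w.lower() for w in words) in lexicon


# Hypothetical stand-in for the PrepWiki listing.
toy_lexicon = {"to", "with", "of", "in", "out", "because of"}
```

With this toy lexicon, `is_mwe_candidate(["to", "go"], ["TO", "VB"], toy_lexicon)` returns True through criterion (a), while "Because of" qualifies only through criterion (b), since "because" alone is not listed.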

3.2 Quality Control

Annotators. Annotators were selected from undergraduate and graduate linguistics students at the University of Colorado at Boulder. All annotators had prior experience with semantic role labeling. Every sentence was independently annotated by two annotators, and disagreements were subse-


[Figure: the supersense hierarchy. The tree structure is not recoverable from the extraction; the 75 node labels are: StartState, Configuration, Circumstance, Temporal, Place, Whole, Elements, Possessor, Species, Instance, Quantity, Superset, Causer, Agent, Creator, Co-Agent, Explanation, Attribute, Manner, Reciprocation, Purpose, Function, Age, Time, Frequency, Duration, RelativeTime, EndTime, StartTime, ClockTimeCxn, DeicticTime, Path, Locus, Value, Comparison/Contrast, Scalar/Rank, ValueComparison, Approximator, Contour, Direction, Extent, Location, Source, State, Goal, InitialLocation, Material, Donor/Speaker, Destination, Recipient, EndState, Via, Traversed, 1DTrajectory, 2DArea, 3DMedium, Transit, Instrument, Patient, Co-Patient, Activity, Means, Course, Accompanier, Beneficiary, Theme, Co-Theme, Topic, ProfessionalAspect, Undergoer, Co-Participant, Affector, Participant, Experiencer, Stimulus.]