
    A corpus-based approach to argument structure

    José M. García-Miguel

    Universidade de Vigo

    2007 RRG International Conference (México)
    RRG 2007 García-Miguel: Corpus-based approach to argument structure

    Goals

    To present some possibilities offered by a syntactic-semantic database (ADESSE) for the study of the interactions of verbs and constructions in Spanish

    To introduce the main criteria used in building such a database

    To give an overview of some general features of argument structure in Spanish

    To compare both the approach and the results of ADESSE with some insightful proposals of RRG theory

    Functional Grammar(s)

    This talk is not specifically about RRG, but it takes as background many ideas shared by most functionalists

    Functionalists share the idea that grammar associates forms with meanings and discourse functions: language is a system of communicative social action in which grammatical structures are employed to express meaning in context (Van Valin 2005: 1)

    They differ with respect to (among other things):
    - Standards of adequacy (typological, psychological, ...)
    - How they conceive the relation between system and use
    - The role of formalization and the specifics of the formalism

    Moderate functionalists (RRG) vs. extreme functionalists

    (A simplified model of) Functional grammar

    [Diagram with the labels: LEXICON; CONSTRUCTIONAL SCHEMAS; LANGUAGE USE; 'entrenchment' (frequency, memory)]

    Variation and Grammar: the 'emergentist' view

    "Grammar is built up from specific instances of use which marry lexical items with constructions; it is routinized and entrenched by repetition and schematized by the categorization of exemplars" (Bybee 2006)

    "Grammar is not fixed and absolute with a little variation sprinkled on the top, but it is variable and probabilistic to its very core" (Bybee & Hopper 2001)

    Corpus linguistics

    Nowadays, studying language use and the frequency of words and constructions means doing corpus linguistics

    Computers have made it possible to search large bodies of real text quickly, and they make the task of analysing, annotating and storing linguistic data easier

    Some linguists (e.g. Butler) think that functional linguistics should be not only corpus-based but corpus-driven

    Corpus linguistics

    Some problems:

    The words and patterns that we find in the corpus should not be confused with the words and patterns that are possible in the language

    A corpus cannot tell us what is not possible

    Every (fragment in the) corpus needs some analysis and interpretation by the linguist

    "The conclusion is that 'intuition-based' linguists and 'corpus-based' linguists need each other. Or better, that the two kinds of linguists, wherever possible, should exist in the same body" (Fillmore 1992: 35)

    Corpus linguistics and syntax

    "Corpus linguistic research has been largely limited to phenomena that can be accessed via searches on particular words (...) However, a (theoretical) syntactician is usually interested in more abstract structural properties that cannot be investigated easily in this way" (Manning 2003: 294)

    That is, a Google search of raw text is not always enough, nor are morphosyntactically annotated corpora (or texts accompanied by interlinearized glosses)

    We need detailed syntactic and semantic annotation of corpora

    ADESSE

    ADESSE = Base de datos de verbos, Alternancias de Diátesis y Esquemas Sintáctico-Semánticos del Español [Syntactic Database of Verbs, Diathesis Alternations and Constructional Schemas of Spanish]

    http://adesse.uvigo.es/

    An ongoing project funded by the Spanish MEC and EU funds

    Goal

    A database with syntactic and semantic information for all the verbs and clauses in a corpus of Spanish

    ADESSE antecedent

    BDS: Base de Datos Sintácticos del español actual
    http://www.bds.usc.es/

    A database with the (manual) syntactic analysis of 159,000 clauses of the ARTHUS corpus

    ARTHUS corpus (Archivo de Textos Hispánicos de la Universidad de Santiago de Compostela)

    - 1.5 million words
    - Textual genres: narrative (37%), spoken (19%), essay (17%), theater (15%), journalistic (12%)
    - Origin: Spain (79%), Americas (21%)

    From BDS to ADESSE

    BDS [USC 1990-1999]
    - Syntactic information: grammatical features of the clauses, verbs and arguments of the corpus
    - Each record (clause): 64 fields

    ADESSE [Univ. of Vigo 2002- ]
    - All the syntactic information from BDS
    - + Semantic information
      - Verb senses
      - Verb classes
      - Semantic roles

    159,000 clauses
    3,500 verbs
    13,500 valency patterns

    (Part of) a record in ADESSE

    Field            Value
    TEXT             Al novio le regalaron un automóvil convertible [CRO: 44, 1]
                     'They gave the bridegroom a convertible car as a gift'
    PRED             REGALAR
    Verb class       Transfer of possession
    Voice            Active
    Synt Schema      S D I
    Order            I V D

                     A0        A1           A2
    Synt Function    Subj      DObj         IObj
    Agreement        3pl                    le
    Synt Category              NP           NP
    Sem Role         Donor     Possession   Receiver
    [...]                      automóvil    novio

    BDS/ADESSE

    Other grammatical features

    Clause:
    - Clause type (main, subordinate, ...)
    - Mood
    - Tense
    - Modal and phase auxiliaries
    - Negation
    - Illocutionary force
    - Voice

    Arguments:
    - Definiteness
    - Number
    - Person

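    RRG representation and ADESSE features

    [Figure: RRG layered structure (SENTENCE, CLAUSE, CORE, NUC, PRED, with NP arguments and AGR) for "Al novio le regalaron un automóvil", linked to the logical structure [do' (3pl, ...)] CAUSE [BECOME have' (novio, automóvil)], with the macroroles ACTOR and UNDERGOER plus an NMR; voice: ACTIVE, PSA: NOM, a: DAT]

    ADESSE features shown next to the RRG representation:
    - Cat: < NP, NP >
    - Schema (Synt Funct): < S D I >
    - Order: < I V S >
    - Agr: < le, S3pl >
    - Sem roles: < 0: donor, ... >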

    The BDS/ADESSE database aims to be theory-neutral

    it only assumes common Basic Linguistic Theory (in the sense proposed by Dixon)

    but it is fairly compatible with functional and constructional grammars

    the approach aims to correct or complement basic linguistic theory (or theories) in the light of corpus evidence

    Basic strategies: Verbs and arguments in ADESSE

    BDS provides a syntactic characterization of arguments and constructions

    ESCRIBIR 'write'
    - Subj DO IO
    - Subj DO

    SUSTITUIR 'substitute, replace'
    - Subj DO por NP
    - Subj DO

    ENSEÑAR
    - Subj DO IO
    - Subj DO a Inf

    Verbs and arguments in ADESSE

    In many cases, each syntactic construction selects a subset of the potential participants of the scene evoked by the verb

    a) Juan [0] le escribió una carta [1] a su madre [2] sobre sus recuerdos de infancia [3]
       'John wrote a letter to his mother about his childhood memories'

    b) Juan [0] escribió una carta [1]
       'John wrote a letter'

    c) Juan [0] le escribió a su madre [2]
       'John wrote to his mother'

    The task in ADESSE is to annotate which of the potential participants is selected in each syntactic schema
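    The annotation task just described can be pictured with a small data structure. The following is only a minimal sketch, not the actual ADESSE record format: the class name, the function label "Obl(sobre)" and the role labels attached to the indices [0]-[3] are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Valency potential of ESCRIBIR 'write': participant index -> role label.
# The indices follow examples (a)-(c); the role labels are illustrative.
ESCRIBIR_POTENTIAL: Dict[int, str] = {0: "Writer", 1: "Text", 2: "Receiver", 3: "Topic"}


@dataclass
class AnnotatedClause:
    """A corpus clause with the participant selected by each syntactic function."""
    text: str
    selection: Dict[str, int]  # syntactic function -> participant index

    def selected(self) -> Set[int]:
        """Indices of the participants expressed in this clause."""
        return set(self.selection.values())


def unexpressed(clause: AnnotatedClause, potential: Dict[int, str]) -> Dict[int, str]:
    """Potential participants that the clause's schema leaves out."""
    return {i: role for i, role in potential.items() if i not in clause.selected()}


# Function labels such as "Obl(sobre)" are hypothetical, not ADESSE field values.
CLAUSES = [
    AnnotatedClause(
        "Juan le escribió una carta a su madre sobre sus recuerdos de infancia",
        {"Subj": 0, "DObj": 1, "IObj": 2, "Obl(sobre)": 3},
    ),
    AnnotatedClause("Juan escribió una carta", {"Subj": 0, "DObj": 1}),
    AnnotatedClause("Juan le escribió a su madre", {"Subj": 0, "IObj": 2}),
]

for clause in CLAUSES:
    print(sorted(clause.selected()), unexpressed(clause, ESCRIBIR_POTENTIAL))
```

    Keeping the potential on the verb entry and the selection on each clause mirrors the split between what a verb can take and what a given schema actually expresses.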

    Verbs and arguments in ADESSE

    The same syntactic construction can be mapped onto different configurations of semantic arguments

    Sustituir 'replace'

    a) Deco [1] sustituyó a Xavi [2]
       'Deco replaced Xavi'

    b) Rijkaard [0] sustituyó a Xavi [2]
       'Rijkaard replaced Xavi'

    c) Rijkaard [0] sustituyó a Xavi [2] por Deco [1]
       'Rijkaard replaced Xavi with Deco'

    The same set of semantic arguments can be linked to different syntactic patterns

    Enseñar 'teach'

    a) Ella [0] le [1] enseñaba su idioma [2]
       'She taught him her language'

    b) Ella [0] enseñó al niño [1] a caminar [2]
       'She taught the baby how to walk'

    Verbs and Arguments in ADESSE

    Valency potential of a lexical entry:
    - which arguments can be selected by a given verb?

    Valency realizations (diatheses):
    - which arguments are actually expressed
    - what the syntactic realization of each argument is
    - voice

    (The strategy in ADESSE is to define the valency potential of each verb entry and to register all the valency realizations found in the corpus)

    Valency patterns and frequency

    Valency potential of the verb ENSEÑAR 'teach': A0 (Teacher), A1 (Learner), A2 (Content)

    Valency patterns in active voice

    A0 (Teacher)   A1 (Learner)   A2 (Content)   N     Examples
    A0:Subj        A1:IObj        A2:DObj        58    Me enseñaba su idioma [JOV:134]
    A0:Subj        A1:IObj                       28    Si su hijo no sabe, él le enseñará [SON:99]
    A0:Subj        A1:IObj        A2:Obl(a)      22    Eso me enseñará a fiarme de ti [PAI:161]
    A0:Subj                                      4     Kant enseñó en Königsberg [TIE:195]
    A0:Subj        A1:DObj        A2:Obl(a)      ...   A coser la enseñaban desde pequeña [USOS:71...]
    A0:Subj                       A2:DObj        18    Enseñaban cosas útiles [MAD:277]
    A0:Subj                       A2:Obl(a)      2     Enseñaban a saber comer [MAD:277]

    Arguments and gradience

    Valency patterns occur in the corpus with different frequencies, ranging from the most usual to the rare and unexpected.

    As a consequence, verb arguments are not always syntactically realized
    - Obligatoriness/optionality of arguments is not a yes-or-no matter, but a gradient
    - Obligatory arguments are those (referential) elements most frequently tied to the verb in texts
    - Because obligatoriness is one of the main criteria for the argument/adjunct distinction, this distinction is also a gradient

    Arguments and gradience

    Arguments of Enseñar 'teach' (N clauses = 139)

    Arg   Role       N     %
    0     Teacher    138   99.3 %
    1     Learner    111   79.9 %
    2     Content    105   75.5 %

    Arguments of Escribir 'write' (N clauses = 321)

    Arg   Role       N     %
    0     Writer     300   93.5 %
    1     Text       208   64.8 %
    2     Receiver   85    26.5 %
    4     Topic      10    3.1 %
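    The percentages in these two tables are simply the share of a verb's clauses in which each argument is expressed. A minimal sketch of that calculation follows, using the counts above; the variable and function names are illustrative and not part of ADESSE.

```python
from typing import Dict, List

# Counts copied from the two tables above: clauses per verb and, for each
# argument, the number of clauses in which it is expressed.
N_CLAUSES: Dict[str, int] = {"enseñar": 139, "escribir": 321}
EXPRESSED: Dict[str, Dict[str, int]] = {
    "enseñar": {"Teacher": 138, "Learner": 111, "Content": 105},
    "escribir": {"Writer": 300, "Text": 208, "Receiver": 85, "Topic": 10},
}


def realization_rates(verb: str) -> Dict[str, float]:
    """Percentage of the verb's clauses that express each argument."""
    total = N_CLAUSES[verb]
    return {role: round(100 * n / total, 1) for role, n in EXPRESSED[verb].items()}


def by_obligatoriness(verb: str) -> List[str]:
    """Roles ordered from the most to the least frequently expressed."""
    rates = realization_rates(verb)
    return sorted(rates, key=rates.get, reverse=True)


for verb in N_CLAUSES:
    print(verb, realization_rates(verb), by_obligatoriness(verb))
```

    Read this way, "obligatory" is the upper end of a scale rather than a separate category: the Writer of escribir (93.5 %) and its Topic (3.1 %) sit on the same continuum.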

    Frequency and argument structure

    "'Argument structure' needs to be replaced by a greatly enriched probabilistic theory capturing the entire range of combinations of predicates and participants that people have stored as sorted and organized memories of what they have heard and repeated over a lifetime of language use" (Thompson & Hopper 2001: 47)

    Frequency and argument structure

    Manning (2003)

    Many subcategorization distinctions presented in the linguistics literature as categorical are actually counterexemplified in studies of large corpora of written language use.

    We can get a much better picture of what is going on by estimating a probability mass function (pmf) over the subcategorization patterns for the verbs in question.

    We can put a probability function over the kinds of dependents to expect with a verb or class, conditioned on various features. (302)
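    As a rough illustration of Manning's suggestion, the active-voice counts given earlier in this talk for ENSEÑAR 'teach' can be turned into a probability mass function. This is a sketch under stated assumptions: it uses only the six patterns whose counts are legible, so the probabilities are relative to those 132 clauses rather than to all 139 clauses of the verb, and the function names are made up for the example.

```python
from typing import Dict, Tuple

Pattern = Tuple[str, ...]

# Active-voice valency patterns of ENSEÑAR and their clause counts
# (only the six patterns whose counts are legible in the source).
PATTERN_COUNTS: Dict[Pattern, int] = {
    ("A0:Subj", "A1:IObj", "A2:DObj"): 58,
    ("A0:Subj", "A1:IObj"): 28,
    ("A0:Subj", "A1:IObj", "A2:Obl(a)"): 22,
    ("A0:Subj", "A2:DObj"): 18,
    ("A0:Subj",): 4,
    ("A0:Subj", "A2:Obl(a)"): 2,
}


def pmf(counts: Dict[Pattern, int]) -> Dict[Pattern, float]:
    """Relative frequency of each pattern: a maximum-likelihood pmf."""
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def realization_prob(counts: Dict[Pattern, int], arg: str, function: str) -> float:
    """P(arg is realized as `function` | arg is expressed at all)."""
    expressed = sum(n for p, n in counts.items()
                    if any(slot.startswith(arg + ":") for slot in p))
    as_function = sum(n for p, n in counts.items() if f"{arg}:{function}" in p)
    return as_function / expressed


probabilities = pmf(PATTERN_COUNTS)
for pattern, prob in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{prob:.3f}  {' '.join(pattern)}")

# Among clauses that express the Content (A2), how often is it a direct object?
print(realization_prob(PATTERN_COUNTS, "A2", "DObj"))
```

    The second function is one simple way of conditioning on a feature, here on the argument being expressed at all.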

    Frequency and argument structure

    Proposals

    The meaning of a verb determines its contexts of use and is determined by its contexts of use

    Argument structure is a generalization over registered use, where frequency/probability of co-occurrence is an especially important factor in the entrenchment of a valency pattern.

    Instead of obligatory arguments, or participants inherent to a scene, our past linguistic experience provides us with certain probability expectations about the reference to a certain participant type in the scenes evoked by a verb

    Verbs and constructions

    A basic problem: polysemy and contextual accommodation. Changes in meaning when a verb enters alternating valency patterns

    The formalization of syntactic alternations:
    - different lexical entries: each different meaning is a different verb entry
    - lexical rules (RRG), relating different LS
    - underspecification: only one verb meaning; the differences in meaning should be attributed to the constructions

    Verbs and constructions

    Strategies in ADESSE: lexical rules or constructions?

    - Underspecification: reduce lexical entries to a minimum (seeking wider coverage of the corpus and a less time-consuming task)
    - (New strategy in ADESSE-II): levels of granularity in the definition of verb senses

    Nevertheless, this is a 'false dichotomy' (Croft 2003), motivated either by the level of granularity or by the perspective adopted when linking syntax and semantics (Van Valin 2004)

    Lexical entries in ADESSE

    Two levels:
    - Level 1: Macro-sense (macro-acepción) [verb 'meaning'], associated with a semantic domain and a set of participant roles
    - Level 2: (Sub)sense (subacepción) [verb 'senses'] {work in progress}

    LEVEL 1                       LEVEL 2
    Conocer-1 'know'              Conocer 1.1 'get to know'
                                  Conocer 1.2 'recognize, distinguish'
                                  Conocer 1.3 'understand, know deeply'
    Enseñar-1 'show'
    Enseñar-2 'teach'
    Volver-1 'return, go back'    Volver 1.1 (lit) 'return to a place'
                                  Volver 1.2 (met) 'return to a state or activity'
    Volver-2 'turn round'
    Volver-3 '(cause) become'

    Verb Entries

    (At level 1) We distinguish verb entries when they are associated with different sets of semantic roles
    - that means differences in valency potential, not in valency realization

    We try to limit sense distinctions to a minimum, but we must distinguish senses that cannot be 'unified':
    - partir 1 ('go away') vs. partir 2 ('break')
    - saber 1 ('know') vs. saber 2 ('taste')

    or (less clearly) senses which are related to different lexical classes:
    - enseñar 1 ('show') vs. enseñar 2 ('teach')

    In any case, there are no clear boundaries between senses at any level (cf. Kilgarriff 1997)

    Unifying verb senses

    Typically included in one single verb entry (level 1):

    Diathesis alternations (causative/inchoative, locative alternation, ...)
    - Semantic differences are attributed primarily to the construction, not to the verb

    Paradigmatic alternatives within an argument (e.g. write a letter / a novel / a musical work)
    - Meaning accommodation or co-compositionality, but not different verb senses

    Metaphoric and other figurative uses
    - They are annotated like the literal uses, but marked as figurative

    Paths of Schematization

    Mi padre nos enseñaba inglés y francés [SEV: 277]
    'My father taught us English and French'

               Subj         Pred       DO                 IO
    Clause     Mi padre     enseñaba   inglés y francés   nos
    ENSEÑAR    (somebody)   teaches    (a language)       (to somebody)
               Suj          enseñar    ODir               OInd

    Teaching verbs/frame

    Syntactic Patterns and Verbs

    Once we have defined verb entries and syntactic patterns, we can take two complementary (not necessarily incompatible) points of view concerning the association of verbs and constructions:

    Point of view of the verb: alternations (e.g. Levin 1993) of valency patterns, keeping the lexical elements as far as possible
    - Me enseñó a cantar "She taught me to sing"
    - Me enseñó canto "She taught me singing"

    Point of view of the construction: 'surface generalizations' concerning uses of a constructional schema with different lexical elements (Goldberg 1995, 2000; also Dowty 1998)
    - Me enseñó a cantar "She taught me to sing"
    - Me obligó a cantar "She forced me to sing"

    Verbs and alternations

    Many lexicalist approaches have focused on whether verbs admit a certain alternation, e.g. the causative alternation (or, alternatively, whether they can be combined with a certain type of argument)

    Verb               A1      A0 A1
    Crecer 'grow'      Yes     No
    Cambiar 'change'   Yes     Yes

    Verb               N1 N2   A0 N1 N2
    Aprender 'learn'   Yes     No
    Enseñar 'teach'    Yes     Yes

    This is not a yes/no question, [...]ough the syntagmatic axis [...] serve to describe the behavioral profile of verbs

    Arguments and behaviour profile: Change of state verbs

    [Bar chart, scale 0%-100%, with the series 0: AGT and 1: PAT for the verbs ABRIR, CAMBIAR, CERRAR, ROMPER, ENCENDER, CRECER, ARREGLAR, APAGAR, LIMPIAR, ORGANIZAR, RESOLVER, AUMENTAR, BORRAR, PROLONGAR, ILUMINAR]

    Data labels, in source order:
    - 88%, 100%, 90%, 85%, 83%, 100%, 100%, 99%, 94%, 99%, 100%, 100%, 100%, 100%, 98%
    - 81%, 38%, 90%, 81%, 100%, 0%, 88%, 75%, 100%, 88%, 90%, 45%, 83%, 54%, 72%

    Arguments and behavior profile: Verbs of knowledge

    [Bar chart, scale 0%-100%, with the series 0: Inducer, 1: Cognizer and 2: Content for the verbs SABER, CONOCER, RECORDAR, COMPRENDER, ESTUDIAR, OLVIDAR, ACORDAR, APRENDER, ENTERAR, IMAGINAR, ENSEÑAR, SOÑAR, DEMOSTRAR]

    Valency alternations and behavior profile

    Each lexical element can be described in terms of the features and constructions it combines with in context (its syntagmatic profile)

    A relevant part of the profile of a verb is made up of its constructional schemas, the realizations of its arguments, and the frequencies of schemas and argument realizations

    Two verbs of the same class may have similar syntagmatic combinations and yet differ in the relative frequency of each combination and, as a consequence, in the relative frequency of their core arguments

    It is hypothesized that the differences in frequency are motivated by differences in meaning

    "Surface generalizations"

    (cf. Goldberg 1995, 2002; also Dowty 2000)

    We can see the relation between verbs and constructions from the point of view of the constructions.

    That is, any constructional schema (e.g. passive, double object, ...) can be described by itself and not as derived from any other alternating schema.

    The meaning of a constructional schema is established (and learned) by generalizing from the meaning of particular utterances that instantiate the schema
    - An essential part of the characterization of a constructional schema comes from its association with a class of verbs.
    - The strength of the association between a constructional schema and a verb should be measured on the basis of a (syntactically annotated) corpus

    Verbs in the schema

    VERB        Meaning          N      Example
    DAR         'Give'           1272   Mi padre sólo me da dinero para estudiar [SON:115]
    DECIR       'Say, tell'      599    Le dije que no quería volver [TER: 046]
    HACER       'Make'           514    Le hizo la promesa de llevarle al local [TER: 119]
    CONTAR      'Tell'           308    Rosetta le contaba que el otro iba empeorando [SON:...]
    PEDIR       'Ask, request'   273    Nos pidió que lo esperáramos [HIS: 013]
    PREGUNTAR   'Ask, inquire'   219    Me preguntó que si tenía dinero [LAB: 268]
    PERMITIR    'Allow'          124    ¿Me permite usar su teléfono? [SON: 285]
    OFRECER     'Offer'          119    No puedo ofrecerles grandes comodidades [LAB: 115...]
    EXPLICAR    'Explain'        112    Le explicó cómo funcionaba la imprenta [TER: 062]
    PONER       'Put'            101    Si se te cae un ojo, te pondrán otro enseguida [MOR:...]
    TRAER       'Bring'          96     Le trae un kilo de bombones a mamá [BAI:424]
    DEJAR       'Lend'           96     El profesor le dejaba la casa para que la habitara [HI...]

    Verb classes of clauses in the schema (active)

    CLASS            Clauses   Verbs   Example verbs
    POSSESSION       2568      80      dar 'give'
    COMMUNICATION    2419      108     decir 'say, tell'
    MODULATION       806       18      hacer 'do, make'
    LOC & MOVEMENT   686       88      traer 'bring'
    MENTAL           615       64      recordar 'remember'
    CHANGES          430       140     abrir 'open'
    OTHER FACTS      305       80      tocar 'touch'
    ATTRIBUTIVE      180       17      costar 'cost'
    EXISTENTIAL      118       20      (causar 'cause')
    Total            8127      615

    S D/I a Inf

    Most frequent verbs in the constructional schema S D/I a Inf (ADESSE):

    VERB       Class          N    Example
    OBLIGAR    Obligation     94   Me obligó a salir
    AYUDAR     Induction      70   Me ayudó a triunfar
    LLEVAR     Displacement   52   Me llevó a ver una casa
    INVITAR    Induction      42   Me invitó a ver su casa
    ENSEÑAR    Knowledge      23   Me enseñó a coser
    IMPULSAR   Induction      11   (esto) Me impulsó a escribir
    ANIMAR     Induction      10   Me animó a escribir
    SACAR      Displacement   10   Me sacó a bailar // lo sacó a saludar
    FORZAR     Obligation     9    Me forzó a abandonar el intento
    MANDAR     Displacement   9    Me mandó a comprar pan
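    One way to read this table from the point of view of the construction is to add up the clause counts per verb class. The sketch below does only that, for the ten verbs listed; the class labels are given in English (Obligación, Inducción, Desplazamiento, Conocimiento in the source), and the variable names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (verb, class, N) for the ten most frequent verbs in the schema S D/I a Inf.
ROWS: List[Tuple[str, str, int]] = [
    ("OBLIGAR", "Obligation", 94),
    ("AYUDAR", "Induction", 70),
    ("LLEVAR", "Displacement", 52),
    ("INVITAR", "Induction", 42),
    ("ENSEÑAR", "Knowledge", 23),
    ("IMPULSAR", "Induction", 11),
    ("ANIMAR", "Induction", 10),
    ("SACAR", "Displacement", 10),
    ("FORZAR", "Obligation", 9),
    ("MANDAR", "Displacement", 9),
]

# Sum the clause counts of the listed verbs by semantic class.
class_totals: Dict[str, int] = defaultdict(int)
for _verb, verb_class, n in ROWS:
    class_totals[verb_class] += n

grand_total = sum(class_totals.values())
for verb_class, n in sorted(class_totals.items(), key=lambda item: -item[1]):
    print(f"{verb_class:<13}{n:>4}  {100 * n / grand_total:.1f}%")
```

    The shares refer only to these ten verbs, not to every clause that instantiates the schema in the corpus.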

    Paths of Schematization

    Grouping verbs in classes

    Two main paths of schematization/generalization:

    A verb is related, by its lexical meaning, to other partly similar verbs
    - enseñar-1 'show': mostrar 'show', ver 'see', mirar 'look'
    - enseñar-2 'teach': aprender 'learn', estudiar 'study', saber 'know'

    On the other hand, by being used in a syntactic schema, it is semantically construed like other verbs that realize the same pattern
    - enseñar: dar 'give', decir 'say', contar 'tell', preguntar 'ask', ...

    That puts each verb in a complex network of semantic relations

    [Diagram: ENSEÑAR in a network of relations with the schema S D I (dar, hacer, decir), the schema S D/I a infinitive (invitar, ayudar, obligar), and the lexically related verbs ver, mostrar, mirar and aprender, saber, conocer, estudiar]

    Semantic Classes of Verbs/Preds

    Two main criteria of semantic classification of verbs (both of them used in RRG)

    Aktionsart classes (based on LS)
    - State, Activity, Achievement, Accomplishment, etc.

    Ontological/conceptual classes (based on the 'constant' part of LS)
    - Location, perception, cognition, consumption, etc.

    ADESSE Verb Classes

    Goal of the ADESSE verb classification: to represent generalizations over types of conceptual frames evoked by individual verbs

    It is a conceptual/ontological classification, inspired by lexical relations of synonymy and hyponymy/troponymy, not aspectual nor primarily syntactic

    It is a hierarchical classification, with up to four levels at the present stage
    - Top-level classes: 6 options [~ Halliday's 'process types']
    - Classes recognized so far: 60 options
    - With the possibility of increasing granularity in the future

    ADESSE hierarchy of semantic classes

    Class                      Examples
    MENTAL
      11 Feeling               gustar, temer
      12 Perception            ver, escuchar
      13 Cognition
        131 Knowledge          saber, enseñar
        132 Belief             creer
    RELATIONAL
      21 Attributive           ser
      22 Possession            tener, dar
    MATERIAL
      31 "Space"
        311 Displacement       ir, llevar
        312 Location           estar, poner
      32 Change
        321 Creation           hacer, crear
        322 Modification       romper, cambiar
        323 Destruction        destruir
      33 Other facts           tocar
    COMMUNICATION              decir, hablar
      41 Judgement             criticar
      42 Command               pedir
    EXISTENTIAL                haber, aparecer
    MODULATION
      60 Causative             hacer, obligar
      61 Dispositive           tratar, atreverse
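    The class codes on this slide look hierarchical: 131 Knowledge sits under 13 Cognition, which sits under MENTAL. Assuming, and this is only an assumption read off the slide, that each code extends its parent's code by one digit and that the top-level classes are numbered 1 to 6, the path of any class can be derived from its code. The dictionary and function names below are illustrative.

```python
from typing import Dict, List

# Class codes from the slide. The single-digit codes for the top-level classes
# are an assumption based on the first digit of their subclasses.
CLASSES: Dict[str, str] = {
    "1": "MENTAL", "11": "Feeling", "12": "Perception", "13": "Cognition",
    "131": "Knowledge", "132": "Belief",
    "2": "RELATIONAL", "21": "Attributive", "22": "Possession",
    "3": "MATERIAL", "31": "Space", "311": "Displacement", "312": "Location",
    "32": "Change", "321": "Creation", "322": "Modification", "323": "Destruction",
    "33": "Other facts",
    "4": "COMMUNICATION", "41": "Judgement", "42": "Command",
    "5": "EXISTENTIAL",
    "6": "MODULATION", "60": "Causative", "61": "Dispositive",
}


def path(code: str) -> List[str]:
    """Class names from the top-level class down to `code`, via code prefixes."""
    return [CLASSES[code[:i]] for i in range(1, len(code) + 1)]


def subclasses(code: str) -> List[str]:
    """Names of all classes whose code extends `code`."""
    return [name for c, name in CLASSES.items() if c.startswith(code) and c != code]


print(path("131"))
print(subclasses("32"))
```

    Under this prefix encoding, a coarser or finer level of granularity corresponds to a shorter or longer code.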

    ADESSE top-level classes

    CLASS                VERBS   Exs
    1 MENTAL
      11 Feeling         271     Gustar 'like'
      12 Perception      115     Ver 'see'
      13 Cognition       171     Saber 'know'
    2 RELATIONAL
      21 Attributive     232     Ser 'be'
      22 Possession      185     Tener 'have'
    3 MATERIAL
      31 "Space"         629     Ir 'go'
      32 Change          923     Abrir 'open'
      33 Other facts     429     Tocar 'touch'
      34 Behaviour       331     Reír 'laugh'
    4 COMMUNICATION      335     Decir 'say'
    5 EXISTENTIAL        201     Haber 'exist'
    6 MODULATION         139     Hacer 'make'
    TOTAL                3978

    CLAUSE column (figures run together in the source): 1583113 1171552399 109276413141761544131472

    Semantic roles and verb classes

    Each (sub)class is associated with a set of semantic roles prototypical for the cognitive domain evoked

    Class           1                            2           --
    Feeling         Senser                       Stimulus
    Perception      Perceiver                    Perceived
    Cognition       Cognizer                     Content
    Possession      Possessor                    Possessed
    Transfer        Final possessor (Receiver)   Possessed
    Displacement    Theme                        Goal
    Localization    Theme                        Locative
    Change          Patient
    Communication   Sayer                        Message     Receiver

    Column 0 (entries in source order, not aligned with the rows): Source, --, Agent, Initiator (causer), Initiator (causer), Initial possessor (Donor), Initiator (causer), Initiator (causer)

    Semantic roles

    There is no single list of semantic roles; roles are defined at three levels:

    Verb-specific roles (representing valency potential)
    - Escribir: A0: Writer, A1: Text, A2: Receiver, A3: Topic
    - Enseñar: A0: Teacher, A1: Learner, A2: Content

    Class-specific roles
    - Communication: Sayer, Message, Receiver, Topic
    - Cognition: Cognizer, Content

    SynSem schemas (valency realizations), pointing to verb-specific roles
    - Active Suj=A0 DObj=A1 IObj=A2: Le escribía canciones de amor

    Semantic roles, verbs and semantic classes

    Each verb entry is associated with a set of arguments embracing any possible core participant with this verb

    By default, verb arguments inherit the role labels from the class(es) to which the verb belongs

                       A0        A1         A2         A3
    COMMUNICATION      Sayer     Message    Receiver   Topic
    escribir 'write'   Writer    Text
    CREATION           Creator   Effected

    But, sometimes, verb-specific role labels are also used
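    The default inheritance of role labels, with verb-specific labels taking precedence where they exist, can be sketched as a simple lookup. This is a hedged illustration rather than ADESSE's implementation: it assumes a single class per verb and an index-by-index alignment of class roles with A0-A3, and the dictionary layout and function name are invented for the example.

```python
from typing import Dict

# Class-specific role labels by argument index (alignment with A0-A3 assumed).
CLASS_ROLES: Dict[str, Dict[int, str]] = {
    "COMMUNICATION": {0: "Sayer", 1: "Message", 2: "Receiver", 3: "Topic"},
}

# Hypothetical verb entries: a class plus any verb-specific labels.
VERB_ENTRIES: Dict[str, dict] = {
    "escribir": {"cls": "COMMUNICATION", "own_roles": {0: "Writer", 1: "Text"}},
    "decir": {"cls": "COMMUNICATION", "own_roles": {}},
}


def role_labels(verb: str) -> Dict[int, str]:
    """Class labels by default, overridden by verb-specific labels."""
    entry = VERB_ENTRIES[verb]
    labels = dict(CLASS_ROLES[entry["cls"]])  # copy so the class table is untouched
    labels.update(entry["own_roles"])
    return labels


print(role_labels("escribir"))
print(role_labels("decir"))
```

    With these inputs, escribir ends up with Writer, Text, Receiver and Topic, which matches the verb-specific role list given for this verb elsewhere in the talk.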


    RRG: Generalized Semantic Roles


    ADESSE roles

    VERB-SPECIFIC ROLES              CLASS-SPECIFIC ROLES   (increasing generalization)
    gustar.01, temer.01              Emoter                 Experiencer
    ver.01, mirar.01                 Perceiver              Experiencer
    saber.01, enseñar.01, creer.01   Cognizer               Experiencer
    tener.01, dar.01                 Possessor
    gustar.02, temer.02              Emoted                 Stimulus
    ver.02, mirar.02                 Perceived              Stimulus
    saber.02, enseñar.02, creer.02   Content                Stimulus
    tener.02, dar.02                 Possessed
    hacer.01, crear.01               Created                Patient
    romper.01, cambiar.01            Affected               Patient
    destruir.01                      Destroyed              Patient

    Generalized Semantic Roles

    (still tentative in ADESSE)

    Common LS templates of lexical entries
    - pred'(x) | BECOME pred'(x)
    - pred'(x, y) | BECOME pred'(x, y)
    - [do'(z)] CAUSE [BECOME pred'(x, (y))]

    Default indices for arguments
    - z = A0 (initiator or causer)
    - x = A1 (first argument of pred')
    - y = A2 (second argument of pred')

    This is only the default numbering, because many verbs have a more complex semantic structure

    A0, A1, and A2 are ADESSE's closest relatives of the macroroles Actor and Undergoer

    Generalized Semantic Roles: A0 A1 A2 vs Actor Undergoer

    The ADESSE hierarchy A0 A1 A2 is similar to the Actor-Undergoer hierarchy:

    ACTOR ... UNDERGOER
    Arg of DO | 1st arg of do'(x, ...) | 1st arg of pred'(x, y) | 2nd arg of pred'(x, y) | Arg of pred'(x)

    A0 A1 A2
    Arg of [do'(x, ...)] CAUSE | 1st arg of do'(x, ...) | 1st arg of pred'(x, y) or pred'(x) | 2nd arg of pred'(x, y)

    But A0-A1-A2 should not be confused with the macroroles themselves

    Generalized Semantic Roles: A0 A1 A2 vs Actor Undergoer

    ADESSE: arguments of saber, aprender and enseñar:

               A0        A1        A2
    SABER                Knower    Thing known
    APRENDER             Learner   Thing learned
    ENSEÑAR    Teacher   [A1 and A2 cells: "Learner / Thing learned" and "Thing learned / Learner"]

    ENSEÑAR: [do'(x, Ø)] CAUSE [BECOME know'(y, z)]

    GSR and Argument realization

    [Diagram: the hierarchy A0 A1 A2 aligned with Subj and DO]

    Frequency of argument realizations (active voice)

    Intransitive: Subj (+ oblique)
    - S=1: 93%
    - S=0: 5%
    - Other: 2%

    Transitive: Subj DObj (+ oblique)
    - S=1 D=2: 61%
    - S=0 D=1: 25%
    - S=0 D=2: 3%
    - Other: 10%

    Ditransitive: Subj DObj IObj (+ oblique)
    - S=0 D=2 I=1: 36%
    - S=1 D=2 I=3: 30%
    - Other / not set: 34%

    Subject is almost always higher than DO in the hierarchy of GSRs

    GSR and Indirect Objects

    [Diagram: A1 A2 aligned with DObj]

    Two participants: Subj IObj (+ oblique)
    - S=2 I=1: 61%
    - S=1 I=2: 25%
    - Other: 10%

    Three participants: Subj DObj IObj (+ oblique)
    - S=0 D=2 I=1: 36%
    - S=1 D=2 I=3: 30%
    - Other / not set: 34%

    The status of IObj in this hierarchy is unclear. In three-participant schemas, it can be conceived as a middle (A1) or as an additional (A3) argument

    In two-participant schemas, IO usually outranks the subject, as in psychological verbs [A1: Experiencer, A2: Stimulus]

    A María le gusta la música "Mary likes music"

    Problem: the nature of IO and DO

    GSR and Argument realization: "Inversion (?)"

    Psychological verbs (feeling, desire, ...)

    a) querer, temer, amar, odiar, admirar, ...
       Ex: Él la quiere "He loves her"
       pred'(A1, A2); stative with human EXP as PSA

    b) gustar, interesar, importar, encantar, doler, ...
       Ex: A ella le gusta "She likes him/her"
       pred'(A1, A2); stative with some PSA properties on IO

    c) sorprender, asustar, preocupar, impresionar, ...
       Ex: Él la impresionó "He impressed her"

    but there are no clear limits between (b) and (c)


    Psychological verbs and dative case

    Percentage of dative case for 3rd person clitic Experiencer in two-participant clauses with verbs of 'feeling'

    A Paula le/la alegró la noticia
    'The news pleased Paula'

    The main conditioning factors seem to be dynamicity and effectiveness: calmar, for example, is more dynamic and effective than tranquilizar

    Calmar 'calm'               17%    (1/6)
    Consolar 'console'          43%    (3/7)
    Tranquilizar 'calm'         55%    (6/11)
    Preocupar 'worry'           57%    (4/7)
    Distraer 'amuse'            60%    (3/5)
    Alegrar 'please'            67%    (2/3)
    Sorprender 'surprise'       74%    (14/19)
    Impresionar 'impress'       75%    (9/12)
    Molestar 'bother'           93%    (14/15)
    Gustar 'like'              100%    (224/225)
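
    The percentages in this table follow directly from the token counts given in parentheses. A minimal Python sketch of that calculation, using only the counts shown on the slide; the names counts, dative_share and ranking are illustrative and not part of ADESSE.

```python
# (dative clitics, all 3rd-person clitic Experiencers) per verb, from the slide
counts = {
    "calmar": (1, 6),
    "consolar": (3, 7),
    "tranquilizar": (6, 11),
    "preocupar": (4, 7),
    "distraer": (3, 5),
    "alegrar": (2, 3),
    "sorprender": (14, 19),
    "impresionar": (9, 12),
    "molestar": (14, 15),
    "gustar": (224, 225),
}


def dative_share(dative, total):
    """Percentage of dative clitics, rounded to a whole number."""
    return round(100 * dative / total)


# Rank verbs from least to most dative-like
ranking = sorted(counts, key=lambda verb: counts[verb][0] / counts[verb][1])

for verb in ranking:
    dative, total = counts[verb]
    print(f"{verb:<13}{dative_share(dative, total):>4}%  ({dative}/{total})")
```

    Several verbs have fewer than ten tokens (alegrar has three), so the middle of the ranking rests on small samples.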


    Objects

    Subj & DO & IO = direct core arguments

    Rough RRG equivalents
        Subject = PSA
        DO = Undergoer
        IO = NMR

    Corpus data reveal a DO-IO continuum with several factors at play:
        Case: accusative vs dative (3rd person clitic)
        Preposition a
        Cliticization and clitic-doubling


    Variable case marking

    Some Objects are referred to by a dative clitic [le(s)] instead of an accusative clitic [lo(s)/la(s)], even with the same verb and within the same text

    a. El padre le enseñaba a conocer las hierbas (Jov: 023)
       'The father taught him [dat] to know about herbs'

    b. A coser la enseñaban desde muy pequeña (Usos: 071)
       'She was taught to sew from a very early age'

    a. Lo que realmente lo preocupaba era... (Hist: 131)
       'What actually worried him [acc] was ...'

    b. Esos bienes que tanto le preocupan (Hist: 70)
       'These goods that worry him [dat] so much'

    Le is the canonical form for masculine and feminine Indirect Objects [R] in ditransitive clauses

    Le regalé un libro a María

    In two-participant clauses we can rank the verbs from more dative-like to less dative-like


    Variable object marking

    Some Objects are doubled by a cross-referencing pronominal clitic

    - ¿Conocés a Elena Garro? [...]
    - ¿Y de dónde la conocés vos a Elena Garro? (BAIRES:418, 24-

    Doubling by clitic is usual with Indirect Objects

    Le regalé un libro a María "I gave Mary a book"

    [besides syntactic function, the main conditioning factor is accessibility status (Belloro, yesterday)]


    Variable object marking

    Some Objects are marked by preposition a

    Encontré a un amigo "I met a friend"

    Encontré un amigo id.

    Preposition a is used obligatorily to mark full Indirect Objects

    Le regalé un libro a María "I gave Mary a book"


                                      2-participants:    3-participants:
                                      Subj Obj           Subj DO IO
                                      Object             O = DO     R = IO
    Dative clitic [vs Accusative]     31.1%              0.3%       99.7%
    Clitic Doubling [vs (a) NP]       3.8%               1.0%       42.8%
    A + NP [vs NP]                    9.9%               0.7%       100%
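
    One way to read these figures is that the object of two-participant clauses falls between the O and the R of ditransitives on every criterion. The Python sketch below checks that reading. It assumes the three percentages in each row belong, in order, to R = IO, O = DO and the two-participant Object; the names rates and is_intermediate are invented for this example.

```python
# Percentages from the slide, per object type, in the order:
# (dative clitic, clitic doubling, a + NP).
# Assumption: column assignment as described in the paragraph above.
rates = {
    "O (DO, 3-participant)": (0.3, 1.0, 0.7),
    "Obj (2-participant)": (31.1, 3.8, 9.9),
    "R (IO, 3-participant)": (99.7, 42.8, 100.0),
}


def is_intermediate(low, mid, high):
    """True if mid lies strictly between low and high on every criterion."""
    return all(l < m < h for l, m, h in zip(low, mid, high))


between = is_intermediate(
    rates["O (DO, 3-participant)"],
    rates["Obj (2-participant)"],
    rates["R (IO, 3-participant)"],
)
print("2-participant object is intermediate on all criteria:", between)
```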


    Construction          Ditransitive    (Mono)transitive    Ditransitive
    Participant           O               P                   R
                          NP                                  a NP
                          non-doubled                         (doubled)
                          accusative                          dative
    Syntactic Function    DO                                  IO


    Object variation and the animacy hierarchy

                  1 2      Vd      Pro 3    anim def   anim indef   inan def.   inan indef.   Clause
    % a           100%     100%    100%     90%        42.4%        2.7%        0.4%          0%
    % doubling    99.5%    100%    99.2%    14.7%      2.8%         2.7%        0.3%          0.2%

    % dative      100%, 83%, 73%, 54%, 2.7%, 0.2%
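
    The two complete series in this chart can be compared by locating where each one falls most sharply along the hierarchy. The Python sketch below is a rough illustration. It assumes that the eight values of each series line up with the eight category labels in the order given, with "1 2" read as a single first/second person category; the function name steepest_drop is invented for this example.

```python
# Category labels and percentages as read from the chart's data table.
# Assumption: values and labels align one-to-one in the order listed.
categories = ["1/2", "Vd", "Pro 3", "anim def", "anim indef",
              "inan def", "inan indef", "clause"]
a_marking = [100, 100, 100, 90, 42.4, 2.7, 0.4, 0]
doubling = [99.5, 100, 99.2, 14.7, 2.8, 2.7, 0.3, 0.2]


def steepest_drop(values, labels):
    """Return the adjacent pair of labels with the largest decrease."""
    drops = [values[i] - values[i + 1] for i in range(len(values) - 1)]
    i = max(range(len(drops)), key=drops.__getitem__)
    return labels[i], labels[i + 1]


print("a-marking:", steepest_drop(a_marking, categories))
print("doubling: ", steepest_drop(doubling, categories))
```

    Under that alignment, clitic doubling falls away between pronouns and full nominals, while a-marking falls away further down, between definite and indefinite animates.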


    Object variation

    No dialect of Spanish has categorical rules for the use of a, clitic doubling or leísmo. Everywhere we have a gradient

    The general tendency is to have unmarked nominals for O in ditransitive clauses and for P low in the animacy hierarchy. In general, we have morphologically marked objects for referents high in the animacy hierarchy (Silverstein 1976)

    Other relevant factors (not explored today):

        Clitic doubling is also governed by the animacy hierarchy, but correlates more strongly with discourse status: topicality and accessibility of the referent

        Case is also governed by dialect variation, gender, and type of process (dynamicity, effectiveness).


    Subject and object in Spanish

    Participants in SUBJ-PRED-OBJ (+Obl) pattern [N = 81954 clauses]

                                   SUBJ       OBJ
    Human                          80.50%     27.14%
    Agreement/clitic only          63.80%     25.90%
    Definite (if NP)               90.00%     66.33%
    Preverbal (if NP or clause)    73.67%     3.84%

    Participants in SUBJ-PRED-DO-IO (+Obl) pattern [N = 8455 clauses]

                                   SUBJ       DO         IO
    Human                          84.18%     2.25%      90.24%
    Agreement/clitic only          65.95%     10.65%     74.14%
    Definite (if NP)               90.06%     53.57%     89.02%
    Preverbal (if NP or clause)    74.50%     2.40%      9.50%


    Object and markedness

    Transitive clauses

    SUBJECT              DIROBJ
    Human                Non Human
    Highly accessible    Low accessibility
    Definite             less definite
    Topic                (Part of) Rheme
    [Agent]              [Patient]

    Objects

    (unmarked)           (marked)
    Non Human            Human              a-marking, le
    Indefinite           Definite
    Low accessibility    More accessible
    (Part of) Rheme      Topic              doubling
    Patient              Less affected      leísmo


    Conclusions

    ADESSE: What is it good for?

    ADESSE is, above all, a database for the empirical study of the interaction between verbs and constructions in Spanish

        Constructional alternatives for a verb, a syntactic function or a semantic role (with frequencies in a corpus)
        Verbs and syntactic constructions for a semantic domain
        Verbs and semantic domains for a particular construction
        ...

    Additionally, it allows the search and study of many imaginable correlations between syntactic and semantic features (case, person, number, definiteness, tense, mood)


    Conclusions / final remarks

    It is necessary to observe (spoken and written) language in use and to take into account frequency as an important factor of language structure and meaning

    Many grammatical categories manifest as a gradient and not as discrete categories. We have seen that this is the case with argument structure, argument realization, and grammatical relations. The task is to identify the factors influencing the choice of a form, and the strength of each factor


    Conclusions / final remarks

    In order to achieve descriptive adequacy, we need corpora annotated with increasingly detailed syntactic and semantic categories

    But the categories used in corpus annotation should not be taken for granted, and must be revised in the light of corpus evidence

    Therefore, we have to move continually from analytical categories to corpus and from corpus to analytical categories

    (That is also a disclaimer: Researchers can get ADESSE data with some analysis, but the final analysis of each example is the responsibility of the user)


    References

    Belloro, Valeria (yesterday). The pragmatics of dative doubling in Spanish. 2007 RRG Conference. México.

    Butler, Christopher S. (2004). Corpus studies and functional linguistic theories. Functions of Language 11/2: 147-86.

    Bybee, Joan. (2006). From Usage to Grammar: The Mind's Response to Repetition. Language 82/4: 711-33.

    Bybee, Joan, and Paul Hopper (2001). Frequency and the emergence of linguistic structure. Amsterdam: John Benjamins.

    Company, Concepción. (2001). Multiple dative-marking grammaticalization. Spanish as a special kind of primary object

    Croft, William. (2003). Lexical rules vs. constructions: a false dichotomy. Motivation in Language. Studies in honor of Günter Radden. eds H. Cuyckens et al, 49-68. Amsterdam: John Benjamins.


    Dowty, David. (2000). 'The garden swarms with bees' and the fallacy of 'argument alternation'. Polysemy: theoretical and computational approaches. eds Y. Ravin, and C. Leacock, 110-128. Oxford: Oxford University Press.

    Fillmore, Charles J. (1992). "Corpus linguistics" or "Computer-aided armchair linguistics". Directions in Corpus Linguistics. ed Jan Svartvik, 35-60. Berlin: Mouton de Gruyter.

    Goldberg, Adele E. (1995). Constructions: A Construction Grammar Approach to Argument Structure. Chicago / London: University of Chicago Press.

    Goldberg, Adele E. (2002). Surface generalizations: An alternative to alternations. Cognitive Linguistics 13/4: 327-56.

    Gries, Stefan Th. (2003). Towards a corpus-based identification of prototypical instances of constructions. Annual Review of Cognitive Linguistics 1: 1-27.

    Halliday, M. A. K. (2004). An introduction to functional grammar (3rd edition, revised by Christian M. I. M. Matthiessen). London: Arnold.

    Kilgarriff, Adam. (1997). "I don't believe in word senses". Computers and the Humanities 31: 91-113.

    Langacker, Ronald W. (1991). Foundations of Cognitive Grammar, Vol. II: Descriptive Application. Stanford: Stanford Univ. Press.

    Levin, Beth. (1993). English Verb Classes and Alternations: a Preliminary Investigation. Chicago: University of Chicago Press.

    Manning, Christopher. (2003). Probabilistic Syntax. Probabilistic Linguistics. eds Rens Bod, Jennifer Hay, and Stefanie Jannedy, 289-341. Cambridge (Mass.): MIT Press.

    Nichols, Johanna. (1984). Functional theories of grammar. Annual Review of Anthropology 13: 97-117.

    Silverstein, Michael. (1976). Hierarchies of features and ergativity. Grammatical categories in Australian languages. ed. R. M. W. Dixon, 112-71. Canberra: Australian Institute of Aboriginal Studies.

    Thompson, Sandra A. and Paul Hopper. (2001). Transitivity, clause structure, and argument structure: evidence from conversation. Frequency and the emergence of linguistic structure. eds Joan Bybee and Paul Hopper, 27-60. Amsterdam: John Benjamins.

    Van Valin, Robert D. (2001). Functional Linguistics. The Handbook of Linguistics. eds M. Aronoff & J. Rees-Miller, 319-36. Oxford: Blackwell.

    Van Valin, Robert D. (2005). Exploring the Syntax-Semantics Interface. Cambridge: Cambridge University Press.

    Van Valin, Robert D. (2004). Lexical representation, co-composition, and linking syntax and semantics. (Ms) http://linguistics.buffalo.edu/research/rrg/vanvalin_papers/LexRepCoCompLnkgRRG.p

    Van Valin, Robert D., and Randy J. LaPolla. (1997). Syntax: Structure, meaning and function. Cambridge: Cambridge University Press.