gl'2005paper


  • 8/4/2019 GL'2005paper

    1/8

    LEXICAL ENCODING OF VERBS IN ENGLISH AND BULGARIAN

    Rositsa Dekova

    Department of Modern Languages, NTNU

    Trondheim, Norway, NO-4791

    [email protected]

    Abstract

    This paper focuses on the information that

    can be encoded in verbs as lexical entries, and

    its formal representation both in English and

    Bulgarian. For this purpose, an already existing,

    but not very widespread framework is used - the

    Sign Model (Dimitrova-Vulchanova, 1996/99;

    Hellan and Dimitrova-Vulchanova 2000) that

    describes words as meaningful cells, including morpho-syntactic information. Based on corpora

    data and results from continuation tests, my

    research is an attempt to find a unified format

    for representing lexical entries not only within a

    single language, but also across languages.

    1 Introduction

    The knowledge that native speakers

    demonstrate suggests that something beyond

    word-specific idiosyncratic properties needs to be accounted for in the lexical representation of

    words. Information about the syntactic

    environment in which a particular word can

    appear should also be part of the lexical

    encoding, a finding that is particularly relevant

    for verbs.

    The assumption that only some participant

    information is encoded lexically is widespread

    across a number of different linguistic theories.

    Traditionally, participants in a situation, denoted

    by a particular verb, are divided into two main

    groups: arguments and adjuncts (or

    complements and modifiers). A verb selects a set

    of arguments. The adjuncts, in contrast, are

    neither required, nor dependent on the particular

    verb. They can co-occur with many other verbs.

    Furthermore, syntactic realization does not

    always overlap with semantic obligatoriness.

    Therefore, I will refer to the set of entities

    included in a situation as semantic participants

    (following the terms in Koenig et al. 2002), where lexically encoded semantic participants

    are called arguments, and non-lexically encoded

    semantic participants are called adjuncts. The terms

    complement and modifier are used respectively

    for their syntactic correlates.

    2 Theoretical Background

    Although it is widely accepted that the

    syntactic structure of many sentences is

    determined mostly or entirely by the participant

    information included in the lexical entries of

    verbs, there are no reliable syntactic criteria that can be used to delimit the set of items that

    can express lexically encoded participant

    information. In other words, there is no

    established set of necessary and sufficient

    criteria that can serve as a clear-cut basis for

    the distinction between information that is

    lexically encoded and information that is not

    (that is the distinction between arguments and

    adjuncts).

    One promising solution, however, has been

    suggested in a paper by Koenig et al., Class Specificity and the Lexical Encoding of

    Participant Information (Koenig et al. 2002).

    They propose two criteria, semantic

    obligatoriness and verb class specificity, which

    jointly determine the argument status of

    participant information. The authors define

    lexically encoded information as that

    information which is accessed immediately

    upon recognition of a word. This is because

    only lexically encoded participant information

    is expected to play a role in the immediate

    representation that readers form for sentences.

    This information is said to be obligatory, that

    is, it is entailed to hold of the class of

    situations denoted by a word (ibid, p.226),

    and it is also relatively specific to the

    corresponding verbs. Those two properties can

    be directly observed by language users, and

    therefore, according to Koenig et al., they can

    serve as criteria providing a basis for learning

    the distinction between arguments and adjuncts.


    The contrast between the verbs in sentences

    (1) and (2) illustrates this approach:

    1. He cut the paper with the scissors.

    2. She drank the cocktail with a straw.

    While cut always describes a situation where

    an instrument is included, drink only allows an

    instrument to be included in some types of

    situations denoted by it. Thus, the instrument

    phrase with the scissors is both obligatory and

    specific for the verb cut, but one does not have

    to use an instrument to drink. Therefore an

    instrument should be included in the lexical

    representation of cut, and need not be present in

    the encoding of drink. Relying heavily on these two criteria, namely semantic obligatoriness and

    verb class specificity, I have been able to isolate

    the participant information that should be

    encoded in the lexical representation of a

    selected set of verbs.
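
The joint use of the two criteria can be read as a simple conjunctive test. The following is a minimal sketch under stated assumptions, not the procedure of Koenig et al.: the `ParticipantRole` record and its two boolean fields are hypothetical names introduced for illustration, and the values encode only the judgments given above for the instrument role of cut and drink.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParticipantRole:
    """One semantic participant of a verb (hypothetical record for illustration)."""
    verb: str
    role: str
    obligatory: bool      # entailed in every situation the verb denotes
    class_specific: bool  # restricted to a verb or a narrow verb class


def is_argument(p: ParticipantRole) -> bool:
    """A participant counts as lexically encoded only if both criteria hold."""
    return p.obligatory and p.class_specific


# Judgments for the instrument role, as stated in the text for (1) and (2).
# class_specific for drink is not stated in the text; False is a placeholder
# that does not affect the outcome, because obligatoriness already fails.
cut_instr = ParticipantRole("cut", "instrument", obligatory=True, class_specific=True)
drink_instr = ParticipantRole("drink", "instrument", obligatory=False, class_specific=False)

for p in (cut_instr, drink_instr):
    status = "argument" if is_argument(p) else "adjunct"
    print(f"{p.verb}: {p.role} -> {status}")
```

Under these judgments the instrument comes out as an argument of cut and as an adjunct of drink, which matches the contrast drawn between (1) and (2).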

    3 My research

    I have selected basic verb types in English

    and Bulgarian and examined their semantic

    properties with an account of their syntactic distribution. Special attention was paid to

    approximately 20 verbs, subgroups of what are

    called Verbs of Contact by Impact (as defined in

    Levin, 1993) along with verbs that include

    motion (in Levin's classification, those fall in

    the group of Throw Verbs).

    In order to determine the possible morpho-

    syntactic environment of the verbs selected, I

    have partially analyzed the type of syntactic

    behaviour they exhibit in the available corpora. I

    have also tested the results of the analyses against native speaker judgments in two similar

    continuation tests for both languages.

    3.1 Corpora used in the research

    I have used the Brown and LOB corpora for

    English; for Bulgarian I have used a corpus that

    is still under construction in the Laboratory of

    Computer Modelling of Bulgarian Language, at

    the Bulgarian Academy of Sciences, where I did

    part of my field research. The corpus research was aimed at revealing the possible and/or

    preferred syntactic environment of the relevant

    verbs, as well as an analysis of the most

    common semantic participants in relation with

    their syntactic distribution.

    For the purpose of this paper, I will only

    show a few illustrative examples for English

    (LOB corpus only) and Bulgarian. In addition, because the Bulgarian corpus is very large but does

    not yet allow for specific searches, only the

    first 100 occurrences have been analysed

    in detail for the more frequent verbs.

    3.2 Results and examples

    The results from the corpora research are

    summarised in Table 1 and Table 2, for English

    and Bulgarian, respectively.

    As expected, the verbs examined showed a

    great tendency to occur in a syntactic

    environment that consisted of elements that are

    overt expressions of the semantic participants

    linked to the particular verb. Thus I could

    observe the relations between a semantic

    participant of a verb and the possible syntactic

    positions it can occupy with this verb being the

    main verb in the sentence. Special attention is

    paid to some of the information obtained from

    the corpora analyses, and its significance for the lexical representation of the verbs. The

    focus of the paper is on the results for

    Bulgarian, because it is not so well studied, and

    therefore perhaps more interesting to discuss.

    One peculiarity can be clearly seen in three

    of the Bulgarian verbs analyzed: proboda/stab,

    pljasna/slap, and potupam/tap. Aside from the

    appearance of the traditionally accepted

    complement (the direct object), which here is

    referred to as Limit (the participant that is

    affected or changed), we can see that there are also many phrases that are identified as Body-

    Part/Possessor (Levin's term, ibid: 71), as

    illustrated by the following examples:

    (3) i go probode v sartseto.

    and stabbed him in his heart.

    (4) Tja pljasna Gabi po koljanoto.

    She slapped Gabi on her knee.

    (5) i go potupah po ramoto.

    and slapped him on his shoulder


    The high degree of occurrences of those

    phrases with particular verbs suggests that the

    information conveyed by them is an important

    part of the lexical representation of the verbs.

    This, however, does not directly mean that we

    should include them as separate participants in

    the situation described. On the contrary, the

    Possessor Raising phrase specifies the Limit, or

    more precisely, the place of its contact with

    what is called the Launch-part, but it does not

    constitute a separate participant. Therefore, we

    should distinguish between different types of

    information all of which is important for the

    lexical encoding of a particular verb, but which

    should not be treated in the same way.

    Another interesting example would be the

    presence of path in the verb pljasna/slap:

    (6) ...opashkata mu, , pljasna vav vodata.

    his tail, , slapped in the water.

    (7) ...tja pljasna dolu...

    she slapped down

    (She fell down)

    (8) ...and pljasna dolu varhu trevata.

    and slapped down onto the grass.

    (She fell down on the grass)

    Whereas in sentence (6) the prepositional

    phrase in the water cannot be initially identified

    as path, the prepositional phrases in sentences

    (7) and (8) show that in the water should also

    be regarded as an overt expression of the end of

    path information that is lexically encoded in the

    verb. A similar behaviour is observed for other

    verbs of motion (see for example Dimitrova-

    Vulchanova, 2004).

    4 The continuation tests

    To determine what kind of participant

    information should be included in the lexical

    encoding of this particular set of verbs, I have

    used not only data from English and Bulgarian

    corpora, as described earlier, but also the results

    from two similar continuation tests conducted

    for both English and Bulgarian. These tests were

    developed to test native speakers' intuitions

    about the most prominent participants in a situation denoted by the target verbs.

    4.1 Methodology of the tests

    The tests were organized as follows: there

    were 50 to 60 sentences, containing as many as

    20 target verbs, together with approximately

    the same number of sentences containing

    distracter verbs, equally distributed among the target sentences. The first 30-40 sentences

    consisted only of a subject and a verb, while

    the last sentences also contained a direct object.

    All the participants in the tests were asked to

    complete the sentences without spending too

    much time on any of the items. I encouraged

    the participants to write down each

    continuation quickly, so that it would be the first

    thing that came to mind (additional

    literature on the methodology of similar types of tests can be found in Koenig et al. 2002, 2003).

    The main idea behind the continuation tests

    was to confirm the hypothesis that, if implicit

    participant information is lexically encoded,

    then it will play an important role in the

    immediate representation that the readers form

    for sentences, and is therefore more likely to be

    used to continue a sentence. Thus I expected to

    receive a significantly higher percentage of

    continuations related to semantic participant

    information, than the percentage of the responses that do not include lexically encoded

    participant information.
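
The comparison described above comes down to tallying labelled continuations per category and converting the counts to percentages. The sketch below shows one way to do that; the function name `category_percentages` is hypothetical, and the ten labels are invented for illustration only. They are not data from the tests.

```python
from collections import Counter


def category_percentages(labels):
    """Return the share of each category among labelled continuations, in per cent."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # no continuations were provided
    return {category: 100 * n / total for category, n in counts.items()}


# Invented labels for ten continuations of one target sentence (not test data).
labels = ["Limit"] * 9 + ["Manner"]
shares = category_percentages(labels)
print(shares)
```

A category whose share is much higher than the others would count as support for the hypothesis that the corresponding participant information is lexically encoded.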

    4.2 Results and analysis

    Some of the results for Bulgarian (in per

    cent) can be seen in Table 3, in the Appendix.

    The tests for English have not yet been fully

    completed, but the analyses so far are

    consistent with the results from the Bulgarian

    tests. There are higher percentages of continuations related to information about

    semantic participants: approximately 90% of

    the continuations for tap, stab, and cut can be

    defined as Limit. And since the answers differ

    widely from each other (e.g. continuations for

    cut included: the bread, John, her finger, her

    hair, her knee) it can not just be assumed that

    the results are merely due to the existence of a

    certain stereotype or a phraseological unit

    containing the target verb.

    The continuations provided for the sentences also confirmed the corpora analyses


    as can be seen in the tables in the Appendix. In

    addition, the continuations for the second half of

    the sentences in the tests, which were virtually

    complete (as described earlier, the sentences

    had a subject, verb, and a direct object),

    contained a high degree of fillers that were

    consistent with the information assumed to be semantically encoded. Instrument/Body

    extension constituted 90% of the fillers for stab,

    27% for cut, and 37% for tap.

    5 The formal description

    So far we have seen that the kind of

    participant information that can be lexically

    encoded, is semantically obligatory, and is

    restricted to a verb, or a verb class in terms of selection. Based on the data from the corpora

    research, together with the results from the

    continuation tests, I have tried to find a suitable

    formalized lexical representation for the set of

    verbs selected, that is, a representation that will

    account for the considerable complexity, and

    subtlety of their meaning.

    Following a proposal made by Hellan and

    Vulchanova (Hellan & Dimitrova-Vulchanova

    2000) I assume that there is a set of lexical

    semantic factors that serves as the basis for predictions about the possible morpho-syntactic

    environment of a verb. One of the potential

    members of that set is called criteriality. In

    order to define criteriality, I must first briefly

    describe the structural unit constituting the

    meaning of a verb, called a cell (Dimitrova-

    Vulchanova 1996/99).

    A cell consists of two parts: an aspectual

    part and a dimensional part. The aspectual part

    specifies the following factors:

    a. Situational vs. Non-Situational: reflecting whether what is expressed by the verb is situated in time or not.

    b. Dynamic vs. Stative: relevant only for Situational verbs and reflecting whether some kind of change or Force emission is involved or not.

    c. Monodevelopmental vs. Non-Monodevelopmental: depending on whether the dimensional part includes Monodevelopment or not.

    d. Protracted vs. Non-Protracted: a contrast that is close to the traditional distinction durational vs. non-durational.

    The dimensional part consists of a number

    of dimensions. Each of them reflects a different aspect of the involvement of one and the same

    participant in the situation denoted by the verb.

    It is a new decompositional approach to the

    traditional Theta-roles (Dowty, 1991). The

    importance has been shifted to the number of

    the participants, as well as to the differentiation

    of the sub-events constituting the main event.

    Each of those sub-events should be separately

    described in detail. All of them describe the

    situation as a whole.

    The dimensions may consist of one or more

    values. The dimension of Force, then, may

    incorporate the values of Source (the

    participant performing the action), Launch-Part

    (the part of the participant, if any, performing

    the action), and Limit (the item upon which the

    force has been performed). The Control

    dimension (for action that is under the control

    of a participant) will incorporate the values of

    the Controller, the Means, and the Target. The

    dimension of Monodevelopment (short for monotonic development) includes the value

    of a Monodeveloper (the one performing the

    monotonic development) together with

    information about the possible respects in

    which the development can take place:

    Integrity, Location (mainly regarding path),

    and Quality are among the main cases.

    For many verbs a further dimension of

    Conditioning is possible in close relation with

    Monodevelopment. Conditioning applies

    when, in a given context, a given event or actor, called the Conditioner, is sufficient to

    release a certain event, called Conditioned. For

    example, in John broke the window, John is

    the Conditioner for the event of breaking the

    window. In contrast, in The window broke, no

    Conditioner is identified and no Conditioning

    obtains in this usage of break.

    A participant in a situation is thus defined

    by the set of values characterising co-indexed

    elements in the different dimensions.

    Furthermore, the meaning of a verb (the Cell) is identified with the conditions that have to be


    met by the participants in a situation so that it

    can count as being expressed by this particular

    verb. The notion of Criteriality, then, applies to

    the items of a cell that have properties by which

    the situation is easily identified as belonging to

    a certain type.

    The following items (in Hellan and Vulchanova 2000) are defined as criterial:

    1. An item with the value Monodeveloper

    2. A Source whose Launch-part

    (a) behaves monotonically, or

    (b) is specified for inherent properties

    3. A Limit with sustained contact

    4. An item characterized for Posture

    5. A Source for an iterative activity with a cumulative Target

    With this theoretical approach as the basis of

    my research, I have attempted to find a unified

    format for representing lexical entries not only

    within a single language, but also across

    languages. Thus a more formal (and probably

    more accurate) comparison of verbs (their

    meaning as well as their syntactic

    behaviour) can be achieved, as it becomes possible to compare verbs that encode

    more or less information than their correlates.

    A more in-depth analysis of the basic cell of

    the verb tap/potupam illustrates this approach:

    Cell of tap/potupam

    Global specification: +Protracted

    Constituency of Development: Recursion based

    Mode of recursion: iterative

    Recursive unit: Celln

    Aspectual Specification: +2-point

    Element specification:

    Conditioning | Constituency | Force | Monodevelopment

    Conditioner1 | | Source1 |

    | Fingers2 | Launch-part2 | Monodeveloper2

    | | Limit3 |

    Conditioned2 | | |

    Cell:

    Aspectual specification: +2-point

    Element specification:

    MonodevelopmentaElement: 2

    Phasing: +2-point

    Medium: Location

    Line of Trajectory

    End:Contact with 3

    Limit3

    Thus sentences (9) and (10) can be

    evaluated as described below:

    (9) John tapped his fingers on the desk.

    (10) His fingers tapped on the desk.

    In the context of (9) John tapped his

    fingers on the desk, John will be defined as the set of values (Conditioner1, Source1), his

    fingers as (Fingers2, Launch-part2, Mover2), and

    the desk as (Absorber3, Limit3). However, in the

    case of (10) His fingers tapped on the desk,

    the dimension of Conditioning will not be

    present.
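
The idea that a participant is the set of co-indexed values across dimensions can be sketched as a grouping step. This is a minimal sketch, not the Sign Model's own formalism: the triple encoding and the function name `participants` are hypothetical, and the assignment of values to dimensions is one reading of the element specification of the cell of tap/potupam given above, so the labels differ slightly from those used in the prose (Monodeveloper rather than Mover, no Absorber).

```python
from collections import defaultdict

# (dimension, value, index) triples read off the element specification of the
# tap/potupam cell. Which value belongs to which dimension is an assumption.
CELL_TAP = [
    ("Conditioning", "Conditioner", 1),
    ("Conditioning", "Conditioned", 2),
    ("Constituency", "Fingers", 2),
    ("Force", "Source", 1),
    ("Force", "Launch-part", 2),
    ("Force", "Limit", 3),
    ("Monodevelopment", "Monodeveloper", 2),
]


def participants(cell, dropped_dimensions=()):
    """Group co-indexed values into participants, skipping dropped dimensions."""
    grouped = defaultdict(list)
    for dimension, value, index in cell:
        if dimension not in dropped_dimensions:
            grouped[index].append(value)
    return dict(grouped)


# (9) John tapped his fingers on the desk: all dimensions are present.
print(participants(CELL_TAP))
# (10) His fingers tapped on the desk: the Conditioning dimension is absent.
print(participants(CELL_TAP, dropped_dimensions=("Conditioning",)))
```

Dropping one dimension removes only the values that belong to it, so the remaining participants keep their other values. This is the sense in which the difference between (9) and (10) follows from the representation.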

    This representational format makes the alternation between (9) and (10) not

    only possible but also easily predictable. The

    basic cell contains two items that can be

    counted as criterial: one of them is John,

    according to 2(a) (a Source whose Launch-part behaves monotonically), and the other one is

    his fingers, according to 1 (an item with the

    value Monodeveloper).
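
The two criteriality conditions used here can be written as a small check over a cell. The sketch below covers criteria 1 and 2(a) only and rests on a simplifying assumption that is not part of the Sign Model as cited: a Launch-part is taken to "behave monotonically" when the item it is co-indexed with carries the value Monodeveloper. The function name and the `launch_part_of` mapping are hypothetical.

```python
def criterial_items(values_by_index, launch_part_of):
    """Apply criteria 1 and 2(a) only and return the set of criterial indices."""
    criterial = set()
    for index, values in values_by_index.items():
        # Criterion 1: an item with the value Monodeveloper.
        if "Monodeveloper" in values:
            criterial.add(index)
        # Criterion 2(a): a Source whose Launch-part behaves monotonically,
        # modelled here as the Launch-part item also being a Monodeveloper.
        part = launch_part_of.get(index)
        if "Source" in values and part is not None:
            if "Monodeveloper" in values_by_index.get(part, set()):
                criterial.add(index)
    return criterial


tap = {
    1: {"Conditioner", "Source"},                    # John
    2: {"Fingers", "Launch-part", "Monodeveloper"},  # his fingers
    3: {"Limit"},                                    # the desk
}
# Index 1 (John) has index 2 (his fingers) as its Launch-part.
print(sorted(criterial_items(tap, launch_part_of={1: 2})))
```

With these inputs the check singles out John and his fingers and leaves the desk out, in line with the two criterial items named above.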

    The alternation, described in Levin (1993)

    as Causative/Inchoative, was incorrectly

    predicted by Levin as impossible with the verb

    tap. According to Levin's criteria, verbs

    undergoing Causative/Inchoative Alternation

    can be characterized as verbs of Change of

    State or Change of Location. The verb tap is a

    member of the Hit Verbs, a sub-group of Verbs of Contact by Impact, and of two other verb


    groups, Throw Verbs and Investigate Verbs, and

    was not predicted to be able to allow this

    alternation. Levin may have come to this

    incorrect conclusion by overlooking the

    individual participants and the single sub-events

    in the situation, instead regarding the situation

    as a whole. As we have already seen, there is, in fact, a change of location, but with regard to the

    Launch-part only.

    6 Conclusion

    I have tried to show that breaking up

    information for encoding into relevant semantic

    features, and using a suitable formal

    representation are crucial in finding a unified format for representing lexical entries, not only

    within a single language, but also across

    languages. Thus a more formal (and probably

    more accurate) comparison of verbs (their

    meaning and their syntactic behaviour) can be

    achieved, because this approach makes it

    possible to compare verbs that encode more/less

    information than their correlates. It would also

    be very interesting to investigate whether the

    representational format presented in this paper

    can be integrated within some of the well-known lexical theories, such as Pustejovsky's

    Generative Lexicon (Pustejovsky, 1995). Thus

    an investigation of the possible optimal

    solutions will be pursued to describe those verbs

    that do not have semantic equivalents in another

    language. This will lead my research to a new

    stage: the creation of a VerbNet, as a

    Distributed Lexical Database, containing a

    network of verb classes with their semantic

    features.

    7 Acknowledgments

    I would like to thank my colleagues and

    friends at the Department of Modern

    Languages, NTNU, Trondheim, who supported

    me in my research, as well as my colleagues at

    the Laboratory of Computer Modelling of Bulgarian Language, at the Bulgarian Academy

    of Sciences, Sofia, where I collected my data

    for Bulgarian and who accepted me as part of

    their research team.

    References

    M. Dimitrova-Vulchanova, 1996/99. Verb

    Semantics, Diathesis and Aspect. Doctoral

    dissertation, NTNU (University of Trondheim)/LINCOM, Newcastle/

    München

    M. Dimitrova-Vulchanova, 2004. Paths in

    Verbs of Motion. Presented at the Argument

    Structure CASTLE Conference, November

    4-6, 2004, Tromsø University

    D. Dowty, 1991. Thematic proto-roles and

    argument selection. Language, 67(3): 547-

    619.

    L. Hellan, and M. Dimitrova-Vulchanova 2000.

    Criteriality and Grammatical Realization. Lexical Specification and insertion. CLIT

    series, John Benjamins.

    J. P. Koenig, G. Mauner, B. Bienvenue,

    (2002). Class Specificity and the Lexical

    Encoding of Participant Information. Brain

    and Language, 81, 224-235.

    J. P. Koenig, G. Mauner, and B. Bienvenue,

    (2003). Arguments for Adjuncts. Cognition,

    89, 67-103.

    Levin, B. 1993. English Verb Classes and

    Alternations. Chicago and London: University of Chicago Press.

    J. Pustejovsky, 1995. The Generative Lexicon.

    The MIT Press.


    Appendix:

    Table 1: The English corpus data

    USAGE SUBJECT OBJECTVERBTrans Intrans Lit Fig

    Human/

    part

    Meta-

    phor

    Instr/ or

    body extensOther Argument Adjunct

    cut 6923

    37 pass86 43

    43 source

    4 limit4 limit 7

    6 source

    29 limit

    58 limit

    5 instrument12 manner

    stab 41

    1 pass2 2

    2 source

    1 limit- - 1

    2 limit

    1 BPP1 manner

    slap 83

    1 pass12 -

    8 source

    1 limit- - 2 source

    9 limit

    3 part. loc.

    2 BPP

    4 manner

    tap 14 2 15 1 15 source - - 1

    16 limit

    4 instrument

    1 BPP

    3 manner

    1 quantity

    Table 2: The Bulgarian corpus data

    USAGE SUBJECT OBJECT

    VERBTrans Intr Lit Fig

    Human/

    part

    Meta

    phor

    Instr /or

    body ext.Other Argument Adjunct

    rezha

    (cut)

    71

    1 refl.

    21

    7 se-passive 90 12 39source 3

    2/

    2 wings

    5 source

    7 limit

    71 limit

    12 instr2 BPP

    20 mann

    8 loc3 quant

    proboda

    (stab)

    83

    4 refl.- 72 15 34source 14 12 4 source

    84 limit

    24 BPP

    23 instr

    9 mann

    4 quant

    1 time

    1 loc

    pljasna

    (slap)10 31 41 -

    21source

    1 (face)-

    3

    (feet/tail)

    3

    (object)

    4 (bird)

    19 limit

    15 instr

    13 BPP

    6 path end

    3 mann

    3 quant

    2 loc

    potupam

    (tap)

    93

    5 refl.

    2100 - 76source - 2 -

    95 limit

    72 BPP

    7 instr/b.e.

    13 mann

    2 loc

    1 time


    Table 3: Results from the continuation test for Bulgarian

    SentenceInstr/

    Body extLimit Path

    Body-Part/

    PossessorLoc Temp Manner Other

    3. Bob potupa

    Bob tapped10

    90

    +3 +16 +3

    8. Lucy pljasna

    Lucy slapped37

    26

    +10 +313 7

    17

    shamar

    slap

    9. Margaret otrjaza

    Margaret cut93 7 (s.o.)

    11. Billy probode

    Billy stabbed

    7

    +2

    90

    +13(-)

    26. Knigata pljasna

    The book slapped10

    70 origin

    3 end10

    3

    3(-)

    28. Nozhat rezhe

    The knife cuts43 47

    7

    3(-)

    32. Valnite pljaskaha

    The waves slapped57 3 27

    3

    10(-)

    34. Lilly otrjaza hljaba

    Lilly cut the bread27 47 26

    36. Ann probode mesoto

    Ann stabbed the meat90 10

    38. Iva potupvashe po

    masata

    Iva tapped on the table

    37 533

    7(-)

    Legend:

    Argument: lexically encoded semantic

    participant

    Adjunct: non-lexically encoded semantic participant

    Instr/or body ext. (b.e.): instrument or body

    extension (hand, finger, leg, foot) used as an

    instrument

    Human/part: human or part of a human (face,

    head, eyes, hair)

    Trans: transitive usage of the verb

    Intrans: intransitive usage of the verb

    Lit: literal usage of the verb

    Fig: figurative usage of the verb

    Source: the participant performing the action

    Limit: the participant that is affected or

    changed

    BPP: body-part/possessor

    Loc: (event) location

    Part. loc: participant location

    Pass: passive

    Se-pass: se-passive (a certain type of passive in

    Bulgarian)

    Refl: reflexive

    Quant: quantity

    Mann: manner

    Temp: temporal (time)

    (-): no continuation was provided

    (s.o.): someone

    +: refers to continuations provided in addition

    to the first one