8/4/2019 GL'2005paper
LEXICAL ENCODING OF VERBS IN ENGLISH AND BULGARIAN
Rositsa Dekova
Department of Modern Languages, NTNU
Trondheim, Norway, NO-4791
Abstract
This paper focuses on the information that
can be encoded in verbs as lexical entries, and
its formal representation both in English and
Bulgarian. For this purpose, an already existing,
but not very widespread framework is used - the
Sign Model (Dimitrova-Vulchanova, 1996/99;
Hellan and Dimitrova-Vulchanova 2000) that
describes words as meaningful cells, including
morpho-syntactic information. Based on corpora
data and results from continuation tests, my
research is an attempt to find a unified format
for representing lexical entries not only within a
single language, but also across languages.
1 Introduction
The knowledge that native speakers
demonstrate suggests that something beyond
word-specific idiosyncratic properties needs to
be accounted for in the lexical representation of
words. Information about the syntactic
environment in which a particular word can
appear should also be part of the lexical
encoding, a finding that is particularly relevant
for verbs.
The assumption that only some participant
information is encoded lexically is widespread
across a number of different linguistic theories.
Traditionally, participants in a situation, denoted
by a particular verb, are divided into two main
groups: arguments and adjuncts (or
complements and modifiers). A verb selects a set
of arguments. The adjuncts, in contrast, are
neither required, nor dependent on the particular
verb. They can co-occur with many other verbs.
Furthermore, syntactic realization does not
always overlap with semantic obligatoriness.
Therefore, I will refer to the set of entities
included in a situation as semantic participants
(following the terms in Koenig et al. 2002),
where lexically encoded semantic participants
are called arguments, and non-lexically encoded
semantic participants adjuncts. The terms
complement and modifier are used respectively
for their syntactic correlates.
2 Theoretical Background
Although it is widely accepted that the
syntactic structure of many sentences is
determined mostly or entirely by the participant
information included in the lexical entries of
verbs, there are no reliable syntactic criteria
that can be used to delimit the set of items that
can express lexically encoded participant
information. In other words, there is no
established set of necessary and sufficient
criteria that can serve as a clear-cut basis for
the distinction between information that is
lexically encoded and information that is not
(that is the distinction between arguments and
adjuncts).
One very good solution, however, has been
suggested in the paper "Class
Specificity and the Lexical Encoding of
Participant Information" (Koenig et al. 2002).
They propose two criteria, semantic
obligatoriness and verb class specificity, which
jointly determine the argument status of
participant information. The authors define
lexically encoded information as that
information which is accessed immediately
upon recognition of a word. This is because
only lexically encoded participant information
is expected to play a role in the immediate
representation that readers form for sentences.
This information is said to be obligatory, that
is, it is entailed to hold of the class of
situations denoted by a word (ibid, p.226),
and it is also relatively specific to the
corresponding verbs. Those two properties can
be directly observed by language users, and
therefore, according to Koenig et al., they can
serve as criteria providing a basis for learning
the distinction between arguments and adjuncts.
The contrast between the verbs in sentences
(1) and (2) illustrates this approach:
(1) He cut the paper with the scissors.
(2) She drank the cocktail with a straw.
While cut always describes a situation where
an instrument is included, drink only allows an
instrument to be included in some types of
situations denoted by it. Thus, the instrument
phrase with the scissors is both obligatory and
specific for the verb cut, but one does not have
to use an instrument to drink. Therefore an
instrument should be included in the lexical
representation of cut, and need not be present in
the encoding of drink. Relying heavily on these
two criteria, namely semantic obligatoriness and
verb class specificity, I have been able to isolate
the participant information that should be
encoded in the lexical representation of a
selected set of verbs.
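The joint use of the two criteria can be sketched in a few lines of Python. This is only an illustration: the class `ParticipantInfo`, the function `status`, and the Boolean encoding of the two criteria are assumptions introduced here, not part of Koenig et al.'s proposal, and the judgments entered for cut and drink simply restate the discussion of sentences (1) and (2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParticipantInfo:
    """One piece of participant information for one verb (hypothetical type)."""
    verb: str
    role: str
    obligatory: bool      # entailed in every situation the verb denotes
    class_specific: bool  # restricted to this verb or a narrow verb class


def status(p: ParticipantInfo) -> str:
    """Return 'argument' only when both criteria hold, otherwise 'adjunct'."""
    return "argument" if p.obligatory and p.class_specific else "adjunct"


# Judgments restating the text: an instrument is obligatory and specific for cut.
cut_instr = ParticipantInfo("cut", "instrument", obligatory=True, class_specific=True)

# For drink the text only says the instrument is not obligatory; the
# class_specific value below is a placeholder and does not affect the result.
drink_instr = ParticipantInfo("drink", "instrument", obligatory=False, class_specific=False)

print(status(cut_instr))    # argument
print(status(drink_instr))  # adjunct
```

Treating the criteria as a conjunction reflects the statement that they jointly determine argument status; a graded notion of class specificity would need a different encoding.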
3 My research
I have selected basic verb types in English
and Bulgarian and examined their semantic
properties with an account of their syntactic
distribution. Special attention was paid to
approximately 20 verbs, subgroups of what are
called Verbs of Contact by Impact (as defined in
Levin, 1993) along with verbs that include
motion (in Levin's classification, those fall in
the group of Throw Verbs).
In order to determine the possible morpho-
syntactic environment of the verbs selected, I
have partially analyzed the type of syntactic
behaviour they exhibit in the available corpora. I
have also tested the results of the analyses
against native speaker judgments in two similar
continuation tests for both languages.
3.1 Corpora used in the research
I have used the Brown and LOB corpora for
English; for Bulgarian I have used a corpus that
is still under construction in the Laboratory of
Computer Modelling of Bulgarian Language, at
the Bulgarian Academy of Sciences, where I did
part of my field research. The corpus research
was aimed at revealing the possible and/or
preferred syntactic environment of the relevant
verbs, as well as an analysis of the most
common semantic participants in relation with
their syntactic distribution.
For the purpose of this paper, I will only
show a few illustrative examples for English
(LOB corpus only) and Bulgarian. Also,
as the Bulgarian corpus is very large but does
not yet allow for specific searches, only the
first 100 occurrences have been analysed
in detail for the more frequent verbs.
3.2 Results and examples
The results from the corpora research are
summarised in Table 1 and Table 2, for English
and Bulgarian, respectively.
As expected, the verbs examined showed a
great tendency to occur in a syntactic
environment that consisted of elements that are
overt expressions of the semantic participants
linked to the particular verb. Thus I could
observe the relations between a semantic
participant of a verb and the possible syntactic
positions it can occupy with this verb being the
main verb in the sentence. Special attention is
paid to some of the information obtained from
the corpora analyses, and its significance for
the lexical representation of the verbs. The
focus of the paper is on the results for
Bulgarian, because it is not so well studied, and
therefore perhaps more interesting to discuss.
One peculiarity can be clearly seen in three
of the Bulgarian verbs analyzed: proboda/stab,
pljasna/slap, and potupam/tap. Aside from the
appearance of the traditionally accepted
complement (the direct object), which here is
referred to as Limit (the participant that is
affected or changed), we can see that there are
also many phrases that are identified as
Body-Part/Possessor (Levin's term, ibid: 71), as
illustrated by the following examples:
(3) i go probode v sartseto.
and stabbed him in his heart.
(4) Tja pljasna Gabi po koljanoto.
She slapped Gabi on her knee.
(5) i go potupah po ramoto.
and tapped him on his shoulder.
The high frequency of those
phrases with particular verbs suggests that the
information conveyed by them is an important
part of the lexical representation of the verbs.
This, however, does not directly mean that we
should include them as separate participants in
the situation described. On the contrary, the
Possessor Raising phrase specifies the Limit, or
more precisely, the place of its contact with
what is called the Launch-part, but it does not
constitute a separate participant. Therefore, we
should distinguish between different types of
information, all of which are important for the
lexical encoding of a particular verb, but which
should not be treated in the same way.
Another interesting example would be the
presence of path in the verb pljasna/slap:
(6) ...opashkata mu, , pljasna vav vodata.
his tail, , slapped in the water.
(7) ...tja pljasna dolu...
she slapped down
(She fell down)
(8) ...and pljasna dolu varhu trevata.
and slapped down onto the grass.
(She fell down on the grass)
Whereas in sentence (6) the prepositional
phrase in the water cannot be initially identified
as path, the prepositional phrases in sentences
(7) and (8) show that in the water should also
be regarded as an overt expression of the end of
path information that is lexically encoded in the
verb. A similar behaviour is observed for other
verbs of motion (see for example
Dimitrova-Vulchanova, 2004).
4 The continuation tests
To determine what kind of participant
information should be included in the lexical
encoding of this particular set of verbs, I have
used not only data from English and Bulgarian
corpora, as described earlier, but also the results
from two similar continuation tests conducted
for both English and Bulgarian. These tests were
developed to test native speakers' intuitions
about the most prominent participants in a
situation denoted by the target verbs.
4.1 Methodology of the tests
The tests were organized as follows: there
were 50 to 60 sentences, containing as many as
20 target verbs, together with approximately
the same number of sentences containing
distracter verbs, equally distributed among the
target sentences. The first 30-40 sentences
consisted only of a subject and a verb, while
the last sentences also contained a direct object.
All the participants in the tests were asked to
complete the sentences without spending too
much time on any of the items. I encouraged
the participants to write down each
continuation quickly, so that it would be the first
thing that came to mind (additional
literature on the methodology of similar types of
tests can be found in Koenig et al. 2002, 2003).
The main idea behind the continuation tests
was to confirm the hypothesis that, if implicit
participant information is lexically encoded,
then it will play an important role in the
immediate representation that the readers form
for sentences, and is therefore more likely to be
used to continue a sentence. Thus I expected to
receive a significantly higher percentage of
continuations related to semantic participant
information than the percentage of the
responses that do not include lexically encoded
participant information.
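The comparison of percentages that this hypothesis calls for can be sketched as follows. The category names follow the column headings of Table 3, but the choice of which categories count as participant-related (`PARTICIPANT_CATEGORIES`), the function names, and the sample labels are assumptions made for illustration only; the labels are not data from the tests.

```python
from collections import Counter

# Assumption: these annotation categories are treated as participant-related.
PARTICIPANT_CATEGORIES = {"Limit", "Instr/Body ext", "Path", "Body-Part/Possessor"}


def category_percentages(labels):
    """Map each category to its share of all continuations, in per cent."""
    if not labels:
        raise ValueError("at least one labelled continuation is required")
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


def participant_share(labels):
    """Percentage of continuations that express participant information."""
    if not labels:
        raise ValueError("at least one labelled continuation is required")
    hits = sum(1 for label in labels if label in PARTICIPANT_CATEGORIES)
    return 100.0 * hits / len(labels)


# Hypothetical annotations for a single test sentence (not real test data).
labels = ["Limit"] * 9 + ["Manner"]
shares = category_percentages(labels)
print(shares)
print(participant_share(labels))
```

Under the hypothesis, `participant_share` for a target verb should come out clearly above the share of all remaining categories taken together.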
4.2 Results and analysis
Some of the results for Bulgarian (in per
cent) can be seen in Table 3, in the Appendix.
The tests for English have not yet been fully
completed, but the analyses so far are
consistent with the results from the Bulgarian
tests. There are higher percentages of
continuations related to information about
semantic participants: approximately 90% of
the continuations for tap, stab, and cut can be
defined as Limit. And since the answers differ
widely from each other (e.g. continuations for
cut included: the bread, John, her finger, her
hair, her knee), it cannot just be assumed that
the results are merely due to the existence of a
certain stereotype or a phraseological unit
containing the target verb.
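One simple way to make the variety of the answers measurable is the proportion of distinct continuations among all continuations given for a verb. The measure and the function `distinct_ratio` are assumptions added here for illustration and were not part of the analysis; the first list repeats the examples quoted for cut, while the second is an invented contrast case.

```python
def distinct_ratio(continuations):
    """Share of distinct answers among all answers (1.0 = all different)."""
    if not continuations:
        raise ValueError("at least one continuation is required")
    normalised = [c.strip().lower() for c in continuations]
    return len(set(normalised)) / len(normalised)


# The continuations for cut quoted in the text.
cut_examples = ["the bread", "John", "her finger", "her hair", "her knee"]

# Invented contrast case: answers dominated by one fixed expression.
stereotyped = ["the bread", "the bread", "the bread", "the bread", "her hair"]

print(distinct_ratio(cut_examples))  # 1.0
print(distinct_ratio(stereotyped))   # 0.4
```

A ratio close to 1.0 is what one would expect if the answers are not driven by a single stereotype or phraseological unit.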
The continuations provided for the
sentences also confirmed the corpora analyses
as can be seen in the tables in the Appendix. In
addition, the continuations for the second half of
the sentences in the tests, which were virtually
complete (as described earlier, the sentences
had a subject, verb, and a direct object),
contained a high proportion of fillers that were
consistent with the information assumed to be
semantically encoded. Instrument/Body
extension constituted 90% of the fillers for stab,
27% for cut, and 37% for tap.
5 The formal description
So far we have seen that the kind of
participant information that can be lexically
encoded is semantically obligatory and is
restricted to a verb or a verb class in terms of
selection. Based on the data from the corpora
research, together with the results from the
continuation tests, I have tried to find a suitable
formalized lexical representation for the set of
verbs selected, that is, a representation that will
account for the considerable complexity, and
subtlety of their meaning.
Following a proposal made by Hellan and
Vulchanova (Hellan & Dimitrova-Vulchanova
2000) I assume that there is a set of lexical
semantic factors that serves as the basis for
predictions about the possible morpho-syntactic
environment of a verb. One of the potential
members of that set is called criteriality. In
order to define criteriality first I must briefly
describe the structural unit constituting the
meaning of a verb, called a cell (Dimitrova-
Vulchanova 1996/99).
A cell consists of two parts: an aspectual
part and a dimensional part. The aspectual part
specifies the following factors:
a. Situational vs. Non-Situational: reflecting
whether what is expressed by the
verb is situated in time or not.
b. Dynamic vs. Stative: relevant only for
Situational verbs and reflecting whether
some kind of change or Force emission is
involved or not.
c. Monodevelopmental vs. Non-Monodevelopmental:
depending on
whether the dimensional part includes
Monodevelopment or not.
d. Protracted vs. Non-Protracted: a
contrast that is close to the traditional
distinction durational vs. non-durational.
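The four factors can be pictured as a small record with one consistency check, sketched here in Python. The class `AspectualPart` and its field names are hypothetical; the only constraint encoded is the one stated in (b), that Dynamic vs. Stative is relevant only for Situational verbs. In the example, +Protracted comes from the cell of tap/potupam given later in the paper, while the other three values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AspectualPart:
    """Aspectual part of a cell (hypothetical encoding of factors a-d)."""
    situational: bool
    dynamic: Optional[bool]   # None when the verb is Non-Situational
    monodevelopmental: bool
    protracted: bool

    def __post_init__(self):
        # Factor (b) is defined only for Situational verbs.
        if not self.situational and self.dynamic is not None:
            raise ValueError("Dynamic vs. Stative is relevant only for Situational verbs")
        if self.situational and self.dynamic is None:
            raise ValueError("a Situational verb must be Dynamic or Stative")


# Assumed values for tap; only protracted=True is taken from the paper.
tap = AspectualPart(situational=True, dynamic=True,
                    monodevelopmental=True, protracted=True)
print(tap)
```

Using `None` for the inapplicable case keeps the dependency between (a) and (b) explicit instead of forcing an arbitrary value.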
The dimensional part consists of a number
of dimensions. Each of them reflects a different
aspect of the involvement of one and the same
participant in the situation denoted by the verb.
It is a new decompositional approach to the
traditional Theta-roles (Dowty, 1991). The
importance has been shifted to the number of
the participants, as well as to the differentiation
of the sub-events constituting the main event.
Each of those sub-events should be separately
described in detail. All of them describe the
situation as a whole.
The dimensions may consist of one or more
values. The dimension of Force, then, may
incorporate the values of Source (the
participant performing the action), Launch-Part
(the part of the participant, if any, performing
the action), and Limit (the item upon which the
force has been performed). The Control
dimension (for action that is under the control
of a participant) will incorporate the values of
the Controller, the Means, and the Target. The
dimension of Monodevelopment (short for
monotonic development) includes the value
of a Monodeveloper (the one performing the
monotonic development) together with
information about the possible respects in
which the development can take place,
Integrity, Location (mainly regarding path),
and Quality being among the main cases.
For many verbs a further dimension of
Conditioning is possible in close relation with
Monodevelopment. Conditioning applies
when, in a given context, a given event or
actor, called the Conditioner, is sufficient to
release a certain event, called the Conditioned. For
example, in "John broke the window", John is
the Conditioner for the event of breaking the
window. In contrast, in "The window broke", no
Conditioner is identified and no Conditioning
obtains in this usage of break.
A participant in a situation is thus defined
by the set of values characterising co-indexed
elements in the different dimensions.
Furthermore, the meaning of a verb (the Cell)
is identified with the conditions that have to be
met by the participants in a situation so that it
can count as being expressed by this particular
verb. The notion of Criteriality, then, applies to
the items of a cell that have properties by which
the situation is easily identified as belonging to
a certain type.
The following items (in Hellan and
Vulchanova 2000) are defined as criterial:
1. An item with the value Monodeveloper
2. A Source whose Launch-part
(a) behaves monotonically, or
(b) is specified for inherent properties
3. A Limit with sustained contact
4. An item characterized for Posture
5. A Source for an iterative activity
with a cumulative Target
With this theoretical approach as the basis of
my research, I have attempted to find a unified
format for representing lexical entries not only
within a single language, but also across
languages. Thus a more formal (and probably
more accurate) comparison of verbs (their
meaning as well as their syntactic
behaviour) can be achieved, as it becomes
possible to compare verbs that encode
more or less information than their correlates.
A more in-depth analysis of the basic cell of
the verb tap/potupam illustrates this approach:
Cell of tap/potupam
Global specification: +Protracted
Constituency of Development: Recursion based
Mode of recursion: iterative
Recursive unit: Cell n
Aspectual Specification: +2-point
Element specification:
Conditioning | Constituency | Force | Monodevelopment
Conditioner1 | | Source1 |
| Fingers2 | Launch-part2 | Monodeveloper2
| | Limit3 |
Conditioned2 | | |
Cell:
Aspectual specification: +2-point
Element specification:
MonodevelopmentaElement: 2
Phasing: +2-point
Medium: Location
Line of Trajectory
End:Contact with 3
Limit3
Thus sentences (9) and (10) can be
evaluated as described below:
(9) John tapped his fingers on the desk.
(10) His fingers tapped on the desk.
In the context of (9) "John tapped his
fingers on the desk", John will be defined as the
set of values (Conditioner1, Source1), his
fingers as (Fingers2, Launch-part2, Mover2), and
the desk as (Absorber3, Limit3). However, in the
case of (10) "His fingers tapped on the desk",
the dimension of Conditioning will not be
present.
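The way participants fall out of co-indexed values can be sketched in Python. The entries below repeat the values listed for sentence (9); the triple format, the function names, and the use of `None` where the text does not name a dimension explicitly are assumptions for illustration. Removing the Conditioning entries models only what is said about (10) and makes no further claim about how the remaining values are realised.

```python
from collections import defaultdict

# (value, index, dimension) entries for sentence (9), as listed in the text.
# A dimension of None means the text does not name one explicitly.
ENTRIES_9 = [
    ("Conditioner", 1, "Conditioning"),
    ("Source", 1, "Force"),
    ("Fingers", 2, None),
    ("Launch-part", 2, "Force"),
    ("Mover", 2, None),
    ("Absorber", 3, None),
    ("Limit", 3, "Force"),
]


def participants(entries):
    """Group values by shared index: one index corresponds to one participant."""
    grouped = defaultdict(set)
    for value, index, _dimension in entries:
        grouped[index].add(value)
    return dict(grouped)


def drop_dimension(entries, dimension):
    """Return the entries that do not belong to the given dimension."""
    return [entry for entry in entries if entry[2] != dimension]


# Sentence (9): index 1 = John, index 2 = his fingers, index 3 = the desk.
print(participants(ENTRIES_9))

# Sentence (10): the same cell without the Conditioning dimension.
entries_10 = drop_dimension(ENTRIES_9, "Conditioning")
print(participants(entries_10))
```

Because a participant is just the set of values sharing an index, dropping a dimension changes the description of a participant without any change to the other entries.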
This representational format makes this variation not
only possible but also easily predictable. The
basic cell contains two items that can be
counted as criterial: one of them is John,
according to 2(a) (a Source whose Launch-part
behaves monotonically), and the other one is
his fingers, according to 1 (an item with the
value Monodeveloper).
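The two criteriality checks used here can be sketched as a small Python function. It covers only items 1 and 2(a) of the list of criterial items, and it rests on two assumptions that are not stated in the paper: that a Launch-part "behaves monotonically" when it shares its index with a Monodeveloper, and that in a cell with a single Source the Launch-part belongs to that Source. The dictionary `TAP_CELL` follows the values of the basic cell of tap/potupam, grouped by index.

```python
# Values per participant index, following the basic cell of tap/potupam.
TAP_CELL = {
    1: {"Conditioner", "Source"},                    # John
    2: {"Fingers", "Launch-part", "Monodeveloper"},  # his fingers
    3: {"Limit"},                                    # the desk
}


def criterial_items(cell):
    """Return {index: rule} for items criterial under rules 1 and 2(a) only."""
    # Assumption: the Launch-part behaves monotonically when it is
    # co-indexed with a Monodeveloper.
    launch_monotonic = any(
        {"Launch-part", "Monodeveloper"} <= values for values in cell.values()
    )
    result = {}
    for index, values in cell.items():
        if "Monodeveloper" in values:
            result[index] = "1"
        elif "Source" in values and launch_monotonic:
            result[index] = "2(a)"
    return result


print(criterial_items(TAP_CELL))  # {1: '2(a)', 2: '1'}
```

Rules 2(b) to 5 are left out because the cell given in the paper does not carry the information they refer to (inherent properties, sustained contact, Posture, cumulative Target).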
The alternation described in Levin (1993)
as Causative/Inchoative was incorrectly
predicted by Levin to be impossible with the verb
tap. According to Levin's criteria, verbs
undergoing the Causative/Inchoative Alternation
can be characterized as verbs of Change of
State or Change of Location. The verb tap is a
member of the Hit Verbs, a sub-group of Verbs
of Contact by Impact, and of two other verb
groups, Throw Verbs and Investigate Verbs, and
was not predicted to allow this
alternation. Levin may have come to this
incorrect conclusion by overlooking the
individual participants and the single sub-events
in the situation, instead regarding the situation
as a whole. As we have already seen, there is, in
fact, change of location, but with regard to the
Launch-part only.
6 Conclusion
I have tried to show that breaking up
information for encoding into relevant semantic
features, and using a suitable formal
representation are crucial in finding a unified
format for representing lexical entries, not only
within a single language, but also across
languages. Thus a more formal (and probably
more accurate) comparison of verbs (their
meaning and their syntactic behaviour) can be
achieved, because this approach makes it
possible to compare verbs that encode more/less
information than their correlates. It would also
be very interesting to investigate whether the
representational format presented in this paper
can be integrated within some of the
well-known lexical theories, such as Pustejovsky's
Generative Lexicon (Pustejovsky, 1995). Thus
an investigation of the possible optimal
solutions will be pursued to describe those verbs
that do not have semantic equivalents in another
language. This will lead my research to a new
stage: the creation of a VerbNet, as a
Distributed Lexical Database, containing a
network of verb classes with their semantic
features.
7 Acknowledgments
I would like to thank my colleagues and
friends at the Department of Modern
Languages, NTNU, Trondheim, who supported
me in my research, as well as my colleagues at
the Laboratory of Computer Modelling of
Bulgarian Language, at the Bulgarian Academy
of Sciences, Sofia, where I collected my data
for Bulgarian and who accepted me as part of
their research team.
References
M. Dimitrova-Vulchanova, 1996/99. Verb
Semantics, Diathesis and Aspect. Doctoral
dissertation, NTNU (University of
Trondheim)/LINCOM, Newcastle/
München
M. Dimitrova-Vulchanova, 2004. Paths in
Verbs of Motion. Presented at the Argument
Structure CASTLE Conference, November
4-6, 2004, Tromsø University
D. Dowty, 1991. Thematic proto-roles and
argument selection. Language, 67(3): 547-
619.
L. Hellan, and M. Dimitrova-Vulchanova 2000.
Criteriality and Grammatical Realization. Lexical Specification and insertion. CLIT
series, John Benjamins.
J. P. Koenig, G. Mauner, B. Bienvenue,
(2002). Class Specificity and the Lexical
Encoding of Participant Information. Brain
and Language, 81, 224-235.
J. P. Koenig, G. Mauner, and B. Bienvenue,
(2003). Arguments for Adjuncts. Cognition,
89, 67-103.
B. Levin, 1993. English Verb Classes and
Alternations. Chicago and London:
University of Chicago Press.
J. Pustejovsky, 1995. The Generative Lexicon.
The MIT Press.
Appendix:
Table 1: The English corpus data
USAGE SUBJECT OBJECT
VERB Trans Intrans Lit Fig Human/part Metaphor Instr/or body extens. Other Argument Adjunct
cut 6923
37 pass86 43
43 source
4 limit4 limit 7
6 source
29 limit
58 limit
5 instrument12 manner
stab 41
1 pass2 2
2 source
1 limit- - 1
2 limit
1 BPP1 manner
slap 83
1 pass12 -
8 source
1 limit- - 2 source
9 limit
3 part. loc.
2 BPP
4 manner
tap 14 2 15 1 15 source - - 1
16 limit
4 instrument
1 BPP
3 manner
1 quantity
Table 2: The Bulgarian corpus data
USAGE SUBJECT OBJECT
VERB Trans Intr Lit Fig Human/part Metaphor Instr/or body ext. Other Argument Adjunct
rezha
(cut)
71
1 refl.
21
7 se-passive 90 12 39source 3
2/
2 wings
5 source
7 limit
71 limit
12 instr2 BPP
20 mann
8 loc3 quant
proboda
(stab)
83
4 refl.- 72 15 34source 14 12 4 source
84 limit
24 BPP
23 instr
9 mann
4 quant
1 time
1 loc
pljasna
(slap)10 31 41 -
21source
1 (face)-
3
(feet/tail)
3
(object)
4 (bird)
19 limit
15 instr
13 BPP
6 path end
3 mann
3 quant
2 loc
potupam
(tap)
93
5 refl.
2100 - 76source - 2 -
95 limit
72 BPP
7 instr/b.e.
13 mann
2 loc
1 time
Table 3: Results from the continuation test for Bulgarian
Sentence Instr/Body ext Limit Path Body-Part/Possessor Loc Temp Manner Other
3. Bob potupa
Bob tapped 10
90
+3 +16 +3
8. Lucy pljasna
Lucy slapped 37
26
+10 +313 7
17
shamar
slap
9. Margaret otrjaza
Margaret cut 93 7 (s.o.)
11. Billy probode
Billy stabbed
7
+2
90
+13(-)
26. Knigata pljasna
The book slapped 10
70 origin
3 end10
3
3(-)
28. Nozhat rezhe
The knife cuts 43 47
7
3(-)
32. Valnite pljaskaha
The waves slapped 57 3 27
3
10(-)
34. Lilly otrjaza hljaba
Lilly cut the bread 27 47 26
36. Ann probode mesoto
Ann stabbed the meat 90 10
38. Iva potupvashe po
masata
Iva tapped on the table
37 533
7(-)
Legend:
Argument: lexically encoded semantic
participant
Adjunct: non-lexically encoded semantic
participant
Instr/or body ext. (b.e.): instrument or body
extension (hand, finger, leg, foot) used as an
instrument
Human/part: human or part of a human (face,
head, eyes, hair)
Trans: transitive usage of the verb
Intrans: intransitive usage of the verb
Lit: literal usage of the verb
Fig: figurative usage of the verb
Source: the participant performing the action
Limit: the participant that is affected or
changed
BPP: body-part/possessor
Loc: (event) location
Part. loc: participant location
Pass: passive
Se-pass: se-passive (a certain type of passive in
Bulgarian)
Refl: reflexive
Quant: quantity
Mann: manner
Temp: temporal (time)
(-): no continuation was provided
(s.o.): someone
+: refers to continuations provided in addition
to the first one