Phrases as input units in Italian N+N compounds

Jan Radimský, University of South Bohemia, České Budějovice / Budweis (CZ)


Overview

Why Italian N+N compounds?

Lieber-Scalise (2007), Baroni-Guevara-Zamparelli (2009), Delfitto-Paradisi (2007): verbal-nexus N+N compounds allow for the insertion of N+A phrases.

Quantitative verification: what do corpus data show? Are there other types of lexical integrity (LI) violation?

State of the art: lexical integrity hypothesis; Italian N+N compounds (definition, input units); compound classification: verbal-nexus and ATAP compounds; insertion vs. visibility to syntax (Construction Grammar)

Data gathering: ItWac corpus; gathering and filtering of frequency lists

Results, interpretation


Lexical integrity hypothesis

Concept of LIH: phrases cannot become the input of morphological operations.

Variety of terms: Lapointe (Generalized Lexicalist Hypothesis), Selkirk (Word Structure Autonomy Condition), Di Sciullo and Williams (The Atomicity Thesis), Bresnan and Mchombo (Lexical Integrity Principle), Botha (No Phrase Constraint).

Strong version of LIH: Scalise (1984)

A WFR (i.e. ‘word formation rule’) can take as its base only major lexical categories (N, A, V), not phrases (NP, AP, VP) or sentences; at most, it can take lexicalized phrases (i.e. ‘phrases stored in the lexicon’).
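The strong version amounts to an admissibility check on the base of a word formation rule. The Python sketch below only illustrates that reading; the category labels and the `lexicalized` flag are hypothetical simplifications, not part of Scalise's (1984) formalism.

```python
# Hypothetical labels: N, A, V are major lexical categories;
# NP, AP, VP are phrases; anything else (e.g. "S" for a sentence) is excluded.
MAJOR_LEXICAL_CATEGORIES = {"N", "A", "V"}
PHRASAL_CATEGORIES = {"NP", "AP", "VP"}


def strong_lih_allows(category: str, lexicalized: bool = False) -> bool:
    """Return True if a WFR may take a unit of this category as its base
    under the strong LIH (illustrative reading of Scalise, 1984)."""
    if category in MAJOR_LEXICAL_CATEGORIES:
        return True
    if category in PHRASAL_CATEGORIES:
        # Phrases are admitted only if they are stored in the lexicon.
        return lexicalized
    # Sentences and any other unit are never admitted.
    return False


print(strong_lih_allows("N"))                     # True
print(strong_lih_allows("NP"))                    # False: free phrase
print(strong_lih_allows("NP", lexicalized=True))  # True: lexicalized phrase
```

Under this reading, the counter-examples discussed by Lieber-Scalise (2007) are exactly the cases where a non-lexicalized phrase would return False yet still occurs inside a compound.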

Weak version of LIH: Lieber-Scalise (2007), “The lexical integrity hypothesis in a new theoretical universe”

Many counter-examples, like a [pipe and slipper] husband.

Some theories reject the LIH (Distributed Morphology, Construction Grammar).

But there are strong restrictions on the presence of phrases in compounds: why and when may phrases be input units of compounds?


Italian N+N compounds

Definition (Guevara-Scalise, 2009:107)

Italian N+N compound: [N R N]Z, made up of two nouns, where “R” represents an implicit relationship between the constituents (a relationship not spelled out by any lexical item).

Example: vagone merci (“freight wagon”) is a compound; luna [di]PREP miele (“honeymoon”) is a noun phrase.

Gaeta-Ricca (2009): the features [+/-] morphological and [+/-] lexical are mutually independent. Compounds are [+] morphological (implicit relationship between the constituents), while [+/-] lexical (semantic opacity, listedness...) is not relevant.

Compounds vs. apposition: in apposition, at least one constituent is a DP or a referential expression (proper noun):

mia sorella Maria “my sister Maria”

[la casa di Mario]DP, [l’unica villa col giardino del paese]DP “Mario’s house, the only villa with a garden in the village”

l’aggettivo ‘buono’ “the adjective ‘buono’”

la [legge]N1 [n. 457]N2-ref “the law number 457”


Italian N+N compounds: input units

Nouns (free lexemes): the default option (Bisetto A., 2004:33)

Compounds: [N+N]N

[[direzione]N [[ufficio]N [acquisti]N]N]N “head of purchasing office”: close to the so-called “label jargon” (Bisetto A., 2004:42). Compared with Germanic languages, Italian makes little use of it when the two embedded compounds are of the same type (Zuffi, 1981:17-18).

Phrases:

Insertion of phrases is possible, but restricted (Lieber-Scalise, 2007): only in verbal-nexus compounds (in both head and non-head position), and only (N+A)NP phrases, be they lexicalized or not.
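The labeled bracketings used in these slides (a constituent in square brackets followed by its category label, as in [[direzione]N [[ufficio]N [acquisti]N]N]N) can be read mechanically as trees. The Python sketch below is a hypothetical helper, not a tool used in the study; it assumes that labels consist of letters and digits only (a label such as N2-pl would be split) and gives unlabeled brackets an empty label.

```python
import re

# Tokens: an opening bracket, a closing bracket with an optional label
# ("]N", "]NA", "]NP", or a bare "]"), or a word.
TOKEN = re.compile(r"\[|\](\w*)|[^\[\]\s]+")


def parse_bracketing(text: str):
    """Parse a labeled bracketing into nested (label, children) tuples.

    Words are kept as plain strings; unlabeled brackets get the label "".
    """
    stack = [[]]
    for match in TOKEN.finditer(text):
        token = match.group(0)
        if token == "[":
            stack.append([])
        elif token.startswith("]"):
            if len(stack) == 1:
                raise ValueError("unbalanced closing bracket")
            children = stack.pop()
            stack[-1].append((match.group(1), children))
        else:
            stack[-1].append(token)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one complete constituent")
    return stack[0][0]


def labels(node):
    """Yield the labels of all bracketed constituents, outermost first."""
    if isinstance(node, tuple):
        label, children = node
        yield label
        for child in children:
            yield from labels(child)


tree = parse_bracketing("[[direzione]N [[ufficio]N [acquisti]N]N]N")
print(list(labels(tree)))  # ['N', 'N', 'N', 'N', 'N']
print(parse_bracketing("[gestione [risorse umane]NA]"))
# ('', ['gestione', ('NA', ['risorse', 'umane'])])
```

Every constituent of the recursive compound is labeled N, whereas the second example contains an NA constituent inside an unlabeled compound, which is the kind of structure the following slides count.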

Explanation

Lieber-Scalise (2007): a construction “involving a fixed template for the phrasal element, which is then down-graded to a word”.

Baroni-Guevara-Zamparelli (2009): VNxCs without internal modifiers are compounds, while VNxCs with internal adjectival modifiers are formed according to the rules of the so-called “headline syntax”.

Delfitto and Paradisi (2007): VNxCs have a syntactic origin.


Theoretical background: insertion vs. visibility to syntax

Construction Morphology (Booij, 2009:85): the No Phrase Constraint concerns the insertion of phrases; the Lexical Integrity Constraint concerns syntactic rules operating on compound elements.

Insertion: allowed (NP instead of N)

Syntactic operations on compound elements: not allowed (conjunction, wh-movement of the head, wh-movement of the non-head, non-head topicalization, pronominal reference)


Compound classification

Based on Bisetto-Scalise (2005) and Scalise-Bisetto (2009)

Subordinate verbal-nexus compounds (VNxCs): noleggio auto (“car rental”)

Deverbal head (< transitive verb); the non-head is the argument of the deverbal head (the direct object of the underlying verb). Interpretation is triggered by the deverbal head (N1).

Attributive-appositive compounds (ATAP): head + modifier (attribute)

parola chiave (“key word”): appositive compound; the modifier is a concrete noun with a metaphoric interpretation

luogo simbolo (“symbolic place”): attributive compound; the modifier is an abstract noun with a literal interpretation

Interpretation is triggered by the modifier (N2).
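The classification above can be summarized as a decision over a few semantic features, returning both the class and the constituent that triggers the interpretation. The Python sketch below is illustrative only: the `Binominal` fields are hypothetical manual annotations, and the actual classification of Bisetto-Scalise relies on semantic judgments rather than on boolean flags.

```python
from dataclasses import dataclass


@dataclass
class Binominal:
    n1: str               # head noun
    n2: str               # non-head noun
    n1_deverbal: bool     # N1 derives from a transitive verb
    n2_is_argument: bool  # N2 is the direct object of that verb
    n2_concrete: bool     # N2 is a concrete noun
    n2_metaphoric: bool   # N2 is interpreted metaphorically


def classify(c: Binominal) -> tuple[str, str]:
    """Return (compound class, constituent triggering the interpretation)."""
    if c.n1_deverbal and c.n2_is_argument:
        return "VNxC", "N1"
    if c.n2_concrete and c.n2_metaphoric:
        return "appositive ATAP", "N2"
    if not c.n2_concrete and not c.n2_metaphoric:
        return "attributive ATAP", "N2"
    return "unclassified", "?"


print(classify(Binominal("noleggio", "auto", True, True, True, False)))
# ('VNxC', 'N1')
print(classify(Binominal("parola", "chiave", False, False, True, True)))
# ('appositive ATAP', 'N2')
print(classify(Binominal("luogo", "simbolo", False, False, False, False)))
# ('attributive ATAP', 'N2')
```

Checking the verbal-nexus criterion first reflects the slide's ordering; a binominal that fits neither profile is left unclassified rather than forced into a class.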


Data gathering: compounds

ItWac binominals database: 372,361 lemmatized binominals from the ItWac corpus, based on the extraction of complete frequency lists of the patterns Art-N-N, Prep-N-N and Art-N-A (the latter because of N/A ambiguity and tagging errors), provided with annotations (lemmatization, gender, number, deverbal N1, collocability...)

Extraction of VNxCs: 1,364 types

Deverbal head (WordManager; Bopp, 1993)

N+N also appears as N+di+N in ItWac (a property of more than 90% of VNxCs according to Baroni-Guevara-Pirrelli, 2009):

[trattamento]N1 [rifiuti]N2-pl “waste treatment”

[trattamento]N1 [di]PREP-gen. [rifiuti]N2-pl “treatment of waste”

[trattamento]N1 [dei]PREP-gen.+Art.Det. [rifiuti]N2-pl “treatment of the waste”
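The two automatic criteria (a deverbal head, and the N+di+N alternation) can be sketched as a filter over the binominals before manual filtering. In this hypothetical Python sketch the WordManager analysis is replaced by a plain set of deverbal nouns, corpus frequencies are a dictionary keyed by (N1, preposition, N2) triples, and the list of accepted forms of *di* (the bare preposition plus its articulated forms) is an assumption; the toy counts are invented, not ItWac figures.

```python
# Forms of "di" counted as N+di+N evidence: the bare preposition and its
# articulated forms (an assumption; the slide shows "di" and "dei").
GENITIVE_FORMS = ("di", "del", "dello", "della", "dei", "degli", "delle")


def vnxc_candidates(binominals, deverbal_nouns, trigram_freq):
    """Return the N1+N2 binominals that pass both automatic VNxC criteria.

    binominals: iterable of (n1, n2) pairs
    deverbal_nouns: set of deverbal nouns (stand-in for WordManager)
    trigram_freq: dict mapping (n1, prep, n2) triples to corpus counts
    """
    candidates = []
    for n1, n2 in binominals:
        if n1 not in deverbal_nouns:
            continue  # criterion 1: the head must be deverbal
        if any(trigram_freq.get((n1, prep, n2), 0) > 0 for prep in GENITIVE_FORMS):
            candidates.append((n1, n2))  # criterion 2: N+di+N is attested
    return candidates


# Invented toy data, not ItWac counts.
pairs = [("trattamento", "rifiuti"), ("noleggio", "auto"), ("vagone", "merci")]
deverbal = {"trattamento", "noleggio"}
freq = {("trattamento", "dei", "rifiuti"): 120, ("noleggio", "di", "auto"): 40}
print(vnxc_candidates(pairs, deverbal, freq))
# [('trattamento', 'rifiuti'), ('noleggio', 'auto')]
```

vagone merci is discarded by the first criterion because vagone is not deverbal; the surviving candidates would still go through manual filtering.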

Manual filtering

Extraction of ATAPCs: 1,800 types

Frequently repeated modifiers (an N2 that combines with many N1, without gender agreement): [ruolo / punto / fattore...]M [chiave]F “key [role / point / factor...]”

Manual filtering
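The ATAP criterion (a modifier that recurs with many heads and does not agree with them in gender) can be sketched as follows. This Python sketch is hypothetical: gender is read from a plain dictionary, and the `min_heads` threshold for "frequently repeated" is an assumed parameter, since the slide gives no cut-off; the toy pairs are invented.

```python
from collections import defaultdict


def repeated_modifiers(pairs, gender, min_heads=3):
    """Return N2 modifiers attested with at least `min_heads` distinct N1
    heads whose gender differs from the modifier's own gender.

    pairs: iterable of (n1, n2) binominals
    gender: dict mapping a noun to "M" or "F"
    min_heads: hypothetical threshold for "frequently repeated"
    """
    heads = defaultdict(set)
    for n1, n2 in pairs:
        g1, g2 = gender.get(n1), gender.get(n2)
        if g1 and g2 and g1 != g2:  # no gender agreement between N1 and N2
            heads[n2].add(n1)
    return sorted(m for m, hs in heads.items() if len(hs) >= min_heads)


pairs = [("ruolo", "chiave"), ("punto", "chiave"), ("fattore", "chiave"),
         ("parola", "chiave"), ("vagone", "merci")]
gender = {"ruolo": "M", "punto": "M", "fattore": "M", "parola": "F",
          "chiave": "F", "vagone": "M", "merci": "F"}
print(repeated_modifiers(pairs, gender))  # ['chiave']
```

Only gender-mismatched pairs count as evidence, because a modifier that keeps its own gender with a head of the other gender is behaving as a noun rather than as an agreeing adjective.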


Data gathering: compounds with NP constituents

Extraction and filtering of complete frequency lists: ItWac (Baroni et al., 2006), via KonText (www.korpus.cz)

Example: VNxCs with the structure [Nhead [N-Prep-N]NP-argument], e.g. [rimborso [spese di viaggio]NP] “refund of travel expenses”

Gathering of a lemmatized frequency list with the given structure:

[tag="NOUN"] [tag="NOUN"] [word="a" | word="di" | word="da"] [tag="NOUN"]

Matching with identified VNxCs:

[rimborso [spese di viaggio]NP]: tested VNxC

rimborso spese: known VNxC
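The matching step can be sketched as a lookup: a hit N1 N2a Prep N2b is kept when N1 + N2a is an already identified VNxC, so that [N2a Prep N2b] can be read as the phrasal argument. In this hypothetical Python sketch the query hits are assumed to be available as 4-tuples and the known VNxCs as a set of pairs; the toy hits are invented. A string match alone does not decide between [N1 [N2a Prep N2b]] and [[N1 N2a] Prep N2b], so its output would still require filtering.

```python
def match_known_vnxcs(hits, known_vnxcs):
    """Keep N1 N2a Prep N2b hits whose N1 + N2a is an identified VNxC.

    hits: iterable of (n1, n2a, prep, n2b) tuples, e.g. rows returned
        by the corpus query above
    known_vnxcs: set of (n1, n2) pairs, e.g. from the binominals database
    Returns (head, phrasal argument) pairs for further checking.
    """
    matched = []
    for n1, n2a, prep, n2b in hits:
        if (n1, n2a) in known_vnxcs:
            matched.append((n1, f"{n2a} {prep} {n2b}"))
    return matched


hits = [("rimborso", "spese", "di", "viaggio"), ("vagone", "merci", "da", "Milano")]
known = {("rimborso", "spese")}
print(match_known_vnxcs(hits, known))  # [('rimborso', 'spese di viaggio')]
```

vagone merci da Milano is rejected because vagone merci is not a VNxC; there the prepositional phrase modifies the whole compound.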


Phrases and compounds in VNxCs

Pattern | Types | Tokens | Example

Insertion of noun phrases

[N-[N-A]] | > 1,386 | > 5,091 | [gestione [risorse umane]NA] “human resources management”

[[N-A]-N] | 187 | 532 | [[trasporto ferroviario]NA passeggeri] “railway passenger transport”

[N1-[N2a-PREP-N2b]] | 731 | 2,872 | [rimborso [spese di viaggio]NP] “refund of travel expenses”

Insertion of [N+N] compounds

[N1-[N2a-N2b]N-VNxC]N-GROUND | > 1,000 | > 6,000 | [centro [elaborazione dati]NN] “data processing center”

[N1-[N2a-N2b]N-GROUND]N-VNxC | 297 | 1,195 | [convocazione [conferenza stampa]NN] “press conference invitation”

[N1-[N2a-N2b]N-VNxC]N-VNxC | 279 | 1,232 | [scadenza [presentazione offerte]NN] “deadline for the submission of offers”

[N1-[N2a-N2b]N-I-ATAP]N-VNxC | 114 | 451 | [approvazione [linee guida]NN] “guidelines approval”

Insertion of coordinate nouns

[N1a-e-N1b]-N2 | 221 | 1,234 | [[progettazione e direzione] [lavori]] “design and supervision of works”

N1-[N2a-e-N2b] | 236 | 955 | [trasmissione [voce e dati]] “voice and data transmission”

Argument position (N2): all phrases (N-A, N-Prep-N, N-e-N) and compounds (N-N); higher type and token frequencies.

Head position: only selected phrases (N-A, N-e-N); lower frequencies (except for coordination).
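The Types and Tokens columns distinguish distinct lemma sequences from their corpus occurrences. A minimal Python sketch of that count, assuming each hit of a pattern is stored as a tuple of lemmas (the toy hits are invented):

```python
from collections import Counter


def type_token_counts(occurrences):
    """Return (types, tokens) for the matched occurrences of one pattern.

    occurrences: one tuple of lemmas per corpus hit; identical tuples
    belong to the same type.
    """
    counts = Counter(occurrences)
    return len(counts), sum(counts.values())


hits = ([("gestione", "risorsa", "umano")] * 3
        + [("trasmissione", "voce", "e", "dato")] * 2)
print(type_token_counts(hits))  # (2, 5)
```

The token-to-type ratio then indicates how often, on average, each attested compound recurs in the corpus.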


Phrases and compounds in ATAPCs

Pattern | Types | Tokens | Example

Modifier position

(a) N-[N-PREP-N] | 4,150 | 19,666 | [donna [vittima di violenza]NPN] “woman victim of violence”

(b) N-[ADV-N] | 125 | 293 | [ruolo [più cult]AdvN] “the most cult role”

Head position

(c) [N-A]-N | 228 | 3,735 | [[settore tecnologico]NA chiave] “key technology sector”

(d) [N-PREP-N]-N | 50 | 352 | [[valore di concentrazione]NPN limite] “maximum concentration value”

(e) [N-N]-N | 4 | 39 | [[conferenza stampa]NN-grounding fiume]NN-I-ATAP “never-ending press conference”

Modifier position (a-b): pattern (a) is item-specific: only 12 of 147 modifiers, such as portatore (“bearer”), frutto (“fruit”), oggetto (“object”), simbolo (“symbol”)...

Pattern (b): few modifiers; are they adjectives? (Grandi-Nissim-Tamburini, 2011)

Head position: free insertion of phrases, especially N-A (c).
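Item-specificity can be measured by how many distinct modifiers a pattern actually uses relative to the modifier inventory. This Python sketch is a hypothetical illustration: it assumes one modifier lemma per corpus hit, and the toy tokens are invented. With the slide's figures (12 of 147 modifiers), the share is about 8%.

```python
from collections import Counter


def modifier_profile(modifier_tokens, inventory_size, top=3):
    """Summarize how item-specific a pattern is.

    modifier_tokens: the modifier lemma of each corpus hit of the pattern
    inventory_size: number of ATAP modifiers under consideration
    Returns (distinct modifiers, share of the inventory rounded to three
    decimals, most frequent modifiers with their token counts).
    """
    counts = Counter(modifier_tokens)
    share = round(len(counts) / inventory_size, 3)
    return len(counts), share, counts.most_common(top)


tokens = ["vittima"] * 5 + ["simbolo"] * 3 + ["portatore"]
print(modifier_profile(tokens, inventory_size=147))
# (3, 0.02, [('vittima', 5), ('simbolo', 3), ('portatore', 1)])
```

A low share combined with high token counts for a few modifiers is the profile the slide describes for pattern (a).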


Conclusion

Noun phrases in VNxCs: not only NA phrases, but also NPN phrases, coordinate nouns and NN compounds; free insertion mainly in the argument position (N2).

Noun phrases in ATAP compounds: frequent insertion of NPN phrases in the modifier position, but item-specific; free insertion of NA phrases mainly in the head position (N1).

Explanation: phrases are freely inserted in the element that does not trigger the interpretation of the compound.

Argument of the VNxC: [gestione [risorse umane]NA] “human resources management”

Head of the ATAPC: [[settore tecnologico]NA chiave] “key technology sector”

Further research: phrases in grounding compounds?


References

Baroni Marco et al. (2006), The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Online: <http://wacky.sslmit.unibo.it/lib/exe/fetch.php?media=papers:wacky_2008.pdf>

Baroni Marco, Guevara Emiliano, Pirrelli Vito (2009), Sulla tipologia dei composti N+N in italiano: principi categoriali ed evidenza distribuzionale a confronto. In: Benatti Ruben, Ferrari Giacomo, Mosca Monica (eds.), Linguistica e modelli tecnologici di ricerca (Atti del 40esimo Congresso della Società di Linguistica Italiana). Roma: Bulzoni, pp. 73-95.

Baroni Marco, Guevara Emiliano, Zamparelli Roberto (2009), The dual nature of Deverbal Nominal Constructions: Evidence from acceptability ratings and corpus analysis. Corpus Linguistics and Linguistic Theory, 5(1), pp. 27-60.

Bisetto Antonietta (2004), Composizione con elementi italiani. In: Grossmann Maria, Rainer Franz, Bertinetto Pier Marco, La formazione delle parole in italiano. Tübingen: Max Niemeyer, pp. 33-50.

Bisetto Antonietta, Scalise Sergio (2005), The classification of compounds. Lingue e Linguaggio, 4(2), pp. 319-332.

Booij Geert E. (2009), Lexical Integrity as a Formal Universal: A Constructionist View. In: Scalise S. et al. (eds.), Universals of Language Today. Dordrecht: Springer, pp. 83-100.

Bopp Stephan (1993), Computerimplementation der italienischen Flexions- und Wortbildungsmorphologie. Hildesheim: Olms Verlag.

Delfitto Denis, Paradisi Paola (2007), Prepositionless genitive and N+N compounding in (old) French and Italian. In: Torck D., Wetzels W. L. (eds.), Romance languages and linguistic theory. Amsterdam: John Benjamins, pp. 53-72.

Gaeta Livio, Ricca Davide (2009), Composita solvantur: Compounds as lexical units or morphological objects? Rivista di Linguistica, 22(1), pp. 35-70.

Grandi Nicola, Nissim Malvina, Tamburini Fabio (2011), Noun-Clad Adjectives. On the adjectival status of non-head constituents of Italian attributive compounds. Lingue e Linguaggio, X(1), pp. 161-176.

Guevara Emiliano, Scalise Sergio (2009), Searching for Universals in Compounding. In: Scalise Sergio, Magni Elisabetta, Bisetto Antonietta (eds.), Universals of Language Today. Dordrecht: Springer, pp. 101-128.

Lieber Rochelle, Scalise Sergio (2007), The Lexical Integrity Hypothesis in a new theoretical universe. In: Booij G. et al. (eds.), Proceedings of the Fifth Mediterranean Morphology Meeting. Bologna: Università degli Studi di Bologna, pp. 1-24.

Scalise Sergio (1984), Generative Morphology. Dordrecht: Foris Publications.

Scalise Sergio, Bisetto Antonietta (2009), The classification of compounds. In: Lieber R., Štekauer P. (eds.), The Oxford Handbook of Compounding. Oxford: Oxford University Press.

Zuffi Stefano (1981), The nominal composition in Italian. Topics in generative morphology. Journal of Italian Linguistics, 1981/2, pp. 1-54.