
Upload: jasmine-stowers

Post on 14-Dec-2015


Page 1: Acquisition of Lexical Knowledge for NLP German Rigau i Claramunt rigau TALP Research Center Departament de Llenguatges i Sistemes

Acquisition of Lexical Knowledge for NLP

German Rigau i Claramunt

http://www.lsi.upc.es/~rigau

TALP Research Center

Departament de Llenguatges i Sistemes Informàtics

Universitat Politècnica de Catalunya

Page 2

Acquisition of Lexical Knowledge for NLP: Outline

Setting
Words and Works
Structured Sources
– MRDs, thesauri
Unstructured Sources
– corpora

Page 3

Acquisition of Lexical Knowledge for NLP: Setting

NLP and the Lexicon
– Theoretical: WG (Word Grammar), GPSG (Generalized Phrase Structure Grammar), HPSG (Head-driven Phrase Structure Grammar).
– Practical: realistic complexity and coverage
Lexical Bottleneck (Briscoe 91): building lexicons with realistic coverage by hand is slow and costly
– Even worse for languages other than English

Page 4

Acquisition of Lexical Knowledge for NLP: Setting

Which lexical knowledge (LK) is needed by a concrete NLP system?
Where is this LK located?
Which procedures can be applied?

Page 5

Acquisition of Lexical Knowledge for NLP: Setting

Which LK is needed by a concrete NLP system?
– Phonology: phonemes, stress, etc.
– Morphology: POS (part of speech), etc.
– Syntactic: category, subcategorization, etc.
– Semantic: class, selectional restrictions (SRs), etc.
– Pragmatic: usage, registers, TDs, etc.
– Translations: translation links
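As a rough illustration of how the levels above could be stored together for one word sense, here is a minimal Python sketch. The class name, the field names and the `missing_levels` helper are hypothetical choices for this example, not a format proposed in the talk.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LexicalEntry:
    """Hypothetical record bundling the kinds of lexical knowledge listed above
    for a single word sense. All field names are illustrative assumptions."""
    lemma: str
    phonemes: str = ""                                    # phonology
    stress: Optional[int] = None                          # phonology: stressed syllable index
    pos: str = ""                                         # morphology: part of speech
    subcat: list[str] = field(default_factory=list)       # syntax: subcategorization frames
    semantic_class: str = ""                              # semantics: class
    selectional_restrictions: dict[str, str] = field(default_factory=dict)  # role -> class
    register: str = ""                                    # pragmatics: usage / register
    translations: dict[str, list[str]] = field(default_factory=dict)  # language -> equivalents

    def missing_levels(self) -> list[str]:
        """Name the knowledge levels that are still empty for this entry."""
        levels = {
            "phonology": self.phonemes,
            "morphology": self.pos,
            "syntax": self.subcat,
            "semantics": self.semantic_class or self.selectional_restrictions,
            "pragmatics": self.register,
            "translations": self.translations,
        }
        return [name for name, value in levels.items() if not value]


wine = LexicalEntry(lemma="wine", pos="n", semantic_class="beverage",
                    translations={"es": ["vino"]})
print(wine.missing_levels())  # ['phonology', 'syntax', 'pragmatics']
```

Listing the empty levels per entry is one crude way to see the lexical bottleneck: every empty slot is knowledge that must be written by hand or acquired from some resource.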

Page 6

Acquisition of Lexical Knowledge for NLP: Setting

Where is this LK located?
– Human brain
– Structured Lexical Resources:
  Monolingual and bilingual machine-readable dictionaries (MRDs)
  Thesauri
– Unstructured Lexical Resources:
  Monolingual and bilingual corpora
– Mixing resources

Page 7

Acquisition of Lexical Knowledge for NLP: Setting

Which procedures can be applied?
– Prescriptive approach
  Machine-aided manual construction
– Descriptive approach
  Automatic acquisition from pre-existing Lexical Resources
– Mixed approach

Page 8

Acquisition of Lexical Knowledge for NLP: Outline

Setting
Words and Works
Structured Sources
– MRDs, thesauri
Unstructured Sources
– corpora

Page 9

Words and Works

Where is this Lexical Knowledge located?
– Human brain:
  Linguistic String Project (Fox et al. 88)
    – Lexical information for 10,000 entries
  WordNet (Miller et al. 90)
    – Semantic information: v1.6 with 99,642 synsets
  Comlex (Grishman et al. 94)
    – Syntactic information for 38,000 English words
  CYC Ontology (Lenat 95)
    – a person-century of effort to produce 100,000 terms
  LDOCE3-NLP
    – dictionary with 80,000 senses

Page 10

Words and Works

Where is this Lexical Knowledge located?
– Structured Lexical Resources
  Monolingual MRDs:
    – LDOCE
      learner’s dictionary
      35,956 entries and 76,059 definitions
      86% semantic and 44% pragmatic codes
      controlled vocabulary of 2,000 words
      (Boguraev & Briscoe 89), (Vossen & Serail 90), (Bruce & Guthrie 92), (Wilks et al. 93), (Dolan et al. 93), (Richardson 97)

Page 11

Words and Works

Where is this Lexical Knowledge located?
– Structured Lexical Resources
  Other Monolingual MRDs:
    – Webster’s (Jensen & Ravin 87)
    – LPPL (Artola 93)
    – DGILE (Castellón 93), (Taulé 95), (Rigau 98)
    – CIDE (Harley & Glennon 97)
    – AHD (Richardson 97)
    – WordNet (Harabagiu 98)
  Bilingual MRDs
    – Collins Spanish/English (Knight & Luk 94)
    – Vox/Harrap’s Spanish/English (Rigau 98)

Page 12

Words and Works

Where is this Lexical Knowledge located?
– Structured Lexical Resources
  Thesauri:
    – Roget’s Thesaurus
      60,071 words in 1,000 categories
      (Yarowsky 92), (Grefenstette 93), (Resnik 95)
    – Roget’s II and The New Collins Thesaurus (Byrd 89)
    – Macquarie’s thesaurus (Grefenstette 93)
    – Bunrui Goi Hyou Japanese thesaurus (Utsuro et al. 93)

Page 13

Words and Works

Where is this Lexical Knowledge located?
– Structured Lexical Resources
  Encyclopaedias
    – Grolier’s Encyclopaedia (Yarowsky 92)
    – Encarta (Richardson et al. 98)
  Others
    – Telephone directories
  Mixing structured lexical resources
    – Roget’s Thesaurus and Grolier’s (Yarowsky 92)
    – LDOCE, WordNet, Collins, ONTOS, UM (Knight & Luk 94)
    – Japanese MRD to WordNet (Okumura & Hovy 94)
    – LLOCE, LDOCE (Chen & Chang 98)

Page 14

Words and Works

Where is this Lexical Knowledge located?
– Unstructured Lexical Resources
  Corpora:
    – WSJ, Brown Corpus (SemCor), Hansard
    – Proper Nouns (Hearst & Schütze 95)
    – Idiosyncratic Collocations (Church et al. 91)
    – Preposition preferences (Resnik and Hearst 93)
    – Subcategorization structures (Briscoe and Carroll 97)
    – Selectional restrictions (Resnik 93), (Ribas 95)
    – Thematic structure (Basili et al. 92)
    – Word semantic classes (Dagan et al. 94)
    – Bilingual Lexicons for MT (Fung 95)
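Many of the corpus-based works listed above rely on association statistics. As one example, candidate collocations can be ranked with pointwise mutual information, PMI(x, y) = log2 P(x, y) / (P(x) P(y)), a measure used in the Church et al. line of work. The sketch below assumes adjacent-word bigrams, whitespace tokenization and plain relative-frequency estimates; real systems add wider windows, smoothing and significance tests.

```python
import math
from collections import Counter


def pmi_bigrams(tokens: list[str], min_count: int = 2) -> dict[tuple[str, str], float]:
    """Pointwise mutual information of adjacent word pairs,
    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))),
    with probabilities estimated by relative frequency."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores: dict[tuple[str, str], float] = {}
    for (x, y), c_xy in bigrams.items():
        if c_xy < min_count:  # PMI overrates very rare pairs, so skip them
            continue
        p_xy = c_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


tokens = "red wine and cheese and bread and red wine".split()
print(pmi_bigrams(tokens))                  # only ('red', 'wine') reaches min_count=2
all_pairs = pmi_bigrams(tokens, min_count=1)
print(max(all_pairs, key=all_pairs.get))    # ('red', 'wine')
```

The frequency threshold matters: with `min_count=1` every pair seen once gets a score, and PMI tends to favour such rare pairs.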

Page 15

Words and Works

Where is this Lexical Knowledge located?
– Mixing structured and non-structured Lexical Resources
  MRDs and Corpora
    – (Liddy & Paik 92)
    – (Klavans & Tzoukermann 96)
  WordNet and Corpora
    – (Resnik 93), (Ribas 95), (Li & Abe 95), (McCarthy 01)
    – (Mihalcea & Moldovan 99)
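Combining WordNet with corpus counts, as in (Resnik 93), usually means estimating how strongly a predicate prefers certain WordNet classes for one of its arguments. The sketch below computes Resnik's selectional association, A(p, c) = P(c|p) log(P(c|p)/P(c)) / S(p), where the selectional preference strength S(p) is the sum of the numerators over all classes. It assumes the class distributions are already given (the toy numbers and class names are hypothetical); propagating noun counts up the WordNet hierarchy, which is the hard part, is not shown.

```python
import math


def selectional_association(class_given_pred: dict[str, float],
                            class_prior: dict[str, float]) -> dict[str, float]:
    """Resnik-style selectional association A(p, c) for one predicate p.

    class_given_pred: P(c | p) over argument classes (sums to 1)
    class_prior:      P(c), the same classes' distribution over all predicates
    S(p)    = sum_c P(c|p) * log(P(c|p) / P(c))   (selectional preference strength)
    A(p, c) = P(c|p) * log(P(c|p) / P(c)) / S(p)
    """
    contrib = {
        c: p * math.log2(p / class_prior[c])
        for c, p in class_given_pred.items() if p > 0
    }
    strength = sum(contrib.values())
    if strength <= 0:
        raise ValueError("predicate shows no preference: P(c|p) equals P(c)")
    return {c: v / strength for c, v in contrib.items()}


# Toy distributions for the object slot of "drink" (hypothetical, not corpus counts).
p_c_given_drink = {"beverage": 0.8, "food": 0.1, "artifact": 0.1}
p_c = {"beverage": 0.1, "food": 0.3, "artifact": 0.6}
assoc = selectional_association(p_c_given_drink, p_c)
print(max(assoc, key=assoc.get))  # beverage
```

Classes the predicate selects less often than chance (here food and artifact) receive negative association.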

Page 16

Words and Works

Lexical Acquisition from MRDs
– Syntactic Disambiguation (Dolan et al. 93)
– Semantic Processing (Vanderwende 95)
– Word sense disambiguation (WSD): (Lesk 86), (Wilks & Stevenson 97), (Rigau 98)
– Information retrieval (IR): (Krovetz & Croft 92)
– Machine translation (MT): (Knight and Luk 94), (Tanaka & Umemura 94)
– Semantically enriching MRDs
    (Yarowsky 92), (Knight 93), (Chen & Chang 98)
– Building lexical knowledge bases (LKBs)
    (Bruce & Guthrie 92), (Dolan et al. 93), (Artola 93), (Castellón 93), (Taulé 95), (Rigau 98)
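Lesk (86) disambiguates a word by choosing the dictionary sense whose definition overlaps most with the context. The sketch below is the common "simplified Lesk" variant, which compares the context words directly with each sense definition instead of with the definitions of the context words as in the original. The tokenization, the stop-word list and the toy sense inventory (definitions taken from the LDOCE cheese entries in this talk) are assumptions for illustration.

```python
from typing import Optional

STOP_WORDS = {"a", "an", "and", "the", "of", "or", "from", "this", "in", "to", "with"}


def content_words(text: str) -> set[str]:
    """Lower-cased words, stripped of punctuation, minus a tiny stop-word list."""
    words = {w.strip(".,;:()").lower() for w in text.split()}
    return words - STOP_WORDS - {""}


def simplified_lesk(context: str, senses: dict[str, str]) -> Optional[str]:
    """Return the sense whose definition shares most content words with the
    context, or None when no definition overlaps at all."""
    ctx = content_words(context)
    best_sense, best_overlap = None, 0
    for sense_id, definition in senses.items():
        overlap = len(ctx & content_words(definition))
        if overlap > best_overlap:  # ties keep the earlier sense
            best_sense, best_overlap = sense_id, overlap
    return best_sense


cheese = {
    "cheese_0_1": "(any of many kinds of) soft or firm solid food made from "
                  "pressed and sometimes ripened milk solids (CURDs)",
    "cheese_0_2": "usu-a large shaped and wrapped quantity of this",
}
print(simplified_lesk("a shop selling firm food made from goat milk", cheese))  # cheese_0_1
```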

Page 17

Words and Works

International Projects on Lexical Acquisition
– Japanese Projects
  EDR (Yokoi 95)
    – Nine-year project oriented to MT
    – Bilingual corpora with 250,000 words
    – Monolingual, bilingual and co-occurrence dictionaries
    – 200,000 words of general vocabulary
    – 100,000 technical terms
    – 400,000 concepts

Page 18

Words and Works

International Projects on Lexical Acquisition
– American Projects
  Comlex (Grishman et al. 94)
    – Syntactic information for 38,000 words
  WordNet (Miller 90)
    – Semantic information
    – more than 123,000 words organised in 99,000 synsets
    – more than 116,000 relations between synsets
  Pangloss (Knight & Luk 94)
    – PUM, ONTOS, LDOCE semantic categories, WordNet
  Cyc (Lenat 95)
    – common-sense knowledge
    – 100,000 concepts and 1,000,000 axioms

Page 19

Words and Works

International Projects on Lexical Acquisition
– European Projects
  Acquilex I and II
    – Lexical acquisition from monolingual and bilingual MRDs and corpora
  LE-Parole
    – Large-scale harmonised set of corpora and lexicons for all the EU languages
  EuroWordNet
    – To develop a multilingual WordNet for several European languages

Page 20

Acquisition of Lexical Knowledge for NLP: Setting

Acquilex I
Acquilex II
EuroWordNet

Page 21

Words and Works: Acquilex

Lexical Knowledge Acquisition
Mixed approach
Dictionaries (MRD -> MTD -> LDB -> LKB, i.e. machine-readable dictionary -> machine-tractable dictionary -> lexical database -> lexical knowledge base)
Partners
– Cambridge University
– Istituto di Linguistica Computazionale, Pisa
– Amsterdam University
– Dublin University
30 months
Theses
– (Castellón 1993)
– (Taulé 1995)
– (Rigau 1998)

Page 22

Words and Works: Acquilex II

Lexical Knowledge Acquisition
Mixed approach
Corpora
Partners
– Cambridge University
– Istituto di Linguistica Computazionale, Pisa
– Amsterdam University
30 months
Theses
– [Ribas 1995] (Acquisition of Selectional Restrictions)
– [Ageno ...] (Robust Parsing)
– [Padró 1998] (Relaxation Labelling)
– [Màrquez 1999] (Decision Trees)

Page 23

Words and Works: EuroWordNet

Multilingual WordNet
Partners
– English, Spanish, Dutch, Italian
– (and French, German, Czech, Estonian)
25,000 noun synsets and 5,000 verb synsets
30 months
Theses
– [Farreres ...] (Mapping of bilingual dictionaries)
– [Daudé ...] (Mapping of hierarchies)

Page 24

Acquisition of Lexical Knowledge for NLP: Outline

Setting
Words and Works
Structured Sources
– MRDs, thesauri
Unstructured Sources
– corpora

Page 25

Structured Sources: Acquisition of LK from MRDs

Focusing on:
– the massive acquisition of LK
– from MRDs (conventional, in any language)
– using automatic methodologies

Why MRDs?

Conventional dictionaries for human use usually “contain spelling, pronunciation, hyphenation, capitalization, usage notes for semantic domains, geographic regions, and propriety; etymological, syntactic and semantic information about the most basic units of the language” (Amsler 81)

Page 26

Structured Sources: Dictionaries

LDOCE (Longman Dictionary of Contemporary English)
DGILE (Diccionario General Ilustrado de la Lengua Española)
DGLC (Diccionari General de la Llengua Catalana)
DVHE (Diccionari Vox-Harrap’s Esencial)

Page 27

Structured Sources: Dictionaries: LDOCE

Highly coded, restricted vocabulary
76,059 senses in 30,373 entries
– LDOCE id, POS, Grammatical Code, Idiom, Pragmatic Code,
– Semantic Code (subject-preference), object-preference,
– indirect-object-preference, definition.

|cheese_0_1| <n> <U-C> <> <FO--> <5> <> <> <(any of many kinds of) soft or firm solid food made from pressed and sometimes ripened milk solids (CURDs)>

|cheese_0_2| <n> <C> <> <FO--> <J> <> <> <usu-a large shaped and wrapped quantity of this>

|cheese_0_3| <n> <C> <green cheese> <FO--> <> <> <> <newly made cheese>
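Since each record is an identifier between bars followed by angle-bracketed fields in a fixed order, a short Python sketch can turn these lines into named fields. It assumes one record per line, no nested angle brackets inside a field, and field names derived from the field list on this slide; it is not the Acquilex conversion software.

```python
import re

# Field order after the |id| token, following the field list on the slide.
LDOCE_FIELDS = [
    "pos", "grammar_code", "idiom", "pragmatic_code",
    "semantic_code", "object_pref", "indirect_object_pref", "definition",
]

ENTRY_ID = re.compile(r"^\|([^|]+)\|")
FIELD = re.compile(r"<([^<>]*)>")


def parse_ldoce_line(line: str) -> dict[str, str]:
    """Split one LDOCE-style record into named fields ('' for an empty <>)."""
    m = ENTRY_ID.match(line.strip())
    if m is None:
        raise ValueError(f"no |id| at start of: {line!r}")
    values = FIELD.findall(line)
    if len(values) != len(LDOCE_FIELDS):
        raise ValueError(f"expected {len(LDOCE_FIELDS)} fields, got {len(values)}")
    record = {"id": m.group(1)}
    record.update(zip(LDOCE_FIELDS, values))
    return record


rec = parse_ldoce_line("|cheese_0_3| <n> <C> <green cheese> <FO--> <> <> <> <newly made cheese>")
print(rec["idiom"], "|", rec["definition"])  # green cheese | newly made cheese
```

Checking the field count catches records where a definition happens to contain an angle bracket instead of silently shifting every field.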

Page 28

Structured Sources: Dictionaries: DGILE

Poorly coded, no restricted vocabulary
157,843 senses in 89,043 entries
1.4 million words in definitions and examples

((queso ) (ETIM l. caseu )
(Sense 1) (CA m.) (DEF Masa que se obtiene cuajando la leche, exprimiéndola para que deje suero y echándole sal para que se conserve: ~ de Gruyère; ~ de Roquefort; ~ de bola, el de tipo holandés, de forma esférica; ~ de hierba, el que se hace cuajando la leche con hierba a propósito; ~ manchego, el de pasta compacta, algo dura, crudo, de leche de oveja.)
(Sense 2) (CA m.) (DEF ~ de cerdo, manjar hecho con carne de cerdo o jabalí, picada y prensada.)
(Sense 3)(CA m.)(DEF ~ helado, helado compacto hecho en molde.)
(Sense 4)(CA m.)(DEF Medio ~, tablero grueso, semicircular, que usan los sastres para planchar cuellos y solapas y para sentar costuras curvas.)
(Sense 5)(CA m.)(REG fam.)(DEF Pie.)
(Sense 6)(CA m.)(GEO Venez.)(DEF ~ frito, estafa.)
(RELA 1)(TIPOR Rel.)(TXR Del l. caseu derivan numerosos tecn. como caseína, cáseo, caseificar, caseico, caseoso.))

English gloss: queso 'cheese' (from Latin caseu). Sense 1: mass obtained by curdling milk, squeezing it so that it releases the whey and salting it to preserve it, with subtypes such as Gruyère, Roquefort, ~ de bola (Dutch-type, spherical), ~ de hierba (curdled with a suitable herb) and ~ manchego (compact, somewhat hard, uncooked sheep's-milk cheese). Sense 2: ~ de cerdo, a dish of minced and pressed pork or wild boar. Sense 3: ~ helado, compact ice cream made in a mould. Sense 4: medio ~, a thick semicircular board that tailors use to press collars and lapels and to set curved seams. Sense 5 (colloquial): foot. Sense 6 (Venezuela): ~ frito, a swindle. Related: many technical terms derive from Latin caseu, such as caseína 'casein'.
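The DGILE record is a parenthesised, Lisp-like structure, so a small recursive reader is enough to recover the senses and their tagged fields. This sketch assumes parentheses are never escaped inside definitions (true for the entry above, not guaranteed for the whole dictionary) and that a (RELA ...) field closes the list of senses; both are assumptions of this example.

```python
import re
from typing import Optional

TOKEN = re.compile(r"\(|\)|[^()]+")


def read_sexpr(text: str) -> list:
    """Parse parenthesised text into nested lists; text runs become stripped strings."""
    stack: list[list] = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif tok.strip():
            stack[-1].append(tok.strip())
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def dgile_senses(record: str) -> dict[int, dict[str, str]]:
    """Group (TAG value) fields under the preceding (Sense N) marker.
    In this sketch a (RELA ...) field ends the sense list."""
    (entry,) = read_sexpr(record)  # exactly one outer list per entry
    senses: dict[int, dict[str, str]] = {}
    current: Optional[dict[str, str]] = None
    for item in entry:
        if not (isinstance(item, list) and len(item) == 1 and isinstance(item[0], str)):
            continue  # ignore shapes this sketch does not model
        tag, _, value = item[0].partition(" ")
        if tag == "Sense":
            current = senses.setdefault(int(value), {})
        elif tag == "RELA":
            current = None
        elif current is not None:
            current[tag] = value
    return senses


record = """((queso ) (ETIM l. caseu )
(Sense 5)(CA m.)(REG fam.)(DEF Pie.)
(Sense 6)(CA m.)(GEO Venez.)(DEF ~ frito, estafa.)
(RELA 1)(TIPOR Rel.)(TXR Del l. caseu derivan numerosos tecn.))"""
print(dgile_senses(record)[6])
# {'CA': 'm.', 'GEO': 'Venez.', 'DEF': '~ frito, estafa.'}
```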

Page 29

Structured Sources: Dictionaries: DGLC (Fabra)

Poorly coded, no restricted vocabulary
89,360 senses in 51,135 entries

((formatge)(CC m.)
(NS 1 > 1 > 1 > 0 > 0 > 0)(CG m.)(DF Massa alimentosa que s’obté coagulant la llet, esprement-ne el xerigot i consolidant la part presa.)
(NS 2 > 1 > 0 > 1 > 0 > 0)(CG m.)(EX Formatge de Maó.)
(NS 3 > 1 > 0 > 2 > 0 > 0)(CG m.)(EX Formatge fresc, salat.)
(NS 4 > 1 > 0 > 3 > 0 > 0)(CG m.)(EX Ratllar formatge.)
(NS 5 > 1 > 0 > 4 > 0 > 0)(CG m.)(EX Un formatge.)
(FI f4)
))

English gloss: formatge 'cheese'. Definition: nourishing mass obtained by coagulating milk, squeezing out the whey and consolidating the curdled part. Examples: Maó cheese; fresh cheese, salted cheese; to grate cheese; a cheese.

Page 30

Structured Sources: Acquilex

– Morphological Information
  POS (n, v, adj, adv, etc.)
  Derivative forms
  Compound forms
  Derivative model (verbs)
– Syntactic Information
  Idioms
  Implicit knowledge (typical arguments given in parentheses):
    barrer_1_1 limpiar (el suelo) con la escoba. ['sweep': to clean (the floor) with a broom.]
    freír_1_1 cocer (un manjar) en aceite o grasa hirviendo. ['fry': to cook (a dish) in boiling oil or fat.]
    comprar_1_1 adquirir (una cosa) a cambio de cierta cantidad de dinero. ['buy': to acquire (a thing) in exchange for a certain amount of money.]
    cazar_1_1 buscar o perseguir (a las aves, fieras, etc.) para cogerlas o matarlas. ['hunt': to search for or pursue (birds, wild animals, etc.) in order to catch or kill them.]
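Because the parenthesised phrases in these verb definitions hold typical arguments, a first approximation is simply to extract them. The regular-expression sketch below assumes definitions shaped like the examples on this slide and drops a hypothetical list of leading Spanish articles plus the object marker "a"; it is an illustration, not the extraction procedure used in Acquilex.

```python
import re

PAREN = re.compile(r"\(([^()]*)\)")
# Hypothetical list: Spanish articles plus the object-marking preposition "a".
LEADING_WORDS = {"el", "la", "los", "las", "un", "una", "unos", "unas", "a"}


def typical_arguments(definition: str) -> list[str]:
    """Candidate typical arguments: each comma-separated item inside parentheses,
    without leading articles or 'a', and without 'etc.'."""
    results = []
    for phrase in PAREN.findall(definition):
        for item in phrase.split(","):
            words = [w for w in item.split() if w != "etc."]
            while words and words[0].lower() in LEADING_WORDS:
                words.pop(0)
            if words:
                results.append(" ".join(words))
    return results


for sense, definition in [
    ("barrer_1_1", "limpiar (el suelo) con la escoba."),
    ("cazar_1_1", "buscar o perseguir (a las aves, fieras, etc.) para cogerlas o matarlas."),
]:
    print(sense, typical_arguments(definition))
# barrer_1_1 ['suelo']
# cazar_1_1 ['aves', 'fieras']
```

Parentheses also hold other material, such as the Latin name in a camelia definition, so in practice the output would be restricted to verb definitions and checked against a noun lexicon.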

Page 31

Structured Sources: Semantic Information

Explicit (LDOCE: pragmatic codes, semantic codes, etc.; DGILE: topic, synonyms, antonyms, figurative senses, etc.)
Implicit (note how often flor/flores 'flower(s)' appears in these definitions):
  jardín_1_1 Terreno donde se cultivan plantas y flores ornamentales. ['garden': land where ornamental plants and flowers are grown.]
  florero_1_4 Maceta con flores. [pot with flowers.]
  ramo_1_3 Conjunto natural o artificial de flores, ramas o hierbas. ['bunch': natural or artificial cluster of flowers, branches or herbs.]
  pétalo_1_1 Hoja que forma la corola de la flor. ['petal': leaf that forms the corolla of the flower.]
  tálamo_1_3 Receptáculo de la flor. ['thalamus': receptacle of the flower.]
  miel_1_1 Substancia viscosa y muy dulce que elaboran las abejas, en una distensión del esófago, con el jugo de las flores y luego depositan en las celdillas de sus panales. ['honey': viscous, very sweet substance that bees produce, in a distension of the oesophagus, from the juice of flowers and then deposit in the cells of their honeycombs.]
  florería_1_1 Floristería; tienda o puesto donde se venden flores. ['flower shop': florist's; shop or stall where flowers are sold.]
  florista_1_1 Persona que tiene por oficio hacer o vender flores. ['florist': person whose trade is making or selling flowers.]
  camelia_1_1 Arbusto cameliáceo de jardín, originario de Oriente, de hojas perennes y lustrosas, y flores grandes, blancas, rojas o rosadas (Camellia japonica). ['camellia': garden shrub of the camellia family, native to the East, with evergreen glossy leaves and large white, red or pink flowers (Camellia japonica).]
  camelia_1_2 Flor de este arbusto. ['camellia': flower of this shrub.]
  rosa_1_1 Flor del rosal. ['rose': flower of the rose bush.]
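The implicit evidence on this slide can be gathered mechanically by indexing every definition under the word forms it contains and then looking up a target such as flor. The sketch assumes the caller lists the inflected forms to search for (here flor and flores) instead of lemmatising; the sense identifier queso_1_3 is made up for the example.

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")  # Unicode-aware in Python 3, so accented letters count


def build_index(definitions: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word form to the sense ids whose definition contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for sense_id, text in definitions.items():
        for word in WORD.findall(text.lower()):
            index[word].add(sense_id)
    return index


def senses_mentioning(index: dict[str, set[str]], forms: set[str]) -> list[str]:
    """Sorted sense ids whose definitions use any of the given word forms."""
    hits: set[str] = set()
    for form in forms:
        hits |= index.get(form, set())
    return sorted(hits)


definitions = {
    "jardín_1_1": "Terreno donde se cultivan plantas y flores ornamentales.",
    "pétalo_1_1": "Hoja que forma la corola de la flor.",
    "rosa_1_1": "Flor del rosal.",
    "queso_1_3": "~ helado, helado compacto hecho en molde.",  # hypothetical id
}
index = build_index(definitions)
print(senses_mentioning(index, {"flor", "flores"}))
# ['jardín_1_1', 'pétalo_1_1', 'rosa_1_1']
```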

Page 32

Structured Sources: Main Problems

– Conventional dictionaries are not systematic
– Dictionaries are built for human use
– Implicit knowledge
  words are described/translated in terms of other words

Page 33

Structured Sources: SEISD (Rigau 98)

The System
– General Frame
– Methodology
– SEISD
– Application of the methodology

Page 34

SEISD

General Frame
– Characteristics of the Lexical Resources used
– Lexical Knowledge to be extracted
– Lexical Knowledge Representation
– The acquisition process

Page 35

SEISD: General Frame

– Characteristics of the Lexical Resources used
  DGILE
  Spanish/English bilingual Dictionaries
  WordNet
  Type System of the LKB

Page 36

SEISD: General Frame

– Characteristics of the Lexical Resources used
  DGILE
    – 89,043 entries and 157,842 senses
    – 1.4 million words in definitions and examples
    – neither semantic nor pragmatic codes
    – no restricted vocabulary

vino (l. vinu) m. Zumo de uvas fermentado; ... 2 fig. Bautizar o cristianizar, el ~, echarle agua. 3 fig. Dormir uno el ~, dormir mientras dura la borrachera; tener uno mal ~, ser pendenciero en la embriaguez. 4 p.ext. Zumo. | HOMOF.: vino (v.), bino (v.). REL. Enológico, enólogo, enotecnia, derivados de enología, ciencia de la vinicultura, formada del gr. oinos.

English gloss: vino 'wine' (Latin vinu), masculine noun. 1 fermented grape juice; ... 2 figurative: bautizar or cristianizar el vino 'baptise/christen the wine', to add water to it. 3 figurative: dormir el vino 'sleep off the wine', to sleep while drunk; tener mal vino, to be quarrelsome when drunk. 4 by extension: juice. Homophones: vino (verb), bino (verb). Related: enológico, enólogo, enotecnia, derived from enología, the science of winemaking, formed from Greek oinos.

Page 37

SEISD: General Frame

– Characteristics of the Lexical Resources used
  Spanish/English bilingual Dictionaries
    – EEI: 16,463 entries with 28,002 translation fields
    – EIE: 15,352 entries with 27,033 translation fields

vino m wine. ~ de Jerez, sherry; ~ tinto, red wine.

wine n vino
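With the two directions stored as separate dictionaries, one simple way to link them is to keep the translation pairs attested in both. The sketch below assumes each dictionary has already been reduced to a headword-to-translations map; the toy data (including the sherry -> jerez entry) is a hypothetical simplification of the entries above, and this is not the SEISD procedure.

```python
def reciprocal_pairs(es_en: dict[str, list[str]],
                     en_es: dict[str, list[str]]) -> set[tuple[str, str]]:
    """(spanish, english) pairs attested in both dictionary directions."""
    forward = {(es, en) for es, translations in es_en.items() for en in translations}
    backward = {(es, en) for en, translations in en_es.items() for es in translations}
    return forward & backward


# Toy data shaped like the vino/wine entries above (hypothetical simplification).
es_en = {"vino": ["wine"], "vino tinto": ["red wine"], "vino de Jerez": ["sherry"]}
en_es = {"wine": ["vino"], "sherry": ["jerez"]}
print(reciprocal_pairs(es_en, en_es))  # {('vino', 'wine')}
```

Pairs found in only one direction (here vino de Jerez / sherry) are not necessarily wrong; they simply carry less evidence and would need another source to confirm them.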

Page 38

SEISD: General Frame

– Characteristics of the Lexical Resources used

WordNet
– v1.6 has 123,497 content words and 99,642 synsets

Sense 1
wine, vino -- (fermented juice (of grapes especially))
  => sake, saki -- (Japanese beverage from fermented rice ...)
  => vintage -- (a season's yield of wine from a vineyard)
  => red wine -- (wine having a red color derived from skins ...)
  => Pinot noir -- (dry red California table wine ...)
  => claret, red Bordeaux -- (dry red Bordeaux or Bordeaux-like wine)
  => Saint Emilion -- (full-bodied red wine from ...)
  => Chianti -- (dry red Italian table wine from the Chianti ...)
  => Cabernet, Cabernet Sauvignon -- (superior Bordeaux-type red wine)
  => Rioja -- (dry red table wine from the Rioja ...)
  => zinfandel -- (dry fruity red wine from California)

Page 39

SEISD: General Frame

– Characteristics of the Lexical Resources used

Type System of the LKB (Copestake 92)
– 527 types with 196 features

Page 40

SEISD: General Frame

– Lexical Knowledge to be extracted
  Explicit information (POS, TD, uses, etc.)
  Implicit information
  – Hypernym/hyponym relations (class/subclass)
  – Synonymy/antonymy relations
  – Meronym/holonym relations (part/whole, ...)
  – Case role relations (agentive, telic, ...)
  – Content relations (qualia, form, constitutive, ...)
  – Collocational relations (compounds, idioms, ...)
  – Selectional restrictions (typical subject, object, ...)
  – Translation equivalences

Page 41

SEISD: General Frame

– Lexical Knowledge Representation
  LKB (Copestake 92)
  – represents both syntactic and semantic information
  – Typed Feature Structure formalism (Carpenter 92)
  – default inheritance
  – lexical and phrasal rules
  – multilingual relations

Page 42

Structured Sources: SEISD

The System
– General Frame
– Methodology
– SEISD
– Application of the methodology

Page 43

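SEISD: Methodology

[Figure: methodology overview relating the dictionaries MRD1 ... MRDn, the lexical databases LDB1 ... LDBn, the taxonomies Tax1 ... Taxn, the lexical knowledge bases LKB1 ... LKBn and the multilingual MLKB]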

Page 44

SEISD: Methodology

– Problems with a purely descriptive approach
  Circularity
  Errors and inconsistencies
  Definitions with an omitted genus
  Top dictionary senses do not usually represent useful knowledge for the LKB
  – Too general
  – Too specific

Page 45

SEISD: Methodology

Mixed Methodology

Prescriptive approach: manual construction of the Type System

Page 46

SEISD: Methodology

Mixed Methodology

Descriptive approach: acquiring implicit information from MRDs

Prescriptive approach: manual construction of the Type System


Page 48

SEISD: Methodology

– Step 1: Selection of the main top beginners for a semantic primitive
– Step 2: Exploiting genus, construction of taxonomies
– Step 3: Exploiting differentia
– Step 4: Mapping the LK into the LKB
– Step 5: Tlinks generation
– Step 6: Validation and exploitation of the LKB

Page 49

Structured Sources: SEISD

The System
– General Frame
– Methodology
– SEISD
– Application of the methodology

Page 50

Structured Sources: SEISD

– SEISD: Sistema d’Extracció d’Informació Semàntica de Diccionaris (System for Extracting Semantic Information from Dictionaries) (Ageno et al. 92)
  designed to support the main methodology
  taking into account the characteristics of the lexical resources used
  reusability of software and lexical resources
  allowing modular improvements
  minimal effort

Page 51

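SEISD

[Figure: SEISD architecture. Modules used by the user: LDB system, LKB system, PRE, TaxBuild, SemBuild, CRS and TGE (combined as the LDB/LKB system). Linguistic knowledge: SegWord and FPar, MACO+, Relax and SinPar. Lexical knowledge: the LDB (DGILE and the English/Spanish and Spanish/English MTDs), WordNet, taxonomies, and the LKB (type system and lexicons), together with the LDB/LKB lexicons.]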

Page 52

Structured Sources: SEISD (Rigau 98)

The System
– General Frame
– Methodology
– SEISD
– Application of the methodology

Page 53

SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive (Rigau et al. 98)

Word sense: zumo_1_1
Attached-to: c_art_subst type.
Definition: líquido que se extrae de las flores, hierbas, frutos, etc. (liquid extracted from flowers, herbs, fruits, etc.).

Page 54

SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive

– A) Attaching DGILE senses to semantic primitives
  1) First labelling:
  – Conceptual Distance (Rigau 94)
  2) Second labelling:
  – Salient Words (Yarowsky 92)

– B) Filtering process

Page 55

SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive

– A.1) First labelling: Conceptual Distance (Agirre et al. 94)
  – length of the shortest path
  – specificity of the concepts

$$\mathrm{dist}(w_1, w_2) = \min_{c_{1i} \in w_1,\; c_{2j} \in w_2} \; \sum_{c_k \in \mathrm{path}(c_{1i}, c_{2j})} \frac{1}{\mathrm{depth}(c_k)}$$

  using WordNet and a bilingual dictionary
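The conceptual distance formula can be sketched in Python. This is a minimal illustration, not SEISD's implementation, and it rests on three assumptions. The toy hierarchy is hypothetical, loosely modelled on the WordNet paths for abbey shown on the next slide. Depth is counted with the root at 1, so 1/depth is always defined. Each word is given directly as a list of its candidate concepts, instead of being looked up in WordNet through the bilingual dictionary.

```python
from itertools import product

# Hypothetical toy hypernym hierarchy (child -> parent), loosely modelled on WordNet.
PARENT = {
    "object": "entity",
    "artifact": "object",
    "structure": "artifact",
    "building": "structure",
    "place_of_worship": "building",
    "church": "place_of_worship",
    "abbey_church": "church",
    "house": "building",
    "religious_residence": "house",
    "monastery": "religious_residence",
    "abbey_monastery": "monastery",
}


def ancestors(c):
    """Return [c, parent(c), ..., root]."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(c):
    """Depth of c, counting the root as 1 so that 1/depth is always defined."""
    return len(ancestors(c))


def path(c1, c2):
    """Concepts on the shortest tree path between c1 and c2, both ends included."""
    up1, up2 = ancestors(c1), ancestors(c2)
    lca = next(a for a in up1 if a in up2)  # lowest common ancestor
    return up1[: up1.index(lca) + 1] + up2[: up2.index(lca)]


def path_cost(c1, c2):
    """Sum of 1/depth(ck) over the concepts ck on the path: deep (specific) concepts cost less."""
    return sum(1.0 / depth(ck) for ck in path(c1, c2))


def dist(concepts1, concepts2):
    """dist(w1, w2): minimum path cost over all pairs of candidate concepts."""
    return min(path_cost(c1, c2) for c1, c2 in product(concepts1, concepts2))


def closest_pair(concepts1, concepts2):
    """The concept pair that realises dist(w1, w2), i.e. the senses that get selected."""
    return min(product(concepts1, concepts2), key=lambda pair: path_cost(*pair))


# "abbey" has two toy senses; the genus "iglesia" translates to "church".
abbey = ["abbey_church", "abbey_monastery"]
print(closest_pair(abbey, ["church"]))    # ('abbey_church', 'church')
print(round(dist(abbey, ["church"]), 4))  # 0.2679
```

Because the hierarchy is a tree, the path through the lowest common ancestor is the shortest one. With several inheritance in the real WordNet, a graph search would be needed instead.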

Page 56

SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive

abadía_1_2 Iglesia o monasterio regido por un abad o abadesa
(abbey, a church or a monastery ruled by an abbot or an abbess)

[Figure: WordNet hypernym paths for the senses of abbey, with church and abbey highlighted: <entity>, <object, ...>, <artifact, artefact>, <structure, construction>, <building, edifice>, <place of worship, ...>, <church, church building>, <abbey>; <house, lodging>, <religious residence, cloister>, <monastery>, <abbey>; <convent>, <abbey>]

Page 57

SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive

abadía_1_2 Iglesia o monasterio regido por un abad o abadesa
(abbey, a church or a monastery ruled by an abbot or an abbess)

[Figure: the same WordNet hypernym paths for abbey: <entity>, <object, ...>, <artifact, artefact>, <structure, construction>, <building, edifice>, <place of worship, ...>, <church, church building>, <abbey>; <house, lodging>, <religious residence, cloister>, <monastery>, <abbey>; <convent>, <abbey>. The <abbey> sense under <church, church building> is labelled 06 ARTIFACT.]

Page 58

SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive

– A.1) First labelling (Results)
  29,205 labelled definitions (31%)
  61% accuracy at the sense level
  64% accuracy at the file level (WordNet lexicographer files)

Page 59

SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive

– A.2) Second labelling: Salient Words (Yarowsky 92)

$$AR(w, SC) = \Pr(w \mid SC)\,\log_2 \frac{\Pr(w \mid SC)}{\Pr(w)}$$

Importance
– local frequency
– the word appears significantly more often in the corpus of a semantic category than elsewhere in the whole corpus
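Here is a minimal Python sketch of the association ratio. It makes three assumptions. The toy category corpora are hypothetical. Probabilities are plain relative frequencies with no smoothing. A definition is labelled with the category whose words give the largest summed positive AR. That decision rule follows the spirit of Yarowsky 92, but it is not necessarily the exact scoring behind the figures on the next slide.

```python
import math
from collections import Counter


def association_ratio(corpus_by_sc):
    """AR(w, SC) = Pr(w|SC) * log2(Pr(w|SC) / Pr(w)) for every word seen in each SC.

    corpus_by_sc maps a semantic category to the tokens of the definitions
    already labelled with it (a hypothetical input format).
    """
    global_counts = Counter()
    for tokens in corpus_by_sc.values():
        global_counts.update(tokens)
    total = sum(global_counts.values())

    ar = {}
    for sc, tokens in corpus_by_sc.items():
        n = len(tokens)
        for w, f in Counter(tokens).items():
            p_w_sc = f / n                   # Pr(w | SC)
            p_w = global_counts[w] / total   # Pr(w) over the whole corpus
            ar[(w, sc)] = p_w_sc * math.log2(p_w_sc / p_w)
    return ar


def label(definition_tokens, ar, categories):
    """Return the SC whose salient (positive-AR) words score highest, plus all scores."""
    scores = {
        sc: sum(max(ar.get((w, sc), 0.0), 0.0) for w in definition_tokens)
        for sc in categories
    }
    return max(scores, key=scores.get), scores


corpus = {
    "FOOD": ["leche", "zumo", "leche", "carne", "pan"],
    "ARTIFACT": ["frasco", "cristal", "vaso", "frasco", "madera"],
}
ar = association_ratio(corpus)
print(label(["leche", "zumo"], ar, corpus)[0])      # FOOD
print(label(["frasco", "cristal"], ar, corpus)[0])  # ARTIFACT
```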

Page 60

SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive

– A.2) Second labelling (Results):
  86,759 labelled definitions (93%)
  80% accuracy at the file level (WordNet lexicographer files)

biberón_1_1   ARTIFACT   4.8399   Frasco de cristal ... (glass flask ...)

biberón_1_2   FOOD       7.4443   Leche que contiene este frasco ... (milk contained in that flask ...)

Page 61

SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive

– B) Filtering process (FOODs)
  Removes all genus terms that:
  – FILTER 1: are not FOODs according to the bilingual mapping
  – FILTER 2: appear more often as genus in other semantic categories (SC)
  – FILTER 3: have a low frequency
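The three filters can be sketched as a predicate over candidate genus terms. This is a hypothetical sketch with several assumptions: the GenusTerm record, the counts, and the reading of FILTER 3 as a threshold on how often the term is the genus of a definition labelled with the target category. Following the results table, FILTER 1 and FILTER 2 are treated as alternatives, each combined with FILTER 3 (written F3>k).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GenusTerm:
    word: str
    # How often the term is the genus of a definition labelled with each SC (assumed data).
    sc_counts: Counter
    # SCs the term reaches through the bilingual mapping into WordNet (assumed data).
    bilingual_scs: set = field(default_factory=set)


def filter_genus_terms(terms, target, min_freq, use_filter1=False, use_filter2=True):
    """Keep genus terms of `target` surviving the chosen filters plus FILTER 3 (freq > min_freq)."""
    kept = []
    for t in terms:
        freq = t.sc_counts[target]
        if use_filter1 and target not in t.bilingual_scs:
            continue  # FILTER 1: not a target term according to the bilingual mapping
        if use_filter2 and any(c > freq for sc, c in t.sc_counts.items() if sc != target):
            continue  # FILTER 2: more often a genus in some other SC
        if freq <= min_freq:
            continue  # FILTER 3: low frequency
        kept.append(t.word)
    return kept


terms = [
    GenusTerm("zumo", Counter(FOOD=12, ARTIFACT=1), {"FOOD"}),
    GenusTerm("frasco", Counter(FOOD=10, ARTIFACT=20), {"ARTIFACT"}),
    GenusTerm("leche", Counter(FOOD=6), {"FOOD"}),
]
print(filter_genus_terms(terms, "FOOD", min_freq=9))  # ['zumo']
print(filter_genus_terms(terms, "FOOD", min_freq=4))  # ['zumo', 'leche']
```

Lowering the F3 threshold admits more genus terms at some cost in accuracy. The table on the next slide shows that trade-off.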

Page 62

SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive

– B) Filtering process (FOOD Results); #GT = number of genus terms

                  FILTER 1            FILTER 2
                  #GT   Accuracy      #GT   Accuracy
LABEL2+F3>9        31     94%          31    100%
LABEL2+F3>8        35     95%          35    100%
LABEL2+F3>7        37     91%          37     95%
LABEL2+F3>6        43     92%          41     94%
LABEL2+F3>5        49     92%          47     92%
LABEL2+F3>4        55     91%          56     91%
LABEL2+F3>3        64     85%          65     87%
LABEL2+F3>2        85     82%          82     83%
LABEL2+F3>1       125     78%         123     82%

Page 63

SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus (Rigau et al. 97)

Word sense: vino_1_1
Hypernym: zumo_1_1
Definition: zumo de uvas fermentado (fermented juice of grapes).

Word sense: rueda_2_1
Hypernym: vino_1_1
Definition: vino procedente de la región de Rueda (Valladolid) (wine from the region of Rueda).

Page 64

SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus

– Genus Sense Identification
  97% accuracy for nouns

– Genus Sense Disambiguation
  Unsupervised WSD
  Unrestricted WSD (coverage 100%)
  Eight heuristics (McRoy 92)
  – Combining several lexical resources
  – Combining several methods
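The idea of combining heuristics can be sketched as follows. The sketch assumes that each heuristic returns non-negative scores over the candidate senses of the genus term. A heuristic that finds no evidence abstains. The scores of each heuristic are normalised to sum to 1 before being added. Only three toy heuristics are shown, loosely named after Heuristics 1, 2 and 4. Their scoring is illustrative, not the one used in SEISD, and the sense data are hypothetical.

```python
from collections import defaultdict


def monosemous_genus(definition, candidates):
    """Heuristic 1 (toy): a genus term with a single sense gets that sense."""
    return {next(iter(candidates)): 1.0} if len(candidates) == 1 else {}


def entry_sense_ordering(definition, candidates):
    """Heuristic 2 (toy): earlier dictionary senses are preferred (weight 1/rank)."""
    return {sense: 1.0 / rank for rank, sense in enumerate(candidates, start=1)}


def word_matching(definition, candidates):
    """Heuristic 4 (toy): overlap between the definition and each sense's definition."""
    return {s: len(definition & words) for s, words in candidates.items() if definition & words}


def combine(heuristics, definition, candidates):
    """Sum the normalised votes of all heuristics; return the best sense (or None)."""
    total = defaultdict(float)
    for h in heuristics:
        scores = h(definition, candidates)
        z = sum(scores.values())
        if z == 0:
            continue  # the heuristic abstains on this definition
        for sense, score in scores.items():
            total[sense] += score / z
    return max(total, key=total.get) if total else None


# Genus "zumo" in the definition of vino_1_1 ("zumo de uvas fermentado").
# Candidate senses are listed in dictionary order; their word sets are hypothetical.
zumo = {
    "zumo_1_1": {"líquido", "extrae", "flores", "hierbas", "frutos"},
    "zumo_1_2": {"utilidad", "provecho"},
}
vino_def = {"zumo", "uvas", "fermentado"}
heuristics = [monosemous_genus, entry_sense_ordering, word_matching]
print(combine(heuristics, vino_def, zumo))  # zumo_1_1
```

Normalising each heuristic's scores keeps any single heuristic from dominating just because its raw scores happen to be large. This matters when heuristics with very different scales are summed, as in the "Sum" row of the results.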

Page 65

SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus

– Results:

                                            Polysemous        Overall
                                            Prec.   Cov.      Prec.   Cov.
Heuristic 1: Monosemous Genus Term            -       -       100%     16%
Heuristic 2: Entry Sense Ordering            70%    100%       75%    100%
Heuristic 3: Explicit Semantic Domain       100%      1%      100%      2%
Heuristic 4: Word Matching                   72%     61%       79%     56%
Heuristic 5: Simple Concordance              57%    100%       65%     95%
Heuristic 6: Co-occurrence Vectors           60%    100%       66%     97%
Heuristic 7: Semantic Vectors                58%     99%       63%     94%
Heuristic 8: Conceptual Distance             49%     95%       57%     89%
Sum                                          79%    100%       83%    100%

Page 66

SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus

– Knowledge provided by each heuristic:

                                                Overall
                                                Prec.   Cov.
Sum minus Heuristic 1: Monosemous Genus Term     79%    100%
Sum minus Heuristic 2: Entry Sense Ordering      72%    100%
Sum minus Heuristic 3: Explicit Semantic Domain  82%     98%
Sum minus Heuristic 4: Word Matching             81%    100%
Sum minus Heuristic 5: Simple Concordance        81%    100%
Sum minus Heuristic 6: Co-occurrence Vectors     81%    100%
Sum minus Heuristic 7: Semantic Vectors          81%    100%
Sum minus Heuristic 8: Conceptual Distance       77%    100%
Sum                                              83%    100%

Page 67

SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus

– F2+F3>9: 35,099 definitions
– F2+F3>4: 40,754 definitions
– No filters: 111,624 definitions

FOOD                 [Castellón 93]   F2+F3>9   F2+F3>4
Genus terms                63             33        68
Dictionary senses         392            952     1,242
Levels                      6              5         6
Senses in level 1           2             18        48
Senses in level 2          67            490       604
Senses in level 3          88            379       452
Senses in level 4          67             44        65
Senses in level 5          87             21        60
Senses in level 6           6              0        13

Page 68

SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus

...
zumo_1_1 vino_1_1 quianti_1_1
zumo_1_1 vino_1_1 raya_1_8
zumo_1_1 vino_1_1 requena_1_1
zumo_1_1 vino_1_1 reserva_1_12
zumo_1_1 vino_1_1 ribeiro_1_1
zumo_1_1 vino_1_1 rioja_1_1
zumo_1_1 vino_1_1 roete_1_1
zumo_1_1 vino_1_1 rosado_1_3
zumo_1_1 vino_1_1 rueda_2_1
zumo_1_1 vino_1_1 sherry_1_1
zumo_1_1 vino_1_1 tarragona_1_1
zumo_1_1 vino_1_1 tintilla_1_1
zumo_1_1 vino_1_1 tintorro_1_1
zumo_1_1 vino_1_1 toro_3_1
...
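Once every definition's genus has been disambiguated, the taxonomy is simply the set of (sense, genus sense) links, and chains like the ones above are its paths from a top sense down to a leaf. The minimal sketch below takes a hypothetical subset of those links as input. The check against the current path guards against the circular definitions listed earlier among the problems of a purely descriptive approach.

```python
from collections import defaultdict

# (sense, disambiguated genus sense) pairs: a hypothetical subset of the FOOD links.
LINKS = [
    ("vino_1_1", "zumo_1_1"),
    ("rioja_1_1", "vino_1_1"),
    ("rueda_2_1", "vino_1_1"),
    ("sherry_1_1", "vino_1_1"),
]


def build_taxonomy(links):
    """Return (top senses, genus -> list of hyponym senses)."""
    children = defaultdict(list)
    has_parent = set()
    for sense, genus in links:
        children[genus].append(sense)
        has_parent.add(sense)
    tops = sorted(set(children) - has_parent)
    return tops, children


def chains(sense, children, prefix=()):
    """Yield every path from `sense` down to a leaf, skipping links that close a cycle."""
    path = prefix + (sense,)
    hyponyms = [h for h in sorted(children.get(sense, [])) if h not in path]
    if not hyponyms:
        yield path
        return
    for h in hyponyms:
        yield from chains(h, children, path)


tops, children = build_taxonomy(LINKS)
for top in tops:
    for chain in chains(top, children):
        print(" ".join(chain))
# zumo_1_1 vino_1_1 rioja_1_1
# zumo_1_1 vino_1_1 rueda_2_1
# zumo_1_1 vino_1_1 sherry_1_1
```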

Page 69

SEISD: Application of the methodology
Step 3 (SemBuild): Exploiting Differentia

Word Sense: rueda_2_1
Definition: vino procedente de la región de Rueda (wine from the region of Rueda)
SinPar: sn: [n: vino]
        origin: [n: región, sp: [r0d: de, sn: [n: Rueda]]].

Page 70

SEISD: Application of the methodology
Step 3 (SemBuild): Exploiting Differentia

FOOD             [Castellón 93]   F2+F3>9   F2+F3>4
definitions            392           952      1,242
properties             137           717        825
pp-mod                 197         1,118      1,310
goal                    15            17         19
composed-by             44            82         96
simil                    2            19         23
purpose                  5            18         25
color                    2            10         12
temp                     5            14         14
origin                   0            18         20
total                  407         2,013      2,344
total syntagms         883         2,760      3,270

– MACO+, Relax (Padró 97), SinPar

Page 71

SEISD: Application of the methodology
Step 4 (CRS): Placing the LK into the LKB

rueda_x_2_1
<lex-noun-sign rqs> < vino_x_1_1 <lex-noun-sign rqs>
<lex-sign sense-id : sense-id dictionary> = ("VOX")
<lex-sign sense-id : sense-id word> = ("rueda")
<lex-sign sense-id : sense-id homonym-no> = ("2")
<lex-sign sense-id : sense-id sense-no> = ("1")
<rqs : origin-area> = ("Rueda").
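The path-value equations of an LKB entry like the one above can be generated mechanically from what Steps 1 to 3 extract: the sense identifier, its genus and the relations found in the differentia. The sketch below only reproduces the textual format shown on this slide. The function name and its arguments are hypothetical, and the sketch neither builds nor checks a typed feature structure against the LKB type system.

```python
def lkb_entry(word, homonym, sense, dictionary, parent, relations):
    """Render a dictionary sense as LKB-style path equations (format copied from the slide)."""
    sense_id = f"{word}_x_{homonym}_{sense}"
    lines = [
        sense_id,
        # The rqs of this sense inherits from the rqs of its genus sense (Step 2).
        f"<lex-noun-sign rqs> < {parent} <lex-noun-sign rqs>",
        f'<lex-sign sense-id : sense-id dictionary> = ("{dictionary}")',
        f'<lex-sign sense-id : sense-id word> = ("{word}")',
        f'<lex-sign sense-id : sense-id homonym-no> = ("{homonym}")',
        f'<lex-sign sense-id : sense-id sense-no> = ("{sense}")',
    ]
    # One equation per semantic relation extracted from the differentia (Step 3).
    lines += [f'<rqs : {feature}> = ("{value}")' for feature, value in relations.items()]
    return "\n".join(lines) + "."


print(lkb_entry("rueda", 2, 1, "VOX", "vino_x_1_1", {"origin-area": "Rueda"}))
```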

Page 72

SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation (Ageno et al. 93)

rueda_x_2_1 linked to wine_l_1_1 (parent)
rueda_x_2_1 linked to drink_l_2_1 (grandparent)

rueda_x_2_1 linked to <wine, vino> (parent)
rueda_x_2_1 linked to <..., drink, ...> (grandparent)
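The rueda_x_2_1 example suggests one reading of the parent and grandparent rules. A sense with its own translation equivalent gets a simple tlink. Otherwise, its Spanish genus is translated (parent). That equivalent's hypernym in the target hierarchy is linked as well (grandparent). The sketch below shows that reading only. It is an interpretation of the example, not TGE's actual rulesets, and all the dictionaries are hypothetical toy inputs.

```python
def partial_tlinks(sense, genus_of, equivalents, target_parent):
    """Partial tlinks for a Spanish sense that has no direct equivalent.

    parent: the target equivalent of the sense's Spanish genus.
    grandparent: the hypernym of that equivalent in the target hierarchy.
    """
    links = []
    parent = equivalents.get(genus_of.get(sense))
    if parent is not None:
        links.append((sense, parent, "parent"))
        grandparent = target_parent.get(parent)
        if grandparent is not None:
            links.append((sense, grandparent, "grandparent"))
    return links


def tlinks(sense, genus_of, equivalents, target_parent):
    """A simple tlink if the sense has its own equivalent, partial tlinks otherwise."""
    if sense in equivalents:
        return [(sense, equivalents[sense], "simple")]
    return partial_tlinks(sense, genus_of, equivalents, target_parent)


genus_of = {"rueda_x_2_1": "vino_x_1_1"}       # Spanish taxonomy (Step 2)
equivalents = {"vino_x_1_1": "wine_l_1_1"}     # known translation equivalents
target_parent = {"wine_l_1_1": "drink_l_2_1"}  # LDOCE-side hypernym
for link in tlinks("rueda_x_2_1", genus_of, equivalents, target_parent):
    print(link)
# ('rueda_x_2_1', 'wine_l_1_1', 'parent')
# ('rueda_x_2_1', 'drink_l_2_1', 'grandparent')
```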

Page 73

SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation

– Simple tlink:
  [Figure: tlink between the feature structures of English furniture (fs0) and Spanish muebles/mueble (fs1), labelled identity and plural]

– Partial tlink
  – Rioja_x_1_1 linked to wine_l_1_1

– Phrasal tlink
  – ahumado_x_1_1 linked to smoked_food_l_1_1

Page 74

SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation

First experiment
– (semi)automatic approach using PRE
– linking DGILE to LDOCE
– drink taxonomy (235 definitions)

Page 75

SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation

First experiment (results)

Tlinks                                     Spanish   English
simple-tlinks (14.5%)                55
  by simple-tlink-ruleset            41       26        31
  by compound-tlink-ruleset           1        1         1
  by orthographic-tlink-ruleset      13       13        13
phrasal-tlinks (0.5%)                 2
  by phrasal-noun-tlink-ruleset       2        1         3
partial-tlinks (85%)                320
  by parent-tlink-ruleset           268      149        15
  by grandparent-tlink-ruleset       44       30        10
  by general-tlink-ruleset            8        7         6

Page 76

SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation

Second experiment
– automatic approach using PRE
    Conceptual Distance
– linking DGILE to WordNet
– food taxonomy (140 definitions)

Page 77

SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation

[Figure: the DGILE chain zumo_1_1, vino_1_1, rueda_2_1 next to the WordNet synsets <entity>, <object>, <substance, matter>, <food, nutrient>, <foodstuff>, <juice>, <..., drink, ...>, <beverage, drink, ...> and <wine, vino>]

Page 78

SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation

[Figure: the DGILE chain zumo_1_1, vino_1_1, rueda_2_1 next to the WordNet synsets <entity>, <object>, <substance, matter>, <food, nutrient>, <foodstuff>, <juice>, <..., drink, ...>, <beverage, drink, ...> and <wine, vino>, annotated with two simple-tlinks and one partial-tlink (rueda_2_1 to <wine, vino>)]

Page 79

SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation

Second experiment (results)

simple-tlinks                        57
  simple-tlink-ruleset               52
  compound-tlink-ruleset              2
  orthographic-tlink-ruleset          3
phrasal-tlinks                        1
  phrasal-noun-tlink-ruleset          1
partial-tlinks                       84
  parent-tlink-ruleset               78
  grandparent-tlink-ruleset           6

Page 80

SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet (Atserias et al. 97)

Ten class methods
– Four monosemic criteria
– Four polysemic criteria
– Two hybrid criteria

Three conceptual distance methods
– CD1: using pairwise word co-occurrences
– CD2: using headword and genus
– CD3: using bilingual Spanish entries with multiple translations

Page 81

SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet

Ten class methods
– Four monosemic criteria

[Figure: four configurations linking Spanish words (SW) to their English translations (EW)]
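The sketch below shows how the monosemic criteria could be applied. It relies on an assumption that the slide does not spell out: mono1 to mono4 correspond to the 1:1, 1:N, N:1 and M:N configurations between Spanish words (SW) and English words (EW) in the bilingual dictionaries. A link is proposed only when every English translation of the Spanish word is monosemous in WordNet. The entries and synset identifiers are hypothetical.

```python
from collections import defaultdict


def monosemic_links(pairs, synsets_of):
    """Link Spanish words to WordNet synsets through monosemous English translations.

    pairs: (spanish, english) translation pairs; synsets_of: english -> synset ids.
    Returns a sorted list of (spanish, synset, configuration).
    """
    es_to_en, en_to_es = defaultdict(set), defaultdict(set)
    for es, en in pairs:
        es_to_en[es].add(en)
        en_to_es[en].add(es)

    links = set()
    for es, ens in es_to_en.items():
        if any(len(synsets_of.get(en, [])) != 1 for en in ens):
            continue  # some translation is polysemous or unknown: not a monosemic case
        several_en = len(ens) > 1                             # SW has several EWs
        shared_en = any(len(en_to_es[en]) > 1 for en in ens)  # an EW has several SWs
        config = {(False, False): "1:1", (True, False): "1:N",
                  (False, True): "N:1", (True, True): "M:N"}[(several_en, shared_en)]
        for en in ens:
            links.add((es, synsets_of[en][0], config))
    return sorted(links)


pairs = [("jerez", "sherry"), ("clarete", "claret"), ("clarete", "red_bordeaux"),
         ("zumo", "juice"), ("jugo", "juice"), ("banco", "bank")]
synsets_of = {"sherry": ["syn_sherry"], "claret": ["syn_claret"],
              "red_bordeaux": ["syn_claret"], "juice": ["syn_juice"],
              "bank": ["syn_bank_1", "syn_bank_2"]}
for link in monosemic_links(pairs, synsets_of):
    print(link)
# ('clarete', 'syn_claret', '1:N')
# ('jerez', 'syn_sherry', '1:1')
# ('jugo', 'syn_juice', 'N:1')
# ('zumo', 'syn_juice', 'N:1')
```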

Page 82

SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet

Ten class methods
– Four monosemic criteria

[Figure: the same four SW/EW configurations, each English word linked to a single WordNet synset]

Page 83

SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet

Ten class methods
– Four polysemic criteria

[Figure: SW-EW link patterns in which each English word (EW) has several WordNet synsets (shown as Synset+)]
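The monosemic and polysemic criteria can be sketched as a classifier over the bilingual dictionary. This is a minimal sketch, not the authors' implementation. It assumes, from the SW/EW figures, that the four patterns in each family are 1:1, 1:N, N:1 and M:N links between Spanish words (SW) and English words (EW), classified here per link from the fan-out of its two endpoints. It also assumes that "mono" means the English word has exactly one WordNet synset and that the polysemic criteria link the Spanish word to every synset of the English word. The function name `classify_links`, the dictionary layout and the synset ids are hypothetical.

```python
from collections import defaultdict


def classify_links(bilingual, synsets_of):
    """Label each candidate (Spanish word, synset) link with a criterion name.

    bilingual:  Spanish word -> set of English translations (hypothetical layout)
    synsets_of: English word -> list of WordNet synset ids (hypothetical ids)
    Returns a list of (spanish_word, synset_id, criterion) triples.
    """
    # Reverse index: English word -> Spanish words it translates.
    back = defaultdict(set)
    for sw, ews in bilingual.items():
        for ew in ews:
            back[ew].add(sw)

    links = []
    for sw, ews in bilingual.items():
        for ew in ews:
            syns = synsets_of.get(ew, [])
            if not syns:
                continue  # English word not in WordNet: nothing to link
            single_translation = len(ews) == 1   # SW has one EW
            single_source = len(back[ew]) == 1   # EW comes from one SW
            if single_translation and single_source:
                pattern = 1  # 1:1
            elif single_source:
                pattern = 2  # 1:N, SW has several translations
            elif single_translation:
                pattern = 3  # N:1, several SWs share this EW
            else:
                pattern = 4  # M:N
            family = "mono" if len(syns) == 1 else "poly"
            # Poly criteria propose a link to every synset of the English word.
            for syn in syns:
                links.append((sw, syn, f"{family}{pattern}"))
    return links


# Toy data: the words are real, the synset ids are made up for illustration.
bilingual = {
    "queso": {"cheese"},
    "banco": {"bank", "bench"},
    "zumo": {"juice"},
    "jugo": {"juice"},
}
synsets_of = {
    "cheese": ["cheese.n.01"],
    "bank": ["bank.n.01", "bank.n.02"],
    "bench": ["bench.n.01"],
    "juice": ["juice.n.01"],
}
links = classify_links(bilingual, synsets_of)
for link in sorted(links):
    print(link)
```

Because a polysemous English word yields one candidate link per synset, the poly criteria produce many more links per Spanish word, which fits the lower %ok reported for poly3 and poly4.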

Page 84

SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet

Ten class methods
– Variant criterion
    SW   <..., EW, ..., EW, ...>
– Field criterion
    SW   <..., headword-EW, ..., Ind-EW, ...>
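One reading of the variant criterion is: link a Spanish word to a synset when two or more of the synset's English variants are translations of that word (the `<..., EW, ..., EW, ...>` pattern). The sketch below assumes that reading. `variant_links`, the data layout and the synset ids are hypothetical, and the field criterion is not coded because the slide does not say how the indicator (Ind-EW) is obtained.

```python
def variant_links(bilingual, synsets):
    """Variant criterion (sketch): link a Spanish word to a synset when at
    least two of the synset's English variants translate that word.

    bilingual: Spanish word -> set of English translations
    synsets:   synset id -> set of English variants (hypothetical ids)
    """
    links = []
    for sw, ews in bilingual.items():
        for syn_id, variants in synsets.items():
            shared = ews & variants  # translations that are variants here
            if len(shared) >= 2:
                links.append((sw, syn_id, sorted(shared)))
    return links


bilingual = {"coche": {"car", "automobile", "coach"}}
synsets = {
    "car.n.01": {"car", "auto", "automobile", "machine", "motorcar"},
    "car.n.02": {"car", "railcar", "railway car"},
    "coach.n.01": {"coach", "manager", "handler"},
}
print(variant_links(bilingual, synsets))
```

Only the synset that contains two translations of "coche" is linked; synsets sharing a single translation are left to other criteria.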

Page 85

SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet

Ten class methods (results)

Criterion  #links  #synsets  #words  %ok
mono1        3697      3583    3697   92
mono2         935       929     661   89
mono3        1863      1158    1863   89
mono4        2688      1328    2063   85
poly1        5121      4887    1992   80
poly2        1450      1426     449   75
poly3       11687      6611    3165   58
poly4       40298      9400    3754   61
Variant      3164      2195    2261   85
Field         510       379     421   78
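Precision and coverage pull in opposite directions in this table. A rough way to compare criteria is the expected number of correct links, #links × %ok / 100. This assumes the %ok estimate holds over all of a criterion's links, which the slide does not state; `expected_correct` is a hypothetical helper.

```python
# (#links, %ok) per criterion, copied from the results table.
results = {
    "mono1": (3697, 92), "mono2": (935, 89), "mono3": (1863, 89),
    "mono4": (2688, 85), "poly1": (5121, 80), "poly2": (1450, 75),
    "poly3": (11687, 58), "poly4": (40298, 61),
    "Variant": (3164, 85), "Field": (510, 78),
}


def expected_correct(n_links, pct_ok):
    """Expected correct links, assuming %ok holds for every link."""
    return round(n_links * pct_ok / 100)


for name, (n, pct) in sorted(results.items(),
                             key=lambda kv: expected_correct(*kv[1]),
                             reverse=True):
    ok = expected_correct(n, pct)
    print(f"{name:8} {n:6} links  ~{ok:6} correct  ~{n - ok:6} wrong")
```

Under that assumption poly4, despite its 61% precision, contributes the most correct links in absolute terms (about 24,582), but also about 15,716 wrong ones.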

Page 86

SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet

Three CD methods (results)

Criterion  #links  #synsets  #words  %ok
CD1        23,828    11,269   7,283   56
CD2        24,739    12,709  10,300   61
CD3         4,567     3,089   2,313   75

Page 87

SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet

Combining methods (results)

            method2
method1         cd2    cd3     p1     p2     p3     p4
cd1     size  15736   1849   2076    556   3146  15105
        %ok      79     85     86     86     72     64
cd2     size      0   2401   2536    592   3777  13246
        %ok       0     86     88     86     75     67
cd3     size      0      0    205    180    215   3114
        %ok       0      0     95     95    100     77
p1      size      0      0      0      0     77    178
        %ok       0      0      0      0    100     88
p2      size      0      0      0      0     28     78
        %ok       0      0      0      0     77     96
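The table is consistent with reading each cell as the links proposed by both method1 and method2 (size) and the precision of that overlap (%ok); for example, cd3 combined with p3 yields only 215 links but 100% %ok. The sketch shows that kind of intersection. The toy links and the reference set, which stands in for the manual evaluation, are hypothetical, as are the helper names.

```python
def precision(links, reference):
    """Percentage of links found in the reference set (0.0 for no links)."""
    return 100 * len(links & reference) / len(links) if links else 0.0


def combine(links_a, links_b, reference):
    """Links proposed by both methods: (size, %ok) of the intersection."""
    both = links_a & links_b
    return len(both), precision(both, reference)


# Hypothetical (Spanish word, synset id) links from two methods.
cd1 = {("queso", "cheese.n.01"), ("banco", "bench.n.01"),
       ("banco", "bank.n.02"), ("vino", "wine.n.02")}
p1 = {("queso", "cheese.n.01"), ("banco", "bench.n.01"),
      ("banco", "bank.n.03")}
# Hypothetical reference set standing in for manual evaluation.
reference = {("queso", "cheese.n.01"), ("banco", "bench.n.01"),
             ("banco", "bank.n.01"), ("vino", "wine.n.01")}

print(precision(cd1, reference), precision(p1, reference))
print(combine(cd1, p1, reference))
```

In this toy case each method alone is right on 50% and about 67% of its links, while the links both agree on are all correct, at the cost of a much smaller set.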

Page 88

SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet

Resulting Spanish WordNets

Criterion    #links  #synsets  #words   #CS  #poly links
SpWN v0.0    10,982     7,131   8,396  87.4        1,777
Combination   7,244     5,852   3,939  85.6        2,075
SpWN v0.1    15,535    10,786   9,986  86.4        3,373

Page 89

Acquisition of Lexical Knowledge for NLP
LK from MRDs: Taxonomies

...
zumo_1_1 vino_1_1 quianti_1_1
zumo_1_1 vino_1_1 raya_1_8
zumo_1_1 vino_1_1 requena_1_1
zumo_1_1 vino_1_1 reserva_1_12
zumo_1_1 vino_1_1 ribeiro_1_1
zumo_1_1 vino_1_1 rioja_1_1
zumo_1_1 vino_1_1 roete_1_1
zumo_1_1 vino_1_1 rosado_1_3
zumo_1_1 vino_1_1 rueda_2_1
zumo_1_1 vino_1_1 sherry_1_1
zumo_1_1 vino_1_1 tarragona_1_1
zumo_1_1 vino_1_1 tintilla_1_1
zumo_1_1 vino_1_1 tintorro_1_1
zumo_1_1 vino_1_1 toro_3_1
...
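Each line appears to list one path of the taxonomy extracted from the dictionary, read top-down: zumo_1_1 ("juice") is the genus of vino_1_1 ("wine"), which in turn is the genus of each wine name. The numeric suffixes look like homograph and sense numbers, though the slide does not say so, and the slide shows only an excerpt of the paths. Assuming that reading, a parent-to-children index can be rebuilt from the paths; `build_taxonomy` is a hypothetical helper.

```python
from collections import defaultdict

# Paths copied from the slide; each line is read as genus -> ... -> leaf.
PATHS = """\
zumo_1_1 vino_1_1 quianti_1_1
zumo_1_1 vino_1_1 raya_1_8
zumo_1_1 vino_1_1 requena_1_1
zumo_1_1 vino_1_1 reserva_1_12
zumo_1_1 vino_1_1 ribeiro_1_1
zumo_1_1 vino_1_1 rioja_1_1
zumo_1_1 vino_1_1 roete_1_1
zumo_1_1 vino_1_1 rosado_1_3
zumo_1_1 vino_1_1 rueda_2_1
zumo_1_1 vino_1_1 sherry_1_1
zumo_1_1 vino_1_1 tarragona_1_1
zumo_1_1 vino_1_1 tintilla_1_1
zumo_1_1 vino_1_1 tintorro_1_1
zumo_1_1 vino_1_1 toro_3_1"""


def build_taxonomy(lines):
    """Return a dict mapping each sense to the set of its direct hyponyms."""
    children = defaultdict(set)
    for line in lines:
        senses = line.split()
        # Consecutive senses on a path are (hypernym, hyponym) pairs.
        for parent, child in zip(senses, senses[1:]):
            children[parent].add(child)
    return children


children = build_taxonomy(PATHS.splitlines())
for parent in sorted(children):
    print(parent, "->", len(children[parent]), "hyponyms")
```

Storing direct hyponyms per sense, rather than whole paths, lets further paths from the same dictionary be merged into one tree without duplicating shared ancestors.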

Page 90

Acquisition of Lexical Knowledge for NLP
LK from MRDs: MTDs

371,616 connections

11.8004  9.8  16  elaborado    queso      35  113
10.8938  8.0  23  pasta        queso     178  113
10.4846  7.5  25  leche        queso     274  113
10.2483  9.2  13  oveja        queso      45  113
 9.1513  7.6  16  queso        sabor     113  160
 7.4956  8.3   8  queso        tortilla  113   51
 6.7732  7.5   8  queso        vaca      113   84
 6.5830  6.1  12  maíz         queso     347  113
 6.2208  8.9   5  queso        suero     113   21
 6.1509  8.8   5  mantequilla  queso      22  113
 6.1474  7.9   6  compacta     queso      50  113
 5.9918  7.7   6  picante      queso      55  113
 5.9002  9.8   4  manchego     queso       9  113
 5.6805  7.3   6  cabra        queso      75  113
 5.6300  5.9   9  pan          queso     287  113
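The slide gives no column labels. Reading the rows, columns four and five are a word pair and the last two columns look like the frequencies of those words (queso always gets 113). The first column is reproduced to four decimals by multiplying the second column by log10 of the third. The sketch checks that inferred relation on all fifteen rows; the relation and the function name `score` are an assumption, not a formula stated on the slide.

```python
import math

# Rows copied from the slide: (col1, col2, col3, word1, word2, col6, col7).
rows = [
    (11.8004, 9.8, 16, "elaborado", "queso", 35, 113),
    (10.8938, 8.0, 23, "pasta", "queso", 178, 113),
    (10.4846, 7.5, 25, "leche", "queso", 274, 113),
    (10.2483, 9.2, 13, "oveja", "queso", 45, 113),
    (9.1513, 7.6, 16, "queso", "sabor", 113, 160),
    (7.4956, 8.3, 8, "queso", "tortilla", 113, 51),
    (6.7732, 7.5, 8, "queso", "vaca", 113, 84),
    (6.5830, 6.1, 12, "maíz", "queso", 347, 113),
    (6.2208, 8.9, 5, "queso", "suero", 113, 21),
    (6.1509, 8.8, 5, "mantequilla", "queso", 22, 113),
    (6.1474, 7.9, 6, "compacta", "queso", 50, 113),
    (5.9918, 7.7, 6, "picante", "queso", 55, 113),
    (5.9002, 9.8, 4, "manchego", "queso", 9, 113),
    (5.6805, 7.3, 6, "cabra", "queso", 75, 113),
    (5.6300, 5.9, 9, "pan", "queso", 287, 113),
]


def score(col2, col3):
    """Assumed relation, inferred from the numbers: col1 = col2 * log10(col3)."""
    return col2 * math.log10(col3)


max_dev = max(abs(score(c2, c3) - c1) for c1, c2, c3, *_ in rows)
print(f"largest deviation from column 1: {max_dev:.1e}")
```

Under this reading, the ranking favours pairs that are both strongly associated (column 2) and frequent together (column 3), so a rare but tight pair such as manchego/queso ranks below more frequent pairs.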

Page 91

Page 92

Acquisition of Lexical Knowledge for NLP
LK from MRDs: EuroWordNet

SPANISH  synsets   words  variants
adjs      12,461   8,714    16,713
nouns     43,573  47,813    62,319
verbs      8,298   6,010    13,230
Sum       64,332  62,537    92,262

CATALAN  synsets   words  variants
adjs       1,386   1,507     2,030
nouns     30,701  32,988    42,320
verbs      4,487   4,289    10,297
Sum       36,574  38,784    54,647