Marek Maziarz, Maciej Piasecki, Ewa Rudnicka, Stanisław Szpakowicz
G4.19 Research Group
Wrocław University of Technology
nlp.pwr.wroc.pl
plwordnet.pwr.wroc.pl
Beyond the Transfer and Merge Wordnet Construction: plWordNet and a Comparison with WordNet
Wordnet
{samochód 1, pojazd samochodowy 1, auto 1, wóz 1 `car, automobile’}
– hypernymy/hyponymy: {pogotowie 3, karetka 1, sanitarka 1, karetka pogotowia 1 `ambulance’}
– meronymy: {bagażnik 1 `boot’}
– diminutiveness: {samochodzik 2 `small car’}
plWordNet 2.0
Independent vs. Translation-based Wordnet Construction
• Transfer and merge. Examples:
– EuroWordNet – most component wordnets built by the transfer method (Vossen 2002)
– MultiWordNet – semi-automatic acquisition method from the Princeton WordNet (Bentivogli et al. 2000)
– IndoWordNet – expansion from Hindi Wordnet (Sinha et al. 2006, Bhattacharyya 2010)
– FinWordNet – directly translated from the Princeton WordNet
Independent vs. Translation-based Wordnet Construction
• From scratch. Examples:
– GermaNet – the core built independently
– plWordNet – a unique, corpus-based method; largely independent of the Princeton WordNet
Synonymy and synsets
• “A wordnet is a collection of synsets linked by semantic relations.”
• A synset is a set of synonyms which represent the same lexicalised concept
• Synonyms are members of the same synset
Wordnet development deserves better: an operational theory with precise guidelines for wordnet editors.
Basic building block: synset vs lexical unit?
• Synset relations link lexicalised concepts
• But are named after linguistic lexico-semantic relations
• Substitution tests are defined for lexical units
• Synsets group lexical units
• Every wordnet includes relations between lexical units (lexical relations), e.g., antonymy
• Lexical units can be observed in text, concepts cannot
Constitutive relations
• Synset = a group of lexical units which share all constitutive relations
• Constitutive relation = a lexico-semantic relation which
– is frequent enough
– and frequently shared by groups
Also
– is established in linguistics
– and accepted in the wordnet tradition
• Examples: hypernymy, meronymy, cause
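The definition above can be illustrated with a minimal Python sketch. It assumes each lexical unit is described by the set of constitutive relation instances it takes part in, and it groups units whose sets are identical. The data, the relation labels and the function name `group_into_synsets` are hypothetical; in plWordNet these decisions are made by editors, not by an automatic grouping step.

```python
from collections import defaultdict

# Hypothetical toy data: (lemma, sense number) -> constitutive relation instances.
CONSTITUTIVE_LINKS = {
    ("samochód", 1): {("hypernym", "pojazd"), ("meronym", "bagażnik")},
    ("auto", 1): {("hypernym", "pojazd"), ("meronym", "bagażnik")},
    ("karetka", 1): {("hypernym", "samochód")},
    ("sanitarka", 1): {("hypernym", "samochód")},
    ("bagażnik", 1): {("holonym", "samochód")},
}


def group_into_synsets(links):
    """Group lexical units whose sets of constitutive relations are identical."""
    groups = defaultdict(list)
    for unit, relations in links.items():
        groups[frozenset(relations)].append(unit)
    # Sort for a deterministic result.
    return sorted(sorted(units) for units in groups.values())


synsets = group_into_synsets(CONSTITUTIVE_LINKS)
for synset in synsets:
    print(synset)
```

Units that differ in even one constitutive relation end up in different groups, which is the sense in which a synset abbreviates a set of shared relations.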
Synset as an abbreviation
Synset as a notational convention for a group of lexical units sharing certain relations; it represents synonyms
{afekt 1 `passion’, uczucie 2 `feeling’}
hypernym of
{miłość 1 `love’, umiłowanie 1 `affection’, kochanie 1 `loving’}
This is based on constitutive relations
Additional distinctions: stylistic register and aspect
Minimal commitment principle: make as few assumptions as possible
Relations in plWordNet
• Starting point: relations in Princeton WordNet, EuroWordNet and GermaNet, e.g., hyponymy, meronymy, antonymy, cause, instance for proper names
• Additional constitutive relations
– e.g., verb meronymy, preceding, presupposition
– gradation for adjectives
Relations in plWordNet
• Specific: derivationally based lexico-semantic relations, e.g.,
– inhabitant (góral `highlander’ – góry `highlands’)
– inchoativity (zapalić się [perfect] `light, start burning’ – palić się [imperfect] `burn, produce light’)
– process (chamieć [imperfect] `to become a boor’ – cham `boor’)
Construction process
1. Data collection: a 1.8-billion-word corpus
2. Data selection phase
– corpus browsing
– WSD-based word usage example extraction
– WordnetWeaver: semi-automatic expansion
3. Data analysis – questions
• is it a correct Polish lemma?
• how many lexical units does it have?
• how to describe them with relations?
• Other knowledge sources: available Polish dictionaries, thesauri, encyclopaedias, lexicons, the Web, and intuition.
The result – size matters
compared with Princeton WordNet:
• General statistics
• Lexical coverage
• Polysemy
• Synset size
• Relation density
• Hypernymy depth
www.plwordnet.pwr.wroc.pl
General statistics
Number of synsets, lemmas and LUs in the largest wordnets
Lexical coverage
Proportion of lemmas from PWN/plWN found among vocabulary with a given corpus frequency
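A measure of this kind could be computed roughly as in the sketch below. It assumes that "a given corpus frequency" means "at least a threshold number of occurrences"; the slide does not say whether thresholds or frequency bands were used. The counts and the name `coverage_by_frequency` are made up for illustration.

```python
def coverage_by_frequency(corpus_freq, wordnet_lemmas, thresholds):
    """For each threshold, return the share of corpus lemmas with
    frequency >= threshold that are present in the wordnet."""
    result = {}
    for threshold in thresholds:
        frequent = [lemma for lemma, count in corpus_freq.items() if count >= threshold]
        if not frequent:
            result[threshold] = None  # no vocabulary at this frequency
            continue
        covered = sum(1 for lemma in frequent if lemma in wordnet_lemmas)
        result[threshold] = covered / len(frequent)
    return result


# Made-up counts, for illustration only.
corpus_freq = {"dom": 1000, "auto": 200, "wóz": 50, "xyzzy": 50, "karetka": 5}
wordnet_lemmas = {"dom", "auto", "wóz", "karetka"}
coverage = coverage_by_frequency(corpus_freq, wordnet_lemmas, [1, 50, 200])
print(coverage)
```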
Polysemy
Proportion of polysemous lemmas with regard to POS
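As a rough illustration, the proportion can be derived from a list of lexical units, treating a lemma as polysemous when it has more than one lexical unit within a part of speech. That reading, the toy data and the name `polysemy_by_pos` are assumptions, not taken from the slides.

```python
from collections import defaultdict


def polysemy_by_pos(lexical_units):
    """lexical_units: iterable of (lemma, pos, sense_number) triples.
    Returns {pos: share of lemmas with more than one lexical unit}."""
    senses = defaultdict(int)
    for lemma, pos, _sense in lexical_units:
        senses[(lemma, pos)] += 1
    total = defaultdict(int)
    polysemous = defaultdict(int)
    for (_lemma, pos), count in senses.items():
        total[pos] += 1
        if count > 1:
            polysemous[pos] += 1
    return {pos: polysemous[pos] / total[pos] for pos in total}


# Toy lexical units, not real plWordNet data.
units = [
    ("wóz", "noun", 1), ("wóz", "noun", 2), ("auto", "noun", 1),
    ("palić", "verb", 1), ("palić", "verb", 2),
]
proportions = polysemy_by_pos(units)
print(proportions)
```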
Relation density
Synset relation density in PWN 3.1 and in plWordNet 2.0
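The slide does not define relation density. One plausible reading, used in the sketch below, is the average number of relation instances per synset, with each instance counted once at its source synset. The toy network and the name `relation_density` are hypothetical.

```python
def relation_density(synsets, relation_instances):
    """Average number of outgoing relation instances per synset."""
    synsets = set(synsets)
    if not synsets:
        return 0.0
    linked = sum(1 for source, _rel, _target in relation_instances if source in synsets)
    return linked / len(synsets)


# Toy network, not real plWordNet or PWN data.
toy_synsets = {"car", "ambulance", "boot", "vehicle"}
toy_relations = [
    ("ambulance", "hyponymy", "car"),
    ("car", "hyponymy", "vehicle"),
    ("boot", "meronymy", "car"),
]
density = relation_density(toy_synsets, toy_relations)
print(density)
```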
Hypernymy depth
Hypernymy path length for nouns in PWN 3.1 and plWordNet 2.0
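A path-length distribution like this can be obtained by walking the hypernymy graph. The sketch below assumes the longest path to a root is measured (the shortest path would be an equally plausible choice) and uses a hypothetical four-node chain; `hypernymy_depths` is an illustrative name.

```python
from collections import Counter


def hypernymy_depths(hypernyms):
    """Return {synset: number of edges on its longest path to a root}.
    `hypernyms` maps each synset to a list of its direct hypernyms."""
    depths = {}

    def depth(node, path):
        if node in depths:
            return depths[node]
        if node in path:
            raise ValueError(f"cycle through {node!r}")
        parents = hypernyms.get(node, [])
        value = 0 if not parents else 1 + max(depth(p, path | {node}) for p in parents)
        depths[node] = value
        return value

    for node in hypernyms:
        depth(node, frozenset())
    return depths


# Hypothetical chain, for illustration only.
HYPERNYMS = {
    "karetka": ["samochód"],
    "samochód": ["pojazd"],
    "pojazd": ["wytwór"],
    "wytwór": [],
}
depths = hypernymy_depths(HYPERNYMS)
histogram = Counter(depths.values())  # path length -> number of synsets
print(depths)
```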
Hypernymy depth
[Figure: Polish WordNet and Princeton WordNet]
Hypernymy depth
[Figure; labels: Computer, ElectricDevice, Device, Artifact, Object, Physical, Entity; legend: Polish WordNet, Princeton WordNet, SUMO]
Mapping procedure: plWordNet onto Princeton WordNet
1. Recognise the sense of the source synset:
• the position in the network structure
• existing relations, commentaries; other synsets containing the given lemma
2. Search for the target synset
• candidates for the target synset: intuitions, automatic prompting and dictionaries
• verifying candidates:
– comparing hypernymy and hyponymy structures
– existing inter-lingual relations
– definitions, commentaries; dictionaries
3. Link the source synset with the target synset
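The procedure above is carried out by editors. To illustrate only the "comparing hypernymy structures" check in step 2, the sketch below ranks candidate target synsets by how well their hypernyms overlap with dictionary translations of the source synset's hypernyms. The Jaccard measure, the bilingual dictionary, the candidate labels and the function names are all assumptions made for this example, not the authors' prompting algorithm.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(source_hypernyms, candidates, dictionary):
    """Rank candidate target synsets by how well their hypernyms match
    the dictionary translations of the source synset's hypernyms."""
    translated = set()
    for lemma in source_hypernyms:
        translated |= dictionary.get(lemma, set())
    scored = [(jaccard(translated, hypers), name) for name, hypers in candidates.items()]
    # Best score first; ties broken alphabetically.
    return [name for _score, name in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


# Hypothetical data: hypernyms of the source synset {karetka 1, ...}.
source_hypernyms = {"samochód", "pojazd"}
dictionary = {"samochód": {"car"}, "pojazd": {"vehicle"}}
candidates = {
    "ambulance (vehicle)": {"car", "vehicle"},
    "ambulance (service)": {"service", "organisation"},
}
ranking = rank_candidates(source_hypernyms, candidates, dictionary)
print(ranking)
```

A ranking like this could at most prompt an editor; definitions, commentaries and existing inter-lingual relations still have to be checked by hand.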
Hierarchy of inter-lingual relations
• Inter-lingual synonymy (only one per synset)
• Inter-lingual inter-register synonymy
• I-partial synonymy
• I-hyponymy
• I-hypernymy
• I-meronymy for parts, elements or materials of bigger wholes
• I-holonymy for a whole made of smaller parts, elements or materials
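If the hierarchy is read as an order of preference (an assumption; the slide gives only the list), the choice of relation and the "only one per synset" constraint on inter-lingual synonymy could be sketched as follows. The shortened relation labels and the function names are hypothetical.

```python
# Relation labels in the order given on the slide (shortened for the example).
RELATION_PRIORITY = [
    "I-synonymy",
    "I-inter-register synonymy",
    "I-partial synonymy",
    "I-hyponymy",
    "I-hypernymy",
    "I-meronymy",
    "I-holonymy",
]


def choose_relation(applicable):
    """Return the highest-ranked relation among those judged applicable."""
    for relation in RELATION_PRIORITY:
        if relation in applicable:
            return relation
    return None


def add_link(links, source, relation, target):
    """Record a link, allowing at most one I-synonymy link per source synset."""
    if relation == "I-synonymy" and any(
        s == source and r == "I-synonymy" for s, r, _t in links
    ):
        raise ValueError(f"{source!r} already has an I-synonymy link")
    links.append((source, relation, target))


links = []
add_link(links, "samochód 1", choose_relation({"I-synonymy", "I-hyponymy"}), "car")
print(links)
```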
Results of inter-lingual mapping
• Mapping direction: plWordNet – Princeton WordNet
• Bottom-up – from the lowest levels in the hierarchy up
• ~48 300 synsets mapped (~64 400 lexical units/senses)
– Synonymy: 15268
– Partial synonymy: 971
– Inter-register synonymy: 676
– Hyponymy: 23677
– Hypernymy: 3526
– Meronymy: 1898
– Holonymy: 555
• Mapped branches
– people, artefacts, places, food, time units: all
– communication, states and processes, body parts, group names: partially
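To see how the relation types are distributed, the counts listed above can be turned into percentage shares. The shares below are relative to the sum of the seven listed counts (46 571), not to the ~48 300 mapped synsets; the slide does not explain the difference between the two figures.

```python
# Counts copied from the slide.
MAPPING_COUNTS = {
    "Synonymy": 15268,
    "Partial synonymy": 971,
    "Inter-register synonymy": 676,
    "Hyponymy": 23677,
    "Hypernymy": 3526,
    "Meronymy": 1898,
    "Holonymy": 555,
}

total = sum(MAPPING_COUNTS.values())
shares = {name: round(100 * count / total, 1) for name, count in MAPPING_COUNTS.items()}

for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {share}%")
```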
Different relations for coding the same conceptual dependencies
Applications
Free WordNet-type licence facilitates applications. Examples:
• Semantic annotation in a corpus of referential gestures (Lis, 2012)
• Lexicon of semantic valency frames (Hajnicz, 2011; Hajnicz, 2012)
• Features for text mining from Web pages (Maciolek and Dobrowolski, 2013)
• Mapping between a lexicon and an ontology (Wróblewska et al., 2013)
• Word-to-word similarity in ontologies (Lula and Paliwoda-Pękosz, 2009)
• Text similarity for Information Retrieval (Siemiński, 2012)
• Text classification (Maciołek, 2010)
• Terminology extraction and clustering (Mykowiecka and Marciniak, 2012)
• Automated extraction of Opinion Attribute Lexicons (Wawer and Gołuchowski, 2012)
• Named Entity Recognition
• Word Sense Disambiguation (Gołuchowski and Przepiórkowski, 2012)
• Anaphora resolution
More than 500 registered users, ~70 declared commercial applications
Conclusions
• plWordNet 2.0 – a national wordnet not adapted from Princeton WordNet
• plWordNet 2.0 is comparable to WordNet 3.1 in size, as well as in lexical coverage, hypernymy depth and relation density
• Synset membership depends only on constitutive relations between lexical units.
• A unique mapping strategy and a unique opportunity to compare the two lexical systems
• plWordNet 3.0 (2015):
– a comprehensive wordnet of Polish
– 200k lemmas and 260k LUs, mapped to PWN 3.?
www.plwordnet.pwr.wroc.pl
Thank you!
Differences between plWN and PWN
• Inter-lingual lexico-grammatical differences:
– marked forms (diminutives, augmentatives)
– lexicalised gender
– lexical gaps
• Differences in the definition of synonymy and synset:
– 'Mixed' PWN synsets – marked and unmarked forms, feminine and masculine, countable and uncountable, hypernym and hyponym
– hypernymy: and (plWN) vs. and/or (PWN)
Differences between plWN and PWN
• Other differences:
– synset definitions incompatible with relations (PWN)
– different relations used for coding the same conceptual dependencies
– more fine-grained meaning differentiation
– differences boiling down to the content and size of resource
Differences in lexicalisation
Relation density
Synset relation density in PWN 3.1 and in plWordNet 2.0 in selected semantic domains
[Figure; axis: semantic domain]