Complete and Consistent Annotation of WordNet with the Top Concept Ontology


Page 1

Complete and Consistent Annotation of WordNet with the Top Concept Ontology

Javier Álvez, Jordi Atserias, Jordi Carrera, Salvador Climent, Egoitza Laparra, Antoni Oliver and German Rigau

Basque Country Univ., Pompeu Fabra Univ. (Barcelona), Open Univ. of Catalonia (Barcelona)

Page 2

Introduction

• Four years of work

• Full annotation of WordNet’s nouns with semantic features from the EuroWordNet Top Concept Ontology (EWN TCO)

• Intended as an important semantic resource for NLP (selectional preferences, synset clustering, reasoning, …)

Page 3

Result

• 65,989 noun concepts (synsets), comprising 116,364 noun lexemes (variants), consistently annotated

• Average of 6.47 features per synset
– Features organized in a multilevel hierarchy

Page 4

Structure of the talk

• Methodology

• Examples and Discussion

• Conclusions

Page 5

Methodology

• Annotation of the Inter-Lingual Index (ILI = English WN1.6, SpaWN, mappings to other WNs...) with the nodes/features of the TCO, a shallow ontology defined in the EuroWordNet project [Vossen et al. 1998]

• Methodology based on:
– INCOMPATIBILITY OF ONTOLOGICAL INFORMATION
– SUBSUMPTION BLOCKAGE POINTS

Page 6

The Top Concept Ontology

• Organized in three orders of entities:

– 1st Order (physical entities)
– 2nd Order (situations)
– 3rd Order (abstract entities)

Page 7

The Top Concept Ontology

• 1st Order entities organized in four Qualia-like features:

– Origin (Artifact, Natural…)
– Form (Object, Substance…)
– Composition (Group, Part)
– Function (Building, Container, Vehicle…)

Page 8

The Top Concept Ontology

• 2nd Order Entities organized in two dimensions:

– Situation Type: Dynamic (Bounded Events, Unbounded Events) and Static (Properties, Relations)

– Situation Component (Cause, Manner, Modal…)

• 3rd Order Entities: not further subdivided

Page 9

Methodology

• We do not modify the structure of either the TCO or WN (this is left for future work); we only annotate.

• We declared pairs of TCO features incompatible (e.g. Natural vs. Artifact, Substance vs. Object)

• Starting point: in EWN, TCO features had been manually assigned to a basic set of 1,024 synsets, the Base Concepts (BCs)
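Here is a minimal Python sketch of the incompatibility check. It assumes incompatible pairs are stored as unordered pairs of feature names. Only the two pairs given as examples above are listed, because the project's complete list is not reproduced here. The name `find_conflicts` is hypothetical.

```python
from itertools import combinations

# Incompatible TCO feature pairs (unordered). Only the two examples from the
# slide are included; the project's full list is not reproduced here.
INCOMPATIBLE = {
    frozenset({"Natural", "Artifact"}),
    frozenset({"Substance", "Object"}),
}


def find_conflicts(features):
    """Return every incompatible pair present in a synset's feature set."""
    conflicts = []
    # Sorting makes the output order deterministic.
    for a, b in combinations(sorted(features), 2):
        if frozenset({a, b}) in INCOMPATIBLE:
            conflicts.append((a, b))
    return conflicts
```

For example, a synset annotated with Natural, Artifact and Object yields a single conflict, `("Artifact", "Natural")`.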

Page 10

Methodology

1. We automatically annotated the remaining top synsets (from the BCs up to the top of the hierarchy) using an equivalence table between WordNet semantic files and TCO features (e.g. NounAct <=> Agentive, NounAttribute <=> Property)

2. We performed a fully automatic top-down expansion of these features through the WN1.6 hierarchy (feature inheritance)
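The two steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code:

- `SF_TO_TCO` holds only the two table rows named above.
- The hierarchy is a plain dict from each synset to its hypernyms.
- Inheritance is modelled as set union pushed downwards with a worklist.

The names `seed` and `expand_top_down` are hypothetical.

```python
from collections import defaultdict, deque

# Two rows of the SemanticFile-TCO equivalence table, as given on the slide.
SF_TO_TCO = {"NounAct": {"Agentive"}, "NounAttribute": {"Property"}}


def seed(manual, semantic_file):
    """Step 1: keep manual Base Concept features; label other top synsets by semantic file.

    manual:        synset -> set of manually assigned TCO features
    semantic_file: synset -> semantic file label (e.g. "NounAct")
    """
    features = {s: set(f) for s, f in manual.items()}
    for s, sf in semantic_file.items():
        if s not in features and sf in SF_TO_TCO:
            features[s] = set(SF_TO_TCO[sf])
    return features


def expand_top_down(hypernyms, seeded):
    """Step 2: propagate features from hypernyms to hyponyms (feature inheritance)."""
    children = defaultdict(list)
    for synset, parents in hypernyms.items():
        for parent in parents:
            children[parent].append(synset)

    features = defaultdict(set)
    for synset, feats in seeded.items():
        features[synset] |= feats
    queue = deque(seeded)
    while queue:
        parent = queue.popleft()
        for child in children[parent]:
            before = len(features[child])
            features[child] |= features[parent]
            if len(features[child]) > before:  # re-queue only synsets that gained features
                queue.append(child)
    return dict(features)
```

Feature sets only grow, so the worklist terminates even if the graph contains a cycle. Synsets never reached from an annotated synset do not appear in the result.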

Page 11

Methodology

• This expansion gave rise to feature incompatibilities:
– about 225,000 conflicts in 25,000 synsets

• Causes:
– Wrong manual annotation in EWN
– Wrong TCO-SF equivalences
– ... but basically:
» Subsumption in WN does not always hold (ISA overloading, etc.)
» Multiple inheritance in WN

Page 12

Methodology

• We manually checked every feature incompatibility in order to:
– (i) add and/or delete ontological features
– (ii) set inheritance blockage points

• A blockage point is an annotation in WN1.6 that breaks the ISA relation between two synsets, so that no features are inherited across it.
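One way to honour blockage points during inheritance is sketched below. It assumes, hypothetically, that each blockage point is stored as a `(hyponym, hypernym)` pair. The ISA link stays in the data, but no features are passed across it. The recursion assumes an acyclic hierarchy, and `expand_with_blockages` is an illustrative name.

```python
def expand_with_blockages(hypernyms, seeded, blocked):
    """Top-down feature inheritance that skips ISA links marked as blockage points.

    hypernyms: synset -> list of hypernyms (assumed acyclic)
    seeded:    synset -> set of directly assigned features
    blocked:   set of (hyponym, hypernym) pairs acting as blockage points
    """
    result = {}

    def feats(synset):
        if synset not in result:
            own = set(seeded.get(synset, set()))
            for h in hypernyms.get(synset, []):
                if (synset, h) not in blocked:  # inheritance stops at a blockage point
                    own |= feats(h)
            result[synset] = own
        return result[synset]

    for synset in set(hypernyms) | set(seeded):
        feats(synset)
    return result
```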

Page 13

A simple example

Bandung

Java

island

city

Page 14

A simple example

Bandung

Java

island=NATURAL

city=ARTIFACT

Page 15

A simple example

Bandung+NATURAL +ARTIFACT

Java+NATURAL

island=NATURAL

city=ARTIFACT

Page 16

A simple example

Bandung+ARTIFACT

Java+NATURAL

island=NATURAL

city=ARTIFACT
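The Bandung case can be replayed on a toy graph. The edges are read off the diagram: Bandung sits under both Java and city, and Java under island. The direct annotations are the ones shown as `island=NATURAL` and `city=ARTIFACT`.

The helper `sources` is a hypothetical diagnostic, not part of the described methodology. It lists the direct hypernym links through which a feature reaches a synset, which is where a blockage point might be considered during manual review.

```python
# Toy graph read off the diagram: synset -> direct hypernyms.
HYPERNYMS = {"Bandung": ["Java", "city"], "Java": ["island"], "island": [], "city": []}
# Directly assigned features ("=" in the diagram).
DIRECT = {"island": {"NATURAL"}, "city": {"ARTIFACT"}}


def sources(synset, feature, blocked=frozenset()):
    """Direct (synset, hypernym) links through which `feature` is inherited."""

    def has(s):
        if feature in DIRECT.get(s, set()):
            return True
        return any(has(h) for h in HYPERNYMS.get(s, []) if (s, h) not in blocked)

    return [(synset, h) for h in HYPERNYMS.get(synset, [])
            if (synset, h) not in blocked and has(h)]


def inherited(synset, blocked=frozenset()):
    """All features that reach `synset`, directly or via unblocked hypernyms."""
    all_features = set().union(*DIRECT.values())
    return {f for f in all_features
            if f in DIRECT.get(synset, set()) or sources(synset, f, blocked)}
```

Here `inherited("Bandung")` yields both NATURAL and ARTIFACT. Blocking the Bandung/Java link leaves only ARTIFACT, which matches the last diagram.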

Page 17

Methodology

Information used for decision making:

• Relational information about every synset and its neighbours, i.e. the WN structure

• Synsets' glosses as provided by EWN

• Glosses, descriptions and examples of the TCO features as provided in [Alonge et al. 1998]

• Standard word-substitution tests for hyponymy, as in [Cruse 1986]

Page 18

Methodology

• Once all incompatibilities were fixed, a new automatic re-expansion was launched, which produced a new, smaller set of conflicts.

• Following this iterative and incremental approach, inheritance was recalculated and the data re-examined several times.

• The task finished when a new cycle of feature re-expansion produced no new conflicts.
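The control flow of this iterative procedure can be sketched as a loop that stops at the first cycle with no conflicts. The `expand`, `find_conflicts` and `review` arguments are hypothetical stand-ins. In particular, `review` represents the manual checking step, which is not automated. `max_cycles` is an added safeguard not mentioned in the slides.

```python
def annotate_until_stable(annotation, expand, find_conflicts, review, max_cycles=100):
    """Alternate automatic expansion and (manual) review until no conflicts remain.

    Returns the final expanded annotation and the number of expansion cycles run.
    """
    for cycle in range(1, max_cycles + 1):
        expanded = expand(annotation)               # automatic (re-)expansion
        conflicts = find_conflicts(expanded)        # e.g. incompatible feature pairs
        if not conflicts:
            return expanded, cycle                  # a clean cycle ends the task
        annotation = review(annotation, conflicts)  # stands in for manual fixing
    raise RuntimeError("no conflict-free cycle within max_cycles")
```

In practice `expand` could be a blockage-aware inheritance function. `review` would return the annotation with features edited and new blockage points added.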

Page 19

Methodology

• Then, two final steps were applied:

1. Since the TCO is itself a hierarchy, each synset's annotation was expanded upwards through the TCO (up-feature expansion); e.g. Animal expands to Living, Natural, Origin and 1stOrderEntity

2. The whole hierarchy was checked for consistency using automated theorem provers such as Vampire and E-prover

– This step revealed a number of new conflicts, which were then fixed.
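Step 1 can be sketched as a walk up a parent map. `TCO_PARENT` covers only the TCO nodes named in these slides and assumes each node has a single parent. The real TCO has more nodes, and `expand_up` is an illustrative name. Step 2 (theorem proving) is not sketched here.

```python
# Parent links for a fragment of the TCO, restricted to nodes named in the slides.
TCO_PARENT = {
    "Animal": "Living",
    "Living": "Natural",
    "Natural": "Origin",
    "Artifact": "Origin",
    "Origin": "1stOrderEntity",
    "Object": "Form",
    "Substance": "Form",
    "Form": "1stOrderEntity",
}


def expand_up(features):
    """Add every TCO ancestor of each feature (up-feature expansion)."""
    expanded = set(features)
    for feature in features:
        node = feature
        while node in TCO_PARENT:  # climb until a root with no parent entry
            node = TCO_PARENT[node]
            expanded.add(node)
    return expanded
```

For example, `expand_up({"Animal"})` returns Animal, Living, Natural, Origin and 1stOrderEntity.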

Page 20

Typology of miscategorizations (IS-A overloading)

• Overgeneralization
• Reduction of sense
• Confusion of senses
• Suspect type-to-role relationship
• Extensional ambiguity
• 3rd Order Entities vs Mental 2nd Order Entities (TCO labels)
• Technical inconsistencies

(in black: the original typology of [Guarino 1998])

Page 21

Typology of miscategorizations

• Overgeneralization = the hypernym has features that the hyponym should not have

• Reduction of sense = the hypernym fails to capture part of the hyponym’s meaning

• Confusion of senses = multiple inheritance from incompatible hypernyms

Page 22

Typology of miscategorizations

• Extensional ambiguity = e.g. “layer”: is it an object or a substance?

• 3rd Order Entities vs Mental 2nd Order Entities (TCO labels) = e.g. “discipline” (a process, hence 2nd Order) IS-A “knowledge domain” (3rd Order)

• Technical inconsistencies = e.g. Hyponymy-Meronymy confusion

Page 23

Conclusions

• WN1.6 (= ILI) nouns fully and consistently annotated with 60 semantic features organized in a shallow ontology
– 65,000 synsets, 116,000 variants
– Average of 6.48 TCO features per synset

• 350 inheritance blockage points detected in WN
– 28,000 synsets have at least one in their hypernymy chain (i.e. they are affected by mistakes or inadequacies in the WN hierarchy)

• The resource is free and can be downloaded from our website (see the proceedings)
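Counting the synsets affected by blockage points amounts to checking, for each synset, whether any link in its hypernymy chain is blocked. The sketch below assumes blockage points are stored as hypothetical `(hyponym, hypernym)` pairs and that the hierarchy is acyclic. It does not reproduce the reported figures, which depend on the actual WN1.6 data.

```python
from functools import lru_cache


def count_affected(hypernyms, blocked):
    """Count synsets with at least one blockage point in their hypernymy chain.

    hypernyms: synset -> list of hypernyms (assumed acyclic)
    blocked:   set of (hyponym, hypernym) pairs marking blockage points
    """

    @lru_cache(maxsize=None)
    def affected(synset):
        # Affected if one of its own ISA links is blocked, or any hypernym is affected.
        return any((synset, h) in blocked or affected(h)
                   for h in hypernyms.get(synset, ()))

    return sum(1 for s in hypernyms if affected(s))
```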

Page 24

The Statue of Liberty+OBJECT +IMAGE_REPRESENTATION +CONCEPT

monument+OBJECT

artifact+OBJECT

art+OBJECT

sculpture=IMAGE_REPRESENTATION +CONCEPT +OBJECT

impressionism+OBJECT

figure+CONCEPT

shape+CONCEPT

abstraction=CONCEPT

object=OBJECT

Page 25

The Statue of Liberty+OBJECT +IMAGE_REPRESENTATION

monument+OBJECT

artifact+OBJECT

art=CONCEPT

sculpture=IMAGE_REPRESENTATION =OBJECT

impressionism+CONCEPT

figure+CONCEPT

shape+CONCEPT

abstraction=CONCEPT

object=OBJECT