On the Semantic Representation and Extraction of Complex Category Descriptors
DESCRIPTION
Natural language descriptors used for categorization are present everywhere from folksonomies to ontologies. While some descriptors are composed of simple expressions, others have complex compositional patterns (e.g. ‘French Senators Of The Second Empire’, ‘Churches Destroyed In The Great Fire Of London And Not Rebuilt’). As conceptual models get more complex and decentralized, more content is transferred to unstructured natural language descriptors, which increases terminological variation and reduces the conceptual integration and the level of structure of the model. This work describes a formal representation for complex natural language category descriptors (NLCDs). In the representation, complex categories are decomposed into a graph of primitive concepts, supporting their interlinking and semantic interpretation. A category extractor is built and the quality of its extraction under the proposed representation model is evaluated.
TRANSCRIPT
On the Semantic Representation and Extraction of Complex Category Descriptors
André Freitas, Rafael Vieira, Edward Curry, Danilo Carvalho, João C. Pereira da Silva
Insight Centre for Data Analytics
NLDB 2014
Montpellier, France
Outline
- Motivation
- Extracting Natural Language Category Descriptors (NLCDs)
- Evaluation
- Summary
Motivation
Big Data Vision: a more complete data-based picture of the world for systems and users.
“Schema” Growth & Complexity
- Fundamental shift in the database landscape
- How to build large ‘schemas’?
- From 10s-100s of attributes to 1,000s-1,000,000s of attributes
Target Motivational Scenario: Wikipedia
- Decentralized content generation
- 300,000 editors have edited Wikipedia more than 10 times
- More than 280,000 distinct Natural Language Category Descriptors (NLCDs)
Natural Language Category Descriptors (NLCDs)
NLCDs
Natural Language Category Descriptors (NLCDs) are natural language descriptors for sets.
Simple NLCDs:
- ‘People’
- ‘Countries’
- ‘Films’
Complex NLCDs:
- ‘French Senators Of The Second Empire’
- ‘United Kingdom Parliamentary Constituencies Represented By A Sitting Prime Minister’
Goal:
- Parse NLCDs into an integrated structured graph
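To make the goal concrete, here is a minimal sketch of what decomposing a complex NLCD into a small graph could look like. It is a toy illustration, not the extractor presented in the talk: the `RELATION_WORDS` list, the `NLCDGraph` class and the `toy_parse` function are all hypothetical names, and the splitting rule (cut at a few prepositions, attach every segment to the first one) is an assumption that a real extractor would replace with POS tagging, segmentation and core detection.

```python
from dataclasses import dataclass, field

# Hypothetical, hand-picked relation words; not taken from the slides.
RELATION_WORDS = {"of", "in", "by", "from"}


@dataclass
class NLCDGraph:
    core: str                                   # segment treated as the category core
    edges: list = field(default_factory=list)   # (source, relation, target) triples


def toy_parse(nlcd: str) -> NLCDGraph:
    """Split an NLCD at relation words and link each segment to the first one."""
    segments, relations, current = [], [], []
    for token in nlcd.split():
        if token.lower() in RELATION_WORDS and current:
            segments.append(" ".join(current))
            relations.append(token.lower())
            current = []
        else:
            current.append(token)
    segments.append(" ".join(current))

    graph = NLCDGraph(core=segments[0])
    for relation, segment in zip(relations, segments[1:]):
        graph.edges.append((graph.core, relation, segment))
    return graph


graph = toy_parse("French Senators Of The Second Empire")
print(graph.core)    # French Senators
print(graph.edges)   # [('French Senators', 'of', 'The Second Empire')]
```

A simple NLCD such as ‘People’ yields a graph with a core and no edges, which matches the simple/complex distinction above.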
Assumptions
- NLCDs as a more syntactically tractable subset of natural language
- NLCDs as a low-effort interface for structuring a domain of discourse
[Figure labels: NLCD, IE]
Formality vs. Usability Spectrum
[Figure: NLCDs, Information Extraction, NLCD graphs]
Applications
- Database Creation
- Semantic Annotation
- Entity/Semantic Search
Other Examples
IFRS and US GAAP:
- ‘Partially owned properties’
- ‘Residential portfolio segment’
- ‘Assets arising from exploration for and evaluation of mineral resources’
- ‘Key management personnel compensation’
- ‘Other long-term employee benefits’
Extracting Natural Language Category Descriptors (NLCDs)
Natural Language Category Descriptors
What is Big Data?
Core Features
Manual analysis of 10,000 NLCDs.
Features/Core Lexical Categories Distribution
Number of distinct POS Tag patterns
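As a rough illustration of what counting distinct POS tag patterns involves, the sketch below maps each tagged NLCD to the sequence of its tags and counts how often each sequence occurs. The tagged examples are hand-assigned, Penn Treebank-style tags chosen for illustration only; they are not output of the tagger used in the talk, and `pos_pattern` and `pattern_counts` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical, hand-tagged examples (tags assigned by hand, not by a tagger).
TAGGED_NLCDS = [
    [("French", "JJ"), ("Senators", "NNS"), ("Of", "IN"),
     ("The", "DT"), ("Second", "JJ"), ("Empire", "NN")],
    [("People", "NNS")],
    [("Countries", "NNS")],
    [("Films", "NNS")],
]


def pos_pattern(tagged_nlcd):
    """Reduce a tagged NLCD to its POS tag pattern, e.g. 'JJ NNS IN DT JJ NN'."""
    return " ".join(tag for _, tag in tagged_nlcd)


def pattern_counts(tagged_nlcds):
    """Count how many NLCDs share each POS tag pattern."""
    return Counter(pos_pattern(tagged) for tagged in tagged_nlcds)


counts = pattern_counts(TAGGED_NLCDS)
print(len(counts))            # number of distinct patterns: 2
print(counts.most_common(1))  # [('NNS', 3)]
```

The number of distinct keys in the counter is the quantity named in the slide title; the long tail of rare patterns is what makes complex NLCDs harder to parse than simple ones.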
Graph Representation Model
Focus of the Representation
Taxonomic Structure
Context Representation (Open Relation Extraction)
- Reification-based
Examples
NLCD Extractor
NLCD Extractor: POS Tagging
NLCD Extractor: Segmentation
NLCD Extractor: Named Entity Recognition
NLCD Extractor: Core Detection
NLCD Extractor: Word Sense Disambiguation (WSD)
NLCD Extractor: Entity Linking
[Figure label: DBpedia]
NLCD Extractor: RDF Representation
[Figure label: DBpedia]
RDF Representation
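The slides describe the context representation as reification-based but the transcript does not preserve the actual triples. As one possible reading, the sketch below writes a single relation as a reified RDF statement using the standard `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` terms. The `http://example.org/nlcd/` namespace, the `iri` and `reify` helpers, and the choice of identifiers are all assumptions made for illustration; the authors' vocabulary may differ.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/nlcd/"  # hypothetical namespace, not from the slides


def iri(local_name: str) -> str:
    """Build an IRI in the hypothetical namespace, replacing spaces with underscores."""
    return "<" + EX + local_name.replace(" ", "_") + ">"


def reify(statement_id: str, subject: str, predicate: str, obj: str) -> list:
    """Return N-Triples lines describing one reified statement."""
    stmt = iri(statement_id)
    return [
        f"{stmt} <{RDF}type> <{RDF}Statement> .",
        f"{stmt} <{RDF}subject> {iri(subject)} .",
        f"{stmt} <{RDF}predicate> {iri(predicate)} .",
        f"{stmt} <{RDF}object> {iri(obj)} .",
    ]


for line in reify("stmt1", "French Senators", "of", "Second Empire"):
    print(line)
```

Reifying the relation gives it its own node, so further context (for example a time period or a negation) can be attached to the statement rather than to either endpoint.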
Evaluation
Evaluation Setup
- Total of 287,957 English Wikipedia categories (open-domain scenario)
- Selected random sample of 2,696 categories
- Manual evaluation of the core extraction features:
  - Entity segmentation
  - Relation identification
  - Unary operators
  - Specialization relations
  - Category core identification
  - Entity core identification
  - Word Sense Disambiguation (WordNet)
  - Entity linking (DBpedia)
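The slides do not say how the random sample was drawn. A minimal sketch of one reproducible way to do it, assuming uniform sampling without replacement and a fixed seed (both assumptions, as is the `draw_sample` helper name):

```python
import random


def draw_sample(categories, k, seed=0):
    """Draw k distinct items uniformly at random, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(list(categories), k)


TOTAL_CATEGORIES = 287_957  # from the evaluation setup
SAMPLE_SIZE = 2_696         # from the evaluation setup

# Share of the category set covered by the manually evaluated sample.
print(f"{SAMPLE_SIZE / TOTAL_CATEGORIES:.2%}")

# Usage with stand-in data: integers in place of real category names.
sample = draw_sample(range(TOTAL_CATEGORIES), SAMPLE_SIZE, seed=42)
print(len(sample))
```

Fixing the seed means the same 2,696 items can be re-drawn later, which helps when several annotators need to judge the same sample.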
Results
Performance:
- (i) graph extraction time: 9.8 ms per graph
- (ii) word sense disambiguation: 121.0 ms per word
- (iii) entity linking: 530.0 ms per link
* Measured on a computer with an Intel i5-3317U CPU (1.70 GHz; 2 cores, 2 threads per core) and 4 GB RAM.
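For a sense of scale, the per-graph figure can be projected onto the full category set. This is a back-of-the-envelope sketch that assumes strictly sequential processing and counts graph extraction only; word sense disambiguation and entity linking are excluded because the slides give no per-category word or link counts. The function name is hypothetical.

```python
TOTAL_CATEGORIES = 287_957   # English Wikipedia categories in the evaluation setup
GRAPH_EXTRACTION_MS = 9.8    # reported graph extraction time per graph


def graph_extraction_minutes(n_graphs=TOTAL_CATEGORIES, ms_per_graph=GRAPH_EXTRACTION_MS):
    """Sequential graph extraction time in minutes for n_graphs categories."""
    return n_graphs * ms_per_graph / 1000 / 60


print(f"{graph_extraction_minutes():.1f} minutes")  # 47.0 minutes
```

Under these assumptions, graph extraction over all categories takes well under an hour on the reported hardware, so the slower per-word and per-link steps are the ones that would dominate a full run.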
Summary
- NLCDs can provide a more tractable (from the IE perspective) natural language interface for structuring large KBs
- We developed an approach for the representation, extraction and integration of NLCDs
  - ~75% extraction accuracy
- Limitations:
  - Need for a more principled and formal definition of an NLCD
  - Need for a better entity recognition and linking approach
- Future work: evaluation under a domain-specific scenario