tartar information extraction transforming arbitrary tables into f-logic frames with tartar...
Post on 19-Dec-2015
TARTAR Information Extraction
Transforming Arbitrary Tables into F-Logic Frames with TARTAR
Aleksander Pivk, York Sure, Philipp Cimiano, Matjaz Gams, Vladislav Rajkovic, Rudi Studer
Presented By Stephen Lynn
Information Extraction
  Free-form Text
    Linguistic/NLP approaches
  Tabular Structures
    Table comprehension task
    html, excel, pdf, text, etc.
    Semantic interpretation task
    More effort???
TARTAR Architecture
Semantic Representation
  Frame Logic (F-Logic)
    Model-theoretic semantics
    Complete resolution-based proof theory
    Expressive power of logic
    Availability of efficient reasoning tools
F-Logic Frame
Table Comprehension
  Dimensions – a grouping of cells representing similar entities
Table Comprehension
  Stub – dimension with headers used to index elements in body
Table Comprehension
  Box head – column headers (often nested)
Table Comprehension
  Body – data values
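The terms stub, box head, and body can be made concrete on a small table. The sketch below is illustrative only: the sample table and the `split_regions` helper are hypothetical, and it assumes the simplest case of exactly one header row and one stub column.

```python
from typing import Dict, List

def split_regions(grid: List[List[str]]) -> Dict[str, object]:
    """Split a grid with one header row and one stub column into regions."""
    header, *rows = grid
    return {
        "corner": header[0],                 # top-left cell, labels the stub
        "box_head": header[1:],              # column headers
        "stub": [row[0] for row in rows],    # row headers that index the body
        "body": [row[1:] for row in rows],   # data values
    }

# Hypothetical sample table, not taken from the slides.
table = [
    ["City", "Price", "Nights"],
    ["Bled", "120", "2"],
    ["Piran", "95", "3"],
]
regions = split_regions(table)
```

Real tables often have nested box heads or several stub columns, which this one-row, one-column assumption does not cover.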
Table Classes
  1D, 2D, Complex
Methodology
Cleaning & Canonicalization
  Clean DOM tree
    CyberNeko HTML Parser
  Rowspan/Colspan expansion
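Rowspan/colspan expansion can be sketched as copying each spanning cell into every grid position it covers, so that the table becomes a regular matrix. This is a minimal sketch and not the TARTAR implementation: the `(text, rowspan, colspan)` cell tuples and the `expand_spans` name are assumptions, and HTML parsing is left out entirely.

```python
from typing import List, Optional, Tuple

Cell = Tuple[str, int, int]  # (text, rowspan, colspan), a hypothetical input format

def expand_spans(rows: List[List[Cell]]) -> List[List[Optional[str]]]:
    """Copy each spanning cell into every grid position it covers."""
    grid: List[List[Optional[str]]] = []
    for r, row in enumerate(rows):
        while len(grid) <= r:
            grid.append([])
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already filled by a rowspan from an earlier row.
            while c < len(grid[r]) and grid[r][c] is not None:
                c += 1
            for dr in range(rowspan):
                while len(grid) <= r + dr:
                    grid.append([])
                target = grid[r + dr]
                # Pad the row so that index c + colspan - 1 exists.
                target.extend([None] * (c + colspan - len(target)))
                for dc in range(colspan):
                    target[c + dc] = text
            c += colspan
    return grid

# Hypothetical table with a two-row header.
raw = [
    [("Tour", 2, 1), ("Price", 1, 2)],
    [("Single", 1, 1), ("Double", 1, 1)],
    [("T1", 1, 1), ("100", 1, 1), ("180", 1, 1)],
]
expanded = expand_spans(raw)
```

After expansion every row has the same number of columns, which keeps later steps that compare neighbouring cells simple.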
Structure Detection
  Token Type Hierarchy
  Assign Functional Types and Probabilities
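The slide does not reproduce the hierarchy itself, so the following is only a sketch of the idea: each cell token gets the most specific type that matches it, and two tokens can be compared through their closest shared ancestor. The type names, the regular expressions, and the helpers `token_type` and `common_type` are all assumptions; functional types and probabilities are not modelled here.

```python
import re
from typing import List, Optional

# Hypothetical, simplified hierarchy: child type -> parent type.
PARENT = {
    "INTEGER": "NUMERIC",
    "FLOAT": "NUMERIC",
    "NUMERIC": "ALPHANUMERIC",
    "DATE": "ALPHANUMERIC",
    "STRING": "ALPHANUMERIC",
    "ALPHANUMERIC": None,
}

# Patterns are tried in order; the first full match wins.
PATTERNS = [
    ("INTEGER", re.compile(r"[+-]?\d+")),
    ("FLOAT", re.compile(r"[+-]?\d+\.\d+")),
    ("DATE", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("STRING", re.compile(r"[A-Za-z]+")),
]

def token_type(token: str) -> str:
    """Return the most specific type whose pattern matches the whole token."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return name
    return "ALPHANUMERIC"

def ancestors(name: Optional[str]) -> List[str]:
    """Return the type itself followed by its ancestors up to the root."""
    chain = []
    while name is not None:
        chain.append(name)
        name = PARENT[name]
    return chain

def common_type(a: str, b: str) -> str:
    """Return the most specific type that both a and b fall under."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in seen:
            return candidate
    return "ALPHANUMERIC"  # root of the hierarchy, shared by every type
```

For instance, `common_type("INTEGER", "FLOAT")` walks up both chains and meets at `NUMERIC`, while an integer and a string only meet at the root.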
Structure Detection
  Detect Logical Table Orientation
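One plausible way to detect orientation is to check in which direction neighbouring cells look more alike. The sketch below is an assumed heuristic, not the measure used by TARTAR: `coarse_kind` is a deliberately crude stand-in for token types, and the labels `column-wise` and `row-wise` are names chosen for this example.

```python
from typing import Callable, List

def mismatch_rate(lines: List[List[str]], kind: Callable[[str], str]) -> float:
    """Fraction of adjacent cell pairs along each line whose kinds differ."""
    pairs = differing = 0
    for line in lines:
        for left, right in zip(line, line[1:]):
            pairs += 1
            differing += kind(left) != kind(right)
    return differing / pairs if pairs else 0.0

def coarse_kind(cell: str) -> str:
    """Crude stand-in for a token type: number versus text."""
    return "number" if cell.replace(".", "", 1).isdigit() else "text"

def orientation(grid: List[List[str]]) -> str:
    """Pick the direction along which neighbouring cells are more similar."""
    columns = [list(col) for col in zip(*grid)]
    row_rate = mismatch_rate(grid, coarse_kind)
    col_rate = mismatch_rate(columns, coarse_kind)
    return "column-wise" if col_rate <= row_rate else "row-wise"

# Hypothetical sample: values of the same attribute are stacked in columns.
sample = [
    ["City", "Price", "Nights"],
    ["Bled", "120", "2"],
    ["Piran", "95", "3"],
    ["Koper", "80", "1"],
    ["Ptuj", "70", "4"],
]
```

On `sample`, cells differ less often when read down a column than across a row, so the heuristic reports `column-wise`; transposing the table flips the answer.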
![Page 15: TARTAR Information Extraction Transforming Arbitrary Tables into F-Logic Frames with TARTAR Aleksander Pivk, York Sure, Philipp Cimiano, Matjaz Gams, Vladislav](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649d2e5503460f94a0595e/html5/thumbnails/15.jpg)
TARTARInformation Extraction
Structure Detection
  Discover and Level Regions
    Logical Units
FTM Building
  Functional Table Model (FTM)
    Arrange regions into a tree
    Leaf nodes are data
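The tree shape described above can be sketched with a small recursive data structure in which inner nodes hold header labels and leaves hold data values. The `Node` class, the `leaves` helper, and the sample tree are hypothetical and only illustrate the "leaf nodes are data" property.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

def leaves(node: Node) -> List[str]:
    """Collect the labels of all leaf nodes, left to right."""
    if node.is_leaf():
        return [node.label]
    collected: List[str] = []
    for child in node.children:
        collected.extend(leaves(child))
    return collected

# Hypothetical FTM: header regions become inner nodes, data cells become leaves.
ftm = Node("Tour", [
    Node("Code", [Node("T1"), Node("T2")]),
    Node("Price", [
        Node("Single", [Node("100"), Node("110")]),
        Node("Double", [Node("180"), Node("190")]),
    ]),
])
```

Nested headers such as `Price` over `Single` and `Double` simply become deeper paths from the root to the data.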
Semantic Enriching of FTM
  Labeling
    WordNet and GoogleSets
  Map FTM to a frame
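As a rough illustration of the final mapping step, the sketch below renders a labelled set of attributes as an F-Logic-style signature frame. The concept name, attribute names, range types, and the `render_frame` helper are assumptions, and the exact concrete syntax differs between F-Logic dialects.

```python
from typing import Dict

def render_frame(concept: str, attributes: Dict[str, str]) -> str:
    """Render attribute => range pairs as a single F-Logic-style frame."""
    body = "; ".join(f"{name} => {rng}" for name, rng in attributes.items())
    return f"{concept}[{body}]"

# Hypothetical labels, as they might come out of the enrichment step.
frame = render_frame("Tour", {"Code": "ALPHANUMERIC", "Price": "NUMERIC"})
print(frame)
```

The labelling itself (finding a concept name such as `Tour` via WordNet or GoogleSets) is not shown; the sketch only covers serialization once labels and ranges are known.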
Evaluation
  Crawl, extract, filter web tables
    135 tables
    85.4% success rate
    Mostly problems with complex tables
  Compare auto-generated frames with human-generated frames
    14 people transformed 3 tables each
    21 total tables (each done twice)
    Syntactic/Semantic correctness (Strict and Soft)
Results
Inter-annotator agreement
System-annotator agreement
Benefits
  Fully automated knowledge formalization
  Arbitrary tables
  Independent of domain knowledge
  Independent of document type
  Explicit semantics of generated frames
  Query answering over heterogeneous tables