How to compute semantic relationships between entities and facts out of natural texts
Michael Fuchs, Technology Evangelist
ABBYY
Agenda
1. How machines read pixels
2. Documents, words, layout & semantics
3. Syntactic & semantic text parsing
4. Live demo
5. Q&A
How machines read pixels
-> Pixel analysis
-> Find text/image blocks
-> Separate pixels into characters
-> Recognize individual characters
-> Build proper words as editable text
Requirements to make a machine read text:
-> Linguistics: Alphabets & Morphology Dictionaries
-> Math, AI, Statistics, Experience, and…
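The step of separating pixels into characters can be illustrated with a very small sketch. It assumes an already binarized toy bitmap and splits it wherever a column contains no ink. The bitmap, the function names and the blank-column heuristic are illustrative assumptions, not the method of any particular OCR product.

```python
# Toy binarized bitmap (an assumption for illustration):
# '#' is ink, '.' is background. Blank columns separate the glyphs.
BITMAP = [
    "##..#..###",
    "#...#..#.#",
    "##..#..###",
]

def column_has_ink(bitmap, x):
    """True if any row has an ink pixel in column x."""
    return any(row[x] == "#" for row in bitmap)

def split_into_character_spans(bitmap):
    """Return (start, end) column ranges, end exclusive, of consecutive inked columns."""
    width = len(bitmap[0])
    spans = []
    start = None
    for x in range(width):
        inked = column_has_ink(bitmap, x)
        if inked and start is None:
            start = x                  # a new glyph candidate begins
        elif not inked and start is not None:
            spans.append((start, x))   # a blank column closes the glyph
            start = None
    if start is not None:
        spans.append((start, width))   # glyph touching the right edge
    return spans

print(split_into_character_spans(BITMAP))
```

Real text needs more than this heuristic, because touching or broken characters do not split cleanly on blank columns. That is where the linguistics and statistics requirements come in.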
What is needed to make a machine understand the meaning
of words, sentences, texts?
Documents & Words
What is a document?
a) Bag of words?
Statistics can give basic insights
-> No real semantic understanding
b) Words in order?
Layouts generate visual patterns
-> Semantics can be derived from layout
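A short sketch of why a bag of words gives statistics but no real semantic understanding: two sentences with opposite meanings produce exactly the same word counts. The example sentences and the function name are assumptions chosen for illustration.

```python
from collections import Counter

def bag_of_words(text):
    """Count lowercase whitespace-separated tokens, ignoring their order."""
    return Counter(text.lower().split())

a = "the dog bites the man"
b = "the man bites the dog"

same_bag = bag_of_words(a) == bag_of_words(b)   # word statistics are identical
same_order = a.split() == b.split()             # word order differs

print(same_bag, same_order)
```

The counts cannot tell who bites whom. That information lives in the order and structure of the words.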
Documents, Words and Layout
Document with layout
Text document with “simulated” layout
Text with line breaks
Text only
-> Rules can extract data out of (semi-)structured texts and documents
-> Layout helps to identify the semantic meaning of data
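As a minimal sketch of rule-based extraction from semi-structured text, the following uses one regular expression for "label: value" lines. The sample invoice text, the field names and the rule itself are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical semi-structured text: one "label: value" pair per line.
SAMPLE = """Invoice No: 2024-0815
Date: 2024-03-01
Total: 149.90 EUR
"""

# One simple rule: letters and spaces, a colon, then the value up to the line end.
FIELD_RULE = re.compile(r"^(?P<label>[A-Za-z ]+):\s*(?P<value>.+)$", re.MULTILINE)

def extract_fields(text):
    """Return a dict of label -> value for every line matching the rule."""
    return {
        m.group("label").strip(): m.group("value").strip()
        for m in FIELD_RULE.finditer(text)
    }

print(extract_fields(SAMPLE))
print(extract_fields("Mary paid the invoice in March."))
```

The rule works because the line layout carries the meaning of each value. On a plain sentence the same rule finds nothing.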
Text and Structure
Is “plain” natural language text unstructured?
-> yes, at least for almost all IT systems
-> not for humans who can read and speak the language
-> Facts and their relations can’t be reliably detected with “simple” rules
Text, Structure & Translation
Is a word by word translation enough?
-> … well – not really…
-> Semantic understanding of the words and their relationship in sentences is needed!
-> That is true for humans and machines
Text & Structure
Why is natural language text understanding difficult for machines?
-> Languages are not logical, and they are context dependent
-> One word – different variants, e.g. go, went, gone
– different meanings, e.g. run, plant, apple …
– different usage, e.g. as verb, noun, adjective
-> Different words – the same concept, e.g. to buy/sell something
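The points about word variants and different usage can be sketched with a tiny hand-written lexicon that maps surface forms to possible (lemma, part of speech) readings. The lexicon entries and function names are assumptions for illustration. A real morphological dictionary is far larger.

```python
# Hypothetical mini-lexicon: surface form -> list of (lemma, part of speech) readings.
LEXICON = {
    "go": [("go", "VERB")],
    "went": [("go", "VERB")],
    "gone": [("go", "VERB")],
    "run": [("run", "VERB"), ("run", "NOUN")],
    "plant": [("plant", "VERB"), ("plant", "NOUN")],
}

def readings(word):
    """All known readings of a surface form (empty list if unknown)."""
    return LEXICON.get(word.lower(), [])

def lemmas(word):
    """Distinct base forms behind a surface form."""
    return sorted({lemma for lemma, _ in readings(word)})

def is_ambiguous(word):
    """True if the form has more than one reading without context."""
    return len(readings(word)) > 1

print(lemmas("went"), is_ambiguous("plant"), is_ambiguous("gone"))
```

A lookup like this lists the candidates. Choosing between them requires the sentence context.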
Basic Language Structure
-> Morphology = rules for how words are formed and inflected
-> Syntax = rules used to build correct sentences
-> Semantics = the meaning and usage of words
-> Semantic Relations = reflect/organise the meaning and relations of words and sentences
How to get to the insides of a sentence?
Compreno System Architecture
Parser:
-> Morphological analyzer
-> Syntactic and semantic analysis
-> Anaphora resolution
-> Disambiguation
Semantic representation of text
Information Extraction Module:
-> Identification rules
-> Interpretation rules
-> Extraction rules
RDF Graph
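The last stage of the architecture produces an RDF graph, which is a set of subject-predicate-object triples. The sketch below shows how one extracted fact could be written as triples using plain Python tuples. The `ex:` identifiers, the fact structure and the helper names are hypothetical and do not reflect Compreno's actual output format.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

def fact_to_triples(fact_id, fact_type, roles):
    """Turn one extracted fact into triples: a type statement plus one triple per role."""
    triples = [Triple(fact_id, "rdf:type", fact_type)]
    for role, entity in roles.items():
        triples.append(Triple(fact_id, f"ex:{role}", entity))
    return triples

def objects(triples, subject, predicate):
    """Look up every object attached to a subject via a given predicate."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]

# Hypothetical fact extracted from a sentence such as "Mary bought a car."
graph = fact_to_triples("ex:purchase1", "ex:Purchase", {"buyer": "ex:Mary", "goods": "ex:Car"})

for triple in graph:
    print(triple)
print(objects(graph, "ex:purchase1", "ex:buyer"))
```

Because the fact itself is a node, a sentence phrased with "sell" instead of "buy" could map onto the same structure with the roles filled accordingly.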
Morphology Analysis
Sentence Analysis with Semantic Info
How to get the correct semantic meaning of words?
ABBYY’s answer: Universal Semantic Hierarchy
= language independent semantic concepts
ABBYY’s Universal Semantic Hierarchy
Columns: Semantic Meaning, “Vocabulary” EN, “Vocabulary” DE
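The idea of language-independent concepts with per-language vocabularies can be sketched as a small dictionary. The concept names, parent links and word lists below are invented for illustration and are not taken from ABBYY's actual hierarchy.

```python
# Hypothetical mini-hierarchy: concept -> parent concept and per-language vocabulary.
HIERARCHY = {
    "PLANT_ORGANISM": {"parent": "LIVING_THING", "en": ["plant"], "de": ["Pflanze"]},
    "PLANT_FACTORY": {"parent": "BUILDING", "en": ["plant", "factory"], "de": ["Fabrik", "Werk"]},
}

def concepts_for(word, lang):
    """All concepts whose vocabulary in the given language contains the word."""
    return sorted(c for c, entry in HIERARCHY.items() if word in entry[lang])

def words_for(concept, lang):
    """Vocabulary of one concept in the given language."""
    return HIERARCHY[concept][lang]

# "plant" is ambiguous in English; each concept has different German words.
candidates = concepts_for("plant", "en")
print(candidates)
for concept in candidates:
    print(concept, "->", words_for(concept, "de"))
```

Translation then goes from word to concept to word, which is why a word-by-word lookup is not enough: the concept has to be chosen first.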
Handling Lexical Ambiguity
Recovering Omitted Words and Links (Ellipsis)
Recovered Node
Ellipsis
Identifying Pronoun Referents (Anaphora)
Mary saw her students. They were wearing masks. She was surprised. (Mary → her, Mary → she, students → they).
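The links in this example can be reproduced with a deliberately naive sketch: each pronoun is linked to the nearest preceding non-pronoun mention that agrees in number and gender. The hand-annotated features and the agreement heuristic are assumptions for illustration. This is not how Compreno resolves anaphora.

```python
# Mentions in text order with hand-annotated features (an assumption):
# (token, kind, number, gender). Gender None means "unspecified".
MENTIONS = [
    ("Mary", "name", "sg", "f"),
    ("her", "pronoun", "sg", "f"),
    ("students", "noun", "pl", None),
    ("They", "pronoun", "pl", None),
    ("She", "pronoun", "sg", "f"),
]

def agrees(pronoun, candidate):
    """Number must match; gender must match unless one side is unspecified."""
    _, _, p_num, p_gen = pronoun
    _, _, c_num, c_gen = candidate
    if p_num != c_num:
        return False
    return p_gen is None or c_gen is None or p_gen == c_gen

def resolve(mentions):
    """Link each pronoun to the nearest earlier agreeing non-pronoun mention."""
    links = []
    for i, mention in enumerate(mentions):
        if mention[1] != "pronoun":
            continue
        for candidate in reversed(mentions[:i]):
            if candidate[1] != "pronoun" and agrees(mention, candidate):
                links.append((candidate[0], mention[0]))
                break
    return links

print(resolve(MENTIONS))
```

Agreement alone fails as soon as two candidates share the same features, for example two female names in one text. Resolving those cases needs syntactic and semantic analysis.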
From Text to Semantics with Compreno
DEMO
Summary: What is ABBYY Compreno?
● … NLP technology featuring a unique model-based approach that employs universal language models and identifies language structures.
● … combines both syntactic and semantic analysis, as well as machine learning on untagged text corpora.
● … allows the creation of a semantic representation of text
● … able to resolve complex language phenomena:
− lexical ambiguity
− ellipsis (recovering omitted words and links)
− anaphora (identifying pronoun referents)
− coreference
− coordination and more
● … support of English, Russian, German in progress
QUESTIONS?
Thank you for your attention!