
Page 1: Advanced Artificial Intelligence Natural Language Processingmi.cau.ac.kr/teaching/lecture_aai/NLP.pdf · 2019-05-06 · Rule-based vs Statistical NLP Advanced Artificial Intelligence

Advanced Artificial Intelligence

Natural Language Processing

Chung-Ang University, Van Dat Tuong

Good afternoon everyone. Today I am delighted to be here to talk about Natural Language Processing (NLP), a subfield of Artificial Intelligence.


Contents

• Introduction

• History

• Rule-based vs Statistical NLP

• Major Evaluations and tasks

• How NLP works

2Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong

My talk is divided into five main parts: Introduction, History, Rule-based vs Statistical NLP, Major Evaluations and Tasks, and How NLP Works.


Introduction

People communicate in many different ways: through speaking and listening, making gestures, using specialized hand signals (such as when driving or directing traffic), using sign languages for the deaf, or through various forms of text.


By text we mean words that are written or printed on a flat surface (paper, card, street signs and so on) or displayed on a screen or other electronic device in order to be read by their intended recipients (or by whoever happens to be passing by).


This presentation focuses on the methods by which computer systems can analyze and interpret texts, and I will assume for convenience that these texts are presented in an electronic format (documents, images, etc.). This leads to the term Natural Language Processing.


NLP is defined as a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.


Let's use an example to show just how powerful NLP is in a practical situation. When you type into Google Search, as many of us do every day, you see word suggestions based on what you have typed and what you are currently typing. That's NLP in action.


As a human, you may speak and write in English, Korean, or Chinese. But a computer’s native language, known as machine code or machine language, is largely incomprehensible to most people. At the device’s lowest levels, communication occurs not with words but through millions of 0s and 1s (binary form) that produce logical actions.


The real reason why NLP is hard

The process of reading and understanding language is more complex than it seems at first glance. Many things go into truly understanding what a piece of text means in the real world. For example, what do you think the above piece of text means?


To a human it’s probably quite obvious what this sentence means. We know Steph Curry is a basketball player; or even if not, we know that he plays on some kind of team, probably a sports team. When we see “on fire” and “destroyed”, we know that it means Steph Curry played really well last night and beat the other team.


On the other hand, a computer tends to take things a bit too literally. Viewing things literally, like a computer, we would see “Steph Curry” and, based on the capitalization, assume that’s a person, place, or otherwise important thing, which is great! But then we see that Steph Curry “was on fire”. A computer might tell you that someone literally lit Steph Curry on fire yesterday! Yikes. After that, the computer might say that he has physically destroyed the other team, and that they no longer exist according to this computer. Great.


Fortunately, not all is grim! Thanks to machine learning, computer systems can actually do some really clever things to quickly extract and understand information from natural language!


History

The history of NLP generally started in the 1950s, although work can be found from earlier periods. In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence", which proposed what is now called the Turing test as a criterion of intelligence.


In 1954, the Georgetown experiment involved fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three or five years machine translation would be a solved problem; however, real progress was much slower.


In the 1960s, some notably successful NLP systems were developed, such as SHRDLU, a natural language system working in restricted "blocks worlds" with restricted vocabularies.


Another was ELIZA, a simulation of a Rogerian psychotherapist, which could sometimes provide a startlingly human-like interaction. For example, ELIZA might respond to the statement "I want to be happy" with a generic reply such as "Why do you want to be happy?".
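
The kind of reply described above can be approximated with simple pattern matching. The following is only a minimal sketch in Python: the rule table, the `respond` function and the fallback sentence are hypothetical and are not taken from the original ELIZA script.

```python
import re

# Hypothetical rule table: (pattern, response template).
# The real ELIZA script was far larger; these two rules are illustrative only.
RULES = [
    (re.compile(r"^i want (.+?)[.!]?$", re.IGNORECASE), "Why do you want {0}?"),
    (re.compile(r"^i am (.+?)[.!]?$", re.IGNORECASE), "How long have you been {0}?"),
]


def respond(statement: str) -> str:
    """Return a canned reply by reflecting part of the statement back."""
    text = statement.strip()
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            return template.format(match.group(1))
    return "Please tell me more."  # fallback when no rule matches


print(respond("I want to be happy"))
```

No understanding is involved here: the program only copies the user's own words into a fixed template, which is why the interaction can feel human-like while remaining very shallow.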


During the 1970s, many programmers began to write "conceptual ontologies", which structured real-world information into computer-understandable data. During this time, many chatterbots were written, including PARRY, Racter, and Jabberwacky, conceptually similar to the earlier ELIZA.


Starting in the late 1980s, there was a revolution in NLP with the introduction of machine learning algorithms. Some of the earliest-used machine learning algorithms, such as decision trees, produced systems of hard if-then rules, similar to existing hand-written rules.


However, part-of-speech tagging introduced the use of hidden Markov models (HMM) to NLP, and increasingly, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to the features making up the input data.

Such models are generally more robust when given unfamiliar input, especially input that contains errors (as is very common for real-world data), and produce more reliable results when integrated into a larger system comprising multiple subtasks.


Recent research has increasingly focused on unsupervised and semi-supervised learning algorithms, which are able to learn from non-annotated data or from combinations of annotated and non-annotated data. Typically, compared with supervised learning, results are less accurate for a given amount of input data. However, there is an enormous amount of non-annotated data available, which can often make up for the inferior results.


In the 2010s, feature learning and deep neural network-style machine learning methods became widespread in NLP, due in part to a flurry of results showing that such techniques can achieve state-of-the-art results in many NLP tasks.


Popular techniques include the use of word embeddings to capture semantic properties of words, and an increase in end-to-end learning of a higher-level task (e.g. question answering) instead of relying on a pipeline of separate intermediate tasks (e.g. part-of-speech tagging and dependency parsing).


In some areas, this shift has entailed substantial changes in how NLP systems are designed, such that deep neural network-based approaches may be viewed as a new paradigm distinct from statistical NLP. For instance, the term neural machine translation (NMT) emphasizes the fact that deep learning-based approaches to machine translation directly learn sequence-to-sequence transformations, obviating the need for intermediate steps such as word alignment and language modeling that were used in statistical machine translation (SMT).


Rule-based vs Statistical NLP


In the early days, many language-processing systems were designed by hand-coding a set of rules, e.g. by writing grammars or devising heuristic rules for stemming. However, this is rarely robust to natural language variation. Since the so-called "statistical revolution" in the late 1980s and mid 1990s, the machine-learning paradigm calls instead for using statistical inference to automatically learn such rules through the analysis of large corpora of typical real-world examples (a set of documents, possibly with human or computer annotations).


Systems based on machine-learning algorithms have many advantages over hand-produced rules:

• The learning procedures used during machine learning automatically focus on the most common cases, whereas when writing rules by hand it is often not at all obvious where the effort should be directed.

• Automatic learning procedures can make use of statistical-inference algorithms to produce models that are robust to unfamiliar input and to erroneous input.


• Systems based on automatically learned rules can be made more accurate simply by supplying more input data, which requires only a corresponding increase in the number of man-hours worked, generally without significant increases in the complexity of the annotation process.


Major evaluations and tasks

The next slides cover some of the most commonly researched tasks in NLP. Note that some of these tasks have direct real-world applications, while others more commonly serve as subtasks that are used to aid in solving larger tasks.

Though NLP tasks are closely intertwined, they are frequently subdivided into categories for convenience:


1. Syntax

2. Semantics

3. Discourse

4. Speech


Syntax

1. Grammar induction

Grammar induction, also known as grammar inference, is the machine-learning process of learning a "formal grammar" that describes a language’s syntax from a set of observations, thus constructing a model that accounts for the characteristics of the observed objects.


More generally, grammar inference is a branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs.


2. Lemmatization

In computational linguistics, lemmatization is the algorithmic process of determining the lemma of a word based on its intended meaning. It depends on correctly identifying the intended part of speech and meaning of a word in a sentence, as well as within the larger context surrounding that sentence, such as neighboring sentences or even an entire document.


3. Stemming

In linguistic morphology and information retrieval, stemming is the process of reducing inflected words to their word stem, base or root form, generally a written word form. It is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.


Lemmatization is closely related to stemming. The difference is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech. However, stemmers are typically easier to implement and run faster. The reduced accuracy may not matter for some applications.
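
The contrast between the two can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the suffix list, the tiny `LEMMAS` table and the function names `stem` and `lemmatize` are invented for this example and do not correspond to any particular stemming algorithm or dictionary.

```python
# Toy suffix list and lemma table; both are illustrative assumptions,
# not a real stemming algorithm or a real dictionary.
SUFFIXES = ("ing", "ed", "es", "s")

LEMMAS = {
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
    ("better", "ADJ"): "good",
}


def stem(word: str) -> str:
    """Strip the first matching suffix, ignoring context entirely."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str, pos: str) -> str:
    """Look up the lemma for a (word, part-of-speech) pair."""
    return LEMMAS.get((word, pos), word)


print(stem("meeting"))               # same answer whatever the context
print(lemmatize("meeting", "NOUN"))  # depends on the part of speech
print(lemmatize("meeting", "VERB"))
```

The stemmer needs only the word itself, which makes it simple and fast, while the lemmatizer needs the part of speech as an extra input, which in practice has to come from analyzing the surrounding sentence.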


4. Morphological segmentation

Morphological segmentation separates words into individual morphemes and identifies the class of the morphemes. The difficulty of this task depends greatly on the complexity of the morphology (i.e. the structure of words) of the language being considered.


English has fairly simple morphology, especially inflectional morphology, and thus it is often possible to ignore this task entirely and simply model all possible forms of a word (e.g. "open, opens, opened, opening") as separate words. This approach does not work for languages such as Turkish or Meitei, where each dictionary entry has thousands of possible word forms.


5. Part-of-Speech tagging

Part-of-speech tagging is the task of labelling each word in a sentence with the appropriate part of speech: given a sentence, determine the part of speech for each word. Languages with little inflectional morphology, such as English, are particularly prone to ambiguity. Chinese is also prone to ambiguity because it is a tonal language when spoken.


Part-of-speech tagging is harder than just having a list of words and their parts of speech, because some words can represent more than one part of speech at different times, and because some parts of speech are complex or unspoken.

Many words, especially common ones, can serve as multiple parts of speech:

• "book" can be a noun ("the book on the table") or a verb ("to book a flight");

• "set" can be a noun, verb or adjective; and

• "out" can be any of at least five different parts of speech.
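
To make the "book" example concrete, here is a minimal sketch of a tagger that uses the previous tag to choose between readings. The mini-lexicon, the tag names and the two context rules are assumptions made for illustration; practical taggers learn such preferences statistically rather than from hand-written rules.

```python
# Hypothetical mini-lexicon: each word maps to the tags it may take.
LEXICON = {
    "the": ["DET"], "a": ["DET"], "to": ["PART"],
    "book": ["NOUN", "VERB"], "flight": ["NOUN"],
    "on": ["ADP"], "table": ["NOUN"],
}


def tag(tokens):
    """Tag tokens, using the previous tag to resolve noun/verb ambiguity."""
    tags = []
    for token in tokens:
        options = LEXICON.get(token.lower(), ["NOUN"])  # unknown words default to NOUN
        if len(options) == 1:
            choice = options[0]
        elif tags and tags[-1] == "DET":
            choice = "NOUN"   # after a determiner, prefer the noun reading
        elif tags and tags[-1] == "PART":
            choice = "VERB"   # after infinitival "to", prefer the verb reading
        else:
            choice = options[0]
        tags.append(choice)
    return list(zip(tokens, tags))


print(tag("the book on the table".split()))
print(tag("to book a flight".split()))
```

A plain word list would give "book" the same tag in both phrases; even this crude use of context separates the two readings.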


6. Parsing

Parsing determines the parse tree (grammatical analysis) of a given sentence. The grammar for natural languages is ambiguous and typical sentences have multiple possible analyses. In fact, perhaps surprisingly, for a typical sentence there may be thousands of potential parses (most of which will seem completely nonsensical to a human).
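
One rough way to see why the number of candidate parses grows so quickly is to count the unlabeled binary bracketings of a sentence, which is given by the Catalan numbers. The sketch below is an illustration only: a real grammar would permit just some of these structures and would also attach category labels, so the figures are not actual parse counts for any grammar. The function name `binary_bracketings` is made up for this example.

```python
from math import comb


def binary_bracketings(n_words: int) -> int:
    """Number of distinct binary trees over n_words leaves (Catalan number C_{n-1})."""
    k = n_words - 1
    return comb(2 * k, k) // (k + 1)


for n in (3, 5, 10):
    print(n, binary_bracketings(n))
```

A ten-word sentence already admits 4862 binary bracketings, so "thousands of potential parses" is reached with quite ordinary sentence lengths.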


There are two primary types of parsing: Dependency Parsing and Constituency Parsing. Dependency Parsing focuses on the relationships between words in a sentence (marking things like Primary Objects and predicates), whereas Constituency Parsing focuses on building out the Parse Tree using a Probabilistic Context-Free Grammar.


7. Sentence breaking

Given a chunk of text, find the sentence boundaries. Sentence boundaries are often marked by periods or other punctuation marks, but these same characters can serve other purposes (e.g. marking abbreviations). Accurate sentence breaking therefore requires analysis of the local context around periods and other punctuation.
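
As a minimal sketch of this idea, the splitter below treats a token ending in ".", "!" or "?" as a sentence boundary unless the token is a known abbreviation. The abbreviation list and the function name `split_sentences` are assumptions for illustration; a practical system would need a much richer notion of local context.

```python
# Illustrative abbreviation list; a real system would need a much larger one.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}


def split_sentences(text: str) -> list[str]:
    """Split after a token ending in . ! or ? unless it is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends_sentence = token[-1] in ".!?" and token.lower() not in ABBREVIATIONS
        if ends_sentence:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences


text = "Dr. Kim arrived late. She gave a talk on NLP, e.g. parsing. Any questions?"
for sentence in split_sentences(text):
    print(sentence)
```

Splitting on every period would wrongly break after "Dr." and "e.g."; the abbreviation check is the simplest possible form of looking at local context.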


8. Word Segmentation

Word segmentation separates a chunk of continuous text into separate words. For a language like English, this is fairly trivial, since words are usually separated by spaces. However, some written languages such as Chinese and Japanese do not mark word boundaries with spaces, so segmenting them requires knowledge of the vocabulary and morphology of words in the language.
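
One simple way to use vocabulary knowledge for this is greedy forward maximum matching: at each position, take the longest dictionary word that fits. The sketch below assumes a toy six-entry vocabulary and a hypothetical `segment` function; it illustrates the role of the vocabulary and is not a description of how production segmenters work.

```python
# Toy vocabulary (an assumption for illustration).
VOCABULARY = {"我", "喜欢", "自然", "语言", "处理", "自然语言处理"}
MAX_WORD_LEN = max(len(word) for word in VOCABULARY)


def segment(text: str) -> list[str]:
    """Greedy forward maximum matching: take the longest known word at each position."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i : i + length]
            if candidate in VOCABULARY or length == 1:
                words.append(candidate)  # unknown single characters pass through
                i += length
                break
    return words


print(segment("我喜欢自然语言处理"))  # "I like natural language processing"
```

Because the method is greedy, its output depends entirely on the vocabulary: with "自然语言处理" removed from the set, the same text would be split into the shorter words "自然", "语言" and "处理" instead.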

Sentence Segmentation

Image segmentation

Topic segmentation


9. Terminology extraction

Terminology extraction (also known as term extraction, glossary extraction, term recognition, or terminology mining) is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus.


Semantics

1. Lexical Semantics

Lexical semantics is a subfield of linguistic semantics. The units of analysis in lexical semantics are lexical units, which include not only words but also sub-words or sub-units such as affixes, and even compound words and phrases. It looks at how the meaning of the lexical units correlates with the structure of the language or syntax.

For example, the word "bass" has a different meaning in each of these sentences:

• He plays bass guitar.

• That bass was delicious!


The study of lexical semantics looks at the classification and decomposition of lexical items, the differences and similarities in lexical semantic structure cross-linguistically, and the relationship of lexical meaning to sentence meaning and syntax.


2. Machine translation

Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of software to automatically translate text or speech from one human language to another.


This is one of the most difficult problems, and is a member of a class of problems requiring all of the different types of knowledge that humans possess (grammar, semantics, facts about the real world, etc.) in order to solve properly.


The translation process includes:

• Decoding the meaning of the source text, and

• Re-encoding this meaning in the target language.


3. Named entity recognition (NER)

Given a stream of text, determine which items in the text map to proper names, such as people or places, and what the type of each such name is (e.g. person, location). Note that, although capitalization can aid in recognizing named entities in languages such as English, this information cannot aid in determining the type of entity, and in any case is often inaccurate or insufficient. For example, the first word of a sentence is also capitalized, and named entities often span several words, only some of which are capitalized. Furthermore, many other languages in non-Western scripts, such as Chinese, do not have any capitalization at all.


Even languages with capitalization may not use it consistently to distinguish names. For example, German capitalizes all nouns, regardless of whether they are names, and French and Spanish do not capitalize names that serve as adjectives.
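
The limits of capitalization as a cue are easy to demonstrate. The sketch below collects maximal runs of capitalized words as naive entity candidates; the regular expression and the function name `capitalized_spans` are assumptions for illustration and are deliberately too simple to be a usable recognizer.

```python
import re


def capitalized_spans(sentence: str) -> list[str]:
    """Return maximal runs of capitalized words as naive entity candidates."""
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", sentence)


print(capitalized_spans("Yesterday Steph Curry played in San Francisco."))
print(capitalized_spans("She studied at the University of California"))
```

In the first sentence the capitalized sentence-initial word "Yesterday" is swallowed into the person's name, and in the second the entity "University of California" is broken apart because "of" is not capitalized. In neither case does the heuristic say anything about whether a candidate is a person or a location.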


4. Natural language generation

Natural language generation is an NLP task that focuses on generating natural language from structured data such as a knowledge base or a logical form. It can be used to produce long documents that summarize or explain the contents of computer databases, for example news reports.


It can also be used to generate short blurbs of text in interactive conversations (a chatterbot) which might even be read out loud by a text-to-speech system.
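
The simplest form of generation from structured data is template filling. In the sketch below, the `record` dictionary, its field names and the `describe_weather` function are all hypothetical, standing in for a row from some database; real generation systems go well beyond fixed templates.

```python
# Hypothetical structured record, e.g. one row from a weather database.
record = {"city": "Seoul", "high_c": 24, "low_c": 15, "condition": "sunny"}


def describe_weather(row: dict) -> str:
    """Fill a fixed sentence template from structured fields."""
    return (
        f"Today in {row['city']} it will be {row['condition']}, "
        f"with a high of {row['high_c']}°C and a low of {row['low_c']}°C."
    )


print(describe_weather(record))
```

The same function produces a different sentence for every row in the table, which is the basic idea behind automatically generated reports.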


Natural language understanding converts chunks of text into more formal representations such as first-order logic

structures that are easier for computer programs to manipulate. It involves the identification of the intended

semantic from the multiple possible semantics which can be derived from a natural language expression

5. Natural language understanding

52Advanced Artificial Intelligence / Chung-Ang University / Van Dat Tuong

Page 53: Advanced Artificial Intelligence Natural Language Processingmi.cau.ac.kr/teaching/lecture_aai/NLP.pdf · 2019-05-06 · Rule-based vs Statistical NLP Advanced Artificial Intelligence

Semantics

which usually takes the form of organized notations of natural language concepts. Introduction and creation of

language metamodel and ontology are efficient, however, empirical. An explicit formalization of natural language

semanticswithout confusions with implicit assumptions such as closed-world assumption (CWA) vs.

5. Natural language understanding


Semantics

the open-world assumption (OWA), or subjective Yes/No vs. objective True/False, is expected for the construction of a basis for semantics formalization.

5. Natural language understanding


Semantics

Given an image representing printed text, determine the corresponding text.

6. Optical character recognition (OCR)


Semantics

Question answering is a computer science discipline within the fields of information retrieval and NLP. It is concerned with building systems that automatically answer, or even ask, questions in natural human language.

7. Question answering


An important feature of virtual assistants (VAs):

❑ Apple Siri

❑ Cortana (MS)

❑ Google Assistant

❑ Samsung Bixby

❑ Jarvis (FB)


Semantics

Given a human-language question, determine its answer. Typical questions have a specific right answer (such as

"What is the capital of Canada?"), but sometimes open-ended questions are also considered (such as "What is the

meaning of life?"). Recent works have looked at even more complex questions.

7. Question answering


Semantics

Given two text fragments, determine if one being true entails the other, entails the other's negation, or allows the

other to be either true or false.

8. Recognizing textual entailment


Semantics

Given a chunk of text, identify the relationships among named entities (e.g. the Roman Empire has the capital Rome; Roman roads have a length of 400,000 km).

9. Relationship extraction
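As a rough illustration of relationship extraction, the sketch below pulls (subject, relation, object) triples out of text using two hand-written surface patterns. The patterns, the relation labels `has_capital` and `has_length`, and the function name `extract_relations` are all assumptions made for this example; practical systems usually learn such patterns from data rather than hard-coding them.

```python
import re

# Hypothetical surface patterns: each regex is paired with a relation label.
PATTERNS = [
    (re.compile(r"(?P<subj>[A-Z][\w ]*?) has the capital (?P<obj>[A-Z]\w*)"),
     "has_capital"),
    (re.compile(r"(?P<subj>[A-Z][\w ]*?) (?:has|have) (?:a )?length of (?P<obj>[\d.,]+ ?km)"),
     "has_length"),
]

def extract_relations(text):
    """Return (subject, relation, object) triples matched by the patterns."""
    triples = []
    for pattern, label in PATTERNS:
        for match in pattern.finditer(text):
            triples.append((match.group("subj").strip(), label, match.group("obj")))
    return triples

text = "Roman Empire has the capital Rome. Roman roads have a length of 400,000 km."
for triple in extract_relations(text):
    print(triple)
```

Pattern matching like this is brittle: any rewording of the sentence defeats it, which is one reason statistical approaches are preferred for this task.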


Semantics

Sentiment analysis extracts subjective information, usually from a set of documents, often using online reviews to determine polarity about specific objects. It is especially useful for identifying trends of public opinion in social media, for the purpose of marketing.

10. Sentiment analysis
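The idea of determining polarity from reviews can be sketched with a simple word lexicon. The word lists and the function name `polarity` below are invented for illustration; this is a minimal counting baseline, not how production systems work.

```python
# Tiny hand-made polarity lexicon (an assumption for illustration only).
POSITIVE = {"good", "great", "excellent", "love", "fast"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}

def polarity(review):
    """Count +1 per positive word and -1 per negative word, then label."""
    words = [w.strip(".,!?").lower() for w in review.split()]
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("Great phone, I love the fast camera!"))
print(polarity("Terrible battery and slow screen."))
```

A lexicon count ignores negation ("not good") and sarcasm, so it is best read as a starting point.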


Semantics

Given a chunk of text, separate it into segments, each of which is devoted to a topic, and identify the topic of each segment.

11. Topic segmentation and recognition


Semantics

Many words have more than one meaning; we have to select the meaning which makes the most sense in the

context. For this problem, we are typically given a list of words and associated word senses, e.g. from a dictionary

or from an online resource such as WordNet.

12. Word sense disambiguation
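One classic way to choose among listed senses is to compare each sense's dictionary gloss with the surrounding context (a simplified Lesk-style overlap). The sketch below assumes a tiny hand-written sense inventory, `SENSES`, standing in for a real resource such as WordNet; the glosses, the stopword list, and the function names are hypothetical.

```python
# Hypothetical mini sense inventory; a real system might take glosses from WordNet.
SENSES = {
    "bank": {
        "financial institution": "an institution that accepts deposits and lends money",
        "river side": "sloping land beside a river or lake",
    }
}
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "in", "on", "i", "my", "at"}

def tokens(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    return {w.strip(".,!?").lower() for w in text.split()} - STOPWORDS

def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most words with the sentence."""
    context = tokens(sentence)
    glosses = SENSES[word]
    return max(glosses, key=lambda sense: len(context & tokens(glosses[sense])))

print(disambiguate("bank", "I went to the bank to deposit money"))
print(disambiguate("bank", "We sat on the bank of the river"))
```

When no gloss overlaps the context, `max` simply returns the first listed sense, which mirrors the common "most frequent sense" fallback only if the inventory is ordered that way.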


Discourse

Automatic summarization is a part of machine learning and data mining. The main idea of summarization is to find a

subset of data which contains the information of the entire data set. Such techniques are widely used in industry

today.

1. Automatic Summarization


Discourse

In detail, it is the process of shortening a text document with software in order to create a summary containing the major points of the original document. It produces a readable summary of a chunk of text and is often used to provide summaries of text of a known type, such as articles in the financial section of a newspaper.

1. Automatic Summarization
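Selecting a subset of the text that carries most of its information can be sketched as extractive summarization: score each sentence by how frequent its words are in the whole document, then keep the top-scoring sentences. The scoring rule, the stopword list, and the function name `summarize` are assumptions for this example, one simple baseline among many.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it", "on", "for", "by"}

def content_words(sentence):
    """Lowercased words of a sentence, without stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))
    # A sentence's score is the sum of the document-wide counts of its words.
    scores = [sum(freq[w] for w in content_words(s)) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return " ".join(sentences[i] for i in sorted(top))

article = ("Stocks rose sharply on Monday. "
           "Technology stocks led the gains as stocks in chips rose. "
           "The weather was mild.")
print(summarize(article))
```

Summing raw counts favors long sentences; dividing by sentence length is a common adjustment.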


Discourse

Given a sentence or larger chunk of text, determine which words refer to the same objects. Anaphora

resolution is a specific example of this task, and is specifically concerned with matching up pronouns with the

nouns or names to which they refer.

2. Coreference resolution


Discourse

The more general task also includes identifying "bridging relationships" involving referring expressions. For example, in the sentence "He entered John's house through the front door", "the front door" is a referring expression, and the bridging relationship to be identified is the fact that the door being referred to is the front door of John's house.

2. Coreference resolution


Discourse

Discourse analysis is a general term for a number of approaches to analyzing written, vocal, or sign language use, or any significant semiotic event. It includes a number of related tasks. One task is identifying the discourse structure of connected text, i.e. the nature of the discourse relationships between sentences (e.g. elaboration, explanation,

3. Discourse analysis


Discourse

contrast). Another possible task is recognizing and classifying the speech acts in a chunk of text (e.g. yes-no question, content question, statement, assertion, etc.). In other words, discourse analysis studies the way sentences and speech go together to make texts and interactions, and how those texts and interactions fit into our social world.

3. Discourse analysis


Speech

Given a sound clip of a person or people speaking, determine the textual representation of the speech. This is the opposite of text-to-speech and is an extremely difficult problem. In natural speech there are hardly any pauses between successive words, and thus "speech segmentation" is a necessary subtask of speech recognition.

1. Speech recognition


Speech

Note also that in most spoken languages, the sounds representing successive letters blend into each other in a

process termed co-articulation, so the conversion of the analog signal to discrete characters can be a very difficult

process.

1. Speech recognition


Speech

The quality of a speech recognition system is assessed according to two factors: accuracy (the error rate in converting spoken words to digital data) and speed (how well the software can keep up with a human speaker).

1. Speech recognition
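The accuracy factor is commonly reported as a word error rate (WER): the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The slides do not name a specific metric, so treating "error rate" as WER is an assumption, and the function name `word_error_rate` is illustrative. The sketch also assumes a non-empty reference.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,                 # deletion
                             dist[i][j - 1] + 1,                 # insertion
                             dist[i - 1][j - 1] + substitution)  # match or substitution
    return dist[len(ref)][len(hyp)] / len(ref)

# One substitution (sat -> sit) and one deletion (the) over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```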


Speech

Speech segmentation is the process of identifying the boundaries between words, syllables, or phonemes in spoken natural languages. As mentioned above, it is a subtask of speech recognition and is typically grouped with it.

2. Speech segmentation


Speech

Given a text, transform its units and produce a spoken representation. A text-to-speech (TTS) system converts normal language text into speech, and can be used to aid the visually impaired.

3. Text-to-speech


Speech

A text-to-speech system (or "engine") is composed of two parts: front-end and back-end. The front-end has two

major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of

written-out words. Then, it assigns phonetic transcriptions to each word, and divides and marks the text into

prosodic units, like phrases, clauses, and sentences.

3. Text-to-speech
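The first front-end task, turning numbers and abbreviations into written-out words, can be sketched as a small text normalizer. The lookup tables and function names below are hypothetical, and the sketch only handles integers from 0 to 99 that stand alone as tokens; real front-ends rely on much larger, context-aware rules.

```python
import re

# Hypothetical lookup tables, kept deliberately small.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "km": "kilometers"}
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_to_words(n):
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")

def normalize(text):
    """Expand known abbreviations and one- or two-digit numbers into words."""
    out = []
    for token in text.split():
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"\d{1,2}", token):
            out.append(number_to_words(int(token)))
        else:
            out.append(token)
    return " ".join(out)

print(normalize("Dr. Kim lives 42 km away"))
```

Abbreviations such as "St." are ambiguous ("Street" or "Saint"), which is why this step needs context in practice.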


Speech

The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound.

In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations),

which is then imposed on the output speech.

3. Text-to-speech


So, How does NLP work?

As mentioned above, NLP is a form of artificial intelligence that analyzes human language. It takes many forms, but at its core, the technology helps machines understand, and even communicate with, humans.


So, How does NLP work?

Understanding NLP isn't the easiest thing. It's a highly advanced form of AI that has only recently become viable. This means that we are still learning about NLP, and that it is difficult to grasp.


So, How does NLP work?

[Figure: diagram with seven numbered steps (1 to 7); step labels not recoverable]

So, How does NLP work?

What follows is the easiest way to understand how NLP works. The first step in NLP depends on the specific application of the system.


So, How does NLP work?

Voice-based systems like Alexa or Google Assistant first need to convert your spoken words into text. This is often done using Hidden Markov Models (HMMs). An HMM uses a statistical model to determine what you have said and convert it into text usable by the NLP system.
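How an HMM "determines what you've said" can be illustrated with Viterbi decoding, the standard algorithm for finding the most probable sequence of hidden states (here, phonemes) given a sequence of observations (here, made-up acoustic symbols). All probabilities, the phoneme set, and the symbols "x", "y", "z" are invented for this toy example; real recognizers work on continuous acoustic features and far larger models.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s][obs], prev)
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Toy model (all numbers are assumptions): phonemes of the word "cat".
states = ["k", "ae", "t"]
start_p = {"k": 0.8, "ae": 0.1, "t": 0.1}
trans_p = {
    "k": {"k": 0.3, "ae": 0.6, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.3, "t": 0.6},
    "t": {"k": 0.1, "ae": 0.1, "t": 0.8},
}
emit_p = {
    "k": {"x": 0.7, "y": 0.2, "z": 0.1},
    "ae": {"x": 0.1, "y": 0.8, "z": 0.1},
    "t": {"x": 0.2, "y": 0.1, "z": 0.7},
}
print(viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p))
```

Multiplying many small probabilities underflows on long inputs, so practical implementations usually sum log-probabilities instead.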


So, How does NLP work?

Put simply, the HMM processes 10- to 20-millisecond clips of your speech and looks for phonemes (the smallest units of speech) to compare with pre-recorded speech. The next step is the actual understanding of the language and context.
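Cutting speech into short clips, as described above, can be sketched as framing: slicing the sample sequence into fixed-length windows. The function name `frame_signal`, the 16 kHz sample rate, and the non-overlapping layout are assumptions for this example; many recognizers use overlapping windows instead.

```python
def frame_signal(samples, sample_rate, frame_ms=20):
    """Split a list of audio samples into consecutive non-overlapping frames."""
    frame_len = sample_rate * frame_ms // 1000  # samples per frame
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

# One second of placeholder "audio" at an assumed 16 kHz sample rate.
one_second = list(range(16000))
frames = frame_signal(one_second, sample_rate=16000)
print(len(frames), len(frames[0]))
```

Any trailing samples that do not fill a whole frame are dropped in this sketch.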


So, How does NLP work?

Each NLP system uses slightly different techniques, but on the whole, they're fairly similar. The systems try to label each word with its part of speech (noun, verb, etc.). This happens through a series of coded grammar rules that rely on algorithms incorporating statistical machine learning to help determine the context of what you said.
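The mix of coded rules and statistics in part-of-speech labeling can be sketched with a baseline tagger: it learns each word's most frequent tag from labeled examples and falls back to hand-coded suffix rules for unseen words. The training pairs, tag names, rules, and function names below are all hypothetical and far smaller than anything usable in practice.

```python
from collections import Counter, defaultdict

# Tiny hand-labeled corpus (hypothetical) of (word, tag) pairs.
TRAINING = [
    ("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
    ("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB"),
    ("a", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
]

def train(pairs):
    """Statistical part: learn each word's most frequent tag."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(sentence, model):
    """Tag known words from the model; use suffix rules for unknown words."""
    result = []
    for word in sentence.lower().split():
        if word in model:
            result.append((word, model[word]))
        elif word.endswith("ly"):
            result.append((word, "ADV"))   # hand-coded rule: -ly suggests an adverb
        elif word.endswith("s"):
            result.append((word, "VERB"))  # crude rule: treat -s as a verb ending
        else:
            result.append((word, "NOUN"))  # default guess
    return result

model = train(TRAINING)
print(tag("The cat barks loudly", model))
```

The suffix rules are deliberately crude (plural nouns also end in "s"); stronger taggers use the neighboring words as context rather than each word in isolation.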


So, How does NLP work?

If we're not talking about speech-to-text NLP, the system just skips the first step and moves directly into analyzing

the words using the algorithms and grammar rules. The end result is the ability to categorize what is said in many

different ways. Depending on the underlying focus of the NLP software, the results get used in different ways.


Future vision

Virtual assistants are one important vision for the future. They will become better at understanding and

responding to complex and long-form natural language requests, which use conversational language, in real time.

These assistants will be able to converse more like humans, take notes during dictation, analyze complex requests

and execute tasks in a single context, suggest important improvements to business documents, and more.

Human-like virtual assistants


THANK YOU FOR

LISTENING!!
