A Primer on Natural Language Processing

Mohammad Taher Pilehvar

TeIAS Summer School on Data Science

26 August 2019

Artificial Intelligence

Design algorithms that make computers behave intelligently

But, what is intelligent behavior?

Image from threatpost.com

Artificial Intelligence

Scenario 1: Vision

To a non-intelligent computer, photos are nothing but sets of colored pixels

Artificial Intelligence

Scenario 1: Vision (face detection/recognition)

Artificial Intelligence

Scenario 1: Vision (autonomous cars)

Artificial Intelligence

Scenario 2: Motion/Manipulation (Robotics)

Artificial Intelligence

Scenario 3: Learning/Planning

Artificial Intelligence

Scenario 4: Natural language!

??!!

What’s the capital of Iran?

Artificial Intelligence

Khatam

01001011 01101000 01100001 01110100 01100001 01101101

K h a t a m
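To make the point concrete: before any language processing happens, a word is just a sequence of bytes. This short Python sketch prints the 8-bit ASCII code of each character in "Khatam", which is exactly the bit pattern a "non-intelligent" computer stores.

```python
def to_bits(text: str) -> str:
    """Return the 8-bit binary code of each character, space-separated."""
    # ASCII encoding maps each character of plain English text to one byte
    return " ".join(f"{byte:08b}" for byte in text.encode("ascii"))

print(to_bits("Khatam"))
# 01001011 01101000 01100001 01110100 01100001 01101101
```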

Scenario 4: Natural language!

Artificial Intelligence

Scenario 4: Natural language!

Make computers understand and generate natural language

Natural Language Processing (Computational Linguistics)

[Figure: diagram relating NLP, ML and AI]

Natural Language Processing (Computational Linguistics)

Natural Language Understanding

Natural Language Generation

Difficulties of Language Understanding

Difficulty of Language Understanding

Common sense knowledge

• The trophy would not fit in the brown suitcase because it is too big.

• What is too big?

• The town councilors refused to give the demonstrators a permit because they feared (advocated) violence.

• Who feared (advocated) violence?

Difficulty of Language Understanding

Context

“It is raining outside. This is the reason why I won't go out”.

• What is the reason to not go outside?

• This?

Coreference resolution:

• I did not vote for Donald Trump because I think he is a liar!

Anaphora resolution:

• I bought a new ThinkPad; I have an old MacBook. I am going to give it away!

Difficulty of Language Understanding

Slang, idioms and sarcasm

• Those shoes are goat; She is busted; He is rather a frenemy

• In a nutshell; piece of cake; think outside the box; bad apple; get the picture

• That’s just what I needed today! (When something bad happens)

Difficulty of Language Understanding

Ambiguity

Illustration from IBM Watson

Difficulty of Language Understanding

Amazon fire!

Difficulty of Language Understanding

Metonymic Ambiguity

• London voted to stay in the EU

• The White House admits Trump is lying to manipulate his voters

• The kettle is boiling

• Iran beat Cuba after dropping first two sets

Difficulty of Language Understanding

Syntactic Ambiguity

I heard his cell phone ring in my office

WiC (Word-in-Context) dataset (Pilehvar and Collados, 2019, nominated for IJCAI’s research excellence award)

Label | Target | Context-1 | Context-2

False | bed | There's a lot of trash on the bed of the river | I keep a glass of water next to my bed when I sleep

False | land | The pilot managed to land the airplane safely | The enemy landed several of our aircrafts

True | air | Air pollution | Open a window and let in some air

True | window | The expanded window will give us time to catch the thieves | You have a two-hour window of clear weather to finish working on the lawn
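Each WiC instance is a binary decision: does the target word carry the same sense in both contexts? One simple style of baseline thresholds the cosine similarity between the target word's contextual vectors in the two contexts. The sketch below assumes those vectors already exist; the 3-dimensional vectors are made up for illustration (not produced by any real model), and the threshold of 0.5 is a hypothetical choice. The gold labels are the ones from the table.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def same_sense(vec1, vec2, threshold=0.5):
    """Predict True (same sense) if the two contextual vectors are similar enough."""
    return cosine(vec1, vec2) >= threshold

def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# (target, hypothetical vector in Context-1, hypothetical vector in Context-2, gold label)
examples = [
    ("bed",    [0.9, 0.1, 0.0], [0.1, 0.9, 0.2], False),
    ("land",   [0.2, 0.8, 0.1], [0.7, 0.1, 0.6], False),
    ("air",    [0.5, 0.5, 0.1], [0.6, 0.4, 0.2], True),
    ("window", [0.1, 0.3, 0.9], [0.2, 0.2, 0.8], True),
]
preds = [same_sense(v1, v2) for _, v1, v2, _ in examples]
gold = [label for _, _, _, label in examples]
print(preds, accuracy(preds, gold))  # [False, False, True, True] 1.0
```

With real contextual vectors the threshold would be tuned on training data; the perfect score here only reflects the hand-picked vectors.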

WiC (Word-in-Context) dataset

Team | System | Accuracy (%)

Google | BERT++ | 69.9

Facebook AI | RoBERTa | 69.6

Stanford Hazy Research | Snorkel | 72.1

Performance upper bound | -- | 80.0

Difficulty of Language Generation

Massive vocabulary size

Dynamic word order

Syntax and grammar

Fluency

Natural Language Processing (Computational Linguistics)

Applications of NLP

Machine Translation

Information Retrieval

Document Summarisation

Question Answering

Plagiarism Detection

Document Classification

Spam Detection

Fake News Detection

Chatbots

Social Media Analysis

Sentiment Analysis

Social Media Analysis

Tip of the Tongue (ToT)

Reverse dictionary

NLP and Deep Learning

Source: XenonStack

NLP and Deep Learning

Word Sense Disambiguation

Conventional approach

Extract (hand-crafted) features:

• Surrounding words

• Part of speech tags

• Collocations
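The three feature families above can be sketched as a small Python function that turns one occurrence of an ambiguous word into a feature dictionary, which a standard classifier would then consume. This is an illustrative sketch, not the method of any particular system: the POS tags are assumed to be supplied by an external tagger (hand-written in the usage example), and the window size of 2 is a hypothetical choice.

```python
def wsd_features(tokens, pos_tags, i, window=2):
    """Hand-crafted features for the ambiguous word at index i.

    tokens   -- words of the sentence
    pos_tags -- one POS tag per token (assumed to come from a tagger)
    window   -- number of words on each side counted as 'surrounding words'
    """
    feats = {}
    # Surrounding words: bag of words within the window, excluding the target
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            feats[f"word={tokens[j].lower()}"] = 1
    # Part-of-speech tags at fixed offsets around the target
    for offset in (-1, 0, 1):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"pos[{offset}]={pos_tags[j]}"] = 1
    # Collocations: the ordered pairs formed with the immediate neighbours
    target = tokens[i].lower()
    left = tokens[i - 1].lower() if i > 0 else "<s>"
    right = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    feats[f"colloc={left}_{target}"] = 1
    feats[f"colloc={target}_{right}"] = 1
    return feats

tokens = ["trash", "on", "the", "bed", "of", "the", "river"]
tags = ["NN", "IN", "DT", "NN", "IN", "DT", "NN"]  # hand-written tags for the example
print(sorted(wsd_features(tokens, tags, 3)))
```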

NLP and Deep Learning

Word Sense Disambiguation

DL-based approach

• End-to-end model

• Input words, output classes

• No features involved

Figure from Kågebäck and Salomonsson (2016)

NLP and Deep Learning

Sentence Similarity Measurement

Figure from Google AI blog

NLP and Deep Learning

Sentence Similarity Measurement

Conventional approach

Extract features:

• String-based: whether their words look similar (phone vs. telephone)

• Semantic: whether their words have similar meanings (dozens of individual techniques)

• Style: ratio of function words, whether they contain overlapping numbers

• Phonetic: whether they sound similar

• …
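A few of these conventional features are easy to sketch in Python with only the standard library: character trigram overlap for the string-based family, plus two style features. Semantic and phonetic features need external resources (lexicons, pronunciation dictionaries) and are left out. The tiny function-word list is an assumption made for illustration.

```python
import re

# Tiny illustrative function-word list (an assumption; real systems use fuller lists)
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "on", "at", "and", "is", "are", "was", "it", "me"}

def char_ngrams(text, n=3):
    """Set of character n-grams of the lower-cased text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def function_word_ratio(words):
    """Share of tokens that are function words."""
    return sum(w in FUNCTION_WORDS for w in words) / len(words) if words else 0.0

def similarity_features(s1, s2):
    words1 = re.findall(r"[a-z]+", s1.lower())
    words2 = re.findall(r"[a-z]+", s2.lower())
    nums1, nums2 = set(re.findall(r"\d+", s1)), set(re.findall(r"\d+", s2))
    return {
        # String-based: character trigrams overlap even for phone vs. telephone
        "char_trigram_jaccard": jaccard(char_ngrams(s1), char_ngrams(s2)),
        # Style: how different the two sentences' function-word ratios are
        "function_word_ratio_diff": abs(function_word_ratio(words1) - function_word_ratio(words2)),
        # Style: do the sentences share any numbers?
        "shares_numbers": 1.0 if nums1 & nums2 else 0.0,
    }

print(jaccard(char_ngrams("phone"), char_ngrams("telephone")))  # 3 shared trigrams out of 7
```

A learned model (for example a regressor) would then combine such features into a single similarity score.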

NLP and Deep Learning

Sentence Similarity Measurement

DL-based approach

Figure from Mueller and Thyagarajan (2016)
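In the Siamese LSTM of Mueller and Thyagarajan (2016), both sentences are encoded by LSTMs with shared weights, and the similarity of the two final hidden states h_a and h_b is exp(-||h_a - h_b||_1), a value in (0, 1]. The Python sketch below shows only this scoring step; the LSTM encoder is left out and replaced by hypothetical, pre-computed vectors.

```python
import math

def manhattan_similarity(h_a, h_b):
    """exp(-L1 distance): 1.0 for identical encodings, approaching 0 as they diverge."""
    l1 = sum(abs(a - b) for a, b in zip(h_a, h_b))
    return math.exp(-l1)

# Hypothetical sentence encodings (in the real model: final LSTM hidden states)
h1 = [0.2, 0.7, -0.1]
h2 = [0.25, 0.6, -0.05]  # close to h1, e.g. a paraphrase
h3 = [-0.8, 0.1, 0.9]    # far from h1, e.g. an unrelated sentence

print(manhattan_similarity(h1, h2), manhattan_similarity(h1, h3))
```

Because the whole pipeline is differentiable, the encoder can be trained end to end so that this score matches human similarity judgements, with no hand-crafted features involved.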

NLP and Deep Learning

Stance detection

Gibraltar source says the Iranian tanker Grace-1 will be allowed to leave

Agree: Iran says Britain might release seized Grace 1 oil tanker soon

Disagree: Iranian tanker continues to be detained by Gibraltar

NLP and Deep Learning

Stance detection

Conventional approach

Extract (hand-crafted) features:

• Word overlaps

• Word frequencies

• Count features

• …
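These hand-crafted stance features can be sketched in Python as follows, comparing the claim with another headline. The cue-word list for the count feature is a small hypothetical one chosen for illustration; real systems use larger lexicons, and the features would feed a separate classifier.

```python
import re
from collections import Counter

# Illustrative refuting cue words only (an assumption, not a standard lexicon)
REFUTING_WORDS = {"not", "no", "deny", "denies", "fake", "false", "hoax"}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def stance_features(claim, text):
    c_tokens, t_tokens = tokenize(claim), tokenize(text)
    c_set, t_set = set(c_tokens), set(t_tokens)
    t_counts = Counter(t_tokens)
    return {
        # Word overlap: Jaccard similarity of the two word sets
        "overlap": len(c_set & t_set) / len(c_set | t_set),
        # Word frequencies: how often claim words occur in the other text
        "claim_word_hits": sum(t_counts[w] for w in c_set),
        # Count features: text length and number of refuting cue words
        "text_length": len(t_tokens),
        "refuting_count": sum(t_counts[w] for w in REFUTING_WORDS),
    }

claim = "Gibraltar source says the Iranian tanker Grace-1 will be allowed to leave"
agree = "Iran says Britain might release seized Grace 1 oil tanker soon"
disagree = "Iranian tanker continues to be detained by Gibraltar"
print(stance_features(claim, agree))
print(stance_features(claim, disagree))
```

On the slide's example, the disagreeing headline shares more words with the claim (overlap 5/16) than the agreeing one (4/20), which illustrates why surface features alone struggle with stance.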

NLP and Deep Learning

Stance detection

DL-based approach

End-to-end

NLP and Deep Learning

Word embeddings (2013)

[Figure: words such as Khatam, pizza, desk and rain represented as vectors]

NLP and Deep Learning

Word embeddings (2013)

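[Figure: word embeddings projected to two dimensions, with related words clustering together: train, rail, station, passenger, railway, bus, terminal, transit; flower, fruit, tree, seed, leaf; university, education, library, studies]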
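The clustering in the figure follows from how embeddings are used: words are compared by the angle between their vectors, so related words end up as nearest neighbours. The Python sketch below uses hypothetical 3-dimensional vectors chosen by hand for a few of the words above; real word2vec or GloVe vectors have hundreds of dimensions and are learned from large corpora.

```python
import math

# Hypothetical hand-made vectors (illustration only, not trained embeddings)
EMBEDDINGS = {
    "train":      [0.9, 0.1, 0.0],
    "railway":    [0.8, 0.2, 0.1],
    "bus":        [0.7, 0.1, 0.2],
    "flower":     [0.1, 0.9, 0.1],
    "leaf":       [0.0, 0.8, 0.2],
    "university": [0.1, 0.1, 0.9],
    "library":    [0.2, 0.0, 0.8],
}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest(word, k=2):
    """Return the k other words whose vectors are most similar to `word`'s vector."""
    query = EMBEDDINGS[word]
    others = [w for w in EMBEDDINGS if w != word]
    return sorted(others, key=lambda w: cosine(query, EMBEDDINGS[w]), reverse=True)[:k]

print(nearest("train"))          # ['railway', 'bus']
print(nearest("flower", k=1))    # ['leaf']
```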

NLP and Deep Learning

Contextualised Models (since 2018)

A new turning point in NLP

Evolving very rapidly

[Figure: timeline, 2013 to 2020: Word2vec, GloVe, ELMo, ULMFiT, GPT, BERT, XLNet, GPT-2, RoBERTa]

NLP and Deep Learning

Contextualised Models

One system for all tasks!

Natural Language Processing

Main Current Research Challenges

Existing challenges in NLP

Natural Language Understanding

• Learning language from the ground up

• Innate biases vs. learning from scratch

• Linguistics, cognitive and neuroscience aspects

• Reasoning

Existing challenges in NLP

NLP for low-resource languages

• Lack of data, for training and for evaluation

• Incentives

• Universal language models

• Cross-lingual representations

Existing challenges in NLP

Reasoning at scale

Current NLP systems are largely unable to reason over long documents or across multiple documents

A challenging task:

• NarrativeQA: questions about entire movie scripts and books

Existing challenges in NLP

Evaluation

Current evaluation benchmarks and performance metrics often themselves need re-evaluation!

• Machine Translation

• Dialogue

• Language Generation
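A small example of why such metrics need scrutiny: clipped n-gram precision, the building block of the BLEU metric for machine translation, rewards surface overlap rather than meaning. The Python sketch below computes it (full BLEU also combines several n-gram orders and a brevity penalty, omitted here). The sentences are made up for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, with each n-gram's
    count clipped to the number of times it occurs in the reference."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
    return matched / total

reference = "the cat is on the mat"
paraphrase = "there is a cat sitting on the mat"  # same meaning, different words
wrong = "the mat is on the cat"                   # different meaning, same words

for n in (1, 2):
    print(n, clipped_precision(paraphrase, reference, n), clipped_precision(wrong, reference, n))
```

The sentence with the wrong meaning scores 1.0 on unigrams and 0.8 on bigrams, while the faithful paraphrase scores only 5/8 and 2/7.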

Language Modeling

Language Model
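A language model assigns probabilities to text by predicting each token from the ones before it; generation then repeatedly samples the next token and appends it. Samples like those in the following examples typically come from neural models, so the Python sketch below is a deliberately simpler, swapped-in illustration of the same predict-then-sample loop: a character-level bigram model that conditions only on the previous character. The toy corpus is made up for this example.

```python
import random
from collections import Counter, defaultdict

def train_bigram_lm(text):
    """Estimate P(next char | current char) from character bigram counts."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(text, text[1:]):
        counts[cur][nxt] += 1
    lm = {}
    for cur, nexts in counts.items():
        total = sum(nexts.values())
        lm[cur] = {ch: c / total for ch, c in nexts.items()}
    return lm

def generate(lm, start, length, rng):
    """Sample up to `length` further characters, one at a time, from the model."""
    out = [start]
    for _ in range(length):
        dist = lm.get(out[-1])
        if not dist:  # character never seen with a successor: stop early
            break
        chars, probs = zip(*dist.items())
        out.append(rng.choices(chars, weights=probs)[0])
    return "".join(out)

corpus = "the cat sat on the mat. the dog sat on the log."
lm = train_bigram_lm(corpus)
print(generate(lm, "t", 40, random.Random(0)))
```

Neural language models replace the counting step with a network that can condition on much longer histories, which is what lets them imitate poetry, Wikipedia markup or XML.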

Language Modeling: Persian poetry

https://www.darbare.com/Post/30084

Rumi's Masnavi; Ferdowsi's Shahnameh

Language Modeling: Wikipedia articles

http://karpathy.github.io

Language Modeling: XML

http://karpathy.github.io

Language Modeling: Scientific article

Generative models: Sunspring

Thanks

Up next:

Michael Zock on the tip-of-the-tongue problem!

MZ is not a computer scientist but a psycholinguist who has worked on language production for decades.

His goal is to build computational tools that help people speak and write, whether in their mother tongue or in a foreign language. To achieve this, he relies on knowledge from psychology (psycholinguistics and neuroscience) and on engineering skills (NLP).

Those who are interested in more details may take a look at his website:

http://pageperso.lif.univ-mrs.fr/~michael.zock/
