Natural Language Processing:
An Introduction
CIS 521 - Intro to AI 2
NLP: The Ultimate Goal (1990)
The Ultimate Goal – For computers to use NL as effectively as
humans do….
“Natural language, whether spoken, written, or typed, is the
most natural means of communication between humans,
and the mode of expression of choice for most of the
documents they produce. As computers play a larger role
in the preparation, acquisition, transmission, monitoring,
storage, analysis, and transformation of information,
endowing them with the ability to understand and generate
information expressed in natural languages becomes more
and more necessary.”
NLP: Grand Challenges (1990)
The Ultimate Goal – For computers to use NL as effectively as
humans do….
Reading and writing text
• Abstracting
• Monitoring
• Extraction into Databases
Interactive Dialogue: Natural, effective access to computer systems
• Informal Speech Input and Output
Translation: Input and Output in Multiple Languages
Review: Significant Advances In NLP I
• Web-scale information extraction
& question answering
• IBM’s Watson
• Interactive Dialogue Systems
• Apple’s Siri
• (Microsoft Cortana)
• (Amazon Echo)
• (Google Assistant)
Significant Advances In NLP II
新华网海牙3月24日电(记者陈贽潘治)第三届核安全峰会24日在荷兰海牙举行。国家主席习近平出席并发表重要讲话,介绍中国核安全措施和成就,阐述中国关于发展和安全并重、权利和义务并重、自主和协作并重、治标和治本并重的核安全观,呼吁国际社会携手合作,实现核能持久安全和发展。
Xinhua News Agency, The Hague, March 24
(Xinhua Chen Zhi Pan Zhi) The third nuclear
safety summit held in The Hague, the
Netherlands. Chinese President Xi Jinping
attended and delivered an important speech to
introduce China's nuclear safety measures
and achievements, to elaborate on China's
development and safety, both rights and
obligations, both autonomy and cooperation,
both temporary and temporary nuclear
security concept, called on the international
community to work together, To achieve long-
term nuclear safety and development.
Automatic Machine Translation
Above: Xinhua story (Chinese), followed by its Google Translate output (11/1/17).
Review: MultiMedia Monitoring System
BBN MAPS & Language Weaver MT (2005)
Current system now includes:
“tools and technologies that enable analysts to quickly discover
relevant information and drill down into the data.
• Geolocation:
• Geographical visualizations pinpoint the areas about which
participants are communicating.
• Sentiment:
• Analysis of the tone of interactions enables users to understand
sentiments expressed over time, either individually or as a group by
topic or theme.
• Topics and themes:
• BBN's Unsupervised Topic Discovery component automatically
identifies topics, thematically classifying content or correlating it to
Twitter hashtags.”
Source: http://www.raytheon.com/capabilities/products/m3s/index.html
Early Successes: Human Machine Interfaces
• SHRDLU (Winograd, 1969)
• A fragile demonstration of the fundamental vision
• LUNAR (Woods, Webber, Kaplan 1971)
• Answering geologists’ questions about the Apollo 11 moon rocks
Review: SHRDLU: A demonstration proof
LUNAR – William Woods 1971
• NLP interface to database of analyses of Apollo
11 moon rocks
• Examples
• What is the average concentration of aluminum in high alkali
rocks?
• How many breccias contain olivine?
• Give me the modal analyses of those samples for all phases.
• Handled 78% of sentences typed by geologists at
1971 Lunar Rocks conference
• (90% after “minor fixes”)
The Past: Crucial flaws in the paradigm
These and other later systems worked well, BUT
1. Person-years of work to port to new applications
2. Very limited coverage of English
Crucially, they worked well because of a magical fact:
People automatically adapt and limit their language given
a small set of exemplars if the underlying linguistic
generalizations are HABITABLE
This won’t handle pre-existing text!
The State of NLP
NLP Past before 1995:
• Rich Representations
NLP Present:
• Powerful Statistical Disambiguation
1995: A breakthrough in parsing
10^6 words of Treebank Annotation
+ Machine Learning = Robust Parsers
(Magerman ’95)
[Figure: training sentences and answers → Training Program → Models → Parser → Trees]
Example input sentence:
The founder of Pakistan's nuclear program, Abdul Qadeer Khan, has admitted he transferred nuclear technology to Iran, Libya and North Korea
• 1990 Best hand-built parsers: ~40-60% accuracy (guess)
• 1995+ Statistical parsers: >90% accuracy
(both on short sentences)
[Figure: parse tree of the example sentence, built from NP, PP, VP, SBAR and S constituents.]
The Penn Treebank: 1988-94
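Example annotation: "Analysts have been expecting a GM-Jaguar pact that would give the US car maker an eventual 30% stake in the British company"
(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the US car maker)
                                    (NP (NP an eventual 30% stake)
                                        (PP-LOC in
                                                (NP the British company))))))))))))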
• Wall Street Journal: 1.3 million words
• Brown Corpus: 1 million words
• Switchboard: 1 million words
• All Tagged with Part-of-Speech & Syntactic Structure
• Developed ’88-’94 (Marcus, Santorini, Taylor, Bies, …)
• Finished before it had any practical use!
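One thing annotated trees like these make possible is estimating a grammar by counting. The sketch below reads bracketed trees and computes relative-frequency rule probabilities (the simplest PCFG estimate). It is only an illustration: the three toy trees and all function names are hypothetical, words sit directly under phrase labels with no part-of-speech tags, and it is not the method of any parser cited in these slides.

```python
from collections import Counter

# Hypothetical toy treebank (not taken from the Penn Treebank).
TOY_TREEBANK = [
    "(S (NP Analysts) (VP expect (NP a pact)))",
    "(S (NP GM) (VP wants (NP a stake)))",
    "(S (NP Analysts) (VP agree))",
]

def tokenize(text):
    """Split a bracketed tree into '(' , ')' and symbol tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Consume one '(label child ...)' constituent from the front of tokens."""
    assert tokens.pop(0) == "("
    label = tokens.pop(0)
    children = []
    while tokens[0] != ")":
        if tokens[0] == "(":
            children.append(parse(tokens))   # nested constituent
        else:
            children.append(tokens.pop(0))   # a word
    tokens.pop(0)                            # consume ')'
    return (label, children)

def rules(tree):
    """Yield every (lhs, rhs) rule used in the tree."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, tuple):
            yield from rules(child)

def estimate_pcfg(trees):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    counts = Counter(r for t in trees for r in rules(t))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

trees = [parse(tokenize(t)) for t in TOY_TREEBANK]
probs = estimate_pcfg(trees)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}: {p:.2f}")
```

With a million words of annotation instead of three sentences, the same counting yields the statistics that trained parsers rely on.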
Lexicalized parsing results (Labeled Constituent Precision/Recall F1)
(adapted from Chris Manning, Stanford)

Method                                                     Accuracy
PCFGs (Charniak 97)                                        73%
Conditional Models: Decision Trees (Magerman 95)           84.2%
Lexical Dependencies (Collins 96)                          85.5%
Conditional Models: Logistic (Ratnaparkhi 97)              86.9%
Generative Lexicalized Model (Charniak 97)                 86.7%
Generative Lexicalized Model (Collins 97)                  88.2%
Logistic-inspired Model (Charniak 99)                      89.6%
Boosting (Collins 2000)                                    89.8%
MaxEnt discriminative reranking (Charniak & Johnson 03)    91.0%
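The accuracy figures above are labeled constituent precision/recall F1. A minimal sketch of how such a score can be computed follows, treating each constituent as a (label, start, end) triple. The gold and predicted spans are hypothetical, invented for a six-word sentence, and using sets (rather than multisets, as a full evaluation tool would) is a simplifying assumption.

```python
def labeled_prf1(gold, predicted):
    """Precision, recall and F1 over labeled constituent spans."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if correct == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical spans for "he transferred nuclear technology to Iran"
# (word positions 0..6, end-exclusive).
gold = {("S", 0, 6), ("NP", 0, 1), ("VP", 1, 6),
        ("NP", 2, 4), ("PP", 4, 6), ("NP", 5, 6)}
# The hypothetical parser attaches the PP inside the object NP,
# so it proposes NP(2, 6) instead of NP(2, 4).
predicted = {("S", 0, 6), ("NP", 0, 1), ("VP", 1, 6),
             ("NP", 2, 6), ("PP", 4, 6), ("NP", 5, 6)}

precision, recall, f1 = labeled_prf1(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

One wrong attachment out of six constituents costs this toy parse about 17 points, which gives a sense of how demanding the 90% range is.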
A Few Core Technologies
1. Named Entity Recognition & Information Extraction
2. Machine Translation
3. Text Summarization
Information Extraction &
Named Entity Recognition
Named Entity Recognition
The task: identify atomic elements of information in
text
• Flag the who, where, when & how much in text
• Person names
• Company/organization names
• Locations
• Dates & times
• Percentages
• Monetary amounts
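Some of these categories have a regular surface form. As a rough sketch, percentages and monetary amounts can be flagged with hand-written patterns; this is a deliberately simple pattern-matching stand-in, not the statistical approach discussed later in the lecture. The two regular expressions, the label names, and the sample sentence are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; real systems cover far more formats.
PATTERNS = {
    "PERCENT": re.compile(r"\b\d+(?:\.\d+)?%"),
    "MONEY": re.compile(r"\$\d[\d,]*(?:\.\d+)?(?:\s(?:million|billion))?"),
}

def flag_numeric_entities(text):
    """Return (label, matched_text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    return [(label, span) for _, label, span in sorted(found)]

sample = "GM would take an eventual 30% stake for $2.5 billion."
print(flag_numeric_entities(sample))
```

Person, company and location names do not have such a regular shape, which is why the rest of this section needs more than patterns.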
Won’t simple lists solve the problem?
• too numerous to include in dictionaries
• changing constantly
• appear in many variant forms
• subsequent occurrences might be abbreviated
list search/matching doesn’t perform well
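A small sketch of why list matching falls short: exact lookup against a name list finds the full form but misses a variant spelling and a later abbreviated mention. The gazetteer contents, the function name, and the second and third sentences are hypothetical.

```python
# Hypothetical name list (gazetteer).
GAZETTEER = {
    "Nielsen Marketing Research",
    "European Information Services Inc.",
}

def list_match(text, names):
    """Return the listed names that occur verbatim in the text."""
    return sorted(name for name in names if name in text)

first_mention = ("George Garrick was appointed chief executive officer "
                 "of Nielsen Marketing Research.")
later_mention = "Nielsen said the appointment takes effect immediately."
variant_form = "European Information Services, Inc. is based in London."

print(list_match(first_mention, GAZETTEER))  # full form is found
print(list_match(later_mention, GAZETTEER))  # abbreviated mention is missed
print(list_match(variant_form, GAZETTEER))   # extra comma defeats the lookup
```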
Information Extraction
• Information extraction is the identification, in text, of specified classes of Named Entities, plus:
  - Relations
  - Events
• For relations and events, this includes finding the participants and modifiers (date, time, location, etc.).
• Goal: fill out a database with given relation or event types:
  - people’s jobs
  - people’s whereabouts
  - merger and acquisition activity
  - disease outbreaks
  - genomics relations
Extraction Example
• George Garrick, 40 years old, president of the London-based European Information Services Inc., was appointed chief executive officer of Nielsen Marketing Research, USA.

Position   Company                              Location  Person          Status
President  European Information Services, Inc.  London    George Garrick  Out
CEO        Nielsen Marketing Research           USA       George Garrick  In
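The output of extraction is structured data that can be queried like any database. A minimal sketch of the two rows for this example as typed records follows; the class name, field names, and query helper are hypothetical choices, not a schema from the lecture.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SuccessionRecord:
    """One row of the filled template (hypothetical schema)."""
    position: str
    company: str
    location: str
    person: str
    status: str  # "In" = taking the post, "Out" = leaving it

records = [
    SuccessionRecord("President", "European Information Services, Inc.",
                     "London", "George Garrick", "Out"),
    SuccessionRecord("CEO", "Nielsen Marketing Research",
                     "USA", "George Garrick", "In"),
]

def current_posts(rows, person):
    """Positions the person is moving into, as (position, company) pairs."""
    return [(r.position, r.company)
            for r in rows if r.person == person and r.status == "In"]

print(current_posts(records, "George Garrick"))
```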
Levels of BBN Statistical Analysis (2005)
[Figure: the sentence below annotated at three levels.]
• Name finding: Person, GPE and ORG tags on the names
• Parsing: NPA, NP, PP, VP, SBAR, WHNP and S constituents
• Co-reference
Yugoslav President Slobodan Milosevic received on Thursday the representatives of the Association of Yugoslav Banks, headed by its president Milos Milosavljevic, who is also the general director of JugoBanka.
Information Extraction from
Propositions
[Figure: the same sentence with entity tags (Person, GPE, ORG, Date) and proposition links. Predicates: received, headed, is. Arguments: president, representatives, president, director. Link labels: subj, obj, arg, on.]
Propositions are normalized connections from the parse trees.
Entities and relations are extracted statistically from propositions.
Person: Slobodan Milosevic
Position: president
Organization: Yugoslavia

Person: Milos Milosavljevic
Position: president
Organization: Association of Yugoslav Banks

Person: Milos Milosavljevic
Position: general director
Organization: JugoBanka
Statistical Machine Translation
(For more on this topic, check out courses
taught by Prof. Chris Callison-Burch)
(Next several slides from Language Weaver)
Statistical Machine Translation Technology
[Figure: Spanish/English bilingual text and English text each feed a statistical analysis step. Translation proceeds Spanish → Broken English → English.]
Spanish: Que hambre tengo yo
Broken English: What hunger have I, Hungry I am so, I am so hungry, Have I that hunger …
English: I am so hungry
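The last step, choosing fluent English among the "broken English" candidates, can be sketched with statistics gathered from English text alone. The example below uses a bigram model with add-one smoothing, which is an assumption: the slide does not name the model. The four-sentence training corpus and the function names are hypothetical, and a real system would also weigh how faithful each candidate is to the Spanish.

```python
import math
from collections import Counter

# Hypothetical English-only training text.
CORPUS = ["I am so hungry", "I am so tired", "I am hungry", "what do I have"]

# Candidate outputs from the slide.
CANDIDATES = ["What hunger have I", "Hungry I am so",
              "I am so hungry", "Have I that hunger"]

def tokens(sentence):
    """Lowercase, split on spaces, add sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]

def train_bigrams(corpus, extra_sentences=()):
    """Count bigrams; vocabulary also covers words seen only in candidates."""
    bigrams, contexts = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in corpus:
        toks = tokens(sentence)
        vocab.update(toks[1:])
        for prev, word in zip(toks, toks[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    for sentence in extra_sentences:
        vocab.update(sentence.lower().split())
    return bigrams, contexts, len(vocab)

def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of the sentence."""
    bigrams, contexts, vocab_size = model
    toks = tokens(sentence)
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
        for prev, word in zip(toks, toks[1:])
    )

model = train_bigrams(CORPUS, CANDIDATES)
best = max(CANDIDATES, key=lambda c: log_prob(c, model))
print(best)
```

All four candidates contain plausible word translations; only the word-order statistics learned from English text separate them.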
How A Statistical MT System Learns
Translating a New Document
Text Summarization
(For more on this topic, check out courses
taught by Prof. Ani Nenkova)
(Includes pre-thesis work by Prof. Nenkova)