natural language processingdemo.clab.cs.cmu.edu/nlp/s20/files/slides/02-apps.pdf · senator john...

23
Natural Language Processing Lecture 2: Applications of NLP

Upload: others

Post on 20-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Natural Language ProcessingLecture 2: Applications of NLP

Page 2: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate after consistently trailing in third place.In the latest primary, held in Florida yesterday, Edwards gained only 14% of the vote, with Hillary Clinton polling 50% and Barack Obama on 33%. A reported 1.5m voters turned out to vote.

Information Extraction (From 10,000 Feet)

• Input: text, empty relational database• Output: populated relational database

State Party Candidate Fraction

FL D Edwards 0.14

FL D Clinton 0.50

FL D Obama 0.33

Page 3: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Named Entity Recognition

Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate after consistently trailing in third place.In the latest primary, held in Florida yesterday, Edwardsgained only 14% of the vote, with Hillary Clinton polling 50% and Barack Obama on 33%. A reported 1.5m voters turned out to vote.

Page 4: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Reference Resolution

Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate after consistently trailing in third place.In the latest primary, held in Florida yesterday, Edwardsgained only 14% of the vote, with Hillary Clinton polling 50% and Barack Obama on 33%. A reported 1.5m voters turned out to vote.

Page 5: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Coreference Resolution

Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate after consistently trailing in third place.In the latest primary, held in Florida yesterday, Edwardsgained only 14% of the vote, with Hillary Clinton polling 50% and Barack Obama on 33%. A reported 1.5m voters turned out to vote.

Page 6: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Relation Detection

Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate after consistently trailing in third place.In the latest primary, held in Florida yesterday, Edwardsgained only 14% of the vote, with Hillary Clinton polling 50% and Barack Obama on 33%. A reported 1.5m voters turned out to vote.

member_of

John Edwards Democratic party

Hillary Clinton Democratic party

Barack Obama Democratic party

Page 7: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

NER

Page 8: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Encoding the NER Problem...PresidentDonaldTrumpmetwithlocalleadersandfederal responders shortlyafterlandingatanAirForcebaseinCarolina,PuertoRico...

...B-PERI-PERI-PER

OOOOOOOOOOOO

B-ORGI-ORG

OO

B-LOCO

B-LOCI-LOC

...

President Donald Trump met with local leaders and federal responders shortly after landing at an Air Force base in Carolina, Puerto Rico, for what was supposed to be a briefing on the situation on the island. Instead, Trump turned it into an opportunity to congratulate himself and the federal government's response to the disaster…. He downplayed throughout his remarks how dire things are in Puerto Rico, where more than half of the people don't have power, running water, or cellphone service two weeks after Hurricane Maria, a Category 4 storm, tore through the island.

Page 9: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Some Named Entity Types

• Different annotation schemes for NER use different types• Common types include• PER—person• ORG—Organization• LOC—Location• GPE—Geopolitical Entity• FAC—Facility• NAT—Natural phenomenon

• These are only tagged when they are proper names

Page 10: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Evaluating an NER System

Correct named entity tokens

Hypothesized named entity

tokens

Page 11: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

NER System Building Process

Test Data

Evaluate Precision and Recall

Page 12: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Relation Extraction

Page 13: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Examples of Relations

Page 14: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Seeding Tuples

ExamplesBrad is married to Angelina.Bill is married to Hillary.Hillary is married to Bill.Hillary is the wife of Bill.

SeedsX is married to YX is the wife of Y

Find all X,Y where these templates existExtract the X,Ys, find all other X … Y

Page 15: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Bootstrapping Relations

Page 16: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Information Retrieval

Page 17: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

The Vector Space Model

• Each document is a |Σ|-dimensional vector di:

• Given a query q, represent it in the same way.

• Similarity of vectors ⇒ relevance:

• Twists: tf-idf

Page 18: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Evaluating Information Retrieval

precision

recall

Page 19: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Question Answering

Page 20: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Question Answering Architecture

SLP p. 793

Page 21: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Your Project (Perhaps)

SLP p. 793

QuestionKnown Document

Page 22: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Evaluating QA

Page 23: Natural Language Processingdemo.clab.cs.cmu.edu/NLP/S20/files/slides/02-apps.pdf · Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate

Some General Tools

• Supervised classification• Feature vector representations• Bootstrapping• Evaluation:• Precision and recall (and their curves)• Mean reciprocal rank