natural language processingdemo.clab.cs.cmu.edu/nlp/s20/files/slides/02-apps.pdf · senator john...

Post on 20-Jul-2020

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Natural Language ProcessingLecture 2: Applications of NLP

Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate after consistently trailing in third place.In the latest primary, held in Florida yesterday, Edwards gained only 14% of the vote, with Hillary Clinton polling 50% and Barack Obama on 33%. A reported 1.5m voters turned out to vote.

Information Extraction (From 10,000 Feet)

• Input: text, empty relational database• Output: populated relational database

State Party Candidate Fraction

FL D Edwards 0.14

FL D Clinton 0.50

FL D Obama 0.33

Named Entity Recognition

Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate after consistently trailing in third place.In the latest primary, held in Florida yesterday, Edwardsgained only 14% of the vote, with Hillary Clinton polling 50% and Barack Obama on 33%. A reported 1.5m voters turned out to vote.

Reference Resolution

Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate after consistently trailing in third place.In the latest primary, held in Florida yesterday, Edwardsgained only 14% of the vote, with Hillary Clinton polling 50% and Barack Obama on 33%. A reported 1.5m voters turned out to vote.

Coreference Resolution

Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate after consistently trailing in third place.In the latest primary, held in Florida yesterday, Edwardsgained only 14% of the vote, with Hillary Clinton polling 50% and Barack Obama on 33%. A reported 1.5m voters turned out to vote.

Relation Detection

Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate after consistently trailing in third place.In the latest primary, held in Florida yesterday, Edwardsgained only 14% of the vote, with Hillary Clinton polling 50% and Barack Obama on 33%. A reported 1.5m voters turned out to vote.

member_of

John Edwards Democratic party

Hillary Clinton Democratic party

Barack Obama Democratic party

NER

Encoding the NER Problem...PresidentDonaldTrumpmetwithlocalleadersandfederal responders shortlyafterlandingatanAirForcebaseinCarolina,PuertoRico...

...B-PERI-PERI-PER

OOOOOOOOOOOO

B-ORGI-ORG

OO

B-LOCO

B-LOCI-LOC

...

President Donald Trump met with local leaders and federal responders shortly after landing at an Air Force base in Carolina, Puerto Rico, for what was supposed to be a briefing on the situation on the island. Instead, Trump turned it into an opportunity to congratulate himself and the federal government's response to the disaster…. He downplayed throughout his remarks how dire things are in Puerto Rico, where more than half of the people don't have power, running water, or cellphone service two weeks after Hurricane Maria, a Category 4 storm, tore through the island.

Some Named Entity Types

• Different annotation schemes for NER use different types• Common types include• PER—person• ORG—Organization• LOC—Location• GPE—Geopolitical Entity• FAC—Facility• NAT—Natural phenomenon

• These are only tagged when they are proper names

Evaluating an NER System

Correct named entity tokens

Hypothesized named entity

tokens

NER System Building Process

Test Data

Evaluate Precision and Recall

Relation Extraction

Examples of Relations

Seeding Tuples

ExamplesBrad is married to Angelina.Bill is married to Hillary.Hillary is married to Bill.Hillary is the wife of Bill.

SeedsX is married to YX is the wife of Y

Find all X,Y where these templates existExtract the X,Ys, find all other X … Y

Bootstrapping Relations

Information Retrieval

The Vector Space Model

• Each document is a |Σ|-dimensional vector di:

• Given a query q, represent it in the same way.

• Similarity of vectors ⇒ relevance:

• Twists: tf-idf

Evaluating Information Retrieval

precision

recall

Question Answering

Question Answering Architecture

SLP p. 793

Your Project (Perhaps)

SLP p. 793

QuestionKnown Document

Evaluating QA

Some General Tools

• Supervised classification• Feature vector representations• Bootstrapping• Evaluation:• Precision and recall (and their curves)• Mean reciprocal rank

top related