
Page 1: (source: faculty.cse.tamu.edu/huangrh/Fall18-489/l56_languagemodeling.pdf)

Introduction to N-grams

Language Modeling

Many slides are adapted from slides by Dan Jurafsky

Page 2:

Probabilistic Language Models

• Today's goal: assign a probability to a sentence
• Machine Translation:
  • P(high winds tonite) > P(large winds tonite)
• Spell Correction:
  • The office is about fifteen minuets from my house
  • P(about fifteen minutes from) > P(about fifteen minuets from)
• Speech Recognition:
  • P(I saw a van) >> P(eyes awe of an)
• + Summarization, question-answering, etc., etc.!!

Why?

Page 3:

Probabilistic Language Modeling

• Goal: compute the probability of a sentence or sequence of words:

P(W) = P(w_1, w_2, w_3, w_4, w_5 … w_n)

• Related task: probability of an upcoming word:

P(w_5 | w_1, w_2, w_3, w_4)

• A model that computes either of these, P(W) or P(w_n | w_1, w_2 … w_{n−1}), is called a language model.

• A better name would be "the grammar", but "language model" or LM is standard.

Page 4:

How to compute P(W)

• How to compute this joint probability:

P(its, water, is, so, transparent, that)

• Intuition: let's rely on the Chain Rule of Probability

Page 5:

Reminder: The Chain Rule

• With 4 variables:

P(A, B, C, D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C)

• The Chain Rule in general:

P(x_1, x_2, x_3, …, x_n) = P(x_1) P(x_2|x_1) P(x_3|x_1,x_2) … P(x_n|x_1,…,x_{n−1})

Page 6:

The Chain Rule applied to compute the joint probability of words in a sentence

P(w_1 w_2 … w_n) = ∏_i P(w_i | w_1 w_2 … w_{i−1})

P("its water is so transparent") =
  P(its) × P(water | its) × P(is | its water) × P(so | its water is) × P(transparent | its water is so)

Page 7:

How to estimate these probabilities

• Could we just count and divide?

P(the | its water is so transparent that) =
  Count(its water is so transparent that the) / Count(its water is so transparent that)

• No! Too many possible sentences!
• We'll never see enough data for estimating these

Page 8:

Markov Assumption

• In other words, we approximate each component in the product:

P(w_i | w_1 w_2 … w_{i−1}) ≈ P(w_i | w_{i−k} … w_{i−1})

P(w_1 w_2 … w_n) ≈ ∏_i P(w_i | w_{i−k} … w_{i−1})

Page 9:

Markov Assumption

• Simplifying assumption (due to Andrei Markov):

P(the | its water is so transparent that) ≈ P(the | that)

• Or maybe:

P(the | its water is so transparent that) ≈ P(the | transparent that)

Page 10:

Simplest case: Unigram model

P(w_1 w_2 … w_n) ≈ ∏_i P(w_i)

Some automatically generated sentences from a unigram model:

fifth, an, of, futures, the, an, incorporated, a, a, the, inflation, most, dollars, quarter, in, is, mass

thrift, did, eighty, said, hard, 'm, july, bullish

that, or, limited, the

Page 11:

Condition on the previous word: Bigram model

P(w_i | w_1 w_2 … w_{i−1}) ≈ P(w_i | w_{i−1})

Some automatically generated sentences from a bigram model:

texaco, rose, one, in, this, issue, is, pursuing, growth, in, a, boiler, house, said, mr., gurria, mexico, 's, motion, control, proposal, without, permission, from, five, hundred, fifty, five, yen

outside, new, car, parking, lot, of, the, agreement, reached

this, would, be, a, record, november

Page 12:

N-gram models

• We can extend to trigrams, 4-grams, 5-grams
• In general this is an insufficient model of language
  • because language has long-distance dependencies:
  "The computer which I had just put into the machine room on the fifth floor crashed."
• But we can often get away with N-gram models. Still, most words depend on their previous few words.

Page 13:

Introduction to N-grams

Language Modeling

Page 14:

Estimating N-gram Probabilities

Language Modeling

Page 15:

Estimating bigram probabilities

• The Maximum Likelihood Estimate:

P(w_i | w_{i−1}) = count(w_{i−1}, w_i) / count(w_{i−1})

P(w_i | w_{i−1}) = c(w_{i−1}, w_i) / c(w_{i−1})

Page 16:

An example

<s> I am Sam </s>
<s> Sam I am </s>
<s> I do not like green eggs and ham </s>

P(w_i | w_{i−1}) = c(w_{i−1}, w_i) / c(w_{i−1})

e.g. P(I | <s>) = 2/3, P(Sam | <s>) = 1/3, P(am | I) = 2/3, P(</s> | Sam) = 1/2
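A minimal Python sketch of this estimator, assuming whitespace tokenization and treating <s> and </s> as ordinary tokens:

```python
from collections import Counter

corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

# Count unigrams and bigrams over the whole corpus.
unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p_mle(word, prev):
    """MLE bigram estimate: c(prev, word) / c(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p_mle("I", "<s>"))     # 2/3
print(p_mle("Sam", "<s>"))   # 1/3
print(p_mle("am", "I"))      # 2/3
print(p_mle("</s>", "Sam"))  # 1/2
```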

Page 17:

More examples: Berkeley Restaurant Project sentences

• can you tell me about any good cantonese restaurants close by
• mid priced thai food is what i'm looking for
• tell me about chez panisse
• can you give me a listing of the kinds of food that are available
• i'm looking for a good place to eat breakfast
• when is caffe venezia open during the day

Page 18:

Raw bigram counts

• Out of 9222 sentences

Page 19:

Raw bigram probabilities

• Normalize by unigrams:

• Result:

Page 20:

Bigram estimates of sentence probabilities

P(<s> I want english food </s>)
  = P(I | <s>) × P(want | I) × P(english | want) × P(food | english) × P(</s> | food)
  = .000031
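Spelled out in a short sketch: P(I | <s>) and P(english | want) appear on the next slide; the remaining factors are assumed values consistent with the Berkeley Restaurant bigram table.

```python
# Bigram factors for "<s> I want english food </s>".
# The .33, .5, and .68 values are assumptions for illustration.
factors = [
    0.25,    # P(I | <s>)
    0.33,    # P(want | I)
    0.0011,  # P(english | want)
    0.5,     # P(food | english)
    0.68,    # P(</s> | food)
]

prob = 1.0
for p in factors:
    prob *= p
print(f"{prob:.6f}")  # 0.000031
```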

Page 21:

What kinds of knowledge?

• P(english | want) = .0011
• P(chinese | want) = .0065
• P(to | want) = .66
• P(eat | to) = .28
• P(food | to) = 0
• P(want | spend) = 0
• P(i | <s>) = .25

Page 22:

Practical Issues

• We do everything in log space
  • Avoids underflow
  • (also, adding is faster than multiplying)

log(p_1 × p_2 × p_3 × p_4) = log p_1 + log p_2 + log p_3 + log p_4
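The same product computed as a sum of logs, a minimal sketch reusing the factors from the example above:

```python
import math

# Summing logs avoids underflow when multiplying many small probabilities.
log_prob = sum(math.log(p) for p in [0.25, 0.33, 0.0011, 0.5, 0.68])
print(log_prob)            # ≈ -10.386
print(math.exp(log_prob))  # ≈ 0.000031
```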

Page 23:

Language Modeling Toolkits

• SRILM
  • http://www.speech.sri.com/projects/srilm/

Page 24:

Google N-Gram Release, August 2006

http://ngrams.googlelabs.com/

Page 25:

Estimating N-gram Probabilities

Language Modeling

Page 26:

Evaluation and Perplexity

Language Modeling

Page 27:

Extrinsic evaluation of N-gram models

• Best evaluation for comparing models A and B
  • Put each model in a task
    • spelling corrector, speech recognizer, MT system
  • Run the task, get an accuracy for A and for B
    • How many misspelled words corrected properly
    • How many words translated correctly
  • Compare accuracy for A and B

Page 28:

Difficulty of extrinsic (in-vivo) evaluation of N-gram models

• Extrinsic evaluation
  • Time-consuming; can take days or weeks
• So
  • Sometimes use intrinsic evaluation: perplexity

Page 29:

Intuition of Perplexity

• The Shannon Game:
  • How well can we predict the next word?

  I always order pizza with cheese and ____
  The 33rd President of the US was ____
  I saw a ____

• Unigrams are terrible at this game. (Why?)
• A better model of a text is one which assigns a higher probability to the word that actually occurs.

Candidate continuations for the pizza example:

  mushrooms   0.1
  pepperoni   0.1
  anchovies   0.01
  …
  fried rice  0.0001
  …
  and         1e-100

Page 30:

Perplexity

The best language model is one that best predicts an unseen test set
• Gives the highest P(sentence)

Perplexity is the inverse probability of the test set, normalized by the number of words:

PP(W) = P(w_1 w_2 … w_N)^(−1/N) = (1 / P(w_1 w_2 … w_N))^(1/N)

Chain rule:

PP(W) = (∏_{i=1}^{N} 1 / P(w_i | w_1 … w_{i−1}))^(1/N)

For bigrams:

PP(W) = (∏_{i=1}^{N} 1 / P(w_i | w_{i−1}))^(1/N)

Minimizing perplexity is the same as maximizing probability
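A minimal sketch of bigram perplexity, computed in log space for the reasons given earlier; p is any conditional estimator, e.g. the p_mle sketch from before, and only the N transitions are counted:

```python
import math

def perplexity(tokens, p):
    """Bigram perplexity of a token sequence such as
    ["<s>", "I", "am", "Sam", "</s>"]; p(word, prev) = P(word | prev)."""
    log_sum = 0.0
    n = 0
    for prev, word in zip(tokens, tokens[1:]):
        log_sum += math.log(p(word, prev))  # fails if any probability is 0
        n += 1
    return math.exp(-log_sum / n)  # = P(w_1 ... w_N)^(-1/N)

print(perplexity(["<s>", "I", "am", "Sam", "</s>"], p_mle))  # ≈ 1.73
```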

Page 31:

Lower perplexity = better model

• Training 38 million words, test 1.5 million words, WSJ

  N-gram order:   Unigram   Bigram   Trigram
  Perplexity:         962      170       109

Page 32:

Evaluation and Perplexity

Language Modeling

Page 33:

Generalization and zeros

Language Modeling

Page 34:

The Shannon Visualization Method

• Choose a random bigram (<s>, w) according to its probability
• Now choose a random bigram (w, x) according to its probability
• And so on until we choose </s>
• Then string the words together

<s> I
    I want
      want to
           to eat
              eat Chinese
                  Chinese food
                          food </s>

I want to eat Chinese food
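A minimal sketch of this sampling loop, assuming a dict bigram_probs that maps each word to a list of (next_word, probability) pairs (a hypothetical data layout, built from the bigram table):

```python
import random

def generate(bigram_probs, max_len=50):
    """Grow a sentence one random bigram at a time until </s>."""
    word = "<s>"
    out = []
    for _ in range(max_len):
        candidates, weights = zip(*bigram_probs[word])
        word = random.choices(candidates, weights=weights)[0]
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)

# e.g. generate(bigram_probs) might yield "I want to eat Chinese food"
```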

Page 35:

Approximating Shakespeare

Page 36:

Shakespeare as corpus

• N = 884,647 tokens, V = 29,066
• Shakespeare produced 300,000 bigram types out of V² = 844 million possible bigrams.
  • So 99.96% of the possible bigrams were never seen (have zero entries in the table)
• Quadrigrams are worse: what's coming out looks like Shakespeare because it is Shakespeare

Page 37:

The Wall Street Journal is not Shakespeare (no offense)

Page 38:

The perils of overfitting

• N-grams only work well for word prediction if the test corpus looks like the training corpus
  • In real life, it often doesn't
• We need to train robust models that generalize!
• One kind of generalization: zeros!
  • Things that don't ever occur in the training set
  • But occur in the test set

Page 39:

Zeros

• Training set:
  … denied the allegations
  … denied the reports
  … denied the claims
  … denied the request

• Test set:
  … denied the offer
  … denied the loan

P("offer" | denied the) = 0

Page 40:

Zero probability bigrams

• Bigrams with zero probability
  • mean that we will assign 0 probability to the test set!
• And hence we cannot compute perplexity (can't divide by 0)!

Page 41:

Generalization and zeros

Language Modeling

Page 42:

Smoothing: Add-one (Laplace) smoothing

Language Modeling

Page 43:

Add-one estimation

• Also called Laplace smoothing
• Pretend we saw each word one more time than we did
• Just add one to all the counts!

• MLE estimate:

P_MLE(w_i | w_{i−1}) = c(w_{i−1}, w_i) / c(w_{i−1})

• Add-1 estimate:

P_Add-1(w_i | w_{i−1}) = (c(w_{i−1}, w_i) + 1) / (c(w_{i−1}) + V)
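A minimal sketch of the add-1 estimator, reusing the bigram/unigram Counters from the earlier sketch; V is assumed to be the number of word types in the training vocabulary:

```python
def p_add1(word, prev, bigrams, unigrams, V):
    """Laplace-smoothed bigram estimate:
    (c(prev, word) + 1) / (c(prev) + V)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

# V = len(unigrams)
# p_add1("offer", "the", bigrams, unigrams, V)  # nonzero even if unseen
```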

Page 44:

Berkeley Restaurant Corpus: Laplace-smoothed bigram counts

Page 45:

Laplace-smoothed bigrams

P*(w_n | w_{n−1}) = (C(w_{n−1} w_n) + 1) / (C(w_{n−1}) + V)

Page 46:

Reconstituted counts

c*(w_{n−1} w_n) = (C(w_{n−1} w_n) + 1) × C(w_{n−1}) / (C(w_{n−1}) + V)

Page 47:

Compare with raw bigram counts

Page 48:

Add-1 estimation is a blunt instrument

• So add-1 isn't used for N-grams:
  • We'll see better methods
• But add-1 is used to smooth other NLP models
  • For text classification
  • In domains where the number of zeros isn't so huge

Page 49:

Smoothing: Add-one (Laplace) smoothing

Language Modeling

Page 50:

Interpolation, Backoff

Language Modeling

Page 51:

Backoff and Interpolation

• Sometimes it helps to use less context
  • Condition on less context for contexts you haven't learned much about
• Backoff:
  • use trigram if you have good evidence,
  • otherwise bigram, otherwise unigram
• Interpolation:
  • mix unigram, bigram, trigram
• Interpolation works better

Page 52:

Linear Interpolation

• Simple interpolation:

P̂(w_n | w_{n−2} w_{n−1}) = λ_1 P(w_n | w_{n−2} w_{n−1}) + λ_2 P(w_n | w_{n−1}) + λ_3 P(w_n), with ∑_i λ_i = 1

• Lambdas conditional on context:

P̂(w_n | w_{n−2} w_{n−1}) = λ_1(w_{n−2} w_{n−1}) P(w_n | w_{n−2} w_{n−1}) + λ_2(w_{n−2} w_{n−1}) P(w_n | w_{n−1}) + λ_3(w_{n−2} w_{n−1}) P(w_n)
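A minimal sketch of simple interpolation over three estimators; the lambda values here are placeholders, since in practice the lambdas are tuned on held-out data:

```python
def p_interp(word, prev2, prev1, p_uni, p_bi, p_tri,
             lambdas=(0.1, 0.3, 0.6)):
    """Mix unigram, bigram, and trigram estimates; lambdas sum to 1."""
    l1, l2, l3 = lambdas
    return (l1 * p_uni(word)
            + l2 * p_bi(word, prev1)
            + l3 * p_tri(word, prev2, prev1))
```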

Page 53:

N-gram Smoothing Summary

• Add-1 smoothing:
  • OK for text categorization, not for language modeling
• Backoff and interpolation work better
• The most commonly used method:
  • Extended Interpolated Kneser-Ney
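The slides stop at the name; as a rough sketch only, plain interpolated Kneser-Ney for bigrams with a single absolute discount d looks like this (the "extended"/modified variant uses several count-dependent discounts):

```python
from collections import Counter

def make_kneser_ney(bigrams, unigrams, d=0.75):
    """P_KN(w | prev) = max(c(prev, w) - d, 0) / c(prev)
                        + lambda(prev) * P_continuation(w)."""
    # P_continuation(w): fraction of distinct bigram types that end in w.
    continuations = Counter(w for (_, w) in bigrams)
    total_types = len(bigrams)
    # Number of distinct words observed after each history.
    followers = Counter(prev for (prev, _) in bigrams)

    def p_kn(word, prev):
        discounted = max(bigrams[(prev, word)] - d, 0) / unigrams[prev]
        lam = d * followers[prev] / unigrams[prev]  # mass freed by discounting
        return discounted + lam * continuations[word] / total_types

    return p_kn
```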


Page 54:

Interpolation, Backoff

Language Modeling