
Part-of-Speech Tagging & Parsing

Chloé Kiddon

(slides adapted and/or stolen outright from Andrew McCallum, Christopher Manning, and Julia Hockenmaier)

Announcements

We do have a Hadoop cluster! It's offsite. I need to know all groups who want it!

You all have accounts for MySQL on the cubist machine (cubist.cs.washington.edu). Your folder is /projects/instr/cse454/a-f

I'll have a better email out this afternoon, I hope.

Grading HW1 should be finished by next week.

Timely warning

POS tagging and parsing are two large topics in NLP, usually covered in 2-4 lectures.

We have an hour and twenty minutes.

Part-of-speech tagging

We often want to know what part of speech (POS) or word class (noun, verb, ...) should be assigned to each word in a piece of text.

Part-of-speech tagging assigns POS labels to words

JJ        JJ    NNS   VBP   RB
Colorless green ideas sleep furiously.

Second line first. Why?

Why do we care?

Parsing (we'll come to this later)

Speech synthesis: INsult or inSULT, overFLOW or OVERflow, REad or reAD

Information extraction: entities, relations. "Romeo loves Juliet" vs. "lost loves found again"

Machine translation

Penn Treebank Tagset

1. CC  Coordinating conjunction
2. CD  Cardinal number
3. DT  Determiner
4. EX  Existential there
5. FW  Foreign word
6. IN  Preposition or subordinating conjunction
7. JJ  Adjective
8. JJR  Adjective, comparative
9. JJS  Adjective, superlative
10. LS  List item marker
11. MD  Modal
12. NN  Noun, singular or mass
13. NNS  Noun, plural
14. NP  Proper noun, singular
15. NPS  Proper noun, plural
16. PDT  Predeterminer
17. POS  Possessive ending
18. PP  Personal pronoun
19. PP$  Possessive pronoun
20. RB  Adverb
21. RBR  Adverb, comparative
22. RBS  Adverb, superlative
23. RP  Particle
24. SYM  Symbol
25. TO  to
26. UH  Interjection
27. VB  Verb, base form
28. VBD  Verb, past tense
29. VBG  Verb, gerund or present participle
30. VBN  Verb, past participle
31. VBP  Verb, non-3rd person singular present
32. VBZ  Verb, 3rd person singular present
33. WDT  Wh-determiner
34. WP  Wh-pronoun
35. WP$  Possessive wh-pronoun
36. WRB  Wh-adverb

Ambiguity

Buffalo buffalo buffalo. What makes this task hard?

THE buffalo FROM Buffalo WHO ARE buffaloed BY buffalo FROM Buffalo ALSO buffalo THE buffalo FROM Buffalo.

How many words are ambiguous?

Hockenmaier

These are just the tags words appeared with in the corpus. There may be unseen word/tag combinations. The main problem in part-of-speech tagging is how to choose among the possible tags for ambiguous words.

Naïve approach!

Pick the most common tag for the word.

91% success rate!
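As a concrete sketch of this baseline (my own illustration, with a made-up toy corpus), the whole tagger is just a lookup table:

```python
from collections import Counter, defaultdict

# Hypothetical tagged corpus: a flat list of (word, tag) pairs.
tagged_corpus = [("the", "DT"), ("can", "MD"), ("can", "NN"), ("can", "MD")]

# Count how often each word appears with each tag.
tag_counts = defaultdict(Counter)
for word, tag in tagged_corpus:
    tag_counts[word][tag] += 1

def most_common_tag(word, default="NN"):
    """Tag every occurrence of a word with its most frequent training tag."""
    if word in tag_counts:
        return tag_counts[word].most_common(1)[0][0]
    return default  # unseen word: fall back to a default tag

print(most_common_tag("can"))  # MD (seen twice as a modal, once as a noun)
```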

Andrew McCallum

We have more information

We are not just tagging words, we are tagging sequences of words.

For a sequence of words W: W = w1 w2 w3 ... wn

We are looking for a sequence of tags T: T = t1 t2 t3 ... tn

where P(T|W) is maximized.

Andrew McCallum

In an ideal world

Find all instances of a sequence in the dataset and pick the most common sequence of tags.

Count("heat oil in a large pot") = 0 ???? Uhh...

Sparse data problem

Most sequences will never occur, or will occur too few times for good predictions.

Bayes Rule

To find P(T|W), use Bayes Rule:

P(T|W) = P(W|T) * P(T) / P(W)

We can maximize P(T|W) by maximizing P(W|T)*P(T)

Andrew McCallum

? What is Bayes rule? P(W) won't change. Words don't change.

Finding P(T)

Generally, by the chain rule:

P(T) = P(t1) * P(t2|t1) * P(t3|t1 t2) * ... * P(tn|t1 ... tn-1)

Usually not feasible to accurately estimate more than tag bigrams (possibly trigrams)

In some way (supervised or unsupervised), we are going to need to learn these probabilities.

Markov assumption

Assume that the probability of a tag depends only on the tag that came directly before it:

P(ti | t1 ... ti-1) ≈ P(ti | ti-1)

Then:

P(T) ≈ P(t1) * P(t2|t1) * P(t3|t2) * ... * P(tn|tn-1)

Only need to count tag bigrams.

Putting it all together

We can similarly assume that each word depends only on its own tag:

P(W|T) ≈ P(w1|t1) * P(w2|t2) * ... * P(wn|tn)

So:

P(W|T) * P(T) ≈ P(w1|t1) P(t1) * P(w2|t2) P(t2|t1) * ... * P(wn|tn) P(tn|tn-1)

And the final equation becomes:

T* = argmax over T of P(w1|t1) P(t1) * product over i = 2..n of P(wi|ti) P(ti|ti-1)
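To make the final equation concrete, here is a minimal sketch that scores one candidate tag sequence under the bigram model; the probability tables are invented toy values, not from the slides:

```python
import math

# Toy probability tables (made-up values, for illustration only).
p_tag_init = {"DT": 0.4, "NN": 0.3}                            # P(t1)
p_tag_bigram = {("DT", "NN"): 0.5, ("NN", "VBZ"): 0.3}         # P(ti | ti-1)
p_word_given_tag = {("the", "DT"): 0.4, ("dog", "NN"): 0.01}   # P(wi | ti)

def log_score(words, tags):
    """log [ P(w1|t1) P(t1) * prod_i P(wi|ti) P(ti|ti-1) ]"""
    score = math.log(p_tag_init[tags[0]])
    score += math.log(p_word_given_tag[(words[0], tags[0])])
    for i in range(1, len(words)):
        score += math.log(p_tag_bigram[(tags[i - 1], tags[i])])
        score += math.log(p_word_given_tag[(words[i], tags[i])])
    return score

print(log_score(["the", "dog"], ["DT", "NN"]))
```

Tagging then means searching for the tag sequence T that maximizes this score, which is exactly what Viterbi does below without enumerating every T.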

Process as an HMM

Start in an initial state t0 with probability π(t0)
Move from state ti to tj with transition probability a(tj|ti)
In state ti, emit symbol wk with emission probability b(wk|ti)

[Slide figure: a small example HMM over the tags Det, Adj, Noun, Verb, with transition probabilities between tags and emission probabilities such as P(the|Det) = .4, P(low|Adj) = .04, P(good|Adj) = .02, P(deal|Noun) = .0001, P(price|Noun) = .001]

Three Questions for HMMs

Evaluation: Given a sequence of words W = w1 w2 w3 ... wn and an HMM model λ, what is P(W|λ)?

Decoding: Given a sequence of words W and an HMM model λ, find the most probable parse T = t1 t2 t3 ... tn

Learning: Given a tagged (or untagged) dataset, find the HMM λ that maximizes the probability of the data

? Remember the three questions

Three Questions for HMMs

Evaluation: Given a sequence of words W = w1 w2 w3 ... wn and an HMM model λ, what is P(W|λ)?

Tagging: Given a sequence of words W and an HMM model λ, find the most probable parse T = t1 t2 t3 ... tn

Learning: Given a tagged (or untagged) dataset, find the HMM λ that maximizes the probability of the data

? Remember the three questions

Tagging

Need to find the most likely tag sequence given a sequence of words, i.e. the T that maximizes P(W|T)*P(T) and thus P(T|W).

Use Viterbi!

Trellis

[Slide figures: a trellis with the N tags t1 ... tj ... tN on one axis and the time steps (words) on the other. At every one of the n time steps (= words), the HMM can be in one of N states (= tags).]

Evaluation task: P(w1, w2, ..., wi), given that we are in tag tj at time i
Decoding task: max log P(w1, w2, ..., wi), given that we are in tag tj at time i

Tagging initialization:

δ1(j) = log P(w1|tj) + log P(tj)

Tagging recursive step:

δi(j) = max over k of [ δi-1(k) + log P(tj|tk) ] + log P(wi|tj)

Pick the best trellis cell for the last word.

Use back pointers to pick the best sequence.
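Putting the initialization, recursion, and back pointers together, here is a compact sketch of Viterbi decoding in log space. The dictionary-based representation of the π, a, b tables is my own choice; missing entries are treated as log probability minus infinity:

```python
def viterbi(words, tags, log_pi, log_a, log_b):
    """Most probable tag sequence for `words`.
    log_pi[t] = log P(t) for the first tag, log_a[(prev, t)] = log P(t|prev),
    log_b[(w, t)] = log P(w|t); missing entries count as minus infinity."""
    NEG_INF = float("-inf")
    def a(p, t): return log_a.get((p, t), NEG_INF)
    def b(w, t): return log_b.get((w, t), NEG_INF)

    # Initialization: delta_1(j) = log P(w1|tj) + log P(tj)
    delta = [{t: log_pi.get(t, NEG_INF) + b(words[0], t) for t in tags}]
    back = [{}]

    # Recursion: delta_i(j) = max_k [delta_{i-1}(k) + log P(tj|tk)] + log P(wi|tj)
    for i in range(1, len(words)):
        delta.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda k: delta[i - 1][k] + a(k, t))
            delta[i][t] = delta[i - 1][prev] + a(prev, t) + b(words[i], t)
            back[i][t] = prev

    # Best trellis cell for the last word, then follow back pointers.
    best = max(tags, key=lambda t: delta[-1][t])
    path = [best]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))
```

Filling the trellis costs O(n * N^2): for each of the n words, each of the N tags takes a max over N predecessors.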

Learning a POS-tagging HMM

Estimate the parameters in the model using counts.

With smoothing and models of 40-80 tags, this model can get 95-96% correct tagging.
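A sketch of this count-based estimation, using add-alpha smoothing as a simple stand-in for whatever smoothing scheme a real tagger would use; the corpus format (a list of sentences of (word, tag) pairs) is hypothetical:

```python
from collections import Counter

def estimate_hmm(tagged_sentences, alpha=1.0):
    """Estimate HMM parameters from counts with add-alpha smoothing."""
    trans, emit, totals = Counter(), Counter(), Counter()
    vocab, tags = set(), set()
    for sent in tagged_sentences:
        prev = "<s>"  # pseudo-tag marking the start of a sentence
        for word, tag in sent:
            trans[(prev, tag)] += 1   # count tag bigrams
            emit[(tag, word)] += 1    # count word/tag pairs
            totals[tag] += 1
            vocab.add(word)
            tags.add(tag)
            prev = tag
    totals["<s>"] = len(tagged_sentences)

    def p_trans(prev, tag):  # smoothed P(tag | prev)
        return (trans[(prev, tag)] + alpha) / (totals[prev] + alpha * len(tags))

    def p_emit(tag, word):   # smoothed P(word | tag)
        return (emit[(tag, word)] + alpha) / (totals[tag] + alpha * len(vocab))

    return p_trans, p_emit
```

The smoothing matters because unseen word/tag and tag/tag pairs would otherwise get probability zero and veto every sequence containing them.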

Problem with supervised learning

Requires a large hand-labeled corpus
Doesn't scale to new languages
Expensive to produce
Doesn't scale to new domains

Instead, apply unsupervised learning with Expectation Maximization (EM):
Expectation step: calculate the probability of all sequences using the current set of parameters
Maximization step: re-estimate the parameters using the results from the E-step

? Reasons: expensive to produce in both time and money, unless you get undergrads to do the tagging. Compute the forward and backward probabilities for the given model parameters and our observations. With EM, you still need to know your tagset beforehand.

Lots of other techniques!

Trigram models (more common)
Text normalization
Error-based transformation learning (Brill learning): a rule-based system. Calculate initial states (proper noun detection, tagged corpus), then acquire transformation rules, e.g. change VB to NN when the previous word was an adjective, as in "The long race finally ended" (see the sketch below)
Minimally supervised learning: unlabeled data, but have a dictionary
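As a toy sketch of one Brill-style transformation (the rule is the slide's example; the initial mistagging is invented):

```python
def apply_rule(tagged, from_tag="VB", to_tag="NN", prev_tag="JJ"):
    """Brill-style transformation: change from_tag to to_tag whenever the
    previous word is tagged prev_tag."""
    out = list(tagged)
    for i in range(1, len(out)):
        word, tag = out[i]
        if tag == from_tag and out[i - 1][1] == prev_tag:
            out[i] = (word, to_tag)
    return out

# Initial (mistaken) tagging from a baseline tagger: "race" tagged as a verb.
sent = [("The", "DT"), ("long", "JJ"), ("race", "VB"),
        ("finally", "RB"), ("ended", "VBD")]
print(apply_rule(sent))  # "race" becomes NN because it follows the adjective "long"
```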

Seems like POS-tagging is solved?

Penn Treebank POS-tagging accuracy approaches the human ceiling: human agreement is 97%.

In other languages, not so much

Interannotator agreement. Other languages with more complex morphology need much larger tagsets for tagging to be useful, and will contain many more distinct word forms in corpora of the same size.

So now we are HMM Masters

We can use HMMs to:
Tag words in a sentence with their parts of speech
Extract entities and other information from a sentence

Can we use them to determine syntactic categories?

Syntax

Refers to the study of the way words are arranged together, and the relationships between them.

Prescriptive vs. Descriptive

The goal of syntax is to model the knowledge that people unconsciously have about the grammar of their native language.

Parsing extracts the syntax from a sentence

? Prescriptive vs. descriptive? "That was an epic fail." Cs getting too mainstream. ? Star Trek intro: "to boldly go where no man has gone before." A precise way to define and describe the structure of sentences.

Parsing applications

High-precision Question-Answering systems

Named Entity Recognition (NER) and information extraction

Opinion extraction in product reviews

Improved interaction during computer applications/games (fill in examples)

Basic English sentence structure

[Slide figure: "Ike eats cake", labeled Noun (subject), Verb (head), Noun (object)]

Hockenmaier

Can we build an HMM?

Noun (subject): Ike, dogs, ...  Verb (head): eat, sleep, ...  Noun (object): cake, science, ...

Hockenmaier

Words take arguments

I eat cake.
I sleep cake.
I give you cake.
I give cake. Hmm...
I eat you cake???

Subcategorization
Intransitive verbs: take only a subject
Transitive verbs: take a subject and an object
Ditransitive verbs: take a subject, object, and indirect object

Selectional preferences
The object of eat should be edible

Hockenmaier

A better model

Noun (subject), then either a Transitive Verb (head) followed by a Noun (object), or an Intransitive Verb (head). Transitive: eat, like, ... Intransitive: sleep, run, ... Nouns: Ike, dogs, cake, science, ...

Hockenmaier

Language has recursive properties

Colonel Mustard killed Mrs. Peacock
Colonel Mustard killed Mrs. Peacock in the library
Colonel Mustard killed Mrs. Peacock in the library with the candlestick

Colonel Mustard killed Mrs. Peacock in the library with the candlestick at midnight
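To make the recursion concrete, here is a toy grammar sketch of my own (not from the slides) in which NP and VP can keep absorbing prepositional phrases, so sentences like the ones above can nest indefinitely:

```python
import random

# Toy recursive grammar: NP -> NP PP and VP -> VP PP allow unbounded nesting.
grammar = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["NP", "PP"]],
    "VP":  [["V", "NP"], ["VP", "PP"]],
    "PP":  [["P", "NP"]],
    "Det": [["the"]],
    "N":   [["library"], ["candlestick"], ["colonel"]],
    "V":   [["killed"]],
    "P":   [["in"], ["with"], ["at"]],
}

def generate(symbol="S", depth=0):
    """Randomly expand a symbol; past a depth limit, prefer non-recursive rules."""
    if symbol not in grammar:          # terminal word
        return [symbol]
    rules = grammar[symbol]
    if depth > 3:
        rules = [r for r in rules if symbol not in r] or rules
    words = []
    for sym in random.choice(rules):
        words.extend(generate(sym, depth + 1))
    return words

print(" ".join(generate()))  # e.g. "the colonel killed the candlestick in the library"
```

An HMM's flat chain of states has no way to express this nesting, which is the point of the next slide.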

HMMs can't generate hierarchical structure

Colonel Mustard killed Mrs. Peacock in the library with the candlestick at midnight.

Does Mustard have the candlestick? Or is the candlestick just sitting in the library?

Memoryless: can't make long-range decisions about attachments. Need a better model.
