Sequence Classification: Chunking & NER
Shallow Processing Techniques for NLP
Ling570, November 23, 2011
Roadmap
Named Entity Recognition
Chunking
HW #9
Named Entity Recognition
Roadmap: Named Entity Recognition
Definition
Motivation
Challenges
Common Approach
Named Entity Recognition
Task: Identify named entities in (typically) unstructured text
Typical entities: person names, locations, organizations, dates, times
Example
Microsoft released Windows Vista in 2007.
<ORG>Microsoft</ORG> released <PRODUCT>Windows Vista</PRODUCT> in <YEAR>2007</YEAR>
Entities are often application/domain specific:
Business intelligence: products, companies, features
Biomedical: genes, proteins, diseases, drugs, …
Example due to F. Xia
Named Entity Types
Common categories
Named Entity Examples
For common categories:
Why NER?
Machine translation:
Person names typically not translated, though possibly transliterated: Waldheim
Numbers: 9/11: date vs. ratio; 911: emergency phone number vs. simple number
Why NER?
Information extraction:
MUC task: joint ventures/mergers
Focus on company names, person names (CEO), valuations
Information retrieval:
Named entities are the focus of retrieval; in some data sets, 60+% of queries target NEs
Text-to-speech: 206-616-5728
Phone numbers read differently from other digit strings, and conventions differ by language
Challenges
Ambiguity:
Washington chose: D.C., state, George, etc.
Most digit strings are ambiguous
cat (95 results): CAT(erpillar) stock ticker, Computerized Axial Tomography, Chloramphenicol Acetyl Transferase, small furry mammal
Context & Ambiguity
Evaluation
Precision
Recall
F-measure
Resources
Online:
Name lists: baby names, who's who, newswire services, census.gov
Gazetteers
SEC listings of companies
Tools: LingPipe, OpenNLP, Stanford NLP toolkit
Approaches to NER
Rule/Regex-based:
Match names/entities in lists
Regex: e.g., \d\d/\d\d/\d\d matches 11/23/11; currency: \$\d+\.\d+ (see the sketch below)
Machine learning via sequence labeling: better for names, organizations
Hybrid
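As a concrete illustration of the regex-based approach, here is a minimal Python sketch that tags dates and currency amounts; the tag names (DATE, MONEY) and the exact patterns are illustrative assumptions, not part of the original lecture.

```python
import re

# Illustrative tag/pattern inventory for this sketch.
PATTERNS = [
    ("DATE",  re.compile(r"\b\d\d/\d\d/\d\d\b")),  # e.g., 11/23/11
    ("MONEY", re.compile(r"\$\d+\.\d\d")),         # e.g., $239.00
]

def tag_entities(text):
    """Return sorted (start, end, label, surface) spans for all matches."""
    spans = []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label, m.group()))
    return sorted(spans)

print(tag_entities("Vista shipped on 01/30/07 for $239.00."))
# [(17, 25, 'DATE', '01/30/07'), (30, 37, 'MONEY', '$239.00')]
```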
NER as Sequence Labeling
NER as Classification Task
Instance: token
Labels:
Position: B(eginning), I(nside), O(utside)
NER types: PER, ORG, LOC, NUM
Label: Type-Position, e.g., PER-B, PER-I, O, …
How many tags? (|NER types| x 2) + 1
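The tag-set arithmetic can be made concrete in a few lines of Python; this is a sketch of the Type-Position scheme exactly as described above.

```python
# Sketch: enumerate the Type-Position tag set described above.
NER_TYPES = ["PER", "ORG", "LOC", "NUM"]

# Each type gets a B and an I variant; a single O tag covers non-entities.
TAGS = [f"{t}-{p}" for t in NER_TYPES for p in ("B", "I")] + ["O"]

print(TAGS)
# ['PER-B', 'PER-I', 'ORG-B', 'ORG-I', 'LOC-B', 'LOC-I', 'NUM-B', 'NUM-I', 'O']
print(len(TAGS))  # (4 x 2) + 1 = 9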
NER as Classification: Features
What information can we use for NER?
Predictive tokens: e.g., MD, Rev, Inc., …
How general are these features? Language? Genre? Domain?
NER as Classification: Shape Features
Shape types:
lower (all lower case): e.g., cumming
capitalized (first letter uppercase): e.g., Washington
all caps (all letters capitalized): e.g., WHO
mixed case (mixed upper and lower case): e.g., eBay
capitalized with period: e.g., H.
ends with digit: e.g., A9
contains hyphen: e.g., H-P
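A minimal sketch of a shape-feature extractor covering these types; the function name, category strings, and rule ordering are illustrative assumptions, not a standard API.

```python
import re

def word_shape(token):
    """Map a token to one of the shape categories listed above."""
    if "-" in token:
        return "contains-hyphen"          # H-P
    if token[-1].isdigit():
        return "ends-with-digit"          # A9
    if re.fullmatch(r"[A-Z]\.", token):
        return "cap-period"               # H.
    if token.isupper():
        return "all-caps"                 # WHO
    if token.islower():
        return "lower"                    # cumming
    if token[0].isupper() and token[1:].islower():
        return "capitalized"              # Washington
    return "mixed-case"                   # eBay

for w in ["cumming", "Washington", "WHO", "eBay", "H.", "A9", "H-P"]:
    print(w, "->", word_shape(w))
```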
Example Instance Representation
Example
Sequence Labeling: Example
Evaluation
System: output of automatic tagging
Gold standard: true tags
Precision: # correct chunks / # system chunks
Recall: # correct chunks / # gold chunks
F-measure: F1 = 2PR / (P + R) balances precision & recall
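A minimal sketch of chunk-level precision, recall, and F1, assuming each side's chunks are represented as (start, end, label) spans and a chunk counts as correct only on an exact match; the span representation is an assumption of this sketch.

```python
def prf(system_chunks, gold_chunks):
    """Chunk-level precision, recall, and F1 over exact span matches."""
    correct = len(set(system_chunks) & set(gold_chunks))
    p = correct / len(system_chunks) if system_chunks else 0.0
    r = correct / len(gold_chunks) if gold_chunks else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = [(0, 1, "ORG"), (2, 4, "PRODUCT"), (5, 6, "YEAR")]
sys_out = [(0, 1, "ORG"), (2, 3, "PRODUCT"), (5, 6, "YEAR")]
print(prf(sys_out, gold))  # (0.666..., 0.666..., 0.666...)
```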
Evaluation
Standard measures: precision, recall, F-measure
Computed on entity types (CoNLL evaluation)
Classifiers vs. evaluation measures:
Classifiers optimize tag accuracy
Most common tag? O: most tokens aren't NEs
Evaluation measures focus on NEs
State of the art on standard tasks: PER, LOC: 0.92; ORG: 0.84
Hybrid Approaches
Practical systems exploit lists, rules, learning, …
Multi-pass:
Early passes: high precision, low recall
Later passes: noisier sequence learning
Hybrid system:
High-precision rules tag unambiguous mentions
Use string matching to capture substring matches
Tag items from domain-specific name lists
Apply sequence labeler
Chunking
Roadmap: Chunking
Definition
Motivation
Challenges
Approach
What is Chunking?
Form of partial (shallow) parsing
Extracts major syntactic units, but not full parse trees
Task: identify and classify flat, non-overlapping segments of a sentence
Basic non-recursive phrases
Correspond to major POS categories
May ignore some categories, e.g., base NP chunking
Create simple bracketing:
[NP The morning flight] [PP from] [NP Denver] [VP has arrived]
[NP The morning flight] from [NP Denver] has arrived
Why Chunking?
Used when a full parse is unnecessary, or infeasible or impossible (when?)
Extraction of subcategorization frames: identify verb arguments
e.g., VP -> NP; VP -> NP NP; VP -> NP to NP
Information extraction: who did what to whom
Summarization: base information, remove modifiers
Information retrieval: restrict indexing to base NPs
Processing Example
Tokenization: The morning flight from Denver has arrived
POS tagging: DT JJ N PREP NNP AUX V
Chunking: NP PP NP VP
Extraction: NP NP VP
etc.
Approaches
Finite-state approaches:
Grammatical rules in FSTs
Cascade to produce more complex structure
Machine learning: similar to POS tagging
Finite-State Rule-Based Chunking
Hand-crafted rules model phrases; typically application-specific
Left-to-right longest match (Abney 1996):
Start at beginning of sentence
Find longest matching rule
Greedy approach, not guaranteed optimal
Finite-State Rule-Based Chunking
Chunk rules: cannot contain recursion
NP -> Det Nominal: okay
Nominal -> Nominal PP: not okay
Examples: NP -> (Det) Noun* Noun; NP -> Proper-Noun; VP -> Verb; VP -> Aux Verb
Consider: Time flies like an arrow
Is this what we want? (See the sketch below.)
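A minimal Python sketch of the left-to-right longest-match strategy over a POS tag sequence. The rule inventory and simplified tag names are assumptions; the starred rule NP -> (Det) Noun* Noun is expanded into fixed-length instances. On the reading where both "Time" and "flies" are tagged Noun, greedy matching groups them into one NP, which illustrates why the result may not be what we want.

```python
# Rules as (label, right-hand side of POS tags); illustrative inventory.
RULES = [
    ("NP", ["Det", "Noun", "Noun"]),
    ("NP", ["Det", "Noun"]),
    ("NP", ["Noun", "Noun"]),
    ("NP", ["Noun"]),
    ("NP", ["Proper-Noun"]),
    ("VP", ["Aux", "Verb"]),
    ("VP", ["Verb"]),
]

def chunk(tags):
    """Greedy left-to-right longest match over a POS tag sequence."""
    i, chunks = 0, []
    while i < len(tags):
        best = None
        for label, rhs in RULES:
            if tags[i:i + len(rhs)] == rhs and (best is None or len(rhs) > len(best[1])):
                best = (label, rhs)
        if best:
            chunks.append((best[0], i, i + len(best[1])))
            i += len(best[1])
        else:
            i += 1  # no rule matches: token stays outside any chunk
    return chunks

# "Time flies like an arrow" on the reading Noun Noun Verb Det Noun:
print(chunk(["Noun", "Noun", "Verb", "Det", "Noun"]))
# [('NP', 0, 2), ('VP', 2, 3), ('NP', 3, 5)] -- "Time flies" as one NP
```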
Cascading FSTs
Richer partial parsing: pass the output of one FST to the next
Approach:
First stage: base phrase chunking
Next stage: larger constituents (e.g., PPs, VPs)
Highest stage: sentences
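A minimal sketch of one cascade pass in the same greedy longest-match style as the previous sketch: the pass rewrites the label sequence produced by the previous stage. The stage-2 rule inventory is an illustrative assumption.

```python
# Stage-2 rules over base-chunk labels; illustrative inventory.
STAGE2_RULES = [
    ("PP", ["Prep", "NP"]),  # preposition + base NP -> PP
    ("S",  ["NP", "VP"]),
]

def run_stage(labels, rules):
    """One cascade pass: greedy longest match over a label sequence."""
    i, out = 0, []
    while i < len(labels):
        best = None
        for label, rhs in rules:
            if labels[i:i + len(rhs)] == rhs and (best is None or len(rhs) > len(best[1])):
                best = (label, rhs)
        if best:
            out.append(best[0])
            i += len(best[1])
        else:
            out.append(labels[i])  # pass unchanged to the next stage
            i += 1
    return out

# Stage-1 output for "The morning flight from Denver has arrived",
# with the bare preposition kept as Prep:
stage1 = ["NP", "Prep", "NP", "VP"]
print(run_stage(stage1, STAGE2_RULES))  # ['NP', 'PP', 'VP']
```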
Example
Chunking by Classification
Model chunking as a task similar to POS tagging
Instance: tokens
Labels: simultaneously encode segmentation & identification
IOB (or BIO) tagging (also BIOE or BIOSE)
Segment: B(eginning), I(nternal), O(utside)
Identity: phrase category: NP, VP, PP, etc.
The morning flight from Denver has arrived
NP-B NP-I NP-I PP-B NP-B VP-B VP-I
NP-B NP-I NP-I O NP-B O O (base NP chunking only)
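A minimal sketch of decoding a Type-Position tag sequence back into chunks, using the example above; the function name and output format are assumptions of this sketch.

```python
def decode_bio(tokens, tags):
    """Recover (label, phrase) chunks from per-token Type-Position tags."""
    chunks, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag == "O":
            if current:
                chunks.append(current)
                current = None
            continue
        label, pos = tag.split("-")  # e.g., "NP-B" -> ("NP", "B")
        if pos == "B" or current is None or current[0] != label:
            if current:
                chunks.append(current)
            current = (label, [tok])  # start a new chunk
        else:
            current[1].append(tok)    # continue the current chunk
    if current:
        chunks.append(current)
    return [(label, " ".join(words)) for label, words in chunks]

tokens = "The morning flight from Denver has arrived".split()
tags = ["NP-B", "NP-I", "NP-I", "PP-B", "NP-B", "VP-B", "VP-I"]
print(decode_bio(tokens, tags))
# [('NP', 'The morning flight'), ('PP', 'from'), ('NP', 'Denver'), ('VP', 'has arrived')]
```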
Features for Chunking
What are good features?
Preceding chunk tags: for the 2 preceding words
Words: 2 preceding, current, 2 following
Parts of speech: 2 preceding, current, 2 following
Vector includes those features + the true label (see the sketch below)
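A minimal sketch of assembling the feature vector for one token position from these windows; the padding symbols and the dictionary representation are illustrative assumptions.

```python
def chunk_features(words, pos_tags, chunk_tags, i):
    """Window features for position i: 2 preceding tags, words and POS
    for 2 preceding / current / 2 following positions."""
    def at(seq, j, pad):  # safe indexed access with boundary padding
        return seq[j] if 0 <= j < len(seq) else pad
    return {
        "t-2": at(chunk_tags, i - 2, "<s>"), "t-1": at(chunk_tags, i - 1, "<s>"),
        "w-2": at(words, i - 2, "<s>"), "w-1": at(words, i - 1, "<s>"),
        "w0": words[i],
        "w+1": at(words, i + 1, "</s>"), "w+2": at(words, i + 2, "</s>"),
        "p-2": at(pos_tags, i - 2, "<s>"), "p-1": at(pos_tags, i - 1, "<s>"),
        "p0": pos_tags[i],
        "p+1": at(pos_tags, i + 1, "</s>"), "p+2": at(pos_tags, i + 2, "</s>"),
    }

words = "The morning flight from Denver has arrived".split()
pos = ["DT", "JJ", "N", "PREP", "NNP", "AUX", "V"]
chunks = ["NP-B", "NP-I", "NP-I", "PP-B", "NP-B", "VP-B", "VP-I"]
print(chunk_features(words, pos, chunks, 3))  # features for "from"
```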
Chunking as Classification: Example
Evaluation
System: output of automatic tagging
Gold standard: true tags, typically extracted from a parsed treebank
Precision: # correct chunks / # system chunks
Recall: # correct chunks / # gold chunks
F-measure: F1 = 2PR / (P + R) balances precision & recall
State-of-the-Art
Base NP chunking: 0.96
Complex phrases: learning: 0.92-0.94 (most learners achieve similar results); rule-based: 0.85-0.92
Limiting factors:
POS tagging accuracy
Inconsistent labeling (parse tree extraction)
Conjunctions:
Late departures and arrivals are common in winter
Late departures and cancellations are common in winter
HW #9
Building a MaxEnt POS Tagger
Q1: Build feature vector representations for POS tagging in SVMlight format
maxent_features.* training_file testing_file rare_wd_threshold rare_feat_threshold outdir
training_file, testing_file: like HW #7: w1/t1 w2/t2 … wn/tn
Filter rare words and infrequent features
Store vectors & intermediate representations in outdir
Feature Representations
Features: Ratnaparkhi, 1996, Table 1 (duplicated in MaxEnt slides)
Character issues: replace "," with "comma" and ":" with "colon"
Mallet and SVMlight formats use these characters as delimiters (see the sketch below)
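A minimal sketch of emitting one instance in SVMlight format (label followed by index:value pairs sorted by index), applying the comma/colon escaping first; the feature names and index mapping here are illustrative assumptions, not the HW #9 specification.

```python
def escape(feat):
    """Replace characters that SVMlight/Mallet treat as delimiters."""
    return feat.replace(",", "comma").replace(":", "colon")

def to_svmlight(label_id, feature_names, feat_index):
    """One binary-feature instance: 'label i:1 j:1 ...' sorted by index."""
    ids = sorted(feat_index[f] for f in feature_names if f in feat_index)
    return str(label_id) + " " + " ".join(f"{i}:1" for i in ids)

feat_index = {"curW=the": 1, "prevW=comma": 2, "prevT=DT": 3}
# "prevW=," becomes "prevW=comma" so it cannot be mistaken for a delimiter:
feats = [escape(f) for f in ["curW=the", "prevW=,"]]
print(to_svmlight(5, feats, feat_index))  # 5 1:1 2:1
```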
Q2: Experiments
Run MaxEnt classification using your training and test files
Compare effects of different thresholds on feature count, accuracy, and runtime
Note: Big files
This assignment will produce even larger sets of results than HW #8. Please gzip your tar files. If the DropBox won't accept the files, you can store them on patas; just let Sanghoun know where to find them.