BioNLP'09 and CRFs
Farzaneh Sarafraz
18 February 2009
BioNLP'09
Event rather than entity
Most entities are given
3 tasks
− Event detection and characterization
− Event argument recognition
− Negations and speculations
Example
"I kappa B/MAD3 masks the nuclear localization signal of NF-kappa B p65 and requires the transactivation domain to inhibit NF-kappa B p65 DNA binding."
Event: negative regulation
Trigger: masks
Theme1: the first p65
Cause: MAD3
Site: nuclear localization signal
Example
"In contrast, NF-kappa B p50 alone fails to stimulate kappa B-directed transcription, and based on prior in vitro studies, is not directly regulated by I kappa B."
Event: regulation
Theme1: this p50
Trigger: regulated
Negation: true for this event
Speculation: none
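The two annotated examples above can be held in a small record type. This is a minimal sketch only: the `Event` class and its field names are hypothetical and are not the shared-task file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """One annotated event. Field names are illustrative, not the official format."""
    event_type: str
    trigger: str
    theme: str
    cause: Optional[str] = None
    site: Optional[str] = None
    negated: bool = False
    speculated: bool = False


# First example sentence: "masks" triggers a negative regulation event.
masking = Event(
    event_type="negative regulation",
    trigger="masks",
    theme="p65 (first mention)",
    cause="MAD3",
    site="nuclear localization signal",
)

# Second example sentence: "regulated" triggers a negated regulation event.
not_regulated = Event(
    event_type="regulation",
    trigger="regulated",
    theme="p50",
    negated=True,
)

for event in (masking, not_regulated):
    print(event)
```

Optional fields default to "absent", so an event without a cause or site (as in the second example) needs no placeholder values.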
HMM and MEMM
Observations (X1, X2, ...)
Labels (Y1, Y2, ...)
p(Xi, Yi)
X ranges over observation sequences
Y ranges over label sequences
Requires independence assumption
i.e. each item is labelled independently
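For reference, the usual first-order factorisations of the two models can be written as follows. This is the standard textbook form, not something spelled out on the slide. The sequence length n and the start label Y_0 are notation assumed here.

```latex
\begin{align*}
% HMM: joint probability of observations and labels
p(X, Y) &= \prod_{i=1}^{n} p(Y_i \mid Y_{i-1}) \, p(X_i \mid Y_i) \\
% MEMM: conditional probability, normalised separately at each state
p(Y \mid X) &= \prod_{i=1}^{n} p(Y_i \mid Y_{i-1}, X_i)
\end{align*}
```

In the HMM, each observation X_i depends only on its own label Y_i, which is where the independence assumption enters.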
Conditional Random Field
p(Y | X)
Y: label sequence
X: observation sequence
Maximise p(Y | X) over label sequences Y
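A common way to write the linear-chain CRF is sketched below. The feature functions f_k, their weights \lambda_k and the normaliser Z(X) are standard notation assumed here; they do not appear on the slide.

```latex
\begin{align*}
p(Y \mid X) &= \frac{1}{Z(X)} \exp\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k \, f_k(Y_{i-1}, Y_i, X, i) \Big) \\
Z(X) &= \sum_{Y'} \exp\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k \, f_k(Y'_{i-1}, Y'_i, X, i) \Big) \\
\hat{Y} &= \arg\max_{Y} \, p(Y \mid X)
\end{align*}
```

Z(X) sums over every possible label sequence, so the probability is normalised once for the whole sequence rather than once per state.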
MEMM Label Bias Problem
Probability given the current state
− Transitions leaving a state compete only against each other, not across all states
− Per-state normalization
− Probability bias towards states with few outgoing transitions
− Demonstrated experimentally
Label Bias Example
Training data:
− A B C D
− A B D D
− A B C E
− A B D C
Model says:
− C -> D 50%
− C -> E 50%
Why predict E when D is much more common?
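The numbers in this example can be reproduced by counting transitions in the four training sequences and normalising per state. This is a simplified sketch: it uses plain relative-frequency counts and ignores observations, and the helper name `per_state_probs` is made up for illustration.

```python
from collections import Counter, defaultdict

# Training sequences from the slide.
training = ["A B C D", "A B D D", "A B C E", "A B D C"]

transition_counts = defaultdict(Counter)  # transition_counts[prev][next]
label_counts = Counter()                  # how often each label occurs overall

for line in training:
    labels = line.split()
    label_counts.update(labels)
    for prev, nxt in zip(labels, labels[1:]):
        transition_counts[prev][nxt] += 1


def per_state_probs(prev):
    """Normalise the transitions leaving `prev` so that they sum to 1."""
    total = sum(transition_counts[prev].values())
    return {nxt: n / total for nxt, n in transition_counts[prev].items()}


print(per_state_probs("C"))                  # {'D': 0.5, 'E': 0.5}
print(label_counts["D"], label_counts["E"])  # 4 1
```

After C, the per-state model rates D and E equally, although D occurs four times in the data and E only once.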
CRF Solution
Model probability of transitions and probability of states
CRFs
− Model the probability of transitions between states
− Probability is conditional on the current observation
− Not normalised per state
− Consider many "features" of observations
Features
"Edge features" as well as "vertex features"
− Word is capitalized
− Word ends in "ing"
− Label is "proper noun"
Features are important!
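A toy sketch of how vertex and edge features can combine into a score for a whole label sequence, normalised once over all candidate sequences. Everything here is an assumption for illustration: the label set, the hand-picked weights and the function names are hypothetical. The brute-force enumeration is only workable for very short inputs; real CRF implementations use dynamic programming.

```python
import itertools
import math

LABELS = ["proper noun", "verb"]  # hypothetical label set

# Hypothetical weights, chosen by hand for illustration only.
VERTEX_WEIGHTS = {("capitalized", "proper noun"): 1.0, ("ends_in_ing", "verb"): 1.5}
EDGE_WEIGHTS = {("proper noun", "verb"): 0.5, ("verb", "verb"): -0.5}


def observation_features(word):
    """Names of the observation tests that fire for one word."""
    feats = []
    if word[0].isupper():
        feats.append("capitalized")
    if word.endswith("ing"):
        feats.append("ends_in_ing")
    return feats


def score(words, labels):
    """Unnormalised log-score of one whole label sequence."""
    total = 0.0
    for i, (word, label) in enumerate(zip(words, labels)):
        # Vertex features: an observation test paired with the current label.
        for feat in observation_features(word):
            total += VERTEX_WEIGHTS.get((feat, label), 0.0)
        # Edge features: the previous label paired with the current label.
        if i > 0:
            total += EDGE_WEIGHTS.get((labels[i - 1], label), 0.0)
    return total


def sequence_probs(words):
    """p(Y | X) for every label sequence, normalised over all sequences at once."""
    candidates = list(itertools.product(LABELS, repeat=len(words)))
    weights = [math.exp(score(words, y)) for y in candidates]
    z = sum(weights)
    return {y: w / z for y, w in zip(candidates, weights)}


probs = sequence_probs(["MAD3", "binding"])
best = max(probs, key=probs.get)
print(best)  # ('proper noun', 'verb')
```

Because the normaliser `z` covers every candidate sequence, a state with few outgoing transitions gains no automatic advantage, unlike with per-state normalisation.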
End.