From Baby Steps to Leapfrog:
How “Less is More”in Unsupervised Dependency Parsing
Valentin I. Spitkovsky
with Hiyan Alshawi (Google Inc.) and Daniel Jurafsky (Stanford University)
Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 1 / 30
Overview
Idea: (At Least) Two Axes Worth Scaffolding

Model (or Algorithmic) Complexity [classic NLP]
— word alignment (unsupervised), e.g., IBM models 1–5 (Brown et al., 1993)
— parsing (supervised), e.g., “coarse-to-fine” grammars (Charniak and Johnson, 2005; Petrov, 2009)

Data (or Problem / Task) Complexity [rare in NLP]
— reinforcement learning, e.g., robot navigation (Singh, 1992; Sanger, 1994)
— closest in NLP: cautious named entity classification (Collins and Singer, 1999; Yarowsky, 1995)
Overview
Outline: Three Data-Complexity-Aware Techniques

Baby Steps: scaffolding on data complexity — iterative, requires no initialization

Less is More: filtering by data complexity — batch, capable of using a good initializer

Leapfrog: a combination (best of both worlds) — intended as an efficiency hack (but performs best)
The Problem
Problem: Unsupervised Learning of Parsing

Input: Raw Text (Sentences, Tokens and POS-tags)
... By most measures, the nation’s industrial sector is now
growing very slowly — if at all. Factory payrolls fell in
September. So did the Federal Reserve ...

Output: Syntactic Structures (and a Probabilistic Grammar)

Example dependency analysis (♦ marks the root):
NN      NNS      VBD  IN  NN        ♦
|       |        |    |   |         |
Factory payrolls fell in  September .
The Problem
Motivation: Unsupervised (Dependency) Parsing

Insert your favorite reason(s) why you’d like to parse anything in the first place...

... adjust for any data without reference treebanks:
— i.e., exotic languages and/or genres (e.g., legal).

Potential applications:
◮ machine translation — word alignment, phrase extraction, reordering;
◮ web search — retrieval, query refinement;
◮ question answering, speech recognition, etc.
State-of-the-Art
State-of-the-Art: Directed Dependency Accuracy

42.2% on Section 23 (all sentences) of WSJ (Cohen and Smith, 2009)

31.7% for the (right-branching) baseline (Klein and Manning, 2004)

Scoring example:
NN      NNS      VBD  IN  NN        ♦
|       |        |    |   |         |
Factory payrolls fell in  September .

Directed Score: 3/5 = 60% (baseline: 2/5 = 40%);
Undirected Score: 4/5 = 80% (baseline: 4/5 = 80%).
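The directed and undirected scores above are straightforward to compute. A minimal sketch, assuming a head-array encoding of parses (token i+1's head is `heads[i]`, with 0 standing for the root ♦) that is not part of the original deck:

```python
def dependency_accuracy(guess, gold):
    """Return (directed, undirected) head accuracy for 1-indexed head arrays.

    guess[i] and gold[i] give the head of token i+1; 0 marks the root.
    """
    n = len(gold)
    directed = sum(g == r for g, r in zip(guess, gold)) / n
    # Undirected scoring also credits an arc whose direction is flipped:
    # guessing head g for token i+1 counts if gold makes i+1 the head of g.
    undirected = sum(
        g == r or (0 < g <= n and gold[g - 1] == i + 1)
        for i, (g, r) in enumerate(zip(guess, gold))
    ) / n
    return directed, undirected
```

For instance, on a five-word sentence, a parse with three correct heads plus one arc pointing the wrong way scores 3/5 directed and 4/5 undirected, as in the example.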
State-of-the-Art
State-of-the-Art: A Brief History

1992 — word classes (Carroll and Charniak)
1998 — greedy linkage via mutual information (Yuret)
2001 — iterative re-estimation with EM (Paskin)
2004 — right-branching baseline; valence (DMV) (Klein and Manning)
2004 — annealing techniques (Smith and Eisner)
2005 — contrastive estimation (Smith and Eisner)
2006 — structural biasing (Smith and Eisner)
2007 — common cover link representation (Seginer)
2008 — logistic normal priors (Cohen et al.)
2009 — lexicalization and smoothing (Headden et al.)
2009 — soft parameter tying (Cohen and Smith)
State-of-the-Art
State-of-the-Art: Dependency Model with Valence

a head-outward model, with word classes and valence/adjacency (Klein and Manning, 2004):
each head h generates arguments a1, a2, ... in each direction, then STOP.

P(t_h) = ∏_{dir ∈ {L, R}} P_STOP(c_h, dir, 1[n = 0]) ·
         ∏_{i=1}^{n} P(t_{a_i}) · P_ATTACH(c_h, dir, c_{a_i}) · (1 − P_STOP(c_h, dir, 1[i = 1])),

where n = |args(h, dir)| and the indicator 1[·] is the adjacency (adj) flag.
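Read as a generative story, the formula says: in each direction, the head repeatedly decides whether to stop; each non-stop decision attaches one argument, which recursively generates its own subtree. A sketch of that product, under assumed encodings (nested tuples `(head_class, left_args, right_args)` for trees, and dictionary probability tables) that are illustrative stand-ins, not the paper's code:

```python
def dmv_tree_prob(tree, p_stop, p_attach):
    """Probability of a dependency tree under a DMV-style head-outward model."""
    head, left, right = tree
    prob = 1.0
    for direction, args in (("L", left), ("R", right)):
        # Final stop decision; it is "adjacent" iff no dependents were generated.
        prob *= p_stop[(head, direction, len(args) == 0)]
        for i, arg in enumerate(args):
            # Continue (don't stop; adjacent only for the first argument),
            # attach the argument's class, then recurse for P(t_{a_i}).
            prob *= (1.0 - p_stop[(head, direction, i == 0)])
            prob *= p_attach[(head, direction, arg[0])]
            prob *= dmv_tree_prob(arg, p_stop, p_attach)
    return prob
```

With all table entries set to 0.5, a verb heading a single left noun dependent multiplies six 0.5 factors (two stop decisions for the noun, plus the verb's continue, attach, and two stop decisions).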
State-of-the-Art
State-of-the-Art: Unsupervised Learning Engine

EM, via inside-outside re-estimation (Baker, 1979)

[Figure: a chart over w1 ... wp−1 [wp ... wq] wq+1 ... wm, with inside score β for
nonterminal N^j over the span and outside score α for the rest, rooted at N^1
(Manning and Schutze, 1999) — treated here as a BLACK BOX.]
State-of-the-Art
State-of-the-Art: The Standard Corpus

Training: WSJ10 (Klein, 2005)
◮ the Wall Street Journal section of the Penn Treebank Project (Marcus et al., 1993)
◮ ... stripped of punctuation, etc.
◮ ... filtered down to sentences left with no more than 10 POS tags;
◮ ... and converted to reference dependencies using “head percolation rules” (Collins, 1999).

Evaluation: Section 23 of WSJ∞ (all sentences).
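The WSJ10 preprocessing above is easy to reproduce in miniature. A hedged sketch, where the punctuation-tag set is a simplified stand-in for the Penn Treebank's full inventory:

```python
# Simplified punctuation-style tags to strip before counting length.
PUNCT_TAGS = {".", ",", ":", "``", "''", "-LRB-", "-RRB-", "#", "$"}

def make_wsj_k(tagged_sentences, k=10):
    """Keep sentences left with no more than k tokens after stripping punctuation."""
    kept = []
    for sent in tagged_sentences:
        stripped = [(word, tag) for word, tag in sent if tag not in PUNCT_TAGS]
        if 0 < len(stripped) <= k:
            kept.append(stripped)
    return kept
```

The same function with varying k yields the graduated WSJ1, WSJ2, ... corpora used later in the talk.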
State-of-the-Art
State-of-the-Art: The Standard Corpus

[Figure: corpus size of WSJk as the length cutoff k grows from 5 to 45; left axis:
Sentences (1,000s), 5–45; right axis: Tokens (1,000s), 100–900.]
(At Least) Two Issues
Issue I: Why so little data?

extra unlabeled data helps semi-supervised parsing (Suzuki et al., 2009)

yet state-of-the-art unsupervised methods use even less than what’s available for supervised training...

we will explore (three) judicious uses of data and simple, scalable machine learning techniques
(At Least) Two Issues
Issue II: Non-convex objective...

maximizing the probability of data (sentences):

θ_UNS = argmax_θ Σ_s log P_θ(s), where P_θ(s) = Σ_{t ∈ T(s)} P_θ(t)

supervised objective would be convex (counting):

θ_SUP = argmax_θ Σ_s log P_θ(t*(s)).

in general, θ_SUP ≠ θ_UNS ... (see CoNLL)

initialization matters!
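The gap between the two objectives can be seen numerically. A toy sketch with made-up tree probabilities, purely for illustration:

```python
import math

def unsupervised_ll(tree_probs_per_sentence):
    # sum_s log sum_{t in T(s)} P(t): score the marginal of each sentence.
    return sum(math.log(sum(probs.values())) for probs in tree_probs_per_sentence)

def supervised_ll(tree_probs_per_sentence, gold_trees):
    # sum_s log P(t*(s)): score only the gold tree of each sentence.
    return sum(math.log(probs[g])
               for probs, g in zip(tree_probs_per_sentence, gold_trees))
```

With one sentence whose candidate trees have probabilities {t1: 0.3, t2: 0.1} and gold tree t2, the unsupervised objective sees log 0.4 while the supervised one sees log 0.1: raising the marginal can reward the wrong tree's probability mass.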
(At Least) Two Issues
Issues: The Lay of the Land

[Figure: Directed Dependency Accuracy (%) on WSJk, for cutoffs k from 5 to 40
(y-axis 20–90%), comparing Uninformed and Oracle initialization with
K&M’s Ad-Hoc Harmonic Init.]
Baby Steps
Idea I: Baby Steps ... as Non-convex Optimization

global non-convex optimization is hard ...

meta-heuristic: take guesswork out of local search
— start with an easy (convex) case
— slowly extend it to the fully complex target task
— take tiny (cautious) steps in the problem space
— ... try not to stray far from relevant neighborhoods in the solution space

base case: sentences of length one (trivial — no init)

incremental step: smooth WSJk; re-init WSJ(k + 1)

... this really is grammar induction!
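The schedule above can be sketched as a short training loop. `train_em`, `smooth`, and `uniform_model` are hypothetical stand-ins for the actual estimation machinery, not the paper's code:

```python
def baby_steps(corpus_by_length, max_len, train_em, smooth, uniform_model):
    """Train on WSJ1, then repeatedly warm-start on WSJ(k + 1) from WSJk."""
    model = uniform_model  # base case: length-1 sentences need no clever init
    data = []
    for k in range(1, max_len + 1):
        data.extend(corpus_by_length.get(k, []))    # WSJk holds lengths 1..k
        model = train_em(data, init=smooth(model))  # re-init from previous step
    return model
```

Each step trains on only a slightly harder corpus than the last, so the model never strays far from a solution that already explains the simpler data.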
Baby Steps
Idea I: Baby Steps ... as Graduated Learning

WSJ1 — Atone (verbs!)
WSJ2 — Darkness fell. / It is. / Judge Not (nouns!)
WSJ3 — Become a Lobbyist / But many have. / They didn’t. (determiners!)
Baby Steps

Idea I: Baby Steps ... and Related Notions

shaping (Skinner, 1938)
less is more (Kail, 1984; Newport, 1988; 1990)
starting small (Elman, 1993)
◮ scaffold on model complexity [restrict memory]
◮ scaffold on data complexity [restrict input]
controversy! (Rohde and Plaut, 1999)
stepping stones (Brown et al., 1993)
coarse-to-fine (Charniak and Johnson, 2005)
curriculum learning (Bengio et al., 2009)
continuation methods (Allgower and Georg, 1990)
successive approximations!

Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 17 / 30
Baby Steps

Idea I: Baby Steps ... Results!

[Plot: Directed Dependency Accuracy (%) on WSJk, for sentence-length cutoffs k = 5..40; curves shown: Uninformed, Oracle, K&M (Ad-Hoc Harmonic Init), Baby Steps]

Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 18 / 30
Baby Steps
Idea I: Baby Steps ... Concerns?
ignores a good initializer
unnecessarily meticulous
excruciatingly slow!
about a year behind state-of-the-art (on long sentences)
Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 19 / 30
Less is More
Idea II: Less is More
short sentences are not representative (and few)
long sentences are overwhelmingly difficult ...
is there a “sweet spot” data gradation?
perhaps train where Baby Steps flatlines!
Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 20 / 30
Less is More

Idea II: Less is More ... the Learning Curve

[Plot: Cross-entropy h (in bits per token) on WSJ45, as a function of training gradation WSJk, k = 5..45; a knee at low k is followed by a tight, flat, asymptotic bound over [7, 15]]

— automatically detect the knee: [7, 15]
— train at the “sweet spot” gradation: WSJ15

Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 21 / 30
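One simple way to detect such a knee automatically (a sketch only; the talk does not specify this exact method) is to pick the point on the curve farthest from the chord joining its endpoints — where the fast drop gives way to the flat asymptote:

```python
import math

def find_knee(xs, ys):
    """Detect the 'knee' of a decreasing learning curve: the point
    farthest from the straight line joining the two endpoints."""
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    a, b = y1 - y0, x0 - x1          # chord as a*x + b*y + c = 0
    c = x1 * y0 - x0 * y1
    norm = math.hypot(a, b)
    dists = [abs(a * x + b * y + c) / norm for x, y in zip(xs, ys)]
    return xs[max(range(len(xs)), key=dists.__getitem__)]

# Synthetic curve shaped like the slide: a fast drop, then a flat tail.
xs = list(range(1, 11))
ys = [5.0, 4.0, 3.4, 3.1, 3.05, 3.02, 3.01, 3.0, 3.0, 3.0]
knee = find_knee(xs, ys)  # -> 4
```

Training is then restricted to the gradation at (or just past) the detected knee, which is where the cross-entropy bound becomes tight and flat.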
Less is More

Idea II: Less is More ... Results!

[Plot: Directed Dependency Accuracy (%) on WSJ40, as a function of training gradation WSJk, k = 5..40; curves shown: Oracle, Baby Steps, K&M*, with the bracketed “Less is More” region at the sweet spot]

Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 22 / 30
Less is More
Idea II: Less is More ... Concerns?
discards most of the data
beats state-of-the-art (on long sentences, off WSJ15)
ignores a decent complementary initialization strategy
Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 23 / 30
Leapfrog
Idea III: Leapfrog ... a Hack
use both good systems!
thorough training up to WSJ15, where it’s cheap
use both good initializers (mix their best parse trees)
execute just a few steps of EM where it’s expensive
hop on from WSJ15 to WSJ45, via WSJ30...
Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 24 / 30
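The recipe above can be sketched as follows. All function names here are hypothetical placeholders, not the authors' implementation: train both systems thoroughly where it is cheap (up to WSJ15), seed a fresh model from a mixture of their best parse trees, then take only a few EM steps at each longer gradation:

```python
def leapfrog(corpus, train_full, best_parses, init_from_parses, em_few_steps):
    """Thorough training up to WSJ15 (cheap), a mixed-initializer
    restart, then just a few EM steps on longer gradations (expensive)."""
    short = [s for s in corpus if len(s) <= 15]
    model_a = train_full(short, strategy="less_is_more")
    model_b = train_full(short, strategy="baby_steps")
    # Mix the two systems' best (Viterbi) parse trees to seed a new model.
    mixed = best_parses(model_a, short) + best_parses(model_b, short)
    model = init_from_parses(mixed)
    for k in (30, 45):  # hop on from WSJ15 to WSJ45, via WSJ30
        model = em_few_steps(model, [s for s in corpus if len(s) <= k])
    return model

# Toy stand-ins to illustrate the control flow only:
train = lambda data, strategy: {"strategy": strategy}
parses = lambda m, data: [(m["strategy"], tuple(s)) for s in data]
seed = lambda trees: {"n_trees": len(trees)}
few_em = lambda m, data: dict(m, last_gradation=len(data))
corpus = [["w"] * n for n in (3, 12, 20, 40)]
model = leapfrog(corpus, train, parses, seed, few_em)
```

The point of the hack is cost asymmetry: full EM convergence is only ever paid for on short sentences, while long sentences see just a handful of iterations.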
Leapfrog

Idea III: Leapfrog ... Results!

[Plot: Directed Dependency Accuracy (%) on WSJk, k = 5..40; curves shown: Oracle, Uninformed, Baby Steps, K&M*, Leapfrog]

Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 25 / 30
Results

Results: ... on Section 23 of WSJ

Right-Branching (Klein and Manning, 2004)       31.7%
DMV @10                                         34.2%
Baby Steps @15                                  39.2%
Baby Steps @45                                  39.4%
Soft Parameter Tying (Cohen and Smith, 2009)    42.2%
Less is More @15                                44.1%
Leapfrog @45                                    45.0%

Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 26 / 30
Conclusion
Summary
explored scaffolding on data complexity
awareness of data complexity does help!
beats state-of-the-art with older techniques
Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 27 / 30
Conclusion
Conclusion
(need a less adversarial learning algorithm)
paradox: improved performance with less data
despite discarding samples from the true (test) distribution
focusing on simple examples guides unsupervised learning
mirrors supervised boosting (Freund and Schapire, 1997)
Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 28 / 30
Conclusion

Teaser

we push the state-of-the-art further, to 50.4% (up another 5%) using even faster and simpler methods!
... hear us at CoNLL and ACL (Spitkovsky et al., 2010)
similar approaches may apply in other settings (e.g., word alignment)
... more to come!

Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 29 / 30