Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Page 1: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

© 2015 IBM Corporation

Transparent Machine Learning for Information Extraction

Laura Chiticariu

Yunyao Li

Fred Reiss

IBM Research - Almaden

Page 2: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Motivation


Page 3: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Case Study 1: Social Media

Diagram: Social Media + enterprise data (product catalog, customer master data, …) → Information Extraction → 360º Profile (Products, Interests, Relationships, Personal Attributes, Life Events) → Statistical Analysis, Report Generation. Example share-of-buzz report for a bank:

Bank 1: 15% | Bank 2: 15% | Bank 3: 5% | Bank 4: 21% | Bank 5: 20% | Bank 6: 23%

Scale: 450M+ tweets a day, 100M+ consumers, …

Breadth: Buzz, intent, sentiment, life events, personal attributes, …

Complexity: Sarcasm, wishful thinking, …

Customer 360º

Page 4: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Case Study 2: Server Logs

Diagram: a multi-tier web site (web servers, application servers, transaction processing, monitors, Database #1, Database #2), where every component produces its own log file.

• Web site with multi-tier architecture

• Every component produces its own system logs

• An error shows up in the log for Database #2:
  12:34:56 SQL ERROR 43251: Table CUST.ORDERWZ is not

• What sequence of events led to this error?

Operations Analysis

Page 5: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Case Study 2: Server Logs (continued)

Pipeline: Raw Logs (web server logs, custom application logs, database logs) → Parse → Parsed Log Records → Extract → Linkage Information → Integrate → End-to-End Application Sessions → Analyze → Operations Analysis via Graphical Models (anomaly detection) and Correlation Analysis.

Correlation Analysis example (Cause, Effect, Delay, Correlation): delay of 5 min with correlation 0.85; delay of 10 min with correlation 0.62.

• Every customer has unique components with unique log record formats

• Need to quickly customize all stages of analysis for these custom log data sources
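Delay/correlation pairs like the ones above can be estimated by correlating one log-derived event-count series against lagged copies of another. The sketch below is our own illustration of that idea (the function names and per-minute counts are hypothetical), not the product's anomaly-detection models:

```python
# Illustrative sketch (ours, not the tutorial's system): estimate the
# cause -> effect delay by correlating one event-count series against
# lagged copies of another and keeping the best-scoring lag.

def correlation(xs, ys):
    """Pearson correlation of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def best_delay(cause, effect, max_lag):
    """Lag (in time steps) at which the effect series best tracks the cause."""
    scores = {lag: correlation(cause[:-lag], effect[lag:])
              for lag in range(1, max_lag + 1)}
    return max(scores, key=scores.get)

# Hypothetical per-minute error counts: effect bursts one step after cause.
cause  = [0, 3, 0, 0, 4, 0, 0, 5, 0, 0]
effect = [0, 0, 3, 0, 0, 4, 0, 0, 5, 0]
lag = best_delay(cause, effect, max_lag=3)  # best-correlated lag
```

A real system would build such series per (cause, effect) pair of parsed log record types and report the strongest lags, as in the table above.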

Page 6: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Case Study 3: Sentiment Analysis for Analyst Research Reports

• Determine the sentiments expressed towards a financial entity or its aspects in financial research reports

• Handle different categories of sentiment mentions

– Direct: explicit recommendations

• Our current neutrals are on China/Hong Kong, Singapore, Indonesia and Thailand; underweight on Malaysia, Korea, Taiwan and India.

• We prefer HK Telecom from a long term perspective.

– Indirect: mention of a change in a key indicator that can be directly linked to a recommendation

• Intel's 2013 capex is elevated relative to historical norms

• FHLMC reported a net loss of $2.5bn for the quarter.

– Implicit: other sentiment mentions that are not direct recommendations or statements about a key economic indicator

• Taiwan is making continuous progress on trade and investment liberalization, which bodes well for its long-term economic prospects

• Export outlook remains lackluster for the next 1-3 months.

Sentiment Mention | Sentiment Target | Sentiment Polarity | Entity Type | Sentiment Category | Aspect
We prefer HK Telecom from a long term perspective | HK Telecom | Positive | Company | Direct | n/a
Sell EUR/CHF at market for a decline to 1.31000 | EUR | Negative | Currency | Direct | n/a
Sell EUR/CHF at market for a decline to 1.31000 | CHF | Positive | Currency | Direct | n/a
Intel's 2013 capex is elevated relative to historical norms | Intel | Positive | Company | Indirect | Capex

Financial Analytics

Page 7: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Requirements for IE in the Enterprise

• Scalability


Page 8: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Scalability Examples

• Social Media

– Twitter has 450M+ messages per day; 1TB+ per day → 400+ TB per year

– Add to it enterprise-specific Facebook, Tumblr, and tens of thousands of blogs/forums

• Financial Data

– Regulatory filings can be in the tens of millions and several TBs

• Machine Data

– One application server under moderate load at medium logging level → 1GB of app server logs per day

– A medium-size data center has tens of thousands of servers → tens of terabytes of system logs per day

Page 9: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Requirements for IE in the Enterprise

• Scalability

• Expressivity


Page 10: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Expressivity Example: Varied Input Data

Inputs range over Social Media, Email, Machine Data, Financial Statements, Medical Records, News, CRM, and Patents, feeding applications such as Customer 360º, Security & Privacy, Operations Analysis, and Financial Analytics. For example:

… hotel close to Seaworld in Orlando? Planning to go this coming weekend…
→ Product: Hotel, Location: Orlando

my social security # is 400-05-4356
→ Type: SSN, Value: 400054356

Storage Module 2 Drive 1 fault
→ Event: DriveFail, Loc: mod2.Slot1

Page 11: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Expressivity Example: Different Kinds of Parses

Natural Language (dependency tree)

We are raising our tablet forecast.

Figure: a dependency tree with the predicate "raising", subject "We" (auxiliary "are"), and object "forecast", modified by "tablet" and the determiner "our".

Machine Log (parsed record)

Oct 1 04:12:24 9.1.1.3 41865: %PLATFORM_ENV-1-DUAL_PWR: Faulty internal power supply B detected

Time: Oct 1 04:12:24
Host: 9.1.1.3
Process: 41865
Category: %PLATFORM_ENV-1-DUAL_PWR
Message: Faulty internal power supply B detected
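The machine-log parse above can be reproduced with a single named-group regular expression. This is our own sketch (the pattern is an assumption inferred from the one example line's "time host seq: category: message" layout), not the tutorial's parser:

```python
import re

# One named group per field of the parsed record shown above. The exact
# format is an assumption inferred from the single example line.
LOG_PATTERN = re.compile(
    r"(?P<time>\w{3} +\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>\d+): "
    r"(?P<category>%[\w-]+): "
    r"(?P<message>.*)"
)

def parse_log_record(line):
    """Return a dict with time/host/process/category/message, or None."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

record = parse_log_record(
    "Oct 1 04:12:24 9.1.1.3 41865: "
    "%PLATFORM_ENV-1-DUAL_PWR: Faulty internal power supply B detected"
)
```

In practice each log source gets its own pattern, which is why expressivity across formats matters.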

Page 12: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Expressivity Example: Fact Extraction (Tables)

Singapore 2012 Annual Report (136-page PDF):

• Identify the line item for Operating expenses from the Income statement (financial table in the PDF document)

• Identify the note breaking down the Operating expenses line item, and extract the opex components

Page 13: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Expressivity Example: Sentiment Analysis

Mcdonalds mcnuggets are fake as shit but they so delicious.

We should do something cool like go to ZZZZ (kiddingkiddingkiddingkidding).

Makin chicken fries at home bc everyone sucks!

You are never too old for Disney movies.

Bank X got me ****ed up today!

Not a pleasant client experience. Please fix ASAP.

I'm still hearing from clients that Company A's website is better.

X... fixing something that wasn't broken

Intel's 2013 capex is elevated at 23% of sales, above average of 16%

IBM announced 4Q2012 earnings of $5.13 per share, compared with 4Q2011 earnings of $4.62 per share, an increase of 11 percent.

We continue to rate shares of MSFT neutral.

Sell EUR/CHF at market for a decline to 1.31000…

FHLMC reported $4.4bn net loss and requested $6bn in capital from Treasury.

Sources: Customer Surveys, Social Media, Analyst Research Reports

Page 14: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Requirements for IE in the Enterprise

• Scalability

• Expressivity

• Ease of comprehension


Page 15: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Ease of Comprehension: What not to do (1)

package com.ibm.avatar.algebra.util.sentence;

import java.io.BufferedWriter;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.regex.Matcher;

public class SentenceChunker {

  private Matcher sentenceEndingMatcher = null;
  public static BufferedWriter sentenceBufferedWriter = null;
  private HashSet<String> abbreviations = new HashSet<String>();

  public SentenceChunker() {
  }

  /** Constructor that takes in the abbreviations directly. */
  public SentenceChunker(String[] abbreviations) {
    // Generate the abbreviations directly.
    for (String abbr : abbreviations) {
      this.abbreviations.add(abbr);
    }
  }

  /**
   * @param doc the document text to be analyzed
   * @return true if the document contains at least one sentence boundary
   */
  public boolean containsSentenceBoundary(String doc) {
    String origDoc = doc;
    // Based on getSentenceOffsetArrayList()
    int boundary;
    int currentOffset = 0;
    do {
      // Get the next tentative boundary for the sentenceString
      setDocumentForObtainingBoundaries(doc);
      boundary = getNextCandidateBoundary();
      if (boundary != -1) {
        String candidate = doc.substring(0, boundary + 1);
        String remainder = doc.substring(boundary + 1);
        /*
         * Looks at the last character of the String. If this last character
         * is part of an abbreviation (as detected by REGEX) then the
         * sentenceString is not a fullSentence and "false" is returned
         */
        while (!(doesNotBeginWithPunctuation(remainder)
            && isFullSentence(candidate))) {
          // Get the next tentative boundary for the sentenceString
          int nextBoundary = getNextCandidateBoundary();
          if (nextBoundary == -1) {
            break;
          }
          boundary = nextBoundary;
          candidate = doc.substring(0, boundary + 1);
          remainder = doc.substring(boundary + 1);
        }
        if (candidate.length() > 0) {
          // Found a sentence boundary. If the boundary is the last
          // character in the string, we don't consider it to be
          // contained within the string.
          int baseOffset = currentOffset + boundary + 1;
          if (baseOffset < origDoc.length()) {
            return true;
          }
          else {
            return false;
          }
        }
        doc = remainder;
      }
    }
    while (boundary != -1);
    // If we get here, didn't find any boundaries.
    return false;
  }

  public ArrayList<Integer> getSentenceOffsetArrayList(String doc) {
    ArrayList<Integer> sentenceArrayList = new ArrayList<Integer>();
    int boundary;
    int currentOffset = 0;
    sentenceArrayList.add(new Integer(0));
    do {
      // Get the next tentative boundary for the sentenceString
      setDocumentForObtainingBoundaries(doc);
      boundary = getNextCandidateBoundary();
      if (boundary != -1) {
        String candidate = doc.substring(0, boundary + 1);
        String remainder = doc.substring(boundary + 1);
        /*
         * Looks at the last character of the String. If this last character
         * is part of an abbreviation (as detected by REGEX) then the
         * sentenceString is not a fullSentence and "false" is returned
         */
        while (!(doesNotBeginWithPunctuation(remainder)
            && isFullSentence(candidate))) {
          // Get the next tentative boundary for the sentenceString
          int nextBoundary = getNextCandidateBoundary();
          if (nextBoundary == -1) {
            break;
          }
          boundary = nextBoundary;
          candidate = doc.substring(0, boundary + 1);
          remainder = doc.substring(boundary + 1);
        }
        if (candidate.length() > 0) {
          sentenceArrayList.add(new Integer(currentOffset + boundary + 1));
          currentOffset += boundary + 1;
        }
        doc = remainder;
      }
    }
    while (boundary != -1);
    if (doc.length() > 0) {
      sentenceArrayList.add(new Integer(currentOffset + doc.length()));
    }
    sentenceArrayList.trimToSize();
    return sentenceArrayList;
  }

  private void setDocumentForObtainingBoundaries(String doc) {
    sentenceEndingMatcher = SentenceConstants.sentenceEndingPattern.matcher(doc);
  }

  private int getNextCandidateBoundary() {
    if (sentenceEndingMatcher.find()) {
      return sentenceEndingMatcher.start();
    }
    else {
      return -1;
    }
  }

  private boolean doesNotBeginWithPunctuation(String remainder) {
    Matcher m = SentenceConstants.punctuationPattern.matcher(remainder);
    return (!m.find());
  }

  private String getLastWord(String cand) {
    Matcher lastWordMatcher = SentenceConstants.lastWordPattern.matcher(cand);
    if (lastWordMatcher.find()) {
      return lastWordMatcher.group();
    }
    else {
      return "";
    }
  }

  /*
   * Looks at the last character of the String. If this last character is
   * part of an abbreviation (as detected by REGEX) then the sentenceString
   * is not a fullSentence and "false" is returned
   */
  private boolean isFullSentence(String cand) {
    Matcher validSentenceBoundaryMatcher =
        SentenceConstants.validSentenceBoundaryPattern.matcher(cand);
    if (validSentenceBoundaryMatcher.find()) return true;
    Matcher abbrevMatcher = SentenceConstants.abbrevPattern.matcher(cand);
    if (abbrevMatcher.find()) {
      return false; // Means it ends with an abbreviation
    }
    else {
      // Check if the last word of the sentenceString has an entry in the
      // abbreviations dictionary (like Mr etc.)
      String lastword = getLastWord(cand);
      if (abbreviations.contains(lastword)) { return false; }
    }
    return true;
  }
}

Java Implementation of Sentence Boundary Detection
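For contrast, the same abbreviation-aware boundary idea fits in a few lines when expressed declaratively. This is our own illustrative sketch (the abbreviation list and patterns are placeholders), not code from the tutorial:

```python
import re

# Hypothetical abbreviation dictionary; a real system would load this from a resource.
ABBREVIATIONS = {"Mr", "Dr", "Prof", "etc"}

def sentence_offsets(doc):
    """Offsets just past each sentence boundary, skipping abbreviation periods."""
    offsets = [0]
    for m in re.finditer(r"[.?!]", doc):
        # Reject the boundary if the word before it is a known abbreviation.
        last_word = re.findall(r"(\w+)\W*$", doc[:m.start()])
        if last_word and last_word[0] in ABBREVIATIONS:
            continue
        offsets.append(m.end())
    return offsets

offsets = sentence_offsets("Mr. Smith arrived. He left early.")
```

The contrast with the Java listing above is the slide's point: the declarative version exposes the dictionary and the boundary pattern directly, so they are easy to read and change.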


Page 16: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Ease of Comprehension: What not to do (2)

Some light reading: Viterbi debug output; feature extraction code (2,200 lines).

Page 17: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Ease of Comprehension Example


Page 18: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Requirements for IE in the Enterprise

• Scalability

• Expressivity

• Ease of comprehension

• Ease of debugging


Page 19: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Ease of Debugging: What not to do

Same features. Same entities. Slightly different training data. Wrong answer.

(Models compared: English.CoNLL.4class vs. English.all.3class)

Page 20: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Ease of Debugging Example


Page 21: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Requirements for IE in the Enterprise

• Scalability

• Expressivity

• Ease of comprehension

• Ease of debugging

• Ease of enhancement


Page 22: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Example: Sentiment Analysis

Intel's 2013 capex is elevated at 23% of sales, above average of 16%

FHLMC reported $4.4bn net loss and requested $6bn in capital from Treasury.

I'm still hearing from clients that Merrill's website is better.

I need to go back to Walmart, Toys R Us has the same toy $10 cheaper!

For each mention: which is the entity of interest? Is it a customer or a competitor? Is the news good or bad?

Page 23: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Requirements for IE in the Enterprise

• Scalability

• Expressivity

• Ease of comprehension

• Ease of debugging

• Ease of enhancement

Transparency


Page 24: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Road map

• Focus of this tutorial:

– Achieving transparency…

– …while leveraging machine learning

• Parts that will follow:

– Part 2: Intro to Transparent Machine Learning

– Part 3: State of the Art in Transparent ML

– Part 4: Case Study

– Part 5: Research Challenges and Future Directions

Page 25: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Transparent ML: Intro


Page 26: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


A Brief History of IE

Rule-Based:

• 1978-1997: early systems and MUC (Message Understanding Conference), a DARPA competition from 1987 to 1997

– FRUMP [DeJong82], FASTUS [Appelt93], TextPro, PROTEUS

• 1998: Common Pattern Specification Language (CPSL) standard [Appelt98]

– Standard for subsequent rule-based systems

• 1999-2010: commercial products, GATE

• 2006 onward: declarative IE started in universities and industrial labs

Machine Learning:

• At first: simple techniques like Naive Bayes

• 1990s: learning rules: AUTOSLOG [Riloff93], CRYSTAL [Soderland98], SRV [Freitag98]

• 2000s: more specialized models: Hidden Markov Models [Leek97], Maximum Entropy Markov Models [McCallum00], Conditional Random Fields [Lafferty01]; automatic feature expansion

Page 27: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


A False Dichotomy

IE systems are traditionally perceived as either completely rule-based or completely ML-based:

                                  | Rule-Based                     | Machine Learning
Model of representation           | Rules                          | Opaque; the more complex, the better
Learning algorithm                | None                           | Completely automatic; the more complex, the better
Incorporation of domain knowledge | Manual, by writing rules       | The least, the better
                                  | Humans involved in all aspects | Humans not involved at all

The rule-based extreme is regarded as lacking in research opportunities; lots of research focuses on the ML extreme.

Page 28: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Spectrum of Techniques

The reality is much more nuanced! Real systems fall between the two extremes:

                                  | Rule-Based                     | Real Systems | Machine Learning
Model of representation           | Rules                          |              | Opaque; the more complex, the better
Learning algorithm                | None                           |              | Completely automatic; the more complex, the better
Incorporation of domain knowledge | Manual, by writing rules       |              | The least, the better
                                  | Humans involved in all aspects |              | Humans not involved at all

Page 29: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Real Systems: A Practical Perspective

Survey of entity extraction systems [Chiticariu, Li, Reiss, EMNLP 2013]:

• Research: EMNLP, ACL, NAACL, 2003-2012

• Industry: 54 industrial vendors (Who's Who in Text Analytics, 2012)

Finding (figure): systems range from primarily rule-based, through hybrid, to primarily machine-learning-based.

Page 30: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Why Do Real Systems Use Rules?

Rule-based IE

• PROs: easy to comprehend; easy to debug; easy to enhance

• CONs: heuristic; requires tedious manual labor

ML-based IE

• PROs: trainable; adapts automatically; reduces manual effort

• CONs: requires labeled data; requires retraining for domain adaptation; requires ML expertise to use or maintain; opaque

Page 31: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Why Do Real Systems Use Rules? (continued)

In practice, the rule-based PROs (easy to comprehend, debug, and enhance) are REQUIREMENTS, while the ML PROs (trainable, adapts automatically, reduces manual effort) are NICE TO HAVE.

Transparent ML: meet the REQUIREMENTS, while retaining as many of the NICE-TO-HAVEs as possible!

Page 32: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Transparent Machine Learning (Transparent ML)

An ideal Transparent ML technique is one that:

1. Produces models that a typical real-world user can read, understand, and edit
   → Easy to comprehend, debug, and enhance

2. Uses algorithms that a typical real-world user can understand and influence
   → Easy to comprehend, debug, and enhance

3. Allows a real-world user to incorporate domain knowledge when generating the models
   → Easy to enhance

Page 33: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Spectrum of Techniques

The reality is much more nuanced! Transparent ML sits between the two extremes, alongside real systems:

                                  | Rule-Based (humans involved in all aspects) | Transparent ML | Machine Learning (humans not involved at all)
Model of representation           | Rules                                       | ?              | Opaque; the more complex, the better
Learning algorithm                | None                                        | ?              | Completely automatic; the more complex, the better
Incorporation of domain knowledge | Manual, by writing rules                    | ?              | The least, the better

Page 34: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Provenance

Development time (offline): Training Data → Learning Algorithm → Model

Run time (online): Input Documents → Model → Extracted Objects

Algorithm-level Provenance: Why and how was this model generated?

Model-level Provenance: Why and how is an extracted object generated?

Page 35: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Key Dimension 1: Models of Representation

• Simple models, e.g., dictionaries, regular expressions…

• …to more expressive models such as sequence patterns, dependency path patterns, rule programs…

• …to more complex models, e.g., classifiers, or a combination of the above

Simple → Complex:
Dictionary → Regular Expression → Single rule (pattern) → Rule Program → Classification rules → Rules + Classifier → Decision Tree, SVM, CRF, HMM, Deep Learning, …

Page 36: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Spectrum of Models of Representation (1/4): Sequence Pattern Rules

• A rule matches a linear sequence of tokens

• E.g., CPSL-style sequence rules [Appelt 1998]

• Components include:

– Orthographic features, e.g., matches for a regular expression

– Lexical features, e.g., matches of a dictionary of terms

– Syntactic features, e.g., Part of Speech (POS) tags, Noun Phrase (NP) chunks

– Semantic features, e.g., named entity tags

Example (Organization Candidate):
Token(Dictionary='Org. Prefix')  Token(string='of')  Token(Dictionary='City Name')
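A CPSL-style rule like the Organization Candidate above can be read as a list of per-token predicates matched over a token sequence. The sketch below is our own encoding (the dictionaries are toy placeholders), not CPSL itself:

```python
# Toy dictionaries standing in for the 'Org. Prefix' and 'City Name' resources.
ORG_PREFIX = {"University", "Bank", "City"}
CITY_NAME = {"Sheffield", "Orlando", "Singapore"}

# Rule: Token(Dictionary='Org. Prefix') Token(string='of') Token(Dictionary='City Name')
RULE = [
    lambda t: t in ORG_PREFIX,  # lexical feature: dictionary match
    lambda t: t == "of",        # literal token match
    lambda t: t in CITY_NAME,   # lexical feature: dictionary match
]

def match_sequence(rule, tokens):
    """Return (start, end) token spans where the rule matches a linear sequence."""
    n = len(rule)
    return [(i, i + n)
            for i in range(len(tokens) - n + 1)
            if all(pred(tok) for pred, tok in zip(rule, tokens[i:i + n]))]

tokens = "He studied at the University of Sheffield last year".split()
spans = match_sequence(RULE, tokens)  # organization candidate spans
```

Each predicate could equally test an orthographic, syntactic, or semantic feature of the token, as listed above.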

Page 37: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Spectrum of Models of Representation (2/4): Path Pattern Rules

• A rule matches a subgraph of a parse tree [Sudo et al., 2003]

• Predicate-argument (PA) structure

– Based on a direct relation with a predicate

• Chain model

– Based on a chain of modifiers of a predicate

• Subtree model

– Any connected subtree of a dependency parse

– Provides reliable contexts (like the PA model)

– Captures long-distance relationships (like the Chain model)

Figure: example patterns around the predicate "triggered", with the argument <Person>, the object "explosion", and the modifier chain "heart" → "the city".
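One minimal way to match such path patterns is to encode the dependency parse as child → (head, relation) edges and test a chain of modifiers below a predicate. This encoding is our own sketch (the toy parse is hypothetical), not the representation of [Sudo et al., 2003]:

```python
# Toy dependency parse encoded as child -> (head, relation) edges (our own
# hypothetical encoding of the "triggered" example in the figure above).
PARSE = {
    "<Person>": ("triggered", "subj"),
    "explosion": ("triggered", "obj"),
    "heart": ("explosion", "mod"),
    "city": ("heart", "mod"),
}

def matches_chain(parse, chain):
    """True if each element of the chain directly modifies the previous one."""
    return all(parse.get(child, (None, None))[0] == head
               for head, child in zip(chain, chain[1:]))

# A chain-model pattern below the predicate "triggered":
hit = matches_chain(PARSE, ["triggered", "explosion", "heart", "city"])
```

A subtree-model matcher would generalize this from a single chain to any connected set of edges.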

Page 38: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Spectrum of Models of Representation (3/4): Predicate-based Rules

• Rule program expressed using first-order logic

• SQL-like [Krishnamurthy et al., ICDE 2008]:

  create view Person as …; create view PhoneNum as …;
  create view Sentence as …;

  create view PersonPhone as
  select P.name as person, N.number as phone
  from Person P, PhoneNum N, Sentence S
  where
    Follows(P.name, N.number, 0, 30)
    and Contains(S.sentence, P.name) and Contains(S.sentence, N.number)
    and ContainsRegex(/\b(phone|at)\b/, SpanBetween(P.name, N.number));

• Prolog-like [Shen et al., 2007]:

  Person(d, person) ⇐ …; PhoneNum(d, phone) ⇐ …; Sentence(d, sentence) ⇐ …;

  PersonPhone(d, person, phone) ⇐ Person(d, person), PhoneNum(d, phone), Sentence(d, sentence),
    before(person, phone, 0, 30),
    match(spanBetween(person, phone), /\b(phone|at)\b/),
    contains(sentence, person), contains(sentence, phone);

Intuition (figure): a <Person> followed within 0-30 characters by a <PhoneNum>, with "phone" or "at" in between, all within a single sentence.
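The semantics of the PersonPhone rule above (in either syntax) amount to a join with span predicates. The Python sketch below is our own rendering of those predicates (the single-sentence check is a crude stand-in, and the input spans are hypothetical pre-extracted results), not the AQL or Prolog runtime:

```python
import re

def person_phone(text, person_spans, phone_spans):
    """Join Person and PhoneNum spans per the rule: the phone follows the
    person by 0-30 chars, 'phone' or 'at' appears in between, same sentence."""
    results = []
    for p_start, p_end in person_spans:
        for n_start, n_end in phone_spans:
            gap = text[p_end:n_start]
            same_sentence = "." not in text[p_start:n_end]  # crude stand-in
            if (0 <= n_start - p_end <= 30
                    and same_sentence
                    and re.search(r"\b(phone|at)\b", gap)):
                results.append((text[p_start:p_end], text[n_start:n_end]))
    return results

# Hypothetical pre-extracted spans for Person and PhoneNum.
text = "Call John Smith at 555-1234 for details."
pairs = person_phone(text, [(5, 15)], [(19, 27)])
```

The declarative versions above express the same join without fixing an evaluation order, which is what lets an optimizer scale them.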

Page 39: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Spectrum of Models of Representation (4/4)

• Classifiers

– Decision trees, logistic regression, Support Vector Machines (SVM), …

• Graphical models

– Conditional Random Fields (CRF), Hidden Markov Models (HMM), …

Page 40: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Key Dimension 1: Models of Representation (continued)

• Transparency: Does the model generate explainable output (i.e., extracted objects)?

• Transparency is determined by the presence or absence of Model-level Provenance

• Model-level Provenance: the ability to connect an extracted object to the subset of the input data and the part of the model that generated it

– Critical to comprehending and debugging the extracted objects

• The simpler the model, the more likely it is to have Model-level Provenance, and hence the more transparent it is

• The transparency cutoff on this spectrum varies, depending on the application

Simple/Transparent → Complex/Opaque:
Dictionary → Regular Expression → Single rule (pattern) → Rule Program → Classification rules → Rules + Classifier → Decision Tree, SVM, CRF, HMM, Deep Learning, …
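Model-level provenance can be made concrete by attaching, to every extracted object, the part of the model and the input span that produced it. The sketch below is our own minimal illustration (the class and rule names are hypothetical), not a real extraction runtime:

```python
import re
from dataclasses import dataclass

@dataclass
class Extraction:
    value: str   # the extracted object
    rule: str    # which part of the model fired (model-level provenance)
    span: tuple  # (start, end) offsets into the input document

def extract_ssn(doc):
    """Regex-style extractor that records provenance for each hit."""
    return [Extraction(m.group(), rule="ssn_regex", span=m.span())
            for m in re.finditer(r"\b\d{3}-\d{2}-\d{4}\b", doc)]

doc = "my social security # is 400-05-4356"
hits = extract_ssn(doc)
# Each hit can now be explained: which rule fired, on which input span.
```

For an opaque model the `rule` field has no meaningful counterpart, which is exactly why such models are hard to debug.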

Page 41: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Key Dimension 2: Learning Algorithms (1/2)

Unsupervised (no labeled data) → Semi-supervised (partially labeled data) → Supervised (fully labeled data)

Page 42: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Key Dimension 2: Learning Algorithms (2/2)

Unsupervised → Semi-supervised → Supervised

• Transparency: Does the learning algorithm generate explainable output (i.e., the model)?

• Transparency is determined by the presence or absence of Algorithm-level Provenance

• Algorithm-level Provenance: the ability to connect the model, or a part of the model, with the subset of the input data to the learning algorithm that produced it

– Critical for comprehending, debugging, and maintaining the model

Page 43: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Key Dimension 3: Incorporation of Domain Knowledge (1/3)

• Why do we need to incorporate domain knowledge?

– In a contest/competition environment (e.g., MUC, TAC), the model is trained on one domain and tested on the same domain

– Hardly the case in practice: the model is deployed in an environment usually different from the one in which it was trained

I'm still hearing from clients that Merrill's website is better.
→ Customer or competitor?

U.S. to Reduce Debt Next Quarter After Borrowing Needs Fall.
We remain confident Computershare will generate sufficient earnings and operating cash flow to gradually reduce debt.
→ Debt reduction indicates sentiment for a Country, but not for a Company

Page 44: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Key Dimension 3: Incorporation of Domain Knowledge (2/3)

• Types of domain knowledge
  – Complete labeled data
  – Seed examples (e.g., dictionary terms, patterns)
  – Type of extraction task
  – Choice of features and parameters
  – Metadata (e.g., knowledge base)
• Stages during learning when domain knowledge is incorporated
  – Offline: the model is learned once and incorporates the domain knowledge all at once
  – Iterative: the model is learned through a set of iterations, each iteration receiving more domain knowledge
    • Interactive: a human is actively involved in each iteration to provide more domain knowledge
  – Deployment: the learned model is customized for the domain/application where it is deployed

Page 45: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Key Dimension 3: Incorporation of Domain Knowledge (3/3)

• Transparency is determined by both:

1. Model-level Provenance
   • Can extraction results be explained by the model?
     The more explainable the results → the easier it is to incorporate domain knowledge into the model to influence the results
   • Is the incorporation of domain knowledge into the model easy and intuitive?
     The easier and more intuitive → the easier it is to adapt the model to a new domain

2. Algorithm-level Provenance
   • What changes to the model does the domain knowledge result in?
     The more explainable the changes to the model → the easier it is to incorporate domain knowledge into the algorithm to influence the model
   • Are the parameters intuitive, and do they have clear semantics?
     The more intuitive the parameters → the easier it is to adapt the model to a new domain

Page 46: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Recap

• The false dichotomy
• Transparent Machine Learning
• Provenance: model- and algorithm-level
• Ensuring provenance in:
  – Model
  – Learning algorithm
  – Domain adaptation

Page 47: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML: State of the Art

Page 48: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Objective

• Highlight some existing techniques exhibiting Transparent ML
  – Breadth over depth
• Mix of techniques: recent and/or influential
  – Not an exhaustive list!

Page 49: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML Techniques

Techniques, arranged across the supervision spectrum (unsupervised / semi-supervised / supervised):
• Dictionary
• Regex
• Rules
• Rules + Classifier
• Classification Rules


Page 51: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Dictionaries

• A dictionary (gazetteer) contains terms for a particular concept
• Very important for IE tasks
  – E.g., lists of country names, common first names, organization suffixes
  – Highly data dependent → crucial for domain adaptation

Page 52: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

General Approaches for Dictionary Learning

• Dictionary Learning / Lexicon Induction: learn a new dictionary
  – Semi-supervised (also known as Set Expansion)
    • Often used in practice because it allows targeting specific entity classes
    • Dominant approach: bootstrapping, e.g., [Riloff & Jones AAAI 1999]: seed entries → (semi-)automatically expand the list based on context
  – Unsupervised: cluster related terms
    • Use targeted patterns or co-occurrence statistics, e.g., [Gerow ACL 2014]
• Dictionary Refinement: update an existing dictionary
  – E.g., by removing ambiguous terms (e.g., [Baldwin et al., ACL 2013])
  – Related problem: dictionary refinement in the context of a rule program (see later)

Page 53: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Dictionary Learning: Bootstrapping [Riloff & Jones AAAI 1999]

• Input: corpus, candidate extraction patterns, seed words
• Mutual Bootstrapping: find the extraction pattern (EP) most useful for extracting known category members; add all of its extracted NPs to the dictionary
  – A scoring heuristic tries to balance pattern reliability and the number of known terms extracted
• Meta Bootstrapping: guard against semantic drift due to a few bad words extracted by the "best EP"
  – A scoring heuristic rewards NPs extracted by many category EPs

[Diagram: seed words (bolivia, san miguel) initialize a temporary dictionary; mutual bootstrapping selects the best EP (e.g., "born in <X>", "shot in <X>") from the candidate extraction patterns and their extractions, adding the NPs extracted by the best EP (e.g., guatemala, left side) to the temporary dictionary; meta bootstrapping then promotes the 5 best NPs to the permanent dictionary (bolivia, san miguel, guatemala).]
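The mutual-bootstrapping loop can be sketched in a few lines. This is a deliberately simplified illustration, not Riloff & Jones's actual system: extraction patterns are plain regexes with one single-word capture group, the corpus is three toy sentences, and a pattern's score is just the number of known dictionary terms it extracts (the paper's heuristic also weighs pattern reliability, and meta-bootstrapping promotes only the 5 best NPs).

```python
import re

# Toy corpus and candidate extraction patterns (single-word captures only).
corpus = [
    "He was born in bolivia and worked in guatemala.",
    "She was born in guatemala near the border.",
    "He was shot in the left side.",
]
patterns = [r"born in (\w+)", r"shot in (\w+)"]
dictionary = {"bolivia"}  # seed word

for _ in range(2):  # a couple of bootstrapping iterations
    extractions = {p: {m for doc in corpus for m in re.findall(p, doc)}
                   for p in patterns}
    # Mutual bootstrapping: pick the pattern recovering the most known terms...
    best = max(patterns, key=lambda p: len(extractions[p] & dictionary))
    # ...and add everything that pattern extracts to the dictionary.
    dictionary |= extractions[best]
```

After two iterations the dictionary has grown from the seed to include "guatemala", while the unreliable "shot in <X>" pattern (which would pull in "the") is never selected.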

Page 54: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Dictionary Learning: Semi-supervised

• Reducing semantic drift
  – Multi-category bootstrapping, e.g., BASILISK [Thelen & Riloff EMNLP 2002]
  – Distributional similarity to detect terms that could lead to semantic drift, e.g., [McIntosh & Curran, ACL 2009]
  – Discover negative categories, e.g., [McIntosh EMNLP 2010]
  – Hybrid: bootstrapping + semantic tagger + coreference, e.g., [Qadir & Riloff, *SEM 2012]
  – Incorporate user interaction: [Coden et al., Sem. Web Eval. Challenge 2014]
• Exploit the Web, e.g., [Downey et al., IJCAI 2007]
• Multi-word expressions, e.g., [Qadir et al. AAAI 2015]

Page 55: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Dictionary Learning: Unsupervised [Gerow, ACL 2014]

• Input: a corpus
• Goal: extract qualifiable sets of specialist terms found in the corpus
• Algorithm
  – Construct a co-occurrence graph of all words in the corpus
    • Two words are connected if they are observed in an n-word window
  – Identify communities in the graph using a community detection algorithm
  – Rank words by their centrality within the community
• Minimal preprocessing
  – No document structure
  – No semantic relationships
  – No thresholds

[Figure: term communities extracted from the NIPS Proceedings.]
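The graph-construction step is easy to sketch. The code below is a simplified stand-in, not Gerow's pipeline: connected components play the role of the community detection algorithm, and plain degree plays the role of centrality.

```python
from collections import defaultdict

def cooccurrence_graph(sentences, window=3):
    """Connect two words if they co-occur within an n-word window."""
    graph = defaultdict(set)
    for tokens in sentences:
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + window, len(tokens))):
                if tokens[i] != tokens[j]:
                    graph[tokens[i]].add(tokens[j])
                    graph[tokens[j]].add(tokens[i])
    return graph

def communities(graph):
    """Connected components as crude 'communities', members ranked by degree."""
    seen, comps = set(), []
    for node in graph:
        if node in seen:
            continue
        comp, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(graph[n] - comp)
        seen |= comp
        comps.append(sorted(comp, key=lambda n: -len(graph[n])))
    return comps

# Two topically disjoint toy "documents" yield two separate communities.
comps = communities(cooccurrence_graph([
    "gradient descent step".split(),
    "market price trade".split(),
]))
```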

Page 56: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Term Ambiguity Detection (TAD) [Baldwin et al., ACL 2013]

Input (term, category) pairs:
  EOS 5D — Camera
  A New Beginning — Video Game
  Skyfall 007 — Movie
  Brave — Movie

TAD splits them into:
  Ambiguous: A New Beginning (Video Game), Brave (Movie)
  Unambiguous: EOS 5D (Camera), Skyfall 007 (Movie)

Example tweets for "brave":
  "Movie night watching brave with Cammie n Isla n loads munchies"
  "This brave girl deserves endless retweets!"
  "Watching brave with the kiddos!"
  "watching Bregor playing Civ 5: Brave New World and thinking of getting it"

• Perform term disambiguation at the term level, not the instance level
  – Given term T and its category C, do all the mentions of the term reference a member of that category?
• Motivation for IE
  – Simpler model if the term is unambiguous
  – More complex model otherwise

Page 57: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Term Ambiguity Detection (TAD) [Baldwin et al., ACL 2013]

Step 1: N-gram — does the term share a name with a common word/phrase?
Step 2: Ontology — Wiktionary + Wikipedia
Step 3: Clustering — cluster the contexts in which the term appears

Output: the term is classified as Ambiguous or Unambiguous.
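The cascade shape of the pipeline can be illustrated with stubs. Everything below is hypothetical scaffolding: the real steps consult n-gram corpora, Wiktionary/Wikipedia, and context clustering, whereas here each check is a tiny hand-made lookup.

```python
# Hypothetical stand-ins for the three TAD resources (toy data, not the real
# n-gram corpus, ontology, or clustering step).
COMMON_WORDS = {"brave", "beginning"}        # step 1: common word/phrase check
ONTOLOGY_SENSES = {"a new beginning": 2}     # step 2: number of known senses

def is_ambiguous(term):
    t = term.lower()
    if any(w in t.split() for w in COMMON_WORDS):  # step 1: n-gram check
        return True
    if ONTOLOGY_SENSES.get(t, 1) > 1:              # step 2: ontology check
        return True
    return False                                   # step 3 (clustering) omitted

terms = ["EOS 5D", "Skyfall 007", "Brave", "A New Beginning"]
ambiguous = [t for t in terms if is_ambiguous(t)]
```

On the slide's four terms this flags "Brave" and "A New Beginning" as ambiguous and leaves "EOS 5D" and "Skyfall 007" unambiguous.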

Page 58: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML in Dictionary Learning/Refinement

• Transparency in Model of Representation
  – Very simple
  – Model-level Provenance: trivial to connect an extracted object with the input text and the part of the model that determined it
• Transparency in Learning Algorithm
  – Bootstrapping [Riloff & Jones, AAAI 1999] → algorithm-level provenance
    • Every change in the model can be justified by the extraction pattern that extracts it
    • In turn, the extraction pattern can be explained by the seed terms matching the pattern
  – TAD [Baldwin et al., ACL 2013] → some transparency
    • Coarse granularity of transparency in terms of each level of filtering
    • Finer granularity within some of the filters, e.g., those based on Wikipedia/Wiktionary
  – [Gerow, ACL 2014] → no transparency
• Transparency in Incorporation of Domain Knowledge (DK)
  – Offline, for the majority of techniques
  – But easy to incorporate DK at deployment (by further modifying the dictionary)
  – Interactive techniques are potentially fruitful to explore in semi-supervised settings

Page 59: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML Techniques

Techniques, arranged across the supervision spectrum (unsupervised / semi-supervised / supervised):
• Dictionary
• Regex
• Rules
• Rules + Classifier
• Classification Rules

Page 60: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Regular Expressions (Regex)

• Regexes are essential to many IE tasks
  – Email addresses
  – Software names
  – Credit card numbers
  – Social security numbers
  – Gene and protein names
  – …
  (Application domains: Web collections, email compliance, bioinformatics)
• But writing regexes for IE is not straightforward!
• Example: a simple regex for phone number extraction — blocks of digits separated by a non-word character:

  R0 = (\d+\W)+\d+

  – Identifies valid phone numbers (e.g., 800-865-1125, 725-1234)
  – Produces invalid matches (e.g., 123-45-6789, 10/19/2002, 1.25, …)
  – Misses valid phone numbers (e.g., (800) 865-CARE)
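R0's behavior is easy to verify directly with Python's `re` module; the candidate strings below are the slide's own examples.

```python
import re

# The tutorial's starting regex for phone numbers: blocks of digits
# separated by a single non-word character.
R0 = r"(\d+\W)+\d+"

candidates = ["800-865-1125", "725-1234",           # valid phone numbers
              "123-45-6789", "10/19/2002", "1.25",  # invalid matches R0 accepts
              "(800) 865-CARE"]                     # valid number R0 misses

accepted = [s for s in candidates if re.fullmatch(R0, s)]
# R0 accepts everything except "(800) 865-CARE": it is simultaneously
# too loose (SSNs, dates, decimals) and too strict (letter-based numbers).
```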

Page 61: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Learning Regular Expressions

• Supervised
  – Refine a regex given positive and negative examples [Li et al., EMNLP 2008]
• Semi-supervised
  – Learn a regex from positive examples [Brauer et al., CIKM 2011]

Page 62: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Conventional Regex Writing Process for IE

[Flow: start with Regex0, run it over sample documents, inspect the matches, and ask "Good enough?"; if not, revise the regex and repeat until reaching Regexfinal.]

Example iterations for phone numbers:
  Regex0: (\d+\W)+\d+
  Regex1: (\d+\W)+\d{4}
  Regex2: (\d+[\.\s\-])+\d{4}
  Regex3: (\d{3}[\.\s\-])+\d{4}

Matches along the way include valid numbers (800-865-1125, 725-1234) and invalid ones (123-45-6789, 10/19/2002, 1.25).

Page 63: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Learning Regexfinal automatically in ReLIE [Li et al., EMNLP 2008]

[Flow: Regex0 is run over sample documents; its matches are labeled as positive (PosMatch 1 … PosMatch n0) and negative (NegMatch 1 … NegMatch m0); ReLIE takes Regex0 and the labeled matches and produces Regexfinal.]

Page 64: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

ReLIE Intuition

Starting regex:
  R0 = ([A-Z][a-zA-Z]{1,10}\s){1,5}\s*(\w{0,2}\d[\.]?){1,4}

Candidate regexes are generated by modifying R0, for example:
  – restricting a character class:
    ([A-Z][a-zA-Z]{1,10}\s){1,5}\s*([a-zA-Z]{0,2}\d[\.]?){1,4}
  – restricting a quantifier:
    ([A-Z][a-zA-Z]{1,10}\s){1,2}\s*(\w{0,2}\d[\.]?){1,4}
  – adding a negative lookahead:
    ([A-Z][a-zA-Z]{1,10}\s){1,5}\s*(?!(201|…|330))(\w{0,2}\d[\.]?){1,4}
  – adding a negative dictionary:
    ((?!(Copyright|Page|Physics|Question|…|Article|Issue))[A-Z][a-zA-Z]{1,10}\s){1,5}\s*(\w{0,2}\d[\.]?){1,4}

Each candidate's F-measure (F1, F7, F8, F34, F35, F48, …) is computed on the labeled matches, and the best candidate R' is kept.

• Generate candidate regular expressions by modifying the current regular expression
• Select the "best candidate" R'
• If R' is better than the current regular expression, repeat the process
• Use a validation set to avoid overfitting
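The greedy loop above can be sketched compactly. This is a toy illustration of the search strategy, not the ReLIE system: `candidates` implements only one transformation (progressively restricting the `\W` character class, as in the phone-number example), and there is no separate validation set.

```python
import re

def f1(regex, pos, neg):
    """F-measure of `regex` (full matches) over labeled pos/neg strings."""
    tp = sum(1 for s in pos if re.fullmatch(regex, s))
    fp = sum(1 for s in neg if re.fullmatch(regex, s))
    fn = len(pos) - tp
    if tp == 0:
        return 0.0
    p, r = tp / (tp + fp), tp / (tp + fn)
    return 2 * p * r / (p + r)

def hill_climb(regex, candidates_fn, pos, neg, max_iters=10):
    """Greedy ReLIE-style loop: apply the best-scoring transformation
    until no candidate improves on the current regex."""
    best, best_f = regex, f1(regex, pos, neg)
    for _ in range(max_iters):
        scored = [(f1(c, pos, neg), c) for c in candidates_fn(best)]
        if not scored:
            break
        f, c = max(scored)
        if f <= best_f:
            break
        best, best_f = c, f
    return best, best_f

def candidates(r):
    # Toy transformation set: restrict \W -> [\.\s\-] -> [\-].
    out = []
    if r"\W" in r:
        out.append(r.replace(r"\W", r"[\.\s\-]", 1))
    if r"[\.\s\-]" in r:
        out.append(r.replace(r"[\.\s\-]", r"[\-]", 1))
    return out

pos = ["800-865-1125", "725-1234"]
neg = ["10/19/2002", "1.25"]
best, score = hill_climb(r"(\d+\W)+\d+", candidates, pos, neg)
```

On this toy data the search tightens `\W` twice and reaches a regex that accepts both phone numbers while rejecting the date and the decimal.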

Page 65: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Regex Learning Problem

• Find the best Rf among all possible regexes
  – Best = highest F-measure over a document collection D
  – Can only compute F-measure on the labeled data → limit Rf so that any match of Rf is also a match of R0

• Two Regex Transformations
  – Drop-disjunct Transformation:
    R = Ra(R1 | R2 | … Ri | Ri+1 | … | Rn)Rb → R' = Ra(R1 | … | Ri)Rb
  – Include-intersect Transformation:
    R = Ra X Rb → R' = Ra(X ∩ Y)Rb, where Y ≠ ∅

Page 66: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Applying the Drop-Disjunct Transformation

• Character Class Restriction
  E.g., to restrict the matching of non-word characters:
  (\d+\W)+\d+ → (\d+[\.\s\-])+\d+

• Quantifier Restriction
  E.g., to restrict the number of digits in a block:
  (\d+\W)+\d+ → (\d{3}\W)+\d+
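Both restrictions can be observed directly on the slide's phone-number examples; each step filters out more of the invalid matches.

```python
import re

candidates = ["800-865-1125", "725-1234", "10/19/2002", "1.25"]

r0 = r"(\d+\W)+\d+"              # original: any non-word separator
r_class = r"(\d+[\.\s\-])+\d+"   # character-class restriction: \W -> [\.\s\-]
r_quant = r"(\d{3}[\.\s\-])+\d+" # plus quantifier restriction: \d+ -> \d{3}

def accepted_by(r):
    return [s for s in candidates if re.fullmatch(r, s)]

# r0 accepts all four strings; r_class drops the date; r_quant also drops
# the decimal, leaving only the two valid phone numbers.
```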

Page 67: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Applying the Include-Intersect Transformation

• Negative Dictionaries
  – Disallow certain words from matching specific portions of the regex

E.g., a simple pattern for software name extraction — blocks of capitalized words followed by a version number:

  R0 = ([A-Z]\w*\s*)+[Vv]?(\d+\.?)+

  – Identifies valid software names (e.g., Eclipse 3.2, Windows 2000)
  – Produces invalid matches (e.g., ENGLISH 123, Room 301, Chapter 1.2)

  Rf = (?!ENGLISH|Room|Chapter)([A-Z]\w*\s*)+[Vv]?(\d+\.?)+
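In Python, the negative dictionary is exactly a negative lookahead, and its effect is easy to check on the slide's examples:

```python
import re

# Before: capitalized words followed by a version number.
r0 = r"([A-Z]\w*\s*)+[Vv]?(\d+\.?)+"
# After include-intersect with a negative dictionary: the same pattern, with a
# negative lookahead disallowing the known false-positive leading words.
rf = r"(?!ENGLISH|Room|Chapter)([A-Z]\w*\s*)+[Vv]?(\d+\.?)+"

texts = ["Eclipse 3.2", "Windows 2000", "ENGLISH 123", "Room 301", "Chapter 1.2"]
software = [t for t in texts if re.fullmatch(rf, t)]
# r0 would accept all five; rf keeps only the real software names.
```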

Page 68: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Learning regex from positive examples [Brauer et al., CIKM 2011]

• Input: a set of examples (notebook model names):
  z800, z800 AAB, d700 ASE, z40y, d50t ATX
• Output: one regex:
  (d|z)([0-9]0{2}|[0-9]0[a-z]) ([A-Z]+)?

© 2015 IBM Corporation

Learning Learning Learning Learning a Regex a Regex a Regex a Regex from Pfrom Pfrom Pfrom Positive Examples ositive Examples ositive Examples ositive Examples [[[[BrauerBrauerBrauerBrauer et al. et al. et al. et al. CIKM 2011CIKM 2011CIKM 2011CIKM 2011]]]]Step 1: Step 1: Step 1: Step 1: Build automata to capture all features of the examples

• Features: class vs. instance level and token vs. character level

• Transitions encode the sequential ordering of features in the examples

69

z800z800 AABd700 ASEz40yd50t ATX

InstancesInstancesInstancesInstances

Features at (character) class levelFeatures at (character) class level

Features at instance levelFeatures at instance level

Token features

Character features

Page 70: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Learning a Regex from Positive Examples [Brauer et al., CIKM 2011]
Step 2: Choose between class and instance features
• Prefer an instance feature if it is very common in the examples
• A parameter β further influences feature selection toward class features (for higher recall) vs. instance features (for higher precision)

Page 71: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Learning a Regex from Positive Examples [Brauer et al., CIKM 2011]
Step 3: Choose between token and character features
• Use the Minimum Description Length (MDL) principle to choose the most promising abstraction layer
• Balances model complexity against its fitness to encode the data

Page 72: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Learning a Regex from Positive Examples [Brauer et al., CIKM 2011]
Step 4: Generate regular expressions for each end state
• Pick the expression with the smallest MDL from the begin state to the end state
• Apply simplification rules, e.g., cardinality
• Final regex: (z|d) ((<NB>0{2}) | (<NB>0<LC>)) ( _<UC>+){0,1}

Page 73: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML in Regex Learning/Refinement

• Transparency in Model of Representation
  – Simple
  – Model-level provenance: easy to connect a result of the model with the input text that determined it
• Transparency in Learning Algorithm
  – No algorithm-level provenance
  – ReLIE [Li et al., EMNLP 2008] → some transparency in influencing the model via the initial regular expression
  – [Brauer et al., CIKM 2011] → some transparency in influencing feature selection
• Transparency in Incorporation of Domain Knowledge (DK)
  – Offline
  – But easy to incorporate DK at deployment (by modifying the regex)
  – Interactive techniques potentially useful

Page 74: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML Techniques

Techniques, arranged across the supervision spectrum (unsupervised / semi-supervised / supervised):
• Dictionary
• Regex
• Rules
• Rules + Classifier
• Classification Rules

Page 75: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Fact Extraction

Fact (or concept): can be an entity, relation, event, …

              Traditional IE             Open IE [Banko et al., 2007]
  Input:      Corpus (+ labeled data)    Corpus
  Type:       Specified in advance       Discovered automatically, or specified via ontology
  Extractor:  Type-specific              Type-independent

Several papers, and two tutorials at this EMNLP:
• Knowledge Acquisition for Web Search (now)
• Learning Semantic Relations from Text (Friday morning)

Page 76: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML Techniques

Techniques, arranged across the supervision spectrum (unsupervised / semi-supervised / supervised):
• Dictionary
• Regex
• Rules
• Rules + Classifier
• Classification Rules

Page 77: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Fact Extraction: Supervised

• Fact (or concept): can be an entity, relation, event, …
• Context: traditional IE
• Input: document collection, labeled with the target concept
• Goal: induce rules that capture the target concept
• Earlier work: sequence patterns (CPSL-style) as target language
• Recent work: predicate-based rule programs as target language


Page 79: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Supervised Learning of Sequence Patterns

• Input:
  – Collection of text documents, labeled with the target concept
  – Available basic features: tokens, orthography, parts of speech, dictionaries, entities, …
• Goal: define the smallest set of rules that cover the maximum number of training cases with high precision
• Model of Representation: unordered disjunction of sequence pattern rules
• General framework: greedy hill-climbing strategy that learns one rule at a time
  1. S is the set of rules, initially empty
  2. While there exists a training concept not covered by any rule in S:
     • Generate new rules around it
     • Add the new rules to S
  3. Post-process the rules to prune away redundant ones
• Techniques: bottom-up and top-down
• Surveys: [Muslea, AAAI Workshop on ML in IE 1999], [Sarawagi, Foundations and Trends in Databases, 2008]

Page 80: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Bottom-up Techniques: Generalize a Specific Rule

• Start with a specific rule covering a single instance (100% precision)
• Generalize the rule to increase its coverage, with a possible loss of precision
  – Many strategies: e.g., dropping a token, or replacing a token with a more general feature
• Remove instances covered by the rule from the training set
• Example systems: RAPIER [Califf & Mooney AAAI 1999, JMLR 2003], (LP)2 [Ciravegna IJCAI 2001]

Page 81: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Bottom-up Technique Example: (LP)2 [Ciravegna IJCAI 2001]

• Example text: "I am studying at University of Chicago."
• Initial rule: a snippet of w tokens to the left and right of the labeled instance
  <Token>[string="studying"] <Token>[string="at"] (<Token>[string="University"] <Token>[string="of"] <Token>[string="Chicago"]):ORG
• Some generalizations of the initial rule:
  – Two tokens generalized to orthography type:
    <Token>[string="studying"] <Token>[string="at"] (<Token>[orth="CapsWord"] <Token>[string="of"] <Token>[orth="CapsWord"]):ORG
  – Two tokens dropped; two tokens generalized by whether they appear in dictionaries:
    (<Token>[Lookup="OrgPrefix"] <Token>[string="of"] <Token>[Lookup="CityName"]):ORG
• Exponential number of generalizations → heuristics to reduce the search space
  – Greedily select the best single step of generalization
  – User-specified maximum number of generalizations retained
• The top-k "best" generalizations are added to the "best rules pool"
  – Based on a combination of rule-quality measures, including precision, overall coverage, and coverage of instances not covered by other rules
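A single generalization step can be sketched as follows. This is a hypothetical miniature, not (LP)2 itself: a rule is a flat list of token constraints, and the only generalization shown is relaxing a literal string to an orthography class.

```python
# Hypothetical (LP)2-style bottom-up generalization sketch.
def orth(tok):
    """Crude orthography feature for a token."""
    return "CapsWord" if tok[:1].isupper() else "lower"

def matches(rule, tokens):
    """Does the constraint sequence match anywhere in the token list?"""
    n = len(rule)
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(c == t or c == f"orth={orth(t)}" for c, t in zip(rule, window)):
            return True
    return False

# Seed rule induced from "studying at (University of Chicago)":
seed = ["studying", "at", "University", "of", "Chicago"]
# One generalization step: entity tokens relaxed to an orthography class.
gen = ["studying", "at", "orth=CapsWord", "of", "orth=CapsWord"]

train = [
    "I am studying at University of Chicago".split(),
    "She is studying at College of Wooster".split(),
]
coverage = sum(matches(gen, s) for s in train)
```

The seed rule covers only its own instance, while the generalized rule also covers the second sentence, illustrating the coverage-for-precision trade that the scoring heuristics arbitrate.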

Page 82: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Top-down Techniques: Specialize a Generic Rule

• Start with a generic rule covering all instances (100% coverage)
• Specialize the rule in various ways to obtain a set of rules with high precision (inductive-logic-programming style)
• Example systems: WHISK [Soderland, ML 1999], [Aitken, ECAI 2002]

Page 83: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Top-down Technique Example: WHISK [Soderland, ML 1999]

• Seed labeled instance: "Capitol Hill – 1 br townhome, all inclusive $675"
• Initial rule: * ( * ) * ( * ) * ( * )
• Some specializations of the initial rule:
  – First slot anchored inside: * ( Neighborhood ) * ( * ) * ( * )
  – First slot anchored outside: @start ( * ) '-' * ( * ) * ( * )
• Greedily select the best single specialization step
  – Capture the seed and minimize error on the training set
  – Heuristics prefer the least restrictive rule that fits the data, e.g., choosing semantic classes and syntactic tags over literals
• Semi-supervised and interactive
  – Start with a random sample of unlabeled instances, possibly satisfying some keywords
  – In each iteration, automatically select instances from 3 sets for the user to label:
    • Covered by an existing rule → increases support for the rule, or provides a counterexample
    • "Near" misses of existing rules
    • Not covered by any rule

Page 84: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML in Learning of CPSL-style Patterns

• Transparency in Model of Representation
  – Relatively simple representation
  – Model-level Provenance: easy to connect an extracted object with the input text and the part of the model (i.e., the rule) that determined it
• Transparency in Learning Algorithm
  – No transparency
• Transparency in Incorporation of Domain Knowledge (DK)
  – Most systems → offline (fully supervised)
  – WHISK → interactive
    • Active learning techniques are used to select examples for the user to label
  – Easy to incorporate domain knowledge at deployment (by further modifying the rules)

Page 85: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Fact Extraction: Supervised

• Earlier work: sequence patterns (CPSL-style) as target language
• Recent work: predicate-based rule program as target language

Page 86: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Supervised Learning of Predicate-based Rules

• Rule Induction: generate a rule program from basic features
• Rule refinement: refine an existing rule program

Page 87: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Supervised Learning of Predicate-based Rules

• Rule Induction: generate a rule program from basic features
  – E.g., [Nagesh et al., 2012]
• Rule refinement: refine an existing rule program

Page 88: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

NER Rule Induction [Nagesh et al., EMNLP 2012]

• Input:
  – Basic features (dictionaries & regular expressions)
  – Fully labeled document collection (PER, ORG, LOC)
• Goal: induce an initial set of named-entity rules that can be refined / customized by a domain expert

Page 89: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Anatomy of a Named Entity Extractor

Pipeline: Basic Features (BF rules) → Candidate Definition (CD rules) → Candidate Refinement (CR rules) → Consolidation (CO rules)

Example, on the document "... we met Ms. Anna Smith from Melinda Gates Foundation ...": BF rules mark Caps, FirstNameDict, and LastNameDict matches; CD rules combine them into PersonCandidate and Organization candidates; CR rules refine PersonCandidate into PersonCandidateWithSalutation; CO rules consolidate the final Person annotation.

Page 90: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Overview of Rule Induction System

Inputs: BF rules and an annotated dataset. CD rules are induced via clustering and LGG; CR rules are induced via propositional rule learning (RIPPER); a simple CO rule completes the extractor. The induced stages form the pipeline: Basic Features (BF rules) → Candidate Definition (CD rules) → Candidate Refinement (CR rules) → Consolidation (CO rules).

Page 91: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

First order representation of labeled data

Example: <PER> M. Waugh </PER>

BF rules: Caps, LastNameDict, InitialDict

Textual spans generated:
  Caps → Waugh
  LastNameDict → Waugh
  InitialDict → M.

First Order Logic predicates: Caps(X2), LastNameDict(X2), InitialDict(X1)

Glue predicates: startsWith(X, X1), endsWith(X, X2), immBefore(X1, X2), contains(Y, Y3), equals(Z1, Z2)

First order representation (X spans "M. Waugh", X1 = "M.", X2 = "Waugh"):
person(X, d1) :- startsWith(X, X1), InitialDict(X1),
                 endsWith(X, X2), immBefore(X1, X2), Caps(X2), LastNameDict(X2)
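A rule in this form can be evaluated directly over adjacent token spans. The sketch below is illustrative only; the token-based span representation and the two small dictionaries are assumptions, not the paper's implementation.

```python
import re

# Evaluate person(X) :- startsWith(X,X1), InitialDict(X1), endsWith(X,X2),
#                       immBefore(X1,X2), Caps(X2), LastNameDict(X2)
# over a token sequence, with hypothetical basic features.
INITIAL_DICT = re.compile(r"^[A-Z]\.$")       # e.g. "M."
LAST_NAME_DICT = {"Waugh", "Smith"}           # toy dictionary

def person_spans(tokens):
    matches = []
    for i in range(len(tokens) - 1):
        x1, x2 = tokens[i], tokens[i + 1]     # immBefore(X1, X2)
        if (INITIAL_DICT.match(x1)            # InitialDict(X1)
                and x2[0].isupper()           # Caps(X2)
                and x2 in LAST_NAME_DICT):    # LastNameDict(X2)
            matches.append(f"{x1} {x2}")      # X startsWith X1, endsWith X2
    return matches

print(person_spans(["we", "met", "M.", "Waugh", "today"]))  # ['M. Waugh']
```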

Page 92: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Induction of CD rules: Least general generalisation (LGG) of annotations

PER "john Smith":
person(X,D1) :- startsWith(X, X1), FirstNameDict(X1),
                endsWith(X, X2), immBefore(X1,X2), Caps(X2).

PER "John Doe":
person(Y,D2) :- startsWith(Y, Y1), FirstNameDict(Y1), Caps(Y1),
                endsWith(Y, Y2), immBefore(Y1,Y2), Caps(Y2).

LGG of the above:
person(Z,D) :- startsWith(Z, Z1), FirstNameDict(Z1),
               endsWith(Z, Z2), immBefore(Z1,Z2), Caps(Z2)
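For clauses that are already variable-aligned, as these two are after renaming to a shared scheme (Z, Z1, Z2), the LGG body reduces to the literals common to both clauses. This is a deliberate simplification of full LGG, which operates on term structure.

```python
# Simplified LGG sketch: intersect the (variable-renamed) clause bodies,
# preserving literal order. Real LGG anti-unifies terms; set intersection
# suffices here because the clauses share one variable scheme.
def lgg(body_a, body_b):
    return [lit for lit in body_a if lit in body_b]

clause_john_smith = ["startsWith(Z,Z1)", "FirstNameDict(Z1)",
                     "endsWith(Z,Z2)", "immBefore(Z1,Z2)", "Caps(Z2)"]
clause_john_doe   = ["startsWith(Z,Z1)", "FirstNameDict(Z1)", "Caps(Z1)",
                     "endsWith(Z,Z2)", "immBefore(Z1,Z2)", "Caps(Z2)"]

print("person(Z,D) :- " + ", ".join(lgg(clause_john_doe, clause_john_smith)))
```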

Page 93: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Clustering of Annotations

Examples: "J. Doe", "John Doe", "j. Smith", "john Smith", "M. Waugh"

person(X,D1) :- startsWith(X, X1), FirstNameDict(X1), endsWith(X, X2), immBefore(X1,X2), Caps(X2).
person(Y,D2) :- startsWith(Y, Y1), FirstNameDict(Y1), Caps(Y1), endsWith(Y, Y2), immBefore(Y1,Y2), Caps(Y2).
person(Z,D1) :- startsWith(Z, Z1), InitialDict(Z1), endsWith(Z, Z2), immBefore(Z1,Z2), Caps(Z2).
person(W,D3) :- startsWith(W, W1), InitialDict(W1), Caps(W1), endsWith(W, W2), immBefore(W1,W2), Caps(W2).

LGG rule for the InitialDict cluster:
person(K,D3) :- startsWith(K, K1), InitialDict(K1), Caps(K1),
                endsWith(K, K2), immBefore(K1,K2), Caps(K2).

Cross-cluster pairs (e.g., a FirstNameDict clause with an InitialDict clause) are not useful LGG computations. Cluster examples → reduce the computation; features for clustering are obtained from the RHS of the example clauses.

Page 94: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Induction of CR rules

• Build a table encoding whether a span generated by one CD rule matches (M) or overlaps (O) with a span generated by any other CD rule
• Learn compositions of CD rules via the RIPPER propositional learner [Furnkranz and Widmer, 1994]
• Inductive bias to model rule-developer expertise and restrict the size of generated rules:
  1. Disallow the BFs for one entity type from appearing in CD rules for another type
     • Avoids: PerCD ← [FirstNameDict][CapsPerson ^ CapsOrg]
  2. Restrict the types of CD views that can appear in a CR rule
     • Avoids: PerCR ← (OrgCD = M) AND (LocCD != O)

Example learned CR rule: LOC ← (locCDi = M) AND (orgCDj != O), i.e., a span of text is a LOC if it matches a Loc-CD rule and does not overlap with an Org-CD rule. "Washington" in "Washington Post" will be filtered due to this rule.
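Applying such a CR rule amounts to an overlap check between candidate spans. The sketch below is illustrative; the character intervals are hypothetical.

```python
# Apply LOC <- (locCD = M) AND (orgCD != O): keep a LOC candidate only if it
# does not overlap any Org-CD span. Spans are half-open (start, end) intervals.
def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

def apply_cr_rule(loc_candidates, org_spans):
    return [s for s in loc_candidates
            if not any(overlaps(s, o) for o in org_spans)]

# Hypothetical offsets: "Washington" (0, 10) inside "Washington Post" (0, 15);
# "Seattle" at (30, 37) elsewhere in the text.
loc_cd = [(0, 10), (30, 37)]
org_cd = [(0, 15)]
print(apply_cr_rule(loc_cd, org_cd))  # [(30, 37)]: "Washington" is filtered
```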

Page 95: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Supervised Learning of Predicate-based Rules

• Rule Induction: generate a rule program from basic features
• Rule refinement: refine an existing rule program
  – Refine rules [Liu et al., 2010]
  – Refine dictionaries used by the rules [Roy et al., 2013]

Page 96: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Rule Refinement [Liu et al. VLDB 2010]

R1: create view Phone as
    Regex('\d{3}-\d{4}', Document, text);

R2: create view Person as
    Dictionary('first_names.dict', Document, text);

Dictionary file first_names.dict: anna, james, john, peter, ...

R3: create table PersonPhone(match span);
    insert into PersonPhone
    select Merge(F.match, P.match) as match
    from Person F, Phone P
    where Follows(F.match, P.match, 0, 60);

Document text: "Anna at James St. office (555-5555), or James, her assistant – 777-7777 have the details."
Phone matches: 555-5555, 777-7777
Person matches: Anna, James, James

• Rules expressed in SQL
  – Select, Project, Join, Union all, Except all
  – Text-specific extensions
    • Regex, Dictionary table functions
    • New selection/join predicates
  – Can express core functionality of IE rule languages: AQL, CPSL, XLog
• Relational data model
  – Tuples and views
  – New data type span: a region of text in a document
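The semantics of rules R1 to R3 can be mimicked in a few lines: a regex view, a dictionary view, and a join constrained by a Follows predicate. This is a sketch of what the rules compute, not the system's SQL engine; the span representation is an assumption.

```python
import re

# Spans are (start, end) character offsets, as in the rules' span data type.
FIRST_NAMES = {"anna", "james", "john", "peter"}  # first_names.dict (toy)

def phones(text):                      # R1: Regex('\d{3}-\d{4}', ...)
    return [m.span() for m in re.finditer(r"\d{3}-\d{4}", text)]

def persons(text):                     # R2: Dictionary('first_names.dict', ...)
    return [m.span() for m in re.finditer(r"[A-Za-z]+", text)
            if m.group().lower() in FIRST_NAMES]

def person_phone(text, max_gap=60):    # R3: join with Follows(F, P, 0, 60)
    return [text[f[0]:p[1]] for f in persons(text) for p in phones(text)
            if 0 <= p[0] - f[1] <= max_gap]   # phone starts 0..60 chars later

doc = ("Anna at James St. office (555-5555), or James, her assistant "
       "- 777-7777 have the details.")
for m in person_phone(doc):
    print(m)
```

Note that, exactly as on the slide, the join also produces the wrong output pairing "James" (the street) with 555-5555.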

Page 97: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Rule Refinement [Liu et al. VLDB 2010]

Challenges:
• Which rule to refine, and how?
• What are the effects and side-effects?

Page 98: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Method Overview

• Framework for systematic exploration of multiple refinements, geared towards improving precision
• Input: extractor P; results of P, fully labeled
• Goal: generate refinements of P that remove false positives while not affecting true positives
• Basic idea: cut any provenance link → the wrong output disappears

(Simplified) provenance of a wrong output:
  Doc → Dictionary('FirstNames.dict') → Person: "James"
  Doc → Regex('\d{3}-\d{4}') → Phone: "555-5555"
  Person ⋈ Phone, Follows(name, phone, 0, 60) → PersonPhone: "James → 555-5555"

Page 99: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

High Level Changes: What Operator to Modify?

• Canonical algebraic representation of extraction rules as trees of operators: R2 is a Dictionary('firstName.dict', text) operator, R1 is a Regex('\d{3}-\d{4}', text) operator, and R3 is π(Merge(F.match, P.match) as match) over σ(true) over ⋈(Follows(F.match, P.match, 0, 60)) of R2 and R1.

Goal: remove "James → 555-5555" from the output
  – HLC 1: remove 555-5555 from the output of R1's Regex operator
  – HLC 2: remove James from the output of R2's Dictionary operator
  – HLC 3: remove James → 555-5555 from the output of R3's join operator

Page 100: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Low-Level Changes: How to Modify the Operator?

Goal: remove "James → 555-5555" from the output
  – LLC 1: remove 'James' from FirstNames.dict
  – LLC 2: add a filter predicate on street suffix in the right context of the match
  – LLC 3: reduce the character gap between F.match and P.match from 60 to 10

Page 101: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Types of Low-Level Changes

1. Modify numerical join parameters: implements HLCs for ⋈
2. Remove dictionary entries: implements HLCs for Dictionary and σ ContainsDict() (more on this later)
3. Add filtering dictionary: implements HLCs for σ
   • Parameters: target of filter (match, or left/right context)
4. Add filtering view: applies to an entire view
   • Parameters: filtering view, filtering mode (Contains, IsContained, Overlaps)
   • E.g., "Subtract from the result of rule R3 PersonPhone spans that are strictly contained within another PersonPhone span"
• Other LLC generation modules can be incorporated

Page 102: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Computing Model-level Provenance

PersonPhone rule:
insert into PersonPhone
select Merge(F.match, P.match) as match
from Person F, Phone P
where Follows(F.match, P.match, 0, 60);

PersonPhone matches: "Anna at James St. office (555-5555", "James St. office (555-5555"

• (Model-level) Provenance: explains output data in terms of the input data, the intermediate data, and the transformation (e.g., SQL query, ETL, workflow)
  – Surveys: [Davidson & Freire, SIGMOD 2008], [Cheney et al., Found. Databases 2009]
• For predicate-based rule languages (e.g., SQL), provenance can be computed automatically!

Page 103: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Computing Model-level Provenance

Rewritten PersonPhone rule:
insert into PersonPhone
select Merge(F.match, P.match) as match,
       GenerateID() as ID,
       F.id as nameProv, P.id as numberProv,
       'AND' as how
from Person F, Phone P
where Follows(F.match, P.match, 0, 60);

Person IDs: 1 (Anna), 2 (James); Phone ID: 3 (555-5555)
PersonPhone matches with provenance:
  "Anna at James St. office (555-5555"  ←  1 AND 3
  "James St. office (555-5555"          ←  2 AND 3
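The rewritten join can be imitated directly: each output tuple records the IDs of its input tuples plus the AND connective. This is a sketch of the idea, not the system's rewrite machinery; the (id, start, end, text) tuple layout is an assumption.

```python
# Join Person and Phone spans under Follows(F, P, 0, 60) while carrying
# model-level provenance, mirroring the rewritten PersonPhone rule.
def follows(f, p, min_gap, max_gap):
    return min_gap <= p[1] - f[2] <= max_gap   # phone start vs. name end

def join_with_provenance(persons, phones):
    out = []
    for f in persons:
        for p in phones:
            if follows(f, p, 0, 60):
                out.append({"match": (f[1], p[2]),        # Merge(F, P) span
                            "prov": (f[0], "AND", p[0])}) # nameProv AND numberProv
    return out

persons = [(1, 0, 4, "Anna"), (2, 8, 13, "James")]
phones = [(3, 26, 34, "555-5555")]
for t in join_with_provenance(persons, phones):
    print(t)  # provenance tuples: (1, 'AND', 3) and (2, 'AND', 3)
```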

Page 104: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Generating HLCs and LLCs

• HLCs: compute directly from the provenance graph and the negative examples
• LLCs: naive approach
  – For each HLC (ti, Op), enumerate all possible LLCs
  – For each LLC:
    • Compute the set of local tuples it removes from the output of Op
    • Propagate removals up the provenance graph to compute the effect on the end-to-end result
  – Rank LLCs based on improvement in F1

Provenance graph of a wrong output: t1 ("555-5555", from Regex1) and t2 ("James", from Dictionary2) feed ⋈3 (t3), then σ3 (t4), then π3 (t5: "James St. office (555-5555").
HLCs: (t5, π3), (t4, σ3), (t3, ⋈3), (t1, Regex1), (t2, Dictionary2)

Page 105: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Problems with the Naïve Approach

• Problem 1: Given an HLC, the number of possible LLCs may be large
  – E.g., if the HLC is (t, Dictionary) and the dictionary has 1000 entries, there are 2^999 - 1 possible LLCs!
• Solution: Limit the LLCs considered to a set of tractable size, while still considering all feasible combinations of HLCs for Op
  – Generate a single LLC for each of k promising combinations of HLCs for Op
  – k is the number of LLCs presented to the user
• Problem 2: Traversing the provenance graph is expensive
  – O(n^2), where n is the size of the operator tree
• Solution: For each Op and tuple ti in the output of Op, remember the mapping ti → {set of affected output tuples}

(Figure: output tuples marked +/-, and the subset of tuples to remove from the output of Op.)

Page 106: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

LLC Generation: Learning a Filter Dictionary

Output of σ operator → final output of Person extractor:
  James → James St;  Morgan → Morgan Ave;  June → June Blvd;
  Anna → Anna Karenina Blvd;  Hall → Hall St

Common token in right context → effects of filtering with the token:
  'st'   → James St, Hall St
  'blvd' → June Blvd, Anna Karenina Blvd
  'ave'  → Morgan Ave

Generated LLCs: add ContainsDict('SuffixDict', RightContextTok(match, 2)) to the σ operator, where SuffixDict contains:
1. 'st'
2. 'st', 'blvd'
3. 'st', 'blvd', 'ave'
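The nested candidate dictionaries can be generated by grouping false positives by their right-context token and ranking tokens by how many false positives each removes. A sketch under that assumption, using the slide's example data:

```python
from collections import defaultdict

# Build candidate filter dictionaries of increasing size from
# (false-positive match, right-context token) pairs.
false_positives = [("James", "st"), ("Hall", "st"),
                   ("June", "blvd"), ("Anna", "blvd"),
                   ("Morgan", "ave")]

def candidate_filter_dicts(fps):
    removed = defaultdict(list)
    for match, tok in fps:
        removed[tok].append(match)           # which FPs each token removes
    ranked = sorted(removed, key=lambda t: len(removed[t]), reverse=True)
    return [ranked[:i + 1] for i in range(len(ranked))]  # nested prefixes

for d in candidate_filter_dicts(false_positives):
    print(d)  # ['st'], then ['st', 'blvd'], then ['st', 'blvd', 'ave']
```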

Page 107: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Supervised Learning of Predicate-based Rules

• Rule Induction: generate a rule program from basic features
• Rule refinement: refine an existing rule program
  – Refine rules [Liu et al., 2010]
  – Refine dictionaries used by the rules [Roy et al., 2013]

Page 108: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

"... This April, mark your calendars for the first derby of the season: Arsenal at Chelsea. ... April Smith and John Lee reporting live from ... David said that ..."

Dictionary Refinement Problem [Roy et al, SIGMOD 2013]

Input:
• Predicate-based rule program (SQL-like)
• Boolean model-level provenance of each result
• True-positive / false-positive label of each result

Results, labels, and provenance: April ✗ (w3), Chelsea ✗ (w1), April Smith ✓ (w5 + w3 w4), John Lee ✓ (w2 w6), David ✓ (w7)

Goal: Maximize F-score. Select a set S of entries to remove from dictionaries
... that maximizes the new F-score
... subject to |S| ≤ k (size constraint: limit #deleted entries)
    or new recall ≥ r (recall constraint: limit #true positives deleted)

Possible output: S = {w1: chelsea, w3: april}. New F-score = 1 ☺

(We also studied the incomplete labeling case.)

Page 109: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Dictionary Refinement Problem [Roy et al, SIGMOD 2013]

Challenges:
• Complex input-output dependencies
• Complex objective function

Page 110: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Complex Objective Function

New F-score after deleting S = 2·G-s / (Go + G-s + B-s)

where:
  Go  = original #true positives
  G-s = remaining #true positives after deleting S
  B-s = remaining #false positives after deleting S

Both the numerator and the denominator depend on S (even if we try to rewrite the expression).
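The objective can be evaluated directly for any candidate deletion set S, given the labels and Boolean provenance of the running example. A sketch, with provenance encoded in DNF (a result survives if some conjunct has no deleted entry):

```python
# name: (is_true_positive, provenance as a list of monotone conjunctions)
results = {
    "April":       (False, [["w3"]]),
    "Chelsea":     (False, [["w1"]]),
    "April Smith": (True,  [["w5"], ["w3", "w4"]]),
    "John Lee":    (True,  [["w2", "w6"]]),
    "David":       (True,  [["w7"]]),
}

def new_f_score(deleted):
    g_o = sum(tp for tp, _ in results.values())   # original true positives
    g_s = b_s = 0
    for tp, dnf in results.values():
        survives = any(all(w not in deleted for w in conj) for conj in dnf)
        if survives:
            g_s, b_s = (g_s + 1, b_s) if tp else (g_s, b_s + 1)
    return 2 * g_s / (g_o + g_s + b_s)

print(new_f_score(set()))           # 0.75: original F-score, 2*3/(3+3+2)
print(new_f_score({"w1", "w3"}))    # 1.0: both false positives removed,
                                    # April Smith survives via w5
```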

Page 111: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Results: Simple Rules

• Simple rules: provenance has a simple form; one input entry may produce many results
• Size constraint: optimal algorithm (some details next)
• Recall constraint (remaining true positives after deleting S ≥ r): NP-hard (reduction from the subset-sum problem); "near optimal" algorithm (simple, provably close to optimal)

Page 112: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Sketch of Optimal Algorithm for Simple Rules, Size Constraint |S| ≤ k

1. Guess the optimal F-score θ
2. Verify whether there exists a subset S, |S| ≤ k, achieving F-score ≥ θ
3. Repeat by binary search in [0, 1] until the optimal θ is found

F-s = 2·G-s / (Go + G-s + B-s) ≥ θ
  ⇔ G-s·(2 - θ) - θ·B-s - θ·Go ≥ 0

With G-s = Go - Σ_{w∈S} Gw and B-s = Bo - Σ_{w∈S} Bw, the check becomes
Σ_{w∈S} f(Gw, Bw) ≥ Const, where |S| ≤ k: a top-k problem, solvable in poly time!

Notes:
• Binary search on real numbers in [0, 1] (still poly-time)
• Does not work for the general (many-to-many) case
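Under the simple-rule assumption that each dictionary entry w independently contributes Gw true positives and Bw false positives, the algorithm can be sketched as a binary search whose feasibility check is a top-k selection. The per-entry counts below are invented for illustration.

```python
# Binary search on the F-score theta; feasibility via top-k selection.
def feasible(theta, entries, g_o, b_o, k):
    # F >= theta  <=>  G_s*(2-theta) - theta*B_s - theta*G_o >= 0,
    # with G_s = G_o - sum(Gw), B_s = B_o - sum(Bw) over deleted entries.
    base = g_o * (2 - theta) - theta * b_o - theta * g_o
    # Deleting w changes the LHS by theta*Bw - (2-theta)*Gw; take the k best.
    gains = sorted((theta * bw - (2 - theta) * gw for gw, bw in entries),
                   reverse=True)
    return base + sum(g for g in gains[:k] if g > 0) >= 0

def optimal_f(entries, k, iters=40):
    g_o = sum(g for g, _ in entries)
    b_o = sum(b for _, b in entries)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if feasible(mid, entries, g_o, b_o, k):
            lo = mid
        else:
            hi = mid
    return lo

entries = [(3, 0), (1, 4), (0, 2)]  # hypothetical (Gw, Bw) per entry
print(round(optimal_f(entries, k=2), 3))  # deleting the last two entries
                                          # gives F = 2*3/(4+3+0) = 6/7
```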

Page 113: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Results: Complex Rules

• Simple rules (provenance: w): size constraint → optimal algorithm; recall constraint (bound on the true positives retained) → NP-hard, "near optimal" algorithm
• Complex rules (arbitrary extraction rules, arbitrary provenance, many-to-many dependency):
  – NP-hard even for two dictionaries (reduction from the k-densest-subgraph problem)
  – Efficient heuristics; sketch: find an initial solution, then improve it by hill-climbing

Page 114: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

What if not all the results are labeled?

Results and provenance: April (w3), Chelsea (w1), April Smith (w5 + w3 w4), John Lee (w2 w6), David (w7)

So far we assumed all results are labeled as true positive / false positive.
Ignoring unlabeled results may lead to over-fitting.

Page 115: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Estimating Missing Labels

Simple rules. Possible approach: label of an entry = empirical fraction of true positives among its labeled results. E.g., w3 (april): 0.33; w1 (chelsea): 0.50; w7 (david): 1.00.

Complex rules. Empirical estimation does not work:
• Arbitrary monotone Boolean expressions
• Very few or no labels available!
• We assume a statistical model and estimate labels using the Expectation-Maximization algorithm

Page 116: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML in Learning of Predicate-based Rules

• Transparency in Model of Representation
  – Predicate-based rules, completely declarative
  – Model-level provenance computed automatically
  – Interesting issue: interpretability of the program
    • The induced program is declarative, but there is a more subjective aspect of "code quality": two equivalent programs may have very different levels of "interpretability"
    • Applies primarily to Rule Induction
    • Applies to Rule Refinement to a considerably smaller extent because: (1) learning is constrained by the initial program, and (2) the user guides the learning interactively
    • Initial investigation [Nagesh et al., 2012]; more work is needed
• Transparency in Learning Algorithm
  – Some transparency in terms of the user influencing the model
    • Rule Induction → inductive bias
    • Rule Refinement → user selects among suggested refinements
• Transparency in Incorporation of Domain Knowledge (DK)
  – Offline (Rule Induction) or interactive (Rule Refinement)
  – Easy to incorporate DK at deployment (by further modifying the rules)

Page 117: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML Techniques

Unsupervised Semi-supervised Supervised

Dictionary

Regex

Rules

Rules + Classifier

Classification Rules

Page 118: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

FlashExtract [Le & Gulwani, PLDI 2014]

• Goal: data extraction from semi-structured text documents
• User interaction: positive/negative examples of rectangular regions on a document
  – Interactive
• Different colors & nested regions enable data extraction into a data structure with struct/sequence constructs:
  Seq([blue] Struct(Name: [green] String, City: [yellow] String))
• Techniques borrowed from program synthesis

Page 119: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

FlashExtract: Learning Algorithm

• Model of Representation: program consisting of core operations
  – Map, Filter, Merge, Pair
• Learning Algorithm: inductive on the grammar structure
  – Learn programs from positive examples
  – Discard those that capture the negative examples
• Learning a city extractor = learning a Map operator
  – The lines that hold the city
  – The pair that identifies the city within a line
• Learning lines = learning a Boolean filter

Page 120: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

FlashExtract: City Extractor

1. Filter lines that end with "WA"

Page 121: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

FlashExtract: City Extractor

1. Filter lines that end with "WA"
2. Map each selected line to a pair of positions

Page 122: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

FlashExtract: City Extractor

1. Filter lines that end with "WA"
2. Map each selected line to a pair of positions
3. Learn two leaf expressions for the start/end positions:
   • Begin of line
   • ', '
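The learned program has a direct reading: filter lines ending with "WA", then map each to the span from the beginning of the line to the position before ', '. The sketch below shows that reading only; it is not FlashExtract's synthesizer, and the sample document is an assumption.

```python
# Hand-written rendering of the synthesized city extractor.
def extract_cities(document):
    cities = []
    for line in document.splitlines():
        if line.rstrip().endswith("WA"):   # learned Boolean filter
            end = line.index(", ")         # learned end-position expression
            cities.append(line[0:end])     # learned start: begin of line
    return cities

doc = "Alice Anderson\nSeattle, WA\nBob Brown\nSpokane, WA\n"
print(extract_cities(doc))  # ['Seattle', 'Spokane']
```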

Page 123: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML in FlashExtract

• Transparency in Model of Representation
  – Simple domain-specific language → easy to comprehend
  – Language is imperative → no model-level provenance
    • Output can be explained only by watching program execution
• Transparency in Learning Algorithm
  – No transparency
• Transparency in Incorporation of Domain Knowledge (DK)
  – Interactive
  – Can incorporate DK at deployment (by further modifying the program)

Page 124: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML Techniques

Unsupervised Semi-supervised Supervised

Dictionary

Regex

Rules

Rules + Classifier

Classification Rules

Page 125: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Rule Learning: Unsupervised

• Traditional IE: Pattern Discovery [Li et al., CIKM 2011]
• Open IE: ClausIE [Del Corro & Gemulla, WWW 2013]

Page 126: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Rule Learning: Unsupervised

• Traditional IE: Pattern Discovery [Li et al., CIKM 2011]
• Open IE: ClausIE [Del Corro & Gemulla, WWW 2013]

Page 127: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Pattern Discovery [Li et al., CIKM 2011]

• Manually identifying patterns is tedious + time-consuming:
  – ⟨PERSON⟩ .* at .* ⟨PHONE_NUMBER⟩
  – ⟨PERSON⟩'s (cell|office|home)? number is ⟨PHONE_NUMBER⟩
• Basic idea: group similar strings together to facilitate pattern discovery

Kristen's phone number is (281)584-1405
Andrea Walter's office number is x345763
Curt's number is 713-789-0090
  → ⟨PERSON⟩'s (cell|office|home)? number is ⟨PHONE_NUMBER⟩

Page 128: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Practical Requirements

• Configurable
  – Grouping may be done along multiple aspects of the data
• Declarative
  – Provides justification for group membership, for debugging
• Scalable
  – We expect to have many instances and possibly many groups


Overview: Clustering based on Semantic Signature

[Pipeline diagram: Input → Sequence Mining → Computing Correlation → Generating Drop Rules → Generating Semantic Signature → Grouping; driven by a configuration and backed by a sequence DB; potentially run offline]


Running Example: Person Phone

Input:

• John can be reached at (408)123-4567
• Jane can be reached at her cell (212)888-1234
• Mr. Doe can also be reached at (123)111-2222
• Mary may be reached at her office # (111)222-3333

ID  Input Contextual String
1   can be reached at
2   can be reached at her cell
3   can also be reached at
4   may be reached at her office #


Mined sequences (from the contextual strings above):

 be reached
 reached at
 be
 at
 can

Step 1. Sequence Mining


• Configurable by:
 – f_min: minimum support of the sequence
 – l_min: minimum sequence length
 – l_max: maximum sequence length

Example: given f_min = 3, l_min = 1, l_max = 2
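The mining step can be sketched as counting contiguous token subsequences under the three configuration parameters. This is a minimal illustration over the running example, not the paper's implementation:

```python
from collections import Counter

def mine_sequences(strings, f_min=3, l_min=1, l_max=2):
    # Count contiguous token subsequences of length l_min..l_max and
    # keep those whose support (number of occurrences) >= f_min.
    counts = Counter()
    for s in strings:
        tokens = s.split()
        for n in range(l_min, l_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {seq: c for seq, c in counts.items() if c >= f_min}

strings = [
    "can be reached at",
    "can be reached at her cell",
    "can also be reached at",
    "may be reached at her office #",
]
frequent = mine_sequences(strings, f_min=3, l_min=1, l_max=2)
```

With these settings the surviving sequences include "be reached", "reached at", "be", "at", and "can", matching the example above.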


Step 2. Computing Correlation


• Different measures of correlation can be used
 – The presence of one sequence predicts the other
 – e.g., the Uncertainty Coefficient

Example:

Sequence X   Sequence Y   U(X|Y)  U(Y|X)
can          be reached   0.946   0.750
be reached   at           0.022   0.277
can          at           0.029   0.293
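The uncertainty coefficient can be sketched as U(X|Y) = I(X;Y) / H(X) over binary presence indicators of the two sequences across the contextual strings. This is one standard estimator, shown as an assumption; the paper may compute it differently, and the slide's numbers come from a larger corpus:

```python
import math

def entropy(probs):
    # Shannon entropy of a discrete distribution (natural log)
    return -sum(p * math.log(p) for p in probs if p > 0)

def uncertainty_coefficient(strings, seq_x, seq_y):
    # U(X|Y) = I(X;Y) / H(X) over binary presence indicators of the
    # two sequences across the contextual strings
    n = len(strings)
    x = [seq_x in s for s in strings]
    y = [seq_y in s for s in strings]
    px = [sum(x) / n, 1 - sum(x) / n]
    py = [sum(y) / n, 1 - sum(y) / n]
    joint = {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1 / n
    mi = sum(p * math.log(p / (px[0 if a else 1] * py[0 if b else 1]))
             for (a, b), p in joint.items() if p > 0)
    hx = entropy(px)
    return mi / hx if hx > 0 else 0.0

# Perfectly correlated presence -> U = 1.0
u = uncertainty_coefficient(
    ["can be reached at", "can be reached now", "call later", "ring me"],
    "can", "be reached")
```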


Step 3. Generating Drop Rules - I


• Rule format:
 – DROP sequence X IF sequence X AND sequence Y (both present in the same contextual string)

• Rules are generated based on a threshold over the correlation measure


Step 3. Generating Drop Rules - II


Example: If U(X|Y) > 0.25 or U(Y|X) > 0.25, generate a drop rule

Generated drop rules, each with an associated confidence score:

 DROP “can” IF “can” AND “be reached”
 DROP “be reached” IF “can” AND “be reached”
 DROP “at” IF “be reached” AND “at”
 DROP “at” IF “can” AND “at”

Sequence X   Sequence Y   U(X|Y)  U(Y|X)
can          be reached   0.946   0.750
be reached   at           0.022   0.277
can          at           0.029   0.293
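The rule-generation step can be sketched as follows. The reading here is an assumption: U(X|Y) above the threshold means X is predictable from Y, so a rule dropping X is emitted with the U value as its confidence, and symmetrically for U(Y|X):

```python
def generate_drop_rules(correlations, tau=0.25):
    # For a pair (X, Y): U(X|Y) > tau means X is largely predictable
    # from Y, so emit "DROP X IF X AND Y"; symmetrically for U(Y|X).
    rules = []
    for x, y, u_x_given_y, u_y_given_x in correlations:
        if u_x_given_y > tau:
            rules.append((x, (x, y), u_x_given_y))
        if u_y_given_x > tau:
            rules.append((y, (x, y), u_y_given_x))
    # order by decreasing confidence (the U value that triggered the rule)
    return sorted(rules, key=lambda r: -r[2])

correlations = [
    ("can", "be reached", 0.946, 0.750),
    ("be reached", "at", 0.022, 0.277),
    ("can", "at", 0.029, 0.293),
]
rules = generate_drop_rules(correlations)
```

On the example table this produces the four drop rules shown on the slide.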

Generating Semantic Signature

Page 135: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

© 2015 IBM Corporation135

Step 4. Generating Semantic Signature

• Apply drop rules in decreasing order of their associated confidence score

Example:

DROP “can” IF “can” AND “be reached”
DROP “be reached” IF “can” AND “be reached”
DROP “at” IF “be reached” AND “at”
DROP “at” IF “can” AND “at”

Sequences: can; be reached; at → resulting semantic signature (after applying the rules above): be reached
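One reading of the application step, sketched below under the assumption that a rule DROP d IF (X AND Y) fires only while both X and Y are still present in the current set:

```python
def semantic_signature(sequences, rules):
    # Apply drop rules in decreasing confidence order; a rule
    # ("drop", ("x", "y"), conf) fires only while x and y are both
    # still present -- one reading of the slides, shown as an assumption.
    sig = set(sequences)
    for drop, (x, y), _conf in sorted(rules, key=lambda r: -r[2]):
        if x in sig and y in sig:
            sig.discard(drop)
    return sig

rules = [
    ("can", ("can", "be reached"), 0.946),
    ("be reached", ("can", "be reached"), 0.750),
    ("at", ("be reached", "at"), 0.277),
    ("at", ("can", "at"), 0.293),
]
sig = semantic_signature({"can", "be reached", "at"}, rules)
```

Tracing the example: "can" is dropped first (0.946); the rule dropping "be reached" then no longer fires because "can" is gone; "at" is dropped by the "be reached AND at" rule, leaving "be reached" as the signature.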


Step 5. Grouping

• Step 1: Sequences with the same semantic signature form a group

• Step 2: Further merge small groups with semantic signatures similar to those of the larger ones
 → reduces the number of clusters to be examined
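Step 1 of grouping can be sketched as a dictionary keyed by signature (Step 2, merging small groups with similar signatures, is omitted; the "office" signature element below is purely illustrative):

```python
from collections import defaultdict

def group_by_signature(signatures):
    # Strings with identical semantic signatures fall into the same
    # group; merging of small groups is omitted in this sketch.
    groups = defaultdict(list)
    for string_id, sig in signatures.items():
        groups[frozenset(sig)].append(string_id)
    return groups

# "office" here is a hypothetical signature element for illustration
sigs = {1: {"be reached"}, 2: {"be reached"},
        3: {"be reached"}, 4: {"be reached", "office"}}
groups = group_by_signature(sigs)
```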


Transparent ML in Pattern Discovery

• Transparency in Model of Representation
 – Sequence patterns
 – Model-level provenance

• Transparency in Learning Algorithm
 – Some algorithm-level provenance: final sequences can be explained through the chain of drop rules
 – User can influence the model through the initial configuration

• Transparency in Incorporation of Domain Knowledge (DK)
 – Offline
 – But easy to incorporate domain knowledge at deployment (by further modifying the rules)


Rule Learning: Unsupervised

• Traditional IE: Pattern Discovery [Li et al., CIKM 2011]

• Open IE: ClausIE [Del Corro & Gemulla, WWW 2013]


ClausIE [Del Corro & Gemulla, WWW 2013]

• Goal: Separate the identification of information from its representation

• Identifies essential and optional arguments in a clause
 – 7 essential clause types: SV, SVA, SVO, SVC, SVOO, SVOA, SVOC
 – A minimal clause is a clause without the optional adverbials (A)

• Algorithm
 1. Clause identification: walk the dependency tree and identify clauses using a deterministic flow chart of decision questions
 2. Proposition generation: for each clause, generate one or more propositions


ClausIE: Example

Bell, a telecommunication company, which is based in Los Angeles,
makes and distributes electronic, computer and building products.

(S: Bell, V: ‘is’, C: a telecommunication company)

(S: Bell, V: is based, A: in Los Angeles)

(S: Bell, V: makes, O: electronic products)

(S: Bell, V: makes, O: computer products)

(S: Bell, V: makes, O: building products)

(S: Bell, V: distributes, O: electronic products)

(S: Bell, V: distributes, O: computer products)

(S: Bell, V: distributes, O: building products)
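The fan-out from coordinated verbs and objects to individual propositions can be sketched as a Cartesian product. This shows only the distribution step; ClausIE first derives the clause itself from the dependency parse:

```python
from itertools import product

def propositions(subject, verbs, objects):
    # Emit one (S, V, O) triple per combination of coordinated
    # verb and object -- a sketch of the distribution step only.
    return [(subject, v, o) for v, o in product(verbs, objects)]

props = propositions(
    "Bell",
    ["makes", "distributes"],
    ["electronic products", "computer products", "building products"],
)
```

This yields the six SVO propositions listed above (the SVC and SVA propositions come from the appositive and relative clauses).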



Clause Identification Flow Chart

[Figure: the deterministic flow chart of decision questions; not reproduced in this transcript]


Transparent ML in ClausIE

• Transparency in Model of Representation
 – Essential clauses = abstraction of dependency path patterns
 – Easier to comprehend compared to path patterns
 – Model-level provenance (partial):
   • Can connect an extracted object with the part of the model (i.e., the clause) that determined it
   • Comprehending why the clause matches the parse tree of the input text requires reasoning about the clause identification flow chart

• Transparency in Learning Algorithm
 – User can influence the model by customizing the types of generated propositions
   • Type of relations: "Messi plays in Barcelona" → plays or plays in
   • Triples or n-ary propositions: (Messi, plays football in, Barcelona) or (Messi, plays, football, in Barcelona)

• Transparency in Incorporation of Domain Knowledge (DK)
 – Offline


Transparent ML Techniques

[Overview figure: techniques arranged along an Unsupervised – Semi-supervised – Supervised axis: Dictionary, Regex, Rules, Rules + Classifier, Classification Rules]


Fact Extraction: Supervised

• AutoSlog-SE [Choi et al., EMNLP 2005]: identifying sources of opinions with a CRF and extraction patterns

 – AutoSlog heuristics: a set of extraction patterns that, collectively, can extract every NP in the training corpus
 – Semantic restriction: semantically constrain the types of noun phrases that are legitimate extractions for opinion sources
 – Apply patterns to corpus and gather statistics: count the number of correct and incorrect extractions for each pattern; estimate the probability that the pattern will extract an opinion source in new texts
 – CRF with basic features (orthographic, lexical, syntactic, semantic); extraction patterns are incorporated as features to increase the recall of the CRF model


Transparent ML in AutoSlog-SE

• Transparency in Model of Representation
 – Path patterns + CRF
 – Model-level provenance (partial)
   • Provenance at the level of patterns
   • No provenance at the level of the CRF → overall, cannot explain an extracted object

• Transparency in Learning Algorithm
 – CRF training is not transparent

• Transparency in Incorporation of Domain Knowledge (DK)
 – Offline




Semi-supervised (using Bootstrapping) Relation Extraction

Example task: Organization “located in” Location

[Bootstrapping loop: Initial Seed Tuples → Occurrences of Seed Tuples → Generate Extraction Patterns → Generate New Seed Tuples → Augment Table]

Initial seed tuples:

ORGANIZATION   LOCATION
MICROSOFT      REDMOND
IBM            ARMONK
BOEING         SEATTLE
INTEL          SANTA CLARA

(Slide from Eugene Agichtein)


Semi-supervised (using Bootstrapping) Relation Extraction

[Bootstrapping loop, step: Occurrences of Seed Tuples]

Occurrences of seed tuples:

 • Computer servers at Microsoft’s headquarters in Redmond…
 • In mid-afternoon trading, shares of Redmond-based Microsoft fell…
 • The Armonk-based IBM introduced a new line…
 • The combined company will operate from Boeing’s headquarters in Seattle.
 • Intel, Santa Clara, cut prices of its Pentium processor.

ORGANIZATION   LOCATION
MICROSOFT      REDMOND
IBM            ARMONK
BOEING         SEATTLE
INTEL          SANTA CLARA

(Slide from Eugene Agichtein)


Semi-supervised (using Bootstrapping) Relation Extraction

[Bootstrapping loop, step: Generate Extraction Patterns]

DIPRE patterns [Brin, WebDB 1998]:

 • <STRING1>’s headquarters in <STRING2>
 • <STRING2>-based <STRING1>
 • <STRING1>, <STRING2>

(Slide from Eugene Agichtein)


Semi-supervised (using Bootstrapping) Relation Extraction

[Bootstrapping loop, step: Generate New Seed Tuples — generate new seed tuples; start new iteration]

ORGANIZATION    LOCATION
AG EDWARDS      ST LUIS
157TH STREET    MANHATTAN
7TH LEVEL       RICHARDSON
3COM CORP       SANTA CLARA
3DO             REDWOOD CITY
JELLIES         APPLE
MACWEEK         SAN FRANCISCO

(Slide from Eugene Agichtein)
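The pattern-application step of the loop can be sketched with a regular expression. This is a deliberately crude illustration: entity spans are approximated by runs of capitalized words, and the <ORG>/<LOC> placeholders are this sketch's own notation, not DIPRE's:

```python
import re

def apply_pattern(template, text):
    # Match a DIPRE-style surface pattern such as
    # "<ORG>'s headquarters in <LOC>" against text, returning
    # (org, loc) tuples. Entities are approximated by runs of
    # capitalized words -- a simplification for illustration.
    entity = r"([A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*)"
    regex = template.replace("<ORG>", entity).replace("<LOC>", entity)
    return re.findall(regex, text)

text = "Computer servers at Microsoft's headquarters in Redmond went down."
tuples = apply_pattern(r"<ORG>'s headquarters in <LOC>", text)
```

Running the extracted patterns over a large corpus and collecting the matches produces candidate tuples like those in the table above, including noisy ones such as (JELLIES, APPLE).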


Fact Extraction: Semi-supervised and Unsupervised

Systems differ in:

• Model of Representation

• Learning Algorithm and Incorporation of Domain Knowledge:
 – Bootstrapping → initial set of seeds grown iteratively, over multiple iterations
 – Distant supervision → a single iteration
 – Unsupervised → no seeds


Fact Extraction: Semi-supervised and Unsupervised

• Bootstrapping → initial set of seeds grown iteratively, over multiple iterations

• Distant supervision → a single iteration

• Unsupervised → no seeds


Bootstrapping: Example Systems

• AutoSlog-TS [Riloff, AAAI 1996]

• DIPRE [Brin, WebDB 1998]

• Snowball [Agichtein & Gravano, DL 2000]

• KnowItAll [Etzioni et al., J. AI 2005]

• KnowItNow [Cafarella et al., HLT 2005]

• Fact Extraction on the Web [Pasca et al., ACL 2006]

• Coupled Pattern Learning (part of NELL) [Carlson et al., WSDM 2010]

• [Gupta & Manning, ACL 2014]

• INSTAREAD [Hoffmann et al., CoRR abs. 2015]

• …



Snowball [Agichtein & Gravano, DL 2000]

• 5-tuple: <left, tag1, middle, tag2, right>
 – tag1, tag2 are named-entity tags (from a NER component)
 – left, middle, and right are vectors of weighted terms

Example: ORGANIZATION ’s central headquarters in LOCATION is home to...

 tag1   = ORGANIZATION, tag2 = LOCATION
 middle = {<’s 0.5>, <central 0.5>, <headquarters 0.5>, <in 0.5>}
 right  = {<is 0.75>, <home 0.75>}
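The degree of match between two such 5-tuples is typically a sum of cosine similarities of the corresponding context vectors, computed only when the entity tags agree. A minimal sketch (the weights below are illustrative, not from the paper):

```python
import math

def cosine(u, v):
    # Cosine similarity between two weighted term vectors (dicts)
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def tuple_similarity(a, b):
    # Compare <left, tag1, middle, tag2, right> 5-tuples; contexts
    # are compared only when the named-entity tags agree.
    if a["tags"] != b["tags"]:
        return 0.0
    return sum(cosine(a[k], b[k]) for k in ("left", "middle", "right"))

p = {"tags": ("ORGANIZATION", "LOCATION"), "left": {},
     "middle": {"'s": 0.5, "headquarters": 0.5, "in": 0.5}, "right": {}}
q = {"tags": ("ORGANIZATION", "LOCATION"), "left": {},
     "middle": {"'s": 0.7, "headquarters": 0.7, "in": 0.7}, "right": {}}
sim = tuple_similarity(p, q)
```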


Snowball Pattern Generation

[Figure: occurrences of seed tuples in pattern representation, grouped into Cluster 1 (e.g., {<servers 0.75> <at 0.75>} ORGANIZATION {<’s 0.5> <central 0.5> <headquarters 0.5> <in 0.5>} LOCATION) and Cluster 2 (e.g., {<the 1>} LOCATION {<- 0.75> <based 0.75>} ORGANIZATION); a cluster centroid such as ORGANIZATION {<’s 0.7> <headquarters 0.7> <in 0.7>} LOCATION becomes a pattern]

• Occurrences of seed tuples are converted to the pattern representation
• The weight of each term is a function of the frequency of the term in the corresponding context
• Patterns are clustered using a similarity metric
• Patterns are formed as centroids of the clusters


Snowball Tuple Extraction

• Represent each new text segment in the collection as a 5-tuple

• Find the most similar pattern (if any)

• Estimate the correctness of the extracted tuple:
 – A tuple has high confidence if generated by multiple high-confidence patterns
 – Conf(Pattern) = #positive / (#positive + #negative)
   • #positive: extracted tuples that agree on both the Org and Loc attributes with a seed tuple from a previous iteration
   • #negative: extracted tuples with the same Org value as a seed tuple but a different Loc value (assumes Org is a key for the relation)

Example: “Netscape ’s flashy headquarters in Mountain View is near…”
 → tag1 = ORGANIZATION (Netscape), tag2 = LOCATION (Mountain View), middle = {<’s 0.7>, <headquarters 0.7>, <in 0.7>}
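The pattern-confidence formula above can be sketched directly (the seed and extraction tuples below are illustrative):

```python
def pattern_confidence(extractions, seeds):
    # Conf(Pattern) = #positive / (#positive + #negative), where
    # positives agree with a seed on both Org and Loc, and negatives
    # share a seed's Org but give a different Loc (Org is assumed
    # to be a key for the relation).
    seed_loc = {org: loc for org, loc in seeds}
    pos = sum(1 for org, loc in extractions if seed_loc.get(org) == loc)
    neg = sum(1 for org, loc in extractions
              if org in seed_loc and seed_loc[org] != loc)
    return pos / (pos + neg) if pos + neg else 0.0

seeds = [("MICROSOFT", "REDMOND"), ("IBM", "ARMONK")]
extractions = [("MICROSOFT", "REDMOND"), ("IBM", "ARMONK"),
               ("IBM", "SEATTLE"), ("ACME", "SPRINGFIELD")]
conf = pattern_confidence(extractions, seeds)
```

Extractions whose Org does not match any seed (ACME above) count as neither positive nor negative under this definition.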


KnowItAll [Etzioni et al., J. AI 2005]

[Pipeline figure:
 Predicates: Country(X)
 → Domain-independent Rule Templates: <class> “such as” NP
 → Bootstrapping
 → Extraction Rules: “countries such as” NP; Discriminators: “country X”
 → Extractor (over the World Wide Web)
 → Extractions: Country(“France”)
 → Assessor
 → Validated Extractions: Country(“France”), prob=0.999]

(Slide from Dan Weld)


KnowItAll Rules

Rule Template (domain-independent):

 Predicate:   predName(Class1)
 Pattern:     NP1 “such as” NPList2
 Constraints: head(NP1) = plural(label(Class1))
              properNoun(head(each(NPList2)))
 Bindings:    instanceOf(Class1, head(each(NPList2)))

Extraction Rule (substituting “instanceOf” and “Country”):

 Predicate:   instanceOf(Country)
 Pattern:     NP1 “such as” NPList2
 Constraints: head(NP1) = “nations”
              properNoun(head(each(NPList2)))
 Bindings:    instanceOf(Country, head(each(NPList2)))
 Keywords:    “nations such as”

Sentence: Other nations such as France, India and Pakistan, have conducted recent tests.

Extractions: instanceOf(Country, France), instanceOf(Country, India), instanceOf(Country, Pakistan)


KnowItAll Pattern Learning

• Goal: supplement domain-independent patterns with domain-specific patterns, e.g., “headquartered in <city>”

• To increase recall (by learning extractors) and precision (by learning discriminators)

• Bootstrapping algorithm:
 – Start with seed instances generated by the domain-independent extractors
 – For each seed, issue a Web search query and retrieve the documents
 – For each occurrence in each document, form a context string by taking the w words to its left and right
 – Output the best patterns according to some metric; a pattern is any substring of the context string that includes the occurrence and at least one other word
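The candidate-enumeration step can be sketched as follows, simplified to single-token occurrences and with the `<X>` placeholder as this sketch's own notation:

```python
def candidate_patterns(tokens, occ_index, w=2):
    # Enumerate every contiguous token span within a window of w words
    # around the occurrence that contains the occurrence and at least
    # one other word; the occurrence is replaced with a placeholder.
    lo, hi = max(0, occ_index - w), min(len(tokens), occ_index + w + 1)
    patterns = set()
    for i in range(lo, occ_index + 1):
        for j in range(occ_index + 1, hi + 1):
            if j - i >= 2:  # occurrence plus at least one other word
                span = tokens[i:occ_index] + ["<X>"] + tokens[occ_index + 1:j]
                patterns.add(" ".join(span))
    return patterns

tokens = "company headquartered in Seattle said today".split()
pats = candidate_patterns(tokens, tokens.index("Seattle"), w=2)
```

KnowItAll would then rank these candidates by some metric and keep the best as extractors or discriminators.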


Coupled Pattern Learning [Carlson et al., 2010]

• Basic idea: coupled training via multiple functions to avoid semantic drift
 – Use the output of one classification function to compare against another, and vice versa
 – Turns a hard (under-constrained) semi-supervised learning problem into an easier (more constrained) one


Coupled Pattern Learning [Carlson et al., 2010]

• Input: ontology of entity and relation types; seed tuples

• Model of Representation: sequence patterns + ranking function

• Types of coupled constraints
 – Mutual exclusion
   • Mutually exclusive predicates cannot both be satisfied by the same input
 – Argument type-checking
   • E.g., arguments of the CompanyIsInEconomicSector relation have to be of type Company and EconomicSector

• Coupled Pattern Learning:
 1. Generate patterns (for both entities and relations)
 2. Extract candidate tuples
 3. (New) Filter tuples based on constraints
 4. Rank patterns and tuples; decide which to promote
 5. Repeat

• Part of the NELL system [Mitchell et al., AAAI 2015]


Iterative Feedback in NELL [Mitchell et al., AAAI 2015]

[Figure annotations: model-level provenance; user feedback incorporated in next iterations of learning]


Transparent ML in Bootstrapping Systems

• Transparency in Model of Representation
 – Sequence patterns + ranking function
 – Partial model-level provenance: extracted objects explained by the supporting patterns
   • Snowball: term weights make patterns more difficult to comprehend, losing some transparency
   • Cannot typically explain why the extracted object is above the ranking threshold

• Transparency in Learning Algorithm
 – Algorithm-level provenance in KnowItAll and CPL
   • Learning of each pattern can be explained by the supporting tuples
   • Extraction of each tuple can be explained by the supporting patterns
 – Snowball → more diffuse provenance, because patterns are centroids of clusters and hence explainable only by the support tuples of all patterns in the cluster
 – KnowItAll: some transparency in influencing the model based on initial keywords
 – SPIED-Viz [Gupta & Manning 2014] → visually explain patterns/tuples (see Part 4)

• Transparency in Incorporation of Domain Knowledge (DK)
 – Offline (Snowball, KnowItAll) or interactive (CPL)
 – Possible to incorporate DK at deployment (by reviewing the patterns)
   • CPL → crowdsourced review of tuples for continuous learning


INSTAREAD [Hoffmann et al., CoRR abs. 2015]

• Model of Representation: Prolog-like predicate-based rules

killNoun(‘murder’);

killOfVictim(c, b) ⇐ prep-of(c, b) ∧ token(c, d) ∧ killNoun(d);

killed(a, b) ⇐ person(a) ∧ person(b) ∧ nsubjpass(c, a)

∧ token(c, ‘sentenced’) ∧ prep-for(c, d) ∧ killOfVictim(d, b);

Mr. Williams was sentenced for the murder of Wright.

killOfVictim(murder, Wright), killed(Williams, Wright)

• Support for disjunction (∨), negation (¬), existential (∃) and universal (∀) quantification

• Rich set of predicates:
 – Built-in: tokenBefore, isCapitalized, …
 – Output of other NLP systems: phrase structure, typed dependencies parser, co-reference resolution, named entities


INSTAREAD [Hoffmann et al., CoRR abs. 2015]

Semi-automatic rule generation with the user in the loop:

1. Core linguistic rules: prepopulate the system with syntactic lexical patterns
 – Given subject X, object Y and verb ‘kill’, generate rules to capture ‘X killed Y’, ‘Y was killed by X’, …

2. Bootstrapped rule induction: use results of existing rules to generate seed tuples, and from those automatically generate a ranked list of new rules
 – Two ranking criteria: PMI and number of extractions
 – Allow the user to manually inspect and select the rules

3. Word-level distributional similarity: given a seed keyword, automatically suggest similar keywords
 – Generate new rules based on the user’s keyword selection


Transparent ML in INSTAREAD

• Transparency in Model of Representation
 – Predicate-based rules, declarative
 – Model-level provenance

• Transparency in Learning Algorithm
 – Transparency in terms of the user influencing the model by selecting rules
 – User-friendly visual interface (see Part 4)

• Transparency in Incorporation of Domain Knowledge (DK)
 – Interactive: the user can modify/remove a generated rule, or define a new rule, e.g., based on suggested keywords
 – Easy to incorporate DK at deployment (by further modifying the rules)


Fact Extraction: Semi-supervised and Unsupervised

• Bootstrapping → initial set of seeds grown iteratively, over multiple iterations

• Distant supervision → a single iteration

• Unsupervised → no seeds


Fact Extraction: Distant Supervision

• General Framework

1. Construct training set of seed tuples

2. Distant supervision: generalize training set into extraction patterns

3. Execute patterns

4. Score extracted tuples

• Example systems:

• OLLIE [Mausam et al., EMNLP 2012]

• RENOUN [Yahya et al. EMNLP 2014]


OLLIE [Mausam et al., EMNLP 2012]

• Input: seed triples <arg1, rel, arg2>

• Model of Representation: path patterns + classifier
 – Patterns centered around verbs, nouns, adjectives, etc.

• Pattern Learning: generalize from sentences that are “paraphrases” of seed tuples

• Classifier (factual vs. non-factual):
 – Context analysis (dependency-based) to discard invalid facts, e.g., facts that are conditional or attributed to someone else
 – Logistic regression classifier to identify other likely non-factual tuples
   • Trained on manually labeled triples extracted from 1000 sentences


OLLIE Pattern Learning

[Figure: pattern-learning example
 Seed tuple: (Annacone; is the coach of; Federer)
 “Paraphrase” of seed tuple → a sentence whose content words are linked by a linear dependency path: “Federer hired Annacone as coach”
 Dependency parse nodes: Federer (arg2, NN, nsubj), hired (postag=VBD), Annacone (arg1, NN, dobj), coach (rel, NN, prep)
 Step 1: Delexicalize the relation nodes
 Step 2: Retain lexical constraints on slot nodes (lex: hired), and generalize them (lex: hired, named, assigned) based on seed sentences where the fully delexicalized pattern was seen]

Page 173: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


RENOUN [Yahya et al., EMNLP 2014]

Focus on facts centered around noun phrases:
‘The CEO of Google, Larry Page’ → Google → CEO (Attribute) → Larry Page

Model of Representation: Path Patterns + Ranking function

Input: Ontology of nominal attributes (e.g., Biperpedia)
– 8 manually crafted high-precision patterns to find seed tuples in corpus
– E.g., <Attribute> of <Subject>, <Object>

Pattern Learning: Generalize from seed tuples

Fact Scoring: Score(t) = Σ frequency(pi) × coherence(pi), for all patterns pi that support t
– A pattern has high coherence if it applies to attributes that are similar as per their word vectors
– Rank facts by the score, and consider top-K, where K is set by the user
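The scoring function lends itself to a direct implementation. Below is a minimal sketch of RENOUN-style fact scoring and top-K selection; the fact tuples, frequencies, and coherence values are invented for illustration, whereas the real system derives them from corpus statistics and word vectors.

```python
# Sketch of RENOUN-style fact scoring and top-K selection.

def score_fact(supporting_patterns):
    """Score(t) = sum of frequency(p) * coherence(p) over patterns p supporting t."""
    return sum(p["frequency"] * p["coherence"] for p in supporting_patterns)

def top_k_facts(fact_to_patterns, k):
    """Rank candidate facts by score and keep the user-chosen top-K."""
    ranked = sorted(fact_to_patterns,
                    key=lambda t: score_fact(fact_to_patterns[t]),
                    reverse=True)
    return ranked[:k]

facts = {
    ("Google", "CEO", "Larry Page"): [
        {"frequency": 120, "coherence": 0.9},
        {"frequency": 40, "coherence": 0.7},
    ],
    ("Google", "CEO", "Mountain View"): [   # spurious candidate, weak support
        {"frequency": 5, "coherence": 0.2},
    ],
}
print(top_k_facts(facts, k=1))   # -> [('Google', 'CEO', 'Larry Page')]
```

The user-set K is the only knob: facts are never discarded by the learner itself, which is part of what makes the ranking transparent.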

Page 174: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

RENOUN Pattern Learning

[Figure: pattern-learning walkthrough]
Seed tuple (from Biperpedia + the 8 patterns): Google → CEO (Attribute) → Larry Page
“Paraphrase” of seed tuple → a sentence that contains the Attribute of the seed, with Subject and Object as in the seed, e.g., “A CEO, like Larry Page of Google is…”
From the dependency parse, take the minimal subgraph containing the head tokens of S, A, O; delexicalize the S, A, O nodes; lift noun POS tags to N; discard patterns supported by fewer than 10 seed tuples.

Page 175: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Transparent ML in Distant Supervision Systems

Transparency in Model of Representation
– Path Patterns + Classifier/Ranking function
– Model-level provenance (partial)
  • Extracted objects explained by the supporting patterns
  • Ranking function (RENOUN) typically easier to understand than a logistic regression classifier (OLLIE)
  • OLLIE → dependency-based context analysis portion of the classifier is transparent

Transparency in Learning Algorithm
– Algorithm-level provenance: learning of each pattern can be explained by the supporting tuples
– RENOUN → some additional transparency in terms of the user influencing the model via the threshold K

Transparency in Incorporation of Domain Knowledge (DK)
– Offline
– Possible to incorporate DK at deployment (by reviewing the patterns)

Page 176: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Fact Extraction: Semi-supervised and Unsupervised

Bootstrapping → initial set of seeds grown iteratively, over multiple iterations

Distant supervision → a single iteration

Unsupervised → no seeds

Page 177: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Fact Extraction: Unsupervised

Traditional IE: [Sudo et al., ACL 2003]

Open IE: REVERB [Fader et al., EMNLP 2011]


Page 178: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition [Sudo et al., ACL 2003]

Scope: Traditional IE, w/ extraction task specified by TREC-like narrative description

Preprocessing: Dependency Analysis, NE-tagging

Model: Path patterns

Learning Algorithm
1. Retrieve relevant documents R
   • Issue search query using sentences from narrative description
2. Count all possible subtrees in R
   • Make a Pattern List of those that conform to the pattern model
3. Rank each subtree i (inspired by TF/IDF):

   score_i = tf_i · (log(N / df_i))^β

   where tf_i = # of times subtree i occurred in documents in R, df_i = # of source documents which contain subtree i, and N = # of documents
   • β trained to prioritize among overlapping patterns, preferring more specific patterns
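As a sketch, the ranking step follows directly from the formula; the counts and β below are illustrative, and in practice β is trained as described above.

```python
import math

# TF/IDF-inspired subtree ranking (illustrative counts; beta would be tuned).

def subtree_score(tf, df, n_docs, beta=1.0):
    """score_i = tf_i * (log(N / df_i))^beta.
    Assumes 1 <= df <= n_docs, so the log term is non-negative."""
    return tf * math.log(n_docs / df) ** beta

# A subtree seen often but concentrated in a few relevant documents
# outranks a generic subtree spread across most of the collection.
specific = subtree_score(tf=30, df=5, n_docs=1000)
generic = subtree_score(tf=50, df=900, n_docs=1000)
print(specific > generic)   # -> True
```

Because the score is a closed-form function of two corpus counts, a developer can always see why one pattern outranked another.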

Page 179: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


REVERB [Fader et al., EMNLP 2011]

Scope: Open IE of relations centered around verbs

Preprocessing: POS tagging, NP chunking

Model: Fixed syntactic pattern + classifier

Pattern: <NP1> … <VP> … <NP2>
– <VP> satisfies:
  • Syntactic constraint: V | VP | VW*P → to allow light-verb constructions (e.g., “give a talk at”)
  • Lexical constraint → to avoid over-specified relations
    • Based on a large dictionary of generic relation phrases, automatically discovered from 500M Web pages
  • Adjacent/overlapping VPs are merged into a single VP
– <NP1> and <NP2> are the noun phrases closest to <VP> to the left/right
  • Exclude relative pronouns, wh-adverbs and existential “there”

Learning Algorithm:
– Find all matches for the syntactic pattern
– Use logistic regression to assign a confidence to each extracted triple
  • Classifier trained on manually labeled triples extracted from 1000 sentences
– Trade precision for recall using a confidence threshold
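The syntactic constraint can be checked with a regular expression over coarse POS tags. The sketch below uses a toy one-letter tagset (V, W, P), covering only the syntactic constraint, not the lexical one.

```python
import re

# Toy tagset: V = verb (possibly with particle/adverb), W = noun/adj/adv/
# pron/det, P = preposition/particle/infinitive marker.
RELATION_POS = re.compile(r"V(W*P)?")

def is_relation_phrase(pos_tags):
    """True iff the tag sequence matches V | VP | VW*P."""
    return RELATION_POS.fullmatch("".join(pos_tags)) is not None

print(is_relation_phrase(["V"]))                 # e.g., "hired"
print(is_relation_phrase(["V", "W", "W", "P"]))  # e.g., "gave a talk at"
print(is_relation_phrase(["W", "V"]))            # not a relation phrase
```

The fixed pattern is what makes REVERB's extractions easy to explain; only the confidence assigned on top of them comes from the trained classifier.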

Page 180: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML in Unsupervised Fact Extraction

Transparency in Model of Representation
– Sequence/Path Patterns + Classifier/Ranking function
– Model-level Provenance (partial)
  • Extracted objects explained by the supporting patterns
  • Ranking function ([Sudo et al., ACL 2003]) typically easier to understand compared to a logistic regression classifier (REVERB)

Transparency in Learning Algorithm
– No transparency

Transparency in Incorporation of Domain Knowledge (DK)
– Offline
– Can incorporate DK at deployment, by reviewing the patterns (not for REVERB)

Page 181: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML Techniques

[Roadmap figure: techniques organized by supervision level (Unsupervised, Semi-supervised, Supervised) and by model type (Dictionary, Regex, Rules, Rules + Classifier, Classification Rules)]

Page 182: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


RIPPER [Cohen, ICML 1995]

• Classic propositional rule learner algorithm that:
  • Performs efficiently on large noisy data
  • Extends naturally to first-order logic representations
  • Is competitive in generalization performance

• Input: positive and negative examples

• Algorithm (sketch)
  1. Building stage: Repeat until <stopping condition>
     1. Split examples into two sets: Grow and Prune
     2. Grow one rule by greedily adding conditions until the rule is 100% precise on the Grow set
     3. Incrementally prune each rule based on the Prune set → to avoid overfitting
  2. Optimization stage: Simplify ruleset by deleting rules in order to reduce total description length

• Useful for learning predicate-based rules for IE, e.g., rule induction [Nagesh et al., 2012]

• Extensions: e.g., SLIPPER [Cohen & Singer, 1999]
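To make the grow step concrete, here is a toy sketch of step 2 of the building stage (greedy condition-adding until 100% precision). The feature names are invented; pruning and the stopping condition are omitted, and a 100%-precise rule is assumed to exist.

```python
# Toy sketch of RIPPER's grow step only.

def rule_matches(rule, example):
    """A rule is a conjunction of (attribute, value) conditions."""
    return all(example.get(attr) == val for attr, val in rule)

def precision(rule, pos, neg):
    p = sum(rule_matches(rule, e) for e in pos)
    n = sum(rule_matches(rule, e) for e in neg)
    return p / (p + n) if p + n else 0.0

def grow_rule(pos, neg):
    """Greedily add the condition that most improves precision on the grow set."""
    rule = []
    while precision(rule, pos, neg) < 1.0:
        candidates = {(a, v) for e in pos for a, v in e.items()} - set(rule)
        rule.append(max(candidates,
                        key=lambda c: precision(rule + [c], pos, neg)))
    return rule

pos = [{"cap": "yes", "in_dict": "yes"}, {"cap": "yes", "in_dict": "yes"}]
neg = [{"cap": "yes", "in_dict": "no"}]
print(grow_rule(pos, neg))   # -> [('in_dict', 'yes')]
```

The learned artifact is a readable conjunction of conditions, which is precisely why rule learners like RIPPER fit the transparent-ML theme.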

Page 183: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

CHIMERA [Suganthan et al., SIGMOD 2015]

Rule generation for product classification: (motor | engine) oils? → motor oil

1. Tool to increase the recall of a single classification rule

[Figure: expanding the rule (motor | \syn) oils? with candidate synonyms mined from product categories such as Automotive, Truck, ATV]

• Rank candidate synonyms based on context similarity with known synonyms
• User feedback on some candidates → re-rank remaining candidates
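The synonym-ranking step can be sketched with bag-of-words context vectors and cosine similarity. The words and counts below are invented; CHIMERA builds such contexts from occurrences in product titles.

```python
import math

# Sketch: rank candidate synonyms by context similarity with known synonyms.

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(c * v.get(w, 0) for w, c in u.items())
    norm = lambda x: math.sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

known_context = {"oil": 8, "quart": 3, "synthetic": 5}   # contexts of "motor"
candidates = {
    "engine":   {"oil": 7, "synthetic": 4, "quart": 2},
    "keyboard": {"usb": 6, "wireless": 4},
}
ranked = sorted(candidates,
                key=lambda c: cosine(candidates[c], known_context),
                reverse=True)
print(ranked)   # -> ['engine', 'keyboard']
```

User feedback on the top candidates would then re-weight the known-synonym context before the remaining candidates are re-ranked.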

Page 184: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


CHIMERA [Suganthan et al., SIGMOD 2015]

Rule generation for product classification: (motor | engine) oils? → motor oil

2. Tool to generate classification rules from examples

– Sequence mining to generate candidate rules from labeled product titles

– Greedy algorithm to select a subset of rules that provide good coverage and high precision

Page 185: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML in Learning of Classification Rules

Transparency in Model of Representation
– Classification rules
– Model-level Provenance

Transparency in Learning Algorithm
– RIPPER → No transparency
– CHIMERA → transparency in terms of the user influencing the learning via (1) the initial rule and (2) selection of candidate synonyms

Transparency in Incorporation of Domain Knowledge (DK)
– Offline (RIPPER), or interactive (CHIMERA)
– Possible to incorporate DK at deployment (by modifying the rules)

Page 186: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML Techniques

[Roadmap figure: techniques organized by supervision level (Unsupervised, Semi-supervised, Supervised) and by model type (Dictionary, Regex, Rules, Rules + Classifier, Classification Rules)]

Page 187: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Recap

Transparency in Model
– Model-level provenance available in most surveyed systems, with some exceptions: imperative language (FlashExtract), complex rules w/ weights (Snowball), using a CRF (AutoSlog-SE)

Transparency in Learning Algorithm
– Algorithm-level provenance available in a few systems, to various extents
– User ability to influence the model → a variety of ways

Transparency in Incorporation of Domain Knowledge
– Interactive → few systems: WHISK, INSTAREAD, CHIMERA
– Deployment → mostly depends on model-level provenance

Page 188: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Transparent ML: Building an End-to-end Transparent IE System

Page 189: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Outline

• Building a Transparent IE System

• Transparent Machine Learning

• Building Developer Tools around Transparent IE

• Case Study and Demo

Page 190: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Background: The SystemT Project

• Early 2000’s: NLP group starts at IBM Research – Almaden

• Initial focus: Collection-level machine learning problems

• Observation: Most time spent on feature extraction
  – Technology used: Cascading finite state automata

Page 191: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Problems with Cascading Automata

• Scalability

• Expressivity

• Ease of comprehension

• Ease of debugging

• Ease of enhancement

(the last three are aspects of transparency)

Page 192: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Lack of Transparency in Cascading Automata

[Figure: a sample document containing “Tomorrow, we will meet Mark Scott, Howard Smith and …”, fed through the cascade below]

Tokenization (preprocessing step)

Level 1
⟨Gazetteer⟩[type = LastGaz] → ⟨Last⟩
⟨Gazetteer⟩[type = FirstGaz] → ⟨First⟩
⟨Token⟩[~ “[A-Z]\w+”] → ⟨Caps⟩

Rule priority used to prefer First over Caps.
First preferred over Last since it was declared earlier.

Page 193: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Lack of Transparency in Cascading Automata

[Figure: the same sample document, with Level 1 output feeding Level 2]

Tokenization (preprocessing step)

Level 1
⟨Gazetteer⟩[type = LastGaz] → ⟨Last⟩
⟨Gazetteer⟩[type = FirstGaz] → ⟨First⟩
⟨Token⟩[~ “[A-Z]\w+”] → ⟨Caps⟩

Level 2
⟨First⟩ ⟨Last⟩ → ⟨Person⟩
⟨First⟩ ⟨Caps⟩ → ⟨Person⟩
⟨First⟩ → ⟨Person⟩

Rigid rule priority in Level 1 caused partial results.

Page 194: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Problems with Cascading Automata

• Scalability: Redundant passes over document

• Expressivity: Frequent use of custom code

• Ease of comprehension

• Ease of debugging

• Ease of enhancement

Operational semantics + custom code = no provenance

Page 195: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Outline

• Building a Transparent IE System

• Transparent Machine Learning

• Building Developer Tools around Transparent IE

• Case Study and Demo

Page 196: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Bringing Transparency to Feature Extraction

Our approach: Use a declarative language
– Decouple meaning of extraction rules from execution plan

Our language: AQL (Annotator Query Language)
– Semantics based on relational calculus
– Syntax based on SQL

Page 197: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

AQL Data Model (Simplified)

[Schema figure: Document(text: String); Person(first: Span, last: Span, fullname: Span)]

Relational data model: data is organized in tuples; tuples have a schema

Special data types necessary for text processing:
– Document consists of a single text attribute
– Annotations are represented by a type called Span, which consists of begin, end and document attributes

Page 198: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

create view FirstCaps as

select CombineSpans(F.name, C.name) as name

from First F, Caps C

where FollowsTok(F.name, C.name, 0, 0);

[Figure: the pattern matches a ⟨First⟩ annotation immediately followed (0 tokens apart) by a ⟨Caps⟩ annotation]

• Declarative: Specify logical conditions that input tuples should satisfy in order to

generate an output tuple

• Choice of SQL-like syntax for AQL motivated by wider adoption of SQL

• Compiles into SystemT algebra

AQL By Example

Page 199: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

create view Person as

select S.name as name

from (

( select CombineSpans(F.name, C.name) as name

from First F, Caps C

where FollowsTok(F.name, C.name, 0, 0))

union all

( select CombineSpans(F.name, L.name) as name

from First F, Last L

where FollowsTok(F.name, L.name, 0, 0))

union all

( select *

from First F )

) S

consolidate on name;

[Figure: the three union branches match ⟨First⟩⟨Caps⟩, ⟨First⟩⟨Last⟩, and ⟨First⟩ alone]

Revisiting the Person Example

Page 200: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

create view Person as

select S.name as name

from (

( select CombineSpans(F.name, C.name) as name

from First F, Caps C

where FollowsTok(F.name, C.name, 0, 0))

union all

( select CombineSpans(F.name, L.name) as name

from First F, Last L

where FollowsTok(F.name, L.name, 0, 0))

union all

( select *

from First F )

) S

consolidate on name;

Explicit clause for resolving ambiguity.
Input may contain overlapping annotations (no Lossy Sequencing problem).

Revisiting the Person Example

Page 201: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Compiling and Executing AQL

[Figure: AQL Language → Optimizer → Operator Graph]
– Specify extractor semantics declaratively (express logic of computation, not control flow)
– The optimizer chooses an efficient execution plan that implements the semantics
– The optimized execution plan is executed at runtime

Page 202: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Regular Expression Extraction Operator

[Figure: a Regex operator evaluates /[A-Z][a-z]+/ over an input tuple holding the document “… we will meet Mark Scott and …”, emitting one output tuple per match: Output Tuple 1 (Document, Span 1 over “Mark”), Output Tuple 2 (Document, Span 2 over “Scott”)]

Page 203: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


How AQL Solved our Problems

• Scalability: Cost-based query optimization

• Expressivity: Complex tasks, no custom code

• Ease of comprehension

• Ease of debugging

• Ease of enhancement

Clear and simple provenance

Page 204: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Computing Model-level Provenance

PersonPhone rule:

insert into PersonPhone
select Merge(F.match, P.match) as match
from Person F, Phone P
where Follows(F.match, P.match, 0, 60);

PersonPhone output (match):
– “Anna at James St. office (555-5555”
– “James St. office (555-5555”

[Figure: document “Anna at James St. office (555-5555) …” with Person annotations on “Anna” and “James”, and a Phone annotation on “555-5555”]

(Model-level) Provenance: Explains output data in terms of the input data, the intermediate data, and the transformation (e.g., SQL query, ETL, workflow)
– Surveys: [Davidson & Freire, SIGMOD 2008] [Cheney et al., Found. Databases 2009]

For predicate-based rule languages (e.g., SQL), it can be computed automatically!

Page 205: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Computing Model-level Provenance

Rewritten PersonPhone rule:

insert into PersonPhone
select Merge(F.match, P.match) as match,
       GenerateID() as ID,
       F.id as nameProv,
       P.id as numberProv,
       ‘AND’ as how
from Person F, Phone P
where Follows(F.match, P.match, 0, 60);

[Figure: document “Anna at James St. office (555-5555) …” with Person tuples Anna (ID: 1) and James (ID: 2), and Phone tuple 555-5555 (ID: 3)]

PersonPhone output (match ← provenance):
– “Anna at James St. office (555-5555” ← 1 AND 3
– “James St. office (555-5555” ← 2 AND 3

(Model-level) Provenance: Explains output data in terms of the input data, the intermediate data, and the transformation (e.g., SQL query, ETL, workflow)
– Surveys: [Davidson & Freire, SIGMOD 2008] [Cheney et al., Found. Databases 2009]

For predicate-based rule languages (e.g., SQL), it can be computed automatically!
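The effect of the rewrite can be mimicked in a few lines: each output tuple of the join carries the IDs of the input tuples it was derived from, combined with “AND”. The tuples and offsets below are illustrative.

```python
# Sketch of a provenance-carrying join over Person and Phone tuples.

person = [(1, "Anna", 0), (2, "James", 8)]   # (id, match, begin offset)
phone = [(3, "555-5555", 25)]

person_phone = []
for pid, pmatch, ppos in person:
    for fid, fmatch, fpos in phone:
        if 0 <= fpos - ppos <= 60:           # Follows(F.match, P.match, 0, 60)
            person_phone.append({
                "match": pmatch + " ... " + fmatch,
                "provenance": ("AND", pid, fid),
            })

print([t["provenance"] for t in person_phone])   # -> [('AND', 1, 3), ('AND', 2, 3)]
```

Each output can thus be traced back to exactly the Person and Phone tuples that justified it, which is what the automatic rewrite achieves inside the engine.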

Page 206: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


AQL: Going beyond feature extraction

Extraction Task: Named-entity extraction
Systems compared: SystemT (customized) vs. [Florian et al. ’03] [Minkov et al. ’05]
[Chiticariu et al., EMNLP ’10]

Dataset     Entity Type   System    Precision  Recall  F-measure
CoNLL 2003  Location      SystemT   93.11      91.61   92.35
                          Florian   90.59      91.73   91.15
CoNLL 2003  Organization  SystemT   92.25      85.31   88.65
                          Florian   85.93      83.44   84.67
CoNLL 2003  Person        SystemT   96.32      92.39   94.32
                          Florian   92.49      95.24   93.85
Enron       Person        SystemT   87.27      81.82   84.46
                          Minkov    81.1       74.9    77.9

Transparency without machine learning outperforms machine learning without transparency.

Page 207: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Outline

• Building a Transparent IE System

• Transparent Machine Learning

• Building Developer Tools around Transparent IE

• Case Study and Demo

Page 208: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Machine Learning in SystemT

AQL provides a foundation of transparency

Next step: Add machine learning without losing transparency

Major machine learning efforts:
– Low-level features
– Rule refinement
– Rule induction
– Normalization
– Embedded Models

Page 209: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Machine Learning in SystemT

• Low-level features

• Rule refinement

• Rule induction

• Normalization

• Embedded Models

Page 210: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Recap from Part 3: Regular Expression Learning with ReLIE [Li et al., EMNLP 2008]

[Figure: starting from an initial Regex0 and sample documents, candidate matches are labeled as positive (PosMatch 1…n) or negative (NegMatch 1…m); ReLIE searches over regex transformations to produce Regexfinal]

Clear semantics presented to the user.

Page 211: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Recap from Part 3: Pattern discovery for dictionaries [Li et al., CIKM 2011]

Page 212: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Machine Learning in SystemT

• Low-level features

• Rule refinement

• Rule induction

• Normalization

• Embedded Models

Page 213: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Recap: Rule Refinement [Liu et al., VLDB 2010]

R1: create view Phone as
    Regex(‘\d{3}-\d{4}’, Document, text);

R2: create view Person as
    Dictionary(‘first_names.dict’, Document, text);

Dictionary file first_names.dict:
anna, james, john, peter, …

R3: create table PersonPhone(match span);
    insert into PersonPhone
    select Merge(F.match, P.match) as match
    from Person F, Phone P
    where Follows(F.match, P.match, 0, 60);

Document:text
Anna at James St. office (555-5555), or James, her assistant – 777-7777 have the details.

Phone:match        Person:match
555-5555           Anna
777-7777           James
                   James

[Figure: the document with Person annotations on “Anna” and both “James” mentions, and Phone annotations on “555-5555” and “777-7777”]

Rules expressed in SQL
– Select, Project, Join, Union all, Except all
– Text-specific extensions
  • Regex, Dictionary table functions
  • New selection/join predicates
– Can express core functionality of IE rule languages
  • AQL, CPSL, XLog

Relational data model
– Tuples and views
– New data type span: region of text in a document

Page 214: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Method Overview [Liu et al., VLDB 2010]

Framework for systematic exploration of multiple refinements geared towards improving precision

Input: Extractor P
       Labeled results in the output of P

Goal: Generate refinements of P that remove false positives, while not affecting true positives

Basic Idea: Cut any provenance link → wrong output disappears

[Figure: (simplified) provenance of a wrong output — Doc feeds a Dictionary operator (FirstNames.dict) producing Person “James” and a Regex operator (/\d{3}-\d{4}/) producing Phone “555-5555”; a Join with Follows(name, phone, 0, 60) produces the wrong PersonPhone output James → 555-5555]

Provenance (transparency) enables automatic rule refinement.
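The "cut a provenance link" idea can be sketched in miniature: one candidate refinement is to drop the dictionary entry that explains a labeled false positive, which makes the wrong output disappear. The data is illustrative; the real framework also scores candidate refinements by their effect on true positives.

```python
# Sketch of provenance-guided refinement via a dictionary-entry cut.

def extract(first_names, doc_tokens):
    """A toy Person extractor: dictionary match over tokens."""
    return [t for t in doc_tokens if t.lower() in first_names]

doc = ["Anna", "at", "James", "St.", "office"]
false_positives = {"James"}          # labeled wrong: "James St." is a street

# Candidate refinement: remove the dictionary entries whose provenance
# links produce the labeled false positives.
refined_dict = {"anna", "james"} - {t.lower() for t in false_positives}
print(extract(refined_dict, doc))    # -> ['Anna']
```

Because every output is explained by a provenance graph, the space of refinements (drop a dictionary entry, tighten a predicate, remove a join) can be enumerated and evaluated systematically.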

Page 215: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Machine Learning in SystemT

• Low-level features

• Rule refinement

• Rule induction

• Normalization

• Embedded Models

Page 216: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Recap from Part 3: Rule Induction [Nagesh et al., EMNLP 2012]

[Figure: pipeline over an annotated dataset — Basic Feature rules (BF) → Candidate Definition rules (CD), induced via clustering and LGG → Candidate Refinement rules (CR), induced via propositional rule learning with RIPPER → Consolidation (CO), a simple CO rule]

Page 217: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Recap: Least general generalisation (LGG) of annotations

PER: john Smith          PER: John Doe

person(X,D1) :- startsWith(X, X1), FirstNameDict(X1),
                endsWith(X, X2), immBefore(X1, X2), Caps(X2).

person(Y,D2) :- startsWith(Y, Y1), FirstNameDict(Y1), Caps(Y1),
                endsWith(Y, Y2), immBefore(Y1, Y2), Caps(Y2).

LGG of the above:

person(Z,D) :- startsWith(Z, Z1), FirstNameDict(Z1),
               endsWith(Z, Z2), immBefore(Z1, Z2), Caps(Z2).

Prolog representation of declarative AQL
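With variables already aligned (X and Y both renamed to Z), the LGG of two such conjunctive bodies reduces to the literals common to both — a toy sketch; real LGG also anti-unifies terms rather than comparing literal strings.

```python
# Toy LGG over propositionalised rule bodies with pre-aligned variables.

body1 = {"startsWith(Z,Z1)", "FirstNameDict(Z1)",
         "endsWith(Z,Z2)", "immBefore(Z1,Z2)", "Caps(Z2)"}
body2 = {"startsWith(Z,Z1)", "FirstNameDict(Z1)", "Caps(Z1)",
         "endsWith(Z,Z2)", "immBefore(Z1,Z2)", "Caps(Z2)"}

lgg = body1 & body2     # drops Caps(Z1), required only by the second rule
print(sorted(lgg))
```

The generalized rule covers both "john Smith" and "John Doe" because it no longer requires the first name to be capitalized.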

Page 218: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Machine Learning in SystemT

• Low-level features

• Rule refinement

• Rule induction

• Normalization

• Embedded Models

Page 219: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

Normalization

To deep-parse social media (tweets), we need to normalize the text into a more grammatical form

Designed a normalizer based on a graph model
– Zhang, Baldwin, Ho, Kimelfeld, Li: Adaptive Parser-Centric Text Normalization, ACL 2013

Parameters tuned by supervised machine learning

Customizable by mapping dictionaries
– Contractions, abbreviations, etc.
– Example: kinda → kind of, rep → the representative
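As a minimal sketch of the mapping-dictionary customization alone: the actual ACL 2013 normalizer scores candidate replacements in a graph model, rather than applying a dictionary greedily token by token as done here.

```python
# Greedy token-by-token dictionary replacement (illustration only).
MAPPINGS = {"kinda": "kind of", "rep": "the representative"}

def normalize(text):
    """Replace each token that has a dictionary entry; keep the rest."""
    return " ".join(MAPPINGS.get(tok.lower(), tok) for tok in text.split())

print(normalize("kinda busy"))   # -> kind of busy
```

Keeping the customization in plain dictionaries is what makes this component transparent: a domain expert can inspect and edit the mappings directly.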

Page 220: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Normalization Example

[Figure: the tweet “Ay woudent of see em.” shown with generated candidate replacements for each token]

Targeted use of machine learning

Page 221: Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future


Machine Learning in SystemT

• Low-level features

• Rule refinement

• Rule induction

• Normalization

• Embedded Models

221

Page 222

Training the Parser: Efficient and Powerful Feature Extraction

Dataflow: input documents → SystemT Embeddable Runtime (LanguageWare operators + SystemT operators) → extracted features and labels → statistical learning algorithm.

AQL feature extraction (e.g., identify the first word of a sentence; identify all digits):

create view FirstWord as
  select ...
  from ...
  where ...;
...
create view Digits as
  extract regex ...
  from ...
  where ...;

Example extracted features:
- English: FirstWord: I, We, This, ...; Digits: 360, 2014, ...; Capitalized: We, IBM, ...
- German: FirstWord: Ich, Wir, Dies, ...; Digits: 360, 2014, ...; Capitalized: Wir, IBM, ...

Applying the Parser: Easy Incorporation of Parsing Results for Complex Extractors

Dataflow: input documents → statistical parser embedded as a UDF (with cost-based optimization) → annotations → AQL rules built over the parser results identify sentiment based on parse-tree patterns.

AQL:

create function Parse(span Span)
  return String ...
  external name ...
...
create view Parse as
  select Parse(S.text) as parse, ...
  from sentence S;

create view Actions as
  select ...
  from Parse P;
...
-- Example: We are excited
create view Sentiment_SpeakerIsPositive as
  select 'positive' as polarity, ...,
         'isPositive' as patternType, ...
  from Actions A, Roles R
  where MatchesDict('PositiveVerb.dict', A.verb)
    and MatchesDict('Speakers.dict', R.value)
    and ...;
...

Example SentimentMention output (polarity | mention | target | clue | patternType):
- positive | We like IBM from a long time ... | IBM | like | SpeakerDoesPositive
- negative | Microsoft earnings expected to drop | Microsoft | expected to drop | TargetDoesNegative

Simplify Training and Applying Statistical Parsers
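The FirstWord/Digits/Capitalized feature views can be mimicked in a few lines. This is only an illustration of what those views compute, assuming whitespace tokenization; it is not how SystemT evaluates them:

```python
# Sketch of the low-level features the AQL views above extract for
# training a statistical parser: first word of each sentence, digit
# tokens, and capitalized tokens.

import re

def extract_features(sentences):
    feats = {"FirstWord": set(), "Digits": set(), "Capitalized": set()}
    for s in sentences:
        tokens = s.split()
        if tokens:
            feats["FirstWord"].add(tokens[0])
        for t in tokens:
            if re.fullmatch(r"\d+", t):
                feats["Digits"].add(t)
            if t[0].isupper():
                feats["Capitalized"].add(t)
    return feats

docs = ["We announced 360 new products in 2014", "IBM builds parsers"]
f = extract_features(docs)
```

The same feature definitions apply unchanged across languages (English "We", German "Wir"), which is the point the slide makes.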

Page 223

Dataflow: input documents → statistical NER embedded as a UDF in the SystemT Embeddable Runtime (LanguageWare operators + SystemT operators, cost-based optimization) → annotations combined with rule-based NER results.

AQL:

create function NamedEntityRecognition(span Span)
  return table (type String, entity Span) ...
  external name ...

create view NamedEntity as
  select N.*
  from NamedEntityRecognition(Document) N;

create view PersonRuleBased as
  select ...
  from ...
  where ...;

create view Person as
  select P.person
  from (
    (select P.person from PersonRuleBased)
    union all
    (select N.entity as person from NamedEntity N
     where Equals(N.type, 'Person'))
  ) P
  consolidate on P.person;

output view Person;

Steps: embed statistical NER as a UDF; apply statistical NER; write AQL rules for NER; combine results from statistical NER and rule-based NER for better quality.

Example output:
- Person (person): Ginni Rometty; Barack Obama; ...
- Organization (organization, type): IBM, commercial; White House, government; ...

Combine Statistical and Rule-based NER for Better Quality
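The union-plus-consolidate pattern above can be sketched outside AQL as well. This illustration keeps the longer span when two candidate mentions overlap; AQL's consolidate clause supports several such policies, and this stand-in is not SystemT's implementation:

```python
# Combine rule-based and statistical NER candidates, resolving
# overlapping spans by preferring the longer mention.

def consolidate(mentions):
    """mentions: list of (start, end, text). Keep longer spans on overlap."""
    kept = []
    for m in sorted(mentions, key=lambda m: m[1] - m[0], reverse=True):
        if all(m[1] <= k[0] or m[0] >= k[1] for k in kept):  # no overlap
            kept.append(m)
    return sorted(kept)

rule_based = [(0, 13, "Ginni Rometty")]
statistical = [(0, 5, "Ginni"), (20, 32, "Barack Obama")]

persons = consolidate(rule_based + statistical)
```

Here the rule-based span "Ginni Rometty" wins over the shorter overlapping statistical span, while the non-overlapping statistical result survives, giving better combined quality than either source alone.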

Page 224

Outline

- Building a Transparent IE System
- Transparent Machine Learning
- Building Developer Tools around Transparent IE
- Case Study and Demo

Page 225

Transparent ML at different stages in Extractor Development


Extractor lifecycle: Task Analysis → Development (Develop → Test → Analyze) → Maintenance (Deploy → Refine → Test).

- Task Analysis: Concordance Viewer and Labeling Tool [Chiticariu '12]; Extraction plan [Li '12]
- Development: Track provenance [Liu '10]; Contextual clue discovery [Li '11]; Regex Learning [Li '08]; Rule Refinement [Liu '10]; Rule Induction [Nagesh '12]; Dictionary Refinement [Roy '13]; Visual Programming [Li '15]
- Maintenance: NE Interface [Chiticariu '10b]; Visual Programming [Li '15]

Page 226

Different User Groups

Tooling trades power against ease of use across three user groups:

1. Group 1 (data scientist, programmer): full-power AQL Eclipse-based tooling. Powerful, but difficult to learn/use.
2. Group 2 (data modeler, data scientist, technical sales, enablement team): connect concepts to build the extractor; high-level visual abstraction; leverage smart parsing and pre-built concepts.
3. Group 3 (business analyst, business end user, CTO): consumable visualization of extraction results. Simple, but less powerful.

Page 227

Eclipse Tools Overview

Tools support ease of programming, performance tuning, and automatic discovery:
- AQL Editor: syntax highlighting, auto-complete, hyperlink navigation
- Result Viewer: visualize/compare/evaluate
- Explain: show how each result was generated
- Workflow UI: end-to-end development wizard
- Regex Generator: generate regular expressions from examples
- Pattern Discovery: identify patterns in the data
- Profiler: identify performance bottlenecks to be hand-tuned

Page 228

Web Tools Overview

Tools support ease of programming and ease of sharing:
- Canvas: visual construction of extractors; customization of existing extractors
- Result Viewer: visualize/compare/evaluate
- Concept catalog: share concepts
- Project: share extractor development

Page 229

SystemT: Overall Architecture

- Development Environment: transparent ML tools for AQL development
- Declarative AQL language: rule language with familiar SQL-like syntax; specify extractor semantics declaratively
- Cost-based optimization: choose an efficient execution plan that implements the semantics
- SystemT Runtime: embeddable Java runtime; highly scalable, small memory footprint; turns input documents into extracted objects
- Embedded in IBM engines: InfoSphere Streams, InfoSphere BigInsights, ETL

Page 230

Outline

- Building a Transparent IE System
- Transparent Machine Learning
- Building Developer Tools around Transparent IE
- Case Study and Demo

Page 231

Case Study: Sentiment Analysis over Research Reports

- Drawn from engagements with three major U.S. investment banks
- Basic problem: automatically extract analysts' detailed opinions on securities and markets from analyst research reports
- Key challenges:
  - Customizing for domain-specific expressions
  - Identifying the target of sentiment expressions
  - Aggregating sentiment by document

Example sentences:
- We are upgrading US equities back to Overweight on a 6-month.
- We have upgraded the Belgian market to Neutral from Underweight in the current quarter.
- As a relative momentum call versus the weakness anticipated in ASEAN, we are upgrading Korea to Overweight, and upgrading Taiwan to Neutral in 1Q.

Page 232

Sentiment Analysis over Research Reports

Pipeline: text → Parsing → Verb Context Identification → Sentiment Models. SME domain knowledge plus unsupervised and semi-supervised machine learning feed the stages.

Page 233

Phase 1: Parse

We have upgraded US equities from 3M Neutral to 3M Overweight.

Parse tree (simplified):
- Clause: verb "have" (auxiliary, present)
  - Subject: Noun Phrase "We"
  - Complement: Clause: verb "upgraded" (past participle)
    - Object: Noun Phrase "US equities" (Noun Phrase "US" + "equities")
    - Complement: Prepositional Phrase "from" → Noun Phrase "3M Neutral"
    - Complement: Prepositional Phrase "to" → Noun Phrase "3M Overweight"

Page 234

Transparent Machine Learning in the Parsing Phase

- Adaptive Text Normalization [Zhang et al., 2013]
  - Model targeted towards generating sentences that can be successfully parsed
  - Sequential rules + graph model
    - Explainable to a certain extent
  - Allows incorporation of domain knowledge at deployment
- The IBM English Slot Grammar Parser [McCord et al., 2012]
  - Candidate generation is rule-driven
  - Ranking is less transparent
  - Allows incorporation of domain knowledge at deployment
    - E.g., list of noun phrases, additional word senses

Page 235

Phase 2: Identify Context

We have upgraded US equities from 3M Neutral to 3M Overweight.

Semantic linguistic constructs are calculated from multiple syntactic linguistic constructs:
- Verb (present perfect tense): from the clause for "have" (verb, auxiliary, present) plus the clause for "upgraded" (verb, past participle)
- Object: Noun Phrase "US equities"
- Context (preposition 'to'): from the clause for "upgraded" plus the prepositional phrase "to 3M Overweight"
- Other semantic linguistic constructs are likewise calculated from multiple syntactic constructs

Page 236

Transparent Machine Learning in the Context Identification Phase

- Dictionary Learning [Roy et al., 2013]
  - Refine dictionaries within an AQL rule set
  - Recall from Part 3
- Pattern Discovery [Li et al., 2011]
  - Unsupervised discovery of contextual patterns
    - E.g., financial metrics, asset class synonyms
  - Recall from Part 3

Page 237

Phase 3: Assign Polarity

We have upgraded US equities from 3M Neutral to 3M Overweight.
(Verb: present perfect tense; Object: "US equities"; Context: preposition 'to')

Domain knowledge:
- Polarity is determined by the recommendation (Overweight), not the verb (upgrade)
- Statements in the present perfect are relevant

Outlook Model: assign Positive polarity if:
- Verb is in present/future/infinitive tense
- Object contains an entity
- Context with preposition 'to' contains a positive recommendation (e.g., 'Overweight')
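The Outlook Model's conditions can be written down as one explicit rule. A hedged sketch, assuming the verb tense, object entities, and 'to'-context have already been extracted; in the actual system this is expressed as AQL over parse annotations, and the recommendation dictionary here is illustrative:

```python
# Outlook Model sketch: assign 'positive' when all three conditions hold.

POSITIVE_RECOMMENDATIONS = {"overweight", "buy"}  # illustrative dictionary

# Domain adaptation from the slide: present perfect statements are relevant,
# so it is included alongside present/future/infinitive.
RELEVANT_TENSES = {"present", "future", "infinitive", "present perfect"}

def outlook_polarity(verb_tense, object_entities, to_context):
    """Return 'positive' if the Outlook Model's conditions all hold, else None."""
    if (verb_tense in RELEVANT_TENSES
            and len(object_entities) > 0
            and any(w.lower() in POSITIVE_RECOMMENDATIONS
                    for w in to_context.split())):
        return "positive"
    return None

polarity = outlook_polarity(
    verb_tense="present perfect",
    object_entities=["US equities"],
    to_context="to 3M Overweight")
```

On the running example, polarity comes from the recommendation "Overweight" in the 'to'-context, not from the verb "upgrade", matching the domain knowledge above.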

Page 238

Transparent Machine Learning in the Polarity Assignment Phase

- The sentiment model: AQL rules
  - Exposes customization points:
    - Dictionaries of sentiment clues
    - Disable or change the behavior of certain rules (e.g., discard past-tense sentiments)
  - Generic model adapted for the domain, mostly manually
  - Automatic adaptation of dictionaries not possible due to absence of labeled data
- Sentiment aggregation as a classification problem
  - Given individual sentiment instances for an entity from a document, classify the document-level polarity for the entity
  - SVM model trained on (entity, polarity) pairs in 100 documents
  - Model embedded in AQL for scoring
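The aggregation step can be sketched as a linear scorer over per-entity mention counts. This is a stand-in with hand-set weights, assuming simple count features; the tutorial's system trains an SVM on such features and embeds the resulting scoring function in AQL:

```python
# Document-level sentiment aggregation sketch: score the positive and
# negative mention counts for one entity and threshold the result.

def aggregate_polarity(instances, weights=(1.0, -1.2), bias=0.0):
    """instances: list of 'positive'/'negative' mentions for one entity
    in one document. Returns the document-level polarity for the entity."""
    n_pos = sum(1 for i in instances if i == "positive")
    n_neg = sum(1 for i in instances if i == "negative")
    score = weights[0] * n_pos + weights[1] * n_neg + bias
    return "positive" if score > 0 else "negative"

doc_level = aggregate_polarity(["positive", "positive", "negative"])
```

A trained SVM with linear kernel reduces to exactly this shape (learned weights and bias over the features), which is what makes embedding the model for scoring straightforward.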

Page 239

Sentiment Analysis over Research Reports: Transparent ML

Pipeline: text → Parsing → Verb Context Identification → Sentiment Models, combining expert domain knowledge with unsupervised and semi-supervised machine learning.

- Parsing: Adaptive Text Normalization; Transparent Deep Parser
- Context Identification: Automatic Dictionary Refinement and Contextual Clue Discovery
- Sentiment Models: Transparent Sentiment Model; Embedded Classifier

Page 240

Demo

Page 241

Find Out More about SystemT!

https://ibm.biz/BdF4GQ

Page 242

Find Out More about SystemT!

https://ibm.biz/BdF4GQ

Try out SystemT

Watch a demo

Learn about using SystemT in university courses

Page 243

Other Systems

- PropMiner (TU Berlin) [Akbik et al., 2013]
- ICE (New York University) [He and Grishman, 2015]
- SPIED (Stanford) [Gupta and Manning, 2014]
- CHIMERA (WalmartLabs, U. Wisconsin-Madison) [Sun et al., 2014]
- BBN Technologies System [Freeman et al., 2011]
- INSTAREAD (U. Washington) [Hoffman et al., 2015]

Page 244

PropMiner (TU Berlin) [Akbik et al., 2013]

1. Construct example sentence
2. Annotate relation triple
3. Parse tree visualization
4. (i) Auto-generated rule and corresponding results; (ii) edit rules / label results
5. Relevant existing rules

Additional features:
1. Sentence suggestion
2. Conflict resolution

Page 245

ICE (New York University) [He and Grishman, 2015]

1. A ranked list of key phrases: key phrases that appear more often in the in-domain corpus than in general language rank higher.
2. Given user-provided or auto-constructed seeds, automatically construct a ranked list of similar terms in the corpus.
3. Linearize lexicalized dependency paths for easier understanding.
4. Auto-construct exact and fuzzy dependency-path-based relation extractors, bootstrapping from user input.
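The key-phrase ranking idea (step 1) can be sketched with a simple frequency ratio between an in-domain corpus and a background corpus. This ratio score is an illustrative stand-in, not ICE's actual metric:

```python
# Rank terms by how much more frequent they are in-domain than in
# general language; smoothing keeps unseen background terms finite.

from collections import Counter

def rank_keyphrases(in_domain_tokens, background_tokens, smoothing=1.0):
    fg = Counter(in_domain_tokens)
    bg = Counter(background_tokens)
    def score(term):
        return fg[term] / (bg[term] + smoothing)
    return sorted(fg, key=score, reverse=True)

in_domain = ["hedge", "fund", "equity", "equity", "the", "the", "the"]
background = ["the"] * 50 + ["fund"]

ranking = rank_keyphrases(in_domain, background)
```

Domain terms like "equity" float to the top while function words like "the", common everywhere, sink to the bottom.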

Page 246

SPIED (Stanford) [Gupta and Manning, 2014]

Entity-centric view

Page 247

SPIED (Stanford) [Gupta and Manning, 2014]

Pattern-centric view

Page 248

CHIMERA (WalmartLabs, Univ. Wisconsin-Madison) [Sun et al., 2014]

Combines rule-based and machine-learning-based approaches to overcome the challenges of each.

Challenges for the ML-based approach:
1. Difficult to generate training data
2. Difficult to generate a representative sample
3. Difficult to handle "corner cases"
4. Concept drift and changing distribution

Challenges for the rule-based approach:
1. Labor intensive
2. Time consuming
3. Cannot utilize existing labeled data

Page 249

BBN Technologies System [Freeman et al., 2011]

Inputs: a third-party ontology and resources (guidelines/examples/sample documents), plus existing ACE-specific extractors.

Domain specialization:
- Class detector based on unsupervised clustering
- Manually-added coreference heuristics
- Seed-based bootstrap relation learner
- Manually-developed rules in a pattern language

The slide shows sample patterns for the possibleTreatment relation and marks one step as opaque.

Page 250

INSTAREAD (University of Washington) [Hoffman et al., 2015]

1. Identify examples by search.
2. Suggest related terms for more examples.
3. User-created/refined rules.
4. Auto-suggested rules via bootstrapping.

Page 251

Transparent ML for Information Extraction: Research Challenges and Future Directions

Page 252

Research Challenges

How to make transparent ML for IE more principled, effective, and efficient?

Page 253

Future Directions - 1

- Define a standard IE language and data model
  - What is the right data model to capture text, annotations over text, and their properties?
  - Can we establish a standard declarative extensible language to solve most IE tasks encountered so far?
  - Desired characteristics:
    - Expressivity: able to represent and combine different kinds of transparent models of representation
    - Extensibility: allow new models to be added in the future
    - Declarativity: enable optimization, scalability, explainability

Page 254

Future Directions - 2

- Systems research based on a standard IE language
  - Data representation
  - Automatic performance optimization
  - Exploring modern hardware

Page 255

Future Directions - 3

- ML research based on a standard IE language
  - How to learn basic primitives such as regular expressions and dictionaries?
  - How to automatically generate models that are comprehensible and debuggable?
  - How to design learning algorithms that are more comprehensible and debuggable?
  - How to enable easy incorporation of domain knowledge?

Page 256

References

- [Agichtein & Gravano 2000] Eugene Agichtein, Luis Gravano. Snowball: Extracting Relations from Large Plain-Text Collections. ACM Conference on Digital Libraries, 2000
- [Aitken 2002] Learning Information Extraction Rules: An Inductive Logic Programming Approach. European Conference on Artificial Intelligence, 2002
- [Baldwin et al., 2013] Tyler Baldwin, Yunyao Li, Bogdan Alexe, Ioana R. Stanoi. Automatic Term Ambiguity Detection. ACL 2013
- [Banko et al., 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, Oren Etzioni. Open Information Extraction from the Web. IJCAI 2007: 2670-2676
- [Brauer et al., 2011] Falk Brauer, Robert Rieger, Adrian Mocan, Wojciech M. Barczynski. Enabling Information Extraction by Inference of Regular Expressions from Sample Entities. CIKM 2011
- [Brin 1998] Sergey Brin. Extracting Patterns and Relations from the World Wide Web. WebDB 1998
- [Cafarella et al., 2005] Michael J. Cafarella, Doug Downey, Stephen Soderland, Oren Etzioni. KnowItNow: Fast, Scalable Information Extraction from the Web. HLT 2005
- [Califf & Mooney 1999] M. E. Califf and R. J. Mooney. Relational Learning of Pattern-Match Rules for Information Extraction. AAAI 1999
- [Califf & Mooney 2003] M. Califf and R. Mooney. Bottom-Up Relational Learning of Pattern Matching Rules for Information Extraction. Journal of Machine Learning Research, 2003
- [Carlson et al. 2010] Andrew Carlson, Justin Betteridge, Richard C. Wang, Estevam R. Hruschka, Jr., Tom M. Mitchell. Coupled Semi-Supervised Learning for Information Extraction. WSDM 2010
- [Chang & Manning 2014] Angel X. Chang and Christopher D. Manning. TokensRegex: Defining Cascaded Regular Expressions over Tokens. Stanford University Technical Report, 2014
- [Chiticariu et al., 2010] Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, Frederick Reiss, Shivakumar Vaithyanathan. Domain Adaptation of Rule-Based Annotators for Named-Entity Recognition Tasks. EMNLP 2010
- [Cheney et al, 2009] James Cheney, Laura Chiticariu, Wang Chiew Tan. Provenance in Databases: Why, How, and Where. Foundations and Trends in Databases 1(4): 379-474, 2009
- [Choi et al., 2005] Yejin Choi, Claire Cardie, Ellen Riloff, Siddharth Patwardhan. Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns. EMNLP 2005
- [Ciravegna 2001] F. Ciravegna. Adaptive Information Extraction from Text by Rule Induction and Generalization. IJCAI 2001
- [Coden 2014] Anni Coden, Daniel Gruhl, Neal Lewis, Pablo N. Mendes, Meena Nagarajan, Cartic Ramakrishnan, Steve Welch. Semantic Lexicon Expansion for Concept-Based Aspect-Aware Sentiment Analysis. SemWebEval@ESWC 2014: 34-40
- [Cohen 1995] W. Cohen. Fast Effective Rule Induction. ICML 1995
- [Cohen & Singer 1999] W. Cohen, Y. Singer. A Simple, Fast, and Effective Rule Learner. AAAI 1999
- [Davidson & Freire, 2008] Susan B. Davidson, Juliana Freire. Provenance and Scientific Workflows: Challenges and Opportunities. SIGMOD 2008
- [Del Corro & Gemulla 2013] Luciano Del Corro, Rainer Gemulla. ClausIE: Clause-Based Open Information Extraction. WWW 2013
- [Downey et al., 2004] Doug Downey, Oren Etzioni, Stephen Soderland, Daniel S. Weld. Learning Text Patterns for Web Information Extraction and Assessment. AAAI Workshop on Adaptive Text Extraction and Mining, 2004
- [Downey et al. 2007] Doug Downey, Matthew Broadhead, Oren Etzioni. Locating Complex Named Entities in Web Text. IJCAI 2007
- [Etzioni et al, 2005] Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates. Unsupervised Named-Entity Extraction from the Web: An Experimental Study. Artificial Intelligence, 2005

Page 257

References

- [Fader et al. 2011] Anthony Fader, Stephen Soderland, Oren Etzioni. Identifying Relations for Open Information Extraction. EMNLP 2011
- [Gerow 2014] A. Gerow. Extracting Clusters of Specialist Terms from Unstructured Text. EMNLP 2014
- [Grishman & Min 2010] Ralph Grishman and Bonan Min. New York University KBP 2010 Slot-Filling System. TAC Workshop 2010
- [Gupta & Manning, 2014] Sonal Gupta and Christopher Manning. Improved Pattern Learning for Bootstrapped Entity Extraction. ACL 2014
- [Hoffman et al. 2015] R. Hoffmann, L. Zettlemoyer, D. S. Weld. Extreme Extraction: Only One Hour per Relation. June 2015
- [Ji et al, 2010] H. Ji, R. Grishman, H. T. Dang, K. Griffitt, J. Ellis. Overview of the TAC 2010 Knowledge Base Population Track. TAC Workshop, 2010
- [Krishnamurthy et al., 2008] Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, Shivakumar Vaithyanathan, Huaiyu Zhu. SystemT: A System for Declarative Information Extraction. SIGMOD Record 37(4): 7-13, 2008
- [Kudo & Matsumoto 2004] T. Kudo and Y. Matsumoto. A Boosting Algorithm for Classification of Semi-Structured Text. EMNLP 2004
- [Kobayashi et al. 2007] Nozomi Kobayashi, Kentaro Inui, Yuji Matsumoto. Extracting Aspect-Evaluation and Aspect-Of Relations in Opinion Mining. EMNLP 2007
- [Le & Gulwani 2014] Vu Le, Sumit Gulwani. FlashExtract: A Framework for Data Extraction by Examples. PLDI 2014
- [Li et al., 2008] Yunyao Li, Rajasekar Krishnamurthy, Sriram Raghavan, Shivakumar Vaithyanathan, H. V. Jagadish. Regular Expression Learning for Information Extraction. EMNLP 2008
- [Li et al, 2011] Y. Li, V. Chu, S. Blohm, H. Zhu, H. Ho. Facilitating Pattern Discovery for Relation Extraction with Semantic-Signature-Based Clustering. CIKM 2011
- [Liu et al, 2010] Bin Liu, Laura Chiticariu, Vivian Chu, H. V. Jagadish, Frederick Reiss. Automatic Rule Refinement for Information Extraction. PVLDB 3(1), 2010
- [Mausam et al. 2012] Mausam, Michael Schmitz, Robert Bart, Stephen Soderland, Oren Etzioni. Open Language Learning for Information Extraction. EMNLP 2012
- [McIntosh & Curran 2009] Tara McIntosh, James R. Curran. Reducing Semantic Drift with Bagging and Distributional Similarity. ACL/IJCNLP 2009
- [McIntosh 2010] T. McIntosh. Unsupervised Discovery of Negative Categories in Lexicon Bootstrapping. EMNLP 2010

Page 258

References

- [Mitchell et al. 2015] T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, J. Welling. Never-Ending Learning. AAAI 2015
- [Muslea 1999] Ion Muslea. Extraction Patterns for Information Extraction Tasks: A Survey. AAAI Workshop on Machine Learning for Information Extraction, 1999
- [Nagesh et al, 2012] Ajay Nagesh, Ganesh Ramakrishnan, Laura Chiticariu, Rajasekar Krishnamurthy, Ankush Dharkar, Pushpak Bhattacharyya. Towards Efficient Named-Entity Rule Induction for Customizability. EMNLP-CoNLL 2012
- [Pasca et al., 2006] Marius Paşca, Dekang Lin, Jeffrey Bigham, Andrei Lifchits, Alpa Jain. Names and Similarities on the Web: Fact Extraction in the Fast Lane. ACL 2006
- [Prasse et al., 2012] Paul Prasse, Christoph Sawade, Niels Landwehr, Tobias Scheffer. Learning to Identify Regular Expressions that Describe Email Campaigns. ICML 2012
- [Qadir & Riloff 2012] A. Qadir and E. Riloff. Ensemble-Based Semantic Lexicon Induction for Semantic Tagging. *SEM 2012
- [Riloff 1993] Ellen Riloff. Automatically Constructing a Dictionary for Information Extraction Tasks. AAAI 1993
- [Riloff 1996] Ellen Riloff. Automatically Generating Extraction Patterns from Untagged Text. AAAI 1996
- [Riloff & Jones 1999] E. Riloff and R. Jones. Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping. AAAI 1999
- [Roy et al, 2013] Sudeepa Roy, Laura Chiticariu, Vitaly Feldman, Frederick Reiss, Huaiyu Zhu. Provenance-Based Dictionary Refinement in Information Extraction. SIGMOD 2013
- [Qadir et al., 2015] A. Qadir, P. Mendes, D. Gruhl, N. Lewis. Semantic Lexicon Induction from Twitter with Pattern Relatedness and Flexible Term Length. AAAI 2015
- [Qiu & Zhang 2014] Likun Qiu, Yue Zhang. ZORE: A Syntax-Based System for Chinese Open Relation Extraction. EMNLP 2014
- [Sarawagi 2008] Sunita Sarawagi. Information Extraction. Foundations and Trends in Databases 1(3): 261-377, 2008
- [Shen et al., 2007] W. Shen, A. Doan, J. F. Naughton, R. Ramakrishnan. Declarative Information Extraction Using Datalog with Embedded Extraction Predicates. VLDB 2007
- [Soderland 1999] S. Soderland. Learning Information Extraction Rules for Semi-Structured and Free Text. Machine Learning, vol. 34, 1999
- [Sudo et al., 2003] Kiyoshi Sudo, Satoshi Sekine, Ralph Grishman. An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition. ACL 2003
- [Surdeanu et al., 2003] Mihai Surdeanu, Sanda Harabagiu, John Williams, Paul Aarseth. Using Predicate-Argument Structures for Information Extraction. ACL 2003
- [Surdeanu 2013] Mihai Surdeanu. Overview of the TAC 2013 Knowledge Base Population Evaluation: English Slot Filling and Temporal Slot Filling. TAC Workshop, 2013
- [Yahya et al. 2014] Mohamed Yahya, Steven Whang, Rahul Gupta, Alon Y. Halevy. ReNoun: Fact Extraction for Nominal Attributes. EMNLP 2014


• [Akbik et al., 2013] Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees. ACL 2013

• [Atasu et al., 2013] Kubilay Atasu, Raphael Polig, Christoph Hagleitner, Frederick R. Reiss: Hardware-accelerated regular expression matching for high-throughput text analytics. FPL 2013

• [Chiticariu et al., 2011] Laura Chiticariu, Vivian Chu, Sajib Dasgupta, Thilo W. Goetz, Howard Ho, Rajasekar Krishnamurthy, Alexander Lang, Yunyao Li, Bin Liu, Sriram Raghavan, Frederick Reiss, Shivakumar Vaithyanathan, Huaiyu Zhu: The SystemT IDE: an integrated development environment for information extraction rules. SIGMOD Demo 2011

• [Chiticariu et al., 2010b] Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, Frederick Reiss, Shivakumar Vaithyanathan: Domain Adaptation of Rule-Based Annotators for Named-Entity Recognition Tasks. EMNLP 2010

• [Cunningham et al., 2002] Hamish Cunningham, Diana Maynard, Kalina Bontcheva, Valentin Tablan: A framework and graphical development environment for robust NLP tools and applications. ACL 2002

• [Fagin et al., 2013] Ronald Fagin, Benny Kimelfeld, Frederick Reiss, Stijn Vansummeren: Spanners: a formal framework for information extraction. PODS 2013

• [Fagin et al., 2014] Ronald Fagin, Benny Kimelfeld, Frederick Reiss, Stijn Vansummeren: Cleaning inconsistencies in information extraction via prioritized repairs. PODS 2014

• [Freeman et al., 2011] Extreme Extraction -- Machine Reading in a Week. EMNLP 2011

• [Gupta and Manning, 2014] SPIED: Stanford Pattern-based Information Extraction and Diagnostics. CoNLL 2014

• [He and Grishman, 2015] ICE: Rapid Information Extraction Customization for NLP Novices. NAACL 2015

• [Hoffman et al., 2015] R. Hoffmann, L. Zettlemoyer, D. S. Weld: Extreme Extraction: Only One Hour per Relation. June 2015

• [Li et al., 2012] Yunyao Li, Laura Chiticariu, Huahai Yang, Frederick Reiss, Arnaldo Carreno-Fuentes: WizIE: A Best Practices Guided Development Environment for Information Extraction. ACL (System Demonstrations) 2012

• [Li et al., 2015] Yunyao Li, Elmer Kim, Marc Touchette, Ramiya Venkatachalam, Hao Wang: VINERy: A Visual IDE for Information Extraction. VLDB Demo 2015

• [McCord et al., 2012] Michael C. McCord, J. William Murdock, Branimir Boguraev: Deep parsing in Watson. IBM Journal of Research and Development 56(3): 3 (2012)

• [Polig et al., 2014a] Raphael Polig, Kubilay Atasu, Laura Chiticariu, Christoph Hagleitner, H. Peter Hofstee, Frederick R. Reiss, Huaiyu Zhu, Eva Sitaridi: Giving Text Analytics a Boost. IEEE Micro 34(4): 6-14 (2014)

• [Polig et al., 2014b] Raphael Polig, Kubilay Atasu, Heiner Giefers, Laura Chiticariu: Compiling text analytics queries to FPGAs. FPL 2014

• [Sun et al., 2014] Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing. VLDB 2014

• [Zhang et al., 2013] Congle Zhang, Tyler Baldwin, Howard Ho, Benny Kimelfeld, Yunyao Li: Adaptive Parser-Centric Text Normalization. ACL (1) 2013
