machine learning - department of computer...

57
Machine Learning Programs “learn” behaviors from labelled examples - Supervised learning Training Data: Class Temperature Sweating? Chills? Appetite-Loss? Post-Nasal-Drip? Rash? COLD 98.7 no yes no yes yes FLU 100.1 yes no yes no no COLD 98.9 no yes no yes no FLU 99.4 yes no yes no yes FLU 99.1 yes no yes no yes COLD 98.4 no yes no yes no Test Data: Class Temperature Sweating? Chills? Appetite-Loss? Post-Nasal-Drip? Rash? ??? 98.3 no yes no yes yes ??? 101.2 yes no yes no no . . . .

Upload: vukhue

Post on 23-Apr-2018

217 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Machine

Learning

Programs“learn”

behaviorsfromlabelled

examples-Supervised

learning

TrainingD

ata:

Class

Temperature

Sweating?

Chills?

Appetite-Loss?

Post-Nasal-D

rip?R

ash?C

OL

D98.7

noyes

noyes

yesFL

U100.1

yesno

yesno

noC

OL

D98.9

noyes

noyes

noFL

U99.4

yesno

yesno

yesFL

U99.1

yesno

yesno

yesC

OL

D98.4

noyes

noyes

no

TestData:

Class

Temperature

Sweating?

Chills?

Appetite-Loss?

Post-Nasal-D

rip?R

ash????

98.3no

yesno

yesyes

???101.2

yesno

yesno

no

..

..

Page 2: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Word

SenseD

isambiguation

Problem:

Thecom

panysaid

theplantisstilloperating

...�

(A)M

anufacturingplant

or

(B)Living

plant

TrainingD

ata:SenseC

ontext(1)M

anufacturing...union

responsestoplant

closures....”

”...com

puterdiskdrive

plantlocated

in...

””

company

manufacturing

plantisin

Orlando

...(2)L

iving...anim

alratherthanplant

tissuescanbe

...”

”...to

strainm

icroscopicplant

lifefrom

the...

””

andG

olgiapparatusofplant

andanim

alcells

TestData:

SenseC

ontext???

...vinylchloridem

onomer

plant,w

hichis...

???...m

oleculesfoundin

planttissue

fromthe

...

..

..

Page 3: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Machine

Translation(English

French)

Problem:

...He

wrote

thelastsentence

two

yearslater...

peine(legalsentence)

or

phrase(gram

maticalsentence)

TrainingD

ata:

TranslationC

ontext(1)peine

...foram

aximum

sentencefora

youngoffender...

””

...ofthem

inimum

sentenceofseven

yearsinjail...

””

...were

underthesentence

ofdeathatthattim

e...

(2)phrase...read

thesecond

sentencebecause

itisjustas...”

”...The

nextsentence

isavery

important...

””

...Itisthesecond

sentencew

hichIthink

isat...

TestData:Translation

Context

???...cannotcriticize

asentence

handeddow

nby

...???

...listento

thissentence

utteredby

aform

er...

..

..

Page 4: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Text-to-SpeechSynthesis

Problem:

...slightlyelevated

leadlevels...

l� d(asin

leadm

ine)or

li:d(asin

leadrole)

TrainingD

ata:

PronunciationC

ontext(1)l� d

...itmonitorsthe

leadlevelsin

drinking...

””

...conferenceon

leadpoisoning

in...

””

...strontiumand

leadisotope

zonation...

(2)li:d...m

aintainedtheir

leadThursday

over...”

”...to

Boston

andlead

singerforPurple...

””

...Bush

a17-point

leadin

Texas,only3

...

TestData:

PronunciationC

ontext???

...median

bloodlead

concentrationw

as..???

...hisdouble-digitlead

nationwide

.The...

..

..

Page 5: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

AccentR

estorationin

Spanish&

French

Problem:

Input:...deja

travaillecote

acote

...

Output:

...dejatravaille

cotea

cote...

Exam

ples:...appelerl’autre

cotede

l’atlantique...

cote(m

eaningside)

or

cote(m

eaningcoast)

...unefam

illedespecheurs...

pecheurs(meaning

fishermen)

or

pecheurs(meaning

sinners)

..

..

Page 6: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

AccentR

estorationin

Spanish&

French

TrainingD

ata:

PatternC

ontext(1)cote

...dulaisserde

cotefaute

detem

ps...”

”...appelerl’autre

cotede

l’atlantique...

””

...passede

notrecote

dela

frontiere...

(2)cote...vivre

surnotrecote

ouesttoujours...”

”...creersurla

cotedu

labradordes...”

”travaillaientcote

acote

,ilsavaient...

TestData:

PatternC

ontext???

...passede

notrecote

dela

frontiere...

???...creersurla

cotedu

labradordes...

..

..

Page 7: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Capitalization

Restoration

Problem:

...FR

IEDC

HIC

KEN

,TU

RK

EY

SAN

DW

ICH

ESA

ND

FROZEN

...�

turkey(the

bird)or

Turkey(the

country)Training

Data:

Capitalization

Context

(1)turkey...

OF

FRIE

DC

HIC

KE

N,

TU

RK

EY

SAN

DW

ICH

ES

AN

DFR

OZ

EN

...”

”...

NT

SA

POU

ND

,W

HIL

ET

UR

KE

YPR

ICE

SR

OSE

1.2C

EN

TS

...”

”...

PLA

Y,

RE

AL

GR

AD

E-AT

UR

KE

Y,

WH

ICH

ON

LYA

PRIC

E...

(2)Turkey...

INU

ND

AT

ED

EA

STE

RN

TU

RK

EY

AFT

ER

TH

EE

AR

LIE

R...

””

...FE

EL

ING

ST

OW

AR

DT

UR

KE

YSU

RFA

CE

DW

HE

NG

RE

EC

E...

””

...T

HE

CO

NT

RA

CT

WIT

HT

UR

KE

YW

ILL

PRO

VID

EO

PPOR

TU...

TestData:

Capitalization

Context

???...

NE

CK

LIK

ET

HA

TO

FA

TU

RK

EY

ON

AC

HO

PPING

BL

OC

K...

???...

PRO

BL

EM

IST

HA

TT

UR

KE

YIS

NO

TA

EU

RO

PEA

N...

..

..

Page 8: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

SpellingC

orrection

Problem:

...andhe

firedpresidentialaid/aide

Dick

Morrisafter...

aidor

aide

TrainingD

ata:

SpellingC

ontext(1)aid

...andcutthe

foreignaid/aide

budgetinfiscal1996

...”

”...they

offeredfederal

aid/aideforflood-ravaged

states...(2)aide

...firedpresidential

aid/aideD

ickM

orrisafter...”

”...and

saidthe

chiefaid/aide

toSen.B

aker,Mr.John

...

TestData:Spelling

Context

???...said

thelongtim

eaid/aide

tothe

MayorofSt....

???...w

illsquandertheaid/aide

itreceivesfromthe

...

..

..

Page 9: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Other

Applications

VowelR

estorationin

Hebrew

andA

rabic

Capitalization

Restoration

(e.g.T

UR

KE

Y

Turkey/turkey)

SpellingC

orrection(e.g.principal/principle)

ProperN

ounC

lassification(e.g.W

ashington

PER

SON/PL

AC

E)

SpeechR

ecognition(e.g./eid/

aid/aide)

..

..

Page 10: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Machine

Learning

Algorithm

s�

NeuralN

ets

Decision

Trees

Decision

Lists

Bayesian

Classifiers

Genetic

Algorithm

s

..

..

Page 11: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Machine

Learning

Programs“learn”

behaviorsfromlabelled

examples-Supervised

learning

TrainingD

ata:

Class

Temperature

Sweating?

Chills?

Appetite-Loss?

Post-Nasal-D

rip?R

ash?C

OL

D98.7

noyes

noyes

yesFL

U100.1

yesno

yesno

noC

OL

D98.9

noyes

noyes

noFL

U99.4

yesno

yesno

yesFL

U99.1

yesno

yesno

yesC

OL

D98.4

noyes

noyes

no

TestData:

Class

Temperature

Sweating?

Chills?

Appetite-Loss?

Post-Nasal-D

rip?R

ash????

98.3no

yesno

yesyes

???101.2

yesno

yesno

no

..

..

Page 12: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

.

Authorship

ID:W

hoW

rotea

Student’sTermPaper?

Frequencyas

Frequencyas

Word

inText

StudentAStudentB

optimally

971

certainly84

3typically

464

perspicuous26

0actually

134

whilst

60

the241

229aw

esome

063

totally0

40w

onderful0

26incredibly

013

����������� �� ������ �

����������� �� ������ �

� ����

�� ��� �� ������ �

�� ��� �� ������ �

� �����

..

..

Page 13: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

.

Com

biningE

vidence-O

ne(B

ayesian)Approach

����������� �� ������ �

����������� �� ������ �

� ����

�� ��� �� ������ �

�� ��� �� ������ �

� �����

�� �����

�� ������ �

�� �����

�� ������ �

� �� !

�� � ������ �

�� � ������ �

� ��

� � "# �� ������ �

�� � "# �� ������ �

� $�

� � "% �� ������ �

�� � "% �� ������ �

� $���

..

..

Page 14: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

SourcesofEvidence

-Wordsin

Context

Frequencyas

Frequencyas

Word

toleft

Aid

Aide

foreign718

1federal

2970

western

1460

provide88

0covert

260

oppose13

0future

90

similar

60

presidential0

63chief

040

longtime

026

aids-infected0

2sleepy

01

disaffected0

1indispensable

21

practical2

0squander

10

..

..

Page 15: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Com

plexFeatures-L

inguisticPatterns

PositionC

ollocationl� d

li:dN

-grams

+1L

leadlevel/N

2190

-1W

narrowlead

070

(word,

+1W

leadin

207898

lemm

a,-1

W,+1

Woflead

in162

0part-of-speech)

-1W

,+1W

thelead

in0

301+1

P,+2P

lead,

NO

UN

2347

Wide-context

�k

Wzinc

(in

��

words)

2350

collocations�

kW

copper(in

��

words)

1300

Verb-object-V

Lfollow

/V

lead0

527relationships

-VL

take/V

lead1

665

..

..

Page 16: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Algorithm

1:Decision

Lists

LogLEvidence

Pronunciation11.40

follow/V

+lead

li:d11.20

zinc(in

��

words)

l� d11.10

leadlevel/N

l� d10.66

ofleadin

l� d10.59

thelead

in

li:d10.51

leadrole

li:d10.35

copper(in

��

words)

l� d10.28

leadtim

e

li:d10.24

leadlevels

l� d10.16

leadpoisoning

l� d���

New

Sentence:

Studiesidentifiedslightly

elevatedcopperand

leadlevels.

Classification:

l� d

..

..

Page 17: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Com

biningvs.N

otCom

biningProbabilities

Use

allmatching

patternsintargetcontext

� �����

�����

������� ���������� ������������

������� ���������� ������������ ��

Use

onlythe

highestscoringpattern

Agree

-B

othclassificationscorrect

92%B

othclassificationsincorrect

6%D

isagree-

Singlebestevidence

correct1.3%

Com

binedevidence

correct0.7%

Total-100%

..

..

Page 18: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Smoothing

andInterpolation

Smoothing

oflikelihoodratiossensitive

tovariablesincluding

Collocationaldistance

Typeofw

ord(noun,verb,contentw

ord,functionw

ord)

Nature

ofsyntacticrelationship

Improve

probabilityestim

atesbyinterpolating

between

globalandresidualprobabilities

..

..

Page 19: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

EvaluationSam

plePrior

%W

ordPron1

Pron2Size

Prob.C

orrectlives

laIvzlIvz

3318669

98w

oundw

aænd

wund

448355

98N

icenaIs

nis573

5694

Begin

bIæ

gIn

beIgIn

114375

97C

hitæ

ikaI

128853

98C

olonkoæ

æloæ

koælæ

n1984

6998

lead(N

)lid

læd

1216566

98tear(N

)tæ

æ�

tIæ

227188

97axes(N

æksiz

ææ

ksIz1344

7296

IVaIvi

fææ

æ1442

7698

Jandæ

æn

jæn

132790

98routed

æutId

æaæ

tId589

6094

bassbeIs

bæs

186557

99AV

ERA

GE

6366067

97

..

..

Page 20: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Com

parativeE

valuation�

Accentrestoration

taskin

Spanish

N-gram

Tagger93.8%

Bayesian

Classifier

89.4%D

ecisionL

ist96.8%

..

..

Page 21: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

AdvantagesofA

lgorithm�

Successfullyintegratesnon-independentfeatures

Com

binesstrengthsofBayesian

classifiersandN

-gramtaggers

Modelslocalsequence

andw

idecontext

Returnsprobability

valueswith

allclassifications

Efficient

Resulting

decisionlistsare

easyto

interpretandm

odify

..

..

Page 22: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

.Problem:L

exicalAm

biguityR

esolution�

Word

sensedisam

biguation

Lexicalchoicein

machine

translation

Hom

ographdisam

biguationin

speechsynthesis

Accentrestoration

inSpanish

andFrench

Otherapplications

Three

Algorithm

s:

Decision

lists(supervised)

Bayesian

word-classdiscrim

inators(unsupervised)

Modulated

bootstrappingfrom

seedw

ords(unsupervised)

..

..

Page 23: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Need

forU

nsupervisedA

lgorithms

Hand-tagged

trainingdata

areexpensive

andgenerally

unavailable

WordN

etsense-taggedcorpus:sm

all,underdevelopment

Parallelalignedbilingualcorpora

Sourceofautom

aticallytagged

datafortranslation

distinctions

Currently

limited

availabilityand

coverage

Goal:M

ethodsfortrainingon

untagged,monolingualtext

..

..

Page 24: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Bayesian

Word-classD

iscrimination

Roget’sT

hesaurusCategories(1042

word

classes):

MA

CH

INE

-tractor,bulldozer,crane,jackhamm

er,drill,forklift...A

NIM

AL

-alligator,lizard,bat,flamingo,heron,crane,stork

...M

INE

RA

L-strontium

,zinc,magnesium

,lead,copper,cobalt...

Statisticalword-classdetectors:

...theengine

oftheX

XX

wasdam

aged...

�� M

AC

HIN

E� context�

�����

�� AN

IMA

L context��

���

�� MIN

ER

AL context�

��

���

...

..

..

Page 25: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

ClassD

iscriminators

Word-sense

Discrim

inators

crane

AN

IMA

Lor

MA

CH

INE

...theengine

ofthecrane

wasdam

aged...

�� MA

CH

INE� context� �

��

�� AN

IMA

L� context� ����

�� MIN

ER

AL� context� �

����

...

...flocksofcranesnestedin

thesw

amp

...

�� MA

CH

INE� context� �

����

�� AN

IMA

L� context� �����

�� MIN

ER

AL� context� �

����

...

..

..

Page 26: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Corpus Position

Corpus Position

AN

IMA

L

MA

CH

INE

ProbabilityProbability

..

..

Page 27: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

TrainingofC

lassModels

Word

Class

Context

MA

CH

INE

...powerforthe

crane,hoistand

derrickassem

bly...

MA

CH

INE

...beenm

anufacturingforklift

partsfor30years...

MA

CH

INE

...foundvalvesfor

generator,refinery

turbines...M

AC

HIN

E...the

fumesofthe

tractorbegan

tobotherm

yeyes...

MA

CH

INE

...thecarbon-tipped

drillforced

manufacturers...

MA

CH

INE

...thenoise

ofabulldozer

disturbedthe

peaceof...

MA

CH

INE

...begana

firedrill

justafterthelunch

break...

MA

CH

INE

...while

thecrow

nedcrane

oftennestsin

marshy

...M

AC

HIN

E...boughta

fleetoftractor

plowsform

aintenance...

Hand-labelled

trainingdata

areunnecessary

Them

ajorityofw

ords(bytype)have

onlyone

sense

Secondarysensesare

widely

distributedacrosscategories

Thenoise

introducedby

thesecondary

sensesistolerable

focusedsignal/diffuse

noise

..

..

Page 28: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

TrainingofParam

eters�

Weighteach

classmem

berequally(dog

vs.wildebeest)

modeltypicalm

embersofthe

class,notmostfrequent

Bag-of-w

ordsBayesian

models(topic

detectors)

���� �� � �� ��� ����� �� � �

��� � ��� �����

���� � �� �

���������

Add

richersetofcollocationm

odels(fromdecision

listwork)

..

..

Page 29: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Corpus Position

Corpus Position

BR

OA

D

LOC

ALIZED

CO

NC

EPTD

ETECTO

R

DETEC

TOR

TOPIC

Class ProbabilityClass Probability

..

..

Page 30: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Application:L

anguagem

odelingfor

speechrecognition

....he

consumed

anenorm

ous/steIk/

with

wine

....

/steIk/

���������� ������ � � ��������

���������� ������ � � ��������

N-gram

Language

Models:

1)

������� �����������

trigram2)

������� ���������

bigram3a)

��������

unigram(static)

3b)

��������� ����

topicsensitive

unigram

��� !"# $

�������� % ���#�����&% ���# ��������

'

Sensitiveto

longdistance

dependencies

'

Successfulinface

ofsparsen-gram

s

'

Improvessm

oothedprobability

estimates

..

..

Page 31: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Performance

Word

SenseR

ogetCategory

Accuracy

sentencepunishm

entL

EG

AL

AC

TIO

N98%

setofwords

GR

AM

MA

Rm

olequantity

CH

EM

ICA

LS

99%m

amm

alA

NIM

AL

skinblem

ishD

ISEA

SEtaste

preferencePA

RT

ICU

LA

RIT

Y93%

flavorSE

NSA

TIO

Nduty

obligationD

UT

Y96%

taxPR

ICE,FE

E

.[Yarow

sky,1992].

92%m

eanaccuracy

..

..

Page 32: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

.Problem:L

exicalAm

biguityR

esolution�

Word

sensedisam

biguation

Lexicalchoicein

machine

translation

Hom

ographdisam

biguationin

speechsynthesis

Accentrestoration

inSpanish

andFrench

Otherapplications

Three

Algorithm

s:

Decision

lists(supervised)

Bayesian

word-classdiscrim

inators(unsupervised)

Modulated

bootstrappingfrom

seedw

ords(unsupervised)

..

..

Page 33: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Motivating

Phenomena

One

senseper

collocation�

One

senseper

discourse:W

ordSenses

Accuracy

Applicability

tankvehicle/contnr

99.6%

50.5%

motion

legal/physical99.9

%49.8

%poach

steal/boil100.0

%44.4

%palm

tree/hand99.8

%38.5

%axes

grid/tools100.0

%35.5

%sake

benefit/drink100.0

%33.7

%bass

fish/music

100.0%

58.8%

spacevolum

e/outer99.2

%67.7

%plant

living/factory99.8

%72.8

%crane

bird/machine

100.0%

49.1%

Average99.8

%50.1

%

.

Algorithm

drivenby

thejointexploitation

oftheseproperties

..

..

Page 34: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Problem:L

earningfrom

Untagged

TrainingD

ataSense

TrainingExam

ples(K

eyword

inC

ontext)?

...company

saidthe

plantisstilloperating

...?

Although

thousandsofplant

andanim

alspecies?

...tostrain

microscopic

plantlife

fromthe

...?

vinylchloridem

onomer

plant,w

hichis...

?and

Golgiapparatusof

plantand

animalcells...

?...com

puterdiskdrive

plantlocated

in...

?...N

issancarand

truckplant

inJapan

is...?

...theproliferation

ofplant

andanim

allife...

?...keep

am

anufacturingplant

profitablew

ithout...?

...animalratherthan

planttissuescan

be...

?...union

responsestoplant

closures....?

...moleculesfound

inplant

andanim

altissue...

?...

...

plant

(A)m

anufacturingplant

or

(B)living

plant

..

..

Page 35: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

SeedW

ords�

Use

wordsfrom

dictionarydefinitions

filteredforrelevance

byrelative

frequencyand

syntacticposition

Use

asingle

definingcollocate

foreach

class

crane

BIR

Dor

MA

CH

INE

plant

LIFE

orM

AN

UFA

CT

UR

ING

Labelsalientcorpuscollocates

co-occurrenceanalysisdeterm

inesasm

allspanning

setofcollocatesforhandlabelling.

..

..

Page 36: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Exam

pleInitialState

SenseTraining

Examples

(Keyw

ordin

Context)

Aused

tostrain

microscopic

plantlife

fromthe

...A

...rapidgrow

thofaquatic

plantlife

inw

ater...A

...thatdividelife

intoplantand

animalkingdom

Abedstoo

saltyto

supportplant

life.R

iver...A

......

?...com

panysaid

theplant

isstilloperating...

?...m

oleculesfoundin

plantand

animaltissue

?...

...?

...Nissan

carandtruck

plantin

Japanis...

?...anim

alratherthanplant

tissuescanbe

...B

......

Bautom

atedm

anufacturingplant

inFrem

ont...B

...vastmanufacturing

plantand

distribution...

Bchem

icalmanufacturing

plant,producing

viscoseB

...keepa

manufacturing

plantprofitable

without

A=

Seedcontextscontaining

thecollocation

life(1%

)B

=Seed

contextscontainingthe

collocationm

anufacturing(1%

)?

=U

ntaggedresidual(98%

)

..

..

Page 37: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Exam

pleInitialState

???

??

???

??

?

???

??

???

??

?

???

??

???

??

?

???

??

???

??

?

???

??

???

??

?

???

??

???

??

?

???

??

???

??

?

???

??

???

??

?

???

??

???

??

?

???

??

???

??

?

???

??

???

??

??

??

??

???

??

?

???

??

???

??

??

??

??

???

??

?

???

??

???

??

?

???

??

??

??

?

???

??

??

??

?

???

??

??

??

?

???

??

??

??

?

???

??

??

??

??

??

??

??

??

?

???

??

??

??

?

???

??

??

??

?

???

??

??

??

?

???

??

??

??

?

???

??

?

???

??

? ???

??

? ???

??

? ???

??

?

???

??

??

??

??

??

??

??

??

??

??

??

?

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

???

??

???

??

?

??

??

??

??

??

??

??

??

??

??

???

??

???

??

?

???

??

???

??

?

???

??

???

??

?

???

??

??

??

?

???

??

??

??

?

???

??

??

??

??

??

??

??

??

?

???

??

??

??

?

???

??

??

??

?

???

??

??

??

?

???

??

??

??

????

??

?

???

??

?

???

??

??

??

??

??

??

??

??

??

??

??

???

??

?

???

??

??

??

?

???

??

??

??

?

???

??

????

??

?

???

??

?

AAA

AAA A

AAA

A

AA

AAA

AA A

AAA

AA

A AAAA

AA

AA

AA

AA

??

??

??

??

??

?

BB

BB

BB

BB

BB

BB

BB

BB

BB

BBB B

BB

BB

BB

B

BB

BB

B

B

BB

?

??

??

??

??

??

??

?

??

??

?

?

?

?Manufacturing

Life

BB

BBBB B B

AA

??

?

??

?

??

?

??

..

..

Page 38: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

IterationStep

Traina

supervisedsense

taggeronthe

currentseedsets

Initialdecisionlistfor

plant(abbreviated)LogL

Collocation

Sense8.10

plantlife

A7.58

manufacturing

plant

B7.39

life(w

ithin

2-10w

ords)

A7.20

manufacturing

(in

2-10w

ords)

B6.27

animal(w

ithin

2-10w

ords)

A4.70

equipment(w

ithin

2-10w

ords)

B4.39

employee

(within

2-10w

ords)

B4.30

assembly

plant

B4.10

plantclosure

B3.52

plantspecies

A3.45

microscopic

plant

A...

Apply

theresulting

taggertothe

residualexamples

Add

theexam

plesexceedingthreshold

tothe

growing

seedsets

..

..

Page 39: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Exam

pleInterm

ediateState

???

??

???

??

?

???

??

???

??

?

???

??

???

??

?

???

??

???

??

?

???

??

???

??

?

???

??

??

??

?

???

??

??

??

?

???

??

??

??

?

???

??

?

???

??

? ???

??

? ???

??

? ???

??

??

??

??

?

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

??

???

??

???

??

?

??

??

??

??

??

??

??

??

??

??

???

??

???

??

?

???

??

???

??

?

???

??

???

??

?

???

??

??

??

?

???

??

??

??

?

???

??

??

??

??

??

??

?

???

??

??

??

??

??

??

??

??

??

??

??

?

???

??

?

???

??

?

???

??

?

???

??

?

???

??

?

???

??

??

??

?

???

??

?

???

??

??

??

?

???

??

???

??

?

AAA

AAA A

AAA

A

AA

AAA

AA A

AAA

AA

A AAAA

AA

AA

AA

AA

??

??

??

?

??

?

BB

BB

BB

BB

BB

BB

BB

BB

BB

BBB B

BB

BB

BB

B

BB

BB

B

B

BB

?

??

??

??

??

??

?

?

?

?Manufacturing

BB

BBBB B B

AA

??

?

??

?

??

??

??

?

AA

AA

A

AAA

AA

AA

AA

AAA

AA

AA

AAA

A A AA

AA A

AA

AA

AA A

AAA A A

AAAAA

AA A

AA

AA

A AA

AA

AA A

AA

AA

?

??

?

?

??

??

??

?

?

Microscopic

???

??

???

?

??

???

??

??

??

??

??

??

??

??

???

?

??? ?

Cell

Species ?

??

??

??

??

??

??

??

??

?

??

?

?

?

??

??

??

?348

BBBBB

BB

BBB

BB

BB B

BB

BBB

BB B

BB

B BBB

B BB

BBBB

BB

BB

B

B B BBB

BB B

BBB

BBB

BBB

BBBB

BB

BBB

BBB

BB

B

??

??

??

??

Equipm

ent ??

??

?? ?

??

??

?

????

??

?? ?

??

??

? ??

?

? ?

Autom

ateE

mployee

???

?

? Anim

alA

A

A

A

AAA

Life

??

??

?

??

??

?

??

??

..

..

Page 40: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Use

oftheone-sense-per-discourse

constraint�

Error

correctionC

hangeD

isc.in

tag#

TrainingExam

ples(from

same

discourse)A

�A

525containsa

variedplantand

animallife

A�

A525

them

ostcomm

onplantlife

,the...

A

A525

slightwithin

Arctic

plantspecies...B

A525

areprotected

byplantpartsrem

ainingfrom

Labeling

previouslyuntagged

contexts(bridgeto

newcollocations)

Change

Disc.

intag

#Training

Examples

(fromsam

ediscourse)

A

A724

...theexistence

ofplantand

animallife

...A

A724

...classifiedaseither

plantoranimal...

?

A724

Although

bacterialandplantcellsare

enclosedA

A348

...thelife

oftheplant,producing

stemA

A348

...anaspectof

plantlife,forexam

ple?

A348

...tissues;becauseplantegg

cellshave?

A348

photosynthesis,andso

plantgrowth

isattuned

..

..

Page 41: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

FinalTrainingIteration

B BB

B BB

B

BB

BB

B BB

B

B BB

B BB

B

BB

B BB

B B

BB

BB B

BBB

BBB

BB B

B

B BBB

BB

B

B

BB B

BB

BB

B

BB

BB

B

BB

B BB

B BB

B

BB

B BB B B

B

B BB

BB

B

B

BB

B BB

B B

B BB

B

BB

B

B

B

BB

B BB

B

B

B BBB

BB

B

BB

BB

B

B

B

BB

B

B

BB

B

BB

BB

B BB

B

B

B BB

B

BB

B

BB

BB

B

B

B

BB

BB B

BB

BBB

BB

B BB

B

BBB

BB

BB

BB B

BB

BB

B

BB

BB B

BB

BBB

BB

B BB

B BBB

BB

B

B

BB

B BBB

B

B

BB

B

B

B

BB

B BB

B

B B

B??

??

??

?

B B

?

?

?

??B

?

B

BB

B

B

?

AA

AA

AA

?

B

B

BB

B B B

B

B

BB

B

B

B

B

?

AA A

AA

A

A

AAA

AA

A

A A

AA

A

AA

A

A

A AA A

AA

A

A

AA

AA

AA

A

B

AAA

AA

A

A

A

AA

A

AA

AA

AA

AA

AA

A

A AA

AA

AA A

AAA

?A

AAA

AA

A

AAAA

A AA

AA

AA

AA

A A

A

AA

AA

A

AA

A

AA

AA

AA

A

AAA

A

A

A

AA

A

AA

AAA

A

B

AA

AA

AA

AA

AA

AA

A

AA A

AA

A

AA

A

A

AA

A

AA

A

AA

AA A

AA

A

AA

A

AA

AA

A

A AA

AA

A

A

A

AA

AA

AA

A AAA

AA

AA

AA

AA

A AA

AA

A

AA A

A

AA A

AA

AAA

A

AA

A AA

AA

AA

AA

A A

AA

A

AA

AA

A

AA

A AA

AA

A

A

AA

A

AA A

AA

A AAA

AA

A

A A

AA

AAA

A

AAA

AA

AA

AA AA

A

A

A AA

AAA

AA

A

A

AAA

AA

AA

AA

A

AA A

A

AA

A

AA

AA

A

A AA

AA

A

AA

AA

A

A AA

AA

AA

A

AA

A

AA

AA

AA

A

AA

A

AA A

AA

AA

AA

AA

A

A

A

AA

A

A AA

A

BB

BB

BB

B

BB

BBB

BB

B

BB

BBB

B

B

BB

BB

BB

BB

B

BB

B BB

B BB B

BB

BB

BB

B

BB

B

BB

BB

B

BB

BB

B

BB

B

BB

BBB

BB

B BB

B

B

B BB

B

BB

BB

B

BB

B

B

B

B

BB

B

B

BB

B

BB

BB

B BB

B

BB B

BB

BB

B

BB

B

BB

B

BB

BB

BB

B

B

BB

B

BB

BB

BB

BB

BB

BB

B

BB

BBB

BB

B

A

AA

AA

A

A

AAA

AA

A

A

A

BB

BBB

B

B

AA

AA

A A

B

B

BB

BBB

B

BBB

BB

BB

BB

BB

BB

BB B

BB

B

BB

BB

B

BBB

BB

B B

BB

BB

B

BBB

BB

BB

BB

B

B

B

BB

BB

BB

..

..

Page 42: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

FinalDecision

List

Finaldecisionlistfor

plant(abbreviated)LogL

Collocation

Sense10.12

plantgrowth

A9.68

car(within

��

words)

B9.64

plantheight

A9.61

union(w

ithin

��

words)

B9.54

equipment(w

ithin

��

words)

B9.51

assembly

plant

B9.50

nuclearplant

B9.31

flower(w

ithin

��

words)

A9.24

job(w

ithin

��

words)

B9.03

fruit(within

��

words)

A9.02

plantspecies

A...

...

...thelossofanim

alandplantspeciesthrough

extinction...,

..

..

Page 43: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Escaping

fromInitialM

isclassification�

Discourse

consistencycan

overridelocalcollocationalevidence

Redundancy

oflanguagem

akestheprocessselfcorrecting

Change

intraining

parameters

incrementalincreasein

contextwidth

afterintermediateconvergence

perturbationofthe

class-inclusionthreshold

(similarto

simulated

annealing)

..

..

Page 44: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Performance

%Seed

TrainingO

ptionsSam

p.M

ajorSupvsd

Two

Dict.

TopW

ithSchutze

Word

SensesSize

SenseA

lgrtmW

ordsD

efn.C

olls.O

SPDA

lgrthmplant

living/factory7538

53.197.7

97.197.3

97.698.6

92space

volume/outer

574550.7

93.989.1

92.393.5

93.690

tankvehicle/container

1142058.2

97.194.2

94.695.8

96.595

motion

legal/physical11968

57.598.0

93.597.4

97.497.9

92bass

fish/music

185956.1

97.896.6

97.297.7

98.8–

palmtree/hand

157274.9

96.593.9

94.795.8

95.9–

poachsteal/boil

58584.6

97.196.6

97.297.7

98.5–

axesgrid/tools

134471.8

95.594.0

94.394.7

97.0–

dutytax/obligation

128050.0

93.790.4

92.193.2

94.1–

drugm

edicine/narcotic1380

50.093.0

90.491.4

92.693.9

–sake

benefit/drink407

82.896.3

59.695.8

96.197.5

–crane

bird/machine

214578.0

96.692.3

93.694.2

95.5–

AVG

393663.9

96.190.6

94.895.5

96.592.2

Baseline

(%m

ajorsense)63.9%

Two

definingw

ords90.6%

Dictionary

definitions94.8%

Topcollocations(2

minutesw

ork)95.5%

Dictionary

defns.(with

OSPD

)96.5%

Fullysupervised

algorithm96.1%

..

..

Page 45: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Conclusion

Unavailability

ofhand-tagged

trainingdata

hasbeen

abottleneck

forprogressin

sensedisam

biguation

Thisalgorithm

,trained

onraw

textand

anon-line

dictionaryw

ith-outany

human

supervision,rivalstheperform

anceoffully

supervisedm

ethods

Thus,costlyhand-tagged

trainingdata

may

beunnecessary

toachieve

accuratelexicalam

biguityresolution.

..

..

Page 46: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Gender

Classification

Problem:

...company

presidentBurakC

hopraannounced

hisplan...

MA

LE

or

FEM

AL

E

TrainingD

ata:

Gender

Context

(1)male

...company

presidentBurak

Chopra

announcedhisplan

...”

”...and

theyhired

Mr.

Walter

Brillasan

accountant...”

”...the

youngactor

KeanuR

eeveswaspaid

over5...

(2)female

...thenoted

authorArdinia

Lospellistedherfavorite

...”

”...and

hissisterSusan

Millerw

asalsofound

...”

”...m

embersincluded

Dr.

LivoniaD

eyw

hosaid

shew

ould...

TestData:G

enderC

ontext???

...theretired

General

FidelR

amosdied

lastnight...???

...wasvisited

byAltonette

Smith,a

doctorfromSt....

..

..

Page 47: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Problem:Isan

unusualname

male

offemale?

.

Alditha

FEM

AL

EA

rdinia�

FEM

AL

EA

ltonnette�

FEM

AL

EB

urak�

MA

LE

Deryk

MA

LE

..

..

Page 48: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Solution:Look

atFinalCharacters(Suffix)ofW

ord.

––

a�

99+%FE

MA

LE

tt

e�

97%FE

MA

LE

––

k�

98%M

AL

E–

–d

96%M

AL

E–

–p

97%M

AL

E

..

..

Page 49: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Application:G

enderC

lassfication

Problem:W

hereto

obtaintraining

data?�

Availablenam

edatabasesnotlabelled

with

gender

How

toidentify

genderina

largeem

ployeenam

edatabase?

..

..

Page 50: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Application:G

enderC

lassfication

Problem:W

hereto

obtaintraining

data?�

Availablenam

edatabasesnotlabelled

with

gender

How

toidentify

genderina

largeem

ployeenam

edatabase?

Solution:G

enderforanam

eisvery

closelycorrelated

with

them

eansalary

ofpersonswith

thenam

e

Nam

eM

eanSalary

Grade

Bernard

6.92Phillip

6.47A

rthur6.39

Sandra4.64

Carolyn

4.47D

orthy4.11

��

� ���

Male,��

� ���

Female

..

..

Page 51: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Problem:W

hataboutAdamand

Todd...

..

Page 52: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Nam

eM

eanSalary

Grade

Bernard

6.92Phillip

6.47A

rthur6.39

David

5.91R

obert5.87

John5.47

Susan5.24

Adam

5.13Todd

5.09Sandra

4.64C

arolyn4.47

Dorthy

4.11

..

..

Page 53: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Problem:A

geisa

Factor...

..

Page 54: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Nam

eM

eanSalary

Grade

Bernard

6.92Phillip

6.47A

rthur6.39

David

5.91R

obert5.87

John5.47

Susan5.24

Adam

5.13Todd

5.09Sandra

4.64C

arolyn4.47

Dorthy

4.11

..

..

Page 55: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Problem:A

geisa

Factor.Solution:

Com

putem

eanage

foranam

efrom

referencesinA

PN

ewsw

ire.

Matthew

Stuart,

23,

saidhe

wasnotaw

are...

.M

ildredJones

,87

,died

yesterdayin

Boston

....

ToddW

ilson,

11,

wasabducted

fromoutside

...

Nam

eM

eanA

gein

AP

Ethel64.7

Mildred

63.3Elm

er60.0

Todd23.1

Heather

22.4Tam

my

20.0

..

..

Page 56: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

..

..

Page 57: Machine Learning - Department of Computer Scienceyarowsky/cs466slides/lexical-ambiguity-lecture.pdfMachine Learning Programs “learn ... 44.4 % palm tree/hand 99.8 % 38.5 % axes grid/tools

Moral

.

�Trainingdata

isoftendifficultto

obtain.

(Especiallyfinding

automatic

sourcesforannotation)

�How

ever,doingso

canbe

halfthefun

..

..