Taal- en spraaktechnologie · Word Sense Disambiguation (WSD), Lecture 2


Page 1: Taal- en spraaktechnologie

Sophia Katrenko

Utrecht University, the Netherlands

Sophia Katrenko Lecture 2

Page 2: Outline

1 Covered so far

2 Today
  Machine learning: what is it?
  Evaluation measures
  Word Sense Disambiguation (WSD)

Page 3: Covered last time

Collocation extraction methods

Page 4: Today

Today we discuss Chapter 7, and more precisely
1 intro to machine learning
2 word sense disambiguation techniques

Page 5: Intro to Machine Learning (ML)

Page 6: Why learning?

When talking about learning w.r.t. natural languages, we consider at least two aspects:

1 (first and second) language acquisition
2 language understanding and generation by a machine

Here, we focus on the second.

Page 7: Machine learning and languages

Learning by a machine can be used to
1 model morphological, syntactic, semantic and pragmatic analysis of a natural language
2 solve application tasks, such as information extraction, summarization, machine translation, and others.

The second group does not exclude input from the first.

Page 8: Example

Software from http://cogcomp.cs.illinois.edu/page/demos/.

Utrecht University has concentrated its leading research into fifteen research focus areas.

PoS Tagging:
NNP/Utrecht NNP/University VBZ/has VBN/concentrated PRP$/its VBG/leading NN/research IN/into NN/fifteen NN/research NN/focus NNS/areas ./.

Shallow parsing:
[NP Utrecht University] [VP has concentrated] [NP its leading research] [PP into] [NP fifteen research focus areas] .

Named entity recognition:
[ORG Utrecht University] has concentrated its leading research into fifteen research focus areas.

Page 12: Main learning notions (1)

Learning involves three components: task T, experience E, and performance measure P.

The goal of learning is to perform well w.r.t. some performance measure P on task T given some past experience or observations.

Consider, for example, weather prediction given previous observations.

Page 15: Main learning notions (2)

But:
What is experience? Is it direct or implicit?
Do given observations reflect the task/goal?
Does the number of observations matter? What about noisy data?

Page 18: Main learning notions (3)

Today, we consider
1 Learning tasks: regression and classification
2 Learning types: supervised, unsupervised, and semi-supervised
3 Evaluation measures: accuracy, precision, recall and F-score
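The evaluation measures listed above can be sketched directly from the counts of true/false positives and negatives on a binary task. A minimal illustration (the counts below are hypothetical, not from the lecture):

```python
def accuracy(tp, tn, fp, fn):
    # fraction of all decisions that were correct
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    # fraction of predicted positives that are truly positive
    return tp / (tp + fp)

def recall(tp, fn):
    # fraction of true positives that were found
    return tp / (tp + fn)

def f_score(p, r):
    # harmonic mean of precision and recall (F1)
    return 2 * p * r / (p + r)

# e.g. 8 true positives, 2 false positives, 8 false negatives, 82 true negatives
p, r = precision(8, 2), recall(8, 8)   # 0.8 and 0.5
f1 = f_score(p, r)
```

Note that accuracy can look high on imbalanced data (here 90%) even when recall is poor, which is why precision, recall and F-score are reported separately.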

Page 19: Main learning notions (4)

Formally, let observations (training data) (X, Y) be defined as (X, Y) ∈ X × Y on the input space X and the output space Y.

Pairs (X, Y) are random variables distributed according to the unknown distribution D.

We denote the observed data points by (x_i, y_i) and say that they are independently and identically distributed according to D.

The goal is to construct a hypothesis h such that for any instance from the input space X it predicts its label from the output space Y, i.e. h : X → Y.

Page 23: Main learning notions (5)

Let also every example x_i ∈ X, i = 1, ..., n be represented by a fixed number of features, x_i = (x_i1, ..., x_ik).

For instance, for the task 'is a given word a noun?', X is a collection of words, Y = {0, 1}, and training examples are of the form {w, y}, where w ∈ X and y ∈ Y, as in {Utrecht, 1}, {in, 0}, ...

For PoS tagging, |Y| > 2 and is represented by tags NNS, IN, NN, and others.
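A toy sketch of the setup above for the task 'is a given word a noun?'. The two features here (capitalization and word length) are illustrative choices, not the lecture's:

```python
def features(word):
    # represent each example by a fixed number of features (k = 2):
    # (is the word capitalized?, word length)
    return (int(word[0].isupper()), len(word))

# training examples {w, y} with w in X and y in Y = {0, 1}
training = [("Utrecht", 1), ("in", 0), ("University", 1), ("the", 0)]
X = [features(w) for w, _ in training]
Y = [y for _, y in training]

# a trivial hypothesis h : X -> Y that predicts "noun" iff capitalized
def h(x):
    return x[0]
```

Such a hypothesis is of course far too crude for real text (lowercase nouns, capitalized sentence starts), but it shows the shape of the mapping h : X → Y.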

Page 26: Main learning notions (6)

Classification: h : X → Y where Y is discrete (a set of categories). If Y = {+1, −1}, then it is a binary classification task.
Regression: the output is continuous (a real number).

There are different types of learning:
  supervised
  unsupervised
  semi-supervised
  active

Page 29: Main learning notions (7)

Supervised learning requires a training set (as described above), which is used by an algorithm to produce a function (hypothesis).

Unsupervised learning uses no labeled data, and its goal is to reveal hidden structure in data.

Semi-supervised learning takes as input both labeled (a small amount) and unlabeled data.

In the active learning scenario, a learning algorithm queries a human expert for the true labels of examples it selects according to some criterion (e.g., an example the algorithm is not certain about).
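The active learning scenario can be sketched on toy one-dimensional data. Everything below is a hypothetical illustration: the classifier is a nearest-centroid rule, and "uncertainty" is taken to be distance to the decision boundary:

```python
labeled = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]   # supervised: (x, y) pairs
unlabeled = [3.0, 5.2, 7.5]                          # no labels available

def centroids(data):
    # mean of each class; the "training" step of the nearest-centroid rule
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    return sum(xs0) / len(xs0), sum(xs1) / len(xs1)

def predict(x, c0, c1):
    # assign x to the class with the nearer centroid
    return 0 if abs(x - c0) < abs(x - c1) else 1

c0, c1 = centroids(labeled)
boundary = (c0 + c1) / 2.0

# active learning: query the expert for the unlabeled point the model is
# least certain about, i.e. the one closest to the decision boundary
query = min(unlabeled, key=lambda x: abs(x - boundary))
```

After the expert labels the queried point, it would be added to the labeled set and the centroids recomputed; repeating this loop is what makes active learning cheaper than labeling everything.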

Page 33: Main learning notions (7): examples

How does this relate to natural language processing?

Most research in NLP (at least initially) has concerned supervised learning: parsing (treebanks available for training), named entity recognition systems, text categorization, and others.

It has shifted towards semi-supervised learning because of the cost of human labour (e.g., for parsing, Steedman '02).

Semi-supervised methods perform quite well compared to heavy supervised systems.

Unsupervised learning is used when clustering words/documents based on their similarity.

Active learning is less studied, but is becoming more popular in the NLP community (e.g., text annotation by Tomanek et al. '09, and anaphora resolution by Gasperin '09 at the Workshop on Active Learning for NLP).

Page 38: Main learning notions (8)

Empirical risk: the risk of the target function t is the minimum over all possible hypotheses g and is called the Bayes risk, R* = inf_g R(g).

Since the underlying distribution is unknown, the quality of h is usually measured by the empirical error in Eq. 1.

    R_n(h) = (1/n) Σ_{i=1}^{n} ℓ(h(x_i), y_i)    (1)

Zero-one loss: several loss functions have been proposed in the literature so far, the best known of which is the zero-one loss (Eq. 2). This loss is a function that outputs 1 any time a method errs on a data point (h(x_i) ≠ y_i) and 0 otherwise.

    ℓ(h(x_i), y_i) = I_{h(x_i) ≠ y_i}    (2)
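Eqs. 1 and 2 translate almost directly into code: the empirical risk of a hypothesis h is its average zero-one loss over the n observed data points. The data and hypothesis below are hypothetical:

```python
def zero_one_loss(y_pred, y_true):
    # Eq. 2: 1 whenever the hypothesis errs, 0 otherwise
    return 1 if y_pred != y_true else 0

def empirical_risk(h, data):
    # Eq. 1: R_n(h) = (1/n) * sum of the losses over the sample
    return sum(zero_one_loss(h(x), y) for x, y in data) / len(data)

# toy hypothesis: predict 1 for positive inputs
h = lambda x: 1 if x > 0 else 0
data = [(2, 1), (-1, 0), (3, 0), (-4, 0)]
# h errs only on (3, 0), so the empirical risk is 1/4
```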

Page 41: Main learning notions (9)

At first glance the goal of any learning algorithm should be to minimize the empirical error R_n(h), which is often referred to as empirical risk minimization.

This turns out to be insufficient, as some methods can perform well on the training set but be less accurate on new data points.

In structural risk minimization (Eq. 3) not only the empirical error is taken into account, but the complexity (capacity) of h as well. In Eq. 3, pen(h) stands for a penalty that reflects the complexity of a hypothesis.

    g_n = argmin_{h ∈ H} [ R_n(h) + pen(h) ]    (3)
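A toy sketch of Eq. 3: among candidate hypotheses, pick the one minimizing empirical risk plus a complexity penalty. The penalty here (lam times a complexity score attached to each hypothesis) is a hypothetical stand-in for pen(h):

```python
def srm_select(hypotheses, data, lam=0.1):
    # hypotheses is a list of (complexity, h) pairs
    def objective(item):
        complexity, h = item
        emp = sum(1 for x, y in data if h(x) != y) / len(data)
        return emp + lam * complexity      # R_n(h) + pen(h)
    return min(hypotheses, key=objective)

data = [(0, 0), (1, 0), (2, 1), (3, 1)]
hypotheses = [
    (1, lambda x: 1 if x >= 2 else 0),        # simple threshold rule
    (3, lambda x: 1 if x in (2, 3) else 0),   # memorizes the positives
]
complexity, best = srm_select(hypotheses, data)
```

Both candidates fit the training data perfectly, so plain empirical risk minimization cannot tell them apart; the penalty term breaks the tie in favour of the simpler rule, which also generalizes to unseen inputs such as x = 5.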

Page 44: Taal- en spraaktechnologie · Word Sense Disambiguation (WSD) Machine learning and languages Learning by a machine can be used to 1 model morphological, syntactic, semantic and pragmatic

Covered so farToday

Machine learning: what is it?Evaluation measuresWord Sense Disambiguation (WSD)

Main learning notions (10)

Bias and variance

If h∗ is the best function in H with R(h∗) = inf_{h ∈ H} R(h), then the difference |R(h∗) − R∗| is called the approximation error, or bias.

A quantity that measures how far a hypothesis h in H is from the best hypothesis h∗ is referred to as the estimation error, or variance (|R(h∗) − Rn(h)|).

Bias does not depend on the data used during the training phase, whereas variance always does.

Variance is equal to zero if the predictions of a method do not change and are always the same regardless of the training data.

Bias is equal to zero if a classifier outputs the optimal prediction.


Main learning notions (12)

Consider, for instance, binary classification, where each example has to be classified either as positive or as negative.

Positive examples on which the method errs are referred to as false negatives (FN), and negative examples which it misclassifies are called false positives (FP).

Those examples that are classified correctly are either true positives (TP) or true negatives (TN).


Main learning notions (13)

Accuracy is defined as the fraction of all examples that were classified correctly (Eq. 4). Accuracy is often used when the data set is balanced (i.e., the numbers of positive and negative examples are roughly the same).

Acc = (TP + TN) / (TP + TN + FP + FN)    (4)

Precision reflects how many of the examples that were classified as positive really are true positives (Eq. 5).

precision = TP / (TP + FP)    (5)


Main learning notions (14)

Recall shows what fraction of the true positives were found by the method (Eq. 6).

recall = TP / (TP + FN)    (6)

The F1 score is defined as the harmonic mean of precision and recall (Eq. 7).

F1 = 2 · precision · recall / (precision + recall)    (7)
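All four measures follow directly from the confusion-matrix counts; a minimal sketch in Python (the counts are toy numbers for illustration):

```python
# Compute accuracy, precision, recall, and F1 from confusion-matrix counts.
def evaluate(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. 4
    precision = tp / (tp + fp)                          # Eq. 5
    recall = tp / (tp + fn)                             # Eq. 6
    f1 = 2 * precision * recall / (precision + recall)  # Eq. 7
    return accuracy, precision, recall, f1

# Hypothetical counts: 8 TP, 5 TN, 2 FP, 1 FN.
acc, p, r, f1 = evaluate(tp=8, tn=5, fp=2, fn=1)
```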


Word Sense Disambiguation (WSD)


WSD

We already discussed polysemy and homonymy last time. Consider, for instance, how many senses bank has in WordNet 3.0 (http://wordnetweb.princeton.edu/perl/webwn).


WSD

So what is WordNet (Miller et al., 1990)?

A wide-coverage computational lexicon of English which exploits psycholinguistic theories.

Concepts are expressed as sets of synonyms (synsets): { bank7_n, cant2_n, camber2_n }

A word sense is a word occurring in a synset, e.g. bank7_n is the seventh sense of the noun bank.

There are also semantic relations between synsets (e.g., hypernymy, meronymy, entailment), and lexical relations between word senses (e.g., antonymy, nominalization).


WSD

Sentence: Utrecht University has concentrated its leading research into fifteen research focus areas.

Number of WordNet senses per word:
Utrecht (1) × University (3) × has (19) × concentrated (8) × its (1) × leading (4) × research (2) × into (1) × fifteen (1) × research (2) × focus (7) × areas (6)

= 306,432 interpretations!

Note that I already assumed the correct PoS tags here! Utrecht has only 1 sense, and is therefore monosemous, while focus is polysemous.
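The number of readings is simply the product of the per-word sense counts, which is easy to check:

```python
from math import prod

# WordNet sense counts for each token of the example sentence, as on the slide.
sense_counts = [1, 3, 19, 8, 1, 4, 2, 1, 1, 2, 7, 6]
interpretations = prod(sense_counts)
print(interpretations)  # 306432
```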


WSD references

WSD is the task of automatically finding out which sense of a word is activated by its use in a particular context.

Navigli R. Word Sense Disambiguation: A Survey. ACM Computing Surveys, 41(2), ACM Press, 2009, pp. 1-69.

Agirre E. and Edmonds P. Word Sense Disambiguation: Algorithms and Applications. New York, USA: Springer, 2006.

Ide N. and Véronis J. Word Sense Disambiguation: The State of the Art. Computational Linguistics, 24(1), 1998, pp. 1-40.


WSD approaches

WSD has typically been seen as a supervised problem: classification given a fixed number of senses.

Grouping words having the same sense together in an unsupervised way (clustering) is called word sense discrimination.

WSD is important for many NLP applications (e.g., machine translation).

It has been a popular topic for decades: have a look at Senseval-1 (1998) up to SemEval (2010)!

http://www.senseval.org/


Senseval

Senseval has introduced the following tasks:

lexical sample: only a selected number of words are tagged according to their senses. E.g., in Senseval-1, these were 35 words of different PoS, such as accident, bother, bitter.

all-words: all content (open-class) words in a text have to be annotated ⇒ more realistic, but also more difficult.

lexical substitution: find an alternative substitute word or phrase for a target word in context (McCarthy and Navigli, 2007), whereby both synonyms need to be found and the context needs to be disambiguated.

cross-lingual disambiguation: disambiguate a target word by labeling it with the appropriate translation in other languages (Lefever and Hoste, 2009).


Senseval

Even though primarily for English, Senseval has expanded its language list to Basque, Chinese, Czech, Danish, Dutch, English, Estonian, Italian, Japanese, Korean, Spanish, Swedish.

Example of lexical substitution

Input: "The packed screening of about 100 high-level press people loved the film as well"
Output: synonyms for the target film: movie (5); picture (3)

Example of cross-lingual disambiguation

Input: "I'll buy a train or coach ticket"
Output: translations of the target coach in other languages
DE: Bus (3); Linienbus (2); Omnibus (2); Reisebus (2)
NL: autobus (3); bus (3); busvervoer (1); toerbus (1)


WSD

So what is a word sense?

R. Navigli: a word sense is a commonly accepted meaning of a word:

We are fond of fruit such as kiwifruit and banana.

The kiwi bird is the national bird of New Zealand.

1 But is the number of senses per word really fixed?

2 What about the boundaries between senses: are they rigid?


WSD

So why is it difficult? Consider the distribution of senses (source: MacCartney; Navigli).


WSD: Baselines

take the most frequent sense (MFS) in the corpus (or the first WordNet sense)

yields around 50-60% accuracy on the lexical sample task with WordNet senses

is a strong baseline (why?)
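A sketch of the MFS baseline on a toy sense-annotated sample (the sense labels and counts below are invented for illustration):

```python
from collections import Counter

# Hypothetical sense-annotated training occurrences of 'bank'.
train_senses = ["bank_1"] * 6 + ["bank_2"] * 3 + ["bank_3"] * 1

# The baseline ignores the context entirely and always predicts the
# sense that was most frequent in the training corpus.
mfs = Counter(train_senses).most_common(1)[0][0]

def predict(context):
    return mfs  # same answer for every test occurrence

print(predict("she sat on the river bank"))  # bank_1
```

Because sense distributions are highly skewed (see the previous slide), always guessing the dominant sense is hard to beat.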


WSD approaches

WSD approaches we consider today/next time:

Supervised (Gale et al., 1992)

Dictionary-based (Lesk, 1986; simplified Lesk)

Minimally supervised (Yarowsky, 1995)

Unsupervised (Mihalcea, 2009; Ponzetto and Navigli, 2010)

Our own work on using qualia structures for word sense induction (2008)


WSD approaches: data

Supervised WSD needs training data! Then the steps are as follows:

extract features from the training/test set

train an ML method on the training set

apply the model to the test data

Sense-annotated corpora for the all-words task:

SemCor: 200K words from the Brown corpus with WordNet senses

SENSEVAL 3: 2081 tagged content words


WSD approaches: data

SemCor 3.0

<wf cmd=ignore pos=DT>The</wf>
<wf cmd=done rdf=group pos=NNP lemma=group wnsn=1 lexsn=1:03:00:: pn=group>Fulton County Grand Jury</wf>
<wf cmd=done pos=VB lemma=say wnsn=1 lexsn=2:32:00::>said</wf>
<wf cmd=done pos=NN lemma=friday wnsn=1 lexsn=1:28:00::>Friday</wf>
<wf cmd=ignore pos=DT>an</wf>
<wf cmd=done pos=NN lemma=investigation wnsn=1 lexsn=1:09:00::>investigation</wf>
<wf cmd=ignore pos=IN>of</wf>


Supervised WSD

We have already talked about the Naive Bayes approach. How can we use it for WSD?

we aim at selecting the most probable sense s of a given word w, described by a set of features f1 . . . fn: argmax_{s ∈ S} P(s|f)

s = argmax_{si ∈ S} P(si|w) = argmax_{si ∈ S} P(w|si)P(si) / P(w) = argmax_{si ∈ S} P(w|si)P(si)

we also naively assume all features to be independent:

s = argmax_{si ∈ S} P(si) ∏_{j=1}^{n} P(fj|si)


Supervised WSD

we have to calculate P(si):

P(si) = freq(si, w) / freq(w)

we have to calculate P(fj|si):

P(fj|si) = freq(fj, si) / freq(si)

don't forget smoothing!
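A minimal Naive Bayes WSD sketch with add-one (Laplace) smoothing. The training contexts for bank and the choice of bag-of-context-words features are invented for illustration:

```python
import math
from collections import Counter

# Toy sense-annotated contexts for 'bank' (hypothetical data).
train = [
    ("river water bank shore", "bank_river"),
    ("bank loan money interest", "bank_finance"),
    ("money deposit bank account", "bank_finance"),
]

prior = Counter(s for _, s in train)          # counts for P(si)
word_counts = {s: Counter() for s in prior}   # counts for P(fj|si)
for ctx, s in train:
    word_counts[s].update(ctx.split())
vocab = {w for ctx, _ in train for w in ctx.split()}

def classify(context):
    best, best_lp = None, float("-inf")
    for s in prior:
        lp = math.log(prior[s] / len(train))  # log P(si)
        total = sum(word_counts[s].values())
        for f in context.split():
            # add-one smoothing: unseen features do not zero out the product
            lp += math.log((word_counts[s][f] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = s, lp
    return best
```

For example, classify("the bank of the river") picks bank_river, while classify("loan from the bank") picks bank_finance.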


Supervised WSD

Naive Bayes was used by Gale, Church, and Yarowsky (1992), who

disambiguated 6 words (duty, drug, land, language, position, sentence) with 2 senses each

used contexts of varying size

achieved around 90% accuracy

concluded that wide contexts are useful, as well as non-immediately surrounding words


Dictionary-based WSD

Introduced in 1986 by Lesk; it uses the following steps:

Retrieve all sense definitions of the target word from a machine-readable dictionary

Compare them with the sense definitions of the words in context

Choose the sense with the most overlap


Dictionary-based WSD

Example (MacCartney)

pine
(a) a kind of evergreen tree with needle-shaped leaves
(b) to waste away through sorrow or illness

cone
(a) a solid body which narrows to a point
(b) something of this shape, whether solid or hollow
(c) fruit of certain evergreen trees

A simplified version of Lesk's method (Kilgarriff and Rosenzweig, 2000) measures the overlap between each sense definition and the words in the context, rather than between pairs of sense definitions.
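A sketch of the simplified Lesk variant on the cone example above; the tokenization and the small stop-word list are simplifying assumptions:

```python
# Simplified Lesk: pick the sense whose definition shares the most
# words with the context of the target word.
cone_senses = {
    "a": "a solid body which narrows to a point",
    "b": "something of this shape, whether solid or hollow",
    "c": "fruit of certain evergreen trees",
}
STOP = {"a", "of", "the", "to", "which", "this", "whether", "something", "from"}

def tokens(text):
    return {w.strip(",.").lower() for w in text.split()} - STOP

def simplified_lesk(senses, context):
    # choose the sense definition with the largest word overlap with the context
    return max(senses, key=lambda s: len(tokens(senses[s]) & tokens(context)))

print(simplified_lesk(cone_senses, "pine cones hung from the evergreen trees"))  # c
```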


Minimally supervised WSD

Introduced in 1995 by Yarowsky; based on two assumptions:

one sense per discourse: the sense of a target word tends to be preserved consistently within a single discourse (e.g., a document about finance)

one sense per collocation: the sense of a target word can be determined by looking at the words nearby (e.g., river for bank1_n and finance for bank2_n, respectively)

Paper: http://acl.ldc.upenn.edu/P/P95/P95-1026.pdf


Minimally supervised WSD

It's a bootstrapping method:

1 start from a small seed set of manually annotated data Dl

2 learn a decision-list classifier from Dl

3 use the learned classifier to label the unlabeled data Du

4 move high-confidence examples to Dl

5 repeat from step 2
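The loop above can be sketched on toy data. The sentences, seed labels, and the crude confidence rule (all matching context words must agree on one sense) are invented stand-ins for the real decision-list machinery:

```python
from collections import defaultdict

# Seed set Dl: a few manually labeled contexts of 'plant' (toy data).
labeled = {
    "plant life is fragile": "A",
    "the manufacturing plant closed": "B",
}
unlabeled = [  # Du
    "microscopic plant life was found",
    "the manufacturing plant hired employees",
]

def learn_rules(labeled):
    # crude decision-list stand-in: keep context words seen with exactly one sense
    votes = defaultdict(set)
    for sent, sense in labeled.items():
        for w in sent.split():
            votes[w].add(sense)
    return {w: next(iter(s)) for w, s in votes.items() if len(s) == 1}

for _ in range(3):  # steps 2-5: learn, label Du, move confident examples to Dl
    rules = learn_rules(labeled)
    for sent in list(unlabeled):
        evidence = {rules[w] for w in sent.split() if w in rules}
        if len(evidence) == 1:          # all matching rules agree -> high confidence
            labeled[sent] = evidence.pop()
            unlabeled.remove(sent)
```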


Minimally supervised WSD

Source: Yarowsky (1995).


Minimally supervised WSD

Decision list: a sequence of "if/else if/else" rules

If f1, then class 1
Else if f2, then class 2
. . .
Else class n

Collocational features are identified from the tagged data:

Word immediately to the left or right of the target:
The window bars3_n were broken.

Pair of words to the immediate left or right of the target:
The world's largest bar1_n is here in New York.


Minimally supervised WSD

For all collocational features the log-likelihood ratio is computed, and they are ordered according to it:

log [ P(sensei | fj) / P(sensek | fj) ]    (8)

What does the log-likelihood ratio really mean?
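Concretely, it measures how strongly a feature favors one sense over the other: far above zero favors sensei, far below zero favors sensek, near zero is uninformative. A sketch with invented counts and a small smoothing constant to keep the log finite:

```python
import math

# How often the feature occurs with each sense (hypothetical counts).
count = {("A", "life"): 80, ("B", "life"): 2}

def log_likelihood_ratio(f, s1="A", s2="B", alpha=0.1):
    # For a fixed feature f, the ratio of P(sense|f) terms reduces to a
    # ratio of counts; alpha-smoothing avoids log(0) for unseen pairs.
    n1 = count.get((s1, f), 0) + alpha
    n2 = count.get((s2, f), 0) + alpha
    return math.log(n1 / n2)

print(log_likelihood_ratio("life"))  # strongly positive -> 'life' favors sense A
```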


Minimally supervised WSD

Quote from Yarowsky (1995):

"New data are classified by using the single most predictive piece of disambiguating evidence that appears in the target context. By not combining probabilities, this decision-list approach avoids the problematic complex modeling of statistical dependencies encountered in other frameworks."


Minimally supervised WSD

Initial decision list for plant (abbreviated), source: Yarowsky (1995)

LogL  Collocation                          Sense
8.10  plant life                           A
7.58  manufacturing plant                  B
7.39  life (within ±2-10 words)            A
7.20  manufacturing (within ±2-10 words)   B
6.27  animal (within ±2-10 words)          A
4.70  equipment (within ±2-10 words)       B
4.39  employee (within ±2-10 words)        B
4.30  assembly plant                       B
4.10  plant closure                        B
3.52  plant species                        A
3.48  automate (within ±2-10 words)        B
3.45  microscopic plant                    A
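Applying such a list is a single pass down the ranked rules: the first matching collocation decides the sense. A sketch using a few rules from the table; the naive substring match stands in for the real window-based feature tests:

```python
# Abbreviated decision list for 'plant' (LogL, collocation, sense), sorted by LogL.
rules = [
    (8.10, "plant life", "A"),
    (7.58, "manufacturing plant", "B"),
    (7.39, "life", "A"),
    (7.20, "manufacturing", "B"),
]

def classify(context):
    for logl, collocation, sense in rules:
        if collocation in context:     # naive substring check for illustration
            return sense
    return None                        # no rule fired

print(classify("automated manufacturing plant in Fremont"))   # B
print(classify("divide life into plant and animal kingdoms")) # A
```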


Concrete noun categorization task


1 Lexical representation/categorization in cognitive science

a lexical concept is represented by a set of features (Rapp & Caramazza, 1991; Gonnerman et al., 1997)

lexical concepts are atomic representations, and "conceptual relations . . . can be captured by the sets of inferential relations drawn from elementary and complex concepts" (Almeida, 1999): the thesis of conceptual atomism (Fodor, 1990)

2 Categorization in computational linguistics

word-space models (Sahlgren, 2006; Lenci, Baroni, and others)


Data

44 concrete nouns to be categorized into:

2 categories (natural kind and artifact)
3 categories (vegetable, animal, and artifact)
6 categories (green, fruitTree, bird, groundAnimal, vehicle, and tool)


Generative Lexicon Theory

Pustejovsky (1998) proposed a linguistically motivated approach to modelling categories. Semantic descriptions use 4 levels of linguistic representation:

argument structure ("specification of the number and type of logical arguments")

event structure ("definition of the event type of an expression")

qualia structure ("a structural differentiation of the predicative force for a lexical item")

lexical inheritance structure ("identification of how a lexical structure is related to other structures in the type lattice")


Approach (cont’d)

How can we acquire qualia information? Some of the methods proposed in the past:

Hearst, 1992 (hyperonymy)

Girju, 2007 (part-whole relations)

Cimiano and Wenderoth, 2007: predefined patterns for all 4 roles; ranking of the results according to some measures

Yamada et al., 2007: fully supervised; focuses on the acquisition of telic information


Approach (cont’d)

We make use of the patterns defined by Cimiano andWenderoth, 2007

role    pattern
formal  x NN is VBZ (a DT|the DT) kind NN of IN
        x NN is VBZ
        x NN and CC other JJ
        x NN or CC other JJ
telic   purpose NN of IN (a DT)* x NN is VBZ
        purpose NN of IN p NNP is VBZ
        (a DT|the DT)* x NN is VBZ used VVN to TO
        p NNP are VBP used VVN to TO

Table: Patterns: some examples
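As an illustration of how such surface patterns can be applied to PoS-tagged text, here is a minimal sketch (not the authors' code): it encodes a single telic pattern, `(a DT|the DT)* x NN is VBZ used VVN to TO`, as a regular expression over tokens written in an assumed `word_TAG` format, and extracts the seed noun together with the candidate filler verb.

```python
import re

# One telic pattern as a regex over "word_TAG" tokens (assumed format).
# The noun before "is used to" is the seed; the verb after "to" is the
# candidate telic filler.
TELIC = re.compile(
    r'(?:(?:a|the)_DT\s+)?(\w+)_NN\s+is_VBZ\s+used_VV[ND]\s+to_TO\s+(\w+)_VV'
)

def extract_telic(tagged_snippet):
    """Return (seed, filler) pairs, e.g. ('hammer', 'hit')."""
    return TELIC.findall(tagged_snippet)
```

For example, the tagged snippet `a_DT hammer_NN is_VBZ used_VVN to_TO hit_VV nails_NNS` yields the pair (hammer, hit); a full system would need one such expression per pattern and role.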


Approach (cont’d)

role          pattern
constitutive  (a DT|the DT)* x NN is VBZ made VVN (up RP)* of IN
              (a DT|the DT)* x NN comprises VVZ
              (a DT|the DT)* x NN consists VVZ of IN
              p NNP are VBP made VVN (up RP)* of IN
              p NNP comprise VVP
agentive      to TO * a DT new JJ x NN
              to TO * a DT complete JJ x NN
              to TO * new JJ p NNP
              to TO * complete JJ p NNP
              a DT new JJ x NN has VHZ been VBN
              a DT complete JJ x NN has VHZ been VBN

Table: Patterns: some examples


Approach (cont’d)

The categorization procedure consists of the following steps:

- extraction of the passages containing candidates for the role fillers using the patterns (Google, 50 snippets per pattern)
- PoS tagging of all passages
- extraction of the candidate role fillers from the tagged passages using the patterns
- building a word-space model, where rows correspond to the words provided by the organizers of the challenge and columns are the qualia elements for a selected role (clustering with the CLUTO toolkit)
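The word-space step can be sketched as follows; this is an illustrative reconstruction (not the original pipeline), in which each row vector simply counts how often a qualia filler was extracted for a word, and the resulting rows are what a clustering toolkit such as CLUTO would then group.

```python
from collections import Counter

# Illustrative sketch of the word-space model: rows are target words,
# columns are the qualia fillers extracted for one role, and cell
# (w, f) counts how often filler f was extracted for word w.
def word_space(extractions):
    """extractions: iterable of (word, filler) pairs."""
    counts = Counter(extractions)
    words = sorted({w for w, _ in counts})
    fillers = sorted({f for _, f in counts})
    matrix = [[counts[(w, f)] for f in fillers] for w in words]
    return words, fillers, matrix
```

In this layout, two words that share many fillers (e.g. chisel and knife, both "tools") end up with similar row vectors, which is what lets the clustering step group them.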


Evaluation

We use two evaluation measures (Zhao and Karypis, 2004):

Entropy:

$$p_{ij} = \frac{m_{ij}}{m_j}, \qquad H(c_j) = -\sum_{i=1}^{L} p_{ij} \log p_{ij} \qquad (9)$$

$$H(C) = \sum_{j=1}^{K} \frac{m_j}{m} H(c_j) \qquad (10)$$

Purity:

$$Pu(c_j) = \max_{i=1,\dots,L} p_{ij}, \qquad Pu(C) = \sum_{j=1}^{K} \frac{m_j}{m} Pu(c_j) \qquad (11)$$

where $C = \{c_1, \dots, c_K\}$ is the output clustering, $L$ is the number of classes ("gold" senses), $m_{ij}$ is the number of words of class $i$ in cluster $j$ (a class is a gold cluster), $m_j$ is the number of words in cluster $j$, and $m$ is the overall number of words to cluster.
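Entropy and purity as defined above can be computed directly from the cluster assignments. A minimal sketch (the function name is illustrative; the natural logarithm is assumed, which only rescales entropy):

```python
import math
from collections import Counter

def entropy_purity(gold, clusters):
    """Clustering entropy H(C) and purity Pu(C) as in Zhao & Karypis
    (2004). `gold` and `clusters` are parallel lists: gold[i] is the
    gold class of word i, clusters[i] the cluster it was assigned to."""
    m = len(gold)
    by_cluster = {}
    for g, c in zip(gold, clusters):
        by_cluster.setdefault(c, Counter())[g] += 1
    H, Pu = 0.0, 0.0
    for counts in by_cluster.values():
        mj = sum(counts.values())
        # per-cluster entropy H(c_j) and purity Pu(c_j), weighted by m_j/m
        Hj = -sum((mij / mj) * math.log(mij / mj) for mij in counts.values())
        Puj = max(counts.values()) / mj
        H += (mj / m) * Hj
        Pu += (mj / m) * Puj
    return H, Pu
```

A perfect clustering gives H(C) = 0 and Pu(C) = 1; lower entropy and higher purity both indicate clusters that align better with the gold classes.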


Evaluation

clustering   entropy   purity
2-way        0.59      0.80
3-way        0.00      1.00
6-way        0.13      0.89
2-way>1      0.70      0.77
3-way>1      0.14      0.96
6-way>1      0.23      0.82

Table: Performance using formal role only


What are the most representative elements in the clusters?

The similarity between elements in a cluster is measured as follows:

$$z_j^I = \frac{s_j^I - \mu_l^I}{\delta_l^I} \qquad (12)$$

where $s_j^I$ stands for the average similarity between object $j$ and the other objects in the same cluster, $\mu_l^I$ is the average of the $s_j^I$ values over all objects in the $l$-th cluster, and $\delta_l^I$ is the standard deviation of these similarities.
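Equation (12) can be sketched as follows, assuming a precomputed pairwise similarity matrix (the function name and data layout are illustrative):

```python
import statistics

def internal_zscores(sim, members):
    """Internal z-scores (Eq. 12): for each object j in a cluster,
    z_j = (s_j - mu) / delta, where s_j is the average similarity of j
    to the other members, and mu and delta are the mean and (population)
    standard deviation of those averages over the cluster.
    `sim` is a dict-of-dicts similarity matrix; clusters of size >= 2
    are assumed."""
    avg = {
        j: sum(sim[j][k] for k in members if k != j) / (len(members) - 1)
        for j in members
    }
    mu = statistics.mean(avg.values())
    delta = statistics.pstdev(avg.values())
    return {j: (avg[j] - mu) / delta if delta > 0 else 0.0 for j in avg}
```

Objects with the lowest internal z-scores are flagged as outliers in their cluster, while those with the highest form the cluster core.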


What are the most representative elements in the clusters?

- the core of the cluster representing tools is formed by chisel, followed by knife and scissors, as they have the largest internal z-scores. The same cluster wrongly contains rocket, but according to the internal z-score it is an outlier (it has the lowest z-score in the cluster)
- bowl, cup, bottle and kettle all have the lowest internal z-scores in the cluster of vehicles; the core of that cluster is formed by truck and motorcycle


Descriptive and discriminative features: 3-way clustering

Cl    Features
VEG   fruit (41.3%), vegetables (28.3%), crop (14.6%), food (3.4%), plant (2.5%)
ANI   animal (43.3%), bird (23.0%), story (6.6%), pet (3.5%), waterfowl (2.4%)
ART   tool (31.0%), vehicle (15.3%), weapon (5.4%), instrument (4.4%), container (3.9%)
VEG   fruit (21.0%), vegetables (14.3%), animal (11.6%), crop (7.4%), tool (2.5%)
ANI   animal (22.1%), bird (11.7%), tool (10.1%), fruit (7.4%), vegetables (5.1%)
ART   tool (15.8%), animal (14.8%), bird (7.9%), vehicle (7.8%), fruit (6.8%)


Results: telic role

seed        extractions
helicopter  to rescue
rocket      to propel
chisel      to cut, to chop, to clean
hammer      to hit
kettle      to boil, to prepare
bowl        to serve
pencil      to draw, to create
spoon       to serve
bottle      to store, to pack

Table: Some extractions for the telic role


Results: constitutive role

seed        extractions
helicopter  a section, a body
rocket      a section, a part, a body
motorcycle  a frame, a part, a structure
truck       a frame, a segment, a program, a compartment
telephone   a tranceiver, a handset, a station
kettle      a pool, a cylinder
bowl        a corpus, a piece
pen         an ink, a component
spoon       a surface, a part
chisel      a blade
hammer      a handle, a head
bottle      a container, a component, a wall, a segment, a piece

Table: Some extractions for the constitutive role


Results per role

role          clustering   entropy   purity   comments
formal        6-way        0.13      0.89     all 44 words
agentive      6-way        0.54      0.61     43 words
constitutive  6-way        0.51      0.61     28 words

Table: Performance using one role only


Results: formal and agentive roles combined

Figure: A combination of the formal and the agentive roles


The best performance

The best results are obtained by combining the formal role with the agentive one.

clustering   entropy   purity
2-way        0.59      0.80
3-way        0.00      1.00
6-way        0.09      0.91

Table: Performance using formal and agentive roles

Interestingly, the worst performance on 2-way clustering is achieved by combining the formal and constitutive roles (entropy of 0.92, purity of 0.66).


Error analysis

1 Errors due to the extraction procedure:
  - incorrect PoS tagging / sentence boundary detection
  - patterns do not always provide correct extractions/features ("chicken and other stories")

2 Ambiguous words ("in fact, scottish gardens are starting to see many more butterflies including peacocks")

3 Features that do not suffice to discriminate among all categories


Error analysis (cont’d)

1 6-way clustering always fails to discriminate well between tools and vehicles. Containers (a bowl, a kettle, a cup, a bottle) are always placed in the cluster of vehicles (instead of tools). This is the only type of error for 6-way clustering.

2 In 2-way clustering, vegetables are usually not considered natural objects.


Conclusions

1 the formal role is already sufficient for the identification of vegetables, animals and artifacts (perfect clustering)

2 a combination of the formal and agentive roles provides the best performance on 6-way clustering (in line with Pustejovsky, 2001)

3 no combination of roles accounts well for natural objects and artifacts


To summarize

Today, we have looked at:

- machine learning problems
- WSD methods
