semi-supervised single-label text categorization using...
TRANSCRIPT
Semi-supervised Single-label Text Categorizationusing Centroid-based Classifiers
Ana Cardoso-Cachopo Arlindo Oliveira
Instituto Superior Tecnico — Technical University of Lisbon / INESC-ID
SAC-IAR 2007, March 12th
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 1 / 19
Outline
1 Problem Description
2 Characteristics of the Datasets
3 Why use Centroid-based Methods
4 Why use Unlabeled Data
5 Incorporate Unlabeled Data using EM
6 Incrementally Incorporate Unlabeled Data
7 Experimental Results
8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19
Outline
1 Problem Description
2 Characteristics of the Datasets
3 Why use Centroid-based Methods
4 Why use Unlabeled Data
5 Incorporate Unlabeled Data using EM
6 Incrementally Incorporate Unlabeled Data
7 Experimental Results
8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19
Outline
1 Problem Description
2 Characteristics of the Datasets
3 Why use Centroid-based Methods
4 Why use Unlabeled Data
5 Incorporate Unlabeled Data using EM
6 Incrementally Incorporate Unlabeled Data
7 Experimental Results
8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19
Outline
1 Problem Description
2 Characteristics of the Datasets
3 Why use Centroid-based Methods
4 Why use Unlabeled Data
5 Incorporate Unlabeled Data using EM
6 Incrementally Incorporate Unlabeled Data
7 Experimental Results
8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19
Outline
1 Problem Description
2 Characteristics of the Datasets
3 Why use Centroid-based Methods
4 Why use Unlabeled Data
5 Incorporate Unlabeled Data using EM
6 Incrementally Incorporate Unlabeled Data
7 Experimental Results
8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19
Outline
1 Problem Description
2 Characteristics of the Datasets
3 Why use Centroid-based Methods
4 Why use Unlabeled Data
5 Incorporate Unlabeled Data using EM
6 Incrementally Incorporate Unlabeled Data
7 Experimental Results
8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19
Outline
1 Problem Description
2 Characteristics of the Datasets
3 Why use Centroid-based Methods
4 Why use Unlabeled Data
5 Incorporate Unlabeled Data using EM
6 Incrementally Incorporate Unlabeled Data
7 Experimental Results
8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19
Outline
1 Problem Description
2 Characteristics of the Datasets
3 Why use Centroid-based Methods
4 Why use Unlabeled Data
5 Incorporate Unlabeled Data using EM
6 Incrementally Incorporate Unlabeled Data
7 Experimental Results
8 Conclusions and Future Work(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 2 / 19
Problem Description
Text Categorization
Single-label
DatasetsI Reuters 21578 - R8I 20 Newsgroups - 20NgI Web Knowledge Base - Web4I Cade - Cade12
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 3 / 19
Problem Description
Text Categorization
Single-label
DatasetsI Reuters 21578 - R8I 20 Newsgroups - 20NgI Web Knowledge Base - Web4I Cade - Cade12
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 3 / 19
Problem Description
Text Categorization
Single-label
DatasetsI Reuters 21578 - R8I 20 Newsgroups - 20NgI Web Knowledge Base - Web4I Cade - Cade12
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 3 / 19
Problem Description
Text Categorization
Single-label
DatasetsI Reuters 21578 - R8I 20 Newsgroups - 20NgI Web Knowledge Base - Web4I Cade - Cade12
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 3 / 19
Problem Description
Text Categorization
Single-label
DatasetsI Reuters 21578 - R8I 20 Newsgroups - 20NgI Web Knowledge Base - Web4I Cade - Cade12
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 3 / 19
Problem Description
Text Categorization
Single-label
DatasetsI Reuters 21578 - R8I 20 Newsgroups - 20NgI Web Knowledge Base - Web4I Cade - Cade12
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 3 / 19
Problem Description
Text Categorization
Single-label
DatasetsI Reuters 21578 - R8I 20 Newsgroups - 20NgI Web Knowledge Base - Web4I Cade - Cade12
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 3 / 19
Characteristics of the Datasets
Train Test Total Smallest LargestDocs Docs Docs Class Class
R8 5485 2189 7674 51 3923
20Ng 11293 7528 18821 628 999
Web4 2803 1396 4199 504 1641
Cade12 27322 13661 40983 625 8473
Numbers of documents for the datasets: number of training documents,number of test documents, total number of documents, number ofdocuments in the smallest class, and number of documents in the largestclass.
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 4 / 19
Why use Centroid-based Methods
Very fast
Good Accuracy
0.0
0.2
0.4
0.6
0.8
1.0Centroid
SVM
k-NN
LSI
Vector
Dumb
R8 20Ng Web4 Cade12
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 5 / 19
Why use Centroid-based Methods
Very fast
Good Accuracy
0.0
0.2
0.4
0.6
0.8
1.0Centroid
SVM
k-NN
LSI
Vector
Dumb
R8 20Ng Web4 Cade12
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 5 / 19
Why use Centroid-based Methods
Very fast
Good Accuracy
0.0
0.2
0.4
0.6
0.8
1.0Centroid
SVM
k-NN
LSI
Vector
Dumb
R8 20Ng Web4 Cade12
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 5 / 19
Why use Centroid-based Methods
Very fast
Good Accuracy
0.0
0.2
0.4
0.6
0.8
1.0Centroid
SVM
k-NN
LSI
Vector
Dumb
R8 20Ng Web4 Cade12
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 5 / 19
Why use Centroid-based Methods
Very fast
Good Accuracy
0.0
0.2
0.4
0.6
0.8
1.0Centroid
SVM
k-NN
LSI
Vector
Dumb
R8 20Ng Web4 Cade12
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 5 / 19
Why use Centroid-based Methods
Very fast
Good Accuracy
0.0
0.2
0.4
0.6
0.8
1.0Centroid
SVM
k-NN
LSI
Vector
Dumb
R8 20Ng Web4 Cade12
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 5 / 19
Why use Unlabeled Data
Small amounts of labeled data available
Large amounts of unlabeled data available
Hard or expensive to label new data
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 6 / 19
Why use Unlabeled Data
Small amounts of labeled data available
Large amounts of unlabeled data available
Hard or expensive to label new data
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 6 / 19
Why use Unlabeled Data
Small amounts of labeled data available
Large amounts of unlabeled data available
Hard or expensive to label new data
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 6 / 19
Incorporate Unlabeled Data using EM
If the entire dataset is available from the start, like in a library.
Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Estimation step: For each unlabeled document dj ∈ U, classify itaccording to the available centroids.Maximization step: For each class cj , update its centroid −−→cjnew ,considering the labeled documents and the labels for the unlabeleddocuments obtained in the previous step.Iterate: Until the centroids do not change in two consecutive iterations.Outputs: For each class cj , the centroid −→cj .
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 7 / 19
Incorporate Unlabeled Data using EM
If the entire dataset is available from the start, like in a library.
Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Estimation step: For each unlabeled document dj ∈ U, classify itaccording to the available centroids.Maximization step: For each class cj , update its centroid −−→cjnew ,considering the labeled documents and the labels for the unlabeleddocuments obtained in the previous step.Iterate: Until the centroids do not change in two consecutive iterations.Outputs: For each class cj , the centroid −→cj .
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 7 / 19
Incorporate Unlabeled Data using EM
If the entire dataset is available from the start, like in a library.
Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Estimation step: For each unlabeled document dj ∈ U, classify itaccording to the available centroids.Maximization step: For each class cj , update its centroid −−→cjnew ,considering the labeled documents and the labels for the unlabeleddocuments obtained in the previous step.Iterate: Until the centroids do not change in two consecutive iterations.Outputs: For each class cj , the centroid −→cj .
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 7 / 19
Incorporate Unlabeled Data using EM
If the entire dataset is available from the start, like in a library.
Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Estimation step: For each unlabeled document dj ∈ U, classify itaccording to the available centroids.Maximization step: For each class cj , update its centroid −−→cjnew ,considering the labeled documents and the labels for the unlabeleddocuments obtained in the previous step.Iterate: Until the centroids do not change in two consecutive iterations.Outputs: For each class cj , the centroid −→cj .
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 7 / 19
Incorporate Unlabeled Data using EM
If the entire dataset is available from the start, like in a library.
Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Estimation step: For each unlabeled document dj ∈ U, classify itaccording to the available centroids.Maximization step: For each class cj , update its centroid −−→cjnew ,considering the labeled documents and the labels for the unlabeleddocuments obtained in the previous step.Iterate: Until the centroids do not change in two consecutive iterations.Outputs: For each class cj , the centroid −→cj .
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 7 / 19
Incorporate Unlabeled Data using EM
If the entire dataset is available from the start, like in a library.
Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Estimation step: For each unlabeled document dj ∈ U, classify itaccording to the available centroids.Maximization step: For each class cj , update its centroid −−→cjnew ,considering the labeled documents and the labels for the unlabeleddocuments obtained in the previous step.Iterate: Until the centroids do not change in two consecutive iterations.Outputs: For each class cj , the centroid −→cj .
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 7 / 19
Incorporate Unlabeled Data using EM
If the entire dataset is available from the start, like in a library.
Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Estimation step: For each unlabeled document dj ∈ U, classify itaccording to the available centroids.Maximization step: For each class cj , update its centroid −−→cjnew ,considering the labeled documents and the labels for the unlabeleddocuments obtained in the previous step.Iterate: Until the centroids do not change in two consecutive iterations.Outputs: For each class cj , the centroid −→cj .
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 7 / 19
Incrementally Incorporate Unlabeled Data
If the dataset changes over time, like a news feed or the web.
Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Iterate: For each unlabeled document dj ∈ U:
Classify dj according to its similarity to each of the centroids.
Update the centroids with the new document dj classified in theprevious step.
Outputs: For each class cj , the centroid −→cj .
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 8 / 19
Incrementally Incorporate Unlabeled Data
If the dataset changes over time, like a news feed or the web.
Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Iterate: For each unlabeled document dj ∈ U:
Classify dj according to its similarity to each of the centroids.
Update the centroids with the new document dj classified in theprevious step.
Outputs: For each class cj , the centroid −→cj .
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 8 / 19
Incrementally Incorporate Unlabeled Data
If the dataset changes over time, like a news feed or the web.
Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Iterate: For each unlabeled document dj ∈ U:
Classify dj according to its similarity to each of the centroids.
Update the centroids with the new document dj classified in theprevious step.
Outputs: For each class cj , the centroid −→cj .
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 8 / 19
Incrementally Incorporate Unlabeled Data
If the dataset changes over time, like a news feed or the web.
Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Iterate: For each unlabeled document dj ∈ U:
Classify dj according to its similarity to each of the centroids.
Update the centroids with the new document dj classified in theprevious step.
Outputs: For each class cj , the centroid −→cj .
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 8 / 19
Incrementally Incorporate Unlabeled Data
If the dataset changes over time, like a news feed or the web.
Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Iterate: For each unlabeled document dj ∈ U:
Classify dj according to its similarity to each of the centroids.
Update the centroids with the new document dj classified in theprevious step.
Outputs: For each class cj , the centroid −→cj .
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 8 / 19
Incrementally Incorporate Unlabeled Data
If the dataset changes over time, like a news feed or the web.
Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Iterate: For each unlabeled document dj ∈ U:
Classify dj according to its similarity to each of the centroids.
Update the centroids with the new document dj classified in theprevious step.
Outputs: For each class cj , the centroid −→cj .
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 8 / 19
Incrementally Incorporate Unlabeled Data
If the dataset changes over time, like a news feed or the web.
Inputs: A set of labeled document vectors, L, and a set of unlabeleddocument vectors U.Initialization step: For each class cj appearing in L, determine the class´scentroid −→cj , using one of the formulas for the centroids and consideringonly the labeled documents.Iterate: For each unlabeled document dj ∈ U:
Classify dj according to its similarity to each of the centroids.
Update the centroids with the new document dj classified in theprevious step.
Outputs: For each class cj , the centroid −→cj .
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 8 / 19
Experimental Results - Synthetic Dataset
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
-5 -4 -3 -2 -1 0 1 2 3 4 5
Gaussian Distributionsµ1 = 1.0, σ1 = 1.0
µ2 = −1.0, σ2 = 1.0
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 9 / 19
Experimental Results - Synthetic Dataset
0.72
0.74
0.76
0.78
0.8
0.82
0.84
0.86
0 5 10 15 20
Acc
ura
cy
Labeled documents per class
µ1 = 1.0, σ1 = 1.0, µ2 = −1.0, σ2 = 1.0
CentroidEMInc
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
-5 -4 -3 -2 -1 0 1 2 3 4 5
Gaussian Distributionsµ1 = 1.0, σ1 = 1.0
µ2 = −1.0, σ2 = 1.0
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 10 / 19
Experimental Results - Synthetic Dataset
0.58
0.6
0.62
0.64
0.66
0.68
0.7
0 5 10 15 20
Acc
ura
cy
Labeled documents per class
µ1 = 1.0, σ1 = 2.0, µ2 = −1.0, σ2 = 2.0
CentroidEMInc
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
-5 -4 -3 -2 -1 0 1 2 3 4 5
Gaussian Distributionsµ1 = 1.0, σ1 = 2.0
µ2 = −1.0, σ2 = 2.0
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 11 / 19
Experimental Results - Synthetic Dataset
0.52
0.53
0.54
0.55
0.56
0.57
0.58
0.59
0 5 10 15 20
Acc
ura
cy
Labeled documents per class
µ1 = 1.0, σ1 = 4.0, µ2 = −1.0, σ2 = 4.0
CentroidEMInc
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
-5 -4 -3 -2 -1 0 1 2 3 4 5
Gaussian Distributionsµ1 = 1.0, σ1 = 4.0
µ2 = −1.0, σ2 = 4.0
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 12 / 19
Experimental Results - Real World Datasets
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
0 5 10 15 20 25 30 35 40
Acc
ura
cy
Labeled documents per class
R8
CentroidEMInc
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 13 / 19
Experimental Results - Real World Datasets
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0 10 20 30 40 50 60 70
Acc
ura
cy
Labeled documents per class
20Ng
CentroidEMInc
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 14 / 19
Experimental Results - Real World Datasets
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0 10 20 30 40 50 60 70
Acc
ura
cy
Labeled documents per class
Web4
CentroidEMInc
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 15 / 19
Experimental Results - Real World Datasets
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0 10 20 30 40 50 60 70
Acc
ura
cy
Labeled documents per class
Cade12
CentroidEMInc
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 16 / 19
Experimental Results - Real World Datasets
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
0 5 10 15 20 25 30 35 40
Acc
ura
cy
Labeled documents per class
R8
CentroidEMInc
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0 10 20 30 40 50 60 70
Acc
ura
cy
Labeled documents per class
20Ng
CentroidEMInc
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0 10 20 30 40 50 60 70
Acc
ura
cy
Labeled documents per class
Web4
CentroidEMInc
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0 10 20 30 40 50 60 70
Acc
ura
cy
Labeled documents per class
Cade12
CentroidEMInc
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 17 / 19
Conclusions and Future Work
If the initial model of the data is sufficiently precise, using unlabeleddata improves performance.
Using unlabeled data degrades performance if the initial model is notprecise enough.
As future work, we plan to extend this approach to multi-labeldatasets.
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 18 / 19
Conclusions and Future Work
If the initial model of the data is sufficiently precise, using unlabeleddata improves performance.
Using unlabeled data degrades performance if the initial model is notprecise enough.
As future work, we plan to extend this approach to multi-labeldatasets.
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 18 / 19
Conclusions and Future Work
If the initial model of the data is sufficiently precise, using unlabeleddata improves performance.
Using unlabeled data degrades performance if the initial model is notprecise enough.
As future work, we plan to extend this approach to multi-labeldatasets.
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 18 / 19
Thank You.
Any Questions?
(IST-TULisbon/INESC-ID) Ana Cardoso-Cachopo SAC-IAR 2007, March 12th 19 / 19