Optimal Reverse Prediction:
Linli Xu, Martha White and Dale Schuurmans
ICML 2009, Best Overall Paper Honorable Mention
A Unified Perspective on Supervised, Unsupervised and Semi-supervised Learning
Discussion led by Chunping Wang, ECE, Duke University
October 23, 2009
Outline
• Motivations
• Preliminary Foundations
• Reverse Supervised Least Squares
• Relationship between Unsupervised Least Squares and PCA, K-means, and Normalized Graph-cut
• Semi-supervised Least Squares
• Experiments
• Conclusions
1/31
Motivations
2/31
• Lack of a foundational connection between supervised and unsupervised learning
Supervised learning: minimizing prediction error
Unsupervised learning: re-representing the input data
• For semi-supervised learning, one needs to consider both together
• The semi-supervised learning literature relies on intuitions: the “cluster assumption” and the “manifold assumption”
• A unification demonstrated in this paper leads to a novel semi-supervised principle
Preliminary Foundations: Forward Supervised Least Squares

3/31

• Data:
– an input matrix X ∈ R^{t×n}, an output matrix Y ∈ R^{t×k}
– t instances, n features, k responses
– regression: Y ∈ R^{t×k}
– classification: Y ∈ {0,1}^{t×k} with Y1 = 1 (one indicator per row)
– assumption: X, Y full rank
• Problem:
– Find the parameters W ∈ R^{n×k} minimizing the least squares loss for a linear model f : X → Y
Preliminary Foundations

4/31

• Linear
min_W tr[(XW − Y)'(XW − Y)], solution W = (X'X)^{-1}X'Y
• Ridge regularization
min_W tr[(XW − Y)'(XW − Y)] + λ tr(W'W), solution W = (X'X + λI)^{-1}X'Y
• Kernelization (K = XX', W = X'A)
min_A tr[(KA − Y)'(KA − Y)] + λ tr(A'KA), solution A = (K + λI)^{-1}Y
• Instance weighting (diagonal weights Λ)
min_W tr[(XW − Y)'Λ(XW − Y)]
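The closed forms on this slide can be checked numerically. A minimal NumPy sketch (illustrative, not from the slides), verifying that the ridge solution W = (X'X + λI)^{-1}X'Y and the kernel solution A = (K + λI)^{-1}Y with K = XX' parameterize the same predictor via W = X'A:

```python
import numpy as np

rng = np.random.default_rng(0)
t, n, k, lam = 50, 5, 3, 0.1
X = rng.standard_normal((t, n))   # t instances, n features
Y = rng.standard_normal((t, k))   # k responses

# Ridge forward solution: W = (X'X + lam*I)^{-1} X'Y
W = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ Y)

# Kernelized solution with K = XX': A = (K + lam*I)^{-1} Y
K = X @ X.T
A = np.linalg.solve(K + lam * np.eye(t), Y)

# The two parameterizations agree (push-through identity)
assert np.allclose(X.T @ A, W)

# lam -> 0 recovers plain least squares
W0 = np.linalg.lstsq(X, Y, rcond=None)[0]
assert np.allclose(np.linalg.solve(X.T @ X, X.T @ Y), W0)
```

The agreement X'A = W is the standard push-through identity X'(XX' + λI)^{-1} = (X'X + λI)^{-1}X'.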
Preliminary Foundations

5/31

• Principal Components Analysis – dimensionality reduction
Z = XW, where W = Q_k(X'X), the top k eigenvectors of X'X
• k-means – clustering
S* = argmin_S Σ_{i=1}^k Σ_{x_j ∈ S_i} ‖x_j − (1/|S_i|) Σ_{x_l ∈ S_i} x_l‖²
• Normalized Graph-cut – clustering
Weighted undirected graph G = (V, E, A): nodes, edges, affinity matrix
Graph partition problem: find a partition minimizing the total weight of edges connecting nodes in distinct subsets.
Preliminary Foundations: Normalized Graph-cut – clustering

6/31

• Partition indicator matrix Z (constraint):
Z_ij = 1 if i ∈ S_j and 0 otherwise, for i = 1, …, t, j = 1, …, k
• Weighted degree matrix Λ = diag(A1)
• Total cut (objective):
C(Z) = (1/2) Σ_{j=1}^k z_j'(Λ − A)z_j = (1/2) tr[Z'LZ], where L = Λ − A
• Normalized cut (objective):
NC(Z) = Σ_{j=1}^k z_j'(Λ − A)z_j / (z_j'Λz_j) = tr[(Z'ΛZ)^{-1} Z'(Λ − A)Z]

From Xing & Jordan, 2003
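The two trace formulas can be checked on a small graph. A NumPy sketch (illustrative; the graph and partition are made up), comparing (1/2)tr[Z'LZ] with the cut weight computed directly from the definition:

```python
import numpy as np

rng = np.random.default_rng(1)
t, k = 8, 2
# Symmetric nonnegative affinity matrix A with zero diagonal
A = rng.random((t, t)); A = (A + A.T) / 2; np.fill_diagonal(A, 0)

# Partition indicator Z: nodes 0-3 in cluster 0, nodes 4-7 in cluster 1
Z = np.zeros((t, k)); Z[:4, 0] = 1; Z[4:, 1] = 1

Lam = np.diag(A.sum(axis=1))   # weighted degree matrix = diag(A 1)
L = Lam - A                    # graph Laplacian

# Total cut via the trace formula C = (1/2) tr[Z'LZ]
C = 0.5 * np.trace(Z.T @ L @ Z)

# Direct definition: total weight of edges crossing the partition
labels = Z.argmax(axis=1)
C_direct = sum(A[i, j] for i in range(t) for j in range(t)
               if labels[i] != labels[j]) / 2
assert np.isclose(C, C_direct)

# Normalized cut NC = tr[(Z'ΛZ)^{-1} Z'(Λ - A)Z]
NC = np.trace(np.linalg.solve(Z.T @ Lam @ Z, Z.T @ L @ Z))
assert NC > 0
```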
[Figure: supervised methods (Least Squares Regression, Least Squares Classification) and unsupervised methods (Principal Component Analysis, K-means, Graph Norm Cut), with the connections already known in the literature]

First contribution

7/31
This paper

7/31

[Figure: the same supervised methods (Least Squares Regression, Least Squares Classification) and unsupervised methods (Principal Component Analysis, K-means, Graph Norm Cut), now unified under a single framework]

Unification

First contribution
Reverse Supervised Least Squares

8/31

• Traditional forward least squares: predict the outputs from the inputs
min_W tr[(XW − Y)'(XW − Y)]
• Reverse least squares: predict the inputs from the outputs
min_U tr[(YU − X)'(YU − X)]

Given the reverse solutions U, the corresponding forward solutions W can be recovered exactly (for X full rank).
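The exact recovery claim is easy to verify numerically. A NumPy sketch (illustrative), computing the reverse solution U = (Y'Y)^{-1}Y'X and recovering the forward solution from it via the identity U'Y'Y = X'Y:

```python
import numpy as np

rng = np.random.default_rng(2)
t, n, k = 40, 6, 3
X = rng.standard_normal((t, n))   # full column rank w.h.p.
Y = rng.standard_normal((t, k))

# Forward: min_W tr[(XW - Y)'(XW - Y)]  ->  W = (X'X)^{-1} X'Y
W_fwd = np.linalg.solve(X.T @ X, X.T @ Y)

# Reverse: min_U tr[(YU - X)'(YU - X)]  ->  U = (Y'Y)^{-1} Y'X
U = np.linalg.solve(Y.T @ Y, Y.T @ X)

# Recovery: U'Y'Y = X'Y, hence (X'X)^{-1} U'Y'Y equals W_fwd
W_rec = np.linalg.solve(X.T @ X, U.T @ (Y.T @ Y))
assert np.allclose(W_rec, W_fwd)
```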
Reverse Supervised Least Squares

9/31

• Ridge regularization
Reverse problem: min_U tr[(YU − X)'(YU − X)]
Recover: (X'X + λI)W = U'Y'Y, i.e. W = (X'X + λI)^{-1}U'Y'Y
• Kernelization
Reverse problem: min_B tr[(I − YB)K(I − YB)']
Recover: (K + λI)A = B'Y'Y, i.e. A = (K + λI)^{-1}B'Y'Y
• Instance weighting
Reverse problem: the weighted analogue of the above
Recover: the forward solution is again obtained in closed form from the reverse solution
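A NumPy sketch of the regularized recoveries (illustrative; it assumes the recovery formulas W = (X'X + λI)^{-1}U'Y'Y and A = (K + λI)^{-1}B'Y'Y as reconstructed on this slide):

```python
import numpy as np

rng = np.random.default_rng(3)
t, n, k, lam = 40, 6, 3, 0.5
X = rng.standard_normal((t, n))
Y = rng.standard_normal((t, k))

# Reverse solution (plain least squares in the reverse direction)
U = np.linalg.solve(Y.T @ Y, Y.T @ X)          # U = (Y'Y)^{-1} Y'X

# Ridge recovery: W = (X'X + lam*I)^{-1} U'Y'Y matches the forward ridge solution
W_rec = np.linalg.solve(X.T @ X + lam * np.eye(n), U.T @ Y.T @ Y)
W_fwd = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ Y)
assert np.allclose(W_rec, W_fwd)

# Kernel recovery with K = XX': A = (K + lam*I)^{-1} B'Y'Y,
# taking B = (Y'Y)^{-1} Y' (a minimizer of the reverse kernel objective)
K = X @ X.T
B = np.linalg.solve(Y.T @ Y, Y.T)
A_rec = np.linalg.solve(K + lam * np.eye(t), B.T @ Y.T @ Y)
A_fwd = np.linalg.solve(K + lam * np.eye(t), Y)
assert np.allclose(A_rec, A_fwd)
```

Both checks reduce to the same algebra: U'Y'Y = X'Y and B'Y'Y = Y, so the regularized recovery reproduces the forward regularized solution exactly.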
Reverse Supervised Least Squares

10/31

For supervised learning with least squares loss:
• the forward and reverse perspectives are equivalent
• each can be recovered exactly from the other
• the forward and reverse losses are nevertheless not identical, since they are measured in different units – it is not principled to combine them directly!
Unsupervised Least Squares

11/31

Unsupervised learning: no training labels Y are given.
Principle: optimize over guessed labels Z, with X ∈ R^{t×n}, Z ∈ R^{t×k} (or Z ∈ {0,1}^{t×k}), U ∈ R^{k×n}.

• forward: min_Z min_W tr[(XW − Z)'(XW − Z)]
It does not work! For any W we can choose Z = XW to achieve zero loss, so it only gives trivial solutions.
• reverse: min_Z min_U tr[(ZU − X)'(ZU − X)]
It gives non-trivial solutions.
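The contrast between the two directions can be made concrete. A NumPy sketch (illustrative): the forward unsupervised loss is trivially zero, while the reverse optimum is the rank-k approximation error of X, which is positive by Eckart–Young:

```python
import numpy as np

rng = np.random.default_rng(4)
t, n, k = 30, 5, 2
X = rng.standard_normal((t, n))

# Forward with guessed labels: any W gives zero loss by choosing Z = XW
W = rng.standard_normal((n, k))
Z = X @ W
assert np.isclose(np.sum((X @ W - Z) ** 2), 0.0)   # trivial

# Reverse with guessed labels: min_{Z,U} ||ZU - X||_F^2 is the best
# rank-k approximation of X, so the optimum is positive (Eckart-Young)
s = np.linalg.svd(X, compute_uv=False)
reverse_opt = np.sum(s[k:] ** 2)                   # discarded singular values
assert reverse_opt > 1e-6                          # non-trivial
```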
Unsupervised Least Squares: PCA

12/31

Proposition 1  Unconstrained reverse prediction
min_Z min_U tr[(ZU − X)'(ZU − X)]
is equivalent to principal components analysis.

This connection was made in Jong & Kotz, 1999; the authors extend it to the kernelized case.

Corollary 1  Kernelized reverse prediction
min_Z min_B tr[(I − ZB)K(I − ZB)']
is equivalent to kernel principal components analysis.
Unsupervised Least Squares: PCA

13/31

Proposition 1  Unconstrained reverse prediction
min_Z min_U tr[(ZU − X)'(ZU − X)]
is equivalent to principal components analysis.

Proof  For fixed Z, the inner solution is U* = argmin_U tr[(ZU − X)'(ZU − X)] = (Z'Z)^{-1}Z'X. Substituting,
min_Z tr[(ZU* − X)'(ZU* − X)] = min_Z tr[(I − Z(Z'Z)^{-1}Z')XX']
= max_Z tr[Z(Z'Z)^{-1}Z'XX']
Recall that R(Z) = Z(Z'Z)^{-1}Z'. The solution for Z is not unique: for any invertible T,
R(ZT) = ZT(T'Z'ZT)^{-1}T'Z' = Z(Z'Z)^{-1}Z' = R(Z)
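The non-uniqueness is simple to check: R(Z) depends on Z only through its column space. A NumPy sketch (illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
t, k = 20, 3
Z = rng.standard_normal((t, k))
T = rng.standard_normal((k, k))   # invertible w.h.p.

# R(M) = M (M'M)^{-1} M', the projector onto the column space of M
R = lambda M: M @ np.linalg.solve(M.T @ M, M.T)

# R(ZT) = R(Z): the reduced objective cannot distinguish Z from ZT
assert np.allclose(R(Z @ T), R(Z))
```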
Unsupervised Least Squares: PCA

14/31

Proposition 1  Unconstrained reverse prediction
min_Z min_U tr[(ZU − X)'(ZU − X)]
is equivalent to principal components analysis.

Proof (cont.)  Consider the SVD Z = PΣQ', with P'P = I_k, Q'Q = I_k and Σ diagonal. Then
R(Z) = Z(Z'Z)^{-1}Z' = PP'
and the objective becomes
max_{P : P'P = I_k} tr[PP'XX'] = max_{P : P'P = I_k} tr[P'XX'P]
Solution: P = Q_k(XX'), the top k eigenvectors of XX'.
Unsupervised Least Squares: k-means

15/31

Proposition 2  Constrained reverse prediction
min_{Z : Z ∈ {0,1}^{t×k}, Z1 = 1} min_U tr[(ZU − X)'(ZU − X)]
is equivalent to k-means clustering.

The connection between PCA and k-means clustering was made in Ding & He, 2004; here the authors show the connection of both to supervised (reverse) least squares.

Corollary 2  Constrained kernelized reverse prediction
min_{Z : Z ∈ {0,1}^{t×k}, Z1 = 1} min_B tr[(I − ZB)K(I − ZB)']
is equivalent to kernel k-means.
Unsupervised Least Squares: k-means

16/31

Proposition 2  Constrained reverse prediction
min_{Z : Z ∈ {0,1}^{t×k}, Z1 = 1} min_U tr[(ZU − X)'(ZU − X)]
is equivalent to k-means clustering.

Proof  Substituting the inner solution U* = (Z'Z)^{-1}Z'X gives the equivalent problem
min_{Z : Z ∈ {0,1}^{t×k}, Z1 = 1} tr[(Z(Z'Z)^{-1}Z'X − X)'(Z(Z'Z)^{-1}Z'X − X)]
Consider the factors of Z(Z'Z)^{-1}Z'X:
Z'Z – a diagonal matrix counting the data in each class
Z'X – a k×n matrix whose rows sum the data in each class
Unsupervised Least Squares: k-means

17/31

Proposition 2  Constrained reverse prediction
min_{Z : Z ∈ {0,1}^{t×k}, Z1 = 1} min_U tr[(ZU − X)'(ZU − X)]
is equivalent to k-means clustering.

Proof (cont.)
(Z'Z)^{-1}Z'X – the k×n matrix of means: row i is the mean of class i
Z(Z'Z)^{-1}Z'X – the t×n encoding: row j is the mean of the class containing instance j
Unsupervised Least Squares: k-means

18/31

Proposition 2  Constrained reverse prediction
min_{Z : Z ∈ {0,1}^{t×k}, Z1 = 1} min_U tr[(ZU − X)'(ZU − X)]
is equivalent to k-means clustering.

Proof (cont.)  Therefore
min_{Z : Z ∈ {0,1}^{t×k}, Z1 = 1} tr[(Z(Z'Z)^{-1}Z'X − X)'(Z(Z'Z)^{-1}Z'X − X)]
is exactly the k-means objective
S* = argmin_S Σ_{i=1}^k Σ_{x_j ∈ S_i} ‖x_j − (1/|S_i|) Σ_{x_l ∈ S_i} x_l‖²
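The class-means identity and the resulting equivalence can be checked numerically. A NumPy sketch (illustrative), using a hard assignment matrix Z:

```python
import numpy as np

rng = np.random.default_rng(6)
t, n, k = 12, 4, 3
X = rng.standard_normal((t, n))

# A hard assignment Z in {0,1}^{t x k} with Z1 = 1 (every class non-empty)
labels = np.arange(t) % k
Z = np.zeros((t, k)); Z[np.arange(t), labels] = 1

# (Z'Z)^{-1} Z'X is the k x n matrix of class means
means = np.linalg.solve(Z.T @ Z, Z.T @ X)
for c in range(k):
    assert np.allclose(means[c], X[labels == c].mean(axis=0))

# Reverse loss at the inner optimum equals the k-means objective
reverse_loss = np.sum((Z @ means - X) ** 2)
kmeans_obj = sum(np.sum((X[labels == c] - means[c]) ** 2) for c in range(k))
assert np.isclose(reverse_loss, kmeans_obj)
```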
Unsupervised Least Squares: Norm-cut

19/31

Proposition 3  For a doubly nonnegative matrix K and weighting Λ = diag(K1), weighted reverse prediction
min_{Z : Z ∈ {0,1}^{t×k}, Z1 = 1} min_B tr[(Λ^{-1} − ZB)K(Λ^{-1} − ZB)'Λ]
is equivalent to normalized graph-cut.

Proof  For any Z, the solution to the inner minimization is
B* = (Z'ΛZ)^{-1}Z'
Substituting gives the reduced objective
min_{Z : Z ∈ {0,1}^{t×k}, Z1 = 1} tr[(Λ^{-1} − Z(Z'ΛZ)^{-1}Z')K(Λ^{-1} − Z(Z'ΛZ)^{-1}Z')'Λ]
= min_Z tr[Λ^{-1}K] − tr[(Z'ΛZ)^{-1}Z'KZ]
= min_Z −tr[(Z'ΛZ)^{-1}Z'KZ]
Unsupervised Least Squares: Norm-cut

20/31

Proposition 3  For a doubly nonnegative matrix K and weighting Λ = diag(K1), weighted reverse prediction
min_{Z : Z ∈ {0,1}^{t×k}, Z1 = 1} min_B tr[(Λ^{-1} − ZB)K(Λ^{-1} − ZB)'Λ]
is equivalent to normalized graph-cut.

Proof (cont.)  Recall the normalized cut (from Xing & Jordan, 2003):
NC = tr[(Z'ΛZ)^{-1}Z'(Λ − A)Z] = k − tr[(Z'ΛZ)^{-1}Z'AZ]
Since K is doubly nonnegative, it can serve as an affinity matrix (A = K).
The reduced objective is therefore equivalent to normalized graph-cut.
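Assuming the weighted reverse objective tr[(Λ^{-1} − ZB)K(Λ^{-1} − ZB)'Λ] as reconstructed here, the claimed inner minimizer B* = (Z'ΛZ)^{-1}Z' can be sanity-checked against random perturbations (the objective is convex in B). A NumPy sketch (illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
t, k = 10, 2
# Doubly nonnegative K: entrywise nonnegative and positive semidefinite
M = rng.random((t, t))
K = M @ M.T
Lam = np.diag(K.sum(axis=1))                  # weighting = diag(K 1)
Li = np.linalg.inv(Lam)

labels = np.arange(t) % k
Z = np.zeros((t, k)); Z[np.arange(t), labels] = 1

def obj(B):
    R = Li - Z @ B
    return np.trace(R @ K @ R.T @ Lam)

B_star = np.linalg.solve(Z.T @ Lam @ Z, Z.T)  # claimed inner minimizer
for _ in range(20):
    B = B_star + 0.1 * rng.standard_normal(B_star.shape)
    assert obj(B) >= obj(B_star) - 1e-9       # no perturbation does better
```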
Unsupervised Least Squares: Norm-cut

21/31

With a specific K we can relate normalized graph-cut to reverse least squares.

Corollary 3  The weighted least squares problem
min_{Z : Z ∈ {0,1}^{t×k}, Z1 = 1} min_U tr[(Λ^{-1}X − ZU)'Λ(Λ^{-1}X − ZU)]
is equivalent to normalized graph-cut on K = XX', provided K ≥ 0.
Second contribution

22/31

[Figure, taken from Xu's slides: Reverse Prediction links Supervised Least Squares Learning with the unsupervised methods Principal Component Analysis, K-means, and Graph Norm Cut]
Second contribution

22/31

[Figure, taken from Xu's slides: Reverse Prediction links Supervised Least Squares Learning with Principal Component Analysis, K-means, and Graph Norm Cut, and adds a new connection: Semi-Supervised learning]
Semi-supervised Least Squares

23/31

A principled approach: reverse loss decomposition

[Figure, taken from Xu's slides: points x_1, x_2, x_3, x_4 with their reverse reconstructions]

Supervised reverse losses: tr[(YU − X)'(YU − X)]
Semi-supervised Least Squares

23/31

A principled approach: reverse loss decomposition

[Figure, taken from Xu's slides: points x_1, …, x_4, with the unsupervised reconstruction x_3* and the supervised reconstruction x̂_3 marked for x_3]

Supervised reverse losses: tr[(YU − X)'(YU − X)]
Unsupervised reverse losses: tr[(ZU − X)'(ZU − X)]
Pointwise decomposition: ‖x_3 − x̂_3‖² = ‖x_3 − x_3*‖² + ‖x_3* − x̂_3‖²
Semi-supervised Least Squares

24/31

Proposition 4  For any X, Y, and U,
tr[(YU − X)'(YU − X)] = tr[(Z*U − X)'(Z*U − X)] + tr[(Z*U − YU)'(Z*U − YU)]
(supervised loss)      (unsupervised loss)      (squared distance)
where Z* = argmin_Z tr[(ZU − X)'(ZU − X)].

The unsupervised loss depends only on the input data X; the squared distance depends on both X and Y.

Note: we cannot compute the true supervised loss, since we do not have all the labels Y. We may estimate it using only labeled data, or also using auxiliary unlabeled data.
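Proposition 4's decomposition is exact (a Pythagorean identity) and easy to verify numerically. A NumPy sketch (illustrative), using the unconstrained inner optimum Z* = XU'(UU')^{-1}:

```python
import numpy as np

rng = np.random.default_rng(8)
t, n, k = 25, 6, 3
X = rng.standard_normal((t, n))
Y = rng.standard_normal((t, k))
U = rng.standard_normal((k, n))               # any U

# Inner optimum of the unsupervised reverse loss: Z* = XU'(UU')^{-1}
Z_star = X @ U.T @ np.linalg.inv(U @ U.T)

sup_loss   = np.sum((Y @ U - X) ** 2)         # supervised reverse loss
unsup_loss = np.sum((Z_star @ U - X) ** 2)    # unsupervised reverse loss
sq_dist    = np.sum((Z_star @ U - Y @ U) ** 2)

# Pythagorean decomposition of Proposition 4
assert np.isclose(sup_loss, unsup_loss + sq_dist)
```

The cross term vanishes because X − Z*U lies in the orthogonal complement of the row space of U, while Z*U − YU lies inside it.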
Semi-supervised Least Squares

25/31

Corollary 4  For any U,
E[(1/T_L)‖X_L − Y_L U‖²_F] = E[(1/T_U)‖X_U − Z*U‖²_F] + E[(1/T_L)‖Z*U − Y_L U‖²_F]
(supervised loss estimate)  (unsupervised loss estimate)  (squared distance estimate)
where Z* = argmin_Z tr[(ZU − X)'(ZU − X)].

Labeled data are scarce, but plenty of unlabeled data are usually available. Estimating the unsupervised term from the unlabeled data yields an unbiased estimate of the supervised loss whose variance is strictly reduced.
Semi-supervised Least Squares

26/31

A naive approach:
min_Z min_U [ (1/T_L)‖X_L − Y_L U‖²_F + (1/T_U)‖X_U − ZU‖²_F ]
(loss on labeled data)  (loss on unlabeled data)

Advantages:
• The authors combine supervised and unsupervised reverse losses, whereas previous approaches combine an unsupervised (reverse) loss with a supervised (forward) loss, which are not measured in the same units.
• Compared to the principled approach, it admits more straightforward optimization procedures (alternating between U and Z).
Regression Experiments: Least Squares + PCA

27/31

Basic formulation:
min_Z min_U [ (1/T_L)‖X_L − Y_L U‖²_F + (1/T_U)‖X_U − ZU‖²_F ]

• The two terms are not jointly convex, so there is no closed-form solution
• Learning method: alternating minimization, with an initial U obtained from the supervised solution
• Recovered forward solution: W = (X'X + λI)^{-1}U'Y'Y
• Testing: given a new x, predict ŷ = W'x
• Can be kernelized
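The alternating scheme can be sketched in a few lines of NumPy (illustrative; the data, step order, and initialization are assumptions, not the authors' exact implementation). Each step solves its subproblem exactly, so the objective decreases monotonically:

```python
import numpy as np

rng = np.random.default_rng(9)
TL, TU, n, k, lam = 20, 80, 5, 2, 0.1
XL = rng.standard_normal((TL, n)); YL = rng.standard_normal((TL, k))
XU = rng.standard_normal((TU, n))

# Initialize U from the supervised reverse solution on labeled data
U = np.linalg.solve(YL.T @ YL, YL.T @ XL)

def objective(U, Z):
    return (np.sum((XL - YL @ U) ** 2) / TL
            + np.sum((XU - Z @ U) ** 2) / TU)

prev = np.inf
for _ in range(50):
    # Z-step: unconstrained minimizer Z = X_U U'(UU')^{-1}
    Z = XU @ U.T @ np.linalg.inv(U @ U.T)
    # U-step: normal equations combining both terms
    G = YL.T @ YL / TL + Z.T @ Z / TU
    Rhs = YL.T @ XL / TL + Z.T @ XU / TU
    U = np.linalg.solve(G, Rhs)
    cur = objective(U, Z)
    assert cur <= prev + 1e-9                 # monotone decrease
    prev = cur

# Recover a forward predictor from the final U (ridge recovery)
W = np.linalg.solve(XL.T @ XL + lam * np.eye(n), U.T @ YL.T @ YL)
y_hat = W.T @ XL[0]                           # prediction for one input
assert y_hat.shape == (k,)
```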
Regression Experiments: Least Squares + PCA

28/31

Forward root mean squared error (mean ± standard deviation over 10 random splits of the data).
The values of (k, n; T_L, T_U) are indicated for each data set.

The table is taken from Xu's paper.
Classification Experiments: Least Squares + k-means

29/31

min_{Z : Z ∈ {0,1}^{t×k}, Z1 = 1} min_U [ (1/T_L)‖X_L − Y_L U‖²_F + (1/T_U)‖X_U − ZU‖²_F ]

• Recovered forward solution: W = (X'X + λI)^{-1}U'Y'Y
• Testing: given a new x, compute ŷ = W'x and predict the maximum response

Least Squares + Norm-cut

min_{Z : Z ∈ {0,1}^{t×k}, Z1 = 1} min_U [ (1/T_L)‖Λ_L^{1/2}(X_L − Y_L U)‖²_F + (1/T_U)‖Λ_U^{1/2}(X_U − ZU)‖²_F ]
Classification Experiments: Least Squares + k-means

30/31

Forward root mean squared error (mean ± standard deviation over 10 random splits of the data).
The values of (k, n; T_L, T_U) are indicated for each data set.

The table is taken from Xu's paper.
Conclusions
31/31
Two main contributions:
1. A unified framework based on reverse least squares loss is proposed for several existing supervised and unsupervised algorithms;
2. In the unified framework, a novel semi-supervised principle is proposed.