Machine Learning 1 Lecture 1.2 - What is Machine Learning?
Erik Bekkers
(Bishop 1.0 and 1.1)
Image credit: Kirillm | Getty Images
Slide credits: Patrick Forré and Rianne van den Berg
What is machine learning?
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”
- Tom M. Mitchell
Machine Learning, Tom Mitchell, McGraw Hill, 1997
E: Experience
MNIST dataset
Handwritten digits of size 28 × 28
E: Experience
[Figure: heat map of the expression matrix; rows are labelled with gene identifiers, columns with tumor sample types]
FIGURE 1.3. DNA microarray data: expression matrix of 6830 genes (rows) and 64 samples (columns), for the human tumor data. Only a random sample of 100 rows are shown. The display is a heat map, ranging from bright green (negative, under expressed) to bright red (positive, over expressed). Missing values are gray. The rows and columns are displayed in a randomly chosen order.
Expression matrix of genes (rows) for 64 human tumor samples (columns). [source: ESL 1.3]
Heatmap of the activity of each gene per tumor
E: Experience
Examples of spam emails. [source: Yesware]
T: Class of tasks
Classification:
[Figure: example inputs shown with their class labels, including emails labelled spam or OK]
T: Class of tasks
Regression
[Figure: targets t plotted against inputs x; x axis from 0 to 1, t axis from −1 to 1]
input: x
target: t = sin(2πx) + ε
noise: ε ∼ N(0, 1)
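The data-generating process on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `make_sine_data`, the uniform sampling of x on [0, 1), and the `noise_std` parameter are assumptions of the sketch, not something the slide specifies.

```python
import math
import random


def make_sine_data(n_points, noise_std, seed=0):
    """Sample inputs x in [0, 1) and noisy targets t = sin(2*pi*x) + eps."""
    rng = random.Random(seed)
    # Assumption of this sketch: inputs are drawn uniformly at random.
    xs = [rng.random() for _ in range(n_points)]
    # eps is Gaussian with mean 0 and standard deviation noise_std.
    ts = [math.sin(2 * math.pi * x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ts


if __name__ == "__main__":
    xs, ts = make_sine_data(n_points=10, noise_std=0.3)
    for x, t in zip(xs, ts):
        print(f"x = {x:.3f}   t = {t:+.3f}")
```

Setting `noise_std` to 0 returns points lying exactly on the sine curve, which makes it easy to see how much of the scatter in the targets comes from ε.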
T: Class of tasks
Regression
[Figure: four panels, M = 0, M = 1, M = 3 and M = 9; x axis from 0 to 1, t axis from −1 to 1]
Polynomials of order M (red) fit to data constructed as t = sin(2πx) + ε (green)
$f(x) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M$
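A minimal sketch of fitting a polynomial of order M follows. The slide does not say how the weights are obtained; this sketch assumes an ordinary least-squares fit solved through the normal equations, and the helper names (`solve_linear_system`, `fit_polynomial`, `predict`) are made up for illustration.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * w[k] for k in range(row + 1, n))
        w[row] = (aug[row][n] - tail) / aug[row][row]
    return w


def fit_polynomial(xs, ts, order):
    """Least-squares weights w_0..w_M of f(x) = w_0 + w_1 x + ... + w_M x^M."""
    # Normal equations: (Phi^T Phi) w = Phi^T t, with Phi[i][j] = x_i ** j.
    phi = [[x ** j for j in range(order + 1)] for x in xs]
    gram = [[sum(r[j] * r[k] for r in phi) for k in range(order + 1)]
            for j in range(order + 1)]
    rhs = [sum(r[j] * t for r, t in zip(phi, ts)) for j in range(order + 1)]
    return solve_linear_system(gram, rhs)


def predict(w, x):
    """Evaluate the fitted polynomial at x."""
    return sum(w_j * x ** j for j, w_j in enumerate(w))


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [i / 9 for i in range(10)]
    ts = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3) for x in xs]
    for order in (0, 1, 3):
        weights = fit_polynomial(xs, ts, order)
        print(order, [round(w_j, 3) for w_j in weights])
```

The normal equations become badly conditioned for high orders such as M = 9, so in practice a library least-squares routine would be the safer choice; the hand-written solver is used here only to keep the sketch dependency-free.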
T: Class of tasks
Clustering
Expression matrix of genes (rows) for 64 human tumor samples (columns). [source: ESL 1.3]
Each tumor is a vector, i.e. a point in N-dimensional space.
P: Performance measure
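Classification
[Figure: three example images with their class labels: 2, 8 and 0]
$\mathrm{accuracy}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=1}^{n_\text{samples}} \mathbb{1}[\hat{y}_i = y_i]$
indicator function: $\mathbb{1}[\hat{y}_i = y_i] = 1$ if $\hat{y}_i = y_i$, and $0$ otherwise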
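The accuracy formula translates directly into code. The sketch below is one possible plain-Python version; the function name and the example labels are chosen for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("accuracy is undefined for zero samples")
    # The indicator function contributes 1 for a match and 0 otherwise.
    matches = sum(1 for y, y_hat in zip(y_true, y_pred) if y == y_hat)
    return matches / len(y_true)


if __name__ == "__main__":
    true_labels = [2, 8, 0]
    predicted_labels = [2, 8, 6]  # hypothetical classifier output
    print(accuracy(true_labels, predicted_labels))  # two of three correct
```

Because the comparison uses only equality, the same function works for digit labels and for string labels such as "spam" and "OK".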
P: Performance measure
Regression
Polynomials of order M (red) fit to data constructed as t = sin(2πx) + ε (green)
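[Figure: four panels, M = 0, M = 1, M = 3 and M = 9; x axis from 0 to 1, t axis from −1 to 1]
mean squared error:
$\mathrm{MSE}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=1}^{n_\text{samples}} (y_i - \hat{y}_i)^2$, with $\hat{y}_i = f_w(x_i)$
large MSE: the fitted curve lies far from the data points; small MSE: the fitted curve lies close to the data points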
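As a small illustration, the mean squared error can be computed as below. The function name and the toy numbers are assumptions made for this sketch.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("MSE is undefined for zero samples")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    targets = [0.0, 1.0, 0.0, -1.0]
    constant_fit = [0.0, 0.0, 0.0, 0.0]   # hypothetical M = 0 style prediction
    close_fit = [0.1, 0.9, 0.0, -1.1]     # hypothetical prediction near the data
    print(mean_squared_error(targets, constant_fit))
    print(mean_squared_error(targets, close_fit))
```

The constant prediction misses two of the four targets by a full unit and therefore gets the larger error, matching the large-versus-small contrast on the slide.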
P: Performance measure
Clustering
Expression matrix of genes (rows) for 64 human tumor samples (columns). [source: ESL 1.3]
$\text{within-cluster sum of squares} = \sum_{i=1}^{n_\text{samples}} \min_{\mu_j \in C} \lVert \mu_j - x_i \rVert^2$
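The within-cluster sum of squares can be sketched as follows: every point contributes its squared distance to the nearest cluster centre. The function names and the two-dimensional toy points are illustrative assumptions; the tumor vectors in the figure would have far more dimensions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a_k - b_k) ** 2 for a_k, b_k in zip(a, b))


def within_cluster_sum_of_squares(points, centers):
    """Sum over points of the squared distance to the closest centre."""
    return sum(min(squared_distance(x, mu) for mu in centers) for x in points)


if __name__ == "__main__":
    points = [(0, 0), (0, 2), (10, 0), (10, 2)]
    two_centers = [(0, 1), (10, 1)]
    one_center = [(5, 1)]
    print(within_cluster_sum_of_squares(points, two_centers))
    print(within_cluster_sum_of_squares(points, one_center))
```

Placing one centre in the middle of each group gives a much lower value than a single centre between the groups, which is why a smaller value indicates a tighter clustering.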
P: Performance measure
$\mathrm{MSE}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=1}^{n_\text{samples}} (y_i - \hat{y}_i)^2$
[Figure: four panels, M = 0, M = 1, M = 3 and M = 9; x axis from 0 to 1, t axis from −1 to 1]
Q: On which datapoints should performance be measured?
Best performance on training set: M = 9
Best performance on new datapoints: M = 3 (M = 0, M = 1 and M = 9 perform badly)
Generalisation: performance should be measured on new data (test data).
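The gap between training and test performance can be demonstrated with a model that simply memorises its training data. The sketch below swaps in a 1-nearest-neighbour predictor as a stand-in for an overly flexible model such as the M = 9 polynomial; that substitution, the sample sizes, and the noise level of 0.3 are assumptions of this illustration, not part of the lecture.

```python
import math
import random


def make_sine_data(n_points, noise_std, rng):
    """Noisy samples of t = sin(2*pi*x) + eps with x uniform on [0, 1)."""
    xs = [rng.random() for _ in range(n_points)]
    ts = [math.sin(2 * math.pi * x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ts


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


def nearest_neighbour_predict(train_xs, train_ts, x):
    """Return the target of the training input closest to x (memorisation)."""
    nearest = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
    return train_ts[nearest]


def evaluate(train, test):
    """Return (training MSE, test MSE) of the memorising predictor."""
    train_xs, train_ts = train
    test_xs, test_ts = test
    train_pred = [nearest_neighbour_predict(train_xs, train_ts, x) for x in train_xs]
    test_pred = [nearest_neighbour_predict(train_xs, train_ts, x) for x in test_xs]
    return (mean_squared_error(train_ts, train_pred),
            mean_squared_error(test_ts, test_pred))


rng = random.Random(0)
train = make_sine_data(10, 0.3, rng)    # data the model has seen
test = make_sine_data(100, 0.3, rng)    # new data from the same process
train_mse, test_mse = evaluate(train, test)
print(f"training MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

The memorising predictor reproduces every training target exactly, so its training error says nothing about how it behaves on new inputs; only the test error reflects generalisation.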
What is machine learning?
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”
- Tom M. Mitchell
Machine Learning, Tom Mitchell, McGraw Hill, 1997