zornitsa kozareva usc/isi marina del rey, ca · choice of classifier the attribute whose value is...
Post on 25-Aug-2020
3 Views
Preview:
TRANSCRIPT
!"#$$%&'()*+,-./+(&)+&0123&
Zornitsa Kozareva!USC/ISI!
Marina del Rey, CA!kozareva@isi.edu!
www.isi.edu/~kozareva!
43(-3*5&678&69:;&
!"#$%!&'(&)*%"+,'-*+./+)%0*-%#+*12/34/%$+&256'6%
!/(&7%8&)&%9'+'+4%:*;1&-/%• <*22/=>*+%*0%.&=?'+/%2/&-+'+4%&24*-')?.6%
– *@/+A6*B-=/%@&=(&4/%1-'C/+%'+%D&,&%
• E6/3%0*-%-/6/&-=?F%/3B=&>*+%&+3%&@@2'=&>*+%
• 9&'+%0/&)B-/67%– 3&)&%@-/A@-*=/66'+4%)**26%– 2/&-+'+4%&24*-')?.6%– /,&2B&>*+%./)?*36%
– 4-&@?'=&2%'+0/-/+=/%– /+,'-*+./+)%0*-%=*.@&-'+4%2/&-+'+4%&24*-')?.6%
G%
!/(&7%8&)&%9'+'+4%:*;1&-/%
• <2&66'H=&>*+%&24*-')?.67%– %3/='6'*+%)-//6F%(IIF%:J9F%I&',/AK&5/6%
• L-/3'=>*+%&24*-')?.67%– -/4-/66'*+%M2'+/&-N:J9O%F%@/-=/@)-*+%
• 9/)&A&24*-')?.67%– K&44'+4F%K**6>+4%M$3&P**6)O%
&.*+4%*)?/-6%
Q%
R/S+4%:)&-)/3%
• T+6)&22%!/(&%6*;1&-/%M*+%U'+BVO7%
– 8*1+2*&3%2'+(7%%• ?C@7NN@-3*1+2*&36W6*B-=/0*-4/W+/)N1/(&N1/(&AQAXAGWY'@%• E+Y'@%)?/%6*;1&-/%
– Z/[B'-/./+)7%%%%%D&,&%\W]%M*-%?'4?/-O%
– T+,*(/%!/(&%=*..&+37%• ^&,&%A=@%1/(&W^&-%!"#$%&'())%*+,-
_%
]%
java -Xmx1000M -jar weka.jar Weka GUI Chooser
@relation named_entity
@attribute position numeric @attribute pos_tag { NN, NP, VB, DT} @attribute word_length numeric @attribute in_gazetteer { no, yes} @attribute class { B-PER, I-PER, B-LOC, I-LOC, O}
@data 3,DT,3,no,B-ORG 4,NP,10,yes,I-ORG 15,NP,6,yes,O 7, NN,12,?,B-PER ...
Data File Format (.arff)
Other attribute types:
• String
• Date
Missing value
X%
List of attributes (last: class variable)
Frequency and categories for the selected
attribute
Statistics about the values of the selected attribute
Classification
Filter selection
Manual attribute selection
Statistical attribute selection
Preprocessing
The Preprocessing Tab
Slide adapted from Marti Hearst
Choice of classifier
The attribute whose value is to be predicted from the values of the remaining ones.
Default is the last attribute.
Cross-validation: split the data into e.g. 10 folds and
10 times train on 9 folds and test on the remaining one
The Classification Tab
Slide adapted from Marti Hearst
Choosing a classifier
Slide adapted from Marti Hearst
Slide adapted from Marti Hearst
Slide adapted from Marti Hearst
all other numbers can be obtained from it
different/easy class
accuracy
Slide adapted from Marti Hearst
Running on Test Set
\Q%Slide adapted from Marti Hearst
!"#$%<*..&+3%U'+/%
\_%
!/(&%:@/='H=&>*+6%• -̀&'+%=2&66'H/-%*+%)-&'+'+4%3&)&%&+3%*B)@B)%.*3/2%
• ^&,&%A=@%1/(&W^&-%!'.%//01#2&34*'5(*,%a)%b62%0*&1.#,%%A3%!62%0*#+&)(+#.,-
• ZB+%)-&'+/3%=2&66'H/-%.*3/2%*+%)/6)%3&)&%• ^&,&%A=@%1/(&W^&-%!'.%//01#2&34*'5(*,%a`%b6#/6&1.#,%%A2%b62%0*#+&)(+#.,-
• :@/='05'+4%@&-&./)/-67%A)%7%)-&'+'+4%H2/%MW&-cO%A`%7%)/6)%H2/%MW&-cO%A3%7%*B)@B)%H2/+&./%M)-&'+/3%=2&66'H/-%.*3/2O%A2%7%'+@B)%.*3/2%M0*-%)/6>+4O%A#%7%+B.K/-%*0%+/&-/6)%+/'4?K*-6%0*-%(II%&24*-')?.%&7-8-7#.9-:'7#'$-(46-(67#2-9%2%)#6#2-(95(*/;-#6'<=-
general parameters
Classifier-specific
parameters
\]%
"V&.@2/7%$II%'+%!/(&%
• -̀&'+%&%=2&66'H/-%B6'+4%GII%&24*-')?.%• ^&,&%A=@%1/(&W^&-%%%%%%%%%%%%%%%%%%1/(&W=2&66'H/-6W2&Y5WTP(%%
%%%%%%%%%%%%A)%%3&)&N1/&)?/-W&-c-
%%%%%%%%%%%%A#%%G%
%%%%%%%%%%%%A3%%.*3/2WG++%
• ZB+%)?/%)-&'+/3%=2&66'H/-%*+%)/6)%3&)&%• ^&,&%A=@%1/(&W^&-%%%%%%%%%%%%%%%%%%1/(&W=2&66'H/-6W2&Y5WTP(%%
%%%%%%%%%%%%A`%%3&)&N1/&)?/-W&-c-
%%%%%%%%%%%%A2%%.*3/2WG++%
Classifier-function in weka
Training file Algorithm parameter Output model name
Classifier-function in weka Test file
Input model name
\X%
:&.@2/%!/(&%dB)@B)%
\e%
• <2&66'H=&>*+%2&K/26%0*-%/&=?%'+6)&+=/%MB6/%fa@%\g%*@>*+O%• ^&,&%A=@%1/(&W^&-%%%1/(&W=2&66'H/-6W2&Y5WTK(%%A`%%3&)&N1/&)?/-W&-c%%%A2%%.*3/2WG++%%A@%\%
9*-/%8/)&'2/3%dB)@B)%
\h%
• (II7%%• 8/='6'*+%)-//67%• I&i,/%P&5/67%• $3&P**6)7%%%
!/(&%<2&66'H=&>*+%jB+=>*+6%
\k%
$33'>*+&2%T+0*-.&>*+%
• R/+/-&2%3*=B./+)&>*+7%%%%%?C@7NN111W=6W1&'(&)*W&=W+YN.2N1/(&N%?C@7NN@-3*1+2*&36W6*B-=/0*-4/W+/)N1/(&N1/(&W@@)%
• <*..&+3%2'+/%3*=7%
%%%?C@7NN1/(&W1'('6@&=/6W=*.NL-'./-%
Gl%
$66'4+./+)%m\%
G\%
I&./3%"+>)5%Z/=*4+'>*+%
• R',/+7% &% )-&'+% &+3% 3/,/2*@./+)% 3&)&% 6/)6% *0%"+42'6?%6/+)/+=/6%)&44/3%1')?%)?/%=2&66/67%– PAL"ZF%TAL"Z%M@/*@2/O%– PAdZRF%TAdZR%M*-4&+'Y&>*+O%– PAUd<F%TAUd<%M2*=&>*+O%– PA9T:<F%TA9T:<%M.'6=/22&+/*B6O%– d%M*B)6'3/F%./&+'+4%+*)%&%+&./3%/+>)5%=2&66O%
• n*B-% *K^/=>,/% '67% )*% 3/,/2*@% &% .&=?'+/%2/&-+'+4%I"% 656)/.F%1?'=?%1?/+%4',/+%&%+/1%@-/,'*B625% B+6//+% )/V)% M'W/W% )/6)% 6/)O% 1'22%'3/+>05% &+3% =2&66'05% )?/% +&./3% /+>>/6%=*--/=)25%
GG%
8&)&%8/6=-'@>*+%• `?/% 3&)&% =*+6'6)6% *0% )?-//% =*2B.+6% 6/@&-&)/3% K5% &% 6'+42/%
6@&=/W%"&=?%1*-3%?&6%K//+%@B)%*+%&%6/@&-&)/%2'+/%&+3%)?/-/%'6%&+%/.@)5%2'+/%&;/-%/&=?%6/+)/+=/W%%
GQ%
EWIW%%IIL%PAdZR%%*o='&2%II%d%%
"(/B6%IIL%PAL"Z%?/&36%JPp%d%%0*-%TI%d%%
P&4?3&3%IIL%PAUd<%%W%W%d%%
1*-3%
@&-)A*0A6@//=?A)&4%
+&./3%/+>)5%)&4%
<=>?@A%./&+6%)?/%1*-3%'6%)?/%K/4'++'+4%*0%&%@?-&6/%*0%)5@/%`nL"%B%./&+6%)?/%1*-3%'6%+*)%@&-)%*0%&%@?-&6/%%
Make sure to preserve the empty lines in the output of the test data
Z/6B2)6%0*-%I"%8/)/=>*+%
G_%
!+CDD=6996&"E3(FGH&AI3J-3/+(&K3)3&
8&)&%6/)6% m)*(/+6% mI"6%
-̀&'+% GX_Fe\]% \hFek_%
8/,/2*@./+)% ]GFkGQ% _FQ]\%
/̀6)% ]\F]QQ% QF]]h%
!3**1*3G&1)&3JL86996& @*1.FGF+(& M1.3JJ& N=G.+*1&
PTd%3/,W% kGW_]% klWhh% k\WXX%
AI3J-3/+(&O13G-*1G&
Z/6B2)6%0*-%I"%8/)/=>*+%
G]%
!+CDD=6996&"E3(FGH&AI3J-3/+(&K3)3&
8&)&%6/)6% m)*(/+6% mI"6%
-̀&'+% GX_Fe\]% \hFek_%
8/,/2*@./+)% ]GFkGQ% _FQ]\%
/̀6)% ]\F]QQ% QF]]h%
!3**1*3G&1)&3JL86996& @*1.FGF+(& M1.3JJ& N=G.+*1&
PTd%3/,W% kGW_]% klWhh% k\WXX%
Z/6B2)6%0*-%I"%<2&66'H=&>*+q%
GX%
"E3(FGH&&K1IL&
@*1.FGF+(& M1.3JJ& N=G.+*1&
Ud<% ekWl_% hlWll% ekW]G%
9T:<% ]]W_h% ]_WX\% ]]Wl_%
dZR% ekW]e% eXWlX% eeWee%
L"Z% heW\k% hXWk\% heWl]%
*,/-&22% ekW\]% eeWhl% ehW_e%
"E3(FGH&&>1G)L&
@*1.FGF+(& M1.3JJ& N=G.+*1&
Ud<% h]WeX% ekW_Q% hGW_e%
9T:<% XlW\k% ]eWQ]% ]hWeQ%
dZR% h\WG\% hGW_Q% h\Wh\%
L"Z% h_We\% kQW_e% hhWhe%
*,/-&22% h\WQh% h\W_l% h\WQk%
:56)/.%*0%<&--/-&6%/)%&2WFGllG%
`'./2'+/%
Ge%
M1J13G1&
>*3F(PK1I1J+EQ1()&K3)3& 43(-3*5&67)H&69:;&
>1G)&K3)3& 43(-3*5&R)H&69:;&
M1G-J)&"-SQFGGF+(&K13,JF(1&
43(-3*5&T)H&69:;&U::%#7&EQ&VO>W&
J3)1*&G-SQFGGF+(G&XFJJ&(+)&S1&3..1E)1,&
>1.H(F.3J&M1E+*)&K13,JF(1&
43(-3*5&T)H&69:;&
:BK.')%• `?/%6*B-=/%=*3/%0*-%)?/%0/&)B-/%4/+/-&>*+%%%%%%%MQ321&G-*1&F)&XFJJ&*-(&-(,1*&DF(-YO%
• `?/%*o='&2%)-&'+%&+3%)/6)%0/&)B-/%H2/6%B6/3%'+%)?/%H+&2%-B+F%)*4/)?/-%1')?%)?/%H+&2%*B)@B)%*0%5*B-%656)/.%0*-%)?/%)/6)%3&)&%
• $33'>*+&225%4/+/-&)/3%-/6*B-=/6%M'0%&+5O%
• !-')/%r_%@&4/3%K-'/0%3/6=-'@>*+%*0%5*B-%&@@-*&=?%/V@2&'+'+47%– B6/3%IUL%)**26%– 3/6'4+/3%0/&)B-/6%– /.@2*5/3%.&=?'+/%2/&-+'+4%&24*-')?.s.*>,&>*+%
Gh%
",&2B&>*+%'6%K&6/3%*+%• -&+('+4%*0%5*B-%656)/.%&4&'+6)%)?/%-/6)%
• 3/6'4+/3%0/&)B-/6%– (+I1JF%@-/,'*B625%B+(+*1+%0/&)B-/6%1'22%K/%0&,*-/3%– 656)/.t6%@-/%*-%@*6)%@-*=/66'+4%
• Z1(1*3)1,&*1G+-*.1G%%– 6'Y/F%./)?*36%&+3%6*B-=/6%0*-%4&Y/C//-%/V)-&=>*+%
– )-'44/-%2'6)6%
• [B&2')5%*0%)?/%@&@/-%3/6=-'@>*+%– 6)-B=)B-/%– B6/%*0%2')/-&)B-/%%– 1**+*&3(3J5GFG&
Gk%
R/+/-&)/%n*B-%d1+%Z/6*B-=/6%• "V)-&=)%4&Y/C//-6%0-*.%!'('@/3'&%%
– L/*@2/%M6'+4/-6F%)/&=?/-6F%.&)?/.&>='&+6%/)=WO%– U*=&>*+6%M='>/6F%=*B+)-'/6O%– d-4&+'Y&>*+6%MB+',/-6'>/6F%T`%=*.@&+'/6%/)=WO%
• "V)-&=)%)-'44/-%1*-36%0-*.%!*-3I/)%– 2**(%0*-%?5@*+5.6%*0%@/-6*+F%2*=&>*+F%*-4&+'Y&>*+%
• "V)-&=)%&+3%-&+(%)?/%@&C/-+6%'+%1?'=?%)?/%I"6%*==B--/3%'+%)?/%)-&'+%&+3%3/,/2*@./+)%3&)&W%:?*1%1?&)%@/-=/+)&4/6%*0%)?/6/%1/-/%0*B+3%'+%)?/%H+&2%)/6)%3&)&W%
• "V)-&=)%2'6)6%*0%,/-K6%0*B+3%+/V)%)*%)?/%I"6W%8*%5*B%H+3%&+5%6'.'2&-')5N-/4B2&-')5%*0% )?/%,/-K6%&66*='&)/3%1')?%/&=?%*+/%*0%)?/%I"%=&)/4*-'/6u%
Ql%
!?&)%.B6)%T%3*%v%• E6/% )?/% )-&'+% &+3% 3/,/2*@./+)% 3&)&% )*% 3/6'4+%&+3%)B+/%5*B-%I"%656)/.%
• 8/='3/% *+% )?/% 0/&)B-/6% 5*B% 1*B23% 2'(/% )*%'+=*-@*-&)/%'+%5*B-%I"%656)/.%
• <?**6/%&%.&=?'+/%2/&-+'+4%=2&66'H/-%0-*.%!/(&%• %?C@7NN111W=6W1&'(&)*W&=W+YN.2N1/(&N%• T+)-*%K5%9&->%w/&-6)%?C@7NN=*B-6/6W'6=?**2WK/-(/2/5W/3BN'G]XN0lXN2/=)B-/6N2/=)B-/\XW@@)%
>HFG&FG&3&SFZ&3GGFZ(Q1()&G)3*)&13*J5[&Q\%
!?&)%5*B%.B6)%+*)%3*%v%
• \G1& 1YFG/(Z& (3Q1,& 1(/)5& G5G)1QUGW& +*& JFS*3*5&1F)H1*&3G&3&]13)-*1&Z1(1*3)+*8&+-)E-)&Z1(1*3)+*&1).L%%
• T0%5*B%3*F%)?/+%5*B%1'22%?&,/%)*%-B+%5*B-%656)/.%0*-%)1*%&33'>*+&2%2&+4B&4/6%:@&+'6?F%T)&2'&+%!%
QG%
$,&'2&K2/%Z/6*B-=/6%• !*-3I/)%?C@7NN1*-3+/)W@-'+=/)*+W/3BN%• L&-)A*0A6@//=?%)&44/-6%
– -̀// &̀44/-?C@7NN111W'.6WB+'A6)BC4&-)W3/N@-*^/()/N=*-@2/VN -̀// &̀44/-N8/='6'*+ -̀// &̀44/-W?).2%
– :)&+0*-3%L*:% &̀44/-%?C@7NN+2@W6)&+0*-3W/3BN6*;1&-/N)&44/-W6?).2%
• IL%=?B+(/-%– ?C@7NN111W3=6W6?/0W&=WB(Nr.&-(N'+3/VW?).2u?C@7NN111W3=6W6?/0W&=WB(Nr.&-(N@?3N6*;1&-/N=?B+(/-W?).2%
• L&-6/-%– :)&+0*-3%L&-6/-%?C@7NN+2@W6)&+0*-3W/3BN6*;1&-/N2/VA@&-6/-W6?).2%
• d)?/-%% %?C@7NN+2@W6)&+0*-3W/3BN2'+(6N6)&)+2@W?).2%
QQ%
R**3%UB=(x%
Q_%
top related