crisp 12 pre
TRANSCRIPT
8/20/2019 Crisp 12 Pre
http://slidepdf.com/reader/full/crisp-12-pre 1/81
ECML/PKDD-2003 Knowledge Discovery Standards Tutorial
Presented by: Sarab Anand (University of Ulster), Marko Grobelnik (Institute Jozef Stefan) and Dietrich Wettschereck (The Robert Gordon University)
Tuesday, 23 September 2003
ECML/PKDD 2003: KD-Standards Tutorial. S. Anand, M. Grobelnik, D. Wettschereck
Tutorial Objectives
  Overview of existing KD-standards
  Motivation for using KD-standards
  How do these standards relate to each other?
The Knowledge Discovery Process
[Figure: the knowledge discovery process annotated with the standards that cover it]
  Global view: CRISP-DM
  Data access: SQL interfaces
  Model generation: JDM, SQL/MM, OLE DB DM
  Model representation: PMML
Tutorial Outline
  Introduction
  CRISP-DM
  SQL interfaces for Data Mining
  Break
  Java Data Mining API
  Predictive Model Mark-up Language
  Examples
CRISP-DM: A Standard Process Model for Data Mining
http://www.crisp-dm.org/
What is CRISP-DM?
Cross-Industry Standard Process for Data Mining
Aim:
  To develop an industry, tool and application neutral process for conducting Knowledge Discovery
  Define tasks, outputs from these tasks, terminology and mining problem type characterization
Founding Consortium Members: DaimlerChrysler, SPSS and NCR
CRISP-DM Special Interest Group: about 200 members
  Management Consultants
  Data Warehousing and Data Mining Practitioners
Four Levels of Abstraction
Phases
  Example: Data Preparation
Generic Tasks
  A stable, general and complete set of tasks
  Example: Data Cleaning
Specialized Task
  How is the generic task carried out
  Example: Missing Value Handling
Process Instance
  Example: The mean value for numeric attributes and the most frequent value for categorical attributes was used
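The process-instance example on this slide (mean for numeric attributes, most frequent value for categorical ones) can be sketched as follows. This is a minimal illustration, not part of CRISP-DM itself; the function name `impute`, the column names and the sample rows are all assumptions made for the example.

```python
from statistics import mean, mode

def impute(rows, numeric, categorical):
    """Fill None values: mean for numeric columns, most frequent value for categorical ones."""
    filled = [dict(r) for r in rows]          # copy so the input stays untouched
    for col in numeric:
        fill = mean(r[col] for r in rows if r[col] is not None)
        for r in filled:
            if r[col] is None:
                r[col] = fill
    for col in categorical:
        fill = mode(r[col] for r in rows if r[col] is not None)
        for r in filled:
            if r[col] is None:
                r[col] = fill
    return filled

# Hypothetical customer records with missing values
rows = [
    {"age": 30, "gender": "f"},
    {"age": None, "gender": "m"},
    {"age": 50, "gender": None},
    {"age": 40, "gender": "f"},
]
cleaned = impute(rows, numeric=["age"], categorical=["gender"])
```

Documenting which rule was applied to which attribute is what turns the specialized task (missing value handling) into a recorded process instance.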
Phases of CRISP-DM
  Not linear, repeatedly backtracking
Business Understanding Phase
Understand the business objectives
  What is the status quo?
    Understand business processes
    Associated costs/pain
  Define the success criteria
  Develop a glossary of terms: speak the language
  Cost/Benefit Analysis
Current Systems Assessment
  Identify the key actors
    Minimum: The Sponsor and the Key User
  What forms should the output take?
  Integration of output with existing technology landscape
  Understand market norms and standards
Business Understanding Phase
Task Decomposition
  Break down the objective into sub-tasks
  Map sub-tasks to data mining problem definitions
Identify Constraints
  Resources
  Law, e.g. Data Protection
Build a project plan
  List assumptions and risk (technical/financial/business/organisational) factors
Data Understanding Phase
Collect Data
  What are the data sources?
    Internal and External Sources (e.g. Axiom, Experian)
    Document reasons for inclusion/exclusions
    Depend on a domain expert
  Accessibility issues
    Legal and technical
  Are there issues regarding data distribution across different databases/legacy systems?
    Where are the disconnects?
Data Understanding Phase II
Data Description
  Document data quality issues
  Requirements for data preparation
  Compute basic statistics
Data Exploration
  Simple univariate data plots/distributions
  Investigate attribute interactions
Data Quality Issues
  Missing Values
    Understand its source: Missing vs Null values
  Strange Distributions
Data Preparation Phase
Integrate Data
  Joining multiple data tables
  Summarisation/aggregation of data
Select Data
  Attribute subset selection
    Rationale for Inclusion/Exclusion
  Data sampling
    Training/Validation and Test sets
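As a rough sketch of the data sampling step, the records can be shuffled once and cut into training, validation and test sets. The 60/20/20 proportions, the fixed seed and the function name `split` are assumptions for illustration; the slides do not prescribe them.

```python
import random

def split(records, train=0.6, validation=0.2, seed=0):
    """Shuffle once with a fixed seed, then cut into train/validation/test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],          # whatever remains is the test set
    )

# 100 hypothetical record ids
train_set, validation_set, test_set = split(range(100))
```

A fixed seed keeps the partition reproducible, which makes the documented rationale for inclusion and exclusion easier to audit later.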
Data Preparation Phase II
Data Transformation
  Using functions such as log
  Factor/Principal Components analysis
  Normalization/Discretisation/Binarisation
Clean Data
  Handling missing values/Outliers
Data Construction
  Derived Attributes
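Three of the transformations named on this slide are easy to show in a few lines. The sketch below assumes min-max scaling for normalisation and one-hot encoding for binarisation; both are common readings of those terms, but the slide does not say which variants the authors had in mind, and the helper names are made up for the example.

```python
import math

def log_transform(values):
    """Natural log of strictly positive values, e.g. to compress a skewed income scale."""
    return [math.log(v) for v in values]

def min_max(values):
    """Rescale to [0, 1]; assumes at least two distinct values."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def binarise(values, categories):
    """One 0/1 indicator per category (one-hot encoding)."""
    return [[1 if v == c else 0 for c in categories] for v in values]

scaled = min_max([10, 20, 40])
flags = binarise(["a", "b", "a"], categories=["a", "b"])
```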
The Modelling Phase
Selection of the appropriate modelling technique
  Data pre-processing implications
    Attribute independence
    Data types/Normalisation/Distributions
  Dependent on
    Data mining problem type
    Output requirements
Develop a testing regime
  Sampling
    Verify samples have similar characteristics and are representative of the population
The Modelling Phase
Build Model
  Choose initial parameter settings
  Study model behaviour
    Sensitivity analysis
Assess the model
  Beware of over-fitting
  Investigate the error distribution
    Identify segments of the state space where the model is less effective
  Iteratively adjust parameter settings
    Document reasons for these changes
The Evaluation Phase
Validate Model
  Human evaluation of results by domain experts
  Evaluate usefulness of results from business perspective
    Define control groups
    Calculate lift curves
    Expected Return on Investment
Review Process
Determine next steps
  Potential for deployment
  Deployment architecture
  Metrics for success of deployment
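A lift curve compares the response rate among the highest-scored customers with the overall response rate. The sketch below computes cumulative lift per bucket; the function name, the bucket count and the toy data are assumptions for illustration, and it expects at least as many records as buckets and at least one responder.

```python
def cumulative_lift(scored, buckets=10):
    """scored: (model score, responded 0/1) pairs. Returns cumulative lift per bucket."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    base_rate = sum(responded for _, responded in ranked) / len(ranked)
    lifts = []
    for b in range(1, buckets + 1):
        top = ranked[: round(len(ranked) * b / buckets)]
        rate = sum(responded for _, responded in top) / len(top)
        lifts.append(rate / base_rate)
    return lifts

# Ten hypothetical customers; only the two best-scored ones responded
scored = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 0), (0.5, 0),
          (0.4, 0), (0.3, 0), (0.2, 0), (0.1, 0), (0.0, 0)]
lifts = cumulative_lift(scored, buckets=5)
```

The last bucket always has lift 1, because it covers the whole population; a model with no predictive value stays near 1 throughout.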
The Deployment Phase
Knowledge Deployment is specific to objectives
  Knowledge Presentation
  Deployment within Scoring Engines and Integration with the current IT infrastructure
    Automated pre-processing of live data feeds
    XML interfaces to 3rd party tools
  Generation of a report
    Online/Offline
  Monitoring and evaluation of effectiveness
Process deployment/production
Produce final project report
  Document everything along the way
Microsoft OLE DB for DM
Extension of Microsoft Analysis Services for Data Mining
What is OLE DB for Data Mining?
OLE DB for DM is Microsoft's extension of the Analysis Server product for covering DM functionality
  It is closely connected to MS OLAP Server
  Works within the SQL Server database suite
It defines DM at several levels:
  Extensions of the SQL language for describing DM tasks
  API in the form of a COM interface for:
    (1) Programming DM clients within applications
    (2) Programming DM providers (server side components) for including new DM algorithms
  Uses PMML for model description
Architecture of a solution using OLE DB for DM technology
[Figure: layered architecture diagram. Components shown:]
  End-User Application: MS Excel / MS Site Server / MS Commerce Server
  MS Analysis Server, with Decision Trees Component and Clustering Component
  Database Systems: MS SQL Server, MS OLAP Server, Oracle, DB2, ...
  Connections between the components are labelled "OLE DB for DM" and "OLE DB"
What are key DM tasks?
Key DM tasks covered by OLE DB for DM are:
  Predictive Modeling (Classification)
  Segmentation (Clustering)
  Association (Data Summarization)
  Sequence and Deviation Analysis
  Dependency Modeling
Defining a domain - Creating Mining Model Object
Using an OLE DB command object, the client executes a CREATE statement that is similar to a CREATE TABLE statement:
CREATE MINING MODEL [Age Prediction]
(
  [Customer ID] LONG KEY,
  [Gender] TEXT DISCRETE,
  [Age] DOUBLE DISCRETIZED() PREDICT,
  [Product Purchases] TABLE
  (
    [Product Name] TEXT KEY,
    [Quantity] DOUBLE NORMAL CONTINUOUS,
    [Product Type] TEXT DISCRETE RELATED TO [Product Name]
  )
)
USING [Decision Trees]
Inserting Training Data into Model
In a manner similar to populating an ordinary table, the client uses a form of the INSERT INTO statement.
Note the use of the SHAPE statement to create the nested table.
INSERT INTO [Age Prediction]
(
  [Customer ID], [Gender], [Age],
  [Product Purchases](SKIP, [Product Name], [Quantity], [Product Type])
)
SHAPE
  {SELECT [Customer ID], [Gender], [Age] FROM Customers ORDER BY [Customer ID]}
APPEND (
  {SELECT [CustID], [Product Name], [Quantity], [Product Type] FROM Sales ORDER BY [CustID]}
  RELATE [Customer ID] To [CustID]
)
AS [Product Purchases]
Using Models to make Predictions
Predictions are made with a SELECT statement that joins the model's set of all possible cases with another set of actual cases.
SELECT t.[Customer ID], [Age Prediction].[Age]
FROM [Age Prediction]
PREDICTION JOIN (
  SHAPE
    {SELECT [Customer ID], [Gender] FROM Customers ORDER BY [Customer ID]}
  APPEND (
    {SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER BY [CustID]}
    RELATE [Customer ID] To [CustID]
  )
  AS [Product Purchases]
) as t
ON [Age Prediction].Gender = t.Gender and
   [Age Prediction].[Product Purchases].[Product Name] = t.[Product Purchases].[Product Name] and
   [Age Prediction].[Product Purchases].[Quantity] = t.[Product Purchases].[Quantity]
Association Rules
The following statement creates a data mining model to find out those products which sell together based on an association algorithm. The model is interested only in rules with at least five items:
Create Mining Model MyAssociationModel (
  Transactionid long key,
  [Product purchases] table predict (
    [Product Name] text key
  )
)
Using [My Association Algorithm] (Minimum_size = 5)
Training an association model is exactly the same as training a tree model or a clustering model.
To get all the association rules discovered by the algorithm, run the following statement:
Select * from MyAssociationModel.content
Regression Analysis
By using a regression algorithm, the following mining model predicts loan risk level based on age, income, homeowner, and marital status:
Create Mining Model MyRegressionModel (
  Customerid long key,
  Age long continuous,
  Homeowner boolean discrete,
  Maritalstatus Boolean discrete,
  LoanriskLEVEL continuous predict
)
Using [My Regression Algorithm]
The following statement returns all the coefficients of the regression:
Select * from MyRegressionModel.content
Visual Basic example using the OLE DB for DM Clustering component
Dim ClusterConnection As New ADODB.Connection
ClusterConnection.Provider = "MSDMine"
DMMName = "[CollPlanDMM]"
DataFileName = ".\CollegePlan.mdb"
ClusterConnection.ConnectionString = "location=localhost;" & _
    "initial catalog=[FoodMart 2000];"
ClusterConnection.Open
ClusterConnection.Execute "CREATE MINING MODEL [ClusterModel]" & _
    "([Student Id] LONG KEY, [College Plans] TEXT DISCRETE PREDICT," & _
    " [Gender] TEXT DISCRETE PREDICT, [Iq] LONG CONTINUOUS PREDICT," & _
    " [Parent Encouragement] TEXT DISCRETE PREDICT, [Parent Income] LONG CONTINUOUS PREDICT)" & _
    " USING Microsoft_Clustering"
...
XMLA - XML for Analysis
http://xmla.org/
What is XML for Analysis?
XML for Analysis is a set of XML Message Interfaces that use the industry standard SOAP to define the data access interaction between a client application and an analytical data provider (OLAP and Data Mining) working over the Internet.
What are the benefits of XMLA?
Customers will gain the ability to protect server and tools investments and ensure that new analytical deployments will interoperate and work cooperatively.
Developers will gain the ability to leverage existing developer skills and to use open access XML-based Web services, eliminating the need to program to multiple APIs and query languages.
Independent software vendors will be able to reduce complexity and costs for development and maintenance by writing to a single access interface.
History of XMLA
[Figure: timeline from 2000 to 2003; month labels on the axis: Apr, Nov, May, Apr, Apr, Sep, Mar. Events shown:]
  Hyperion & Microsoft Announce Co-Sponsorship of XMLA Specification
  Microsoft Releases SDK
  Version 1.0 Released
  SAS Joins Council
  First XMLA Council Meeting (creation of SIG teams)
  Version 1.1 Released
  Interoperate Workshop I
  Second XMLA Council Meeting
  Interoperate Workshop II
  1st Public XMLA InterOperability Demonstration (TDWI)
  Version 1.2 (TBD)
Example of XMLA SOAP Request
The following is an example of an Execute method call with <Statement> set to an OLAP MDX SELECT statement:
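The request body itself did not survive in this transcript. As a rough sketch of what such a call looks like, the code below builds a SOAP envelope with an Execute element containing Command/Statement and Properties/PropertyList, which is the layout used by XMLA 1.x as far as we recall it. The MDX statement, the data source and catalog names, and the helper `build_execute` are hypothetical and are not taken from the slide.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"

def build_execute(statement, properties):
    """Return a SOAP envelope (as text) wrapping an XMLA Execute call."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA_NS}}}Execute")
    command = ET.SubElement(execute, f"{{{XMLA_NS}}}Command")
    ET.SubElement(command, f"{{{XMLA_NS}}}Statement").text = statement
    props = ET.SubElement(execute, f"{{{XMLA_NS}}}Properties")
    plist = ET.SubElement(props, f"{{{XMLA_NS}}}PropertyList")
    for name, value in properties.items():
        ET.SubElement(plist, f"{{{XMLA_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="unicode")

# Hypothetical MDX query and provider settings
MDX = "SELECT Measures.MEMBERS ON COLUMNS FROM Sales"
request = build_execute(MDX, {
    "DataSourceInfo": "Provider=MSOLAP;Data Source=local",
    "Catalog": "FoodMart 2000",
    "Format": "Multidimensional",
})
```

The resulting text would be posted over HTTP to the provider's endpoint; the response comes back as another SOAP envelope carrying the result set.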
Example of XMLA SOAP Response
This is the abbreviated response for the preceding method call:
What Provider Vendors Support XMLA?
What Consumer & Consulting Vendors Are/Will Support XMLA?
BREAK
JDM: The Java API for Data Mining
Objective
To develop a Java API that supports
  Building of models
  Scoring of data using models
  Creation, storage, access and maintenance of data and metadata supporting data mining results
To provide for data mining systems what JDBC(TM) did for relational databases
  Implementers of data mining applications can expose a single, standard API understood by a wide variety of client applications and components
  Data Mining clients can be coded against a single API that is independent of the underlying data mining system / vendor
Approach and Development
Leverages other related standards
  PMML (DMG)
  CWM (OMG)
  SQL/MM (ISO)
  JCX (JSR-16)
  JMI (JSR-40)
  JOLAP (JSR-69)
  CRISP-DM
  OLE DB DM
Public Draft released in July, 2002
Currently work is continuing on the final draft
Related Standards
PMML (DMG)
  Representation of data mining models for inter-vendor exchange
  Format: DTD/XML
CWM DM (OMG)
  Object model for representing data mining metadata: models, model results
  Format: UML/DTD/XML
SQL/MM Pt. 6 DM
  SQL objects for defining, creating, and applying data mining models, and obtaining their results
  Format: SQL
OLE DB for DM
  SQL-like interface for data mining operations
  Format: OLE DB/SQL
JDM (JSR-073)
  Java API for defining, creating, applying, and obtaining the results of data mining models
  Format: Java
The Expert Group
  Mark Hornick, Oracle (Lead)
  BEA Systems
  Computer Associates
  CorporateIntellect
  CalTech
  Fair Isaac
  Hyperion
  IBM
  KXEN
  Quadstone
  SAP
  SAS
  SPSS
  Strategic Analytics
  Sun Microsystems
  University of Ulster
Use Case
A programmer is tasked with development of a target marketing tool that allows the user to
  Choose a target campaign
  E-mail a random sample of the customers
  Build a model based on the responses
  Apply the model to improve the targeting of the campaign
Using JDM (for the 3rd and 4th tasks) the programmer
  Defines the target data for the modelling using the Physical and Logical Data Classes
  Uses the Classification Function Settings class to set default parameters for the learning task
  Creates a build task that generates and persists the model
  Creates an apply task that applies the model to select the campaign targets
  Minimises risk associated with a change in the data mining vendor by using the standard JDM interface
How will it work?
JDM defines a set of interfaces for
Defining the data to be used in the mining
  Physical/Logical Data
Defining the data mining parameters
  Function settings
    Support for Novice Users
  Algorithm settings
    Expert User
    Algorithm specific settings
Performing Tasks
  Executing a data mining algorithm
  Importing/Exporting to PMML
  Testing the knowledge
  Applying the knowledge on new data
    Batch and Real-time Scoring
  Compute Statistics
Interrogating the resulting knowledge
Persistence of all MetaData/Data
Typical Architecture
[Figure: typical JDM architecture. Components shown:]
  JDM
  Corporate Warehouse
  MetaData Repository
  Proprietary Data Mining Engine 1 (with MetaData Repository)
  Proprietary Data Mining Engine 2
  ...
Uses Factory Classes. Hence, Service Provider Classes need not be made public
Conformance Rules for Service Providers
"A la carte" approach to functions and algorithms supported
  Vendors implement functions and algorithms that their products support
  At least one function must be supported
All core packages must be supported
All methods within an implemented class must be implemented
  Semantics specified for each method must be implemented to ensure common interpretation of a given result
Must support J2EE and/or J2SE
Extension may be done through subclassing
Data Mining Functions Supported
  Classification
  Regression
  Attribute Importance
  Clustering
  Association Rules
Algorithms Supported
  Naïve Bayes
  Decision Trees
  Feed Forward Neural Networks
  Support Vector Machines
  K-Means
Code Example (1)
// Get a connection
ConnectionSpec connSpec = (javax.datamining.resource.ConnectionSpec) dmeCFactory.getConnectionSpec();
connSpec.setName( "user1" );
connSpec.setPassword( "pswd" );
connSpec.setURI( "myDME" );
javax.datamining.resource.Connection dmeConn = dmeCFactory.getConnection( connSpec );
// Create and populate the PhysicalData object - define the data to be used
PhysicalDataSetFactory pdsFactory = (PhysicalDataSetFactory) dmeConn.getFactory( "javax.datamining.data.PhysicalDataSet" );
PhysicalDataSet pd = pdsFactory.create( "minivan.data" );
pd.importMetaData();
dmeConn.saveObject( "myPD", pd );
// Create LogicalData object
LogicalDataFactory ldFactory = (LogicalDataFactory) dmeConn.getFactory( "javax.datamining.data.LogicalData" );
LogicalData ld = ldFactory.create( pd );
// Specify how attributes should be used
LogicalAttribute income = ld.getAttribute( "income" );
income.setAttributeType( AttributeType.numerical );
Code Example (2)
// Create the FunctionSettings for Classification
ClassificationSettingsFactory cfsFactory = (ClassificationSettingsFactory) dmeConn.getFactory( "javax.datamining.supervised.classification.ClassificationSettings" );
ClassificationSettings settings = cfsFactory.create();
settings.setTargetAttributeName( "buyMinivan" );
settings.setCostMatrix( costs ); // predefined cost matrix
// Create the AlgorithmSettings and add it to the FunctionSettings
NaiveBayesSettingsFactory nbFactory = (NaiveBayesSettingsFactory) dmeConn.getFactory( "javax.datamining.algorithm.naivebayes.NaiveBayesSettings" );
NaiveBayesSettings nbSettings = nbFactory.create();
nbSettings.setSingletonThreshold( .01 );
nbSettings.setPairwiseThreshold( .01 );
// Associate LD and AS with the FunctionSettings
settings.setAlgorithmSettings( nbSettings );
settings.setLogicalData( ld );
dmeConn.saveObject( "myFS", settings );
Code Example (3)
// Create the build task
BuildTaskFactory btFactory = (BuildTaskFactory) dmeConn.getFactory( "javax.datamining.task.BuildTask" );
BuildTask buildTask = btFactory.create( "myPD", "myFS", "myModel" );
VerificationReport report = buildTask.verify();
if( report != null ) { // either error or warning
  ReportType reportType = report.getReportType(); // check if it's just a warning or an error
} else {
  dmeConn.saveObject( "myBuildTask", buildTask );
  // Execute the task and block until finished
  ExecutionHandle handle = dmeConn.execute( "myBuildTask" );
  handle.waitForCompletion( null ); // wait without timeout until done
  // Access the model
  ClassificationModel model = (ClassificationModel) dmeConn.getObject( "myModel", NamedObject.model );
}
// Close the connection
dmeConn.close();
PMML: The Predictive Model Markup Language
http://www.dmg.org
Predictive Model Mark-up Language (PMML)
Industry led standard for representing the output of data mining
Supported by
  Full Members: IBM, Oracle, Magnify, SPSS, SAS, StatSoft, Microsoft, CorporateIntellect, KXEN, Salford Systems
  Numerous Associated Members
Objective
  Define and share predictive models using an open standard
Rationale
Complex mosaic of software applications
  Knowledge generators
    Data Mining Vendors
      Different data mining algorithms have different languages for expressing the knowledge discovered
      Vendor dependent representations for knowledge, e.g. C/C++ routines
  Knowledge consumers
    Real-time Scoring / Personalisation engines
    Marketing Tools
    Visualisation Tools
Need for a vendor independent representation of data mining output
PMML
Benefits
  Proprietary issues and incompatibilities no longer a barrier to the exchange of models between applications
  Based on XML
  Develop models using any generator vendor, deploy the models using any consumer vendor application
Development
  Current Release 2.1
  Supported by most current releases of member vendors' applications
PMML Document
Basic XML structure
DOCTYPE declaration not required
A PMML document must
  be a valid XML document
  obey PMML conformance rules
Root element <PMML>
  6 child elements
    2 required
      Header
      Data Dictionary
    4 optional
<?xml version="1.0" ?>
<!DOCTYPE PMML PUBLIC "PMML 2.0" "http://www.dmg.org/v2-0/pmml_v2_0.dtd">
<PMML version="2.0" >
  <Header />
  <MiningBuildTask />
  <DataDictionary />
  <TransformationDictionary />
  <SequenceMiningModel />
  <Extension />
</PMML>
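The rule that a PMML document must contain a Header and a Data Dictionary can be checked mechanically. The sketch below only looks at the direct children of the root element; it assumes a DTD-era document without XML namespaces, and the helper `missing_required` is a made-up name, not part of any PMML toolkit. It is far from a full conformance check.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("Header", "DataDictionary")

def missing_required(pmml_text):
    """Return the required child elements that are absent from a PMML document."""
    root = ET.fromstring(pmml_text)
    if root.tag != "PMML":
        raise ValueError("root element must be PMML")
    present = {child.tag for child in root}
    return [name for name in REQUIRED if name not in present]

# Skeleton document in the shape shown on the slide
SKELETON = """<PMML version="2.0">
  <Header/>
  <MiningBuildTask/>
  <DataDictionary/>
  <TransformationDictionary/>
</PMML>"""
problems = missing_required(SKELETON)
```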
Header
Attributes
  copyright
  description
Elements
  Application (that generated the PMML)
    Name: Capri
    Version: 2.0
  Annotation
    Free text
  TimeStamp
    Date/Time of model creation
Header (2)
<?xml version="1.0" ?>
<PMML version="1.0" >
  <Header copyright="CorporateIntellect" description="Results of CAPRI" >
    <Application name="CORAL" version="3.0" />
    <Annotation>This is a PMML document with results from the CAPRI run on commodity market data.</Annotation>
    <Timestamp>2003-03-02 9:30:00 GMT +00:00</Timestamp>
  </Header>
  . . .
</PMML>
Mining Build Task
May contain any XML value describing the configuration of the training run that produced the model
Information provided in this element is essentially meta-data
  Not used specifically in the deployment of the model by the PMML consumer
Specific content structure not defined in PMML
Data Dictionary
Attributes
  Number of Fields
    Aids consistency checks
Elements
  DataField
    Attributes
      Name
      displayName
      type: categorical/ordinal/continuous
        Defines legal operations on the field values
      Taxonomy
        Name of taxonomy that defines a hierarchy on the values
      isCyclic
Data Dictionary (2)
Elements
  Value
    Defines domain for ordinal and categorical attributes
    value
    displayValue
    property: valid/ invalid/ missing
  Interval
    Defines the range of valid values for continuous fields
    closure: openClosed, closedOpen, openOpen, closedClosed
    leftMargin
    rightMargin
  Taxonomy
    Defines hierarchies on specific fields within the data dictionary
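The four closure values describe whether each end of an Interval belongs to the range. A small sketch of the membership test, assuming that a missing margin means the range is unbounded on that side; the function name `in_interval` is ours.

```python
def in_interval(x, closure, left=None, right=None):
    """True if x lies in the interval described by closure, leftMargin and rightMargin."""
    left_closed = closure.startswith("closed")    # closedOpen, closedClosed
    right_closed = closure.endswith("Closed")     # openClosed, closedClosed
    if left is not None and (x < left or (x == left and not left_closed)):
        return False
    if right is not None and (x > right or (x == right and not right_closed)):
        return False
    return True

# An age field limited to [0, 150], both ends included
valid_age = in_interval(150, "closedClosed", left=0, right=150)
```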
Data Dictionary (3)
Attributes
  name: associates the taxonomy with the appropriate field within the data dictionary (see DataField attribute taxonomy)
Elements
  ChildParent
    Attributes
      childField: name of field within the table (see Elements below) that represents the child value
      parentField: name of field within the table (see Elements below) that represents the parent value
      parentLevelField: name of field within the table (see Elements below) that represents the level in the hierarchy
      isRecursive: Yes/No: whether the whole hierarchy is defined in the same table or in an individual table per level
    Elements
      Inline Table/Table Locator
DataDictionary complete
<?xml version="1.0" ?>
<PMML version="1.0" >
  <Header />
  <DataDictionary numberOfFields="3" >
    <DataField name="Type" optype="categorical">
      <Value value="BU" />
      <Value value="H"/>
      <Value value="C"/>
    </DataField>
    <DataField name="Age" optype="continuous">
      <Interval closure="closedClosed" leftMargin="0" rightMargin="150"/>
    </DataField>
    <DataField name="PostCode" optype="categorical" taxonomy="Location" />
    <Taxonomy name="Location">
      . . .
    </Taxonomy>
  </DataDictionary>
  . . .
</PMML>
Taxonomy Example
<Taxonomy name="Location">
  <ChildParent childColumn="Post Code" parentColumn="District">
    <TableLocator x-dbname="myDB" x-tableName="PostCodeDistrict" />
  </ChildParent>
  <ChildParent childColumn="member" parentColumn="group" isRecursive="yes">
    <InlineTable>
      <Extension extender="MySystem">
        <row member="W9" group="CentralLondon"/>
        <row member="N4" group="NorthLondon"/>
        <row member="N2" group="NorthLondon"/>
        <row member="W1" group="CentralLondon"/>
        <row member="CentralLondon" group="London"/>
        <row member="NorthLondon" group="London"/>
        <row member="EastLondon" group="London"/>
        <row member="London" group="England"/>
        ....
      </Extension>
    </InlineTable>
  </ChildParent>
</Taxonomy>
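With isRecursive set to yes, a single member/group table holds every level of the hierarchy, so the ancestors of a value are found by following the group column repeatedly. The sketch below uses a handful of rows in the shape of the example above; the dictionary `PARENT` and the function `ancestors` are illustrative names only.

```python
# member -> group, in the shape of the recursive inline table
PARENT = {
    "N2": "NorthLondon",
    "CentralLondon": "London",
    "NorthLondon": "London",
    "EastLondon": "London",
    "London": "England",
}

def ancestors(member, parent=PARENT):
    """Follow member -> group links upwards; refuses to loop on cyclic data."""
    chain, seen = [], {member}
    while member in parent:
        member = parent[member]
        if member in seen:
            raise ValueError("taxonomy contains a cycle")
        seen.add(member)
        chain.append(member)
    return chain

path = ancestors("N2")
```

A consumer can use such a chain to roll a post code up to district, city or country level before mining.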
Transformation Dictionary
Defines mapping of source data values to values more suited for use by the mining algorithm
PMML supports
  Normalization: map values to numbers, the input can be continuous or discrete.
  Discretization: map continuous values to discrete values.
  Value mapping: map discrete values to discrete values.
  Aggregation: summarize or collect groups of values, e.g. compute average
Transformation Dictionary (2)
TransformationDictionary
  DerivedField Elements
    Attributes
      name
      displayName
    Elements
      Expression (one of the following)
        NormContinuous
        NormDiscrete
        Discretize
        MapValues
        Aggregate
Transformation Dictionary (3)
<DerivedField name="normalAge">
  <NormContinuous field="age">
    <LinearNorm orig="45" norm="0"/>
    <LinearNorm orig="82" norm="0.5"/>
    <LinearNorm orig="105" norm="1"/>
  </NormContinuous>
</DerivedField>
<DerivedField name="male">
  <NormDiscrete field="marital status" value="m"/>
</DerivedField>
<DerivedField name="female">
  <NormDiscrete field="marital status" value="f"/>
</DerivedField>
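NormContinuous describes a piecewise linear mapping: each LinearNorm pair is a knot, and values between two knots are interpolated. A minimal sketch using the three knots from the normalAge example (45 to 0, 82 to 0.5, 105 to 1). Extending the first and last segments for values outside the knots is an assumption of this sketch; a PMML consumer may treat such outliers differently.

```python
from bisect import bisect_right

# (orig, norm) knots from the LinearNorm elements
POINTS = [(45, 0.0), (82, 0.5), (105, 1.0)]

def norm_continuous(x, points=POINTS):
    """Piecewise linear interpolation through the (orig, norm) knots."""
    origs = [orig for orig, _ in points]
    # index of the segment to use, clamped to the first/last segment
    i = min(max(bisect_right(origs, x) - 1, 0), len(points) - 2)
    (o1, n1), (o2, n2) = points[i], points[i + 1]
    return n1 + (x - o1) * (n2 - n1) / (o2 - o1)

normal_age = norm_continuous(63.5)   # halfway between the first two knots
```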
Transformation Dictionary (4)
<DerivedField name="binnedProfit">
  <Discretize field="Profit">
    <DiscretizeBin binValue="negative">
      <Interval closure="openOpen" rightMargin="0"/>
    </DiscretizeBin>
    <DiscretizeBin binValue="positive">
      <Interval closure="closedOpen" leftMargin="0"/>
    </DiscretizeBin>
  </Discretize>
</DerivedField>
<DerivedField name="houseType">
  <MapValues outputColumn="longForm">
    <FieldColumnPair field="Type" column="shortForm"/>
    <InlineTable>
      <Extension>
        <row><shortForm>BU</shortForm><longForm>bungalow</longForm></row>
        <row><shortForm>H</shortForm><longForm>house</longForm></row>
        <row><shortForm>C</shortForm><longForm>cottage</longForm></row>
      </Extension>
    </InlineTable>
  </MapValues>
</DerivedField>
<DerivedField name="itemsBought">
  <Aggregate field="item" function="multiset" groupField="transaction"/>
</DerivedField>
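The three derived fields above (binnedProfit, houseType, itemsBought) correspond to a discretization, a value mapping and an aggregation. A minimal sketch of what each one computes is given below; the function names and the (transaction, item) row layout are hypothetical, and returning None for an unmapped short form is an assumption rather than something the slide states.

```python
from collections import defaultdict

# Short form to long form, as listed in the InlineTable of the houseType example.
HOUSE_TYPES = {"BU": "bungalow", "H": "house", "C": "cottage"}


def binned_profit(profit):
    """Discretize: open interval below 0 is 'negative', closed-open from 0 is 'positive'."""
    return "negative" if profit < 0 else "positive"


def house_type(short_form):
    """MapValues: look up the long form; unmapped values give None (an assumption)."""
    return HOUSE_TYPES.get(short_form)


def items_bought(rows):
    """Aggregate with function 'multiset': collect the items of each transaction."""
    groups = defaultdict(list)
    for transaction, item in rows:
        groups[transaction].append(item)
    return dict(groups)


if __name__ == "__main__":
    print(binned_profit(-3.5), binned_profit(0))
    print(house_type("BU"))
    print(items_bought([(1, "milk"), (1, "bread"), (2, "milk"), (1, "milk")]))
```

Note that the multiset keeps duplicates, so an item bought twice in one transaction appears twice in its group.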
The PMML Document
[Figure: structure of a PMML document. Labels: Data Dictionary, Transformation Dictionary, Data, and the models Model1, Model2, ..., Modelk, each with its own Mining Schema and ModelStatistics.]
Mining Schema Elements
MiningField
Attributes:
name
usageType: active / predicted / supplementary
outliers: asIs / asMissingValues / asExtremeValues
lowValue
highValue
missingValueReplacement
missingValueTreatment: asIs / asMean / asMode / asMedian / asValue
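One way to read the outliers, lowValue, highValue and missingValueReplacement attributes is as a small preprocessing step applied to each input value. The sketch below is an illustration under stated assumptions, not the behaviour of any particular PMML consumer: the function name is hypothetical, None stands for a missing value, and applying outlier handling before missing-value replacement is an assumption of this sketch.

```python
def treat_value(value, outliers="asIs", low=None, high=None, replacement=None):
    """Apply outlier handling, then missing-value replacement.

    The order of the two steps is an assumption of this sketch.
    """
    if value is not None and low is not None and high is not None:
        is_outlier = value < low or value > high
        if is_outlier and outliers == "asMissingValues":
            value = None  # treat the outlier as if it were missing
        elif is_outlier and outliers == "asExtremeValues":
            value = min(max(value, low), high)  # clamp to [lowValue, highValue]
    if value is None:
        value = replacement  # stays None when no replacement is configured
    return value


if __name__ == "__main__":
    print(treat_value(120, "asExtremeValues", low=0, high=100))
    print(treat_value(120, "asMissingValues", low=0, high=100, replacement=50))
    print(treat_value(None, replacement=7))
```

In this reading missingValueTreatment only records how the replacement value was derived (mean, mode, median or a fixed value); the sketch therefore takes the replacement as given.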
MiningSchema
<?xml version="1.0"?>
<PMML version="1.0">
  <Header />
  <DataDictionary />
  <SequenceModel functionName="sequences" algorithmName="Capri2" minimumSupport="2.17" minimumConfidence="0.00" numberOfItems="5" numberOfSets="5" numberOfSequences="11" numberOfRules="3">
    <Extension name="orderby" value="none"/>
    <MiningSchema>
      <MiningField name="Price" usageType="predicted" />
      <MiningField name="location" usageType="active" />
      <MiningField name="bedrooms" usageType="active" />
      <MiningField name="houseType" usageType="active" />
      <MiningField name="Area" usageType="supplementary" />
    </MiningSchema>
  </SequenceModel>
</PMML>
Model Statistics Elements
UnivariateStatistics
Attributes: Field
Elements:
Discrete Statistics
Continuous Statistics
Counts: Valid, Invalid and Missing counts
NumericInfo: min / max / mean / standard deviation / median / interQuartileDistance
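The Counts and NumericInfo entries above can be computed for a single field roughly as follows. This is a sketch using only the Python standard library; the function name and the dictionary layout are hypothetical, None stands for a missing value, any non-numeric value is counted as invalid, and the use of the sample standard deviation and of the default quartile method of `statistics.quantiles` are assumptions.

```python
import statistics


def univariate_stats(values):
    """Summarise one field: validity counts plus basic numeric statistics."""
    missing = [v for v in values if v is None]
    valid = [v for v in values if isinstance(v, (int, float))]
    invalid = len(values) - len(missing) - len(valid)
    # Quartiles with the default method of statistics.quantiles (Python 3.8+).
    q1, _, q3 = statistics.quantiles(valid, n=4)
    return {
        "counts": {"valid": len(valid), "invalid": invalid, "missing": len(missing)},
        "numeric_info": {
            "min": min(valid),
            "max": max(valid),
            "mean": statistics.mean(valid),
            "standard_deviation": statistics.stdev(valid),  # sample std. dev.
            "median": statistics.median(valid),
            "inter_quartile_distance": q3 - q1,
        },
    }


if __name__ == "__main__":
    print(univariate_stats([1, 2, 3, 4, 5, None, "n/a"]))
```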
Supported Data Mining Models
Tree Model
Neural Networks
Clustering Model
Regression Model
General Regression Model
Naïve Bayes Model
Association Rules
Sequence Rule Model
Sequence Model
Represents the output of Sequence Rule Mining
Attributes: modelName, functionName, algorithmName, numberOfTransactions, minimumSupport, minimumConfidence, lengthLimit, ...
Elements:
Sequence Rule
  Elements: Antecedent Sequence (sequenceReference), Consequent Sequence, Delimiter
Sequence
  Elements: SetReference, Delimiter
Set Predicate
  Array
<SequenceModel functionName="sequences" numberOfTransactions="100" minimumSupport="0.20" minimumConfidence="0.25" numberOfItems="6" numberOfSets="5" numberOfSequences="3" numberOfRules="1">
  <MiningSchema> ... </MiningSchema>
  <SetPredicate id="s001" field="transaction" operator="supersetOf">
    <Array n="1" type="string"> index.html </Array>
  </SetPredicate>
  <SetPredicate id="s002" field="transaction" operator="supersetOf">
    <Array n="2" type="string"> [email protected] kdnuggets.com </Array>
  </SetPredicate>
  <SetPredicate id="s003" field="transaction" operator="supersetOf">
    <Array n="1" type="string"> products.html </Array>
  </SetPredicate>
  <SetPredicate id="s004" field="transaction" operator="supersetOf">
    <Array n="1" type="string"> basket.html </Array>
  </SetPredicate>
  <SetPredicate id="s005" field="transaction" operator="supersetOf">
    <Array n="1" type="string"> checkout.html </Array>
  </SetPredicate>
  <Sequence id="seq001" numberOfSets="1" occurrence="80" support="0.80">
    <SetReference setId="s001"/>
  </Sequence>
  <Sequence id="seq002" numberOfSets="4" occurrence="40" support="0.40">
    <SetReference setId="s002"/>
    <Delimiter delimiter="acrossTimeWindows" gap="false"/>
    <SetReference setId="s003"/>
    <Delimiter delimiter="sameTimeWindow" gap="true"/>
    <SetReference setId="s004"/>
    <Delimiter delimiter="sameTimeWindow" gap="false"/>
    <SetReference setId="s005"/>
  </Sequence>
  <SequenceRule id="rule001" numberOfSets="5" occurrence="20" support="0.20" confidence="0.25">
    <AntecedentSequence><SequenceReference
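The numbers in the example above are mutually consistent if support is read as occurrence divided by numberOfTransactions, and rule confidence as rule occurrence divided by antecedent occurrence. The sketch below checks this arithmetic. The function names are hypothetical, and the choice of seq001 as the antecedent of rule001 is an assumption: the slide is cut off before the SequenceReference is shown.

```python
N_TRANSACTIONS = 100  # numberOfTransactions in the example


def support(occurrence, n_transactions=N_TRANSACTIONS):
    """Fraction of transactions in which a sequence (or rule) occurs."""
    return occurrence / n_transactions


def confidence(rule_occurrence, antecedent_occurrence):
    """Fraction of antecedent occurrences that are followed by the consequent."""
    return rule_occurrence / antecedent_occurrence


if __name__ == "__main__":
    print(support(80))  # seq001
    print(support(40))  # seq002
    print(support(20))  # rule001
    # Assumption: the antecedent of rule001 is seq001 (occurrence 80).
    print(confidence(20, 80))
```

Under that assumption the rule meets both thresholds exactly: its support equals minimumSupport (0.20) and its confidence equals minimumConfidence (0.25).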
PMML Consumers
Post-Processing
Visualization
Verification and Evaluation
Deployment
Hybrids and Meta-Learning
PEAR: Post-Processing Association Rules
Sets of association rules are browsed like web pages
PMML-formatted association rules can be uploaded
Jorge et al., 2002
VizWiz - PMML Visualization
Java Applet
Reads, visualizes and writes PMML files
Some non-standard extensions required for best visualization
Coupling with WEKA in progress
Wettschereck, 2003
ROCOn - Visualizing ROC graphs
Java Applet
Understands PMML as an extension to VizWiz
Use Receiver Operator Characteristics (ROC) to compare and evaluate models
Farrand and Flach (http://www.cs.bris.ac.uk/%7Efarrand/rocon/index.html)
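To make the idea of comparing models with ROC graphs concrete, the sketch below builds the ROC points of a scoring model by sweeping the decision threshold over its scores and summarises the curve by the area under it. It is a generic illustration, not the ROCOn implementation; the function names and the toy labels and scores are made up for the example.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points, threshold swept high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last instance of a tied score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0]               # 1 = positive class
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # hypothetical model outputs
    print(auc(roc_points(labels, scores)))
```

Two models scored on the same data can then be compared by plotting both point lists on one graph, or more coarsely by their areas.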
Summary
Standards help to streamline efforts
Sign of maturity in the field of KD
From Art to Engineering
Standards are still incomplete, but:
Use what is available!
More tools utilizing standards are needed
References
Grossman, R.L., Hornick, M.F., Meyer, G. (2002). Data Mining Standards Initiatives, Communications of the ACM, Vol. 45:8; see also http://www.dmg.org
Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C. and Wirth, R. (2000). CRISP-DM 1.0: Step-by-step data mining guide, CRISP-DM consortium, http://www.crisp-dm.org
Clifton, C., Thuraisingham, B. (2001). Emerging standards for data mining. Computer Standards & Interfaces, Vol. 23, 187-193.
Compare and Contrast JOLAP and XML for Analysis, http://www.essbase.com/resourcelibrary/articles/jolapxmla.cfm
JCX http://www.jcp.org/en/jsr/detail?id=016
JOLAP http://www.jcp.org/en/jsr/detail?id=69
Jorge, A., Poças, J. and Azevedo, P. (2002). Post-processing operators for browsing large sets of association rules. Proc. Discovery Science 02 (eds. Lange, S., Satoh, K. and Smith, C. H.), Lübeck, Germany, LNCS 2534, Springer-Verlag.
Farrand, J. and Flach, P. (2003). ROCOn: a tool for visualising ROC graphs. See: http://www.cs.bris.ac.uk/%7Efarrand/rocon/index.html
Melton, J. and Eisenberg, A. SQL Multimedia and Application Packages (SQL/MM), http://www.acm.org/sigmod/record/issues/0112/standards.pdf
OMG Common Warehouse MetaModel http://www.omg.org/cwm/
SOAP http://www.w3.org/TR/SOAP/
Tang, Z., Kim, P. Building Data Mining Solutions with SQL Server 2000, http://www.dmreview.com/whitepaper/wid292.pdf
Wettschereck, D., Jorge, A., Moyle, S. (to appear). Data Mining and Decision Support Integration through the Predictive Model Markup Language Standard and Visualization. In Mladenic D, Lavrac N, Bohanec M, Moyle S (editors): Data Mining and Decision Support: Integration and Collaboration, Kluwer Publishers.
XMLA http://www.xmla.org/