Data Mining & SPSS Modeler


Upload: maxwell-allen

Post on 04-Jan-2016


DESCRIPTION

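Data Mining & SPSS Modeler. Day 1: Introduction to data mining. 1.1 Data mining concepts. Definition of data mining: using proven methods to uncover actionable, intrinsic knowledge from large volumes of data, and thereby improve business operations. Business data volumes are growing exponentially (GB/hour); traditional techniques struggle to find valuable patterns in such large volumes of data, and data mining can help us find them. Techniques: prediction and classification, clustering, association analysis, sequence analysis, anomaly detection, time series. 1.1 Data mining concepts: prediction and classification. - PowerPoint PPT Presentation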

TRANSCRIPT

SPSS

Data Mining & SPSS Modeler

Day 1

1.1 Data mining concepts: business data volumes are growing exponentially (GB/hour).

[Figure: Group 1, Group 2, Group 3, ..., Group n]

Buying Pattern: (Antecedent) => (Consequent)

(1) & (2) & ... & (m) [Antecedents] => [Consequent]

[Figure: patterns over the page labels Mail, Directory, cafe, Community, banner. Model 1: Main; Model 2: cafe, cafe; Model 3: Main; Model 4: Community]

[Figure: records keyed by Transaction ID, e.g. 200902131224350001, 200902131224350002]

1.2 (Day 1)

CRM, Risk, Scoring, LTV (Life Time Value), Risk (Warranty)

Warranty; Cross-selling, Up-selling

3, , .SPSS, 3!! & 2~3%

: , Intervention : IMF, 911/ Data

Cross-selling, Up-selling; (FDS); Underwriting; List

? SPSSCLTVRFM , !! !!

CLV()/()Life Stage ()VIPGoldVIP Silver()

&


, ?SPSS!!!! ?

/ex) 78.5 (24%)//ex) 85.8 (18%)

/ Scoring

(Cross/Up-selling); (Re-selling); Shopping Mall

, ? ?Mining!!SPSS !!!! !!

6 Set /


, e-MAIL & SMS ?SPSS Mining, , , !!!!

Up-selling

IAPPMOVIEMUSICLIFE Gain ChartRule Description

!!! Needs, ? SPSSCollaborative Filtering !!!!


CDMA, W-CDMA

& ?, SPSS!!/!!


+

?SPSS Decision!!, j !!


Segment

//

(Time Series)

ARIMA / / AR

/

/??SPSS !!/!!

Up-selling

DB

SPSS DB

Guide> TV > Rating/Guide> TV > RatingTV Program TableTVTableTableETCL ODBCData Mart INSERT.net Data Mart

?SPSS , !! &!!

2.1 SPSS Modeler (Day 1)

2.1 SPSS Modeler 14.2

IBM SPSS Modeler; GUI; IBM SPSS Modeler 14.2

(Open Architecture)

IBM SPSS Modeler; Data Mining

SPSS Education. Copyright 2011 SPSS Korea, Data Solution Inc. All rights reserved

Modeler C/S: SPSS Modeler Server, SQL Pushback, SPSS Modeler Client

ClientServer

Server

2.1 C/S

Mining

(Open Architecture)

MiningData Mining.

S/W DBMSData Mining S/WSourceMiddle Ware, S/WGroupS/W Up-Grade


(Open Architecture)

Data Mining DBMiningDB Bulk LoadingLoader2ExportSPSS Modeler(GUI )Loader 3DB TableBulk Loading4

DB Bulk Loader

IBM SPSSModeler

txt 1Data Base

* Client PC*** PC Client PCIBM SPSS Modeler ClientWindows XP/Vista/7TCP/IPMiningMart

Web- Java

SPSS Modeler scripts Python 2.6 SPSS Modeler IBM SPSS Modeler Server options.cfg python_exe_path (,) SPSS Modeler scripts Python scripts SPSS Modeler

CRISP-DM (Cross Industry Standard Process for Data Mining)

CRISP-DM was developed by a consortium that included NCR, OHRA, SPSS, and Daimler-Benz.

CRISP-DM

/ / Modeling Modeling

/

CRISP-DM

2.2 SPSS Modeler (Day 1)

2.2 SPSS Modeler 14.2

IBM SPSS Modeler nodes; data sources: Statistics, SAS, Excel, ODBC (Microsoft SQL Server, DB2, Oracle)

Source Node

Operation Node

Graph Node

Decision Tree, Regression, Neural Network, Clustering, Association.

Modeling Node (Mining Node)

Output Node

IBM SPSS Modeler; CRISP-DM; Data Mining


Automated, Classification, Association, Segmentation

IBM SPSS Modeler: IBM SPSS Statistics, Excel, SAS, ODBC

Source Node: Oracle, SQL Server, DB2 (ODBC); ASCII; Statistics, IBM Cognos BI, Excel, SAS 6, 7, 8

Operation Node: RFM

2\3

Graph Node

XYWeb XY NodeXYMulti plot.Node


Modeling Node


Modeling Node: (Gains Chart), (Response Chart), (Lift Chart), (Profit Chart); ROI

Output

Output Node: Statistics (html, txt); Data Quality: Null Value, Blank (html, txt); Display; Export: Statistics (.sav), Excel, SAS 6, 7, 8, ODBC

3.1 (Day 1)

453.1

45463.1

= 46 473.1 Statistics tree_credit.sav SPSS Statistics

IBM SPSS Statistics 47483.1

48493.1

49 CHAID 503.1

50513.1 400 200

51523.1

52 CHAID 533.1

----53543.1

54553.1

2 18% 1 (89%) 10 1 4 5 5 89% 97%--55563.1

1 (89%) 10 1 4 5 5 89% 97%-- 3 6 7 5 58% 85%564.1 Day 14.1 1 Class $XF-Class 90%

58584.1 2

59 = 100%/: :, : :

59604.1

60614.1

$R-Credit rating $R- $RC- 0.0 1.0 CHAID 16% 100%

612464 1960 79%624.1

$R-Credit rating -- 2464 1960 79%

62 Statistics

634.1

Statistics

Copyright IBM Corporation 1994, 2011. Statistics

Statistics; PMML; IBM SPSS Collaboration and Deployment Services

5.1 (Estimation), Day 2

SPSS Modeler: Linear Regression, Neural Network

--

1. (Method of Least Squares)
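The slide names the Method of Least Squares as the estimation principle behind linear regression. A minimal sketch of that idea for one predictor follows. The function name `fit_simple_ols` and the sample data are illustrative assumptions, not taken from the slides or from SPSS Modeler.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-products of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_ols(xs, ys)
print(intercept, slope)
```

A fitted line of this form is then read as "target = intercept + slope * predictor".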

2. Methods: (Enter), (Stepwise), (Backwards), (Forwards)

Regression in SPSS Modeler

685.1 |-683. 31.sav

695.1 |-694.

705.1 |-

705. 1

715.1 |-R1715. 2

725.1 |-F725. 3

735.1 |-Sig>0.05= -139.099+ 0.469*735. 4

745.1 |-Mean=0.001P-PY=x

P-P

1. Framework of Neural Network

The Neurons in the Human Body

2. (1): (Input Layer), (Hidden Layer), (Output Layer)

[Figure: inputs X1, X2, ..., XP; hidden units Z1, Z2, ..., ZW; output Y; weights w1, w2, w3, ..., wp and w*K1, ..., w*Km]

2. (2): (Input Layer), values scaled to 0~1; (Hidden Layer); (Output Layer), (target)

3. Neural Network: MLP (Multi Layer Perceptron), RBF (Radial Basis Function)
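The slides describe a network with an input layer (values scaled to 0~1), a hidden layer, and an output layer. A minimal sketch of one forward pass through such an MLP is shown here. The sigmoid activation, the function names, and all weights are assumptions chosen for illustration; the slides do not state which activation or weights SPSS Modeler uses.

```python
import math


def min_max_scale(values):
    """Rescale a list of numbers to the 0~1 range expected at the input layer."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant column")
    return [(v - lo) / (hi - lo) for v in values]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer: each hidden unit Z is a sigmoid of a weighted sum of X."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    # The output Y is a sigmoid of a weighted sum of the hidden units.
    return sigmoid(sum(w * z for w, z in zip(output_weights, hidden)) + output_bias)


# Hypothetical network with 2 inputs and 2 hidden units, all weights zero.
y = mlp_forward([0.2, 0.9], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [0.0, 0.0], 0.0)
print(y)
```

Training would adjust the weights to reduce the error against the target; that step is omitted from this sketch.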

12312

MLP

K-means1MLP;12

RBF

4. SPSS Modeler Neural Network (1): Neural Network in SPSS Modeler

4. SPSS Modeler Neural Network (2): Neural Network in SPSS Modeler

1 Bagging&Boosting

805. Neural NetworkRBFGDP.xls

815.1 |-816.

825.1 |-827. 1

835.1 |-837. 2

845.1 |-848.

855.1 |-857000

1. 865.1 |-

T, S, C, I (trend, seasonal, cyclical, and irregular components)

Naive Method, Moving Average, Exponential Smoothing, Holt, Brown, Winters, ARIMA, Multivariate ARIMA
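Two of the forecasting methods listed, Moving Average and Exponential Smoothing, are simple enough to sketch directly. The function names, the window length, and the smoothing constant `alpha` below are illustrative assumptions; they are not SPSS Modeler settings.

```python
def moving_average(series, window):
    """Average of the most recent `window` observations at each position."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def simple_exponential_smoothing(series, alpha):
    """Each smoothed value blends the new observation with the previous smoothed value."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    smoothed = [series[0]]  # initialize with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical series for illustration.
ma = moving_average([1, 2, 3, 4, 5], window=3)
ses = simple_exponential_smoothing([10, 20], alpha=0.5)
print(ma, ses)
```

The Holt, Brown, and Winters variants extend exponential smoothing with trend and seasonal terms; they are not covered by this sketch.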

4.1.3 (Association), Day 2

1. Buying Pattern

(Antecedent) (Consequent)

? ?(Antecedent) (Consequent)

? ?

(1) & (2) & ... & (m) [Antecedents] => [Consequent]

2. (Cross-Selling), (Up-Selling); Negative Rule: ~A => B, A => ~B, ~A => ~B; (Fraud Detection): Negative Rule; (Shelf Planning)

4. (Association) rule measures

(Instances): n(A), the number of records that contain A
(Support): Pr(A) = n(A) / N
(Rule Support): Pr(A ∩ C) = n(A ∩ C) / N
(Confidence): Pr(C | A) = n(A ∩ C) / n(A)
(Lift): Pr(C | A) / Pr(C) = Pr(A ∩ C) / { Pr(A) * Pr(C) }
(Deployability): Pr(A) - Pr(A ∩ C)

A: Antecedent, C: Consequent, N: total number of records. The slide's footnotes also give the terms (coverage) and (accuracy).

5. (Association): (Rule support), (Confidence), (Lift)
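The measures defined on this slide (support, rule support, confidence, lift, deployability) can be computed directly from a list of transactions. The sketch below assumes that confidence is read as n(A and C) / n(A), which is what the slide's count formula gives. The function name and the baskets are hypothetical, for illustration only.

```python
def rule_measures(transactions, antecedent, consequent):
    """Compute the slide's measures for the rule antecedent => consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    n_a = sum(1 for b in baskets if antecedent <= b)  # records containing A
    n_c = sum(1 for b in baskets if consequent <= b)  # records containing C
    n_ac = sum(1 for b in baskets if (antecedent | consequent) <= b)  # A and C
    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_a / n
    rule_support = n_ac / n
    confidence = n_ac / n_a
    return {
        "instances": n_a,
        "support": support,
        "rule_support": rule_support,
        "confidence": confidence,
        "lift": confidence / (n_c / n),
        "deployability": support - rule_support,
    }


# Hypothetical baskets (not from the slides).
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "jam"},
    {"milk"},
    {"bread"},
    {"jam"},
]
m = rule_measures(baskets, {"milk"}, {"bread"})
print(m)
```

A lift above 1 means the consequent is more frequent among records containing the antecedent than in the data overall.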

Transactione.g.AssociationAnalysisRules

___% ().___% () ..Rule & 954.1.3 |956. SPSS Modeler(Support), (Confidence), (Antecedent)4AprioriCARMA964.1.3 |961. AprioriApriori100% 40%43%40%15%

0

974.1.3 |-Apriori972. Apriori 10%

1 >110%C5.0 0 1

3. SPSS Modeler Apriori (1): Apriori in SPSS Modeler

Apriori

3. SPSS Modeler Apriori (2): Apriori in SPSS Modeler

shopping.txt

4. Apriori

5. Apriori

6. Apriori (1)

Rule: Milk and Frozen foods => Bakery goods
Instances: 85 records contain {Milk, Frozen foods}.
Support: 85/786*100% = 10.814%
Rule Support: 9.033% of records contain {Milk, Frozen foods, Bakery goods}.

6. Apriori (2)

Rule: Milk and Frozen foods => Bakery goods
Confidence: 83.529% = 9.033% / 10.814%
Lift: 83.529% / 42.69% = 1.948
Deployability: 14/786 = 1.781% (records that contain Milk and Frozen foods but not Bakery goods)
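The figures quoted for the rule Milk and Frozen foods => Bakery goods can be re-derived from the numbers on the slides. The sketch below uses only values that appear there (786 records, 85 antecedent records, 9.033% rule support, 42.69% consequent support); the variable names are assumptions. With these rounded inputs the lift ratio evaluates to roughly 1.957 rather than the slide's 1.948, which suggests that one of the slide's inputs was rounded or transcribed differently.

```python
# Values taken from the slides.
total_records = 786
antecedent_count = 85            # records containing Milk and Frozen foods
rule_support_pct = 9.033         # records containing all three items, in percent
consequent_support_pct = 42.69   # share of records containing Bakery goods, as printed

# Support: share of records that contain the antecedent.
support_pct = antecedent_count / total_records * 100

# Confidence: rule support divided by antecedent support.
confidence_pct = rule_support_pct / support_pct * 100

# Lift: confidence divided by the consequent's overall share.
lift = confidence_pct / consequent_support_pct

# Deployability: antecedent present but consequent absent.
deployability_pct = support_pct - rule_support_pct

print(round(support_pct, 3), round(confidence_pct, 3), round(deployability_pct, 3))
print(round(lift, 3))
```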

1044.1.3 |-Apriori1041Bakery goods7.

1. SPSS Modeler Carma: Carma in SPSS Modeler

Carma: BASKETS1n.txt

2. Carma

1074.1.3 |-Carma1074.1.3 |-9PCCPU1067IDsequence.xls

1.

108108IDID12

2. 1

1094.1.3 |-109IDIDIDIDID

3.

$S-ID-1 = brioches, $SC-ID-1 = 0.700
$S-ID-2 = biscuits, $SC-ID-2 = 0.600

brioches: 70%; biscuits: 60%

4.

FailTelRepair.txt: ID field = ID, time field = Index1, content field = Stage

5.1.1 SPSS Modeler Batch (5.1 Batch, Day 2)

1. Data Mining Process (CRISP-DM): Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment

Management of Mining Results; Modeler Batch

1141142. Modeler Batch

ClientServerBatch Program Batch Script Modeler Batch Mode System ArchitectureDOS

clientstreamclientstreamDOS

DOS; Clemb; stream; Script

3. SPSS Modeler Batch

SPSS Modeler, Modeler Batch, Modeler Server; modeler script; modeler batch: clemb

-server (SPSS Modeler Server; used with -hostname, -port, -username, -password, -domain)
-hostname
-port
-username
-password
-epassword
-domain
-P <name>=<value>
@<file>
-directory
-server_directory

-directory
-execute
-stream
-script
-model (.gm)
-state
-project
-output (.cou)
-help
-P <name>=<value>

4. SPSS Modeler Batch (1)

batch SPSS Modeler Stream/Batch SPSS Modeler Stream (Clem_Batch_Test.str)

Server (C:\BATCH_TEST\Clem_Batch_Test.str); Batch (C:\BATCH_TEST\clem_batch.bat)

SPSS Modeler nodeBatch.
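The batch example runs a saved stream through the `clemb` executable using the flags listed on the options slide. The sketch below only assembles such a command line and does not execute it. The helper name `build_clemb_command`, the host name, and the port number are hypothetical; the stream path is the slide's example path with the stray space removed; the exact flag semantics should be checked against the Modeler Batch documentation for the installed version.

```python
def build_clemb_command(stream_path, hostname=None, port=None,
                        username=None, password=None):
    """Assemble a clemb argument list from flags named on the slides."""
    command = ["clemb"]
    if hostname is not None:
        # Server mode: the slides list -server together with connection flags.
        command += ["-server", "-hostname", hostname]
        if port is not None:
            command += ["-port", str(port)]
        if username is not None:
            command += ["-username", username]
        if password is not None:
            command += ["-password", password]
    # Load the stream and run it.
    command += ["-stream", stream_path, "-execute"]
    return command


# Stream path from the slide's example; server details are made up.
local_cmd = build_clemb_command(r"C:\BATCH_TEST\Clem_Batch_Test.str")
remote_cmd = build_clemb_command(
    r"C:\BATCH_TEST\Clem_Batch_Test.str",
    hostname="modeler-server.example", port=28052, username="analyst",
)
print(" ".join(local_cmd))
print(" ".join(remote_cmd))
```

The resulting list could be passed to `subprocess.run`, or the joined string placed in a batch file such as the slide's clem_batch.bat.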

5.1.2 (5.1 Batch, Day 2)

Q & A

SPSS Education. Copyright 2010 SPSS Korea, Data Solution Inc. All rights reserved