Data Mining


Upload: tommy96

Post on 29-Nov-2014

TRANSCRIPT

Page 1: Data Mining

Data Mining: knowledge discovery from data
Decision Support: modelling
classification, clustering, evaluation, analysis, visualization, explanation, ...

Data Mining: knowledge discovery from data
Decision Support: multi-attribute modeling
customers clustered, types of cars, customers analysed, association rules, car evaluation model, ...

Data Mining ? Decision Support

"Decision Support for Data Mining"
Supporting decisions in the DM process, e.g.:
– ROC methodology
– Meta-learning and multi-strategy learning

"Data Mining for Decision Support"
Incorporating DM methods into DSS, e.g.:
• MS OLE DB for DM
• MS Analysis Services
• Improving models by data analysis

Integrating DM and DS through Models
[Diagram labels: Data, Data Mining, Decision Support, Model, Expertise]

Sequential Application: DM, then DS
[Diagram labels: Data Mining, Decision Support, Model, Data]

DM & DS in Data Pre-Processing
[Diagram labels: Data Mining, Decision Support]

Page 2: Data Mining

• DS for DM: ROC methodology
• DM for DS: MS OLE DB for DM, MS Analysis Services, model revision (from data)
• DM, then DS (sequential application): Decisions-At-Hand approach
• DS, then DM (sequential application): using models in data pre-processing for DM
• DM and DS (parallel application): combining models, e.g., DEX and HINT

"DS for DM"
Data Mining, Decision Support
Decision support within the DM process, e.g., ROC curves

ROC space

True positive rate = #true pos. / #pos.
  TPr1 = 40/50 = 80%
  TPr2 = 30/50 = 60%
False positive rate = #false pos. / #neg.
  FPr1 = 10/50 = 20%
  FPr2 = 0/50 = 0%
ROC space: FPr on X axis, TPr on Y axis

[Figure: ROC plot of classifier 1 and classifier 2; x axis: false positive rate, y axis: true positive rate, both 0% to 100%]

Classifier 1
                    Predicted positive   Predicted negative   Total
Positive examples   40                   10                   50
Negative examples   10                   40                   50
Total               50                   50                   100

Classifier 2
                    Predicted positive   Predicted negative   Total
Positive examples   30                   20                   50
Negative examples   0                    50                   50
Total               30                   70                   100

Peter Flach
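The rates on this slide follow directly from the two confusion matrices. A minimal sketch in Python, assuming a small helper class `ConfusionMatrix` (a name introduced here for illustration, not taken from the slides), shows how each classifier maps to one point in ROC space:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a two-class problem (hypothetical helper, not from the slides)."""
    tp: int  # positive examples predicted positive
    fn: int  # positive examples predicted negative
    fp: int  # negative examples predicted positive
    tn: int  # negative examples predicted negative

    def tpr(self) -> float:
        # True positive rate = #true pos. / #pos.
        return self.tp / (self.tp + self.fn)

    def fpr(self) -> float:
        # False positive rate = #false pos. / #neg.
        return self.fp / (self.fp + self.tn)

    def roc_point(self) -> tuple:
        # ROC space: FPr on the X axis, TPr on the Y axis.
        return (self.fpr(), self.tpr())


# Counts copied from the two tables on the slide.
classifier_1 = ConfusionMatrix(tp=40, fn=10, fp=10, tn=40)
classifier_2 = ConfusionMatrix(tp=30, fn=20, fp=0, tn=50)

for name, cm in [("classifier 1", classifier_1), ("classifier 2", classifier_2)]:
    fpr, tpr = cm.roc_point()
    print(f"{name}: FPr = {fpr:.0%}, TPr = {tpr:.0%}")
```

Neither point dominates the other: classifier 1 has the higher true positive rate, classifier 2 the lower false positive rate.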

The ROC convex hull
[Figure: ROC plot comparing Confirmation rules, WRAcc and CN2; x axis: false positive rate, y axis: true positive rate, both 0% to 100%]

The ROC convex hull
[Figure: ROC plot; x axis: false positive rate, y axis: true positive rate, both 0% to 100%]
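The slides give no point data for the hull plots, so the following is only a sketch of how a ROC convex hull can be computed, not the procedure used for these figures. It assumes the two classifier points from the ROC space slide plus one hypothetical dominated point at (0.5, 0.5), and adds the trivial classifiers (0, 0) and (1, 1). The function name `roc_convex_hull` is introduced here for illustration.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Return the upper convex hull of ROC points, ordered left to right.

    Each point is (false positive rate, true positive rate). The trivial
    classifiers (0, 0) and (1, 1) are always included.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Keep only clockwise turns; a straight line or a counter-clockwise
        # turn means the middle point lies on or below the hull.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Classifier 1 and 2 come from the confusion matrices in the transcript;
# (0.5, 0.5) is a made-up dominated classifier added for illustration.
candidates = [(0.2, 0.8), (0.0, 0.6), (0.5, 0.5)]
print(roc_convex_hull(candidates))
```

Points left off the returned list lie on or below the hull, so some classifier on the hull matches or beats them at any slope.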

Choosing a classifier

[Figure: ROC plot; x axis: false positive rate, y axis: true positive rate, both 0% to 100%]

$$\frac{\mathit{FPcost}}{\mathit{FNcost}} = \frac{1}{2}, \qquad \frac{\mathit{Neg}}{\mathit{Pos}} = 4, \qquad \text{slope} = \frac{4}{2} = 2$$

Page 3: Data Mining

Choosing a classifier

[Figure: ROC plot; x axis: false positive rate, y axis: true positive rate, both 0% to 100%]

$$\frac{\mathit{FPcost}}{\mathit{FNcost}} = \frac{1}{8}, \qquad \frac{\mathit{Neg}}{\mathit{Pos}} = 4, \qquad \text{slope} = \frac{4}{8} = 0.5$$
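The two slope values can be read as a selection rule. Assuming the standard expected-cost argument (the slides show only the numbers), the cheapest classifier is the one whose ROC point maximises TPr minus slope times FPr. The sketch below applies that rule to the two classifiers from the confusion matrices earlier in the transcript, together with the trivial "always negative" and "always positive" classifiers. The function names are introduced here for illustration.

```python
def iso_performance_slope(fp_cost, fn_cost, neg, pos):
    """slope = (FPcost / FNcost) * (Neg / Pos), as on the slides."""
    return (fp_cost / fn_cost) * (neg / pos)


def best_classifier(roc_points, slope):
    """Pick the classifier whose ROC point maximises TPr - slope * FPr.

    Assumption: expected cost is FNcost * Pos * (1 - TPr) + FPcost * Neg * FPr,
    so minimising cost is the same as maximising TPr - slope * FPr.
    """
    def score(name):
        fpr, tpr = roc_points[name]
        return tpr - slope * fpr

    return max(roc_points, key=score)


# (FPr, TPr) pairs; classifier 1 and 2 are taken from the confusion matrices.
roc_points = {
    "always negative": (0.0, 0.0),
    "classifier 1": (0.2, 0.8),
    "classifier 2": (0.0, 0.6),
    "always positive": (1.0, 1.0),
}

steep = iso_performance_slope(fp_cost=1, fn_cost=2, neg=4, pos=1)    # 2.0
shallow = iso_performance_slope(fp_cost=1, fn_cost=8, neg=4, pos=1)  # 0.5

print(steep, best_classifier(roc_points, steep))
print(shallow, best_classifier(roc_points, shallow))
```

With slope 2 the rule selects classifier 2, which makes no false positives; with slope 0.5 it selects classifier 1, whose higher true positive rate outweighs its false positives.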

"DM for DS"
Data Mining, Decision Support
Introducing DM methods into the DS process:
– MS SQL Server, Analysis Services
– model revision

"DM for DS": Model Revision

"DM and DS"
Data Mining, Decision Support
[Diagram labels: Data, HINT, DEX, Expertise, Model]
Qualitative hierarchical multi-attribute model (Zupan & Bohanec)

Modes of Operation
1. From experts
2. From data
3. From data under expert supervision
4. HINT-developed model subsequently refined by the expert
5. Parallel development of model(s) by DEX and HINT
6. Combining submodels developed in different ways

Page 4: Data Mining

Sequential Application: "DM, then DS"
[Diagram labels: Decision Support, Data Mining, Model]

"DM, then DS"
[Figure: hierarchical model for RISK. Node labels: History, Present status, Tests, Symptoms, Ulcers, Amputations, Other changes, Deformities, Loss of prot. sensation, Absence of pulse. The legend distinguishes input attributes from ... attributes.]

Sequential Application: "DM, then DS"
[Diagram labels: Data Mining, Decision Support, Model]

Sequential Application: "DM, then DS"
[Diagram labels: Data Mining, Decision Support, two models]

Data Mining: Model Construction
Decision Model in XML
Synchronization or Upload
Decision Support Shells: on Palm, on Web
(Blaž Zupan et al.)

Parallel Application
[Diagram labels: Data Mining, Decision Support, models]

Page 5: Data Mining

Problem: Prediction of Academic Achievement
Primary school, High school
Prediction classes:
– graduates, 4 or 5
– graduates, 2 or 3
– prolonged
– fails soon
– fails late

[Figure: decision tree. Node attributes: GA 1st grade, Slovene, History, Physics, unex abs 3rd sem, age enrol. Split thresholds shown: <= 1 / > 1, <= 3 / > 3, <= 2 / > 2, <= 2 / > 2, <= 1 / > 1, <= 180 / > 180, <= 0 / > 0.]

LEGEND:
GA 1st grade - general achievement in the first high school grade
Slovene - mark in the subject Slovene language
History - mark in the subject History
Physics - mark in the subject Physics
age enrol - age at enrolment (in months)
unex abs 3rd sem - unexcused absence in the third semester (hours)

[Figure: hierarchical model of final achievement. Intermediate concepts: c1 to c7. Attributes: for lang 8th grade, gen ach 7th grade, regular enrol, for lang, citizenship, birth state, gen ach prim sch, math 8th grade, phys 8th grade.]

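Parallel Application
Decision Support, Data Mining
[Diagram labels: Building Construction Project Attributes; Models for Building Feasibility; Models for Client Value; Building Designs to maximise Client Value; Feasible Building Designs; Value Zone; Feasible Zone; axes: Quality, Size, Shape]
Steve Moyle, Marko Bohanec, Eric Ostrowski

Conclusions
• DM & DS approaches are:
  – complementary, supplementary
• New and developing research area; typical combinations:
  – DS for DM, DM for DS, DM then DS, DS then DM, DM and DS
• Open questions:
  – formalization (framework) of DM&DS integration, common methodologies and approaches, standardization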

� ��S��� �� ��S��� �� vy ¸ v� w����w~­}� w�}�Ò õê��ôò�òëóíÕ�Ò ð���ôò�òëóíÕ�� ¬}¹ w{µ µ}º}°��z{| �}�}w�~­ w�}w� ­��z~w° ~��»z{wx z�{��Ò ¼½ ¾¿À ¼ÁÒ ¼Á ¾¿À ¼½Ò ¼Á ÃÄÅÆ ¼½Ò ¼½Â ÃÄÅÆ ¼ÁÒ ¼Á ÇÆÈ ¼½� É�}{ ¯�}�x z�{��Ò ©êÕ�íôïÊíóïêë Ë©Õí�ò·êÕ´Ì ê© ÍüÎÍý ïëóò¥ÕíóïêëÒ õê��êë �òóöê�êôê¥ïòð íë� í��ÕêíõöòðÒ ðóíë�íÕ�ïÊíóïêë