01. Introduction: Overview of Data Mining

Upload: febrian-arsy

Post on 07-Aug-2018


TRANSCRIPT

  • 8/20/2019 01. IntroductionOverview of Data Mining

    1/32


    Introduction to Data Mining

    Session 01

    Course: M0824 - Data Mining
    Year: Sep 2011


    Bina Nusantara University

    Learning Outcomes

    Explain data mining concepts and techniques.


    Acknowledgments

    These slides have been adapted from Han, J., Kamber, M., & Pei, J. (2006). Data Mining: Concepts and Techniques (2nd ed.). San Francisco: Morgan Kaufmann.


    Outline

    • What Motivated Data Mining?
    • Data Mining: On What Kind of Data?
    • Data Mining Functionalities
    • Classification of Data Mining Systems
    • Integration of a Data Mining System with a Database or Data Warehouse System
    • Major Issues in Data Mining


    What Motivated Data Mining?


    Why Data Mining?

    • The Explosive Growth of Data: from terabytes to petabytes
      – Data collection and data availability
        – Automated data collection tools, database systems, Web, computerized society
      – Major sources of abundant data
        – Business: Web, e-commerce, transactions, stocks, …
        – Science: Remote sensing, bioinformatics, scientific simulation, …
        – Society and everyone: news, digital cameras, YouTube
    • We are drowning in data, but starving for knowledge!
    • "Necessity is the mother of invention": data mining, the automated analysis of massive data sets


    What Is Data Mining?

    • Data mining


    March 6, 2016

    Data Mining: Concepts and Techniques

    Knowledge Discovery


    What Is (Not) Data Mining?

    What is Data Mining?

    – Certain names are more prevalent in certain US locations (O'Brien, O'Rourke, O'Reilly in the Boston area)
    – Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)

    What is not Data Mining?

    – Look up a phone number in a phone directory
    – Query a Web search engine for information about "Amazon"


    Data Mining in Business Intelligence

    [Figure: layered pyramid; the potential to support business decisions increases toward the top]

    Layers, top to bottom:

    • Decision Making
    • Data Presentation: Visualization Techniques
    • Data Mining: Information Discovery
    • Data Exploration: Statistical Summary, Querying, and Reporting
    • Data Preprocessing/Integration, Data Warehouses
    • Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems

    Roles, top to bottom: End User, Business Analyst, Data Analyst, DBA


    Data Mining Tasks

    • Classification (Predictive)
    • Clustering (Descriptive)
    • Association Rule Discovery (Descriptive)
    • Sequential Pattern Discovery (Descriptive)
    • Regression (Predictive)
    • Deviation Detection (Predictive)


    Data Mining: On What Kind of Data?


    Data Mining Function:


    Data Mining Function:

    • How to use such patterns for classification, clustering, and other applications?


    Association Rule Discovery: Application 1

    • Marketing and Sales Promotion:
      – Let the rule discovered be {Bagels, … } --> {Potato Chips}
      – Potato Chips as consequent => Can be used to determine what should be done to boost its sales.
      – Bagels in the antecedent => Can be used to see which products would be affected if the store discontinues selling bagels.
      – Bagels in antecedent and Potato Chips in consequent => Can be used to see what products should be sold with Bagels to promote sale of Potato Chips!
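How strongly a rule such as {Bagels} --> {Potato Chips} holds is usually judged by its support and confidence. The following is a minimal sketch, assuming a made-up list of baskets; the data and the helper names `support` and `confidence` are illustrative and do not come from the slides.

```python
# Hypothetical market-basket data: each transaction is the set of items bought together.
transactions = [
    {"bagels", "potato chips", "milk"},
    {"bagels", "potato chips"},
    {"bagels", "cream cheese"},
    {"potato chips", "soda"},
    {"bagels", "potato chips", "soda"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support of both sides over support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule {bagels} -> {potato chips}
rule_support = support({"bagels", "potato chips"}, transactions)
rule_confidence = confidence({"bagels"}, {"potato chips"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

A high confidence with bagels in the antecedent suggests that dropping bagels could also affect potato chip sales, which is the second use listed above.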


    Association Rule Discovery: Application 2

    • Supermarket shelf management.
      – Goal: To identify items that are bought together by sufficiently many customers.
      – Approach: Process the point-of-sale data collected with barcode scanners to find dependencies among items.
      – A classic rule:
        – If a customer buys diaper and milk, then he is very likely to buy beer.
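"Bought together by sufficiently many customers" can be read as a minimum co-occurrence count. The sketch below assumes a tiny invented point-of-sale log and a hypothetical helper `frequent_pairs`; it counts item pairs by brute force, which works for a handful of baskets but is not how large-scale miners such as Apriori prune the search.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets (one list of items per checkout).
baskets = [
    ["diaper", "milk", "beer"],
    ["diaper", "milk", "beer", "bread"],
    ["milk", "bread"],
    ["diaper", "milk"],
    ["beer", "bread"],
]

def frequent_pairs(baskets, min_count):
    """Count every unordered item pair and keep those bought together at least min_count times."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and gives each pair one canonical order
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

pairs = frequent_pairs(baskets, min_count=2)

# Confidence of the classic rule {diaper, milk} -> {beer}
both = sum({"diaper", "milk"} <= set(b) for b in baskets)
with_beer = sum({"diaper", "milk", "beer"} <= set(b) for b in baskets)
rule_confidence = with_beer / both
```

Pairs that pass the threshold are candidates for being shelved near each other.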


    Data Mining Function:


    Classification: Application 1

    • Direct Marketing
      – Goal: Reduce cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.
      – Approach:
        – Use the data for a similar product introduced before.
        – We know which customers decided to buy and which decided otherwise. This {buy, don't buy} decision forms the class attribute.
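The slides do not name a particular classifier. As one possible illustration, the sketch below uses a k-nearest-neighbour vote over invented records from an earlier product launch; the features (age, monthly phone bill), the data, and the function name `predict` are all assumptions.

```python
import math
from collections import Counter

# Hypothetical records from an earlier, similar product launch:
# (age, monthly_phone_bill) -> class attribute "buy" / "don't buy".
history = [
    ((25, 60.0), "buy"),
    ((31, 75.0), "buy"),
    ((28, 80.0), "buy"),
    ((52, 20.0), "don't buy"),
    ((60, 25.0), "don't buy"),
    ((47, 30.0), "don't buy"),
]

def predict(features, history, k=3):
    """Majority class among the k historical records closest to features (Euclidean distance)."""
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Mail only the prospects predicted to buy (features are left unscaled for brevity).
prospects = {"alice": (27, 70.0), "bob": (55, 22.0)}
mailing_list = [name for name, f in prospects.items() if predict(f, history) == "buy"]
```

Restricting the mailing to predicted buyers is what reduces the mailing cost stated in the goal.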


    Classification: Application 2

    • Fraud Detection
      – Goal: Predict fraudulent cases in credit card transactions.
      – Approach:
        – Use credit card transactions and the information on the account holder as attributes.
        – When does a customer buy, what does he buy, how often does he pay on time, etc.


    Classification: Application 3

    • Customer Attrition/Churn:
      – Goal: To predict whether a customer is likely to be lost to a competitor.
      – Approach:
        – Use detailed record of transactions with each of the past and present customers, to find attributes.
        – How often the customer calls, where he calls, what time of the day he calls most, his financial status, marital status, etc.
        – Label the customers as loyal or disloyal.


    Data Mining Function:


    Clustering: Application 1

    • Market Segmentation:
      – Goal: Subdivide a market into distinct subsets of customers where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
      – Approach:
        – Collect different attributes of customers based on their geographical and lifestyle related information.
        – Find clusters of similar customers.
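"Find clusters of similar customers" can be done with many algorithms; the slides do not pick one. The sketch below uses plain k-means as an example, with invented customer attributes (age, yearly spend) and hand-picked starting centroids; the helper names `mean` and `kmeans` are illustrative.

```python
import math

# Hypothetical customers described by (age, yearly_spend_in_thousands).
customers = [(22, 1.0), (25, 1.5), (24, 1.2), (51, 8.0), (48, 7.5), (55, 9.0)]

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute the centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Starting centroids are chosen by hand here; they are often chosen at random instead.
centroids, segments = kmeans(customers, centroids=[customers[0], customers[3]])
```

Each resulting segment could then be addressed with its own marketing mix.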


    Clustering: Application 2

    • Document Clustering:
      – Goal: To find groups of documents that are similar to each other based on the important terms appearing in them.
      – Approach: To identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms.
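One common similarity measure built from term frequencies is cosine similarity; the slides do not say which measure is intended, so this choice is an assumption. The mini-corpus and helper names below are invented, and no stop-word filtering or term weighting is applied.

```python
import math
import re
from collections import Counter

def term_frequencies(text):
    """Lower-case the text, split it into words, and count each term."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (1.0 means same direction)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical mini-corpus.
docs = {
    "d1": "amazon rainforest trees and rainforest rivers",
    "d2": "rainforest trees of the amazon basin",
    "d3": "amazon online store sells books online",
}
tf = {name: term_frequencies(text) for name, text in docs.items()}
sim_12 = cosine_similarity(tf["d1"], tf["d2"])
sim_13 = cosine_similarity(tf["d1"], tf["d3"])
```

Here d1 is closer to d2 than to d3, so a clustering step built on this measure would tend to group the two rainforest documents together.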


    Data Mining: Confluence of Multiple Disciplines

    [Figure: Data Mining at the center, surrounded by the contributing disciplines]

    • Machine Learning
    • Statistics
    • Applications
    • Algorithm
    • Pattern Recognition
    • High-Performance Computing
    • Visualization
    • Database Technology


    Integration of a Data Mining System with a Database or Data Warehouse System


    Integration of Data Mining and Data Warehousing

    • Data mining systems, DBMS, data warehouse systems coupling
      – No coupling, loose coupling, semi-tight coupling, tight coupling
    • On-line analytical mining data
      – Integration of mining and OLAP technologies
    • Interactive mining of multi-level knowledge
      – Necessity of mining knowledge and patterns at different levels of abstraction by drilling/rolling, pivoting, slicing/dicing, etc.
    • Integration of multiple mining functions
      – Characterized classification, first clustering and then association
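To make "different levels of abstraction" concrete, the sketch below rolls invented sales facts up from city to country and slices them on one product. The data and the helper `aggregate` are hypothetical; a real OLAP engine would do this over a data cube rather than a Python list.

```python
from collections import defaultdict

# Hypothetical sales facts: (city, country, product, amount).
sales = [
    ("Jakarta", "Indonesia", "phone", 120),
    ("Bandung", "Indonesia", "phone", 80),
    ("Jakarta", "Indonesia", "laptop", 200),
    ("Osaka", "Japan", "phone", 150),
]

def aggregate(rows, key):
    """Sum the amount column, grouped by whatever key(row) returns."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)

by_city = aggregate(sales, key=lambda r: r[0])         # detailed level
by_country = aggregate(sales, key=lambda r: r[1])      # roll-up: city -> country
phones_only = [r for r in sales if r[2] == "phone"]    # slice: fix product = "phone"
phones_by_country = aggregate(phones_only, key=lambda r: r[1])
```

Patterns mined on `by_country` are coarser than those mined on `by_city`; drilling down reverses the roll-up.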


    Architecture: Typical Data Mining System

    [Figure: layered architecture; components as labeled in the diagram]

    • Data cleaning, integration, and selection
    • Database or Data Warehouse Server
    • Data Mining Engine
    • Pattern Evaluation
    • Graphical User Interface


    Major Issues in Data Mining


    Major Challenges in Data Mining

    • Efficiency and scalability of data mining algorithms
    • Parallel, distributed, stream, and incremental mining methods
    • Handling high-dimensionality
    • Handling noise, uncertainty, and incompleteness of data
    • Incorporation of constraints, expert knowledge, and background knowledge in data mining
    • Pattern evaluation and knowledge integration
    • Mining diverse and heterogeneous kinds of data: e.g., bioinformatics, Web, software/system engineering, information networks
    • Application-oriented and domain-specific data mining
    • Invisible data mining


    Continued in Session 02: Data Pre-processing