www.monash.edu.au
Advanced Topics in Data Mining and Research Directions
CSE5610 Intelligent Software Systems
Semester 1, 2006
Outline
• Mining Different Data Types
– Spatial, Temporal, Time Series, Data Streams, Multimedia, XML, Web, Text etc.
• Distributed Data Mining (DDM)
• Mobile & Ubiquitous Data Mining (UDM)
• Data Mining E-Services
• Anytime, Anywhere Data Mining E-Services
Generations of Data Mining
• Four Generations of Data Mining Systems – Robert Grossman
• First Generation
– Stand Alone, Centralised, Single Algorithm
• Second Generation
– Integration with databases, support for high-dimensionality, complex data types
• Third Generation
– Distribution and Heterogeneity
• Fourth Generation
– Support for mining embedded, mobile and ubiquitous data sources
Distributed Data Mining
• Inherently distributed data
• Multinational corporations (MNCs) + Global Markets
• => Physical/geographical separation of users from the data sources
• Traditional data mining model involving the co-location of users, data and computational resources is inadequate
Distributed Data Mining (DDM)
• The inherent distribution of data and other resources as a result of organisations being distributed.
• The large volumes of data, the transfer of which results in exorbitant communication costs.
• The need to mine heterogeneous data, the integration of which is both non-trivial and expensive.
• The performance and scalability bottlenecks of data mining.
Distributed Data Mining (DDM)
• DDM = Data Mining (DM) + Knowledge Integration (KI)
• DM - Performing traditional knowledge discovery at each distributed data site.
• KI - Merging the results generated from the individual sites into a body of cohesive and unified knowledge.
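The DM + KI split can be sketched in a few lines. This is a minimal illustration, not a method taken from any of the systems in these slides: the function names `mine_site` and `integrate`, the toy transactions and the 0.5 support threshold are all assumptions. Each site counts item occurrences locally (DM), and only those summaries are merged into global item supports (KI), so no raw data leaves a site.

```python
from collections import Counter


def mine_site(transactions):
    """DM step: count, at one site, how many local transactions contain each item."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))
    return counts, len(transactions)


def integrate(site_results, min_support):
    """KI step: merge per-site counts and keep items frequent across all sites."""
    total_counts = Counter()
    total_transactions = 0
    for counts, n in site_results:
        total_counts.update(counts)
        total_transactions += n
    return {
        item: count / total_transactions
        for item, count in total_counts.items()
        if count / total_transactions >= min_support
    }


# Hypothetical data held at two separate sites.
site_a = [["bread", "milk"], ["bread", "eggs"], ["milk"]]
site_b = [["bread"], ["eggs", "milk"], ["bread", "milk"], ["tea"]]

# Only the small (counts, size) summaries are sent for integration.
site_results = [mine_site(site_a), mine_site(site_b)]
global_support = integrate(site_results, min_support=0.5)
print(global_support)
```

Item counts merge exactly by addition; for richer models (rules, trees, clusters) the integration step is the hard part, which is why KI is treated as a research problem in its own right.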
Parallel Data Mining (PDM)
• Principal distinction between DDM & Parallel DM
– parallel mining involves parallel processors with or without shared memory
• Parallel data mining also includes development of parallel versions of traditional data mining techniques.
• DDM and parallel data mining can be integrated – e.g. DecisionCentre
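As an illustration of a "parallel version of a traditional technique", the sketch below partitions one data set across workers, counts items in each partition concurrently, and combines the partial counts. The function names and toy data are assumptions. Threads are used only because they keep the example short and share memory; in CPython the global interpreter lock limits the speed-up for CPU-bound work, so a process pool would be the usual choice in practice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(partition):
    """Sequential building block: item counts for one partition of the data."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts


def parallel_item_counts(transactions, workers=2):
    """Count items by splitting the data across `workers` concurrent tasks."""
    # One interleaved partition per worker.
    partitions = [transactions[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_items, partitions))
    # Combine the partial results into a single answer.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


transactions = [["a", "b"], ["a"], ["b", "c"], ["a", "c"]]
parallel_counts = parallel_item_counts(transactions)
print(parallel_counts)
```

Unlike DDM, all the data here starts in one place and is split purely for performance; the result is identical to the sequential count.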
DDM – Algorithms & Architectures
• Research in distributed data mining can be divided into two broad categories [Fu01]:
• Data Mining Algorithms
– focus on efficient techniques for knowledge integration
• Distributed Data Mining Architectures
– focus on development of distributed data mining architectures
– emphasise the processes and technologies that support construction of software systems to perform distributed data mining
Taxonomy of DDM Architectures
• Distributed Data Mining Systems Architectures
– Client-Server
– Agents
> Stationary
> Mobile (self-directed migration)
Classification – DDM Systems
DDM Architectural Models and DDM Systems
• Client-server: DecisionCentre [CDG99], IntelliMiner [PaS99, PaS01], InterAct [PaD02]
• Agents (mobile agent, stationary agent): JAM [SPT97], Infosleuth [UMG98, MUU99], BODHI [KPH99], Papyrus [Ram98], PADMA [KHS97a, KHS97b]
Client-Server DDM
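[Figure: client-server DDM. Users (PC, workstation, laptop) send a data mining request to the data mining server; data transfer takes place between Data Server 1, Data Server 2 and the data mining server; the data mining results are returned to the user.]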
Mobile Agent Model for DDM
[Figure: mobile agent model for DDM. Components: users (PC, workstation, laptop), task controlling agent, agent system, directory service, data mining agents, data resource agents, data servers, data mining result agents, knowledge integration agent.]
Hybrid Model for DDM
[Figure: hybrid model for DDM. Components: DDM server, agent centre, optimiser, data sources 1 to n. Labelled routes: client-server, agent.]
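One way to picture the optimiser's role in the hybrid model is as a per-source choice between the two routes based on estimated response time. The sketch below is only an illustration: the cost model, the constants `SERVER_RATE`, `AGENT_MB` and `RESULT_MB`, and all figures are assumptions, not values or formulas from the slides.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    data_mb: float              # size of the data set held at the source
    bandwidth_mb_per_s: float   # link speed between the source and the DDM server
    local_rate_mb_per_s: float  # assumed mining throughput at the source


SERVER_RATE = 50.0  # assumed mining throughput of the DDM server (MB/s)
AGENT_MB = 0.5      # assumed size of a mobile mining agent
RESULT_MB = 0.1     # assumed size of the returned model


def client_server_time(src):
    """Ship the data to the server, then mine it there."""
    return src.data_mb / src.bandwidth_mb_per_s + src.data_mb / SERVER_RATE


def mobile_agent_time(src):
    """Ship the agent to the data, mine locally, ship the result back."""
    transfer = (AGENT_MB + RESULT_MB) / src.bandwidth_mb_per_s
    return transfer + src.data_mb / src.local_rate_mb_per_s


def choose_strategy(src):
    """Pick the route with the lower estimated response time."""
    cs, ma = client_server_time(src), mobile_agent_time(src)
    return ("client-server", cs) if cs <= ma else ("mobile-agent", ma)


sources = [
    DataSource("branch", data_mb=2, bandwidth_mb_per_s=10, local_rate_mb_per_s=1),
    DataSource("warehouse", data_mb=500, bandwidth_mb_per_s=1, local_rate_mb_per_s=10),
]
plan = {src.name: choose_strategy(src) for src in sources}
print(plan)
```

With these made-up numbers the small, well-connected source is cheaper to pull to the server, while the large source behind a slow link is cheaper to mine in place with an agent.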
Ubiquitous Data Mining (UDM)
• Mining data in a resource-constrained environment to support the time critical information needs of mobile users
• Typical Characteristics
– Mobile User – frequent disconnections
– Handheld Device
> Resource constraints – memory, battery, processor, screen real-estate
– Time critical
– Real-time & On-line – Data Streams
• Example Scenarios
• Many Challenges
Current Research
• Kargupta’s Group
– MobiMine
• @CSSE, Monash Univ.
– AgentUDM
– Adaptive, Cost-efficient & Light-weight data mining techniques for data streams
> Mohamed Medhat
> LWC, LWF & LWClass
> Watch this space!!!
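To give a feel for what "light-weight" stream mining means on a constrained device, here is a generic one-pass clustering sketch. It is not the LWC, LWF or LWClass algorithm; the function name, the distance threshold and the cluster cap are all assumptions. Each value is seen once, and memory is bounded by `max_clusters` no matter how long the stream runs.

```python
def stream_cluster(stream, threshold, max_clusters):
    """One-pass clustering of a numeric stream using bounded memory."""
    if max_clusters < 1:
        raise ValueError("max_clusters must be at least 1")
    clusters = []  # each cluster is [running mean, count]
    for x in stream:
        nearest = min(clusters, key=lambda c: abs(c[0] - x), default=None)
        far = nearest is None or abs(nearest[0] - x) > threshold
        if far and len(clusters) < max_clusters:
            # Start a new cluster while the memory budget allows it.
            clusters.append([float(x), 1])
        else:
            # Otherwise absorb the value into the nearest cluster
            # by updating its running mean incrementally.
            nearest[1] += 1
            nearest[0] += (x - nearest[0]) / nearest[1]
    return clusters


readings = [1.0, 1.2, 0.8, 10.0, 10.4, 9.6]
clusters = stream_cluster(readings, threshold=1.0, max_clusters=2)
print(clusters)
```

An adaptive technique would go further and tune parameters such as `threshold` at run time in response to available memory, battery or the arrival rate of the stream.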
Data Mining E-Services
• “…data analysis and mining functions themselves will be offered as business intelligence e-services that accept operational data from clients and return models or rules”
Umesh Dayal, 2001
• Why?
– Knowledge is a key resource
– Cost of data mining infrastructure
Data Mining E-Services
• Current Commercial Landscape
– Several application service providers (ASPs) -> DigiMine, Information Discovery, WhiteCross Systems, ListAnalyst.com etc.
– Mode of Operation
• Hybrid Model & Data Mining ASPs
– Optimise Response Time
> Leads to improved throughput
– QoS Estimation
– Location Preferences of Clients
Anytime, Anywhere Data Mining E-Services
My Thoughts
• Data is a commodity, Analysis is a service
• Access anytime, anywhere
• By anyone…
– From large corporations to small businesses to individuals
• From home buyers to mobile salespersons to grocery shoppers…
My Thoughts
• A preliminary model for delivery
– Datacentric Grids
[Figure: preliminary delivery model. Components: high performance servers, mining algorithms, model repository, mobile agent management system, data repository (Data 1, Data 2, …, Data n), private datacentric grid, datacentric grid management module. Request types: model query; compute new model request; compute new model request + user data; compute new model request + remote user data; compute new model request + user computation; compute new model request + user data + user computation.]
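The request types in this model can be sketched as a small service interface. Everything below is a hypothetical illustration: the class `MiningService`, its method names and the toy "algorithms" are assumptions, chosen only to show how a model query differs from a compute-new-model request that may carry user data and/or a user-supplied computation.

```python
class MiningService:
    """Toy stand-in for the servers, model repository and data repository."""

    def __init__(self, algorithms, data_repository):
        self.algorithms = algorithms            # name -> callable(data) -> model
        self.data_repository = data_repository  # name -> stored data set
        self.model_repository = {}              # name -> previously computed model

    def model_query(self, name):
        """Model query: return a stored model, or None if none exists."""
        return self.model_repository.get(name)

    def compute_new_model(self, name, dataset, algorithm=None,
                          user_data=None, user_computation=None):
        """Compute-new-model request, optionally with user data and/or user computation."""
        if algorithm is None and user_computation is None:
            raise ValueError("need an algorithm name or a user computation")
        data = list(self.data_repository[dataset])
        if user_data is not None:
            data += list(user_data)  # request + user data
        # request + user computation: run the caller's function instead.
        compute = user_computation or self.algorithms[algorithm]
        model = compute(data)
        self.model_repository[name] = model
        return model


service = MiningService(
    algorithms={"mean": lambda d: sum(d) / len(d)},
    data_repository={"Data1": [2, 4, 6]},
)
first = service.compute_new_model("avg", "Data1", algorithm="mean")
with_user_data = service.compute_new_model("avg+", "Data1", algorithm="mean", user_data=[8])
custom = service.compute_new_model("peak", "Data1", user_computation=max)
print(first, with_user_data, custom, service.model_query("avg"))
```

A model query is cheap because it only reads the repository, while the compute requests grow in cost as the user ships more data or code along with them.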
References
• http://www.csse.monash.edu.au/projects/MobileComponents/projects/dame/
• http://www.csse.monash.edu.au/~shonali/research.html
• http://www.csee.umbc.edu/~hillol/DDMBIB/
• http://www.csee.umbc.edu/~hillol/diadic.html
• http://www.csse.monash.edu.au/~mgaber/main.html