

Page 1: Distributed Data Mining System in Java Group Member D91725001 王春笙 D92725002 林俊甫 D92725001 王慧芬

Distributed Data Mining System in Java

Group Members
  D91725001 王春笙
  D92725002 林俊甫
  D92725001 王慧芬

Page 2

Overview of Project

Motivation and goals
  It is time-consuming to perform multi-layer data mining over a large data file
  Join forces to improve performance
    Pool the computing power spread across the network
  Fault-tolerance consideration
    The mining process continues despite a server crash

Page 3

System Architecture

[Diagram labels: Web Client; HTTP; Web Server; Log files; Request service module; Prediction engine; Distributed mining; Node, Node, Node]

Page 4

Technological Infrastructure

System diagram

[Diagram labels: LAN; Server/Coordinator; Client, Client, Client, ...; Mining data chunk]

Page 5

Project Timeline

ID  Task name
1   System Analysis
2   System Design
3
4   Server Coordinator
5   Server MsgInterf
6   Server FileDispatch
7   Server Integration T.
8   Server Comm.
9   Server Comm. Test
10
11  Client Join/Leave
12  Client Rsc lookup
13  Client Get Rsc
14  Client Integration T.
15  Client Comm.
16  Client Comm. Test
17
18  GUI Design
19  GUI Test
20
21  Integration Test
22  System Test
23  Documentation

[Gantt chart; calendar axis from 2003/11/3 to 2004/1/10]

Page 6

Job Distribution

Server programming 林俊甫

Client programming 王春笙

GUI programming and Integration 王慧芬

Page 7

Technological Infrastructure

System design requirements
  Transparency
  Scalability
    Dynamic join problem
  Multi-Threads
  RMI
  Multicast Socket
  Redundancy
    Server crash failure
    Client crash failure

Page 8

Technological Infrastructure

Rationale/justification
  Data mining is a computing-intensive task
  Web log data may be generated so quickly that a single computer cannot handle it
  Implement a distributed prediction engine
  Gain the advantage of fault tolerance

Page 9

Technological Infrastructure

Alternatives considered
  Fully distributed data mining system
    Each participant acts as an autonomous peer-to-peer node
  Client/server distributed data mining system
    The data server acts as a fixed coordinator

Page 10

Implementation Phase

System requirements
  Hardware
    2 or 3 PCs running Microsoft Windows
    One acts as the server; the others act as clients and as redundant servers.
  Software
    For implementation
      J2SE SDK 1.4.1
      Eclipse 2.1
      NetBeans 3.5.1
    For execution
      Java Web Start

Page 11

Implementation Phase

Implementation logic
  Server/Coordinator
    Listens on a well-known port and handles client connections with threads.
    Logs the information of every connected client in a hash table.
    Dispatches the designated mining data to the clients.
    Maintains the hash table and multicasts it to every client periodically.
    Merges and displays the results returned from the clients.
    Monitors the connection status of each client. If a client fails, the server runs the backup mechanism and orders a backup client to take over the failed client's job.
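The coordinator's bookkeeping described above can be sketched in memory, without any networking. This is only an illustration under stated assumptions: the class name `CoordinatorTableSketch`, the `IDLE` marker, the integer chunk numbers, and the client IDs `pc1` to `pc3` are all hypothetical and do not come from the slides, which do not show the actual table layout.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory model of the coordinator's client hash table. */
class CoordinatorTableSketch {
    static final int IDLE = -1; // assumed marker for "no designated job"

    // client ID -> chunk number currently assigned, or IDLE
    private final Map<String, Integer> assignments = new LinkedHashMap<>();
    // chunks of mining data that still need a client
    private final Deque<Integer> pendingChunks = new ArrayDeque<>();

    CoordinatorTableSketch(int chunkCount) {
        for (int chunk = 0; chunk < chunkCount; chunk++) {
            pendingChunks.add(chunk);
        }
    }

    /** A client enrolls with the coordinator and starts without a job. */
    synchronized void register(String clientId) {
        assignments.putIfAbsent(clientId, IDLE);
    }

    /** Gives one pending chunk to each idle client; returns only the new assignments. */
    synchronized Map<String, Integer> dispatch() {
        Map<String, Integer> dispatched = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : assignments.entrySet()) {
            if (entry.getValue() == IDLE && !pendingChunks.isEmpty()) {
                int chunk = pendingChunks.poll();
                entry.setValue(chunk);
                dispatched.put(entry.getKey(), chunk);
            }
        }
        return dispatched;
    }

    /** Removes a failed client and queues its unfinished chunk first in line. */
    synchronized void clientFailed(String clientId) {
        Integer chunk = assignments.remove(clientId);
        if (chunk != null && chunk != IDLE) {
            pendingChunks.addFirst(chunk);
        }
    }

    /** Copy of the table, e.g. the payload of a periodic multicast. */
    synchronized Map<String, Integer> snapshot() {
        return new LinkedHashMap<>(assignments);
    }

    public static void main(String[] args) {
        CoordinatorTableSketch table = new CoordinatorTableSketch(2);
        table.register("pc1");
        table.register("pc2");
        table.register("pc3");
        System.out.println("first dispatch:  " + table.dispatch());
        table.clientFailed("pc1"); // pc1 crashes while holding a chunk
        System.out.println("after failure:   " + table.dispatch());
        System.out.println("table snapshot:  " + table.snapshot());
    }
}
```

In this sketch the idle client `pc3` picks up the chunk that `pc1` left unfinished, which mirrors the takeover step in the list above. A real coordinator would wrap these calls in socket handling and a periodic multicast of `snapshot()`.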

Page 12

Implementation Phase

Implementation
  Client
    Once activated, enrolls with the server (coordinator).
    Receives the hash table broadcast by the server, updating its local hash table periodically, and receives the mining data sent by the server.
    Performs the data mining and returns the result to the server (coordinator).
    Monitors the server connection. If the server is not alive, runs the backup mechanism to elect a client to act as backup server.
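One way a client could keep its local table and notice a dead server is to treat the periodic table updates as a heartbeat. The slides do not say how the connection is monitored, so the timeout rule, the class name `ClientViewSketch`, and the string-valued table below are assumptions made for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical client-side view: local copy of the table plus a liveness timeout. */
class ClientViewSketch {
    private final long timeoutMillis;
    private long lastUpdateMillis;
    private Map<String, String> table = new HashMap<>();

    ClientViewSketch(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastUpdateMillis = nowMillis;
    }

    /** Called whenever a table update from the server arrives. */
    void onTableUpdate(Map<String, String> serverTable, long nowMillis) {
        table = new HashMap<>(serverTable); // keep a private copy
        lastUpdateMillis = nowMillis;
    }

    /** Assumed rule: the server is presumed dead once the timeout passes with no update. */
    boolean serverPresumedDead(long nowMillis) {
        return nowMillis - lastUpdateMillis > timeoutMillis;
    }

    /** Read-only view of the last table received. */
    Map<String, String> localTable() {
        return Collections.unmodifiableMap(table);
    }

    public static void main(String[] args) {
        ClientViewSketch view = new ClientViewSketch(3000L, 0L);
        Map<String, String> update = new HashMap<>();
        update.put("pc1", "chunk-0");
        view.onTableUpdate(update, 1000L);
        System.out.println("dead at 2000 ms? " + view.serverPresumedDead(2000L));
        System.out.println("dead at 5000 ms? " + view.serverPresumedDead(5000L));
    }
}
```

Passing the clock value in as a parameter keeps the sketch deterministic; production code would more likely read `System.currentTimeMillis()` or rely on a socket read timeout.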

Page 13

Implementation Phase

Failure and backup mechanism
  Client failure
    The server is informed of the connection failure with the client.
    The server then updates the connection information in the hash table, finds a client in the hash table without any designated job, and dispatches the unfinished job to that client.
  Server failure
    All clients are informed of the connection failure with the server.
    Because every client keeps all connection information in a hash table that the server updates periodically, after the server fails all clients elect a new server through the same election mechanism.
    The new server then broadcasts the result to all clients and enters the server listening state.
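The slides do not specify the election mechanism itself. As one possible reading, every client could apply the same deterministic rule to its copy of the hash table, so that all clients reach the same answer without extra messages. The rule used below (lowest surviving client ID wins), the class name `ElectionSketch`, and the IDs are assumptions, not the authors' design.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.Optional;
import java.util.Set;

/** Hypothetical election rule: the lowest reachable client ID becomes the new server. */
class ElectionSketch {

    /** Returns the ID that every client would pick, or empty if no client remains. */
    static Optional<String> electNewServer(Collection<String> knownClients,
                                           Set<String> unreachable) {
        return knownClients.stream()
                .filter(id -> !unreachable.contains(id))
                .min(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        // Every client holds the same list, copied from the last table update.
        Optional<String> winner = electNewServer(
                Arrays.asList("pc3", "pc1", "pc2"),
                Collections.singleton("pc1")); // pc1 is known to be down
        System.out.println("new server: " + winner.orElse("none"));
    }
}
```

The clients only agree if their table copies are identical, so a real system would also need to handle a client that missed the last update before the server failed.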

Page 14

Implementation Phase

Data mining algorithm
  Uses an Apriori-like sequential pattern mining algorithm
  Each client mines its data partition and sends the results to the coordinator (server)
  The coordinator receives the clients' mining results, unions them, and validates them by scanning all the data again
  Results are presented as association rules
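The mine-locally, union, then validate flow can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: it ignores the order of page visits (so it is not sequence-aware), it only counts itemsets of one or two items instead of running full level-wise Apriori, and the class name, method names, and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Simplified sketch: clients mine partitions, the coordinator unions and validates. */
class PartitionedMiningSketch {

    static Set<String> setOf(String... items) {
        return new TreeSet<>(Arrays.asList(items));
    }

    /** Counts every 1-item and 2-item set occurring in the transactions. */
    static Map<Set<String>, Integer> countItemsets(List<Set<String>> transactions) {
        Map<Set<String>, Integer> counts = new HashMap<>();
        for (Set<String> transaction : transactions) {
            List<String> items = new ArrayList<>(new TreeSet<>(transaction));
            for (int i = 0; i < items.size(); i++) {
                counts.merge(setOf(items.get(i)), 1, Integer::sum);
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(setOf(items.get(i), items.get(j)), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Client step: itemsets that are frequent within this partition. */
    static Set<Set<String>> mineLocal(List<Set<String>> partition, double minSupport) {
        int threshold = (int) Math.ceil(minSupport * partition.size());
        Set<Set<String>> frequent = new HashSet<>();
        for (Map.Entry<Set<String>, Integer> e : countItemsets(partition).entrySet()) {
            if (e.getValue() >= threshold) {
                frequent.add(e.getKey());
            }
        }
        return frequent;
    }

    /** Coordinator step: union the client results, then rescan all data to validate. */
    static Map<Set<String>, Integer> unionAndValidate(List<Set<Set<String>>> clientResults,
                                                      List<Set<String>> allTransactions,
                                                      double minSupport) {
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<Set<String>> result : clientResults) {
            candidates.addAll(result);
        }
        int threshold = (int) Math.ceil(minSupport * allTransactions.size());
        Map<Set<String>, Integer> validated = new HashMap<>();
        for (Set<String> candidate : candidates) {
            int count = 0;
            for (Set<String> transaction : allTransactions) {
                if (transaction.containsAll(candidate)) {
                    count++;
                }
            }
            if (count >= threshold) {
                validated.put(candidate, count);
            }
        }
        return validated;
    }

    /** Confidence of the rule antecedent -> (wholeRule minus antecedent). */
    static double confidence(Map<Set<String>, Integer> supportCounts,
                             Set<String> antecedent, Set<String> wholeRule) {
        return supportCounts.get(wholeRule).doubleValue() / supportCounts.get(antecedent);
    }

    public static void main(String[] args) {
        List<Set<String>> p1 = Arrays.asList(setOf("a", "b"), setOf("a", "b", "c"));
        List<Set<String>> p2 = Arrays.asList(setOf("a", "c"), setOf("b", "d"));
        List<Set<String>> all = new ArrayList<>(p1);
        all.addAll(p2);
        Map<Set<String>, Integer> validated = unionAndValidate(
                Arrays.asList(mineLocal(p1, 0.5), mineLocal(p2, 0.5)), all, 0.5);
        System.out.println("validated itemsets: " + validated);
        System.out.println("confidence c -> a: "
                + confidence(validated, setOf("c"), setOf("a", "c")));
    }
}
```

The rescan matters because an itemset that is frequent in one partition, such as `{b, d}` in the second partition here, may fall below the threshold over the full data set. Conversely, any itemset that is frequent overall must be frequent in at least one partition, so the union of client results cannot miss it.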

Page 15

Implementation Phase

Installation
  Server
    Web log file
    Server module
    Client module
  Client
    Client module
    Server module
  The role of a node in the mining process may change

Page 16

Implementation Phase

Test
  Component (server, client, UI) unit test
  System integration test
  Fault tolerance test
    Component error
    Transmission error
    Network error
    Host error