Distributed Data Mining System in Java
Group Members: D91725001 王春笙, D92725002 林俊甫, D92725001 王慧芬
Overview of Project
Motivation and goals
Multi-layer data mining over a large data file is time-consuming.
Join forces to improve performance: use the computing power of several machines spread over the network.
Fault tolerance consideration: the mining process continues despite a server crash.
System Architecture
[Diagram labels: Web server, log files, Node (x3), request service module, distributed mining, prediction engine, HTTP, Web client]
Technological Infrastructure
System diagram
[Diagram labels: LAN, Server/Coordinator, Client (x3, further clients elided), mining data chunk]
Project Timeline
ID  Task name
1   System Analysis
2   System Design
3
4   Server Coordinator
5   Server MsgInterf
6   Server FileDispatch
7   Server Integration T.
8   Server Comm.
9   Server Comm. Test
10
11  Client Join/Leave
12  Client Rsc lookup
13  Client Get Rsc
14  Client Integration T.
15  Client Comm.
16  Client Comm. Test
17
18  GUI Design
19  GUI Test
20
21  Integration Test
22  System Test
23  Documentation
[Gantt chart: calendar axis from 2003/11/3 to 2004/1/10]
Job Distribution
Server programming 林俊甫
Client programming 王春笙
GUI programming and Integration 王慧芬
Technological Infrastructure
System design requirements
Transparency
Scalability
Dynamic join problem
Multi-Threads
RMI
Multicast Socket
Redundancy: server crash failure, client crash failure
Technological Infrastructure
Rationale/justification
Data mining is a computing-intensive task.
Web log data may be generated so quickly that a single computer cannot handle it.
Implement a distributed prediction engine.
Gain the advantage of fault tolerance.
Technological Infrastructure
Alternatives considered
Fully distributed data mining system: each participant acts as an autonomous peer-to-peer node.
Client/server distributed data mining system: the data server acts as a fixed coordinator.
Implementation Phase
System requirements
Hardware: 2 or 3 PCs running Microsoft Windows. One acts as the server; the others act as clients and as redundant servers.
Software for implementation: J2SE SDK 1.4.1, Eclipse 2.1, NetBeans 3.5.1
Software for execution: Java Web Start
Implementation Phase
Implementation logic: Server/Coordinator
Listens on a well-known port and handles each client connection in its own thread.
Logs the information of every connected client in a hash table.
Dispatches the designated mining data to the clients.
Maintains the hash table and multicasts it to every client periodically.
Merges and displays the results returned from the clients.
Detects the connection status of each client. If a client fails, the server runs the backup mechanism and orders a backup client to take over the failed client's job.
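The bookkeeping side of the coordinator described above can be sketched in memory, without sockets, threads, or multicast. This is only an illustration under stated assumptions: the class name CoordinatorTable, its method names, the use of string client ids and integer chunk ids, and the queue of waiting chunks are all hypothetical and do not come from the slides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the coordinator's hash table.
// Networking is deliberately left out of this sketch.
class CoordinatorTable {
    // clientId -> assigned chunk id, or null when the client is idle (a backup).
    private final Map<String, Integer> assignment = new LinkedHashMap<String, Integer>();
    // Chunks that lost their client while no idle client was available (assumption).
    private final Deque<Integer> pending = new ArrayDeque<Integer>();
    // Merged results, keyed by chunk id.
    private final Map<Integer, String> results = new LinkedHashMap<Integer, String>();

    // A client enrolls; it immediately receives a waiting chunk if there is one.
    synchronized void register(String clientId) {
        if (assignment.containsKey(clientId)) return;
        assignment.put(clientId, pending.poll()); // poll() yields null when nothing waits
    }

    // Give the chunk to the first idle client; returns its id, or null if none is idle.
    synchronized String dispatch(int chunkId) {
        for (Map.Entry<String, Integer> e : assignment.entrySet()) {
            if (e.getValue() == null) {
                e.setValue(chunkId);
                return e.getKey();
            }
        }
        return null;
    }

    // Record the result a client returned, then hand it the next waiting chunk, if any.
    synchronized void complete(String clientId, String result) {
        Integer chunk = assignment.get(clientId);
        if (chunk == null) return; // unknown or idle client: nothing to record
        results.put(chunk, result);
        assignment.put(clientId, pending.poll());
    }

    // Drop a failed client and pass its unfinished chunk to an idle client.
    // Returns the client that took over, or null if the chunk had to wait.
    synchronized String clientFailed(String clientId) {
        Integer chunk = assignment.remove(clientId);
        if (chunk == null) return null;
        String taker = dispatch(chunk);
        if (taker == null) pending.add(chunk);
        return taker;
    }

    synchronized Map<String, Integer> snapshot() {
        return new LinkedHashMap<String, Integer>(assignment);
    }

    synchronized Map<Integer, String> mergedResults() {
        return new LinkedHashMap<Integer, String>(results);
    }

    synchronized List<Integer> pendingChunks() {
        return new ArrayList<Integer>(pending);
    }
}

class CoordinatorTableDemo {
    public static void main(String[] args) {
        CoordinatorTable table = new CoordinatorTable();
        table.register("A");
        table.register("B");
        table.dispatch(1);                            // chunk 1 goes to A
        System.out.println(table.clientFailed("A"));  // B takes over chunk 1
        System.out.println(table.snapshot());
    }
}
```

In a real coordinator the snapshot() copy would be what gets multicast to the clients, and the synchronized methods would be called from the per-connection threads.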
Implementation Phase
Implementation logic: Client
Once activated, enrolls with the server (coordinator).
Receives the hash table broadcast by the server, periodically updating its local hash table, and receives the mining data sent from the server.
Performs the data mining and returns the result to the server (coordinator).
Detects the server connection. If the server is not alive, runs the backup mechanism to elect a client to act as backup server.
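The slides do not say how a client detects that the server is no longer alive. One plausible approach, shown here purely as an assumption, is to treat the periodic hash table update as a heartbeat and presume the server dead once no message has arrived within a timeout. The class ServerLivenessMonitor and its methods are hypothetical names; timestamps are passed in explicitly so the logic can be exercised without a real clock.

```java
// Hypothetical timeout-based liveness check on the client side.
class ServerLivenessMonitor {
    private final long timeoutMillis;
    private long lastHeardMillis;

    ServerLivenessMonitor(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeardMillis = nowMillis;
    }

    // Call whenever a hash table update or any other server message arrives.
    synchronized void onServerMessage(long nowMillis) {
        lastHeardMillis = nowMillis;
    }

    // True when the server has been silent for longer than the timeout.
    synchronized boolean serverPresumedDead(long nowMillis) {
        return nowMillis - lastHeardMillis > timeoutMillis;
    }
}

class ServerLivenessDemo {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        ServerLivenessMonitor monitor = new ServerLivenessMonitor(3000, start);
        monitor.onServerMessage(start + 1000);
        // 2.5 s after the last message: still within the 3 s timeout.
        System.out.println(monitor.serverPresumedDead(start + 3500));
        // 4 s after the last message: the backup mechanism would start here.
        System.out.println(monitor.serverPresumedDead(start + 5000));
    }
}
```

The timeout should be a small multiple of the server's update period so that a single lost packet does not trigger an election.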
Implementation Phase
Failure and backup mechanism
Client failure: The server is informed of the connection failure with the client. The server then updates the connection information in the hash table, finds a client in the hash table without a designated job, and dispatches the unfinished job to that client.
Server failure: All clients are informed of the connection failure with the server. Because every client keeps the full connection information in a hash table that the server updates periodically, after the server fails all clients elect a new server through the same election mechanism. The new server then broadcasts the result to all clients and enters the server listening state.
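The election rule itself is not given in the slides. Because every client holds the same hash table, any deterministic rule applied to that table lets all clients reach the same answer without exchanging further messages. The sketch below assumes one such rule, picking the smallest client id; the class BackupElection, its methods, and the id format are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical election rule: the surviving client with the smallest id wins.
class BackupElection {
    // Every client runs this on its own copy of the table's client ids.
    static String electNewServer(Collection<String> clientIds) {
        if (clientIds.isEmpty()) {
            throw new IllegalArgumentException("no clients left to elect");
        }
        return Collections.min(clientIds);
    }

    // A client switches to the server listening state only if it is the winner.
    static boolean shouldBecomeServer(String myId, Collection<String> clientIds) {
        return myId.equals(electNewServer(clientIds));
    }
}

class BackupElectionDemo {
    public static void main(String[] args) {
        List<String> ids = Arrays.asList("client-c", "client-a", "client-b");
        System.out.println(BackupElection.electNewServer(ids));
        System.out.println(BackupElection.shouldBecomeServer("client-b", ids));
    }
}
```

The rule only works if the clients' tables agree, which is why the periodic update from the server matters: a client with a stale table could elect a different winner.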
Implementation Phase
Data mining algorithm
Uses an Apriori-like sequential pattern mining algorithm.
Each client mines its data partition and sends the results to the coordinator (server).
The coordinator receives the clients' mining results, unions them, and validates them by scanning all the data again.
Results are presented as association rules.
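The mine-locally, union, then rescan flow can be illustrated with a much simplified stand-in for the algorithm. The sketch below is an assumption-laden simplification, not the project's code: it counts unordered itemsets of size 1 and 2 instead of sequential patterns, it stops before generating association rules, and the names PartitionMining, mineLocal, validate, and tx are hypothetical. Support is given as a fraction so that any itemset frequent over all the data is guaranteed to be frequent in at least one partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PartitionMining {
    // Convenience helper: build one transaction (for example, pages seen in a session).
    static Set<String> tx(String... items) {
        return new HashSet<String>(Arrays.asList(items));
    }

    // Client side: itemsets of size 1 and 2 whose support within this partition
    // reaches minSupport (a fraction between 0 and 1).
    static Set<Set<String>> mineLocal(List<Set<String>> partition, double minSupport) {
        Map<Set<String>, Integer> counts = new HashMap<Set<String>, Integer>();
        for (Set<String> t : partition) {
            List<String> items = new ArrayList<String>(new TreeSet<String>(t));
            for (int i = 0; i < items.size(); i++) {
                bump(counts, tx(items.get(i)));
                for (int j = i + 1; j < items.size(); j++) {
                    bump(counts, tx(items.get(i), items.get(j)));
                }
            }
        }
        Set<Set<String>> frequent = new HashSet<Set<String>>();
        for (Map.Entry<Set<String>, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minSupport * partition.size()) frequent.add(e.getKey());
        }
        return frequent;
    }

    // Coordinator side: union the clients' candidates, then scan all data again
    // and keep only the candidates that are frequent globally, with their counts.
    static Map<Set<String>, Integer> validate(Collection<Set<Set<String>>> localResults,
                                              List<Set<String>> allData, double minSupport) {
        Set<Set<String>> candidates = new HashSet<Set<String>>();
        for (Set<Set<String>> local : localResults) candidates.addAll(local);
        Map<Set<String>, Integer> confirmed = new HashMap<Set<String>, Integer>();
        for (Set<String> candidate : candidates) {
            int count = 0;
            for (Set<String> t : allData) {
                if (t.containsAll(candidate)) count++;
            }
            if (count >= minSupport * allData.size()) confirmed.put(candidate, count);
        }
        return confirmed;
    }

    private static void bump(Map<Set<String>, Integer> counts, Set<String> key) {
        Integer old = counts.get(key);
        counts.put(key, old == null ? 1 : old + 1);
    }
}
```

The second scan is what makes the union safe: an itemset can be frequent in one partition yet rare overall, so local results are only candidates until the coordinator confirms them against the full data.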
Implementation Phase
Installation
Server: web log file, server module, client module
Client: client module, server module
The role of a node in the mining process may change.
Implementation Phase
Test
Component (server, client, UI) unit test
System integration test
Fault tolerance test: component error, transmission error, network error, host error