Introduction to IEEE ICDM Data Mining Contest (ICDM DMC 2007)

[email protected]

Main Parts

• Introduction to ICDM DMC 2007

• The work of our team

Introduction to ICDM DMC 2007

• This year's contest is the first IEEE ICDM Data Mining Contest, which will be held in conjunction with the 2007 IEEE International Conference on Data Mining.

• http://www.cse.ust.hk/~qyang/ICDMDMC07/

What is the Problem?

• This year's contest is about indoor location estimation from the radio signal strengths that a client device receives from various WiFi access points (APs).

What is an AP?

• Access points are the base stations of a wireless network. They transmit and receive radio signals so that wireless-enabled devices can communicate through them.

• The client device (which can be a PDA) is equipped with a wireless card that can receive signals from many surrounding wireless access points (APs). Each AP is identified by a unique ID. Based on the collected received signal strength (RSS) values, a data mining algorithm running on the client device tries to estimate the user's current location.

RSS Vectors

• RSS vector = <(AP1, RSS value 1), (AP2, RSS value 2), ..., (APk, RSS value k)>

• The ID of an AP is an integer between 0 and 100.

• The RSS value is also an integer, between -99 and 0.

• The number k differs from one RSS vector to another.

• The WiFi data are very noisy due to the so-called multi-path effect in indoor environments.
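A minimal Python sketch of reading one RSS vector, assuming the plain-text layout of the example collections shown later in Step 2 (a leading location label followed by `APID:RSS` tokens). The function name `parse_collection` is hypothetical, and the range checks simply encode the bounds stated on this slide.

```python
from typing import Dict, Tuple


def parse_collection(line: str) -> Tuple[int, Dict[int, int]]:
    """Parse '<label> <apid>:<rss> ...' into (label, {apid: rss}).

    Assumed (hypothetical) layout: the first token is the location label and
    every remaining token is an AP ID and its RSS value joined by ':'.
    """
    tokens = line.split()
    label = int(tokens[0])
    rss: Dict[int, int] = {}
    for token in tokens[1:]:
        ap_text, value_text = token.split(":")
        ap_id, value = int(ap_text), int(value_text)
        # Bounds from the slide: AP ID in [0, 100], RSS value in [-99, 0].
        if not 0 <= ap_id <= 100:
            raise ValueError(f"AP ID out of range: {ap_id}")
        if not -99 <= value <= 0:
            raise ValueError(f"RSS value out of range: {value}")
        rss[ap_id] = value
    return label, rss


label, vector = parse_collection("119 18:-96 23:-87 66:-69")
print(label, vector)  # 119 {18: -96, 23: -87, 66: -69}
```

Here k is just `len(vector)`, so vectors of different lengths need no special handling.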

Location Label

• All WiFi data are collected at 247 locations; each location is a grid cell of about 1.5 m × 1.5 m.

• Location label is an integer between 1 and 247.

Task 1. Indoor Location Estimation

• All the WiFi data (training data and test data) are collected by the same device in the same time period.

• Two types of data are provided in this task: (1) trace data and (2) non-trace data.

Task 1. Trace data

Some statistics of the Task 1 trace data

• 40 traces

• 1404 collections, of which 130 are labeled

• 11881 (APID, value) pairs

• On average 8.5 (APID, value) pairs per collection; the minimum is 1 and the maximum is 19

Task 1. Non-trace data

Some statistics of the Task 1 non-trace data

• 1792 collections of RSS values, of which 375 are labeled

• On average 8.5 (APID, value) pairs per collection; the minimum is 1 and the maximum is 19

• 15256 (APID, value) pairs

Task_2_training_data

Some statistics of Task_2_training_data

• 2322 collections of RSS values, of which 621 are labeled

• On average 2.5 labeled collections per class (location); the minimum is 1 and the maximum is 8

• On average 8.6 (APID, value) pairs per collection; the minimum is 2 and the maximum is 19
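The statistics above can be reproduced with a short Python sketch. It assumes a hypothetical in-memory form for each collection, `(label, {apid: rss})` with `label` set to `None` when the collection is unlabeled; the toy data reuse the two example collections from Step 2 plus one made-up unlabeled collection. The last lines check that the reported averages agree with the reported totals.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (location label or None, {AP ID: RSS value}).
Collection = Tuple[Optional[int], Dict[int, int]]


def summarize(collections: List[Collection]) -> Dict[str, float]:
    """Counts and per-collection pair statistics like those on the slides."""
    sizes = [len(rss) for _, rss in collections]
    return {
        "collections": len(collections),
        "labeled": sum(1 for label, _ in collections if label is not None),
        "pairs": sum(sizes),
        "avg_pairs": sum(sizes) / len(sizes),
        "min_pairs": min(sizes),
        "max_pairs": max(sizes),
    }


toy: List[Collection] = [
    (119, {18: -96, 23: -87, 66: -69}),
    (54, {18: -94, 83: -62, 85: -76, 86: -72, 89: -85}),
    (None, {18: -90}),  # made-up unlabeled collection
]
stats = summarize(toy)
print(stats)

# Consistency checks on the reported figures.
print(round(11881 / 1404, 1))  # Task 1 trace: pairs per collection -> 8.5
print(round(15256 / 1792, 1))  # Task 1 non-trace: pairs per collection -> 8.5
print(round(621 / 247, 1))     # Task 2: labeled collections per location -> 2.5
```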

Task 2 Test Dataset

Task 2 Landmark Dataset

Evaluation Criterion

• For Task 1, the baseline precision is 60%.

• For Task 2, the baseline precision is 30%.
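A Python sketch of the evaluation, under the assumption (not stated on the slide) that precision means the fraction of test collections whose predicted location label exactly matches the true label. The function name `precision` and the sample labels are illustrative only.

```python
from typing import Sequence


def precision(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Share of test collections whose predicted location label is exactly right.

    Assumption: this is how "precision" is scored; the slide does not define it.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


score = precision([12, 54, 119, 7], [12, 54, 3, 8])  # made-up labels
print(score)          # 0.5
print(score >= 0.30)  # meets the Task 2 baseline? True
```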

Our team's algorithm for Task 2

Step 1: Sieve out the labeled collections
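A minimal Python sketch of Step 1, assuming a hypothetical in-memory form `(label, {apid: rss})` in which unlabeled collections carry `None` as their label. The toy data are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (location label or None, {AP ID: RSS value}).
Collection = Tuple[Optional[int], Dict[int, int]]


def sieve_labeled(collections: List[Collection]) -> List[Tuple[int, Dict[int, int]]]:
    """Keep only the collections that carry a location label."""
    return [(label, rss) for label, rss in collections if label is not None]


data: List[Collection] = [
    (119, {18: -96, 23: -87, 66: -69}),
    (None, {18: -90, 23: -85}),  # made-up unlabeled collection
    (54, {18: -94, 83: -62}),    # made-up labeled collection
]
labeled = sieve_labeled(data)
print([label for label, _ in labeled])  # [119, 54]
```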

Step 2: Compute the differences between any two labeled collections. Five numbers are recorded for each pair:

• The number of (APID, value) pairs that appear in only one of the two collections

• The sum of the absolute differences between those RSS values and -100

• The number of APIDs that appear in both collections

• The sum of the absolute differences between the two RSS values of each shared APID

• Whether the two collections come from the same location: 1 if the same, -1 if not

An example

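• Collection A: 119 18:-96 23:-87 66:-69

• Collection B: 54 18:-94 83:-62 85:-76 86:-72 89:-85

• The five numbers are 6, 149, 1, 2, -1: APs 23 and 66 appear only in A and APs 83, 85, 86 and 89 only in B (6 pairs); (100 - 87) + (100 - 69) + (100 - 62) + (100 - 76) + (100 - 72) + (100 - 85) = 149; only AP 18 is shared (1); |-96 - (-94)| = 2; and the location labels 119 and 54 differ (-1).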
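The five numbers can be computed with the Python sketch below, which reproduces the example above. It assumes the same `label APID:RSS ...` text layout; `parse`, `pair_features` and `distance_rows` are hypothetical names, and `distance_rows` only illustrates building one row per pair of labeled collections, not the layout of the team's `distance_matrix.txt`.

```python
from itertools import combinations
from typing import Dict, List, Tuple

MISSING_RSS = -100  # reference value that unshared RSS readings are compared against

Labeled = Tuple[int, Dict[int, int]]


def parse(line: str) -> Labeled:
    """Parse '<label> <apid>:<rss> ...' into (label, {apid: rss})."""
    tokens = line.split()
    pairs = (token.split(":") for token in tokens[1:])
    return int(tokens[0]), {int(ap): int(value) for ap, value in pairs}


def pair_features(first: Labeled, second: Labeled) -> Tuple[int, int, int, int, int]:
    """The five numbers of Step 2 for one pair of labeled collections."""
    (label_a, rss_a), (label_b, rss_b) = first, second
    only_one = rss_a.keys() ^ rss_b.keys()  # APs seen in exactly one collection
    both = rss_a.keys() & rss_b.keys()      # APs seen in both collections
    merged = {**rss_a, **rss_b}             # each only_one AP has a single reading
    return (
        len(only_one),
        sum(abs(merged[ap] - MISSING_RSS) for ap in only_one),
        len(both),
        sum(abs(rss_a[ap] - rss_b[ap]) for ap in both),
        1 if label_a == label_b else -1,
    )


def distance_rows(labeled: List[Labeled]) -> List[Tuple[int, int, int, int, int]]:
    """One row of features for every unordered pair of labeled collections."""
    return [pair_features(x, y) for x, y in combinations(labeled, 2)]


collection_a = parse("119 18:-96 23:-87 66:-69")
collection_b = parse("54 18:-94 83:-62 85:-76 86:-72 89:-85")
print(pair_features(collection_a, collection_b))  # (6, 149, 1, 2, -1)
```

With n labeled collections, `distance_rows` produces n(n - 1)/2 rows, so the pair table grows quadratically.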

Step 3: Get coefficients by linear fitting

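    e = dlmread('distance_matrix.txt');  % one row per pair of labeled collections
    b = e(:,5);                          % target: 1 = same location, -1 = different
    x = e(:,7:9);
    x(:,2) = [];                         % drop column 8; columns 7 and 9 are the features
    [x1,y1] = find(b>0);                 % rows of same-location (positive) pairs
    x_pos = x(x1,:);
    b_pos = b(x1,1);
    % Oversample the rare positive pairs so they roughly balance the negatives
    x_append = x;
    b_append = b;
    for i = 1:floor(length(b)/length(b_pos))
        x_append = cat(1,x_append,x_pos);
        b_append = cat(1,b_append,b_pos);
    end
    a = x_append\b_append;               % least-squares coefficients
    c = (x*a).*b;                        % positive where sign(x*a) matches the target
    accuracy = sum(c>0)/length(b);       % fraction of pairs classified correctly
    display(accuracy);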
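Written out, the fitting step appears to solve an ordinary least-squares problem on the oversampled data. In the notation below, X is the feature matrix `x` (n rows, 2 columns), b is the vector of ±1 targets, X+ and b+ are the rows with b = 1 (n+ of them), and m = floor(n / n+) is the number of appended copies. For a full-column-rank stacked matrix, MATLAB's backslash returns the minimizer, and `accuracy` is measured on the original, non-oversampled rows.

```latex
\tilde{X} = \begin{bmatrix} X \\ X_+ \\ \vdots \\ X_+ \end{bmatrix}, \qquad
\tilde{b} = \begin{bmatrix} b \\ b_+ \\ \vdots \\ b_+ \end{bmatrix}
\qquad (m \text{ copies of } X_+ \text{ and } b_+)

a = \arg\min_{a} \lVert \tilde{X} a - \tilde{b} \rVert_2^2
  = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top \tilde{b}

\text{accuracy} = \frac{1}{n} \sum_{i=1}^{n}
  \mathbf{1}\!\left[ (x_i^\top a)\, b_i > 0 \right]
```

Because no column of ones is appended to `x`, the fitted score x_i^T a has no intercept term.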

Remaining steps:

Step 4: Get the center of each class (the collections from the same location)

Step 5: Testing. Our highest precision was 28.30%, just below the 30% baseline for Task 2.
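Steps 4 and 5 are only named on the slide, so the Python sketch below fills them in with explicit assumptions. It treats a class center as the per-AP mean RSS over that location's labeled collections, and it assigns a test collection to the location whose center gets the highest linear score on the first four Step 2 numbers. The `weights` stand in for coefficients learned in Step 3 and are hypothetical placeholders, as are all function names and the toy data.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

MISSING_RSS = -100  # reference value used for APs seen in only one collection


def class_centers(labeled: List[Tuple[int, Dict[int, int]]]) -> Dict[int, Dict[int, float]]:
    """Assumed Step 4: mean RSS per AP over each location's labeled collections."""
    sums: Dict[int, Dict[int, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[int, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for label, rss in labeled:
        for ap, value in rss.items():
            sums[label][ap] += value
            counts[label][ap] += 1
    return {
        label: {ap: total / counts[label][ap] for ap, total in per_ap.items()}
        for label, per_ap in sums.items()
    }


def features(x: Dict[int, float], y: Dict[int, float]) -> Tuple[float, float, float, float]:
    """First four Step 2 numbers between two RSS vectors."""
    only_one = x.keys() ^ y.keys()
    both = x.keys() & y.keys()
    merged = {**x, **y}
    return (
        len(only_one),
        sum(abs(merged[ap] - MISSING_RSS) for ap in only_one),
        len(both),
        sum(abs(x[ap] - y[ap]) for ap in both),
    )


def predict(test: Dict[int, float], centers: Dict[int, Dict[int, float]],
            weights: Sequence[float]) -> int:
    """Assumed Step 5: pick the location whose center scores highest."""
    def score(center: Dict[int, float]) -> float:
        return sum(w * f for w, f in zip(weights, features(test, center)))
    return max(centers, key=lambda label: score(centers[label]))


labeled = [(1, {18: -90, 23: -80}), (1, {18: -94, 23: -84}), (2, {83: -60, 85: -70})]
centers = class_centers(labeled)
# Hypothetical weights: more shared APs, fewer unshared APs and smaller RSS gaps score higher.
weights = (-1.0, -0.1, 1.0, -0.1)
print(predict({18: -93, 23: -81}, centers, weights))  # 1
```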

Thank you!