Introduction to IEEE ICDM Data Mining Contest (ICDM DMC 2007)

Post on 17-Jan-2018


Introduction to IEEE ICDM Data Mining Contest (ICDM DMC 2007)

guofeng314@163.com

Main Parts

• Introduction to ICDM DMC 2007

• The work of our team

Introduction to ICDM DMC 2007

• This year's contest is the first IEEE ICDM Data Mining Contest, which will be held in conjunction with the 2007 IEEE International Conference on Data Mining.

• http://www.cse.ust.hk/~qyang/ICDMDMC07/

What is the Problem?

• This year's contest is about indoor location estimation from radio signal strengths received by a client device from various WiFi Access Points (APs)

What is the AP?

• Access points are base stations for the wireless network. They transmit and receive radio signals so that wireless-enabled devices can communicate.

• The client device (which can be a PDA) is equipped with a wireless card that can receive signals from many surrounding wireless access points (APs).  Each of these APs is identifiable with a unique ID.  Based on the collection of signal strength values (RSS values), a data mining algorithm running on the client device tries to figure out the current location of the user.

RSS Vectors

• RSS Vector = <(AP1, RSS Value1), (AP2, RSS Value2)...(AP k, RSS Value k)>

• The ID of an AP is an integer between 0 and 100.

• Each RSS value is an integer between -99 and 0.

• The number k differs from one RSS vector to another.

• The WiFi data are very noisy due to the so-called multi-path effect in indoor environments.
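Because k varies from one RSS vector to the next, it can help to store each vector sparsely and expand it only when a fixed-length form is needed. The following is a minimal sketch, not part of the original slides: the helper name `to_dense`, the placeholder value -100 for an AP that was not heard, and the sample readings are all assumptions made for illustration.

```python
# Minimal sketch: a sparse RSS vector as a dict {ap_id: rss}.
# Assumptions (not from the slides): the helper name, the -100 placeholder
# for an AP that was not heard, and the sample readings below.
NOT_HEARD = -100  # one step below the weakest valid reading (-99)


def to_dense(rss_vector, num_aps=101):
    """Expand {ap_id: rss} into a list with one slot per AP ID (0..100)."""
    dense = [NOT_HEARD] * num_aps
    for ap_id, rss in rss_vector.items():
        if not 0 <= ap_id < num_aps:
            raise ValueError(f"AP ID out of range: {ap_id}")
        if not -99 <= rss <= 0:
            raise ValueError(f"RSS value out of range: {rss}")
        dense[ap_id] = rss
    return dense


scan = {18: -96, 23: -87, 66: -69}  # k = 3 for this vector
dense = to_dense(scan)
print(len(dense), dense[18], dense[0])
```

The sparse form keeps only the APs that were actually heard, which matches the variable-length vectors described above; the dense form is only one possible input layout for a learning algorithm.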

Location Label

• All WiFi data were collected at 247 locations, where each location is a grid cell of about 1.5 m × 1.5 m.

• Location label is an integer between 1 and 247.

Task 1. Indoor Location Estimation

• All the WiFi data (training data and test data) are collected by the same device in the same time period.

• There are two types of data provided in this task:

• 1. Trace data

• 2. Non-trace data

Task 1: trace data

Some statistics for the Task 1 trace data

• 40 traces

• 1404 collections, 130 of them labeled

• 11881 pairs of APID and value

• Average of 8.5 pairs of APID and value per collection (minimum 1, maximum 19)

Task 1: non-trace data

Some statistics for the Task 1 non-trace data

• 1792 collections of RSS values

• 375 collections labeled

• 15256 pairs of APID and value

• Average of 8.5 pairs of APID and value per collection (minimum 1, maximum 19)

Task_2_training_data

Some statistics for Task_2_training_data

• 2322 collections of RSS values

• 621 collections labeled

• Average of 2.5 labeled collections per class (minimum 1, maximum 8)

• Average of 8.6 pairs of APID and value per collection (minimum 2, maximum 19)

Task2 Test Dataset

Task2 Landmark Dataset

Evaluation Criterion

• For Task 1, the baseline precision is 60%.

• For Task 2, the baseline precision is 30%.

The algorithm of our team for Task 2

Step 1: Sieve out the labeled collections

Step 2: Get the differences between every two labeled collections

• Number of APID-value pairs that appear in only one of the two collections

• Sum of the absolute differences between those RSS values and -100

• Number of APID-value pairs whose APID appears in both collections

• Sum of the absolute differences between the two RSS values of each such APID

• Whether the two collections come from the same location: 1 if the same, -1 if not

An example

• Collection A: 119 18:-96 23:-87 66:-69

• Collection B: 54 18:-94 83:-62 85:-76 86:-72 89:-85

• The five numbers are 6, 149, 1, 2, -1
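The five numbers in this example can be reproduced with a short sketch. It assumes, since the slides do not say so explicitly, that the leading number of each collection (119 and 54) is its location label and that each remaining token is an `APID:RSS` pair; the function names are hypothetical.

```python
# Sketch of the Step 2 pairwise features.
# Assumptions (not stated in the slides): the first token of a collection is
# its location label; parse_collection and pair_features are made-up names.
MISSING_RSS = -100  # reference level for an AP seen in only one collection


def parse_collection(line):
    """Parse 'label ap:rss ap:rss ...' into (label, {ap_id: rss})."""
    label, *pairs = line.split()
    readings = {}
    for pair in pairs:
        ap_id, rss = pair.split(":")
        readings[int(ap_id)] = int(rss)
    return int(label), readings


def pair_features(a, b):
    """Return the five numbers of Step 2 for two labeled collections."""
    label_a, rss_a = a
    label_b, rss_b = b
    only_one = rss_a.keys() ^ rss_b.keys()  # APs heard in exactly one collection
    shared = rss_a.keys() & rss_b.keys()    # APs heard in both collections
    merged = {**rss_a, **rss_b}
    gap_to_missing = sum(abs(merged[ap] - MISSING_RSS) for ap in only_one)
    shared_diff = sum(abs(rss_a[ap] - rss_b[ap]) for ap in shared)
    same_location = 1 if label_a == label_b else -1
    return (len(only_one), gap_to_missing, len(shared), shared_diff, same_location)


a = parse_collection("119 18:-96 23:-87 66:-69")
b = parse_collection("54 18:-94 83:-62 85:-76 86:-72 89:-85")
features = pair_features(a, b)
print(features)  # (6, 149, 1, 2, -1)
```

Here six APs (23, 66, 83, 85, 86, 89) appear in only one collection, and their distances from -100 are 13 + 31 + 38 + 24 + 28 + 15 = 149. AP 18 is the single shared AP, with |-96 - (-94)| = 2.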

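Step 3: Get coefficients by linear fitting

e = dlmread('distance_matrix.txt');
b = e(:,5);                  % target column: positive for same-location pairs
x = e(:,7:9);
x(:,2) = [];                 % keep columns 7 and 9 as the features
[x1,y1] = find(b>0);         % row indices of the positive pairs
x_pos = x(x1,:);
b_pos = b(x1,1);
x_append = x;
b_append = b;
% append extra copies of the positive pairs to offset the class imbalance
for i = 1:floor(length(b)/length(b_pos))
    x_append = cat(1,x_append,x_pos);
    b_append = cat(1,b_append,b_pos);
end
a = x_append\b_append;       % least-squares coefficients
c = (x*a).*b;                % positive where prediction and target agree in sign
accuracy = sum(c>0)/length(b);
display(accuracy);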
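For readers without MATLAB, the same three operations (oversampling the positive pairs, a least-squares fit without an intercept, and sign-agreement accuracy) can be sketched in plain Python. This is an illustrative reading of the script, not the team's code: the function names and the toy data are invented, and the fit solves the normal equations, which gives the same solution as MATLAB's backslash only when the feature columns are linearly independent.

```python
# Sketch of Step 3 in plain Python. Function names and toy data are assumptions.

def oversample_positives(rows, targets):
    """Append floor(N / N_pos) extra copies of the positive rows."""
    positives = [(r, t) for r, t in zip(rows, targets) if t > 0]
    out_rows, out_targets = list(rows), list(targets)
    if not positives:
        return out_rows, out_targets
    for _ in range(len(targets) // len(positives)):
        for r, t in positives:
            out_rows.append(r)
            out_targets.append(t)
    return out_rows, out_targets


def fit_least_squares(rows, targets):
    """Solve the normal equations (X^T X) a = X^T b by Gauss-Jordan elimination."""
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T b].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(n):
            if k != col:
                factor = m[k][col] / m[col][col]
                m[k] = [v - factor * p for v, p in zip(m[k], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def sign_accuracy(rows, targets, coeffs):
    """Fraction of rows where the linear score has the same sign as the target."""
    hits = sum(1 for r, t in zip(rows, targets)
               if sum(c * v for c, v in zip(coeffs, r)) * t > 0)
    return hits / len(targets)


# Made-up toy data: two features per pair, targets are +1 / -1.
rows = [[1, 0], [2, 0], [0, 1], [0, 2], [0, 3], [0, 4]]
targets = [1, 1, -1, -1, -1, -1]
big_rows, big_targets = oversample_positives(rows, targets)
coeffs = fit_least_squares(big_rows, big_targets)
accuracy = sign_accuracy(rows, targets, coeffs)
print(coeffs, accuracy)
```

As in the MATLAB script, the coefficients are fitted on the oversampled data but the accuracy is measured on the original, unbalanced pairs.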

Remaining steps:

Step 4: Get the center of each class (the collections from the same location)

Step 5: Testing. Our highest precision was 28.30%.
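The slides give no further detail on Steps 4 and 5, so the following is only one plausible reading, under loud assumptions: a class center is taken to be the per-AP mean RSS over the labeled collections of a location, an AP missing from a collection counts as -100, and a test collection is assigned to the center with the smallest unweighted sum of absolute differences. The team's actual distance presumably used the fitted coefficients from Step 3, and all names and readings below are hypothetical.

```python
# Sketch of Steps 4-5 under stated assumptions (not the team's actual method):
# centers are per-AP means, missing APs count as -100, distance is plain L1.
MISSING_RSS = -100


def class_centers(labeled):
    """labeled: list of (location, {ap_id: rss}) -> {location: {ap_id: mean_rss}}."""
    grouped = {}
    for location, readings in labeled:
        grouped.setdefault(location, []).append(readings)
    centers = {}
    for location, group in grouped.items():
        aps = set().union(*group)  # every AP heard at this location
        centers[location] = {
            ap: sum(r.get(ap, MISSING_RSS) for r in group) / len(group)
            for ap in aps
        }
    return centers


def l1_distance(x, y):
    """Sum of absolute RSS differences over the APs present in either vector."""
    aps = x.keys() | y.keys()
    return sum(abs(x.get(ap, MISSING_RSS) - y.get(ap, MISSING_RSS)) for ap in aps)


def predict(readings, centers):
    """Return the location whose center is closest to the given readings."""
    return min(centers, key=lambda loc: l1_distance(readings, centers[loc]))


# Made-up labeled collections for two locations.
labeled = [(1, {18: -60, 23: -70}), (1, {18: -62}), (2, {66: -50})]
centers = class_centers(labeled)
guess = predict({18: -61, 23: -80}, centers)
print(centers[1], guess)
```

With so few labeled collections per class (2.5 on average in the Task 2 training data), each center rests on very little data, which is one reason a simple scheme like this would be sensitive to noise.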

Thank you!
