V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Children’s Research Hospital


Page 1: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

V-detector: a real-valued negative selection algorithm

Zhou Ji

St. Jude Children’s Research Hospital

Page 2: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

What is negative selection?

Biological background: T cells, thymus

Major steps:

1. Generate candidates randomly

2. Eliminate those that recognize self samples
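The two steps above can be sketched in a few lines of Python. This is only an illustration, not the author's software: the names `negative_selection`, `make_candidate` and `matches` are hypothetical, and the one-dimensional data and the matching threshold of 0.1 are made up for the example.

```python
import random

def negative_selection(make_candidate, matches, self_samples, n_detectors,
                       max_tries=10_000):
    """Keep random candidates that recognize (match) no self sample."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) >= n_detectors:
            break
        candidate = make_candidate()                 # step 1: generate randomly
        if not any(matches(candidate, s) for s in self_samples):
            detectors.append(candidate)              # step 2: survivors become detectors
    return detectors

# Toy usage (assumed data): 1-D self samples, match when within 0.1.
rng = random.Random(0)
self_samples = [0.40, 0.45, 0.50, 0.55, 0.60]
detectors = negative_selection(
    make_candidate=rng.random,
    matches=lambda d, s: abs(d - s) <= 0.1,
    self_samples=self_samples,
    n_detectors=5,
)
print(detectors)
```

The matching rule is passed in as a function because, as the next slides note, it depends on the data representation.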

Page 3: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Main steps

Generation and detection

Page 4: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

What is a matching rule?

The rule that decides when a sample and a detector are considered matching.

The matching rule plays an important role in a negative selection algorithm. It largely depends on the data representation.

Page 5: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

In real-valued representation, a detector can be visualized as a hyper-sphere.

Candidate 1: thrown away; candidate 2: made a detector.

Match or not match?
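A minimal sketch of such a matching rule for hyper-sphere detectors, assuming Euclidean distance and assuming that a sample on the sphere's surface counts as a match (the slides do not say whether the comparison is strict). The function names and the example detectors are hypothetical.

```python
import math

def matches(center, radius, sample):
    """True when the sample lies inside the detector's hyper-sphere."""
    return math.dist(center, sample) <= radius

def is_nonself(sample, detectors):
    """A sample is flagged as non-self if any detector matches it."""
    return any(matches(center, radius, sample) for center, radius in detectors)

# Two made-up 2-D detectors given as (center, radius).
detectors = [((0.2, 0.2), 0.10), ((0.8, 0.7), 0.15)]
print(is_nonself((0.25, 0.20), detectors))  # inside the first detector
print(is_nonself((0.50, 0.50), detectors))  # outside both detectors
```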

Page 6: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Main idea of V-detector

By allowing the detectors to have some variable properties, V-detector enhances the negative selection algorithm in several respects:

It takes fewer large detectors to cover the non-self region, saving time and space.

Small detectors cover “holes” better.

Coverage is estimated when the detector set is generated.

The shapes of detectors or even the types of matching rules can be extended to be variable too.

Page 7: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Main concept of Negative Selection and V-detector

Constant-sized detectors Variable-sized detectors

Page 8: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Outline of the algorithm (generation of variable-sized detector set)

Page 9: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Detector Set Generation Algorithm

Constant-sized detectors

Detector-Set(S, m, r_s)
  S: set of self samples
  m: number of detectors
  r_s: self radius
  1: D ← ∅
  2: Repeat
  3:   x ← random sample from [0,1]^n
  4:   Repeat for every s_i in S = {s_i, i = 1, 2, ...}
  5:     d ← Euclidean distance between s_i and x
  6:     if d ≤ r_s, go to 2
  7:   D ← D ∪ {x}
  8: Until |D| = m
  9: return D

Variable-sized detectors

V-Detector-Set(S, T_max, r_s, c_0)
  S: set of self samples
  T_max: maximum number of detectors
  r_s: self radius
  c_0: estimated coverage
  1: D ← ∅
  2: Repeat
  3:   t ← 0
  4:   T ← 0
  5:   r ← infinite
  6:   x ← random sample from [0,1]^n
  7:   Repeat for every d_i in D = {d_i, i = 1, 2, ...}
  8:     d_d ← Euclidean distance between x(d_i) and x, where x(d_i) is the location of d_i
  9:     if d_d ≤ r(d_i) then, where r(d_i) is the radius of detector d_i
  10:      t ← t + 1
  11:      if t ≥ 1/(1 − c_0) then return D
  12:      go to 4
  13:  Repeat for every s_i in S
  14:    d ← Euclidean distance between s_i and x
  15:    if d − r_s ≤ r then r ← d − r_s
  16:  if r > 0 then D ← D ∪ {<x, r>}, where <x, r> is a detector with location x and radius r
  17:  else T ← T + 1
  18:  if T > 1/(1 − maximum self coverage) exit
  19: Until |D| = T_max
  20: return D
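The variable-sized procedure can be sketched in Python as follows. This is one reading of the algorithm, not the author's implementation: the function name, the default values, and the exact handling of the two counters are assumptions. Here `covered_run` counts consecutive candidates that fall inside an existing detector and is reset when a new detector is added, and `self_hits` counts candidates that land inside the self region. The self sample set is assumed to be non-empty.

```python
import math
import random

def generate_v_detectors(self_samples, self_radius, coverage=0.99,
                         max_detectors=1000, max_self_coverage=0.99,
                         dim=2, seed=0):
    """Return a list of (center, radius) detectors with variable radii."""
    rng = random.Random(seed)
    detectors = []
    covered_limit = 1.0 / (1.0 - coverage)        # stop after this many covered hits
    self_limit = 1.0 / (1.0 - max_self_coverage)  # stop if self region dominates
    covered_run = 0
    self_hits = 0
    while len(detectors) < max_detectors:
        x = tuple(rng.random() for _ in range(dim))
        # Candidate already covered by an existing detector?
        if any(math.dist(x, c) <= r for c, r in detectors):
            covered_run += 1
            if covered_run >= covered_limit:
                break                              # estimated coverage reached
            continue
        # Largest radius that keeps the detector clear of every self sample.
        radius = min(math.dist(x, s) for s in self_samples) - self_radius
        if radius > 0:
            detectors.append((x, radius))
            covered_run = 0
        else:
            self_hits += 1
            if self_hits > self_limit:
                break
    return detectors

# Made-up self samples clustered near the center of the unit square.
self_samples = [(0.50, 0.50), (0.52, 0.50), (0.50, 0.52)]
detectors = generate_v_detectors(self_samples, self_radius=0.05)
print(len(detectors), "detectors generated")
```

Unlike the constant-sized version, each accepted candidate grows to the boundary of the self region, so large empty areas are covered by a few large detectors and narrow gaps by small ones.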

Page 10: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Screenshots of the software

Message view Visualization of data points and detectors

Page 11: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Experiments and Results

Synthetic data: 2D. Training data are randomly chosen from the normal region.

Fisher’s Iris data: one of the three types is considered as “normal”.

Biomedical data: abnormal data are the medical measures of disease carrier patients.

Air pollution data: abnormal data are made by artificially altering the normal air measurements.

Ball bearings: measurement is time series data with preprocessing, 30D and 5D.

Page 12: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Synthetic data - Cross-shaped self space

Shape of self region and example detector coverage

(a) Actual self space (b) self radius = 0.05 (c) self radius = 0.1

Page 13: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Synthetic data - Cross-shaped self space: Results

[Figure: Detection rate and false alarm rate vs. self radius (0.01 to 0.19). Series: detection rate and false alarm rate, each at 99% and 99.99% coverage.]

[Figure: Number of detectors vs. self radius (0.01 to 0.19). Series: 99% coverage and 99.99% coverage.]

Page 14: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Error rates

[Figure: Error rate (percentage) vs. self radius (0.01 to 0.19). Series: false negative (99% coverage) and false positive (99% coverage).]

Page 15: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Synthetic data - Ring-shaped self space

Shape of self region and example detector coverage

(a) Actual self space (b) self radius = 0.05 (c) self radius = 0.1

Page 16: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Synthetic data - Ring-shaped self space: Results

[Figure: Detection rate and false alarm rate vs. self radius (0.01 to 0.19). Series: detection rate and false alarm rate, each at 99% and 99.99% coverage.]

[Figure: Number of detectors vs. self radius (0.01 to 0.19). Series: 99% coverage and 99.99% coverage.]

Page 17: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Iris data

Comparison with other methods: performance

Training data     Algorithm            Detection rate   False alarm rate
Setosa 100%       MILA                 95.16            0
                  NSA (single level)   100              0
                  V-detector           99.98            0
Setosa 50%        MILA                 94.02            8.42
                  NSA (single level)   100              11.18
                  V-detector           99.97            1.32
Versicolor 100%   MILA                 84.37            0
                  NSA (single level)   95.67            0
                  V-detector           85.95            0
Versicolor 50%    MILA                 84.46            19.6
                  NSA (single level)   96               22.2
                  V-detector           88.3             8.42
Virginica 100%    MILA                 75.75            0
                  NSA (single level)   92.51            0
                  V-detector           81.87            0
Virginica 50%     MILA                 88.96            24.98
                  NSA (single level)   97.18            33.26
                  V-detector           93.58            13.18

Page 18: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Iris data

Comparison with other methods: number of detectors

Training data     Mean     Max   Min   SD
Setosa 100%       20       42    5     7.87
Setosa 50%        16.44    33    5     5.63
Versicolor 100%   153.24   255   72    38.8
Versicolor 50%    110.08   184   60    22.61
Virginica 100%    218.36   443   78    66.11
Virginica 50%     108.12   203   46    30.74

Page 19: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Iris data

Virginica as normal, 50% points used to train

[Figure: Detection rate and false alarm rate vs. self radius (0.01 to 0.19). Series: detection rate and false alarm rate, each at 99% and 99.99% coverage.]

[Figure: Number of detectors vs. self radius (0.01 to 0.19). Series: 99% coverage and 99.99% coverage.]

Page 20: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Biomedical data

Blood measure for a group of 209 patients

Each patient has four different types of measurement

75 patients are carriers of a rare genetic disorder. Others are normal.

Page 21: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Biomedical data: results comparison

Training data   Algorithm               Detection rate    False alarm rate   Number of detectors
                                        Mean     SD       Mean     SD        Mean     SD
100% training   MILA                    59.07    3.85     0        0         1000*    0
                NSA                     69.36    2.67     0        0         1000     0
                V-detector (r = 0.1)    30.61    3.04     0        0         21.52    7.29
                V-detector (r = 0.05)   40.51    3.92     0        0         14.84    5.14
50% training    MILA                    61.61    3.82     2.43     0.43      1000*    0
                NSA                     72.29    2.63     2.94     0.21      1000     0
                V-detector (r = 0.1)    32.92    2.35     0.61     0.31      15.51    4.85
                V-detector (r = 0.05)   42.89    3.83     1.07     0.49      12.28    4
25% training    MILA                    80.47    2.80     14.93    2.08      1000*    0
                NSA                     86.96    2.72     19.50    2.05      1000     0
                V-detector (r = 0.1)    43.68    4.25     1.24     0.5       12.24    3.97
                V-detector (r = 0.05)   57.97    5.86     2.63     0.77      8.94     2.57

Page 22: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Biomedical data

[Figure: Detection rate and false alarm rate vs. self radius (0.01 to 0.19). Series: detection rate and false alarm rate, each at 99% and 99.99% coverage.]

[Figure: Number of detectors vs. self radius (0.01 to 0.19). Series: 99% coverage and 99.99% coverage.]

Page 23: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Air pollution data

60 original records in total. Each is 16 different measurements concerning air pollution.

All the real data are considered as normal.

More data are made artificially:

1. Decide the normal range of each of the 16 measurements

2. Randomly choose a real record

3. Change three randomly chosen measurements within a larger than normal range

4. If some of the changed measurements are out of range, the record is considered abnormal; otherwise it is considered normal

A total of 1000 records, including the original 60, are used as test data. The original 60 are used as training data.
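The four-step procedure for making artificial records can be sketched as below. Several details are assumptions, since the slide does not state them: the normal range is taken as the observed minimum and maximum of each measurement, the "larger than normal range" is that range widened by a hypothetical `widen` fraction on each side, and the 60 "real" records here are random placeholders rather than the air pollution data.

```python
import random

def normal_ranges(records):
    """Step 1: per-measurement (min, max) over the real records (assumed definition)."""
    return [(min(col), max(col)) for col in zip(*records)]

def make_synthetic_record(records, ranges, rng, n_changed=3, widen=0.5):
    record = list(rng.choice(records))                    # step 2: pick a real record
    changed = rng.sample(range(len(record)), n_changed)   # step 3: pick 3 measurements
    for i in changed:
        lo, hi = ranges[i]
        margin = widen * (hi - lo)
        record[i] = rng.uniform(lo - margin, hi + margin)
    # Step 4: abnormal if any changed measurement left its normal range.
    abnormal = any(not (ranges[i][0] <= record[i] <= ranges[i][1]) for i in changed)
    return record, abnormal

rng = random.Random(0)
# Placeholder data: 60 records of 16 measurements each.
real_records = [[rng.random() for _ in range(16)] for _ in range(60)]
ranges = normal_ranges(real_records)
synthetic = [make_synthetic_record(real_records, ranges, rng) for _ in range(940)]
print(sum(abnormal for _, abnormal in synthetic), "of 940 synthetic records are abnormal")
```

Together with the 60 originals, the 940 generated records give the 1000 test records mentioned above.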

Page 24: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Air pollution data

[Figure: Detection rate and false alarm rate vs. self radius (0.01 to 0.19). Series: detection rate and false alarm rate, each at 99% and 99.99% coverage.]

[Figure: Number of detectors vs. self radius (0.01 to 0.19). Series: 99% coverage and 99.99% coverage.]

Page 25: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Ball bearing data

Raw data: time series of acceleration measurements

Preprocessing (from time domain to representation space for detection)

1. FFT (Fast Fourier Transform) with Hanning windowing: window size 30

2. Statistical moments: up to 5th order
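The two preprocessing options might look like the sketch below, which maps one window of 30 raw values to either a 30-dimensional or a 5-dimensional feature vector. The details are assumptions: the spectral features are taken to be the magnitudes of all 30 coefficients, the moments are the mean followed by central moments of order 2 to 5, and a direct DFT is used instead of a fast algorithm to stay within the standard library (it yields the same coefficients as an FFT). All function names are hypothetical.

```python
import cmath
import math

def hann_window(n):
    """Hanning window weights for a segment of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def fft_features(segment):
    """Magnitudes of the DFT of the Hanning-windowed segment (30 values in, 30 out)."""
    n = len(segment)
    x = [v * w for v, w in zip(segment, hann_window(n))]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def moment_features(segment, max_order=5):
    """Mean plus central moments of order 2..max_order (5 values by default)."""
    n = len(segment)
    mean = sum(segment) / n
    return [mean] + [sum((v - mean) ** order for v in segment) / n
                     for order in range(2, max_order + 1)]

# Made-up window: three full cycles of a sine wave over 30 samples.
segment = [math.sin(2 * math.pi * 3 * t / 30) for t in range(30)]
print(len(fft_features(segment)), len(moment_features(segment)))
```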

Page 26: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Example of data (raw data of new bearings): first 1000 points

[Figure: raw acceleration values (vertical axis -60 to 80) plotted against point index 1 to 1000.]

Page 27: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Example of data (FFT of new bearings): first 3 coefficients of the first 100 points

[Figure: coefficient 1, coefficient 2 and coefficient 3 (vertical axis 0 to 600) plotted against point index 1 to 100.]

Page 28: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Example of data (statistical moments of new bearings): moments up to 3rd order of the first 100 points

[Figure: 1st order, 2nd order and 3rd order moments (vertical axis -2000 to 6000) plotted against point index 1 to 100.]

Page 29: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Ball bearing’s structure and damage

Damaged cage

Page 30: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Ball bearing data: results

Preprocessed with FFT

Ball bearing conditions              Total number of data points   Number of detected anomalies   Percentage detected
New bearing (normal)                 2739                          0                              0%
Outer race completely broken         2241                          2182                           97.37%
Broken cage with one loose element   2988                          577                            19.31%
Damaged cage, four loose elements    2988                          337                            11.28%
No evident damage; badly worn        2988                          209                            6.99%

Preprocessed with statistical moments

Ball bearing conditions              Total number of data points   Number of detected anomalies   Percentage detected
New bearing (normal)                 2651                          0                              0%
Outer race completely broken         2169                          1674                           77.18%
Broken cage with one loose element   2892                          14                             0.48%
Damaged cage, four loose elements    2892                          0                              0%
No evident damage; badly worn        2892                          0                              0%
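As a quick check of the arithmetic, the "percentage detected" column is the number of detected anomalies divided by the total number of data points. The short sketch below recomputes it for the FFT-preprocessed results; the counts are copied from the results above, while the helper name `percent_detected` is made up for this example.

```python
def percent_detected(total, detected):
    """Detected anomalies as a percentage of all data points, to 2 decimals."""
    return round(100.0 * detected / total, 2)

# (total data points, detected anomalies), FFT preprocessing.
fft_counts = {
    "New bearing (normal)": (2739, 0),
    "Outer race completely broken": (2241, 2182),
    "Broken cage with one loose element": (2988, 577),
    "Damaged cage, four loose elements": (2988, 337),
    "No evident damage; badly worn": (2988, 209),
}

fft_percent = {name: percent_detected(total, detected)
               for name, (total, detected) in fft_counts.items()}
for name, pct in fft_percent.items():
    print(f"{name}: {pct}%")
```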

Page 31: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Ball bearing data: performance summary

[Bar chart, values in percent]

Measure                                   Fourier transform   Statistical moments
Detection rate for the worst damage       97.37               77.18
Detection rate for all damages            37.68               21.22
False alarm rate                          3.65                0

Page 32: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

New development of this work

A new algorithm to generate variable-sized detectors.

Purpose: reduce the possible “false negative” at the boundary of self region.

Why the issue exists: some self samples may be very close to the boundary.

Main idea: differentiate between “internal self samples” and “boundary self samples”.

Solution: combine the advantages of the algorithms to generate variable-sized and constant-sized detectors described previously.

Page 33: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

How much one sample tells

Page 34: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Samples may be on boundary

Page 35: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

In terms of detectors

Page 36: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Comparing three methods

Constant-sized detectors V-detector New algorithm

Self radius = 0.05

Page 37: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Comparing three methods

Constant-sized detectors V-detectors New algorithm

Self radius = 0.1

Page 38: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Work ongoing

Estimate of coverage using formal statistics

“Point estimate” is the simplest method.

Two types of statistical inference:

1. Confidence interval

2. Hypothesis testing

Page 39: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Point estimate of proportion
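A point estimate of a proportion is simply the fraction of random sample points that fall inside at least one detector. The sketch below also attaches a normal-approximation confidence interval as an example of the first type of inference listed earlier. It is an assumption-laden illustration: points are drawn from the whole unit square rather than from the non-self region only, and the function name and the z value of 1.96 (roughly a 95% interval) are choices made for this example.

```python
import math
import random

def estimate_coverage(detectors, n_points, dim=2, seed=0, z=1.96):
    """Return (point estimate, (lower, upper)) for the covered proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_points):
        x = tuple(rng.random() for _ in range(dim))
        if any(math.dist(x, c) <= r for c, r in detectors):
            hits += 1
    p_hat = hits / n_points                                  # point estimate
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n_points)
    return p_hat, (max(0.0, p_hat - half_width), min(1.0, p_hat + half_width))

# One made-up detector: a disc of radius 0.5 centered in the unit square,
# whose true area is pi/4 (about 0.785).
p_hat, (lower, upper) = estimate_coverage([((0.5, 0.5), 0.5)], n_points=20_000)
print(p_hat, lower, upper)
```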

Page 40: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

Summary

1. V-detector uses fewer detectors to obtain similar coverage.

2. Smaller detectors are more acceptable if the total number of detectors is largely controlled.

3. Coverage estimate is superior to a fixed number of detectors.

4. V-detector can deal with high-dimensional data, including time series, better.

5. Self radius and estimated coverage are the two control parameters in V-detector.

6. Variable size, variable shape, variable matching rules, or other variable properties of detectors provide an encouraging opportunity to enhance the negative selection mechanism.

Page 41: V-detector: a real-valued negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital

9-17-2004