Managing and mining (streaming) sensor data

Petr Čížek
Artificial Intelligence Center, Czech Technical University in Prague
November 3, 2016
VPD



Stream data mining / stream data querying

Problem definition
- Data cannot be stored
- Data arrive in a stream or in several streams
- Random access to the data is impossible or very expensive
  → single-scan algorithms

Challenges
- Queries are continuous
- Pre-defined vs. ad-hoc queries
- Answers are updated over time → anytime property
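
As a minimal sketch of a single-scan algorithm with the anytime property, the running mean and variance below are updated once per arriving value and can be read at any moment. Welford's update rule is one common choice here, not something the slides prescribe; the class name `RunningStats` and the sample readings are illustrative assumptions.

```python
class RunningStats:
    """Single-pass mean and variance (Welford's method); memory use is constant."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Each value is seen exactly once and is not stored afterwards.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; defined as 0.0 until two values have arrived.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # hypothetical sensor readings
    stats.update(reading)
    # stats.mean and stats.variance() are valid answers at this point (anytime property).
```

The state is three numbers regardless of stream length, which is what makes the approach usable when the data cannot be stored.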


Practical example 1 - sensors

Sensor                                    MB/s     GB/h
Inertial Measurement Unit                 0.1      0.3
Monocular camera (640x480@60fps MJPEG)    ∼1.73    ∼6.1
Monocular camera (640x480@60fps RAW)      17.5     63.2
Stereo camera (2x640x480@60fps RAW)       35       126.4
Velodyne 3D laser scanner                 ∼100     ∼351
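
The RAW camera rows can be reproduced with simple arithmetic. The sketch below assumes one byte per pixel (8-bit monochrome) and reads "MB" as 2^20 bytes; neither assumption is stated on the slide, and the function name is hypothetical.

```python
def raw_rate_bytes_per_s(width, height, fps, bytes_per_pixel=1, cameras=1):
    """Uncompressed video data rate in bytes per second."""
    return width * height * fps * bytes_per_pixel * cameras


MIB = 1024 ** 2  # assumption: the table's "MB" means 2**20 bytes

mono = raw_rate_bytes_per_s(640, 480, 60)
stereo = raw_rate_bytes_per_s(640, 480, 60, cameras=2)

print(f"mono:   {mono / MIB:.2f} MiB/s")    # prints 17.58
print(f"stereo: {stereo / MIB:.2f} MiB/s")  # prints 35.16
```

Under these assumptions the results agree with the 17.5 and 35 MB/s entries to within rounding.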


Practical example 2 - institutions

Institution                 GB/s
CERN
  RAW data (sensors)        ∼600000
  RAW data processed        ∼25
  ALICE                     4
  ATLAS                     1
  CMS                       0.6
  LHCb                      0.8
Network peer nodes
  NIX.cz                    ∼37
  AMS-IX                    ∼500


Comparison to traditional data mining

                   Traditional    Stream
No. of passes      Multiple       Single
Processing time    Unlimited      Restricted
Memory usage       Unlimited      Restricted
Type of result     Accurate       Approximate
Concept            Static         Evolving
Distributed        No             Yes


Stream data mining / data querying

Applications
- Statistics
- Classification
- Clustering
- Outlier (error) detection

Challenges
- Pre-defined vs. ad-hoc queries
- Concept-drift
- Concept-evolution
- Feature-evolution

Methods
- Random sampling
- Sketching
- Histograms
- Sliding windows (fading windows)
- Multi-resolution model (subsampling)
- Feature selection


Challenges - concept-drift

The statistical properties of the target variable, which the model is trying to predict, evolve over time in unforeseen ways.


Challenges - concept-drift

Active solutions
- Activated by triggers
- Can be used with any classification algorithm
- Need to completely relearn the model when triggered
- E.g. the n latest decisions are monitored

Passive solutions
- Adaptive: continuously update the model
- Do not detect changes
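
One way an active solution could monitor the n latest decisions is sketched below: a trigger fires once the error rate over a full window exceeds a threshold, at which point the caller would relearn the model. The class name, the window size and the threshold are illustrative assumptions, not values from the slides.

```python
from collections import deque


class ErrorRateTrigger:
    """Fires when the error rate over the n latest decisions exceeds a threshold."""

    def __init__(self, window=100, threshold=0.3):
        self.recent = deque(maxlen=window)  # 1 = wrong decision, 0 = correct
        self.threshold = threshold

    def add(self, was_correct):
        """Record one decision; return True if the model should be relearned."""
        self.recent.append(0 if was_correct else 1)
        full = len(self.recent) == self.recent.maxlen
        return full and sum(self.recent) / len(self.recent) > self.threshold

    def reset(self):
        # Call after relearning so that old errors do not fire the trigger again.
        self.recent.clear()


trigger = ErrorRateTrigger(window=5, threshold=0.4)
outcomes = [True] * 5 + [False] * 3  # hypothetical run: the concept changes after 5 items
fired = [trigger.add(ok) for ok in outcomes]
# The trigger fires only on the last decision, when 3 of the 5 latest are wrong.
```

Because the trigger only sees whether each decision was correct, it is independent of the classifier behind it, which matches the claim that active solutions work with any classification algorithm.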


Challenges - concept-evolution

A novel class appears in the data and is misclassified.


Challenges - feature-evolution

The features themselves evolve over time.


Methods for stream data processing - random sampling

- Subsample the data in a randomized way
- Keep only 1/n of the samples, chosen at random
- The law of large numbers ensures that statistics of the sample approach those of the full stream
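
A minimal sketch of this idea keeps each arriving item independently with probability 1/n (Bernoulli sampling). The function name, the seeded generator and the synthetic stream are assumptions made for illustration.

```python
import random


def bernoulli_sample(stream, n, rng=None):
    """Yield each item of the stream independently with probability 1/n."""
    if rng is None:
        rng = random.Random()
    for item in stream:
        if rng.random() < 1.0 / n:
            yield item


# Hypothetical stream: the integers 0..99999, of which roughly one in ten is kept.
sample = list(bernoulli_sample(range(100_000), n=10, rng=random.Random(0)))
sample_mean = sum(sample) / len(sample)
# sample_mean should lie close to the true stream mean of 49999.5.
```

The sample size is itself random with this scheme; if a fixed-size sample is required, reservoir sampling is a commonly used alternative.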


Methods for stream data processing - sketching

Extract frequency moments of the stream.
The k-th frequency moment of a set of frequencies a = (a_1, ..., a_n) is

$F_k(a) = \sum_{i=1}^{n} a_i^k$

- F_1 - total count of items in the stream (the sum of all frequencies)
- F_2 - statistical properties, e.g. dispersion
- F_∞ - frequency of the most frequent item
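
To make the definition concrete, the sketch below computes frequency moments exactly by counting every item. This is only a reference computation: an actual sketch would approximate these values in much less memory. The function names and the example stream are illustrative assumptions.

```python
from collections import Counter


def frequency_moment(stream, k):
    """Exact k-th frequency moment: the sum of a_i**k over the distinct items."""
    counts = Counter(stream)
    return sum(a ** k for a in counts.values())


def f_infinity(stream):
    """Frequency of the most frequent item."""
    return max(Counter(stream).values())


stream = list("abracadabra")  # frequencies: a=5, b=2, r=2, c=1, d=1

f1 = frequency_moment(stream, 1)  # 11, the length of the stream
f2 = frequency_moment(stream, 2)  # 25 + 4 + 4 + 1 + 1 = 35
f_inf = f_infinity(stream)        # 5
```

The exact version needs one counter per distinct item, which is the memory cost that sketching methods are designed to avoid.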


Methods for stream data processing - histograms

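Types of histograms
- V-optimal: for values v_1, v_2, ..., v_n, the classes are chosen to minimize $\sum_i (v_i - \bar{v}_i)^2$, where $\bar{v}_i$ is the mean of the class containing v_i
- Equal-width
- End-biased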
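
The sketch below illustrates two of these ideas under stated assumptions: an equal-width histogram that is updated one value at a time, assuming the value range [lo, hi] is known in advance, and a helper that evaluates the squared-error criterion listed under V-optimal for a given split into classes. Both names are hypothetical, and searching for the optimal split is not shown.

```python
class EqualWidthHistogram:
    """Streaming histogram with a fixed number of equally wide bins."""

    def __init__(self, lo, hi, bins):
        self.lo = lo
        self.bins = bins
        self.width = (hi - lo) / bins
        self.counts = [0] * bins

    def add(self, value):
        idx = int((value - self.lo) / self.width)
        idx = min(max(idx, 0), self.bins - 1)  # clamp out-of-range values to the edge bins
        self.counts[idx] += 1


def bucket_sse(buckets):
    """Sum of squared deviations of each value from the mean of its bucket."""
    total = 0.0
    for bucket in buckets:
        if bucket:
            mean = sum(bucket) / len(bucket)
            total += sum((v - mean) ** 2 for v in bucket)
    return total


hist = EqualWidthHistogram(lo=0.0, hi=10.0, bins=5)
for v in [0.5, 1.5, 1.6, 9.9, 10.0]:  # hypothetical values
    hist.add(v)
# hist.counts is [3, 0, 0, 0, 2]

error = bucket_sse([[1, 2, 3], [10]])  # 2.0
```

Only the bin counts are kept, so memory depends on the number of bins and not on the stream length.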


Methods for stream data processing - sliding window

Forgetting mechanism
- Sliding window
- Fading factor
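
Both forgetting mechanisms can be sketched for a running mean. The sliding window keeps only the latest `size` values; the fading factor multiplies the weight of older values by `alpha` (between 0 and 1) at every step. The class names and parameter values are illustrative assumptions.

```python
from collections import deque


class SlidingWindowMean:
    """Mean over the latest `size` values; older values are dropped entirely."""

    def __init__(self, size):
        self.window = deque(maxlen=size)

    def update(self, x):
        self.window.append(x)
        return sum(self.window) / len(self.window)


class FadingMean:
    """Mean in which the weight of each past value decays by `alpha` per step."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.weighted_sum = 0.0
        self.weight = 0.0

    def update(self, x):
        self.weighted_sum = self.alpha * self.weighted_sum + x
        self.weight = self.alpha * self.weight + 1.0
        return self.weighted_sum / self.weight


window_mean = SlidingWindowMean(size=3)
window_results = [window_mean.update(x) for x in [1, 2, 3, 4]]  # [1.0, 1.5, 2.0, 3.0]

fading_mean = FadingMean(alpha=0.5)
fading_results = [fading_mean.update(x) for x in [1, 2]]  # [1.0, 2.5 / 1.5]
```

The window needs memory proportional to its size, while the fading variant keeps two numbers only, at the cost of never forgetting an old value completely.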


Methods for stream data processing - multi-resolution model

Using decision trees
- Use part of the stream for choosing the root attribute
- Following examples pass to the leaves

Advantage - scalable
Disadvantage - only for stationary distributions
→ use concept-drift-aware decision trees
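
The first step, choosing the root attribute from only part of the stream, might look like the sketch below. It assumes categorical attributes, a non-empty prefix, and information gain as the selection criterion; the slides do not name a criterion, and the function names and toy examples are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import islice


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def choose_root_attribute(stream, prefix_size):
    """Pick the attribute index with the highest information gain on a stream prefix.

    `stream` yields (attributes, label) pairs, where `attributes` is a tuple.
    """
    prefix = list(islice(stream, prefix_size))  # only this part of the stream is read
    base = entropy([label for _, label in prefix])
    best_attr, best_gain = None, -1.0
    for attr in range(len(prefix[0][0])):
        groups = defaultdict(list)
        for attrs, label in prefix:
            groups[attrs[attr]].append(label)
        remainder = sum(len(g) / len(prefix) * entropy(g) for g in groups.values())
        gain = base - remainder
        if gain > best_gain:
            best_attr, best_gain = attr, gain
    return best_attr


examples = [  # hypothetical (weather, temperature) -> label stream
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "hot"), "yes"),
    (("rainy", "mild"), "yes"),
]
root = choose_root_attribute(iter(examples), prefix_size=4)  # 0: weather separates the labels
```

Examples arriving after the prefix would then be routed to the leaves under the chosen root, where the same selection can be repeated.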


Methods for stream data processing - feature selection

Features are to characterize a particular sample → dimension reduction
- Artificial (e.g. image features)
- Learned (e.g. using neural networks, reinforcement learning)


Mining data streams - research issues

- Mining sequential patterns
- Mining partial periodicity
- Mining notable gradients
- Mining outliers and unusual patterns
- Clustering


Thank you for your attention!

