Managing and mining (streaming) sensor data
Petr Cížek
Artificial Intelligence CenterCzech Technical University in Prague
November 3, 2016
Petr Cížek VPD 1 / 1
Managing and mining (streaming) sensor data
Stream data mining / stream data querying
Problem definitionData can not be storedData arrive in stream or streamsRandom access to data impossible or very expensive→single scan algorithms
ChallengesQueries are continuousPre-defined vs. ad-hoc queriesAnswer update over time→ anytime property
Petr Cížek VPD 2 / 1
Managing and mining (streaming) sensor data
Practical example 1 - sensors
Sensor MB/s GB/hInertial Measurement Unit 0.1 0.3Monocular camera (640x480@60fps MJPEG) ∼1.73 ∼6.1Monocular camera (640x480@60fps RAW) 17.5 63.2Stereo camera (2x640x480@60fps RAW) 35 126.4Velodyne 3D laser scanner ∼100 ∼351
Petr Cížek VPD 3 / 1
Managing and mining (streaming) sensor data
Practical example 2 - institutions
Institution GB/sCERN
RAW data (sensors) ∼600000RAW data processed ∼25ALICE 4ATLAS 1CMS 0.6LHCb 0.8
Network peer nodesNIX.cz ∼37AMS-IX ∼500
Petr Cížek VPD 4 / 1
Managing and mining (streaming) sensor data
Comparison to traditional data mining
Traditional StreamNo. of passes Multiple SingleProcessing time Unlimited RestrictedMemory usage Unlimited RestrictedType of result Accurate ApproximateConcept Static EvolvingDistributed No Yes
Petr Cížek VPD 5 / 1
Managing and mining (streaming) sensor data
Stream data mining / data querying
ApplicationsStatistics, Classification, Clustering, Outlier (error) detection
ChallengesPre-defined vs. ad-hocqueriesConcept-driftConcept-evolutionFeature-evolution
MethodsRandom samplingSketchingHistogramsSliding windows (Fadingwindows)Multi resolution model(subsampling)Feature selection
Petr Cížek VPD 6 / 1
Managing and mining (streaming) sensor data
Challenges - concept-drift
Statistical properties of the target variable, which the model istrying to predict, evolve over time in unforeseen ways.
Petr Cížek VPD 7 / 1
Managing and mining (streaming) sensor data
Challenges - concept-drift
Statistical properties of the target variable, which the model istrying to predict, evolve over time in unforeseen ways.
Petr Cížek VPD 8 / 1
Managing and mining (streaming) sensor data
Challenges - concept-drift
Active solutionsActivated by triggersCan be used by any classification algorithmNeed to completely relearn the model when triggeredE.g. n latest decisions are monitored
Passive solutionsAdaptive - continuously updating the modelDon’t detect changes
Petr Cížek VPD 9 / 1
Managing and mining (streaming) sensor data
Challenges - concept-evolutionMisclassification of novel class in data
Petr Cížek VPD 10 / 1
Managing and mining (streaming) sensor data
Challenges - feature-evolutionThe features are evolving throughout the time
Petr Cížek VPD 11 / 1
Managing and mining (streaming) sensor data
Methods for stream data processing - randomsampling
Subsample the data in randomized way.Save only 1/n samples randomlyLaw of large numbers assure probability completeness
Petr Cížek VPD 12 / 1
Managing and mining (streaming) sensor data
Methods for stream data processing - sketching
Extract frequency moments of the streamThe k th frequency moment of a set of frequencies a isFk (a) =
∑ni=1 ak
i
F1 - total count of different frequenciesF2 - statistical properties - e.g. dispersionF∞ - frequency of the most frequent items
Petr Cížek VPD 13 / 1
Managing and mining (streaming) sensor data
Methods for stream data processing - histograms
Types of histogramsV-optimalv1, v2 · · · vn classes∑
i(vi − vi)2
Equal-widthEnd-biased
Petr Cížek VPD 14 / 1
Managing and mining (streaming) sensor data
Methods for stream data processing - slidingwindow
Forgetting mechanismSliding windowFading factor
Petr Cížek VPD 15 / 1
Managing and mining (streaming) sensor data
Methods for stream data processing - multiresolution model
Using decision treesUse part of stream forchoosing the root attributeFollowing examples pass toleaves
Advantage - scalableDisadvantage - only forstationary distribution→ Using context-drift awaredecision trees
Petr Cížek VPD 16 / 1
Managing and mining (streaming) sensor data
Methods for stream data processing - featureselection
Features are to characterize a particular sample→ dimensionreductionArtificial - (e.g. Image features)Learned - (e.g. using neural networks, reinforcement learning)
Petr Cížek VPD 17 / 1
Managing and mining (streaming) sensor data
Mining data streams - research issues
Mining sequential patternsMining partial periodicityMining notable gradientsMining outliers and unusual patternsClustering
Petr Cížek VPD 18 / 1
Managing and mining (streaming) sensor data
Thank you for your attention!
Petr Cížek VPD 19 / 1
Managing and mining (streaming) sensor data
References
Pier Luca Lanzi, Course "Machine Learning and Data Mining", Politecnico di Milanohttp://www.slideshare.net/pierluca.lanzi/18-data-streams
Anand Rajaraman and Jeffrey D. Ullman, Mining of Massive Datasets, CambridgeUniversity Press, 2011
Charu C. Aggarwal, Managing and Mining Sensor Data, Springer Science BusinessMedia, 2013.
Manuel Martín, Master’s thesis: Handling concept drift in data stream mining,University of Granadahttp://www.slideshare.net/draxus/handling-concept-drift-in-data-stream-mining
Petr Cížek VPD 20 / 1