How Dirty is your Data: The Duality between Detecting Events and Faults
J. Gupchup, A. Terzis, R. Burns, A. Szalay
Department of Computer Science
Johns Hopkins University
Outline
– Background
– Problem Statement
– Experiments
– Results
– Discussion
Application
Monitoring the nesting conditions of Maryland box turtles
Science question: Do nesting conditions determine sex?
Observations must be correlated with environmental events (rain, snow, etc.)
Duality of Faults & Events
Data gathered from Sensor Networks contain faults
Delivering faulty data consumes resources and pollutes statistics
Need for fault detection techniques
Fault Detection methods detect readings that deviate from “normal” or “expected” values
Environmental events:
– Are scientifically interesting
– Deviate from the norm
Research Question(s)
Are “Events” misclassified as “Faults”?
What metrics could be used to quantify the misclassification?
How does the misclassification vary with:
– Type of fault
– Type of fault detection method
– Type of modality (moisture, temperature)
Is it possible to design a fault detection mechanism that minimizes the misclassification?
Know Thy Faults
Short faults
– Sudden change in measurement
Noise faults
– Larger variations in amplitude than expected
– Little or no variation in amplitude (unresponsive sensor)
Fault Detection Methods
SHORT Rule
– If Xi - X(i-1) > δSHORT, mark the current measurement as a fault (point method)
– δSHORT is established from domain knowledge
NOISE Rule
– Take W successive samples
– If (σW ≤ σtrain - σallow) OR (σW ≥ σtrain + σallow), mark all W readings as faulty (block method)
– σtrain and σallow are established from training data
Linear Least-Squares Estimation (LLSE)
– Estimate the expected value of a sensor's reading from the readings of other sensors using LLSE
– If Xmodel - Xactual > δLLSE for k of the node's neighbors, mark the reading as faulty (point method)
A. Sharma, L. Golubchik, and R. Govindan, “On the prevalence of sensor faults in real-world deployments,” IEEE Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), 2007
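The three rules above can be sketched in Python as follows. This is a minimal sketch, not the implementation used in the talk or by Sharma et al.: it assumes the SHORT and LLSE tests compare absolute differences, that the NOISE rule uses non-overlapping windows and the population standard deviation, and that LLSE is applied pairwise, with each neighbor contributing its own scalar linear estimate (fit on training data) and casting one vote toward the k-of-neighbors test. All function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def short_rule(x: Sequence[float], delta_short: float) -> List[bool]:
    """SHORT rule (point method): flag x[i] if it differs from x[i-1] by more than delta_short."""
    flags = [False] * len(x)
    for i in range(1, len(x)):
        # Assumption: the change is compared in absolute value, so drops are flagged too.
        if abs(x[i] - x[i - 1]) > delta_short:
            flags[i] = True
    return flags


def noise_rule(x: Sequence[float], w: int, sigma_train: float,
               sigma_allow: float) -> List[bool]:
    """NOISE rule (block method): flag all W readings of a window whose standard
    deviation is too low (unresponsive sensor) or too high (noisy sensor)."""
    flags = [False] * len(x)
    # Assumptions: non-overlapping windows, population standard deviation,
    # and a trailing partial window is left unflagged.
    for start in range(0, len(x) - w + 1, w):
        sigma_w = statistics.pstdev(x[start:start + w])
        if sigma_w <= sigma_train - sigma_allow or sigma_w >= sigma_train + sigma_allow:
            for i in range(start, start + w):
                flags[i] = True
    return flags


def llse_fit(target_train: Sequence[float],
             neighbor_train: Sequence[float]) -> Tuple[float, float, float]:
    """Fit the scalar LLSE estimate of the target from one neighbor on training data:
    x_model = mu_t + (cov(t, n) / var(n)) * (x_n - mu_n)."""
    n = len(target_train)
    mu_t = sum(target_train) / n
    mu_n = sum(neighbor_train) / n
    cov = sum((t - mu_t) * (v - mu_n) for t, v in zip(target_train, neighbor_train)) / n
    var = sum((v - mu_n) ** 2 for v in neighbor_train) / n
    if var == 0:
        raise ValueError("neighbor is constant in the training data")
    return mu_t, mu_n, cov / var


def llse_rule(target: Sequence[float], neighbors: Sequence[Sequence[float]],
              models: Sequence[Tuple[float, float, float]],
              delta_llse: float, k: int) -> List[bool]:
    """LLSE rule (point method): flag a reading when the estimates from at least k
    neighbors miss it by more than delta_llse."""
    flags = []
    for i, actual in enumerate(target):
        votes = 0
        for series, (mu_t, mu_n, slope) in zip(neighbors, models):
            x_model = mu_t + slope * (series[i] - mu_n)
            if abs(x_model - actual) > delta_llse:  # absolute residual (assumption)
                votes += 1
        flags.append(votes >= k)
    return flags
```

Note that the SHORT and LLSE rules flag single readings, while the NOISE rule flags a whole window at once; this is the point/block distinction the evaluation metrics rely on.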
Evaluation Metrics
Misclassification error (μ) for point faults: μ = (event readings tagged as faults) / (total event measurements)
Misclassification error (μ) for block faults: Di is the portion of event period Ei that is tagged as faulty
Total misclassification: $\mu = \sum_i D_i / \sum_i E_i$
[Figure: event periods Ei on a time axis, with the misclassified portions Di marked]
Fault detection evaluation metric: false negative ratio = fraction of faults that fail to be detected
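A minimal sketch of the two metrics, assuming readings are indexed by sample number and each event period Ei is a half-open index range [start, end). Counting samples makes Di the number of flagged readings inside Ei, so the same function covers point methods (individually flagged readings) and block methods (readings covered by a flagged block). It also assumes the event readings themselves carry no injected faults; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def misclassification(flags: Sequence[bool],
                      events: Sequence[Tuple[int, int]]) -> float:
    """mu = sum_i D_i / sum_i E_i, where E_i is the length of event period i
    (a half-open sample range [start, end)) and D_i counts its flagged samples."""
    total = sum(end - start for start, end in events)
    flagged = sum(sum(flags[start:end]) for start, end in events)
    return flagged / total


def false_negative_ratio(flags: Sequence[bool], injected: Sequence[bool]) -> float:
    """Fraction of injected (ground-truth) faulty samples that were not flagged."""
    n_faults = sum(injected)
    missed = sum(1 for f, t in zip(flags, injected) if t and not f)
    return missed / n_faults
```

The two numbers pull in opposite directions: a detector that flags nothing has μ = 0 but a false negative ratio of 1, and one that flags everything has the reverse.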
Jug Bay Deployment Map
[Map (courtesy: Google Maps): turtle nests (markers 2, 5 and 6) and the weather station, near 38.784607, -76.700460]
Dataset
Sensor data: box temperature and soil moisture from 3 motes at Jug Bay (previous slide)
– 5 months of data, sampled every 10 min
– Training data set: 1 month; test data set: 4 months
Event ground truth (weather data): precipitation data collected from a weather station ~700 m away, sampled every 15 min
– 21 major events (i.e., rainfall) occurred
– Total rainfall: 158 hours
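Because the sensors sample every 10 min and the weather station every 15 min, each sensor reading has to be mapped onto a precipitation interval before it can be labeled as an event reading. A minimal sketch, assuming each weather record is timestamped at the start of a 15-min interval aligned to the quarter hour and that any nonzero precipitation marks that interval as an event; `quarter_hour` and `label_event_samples` are hypothetical helpers.

```python
from datetime import datetime
from typing import List, Sequence, Tuple


def quarter_hour(t: datetime) -> datetime:
    """Floor a timestamp to the start of its 15-min weather interval."""
    return t.replace(minute=t.minute - t.minute % 15, second=0, microsecond=0)


def label_event_samples(sensor_times: Sequence[datetime],
                        rain: Sequence[Tuple[datetime, float]]) -> List[bool]:
    """Return True for each sensor timestamp that falls in a rainy 15-min interval.

    rain: (interval_start, precipitation) pairs from the weather station.
    """
    rainy = {start for start, precip in rain if precip > 0}
    return [quarter_hour(t) in rainy for t in sensor_times]
```

Using a set of rainy interval starts keeps the lookup constant-time per reading, which matters for months of 10-min data.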
Faults Ground Truth
Start with a clean data set
Inject faults to establish ground truth
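Fault injection on a clean series can be sketched as below, producing both the faulty series and a ground-truth mask. The fault models are assumptions chosen to match the fault types listed earlier, not the authors' injection procedure: SHORT faults are single-sample spikes of fixed magnitude and random sign, high-variance NOISE faults add zero-mean Gaussian noise over a block, and unresponsive NOISE faults hold a block at its first value ("stuck-at"). `inject_faults` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Block = Tuple[int, int]  # half-open sample range [start, end)


def inject_faults(clean: Sequence[float], n_short: int, short_magnitude: float,
                  noise_blocks: Sequence[Block], noise_std: float,
                  stuck_blocks: Sequence[Block] = (), seed: int = 0
                  ) -> Tuple[List[float], List[bool]]:
    """Return (faulty series, ground-truth mask) built from a clean series."""
    rng = random.Random(seed)
    data = list(clean)
    truth = [False] * len(data)
    # SHORT faults: a spike of +/- short_magnitude at n_short distinct random samples.
    for i in rng.sample(range(len(data)), n_short):
        data[i] += rng.choice((-1.0, 1.0)) * short_magnitude
        truth[i] = True
    # NOISE faults (high variance): zero-mean Gaussian noise over each block.
    for start, end in noise_blocks:
        for i in range(start, end):
            data[i] += rng.gauss(0.0, noise_std)
            truth[i] = True
    # NOISE faults (unresponsive): hold each block at its first value.
    for start, end in stuck_blocks:
        for i in range(start, end):
            data[i] = data[start]
            truth[i] = True
    return data, truth
```

A fixed `seed` makes the injected ground truth reproducible, so every detection method can be scored against the same faults.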
Methodology
For each fault detection method and each modality:
– Use the first month's data for training
– Obtain the model parameters
– Evaluate the method on the fault-injected test data
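The training step can be sketched as below for the NOISE rule. The slides do not say how σtrain and σallow are derived, so this sketch assumes σtrain is the mean standard deviation of non-overlapping W-sample training windows and σallow is c times the spread of those window deviations, with c a hypothetical tuning constant. It also takes "1 month" as 30 days of 10-min samples (30 × 144 = 4,320 readings).

```python
import statistics
from typing import List, Sequence, Tuple

SAMPLES_PER_DAY = 24 * 6  # one reading every 10 min
TRAIN_DAYS = 30           # assumption: "1 month" taken as 30 days


def split_train_test(series: Sequence[float]) -> Tuple[List[float], List[float]]:
    """First month of readings for training, the remainder for testing."""
    cut = TRAIN_DAYS * SAMPLES_PER_DAY
    return list(series[:cut]), list(series[cut:])


def train_noise_params(train: Sequence[float], w: int,
                       c: float = 3.0) -> Tuple[float, float]:
    """Hypothetical NOISE-rule training: sigma_train is the mean window standard
    deviation, sigma_allow is c times the spread of those window deviations."""
    stds = [statistics.pstdev(train[s:s + w])
            for s in range(0, len(train) - w + 1, w)]
    return statistics.fmean(stds), c * statistics.pstdev(stds)
```

Training only on the first month keeps the test months, where faults are injected and scored, unseen during parameter fitting.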
Soil Moisture: SHORT Rule
Reducing the number of misclassification errors comes at the cost of more false negatives
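The trade-off can be reproduced on a toy example. The sketch below uses a synthetic series (not the Jug Bay data): a flat soil-moisture trace, a rain-driven step at sample 100 whose first 10 samples are taken as the event period, and two injected SHORT spikes of different sizes. Sweeping δSHORT upward first stops flagging the small spike, then the rain step, then the large spike, so μ falls while the false negative ratio rises. The step size, spike sizes and thresholds are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def short_rule(x: Sequence[float], delta: float) -> List[bool]:
    """SHORT rule with an absolute-difference test (assumption)."""
    return [False] + [abs(x[i] - x[i - 1]) > delta for i in range(1, len(x))]


def sweep(x: Sequence[float], injected: Sequence[bool], event: Tuple[int, int],
          deltas: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (delta, misclassification mu, false negative ratio) per threshold."""
    start, end = event
    n_faults = sum(injected)
    rows = []
    for delta in deltas:
        flags = short_rule(x, delta)
        mu = sum(flags[start:end]) / (end - start)
        fnr = sum(1 for f, t in zip(flags, injected) if t and not f) / n_faults
        rows.append((delta, mu, fnr))
    return rows


# Synthetic toy series (NOT the Jug Bay data): flat soil moisture, a rain-driven
# step of +6 at index 100 (the event), and SHORT spikes of +4 and +10.
x = [20.0] * 200
for i in range(100, 200):
    x[i] += 6.0
x[30] += 4.0
x[60] += 10.0
injected = [i in (30, 60) for i in range(200)]
for delta, mu, fnr in sweep(x, injected, (100, 110), [1, 5, 8, 12]):
    print(f"delta={delta:>2}  mu={mu:.2f}  FNR={fnr:.2f}")
# delta= 1  mu=0.10  FNR=0.00
# delta= 5  mu=0.10  FNR=0.50
# delta= 8  mu=0.00  FNR=0.50
# delta=12  mu=0.00  FNR=1.00
```

No single threshold separates the rain step from both spikes here, because the event's jump (6) lies between the two fault magnitudes (4 and 10).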
Misclassification: LLSE Method
Modality          Misclassification error   False negatives
Box temperature   0.3 %                     77.19 %
Soil moisture     46.3 %                    50.03 %
The much higher misclassification for soil moisture may be due to the spatial and temporal heterogeneity of the soil
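One way soil heterogeneity can turn an event into a "fault" under LLSE is sketched below with made-up numbers: during a dry training period the two probes move together, but during a rain event the target's soil wets much more than its neighbor's. The neighbor-based LLSE estimate then lags the real reading, and the residual test flags a genuine event reading. The values, δLLSE = 2 and the single-neighbor setup are illustrative assumptions, and `llse_estimator` is a hypothetical helper.

```python
from typing import Callable, Sequence


def llse_estimator(target_train: Sequence[float],
                   neighbor_train: Sequence[float]) -> Callable[[float], float]:
    """Scalar LLSE estimate of the target from one neighbor, fit on training data."""
    n = len(target_train)
    mu_t = sum(target_train) / n
    mu_n = sum(neighbor_train) / n
    cov = sum((t - mu_t) * (v - mu_n) for t, v in zip(target_train, neighbor_train)) / n
    var = sum((v - mu_n) ** 2 for v in neighbor_train) / n
    return lambda x_n: mu_t + (cov / var) * (x_n - mu_n)


# Dry training period: both probes fluctuate together (target = neighbor + 2).
estimate = llse_estimator([20, 21, 20, 21, 20, 21], [18, 19, 18, 19, 18, 19])

# Rain event: the target's soil jumps to 28 while the neighbor only reaches 19.5.
x_model = estimate(19.5)
residual = abs(28.0 - x_model)
DELTA_LLSE = 2.0
print(f"x_model={x_model}, residual={residual}, flagged={residual > DELTA_LLSE}")
# prints: x_model=21.5, residual=6.5, flagged=True
```

Box temperature, by contrast, tends to respond similarly across nearby nodes, which is consistent with its low misclassification in the table.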
Lessons Learned
There exists a tension between detecting Events and Faults
Fault detection algorithms need to take this into consideration
– Events can be misclassified as faults
Need for novel Fault Detection methods that are robust in the presence of Events
Need for Pattern Recognition techniques
Acknowledgements
Abhishek Sharma, Dept. of Computer Science, University of Southern California
Chris Swarth, Jug Bay Wetlands Sanctuary
Life Under Your Feet team
Marcus Chang, University of Copenhagen
(Courtesy: Andreas Terzis)
Questions?