• In many applications, wireless sensing systems are used for inference and prediction about environmental phenomena.
• Statistical models are widely used to represent these environmental phenomena: models characterize how unknown quantities (phenomena) are related to known quantities (measurements).
• Choosing the models involves a great deal of uncertainty.
• Often a single model M is used. If M does not characterize a phenomenon correctly, the inferences and predictions will not be accurate.
• It is better to start with multiple plausible models and select among them by collecting measurements at informative locations.
Reducing Uncertainty in Sensor Calibration
Reducing Uncertainty in Hardware Functionality (Fault Detection/Diagnosis)
Reducing Uncertainty in Model Selection
Minimizing Data Uncertainty through System Design
Deployment | Data quality indicators
Bangladesh | 45%
GDI | Sensors reported 3-60% faulty data
Ecuador Volcano | 82% false negative rate / 13% false positive rate
Macroscope | 8 of 33 temperature sensors faulty
Laura Balzano, Nabil Hajj Chehade, Sheela Nair, Nithya Ramanathan, Abhishek Sharma, Deborah Estrin, Leana Golubchik, Ramesh Govindan, Mark Hansen, Eddie Kohler, Greg Pottie, Mani Srivastava
Integrity Group, Center for Embedded Networked Sensing
Introduction: There are Many Sources of Uncertainty in Interpreting Data
• Environment modeling uncertainty
• Sensor calibration uncertainty
UCLA – UCR – Caltech – USC – UC Merced
Center for Embedded Networked Sensing
Data uncertainty can be reduced through careful system design!
Hardware Uncertainty
• Wireless sensing systems utilize low-cost, unreliable hardware, so faults are common.
(Figure: examples of sensor faults.)
• An accurate calibration function is required to translate data from sensors.
• Calibration parameters for most sensors drift non-deterministically over time.
Problem Description: Online fault detection and diagnosis
By detecting faults when they occur, instead of after the fact, users can take actions in the field to validate questionable data and fix hardware faults.
Confidence
Assumptions: Faults can be common; an initial fault-free training period is not always available; environmental phenomena are hard to predict, so tight bounds on expected behavior are not possible.
Evaluated in real-world deployments
Confidence detects faults with low false positive and negative rates.
It is difficult to validate what is truly a fault without ground truth. In our San Joaquin deployment we validated data by analyzing soil samples taken from each sensor.
Outlier Detection: Using a continually updated distribution in place of statically defined thresholds makes Confidence resilient to human configuration error and adaptable to dynamic environments.
(Diagram: remediation actions such as "Replace Sensor" and "Take Physical Sample"; feature axes: gradient, standard deviation.)
Readings are mapped into a multi-dimensional space defined by carefully chosen features: gradient, distance from LDR, distance from NLDR, standard deviation.
Points far from the origin are likely faulty. Assuming a normal distribution of distances for good points, points outside 2 standard deviations of the mean distance are considered outliers and rejected; all other points are used to continually update the distribution parameters.
Points are clustered using an online K-means algorithm. Clusters are associated with a previously successful remediating action.
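A minimal sketch of this outlier test, assuming a Welford-style running update of the distance distribution; the class name, warm-up length, and synthetic readings are illustrative, not the deployed implementation:

```python
import numpy as np

class DistanceOutlierDetector:
    """Distance-from-origin outlier test with a continually updated
    normal model of distances (sketch; names are illustrative)."""

    def __init__(self, warmup=10):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0   # Welford running moments
        self.warmup = warmup                       # points accepted before testing

    def update(self, features):
        d = float(np.linalg.norm(features))        # distance from the origin
        if self.n >= self.warmup:
            std = (self.m2 / (self.n - 1)) ** 0.5
            if abs(d - self.mean) > 2 * std:       # outside 2 standard deviations
                return True                        # outlier: reject, don't update
        self.n += 1                                # good point: update distribution
        delta = d - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (d - self.mean)
        return False

rng = np.random.default_rng(2)
det = DistanceOutlierDetector()
for _ in range(200):                               # normal 4-feature readings
    det.update(rng.normal(0.0, 1.0, 4))
print(det.update(np.full(4, 10.0)))                # far-from-origin reading
```

Because rejected points never contaminate the running moments, the distance model stays anchored to normal behavior, which is what makes the test adaptable without manual thresholds.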
Bangladesh: Confidence detects 85% of faulty data in a real-world data trace captured in Bangladesh, even though over one third of the data are faulty.
San Joaquin River
We ran Confidence in a deployment of 20 sensors in San Joaquin. Confidence accurately detected all 4 faults that occurred and correctly diagnosed 3 of the 4, with no false positives or negatives.
Data-driven techniques for identifying faulty sensor readings:

1) Rule/heuristic-based methods
• SHORT rule: compute the rate of change between two successive samples. If it is above a threshold, this is an instance of a SHORT fault.
• NOISE rule: compute the standard deviation of the samples within a time window W. If it is above a threshold, the samples are corrupted by a NOISE fault.

2) Linear least-squares estimation (LLSE) based method
• Exploits correlation in the data measured at different sensors.
• LLSE equation:

3) Learning data models: Hidden Markov Models
• HMM model: number of states, transition probabilities, and conditional probability Pr[O | S].

(Figures: examples of SHORT and NOISE faults, and an injected CONSTANT fault.)

Results: We analyzed data sets from real-world deployments to characterize the prevalence of data faults using these three methods.
• NAMOS deployment: CONSTANT + NOISE faults; up to 30% of samples affected by data faults.
• Intel Lab, Berkeley deployment: CONSTANT + NOISE faults; up to 20% of samples affected.
• Great Duck Island deployment: SHORT + NOISE faults; 10-15% of samples affected.
• SensorScope deployment: SHORT faults; very few samples affected.
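The two heuristic rules lend themselves to a direct sketch; the thresholds, window size, and synthetic trace below are illustrative assumptions:

```python
import numpy as np

def short_faults(samples, rate_threshold=20.0):
    """SHORT rule: flag sample i when |x[i] - x[i-1]| exceeds a threshold."""
    rates = np.abs(np.diff(samples))
    return np.flatnonzero(rates > rate_threshold) + 1

def noise_faults(samples, window=10, std_threshold=1.0):
    """NOISE rule: flag windows whose standard deviation exceeds a threshold."""
    flagged = []
    for start in range(0, len(samples) - window + 1, window):
        if np.std(samples[start:start + window]) > std_threshold:
            flagged.append((start, start + window))
    return flagged

trace = np.full(40, 20.0)                 # a flat, fault-free baseline
trace[12] = 80.0                          # single-sample spike (SHORT fault)
trace[20:30] += np.tile([3.0, -3.0], 5)   # noisy stretch (NOISE fault)

# Note the spike also inflates the std of its window, so it trips NOISE too.
print(short_faults(trace).tolist(), noise_faults(trace))
# -> [12, 13] [(10, 20), (20, 30)]
```

In practice the thresholds would be tuned per sensor type, which is exactly the configuration burden the distribution-based approach above is designed to remove.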
Signatures for modeling normal and faulty behavior
• It is difficult to initialize the sensor signature without a learning period that is guaranteed to be fault-free.
– A stricter threshold can be used during the learning period to decrease the chance of incorporating faults into the sensor signature.
• The method depends on accurately representing fault models, which is difficult without labeled training data.
• Summarize sensor and fault behaviors using a signature: a multivariate probability density of features (Cahill, Lambert, Pinheiro, and Sun; 2000).
• Features are chosen to exploit differences between faulty and normal behavior. Current features summarize temporal and spatial information:
– Temporal: actual reading, change between successive readings, voltage
– Spatial: difference from neighboring sensors
• Calculate a score for new readings using the log-likelihood ratio of the fault and sensor signatures; higher scores are more suspicious.
• Use of sensor signatures allows for sensor-specific fault detection.
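A small sketch of the log-likelihood-ratio score, with Gaussian densities standing in for the general feature signatures; the signature parameters and readings are illustrative assumptions:

```python
import numpy as np

def gauss_logpdf(x, mean, cov):
    """Log density of a multivariate normal (stand-in for a general signature)."""
    d = len(mean)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))

def score(x, sensor_sig, fault_sig):
    """Log-likelihood ratio of fault vs. normal behavior; higher = more suspicious."""
    return gauss_logpdf(x, *fault_sig) - gauss_logpdf(x, *sensor_sig)

sensor_sig = (np.zeros(3), np.eye(3))           # normal behavior centered at 0
fault_sig = (np.full(3, 5.0), 4.0 * np.eye(3))  # faulty behavior far from normal

normal_reading = np.array([0.2, -0.1, 0.3])
faulty_reading = np.array([4.8, 5.5, 4.1])
print(score(faulty_reading, sensor_sig, fault_sig) >
      score(normal_reading, sensor_sig, fault_sig))
```

Because each sensor maintains its own signature density, the same score threshold can mean different absolute feature values at different sensors, which is what enables sensor-specific detection.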
Fault Detection Algorithm (adapted from "Detecting Fraud in the Real World"; Cahill, Lambert, Pinheiro, and Sun; 2000)
Tested on one week of Cold-Air Drainage data.
(Figure, 4/06-4/12: a stuck-at fault is detected; Sensor 2 was malfunctioning at the start of the deployment, so its noisy readings are learned as "normal" sensor behavior.)
Fault detection loop: for each new reading, calculate the features Xt, then calculate a score against the sensor signature St and the fault signature F. If the score exceeds the threshold, update the fault signature; otherwise, update the sensor signature.

Signature update requires online density estimation:
• Sequentially update the density estimate with each new reading.
• Historical data cannot be stored, so the density must be represented compactly.
• No single parametric family is flexible enough to represent all distributions of features.
• We are developing a new method to do this using log-splines.
(Figure: readings from Sensor 1 and Sensor 2, showing unusually noisy readings and low voltage.)
Problem Description: Blind Calibration
Blindly calibrate sensor response from routine measurements collected by the sensor network. Manual calibration is not a scalable practice!
Consider a network with n sensors. Let x = (x1, ..., xn) be the vector of true signals at the n sensors, and y = (y1, ..., yn) the vector of measured signals. Assume the measured signals are a linear (gain-and-offset) function of the true signals, yi = αi·xi + βi, and that the true signals x lie in a known r-dimensional subspace of Rⁿ. Let P be the orthogonal projection matrix onto the orthogonal complement of that subspace, so that Px = 0. Then, under certain conditions on P, with no noise and exact knowledge of the subspace, we can perfectly recover the gain factors and partially recover the offset factors.
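A minimal sketch of the no-noise, zero-offset case: since the calibrated signals diag(g)·y_t (with g the inverse gains) must satisfy P·diag(g)·y_t = 0, g is the null vector of the stacked matrices P·diag(y_t). The network size, subspace, and gains below are simulated assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, T = 10, 3, 20          # sensors, subspace dimension, snapshots

# Known r-dimensional signal subspace (basis U) and its complement projector P
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
P = np.eye(n) - U @ U.T

alpha = rng.uniform(0.5, 2.0, n)      # unknown per-sensor gains (no offsets here)
X = U @ rng.standard_normal((r, T))   # true signals lie in the subspace
Y = alpha[:, None] * X                # measured signals

# g = 1/alpha satisfies P @ diag(y_t) @ g = P x_t = 0 for every snapshot,
# so recover it as the right singular vector of the smallest singular value.
M = np.vstack([P @ np.diag(Y[:, t]) for t in range(T)])
_, _, Vt = np.linalg.svd(M)
g_hat = Vt[-1]

# Compare with the true inverse gains, up to sign and scale
g_true = 1.0 / alpha
g_hat *= np.sign(g_hat[0]) * np.sign(g_true[0])
print(np.allclose(g_hat / np.linalg.norm(g_hat),
                  g_true / np.linalg.norm(g_true), atol=1e-8))
```

With noise or subspace mismodeling, the same smallest-singular-vector solution becomes a least-squares estimate rather than an exact recovery, which is the regime the robustness results below quantify.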
Robust to noise: with 2% noise in the measured signal, the error is <0.01% in the gains and <2.4% in the offsets.
Robust to mismodeling: with 10% of the true signal outside the assumed subspace, the error is <1% in the gains and <4% in the offsets.
Evaluation: In a deployment with all sensors in a styrofoam box, and thus with a 1-d signal subspace, the algorithm recovers the gains and offsets almost exactly. In a deployment with sensors spread across a valley at the James Reserve, using a 4-d signal subspace constructed from the calibrated data, the gain calibration was quite accurate. The offset calibration, as expected, captured some of the non-zero-mean signal; additionally, it was sensitive to the model.
M1: ti = η1(xi, θ1) + ei,  i = 1, ..., n
M2: ti = η2(xi, θ2) + ei,  i = 1, ..., n

T-optimality criterion: choose the design weights pi to maximize

Δ2 = Σ_{i=1}^{n} pi {η1(xi, θ1) − η2(xi, θ̂2)}²,

where θ̂2 = argmin over θ2 of Σ_{i=1}^{n} pi {η1(xi, θ1) − η2(xi, θ2)}².
1. Given a design of N observations, fit both models:
θ̂1(N) = argmin over θ1 of Σ_{i=1}^{N} {ti − η1(xi, θ1)}²
θ̂2(N) = argmin over θ2 of Σ_{i=1}^{N} {ti − η2(xi, θ2)}²
2. Add to the design the point x_{N+1} that maximizes the discrepancy between the fitted models:
x_{N+1} = argmax over x ∈ Z of {η1(x, θ̂1(N)) − η2(x, θ̂2(N))}²
3. Take the (N+1)th observation at x_{N+1} and update the design:
ξ_{N+1} = (1 − α) ξN + α ξ(x_{N+1})
4. Go back to step 1.
Algorithm: T-Designs
A sequential algorithm is used to iteratively collect measurements that maximize the discrimination between the two models [1].
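A sketch of the sequential loop under illustrative assumptions: two rival polynomial models on a 1-d candidate region, with simulated noisy observations. Appending the new point to an N-point design plays the role of the ξ update with α = 1/(N+1):

```python
import numpy as np

rng = np.random.default_rng(1)

# Candidate design region Z and two rival response models (hypothetical example):
# M1 is linear in x, M2 is quadratic; the simulated field actually follows M2.
Z = np.linspace(0.0, 2.0, 101)
f1 = lambda x: np.column_stack([np.ones_like(x), x])          # M1 basis
f2 = lambda x: np.column_stack([np.ones_like(x), x, x**2])    # M2 basis
true_theta = np.array([1.0, -0.5, 2.0])
observe = lambda x: f2(x) @ true_theta + 0.05 * rng.standard_normal(x.shape)

# Small initial design, then add points sequentially (steps 1-4 of the algorithm)
xs = np.array([0.0, 1.0, 2.0])
ts = observe(xs)
for _ in range(20):
    th1, *_ = np.linalg.lstsq(f1(xs), ts, rcond=None)   # step 1: fit both models
    th2, *_ = np.linalg.lstsq(f2(xs), ts, rcond=None)
    gap = (f1(Z) @ th1 - f2(Z) @ th2) ** 2              # step 2: model discrepancy
    x_new = Z[np.argmax(gap)]                           # most discriminating location
    xs = np.append(xs, x_new)                           # step 3: observe there
    ts = np.append(ts, observe(np.array([x_new])))

rss1 = np.sum((ts - f1(xs) @ np.linalg.lstsq(f1(xs), ts, rcond=None)[0]) ** 2)
rss2 = np.sum((ts - f2(xs) @ np.linalg.lstsq(f2(xs), ts, rcond=None)[0]) ** 2)
print(rss2 < rss1)   # the quadratic model fits the collected data better
```

The measurements concentrate where the fitted models disagree most, which is exactly where an observation carries the most information for model discrimination.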
(Figure: measured temperature surface over the x-y design region.)
Evaluation on Real Data:
ti = f(xi) + ei,  i = 1, ..., n
M1: ti = θ10 + θ11·x + θ12·y + ei
M2: ti = θ20 + θ21·x + θ22·y + θ23·x² + θ24·y² + ei
Likelihoods: M1 0.1754, M2 3.4368, so M2 fits better.
Generalization: with more than two plausible models, apply the same algorithm at each iteration to the two models that best fit the data (the worst case for discrimination).
Problem Description: Optimal Sensor Placement
Where should we collect measurements to optimally choose a model that represents the field?
Assumptions: Two plausible models. Gaussian noise.
Idea: Find the locations where the “difference” between the two models is the largest.
Technically:
[1] A.C. Atkinson and V.V. Fedorov. Optimal design: Experiments for discriminating between several models. Biometrika 62, 289-303, 1975.