inferring air quality for station location recommendation ...inferring air quality for station...
TRANSCRIPT
Inferring Air Quality for Station LocationRecommendation Based on Urban Big Data
Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng
March 29, 2017
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 1 / 33
Introduction
Table of Contents
1 Introduction
2 Related Work
3 Data and Features
4 Model and Algorithm
5 Experiments
6 Conclusion
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 2 / 33
Introduction
Motivation
Urban air quality (e.g., concentration of NO2, PM2.5 and PM10,) hasattracted more and more attention.Air quality index (AQI) is defined to model the pollution levels of theair.Accurate air-quality monitoring stations are necessary for AQImeasurements.However, it is infeasible to construct a lot of monitoring stations dueto:
space constraintbudget constraintlabor constraint
Crowdsourcing based methods are not applicable due to capabilityconstraint on mobile devices.
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 3 / 33
Introduction
Objective
We need a model to recommend locations for monitoring stations.
Problem DefinitionGiven a set of existing air monitoring stations, where to establish the nextones?
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 4 / 33
Introduction
Challenges
Coverage maximization solution is not applicable since air-qualityvalues are affected by many factors such as weather, traffic, and landusage, which leads to geographically non-smooth values.Localizing stations based on inference difficulty is not applicable sincewe need the ground truth data of all the unobserved locations whichis not realistic.Localizing stations based on performance improvement maximizationis not applicable since we do not really have any observation dataabout the candidate locations.It is difficult to perform an evaluation on the proposed model.
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 5 / 33
Introduction
Framework
A two-stage framework is proposed.First stage: create an AQI inference mechanism that not only caninfer the AQI values of any arbitrary unobserved location but alsoreveal the confidence of its inference. A semi-supervised learningframework to infer the air quality values of arbitrary unobservedlocations in a city is used.Second stage: establish new stations at the locations that canminimize the uncertainty of the inference model. A greedy-basedentropy-minimization (GEM) is used.
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 6 / 33
Introduction
Framework
Figure: The proposed framework.
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 7 / 33
Related Work
Table of Contents
1 Introduction
2 Related Work
3 Data and Features
4 Model and Algorithm
5 Experiments
6 Conclusion
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 8 / 33
Related Work
Inferring Unobserved Sensor Values
Emission Models. Not applicable due to non-smooth value.1 Interpolation models: Inverse Distance Weighting (IDW) and Ordinary
Kriging (OK).2 Dispersion model.
Satellite Remote Sensing. Not applicable due to (1) human factorssuch as traffic and land usage are not considered and (2) sensitivity toweather conditions.Crowdsourcing. Not applicable due to (1) sensor capability and (2)sensing time constraint.Machine Learning methods. Not applicable based experiment results.
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 9 / 33
Related Work
Sensor Deployment Strategies
Deploying from scratch without observed data.Deploying from scratch using observed data. Not applicable due to itdoes not consider incremental deployment.Incremental deployment using observed data.
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 10 / 33
Data and Features
Table of Contents
1 Introduction
2 Related Work
3 Data and Features
4 Model and Algorithm
5 Experiments
6 Conclusion
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 11 / 33
Data and Features
Dataset
Real datasets collected from Beijing air quality monitoring stations isused in this paper.Air Quality Records. The data contains the real-valued AQI of PM2.5and PM10.Meteorological Data. Five features including temperature, humidity,barometer pressure, wind speed, and weather condition (categorizedas cloudy, foggy, rainy, sunny, and snowy) are identified.Point-Of-Interests (POIs). POI has high correlation to the air qualityof the region (e.g. poor air quality might be associated with locationswith many factories).Road Networks. Air quality is strongly affected by the trafficcondition.
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 12 / 33
Model and Algorithm
Table of Contents
1 Introduction
2 Related Work
3 Data and Features
4 Model and Algorithm
5 Experiments
6 Conclusion
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 13 / 33
Model and Algorithm
Affinity Graph
We can infer AQI value of one location using information from otherlocations.Using location with station.Using near-by locationsUsing recent valuesUsing similar layers
Figure: Example of Affinity GraphH. Hsieh et. al. Air Quality Inferrence March 29, 2017 14 / 33
Model and Algorithm
Affinity Function
If two nodes are similar in terms of features, their AQI values aresimilar to each other.For two node u and v, feature similarity: ∆fk(u, v) = ||fk(u)− fk(v)||Affinity of u and v on one feature fk :AFfk (∆fk(u, v)) = a ·∆fk(u, v) + bFor a set of features F = {f1, f2, . . . , fm}, affinity of u and v :a(u, v) = exp(−
∑mk=1 π
2k × AFfk (∆fk(u, v))) which is a softmin of all
affinities.
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 15 / 33
Model and Algorithm
AQI Inference
AOI distribution of one node u: P(u)Force P(u) to be similar to its close neighbors:Q(p) =
∑(u,v)∈E wu,v · (P(u)− P(v))2
Using KL Divergence to measure the difference between P(u) andP(v)KL Divergence: DK L(P(u)||P(v)) =
∫ qmaxx=0 P(u)[x ]ln( P(u)[x ]
P(v)[x ])What if P(v)[x ] = 0?
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 16 / 33
Model and Algorithm
AQI Inference
P(u) is a weighted average of its neighbors, which can be betterillustrated using example in the Figure. So, what the semi-supervisedlearning does is to spread knowns AOI values out in a Affinity GraphFor mathematical part of proof and derivation, Semi-SupervisedLearning Using Gaussian Fields and Harmonic Functions and Anoverview on the Gaussian Fields and Harmonic Functions Method forSemi-supervised Learning would be more than helpful.
Figure: Example of Affinity Graph Learning
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 17 / 33
Model and Algorithm
Minimizing Uncertainty
Since we have figured out that P(u) can be calculated using weightedaverage, the weights are only remaining unknown parameters in Q(P).Then, the question convert to a optimization problem, which is wewant to minimize or maximize something with some constraint onP(u).Intuition: maximize the likelihood of labeled nodes using validationdata. Suffering data sparsityTheir method: minimizing the uncertainty of their prediction. Theuncertainty can be represented using entropy.
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 18 / 33
Model and Algorithm
Entropy
Common form of entropy: H(P) =∫
x P(x)log(P(x))dxEntropy in this paper:H(P) =
∫x (P(x)log(P(x))dx + (1− P(x)log(1− P(x))))
Objective: minimizing average entropy for all unknown nodes U.P(u) is related to w(u, v) andw(u, v) = exp(−
∑mk=1 π
2k × AFfk (∆fk(u, v))), which means the
unknown thing is πk
Using gradient descent method to solve it.
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 19 / 33
Model and Algorithm
Algorithm
Extracting featuresConstruct affinity graphInitialize weights of graphGet initial results of H(P(U))Update P(U) using weights W and then get new H(P(U)), thencalculated gradient of H(P(U)) and update weights πk
Repeat last step until converge.
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 20 / 33
Model and Algorithm
Algorithm 2
Identify the location X0 with the lowest entropyChoose the most likely value inferred from AQInf and mark X0 aslabelledUse the pseudo AQI value together with original observed data tobuild new modelIdentify another location X1 with lowest entropyRepeat 1-4 to rank the locations to be recommended from last to first
Figure: Illustration of GEMH. Hsieh et. al. Air Quality Inferrence March 29, 2017 21 / 33
Experiments
Table of Contents
1 Introduction
2 Related Work
3 Data and Features
4 Model and Algorithm
5 Experiments
6 Conclusion
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 22 / 33
Experiments
Effectiveness of AOInf
Setting1 Decomposed into 50*50 grids, in which 22 have the monitoring
stations containing 10416 time stamps2 Cross-validation by randomly choosing 15 of 22 grids as labelled data
and evaluated by the other 73 Repeat 1000 times and average results are reported
3 Models1 Geographical distances features plus three recent and similar time
layers as features (D+T3)2 Features in (1) plus meteorology data (D+T3+M)3 Features in (2) plus roadnet and POI features (ALL)
Competitors (3 interpolation-based, 2 learning-based and 2semi-supervised learning methods)
1 Spatial kNN, Inverse Distance Weighting (IDW), Ordinary Kriging(OK)
2 ANN, SVR3 Co-training, RBF-SSL
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 23 / 33
Experiments
Error of Air Quality Inference
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 24 / 33
Experiments
Robustness of Air Quality Inference
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 25 / 33
Experiments
Effectiveness of GEM
Setting1 Recommend k=5 locations to establish monitoring stations2 Choose 5 locations among 22 to be labelled data and reserve 10
locations to be candidate locations for building new stations, the rest 7are used for evaluation
2 Evaluation Metrics1 Top-rank ratio (TRR), TRR(C) = Rank(C)
C10k
2 RMSE-improvementCompetitors
1 Distance-based greedy2 Temporal feature-dissimilarity greedy3 Spatial feature-dissimilarity greedy4 Hybrid feature-dissimilarity greedy5 Entropy-based search6 Low edge weight search
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 26 / 33
Experiments
Evaluating GEM with Top-rank ratio
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 27 / 33
Experiments
Evaluating GEM with RMSE-improvement
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 28 / 33
Experiments
Evaluating GEM with Entropy variation
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 29 / 33
Experiments
Evaluating GEM with different number of training stations
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 30 / 33
Experiments
Evaluating GEM with different time spans
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 31 / 33
Conclusion
Table of Contents
1 Introduction
2 Related Work
3 Data and Features
4 Model and Algorithm
5 Experiments
6 Conclusion
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 32 / 33
Conclusion
Conclusion
A model to recommend the most propoer locations for air qualitymonitoring stations is proposed.The affinity graph intergrates spatial and temporal correlations.The weights are learned to not only capture the correlation betweenfeatures and AQI but also to minimize the uncertainty of the model.It is much more effective than myopically minimize entropy or otherheuristics.
H. Hsieh et. al. Air Quality Inferrence March 29, 2017 33 / 33