inferring air quality for station location recommendation ...inferring air quality for station...

33
Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng March 29, 2017 H. Hsieh et. al. Air Quality Inferrence March 29, 2017 1 / 33

Upload: others

Post on 08-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Inferring Air Quality for Station LocationRecommendation Based on Urban Big Data

Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

March 29, 2017

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 1 / 33

Page 2: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Introduction

Table of Contents

1 Introduction

2 Related Work

3 Data and Features

4 Model and Algorithm

5 Experiments

6 Conclusion

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 2 / 33

Page 3: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Introduction

Motivation

Urban air quality (e.g., concentration of NO2, PM2.5 and PM10,) hasattracted more and more attention.Air quality index (AQI) is defined to model the pollution levels of theair.Accurate air-quality monitoring stations are necessary for AQImeasurements.However, it is infeasible to construct a lot of monitoring stations dueto:

space constraintbudget constraintlabor constraint

Crowdsourcing based methods are not applicable due to capabilityconstraint on mobile devices.

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 3 / 33

Page 4: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Introduction

Objective

We need a model to recommend locations for monitoring stations.

Problem DefinitionGiven a set of existing air monitoring stations, where to establish the nextones?

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 4 / 33

Page 5: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Introduction

Challenges

Coverage maximization solution is not applicable since air-qualityvalues are affected by many factors such as weather, traffic, and landusage, which leads to geographically non-smooth values.Localizing stations based on inference difficulty is not applicable sincewe need the ground truth data of all the unobserved locations whichis not realistic.Localizing stations based on performance improvement maximizationis not applicable since we do not really have any observation dataabout the candidate locations.It is difficult to perform an evaluation on the proposed model.

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 5 / 33

Page 6: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Introduction

Framework

A two-stage framework is proposed.First stage: create an AQI inference mechanism that not only caninfer the AQI values of any arbitrary unobserved location but alsoreveal the confidence of its inference. A semi-supervised learningframework to infer the air quality values of arbitrary unobservedlocations in a city is used.Second stage: establish new stations at the locations that canminimize the uncertainty of the inference model. A greedy-basedentropy-minimization (GEM) is used.

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 6 / 33

Page 7: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Introduction

Framework

Figure: The proposed framework.

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 7 / 33

Page 8: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Related Work

Table of Contents

1 Introduction

2 Related Work

3 Data and Features

4 Model and Algorithm

5 Experiments

6 Conclusion

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 8 / 33

Page 9: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Related Work

Inferring Unobserved Sensor Values

Emission Models. Not applicable due to non-smooth value.1 Interpolation models: Inverse Distance Weighting (IDW) and Ordinary

Kriging (OK).2 Dispersion model.

Satellite Remote Sensing. Not applicable due to (1) human factorssuch as traffic and land usage are not considered and (2) sensitivity toweather conditions.Crowdsourcing. Not applicable due to (1) sensor capability and (2)sensing time constraint.Machine Learning methods. Not applicable based experiment results.

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 9 / 33

Page 10: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Related Work

Sensor Deployment Strategies

Deploying from scratch without observed data.Deploying from scratch using observed data. Not applicable due to itdoes not consider incremental deployment.Incremental deployment using observed data.

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 10 / 33

Page 11: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Data and Features

Table of Contents

1 Introduction

2 Related Work

3 Data and Features

4 Model and Algorithm

5 Experiments

6 Conclusion

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 11 / 33

Page 12: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Data and Features

Dataset

Real datasets collected from Beijing air quality monitoring stations isused in this paper.Air Quality Records. The data contains the real-valued AQI of PM2.5and PM10.Meteorological Data. Five features including temperature, humidity,barometer pressure, wind speed, and weather condition (categorizedas cloudy, foggy, rainy, sunny, and snowy) are identified.Point-Of-Interests (POIs). POI has high correlation to the air qualityof the region (e.g. poor air quality might be associated with locationswith many factories).Road Networks. Air quality is strongly affected by the trafficcondition.

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 12 / 33

Page 13: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Model and Algorithm

Table of Contents

1 Introduction

2 Related Work

3 Data and Features

4 Model and Algorithm

5 Experiments

6 Conclusion

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 13 / 33

Page 14: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Model and Algorithm

Affinity Graph

We can infer AQI value of one location using information from otherlocations.Using location with station.Using near-by locationsUsing recent valuesUsing similar layers

Figure: Example of Affinity GraphH. Hsieh et. al. Air Quality Inferrence March 29, 2017 14 / 33

Page 15: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Model and Algorithm

Affinity Function

If two nodes are similar in terms of features, their AQI values aresimilar to each other.For two node u and v, feature similarity: ∆fk(u, v) = ||fk(u)− fk(v)||Affinity of u and v on one feature fk :AFfk (∆fk(u, v)) = a ·∆fk(u, v) + bFor a set of features F = {f1, f2, . . . , fm}, affinity of u and v :a(u, v) = exp(−

∑mk=1 π

2k × AFfk (∆fk(u, v))) which is a softmin of all

affinities.

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 15 / 33

Page 16: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Model and Algorithm

AQI Inference

AOI distribution of one node u: P(u)Force P(u) to be similar to its close neighbors:Q(p) =

∑(u,v)∈E wu,v · (P(u)− P(v))2

Using KL Divergence to measure the difference between P(u) andP(v)KL Divergence: DK L(P(u)||P(v)) =

∫ qmaxx=0 P(u)[x ]ln( P(u)[x ]

P(v)[x ])What if P(v)[x ] = 0?

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 16 / 33

Page 17: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Model and Algorithm

AQI Inference

P(u) is a weighted average of its neighbors, which can be betterillustrated using example in the Figure. So, what the semi-supervisedlearning does is to spread knowns AOI values out in a Affinity GraphFor mathematical part of proof and derivation, Semi-SupervisedLearning Using Gaussian Fields and Harmonic Functions and Anoverview on the Gaussian Fields and Harmonic Functions Method forSemi-supervised Learning would be more than helpful.

Figure: Example of Affinity Graph Learning

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 17 / 33

Page 18: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Model and Algorithm

Minimizing Uncertainty

Since we have figured out that P(u) can be calculated using weightedaverage, the weights are only remaining unknown parameters in Q(P).Then, the question convert to a optimization problem, which is wewant to minimize or maximize something with some constraint onP(u).Intuition: maximize the likelihood of labeled nodes using validationdata. Suffering data sparsityTheir method: minimizing the uncertainty of their prediction. Theuncertainty can be represented using entropy.

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 18 / 33

Page 19: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Model and Algorithm

Entropy

Common form of entropy: H(P) =∫

x P(x)log(P(x))dxEntropy in this paper:H(P) =

∫x (P(x)log(P(x))dx + (1− P(x)log(1− P(x))))

Objective: minimizing average entropy for all unknown nodes U.P(u) is related to w(u, v) andw(u, v) = exp(−

∑mk=1 π

2k × AFfk (∆fk(u, v))), which means the

unknown thing is πk

Using gradient descent method to solve it.

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 19 / 33

Page 20: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Model and Algorithm

Algorithm

Extracting featuresConstruct affinity graphInitialize weights of graphGet initial results of H(P(U))Update P(U) using weights W and then get new H(P(U)), thencalculated gradient of H(P(U)) and update weights πk

Repeat last step until converge.

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 20 / 33

Page 21: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Model and Algorithm

Algorithm 2

Identify the location X0 with the lowest entropyChoose the most likely value inferred from AQInf and mark X0 aslabelledUse the pseudo AQI value together with original observed data tobuild new modelIdentify another location X1 with lowest entropyRepeat 1-4 to rank the locations to be recommended from last to first

Figure: Illustration of GEMH. Hsieh et. al. Air Quality Inferrence March 29, 2017 21 / 33

Page 22: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Experiments

Table of Contents

1 Introduction

2 Related Work

3 Data and Features

4 Model and Algorithm

5 Experiments

6 Conclusion

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 22 / 33

Page 23: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Experiments

Effectiveness of AOInf

Setting1 Decomposed into 50*50 grids, in which 22 have the monitoring

stations containing 10416 time stamps2 Cross-validation by randomly choosing 15 of 22 grids as labelled data

and evaluated by the other 73 Repeat 1000 times and average results are reported

3 Models1 Geographical distances features plus three recent and similar time

layers as features (D+T3)2 Features in (1) plus meteorology data (D+T3+M)3 Features in (2) plus roadnet and POI features (ALL)

Competitors (3 interpolation-based, 2 learning-based and 2semi-supervised learning methods)

1 Spatial kNN, Inverse Distance Weighting (IDW), Ordinary Kriging(OK)

2 ANN, SVR3 Co-training, RBF-SSL

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 23 / 33

Page 24: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Experiments

Error of Air Quality Inference

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 24 / 33

Page 25: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Experiments

Robustness of Air Quality Inference

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 25 / 33

Page 26: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Experiments

Effectiveness of GEM

Setting1 Recommend k=5 locations to establish monitoring stations2 Choose 5 locations among 22 to be labelled data and reserve 10

locations to be candidate locations for building new stations, the rest 7are used for evaluation

2 Evaluation Metrics1 Top-rank ratio (TRR), TRR(C) = Rank(C)

C10k

2 RMSE-improvementCompetitors

1 Distance-based greedy2 Temporal feature-dissimilarity greedy3 Spatial feature-dissimilarity greedy4 Hybrid feature-dissimilarity greedy5 Entropy-based search6 Low edge weight search

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 26 / 33

Page 27: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Experiments

Evaluating GEM with Top-rank ratio

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 27 / 33

Page 28: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Experiments

Evaluating GEM with RMSE-improvement

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 28 / 33

Page 29: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Experiments

Evaluating GEM with Entropy variation

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 29 / 33

Page 30: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Experiments

Evaluating GEM with different number of training stations

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 30 / 33

Page 31: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Experiments

Evaluating GEM with different time spans

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 31 / 33

Page 32: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Conclusion

Table of Contents

1 Introduction

2 Related Work

3 Data and Features

4 Model and Algorithm

5 Experiments

6 Conclusion

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 32 / 33

Page 33: Inferring Air Quality for Station Location Recommendation ...Inferring Air Quality for Station Location Recommendation Based on Urban Big Data Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng

Conclusion

Conclusion

A model to recommend the most propoer locations for air qualitymonitoring stations is proposed.The affinity graph intergrates spatial and temporal correlations.The weights are learned to not only capture the correlation betweenfeatures and AQI but also to minimize the uncertainty of the model.It is much more effective than myopically minimize entropy or otherheuristics.

H. Hsieh et. al. Air Quality Inferrence March 29, 2017 33 / 33