approximate dynamic programming for automated vacuum waste collection systems


Environmental Modelling & Software 67 (2015) 128-137



journal homepage: www.elsevier.com/locate/envsoft

Approximate dynamic programming for automated vacuum waste collection systems*

César Fernández a, Felip Manyà b, Carles Mateu a,*, Francina Sole-Mauri c

a INSPIRES Research Institute, Universitat de Lleida, Lleida, Spain
b Artificial Intelligence Research Institute (IIIA, CSIC), Bellaterra, Spain
c Chemical and Biological Engineering Department, University of British Columbia, Vancouver, Canada

Article info

Article history:
Received 20 March 2014
Received in revised form 7 January 2015
Accepted 20 January 2015
Available online

Keywords:
Urban waste
Optimization
Learning
AVWCS
Vacuum waste collection
Smart cities

* This work has been partially funded by the Generalitat de Catalunya under grant AGAUR 2009-SGR-1434, and the Ministerio de Economía y Competitividad research projects CO-PRIVACY (TIN2011-27076-C03-03), ARINF (TIN2009-14704-C03-01/03), TASSAT (TIN2010-20967-C04-01/03), and Newmatica (IPT-2011-1496-310000) from program INNPACTO (funded by the Ministerio de Ciencia y Tecnología until 2011).

* Corresponding author.

E-mail address: [email protected] (C. Mateu).

http://dx.doi.org/10.1016/j.envsoft.2015.01.013
1364-8152/© 2015 Elsevier Ltd. All rights reserved.

Abstract

The collection and treatment of waste poses a major challenge to modern urban planning, particularly to smart cities. To cope with this problem, a cost-effective alternative to conventional methods is the use of Automated Vacuum Waste Collection (AVWC) systems, which use air suction on a closed network of underground pipes to transport waste from the drop off points scattered throughout the city to a central collection point. This paper describes and empirically evaluates a novel approach to defining daily operation plans for AVWC systems that improves quality of service and reduces energy consumption, which represents about 60% of the total operation cost. We model a daily AVWC operation as a Markov decision process, and use Approximate Dynamic Programming (ADP) techniques to obtain optimal operation plans. The experiments, comparing our approach with the current approach implemented in some real-world AVWC systems, show that ADP techniques significantly improve the quality of AVWC operation plans.

1. Introduction

As discussed in (Fernández et al., 2014), a smart city is a city in which information and communications technologies are merged with traditional infrastructures, coordinated and integrated using new digital technologies. Awareness has arisen that cities have to develop in a greener and more sustainable way, since they consume the majority of the world's resources. Advanced systems to improve and automate processes within a city will play a leading role in smart cities. From smart design of buildings to intelligent control systems, the possible improvements enabled by sensing technologies are immense.

The collection and treatment of waste is a major challenge in modern urban planning due to the growth of urban population, as well as to the increasing amount of waste generated in wealthy areas (Eurostat, 2012). The environmental issues related to waste collection are believed to stem primarily from the emission of exhaust gases from the combustion process, noise and odor. In this paper we focus on Automated Vacuum Waste Collection (AVWC) systems, which are a cost-effective alternative to more conventional approaches. AVWC uses air suction on a closed network of underground pipes to transport waste from the drop off points scattered throughout the city to a central collection point, reducing greenhouse gas emissions and the inconveniences of the conventional method of waste collection (odors, noise, combustion gas emissions, etc.), as well as allowing better waste reuse and recycling (Fernández et al., 2014).

Considering the different technologies and companies together, over 1600 AVWC solutions are under construction or in operation in over 30 countries in Europe, North America, Australia, South East Asia and the Middle East (University Transportation Research Center, 2013). The technology is evolving to serve larger areas and to handle greater amounts of waste.

The advantages of AVWC systems over conventional ones have been widely described. A complete study in (Parriaux et al., 2006) highlights the principal potential resources for underground use (space, water, geothermal energy and geomaterials) that could be used to increase the sustainability of cities. The utilization of subsurface space for waste collection started in the 1960s, and it has since gained interest for activities that might be difficult, impossible, environmentally undesirable or even less profitable to install above ground. The change releases valuable surface space for other uses and enhances living conditions (Kaliampakos and Benardos, 2008).

Fig. 1. Schematic example of an automatic vacuum waste collection plant.

Life cycle analysis (LCA) methodology has been applied to estimate the environmental impact of several waste collection systems and to show the benefits of AVWC (Iriarte et al., 2009; Aranda Usón et al., 2011; Wäger et al., 2011; Punkkinen et al., 2012; Usón et al., 2013). From an environmental and functional point of view, the substitution of trucks has a large influence on traffic congestion and accidents, and minimizes noise and CO2 emissions (Kogler, 2007; Usón et al., 2013). The removal of containers from the streets minimizes hygienic problems, as container overload is practically eliminated and odor issues are controlled by the vacuum system.

From an economic point of view, the major advantage of the AVWC system is the reduction in the operational costs of waste handling. Although greater initial investments are required (Teerioja et al., 2012), in the long term the more economical operation of the system overcomes this disadvantage (Honkio, 2009). A comparison of operational and investment costs between door-to-door truck collection and AVWC for a new development is presented in (Kogler, 2007). An environmental and cost analysis for the same two systems, but focusing on deployment in an already built space with dense population and well-established city functions, is presented in (Teerioja et al., 2012). A financial assessment for a case where heavy construction works were required within an already built space in the city is provided in (Nakou et al., 2014). In all cases, the studies demonstrated that AVWC solutions have equivalent performance to conventional collection schemes, and that the significantly lower operational costs of the system compensate for the initial investment requirements.

The main objective of this paper is to create and evaluate a new method for producing daily operation plans for AVWC systems in such a way that the energy consumption is reduced and the quality of service is improved.

An AVWC system uses air suction on a closed network of underground pipes to transport waste from the drop off points, scattered throughout the city, to a central collection point. It typically covers an area of a few square kilometers. Among its advantages are the reduction of greenhouse gas emissions, the ability to mitigate the inconveniences of conventional waste collection systems (odors, noise, traffic congestion, …), and the achievement of higher levels of waste reuse and recycling.

AVWC systems are equipped with control software that produces plans for determining the inlets that should be emptied during a time interval, subject to a number of constraints (e.g., full inlets should always be emptied, inlets should be emptied at least once a day, air speed has upper and lower bounds, …). The quality of such plans is decisive for reducing costs in AVWC, and it is particularly important to reduce energy consumption because energy represents about 60 percent of the total operation cost of an AVWC system.

The present work is a continuation of our research on AVWC systems published in (Béjar et al., 2012a, 2012b; Fernández et al., 2014). In (Béjar et al., 2012b), we formally defined the problem of optimizing energy consumption in AVWC systems, describing the system operation, the dynamics (energy and time), and the operative constraints. Later, in (Béjar et al., 2012a; Fernández et al., 2014), we proposed a Constraint Integer Programming (CIP) encoding of the problem in order to take optimal real-time decisions. The proposed CIP-based approach is a single step in the quest for an optimal operation plan over an extended time horizon, typically a day, and was successful in finding optimal, or near optimal, solutions, even for large systems. This paper takes a step forward: it presents an original approach to defining operation plans for AVWC systems, modelling a daily AVWC operation as a Markov decision process, and using Approximate Dynamic Programming (ADP) techniques to obtain optimal operation policies. Our proposal is tested against existing solutions by using disposal data of real-world AVWC deployments, and the empirical results obtained show that the new operation policies not only improve the quality of service but also significantly reduce energy consumption under different scenarios.

The paper is structured as follows. Section 2 contains the model of AVWC systems described in (Fernández et al., 2014), which is included here to make the paper as self-contained as possible. Section 3 presents the model of a daily AVWC operation as a Markov decision process. Section 4 describes how ADP is applied for producing good quality, daily operation plans. Section 5 introduces the data used for benchmarking our solution. We present the data from two real-world plants used in the experiments, and the method for generating more synthetic data from real measurements. We also briefly describe the PLC controller algorithm used in real-world plants. Section 6 reports on the empirical investigation, and analyses the results obtained in our experiments. Section 7 concludes the paper and suggests future research directions.

2. Modelling AVWC systems

An AVWC system consists of a tree-shaped pipe network rooted at a central collection point. This central collection point has the means to split the collected waste by fraction (glass, organic refuse, paper, plastics, …), and is where waste is packed for disposal in containers that are then transported by truck to a landfill area for recycling or mechanical biological treatment. The network usually has sector valves located on some of the branch junctions that can isolate one of the branches to reduce the volume of air that will be suctioned. The drop off points are located along the branches, and contain inlets for the different fractions. There are also air valves that act as air entry points and help produce the air flow when the suction starts. Air valves can be located next to inlets, although it is not mandatory to have an air valve in each drop off point.

Fig. 2. Schematic representation of all the possible sectors according to the sector valves set up.

An AVWC system is modelled as a set $\{\mathcal{T}, \mathcal{I}, \mathcal{F}, V_a, V_s\}$ (Béjar et al., 2012b). $\mathcal{T}(\mathcal{N}, E)$ is a rooted binary tree with nodes ($\mathcal{N}$) representing either waste inlets ($\mathcal{I}$) or pipe junctions, and edges ($E$) corresponding to union pipes between nodes. $\mathcal{F}$ represents the set of waste fractions. Air valves ($V_a$), located at some inlets, create air streams able to empty downstream inlets. Sector valves ($V_s$) are placed along the tree in order to segment the whole tree structure and define isolated sectors ($s$), making transport more efficient for the inlets comprised in the corresponding sector. The sector, defined by a configuration of open and closed sector valves, is the subtree that contains all the paths to the root that contain only open valves and at least one inlet.

Each inlet in $\mathcal{I}$ is denoted by $I_i^f$, where $i$ denotes the inlet number and $f$ denotes the fraction, while $v^a_i$ and $v^s_i$ denote air and sector valves, respectively. The status of any valve is open ($o$) or closed ($c$). Fig. 1 is a small example of the system, with 3 types of fraction, 5 inlets (two of them handling 2 types of fraction, so one can consider having 7 inlets), 4 air valves and 3 sector valves. Note that, in this case, only 5 combinations of $V_s$ out of the 8 possible are valid, giving 5 different sectors, as depicted in Fig. 2 (see footnote 1). We denote by $\mathcal{S}$ the set of sectors derived from $\mathcal{T}$ and $V_s$. Each sector $s$ has a set of inlets $\mathcal{I}_s$ such that $\mathcal{I} = \bigcup_{s \in \mathcal{S}} \mathcal{I}_s$.
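The way sector valve configurations collapse into a smaller set of sectors can be illustrated with a minimal sketch. The topology below is hypothetical (it is not the plant of Fig. 1, whose layout is not given in the text); it was merely chosen so that 3 sector valves yield 5 distinct sectors. The names `VALVES_ON_PATH`, `sector` and `enumerate_sectors` are illustrative assumptions.

```python
from itertools import product

# Hypothetical topology (NOT the plant of Fig. 1): for each inlet, the sector
# valves found on its path to the root. An inlet belongs to the sector only
# if every valve on that path is open.
VALVES_ON_PATH = {
    "I1": ("vs1",),
    "I2": ("vs1",),
    "I3": ("vs1", "vs2"),
    "I4": ("vs3",),
    "I5": ("vs3",),
}
VALVES = ("vs1", "vs2", "vs3")


def sector(config):
    """Return the inlets connected to the root for a valve configuration.

    config maps each valve name to 'o' (open) or 'c' (closed).
    """
    return frozenset(
        inlet
        for inlet, path in VALVES_ON_PATH.items()
        if all(config[valve] == "o" for valve in path)
    )


def enumerate_sectors():
    """Map each distinct non-empty sector to the valve settings producing it."""
    sectors = {}
    for states in product("oc", repeat=len(VALVES)):
        inlets = sector(dict(zip(VALVES, states)))
        if inlets:  # a sector must contain at least one inlet
            sectors.setdefault(inlets, []).append(states)
    return sectors


if __name__ == "__main__":
    for inlets, configs in enumerate_sectors().items():
        print(sorted(inlets), configs)
```

In this assumed layout, two of the eight valve assignments reach no inlet and two others produce the same sector, so five distinct sectors remain.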

Three important subtrees that deeply impact the system dynamics arise from the topology: the emptying, air and vacuum subtrees. The emptying subtree ($\mathcal{T}_i^E$) is unique for each inlet, and is defined as the path that waste must follow from inlet $i$ to the root node. Of course, $\mathcal{T}_i^E$ must not contain closed sector valves. The air subtree ($\mathcal{T}_i^A$) is the path followed by the air stream in charge of waste transport along $\mathcal{T}_i^E$. Note that $\mathcal{T}_i^E \subseteq \mathcal{T}_i^A$, and $\mathcal{T}_i^E = \mathcal{T}_i^A$ if inlet $i$ has an air valve; otherwise, the airflow must come from an upstream inlet. The vacuum subtree ($\mathcal{T}_s^V$) is unique for each sector, and represents the total amount of air to be moved before proceeding to waste transport.

Time is considered to be slotted. At each time slot ($t$), the system decides to operate a set of inlets of a given fraction ($f$) and section ($s$), at a determined air speed ($v$), according to their level of occupancy and to the system state. Let $L_i^f$ be the waste level of an inlet $I_i^f$ at the beginning of the current slot $t$. A valid emptying sequence $E^{f,s} = [I^f_{i_1}, I^f_{i_2}, \ldots]$ must be operated under a maximum transfer capacity constraint ($L^f_{max}$) such that

$$\sum_{I_i^f \in E_t^{f,s}} L_i^f \le L^f_{max}.$$

We denote by $L$ the set of all loads $L_i^f$ at a given slot.

Energy and time calculations are out of the scope of this paper. We refer to (Fernández et al., 2014) for details about suitable energy and time models. Our objective in this paper is to find a set of emptying sequences $E^{f,s}$ for a full-day operation that optimizes a given criterion, such as energy consumption.
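The transfer capacity constraint on an emptying sequence can be checked with a few lines of code. This is a minimal sketch under assumed conventions: loads are plain numbers in arbitrary units, inlets are identified by integer keys, and the function name is hypothetical.

```python
def is_valid_emptying_sequence(sequence, loads, max_capacity):
    """Check that the summed load of the inlets in the sequence fits L_max.

    sequence     -- iterable of inlet identifiers (one fraction, one section)
    loads        -- mapping inlet identifier -> current waste level L_i
    max_capacity -- maximum transfer capacity L_max, in the units of loads
    """
    return sum(loads[inlet] for inlet in sequence) <= max_capacity


if __name__ == "__main__":
    loads = {1: 0.5, 2: 0.9, 3: 0.3}  # illustrative values only
    print(is_valid_emptying_sequence([1, 3], loads, 1.0))
    print(is_valid_emptying_sequence([1, 2], loads, 1.0))
```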

3. AVWC as a Markov Decision Process

A Markov Decision Process (MDP) (Puterman, 2008) can be characterized by a five-tuple $\{T, S, A^\pi_{s_t}, P_t(\cdot \mid s_t, a), R_t(s_t, a) : t \in T, s_t \in S, a \in A^\pi_{s_t}\}$, where $T$ is the set of time slots where decisions are taken, $S$ is the set of possible states, $s_t$ being a given state at time $t$, $A^\pi_{s_t}$ is the set of suitable actions for a given state $s_t$ under a determined policy $\pi$, $P_t(\cdot \mid s_t, a)$ denotes the transition probabilities among states, and $R_t(s_t, a)$ denotes the rewards derived from decisions.

1 Following the notation $(v^s_1, v^s_2, v^s_3)$, $\{(c, c, c), (c, o, c)\}$ are not valid assignments (because the resulting subtree only contains the root node) and $\{(c, c, o), (c, o, o)\}$ give the same sector configuration.

AVWC systems can be naturally modeled as an MDP. The state space is determined by the inlet occupancy levels, as well as some other factors that we detail below. According to a given state, one must determine a given action. Actions in AVWC systems consist of a set of decisions such as determining the inlets that should be emptied, and selecting the type of fraction that should be treated. The stochastic nature of the MDP, in this case, is derived from the disposal behavior of users. As we do not have a formal characterization of their probability functions, ADP provides us with methods to obtain approximate solutions, as we will see in Section 4. Rewards, as detailed below, are regarded as the energy cost derived from a given action.

For AVWC systems, we assume $T$ to be finite and discrete, so the MDP has a finite horizon. This horizon is a day, and we typically take a time slot duration of a few minutes.

In a first approach, $S$ can be represented as the Cartesian product $S = \mathcal{S} \times \mathcal{F} \times \mathbb{R} \times L$, or equivalently, a state can be represented by a vector $(s, f, v, l_1, \ldots, l_{|L|})$, where $s$ and $f$ indicate the previously operated section and fraction at air speed $v$, and $l_i \in \{0, 1, 2\}$ denotes a discrete three-level inlet load (see footnote 2). Note that, even assuming operation at constant air speed, the cardinality of $S$, $|S|$, is in $O(|\mathcal{S}| \cdot |\mathcal{F}| \cdot 3^{|L|})$. It is around $10^{18}$ for the largest topologies considered in this paper. Such a dimensionality leads us to define a first level of aggregation for the state space. We redefine $S$ as

$$S \subseteq \mathcal{S} \times \mathcal{F} \times \mathbb{R} \times \mathcal{N}^1_{F,M} \times \cdots \times \mathcal{N}^{|\mathcal{S}|}_{F,M},$$

where $\mathcal{N}^s_{F,M}$ denotes pairs of values indicating the number of inlets at load level 2 (full (F), above 80%) and the number of inlets at load level 1 (medium (M) load, between 20% and 80%) for section $s \in \mathcal{S}$. Note that

$$|\mathcal{N}^s_{F,M}| = O\left(\frac{(|\mathcal{I}_s| + 1)(|\mathcal{I}_s| + 2)}{2}\right)$$

and $S$ is a subset because, as an example, some values for $\mathcal{N}^i_{F,M} \times \mathcal{N}^j_{F,M}$ may not be allowed depending on the topology and how sections $i$ and $j$ are derived. Under this aggregation, $|S|$ is upper bounded by

$$O\left(|\mathcal{S}| \cdot |\mathcal{F}| \cdot \left(\frac{(|\mathcal{I}| + 1)(|\mathcal{I}| + 2)}{2}\right)^{|\mathcal{S}|}\right).$$

Now, under the same previous assumptions and considering a worst-case scenario where all the sectors have $|\mathcal{I}|$ inlets, $|S|$ is roughly $10^{15}$. Even so, it is worth noticing that such a bound is far from being realistic, because most sectors share only a small number of inlets. As an example, the largest topology considered here has around $10^8$ states.

2 Throughout the experiments we use level 0 for inlet loads below 20% of full capacity, level 2 for loads above 80%, and level 1 for the remaining cases.
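The orders of magnitude quoted above can be reproduced with a short calculation. The sketch below assumes the largest topology has 5 sections, a single fraction and 36 inlets (the figures given later for Plant2), that air speed is constant, and that the worst case assigns all inlets to every section, so the per-section pair count is multiplied across sections. Function names are illustrative.

```python
def raw_state_count(n_sections, n_fractions, n_inlets):
    """|S| * |F| * 3^|L| for three-level inlet loads at constant air speed."""
    return n_sections * n_fractions * 3 ** n_inlets


def pairs_per_section(n_inlets):
    """Number of (full, medium) count pairs with full + medium <= n_inlets."""
    return (n_inlets + 1) * (n_inlets + 2) // 2


def aggregated_bound(n_sections, n_fractions, n_inlets):
    """Worst case: every section is assumed to contain all the inlets."""
    return n_sections * n_fractions * pairs_per_section(n_inlets) ** n_sections


if __name__ == "__main__":
    # Assumed figures: 5 sections, 1 fraction, 36 inlets.
    print(f"{raw_state_count(5, 1, 36):.2e}")
    print(f"{aggregated_bound(5, 1, 36):.2e}")
```

Under these assumptions the raw count is roughly $7.5 \cdot 10^{17}$ and the aggregated worst-case bound roughly $8.6 \cdot 10^{14}$, in line with the $10^{18}$ and $10^{15}$ figures in the text.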

Each action $a \in A^\pi_s$ is represented by a vector $(s, f, v, E^{f,s})$ that describes the section, fraction, air speed, and set of inlets to be operated in the next time slot. Note that the action "do nothing" must also be included in $A^\pi_s$.

The transition probabilities determined by $P_t(\cdot \mid s_t, a)$ are unknown because we do not know the waste disposal distribution functions of the inlets. Let $D_t$ denote the joint distribution function for the waste disposal process over all the inlets. In Section 4, we explain how approximate dynamic programming techniques sample $D_t$ to overcome the lack of knowledge about the distribution $D_t$.

Rewards $R_t(s_t, a)$ can be regarded as costs derived from decisions. In our context, rewards are measured in energy units, and are computed according to the transitory ($E_t^{tr}$) and stationary ($E_t^{st}$) energy expressions of the models detailed in (Fernández et al., 2014). As the objective is to minimize operation cost, some specific rewards must be considered to avoid a zero energy consumption solution. Two operative conditions determine such specific rewards:

• Each inlet must have load level 0 at the last time slot.
• Each inlet with load level 2 must be emptied.

Each violation of any of the previous conditions implies a penalty cost added to the reward.
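As an illustration of how penalties enter the cost, the sketch below counts violations of the two conditions over a simulated day and adds a fixed penalty per violation to the energy. The data layout (per-slot level lists, per-slot sets of emptied inlets) and the function names are assumptions made for the example; the default of 500 MJ is the penalty value reported for the experiments.

```python
def count_violations(levels, emptied, final_levels):
    """Count violations of the two operative conditions.

    levels[t][i]  -- discrete load level (0, 1, 2) of inlet i at slot t
    emptied[t]    -- set of inlet indices emptied during slot t
    final_levels  -- load level of each inlet after the last slot
    """
    full_not_emptied = sum(
        1
        for slot_levels, slot_emptied in zip(levels, emptied)
        for inlet, level in enumerate(slot_levels)
        if level == 2 and inlet not in slot_emptied
    )
    not_empty_at_end = sum(1 for level in final_levels if level != 0)
    return full_not_emptied + not_empty_at_end


def operation_cost(energy_mj, levels, emptied, final_levels, penalty_mj=500.0):
    """Operational cost: energy plus a fixed penalty per violated condition."""
    return energy_mj + penalty_mj * count_violations(levels, emptied, final_levels)
```

For instance, with `levels = [[0, 2], [1, 2]]`, `emptied = [set(), {1}]` and `final_levels = [1, 0]`, one full inlet is left unattended in the first slot and one inlet is not empty at the end of the day, so two penalties are added.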

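Under this scenario, the objective is to solve the classical Bellman equation (Bellman, 2003):

$$\min_{\pi} \mathbb{E}\left\{ \sum_{t \in T} \gamma^t R_t(s_t, a) \right\} \qquad (1)$$

where $\gamma$ is the discount factor. Eq. (1) can be expressed in terms of the state values as

$$V_t^\pi(s_t) = \min_{a \in A^\pi_{s_t}} \left( R_t(s_t, a) + \gamma \, \mathbb{E}\left\{ V_{t+1}^\pi(s'_t) \mid s_t \right\} \right) = \min_{a \in A^\pi_{s_t}} \left( R_t(s_t, a) + \gamma \sum_{s'_t \in S} P_t(s'_t \mid s_t, a) \, V_{t+1}^\pi(s'_t) \right) \qquad (2)$$

where $V_t^\pi(s_t)$ is the value of beginning at state $s_t$ at time slot $t$ for policy $\pi$.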
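When the state space is tiny and the transition probabilities are known, Eq. (2) can be solved exactly by backward induction. The sketch below does so for a hypothetical single-inlet toy problem (three load levels, actions "wait" and "empty", invented costs and a 0.5 disposal probability); none of these numbers come from the paper, and a zero terminal value is assumed. It is meant only to make the recursion concrete, since the real problem has unknown probabilities and far too many states for this approach.

```python
STATES = (0, 1, 2)                  # discrete load levels of a single inlet
EMPTY_COST, PENALTY = 10.0, 500.0   # hypothetical energy units


def actions(level):
    return ("wait", "empty")


def reward(t, level, action):
    """Cost of an action: energy if emptying, penalty if a full inlet waits."""
    if action == "empty":
        return EMPTY_COST
    return PENALTY if level == 2 else 0.0


def transition(t, level, action):
    """Return {next_level: probability}; a disposal arrives with prob. 0.5."""
    base = 0 if action == "empty" else level
    up = min(base + 1, 2)
    return {base: 1.0} if up == base else {base: 0.5, up: 0.5}


def backward_induction(horizon, gamma=1.0):
    """Solve Eq. (2) exactly, from the last slot back to the first."""
    value = {s: 0.0 for s in STATES}  # assumed zero terminal value
    policy = []
    for t in reversed(range(horizon)):
        new_value, decision = {}, {}
        for s in STATES:
            best_action, best_q = None, float("inf")
            for a in actions(s):
                expected = sum(p * value[s2] for s2, p in transition(t, s, a).items())
                q = reward(t, s, a) + gamma * expected
                if q < best_q:
                    best_action, best_q = a, q
            new_value[s], decision[s] = best_q, best_action
        value = new_value
        policy.insert(0, decision)
    return value, policy
```

With a horizon of 3 slots, the optimal first-slot decision in this toy problem is to empty only when the inlet is full.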

Algorithm 1. Single-pass ADP.

4. Solving AVWC problems with ADP

Approximate Dynamic Programming (ADP) comprises algorithmic strategies to approximate solutions to Eq. (2) when it is not possible to visit all the state space due to its large dimensionality (Powell and Roy, 2004; Powell, 2007; Bertsekas, 2012). ADP techniques also apply to situations in which expectations cannot be computed exactly, in which case they rely on sample-based approaches (Pflug, 1996; Spall, 2003).

Denoting by $s_t$ the system state at time $t$, we can write $s_{t+1} = f(s_t, a, D_{t+1})$, where $f(\cdot)$ is a transition function that determines the next state depending on the action performed ($a$) and the posterior waste disposals $D_{t+1}$. Instead of relying on pre-decision states such as $s_t$, we can split such a transition into two steps, defining a post-decision state as $s_t^a = f_1(s_t, a)$, with $s_{t+1} = f_2(s_t^a, D_{t+1})$. As shown in (Powell, 2007), we can write Eq. (2) around the post-decision state variable as follows:

$$V_{t-1}^\pi(s_{t-1}^a) = \mathbb{E}\left\{ \min_{a \in A^\pi_{s_t}} \left( R_t(s_t, a) + \gamma V_t^\pi(s_t^a) \right) \,\middle|\, s_{t-1}^a \right\}. \qquad (3)$$

Eq. (3) has significant algorithmic advantages because the expectation may now be approached iteratively, and the min function operates over deterministic conditions. Indeed, assuming that we iterate over Eq. (3), we can use updating functions to approximate the average using, for iteration $n$,

$$V_{t-1}^{\pi,n}(s_{t-1}^a) = (1 - \alpha_n) V_{t-1}^{\pi,n-1}(s_{t-1}^a) + \alpha_n \hat{v}_t^n, \qquad (4)$$

where $\alpha_n$ is a step-size function, and $\hat{v}_t^n$ is

$$\hat{v}_t^n = \min_{a \in A^\pi_{s_t^n}} \left( R_t(s_t^n, a) + \gamma V_t^{\pi,n-1}\left(f_1(s_t^n, a)\right) \right), \qquad (5)$$

where the pre-decision state $s_t^n$ depends on the $n$-th sample realization of the waste disposal random variables $D_t$. It is worth noticing that the solution of Eq. (5) requires encoding the system, as well as its corresponding status, as a constraint integer programming (CIP) problem, and solving it as detailed in (Fernández et al., 2014).

At this point, we implement two strategies for finite horizon ADP problems, both described in (Powell, 2007). Alg. 1 is a single-pass version of ADP that we use to evaluate learning strategies for AVWC systems when the number of waste disposal samples is high. Considering that our set of real disposal data is small, roughly two months of data, we build waste disposal generators to preliminarily test ADP algorithms and tune design aspects such as the state space aggregation mentioned in Section 3. Waste generators provide enough sample paths ($N$) to prove convergence, even for complex topologies where the number of states is large.
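To make the single-pass idea and the updates of Eqs. (4) and (5) concrete, the following is a minimal sketch on a hypothetical single-inlet toy problem. It is not the authors' Alg. 1: the minimisation of Eq. (5) is done by enumerating two actions rather than by solving a CIP, the step size $\alpha_n = 1/n$ is an assumption, and all costs and probabilities are invented for illustration.

```python
import random
from collections import defaultdict

EMPTY_COST, PENALTY = 10.0, 500.0  # hypothetical energy units


def actions(level):
    return ("wait", "empty")


def reward(t, level, action):
    if action == "empty":
        return EMPTY_COST
    return PENALTY if level == 2 else 0.0


def post_decision(level, action):
    """f1: deterministic effect of the action, before new disposals arrive."""
    return 0 if action == "empty" else level


def sample_next(post_level, rng):
    """f2: add a sampled disposal (0 or 1 level) to the post-decision state."""
    return min(post_level + (1 if rng.random() < 0.5 else 0), 2)


def single_pass_adp(horizon, n_iterations, gamma=1.0, seed=0):
    """Forward passes over sampled days, updating post-decision values."""
    rng = random.Random(seed)
    # value[t][post_state] approximates V_t at the post-decision state
    value = [defaultdict(float) for _ in range(horizon)]
    for n in range(1, n_iterations + 1):
        alpha = 1.0 / n  # assumed harmonic step size
        level, prev_post = 0, None
        for t in range(horizon):
            # Eq. (5): deterministic minimisation over the feasible actions
            best_action, v_hat = None, float("inf")
            for a in actions(level):
                q = reward(t, level, a) + gamma * value[t][post_decision(level, a)]
                if q < v_hat:
                    best_action, v_hat = a, q
            # Eq. (4): smooth the value of the previous post-decision state
            if prev_post is not None:
                old = value[t - 1][prev_post]
                value[t - 1][prev_post] = (1 - alpha) * old + alpha * v_hat
            prev_post = post_decision(level, best_action)
            level = sample_next(prev_post, rng)
    return value
```

The post-decision formulation is what keeps the inner minimisation free of expectations: the random disposals only enter through `sample_next`, after the decision has been taken.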

Nevertheless, when testing real case scenarios, where the available set of sample paths is relatively small, we apply a hybrid value/policy iteration as in Alg. 2. In this case, we iterate as many times as required ($M$) over the whole available data set ($N$). Note that, differently from Alg. 1, the policy is updated only after iterating over all the available samples.

Algorithm 2. Hybrid value/policy iteration ADP.
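The difference with the single-pass scheme can be sketched as follows: decisions within a sweep are taken with a frozen value table, and that table is replaced only after all $N$ recorded sample paths have been processed; the sweep is repeated $M$ times. This is one plausible reading of the description above on a hypothetical single-inlet toy problem, not the authors' Alg. 2; the step size $1/n$, the costs and the encoding of disposals as 0/1 level increments are assumptions.

```python
from collections import defaultdict

EMPTY_COST, PENALTY = 10.0, 500.0  # hypothetical energy units


def actions(level):
    return ("wait", "empty")


def reward(t, level, action):
    if action == "empty":
        return EMPTY_COST
    return PENALTY if level == 2 else 0.0


def post_decision(level, action):
    return 0 if action == "empty" else level


def next_level(post_level, disposal):
    """Apply a recorded disposal (0 or 1 level) to the post-decision state."""
    return min(post_level + disposal, 2)


def hybrid_adp(sample_paths, n_sweeps, initial_level=0, gamma=1.0):
    """Repeat n_sweeps passes over a fixed data set of disposal paths."""
    horizon = len(sample_paths[0])
    policy_value = [defaultdict(float) for _ in range(horizon)]
    for _ in range(n_sweeps):
        # Working copy updated during the sweep; the policy stays frozen.
        new_value = [defaultdict(float, table) for table in policy_value]
        for n, path in enumerate(sample_paths, start=1):
            alpha = 1.0 / n  # assumed step size, restarted at each sweep
            level, prev_post = initial_level, None
            for t in range(horizon):
                best_action, v_hat = None, float("inf")
                for a in actions(level):
                    q = reward(t, level, a) + gamma * policy_value[t][post_decision(level, a)]
                    if q < v_hat:
                        best_action, v_hat = a, q
                if prev_post is not None:
                    old = new_value[t - 1][prev_post]
                    new_value[t - 1][prev_post] = (1 - alpha) * old + alpha * v_hat
                prev_post = post_decision(level, best_action)
                level = next_level(prev_post, path[t])
        policy_value = new_value  # policy update after the whole data set
    return policy_value
```

A call such as `hybrid_adp([[1, 0, 1, 1], [0, 1, 1, 0]], n_sweeps=5)` reuses the same two recorded days in every sweep, which mirrors the situation where only a small set of real measurements is available.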

5. The benchmark

To test the benefits of ADP-based solutions over the policies currently applied in AVWC plants, we use data from two existing plants. We refer to them as Plant1 and Plant2. Both are mid-size facilities, but have some structural differences.

Plant1, depicted in Fig. 3, has 30 inlets, distributed in 4 sectors, with several inlets sharing sectors. The big red square in the figure is the Central RSU, the tree root, and all numbered squares are waste inlets. Its topology is highly linear, with long branches having few inlets. In fact, Plant1 is a work-in-progress deployment, where several inlets are yet to be built while the plant is already in production. The inlets not yet deployed are depicted as small dots in the topology tree of Fig. 3.

Plant2, depicted in Fig. 4, has 36 inlets and 5 sectors, and is highly sectorized, with few inlets sharing sectors. Plant2 is denser than Plant1: it has more inlets on the same surface and pipe length.

A small AVWC plant, called Test1, is created from Plant2 for testing purposes: it only considers 2 sections and 15 inlets. All the empirical experimentation reported in this paper assumes both a single fraction and constant air speed. Finally, we consider a 24-h operation window with time slots of 5 min.

5.1. Real and synthetic data

To obtain a detailed profile of user disposals, in each inlet we used volume sensors that performed continuous measurements of its occupancy for a period of 36 days at both plants. This set of waste disposal measurements determined the sample paths $D_t^n$ for Alg. 2. Actually, we only used 26 days as sample data ($N = 26$), and held out a set of 10 days for performance measurements of the ADP policy.

Additionally, we created a synthetic generator for waste disposal processes that allows us to generate as many sample paths as needed. For each inlet $i$, we define a multivariate random variable $d^i = (d^i_1, \ldots, d^i_{|T|})$, where $d^i_t = \mathrm{unif}(0, U^i_t)$, $t \in T$, is uniformly distributed, and

$$U^i_t = \sum_{j=1}^{5} u_j \, e^{-\frac{(t - t^i_j)^2}{\sigma^2}}.$$

$(u_1, \ldots, u_5)$ are constants, and $t^i_j = \mathrm{unif}(1, |T|)$ are independent, uniformly distributed random variables. This simple model, though far from realistic, tries to capture a typical disposal activity, centered around a few time slots along a day. Uniform distributions for disposal load and slot time distribution are assumed for the sake of simplicity.
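The generator can be sketched in a few lines. The peak heights $(u_1, \ldots, u_5)$ and the width $\sigma$ are not given in the text, so the values used in the usage example are placeholders; the function names are likewise illustrative. The peak slots are drawn once per inlet and then reused for every generated day.

```python
import math
import random


def peak_envelope(n_slots, peak_slots, peak_heights, sigma):
    """U_t = sum_j u_j * exp(-(t - t_j)^2 / sigma^2) for t = 1..n_slots."""
    return [
        sum(
            u * math.exp(-((t - t_j) ** 2) / sigma ** 2)
            for u, t_j in zip(peak_heights, peak_slots)
        )
        for t in range(1, n_slots + 1)
    ]


def make_inlet_generator(n_slots, peak_heights, sigma, rng):
    """Draw the peak slots of one inlet and return (envelope, day sampler)."""
    # t_j ~ unif(1, |T|), fixed for the lifetime of this inlet
    peak_slots = [rng.uniform(1, n_slots) for _ in peak_heights]
    envelope = peak_envelope(n_slots, peak_slots, peak_heights, sigma)

    def sample_day():
        # d_t ~ unif(0, U_t), drawn independently for every slot
        return [rng.uniform(0.0, u) for u in envelope]

    return envelope, sample_day


if __name__ == "__main__":
    rng = random.Random(1)
    # 288 slots = 24 h in 5-min slots; heights and sigma are placeholders.
    envelope, sample_day = make_inlet_generator(288, [1.0, 0.8, 0.6, 0.4, 0.2], 12.0, rng)
    print(sum(sample_day()))
```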

Synthetic and real waste disposals differ mainly in two aspects. First, synthetic disposals are uncorrelated among inlets, which is not the case in real scenarios. Second, although the peak time slot $t^i_j$ remains invariant for a given inlet, variations in the maximum value of disposals $U^i_t$ make synthetic waste disposals highly variable. As a result, synthetic waste disposals push ADP learning to the limit, making the effective state space larger than in real scenarios, and requiring more iterations to converge to optimal policies, as we will see in the experiments.

5.2. State-of-the-art controllers

Current AVWC system policies rely on some sort of programmable logic controller (PLC) implementation. PLCs make decisions based on the system sensors, which provide information such as inlet level, time slot, fraction type, etc. Alg. 3 describes the controller algorithm used in our experimentation. At each time slot, as many emptying sequences ($E^s$) as there are sectors are obtained. A given inlet becomes part of the emptying sequence if its occupancy level is non-residual (levels 1 or 2). If an emptying sequence has some inlets at level 2, or its total load exceeds a given threshold (Thresh) of the maximum transfer capacity, such an emptying sequence is eligible for being operated. Finally, among the eligible sequences, the one with the maximum load is chosen. The final objective is twofold: first, to avoid incurring penalties by emptying inlets close to the maximum capacity; and second, to group as many inlets as possible in emptying sequences, minimizing plant operation cycles and, therefore, reducing energy consumption.

Algorithm 3. Programmable logic controller.
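The controller rule described above can be sketched as a single decision function. This is an illustrative reading of the prose, not the authors' Alg. 3: the data layout, the function name and the handling of ties are assumptions, and the sketch does not truncate sequences that exceed the transfer capacity, since the text does not say how the PLC treats that case.

```python
def plc_decision(sector_inlets, loads, levels, max_capacity, thresh):
    """Pick the sector and emptying sequence to operate in the current slot.

    sector_inlets -- mapping sector name -> list of inlet indices
    loads         -- load of each inlet, in the units of max_capacity
    levels        -- discrete level (0, 1, 2) of each inlet
    thresh        -- fraction of max_capacity that makes a sequence eligible
    Returns (sector, sequence), or None when no sequence is eligible.
    """
    best, best_load = None, -1.0
    for sector, inlets in sector_inlets.items():
        # Inlets with non-residual occupancy join the sector's sequence.
        sequence = [i for i in inlets if levels[i] in (1, 2)]
        total = sum(loads[i] for i in sequence)
        has_full = any(levels[i] == 2 for i in sequence)
        eligible = bool(sequence) and (has_full or total > thresh * max_capacity)
        # Among eligible sequences, keep the one with the largest load.
        if eligible and total > best_load:
            best, best_load = (sector, sequence), total
    return best


if __name__ == "__main__":
    sectors = {"A": [0, 1], "B": [2, 3]}  # illustrative layout
    print(plc_decision(sectors, [0.5, 0.1, 0.9, 0.3], [1, 0, 2, 1], 2.0, 0.5))
```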

Optimal values for parameter Thresh depend on plant characteristics and waste disposal behavior. Fig. 5 plots the average cost and energy consumption per day for Plant1 and Plant2 under Alg. 3 operation applied to the 36-day period of waste disposal data. As mentioned before, the operation cost adds, to the energy consumption value, the penalties incurred due to the violation of the conditions detailed in Section 3. We have considered an added cost of 500 Megajoules (MJ) per penalty. It is an arbitrary choice, and reflects the importance one attributes to the quality of service. Plant1 shows an optimal cost threshold below 0.2, while the optimal threshold of Plant2 rises up to 1. The reason for such a difference is that the load in Plant1 is much higher than the load in Plant2. As a result, Plant1 has a higher probability of penalties due to overload, while Plant2 rarely overloads its inlets.

For the remaining experimentation, we use the optimalthreshold for Alg. 3 according to the plant being considered. Wealso use a penalty of 500 MJ, regardless of the topology.

6. ADP performance

Fig. 3. Topology of Plant1.

The dimensionality of the state space is, undoubtedly, one of the challenges ADP has to face. If the state space dimension is too large, we may not have enough data to properly train ADP or, even with enough data, the number of iterations required for convergence to good policies can make our approach impracticable. As mentioned in Section 3, aggregation is an effective technique to reduce the state space dimension. First, we start by aggregating the number of full and medium occupancy inlets by section. The sets $\mathcal{N}^s_{F,M}$ that define the state space $S$ are pairs of values reflecting the number of inlets at load levels 2 and 1 in section $s$. This aggregation method (aggregation 1), which proves to be effective for small topologies, still leads to excessive dimensionality for larger scenarios.
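Aggregation 1 amounts to replacing the vector of per-inlet levels by one (full, medium) count pair per section. A minimal sketch, with an assumed data layout in which each section is given as a tuple of inlet indices (an inlet may appear in several sections):

```python
def aggregate_state(section_inlets, levels):
    """Map per-inlet levels to one (n_full, n_medium) pair per section.

    section_inlets -- sequence of inlet-index tuples, one per section
    levels         -- discrete level (0, 1, 2) of each inlet
    """
    return tuple(
        (
            sum(1 for i in inlets if levels[i] == 2),  # full inlets (level 2)
            sum(1 for i in inlets if levels[i] == 1),  # medium inlets (level 1)
        )
        for inlets in section_inlets
    )


if __name__ == "__main__":
    # Two illustrative sections sharing inlet 2.
    print(aggregate_state([(0, 1, 2), (2, 3)], [2, 1, 0, 1]))
```

Many distinct level vectors map to the same aggregated tuple, which is the source of the reduction in state space size.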

The second aggregation method we consider (aggregation 2) consists in defining new sets $\mathcal{N}^{2,s}_{F,M} \subseteq \mathcal{N}^s_{F,M}$ such that $\mathcal{N}^{2,s}_{F,M} = \{(\max(a, f_m), \max(b, f_m)) : (a, b) \in \mathcal{N}^s_{F,M}\}$. We take $f_m = 3$ in our experimentation.

In order to evaluate AVWC performance, we measure the amount of energy required to operate the system. Of course, different actions or decisions will operate the system in different ways, some of them being more efficient than others. As the objective function determined by Eq. (1) seeks to minimize the total rewards, if such rewards were merely the energy consumption of certain actions, a trivial solution would be doing nothing. Obviously, this is not a valid result because an AVWC system must obey certain requirements. In our simulations, as in real systems, we observe two main constraints, already mentioned in Section 3. First, all the inlets must have load level 0 at the end of the last time slot. In other words, all the inlets must be empty when operations end. Second, an inlet with load level 2 must be emptied during the following time slot to avoid disposal overflow. Whenever any of the previous constraints is violated, one can say that the operational costs are increased. At this point we define the operational cost as the operational energy plus the penalties introduced every time that an operational constraint is violated.

Fig. 4. Topology of Plant2.

Fig. 5. Cost and energy operation values as a function of Thresh for Alg. 3 (PLC).

Fig. 6. Cost for ADP aggregation methods on plant Test1.

Fig. 7. Filtered cost for deterministic and ADP aggregation methods on plant Test1.

Fig. 6 shows the operation cost of plant Test1 with synthetic waste disposals. We compare Alg. 3 (PLC) and ADP Alg. 1 using both aggregation methods. Cost peaks reflect the inability of the policy to prevent cost penalties. In all our experimentation, we employ a penalty of 500 Megajoules (MJ), which is roughly twice the energy required to operate plant Test1 for a day. We observe how ADP policies tend to reduce penalties along iterations, due to their learning capacity. Not surprisingly, we also observe how aggregation 2 avoids more penalties than aggregation 1 because of its reduced state space cardinality. Fig. 7 plots the same cost values, but smoothed with a Hann window 1000 samples wide (see footnote 3). It can be noted that, after several iterations, ADP significantly reduces the average cost of operation in relation to Alg. 3. We also observe a performance loss for aggregation 2 in relation to aggregation 1. Despite this performance loss, aggregation 2 is henceforth the default method because of its speed-up. Indeed, the larger state space of aggregation 1 dramatically slows the iteration time. As an example, the real time employed by aggregation 2 for the 6000 iterations in a small problem such as Test1 is roughly 12 h on a single CPU. Aggregation 1 takes almost 4 days, and this gap widens as the topology grows.
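The smoothing applied to the cost curves can be sketched as a normalised Hann-weighted moving average. The implementation below uses only the standard library and keeps just the positions where the window fully overlaps the data; the exact filtering conventions used for Fig. 7 (edge handling, normalisation) are not stated in the text, so these are assumptions.

```python
import math


def hann_window(width):
    """Hann weights w[n] = 0.5 - 0.5 * cos(2 * pi * n / (width - 1))."""
    if width < 3:
        raise ValueError("width must be at least 3 for a non-zero window")
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (width - 1)) for n in range(width)]


def smooth(values, width):
    """Weighted moving average of values with a normalised Hann window."""
    window = hann_window(width)
    norm = sum(window)
    return [
        sum(w * v for w, v in zip(window, values[start:start + width])) / norm
        for start in range(len(values) - width + 1)
    ]


if __name__ == "__main__":
    # Illustrative series with isolated penalty peaks on a flat baseline.
    costs = [250.0] * 50
    costs[10] += 500.0
    costs[30] += 500.0
    print(max(smooth(costs, 21)))
```

Because the window is normalised, a constant series is left unchanged, while isolated penalty peaks are spread over neighbouring iterations.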

The same experiment was conducted with a larger plant, Plant1.Fig. 8 compares cost with energy consumption for PLC and ADPAlg.1. Smoothed values overlapped in Fig. 8 are zoomed in Fig. 9. We

3 When processing digital signals ecost vs. iterations can be regarded as sucheand some sort of filtering is required, windows functions are used to average the setof samples inside the window. Those windows may have different forms e rect-angular, triangular, …e being Hann windows cosinus shaped functions frequentlyused in signal processing.

observe a better performance of ADP not only for cost but also for energy consumption. This point is particularly interesting because ADP techniques enhance service quality by reducing the operation cost, while energy consumption is maintained or even reduced.

A second set of experiments was conducted for Plant1 and Plant2 using real disposal data. As mentioned in Section 5, 36 days of sample disposals are available for both plants. We keep 26 days as sample data for ADP training, and perform cost and energy evaluations over the remaining 10 days of test samples. Following the notation of Alg. 2, we iterate m times over the 26-day sample data set (N = 26). After a given iteration m, we compute cost and energy over the test samples according to the policy $V_t^{\pi,m}$ derived from Alg. 2. The plots in Fig. 10 show the cost and energy averages over the 10 test samples after each iteration for both plants, Plant1 and Plant2. Despite having only a small set of sample data (26 days) from which to derive an optimal policy, we observe important improvements in cost and energy consumption. The improvements for Plant1 are not as large as those for Plant2 because the operation load in Plant1 is low. This can be seen by comparing the cost and energy values: both are very close because the low load of the system does not trigger overloads and, consequently, no penalties are incurred. Even so, ADP⁴ policies improve cost and energy consumption by 10.5%. This improvement is explained by more efficient decisions when selecting the inlets and sections to be emptied. A different situation is observed in Plant2, where the system load is higher and penalty situations are more frequent. In this case, the ADP policy saves 45.4% of the cost relative to the PLC policy while also reducing energy consumption by 11.3%.
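The hold-out protocol described above can be sketched as follows. This is only an outline under stated assumptions: which 26 days are used for training is not specified in the text (the sketch simply takes the first 26), and `train_pass` and `day_cost` are hypothetical callables standing in for one sweep of Alg. 2 and for the plant simulator.

```python
from statistics import mean

def split_days(days, n_train=26):
    """Keep the first n_train days for training and the rest for testing (assumed order)."""
    return days[:n_train], days[n_train:]

def relative_saving(baseline, candidate):
    """Fraction of the baseline cost (or energy) saved by the candidate policy."""
    return (baseline - candidate) / baseline

def evaluate_after_each_pass(train_days, test_days, train_pass, day_cost, passes):
    """Run `passes` training sweeps; after each one, average the cost on the test days.

    `train_pass(train_days)` and `day_cost(day)` are hypothetical placeholders for
    one sweep of Alg. 2 and for the plant simulator, respectively.
    """
    history = []
    for _ in range(passes):
        train_pass(train_days)
        history.append(mean(day_cost(day) for day in test_days))
    return history

# Toy stand-ins (not the paper's simulator): each pass lowers a cost factor by 10%.
state = {"factor": 1.0}

def toy_train_pass(days):
    state["factor"] *= 0.9

def toy_day_cost(day):
    return 100.0 * state["factor"]

train, test = split_days(list(range(36)))
history = evaluate_after_each_pass(train, test, toy_train_pass, toy_day_cost, passes=3)
```

With real components in place of the toy stand-ins, `relative_saving(plc_cost, adp_cost)` would give the kind of percentage reported for Plant1 and Plant2.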

7. Conclusions and future work

We showed how ADP techniques may help AVWC systems derive more efficient operation policies, reducing their operation cost and energy requirements. AVWC systems spend large amounts of energy in a day of operation. As an example, a mid-size plant such as Plant2 requires an average energy of $1.6 \times 10^9$ joules (444 kW·h) per day. Even small improvements result in large energy savings and environmental benefits. It is worth noting that a reduction between 10% and 30% in energy

4 Note that in the first iterations, ADP policies perform worse than PLC. This is expected: ADP starts with an estimate of the system state values ($V_t^{\pi}(s_t)$) that is far from optimal and is updated during successive iterations.

Fig. 8. Cost and energy for ADP Alg. 1 in Plant1.

Fig. 9. Filtered cost and energy for ADP Alg. 1 in Plant1.

Fig. 10. Cost and energy averages for PLC and ADP Alg. 2 using real waste disposal data at Plant1 and Plant2.

consumption implies a reduction between 7% and 15% of the operation cost.

An ADP single-pass value iteration implementation, in conjunction with synthetic disposal generators, helped us to devise good aggregation methods for real topologies in which the available data was too scarce to implement a full experimental setup. Building on this knowledge, we implemented an ADP hybrid value/policy iteration algorithm and compared it against state-of-the-art policies based on PLCs under the conditions of real scenarios. Results show that, despite having only a small set of inlet disposal data to train our ADP implementation, performance is greatly improved, especially under high load conditions. We are able to improve the quality of service by reducing the number of penalties due to inlet overloads, as well as the energy consumption. The policies obtained can be used in real scenarios to control the actual PLCs that govern AVWC systems.
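The Plant2 energy figure quoted in these conclusions can be checked with a short unit conversion. The sketch below also expresses the 10% and 30% energy reductions in kW·h per day; the variable names are illustrative only.

```python
JOULES_PER_KWH = 3.6e6  # 1 kW·h = 3.6 MJ by definition

def joules_to_kwh(joules):
    """Convert an energy in joules to kilowatt-hours."""
    return joules / JOULES_PER_KWH

daily_energy_j = 1.6e9  # Plant2 daily average quoted in the text
daily_kwh = joules_to_kwh(daily_energy_j)

# Daily savings for the 10% and 30% reductions mentioned in the text
savings_kwh = {pct: daily_kwh * pct / 100 for pct in (10, 30)}
```

The conversion gives about 444 kW·h per day, matching the value in the text, so the quoted range of reductions corresponds to roughly 44 to 133 kW·h saved per day.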


As future work, we plan to improve the speed of the proposed algorithms. First, more efficient data structures for storing the policy values, currently implemented using hash tables, should be devised. Second, the asynchronous nature of the policy updating steps facilitates the design of parallel implementations of the algorithms. Such improvements might make it possible to tackle larger topologies.
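To make the hash-table storage mentioned above concrete, here is a minimal sketch of a lookup-table value function keyed by time step and aggregated state. It is not the authors' implementation: the class name, the smoothing update with a fixed step size, and the `coarse_levels` aggregation are all assumptions introduced for illustration, and they do not correspond to aggregation 1 or 2 in the paper.

```python
class LookupValueFunction:
    """Value estimates stored in a hash table keyed by (time step, aggregated state)."""

    def __init__(self, aggregate, step_size=0.1, default=0.0):
        self.aggregate = aggregate  # maps a full state to a hashable, coarser key
        self.step_size = step_size
        self.default = default
        self.table = {}

    def value(self, t, state):
        """Return the stored estimate, or the default for unseen keys."""
        return self.table.get((t, self.aggregate(state)), self.default)

    def update(self, t, state, observed):
        """Move the stored estimate toward a newly observed value (exponential smoothing)."""
        key = (t, self.aggregate(state))
        old = self.table.get(key, self.default)
        self.table[key] = (1 - self.step_size) * old + self.step_size * observed
        return self.table[key]

def coarse_levels(state, buckets=4):
    """Hypothetical aggregation: bucket each inlet fill ratio in [0, 1] into a few levels."""
    return tuple(min(int(fill * buckets), buckets - 1) for fill in state)

V = LookupValueFunction(coarse_levels, step_size=0.5)
V.update(0, (0.10, 0.80), observed=100.0)  # estimate for key (0, (0, 3)) becomes 50.0
V.update(0, (0.20, 0.90), observed=200.0)  # same aggregated key, estimate becomes 125.0
```

Because each update touches a single key, updates to different keys do not interfere with one another, which is the property that would make a parallel implementation plausible. A coarser aggregation means fewer keys and faster iterations, at the price of less precise value estimates.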

