robust strategy synthesis for probabilistic systems applied to …sseshia/pubdir/emsoft... ·...

10
Robust Strategy Synthesis for Probabilistic Systems Applied to Risk-Limiting Renewable-Energy Pricing Alberto Puggelli, Alberto L. Sangiovanni-Vincentelli, Sanjit A. Seshia Department of Electrical Engineering and Computer Science University of California, Berkeley {puggelli, alberto, sseshia}@eecs.berkeley.edu ABSTRACT We address the problem of synthesizing control strategies for El- lipsoidal Markov Decision Processes (EMDP), i.e., MDPs whose transition probabilities are expressed using ellipsoidal uncertainty sets. The synthesized strategy aims to maximize the total expected reward of the EMDP, constrained to a specification expressed in Probabilistic Computation Tree Logic (PCTL). We prove that the EMDP strategy synthesis problem for the fragment of PCTL dis- abling operators with a finite time bound is NP-complete and pro- pose a novel sound and complete algorithm to solve it. We apply these results to the problem of synthesizing optimal en- ergy pricing and dispatch strategies in smart grids that integrate re- newable sources of energy. We use rewards to maximize the profit of the network operator and a PCTL specification to constrain the risk of power unbalance and guarantee quality-of-service for the users. The EMDP model used to represent the decision-making scenario was trained with measured data and quantitatively cap- tures the uncertainty in the prediction of energy generation. An ex- perimental comparison shows the effectiveness of our method with respect to previous approaches presented in the literature. 1. INTRODUCTION Several real-world multi-agent systems exhibit stochastic behav- ior, and can be modeled using formalisms such as Markov Deci- sion Processes (MDPs) [1]. Desired properties of such systems can be both Boolean, e.g., expressible in logics such as probabilistic computation tree logic (PCTL) [2, 3], or quantitative, such as max- imizing a reward function [4]. The synthesis of strategies to satisfy Boolean properties and optimize quantitative measures is naturally a topic of much relevance. Moreover, for probabilistic models that are inferred from empirical data, it is necessary to design strategies that are robust to uncertainties in estimated state-transition proba- bilities. In this paper, we present a new approach to robust strategy synthesis for a class of MDPs with uncertainties, with application to risk-limiting renewable energy pricing and dispatch. Main Motivating Application. Electricity consumption world- wide is projected to grow from 18 trillion kWh in 2006 to 32 trillion kWh in 2030, a 77% increase [5]. To avoid catastrophic pollution damage to the planet, it is necessary to employ energy sources alter- Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. ESWEEK’14, October 12 - 17 2014, New Delhi, India Copyright 2014 ACM 978-1-4503-3052-7/14/10 $15.00. http://dx.doi.org/10.1145/2656045.2656069 native to fossil fuels [6]. In fact, an acceleration in the deployment of renewables is already taking place in Europe, Asia and North America. In this paper, we focus on wind energy, which nowadays has higher capacity than solar energy, and is expected to consti- tute a significant portion of renewable generation integrated to the power grids of North America [6]. The correct operation of power systems requires the balance be- tween energy supply and demand at all times. The risk for the system operator can be quantified both by the probability of not meeting such a balance constraint, and by the (positive) gap be- tween demand and supply. High values of either indicator make the occurrence of disruptions, faults and ultimately blackouts more likely [7]. In grids that only integrate fossil energy sources, the task for the system operator amounts to dispatch the production of en- ergy during the day, based on averaged demand profiles. A wealth of deterministic optimization frameworks have been developed to solve the energy dispatch problem, aiming to maximize the opera- tor profit (or some other form of social welfare cost function) while guaranteeing energy balance also in the presence of one fault in the network, the so-called N-1 worst-case dispatch [7]. When considering the need for limiting operation risk, it is ap- parent that a high penetration of wind generation puts forth big operational challenges [7]. Unlike fossil energy resources, wind generation is non-dispatchable, i.e., it cannot be harvested by re- quest. Further, wind availability exhibits high variability across all timescales, which makes it challenging to forecast (errors can be up to 20% of the forecast value [8]). To compensate for supply uncertainty, researchers have proposed the concept of demand response, i.e., the management of customer energy consumption in response to supply conditions. Indeed, a large fraction of the total daily electricity consumption in developed countries is from residential and small commercial energy users, e.g., water heaters and dish washers, which do not need to operate at a specific moment but only within a time interval, e.g., some time overnight [8]. Traditionally, these energy users pay a fixed price per unit of electricity, which represents an average cost of power generation over a given time frame (e.g., a season). In smart grids with two-way communications, realtime pricing protocols can be implemented so that price can vary according to the availability of energy supply, to incentivize (or disincentivize) energy demand. The first stochastic optimization frameworks have thus been pro- posed [9, 10]. These works aim to determine both optimal energy pricing and dispatch of non-renewable baseline energy (wind pene- tration usually accounts only up to 20-30% of the total energy gen- eration, so fossil generation is still required). On the other hand, these techniques do not explicitly consider the risk of power un- balance during optimization, and only evaluate the probability of loss of load after synthesis via Monte Carlo simulation, to assess the quality of the proposed solution. Unfortunately, these results offer little insight to the operator when the risk is too high. Indeed, Varaiya et al. [7] advocated the need for an optimization frame-

Upload: others

Post on 11-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Robust Strategy Synthesis for Probabilistic Systems Applied to …sseshia/pubdir/emsoft... · 2014-08-07 · Robust Strategy Synthesis for Probabilistic Systems Applied to Risk-Limiting

Robust Strategy Synthesis for Probabilistic SystemsApplied to Risk-Limiting Renewable-Energy Pricing

Alberto Puggelli Alberto L Sangiovanni-Vincentelli Sanjit A SeshiaDepartment of Electrical Engineering and Computer Science

University of California Berkeleypuggelli alberto sseshiaeecsberkeleyedu

ABSTRACTWe address the problem of synthesizing control strategies for El-lipsoidal Markov Decision Processes (EMDP) ie MDPs whosetransition probabilities are expressed using ellipsoidal uncertaintysets The synthesized strategy aims to maximize the total expectedreward of the EMDP constrained to a specification expressed inProbabilistic Computation Tree Logic (PCTL) We prove that theEMDP strategy synthesis problem for the fragment of PCTL dis-abling operators with a finite time bound is NP-complete and pro-pose a novel sound and complete algorithm to solve it

We apply these results to the problem of synthesizing optimal en-ergy pricing and dispatch strategies in smart grids that integrate re-newable sources of energy We use rewards to maximize the profitof the network operator and a PCTL specification to constrain therisk of power unbalance and guarantee quality-of-service for theusers The EMDP model used to represent the decision-makingscenario was trained with measured data and quantitatively cap-tures the uncertainty in the prediction of energy generation An ex-perimental comparison shows the effectiveness of our method withrespect to previous approaches presented in the literature

1 INTRODUCTIONSeveral real-world multi-agent systems exhibit stochastic behav-

ior and can be modeled using formalisms such as Markov Deci-sion Processes (MDPs) [1] Desired properties of such systems canbe both Boolean eg expressible in logics such as probabilisticcomputation tree logic (PCTL) [2 3] or quantitative such as max-imizing a reward function [4] The synthesis of strategies to satisfyBoolean properties and optimize quantitative measures is naturallya topic of much relevance Moreover for probabilistic models thatare inferred from empirical data it is necessary to design strategiesthat are robust to uncertainties in estimated state-transition proba-bilities In this paper we present a new approach to robust strategysynthesis for a class of MDPs with uncertainties with applicationto risk-limiting renewable energy pricing and dispatch

Main Motivating Application Electricity consumption world-wide is projected to grow from 18 trillion kWh in 2006 to 32 trillionkWh in 2030 a 77 increase [5] To avoid catastrophic pollutiondamage to the planet it is necessary to employ energy sources alter-

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies are notmade or distributed for profit or commercial advantage and that copies bearthis notice and the full citation on the first page Copyrights for componentsof this work owned by others than ACM must be honored Abstracting withcredit is permitted To copy otherwise or republish to post on servers or toredistribute to lists requires prior specific permission andor a fee Requestpermissions from PermissionsacmorgESWEEKrsquo14 October 12 - 17 2014 New Delhi IndiaCopyright 2014 ACM 978-1-4503-3052-71410 $1500httpdxdoiorg10114526560452656069

native to fossil fuels [6] In fact an acceleration in the deploymentof renewables is already taking place in Europe Asia and NorthAmerica In this paper we focus on wind energy which nowadayshas higher capacity than solar energy and is expected to consti-tute a significant portion of renewable generation integrated to thepower grids of North America [6]

The correct operation of power systems requires the balance be-tween energy supply and demand at all times The risk for thesystem operator can be quantified both by the probability of notmeeting such a balance constraint and by the (positive) gap be-tween demand and supply High values of either indicator makethe occurrence of disruptions faults and ultimately blackouts morelikely [7] In grids that only integrate fossil energy sources the taskfor the system operator amounts to dispatch the production of en-ergy during the day based on averaged demand profiles A wealthof deterministic optimization frameworks have been developed tosolve the energy dispatch problem aiming to maximize the opera-tor profit (or some other form of social welfare cost function) whileguaranteeing energy balance also in the presence of one fault in thenetwork the so-called N-1 worst-case dispatch [7]

When considering the need for limiting operation risk it is ap-parent that a high penetration of wind generation puts forth bigoperational challenges [7] Unlike fossil energy resources windgeneration is non-dispatchable ie it cannot be harvested by re-quest Further wind availability exhibits high variability across alltimescales which makes it challenging to forecast (errors can beup to 20 of the forecast value [8])

To compensate for supply uncertainty researchers have proposedthe concept of demand response ie the management of customerenergy consumption in response to supply conditions Indeed alarge fraction of the total daily electricity consumption in developedcountries is from residential and small commercial energy userseg water heaters and dish washers which do not need to operateat a specific moment but only within a time interval eg some timeovernight [8] Traditionally these energy users pay a fixed priceper unit of electricity which represents an average cost of powergeneration over a given time frame (eg a season) In smart gridswith two-way communications realtime pricing protocols can beimplemented so that price can vary according to the availability ofenergy supply to incentivize (or disincentivize) energy demand

The first stochastic optimization frameworks have thus been pro-posed [9 10] These works aim to determine both optimal energypricing and dispatch of non-renewable baseline energy (wind pene-tration usually accounts only up to 20-30 of the total energy gen-eration so fossil generation is still required) On the other handthese techniques do not explicitly consider the risk of power un-balance during optimization and only evaluate the probability ofloss of load after synthesis via Monte Carlo simulation to assessthe quality of the proposed solution Unfortunately these resultsoffer little insight to the operator when the risk is too high IndeedVaraiya et al [7] advocated the need for an optimization frame-

work capable of bounding the risk interpreted stochastically at op-timization time Moreover a minimum amount of delivered energyneeds to be guaranteed to the users since otherwise operators couldpotentially increase the energy price to force users out of the systemto obtain power balance at times of little wind generation We re-fer to this guaranteed delivered energy as Quality of Service (QoS)for the users As summarized in the next section our contributiontargets these needs

Paper Contributions Our first contribution is a novel stochasticmodel to capture energy pricing and dispatch strategies for smartgrids with wind energy sources (other renewables can be consid-ered analogously) The model is an Ellipsoidal Markov DecisionProcessMC (EMDP) a special case of the Convex-MDP (CMDP)model first introduced by Puggelli et al [11 12] ie an MDPwhere transition probabilities are only known to lie in ellipsoidalsets While previous works used analytical distributions eg Gaus-sian [10] to model uncertainty in wind availability we use mea-sured data (from the wind farm at Lake Benton Minnesota USA)to train a likelihood model of the wind generation and give quanti-tative means to represent the confidence in the forecast values Wethen approximate the likelihood region with an ellipsoidal modelwhich is more accurate than the linear ones often used in the lit-erature while remaining computationally tractable Our empiricalapproach has the promise of more faithfully representing the prob-ability distribution of the generated energy because it is tailored tothe specific wind farm under analysis and it is robust to forecasterrors

As a second contribution we cast the constrained optimizationproblem as the strategy synthesis problem for EMDPs to maximizethe total expected reward constrained to PCTL properties The op-timization aims to maximize the profits for the system operatorwhile constraints limit the risk of power unbalance and guaranteethe desired QoS for the users We focus on finite-horizon History-dependent Deterministic (HD) strategies ie for each state an op-timal action to take is chosen deterministically based on the entire(finite) execution history of the decision process The limitationto finite-horizon strategies is not restrictive since energy pricingand dispatch decisions are taken on a daily basis As explained inSection 3 History-dependent Random (HR) strategies are in gen-eral more powerful ie they can produce a higher expected re-ward Nevertheless we focus on deterministic strategies becausewe believe that deterministic pricing strategies are easier to adoptin a real-world scenario since they can be better understood bythe system agents (eg household users) Using a classical con-struction [4] we then unfold within the model the finite sequenceof decision epochs over the day by constructing a second EMDPMprimeC in which we replicate the states of the original EMDPMC ateach decision epoch In the strategy synthesis forMprimeC we focus onMarkov (for each state an optimal action is taken based only on thecurrent state) Deterministic (MD) strategies It can be proven [4]that the desired HD strategy for each state s ofMC can be recon-structed by collecting the sequence of optimal MD actions for eachreplica sprime of s along the decision epochs ofMprimeC While the two for-mulations produce the same result we consider the synthesis prob-lem of MD strategies since this formulation allows us to leveragethe results on model-checking of CMDPs described in [11]

As our third contribution we prove that the problem of determin-ing the existence of an MD strategy for EMDPs with total expectedreward higher than a given threshold and constrained to specifica-tions in PCTL is NP-complete This result shows that the problemcomplexity does not increase when introducing uncertainties in thestate-transition probabilities with respect to the strategy synthesisproblem for MDPs which is also NP-complete [13] We then de-velop an algorithm to solve the optimization version of the prob-lem ie maximize the reward The algorithm is the first soundand complete synthesis algorithm for EMDPs that can process ar-bitrary formulas (also with multiple and nested quantitative oper-

ators) for the fragment of PCTL that disallows operators with afinite time bound This technique thus represents an improvementto previously proposed approaches which were not complete [14]or could only process one PCTL operator per time [15] Its keyadvantage is the capability of ranking candidate strategies by thevalue of their reward The first proposed strategy that satisfies thePCTL specification for any resolution of uncertainty is the solu-tion of the synthesis problem Although the algorithm worst-caserunning time is exponential in the size of the EMDP this capabil-ity may allow considerable speed-ups in practical scenarios Theseresults hold also for Interval-MDPs (IMDPs) and for the dual prob-lem of cost minimization (not considered in the paper) Further theproposed algorithm can be applied to a wider range of applicationseg semi-autonomous car driving

The rest of the paper is organized as follows Section 2 givesbackground on CMDPs and PCTL Section 3 presents related workIn Section 4 we describe the proposed algorithm for the synthesisof constrained optimal strategies for EMDPs We then give detailsof the EMDP model used to synthesize energy-pricing strategiesin smart grids with renewable sources in Section 5 and presentexperimental results in Section 6 Lastly we conclude and discussfuture directions of research in Section 7

An extended version of this paper is available at [12]

2 PRELIMINARIESFor a given vector v we will use

1Tv =sumi v[i] =

sumi vi

to represent the sum of all the elements of the vector

Definition 21 A Probability Distribution (PD) over a finite setZ of cardinality n is a vector micro isin Rn satisfying 0 le micro le 1 and1Tmicro = 1 The element micro[i] represents the probability of realizationof event zi We call Dist(Z) the set of distributions over Z

21 Convex Markov Decision Process (CMDP)

Definition 22 CMDP A labeled finite CMDPMC is a tupleMC = (S S0 AΩF AX L) where S is a finite set of statesof cardinality N = |S| S0 is the set of initial states A is a finiteset of actions (M = |A|) Ω is a finite set of atomic propositionsF is a finite set of convex sets of transition PDs A S rarr 2A is afunction that maps each state to the set of actions available at thatstate X = S timesArarr F is a function that associates to state s andaction a the corresponding convex set Fas isin F of transition PDsand L S rarr 2Ω is a labeling function

The set Fas = Distas(S) represents the uncertainty in defining atransition distribution forMC given state s and action a We callfas isin Fas an observation of this uncertainty Also fas isin RN and wecollect the vectors fas foralls isin S into an observed transition matrixF a isin RNtimesN Abusing terminology we call Fa the uncertainty setof the transition matrices and F a isin Fa Fas is interpreted as therow of Fa corresponding to state s Finally fasisj = fasi [j] is theobserved transition probability from si to sj under action a Thedata-type of a isin A(si) can be different from the one of b isin A(sj)if si 6= sj We will use this property in the case study presented inSection 5

To model uncertainty in state transitions we make the followingassumptions

Assumption 21 Rectangular Uncertainty Fa can be factoredas the Cartesian product of its rows ie its rows are uncorrelatedFormally for every a isin A Fa = Fas0 times middot middot middot times F

asNminus1

This assumption will allow us to consider one state transition at

a time and it eases the development of the synthesis algorithm

Assumption 22 CMDP Semantics CMDPs model nondetermin-istic choices made from a convex set of uncountably many choicesEach time a state is visited a transition distribution within the set isadversarially picked and a probabilistic step is taken accordingly(The same semantics is used for IMDPs in [16])

A transition between state s to state sprime in a CMDP occurs in threesteps First an action a isin A(s) is chosen nondeterministicallySecondly an observed PD fas isin Fas is chosen The selection of fasmodels uncertainty in the transition Lastly a successor state sprime ischosen randomly according to the transition PD fas

A path π in MC is a finite or infinite sequence of the form

s0

fa0s0s1minusminusminusrarr s1

fa1s1s2minusminusminusrarr middot middot middot with faisisi+1

gt 0 foralli ge 0 si isin Sai isin A(si) We indicate with Πfin (Πinf ) the set of all finite (in-finite) paths ofMC πs[i] (πa[i]) is the ith state (selected action)along the path and for finite paths last(π) is the last state visitedin π isin Πfin Πs = π | π[0] = s is the set of paths starting in s

The algorithm presented in Section 4 can be applied both to theinterval and ellipsoidal models of uncertainty [11] (and to a mix ofthem) but in this paper we will focus on the latter one since it ismore suitable for the application analyzed in Section 5 This modelis a second-order approximation of the likelihood model of uncer-tainty (likelihood models are often used when transition probabil-ities are determined experimentally) and it is more accurate thana linear one [17] The transition frequencies associated to actiona isin A are collected in matrix Ha Uncertainty in each row of Ha

can be described by the likelihood region gas

gas = fas isin RN|sumsprime h

assprime log(fassprime)geβ

as (1)

where βas lt βasmax =sumsprime h

assprime log(hassprime) represents the uncer-

tainty level The second-order approximation of gas is [17]

gas asymp fas isin RN|sumsprime

(fassprimeminushassprime)

2

hassprime

le(Kas )2 (2)

withKas = 2(βasmaxminusβas ) ge 0 representing the uncertainty levelWe then write the approximation of gas in conic form and intersectit with the probability simplex to obtain the uncertainty set

Fas = fas isin RN | fas ge 01T fas = 1 (3)Eas (fas minus has) 2 le 1 Eas 0

with Eas = (Kas )minus1 times diag((hass0)minus05 middot middot middot (hassN )minus05

) 0

positive definite Note that the conic representation is crucial toenhance the scalability of the algorithmic approach presented inSection 4 because convex solvers process conic constraints moreefficiently than generic quadratic constraints

Definition 23 Ellipsoidal-MDP (EMDP) An EMDP is a CMDPwhere every uncertainty set Fas isin F is expressed using Set (3)

We determine the size R of the CMDP MC as follows MChas N states O(M) actions per state and O(N2) transitions foreach action Let Da

s denote the number of constraints required toexpress the rectangular uncertainty set Fas and D = max

sisinSaisinADas

The overall size ofMC is thusR = O(N2M +NMD)To illustrate our results we will use the EMDPMC in Fig 1

with S = s0 middot middot middot s3 S0 = s0 A = a b Ω = ϑ absA s0 s1 rarr a b s2 s3 rarr a L s0 s2 rarr ϑ s3 rarr abs The parameters of the uncertainty ellipsoids are shownnext to each transition

211 Strategies and NatureTo analyze quantitative properties we need a probability space

over infinite paths [18] However a probability space can onlybe constructed once nondeterminism and uncertainty have been re-solved We call each possible resolution of nondeterminism a strat-egy which chooses an action in each state ofMC

s013rs=113ϑ13

s113rs=313empty13

s313rs=013

abs13

s213rs=113ϑ13

b13

a 13

rb= 613K=0113

ra= 913K=0213

h02b = 025

h01b = 075

h02a = 035

h03a = 065

h23a =1

h33a =1

h12b =1

h12a = 08

h11a = 02

b 13

a13

a13a13

Fig 1 Example EMDP

Definition 24 Strategy A randomized strategy forMC is a func-tion σ = Πfin times A rarr [0 1] with

sumA(last(π)) σ(π a) = 1 and

a isin A(last(π)) if σ(π a) gt 0 We call Σ the set of all strategiesσ ofMC

Conversely we call a nature [17] each possible resolution of un-certainty ie a nature chooses a transition PD for each state andaction ofMC

Definition 25 Nature Given action a isin A a randomizednature is the function ηa Πfin times Dist(S) rarr [0 1] withintFalast(π)

ηa(π fas ) = 1 and fas isin Falast(π) if ηa(π fas ) gt 0 We

call Nat the set of all natures ηa ofMC

A strategy σ (nature ηa) is Markovian (M) if it depends only onlast(π) Also σ (ηa) is deterministic (D) if σ(π a) = 1 for somea isin A(last(π)) (ηa(π fas ) = 1 for some fas isin Falast(π)) Asexplained in Section 1 we developed a synthesis algorithm for MDstrategies There are in total I =|As0 | times middot middot middot times |AsN |= O(MN )

MD strategies collected in the set ΣMD After fixing a strategy σ all the non-determinism inMC is re-

solved For MD strategies we obtain the induced Convex MarkovChain (CMC) Mσ

C = (S S0ΩF X L) MσC has still size R

since the state space S does not change for MD strategies Furtherthe only convex set Fas isin F of transition PDs available at eachstate s isin S is the one corresponding to the action a isin A(s) suchthat σ(s a) = 1

212 RewardsRewards allow modeling additional quantitative measures of a

CMDP eg profit We associate rewards to states and to actionsavailable in each state

Definition 26 Reward Structure and Path Reward A re-ward structure for a CMDP MC is a tuple r = (rs ra)comprising a state (action) reward function rs S rarr Rge0

(ra S timesArarr Rge0) Given a (possibly infinite) path πwith horizon T isin N cup +infin the path reward for π isrewr(π T ) =

sumTt=0 rs(πs[t]) + ra(πs[t] πa[t])

In this paper we will rank the available MD strategies for theCMDP based on their total expected reward

Definition 27 Total Expected Reward The total expected rewardfor state s isin S under strategy σ isin ΣMD is defined as

Wσs = min

ηaisinNatEση

a

[rewr(π T )] (4)

where we minimize across the action range ηa isin Nat of the ad-versarial nature the expected reward over all paths starting from swith horizon T visited under strategy σ

We will only consider CMDPs such that Wσs exists and it is

finite foralls isin Sforallσ isin ΣMD These include finite and infinite-horizon CMDPs (T isin N cup +infin) with zero-reward absorbingstates (ω = abs) as the ones used in Section 5 An extension toother classes of CMDPs eg with negative rewards is possible insome specific cases but outside the scope of the paper For moredetails see [4] Finally according to Assumption 22 we can sub-stitute ηa isin Nat with fas isin Fas and only consider MD natures

22 Probabilistic Computation Tree LogicWe use Probabilistic Computation Tree Logic (PCTL) a proba-

bilistic logic derived from CTL which includes a probabilistic op-erator P [2] and a reward operator R [19] to express properties ofCMDPs [3] The syntax of the fragment of the logic in which oper-ators with a finite time bound are disallowed is defined as follows

φ = True | ω | notφ | φ1 and φ2 | P1p [ψ] | Rr1v[ρ] state formulasψ = X φ | φ1 U φ2 path formulasρ = C φ rewards

with ω isin Ω an atomic proposition 1isin le ltge gt p isin [0 1]v isin Rge0

Path formulas use the Next (X ) and Unbounded Until (U ) op-erators They are evaluated over paths and only allowed as param-eters to the P operator Reward formulas use the Cumulative (C )operator The sizeQ of a PCTL formula is defined as the number ofBoolean connectives plus the number of P and R operators in the

formula We define Pσηa

s [ψ]def= Prob

(π isin Πσηa

s | π |= ψ)

the probability of taking a path π isin Πs that satisfies ψ under strat-egy σ and nature ηa Pσmaxs [ψ] (Pσmins [ψ]) denotes the maxi-mum (minimum) probabilityPση

a

s [ψ] across all natures ηa isin Natfor a fixed strategy σ An analogous definition holds also for re-ward properties which can be expressed also using (multiple) re-ward structures different from the one used to maximize the totalexpected reward For a CMDP MC strategy σ and property φwe will useMC σ |=Nat φ to denote that when starting from anyinitial state s isin S0 and operating under σMC satisfies φ for anyηa isin Nat The semantics of the logic is reported in Table 1 wherewe write |= instead of σ |=Nat for brevity

3 RELATED WORKRelated work falls into two main categories renewable-energy

pricing in smart grids and strategy synthesis from PCTL specifica-tions for probabilistic systems

The integration of renewable energy sources in power grids hasmotivated the development of stochastic frameworks to solve theenergy-pricing problem The work in [20] presents a risk-limitingoptimization framework That effort focuses on modeling the un-certainty in energy availability on the supply side but it does not

Table 1 PCTL semantics forMC s |= Trues |= ω iff ω isin L(s)s |= notφ iff s 6|= φs |= φ1 and φ2 iff s |= φ1 and s |= φ2

s |= Pp(p) [ψ] iff Pσmax(min)s [ψ] p(p)

s |= Rv(v) [C φ] iff Eσmax(min)s [rewr(π tφ)] v(v)

π |= X φ iff π[1] |= φπ |= φ1 U φ2 iff existi ge 0 | π[i] |= φ2 and forallj lt i π[j] |= φ1

rewr(π tφ) = Σtφt=0rs(πs[t]) + ra(πs[t] πa[t])

tφ = mint | πs[t] |= φ

consider the problem of controlling user demand through economicincentives The effectiveness of demand response in balancing sup-ply and demand in power grids was studied in [9] and a stochas-tic framework to optimize operator profits was presented in [10]We closely follow the optimization setup presented in these worksbut we also constrain the operator risk and the user QoS at syn-thesis time Finally Varaiya et al argued the need to quantita-tively constrain the operator risk in [7] The risk-limiting dispatchapproach proposed in that work is optimal for properties of theform Pge1 or Ple0 but sub-optimal for properties with satisfactionthreshold p isin (0 1) Further QoS is not considered

The problem of strategy synthesis for MDPs from PCTL speci-fications was first studied in [13] Strategies are divided into fourcategories depending on 1) whether the transition is chosen deter-ministically (D) or randomly (R) 2) the choice does (does not) de-pend on the sequence of previously visited states (Markovian (M)and History-dependent (H)) Also it is proven that the four types ofstrategies form a strict hierarchy (MD ≺ MR ≺ HD ≺ HR) andthat determining whether it exists an MDMR (HDHR) strategythat meets all specifications is NP-complete (elementary) Kuceraet al [21] show how to synthesize MR controllers that are robustto linear perturbations via a reduction to a formula in the first-order logic of the reals This work is the closest to ours albeitwe consider also non-linear models of uncertainties Also to thebest of our knowledge the algorithm has not been applied to anycase study In [14] and [15] routines for the model-checking ofPCTL properties of MDPs are adapted to the strategy synthesisproblem These algorithms are polynomial in the model size butthey are not complete (ie there might exist a solution even if noneis reported) [14] or can handle properties with only one quanti-tative operator [15] In [22] the synthesis of multi-strategies forMDPs is studied This approach can handle only a subset of PCTLproperties and it only considers MDPs with no uncertainties Fi-nally [23] studies two-player games with winning objectives ex-pressed in PCTL Also our formulation can be interpreted as agame where the controller plays against nature On the other handfollowing the CMDP semantics defined in Assumption 22 we givenature the power of selecting a different strategy at each executionstep while [23] aims to analyze the more complex problem of find-ing the single optimal strategy for nature Our formulation is usefulto model time-varying processes (eg the power generated by thewind) and an optimal strategy for the controller can be synthesizedalgorithmically The existence of an optimal controller and naturein the formulation of [23] is instead undecidable in general

4 CONSTRAINED TOTAL EXPECTED RE-WARD MAXIMIZATION FOR CMDPS

We formally define the optimization problem under analysis provethat its decision problem version is NP-complete and present an al-gorithm to solve the former

Constrained Total Expected Reward Maximization for EMDPsGiven an EMDPMC a reward structure r and a PCTL formulaφ determine strategy σlowast forMC such that

σlowast = argmaxσisinΣMD

φ

WσS0

(5)

where WσS0

is the sum of the expected rewards over all the initialstates s isin S0 and ΣMD

φ is the set of Markov-Deterministic strate-gies for whichMC satisfies φ for any ηa isin Nat starting from anys isin S0 and operating under σ isin ΣMD

φ The same results hold for the dual problem of minimizing the

EMDP total expected cost after substituting all ldquomaxrdquo (ldquominrdquo) op-erators with ldquominrdquo (ldquomaxrdquo) in the derivations below

We will use the following lemmas (the proofs are available in thereferences)

Opmizaon13 Engine13 (strategy13 ranking)13

Feasible13 Candidate13

op0mal13 solu0on13 NO13 addi0onal13 constraint13

Op0mal13 strategy13 that13 sa0sfies13 proper0es13

No13 strategy13 sa0sfies13 proper0es13

Verificaon13 Engine13 (verifies13 if13 properes13 are13 sasfied)13

Unfeasible13

YES13

Fig 2 Lazy algorithm for the constrained optimization

Lemma 41 Complexity of PCTL model-checking forCMDPs [11] Verifying that a CMDP MC satisfies a PCTL for-mula φ is solvable in polynomial time

Lemma 42 Computation of the total expected reward for Con-vex Markov Chains (CMCs) [17] Given a CMC its total ex-pected reward is computable in polynomial time

Lemma 43 Complexity of PCTL strategy synthesis forMDPs [13] The problem of determining the existence of an MDstrategy σ for an MDPM such thatM σ |= φ is NP-complete

We now prove that the decision problem version of Problem (5)is NP-complete

Theorem 41 The problem of determining the existence of an MDstrategy σ for an EMDP MC with total expected reward Wσ

S0

larger or equal to WT and satisfying a specification φ in PCTLis NP-complete

Proof Given a candidate solution σc we can in polynomial time1) check whether MC σc |=Nat φ by Lemma 41 2) computeWσcS0

on the induced CMCMσcC by Lemma 42 σc is a solution if

and only if check 1) passes and WσcS0ge WT This proves that the

problem is in NP To prove NP-hardness we reduce the problem inLemma 43 to the one under analysis We set WT = 0 and de-scribe transition probabilities with point ellipses ie ellipses withnull axes (Kas rarr +infin foralls isin S foralla isin A(s) in Equation (2))

In the rest of the section we describe an algorithm to solve Prob-lem (5) We use a lazy approach based on strategy-iteration con-ceptually similar to the ones proposed in [24] and for non-linearconstraints [25] As shown in Fig 2 the algorithm is split into twomain routines communicating in a loop At each iteration the opti-mization engine (OE) is responsible to generate a candidate strategyσc σc maximizes the total expected reward but it does not neces-sarily satisfy PCTL property φ since the OE formulation does notcontain any information about φ The candidate solution is thenpassed to the verification engine (VE) which checks whetherMCsatisfies φ for all resolutions of uncertainty ie forallηa isin Nat whenoperating under σc If the check passes σlowast = σc and the algo-rithm terminates Otherwise the VE generates an additional con-straint for the OE to prevent the previous σc to be selected againand the loop repeats The novelty of our approach is devising amathematical formulation for the OE capable of generating at eachiteration the candidate strategy that maximizes the total expectedreward among those still available (at each iteration in which theVE reports a failure the candidate strategy is discarded) The firstcandidate strategy that satisfies the PCTL specification becomes the

solution of the synthesis problem Such a strategy is feasible (sat-isfies φ) and it has the maximum total reward among strategies thatsatisfy φ so it solves Problem (5) exactly

In general there is not guarantee on the uniqueness of the op-timal strategy σlowast ie multiple strategies with the same expectedreward might exist each satisfying φ Since every such strategyis equivalent from the user perspective the algorithm just reportsthe first one found Alternatively all equivalent strategies could begenerated by continuing the iteration between the OE and the VEuntil finally a reduction in the expected reward gets detected andby reporting all synthesized strategies satisfying φ

The next subsections give details on the OE and VE and analyzethe algorithm properties

41 Optimization EngineWe start with the classical linear-programming (LP) formulation

to maximize the total expected reward for MDPs [4]

minxl

sumsisinS0

xs

st xs minus las = ras + xTfas foralls isin Sforalla isin A(s) (6)xs l

as ge 0 foralls isin Sforalla isin A(s)

Vector x collects the total expected reward for each state s isin S (atthe end of the optimization Wσc

s = xs foralls isin S) and the cost func-tion sums the total expected rewards for all the initial states s isin S0We then set Wσc

S0=sumsisinS0

xs Variables las are slack variablesfor each constraint Also ras = rs(s) + ra(s a) foralls isin Sforalla isin ASince the slack variables have negative sign the slack can only benegative ie the left-hand side (LHS) can only be larger or equalthan the right-HS (RHS) The ldquominrdquo operator makes sure that foreach state the constraint with the highest RHS has null slack ielas = 0 The optimal strategy can then be reconstructed by selectingthe action a isin A(s) foralls isin S corresponding to the constraint withnull slack eg σc(s0 a) = 1 if las0 = 0 Our goal is to modifysuch a formulation to allow a sub-optimal solution to be selected incase the optimal solution does not satisfy the PCTL specificationWe now describe an equivalent formulation to the original prob-lem that is more suitable to achieve this goal We will describe inSection 42 how to add constraints to this formulation to select sub-optimal solutions in order of optimality We refer to Problem (7)

maxxzln

sumsisinS0

xs

st xs minus las + nas = ras + xTfas foralls isin Sforalla isin A(s) (7a)las le Bzas nas le Bzas foralls isin Sforalla isin A(s) (7b)

zTs 1 = Ms minus 1 foralls isin S (7c)xs l

as n

as ge 0 zas isin 0 1foralls isin Sforalla isin A(s)

We associate a binary variable zas to each action for every stateso the problem becomes a Mixed-Integer Linear Program (MILP)zajsi = 0 if action aj is chosen for state si and Constraint (7c) guar-

antees that only one action can be selected for each state whereMs = |A(s)| For example σc(s0 a) = 1 if zas0 = 0 We then as-sociate to each constraint a second slack variable nas with sign op-posite to las For selected actions zaisi = 0 Constraint (7b) makessure that zajsi = 0 implies lajsi = 0 and najsi = 0 so that the cor-responding Constraint (7a) sets the value of xs1 For unselectedactions zaksi = 1 variable laksi gt 0 (naksi gt 0) if selecting ac-tion ak had resulted in a lower (higher) value of xsi With these

1 B is a big number with respect to the problem data It needs to behigher than the reward computed for any state ie B ge xs foralls isinS but not excessively high to avoid convergence problems in theoptimizer when solving Problem (7) A suitable value of B can befound by trial and error

constraints any action can be selected We finally change the op-timization operator to ldquomaxrdquo so that at the first iteration of thealgorithm the total expected reward gets maximized

We now proceed to consider uncertainties in the transition prob-abilities Constraint (7a) gets updated to Constraint (8) since theadversarial nature tries to minimize the expected reward The newconstraint can be made linear again for an arbitrary uncertaintymodel by replacing it with a set of constraints one for each pointin Fas However this approach results in infinite constraints if theset Fas contains infinitely many points as in the cases consideredin the paper thus making the problem not solvable We solve thisdifficulty for the ellipsoidal uncertainty model using duality InConstraint (9) we replace the primal inner problem with its dualforalls isin S a isin A(s)

xs minus las + nas = ras + minfas isinFas

xTfas (8)

dArrxs minus las + nas = ras + max

λasisinDas (x)

g (λas ) (9)

Vector Lagrange multiplierλas =

[λa1s λ

a2sλ

a3s

]Dual cost functiong (λas ) = λa1s minus λa2s minus ha

sTEasλ

a3s

Dual feasibility setDas (x) = λas | λa3s2 le λa2s xminus λa1s1 + Eas

Tλa3s = 0

The dual problem is convex by construction [26] and has size poly-nomial in the size R of the EMDP [11] Since also the primalproblem is convex strong duality holds ie the primal and dualoptimal solutions coincide because the primal problem satisfiesSlaterrsquos condition [26] for any non-trivial uncertainty set Fas Anydual solution underestimates the primal solution When substitut-ing the primal problem with the dual in Constraint (9) we can dropthe inner optimization operator because the outer optimization op-erator will nevertheless aim to find the least underestimate to max-imize its cost function We get the full formulation for the OE (thequantifiers foralls isin Sforalla isin A(s) are equal to Problem (7))

maxxλlnz

sumsisinS0

xs

st xs minus las + nas = ras + g (λas )

las le Bzas nas le Bzas (10)

zTs 1 = Ms minus 1

xs las n

as ge 0λas isin Das (x) zas isin 0 1

For the ellipsoidal model Problem (10) can be written as Prob-lem (11) which is a Mixed-Integer Quadratic-Constrained Program(MIQCP) that we solve using the back-end optimizer Gurobi [27]

maxxλzln

sumsisinS0

xs

st xs minus las + nas = ras + λa1s minus λa2s minus hasTEasλ

a3s

las le Bzas nas le Bzas

zTs 1 = Ms minus 1 (11)

xs las n

as λ

a2s ge 0 λa3s ge 0 zas isin 0 1

λa3s2 le λa2s xminus λa1s1 + EasTλa3s = 0

42 Verification EngineAfter fixing a candidate strategy σc all the non-determinism inMC is resolved The VE has thus the task of checking whetherthe induced Ellipsoidal-MC (EMC) Mσc

C = (S S0ΩF X L)satisfies the PCTL formula φ for all resolutions of uncertainty ieforallηa isin Nat We use the sound and complete verification algo-rithm presented in [11 12] (we remember that EMCs are a special

case of CMDPs) If the EMC satisfies φ then the optimal strat-egy has been found and σlowast = σc Otherwise the VE needs togenerate an additional constraint to be passed to the OE so thatthe same candidate solution does not get selected anymore If vec-tor zc = [zas0 middot middot middot z

asN ] collects all the binary decision variables that

were set to zero in the previous round of optimization ie the vari-ables corresponding to the previously selected actions we just needto add constraint

zTc 1 ge 1 (12)

so that it is not possible to select the same set of actions againAs an example we optimize the total expected reward of the

EMDP in Fig 1 subject to property φ = Pge08[ϑ U abs] The firstiteration of the OE generates strategy σc1 which selects actions[b a a a] for states [s0 middot middot middot s3] with Wσc1

s0 = 10625 The VEthough reports Pσc1mins0 [ϑ U abs] = 0207 so the strategy is re-jected The VE adds the constraint zbs0 + zas1 + zas2 + zas3 = 1 tothe OE formulation which generates at the second generation σc2([a a a a]) with Wσc2

s0 = 10188 The VE computes the valuePσc2mins0 [ϑ U abs] = 1 and the algorithm terminates reportingσlowast = σc2

43 Algorithm AnalysisWe prove soundness (if a strategy σlowast is returned it indeed opti-

mally solves Problem (5)) completeness (if no solution is returnedno strategy σ isin ΣMD exists such thatMC σ |=Nat φ) and ana-lyze the runtime performance of the proposed algorithm

Theorem 42 The algorithm presented in this section to solveProblem (5) is sound complete and has runtime exponential in thesize R of the EMDP and polynomial in the size Q of the PCTLspecification φ

Proof Problem (10) returns the MD strategy σc that maximizesWσS0

among those still available By Lemma 41 the VE is soundso if it returns MC σc |=Nat φ indeed σlowast = σc (exit arrow atthe bottom of Fig 2) The VE is also complete so if it returnsMC σc 6|=Nat φ the current σc can be discarded This is doneby generating a constraint of the form of Constraint (12) whichremoves only the current σc from the strategies to be explored bythe OE This proves soundness of the overall algorithm

Failure of finding a solution is declared only by the OE whenProblem (10) becomes unfeasible because all available strategieshave previously been discarded by the VE (exit arrow at the top ofFigure 2) This proves completeness

Finally the algorithm goes at most through I = O(MN ) itera-tions (from Section 2 I is the total number of MD strategies M isthe number of actions and N the number of states of the EMDP)Each requires solving an instance of Problem (10) and a verificationcheck (done in time polynomial inR andQ by Lemma 41) Prob-lem (10) can be solved by branch-and-bound algorithms in time ex-ponential in the number of binary variables (whose number isMNand remains constant across iterations) and polynomial in the num-ber of constraints (whose number is polynomial in R at the firstiteration and it grows at each iteration limited by I = O(MN ))The total complexity isO(MNtimes

(2MNtimespoly(MN )+poly(Q))

)

exponential inR and polynomial inQ

The algorithm performs better on problems that do have a fea-sible solution arguably the most interesting ones while the opti-mization step could be removed if the goal is to prove unfeasibilityAs an alternative σlowast could be determined by testing all I avail-able MD strategies and selecting the one with the highest rewardamong those satisfying φ [13] We believe (and experimentallyshow in Section 6) that our approach can achieve better runningtime by decoupling the problem into an optimization and a veri-fication part and by testing strategies in order of optimality Fi-

Real-time price13

Day-ahead price13

Network Operator13Energy dispatch and pricing13

Base-line13 Wind13 Fast-start13

Traditional13Users13

Opportunistic13Users13

Day-ahead13dispatch13

Real-time dispatch13

Uncertain demand13

Uncertain supply13

Supply13

Demand13

Fig 3 Network operator inputs (dashed) and outputs (solid)

nally speed-ups can be obtained by implementing online routinesfor integer-constraint simplification and to produce more succinctcertificates of unfeasibility from the VE [2425] These routines areoutside the scope of this paper and will be covered in future work

5 SUPPLY SCHEDULING AND ENERGYPRICING WITH RENEWABLE SOURCES

We model the pricing and dispatch problem following the sce-nario sketched in Fig 3 [7 10 28] More details are availablein [12]

Three agents operate in the system the network operator andtwo types of users traditional and opportunistic Further three en-ergy sources are available non-dispatchable wind and two dis-patchable sources baseline and fast-start generators We will usestochastic variables WDt Do to represent wind energy supplyand traditional and opportunistic user demand respectively

The network operator takes two kinds of decisions 1) dispatchof non-renewable sources to guarantee that the aggregate energysupply matches the demand 2) pricing of energy to maximizeprofits and incentivize users to join or leave the network depend-ing on energy availability Traditional and opportunistic users re-act to pricing decisions on different timescales Traditional usersonly react to day-ahead pricing to decide how much energy theyare willing to purchase while opportunistic users are capable ofrescheduling in real-time their energy demand depending on theenergy price in exchange of lower expected prices

A 24-hour period gets divided into T1-slots of equal length (egT1 = 1h) and each T1-slot into K T2-slots (eg T2 = 30minK = 2) [10] The operator maximizes its economic profit by tak-ing decisions on the two time scales On day-ahead for each T1-slot it dispatches Q units of baseline energy (unit cost c1) withq = QK units per T2-slot and sets the price for traditional users(u) so that they can decide one day ahead when to schedule theirdemand in the following day The choice of u sets the expected de-mand of traditional users (E[Dt]) In real-time for each T2-slot theoperator first observes the values of traditional-user demand dt andwind availabilityw It then sets the price for opportunistic users (v)to determine the expected demand of opportunistic users (E[Do])Third it dispatches the production of more fast-start energy (unitcost c2) or cancels part of the already dispatched baseline energy(paying only unit cost cp instead of c1) depending on wind avail-ability and user demand to balance supply and demand In realscenarios cp lt c1 lt c2 and wind-energy is assumed to be free forbrevity [10] The operator thus tries to use as much wind energyas possible and to dispatch on day-ahead the exact amount of base-line energy not to incur in real-time in cancellation costs or in theextra cost for fast-start supplies Since more profitable strategiesmight imply a higher reliance on the uncertain wind energy or anincrease in energy prices correct system functionality needs limitson energy-unbalance risks and QoS guarantees for users

There are three sources of stochastic behavior (i) wind-energysupply (W ) (ii) traditional (Dt) user demand (iii) opportunistic(Do) user demand We thus use a stochastic optimization frame-work The result of the optimization should return optimal strate-

wα 0 0

w1αdt1αdo1γ

w1αdt1αdo1β

k=013 k=113 k=213

v1a

v1a

v1av1

b

v1b

Qa ua

Qb ub

Qa ub

w1β 0 0

s0

0 0 0

w1α 0 0w1αdt1α 0

w1β dt1β 0w1β dt1α 0

w1αdt1β 0

w1αdt1αdo1α

w1αdt1αdo1δv1b

w2β 0 0

w2α 0 0

wαdtαdoγ

wαdtαdoβva

va

vavb

Qa ua

Qb ub

Qa ub

wβ 0 0

s0

0 0 0

wαdtα 0

wβ dtβ 0wβ dtα 0wαdtβ 0

wαdtαdoα

wαdtαdoδvb

vb

ab uQ

ab uQ

Fig 4 Sketch ofME (top) andMprimeE (bottom) The latter isME

unfolded across decision epochs For each EMDP the initial states0 is shown on the left Each state is represented by the tuple(w dt do) of observations of WDt Do The pairs (Q u) rep-resent the day-ahead decisions about dispatch of baseline energyQand energy pricing for traditional users u Two arrows per decisiondepart from each state because in this figure we assumed only twodiscretization levels for each quantity (labeled with greek lettersα β middot middot middot ) We only show the state graph in the EMDPs related todecision (Qa ua) but similar state graphs are present also for theother decision pairs (Qa ub) (Qb ua) (Qb ub) as hinted by thedashed arrows departing from s0

gies about 1) the day-ahead decisions (Q and u) and 2) the real-time decisions (v) for each possible observation of W and Dt Inreal-time the actual decision (vobs) will be taken deterministicallyamong the synthesized ones based on the observed values w anddt ie the actual wind availability and traditional-user demandIn summary we will optimize over one T1-slot (the decision prob-lem is periodic so we can run one optimization for each T1-slotstand-alone) and aim to determine optimal values for Q u and v

We use the Ellipsoidal-MDPME = (S S0 AΩF AX L)sketched in Fig 4 (top) All quantities are bounded and uniformlydiscretized to keep the state and action spaces finite States s isin Sare a tuple s = (w dt do) where w dt do refer to the observedvalues of available wind energy and user demand in that state Weconsider only one T1-slot per time so we model only one choiceof day-ahead energy dispatch Q and pricing for traditional usersu (at the initial state s0 (Q u) isin A) The process then transi-tions through K decision epochs as follows First values of windenergy (w) and traditional user demand (dt) are stochastically cho-sen according to the corresponding distributions (described below)Second for each observation of W and Dt a decision on v isin A ismade and the opportunistic user demand (do) is stochastically cho-sen To transition between epochs a new value of wind energy w(the backward arrow in Fig 4 (top)) is chosen and the steps repeat

In general the optimal decision v at each epoch depends onprevious observations of wind availability (w) and user demand(dt do) so v is history-dependent To synthesize control strategiesusing the algorithm in Section 4 we unfold the sequence of deci-sion epochs in the EMDPMprimeE = (Sprime S0 A

primeΩF primeAprimeX prime Lprime)in Fig 4 (bottom) InMprimeE we have explicitly marked each quan-tity with an additional subscript k = 1 middot middot middot K to refer to the cor-responding T2-slot Each state s isin S ofME (apart from the initialstate) has been replicated K times in MprimeE s rarr s1 s2 middot middot middot sK After K decision epochs the states transition to an absorbing state(not shown in Fig 4) Since now all decision epochs are explicitly

codified in the model Markov strategies are optimal forMprimeE Thecorresponding optimal history-dependent strategy forME can bereconstructed for each s isin S by collecting the sequence of optimaldecisions returned by the algorithm in the replicas s1 s2 middot middot middot sK In the following we will only considerMprimeE As an additional ad-vantage different wind distributions can be used to transition be-tween different epochs inMprimeE (only one distribution could be usedinME) to account for the time-varying nature of wind availabilityThe modeling expressivity is thus increased

State transition probabilities are computed using the followingstochastic models

User demand The demand of both traditional (Dt) and oppor-tunistic (Do) users is modeled using Gaussian distributions [10]with Dt sim N (αtu

γt βtE[Dt]) and Do sim N (αovγo βoE[Do])

Parameter γt lt 0 (γo lt 0) is the elasticity of the traditional (op-portunistic) users ie the ratio of the percentage change of theexpected demand to that of price variations formally defined asγt = uE[Dt] middot partE[Dt]partu Parameters αt αo βt βo are fitting pa-rameters To compute transition probabilities we truncate and dis-cretize the continuous PDs in equally-sized intervals pick the mid-dle point of each interval as the discretization value and then inte-grate the PD across the interval to determine how likely the systemtransitions to that discretized value2

Wind-energy availability We created a stochastic model of theavailable wind energy starting from measured data collected fromthe wind farm at Lake Benton Minnesota USA [29] The goal is totake forecast values into account while also considering the intrin-sic inaccuracies of these predictions First we compute the (dis-crete) empirical PD microW of a training set of collected wind-energydata Second we divide a new set of data in T2-slots and considerthe average value for each new T2-slot as the forecast energy valueWe then scale microW to have such expected value E[Wk] thus obtain-ing microWk Finally we compute ellipsoidal Sets (3) F primeas (collectedinF prime) to represent uncertainty in the transition probability betweentwo discretized energy levels in two consecutive T2-slots Transi-tion frequencies are computed by counting observed transitions in atraining set of data Using classical results from statistics [17] wethen compute the value of parameter β from Set (1) correspond-ing to the confidence level CL in the measurements In particular0 le CL le 1 and CL = 1minus cdfχ2

d(2 lowast (βmax minus β)) where cdfχ2

d

is the cumulative density function of the Chi-squared distributionwith d degrees of freedom (d is equal to the number of bins used todiscretize W ) Moreover an increasing value of parameter β canbe used in the sequence of decision epochs to model the fact thatforecast farther-away in time are less accurate

We provide the states with thick circles in Fig 4 (bottom) withthree reward structures to express the profit and risk of the oper-ator and the QoS for the users We choose those states becausethe quantities Dtk DokWk are all fully observable in them thusallowing the computation of the rewards We set

rProfitsk [$] = udtk+vkdokminus(cp∆k+c1(qminus∆k))1A∆kge0

minus(c1qminusc2∆k)1B∆klt0 (13a)

rLoLsk [MWh] = max(0 XE[Wk] + Y E[Dtk +Dok]minus∆k)(13b)

rQualitysk [MWh] = dtk + dok (13c)

with ∆k = wk + q minus dtk minus dok representing the surplus ofsupply on demand X and Y defined in the following and 1 theindicator function Reward (13a) subtracts operating costs to the

2For simplicity we assume that the distributions of Dt and Doare known with certainty In fact these distributions are usuallyestimated more accurately than the wind forecast by analyzing thehistory of measured data [10] Adding uncertainty models also forthese distributions is left as future work

operator revenue to compute the net profit Indicator 1A (1B) cor-responds to the scenario when the sum of day-ahead dispatchedand wind energy is sufficient (insufficient) to cover the demandIn the latter case fast-start energy needs to be dispatched in real-time Reward (13b) computes the Loss of Load (LoL) In prac-tical scenarios the amount of fast-start energy available in real-time is limited Often this limit is computed with the formulaFS le XE[W ] + Y E[Dt +Do] (eg X = 3 Y = 10) [30]If ∆ + FS lt 0 the network incurs in a LoL with potentiallyrisky consequences Reward (13c) accounts for user demand in-centivized by energy pricing

Finally we mark all states with ∆ +FS lt 0 with the label riskand use label abs for the absorbing state so Ω = risk abs

The optimal strategy σlowast = (ulowast Qlowast vlowastk) 1 le k le K is thesolution of problem

Wlowasts0 = maxQu

minfas isinFprimeas

Eσfas

W EσDt maxv1middotmiddotmiddotvK

EσDorewrProfit(πK)

stMprimeE σlowast |=Nat φ where (14)

φ = RrLoL

leEENSM [C abs] andRrQuality

geQoSm [C abs]

and Pge1minusLoLPM [notrisk U abs]

In Problem (14) we maximize the expected value of the operatorprofit Wlowasts0 under the worst-case resolution of uncertainty in thewind-energy forecast by summing the instantaneous state rewardsrProfits along the paths π isin Πfin of K steps ofMprimeE correspond-ing to the K T2-slots By replicating the decision process for eachof the K T2-slots while building modelMprimeE every K-step execu-tion path traverses all decision epochs so the computed reward asintroduced in Definition 26 and 27 will account for the sum of thecontributions of each decision epoch

According to the semantics defined in Table 1 the PCTL specifi-cation φ constrains the expected operator risk and user QoS acrossthe decision horizon EENSM is the desired maximum value ofExpected Energy Not Served LoLPM is the maximum allowedvalue of Loss of Load Probability (these two properties limit therisk for the operator) andQoSm is the minimum value of QoS thatneeds to be guaranteed to the users

6 EXPERIMENTAL RESULTSWe implemented the algorithm in Python and interfaced it with

PRISM [31] as a front-end for entering models and with Gurobi [27]as the back-end optimizer Experiments were run on a 24 GHz In-tel Xeon with 32GB of RAM

In this section we present experimental results obtained by solv-ing Problem (14) Our goals are to give insight about the algo-rithm functionality compare its scalability to other strategy synthe-sis approaches and show that the synthesized energy pricing anddispatch strategies can achieve better performances than other so-lutions presented in the literature

We define the following quantities

Profit = Wlowasts0EENS = Rr

LoLσlowastmaxs0 [C abs]

QoS = RrQualityσlowastmins0 [C abs]

1minus LoLP = Pσlowastmin

s0 [notrisk U abs]

where the min and max operators refer to the action range of na-ture Nat As defined in Section 22 these quantities represent thequantitative values of rewards and satisfaction probability that getthen compared to the corresponding thresholds (EENSM QoSmLoLPM ) in Problem (14) to determine the satisfaction of φ Wewill then normalize the Profit to ProfitM the maximum com-puted profit value for each set of experiments We set T1 = 1hT2 = 30min so K = 2 and consider two pricing options bothfor traditional and opportunistic users If not otherwise stated

Fig 5 Performance vs iteration

Fig 6 Profit vs confidence in forecast

we will use CL = 90 and discretize the wind energy W in 5bins and traditional Dt opportunistic Do demands and baselinesupply Q in 2 bins We set QoSm = 80

sumk E[Dtk +Dok]

LoLPM = 10 EENSM = 5sumk E[Wk + q] The other pa-

rameter values were taken from [10]In Fig 5 we show the trend of the expected system performance

as a function of the synthesis algorithm iteration The Profitmonotonically decreases until the proposed candidate strategy σcmeets all specifications We note that 1minusLoLP andQoS (EENSnot shown has a trend similar to 1 minus LoLP ) instead vary non-monotonically Intuitively this is because the Profit can be in-creased either by scheduling less baseline energy Q to reduce can-cellation costs cp but incurring in a higher risk or by increasing the

Table 2 Performance Analysis

W bins 5 10 15 20Profit 1 098 097 0965

1minus LoLP 099 099 099 099EENS 098 098 098 098QoS 101 101 101 101

Runtime 144s 400s 1368s 3289sIter 223 53 547 332N + Tr 1343 2719 4115 5591

MD Strat (I) 4096 42e6 43e9 44e12

005 010 015 020 025 030082084086088090092094096098100

Pro

fit

Pro

fit M

Performance Comparison vs Varaiya et al [7] and He et al [10]

Ours[7][10]

005 010 015 020 025 03000020406081012

EE

NS

EE

NS M

Ours[7][10]

005 010 015 020 025 030Wind Penetration ηW

088090092094096098100102

QoS

QoS

m Ours[7][10]

Fig 7 Comparison to alternative approaches via Monte Carlo sim-ulation

energy price v for opportunistic users with consequent reduction ofQoS (the peaks of 1 minus LoLP are aligned to the valleys of QoS)By ranking strategies by expected Profit our algorithm is capa-ble of selecting the optimal strategy despite the complex parameterinterdependences in the model under analysis

In Table 2 we compare synthesis results while varying the num-ber of discretization bins for W (all values are normalized to thecorresponding specification) First we note that the expected sys-tem performances do not substantially vary by changing the num-ber of bins supporting our choice of 5 bins in the other experi-ments Second runtime results show that the algorithm can handlein reasonable time problems more than 10times larger than the onesanalyzed in [14] the only other algorithm in the literature capableof processing arbitrary PCTL formulas (we useN+Tr the sum ofstates and transitions in the EMDP as a proxy of the model size)Third we note that the alternative approach of verifying all the IMD strategies σ isin ΣMD is not practical due to the exponentialincrease of I (defined in Section 211) with the problem size

In Fig 6 we study the effect of different confidence levels CL inthe wind-energy forecast on the expected operator Profit whilevarying the value of wind penetration ηW =

sumKk=1

E[Wk]E[Wk + q]

and keeping all constraints constant At high CL higher profitscan be expected for increasing ηW (wind energy is assumed free)On the other hand for low CL higher wind penetration createsmore uncertainty thus lowering the expected Profit The networkoperator can use these curves to assess the return of investment inemploying more accurate (and expensive) forecast techniques

Finally in Fig 7 we compare results with two other energy-pricing formulations proposed in the literature He et al [10] solvethe optimization problem without enforcing any constraint Varaiyaet al [7] only put limits on the acceptable LoLP (their approach isnot trivially extendable to reward properties expressed using the Roperator) and solve optimally only for LoLP = 0 Comparison isdone by solving the different optimization problems and then run-ning Monte Carlo simulations (1000 runs) of the controlled systemon test data (different from the training ones) to evaluate its per-formance (Profit EENS and QoS) As expected the uncon-strained strategy from [10] has higher Profit (up to 5) but alsoup to 12 more EENS and 10 less QoS compared to our ap-proach The strategy from [7] guarantees null EENS but it hasup to 6 less Profit (due to over constraining EENS) and 10less QoS (which is left unconstrained)

As a final remark we note that runtime may increase exponen-tially as we tighten the specification thresholds (QoSm LoLPM EENSM ) since it becomes increasingly more difficult to find a

solution within the exponentially-sized search space Neverthelessthe chosen values were tight enough to improve the quality of alter-native energy pricing and dispatch strategies proposed in the litera-ture while maintaining the runtime acceptable for this application

7 CONCLUSIONS AND FUTURE WORKWe proposed the first sound and complete algorithm for the syn-

thesis of control strategies for MDPs satisfying properties expressedin PCTL and robust to uncertainties in the transition probabilitiesWe then applied the algorithm to the problem of energy pricingand dispatch in smart grids with renewables Results showed thatnetwork-operator risks can be effectively constrained at design timeand that more accurate predictions of the expected profit can be ob-tained by taking the uncertainty of wind availability into account

As future work we plan to investigate techniques to generatemore concise constraints to prove failure of the verification step inorder to prune more effectively the search space for the optimiza-tion engine and to apply the proposed strategy synthesis approachto further case studies eg semi-autonomous car driving

Acknowledgments The authors thank J Venkatesh and profT Simunic Rosing for providing the measurement data used in thecase study The research was partially funded by DARPA AwardNumber HR0011-12-2-0016 and by TerraSwarm one of six cen-ters of STARnet a Semiconductor Research Corporation programsponsored by MARCO and DARPA Approved for public releasedistribution is unlimited The content of this paper does not neces-sarily reflect the position or the policy of the US government andno official endorsement should be inferred

8 REFERENCES[1] Courcoubetis C and Yannakakis M ldquoThe Complexity of

Probabilistic Verificationrdquo Journal of ACM vol 42(4) pp857ndash907 1995

[2] H Hansson and B Jonsson ldquoA Logic for Reasoning AboutTime and Reliabilityrdquo Formal Aspects of Computing vol6(5) pp 512ndash535 1994

[3] A Bianco and L de Alfaro ldquoModel Checking ofProbabilistic and Nondeterministic Systemsrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 1995vol 1026 pp 499ndash513

[4] M Puterman Markov Decision Processes DiscreteStochastic Dynamic Programming John Wiley and Sons1994

[5] ldquoInternational Energy Agency World Energy Outlook 2009rdquo[Online] httpwwwworldenergyoutlookorgdocsweo2009WEO2009_es_englishpdf

[6] A Gore An Inconvenient Truth New York Rodale 2006[7] P Varaiya F Wu and J Bialek ldquoSmart Operation of Smart

Grid Risk-Limiting Dispatchrdquo Proceedings of the IEEEvol 99 no 1 pp 40ndash57 2011

[8] K Porter and J Rogers ldquoSurvey of Variable GenerationForecasting in the Westrdquo in NREL Subcontract ReportSR-5500-54457 April 2012 p 56

[9] S Koch et al ldquoModeling and Control of AggregatedHeterogeneous Thermostatically Controlled Loads forAncillary Servicesrdquo Proc of PSCC pp 1ndash7 2011

[10] M He S Murugesan and J Zhang ldquoMultiple TimescaleDispatch and Scheduling for Stochastic Reliability in SmartGrids with Wind Generation Integrationrdquo in Proc of IEEEINFOCOM 2011 pp 461ndash465

[11] A Puggelli et al ldquoPolynomial-Time Verification of PCTLproperties of MDPs with Convex Uncertaintiesrdquo in Proc of

CAV ser LNCS Springer Berlin Heidelberg 2013 vol8044 pp 527ndash542

[12] A Puggelli ldquoFormal Techniques for the Verification andOptimal Control of Probabilistic Systems in the Presence ofModeling Uncertaintiesrdquo PhD dissertation University ofCalifornia Berkeley Berkeley CA 2014

[13] C Baier et al ldquoController Synthesis for ProbabilisticSystemsrdquo in Exploring New Frontiers of TheoreticalInformatics Springer US 2004 vol 155 pp 493ndash506

[14] M Lahijanian et al ldquoTemporal Logic Motion Planning andControl with Probabilistic Satisfaction Guaranteesrdquo IEEETransactions on Robotics vol 28 no 2 pp 396ndash409 2012

[15] M Kwiatkowska and D Parker ldquoAutomated Verificationand Strategy Synthesis for Probabilistic Systemsrdquo in Proc ofATVA ser LNCS vol 8172 Springer 2013 pp 5ndash22

[16] K Sen et al ldquoModel-Checking Markov Chains in thePresence of Uncertaintiesrdquo in Proc of TACAS ser LNCSSpringer Berlin Heidelberg 2006 vol 3920 pp 394ndash410

[17] A Nilim and L El Ghaoui ldquoRobust Control of MarkovDecision Processes with Uncertain Transition MatricesrdquoJournal of Operations Research pp 780ndash798 2005

[18] M Y Vardi ldquoAutomatic Verification of ProbabilisticConcurrent Finite State Programsrdquo in Proc of SFCS 1985pp 327ndash338

[19] V Forejt et al ldquoAutomated Verification Techniques forProbabilistic Systemsrdquo Formal Methods for EternalNetworked Software Systems vol 6659 pp 53ndash113 2011

[20] F Bouffard and F Galiana ldquoStochastic Security forOperations Planning with Significant Wind PowerGenerationrdquo in IEEE Transaction on Power Systems 2008vol 23 no 2 pp 306ndash316

[21] A Kucera and O Stražovskyacute ldquoOn the Controller Synthesisfor Finite-State Markov Decision Processesrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 2005vol 3821 pp 541ndash552

[22] K Draeger et al ldquoPermissive Controller Synthesis forProbabilistic Systemsrdquo in Proc of TACAS ser LNCSSpringer 2014 pp 531ndash546

[23] T Brazdil et al ldquoStochastic Games with Branching-TimeWinning Objectivesrdquo in Proc of IEEE SLCS Aug 2006 pp349ndash358

[24] C Hang et al ldquoSynthesizing Cyber-Physical ArchitecturalModels with Real-Time Constraintsrdquo in Proc of CAVBerlin Heidelberg Springer-Verlag 2011 pp 441ndash456

[25] P Nuzzo et al ldquoCalCS SMT Solving for Non-linearConvex Constraintsrdquo in Proc of FMCAD 2010 pp 71ndash80

[26] S Boyd and L Vandenberghe ldquoConvex OptimizationrdquoCambridge University Press 2004

[27] ldquoGurobi Optimizerrdquo [Online] httpwwwgurobicom[28] S Stoft Power System Economics Designing Markets for

Electricity NewYork Wiley 2002[29] Y Wan ldquoLong-term Wind Power Variabilityrdquo NREL

Technical Report TP-5500-53637 [Online]httpwwwnrelgovdocsfy12osti53637pdf January 2012

[30] E Ela et al ldquoOperating Reserves and Variable GenerationrdquoNREL Technical Report [Online]httpwwwnrelgovdocsfy11osti51978pdf August 2011

[31] M Kwiatkowska et al ldquoPRISM 40 Verification ofProbabilistic Real-Time Systemsrdquo Proc of CAV pp585ndash591 2011

Page 2: Robust Strategy Synthesis for Probabilistic Systems Applied to …sseshia/pubdir/emsoft... · 2014-08-07 · Robust Strategy Synthesis for Probabilistic Systems Applied to Risk-Limiting

work capable of bounding the risk interpreted stochastically at op-timization time Moreover a minimum amount of delivered energyneeds to be guaranteed to the users since otherwise operators couldpotentially increase the energy price to force users out of the systemto obtain power balance at times of little wind generation We re-fer to this guaranteed delivered energy as Quality of Service (QoS)for the users As summarized in the next section our contributiontargets these needs

Paper Contributions Our first contribution is a novel stochasticmodel to capture energy pricing and dispatch strategies for smartgrids with wind energy sources (other renewables can be consid-ered analogously) The model is an Ellipsoidal Markov DecisionProcessMC (EMDP) a special case of the Convex-MDP (CMDP)model first introduced by Puggelli et al [11 12] ie an MDPwhere transition probabilities are only known to lie in ellipsoidalsets While previous works used analytical distributions eg Gaus-sian [10] to model uncertainty in wind availability we use mea-sured data (from the wind farm at Lake Benton Minnesota USA)to train a likelihood model of the wind generation and give quanti-tative means to represent the confidence in the forecast values Wethen approximate the likelihood region with an ellipsoidal modelwhich is more accurate than the linear ones often used in the lit-erature while remaining computationally tractable Our empiricalapproach has the promise of more faithfully representing the prob-ability distribution of the generated energy because it is tailored tothe specific wind farm under analysis and it is robust to forecasterrors

As a second contribution we cast the constrained optimizationproblem as the strategy synthesis problem for EMDPs to maximizethe total expected reward constrained to PCTL properties The op-timization aims to maximize the profits for the system operatorwhile constraints limit the risk of power unbalance and guaranteethe desired QoS for the users We focus on finite-horizon History-dependent Deterministic (HD) strategies ie for each state an op-timal action to take is chosen deterministically based on the entire(finite) execution history of the decision process The limitationto finite-horizon strategies is not restrictive since energy pricingand dispatch decisions are taken on a daily basis As explained inSection 3 History-dependent Random (HR) strategies are in gen-eral more powerful ie they can produce a higher expected re-ward Nevertheless we focus on deterministic strategies becausewe believe that deterministic pricing strategies are easier to adoptin a real-world scenario since they can be better understood bythe system agents (eg household users) Using a classical con-struction [4] we then unfold within the model the finite sequenceof decision epochs over the day by constructing a second EMDPMprimeC in which we replicate the states of the original EMDPMC ateach decision epoch In the strategy synthesis forMprimeC we focus onMarkov (for each state an optimal action is taken based only on thecurrent state) Deterministic (MD) strategies It can be proven [4]that the desired HD strategy for each state s ofMC can be recon-structed by collecting the sequence of optimal MD actions for eachreplica sprime of s along the decision epochs ofMprimeC While the two for-mulations produce the same result we consider the synthesis prob-lem of MD strategies since this formulation allows us to leveragethe results on model-checking of CMDPs described in [11]

As our third contribution we prove that the problem of determin-ing the existence of an MD strategy for EMDPs with total expectedreward higher than a given threshold and constrained to specifica-tions in PCTL is NP-complete This result shows that the problemcomplexity does not increase when introducing uncertainties in thestate-transition probabilities with respect to the strategy synthesisproblem for MDPs which is also NP-complete [13] We then de-velop an algorithm to solve the optimization version of the prob-lem ie maximize the reward The algorithm is the first soundand complete synthesis algorithm for EMDPs that can process ar-bitrary formulas (also with multiple and nested quantitative oper-

ators) for the fragment of PCTL that disallows operators with afinite time bound This technique thus represents an improvementto previously proposed approaches which were not complete [14]or could only process one PCTL operator per time [15] Its keyadvantage is the capability of ranking candidate strategies by thevalue of their reward The first proposed strategy that satisfies thePCTL specification for any resolution of uncertainty is the solu-tion of the synthesis problem Although the algorithm worst-caserunning time is exponential in the size of the EMDP this capabil-ity may allow considerable speed-ups in practical scenarios Theseresults hold also for Interval-MDPs (IMDPs) and for the dual prob-lem of cost minimization (not considered in the paper) Further theproposed algorithm can be applied to a wider range of applicationseg semi-autonomous car driving

The rest of the paper is organized as follows Section 2 givesbackground on CMDPs and PCTL Section 3 presents related workIn Section 4 we describe the proposed algorithm for the synthesisof constrained optimal strategies for EMDPs We then give detailsof the EMDP model used to synthesize energy-pricing strategiesin smart grids with renewable sources in Section 5 and presentexperimental results in Section 6 Lastly we conclude and discussfuture directions of research in Section 7

An extended version of this paper is available at [12]

2 PRELIMINARIESFor a given vector v we will use

1Tv =sumi v[i] =

sumi vi

to represent the sum of all the elements of the vector

Definition 21 A Probability Distribution (PD) over a finite setZ of cardinality n is a vector micro isin Rn satisfying 0 le micro le 1 and1Tmicro = 1 The element micro[i] represents the probability of realizationof event zi We call Dist(Z) the set of distributions over Z

21 Convex Markov Decision Process (CMDP)

Definition 22 CMDP A labeled finite CMDPMC is a tupleMC = (S S0 AΩF AX L) where S is a finite set of statesof cardinality N = |S| S0 is the set of initial states A is a finiteset of actions (M = |A|) Ω is a finite set of atomic propositionsF is a finite set of convex sets of transition PDs A S rarr 2A is afunction that maps each state to the set of actions available at thatstate X = S timesArarr F is a function that associates to state s andaction a the corresponding convex set Fas isin F of transition PDsand L S rarr 2Ω is a labeling function

The set Fas = Distas(S) represents the uncertainty in defining atransition distribution forMC given state s and action a We callfas isin Fas an observation of this uncertainty Also fas isin RN and wecollect the vectors fas foralls isin S into an observed transition matrixF a isin RNtimesN Abusing terminology we call Fa the uncertainty setof the transition matrices and F a isin Fa Fas is interpreted as therow of Fa corresponding to state s Finally fasisj = fasi [j] is theobserved transition probability from si to sj under action a Thedata-type of a isin A(si) can be different from the one of b isin A(sj)if si 6= sj We will use this property in the case study presented inSection 5

To model uncertainty in state transitions we make the followingassumptions

Assumption 21 Rectangular Uncertainty Fa can be factoredas the Cartesian product of its rows ie its rows are uncorrelatedFormally for every a isin A Fa = Fas0 times middot middot middot times F

asNminus1

This assumption will allow us to consider one state transition at

a time and it eases the development of the synthesis algorithm

Assumption 22 CMDP Semantics CMDPs model nondetermin-istic choices made from a convex set of uncountably many choicesEach time a state is visited a transition distribution within the set isadversarially picked and a probabilistic step is taken accordingly(The same semantics is used for IMDPs in [16])

A transition between state s to state sprime in a CMDP occurs in threesteps First an action a isin A(s) is chosen nondeterministicallySecondly an observed PD fas isin Fas is chosen The selection of fasmodels uncertainty in the transition Lastly a successor state sprime ischosen randomly according to the transition PD fas

A path π in MC is a finite or infinite sequence of the form

s0

fa0s0s1minusminusminusrarr s1

fa1s1s2minusminusminusrarr middot middot middot with faisisi+1

gt 0 foralli ge 0 si isin Sai isin A(si) We indicate with Πfin (Πinf ) the set of all finite (in-finite) paths ofMC πs[i] (πa[i]) is the ith state (selected action)along the path and for finite paths last(π) is the last state visitedin π isin Πfin Πs = π | π[0] = s is the set of paths starting in s

The algorithm presented in Section 4 can be applied both to theinterval and ellipsoidal models of uncertainty [11] (and to a mix ofthem) but in this paper we will focus on the latter one since it ismore suitable for the application analyzed in Section 5 This modelis a second-order approximation of the likelihood model of uncer-tainty (likelihood models are often used when transition probabil-ities are determined experimentally) and it is more accurate thana linear one [17] The transition frequencies associated to actiona isin A are collected in matrix Ha Uncertainty in each row of Ha

can be described by the likelihood region gas

gas = fas isin RN|sumsprime h

assprime log(fassprime)geβ

as (1)

where βas lt βasmax =sumsprime h

assprime log(hassprime) represents the uncer-

tainty level The second-order approximation of gas is [17]

gas asymp fas isin RN|sumsprime

(fassprimeminushassprime)

2

hassprime

le(Kas )2 (2)

withKas = 2(βasmaxminusβas ) ge 0 representing the uncertainty levelWe then write the approximation of gas in conic form and intersectit with the probability simplex to obtain the uncertainty set

Fas = fas isin RN | fas ge 01T fas = 1 (3)Eas (fas minus has) 2 le 1 Eas 0

with Eas = (Kas )minus1 times diag((hass0)minus05 middot middot middot (hassN )minus05

) 0

positive definite Note that the conic representation is crucial toenhance the scalability of the algorithmic approach presented inSection 4 because convex solvers process conic constraints moreefficiently than generic quadratic constraints

Definition 23 Ellipsoidal-MDP (EMDP) An EMDP is a CMDPwhere every uncertainty set Fas isin F is expressed using Set (3)

We determine the size R of the CMDP MC as follows MChas N states O(M) actions per state and O(N2) transitions foreach action Let Da

s denote the number of constraints required toexpress the rectangular uncertainty set Fas and D = max

sisinSaisinADas

The overall size ofMC is thusR = O(N2M +NMD)To illustrate our results we will use the EMDPMC in Fig 1

with S = s0 middot middot middot s3 S0 = s0 A = a b Ω = ϑ absA s0 s1 rarr a b s2 s3 rarr a L s0 s2 rarr ϑ s3 rarr abs The parameters of the uncertainty ellipsoids are shownnext to each transition

211 Strategies and NatureTo analyze quantitative properties we need a probability space

over infinite paths [18] However a probability space can onlybe constructed once nondeterminism and uncertainty have been re-solved We call each possible resolution of nondeterminism a strat-egy which chooses an action in each state ofMC

s013rs=113ϑ13

s113rs=313empty13

s313rs=013

abs13

s213rs=113ϑ13

b13

a 13

rb= 613K=0113

ra= 913K=0213

h02b = 025

h01b = 075

h02a = 035

h03a = 065

h23a =1

h33a =1

h12b =1

h12a = 08

h11a = 02

b 13

a13

a13a13

Fig 1 Example EMDP

Definition 24 Strategy A randomized strategy forMC is a func-tion σ = Πfin times A rarr [0 1] with

sumA(last(π)) σ(π a) = 1 and

a isin A(last(π)) if σ(π a) gt 0 We call Σ the set of all strategiesσ ofMC

Conversely we call a nature [17] each possible resolution of un-certainty ie a nature chooses a transition PD for each state andaction ofMC

Definition 25 Nature Given action a isin A a randomizednature is the function ηa Πfin times Dist(S) rarr [0 1] withintFalast(π)

ηa(π fas ) = 1 and fas isin Falast(π) if ηa(π fas ) gt 0 We

call Nat the set of all natures ηa ofMC

A strategy σ (nature ηa) is Markovian (M) if it depends only onlast(π) Also σ (ηa) is deterministic (D) if σ(π a) = 1 for somea isin A(last(π)) (ηa(π fas ) = 1 for some fas isin Falast(π)) Asexplained in Section 1 we developed a synthesis algorithm for MDstrategies There are in total I =|As0 | times middot middot middot times |AsN |= O(MN )

MD strategies collected in the set ΣMD After fixing a strategy σ all the non-determinism inMC is re-

solved For MD strategies we obtain the induced Convex MarkovChain (CMC) Mσ

C = (S S0ΩF X L) MσC has still size R

since the state space S does not change for MD strategies Furtherthe only convex set Fas isin F of transition PDs available at eachstate s isin S is the one corresponding to the action a isin A(s) suchthat σ(s a) = 1

212 RewardsRewards allow modeling additional quantitative measures of a

CMDP eg profit We associate rewards to states and to actionsavailable in each state

Definition 26 Reward Structure and Path Reward A re-ward structure for a CMDP MC is a tuple r = (rs ra)comprising a state (action) reward function rs S rarr Rge0

(ra S timesArarr Rge0) Given a (possibly infinite) path πwith horizon T isin N cup +infin the path reward for π isrewr(π T ) =

sumTt=0 rs(πs[t]) + ra(πs[t] πa[t])

In this paper we will rank the available MD strategies for theCMDP based on their total expected reward

Definition 27 Total Expected Reward The total expected rewardfor state s isin S under strategy σ isin ΣMD is defined as

Wσs = min

ηaisinNatEση

a

[rewr(π T )] (4)

where we minimize across the action range ηa isin Nat of the ad-versarial nature the expected reward over all paths starting from swith horizon T visited under strategy σ

We will only consider CMDPs such that Wσs exists and it is

finite foralls isin Sforallσ isin ΣMD These include finite and infinite-horizon CMDPs (T isin N cup +infin) with zero-reward absorbingstates (ω = abs) as the ones used in Section 5 An extension toother classes of CMDPs eg with negative rewards is possible insome specific cases but outside the scope of the paper For moredetails see [4] Finally according to Assumption 22 we can sub-stitute ηa isin Nat with fas isin Fas and only consider MD natures

22 Probabilistic Computation Tree LogicWe use Probabilistic Computation Tree Logic (PCTL) a proba-

bilistic logic derived from CTL which includes a probabilistic op-erator P [2] and a reward operator R [19] to express properties ofCMDPs [3] The syntax of the fragment of the logic in which oper-ators with a finite time bound are disallowed is defined as follows

φ = True | ω | notφ | φ1 and φ2 | P1p [ψ] | Rr1v[ρ] state formulasψ = X φ | φ1 U φ2 path formulasρ = C φ rewards

with ω isin Ω an atomic proposition 1isin le ltge gt p isin [0 1]v isin Rge0

Path formulas use the Next (X ) and Unbounded Until (U ) op-erators They are evaluated over paths and only allowed as param-eters to the P operator Reward formulas use the Cumulative (C )operator The sizeQ of a PCTL formula is defined as the number ofBoolean connectives plus the number of P and R operators in the

formula We define Pσηa

s [ψ]def= Prob

(π isin Πσηa

s | π |= ψ)

the probability of taking a path π isin Πs that satisfies ψ under strat-egy σ and nature ηa Pσmaxs [ψ] (Pσmins [ψ]) denotes the maxi-mum (minimum) probabilityPση

a

s [ψ] across all natures ηa isin Natfor a fixed strategy σ An analogous definition holds also for re-ward properties which can be expressed also using (multiple) re-ward structures different from the one used to maximize the totalexpected reward For a CMDP MC strategy σ and property φwe will useMC σ |=Nat φ to denote that when starting from anyinitial state s isin S0 and operating under σMC satisfies φ for anyηa isin Nat The semantics of the logic is reported in Table 1 wherewe write |= instead of σ |=Nat for brevity

3 RELATED WORKRelated work falls into two main categories renewable-energy

pricing in smart grids and strategy synthesis from PCTL specifica-tions for probabilistic systems

The integration of renewable energy sources in power grids hasmotivated the development of stochastic frameworks to solve theenergy-pricing problem The work in [20] presents a risk-limitingoptimization framework That effort focuses on modeling the un-certainty in energy availability on the supply side but it does not

Table 1 PCTL semantics forMC s |= Trues |= ω iff ω isin L(s)s |= notφ iff s 6|= φs |= φ1 and φ2 iff s |= φ1 and s |= φ2

s |= Pp(p) [ψ] iff Pσmax(min)s [ψ] p(p)

s |= Rv(v) [C φ] iff Eσmax(min)s [rewr(π tφ)] v(v)

π |= X φ iff π[1] |= φπ |= φ1 U φ2 iff existi ge 0 | π[i] |= φ2 and forallj lt i π[j] |= φ1

rewr(π tφ) = Σtφt=0rs(πs[t]) + ra(πs[t] πa[t])

tφ = mint | πs[t] |= φ

consider the problem of controlling user demand through economicincentives The effectiveness of demand response in balancing sup-ply and demand in power grids was studied in [9] and a stochas-tic framework to optimize operator profits was presented in [10]We closely follow the optimization setup presented in these worksbut we also constrain the operator risk and the user QoS at syn-thesis time Finally Varaiya et al argued the need to quantita-tively constrain the operator risk in [7] The risk-limiting dispatchapproach proposed in that work is optimal for properties of theform Pge1 or Ple0 but sub-optimal for properties with satisfactionthreshold p isin (0 1) Further QoS is not considered

The problem of strategy synthesis for MDPs from PCTL speci-fications was first studied in [13] Strategies are divided into fourcategories depending on 1) whether the transition is chosen deter-ministically (D) or randomly (R) 2) the choice does (does not) de-pend on the sequence of previously visited states (Markovian (M)and History-dependent (H)) Also it is proven that the four types ofstrategies form a strict hierarchy (MD ≺ MR ≺ HD ≺ HR) andthat determining whether it exists an MDMR (HDHR) strategythat meets all specifications is NP-complete (elementary) Kuceraet al [21] show how to synthesize MR controllers that are robustto linear perturbations via a reduction to a formula in the first-order logic of the reals This work is the closest to ours albeitwe consider also non-linear models of uncertainties Also to thebest of our knowledge the algorithm has not been applied to anycase study In [14] and [15] routines for the model-checking ofPCTL properties of MDPs are adapted to the strategy synthesisproblem These algorithms are polynomial in the model size butthey are not complete (ie there might exist a solution even if noneis reported) [14] or can handle properties with only one quanti-tative operator [15] In [22] the synthesis of multi-strategies forMDPs is studied This approach can handle only a subset of PCTLproperties and it only considers MDPs with no uncertainties Fi-nally [23] studies two-player games with winning objectives ex-pressed in PCTL Also our formulation can be interpreted as agame where the controller plays against nature On the other handfollowing the CMDP semantics defined in Assumption 22 we givenature the power of selecting a different strategy at each executionstep while [23] aims to analyze the more complex problem of find-ing the single optimal strategy for nature Our formulation is usefulto model time-varying processes (eg the power generated by thewind) and an optimal strategy for the controller can be synthesizedalgorithmically The existence of an optimal controller and naturein the formulation of [23] is instead undecidable in general

4 CONSTRAINED TOTAL EXPECTED RE-WARD MAXIMIZATION FOR CMDPS

We formally define the optimization problem under analysis provethat its decision problem version is NP-complete and present an al-gorithm to solve the former

Constrained Total Expected Reward Maximization for EMDPsGiven an EMDPMC a reward structure r and a PCTL formulaφ determine strategy σlowast forMC such that

σlowast = argmaxσisinΣMD

φ

WσS0

(5)

where WσS0

is the sum of the expected rewards over all the initialstates s isin S0 and ΣMD

φ is the set of Markov-Deterministic strate-gies for whichMC satisfies φ for any ηa isin Nat starting from anys isin S0 and operating under σ isin ΣMD

φ The same results hold for the dual problem of minimizing the

EMDP total expected cost after substituting all ldquomaxrdquo (ldquominrdquo) op-erators with ldquominrdquo (ldquomaxrdquo) in the derivations below

We will use the following lemmas (the proofs are available in thereferences)

Opmizaon13 Engine13 (strategy13 ranking)13

Feasible13 Candidate13

op0mal13 solu0on13 NO13 addi0onal13 constraint13

Op0mal13 strategy13 that13 sa0sfies13 proper0es13

No13 strategy13 sa0sfies13 proper0es13

Verificaon13 Engine13 (verifies13 if13 properes13 are13 sasfied)13

Unfeasible13

YES13

Fig 2 Lazy algorithm for the constrained optimization

Lemma 41 Complexity of PCTL model-checking forCMDPs [11] Verifying that a CMDP MC satisfies a PCTL for-mula φ is solvable in polynomial time

Lemma 42 Computation of the total expected reward for Con-vex Markov Chains (CMCs) [17] Given a CMC its total ex-pected reward is computable in polynomial time

Lemma 43 Complexity of PCTL strategy synthesis forMDPs [13] The problem of determining the existence of an MDstrategy σ for an MDPM such thatM σ |= φ is NP-complete

We now prove that the decision problem version of Problem (5)is NP-complete

Theorem 41 The problem of determining the existence of an MDstrategy σ for an EMDP MC with total expected reward Wσ

S0

larger or equal to WT and satisfying a specification φ in PCTLis NP-complete

Proof Given a candidate solution σc we can in polynomial time1) check whether MC σc |=Nat φ by Lemma 41 2) computeWσcS0

on the induced CMCMσcC by Lemma 42 σc is a solution if

and only if check 1) passes and WσcS0ge WT This proves that the

problem is in NP To prove NP-hardness we reduce the problem inLemma 43 to the one under analysis We set WT = 0 and de-scribe transition probabilities with point ellipses ie ellipses withnull axes (Kas rarr +infin foralls isin S foralla isin A(s) in Equation (2))

In the rest of the section we describe an algorithm to solve Prob-lem (5) We use a lazy approach based on strategy-iteration con-ceptually similar to the ones proposed in [24] and for non-linearconstraints [25] As shown in Fig 2 the algorithm is split into twomain routines communicating in a loop At each iteration the opti-mization engine (OE) is responsible to generate a candidate strategyσc σc maximizes the total expected reward but it does not neces-sarily satisfy PCTL property φ since the OE formulation does notcontain any information about φ The candidate solution is thenpassed to the verification engine (VE) which checks whetherMCsatisfies φ for all resolutions of uncertainty ie forallηa isin Nat whenoperating under σc If the check passes σlowast = σc and the algo-rithm terminates Otherwise the VE generates an additional con-straint for the OE to prevent the previous σc to be selected againand the loop repeats The novelty of our approach is devising amathematical formulation for the OE capable of generating at eachiteration the candidate strategy that maximizes the total expectedreward among those still available (at each iteration in which theVE reports a failure the candidate strategy is discarded) The firstcandidate strategy that satisfies the PCTL specification becomes the

solution of the synthesis problem Such a strategy is feasible (sat-isfies φ) and it has the maximum total reward among strategies thatsatisfy φ so it solves Problem (5) exactly

In general there is not guarantee on the uniqueness of the op-timal strategy σlowast ie multiple strategies with the same expectedreward might exist each satisfying φ Since every such strategyis equivalent from the user perspective the algorithm just reportsthe first one found Alternatively all equivalent strategies could begenerated by continuing the iteration between the OE and the VEuntil finally a reduction in the expected reward gets detected andby reporting all synthesized strategies satisfying φ

The next subsections give details on the OE and VE and analyzethe algorithm properties

41 Optimization EngineWe start with the classical linear-programming (LP) formulation

to maximize the total expected reward for MDPs [4]

minxl

sumsisinS0

xs

st xs minus las = ras + xTfas foralls isin Sforalla isin A(s) (6)xs l

as ge 0 foralls isin Sforalla isin A(s)

Vector x collects the total expected reward for each state s isin S (atthe end of the optimization Wσc

s = xs foralls isin S) and the cost func-tion sums the total expected rewards for all the initial states s isin S0We then set Wσc

S0=sumsisinS0

xs Variables las are slack variablesfor each constraint Also ras = rs(s) + ra(s a) foralls isin Sforalla isin ASince the slack variables have negative sign the slack can only benegative ie the left-hand side (LHS) can only be larger or equalthan the right-HS (RHS) The ldquominrdquo operator makes sure that foreach state the constraint with the highest RHS has null slack ielas = 0 The optimal strategy can then be reconstructed by selectingthe action a isin A(s) foralls isin S corresponding to the constraint withnull slack eg σc(s0 a) = 1 if las0 = 0 Our goal is to modifysuch a formulation to allow a sub-optimal solution to be selected incase the optimal solution does not satisfy the PCTL specificationWe now describe an equivalent formulation to the original prob-lem that is more suitable to achieve this goal We will describe inSection 42 how to add constraints to this formulation to select sub-optimal solutions in order of optimality We refer to Problem (7)

maxxzln

sumsisinS0

xs

st xs minus las + nas = ras + xTfas foralls isin Sforalla isin A(s) (7a)las le Bzas nas le Bzas foralls isin Sforalla isin A(s) (7b)

zTs 1 = Ms minus 1 foralls isin S (7c)xs l

as n

as ge 0 zas isin 0 1foralls isin Sforalla isin A(s)

We associate a binary variable zas to each action for every stateso the problem becomes a Mixed-Integer Linear Program (MILP)zajsi = 0 if action aj is chosen for state si and Constraint (7c) guar-

antees that only one action can be selected for each state whereMs = |A(s)| For example σc(s0 a) = 1 if zas0 = 0 We then as-sociate to each constraint a second slack variable nas with sign op-posite to las For selected actions zaisi = 0 Constraint (7b) makessure that zajsi = 0 implies lajsi = 0 and najsi = 0 so that the cor-responding Constraint (7a) sets the value of xs1 For unselectedactions zaksi = 1 variable laksi gt 0 (naksi gt 0) if selecting ac-tion ak had resulted in a lower (higher) value of xsi With these

1 B is a big number with respect to the problem data It needs to behigher than the reward computed for any state ie B ge xs foralls isinS but not excessively high to avoid convergence problems in theoptimizer when solving Problem (7) A suitable value of B can befound by trial and error

constraints any action can be selected We finally change the op-timization operator to ldquomaxrdquo so that at the first iteration of thealgorithm the total expected reward gets maximized

We now proceed to consider uncertainties in the transition prob-abilities Constraint (7a) gets updated to Constraint (8) since theadversarial nature tries to minimize the expected reward The newconstraint can be made linear again for an arbitrary uncertaintymodel by replacing it with a set of constraints one for each pointin Fas However this approach results in infinite constraints if theset Fas contains infinitely many points as in the cases consideredin the paper thus making the problem not solvable We solve thisdifficulty for the ellipsoidal uncertainty model using duality InConstraint (9) we replace the primal inner problem with its dualforalls isin S a isin A(s)

xs minus las + nas = ras + minfas isinFas

xTfas (8)

dArrxs minus las + nas = ras + max

λasisinDas (x)

g (λas ) (9)

Vector Lagrange multiplierλas =

[λa1s λ

a2sλ

a3s

]Dual cost functiong (λas ) = λa1s minus λa2s minus ha

sTEasλ

a3s

Dual feasibility setDas (x) = λas | λa3s2 le λa2s xminus λa1s1 + Eas

Tλa3s = 0

The dual problem is convex by construction [26] and has size poly-nomial in the size R of the EMDP [11] Since also the primalproblem is convex strong duality holds ie the primal and dualoptimal solutions coincide because the primal problem satisfiesSlaterrsquos condition [26] for any non-trivial uncertainty set Fas Anydual solution underestimates the primal solution When substitut-ing the primal problem with the dual in Constraint (9) we can dropthe inner optimization operator because the outer optimization op-erator will nevertheless aim to find the least underestimate to max-imize its cost function We get the full formulation for the OE (thequantifiers foralls isin Sforalla isin A(s) are equal to Problem (7))

maxxλlnz

sumsisinS0

xs

st xs minus las + nas = ras + g (λas )

las le Bzas nas le Bzas (10)

zTs 1 = Ms minus 1

xs las n

as ge 0λas isin Das (x) zas isin 0 1

For the ellipsoidal model Problem (10) can be written as Prob-lem (11) which is a Mixed-Integer Quadratic-Constrained Program(MIQCP) that we solve using the back-end optimizer Gurobi [27]

maxxλzln

sumsisinS0

xs

st xs minus las + nas = ras + λa1s minus λa2s minus hasTEasλ

a3s

las le Bzas nas le Bzas

zTs 1 = Ms minus 1 (11)

xs las n

as λ

a2s ge 0 λa3s ge 0 zas isin 0 1

λa3s2 le λa2s xminus λa1s1 + EasTλa3s = 0

42 Verification EngineAfter fixing a candidate strategy σc all the non-determinism inMC is resolved The VE has thus the task of checking whetherthe induced Ellipsoidal-MC (EMC) Mσc

C = (S S0ΩF X L)satisfies the PCTL formula φ for all resolutions of uncertainty ieforallηa isin Nat We use the sound and complete verification algo-rithm presented in [11 12] (we remember that EMCs are a special

case of CMDPs) If the EMC satisfies φ then the optimal strat-egy has been found and σlowast = σc Otherwise the VE needs togenerate an additional constraint to be passed to the OE so thatthe same candidate solution does not get selected anymore If vec-tor zc = [zas0 middot middot middot z

asN ] collects all the binary decision variables that

were set to zero in the previous round of optimization ie the vari-ables corresponding to the previously selected actions we just needto add constraint

zTc 1 ge 1 (12)

so that it is not possible to select the same set of actions againAs an example we optimize the total expected reward of the

EMDP in Fig 1 subject to property φ = Pge08[ϑ U abs] The firstiteration of the OE generates strategy σc1 which selects actions[b a a a] for states [s0 middot middot middot s3] with Wσc1

s0 = 10625 The VEthough reports Pσc1mins0 [ϑ U abs] = 0207 so the strategy is re-jected The VE adds the constraint zbs0 + zas1 + zas2 + zas3 = 1 tothe OE formulation which generates at the second generation σc2([a a a a]) with Wσc2

s0 = 10188 The VE computes the valuePσc2mins0 [ϑ U abs] = 1 and the algorithm terminates reportingσlowast = σc2

43 Algorithm AnalysisWe prove soundness (if a strategy σlowast is returned it indeed opti-

mally solves Problem (5)) completeness (if no solution is returnedno strategy σ isin ΣMD exists such thatMC σ |=Nat φ) and ana-lyze the runtime performance of the proposed algorithm

Theorem 42 The algorithm presented in this section to solveProblem (5) is sound complete and has runtime exponential in thesize R of the EMDP and polynomial in the size Q of the PCTLspecification φ

Proof Problem (10) returns the MD strategy σc that maximizesWσS0

among those still available By Lemma 41 the VE is soundso if it returns MC σc |=Nat φ indeed σlowast = σc (exit arrow atthe bottom of Fig 2) The VE is also complete so if it returnsMC σc 6|=Nat φ the current σc can be discarded This is doneby generating a constraint of the form of Constraint (12) whichremoves only the current σc from the strategies to be explored bythe OE This proves soundness of the overall algorithm

Failure of finding a solution is declared only by the OE whenProblem (10) becomes unfeasible because all available strategieshave previously been discarded by the VE (exit arrow at the top ofFigure 2) This proves completeness

Finally the algorithm goes at most through I = O(MN ) itera-tions (from Section 2 I is the total number of MD strategies M isthe number of actions and N the number of states of the EMDP)Each requires solving an instance of Problem (10) and a verificationcheck (done in time polynomial inR andQ by Lemma 41) Prob-lem (10) can be solved by branch-and-bound algorithms in time ex-ponential in the number of binary variables (whose number isMNand remains constant across iterations) and polynomial in the num-ber of constraints (whose number is polynomial in R at the firstiteration and it grows at each iteration limited by I = O(MN ))The total complexity isO(MNtimes

(2MNtimespoly(MN )+poly(Q))

)

exponential inR and polynomial inQ

The algorithm performs better on problems that do have a fea-sible solution arguably the most interesting ones while the opti-mization step could be removed if the goal is to prove unfeasibilityAs an alternative σlowast could be determined by testing all I avail-able MD strategies and selecting the one with the highest rewardamong those satisfying φ [13] We believe (and experimentallyshow in Section 6) that our approach can achieve better runningtime by decoupling the problem into an optimization and a veri-fication part and by testing strategies in order of optimality Fi-

Real-time price13

Day-ahead price13

Network Operator13Energy dispatch and pricing13

Base-line13 Wind13 Fast-start13

Traditional13Users13

Opportunistic13Users13

Day-ahead13dispatch13

Real-time dispatch13

Uncertain demand13

Uncertain supply13

Supply13

Demand13

Fig 3 Network operator inputs (dashed) and outputs (solid)

nally speed-ups can be obtained by implementing online routinesfor integer-constraint simplification and to produce more succinctcertificates of unfeasibility from the VE [2425] These routines areoutside the scope of this paper and will be covered in future work

5 SUPPLY SCHEDULING AND ENERGYPRICING WITH RENEWABLE SOURCES

We model the pricing and dispatch problem following the sce-nario sketched in Fig 3 [7 10 28] More details are availablein [12]

Three agents operate in the system the network operator andtwo types of users traditional and opportunistic Further three en-ergy sources are available non-dispatchable wind and two dis-patchable sources baseline and fast-start generators We will usestochastic variables WDt Do to represent wind energy supplyand traditional and opportunistic user demand respectively

The network operator takes two kinds of decisions 1) dispatchof non-renewable sources to guarantee that the aggregate energysupply matches the demand 2) pricing of energy to maximizeprofits and incentivize users to join or leave the network depend-ing on energy availability Traditional and opportunistic users re-act to pricing decisions on different timescales Traditional usersonly react to day-ahead pricing to decide how much energy theyare willing to purchase while opportunistic users are capable ofrescheduling in real-time their energy demand depending on theenergy price in exchange of lower expected prices

A 24-hour period gets divided into T1-slots of equal length (egT1 = 1h) and each T1-slot into K T2-slots (eg T2 = 30minK = 2) [10] The operator maximizes its economic profit by tak-ing decisions on the two time scales On day-ahead for each T1-slot it dispatches Q units of baseline energy (unit cost c1) withq = QK units per T2-slot and sets the price for traditional users(u) so that they can decide one day ahead when to schedule theirdemand in the following day The choice of u sets the expected de-mand of traditional users (E[Dt]) In real-time for each T2-slot theoperator first observes the values of traditional-user demand dt andwind availabilityw It then sets the price for opportunistic users (v)to determine the expected demand of opportunistic users (E[Do])Third it dispatches the production of more fast-start energy (unitcost c2) or cancels part of the already dispatched baseline energy(paying only unit cost cp instead of c1) depending on wind avail-ability and user demand to balance supply and demand In realscenarios cp lt c1 lt c2 and wind-energy is assumed to be free forbrevity [10] The operator thus tries to use as much wind energyas possible and to dispatch on day-ahead the exact amount of base-line energy not to incur in real-time in cancellation costs or in theextra cost for fast-start supplies Since more profitable strategiesmight imply a higher reliance on the uncertain wind energy or anincrease in energy prices correct system functionality needs limitson energy-unbalance risks and QoS guarantees for users

There are three sources of stochastic behavior (i) wind-energysupply (W ) (ii) traditional (Dt) user demand (iii) opportunistic(Do) user demand We thus use a stochastic optimization frame-work The result of the optimization should return optimal strate-

wα 0 0

w1αdt1αdo1γ

w1αdt1αdo1β

k=013 k=113 k=213

v1a

v1a

v1av1

b

v1b

Qa ua

Qb ub

Qa ub

w1β 0 0

s0

0 0 0

w1α 0 0w1αdt1α 0

w1β dt1β 0w1β dt1α 0

w1αdt1β 0

w1αdt1αdo1α

w1αdt1αdo1δv1b

w2β 0 0

w2α 0 0

wαdtαdoγ

wαdtαdoβva

va

vavb

Qa ua

Qb ub

Qa ub

wβ 0 0

s0

0 0 0

wαdtα 0

wβ dtβ 0wβ dtα 0wαdtβ 0

wαdtαdoα

wαdtαdoδvb

vb

ab uQ

ab uQ

Fig 4 Sketch ofME (top) andMprimeE (bottom) The latter isME

unfolded across decision epochs For each EMDP the initial states0 is shown on the left Each state is represented by the tuple(w dt do) of observations of WDt Do The pairs (Q u) rep-resent the day-ahead decisions about dispatch of baseline energyQand energy pricing for traditional users u Two arrows per decisiondepart from each state because in this figure we assumed only twodiscretization levels for each quantity (labeled with greek lettersα β middot middot middot ) We only show the state graph in the EMDPs related todecision (Qa ua) but similar state graphs are present also for theother decision pairs (Qa ub) (Qb ua) (Qb ub) as hinted by thedashed arrows departing from s0

gies about 1) the day-ahead decisions (Q and u) and 2) the real-time decisions (v) for each possible observation of W and Dt Inreal-time the actual decision (vobs) will be taken deterministicallyamong the synthesized ones based on the observed values w anddt ie the actual wind availability and traditional-user demandIn summary we will optimize over one T1-slot (the decision prob-lem is periodic so we can run one optimization for each T1-slotstand-alone) and aim to determine optimal values for Q u and v

We use the Ellipsoidal-MDPME = (S S0 AΩF AX L)sketched in Fig 4 (top) All quantities are bounded and uniformlydiscretized to keep the state and action spaces finite States s isin Sare a tuple s = (w dt do) where w dt do refer to the observedvalues of available wind energy and user demand in that state Weconsider only one T1-slot per time so we model only one choiceof day-ahead energy dispatch Q and pricing for traditional usersu (at the initial state s0 (Q u) isin A) The process then transi-tions through K decision epochs as follows First values of windenergy (w) and traditional user demand (dt) are stochastically cho-sen according to the corresponding distributions (described below)Second for each observation of W and Dt a decision on v isin A ismade and the opportunistic user demand (do) is stochastically cho-sen To transition between epochs a new value of wind energy w(the backward arrow in Fig 4 (top)) is chosen and the steps repeat

In general the optimal decision v at each epoch depends onprevious observations of wind availability (w) and user demand(dt do) so v is history-dependent To synthesize control strategiesusing the algorithm in Section 4 we unfold the sequence of deci-sion epochs in the EMDPMprimeE = (Sprime S0 A

primeΩF primeAprimeX prime Lprime)in Fig 4 (bottom) InMprimeE we have explicitly marked each quan-tity with an additional subscript k = 1 middot middot middot K to refer to the cor-responding T2-slot Each state s isin S ofME (apart from the initialstate) has been replicated K times in MprimeE s rarr s1 s2 middot middot middot sK After K decision epochs the states transition to an absorbing state(not shown in Fig 4) Since now all decision epochs are explicitly

codified in the model Markov strategies are optimal forMprimeE Thecorresponding optimal history-dependent strategy forME can bereconstructed for each s isin S by collecting the sequence of optimaldecisions returned by the algorithm in the replicas s1 s2 middot middot middot sK In the following we will only considerMprimeE As an additional ad-vantage different wind distributions can be used to transition be-tween different epochs inMprimeE (only one distribution could be usedinME) to account for the time-varying nature of wind availabilityThe modeling expressivity is thus increased

State transition probabilities are computed using the followingstochastic models

User demand The demand of both traditional (Dt) and oppor-tunistic (Do) users is modeled using Gaussian distributions [10]with Dt sim N (αtu

γt βtE[Dt]) and Do sim N (αovγo βoE[Do])

Parameter γt lt 0 (γo lt 0) is the elasticity of the traditional (op-portunistic) users ie the ratio of the percentage change of theexpected demand to that of price variations formally defined asγt = uE[Dt] middot partE[Dt]partu Parameters αt αo βt βo are fitting pa-rameters To compute transition probabilities we truncate and dis-cretize the continuous PDs in equally-sized intervals pick the mid-dle point of each interval as the discretization value and then inte-grate the PD across the interval to determine how likely the systemtransitions to that discretized value2

Wind-energy availability We created a stochastic model of theavailable wind energy starting from measured data collected fromthe wind farm at Lake Benton Minnesota USA [29] The goal is totake forecast values into account while also considering the intrin-sic inaccuracies of these predictions First we compute the (dis-crete) empirical PD microW of a training set of collected wind-energydata Second we divide a new set of data in T2-slots and considerthe average value for each new T2-slot as the forecast energy valueWe then scale microW to have such expected value E[Wk] thus obtain-ing microWk Finally we compute ellipsoidal Sets (3) F primeas (collectedinF prime) to represent uncertainty in the transition probability betweentwo discretized energy levels in two consecutive T2-slots Transi-tion frequencies are computed by counting observed transitions in atraining set of data Using classical results from statistics [17] wethen compute the value of parameter β from Set (1) correspond-ing to the confidence level CL in the measurements In particular0 le CL le 1 and CL = 1minus cdfχ2

d(2 lowast (βmax minus β)) where cdfχ2

d

is the cumulative density function of the Chi-squared distributionwith d degrees of freedom (d is equal to the number of bins used todiscretize W ) Moreover an increasing value of parameter β canbe used in the sequence of decision epochs to model the fact thatforecast farther-away in time are less accurate

We provide the states with thick circles in Fig 4 (bottom) withthree reward structures to express the profit and risk of the oper-ator and the QoS for the users We choose those states becausethe quantities Dtk DokWk are all fully observable in them thusallowing the computation of the rewards We set

rProfitsk [$] = udtk+vkdokminus(cp∆k+c1(qminus∆k))1A∆kge0

minus(c1qminusc2∆k)1B∆klt0 (13a)

rLoLsk [MWh] = max(0 XE[Wk] + Y E[Dtk +Dok]minus∆k)(13b)

rQualitysk [MWh] = dtk + dok (13c)

with ∆k = wk + q minus dtk minus dok representing the surplus ofsupply on demand X and Y defined in the following and 1 theindicator function Reward (13a) subtracts operating costs to the

2For simplicity we assume that the distributions of Dt and Doare known with certainty In fact these distributions are usuallyestimated more accurately than the wind forecast by analyzing thehistory of measured data [10] Adding uncertainty models also forthese distributions is left as future work

operator revenue to compute the net profit Indicator 1A (1B) cor-responds to the scenario when the sum of day-ahead dispatchedand wind energy is sufficient (insufficient) to cover the demandIn the latter case fast-start energy needs to be dispatched in real-time Reward (13b) computes the Loss of Load (LoL) In prac-tical scenarios the amount of fast-start energy available in real-time is limited Often this limit is computed with the formulaFS le XE[W ] + Y E[Dt +Do] (eg X = 3 Y = 10) [30]If ∆ + FS lt 0 the network incurs in a LoL with potentiallyrisky consequences Reward (13c) accounts for user demand in-centivized by energy pricing

Finally we mark all states with ∆ +FS lt 0 with the label riskand use label abs for the absorbing state so Ω = risk abs

The optimal strategy σlowast = (ulowast Qlowast vlowastk) 1 le k le K is thesolution of problem

Wlowasts0 = maxQu

minfas isinFprimeas

Eσfas

W EσDt maxv1middotmiddotmiddotvK

EσDorewrProfit(πK)

stMprimeE σlowast |=Nat φ where (14)

φ = RrLoL

leEENSM [C abs] andRrQuality

geQoSm [C abs]

and Pge1minusLoLPM [notrisk U abs]

In Problem (14) we maximize the expected value of the operatorprofit Wlowasts0 under the worst-case resolution of uncertainty in thewind-energy forecast by summing the instantaneous state rewardsrProfits along the paths π isin Πfin of K steps ofMprimeE correspond-ing to the K T2-slots By replicating the decision process for eachof the K T2-slots while building modelMprimeE every K-step execu-tion path traverses all decision epochs so the computed reward asintroduced in Definition 26 and 27 will account for the sum of thecontributions of each decision epoch

According to the semantics defined in Table 1 the PCTL specifi-cation φ constrains the expected operator risk and user QoS acrossthe decision horizon EENSM is the desired maximum value ofExpected Energy Not Served LoLPM is the maximum allowedvalue of Loss of Load Probability (these two properties limit therisk for the operator) andQoSm is the minimum value of QoS thatneeds to be guaranteed to the users

6 EXPERIMENTAL RESULTSWe implemented the algorithm in Python and interfaced it with

PRISM [31] as a front-end for entering models and with Gurobi [27]as the back-end optimizer Experiments were run on a 24 GHz In-tel Xeon with 32GB of RAM

In this section we present experimental results obtained by solv-ing Problem (14) Our goals are to give insight about the algo-rithm functionality compare its scalability to other strategy synthe-sis approaches and show that the synthesized energy pricing anddispatch strategies can achieve better performances than other so-lutions presented in the literature

We define the following quantities

Profit = Wlowasts0EENS = Rr

LoLσlowastmaxs0 [C abs]

QoS = RrQualityσlowastmins0 [C abs]

1minus LoLP = Pσlowastmin

s0 [notrisk U abs]

where the min and max operators refer to the action range of na-ture Nat As defined in Section 22 these quantities represent thequantitative values of rewards and satisfaction probability that getthen compared to the corresponding thresholds (EENSM QoSmLoLPM ) in Problem (14) to determine the satisfaction of φ Wewill then normalize the Profit to ProfitM the maximum com-puted profit value for each set of experiments We set T1 = 1hT2 = 30min so K = 2 and consider two pricing options bothfor traditional and opportunistic users If not otherwise stated

Fig 5 Performance vs iteration

Fig 6 Profit vs confidence in forecast

we will use CL = 90 and discretize the wind energy W in 5bins and traditional Dt opportunistic Do demands and baselinesupply Q in 2 bins We set QoSm = 80

sumk E[Dtk +Dok]

LoLPM = 10 EENSM = 5sumk E[Wk + q] The other pa-

rameter values were taken from [10]In Fig 5 we show the trend of the expected system performance

as a function of the synthesis algorithm iteration The Profitmonotonically decreases until the proposed candidate strategy σcmeets all specifications We note that 1minusLoLP andQoS (EENSnot shown has a trend similar to 1 minus LoLP ) instead vary non-monotonically Intuitively this is because the Profit can be in-creased either by scheduling less baseline energy Q to reduce can-cellation costs cp but incurring in a higher risk or by increasing the

Table 2 Performance Analysis

W bins 5 10 15 20Profit 1 098 097 0965

1minus LoLP 099 099 099 099EENS 098 098 098 098QoS 101 101 101 101

Runtime 144s 400s 1368s 3289sIter 223 53 547 332N + Tr 1343 2719 4115 5591

MD Strat (I) 4096 42e6 43e9 44e12

005 010 015 020 025 030082084086088090092094096098100

Pro

fit

Pro

fit M

Performance Comparison vs Varaiya et al [7] and He et al [10]

Ours[7][10]

005 010 015 020 025 03000020406081012

EE

NS

EE

NS M

Ours[7][10]

005 010 015 020 025 030Wind Penetration ηW

088090092094096098100102

QoS

QoS

m Ours[7][10]

Fig 7 Comparison to alternative approaches via Monte Carlo sim-ulation

energy price v for opportunistic users with consequent reduction ofQoS (the peaks of 1 minus LoLP are aligned to the valleys of QoS)By ranking strategies by expected Profit our algorithm is capa-ble of selecting the optimal strategy despite the complex parameterinterdependences in the model under analysis

In Table 2 we compare synthesis results while varying the num-ber of discretization bins for W (all values are normalized to thecorresponding specification) First we note that the expected sys-tem performances do not substantially vary by changing the num-ber of bins supporting our choice of 5 bins in the other experi-ments Second runtime results show that the algorithm can handlein reasonable time problems more than 10times larger than the onesanalyzed in [14] the only other algorithm in the literature capableof processing arbitrary PCTL formulas (we useN+Tr the sum ofstates and transitions in the EMDP as a proxy of the model size)Third we note that the alternative approach of verifying all the IMD strategies σ isin ΣMD is not practical due to the exponentialincrease of I (defined in Section 211) with the problem size

In Fig 6 we study the effect of different confidence levels CL inthe wind-energy forecast on the expected operator Profit whilevarying the value of wind penetration ηW =

sumKk=1

E[Wk]E[Wk + q]

and keeping all constraints constant At high CL higher profitscan be expected for increasing ηW (wind energy is assumed free)On the other hand for low CL higher wind penetration createsmore uncertainty thus lowering the expected Profit The networkoperator can use these curves to assess the return of investment inemploying more accurate (and expensive) forecast techniques

Finally in Fig 7 we compare results with two other energy-pricing formulations proposed in the literature He et al [10] solvethe optimization problem without enforcing any constraint Varaiyaet al [7] only put limits on the acceptable LoLP (their approach isnot trivially extendable to reward properties expressed using the Roperator) and solve optimally only for LoLP = 0 Comparison isdone by solving the different optimization problems and then run-ning Monte Carlo simulations (1000 runs) of the controlled systemon test data (different from the training ones) to evaluate its per-formance (Profit EENS and QoS) As expected the uncon-strained strategy from [10] has higher Profit (up to 5) but alsoup to 12 more EENS and 10 less QoS compared to our ap-proach The strategy from [7] guarantees null EENS but it hasup to 6 less Profit (due to over constraining EENS) and 10less QoS (which is left unconstrained)

As a final remark we note that runtime may increase exponen-tially as we tighten the specification thresholds (QoSm LoLPM EENSM ) since it becomes increasingly more difficult to find a

solution within the exponentially-sized search space Neverthelessthe chosen values were tight enough to improve the quality of alter-native energy pricing and dispatch strategies proposed in the litera-ture while maintaining the runtime acceptable for this application

7 CONCLUSIONS AND FUTURE WORKWe proposed the first sound and complete algorithm for the syn-

thesis of control strategies for MDPs satisfying properties expressedin PCTL and robust to uncertainties in the transition probabilitiesWe then applied the algorithm to the problem of energy pricingand dispatch in smart grids with renewables Results showed thatnetwork-operator risks can be effectively constrained at design timeand that more accurate predictions of the expected profit can be ob-tained by taking the uncertainty of wind availability into account

As future work we plan to investigate techniques to generatemore concise constraints to prove failure of the verification step inorder to prune more effectively the search space for the optimiza-tion engine and to apply the proposed strategy synthesis approachto further case studies eg semi-autonomous car driving

Acknowledgments The authors thank J Venkatesh and profT Simunic Rosing for providing the measurement data used in thecase study The research was partially funded by DARPA AwardNumber HR0011-12-2-0016 and by TerraSwarm one of six cen-ters of STARnet a Semiconductor Research Corporation programsponsored by MARCO and DARPA Approved for public releasedistribution is unlimited The content of this paper does not neces-sarily reflect the position or the policy of the US government andno official endorsement should be inferred

8 REFERENCES[1] Courcoubetis C and Yannakakis M ldquoThe Complexity of

Probabilistic Verificationrdquo Journal of ACM vol 42(4) pp857ndash907 1995

[2] H Hansson and B Jonsson ldquoA Logic for Reasoning AboutTime and Reliabilityrdquo Formal Aspects of Computing vol6(5) pp 512ndash535 1994

[3] A Bianco and L de Alfaro ldquoModel Checking ofProbabilistic and Nondeterministic Systemsrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 1995vol 1026 pp 499ndash513

[4] M Puterman Markov Decision Processes DiscreteStochastic Dynamic Programming John Wiley and Sons1994

[5] ldquoInternational Energy Agency World Energy Outlook 2009rdquo[Online] httpwwwworldenergyoutlookorgdocsweo2009WEO2009_es_englishpdf

[6] A Gore An Inconvenient Truth New York Rodale 2006[7] P Varaiya F Wu and J Bialek ldquoSmart Operation of Smart

Grid Risk-Limiting Dispatchrdquo Proceedings of the IEEEvol 99 no 1 pp 40ndash57 2011

[8] K Porter and J Rogers ldquoSurvey of Variable GenerationForecasting in the Westrdquo in NREL Subcontract ReportSR-5500-54457 April 2012 p 56

[9] S Koch et al ldquoModeling and Control of AggregatedHeterogeneous Thermostatically Controlled Loads forAncillary Servicesrdquo Proc of PSCC pp 1ndash7 2011

[10] M He S Murugesan and J Zhang ldquoMultiple TimescaleDispatch and Scheduling for Stochastic Reliability in SmartGrids with Wind Generation Integrationrdquo in Proc of IEEEINFOCOM 2011 pp 461ndash465

[11] A Puggelli et al ldquoPolynomial-Time Verification of PCTLproperties of MDPs with Convex Uncertaintiesrdquo in Proc of

CAV ser LNCS Springer Berlin Heidelberg 2013 vol8044 pp 527ndash542

[12] A Puggelli ldquoFormal Techniques for the Verification andOptimal Control of Probabilistic Systems in the Presence ofModeling Uncertaintiesrdquo PhD dissertation University ofCalifornia Berkeley Berkeley CA 2014

[13] C Baier et al ldquoController Synthesis for ProbabilisticSystemsrdquo in Exploring New Frontiers of TheoreticalInformatics Springer US 2004 vol 155 pp 493ndash506

[14] M Lahijanian et al ldquoTemporal Logic Motion Planning andControl with Probabilistic Satisfaction Guaranteesrdquo IEEETransactions on Robotics vol 28 no 2 pp 396ndash409 2012

[15] M Kwiatkowska and D Parker ldquoAutomated Verificationand Strategy Synthesis for Probabilistic Systemsrdquo in Proc ofATVA ser LNCS vol 8172 Springer 2013 pp 5ndash22

[16] K Sen et al ldquoModel-Checking Markov Chains in thePresence of Uncertaintiesrdquo in Proc of TACAS ser LNCSSpringer Berlin Heidelberg 2006 vol 3920 pp 394ndash410

[17] A Nilim and L El Ghaoui ldquoRobust Control of MarkovDecision Processes with Uncertain Transition MatricesrdquoJournal of Operations Research pp 780ndash798 2005

[18] M Y Vardi ldquoAutomatic Verification of ProbabilisticConcurrent Finite State Programsrdquo in Proc of SFCS 1985pp 327ndash338

[19] V Forejt et al ldquoAutomated Verification Techniques forProbabilistic Systemsrdquo Formal Methods for EternalNetworked Software Systems vol 6659 pp 53ndash113 2011

[20] F Bouffard and F Galiana ldquoStochastic Security forOperations Planning with Significant Wind PowerGenerationrdquo in IEEE Transaction on Power Systems 2008vol 23 no 2 pp 306ndash316

[21] A Kucera and O Stražovskyacute ldquoOn the Controller Synthesisfor Finite-State Markov Decision Processesrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 2005vol 3821 pp 541ndash552

[22] K Draeger et al ldquoPermissive Controller Synthesis forProbabilistic Systemsrdquo in Proc of TACAS ser LNCSSpringer 2014 pp 531ndash546

[23] T Brazdil et al ldquoStochastic Games with Branching-TimeWinning Objectivesrdquo in Proc of IEEE SLCS Aug 2006 pp349ndash358

[24] C Hang et al ldquoSynthesizing Cyber-Physical ArchitecturalModels with Real-Time Constraintsrdquo in Proc of CAVBerlin Heidelberg Springer-Verlag 2011 pp 441ndash456

[25] P Nuzzo et al ldquoCalCS SMT Solving for Non-linearConvex Constraintsrdquo in Proc of FMCAD 2010 pp 71ndash80

[26] S Boyd and L Vandenberghe ldquoConvex OptimizationrdquoCambridge University Press 2004

[27] ldquoGurobi Optimizerrdquo [Online] httpwwwgurobicom[28] S Stoft Power System Economics Designing Markets for

Electricity NewYork Wiley 2002[29] Y Wan ldquoLong-term Wind Power Variabilityrdquo NREL

Technical Report TP-5500-53637 [Online]httpwwwnrelgovdocsfy12osti53637pdf January 2012

[30] E Ela et al ldquoOperating Reserves and Variable GenerationrdquoNREL Technical Report [Online]httpwwwnrelgovdocsfy11osti51978pdf August 2011

[31] M Kwiatkowska et al ldquoPRISM 40 Verification ofProbabilistic Real-Time Systemsrdquo Proc of CAV pp585ndash591 2011

Page 3: Robust Strategy Synthesis for Probabilistic Systems Applied to …sseshia/pubdir/emsoft... · 2014-08-07 · Robust Strategy Synthesis for Probabilistic Systems Applied to Risk-Limiting

Assumption 22 CMDP Semantics CMDPs model nondetermin-istic choices made from a convex set of uncountably many choicesEach time a state is visited a transition distribution within the set isadversarially picked and a probabilistic step is taken accordingly(The same semantics is used for IMDPs in [16])

A transition between state s to state sprime in a CMDP occurs in threesteps First an action a isin A(s) is chosen nondeterministicallySecondly an observed PD fas isin Fas is chosen The selection of fasmodels uncertainty in the transition Lastly a successor state sprime ischosen randomly according to the transition PD fas

A path π in MC is a finite or infinite sequence of the form

s0

fa0s0s1minusminusminusrarr s1

fa1s1s2minusminusminusrarr middot middot middot with faisisi+1

gt 0 foralli ge 0 si isin Sai isin A(si) We indicate with Πfin (Πinf ) the set of all finite (in-finite) paths ofMC πs[i] (πa[i]) is the ith state (selected action)along the path and for finite paths last(π) is the last state visitedin π isin Πfin Πs = π | π[0] = s is the set of paths starting in s

The algorithm presented in Section 4 can be applied both to theinterval and ellipsoidal models of uncertainty [11] (and to a mix ofthem) but in this paper we will focus on the latter one since it ismore suitable for the application analyzed in Section 5 This modelis a second-order approximation of the likelihood model of uncer-tainty (likelihood models are often used when transition probabil-ities are determined experimentally) and it is more accurate thana linear one [17] The transition frequencies associated to actiona isin A are collected in matrix Ha Uncertainty in each row of Ha

can be described by the likelihood region gas

gas = fas isin RN|sumsprime h

assprime log(fassprime)geβ

as (1)

where βas lt βasmax =sumsprime h

assprime log(hassprime) represents the uncer-

tainty level The second-order approximation of gas is [17]

gas asymp fas isin RN|sumsprime

(fassprimeminushassprime)

2

hassprime

le(Kas )2 (2)

withKas = 2(βasmaxminusβas ) ge 0 representing the uncertainty levelWe then write the approximation of gas in conic form and intersectit with the probability simplex to obtain the uncertainty set

Fas = fas isin RN | fas ge 01T fas = 1 (3)Eas (fas minus has) 2 le 1 Eas 0

with Eas = (Kas )minus1 times diag((hass0)minus05 middot middot middot (hassN )minus05

) 0

positive definite Note that the conic representation is crucial toenhance the scalability of the algorithmic approach presented inSection 4 because convex solvers process conic constraints moreefficiently than generic quadratic constraints

Definition 23 Ellipsoidal-MDP (EMDP) An EMDP is a CMDPwhere every uncertainty set Fas isin F is expressed using Set (3)

We determine the size R of the CMDP MC as follows MChas N states O(M) actions per state and O(N2) transitions foreach action Let Da

s denote the number of constraints required toexpress the rectangular uncertainty set Fas and D = max

sisinSaisinADas

The overall size ofMC is thusR = O(N2M +NMD)To illustrate our results we will use the EMDPMC in Fig 1

with S = s0 middot middot middot s3 S0 = s0 A = a b Ω = ϑ absA s0 s1 rarr a b s2 s3 rarr a L s0 s2 rarr ϑ s3 rarr abs The parameters of the uncertainty ellipsoids are shownnext to each transition

211 Strategies and NatureTo analyze quantitative properties we need a probability space

over infinite paths [18] However a probability space can onlybe constructed once nondeterminism and uncertainty have been re-solved We call each possible resolution of nondeterminism a strat-egy which chooses an action in each state ofMC

s013rs=113ϑ13

s113rs=313empty13

s313rs=013

abs13

s213rs=113ϑ13

b13

a 13

rb= 613K=0113

ra= 913K=0213

h02b = 025

h01b = 075

h02a = 035

h03a = 065

h23a =1

h33a =1

h12b =1

h12a = 08

h11a = 02

b 13

a13

a13a13

Fig 1 Example EMDP

Definition 24 Strategy A randomized strategy forMC is a func-tion σ = Πfin times A rarr [0 1] with

sumA(last(π)) σ(π a) = 1 and

a isin A(last(π)) if σ(π a) gt 0 We call Σ the set of all strategiesσ ofMC

Conversely we call a nature [17] each possible resolution of un-certainty ie a nature chooses a transition PD for each state andaction ofMC

Definition 25 Nature Given action a isin A a randomizednature is the function ηa Πfin times Dist(S) rarr [0 1] withintFalast(π)

ηa(π fas ) = 1 and fas isin Falast(π) if ηa(π fas ) gt 0 We

call Nat the set of all natures ηa ofMC

A strategy σ (nature ηa) is Markovian (M) if it depends only onlast(π) Also σ (ηa) is deterministic (D) if σ(π a) = 1 for somea isin A(last(π)) (ηa(π fas ) = 1 for some fas isin Falast(π)) Asexplained in Section 1 we developed a synthesis algorithm for MDstrategies There are in total I =|As0 | times middot middot middot times |AsN |= O(MN )

MD strategies collected in the set ΣMD After fixing a strategy σ all the non-determinism inMC is re-

solved For MD strategies we obtain the induced Convex MarkovChain (CMC) Mσ

C = (S S0ΩF X L) MσC has still size R

since the state space S does not change for MD strategies Furtherthe only convex set Fas isin F of transition PDs available at eachstate s isin S is the one corresponding to the action a isin A(s) suchthat σ(s a) = 1

212 RewardsRewards allow modeling additional quantitative measures of a

CMDP eg profit We associate rewards to states and to actionsavailable in each state

Definition 26 Reward Structure and Path Reward A re-ward structure for a CMDP MC is a tuple r = (rs ra)comprising a state (action) reward function rs S rarr Rge0

(ra S timesArarr Rge0) Given a (possibly infinite) path πwith horizon T isin N cup +infin the path reward for π isrewr(π T ) =

sumTt=0 rs(πs[t]) + ra(πs[t] πa[t])

In this paper we will rank the available MD strategies for theCMDP based on their total expected reward

Definition 27 Total Expected Reward The total expected rewardfor state s isin S under strategy σ isin ΣMD is defined as

Wσs = min

ηaisinNatEση

a

[rewr(π T )] (4)

where we minimize across the action range ηa isin Nat of the ad-versarial nature the expected reward over all paths starting from swith horizon T visited under strategy σ

We will only consider CMDPs such that Wσs exists and it is

finite foralls isin Sforallσ isin ΣMD These include finite and infinite-horizon CMDPs (T isin N cup +infin) with zero-reward absorbingstates (ω = abs) as the ones used in Section 5 An extension toother classes of CMDPs eg with negative rewards is possible insome specific cases but outside the scope of the paper For moredetails see [4] Finally according to Assumption 22 we can sub-stitute ηa isin Nat with fas isin Fas and only consider MD natures

22 Probabilistic Computation Tree LogicWe use Probabilistic Computation Tree Logic (PCTL) a proba-

bilistic logic derived from CTL which includes a probabilistic op-erator P [2] and a reward operator R [19] to express properties ofCMDPs [3] The syntax of the fragment of the logic in which oper-ators with a finite time bound are disallowed is defined as follows

φ = True | ω | notφ | φ1 and φ2 | P1p [ψ] | Rr1v[ρ] state formulasψ = X φ | φ1 U φ2 path formulasρ = C φ rewards

with ω isin Ω an atomic proposition 1isin le ltge gt p isin [0 1]v isin Rge0

Path formulas use the Next (X ) and Unbounded Until (U ) op-erators They are evaluated over paths and only allowed as param-eters to the P operator Reward formulas use the Cumulative (C )operator The sizeQ of a PCTL formula is defined as the number ofBoolean connectives plus the number of P and R operators in the

formula We define Pσηa

s [ψ]def= Prob

(π isin Πσηa

s | π |= ψ)

the probability of taking a path π isin Πs that satisfies ψ under strat-egy σ and nature ηa Pσmaxs [ψ] (Pσmins [ψ]) denotes the maxi-mum (minimum) probabilityPση

a

s [ψ] across all natures ηa isin Natfor a fixed strategy σ An analogous definition holds also for re-ward properties which can be expressed also using (multiple) re-ward structures different from the one used to maximize the totalexpected reward For a CMDP MC strategy σ and property φwe will useMC σ |=Nat φ to denote that when starting from anyinitial state s isin S0 and operating under σMC satisfies φ for anyηa isin Nat The semantics of the logic is reported in Table 1 wherewe write |= instead of σ |=Nat for brevity

3 RELATED WORKRelated work falls into two main categories renewable-energy

pricing in smart grids and strategy synthesis from PCTL specifica-tions for probabilistic systems

The integration of renewable energy sources in power grids hasmotivated the development of stochastic frameworks to solve theenergy-pricing problem The work in [20] presents a risk-limitingoptimization framework That effort focuses on modeling the un-certainty in energy availability on the supply side but it does not

Table 1 PCTL semantics forMC s |= Trues |= ω iff ω isin L(s)s |= notφ iff s 6|= φs |= φ1 and φ2 iff s |= φ1 and s |= φ2

s |= Pp(p) [ψ] iff Pσmax(min)s [ψ] p(p)

s |= Rv(v) [C φ] iff Eσmax(min)s [rewr(π tφ)] v(v)

π |= X φ iff π[1] |= φπ |= φ1 U φ2 iff existi ge 0 | π[i] |= φ2 and forallj lt i π[j] |= φ1

rewr(π tφ) = Σtφt=0rs(πs[t]) + ra(πs[t] πa[t])

tφ = mint | πs[t] |= φ

consider the problem of controlling user demand through economicincentives The effectiveness of demand response in balancing sup-ply and demand in power grids was studied in [9] and a stochas-tic framework to optimize operator profits was presented in [10]We closely follow the optimization setup presented in these worksbut we also constrain the operator risk and the user QoS at syn-thesis time Finally Varaiya et al argued the need to quantita-tively constrain the operator risk in [7] The risk-limiting dispatchapproach proposed in that work is optimal for properties of theform Pge1 or Ple0 but sub-optimal for properties with satisfactionthreshold p isin (0 1) Further QoS is not considered

The problem of strategy synthesis for MDPs from PCTL speci-fications was first studied in [13] Strategies are divided into fourcategories depending on 1) whether the transition is chosen deter-ministically (D) or randomly (R) 2) the choice does (does not) de-pend on the sequence of previously visited states (Markovian (M)and History-dependent (H)) Also it is proven that the four types ofstrategies form a strict hierarchy (MD ≺ MR ≺ HD ≺ HR) andthat determining whether it exists an MDMR (HDHR) strategythat meets all specifications is NP-complete (elementary) Kuceraet al [21] show how to synthesize MR controllers that are robustto linear perturbations via a reduction to a formula in the first-order logic of the reals This work is the closest to ours albeitwe consider also non-linear models of uncertainties Also to thebest of our knowledge the algorithm has not been applied to anycase study In [14] and [15] routines for the model-checking ofPCTL properties of MDPs are adapted to the strategy synthesisproblem These algorithms are polynomial in the model size butthey are not complete (ie there might exist a solution even if noneis reported) [14] or can handle properties with only one quanti-tative operator [15] In [22] the synthesis of multi-strategies forMDPs is studied This approach can handle only a subset of PCTLproperties and it only considers MDPs with no uncertainties Fi-nally [23] studies two-player games with winning objectives ex-pressed in PCTL Also our formulation can be interpreted as agame where the controller plays against nature On the other handfollowing the CMDP semantics defined in Assumption 22 we givenature the power of selecting a different strategy at each executionstep while [23] aims to analyze the more complex problem of find-ing the single optimal strategy for nature Our formulation is usefulto model time-varying processes (eg the power generated by thewind) and an optimal strategy for the controller can be synthesizedalgorithmically The existence of an optimal controller and naturein the formulation of [23] is instead undecidable in general

4 CONSTRAINED TOTAL EXPECTED RE-WARD MAXIMIZATION FOR CMDPS

We formally define the optimization problem under analysis provethat its decision problem version is NP-complete and present an al-gorithm to solve the former

Constrained Total Expected Reward Maximization for EMDPsGiven an EMDPMC a reward structure r and a PCTL formulaφ determine strategy σlowast forMC such that

σlowast = argmaxσisinΣMD

φ

WσS0

(5)

where WσS0

is the sum of the expected rewards over all the initialstates s isin S0 and ΣMD

φ is the set of Markov-Deterministic strate-gies for whichMC satisfies φ for any ηa isin Nat starting from anys isin S0 and operating under σ isin ΣMD

φ The same results hold for the dual problem of minimizing the

EMDP total expected cost after substituting all ldquomaxrdquo (ldquominrdquo) op-erators with ldquominrdquo (ldquomaxrdquo) in the derivations below

We will use the following lemmas (the proofs are available in thereferences)

Opmizaon13 Engine13 (strategy13 ranking)13

Feasible13 Candidate13

op0mal13 solu0on13 NO13 addi0onal13 constraint13

Op0mal13 strategy13 that13 sa0sfies13 proper0es13

No13 strategy13 sa0sfies13 proper0es13

Verificaon13 Engine13 (verifies13 if13 properes13 are13 sasfied)13

Unfeasible13

YES13

Fig 2 Lazy algorithm for the constrained optimization

Lemma 41 Complexity of PCTL model-checking forCMDPs [11] Verifying that a CMDP MC satisfies a PCTL for-mula φ is solvable in polynomial time

Lemma 42 Computation of the total expected reward for Con-vex Markov Chains (CMCs) [17] Given a CMC its total ex-pected reward is computable in polynomial time

Lemma 43 Complexity of PCTL strategy synthesis forMDPs [13] The problem of determining the existence of an MDstrategy σ for an MDPM such thatM σ |= φ is NP-complete

We now prove that the decision problem version of Problem (5)is NP-complete

Theorem 41 The problem of determining the existence of an MDstrategy σ for an EMDP MC with total expected reward Wσ

S0

larger or equal to WT and satisfying a specification φ in PCTLis NP-complete

Proof Given a candidate solution σc we can in polynomial time1) check whether MC σc |=Nat φ by Lemma 41 2) computeWσcS0

on the induced CMCMσcC by Lemma 42 σc is a solution if

and only if check 1) passes and WσcS0ge WT This proves that the

problem is in NP To prove NP-hardness we reduce the problem inLemma 43 to the one under analysis We set WT = 0 and de-scribe transition probabilities with point ellipses ie ellipses withnull axes (Kas rarr +infin foralls isin S foralla isin A(s) in Equation (2))

In the rest of the section we describe an algorithm to solve Prob-lem (5) We use a lazy approach based on strategy-iteration con-ceptually similar to the ones proposed in [24] and for non-linearconstraints [25] As shown in Fig 2 the algorithm is split into twomain routines communicating in a loop At each iteration the opti-mization engine (OE) is responsible to generate a candidate strategyσc σc maximizes the total expected reward but it does not neces-sarily satisfy PCTL property φ since the OE formulation does notcontain any information about φ The candidate solution is thenpassed to the verification engine (VE) which checks whetherMCsatisfies φ for all resolutions of uncertainty ie forallηa isin Nat whenoperating under σc If the check passes σlowast = σc and the algo-rithm terminates Otherwise the VE generates an additional con-straint for the OE to prevent the previous σc to be selected againand the loop repeats The novelty of our approach is devising amathematical formulation for the OE capable of generating at eachiteration the candidate strategy that maximizes the total expectedreward among those still available (at each iteration in which theVE reports a failure the candidate strategy is discarded) The firstcandidate strategy that satisfies the PCTL specification becomes the

solution of the synthesis problem Such a strategy is feasible (sat-isfies φ) and it has the maximum total reward among strategies thatsatisfy φ so it solves Problem (5) exactly

In general there is not guarantee on the uniqueness of the op-timal strategy σlowast ie multiple strategies with the same expectedreward might exist each satisfying φ Since every such strategyis equivalent from the user perspective the algorithm just reportsthe first one found Alternatively all equivalent strategies could begenerated by continuing the iteration between the OE and the VEuntil finally a reduction in the expected reward gets detected andby reporting all synthesized strategies satisfying φ

The next subsections give details on the OE and VE and analyzethe algorithm properties

41 Optimization EngineWe start with the classical linear-programming (LP) formulation

to maximize the total expected reward for MDPs [4]

minxl

sumsisinS0

xs

st xs minus las = ras + xTfas foralls isin Sforalla isin A(s) (6)xs l

as ge 0 foralls isin Sforalla isin A(s)

Vector x collects the total expected reward for each state s isin S (atthe end of the optimization Wσc

s = xs foralls isin S) and the cost func-tion sums the total expected rewards for all the initial states s isin S0We then set Wσc

S0=sumsisinS0

xs Variables las are slack variablesfor each constraint Also ras = rs(s) + ra(s a) foralls isin Sforalla isin ASince the slack variables have negative sign the slack can only benegative ie the left-hand side (LHS) can only be larger or equalthan the right-HS (RHS) The ldquominrdquo operator makes sure that foreach state the constraint with the highest RHS has null slack ielas = 0 The optimal strategy can then be reconstructed by selectingthe action a isin A(s) foralls isin S corresponding to the constraint withnull slack eg σc(s0 a) = 1 if las0 = 0 Our goal is to modifysuch a formulation to allow a sub-optimal solution to be selected incase the optimal solution does not satisfy the PCTL specificationWe now describe an equivalent formulation to the original prob-lem that is more suitable to achieve this goal We will describe inSection 42 how to add constraints to this formulation to select sub-optimal solutions in order of optimality We refer to Problem (7)

maxxzln

sumsisinS0

xs

st xs minus las + nas = ras + xTfas foralls isin Sforalla isin A(s) (7a)las le Bzas nas le Bzas foralls isin Sforalla isin A(s) (7b)

zTs 1 = Ms minus 1 foralls isin S (7c)xs l

as n

as ge 0 zas isin 0 1foralls isin Sforalla isin A(s)

We associate a binary variable zas to each action for every stateso the problem becomes a Mixed-Integer Linear Program (MILP)zajsi = 0 if action aj is chosen for state si and Constraint (7c) guar-

antees that only one action can be selected for each state whereMs = |A(s)| For example σc(s0 a) = 1 if zas0 = 0 We then as-sociate to each constraint a second slack variable nas with sign op-posite to las For selected actions zaisi = 0 Constraint (7b) makessure that zajsi = 0 implies lajsi = 0 and najsi = 0 so that the cor-responding Constraint (7a) sets the value of xs1 For unselectedactions zaksi = 1 variable laksi gt 0 (naksi gt 0) if selecting ac-tion ak had resulted in a lower (higher) value of xsi With these

1 B is a big number with respect to the problem data It needs to behigher than the reward computed for any state ie B ge xs foralls isinS but not excessively high to avoid convergence problems in theoptimizer when solving Problem (7) A suitable value of B can befound by trial and error

constraints any action can be selected We finally change the op-timization operator to ldquomaxrdquo so that at the first iteration of thealgorithm the total expected reward gets maximized

We now proceed to consider uncertainties in the transition prob-abilities Constraint (7a) gets updated to Constraint (8) since theadversarial nature tries to minimize the expected reward The newconstraint can be made linear again for an arbitrary uncertaintymodel by replacing it with a set of constraints one for each pointin Fas However this approach results in infinite constraints if theset Fas contains infinitely many points as in the cases consideredin the paper thus making the problem not solvable We solve thisdifficulty for the ellipsoidal uncertainty model using duality InConstraint (9) we replace the primal inner problem with its dualforalls isin S a isin A(s)

xs minus las + nas = ras + minfas isinFas

xTfas (8)

dArrxs minus las + nas = ras + max

λasisinDas (x)

g (λas ) (9)

Vector Lagrange multiplierλas =

[λa1s λ

a2sλ

a3s

]Dual cost functiong (λas ) = λa1s minus λa2s minus ha

sTEasλ

a3s

Dual feasibility setDas (x) = λas | λa3s2 le λa2s xminus λa1s1 + Eas

Tλa3s = 0

The dual problem is convex by construction [26] and has size poly-nomial in the size R of the EMDP [11] Since also the primalproblem is convex strong duality holds ie the primal and dualoptimal solutions coincide because the primal problem satisfiesSlaterrsquos condition [26] for any non-trivial uncertainty set Fas Anydual solution underestimates the primal solution When substitut-ing the primal problem with the dual in Constraint (9) we can dropthe inner optimization operator because the outer optimization op-erator will nevertheless aim to find the least underestimate to max-imize its cost function We get the full formulation for the OE (thequantifiers foralls isin Sforalla isin A(s) are equal to Problem (7))

maxxλlnz

sumsisinS0

xs

st xs minus las + nas = ras + g (λas )

las le Bzas nas le Bzas (10)

zTs 1 = Ms minus 1

xs las n

as ge 0λas isin Das (x) zas isin 0 1

For the ellipsoidal model Problem (10) can be written as Prob-lem (11) which is a Mixed-Integer Quadratic-Constrained Program(MIQCP) that we solve using the back-end optimizer Gurobi [27]

maxxλzln

sumsisinS0

xs

st xs minus las + nas = ras + λa1s minus λa2s minus hasTEasλ

a3s

las le Bzas nas le Bzas

zTs 1 = Ms minus 1 (11)

xs las n

as λ

a2s ge 0 λa3s ge 0 zas isin 0 1

λa3s2 le λa2s xminus λa1s1 + EasTλa3s = 0

42 Verification EngineAfter fixing a candidate strategy σc all the non-determinism inMC is resolved The VE has thus the task of checking whetherthe induced Ellipsoidal-MC (EMC) Mσc

C = (S S0ΩF X L)satisfies the PCTL formula φ for all resolutions of uncertainty ieforallηa isin Nat We use the sound and complete verification algo-rithm presented in [11 12] (we remember that EMCs are a special

case of CMDPs) If the EMC satisfies φ then the optimal strat-egy has been found and σlowast = σc Otherwise the VE needs togenerate an additional constraint to be passed to the OE so thatthe same candidate solution does not get selected anymore If vec-tor zc = [zas0 middot middot middot z

asN ] collects all the binary decision variables that

were set to zero in the previous round of optimization ie the vari-ables corresponding to the previously selected actions we just needto add constraint

zTc 1 ge 1 (12)

so that it is not possible to select the same set of actions againAs an example we optimize the total expected reward of the

EMDP in Fig 1 subject to property φ = Pge08[ϑ U abs] The firstiteration of the OE generates strategy σc1 which selects actions[b a a a] for states [s0 middot middot middot s3] with Wσc1

s0 = 10625 The VEthough reports Pσc1mins0 [ϑ U abs] = 0207 so the strategy is re-jected The VE adds the constraint zbs0 + zas1 + zas2 + zas3 = 1 tothe OE formulation which generates at the second generation σc2([a a a a]) with Wσc2

s0 = 10188 The VE computes the valuePσc2mins0 [ϑ U abs] = 1 and the algorithm terminates reportingσlowast = σc2

43 Algorithm AnalysisWe prove soundness (if a strategy σlowast is returned it indeed opti-

mally solves Problem (5)) completeness (if no solution is returnedno strategy σ isin ΣMD exists such thatMC σ |=Nat φ) and ana-lyze the runtime performance of the proposed algorithm

Theorem 42 The algorithm presented in this section to solveProblem (5) is sound complete and has runtime exponential in thesize R of the EMDP and polynomial in the size Q of the PCTLspecification φ

Proof Problem (10) returns the MD strategy σc that maximizesWσS0

among those still available By Lemma 41 the VE is soundso if it returns MC σc |=Nat φ indeed σlowast = σc (exit arrow atthe bottom of Fig 2) The VE is also complete so if it returnsMC σc 6|=Nat φ the current σc can be discarded This is doneby generating a constraint of the form of Constraint (12) whichremoves only the current σc from the strategies to be explored bythe OE This proves soundness of the overall algorithm

Failure of finding a solution is declared only by the OE whenProblem (10) becomes unfeasible because all available strategieshave previously been discarded by the VE (exit arrow at the top ofFigure 2) This proves completeness

Finally the algorithm goes at most through I = O(MN ) itera-tions (from Section 2 I is the total number of MD strategies M isthe number of actions and N the number of states of the EMDP)Each requires solving an instance of Problem (10) and a verificationcheck (done in time polynomial inR andQ by Lemma 41) Prob-lem (10) can be solved by branch-and-bound algorithms in time ex-ponential in the number of binary variables (whose number isMNand remains constant across iterations) and polynomial in the num-ber of constraints (whose number is polynomial in R at the firstiteration and it grows at each iteration limited by I = O(MN ))The total complexity isO(MNtimes

(2MNtimespoly(MN )+poly(Q))

)

exponential inR and polynomial inQ

The algorithm performs better on problems that do have a fea-sible solution arguably the most interesting ones while the opti-mization step could be removed if the goal is to prove unfeasibilityAs an alternative σlowast could be determined by testing all I avail-able MD strategies and selecting the one with the highest rewardamong those satisfying φ [13] We believe (and experimentallyshow in Section 6) that our approach can achieve better runningtime by decoupling the problem into an optimization and a veri-fication part and by testing strategies in order of optimality Fi-

Real-time price13

Day-ahead price13

Network Operator13Energy dispatch and pricing13

Base-line13 Wind13 Fast-start13

Traditional13Users13

Opportunistic13Users13

Day-ahead13dispatch13

Real-time dispatch13

Uncertain demand13

Uncertain supply13

Supply13

Demand13

Fig 3 Network operator inputs (dashed) and outputs (solid)

nally speed-ups can be obtained by implementing online routinesfor integer-constraint simplification and to produce more succinctcertificates of unfeasibility from the VE [2425] These routines areoutside the scope of this paper and will be covered in future work

5 SUPPLY SCHEDULING AND ENERGYPRICING WITH RENEWABLE SOURCES

We model the pricing and dispatch problem following the sce-nario sketched in Fig 3 [7 10 28] More details are availablein [12]

Three agents operate in the system the network operator andtwo types of users traditional and opportunistic Further three en-ergy sources are available non-dispatchable wind and two dis-patchable sources baseline and fast-start generators We will usestochastic variables WDt Do to represent wind energy supplyand traditional and opportunistic user demand respectively

The network operator takes two kinds of decisions 1) dispatchof non-renewable sources to guarantee that the aggregate energysupply matches the demand 2) pricing of energy to maximizeprofits and incentivize users to join or leave the network depend-ing on energy availability Traditional and opportunistic users re-act to pricing decisions on different timescales Traditional usersonly react to day-ahead pricing to decide how much energy theyare willing to purchase while opportunistic users are capable ofrescheduling in real-time their energy demand depending on theenergy price in exchange of lower expected prices

A 24-hour period gets divided into T1-slots of equal length (egT1 = 1h) and each T1-slot into K T2-slots (eg T2 = 30minK = 2) [10] The operator maximizes its economic profit by tak-ing decisions on the two time scales On day-ahead for each T1-slot it dispatches Q units of baseline energy (unit cost c1) withq = QK units per T2-slot and sets the price for traditional users(u) so that they can decide one day ahead when to schedule theirdemand in the following day The choice of u sets the expected de-mand of traditional users (E[Dt]) In real-time for each T2-slot theoperator first observes the values of traditional-user demand dt andwind availabilityw It then sets the price for opportunistic users (v)to determine the expected demand of opportunistic users (E[Do])Third it dispatches the production of more fast-start energy (unitcost c2) or cancels part of the already dispatched baseline energy(paying only unit cost cp instead of c1) depending on wind avail-ability and user demand to balance supply and demand In realscenarios cp lt c1 lt c2 and wind-energy is assumed to be free forbrevity [10] The operator thus tries to use as much wind energyas possible and to dispatch on day-ahead the exact amount of base-line energy not to incur in real-time in cancellation costs or in theextra cost for fast-start supplies Since more profitable strategiesmight imply a higher reliance on the uncertain wind energy or anincrease in energy prices correct system functionality needs limitson energy-unbalance risks and QoS guarantees for users

There are three sources of stochastic behavior (i) wind-energysupply (W ) (ii) traditional (Dt) user demand (iii) opportunistic(Do) user demand We thus use a stochastic optimization frame-work The result of the optimization should return optimal strate-

wα 0 0

w1αdt1αdo1γ

w1αdt1αdo1β

k=013 k=113 k=213

v1a

v1a

v1av1

b

v1b

Qa ua

Qb ub

Qa ub

w1β 0 0

s0

0 0 0

w1α 0 0w1αdt1α 0

w1β dt1β 0w1β dt1α 0

w1αdt1β 0

w1αdt1αdo1α

w1αdt1αdo1δv1b

w2β 0 0

w2α 0 0

wαdtαdoγ

wαdtαdoβva

va

vavb

Qa ua

Qb ub

Qa ub

wβ 0 0

s0

0 0 0

wαdtα 0

wβ dtβ 0wβ dtα 0wαdtβ 0

wαdtαdoα

wαdtαdoδvb

vb

ab uQ

ab uQ

Fig 4 Sketch ofME (top) andMprimeE (bottom) The latter isME

unfolded across decision epochs For each EMDP the initial states0 is shown on the left Each state is represented by the tuple(w dt do) of observations of WDt Do The pairs (Q u) rep-resent the day-ahead decisions about dispatch of baseline energyQand energy pricing for traditional users u Two arrows per decisiondepart from each state because in this figure we assumed only twodiscretization levels for each quantity (labeled with greek lettersα β middot middot middot ) We only show the state graph in the EMDPs related todecision (Qa ua) but similar state graphs are present also for theother decision pairs (Qa ub) (Qb ua) (Qb ub) as hinted by thedashed arrows departing from s0

gies about 1) the day-ahead decisions (Q and u) and 2) the real-time decisions (v) for each possible observation of W and Dt Inreal-time the actual decision (vobs) will be taken deterministicallyamong the synthesized ones based on the observed values w anddt ie the actual wind availability and traditional-user demandIn summary we will optimize over one T1-slot (the decision prob-lem is periodic so we can run one optimization for each T1-slotstand-alone) and aim to determine optimal values for Q u and v

We use the Ellipsoidal-MDPME = (S S0 AΩF AX L)sketched in Fig 4 (top) All quantities are bounded and uniformlydiscretized to keep the state and action spaces finite States s isin Sare a tuple s = (w dt do) where w dt do refer to the observedvalues of available wind energy and user demand in that state Weconsider only one T1-slot per time so we model only one choiceof day-ahead energy dispatch Q and pricing for traditional usersu (at the initial state s0 (Q u) isin A) The process then transi-tions through K decision epochs as follows First values of windenergy (w) and traditional user demand (dt) are stochastically cho-sen according to the corresponding distributions (described below)Second for each observation of W and Dt a decision on v isin A ismade and the opportunistic user demand (do) is stochastically cho-sen To transition between epochs a new value of wind energy w(the backward arrow in Fig 4 (top)) is chosen and the steps repeat

In general the optimal decision v at each epoch depends onprevious observations of wind availability (w) and user demand(dt do) so v is history-dependent To synthesize control strategiesusing the algorithm in Section 4 we unfold the sequence of deci-sion epochs in the EMDPMprimeE = (Sprime S0 A

primeΩF primeAprimeX prime Lprime)in Fig 4 (bottom) InMprimeE we have explicitly marked each quan-tity with an additional subscript k = 1 middot middot middot K to refer to the cor-responding T2-slot Each state s isin S ofME (apart from the initialstate) has been replicated K times in MprimeE s rarr s1 s2 middot middot middot sK After K decision epochs the states transition to an absorbing state(not shown in Fig 4) Since now all decision epochs are explicitly

codified in the model Markov strategies are optimal forMprimeE Thecorresponding optimal history-dependent strategy forME can bereconstructed for each s isin S by collecting the sequence of optimaldecisions returned by the algorithm in the replicas s1 s2 middot middot middot sK In the following we will only considerMprimeE As an additional ad-vantage different wind distributions can be used to transition be-tween different epochs inMprimeE (only one distribution could be usedinME) to account for the time-varying nature of wind availabilityThe modeling expressivity is thus increased

State transition probabilities are computed using the followingstochastic models

User demand The demand of both traditional (Dt) and oppor-tunistic (Do) users is modeled using Gaussian distributions [10]with Dt sim N (αtu

γt βtE[Dt]) and Do sim N (αovγo βoE[Do])

Parameter γt lt 0 (γo lt 0) is the elasticity of the traditional (op-portunistic) users ie the ratio of the percentage change of theexpected demand to that of price variations formally defined asγt = uE[Dt] middot partE[Dt]partu Parameters αt αo βt βo are fitting pa-rameters To compute transition probabilities we truncate and dis-cretize the continuous PDs in equally-sized intervals pick the mid-dle point of each interval as the discretization value and then inte-grate the PD across the interval to determine how likely the systemtransitions to that discretized value2

Wind-energy availability We created a stochastic model of theavailable wind energy starting from measured data collected fromthe wind farm at Lake Benton Minnesota USA [29] The goal is totake forecast values into account while also considering the intrin-sic inaccuracies of these predictions First we compute the (dis-crete) empirical PD microW of a training set of collected wind-energydata Second we divide a new set of data in T2-slots and considerthe average value for each new T2-slot as the forecast energy valueWe then scale microW to have such expected value E[Wk] thus obtain-ing microWk Finally we compute ellipsoidal Sets (3) F primeas (collectedinF prime) to represent uncertainty in the transition probability betweentwo discretized energy levels in two consecutive T2-slots Transi-tion frequencies are computed by counting observed transitions in atraining set of data Using classical results from statistics [17] wethen compute the value of parameter β from Set (1) correspond-ing to the confidence level CL in the measurements In particular0 le CL le 1 and CL = 1minus cdfχ2

d(2 lowast (βmax minus β)) where cdfχ2

d

is the cumulative density function of the Chi-squared distributionwith d degrees of freedom (d is equal to the number of bins used todiscretize W ) Moreover an increasing value of parameter β canbe used in the sequence of decision epochs to model the fact thatforecast farther-away in time are less accurate

We provide the states with thick circles in Fig 4 (bottom) withthree reward structures to express the profit and risk of the oper-ator and the QoS for the users We choose those states becausethe quantities Dtk DokWk are all fully observable in them thusallowing the computation of the rewards We set

rProfitsk [$] = udtk+vkdokminus(cp∆k+c1(qminus∆k))1A∆kge0

minus(c1qminusc2∆k)1B∆klt0 (13a)

rLoLsk [MWh] = max(0 XE[Wk] + Y E[Dtk +Dok]minus∆k)(13b)

rQualitysk [MWh] = dtk + dok (13c)

with ∆k = wk + q minus dtk minus dok representing the surplus ofsupply on demand X and Y defined in the following and 1 theindicator function Reward (13a) subtracts operating costs to the

2For simplicity we assume that the distributions of Dt and Doare known with certainty In fact these distributions are usuallyestimated more accurately than the wind forecast by analyzing thehistory of measured data [10] Adding uncertainty models also forthese distributions is left as future work

operator revenue to compute the net profit Indicator 1A (1B) cor-responds to the scenario when the sum of day-ahead dispatchedand wind energy is sufficient (insufficient) to cover the demandIn the latter case fast-start energy needs to be dispatched in real-time Reward (13b) computes the Loss of Load (LoL) In prac-tical scenarios the amount of fast-start energy available in real-time is limited Often this limit is computed with the formulaFS le XE[W ] + Y E[Dt +Do] (eg X = 3 Y = 10) [30]If ∆ + FS lt 0 the network incurs in a LoL with potentiallyrisky consequences Reward (13c) accounts for user demand in-centivized by energy pricing

Finally we mark all states with ∆ +FS lt 0 with the label riskand use label abs for the absorbing state so Ω = risk abs

The optimal strategy σlowast = (ulowast Qlowast vlowastk) 1 le k le K is thesolution of problem

Wlowasts0 = maxQu

minfas isinFprimeas

Eσfas

W EσDt maxv1middotmiddotmiddotvK

EσDorewrProfit(πK)

stMprimeE σlowast |=Nat φ where (14)

φ = RrLoL

leEENSM [C abs] andRrQuality

geQoSm [C abs]

and Pge1minusLoLPM [notrisk U abs]

In Problem (14) we maximize the expected value of the operatorprofit Wlowasts0 under the worst-case resolution of uncertainty in thewind-energy forecast by summing the instantaneous state rewardsrProfits along the paths π isin Πfin of K steps ofMprimeE correspond-ing to the K T2-slots By replicating the decision process for eachof the K T2-slots while building modelMprimeE every K-step execu-tion path traverses all decision epochs so the computed reward asintroduced in Definition 26 and 27 will account for the sum of thecontributions of each decision epoch

According to the semantics defined in Table 1 the PCTL specifi-cation φ constrains the expected operator risk and user QoS acrossthe decision horizon EENSM is the desired maximum value ofExpected Energy Not Served LoLPM is the maximum allowedvalue of Loss of Load Probability (these two properties limit therisk for the operator) andQoSm is the minimum value of QoS thatneeds to be guaranteed to the users

6 EXPERIMENTAL RESULTSWe implemented the algorithm in Python and interfaced it with

PRISM [31] as a front-end for entering models and with Gurobi [27]as the back-end optimizer Experiments were run on a 24 GHz In-tel Xeon with 32GB of RAM

In this section we present experimental results obtained by solv-ing Problem (14) Our goals are to give insight about the algo-rithm functionality compare its scalability to other strategy synthe-sis approaches and show that the synthesized energy pricing anddispatch strategies can achieve better performances than other so-lutions presented in the literature

We define the following quantities

Profit = Wlowasts0EENS = Rr

LoLσlowastmaxs0 [C abs]

QoS = RrQualityσlowastmins0 [C abs]

1minus LoLP = Pσlowastmin

s0 [notrisk U abs]

where the min and max operators refer to the action range of na-ture Nat As defined in Section 22 these quantities represent thequantitative values of rewards and satisfaction probability that getthen compared to the corresponding thresholds (EENSM QoSmLoLPM ) in Problem (14) to determine the satisfaction of φ Wewill then normalize the Profit to ProfitM the maximum com-puted profit value for each set of experiments We set T1 = 1hT2 = 30min so K = 2 and consider two pricing options bothfor traditional and opportunistic users If not otherwise stated

Fig 5 Performance vs iteration

Fig 6 Profit vs confidence in forecast

we will use CL = 90 and discretize the wind energy W in 5bins and traditional Dt opportunistic Do demands and baselinesupply Q in 2 bins We set QoSm = 80

sumk E[Dtk +Dok]

LoLPM = 10 EENSM = 5sumk E[Wk + q] The other pa-

rameter values were taken from [10]In Fig 5 we show the trend of the expected system performance

as a function of the synthesis algorithm iteration The Profitmonotonically decreases until the proposed candidate strategy σcmeets all specifications We note that 1minusLoLP andQoS (EENSnot shown has a trend similar to 1 minus LoLP ) instead vary non-monotonically Intuitively this is because the Profit can be in-creased either by scheduling less baseline energy Q to reduce can-cellation costs cp but incurring in a higher risk or by increasing the

Table 2 Performance Analysis

W bins 5 10 15 20Profit 1 098 097 0965

1minus LoLP 099 099 099 099EENS 098 098 098 098QoS 101 101 101 101

Runtime 144s 400s 1368s 3289sIter 223 53 547 332N + Tr 1343 2719 4115 5591

MD Strat (I) 4096 42e6 43e9 44e12

005 010 015 020 025 030082084086088090092094096098100

Pro

fit

Pro

fit M

Performance Comparison vs Varaiya et al [7] and He et al [10]

Ours[7][10]

005 010 015 020 025 03000020406081012

EE

NS

EE

NS M

Ours[7][10]

005 010 015 020 025 030Wind Penetration ηW

088090092094096098100102

QoS

QoS

m Ours[7][10]

Fig 7 Comparison to alternative approaches via Monte Carlo sim-ulation

energy price v for opportunistic users with consequent reduction ofQoS (the peaks of 1 minus LoLP are aligned to the valleys of QoS)By ranking strategies by expected Profit our algorithm is capa-ble of selecting the optimal strategy despite the complex parameterinterdependences in the model under analysis

In Table 2 we compare synthesis results while varying the num-ber of discretization bins for W (all values are normalized to thecorresponding specification) First we note that the expected sys-tem performances do not substantially vary by changing the num-ber of bins supporting our choice of 5 bins in the other experi-ments Second runtime results show that the algorithm can handlein reasonable time problems more than 10times larger than the onesanalyzed in [14] the only other algorithm in the literature capableof processing arbitrary PCTL formulas (we useN+Tr the sum ofstates and transitions in the EMDP as a proxy of the model size)Third we note that the alternative approach of verifying all the IMD strategies σ isin ΣMD is not practical due to the exponentialincrease of I (defined in Section 211) with the problem size

In Fig 6 we study the effect of different confidence levels CL inthe wind-energy forecast on the expected operator Profit whilevarying the value of wind penetration ηW =

sumKk=1

E[Wk]E[Wk + q]

and keeping all constraints constant At high CL higher profitscan be expected for increasing ηW (wind energy is assumed free)On the other hand for low CL higher wind penetration createsmore uncertainty thus lowering the expected Profit The networkoperator can use these curves to assess the return of investment inemploying more accurate (and expensive) forecast techniques

Finally in Fig 7 we compare results with two other energy-pricing formulations proposed in the literature He et al [10] solvethe optimization problem without enforcing any constraint Varaiyaet al [7] only put limits on the acceptable LoLP (their approach isnot trivially extendable to reward properties expressed using the Roperator) and solve optimally only for LoLP = 0 Comparison isdone by solving the different optimization problems and then run-ning Monte Carlo simulations (1000 runs) of the controlled systemon test data (different from the training ones) to evaluate its per-formance (Profit EENS and QoS) As expected the uncon-strained strategy from [10] has higher Profit (up to 5) but alsoup to 12 more EENS and 10 less QoS compared to our ap-proach The strategy from [7] guarantees null EENS but it hasup to 6 less Profit (due to over constraining EENS) and 10less QoS (which is left unconstrained)

As a final remark we note that runtime may increase exponen-tially as we tighten the specification thresholds (QoSm LoLPM EENSM ) since it becomes increasingly more difficult to find a

solution within the exponentially-sized search space Neverthelessthe chosen values were tight enough to improve the quality of alter-native energy pricing and dispatch strategies proposed in the litera-ture while maintaining the runtime acceptable for this application

7 CONCLUSIONS AND FUTURE WORKWe proposed the first sound and complete algorithm for the syn-

thesis of control strategies for MDPs satisfying properties expressedin PCTL and robust to uncertainties in the transition probabilitiesWe then applied the algorithm to the problem of energy pricingand dispatch in smart grids with renewables Results showed thatnetwork-operator risks can be effectively constrained at design timeand that more accurate predictions of the expected profit can be ob-tained by taking the uncertainty of wind availability into account

As future work we plan to investigate techniques to generatemore concise constraints to prove failure of the verification step inorder to prune more effectively the search space for the optimiza-tion engine and to apply the proposed strategy synthesis approachto further case studies eg semi-autonomous car driving

Acknowledgments The authors thank J Venkatesh and profT Simunic Rosing for providing the measurement data used in thecase study The research was partially funded by DARPA AwardNumber HR0011-12-2-0016 and by TerraSwarm one of six cen-ters of STARnet a Semiconductor Research Corporation programsponsored by MARCO and DARPA Approved for public releasedistribution is unlimited The content of this paper does not neces-sarily reflect the position or the policy of the US government andno official endorsement should be inferred

8 REFERENCES[1] Courcoubetis C and Yannakakis M ldquoThe Complexity of

Probabilistic Verificationrdquo Journal of ACM vol 42(4) pp857ndash907 1995

[2] H Hansson and B Jonsson ldquoA Logic for Reasoning AboutTime and Reliabilityrdquo Formal Aspects of Computing vol6(5) pp 512ndash535 1994

[3] A Bianco and L de Alfaro ldquoModel Checking ofProbabilistic and Nondeterministic Systemsrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 1995vol 1026 pp 499ndash513

[4] M Puterman Markov Decision Processes DiscreteStochastic Dynamic Programming John Wiley and Sons1994

[5] ldquoInternational Energy Agency World Energy Outlook 2009rdquo[Online] httpwwwworldenergyoutlookorgdocsweo2009WEO2009_es_englishpdf

[6] A Gore An Inconvenient Truth New York Rodale 2006[7] P Varaiya F Wu and J Bialek ldquoSmart Operation of Smart

Grid Risk-Limiting Dispatchrdquo Proceedings of the IEEEvol 99 no 1 pp 40ndash57 2011

[8] K Porter and J Rogers ldquoSurvey of Variable GenerationForecasting in the Westrdquo in NREL Subcontract ReportSR-5500-54457 April 2012 p 56

[9] S Koch et al ldquoModeling and Control of AggregatedHeterogeneous Thermostatically Controlled Loads forAncillary Servicesrdquo Proc of PSCC pp 1ndash7 2011

[10] M He S Murugesan and J Zhang ldquoMultiple TimescaleDispatch and Scheduling for Stochastic Reliability in SmartGrids with Wind Generation Integrationrdquo in Proc of IEEEINFOCOM 2011 pp 461ndash465

[11] A Puggelli et al ldquoPolynomial-Time Verification of PCTLproperties of MDPs with Convex Uncertaintiesrdquo in Proc of

CAV ser LNCS Springer Berlin Heidelberg 2013 vol8044 pp 527ndash542

[12] A Puggelli ldquoFormal Techniques for the Verification andOptimal Control of Probabilistic Systems in the Presence ofModeling Uncertaintiesrdquo PhD dissertation University ofCalifornia Berkeley Berkeley CA 2014

[13] C Baier et al ldquoController Synthesis for ProbabilisticSystemsrdquo in Exploring New Frontiers of TheoreticalInformatics Springer US 2004 vol 155 pp 493ndash506

[14] M Lahijanian et al ldquoTemporal Logic Motion Planning andControl with Probabilistic Satisfaction Guaranteesrdquo IEEETransactions on Robotics vol 28 no 2 pp 396ndash409 2012

[15] M Kwiatkowska and D Parker ldquoAutomated Verificationand Strategy Synthesis for Probabilistic Systemsrdquo in Proc ofATVA ser LNCS vol 8172 Springer 2013 pp 5ndash22

[16] K Sen et al ldquoModel-Checking Markov Chains in thePresence of Uncertaintiesrdquo in Proc of TACAS ser LNCSSpringer Berlin Heidelberg 2006 vol 3920 pp 394ndash410

[17] A Nilim and L El Ghaoui ldquoRobust Control of MarkovDecision Processes with Uncertain Transition MatricesrdquoJournal of Operations Research pp 780ndash798 2005

[18] M Y Vardi ldquoAutomatic Verification of ProbabilisticConcurrent Finite State Programsrdquo in Proc of SFCS 1985pp 327ndash338

[19] V Forejt et al ldquoAutomated Verification Techniques forProbabilistic Systemsrdquo Formal Methods for EternalNetworked Software Systems vol 6659 pp 53ndash113 2011

[20] F Bouffard and F Galiana ldquoStochastic Security forOperations Planning with Significant Wind PowerGenerationrdquo in IEEE Transaction on Power Systems 2008vol 23 no 2 pp 306ndash316

[21] A Kucera and O Stražovskyacute ldquoOn the Controller Synthesisfor Finite-State Markov Decision Processesrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 2005vol 3821 pp 541ndash552

[22] K Draeger et al ldquoPermissive Controller Synthesis forProbabilistic Systemsrdquo in Proc of TACAS ser LNCSSpringer 2014 pp 531ndash546

[23] T Brazdil et al ldquoStochastic Games with Branching-TimeWinning Objectivesrdquo in Proc of IEEE SLCS Aug 2006 pp349ndash358

[24] C Hang et al ldquoSynthesizing Cyber-Physical ArchitecturalModels with Real-Time Constraintsrdquo in Proc of CAVBerlin Heidelberg Springer-Verlag 2011 pp 441ndash456

[25] P Nuzzo et al ldquoCalCS SMT Solving for Non-linearConvex Constraintsrdquo in Proc of FMCAD 2010 pp 71ndash80

[26] S Boyd and L Vandenberghe ldquoConvex OptimizationrdquoCambridge University Press 2004

[27] ldquoGurobi Optimizerrdquo [Online] httpwwwgurobicom[28] S Stoft Power System Economics Designing Markets for

Electricity NewYork Wiley 2002[29] Y Wan ldquoLong-term Wind Power Variabilityrdquo NREL

Technical Report TP-5500-53637 [Online]httpwwwnrelgovdocsfy12osti53637pdf January 2012

[30] E Ela et al ldquoOperating Reserves and Variable GenerationrdquoNREL Technical Report [Online]httpwwwnrelgovdocsfy11osti51978pdf August 2011

[31] M Kwiatkowska et al ldquoPRISM 40 Verification ofProbabilistic Real-Time Systemsrdquo Proc of CAV pp585ndash591 2011

Page 4: Robust Strategy Synthesis for Probabilistic Systems Applied to …sseshia/pubdir/emsoft... · 2014-08-07 · Robust Strategy Synthesis for Probabilistic Systems Applied to Risk-Limiting

where we minimize across the action range ηa isin Nat of the ad-versarial nature the expected reward over all paths starting from swith horizon T visited under strategy σ

We will only consider CMDPs such that Wσs exists and it is

finite foralls isin Sforallσ isin ΣMD These include finite and infinite-horizon CMDPs (T isin N cup +infin) with zero-reward absorbingstates (ω = abs) as the ones used in Section 5 An extension toother classes of CMDPs eg with negative rewards is possible insome specific cases but outside the scope of the paper For moredetails see [4] Finally according to Assumption 22 we can sub-stitute ηa isin Nat with fas isin Fas and only consider MD natures

22 Probabilistic Computation Tree LogicWe use Probabilistic Computation Tree Logic (PCTL) a proba-

bilistic logic derived from CTL which includes a probabilistic op-erator P [2] and a reward operator R [19] to express properties ofCMDPs [3] The syntax of the fragment of the logic in which oper-ators with a finite time bound are disallowed is defined as follows

φ = True | ω | notφ | φ1 and φ2 | P1p [ψ] | Rr1v[ρ] state formulasψ = X φ | φ1 U φ2 path formulasρ = C φ rewards

with ω isin Ω an atomic proposition 1isin le ltge gt p isin [0 1]v isin Rge0

Path formulas use the Next (X ) and Unbounded Until (U ) op-erators They are evaluated over paths and only allowed as param-eters to the P operator Reward formulas use the Cumulative (C )operator The sizeQ of a PCTL formula is defined as the number ofBoolean connectives plus the number of P and R operators in the

formula We define Pσηa

s [ψ]def= Prob

(π isin Πσηa

s | π |= ψ)

the probability of taking a path π isin Πs that satisfies ψ under strat-egy σ and nature ηa Pσmaxs [ψ] (Pσmins [ψ]) denotes the maxi-mum (minimum) probabilityPση

a

s [ψ] across all natures ηa isin Natfor a fixed strategy σ An analogous definition holds also for re-ward properties which can be expressed also using (multiple) re-ward structures different from the one used to maximize the totalexpected reward For a CMDP MC strategy σ and property φwe will useMC σ |=Nat φ to denote that when starting from anyinitial state s isin S0 and operating under σMC satisfies φ for anyηa isin Nat The semantics of the logic is reported in Table 1 wherewe write |= instead of σ |=Nat for brevity

3 RELATED WORKRelated work falls into two main categories renewable-energy

pricing in smart grids and strategy synthesis from PCTL specifica-tions for probabilistic systems

The integration of renewable energy sources in power grids hasmotivated the development of stochastic frameworks to solve theenergy-pricing problem The work in [20] presents a risk-limitingoptimization framework That effort focuses on modeling the un-certainty in energy availability on the supply side but it does not

Table 1 PCTL semantics forMC s |= Trues |= ω iff ω isin L(s)s |= notφ iff s 6|= φs |= φ1 and φ2 iff s |= φ1 and s |= φ2

s |= Pp(p) [ψ] iff Pσmax(min)s [ψ] p(p)

s |= Rv(v) [C φ] iff Eσmax(min)s [rewr(π tφ)] v(v)

π |= X φ iff π[1] |= φπ |= φ1 U φ2 iff existi ge 0 | π[i] |= φ2 and forallj lt i π[j] |= φ1

rewr(π tφ) = Σtφt=0rs(πs[t]) + ra(πs[t] πa[t])

tφ = mint | πs[t] |= φ

consider the problem of controlling user demand through economicincentives The effectiveness of demand response in balancing sup-ply and demand in power grids was studied in [9] and a stochas-tic framework to optimize operator profits was presented in [10]We closely follow the optimization setup presented in these worksbut we also constrain the operator risk and the user QoS at syn-thesis time Finally Varaiya et al argued the need to quantita-tively constrain the operator risk in [7] The risk-limiting dispatchapproach proposed in that work is optimal for properties of theform Pge1 or Ple0 but sub-optimal for properties with satisfactionthreshold p isin (0 1) Further QoS is not considered

The problem of strategy synthesis for MDPs from PCTL speci-fications was first studied in [13] Strategies are divided into fourcategories depending on 1) whether the transition is chosen deter-ministically (D) or randomly (R) 2) the choice does (does not) de-pend on the sequence of previously visited states (Markovian (M)and History-dependent (H)) Also it is proven that the four types ofstrategies form a strict hierarchy (MD ≺ MR ≺ HD ≺ HR) andthat determining whether it exists an MDMR (HDHR) strategythat meets all specifications is NP-complete (elementary) Kuceraet al [21] show how to synthesize MR controllers that are robustto linear perturbations via a reduction to a formula in the first-order logic of the reals This work is the closest to ours albeitwe consider also non-linear models of uncertainties Also to thebest of our knowledge the algorithm has not been applied to anycase study In [14] and [15] routines for the model-checking ofPCTL properties of MDPs are adapted to the strategy synthesisproblem These algorithms are polynomial in the model size butthey are not complete (ie there might exist a solution even if noneis reported) [14] or can handle properties with only one quanti-tative operator [15] In [22] the synthesis of multi-strategies forMDPs is studied This approach can handle only a subset of PCTLproperties and it only considers MDPs with no uncertainties Fi-nally [23] studies two-player games with winning objectives ex-pressed in PCTL Also our formulation can be interpreted as agame where the controller plays against nature On the other handfollowing the CMDP semantics defined in Assumption 22 we givenature the power of selecting a different strategy at each executionstep while [23] aims to analyze the more complex problem of find-ing the single optimal strategy for nature Our formulation is usefulto model time-varying processes (eg the power generated by thewind) and an optimal strategy for the controller can be synthesizedalgorithmically The existence of an optimal controller and naturein the formulation of [23] is instead undecidable in general

4 CONSTRAINED TOTAL EXPECTED RE-WARD MAXIMIZATION FOR CMDPS

We formally define the optimization problem under analysis provethat its decision problem version is NP-complete and present an al-gorithm to solve the former

Constrained Total Expected Reward Maximization for EMDPsGiven an EMDPMC a reward structure r and a PCTL formulaφ determine strategy σlowast forMC such that

σlowast = argmaxσisinΣMD

φ

WσS0

(5)

where WσS0

is the sum of the expected rewards over all the initialstates s isin S0 and ΣMD

φ is the set of Markov-Deterministic strate-gies for whichMC satisfies φ for any ηa isin Nat starting from anys isin S0 and operating under σ isin ΣMD

φ The same results hold for the dual problem of minimizing the

EMDP total expected cost after substituting all ldquomaxrdquo (ldquominrdquo) op-erators with ldquominrdquo (ldquomaxrdquo) in the derivations below

We will use the following lemmas (the proofs are available in thereferences)

Opmizaon13 Engine13 (strategy13 ranking)13

Feasible13 Candidate13

op0mal13 solu0on13 NO13 addi0onal13 constraint13

Op0mal13 strategy13 that13 sa0sfies13 proper0es13

No13 strategy13 sa0sfies13 proper0es13

Verificaon13 Engine13 (verifies13 if13 properes13 are13 sasfied)13

Unfeasible13

YES13

Fig 2 Lazy algorithm for the constrained optimization

Lemma 41 Complexity of PCTL model-checking forCMDPs [11] Verifying that a CMDP MC satisfies a PCTL for-mula φ is solvable in polynomial time

Lemma 42 Computation of the total expected reward for Con-vex Markov Chains (CMCs) [17] Given a CMC its total ex-pected reward is computable in polynomial time

Lemma 43 Complexity of PCTL strategy synthesis forMDPs [13] The problem of determining the existence of an MDstrategy σ for an MDPM such thatM σ |= φ is NP-complete

We now prove that the decision problem version of Problem (5)is NP-complete

Theorem 41 The problem of determining the existence of an MDstrategy σ for an EMDP MC with total expected reward Wσ

S0

larger or equal to WT and satisfying a specification φ in PCTLis NP-complete

Proof Given a candidate solution σc we can in polynomial time1) check whether MC σc |=Nat φ by Lemma 41 2) computeWσcS0

on the induced CMCMσcC by Lemma 42 σc is a solution if

and only if check 1) passes and WσcS0ge WT This proves that the

problem is in NP To prove NP-hardness we reduce the problem inLemma 43 to the one under analysis We set WT = 0 and de-scribe transition probabilities with point ellipses ie ellipses withnull axes (Kas rarr +infin foralls isin S foralla isin A(s) in Equation (2))

In the rest of the section we describe an algorithm to solve Prob-lem (5) We use a lazy approach based on strategy-iteration con-ceptually similar to the ones proposed in [24] and for non-linearconstraints [25] As shown in Fig 2 the algorithm is split into twomain routines communicating in a loop At each iteration the opti-mization engine (OE) is responsible to generate a candidate strategyσc σc maximizes the total expected reward but it does not neces-sarily satisfy PCTL property φ since the OE formulation does notcontain any information about φ The candidate solution is thenpassed to the verification engine (VE) which checks whetherMCsatisfies φ for all resolutions of uncertainty ie forallηa isin Nat whenoperating under σc If the check passes σlowast = σc and the algo-rithm terminates Otherwise the VE generates an additional con-straint for the OE to prevent the previous σc to be selected againand the loop repeats The novelty of our approach is devising amathematical formulation for the OE capable of generating at eachiteration the candidate strategy that maximizes the total expectedreward among those still available (at each iteration in which theVE reports a failure the candidate strategy is discarded) The firstcandidate strategy that satisfies the PCTL specification becomes the

solution of the synthesis problem Such a strategy is feasible (sat-isfies φ) and it has the maximum total reward among strategies thatsatisfy φ so it solves Problem (5) exactly

In general there is not guarantee on the uniqueness of the op-timal strategy σlowast ie multiple strategies with the same expectedreward might exist each satisfying φ Since every such strategyis equivalent from the user perspective the algorithm just reportsthe first one found Alternatively all equivalent strategies could begenerated by continuing the iteration between the OE and the VEuntil finally a reduction in the expected reward gets detected andby reporting all synthesized strategies satisfying φ

The next subsections give details on the OE and VE and analyzethe algorithm properties

41 Optimization EngineWe start with the classical linear-programming (LP) formulation

to maximize the total expected reward for MDPs [4]

minxl

sumsisinS0

xs

st xs minus las = ras + xTfas foralls isin Sforalla isin A(s) (6)xs l

as ge 0 foralls isin Sforalla isin A(s)

Vector x collects the total expected reward for each state s isin S (atthe end of the optimization Wσc

s = xs foralls isin S) and the cost func-tion sums the total expected rewards for all the initial states s isin S0We then set Wσc

S0=sumsisinS0

xs Variables las are slack variablesfor each constraint Also ras = rs(s) + ra(s a) foralls isin Sforalla isin ASince the slack variables have negative sign the slack can only benegative ie the left-hand side (LHS) can only be larger or equalthan the right-HS (RHS) The ldquominrdquo operator makes sure that foreach state the constraint with the highest RHS has null slack ielas = 0 The optimal strategy can then be reconstructed by selectingthe action a isin A(s) foralls isin S corresponding to the constraint withnull slack eg σc(s0 a) = 1 if las0 = 0 Our goal is to modifysuch a formulation to allow a sub-optimal solution to be selected incase the optimal solution does not satisfy the PCTL specificationWe now describe an equivalent formulation to the original prob-lem that is more suitable to achieve this goal We will describe inSection 42 how to add constraints to this formulation to select sub-optimal solutions in order of optimality We refer to Problem (7)

maxxzln

sumsisinS0

xs

st xs minus las + nas = ras + xTfas foralls isin Sforalla isin A(s) (7a)las le Bzas nas le Bzas foralls isin Sforalla isin A(s) (7b)

zTs 1 = Ms minus 1 foralls isin S (7c)xs l

as n

as ge 0 zas isin 0 1foralls isin Sforalla isin A(s)

We associate a binary variable zas to each action for every stateso the problem becomes a Mixed-Integer Linear Program (MILP)zajsi = 0 if action aj is chosen for state si and Constraint (7c) guar-

antees that only one action can be selected for each state whereMs = |A(s)| For example σc(s0 a) = 1 if zas0 = 0 We then as-sociate to each constraint a second slack variable nas with sign op-posite to las For selected actions zaisi = 0 Constraint (7b) makessure that zajsi = 0 implies lajsi = 0 and najsi = 0 so that the cor-responding Constraint (7a) sets the value of xs1 For unselectedactions zaksi = 1 variable laksi gt 0 (naksi gt 0) if selecting ac-tion ak had resulted in a lower (higher) value of xsi With these

1 B is a big number with respect to the problem data It needs to behigher than the reward computed for any state ie B ge xs foralls isinS but not excessively high to avoid convergence problems in theoptimizer when solving Problem (7) A suitable value of B can befound by trial and error

constraints any action can be selected We finally change the op-timization operator to ldquomaxrdquo so that at the first iteration of thealgorithm the total expected reward gets maximized

We now proceed to consider uncertainties in the transition prob-abilities Constraint (7a) gets updated to Constraint (8) since theadversarial nature tries to minimize the expected reward The newconstraint can be made linear again for an arbitrary uncertaintymodel by replacing it with a set of constraints one for each pointin Fas However this approach results in infinite constraints if theset Fas contains infinitely many points as in the cases consideredin the paper thus making the problem not solvable We solve thisdifficulty for the ellipsoidal uncertainty model using duality InConstraint (9) we replace the primal inner problem with its dualforalls isin S a isin A(s)

xs minus las + nas = ras + minfas isinFas

xTfas (8)

dArrxs minus las + nas = ras + max

λasisinDas (x)

g (λas ) (9)

Vector Lagrange multiplierλas =

[λa1s λ

a2sλ

a3s

]Dual cost functiong (λas ) = λa1s minus λa2s minus ha

sTEasλ

a3s

Dual feasibility setDas (x) = λas | λa3s2 le λa2s xminus λa1s1 + Eas

Tλa3s = 0

The dual problem is convex by construction [26] and has size poly-nomial in the size R of the EMDP [11] Since also the primalproblem is convex strong duality holds ie the primal and dualoptimal solutions coincide because the primal problem satisfiesSlaterrsquos condition [26] for any non-trivial uncertainty set Fas Anydual solution underestimates the primal solution When substitut-ing the primal problem with the dual in Constraint (9) we can dropthe inner optimization operator because the outer optimization op-erator will nevertheless aim to find the least underestimate to max-imize its cost function We get the full formulation for the OE (thequantifiers foralls isin Sforalla isin A(s) are equal to Problem (7))

maxxλlnz

sumsisinS0

xs

st xs minus las + nas = ras + g (λas )

las le Bzas nas le Bzas (10)

zTs 1 = Ms minus 1

xs las n

as ge 0λas isin Das (x) zas isin 0 1

For the ellipsoidal model Problem (10) can be written as Prob-lem (11) which is a Mixed-Integer Quadratic-Constrained Program(MIQCP) that we solve using the back-end optimizer Gurobi [27]

maxxλzln

sumsisinS0

xs

st xs minus las + nas = ras + λa1s minus λa2s minus hasTEasλ

a3s

las le Bzas nas le Bzas

zTs 1 = Ms minus 1 (11)

xs las n

as λ

a2s ge 0 λa3s ge 0 zas isin 0 1

λa3s2 le λa2s xminus λa1s1 + EasTλa3s = 0

42 Verification EngineAfter fixing a candidate strategy σc all the non-determinism inMC is resolved The VE has thus the task of checking whetherthe induced Ellipsoidal-MC (EMC) Mσc

C = (S S0ΩF X L)satisfies the PCTL formula φ for all resolutions of uncertainty ieforallηa isin Nat We use the sound and complete verification algo-rithm presented in [11 12] (we remember that EMCs are a special

case of CMDPs) If the EMC satisfies φ then the optimal strat-egy has been found and σlowast = σc Otherwise the VE needs togenerate an additional constraint to be passed to the OE so thatthe same candidate solution does not get selected anymore If vec-tor zc = [zas0 middot middot middot z

asN ] collects all the binary decision variables that

were set to zero in the previous round of optimization ie the vari-ables corresponding to the previously selected actions we just needto add constraint

zTc 1 ge 1 (12)

so that it is not possible to select the same set of actions againAs an example we optimize the total expected reward of the

EMDP in Fig 1 subject to property φ = Pge08[ϑ U abs] The firstiteration of the OE generates strategy σc1 which selects actions[b a a a] for states [s0 middot middot middot s3] with Wσc1

s0 = 10625 The VEthough reports Pσc1mins0 [ϑ U abs] = 0207 so the strategy is re-jected The VE adds the constraint zbs0 + zas1 + zas2 + zas3 = 1 tothe OE formulation which generates at the second generation σc2([a a a a]) with Wσc2

s0 = 10188 The VE computes the valuePσc2mins0 [ϑ U abs] = 1 and the algorithm terminates reportingσlowast = σc2

43 Algorithm AnalysisWe prove soundness (if a strategy σlowast is returned it indeed opti-

mally solves Problem (5)) completeness (if no solution is returnedno strategy σ isin ΣMD exists such thatMC σ |=Nat φ) and ana-lyze the runtime performance of the proposed algorithm

Theorem 42 The algorithm presented in this section to solveProblem (5) is sound complete and has runtime exponential in thesize R of the EMDP and polynomial in the size Q of the PCTLspecification φ

Proof Problem (10) returns the MD strategy σc that maximizesWσS0

among those still available By Lemma 41 the VE is soundso if it returns MC σc |=Nat φ indeed σlowast = σc (exit arrow atthe bottom of Fig 2) The VE is also complete so if it returnsMC σc 6|=Nat φ the current σc can be discarded This is doneby generating a constraint of the form of Constraint (12) whichremoves only the current σc from the strategies to be explored bythe OE This proves soundness of the overall algorithm

Failure of finding a solution is declared only by the OE whenProblem (10) becomes unfeasible because all available strategieshave previously been discarded by the VE (exit arrow at the top ofFigure 2) This proves completeness

Finally the algorithm goes at most through I = O(MN ) itera-tions (from Section 2 I is the total number of MD strategies M isthe number of actions and N the number of states of the EMDP)Each requires solving an instance of Problem (10) and a verificationcheck (done in time polynomial inR andQ by Lemma 41) Prob-lem (10) can be solved by branch-and-bound algorithms in time ex-ponential in the number of binary variables (whose number isMNand remains constant across iterations) and polynomial in the num-ber of constraints (whose number is polynomial in R at the firstiteration and it grows at each iteration limited by I = O(MN ))The total complexity isO(MNtimes

(2MNtimespoly(MN )+poly(Q))

)

exponential inR and polynomial inQ

The algorithm performs better on problems that do have a fea-sible solution arguably the most interesting ones while the opti-mization step could be removed if the goal is to prove unfeasibilityAs an alternative σlowast could be determined by testing all I avail-able MD strategies and selecting the one with the highest rewardamong those satisfying φ [13] We believe (and experimentallyshow in Section 6) that our approach can achieve better runningtime by decoupling the problem into an optimization and a veri-fication part and by testing strategies in order of optimality Fi-

Real-time price13

Day-ahead price13

Network Operator13Energy dispatch and pricing13

Base-line13 Wind13 Fast-start13

Traditional13Users13

Opportunistic13Users13

Day-ahead13dispatch13

Real-time dispatch13

Uncertain demand13

Uncertain supply13

Supply13

Demand13

Fig 3 Network operator inputs (dashed) and outputs (solid)

nally speed-ups can be obtained by implementing online routinesfor integer-constraint simplification and to produce more succinctcertificates of unfeasibility from the VE [2425] These routines areoutside the scope of this paper and will be covered in future work

5 SUPPLY SCHEDULING AND ENERGYPRICING WITH RENEWABLE SOURCES

We model the pricing and dispatch problem following the sce-nario sketched in Fig 3 [7 10 28] More details are availablein [12]

Three agents operate in the system the network operator andtwo types of users traditional and opportunistic Further three en-ergy sources are available non-dispatchable wind and two dis-patchable sources baseline and fast-start generators We will usestochastic variables WDt Do to represent wind energy supplyand traditional and opportunistic user demand respectively

The network operator takes two kinds of decisions 1) dispatchof non-renewable sources to guarantee that the aggregate energysupply matches the demand 2) pricing of energy to maximizeprofits and incentivize users to join or leave the network depend-ing on energy availability Traditional and opportunistic users re-act to pricing decisions on different timescales Traditional usersonly react to day-ahead pricing to decide how much energy theyare willing to purchase while opportunistic users are capable ofrescheduling in real-time their energy demand depending on theenergy price in exchange of lower expected prices

A 24-hour period gets divided into T1-slots of equal length (egT1 = 1h) and each T1-slot into K T2-slots (eg T2 = 30minK = 2) [10] The operator maximizes its economic profit by tak-ing decisions on the two time scales On day-ahead for each T1-slot it dispatches Q units of baseline energy (unit cost c1) withq = QK units per T2-slot and sets the price for traditional users(u) so that they can decide one day ahead when to schedule theirdemand in the following day The choice of u sets the expected de-mand of traditional users (E[Dt]) In real-time for each T2-slot theoperator first observes the values of traditional-user demand dt andwind availabilityw It then sets the price for opportunistic users (v)to determine the expected demand of opportunistic users (E[Do])Third it dispatches the production of more fast-start energy (unitcost c2) or cancels part of the already dispatched baseline energy(paying only unit cost cp instead of c1) depending on wind avail-ability and user demand to balance supply and demand In realscenarios cp lt c1 lt c2 and wind-energy is assumed to be free forbrevity [10] The operator thus tries to use as much wind energyas possible and to dispatch on day-ahead the exact amount of base-line energy not to incur in real-time in cancellation costs or in theextra cost for fast-start supplies Since more profitable strategiesmight imply a higher reliance on the uncertain wind energy or anincrease in energy prices correct system functionality needs limitson energy-unbalance risks and QoS guarantees for users

There are three sources of stochastic behavior (i) wind-energysupply (W ) (ii) traditional (Dt) user demand (iii) opportunistic(Do) user demand We thus use a stochastic optimization frame-work The result of the optimization should return optimal strate-

wα 0 0

w1αdt1αdo1γ

w1αdt1αdo1β

k=013 k=113 k=213

v1a

v1a

v1av1

b

v1b

Qa ua

Qb ub

Qa ub

w1β 0 0

s0

0 0 0

w1α 0 0w1αdt1α 0

w1β dt1β 0w1β dt1α 0

w1αdt1β 0

w1αdt1αdo1α

w1αdt1αdo1δv1b

w2β 0 0

w2α 0 0

wαdtαdoγ

wαdtαdoβva

va

vavb

Qa ua

Qb ub

Qa ub

wβ 0 0

s0

0 0 0

wαdtα 0

wβ dtβ 0wβ dtα 0wαdtβ 0

wαdtαdoα

wαdtαdoδvb

vb

ab uQ

ab uQ

Fig 4 Sketch ofME (top) andMprimeE (bottom) The latter isME

unfolded across decision epochs For each EMDP the initial states0 is shown on the left Each state is represented by the tuple(w dt do) of observations of WDt Do The pairs (Q u) rep-resent the day-ahead decisions about dispatch of baseline energyQand energy pricing for traditional users u Two arrows per decisiondepart from each state because in this figure we assumed only twodiscretization levels for each quantity (labeled with greek lettersα β middot middot middot ) We only show the state graph in the EMDPs related todecision (Qa ua) but similar state graphs are present also for theother decision pairs (Qa ub) (Qb ua) (Qb ub) as hinted by thedashed arrows departing from s0

gies about 1) the day-ahead decisions (Q and u) and 2) the real-time decisions (v) for each possible observation of W and Dt Inreal-time the actual decision (vobs) will be taken deterministicallyamong the synthesized ones based on the observed values w anddt ie the actual wind availability and traditional-user demandIn summary we will optimize over one T1-slot (the decision prob-lem is periodic so we can run one optimization for each T1-slotstand-alone) and aim to determine optimal values for Q u and v

We use the Ellipsoidal-MDPME = (S S0 AΩF AX L)sketched in Fig 4 (top) All quantities are bounded and uniformlydiscretized to keep the state and action spaces finite States s isin Sare a tuple s = (w dt do) where w dt do refer to the observedvalues of available wind energy and user demand in that state Weconsider only one T1-slot per time so we model only one choiceof day-ahead energy dispatch Q and pricing for traditional usersu (at the initial state s0 (Q u) isin A) The process then transi-tions through K decision epochs as follows First values of windenergy (w) and traditional user demand (dt) are stochastically cho-sen according to the corresponding distributions (described below)Second for each observation of W and Dt a decision on v isin A ismade and the opportunistic user demand (do) is stochastically cho-sen To transition between epochs a new value of wind energy w(the backward arrow in Fig 4 (top)) is chosen and the steps repeat

In general the optimal decision v at each epoch depends onprevious observations of wind availability (w) and user demand(dt do) so v is history-dependent To synthesize control strategiesusing the algorithm in Section 4 we unfold the sequence of deci-sion epochs in the EMDPMprimeE = (Sprime S0 A

primeΩF primeAprimeX prime Lprime)in Fig 4 (bottom) InMprimeE we have explicitly marked each quan-tity with an additional subscript k = 1 middot middot middot K to refer to the cor-responding T2-slot Each state s isin S ofME (apart from the initialstate) has been replicated K times in MprimeE s rarr s1 s2 middot middot middot sK After K decision epochs the states transition to an absorbing state(not shown in Fig 4) Since now all decision epochs are explicitly

codified in the model Markov strategies are optimal forMprimeE Thecorresponding optimal history-dependent strategy forME can bereconstructed for each s isin S by collecting the sequence of optimaldecisions returned by the algorithm in the replicas s1 s2 middot middot middot sK In the following we will only considerMprimeE As an additional ad-vantage different wind distributions can be used to transition be-tween different epochs inMprimeE (only one distribution could be usedinME) to account for the time-varying nature of wind availabilityThe modeling expressivity is thus increased

State transition probabilities are computed using the followingstochastic models

User demand The demand of both traditional (Dt) and oppor-tunistic (Do) users is modeled using Gaussian distributions [10]with Dt sim N (αtu

γt βtE[Dt]) and Do sim N (αovγo βoE[Do])

Parameter γt lt 0 (γo lt 0) is the elasticity of the traditional (op-portunistic) users ie the ratio of the percentage change of theexpected demand to that of price variations formally defined asγt = uE[Dt] middot partE[Dt]partu Parameters αt αo βt βo are fitting pa-rameters To compute transition probabilities we truncate and dis-cretize the continuous PDs in equally-sized intervals pick the mid-dle point of each interval as the discretization value and then inte-grate the PD across the interval to determine how likely the systemtransitions to that discretized value2

Wind-energy availability We created a stochastic model of theavailable wind energy starting from measured data collected fromthe wind farm at Lake Benton Minnesota USA [29] The goal is totake forecast values into account while also considering the intrin-sic inaccuracies of these predictions First we compute the (dis-crete) empirical PD microW of a training set of collected wind-energydata Second we divide a new set of data in T2-slots and considerthe average value for each new T2-slot as the forecast energy valueWe then scale microW to have such expected value E[Wk] thus obtain-ing microWk Finally we compute ellipsoidal Sets (3) F primeas (collectedinF prime) to represent uncertainty in the transition probability betweentwo discretized energy levels in two consecutive T2-slots Transi-tion frequencies are computed by counting observed transitions in atraining set of data Using classical results from statistics [17] wethen compute the value of parameter β from Set (1) correspond-ing to the confidence level CL in the measurements In particular0 le CL le 1 and CL = 1minus cdfχ2

d(2 lowast (βmax minus β)) where cdfχ2

d

is the cumulative density function of the Chi-squared distributionwith d degrees of freedom (d is equal to the number of bins used todiscretize W ) Moreover an increasing value of parameter β canbe used in the sequence of decision epochs to model the fact thatforecast farther-away in time are less accurate

We provide the states with thick circles in Fig 4 (bottom) withthree reward structures to express the profit and risk of the oper-ator and the QoS for the users We choose those states becausethe quantities Dtk DokWk are all fully observable in them thusallowing the computation of the rewards We set

rProfitsk [$] = udtk+vkdokminus(cp∆k+c1(qminus∆k))1A∆kge0

minus(c1qminusc2∆k)1B∆klt0 (13a)

rLoLsk [MWh] = max(0 XE[Wk] + Y E[Dtk +Dok]minus∆k)(13b)

rQualitysk [MWh] = dtk + dok (13c)

with ∆k = wk + q minus dtk minus dok representing the surplus ofsupply on demand X and Y defined in the following and 1 theindicator function Reward (13a) subtracts operating costs to the

2For simplicity we assume that the distributions of Dt and Doare known with certainty In fact these distributions are usuallyestimated more accurately than the wind forecast by analyzing thehistory of measured data [10] Adding uncertainty models also forthese distributions is left as future work

operator revenue to compute the net profit Indicator 1A (1B) cor-responds to the scenario when the sum of day-ahead dispatchedand wind energy is sufficient (insufficient) to cover the demandIn the latter case fast-start energy needs to be dispatched in real-time Reward (13b) computes the Loss of Load (LoL) In prac-tical scenarios the amount of fast-start energy available in real-time is limited Often this limit is computed with the formulaFS le XE[W ] + Y E[Dt +Do] (eg X = 3 Y = 10) [30]If ∆ + FS lt 0 the network incurs in a LoL with potentiallyrisky consequences Reward (13c) accounts for user demand in-centivized by energy pricing

Finally we mark all states with ∆ +FS lt 0 with the label riskand use label abs for the absorbing state so Ω = risk abs

The optimal strategy σlowast = (ulowast Qlowast vlowastk) 1 le k le K is thesolution of problem

Wlowasts0 = maxQu

minfas isinFprimeas

Eσfas

W EσDt maxv1middotmiddotmiddotvK

EσDorewrProfit(πK)

stMprimeE σlowast |=Nat φ where (14)

φ = RrLoL

leEENSM [C abs] andRrQuality

geQoSm [C abs]

and Pge1minusLoLPM [notrisk U abs]

In Problem (14) we maximize the expected value of the operatorprofit Wlowasts0 under the worst-case resolution of uncertainty in thewind-energy forecast by summing the instantaneous state rewardsrProfits along the paths π isin Πfin of K steps ofMprimeE correspond-ing to the K T2-slots By replicating the decision process for eachof the K T2-slots while building modelMprimeE every K-step execu-tion path traverses all decision epochs so the computed reward asintroduced in Definition 26 and 27 will account for the sum of thecontributions of each decision epoch

According to the semantics defined in Table 1 the PCTL specifi-cation φ constrains the expected operator risk and user QoS acrossthe decision horizon EENSM is the desired maximum value ofExpected Energy Not Served LoLPM is the maximum allowedvalue of Loss of Load Probability (these two properties limit therisk for the operator) andQoSm is the minimum value of QoS thatneeds to be guaranteed to the users

6 EXPERIMENTAL RESULTSWe implemented the algorithm in Python and interfaced it with

PRISM [31] as a front-end for entering models and with Gurobi [27]as the back-end optimizer Experiments were run on a 24 GHz In-tel Xeon with 32GB of RAM

In this section we present experimental results obtained by solv-ing Problem (14) Our goals are to give insight about the algo-rithm functionality compare its scalability to other strategy synthe-sis approaches and show that the synthesized energy pricing anddispatch strategies can achieve better performances than other so-lutions presented in the literature

We define the following quantities

Profit = Wlowasts0EENS = Rr

LoLσlowastmaxs0 [C abs]

QoS = RrQualityσlowastmins0 [C abs]

1minus LoLP = Pσlowastmin

s0 [notrisk U abs]

where the min and max operators refer to the action range of na-ture Nat As defined in Section 22 these quantities represent thequantitative values of rewards and satisfaction probability that getthen compared to the corresponding thresholds (EENSM QoSmLoLPM ) in Problem (14) to determine the satisfaction of φ Wewill then normalize the Profit to ProfitM the maximum com-puted profit value for each set of experiments We set T1 = 1hT2 = 30min so K = 2 and consider two pricing options bothfor traditional and opportunistic users If not otherwise stated

Fig 5 Performance vs iteration

Fig 6 Profit vs confidence in forecast

we will use CL = 90 and discretize the wind energy W in 5bins and traditional Dt opportunistic Do demands and baselinesupply Q in 2 bins We set QoSm = 80

sumk E[Dtk +Dok]

LoLPM = 10 EENSM = 5sumk E[Wk + q] The other pa-

rameter values were taken from [10]In Fig 5 we show the trend of the expected system performance

as a function of the synthesis algorithm iteration The Profitmonotonically decreases until the proposed candidate strategy σcmeets all specifications We note that 1minusLoLP andQoS (EENSnot shown has a trend similar to 1 minus LoLP ) instead vary non-monotonically Intuitively this is because the Profit can be in-creased either by scheduling less baseline energy Q to reduce can-cellation costs cp but incurring in a higher risk or by increasing the

Table 2 Performance Analysis

W bins 5 10 15 20Profit 1 098 097 0965

1minus LoLP 099 099 099 099EENS 098 098 098 098QoS 101 101 101 101

Runtime 144s 400s 1368s 3289sIter 223 53 547 332N + Tr 1343 2719 4115 5591

MD Strat (I) 4096 42e6 43e9 44e12

005 010 015 020 025 030082084086088090092094096098100

Pro

fit

Pro

fit M

Performance Comparison vs Varaiya et al [7] and He et al [10]

Ours[7][10]

005 010 015 020 025 03000020406081012

EE

NS

EE

NS M

Ours[7][10]

005 010 015 020 025 030Wind Penetration ηW

088090092094096098100102

QoS

QoS

m Ours[7][10]

Fig 7 Comparison to alternative approaches via Monte Carlo sim-ulation

energy price v for opportunistic users with consequent reduction ofQoS (the peaks of 1 minus LoLP are aligned to the valleys of QoS)By ranking strategies by expected Profit our algorithm is capa-ble of selecting the optimal strategy despite the complex parameterinterdependences in the model under analysis

In Table 2 we compare synthesis results while varying the num-ber of discretization bins for W (all values are normalized to thecorresponding specification) First we note that the expected sys-tem performances do not substantially vary by changing the num-ber of bins supporting our choice of 5 bins in the other experi-ments Second runtime results show that the algorithm can handlein reasonable time problems more than 10times larger than the onesanalyzed in [14] the only other algorithm in the literature capableof processing arbitrary PCTL formulas (we useN+Tr the sum ofstates and transitions in the EMDP as a proxy of the model size)Third we note that the alternative approach of verifying all the IMD strategies σ isin ΣMD is not practical due to the exponentialincrease of I (defined in Section 211) with the problem size

In Fig 6 we study the effect of different confidence levels CL inthe wind-energy forecast on the expected operator Profit whilevarying the value of wind penetration ηW =

sumKk=1

E[Wk]E[Wk + q]

and keeping all constraints constant At high CL higher profitscan be expected for increasing ηW (wind energy is assumed free)On the other hand for low CL higher wind penetration createsmore uncertainty thus lowering the expected Profit The networkoperator can use these curves to assess the return of investment inemploying more accurate (and expensive) forecast techniques

Finally in Fig 7 we compare results with two other energy-pricing formulations proposed in the literature He et al [10] solvethe optimization problem without enforcing any constraint Varaiyaet al [7] only put limits on the acceptable LoLP (their approach isnot trivially extendable to reward properties expressed using the Roperator) and solve optimally only for LoLP = 0 Comparison isdone by solving the different optimization problems and then run-ning Monte Carlo simulations (1000 runs) of the controlled systemon test data (different from the training ones) to evaluate its per-formance (Profit EENS and QoS) As expected the uncon-strained strategy from [10] has higher Profit (up to 5) but alsoup to 12 more EENS and 10 less QoS compared to our ap-proach The strategy from [7] guarantees null EENS but it hasup to 6 less Profit (due to over constraining EENS) and 10less QoS (which is left unconstrained)

As a final remark we note that runtime may increase exponen-tially as we tighten the specification thresholds (QoSm LoLPM EENSM ) since it becomes increasingly more difficult to find a

solution within the exponentially-sized search space Neverthelessthe chosen values were tight enough to improve the quality of alter-native energy pricing and dispatch strategies proposed in the litera-ture while maintaining the runtime acceptable for this application

7 CONCLUSIONS AND FUTURE WORKWe proposed the first sound and complete algorithm for the syn-

thesis of control strategies for MDPs satisfying properties expressedin PCTL and robust to uncertainties in the transition probabilitiesWe then applied the algorithm to the problem of energy pricingand dispatch in smart grids with renewables Results showed thatnetwork-operator risks can be effectively constrained at design timeand that more accurate predictions of the expected profit can be ob-tained by taking the uncertainty of wind availability into account

As future work we plan to investigate techniques to generatemore concise constraints to prove failure of the verification step inorder to prune more effectively the search space for the optimiza-tion engine and to apply the proposed strategy synthesis approachto further case studies eg semi-autonomous car driving

Acknowledgments The authors thank J Venkatesh and profT Simunic Rosing for providing the measurement data used in thecase study The research was partially funded by DARPA AwardNumber HR0011-12-2-0016 and by TerraSwarm one of six cen-ters of STARnet a Semiconductor Research Corporation programsponsored by MARCO and DARPA Approved for public releasedistribution is unlimited The content of this paper does not neces-sarily reflect the position or the policy of the US government andno official endorsement should be inferred

8 REFERENCES[1] Courcoubetis C and Yannakakis M ldquoThe Complexity of

Probabilistic Verificationrdquo Journal of ACM vol 42(4) pp857ndash907 1995

[2] H Hansson and B Jonsson ldquoA Logic for Reasoning AboutTime and Reliabilityrdquo Formal Aspects of Computing vol6(5) pp 512ndash535 1994

[3] A Bianco and L de Alfaro ldquoModel Checking ofProbabilistic and Nondeterministic Systemsrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 1995vol 1026 pp 499ndash513

[4] M Puterman Markov Decision Processes DiscreteStochastic Dynamic Programming John Wiley and Sons1994

[5] ldquoInternational Energy Agency World Energy Outlook 2009rdquo[Online] httpwwwworldenergyoutlookorgdocsweo2009WEO2009_es_englishpdf

[6] A Gore An Inconvenient Truth New York Rodale 2006[7] P Varaiya F Wu and J Bialek ldquoSmart Operation of Smart

Grid Risk-Limiting Dispatchrdquo Proceedings of the IEEEvol 99 no 1 pp 40ndash57 2011

[8] K Porter and J Rogers ldquoSurvey of Variable GenerationForecasting in the Westrdquo in NREL Subcontract ReportSR-5500-54457 April 2012 p 56

[9] S Koch et al ldquoModeling and Control of AggregatedHeterogeneous Thermostatically Controlled Loads forAncillary Servicesrdquo Proc of PSCC pp 1ndash7 2011

[10] M He S Murugesan and J Zhang ldquoMultiple TimescaleDispatch and Scheduling for Stochastic Reliability in SmartGrids with Wind Generation Integrationrdquo in Proc of IEEEINFOCOM 2011 pp 461ndash465

[11] A Puggelli et al ldquoPolynomial-Time Verification of PCTLproperties of MDPs with Convex Uncertaintiesrdquo in Proc of

CAV ser LNCS Springer Berlin Heidelberg 2013 vol8044 pp 527ndash542

[12] A Puggelli ldquoFormal Techniques for the Verification andOptimal Control of Probabilistic Systems in the Presence ofModeling Uncertaintiesrdquo PhD dissertation University ofCalifornia Berkeley Berkeley CA 2014

[13] C Baier et al ldquoController Synthesis for ProbabilisticSystemsrdquo in Exploring New Frontiers of TheoreticalInformatics Springer US 2004 vol 155 pp 493ndash506

[14] M Lahijanian et al ldquoTemporal Logic Motion Planning andControl with Probabilistic Satisfaction Guaranteesrdquo IEEETransactions on Robotics vol 28 no 2 pp 396ndash409 2012

[15] M Kwiatkowska and D Parker ldquoAutomated Verificationand Strategy Synthesis for Probabilistic Systemsrdquo in Proc ofATVA ser LNCS vol 8172 Springer 2013 pp 5ndash22

[16] K Sen et al ldquoModel-Checking Markov Chains in thePresence of Uncertaintiesrdquo in Proc of TACAS ser LNCSSpringer Berlin Heidelberg 2006 vol 3920 pp 394ndash410

[17] A Nilim and L El Ghaoui ldquoRobust Control of MarkovDecision Processes with Uncertain Transition MatricesrdquoJournal of Operations Research pp 780ndash798 2005

[18] M Y Vardi ldquoAutomatic Verification of ProbabilisticConcurrent Finite State Programsrdquo in Proc of SFCS 1985pp 327ndash338

[19] V Forejt et al ldquoAutomated Verification Techniques forProbabilistic Systemsrdquo Formal Methods for EternalNetworked Software Systems vol 6659 pp 53ndash113 2011

[20] F Bouffard and F Galiana ldquoStochastic Security forOperations Planning with Significant Wind PowerGenerationrdquo in IEEE Transaction on Power Systems 2008vol 23 no 2 pp 306ndash316

[21] A Kucera and O Stražovskyacute ldquoOn the Controller Synthesisfor Finite-State Markov Decision Processesrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 2005vol 3821 pp 541ndash552

[22] K Draeger et al ldquoPermissive Controller Synthesis forProbabilistic Systemsrdquo in Proc of TACAS ser LNCSSpringer 2014 pp 531ndash546

[23] T Brazdil et al ldquoStochastic Games with Branching-TimeWinning Objectivesrdquo in Proc of IEEE SLCS Aug 2006 pp349ndash358

[24] C Hang et al ldquoSynthesizing Cyber-Physical ArchitecturalModels with Real-Time Constraintsrdquo in Proc of CAVBerlin Heidelberg Springer-Verlag 2011 pp 441ndash456

[25] P Nuzzo et al ldquoCalCS SMT Solving for Non-linearConvex Constraintsrdquo in Proc of FMCAD 2010 pp 71ndash80

[26] S Boyd and L Vandenberghe ldquoConvex OptimizationrdquoCambridge University Press 2004

[27] ldquoGurobi Optimizerrdquo [Online] httpwwwgurobicom[28] S Stoft Power System Economics Designing Markets for

Electricity NewYork Wiley 2002[29] Y Wan ldquoLong-term Wind Power Variabilityrdquo NREL

Technical Report TP-5500-53637 [Online]httpwwwnrelgovdocsfy12osti53637pdf January 2012

[30] E Ela et al ldquoOperating Reserves and Variable GenerationrdquoNREL Technical Report [Online]httpwwwnrelgovdocsfy11osti51978pdf August 2011

[31] M Kwiatkowska et al ldquoPRISM 40 Verification ofProbabilistic Real-Time Systemsrdquo Proc of CAV pp585ndash591 2011

Page 5: Robust Strategy Synthesis for Probabilistic Systems Applied to …sseshia/pubdir/emsoft... · 2014-08-07 · Robust Strategy Synthesis for Probabilistic Systems Applied to Risk-Limiting

Opmizaon13 Engine13 (strategy13 ranking)13

Feasible13 Candidate13

op0mal13 solu0on13 NO13 addi0onal13 constraint13

Op0mal13 strategy13 that13 sa0sfies13 proper0es13

No13 strategy13 sa0sfies13 proper0es13

Verificaon13 Engine13 (verifies13 if13 properes13 are13 sasfied)13

Unfeasible13

YES13

Fig 2 Lazy algorithm for the constrained optimization

Lemma 41 Complexity of PCTL model-checking forCMDPs [11] Verifying that a CMDP MC satisfies a PCTL for-mula φ is solvable in polynomial time

Lemma 42 Computation of the total expected reward for Con-vex Markov Chains (CMCs) [17] Given a CMC its total ex-pected reward is computable in polynomial time

Lemma 43 Complexity of PCTL strategy synthesis forMDPs [13] The problem of determining the existence of an MDstrategy σ for an MDPM such thatM σ |= φ is NP-complete

We now prove that the decision problem version of Problem (5)is NP-complete

Theorem 41 The problem of determining the existence of an MDstrategy σ for an EMDP MC with total expected reward Wσ

S0

larger or equal to WT and satisfying a specification φ in PCTLis NP-complete

Proof Given a candidate solution σc we can in polynomial time1) check whether MC σc |=Nat φ by Lemma 41 2) computeWσcS0

on the induced CMCMσcC by Lemma 42 σc is a solution if

and only if check 1) passes and WσcS0ge WT This proves that the

problem is in NP To prove NP-hardness we reduce the problem inLemma 43 to the one under analysis We set WT = 0 and de-scribe transition probabilities with point ellipses ie ellipses withnull axes (Kas rarr +infin foralls isin S foralla isin A(s) in Equation (2))

In the rest of the section we describe an algorithm to solve Prob-lem (5) We use a lazy approach based on strategy-iteration con-ceptually similar to the ones proposed in [24] and for non-linearconstraints [25] As shown in Fig 2 the algorithm is split into twomain routines communicating in a loop At each iteration the opti-mization engine (OE) is responsible to generate a candidate strategyσc σc maximizes the total expected reward but it does not neces-sarily satisfy PCTL property φ since the OE formulation does notcontain any information about φ The candidate solution is thenpassed to the verification engine (VE) which checks whetherMCsatisfies φ for all resolutions of uncertainty ie forallηa isin Nat whenoperating under σc If the check passes σlowast = σc and the algo-rithm terminates Otherwise the VE generates an additional con-straint for the OE to prevent the previous σc to be selected againand the loop repeats The novelty of our approach is devising amathematical formulation for the OE capable of generating at eachiteration the candidate strategy that maximizes the total expectedreward among those still available (at each iteration in which theVE reports a failure the candidate strategy is discarded) The firstcandidate strategy that satisfies the PCTL specification becomes the

solution of the synthesis problem Such a strategy is feasible (sat-isfies φ) and it has the maximum total reward among strategies thatsatisfy φ so it solves Problem (5) exactly

In general there is not guarantee on the uniqueness of the op-timal strategy σlowast ie multiple strategies with the same expectedreward might exist each satisfying φ Since every such strategyis equivalent from the user perspective the algorithm just reportsthe first one found Alternatively all equivalent strategies could begenerated by continuing the iteration between the OE and the VEuntil finally a reduction in the expected reward gets detected andby reporting all synthesized strategies satisfying φ

The next subsections give details on the OE and VE and analyzethe algorithm properties

41 Optimization EngineWe start with the classical linear-programming (LP) formulation

to maximize the total expected reward for MDPs [4]

minxl

sumsisinS0

xs

st xs minus las = ras + xTfas foralls isin Sforalla isin A(s) (6)xs l

as ge 0 foralls isin Sforalla isin A(s)

Vector x collects the total expected reward for each state s isin S (atthe end of the optimization Wσc

s = xs foralls isin S) and the cost func-tion sums the total expected rewards for all the initial states s isin S0We then set Wσc

S0=sumsisinS0

xs Variables las are slack variablesfor each constraint Also ras = rs(s) + ra(s a) foralls isin Sforalla isin ASince the slack variables have negative sign the slack can only benegative ie the left-hand side (LHS) can only be larger or equalthan the right-HS (RHS) The ldquominrdquo operator makes sure that foreach state the constraint with the highest RHS has null slack ielas = 0 The optimal strategy can then be reconstructed by selectingthe action a isin A(s) foralls isin S corresponding to the constraint withnull slack eg σc(s0 a) = 1 if las0 = 0 Our goal is to modifysuch a formulation to allow a sub-optimal solution to be selected incase the optimal solution does not satisfy the PCTL specificationWe now describe an equivalent formulation to the original prob-lem that is more suitable to achieve this goal We will describe inSection 42 how to add constraints to this formulation to select sub-optimal solutions in order of optimality We refer to Problem (7)

maxxzln

sumsisinS0

xs

st xs minus las + nas = ras + xTfas foralls isin Sforalla isin A(s) (7a)las le Bzas nas le Bzas foralls isin Sforalla isin A(s) (7b)

zTs 1 = Ms minus 1 foralls isin S (7c)xs l

as n

as ge 0 zas isin 0 1foralls isin Sforalla isin A(s)

We associate a binary variable zas to each action for every stateso the problem becomes a Mixed-Integer Linear Program (MILP)zajsi = 0 if action aj is chosen for state si and Constraint (7c) guar-

antees that only one action can be selected for each state whereMs = |A(s)| For example σc(s0 a) = 1 if zas0 = 0 We then as-sociate to each constraint a second slack variable nas with sign op-posite to las For selected actions zaisi = 0 Constraint (7b) makessure that zajsi = 0 implies lajsi = 0 and najsi = 0 so that the cor-responding Constraint (7a) sets the value of xs1 For unselectedactions zaksi = 1 variable laksi gt 0 (naksi gt 0) if selecting ac-tion ak had resulted in a lower (higher) value of xsi With these

1 B is a big number with respect to the problem data It needs to behigher than the reward computed for any state ie B ge xs foralls isinS but not excessively high to avoid convergence problems in theoptimizer when solving Problem (7) A suitable value of B can befound by trial and error

constraints any action can be selected We finally change the op-timization operator to ldquomaxrdquo so that at the first iteration of thealgorithm the total expected reward gets maximized

We now proceed to consider uncertainties in the transition prob-abilities Constraint (7a) gets updated to Constraint (8) since theadversarial nature tries to minimize the expected reward The newconstraint can be made linear again for an arbitrary uncertaintymodel by replacing it with a set of constraints one for each pointin Fas However this approach results in infinite constraints if theset Fas contains infinitely many points as in the cases consideredin the paper thus making the problem not solvable We solve thisdifficulty for the ellipsoidal uncertainty model using duality InConstraint (9) we replace the primal inner problem with its dualforalls isin S a isin A(s)

xs minus las + nas = ras + minfas isinFas

xTfas (8)

dArrxs minus las + nas = ras + max

λasisinDas (x)

g (λas ) (9)

Vector Lagrange multiplierλas =

[λa1s λ

a2sλ

a3s

]Dual cost functiong (λas ) = λa1s minus λa2s minus ha

sTEasλ

a3s

Dual feasibility setDas (x) = λas | λa3s2 le λa2s xminus λa1s1 + Eas

Tλa3s = 0

The dual problem is convex by construction [26] and has size poly-nomial in the size R of the EMDP [11] Since also the primalproblem is convex strong duality holds ie the primal and dualoptimal solutions coincide because the primal problem satisfiesSlaterrsquos condition [26] for any non-trivial uncertainty set Fas Anydual solution underestimates the primal solution When substitut-ing the primal problem with the dual in Constraint (9) we can dropthe inner optimization operator because the outer optimization op-erator will nevertheless aim to find the least underestimate to max-imize its cost function We get the full formulation for the OE (thequantifiers foralls isin Sforalla isin A(s) are equal to Problem (7))

maxxλlnz

sumsisinS0

xs

st xs minus las + nas = ras + g (λas )

las le Bzas nas le Bzas (10)

zTs 1 = Ms minus 1

xs las n

as ge 0λas isin Das (x) zas isin 0 1

For the ellipsoidal model Problem (10) can be written as Prob-lem (11) which is a Mixed-Integer Quadratic-Constrained Program(MIQCP) that we solve using the back-end optimizer Gurobi [27]

maxxλzln

sumsisinS0

xs

st xs minus las + nas = ras + λa1s minus λa2s minus hasTEasλ

a3s

las le Bzas nas le Bzas

zTs 1 = Ms minus 1 (11)

xs las n

as λ

a2s ge 0 λa3s ge 0 zas isin 0 1

λa3s2 le λa2s xminus λa1s1 + EasTλa3s = 0

42 Verification EngineAfter fixing a candidate strategy σc all the non-determinism inMC is resolved The VE has thus the task of checking whetherthe induced Ellipsoidal-MC (EMC) Mσc

C = (S S0ΩF X L)satisfies the PCTL formula φ for all resolutions of uncertainty ieforallηa isin Nat We use the sound and complete verification algo-rithm presented in [11 12] (we remember that EMCs are a special

case of CMDPs) If the EMC satisfies φ then the optimal strat-egy has been found and σlowast = σc Otherwise the VE needs togenerate an additional constraint to be passed to the OE so thatthe same candidate solution does not get selected anymore If vec-tor zc = [zas0 middot middot middot z

asN ] collects all the binary decision variables that

were set to zero in the previous round of optimization ie the vari-ables corresponding to the previously selected actions we just needto add constraint

zTc 1 ge 1 (12)

so that it is not possible to select the same set of actions againAs an example we optimize the total expected reward of the

EMDP in Fig 1 subject to property φ = Pge08[ϑ U abs] The firstiteration of the OE generates strategy σc1 which selects actions[b a a a] for states [s0 middot middot middot s3] with Wσc1

s0 = 10625 The VEthough reports Pσc1mins0 [ϑ U abs] = 0207 so the strategy is re-jected The VE adds the constraint zbs0 + zas1 + zas2 + zas3 = 1 tothe OE formulation which generates at the second generation σc2([a a a a]) with Wσc2

s0 = 10188 The VE computes the valuePσc2mins0 [ϑ U abs] = 1 and the algorithm terminates reportingσlowast = σc2

43 Algorithm AnalysisWe prove soundness (if a strategy σlowast is returned it indeed opti-

mally solves Problem (5)) completeness (if no solution is returnedno strategy σ isin ΣMD exists such thatMC σ |=Nat φ) and ana-lyze the runtime performance of the proposed algorithm

Theorem 42 The algorithm presented in this section to solveProblem (5) is sound complete and has runtime exponential in thesize R of the EMDP and polynomial in the size Q of the PCTLspecification φ

Proof Problem (10) returns the MD strategy σc that maximizesWσS0

among those still available By Lemma 41 the VE is soundso if it returns MC σc |=Nat φ indeed σlowast = σc (exit arrow atthe bottom of Fig 2) The VE is also complete so if it returnsMC σc 6|=Nat φ the current σc can be discarded This is doneby generating a constraint of the form of Constraint (12) whichremoves only the current σc from the strategies to be explored bythe OE This proves soundness of the overall algorithm

Failure of finding a solution is declared only by the OE whenProblem (10) becomes unfeasible because all available strategieshave previously been discarded by the VE (exit arrow at the top ofFigure 2) This proves completeness

Finally the algorithm goes at most through I = O(MN ) itera-tions (from Section 2 I is the total number of MD strategies M isthe number of actions and N the number of states of the EMDP)Each requires solving an instance of Problem (10) and a verificationcheck (done in time polynomial inR andQ by Lemma 41) Prob-lem (10) can be solved by branch-and-bound algorithms in time ex-ponential in the number of binary variables (whose number isMNand remains constant across iterations) and polynomial in the num-ber of constraints (whose number is polynomial in R at the firstiteration and it grows at each iteration limited by I = O(MN ))The total complexity isO(MNtimes

(2MNtimespoly(MN )+poly(Q))

)

exponential inR and polynomial inQ

The algorithm performs better on problems that do have a fea-sible solution arguably the most interesting ones while the opti-mization step could be removed if the goal is to prove unfeasibilityAs an alternative σlowast could be determined by testing all I avail-able MD strategies and selecting the one with the highest rewardamong those satisfying φ [13] We believe (and experimentallyshow in Section 6) that our approach can achieve better runningtime by decoupling the problem into an optimization and a veri-fication part and by testing strategies in order of optimality Fi-

Real-time price13

Day-ahead price13

Network Operator13Energy dispatch and pricing13

Base-line13 Wind13 Fast-start13

Traditional13Users13

Opportunistic13Users13

Day-ahead13dispatch13

Real-time dispatch13

Uncertain demand13

Uncertain supply13

Supply13

Demand13

Fig 3 Network operator inputs (dashed) and outputs (solid)

nally speed-ups can be obtained by implementing online routinesfor integer-constraint simplification and to produce more succinctcertificates of unfeasibility from the VE [2425] These routines areoutside the scope of this paper and will be covered in future work

5 SUPPLY SCHEDULING AND ENERGYPRICING WITH RENEWABLE SOURCES

We model the pricing and dispatch problem following the sce-nario sketched in Fig 3 [7 10 28] More details are availablein [12]

Three agents operate in the system the network operator andtwo types of users traditional and opportunistic Further three en-ergy sources are available non-dispatchable wind and two dis-patchable sources baseline and fast-start generators We will usestochastic variables WDt Do to represent wind energy supplyand traditional and opportunistic user demand respectively

The network operator takes two kinds of decisions 1) dispatchof non-renewable sources to guarantee that the aggregate energysupply matches the demand 2) pricing of energy to maximizeprofits and incentivize users to join or leave the network depend-ing on energy availability Traditional and opportunistic users re-act to pricing decisions on different timescales Traditional usersonly react to day-ahead pricing to decide how much energy theyare willing to purchase while opportunistic users are capable ofrescheduling in real-time their energy demand depending on theenergy price in exchange of lower expected prices

A 24-hour period gets divided into T1-slots of equal length (egT1 = 1h) and each T1-slot into K T2-slots (eg T2 = 30minK = 2) [10] The operator maximizes its economic profit by tak-ing decisions on the two time scales On day-ahead for each T1-slot it dispatches Q units of baseline energy (unit cost c1) withq = QK units per T2-slot and sets the price for traditional users(u) so that they can decide one day ahead when to schedule theirdemand in the following day The choice of u sets the expected de-mand of traditional users (E[Dt]) In real-time for each T2-slot theoperator first observes the values of traditional-user demand dt andwind availabilityw It then sets the price for opportunistic users (v)to determine the expected demand of opportunistic users (E[Do])Third it dispatches the production of more fast-start energy (unitcost c2) or cancels part of the already dispatched baseline energy(paying only unit cost cp instead of c1) depending on wind avail-ability and user demand to balance supply and demand In realscenarios cp lt c1 lt c2 and wind-energy is assumed to be free forbrevity [10] The operator thus tries to use as much wind energyas possible and to dispatch on day-ahead the exact amount of base-line energy not to incur in real-time in cancellation costs or in theextra cost for fast-start supplies Since more profitable strategiesmight imply a higher reliance on the uncertain wind energy or anincrease in energy prices correct system functionality needs limitson energy-unbalance risks and QoS guarantees for users

There are three sources of stochastic behavior (i) wind-energysupply (W ) (ii) traditional (Dt) user demand (iii) opportunistic(Do) user demand We thus use a stochastic optimization frame-work The result of the optimization should return optimal strate-

wα 0 0

w1αdt1αdo1γ

w1αdt1αdo1β

k=013 k=113 k=213

v1a

v1a

v1av1

b

v1b

Qa ua

Qb ub

Qa ub

w1β 0 0

s0

0 0 0

w1α 0 0w1αdt1α 0

w1β dt1β 0w1β dt1α 0

w1αdt1β 0

w1αdt1αdo1α

w1αdt1αdo1δv1b

w2β 0 0

w2α 0 0

wαdtαdoγ

wαdtαdoβva

va

vavb

Qa ua

Qb ub

Qa ub

wβ 0 0

s0

0 0 0

wαdtα 0

wβ dtβ 0wβ dtα 0wαdtβ 0

wαdtαdoα

wαdtαdoδvb

vb

ab uQ

ab uQ

Fig 4 Sketch ofME (top) andMprimeE (bottom) The latter isME

unfolded across decision epochs For each EMDP the initial states0 is shown on the left Each state is represented by the tuple(w dt do) of observations of WDt Do The pairs (Q u) rep-resent the day-ahead decisions about dispatch of baseline energyQand energy pricing for traditional users u Two arrows per decisiondepart from each state because in this figure we assumed only twodiscretization levels for each quantity (labeled with greek lettersα β middot middot middot ) We only show the state graph in the EMDPs related todecision (Qa ua) but similar state graphs are present also for theother decision pairs (Qa ub) (Qb ua) (Qb ub) as hinted by thedashed arrows departing from s0

gies about 1) the day-ahead decisions (Q and u) and 2) the real-time decisions (v) for each possible observation of W and Dt Inreal-time the actual decision (vobs) will be taken deterministicallyamong the synthesized ones based on the observed values w anddt ie the actual wind availability and traditional-user demandIn summary we will optimize over one T1-slot (the decision prob-lem is periodic so we can run one optimization for each T1-slotstand-alone) and aim to determine optimal values for Q u and v

We use the Ellipsoidal-MDPME = (S S0 AΩF AX L)sketched in Fig 4 (top) All quantities are bounded and uniformlydiscretized to keep the state and action spaces finite States s isin Sare a tuple s = (w dt do) where w dt do refer to the observedvalues of available wind energy and user demand in that state Weconsider only one T1-slot per time so we model only one choiceof day-ahead energy dispatch Q and pricing for traditional usersu (at the initial state s0 (Q u) isin A) The process then transi-tions through K decision epochs as follows First values of windenergy (w) and traditional user demand (dt) are stochastically cho-sen according to the corresponding distributions (described below)Second for each observation of W and Dt a decision on v isin A ismade and the opportunistic user demand (do) is stochastically cho-sen To transition between epochs a new value of wind energy w(the backward arrow in Fig 4 (top)) is chosen and the steps repeat

In general the optimal decision v at each epoch depends onprevious observations of wind availability (w) and user demand(dt do) so v is history-dependent To synthesize control strategiesusing the algorithm in Section 4 we unfold the sequence of deci-sion epochs in the EMDPMprimeE = (Sprime S0 A

primeΩF primeAprimeX prime Lprime)in Fig 4 (bottom) InMprimeE we have explicitly marked each quan-tity with an additional subscript k = 1 middot middot middot K to refer to the cor-responding T2-slot Each state s isin S ofME (apart from the initialstate) has been replicated K times in MprimeE s rarr s1 s2 middot middot middot sK After K decision epochs the states transition to an absorbing state(not shown in Fig 4) Since now all decision epochs are explicitly

codified in the model Markov strategies are optimal forMprimeE Thecorresponding optimal history-dependent strategy forME can bereconstructed for each s isin S by collecting the sequence of optimaldecisions returned by the algorithm in the replicas s1 s2 middot middot middot sK In the following we will only considerMprimeE As an additional ad-vantage different wind distributions can be used to transition be-tween different epochs inMprimeE (only one distribution could be usedinME) to account for the time-varying nature of wind availabilityThe modeling expressivity is thus increased

State transition probabilities are computed using the followingstochastic models

User demand The demand of both traditional (Dt) and oppor-tunistic (Do) users is modeled using Gaussian distributions [10]with Dt sim N (αtu

γt βtE[Dt]) and Do sim N (αovγo βoE[Do])

Parameter γt lt 0 (γo lt 0) is the elasticity of the traditional (op-portunistic) users ie the ratio of the percentage change of theexpected demand to that of price variations formally defined asγt = uE[Dt] middot partE[Dt]partu Parameters αt αo βt βo are fitting pa-rameters To compute transition probabilities we truncate and dis-cretize the continuous PDs in equally-sized intervals pick the mid-dle point of each interval as the discretization value and then inte-grate the PD across the interval to determine how likely the systemtransitions to that discretized value2

Wind-energy availability We created a stochastic model of theavailable wind energy starting from measured data collected fromthe wind farm at Lake Benton Minnesota USA [29] The goal is totake forecast values into account while also considering the intrin-sic inaccuracies of these predictions First we compute the (dis-crete) empirical PD microW of a training set of collected wind-energydata Second we divide a new set of data in T2-slots and considerthe average value for each new T2-slot as the forecast energy valueWe then scale microW to have such expected value E[Wk] thus obtain-ing microWk Finally we compute ellipsoidal Sets (3) F primeas (collectedinF prime) to represent uncertainty in the transition probability betweentwo discretized energy levels in two consecutive T2-slots Transi-tion frequencies are computed by counting observed transitions in atraining set of data Using classical results from statistics [17] wethen compute the value of parameter β from Set (1) correspond-ing to the confidence level CL in the measurements In particular0 le CL le 1 and CL = 1minus cdfχ2

d(2 lowast (βmax minus β)) where cdfχ2

d

is the cumulative density function of the Chi-squared distributionwith d degrees of freedom (d is equal to the number of bins used todiscretize W ) Moreover an increasing value of parameter β canbe used in the sequence of decision epochs to model the fact thatforecast farther-away in time are less accurate

We provide the states with thick circles in Fig 4 (bottom) withthree reward structures to express the profit and risk of the oper-ator and the QoS for the users We choose those states becausethe quantities Dtk DokWk are all fully observable in them thusallowing the computation of the rewards We set

rProfitsk [$] = udtk+vkdokminus(cp∆k+c1(qminus∆k))1A∆kge0

minus(c1qminusc2∆k)1B∆klt0 (13a)

rLoLsk [MWh] = max(0 XE[Wk] + Y E[Dtk +Dok]minus∆k)(13b)

rQualitysk [MWh] = dtk + dok (13c)

with ∆k = wk + q minus dtk minus dok representing the surplus ofsupply on demand X and Y defined in the following and 1 theindicator function Reward (13a) subtracts operating costs to the

2For simplicity we assume that the distributions of Dt and Doare known with certainty In fact these distributions are usuallyestimated more accurately than the wind forecast by analyzing thehistory of measured data [10] Adding uncertainty models also forthese distributions is left as future work

operator revenue to compute the net profit Indicator 1A (1B) cor-responds to the scenario when the sum of day-ahead dispatchedand wind energy is sufficient (insufficient) to cover the demandIn the latter case fast-start energy needs to be dispatched in real-time Reward (13b) computes the Loss of Load (LoL) In prac-tical scenarios the amount of fast-start energy available in real-time is limited Often this limit is computed with the formulaFS le XE[W ] + Y E[Dt +Do] (eg X = 3 Y = 10) [30]If ∆ + FS lt 0 the network incurs in a LoL with potentiallyrisky consequences Reward (13c) accounts for user demand in-centivized by energy pricing

Finally we mark all states with ∆ +FS lt 0 with the label riskand use label abs for the absorbing state so Ω = risk abs

The optimal strategy σlowast = (ulowast Qlowast vlowastk) 1 le k le K is thesolution of problem

Wlowasts0 = maxQu

minfas isinFprimeas

Eσfas

W EσDt maxv1middotmiddotmiddotvK

EσDorewrProfit(πK)

stMprimeE σlowast |=Nat φ where (14)

φ = RrLoL

leEENSM [C abs] andRrQuality

geQoSm [C abs]

and Pge1minusLoLPM [notrisk U abs]

In Problem (14) we maximize the expected value of the operatorprofit Wlowasts0 under the worst-case resolution of uncertainty in thewind-energy forecast by summing the instantaneous state rewardsrProfits along the paths π isin Πfin of K steps ofMprimeE correspond-ing to the K T2-slots By replicating the decision process for eachof the K T2-slots while building modelMprimeE every K-step execu-tion path traverses all decision epochs so the computed reward asintroduced in Definition 26 and 27 will account for the sum of thecontributions of each decision epoch

According to the semantics defined in Table 1 the PCTL specifi-cation φ constrains the expected operator risk and user QoS acrossthe decision horizon EENSM is the desired maximum value ofExpected Energy Not Served LoLPM is the maximum allowedvalue of Loss of Load Probability (these two properties limit therisk for the operator) andQoSm is the minimum value of QoS thatneeds to be guaranteed to the users

6 EXPERIMENTAL RESULTSWe implemented the algorithm in Python and interfaced it with

PRISM [31] as a front-end for entering models and with Gurobi [27]as the back-end optimizer Experiments were run on a 24 GHz In-tel Xeon with 32GB of RAM

In this section we present experimental results obtained by solv-ing Problem (14) Our goals are to give insight about the algo-rithm functionality compare its scalability to other strategy synthe-sis approaches and show that the synthesized energy pricing anddispatch strategies can achieve better performances than other so-lutions presented in the literature

We define the following quantities

Profit = Wlowasts0EENS = Rr

LoLσlowastmaxs0 [C abs]

QoS = RrQualityσlowastmins0 [C abs]

1minus LoLP = Pσlowastmin

s0 [notrisk U abs]

where the min and max operators refer to the action range of na-ture Nat As defined in Section 22 these quantities represent thequantitative values of rewards and satisfaction probability that getthen compared to the corresponding thresholds (EENSM QoSmLoLPM ) in Problem (14) to determine the satisfaction of φ Wewill then normalize the Profit to ProfitM the maximum com-puted profit value for each set of experiments We set T1 = 1hT2 = 30min so K = 2 and consider two pricing options bothfor traditional and opportunistic users If not otherwise stated

Fig 5 Performance vs iteration

Fig 6 Profit vs confidence in forecast

we will use CL = 90 and discretize the wind energy W in 5bins and traditional Dt opportunistic Do demands and baselinesupply Q in 2 bins We set QoSm = 80

sumk E[Dtk +Dok]

LoLPM = 10 EENSM = 5sumk E[Wk + q] The other pa-

rameter values were taken from [10]In Fig 5 we show the trend of the expected system performance

as a function of the synthesis algorithm iteration The Profitmonotonically decreases until the proposed candidate strategy σcmeets all specifications We note that 1minusLoLP andQoS (EENSnot shown has a trend similar to 1 minus LoLP ) instead vary non-monotonically Intuitively this is because the Profit can be in-creased either by scheduling less baseline energy Q to reduce can-cellation costs cp but incurring in a higher risk or by increasing the

Table 2 Performance Analysis

W bins 5 10 15 20Profit 1 098 097 0965

1minus LoLP 099 099 099 099EENS 098 098 098 098QoS 101 101 101 101

Runtime 144s 400s 1368s 3289sIter 223 53 547 332N + Tr 1343 2719 4115 5591

MD Strat (I) 4096 42e6 43e9 44e12

005 010 015 020 025 030082084086088090092094096098100

Pro

fit

Pro

fit M

Performance Comparison vs Varaiya et al [7] and He et al [10]

Ours[7][10]

005 010 015 020 025 03000020406081012

EE

NS

EE

NS M

Ours[7][10]

005 010 015 020 025 030Wind Penetration ηW

088090092094096098100102

QoS

QoS

m Ours[7][10]

Fig 7 Comparison to alternative approaches via Monte Carlo sim-ulation

energy price v for opportunistic users with consequent reduction ofQoS (the peaks of 1 minus LoLP are aligned to the valleys of QoS)By ranking strategies by expected Profit our algorithm is capa-ble of selecting the optimal strategy despite the complex parameterinterdependences in the model under analysis

In Table 2 we compare synthesis results while varying the num-ber of discretization bins for W (all values are normalized to thecorresponding specification) First we note that the expected sys-tem performances do not substantially vary by changing the num-ber of bins supporting our choice of 5 bins in the other experi-ments Second runtime results show that the algorithm can handlein reasonable time problems more than 10times larger than the onesanalyzed in [14] the only other algorithm in the literature capableof processing arbitrary PCTL formulas (we useN+Tr the sum ofstates and transitions in the EMDP as a proxy of the model size)Third we note that the alternative approach of verifying all the IMD strategies σ isin ΣMD is not practical due to the exponentialincrease of I (defined in Section 211) with the problem size

In Fig 6 we study the effect of different confidence levels CL inthe wind-energy forecast on the expected operator Profit whilevarying the value of wind penetration ηW =

sumKk=1

E[Wk]E[Wk + q]

and keeping all constraints constant At high CL higher profitscan be expected for increasing ηW (wind energy is assumed free)On the other hand for low CL higher wind penetration createsmore uncertainty thus lowering the expected Profit The networkoperator can use these curves to assess the return of investment inemploying more accurate (and expensive) forecast techniques

Finally in Fig 7 we compare results with two other energy-pricing formulations proposed in the literature He et al [10] solvethe optimization problem without enforcing any constraint Varaiyaet al [7] only put limits on the acceptable LoLP (their approach isnot trivially extendable to reward properties expressed using the Roperator) and solve optimally only for LoLP = 0 Comparison isdone by solving the different optimization problems and then run-ning Monte Carlo simulations (1000 runs) of the controlled systemon test data (different from the training ones) to evaluate its per-formance (Profit EENS and QoS) As expected the uncon-strained strategy from [10] has higher Profit (up to 5) but alsoup to 12 more EENS and 10 less QoS compared to our ap-proach The strategy from [7] guarantees null EENS but it hasup to 6 less Profit (due to over constraining EENS) and 10less QoS (which is left unconstrained)

As a final remark we note that runtime may increase exponen-tially as we tighten the specification thresholds (QoSm LoLPM EENSM ) since it becomes increasingly more difficult to find a

solution within the exponentially-sized search space Neverthelessthe chosen values were tight enough to improve the quality of alter-native energy pricing and dispatch strategies proposed in the litera-ture while maintaining the runtime acceptable for this application

7 CONCLUSIONS AND FUTURE WORKWe proposed the first sound and complete algorithm for the syn-

thesis of control strategies for MDPs satisfying properties expressedin PCTL and robust to uncertainties in the transition probabilitiesWe then applied the algorithm to the problem of energy pricingand dispatch in smart grids with renewables Results showed thatnetwork-operator risks can be effectively constrained at design timeand that more accurate predictions of the expected profit can be ob-tained by taking the uncertainty of wind availability into account

As future work we plan to investigate techniques to generatemore concise constraints to prove failure of the verification step inorder to prune more effectively the search space for the optimiza-tion engine and to apply the proposed strategy synthesis approachto further case studies eg semi-autonomous car driving

Acknowledgments The authors thank J Venkatesh and profT Simunic Rosing for providing the measurement data used in thecase study The research was partially funded by DARPA AwardNumber HR0011-12-2-0016 and by TerraSwarm one of six cen-ters of STARnet a Semiconductor Research Corporation programsponsored by MARCO and DARPA Approved for public releasedistribution is unlimited The content of this paper does not neces-sarily reflect the position or the policy of the US government andno official endorsement should be inferred

8 REFERENCES[1] Courcoubetis C and Yannakakis M ldquoThe Complexity of

Probabilistic Verificationrdquo Journal of ACM vol 42(4) pp857ndash907 1995

[2] H Hansson and B Jonsson ldquoA Logic for Reasoning AboutTime and Reliabilityrdquo Formal Aspects of Computing vol6(5) pp 512ndash535 1994

[3] A Bianco and L de Alfaro ldquoModel Checking ofProbabilistic and Nondeterministic Systemsrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 1995vol 1026 pp 499ndash513

[4] M Puterman Markov Decision Processes DiscreteStochastic Dynamic Programming John Wiley and Sons1994

[5] ldquoInternational Energy Agency World Energy Outlook 2009rdquo[Online] httpwwwworldenergyoutlookorgdocsweo2009WEO2009_es_englishpdf

[6] A Gore An Inconvenient Truth New York Rodale 2006[7] P Varaiya F Wu and J Bialek ldquoSmart Operation of Smart

Grid Risk-Limiting Dispatchrdquo Proceedings of the IEEEvol 99 no 1 pp 40ndash57 2011

[8] K Porter and J Rogers ldquoSurvey of Variable GenerationForecasting in the Westrdquo in NREL Subcontract ReportSR-5500-54457 April 2012 p 56

[9] S Koch et al ldquoModeling and Control of AggregatedHeterogeneous Thermostatically Controlled Loads forAncillary Servicesrdquo Proc of PSCC pp 1ndash7 2011

[10] M He S Murugesan and J Zhang ldquoMultiple TimescaleDispatch and Scheduling for Stochastic Reliability in SmartGrids with Wind Generation Integrationrdquo in Proc of IEEEINFOCOM 2011 pp 461ndash465

[11] A Puggelli et al ldquoPolynomial-Time Verification of PCTLproperties of MDPs with Convex Uncertaintiesrdquo in Proc of

CAV ser LNCS Springer Berlin Heidelberg 2013 vol8044 pp 527ndash542

[12] A Puggelli ldquoFormal Techniques for the Verification andOptimal Control of Probabilistic Systems in the Presence ofModeling Uncertaintiesrdquo PhD dissertation University ofCalifornia Berkeley Berkeley CA 2014

[13] C Baier et al ldquoController Synthesis for ProbabilisticSystemsrdquo in Exploring New Frontiers of TheoreticalInformatics Springer US 2004 vol 155 pp 493ndash506

[14] M Lahijanian et al ldquoTemporal Logic Motion Planning andControl with Probabilistic Satisfaction Guaranteesrdquo IEEETransactions on Robotics vol 28 no 2 pp 396ndash409 2012

[15] M Kwiatkowska and D Parker ldquoAutomated Verificationand Strategy Synthesis for Probabilistic Systemsrdquo in Proc ofATVA ser LNCS vol 8172 Springer 2013 pp 5ndash22

[16] K Sen et al ldquoModel-Checking Markov Chains in thePresence of Uncertaintiesrdquo in Proc of TACAS ser LNCSSpringer Berlin Heidelberg 2006 vol 3920 pp 394ndash410

[17] A Nilim and L El Ghaoui ldquoRobust Control of MarkovDecision Processes with Uncertain Transition MatricesrdquoJournal of Operations Research pp 780ndash798 2005

[18] M Y Vardi ldquoAutomatic Verification of ProbabilisticConcurrent Finite State Programsrdquo in Proc of SFCS 1985pp 327ndash338

[19] V Forejt et al ldquoAutomated Verification Techniques forProbabilistic Systemsrdquo Formal Methods for EternalNetworked Software Systems vol 6659 pp 53ndash113 2011

[20] F Bouffard and F Galiana ldquoStochastic Security forOperations Planning with Significant Wind PowerGenerationrdquo in IEEE Transaction on Power Systems 2008vol 23 no 2 pp 306ndash316

[21] A Kucera and O Stražovskyacute ldquoOn the Controller Synthesisfor Finite-State Markov Decision Processesrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 2005vol 3821 pp 541ndash552

[22] K Draeger et al ldquoPermissive Controller Synthesis forProbabilistic Systemsrdquo in Proc of TACAS ser LNCSSpringer 2014 pp 531ndash546

[23] T Brazdil et al ldquoStochastic Games with Branching-TimeWinning Objectivesrdquo in Proc of IEEE SLCS Aug 2006 pp349ndash358

[24] C Hang et al ldquoSynthesizing Cyber-Physical ArchitecturalModels with Real-Time Constraintsrdquo in Proc of CAVBerlin Heidelberg Springer-Verlag 2011 pp 441ndash456

[25] P Nuzzo et al ldquoCalCS SMT Solving for Non-linearConvex Constraintsrdquo in Proc of FMCAD 2010 pp 71ndash80

[26] S Boyd and L Vandenberghe ldquoConvex OptimizationrdquoCambridge University Press 2004

[27] ldquoGurobi Optimizerrdquo [Online] httpwwwgurobicom[28] S Stoft Power System Economics Designing Markets for

Electricity NewYork Wiley 2002[29] Y Wan ldquoLong-term Wind Power Variabilityrdquo NREL

Technical Report TP-5500-53637 [Online]httpwwwnrelgovdocsfy12osti53637pdf January 2012

[30] E Ela et al ldquoOperating Reserves and Variable GenerationrdquoNREL Technical Report [Online]httpwwwnrelgovdocsfy11osti51978pdf August 2011

[31] M Kwiatkowska et al ldquoPRISM 40 Verification ofProbabilistic Real-Time Systemsrdquo Proc of CAV pp585ndash591 2011

Page 6: Robust Strategy Synthesis for Probabilistic Systems Applied to …sseshia/pubdir/emsoft... · 2014-08-07 · Robust Strategy Synthesis for Probabilistic Systems Applied to Risk-Limiting

constraints any action can be selected We finally change the op-timization operator to ldquomaxrdquo so that at the first iteration of thealgorithm the total expected reward gets maximized

We now proceed to consider uncertainties in the transition prob-abilities Constraint (7a) gets updated to Constraint (8) since theadversarial nature tries to minimize the expected reward The newconstraint can be made linear again for an arbitrary uncertaintymodel by replacing it with a set of constraints one for each pointin Fas However this approach results in infinite constraints if theset Fas contains infinitely many points as in the cases consideredin the paper thus making the problem not solvable We solve thisdifficulty for the ellipsoidal uncertainty model using duality InConstraint (9) we replace the primal inner problem with its dualforalls isin S a isin A(s)

xs minus las + nas = ras + minfas isinFas

xTfas (8)

dArrxs minus las + nas = ras + max

λasisinDas (x)

g (λas ) (9)

Vector Lagrange multiplierλas =

[λa1s λ

a2sλ

a3s

]Dual cost functiong (λas ) = λa1s minus λa2s minus ha

sTEasλ

a3s

Dual feasibility setDas (x) = λas | λa3s2 le λa2s xminus λa1s1 + Eas

Tλa3s = 0

The dual problem is convex by construction [26] and has size poly-nomial in the size R of the EMDP [11] Since also the primalproblem is convex strong duality holds ie the primal and dualoptimal solutions coincide because the primal problem satisfiesSlaterrsquos condition [26] for any non-trivial uncertainty set Fas Anydual solution underestimates the primal solution When substitut-ing the primal problem with the dual in Constraint (9) we can dropthe inner optimization operator because the outer optimization op-erator will nevertheless aim to find the least underestimate to max-imize its cost function We get the full formulation for the OE (thequantifiers foralls isin Sforalla isin A(s) are equal to Problem (7))

maxxλlnz

sumsisinS0

xs

st xs minus las + nas = ras + g (λas )

las le Bzas nas le Bzas (10)

zTs 1 = Ms minus 1

xs las n

as ge 0λas isin Das (x) zas isin 0 1

For the ellipsoidal model Problem (10) can be written as Prob-lem (11) which is a Mixed-Integer Quadratic-Constrained Program(MIQCP) that we solve using the back-end optimizer Gurobi [27]

maxxλzln

sumsisinS0

xs

st xs minus las + nas = ras + λa1s minus λa2s minus hasTEasλ

a3s

las le Bzas nas le Bzas

zTs 1 = Ms minus 1 (11)

xs las n

as λ

a2s ge 0 λa3s ge 0 zas isin 0 1

λa3s2 le λa2s xminus λa1s1 + EasTλa3s = 0

42 Verification EngineAfter fixing a candidate strategy σc all the non-determinism inMC is resolved The VE has thus the task of checking whetherthe induced Ellipsoidal-MC (EMC) Mσc

C = (S S0ΩF X L)satisfies the PCTL formula φ for all resolutions of uncertainty ieforallηa isin Nat We use the sound and complete verification algo-rithm presented in [11 12] (we remember that EMCs are a special

case of CMDPs) If the EMC satisfies φ then the optimal strat-egy has been found and σlowast = σc Otherwise the VE needs togenerate an additional constraint to be passed to the OE so thatthe same candidate solution does not get selected anymore If vec-tor zc = [zas0 middot middot middot z

asN ] collects all the binary decision variables that

were set to zero in the previous round of optimization ie the vari-ables corresponding to the previously selected actions we just needto add constraint

zTc 1 ge 1 (12)

so that it is not possible to select the same set of actions againAs an example we optimize the total expected reward of the

EMDP in Fig 1 subject to property φ = Pge08[ϑ U abs] The firstiteration of the OE generates strategy σc1 which selects actions[b a a a] for states [s0 middot middot middot s3] with Wσc1

s0 = 10625 The VEthough reports Pσc1mins0 [ϑ U abs] = 0207 so the strategy is re-jected The VE adds the constraint zbs0 + zas1 + zas2 + zas3 = 1 tothe OE formulation which generates at the second generation σc2([a a a a]) with Wσc2

s0 = 10188 The VE computes the valuePσc2mins0 [ϑ U abs] = 1 and the algorithm terminates reportingσlowast = σc2

43 Algorithm AnalysisWe prove soundness (if a strategy σlowast is returned it indeed opti-

mally solves Problem (5)) completeness (if no solution is returnedno strategy σ isin ΣMD exists such thatMC σ |=Nat φ) and ana-lyze the runtime performance of the proposed algorithm

Theorem 42 The algorithm presented in this section to solveProblem (5) is sound complete and has runtime exponential in thesize R of the EMDP and polynomial in the size Q of the PCTLspecification φ

Proof Problem (10) returns the MD strategy σc that maximizesWσS0

among those still available By Lemma 41 the VE is soundso if it returns MC σc |=Nat φ indeed σlowast = σc (exit arrow atthe bottom of Fig 2) The VE is also complete so if it returnsMC σc 6|=Nat φ the current σc can be discarded This is doneby generating a constraint of the form of Constraint (12) whichremoves only the current σc from the strategies to be explored bythe OE This proves soundness of the overall algorithm

Failure of finding a solution is declared only by the OE whenProblem (10) becomes unfeasible because all available strategieshave previously been discarded by the VE (exit arrow at the top ofFigure 2) This proves completeness

Finally the algorithm goes at most through I = O(MN ) itera-tions (from Section 2 I is the total number of MD strategies M isthe number of actions and N the number of states of the EMDP)Each requires solving an instance of Problem (10) and a verificationcheck (done in time polynomial inR andQ by Lemma 41) Prob-lem (10) can be solved by branch-and-bound algorithms in time ex-ponential in the number of binary variables (whose number isMNand remains constant across iterations) and polynomial in the num-ber of constraints (whose number is polynomial in R at the firstiteration and it grows at each iteration limited by I = O(MN ))The total complexity isO(MNtimes

(2MNtimespoly(MN )+poly(Q))

)

exponential inR and polynomial inQ

The algorithm performs better on problems that do have a fea-sible solution arguably the most interesting ones while the opti-mization step could be removed if the goal is to prove unfeasibilityAs an alternative σlowast could be determined by testing all I avail-able MD strategies and selecting the one with the highest rewardamong those satisfying φ [13] We believe (and experimentallyshow in Section 6) that our approach can achieve better runningtime by decoupling the problem into an optimization and a veri-fication part and by testing strategies in order of optimality Fi-

Real-time price13

Day-ahead price13

Network Operator13Energy dispatch and pricing13

Base-line13 Wind13 Fast-start13

Traditional13Users13

Opportunistic13Users13

Day-ahead13dispatch13

Real-time dispatch13

Uncertain demand13

Uncertain supply13

Supply13

Demand13

Fig 3 Network operator inputs (dashed) and outputs (solid)

nally speed-ups can be obtained by implementing online routinesfor integer-constraint simplification and to produce more succinctcertificates of unfeasibility from the VE [2425] These routines areoutside the scope of this paper and will be covered in future work

5 SUPPLY SCHEDULING AND ENERGYPRICING WITH RENEWABLE SOURCES

We model the pricing and dispatch problem following the sce-nario sketched in Fig 3 [7 10 28] More details are availablein [12]

Three agents operate in the system the network operator andtwo types of users traditional and opportunistic Further three en-ergy sources are available non-dispatchable wind and two dis-patchable sources baseline and fast-start generators We will usestochastic variables WDt Do to represent wind energy supplyand traditional and opportunistic user demand respectively

The network operator takes two kinds of decisions 1) dispatchof non-renewable sources to guarantee that the aggregate energysupply matches the demand 2) pricing of energy to maximizeprofits and incentivize users to join or leave the network depend-ing on energy availability Traditional and opportunistic users re-act to pricing decisions on different timescales Traditional usersonly react to day-ahead pricing to decide how much energy theyare willing to purchase while opportunistic users are capable ofrescheduling in real-time their energy demand depending on theenergy price in exchange of lower expected prices

A 24-hour period gets divided into T1-slots of equal length (egT1 = 1h) and each T1-slot into K T2-slots (eg T2 = 30minK = 2) [10] The operator maximizes its economic profit by tak-ing decisions on the two time scales On day-ahead for each T1-slot it dispatches Q units of baseline energy (unit cost c1) withq = QK units per T2-slot and sets the price for traditional users(u) so that they can decide one day ahead when to schedule theirdemand in the following day The choice of u sets the expected de-mand of traditional users (E[Dt]) In real-time for each T2-slot theoperator first observes the values of traditional-user demand dt andwind availabilityw It then sets the price for opportunistic users (v)to determine the expected demand of opportunistic users (E[Do])Third it dispatches the production of more fast-start energy (unitcost c2) or cancels part of the already dispatched baseline energy(paying only unit cost cp instead of c1) depending on wind avail-ability and user demand to balance supply and demand In realscenarios cp lt c1 lt c2 and wind-energy is assumed to be free forbrevity [10] The operator thus tries to use as much wind energyas possible and to dispatch on day-ahead the exact amount of base-line energy not to incur in real-time in cancellation costs or in theextra cost for fast-start supplies Since more profitable strategiesmight imply a higher reliance on the uncertain wind energy or anincrease in energy prices correct system functionality needs limitson energy-unbalance risks and QoS guarantees for users

There are three sources of stochastic behavior (i) wind-energysupply (W ) (ii) traditional (Dt) user demand (iii) opportunistic(Do) user demand We thus use a stochastic optimization frame-work The result of the optimization should return optimal strate-

wα 0 0

w1αdt1αdo1γ

w1αdt1αdo1β

k=013 k=113 k=213

v1a

v1a

v1av1

b

v1b

Qa ua

Qb ub

Qa ub

w1β 0 0

s0

0 0 0

w1α 0 0w1αdt1α 0

w1β dt1β 0w1β dt1α 0

w1αdt1β 0

w1αdt1αdo1α

w1αdt1αdo1δv1b

w2β 0 0

w2α 0 0

wαdtαdoγ

wαdtαdoβva

va

vavb

Qa ua

Qb ub

Qa ub

wβ 0 0

s0

0 0 0

wαdtα 0

wβ dtβ 0wβ dtα 0wαdtβ 0

wαdtαdoα

wαdtαdoδvb

vb

ab uQ

ab uQ

Fig 4 Sketch ofME (top) andMprimeE (bottom) The latter isME

unfolded across decision epochs For each EMDP the initial states0 is shown on the left Each state is represented by the tuple(w dt do) of observations of WDt Do The pairs (Q u) rep-resent the day-ahead decisions about dispatch of baseline energyQand energy pricing for traditional users u Two arrows per decisiondepart from each state because in this figure we assumed only twodiscretization levels for each quantity (labeled with greek lettersα β middot middot middot ) We only show the state graph in the EMDPs related todecision (Qa ua) but similar state graphs are present also for theother decision pairs (Qa ub) (Qb ua) (Qb ub) as hinted by thedashed arrows departing from s0

gies about 1) the day-ahead decisions (Q and u) and 2) the real-time decisions (v) for each possible observation of W and Dt Inreal-time the actual decision (vobs) will be taken deterministicallyamong the synthesized ones based on the observed values w anddt ie the actual wind availability and traditional-user demandIn summary we will optimize over one T1-slot (the decision prob-lem is periodic so we can run one optimization for each T1-slotstand-alone) and aim to determine optimal values for Q u and v

We use the Ellipsoidal-MDPME = (S S0 AΩF AX L)sketched in Fig 4 (top) All quantities are bounded and uniformlydiscretized to keep the state and action spaces finite States s isin Sare a tuple s = (w dt do) where w dt do refer to the observedvalues of available wind energy and user demand in that state Weconsider only one T1-slot per time so we model only one choiceof day-ahead energy dispatch Q and pricing for traditional usersu (at the initial state s0 (Q u) isin A) The process then transi-tions through K decision epochs as follows First values of windenergy (w) and traditional user demand (dt) are stochastically cho-sen according to the corresponding distributions (described below)Second for each observation of W and Dt a decision on v isin A ismade and the opportunistic user demand (do) is stochastically cho-sen To transition between epochs a new value of wind energy w(the backward arrow in Fig 4 (top)) is chosen and the steps repeat

In general the optimal decision v at each epoch depends onprevious observations of wind availability (w) and user demand(dt do) so v is history-dependent To synthesize control strategiesusing the algorithm in Section 4 we unfold the sequence of deci-sion epochs in the EMDPMprimeE = (Sprime S0 A

primeΩF primeAprimeX prime Lprime)in Fig 4 (bottom) InMprimeE we have explicitly marked each quan-tity with an additional subscript k = 1 middot middot middot K to refer to the cor-responding T2-slot Each state s isin S ofME (apart from the initialstate) has been replicated K times in MprimeE s rarr s1 s2 middot middot middot sK After K decision epochs the states transition to an absorbing state(not shown in Fig 4) Since now all decision epochs are explicitly

codified in the model Markov strategies are optimal forMprimeE Thecorresponding optimal history-dependent strategy forME can bereconstructed for each s isin S by collecting the sequence of optimaldecisions returned by the algorithm in the replicas s1 s2 middot middot middot sK In the following we will only considerMprimeE As an additional ad-vantage different wind distributions can be used to transition be-tween different epochs inMprimeE (only one distribution could be usedinME) to account for the time-varying nature of wind availabilityThe modeling expressivity is thus increased

State transition probabilities are computed using the followingstochastic models

User demand The demand of both traditional (Dt) and oppor-tunistic (Do) users is modeled using Gaussian distributions [10]with Dt sim N (αtu

γt βtE[Dt]) and Do sim N (αovγo βoE[Do])

Parameter γt lt 0 (γo lt 0) is the elasticity of the traditional (op-portunistic) users ie the ratio of the percentage change of theexpected demand to that of price variations formally defined asγt = uE[Dt] middot partE[Dt]partu Parameters αt αo βt βo are fitting pa-rameters To compute transition probabilities we truncate and dis-cretize the continuous PDs in equally-sized intervals pick the mid-dle point of each interval as the discretization value and then inte-grate the PD across the interval to determine how likely the systemtransitions to that discretized value2

Wind-energy availability We created a stochastic model of theavailable wind energy starting from measured data collected fromthe wind farm at Lake Benton Minnesota USA [29] The goal is totake forecast values into account while also considering the intrin-sic inaccuracies of these predictions First we compute the (dis-crete) empirical PD microW of a training set of collected wind-energydata Second we divide a new set of data in T2-slots and considerthe average value for each new T2-slot as the forecast energy valueWe then scale microW to have such expected value E[Wk] thus obtain-ing microWk Finally we compute ellipsoidal Sets (3) F primeas (collectedinF prime) to represent uncertainty in the transition probability betweentwo discretized energy levels in two consecutive T2-slots Transi-tion frequencies are computed by counting observed transitions in atraining set of data Using classical results from statistics [17] wethen compute the value of parameter β from Set (1) correspond-ing to the confidence level CL in the measurements In particular0 le CL le 1 and CL = 1minus cdfχ2

d(2 lowast (βmax minus β)) where cdfχ2

d

is the cumulative density function of the Chi-squared distributionwith d degrees of freedom (d is equal to the number of bins used todiscretize W ) Moreover an increasing value of parameter β canbe used in the sequence of decision epochs to model the fact thatforecast farther-away in time are less accurate

We provide the states with thick circles in Fig 4 (bottom) withthree reward structures to express the profit and risk of the oper-ator and the QoS for the users We choose those states becausethe quantities Dtk DokWk are all fully observable in them thusallowing the computation of the rewards We set

rProfitsk [$] = udtk+vkdokminus(cp∆k+c1(qminus∆k))1A∆kge0

minus(c1qminusc2∆k)1B∆klt0 (13a)

rLoLsk [MWh] = max(0 XE[Wk] + Y E[Dtk +Dok]minus∆k)(13b)

rQualitysk [MWh] = dtk + dok (13c)

with ∆k = wk + q minus dtk minus dok representing the surplus ofsupply on demand X and Y defined in the following and 1 theindicator function Reward (13a) subtracts operating costs to the

2For simplicity we assume that the distributions of Dt and Doare known with certainty In fact these distributions are usuallyestimated more accurately than the wind forecast by analyzing thehistory of measured data [10] Adding uncertainty models also forthese distributions is left as future work

operator revenue to compute the net profit Indicator 1A (1B) cor-responds to the scenario when the sum of day-ahead dispatchedand wind energy is sufficient (insufficient) to cover the demandIn the latter case fast-start energy needs to be dispatched in real-time Reward (13b) computes the Loss of Load (LoL) In prac-tical scenarios the amount of fast-start energy available in real-time is limited Often this limit is computed with the formulaFS le XE[W ] + Y E[Dt +Do] (eg X = 3 Y = 10) [30]If ∆ + FS lt 0 the network incurs in a LoL with potentiallyrisky consequences Reward (13c) accounts for user demand in-centivized by energy pricing

Finally we mark all states with ∆ +FS lt 0 with the label riskand use label abs for the absorbing state so Ω = risk abs

The optimal strategy σlowast = (ulowast Qlowast vlowastk) 1 le k le K is thesolution of problem

Wlowasts0 = maxQu

minfas isinFprimeas

Eσfas

W EσDt maxv1middotmiddotmiddotvK

EσDorewrProfit(πK)

stMprimeE σlowast |=Nat φ where (14)

φ = RrLoL

leEENSM [C abs] andRrQuality

geQoSm [C abs]

and Pge1minusLoLPM [notrisk U abs]

In Problem (14) we maximize the expected value of the operatorprofit Wlowasts0 under the worst-case resolution of uncertainty in thewind-energy forecast by summing the instantaneous state rewardsrProfits along the paths π isin Πfin of K steps ofMprimeE correspond-ing to the K T2-slots By replicating the decision process for eachof the K T2-slots while building modelMprimeE every K-step execu-tion path traverses all decision epochs so the computed reward asintroduced in Definition 26 and 27 will account for the sum of thecontributions of each decision epoch

According to the semantics defined in Table 1 the PCTL specifi-cation φ constrains the expected operator risk and user QoS acrossthe decision horizon EENSM is the desired maximum value ofExpected Energy Not Served LoLPM is the maximum allowedvalue of Loss of Load Probability (these two properties limit therisk for the operator) andQoSm is the minimum value of QoS thatneeds to be guaranteed to the users

6 EXPERIMENTAL RESULTSWe implemented the algorithm in Python and interfaced it with

PRISM [31] as a front-end for entering models and with Gurobi [27]as the back-end optimizer Experiments were run on a 24 GHz In-tel Xeon with 32GB of RAM

In this section we present experimental results obtained by solv-ing Problem (14) Our goals are to give insight about the algo-rithm functionality compare its scalability to other strategy synthe-sis approaches and show that the synthesized energy pricing anddispatch strategies can achieve better performances than other so-lutions presented in the literature

We define the following quantities

Profit = Wlowasts0EENS = Rr

LoLσlowastmaxs0 [C abs]

QoS = RrQualityσlowastmins0 [C abs]

1minus LoLP = Pσlowastmin

s0 [notrisk U abs]

where the min and max operators refer to the action range of na-ture Nat As defined in Section 22 these quantities represent thequantitative values of rewards and satisfaction probability that getthen compared to the corresponding thresholds (EENSM QoSmLoLPM ) in Problem (14) to determine the satisfaction of φ Wewill then normalize the Profit to ProfitM the maximum com-puted profit value for each set of experiments We set T1 = 1hT2 = 30min so K = 2 and consider two pricing options bothfor traditional and opportunistic users If not otherwise stated

Fig 5 Performance vs iteration

Fig 6 Profit vs confidence in forecast

we will use CL = 90 and discretize the wind energy W in 5bins and traditional Dt opportunistic Do demands and baselinesupply Q in 2 bins We set QoSm = 80

sumk E[Dtk +Dok]

LoLPM = 10 EENSM = 5sumk E[Wk + q] The other pa-

rameter values were taken from [10]In Fig 5 we show the trend of the expected system performance

as a function of the synthesis algorithm iteration The Profitmonotonically decreases until the proposed candidate strategy σcmeets all specifications We note that 1minusLoLP andQoS (EENSnot shown has a trend similar to 1 minus LoLP ) instead vary non-monotonically Intuitively this is because the Profit can be in-creased either by scheduling less baseline energy Q to reduce can-cellation costs cp but incurring in a higher risk or by increasing the

Table 2 Performance Analysis

W bins 5 10 15 20Profit 1 098 097 0965

1minus LoLP 099 099 099 099EENS 098 098 098 098QoS 101 101 101 101

Runtime 144s 400s 1368s 3289sIter 223 53 547 332N + Tr 1343 2719 4115 5591

MD Strat (I) 4096 42e6 43e9 44e12

005 010 015 020 025 030082084086088090092094096098100

Pro

fit

Pro

fit M

Performance Comparison vs Varaiya et al [7] and He et al [10]

Ours[7][10]

005 010 015 020 025 03000020406081012

EE

NS

EE

NS M

Ours[7][10]

005 010 015 020 025 030Wind Penetration ηW

088090092094096098100102

QoS

QoS

m Ours[7][10]

Fig 7 Comparison to alternative approaches via Monte Carlo sim-ulation

energy price v for opportunistic users with consequent reduction ofQoS (the peaks of 1 minus LoLP are aligned to the valleys of QoS)By ranking strategies by expected Profit our algorithm is capa-ble of selecting the optimal strategy despite the complex parameterinterdependences in the model under analysis

In Table 2 we compare synthesis results while varying the num-ber of discretization bins for W (all values are normalized to thecorresponding specification) First we note that the expected sys-tem performances do not substantially vary by changing the num-ber of bins supporting our choice of 5 bins in the other experi-ments Second runtime results show that the algorithm can handlein reasonable time problems more than 10times larger than the onesanalyzed in [14] the only other algorithm in the literature capableof processing arbitrary PCTL formulas (we useN+Tr the sum ofstates and transitions in the EMDP as a proxy of the model size)Third we note that the alternative approach of verifying all the IMD strategies σ isin ΣMD is not practical due to the exponentialincrease of I (defined in Section 211) with the problem size

In Fig 6 we study the effect of different confidence levels CL inthe wind-energy forecast on the expected operator Profit whilevarying the value of wind penetration ηW =

sumKk=1

E[Wk]E[Wk + q]

and keeping all constraints constant At high CL higher profitscan be expected for increasing ηW (wind energy is assumed free)On the other hand for low CL higher wind penetration createsmore uncertainty thus lowering the expected Profit The networkoperator can use these curves to assess the return of investment inemploying more accurate (and expensive) forecast techniques

Finally in Fig 7 we compare results with two other energy-pricing formulations proposed in the literature He et al [10] solvethe optimization problem without enforcing any constraint Varaiyaet al [7] only put limits on the acceptable LoLP (their approach isnot trivially extendable to reward properties expressed using the Roperator) and solve optimally only for LoLP = 0 Comparison isdone by solving the different optimization problems and then run-ning Monte Carlo simulations (1000 runs) of the controlled systemon test data (different from the training ones) to evaluate its per-formance (Profit EENS and QoS) As expected the uncon-strained strategy from [10] has higher Profit (up to 5) but alsoup to 12 more EENS and 10 less QoS compared to our ap-proach The strategy from [7] guarantees null EENS but it hasup to 6 less Profit (due to over constraining EENS) and 10less QoS (which is left unconstrained)

As a final remark we note that runtime may increase exponen-tially as we tighten the specification thresholds (QoSm LoLPM EENSM ) since it becomes increasingly more difficult to find a

solution within the exponentially-sized search space Neverthelessthe chosen values were tight enough to improve the quality of alter-native energy pricing and dispatch strategies proposed in the litera-ture while maintaining the runtime acceptable for this application

7 CONCLUSIONS AND FUTURE WORKWe proposed the first sound and complete algorithm for the syn-

thesis of control strategies for MDPs satisfying properties expressedin PCTL and robust to uncertainties in the transition probabilitiesWe then applied the algorithm to the problem of energy pricingand dispatch in smart grids with renewables Results showed thatnetwork-operator risks can be effectively constrained at design timeand that more accurate predictions of the expected profit can be ob-tained by taking the uncertainty of wind availability into account

As future work we plan to investigate techniques to generatemore concise constraints to prove failure of the verification step inorder to prune more effectively the search space for the optimiza-tion engine and to apply the proposed strategy synthesis approachto further case studies eg semi-autonomous car driving

Acknowledgments The authors thank J Venkatesh and profT Simunic Rosing for providing the measurement data used in thecase study The research was partially funded by DARPA AwardNumber HR0011-12-2-0016 and by TerraSwarm one of six cen-ters of STARnet a Semiconductor Research Corporation programsponsored by MARCO and DARPA Approved for public releasedistribution is unlimited The content of this paper does not neces-sarily reflect the position or the policy of the US government andno official endorsement should be inferred

8 REFERENCES[1] Courcoubetis C and Yannakakis M ldquoThe Complexity of

Probabilistic Verificationrdquo Journal of ACM vol 42(4) pp857ndash907 1995

[2] H Hansson and B Jonsson ldquoA Logic for Reasoning AboutTime and Reliabilityrdquo Formal Aspects of Computing vol6(5) pp 512ndash535 1994

[3] A Bianco and L de Alfaro ldquoModel Checking ofProbabilistic and Nondeterministic Systemsrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 1995vol 1026 pp 499ndash513

[4] M Puterman Markov Decision Processes DiscreteStochastic Dynamic Programming John Wiley and Sons1994

[5] ldquoInternational Energy Agency World Energy Outlook 2009rdquo[Online] httpwwwworldenergyoutlookorgdocsweo2009WEO2009_es_englishpdf

[6] A Gore An Inconvenient Truth New York Rodale 2006[7] P Varaiya F Wu and J Bialek ldquoSmart Operation of Smart

Grid Risk-Limiting Dispatchrdquo Proceedings of the IEEEvol 99 no 1 pp 40ndash57 2011

[8] K Porter and J Rogers ldquoSurvey of Variable GenerationForecasting in the Westrdquo in NREL Subcontract ReportSR-5500-54457 April 2012 p 56

[9] S Koch et al ldquoModeling and Control of AggregatedHeterogeneous Thermostatically Controlled Loads forAncillary Servicesrdquo Proc of PSCC pp 1ndash7 2011

[10] M He S Murugesan and J Zhang ldquoMultiple TimescaleDispatch and Scheduling for Stochastic Reliability in SmartGrids with Wind Generation Integrationrdquo in Proc of IEEEINFOCOM 2011 pp 461ndash465

[11] A Puggelli et al ldquoPolynomial-Time Verification of PCTLproperties of MDPs with Convex Uncertaintiesrdquo in Proc of

CAV ser LNCS Springer Berlin Heidelberg 2013 vol8044 pp 527ndash542

[12] A Puggelli ldquoFormal Techniques for the Verification andOptimal Control of Probabilistic Systems in the Presence ofModeling Uncertaintiesrdquo PhD dissertation University ofCalifornia Berkeley Berkeley CA 2014

[13] C Baier et al ldquoController Synthesis for ProbabilisticSystemsrdquo in Exploring New Frontiers of TheoreticalInformatics Springer US 2004 vol 155 pp 493ndash506

[14] M Lahijanian et al ldquoTemporal Logic Motion Planning andControl with Probabilistic Satisfaction Guaranteesrdquo IEEETransactions on Robotics vol 28 no 2 pp 396ndash409 2012

[15] M Kwiatkowska and D Parker ldquoAutomated Verificationand Strategy Synthesis for Probabilistic Systemsrdquo in Proc ofATVA ser LNCS vol 8172 Springer 2013 pp 5ndash22

[16] K Sen et al ldquoModel-Checking Markov Chains in thePresence of Uncertaintiesrdquo in Proc of TACAS ser LNCSSpringer Berlin Heidelberg 2006 vol 3920 pp 394ndash410

[17] A Nilim and L El Ghaoui ldquoRobust Control of MarkovDecision Processes with Uncertain Transition MatricesrdquoJournal of Operations Research pp 780ndash798 2005

[18] M Y Vardi ldquoAutomatic Verification of ProbabilisticConcurrent Finite State Programsrdquo in Proc of SFCS 1985pp 327ndash338

[19] V Forejt et al ldquoAutomated Verification Techniques forProbabilistic Systemsrdquo Formal Methods for EternalNetworked Software Systems vol 6659 pp 53ndash113 2011

[20] F Bouffard and F Galiana ldquoStochastic Security forOperations Planning with Significant Wind PowerGenerationrdquo in IEEE Transaction on Power Systems 2008vol 23 no 2 pp 306ndash316

[21] A Kucera and O Stražovskyacute ldquoOn the Controller Synthesisfor Finite-State Markov Decision Processesrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 2005vol 3821 pp 541ndash552

[22] K Draeger et al ldquoPermissive Controller Synthesis forProbabilistic Systemsrdquo in Proc of TACAS ser LNCSSpringer 2014 pp 531ndash546

[23] T Brazdil et al ldquoStochastic Games with Branching-TimeWinning Objectivesrdquo in Proc of IEEE SLCS Aug 2006 pp349ndash358

[24] C Hang et al ldquoSynthesizing Cyber-Physical ArchitecturalModels with Real-Time Constraintsrdquo in Proc of CAVBerlin Heidelberg Springer-Verlag 2011 pp 441ndash456

[25] P Nuzzo et al ldquoCalCS SMT Solving for Non-linearConvex Constraintsrdquo in Proc of FMCAD 2010 pp 71ndash80

[26] S Boyd and L Vandenberghe ldquoConvex OptimizationrdquoCambridge University Press 2004

[27] ldquoGurobi Optimizerrdquo [Online] httpwwwgurobicom[28] S Stoft Power System Economics Designing Markets for

Electricity NewYork Wiley 2002[29] Y Wan ldquoLong-term Wind Power Variabilityrdquo NREL

Technical Report TP-5500-53637 [Online]httpwwwnrelgovdocsfy12osti53637pdf January 2012

[30] E Ela et al ldquoOperating Reserves and Variable GenerationrdquoNREL Technical Report [Online]httpwwwnrelgovdocsfy11osti51978pdf August 2011

[31] M Kwiatkowska et al ldquoPRISM 40 Verification ofProbabilistic Real-Time Systemsrdquo Proc of CAV pp585ndash591 2011

Page 7: Robust Strategy Synthesis for Probabilistic Systems Applied to …sseshia/pubdir/emsoft... · 2014-08-07 · Robust Strategy Synthesis for Probabilistic Systems Applied to Risk-Limiting

Real-time price13

Day-ahead price13

Network Operator13Energy dispatch and pricing13

Base-line13 Wind13 Fast-start13

Traditional13Users13

Opportunistic13Users13

Day-ahead13dispatch13

Real-time dispatch13

Uncertain demand13

Uncertain supply13

Supply13

Demand13

Fig 3 Network operator inputs (dashed) and outputs (solid)

nally speed-ups can be obtained by implementing online routinesfor integer-constraint simplification and to produce more succinctcertificates of unfeasibility from the VE [2425] These routines areoutside the scope of this paper and will be covered in future work

5 SUPPLY SCHEDULING AND ENERGYPRICING WITH RENEWABLE SOURCES

We model the pricing and dispatch problem following the sce-nario sketched in Fig 3 [7 10 28] More details are availablein [12]

Three agents operate in the system the network operator andtwo types of users traditional and opportunistic Further three en-ergy sources are available non-dispatchable wind and two dis-patchable sources baseline and fast-start generators We will usestochastic variables WDt Do to represent wind energy supplyand traditional and opportunistic user demand respectively

The network operator takes two kinds of decisions 1) dispatchof non-renewable sources to guarantee that the aggregate energysupply matches the demand 2) pricing of energy to maximizeprofits and incentivize users to join or leave the network depend-ing on energy availability Traditional and opportunistic users re-act to pricing decisions on different timescales Traditional usersonly react to day-ahead pricing to decide how much energy theyare willing to purchase while opportunistic users are capable ofrescheduling in real-time their energy demand depending on theenergy price in exchange of lower expected prices

A 24-hour period gets divided into T1-slots of equal length (egT1 = 1h) and each T1-slot into K T2-slots (eg T2 = 30minK = 2) [10] The operator maximizes its economic profit by tak-ing decisions on the two time scales On day-ahead for each T1-slot it dispatches Q units of baseline energy (unit cost c1) withq = QK units per T2-slot and sets the price for traditional users(u) so that they can decide one day ahead when to schedule theirdemand in the following day The choice of u sets the expected de-mand of traditional users (E[Dt]) In real-time for each T2-slot theoperator first observes the values of traditional-user demand dt andwind availabilityw It then sets the price for opportunistic users (v)to determine the expected demand of opportunistic users (E[Do])Third it dispatches the production of more fast-start energy (unitcost c2) or cancels part of the already dispatched baseline energy(paying only unit cost cp instead of c1) depending on wind avail-ability and user demand to balance supply and demand In realscenarios cp lt c1 lt c2 and wind-energy is assumed to be free forbrevity [10] The operator thus tries to use as much wind energyas possible and to dispatch on day-ahead the exact amount of base-line energy not to incur in real-time in cancellation costs or in theextra cost for fast-start supplies Since more profitable strategiesmight imply a higher reliance on the uncertain wind energy or anincrease in energy prices correct system functionality needs limitson energy-unbalance risks and QoS guarantees for users

There are three sources of stochastic behavior (i) wind-energysupply (W ) (ii) traditional (Dt) user demand (iii) opportunistic(Do) user demand We thus use a stochastic optimization frame-work The result of the optimization should return optimal strate-

wα 0 0

w1αdt1αdo1γ

w1αdt1αdo1β

k=013 k=113 k=213

v1a

v1a

v1av1

b

v1b

Qa ua

Qb ub

Qa ub

w1β 0 0

s0

0 0 0

w1α 0 0w1αdt1α 0

w1β dt1β 0w1β dt1α 0

w1αdt1β 0

w1αdt1αdo1α

w1αdt1αdo1δv1b

w2β 0 0

w2α 0 0

wαdtαdoγ

wαdtαdoβva

va

vavb

Qa ua

Qb ub

Qa ub

wβ 0 0

s0

0 0 0

wαdtα 0

wβ dtβ 0wβ dtα 0wαdtβ 0

wαdtαdoα

wαdtαdoδvb

vb

ab uQ

ab uQ

Fig 4 Sketch ofME (top) andMprimeE (bottom) The latter isME

unfolded across decision epochs For each EMDP the initial states0 is shown on the left Each state is represented by the tuple(w dt do) of observations of WDt Do The pairs (Q u) rep-resent the day-ahead decisions about dispatch of baseline energyQand energy pricing for traditional users u Two arrows per decisiondepart from each state because in this figure we assumed only twodiscretization levels for each quantity (labeled with greek lettersα β middot middot middot ) We only show the state graph in the EMDPs related todecision (Qa ua) but similar state graphs are present also for theother decision pairs (Qa ub) (Qb ua) (Qb ub) as hinted by thedashed arrows departing from s0

gies about 1) the day-ahead decisions (Q and u) and 2) the real-time decisions (v) for each possible observation of W and Dt Inreal-time the actual decision (vobs) will be taken deterministicallyamong the synthesized ones based on the observed values w anddt ie the actual wind availability and traditional-user demandIn summary we will optimize over one T1-slot (the decision prob-lem is periodic so we can run one optimization for each T1-slotstand-alone) and aim to determine optimal values for Q u and v

We use the Ellipsoidal-MDPME = (S S0 AΩF AX L)sketched in Fig 4 (top) All quantities are bounded and uniformlydiscretized to keep the state and action spaces finite States s isin Sare a tuple s = (w dt do) where w dt do refer to the observedvalues of available wind energy and user demand in that state Weconsider only one T1-slot per time so we model only one choiceof day-ahead energy dispatch Q and pricing for traditional usersu (at the initial state s0 (Q u) isin A) The process then transi-tions through K decision epochs as follows First values of windenergy (w) and traditional user demand (dt) are stochastically cho-sen according to the corresponding distributions (described below)Second for each observation of W and Dt a decision on v isin A ismade and the opportunistic user demand (do) is stochastically cho-sen To transition between epochs a new value of wind energy w(the backward arrow in Fig 4 (top)) is chosen and the steps repeat

In general the optimal decision v at each epoch depends onprevious observations of wind availability (w) and user demand(dt do) so v is history-dependent To synthesize control strategiesusing the algorithm in Section 4 we unfold the sequence of deci-sion epochs in the EMDPMprimeE = (Sprime S0 A

primeΩF primeAprimeX prime Lprime)in Fig 4 (bottom) InMprimeE we have explicitly marked each quan-tity with an additional subscript k = 1 middot middot middot K to refer to the cor-responding T2-slot Each state s isin S ofME (apart from the initialstate) has been replicated K times in MprimeE s rarr s1 s2 middot middot middot sK After K decision epochs the states transition to an absorbing state(not shown in Fig 4) Since now all decision epochs are explicitly

codified in the model Markov strategies are optimal forMprimeE Thecorresponding optimal history-dependent strategy forME can bereconstructed for each s isin S by collecting the sequence of optimaldecisions returned by the algorithm in the replicas s1 s2 middot middot middot sK In the following we will only considerMprimeE As an additional ad-vantage different wind distributions can be used to transition be-tween different epochs inMprimeE (only one distribution could be usedinME) to account for the time-varying nature of wind availabilityThe modeling expressivity is thus increased

State transition probabilities are computed using the followingstochastic models

User demand The demand of both traditional (Dt) and oppor-tunistic (Do) users is modeled using Gaussian distributions [10]with Dt sim N (αtu

γt βtE[Dt]) and Do sim N (αovγo βoE[Do])

Parameter γt lt 0 (γo lt 0) is the elasticity of the traditional (op-portunistic) users ie the ratio of the percentage change of theexpected demand to that of price variations formally defined asγt = uE[Dt] middot partE[Dt]partu Parameters αt αo βt βo are fitting pa-rameters To compute transition probabilities we truncate and dis-cretize the continuous PDs in equally-sized intervals pick the mid-dle point of each interval as the discretization value and then inte-grate the PD across the interval to determine how likely the systemtransitions to that discretized value2

Wind-energy availability We created a stochastic model of theavailable wind energy starting from measured data collected fromthe wind farm at Lake Benton Minnesota USA [29] The goal is totake forecast values into account while also considering the intrin-sic inaccuracies of these predictions First we compute the (dis-crete) empirical PD microW of a training set of collected wind-energydata Second we divide a new set of data in T2-slots and considerthe average value for each new T2-slot as the forecast energy valueWe then scale microW to have such expected value E[Wk] thus obtain-ing microWk Finally we compute ellipsoidal Sets (3) F primeas (collectedinF prime) to represent uncertainty in the transition probability betweentwo discretized energy levels in two consecutive T2-slots Transi-tion frequencies are computed by counting observed transitions in atraining set of data Using classical results from statistics [17] wethen compute the value of parameter β from Set (1) correspond-ing to the confidence level CL in the measurements In particular0 le CL le 1 and CL = 1minus cdfχ2

d(2 lowast (βmax minus β)) where cdfχ2

d

is the cumulative density function of the Chi-squared distributionwith d degrees of freedom (d is equal to the number of bins used todiscretize W ) Moreover an increasing value of parameter β canbe used in the sequence of decision epochs to model the fact thatforecast farther-away in time are less accurate

We provide the states with thick circles in Fig 4 (bottom) withthree reward structures to express the profit and risk of the oper-ator and the QoS for the users We choose those states becausethe quantities Dtk DokWk are all fully observable in them thusallowing the computation of the rewards We set

rProfitsk [$] = udtk+vkdokminus(cp∆k+c1(qminus∆k))1A∆kge0

minus(c1qminusc2∆k)1B∆klt0 (13a)

rLoLsk [MWh] = max(0 XE[Wk] + Y E[Dtk +Dok]minus∆k)(13b)

rQualitysk [MWh] = dtk + dok (13c)

with ∆k = wk + q minus dtk minus dok representing the surplus ofsupply on demand X and Y defined in the following and 1 theindicator function Reward (13a) subtracts operating costs to the

2For simplicity we assume that the distributions of Dt and Doare known with certainty In fact these distributions are usuallyestimated more accurately than the wind forecast by analyzing thehistory of measured data [10] Adding uncertainty models also forthese distributions is left as future work

operator revenue to compute the net profit Indicator 1A (1B) cor-responds to the scenario when the sum of day-ahead dispatchedand wind energy is sufficient (insufficient) to cover the demandIn the latter case fast-start energy needs to be dispatched in real-time Reward (13b) computes the Loss of Load (LoL) In prac-tical scenarios the amount of fast-start energy available in real-time is limited Often this limit is computed with the formulaFS le XE[W ] + Y E[Dt +Do] (eg X = 3 Y = 10) [30]If ∆ + FS lt 0 the network incurs in a LoL with potentiallyrisky consequences Reward (13c) accounts for user demand in-centivized by energy pricing

Finally we mark all states with ∆ +FS lt 0 with the label riskand use label abs for the absorbing state so Ω = risk abs

The optimal strategy σlowast = (ulowast Qlowast vlowastk) 1 le k le K is thesolution of problem

Wlowasts0 = maxQu

minfas isinFprimeas

Eσfas

W EσDt maxv1middotmiddotmiddotvK

EσDorewrProfit(πK)

stMprimeE σlowast |=Nat φ where (14)

φ = RrLoL

leEENSM [C abs] andRrQuality

geQoSm [C abs]

and Pge1minusLoLPM [notrisk U abs]

In Problem (14) we maximize the expected value of the operatorprofit Wlowasts0 under the worst-case resolution of uncertainty in thewind-energy forecast by summing the instantaneous state rewardsrProfits along the paths π isin Πfin of K steps ofMprimeE correspond-ing to the K T2-slots By replicating the decision process for eachof the K T2-slots while building modelMprimeE every K-step execu-tion path traverses all decision epochs so the computed reward asintroduced in Definition 26 and 27 will account for the sum of thecontributions of each decision epoch

According to the semantics defined in Table 1 the PCTL specifi-cation φ constrains the expected operator risk and user QoS acrossthe decision horizon EENSM is the desired maximum value ofExpected Energy Not Served LoLPM is the maximum allowedvalue of Loss of Load Probability (these two properties limit therisk for the operator) andQoSm is the minimum value of QoS thatneeds to be guaranteed to the users

6 EXPERIMENTAL RESULTSWe implemented the algorithm in Python and interfaced it with

PRISM [31] as a front-end for entering models and with Gurobi [27]as the back-end optimizer Experiments were run on a 24 GHz In-tel Xeon with 32GB of RAM

In this section we present experimental results obtained by solv-ing Problem (14) Our goals are to give insight about the algo-rithm functionality compare its scalability to other strategy synthe-sis approaches and show that the synthesized energy pricing anddispatch strategies can achieve better performances than other so-lutions presented in the literature

We define the following quantities

Profit = Wlowasts0EENS = Rr

LoLσlowastmaxs0 [C abs]

QoS = RrQualityσlowastmins0 [C abs]

1minus LoLP = Pσlowastmin

s0 [notrisk U abs]

where the min and max operators refer to the action range of na-ture Nat As defined in Section 22 these quantities represent thequantitative values of rewards and satisfaction probability that getthen compared to the corresponding thresholds (EENSM QoSmLoLPM ) in Problem (14) to determine the satisfaction of φ Wewill then normalize the Profit to ProfitM the maximum com-puted profit value for each set of experiments We set T1 = 1hT2 = 30min so K = 2 and consider two pricing options bothfor traditional and opportunistic users If not otherwise stated

Fig 5 Performance vs iteration

Fig 6 Profit vs confidence in forecast

we will use CL = 90 and discretize the wind energy W in 5bins and traditional Dt opportunistic Do demands and baselinesupply Q in 2 bins We set QoSm = 80

sumk E[Dtk +Dok]

LoLPM = 10 EENSM = 5sumk E[Wk + q] The other pa-

rameter values were taken from [10]In Fig 5 we show the trend of the expected system performance

as a function of the synthesis algorithm iteration The Profitmonotonically decreases until the proposed candidate strategy σcmeets all specifications We note that 1minusLoLP andQoS (EENSnot shown has a trend similar to 1 minus LoLP ) instead vary non-monotonically Intuitively this is because the Profit can be in-creased either by scheduling less baseline energy Q to reduce can-cellation costs cp but incurring in a higher risk or by increasing the

Table 2 Performance Analysis

W bins 5 10 15 20Profit 1 098 097 0965

1minus LoLP 099 099 099 099EENS 098 098 098 098QoS 101 101 101 101

Runtime 144s 400s 1368s 3289sIter 223 53 547 332N + Tr 1343 2719 4115 5591

MD Strat (I) 4096 42e6 43e9 44e12

005 010 015 020 025 030082084086088090092094096098100

Pro

fit

Pro

fit M

Performance Comparison vs Varaiya et al [7] and He et al [10]

Ours[7][10]

005 010 015 020 025 03000020406081012

EE

NS

EE

NS M

Ours[7][10]

005 010 015 020 025 030Wind Penetration ηW

088090092094096098100102

QoS

QoS

m Ours[7][10]

Fig 7 Comparison to alternative approaches via Monte Carlo sim-ulation

energy price v for opportunistic users with consequent reduction ofQoS (the peaks of 1 minus LoLP are aligned to the valleys of QoS)By ranking strategies by expected Profit our algorithm is capa-ble of selecting the optimal strategy despite the complex parameterinterdependences in the model under analysis

In Table 2 we compare synthesis results while varying the num-ber of discretization bins for W (all values are normalized to thecorresponding specification) First we note that the expected sys-tem performances do not substantially vary by changing the num-ber of bins supporting our choice of 5 bins in the other experi-ments Second runtime results show that the algorithm can handlein reasonable time problems more than 10times larger than the onesanalyzed in [14] the only other algorithm in the literature capableof processing arbitrary PCTL formulas (we useN+Tr the sum ofstates and transitions in the EMDP as a proxy of the model size)Third we note that the alternative approach of verifying all the IMD strategies σ isin ΣMD is not practical due to the exponentialincrease of I (defined in Section 211) with the problem size

In Fig 6 we study the effect of different confidence levels CL inthe wind-energy forecast on the expected operator Profit whilevarying the value of wind penetration ηW =

sumKk=1

E[Wk]E[Wk + q]

and keeping all constraints constant At high CL higher profitscan be expected for increasing ηW (wind energy is assumed free)On the other hand for low CL higher wind penetration createsmore uncertainty thus lowering the expected Profit The networkoperator can use these curves to assess the return of investment inemploying more accurate (and expensive) forecast techniques

Finally in Fig 7 we compare results with two other energy-pricing formulations proposed in the literature He et al [10] solvethe optimization problem without enforcing any constraint Varaiyaet al [7] only put limits on the acceptable LoLP (their approach isnot trivially extendable to reward properties expressed using the Roperator) and solve optimally only for LoLP = 0 Comparison isdone by solving the different optimization problems and then run-ning Monte Carlo simulations (1000 runs) of the controlled systemon test data (different from the training ones) to evaluate its per-formance (Profit EENS and QoS) As expected the uncon-strained strategy from [10] has higher Profit (up to 5) but alsoup to 12 more EENS and 10 less QoS compared to our ap-proach The strategy from [7] guarantees null EENS but it hasup to 6 less Profit (due to over constraining EENS) and 10less QoS (which is left unconstrained)

As a final remark we note that runtime may increase exponen-tially as we tighten the specification thresholds (QoSm LoLPM EENSM ) since it becomes increasingly more difficult to find a

solution within the exponentially-sized search space Neverthelessthe chosen values were tight enough to improve the quality of alter-native energy pricing and dispatch strategies proposed in the litera-ture while maintaining the runtime acceptable for this application

7 CONCLUSIONS AND FUTURE WORKWe proposed the first sound and complete algorithm for the syn-

thesis of control strategies for MDPs satisfying properties expressedin PCTL and robust to uncertainties in the transition probabilitiesWe then applied the algorithm to the problem of energy pricingand dispatch in smart grids with renewables Results showed thatnetwork-operator risks can be effectively constrained at design timeand that more accurate predictions of the expected profit can be ob-tained by taking the uncertainty of wind availability into account

As future work we plan to investigate techniques to generatemore concise constraints to prove failure of the verification step inorder to prune more effectively the search space for the optimiza-tion engine and to apply the proposed strategy synthesis approachto further case studies eg semi-autonomous car driving

Acknowledgments The authors thank J Venkatesh and profT Simunic Rosing for providing the measurement data used in thecase study The research was partially funded by DARPA AwardNumber HR0011-12-2-0016 and by TerraSwarm one of six cen-ters of STARnet a Semiconductor Research Corporation programsponsored by MARCO and DARPA Approved for public releasedistribution is unlimited The content of this paper does not neces-sarily reflect the position or the policy of the US government andno official endorsement should be inferred

8 REFERENCES[1] Courcoubetis C and Yannakakis M ldquoThe Complexity of

Probabilistic Verificationrdquo Journal of ACM vol 42(4) pp857ndash907 1995

[2] H Hansson and B Jonsson ldquoA Logic for Reasoning AboutTime and Reliabilityrdquo Formal Aspects of Computing vol6(5) pp 512ndash535 1994

[3] A Bianco and L de Alfaro ldquoModel Checking ofProbabilistic and Nondeterministic Systemsrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 1995vol 1026 pp 499ndash513

[4] M Puterman Markov Decision Processes DiscreteStochastic Dynamic Programming John Wiley and Sons1994

[5] ldquoInternational Energy Agency World Energy Outlook 2009rdquo[Online] httpwwwworldenergyoutlookorgdocsweo2009WEO2009_es_englishpdf

[6] A Gore An Inconvenient Truth New York Rodale 2006[7] P Varaiya F Wu and J Bialek ldquoSmart Operation of Smart

Grid Risk-Limiting Dispatchrdquo Proceedings of the IEEEvol 99 no 1 pp 40ndash57 2011

[8] K Porter and J Rogers ldquoSurvey of Variable GenerationForecasting in the Westrdquo in NREL Subcontract ReportSR-5500-54457 April 2012 p 56

[9] S Koch et al ldquoModeling and Control of AggregatedHeterogeneous Thermostatically Controlled Loads forAncillary Servicesrdquo Proc of PSCC pp 1ndash7 2011

[10] M He S Murugesan and J Zhang ldquoMultiple TimescaleDispatch and Scheduling for Stochastic Reliability in SmartGrids with Wind Generation Integrationrdquo in Proc of IEEEINFOCOM 2011 pp 461ndash465

[11] A Puggelli et al ldquoPolynomial-Time Verification of PCTLproperties of MDPs with Convex Uncertaintiesrdquo in Proc of

CAV ser LNCS Springer Berlin Heidelberg 2013 vol8044 pp 527ndash542

[12] A Puggelli ldquoFormal Techniques for the Verification andOptimal Control of Probabilistic Systems in the Presence ofModeling Uncertaintiesrdquo PhD dissertation University ofCalifornia Berkeley Berkeley CA 2014

[13] C Baier et al ldquoController Synthesis for ProbabilisticSystemsrdquo in Exploring New Frontiers of TheoreticalInformatics Springer US 2004 vol 155 pp 493ndash506

[14] M Lahijanian et al ldquoTemporal Logic Motion Planning andControl with Probabilistic Satisfaction Guaranteesrdquo IEEETransactions on Robotics vol 28 no 2 pp 396ndash409 2012

[15] M Kwiatkowska and D Parker ldquoAutomated Verificationand Strategy Synthesis for Probabilistic Systemsrdquo in Proc ofATVA ser LNCS vol 8172 Springer 2013 pp 5ndash22

[16] K Sen et al ldquoModel-Checking Markov Chains in thePresence of Uncertaintiesrdquo in Proc of TACAS ser LNCSSpringer Berlin Heidelberg 2006 vol 3920 pp 394ndash410

[17] A Nilim and L El Ghaoui ldquoRobust Control of MarkovDecision Processes with Uncertain Transition MatricesrdquoJournal of Operations Research pp 780ndash798 2005

[18] M Y Vardi ldquoAutomatic Verification of ProbabilisticConcurrent Finite State Programsrdquo in Proc of SFCS 1985pp 327ndash338

[19] V Forejt et al ldquoAutomated Verification Techniques forProbabilistic Systemsrdquo Formal Methods for EternalNetworked Software Systems vol 6659 pp 53ndash113 2011

[20] F Bouffard and F Galiana ldquoStochastic Security forOperations Planning with Significant Wind PowerGenerationrdquo in IEEE Transaction on Power Systems 2008vol 23 no 2 pp 306ndash316

[21] A Kucera and O Stražovskyacute ldquoOn the Controller Synthesisfor Finite-State Markov Decision Processesrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 2005vol 3821 pp 541ndash552

[22] K Draeger et al ldquoPermissive Controller Synthesis forProbabilistic Systemsrdquo in Proc of TACAS ser LNCSSpringer 2014 pp 531ndash546

[23] T Brazdil et al ldquoStochastic Games with Branching-TimeWinning Objectivesrdquo in Proc of IEEE SLCS Aug 2006 pp349ndash358

[24] C Hang et al ldquoSynthesizing Cyber-Physical ArchitecturalModels with Real-Time Constraintsrdquo in Proc of CAVBerlin Heidelberg Springer-Verlag 2011 pp 441ndash456

[25] P Nuzzo et al ldquoCalCS SMT Solving for Non-linearConvex Constraintsrdquo in Proc of FMCAD 2010 pp 71ndash80

[26] S Boyd and L Vandenberghe ldquoConvex OptimizationrdquoCambridge University Press 2004

[27] ldquoGurobi Optimizerrdquo [Online] httpwwwgurobicom[28] S Stoft Power System Economics Designing Markets for

Electricity NewYork Wiley 2002[29] Y Wan ldquoLong-term Wind Power Variabilityrdquo NREL

Technical Report TP-5500-53637 [Online]httpwwwnrelgovdocsfy12osti53637pdf January 2012

[30] E Ela et al ldquoOperating Reserves and Variable GenerationrdquoNREL Technical Report [Online]httpwwwnrelgovdocsfy11osti51978pdf August 2011

[31] M Kwiatkowska et al ldquoPRISM 40 Verification ofProbabilistic Real-Time Systemsrdquo Proc of CAV pp585ndash591 2011

Page 8: Robust Strategy Synthesis for Probabilistic Systems Applied to …sseshia/pubdir/emsoft... · 2014-08-07 · Robust Strategy Synthesis for Probabilistic Systems Applied to Risk-Limiting

codified in the model Markov strategies are optimal forMprimeE Thecorresponding optimal history-dependent strategy forME can bereconstructed for each s isin S by collecting the sequence of optimaldecisions returned by the algorithm in the replicas s1 s2 middot middot middot sK In the following we will only considerMprimeE As an additional ad-vantage different wind distributions can be used to transition be-tween different epochs inMprimeE (only one distribution could be usedinME) to account for the time-varying nature of wind availabilityThe modeling expressivity is thus increased

State transition probabilities are computed using the followingstochastic models

User demand The demand of both traditional (Dt) and oppor-tunistic (Do) users is modeled using Gaussian distributions [10]with Dt sim N (αtu

γt βtE[Dt]) and Do sim N (αovγo βoE[Do])

Parameter γt lt 0 (γo lt 0) is the elasticity of the traditional (op-portunistic) users ie the ratio of the percentage change of theexpected demand to that of price variations formally defined asγt = uE[Dt] middot partE[Dt]partu Parameters αt αo βt βo are fitting pa-rameters To compute transition probabilities we truncate and dis-cretize the continuous PDs in equally-sized intervals pick the mid-dle point of each interval as the discretization value and then inte-grate the PD across the interval to determine how likely the systemtransitions to that discretized value2

Wind-energy availability We created a stochastic model of theavailable wind energy starting from measured data collected fromthe wind farm at Lake Benton Minnesota USA [29] The goal is totake forecast values into account while also considering the intrin-sic inaccuracies of these predictions First we compute the (dis-crete) empirical PD microW of a training set of collected wind-energydata Second we divide a new set of data in T2-slots and considerthe average value for each new T2-slot as the forecast energy valueWe then scale microW to have such expected value E[Wk] thus obtain-ing microWk Finally we compute ellipsoidal Sets (3) F primeas (collectedinF prime) to represent uncertainty in the transition probability betweentwo discretized energy levels in two consecutive T2-slots Transi-tion frequencies are computed by counting observed transitions in atraining set of data Using classical results from statistics [17] wethen compute the value of parameter β from Set (1) correspond-ing to the confidence level CL in the measurements In particular0 le CL le 1 and CL = 1minus cdfχ2

d(2 lowast (βmax minus β)) where cdfχ2

d

is the cumulative density function of the Chi-squared distributionwith d degrees of freedom (d is equal to the number of bins used todiscretize W ) Moreover an increasing value of parameter β canbe used in the sequence of decision epochs to model the fact thatforecast farther-away in time are less accurate

We provide the states with thick circles in Fig 4 (bottom) withthree reward structures to express the profit and risk of the oper-ator and the QoS for the users We choose those states becausethe quantities Dtk DokWk are all fully observable in them thusallowing the computation of the rewards We set

rProfitsk [$] = udtk+vkdokminus(cp∆k+c1(qminus∆k))1A∆kge0

minus(c1qminusc2∆k)1B∆klt0 (13a)

rLoLsk [MWh] = max(0 XE[Wk] + Y E[Dtk +Dok]minus∆k)(13b)

rQualitysk [MWh] = dtk + dok (13c)

with ∆k = wk + q minus dtk minus dok representing the surplus ofsupply on demand X and Y defined in the following and 1 theindicator function Reward (13a) subtracts operating costs to the

2For simplicity we assume that the distributions of Dt and Doare known with certainty In fact these distributions are usuallyestimated more accurately than the wind forecast by analyzing thehistory of measured data [10] Adding uncertainty models also forthese distributions is left as future work

operator revenue to compute the net profit Indicator 1A (1B) cor-responds to the scenario when the sum of day-ahead dispatchedand wind energy is sufficient (insufficient) to cover the demandIn the latter case fast-start energy needs to be dispatched in real-time Reward (13b) computes the Loss of Load (LoL) In prac-tical scenarios the amount of fast-start energy available in real-time is limited Often this limit is computed with the formulaFS le XE[W ] + Y E[Dt +Do] (eg X = 3 Y = 10) [30]If ∆ + FS lt 0 the network incurs in a LoL with potentiallyrisky consequences Reward (13c) accounts for user demand in-centivized by energy pricing

Finally we mark all states with ∆ +FS lt 0 with the label riskand use label abs for the absorbing state so Ω = risk abs

The optimal strategy σlowast = (ulowast Qlowast vlowastk) 1 le k le K is thesolution of problem

Wlowasts0 = maxQu

minfas isinFprimeas

Eσfas

W EσDt maxv1middotmiddotmiddotvK

EσDorewrProfit(πK)

stMprimeE σlowast |=Nat φ where (14)

φ = RrLoL

leEENSM [C abs] andRrQuality

geQoSm [C abs]

and Pge1minusLoLPM [notrisk U abs]

In Problem (14) we maximize the expected value of the operatorprofit Wlowasts0 under the worst-case resolution of uncertainty in thewind-energy forecast by summing the instantaneous state rewardsrProfits along the paths π isin Πfin of K steps ofMprimeE correspond-ing to the K T2-slots By replicating the decision process for eachof the K T2-slots while building modelMprimeE every K-step execu-tion path traverses all decision epochs so the computed reward asintroduced in Definition 26 and 27 will account for the sum of thecontributions of each decision epoch

According to the semantics defined in Table 1 the PCTL specifi-cation φ constrains the expected operator risk and user QoS acrossthe decision horizon EENSM is the desired maximum value ofExpected Energy Not Served LoLPM is the maximum allowedvalue of Loss of Load Probability (these two properties limit therisk for the operator) andQoSm is the minimum value of QoS thatneeds to be guaranteed to the users

6 EXPERIMENTAL RESULTSWe implemented the algorithm in Python and interfaced it with

PRISM [31] as a front-end for entering models and with Gurobi [27]as the back-end optimizer Experiments were run on a 24 GHz In-tel Xeon with 32GB of RAM

In this section we present experimental results obtained by solv-ing Problem (14) Our goals are to give insight about the algo-rithm functionality compare its scalability to other strategy synthe-sis approaches and show that the synthesized energy pricing anddispatch strategies can achieve better performances than other so-lutions presented in the literature

We define the following quantities

Profit = Wlowasts0EENS = Rr

LoLσlowastmaxs0 [C abs]

QoS = RrQualityσlowastmins0 [C abs]

1minus LoLP = Pσlowastmin

s0 [notrisk U abs]

where the min and max operators refer to the action range of na-ture Nat As defined in Section 22 these quantities represent thequantitative values of rewards and satisfaction probability that getthen compared to the corresponding thresholds (EENSM QoSmLoLPM ) in Problem (14) to determine the satisfaction of φ Wewill then normalize the Profit to ProfitM the maximum com-puted profit value for each set of experiments We set T1 = 1hT2 = 30min so K = 2 and consider two pricing options bothfor traditional and opportunistic users If not otherwise stated

Fig 5 Performance vs iteration

Fig 6 Profit vs confidence in forecast

we will use CL = 90 and discretize the wind energy W in 5bins and traditional Dt opportunistic Do demands and baselinesupply Q in 2 bins We set QoSm = 80

sumk E[Dtk +Dok]

LoLPM = 10 EENSM = 5sumk E[Wk + q] The other pa-

rameter values were taken from [10]In Fig 5 we show the trend of the expected system performance

as a function of the synthesis algorithm iteration The Profitmonotonically decreases until the proposed candidate strategy σcmeets all specifications We note that 1minusLoLP andQoS (EENSnot shown has a trend similar to 1 minus LoLP ) instead vary non-monotonically Intuitively this is because the Profit can be in-creased either by scheduling less baseline energy Q to reduce can-cellation costs cp but incurring in a higher risk or by increasing the

Table 2 Performance Analysis

W bins 5 10 15 20Profit 1 098 097 0965

1minus LoLP 099 099 099 099EENS 098 098 098 098QoS 101 101 101 101

Runtime 144s 400s 1368s 3289sIter 223 53 547 332N + Tr 1343 2719 4115 5591

MD Strat (I) 4096 42e6 43e9 44e12

005 010 015 020 025 030082084086088090092094096098100

Pro

fit

Pro

fit M

Performance Comparison vs Varaiya et al [7] and He et al [10]

Ours[7][10]

005 010 015 020 025 03000020406081012

EE

NS

EE

NS M

Ours[7][10]

005 010 015 020 025 030Wind Penetration ηW

088090092094096098100102

QoS

QoS

m Ours[7][10]

Fig 7 Comparison to alternative approaches via Monte Carlo sim-ulation

energy price v for opportunistic users with consequent reduction ofQoS (the peaks of 1 minus LoLP are aligned to the valleys of QoS)By ranking strategies by expected Profit our algorithm is capa-ble of selecting the optimal strategy despite the complex parameterinterdependences in the model under analysis

In Table 2 we compare synthesis results while varying the num-ber of discretization bins for W (all values are normalized to thecorresponding specification) First we note that the expected sys-tem performances do not substantially vary by changing the num-ber of bins supporting our choice of 5 bins in the other experi-ments Second runtime results show that the algorithm can handlein reasonable time problems more than 10times larger than the onesanalyzed in [14] the only other algorithm in the literature capableof processing arbitrary PCTL formulas (we useN+Tr the sum ofstates and transitions in the EMDP as a proxy of the model size)Third we note that the alternative approach of verifying all the IMD strategies σ isin ΣMD is not practical due to the exponentialincrease of I (defined in Section 211) with the problem size

In Fig 6 we study the effect of different confidence levels CL inthe wind-energy forecast on the expected operator Profit whilevarying the value of wind penetration ηW =

sumKk=1

E[Wk]E[Wk + q]

and keeping all constraints constant At high CL higher profitscan be expected for increasing ηW (wind energy is assumed free)On the other hand for low CL higher wind penetration createsmore uncertainty thus lowering the expected Profit The networkoperator can use these curves to assess the return of investment inemploying more accurate (and expensive) forecast techniques

Finally in Fig 7 we compare results with two other energy-pricing formulations proposed in the literature He et al [10] solvethe optimization problem without enforcing any constraint Varaiyaet al [7] only put limits on the acceptable LoLP (their approach isnot trivially extendable to reward properties expressed using the Roperator) and solve optimally only for LoLP = 0 Comparison isdone by solving the different optimization problems and then run-ning Monte Carlo simulations (1000 runs) of the controlled systemon test data (different from the training ones) to evaluate its per-formance (Profit EENS and QoS) As expected the uncon-strained strategy from [10] has higher Profit (up to 5) but alsoup to 12 more EENS and 10 less QoS compared to our ap-proach The strategy from [7] guarantees null EENS but it hasup to 6 less Profit (due to over constraining EENS) and 10less QoS (which is left unconstrained)

As a final remark we note that runtime may increase exponen-tially as we tighten the specification thresholds (QoSm LoLPM EENSM ) since it becomes increasingly more difficult to find a

solution within the exponentially-sized search space Neverthelessthe chosen values were tight enough to improve the quality of alter-native energy pricing and dispatch strategies proposed in the litera-ture while maintaining the runtime acceptable for this application

7 CONCLUSIONS AND FUTURE WORKWe proposed the first sound and complete algorithm for the syn-

thesis of control strategies for MDPs satisfying properties expressedin PCTL and robust to uncertainties in the transition probabilitiesWe then applied the algorithm to the problem of energy pricingand dispatch in smart grids with renewables Results showed thatnetwork-operator risks can be effectively constrained at design timeand that more accurate predictions of the expected profit can be ob-tained by taking the uncertainty of wind availability into account

As future work we plan to investigate techniques to generatemore concise constraints to prove failure of the verification step inorder to prune more effectively the search space for the optimiza-tion engine and to apply the proposed strategy synthesis approachto further case studies eg semi-autonomous car driving

Acknowledgments The authors thank J Venkatesh and profT Simunic Rosing for providing the measurement data used in thecase study The research was partially funded by DARPA AwardNumber HR0011-12-2-0016 and by TerraSwarm one of six cen-ters of STARnet a Semiconductor Research Corporation programsponsored by MARCO and DARPA Approved for public releasedistribution is unlimited The content of this paper does not neces-sarily reflect the position or the policy of the US government andno official endorsement should be inferred

8 REFERENCES[1] Courcoubetis C and Yannakakis M ldquoThe Complexity of

Probabilistic Verificationrdquo Journal of ACM vol 42(4) pp857ndash907 1995

[2] H Hansson and B Jonsson ldquoA Logic for Reasoning AboutTime and Reliabilityrdquo Formal Aspects of Computing vol6(5) pp 512ndash535 1994

[3] A Bianco and L de Alfaro ldquoModel Checking ofProbabilistic and Nondeterministic Systemsrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 1995vol 1026 pp 499ndash513

[4] M Puterman Markov Decision Processes DiscreteStochastic Dynamic Programming John Wiley and Sons1994

[5] ldquoInternational Energy Agency World Energy Outlook 2009rdquo[Online] httpwwwworldenergyoutlookorgdocsweo2009WEO2009_es_englishpdf

[6] A Gore An Inconvenient Truth New York Rodale 2006[7] P Varaiya F Wu and J Bialek ldquoSmart Operation of Smart

Grid Risk-Limiting Dispatchrdquo Proceedings of the IEEEvol 99 no 1 pp 40ndash57 2011

[8] K Porter and J Rogers ldquoSurvey of Variable GenerationForecasting in the Westrdquo in NREL Subcontract ReportSR-5500-54457 April 2012 p 56

[9] S Koch et al ldquoModeling and Control of AggregatedHeterogeneous Thermostatically Controlled Loads forAncillary Servicesrdquo Proc of PSCC pp 1ndash7 2011

[10] M He S Murugesan and J Zhang ldquoMultiple TimescaleDispatch and Scheduling for Stochastic Reliability in SmartGrids with Wind Generation Integrationrdquo in Proc of IEEEINFOCOM 2011 pp 461ndash465

[11] A Puggelli et al ldquoPolynomial-Time Verification of PCTLproperties of MDPs with Convex Uncertaintiesrdquo in Proc of

CAV ser LNCS Springer Berlin Heidelberg 2013 vol8044 pp 527ndash542

[12] A Puggelli ldquoFormal Techniques for the Verification andOptimal Control of Probabilistic Systems in the Presence ofModeling Uncertaintiesrdquo PhD dissertation University ofCalifornia Berkeley Berkeley CA 2014

[13] C Baier et al ldquoController Synthesis for ProbabilisticSystemsrdquo in Exploring New Frontiers of TheoreticalInformatics Springer US 2004 vol 155 pp 493ndash506

[14] M Lahijanian et al ldquoTemporal Logic Motion Planning andControl with Probabilistic Satisfaction Guaranteesrdquo IEEETransactions on Robotics vol 28 no 2 pp 396ndash409 2012

[15] M Kwiatkowska and D Parker ldquoAutomated Verificationand Strategy Synthesis for Probabilistic Systemsrdquo in Proc ofATVA ser LNCS vol 8172 Springer 2013 pp 5ndash22

[16] K Sen et al ldquoModel-Checking Markov Chains in thePresence of Uncertaintiesrdquo in Proc of TACAS ser LNCSSpringer Berlin Heidelberg 2006 vol 3920 pp 394ndash410

[17] A Nilim and L El Ghaoui ldquoRobust Control of MarkovDecision Processes with Uncertain Transition MatricesrdquoJournal of Operations Research pp 780ndash798 2005

[18] M Y Vardi ldquoAutomatic Verification of ProbabilisticConcurrent Finite State Programsrdquo in Proc of SFCS 1985pp 327ndash338

[19] V Forejt et al ldquoAutomated Verification Techniques forProbabilistic Systemsrdquo Formal Methods for EternalNetworked Software Systems vol 6659 pp 53ndash113 2011

[20] F Bouffard and F Galiana ldquoStochastic Security forOperations Planning with Significant Wind PowerGenerationrdquo in IEEE Transaction on Power Systems 2008vol 23 no 2 pp 306ndash316

[21] A Kucera and O Stražovskyacute ldquoOn the Controller Synthesisfor Finite-State Markov Decision Processesrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 2005vol 3821 pp 541ndash552

[22] K Draeger et al ldquoPermissive Controller Synthesis forProbabilistic Systemsrdquo in Proc of TACAS ser LNCSSpringer 2014 pp 531ndash546

[23] T Brazdil et al ldquoStochastic Games with Branching-TimeWinning Objectivesrdquo in Proc of IEEE SLCS Aug 2006 pp349ndash358

[24] C Hang et al ldquoSynthesizing Cyber-Physical ArchitecturalModels with Real-Time Constraintsrdquo in Proc of CAVBerlin Heidelberg Springer-Verlag 2011 pp 441ndash456

[25] P Nuzzo et al ldquoCalCS SMT Solving for Non-linearConvex Constraintsrdquo in Proc of FMCAD 2010 pp 71ndash80

[26] S Boyd and L Vandenberghe ldquoConvex OptimizationrdquoCambridge University Press 2004

[27] ldquoGurobi Optimizerrdquo [Online] httpwwwgurobicom[28] S Stoft Power System Economics Designing Markets for

Electricity NewYork Wiley 2002[29] Y Wan ldquoLong-term Wind Power Variabilityrdquo NREL

Technical Report TP-5500-53637 [Online]httpwwwnrelgovdocsfy12osti53637pdf January 2012

[30] E Ela et al ldquoOperating Reserves and Variable GenerationrdquoNREL Technical Report [Online]httpwwwnrelgovdocsfy11osti51978pdf August 2011

[31] M Kwiatkowska et al ldquoPRISM 40 Verification ofProbabilistic Real-Time Systemsrdquo Proc of CAV pp585ndash591 2011

Page 9: Robust Strategy Synthesis for Probabilistic Systems Applied to …sseshia/pubdir/emsoft... · 2014-08-07 · Robust Strategy Synthesis for Probabilistic Systems Applied to Risk-Limiting

Fig 5 Performance vs iteration

Fig 6 Profit vs confidence in forecast

we will use CL = 90 and discretize the wind energy W in 5bins and traditional Dt opportunistic Do demands and baselinesupply Q in 2 bins We set QoSm = 80

sumk E[Dtk +Dok]

LoLPM = 10 EENSM = 5sumk E[Wk + q] The other pa-

rameter values were taken from [10]In Fig 5 we show the trend of the expected system performance

as a function of the synthesis algorithm iteration The Profitmonotonically decreases until the proposed candidate strategy σcmeets all specifications We note that 1minusLoLP andQoS (EENSnot shown has a trend similar to 1 minus LoLP ) instead vary non-monotonically Intuitively this is because the Profit can be in-creased either by scheduling less baseline energy Q to reduce can-cellation costs cp but incurring in a higher risk or by increasing the

Table 2 Performance Analysis

W bins 5 10 15 20Profit 1 098 097 0965

1minus LoLP 099 099 099 099EENS 098 098 098 098QoS 101 101 101 101

Runtime 144s 400s 1368s 3289sIter 223 53 547 332N + Tr 1343 2719 4115 5591

MD Strat (I) 4096 42e6 43e9 44e12

005 010 015 020 025 030082084086088090092094096098100

Pro

fit

Pro

fit M

Performance Comparison vs Varaiya et al [7] and He et al [10]

Ours[7][10]

005 010 015 020 025 03000020406081012

EE

NS

EE

NS M

Ours[7][10]

005 010 015 020 025 030Wind Penetration ηW

088090092094096098100102

QoS

QoS

m Ours[7][10]

Fig 7 Comparison to alternative approaches via Monte Carlo sim-ulation

energy price v for opportunistic users with consequent reduction ofQoS (the peaks of 1 minus LoLP are aligned to the valleys of QoS)By ranking strategies by expected Profit our algorithm is capa-ble of selecting the optimal strategy despite the complex parameterinterdependences in the model under analysis

In Table 2 we compare synthesis results while varying the num-ber of discretization bins for W (all values are normalized to thecorresponding specification) First we note that the expected sys-tem performances do not substantially vary by changing the num-ber of bins supporting our choice of 5 bins in the other experi-ments Second runtime results show that the algorithm can handlein reasonable time problems more than 10times larger than the onesanalyzed in [14] the only other algorithm in the literature capableof processing arbitrary PCTL formulas (we useN+Tr the sum ofstates and transitions in the EMDP as a proxy of the model size)Third we note that the alternative approach of verifying all the IMD strategies σ isin ΣMD is not practical due to the exponentialincrease of I (defined in Section 211) with the problem size

In Fig 6 we study the effect of different confidence levels CL inthe wind-energy forecast on the expected operator Profit whilevarying the value of wind penetration ηW =

sumKk=1

E[Wk]E[Wk + q]

and keeping all constraints constant At high CL higher profitscan be expected for increasing ηW (wind energy is assumed free)On the other hand for low CL higher wind penetration createsmore uncertainty thus lowering the expected Profit The networkoperator can use these curves to assess the return of investment inemploying more accurate (and expensive) forecast techniques

Finally in Fig 7 we compare results with two other energy-pricing formulations proposed in the literature He et al [10] solvethe optimization problem without enforcing any constraint Varaiyaet al [7] only put limits on the acceptable LoLP (their approach isnot trivially extendable to reward properties expressed using the Roperator) and solve optimally only for LoLP = 0 Comparison isdone by solving the different optimization problems and then run-ning Monte Carlo simulations (1000 runs) of the controlled systemon test data (different from the training ones) to evaluate its per-formance (Profit EENS and QoS) As expected the uncon-strained strategy from [10] has higher Profit (up to 5) but alsoup to 12 more EENS and 10 less QoS compared to our ap-proach The strategy from [7] guarantees null EENS but it hasup to 6 less Profit (due to over constraining EENS) and 10less QoS (which is left unconstrained)

As a final remark we note that runtime may increase exponen-tially as we tighten the specification thresholds (QoSm LoLPM EENSM ) since it becomes increasingly more difficult to find a

solution within the exponentially-sized search space Neverthelessthe chosen values were tight enough to improve the quality of alter-native energy pricing and dispatch strategies proposed in the litera-ture while maintaining the runtime acceptable for this application

7 CONCLUSIONS AND FUTURE WORKWe proposed the first sound and complete algorithm for the syn-

thesis of control strategies for MDPs satisfying properties expressedin PCTL and robust to uncertainties in the transition probabilitiesWe then applied the algorithm to the problem of energy pricingand dispatch in smart grids with renewables Results showed thatnetwork-operator risks can be effectively constrained at design timeand that more accurate predictions of the expected profit can be ob-tained by taking the uncertainty of wind availability into account

As future work we plan to investigate techniques to generatemore concise constraints to prove failure of the verification step inorder to prune more effectively the search space for the optimiza-tion engine and to apply the proposed strategy synthesis approachto further case studies eg semi-autonomous car driving

Acknowledgments The authors thank J Venkatesh and profT Simunic Rosing for providing the measurement data used in thecase study The research was partially funded by DARPA AwardNumber HR0011-12-2-0016 and by TerraSwarm one of six cen-ters of STARnet a Semiconductor Research Corporation programsponsored by MARCO and DARPA Approved for public releasedistribution is unlimited The content of this paper does not neces-sarily reflect the position or the policy of the US government andno official endorsement should be inferred

8 REFERENCES[1] Courcoubetis C and Yannakakis M ldquoThe Complexity of

Probabilistic Verificationrdquo Journal of ACM vol 42(4) pp857ndash907 1995

[2] H Hansson and B Jonsson ldquoA Logic for Reasoning AboutTime and Reliabilityrdquo Formal Aspects of Computing vol6(5) pp 512ndash535 1994

[3] A Bianco and L de Alfaro ldquoModel Checking ofProbabilistic and Nondeterministic Systemsrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 1995vol 1026 pp 499ndash513

[4] M Puterman Markov Decision Processes DiscreteStochastic Dynamic Programming John Wiley and Sons1994

[5] ldquoInternational Energy Agency World Energy Outlook 2009rdquo[Online] httpwwwworldenergyoutlookorgdocsweo2009WEO2009_es_englishpdf

[6] A Gore An Inconvenient Truth New York Rodale 2006[7] P Varaiya F Wu and J Bialek ldquoSmart Operation of Smart

Grid Risk-Limiting Dispatchrdquo Proceedings of the IEEEvol 99 no 1 pp 40ndash57 2011

[8] K Porter and J Rogers ldquoSurvey of Variable GenerationForecasting in the Westrdquo in NREL Subcontract ReportSR-5500-54457 April 2012 p 56

[9] S Koch et al ldquoModeling and Control of AggregatedHeterogeneous Thermostatically Controlled Loads forAncillary Servicesrdquo Proc of PSCC pp 1ndash7 2011

[10] M He S Murugesan and J Zhang ldquoMultiple TimescaleDispatch and Scheduling for Stochastic Reliability in SmartGrids with Wind Generation Integrationrdquo in Proc of IEEEINFOCOM 2011 pp 461ndash465

[11] A Puggelli et al ldquoPolynomial-Time Verification of PCTLproperties of MDPs with Convex Uncertaintiesrdquo in Proc of

CAV ser LNCS Springer Berlin Heidelberg 2013 vol8044 pp 527ndash542

[12] A Puggelli ldquoFormal Techniques for the Verification andOptimal Control of Probabilistic Systems in the Presence ofModeling Uncertaintiesrdquo PhD dissertation University ofCalifornia Berkeley Berkeley CA 2014

[13] C Baier et al ldquoController Synthesis for ProbabilisticSystemsrdquo in Exploring New Frontiers of TheoreticalInformatics Springer US 2004 vol 155 pp 493ndash506

[14] M Lahijanian et al ldquoTemporal Logic Motion Planning andControl with Probabilistic Satisfaction Guaranteesrdquo IEEETransactions on Robotics vol 28 no 2 pp 396ndash409 2012

[15] M Kwiatkowska and D Parker ldquoAutomated Verificationand Strategy Synthesis for Probabilistic Systemsrdquo in Proc ofATVA ser LNCS vol 8172 Springer 2013 pp 5ndash22

[16] K Sen et al ldquoModel-Checking Markov Chains in thePresence of Uncertaintiesrdquo in Proc of TACAS ser LNCSSpringer Berlin Heidelberg 2006 vol 3920 pp 394ndash410

[17] A Nilim and L El Ghaoui ldquoRobust Control of MarkovDecision Processes with Uncertain Transition MatricesrdquoJournal of Operations Research pp 780ndash798 2005

[18] M Y Vardi ldquoAutomatic Verification of ProbabilisticConcurrent Finite State Programsrdquo in Proc of SFCS 1985pp 327ndash338

[19] V Forejt et al ldquoAutomated Verification Techniques forProbabilistic Systemsrdquo Formal Methods for EternalNetworked Software Systems vol 6659 pp 53ndash113 2011

[20] F Bouffard and F Galiana ldquoStochastic Security forOperations Planning with Significant Wind PowerGenerationrdquo in IEEE Transaction on Power Systems 2008vol 23 no 2 pp 306ndash316

[21] A Kucera and O Stražovskyacute ldquoOn the Controller Synthesisfor Finite-State Markov Decision Processesrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 2005vol 3821 pp 541ndash552

[22] K Draeger et al ldquoPermissive Controller Synthesis forProbabilistic Systemsrdquo in Proc of TACAS ser LNCSSpringer 2014 pp 531ndash546

[23] T Brazdil et al ldquoStochastic Games with Branching-TimeWinning Objectivesrdquo in Proc of IEEE SLCS Aug 2006 pp349ndash358

[24] C Hang et al ldquoSynthesizing Cyber-Physical ArchitecturalModels with Real-Time Constraintsrdquo in Proc of CAVBerlin Heidelberg Springer-Verlag 2011 pp 441ndash456

[25] P Nuzzo et al ldquoCalCS SMT Solving for Non-linearConvex Constraintsrdquo in Proc of FMCAD 2010 pp 71ndash80

[26] S Boyd and L Vandenberghe ldquoConvex OptimizationrdquoCambridge University Press 2004

[27] ldquoGurobi Optimizerrdquo [Online] httpwwwgurobicom[28] S Stoft Power System Economics Designing Markets for

Electricity NewYork Wiley 2002[29] Y Wan ldquoLong-term Wind Power Variabilityrdquo NREL

Technical Report TP-5500-53637 [Online]httpwwwnrelgovdocsfy12osti53637pdf January 2012

[30] E Ela et al ldquoOperating Reserves and Variable GenerationrdquoNREL Technical Report [Online]httpwwwnrelgovdocsfy11osti51978pdf August 2011

[31] M Kwiatkowska et al ldquoPRISM 40 Verification ofProbabilistic Real-Time Systemsrdquo Proc of CAV pp585ndash591 2011

Page 10: Robust Strategy Synthesis for Probabilistic Systems Applied to …sseshia/pubdir/emsoft... · 2014-08-07 · Robust Strategy Synthesis for Probabilistic Systems Applied to Risk-Limiting

solution within the exponentially-sized search space Neverthelessthe chosen values were tight enough to improve the quality of alter-native energy pricing and dispatch strategies proposed in the litera-ture while maintaining the runtime acceptable for this application

7 CONCLUSIONS AND FUTURE WORKWe proposed the first sound and complete algorithm for the syn-

thesis of control strategies for MDPs satisfying properties expressedin PCTL and robust to uncertainties in the transition probabilitiesWe then applied the algorithm to the problem of energy pricingand dispatch in smart grids with renewables Results showed thatnetwork-operator risks can be effectively constrained at design timeand that more accurate predictions of the expected profit can be ob-tained by taking the uncertainty of wind availability into account

As future work we plan to investigate techniques to generatemore concise constraints to prove failure of the verification step inorder to prune more effectively the search space for the optimiza-tion engine and to apply the proposed strategy synthesis approachto further case studies eg semi-autonomous car driving

Acknowledgments The authors thank J Venkatesh and profT Simunic Rosing for providing the measurement data used in thecase study The research was partially funded by DARPA AwardNumber HR0011-12-2-0016 and by TerraSwarm one of six cen-ters of STARnet a Semiconductor Research Corporation programsponsored by MARCO and DARPA Approved for public releasedistribution is unlimited The content of this paper does not neces-sarily reflect the position or the policy of the US government andno official endorsement should be inferred

8 REFERENCES[1] Courcoubetis C and Yannakakis M ldquoThe Complexity of

Probabilistic Verificationrdquo Journal of ACM vol 42(4) pp857ndash907 1995

[2] H Hansson and B Jonsson ldquoA Logic for Reasoning AboutTime and Reliabilityrdquo Formal Aspects of Computing vol6(5) pp 512ndash535 1994

[3] A Bianco and L de Alfaro ldquoModel Checking ofProbabilistic and Nondeterministic Systemsrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 1995vol 1026 pp 499ndash513

[4] M Puterman Markov Decision Processes DiscreteStochastic Dynamic Programming John Wiley and Sons1994

[5] ldquoInternational Energy Agency World Energy Outlook 2009rdquo[Online] httpwwwworldenergyoutlookorgdocsweo2009WEO2009_es_englishpdf

[6] A Gore An Inconvenient Truth New York Rodale 2006[7] P Varaiya F Wu and J Bialek ldquoSmart Operation of Smart

Grid Risk-Limiting Dispatchrdquo Proceedings of the IEEEvol 99 no 1 pp 40ndash57 2011

[8] K Porter and J Rogers ldquoSurvey of Variable GenerationForecasting in the Westrdquo in NREL Subcontract ReportSR-5500-54457 April 2012 p 56

[9] S Koch et al ldquoModeling and Control of AggregatedHeterogeneous Thermostatically Controlled Loads forAncillary Servicesrdquo Proc of PSCC pp 1ndash7 2011

[10] M He S Murugesan and J Zhang ldquoMultiple TimescaleDispatch and Scheduling for Stochastic Reliability in SmartGrids with Wind Generation Integrationrdquo in Proc of IEEEINFOCOM 2011 pp 461ndash465

[11] A Puggelli et al ldquoPolynomial-Time Verification of PCTLproperties of MDPs with Convex Uncertaintiesrdquo in Proc of

CAV ser LNCS Springer Berlin Heidelberg 2013 vol8044 pp 527ndash542

[12] A Puggelli ldquoFormal Techniques for the Verification andOptimal Control of Probabilistic Systems in the Presence ofModeling Uncertaintiesrdquo PhD dissertation University ofCalifornia Berkeley Berkeley CA 2014

[13] C Baier et al ldquoController Synthesis for ProbabilisticSystemsrdquo in Exploring New Frontiers of TheoreticalInformatics Springer US 2004 vol 155 pp 493ndash506

[14] M Lahijanian et al ldquoTemporal Logic Motion Planning andControl with Probabilistic Satisfaction Guaranteesrdquo IEEETransactions on Robotics vol 28 no 2 pp 396ndash409 2012

[15] M Kwiatkowska and D Parker ldquoAutomated Verificationand Strategy Synthesis for Probabilistic Systemsrdquo in Proc ofATVA ser LNCS vol 8172 Springer 2013 pp 5ndash22

[16] K Sen et al ldquoModel-Checking Markov Chains in thePresence of Uncertaintiesrdquo in Proc of TACAS ser LNCSSpringer Berlin Heidelberg 2006 vol 3920 pp 394ndash410

[17] A Nilim and L El Ghaoui ldquoRobust Control of MarkovDecision Processes with Uncertain Transition MatricesrdquoJournal of Operations Research pp 780ndash798 2005

[18] M Y Vardi ldquoAutomatic Verification of ProbabilisticConcurrent Finite State Programsrdquo in Proc of SFCS 1985pp 327ndash338

[19] V Forejt et al ldquoAutomated Verification Techniques forProbabilistic Systemsrdquo Formal Methods for EternalNetworked Software Systems vol 6659 pp 53ndash113 2011

[20] F Bouffard and F Galiana ldquoStochastic Security forOperations Planning with Significant Wind PowerGenerationrdquo in IEEE Transaction on Power Systems 2008vol 23 no 2 pp 306ndash316

[21] A Kucera and O Stražovskyacute ldquoOn the Controller Synthesisfor Finite-State Markov Decision Processesrdquo in Proc ofFSTTCS ser LNCS Springer Berlin Heidelberg 2005vol 3821 pp 541ndash552

[22] K Draeger et al ldquoPermissive Controller Synthesis forProbabilistic Systemsrdquo in Proc of TACAS ser LNCSSpringer 2014 pp 531ndash546

[23] T Brazdil et al ldquoStochastic Games with Branching-TimeWinning Objectivesrdquo in Proc of IEEE SLCS Aug 2006 pp349ndash358

[24] C Hang et al ldquoSynthesizing Cyber-Physical ArchitecturalModels with Real-Time Constraintsrdquo in Proc of CAVBerlin Heidelberg Springer-Verlag 2011 pp 441ndash456

[25] P Nuzzo et al ldquoCalCS SMT Solving for Non-linearConvex Constraintsrdquo in Proc of FMCAD 2010 pp 71ndash80

[26] S Boyd and L Vandenberghe ldquoConvex OptimizationrdquoCambridge University Press 2004

[27] ldquoGurobi Optimizerrdquo [Online] httpwwwgurobicom[28] S Stoft Power System Economics Designing Markets for

Electricity NewYork Wiley 2002[29] Y Wan ldquoLong-term Wind Power Variabilityrdquo NREL

Technical Report TP-5500-53637 [Online]httpwwwnrelgovdocsfy12osti53637pdf January 2012

[30] E Ela et al ldquoOperating Reserves and Variable GenerationrdquoNREL Technical Report [Online]httpwwwnrelgovdocsfy11osti51978pdf August 2011

[31] M Kwiatkowska et al ldquoPRISM 40 Verification ofProbabilistic Real-Time Systemsrdquo Proc of CAV pp585ndash591 2011