an energy management controller to optimally tradeoff fuel

An Energy Management Controller to Optimally Tradeoff FuelEconomy and Drivability for Hybrid Vehicles

Daniel F. Opila, Xiaoyong Wang, Ryan McGee, R. Brent Gillespie, Jeffrey A. Cook, and J.W. Grizzle

Abstract—Hybrid Vehicle fuel economy performance is highlysensitive to the Energy Management strategy used to regulatepower flow among the various energy sources and sinks. Optimalnon-causal solutions are easy to determine if the drive cycleis known a priori. It is very challenging to design causalcontrollers that yield good fuel economy for a range of possibledriver behavior. Additional challenges come in the form ofconstraints on powertrain activity, such as shifting and startingthe engine, which are commonly called “drivability” metricsand can adversely affect fuel economy. In this paper, drivabilityrestrictions are included in a Shortest Path Stochastic DynamicProgramming (SP-SDP) formulation of the real-time energymanagement problem for a prototype vehicle, where the drivecycle is modeled as a stationary, finite-state Markov chain. Whenthe SP-SDP controllers are evaluated with a high-fidelity vehiclesimulator over standard government drive cycles, and comparedto a baseline industrial controller, they are shown to improvefuel economy more than 11% for equivalent levels of drivability.In addition, the explicit tradeoff between fuel economy anddrivability is quantified for the SP-SDP controllers.

I. INTRODUCTION

Hybrid vehicles have become increasingly popular in theautomotive marketplace in the past decade. The most commontype is the electric hybrid, which consists of an internalcombustion engine (ICE), a battery, and at least one electricmachine (EM). Hybrids are built in several configurationsincluding series, parallel, and the series-parallel configurationconsidered here. Hybrid vehicles are characterized by multipleenergy sources; the strategy to control the energy flow amongthese multiple sources is termed “Energy Management” and iscrucial for good fuel economy. An excellent overview of thisarea is available in [4].

The energy management problem has been studied exten-sively in academic circles. Various control design methods areused, including rule-based [5], [6], [7], [8], neural networks[9], game theory [10], and fuzzy logic [11]. There are manyproposed methods available for both the non-causal (cycleknown in advance) and causal (cycle unknown in advance)cases [12], [13], [14], as well as those with partial futureinformation [15], [16]. The most commonly used optimizationstrategies are the Equivalent Consumption Minimization Strat-egy (ECMS) [17], [18], [19], [20], [21], [22] and Stochastic

This material is based upon work supported under a National ScienceFoundation Graduate Research Fellowship. D.F. Opila is supported by NDSEGand NSF-GRFP fellowships. D.F. Opila, J.A. Cook, and J.W. Grizzle weresupported by a grant from Ford Motor Company. Portions of this work haveappeared in [1], [2], [3].

Daniel Opila and Brent Gillespie are with the Dept. of Mechanical En-gineering, University of Michigan.{dopila,brentg}@umich.eduXiaoyong Wang and Ryan McGee are with Ford Motor Company,Dearborn, MI. Jeffrey Cook and Jessy Grizzle are with the Dept. ofElectrical Engineering and Computer Science, University of Michigan.{jeffcook,grizzle}@umich.edu

or Deterministic Dynamic Programming [23], [24], [25], [26].The majority of existing work focuses on controllers thatseek to minimize fuel consumption alone. In practice, fuel-optimal controllers can lead to excessive gear shifting andengine starting/stopping [27], [28], [29], [30]. Such powertrainbehavior is known as “drivability.” Previous research hasaddressed drivability in a suboptimal manner by incorporatingpenalties on engine starts in an ECMS formulation [18]. Thereference [31] addressed engine starts indirectly by includinga hysteresis term “to avoid a too frequent switch on - switchoff of the [internal combustion] engine, which would cause anadditional energy use and wearout.”

In this paper, drivability restrictions are directly incorpo-rated for the first time in a causal, optimal controller designmethod for the energy management system of an HEV. Themain tool is Shortest Path Stochastic Dynamic Programming(SP-SDP), which, as explained in [32], [33], [34], [26], isa specific formulation of Stochastic Dynamic Programming(SDP) that allows infinite horizon optimization problems tobe addressed without the use of discounting. In the energymanagement problem, the power requested by the driver,which is the equivalent of a drive cycle, is modeled as astationary, finite-state Markov chain [23]. The state space ofthe Markov chain is constructed to include a terminal statecorresponding to key-off [26]. The terminal state is designedto be absorbing (that is, it is reached in finite time withprobability one, and there is zero probability of transitioningout of it). If zero cost is incurred in the absorbing terminalstate, then the expected value of the cost function is finite,even without discounting [33], [34].

The controllers generated through SP-SDP are causal statefeedbacks and hence are directly implementable in a real-timecontrol architecture. The controllers are provably optimal if thedriving behavior matches the assumed Markov chain model.In this paper, the Markov chains representing driver behaviorare modeled on standard government test cycles, as in [23],[26]. It is also possible to build the Markov chains on the basisof real-world driving data, as reported in [35].

In addition to generating a class of optimal controllers,the SP-SDP method allows direct study of the tradeoffsbetween different performance goals, here, drivability andfuel economy. The ability to easily generate Pareto tradeoffcurves is perhaps just as interesting as a specific fuel economybenefit. The designer can generate both the maximum attain-able performance curve and causal controllers that generatethe computed performance. Drivability is emphasized in thispaper, but one could also study the fuel economy tradeoffwith other attributes such as emissions, battery wear, or enginenoise characteristics.

One place where SP-SDP can have a major impact is

1

in controller design for new vehicles. Significant effort isrequired to develop a controller for a new drivetrain, especiallywith a completely new architecture. The SP-SDP method canautomatically generate a provably optimal controller for agiven vehicle architecture and component sizing much fasterthan a person could do it manually. This is especially valuableearly in a program during the hardware design phase.

The work reported here is a collaborative effort between theUniversity of Michigan and Ford Motor Company. The vehiclestudied is a modified Volvo S-80 prototype and does not matchany vehicle currently on the market. As a benchmark, Fordprovided a controller developed for this prototype vehicle.This industrial controller was described in [2] and is termedhereafter the “baseline” controller. In addition, Ford provideda high-fidelity vehicle simulation model calibrated for theprototype vehicle; this is the same simulation environmentused to develop HEV control algorithms and to evaluate fueleconomy and drivability for production vehicles [36].

The remainder of the paper is organized as follows. Sec-tion II presents the vehicle architecture and two dynamicmodels; one is a simplified vehicle model for controller designand the second is the high-fidelity model mentioned above.The drivability metrics used in the optimization problem arepresented in Section III. The particular form of infinite-horizonstochastic optimal control used here, SP-SDP, is presented inSection IV, with a key result that greatly enhances off-linecomputational speed presented in Section V. The procedurefor sweeping out the Pareto tradeoff surface is presentedin Section VI; this involves computing a large family ofcontrollers based on the simplified control-oriented modeland evaluating each controller’s performance with the high-fidelity model, which will more closely approximate the actualperformance on the prototype vehicle. The main results ofthe work are presented in Sections VII and VIII. Concludingremarks are given in Section IX. The Appendix providesadditional information on enhancing off-line computationalspeed for SP-SDP and points out a relation between SP-SDPand ECMS.

II. VEHICLE

A. Vehicle Architecture

The vehicle studied in this paper is a prototype Volvo S-80 series-parallel electric hybrid and is shown schematicallyin Fig. 1. A 2.4 L diesel engine is coupled to the frontaxle through a dual clutch 6-speed transmission. An electricmachine, EM1, is directly coupled to the engine crankshaftand can generate power regardless of clutch state. A secondelectric machine, EM2, is directly coupled to the rear axlethrough a fixed gear ratio without a clutch and always rotatesat a speed proportional to vehicle speed. Energy is stored ina 1.5 kWh battery pack with a recommended State of Charge(SOC) range of 0.35-0.65. The system parameters are listedin Table I.

B. Vehicle Models

The work presented in this paper uses two separate dynamicmodels to represent the same prototype hybrid vehicle. The

TABLE I: Vehicle Parameters

Engine Displacement 2.4 LMax Engine Power 120 kWElectric Machine Power EM1 (Front) 15 kWElectric Machine Power EM2 (Rear) 35 kWBattery Capacity 1.5 kWhBattery Power Limit 34 kWBattery SOC Range 0.35-0.65Vehicle Mass 1895 kg

tex t

tex t

Differential

Battery

Electric Machine 2(EM2)

t etex t

Transmission

EM 1

Clutch

Front

Electric Machine 1

Diesel Engine

Fig. 1: Vehicle Configuration

first model is quite simple; it has a sample time of 1s, useslookup tables, and has very few states. It is used primarily todesign the controller and do the optimization, and is called the“control-oriented” model.

The second model comes from Ford Motor Company anduses its in-house modeling architecture. This sophisticatedmodel is used to evaluate fuel economy and controller behaviorby simulating controllers on drive cycles. This model isreferred to as the “high-fidelity” model in this paper [36].

This combination of models allows the controller to bedesigned on the basis of a simple model for computationaltractability, while providing performance assessment on thebasis of a model that much more closely reflects the compli-cated dynamics of the prototype vehicle.

C. Control-Oriented Model

When using Shortest-Path Stochastic Dynamic Program-ming, the off-line computation cost is very sensitive to thenumber of system states. For this reason, the model used todevelop the controller must be as simple as possible. Thevehicle model used here contains the minimum functionalityrequired to model the vehicle behavior of interest on a second-by-second basis. Dynamics much faster than the sample timeof 1s are ignored. Long-term transients that only weaklyaffect performance are also ignored; coolant temperature isone example.

The vehicle hardware allows three main operating condi-tions:

1) Parallel Mode-The engine is on and the clutch isengaged.

2

2) Series Mode-The engine is on and the clutch is disen-gaged. The only torque to the wheels is through EM2.

3) Electric Mode-The engine is off and the clutch isdisengaged; again the only torque to the wheels isthrough EM2.

The model does not restrict the direction of power flow. Theelectric machines can be either motors or generators in allmodes.

The dynamics of the internal combustion engine are ig-nored; it is assumed that the engine torque exactly matchesvalid commands and the fuel consumption is a function onlyof speed, ωICE , and torque, TICE . The fuel consumption mf

is derived from a lookup table based on dynamometer testing,

mf = F (ωICE , TICE).

The dual clutch transmission has discrete gears and notorque converter. The transmission is modeled with a constantmechanical efficiency of 0.95. Transmission gear shifts areallowed every time step (1s) and transmission dynamics areassumed negligible. While the physical configuration of thetransmission allows arbitrary shifting, the low-level transmis-sion controller anticipates sequential up/down shifting and themodel respects this assumption. This technique is advanta-geous in hardware because shifts execute by smoothly transi-tioning between the two clutches and continually transmittingtorque. One transmission shaft holds the even gears and theother the odd gears. An arbitrary gear may be selected whenthe clutch is disengaged. When the clutch is engaged, thevehicle is in parallel mode and the engine speed is assumeddirectly proportional to wheel speed based on the currenttransmission gear ratio Rg ,

ωICE = Rgωwheel.

The electric machine EM1 is directly coupled to thecrankshaft, and thus rotates at the engine speed ωICE ,

ωEM1 = ωICE .

In parallel mode, the engine torque TICE and EM1 torqueTEM1 transmitted to the wheel are assumed proportionalto wheel torque based on the current gear ratio Rg andthe transmission efficiency ηtrans. The rear electric machinetorque TEM2 transmitted to the wheel is proportional to themachine’s gear ratio REM2 and rear differential efficiencyηdiff . The total wheel torque Twheel from both axles isthus the sum of the ICE and EM1 torques to the wheelηTransRg(TICE +TEM1) and the rear electric machine EM2torque to the wheel ηdiffREM2TEM2,

ηtransRg(TICE +TEM1)+ηdiffREM2TEM2 = Twheel. (1)

The clutch can be disengaged at any time, and power isdelivered to the road through the rear electric machine EM2.This condition is treated as the neutral gear 0, which combineswith the 6 standard gears for a total of 7 gear states. If theengine is on with the clutch disengaged, the vehicle is in seriesmode. The engine-EM1 combination acts as a generator andcan operate at an arbitrary torque and speed. If the engine isoff while the clutch is disengaged, the vehicle is in electric

mode. The clutch is never engaged with the engine off, sothis mode is undefined and not used.

TABLE II: Vehicle Mode Definitions.

Mode Clutch State Engine State Gear StateElectric Disengaged Off 0Series Disengaged On 0

Parallel Engaged On 1-6Undefined/not used Engaged Off 1-6

The battery system is similarly reduced to a table lookupform. The electrical dynamics due to the motor, battery, andpower electronics are assumed sufficiently fast to be ignored.The energy losses and efficiencies in these components canbe grouped together such that the change in battery SOC isa function κ of electric machine speeds ωEM1 and ωEM2,torques TEM1 and TEM2, and battery SOC at the current timestep,

SOCk+1 = κ(SOCk, ωEM1k, ωEM2k

, TEM1k, TEM2k

). (2)

Assuming a known vehicle speed, the only state variablerequired for this vehicle model is the battery SOC. Changesin battery performance due to temperature, age, and wear areignored.

During operation, the desired wheel torque is defined by thedriver. If we assume the vehicle must meet the torque demandperfectly, then the sum of the ICE and EM contributions towheel torque (1) must equal the demanded torque Tdemand,

Twheel = Tdemand. (3)

This adds a constraint to the control optimization, reducingthe 4 control inputs to a 3 degree of freedom problem. Inparallel mode the control inputs are Engine Torque, EM1Torque, and Transmission Gear. In series mode, the electricmachine command becomes EM1 Speed.

Optimization using the control-oriented model assumes aperfect driver during the design process. The desired roadpower is calculated as the exact power required to drivethe cycle at that time. A causal PID driver is used duringsimulation. Now, given vehicle speed, demanded road powerand this choice of control inputs, the dynamics become anexplicit function κ of the state Battery SOC and the threecontrol choices as shown in Fig. 2,

SOCk+1 = κ(SOCk, TICEk, TEM1k

, Geark). (4)

In series mode, TEM1 is replaced with ωEM1. The engine fuelconsumption can be calculated from the control inputs.Operational Assumptions:

This control-oriented model uses several assumptions aboutthe allowed vehicle behavior.

1) Regenerative braking is used as much as possible up tothe actuator limits; friction brakes provide any remainingtorque.

2) The clutch in the transmission allows the diesel engineto be decoupled from the wheels.

3) There is no ability to slip the clutch for starts.

3

State: SOCk

Velocity Torque Demand

Vehicle SOC Dynamics

Engine Torque

SOCk+1

Stochastic Driver

Electric Machine 1 Command

Transmission Gear

Fig. 2: Vehicle SOC Dynamics model. The three inputs areEngine Torque, Electric Machine 1 Torque, and TransmissionGear. The vehicle velocity and required torque are providedby the stochastic driver model. In Series Mode, the ElectricMachine 1 command is speed rather than torque.

4) There are no traction control restrictions on the amountof torque that can be applied to the wheels.

D. High-Fidelity Vehicle Simulation Model

As part of this project, Ford provided an in-house modelused to simulate fuel economy. It is a complex, MAT-LAB/Simulink based model with a large number of parametersand states [36]. Every individual subsystem in the vehicle isrepresented by an appropriate block with its own dynamicsand low-level controllers. For each new vehicle, subsystemsare combined appropriately to yield a complete system.

This high-fidelity model contains the baseline controlleralgorithm. To generate simulation results using this controller,an automated driver follows the target cycle using the baselinecontroller.

To use the high-fidelity model with the control algorithmdeveloped here, the SP-SDP controller is implemented inSimulink by interfacing appropriate feedback and commandsignals: Battery State of Charge, Vehicle Speed, Engine State,Gear Command, etc. The high-fidelity model can then bedriven by the SP-SDP controller along a given drive cycle.

E. Baseline Industrial Controller

Optimal Battery Power

Engine State

Machine

Actuator CommandsWheel Power

Battery Power

Eng State

Fig. 3: High Level Baseline Controller Architecture.

The baseline prototype energy management controller stud-ied here is quite complex. Its key features are contained inthree modules, as depicted in Fig. 3. Driver power demand isdetermined from pedal position. One module determines the

optimal battery power flow and adds it to the driver demandto determine the Total Power. A second module determinesthe optimal engine state based on the Total Power using astate machine with hysteresis. A third rule-based module thendetermines individual actuator commands (e.g., power fromthe engine and the two electric machines) based on the TotalPower and the desired engine state. The transmission gear isselected independently by the transmission.

The primary tuning parameters are actually five scalarfunctions, two in the Optimal Battery Power module and threefunctions of vehicle speed in the Engine State Machine mod-ule. These are the same functions that a calibrator would adjustin the vehicle. One advantage of the baseline architecture isthat engine behavior and battery charge maintenance featuresare largely confined to their respective blocks with minimalcrosstalk, simplifying the tuning process considerably.

III. DRIVABILITY CONSTRAINTS

A. Motivation

Customer perception is a crucial component in vehicle pur-chasing decisions. The driver’s perception of overall vehicleresponse and behavior is termed drivability. Manufacturersare very aware of this and exert significant developmenteffort to satisfy drivability requirements. Generally speaking,drivability concerns affect designs as much as fuel economygoals. Current academic work in hybrid vehicle optimizationprimarily focuses on fuel economy. Such results are somewhatless useful to industry because of drivability restrictions inproduction vehicles. If these fuel-optimal controllers are used,drivability restrictions are typically imposed as a separatestep [1]. In this paper, we investigate the optimization offuel economy and drivability simultaneously. Two significantcharacteristics that are noticeable to the driver are the basicbehaviors of the transmission and engine. By including thesereal-world concerns, one can generate controllers that improveperformance and are one step closer to being directly imple-mentable in production.

Drivability is a rather vague term that covers many aspectsof vehicle performance including acceleration, engine noise,braking, automated shifting activity, shift quality [37], andother behaviors. Improving drivability often comes at theexpense of fuel economy. For example, optimal fuel economyfor gasoline engines typically dictates upshifting at the lowestspeed possible, but this leaves the driver little accelerationability after the upshift. Thus upshifts are scheduled to occurat higher speeds than what is optimal for fuel economy.

B. Formulation

There are many aspects to powertrain behavior [38]. Whilemetrics exist to quantify some of these behaviors, for manymore, only qualitative judgements are available. An importantcontribution of the work reported here is the translation ofsome of the qualitative measures into quantitative metricsthan can be used in the optimization formulation. In-housedrivability experts were consulted to assist in developingnumerical drivability criteria. The first step is to describe andquantify engine and transmission behaviors using performance

4

Gear Hunting EventsEngine Off Dwell Time <X seconds

Gear Dwell Time < X secondsEngine On Dwell Time < X secondsShort Duration Events

Mean Engine Off Time

Mean Time in GearMean Engine On TimeMean Dwell Times

Transmission BehaviorEngine Behavior

Gear EventsEngine EventsSimplified Metrics

Fig. 4: Drivability Metric Reduction. The seven complex en-gine and transmission metrics are divided into two categories,mean dwell times and short-duration events. These metrics arethen reduced to the two simplified metrics.

metrics, and the second step is to reduce the complexityof these metrics so that they may be more easily used inoptimization.

A primary concern in drivetrain activity is the frequencyand timing of events, such as gear shifts and engine start/stop.Two categories of metrics are used, the mean time betweenevents and the number of short-duration events; the latter areespecially bothersome to drivers. A short duration event occurswhen the dwell time in a particular state is less than somespecified value; the metric is the number of these occurrences.This type of metric is denoted “Dwell time less than Xseconds,” where X is the cutoff criteria. These “mean” and“short duration” categories of metrics applied to the engine andtransmission generate 7 distinct metrics, termed the “complex”metrics. These 7 metrics represent a detailed description ofvehicle behavior and are shown in the top table in Fig. 4.Many other metrics could obviously be used, but these are animportant subset of the possibilities.

For the transmission, a particularly annoying short-durationevent is hunting, rapid shifting back and forth between thesame two gears. We define a gear hunting event as a sequentialupshift-downshift or downshift-upshift that occurs faster thansome cutoff time X . The metric is the number of occurrencesof a hunting event. This type of shifting often occurs in normaldriving, but only becomes bothersome when the shifts areclosely spaced. Shifting that is frequent or perceived to beunnecessary is often termed shift “busyness,” and is reflectedin both mean dwell time and short duration metric categories.

The most bothersome engine events are those of veryshort duration, and to some extent drivers ignore the long-horizon (10-20s) engine behavior as long as there are noshort-duration events. Although it is theoretically possible toincorporate these metrics in the optimization formulation, thecomputational burden of the required additional states makesthe problem intractable. Another disadvantage is the largeparameter space of penalty weights for the various metrics.Even if all these metrics were directly implemented and acontroller computed, the control designer is left with a verycomplex design process.

C. Simplified Drivability Metrics

We choose therefore to simplify these complex metrics intotwo quantities that can be easily used. The first is gear events,

the total number of shift events on a given trip. The secondmetric is engine events, the total number of engine start andstop events on a trip. This reduction is depicted in Fig. 4.By definition, engine starts and stops are each counted as anevent. Each shift with the clutch engaged is counted as a gearevent, regardless of the change in gear number. A 1st − 2nd

shift is the same as a 1st−3rd shift. Engaging or disengagingthe clutch is not counted as a gear event, regardless of the gearbefore or after the event.

To be useful, it must be possible to adjust the behaviorsmeasured by the complex metrics by adjusting the weights onthe two simple metrics. This has been evaluated in [35]; spacerestrictions prohibit including this analysis here. By findingsimple metrics that are well correlated with the complexmetrics, we can incorporate the simple metrics into the fullSP-SDP algorithm. This provides indirect control over morecomplex drivability behavior while keeping the optimizationproblem feasible.

IV. SHORTEST PATH STOCHASTIC DYNAMICPROGRAMMING

A. Cost Function

In order to design a controller with acceptable drivabilitycharacteristics, the controller should make some compromisebetween fuel economy and drivability. In practice, it is com-mon to think of this problem as a constrained optimization byminimizing fuel consumption while maintaining a constrainton acceptable drivability. Implementable controllers lack fu-ture knowledge, so constraints that depend on both past andfuture are difficult to enforce. Instead, drivability events areincluded as penalties, and those penalty weights are adjusteduntil the outcome is acceptable.

Controllers based only on fuel economy and drivabilitycompletely drain the battery as they seek to minimize fuel. Anadditional cost is added to ensure that the vehicle is chargesustaining over the cycle. This SOC-based cost only occurs atthe end of the trip and is represented as a function φSOC(xT )of the state x, which includes SOC. The performance indexfor a particular drive cycle is then (suppressing the summingindex)

J =T∑0

mf +αT∑0

IGE(x, u)+βT∑0

IEE(x, u)+φSOC(xT ).

(5)The functions I(x, u) are indicator functions and show whena state and control combination produces a gear event (GE)or engine event (EE). T is the time duration from key-on, thestart of the trip, to key-off, the end of the trip. The drivabilitybehavior is not incorporated as a direct constraint, so the searchfor the weighting factors α and β involves some trial and errorbecause the mapping from penalty to outcome is not knowna priori. Note that setting α and β to zero means solving foroptimal fuel economy only.

As the cycle is not known exactly in advance, this opti-mization is conducted in the stochastic sense by minimizingthe expected sum of a running cost function c(x, u, w) subject

5

to the system dynamics,

minEw

∞∑k=0

c(xk, uk, wk) (6)

subject toxk+1 = f(xk, uk, wk) (7)g1(xk, uk, wk) ≤ 0 (8)g2(xk, uk, wk) = 0. (9)

The expectation over the disturbance w is denoted Ew. Ac-tuator limits, torque delivery requirements, and other systemrequirements can be incorporated in the constraints g1 and g2.These constraints are enforced instantaneously at each timestep, in contrast to drivability goals which involve performanceover the whole cycle.

Now, to implement the optimization goal (6), a running costfunction is prescribed to represent (5) as a function only ofthe state x, control input u, and disturbance w at the currenttime

c(x, u, w) = mf (x, u)+αIGE(x, u)+βIEE(x, u)+φSOC(x,w)(10)

The SOC-based cost φSOC(x,w) applies only at the end ofthe trip, when the key-off disturbance occurs and the systemtransitions to the key-off absorbing state1. As this is thestochastic case, the key-off disturbance replaces the terminaltime cost φSOC(xT ) in (5).

B. Problem Formulation

To determine the optimal control strategy for this vehicle,the SP-SDP algorithm is used [25], [26], [33], [34]. Thismethod directly generates a causal controller; characteristics offuture driving behavior are specified via a finite-state Markovchain rather than exact future knowledge. The system modelis formulated as

xk+1 = f(xk, uk, wk), (11)

where uk is a particular control choice in the set of allowablecontrols U , xk is the state, and wk is a random variable arisingfrom the unknown drive cycle. Given this formulation, theoptimal cost V ∗(x) over an infinite horizon is a function ofthe state x and satisfies

V ∗(x) = minu∈U

Ew[c(x, u, w) + V ∗(f(x, u, w))], (12)

where c(x, u, w) is the instantaneous cost as a function of stateand control; (10) is a typical example. This equation representsa compromise between minimizing the current cost c(x, u, w)and the expected future cost V (f(x, u, w)). The control u isselected based on the expectation over the random variablew, rather than a deterministic cost, because the future canonly be estimated based on the probability distribution of w.Note that the cost V (x) is a function of the state only. Thiscost is finite for all x if every point in the state space has apositive probability of eventually transitioning to an absorbing

1Many other vehicle behaviors can be optimally controlled by addingappropriate functions of the form φ(x, u, w); a typical example is limitingSOC deviations during operation to reduce battery wear.

state that incurs zero cost from that time onward. Here, theabsorbing state is key-off, the end of the drive cycle.

The optimal control u∗ is any control that achieves theminimum cost V ∗(x)

u∗(x) = argminu∈U

Ew[c(x, u, w) + V ∗(f(x, u, w))]. (13)

Note that the disturbance w in (12) and (13) may be condi-tioned on the state and control input,

P (wk|xk, uk). (14)

The driver demand is modeled as a Markov chain. Thisdriver is assigned two states: current velocity vk and currentacceleration ak, which are included in the full system state x.The unknown disturbance w in (12) is the acceleration at thenext time step, which is assigned a probability distribution.This means estimating the function

P (ak+1|vk, ak) (15)

for all states vk, ak. The transition probabilities (15) areestimated from known drive cycles that represent typicalbehavior, dubbed the “design cycles.” The variables vk, ak,and ak+1 are discretized to form a grid. For each discrete state[vk,ak] there are a variety of outcomes ak+1. The probabilityof each outcome ak+1 is estimated based on its frequency ofoccurrence during the design cycle, and is clearly a functionof state as in (14). See [25], [26] for more detail.

To track the gear event and engine event metrics (SectionIII), two additional states are required: the Current Gear (1-6)and Engine State (on or off).

Bringing this all together, the full system state vector xcontains five states: one state for the vehicle (Battery SOC),two states for the stochastic driver (vk, ak), and two statesto study drivability (Current Gear and Engine State). Thisformulation is termed the “SP-SDP-Drivability” controller. Asummary of system states is shown in Table III.

TABLE III: Control-Oriented Vehicle Model States

State UnitsBattery Charge (SOC) unitless

Vehicle Speed m/sCurrent Vehicle Acceleration m/s2

Current Transmission Gear Integer 1-6Current Engine State On or Off

The inputs to the model are engine torque, transmissiongear number, and the powers or torques of the two electricmachines. Looking ahead, Section V shows how the two elec-tric machines can be controlled through a single input using anoff-line optimization, reducing the inputs for the optimizationproblem to engine torque, transmission gear number, and totalelectric machine power. The power balance to meet driverdemand (Section II-C) then allows the elimination of onemore input. The final control input u that will be used inthe optimization problem consists therefore of Engine Torqueand Transmission Gear.

6

The form of the Bellman equation (12) associated with anydynamic programming problem allows an analytical compar-ison with ECMS and is discussed in the Appendix.Remark: As demands on controller functionality grow, soalso must the complexity of the design model. For example,to study fuel economy using deterministic dynamic program-ming, the only state required is the battery state of charge; thecontrol inputs are Engine Torque and Transmission Gear. Twomore states are required to study the stochastic version, andthe drivability model requires two additional states.

C. Terminal State

As mentioned in Section IV-B, the dynamics of the systemmust contain an absorbing state. For this case, the absorbingstate represents key-off, when the driver has finished the trip,shut down the vehicle, and removed the key. Once the key-off event occurs, there are no furthur costs incurred, the tripis over, and the vehicle cannot be restarted. The probabilityof transitioning to this state is zero unless the vehicle iscompletely stopped (vk, ak = 0). The probability of a tripending once the vehicle is stopped is calculated based onthe design cycles. This probability is less than one because astopped vehicle could represent a traffic light or other typicaldriving event that does not correspond to the end of a trip.

For fuel economy certification, the battery final SOC mustbe close to the initial SOC or else the test is invalid. To includethis in the SP-SDP formulation, a cost is imposed when thevehicle transitions into the key-off state and the SOC is lessthan the initial SOC. This penalty accrues only once, so theabsorbing state has zero cost from then onwards. Here we adda quadratic penalty in SOC if the final SOC is less than theinitial SOC. No penalty is assigned if the final SOC is higherthan the initial SOC.

The effects of this key-off penalty are clearly visible inthe value function V (x). For the fuel-only case, the valuefunction depends on the current acceleration, velocity, andSOC. Fig. 5 shows V (x) as a function of SOC for oneparticular acceleration and several velocities, with target finalSOC equal to 0.5. Notice that at low velocities, the valuefunction has a pronounced quadratic shape for SOC under0.5, but it flattens out at higher speeds. The SOC penalty onlyoccurs at key-off, which can only occur at zero speed. Thusthe SOC key-off penalty strongly affects the value function atlow speeds, when there is a higher probability of key-off inthe near future. At higher speeds, there is little chance of key-off anytime soon, so the SOC penalty only weakly affects thevalue function. Moreover, there will be a deceleration phasebefore reaching zero speed and thus an opportunity to rechargethe battery.

D. Implementable Constraints

Stochastic Dynamic Programming is inherently computa-tionally intensive and can quickly become intractable. Thecomputation burden is exponential in the number of systemstates; thus the cost function (10) should depend on a minimalnumber of states.

0.35 0.4 0.45 0.5 0.55 0.6 0.65200

400

600

800

1000

1200

1400

1600

SOC

Val

ue F

unct

ion

Velocity = 0 m/sVelocity = 8.6 m/sVelocity = 12.7 m/sVelocity = 25.3 m/s

Stopped

Highway Speed

Fig. 5: Value Function V (x) for several velocities and fixedacceleration. The quadratic penalty on SOC strongly affectsthe value function at low speeds when the driver is more likelyto turn the key off and end the trip.

For optimization, at each time step a penalty is assignedif either a shift or engine event occurs. The only two statesrequired to implement this cost function are the current gearand the engine state. Even so, including drivability in theoptimization imposes roughly a factor of ten increase incomputation over the fuel-only case.

In contrast, suppose the metric of interest were based on amoving window in time. The number of required grid pointsscales with the number of time steps used to specify the metric.For the 1 s update time studied here, penalizing engine eventsof 5 seconds duration or less (rather than the simple on/off)would require 5 grid points for the time history, increasing thesize of the state-space by the same factor of 5 over the on/offcase.

V. COMPUTATION REDUCTION

A. Theory

Proposition: (Minimization Decomposition)Consider a Bellman equation of the form

V ∗(x) = minu∈U(x),u∈U(x,u)

Ew[c(x, u, u, w) + V ∗(f(x, u, w))],

(16)and define

c(x, u) = minu∈U(x,u)

Ew[c(x, u, u, w)]. (17)

Then V ∗(x) satisfies (16) if and only if it satisfies

V ∗(x) = minu∈U(x)

Ew[c(x, u) + V ∗(f(x, u, w))].� (18)

The proof and more detail are available in the Appendix.This result allows a significant reduction in computationcomplexity for problems that have the specific structure (16).The reduced Bellman equation (18) may be solved using onlythe reduced control space U(x). This structure appears quiteoften in energy management problems (see Appendix).

7

The above decomposition is often exploited, usually withoutexplicit theoretical justification [16], [39], [40]. A typicalexample is the power-split configuration which uses enginepower and speed as inputs without an engine speed state [39].The fuel-minimizing engine speed (u) for each engine power(u) is precomputed and stored as a table (see Appendix).

The subsection below details the physical explanation of thestructure (16) for the vehicle considered in this work and howthe decomposition is implemented.

B. Super Electric Machine

In comparison to previous work in [1], the addition of asecond electric machine makes the computation of a SP-SDPsolution potentially more complex. Exploiting structure (16)and using Minimization Decomposition reduced the computa-tional cost to that of a vehicle with a single electric machine,a 90% reduction. The addition of the second electric machineis approximately free in terms of computation.

The system inputs require a tradeoff between the two elec-tric machine torques. Define T(·) as the wheel torque deliveredby a particular actuator. The system dynamics are only affectedby the sum of the electric machine (SEM) torques deliveredto the wheels TSEM

u : TSEM = TEM1 + TEM2 (19)

and not by the difference of the electric machine torquesTDEM

u : TDEM = TEM1 − TEM2. (20)

This torque splitting may be considered u. Since this onedegree of freedom optimization is static (i.e., independent ofthe dynamic states of the model including SOC), it takes theform (16) and can be computed a priori using (17) withoutloss of optimality. This reduces the dimension of the controlspace by one. The fundamental assumption that allows this towork is that the electric machine behaviors are dependent onlyon the current gear, engine state, and velocity, and not the pastvalues of those states.

The physical control inputs to the system are engine torque,transmission gear, EM1 torque and EM2 torque. By replacingthe two electric machine commands with a single electricwheel torque command, the SP-SDP algorithm has only 3control inputs.

A more intuitive explanation is to treat the system as a single“Super Electric Machine.” This device is a black box that takesa desired wheel torque command as an input and uses thevehicle velocity and transmission gear to match the commandtorque with minimal electric power as shown in Fig. 6. Oncethe optimization is complete, this device acts just like a normalelectric machine for the SP-SDP optimization. Internally, thedevice optimizes between the two (or possibly more) electricmachines and issues appropriate commands.

VI. SIMULATION PROCEDURE

SP-SDP-based controllers are compared to a baseline in-dustrial controller. SP-SDP controllers are designed using thecontrol-oriented model and evaluated using the high-fidelity

Optimize

EM1

Gear Velocity

EM2

Super Electric Machine

Total Wheel TorqueElectric Wheel Torque Command

Electric Wheel Torque Command

Super Electric Machine

EM-S

Mechanical Analog:

Internal Function:

++

Total Wheel Torque

Fig. 6: Schematic diagram of a conceptual “Super ElectricMachine” that optimizes the mix between the two electricmachines. This allows one degree of freedom of the controloptimization to be carried out off-line while maintaining theoptimality of the solution.

0 200 400 600 800 1000 1200 1400 1600 1800 20000

10

20

30

40

50

60FTP

time (s)

Spe

ed (

mph

)

0 200 400 600 800 1000 12000

20

40

60

80

time (s)

Spe

ed (

mph

)

NEDC

Fig. 7: Federal Test Procedure (FTP) and New European DriveCycle (NEDC).

vehicle simulation model of Section II-D. This demonstratessome robustness by using two models of the same vehicle,differing in the level of detail in their dynamics. Strictlyspeaking, the optimality guarantees are no longer valid be-cause the test model is different from the design model. Forpractical purposes, a strictly optimal model-based controlleris unattainable in hardware because a model will always havesome mismatch with a real vehicle. Demonstrating excellentperformance on the (exact) design model is only marginallyuseful as it presents no model uncertainty. By designing thecontroller on a simple model and testing on a (not perfectlymatched) complex model, we more closely approximate theprocess of designing on the basis of a model and testing onhardware.

8

Both SP-SDP and the baseline controllers are simulated ontwo government test cycles, the US Federal Test Procedure(FTP) and the New European Drive Cycle (NEDC), which areshown in Fig. 7. Procedurally, this is conducted as follows:

1) A family of SP-SDP controllers is designed accordingto the methods of Section IV. A family is generated byfixing the model driving statistics and sweeping the 2drivability penalties α and β in (10).

2) Each controller in the family is simulated on the high-fidelity model.

3) The fuel economy and drivability metrics are recorded.Fuel economy is computed in units of MPG (Milesdriven Per Gallon of fuel consumed), and hence largenumbers mean better fuel economy.

In the end, each family contains a few hundred individualcontrollers which have each been simulated on the cycle inquestion. Each simulation yields a data point with associatedfuel economy and drivability metrics. Each controller in thefamily has different drivability and fuel economy characteris-tics because of the varying drivability penalties.

Each controller is simulated on the high-fidelity modeldiscussed in Section II-D. The simulations are all causal, sothe final SOC is not guaranteed to exactly match the startingSOC. This could yield false fuel economy results, so all fueleconomy results are corrected based on the final SOC of thedrive cycle. This is done by estimating the additional fuelrequired to charge the battery to its initial SOC, or the potentialfuel savings shown by a final SOC that is higher than thestarting level. This correction is applied according to

∆mf = CBatt∆SOCBSFCmin

ηRegenmax

(21)

where ∆Fuel is the adjustment to the fuel used, CBatt isthe battery capacity, ∆SOC is the difference between thestarting and ending SOC, BSFCmin is the best Brake SpecificFuel Consumption for the engine, and ηRegen

max is the bestcharging efficiency of the electric system. This correction is areasonable approximation but not exact; the exact correctiondepends on the controller and the particular cycle. For theFTP cycle, the mean fuel economy correction for the SOCdeviations presented in Fig. 8e is 1.6%, with a 1.3% standarddeviation. Hence, using this simple correction does not changethe conclusions of the presented results in any substantial way.

Fuel economy numbers in this paper always include theSOC correction, and are normalized so that the baselinecontroller running the FTP cycle has a fuel economy of one.

All simulations in this paper use the same PID driver modelwith identical update rates.

VII. RESULTS: PERFORMANCE TRENDS

A. Fuel Economy Results

The main goal of this research is to tradeoff fuel economyand drivability requirements by using a class of optimalcontrollers, and validate the result against industrial designmethods. The three metrics of interest during vehicle drivingare the number of gear events, engine events, and the totalfuel consumption corrected for SOC. These metrics yield

conflicting goals and there is a distinct tradeoff among them.To study this tradeoff, several hundred controllers are designedwith varying penalties assigned to each gear event and engineevent. This creates one family of controllers as described inSection VI. The results are shown for FTP and NEDC in Fig.8.

After simulation, the resulting data show the tradeoff be-tween fuel economy and drivability. The typical result is a 3-Dscatterplot of one family of controllers as shown in Fig. 8a.Each point represents a single controller driven on the cyclein question. The controllers are all driven on the same testcycle. The total number of gear events and engine events areshown on the horizontal axes, while fuel economy is shownon the vertical axis as normalized MPG2. The combination ofthese points form a surface in 3-D space depicting the tradeoffsurface of various operating conditions. This particular figureshows a family of controllers designed using FTP statisticsrunning the FTP cycle. Fuel economy data presented in thispaper are normalized to the fuel economy of the baselinecontroller on FTP, shown as a large solid circle. Hence, a fueleconomy greater than one means more miles would be traveledusing the same fuel as consumed by the baseline controller,or equivalently, less fuel would be consumed for the samedistance traveled. A polynomial surface is fit to the raw dataand used to generate isoclines of constant gear, shown as solidand dashed lines.

Fig. 8c is a 2-D view of Fig. 8a looking along the gear eventsaxis. Each line in the plot represents a constant number of gearevents, while the horizontal and vertical axes show the numberof engine events and normalized fuel economy respectively.This particular vehicle is relatively insensitive to the number ofgear events, so most of the results concentrate on the tradeoffbetween engine activity and fuel economy. The final SOC forthese simulations is shown in Fig. 8e. All simulations start at0.5 SOC.

Similarly, a family of controllers is designed and simulatedon the NEDC. Fuel economy results are again shown in 3-Dand 2-D in Figs. 8b and 8d, while the final SOC is shownin Fig. 8f. Again, fuel economy is normalized to the baselinecontroller performance on FTP, so the baseline controller isslightly less fuel efficient on NEDC (0.99) than FTP (1.00).

B. Discussion

The frontiers of the 2-D and 3-D point clouds in Fig. 8clearly demonstrate the tradeoff between fuel economy anddrivability. Assuming causal controllers and the same a prioriinformation (vehicle model and Markov chain driver model),no controller’s average performance can exceed the SP-SDPfrontier3.

The plot of final SOC for the FTP cycle (Fig. 8e) shows adistinct downward trend for large numbers of engine events.The target final SOC is 0.5, which the controllers comevery close to achieving when engine events are unrestricted

2Recall that more miles per gallon means better fuel economy, while theinverse would hold if units of liters per 100 kilometers were used.

3The optimality guarantee for SP-SDP is based on expected cost (12) andnot controller performance on a particular realization of the statistics (i.e.,sample path or drive cycle).

9

0 20 40 60 80 100 120

0100

2003000.75

0.8

0.85

0.9

0.95

1

1.05

1.1

1.15

Engine EventsGear Events

Nor

mal

ized

MP

G

SP−SDPBaseline− 93 Gear Events25 Gear Events75 Gear Events

25 Gear Events

75 Gear Events

(a) FTP: Fuel Economy 3-D Scatterplot. Isoclines of constant gearevents (GE) are shown as solid blue (25 GE) and dashed black(75 GE) lines.

0 10 20 30

050

1000.75

0.8

0.85

0.9

0.95

1

1.05

1.1

1.15

Engine EventsGear Events

Nor

mal

ized

MP

G

SP−SDPBaseline− 19 Gear Events5 Gear Events20 Gear Events

20 Gear Events

5 Gear Events

(b) NEDC: Fuel Economy 3-D Scatterplot. Isoclines of constantgear events (GE) are shown as solid blue (5 GE) and dashed black(20 GE) lines.

0 20 40 60 80 100 1200.75

0.8

0.85

0.9

0.95

1

1.05

1.1

1.15

Engine Events

Nor

mal

ized

MP

G

SP−SDPBaseline− 93 Gear Events25 Gear Event Fit25 Gear Event Data75 Gear Event Fit75 Gear Event Data

75 Gear Events

25 Gear Events

(c) FTP: 2-D view along the gear events axis of Fig. 8a. The dataused to fit the constant gear events isoclines are shown as bluetriangles (25 GE) black squares (75 GE).

0 5 10 15 20 25 300.75

0.8

0.85

0.9

0.95

1

1.05

1.1

1.15

1.2

Engine Events

Nor

mal

ized

MP

G

SP−SDPBaseline− 19 Gear Events5 Gear Event Fit5 Gear Event Data20 Gear Event Fit20 Gear Event Data

20 Gear Events

5 Gear Events

(d) NEDC: 2-D view along the gear events axis of Fig. 8b. Thedata used to fit the constant gear event isoclines are shown as bluetriangles (5 GE) black squares (20 GE).

0 20 40 60 80 100 1200.48

0.5

0.52

0.54

0.56

0.58

0.6

0.62

Engine Events

SO

C

SP−SDPBaseline

(e) FTP: Final SOC for the cycle.

0 5 10 15 20 25 30 350.48

0.5

0.52

0.54

0.56

0.58

0.6

0.62

Engine Events

SO

C

SP−SDPBaseline

(f) NEDC: Final SOC for the cycle.

Fig. 8: Performance of SP-SDP controllers on FTP and NEDC. A separate family of controllers is designed for FTP and NEDCusing each cycle’s statistics. The family is designed by sweeping the parameters α and β in the cost function (10). Figs. 8a and8b show the data as a 3-D scatterplot of fuel economy vs. drivablity events. Fuel economy is presented in miles per gallon,so higher is better, and normalized so that the baseline controller has a fuel economy of 1.0 on FTP. SP-SDP controllers areshown as small dots (red). The baseline controller is shown as a large solid circle (green). Figs. 8c and 8d show the view alongthe gear events axes of Figs. 8a and 8b respectively. The raw data points, isoclines, and baseline controller are still visible.Figs. 8e and 8f show the final SOC for these controllers. All controllers start with SOC=0.5.

10

(low penalties). The final SOC penalty φSOC(x,w) in (10)is only applied if the final SOC is below this target. Forfinal SOCs above the target, the only cost is the fuel spentcharging the battery. With smaller numbers of engine events,the controller has less freedom to turn the engine on and chargethe battery. In effect, the controllers become more conservativeand maintain higher SOCs in general to avoid either additionalengine starts or a final SOC that is too low.

An interesting phenomenon occurs when the engine eventspenalty is very high. The clutch cannot slip at low speeds,so the only option is to shut down the engine, or disengagethe clutch to enter series mode. Series mode is generally notused, but with sufficiently high penalties on engine activity,the controller will always enter series mode at low speeds toavoid engine shutdown. This yields a cycle with zero engineevents, other than the initial start and final stop.

The results show some unexpected trends. Figs. 8c and 8dshow a slight decrease in fuel economy for large numbers ofengine events. In some cases on FTP, decreasing the numberof gear events actually increases fuel economy (8c). There aretwo issues here: the optimization is with respect to expectedvalue and not a single sample path as in the plot, and the plotdepicts controller performance on the high-fidelity model andnot the control-oriented design model. To determine whichof these two explanations is the correct one, simulations ofcontroller performance for the FTP cycle were conducted onthe simplified vehicle model (i.e., the control design model)in order to eliminate the issue of model mismatch. Thesesimulations show the same trends discussed above, implyingthat model mismatch is not causing these phenomena. Largenumbers of cycles were then simulated [35] to check theperformance in the expected sense rather than for a singlecycle. This second set of simulations show fuel economy ismonotonic in both gear and engine events as one would expect.

The SP-SDP results show significant (11%) performanceimprovements over the baseline controller for the metricsconsidered here. Production controllers incorporate many ad-ditional attributes, such as noise, harshness, durability, safety,accessory loading, diagnostics, etc. These attributes may de-crease the performance margin when fully incorporated. Oneobvious example is emissions, which was not considered ineither the baseline or SP-SDP controllers. For previous workon NOx emissions, see [41], [23], [42].

VIII. RESULTS: DETAILED PERFORMANCE

A. Results

Several controllers are studied in greater detail on the FTPcycle, which generally yields more interesting behavior thanthe NEDC. The performance of the baseline controller iscompared to 3 SP-SDP controllers in Table IV, all runningthe FTP cycle. The SP-SDP controllers are designed usingFTP statistics and are selected from those shown in Figs.8a-8e. SP-SDP #1 is the controller with the best correctedfuel economy without regard to drivability. The peak of thefuel economy surface (Fig. 8a) is very close to the baselinecontroller operating point in terms of drivability. SP-SDP #2has the closest drivability metrics to the baseline controller,

and is closely related to SP-SDP #1. SP-SDP #3 is selected byfinding a controller with similar fuel economy to the baselinecontroller and about half the number of drivability events.Essentially, we are presenting two possible design choices:improved fuel economy with similar drivability (SP-SDP#2),or similar fuel economy with reduced drivetrain activity (SP-SDP#3). The designer may also select some compromisebetween the two.

Time histories of the baseline and SP-SDP #1 controllersare presented for the first 500 seconds of the FTP cycle inFig. 9. The transmission gear is not shown if the engine is offor the clutch disengaged. The engine torque/speed operatingpoints for these two controllers on the full FTP75 cycle areshown in Fig. 10.

While it is not easy to pinpoint the performance differencesbetween the controllers, some summary metrics are shown forthe baseline and SP-SDP #1 controllers in Table V.

The Forward Wheel Energy is the integral of all motoring(output) wheel power, the Engine Output Energy is the totalenergy delivered at the engine output shaft, Engine BrakeSpecific Fuel Consumption (BSFC) (g/kWh) is the total fuelconsumed divided by the total engine output energy, andBraking Energy is the energy dissipated by the friction brakes.

For the electrical propulsion system, the Electro-MechanicalCharge Energy is the total mechanical energy absorbed bythe electric machines, and the Electro-Mechanical DischargeEnergy is the forward mechanical energy provided by theelectric machines. The losses are the difference between thetwo, and represent all losses in the electrical system includingaccessory loads. The Round-Trip Electrical Efficiency is theDischarge Energy plus any net change in SOC divided by theCharge Energy.

For each metric, the Net Change is shown as the differencebetween the SP-SDP value and the baseline value. The PercentChange is the Net Change scaled by the baseline value. Thefinal column only applies for energy metrics and is the NetChange scaled by the Forward Wheel Energy, a measure ofthe total energy required for the cycle. This gives a senseof relative scale to the numbers, for example, the percentagechange in electro-mechanical discharge energy is 56%, but thechange is only 16% when the net change is divided by the totaltrip energy.

B. Discussion

Figs. 9 and 10 and Table V lend some insight into theperformance differences between the SP-SDP and baselinecontroller. Table V shows the SP-SDP controller is moreefficient in its use of the diesel engine. The engine primarilyoperates near a high efficiency point or completely off, andyields a lower average BSFC as shown in Fig. 10. The high-torque operating points are also visible in Fig. 9. This allowsthe engine to remain off for slightly longer periods of time(Fig. 9). Note that in general, the engine start/stop activity ofthe two controllers is almost identical with the exception ofa few cases. The electric machines are used more extensivelyby the SP-SDP controller, which allows more efficient ICEutilization and more efficient overall electrical propulsion.

11

TABLE IV: Selected SP-SDP controller performance

Controller Description Fuel Economy (Corrected) Engine Events Gear Events Final SOC Fuel Economy (Uncorr.)Baseline Controller 1.000 88 93 0.505 0.997

SP-SDP #1-Best Fuel Economy 1.119 88 106 0.504 1.117SP-SDP #2-Similar Drivetrain Activity 1.114 88 93 0.506 1.110

SP-SDP #3-Reduced Drivetrain Activity 1.010 34 36 0.561 0.977

0 50 100 150 200 250 300 350 400 450 5000

20

40

Vel

ocity

(m

/s)

0 50 100 150 200 250 300 350 400 450 5000

100

200

Eng

ine

Tor

que

(Nm

)

0 50 100 150 200 250 300 350 400 450 5000.420.440.460.48

0.50.52

SO

C

0 50 100 150 200 250 300 350 400 450 500

2

4

6

Gea

r

0 50 100 150 200 250 300 350 400 450 5000

0.5

1

Eng

ine

Sta

te

time (s)

BaselineSP−SDP

Fig. 9: Time traces of selected simulation parameters. The baseline controller is shown as a solid (blue) line, and one particularSP-SDP controller that yields the best overall fuel economy is shown as a dashed (red) line; this is SP-SDP #1 in Table IV.The transmission gear is shown only when the engine is on and the clutch is engaged.

TABLE V: Selected SP-SDP controller performance

Baseline Controller SP-SDP #2 Net Change Percent Change Net ChangeForward Wheel Energy

Forward (Motoring) Wheel Energy (kJ) 8731 8579 -152 -1.7% -1.7%Engine Output Energy (kJ) 11939 11990 51 0.4% 0.6%

Braking Energy 255 6 -249 -97.8% -2.9%Electro-Mechanical Charge Energy (kJ) 5525 7196 1671 30.3% 19.1%

Electro-Mechanical Discharge Energy (kJ) 2514 3926 1412 56.2% 16.2%Electro-Mechanical Losses (kJ) 2978 3234 255 8.6% 2.9%

Round-Trip Electrical Efficiency (%) 46.1 55.1 9 - -Engine BSFC (g/kWh) 264.8 236.9 -27.8 -10.5% -

12

219

219 219 21

9

230

230

230

230230

240

240240

240

240240

257

257

257

257

257257

274

274

274274

274

343 343

343

Engine Speed (rpm)

Eng

ine

Tor

que

(Nm

)

0 1000 2000 3000 4000 5000 60000

50

100

150

200

250

300

Fig. 10: Engine Torque-Speed operating points on the FTPcycle. The solid black line represents an operational restriction,which both the SP-SDP and baseline controllers respect. Base-line controller operating points are shown as circles (blue) andSP-SDP controller operating points are shown as diamonds(red). The SP-SDP controller is SP-SDP #1 in Table IV andalso shown in Fig. 9. The isoclines show constant brakespecific fuel consumption in g/kWh.

IX. CONCLUSIONS

An energy management controller for a prototype Volvo-S80 parallel-series hybrid electric vehicle has been developedusing Shortest Path Stochastic Dynamic Programming to op-timally perform the inherent tradeoff between fuel efficiencyand drivability. The SP-SDP-based controllers minimize theexpected value of a cost function, using a statistical descriptionof expected driving behavior. Here, the cost was a weightedsum of consumed fuel and drivability penalties based on shiftevents and engine on-off events. By varying the weights, thePareto tradeoff surface of fuel economy versus drivability forthe SP-SDP-based controllers was evaluated on a high-fidelityvehicle simulation model provided by Ford Motor Company.

The performance of the SP-SDP-based controllers was com-pared against an industrial baseline controller, also providedby Ford Motor Company as part of this collaborative effort.For the same level of drivability, the SP-SDP-based controllerswere 11% more fuel efficient than the baseline controller onthe FTP cycle and the New European Drive cycle.

In general, dynamic programming is well known to sufferfrom the “curse of dimensionality,” referring to the exponentialexplosion of problem size with the number of state and controlvariables. The SP-SDP controller calculations were conductedhere using a two-step optimization strategy that preservedoptimality while decreasing computation by at least a factorof 10, thereby allowing the algorithm to be run on a desktopPC. This method is applicable to other vehicle configurationsthat have multiple actuator degrees of freedom.

While the excellent fuel economy of the SP-SDP controllersis very interesting, we feel a more important observation is

that the SP-SDP design method produces causal controllersthat respect constraints and perform well on (and off [35])standard test cycles. SP-SDP-based controllers can be directlyimplemented in a real-time control environment with littlemanual tuning, as demonstrated on an industrial vehicle modelwhich includes detailed subsystem models, dynamics, delays,and limits. We believe that stochastic optimization is a viableenergy management design method that merits transition fromacademic research papers into industrial labs and onto theroad.

REFERENCES

[1] D. Opila, D. Aswani, R. McGee, J. Cook, and J. Grizzle, “Incorporatingdrivability metrics into optimal energy management strategies for hybridvehicles,” in Proc. IEEE Conference on Decision and Control, 2008, pp.4382–4389.

[2] D. Opila, X. Wang, R. McGee, J. Cook, and J. Grizzle, “Performancecomparison of hybrid vehicle energy management controllers on real-world drive cycle data,” in Proc. American Control Conference, 2009,pp. 4618–4625.

[3] ——, “Fundamental structural limitations of an industrial energy man-agement controller architecture for hybrid vehicles,” in Proc. ASMEDynamic Systems and Control Conference, 2009, pp. 213 –221.

[4] A. Sciarretta and L. Guzzella, “Control of hybrid electric vehicles,” IEEEControl Systems Magazine, vol. 27, no. 2, pp. 60–70, 2007.

[5] S. Barsali, M. Ceraolo, and A. Possenti, “Techniques to control theelectricity generation in a series hybrid electrical vehicle,” IEEE Trans.Energy Convers., vol. 17, no. 2, pp. 260–266, 2002.

[6] S. Barsali, C. Miulli, and A. Possenti, “A control strategy to minimizefuel consumption of series hybrid electric vehicles,” IEEE Trans. EnergyConvers., vol. 19, no. 1, pp. 187–195, 2004.

[7] J.-S. Won, R. Langari, and M. Ehsani, “An energy management andcharge sustaining strategy for a parallel hybrid vehicle with CVT,” IEEETrans. Control Syst. Technol., vol. 13, no. 2, pp. 313–320, 2005.

[8] M. Gokasan, S. Bogosyan, and D. J. Goering, “Sliding mode basedpowertrain control for efficiency improvement in series hybrid-electricvehicles,” IEEE Trans. Power Electron., vol. 21, no. 3, pp. 779–790,2006.

[9] D. Prokhorov, “Training recurrent neurocontrollers for real-time appli-cations,” IEEE Trans. Neural Netw., vol. 18, no. 4, pp. 1003–1015, July2007.

[10] C. Dextreit, F. Assadian, I. Kolmanovsky, J. Mahtani, and K. Burnham,“Hybrid vehicle energy management using game theory,” in Proc., SAEWorld Conference, 2008, Paper no. 2008-01-1317.

[11] R. Langari and J.-S. Won, “Integrated drive cycle analysis for fuzzylogic based energy management in hybrid vehicles,” in Proc. 12th IEEEInternational Conference on Fuzzy Systems, vol. 1, 2003, pp. 290–295.

[12] L. Perez, G. Bossio, D. Moitre, and G. Garcia, “Supervisory control ofan HEV using an inventory control approach,” Latin American AppliedResearch, vol. 36, no. 2, pp. 93–100, 2006.

[13] L. Perez and E. Pilotta, “Optimal power split in a hybrid electric vehicleusing direct transcription of an optimal control problem,” Mathematicsand Computers in Simulation, vol. 79, no. 6, pp. 1959–1970, 2009.

[14] S. Kermani, S. Delprat, T. M. Guerra, and R. Trigui, “Predictive controlfor HEV energy management: experimental results,” in Proc. IEEEVehicle Power and Propulsion Conference, 2009, pp. 364–369.

[15] Q. Gong, Y. Li, and Z.-R. Peng, “Trip-based optimal power managementof plug-in hybrid electric vehicles,” IEEE Trans. Veh. Technol., vol. 57,no. 6, pp. 3393–3401, 2008.

[16] L. Johannesson, M. Asbogard, and B. Egardt, “Assessing the potentialof predictive control for hybrid vehicle powertrains using stochasticdynamic programming,” IEEE Trans. Intell. Transp. Syst., vol. 8, no. 1,pp. 71–83, 2007.

[17] G. Paganelli, S. Delprat, T. Guerra, J. Rimaux, and J. Santin, “Equivalentconsumption minimization strategy for parallel hybrid powertrains,” inProc. IEEE 55th Vehicular Technology Conference, vol. 4, 2002, pp.2076–2081.

[18] A. Sciarretta, M. Back, and L. Guzzella, “Optimal control of parallelhybrid electric vehicles,” IEEE Trans. Control Syst. Technol., vol. 12,no. 3, pp. 352–363, 2004.

[19] C. Musardo, G. Rizzoni, and B. Staccia, “A-ECMS: An adaptivealgorithm for hybrid electric vehicle energy management,” in Proc. IEEEConference on Decision and Control, 2005, pp. 1816–1823.

13

[20] C. Musardo, B. Staccia, S. Midlam-Mohler, Y. Guezennec, and G. Riz-zoni, “Supervisory control for NOx reduction of an HEV with a mixed-mode HCCI/CIDI engine,” in Proc. American Control Conference,vol. 6, 2005, pp. 3877–3881.

[21] S. Delprat, T. M. Guerra, and J. Rimaux, “Optimal control of a parallelpowertrain: from global optimization to real time control strategy,” inProc. IEEE 55th Vehicular Technology Conference, vol. 4, 2002, pp.2082–2088.

[22] S. Delprat, J. Lauber, T. M. Guerra, and J. Rimaux, “Control of a parallelhybrid powertrain: optimal control,” IEEE Trans. Veh. Technol., vol. 53,no. 3, pp. 872–881, 2004.

[23] C.-C. Lin, H. Peng, J. Grizzle, and J.-M. Kang, “Power managementstrategy for a parallel hybrid electric truck,” IEEE Transactions onControl Systems Technology, vol. 11, no. 6, pp. 839–849, 2003.

[24] B. Wu, C.-C. Lin, Z. Filipi, H. Peng, and D. Assanis, “Optimal powermanagement for a hydraulic hybrid delivery truck,” Vehicle SystemDynamics, vol. 42, no. 1-2, pp. 23–40, 2004.

[25] C.-C. Lin, H. Peng, and J. Grizzle, “A stochastic control strategy forhybrid electric vehicles,” in Proc. of the American Control Conference,2004, pp. 4710 – 4715.

[26] E. Tate, J. Grizzle, and H. Peng, “Shortest path stochastic control forhybrid electric vehicles,” International Journal of Robust and NonlinearControl, vol. 18, pp. 1409–1429, 2008.

[27] S. Delprat, T. M. Guerra, and J. Rimaux, “Control strategies for hybridvehicles: optimal control,” in Proc. Vehicular Technology Conference,vol. 3, 2002, pp. 1681–1685.

[28] ——, “Control strategies for hybrid vehicles: synthesis and evaluation,”in Proc. Vehicular Technology Conference, vol. 5, 2003, pp. 3246–3250.

[29] G. Paganelli, T. M. Guerra, S. Delprat, J.-J. Santin, M. Delhom, andE. Combes, “Simulation and assessment of power control strategies fora parallel hybrid car,” Proc. of the Institution of Mechanical Engineers,Part D: Journal of Automobile Engineering, vol. 214, pp. 705–717,2000.

[30] S. Kermani, S. Delprat, R. Trigui, and T. M. Guerra, “Predictive energymanagement of hybrid vehicle,” in Proc. IEEE Vehicle Power andPropulsion Conference, 2008, pp. 1–6.

[31] A. Kleimaier and D. Schroder, “An approach for the online optimizedcontrol of a hybrid powertrain,” in Proc. 7th Int Advanced MotionControl Workshop, 2002, pp. 215–220.

[32] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming.Athena Scientific, 1996.

[33] D. Bertsekas, Dynamic Programming and Optimal Control. AthenaScientific, 2005, vol. 1.

[34] ——, Dynamic Programming and Optimal Control. Athena Scientific,2005, vol. 2.

[35] D. Opila, “Incorporating drivability metrics into optimal energy man-agement strategies for hybrid vehicles,” Ph.D. dissertation, Universityof Michigan, 2010.

[36] C. Belton, P. Bennett, P. Burchill, D. Copp, N. Darnton, K. Butts, J. Che,B. Hieb, M. Jennings, and T. Mortimer, “A vehicle model architecturefor vehicle system control design,” in Proc. SAE World Congress &Exhibition., 2003, Paper no. 2003-01-0092.

[37] P. Pisu, K. Koprubasi, and G. Rizzoni, “Energy management anddrivability control problems for hybrid electric vehicles,” in Proc. IEEEConference on Decision and Control, 2005, pp. 1824–1830.

[38] X. Wei, “Dynamic modeling of a hybrid electric drivetrain for fueleconomy, performance and driveability evaluations,” American Societyof Mechanical Engineers, Dynamic Systems and Control Division (Pub-lication) DSC, vol. 72, no. 1, pp. 443–450, 2003.

[39] J. Liu and H. Peng, “Modeling and control of a power-split hybridvehicle,” IEEE Trans. Control Syst. Technol., vol. 16, no. 6, pp. 1242–1251, 2008.

[40] L. Johannesson, M. Asbogard, and B. Egardt, “Assessing the potentialof predictive control for hybrid vehicle powertrains using stochastic dy-namic programming,” in Proc. IEEE Intelligent Transportation Systems,2005, pp. 366–371.

[41] E. D. Tate, J. W. Grizzle, and H. Peng, “SP-SDP for fuel consumptionand tailpipe emissions minimization in an EVT hybrid,” IEEE Trans.Control Syst. Technol., vol. 18, no. 3, pp. 673–687, 2010.

[42] C.-C. Lin, H. Peng, J. Grizzle, J. Liu, and M. Busdiecker, “Controlsystem development for an advanced-technology medium-duty hybridelectric truck,” in Proceedings of the International Truck & Bus Meeting& Exhibition, Ft. Worth, TX, USA, 2003.

[43] G. Paganelli, M. Tateno, A. Brahma, G. Rizzoni, and Y. Guezennec,“Control development for a hybrid-electric sport-utility vehicle: strat-egy, implementation and field test results,” in Proc. American ControlConference, 2001, pp. 5064–5069.

APPENDIX

A. Proof of Minimization Decomposition

Equation (16) may be written as

V ∗(x) = minu∈U(x)

minu∈U(x,u)

Ew[c(x, u, u, w) + V ∗(f(x, u, w))],

(22)and by the linearity of expectation

V ∗(x)= min

u∈U(x)min

u∈U(x,u)(Ew[c(x, u, u, w)]+Ew[V ∗(f(x, u, w))]).

(23)

The expectation of the value function is independent of uyielding

V ∗(x)= min

u∈U(x)( minu∈U(x,u)

Ew[c(x, u, u, w)]+Ew[V ∗(f(x, u, w))]).

(24)

Using the definition (17), (24) becomes (16). �

B. Related Comments on Minimization Decomposition

To implement the controller developed using MinimizationDecomposition, u must still be determined. It may be precom-puted and stored when calculating (17),

u∗(x, u) = argminu∈U(x,u)

c(x, u, u), (25)

and

c(x, u) = c(x, u, u∗(x, u)) = minu∈U(x,u)

c(x, u, u). (26)

This process reduces the space of control actions by U .The computation scales linearly with the number of possiblecontrol actions, and can be significantly reduced depending onthe problem structure and the size of u.

Minimization Decomposition may also be used when solv-ing for non-stationary value functions by appropriately replac-ing V (x) with a time-dependent Vk(x), for either deterministicor stochastic cases [16].

Remark: (Functional Form to use Minimization Decom-position) Suppose a system has dynamics f(x, u, u, w) thatare independent of some control component u and can bereformulated into a function f , such that

f(x, u, w) = f(x, u, u, w) (27)

with probability 1 (w.p. 1). Then the Bellman equation satisfies(16) and the Minimization Decomposition may be used.

While the property (27) seems quite restrictive, it occurssurprisingly often in the energy management problem. It islikely to occur if the number of control inputs M exceeds thedimension of the state space N , leaving a null control directionas used in [37].

14

Remark: (State Decomposition) In this energy managementproblem (as in most formulations) the dynamics may clearlybe broken down into two parts

f(x, u, w) =[fu(x, u)fw(x,w)

](28)

where the deterministic states are the known vehicle dynamicsand the stochastic driver dynamics are independent of thecontrol input.

This allows the control inputs to be studied without theeffect of w, simplifying the verification of condition (27).Define P as the dimension of fu, the state space that dependson u in (28). If M > P , (27) likely holds.

The main point is this: if the number of control inputsexceeds the number of states, the required computation canoften be drastically reduced. Even with discrete states (ie. gearnumber) the same techniques may often be used.

C. Power-Split Example

Consider for example the “Power-Split” architecture of theToyota Prius and Ford Escape, with a cost function thatpenalizes fuel use and SOC deviations from nominal to attaincharge sustenance. If one assumes that the dynamics of enginespeed changes are negligible at the timescales for energymanagement, the only vehicle state is SOC, as velocity andacceleration are assigned by the driver (stochastically whenusing SP-SDP). Assuming the vehicle matches driver demandtorque, the system is defined by two inputs. By using specificdefinitions of the system variables, the optimization reduces totwo one degree of freedom problems. A common method is totreat the two control inputs as engine speed and engine power.Suppose instead we choose engine speed ωICE and electricalpower Pelec, a slightly different definition. This allows amajor decoupling of the system dynamics. The evolution ofSOC is now only dependent on Pelec = u and completelyindependent of ωICE = u. The engine speed that results inminimum fuel use for a given Pelec can be calculated off-linebecause it is independent of SOC. This results in engine fuelconsumption as a 1-D function of power c(x, u) = c(x, Pelec),rather than the standard 2-D functions of power and speedc(x, u, u) = c(x, Pelec, ωICE).

D. Comparison of SP-SDP to ECMS for Fuel-only Optimiza-tion

One of the most well known optimization methods forenergy management in HEVs is the “Equivalent ConsumptionMinimization Strategy” (ECMS) [19], [43]. This method op-timizes for fuel economy only, which is equivalent to takingthe running cost in (12) as fuel flow rate, that is, c = mf .ECMS is popular among academics because it requires littlecomputation and seems easy to implement. At each time step,the controller minimizes a function that trades off batteryusage vs. fuel,

u∗k(x) = argminu∈U

[mf (x, u) + λk∆SOC(x, u)]. (29)

The design parameter is the weighting factor λk, whichrepresents the relative value of battery charge in terms of fuel.

In actual practice, a real difficulty arises because, unless λk

is carefully chosen, the vehicle will not be charge sustaining.The required values for λk are highly cycle dependent andtypically require on-line estimation.

It is now shown for the fuel only case that (12) and (13)of the SP-SDP algorithm yield a form very similar to (29)for the computation of the optimal control, with the addedbenefit that the SP-SDP algorithm automatically adjusts theweighting function. First, note that the state can be takenas x = [SOC, x]′, where x consists of vehicle velocity andacceleration. x is independent of the control input u becausevehicle acceleration is defined by the stochastic driver. Themodel (11) can thus be expressed in the form[

SOCk+1

xk+1

]=[SOCk + ∆SOC(SOCk, xk, uk)f(xk, wk)

].

(30)Next, let V ∗ be the optimal cost to go function in the

Bellman equation (12), and define

Q(σ, x) = Ew[V ∗([

σf(x, w)

])]

for an arbitrary SOC σ. Substituting the SOC dynamics (30)for σ, the Bellman equation (12) becomes

V ∗(SOC, x) = minu∈U

[mf (SOC, x, u)+

Q(SOC + ∆SOC(SOC, x, u), x)] . (31)

The running cost c = mf in (12) is not a function of therandom variable w and can be removed from the expectation.From this, the expression for the optimal control becomes

u∗(SOC, x) = argminu∈U

[mf (SOC, x, u)+

Q(SOC + ∆SOC(SOC, x, u), x)] . (32)

Doing a first-order Taylor expansion of Q then yields

u∗(SOC, x) ≈ argminu∈U

[mf (SOC, x, u)+

Q(SOC, x) +∂Q(SOC, x)∂SOC

∆SOC(SOC, x, u)]. (33)

Recognizing that Q(SOC, x), being independent of u, doesnot affect the minimization, and substituting x = [SOC, x]′

into (33), then yields

u∗(x) ≈ argminu∈U

[mf (x, u) +

∂Q(x)∂SOC

∆SOC(x, u)]. (34)

It follows that ∂Q(x)∂SOC is equivalent to the weighting factor λ

in (29). The SP-SDP algorithm has the same structure as theECMS method, but the weighting factor is a function of thestate variables, and is automatically updated on-line. There isa variant of ECMS method called Adaptive ECMS (A-ECMS)in which the weighting factor is also allowed to change overtime based on the current driving conditions [19]. A-ECMSis even more similar to the SP-SDP algorithm in that bothmethods have a weighting factor that is updated on-line as afunction of the state.

15

This relationship is illustrated by again studying the valuefunction V (x) as a function of SOC for fixed accelerationand velocity shown in Fig. 5. The local slope of V (x) in thefigure is closely related to the weighting factor in (34), which,once again, is analogous to λ in (29). Fundamentally, all fuel-minimizing control algorithms must estimate the value of bat-tery charge in terms of fuel and thus have some equivalent tothe weighting factor. It may appear linearly and explicitly as inECMS, or nonlinearly and implicitly as in SP-SDP. All knowninformation is incorporated in the weighting factor: currentstate, plant dynamics, and expected future driver demands.Once this weighting factor is determined, the control problemis a simple static optimization.

A basic difference of the algorithms lies in how theyestimate the value of battery charge in terms of fuel: ECMSuses a value assigned by the designer; A-ECMS estimates avalue based on battery charge and recent history; DeterministicDynamic Programming uses exact future knowledge; and SP-SDP uses estimates of cycle statistics. A benefit of dynamicprogramming methods, such as SP-SDP, is that they canoptimally accommodate more complicated objectives, such asthe fuel and drivability metrics studied here.

16

an energy management controller to optimally tradeoff fuel

Documents