Lecture 4 - math.pku.edu.cn


Post on 02-Aug-2020


[Figure: lattice of states $x(t)$ plotted against the time steps $t = 0, 1, 2, 3, 4$.]


$x = 1, 2, \dots, N$

$t = 0, 1, 2, \dots, T$, number of paths $\sim N^T$

Define

• $x(t)$: state at time $t$ ($x(t) = 1, 2$)

• $V(t, x) := \max \{ R(x(\cdot)) : x(t) = x \}$, where $R$ is the score (reward).

• $V(0, 1)$ is the best we can do

• Goal: find $V(0, 1)$ and the optimal strategy.

Cost: each $V(t, x)$ needs $N$ computations.

Number of values $V(t, x)$ = $NT$

Number of operations ⇒ $N^2 T$
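The backward recursion behind these operation counts can be sketched as follows. This is a minimal sketch under assumptions that are not in the notes: the score is taken to be a sum of per-step transition rewards `reward[t][x][y]` plus a terminal reward `terminal[x]`, states are indexed from 0, and every name and number here is hypothetical. Each value $V(t, x)$ costs $N$ comparisons, so filling the table takes about $N^2 T$ operations, while the brute-force check enumerates all $N^T$ paths.

```python
from itertools import product


def backward_induction(reward, terminal):
    """Fill the value table V[t][x] backward in time.

    reward[t][x][y]: assumed reward for moving from state x at time t
                     to state y at time t + 1 (hypothetical structure).
    terminal[x]:     assumed reward for ending in state x at time T.
    """
    T = len(reward)
    N = len(terminal)
    V = [[0] * N for _ in range(T + 1)]
    policy = [[0] * N for _ in range(T)]
    V[T] = list(terminal)
    for t in range(T - 1, -1, -1):      # T time steps
        for x in range(N):              # N states per step
            # N candidate successors per (t, x): about N^2 * T work in total
            best_y = max(range(N), key=lambda y: reward[t][x][y] + V[t + 1][y])
            policy[t][x] = best_y
            V[t][x] = reward[t][x][best_y] + V[t + 1][best_y]
    return V, policy


def brute_force(reward, terminal, x0):
    """Enumerate all N^T paths starting from x0 and return the best score."""
    T = len(reward)
    N = len(terminal)
    best = float("-inf")
    for path in product(range(N), repeat=T):
        total, x = 0, x0
        for t, y in enumerate(path):
            total += reward[t][x][y]
            x = y
        best = max(best, total + terminal[x])
    return best


# Hypothetical data with N = 2 states and T = 3 steps.
reward = [
    [[1, 4], [0, 2]],
    [[3, -1], [2, 5]],
    [[0, 1], [-2, 3]],
]
terminal = [2, -1]

V, policy = backward_induction(reward, terminal)
print("V(0, x) from backward induction:", V[0])
print("brute force from state 0:", brute_force(reward, terminal, 0))
```

Following `policy[t][x]` forward from the initial state recovers an optimal path, which is the "optimal strategy" asked for in the goal.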


[Figure: worked example on the two-state lattice, with the values $V(t, 1)$ and $V(t, 2)$ annotated at the nodes.]


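Dynamic programming principle

Bolza problem

$$\inf_{\theta} J[\theta] := \int_{t_0}^{t_1} L(t, x(t), \theta(t))\,dt + \Phi(x(t_1)), \qquad \dot{x}(t) = f(t, x(t), \theta(t)), \quad x(t_0) = x_0 .$$

Define the value function $V : [t_0, t_1] \times \mathbb{R}^d \to \mathbb{R}$,

$$V(s, z) := \inf_{\theta} \left\{ \int_s^{t_1} L(t, x(t), \theta(t))\,dt + \Phi(x(t_1)) \;:\; \dot{x}(t) = f(t, x(t), \theta(t)),\ t \in [s, t_1],\ x(s) = z \right\}.$$

Taking $s = t_0$, $z = x_0$ gives the optimal cost of the original problem, $V(t_0, x_0)$.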


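Theorem (Dynamic programming principle, DPP). For every $s, \tau \in [t_0, t_1]$, $s \le \tau$, $z \in \mathbb{R}^d$, we have

$$V(s, z) = \inf_{\theta} \left\{ \int_s^{\tau} L(t, x(t), \theta(t))\,dt + V(\tau, x(\tau)) \right\},$$

where $\dot{x}(t) = f(t, x(t), \theta(t))$, $t \in [s, \tau]$, $x(s) = z$.

[Figure: a trajectory starting from $z$ at time $s$, drawn over a time axis marked $t_0$, $s$, $\tau$, $t_1$.]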


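Proof.

Define $J^* := \inf_{\theta} \left\{ \int_s^{\tau} L(t, x(t), \theta(t))\,dt + V(\tau, x(\tau)) \right\}$.

Step 1: "$J^* \le V(s, z)$"

Fix $\varepsilon > 0$, pick $\theta : [s, t_1] \to \Theta$ s.t.

$$J[\theta] \le V(s, z) + \varepsilon \quad \text{(by definition of inf)},$$

where $J[\theta] = \int_s^{t_1} L(t, x(t), \theta(t))\,dt + \Phi(x(t_1))$ is the cost of $\theta$ started from $x(s) = z$. Under this control, we have

$$V(\tau, x(\tau)) \le \int_{\tau}^{t_1} L(t, x(t), \theta(t))\,dt + \Phi(x(t_1)).$$

$$\Rightarrow\ J^* \le \int_s^{\tau} L(t, x(t), \theta(t))\,dt + V(\tau, x(\tau)) \le \int_s^{\tau} L(t, x(t), \theta(t))\,dt + \int_{\tau}^{t_1} L(t, x(t), \theta(t))\,dt + \Phi(x(t_1)) = J[\theta] \le V(s, z) + \varepsilon .$$

⇒ $J^* \le V(s, z)$, since $\varepsilon$ is arbitrary.

Step 2: "$J^* \ge V(s, z)$"

Fix $\varepsilon > 0$. Then $\exists\, \theta_1 : [s, \tau] \to \Theta$ s.t.

$$\int_s^{\tau} L(t, x(t), \theta_1(t))\,dt + V(\tau, x(\tau)) \le J^* + \varepsilon .$$

$\exists\, \theta_2 : [\tau, t_1] \to \Theta$ s.t.

$$\int_{\tau}^{t_1} L(t, x(t), \theta_2(t))\,dt + \Phi(x(t_1)) \le V(\tau, x(\tau)) + \varepsilon .$$

Define

$$\theta(t) = \begin{cases} \theta_1(t), & t \in [s, \tau], \\ \theta_2(t), & t \in (\tau, t_1]. \end{cases}$$

$$J[\theta] = \int_s^{t_1} L(t, x(t), \theta(t))\,dt + \Phi(x(t_1)) \le J^* + 2\varepsilon .$$

$V(s, z) \le J[\theta]$

⇒ $V(s, z) \le J^* + 2\varepsilon$, and $\varepsilon$ is arbitrary. ∎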


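Hamilton-Jacobi-Bellman equation (HJB)

Let $x(s) = z$.

DPP: $V(s, z) = \inf_{\theta} \left\{ \int_s^{\tau} L(t, x(t), \theta(t))\,dt + V(\tau, x(\tau)) \right\}$. Take $\tau$ infinitesimal: $\tau = s + \Delta s$.

$$V(s, z) = \inf_{\theta} \left\{ \int_s^{s + \Delta s} L(t, x(t), \theta(t))\,dt + V(s + \Delta s, x(s + \Delta s)) \right\}$$

$$x(s + \Delta s) = x(s) + \int_s^{s + \Delta s} f(t, x(t), \theta(t))\,dt = z + \Delta s\, f(s, z, \theta(s)) + o(\Delta s)$$

$$V(s + \Delta s, x(s + \Delta s)) = V(s, z) + \partial_s V(s, z)\,\Delta s + \nabla_z V(s, z)^\top f(s, z, \theta(s))\,\Delta s + o(\Delta s)$$

$$\int_s^{s + \Delta s} L(t, x(t), \theta(t))\,dt = L(s, z, \theta(s))\,\Delta s + o(\Delta s)$$

$$V(s, z) = \inf_{\theta} \left\{ V(s, z) + \Delta s \left[ \partial_s V(s, z) + \nabla_z V(s, z)^\top f(s, z, \theta(s)) + L(s, z, \theta(s)) \right] + o(\Delta s) \right\}$$

$$\Rightarrow\ \partial_s V(s, z) + \inf_{\theta \in \Theta} \left\{ \nabla_z V(s, z)^\top f(s, z, \theta) + L(s, z, \theta) \right\} = 0, \qquad V(t_1, z) = \Phi(z) \ \text{(terminal cost)}.$$

This is the Hamilton-Jacobi-Bellman equation.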

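Example with a scalar control $\theta \in \mathbb{R}$ (that is, $f = \theta$ and $L = \tfrac{1}{2}\theta^2$ in the equation above), with terminal condition $V(t_1, z) = \Phi(z)$:

$$\partial_s V + \inf_{\theta \in \mathbb{R}} \left\{ (\partial_z V)\,\theta + \tfrac{1}{2}\theta^2 \right\} = 0 .$$

⇒ the infimum is attained at $\theta = -\partial_z V$, so $\partial_s V - \tfrac{1}{2}(\partial_z V)^2 = 0$.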
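As a sanity check on the HJB equation, here is a minimal numerical sketch. Everything in it is an assumption made for illustration rather than taken from the notes: a scalar problem with dynamics $f = \theta$, running cost $L = \tfrac{1}{2}\theta^2$, terminal cost $\Phi(z) = \tfrac{1}{2}z^2$ and $t_1 = 1$, for which $V(s, z) = z^2 / (2(1 + t_1 - s))$ is a candidate solution. The function names are hypothetical. The code estimates $\partial_s V$ and $\partial_z V$ by central differences and evaluates the left-hand side of the equation.

```python
def value(s, z, t1=1.0):
    """Candidate value function for the assumed problem
    f = theta, L = theta^2 / 2, Phi(z) = z^2 / 2."""
    return 0.5 * z * z / (1.0 + t1 - s)


def hjb_residual(s, z, t1=1.0, h=1e-5):
    """Left-hand side of  dV/ds + inf_theta { dV/dz * theta + theta^2 / 2 }."""
    dV_ds = (value(s + h, z, t1) - value(s - h, z, t1)) / (2 * h)
    dV_dz = (value(s, z + h, t1) - value(s, z - h, t1)) / (2 * h)
    # The quadratic in theta is minimized at theta = -dV/dz,
    # where it takes the value -(dV/dz)^2 / 2.
    inf_term = -0.5 * dV_dz ** 2
    return dV_ds + inf_term


for s in (0.0, 0.3, 0.9):
    for z in (-2.0, 0.5, 1.7):
        print(f"s={s:.1f} z={z:+.1f} residual={hjb_residual(s, z):.2e}")

# Terminal condition V(t1, z) = Phi(z) = z^2 / 2.
print("V(1, 3) =", value(1.0, 3.0), " Phi(3) =", 0.5 * 3.0 ** 2)
```

The residuals should vanish up to finite-difference error, and the terminal condition holds exactly for this candidate.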


Implications → original control problem.

$V(t_0, x_0) = \inf_{\theta} J[\theta]$.

Fix $s, z$, and suppose $\theta^*_{s,z} : [s, t_1] \to \Theta$ is optimal.

DPP ⇒

$$V(s, z) = \inf_{\theta} \left\{ \int_s^{\tau} L(t, x(t), \theta(t))\,dt + V(\tau, x(\tau)) \right\} = \int_s^{\tau} L(t, x^*(t), \theta^*(t))\,dt + V(\tau, x^*(\tau)), \qquad \theta^* := \theta^*_{s,z},$$

where $x^*$ is the trajectory controlled by $\theta^*$.

Taylor expansion in $\tau$ around $s$, followed by the HJB equation for the inequality:

$$\partial_s V(s, x^*(s)) = -L(s, x^*(s), \theta^*(s)) - \nabla_x V(s, x^*(s))^\top f(s, x^*(s), \theta^*(s))$$

$$\ge -L(s, x^*(s), \theta) - \nabla_x V(s, x^*(s))^\top f(s, x^*(s), \theta) \qquad \forall\, \theta \in \Theta .$$

Define $p^* = -\nabla_x V(s, x^*(s))$. Then, for all $\theta \in \Theta$,

$$p^{*\top} f(s, x^*(s), \theta^*(s)) - L(s, x^*(s), \theta^*(s)) \ \ge\ p^{*\top} f(s, x^*(s), \theta) - L(s, x^*(s), \theta). \tag{$*$}$$

⇒ Still a necessary condition.


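Sufficient condition

Assume $\tilde{\theta} : [t_0, t_1] \to \Theta$ satisfies $(*)$, i.e. $\tilde{\theta}(t)$ attains the infimum in the HJB equation along its own trajectory, and let $\tilde{x}$ be its controlled trajectory.

$$\Rightarrow\ \partial_t V(t, \tilde{x}(t)) + \nabla_x V(t, \tilde{x}(t))^\top f(t, \tilde{x}(t), \tilde{\theta}(t)) + L(t, \tilde{x}(t), \tilde{\theta}(t)) = 0,$$

that is,

$$\frac{d}{dt} V(t, \tilde{x}(t)) + L(t, \tilde{x}(t), \tilde{\theta}(t)) = 0 .$$

⇒ integrate from $t_0$ to $t_1$:

$$V(t_1, \tilde{x}(t_1)) - V(t_0, \tilde{x}(t_0)) + \int_{t_0}^{t_1} L(t, \tilde{x}(t), \tilde{\theta}(t))\,dt = 0 .$$

Here $V(t_1, \tilde{x}(t_1)) = \Phi(\tilde{x}(t_1))$ is the terminal cost and the integral is the running cost, so together they equal $J[\tilde{\theta}]$.

⇒ $J[\tilde{\theta}] = V(t_0, x_0) = \inf_{\theta} J[\theta]$.

Generate optimal control

① Solve HJB → $V$.

② $\theta^*(t) = \arg\max_{\theta \in \Theta} \left\{ -\nabla_x V(t, x^*(t))^\top f(t, x^*(t), \theta) - L(t, x^*(t), \theta) \right\}$

(closed loop: the control is a feedback function of the current state).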
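The two-step recipe can be illustrated with a minimal sketch. All specifics are assumptions made for illustration, not taken from the notes: the scalar problem $f = \theta$, $L = \tfrac{1}{2}\theta^2$, $\Phi(z) = \tfrac{1}{2}z^2$ on $[0, 1]$, whose value function $V(s, z) = z^2 / (2(1 + t_1 - s))$ is taken as given for step ①. Step ② then yields the feedback law $\theta^*(t) = -\partial_x V(t, x(t)) = -x(t) / (1 + t_1 - t)$. The function names and the forward-Euler discretization are hypothetical choices.

```python
def value(s, z, t1=1.0):
    """Step 1 (assumed solved): value function of the illustrative problem."""
    return 0.5 * z * z / (1.0 + t1 - s)


def feedback(t, x, t1=1.0):
    """Step 2: theta*(t) = argmax_theta { -dV/dx * theta - theta^2 / 2 }
    = -dV/dx(t, x) = -x / (1 + t1 - t)."""
    return -x / (1.0 + t1 - t)


def total_cost(control, x0, t0=0.0, t1=1.0, steps=2000):
    """Forward-Euler simulation of dx/dt = theta, accumulating the running
    cost theta^2 / 2 and adding the terminal cost x(t1)^2 / 2."""
    dt = (t1 - t0) / steps
    x, cost = x0, 0.0
    for k in range(steps):
        t = t0 + k * dt
        theta = control(t, x)   # closed loop: the control reads the current state
        cost += 0.5 * theta ** 2 * dt
        x += theta * dt
    return cost + 0.5 * x ** 2


x0 = 1.5
print("closed-loop cost:", total_cost(feedback, x0))
print("V(t0, x0):       ", value(0.0, x0))
print("zero-control cost:", total_cost(lambda t, x: 0.0, x0))
```

Under these assumptions the closed-loop cost should agree with $V(t_0, x_0)$, matching the sufficient condition, while another admissible control such as $\theta \equiv 0$ costs more.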