lecture 4 - math.pku.edu.cnx =/, 2g.., n t-o, i, 2 - -, ty # ofpaths-nt define • it): state at...
TRANSCRIPT
![Page 1: Lecture 4 - math.pku.edu.cnX =/, 2g.., N t-o, I, 2 - -, Ty # ofpaths-NT Define • It): state at time t (SCH-I, 2) • vans:= Max {RCH: SHI-x}score, reward • VCQD is the best we](https://reader033.vdocuments.mx/reader033/viewer/2022050412/5f8909e9be8edb55ae346597/html5/thumbnails/1.jpg)
Lecture 4
![Page 2: Lecture 4 - math.pku.edu.cnX =/, 2g.., N t-o, I, 2 - -, Ty # ofpaths-NT Define • It): state at time t (SCH-I, 2) • vans:= Max {RCH: SHI-x}score, reward • VCQD is the best we](https://reader033.vdocuments.mx/reader033/viewer/2022050412/5f8909e9be8edb55ae346597/html5/thumbnails/2.jpg)
XH
XII
XII
f--O
X =L .
t- I f: 2 f=3 4=4 .
![Page 3: Lecture 4 - math.pku.edu.cnX =/, 2g.., N t-o, I, 2 - -, Ty # ofpaths-NT Define • It): state at time t (SCH-I, 2) • vans:= Max {RCH: SHI-x}score, reward • VCQD is the best we](https://reader033.vdocuments.mx/reader033/viewer/2022050412/5f8909e9be8edb55ae346597/html5/thumbnails/3.jpg)
X =/ , 2g ... , N
'
t - o , I , 2, - -
, Ty# of paths - NT
Define
• It) : state at time t (SCH - I , 2)
• vans := Max { RCH : SHI - x } .
-
score , reward .
• VCQD is the best we can do
• Goal i find Noll ) and optimal strategy .
Cost , rctix) , need N computations .
# Vctix) = NT
Tt of operations ⇒ N'T
![Page 4: Lecture 4 - math.pku.edu.cnX =/, 2g.., N t-o, I, 2 - -, Ty # ofpaths-NT Define • It): state at time t (SCH-I, 2) • vans:= Max {RCH: SHI-x}score, reward • VCQD is the best we](https://reader033.vdocuments.mx/reader033/viewer/2022050412/5f8909e9be8edb55ae346597/html5/thumbnails/4.jpg)
Vail) -+4 K2, = -12 VC 3117=+3 .
UH ,D=t6
↳→
This.
Ub2) -46 VG,2)=t1V =-3 .
C-=3
![Page 5: Lecture 4 - math.pku.edu.cnX =/, 2g.., N t-o, I, 2 - -, Ty # ofpaths-NT Define • It): state at time t (SCH-I, 2) • vans:= Max {RCH: SHI-x}score, reward • VCQD is the best we](https://reader033.vdocuments.mx/reader033/viewer/2022050412/5f8909e9be8edb55ae346597/html5/thumbnails/5.jpg)
Dynamioprogrammingprina.pl#
Botta problem
> info THI - JI! Let,xHhoaDdtt # GLADIt . x.Glatt, XHI , Oltl) i Xlto ) '- Xo -
Dehne the value function V : Cfo ,tixlRd→lR .÷÷.s÷::::::÷: ::*::5=0 , Eaxo F V Xo)
![Page 6: Lecture 4 - math.pku.edu.cnX =/, 2g.., N t-o, I, 2 - -, Ty # ofpaths-NT Define • It): state at time t (SCH-I, 2) • vans:= Max {RCH: SHI-x}score, reward • VCQD is the best we](https://reader033.vdocuments.mx/reader033/viewer/2022050412/5f8909e9be8edb55ae346597/html5/thumbnails/6.jpg)
TheoremCDynamicprog.P-nhapeDPPJForevery.ESC- Cto its] ,SET
,2- ElRd , we have .
Yes . Z) = ingfffjhctixltlioetlldt t VCTIXK) ) }×
x. Ctl = AH, cacti) e tf I] , X a z
a
¥.
t¥** ,2.
EkeI
+to S T t ,
![Page 7: Lecture 4 - math.pku.edu.cnX =/, 2g.., N t-o, I, 2 - -, Ty # ofpaths-NT Define • It): state at time t (SCH-I, 2) • vans:= Max {RCH: SHI-x}score, reward • VCQD is the best we](https://reader033.vdocuments.mx/reader033/viewer/2022050412/5f8909e9be8edb55ae346597/html5/thumbnails/7.jpg)
Prot
Define J': ' 'hefffjhct, xai.atDdt + VEXED} .
① "
J? E VCs, -23.11
Fix E>0 , pick Q : [Sit . T → ⑦ sit .
JCQJ S VCs, E)T E . C by dean of inf MV )Under this control , we have .
VCE, XK) ) 's JE'
hctixlth Htt ICXCti ) )
⇒ JEE SE Ht ,xHhatDdt t VE, xcel )s + JE' ) Lctixethoetildt + INCH ) - JEETS VCs, Z) t e .
⇒ JE S VCS, 2-7 .
![Page 8: Lecture 4 - math.pku.edu.cnX =/, 2g.., N t-o, I, 2 - -, Ty # ofpaths-NT Define • It): state at time t (SCH-I, 2) • vans:= Max {RCH: SHI-x}score, reward • VCQD is the best we](https://reader033.vdocuments.mx/reader033/viewer/2022050412/5f8909e9be8edb55ae346597/html5/thumbnails/8.jpg)
Step 2 :"
J'Z VCs . 2- I
"
fix ETO r then I Qi : Csis → OH ft .
fjhctixltioccti ) Itt Vance)) £ JT + E . T
F- Q2 : cc, tis -9 sit . g) tJet' Ktxa cacti ) at + Elton ) S V K,Xk) ) T E .
Definea a f Oct) te CSE]
Out) tf TG ti
Jst' LHIXHI . Oltl ) dtt Efx Hit ) E J" t 2E .
-UGH S JEET
⇒ us,z , s J2t2# II.
![Page 9: Lecture 4 - math.pku.edu.cnX =/, 2g.., N t-o, I, 2 - -, Ty # ofpaths-NT Define • It): state at time t (SCH-I, 2) • vans:= Max {RCH: SHI-x}score, reward • VCQD is the best we](https://reader033.vdocuments.mx/reader033/viewer/2022050412/5f8909e9be8edb55ae346597/html5/thumbnails/9.jpg)
Hamilton-facobi-BeflmanEquationCHJB-pxn.fr . - X ez .
Dpp : VBA) = ihofffghcfixuy, acts ) dtt VCTIXKD }µ infinitesimal F- Stas .
VCs, a infffgtttshc - - - sat + Vestas , xcstas)) }
S
X(Stas) = Hs) t Jst"
fat , XHI , O dt
= Eggs -1 As FCS, O ) t o CAS)2-
VC Stasi XCSTASDI VCS,Z ) -1 IsVCs, Z) - AS
+ VCSZJITFCSZ , OCS) ) AS t o (As)
![Page 10: Lecture 4 - math.pku.edu.cnX =/, 2g.., N t-o, I, 2 - -, Ty # ofpaths-NT Define • It): state at time t (SCH-I, 2) • vans:= Max {RCH: SHI-x}score, reward • VCQD is the best we](https://reader033.vdocuments.mx/reader033/viewer/2022050412/5f8909e9be8edb55ae346597/html5/thumbnails/10.jpg)
fgst.NL XHI , Idf = LIS, Z , OG) ) Is + OCHS )
#5=inf{¥t Asf 2srcs.at#Vcsa5TfCsizocsD-
Thesz, T.to#J}⇒ 2sVCsz)toin¥{zVGz55fSR, + Usa, 071=0{ It , ,z ) = ICE) . ( terminal cost )
Hamilton- Jacobi - Bellman Equation .
Vlfyz ) - Eez ) { flow . 8th#Ffv)
![Page 11: Lecture 4 - math.pku.edu.cnX =/, 2g.., N t-o, I, 2 - -, Ty # ofpaths-NT Define • It): state at time t (SCH-I, 2) • vans:= Max {RCH: SHI-x}score, reward • VCQD is the best we](https://reader033.vdocuments.mx/reader033/viewer/2022050412/5f8909e9be8edb55ae346597/html5/thumbnails/11.jpg)
IsV t 'out fffzv ) O t £02 ). ⇒C-IR.
⇒.
![Page 12: Lecture 4 - math.pku.edu.cnX =/, 2g.., N t-o, I, 2 - -, Ty # ofpaths-NT Define • It): state at time t (SCH-I, 2) • vans:= Max {RCH: SHI-x}score, reward • VCQD is the best we](https://reader033.vdocuments.mx/reader033/viewer/2022050412/5f8909e9be8edb55ae346597/html5/thumbnails/12.jpg)
Implicational ✓ original omtrol problem .
Veto , Xo) [email protected])Fix s. 2- I E's.z CS , til -1 ⑦ is optimal .
①PP ⇒Vcs, z Ia chef Let, x , O Idt t VEIXK) ) ).
= Jst LH, xs.IM , Ofa Al )It t VK , KD{O*:= OI
. ix. h. J Taylor expansion .
- as""""
:÷g÷÷:¥÷÷s¥i÷";
![Page 13: Lecture 4 - math.pku.edu.cnX =/, 2g.., N t-o, I, 2 - -, Ty # ofpaths-NT Define • It): state at time t (SCH-I, 2) • vans:= Max {RCH: SHI-x}score, reward • VCQD is the best we](https://reader033.vdocuments.mx/reader033/viewer/2022050412/5f8909e9be8edb55ae346597/html5/thumbnails/13.jpg)
- LCS,x7s) DTD ) - Fans, CSDTFCS , Msd , OTH )
} - Hs, Toko) - Davis,x* Tfa, O )
dehne p* = - Thus , ytf O C- Q
p fcs , xD , EG) ) - Lcsixcsl , GD Z- Ctx)
HP" Tfcssx-4010) - LCS, , O )
Q Oo
⇒ Still a necessary condition .
![Page 14: Lecture 4 - math.pku.edu.cnX =/, 2g.., N t-o, I, 2 - -, Ty # ofpaths-NT Define • It): state at time t (SCH-I, 2) • vans:= Max {RCH: SHI-x}score, reward • VCQD is the best we](https://reader033.vdocuments.mx/reader033/viewer/2022050412/5f8909e9be8edb55ae346597/html5/thumbnails/14.jpg)
SufficientconditlhttssumeEr: Storti → ⑦ Satisfies
I be its controlled trajectory .
⇒ A-Vet , IAD t ×VHiXHDTfCt,xHh8HD-
IfVA, Ray+ KHANEH, so
⇒ integrate from to tot ,
vet, , I Cti ) ) - Uto, Ects ) x JIT kn - - Idt so .
- -
Elkan) iIoT¥ running cost
(x-
Jeong
![Page 15: Lecture 4 - math.pku.edu.cnX =/, 2g.., N t-o, I, 2 - -, Ty # ofpaths-NT Define • It): state at time t (SCH-I, 2) • vans:= Max {RCH: SHI-x}score, reward • VCQD is the best we](https://reader033.vdocuments.mx/reader033/viewer/2022050412/5f8909e9be8edb55ae346597/html5/thumbnails/15.jpg)
⇒ JEET = M£807
Generate optimal control
① Solve HSB . → V '
② OTH - afgmfx×VCtnMtD¥x"Hid1- LH, tho) y
closed loop .