Automatic Differentiation

DESCRIPTION

An introduction to automatic differentiation with examples.

TRANSCRIPT

Page 1: Automatic Differentiation

Automatic Differentiation

Tobias Hoeppner

18 May 2011

Page 2: Automatic Differentiation

Outline

Automatic Differentiation

Higher derivatives and Taylor series

Page 3: Automatic Differentiation

What is Automatic Differentiation?

AD is also known as:

- computational differentiation
- algorithmic differentiation
- differentiation of algorithms

AD is a process for evaluating derivatives that depends only on an algorithmic specification of the function to be differentiated. In practice, the specification of the function is part of a computer program. AD is not symbolic differentiation, and it is not divided differences.

Page 4: Automatic Differentiation

An Example

The function

f(x, y) = (xy + sin x + 4)(3y² + 6)   (1)

The goal of symbolic differentiation is to produce formulas for its derivatives:

∂f/∂x = (y + cos x)(3y² + 6) = 3y² cos x + 6 cos x + 3y³ + 6y,   (2)

∂f/∂y = 6y(xy + sin x + 4) + x(3y² + 6) = 9xy² + 6y sin x + 24y + 6x.   (3)

In principle, evaluation of these formulas gives exact values of the derivatives, but floating-point arithmetic introduces roundoff error.
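As an illustration of the symbolic route, a minimal SymPy sketch (not part of the original deck) that reproduces formulas (2) and (3):

import sympy as sp

x, y = sp.symbols('x y')
f = (x*y + sp.sin(x) + 4) * (3*y**2 + 6)

df_dx = sp.expand(sp.diff(f, x))   # matches (2): 3y^2 cos x + 6 cos x + 3y^3 + 6y
df_dy = sp.expand(sp.diff(f, y))   # matches (3): 9xy^2 + 6y sin x + 24y + 6x

print(df_dx)
print(df_dy)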

Page 5: Automatic Differentiation

divided differences

- produce approximations to derivative values
- involve only function evaluations

∂f/∂x ≈ [f(x + ∆x, y) − f(x − ∆x, y)] / (2∆x) = ∂f/∂x + O(∆x²),   (4)

where the term O(∆x²) denotes the (unknown) truncation error. In contrast, the values for derivatives obtained by AD are exact and are often much less expensive to compute.
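A small numerical sketch of this trade-off, comparing the central difference (4) with the exact formula (2); the evaluation point (x, y) = (1, 2) and the step sizes are arbitrary choices:

import math

def f(x, y):
    return (x*y + math.sin(x) + 4) * (3*y**2 + 6)

def dfdx_exact(x, y):
    # formula (2)
    return (y + math.cos(x)) * (3*y**2 + 6)

x0, y0 = 1.0, 2.0
for dx in (1e-1, 1e-3, 1e-6, 1e-10):
    approx = (f(x0 + dx, y0) - f(x0 - dx, y0)) / (2*dx)
    print(dx, approx, abs(approx - dfdx_exact(x0, y0)))
# The error first shrinks like O(dx**2) and then grows again once
# floating-point cancellation in the difference dominates.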

Page 6: Automatic Differentiation

How and why does AD work?

- AD works whenever the chain rule holds.
- The theoretical exactness of automatic differentiation stems from the fact that it uses the same rules of differentiation as learned in elementary calculus.
- The rules are applied to an algorithmic specification rather than to a formula.
- Step back a little and consider how to evaluate (rather than differentiate) a formula.

Page 7: Automatic Differentiation

evaluating a formula

- the formula given by (1)
- one starts with the values of x and y, builds up each factor, and then multiplies them to obtain the final result
- the steps involved:

t1 = x,          t6 = t5 + 4,
t2 = y,          t7 = t2²,
t3 = t1·t2,      t8 = 3t7,          (5)
t4 = sin t1,     t9 = t8 + 6,
t5 = t3 + t4,    t10 = t6·t9

The result is t10 = f(x, y).
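As a sketch, the code list (5) translates line by line into straight-line Python; the function name f_codelist is my own:

import math

def f_codelist(x, y):
    # direct transcription of code list (5)
    t1 = x
    t2 = y
    t3 = t1 * t2
    t4 = math.sin(t1)
    t5 = t3 + t4
    t6 = t5 + 4
    t7 = t2**2
    t8 = 3 * t7
    t9 = t8 + 6
    t10 = t6 * t9
    return t10       # t10 = f(x, y)

print(f_codelist(1.0, 2.0))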

Page 8: Automatic Differentiation

obtain derivatives

- In the case of a function f = f(x1, . . . , xm) of several variables, the first partial derivatives can be expressed compactly as the gradient vector

∇f = [∂f/∂x1, . . . , ∂f/∂xm]   (6)

If u and v are functions whose gradients ∇u and ∇v are known or have previously been computed, we compute gradients using the rules

∇(u ± v) = ∇u ± ∇v,

∇(uv) = u∇v + v∇u,

∇(u/v) = (∇u − (u/v)∇v)/v,   v ≠ 0,

for the arithmetic operations, and the chain rule

∇φ(u) = φ′(u)∇u.   (7)
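One common way to apply these rules mechanically is operator overloading. The following is only a sketch (the class name Grad and its helpers are invented for illustration) of forward-mode propagation of (value, gradient) pairs using the sum and product rules above together with the chain rule (7); the quotient rule would be handled analogously:

import math

class Grad:
    """Value of an intermediate together with its gradient (forward mode)."""

    def __init__(self, value, grad):
        self.value = value            # scalar value of the intermediate
        self.grad = list(grad)        # gradient w.r.t. the independent variables

    @staticmethod
    def constant(c, m):
        return Grad(c, [0.0] * m)

    @staticmethod
    def variable(x0, i, m):
        g = [0.0] * m
        g[i] = 1.0                    # d x_i / d x_i = 1
        return Grad(x0, g)

    def _lift(self, other):
        return other if isinstance(other, Grad) else Grad.constant(other, len(self.grad))

    def __add__(self, other):
        other = self._lift(other)
        # grad(u + v) = grad(u) + grad(v)
        return Grad(self.value + other.value,
                    [a + b for a, b in zip(self.grad, other.grad)])

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # grad(u v) = u grad(v) + v grad(u)
        return Grad(self.value * other.value,
                    [self.value * b + other.value * a
                     for a, b in zip(self.grad, other.grad)])

    __rmul__ = __mul__

def sin(u):
    # chain rule (7) with phi = sin, phi' = cos
    return Grad(math.sin(u.value), [math.cos(u.value) * g for g in u.grad])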

Page 9: Automatic Differentiation

derivative of the code list

The code list (5) can be augmented with the gradient of each entry:

t1 = x,          ∇t1 = [1, 0],
t2 = y,          ∇t2 = [0, 1],
t3 = t1·t2,      ∇t3 = t1∇t2 + t2∇t1 → [t2, t1],
t4 = sin t1,     ∇t4 = (cos t1)∇t1 → [cos t1, 0],
t5 = t3 + t4,    ∇t5 = ∇t3 + ∇t4 → [t2 + cos t1, t1],
t6 = t5 + 4,     ∇t6 = ∇t5 → [t2 + cos t1, t1],
t7 = t2²,        ∇t7 = 2t2∇t2 → [0, 2t2],
t8 = 3t7,        ∇t8 = 3∇t7 → [0, 6t2],
t9 = t8 + 6,     ∇t9 = ∇t8 → [0, 6t2],
t10 = t6·t9,     ∇t10 = t6∇t9 + t9∇t6 → [t9(t2 + cos t1), 6t2t6 + t1t9].

Page 10: Automatic Differentiation

final results

- The final results are t10 = f(x, y) and its gradient ∇t10 = ∇f(x, y) = [t9(t2 + cos t1), 6t2t6 + t1t9]; a numerical check is shown below.

- Count of operations: 22 (= 2 + 10m with m = 2 independent variables).
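The check mentioned above, assuming the hypothetical Grad sketch given after the gradient rules: the value t10 and the gradient ∇t10 are reproduced at the arbitrary point (x, y) = (1, 2) and compared with the hand-derived formulas (2) and (3):

import math

x = Grad.variable(1.0, 0, 2)      # independent variable x, gradient [1, 0]
y = Grad.variable(2.0, 1, 2)      # independent variable y, gradient [0, 1]

t10 = (x * y + sin(x) + 4) * (3 * (y * y) + 6)

print(t10.value)   # f(1, 2)
print(t10.grad)    # AD gradient [t9*(t2 + cos t1), 6*t2*t6 + t1*t9]

# hand-derived gradient from formulas (2) and (3) for comparison
x0, y0 = 1.0, 2.0
print([(y0 + math.cos(x0)) * (3*y0**2 + 6),
       6*y0*(x0*y0 + math.sin(x0) + 4) + x0*(3*y0**2 + 6)])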

Page 11: Automatic Differentiation

2nd order derivative

- In the preceding section, we computed first derivatives.
- Once a code-list representation of the function has been obtained, one can also apply rules for higher derivatives or recurrence relations for Taylor coefficients.
- The second partial derivatives of a function f : R^m → R constitute its Hessian matrix

H(f) = [∂²f/∂xi∂xj], i, j = 1, . . . , m.   (8)

- The Hessian is required for optimization algorithms using Newton's method.

Page 12: Automatic Differentiation

Rules for arithmetic operations

- The rules for the results of arithmetic operations are

H(u ± v) = H(u) ± H(v),

H(uv) = uH(v) + ∇uᵀ∇v + ∇vᵀ∇u + vH(u),

H(u/v) = (H(u) − ∇(u/v)ᵀ∇v − ∇vᵀ∇(u/v) − (u/v)H(v)) / v,   v ≠ 0,

and the chain rule takes the form

H(φ(u)) = φ″(u)∇uᵀ∇u + φ′(u)H(u)

for twice differentiable functions φ, such as the standard elementary functions.
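As a sketch of how these rules can be propagated through a code list (the class name Hess and the use of NumPy are my own choices; only +, · and sin are implemented, which is all the example function (1) needs), each intermediate carries its value, gradient and Hessian:

import numpy as np

class Hess:
    """Value, gradient and Hessian of an intermediate (second-order forward mode)."""

    def __init__(self, value, grad, hess):
        self.value = value
        self.grad = np.asarray(grad, dtype=float)    # shape (m,)
        self.hess = np.asarray(hess, dtype=float)    # shape (m, m)

    @staticmethod
    def variable(x0, i, m):
        g = np.zeros(m)
        g[i] = 1.0
        return Hess(x0, g, np.zeros((m, m)))

    def _lift(self, other):
        if isinstance(other, Hess):
            return other
        m = self.grad.shape[0]
        return Hess(float(other), np.zeros(m), np.zeros((m, m)))

    def __add__(self, other):
        other = self._lift(other)
        # H(u + v) = H(u) + H(v)
        return Hess(self.value + other.value,
                    self.grad + other.grad,
                    self.hess + other.hess)
    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        u, v = self, other
        # H(uv) = u H(v) + grad(u)^T grad(v) + grad(v)^T grad(u) + v H(u)
        return Hess(u.value * v.value,
                    u.value * v.grad + v.value * u.grad,
                    u.value * v.hess + np.outer(u.grad, v.grad)
                    + np.outer(v.grad, u.grad) + v.value * u.hess)
    __rmul__ = __mul__

def sin(u):
    # chain rule: phi = sin, phi' = cos, phi'' = -sin
    return Hess(np.sin(u.value),
                np.cos(u.value) * u.grad,
                -np.sin(u.value) * np.outer(u.grad, u.grad)
                + np.cos(u.value) * u.hess)

# Example: Hessian of f(x, y) = (xy + sin x + 4)(3y^2 + 6) at (1, 2).
x = Hess.variable(1.0, 0, 2)
y = Hess.variable(2.0, 1, 2)
h = (x * y + sin(x) + 4) * (3 * (y * y) + 6)
print(h.hess)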

Page 13: Automatic Differentiation

The Taylor Series

f(x) = f(x0) + (1/1!) f′(x0)·(x − x0) + (1/2!) f″(x0)·(x − x0)² + . . . + (1/n!) f^(n)(x0)·(x − x0)^n + . . .
     = Σ_{n=0}^{∞} (1/n!) f^(n)(x0)·(x − x0)^n.   (9)

Page 14: Automatic Differentiation

Taylor coefficients

- Taylor coefficients are scalars.
- Suppose that f is a function of m variables.
- Consider the series expansion at the point x0 = (x01, . . . , x0m)
- in the direction h = (h1, . . . , hm):

f(x0 + h) = Σ_{k=0}^{∞} (1/k!) f^(k)(x0) h^k = Σ_{k=0}^{∞} f_k,   (10)

where f_k = f^(k)(x0) h^k / k!, k = 0, 1, . . . denote the normalized Taylor coefficients.
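As a sketch of how normalized Taylor coefficients propagate through a code list (the class name Taylor and the truncation order K are arbitrary choices): each intermediate is stored as its coefficient array [f_0, . . . , f_K] along a chosen direction h, addition is coefficientwise, and multiplication is a truncated Cauchy product. Elementary functions such as sin need their own recurrences and are omitted here:

import numpy as np

K = 5  # truncation order (arbitrary choice for this sketch)

class Taylor:
    """Truncated normalized Taylor coefficients [u_0, ..., u_K] of an intermediate."""

    def __init__(self, coeffs):
        self.c = np.zeros(K + 1)
        self.c[:len(coeffs)] = coeffs

    def __add__(self, other):
        # coefficients of u + v are u_k + v_k
        return Taylor(self.c + other.c)

    def __mul__(self, other):
        # coefficients of u*v are the truncated Cauchy product
        out = np.zeros(K + 1)
        for k in range(K + 1):
            out[k] = sum(self.c[j] * other.c[k - j] for j in range(k + 1))
        return Taylor(out)

# Example: g(x, y) = (xy)^2 expanded about x0 = (1, 2) in direction h = (1, 1),
# i.e. x(t) = 1 + t, y(t) = 2 + t.
x = Taylor([1.0, 1.0])   # x0 + h1*t
y = Taylor([2.0, 1.0])   # y0 + h2*t
g = (x * y) * (x * y)
print(g.c)               # normalized coefficients g_k along h: [4, 12, 13, 6, 1, 0]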
