Slides: conal.net/talks/essence-of-automatic-differentiation-google.pdf

The simple essence of automatic differentiation

Differentiable programming made easy

Conal Elliott

Target

November 2018

Conal Elliott November 2018 1 / 29

Differentiable programming made easy

Current AI revolution runs on large data, speed, and AD, but:

AD algorithm (backprop) is complex and stateful.

Complex graph APIs.

Solutions:

AD: Simple, calculated, efficient, parallel-friendly, generalized.

API: derivative.

What’s a derivative?

Number

Vector

Covector

Matrix

Higher derivatives

Chain rule for each.

Derivatives as linear maps (Fréchet)

𝒟 :: (a → b) → (a → (a ⊸ b))

𝒟 f a is a local linear approximation to f at a:

$$\lim_{\varepsilon \to 0} \frac{\lVert f\,(a + \varepsilon) - (f\,a + \mathcal{D}\,f\,a\;\varepsilon) \rVert}{\lVert \varepsilon \rVert} = 0$$
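The limit can be checked numerically for a simple case. The following is a minimal sketch, assuming the derivative of squaring at a is the linear map ε ↦ 2·a·ε; the names `dSqr` and `remainderRatio` are illustrative and not from the talk.

```haskell
-- Function under test.
sqr :: Double -> Double
sqr a = a * a

-- Assumed derivative of sqr at a, as a linear map in eps.
dSqr :: Double -> (Double -> Double)
dSqr a = \eps -> 2 * a * eps

-- The quotient inside the limit:
--   |f (a + eps) - (f a + D f a eps)| / |eps|
remainderRatio
  :: (Double -> Double)            -- f
  -> (Double -> Double -> Double)  -- candidate derivative D f
  -> Double                        -- point a
  -> Double                        -- perturbation eps
  -> Double
remainderRatio f df a eps =
  abs (f (a + eps) - (f a + df a eps)) / abs eps
```

For `sqr` the quotient is |ε| up to rounding, so it shrinks with ε, as the definition requires.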

Composition

Sequential:

(∘) :: (b → c) → (a → b) → (a → c)
(g ∘ f) a = g (f a)

𝒟 (g ∘ f) a = 𝒟 g (f a) ∘ 𝒟 f a -- chain rule

Parallel:

(△) :: (a → c) → (a → d) → (a → c × d)
(f △ g) a = (f a, g a)

𝒟 (f △ g) a = 𝒟 f a △ 𝒟 g a

Linear functions

Linear functions are their own derivatives everywhere.

𝒟 id a = id
𝒟 fst a = fst
𝒟 snd a = snd

...

Compositionality

Chain rule:

𝒟 (g ∘ f) a = 𝒟 g (f a) ∘ 𝒟 f a -- non-compositional

To fix, combine regular result with derivative:

𝒟⁺ :: (a → b) → (a → (b × (a ⊸ b)))

𝒟⁺ f = f △ 𝒟 f -- specification

Often much work in common to f and 𝒟 f.

Abstract algebra for functions

class Category (⇝) where
  id :: a ⇝ a
  (∘) :: (b ⇝ c) → (a ⇝ b) → (a ⇝ c)

class Category (⇝) ⇒ Cartesian (⇝) where
  exl :: (a × b) ⇝ a
  exr :: (a × b) ⇝ b
  (△) :: (a ⇝ c) → (a ⇝ d) → (a ⇝ (c × d))

Plus laws and classes for arithmetic etc.

Compiling to categories

sqr a = a * a

magSqr (a, b) = sqr a + sqr b

cosSinProd (x, y) = (cos z, sin z) where z = x * y

In categorical vocabulary:

sqr = mul ∘ (id △ id)

magSqr = add ∘ ((sqr ∘ exl) △ (sqr ∘ exr))

cosSinProd = (cos △ sin) ∘ mul

Automated translation & generalization. See ICFP 2017 paper.
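As a sanity check, the categorical forms can be written by hand for ordinary functions and compared with the pointful definitions. This is only a sketch specialized to `Double`: the ASCII names `fork`, `mul`, and `add` stand in for (△) and the arithmetic vocabulary, and the primed names are the pointful originals. The automated translation itself is not shown.

```haskell
-- Cartesian vocabulary, specialized to plain functions.
fork :: (a -> c) -> (a -> d) -> (a -> (c, d))
fork f g a = (f a, g a)

exl :: (a, b) -> a
exl = fst

exr :: (a, b) -> b
exr = snd

-- Uncurried arithmetic.
mul, add :: (Double, Double) -> Double
mul = uncurry (*)
add = uncurry (+)

-- Point-free (categorical) versions.
sqr :: Double -> Double
sqr = mul . fork id id

magSqr :: (Double, Double) -> Double
magSqr = add . fork (sqr . exl) (sqr . exr)

cosSinProd :: (Double, Double) -> (Double, Double)
cosSinProd = fork cos sin . mul

-- Pointful originals, for comparison.
sqr' :: Double -> Double
sqr' a = a * a

magSqr' :: (Double, Double) -> Double
magSqr' (a, b) = sqr' a + sqr' b

cosSinProd' :: (Double, Double) -> (Double, Double)
cosSinProd' (x, y) = (cos z, sin z) where z = x * y
```

Both versions perform the same arithmetic in the same order, so they agree exactly on any input.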

Automatic differentiation (specification)

newtype D a b = D (a → b × (a ⊸ b))

𝒟̂ :: (a → b) → D a b

𝒟̂ f = D (f △ 𝒟 f) -- not computable

Specification: 𝒟̂ preserves Category and Cartesian structure:

𝒟̂ id = id

𝒟̂ (g ∘ f) = 𝒟̂ g ∘ 𝒟̂ f

𝒟̂ exl = exl

𝒟̂ exr = exr

𝒟̂ (f △ g) = 𝒟̂ f △ 𝒟̂ g

The game: solve these equations for the RHS operations.

Automatic differentiation (solution)

newtype D a b = D (a → b × (a ⊸ b))

linearD f = D (λa → (f a, f))

instance Category D where
  id = linearD id
  D g ∘ D f = D (λa → let {(b, f′) = f a; (c, g′) = g b} in (c, g′ ∘ f′))

instance Cartesian D where
  exl = linearD exl
  exr = linearD exr
  D f △ D g = D (λa → let {(b, f′) = f a; (c, g′) = g a} in ((b, c), f′ △ g′))

instance NumCat D where
  negate = linearD negate
  add = linearD add
  mul = D (mul △ (λ(a, b) → λ(da, db) → b * da + a * db))
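For experimentation, the instances above can be approximated in plain Haskell without the class hierarchy. This is a minimal sketch under stated assumptions: ordinary functions stand in for linear maps (linearity is not enforced), and the standalone names `idD`, `compD`, `forkD`, `exlD`, `exrD`, `addD`, `mulD`, and `applyD` are hypothetical replacements for the class methods.

```haskell
-- Functions stand in for linear maps (a ⊸ b); linearity is not checked.
newtype D a b = D (a -> (b, a -> b))

linearD :: (a -> b) -> D a b
linearD f = D (\a -> (f a, f))

idD :: D a a
idD = linearD id

-- Sequential composition: the chain rule, computing f a only once.
compD :: D b c -> D a b -> D a c
compD (D g) (D f) = D $ \a ->
  let (b, f') = f a
      (c, g') = g b
  in  (c, g' . f')

-- Parallel composition: fork the values and fork the derivatives.
forkD :: D a c -> D a d -> D a (c, d)
forkD (D f) (D g) = D $ \a ->
  let (c, f') = f a
      (d, g') = g a
  in  ((c, d), \da -> (f' da, g' da))

exlD :: D (a, b) a
exlD = linearD fst

exrD :: D (a, b) b
exrD = linearD snd

addD :: Num s => D (s, s) s
addD = linearD (uncurry (+))

mulD :: Num s => D (s, s) s
mulD = D $ \(a, b) -> (a * b, \(da, db) -> b * da + a * db)

-- The earlier examples, rebuilt from differentiable pieces.
sqrD :: Num s => D s s
sqrD = mulD `compD` (idD `forkD` idD)

magSqrD :: Num s => D (s, s) s
magSqrD = addD `compD` ((sqrD `compD` exlD) `forkD` (sqrD `compD` exrD))

-- Value at a, and derivative at a applied to the tangent da.
applyD :: D a b -> a -> a -> (b, b)
applyD (D h) a da = let (b, f') = h a in (b, f' da)
```

Applying `applyD magSqrD (3, 4)` to the basis vectors `(1, 0)` and `(0, 1)` recovers the partial derivatives one at a time, which is the forward-mode usage pattern.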

Generalizing AD

newtype D a b = D (a → b × (a ⊸ b))

linearD f = D (λa → (f a, f))

instance Category D where
  id = linearD id
  D g ∘ D f = D (λa → let {(b, f′) = f a; (c, g′) = g b} in (c, g′ ∘ f′))

instance Cartesian D where
  exl = linearD exl
  exr = linearD exr
  D f △ D g = D (λa → let {(b, f′) = f a; (c, g′) = g a} in ((b, c), f′ △ g′))

Each D operation just uses the corresponding (⊸) operation.

Generalize from (⊸) to other cartesian categories.

Generalized AD

newtype D_(⇝) a b = D (a → b × (a ⇝ b))

linearD f f′ = D (λa → (f a, f′))

instance Category (⇝) ⇒ Category D_(⇝) where
  id = linearD id id
  D g ∘ D f = D (λa → let {(b, f′) = f a; (c, g′) = g b} in (c, g′ ∘ f′))

instance Cartesian (⇝) ⇒ Cartesian D_(⇝) where
  exl = linearD exl exl
  exr = linearD exr exr
  D f △ D g = D (λa → let {(b, f′) = f a; (c, g′) = g a} in ((b, c), f′ △ g′))

instance ... ⇒ NumCat D where
  negate = linearD negate negate
  add = linearD add add
  mul = ??

Numeric operations

Specific to (linear) functions:

mul = D (mul △ (λ(a, b) → λ(da, db) → b * da + a * db))

Rephrase:

scale :: Multiplicative a ⇒ a → (a → a)
scale u = λv → u * v

(▽) :: (a → c) → (b → c) → ((a × b) → c)
(f ▽ g) (a, b) = f a + g b

Now

mul = D (mul △ (λ(a, b) → scale b ▽ scale a))
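The rephrasing can be checked on plain functions. In this sketch `joinAdd` is a hypothetical ASCII name for (▽), and `Num` stands in for the `Multiplicative` and additive constraints, which are assumptions about the intended class hierarchy.

```haskell
-- scale u is the linear map "multiply by u".
scale :: Num a => a -> (a -> a)
scale u = \v -> u * v

-- (▽) for functions: combine two maps into one on pairs, adding the results.
joinAdd :: Num c => (a -> c) -> (b -> c) -> ((a, b) -> c)
joinAdd f g (a, b) = f a + g b

-- Derivative of multiplication at (a, b), in the rephrased form.
mulDeriv :: Num a => (a, a) -> ((a, a) -> a)
mulDeriv (a, b) = scale b `joinAdd` scale a

-- The original lambda form, for comparison.
mulDeriv' :: Num a => (a, a) -> ((a, a) -> a)
mulDeriv' (a, b) = \(da, db) -> b * da + a * db
```

Unfolding `scale` and `joinAdd` in `mulDeriv` gives exactly the body of `mulDeriv'`, so the two agree on every input.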

New generalized vocabulary

class Category (⇝) ⇒ Cocartesian (⇝) where
  inl :: a ⇝ (a × b)
  inr :: b ⇝ (a × b)
  (▽) :: (a ⇝ c) → (b ⇝ c) → ((a × b) ⇝ c)

class ScalarCat (⇝) a where
  scale :: a → (a ⇝ a)

Differentiation:

𝒟 (f ▽ g) (a, b) = 𝒟 f a ▽ 𝒟 g b

The rest are linear.

Linear transformations as functions

newtype a →⁺ b = AddFun (a → b)

instance Category (→⁺) where
  id = AddFun id
  (∘) = inNew2 (∘)

instance Cartesian (→⁺) where
  exl = AddFun exl
  exr = AddFun exr
  (△) = inNew2 (△)

instance Cocartesian (→⁺) where
  inl = AddFun (λa → (a, 0))
  inr = AddFun (λb → (0, b))
  (▽) = inNew2 (λf g (a, b) → f a + g b)

instance Multiplicative s ⇒ ScalarCat (→⁺) s where
  scale s = AddFun (s *)

Extracting a data representation

Finally, extract a matrix or gradient vector from the function representation.

Very inefficient for gradient-based optimization (one pass per domain dimension)!

Alternatively, represent as “generalized matrices” (M_s a b). Then solve more homomorphisms.

Efficiency of composition

Composition is associative.

Some associations are more efficient than others, so

Associate optimally.

Equivalent to matrix chain multiplication: O(n log n).

Choice determined by types, i.e., compile-time information.

All right: “forward mode AD” (FAD).

All left: “reverse mode AD” (RAD).

RAD is much better for gradient-based optimization.
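To make the cost difference concrete, here is a rough sketch that counts scalar multiplications when each derivative in a chain is a dense matrix. The cost model (an n×m by m×k product costs n·m·k multiplications) and the names `rightCost` and `leftCost` are assumptions for illustration, not from the talk.

```haskell
-- dims = [d0, d1, .., dn] describes a chain f_n ∘ ... ∘ f_1
-- with f_i : R^(d_(i-1)) -> R^(d_i), so the derivative of f_i
-- is a (d_i × d_(i-1)) matrix.

-- All right: f_n' ∘ (... ∘ (f_2' ∘ f_1')), as in forward mode.
-- The accumulated product always has d0 columns.
rightCost :: [Int] -> Int
rightCost (d0 : d1 : rest) = go d1 rest
  where
    go _   []            = 0
    go cur (next : more) = next * cur * d0 + go next more
rightCost _ = 0

-- All left: ((f_n' ∘ f_(n-1)') ∘ ...) ∘ f_1', as in reverse mode.
-- By symmetry this is the right-associated cost of the reversed chain.
leftCost :: [Int] -> Int
leftCost = rightCost . reverse

-- A gradient-like chain: large input, scalar output.
exampleDims :: [Int]
exampleDims = [1000, 100, 10, 1]
```

For `exampleDims`, right association multiplies a 10×100 matrix by a 100×1000 matrix first (1,000,000 multiplications) and totals 1,010,000, while left association keeps every intermediate result as a single row and totals 101,000. This is the sense in which RAD suits functions with many inputs and one output.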

Left-associating composition (RAD)

CPS-like category:

Represent a ⇝ b by (b ⇝ r) → (a ⇝ r).

Meaning: f′ ↦ (λh → h ∘ f′).

Construct h ∘ 𝒟 f a directly, without 𝒟 f a.

Old technique (Cayley 1854), vastly generalized by Yoneda.

Continuation category (specification)

newtype Cont^r_(⇝) a b = Cont ((b ⇝ r) → (a ⇝ r))

cont :: Category (⇝) ⇒ (a ⇝ b) → Cont^r_(⇝) a b

cont f = Cont (∘ f)

Require cont to preserve structure. Solve for methods.

Continuation category (solution)

instance Category (⇝) ⇒ Category Cont^r_(⇝) where
  id = Cont id
  Cont g ∘ Cont f = Cont (f ∘ g)

instance Cartesian (⇝) ⇒ Cartesian Cont^r_(⇝) where
  exl = Cont (join ∘ inl)
  exr = Cont (join ∘ inr)
  (△) = inNew2 (λf g → (f ▽ g) ∘ unjoin)

instance Cocartesian (⇝) ⇒ Cocartesian Cont^r_(⇝) where
  inl = Cont (exl ∘ unjoin)
  inr = Cont (exr ∘ unjoin)
  (▽) = inNew2 (λf g → join ∘ (f △ g))

instance ScalarCat (⇝) a ⇒ ScalarCat Cont^r_(⇝) a where
  scale s = Cont (scale s)
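The Category part of this construction can be tried out with ordinary functions in place of the underlying category (⇝). That specialization is an assumption made for brevity, and `compC` and `runCont` are hypothetical standalone names rather than class methods.

```haskell
-- A morphism a -> b, represented by how it transforms continuations b -> r.
newtype Cont r a b = Cont ((b -> r) -> (a -> r))

-- Specification: precompose the continuation with f.
cont :: (a -> b) -> Cont r a b
cont f = Cont (. f)

-- Composition in Cont: the wrapped functions compose in the opposite order.
compC :: Cont r b c -> Cont r a b -> Cont r a c
compC (Cont g) (Cont f) = Cont (f . g)

-- Run with a final continuation.
runCont :: Cont r a b -> (b -> r) -> (a -> r)
runCont (Cont h) = h
```

Structure preservation can be spot-checked: `runCont (cont g `compC` cont f) k` and `runCont (cont (g . f)) k` both reduce to `k . g . f`.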

Reverse-mode AD without tears

D_(Cont^r_(M_s))

Duality

Vector space dual: u ⊸ s, with u a vector space over s.

If u has finite dimension, then u ⊸ s ≅ u.

Represent a ⊸ b by (b ⊸ s) → (a ⊸ s), and hence by b → a.

Ideal for extracting gradient vector. Just apply to 1 (id).
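A small worked instance may help: take a linear map from pairs of reals to reals, pass it through the continuation form, convert with dot and its inverse, and apply the result to 1. The names `dotR`, `undotR2`, `dualOf`, and `gradientOf` are hypothetical, and the sketch is fixed to R² → R rather than general vector spaces.

```haskell
type R  = Double
type R2 = (R, R)

-- dot for the scalar space: u ↦ (v ↦ u * v).
dotR :: R -> (R -> R)
dotR u = \v -> u * v

-- Inverse of dot for R2: sample the linear map on the basis vectors.
undotR2 :: (R2 -> R) -> R2
undotR2 h = (h (1, 0), h (0, 1))

-- Dual of a linear f : R2 ⊸ R, as a function R → R2:
--   dot⁻¹ ∘ (∘ f) ∘ dot
dualOf :: (R2 -> R) -> (R -> R2)
dualOf f = undotR2 . (. f) . dotR

-- The gradient is the dual applied to 1.
gradientOf :: (R2 -> R) -> R2
gradientOf f = dualOf f 1

-- Example linear map.
example :: R2 -> R
example (x, y) = 3 * x + 4 * y
```

Here `gradientOf example` evaluates to `(3.0, 4.0)`, the coefficient vector of the linear map. The basis sampling in `undotR2` is only there to state the specification; the Dual instances avoid it by construction.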

Duality (specification)

newtype Dual_(⇝) a b = Dual (b ⇝ a)

asDual :: Cont^s_(⇝) a b → Dual_(⇝) a b

asDual (Cont f) = Dual (dot⁻¹ ∘ f ∘ dot)

where

dot :: u → (u ⊸ s)

dot⁻¹ :: (u ⊸ s) → u

Require asDual to preserve structure. Solve for methods.

Duality (solution)

newtype Dual_(⇝) a b = Dual (b ⇝ a)

instance Category (⇝) ⇒ Category Dual_(⇝) where
  id = Dual id
  (∘) = inNew2 (flip (∘))

instance Cocartesian (⇝) ⇒ Cartesian Dual_(⇝) where
  exl = Dual inl
  exr = Dual inr
  (△) = inNew2 (▽)

instance Cartesian (⇝) ⇒ Cocartesian Dual_(⇝) where
  inl = Dual exl
  inr = Dual exr
  (▽) = inNew2 (△)

instance ScalarCat (⇝) s ⇒ ScalarCat Dual_(⇝) s where
  scale s = Dual (scale s)

Backpropagation

D_(Dual_(→⁺))
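Specializing the generalized D to the dual of additive functions gives derivatives that map output cotangents back to input cotangents. The following is a hand-written sketch of that specialization, not the talk's derived instances: the `Additive` class and the names `RAD`, `compR`, `forkR`, `exlR`, `exrR`, `addR`, `mulR`, and `gradient` are all hypothetical.

```haskell
-- Additive types: just enough structure to sum cotangents.
class Additive a where
  zero  :: a
  (^+^) :: a -> a -> a

instance Additive Double where
  zero  = 0
  (^+^) = (+)

instance (Additive a, Additive b) => Additive (a, b) where
  zero = (zero, zero)
  (a, b) ^+^ (c, d) = (a ^+^ c, b ^+^ d)

-- Value plus derivative in dual form (output cotangent -> input cotangent).
newtype RAD a b = RAD (a -> (b, b -> a))

idR :: RAD a a
idR = RAD $ \a -> (a, id)

-- Derivatives compose in flipped order, as in the Dual Category instance.
compR :: RAD b c -> RAD a b -> RAD a c
compR (RAD g) (RAD f) = RAD $ \a ->
  let (b, f') = f a
      (c, g') = g b
  in  (c, f' . g')

-- The dual of fork is join: add the two pulled-back cotangents.
forkR :: Additive a => RAD a c -> RAD a d -> RAD a (c, d)
forkR (RAD f) (RAD g) = RAD $ \a ->
  let (c, f') = f a
      (d, g') = g a
  in  ((c, d), \(dc, dd) -> f' dc ^+^ g' dd)

-- The duals of exl and exr are inl and inr.
exlR :: Additive b => RAD (a, b) a
exlR = RAD $ \(a, _) -> (a, \da -> (da, zero))

exrR :: Additive a => RAD (a, b) b
exrR = RAD $ \(_, b) -> (b, \db -> (zero, db))

addR :: RAD (Double, Double) Double
addR = RAD $ \(a, b) -> (a + b, \d -> (d, d))

mulR :: RAD (Double, Double) Double
mulR = RAD $ \(a, b) -> (a * b, \d -> (b * d, a * d))

sqrR :: RAD Double Double
sqrR = mulR `compR` (idR `forkR` idR)

magSqrR :: RAD (Double, Double) Double
magSqrR = addR `compR` ((sqrR `compR` exlR) `forkR` (sqrR `compR` exrR))

-- Whole gradient in one backward pass: apply the derivative to 1.
gradient :: RAD a Double -> a -> a
gradient (RAD h) a = snd (h a) 1
```

With these definitions `gradient magSqrR (3, 4)` yields `(6.0, 8.0)` in a single pass, with no graph, tape, or mutation involved.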

Conclusions

Simple AD algorithm, specializing to forward, reverse, mixed.

No graphs, tapes, tags, partial derivatives, or mutation.

Parallel-friendly and low memory use.

Calculated from simple, regular algebra problems.

Generalizes to derivative categories other than linear maps.

Differentiate regular Haskell code (via plugin).

ICFP 2018 paper: pictures, proofs, incremental computation.

Reflections: recipe for success

Key principles:

Capture main concepts as first-class values.

Focus on abstract notions, not specific representations.

Calculate efficient implementation from simple specification.

Not previously applied to AD (afaik).

Quandary: Most programming languages poor for function-like things.

Solution: Compiling to categories.

Symbolic vs automatic differentiation

Often described as opposing techniques:

Symbolic:

Apply differentiation rules symbolically.

Can duplicate much work.

Needs algebraic manipulation.

Automatic:

FAD: easy to implement but often inefficient.

RAD: efficient but tricky to implement.

My view: AD is SD done by a compiler.

Compilers already work symbolically and preserve sharing.
