PHY 604: Computational Methods in Physics and Astrophysics II
Machine Learning

Upload: dinhnhi

Post on 07-Aug-2018



Machine Learning

● Machine learning is a big topic

● We’ll focus on neural networks

– We want to use known data (inputs with corresponding outputs) to predict the output for any input

– We’ll follow the notation of Franklin, Computational Methods for Physics, Ch. 14 with ideas from Rashid, Make Your Own Neural Network


Neural Network Overview

● Neural networks attempt to mimic the action of neurons in a brain

● Applications generally involve predicting the output from some input or classification (separating populations in some parameter space)

● Some uses:

– Character / image recognition

– AI for games (the “Go” program that beat a human)

– Classification of data (e.g., galaxy typing)

– Finance (stock market trends)


Neural Network Overview

● Computers are good at arithmetic but not great at pattern recognition

– Neural nets attempt to model how neurons transmit information

(figure; original source unknown)


Neural Networks

● Basic idea

– Create a nonlinear fitting routine with free parameters

– Train the network on data with known input and output to set the parameters

– Trained network can be used on new inputs to predict outcome

● A linear example:

– Inputs: x ∈ ℝⁿ

– Outputs: z ∈ ℝᵐ

– The neural network is a map, ℝⁿ → ℝᵐ, that can be expressed as a matrix, A

● z = A x

● A is an m × n matrix

– Given enough input/output pairs, we could determine all the matrix elements in A
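
A minimal sketch of that last point, assuming illustrative sizes and random data (the names A_true, X, Z are hypothetical, not from the lecture): with at least n linearly independent inputs, a least-squares solve recovers every element of A.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 3, 2                        # input and output dimensions (arbitrary choice)
A_true = rng.normal(size=(m, n))   # hypothetical "unknown" linear map

# T >= n linearly independent inputs are enough to pin down every A_ij
T = 5
X = rng.normal(size=(n, T))        # each column is one input x
Z = A_true @ X                     # corresponding outputs z = A x

# Z = A X is equivalent to X^T A^T = Z^T, which lstsq solves for A^T
A_fit = np.linalg.lstsq(X.T, Z.T, rcond=None)[0].T

print(np.allclose(A_fit, A_true))
```

With fewer than n independent inputs the system is underdetermined and lstsq returns only the minimum-norm solution.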


Need for Nonlinear

● A linear map cannot capture all of these input/output pairs

– We need to find A such that z = A x holds for every pair

● We cannot satisfy all 3 constraints with a linear model


Neural Network Overview

● Some nomenclature:

– Neural networks are divided into layers

● There is always an input layer; it doesn’t do any processing, it just accepts the input

● There is always an output layer

– Within a layer, there are neurons or nodes

● For input, there will be one node for each input variable

– Every node in the first layer connects to every node in the next layer

● The weight associated with the connection can vary; these are the matrix elements

– In this example, the processing is done in layer 2 (output)


Neural Network Overview

● When you train a neural network, you are adjusting the weights connecting the nodes

● Some connections may have zero weight

● This mimics nature: a single neuron can connect to several (or lots of) other neurons


Nonlinear Model

● We’ll use a nonlinear function, g(p), that acts on a vector:

– then z = g(A x)

– For the previous example, g(p) = p² would fit all the data

● New procedure: set the entries of A via training, using a simple, nonlinear g(p) that fits our training data

● From the graphical representation, the nonlinear function is applied on the output layer
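
To make the difference concrete, here is a small sketch on three hypothetical training pairs. The data below are an assumption chosen for illustration (they are not recovered from the slides); they are picked so that no linear map fits all three, while g(p) = p² with a hand-picked A does.

```python
import numpy as np

# hypothetical training pairs (illustrative only, not taken from the slides)
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])    # columns are the inputs x
y = np.array([1.0, 1.0, 0.0])      # one scalar output per input

# best linear model z = A x in the least-squares sense
A_lin = np.linalg.lstsq(X.T, y, rcond=None)[0]
linear_residual = np.abs(A_lin @ X - y).max()

# nonlinear model z = g(A x) with g(p) = p**2 and a hand-picked A
A_nl = np.array([1.0, -1.0])
nonlinear_residual = np.abs((A_nl @ X) ** 2 - y).max()

print(linear_residual, nonlinear_residual)
```

The linear constraints here are a₁ = 1, a₂ = 1, and a₁ + a₂ = 0, which cannot all hold, so the linear residual stays finite; the squared model reproduces all three outputs.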


Nonlinear Model

● Again, this mirrors biology

– Neurons don’t act linearly

– There is a threshold that needs to be reached before a neuron “fires”

● A step function would work, but we want something differentiable

● There are a lot of different choices in the literature


Sigmoid Function

● Common choice: sigmoid function, g(p) = 1 / (1 + e^(−α p))

● Note, all outputs are scaled to be z_j ∈ (0, 1)
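
A short sketch of the sigmoid and its derivative, assuming the usual logistic form g(p) = 1 / (1 + exp(−α p)); the function names are illustrative. The derivative has the closed form α g (1 − g), which is what makes this choice convenient for gradient-based training.

```python
import numpy as np

def sigmoid(p, alpha=1.0):
    """Logistic sigmoid g(p) = 1 / (1 + exp(-alpha p)), applied element-by-element."""
    return 1.0 / (1.0 + np.exp(-alpha * np.asarray(p, dtype=float)))

def sigmoid_deriv(p, alpha=1.0):
    """Derivative dg/dp = alpha g (1 - g); smooth, unlike a step function."""
    g = sigmoid(p, alpha)
    return alpha * g * (1.0 - g)

p = np.linspace(-5.0, 5.0, 11)
z = sigmoid(p)
print(z.min(), z.max())
```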


Sigmoid Function

● There seem to be differing opinions on α

– Using α = 1 seems to work well; this is what we’ll do

● Perhaps scale inputs to be in (0, 1]

● Note, we don’t want inputs to be 0, because they cancel out weights

– Franklin: there is a narrow range of nonlinearity; pick α so that our inputs fall in that range

● Elements of A are O(1)

● p = A x is O(n max{|x|})

● Choose:


Scaling Output

● Note that since the sigmoid maps all output to (0, 1), we need to make sure that the output in our training set is likewise mapped to (0, 1)

– If the data doesn’t already fall in (0, 1), we can just use a linear transformation:

● here, Δx is the largest possible range of x_i in the inputs

Actually, (0, 1] works fine; we just need to avoid a 0, since that cancels out weights in the matrices


Implementation

● Basic operation

– Train the model with known input/output to get all A_ij

– Use z = g(A x) to get output for a new input x

● Training:

– We have T pairs (x_k, y_k) for k = 1, …, T

● Important: remember that our y’s have to be scaled to be in (0, 1), so they are in the same range that our function g(p) maps to

– We require that g(A x_k) = y_k for all k

● Recall that g(p) is a scalar function that works element-by-element: [g(p)]_i = g(p_i)


Implementation

● Training (cont.)

– We find the elements of A

● This can be expressed as a minimization problem, where we alter the matrix elements to achieve this agreement

● There may not be a unique set of A_ij, so we will loop randomly over all training data multiple times to optimize A

– This looks like a least-squares minimization

● The function we minimize is called the cost function

– There are other choices than the square of the error


Implementation

● Minimization

– A common technique for minimization is gradient descent (sometimes called steepest descent)

● This looks at the local derivative of the function f with respect to the parameters A_ij and moves a small distance downhill, and iterates…

– We’ll also compare to an external library for minimization

● Caveats

– When you minimize with one set of training data, there is no guarantee that you are still minimized with respect to the previous sets

– In practice, you feed the training data multiple times, in random order, to the minimizer; each pass is called an epoch


Aside: Minimization

● Steepest descent minimization

– Start at a point x₀ and evaluate the gradient

– Move downhill by following the gradient by some amount η

– Correct our initial guess and iterate

● Need to choose the amount to move each iteration

– Sometimes we instead define a unit vector in the direction of the local gradient, and then η represents the distance to travel in that direction

● You can think about this as what happens if you put a marble on a surface: it rolls to a minimum

– May not be the global minimum; we can get stuck in a local minimum


Aside: Minimization

● Example: Rosenbrock (banana) function

– This is a hard problem for optimization

– Global minimum is at (a, a²)

code: steepest_descent.py

Note: this is the log of the function plotted


Aside: Minimization

● Minimization with gradient descent is very sensitive to choice of η

– Too large and you may shoot off far from the minimum

– Too small and you do a lot of extra work

code: steepest_descent.py
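
The following is a minimal fixed-step steepest descent sketch, not a reproduction of steepest_descent.py. It assumes the standard Rosenbrock form f(x, y) = (a − x)² + b (y − x²)² with a = 1 and b = 100; the starting point, η, and the stopping tolerance are illustrative choices.

```python
import numpy as np

def rosenbrock(x, a=1.0, b=100.0):
    return (a - x[0])**2 + b * (x[1] - x[0]**2)**2

def rosenbrock_grad(x, a=1.0, b=100.0):
    dfdx = -2.0 * (a - x[0]) - 4.0 * b * x[0] * (x[1] - x[0]**2)
    dfdy = 2.0 * b * (x[1] - x[0]**2)
    return np.array([dfdx, dfdy])

def steepest_descent(grad, x0, eta, max_iter=50000, tol=1.e-8):
    """Move downhill by eta times the gradient until the gradient is tiny."""
    x = np.array(x0, dtype=float)
    n_iter = 0
    while n_iter < max_iter:
        dfdx = grad(x)
        if np.linalg.norm(dfdx) < tol:
            break
        x = x - eta * dfdx          # one step downhill
        n_iter += 1
    return x, n_iter

xmin, n_iter = steepest_descent(rosenbrock_grad, [-1.0, 1.0], eta=1.e-3)
print(xmin, n_iter)
```

Even on this two-parameter problem the method needs many thousands of small steps, because the curved valley is steep across and nearly flat along its floor. Raising η much beyond this value makes the steps overshoot the valley walls.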


Aside: Minimization

● We’ll also use the minimization function built into scipy.optimize

code: scipy_optimize.py
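
A sketch of the library route (again, not the contents of scipy_optimize.py): scipy.optimize ships the Rosenbrock function and its gradient as rosen and rosen_der (with a = 1, b = 100), and minimize can use them directly. The BFGS method and the starting point are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.0, 1.0])

# BFGS builds up curvature information from successive gradients,
# so it needs far fewer steps than fixed-step steepest descent
result = minimize(rosen, x0, jac=rosen_der, method="BFGS")

print(result.x, result.nit)
```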


Neural Net Minimization

● For our function,

– Note, this definition is for a single training pair, (x_k, y_k)

– Our update would be

– where


Neural Net Minimization

● Working out the derivative:

– We could then use steepest descent, looping over the matrix elements and doing the minimization on them one by one, iterating until we converge

● Instead, we just do one push “downhill” following the gradient for a single training set and then move to the next.

● η is often called the learning rate

● Gradient descent is often used for neural nets because it only requires the first derivative

– Newton’s method would require the second derivatives (Hessian matrix)
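
One way to carry out this derivative, as a sketch that assumes a sum-of-squares cost for a single training pair and the sigmoid with α = 1 (so that g′ = g (1 − g)); the normalization used in the lecture may differ:

```latex
\begin{align*}
f(A) &= \sum_{i=1}^{m} (z_i - y_i)^2 ,
  \qquad z_i = g\Big( \sum_{j=1}^{n} A_{ij} x_j \Big) \\
\frac{\partial f}{\partial A_{ij}}
  &= 2 (z_i - y_i)\, g'\Big( \sum_{l=1}^{n} A_{il} x_l \Big)\, x_j
   = 2 (z_i - y_i)\, z_i (1 - z_i)\, x_j \\
A_{ij} &\leftarrow A_{ij} - \eta\, \frac{\partial f}{\partial A_{ij}}
\end{align*}
```

Only z_i depends on A_ij, which is why the sum over outputs collapses to a single term.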


Neural Net Minimization

● Recall,

– A is an m × n matrix

– x is an n × 1 vector

– y (and hence z) is an m × 1 vector

● We can write our derivative as:

● Then the correction to our matrix is:

Here, a ∘ b is an element-wise product
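
In NumPy the matrix form becomes a single outer product. This sketch assumes the cost f = Σ (z_i − y_i)² for one training pair and the sigmoid with α = 1; the function names are illustrative, and `*` plays the role of the element-wise product ∘.

```python
import numpy as np

def g(p):
    return 1.0 / (1.0 + np.exp(-p))

def cost(A, x, y):
    """Sum-of-squares error for one training pair (assumed form of f)."""
    z = g(A @ x)
    return float(np.sum((z - y)**2))

def grad_cost(A, x, y):
    """df/dA as an m x n matrix: 2 (e o z o (1 - z)) x^T, with e = z - y."""
    z = g(A @ x)                      # m x 1
    e = z - y                         # m x 1
    return 2.0 * (e * z * (1.0 - z)) @ x.T

def train_step(A, x, y, eta):
    """One push downhill for a single training pair."""
    return A - eta * grad_cost(A, x, y)
```

Because x is n × 1 and the bracketed vector is m × 1, the product with x.T has the same m × n shape as A, so the update needs no explicit loop over matrix elements.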


Initialization

● A common choice for initializing A is to set the elements to random numbers in [-1, 1]

– It is suggested (see, e.g., Rashid) that a better choice is initializing the elements to Gaussian normal random numbers with width

● This should be coupled with α = 1

● The initialization sets the starting point in the minimization, so different realizations can converge to different (local) minima
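
Both choices are one line each in NumPy. The Gaussian width did not survive in this transcript; the sketch below assumes 1/√n, with n the number of inputs feeding each node, which is a common convention and should be checked against Rashid. The function name is hypothetical.

```python
import numpy as np

def init_weights(m, n, rng, gaussian=True):
    """Return an m x n starting matrix for A."""
    if gaussian:
        # ASSUMPTION: width = 1/sqrt(n), n = number of inputs feeding each node
        return rng.normal(0.0, 1.0 / np.sqrt(n), size=(m, n))
    # uniform random numbers in [-1, 1)
    return rng.uniform(-1.0, 1.0, size=(m, n))

rng = np.random.default_rng()
A_gauss = init_weights(4, 10, rng)
A_unif = init_weights(4, 10, rng, gaussian=False)
```

Passing a seeded generator, e.g. np.random.default_rng(42), makes a run reproducible, which helps when comparing the different minima that different starting points reach.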


Simple Example

● Here’s a simple example

– Given an input vector of 10 numbers drawn from a sample, set the output to the last element of the input

● Draw from: [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]

● Example input and output:

– Input: [0.15, 0.35, 0.65, 0.45, 0.05, 0.15, 0.75, 0.35, 0.25, 0.85]

– Output: [0.85]

– We want to train a neural net on a bunch of input/output pairs and then see if it can predict the correct output given some new input vectors

● This type of example seems to be a very common intro example

– We’ll restrict the output of the network to be the closest member of the set
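
A compact sketch of this experiment with a single-layer network. It is not last_num.py: the training-set size, number of epochs, learning rate, seed, Gaussian initialization, and the sum-of-squares update are all assumptions made for illustration.

```python
import numpy as np

SAMPLE = 0.05 + 0.1 * np.arange(10)      # 0.05, 0.15, ..., 0.95

def g(p):
    return 1.0 / (1.0 + np.exp(-p))

def make_data(n_sets, rng, n=10):
    """Random input vectors (rows) and their targets (the last element)."""
    X = rng.choice(SAMPLE, size=(n_sets, n))
    return X, X[:, -1]

def train(X, y, rng, eta=0.1, n_epochs=10):
    n = X.shape[1]
    A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(1, n))
    for _ in range(n_epochs):
        for k in rng.permutation(len(X)):     # random order in each epoch
            x = X[k].reshape(n, 1)
            z = g(A @ x)
            e = z - y[k]
            A -= 2.0 * eta * (e * z * (1.0 - z)) @ x.T
    return A

def predict(A, x):
    """Evaluate the network and snap to the closest member of the set."""
    z = g(A @ x.reshape(-1, 1))[0, 0]
    return SAMPLE[np.argmin(np.abs(SAMPLE - z))]

rng = np.random.default_rng(12345)
X_train, y_train = make_data(1000, rng)
A = train(X_train, y_train, rng)

X_test, y_test = make_data(200, rng)
n_right = sum(np.isclose(predict(A, x), y) for x, y in zip(X_test, y_test))
print(f"fraction right on unseen data: {n_right / len(X_test):.2f}")
print(A)
```

Since there is no bias term, the network has to use the other nine weights to supply the offset, which is why they settle on similar negative values while the last weight grows.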


Simple Example

● After training, our model does an okay job at recognizing the trained data

code: last_num.py


Simple Example

● And about 45% success on data we’ve never seen before


Simple Example

● The matrix A has elements:

– Notice that the last element is by far the largest, as expected

[[-0.64781066 -0.5222319 -0.3895293 -0.56014527 -0.51573424 -0.7674345 -0.29920656 -0.48140874 -0.61986531 4.84708543]]


Hidden Layers

● We can add more parameters by adding another layer of nodes/neurons


Hidden Layers

● Hidden layers sit between the input and output

● For a hidden layer of dimension k:

– Inputs: x ∈ ℝⁿ

– Outputs: z ∈ ℝᵐ

– A is an m × k matrix

– B is a k × n matrix

– The product AB is m × n, as we had before

● Universal approximation theorem: a network with a single hidden layer can approximate any continuous function (given enough hidden nodes)

● From now on, we will not use an α, so the sigmoid functions are the same in each layer.


Hidden Layers

● We transform the input in two steps: z̃ = g(B x), then z = g(A z̃)

– Note: Franklin shifts the result of the first step by subtracting ½

● Argues that g() maps into (0, 1); subtracting ½ gets it into (-½, ½)

● This is unnecessary: A will have positive and negative entries, so the input to the next sigmoid will already span the nonlinear transition
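
A sketch of the two-step evaluation without the ½ shift. The layer sizes, the random weights, and the names forward and z_hidden are illustrative assumptions.

```python
import numpy as np

def g(p):
    return 1.0 / (1.0 + np.exp(-p))

def forward(A, B, x):
    """Input layer -> hidden layer -> output layer."""
    z_hidden = g(B @ x)       # k x 1: first step, B maps n inputs to k hidden nodes
    z = g(A @ z_hidden)       # m x 1: second step, A maps k hidden nodes to m outputs
    return z_hidden, z

n, k, m = 10, 4, 2            # illustrative sizes
rng = np.random.default_rng(0)
B = rng.normal(0.0, 1.0 / np.sqrt(n), size=(k, n))
A = rng.normal(0.0, 1.0 / np.sqrt(k), size=(m, k))
x = rng.uniform(0.01, 1.0, size=(n, 1))

z_hidden, z = forward(A, B, x)
print(z_hidden.shape, z.shape)
```

The hidden values are returned as well because training needs them when the output errors are moved back through the network.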


Hidden Layers

● Graphically this appears as:


Hidden Layers

● Now we minimize:


Minimization

● We need to do the minimization now for both sets of weights (matrices)

● In practice, we do them one at a time, with each seeing the result from its layer

– This process is also called backpropagation in neural networks: we are using the errors at the end to change the weights that came earlier in the network


Backpropagation

● In the evaluation step, we progress through the neural network in a forward direction: input layer → hidden layer → output layer

● Backpropagation is the process of taking the errors that we compute at the output layer and moving them backwards to the hidden layer


Gradient Descent

● We can do our gradient descent on A and B separately now

– This is the strength of backpropagation and gradient descent vs. some “canned” minimization routine: we are not optimizing the entire system all together

● Differentiating our error and lots of chain rule gives:

– With

Note: this is a single dot product; the combination of vectors on the left is multiplied element-by-element (the Hadamard product)

This approximation seems to be commonly made and supposedly doesn’t affect convergence much
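
The sketch below writes out the exact chain-rule gradients for a single training pair, assuming the cost f = Σ (z_i − y_i)² and sigmoids with α = 1. It does not apply the approximation mentioned above, whose exact form is not recoverable from this transcript; all function names are illustrative.

```python
import numpy as np

def g(p):
    return 1.0 / (1.0 + np.exp(-p))

def cost(A, B, x, y):
    return float(np.sum((g(A @ g(B @ x)) - y) ** 2))

def gradients(A, B, x, y):
    z_h = g(B @ x)                       # forward: hidden layer, k x 1
    z = g(A @ z_h)                       # forward: output layer, m x 1
    e = z - y                            # error at the output
    delta = e * z * (1.0 - z)            # element-wise (Hadamard) products
    e_h = A.T @ delta                    # error moved back to the hidden layer
    grad_A = 2.0 * delta @ z_h.T                        # m x k
    grad_B = 2.0 * (e_h * z_h * (1.0 - z_h)) @ x.T      # k x n
    return grad_A, grad_B

def train_step(A, B, x, y, eta):
    """One gradient-descent push on both weight matrices."""
    grad_A, grad_B = gradients(A, B, x, y)
    return A - eta * grad_A, B - eta * grad_B
```

A finite-difference comparison against cost is a cheap way to confirm that a hand-derived gradient like this has the right signs and indices before training with it.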


Hidden Layers

● Usually only a single hidden layer is needed

● In general, you want fewer nodes in your hidden layer than in your input layer

– n > k > m should be reasonable

● Interactive exploration of hidden layers:

– http://playground.tensorflow.org


Signal Analysis

● Example (from Franklin):

– We are given a noisy signal that we expect to lie in one of 4 frequency bands, f = {1, 2, 3, 4}.

– The clean signal should be:

– We are given n points of the form

● Here, r_i is a random number in [-1, 1]

● The noise is much higher amplitude than the signal

● We’ll take a different approach from Franklin:

– Our output will have m = 4, with a 1 in the position corresponding to the frequency, e.g., 1 Hz: [1, 0, 0, 0]; 2 Hz: [0, 1, 0, 0]

● We’ll train a neural net on known pairs of input-output and then test with unknown inputs: can we recover the frequency?
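
The output encoding described above, and the step that turns a network output back into a frequency, can be sketched as follows (the helper names encode and decode are hypothetical):

```python
import numpy as np

FREQS = (1, 2, 3, 4)

def encode(f):
    """Frequency -> m x 1 target with a 1 in the matching position."""
    y = np.zeros((len(FREQS), 1))
    y[FREQS.index(f), 0] = 1.0
    return y

def decode(z):
    """Network output -> frequency of the node with the largest value."""
    return FREQS[int(np.argmax(z))]

print(encode(2).ravel(), decode(np.array([[0.1], [0.2], [0.7], [0.3]])))
```

Taking the largest output node means the network never has to hit 0 or 1 exactly, which a sigmoid cannot do anyway.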


Hidden Layers

● Here’s a single frequency sample data set

– We use 5 epochs

– Learning rate, η = 0.05


Hidden Layers

● Hidden layer: k = 2

– Here’s how we do on the trained data (1000 random data sets)

This is Δf; 0 means that we got the frequency right

code: signal_test_m4.py


Hidden Layers

● Hidden layer: k = 2

– And now on data we’ve never seen


Hidden Layers

● Hidden layer: k = 4

Panels: training set; random data


Hidden Layers

● Hidden layer: k = 8

Panels: training set; random data


Hidden Layers

● Hidden layer: k = 32

Panels: training set; random data

Notice that we are getting almost 100% of the training set right and over 80% of data we’ve never seen correct


Is a Neural Net the Best Choice?

● We could imagine doing this same example using an FFT

– Simply take the FFT of the test signal and return the frequency corresponding to the maximum power

Here’s the FFT of a sample dataset. Lots of high-frequency noise, but one of our frequencies appears to dominate


Is a Neural Net the Best Choice?

● We could imagine doing this same example using an FFT

– The FFT gets the right frequency almost 50% of the time

– But being a single frequency off is not the next dominant result

code: fft_compare.py
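
A sketch of the FFT approach, not the contents of fft_compare.py. The signal model is an assumption, since the lecture's formulas are not in this transcript: n evenly spaced samples over one unit of time, a clean signal sin(2π f t), and uniform noise. Restricting the search to the four candidate bands is also a design choice made here.

```python
import numpy as np

def fft_classify(signal, freqs=(1, 2, 3, 4)):
    """Return the candidate frequency with the most power in the FFT."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    # ASSUMPTION: samples span exactly one unit of time, so rfft bin j is frequency j
    return max(freqs, key=lambda f: power[f])

n = 64
t = np.arange(n) / n
rng = np.random.default_rng(0)

f_true = 3
clean = np.sin(2.0 * np.pi * f_true * t)
noisy = clean + 2.0 * rng.uniform(-1.0, 1.0, size=n)   # noise amplitude is illustrative

print(fft_classify(clean), fft_classify(noisy))
```

On a clean signal the classifier always recovers the frequency; how often it succeeds on noisy data depends on the noise amplitude and n.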


Image Classification

● We’ll try to recognize a digit (0 – 9) from an image of a handwritten digit.

– MNIST dataset (http://yann.lecun.com/exdb/mnist/)

● Popular dataset for testing out machine learning techniques

● Training set is 60,000 images

– Approximately 250 different writers

● Test set is 10,000 images

● Correct answer is known for both sets so we can test our performance

● Image details:

– 28 × 28 pixels, grayscale (0 – 255 intensity)

● The best learning algorithms can get accuracy > 99%


Image Classification

● Neural network characteristics:

– Input layer will be 784 nodes

● One for each pixel in the input image

– Output layer will be 10 nodes

● An array with an entry for each possible digit

● “3” would be represented as: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

– We’ll start with a hidden layer size of 100

● We’ll train on the training set, using up to 60000 images

– Rescale the input to be in [0.01, 1]

● We’ll test on the test set of 10000 images

First digit in the MNIST training set
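
Preparing one image for the 784-node input layer can be sketched as below. The function name is hypothetical, and the exact linear map onto [0.01, 1] is an assumption consistent with the stated range.

```python
import numpy as np

def scale_image(pixels):
    """Flatten a 28 x 28 image with intensities 0..255 into a 784 x 1 input in [0.01, 1]."""
    x = np.asarray(pixels, dtype=float).reshape(-1, 1)
    return 0.01 + 0.99 * x / 255.0

# synthetic stand-in for an image (not real MNIST data)
image = np.arange(784).reshape(28, 28) % 256
x = scale_image(image)
print(x.shape, x.min(), x.max())
```

The lower bound of 0.01 keeps blank pixels from being exactly 0, which would remove the corresponding weights from the update.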


Image Classification

● Default configuration:

– The full training set (60000 images)

– Hidden layer of 100 nodes

– 5 epochs of training

– Learning rate of 0.1

● We achieve 95 – 96% accuracy

code: char_recognition.py


Some Image Classification Failures


Image Classification Weights

● Weights (matrix elements of A and B) seem symmetric about 0

– Interestingly, with more training, the width of the distribution seems to grow


Effect of Number of Epochs

● When we use the full training set (60000 images) the number of epochs (passes through the training data) doesn’t seem to matter much


Effect of Training Set Size

● No surprise: the larger the training set, the better we do


Effect of Hidden Layer Size

● Also not unexpected: the larger the hidden layer the better we do


Effect of Learning Rate

● A smaller learning rate seems to do better


Deep Learning

● A deep neural network is one with many hidden layers (certainly > 1 hidden)

– Very nice discussion: https://stats.stackexchange.com/questions/182734/what-is-the-difference-between-a-neural-network-and-a-deep-neural-network

● There are other learning algorithms aside from neural networks; there’s a link to a text on the class website