PHY 604: Computational Methods in Physics and Astrophysics II
Machine Learning
● Machine learning is a big topic
● We’ll focus on neural networks
– We want to use known data (inputs with corresponding outputs) to predict the output for any input
– We’ll follow the notation of Franklin, Computational Methods for Physics, Ch. 14 with ideas from Rashid, Make Your Own Neural Network
Neural Network Overview
● Neural networks attempt to mimic the action of neurons in a brain
● Applications generally involve predicting the output from some input or classification (separate populations in some parameter space)
● Some uses:
– Character / image recognition
– AI for games (the “Go” program that beat a human)
– Classification of data (e.g., galaxy typing)
– Finance (stock market trends)
Neural Network Overview
● Computers are good at arithmetic but not great at pattern recognition
– Neural nets attempt to model how neurons transmit information
[Figure: original source unknown]
Neural Networks
● Basic idea
– Create a nonlinear fitting routine with free parameters
– Train the network on data with known input and output to set the parameters
– Trained network can be used on new inputs to predict outcome
● A linear example:
– Inputs: x ∊ ℝ^n
– Outputs: z ∊ ℝ^m
– Neural network is a map, ℝ^n → ℝ^m, that can be expressed as a matrix, A
● z = A x
● A is an m × n matrix
– Given enough input, we could know all the matrix elements in A
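As a minimal sketch of the linear case (not taken from the slides): with at least n linearly independent inputs and their outputs, the elements of A follow from a least-squares solve. The matrix A_true, the sizes m and n, and the random inputs are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 2, 3                                    # hypothetical output / input sizes
A_true = rng.uniform(-1.0, 1.0, size=(m, n))   # the "unknown" linear map

# T known input/output pairs stored as rows: X is T x n, Z is T x m
T = 5
X = rng.uniform(0.0, 1.0, size=(T, n))
Z = X @ A_true.T                               # each row is z = A x

# solve X A^T = Z in the least-squares sense
A_T, *_ = np.linalg.lstsq(X, Z, rcond=None)
A_fit = A_T.T

print(np.allclose(A_fit, A_true))
```

With fewer than n independent inputs the system is underdetermined, and the solve returns only one of many matrices consistent with the data.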
Need for Nonlinear
● A linear map cannot capture all of these input/output pairs
– We need to find A such that z = A x holds for each pair
● We cannot satisfy all 3 constraints with a linear model
Neural Network Overview
● Some nomenclature:
– Neural networks are divided into layers
● There is always an input layer; it doesn’t do any processing, it just accepts the input
● There is always an output layer
– Within a layer, there are neurons or nodes
● For input, there will be one node for each input variable
– Every node in the first layer connects to every node in the next layer
● The weight associated with the connection can vary; these are the matrix elements
– In this example, the processing is done in layer 2 (output)
Neural Network Overview
● When you train a neural network, you are adjusting the weights connecting the nodes
● Some connections may have zero weight
● This mimics nature: a single neuron can connect to several (or lots of) other neurons
Nonlinear Model
● We’ll use a nonlinear function, g(p), that acts on a vector:
– then z = g(A x)
– For previous example, g(p) = p² would fit all data
● New procedure: set the entries of A via training, using a simple, nonlinear g(p) that fits our training data
● From the graphical representation, the nonlinear function is applied on the output layer
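To make the argument concrete, here is a small sketch using three hypothetical input/output pairs (chosen for illustration; they are not necessarily the pairs shown on the slide). A linear map with two unknowns cannot satisfy all three constraints, while z = g(A x) with g(p) = p² and A = (1, 1) fits them exactly.

```python
import numpy as np

# Hypothetical training pairs (n = 2 inputs, m = 1 output); illustrative only
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
z = np.array([1.0, 1.0, 4.0])

# Linear model z = A x: three constraints, two unknowns -> best least-squares fit
A_lin, *_ = np.linalg.lstsq(X, z, rcond=None)
linear_error = np.max(np.abs(X @ A_lin - z))

# Nonlinear model z = g(A x) with g(p) = p**2 and A = (1, 1)
A = np.array([1.0, 1.0])
nonlinear_error = np.max(np.abs((X @ A) ** 2 - z))

print(linear_error, nonlinear_error)
```

Even the best linear fit leaves a nonzero residual on these pairs, whereas the nonlinear model reproduces all three outputs.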
Nonlinear Model
● Again, this mirrors biology
– Neurons don’t act linearly
– There is a threshold that needs to be reached before a neuron “fires”
● A step function would work, but we want something differentiable
● There are a lot of different choices in the literature
Sigmoid Function
● Common choice: sigmoid function
$g(p) = \frac{1}{1 + e^{-\alpha p}}$
● Note, all outputs are scaled to be z_j ∊ (0, 1)
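A short sketch of the sigmoid, assuming the usual logistic form g(p) = 1 / (1 + exp(-α p)); the function name and the sample values of p are illustrative.

```python
import numpy as np

def sigmoid(p, alpha=1.0):
    """Logistic sigmoid, applied element-by-element to p."""
    return 1.0 / (1.0 + np.exp(-alpha * np.asarray(p, dtype=float)))

p = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
z = sigmoid(p)
print(z)
```

The output rises monotonically from near 0 to near 1 and passes through 1/2 at p = 0, which is why every z_j lands in (0, 1).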
Sigmoid Function
● There seem to be differing opinions on α
– Using α = 1 seems to work well; this is what we’ll do
● Perhaps scale inputs to be in (0, 1]
● Note, we don’t want inputs to be 0, because they cancel out weights
– Franklin: there is a narrow range of nonlinearity, so pick α so that our inputs fall in that range
● Elements of A are O(1)
● p = A x is O(n max{|x|})
● Choose α so that α p is O(1), i.e. α ∼ 1 / (n max{|x|})
Scaling Output
● Note that since the sigmoid maps all output to (0, 1), we need to make sure that the output in our training set is likewise mapped to (0, 1)
– If the data doesn’t already fall in (0, 1), we can just use a linear transformation:
● here, Δx is the largest possible range of x_i in the inputs
Actually, (0, 1] works fine; we just need to avoid a 0, since that cancels out weights in the matrices
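One possible linear rescaling, sketched under assumptions: the helper name rescale and the target interval [0.01, 1] are illustrative choices (the lower bound simply keeps values away from 0), not the transformation written on the slide.

```python
import numpy as np

def rescale(y, lo=0.01, hi=1.0):
    """Linearly map the values in y onto [lo, hi] (hypothetical helper).

    Assumes y is not constant, otherwise the range in the denominator is zero.
    """
    y = np.asarray(y, dtype=float)
    y_min, y_max = y.min(), y.max()
    return lo + (hi - lo) * (y - y_min) / (y_max - y_min)

raw = np.array([-3.0, 0.0, 2.5, 7.0])
scaled = rescale(raw)
print(scaled)
```

To apply the same map to new data later, y_min and y_max would need to be stored from the training set rather than recomputed.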
Implementation
● Basic operation
– Train the model with known input/output to get all A_ij
– Use z = g(A x) to get output for a new input x
● Training:
– We have T pairs (x^k, y^k) for k = 1, …, T
● Important: remember that our y’s have to be scaled to be in (0, 1), so they are in the same range that our function g(p) maps to
– We require that g(A x^k) = y^k for all k
● Recall that g(p) is a scalar function that works element-by-element:
$[g(\mathbf{p})]_i = g(p_i)$
Implementation
● Training (cont.)
– We find the elements of A
● This can be expressed as a minimization problem, where we alter the matrix elements to achieve this agreement
● There may not be a unique set of A_ij, so we will loop randomly over all training data multiple times to optimize A
– This looks like a least-squares minimization
● The function we minimize is called the cost function
– There are other choices than the square of the error
Implementation
● Minimization
– A common technique for minimization is gradient descent (sometimes called steepest descent)
● This looks at the local derivative of the function f with respect to the parameters A_ij and moves a small distance downhill, and iterates…
– We’ll also compare to an external library for minimization
● Caveats
– When you minimize with one set of training data, there is no guarantee that you are still minimized with respect to the previous sets
– In practice, you feed the training data multiple times, in random order, to the minimizer; each pass is called an epoch
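The epoch idea can be sketched as a loop that visits every training sample once per pass, in a fresh random order each time. The function run_epochs and the callback train_one are hypothetical names; in a real network train_one(k) would perform one gradient-descent update using pair k.

```python
import random

def run_epochs(n_samples, n_epochs, train_one, seed=0):
    """Call train_one(k) for every sample index k, in a new random order each epoch."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    for _ in range(n_epochs):
        rng.shuffle(order)
        for k in order:
            train_one(k)

# record the visiting order instead of training, just to show the pattern
visited = []
run_epochs(n_samples=4, n_epochs=3, train_one=visited.append)
print(visited)
```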
Aside: Minimization
● Steepest descent minimization
– Start at a point x_0 and evaluate the gradient
– Move downhill by following the gradient by some amount η
– Correct our initial guess and iterate
● Need to choose the amount to move each iteration
– Sometimes we instead define a unit vector in the direction of the local gradient, and then η represents the distance to travel in that direction
● You can think about this as what happens if you put a marble on a surface: it rolls to a minimum
– May not be the global minimum; we can get stuck in a local minimum
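The steps above can be sketched in a few lines. This is an illustrative implementation, not the course's steepest_descent.py; the test surface, starting point, η, and stopping tolerance are all assumptions.

```python
def steepest_descent(grad, x0, eta=0.1, tol=1.e-8, max_iter=10000):
    """Minimize a function given its gradient, starting from the point x0."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        # move downhill: step against the gradient by an amount eta
        x_new = [xi - eta * gi for xi, gi in zip(x, g)]
        # stop once the step is tiny
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x

# hypothetical test surface: f(x, y) = (x - 1)^2 + 2 (y + 0.5)^2
def grad_f(x):
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]

x_min = steepest_descent(grad_f, x0=[3.0, 2.0])
print(x_min)
```

This surface has a single minimum at (1, -0.5), so the marble cannot get trapped; on a surface with several minima the result would depend on x0.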
Aside: Minimization
● Example: Rosenbrock (banana) function
$f(x, y) = (a - x)^2 + b\,(y - x^2)^2$
– This is a hard problem for optimization
– Global minimum is at (a, a²)
code: steepest_descent.py
Note: the plot shows the log of the function
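For reference, a sketch of the Rosenbrock function in its usual form, f(x, y) = (a - x)² + b (y - x²)², together with its analytic gradient. The defaults a = 1 and b = 100 are the conventional choices and are an assumption here, since the slide does not state the values used.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    """Rosenbrock function; a = 1, b = 100 are the conventional parameters."""
    return (a - x)**2 + b * (y - x**2)**2

def rosenbrock_grad(x, y, a=1.0, b=100.0):
    """Analytic gradient (df/dx, df/dy) of the Rosenbrock function."""
    dfdx = -2.0 * (a - x) - 4.0 * b * x * (y - x**2)
    dfdy = 2.0 * b * (y - x**2)
    return dfdx, dfdy

# the function and its gradient both vanish at the global minimum (a, a^2)
a = 1.5
print(rosenbrock(a, a**2, a=a), rosenbrock_grad(a, a**2, a=a))
```

The minimum sits in a long, curved, nearly flat valley, which is what makes the function a demanding test for gradient-based minimizers.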
Aside: Minimization
● Minimization with gradient descent is very sensitive to the choice of η
– Too large and you may shoot off far from the minimum
– Too small and you do a lot of extra work
code: steepest_descent.py
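The sensitivity to η shows up even on the simplest possible surface. The sketch below uses f(x) = x² (an assumed stand-in, much easier than the Rosenbrock function) and counts how many iterations gradient descent needs for several values of η.

```python
def iterations_to_converge(eta, x0=1.0, tol=1.e-6, max_iter=100000):
    """Gradient descent on f(x) = x**2; return the iteration count, or None if it fails."""
    x = x0
    for i in range(max_iter):
        if abs(x) < tol:
            return i
        x = x - eta * 2.0 * x      # f'(x) = 2 x
        if abs(x) > 1.e10:         # the iterates are running away
            return None
    return None

for eta in [1.e-3, 0.1, 0.45, 1.1]:
    print(eta, iterations_to_converge(eta))
```

Each update multiplies x by (1 - 2η), so a tiny η shrinks x very slowly, while any η > 1 makes |x| grow on every step.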
Aside: Minimization
● We’ll also use the minimization function built into scipy.optimize
code: scipy_optimize.py
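A minimal sketch of a library-based minimization, assuming scipy.optimize.minimize with the BFGS method; the slides do not say which routine scipy_optimize.py actually calls, and the starting guess is hypothetical. SciPy provides the Rosenbrock function (with a = 1, b = 100) as rosen and its gradient as rosen_der.

```python
import numpy as np
from scipy import optimize

# hypothetical starting guess
x0 = np.array([-1.0, 1.5])

# minimize the built-in Rosenbrock function, supplying its analytic gradient
result = optimize.minimize(optimize.rosen, x0, jac=optimize.rosen_der, method="BFGS")

print(result.x, result.nit)
```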
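Neural Net Minimization
● For our function,
$f(A_{ij}) = \| g(\mathbf{A}\mathbf{x}^k) - \mathbf{y}^k \|^2$
– Note, this definition is for a single training pair, (x^k, y^k)
– Our update would be
$A_{ij} \leftarrow A_{ij} - \eta \, \frac{\partial f}{\partial A_{ij}}$
– where η sets how far we move downhill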
Neural Net Minimization
● Working out the derivative (with z = g(A x) and α = 1):
$\frac{\partial f}{\partial A_{pq}} = 2 (z_p - y_p)\, z_p (1 - z_p)\, x_q$
– We could then use steepest descent, looping over the matrix elements and doing the minimization on them one by one, iterating until we converge
● Instead, we just do one push “downhill” following the gradient for a single training set and then move to the next.
● η is often called the learning rate
● Gradient descent is often used for neural nets because it only requires the first derivative
– Newton’s method would require the second derivatives (Hessian matrix)
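Neural Net Minimization
● Recall,
– A is an m × n matrix
– x is an n × 1 vector
– y (and hence z) is an m × 1 vector
● We can write our derivative as:
$\frac{\partial f}{\partial \mathbf{A}} = 2 (\mathbf{z} - \mathbf{y}) \circ \mathbf{z} \circ (1 - \mathbf{z}) \cdot \mathbf{x}^T$
● Then the correction to our matrix is:
$\mathbf{A} \leftarrow \mathbf{A} - 2 \eta\, (\mathbf{z} - \mathbf{y}) \circ \mathbf{z} \circ (1 - \mathbf{z}) \cdot \mathbf{x}^T$
Here, a ∘ b is an element-wise product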
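A sketch of one training update for the single-layer network, assuming the squared-error cost for one training pair and a sigmoid with α = 1, so that df/dA = 2 (z - y) ∘ z ∘ (1 - z) · xᵀ. The sizes, random data, and learning rate are hypothetical.

```python
import numpy as np

def sigmoid(p):
    return 1.0 / (1.0 + np.exp(-p))

def cost(A, x, y):
    """Squared error for a single training pair; x is n x 1, y is m x 1."""
    z = sigmoid(A @ x)
    return float(np.sum((z - y)**2))

def gradient(A, x, y):
    """df/dA = 2 (z - y) o z o (1 - z) . x^T, an m x n matrix."""
    z = sigmoid(A @ x)
    return (2.0 * (z - y) * z * (1.0 - z)) @ x.T

rng = np.random.default_rng(1)
m, n = 2, 3                                   # hypothetical sizes
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(m, n))
x = rng.uniform(0.01, 1.0, size=(n, 1))
y = rng.uniform(0.01, 0.99, size=(m, 1))

eta = 0.1                                     # learning rate
A_new = A - eta * gradient(A, x, y)           # one push downhill
print(cost(A, x, y), cost(A_new, x, y))
```

The element-wise products act on m × 1 vectors, and the final matrix product with the 1 × n row xᵀ yields an m × n correction with the same shape as A.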
Initialization
● A common choice for initializing A is to set the elements to random numbers in [-1, 1]
– It is suggested (see, e.g., Rashid) that a better choice is initializing the elements to Gaussian normal random numbers with width 1/√n, where n is the number of inputs to the layer
● This should be coupled with α = 1
● The initialization sets the starting point in the minimization, so different realizations can converge to different (local) minima
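A sketch of the Gaussian initialization, assuming a width (standard deviation) of 1/√n with n the number of inputs feeding the layer, which is the rule of thumb described by Rashid. The helper name, seed, and matrix sizes are illustrative.

```python
import numpy as np

def init_weights(m, n, rng):
    """Return an m x n weight matrix of Gaussian random numbers with width 1/sqrt(n)."""
    return rng.normal(0.0, 1.0 / np.sqrt(n), size=(m, n))

rng = np.random.default_rng(12345)
A = init_weights(10, 784, rng)   # hypothetical sizes
print(A.shape, A.std())
```

Changing the seed gives a different starting point, which is one way to see whether training settles into different local minima.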
Simple Example
● Here’s a simple example
– Given an input vector of 10 numbers drawn from a sample, set the output to the last element of the input
● Draw from: [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
● Example input and output:
– Input: [0.15, 0.35, 0.65, 0.45, 0.05, 0.15, 0.75, 0.35, 0.25, 0.85]
– Output: [0.85]
– We want to train a neural net on a bunch of input/output pairs and then see if it can predict the correct output given some new input vectors
● This type of example seems to be a very common intro example
– We’ll restrict the output of the network to be the closest member of the set
Simple Example
● After training, our model does an okay job at recognizing the trained data
code: last_num.py
Simple Example
● And about 45% success on data we’ve never seen before
Simple Example
● The matrix A has elements:
– Notice that the last element is by far the largest, as expected
[[-0.64781066 -0.5222319 -0.3895293 -0.56014527 -0.51573424 -0.7674345 -0.29920656 -0.48140874 -0.61986531 4.84708543]]
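The whole example can be sketched end to end as follows. This is an illustrative reconstruction, not the course's last_num.py: the number of training pairs, the number of epochs, the learning rate, and the random seed are all assumptions, so the trained weights will differ in detail from the matrix printed above.

```python
import numpy as np

rng = np.random.default_rng(0)
choices = np.linspace(0.05, 0.95, 10)      # the set we draw from
n = 10

def sigmoid(p):
    return 1.0 / (1.0 + np.exp(-p))

def make_pair():
    """Return a random input (n x 1) and its target: the last element (1 x 1)."""
    x = rng.choice(choices, size=(n, 1))
    return x, x[-1:, :]

def snap(z):
    """Restrict a network output to the closest member of the set."""
    return choices[np.argmin(np.abs(choices - z))]

# hypothetical training parameters, not taken from last_num.py
eta, n_epochs = 0.1, 100
training = [make_pair() for _ in range(200)]
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(1, n))

for _ in range(n_epochs):
    # feed the training pairs in a new random order each epoch
    for k in rng.permutation(len(training)):
        x, y = training[k]
        z = sigmoid(A @ x)
        A -= eta * (2.0 * (z - y) * z * (1.0 - z)) @ x.T

x_new, y_new = make_pair()
print(A)
print(snap(sigmoid(A @ x_new).item()), y_new.item())
```

The last weight should again come out as the dominant one, since only the last input carries information about the output.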
Hidden Layers
● We can add more parameters by adding another layer of nodes/neurons
Hidden Layers
● Hidden layers sit between the input and output
● For a hidden layer of dimension k:
– Inputs: x ∊ ℝ^n
– Outputs: z ∊ ℝ^m
– A is an m × k matrix
– B is a k × n matrix
– The product AB is m × n, as we had before
● Universal approximation theorem: a network with a single hidden layer can approximate any continuous function
● From now on, we will not use an α, so the sigmoid functions are the same in each layer.
Hidden Layers
● We transform the input in two steps:
$\tilde{\mathbf{z}} = g(\mathbf{B}\mathbf{x}), \qquad \mathbf{z} = g(\mathbf{A}\tilde{\mathbf{z}})$
– Note: Franklin shifts the result of the first step by subtracting ½
● Argues that g() maps into (0, 1); subtracting ½ gets it into (-½, ½)
● This is unnecessary: A will have positive and negative entries, so the input to the next sigmoid will already span the nonlinear transition
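A sketch of the two-step evaluation with one hidden layer, assuming B (k × n) acts first and A (m × k) second, as the matrix shapes imply. The name z_tilde for the hidden-layer output and the sizes are illustrative choices.

```python
import numpy as np

def sigmoid(p):
    return 1.0 / (1.0 + np.exp(-p))

def forward(A, B, x):
    """Two-step evaluation: hidden output z_tilde = g(B x), then z = g(A z_tilde)."""
    z_tilde = sigmoid(B @ x)
    z = sigmoid(A @ z_tilde)
    return z_tilde, z

rng = np.random.default_rng(2)
n, k, m = 6, 4, 2                       # hypothetical sizes with n > k > m
B = rng.normal(0.0, 1.0 / np.sqrt(n), size=(k, n))
A = rng.normal(0.0, 1.0 / np.sqrt(k), size=(m, k))
x = rng.uniform(0.01, 1.0, size=(n, 1))

z_tilde, z = forward(A, B, x)
print(z_tilde.shape, z.shape, (A @ B).shape)
```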
Hidden Layers
● Graphically this appears as:
Minimization
● We need to do the minimization now for both sets of weights (matrices)
● In practice, we do them one at a time, with each seeing the result from its layer
– This process is also called backpropagation in neural networks: we are using the errors at the end to change the weights that came earlier in the network
Backpropagation
● In the evaluation step, we progress through the neural network in a forward direction: input layer → hidden layer → output layer
● Backpropagation is the process of taking the errors that we compute at the output layer and moving them backwards to the hidden layer
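Gradient Descent
● We can do our gradient descent on A and B separately now
– This is the strength of backpropagation and gradient descent vs. some “canned” minimization routine: we are not optimizing the entire system all together
● Differentiating our error and lots of chain rule gives:
$\frac{\partial f}{\partial \mathbf{A}} = 2\, \mathbf{e} \circ \mathbf{z} \circ (1 - \mathbf{z}) \cdot \tilde{\mathbf{z}}^T$
$\frac{\partial f}{\partial \mathbf{B}} = 2\, \tilde{\mathbf{e}} \circ \tilde{\mathbf{z}} \circ (1 - \tilde{\mathbf{z}}) \cdot \mathbf{x}^T$
– With
$\mathbf{e} = \mathbf{z} - \mathbf{y}, \qquad \tilde{\mathbf{e}} = \mathbf{A}^T [\mathbf{e} \circ \mathbf{z} \circ (1 - \mathbf{z})] \approx \mathbf{A}^T \mathbf{e}$
where z̃ = g(B x) is the hidden-layer output and z = g(A z̃) is the network output
Note: this is a single dot product, the combination of vectors on the left are multiplied element-by-element (the Hadamard product)
This approximation seems to be commonly made and supposedly doesn’t affect convergence much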
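A sketch of one backpropagation update for a network with a single hidden layer, assuming the squared-error cost for one training pair and a sigmoid with α = 1 in both layers. The chain rule gives the error at the hidden layer as Aᵀ [e ∘ z ∘ (1 - z)]; the optional approximate flag replaces this with Aᵀ e, which is one reading of the approximation mentioned above. Sizes, data, and learning rate are hypothetical.

```python
import numpy as np

def sigmoid(p):
    return 1.0 / (1.0 + np.exp(-p))

def cost(A, B, x, y):
    z = sigmoid(A @ sigmoid(B @ x))
    return float(np.sum((z - y)**2))

def gradients(A, B, x, y, approximate=False):
    """Return (df/dA, df/dB) for one training pair via backpropagation."""
    z_tilde = sigmoid(B @ x)              # hidden layer output, k x 1
    z = sigmoid(A @ z_tilde)              # network output, m x 1
    e = z - y                             # error at the output layer
    delta = e * z * (1.0 - z)             # Hadamard products
    # error moved back to the hidden layer; the approximate form drops z o (1 - z)
    e_tilde = A.T @ e if approximate else A.T @ delta
    dfdA = 2.0 * delta @ z_tilde.T
    dfdB = 2.0 * (e_tilde * z_tilde * (1.0 - z_tilde)) @ x.T
    return dfdA, dfdB

rng = np.random.default_rng(3)
n, k, m = 5, 3, 2                         # hypothetical sizes
B = rng.normal(0.0, 1.0 / np.sqrt(n), size=(k, n))
A = rng.normal(0.0, 1.0 / np.sqrt(k), size=(m, k))
x = rng.uniform(0.01, 1.0, size=(n, 1))
y = rng.uniform(0.01, 0.99, size=(m, 1))

eta = 0.1
dfdA, dfdB = gradients(A, B, x, y)
A_new, B_new = A - eta * dfdA, B - eta * dfdB
print(cost(A, B, x, y), cost(A_new, B_new, x, y))
```

Both gradients reuse quantities computed during the forward pass, which is why the update for B costs little more than the update for A.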
Hidden Layers
● Usually only a single hidden layer is needed
● In general, you want fewer nodes in your hidden layer than in your input layer
– n > k > m should be reasonable
● Interactive exploration of hidden layers:
– http://playground.tensorflow.org
Signal Analysis
● Example (from Franklin):
– We are given a noisy signal that we expect to lie in one of 4 frequency bands, f = {1, 2, 3, 4}.
– The clean signal should be:
– We are given n points of the form
● Here, r_i is a random number in [-1, 1]
● The noise is much higher amplitude than the signal
● We’ll take a different approach from Franklin:
– Our output will have m = 4, with a 1 in the position corresponding to the frequency, e.g., 1 Hz: [1, 0, 0, 0]; 2 Hz: [0, 1, 0, 0]
● We’ll train a neural net on known pairs of input-output and then test with unknown inputs: can we recover the frequency?
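A sketch of how such training pairs might be built. The output encoding (a 1 in the slot for the frequency) follows the slide; the signal itself is an ASSUMPTION, since its exact form is not preserved here: a unit-amplitude sine, sin(2π f t), sampled at n evenly spaced points on [0, 1), plus uniform noise of amplitude noise_amp. The values of n and noise_amp are hypothetical.

```python
import numpy as np

freqs = [1, 2, 3, 4]                     # the four frequency bands

def one_hot(f):
    """Target vector with a 1 in the position corresponding to frequency f."""
    y = np.zeros(len(freqs))
    y[freqs.index(f)] = 1.0
    return y

def noisy_signal(f, n, noise_amp, rng):
    """ASSUMED form: sin(2 pi f t) plus noise_amp * r_i, with r_i uniform in [-1, 1]."""
    t = np.arange(n) / n
    r = rng.uniform(-1.0, 1.0, size=n)
    return np.sin(2.0 * np.pi * f * t) + noise_amp * r

rng = np.random.default_rng(4)
x = noisy_signal(2, n=512, noise_amp=5.0, rng=rng)   # hypothetical n and amplitude
print(x.shape, one_hot(2))
```

Before feeding x to a sigmoid network it would still need to be rescaled, since the raw samples are not confined to (0, 1].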
Hidden Layers
● Here’s a single frequency sample data set
– We use 5 epochs
– Learning rate, η = 0.05
Hidden Layers
● Hidden layer: k = 2
– Here’s how we do on the trained data (1000 random data sets)
This is Δf: 0 means that we got the frequency right
code: signal_test_m4.py
Hidden Layers
● Hidden layer: k = 2
– And now on data we’ve never seen
Hidden Layers
● Hidden layer: k = 4
[Figure panels: training set; random data]
Hidden Layers
● Hidden layer: k = 8
[Figure panels: training set; random data]
Hidden Layers
● Hidden layer: k = 32
[Figure panels: training set; random data]
Notice that we are getting almost 100% of the training set right and over 80% of data we’ve never seen correct
Is a Neural Net the Best Choice?
● We could imagine doing this same example using an FFT
– Simply take the FFT of the test signal and return the frequency corresponding to the maximum power
Here’s the FFT of a sample dataset. Lots of high-frequency noise, but one of our frequencies appears to dominate
Is a Neural Net the Best Choice?
● We could imagine doing this same example using an FFT
– The FFT gets the right frequency almost 50% of the time
– But being off by a single frequency is not the next most common result
code: fft_compare.py
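The FFT approach can be sketched as below. This is not the course's fft_compare.py: it assumes the samples span exactly one unit of time (so FFT bin j corresponds to frequency j) and it restricts the search to the four candidate frequencies, which is an added assumption. The demonstration uses clean sine waves only.

```python
import numpy as np

def fft_frequency(x, candidates=(1, 2, 3, 4)):
    """Return the candidate frequency with the largest FFT power.

    Assumes x holds n evenly spaced samples spanning one unit of time,
    so that bin j of the real FFT corresponds to frequency j.
    """
    power = np.abs(np.fft.rfft(x))**2
    return max(candidates, key=lambda f: power[f])

n = 256
t = np.arange(n) / n
for f in (1, 2, 3, 4):
    print(f, fft_frequency(np.sin(2.0 * np.pi * f * t)))
```

With noise much larger than the signal, the power in the wrong bins can exceed that in the right one, which is consistent with the modest success rate quoted above.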
Image Classification
● We’ll try to recognize a digit (0 – 9) from an image of a handwritten digit.
– MNIST dataset (http://yann.lecun.com/exdb/mnist/)
● Popular dataset for testing out machine learning techniques
● Training set is 60,000 images
– Approximately 250 different writers
● Test set is 10,000 images
● Correct answer is known for both sets so we can test our performance
● Image details:
– 28 × 28 pixels, grayscale (0 – 255 intensity)
● The best learning algorithms can get accuracy > 99%
Image Classification
● Neural network characteristics:
– Input layer will be 784 nodes
● One for each pixel in the input image
– Output layer will be 10 nodes
● An array with an entry for each possible digit
● “3” would be represented as: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
– We’ll start with a hidden layer size of 100
● We’ll train on the training set, using up to 60000 images
– Rescale the input to be in [0.01, 1]
● We’ll test on the test set of 10000 images
[Figure: first MNIST digit in the training set]
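A sketch of the input and output encodings described above. The linear map from 0..255 onto [0.01, 1] and the use of the largest output entry as the predicted digit are assumptions about details the slide does not spell out; the helper names and the stand-in image are hypothetical.

```python
import numpy as np

def scale_pixels(pixels):
    """Map grayscale intensities 0..255 linearly onto [0.01, 1]."""
    return 0.01 + 0.99 * np.asarray(pixels, dtype=float) / 255.0

def digit_target(d):
    """10-element target with a 1 in the slot for digit d."""
    y = np.zeros(10)
    y[d] = 1.0
    return y

def predicted_digit(z):
    """Interpret the network output as the digit with the largest entry."""
    return int(np.argmax(z))

image = np.zeros((28, 28), dtype=np.uint8)   # stand-in for a real image
image[10:18, 12:16] = 255
x = scale_pixels(image).reshape(784, 1)      # one input node per pixel
print(x.min(), x.max(), digit_target(3))
```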
Image Classification
● Default configuration:
– The full training set (60000 images)
– Hidden layer of 100 nodes
– 5 epochs of training
– Learning rate of 0.1
● We achieve 95 – 96% accuracy
code: char_recognition.py
Image Classification Weights
● Weights (matrix elements of A and B) seem symmetric about 0
– Interestingly, with more training, the width of the distribution seems to grow
Effect of Number of Epochs
● When we use the full training set (60000 images) the number of epochs (passes through the training data) doesn’t seem to matter much
Effect of Training Set Size
● No surprise: the larger the training set, the better we do
Effect of Hidden Layer Size
● Also not unexpected: the larger the hidden layer the better we do
Effect of Learning Rate
● A smaller learning rate seems to do better
Deep Learning
● A deep neural network is one with many hidden layers (certainly > 1 hidden)
– Very nice discussion: https://stats.stackexchange.com/questions/182734/what-is-the-difference-between-a-neural-network-and-a-deep-neural-network
● There are other learning algorithms aside from neural networks; there’s a link to a text on the class website