
Semi-Supervised Supernova Classification

Joseph W. Richards

UC Berkeley
Department of Astronomy
Department of Statistics

[email protected]

KICP Supernova Hub workshop
Photometric Identification of Supernovae

March 16, 2012

Center for Time-Domain Informatics

UC Berkeley (UCB):
Faculty/Staff: Josh Bloom, Dan Starr (Astro), John Rice, Noureddine El Karoui (Stats), Martin Wainwright, Masoud Nikravesh (CS)
Post-Docs: Brad Cenko, Berian James, JWR, Dovi Poznanski & Nat Butler (alumni)
Grad Students: Adam Miller, Adam Morgan, Chris Klein, James Long, Henrik Brink, Sharmo Bhattacharyya
Undergrads: Pierre Christian, Tatyana Gavrilchenko, Stuart Gegenheimer, Anthony Paredes, Benjamin Gerard

Lawrence Berkeley National Laboratory (LBNL):Peter Nugent, Horst Simon

Visit our website: http://cftd.info/

J. Richards Semi-Supervised Classification 2

Approaches to Photometric Supernova Typing
From the results of the SN Photometric Classification Challenge: Kessler et al. (2010), arXiv:1008.1024


Template Fitting

(1) Light curve data are fitted to each template in a large dictionary via maximum likelihood (i.e., χ² minimization) or Bayesian fitting.
(2) Either (a) the class of the best-fitting template is chosen, or (b) a Type Ia likelihood ratio test is performed.

- Latent model parameters (z, t0, stretch, reddening) are usually optimized (or marginalized) over.
- This is similar to hypothesis testing in the presence of nuisance parameters.

SN Challenge Entries: Belov+Glazov; Gonzalez; Portsmouth χ²; Poz2007; Rodney; Sako; SNANA Cuts
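The two-step recipe above can be sketched in a few lines. This is an illustrative minimal version, not any challenge entry's actual code: the grid of latent-parameter evaluations (z, t0, stretch, reddening) is assumed to be precomputed into `templates`, with each model already evaluated on the observation epochs.

```python
import numpy as np

def chi2(flux, flux_err, model_flux):
    """Chi-squared between observed and template fluxes."""
    return np.sum(((flux - model_flux) / flux_err) ** 2)

def classify_by_template(flux, flux_err, templates):
    """Step (2a): return the class of the best-fitting template.

    `templates` maps a class label to a list of candidate model light
    curves; optimizing over latent parameters is approximated here by
    minimizing chi^2 over the precomputed grid of models per class.
    """
    best_class, best_chi2 = None, np.inf
    for label, models in templates.items():
        for model_flux in models:
            c2 = chi2(flux, flux_err, model_flux)
            if c2 < best_chi2:
                best_class, best_chi2 = label, c2
    return best_class, best_chi2
```

A Type Ia likelihood ratio test (step 2b) would instead compare the best Ia chi-squared against the best non-Ia chi-squared rather than taking the overall minimum.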


Template Fitting

Pros:
1. Easy to insert known physics (time dilation, K-corrections, reddening, etc.)
2. Simultaneously gives estimates of physical parameters
3. If the physical model is correct, it can be used to extrapolate to different populations.

Cons:
1. Requires a full time-varying SED template for each class of supernova (or at least for Type Ia's)
2. Errors can arise due to an incomplete basis, an overly restrictive model, and/or degeneracies
3. Does not automatically learn from new data


Supervised Learning

(1) Class-predictive summary statistics (features) are estimated from light curve data.
(2) Flexible (non-parametric) classification models are used to map from feature vector to SN type.

- Domain knowledge is crucial in constructing features.
- A training set of SNe of known type is needed to learn the appropriate classification model.
- Probabilistic classifiers are often used (easy to tune purity/efficiency).

SN Challenge Entries: JEDI entries; MGU+DU-1, 2;Portsmouth-Hubble
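The purity/efficiency tuning mentioned above amounts to sweeping a threshold on the classifier's output probability. A minimal sketch, assuming the classifier produces a per-object P(Ia) and that true labels are available for evaluation:

```python
import numpy as np

def purity_efficiency(p_ia, is_ia, threshold):
    """Purity and efficiency of the Ia sample selected by cutting on
    the classifier probability P(Ia) >= threshold.

    Raising the threshold yields a purer but less complete Ia sample;
    lowering it does the reverse.
    """
    selected = p_ia >= threshold
    n_true = np.sum(selected & is_ia)     # correctly selected Ia's
    n_false = np.sum(selected & ~is_ia)   # non-Ia contaminants
    purity = n_true / max(n_true + n_false, 1)
    efficiency = n_true / max(np.sum(is_ia), 1)
    return purity, efficiency
```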


Supervised Learning

Pros:
1. Automatically learns from new labeled data.
2. No model or SED templates are necessary.
3. Prediction is trivial; no model fitting needed.
4. Can use dozens or even hundreds of features.

Cons:
1. Classifier cannot extrapolate beyond the training set.
2. Physics only enter through the features. Difficult to inject known physics.
3. Does not learn anything from the (abundant) data from unlabeled supernovae.


Semi-Supervised Learning

(1) Use data from all observed supernovae to learn an appropriate feature representation.
(2) Learn the best classification model in this feature space using the labeled training set.

- Domain knowledge enters through a discrepancy measure between light curves.
- The training set decides the optimal feature representation and best classifier.
- Probabilistic, non-parametric classifiers can be used (easy to tune purity/efficiency).

SN Challenge Entries: InCA (this work)


Semi-Supervised Learning

Pros:
1. Automatically learns from all labeled AND unlabeled data.
2. No model or SED templates are necessary.
3. No need to define a set of features.
4. Prediction is trivial; no model fitting needed.

Cons:
1. Classifier cannot extrapolate beyond the training set. But it should outperform supervised learning if there is natural clustering structure.
2. Need to define a discrepancy measure between light curves.
3. Physics can possibly enter through the discrepancy measure...


Supernova Typing with Diffusion Map
With Darren Homrighausen, Chad Schafer, Peter Freeman (CMU Statistics), and Dovi Poznanski (TAU)

Richards et al. (2012). arXiv:1103.6034


Photometric Supernova Classification

Semi-supervised Approach:

1. Construct a local dissimilarity measure between LCs
2. Use all data to find a low-dimensional embedding for all SNe via diffusion map
3. Use labeled SNe to train a random forest classifier on the diffusion map coordinates
4. Predict the class of each unlabeled SN


Diffusion Map

- Diffusion map: a non-linear method to uncover low-dimensional structure in data (Coifman & Lafon 2006; Lafon & Lee 2006; Richards et al. 2009, ApJ, 691, 32)
- Idea: estimate the true discrepancy between data via a fictive diffusion process (i.e., a Markov random walk).
- Related spectral methods: LLE, Laplacian eigenmaps, Hessian eigenmaps, Isomap, etc.
- Other uses of diffusion map in astrophysics:
  1. Spectral basis for galaxy population synthesis modeling (Richards et al. 2009, MNRAS, 399, 1044)
  2. Adaptive regression for photometric redshift estimation (Freeman et al. 2009, MNRAS, 398, 2012)


Diffusion Map: Intuition

p1(xi, xj): random-walk transition probability

Diffusion distance at time t:

    Dt²(xi, xj) = Σ_k [pt(xi, xk) − pt(xj, xk)]² / φ0(xk)


Diffusion Map: Procedure

How to Construct a Diffusion Map

1. Construct a weighted graph on the data set {x1, ..., xn}, with weights

       w(xi, xj) = exp(−Δℓ(xi, xj)² / ε)

   where Δℓ is a local discrepancy measure and ε is a tuning parameter.

2. p1(xi, xj) = w(xi, xj) / Σ_k w(xi, xk) is the transition probability in a fictive Markov random walk on the data.

3. Find the SVD of P: p1(xi, xj) = Σ_{l≥0} λl ψl(xi) φl(xj)

4. The m-dimensional diffusion map is:

       Ψ: xi ↦ [λ1^t ψ1(xi), λ2^t ψ2(xi), ..., λm^t ψm(xi)]

Result: Dt(xi, xj) ≈ ||Ψ(xi) − Ψ(xj)||2
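The four steps above fit in a short NumPy sketch, assuming the matrix of pairwise local discrepancies Δℓ has already been computed. Eigendecomposing the symmetric conjugate of P is a standard numerical substitute for the decomposition in step 3:

```python
import numpy as np

def diffusion_map(D, eps, m, t=1):
    """m-dimensional diffusion map from a pairwise discrepancy matrix D."""
    W = np.exp(-D ** 2 / eps)                # step 1: weighted graph
    P = W / W.sum(axis=1, keepdims=True)     # step 2: Markov transition matrix
    # Step 3: P is not symmetric, so eigendecompose the symmetric
    # matrix S = D^{-1/2} W D^{-1/2}, which shares P's eigenvalues.
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    lam, V = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1]            # sort eigenvalues descending
    lam, V = lam[order], V[:, order]
    # Right eigenvectors of P, up to a harmless global scale; the top
    # eigenvector of S is proportional to sqrt(d), so this division
    # recovers psi_l = D^{-1/2} v_l.
    psi = V / V[:, [0]]
    # Step 4: drop the trivial constant eigenvector, keep m coordinates
    # scaled by eigenvalues raised to the diffusion time t.
    return psi[:, 1:m + 1] * lam[1:m + 1] ** t
```

By the result above, Euclidean distance between rows of the returned array approximates the diffusion distance Dt between the corresponding objects.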


Diffusion Map: Spiral Example


Diffusion Map for SN Light Curves

[Figure: example DES light curves, flux vs. time in days — DES_SN001292 (Type 22), DES_SN002542 (Type 1), DES_SN003644 (Type 22), DES_SN007355 (Type 1), DES_SN011670 (Type 1), DES_SN012491 (Type 1), DES_SN012720 (Type 22), DES_SN013009 (Type 33), DES_SN015154 (Type 1), DES_SN016272 (Type 1), DES_SN017520 (Type 33), DES_SN017894 (Type 22), DES_SN021321 (Type 1)]

Graphical Model on SN Light Curves


SN Light Curves: Local Discrepancy Metric

Local distance measure:

    Δℓ(xi, xj) = Σ_b (1/Δtij) ||x̃i,b − x̃j,b||²

where b indexes the band and x̃ denotes the optimal (normalized) spline fit, found via GCV.

- Δℓ is constructed to capture differences in light curve shapes and colors.
- Pair-wise weights between supernovae: w(xi, xj) = exp(−Δℓ(xi, xj)² / ε)
- Use the m-dimensional diffusion space representation {ψ1(x), ..., ψm(x)} to discriminate SN type.
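An illustrative NumPy version of Δℓ. Linear interpolation (`np.interp`) stands in for the GCV-tuned spline fits x̃ used in the paper, normalization by peak flux stands in for the paper's normalization, and the light-curve container format is an assumption:

```python
import numpy as np

def band_discrepancy(t_i, f_i, t_j, f_j, n_grid=50):
    """Squared L2 distance between two normalized single-band light
    curves, evaluated over their overlapping time window."""
    lo = max(t_i.min(), t_j.min())
    hi = min(t_i.max(), t_j.max())
    if lo >= hi:
        return np.inf                        # no temporal overlap
    grid = np.linspace(lo, hi, n_grid)
    fi = np.interp(grid, t_i, f_i) / f_i.max()   # normalized fit of curve i
    fj = np.interp(grid, t_j, f_j) / f_j.max()   # normalized fit of curve j
    # The 1/Delta_t factor: average the squared difference over the
    # length of the common window so long overlaps are not penalized.
    return np.sum((fi - fj) ** 2) / (hi - lo)

def local_discrepancy(lc_i, lc_j):
    """Sum per-band discrepancies over bands present in both curves;
    each light curve is a dict {band: (times, fluxes)}."""
    bands = set(lc_i) & set(lc_j)
    return sum(band_discrepancy(*lc_i[b], *lc_j[b]) for b in bands)
```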


SN Classification: Diffusion Map Coordinates

Diffusion map representation of the 1103 spectroscopic SNe.
Red: Ia; Blue: Ib+Ic+Ibc; Green: IIn+IIP+IIL


SN Classification: Diffusion Map Coordinates

We can obtain physical intuition by visualizing how SNe vary across the diffusion map coordinate space.

[Figure: SNe plotted in Diffusion Coordinate 3 vs. Diffusion Coordinate 7, colored by type (Ia; IIn+IIP+IIL; Ib+Ic+Ib/c), alongside example light curves B1–B4 (normalized flux + offset vs. time since r-band max) drawn from different regions of the space]


SN Classification: Training Classifier

Using a Random Forest classifier on the diffusion map coordinates, we optimize the Type Ia figure of merit

    f̂_Ia = (1 / N_Ia^Total) · (N_Ia^true)² / (N_Ia^true + W · N_Ia^false),   W ≡ 3   (1)

over ε, m, and t, and the RF parameters (Ntree, mtry), using cross-validation on the training set of confirmed SNe.
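The figure of merit in Eq. (1) is a one-liner; the weight W penalizes false positives (non-Ia contamination), so maximizing it trades efficiency against purity:

```python
def fom_ia(n_total_ia, n_true_ia, n_false_ia, W=3.0):
    """Type Ia figure of merit of Eq. (1): efficiency times a
    W-penalized pseudo-purity of the selected Ia sample."""
    return (n_true_ia ** 2) / (n_total_ia * (n_true_ia + W * n_false_ia))
```

With no false positives this reduces to the Ia efficiency: e.g. `fom_ia(100, 50, 0)` gives 0.5, while contaminants pull it down sharply.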

Table 6 in Richards et al. (2012), arXiv:1103.6034

SN Classification: Diffusion Map Coordinates

Diffusion map representation of all 21,000 SNe (confirmed + unconfirmed)


Sample Selection Bias in Light Curve Classification
With Dan Starr, Adam Miller, Nat Butler, James Long, John Rice, Josh Bloom (UC Berkeley), Henrik Brink & Berian James (DARK)

Richards et al. (2012), arXiv:1106.2832


Sample Selection Bias

In astronomical problems, the training (labeled) and testing (unlabeled) sets are often drawn from different distributions.

Left: Training set. Right: Testing set.

This problem is referred to as Sample Selection Bias or Covariate Shift.

Also discussed in Newling et al. (2012).

SN Challenge data: Kessler et al. (2010), arXiv:1008.1024

Sample Selection Bias

For SN Ia typing, it is better to use deeper spectroscopic training samples, even though they produce data from fewer SNe.

Sm,25, a 25th-mag-limited spectroscopic survey, is optimal (23.5th mag was used in the SN Challenge).

Classifier evaluated over a held-out testing set.

Figure: Type Ia SN purity and efficiency of the Random Forest classifier on SN Challenge testing data


Sample Selection Bias: SN Challenge Results

Red: Template Fitting; Blue: Supervised; Green: Semi-Supervised

All methods appear to be adversely affected by sample selection bias.

Template fitting should be more immune to this type of bias.


Methods: Active Learning (AL)

Idea: Identify and manually label the unlabeled data that would most help future iterations of the classifier.

Key: In astronomy, we often have the ability to selectively follow up on sources:

- Spectroscopic study
- Higher-cadence photometric monitoring

Pool-based, batch-mode Active Learning: On each AL iteration, select a batch of objects from the entire testing set for manual labeling via a query function.


Methods: Active Learning (AL)

- P̂_RF(y|x) is the estimated RF probability
- ρ(x′, x) is the RF proximity measure

RF AL query functions; Richards et al. (2012), arXiv:1106.2832

AL1. Select the testing data point (x′ ∈ U) that is most under-sampled by the training data (L):

    S1(x′) = [Σ_{x∈U} ρ(x′, x) / N_Test] / [Σ_{z∈L} ρ(x′, z) / N_Train]   (2)

AL2. Select the testing point that maximizes the total expected change in the RF probabilities over the testing data:

    S2(x′) = Σ_{x∈U} ρ(x′, x) (1 − max_y P̂_RF(y|x)) / [Σ_{z∈L} ρ(x′, z) + 1]   (3)
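A sketch of Eqs. (2)–(3), assuming the RF proximity matrix ρ over all objects and the per-object max class probability have already been extracted from the trained forest (the array layout is an assumption of this sketch):

```python
import numpy as np

def al_scores(rho, p_max, test_idx, train_idx):
    """Query scores S1 (Eq. 2) and S2 (Eq. 3) for every testing point.

    rho       : (n, n) RF proximity matrix over all n objects
    p_max     : max_y P_RF(y|x) for each testing object, aligned with test_idx
    test_idx  : indices of the unlabeled pool U
    train_idx : indices of the labeled training set L
    """
    n_test, n_train = len(test_idx), len(train_idx)
    R_test = rho[np.ix_(test_idx, test_idx)]    # rho(x', x) for x in U
    R_train = rho[np.ix_(test_idx, train_idx)]  # rho(x', z) for z in L
    # Eq. (2): mean proximity to the pool over mean proximity to training.
    S1 = (R_test.sum(axis=1) / n_test) / (R_train.sum(axis=1) / n_train)
    # Eq. (3): proximity-weighted classifier uncertainty over the pool.
    S2 = (R_test * (1.0 - p_max)).sum(axis=1) / (R_train.sum(axis=1) + 1.0)
    return S1, S2

def select_batch(scores, k):
    """Pool-based batch mode: pick the k highest-scoring testing points."""
    return np.argsort(scores)[::-1][:k]
```

Each AL iteration then scores the pool, queries labels (e.g. spectroscopy) for the selected batch, moves those objects from U to L, and retrains the forest.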

Results: AL for Variable Star Classification

Example: variable star light curves.

Active Learning applied to a 32-class variable star classification problem using photometric light curves from ASAS.

The training set of OGLE and Hipparcos light curves is heavily biased; classifier performance reaches the optimal level after a few AL iterations.

[Figure: two panels vs. AL iteration (0–8) — left: percent agreement with ACVS, rising from about 0.66 to 0.80; right: percent of confident ASAS RF labels, rising from about 0.15 to 0.40]

Off-the-shelf RF error rate = 34.5%; RF with Active Learning error rate = 20.5%.

3-fold increase in classifier confidence.


Where do we go from here?

1. Which method(s) should we use?
   - Template modeling should work well if our physics are correct and our templates are accurate and complete.
   - Machine-learning-based methods automatically learn from training data but can be ignorant of the physics.
   - Semi-supervised methods have the ability to bridge this gap: learning is performed on all observed SN data, and known physics can be incorporated through the distance metric.

   Typically, the more data we observe, the more we discover how little we actually know!

2. How do we optimize follow-up resources?
   - Rigorous targeting algorithms can help overcome debilitating sample selection biases.
   - Template fitting, supervised learning, and semi-supervised learning all stand to improve.


An aside about Supernova Discovery

Work I am involved with for transient discovery & classification with the Palomar Transient Factory:

1. Real/Bogus — Of 1.5M nightly detections (via image subtraction), which are the real time-varying astrophysical objects?
2. Oarical — Given that an object is real, can we confidently declare it a Transient or a Variable Star, given all of the information we know at the time of discovery?
3. ML Supernova Zoo — At the time of discovery, predict how SN Zoo users would rate a subtraction image, given the reference & subtraction images plus any contextual information.

Bloom et al. (2011), arXiv:1106.5491; Brink et al. (2012), in prep.; Richards et al. (2012), in prep.


Center for Time-Domain Informatics Publications

Butler, Nathaniel R. & Bloom, Joshua S. Optimal Time-Series Selection of Quasars (2011, AJ, 147, 93)

Richards, Joseph W., et al. On Machine-Learned Classification of Variable Stars with Sparse and Noisy Time-Series Data (2011, ApJ, 733, 1)

Bloom, Joshua S. & Richards, Joseph W. Data Mining and Machine-Learning in Time-Domain Discovery & Classification (2011, chapter in the forthcoming book "Advances in Machine Learning and Data Mining for Astronomy")

Klein, Christopher R., Richards, Joseph W., Butler, Nathaniel R. & Bloom, Joshua S. Mid-infrared Period-luminosity Relations of RR Lyrae Stars Derived from the WISE Preliminary Data Release (2011, ApJ, 732, 2)

Richards, Joseph W., Homrighausen, Darren, Freeman, Peter E., Schafer, Chad M. & Poznanski, Dovi Semi-supervised Learning for Photometric Supernova Classification (2012, MNRAS, 419, 1121)

Richards, Joseph W., et al. Active Learning to Overcome Sample Selection Bias: Application to Photometric Variable Star Classification (2012, ApJ, 744, 192)

Bloom, Joshua S., et al. Automating Discovery and Classification of Transients and Variable Stars in the Synoptic Survey Era (2011, arXiv:1106.5491)
