

Data Analysis Project Final Report Wednesday 29th November, 2017

Dimensionality Reduction of Astronomical Spectroscopic Data

using Autoencoders

Quanbin Ma
Department of Machine Learning

Carnegie Mellon University

[email protected]

Abstract

Background. Recent major advances in the understanding of galaxies owe a great deal to highly successful galaxy surveys conducted with the Sloan Telescope. With massive amounts of data, modern techniques such as machine learning come in naturally for efficient handling. While many applications have focused on characterizing and classifying astronomical objects within a predefined area of interest, unsupervised learning has great potential for processing newly obtained data without much human intervention.

Aim. The goal of this project is to harness recent breakthroughs in deep learning, in both methodology and computing resources, to build an automated system that generates embeddings of a manageable size to represent the immense astronomical spectroscopic data, and further to detect unexpected outliers of potential interest.

Data. The Mapping Nearby Galaxies at APO (MaNGA) dataset provides a 3D data cube for each galaxy in its catalog, consisting of a series of spectral fluxes at different wavelengths for each spatial pixel. It is superior to many of its predecessors not only in the range of galaxies included but also in the abundance of detail within each galaxy.

Methods. The original galaxy data are far too large for most existing anomaly detection algorithms to operate on, due to both the curse of dimensionality and memory constraints. We thus first map the galaxy observations to a low-dimensional representation using an autoencoder, and then apply various manifold learning techniques and anomaly detection algorithms to find suspicious outliers. Convolutional layers are also introduced to further capture the internal correlation along both the spatial and spectral axes.

Results. We found that, in general, after compression to low dimension we can reconstruct most of the shape of the original spectra with spatial patterns retained, at the price of losing many important peak signals. The embeddings proved to contain high-level features and could be used to successfully predict some attributes of a galaxy. Several instances of interesting outliers were detected due to their uniqueness in various aspects.

DAP Committee members:

Barnabás Póczos 〈[email protected]〉 (MLD);
Aarti Singh 〈[email protected]〉 (MLD);
Brett Andrews 〈[email protected]〉 (PITT).


1 Introduction

Modern astronomical surveys are capable of producing immense amounts of data. The Sloan Digital Sky Survey (SDSS) has collected rich features for millions of astronomical objects covering almost one-third of the sky. This creates a platform with huge potential for us to better explore and understand our universe. In spite of the automated pipelines currently in use, it is still hard for human experts to inspect the data manually in order to discover particular phenomena of interest.

Machine learning algorithms naturally fit into this context to provide automated solutions. For example, [1] used machine learning methods to successfully detect contaminated redshift estimates, while [2] classified quasars from stars with high accuracy. Though such supervised approaches can generally achieve satisfactory results, especially in many binary object classification applications, they either require a priori knowledge of the dataset for labelling, or take huge computation time to manually simulate synthetic instances for training, thus limiting their ability to scale.

A random-forest based method was proposed in [3] to automatically detect outliers of different classes without supervision, and picked out many interesting objects that had never been closely examined before. However, it treated spectra from the same galaxy as separate individuals, which makes little use of the spatial correlation within the galaxy.

In this project, we aim to use unsupervised machine learning to examine a newly released astronomical spectroscopic dataset covering thousands of galaxies. We approach the problem by first building low-dimensional representations of the huge original inputs using convolutional autoencoders, which are able to capture both the spatial and spectral correlation within a galaxy, to enable further application of visualization and anomaly detection.

2 Data

The Mapping Nearby Galaxies at APO (MaNGA) survey [4] is one of the latest programs in the Sloan Digital Sky Survey (SDSS). Unlike a traditional survey, which usually has the capacity for a few hundred target galaxies and samples only a small sub-region at the center of each galaxy, the MaNGA survey aims at ultimately 10,000 nearby galaxies, and also investigates each galaxy as a whole using 17 simultaneous 'Integral Field Units' (IFUs), providing rich and complex internal structure.

Up to the most recent SDSS Data Release 14, the MaNGA dataset contains data cubes for 2,812 galaxies (indexed by [plate]-[ifudesign]) in total, before duplicate and commissioning plates are removed. Each galaxy has a map from the spatial pixels (spaxels) of a 2D image to the spectrum at that location, resulting in an N×N×4563 cube. The size of the spatial dimension depends on the IFU design used for the observation and varies across the dataset. The spectral dimension contains the flux at 4563 different wavelengths, ranging from 3621.6 Å to 10353.8 Å with equal spacing. Along with the raw observation data, we also have metadata references that include the buildout of the fiber bundles and a summary of the targets observed.

During pre-processing, we first clipped all flux values to be non-negative, since there exist a few rare negative flux values due to instrumental error. In order to unify the shape of the input to our model, we discarded the tiny portion of cubes that are smaller than 34×34 in the spatial dimensions, as well as some observations flagged as 'DO NOT USE' due to poor measurement quality. The remaining cubes were resized to a fixed size in the spatial dimensions using linear interpolation. The spectral dimension was cut to a length that is a power of 2 to make pooling and unpooling easier; the last few fluxes were removed with little information loss, as they frequently contain huge noise. We further replaced all flux values between 5574 Å and 5585 Å, where an instrumental artifact occurs across all observations due to residual sky emission, with an interpolation of their neighbors. This finally leads to 2,627 data cubes of size N×N×4096.
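The clipping, trimming, and sky-line interpolation steps above can be sketched as follows. This is a minimal NumPy version for a single cube; the wavelength grid is built from the stated range under an equal-spacing assumption, and the spatial resize to a fixed size is omitted.

```python
import numpy as np

def preprocess(cube, wavelengths, spectral_len=4096):
    """Sketch of the cube pre-processing. cube: (N, N, L) flux array;
    wavelengths: (L,) grid in Angstroms."""
    cube = np.clip(cube, 0.0, None)           # drop rare negative fluxes
    cube = cube[:, :, :spectral_len].copy()   # trim spectral axis to a power of 2
    wl = wavelengths[:spectral_len]
    bad = (wl >= 5574.0) & (wl <= 5585.0)     # residual sky-emission band
    good = ~bad
    # Replace the contaminated band with linear interpolation of its neighbours.
    cube[:, :, bad] = np.apply_along_axis(
        lambda s: np.interp(wl[bad], wl[good], s[good]), 2, cube)
    return cube

wl = np.linspace(3621.6, 10353.8, 4563)       # assumed equal-spaced grid
cube = np.tile(0.001 * wl, (2, 2, 1))         # toy 2x2-spaxel cube
cube[:, :, (wl >= 5574.0) & (wl <= 5585.0)] = -5.0   # fake contamination
out = preprocess(cube, wl)
```

The same routine would be applied per cube before the spatial interpolation to 32×32.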

Figure 1 shows the galaxy images and spectra at two typical spaxels for two random MaNGA galaxies. The spectra at the boundary are usually close to 0 and noisy, while the spectra near the center generally carry more information about the galaxy. Different galaxies can have very different flux values due to their brightness, slopes due to their red/blue-ness, peak locations due to redshift, and even jiggling patterns. The common peak of the two galaxies is the well-known H-α emission line, which always appears at 6562.8 Å after deredshifting. However, we did not shift the spectra back to the rest frame, as most astronomers would do, since this would almost double the length of the spectra and cause severe memory issues. This means that the flux at a given wavelength is what we observe directly on Earth, instead of what we would see if we were in the target galaxy. We hope the model can learn the redshift itself by identifying the relative shift of the peaks, but in general we do believe the results would be better with deredshifting.

3 Method

Autoencoders (AE) are neural networks that aim to reconstruct their input with minimum distortion. The encoder transforms an input into a lower-dimensional representation, and the decoder tries to map it back. By comparing the reconstructed output of the decoder against the original input, the whole model can be trained in an unsupervised way, learning to extract important features and compose a latent embedding that provides an aggregated view of the input data of manageable size for subsequent tasks like clustering and anomaly detection. The general idea of the AE family is that the unique pattern of a single instance can be extracted as the embedding, while the general distribution and other global information are stored in the model parameters. The mappings in both the encoder and the decoder are typically done by stacks of fully connected layers. Figure 2a shows a one-layer AE, where x is the input vector, z is the output of the encoder, i.e. the low-dimensional embedding, and x′ is the output of the decoder, i.e. the reconstruction. The encoder and decoder in this simple case are just two weight matrices. We refer interested readers to [5] for more details.
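As a toy illustration of this idea (not the models used in this project, which are convolutional and trained with MAE), the one-layer linear AE of Figure 2a can be trained with plain gradient descent on synthetic low-rank data:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 16, 4, 200                       # input dim, embedding dim, #examples
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))   # data on a k-dim subspace

W_enc = rng.normal(scale=0.1, size=(d, k))   # encoder weight matrix
W_dec = rng.normal(scale=0.1, size=(k, d))   # decoder weight matrix

def forward(X):
    Z = X @ W_enc        # low-dimensional embeddings z
    X_rec = Z @ W_dec    # reconstructions x'
    return Z, X_rec

losses = []
for _ in range(2000):                      # unsupervised: the target is the input
    Z, X_rec = forward(X)
    losses.append(np.mean((X_rec - X) ** 2))
    G = 2.0 * (X_rec - X) / X.size         # d(loss)/d(X_rec)
    g_dec = Z.T @ G                        # gradient through the decoder
    g_enc = X.T @ (G @ W_dec.T)            # gradient through the encoder
    W_dec -= 0.1 * g_dec
    W_enc -= 0.1 * g_enc
```

Because the data lie on a k-dimensional subspace, the reconstruction loss falls sharply as the two weight matrices learn to project onto that subspace and back.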


(a) (b) (c)

(d) (e) (f)

Figure 1: Galaxy 8261-9102 and 8391-9101's (a,d) images; (b,e) spectra at the center; (c,f) spectra near the boundary.

Deep convolution-based neural networks have remarkably outperformed traditional artificial neural networks, and many of their other competitors, in most image-related tasks in the past few years. It is natural to replace the fully connected layers in the autoencoder with convolutional ones to get a convolutional autoencoder (CAE). When applied to image-like inputs, a CAE can better capture the spatial correlation between neighboring pixels, while an AE can only treat each pixel as an independent feature. A CAE also greatly reduces redundancy in model parameters when the input size is large, as filters are applied locally instead of globally, and thus scales well to large inputs such as MaNGA cubes.

3.1 Architecture

We mainly focused on using CAEs for dimensionality reduction. In the encoder, we incorporated convolution layers to detect local patterns, each followed by a pooling layer to downsample and combine information in a neighborhood. Multiple such blocks were stacked together to extract high-level features, gradually reducing the dimensionality of the input cubes to finally obtain a low-dimensional representation vector of the desired size.


(a) (b) (c) (d)

Figure 2: (a) Demonstration of a one-layer AE. (b,c) Illustration of convolution and transposed convolution with a kernel size of 3×3, with the bottom blue image being the input and the top green image being the output. The convolution operation maps the 5×5 input to 2×2 without padding, while the transposed convolution does the opposite. As shown, transposed convolution can be seen equivalently as convolution with a fractional stride, where zeros are inserted between input pixels. (d) Illustration of pooling; in this case the output is the numerical average of a neighborhood in the input.

The vector was then passed to the decoder, where we performed the exact opposite of the operations in the encoder. Transposed convolution, sometimes referred to as deconvolution, was used. With a large stride, it acts like an unpooling layer to recover the neighboring information lost during pooling. However, such a reconstruction, although enlarged in size, would be rather sparse. As pointed out in [6], such unpooling results in low-resolution checkerboard artifacts, so we adopted their solution of a nearest-neighbor interpolated resize layer followed by a convolution layer to blend the sparse pixels and densify the outputs, which has shown success in super-resolution problems [7]. Normal transposed convolutions were used in the decoder to reverse the convolutions in the encoder. Figures 2b to 2d [8] illustrate and compare these operations.
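The resize-then-convolve idea can be shown on a single 2-D feature map. This is a hand-rolled sketch, not the actual network code: nearest-neighbor upsampling simply repeats each pixel, and a following convolution (here a fixed smoothing kernel standing in for a learned one) blends the repeated values into a dense output.

```python
import numpy as np

def nn_upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (H, W) feature map."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def conv2d_same(x, kernel):
    """Naive 'same' 2-D correlation with zero padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
smooth = np.full((3, 3), 1.0 / 9.0)         # stand-in for a learned kernel
y = conv2d_same(nn_upsample2x(x), smooth)   # 8x8, repeated pixels blended
```

Unlike strided transposed convolution, the upsampling step here inserts no zeros, which is what avoids the checkerboard pattern.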

The depth of a MaNGA cube is much larger than that of a typical image, which has only 3 channels (RGB). This leaves us multiple choices in terms of how to best utilize its internal correlation in both the spatial dimensions and along the spectral axis. We experimented with several different flavors of CAE, where each variant has its own advantage in some aspect of the reconstruction. Detailed model specifications are shown in Figure 3.

2D CAE The first and simplest model we tried is the traditional CAE that has been widely used on regular image data. Each slice of the cube along the spectral dimension, representing the mapping from spaxels to their flux at a certain wavelength, is treated as an independent 'channel'. The convolution kernels have the same depth as the input to the layer, and the output depth of a layer is the number of different filters used. Convolution in this case is performed along the two spatial dimensions only, and the operation performed on the spectral dimension is somewhat similar to a fully connected layer. With such a large input size, this inevitably requires a huge number of parameters, but the output depth after each convolution layer can be tuned by the number of filters used, so we can easily compress the cube to a desired low-dimensional representation.

(a)

(b)

(c)

(d)

Figure 3: Model specifications for (a) 2D CAE; (b) 3D CAE; (c) 1D-2D CAE; (d) 2D CAE on compressed spectra.

3D CAE A major drawback of the previous design is that the spectrum at a spaxel is obviously not a set of independent features. To further utilize the continuity and ordering information that lie within, we want to convolve along the spectral dimension as well. This leads to a CAE with 3D convolutions. The MaNGA cube can be imagined as a four-dimensional tesseract, where the last dimension, representing channels, has size 1. The convolution kernels are now tiny cubes that convolve along all three dimensions. At a given point, a filter captures information not only at nearby spaxels but also at nearby wavelengths.

Although this seems like an ideal model for the problem, in practice 3D convolutions generate outputs that are almost the same size as their inputs. Multiple stacks of 3D convolution layers near the top of the encoder take up a huge amount of memory. This adds constraints on the model size itself, as well as on hyperparameters such as the batch size during training, which we found led to sub-optimal solutions at convergence.
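A quick back-of-envelope calculation shows the scale of the problem; the filter count here is an assumption for illustration only.

```python
# Activation memory for a single early 3-D convolution layer that preserves
# the input resolution, in float32 (4 bytes), per example in the batch.
H = W = 32        # spatial size after resizing
L = 4096          # spectral length
F = 16            # hypothetical number of 3-D filters in the first block
activation_bytes = H * W * L * F * 4
print(activation_bytes / 2**20)   # 256.0 MiB, before gradients and optimizer state
```

At even a modest batch size, a single such layer thus costs several gigabytes of activations, which is consistent with the small batch sizes we had to use for this model.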

1D-2D CAE A 3D convolution kernel can extract both spatial and spectral correlations, but one may argue that these two correlations might be orthogonal to each other. For example, in a 3×3×3 cube, the center block should have high correlation with the 6 blocks directly adjacent to it, but might not depend as heavily on its diagonal neighbors. This inspired us to alternate between 2D convolutions along the spatial dimensions only, treating the wavelengths as totally separate examples in a batch, and 1D convolutions along the spectral dimension. This is equivalent to a 3D convolution constrained so that all subsequent channels of a kernel must be reweighted duplicates of the first channel, which further reduces the model complexity.

2D CAE on compressed spectra Instead of alternating 1D and 2D convolutions, we can divide the whole process into two stages. In the first stage, the spectra within a given cube are treated as individual examples, and traditional dimensionality reduction techniques are applied to get a compressed representation of each spectrum; PCA and shallow AEs are two common choices. In the second stage we convolve along the spatial dimensions using a 2D CAE. Note that after the compression, each element of the compressed spectral dimension represents a high-level feature of the original spectrum, such as its mean, slope, or peak location, so there is no longer any ordering information and the features are indeed independent, making it unnecessary to convolve along the spectral dimension.
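A minimal sketch of the first-stage PCA compression (shapes are illustrative; a shallow AE could be substituted for the SVD step):

```python
import numpy as np

def pca_compress_spectra(cube, k):
    """Treat each spaxel's spectrum as one example and keep the top-k
    principal components. cube: (N, N, L) -> coefficients (N, N, k)."""
    N, _, L = cube.shape
    S = cube.reshape(-1, L)              # one row per spectrum
    mean = S.mean(axis=0)
    _, _, Vt = np.linalg.svd(S - mean, full_matrices=False)
    comps = Vt[:k]                       # top-k principal directions
    coeffs = (S - mean) @ comps.T        # compressed spectral features
    return coeffs.reshape(N, N, k), comps, mean

rng = np.random.default_rng(1)
cube = rng.normal(size=(4, 4, 2)) @ rng.normal(size=(2, 32))  # rank-2 toy cube
coeffs, comps, mean = pca_compress_spectra(cube, 2)
recon = coeffs @ comps + mean            # spectra recovered from 2 numbers each
```

The resulting (N, N, k) coefficient cube is then fed to an ordinary 2D CAE in the second stage.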


4 Experiments

4.1 Training

In our experiments, we picked N = 32, so that the input MaNGA cubes have the shape 32×32×4096. The activation function was chosen to be ELU [9], which appeared to work better than ReLU. We used average pooling to downsample the input, which turned out to work better than max pooling in our case. After the encoder, we obtain a 4×4 cube as our desired low-dimensional representation. We calculated the reconstruction error by comparing the output of the decoder with the original input, and used the mean error as the reconstruction loss. Mean absolute error turned out to produce reconstructions with a more accurate mean and slope, but totally ignored the peak locations that are of great importance, while mean squared error, being less robust, seemed to let the peaks drag the whole curve upward; we therefore chose MAE. The first 2,000 examples were selected as the training set, and the rest as a validation set to monitor training performance. The models were then re-trained on the whole dataset for downstream outlier detection, since we will not be applying them to new data. We optimized the models for 50 epochs using the ADAM [10] optimizer, with the learning rate shrinking by half every 10 epochs. The batch size was set to 50 for the 2D CAE, and 10 for the other cases where memory was a constraint.
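The step decay schedule can be written compactly; the base rate below is an assumed value, as the report does not state it.

```python
def lr_at_epoch(epoch, base_lr=1e-3):
    """Halve the learning rate every 10 epochs (training runs 50 epochs)."""
    return base_lr * 0.5 ** (epoch // 10)
```

Over the 50 epochs this sweeps the rate through five plateaus, from base_lr down to base_lr/16.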

We applied batch normalization [11] after each convolution/transposed-convolution layer, which normalizes the layer outputs toward a standard Gaussian distribution. There has been a heated debate around why the trick works beyond the internal-covariate-shift theory proposed by the original authors, but empirically we found that adding this operation indeed helps the network converge to a much better optimum during training.

We tailored the loss function to better fit the semantics of the MaNGA dataset. The IFUs are in fact hexagonal, so the values in the four corners of the spatial square are often very weak, with excessive noise near the boundary as shown in Figure 1f. Also, since the sensors are intentionally aimed at the center of the target galaxy, most useful signals describing the galaxy lie in the central spaxels. We therefore multiplied the errors in the loss function by a pre-defined mask, weighting up the cost of a wrong reconstruction in the center and down-weighting it near the boundary.
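A sketch of such a weighted reconstruction loss; the exact mask used in the project is not specified, so a simple radial falloff stands in for it here.

```python
import numpy as np

def radial_mask(n, low=0.2):
    """Hypothetical weight map: 1.0 at the centre, falling to `low` at the
    far corners where the hexagonal IFU leaves mostly noise."""
    y, x = np.mgrid[:n, :n]
    r = np.hypot(x - (n - 1) / 2, y - (n - 1) / 2)
    return 1.0 - (1.0 - low) * r / r.max()

def masked_mae(cube, recon, weights):
    """MAE over the spectral axis, reweighted per spaxel, then averaged."""
    per_spaxel = np.abs(cube - recon).mean(axis=-1)   # (N, N) error map
    return (weights * per_spaxel).sum() / weights.sum()
```

With this weighting, a reconstruction mistake in a central spaxel costs several times more than the same mistake in a corner.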

In addition to the reconstruction loss, we attached another stream of network after the encoder, which takes the output embeddings of the encoder and uses two fully-connected layers to make a single prediction of the average redshift. The regression MSE loss is reweighted by a factor λ and added to form the final loss function. This also gives us a more intuitive metric for evaluating how well the model (specifically the encoder) is learning.

For PCA-2D, besides computing the loss by comparing fluxes, we also tried computing the MSE on the reconstructed principal components. The principal components were normalized to balance their contributions to the total reconstruction loss. We also found it useful to manually boost the weights of the first two principal components, which account for the mean and slope of the original spectrum, over the rest, each of which controls the peak or jiggling pattern at a certain wavelength. However, this turned out not to work as well, due to the volatility in the principal components.

(a) (b)

(c) (d)

Figure 4: Reconstructions of the spectrum at the center spaxel of galaxy 8261-9102 by (a) 2D CAE; (b) 3D CAE; (c) 1D-2D CAE; (d) 2D CAE on compressed spectra.

4.2 Evaluation

We evaluated the models mainly by the quality of their reconstructions. Figure 4 shows the reconstruction of the center spaxel of a typical galaxy by the four models discussed. The 2D CAE can generally place the reconstruction around the correct mean, while the 3D CAE fits the original curve much more accurately. The 1D-2D CAE seems to consistently repeat a certain local jiggling pattern to form the whole spectrum, which might be a result of using too few parameters. PCA-2D is equivalent to fixing a pre-trained fully connected layer on top of the 2D CAE, and it turned out to reduce the training difficulty a bit. However, occasional noise patterns can often be observed for PCA-2D, mainly due to the volatility mentioned earlier. Unfortunately, none of these models captured the emission peak, since we are using MAE in the first place; an MSE reconstruction loss, however, would fail to get even the mean right.

Model         Recon MAE Loss   Pred MSE Loss
2D CAE        49831            0.00486
3D CAE        44387            0.00158
1D-2D CAE     48742            0.00478
PCA-2D CAE    43978            0.00190

Table 1: Performance of the different models. The reconstruction loss is the mean absolute difference between the reconstructed cube and the original, summed across the whole cube, with spaxels within 2 spaxels of the boundary down-weighted by a factor of 5. The prediction loss is the mean squared error between the predicted and actual average redshift.

As shown in Table 1, the 3D CAE outperformed the others on both metrics. We further looked into its reconstruction performance. The predicted redshifts in Figure 5a generally lie around the red reference line of perfect prediction, performing much better than simply guessing the mean, which shows that our encoder is indeed extracting high-level information. In Figure 5b, we take a slice of the cube at a certain wavelength to see the spatial distribution of flux. The reconstructed slice captures the spatial pattern of this particular galaxy despite its irregular shape.

(a) (b)

Figure 5: Further results for the 3D CAE: (a) predicted vs. true average redshift for all galaxy cubes; (b) original, reconstructed, and residual (original minus reconstructed) spatial flux maps for galaxy 8338-12702 at 5739.8 Å, under the same color mapping scheme.

4.3 Outlier detection

With the dimensionality reduced to an acceptable range, we tried many widely-used manifold learning techniques, ranging from Isomap to multidimensional scaling, to visualize the distribution and identify outliers. We also performed distance-based outlier detection, such as the k-NN based method, where a galaxy is scored by its average distance to its k nearest neighbors. We found that several galaxies are consistently far away from the major clusters regardless of the method used, indicating a high probability of being outliers.

(a) (b)

Figure 6: Images and spectra at the weird/center spaxel of example galaxies detected as outliers: (a) a star in the foreground at the top right corner; (b) a blazar.
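The k-NN scoring just described can be sketched in a few lines of NumPy; the brute-force pairwise distance matrix is fine at this dataset's scale of a few thousand embeddings.

```python
import numpy as np

def knn_outlier_scores(X, k=5):
    """Score each embedding by its mean Euclidean distance to its k nearest
    neighbours; larger scores mean more isolated points."""
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))    # pairwise distances
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbour
    k_nearest = np.sort(dist, axis=1)[:, :k]
    return k_nearest.mean(axis=1)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(scale=0.1, size=(20, 3)),   # a tight cluster
               [[10.0, 10.0, 10.0]]])                 # one isolated point
scores = knn_outlier_scores(X, k=5)
```

Ranking galaxies by this score surfaces the same candidates as the manifold-learning visualizations.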

There are also some galaxies that none of the models reconstruct well. In this context, this can in fact be a good thing, as it gives us a more straightforward way to classify them as outliers based on the reconstruction loss: those reconstructed with large error may well follow a different distribution from the rest and are likely to be weird.

During our initial runs of outlier detection, many observations that have foreground stars lying between the IFU and the target galaxy were identified as outliers. These cubes have unexpectedly large flux values in a certain spatial patch near the boundary, triggering the model to treat them as abnormal samples. Figure 6a is an example, where a foreground star produces huge flux with a downward slope (indicating it is blue) at the top right corner. While this is indeed a true outlier, we already know of the star's existence. In order to let the model extract more 'unknown unknowns' that have not been discovered before, we mask out these patches in the input data, as well as in the reconstruction loss used during training.

We then manually inspected the detected galaxy cubes to identify the source of their weirdness. Figure 6 shows typical examples. Figure 6b is one of the most frequently picked galaxies, which turned out to be a blazar, one of the most energetic phenomena in the universe.

5 Conclusions

This project mainly attempted to build low-dimensional representations for the MaNGA dataset, which contains spatial maps of astronomical spectroscopic data for galaxies, using convolutional autoencoders in an unsupervised fashion. Multiple variants of CAE with different focuses, and thus different architectures, were experimented with and evaluated. Current results still leave considerable room for improvement, so possible directions have been discussed. From the dimensionality-reduced summarizing embeddings, preliminary outlier detection was performed to locate weird galaxies in the dataset. A potential future direction is to detect outlier spaxels within a single galaxy cube.

References

[1] B. Hoyle, M. M. Rau, K. Paech, C. Bonnett, S. Seitz, and J. Weller, "Anomaly detection for machine learning redshifts applied to SDSS galaxies," MNRAS, vol. 452, pp. 4183–4194, Oct. 2015.

[2] C. M. Peters, G. T. Richards, A. D. Myers, M. A. Strauss, K. B. Schmidt, Z. Ivezic, N. P. Ross, C. L. MacLeod, and R. Riegel, "Quasar Classification Using Color and Variability," ApJ, vol. 811, p. 95, Oct. 2015.

[3] D. Baron and D. Poznanski, "The weirdest SDSS galaxies: results from an outlier detection algorithm," MNRAS, vol. 465, pp. 4530–4555, Mar. 2017.

[4] K. Bundy, M. A. Bershady, D. R. Law, R. Yan, N. Drory, N. MacDonald, D. A. Wake, B. Cherinka, J. R. Sanchez-Gallego, A.-M. Weijmans, D. Thomas, C. Tremonti, K. Masters, L. Coccato, A. M. Diamond-Stanic, A. Aragon-Salamanca, V. Avila-Reese, C. Badenes, J. Falcon-Barroso, F. Belfiore, D. Bizyaev, G. A. Blanc, J. Bland-Hawthorn, M. R. Blanton, J. R. Brownstein, N. Byler, M. Cappellari, C. Conroy, A. A. Dutton, E. Emsellem, J. Etherington, P. M. Frinchaboy, H. Fu, J. E. Gunn, P. Harding, E. J. Johnston, G. Kauffmann, K. Kinemuchi, M. A. Klaene, J. H. Knapen, A. Leauthaud, C. Li, L. Lin, R. Maiolino, V. Malanushenko, E. Malanushenko, S. Mao, C. Maraston, R. M. McDermid, M. R. Merrifield, R. C. Nichol, D. Oravetz, K. Pan, J. K. Parejko, S. F. Sanchez, D. Schlegel, A. Simmons, O. Steele, M. Steinmetz, K. Thanjavur, B. A. Thompson, J. L. Tinker, R. C. E. van den Bosch, K. B. Westfall, D. Wilkinson, S. Wright, T. Xiao, and K. Zhang, "Overview of the SDSS-IV MaNGA Survey: Mapping nearby Galaxies at Apache Point Observatory," ApJ, vol. 798, p. 7, Jan. 2015.

[5] G. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.

[6] A. Odena, V. Dumoulin, and C. Olah, "Deconvolution and checkerboard artifacts," Distill, 2016. [Online]. Available: http://distill.pub/2016/deconv-checkerboard

[7] C. Dong, C. Change Loy, K. He, and X. Tang, "Image Super-Resolution Using Deep Convolutional Networks," ArXiv e-prints, Dec. 2015.

[8] V. Dumoulin and F. Visin, "A guide to convolution arithmetic for deep learning," ArXiv e-prints, Mar. 2016.

[9] D. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and accurate deep network learning by exponential linear units (ELUs)," CoRR, vol. abs/1511.07289, 2015. [Online]. Available: http://arxiv.org/abs/1511.07289

[10] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," CoRR, vol. abs/1412.6980, 2014. [Online]. Available: http://arxiv.org/abs/1412.6980

[11] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," CoRR, vol. abs/1502.03167, 2015. [Online]. Available: http://arxiv.org/abs/1502.03167
