![Page 1: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/1.jpg)
Scientific Data Mining
Principles and applications with astronomical data.
Amos Storkey
Institute for Adaptive and Neural Computation
Division of Informatics and Institute for Astronomy
University of Edinburgh
![Page 2: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/2.jpg)
Collaborators and Thanks
Collaborative work with Nigel Hambly, Chris Williams and Bob Mann.
Thanks also to many others at the Royal Observatory, Edinburgh for their help in clarifying many of the things that an astronomical outsider might misunderstand or falsely presume!
![Page 3: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/3.jpg)
Astro-informatics
Problems in astronomy increasingly require the use of machine learning, data mining and informatics techniques.
Detection of spurious objects
Record linkage
Object classification and clustering
Source separation
Compression
Information about techniques
![Page 4: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/4.jpg)
Galaxy spectra
James Riden, with Alan Heavens and Ben Panter. Chris Williams.
Given spectra, what can be said about the generation history and metallicity of a galaxy?
Data exploration techniques: ISOMAP and LLE – find data manifold and project to low dimension.
Develop probabilistic model for galaxy generation, infer history and metallicity parameters from spectra.
![Page 5: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/5.jpg)
Exploratory Data Analysis
![Page 6: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/6.jpg)
Exploratory Data Analysis
![Page 7: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/7.jpg)
Record Linkage
Problem of linking records from different datasets.
There is an ambiguity in matches.
Room for new techniques.
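The ambiguity in matches can be illustrated with a minimal sketch of positional cross-matching. This is not the method used in the talk: the function name `cross_match`, the flat (x, y) plate coordinates and the fixed matching radius are all assumptions made for illustration.

```python
import math

def cross_match(cat_a, cat_b, radius):
    """For each (x, y) in cat_a, find the indices of cat_b within `radius`.

    Returns one (status, candidates) pair per record in cat_a, where
    status is 'none', 'unique' or 'ambiguous'. Assumes flat Euclidean
    coordinates (hypothetical simplification, not sky coordinates).
    """
    results = []
    for xa, ya in cat_a:
        candidates = [
            j for j, (xb, yb) in enumerate(cat_b)
            if math.hypot(xa - xb, ya - yb) <= radius
        ]
        if not candidates:
            status = "none"
        elif len(candidates) == 1:
            status = "unique"
        else:
            status = "ambiguous"
        results.append((status, candidates))
    return results

# Toy catalogues standing in for two epochs of the same field.
epoch1 = [(10.0, 10.0), (50.0, 50.0), (90.0, 20.0)]
epoch2 = [(10.2, 9.9), (49.6, 50.1), (50.3, 50.2), (0.0, 0.0)]
matches = cross_match(epoch1, epoch2, radius=1.0)
for record, (status, candidates) in zip(epoch1, matches):
    print(record, status, candidates)
```

The second record has two candidates inside the radius, which is the kind of case where a simple nearest-neighbour rule is not enough.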
![Page 8: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/8.jpg)
Super-resolution
Improving resolution of a single image, or combining images from different sources to provide an increased resolution.
Image cleaning and characterisation. H alpha survey. Matches in short red. Examples.
![Page 9: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/9.jpg)
Part II – Main Problem
Locating junk objects in astronomical databases.
Junk objects make finding non-matches across epochs or colours hard.
![Page 10: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/10.jpg)
SuperCOSMOS Sky Survey Data
UK, ESO and Palomar Schmidt sky survey plates.
Optical: 3 colours and 2 epochs, 894 fields for each, covering the Southern sky.
Digitised using SuperCOSMOS to 10 micron (0.7 arcsec).
5 x 10^5 to 10^7 objects on the plate.
Objects and features extracted from plates to form a catalogue of stars and galaxies and characteristics (e.g. ellipses), but also spurious objects, e.g. from satellite tracks.
Average of 2 satellite tracks per plate, a few hundred to a few thousand objects per track.
Aeroplanes, diffraction spikes, halos, scratches...
![Page 11: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/11.jpg)
Satellite track problem
Some satellite tracks tend to be recognised as a line of objects:
![Page 12: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/12.jpg)
Optical Artefacts
Can be halos about bright stars. High density of spurious points local to the star.
(Almost) horizontal and (almost) vertical diffraction spikes are possible.
![Page 13: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/13.jpg)
![Page 14: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/14.jpg)
![Page 15: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/15.jpg)
![Page 16: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/16.jpg)
![Page 17: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/17.jpg)
![Page 18: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/18.jpg)
Spurious object characteristics
Spurious objects cover the full range of magnitude measurements; they often (but not always) have characteristics resembling those of galaxies.
In fact their characteristics are wide and varied. They are not easy to detect from their characteristics alone.
![Page 19: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/19.jpg)
Machine Learning Methods
Hough Transform and Circular Hough Transform
See http://www.anc.ed.ac.uk/~amos/hough.html
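As a rough illustration of the idea, here is a minimal sketch of a Hough transform applied to a catalogue of point positions rather than to image pixels. The function names, the bin width and the number of angle samples are assumptions chosen for the example; they are not taken from the talk.

```python
import math
from collections import Counter

def hough_lines(points, n_angles=180, bin_width=1.0):
    """Accumulate votes in (angle index, distance bin) space.

    Each point (x, y) votes for every line
        rho = x * cos(theta) + y * sin(theta)
    through it, with theta sampled uniformly on [0, pi).
    """
    votes = Counter()
    for x, y in points:
        for k in range(n_angles):
            theta = math.pi * k / n_angles
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(k, math.floor(rho / bin_width))] += 1
    return votes

def strongest_line(points, n_angles=180, bin_width=1.0):
    """Return (theta, lower edge of the rho bin, vote count) of the fullest bin."""
    votes = hough_lines(points, n_angles, bin_width)
    (k, rho_bin), count = votes.most_common(1)[0]
    return math.pi * k / n_angles, rho_bin * bin_width, count

# Toy data: 30 collinear "track" objects plus a few background objects.
track = [(3.0 * i, 20.5) for i in range(30)]
background = [(5.0, 80.0), (40.0, 60.0), (70.0, 3.0), (90.0, 95.0), (15.0, 45.0)]
theta, rho, count = strongest_line(track + background)
print(theta, rho, count)
```

Bins that collect far more votes than the background density would explain are candidate tracks. The bin width sets the narrowest line that can be resolved.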
![Page 20: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/20.jpg)
Circular Hough Transform
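A minimal sketch of the circular variant, assuming a single known radius: every object votes for all centres lying at that radius from it, and a circular arrangement of objects (such as a halo) shows up as a peak at its centre. The function name, the fixed radius and the bin sizes are hypothetical choices for illustration.

```python
import math
from collections import Counter

def circular_hough(points, radius, n_angles=72, bin_width=1.0):
    """Vote for circle centres at distance `radius` from each point.

    Returns a Counter keyed by (x bin, y bin) of the candidate centre.
    """
    votes = Counter()
    for x, y in points:
        for k in range(n_angles):
            phi = 2.0 * math.pi * k / n_angles
            cx = x - radius * math.cos(phi)
            cy = y - radius * math.sin(phi)
            key = (math.floor(cx / bin_width), math.floor(cy / bin_width))
            votes[key] += 1
    return votes

# Toy halo: 24 objects on a circle of radius 10 centred at (50.5, 50.5).
halo = [
    (50.5 + 10.0 * math.cos(2.0 * math.pi * j / 24),
     50.5 + 10.0 * math.sin(2.0 * math.pi * j / 24))
    for j in range(24)
]
votes = circular_hough(halo, radius=10.0)
best_bin, best_count = votes.most_common(1)[0]
print(best_bin, best_count)
```

In practice the radius is not known in advance, so the accumulator would need a third dimension over candidate radii.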
![Page 21: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/21.jpg)
Hough Example: UKJ005
[Figure: Hough transform of plate UKJ005. Axes: angle and distance from origin (up to dmax).]
![Page 22: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/22.jpg)
Data space corresponding to bin
However:
Can’t find short lines.
Curves are problematic.
Background star/galaxy density changes can cause errors.
![Page 23: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/23.jpg)
Renewal Strings
Hidden-Markov renewal processes.
Look at all possible line segments in terms of renewal processes.
If local density is closer in signature to a satellite track than to the background stars and galaxies, then flag as a satellite track.
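The comparison of density signatures can be sketched as a log-likelihood ratio on the gaps between consecutive objects along a line. This is a simplification and an assumption: both processes are given exponential gap distributions with hypothetical rates `background_rate` and `track_rate`, whereas the renewal process in the actual model may use a different gap distribution.

```python
import math

def exp_log_pdf(gap, rate):
    """Log density of an exponential distribution with the given rate."""
    return math.log(rate) - rate * gap

def track_log_odds(positions, background_rate, track_rate):
    """Sum over gaps of log p(gap | track) - log p(gap | background).

    `positions` are sorted distances of objects along one line segment.
    Positive values favour the track explanation.
    """
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(
        exp_log_pdf(g, track_rate) - exp_log_pdf(g, background_rate)
        for g in gaps
    )

dense = [0.1 * i for i in range(10)]   # closely spaced, track-like
sparse = [1.0 * i for i in range(10)]  # widely spaced, background-like
print(track_log_odds(dense, background_rate=1.0, track_rate=10.0))
print(track_log_odds(sparse, background_rate=1.0, track_rate=10.0))
```

Because the background rate enters explicitly, the same spacing can be track-like on a sparse plate and unremarkable on a dense one.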
![Page 24: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/24.jpg)
Benefits
Can use line widths thirty times narrower than with Hough.
Copes with curves by using local linearity rather than being restricted to global linearity.
Deals with local star/galaxy density differences.
Copes with partial lines, dashed lines etc.
Flexible model.
Can use other data (e.g. ellipticity) to strengthen classification.
Bayesian.
![Page 25: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/25.jpg)
Generative renewal string
Can generate from model.
![Page 26: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/26.jpg)
To use
Don’t use generative model! Too hard.
Look at all line segments.
Transform star/galaxy model to Poisson process on line.
Run Markov chain along each line.
Simplest case: class 0 is background process. Class 1 defines a renewal process corresponding to a scratch, satellite track etc. Processing is fully Markovian.
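The simplest case can be sketched as a two-state hidden Markov model run along one line. Several assumptions are made here and are not from the talk: for a homogeneous background of surface density rho, the objects falling in a strip of width w form a Poisson process of rate rho * w along the line, so class 0 gaps are taken as exponential; class 1 gaps are also taken as exponential with a higher rate; and the rates, the `stay` probability and the prior are hypothetical values.

```python
import math

def gap_likelihood(gap, rate):
    """Exponential density of a gap between consecutive objects."""
    return rate * math.exp(-rate * gap)

def posterior_track_probability(gaps, rates=(1.0, 10.0), stay=0.9,
                                prior=(0.9, 0.1)):
    """P(class 1 | all gaps) for each gap, via scaled forward-backward.

    Class 0 is the background process, class 1 the track process.
    """
    n = len(gaps)
    trans = [[stay, 1.0 - stay], [1.0 - stay, stay]]
    emit = [[gap_likelihood(g, rates[z]) for z in (0, 1)] for g in gaps]

    # Forward pass, normalised at each step to avoid underflow.
    alpha, prev = [], None
    for t in range(n):
        if t == 0:
            a = [prior[z] * emit[0][z] for z in (0, 1)]
        else:
            a = [emit[t][z] * sum(prev[y] * trans[y][z] for y in (0, 1))
                 for z in (0, 1)]
        norm = sum(a)
        prev = [v / norm for v in a]
        alpha.append(prev)

    # Backward pass, also normalised at each step.
    beta = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[z][y] * emit[t + 1][y] * beta[t + 1][y] for y in (0, 1))
             for z in (0, 1)]
        norm = sum(b)
        beta[t] = [v / norm for v in b]

    # Combine and renormalise to get the marginal posterior of class 1.
    posterior = []
    for t in range(n):
        w0 = alpha[t][0] * beta[t][0]
        w1 = alpha[t][1] * beta[t][1]
        posterior.append(w1 / (w0 + w1))
    return posterior

gaps = [1.2, 0.8, 1.5, 0.1, 0.08, 0.12, 0.09, 0.1, 0.11, 1.1, 0.9]
for gap, p in zip(gaps, posterior_track_probability(gaps)):
    print(f"gap={gap:5.2f}  P(track)={p:.3f}")
```

The probabilities here are attached to gaps rather than to objects; mapping them back to individual objects is left out of the sketch.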
![Page 27: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/27.jpg)
Results
Get probabilistic results. Two possibilities:
Probability of a given point being a spurious point.
Most probable classification of points.
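The second possibility, the most probable joint classification, corresponds to a Viterbi pass. The sketch below assumes a two-class chain with exponential gap distributions; the rates, the `stay` probability and the prior are hypothetical values, and the function name is not from the talk.

```python
import math

def most_probable_classes(gaps, rates=(1.0, 10.0), stay=0.9, prior=(0.9, 0.1)):
    """Viterbi decoding of a two-class chain over gaps between objects.

    Class 0 is background, class 1 is track. Returns one class per gap.
    """
    if not gaps:
        return []

    def log_emit(gap, z):
        # Log of the exponential gap density for class z.
        return math.log(rates[z]) - rates[z] * gap

    log_trans = [
        [math.log(stay), math.log(1.0 - stay)],
        [math.log(1.0 - stay), math.log(stay)],
    ]

    score = [math.log(prior[z]) + log_emit(gaps[0], z) for z in (0, 1)]
    back = []  # back[t][z] = best previous class when gap t+1 is in class z
    for gap in gaps[1:]:
        new_score, pointers = [], []
        for z in (0, 1):
            best_prev = max((0, 1), key=lambda y: score[y] + log_trans[y][z])
            pointers.append(best_prev)
            new_score.append(
                score[best_prev] + log_trans[best_prev][z] + log_emit(gap, z)
            )
        score = new_score
        back.append(pointers)

    # Trace back from the best final class.
    state = max((0, 1), key=lambda z: score[z])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

gaps = [1.2, 0.8, 1.5, 0.1, 0.08, 0.12, 0.09, 0.1, 0.11, 1.1, 0.9]
print(most_probable_classes(gaps))
```

The per-point probabilities and the most probable joint classification need not agree point by point, which is why the two outputs are listed separately.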
![Page 28: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/28.jpg)
Results
Two examples. The left example is a small scratch or track in the corner of UKJ005. The right example is a track on a dense plate.
![Page 29: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/29.jpg)
![Page 30: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/30.jpg)
Further examples
Further examples can be found at http://www.anc.ed.ac.uk/~amos/sattrackres.html
A flythrough movie of one plate can be found at http://www.anc.ed.ac.uk/~amos/demos/flythroughnew3c0.avi (36MB)
![Page 31: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/31.jpg)
Conclusions
Machine Learning and Data Mining methods are proving, and will continue to prove, useful with astronomical databases.
Methods do not always work automatically. Some thought is needed.
Circular Hough transforms and renewal strings have proven effective in locating a variety of spurious objects in astronomical databases.
So far the methods have been run on a quarter of one colour of SuperCOSMOS data.
![Page 32: Scientific Data Mining Principles and applications with astronomical data. Amos Storkey Institute for Adaptive and Neural Computation Division of Informatics](https://reader035.vdocuments.mx/reader035/viewer/2022081602/5515ed3a55034638038b5150/html5/thumbnails/32.jpg)
Contact and URLs
http://www.anc.ed.ac.uk/~amos/
http://www.roe.ac.uk/cosmos/scosmos.html