![Page 1: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/1.jpg)
Knowledge Extraction from Scientific Data
Roy Williams, California Institute of Technology
SDMIV, 24 October 2002, Edinburgh
KE Tools S Data
![Page 2: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/2.jpg)
Scientific Data
Datacubes
• N-dimensional array: spectrum, time-series, image, voxels, hyperspectral image
• Concentration
• Pattern matching
• Integration
Event Sets
• Often derived from pattern matching
• A set of events is a table
• Integrating Event Sets
• Clustering
![Page 3: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/3.jpg)
Knowledge Extraction
• Concentration: principal components, cluster/outlier finding
• Pattern matching (datacube to event set): from theory or from a training set
• Integration: registration of datacubes, join / crossmatch of event sets
![Page 4: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/4.jpg)
Datacube: Some stars from the DPOSS survey
![Page 5: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/5.jpg)
Datacube: An AVIRIS image of San Francisco Bay
400-2500 nm in 224 bands (R. Green, JPL)
atmospheric absorption
![Page 6: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/6.jpg)
Concentrating Information
e.g. Principal Component Analysis
• Given a set of vectors
• Compute dot products (same as correlations)
• Diagonalize
• Throw out weaker (noise) components
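The PCA steps on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the talk: the function name `pca_concentrate` and the parameter `n_keep` are assumptions, and the vectors are mean-centred first so that the dot products behave like covariances.

```python
import numpy as np

def pca_concentrate(vectors, n_keep):
    """Keep the n_keep strongest principal components of a set of vectors.

    Returns (components, scores, eigenvalues). Names are illustrative only.
    """
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)                   # centre each feature
    C = X.T @ X / (len(X) - 1)               # dot products between features
    eigvals, eigvecs = np.linalg.eigh(C)     # diagonalize (ascending eigenvalues)
    order = np.argsort(eigvals)[::-1][:n_keep]   # strongest components first
    components = eigvecs[:, order]           # weaker (noise) components are dropped
    scores = X @ components                  # data expressed in the kept components
    return components, scores, eigvals[order]
```

For a hyperspectral datacube, each pixel's spectrum would be one row of `vectors`, and `scores` would hold a handful of concentrated bands per pixel.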
![Page 7: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/7.jpg)
Information concentration: Principal Component Analysis
![Page 8: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/8.jpg)
Event Sets
Created
• by pattern matching: from a known rule or from a training set
• by finding clusters
![Page 9: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/9.jpg)
Event Set = Table
name=longitude, content=Earth coordinate, units=degrees, datatype=double, display=f6.2
43.4, 87.2, 83.2
name=ID, content=key, units=none, datatype=char
E3948547, E3948545, E3943766
10^8?
10^3?
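One way to picture "Event Set = Table" is a list of columns, each carrying the metadata fields shown on the slide (name, content, units, datatype, display). The sketch below is an assumption about how such a structure might look; the class names `Column` and `EventSet` are hypothetical, and the longitude values are placeholders rather than data from the talk.

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    """One table column plus the per-column metadata named on the slide."""
    name: str
    content: str
    units: str
    datatype: str
    display: str = ""
    values: list = field(default_factory=list)

@dataclass
class EventSet:
    """An event set: every row is one event, every column is described."""
    columns: list

    def __len__(self):
        return len(self.columns[0].values) if self.columns else 0

    def row(self, i):
        return {c.name: c.values[i] for c in self.columns}

# Placeholder longitudes (hypothetical); IDs follow the slide's format.
events = EventSet([
    Column("longitude", "Earth coordinate", "degrees", "double", "f6.2", [10.0, 20.0]),
    Column("ID", "key", "none", "char", values=["E3948547", "E3948545"]),
])
```

Keeping units and content type next to the values is what lets a later join or crossmatch check that two columns are comparable.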
![Page 10: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/10.jpg)
Gravitational Lenses
A. Szalay, Johns Hopkins
Pattern matching finds events in datacubes
![Page 11: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/11.jpg)
Black hole collisions
LIGO: Laser Interferometer Gravitational-Wave Observatory
![Page 12: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/12.jpg)
Creating Event Sets
Given a set of volcanoes, find a lot more volcanoes
Here we use Singular Value Decomposition
Supervised Classification
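The slide does not say how the SVD is applied, so the following is only one plausible reading: learn a low-rank subspace from the training examples, then score a candidate by how much of it falls outside that subspace. The function names `fit_subspace` and `residual`, and the choice of a residual score, are assumptions.

```python
import numpy as np

def fit_subspace(training_patches, rank):
    """Learn a rank-`rank` basis from flattened training examples via SVD."""
    X = np.asarray(training_patches, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal directions, strongest first.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:rank]

def residual(patch, mean, basis):
    """Distance from a candidate to the training subspace (small = similar)."""
    d = np.asarray(patch, dtype=float) - mean
    projected = basis.T @ (basis @ d)
    return float(np.linalg.norm(d - projected))
```

Candidates whose residual falls below a threshold chosen on held-out examples would be added to the event set; the threshold is not specified in the source.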
![Page 13: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/13.jpg)
[Figure: multiparameter data in colour-colour-fX/fopt space (Mike Watson, Leicester University). Symbols: X-ray source counterparts; contours: all optical objects. Panels: all sources; stellar; galaxy; compact galaxy; high, medium and low fX/fopt. Labelled populations: active dM stars, BLAGN, NELGs, possible hi-z quasar, F/G stars?, normal galaxies?]
![Page 14: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/14.jpg)
Integrating Datacubes
Find a mapping from one domain to the other
Registration of DPOSS and Hubble Deep Field
![Page 15: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/15.jpg)
Datacube Registration
Movement of ice inferred from registration
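In the simplest case, registration means finding the translation that best aligns two images. The sketch below estimates an integer pixel shift from the peak of an FFT-based cross-correlation; it assumes a pure, periodic translation, which real surveys and ice imagery only approximate, and the function name `estimate_shift` is hypothetical.

```python
import numpy as np

def estimate_shift(reference, moved):
    """Integer (row, col) shift s such that moved is close to np.roll(reference, s)."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)
    # Circular cross-correlation; its peak sits at the displacement.
    corr = np.fft.ifft2(f_mov * np.conj(f_ref)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))
```

Applied to two images taken at different times, the recovered shift is the inferred movement; rotation, scaling and sub-pixel offsets would need a richer mapping.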
![Page 16: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/16.jpg)
Integrating Event Sets
• Database Join
• Fuzzy Join, e.g. astronomical crossmatch
• Distributed Join: does the Grid do databases?
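An astronomical crossmatch is a fuzzy join: two rows match when their sky positions agree to within a tolerance rather than exactly. The brute-force sketch below illustrates the idea for small catalogs; the function name, the arcsecond radius parameter and the nearest-neighbour rule are assumptions, and a production crossmatch over large catalogs would use a spatial index instead of a full distance matrix.

```python
import numpy as np

def crossmatch(ra1, dec1, ra2, dec2, radius_arcsec):
    """For each source in catalog 1, return the index of the nearest
    catalog 2 source within radius_arcsec, or -1 if there is none.
    Coordinates are in degrees."""
    def unit(ra, dec):
        ra = np.radians(np.asarray(ra, dtype=float))
        dec = np.radians(np.asarray(dec, dtype=float))
        return np.stack([np.cos(dec) * np.cos(ra),
                         np.cos(dec) * np.sin(ra),
                         np.sin(dec)], axis=-1)

    u1, u2 = unit(ra1, dec1), unit(ra2, dec2)
    # Chord length between unit vectors, converted to an angle.
    chord = np.linalg.norm(u1[:, None, :] - u2[None, :, :], axis=-1)
    sep = np.degrees(2.0 * np.arcsin(np.clip(chord / 2.0, 0.0, 1.0))) * 3600.0
    nearest = sep.argmin(axis=1)
    matched = sep[np.arange(len(u1)), nearest] <= radius_arcsec
    return np.where(matched, nearest, -1)
```

Rows with `-1` are the unmatched sources; the rest can be joined to their counterparts like an ordinary database join on the returned index.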
![Page 17: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/17.jpg)
Integration of Star Catalogs
Roy Williams
2MASS versus DPOSS cross-identification, with
• j_m as 2MASS magnitude
• I_mtotn as DPOSS magnitude
2MASS: j_m <= 15; DPOSS: I_mtotn <= 18
Plot legend: DPOSS unmatched, 2MASS matched, DPOSS matched, 2MASS unmatched
Cross Matching
![Page 18: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/18.jpg)
Visualizing Event Sets
Unsupervised clustering
50000 stars in color-color space
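The slide does not name the clustering method, so k-means is used here purely as a familiar stand-in for unsupervised clustering of points in colour-colour space. The function name and its parameters are assumptions.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centres)."""
    X = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres
```

With each star represented by a pair of colour indices, the returned labels can drive the plot colours directly.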
![Page 19: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/19.jpg)
A Grid of Services
Human gets Data
Network of Services
Understood by human
Further processing after format change
Grid of pipes and engines
Switches and actuators
data flow
![Page 20: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/20.jpg)
Example Grid of Services
Services in the diagram:
• Storage Service
• DPOSS Service
• Catalog Service
• Crossmatch Service
• 2MASS Service
• Query Check Service
• Query Estimator
• User’s code
Links carry flexible, complex metadata AND broadband binary data
![Page 21: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/21.jpg)
Computing Challenges
• High-dimensional: clustering & classification, visualization, outlier detection
• Visualization of 10^10 points
• Database access to 10^10 points
• Large Distributed Join
![Page 22: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/22.jpg)
Standards needed
• Bundling diverse objects together with code and references
• Referencing data resources on the Grid: local, remote, replicated, ....
![Page 23: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649eb35503460f94bba2e7/html5/thumbnails/23.jpg)
Problem Solving Environment
Services in the diagram:
• Storage Service
• DPOSS Service
• Catalog Service
• Crossmatch Service
• 2MASS Service
• Query Check Service
• Query Estimator
• User’s code
• Plumbing (big data) and electrical (control, metadata)
• Web service and workflow
• Finding service classes/implementations by semantics
• GUI / Executive / IO adapters / Algorithms