
Page 1: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data

Knowledge Extraction from Scientific Data

Roy Williams, California Institute of Technology

roy@caltech.edu

SDMIV, 24 October 2002

Edinburgh

KE Tools S Data

Page 2

Scientific Data

Datacubes

• N-dimensional array
– spectrum, time-series
– image, voxels, hyperspectral image
• Concentration
• Pattern matching
• Integration

Event Sets

• Often derived from pattern matching
• A set of events is a table
• Integrating Event Sets
• Clustering

Page 3

Knowledge Extraction

Concentration

• principal components
• cluster/outlier finding

Datacube to Eventset

• Pattern matching
• From theory or from training set

Integration

• registration of datacubes
• join / crossmatch of eventsets

Page 4

Datacube: Some stars from the DPOSS survey

Page 5

Datacube: An AVIRIS image of San Francisco Bay

400-2500 nm in 224 bands (R. Green, JPL)

atmospheric absorption

Page 6

Concentrating Information

e.g. Principal Component Analysis

• Given a set of vectors
• Compute dot products (same as correlations)
• Diagonalize
• Throw out weaker (noise) components
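The recipe on this slide can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used in the talk: the function name `pca_concentrate`, the synthetic data, and the choice to centre the vectors before forming the dot-product (covariance) matrix are all assumptions.

```python
import numpy as np

def pca_concentrate(vectors, n_keep):
    """Project row vectors onto their n_keep strongest principal components."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)                 # centre each coordinate (assumption)
    C = X.T @ X / (len(X) - 1)             # matrix of dot products (covariance)
    eigvals, eigvecs = np.linalg.eigh(C)   # diagonalize the symmetric matrix
    order = np.argsort(eigvals)[::-1]      # strongest components first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    basis = eigvecs[:, :n_keep]            # throw out the weaker components
    return X @ basis, eigvals

# Synthetic example: one strong direction plus weak isotropic noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(500, 1)) * np.array([[3.0, 2.0, 1.0]])
noise = 0.05 * rng.normal(size=(500, 3))
scores, eigvals = pca_concentrate(signal + noise, n_keep=1)
```

Here `eigvals` shows one dominant component and two much weaker ones, so keeping a single component retains almost all of the variance.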

Page 7

Information concentration: Principal Component Analysis

Page 8

Event Sets

Created by pattern matching: from a known rule, from a training set, or by finding clusters

Page 9

Event Set = Table

Column: name=longitude, content=Earth coordinate, units=degrees, datatype=double, display=f6.2

Values: 43.4, 87.2, 83.2

Column: name=ID, content=key, units=none, datatype=char

Values: E3948547, E3948545, E3943766

108?

103?
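One way to picture an event set as a table with per-column metadata is the sketch below. The class names `Column` and `EventSet` are hypothetical, and the field names simply mirror the attributes shown on the slide (name, content, units, datatype, display); the sample values are the ones read from the slide.

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    """One table column together with its descriptive metadata."""
    name: str
    content: str
    units: str
    datatype: str
    display: str = ""
    values: list = field(default_factory=list)

@dataclass
class EventSet:
    """A set of events: every row is one event, every column one attribute."""
    columns: list

    def rows(self):
        # Transpose the column-wise storage into one tuple per event.
        return list(zip(*(c.values for c in self.columns)))

events = EventSet([
    Column("longitude", "Earth coordinate", "degrees", "double", "f6.2",
           [43.4, 87.2, 83.2]),
    Column("ID", "key", "none", "char",
           values=["E3948547", "E3948545", "E3943766"]),
])
first_row = events.rows()[0]
```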

Page 10

Gravitational Lenses

A. Szalay, Johns Hopkins

Pattern matching finds events in datacubes

Page 11

Black hole collisions

LIGO: Laser Interferometer Gravitational-Wave Observatory

Page 12

Creating Event Sets

Given a set of volcanoes, find a lot more volcanoes

Here we use Singular Value Decomposition

Supervised Classification
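The slide does not say how SVD is applied, so the following is only one plausible sketch: learn a low-dimensional basis from labelled examples, then score candidates by how poorly that basis reconstructs them. The function names, the one-dimensional "volcano" profile, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def fit_template_basis(examples, n_components):
    """Learn the mean and leading singular vectors of the training examples."""
    X = np.asarray(examples, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]         # rows of vt are orthonormal

def residual(candidates, mean, basis):
    """Reconstruction error of each candidate; small means 'looks like training'."""
    Y = np.asarray(candidates, dtype=float) - mean
    recon = (Y @ basis.T) @ basis          # project onto the learned subspace
    return np.linalg.norm(Y - recon, axis=1)

# Hypothetical training set: one bump profile at varying amplitudes, plus noise.
rng = np.random.default_rng(1)
shape = np.exp(-np.linspace(-2, 2, 16) ** 2)
train = np.array([a * shape for a in rng.uniform(0.5, 2.0, 20)])
train = train + 0.01 * rng.normal(size=train.shape)

mean, basis = fit_template_basis(train, n_components=2)
lookalike = 1.3 * shape                    # resembles the training examples
background = rng.normal(size=16)           # unrelated noise
scores = residual([lookalike, background], mean, basis)
```

Candidates whose residual falls below a chosen threshold would be added to the event set; the threshold itself has to be tuned on held-out labelled data.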

Page 13

[Figure: Multiparameter data in colour-colour-fX/fopt space. Symbols: X-ray source counterparts; contours: all optical objects. Panels: all sources; high fX/fopt; medium fX/fopt; low fX/fopt. Labelled classes: stellar, galaxy, compact galaxy, active dM stars, BLAGN, NELGs, possible hi-z quasar, F/G stars?, normal galaxies?]

Mike Watson, Leicester University

Page 14

Integrating Datacubes

Find a mapping from one domain to the other

Registration of DPOSS and Hubble Deep Field
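As a concrete, much simplified instance of finding such a mapping, the sketch below recovers a pure integer translation between two images by phase correlation. Real registration of survey images also involves rotation, scale, and distortion, and nothing here is claimed to be the method used for DPOSS; `estimate_shift` and the random test image are assumptions.

```python
import numpy as np

def estimate_shift(reference, moved):
    """Return (dy, dx) such that np.roll(reference, (dy, dx), axis=(0, 1)) matches moved."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)
    cross = f_mov * np.conj(f_ref)
    cross /= np.abs(cross) + 1e-12         # keep only the phase difference
    corr = np.fft.ifft2(cross).real        # peaks at the translation
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Convert wrapped indices into signed shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))

rng = np.random.default_rng(2)
image = rng.normal(size=(64, 64))
shifted = np.roll(image, (5, -3), axis=(0, 1))
shift = estimate_shift(image, shifted)
```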

Page 15

Datacube Registration: Movement of ice inferred from registration

Page 16

Integrating Event Sets

• Database Join
• Fuzzy Join (e.g. astronomical crossmatch)
• Distributed Join: does the Grid do databases?
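A fuzzy join differs from an ordinary database join in that rows match when their keys are close rather than equal. For an astronomical crossmatch the key is sky position, as in this brute-force sketch. The function names, the match radius, and the toy coordinates are assumptions; a production crossmatch would use a spatial index instead of an all-pairs comparison.

```python
import numpy as np

def unit_vectors(ra_deg, dec_deg):
    """Convert sky coordinates in degrees to 3-D unit vectors."""
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.column_stack([np.cos(dec) * np.cos(ra),
                            np.cos(dec) * np.sin(ra),
                            np.sin(dec)])

def crossmatch(ra1, dec1, ra2, dec2, radius_arcsec):
    """Pair each source in catalog 1 with its nearest catalog-2 source within the radius."""
    v1, v2 = unit_vectors(ra1, dec1), unit_vectors(ra2, dec2)
    chord = np.linalg.norm(v1[:, None, :] - v2[None, :, :], axis=2)
    # The chord length gives a numerically stable angle for small separations.
    sep = np.degrees(2.0 * np.arcsin(np.clip(chord / 2.0, 0.0, 1.0))) * 3600.0
    nearest = sep.argmin(axis=1)
    return [(i, int(j)) for i, j in enumerate(nearest) if sep[i, j] <= radius_arcsec]

# Toy catalogs: two true counterparts and one unmatched source on each side.
ra_a, dec_a = [10.0, 20.0, 30.0], [0.0, 5.0, -5.0]
ra_b, dec_b = [20.0002, 10.0001, 50.0], [5.0001, 0.0, 10.0]
matches = crossmatch(ra_a, dec_a, ra_b, dec_b, radius_arcsec=2.0)
```

Each pair in `matches` is (index in the first catalog, index in the second); sources absent from the list are the unmatched ones.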

Page 17

Integration of Star Catalogs

Roy Williams

2MASS versus DPOSS cross-identification, with j_m as the 2MASS magnitude and I_mtotn as the DPOSS magnitude

2MASS: j_m <= 15; DPOSS: I_mtotn <= 18

[Figure: Cross Matching. Legend: DPOSS unmatched, 2MASS matched, DPOSS matched, 2MASS unmatched]

Page 18

Visualizing Event Sets: Unsupervised clustering

50000 stars in color-color space
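The slide does not name the clustering algorithm, so k-means is used here purely as a familiar example of unsupervised clustering in a two-dimensional colour-colour space. The farthest-point initialization, the function name `kmeans`, and the two synthetic populations are assumptions made to keep the sketch deterministic.

```python
import numpy as np

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    X = np.asarray(points, dtype=float)
    centres = [X[0]]
    for _ in range(1, k):
        # Next centre: the point farthest from all centres chosen so far.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centres], axis=0)
        centres.append(X[d.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)       # assign each point to its nearest centre
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres

# Two hypothetical stellar populations in a colour-colour plane.
rng = np.random.default_rng(3)
pop_a = rng.normal(loc=(0.5, 0.3), scale=0.05, size=(200, 2))
pop_b = rng.normal(loc=(1.5, 1.2), scale=0.05, size=(200, 2))
labels, centres = kmeans(np.vstack([pop_a, pop_b]), k=2)
```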

Page 19

A Grid of Services

Human gets Data

Network of Services

Understood by human

Further processing after format change

Grid of pipes and engines

Switches and actuators

data flow

Page 20

Example Grid of Services

[Diagram: the user's code connected to a grid of services: Storage Service, DPOSS Service, Catalog Service, Crossmatch Service, 2MASS Service, Query Check Service, Query Estimator]

flexible complex metadata AND broadband binary

Page 21

Computing Challenges

• High-dimensional: Clustering & Classification, Visualization, Outlier Detection

• Visualization of 10^10 points

• Database access to 10^10 points

• Large Distributed Join

Page 22

Standards needed

• Bundling diverse objects together with code and references

• Referencing data resources on the Grid: local, remote, replicated, ....

Page 23

Problem Solving Environment

[Diagram, same services as on the Example Grid of Services slide: the user's code connected to Storage Service, DPOSS Service, Catalog Service, Crossmatch Service, 2MASS Service, Query Check Service, Query Estimator]

• Plumbing (big data) and electrical (control, metadata)

• Web service and workflow

• Finding service classes/implementations by semantics

• GUI / Executive / IO adapters / Algorithms