
Post on 13-Jan-2016


Knowledge Extraction from Scientific Data

Roy Williams
California Institute of Technology

roy@caltech.edu

SDMIV, 24 October 2002

Edinburgh

KE Tools S Data

Scientific Data

• Datacubes
– N-dimensional array
– spectrum, time-series
– image, voxels, hyperspectral image
– Concentration, Pattern matching, Integration

• Event Sets
– Often derived from pattern matching
– A set of events is a table
– Integrating Event Sets
– Clustering

Knowledge Extraction

• Concentration: principal components, cluster/outlier finding

• Datacube to Eventset: pattern matching, from theory or from a training set

• Integration: registration of datacubes, join / crossmatch of eventsets

Datacube: Some stars from the DPOSS survey

Datacube: An AVIRIS image of San Francisco Bay

400-2500 nm in 224 bands (R. Green, JPL)

atmospheric absorption

Concentrating Information

e.g. Principal Component Analysis
• Given a set of vectors
• Compute dot products (same as correlations)
• Diagonalize
• Throw out weaker (noise) components
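The steps listed on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the speaker's code: the function name `concentrate`, the synthetic data, and the choice to mean-centre the vectors before taking dot products are all assumptions.

```python
import numpy as np

def concentrate(vectors, keep):
    """Project row vectors onto their `keep` strongest principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Dot products between the centred feature columns (the covariance matrix).
    cov = centered.T @ centered / (len(vectors) - 1)
    # Diagonalize; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the strongest components and throw out the weaker (noise) ones.
    order = np.argsort(eigvals)[::-1][:keep]
    return centered @ eigvecs[:, order], eigvals[order]

# Synthetic example: one real degree of freedom plus weak noise in 3 dimensions.
rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 1)) @ np.array([[3.0, 2.0, 1.0]])
data = signal + 0.01 * rng.normal(size=(200, 3))
scores, variances = concentrate(data, keep=1)
```

Here a single component carries almost all of the variance, so three columns are concentrated into one.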

Information concentration: Principal Component Analysis

Event Sets

Created by pattern matching
• from a known rule
• from a training set
• by finding clusters

Event Set = Table

Column: name=longitude, content=Earth coordinate, units=degrees, datatype=double, display=f6.2
Values: 43.4, 87.2, 83.2

Column: name=ID, content=key, units=none, datatype=char
Values: E3948547, E3948545, E3943766

10^8?

10^3?
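One way to picture "Event Set = Table" is a list of columns, each carrying the metadata fields shown on the slide (name, content, units, datatype, display). The sketch below is only an illustration: the class names `Column` and `EventSet` are hypothetical, the sample values are read from the slide as three entries per column, and pairing them into rows by position is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    """One table column plus the metadata fields named on the slide."""
    name: str
    content: str
    units: str
    datatype: str
    display: str = ""
    values: list = field(default_factory=list)

@dataclass
class EventSet:
    """A set of events held as a table of equally long columns."""
    columns: list

    def rows(self):
        # Transpose the column lists into one tuple per event.
        return list(zip(*(c.values for c in self.columns)))

ident = Column("ID", "key", "none", "char",
               values=["E3948547", "E3948545", "E3943766"])
longitude = Column("longitude", "Earth coordinate", "degrees", "double",
                   display="f6.2", values=[43.4, 87.2, 83.2])
events = EventSet([ident, longitude])
```

Keeping units and content type next to the values is what lets a later service decide, for example, whether two columns from different catalogs can be joined.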

Gravitational Lenses

A. Szalay, Johns Hopkins

Pattern matching finds events in datacubes

Black hole collisions
LIGO: Laser Interferometer Gravitational-Wave Observatory
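For a one-dimensional datacube (a time series), pattern matching from theory can be sketched as sliding a known template along the data and recording where the correlation peaks. This is a generic matched-filter illustration, not a description of any actual LIGO pipeline; the function name `find_events`, the chirp-like template, and the threshold are assumptions.

```python
import numpy as np

def find_events(series, template, threshold):
    """Return sample offsets where the template correlates strongly."""
    t = template - template.mean()
    t = t / np.linalg.norm(t)
    # Slide the template along the series; 'valid' keeps only full overlaps.
    score = np.correlate(series, t, mode="valid")
    # An event is a local maximum of the score above the threshold.
    peaks = [i for i in range(1, len(score) - 1)
             if score[i] > threshold
             and score[i] >= score[i - 1] and score[i] > score[i + 1]]
    return peaks, score

# Synthetic example: a rising-frequency template buried twice in weak noise.
n = np.arange(50)
template = np.sin(2 * np.pi * (0.02 * n + 0.004 * n**2))
rng = np.random.default_rng(0)
series = 0.05 * rng.normal(size=500)
series[120:170] += template
series[300:350] += template
cut = 0.7 * np.linalg.norm(template - template.mean())
peaks, score = find_events(series, template, threshold=cut)
```

The resulting list of offsets is itself a small event set: one row per detection, with the offset as a column.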

Creating Event Sets

Given a set of volcanoes, find a lot more volcanoes
Here we use Singular Value Decomposition
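A rough sketch of how a training set plus Singular Value Decomposition can flag new candidates: learn a low-rank basis from example patches, then score a new patch by how poorly that basis reconstructs it. The slide does not give the actual procedure, so the functions `fit_basis` and `residual`, the Gaussian-bump stand-in for a volcano, and the rank are all assumptions.

```python
import numpy as np

def fit_basis(examples, rank):
    """Mean and leading right singular vectors of flattened training patches."""
    flat = examples.reshape(len(examples), -1)
    mean = flat.mean(axis=0)
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:rank]

def residual(patch, mean, basis):
    """Reconstruction error of a patch within the learned subspace."""
    x = patch.ravel() - mean
    return np.linalg.norm(x - basis.T @ (basis @ x))

# Hypothetical training set: 8x8 Gaussian bumps of varying height and width.
yy, xx = np.mgrid[-4:4, -4:4]

def bump(amp, width):
    return amp * np.exp(-(xx**2 + yy**2) / (2 * width**2))

training = np.array([bump(a, w) for a in (1.0, 1.5, 2.0) for w in (1.0, 1.5, 2.0)])
mean, basis = fit_basis(training, rank=3)

candidate = bump(1.2, 1.2)                                # unseen, but bump-like
background = np.random.default_rng(1).normal(size=(8, 8))  # not bump-like
```

A patch with a small residual resembles the training examples; thresholding the residual over every patch of an image would yield the enlarged event set.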

Supervised Classification

[Figure labels: all sources; stellar, galaxy, compact galaxy; high, medium, low fX/fopt; active dM stars, BLAGN, NELGs, possible hi-z quasar, F/G stars?, normal galaxies?]

[Figure: multiparameter data, colour-colour-fX/fopt. Symbols: X-ray source counterparts; contours: all optical objects; labelled class: BLAGN]

Mike Watson, Leicester University

Integrating Datacubes

Find a mapping from one domain to the other
Registration of DPOSS and Hubble Deep Field

Datacube Registration: Movement of ice inferred from registration
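The simplest mapping between two datacubes is a pure translation, which can be estimated from the peak of their cross-correlation. The sketch below assumes integer shifts and periodic boundaries and uses a hypothetical function name `estimate_shift`; real registration, such as DPOSS against Hubble Deep Field, would also need rotation, scale, and distortion terms.

```python
import numpy as np

def estimate_shift(reference, moved):
    """Integer (row, col) shift s such that np.roll(reference, s) matches moved."""
    # Cross-correlation computed through the Fourier transform.
    spectrum = np.fft.fft2(moved) * np.conj(np.fft.fft2(reference))
    correlation = np.fft.ifft2(spectrum).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, correlation.shape))

# Synthetic example: shift a random image by a known amount and recover it.
rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))
shifted = np.roll(image, (5, -3), axis=(0, 1))
shift = estimate_shift(image, shifted)
```

Applied to two images of the same ice field taken at different times, the recovered shift per tile would be the inferred movement.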

Integrating Event Sets

• Database Join
• Fuzzy Join, e.g. astronomical crossmatch
• Distributed Join: does the Grid do databases?
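A fuzzy join differs from a database join in that keys match when they are close rather than equal. For an astronomical crossmatch the "key" is sky position, so a minimal brute-force sketch looks like this. The catalog rows, identifiers, and the 2 arcsecond radius are made up for illustration, and a real crossmatch over large catalogs would use a spatial index rather than comparing every pair.

```python
import math

def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))

def crossmatch(left, right, radius_deg):
    """Pair each left source with its nearest right source within radius_deg."""
    matches = []
    for lid, lra, ldec in left:
        best = min(right,
                   key=lambda r: separation_deg(lra, ldec, r[1], r[2]),
                   default=None)
        if best is not None and separation_deg(lra, ldec, best[1], best[2]) <= radius_deg:
            matches.append((lid, best[0]))
    return matches

# Hypothetical rows: (identifier, right ascension, declination) in degrees.
catalog_a = [("A1", 10.0, 20.0), ("A2", 50.0, -5.0)]
catalog_b = [("B1", 10.0002, 20.0001), ("B2", 120.0, 30.0)]
pairs = crossmatch(catalog_a, catalog_b, radius_deg=2.0 / 3600.0)
```

Sources left out of `pairs` form the "unmatched" sets, which are often as interesting as the matches.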

Integration of Star Catalogs

Roy Williams

2MASS versus DPOSS cross-identification, with
– j_m as 2MASS magnitude and
– I_mtotn as DPOSS magnitude

2MASS: j_m <= 15
DPOSS: I_mtotn <= 18

DPOSS unmatched

2MASS matched

DPOSS matched

2MASS unmatched

Cross Matching

Visualizing Event Sets: Unsupervised clustering

50000 stars in color-color space
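Unsupervised clustering of points in a two-colour plane can be sketched with a plain k-means loop. The slide does not say which algorithm produced its figure, so k-means is a stand-in here, and the function name, the two synthetic populations, and the axis meanings are assumptions.

```python
import numpy as np

def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means: returns a cluster label per point and the cluster centres."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Assign every point to its nearest centre.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centre to the mean of its members.
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    # Final assignment against the updated centres.
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1), centers

# Two synthetic populations in a hypothetical colour-colour plane.
rng = np.random.default_rng(0)
population_a = rng.normal(loc=(0.4, 0.2), scale=0.05, size=(100, 2))
population_b = rng.normal(loc=(1.4, 1.0), scale=0.05, size=(100, 2))
colors = np.vstack([population_a, population_b])
labels, centers = kmeans(colors, k=2)
```

Colouring a scatter plot by `labels` gives the kind of picture the slide describes, with each cluster standing out as its own population.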

A Grid of Services

Human gets Data

Network of Services

Understood by human

Further processing after format change

Grid of pipes and engines

Switches and actuators

data flow

Example Grid of Services

[Figure labels: Storage Service, DPOSS Service, Catalog Service, User’s code, Crossmatch Service, 2MASS Service, Query Check Service, Query Estimator]

flexible complex metadata AND broadband binary

Computing Challenges

• High-dimensional: Clustering & Classification, Visualization, Outlier Detection

• Visualization of 10^10 points

• Database access to 10^10 points

• Large Distributed Join

Standards needed

• Bundling diverse objects together with code and references

• Referencing data resources on the Grid: local, remote, replicated, ....

Problem Solving Environment

[Figure labels: Storage Service, DPOSS Service, Catalog Service, User’s code, Crossmatch Service, 2MASS Service, Query Check Service, Query Estimator]

• Plumbing (big data) and electrical (control, metadata)

• Web service and workflow

• Finding service classes/implementations by semantics

• GUI / Executive / IO adapters / Algorithms
