Computational and statistical problems for the Virtual Observatory

Page 1: Computational and statistical problems for the Virtual Observatory

2004-06-03 CMU-CS lunch talk, Gerard Lemson

Computational and statistical problems for the Virtual Observatory

With contributions from/thanks to:

GAVO team: Wolfgang Voges, Matthias Steinmetz, Harry Enke, Hans-Martin Adorf

Joerg Colberg (NVO@UPitt), Pat Dowler (CVO), Tony Banday (MPA), Class X team

Page 2: Computational and statistical problems for the Virtual Observatory

Overview

• Intro to VO
• IVOA standards process
• Some concrete examples, demos
• Scenarios, science cases
• Interesting problems

Page 3: Computational and statistical problems for the Virtual Observatory

Intro to VO

• Very large data sets
• Multi-wavelength astronomy made easy
• Federation of distributed archives.
• Publication of expert services.
• New software developments.
• Why contribute?
• Too easy to do bad science?

Page 4: Computational and statistical problems for the Virtual Observatory

Page 5: Computational and statistical problems for the Virtual Observatory

IVOA standards and specifications

• Collaboration of national VOs
• Develop standards for interoperability
  – publication (registry)
  – description (dm, ucd)
  – query (dal, voql)
  – data transfer (votable)
  – services (grid/web services)
• Interest groups:
  – architecture
  – applications
  – theory

Page 6: Computational and statistical problems for the Virtual Observatory

Babylonian confusion

Page 7: Computational and statistical problems for the Virtual Observatory

VO domain model as Esperanto

Page 8: Computational and statistical problems for the Virtual Observatory

Page 9: Computational and statistical problems for the Virtual Observatory

[Figure: (GA)VO architecture diagram. Labelled components: Workstation; Portal; Registry with Registry Query Interface; VOQL Engine (parsing, splitting, planning); WSDL/SOAP; SOAP (Web) Services; MetaData Repository with Domain Model and Resource Model; Data Access Layer. Archives, each behind a "Mapping, Services" layer: MPA Simulations; ROSAT (RASS Fields, BSC, RASS Photons); SDSS (SQL Server). Exchange formats: ADQL/VOTable, VOTable. Other labels: (GA)VO, NGSE.]

Page 10: Computational and statistical problems for the Virtual Observatory

Protocols

• VOTable + UCD, DM-based XML + XSLT
• SCS/SIAP/SSAP, ADQL, VOQL
• SkyNode
• Registry resource model and harvesting interface
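To make the "VOTable + UCD" item concrete, here is a minimal sketch of reading a VOTable-style document with Python's standard library and selecting a column by its UCD instead of its local name. The sample document, its column names, units and UCD strings are made up for illustration. Real VOTables declare an XML namespace and carry more metadata, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free VOTable-like document (illustrative only).
SAMPLE = """<VOTABLE version="1.0">
  <RESOURCE>
    <TABLE name="sources">
      <FIELD name="RA" datatype="double" unit="deg" ucd="POS_EQ_RA_MAIN"/>
      <FIELD name="DEC" datatype="double" unit="deg" ucd="POS_EQ_DEC_MAIN"/>
      <FIELD name="FLUX" datatype="double" unit="mJy" ucd="EXAMPLE_FLUX"/>
      <DATA><TABLEDATA>
        <TR><TD>10.5</TD><TD>-3.25</TD><TD>12.0</TD></TR>
        <TR><TD>11.0</TD><TD>-3.00</TD><TD>7.5</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def read_votable(text):
    """Return (fields, rows): fields is a list of FIELD attribute dicts,
    rows is a list of dicts keyed by field name with float values."""
    root = ET.fromstring(text)
    table = root.find("./RESOURCE/TABLE")
    fields = [dict(f.attrib) for f in table.findall("FIELD")]
    names = [f["name"] for f in fields]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [float(td.text) for td in tr.findall("TD")]
        rows.append(dict(zip(names, cells)))
    return fields, rows


def column_by_ucd(fields, rows, ucd):
    """Select a column by its UCD rather than by its local column name."""
    name = next(f["name"] for f in fields if f.get("ucd") == ucd)
    return [row[name] for row in rows]


fields, rows = read_votable(SAMPLE)
ra = column_by_ucd(fields, rows, "POS_EQ_RA_MAIN")
print(ra)
```

Looking columns up by UCD is what lets a generic client work with tables whose publishers chose different column names.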

Page 11: Computational and statistical problems for the Virtual Observatory

Data models

• Targeted “small” data models
  – Quantity
  – Observation
  – Simulation
• Domain model as ontology
• Meta-data repository
• Bindings
• Representations, views, transformations

Page 12: Computational and statistical problems for the Virtual Observatory

[Figure: UML class diagram of the full VO domain model. Attributes, associations and multiplicities are not reliably recoverable from the extraction; the classes shown, grouped by package, are:]

Standards: Category, Unit, AtomicUnit, ComponentUnit, CompoundUnit, CoordinateSystem, EnergyBand, Name, NamingSystem, ClassificationSystem, ReferenceSystem, MagnitudeSystem, Constant, PhysicalConstant

Values: Value, Quantity, AtomicQuantity, ComponentQuantity, CompositeQuantity, Classifier, Identifier

Experiments: Experiment, Result, Subject, Image, ObjectList, ConfidenceIndication, ValueAssignment, Measurement, Identification, Classification, Configuration, InputData, TimeOrderedData, VisibilityData, Uncertainty

Protocols: Protocol, ConfigurationDescriptor, Objective, Variable, AstronomicalObservatory, Analysis, Callibration, Simulator, SourceExtraction, InputDataType, DataProcessing, Stacking, CrossMatching, Query

Products: PhysicalArtifact

Types: AbstractType, Datatype, Representation, Field

Phenomenology: Phenomenon, NumericPhenomenon, AtomicNumericPhenomenon, BaseNumericPhenomenon, DerivedNumericPhenomenon, DerivedPhenomenonComponent, CategoricalPhenomenon, CompositePhenomenon, DecompositionalPhenomenon, PositionalPhenomenon, Identification, Property, ScientificArtifact, SubjectType, SpatialSubjectType, Substance, TangibleObject

Page 13: Computational and statistical problems for the Virtual Observatory

[Figure: simplified UML class diagram of the core of the domain model. Classes and the attributes shown («key» marks the key attribute):]

Experiments::Experiment: «key» identifier : string
Experiments::Result
Experiments::ValueAssignment
Experiments::Measurement
Experiments::Classification
Experiments::Subject
Values::Quantity
Values::Classifier
Protocols::Protocol: «key» identifier : string, documentationURL : string
Protocols::Objective: «key» identifier : string
Protocols::Variable: «key» name : string, isIndependent : boolean
Phenomenology::SubjectType: «key» name : string, description : string
Phenomenology::Property: «key» name : string
Phenomenology::Phenomenon: «key» name : string, description : string
Products::PhysicalArtifact: «key» locator : string, description : string

Association role names shown: recipe, result, values, value, variable, variables, observable, observation, type, property, phenomenon, artifact.
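One way to read the simplified model is as a set of plain record types. The following is a minimal sketch in Python, assuming that an Experiment follows one Protocol (its "recipe") and owns Results made of Measurements. The direction and multiplicity of these associations are guesses from the extracted labels, and representing a unit as a string is a simplification of the model's Unit classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Protocol:
    identifier: str              # «key» attribute in the diagram
    documentation_url: str = ""


@dataclass
class Quantity:
    amount: float
    unit: str                    # simplification: the model has Unit classes


@dataclass
class Measurement:
    variable: str                # name of a Protocols::Variable
    value: Quantity


@dataclass
class Result:
    values: List[Measurement] = field(default_factory=list)


@dataclass
class Experiment:
    identifier: str              # «key» attribute in the diagram
    recipe: Protocol             # assumed: one protocol per experiment
    results: List[Result] = field(default_factory=list)


# Hypothetical instance, for illustration only.
obs = Experiment(
    identifier="example-observation-1",
    recipe=Protocol("example-source-extraction"),
)
obs.results.append(Result([Measurement("flux", Quantity(12.0, "mJy"))]))
print(obs.results[0].values[0].value.amount)
```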

Page 14: Computational and statistical problems for the Virtual Observatory

Theory in the VO

With Joerg Colberg

http://ivoa.net/pub/papers/TheoryInTheVO.pdf

• Spatial query protocols irrelevant
• No object-based federation
• New phenomena/observables.
• Different kind of provenance.
• Model dependency.
• Theoretical archives rather unstructured.
• Theory/observational interface.

Page 15: Computational and statistical problems for the Virtual Observatory

Observed

Simulated

(Thanks to Alexis Finoguenov, Ulrich Briel, Peter Schuecker, MPE)

Thanks to Volker Springel

Page 16: Computational and statistical problems for the Virtual Observatory

Some concrete efforts

• NVO (USA): Registry (DIS), ADQL, SkyNode, data mining (UPitt+CMU)
• AstroGrid (UK): grid/web services, work flows
• AVO (ESO, CDS, AstroGrid): Aladin visualization tool, science demos
• CVO (Canada): archive federation
• France VO: GalICS
• GAVO (Germany): data publication (RASS photons), application prototypes, data mining, theory

Page 17: Computational and statistical problems for the Virtual Observatory

Scenarios, use cases, results

• Registry based data discovery and retrieval (GAVO, DIS)
• Class X classifier and generalizations
• X-Ray cluster analysis using simulations
• Cluster detection by combining SDSS and RASS catalogues (Schuecker et al, astro-ph/0403116)
• Discovery of obscured quasars using VO tools (Padovani et al, astro-ph/0406056)

Page 18: Computational and statistical problems for the Virtual Observatory

Typical workflow

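[Figure: workflow diagram; the order of the boxes is not recoverable from the extraction. Steps: Prepare input sources (upload, query, ..); Find potential counterparts; Identify; Extract generalized SED; Analyse result: classify, plot, fit. Components: External archives; List upload services (SOAP/HTTP); Probabilistic Matcher; Analysis service.]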

Page 19: Computational and statistical problems for the Virtual Observatory

Multi-Catalogue Multi-Cone Search

[Figure: data-flow diagram. Components: Service Registry; Simple Cone Search Services #1, #2 and #3; "Download Manager"; VOTable Processor; Probabilistic Matcher; Matcher Data Sets; tables on local disk. Flow labels: Base URLs; one or more SCS queries; Internet; VOTable(s); Table.]
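The core operation behind a multi-catalogue multi-cone search can be sketched locally: for each input position, select the rows of each catalogue that fall within a given angular radius. The sketch below uses in-memory lists in place of remote cone-search services; the catalogue names, row layout and function names are made up for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(catalogue, ra, dec, radius_deg):
    """Return the rows of `catalogue` within `radius_deg` of (ra, dec)."""
    return [row for row in catalogue
            if angular_separation_deg(ra, dec, row["ra"], row["dec"]) <= radius_deg]


def multi_cone_search(catalogues, positions, radius_deg):
    """Run one cone per position against every catalogue.
    Returns {(position_index, catalogue_name): [rows]}."""
    out = {}
    for i, (ra, dec) in enumerate(positions):
        for name, cat in catalogues.items():
            out[(i, name)] = cone_search(cat, ra, dec, radius_deg)
    return out


# Made-up catalogues standing in for remote cone-search services.
catalogues = {
    "cat_a": [{"id": "a1", "ra": 10.000, "dec": 20.000},
              {"id": "a2", "ra": 10.500, "dec": 20.000}],
    "cat_b": [{"id": "b1", "ra": 10.001, "dec": 20.001}],
}
hits = multi_cone_search(catalogues, [(10.0, 20.0)], radius_deg=0.01)
print({key: [r["id"] for r in rows] for key, rows in hits.items()})
```

In a distributed setting each `cone_search` call would become a request to a service found through the registry, and the per-catalogue results would then be handed to the matcher.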

Page 20: Computational and statistical problems for the Virtual Observatory

ClassX@GAVO

[Figure: ClassX@GAVO application architecture. Components: Multi-Catalogue List Upload; Multi-Catalogue XMatch; Classification; SUMMS@VizieR, NVSS@VizieR and USNO@VizieR; RASS Archive@GAVO with RASS Source Query; ClassX Classifier@HEASARC; Probabilistic Matcher. JSP/HTML forms: PoseQuery, DisplayQueryResult, DisplayMatchResult, DisplayXMatchResult, DisplayClassXResult. Connection labels: HTTP GET, HTTP POST, CSV, HTML, SQL, JDBC ResultSet, Java Objects, Java API Call.]

Page 21: Computational and statistical problems for the Virtual Observatory

Theory/observational interface: X-Ray clusters

Goal: interpret observations of an X-Ray cluster using results of hydro simulations:

1. Extract parameters from the observation (services) that can be queried directly (dm, ucd).
2. Find simulations that may be relevant, i.e. that are “similar” to the observation, by searching the registry for hydro simulations of clusters (registry, voql). This requires simulation results to be published and described in sufficient detail (dm, ucd).
3. Observe the simulations using a “virtual telescope” (application, grid/web services) configured according to the telescope configuration extracted from the observation (dm).
4. Compare the real with the virtual observation (services).
5. For an interesting simulation, extract the full simulation result (dal) for further analysis,
6. or analyse the simulation using services (grid services) provided by the archive or some other service provider.
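Step 2 above depends on some notion of "similar". One simple possibility, shown here only as a sketch and not as the method used in the talk, is to rank published simulations by a scaled Euclidean distance between the parameters extracted from the observation and the parameters in each simulation's description. The parameter names, values and scales are hypothetical.

```python
import math


def similarity_distance(observed, simulated, scales):
    """Scaled Euclidean distance between two parameter dictionaries.
    `scales` gives a typical spread per parameter so that parameters
    with different units contribute comparably."""
    return math.sqrt(sum(((observed[k] - simulated[k]) / scales[k]) ** 2
                         for k in scales))


def rank_simulations(observed, simulations, scales, top=3):
    """Return the `top` simulations closest to the observation."""
    ranked = sorted(
        simulations,
        key=lambda s: similarity_distance(observed, s["params"], scales),
    )
    return ranked[:top]


# Hypothetical parameters: gas temperature (keV) and log10 X-ray luminosity.
observed = {"kT_keV": 6.0, "log_Lx": 44.8}
simulations = [
    {"id": "sim-001", "params": {"kT_keV": 2.0, "log_Lx": 43.9}},
    {"id": "sim-002", "params": {"kT_keV": 6.5, "log_Lx": 44.9}},
    {"id": "sim-003", "params": {"kT_keV": 9.0, "log_Lx": 45.3}},
]
scales = {"kT_keV": 2.0, "log_Lx": 0.5}

best = rank_simulations(observed, simulations, scales, top=2)
print([s["id"] for s in best])
```

The ranking is only as good as the published descriptions, which is why the slide stresses that simulations must be described in sufficient detail.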

Page 22: Computational and statistical problems for the Virtual Observatory

Theory/observational interface

[Figure: workflow diagram. Components: X-Ray Observation; Feature Extractor; Registry service; Simulation archive; Virtual telescope; Comparator; Analysis software. Labelled steps: Extract queriable features (dm, ucd); Retrieve telescope configuration (dm); “Find similar simulations”: find possibly relevant hydro simulations (registry, dm, voql); Observe selected simulations virtually (application, services); Retrieve data (dal); Compare; Analyse.]

Page 23: Computational and statistical problems for the Virtual Observatory

Computational, statistical and astronomical challenges I

Data models
• Data modeling
• Data model transformations, views
• Archive structure
• Database tuning

Querying, matching
• Distributed query algorithms
• Probabilistic matchers, systematic errors, identification of moving sources
• Improve identification using full point process information
• Add physical properties, not just position, to identification
• Complex, frequency dependent source definition
• Characterization of complex results in "few" parameters for discovery (PCA (after transformation)? 3D->2D?)
• Comparison of real and virtual observations

Page 24: Computational and statistical problems for the Virtual Observatory

Usage

• Complex model
• Simplify using view concept
• Example from RDB
• XSLT for translation between domain XSD and application-specific derived schemas.

Page 25: Computational and statistical problems for the Virtual Observatory

SextractorGalaxies: _RAJ2000, _DECJ2000, M_APP, classification, image

CREATE VIEW SEXTRACTOR_GALAXIES AS
SELECT S.RA AS _RAJ2000,
       S.DEC AS _DECJ2000,
       -2.5 * LOG(S.FLUX) AS M_APP,  -- magnitude: base-10 logarithm
       S.CLASSIFICATION,
       I.STORAGE_URL AS IMAGE
FROM SOURCE S, SOURCE_CATALOGUE SC,
     IMAGE I, SOURCE_EXTRACTOR AS SE
WHERE S.CLASS = 'GALAXY'
  AND S.FLUX < 15
  AND S.CATALOGUE_ID = SC.ID
  AND I.ID = SC.IMAGE_ID            -- join through the table aliases
  AND SC.EXTRACTED_WITH = SE.ID
  AND SE.IDENTIFIER = 'SExtractor'
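The view idea can be tried out with Python's built-in `sqlite3` module. The sketch below assumes a toy schema with the same table names as the slide; the column set and the sample rows are made up for illustration. Because not every SQLite build provides a logarithm function, the view here exposes the flux and the magnitude is computed on the client side.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_extractor (id INTEGER PRIMARY KEY, identifier TEXT);
CREATE TABLE image (id INTEGER PRIMARY KEY, storage_url TEXT);
CREATE TABLE source_catalogue (id INTEGER PRIMARY KEY, image_id INTEGER,
                               extracted_with INTEGER);
CREATE TABLE source (id INTEGER PRIMARY KEY, catalogue_id INTEGER,
                     ra REAL, dec REAL, flux REAL, class TEXT);

-- Made-up sample rows.
INSERT INTO source_extractor VALUES (1, 'SExtractor');
INSERT INTO image VALUES (1, 'file:///example/image1.fits');
INSERT INTO source_catalogue VALUES (1, 1, 1);
INSERT INTO source VALUES (1, 1, 150.1, 2.2, 10.0, 'GALAXY');
INSERT INTO source VALUES (2, 1, 150.2, 2.3, 10.0, 'STAR');

-- The view hides the joins and exposes application-friendly column names.
CREATE VIEW sextractor_galaxies AS
SELECT s.ra AS _RAJ2000, s.dec AS _DECJ2000,
       s.flux AS FLUX, i.storage_url AS IMAGE
FROM source s
JOIN source_catalogue sc ON s.catalogue_id = sc.id
JOIN image i ON i.id = sc.image_id
JOIN source_extractor se ON se.id = sc.extracted_with
WHERE s.class = 'GALAXY' AND se.identifier = 'SExtractor';
""")

rows = conn.execute(
    "SELECT _RAJ2000, _DECJ2000, FLUX, IMAGE FROM sextractor_galaxies"
).fetchall()

# Derive the apparent magnitude from the flux, as in the slide's M_APP column.
galaxies = [
    {"_RAJ2000": ra, "_DECJ2000": dec,
     "M_APP": -2.5 * math.log10(flux), "IMAGE": url}
    for ra, dec, flux, url in rows
]
print(galaxies)
```

A client that only knows the view never sees the four underlying tables, which is the simplification the "view concept" is meant to provide.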

Page 26: Computational and statistical problems for the Virtual Observatory

Probabilistic cross matching

[Figure: cross-matching example with the catalogues RASS FSC, USNO and NVSS.]
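As a rough illustration of what a probabilistic matcher computes, the sketch below assigns each candidate counterpart a probability based only on its angular separation from the source. It assumes a circular Gaussian positional error and a uniform density of unrelated background objects. This is a deliberately simplified textbook-style formulation, not the matcher used in the talk, and all numbers are made up.

```python
import math


def match_probabilities(separations_arcsec, sigma_arcsec, background_density):
    """Simplified positional match probabilities.

    Each candidate gets a 2-D Gaussian weight; `background_density`
    (chance objects per square arcsec) acts as the weight of the
    "no counterpart" hypothesis.
    Returns (per-candidate probabilities, probability of no counterpart)."""
    norm = 1.0 / (2.0 * math.pi * sigma_arcsec ** 2)
    weights = [norm * math.exp(-d * d / (2.0 * sigma_arcsec ** 2))
               for d in separations_arcsec]
    total = background_density + sum(weights)
    return [w / total for w in weights], background_density / total


# Three hypothetical candidates at 2, 15 and 40 arcsec from the source.
probs, p_none = match_probabilities([2.0, 15.0, 40.0], sigma_arcsec=10.0,
                                    background_density=1e-4)
print([round(p, 3) for p in probs], round(p_none, 3))
```

Adding physical properties to the identification, as the challenges slides suggest, would amount to multiplying each positional weight by further likelihood terms.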

Page 27: Computational and statistical problems for the Virtual Observatory

Computational, statistical and astronomical challenges II

Data mining
• Algorithms for analyzing generic SEDs (classifiers? visualization? incorrect identification?)
• Source extraction using multiple images at very different wavelengths: how to take into account the different physics/images of the same source at different wavelengths?
• Cluster finders using multiple catalogues
• Publish sophisticated statistical analysis algorithms

Implementation
• Efficient implementation of virtual telescopes (parallel, distributed, grid based, data structures)