Bridging the Digital Divide: eGY and Virtual Observatories

Spring 2005 AGU Joint Assembly 24 May 2005 Barbara J. Thompson NASA GSFC

Bridging the Digital Divide:

eGY and Virtual Observatories

Barbara J. Thompson, Solar Physics Branch and IHY/eGY Team

NASA Goddard Space Flight Center, with

Bob Bentley, Rick Bogart, Alisdair Davey, Craig DeForest, Joe Gurman, Neal Hurlburt, Vladimir Papitashvili, Aaron Roberts, Adam Szabo, C. Alex Young, and Dominic Zarro

http://egy.org - VO working group and enabling activities
http://lwsde.gsfc.nasa.gov - results from a broad community VO discussion & workshop

Why don’t people use all existing data?

• Don’t know it exists
• Don’t know how to obtain the data
• Data permission issues
• Don’t know enough about it to be able to analyze it
• Don’t have access to or knowledge of the software
• Takes too much effort to analyze it all
• Data format or software incompatibility

All of these are clearly related to the objectives of the eGY. Virtual observatories will play a major role in helping us cope with most of these issues.

What is a VO?

More importantly, what is the difference between a VO and a data system?

- A data system’s primary concern is storage and service of data, and the accessibility / user interface can vary greatly between systems.
- A virtual observatory does not necessarily store data, but provides a single access point to available data & models, using a standard interface.

What is a VO?

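[Diagram: Data System Approach vs. VO Approach. Data System Approach: the user runs a separate search/query, and gets a separate result, from each of Data System I and Data System II (data products, model services). VO Approach: a streamlined user query is translated to a distributed query/search across data systems, products, and model services, returning catalogable results. Further labels: Users, VO, New products, Analysis tools, Good ideas.]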
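The VO approach in the diagram (one streamlined query translated into a distributed query/search, returning catalogable results) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular VO is built: the `Record` type, its fields, and wrapping each data system as a plain function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Record:
    """One catalog entry returned by a data system (hypothetical fields)."""
    source: str   # data system that holds the file
    file_id: str  # provider-specific identifier
    start: str    # ISO 8601 start time, e.g. "2005-05-24T01:00:00"


# Assumption: each data system is wrapped as a function that takes the
# VO's standard query and returns a list of records.
DataSystem = Callable[[Dict[str, str]], List[Record]]


def distributed_search(query: Dict[str, str], systems: Dict[str, DataSystem]) -> List[Record]:
    """Send one user query to every data system and merge the answers."""
    merged: List[Record] = []
    seen: Set[Tuple[str, str]] = set()
    for search in systems.values():
        try:
            results = search(query)
        except Exception:
            # An unreachable system is skipped rather than failing the whole search.
            continue
        for rec in results:
            key = (rec.source, rec.file_id)
            if key not in seen:  # drop duplicates reported by more than one system
                seen.add(key)
                merged.append(rec)
    # Sorting on the ISO time string gives one chronological catalog,
    # provided every system uses the same timestamp format.
    merged.sort(key=lambda r: r.start)
    return merged
```

Skipping a data system that fails, instead of failing the whole search, is only one possible design choice; a real broker might instead report which systems did not answer.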

How a VO works

How a VO works, and which services it provides, are largely determined by the community it serves. Still, many VO’s share the following features:

• An interface & tools that make it easy to locate and retrieve data from catalogs, archives, and databases
• Interoperability: data services that can be used regardless of the client’s computing platform, operating system, and software capabilities
• Tools or access to tools for data analysis, modeling, simulation, and visualisation
• Tools to compare observations with results obtained from models, simulations, and theory
• Access to data in near real-time when necessary, as well as archived and historical data

Different modes of implementation, such as socket-based programming and the “pull vs. push” concept, depend on user needs & preferences, global access issues, & the complexity of the data/models involved.
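The “pull vs. push” distinction can be illustrated with a toy feed object. In pull mode the client asks for whatever is new since its last request; in push mode the provider hands each new item to registered subscribers as soon as it arrives. The `DataFeed` class and its methods are hypothetical and stand in for whatever transport (sockets, web services, etc.) a real VO uses.

```python
from typing import Callable, List


class DataFeed:
    """Hypothetical near-real-time feed supporting both pull and push delivery."""

    def __init__(self) -> None:
        self._items: List[str] = []
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        """Register a client that wants items pushed to it."""
        self._subscribers.append(callback)

    def publish(self, item: str) -> None:
        """Store a new item and push it to every subscriber immediately."""
        self._items.append(item)
        for callback in self._subscribers:
            callback(item)

    def pull(self, since: int) -> List[str]:
        """Pull: return every item after index `since` (the count the client has already seen)."""
        return self._items[since:]
```

Push suits low-latency needs such as near-real-time data; pull leaves the client in control of when, and how much, it downloads, which can matter where network access is limited.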

Key Features of a (Successful) VO

• Universal accessibility: web browser interface or a standard, easily implemented platform that can quickly be integrated with multiple user analysis environments
• Easy to join
• Provides not only data & models, but also information about and access to the tools, products & services that enable the science
• Doesn’t reinvent any wheels - uses what’s available and adds features and versatility
• Grass-rootsy, ground-up approach: few features have to be hard-wired or there from the beginning. New features and services can be added in an organic, community-based way
• Adaptive - most VO’s are continuously modified and improved
• Backward-compatible: because most VO’s are constantly being updated, improved, modified, upgraded…
• Less emphasis on formats, more emphasis on catalogability, accessibility & retrievability
• Community-supported: allows ideas and advances from the entire scientific community to be rapidly ingested for global use
• Focus remains on the fundamentals: enabling the user to locate, query & access data, models, tools & services

VO Data Provider’s Responsibilities and Requirements (there aren’t many!):

• Data must be accessible and retrievable in some agreed-upon standard way
• The data’s search interface must be compatible with the VO, or the data provider must provide metadata which can be queried
• Ideally, the data provider will also make available analysis software that is compatible with, and takes advantage of, the VO interface
• Data provider must still take responsibility to ensure intelligent analysis
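One way a provider can meet the “metadata which can be queried” requirement is to describe each file with a small, agreed-upon record that the VO can exchange and filter. The sketch below assumes JSON as the exchange format and invents a field set (`instrument`, `wavelength_angstrom`, `start`, `end`, `url`) purely for illustration; an actual VO would define its own standard.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ObservationMetadata:
    """Hypothetical per-file metadata record a provider might publish to a VO."""
    provider: str
    instrument: str
    wavelength_angstrom: float
    start: str  # ISO 8601 start time
    end: str    # ISO 8601 end time
    url: str    # where the file itself can be retrieved


def to_json(meta: ObservationMetadata) -> str:
    """Serialize a record for exchange with the VO."""
    return json.dumps(asdict(meta), sort_keys=True)


def from_json(text: str) -> ObservationMetadata:
    """Rebuild a record received from a provider."""
    return ObservationMetadata(**json.loads(text))


def select(records: List[ObservationMetadata], **criteria) -> List[ObservationMetadata]:
    """Return the records whose fields equal every given criterion."""
    return [r for r in records if all(getattr(r, k) == v for k, v in criteria.items())]
```

Note that only the metadata travels to the VO; the data themselves stay with the provider and are fetched from `url` when needed.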

What are the advantages of VO’s?

Ease of use / Accessibility:

• Enables greater access to data, including for researchers in developing nations. This is good for scientists around the globe, and more and more project reviews are taking into consideration the breadth of the “user base.”

• VO’s (should) talk to other VO’s. Compatibility with other data environments isn’t as much of an issue once you’ve joined a VO.

• Enables the use of multiple types of data – data format issues become more transparent

Cost and Time:

• Cheap for the data producers – to serve data, you needn’t set up a big data system. Just join a VO.

• Saves data retrieval time for the user of the data

• Saves analysis time – most VO’s are also able to provide information about and access to higher-level data products and results produced by other users

What are the advantages of VO’s?

Enables Science:

• You don’t look for the data, it finds you. Scientists will use more sources of data in analysis.
• Reproducibility/verification of results
• Data products and higher levels of processing can be served as well, regardless of the source
• Forms a foundation and interface for virtual analysis activities – VO’s are only the beginning!
• A VO also can provide a versatile interface to electronic analysis activities, such as virtual “workflows,” open-source software environments, and Virtual Analysis Environments (VAE)

What a VO Won’t Do

• VO’s will never be able to remove the need for an active human role in data analysis. However, they will allow humans to do it with much greater efficiency.

• Data mining
• Intelligent agents (AI/neural nets)
• Data provider must still take responsibility to ensure intelligent analysis

Why do we need VO’s?

- They save time & money
- Broker / matchmaker between data providers and scientists - “you don’t have to find the data, it finds you.”
- Enables global e-Science: A VO can provide a versatile interface to electronic analysis activities, such as virtual “workflows,” open-source software environments, and Virtual Analysis Environments (VAE)
- Can play a major role in enabling science in developing nations

http://egy.org - VO working group and enabling activities
http://lwsde.gsfc.nasa.gov - results from a broad community VO discussion & workshop

The Solar Physics Division of the AAS and the electronic Geophysical Year invite you to a

Showcase of Virtual Observatories

All virtual observatory initiatives are encouraged to participate. Room 237 will also serve as the “SPD Tutorial Facility” throughout this meeting. Please stop by to view the schedule of events.

Room 237, Morial Convention Center

May 24, 2005 2:00 - 4:00 PM

during the eGY poster session Tuesday afternoon

Backup / Misc Slides

We’re not downhearted (yet)

• The capacity, and capacity per unit price, of disk storage have steadily increased; doubling time is ~ 7 months

• Network-attached RAID (NAS) servers are becoming simpler and cheaper

• Simply storing the data is not a problem in the foreseeable future

Data source: Rev. C of Seagate SCSI disk drive product manuals (i.e., first OEM-quantity release)
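The ~7-month doubling time quoted above translates into growth factors with simple arithmetic: capacity after t months is C(t) = C0 × 2^(t/Td), where Td is the doubling time. This sketch uses the slide’s figure only as a default parameter; the functions themselves are generic and the names are illustrative.

```python
import math


def growth_factor(months: float, doubling_months: float = 7.0) -> float:
    """Factor by which capacity grows after `months`, for a given doubling time."""
    return 2.0 ** (months / doubling_months)


def implied_doubling_time(growth: float, months: float) -> float:
    """Doubling time implied by an observed growth factor over `months`."""
    if growth <= 1.0:
        raise ValueError("growth factor must exceed 1 for a finite doubling time")
    return months * math.log(2.0) / math.log(growth)
```

With a 7-month doubling time, one year of growth is a factor of about 3.3 (2^(12/7) ≈ 3.28).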

But how do we find and use all those data?

• Solar data searches tend to be for multiple-wavelength/étendue data sources for the same time period (i.e., not by RA and DEC or other position or object)

• Current archives are available on the Web at many sites, with heterogeneous search capabilities

• Most but not all data of current interest are in FITS format

• SolarSoft tree (in IDL™) offers ground-based and multiple space-based observatory support, as well as lots of “generic” functionality (“the wheel” that you don’t want to have to reinvent)
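Because solar searches are usually “everything observed during this time period” rather than a sky position, the core test is whether an observation’s time interval overlaps the requested window. A minimal sketch that groups matches by a wavelength or source label; the labels and the tuple-based catalog are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Interval = Tuple[datetime, datetime]  # (start, end), with start <= end


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed time intervals share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


def coverage_by_label(catalog: List[Tuple[str, Interval]], window: Interval) -> Dict[str, List[Interval]]:
    """Group every catalog entry that overlaps `window` under its wavelength/source label."""
    found: Dict[str, List[Interval]] = {}
    for label, interval in catalog:
        if overlaps(interval, window):
            found.setdefault(label, []).append(interval)
    return found
```

Intervals that merely touch the window at an endpoint count as overlapping here; a real search service might choose an open-interval convention instead.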

Biggering and biggering

• Solar data set sizes are growing at an impressive rate

• Data sets that are “only” several Tbyte in size will be dwarfed in 5 - 6 years

Data point sizes represent the data rate; the ordinate represents the total data volume from the source.
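The plot’s two encodings are linked by simple arithmetic: a source’s accumulated volume is its average data rate multiplied by how long it has been running. A sketch, assuming a constant average rate and decimal units (1 TB = 10^12 bytes); the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def volume_bytes(rate_bits_per_s: float, days: float) -> float:
    """Total bytes produced by a source running at a constant average rate."""
    return rate_bits_per_s * days * SECONDS_PER_DAY / 8.0


def volume_terabytes(rate_bits_per_s: float, days: float) -> float:
    """Same as volume_bytes, in decimal terabytes (10**12 bytes)."""
    return volume_bytes(rate_bits_per_s, days) / 1e12
```

For example, a source averaging 1 Mbit/s for 365 days accumulates about 3.9 TB.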

Toward a Virtual Solar Observatory

• Three parts:
  – distributed archives
  – metadata “broker” facility
  – Web-based front end

• Can have different implementations
  – XML
  – Gnutella
  – &c.

• Will be low-cost
  – No more than $1.2M over next four years

• We must be really smart:
  – Same model adopted by NVO, PDS; EGSO examining

• …. or maybe there’s exactly one obvious way to do this
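The three-part design (distributed archives, a metadata “broker” facility, a Web-based front end) implies that the broker, not the user, decides which archives can answer a query. A minimal sketch of that routing step, assuming each archive registers the instruments it holds and the years it covers; the class names and registry fields are hypothetical, and the front end is left out.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ArchiveEntry:
    """What an archive tells the broker about its holdings (hypothetical fields)."""
    name: str
    instruments: Set[str] = field(default_factory=set)
    first_year: int = 0
    last_year: int = 0


class MetadataBroker:
    """Keeps a registry of archives and routes each query to those that can answer it."""

    def __init__(self) -> None:
        self._registry: List[ArchiveEntry] = []

    def register(self, entry: ArchiveEntry) -> None:
        # In this sketch, joining is just adding a registry entry; no data move.
        self._registry.append(entry)

    def route(self, instrument: str, year: int) -> List[str]:
        """Names of archives holding `instrument` data for `year`, in registration order."""
        return [
            a.name
            for a in self._registry
            if instrument in a.instruments and a.first_year <= year <= a.last_year
        ]
```

Keeping only lightweight descriptions in the broker is what keeps such a design cheap: the archives stay distributed, and only queries and metadata pass through the central facility.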

Volodya:

I like your VO definition a lot, and thanks for the comments. They'll be a great help! I completely understand your comments about the CDAWs. The CDAWs had a great deal of difficulty generalizing their activities for individual users, while VO's start from that approach. We might have put the cart before the horse - it appears that the virtual analysis environments will be spawned by the VO's, and not the other way around. Can you think of any examples (outside of COSEC, my best example so far) of analysis environments extending from VO's? Your "pull" data concept enables you to store data for future use, and it can also store analyzed data and products as they are produced. I want to do a bit of prognosticating at the end of the talk, because I think the VO's will develop far beyond a streamlined data access system as they enable online analysis and joint analysis projects. Perhaps it's not too late for the CDAWs.