Spring 2005 AGU Joint Assembly 24 May 2005 Barbara J. Thompson NASA GSFC
Bridging the Digital Divide:
eGY and Virtual Observatories
Barbara J. Thompson
Solar Physics Branch and IHY/eGY Team
NASA Goddard Space Flight Center
with
Bob Bentley, Rick Bogart, Alisdair Davey, Craig DeForest, Joe Gurman, Neal Hurlburt, Vladimir Papitashvili, Aaron Roberts, Adam Szabo, C. Alex Young, and Dominic Zarro
http://egy.org - VO working group and enabling activities
http://lwsde.gsfc.nasa.gov - results from a broad community VO discussion & workshop
Why don’t people use all existing data?
• Don’t know it exists
• Don’t know how to obtain the data
• Data permission issues
• Don’t know enough about it to be able to analyze it
• Don’t have access to or knowledge of the software
• Takes too much effort to analyze it all
• Data format or software incompatibility
All of these are clearly related to the objectives of the eGY. Virtual observatories will play a major role in helping us cope with most of these issues.
What is a VO?
More importantly, what is the difference between a VO and a data system?
- A data system’s primary concern is storage and service of data, and the accessibility / user interface can vary greatly between systems.
- A virtual observatory does not necessarily store data, but provides a single access point to available data & models, using a standard interface.
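What is a VO? (diagram)
[Diagram comparing two approaches.
Data System Approach: the User sends a separate search/query to, and gets a separate result from, each source: Data System I, Data System II, Data Products, Model Services.
VO Approach: streamlined query, catalogable results. The user query is translated to a distributed query/search across Data Systems, Products, and Model Services.
Other labels in the diagram: Users, VO, New products, Analysis tools, Good ideas.]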
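The "user query is translated to distributed query/search" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` fields, the `Adapter` interface, the `VirtualObservatory` class, and the example.org URLs are hypothetical names invented here, not part of any real VO.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    """One catalog entry returned to the user (all fields are illustrative)."""
    provider: str
    product: str
    start: datetime
    url: str


# Hypothetical standard interface: every provider exposes a function that
# takes a time range and returns the matching records.
Adapter = Callable[[datetime, datetime], List[Record]]


class VirtualObservatory:
    """Single access point; it holds provider adapters, not the data."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Adapter] = {}

    def register(self, name: str, adapter: Adapter) -> None:
        """A data system joins the VO by supplying an adapter."""
        self._adapters[name] = adapter

    def query(self, start: datetime, end: datetime) -> List[Record]:
        """Fan one user query out to every provider and merge the results."""
        results: List[Record] = []
        for name in sorted(self._adapters):
            results.extend(self._adapters[name](start, end))
        return sorted(results, key=lambda r: (r.start, r.provider))


def static_adapter(records: List[Record]) -> Adapter:
    """Stand-in for a real data system: filters an in-memory list by time."""
    def adapter(start: datetime, end: datetime) -> List[Record]:
        return [r for r in records if start <= r.start < end]
    return adapter


if __name__ == "__main__":
    vo = VirtualObservatory()
    vo.register("system_1", static_adapter(
        [Record("system_1", "image", datetime(2005, 5, 24, 12), "https://example.org/a")]))
    vo.register("system_2", static_adapter(
        [Record("system_2", "spectrum", datetime(2005, 5, 24, 10), "https://example.org/b")]))
    for hit in vo.query(datetime(2005, 5, 24), datetime(2005, 5, 25)):
        print(hit.provider, hit.product, hit.url)
```

In the data system approach the user would have to call each adapter separately and learn each interface; here the user sees one `query` call and one merged, sortable result list.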
How a VO works
How a VO works and the services it provides are largely determined by the community it serves. Still, many VO’s contain the following features:
• An interface & tools that make it easy to locate and retrieve data from catalogs, archives, and databases
• Interoperability: data services that can be used regardless of the client’s computing platform, operating system, and software capabilities
• Tools or access to tools for data analysis, modeling, simulation, and visualisation
• Tools to compare observations with results obtained from models, simulations, and theory
• Access to data in near real-time when necessary, as well as archived and historical data
Different modes of implementation, such as socket-based programming and the “pull vs. push” concept, depend on user needs & preferences, global access issues, & the complexity of the data/models involved.
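As a rough illustration of "pull vs. push" in the generic sense (the client asks for data on demand, versus the source notifying clients when new data arrive), here is a minimal Python sketch. The `DataSource` class and its method names are hypothetical, and the slide may intend a more specific meaning of the terms.

```python
from typing import Callable, List


class DataSource:
    """Toy data source that supports both access modes."""

    def __init__(self) -> None:
        self._items: List[str] = []
        self._subscribers: List[Callable[[str], None]] = []

    def pull(self, since: int = 0) -> List[str]:
        """Pull: the client asks for everything from index `since` onward."""
        return self._items[since:]

    def subscribe(self, callback: Callable[[str], None]) -> None:
        """Push: the client registers a callback and waits to be notified."""
        self._subscribers.append(callback)

    def publish(self, item: str) -> None:
        """Store a new item and push it to every subscriber."""
        self._items.append(item)
        for callback in self._subscribers:
            callback(item)


if __name__ == "__main__":
    source = DataSource()
    received: List[str] = []
    source.subscribe(received.append)   # push-mode client
    source.publish("magnetogram_001")
    source.publish("magnetogram_002")
    print("pushed:", received)
    print("pulled:", source.pull(since=1))  # pull-mode client catching up
```

Pull suits clients with intermittent or slow connections, since they decide when to fetch; push suits near-real-time use, at the cost of the source tracking its subscribers.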
Key Features of a (Successful) VO
• Universal accessibility: web browser interface or a standard, easily implemented platform that can quickly be integrated with multiple user analysis environments
• Easy to join
• Provides not only data & models, but is able to provide information & access to tools, products & services enabling the science
• Doesn’t reinvent any wheels - uses what’s available and adds features and versatility
• Grass-rootsy, ground-up approach: few features have to be hard-wired or there from the beginning. New features and services can be added in an organic, community-based way
• Adaptive - most VO’s are continuously modified and improved
• Reverse-compatible: because most VO’s are constantly being updated, improved, modified, upgraded…
• Less emphasis on formats, more emphasis on catalogability, accessibility & retrievability
• Community-supported: allows ideas and advances from the entire scientific community to be rapidly ingested for global use
• Focus remains on the fundamentals: enabling the user to locate, query & access data, models, tools & services
VO Data Provider’s Responsibilities and Requirements (there aren’t many!):
• Data must be accessible and retrievable in some agreed-upon standard way
• The data’s search interface must be compatible with the VO, or the data provider must provide metadata which can be queried
• Ideally, the data provider will also make analysis software available that is compatible with, and takes advantage of, the VO interface
• Data provider must still take responsibility to ensure intelligent analysis
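One way a provider could meet the "metadata which can be queried" requirement is to publish a small machine-readable catalog. The sketch below assumes a JSON list with `instrument`, `start`, `end`, and `url` fields; that schema, the instrument names, and the example.org URLs are all invented here for illustration and do not describe any real VO's format.

```python
import json
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical metadata a provider might publish alongside its data files.
PROVIDER_METADATA = """
[
  {"instrument": "imager_a", "start": "2005-05-24T10:00:00",
   "end": "2005-05-24T11:00:00", "url": "https://example.org/a.fits"},
  {"instrument": "imager_b", "start": "2005-05-24T10:30:00",
   "end": "2005-05-24T12:00:00", "url": "https://example.org/b.fits"},
  {"instrument": "imager_a", "start": "2005-05-25T09:00:00",
   "end": "2005-05-25T09:30:00", "url": "https://example.org/c.fits"}
]
"""


def load_metadata(text: str) -> List[Dict[str, str]]:
    """Parse the provider's catalog into a list of plain dictionaries."""
    return json.loads(text)


def overlaps(record: Dict[str, str], start: datetime, end: datetime) -> bool:
    """True if the record's time range intersects the half-open [start, end)."""
    rec_start = datetime.fromisoformat(record["start"])
    rec_end = datetime.fromisoformat(record["end"])
    return rec_start < end and start < rec_end


def search(records: List[Dict[str, str]], start: datetime, end: datetime,
           instrument: Optional[str] = None) -> List[Dict[str, str]]:
    """Return records overlapping the time window, optionally by instrument."""
    return [r for r in records
            if overlaps(r, start, end)
            and (instrument is None or r["instrument"] == instrument)]


if __name__ == "__main__":
    catalog = load_metadata(PROVIDER_METADATA)
    window = (datetime(2005, 5, 24, 10, 45), datetime(2005, 5, 24, 11, 15))
    for hit in search(catalog, *window):
        print(hit["instrument"], hit["url"])
```

The provider never has to run a search service of its own in this scheme; the VO queries the metadata and hands the user the URLs.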
What are the advantages of VO’s?
Ease of use / Accessibility:
• Enables greater access to data, including for researchers in developing nations. This is good for scientists around the globe, and more and more project reviews are taking into consideration the breadth of the “user base.”
• VO’s (should) talk to other VO’s. Compatibility with other data environments isn’t as much of an issue once you’ve joined a VO.
• Enables the use of multiple types of data – data format issues become more transparent
Cost and Time:
• Cheap for the data producers – to serve data, you needn’t set up a big data system. Just join a VO.
• Saves data retrieval time for the user of the data
• Saves analysis time – most VO’s are also able to provide information about and access to higher-level data products and results produced by other users
Enables Science:
• You don’t look for the data, it finds you. Scientists will use more sources of data in analysis.
• Reproducibility/verification of results
• Data products and higher levels of processing can be served as well, regardless of the source
• Forms a foundation and interface for virtual analysis activities – VO’s are only the beginning!
• A VO also can provide a versatile interface to electronic analysis activities, such as virtual “workflows,” open-source software environments, and Virtual Analysis Environments (VAE)
What a VO Won’t Do
• VO’s will never be able to remove the need for an active human role in data analysis. However, they will allow humans to do it with much greater efficiency.
• Data mining
• Intelligent agents (AI/neural nets)
• Data provider must still take responsibility to ensure intelligent analysis
Why do we need VO’s?
- They save time & money
- Broker / matchmaker between data providers and scientists - “you don’t have to find the data, it finds you.”
- Enables global e-Science: A VO can provide a versatile interface to electronic analysis activities, such as virtual “workflows,” open-source software environments, and Virtual Analysis Environments (VAE)
- Can play a major role in enabling science in developing nations
http://egy.org - VO working group and enabling activities
http://lwsde.gsfc.nasa.gov - results from a broad community VO discussion & workshop
The Solar Physics Division of the AAS and the electronic Geophysical Year invite you to a
Showcase of Virtual Observatories
All virtual observatory initiatives are encouraged to participate. Room 237 will also serve as the “SPD Tutorial Facility” throughout this meeting. Please stop by to view the schedule of events.
Room 237, Morial Convention Center
May 24, 2005 2:00 - 4:00 PM
during the eGY poster session Tuesday afternoon
Backup / Misc Slides
We’re not downhearted (yet)
• The capacity and capacity per unit price of disk storage have steadily increased; doubling time is ~ 7 months
• Network-attached RAID (NAS) servers are becoming simpler and cheaper
• Simply storing the data is not a problem in the foreseeable future
Data source: Rev. C of Seagate SCSI disk drive product manuals (i.e., first OEM-quantity release)
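To make the quoted doubling time concrete: if capacity doubles every ~7 months, the growth over t months is 2^(t/7). The short Python sketch below assumes that simple exponential model holds steadily, which the slide does not claim beyond the trend it reports; the function names are invented here.

```python
import math


def growth_factor(months: float, doubling_months: float = 7.0) -> float:
    """Multiplicative growth after `months`, given a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


def months_to_grow(factor: float, doubling_months: float = 7.0) -> float:
    """Months needed to grow by `factor`, given a fixed doubling time."""
    return doubling_months * math.log2(factor)


if __name__ == "__main__":
    # Five years at a 7-month doubling time: roughly a 380-fold increase.
    print(f"5-year growth: {growth_factor(60):.0f}x")
    # Time for a 1000-fold increase: a little under six years.
    print(f"1000x takes about {months_to_grow(1000):.1f} months")
```

Under this assumption, storage capacity per unit price grows by a factor of a few hundred over five years, which is the basis for the claim that storing the data is not the bottleneck.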
But how do we find and use all those data?
• Solar data searches tend to be for multiple wavelength/entendu data sources for the same time period (i.e. not RA and DEC or other position or object)
• Current archives are available on the Web at many sites, with heterogeneous search capabilities
• Most but not all data of current interest are in FITS format
• SolarSoft tree (in IDL™) offers support for ground-based and multiple space-based observatories, as well as lots of “generic” functionality (“the wheel” that you don’t want to have to reinvent)
Biggering and biggering
• Solar data set sizes are growing at an impressive rate
• Data sets that are “only” several Tbyte in size will be dwarfed in 5 - 6 years
Data point sizes represent the data rate; the ordinate represents the total data volume from the source.
Toward a Virtual Solar Observatory
• Three parts:
– distributed archives
– metadata “broker” facility
– Web-based front end
• Can have different implementations
– XML
– Gnutella
– &c.
• Will be low-cost
– No more than $1.2M over next four years
• We must be really smart:
– Same model adopted by NVO, PDS; EGSO examining
• …. or maybe there’s exactly one obvious way to do this
Volodya:
I like your VO definition a lot, and thanks for the comments. They'll be a great help! I completely understand your comments about the CDAWs. The CDAWs had a great deal of difficulty generalizing their activities for individual users, while VO's start from that approach. We might have put the cart before the horse - it appears that the virtual analysis environments will be spawned by the VO's, and not the other way around. Can you think of any examples (outside of COSEC, my best example so far) of analysis environments extending from VO's? Your "pull" data concept enables you to store data for future use, and it can also store analyzed data and products as they are produced. I want to do a bit of prognosticating at the end of the talk, because I think the VO's will develop far beyond a streamlined data access system as they enable online analysis and joint analysis projects. Perhaps it's not too late for the CDAWs.