eScience
Presenters:
Tai Tri Nguyen 10070939
Thang Quyet Nguyen 10070940
April 11, 2011
Grid Computing
Agenda
What is eScience ?
Why eScience ?
New infrastructure enabling eScience
What the future holds for e-Science
Scientific Data Federation
Summary
What is eScience ?
• eScience: the “e” isn't an abbreviation. The “e” in eScience is hard, complex and difficult. In a sense, eScience can be understood as aiming at “open science”.
• An endlessly discussed problem:
socio-economic interests versus the sharing culture of science
• “eScience is about global collaboration in key areas of science and the next generation of infrastructure that will enable it”
The term was coined in 1999 by John Taylor,
the Director General of the UK's Office of Science and Technology,
and was used to describe a large funding initiative starting in November 2000
What is eScience ?
• eScience is not just high-bandwidth communication and HPC (High Performance Computers) running simulations linked through “the GRID”
• eScience is about exploiting digital technology to support all aspects of scientific activity
• eScience is about support for large-scale science through distributed global collaborations
• eScience is about the formation of virtual co-laboratories allowing scientists to work together irrespective of location
– Universal access to scientific resources
– Support for the scientific community
What is eScience ?
Understanding eScience in general terms:
• e-Science is all about furthering technology in order to advance the scientific discipline.
• If scientific research is to go above and beyond, and reach heights that we would not have thought possible before, then e-Science will be the infrastructure to pave the way.
• A large group of professionals and researchers is currently working to make sure scientists are able to reach their goals.
• eScience is a global collaboration infrastructure for science.
What is eScience ?
e-Science (UK) and Cyberinfrastructure (US)
• “e-Science is about global collaboration in key areas of science and the next generation of [computing] infrastructure that will enable it.”
John Taylor, Director, Office of Science and Technology, UK
• “Cyberinfrastructure is the coordinated aggregate of software, hardware and other technologies, as well as human expertise, required to support current and future discoveries in science and engineering. The challenge of Cyberinfrastructure is to integrate relevant and often disparate resources to provide a useful, usable, and enabling framework for research and discovery characterized by broad access and 'end-to-end' coordination.”
Fran Berman, San Diego Supercomputer Center, UCSD
What is eScience ?
e-Science Grid in UK
Why eScience ?
• The scientific imperative
new modes of scientific inquiry
data-intensive science
simulation-based science
remote access to experimental apparatus
virtual community science
• The industrial imperative
The scientific imperative
New modes of scientific inquiry
Data-intensive science:
Research is shifting from little data and lots of thinking to …
NOW: lots of data and analysis
eScience is driven by data: data-driven scientific discovery!
You are here …
The Data Deluge
The scientific imperative
Data-intensive science:
• LHC (Large Hadron Collider): 60 TB/day
• Apache Point Telescope (SDSS): 15 TB/day
• Large Synoptic Survey Telescope: 30 TB/day
• Illumina Genome Analyzer: 1 TB/day
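To give a feel for the scale, the daily rates quoted on this slide can be converted to yearly volumes. This is only a rough sketch: it assumes decimal units (1 PB = 1000 TB) and nonstop operation for 365 days, which real instruments do not achieve. The short instrument labels are chosen here for illustration.

```python
# Daily data rates quoted on the slide, in terabytes per day.
DAILY_RATE_TB = {
    "LHC": 60,
    "SDSS": 15,
    "LSST": 30,
    "Illumina Genome Analyzer": 1,
}

def yearly_volume_pb(tb_per_day, days=365):
    """Convert TB/day to PB/year (assumes 1 PB = 1000 TB and nonstop operation)."""
    return tb_per_day * days / 1000

for name, rate in DAILY_RATE_TB.items():
    print(f"{name}: {yearly_volume_pb(rate):.1f} PB/year")
```

Under these assumptions the instruments together would produce roughly 39 PB per year.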
The scientific imperative
Simulation-based science
Numerical simulation is another new problem-solving methodology, used where physical experiments cannot easily be performed but computational simulations are feasible.
• The Japanese Earth Simulator:
Allows simulations to be performed at an unprecedented 10-km horizontal resolution, generating many tens of terabytes of data in a single run
• The UK Comb-e-Chem project:
http://www.combechem.org/
The goal of this project is to “synthesize” large numbers of new compounds by high-throughput combinatorial methods and then map their structure and properties.
The scientific imperative
Simulation-based science
• U.S. Encyclopedia of Life (EOL) project
http://www.eol.org/
Intends to document all of the 1.8-1.9 million living species known to science. It aims to build one "infinitely expandable" page for each species, including video, sound, images, graphics, as well as text.
Seeks to produce a database of putative functional and 3D structure assignments for all known, publicly available complete or partial genomes.
The scientific imperative
Remote Access to Experimental Apparatus
- The emergence of high-speed networks makes it possible to integrate experimental apparatus into the scientific problem-solving process.
- Network for Earthquake Engineering Simulation (NEES)
http://nees.org/
An ambitious national program whose purpose is to advance the study of earthquake engineering and to find new ways to reduce the hazard earthquakes represent to life and property
Collaborative tools (middleware) aid in experiment planning, allow engineers at remote sites to perform teleobservation and teleoperation of experiments, and enable access to computational resources and open source analytical tools for simulation and analysis of experimental data
The scientific imperative
Remote Access to Experimental Apparatus
Sharing engineering research equipment, data resources, and leading-edge computing resources.
The scientific imperative
Virtual community science
• Global collaboration will lead to the creation of virtual community science.
• The most significant impact of Grid technologies on science may be global virtual communities of scientists able to address the fundamental problems of today and tomorrow.
• The Grid as an Enabler for Virtual Organisations.
“Virtual Organization: A set of individuals and/or institutions defined by such sharing rules. This concept is becoming fundamental to much of modern computing”
The industrial imperative
The new model: on-demand computing!
Computational resources on demand
New infrastructure for eScience
Requires major investments in physical infrastructure (petabyte archival storage, terabit networks, sensor networks, teraop supercomputers), software infrastructure (Grid middleware, collaboratories), and new application concepts and software
Governments are realizing the importance of these investments as a means of enabling scientific progress and enhancing national competitiveness
Development based on Grid infrastructure
New infrastructure for eScience
Challenges (by Tony Hey, Director of the UK e-Science Core Programme)
• Building a Future Infrastructure
- Developing a Semantic Grid
- Trusted Ubiquitous Systems
- Rapid Customized Assembly of Services
- Autonomic Computing: self-managing distributed computing resources that adapt to unpredictable changes while hiding intrinsic complexity from operators and users
• Putting the Infrastructure to work
- Support for New Forms of Community
- Socio-Economic Impact
- Collaboratory Intellectual Properties Register and legal issues
New infrastructure for eScience
An eScience Grid-based framework
What the future holds for e-Science
• Innovation
The general consensus is that the technology of tomorrow must be ready to meet the inspirational thinking of scientists.
• Business
There is a desire to make the technology of e-Science available not only to scientists but also to commercial users, such as engineers.
• Collaboration
Partnership is vital to the development of better storage facilities and the enhancement of Grid infrastructures.
What the future holds for e-Science
• Complex ideas
Scientists are trying to research things that they have never even touched upon before. E-Science and its conglomerates are well aware that advances in information technology are the only way forward for the advancement of science.
• Education
If e-Science is to improve its image and further its impact upon science, then it is essential that the students of tomorrow are trained in the use of advanced computing technology.
• International development
It is essential to the future success of e-Science that its methods and technology are used across the globe, not just within the UK.
UK eScience projects
• GRIDPP (PPARC)
• ASTROGRID (PPARC)
• Comb-e-Chem (EPSRC)
• DAME (EPSRC)
• DiscoveryNet (EPSRC)
• GEODISE (EPSRC)
• myGrid (EPSRC)
• RealityGrid (EPSRC)
• Climateprediction.com (NERC)
• Oceanographic Grid (NERC)
• Molecular Environmental Grid (NERC)
• NERC DataGrid (NERC + OST-CP)
• Biomolecular Grid (BBSRC)
• Proteome Annotation Pipeline (BBSRC)
• High-Throughput Structural Biology (BBSRC)
• Global Biodiversity (BBSRC)
• Biology of Ageing (BBSRC + MRC)
• Sequence and Structure Data (MRC)
• Molecular Genetics (MRC)
• Cancer Management (MRC + PPARC)
• Clinical e-Science Framework (MRC)
• Neuroinformatics Modeling Tools (MRC)
• MIASGRID (OST-CP)
• AKTing (OST-CP)
• EquatorGrid (OST-CP)
• DIRCGrid (OST-CP)
• MB-NG (OST-CP/PPARC)
• UK EDG (OST-CP/PPARC)
• OGSA-DAI (OST-CP)
Astronomy Grid Application
• Why choose astronomy application:
– Scale of data: terabytes now, petabytes soon.
– Datasets are distributed.
– Modern data are carefully peer reviewed and collected with rigorous statistical and scientific standards.
– Data provenance is tracked and derived datasets are curated fairly carefully.
– Most data are publicly available and will remain available for the foreseeable future.
– Old data (which may be less accurate) are essential for studying time-varying phenomena.
– Users cannot download a full copy of each archive for local processing; instead they request a small subset from each archive.
The Virtual Observatory
• Also called the World-Wide Telescope.
• Wikipedia definition
– A Virtual Observatory (VO) is a collection of interoperating data archives and software tools which utilize the internet to form a scientific research environment in which astronomical research programs can be conducted.
• Functions:
– Provide portals, protocols, and standards that unify the world's astronomy archives into a giant database containing all astronomy literature, images, raw data, derived datasets and simulation data
– Integrate the archives so that they act as a single intelligent telescope.
The Virtual Observatory
• IVOA (International Virtual Observatory Alliance) is a standards body created by VO projects to develop and agree on the vital interoperability standards upon which the VO implementations are constructed.
• Examples
– AstroGrid: the UK's Virtual Observatory Service
– Euro-VO: the European VO
– National Virtual Observatory: the USA's VO
– Virtual Observatory, India.
– Iran Virtual Observatory.
The Virtual Observatory
• Traditional model for publishing scientific data
– Authors:
• Individual or small group.
• Create the experiments that provide data.
• Write papers that contain and explain the data.
– Publishers:
• The scientific journals.
• Print papers.
– Curators:
• Organize and store the journals and make them available for consumers.
– Consumers:
• Scientists who want to use and cite the data in their own research.
This model is suitable only when all scientific data relevant to the research can easily be included in the publication.
The Virtual Observatory
• Publishing scientific data on astronomy
– The role of author belongs to collaborations.
– Projects can also be publishers and curators.
– It takes 5 to 10 years to build the experiment before the authors start producing data.
– The data volume is too large to be contained in a journal.
– Data are published to the collaboration through Web-based archives.
– Consumers must deal with data from many sources.
The Virtual Observatory
• Metadata and provenance
– It is important to capture the details of how the data were derived and calibrated.
– UCD (Unified Content Descriptor): words in a compressed dictionary derived by automatically detecting the most commonly used terms in over 150,000 tables in the astronomical literature.
– UCDs help find common and comparable attributes in different archives.
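The idea of finding comparable attributes through a shared vocabulary can be sketched in a few lines. This is only an illustration: the two archives and their column names are invented, and the descriptor strings merely follow the style of the IVOA UCD vocabulary.

```python
# Hypothetical column-to-UCD mappings for two archives (column names invented).
ARCHIVE_A = {"ra": "pos.eq.ra", "dec": "pos.eq.dec", "r_mag": "phot.mag"}
ARCHIVE_B = {"RAJ2000": "pos.eq.ra", "DEJ2000": "pos.eq.dec", "flux": "phot.flux"}

def comparable_columns(a, b):
    """Pair each column of archive `a` with the columns of `b` that share its UCD."""
    by_ucd = {}
    for col, ucd in b.items():
        by_ucd.setdefault(ucd, []).append(col)
    return {col: by_ucd[ucd] for col, ucd in a.items() if ucd in by_ucd}

print(comparable_columns(ARCHIVE_A, ARCHIVE_B))
```

Columns with different local names ("ra" and "RAJ2000") are matched because they carry the same descriptor, while "r_mag" and "flux" are left unpaired.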
Web Services
• Access data from a VO:
– Most data will be remote, so data access needs to be as transparent as if the data were local.
– Remote data volumes may be huge, so move as much of the data processing as close to the data as possible.
– Data may be extracted from databases by a query.
– Data may not exist at the time of the request.
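Moving the processing to the data can be illustrated with a toy archive. In this sketch an in-memory SQLite table stands in for a remote archive; the table layout, the function names and the magnitude cut are all assumptions made for the example.

```python
import sqlite3

def make_archive(rows):
    """Stand-in for a remote archive: an in-memory SQLite table of objects."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE objects (id INTEGER, ra REAL, dec REAL, mag REAL)")
    db.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", rows)
    return db

def query_at_archive(db, max_mag):
    """Run the selection where the data live; only matching rows travel back."""
    cur = db.execute(
        "SELECT id, ra, dec, mag FROM objects WHERE mag < ? ORDER BY id",
        (max_mag,),
    )
    return cur.fetchall()

archive = make_archive([
    (1, 10.0, -5.0, 14.2),
    (2, 10.1, -5.1, 21.7),
    (3, 11.3, 2.4, 17.9),
])
bright = query_at_archive(archive, 18.0)
print(bright)
```

The client never sees the full table; it receives only the rows that satisfy the query, which is the point of pushing the selection to the archive.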
Web Services
• Web services in the Virtual Observatory
– The core services can be combined into more complex portals that:
• Talk to several services
• Create more complex results.
– Modular components, standard interfaces, and access to commercially built toolkits for the lowest-level communication tasks.
– The VO framework and core services need to be carefully defined so that they provide clear standards, interfaces, documentation and reference implementations.
Hierarchical Architecture
• Archive
– Refers to a collection of historical records, as well as the physical place where they are located.
– Archives store text, images, and raw data in blobs or files, and store their schematized data in relational databases.
– Provide data mining tools to allow easy search and sub-setting of the data objects at each archive.
– Provide a Web service interface for on-demand queries.
– Provide a file transfer service for answers that involve substantial computation or data transfer.
– Contain metadata about their contents (both physical units and the provenance of the data).
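The parts of an archive listed above (raw blobs, schematized records, metadata with units and provenance, on-demand sub-setting) can be pictured as a small data structure. This is a hypothetical in-memory sketch; the class names, fields and sample values are invented and do not describe any real archive's schema.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMeta:
    unit: str          # physical unit of the column
    provenance: str    # how the values were derived or calibrated

@dataclass
class Archive:
    name: str
    columns: dict = field(default_factory=dict)  # column name -> ColumnMeta
    rows: list = field(default_factory=list)     # schematized catalog records
    blobs: dict = field(default_factory=dict)    # object id -> raw bytes

    def subset(self, predicate):
        """On-demand query: return only the records that satisfy the predicate."""
        return [row for row in self.rows if predicate(row)]

demo = Archive(
    name="demo-survey",
    columns={"mag": ColumnMeta(unit="mag", provenance="pipeline v2 calibration")},
    rows=[{"id": 1, "mag": 14.2}, {"id": 2, "mag": 21.7}],
    blobs={1: b"raw image bytes", 2: b"raw image bytes"},
)
print(demo.subset(lambda row: row["mag"] < 18))
```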
Hierarchical Architecture
• Web services
– Support a common core schema that extends the VOTable data model.
– The VOTable specifies:
• A standard coordinate system and standard representations for core astronomical concepts
• Standard ways to represent both values and errors
– Built on top of SOAP and XML Schema Definitions (XSDs).
– Most are interactive tasks that extract data on demand for portals and for interactive client tools.
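To make the VOTable idea concrete, the sketch below reads a hand-written, heavily simplified VOTable-style document with Python's standard XML parser. Real VOTables use an XML namespace and carry more structure; the field names and values here are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified VOTable-style document (no namespace): each FIELD describes a
# column with its unit, and a value column is paired with an error column.
DOC = """
<VOTABLE>
  <RESOURCE>
    <TABLE>
      <FIELD name="ra" unit="deg" datatype="double"/>
      <FIELD name="ra_err" unit="deg" datatype="double"/>
      <DATA><TABLEDATA>
        <TR><TD>10.5</TD><TD>0.001</TD></TR>
        <TR><TD>11.0</TD><TD>0.002</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

def read_table(xml_text):
    """Return (field metadata, rows as dicts) from the simplified document."""
    table = ET.fromstring(xml_text).find("./RESOURCE/TABLE")
    fields = [dict(f.attrib) for f in table.findall("FIELD")]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        values = [float(td.text) for td in tr.findall("TD")]
        rows.append({f["name"]: v for f, v in zip(fields, values)})
    return fields, rows

fields, rows = read_table(DOC)
print(fields[0]["unit"], rows)
```

Because the units travel with the table, a client can interpret the values and their errors without out-of-band documentation.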
Hierarchical Architecture
• Registries:
– Each archive declares itself to one or more registries.
– Record what kinds of information the archive provides.
– Should be widely replicated, given the overlaps of astronomy with other disciplines.
– Are used by portals, which answer user queries by integrating data from many archives.
• Portals:
– Use registries to answer user queries by integrating data from many archives.
– Individuals may build their own custom portals to solve particular problems.
– Sample portals: MAST, GLU, AstroGrid, SkyQuery.
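The interplay of registries and portals can be sketched as follows. All names here (`Registry`, `portal_query`, the two toy archives) are hypothetical; the sketch only shows a portal looking up which archives advertise a kind of data and merging their answers.

```python
class Registry:
    """Records which kinds of information each archive provides."""

    def __init__(self):
        self._entries = {}  # archive name -> (set of kinds, query function)

    def register(self, name, kinds, query_fn):
        self._entries[name] = (set(kinds), query_fn)

    def providers(self, kind):
        """Return (name, query function) for archives advertising `kind`."""
        return [
            (name, self._entries[name][1])
            for name in sorted(self._entries)
            if kind in self._entries[name][0]
        ]

def portal_query(registry, kind, region):
    """Ask every archive that advertises `kind` and merge the answers."""
    results = []
    for name, query_fn in registry.providers(kind):
        results.extend((name, obj) for obj in query_fn(region))
    return results

registry = Registry()
registry.register("optical", {"images", "catalog"}, lambda region: [f"opt-{region}"])
registry.register("infrared", {"catalog"}, lambda region: [f"ir-{region}"])
print(portal_query(registry, "catalog", "M31"))
```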
Hierarchical Architecture
• Portal Example: SkyQuery
– Integrates 5 different Web services: SDSS, 2MASS, Faint Images, the Isaac Newton Telescope Wide Field Survey and Image Web services.
– The archives are located on 2 continents at several geographic locations.
– Accepts queries specifying the desired object properties.
– Decides which archives have relevant data (by querying each of them) and calculates an optimal query plan to answer the question.
– The resulting answer set is delivered to the user in tabular form along with images of the objects.
– Is itself a Web service.
– Can be used as a component of other portals.
– Built using SQL and the .NET tools
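The planning step (ask each archive whether it has relevant data, then decide an order) might look roughly like the sketch below. Visiting archives in ascending order of candidate count is an assumed heuristic for this example, not a description of SkyQuery's actual optimizer, and the archive names and counts are invented.

```python
def plan_query(archives, constraint):
    """Order archives so the one with the fewest candidate objects comes first.

    `archives` maps an archive name to a function that returns how many
    objects match `constraint`; archives with no matches are dropped.
    """
    counts = {name: count_fn(constraint) for name, count_fn in archives.items()}
    relevant = [name for name, n in counts.items() if n > 0]
    return sorted(relevant, key=lambda name: (counts[name], name))

# Toy archives that report fixed counts regardless of the constraint.
toy_archives = {
    "A": lambda constraint: 1200,
    "B": lambda constraint: 40,
    "C": lambda constraint: 0,
}
print(plan_query(toy_archives, "mag < 18"))
```

Starting with the smallest candidate set keeps the intermediate results that must be passed between archives as small as possible.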
Hierarchical Architecture
• Astronomy application:
– Requires access at the granularity of objects rather than entire files.
– Data reside in read-intensive databases, accessed through an associative query interface.
– Comparing observations requires accessing and comparing individual records in several different archives.
– Relies on spatial and other indices and makes heavy use of databases.
– Access control must be addressed but is less important.
– Resource management is important.
Hierarchical Architecture
• Data, networking and computation economics
– All data can be kept online.
– The data and derived products were collected at great expense and should be stored safely at 2 or more locations.
– A small query can simply be sent to one of the archive servers.
– If a query exceeds the limit, some planning is required.
The Virtual Observatory and the Grid
• Compute-intensive tasks
– Transformation of raw instrument data into calibrated and cataloged data.
– The software is constantly refined and improved, so old data need to be reprocessed about once a year.
The Virtual Observatory and the Grid
• Data mining and statistics on terabytes
– The correlation algorithm involves computing pairwise distances.
– Typical matrix sizes today are in the range 10000^2 to 1000000^2.
– Even N log N algorithms are infeasible for datasets involving billions of objects, which calls for approximate and heuristic algorithms.
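The cost of pairwise distances, and one simple way to approximate a pair statistic, can be sketched as follows. The subsampling estimator is an illustrative choice made here, not a method named in the slides; it assumes 2D points, 8-byte matrix entries and a sample of at least two points.

```python
import random
from itertools import combinations
from math import hypot

def matrix_bytes(n, bytes_per_entry=8):
    """Memory needed for a dense n x n distance matrix."""
    return n * n * bytes_per_entry

def count_close_pairs(points, radius):
    """Exact O(N^2) count of pairs closer than `radius`."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if hypot(x1 - x2, y1 - y2) < radius
    )

def estimate_close_pairs(points, radius, sample_size, seed=0):
    """Estimate the pair count from a random subsample, scaled to the full set."""
    n = len(points)
    m = min(sample_size, n)  # must be at least 2
    sample = random.Random(seed).sample(points, m)
    scale = (n * (n - 1)) / (m * (m - 1))
    return count_close_pairs(sample, radius) * scale

print(matrix_bytes(10_000), matrix_bytes(1_000_000))
```

With 8-byte entries, a 10000^2 matrix needs 0.8 GB and a 1000000^2 matrix needs 8 TB, which is why the exact computation stops being practical and sampling or other approximations take over.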
Summary
e-Science and the Grid
“e-Science will change the dynamic of the way science is undertaken.”
John Taylor, 2001
“[The Grid] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.”
Tony Blair, 2002