data management in a grid environment - theory and practical examples
DESCRIPTION
Data Management in a Grid Environment - theory and practical examples. Kerstin Kleese van Dam et al., CCLRC e-Science Centre, http://www.e-science.clrc.ac.uk. Council for the Central Laboratory of the Research Councils.
TRANSCRIPT
N+N meeting Australia 2003, e-Science Centre, Kerstin Kleese van Dam
Data Management in a Grid Environment - theory and practical examples
Kerstin Kleese van Dam et al.,
CCLRC e-Science Centre
http://www.e-science.clrc.ac.uk
Council for the Central Laboratory of the Research Councils
One of Europe’s largest Research Support Organisations, providing large scale experimental, data and computing facilities primarily to the UK research community both in academia and industry. Annually supporting around 12000 scientists from all major scientific domains. 1800 members of staff over three sites:
•Rutherford Appleton Laboratory in Oxfordshire
•Daresbury Laboratory in Cheshire
•Chilbolton Observatory in Hampshire
Large quantities of data associated with the various facilities. Houses 1 World Data Centre, 3 National Data Centres and a range of community based data services.
http://www.cclrc.ac.uk
CCLRC e-Science Centre
Early involvement in e-Science (from 1999 Data Grid / WOS onwards).
Centre established in 2000, since 2001 with direct governmental funding, additional funding through participation in other projects.
Currently housing UK Grid Support Centre (together with Manchester + Edinburgh) and BBSRC Grid Support Centre.
Involved in DataGrid, GridPP, AstroGrid and NERC DataGrid
Currently 40 permanent members of staff, 10 in the data management group.
http://www.escience.clrc.ac.uk
Data Management Group
Current e-Science Projects of the Data Management Group
Working on collaborations with partners inside CCLRC, the UK and internationally
CLRC DataPortal
Integration of ISIS and BADC operational Data Catalogues
Environment from the Molecular Level
NERC DataGrid
e-Science Technologies for the Simulation of Complex Materials
Extensions of the Storage Resource Broker (SRB) together with SDSC
Earth Science Portal Project
Database service for CCLRC and related e-Science projects
Data Management
Currently the scientist has to take care of his data, providing the binding link between different areas of work.
In the future we hope that e-Science technologies provide scientists with a more helpful environment …
Your personal e-Science Interface wherever you are.
Issues
Data capture from instruments and computers
Data Storage
Annotating data
Data Discovery
Association of data with appropriate applications
Conversion of data from one application to the other
Merging of data from different sources
Data capture from instruments and computers
In a Grid environment the scientist will ultimately have little control over where an experiment or calculation is carried out, and therefore over where the resulting data will reside.
Capture Data
Capture Information about the environment
Direct where output goes
Data Capture from Experimental Facilities (1)
Instruments produce varying amounts of data, ranging from small (e.g. temperature readings at a station) to large (e.g. LHC with several Tbytes per second).
Each instrument will produce data in its own format, often incompatible with anything else.
Most facilities provide their own short term storage, but will neither annotate nor manage the data.
The collection of environmental information is often limited; much of it is still recorded in lab notebooks.
Correction values or error margins related to the instrument are not linked to the collected data.
Data Capture from Experimental Facilities (2) - Requirements
Generalised description of data format (possible standardisation for instruments of the same type).
Automatic capture of environment information, including input from instrument scientists if necessary.
Automatic linking of data about the environment and the raw data produced by the instrument.
Automatic insertion of both types of data into interim or final data repository.
Automatic linking of the donated data to existing related information e.g. proposal, other experiments of the same project.
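The linking requirements above can be illustrated with a minimal Python sketch. The record layout, the field names (`proposal_id`, `environment`) and the in-memory repository are hypothetical and are not taken from ICAT or any other CCLRC system; the point is only that raw data, environment information and the originating proposal are stored as one linked record.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CaptureRecord:
    """Links one raw data file to the environment it was captured in."""
    raw_file: str        # location of the raw instrument output
    data_format: str     # generalised description of the data format
    proposal_id: str     # hypothetical key of the originating proposal
    environment: dict    # e.g. temperature, detector settings
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class Repository:
    """Toy interim repository that indexes records by proposal."""

    def __init__(self):
        self._by_proposal = {}

    def insert(self, record):
        # Raw data and environment data enter the repository together.
        self._by_proposal.setdefault(record.proposal_id, []).append(record)

    def related(self, proposal_id):
        """Return all records captured under the same proposal."""
        return list(self._by_proposal.get(proposal_id, []))


repo = Repository()
repo.insert(CaptureRecord("run_0001.raw", "instrument-native", "P-42",
                          {"temperature_K": 291.5}))
repo.insert(CaptureRecord("run_0002.raw", "instrument-native", "P-42",
                          {"temperature_K": 292.0}))
print([r.raw_file for r in repo.related("P-42")])
```

A production system would replace the dictionary with a database and capture the environment values from the instrument control software rather than by hand.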
Data Capture from Experimental Facilities (3) - Examples
ICAT - CLRC ISIS Catalogue http://www.isis.rl.ac.uk/dataanalysis
See also:
Comb-e-Chem - http://www.combechem.org
Collection of Raw data from the Instrument, Detector specific Information for this experiment etc.
Integrate Raw Data with original Proposal Information and Log files of the Instrument Scientists
Finally Integrated with other Facility Data within and outside CCLRC via Instances of the CCLRC DataPortal software.
Data Storage
The Grid environment provides access to a multitude of storage systems, often hiding the type of system behind service interfaces.
Where is the data?
How can I manage it?
On which media is my data (access time)?
How can it be accessed?
Where are replicas of my data?
Data Storage (2) - Requirements
Easy overview of where your data is on the Grid
Support to manage your data (transfers/replicas)
Access and access control to your data wherever it is
Support to share your data
Two possible solutions:
Globus Data Management tools - example ESG http://www.earthsystemsgrid.org
Storage Resource Broker (SRB) from the San Diego Supercomputer Center (SDSC)
http://www.npaci.edu/DICE/SRB
Typical Analysis Scenario and the use of Storage Resource Managers (SRM)
[Diagram: clients at the client's site submit logical queries to a Request Interpreter; request planning consults the Metadata Catalog, the Replica Catalog and the Network Weather Service to turn logical files into site-specific files; a Request Executer sends pinning and file transfer requests over the network to DRMs (disk caches) and to an HRM (tape system).]
Metadata Catalogue for Data Discovery within one Virtual Organisation
Replica Catalogue keeps track of all replicas of specific datasets within one Virtual Organisation
The Network Weather Service helps to plan fastest Access routes to the data
Request goes out to Disk and Hierarchical Storage Resource Managers
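The request planning step described above can be sketched in a few lines of Python. Everything here is an illustrative assumption: the catalogue contents, the `hrm://` and `drm://` URL schemes, the host names and the bandwidth figures are made up, and the dictionary of forecasts merely stands in for a real Network Weather Service.

```python
# Replica catalogue: logical file name -> site-specific physical copies.
REPLICA_CATALOGUE = {
    "climate/run7/temp.nc": [
        "hrm://tape.site-a.example/archive/temp.nc",
        "drm://disk.site-b.example/cache/temp.nc",
    ],
}

# Stand-in for Network Weather Service forecasts (MB/s to each host).
BANDWIDTH_FORECAST = {
    "tape.site-a.example": 4.0,
    "disk.site-b.example": 35.0,
}


def host_of(url):
    """Extract the host part of a scheme://host/path URL."""
    return url.split("://", 1)[1].split("/", 1)[0]


def plan_request(logical_name):
    """Pick the replica whose host has the best forecast bandwidth."""
    replicas = REPLICA_CATALOGUE.get(logical_name, [])
    if not replicas:
        raise KeyError(f"no replica registered for {logical_name}")
    return max(replicas,
               key=lambda url: BANDWIDTH_FORECAST.get(host_of(url), 0.0))


print(plan_request("climate/run7/temp.nc"))
```

A real planner would also weigh tape staging time and disk cache occupancy, which is why the slide distinguishes hierarchical (HRM) from disk (DRM) resource managers.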
Storage Resource Broker (1)
Professional data storage management system, initially developed in the mid-1990s by the San Diego Supercomputer Center (SDSC). http://www.npaci.edu/DICE/SRB/. The current version supports many platforms and authentication methods, and offers Web services interfaces.
Storage Resource Broker
Integrated access to data on PC, UNIX, LINUX, DB and Tape Store http://www.npaci.edu/dice/srb/mySRB/mySRB.html
also used in the BIRN project http://www.nbirn.net/
SRB External Interface Modules: MySRB (web based), command line interface, C and Fortran APIs; password and certificate authorisation
Device Interface Modules for a wide range of platforms; easy to extend to new systems
MCAT maps logical to physical data locations and tracks replicas and versions. MCAT can be run on a variety of relational databases.
Replica or Original Data
Version of Data
Type of Data
Physical Data Location and Type of Resource
Functions including ingestion, movement and replication of data. Providing access to data for others
Biomedical Informatics Research Network
Annotating Data
Data without further information is only of short and very limited use.
Information about the data itself
Information about the where, why, who and when
Information about the environment in which the data was captured
Related Information
Example: CLRC Scientific Metadata Schema http://www.e-science.clrc.ac.uk/Activity/ACTIVITY=DataPortal;SECTION=5;
Diversity: Users & Searches
[Diagram labels: Discovery, Excavation; Wider science community, Data curator, Specialist user, Experimenter, General community]
General Scientific Metadata
[Diagram: a common Science Metadata Model with specialisations for ISIS, SRS, HEP, Space Science, Social Science and Earth Science]
A generic metadata model for all scientific applications with Specialisation for each domain
Can answer questions across domains
Can answer questions about specific domains
CLRC DataPortal - Scientific Metadata Model
Metadata Object, with the following components:
Topic: Keywords providing an index of what the study is about.
Study Description: Provenance about what the study is, who did it and when.
Access Conditions: Conditions of use, providing information on who can access the data and how.
Data Description: Detailed description of the organisation of the data into datasets and files.
Data Location: Locations providing navigation to where the data on the study can be found.
Related Material: References into the literature and community, providing context about the study.
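As a rough illustration of how the six components of the metadata object fit together, the following Python sketch models one study entry. The class name, field types and sample values are assumptions made for this example; the actual CLRC Scientific Metadata Schema is defined in XML and is considerably richer.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class StudyMetadata:
    """One metadata object with the six components listed on the slide."""
    topic: list              # keywords indexing what the study is about
    study_description: dict  # provenance: what, who and when
    access_conditions: str   # who may access the data, and how
    data_description: list   # organisation into datasets and files
    data_location: list      # where the data can be found
    related_material: list = field(default_factory=list)

    def matches(self, keyword):
        """Case-insensitive keyword lookup against the topic index."""
        return keyword.lower() in (k.lower() for k in self.topic)


# Sample values are invented for illustration only.
entry = StudyMetadata(
    topic=["zeolite", "neutron scattering"],
    study_description={"what": "example study", "who": "A. Example",
                       "when": "2003"},
    access_conditions="registered users only",
    data_description=["dataset_1/run_0001.raw"],
    data_location=["srb://example.org/home/dataset_1"],
)
print(entry.matches("Zeolite"), sorted(asdict(entry)))
```

Keeping the topic keywords separate from the data description is what lets a portal answer cross-domain questions without opening any data file.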
Data Discovery
Most data is currently ‘discovered’ by word of mouth from friends and colleagues or sheer luck.
Discovery
Browsing
Selection
Comparison
Access
Example: CLRC DataPortal http://esc.dl.ac.uk:9000/index.html
Different Levels of Metadata supporting Discovery and Selection
Metadata levels (XML documents A, B, C and D, query schema Q) and their relationships. Definitions:
A: Usage metadata generated from (or about) the data. It could be aggregated metadata, e.g. CDML from cdscan.
B: Complete metadata from A plus user-provided information to conform with (at least) the GEO profile. Application and template needed.
C: Metadata generated to describe both documentation and annotations (as opposed to binary data).
D: Discovery metadata suitable for harvesting to a portal. Probably based on Dublin Core and GEO. Subset of B and C.
Q: Schema which defines supported queries upon A, B, C and D.
A-Metadata: can be derived from the data itself
D-Metadata: user-provided information on what, who, what and when
C-Metadata: all related metadata, papers, pictures, related studies
B-Metadata: a summary of all other types of metadata
CLRC DataPortal
The DataPortal currently allows access to selected metadata and data from four facilities. The first three are housed by CLRC:
The Synchrotron Radiation Department (SRD)
The Neutron Spallation Source (ISIS)
The British Atmospheric Data Centre (BADC)
Max-Planck Institute for Meteorology (MPIM)
You will be able to access the available data via the basic search.
If you are not one of our partners but would like to try the system, you can use one of our test accounts: log in using 'dpuser' as both username and password.
http://esc.dl.ac.uk:9000/index.html
DataPortal Architecture
The major functions of the DataPortal (DP) are grouped into modules. Each module has a grid services interface to communicate with the other DP services and, in some cases, with outside services such as the Visualisation or HPC Portal. SOAP is used for communication and WSDL to describe the various services. We do not change any local metadata system, but use our own wrappers to translate our general query format into the local syntax. Replies from the resources are XML files compliant with the CLRC Scientific Metadata Format:
(http://www-dienst.rl.ac.uk/library/2002/tr/dltr-2002001.pdf)
The UK e-Science Grid CA provides Globus X.509 certificates for the UK e-Science community. The CA is located at RAL and is run as part of the Grid Support Centre funded by the Research Councils' Core e-Science programme.
(http://www.grid-support.ac.uk/)
The implementation of the core modules as grid services allows the DataPortal to be a truly distributed application and allows several instances of the DataPortal to be logically combined, thus extending any user query.
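The wrapper idea described on this slide (leave the local metadata system untouched, translate a general query into local syntax, and reply in XML) can be sketched as follows. This is a minimal illustration under stated assumptions: the field mapping, the `studies` table, the in-memory SQLite database standing in for a facility catalogue, and the XML element names are all hypothetical and do not reproduce the CLRC Scientific Metadata Format.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping from general query fields to one facility's columns.
FIELD_MAP = {"keyword": "subject", "investigator": "pi_name"}


def to_local_sql(general_query):
    """Translate a general query (dict) into parameterised local SQL."""
    clauses, params = [], []
    for name, value in general_query.items():
        clauses.append(f"{FIELD_MAP[name]} = ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = f"SELECT title, subject, pi_name FROM studies WHERE {where}"
    return sql, params


def to_xml(rows):
    """Wrap local result rows in a small XML reply document."""
    root = ET.Element("results")
    for title, subject, pi_name in rows:
        study = ET.SubElement(root, "study")
        ET.SubElement(study, "title").text = title
        ET.SubElement(study, "topic").text = subject
        ET.SubElement(study, "investigator").text = pi_name
    return ET.tostring(root, encoding="unicode")


# Stand-in for a facility's existing metadata catalogue.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE studies (title TEXT, subject TEXT, pi_name TEXT)")
db.execute("INSERT INTO studies VALUES ('Run 12', 'zeolite', 'A. Example')")

sql, params = to_local_sql({"keyword": "zeolite"})
reply = to_xml(db.execute(sql, params).fetchall())
print(reply)
```

Because only the wrapper knows the local column names, a facility can change its catalogue layout without affecting the portal, as long as the wrapper's mapping is updated.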
General CLRC DataPortal Architecture
[Diagram: the CLRC DataPortal Server, linked to other instances of the CLRC DataPortal Server, connects to Facility 1 ... Facility N; each facility exposes its local data and local metadata through an XML wrapper.]
DataPortal Architecture (2)
As well as interacting with the DataPortal via the Web Interface, users can also run queries by directly calling the Query & Reply service, assuming that they are properly authenticated. Other services are also externally visible, for example the Shopping Cart.
The Shopping Cart allows registered users to permanently store and annotate pointers to the external data files and data sets.
[Architecture diagram modules: Facilities Access Control; Certification Authority; DataPortal Web Interface; Authentication & Authorisation; Session Management; Facilities XML Wrappers; Query & Reply; Facility Administration; DataPortal Permanent Repository; External Data File Store(s); Data Transfer Service; Look Up; Shopping Cart]
Facility Administration allows external facilities to advertise their grid services to the DataPortal.
Accessing DataPortal either via Web Interface or Web Services Interfaces e.g. Query and Reply
Authenticate and Authorise user by checking certificate validity and check with associated facilities for general access rights
Query Generation, Selection of Suitable Facilities to Query. Farm out query to selected Facilities in parallel and collect and collate results
Put interesting Data in your personal, permanent Shopping Cart, which you can share with others as required.
Use the Data Transfer Service to send your data on to a chosen application or service
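The "farm out in parallel, then collect and collate" step of the Query & Reply service can be illustrated with a short Python sketch. The facility names, their holdings and the `make_facility` helper are invented for the example; a thread pool is used here simply as one convenient way to issue the calls concurrently, not because the DataPortal is known to work this way.

```python
from concurrent.futures import ThreadPoolExecutor


def make_facility(name, holdings):
    """Build a stand-in for one facility's query endpoint."""
    def query(keyword):
        return [f"{name}:{item}" for item in holdings if keyword in item]
    return query


# Hypothetical facilities registered with the portal.
FACILITIES = {
    "facility_a": make_facility("facility_a", ["zeolite-run1", "ice-run2"]),
    "facility_b": make_facility("facility_b", ["zeolite-scan9"]),
}


def farm_out(keyword, selected):
    """Query the selected facilities in parallel and collate the replies."""
    with ThreadPoolExecutor(max_workers=max(1, len(selected))) as pool:
        futures = {name: pool.submit(FACILITIES[name], keyword)
                   for name in selected}
        collated = []
        for name in selected:  # collate in a deterministic order
            collated.extend(futures[name].result())
    return collated


print(farm_out("zeolite", ["facility_a", "facility_b"]))
```

Collating in the order the facilities were selected keeps the result list stable even though the individual replies may arrive in any order.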
Choose Facilities of Interest
Select Discipline and reduce Search Field
Annotate your Search Results
Forgotten where your data came from?
Specific Services associated with this data
Association of data with appropriate applications
Scientists need to be able to link to all their favourite applications for analysis, simulation and visualisation, but they also need to be informed about other suitable programs.
Suitable applications
Correct Format
Suitable for your environment
Availability
HPCGrid Services Portal
This is a pilot project funded by the CLRC e-Science Centre to develop a Web portal to search for resources and submit HPC applications to a computational Grid in the UK. It will form the basis of application portals for the UK e-Science Grid and "thematic Grids" for e.g. NERC DataGrid and HPCI Consortia.
This project is a collaboration with the San Diego Supercomputer Center, who have developed the GridPort portal and HotPage software for the NPACI HPC Grid, and with the University of Lecce, Italy, who have developed the Grid Resource Broker.
http://esc.dl.ac.uk/HPCPortal/
HPC Grid Services Portal
Provides a portal for HPC resources which can be customised for domain-specific applications.
Original collaboration with San Diego Supercomputer Center, now University of Texas (Mary Thomas).
Similar functionality to HotPage and GridPort (SDSC):
Single sign-on using a digital certificate (GSI)
Resource monitoring and Discovery (Globus)
Application Discovery (search engine)
Personal "desktop" workspace
File transfer (Globus) and Job Submission (Globus)
InfoPortal
HPCPortal
DataPortal
Searching for Applications on the UK Level 2 Grid
Choose Application: DLPOLY
Resulting Findings for DLPOLY
Summary Description
Web Service Address for DLPOLY code
Information about the systems on which the code is installed and available for use
Link to job submission
All machines on the UK level 2 Grid and their availability
Conversion of data from one application to the other
The scientists will need to be able to pass data from one application to the next seamlessly and with minimum interference on their part.
Determining Data Formats
Data Schema
Interchange/Conversion
Example: e-Materials Project
The CLRC DataPortal Related Projects
E-SCIENCE TECHNOLOGIES IN THE SIMULATION OF COMPLEX MATERIALS
A combination of novel computational and computer science methodologies and teams will be used to develop GRID e-Science technologies to deliver new simulation solutions to problems and fields relating to combinatorial materials science and polymorph prediction. The project will exploit the latest developments in scientific simulation methodologies (both electronic structure and force field based) and hardware ranging from desktop to HPC. It will establish a field tested integrated data and computing e-Science infrastructure customised for these key areas of current materials science. This infrastructure will, among others, enable the automatic submission of simulation, triggered by the identification of knowledge gaps in the database in response to user queries. Furthermore, the automatic integration of experimental and computational results for screening applications will be supported.
The Science: Filtering
Purely SiO4 zeolite
Metal substitution with addition of proton
Calculation of vibrational frequencies
Add probe
Increase quality of calculation for best candidates
Information of interest: structure, total energy, binding energy, HOMO/LUMO, population analysis, vibrational frequencies.
Two-point displacement method used to build up the dynamical matrix. Single-point energy calculation at each displacement, positive and negative, in x, y and z.
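The two-point displacement idea can be sketched numerically in Python. This is an illustration under assumptions, not the project's code: it presumes that each single-point calculation at a displaced geometry returns a gradient, replaces the quantum chemistry code with a toy analytic gradient (`toy_gradient`), and omits the mass-weighting that turns the second-derivative matrix into the dynamical matrix.

```python
def hessian_two_point(gradient, coords, step=1e-3):
    """Second-derivative matrix from gradients at +step and -step.

    Column j is the central difference
    (g(x + step*e_j) - g(x - step*e_j)) / (2 * step).
    """
    n = len(coords)
    hess = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus = list(coords)
        minus = list(coords)
        plus[j] += step     # positive displacement
        minus[j] -= step    # negative displacement
        g_plus = gradient(plus)
        g_minus = gradient(minus)
        for i in range(n):
            hess[i][j] = (g_plus[i] - g_minus[i]) / (2.0 * step)
    # Symmetrise to remove small numerical asymmetries.
    for i in range(n):
        for j in range(i + 1, n):
            avg = 0.5 * (hess[i][j] + hess[j][i])
            hess[i][j] = hess[j][i] = avg
    return hess


def toy_gradient(x):
    """Gradient of E = x0^2 + 1.5*x1^2 + 0.5*x0*x1 (a made-up surface)."""
    return [2.0 * x[0] + 0.5 * x[1], 0.5 * x[0] + 3.0 * x[1]]


for row in hessian_two_point(toy_gradient, [0.1, -0.2]):
    print(row)
```

For N atoms this needs 6N displaced calculations (two per Cartesian coordinate), which is why this stage is a natural candidate for farming out to Grid resources.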
The Computation
1. Micro-iterations to relax shells with respect to forces from the QM region. The RMS criterion (RMS = x) is tested for further movement of shells.
2. Energy and gradients are passed from GAMESS-UK to GULP, and the final forces are then passed back to ChemShell (newopt module), which performs the geometry optimisation.
3. Optimisation is considered complete when both the maximum gradient and the maximum step are below set criteria (maxg and maxs < 0.01).
[Diagram: the ChemShell optimiser drives GAMESS-UK and GULP in a loop until the criteria above are met.]
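The termination test in step 3 can be shown with a toy optimiser in Python. Plain steepest descent on a made-up quadratic surface is substituted here for ChemShell's newopt module, whose algorithm is not described on the slide; only the convergence check (maximum gradient and maximum step both below 0.01) follows the text.

```python
def optimise(gradient, coords, rate=0.2, maxg_tol=0.01, maxs_tol=0.01,
             max_iter=1000):
    """Steepest descent that stops when max gradient and max step are small."""
    coords = list(coords)
    for iteration in range(1, max_iter + 1):
        grad = gradient(coords)
        step = [-rate * g for g in grad]
        coords = [c + s for c, s in zip(coords, step)]
        maxg = max(abs(g) for g in grad)   # largest gradient component
        maxs = max(abs(s) for s in step)   # largest coordinate change
        if maxg < maxg_tol and maxs < maxs_tol:
            return coords, iteration
    raise RuntimeError("optimisation did not converge")


def toy_gradient(x):
    """Gradient of E = x0^2 + 2*x1^2, standing in for QM/MM forces."""
    return [2.0 * x[0], 4.0 * x[1]]


final, iterations = optimise(toy_gradient, [1.0, 1.0])
print(final, iterations)
```

Testing both quantities guards against stopping on a flat region where the step is small but the gradient is not, or the reverse.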
CML - Chemical Markup Language
CML is a new approach to managing molecular information. It has a large scope as it covers disciplines from macromolecular sequences to inorganic molecules and quantum chemistry. CML is new in bringing the power of XML to the management of chemical information. CML and associated tools allow for the conversion of current files without semantic loss into structured documents, including chemical publications, and provide for the precise location of information within files.
Developed by Peter Murray-Rust and Henry S. Rzepa.
http://www.xml-cml.org
As an addition they are also looking at:
CCML – a Computational Chemical Markup Language
<document>
<!-- CML document - caffeine - karne - 7/8/00 -->
<!-- file converted from: MDL .mol -->
<cml title="caffeine" id="cml_caffeine_karne" xmlns="x-schema:cml_schema_ie_02.xml">
  <molecule title="caffeine" id="mol_caffeine_karne" convention="mol">
    <formula>C8 H10 N4 O2</formula>
    <string title="CAS">58-08-2</string>
    <string title="ACX">I1001269</string>
    <string title="DOT">UN 1544</string>
    <string title="RTECS">EV6475000</string>
    <float title="molecule weight">194.19</float>
    <float title="melting point" units="degC">238</float>
    <float title="specific gravity">1.23</float>
    <string title="water solubility" units="g/100 mL" convention="g per 100 mL at 23 degC">1-5</string>
    <string title="comments">White powder or white glistening needles usually melted together. LIGHT SENSITIVE</string>
    <list title="alternate names">
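To show what "precise location of information within files" means in practice, the following Python sketch reads typed properties out of a small CML-style fragment using the standard library. The fragment is a shortened, self-contained stand-in built from values on the slide; dropping the namespace and the outer wrapper elements is an assumption made to keep the example short, and the `read_properties` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, namespace-free fragment modelled on the slide's example.
CML_SNIPPET = """
<molecule title="caffeine" id="mol_caffeine">
  <formula>C8 H10 N4 O2</formula>
  <string title="CAS">58-08-2</string>
  <float title="molecule weight">194.19</float>
  <float title="melting point" units="degC">238</float>
</molecule>
"""


def read_properties(xml_text):
    """Collect the formula plus all titled float and string properties."""
    molecule = ET.fromstring(xml_text)
    props = {"formula": molecule.findtext("formula")}
    for node in molecule.findall("float"):
        # Numeric values keep their units alongside the number.
        props[node.get("title")] = (float(node.text), node.get("units"))
    for node in molecule.findall("string"):
        props[node.get("title")] = node.text
    return props


props = read_properties(CML_SNIPPET)
print(props["formula"], props["melting point"])
```

Because every value carries a title and, where relevant, units, a converter between applications can pick out exactly the fields it needs instead of parsing free-format output.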
The CLRC DataPortal Related Projects
ENVIRONMENT FROM THE MOLECULAR LEVEL: AN E-SCIENCE PROPOSAL FOR MODELLING THE ATOMISTIC PROCESSES INVOLVED IN ENVIRONMENTAL ISSUES
Many environmental problems, such as transport of pollutants, development of remediation strategies, weathering, and containment of high-level radioactive waste, require an understanding of fundamental mechanisms and processes at a molecular level. Computer simulations at a molecular level can considerably advance our understanding of these processes. Developments in atomistic simulation tools must now be linked with GRID technologies in order to facilitate simulation studies that can be performed with realistic conditions, and which can scan across a wide range of physical and chemical parameters. This proposal brings together simulation scientists, applications developers and computer scientists to develop UK e-science/GRID capabilities for molecular simulations of environmental issues. A common set of simulation tools will be developed for a wide range of applications, and the GRID environment will be established which will result in a giant leap in the capabilities of these powerful scientific tools. See http://eminerals.org/
The CLRC DataPortal Related Projects
THE NERC DATAGRID
Data discovery and delivery are inherent components of many aspects of science. They can be considered part of a processing chain that starts with raw data from a variety of sources, and ends with the graphical production of information that is directly used in scientific research. This proposal is to build a grid which makes data discovery, delivery and use much easier than it is now, facilitating better use of the existing investment in the curation and maintenance of quality data archives. Further we intend to make the connection between data held in managed archives and data held by individual research groups seamless in such a way that the same tools can be used to compare and manipulate data from both sources. What will be completely new will be the ability to compare and contrast data from an extensive range of (US, European, UK, NERC) datasets from within one specific context. The presence of the NERC DataGrid will allow grid based visualisation services to access a wide variety of data held at the British Atmospheric and Oceanographic Data Centres (BADC and BODC respectively) as well as on individual storage systems belonging to groups which register their data with the NERC DataGrid. The structures put in place will also allow NERC data to become part of the putative future semantic grid. See http://ndg.badc.rl.ac.uk/
CLRC DataPortal Related Projects
EARTH SCIENCE PORTAL
The Earth Science Portal (ESP) is a collaboration designed to build the infrastructure needed to create web portals to provide access to observed and simulated data within the climate and weather communities. The infrastructure created within ESP will provide a flexible framework that will allow interoperability between the front-end and back-end software components.
The initial ESP community workshop was held on Thursday, January 23rd and Friday, January 24th, 2003 at the National Center for Atmospheric Research, Boulder, Colorado. Based on the discussions of the workshop we created a draft document that describes the software framework within ESP. The development activities in ESP are intended to support this framework. The document will be updated based on these activities and on comments and suggestions from the community.
Partners are: BADC, CCLRC, CDC and GFDL NOAA, NASA, LLNL, NCAR and PMEL
http://nomads.gfdl.noaa.gov/~ck/esp/webpages
The CLRC DataPortal Related Projects
EUROPEAN SPATIO-TEMPORAL DATA INFRASTRUCTURE FOR HIGH-PERFORMANCE COMPUTING
ESTEDI, an initiative of European software vendors and supercomputing centres, will establish a European standard for the storage and retrieval of multidimensional high-performance computing (HPC) data. It addresses a main technical obstacle, the delivery bottleneck of large HPC results to the users, by augmenting high-volume data generators with a flexible data management and extraction tool for spatio-temporal raster data. To this end, the multidimensional database system RasDaMan will be enhanced with intelligent mass storage handling and optimised towards HPC. See http://www.estedi.org/
The CLRC DataPortal Related Projects
MSC PROJECT ON AUTOMATED DATA MANAGEMENT FOR CLIMATE SIMULATIONS
These days data is no longer produced only by experiments, measurements and observations. Many of the more complex phenomena are studied in computer simulations. These simulations can produce large quantities of data. However, in contrast to much experimental or observational data, these results are often not accessible to the wider research communities. Simulation data could be more widely exploited if better information was available concerning the simulation itself. This project aims to investigate the possibility of automatically capturing as much metadata concerning the simulation as possible and storing it in a suitable database. The database will be accessible via the CLRC DataPortal. It is expected that, in addition to investigating the issue in general, the students will provide a prototype installation.
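A minimal Python sketch of the idea, capturing metadata about a simulation run automatically and storing it in a database, might look as follows. The table layout, the field names and the example parameters are hypothetical, and an in-memory SQLite database stands in for whatever database the project would actually use.

```python
import json
import platform
import sqlite3
from datetime import datetime, timezone


def capture_metadata(code_name, parameters):
    """Gather run metadata without asking the user to type anything."""
    return {
        "code": code_name,
        "parameters": json.dumps(parameters, sort_keys=True),
        "host": platform.node(),        # machine the run started on
        "system": platform.system(),    # operating system name
        "started_at": datetime.now(timezone.utc).isoformat(),
    }


def store(db, record):
    """Insert one metadata record, creating the table on first use."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS runs "
        "(code TEXT, parameters TEXT, host TEXT, system TEXT, started_at TEXT)"
    )
    db.execute(
        "INSERT INTO runs VALUES "
        "(:code, :parameters, :host, :system, :started_at)",
        record,
    )


db = sqlite3.connect(":memory:")
store(db, capture_metadata("toy_climate_model",
                           {"resolution_km": 50, "years": 10}))
row = db.execute("SELECT code, parameters FROM runs").fetchone()
print(row)
```

In practice the call to `capture_metadata` would sit in the job submission script, so that every run is recorded whether or not the scientist remembers to document it.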
The CLRC DataPortal Related Projects
CLRC e-Science Database Service
Looking for the most flexible operating system in terms of both available software and price/performance ultimately led to the choice of a Linux-based system (enterprise editions). For running the widest choice of databases, Red Hat Advanced Server and SuSE Linux Enterprise Server are available. Oracle has been selected for the initial database service as it offers a clustering technology. Oracle Real Application Clusters are the multi-node extension to the Oracle database server. A cluster is a group of independent servers (nodes) that cooperate as a single system. The primary cluster components are processor nodes, a cluster interconnect, and a shared storage subsystem. The Oracle cluster database combines the memory in the individual nodes to provide a single view of the distributed cache memory for the entire database system. Oracle are the only vendor to offer this capability.
PostgreSQL
We chose IBM x440 series nodes as the building blocks for the data clusters. The IBM Enterprise X-Architecture consists of Intel processor-based servers with features such as support for up to 16-way SMP and remote I/O. The clusters connect to 1TB RAID 5 storage arrays via fibre channel switches.
For Information see:
Integrated e-Science Environment Portal
http://esc.dl.ac.uk/IeSE/
HPC Grid Services Portal
http://esc.dl.ac.uk/HPCPortal/
DataPortal
http://esc.dl.ac.uk:9000/index.html
CLRC e-Science Centre
http://www.e-science.clrc.ac.uk