GRID activities at MTA SZTAKI
Peter Kacsuk
MTA SZTAKI
Laboratory of Parallel and Distributed Systems
www.lpds.sztaki.hu
Contents
• SZTAKI participation in EU and Hungarian Grid projects
• P-GRADE (Parallel Grid Run-time and Application Development Environment)
• Integration of P-GRADE and Condor
• TotalGrid
• Meteorology application by TotalGrid
• Future plans
Hungarian and international GRID projects
[Diagram: Hungarian projects – VISSZKI (Globus tests, Condor tests), DemoGrid (file system, monitoring, applications), SuperGrid (P-GRADE, portal, security, accounting); related international projects – CERN LHC Grid, Condor, EU DataGrid, EU COST SIMBEX, APART-2, EU GridLab, Cactus.]
EU Grid projects of SZTAKI
• DataGrid – application performance monitoring and visualization
• GridLab – grid monitoring and information system
• APART-2 – leading the Grid performance analysis WP
• SIMBEX – developing a European metacomputing system for chemists based on P-GRADE
Hungarian Grid projects of SZTAKI
• VISSZKI – explore and adopt Globus and Condor
• DemoGrid – grid and application performance monitoring and visualization
• SuperGrid (Hungarian Supercomputing Grid) – integrating P-GRADE with Condor and Globus in order to provide a high-level program development environment for the Grid
• ChemistryGrid (Hungarian Chemistry Grid) – developing chemistry Grid applications in P-GRADE
• JiniGrid (Hungarian Jini Grid) – combining P-GRADE with Jini and Java; creating the OGSA version of P-GRADE
• Hungarian Cluster Grid Initiative – to provide a nation-wide cluster Grid for universities
Structure of the Hungarian Supercomputing Grid
[Diagram: resources of the Hungarian Supercomputing Grid connected through the 2.5 Gb/s academic Internet backbone – the NIIFI 2*64-processor Sun E10000, 16-processor Compaq AlphaServers at BME and ELTE, the 58-processor SZTAKI cluster and university (ELTE, BME) clusters.]
The Hungarian Supercomputing GRID project
[Diagram: software layers of the project – GRID applications on top; Web-based GRID access (GRID portal) and the high-level parallel development layer (P-GRADE); low-level parallel development (MPI, PVM, MW); Grid-level job management (Condor-G); Grid middleware (Globus); local job managers (Condor, SGE); the Grid fabric of Sun HPC, Compaq AlphaServers and clusters.]
Distributed supercomputing: P-GRADE
• P-GRADE (Parallel Grid Run-time and Application Development Environment)
• A highly integrated parallel Grid application development system
• Provides:
– Parallel, supercomputing programming for the Grid
– Fast and efficient development of Grid programs
– Observation and visualization of Grid programs
– Fault and performance analysis of Grid programs
• Further development in the:
– Hungarian Supercomputing Grid project
– Hungarian Chemistry Grid project
– Hungarian Jini Grid project
Three layers of GRAPNEL
Communication Templates
• Pre-defined regular process topologies: process farm, pipeline, 2D mesh
• User defines: representative processes, actual size
• Automatic scaling
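The slides do not show the templates as code; purely as an illustration of the idea, the following minimal Python sketch (not P-GRADE code) mimics a process-farm template in which the user supplies only a representative worker and the actual farm size, and the template replicates and wires the processes automatically. The names worker and FARM_SIZE are illustrative.

```python
# Minimal, hypothetical sketch of a process-farm communication template:
# the user writes one representative worker; the template replicates it to the
# requested size and handles task distribution and result collection.
from multiprocessing import Pool

FARM_SIZE = 8  # the "actual size" chosen by the user; the topology scales with it

def worker(task):
    """Representative worker process: the only code the user has to supply."""
    return task * task  # placeholder computation

def process_farm(tasks, size=FARM_SIZE):
    # Automatic scaling: the same user code runs with any farm size.
    with Pool(processes=size) as pool:
        return pool.map(worker, tasks)

if __name__ == "__main__":
    print(process_farm(range(20)))
```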
Mesh Template
Hierarchical Debugging by DIWIDE
Macrostep Debugging
• Support for systematic debugging to handle the non-deterministic behaviour of parallel applications
• Automatic deadlock detection
• Replay technique with collective breakpoints
• Systematic and automatic generation of execution trees
• Testing parallel programs under every timing condition
GRM semi-on-line monitor
• Monitoring and visualising parallel programs at GRAPNEL level.
• Evaluation of long-running programs based on semi-on-line trace collection
• Support for debugger in P-GRADE by execution visualisation
• Collection of both statistics and event traces
• Application monitoring and visualization in the Grid
• No loss of trace data at program abortion: the execution can be visualised up to the point of abortion.
PROVE Statistics Windows
• Profiling based on counters
• Analysis of very long-running programs is enabled
PROVE: Visualization of Event Traces
User controlled focus on processors, processes and messages
Scrolling visualization windows forward and backwards
Integration of Macrostep Debugging and PROVE
Features of P-GRADE
• Designed for non-specialist programmers
• Enables fast reengineering of sequential programs for parallel computers and Grid systems
• Unified graphical support in program design, debugging and performance analysis
• Portability on supercomputers, heterogeneous clusters and components of the Grid
• Two execution modes: interactive mode and job mode
P-GRADE Interactive Mode
[Diagram: the P-GRADE interactive-mode development cycle – Design/Edit, Compile, Map, Debug, Monitor, Visualize.]
Typical usage on supercomputers or clusters.
P-GRADE Job Mode with Condor
[Diagram: the P-GRADE job-mode cycle with Condor – Design/Edit, Compile, Condor Map, Submit job, Detach, Attach.]
Typical usage on clusters or in the Grid.
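The slides do not show how the compiled program is handed to Condor; as a hedged illustration only, the sketch below writes a Condor submit description and calls the standard condor_submit tool. The executable name meander_delta and the file names are made up for illustration; the exact universe and options used by P-GRADE are not specified in the source.

```python
# Hypothetical sketch of submitting a compiled parallel program as a Condor job.
# The submit-description keywords (universe, executable, queue, ...) are standard
# Condor syntax; everything else here is an assumption for illustration.
import subprocess
from pathlib import Path

submit_description = """\
universe   = vanilla
executable = meander_delta
output     = meander_delta.out
error      = meander_delta.err
log        = meander_delta.log
queue
"""

def submit_job(workdir="."):
    submit_file = Path(workdir) / "meander_delta.submit"
    submit_file.write_text(submit_description)
    # condor_submit registers the job with the local Condor pool; after this the
    # user can detach (close the GUI) and attach again later to check progress.
    subprocess.run(["condor_submit", str(submit_file)], check=True)

if __name__ == "__main__":
    submit_job()
```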
Condor/P-GRADE on the whole range of parallel and distributed systems
[Diagram: Condor/P-GRADE covers mainframes, supercomputers, clusters and the Grid (GFlops scale) – P-GRADE with Condor on individual systems and clusters, and P-GRADE with Condor flocking across the Grid.]
Berlin CCGrid Grid Demo workshop: Flocking of P-GRADE programs by Condor
[Diagram: a P-GRADE program submitted from Budapest runs, through Condor flocking, on the Budapest, Madison and Westminster clusters.]
Next step: Check-pointing and migration of P-GRADE programs
[Diagram: clusters in Budapest, London and Wisconsin controlled from the P-GRADE GUI]
1. The P-GRADE program is downloaded to London as a Condor job.
2. The P-GRADE program runs at the London cluster; when the London cluster becomes overloaded, it is check-pointed.
3. The P-GRADE program migrates to Budapest as a Condor job.
4. The P-GRADE program runs at the Budapest cluster.
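The actual P-GRADE/Condor checkpoint mechanism is not described in the slides. As a loose illustration of the idea only, the sketch below periodically saves the state of a computation so that it can be stopped on an overloaded cluster and resumed elsewhere from the last checkpoint; every name and the checkpoint format are assumptions.

```python
# Illustrative checkpoint/restart sketch (not the real P-GRADE mechanism):
# the computation periodically saves its state; if the job is evicted from an
# overloaded cluster, it restarts from the last checkpoint on another cluster.
import pickle
from pathlib import Path

CHECKPOINT = Path("state.ckpt")

def load_state():
    if CHECKPOINT.exists():
        with CHECKPOINT.open("rb") as f:
            return pickle.load(f)
    return {"iteration": 0, "partial_result": 0.0}

def save_state(state):
    with CHECKPOINT.open("wb") as f:
        pickle.dump(state, f)

def run(total_iterations=100_000, checkpoint_every=10_000):
    state = load_state()                             # resume where a previous run stopped
    for i in range(state["iteration"], total_iterations):
        state["partial_result"] += 1.0 / (i + 1)     # placeholder computation
        state["iteration"] = i + 1
        if state["iteration"] % checkpoint_every == 0:
            save_state(state)                        # migration restarts from this file
    return state["partial_result"]

if __name__ == "__main__":
    print(run())
```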
Further development: TotalGrid
• TotalGrid is a total Grid solution that integrates the different software layers of a Grid (see next slide) and provides for companies and universities:
– exploitation of the free cycles of desktop machines in a Grid environment after working hours
– supercomputer capacity using the institution's existing desktops without further investment
– development and testing of Grid programs
Layers of TotalGrid
[Diagram: layers of TotalGrid, from top to bottom – P-GRADE, PERL-GRID, Condor or SGE, PVM or MPI, Internet/Ethernet.]
PERL-GRID
• A thin layer for:
– Grid-level job management between P-GRADE and various local job managers (Condor, SGE, etc.)
– file staging
• Application in the Hungarian Cluster Grid
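PERL-GRID itself is written in Perl and its interfaces are not shown in the slides. Purely to illustrate what such a thin layer does (file staging plus hand-off to a local job manager), here is a Python sketch; the host name, paths, and the choice of scp/ssh transport are assumptions, not PERL-GRID's actual design.

```python
# Hypothetical sketch of a thin grid-level job layer (in the spirit of PERL-GRID,
# but not its real code): stage input files to the execution site, then hand the
# job to whatever local job manager runs there (Condor or SGE).
import subprocess

def stage_files(files, site):
    for f in files:
        # File staging: copy inputs to a working directory on the remote site.
        subprocess.run(["scp", f, f"{site}:work/"], check=True)

def submit(site, job_script, manager="condor"):
    # Grid-level job management: pick the right local submitter per site.
    submit_cmd = {"condor": "condor_submit", "sge": "qsub"}[manager]
    subprocess.run(["ssh", site, submit_cmd, f"work/{job_script}"], check=True)

if __name__ == "__main__":
    site = "cluster1.example.hu"   # illustrative host name
    stage_files(["input.netcdf", "job.submit"], site)
    submit(site, "job.submit", manager="condor")
```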
Hungarian Cluster Grid Initiative
• Goal: to connect the 99 new clusters of the Hungarian higher-education institutions into a Grid
• Each cluster contains 20 PCs and a network server PC.
– Day-time: the components of the clusters are used for education
– At night: all the clusters are connected to the Hungarian Grid by the Hungarian academic network (2.5 Gbit/sec)
– Total Grid capacity by the end of 2003: 2079 PCs
• Current status:
– About 400 PCs are already connected at 8 universities
– Condor-based Grid system
– VPN (Virtual Private Network)
• Open Grid: other clusters can join at any time
Structure of the Hungarian Cluster Grid
[Diagram: 99 clusters of 21 Linux PCs each (2079 PCs in total in 2003) connected through the 2.5 Gb/s academic Internet backbone; each cluster runs Condor and is upgraded to TotalGrid.]
Live demonstration of TotalGrid
• MEANDER Nowcast Program Package:
– Goal: ultra-short forecasting (30 mins) of dangerous weather situations (storms, fog, etc.)
– Method: analysis of all the available meteorology information to produce parameters on a regular mesh (10 km -> 1 km)
• Collaborative partners:
– OMSZ (Hungarian Meteorology Service)
– MTA SZTAKI
Structure of MEANDER
[Diagram: structure of MEANDER – inputs: ALADIN first-guess data, SYNOP data, satellite and radar images, lightning data; processing: decode, CANARI and delta analysis, radar-to-grid and satellite-to-grid interpolation at the current time, running on the GRID; basic fields (pressure, temperature, humidity, wind) and derived fields (type of clouds, overcast, visibility, rainfall state, etc.); visualization: HAWK for meteorologists, GIF for users.]
P-GRADE version of MEANDER
[Diagram: the P-GRADE application graph of MEANDER with process-farm multiplicities of 25x, 10x, 25x and 5x.]
Implementation of the Delta method in P-GRADE
Live demo of MEANDER based on TotalGrid
[Diagram: the demo setup – netCDF input from ftp.met.hu, P-GRADE and PERL-GRID on the submitting site, parallel execution as a Condor-PVM job through PERL-GRID on the cluster, and HAWK visualization of the netCDF output; the sites are connected by 512 kbit shared, 11/5 Mbit dedicated and 34 Mbit shared links.]
Results of the delta method
• Temperature fields at 850 hPa pressure
• Wind speed and direction on the 3D mesh of the MEANDER system
On-line Performance Visualization in TotalGrid
[Diagram: the same demo setup extended with GRM – the parallel execution is monitored by GRM and the GRM trace is transferred back over the same links for visualization.]
PROVE performance visualisation
P-GRADE: Software Development and Execution
[Diagram: the P-GRADE life-cycle on the Grid – edit and debugging, performance analysis, and execution.]
Applications in P-GRADE
Completed applications:
• Meteorology: Nowcast package (Hungarian Meteorology Service)
• Urban traffic simulation (Univ. of Westminster)
Applications under development:
• Chemistry applications
• Smog forecast system
• Analysis of smog alarm strategies
Further extensions of P-GRADE
• Automatic check-pointing of parallel applications inside a cluster (already prototyped)
– Dynamic load-balancing
– Fault-tolerant execution mechanism
• Automatic check-pointing of parallel applications in the Grid (under development)
– Automatic application migration in the Grid
– Fault-tolerant execution mechanism in the Grid
– Saving unfinished parallel jobs of the Cluster Grid
• Extensions under design
– Parameter study support
– Connecting P-GRADE with GAT
– Workflow layer for complex Grid applications
Workflow interpretation of MEANDER
[Diagram: workflow graph of MEANDER with five jobs – the 1st job, followed by the 2nd, 3rd and 4th jobs, followed by the 5th job.]
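The slides only show the five-job graph. As an illustration of what a workflow layer has to do, the sketch below runs five placeholder jobs under an assumed dependency structure (job 1 first, jobs 2 to 4 in parallel, job 5 last), which is one plausible reading of the diagram rather than the documented MEANDER dependencies.

```python
# Illustrative workflow interpretation of a five-job graph (assumed dependencies):
# job1 produces data used by job2, job3 and job4, whose outputs feed job5.
from concurrent.futures import ThreadPoolExecutor

def job(name):
    print(f"running {name}")          # placeholder for a real Grid job submission
    return f"{name}-result"

def run_workflow():
    first = job("job1")                                   # 1st job
    with ThreadPoolExecutor() as pool:                    # 2nd, 3rd and 4th jobs in parallel
        middle = list(pool.map(job, ["job2", "job3", "job4"]))
    final = job("job5")                                   # 5th job collects the results
    return first, middle, final

if __name__ == "__main__":
    run_workflow()
```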
Conclusions
• SZTAKI participates in the largest EU Grid projects and in all the Hungarian Grid projects
• Main results:
– P-GRADE (SuperGrid project)
– Integration of P-GRADE and Condor (SuperGrid); demo at Berlin CCGrid
– TotalGrid (Hungarian Cluster Grid)
– Meteorology application in the Grid based on the P-GRADE and TotalGrid approaches; demo at the 5th EU DataGrid conference
• Access to P-GRADE 8.2.2:
www.lpds.sztaki.hu
Thanks for your attention
Further information: www.lpds.sztaki.hu
GRM semi-on-line monitor
• Semi-on-line:
– stores trace events in local storage (off-line)
– makes them available for analysis at any time during execution on user or system request (on-line pull model)
• Advantages:
– the state (performance) of the application can be analysed at any time
– scalability: trace data can be analysed in smaller sections and deleted when no longer needed
– less overhead/intrusion on the execution system than with on-line collection (see NetLogger)
– less network traffic: pull model instead of push model; collection is initiated only from the top
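To make the pull model concrete, here is a small sketch (not GRM code; it assumes a simple file-based local store) in which a local monitor buffers trace events on local disk and a main monitor fetches and discards them only when asked.

```python
# Minimal sketch of the semi-on-line pull model (not the real GRM implementation):
# events are buffered locally (off-line) and transferred only when the consumer
# asks for them (on-line pull), so nothing is pushed over the network continuously.
import json
import time
from pathlib import Path

TRACE = Path("local_trace.log")

def record_event(event_type, **data):
    """Local monitor side: append one trace event to local storage."""
    entry = {"t": time.time(), "type": event_type, **data}
    with TRACE.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def pull_events():
    """Main monitor side: fetch buffered events on request and clear the buffer,
    so trace data can be analysed in smaller sections and then discarded."""
    if not TRACE.exists():
        return []
    events = [json.loads(line) for line in TRACE.read_text().splitlines()]
    TRACE.unlink()
    return events

if __name__ == "__main__":
    record_event("send", src=0, dst=1, nbytes=1024)
    record_event("recv", src=0, dst=1, nbytes=1024)
    print(pull_events())   # collection initiated from the top, not pushed
```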
GRM/PROVE in the DataGrid project
Basic tasks:
Step 1: to create a GRM/PROVE version that is independent of P-GRADE and runnable in the Grid
Step 2: to connect the GRM monitor to the R-GMA information system
GRM in the grid
[Diagram: application processes on the PCs of Site 1 and Site 2 pass trace events through shared-memory buffers to a Local Monitor (LM) on each host; the Main Monitor (MM) on the submit machine collects them into a trace file for PROVE.]
• Pull model => smaller network traffic than in NetLogger
• Local monitor => more scalable than NetLogger
Start-up of Local Monitors
[Diagram: start-up sequence – the GUI and the Main Monitor run on the submit host; the Grid broker sends the job through the local job manager of the site to its hosts; a Local Monitor is started next to each application process and connects back to the Main Monitor at the MM host:port received with the job.]
This mechanism is used in TotalGrid and in the live demo.
2nd step: Integration with R-GMA
[Diagram: application processes on the machines of a site publish monitoring data into R-GMA; the Main Monitor on the client machine collects it and feeds PROVE.]
Integration with R-GMA
[Diagram: the R-GMA architecture – the instrumented application code acts as a sensor and uses the Producer API (SQL CREATE TABLE, SQL INSERT) to publish events through the Producer Servlet; the Main Monitor uses the Consumer API (SQL SELECT) through the Consumer Servlet; the Registry and Schema Servlets (the "database of event types") are contacted through the Registry and Schema APIs, and data is transferred as XML.]
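As an illustration only of the producer/consumer pattern sketched above, the following uses an in-memory SQLite database as a stand-in for the shared event-type database: the producer side declares the event table and inserts monitoring tuples, and the consumer side retrieves them with SQL SELECT. R-GMA's real servlet, registry/schema and XML transport layers are not modelled, and the table and column names are made up.

```python
# Toy stand-in for the R-GMA producer/consumer interaction (not the real R-GMA API):
# SQLite plays the role of the shared "database of event types".
import sqlite3

db = sqlite3.connect(":memory:")

# Producer side (instrumented application / sensor):
db.execute("CREATE TABLE grm_event (ts REAL, process TEXT, event TEXT)")    # SQL CREATE TABLE
db.execute("INSERT INTO grm_event VALUES (?, ?, ?)", (0.01, "p0", "send"))  # SQL INSERT
db.execute("INSERT INTO grm_event VALUES (?, ?, ?)", (0.02, "p1", "recv"))

# Consumer side (Main Monitor feeding PROVE):
rows = db.execute(
    "SELECT ts, process, event FROM grm_event ORDER BY ts"                  # SQL SELECT
).fetchall()
for ts, process, event in rows:
    print(f"{ts:.2f}s {process} {event}")
```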