SC 2004, Pittsburgh, Nov. 6-12
GRID superscalar: a programming paradigm for GRID applications
CEPBA-IBM Research Institute
Rosa M. Badia, Jesús Labarta, Josep M. Pérez, Raül Sirvent
Outline
• Objective
• The essence
• User’s interface
• Automatic code generation
• Run-time features
• Programming experiences
• Ongoing work
• Conclusions
Objective
• Ease the programming of GRID applications
• Basic idea:
[Figure: analogy between a superscalar microprocessor (L3/L2 caches, directory/control, and LSU, IFU, BXU, IDU, FPU, FXU, ISU units, operating at ns timescales) and the Grid, where the same concepts apply at timescales of seconds/minutes/hours.]
Outline
• Objective
• The essence
• User’s interface
• Automatic code generation
• Run-time features
• Programming experiences
• Ongoing work
• Conclusions
The essence
• Assembly language for the GRID
  – Simple sequential programming, well-defined operations and operands
  – C/C++, Perl, …
• Automatic run-time “parallelization”
  – Use architectural concepts from microprocessor design
  – Instruction window (DAG), dependence analysis, scheduling, locality, renaming, forwarding, prediction, speculation, …
The essence
• Input/output files
for (int i = 0; i < MAXITER; i++) {
newBWd = GenerateRandom();
subst (referenceCFG, newBWd, newCFG);
dimemas (newCFG, traceFile, DimemasOUT);
post (newBWd, DimemasOUT, FinalOUT);
if(i % 3 == 0) Display(FinalOUT);
}
fd = GS_Open(FinalOUT, R);
printf("Results file:\n"); present (fd);
GS_Close(fd);
The essence

[Figure: the run-time task graph — one Subst → DIMEMAS → EXTRACT chain per iteration, a Display task every third iteration, and a final GS_open synchronization, executed on the CIRI Grid.]
Outline
• Objective
• The essence
• User’s interface
• Automatic code generation
• Run-time features
• Programming experiences
• Ongoing work
• Conclusions
User’s interface
• Three components:
  – Main program
  – Subroutines/functions
  – Interface Definition Language (IDL) file
• Programming languages: C/C++, Perl
User’s interface
• A typical sequential program
  – Main program:

for (int i = 0; i < MAXITER; i++) {
    newBWd = GenerateRandom();
    subst (referenceCFG, newBWd, newCFG);
    dimemas (newCFG, traceFile, DimemasOUT);
    post (newBWd, DimemasOUT, FinalOUT);
    if (i % 3 == 0) Display(FinalOUT);
}
fd = GS_Open(FinalOUT, R);
printf("Results file:\n");
present (fd);
GS_Close(fd);
User’s interface
• A typical sequential program
  – Subroutines/functions:

void dimemas(in File newCFG, in File traceFile, out File DimemasOUT)
{
    char command[500];
    putenv("DIMEMAS_HOME=/usr/local/cepba-tools");
    sprintf(command, "/usr/local/cepba-tools/bin/Dimemas -o %s %s",
            DimemasOUT, newCFG);
    GS_System(command);
}

void display(in File toplot)
{
    char command[500];
    sprintf(command, "./display.sh %s", toplot);
    GS_System(command);
}
User’s interface
• GRID superscalar programming requirements
  – Main program: open/close files with
    • GS_FOpen, GS_Open, GS_FClose, GS_Close
    • Currently required. Next versions will implement a version of the C library functions with GRID superscalar semantics
  – Subroutines/functions
    • Temporary files on the local directory, or ensure uniqueness of names per subroutine invocation
    • GS_System instead of system
    • All required input/output files must be passed as arguments
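The per-invocation uniqueness requirement can be sketched in plain C (a hypothetical helper, not part of the GRID superscalar API): derive each temporary file name from the worker's PID and an invocation counter, so concurrent task instances on the same server never collide.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of the GRID superscalar API): build a
   temporary file name unique per subroutine invocation by combining
   a stem, the worker's PID, and an invocation counter. */
const char *unique_tmp(const char *stem, char *buf, size_t n)
{
    static int counter = 0;
    snprintf(buf, n, "%s.%d.%d", stem, (int)getpid(), counter++);
    return buf;
}
```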
User’s interface
• Gridifying the sequential program
  – CORBA-IDL-like interface:
    • In/Out/InOut files
    • Scalar values (in or out)
  – The subroutines/functions listed in this file will be executed on a remote server in the Grid

interface MC {
    void subst(in File referenceCFG, in double newBW, out File newCFG);
    void dimemas(in File newCFG, in File traceFile, out File DimemasOUT);
    void post(in File newCFG, in File DimemasOUT, inout File FinalOUT);
    void display(in File toplot);
};
Outline
• Objective
• The essence
• User’s interface
• Automatic code generation
• Run-time features
• Programming experiences
• Ongoing work
• Conclusions
Automatic code generation

[Figure: gsstubgen reads app.idl and generates the client-side files app.h and app-stubs.c, the server-side file app-worker.c, the constraints files app_constraints.h, app_constraints.cc and app_constraints_wrapper.cc, and app.xml. The user supplies app.c and app-functions.c.]
Sample stubs file
#include <stdio.h>
…
int gs_result;
void Subst(file referenceCFG, double seed, file newCFG)
{
/* Marshalling/Demarshalling buffers */
char *buff_seed;
/* Allocate buffers */
buff_seed = (char *)malloc(atoi(getenv("GS_GENLENGTH"))+1);
/* Parameter marshalling */
sprintf(buff_seed, "%.20g", seed);
Execute(SubstOp, 1, 1, 1, 0, referenceCFG, buff_seed, newCFG);
/* Deallocate buffers */
free(buff_seed);
}…
Sample worker main file
#include <stdio.h>
…
int main(int argc, char **argv)
{
    enum operationCode opCod = (enum operationCode)atoi(argv[2]);

    IniWorker(argc, argv);
    switch (opCod) {
    case SubstOp:
        {
            double seed;
            seed = strtod(argv[4], NULL);
            Subst(argv[3], seed, argv[5]);
        }
        break;
    …
    }
    EndWorker(gs_result, argc, argv);
    return 0;
}
Sample constraints skeleton file
#include "mcarlo_constraints.h"
#include "user_provided_functions.h"

string Subst_constraints(file referenceCFG, double seed, file newCFG)
{
    string constraints = "";
    return constraints;
}

double Subst_cost(file referenceCFG, double seed, file newCFG)
{
    return 1.0;
}
…
Sample constraints wrapper file (1)
#include <stdio.h>
…
typedef ClassAd (*constraints_wrapper)(char **_parameters);
typedef double (*cost_wrapper)(char **_parameters);

// Prototypes
ClassAd Subst_constraints_wrapper(char **_parameters);
double Subst_cost_wrapper(char **_parameters);
…

// Function tables
constraints_wrapper constraints_functions[4] = { Subst_constraints_wrapper, … };
cost_wrapper cost_functions[4] = { Subst_cost_wrapper, … };
Sample constraints wrapper file (2)
ClassAd Subst_constraints_wrapper(char **_parameters)
{
    char **_argp;
    // Generic buffers
    char *buff_referenceCFG;
    char *buff_seed;
    // Real parameters
    char *referenceCFG;
    double seed;

    // Read parameters
    _argp = _parameters;
    buff_referenceCFG = *(_argp++);
    buff_seed = *(_argp++);

    // Datatype conversion
    referenceCFG = buff_referenceCFG;
    seed = strtod(buff_seed, NULL);

    string _constraints = Subst_constraints(referenceCFG, seed);

    ClassAd _ad;
    ClassAdParser _parser;
    _ad.Insert("Requirements", _parser.ParseExpression(_constraints));
    // Free buffers
    return _ad;
}
Sample constraints wrapper file (3)
double Subst_cost_wrapper(char **_parameters)
{
    char **_argp;
    // Generic buffers
    char *buff_referenceCFG;
    char *buff_seed;
    // Real parameters
    char *referenceCFG;
    double seed;

    // Read parameters
    _argp = _parameters;
    buff_referenceCFG = *(_argp++);
    buff_seed = *(_argp++);
    // Datatype conversion
    referenceCFG = buff_referenceCFG;
    seed = strtod(buff_seed, NULL);

    double _cost = Subst_cost(referenceCFG, seed);
    // Free buffers
    return _cost;
}
…
Binary building
[Figure: the client binary is built from app.c, app-stubs.c, app_constraints.cc and app_constraints_wrapper.cc, linked with the GRID superscalar runtime and GT2; each server binary is built from app-functions.c and app-worker.c. GT2 services used: gsiftp, gram.]
Calls sequence without GRID superscalar
[Figure: app.c calls app-functions.c directly on the LocalHost.]
Calls sequence with GRID superscalar
[Figure: on the LocalHost, app.c calls app-stubs.c, which invokes the GRID superscalar runtime (with app_constraints.cc and app_constraints_wrapper.cc) over GT2; on the RemoteHost, app-worker.c calls app-functions.c.]
Outline
• Objective
• The essence
• User’s interface
• Automatic code generation
• Run-time features
• Programming experiences
• Ongoing work
• Conclusions
Run-time features
• Previous prototype over Condor and MW
• Current prototype over Globus 2.x, using the API
• File transfer, security, … provided by Globus
• Primitives implemented by the run-time:
  – GS_On, GS_Off
  – Execute
  – GS_Open, GS_Close, GS_FOpen, GS_FClose
  – GS_Barrier
  – Worker side: GS_System
Run-time features
• Data dependence analysis
• Renaming
• File forwarding
• Shared disks management and file transfer policy
• Resource brokering
• Task scheduling
• Task submission
• End of task notification
• Results collection
• Explicit task synchronization
• File management primitives
• Checkpointing at task level
• Deployer
• Exception handling
Data-dependence analysis
• Data-dependence analysis
  – Detects RaW, WaR, and WaW dependences based on file parameters
• Oriented to simulations, FET solvers, bioinformatics applications
  – Main parameters are data files
• The tasks’ Directed Acyclic Graph (DAG) is built from these dependences
[Figure: a DAG of Subst → DIMEMAS → EXTRACT chains joined by a Display task.]
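The detection rule can be illustrated with a small, self-contained C sketch (the types and names are illustrative, not the run-time's real data structures): for a given file, compare whether the earlier and later task read or write it.

```c
#include <string.h>

/* Illustrative sketch of file-based dependence detection; dep_kind and
   task_files are invented names, not the GRID superscalar internals. */
enum dep_kind { DEP_NONE, DEP_RAW, DEP_WAR, DEP_WAW };

struct task_files {
    const char *in[4];  int n_in;   /* files read    */
    const char *out[4]; int n_out;  /* files written */
};

static int contains(const char *const *v, int n, const char *f)
{
    for (int i = 0; i < n; i++)
        if (strcmp(v[i], f) == 0)
            return 1;
    return 0;
}

/* Dependence of 'later' on 'earlier' through file f (RaW dominates). */
enum dep_kind dep_on_file(const struct task_files *earlier,
                          const struct task_files *later, const char *f)
{
    int ew = contains(earlier->out, earlier->n_out, f); /* earlier writes */
    int er = contains(earlier->in,  earlier->n_in,  f); /* earlier reads  */
    int lw = contains(later->out,   later->n_out,   f); /* later writes   */
    int lr = contains(later->in,    later->n_in,    f); /* later reads    */
    if (ew && lr) return DEP_RAW;  /* read after write  -> true dependence */
    if (ew && lw) return DEP_WAW;  /* write after write -> removable       */
    if (er && lw) return DEP_WAR;  /* write after read  -> removable       */
    return DEP_NONE;
}
```

With task lists like those of the Monte Carlo example, Subst → dimemas on newCFG classifies as RaW, the reverse direction as WaR.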
File renaming
• WaW and WaR dependences are avoided by renaming

While (!end_condition()) {
    T1 (…, …, "f1");
    T2 ("f1", …, …);
    T3 (…, …, …);
}

[Figure: iterations T1_1 → T2_1 → T3_1, T1_2 → T2_2 → T3_2, …, T1_N → T2_N → T3_N. Each iteration’s "f1" is renamed ("f1_1", "f1_2", …), removing the WaR dependence between one iteration’s T2 and the next iteration’s T1, and the WaW dependence between successive T1s.]
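A minimal C sketch of the renaming idea (a hypothetical helper, not the run-time's implementation): every new write to a logical file gets a fresh physical name, so successive writers no longer collide and only the true RaW chains remain.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical renamer: maps a logical file name ("f1") to a fresh
   physical name per write ("f1_1", "f1_2", ...). */
struct renamer {
    const char *logical;
    int version;
};

/* Physical name for the next write of the logical file. */
const char *next_write(struct renamer *r, char *buf, size_t n)
{
    r->version++;
    snprintf(buf, n, "%s_%d", r->logical, r->version);
    return buf;
}

/* Physical name a reader should use: the latest written version. */
const char *current_read(const struct renamer *r, char *buf, size_t n)
{
    snprintf(buf, n, "%s_%d", r->logical, r->version);
    return buf;
}
```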
File forwarding
• File forwarding reduces the impact of RaW data dependences

[Figure: without forwarding, T2 starts only after T1 has written f1; with forwarding, f1 is streamed from T1 to T2 by socket while both tasks run.]
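The effect can be demonstrated with a self-contained POSIX C sketch, using a pipe in place of the socket (illustrative only): the consumer starts reading the data while the producer is still running, instead of waiting for a completed file to be transferred.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Sketch of file forwarding: "T1" streams the contents of f1 to "T2"
   through a pipe (standing in for the socket), so T2 consumes the data
   while T1 runs instead of waiting for a completed file transfer. */
int forward_demo(char *out, size_t n)
{
    int fd[2];
    if (pipe(fd) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {               /* child: producer task T1 */
        close(fd[0]);
        const char *data = "f1-contents";
        if (write(fd[1], data, strlen(data)) < 0)
            _exit(1);
        close(fd[1]);
        _exit(0);
    }
    close(fd[1]);                 /* parent: consumer task T2 */
    ssize_t r = read(fd[0], out, n - 1);
    out[r > 0 ? r : 0] = '\0';
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return 0;
}
```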
File transfer policy
[Figure: the client keeps the master copies; f1 is transferred to server1’s working directory for T1, T1’s output f4 is transferred to server2’s working directory for T6, and T6’s output f7 is transferred back to the client.]
Shared working directories
[Figure: with shared working directories, f1, f4, and f7 reside in a directory visible to both server1 and server2, so T1 and T6 exchange files without client-mediated transfers.]
Shared input disks
[Figure: input directories on a disk shared by the client, server1, and server2 — input files need not be transferred.]
Disks configuration file
Shared directories:
khafre.cepba.upc.es    SharedDisk0  /app/DB/input_data
kandake0.cepba.upc.es  SharedDisk0  /usr/DB/inputs
kandake1.cepba.upc.es  SharedDisk0  /usr/DB/inputs

Working directories:
kandake0.cepba.upc.es  DiskLocal0   /home/ac/rsirvent/matmul-perl/worker_perl
kandake1.cepba.upc.es  DiskLocal0   /home/ac/rsirvent/matmul-perl/worker_perl
khafre.cepba.upc.es    DiskLocal1   /home/ac/rsirvent/matmul_worker/worker
Resource Broker
• Resource brokering
  – Currently not a main project goal
  – Interface between the run-time and the broker
  – A Condor resource ClassAd is built for each resource

Broker configuration file (localhost plus workers):
Machine LimitOfJobs Queue WorkingDirectory Arch OpSys GFlops Mem NCPUs SoftNameList
khafre.cepba.upc.es 3 none /home/ac/rsirvent/DEMOS/mcarlo i386 Linux 1.475 2587 4 Perl560 Dimemas23
kadesh.cepba.upc.es 0 short /user1/uni/upc/ac/rsirvent/DEMOS/mcarlo powerpc AIX 1.5 8000 16 Perl560 Dimemas23
kandake.cepba.upc.es /home/ac/rsirvent/McarloClAds
Resource selection (1)
• Cost and constraints are specified by the user, per IDL task
• The cost (time) of each task instance is estimated:

double Dimem_cost(file cfgFile, file traceFile)
{
    double time;
    time = (GS_Filesize(traceFile)/1000000) * f(GS_GFlops());
    return time;
}

• A task ClassAd is built at run time for each task instance:

string Dimem_constraints(file cfgFile, file traceFile)
{
    return "(member(\"Dimemas\", other.SoftNameList))";
}
Resource selection (2)
• The broker receives requests from the run-time
  – The ClassAd library is used to match resource ClassAds with task ClassAds
  – If more than one resource matches, the broker selects the one that minimizes

    f(t, r) = FT(r) + ET(t, r)

  – FT(r): file transfer time to resource r
  – ET(t, r): execution time of task t on resource r (using the user-provided cost function)
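The selection step can be sketched in a few lines of C (illustrative names and values; the real broker works on ClassAds): among the matching resources, pick the one minimizing FT(r) + ET(t, r).

```c
/* Illustrative sketch of the broker's selection rule: among resources
   whose constraints matched, choose the one minimizing
   f(t, r) = FT(r) + ET(t, r). */
struct candidate {
    const char *name;
    double ft;   /* file transfer time to the resource */
    double et;   /* estimated execution time on it     */
};

/* Index of the best candidate, or -1 if the list is empty. */
int best_resource(const struct candidate *rs, int n)
{
    int best = -1;
    double best_f = 0.0;
    for (int i = 0; i < n; i++) {
        double f = rs[i].ft + rs[i].et;
        if (best < 0 || f < best_f) {
            best = i;
            best_f = f;
        }
    }
    return best;
}
```

Note that a fast machine with large transfer costs can lose to a slower machine that already holds the input files — which is why FT(r) is part of the objective.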
Task scheduling
• Distributed between the Execute call, the callback function, and the GS_Barrier call
• Possibilities:
  – The task can be submitted immediately after being created
  – The task waits for a resource
  – The task waits for a data dependency
• A GS_Barrier primitive before the end of the program waits for all tasks
Task submission
• A task is submitted for execution as soon as its data dependences are solved, if resources are available
• Composed of:
  – File transfer
  – Task submission
• All specified in RSL
• A temporary directory is created in the server’s working directory for each task
• Calls to Globus:
  – globus_gram_client_job_request
  – globus_gram_client_callback_allow
  – globus_poll_blocking
End of task notification
• Asynchronous monitoring of state-change callbacks:
  – globus_gram_client_callback_allow()
  – callback_func function
• Data structures are updated in the Execute function, the GRID superscalar primitives, and GS_Barrier
Results collection
• Collection of output parameters that are not files
  – Partial barrier synchronization (task generation from the main code cannot continue until the scalar result value is available)
• Socket and file mechanisms provided
GS_Barrier
• Implicit task synchronization – GS_Barrier
  – Inserted in the user’s main program when required
  – Main program execution is blocked
  – globus_poll_blocking() is called
  – Once all tasks have finished, the program may resume
File management primitives
• GRID superscalar file management API primitives:
  – GS_FOpen
  – GS_FClose
  – GS_Open
  – GS_Close
• Mandatory for file management operations in the main program
• Opening a file with the write option:
  – Data-dependence analysis
  – Renaming is applied
• Opening a file with the read option:
  – Partial barrier until the task generating that file as output finishes
• Internally, file management functions are handled as local tasks:
  – A task node is inserted
  – Data-dependence analysis
  – The function is executed locally
• Future work: offer a C library with GRID superscalar semantics (source code with the typical calls could then be used)
Task level checkpointing
• Inter-task checkpointing
• Recovers sequential consistency in the out-of-order execution of tasks

[Figure: successful execution — tasks 0–6 on a timeline, each marked committed, completed, or running; the checkpoint sits after the last committed task (task 3).]
Task level checkpointing
• Inter-task checkpointing
• Recovers sequential consistency in the out-of-order execution of tasks

[Figure: failing execution — a task fails; tasks that finished correctly up to the checkpoint stay committed, and running tasks beyond the failing one are cancelled.]
Task level checkpointing
• Inter-task checkpointing
• Recovers sequential consistency in the out-of-order execution of tasks

[Figure: restarted execution — the run resumes from the last committed task; the previously failing task finishes correctly and execution continues normally.]
Checkpointing
• On failure: from N versions of a file down to one version (the last committed version)
• Transparent to the application developer
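The commit rule can be sketched in C (an invented helper, not the real run-time): although tasks finish out of order, the checkpoint advances only over a contiguous prefix of completed tasks, which is what makes the saved state sequentially consistent on restart.

```c
/* Illustrative sketch of task-level checkpointing: done[i] records
   whether task i has completed; the checkpoint (number of committed
   tasks) advances only over a contiguous prefix of completed tasks. */
int advance_checkpoint(const int *done, int n, int committed)
{
    while (committed < n && done[committed])
        committed++;
    return committed;   /* tasks [0, committed) are safe to commit */
}
```

A completed task that follows a still-running one stays uncommitted, so a failure never leaves a gap in the committed history.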
Deployer
• Java-based GUI
• Allows worker specification: host details, library locations, …
• Selection of the Grid configuration
• Grid configuration checking process:
  – Aliveness of each host (ping)
  – The Globus service is checked by submitting a simple test
  – A remote job is sent that copies the code needed by the worker and compiles it
• Automatic deployment:
  – Sends and compiles the code on the remote workers and the master
• Configuration file generation
Deployer (2)
• Automatic deployment
Exception handling
• GS_Speculative_End(func) / GS_Throw
while (j < MAX_ITERS) {
    getRanges(Lini, BWini, &Lmin, &Lmax, &BWmin, &BWmax);
    for (i = 0; i < ITERS; i++) {
        L[i] = gen_rand(Lmin, Lmax);
        BW[i] = gen_rand(BWmin, BWmax);
        Filter("nsend.cfg", L[i], BW[i], "tmp.cfg");
        Dimemas("tmp.cfg", "nsend_rec_nosm.trf", Elapsed_goal, "dim_out.txt");
        Extract("tmp.cfg", "dim_out.txt", "final_result.txt");
    }
    getNewIniRange("final_result.txt", &Lini, &BWini);
    j++;
}
GS_Speculative_End(my_func);  /* my_func is executed when an exception is thrown */

void Dimemas(char *cfgFile, char *traceFile, double goal, char *DimemasOUT)
{
    …
    putenv("DIMEMAS_HOME=/aplic/DIMEMAS");
    sprintf(aux, "/aplic/DIMEMAS/bin/Dimemas -o %s %s", DimemasOUT, cfgFile);
    gs_result = GS_System(aux);
    distance_to_goal = distance(get_time(DimemasOUT), goal);
    if (distance_to_goal < goal * 0.1) {
        printf("Goal Reached!!! Throwing exception.\n");
        GS_Throw;
    }
}
Exception handling (2)
• Any worker can call GS_Throw at any moment
• The task that raises the GS_Throw is the last valid task (all sequential tasks after it must be undone)
• The speculative part runs from the task that throws the exception to the GS_Speculative_End (no Begin clause is needed)
• Possibility of calling a local function when the exception is detected
Putting it all together: involved files

• User-provided files: app.c, app-functions.c, app.idl
• Files generated from the IDL: app.h, app-stubs.c, app-worker.c, app_constraints.h, app_constraints.cc, app_constraints_wrapper.cc
• Files generated by the deployer: broker.cfg, diskmaps.cfg
Outline
• Objective
• The essence
• User’s interface
• Automatic code generation
• Run-time features
• Programming experiences
• Ongoing work
• Conclusions
Programming experiences
• Performance modelling (Dimemas, Paramedir)
  – Algorithm flexibility
• NAS Grid Benchmarks
  – Improved flexibility of the component programs
  – Reduced Grid-level source code lines
• Bioinformatics application (production)
  – Improved portability (Globus vs. just LoadLeveler)
  – Reduced Grid-level source code lines
• Pblade solution for bioinformatics
Programming experiences
• fastDNAml
  – Computes the likelihood of various phylogenetic trees, starting with aligned DNA sequences from a number of species (Indiana University code)
  – Sequential and MPI (grid-enabled) versions available
  – Ported to GRID superscalar
    • Lower pressure on communications than MPI
    • Simpler code than MPI

[Figure: rounds of parallel tree-evaluation tasks separated by barriers.]
NAS Grid Benchmarks
[Figure: the four NAS Grid Benchmark dataflow graphs, each executed between Launch and Report tasks — ED (independent SP instances), HC (BT → SP → LU chains), VP (BT → MG → FT pipelines), and MB (LU → MG → FT), with MF tasks performing the file transfers between components.]
NAS Grid Benchmarks
• All of them implemented with GRID superscalar
• Run with classes S, W, A
• Results scale as expected
• When several servers are used, ASCII mode is required

[Plot: MB.S — elapsed time (s) vs. number of tasks, on the Khafre and Kadesh8 servers.]
Programming experiences
• Performance analysis
  – GRID superscalar run-time instrumented
  – Paraver tracefiles obtained from the client side
  – Measures of task execution time on the servers
Programming experiences
• Overhead of the GRAM Job Manager polling interval

[Plot: Globus overhead (VP.W) — per-task time (s), broken into task duration, Active-to-Done, and Request-to-Active intervals, for tasks 1–15.]
Programming experiences
• VP.S task assignment

[Figure: BT, MG, and FT tasks assigned across the Kadesh and Khafre servers, with MF tasks marking the remote file transfers between them.]
Outline
• Objective
• The essence
• User’s interface
• Automatic code generation
• Run-time features
• Programming experiences
• Ongoing work
• Conclusions
Ongoing work
• OGSA-oriented resource broker, based on Globus Toolkit 3.x
• Bindings to Ninf-G2
• Binding to ssh/rsh/scp
• New language bindings (shell script)
• And more future work:
  – Bindings to other basic middlewares
    • GAT, …
  – Enhancements in run-time performance, guided by the performance analysis
Conclusions
• Presented the ideas of GRID superscalar
• There is a viable way to ease the programming of Grid applications
• The GRID superscalar run-time enables:
  – Use of the resources in the Grid
  – Exploitation of the existing parallelism
More information
• GRID superscalar home page:
http://people.ac.upc.es/rosab/index_gs.htm
• Rosa M. Badia, Jesús Labarta, Raül Sirvent, Josep M. Pérez, José M. Cela, Rogeli Grima, “Programming Grid Applications with GRID Superscalar”, Journal of Grid Computing, Volume 1 (Number 2): 151-170 (2003).