Open TS: an Advanced Tool for Parallel and Distributed Computing

Program Systems Institute, Russian Academy of Sciences
2006-11-20 (Redmond, USA)


TRANSCRIPT

Page 1

Program Systems Institute, Russian Academy of Sciences

Open TS: an Advanced Tool for Parallel and Distributed Computing

2006-11-20 (Redmond, USA)

Page 2

Presentation Outline

Short self-introduction
Open TS outline
Few sample programs
Inside Open TS
MPI vs Open TS case study
OpenTS@WinCCS (academic)
T-System Simplified
Open TS Gadgets
Conference rating
Future work

Page 3

Short Self-Introduction

Page 4

Pereslavl-Zalesski

Russian Golden Ring city: 857 years old
Hometown of Great Dukes of Russia
The first building site of Peter the Great's navy
Ancient capital of the Russian Orthodox church

Page 5

PSI RAS, Pereslavl-Zalesski

Page 6

Flagship "SKIF K-1000"

Peak performance: 2.5 Tflops
Linpack performance: 2.0 Tflops
Efficiency ratio: 80.1%

November 2004: the most powerful supercomputer in ex-USSR
November 2004: rank 98 in Top500

Page 7

Page 8

T-System History

Mid-1980s: basic ideas of T-System
1990s: first implementation of T-System
2001-2002, "SKIF": GRACE (Graph Reduction Applied to Cluster Environment)
2003-current, "SKIF", cooperation with Microsoft: Open TS (Open T-system)

Page 9

Open TS Overview

Page 10

Comparison: T-System and MPI

              Sequential    Parallel
High-level    C/Fortran     T-System (a few keywords)
Low-level     Assembler     MPI (hundred(s) of primitives)

Page 11

T-System in Comparison

Related work                 Open TS differentiator
Charm++                      FP-based approach
UPC, mpC++                   Implicit parallelism
Glasgow Parallel Haskell     Allows C/C++ based low-level optimization
OMPC++                       Provides both language and C++ templates library
Cilk                         Supports SMP, MPI, PVM, and GRID platforms

Page 12

Open TS: an Outline

High-performance computing
"Automatic dynamic parallelization"
Combining functional and imperative approaches, high-level parallel programming
T++ language: "parallel dialect" of C++, an approach popular in the 1990s

Page 13

T-Approach

Invocations of "pure" functions (tfunctions) produce grains of parallelism
T-Program is
  Functional on the higher level
  Imperative on the low level (optimization)
C-compatible execution model
Non-ready variables, multiple assignment
"Seamless" C extension (or Fortran extension)

Page 14

T++ Keywords

tfun: T-function
tval: T-variable
tptr: T-pointer
tout: output parameter (like &)
tdrop: make ready
twait: wait for readiness
tct: T-context
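The idea of a "non-ready" value behind tval and twait can be loosely illustrated in standard C++. The sketch below is an analogy only and is not Open TS code: it assumes that a T-variable behaves roughly like a `std::shared_future<int>`, that twait resembles waiting on that future, and that making a value ready resembles fulfilling a `std::promise`. The names `NonReadyInt`, `is_ready` and `read_when_ready` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical stand-in for a T-variable: a value that may not be ready yet.
using NonReadyInt = std::shared_future<int>;

// Reports whether the value has already been produced, without blocking.
bool is_ready(const NonReadyInt& v) {
    assert(v.valid());
    return v.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Blocks until the value is ready, then reads it
// (a loose analogue of twait followed by a read).
int read_when_ready(const NonReadyInt& v) {
    assert(v.valid());
    v.wait();
    return v.get();
}
```

A caller would create a `std::promise<int>`, obtain the shared future with `get_future().share()`, and later call `set_value` on the promise; until then `is_ready` returns false and `read_when_ready` would block.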

Page 15

Short Introduction (Sample Programs)

Page 16

Sample Program (C++)

#include <stdio.h>
#include <stdlib.h>   /* atoi */

int fib (int n) {
    return n < 2 ? n : fib(n-1) + fib(n-2);
}

int main (int argc, char **argv) {
    if (argc != 2) { printf("Usage: fib <n>\n"); return 1; }
    int n = atoi(argv[1]);
    printf("fib(%d) = %d\n", n, fib(n));
    return 0;
}

Page 17

Sample Program (T++)

#include <stdio.h>
#include <stdlib.h>   /* atoi */

tfun int fib (int n) {
    return n < 2 ? n : fib(n-1) + fib(n-2);
}

tfun int main (int argc, char **argv) {
    if (argc != 2) { printf("Usage: fib <n>\n"); return 1; }
    int n = atoi(argv[1]);
    printf("fib(%d) = %d\n", n, (int)fib(n));
    return 0;
}
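For readers without a T++ compiler, the effect of marking fib as a T-function can be approximated in standard C++ with `std::async`. This is a rough sketch under stated assumptions, not a description of how Open TS schedules work: each asynchronous call here may start an operating-system thread, so the sketch adds a `depth` cutoff (an illustrative parameter that is not in the original program) to limit how many tasks are spawned. The names `fib_seq` and `fib_par` are hypothetical.

```cpp
#include <cassert>
#include <future>

// Plain sequential Fibonacci, as in the C++ sample program.
int fib_seq(int n) {
    return n < 2 ? n : fib_seq(n - 1) + fib_seq(n - 2);
}

// Task-parallel variant: the first `depth` levels of recursion run the
// fib(n-1) branch as an asynchronous task; deeper levels run sequentially.
int fib_par(int n, int depth) {
    assert(n >= 0);
    if (n < 2) return n;
    if (depth <= 0) return fib_seq(n);
    std::future<int> left =
        std::async(std::launch::async, fib_par, n - 1, depth - 1);
    int right = fib_par(n - 2, depth - 1);
    return left.get() + right;   // blocks until the spawned branch is done
}
```

Calling `fib_par(n, 3)` spawns at most seven tasks. In the T++ version no such cutoff appears in the source; the only changes are the `tfun` keyword and the `(int)` cast on the result.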

Page 18

Sample Program (T++)

[Chart: Time(%) and CoE versus number of CPU cores (0 to 10); vertical axis from 0% to 120%]

WinCCS cluster, 4 nodes
CPU: AMD Athlon 64 X2 Dual Core Processor 4400+, 2.21 GHz
Gigabit Ethernet

time% = time_tapp(N) / time_tapp(1)
CoE = 1 / (N × time%)
where N is the number of CPU cores
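The two metrics on this slide are simple ratios, and a small worked example may make them concrete. The sketch below assumes run times measured in the same unit on one core and on N cores; the function names `time_fraction` and `coe` are hypothetical, and the run times used in the usage note are made up for illustration, not measurements from the talk.

```cpp
#include <cassert>

// time% as defined on the slide: run time on N cores relative to one core.
double time_fraction(double time_n, double time_1) {
    assert(time_1 > 0.0);
    return time_n / time_1;
}

// CoE = 1 / (N * time%); a value of 1.0 corresponds to ideal linear speedup.
double coe(int cores, double time_n, double time_1) {
    assert(cores > 0 && time_n > 0.0);
    return 1.0 / (cores * time_fraction(time_n, time_1));
}
```

For instance, a program that takes 100 s on one core and 25 s on four cores has time% = 0.25 and CoE = 1.0; if it still took 25 s on eight cores, CoE would drop to 0.5.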

Page 19

Approximate calculation of Pi (C++)

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Integrand; defined before isum so that it is declared at its point of use. */
double f(double x) {
    return 4/(1+x*x);
}

/* Halves [begin, finish] until the width is at most d, then applies the midpoint rule. */
double isum(double begin, double finish, double d) {
    double dl = finish - begin;
    double mid = (begin + finish) / 2;
    if (fabs(dl) > d)
        return isum(begin, mid, d) + isum(mid, finish, d);
    return f(mid) * dl;
}

int main(int argc, char* argv[]) {
    unsigned long h;
    double a, b, d, sum;
    if (argc < 2) { return 0; }
    a = 0; b = 1; h = atol(argv[1]);
    d = fabs(b - a) / h;
    sum = isum(a, b, d);
    printf("PI is approximately %15.15lf\n", sum);
    return 0;
}
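The slide does not spell out why this program yields Pi. The short derivation below is added for clarity; the symbols m_k and Δ_k (the midpoint and width of each leaf interval produced by the recursive halving) are notation introduced here, not taken from the slides.

```latex
\int_0^1 \frac{4}{1+x^2}\,dx
  = 4\arctan(x)\,\Big|_0^1
  = 4\cdot\frac{\pi}{4}
  = \pi,
\qquad
\pi \approx \sum_{k} f(m_k)\,\Delta_k,
\quad f(x) = \frac{4}{1+x^2}
```

The recursion stops splitting an interval once its width is at most d = |b - a| / h, so a larger command-line argument h gives narrower leaf intervals and a closer approximation.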

Page 20

Approximate calculation of Pi (T++)

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Integrand; defined before isum so that it is declared at its point of use. */
tfun double f(double x) {
    return 4/(1+x*x);
}

/* Halves [begin, finish] until the width is at most d, then applies the midpoint rule. */
tfun double isum(double begin, double finish, double d) {
    double dl = finish - begin;
    double mid = (begin + finish) / 2;
    if (fabs(dl) > d)
        return isum(begin, mid, d) + isum(mid, finish, d);
    return (double)f(mid) * dl;
}

tfun int main(int argc, char* argv[]) {
    unsigned long h;
    double a, b, d, sum;
    if (argc < 2) { return 0; }
    a = 0; b = 1; h = atol(argv[1]);
    d = fabs(b - a) / h;
    sum = isum(a, b, d);
    printf("PI is approximately %15.15lf\n", sum);
    return 0;
}

Page 21

Calculation of Pi (T++)

[Chart: Time(%) and CoE versus number of CPU cores (0 to 10); vertical axis from 0% to 120%]

WinCCS cluster, 4 nodes
CPU: AMD Athlon 64 X2 Dual Core Processor 4400+, 2.21 GHz
Gigabit Ethernet

time% = time_tapp(N) / time_tapp(1)
CoE = 1 / (N × time%)
where N is the number of CPU cores

Page 22

Map-Reduce

----- Original Message -----
From: Alexy Maykov
Sent: Monday, October 02, 2006 11:58 PM
Subject: MCCS projects
…
I work in Microsoft Live Labs … I have several questions below:

1. How would you implement Map-Reduce in OpenTS?
…

Page 23

Map-Reduce (C++)

#include <vector>
#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>   // ostream_iterator
#include <ctime>
using namespace std;

int fib (int n)
{
    return (n < 2) ? n : fib(n-1) + fib(n-2);
}

int plus (int val1, int val2)
{
    return val1 + val2;
}

int main (int argc, char *argv[])
{
    const int factor = 23;
    const int vector_size = 40;
    vector<int> a, b, c;
    vector<int> fa, fb;

    cout << " Filling vectors..." << endl;
    for (int i = 1; i <= vector_size; i++)
    {
        a.push_back(i % factor);
        b.push_back((vector_size + 1 - i) % factor);
        c.push_back(0);
        fa.push_back(0);
        fb.push_back(0);
    }

    cout << " Mapping..." << endl;
    transform(a.begin(), a.end(), fa.begin(), fib);
    cout << " Mapping..." << endl;
    transform(b.begin(), b.end(), fb.begin(), fib);

    cout << " Reducing..." << endl;
    // ::plus selects the function above rather than std::plus
    transform(fa.begin(), fa.end(), fb.begin(), c.begin(), ::plus);

    cout << endl << " Result: (" ;
    ostream_iterator<int> output(cout, " ");
    copy(c.begin(), c.end(), output);
    cout << "\b)" << endl;
    return 0;
}

Page 24

Map-Reduce (C++)

Callouts on the listing from page 23:
fib: Fibonacci
plus: just "plus"

Page 25

Map-Reduce (C++)

Callouts on the listing from page 23:
Five vectors: a, b, fa, fb, c
Filling vectors:
  a = [ k%23 | k <- [1..40] ]
  b = [ (41-k)%23 | k <- [1..40] ]

Page 26

Map-Reduce (C++)

Callouts on the listing from page 23:
Transform vectors:
  fa = map fib a
  fb = map fib b
  c = zipWith plus fa fb


Page 28

Map-Reduce (T++)

#include <vector>
#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>   // ostream_iterator
#include <ctime>
using namespace std;

tfun int fib (int n)
{
    return (n < 2) ? n : fib(n-1) + fib(n-2);
}

tfun int plus (int val1, int val2)
{
    return val1 + val2;
}

tfun int main (int argc, char *argv[])
{
    const int factor = 23;
    const int vector_size = 40;
    vector<int> a, b, c;
    vector<tval int> fa, fb;   // callout on the slide: vector of T-values

    cout << " Filling vectors..." << endl;
    for (int i = 1; i <= vector_size; i++)
    {
        a.push_back(i % factor);
        b.push_back((vector_size + 1 - i) % factor);
        c.push_back(0);
        fa.push_back(0);
        fb.push_back(0);
    }

    cout << " Mapping..." << endl;
    transform(a.begin(), a.end(), fa.begin(), fib);
    cout << " Mapping..." << endl;
    transform(b.begin(), b.end(), fb.begin(), fib);

    cout << " Reducing..." << endl;
    // ::plus selects the function above rather than std::plus
    transform(fa.begin(), fa.end(), fb.begin(), c.begin(), ::plus);

    cout << endl << " Result: (" ;
    ostream_iterator<int> output(cout, " ");
    copy(c.begin(), c.end(), output);
    cout << "\b)" << endl;
    return 0;
}

Page 29

Map-Reduce (T++): "Laziness"

Filling, mapping: all T-functions are invoked, no T-functions are calculated yet: 0 seconds
Calculating all T-functions, printing out: 8 seconds
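The split between "invoked" and "calculated" can be imitated in standard C++ with deferred futures. The sketch below is an analogy under stated assumptions, not Open TS behaviour: `std::launch::deferred` runs each task on the calling thread when its result is requested, whereas the slides describe T-functions being computed in parallel by the runtime. The helper names `lazy_map_fib`, `is_deferred` and `force_all` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <vector>

int fib(int n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

// "Map" step: creates one deferred task per element and returns at once;
// no fib call has run when this function returns.
std::vector<std::future<int>> lazy_map_fib(const std::vector<int>& in) {
    std::vector<std::future<int>> out;
    out.reserve(in.size());
    for (int n : in) {
        assert(n >= 0);
        out.push_back(std::async(std::launch::deferred, fib, n));
    }
    return out;
}

// True while the task has been created but not yet evaluated.
bool is_deferred(const std::future<int>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::deferred;
}

// Forces every pending task; the actual computation happens here.
std::vector<int> force_all(std::vector<std::future<int>>& pending) {
    std::vector<int> values;
    values.reserve(pending.size());
    for (auto& f : pending) values.push_back(f.get());
    return values;
}
```

In this analogy the mapping phase costs almost nothing, and the time is spent in `force_all`, which mirrors the 0 seconds versus 8 seconds split reported on the slide.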

Page 30

Map-Reduce (T++)

[Chart: Time(%) and CoE versus number of CPU cores (0 to 10); vertical axis from 0% to 120%]

WinCCS cluster, 4 nodes
CPU: AMD Athlon 64 X2 Dual Core Processor 4400+, 2.21 GHz
Gigabit Ethernet

time% = time_tapp(N) / time_tapp(1)
CoE = 1 / (N × time%)
where N is the number of CPU cores

Page 31

Inside OpenTS

Page 32: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Open TS: Environment

Supports more than 1,000,000 threads per core

Page 33: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Supermemory

Utilization: non-ready values, resource and status information, etc.

Object-Oriented Distributed shared memory (OO DSM)

Global address space

DSM-cell versioning

On top: automatic garbage collection

Page 34: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Multithreading & Communications

Lightweight threads
  PIXELS (1 000 000 threads)

Asynchronous communications
  A thread “A” asks for a non-ready value (or a new job)
  Asynchronous request sent: active messages & signals are delivered over the network to stimulate data transfer to thread “A”
  Context switches (including a quantum for communications)

Latency hiding for node-to-node exchange

Page 35: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Open TS applications (selected)

Page 36: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

MultiGen
Chelyabinsk State University

Multi-conformation model

[Diagram: Level 0: K0; Level 1: K11, K12; Level 2: K21, K22]

Page 37: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

MultiGen: Speedup

Substance      | Atom number | Rotations number | Conformers | Execution time (min:s): 1 node | 4 nodes | 16 nodes
NCI-609067     | 28          | 4                | 13         | 9:33                           | 3:21    | 1:22
TOSLAB A2-0261 | 82          | 18               | 49         | 115:27                         | 39:23   | 16:09
NCI-641295     | 126         | 25               | 74         | 266:19                         | 95:57   | 34:48

NCI-609067: National Cancer Institute (USA), Reg. No. NCI-609067 (AIDS drug lead)

TOSLAB A2-0261: TOSLAB company (Russia-Belgium), Reg. No. TOSLAB A2-0261 (antiphlogistic drug lead)

NCI-641295: National Cancer Institute (USA), Reg. No. NCI-641295 (AIDS drug lead)
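The speedup implied by the execution times in this table can be worked out with a few lines of C++. This is an illustrative sketch; `to_seconds` and `speedup` are hypothetical helper names, and the only inputs are the min:s strings from the table.

```cpp
#include <cassert>
#include <string>

// Convert a "min:sec" string, as printed in the table, to seconds.
int to_seconds(const std::string& mmss) {
    const auto colon = mmss.find(':');
    assert(colon != std::string::npos);  // expects exactly the table's format
    return std::stoi(mmss.substr(0, colon)) * 60 + std::stoi(mmss.substr(colon + 1));
}

// Speedup = time on 1 node / time on N nodes.
double speedup(const std::string& t1, const std::string& tn) {
    return static_cast<double>(to_seconds(t1)) / to_seconds(tn);
}
```

For NCI-609067, `speedup("9:33", "1:22")` is 573 s / 82 s, roughly 7 on 16 nodes.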

Page 38: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Aeromechanics
Institute of Mechanics, MSU

Page 39: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Belocerkovski’s approach

Flow is represented as a collection of small elementary whirlwinds (colours: clockwise and counter-clockwise rotation)

Page 40: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Creating space-borne radar image from hologram

Space Research Institute development

Page 41: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Simulating broadband radar signal

Graphical User Interface

Non-PSI RAS development team (Space research institute of Khrunichev corp.)

[Two charts: vertical axis 0 to 300, horizontal axis 1 to 28]

Page 42: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Landsat Image Classification

Computational “web-service”

Page 43: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Open TS vs MPI case study

Page 44: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Applications

Popular and widely used

Developed by independent teams (MPI experts)

PovRay: Persistence of Vision Ray-tracer, enabled for parallel run by a patch

ALCMD/MP_lite: molecular dynamics package (Ames Lab)

Page 45: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

T-PovRay vs MPI PovRay: code complexity

Program                                       | Source code volume
MPI modules for PovRay 3.10g                  | 1,500 lines
MPI patch for PovRay 3.50c                    | 3,000 lines
T++ modules (for both versions 3.10g & 3.50c) | 200 lines

The T++ code is ~7 to 15 times smaller

Page 46: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

T-PovRay vs MPI PovRay: performance

[Chart: Time MPI / Time OpenTS (90% to 210%) versus number of processors (1 to 16)]

2 CPUs AMD Opteron 248 2.2 GHz, RAM 4 GB, GigE, LAM 7.1.1


Page 48: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

ALCMD/MPI vs ALCMD/OpenTS

MP_Lite component of ALCMD rewritten in T++

Fortran code is left intact

[Diagram: original stack MPI, MP_Lite, ALCMD; ported stack OpenTS, MP_Lite, ALCMD]

Page 49: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

ALCMD/MPI vs ALCMD/OpenTS: code complexity

Program                         | Source code volume
MP_Lite total / MPI             | ~20,000 lines
MP_Lite, ALCMD-related / MPI    | ~3,500 lines
MP_Lite, ALCMD-related / OpenTS | 500 lines

~7 times

Page 50: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

ALCMD/MPI vs ALCMD/OpenTS: performance

[Chart: Time MPI / Time OpenTS (80% to 110%) versus number of processors (1 to 16)]

16 dual Athlon 1800, AMD Athlon MP 1800+, RAM 1 GB, FastEthernet, LAM 7.0.6, Lennard-Jones MD, 512000 atoms

Page 51: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

ALCMD/MPI vs ALCMD/OpenTS: performance

[Chart: Time MPI / Time OpenTS (80% to 110%) versus number of processors (1 to 16)]

2 CPUs AMD Opteron 248 2.2 GHz, RAM 4 GB, GigE, LAM 7.1.1, Lennard-Jones MD, 512000 atoms

Page 52: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

ALCMD/MPI vs ALCMD/OpenTS: performance

[Chart: Time MPI / Time OpenTS (80% to 110%) versus number of processors (1 to 16)]

2 CPUs AMD Opteron 248 2.2 GHz, RAM 4 GB, InfiniBand, MVAMPICH 0.9.4, Lennard-Jones MD, 512000 atoms


Page 55: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Porting OpenTS to MS Windows CCS

Page 56: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

2006: contract with Microsoft “Porting OpenTS to Windows Compute Cluster Server”

OpenTS@WinCCS
  inherits all basic features of the original Linux version
  is available under FreeBSD license
  does not require any commercial compiler for T-program development: it is enough to install Visual C++ 2005 Express Edition (available for free on the Microsoft website) and the PSDK

Page 57: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

OpenTS@WinCCS

AMD64 and x86 platforms are currently supported

Integration into Microsoft Visual Studio 2005

Two ways of building T-applications: command line and Visual Studio IDE

An installer of OpenTS for Windows XP/2003/WCCS
  Installation of WCCS SDK (including MS-MPI), if necessary
  OpenTS self-testing procedure

Page 58: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Installer of OpenTS for Windows XP/2003/WCCS

Page 59: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

OpenTS integration into Microsoft Visual Studio 2005

Page 60: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

T++ demo applications

POVRay and ALCMD were ported to Windows

Benchmark testing
  Both Windows and Linux were tested
  Same hardware used
  Same OpenTS kernel source code used (cross-platform academic OpenTS microkernel)
  Same application (POVRay and ALCMD) source code used for Windows and Linux

Page 61: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Benchmark notations

time(N): execution time (in seconds) of the T++ demo, where N is the number of CPU cores

time_c: execution time of the C implementation (in seconds, one CPU core used)

time%(N) = time(N) / time_c

CoE = 1 / (N * time%(N)): coefficient of efficiency
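The two derived metrics above translate directly into code. The sketch below is only an illustration of the formulas; the function names `time_percent` and `coe` are assumptions, and any timings passed in are the caller's own measurements.

```cpp
#include <cassert>

// time%(N) = time(N) / time_c
double time_percent(double time_n, double time_c) {
    assert(time_c > 0.0);
    return time_n / time_c;
}

// CoE = 1 / (N * time%(N))
double coe(int n_cores, double time_n, double time_c) {
    assert(n_cores > 0 && time_n > 0.0);
    return 1.0 / (n_cores * time_percent(time_n, time_c));
}
```

With made-up numbers: if the C version takes 100 s on one core and the T++ demo takes 25 s on 4 cores, then time%(4) = 0.25 and CoE = 1.0, i.e. ideal scaling; 25 s on 8 cores would give CoE = 0.5.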

Page 62: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

POVRay time(%)

[Chart: Time(%) (0% to 120%) versus CPUs (1 to 8); series: Linux MPI, Linux OpenTS, Windows MPI, Windows OpenTS]

Page 63: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

POVRay CoE

[Chart: CoE (0% to 120%) versus CPUs (1 to 8); series: Linux MPI, Linux OpenTS, Windows MPI, Windows OpenTS]

Page 64: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Performance issues

Academic OpenTS: CoE for POVRay is decreasing (as well as for Fib, Pi, …)

Reason (proof: next slide): asynchronous communications are unsupported in academic OpenTS

Possible subject for future work

Page 65: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

T-POVRay: time%_sync / time%_async

[Chart: Time Sync / Time Async (60% to 220%) versus CPUs (0 to 9)]

Page 66: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

ALCMD time(%)

[Chart: Time(%) (0% to 100%) versus CPUs (1 to 8); series: Linux MPI, Linux OpenTS, Windows MPI, Windows OpenTS]

Page 67: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

ALCMD CoE

[Chart: CoE (30% to 120%) versus CPUs (1 to 8); series: Linux MPI, Linux OpenTS, Windows MPI, Windows OpenTS]

Page 68: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Open TS “Gadgets”

Page 69: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Web-services, Live documents

tfun int fib (int n) {
  return n < 2 ? n : fib(n-1)+fib(n-2);
}

twsgen Perl script

<operation name="wstfib">
  <SOAP:operation style="rpc" soapAction=""/>
  <input>
    <SOAP:body use="encoded" namespace="urn:myservice"
      encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>
  </input>
  <output>
    <SOAP:body use="encoded" namespace="urn:myservice"
      encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>
  </output>
</operation>

Page 70: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Trace visualizer

Collect trace of T-program execution

Visualize performance metrics of OpenTS runtime

Page 71: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Fault-tolerance

Recalculation-based fault-tolerance
  (+) Very simple (in comparison with a full transactional model)
  (+) Efficient (only a minimal set of damaged functions is recalculated)
  (–) Applicable only to functional programs

Fault-tolerant communications needed (e.g. DMPI v1.0)

Implemented (experimental version on Linux)

Subject for future work for OpenTS@WinCCS
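The idea behind recalculation-based recovery can be sketched in plain C++. This is not the Open TS mechanism, only an illustration under stated assumptions: `recover` is a hypothetical function, results lost with a failed node are modelled as empty `std::optional` slots, and `f` must be a pure function, which is why the approach applies only to functional programs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// partial[i] holds f(i) if that result survived, or is empty if it was lost.
// Only the lost results are recomputed; surviving ones are reused as-is.
std::vector<int> recover(const std::vector<std::optional<int>>& partial,
                         const std::function<int(int)>& f,
                         int& recalculated) {
    assert(static_cast<bool>(f));
    recalculated = 0;
    std::vector<int> full;
    full.reserve(partial.size());
    for (std::size_t i = 0; i < partial.size(); ++i) {
        if (partial[i]) {
            full.push_back(*partial[i]);             // survived: keep
        } else {
            full.push_back(f(static_cast<int>(i)));  // damaged: recalculate
            ++recalculated;
        }
    }
    return full;
}
```

Because `f` has no side effects, recomputing a lost slot yields the same value it had before the failure, so no transaction log is needed.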

Page 72: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Some other Gadgets

Other T-languages: T-Refal, T-Fortran

Memoization

Automatically choosing between call-style and fork-style of function invocation

Checkpointing

Heartbeat mechanism

Flavours of data references: “normal”, “glue” and “magnetic”, i.e. lazy, eager and ultra-eager (speculative) data transfer
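Of the gadgets listed, memoization is the easiest to illustrate outside the T-system. The sketch below is ordinary C++ with an explicit cache, not the Open TS feature itself; `fib_memo` and the `cache` parameter are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Memoized Fibonacci: each fib(k) for k >= 2 is computed once,
// then served from the cache on every later request.
std::uint64_t fib_memo(int n, std::unordered_map<int, std::uint64_t>& cache) {
    assert(n >= 0);
    if (n < 2) return static_cast<std::uint64_t>(n);
    const auto hit = cache.find(n);
    if (hit != cache.end()) return hit->second;
    const std::uint64_t value = fib_memo(n - 1, cache) + fib_memo(n - 2, cache);
    cache.emplace(n, value);
    return value;
}
```

As with recalculation, this is safe only because the function is pure: a cached result is indistinguishable from a fresh one.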

Page 73: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

T-System “Simplified”

Page 74: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

T-Sim library

C++ templates, spun off from Open TS

Simplistic implementation
  no light-weight threads (NPTL threads)
  no multiple-assignment variables

Features
  XML-RPC for WANs, MPI for LAN, meta-cluster support
  compatible load-balancing model
  scheduler template library

Page 75: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

T-Sim vs Open TS

Feature | Open TS | T-Sim
Language | T++: C++ extension; compiler (GCC), converter (Windows). | C++: static library.
Data transfer | Dynamic MPI (multiple implementations supported), TCP | XML-RPC, MPI (experimental)
Serialization | |
Synchronization | Non-ready variables, multiple assignment | Non-ready variables, single assignment
Granule of Parallelism | T-functions: lightweight, non-preemptive threads. | C++ «binders» (or closures), started in a separate OS-level thread (NPTL).
Memory Management | Distributed reference count | User-level
Scheduler | Dynamic load-balancing, plug-ins mechanism. | C++ templates: strategies, “lego” to construct app-specific schedulers

Page 76: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

T-Sim: sample program

typedef TVal<int> TInt;

TSIM_TFUNDEF_2(fib,int,TInt,TFib)
void fib(int in, TInt __out)
{
  int out;
  if (in < 2) {
    out = in;
  }
  else {
    TInt o1, o2;
    TFib(in-1, o1);
    TFib(in-2, o2);
    out = o1 + o2;
    o1.release();
    o2.release();
  }
  __out = out;
  return;
}

int main(int argc, char *argv[])
{
  int t, res;
  TSimRuntime rt;
  TInt _res;
  if (argc < 2) t = 10;
  else t = atoi(argv[1]);
  fib(t, _res);
  res = _res;
  _res.release();
  cerr << "The FIB " << t << "th is " << res << endl;
  return 0;
}

Page 77: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

Important Features

Scheduler “lego”
  “Strategies” of task distribution: round-robin, on data location, on available CPU power, etc.
  Resource gathering is pluggable (static/dynamic implemented)

Map/Reduce implementation exists

Active messages template

Still experimental

Page 78: Program Systems Institute Russian Academy of Sciences1 Open TS: an Advanced Tool for Parallel and Distributed Computing Program Systems Institute Russian

“Cooperative (or Conference) Rating” Project


Project Outline
Goals:
  Become familiar with Open TS@WinCCS
  Demonstrate programming techniques: safe use of functions with side effects, a “monotonic” global object
Branch-and-bound search for a shortest Hamiltonian path in a complete graph
Two developers (Alexander & Sergey)
Timeframe: 13-16 November (at SC06, Microsoft booth, in the background)


Algorithm
“Conference”: experts reviewing papers
Each expert provides an order of papers (A is better than B)
Find an order that minimizes conflicts
Algorithm: recursion
  Check whether the current cost is greater than the current record
  If it is not, add another node
  Start from an empty order
The record is a static global monotonic object
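The recursion above can be sketched in plain sequential C++. This is not the project's code: Votes, Search and min_conflicts are hypothetical names, a "conflict" is assumed to mean one expert preference that the final order contradicts, and the record is an ordinary member variable rather than a global monotonic object. The sketch also prunes when the cost equals the record, which is slightly stronger than the "greater than" test on the slide.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// w[a][b] = number of experts who rank paper a above paper b (assumed input form).
using Votes = std::vector<std::vector<int>>;

struct Search {
    const Votes& w;
    std::vector<bool> placed;  // placed[p] is true if paper p is in the partial order
    std::vector<int> order;    // current partial order, best paper first
    std::vector<int> best;     // best complete order found so far
    int record = INT_MAX;      // sequential stand-in for the global record

    explicit Search(const Votes& votes) : w(votes), placed(votes.size(), false) {}

    void extend(int cost) {
        if (cost >= record) return;  // bound: this branch cannot beat the record
        if (order.size() == w.size()) {  // complete order with a lower cost
            record = cost;
            best = order;
            return;
        }
        for (std::size_t p = 0; p < w.size(); ++p) {
            if (placed[p]) continue;
            // Appending p after q contradicts every vote that ranks p above q.
            int added = 0;
            for (int q : order) added += w[p][static_cast<std::size_t>(q)];
            placed[p] = true;
            order.push_back(static_cast<int>(p));
            extend(cost + added);
            order.pop_back();
            placed[p] = false;
        }
    }
};

// Returns the minimal number of contradicted votes; optionally reports one best order.
int min_conflicts(const Votes& w, std::vector<int>* best_order = nullptr) {
    Search s(w);
    s.extend(0);  // start from an empty order
    if (best_order) *best_order = s.best;
    return s.record;
}
```

In a parallel version each branch of the loop would become an independent task, and the only shared state would be the record, which is why the slide singles it out.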


Code Analysis
C version
  File input: 63 lines of code
  Algorithm implementation: 98 lines
  Global variable to store the record value
T++ version
  File input: 63 lines (same)
  Algorithm implementation: 165 = 98 + 67 lines
  Record update (67 lines): starts a function that updates, on each node, the local copies of global monotonic objects
Efficient support for global monotonic objects is needed: possible future work
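What makes the record "monotonic" is that it only ever moves in one direction, so a stale copy can weaken pruning but cannot make the search incorrect. The sketch below shows the idea for a single shared-memory process using std::atomic. MonotonicMin is a hypothetical class, not part of Open TS or T++, and it does not model the per-node copies and update propagation that the T++ version needed 67 lines for.

```cpp
#include <atomic>

// Hypothetical monotonic record: its value can only decrease.
class MonotonicMin {
    std::atomic<int> value_;
public:
    explicit MonotonicMin(int initial) : value_(initial) {}

    int get() const { return value_.load(std::memory_order_relaxed); }

    // Try to lower the record. Returns true if candidate became the new record.
    bool update(int candidate) {
        int current = value_.load(std::memory_order_relaxed);
        while (candidate < current) {
            // On failure, current is reloaded with the latest value and re-checked.
            if (value_.compare_exchange_weak(current, candidate,
                                             std::memory_order_relaxed)) {
                return true;
            }
        }
        return false;  // someone already holds an equal or better record
    }
};
```

Concurrent calls to update from several threads never raise the stored value, whatever order they run in.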


Proposal For Future Cooperation with Microsoft


Open TS “Windows”iation
More efficient use of the Windows API
Asynchronous communications support
SMP mode
T-program trace visualizer
Generating web services for T-functions
DMPI
Fault tolerance for T++ applications
Different schedulers
In future: OpenTS/.NET (T#)


Templates and Skeletons
Development in collaboration with interested MS teams
  Gather requirements
  PSI RAS implementation
  Result: generic parallel solutions
  Map-reduce as the first candidate
C++ templates for using the OpenTS kernel without the (T++ → C++) converter
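To show what a map-reduce skeleton packaged as a C++ template might look like, here is a minimal sketch built on standard std::async rather than on the OpenTS kernel. The map_reduce function and its chunks parameter are assumptions for illustration, not a proposed OpenTS interface. The sketch assumes reduce_fn is associative and init is its identity element.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical skeleton: apply map_fn to every element and combine the results
// with reduce_fn, processing the input in up to `chunks` parallel tasks.
template <typename In, typename Out, typename MapFn, typename ReduceFn>
Out map_reduce(const std::vector<In>& input, Out init,
               MapFn map_fn, ReduceFn reduce_fn, std::size_t chunks = 4) {
    if (chunks == 0) chunks = 1;
    const std::size_t step = (input.size() + chunks - 1) / chunks;

    std::vector<std::future<Out>> parts;
    for (std::size_t begin = 0; begin < input.size(); begin += step) {
        const std::size_t end = std::min(begin + step, input.size());
        // Each task reduces its own slice; input outlives all tasks because
        // every future is waited on before this function returns.
        parts.push_back(std::async(std::launch::async,
            [&input, init, map_fn, reduce_fn, begin, end] {
                Out acc = init;
                for (std::size_t i = begin; i < end; ++i) {
                    acc = reduce_fn(acc, map_fn(input[i]));
                }
                return acc;
            }));
    }

    // Combine the partial results in slice order.
    Out result = init;
    for (auto& part : parts) result = reduce_fn(result, part.get());
    return result;
}
```

A call such as map_reduce(values, 0, square, add) keeps the caller's code sequential in appearance, which is the appeal of skeletons: the parallel structure lives entirely inside the template.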


THANKS

… ANY QUESTIONS??? …

[email protected] (opents.net)