October 17, 2001
MuPC Run Time System for UPC
Steve Seidel, Phil Merkey
Jeevan Savant, Kian Gap Lee
Department of Computer Science
Michigan Technological University
Brian Wibecan, Program PI
Phil Becker, Program Manager
Kevin Harris, Bruce Trull, and Daniel Christians
Compaq UPC Development
UPC designed by Carlson et al.
A "lightweight" extension of C for parallelism
A shared-memory, multithreaded model
Arrays and pointers can be shared
Array distribution is semi-automatic
Remote references are automatically resolved
Parallel constructs include
– forall (upc_forall)
– fence and split barrier (upc_fence, upc_notify/upc_wait)
Built-ins for
– memory allocation/free
– locks
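The split barrier separates arrival (upc_notify) from waiting (upc_wait) so a thread can do independent work in between. The following is a minimal sketch of those semantics with POSIX threads in one address space; it assumes nothing about how MuPC itself implements barriers, and split_barrier, sb_notify, sb_wait and sb_demo are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical split-phase barrier modeling upc_notify/upc_wait among
   POSIX threads in one process (not MuPC's actual implementation). */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int nthreads;     /* number of participating threads */
    int arrived;      /* threads that have notified in the current phase */
    unsigned phase;   /* incremented each time the barrier completes */
} split_barrier;

void sb_init(split_barrier *b, int nthreads)
{
    assert(nthreads > 0);
    pthread_mutex_init(&b->mutex, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->nthreads = nthreads;
    b->arrived = 0;
    b->phase = 0;
}

/* Notify: record arrival and return at once with the current phase. */
unsigned sb_notify(split_barrier *b)
{
    pthread_mutex_lock(&b->mutex);
    unsigned my_phase = b->phase;
    if (++b->arrived == b->nthreads) {   /* last arrival completes the phase */
        b->arrived = 0;
        b->phase++;
        pthread_cond_broadcast(&b->cond);
    }
    pthread_mutex_unlock(&b->mutex);
    return my_phase;
}

/* Wait: block until the phase entered at notify time has completed. */
void sb_wait(split_barrier *b, unsigned my_phase)
{
    pthread_mutex_lock(&b->mutex);
    while (b->phase == my_phase)
        pthread_cond_wait(&b->cond, &b->mutex);
    pthread_mutex_unlock(&b->mutex);
}

/* Demo: each worker increments a counter, notifies, then waits.
   After sb_wait every worker must observe all increments. */
#define DEMO_MAX 16
static split_barrier demo_bar;
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static int demo_count, demo_ok, demo_n;

static void *demo_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_mutex);
    demo_count++;
    pthread_mutex_unlock(&demo_mutex);

    unsigned ph = sb_notify(&demo_bar);
    /* ...independent local work could overlap the barrier here... */
    sb_wait(&demo_bar, ph);

    pthread_mutex_lock(&demo_mutex);
    if (demo_count == demo_n)
        demo_ok++;
    pthread_mutex_unlock(&demo_mutex);
    return NULL;
}

int sb_demo(int n)
{
    pthread_t tid[DEMO_MAX];
    assert(n > 0 && n <= DEMO_MAX);
    sb_init(&demo_bar, n);
    demo_count = demo_ok = 0;
    demo_n = n;
    for (int i = 0; i < n; i++)
        pthread_create(&tid[i], NULL, demo_worker, NULL);
    for (int i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
    pthread_cond_destroy(&demo_bar.cond);
    pthread_mutex_destroy(&demo_bar.mutex);
    return demo_ok;   /* equals n if every thread saw all arrivals */
}
```

Tracking a phase counter, rather than resetting a flag, keeps a fast thread that notifies again from being confused with the previous phase.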
Compaq's UPC compiler
UPC to object code:
– front end translates UPC source to EDG IL
– lowering phase converts UPC-specific constructs to standard EDG IL
– middle end converts EDG IL to GEM-compatible IL
– GEM back end converts GEM IL to Alpha object code
Each of the intermediate phases above has some UPC-specific components.
Alternative: "bail out" after the lowering phase to produce C code that includes calls to a run time system.
Under discussion: EDG front end for UPC
Run Time System Interface
The RTS interface is an evolving set of data objects and methods that captures the semantics of "UPC minus C".
An RTS "reference implementation" was suggested by Harris.
A publicly available reference implementation will
– promote UPC code base, user base and platform base
– challenge MPI and OpenMP
– foster RTS evolution
– promote support for UPC tools
MuPC is MTU's run time system for UPC.
Run Time System Structure
The RTS maintains run time structures describing shared objects and globals.
References to nonlocal shared objects are made through get and put.
UPC barriers and fences are passed directly to the RTS.
The same is true of UPC calls to the other built-in functions, which provide locks and dynamic memory allocation.
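A minimal single-process sketch of the run time structures and get/put entry points described above, assuming one memory partition per UPC thread. rts_shared_obj, rts_alloc, rts_get, rts_put and rts_free are hypothetical names; in MuPC a get or put to another thread's partition would go through that process's send/recv thread rather than a memcpy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_THREADS 8

/* Hypothetical descriptor for one shared object: one partition per
   UPC thread. Here every partition is plain local memory. */
typedef struct {
    int    nthreads;
    size_t bytes_per_thread;
    char  *part[MAX_THREADS];
} rts_shared_obj;

/* Allocate zeroed partitions; returns 0 on success, -1 on failure. */
int rts_alloc(rts_shared_obj *o, int nthreads, size_t bytes_per_thread)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    o->nthreads = nthreads;
    o->bytes_per_thread = bytes_per_thread;
    for (int t = 0; t < nthreads; t++) {
        o->part[t] = calloc(1, bytes_per_thread);
        if (o->part[t] == NULL) {
            while (t-- > 0)          /* release partitions already allocated */
                free(o->part[t]);
            return -1;
        }
    }
    return 0;
}

/* get: copy n bytes at (thread, offset) into dst. */
void rts_get(const rts_shared_obj *o, int thread, size_t offset,
             void *dst, size_t n)
{
    assert(thread >= 0 && thread < o->nthreads);
    assert(offset + n <= o->bytes_per_thread);
    memcpy(dst, o->part[thread] + offset, n);
}

/* put: copy n bytes from src to (thread, offset). */
void rts_put(rts_shared_obj *o, int thread, size_t offset,
             const void *src, size_t n)
{
    assert(thread >= 0 && thread < o->nthreads);
    assert(offset + n <= o->bytes_per_thread);
    memcpy(o->part[thread] + offset, src, n);
}

void rts_free(rts_shared_obj *o)
{
    for (int t = 0; t < o->nthreads; t++)
        free(o->part[t]);
    o->nthreads = 0;
}
```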
Available compiler technology
Proprietary Compaq compiler supports a proprietary RTS.
Reference compiler is not currently available, but ...
Compaq will provide a compiler that supports the reference RTS.
MuPC Design Goals
Public availabilityPublic availability
Wide platform baseWide platform base
Open source maintained by MTUOpen source maintained by MTU
User-level implementationUser-level implementation
Quick deliveryQuick delivery
Efficiency is not a primary goalEfficiency is not a primary goal
Available Platforms
MTU (on site):
– Beowulf cluster (64 nodes)
– Sun Enterprise 4500 (12 processors)
– SGI Origin 2000 (4 processors)
– Sun workstation networks (various)
– Linux workstation networks (various)
– AlphaServer and 2 workstations (provided by Compaq)
Remote:
– AlphaServer SC (Compaq)
– T3E (Cray)
Transport vehicle selection
Candidates:
– MPI: no one-sided communication
– MPI-2: incomplete implementations
– Pthreads: no support across distributed-memory nodes
– OpenMP: expensive, possibly incompatible
– shmem: limited platform base
– VIA: limited platform base
– ARMCI: limited user base
– TCP/IP: too low-level
Selection criteria:
– Portability and availability: MPI, Pthreads, TCP/IP
– Technical shortcomings can be overcome
MPI/Pthreads hybrid transport vehicle
MPI provides process control and interprocessor communication.
Pthreads provides multithreading within each process to handle asynchronous remote accesses.
The following are equivalent in MuPC:
– one MPI process
– one UPC thread (from the user's point of view)
– one user Pthread + one MPI send/recv Pthread
Thread safety is provided by isolating all MPI calls in the send/recv Pthread.
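The isolation rule above (only the send/recv Pthread ever calls MPI) can be sketched with a request queue. This is a minimal illustration under stated assumptions: MPI is replaced by a hypothetical transport_send stub that asserts it runs on the communication thread, and post_request, comm_start and comm_stop are invented for the example, not MuPC's interface.

```c
#include <assert.h>
#include <pthread.h>

#define QCAP 64

typedef enum { REQ_DATA, REQ_SHUTDOWN } req_kind;
typedef struct { req_kind kind; int payload; } request;

/* Bounded FIFO shared by the user thread (producer) and comm thread (consumer). */
static request queue[QCAP];
static int q_head, q_tail, q_len;
static pthread_mutex_t q_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  q_nonfull  = PTHREAD_COND_INITIALIZER;

static pthread_t comm_tid;
static long transport_total;   /* stands in for data handed to MPI */
static int  transport_calls;

/* Hypothetical stand-in for an MPI call: only the comm thread may reach it. */
static void transport_send(int payload)
{
    assert(pthread_equal(pthread_self(), comm_tid));
    transport_total += payload;
    transport_calls++;
}

/* Called by the user thread: enqueue and return; never touches the transport. */
void post_request(req_kind kind, int payload)
{
    pthread_mutex_lock(&q_mutex);
    while (q_len == QCAP)
        pthread_cond_wait(&q_nonfull, &q_mutex);
    queue[q_tail] = (request){ kind, payload };
    q_tail = (q_tail + 1) % QCAP;
    q_len++;
    pthread_cond_signal(&q_nonempty);
    pthread_mutex_unlock(&q_mutex);
}

static void *comm_main(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&q_mutex);
        while (q_len == 0)
            pthread_cond_wait(&q_nonempty, &q_mutex);
        request r = queue[q_head];
        q_head = (q_head + 1) % QCAP;
        q_len--;
        pthread_cond_signal(&q_nonfull);
        pthread_mutex_unlock(&q_mutex);

        if (r.kind == REQ_SHUTDOWN)
            return NULL;
        transport_send(r.payload);   /* called outside the queue lock */
    }
}

void comm_start(void) { pthread_create(&comm_tid, NULL, comm_main, NULL); }

/* FIFO order guarantees all earlier requests are serviced before shutdown. */
void comm_stop(void)
{
    post_request(REQ_SHUTDOWN, 0);
    pthread_join(comm_tid, NULL);
}
```

Because only one thread makes transport calls, an MPI build of this pattern would need only MPI_THREAD_SERIALIZED from MPI_Init_thread (or MPI_THREAD_FUNNELED if the communication thread is the one that initialized MPI), not full MPI_THREAD_MULTIPLE.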
Figure: running upcrun -np 3 upc-demo. Each of the three MPI processes calls MPI_Init and pthread_create, then runs a user UPC thread alongside a send/recv thread; all three processes end with upc_finalize.
Example: Nonlocal array reference x=a[k];
// User shared array
shared int a[10][THREADS];
// Front-end-generated temporary pointer
shared int *UPC_RTS_ptr;
...
// UPC source code:
//   x = a[k];
// Front end computes address,
// phase and thread of remote reference.
UPC_RTS_ptr = (vaddr, phase, thread);
// Call is made to get a[k]
x = MuPC_get_sync_int(UPC_RTS_ptr);
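The address, phase and thread that the front end computes follow UPC's block-cyclic layout. A minimal sketch, assuming a block size B (1 by default, as for a[10][THREADS]) and counting offsets in elements rather than bytes; sptr_parts and sptr_from_index are hypothetical names, not part of the Compaq RTS headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decomposition of a pointer-to-shared into the
   (vaddr, phase, thread) triple, with vaddr given as an element offset. */
typedef struct {
    size_t local_offset;  /* element offset within the owner's partition */
    size_t phase;         /* position within the current block */
    int    thread;        /* owning UPC thread */
} sptr_parts;

/* Map linear element index n of a shared array with block size `block`
   distributed block-cyclically over nthreads threads. */
sptr_parts sptr_from_index(size_t n, size_t block, int nthreads)
{
    assert(block > 0 && nthreads > 0);
    size_t blk = n / block;               /* global block number */
    sptr_parts p;
    p.phase = n % block;
    p.thread = (int)(blk % (size_t)nthreads);
    /* blocks are dealt round-robin, so the owner holds blk / nthreads
       earlier blocks of this array before this one */
    p.local_offset = (blk / (size_t)nthreads) * block + p.phase;
    return p;
}
```

For shared int a[10][THREADS] with B = 1, element a[i][j] has linear index i*THREADS + j, so it lives on thread j at local offset i with phase 0; vaddr would then be the owner's base address plus local_offset * sizeof(int).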
x = MuPC_get_sync_int(UPC_RTS_ptr p);
Pthread lock structs: send_lock, recv_lock

MuPC_get_sync_int (user thread):
  send_lock.type = GET
  send_lock.ptr = p
  wait on recv_lock.done
  x = recv_lock.data

Send/Recv Thr (requesting process):
  while (threads)
    case ...
    GET:   MPI_Send(p, RECV)
    ...
    REPLY: MPI_Recv(y)
           recv_lock.data = y
           recv_lock.done = T
    ...
  end while

Send/Recv Thr (owning process):
  while (threads)
    case ...
    RECV:  MPI_Recv(p)
           MPI_Send(*p, REPLY)
    ...
  end while
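The handshake above can be sketched with Pthreads alone. Assumptions: the MPI round trip (GET, RECV, REPLY) is collapsed into a hypothetical remote_fetch that reads memory directly, only one request is outstanding at a time as in a synchronous get, and get_sync_int is an illustrative analogue of MuPC_get_sync_int, not its code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { REQ_NONE, REQ_GET, REQ_QUIT } req_type;

/* One-slot mailboxes mirroring send_lock / recv_lock on the slide. */
struct send_slot { pthread_mutex_t m; pthread_cond_t c; req_type type; const int *ptr; };
struct recv_slot { pthread_mutex_t m; pthread_cond_t c; bool done; int data; };

static struct send_slot send_lock = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, REQ_NONE, NULL };
static struct recv_slot recv_lock = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0 };

/* Hypothetical stand-in for MPI_Send(p,RECV) to the owner and its
   MPI_Send(*p,REPLY): here the "remote" value is read directly. */
static int remote_fetch(const int *p) { assert(p != NULL); return *p; }

static void *sendrecv_thread(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&send_lock.m);
        while (send_lock.type == REQ_NONE)
            pthread_cond_wait(&send_lock.c, &send_lock.m);
        req_type t = send_lock.type;
        const int *p = send_lock.ptr;
        send_lock.type = REQ_NONE;            /* slot free for the next request */
        pthread_mutex_unlock(&send_lock.m);

        if (t == REQ_QUIT)
            return NULL;

        int y = remote_fetch(p);              /* the REPLY arrives */
        pthread_mutex_lock(&recv_lock.m);
        recv_lock.data = y;
        recv_lock.done = true;
        pthread_cond_signal(&recv_lock.c);
        pthread_mutex_unlock(&recv_lock.m);
    }
}

/* Post a request; assumes no other request is outstanding. */
static void post(req_type t, const int *p)
{
    pthread_mutex_lock(&send_lock.m);
    send_lock.type = t;
    send_lock.ptr = p;
    pthread_cond_signal(&send_lock.c);
    pthread_mutex_unlock(&send_lock.m);
}

/* Blocking get in the style of MuPC_get_sync_int. */
int get_sync_int(const int *p)
{
    post(REQ_GET, p);
    pthread_mutex_lock(&recv_lock.m);
    while (!recv_lock.done)
        pthread_cond_wait(&recv_lock.c, &recv_lock.m);
    int x = recv_lock.data;
    recv_lock.done = false;                   /* reset for the next get */
    pthread_mutex_unlock(&recv_lock.m);
    return x;
}

int get_sync_demo(void)
{
    static const int remote_array[4] = { 10, 20, 30, 40 };
    pthread_t tid;
    int sum = 0;
    pthread_create(&tid, NULL, sendrecv_thread, NULL);
    for (int k = 0; k < 4; k++)
        sum += get_sync_int(&remote_array[k]);
    post(REQ_QUIT, NULL);
    pthread_join(tid, NULL);
    return sum;
}
```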
Synthetic Testing
Pseudo-code walkthroughs of all MuPC functions
Synthetic test codes are C/MPI programs that call MuPC RTS routines directly.
Shared data is artificially allocated.
// THREAD 0
int a[10];
...
// a[12]=42;
index = 12 % 10;
thread = 12 / 10;
MuPC_put_integer(a, index, thread, 42);
...

// THREAD 1
int a[10];
...
// outcome is
// a[2]=42;
...
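The synthetic test above can be modeled in a single process. This sketch rests on assumptions: the two threads' arrays are rows of one C array, and mock_put_integer / mock_get_integer are hypothetical substitutes for the real RTS calls, using the argument order shown on the slide.

```c
#include <assert.h>

#define NTHREADS 2   /* simulated UPC threads */
#define LOCAL_N  10  /* elements of a[] owned by each thread */

/* Artificially allocated "shared" data: each row plays the role of
   one thread's local int a[10]. */
static int a_parts[NTHREADS][LOCAL_N];

/* Hypothetical stand-ins for the RTS routines under test; argument
   order (array, index, thread, value) follows the slide. */
void mock_put_integer(int parts[][LOCAL_N], int index, int thread, int value)
{
    assert(thread >= 0 && thread < NTHREADS);
    assert(index >= 0 && index < LOCAL_N);
    parts[thread][index] = value;
}

int mock_get_integer(int parts[][LOCAL_N], int index, int thread)
{
    assert(thread >= 0 && thread < NTHREADS);
    assert(index >= 0 && index < LOCAL_N);
    return parts[thread][index];
}

/* Synthetic test: perform the global assignment a[g] = v and report
   whether the owning thread's local copy now holds v. */
int synthetic_put_test(int g, int v)
{
    int index  = g % LOCAL_N;   /* a[12] -> index 2 */
    int thread = g / LOCAL_N;   /* a[12] -> thread 1 */
    mock_put_integer(a_parts, index, thread, v);
    return mock_get_integer(a_parts, index, thread) == v;
}
```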
Integration Testing
Wrap gets, puts and notify/wait to conform to the RTS interface.
Integrate MuPC with front end ...
– ... data structures and globals
– ... initialization and finalization
Rewrite synthetic tests in UPC and compare to previous results.
Add built-in functions for
– locks
– memory allocation
Full-scale Testing
MTU test kernelsMTU test kernels
GWU UPC test suiteGWU UPC test suite
Contributed UPC codesContributed UPC codes
Documentation, Delivery, and Distribution
MuPC source
Front-end binaries for targeted platforms
Makefiles, release notes, etc.
Serve these items from MTU MuPC web site
Publish a description of MuPC
Preliminary Work, Summer 2001
RTS header files provided by Compaq
MPI-2 one-sided communication proposed as the primary transport vehicle, but current implementations do not meet the full standard
MPI/Pthreads hybrid selected
Studied intermediate output of Compaq's UPC front end
Compaq hardware and software delivered
Single-threaded working environment verified
Accounts on AlphaServer SC also provided
August 20-21, Nashua
Participants:
– Bill Carlson, Brian Wibecan, Kevin Harris, Phil Becker, Daniel Christians, Jim Bovay, Savant, Merkey, Seidel
Discussed RTS definition and UPC features per Wibecan's agenda.
Outcomes:
– MPI/Pthreads hybrid design feasible
– MuPC will include upccc and upcrun MPI wrappers
– Agreed on RTS and UPC feature interpretations
– MuPC efficiency and performance not highest priority
– Written meeting summary submitted to Compaq (Sept. 23, 2001)
Current Work
Recent improvements:
– isolating MPI calls for thread safety
– send/recv threads yield control when there are no pending requests
Skeleton implementations of get/put, barrier, fence, and finalize have been scaled to over 30 nodes on MTU's Beowulf cluster.
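The second improvement, send/recv threads yielding when idle, can be sketched as a polling loop that calls sched_yield when there is nothing to do. Polling rather than blocking is assumed here because the thread must also watch for incoming network requests; poll_incoming is a hypothetical placeholder for that probe, and the atomic counters stand in for MuPC's request bookkeeping.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical counters standing in for MuPC's request bookkeeping. */
static atomic_int pending;    /* requests posted by the user thread */
static atomic_int serviced;   /* requests completed by the progress thread */
static atomic_int stop_flag;  /* set by the user thread at finalize */

/* Placeholder for probing the network for incoming requests
   (e.g. MPI_Iprobe in an MPI build); always idle in this sketch. */
static int poll_incoming(void) { return 0; }

static void *progress_thread(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_flag) || atomic_load(&pending) > 0) {
        int did_work = 0;
        if (atomic_load(&pending) > 0) {   /* only this thread decrements */
            atomic_fetch_sub(&pending, 1);
            atomic_fetch_add(&serviced, 1);
            did_work = 1;
        }
        did_work |= poll_incoming();
        if (!did_work)
            sched_yield();   /* nothing pending: give the CPU to the user thread */
    }
    return NULL;
}

int progress_demo(int nrequests)
{
    pthread_t tid;
    assert(nrequests >= 0);
    atomic_store(&pending, 0);
    atomic_store(&serviced, 0);
    atomic_store(&stop_flag, 0);
    pthread_create(&tid, NULL, progress_thread, NULL);
    for (int i = 0; i < nrequests; i++)
        atomic_fetch_add(&pending, 1);
    atomic_store(&stop_flag, 1);   /* drain remaining requests, then exit */
    pthread_join(tid, NULL);
    return atomic_load(&serviced);
}
```

Yielding instead of spinning matters most when the user thread and the send/recv thread share one processor, since a busy-waiting progress thread would otherwise consume the user thread's time slice.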
Project Work Plan:
Start date June 28, 2001
This plan is based on the Project Work Items specified in the March 27 RFP from Compaq and on the March 30 MTU Proposal.
Completed Work Items (per MTU proposal)
1(a): Review implementation methodologies
(b): Identify development platforms
(c): Align resources (staff and platforms)
(d): Identify target platforms
(e): Conclusion memo (sent 9/23/01)
2: Formal Work Plan and Agreement
– (Written version of this document)
4: Initial Design of Run Time System
– Design presented in Nashua on August 20, 2001
Remaining Work Items (w/completion dates)
5: Development of remaining primary components (Jan. 1, 2002)
– (d) locks
– (e) complete gets and puts
– (b) memory allocation
– (f) utility functions
3: Test design and documentation (Feb. 1, 2002)
– This testing will be done concurrently with Item 5 above.
– (a) Synthetic testing
– (b) Integration testing
– (c) Full-scale testing
6: Public Interface development (April 1, 2002)
– (a) Makefiles, release notes, installation notes, etc.
– (b) Bundle all necessary software
– (c) Provide MTU-authored test codes and results
– (d) Release advance copies for review and comment
7: System Refinement and Delivery (June 1, 2002)
– (a) Release MuPC to the UPC Developers' Group
– (b) Maintain MuPC website at MTU
– (c) Publish description of MuPC
8: Completion Certification (June 28, 2002)
– (a) Final MuPC release by MTU
MuPC Project Staff
Jeevan Savant, M.S. Graduate Student
– MuPC design and implementation
– (Items 5(b,d,e,f), 6(a,d), and 7(a,c) above)
– Support: 9 months, half-time
Kian Gap (Mark) Lee, M.S. Graduate Student
– MuPC testing and platform integration
– (Items 3(a,b,c), 6(b,c), 7(b,c) above)
– Support: 9 months, half-time
Phillip Merkey, Research Assistant Professor
Steven Seidel, Associate Professor
Additional MTU UPC projects
Charles Wallace, Assistant Professor
– UPC memory models
Xiaodi (Lisa) Li, M.S. Graduate Student
– Benchmarking MuPC using one or two NAS parallel benchmarks
Yi (Leon) Liang, M.S. Graduate Student
– Pthreads-only MuPC RTS
Yongsheng Huang, M.S. Graduate Student
– UPC memory models, improving MuPC efficiency
Zhang Zhang, Ph.D. Graduate Student
– UPC memory models, improving MuPC efficiency