synchronization, concurrency and memory access semantics in … · 2018. 11. 7. ·...
Synchronization, Concurrency and Memory Access Semantics in XMP
Joachim Protze ([email protected])
XMP Workshop, Tsukuba University
Motivation
• Increasing concurrency in HPC needs new programming concepts
• MPI + X is one candidate
− Highlights the need for multilevel parallelism
− Potentially even more levels of parallelism
• Multiple PGAS approaches were developed
− Did any of these gain broad acceptance?
− Why?
Key to success might be tool support
Challenges of parallel programming
• Deadlocks
− Two or more processes block each other by waiting for resources that another process is holding
• A data race happens when there are two memory accesses in a program where
− both target the same location
− both are performed concurrently by two execution units
− at least one of them is a write
Agenda
• What are data races in XMP?
• How can they be detected?
• How does the tool get the information?
• How is it analysed?
Data race in XMP
• A data race happens when there are two memory accesses in a program where
− both target the same location
− both are performed concurrently by two execution units
− at least one of them is a write
• "Concurrently" also means "not synchronized"
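The three conditions can be written down as a small predicate. This is only a sketch of the definition, not code from any tool: the `mem_access` record and the `ordered` flag (whether some synchronization orders the two accesses) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of one memory access, as a detector might log it. */
typedef struct {
    uintptr_t addr;      /* accessed location */
    int       unit;      /* id of the execution unit (node, image, thread) */
    bool      is_write;  /* true for a write, false for a read */
} mem_access;

/* Returns true if the two accesses satisfy all three conditions.
   `ordered` tells whether a synchronization orders a and b. */
bool is_data_race(mem_access a, mem_access b, bool ordered)
{
    bool same_location = (a.addr == b.addr);
    bool concurrent    = (a.unit != b.unit) && !ordered;  /* "not synchronized" */
    bool one_writes    = a.is_write || b.is_write;        /* not both are reads */
    return same_location && concurrent && one_writes;
}
```

For example, a write by node 1 and a read by node 3 on the same address race only as long as nothing synchronizes the two nodes in between.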
XMP-specific memory access semantics
Global view
• reflect:
− Read on reflection source, write on the shadow object
• gmove:
− Read on the right-hand side, write on the left-hand side
• reduction:
− Read/write on the reduction variable
• bcast:
− Read on the from-node, write on the other nodes
• reduce_shadow:
− Read on the shadow object, write on reflection source
XMP-specific memory access semantics
Local view
• Coarray assignment
− Write on the node that owns the image
• Coarray reference
− Read on the node that owns the image
Synchronization
Global view
• wait_async:
− Synchronizes the async communication
• barrier, reduction:
− Global synchronization on the node set
• bcast:
− Root before any other node
• reflect:
− Unclear? (referenced by Murai-san in the morning)
Synchronization
Local view
• Post + wait:
− Post-begin happens before wait-end
− Like post/wait in MPI-RMA
• Sync image/images/all:
− Synchronize with node, node-list, all
− Like fence in MPI-RMA
• Lock / unlock:
− Synchronize with other nodes using the same lock variable
Concurrency
Global view
• Async communication is concurrent with the code executed before the matching wait_async
#pragma xmp bcast a async(10)
a++;   /* executed while the broadcast of a may still be in flight */
#pragma xmp wait_async(10)
Concurrency
Local view
• Any other node in the same segment is concurrent
• Example:
! me=1, other=3
!$xmp image(node)
sync images(other)
call exchange(dA, dB, other)
!$xmp image(node)
sync images(other)

! me=3, other=1
!$xmp image(node)
sync images(other)
call exchange(dB, dA, other)
!$xmp image(node)
sync images(other)

subroutine exchange(mine, yours, iput)
  real :: mine[*], yours[*]
!$xmp coarray on node :: mine, yours
  yours[iput] = mine   ! dB[3]=dA[1] || dA[1]=dB[3]
end
Source of information
Tools interfaces
• PMPI
− Tools interface for MPI
− MPI spec describes wrapping of MPI functions
• OMPT
− Tools interface for OpenMP
− Latest OpenMP spec describes events; the tool is notified about encountered events
• XMPT
− Tools interface for XMP, follows the specification of OMPT
− Development of XMPT started with OMPT at the TR2 level
XMPT
• What events are needed?
− Events for the begin and end of XMP regions
• What information is needed?
− Essentially, all information that can be provided to the XMP pragma
− To allow a stateless implementation of the tool, the runtime stores tool data with each scope
− XMP already provides functions to derive information from handles
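The begin/end events and the runtime-held tool data can be pictured with a small callback sketch. Every name in it (`tool_event`, `tool_data`, `set_callback`, `dispatch`, `run_region`) is hypothetical and loosely modeled on the OMPT style; none of it is the actual XMPT API.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; this is NOT the XMPT API. */
typedef enum { EV_REGION_BEGIN, EV_REGION_END, EV_COUNT } tool_event;

/* Tool data that the runtime keeps per scope on behalf of the tool. */
typedef union { void *ptr; long value; } tool_data;

typedef void (*tool_callback)(tool_data *scope_data);

static tool_callback callbacks[EV_COUNT];

/* Tool side: register interest in an event. */
void set_callback(tool_event ev, tool_callback cb) { callbacks[ev] = cb; }

/* Runtime side: notify the tool if it registered for this event. */
void dispatch(tool_event ev, tool_data *scope_data)
{
    if (callbacks[ev] != NULL)
        callbacks[ev](scope_data);
}

/* Example tool: tags a region at begin and reads the tag back at end,
   without keeping its own lookup table. */
static long next_region_id = 1;
long last_finished_region = 0;

void on_begin(tool_data *d) { d->value = next_region_id++; }
void on_end(tool_data *d)   { last_finished_region = d->value; }

/* Runtime side: one region together with its scope-local tool data. */
void run_region(void)
{
    tool_data scope_data = { .value = 0 };  /* owned by the runtime */
    dispatch(EV_REGION_BEGIN, &scope_data);
    /* ... the communication itself would happen here ... */
    dispatch(EV_REGION_END, &scope_data);
}
```

Because the runtime hands the same `tool_data` to both callbacks, the tool can match an end event to its begin event without maintaining state of its own.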
Tool implementation
Example: Data race detection for XMP
[Figure: tool workflow in three steps]
1. Transformation: XMPT events are translated into ThreadSanitizer concepts
XMPT (in MUST):
- Fork / join, barrier
- Async communication
- Coarray access
- XMP communication
ThreadSanitizer:
- Happened-before
- POSIX threads
2. Analysis
ThreadSanitizer:
- Report of data races
- Report of deadlocks
- Synchronization issues
Attributed to POSIX threads
3. Transformation: reports are translated back
XMPT (in MUST):
- Report of data races
- Report of deadlocks
- Synchronization issues
Attributed to XMP regions
Thread-local concurrency
Thread-local concurrency: a technique to handle data race detection at programming model abstraction [HPDC '18, ACM open access]
#pragma xmp bcast a async(10)
a++;
#pragma xmp wait_async(10)
a++;
Analysing coarray accesses
• Memory access analysed at the node owning the image
− Shadow memory for coarrays
• Each node has a vector clock
• Synchronization semantics update the vector clock
− E.g., post → wait: post-node sends vector clock to wait-node
• Analysis advantage over MPI:
− MPI allows direct access to local image → need to fully track memory
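The vector clock bookkeeping for post → wait can be sketched as follows. This is a textbook vector clock under stated assumptions (a fixed node count `NODES`, and the "send" modeled as a plain copy), not the implementation used in the tool; all type and function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define NODES 4   /* assumption: fixed number of nodes for this sketch */

typedef struct { unsigned c[NODES]; } vclock;

/* Local event on node `me`: advance the node's own component. */
void vc_tick(vclock *vc, int me) { vc->c[me]++; }

/* Receiving side of a synchronization: component-wise maximum. */
void vc_join(vclock *dst, const vclock *src)
{
    for (int i = 0; i < NODES; i++)
        if (src->c[i] > dst->c[i])
            dst->c[i] = src->c[i];
}

/* a happened before b iff a <= b in every component and a != b. */
bool vc_happened_before(const vclock *a, const vclock *b)
{
    bool strictly_less = false;
    for (int i = 0; i < NODES; i++) {
        if (a->c[i] > b->c[i]) return false;
        if (a->c[i] < b->c[i]) strictly_less = true;
    }
    return strictly_less;
}

/* Two timestamps are concurrent if neither happened before the other. */
bool vc_concurrent(const vclock *a, const vclock *b)
{
    return !vc_happened_before(a, b) && !vc_happened_before(b, a);
}

/* post -> wait: the posting node sends its clock, the waiting node merges it. */
void post_wait(vclock *post_node, int post_id, vclock *wait_node, int wait_id)
{
    vc_tick(post_node, post_id);   /* the post is an event on the post-node */
    vclock sent = *post_node;      /* snapshot standing in for the message */
    vc_join(wait_node, &sent);
    vc_tick(wait_node, wait_id);   /* the wait completes on the wait-node */
}
```

An access stamped on the post-node before the post is then ordered before every access the wait-node performs after the wait, while accesses made without such a synchronization stay concurrent and are candidates for a race report.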
Summary
• Correctness tools are important
• Tools are always a few steps behind the development of new languages
• A tools interface is important for easier porting of tools
• Sometimes we hear the question:
− Why not use a language that prevents these issues?
− How many HPC codes are written in Rust?
Thank you for your attention.