![Page 1: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/1.jpg)
Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program
Duan Yuelu, Feng Xiaobing, Pen-chung Yew
![Page 2: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/2.jpg)
Outline
- Motivation
- Approach & Implementation
- Results
- Related Work
- Conclusion
![Page 3: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/3.jpg)
Motivation
Programmers develop “low-lock” code for better performance:
- locks are expensive
- data races are deliberately employed
- such code requires the sequential consistency (SC) model
Such code might fail under relaxed consistency (RC) models, e.g. Double-Checked Locking (DCL) for a lazily initialized singleton.
![Page 4: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/4.jpg)
Example 1 (a): Lazily initialized singleton
Object::Object() {
  this->field = 100;
}
// Version 1: works only for a single thread.
Object* Object::getInstance() {
  if (!_instance)
    _instance = new Object();
  return _instance;
}
// Version 2: works for multiple threads, but taking the lock on every call is expensive.
Object* Object::getInstance() {
  lock(l);
  if (!_instance)
    _instance = new Object();
  unlock(l);
  return _instance;
}
void Object::useInstance() {
  Object* ins = Object::getInstance();
  int f = ins->getField();
}
![Page 5: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/5.jpg)
(b): Double-Checked Locking for a lazily initialized singleton
Object* Object::getInstance() {
if (!_instance) {
lock(l);
if (!_instance)
_instance = new Object();
unlock(l);
}
return _instance;
}
If the architecture is SC, this works correctly and performs better than the always-locking version in (a).
But what happens on RC models that allow write-write reordering?
![Page 6: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/6.jpg)
A possible execution interleaving: correct!
Initializer Thread (T1):
Object* Object::getInstance() {
  if (!_instance) {
    lock(l);
    if (!_instance) {
      temp = malloc(..);
      A1: temp->field = 100;
      A2: _instance = temp;
    }
    unlock(l);
  }
  return _instance;
}
Reader Thread (T2):
B1: if (!_instance) {…}
…
B2: read _instance->field;
These accesses are data races, since they are not properly synchronized.
![Page 7: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/7.jpg)
But what if the two writes are reordered?
Initializer Thread (T1):
Object* Object::getInstance() {
  if (!_instance) {
    lock(l);
    if (!_instance) {
      temp = malloc(..);
      // program order places "temp->field = 100;" (A1) here
      A2: _instance = temp;
      A1: temp->field = 100;
    }
    …
Reader Thread (T2):
B1: if (!_instance) {…}
…
B2: read _instance->field;
T2 gets the uninitialized value of _instance->field, which violates sequential consistency.
![Page 8: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/8.jpg)
Bug pattern: Potential Violation of Sequential Consistency (PVSC), so named because these defects might cause SC violations.
How do we detect and eliminate PVSC bugs? Basically, we combine Shasha/Snir’s conflict graph and delay set theory with existing data race detection schemes.
![Page 9: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/9.jpg)
Outline
- Motivation
- Approach & Implementation
- Results
- Related Work
- Conclusion
![Page 10: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/10.jpg)
Our scheme
(1) Construct the race graph
(2) Find cycles in it (a cycle in the race graph corresponds to a PVSC bug)
(3) Compute the delay set
(4) Insert memory ordering fences
![Page 11: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/11.jpg)
Constructing Race Graph
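For all the instructions executed in a particular execution of a program P:
- Add a program order edge between the instructions of each thread.
- Add a race edge for each data race.
[Figure: Thread 1 executes wr a, then wr b; Thread 2 executes rd b, then rd a. Program order edges link the accesses within each thread; race edges link the conflicting accesses to a and to b.]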
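The construction can be sketched in C++ as below. This is a minimal illustration, not the authors' implementation: the `Access` record, the `RaceGraph` struct and `buildRaceGraph` are hypothetical names, and the sketch assumes that no synchronization is modelled, so every pair of conflicting accesses from different threads counts as a race (a real tool would take the races from its data race detector).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One dynamic memory access observed in a single execution (hypothetical layout).
struct Access {
    int thread;       // id of the issuing thread
    std::string var;  // name of the shared location
    bool isWrite;     // true for a write, false for a read
};

struct RaceGraph {
    std::vector<std::pair<int, int>> programOrder;  // directed: earlier -> later, same thread
    std::vector<std::pair<int, int>> race;          // undirected: conflicting accesses
};

// Builds the race graph for a trace listed in observed order.
// Assumption: every conflicting pair from different threads is a data race.
RaceGraph buildRaceGraph(const std::vector<Access>& trace) {
    RaceGraph g;
    const int n = static_cast<int>(trace.size());
    for (int i = 0; i < n; ++i) {
        assert(!trace[i].var.empty());  // every access must name a location
        // Program order edge to the next access of the same thread.
        for (int j = i + 1; j < n; ++j) {
            if (trace[j].thread == trace[i].thread) {
                g.programOrder.emplace_back(i, j);
                break;
            }
        }
        // Race edge to every later conflicting access of another thread.
        for (int j = i + 1; j < n; ++j) {
            const bool conflict = trace[i].var == trace[j].var &&
                                  (trace[i].isWrite || trace[j].isWrite);
            if (conflict && trace[i].thread != trace[j].thread)
                g.race.emplace_back(i, j);
        }
    }
    return g;
}
```

For the four-access example (wr a, wr b in Thread 1; rd b, rd a in Thread 2) this yields two program order edges and two race edges.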
![Page 12: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/12.jpg)
Example 1.
Thread 1: A: wr a; B: wr b
Thread 2: C: rd b; D: rd a
Race graph for DCL…
Initializer thread:
if (!_instance) {
  lock(l);
  if (!_instance) {
    temp = malloc(..);
    temp->field = 100;
    _instance = temp;
  }
  unlock(l);
}
Reader thread:
if (!_instance) {…}
…
read _instance->field;
![Page 13: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/13.jpg)
Find cycles in race graph
Theorem 1. A cycle in the race graph corresponds to a PVSC bug.
Proof: If a cycle is found in the race graph, then it is possible to get a non-sequentially-consistent execution by letting the race order be consistent with the cycle. E.g., we can get a non-SC execution E = {B->C, D->A} from the cycle A->B->C->D->A in the previous example.
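To see why E = {B->C, D->A} is not sequentially consistent, one can enumerate every SC interleaving of the example and check that none produces it. The sketch below assumes the example reads as Thread 1: A: a = 1; B: b = 1 and Thread 2: C: r1 = b; D: r2 = a, with a = b = 0 initially; `scOutcomes` and the register names r1, r2 are introduced here only for illustration.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Returns every (r1, r2) outcome reachable by a sequentially consistent
// interleaving, i.e. one that keeps A before B and C before D.
std::set<std::pair<int, int>> scOutcomes() {
    std::set<std::pair<int, int>> outcomes;
    // A 4-bit mask picks the two time slots that belong to Thread 1.
    for (int mask = 0; mask < 16; ++mask) {
        int ones = 0;
        for (int s = 0; s < 4; ++s) ones += (mask >> s) & 1;
        if (ones != 2) continue;
        int a = 0, b = 0, r1 = 0, r2 = 0;
        int step1 = 0, step2 = 0;  // next instruction of each thread
        for (int s = 0; s < 4; ++s) {
            if ((mask >> s) & 1) {
                if (step1++ == 0) a = 1; else b = 1;    // A, then B
            } else {
                if (step2++ == 0) r1 = b; else r2 = a;  // C, then D
            }
        }
        assert(step1 == 2 && step2 == 2);  // both threads ran to completion
        outcomes.insert(std::make_pair(r1, r2));
    }
    return outcomes;
}
```

The execution E corresponds to r1 = 1 (C sees B's write) and r2 = 0 (D runs before A's write is visible). That pair is absent from the returned set, so it can only arise when the hardware or compiler reorders accesses.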
![Page 14: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/14.jpg)
Compute delay set
Delay lemma: Every execution must be consistent with the delay set D. [Shasha/Snir]
Theorem 2. Let D be the delay set that contains all the program order edges of the race cycles in the race graph. Then D enforces sequential consistency for the executions that generate D.
Proof: Omitted.
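One way to compute such a delay set is sketched below. It rests on an assumption of this sketch, not a statement from the slides: a program order edge u -> v lies on a cycle exactly when u is reachable again from v, following program order edges forward and race edges in either direction. The `Edge` alias and `delaySet` function are hypothetical names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// Returns the program order edges that lie on some cycle of the race graph.
// Program order edges are directed; race edges may be crossed either way.
std::vector<Edge> delaySet(int nodeCount,
                           const std::vector<Edge>& programOrder,
                           const std::vector<Edge>& race) {
    std::vector<std::vector<int>> adj(nodeCount);
    for (const Edge& e : programOrder) {
        assert(e.first >= 0 && e.first < nodeCount);
        assert(e.second >= 0 && e.second < nodeCount);
        adj[e.first].push_back(e.second);
    }
    for (const Edge& e : race) {
        adj[e.first].push_back(e.second);
        adj[e.second].push_back(e.first);
    }
    std::vector<Edge> delays;
    for (const Edge& e : programOrder) {
        // Depth-first search from v; the edge u -> v closes a cycle iff u is reached.
        std::vector<bool> seen(nodeCount, false);
        std::vector<int> stack{e.second};
        seen[e.second] = true;
        while (!stack.empty()) {
            const int cur = stack.back();
            stack.pop_back();
            for (int next : adj[cur]) {
                if (!seen[next]) {
                    seen[next] = true;
                    stack.push_back(next);
                }
            }
        }
        if (seen[e.first]) delays.push_back(e);
    }
    return delays;
}
```

With nodes A=0, B=1, C=2, D=3, program order edges {A->B, C->D} and race edges {B-C, D-A}, both program order edges are returned, so each thread needs one delay.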
![Page 15: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/15.jpg)
Insert memory ordering fences
A fence instruction delays the issue of an instruction until all previous instructions have completed.
Insert a fence for each delay in D. Then D is enforced, and the detected PVSC bugs are eliminated.
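As an illustration in portable C++11, the two delays of the wr a / wr b versus rd b / rd a example can be enforced with `std::atomic_thread_fence`. This is a sketch under assumptions: the slides do not say which fence instruction is used, so the standard-library fence stands in for it, and `writer` and `reader` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> a{0}, b{0};

// Thread 1: the fence enforces the delay "wr a completes before wr b".
void writer() {
    a.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    b.store(1, std::memory_order_relaxed);
}

// Thread 2: the fence enforces the delay "rd b completes before rd a".
// Returns the value of a observed after b has been seen as 1.
int reader() {
    while (b.load(std::memory_order_relaxed) == 0) {
        // spin until the write to b becomes visible
    }
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return a.load(std::memory_order_relaxed);
}
```

If `writer` and `reader` run on two `std::thread`s, the reader returns 1 once it has seen b == 1. Without the two fences, the relaxed accesses would allow it to return 0.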
![Page 16: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/16.jpg)
Examples for the above 3 steps…
Thread 1: wr a; wr b
Thread 2: rd a; rd b
Fig. 1: No cycles, so no PVSC, and no fence is needed. (This implies that any execution on RC is sequentially consistent, so we don’t need fences.)
![Page 17: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/17.jpg)
Initially a = b = 0
Thread 1: A: a = 1
Thread 2: B: if (a); C: b = 1
Thread 3: D: if (b); E: R1 = a
Fig. 2: Contains a cycle A->B->C->D->E->A, hence a PVSC. It is possible to get the execution {A->B, C->D, E->A}, which violates SC and results in {a=1, b=1, R1=0}. The program order edges on the cycle are B->C and D->E, so inserting fences between B and C and between D and E eliminates the PVSC.
![Page 18: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/18.jpg)
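Fig. 3: Corrected version of DCL for a lazily initialized singleton.
Object* getInstance() {
  Object *tmp = _instance;
  Fence();                 // order the read of _instance before later reads
  if (!tmp) {
    lock(l);
    tmp = _instance;
    if (!tmp) {
      tmp = new Object();
      Fence();             // complete the initialization before publishing
      _instance = tmp;
    }
    unlock(l);
  }
  return tmp;              // return the value that was read before the fence
}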
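A compilable rendering of the same idea in standard C++11 might look like the sketch below. It is an assumption-laden translation rather than the authors' code: `std::atomic_thread_fence` replaces the unspecified `Fence()`, `std::mutex` replaces `lock(l)`/`unlock(l)`, and the member names `instance_`, `lock_` and `field_` are invented here.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

class Object {
public:
    static Object* getInstance() {
        Object* tmp = instance_.load(std::memory_order_relaxed);
        // First fence: reads after this point cannot move before the load above.
        std::atomic_thread_fence(std::memory_order_acquire);
        if (!tmp) {
            std::lock_guard<std::mutex> guard(lock_);
            tmp = instance_.load(std::memory_order_relaxed);
            if (!tmp) {
                tmp = new Object();
                // Second fence: the constructor's writes complete before publication.
                std::atomic_thread_fence(std::memory_order_release);
                instance_.store(tmp, std::memory_order_relaxed);
            }
        }
        return tmp;
    }
    int getField() const { return field_; }

private:
    Object() : field_(100) {}
    int field_;
    static std::atomic<Object*> instance_;
    static std::mutex lock_;
};

std::atomic<Object*> Object::instance_{nullptr};
std::mutex Object::lock_;
```

A caller uses it as `int f = Object::getInstance()->getField();`. The fast path takes no lock, and the two fences correspond to the two delays in Fig. 3.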
![Page 19: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/19.jpg)
Optimization
To handle real-world applications with:
- long execution times
- many threads
We convert the race graph into a PC race graph:
- Combine nodes with the same PC (program counter) into one node.
- The graph contains N nodes, where N equals the number of racing access instructions.
- Run an SCC (strongly connected component) algorithm on the PC race graph; each SCC corresponds to a PVSC bug.
- This can introduce false negatives.
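The PC-level analysis could be sketched as follows, using Tarjan's algorithm as the SCC algorithm (the slides do not name one). Two further assumptions of this sketch: race edges are treated as bidirectional, and a component is reported only if it has more than one PC and contains a program order edge, so that a lone race edge is not flagged. `pvscComponents` and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// pcOf[i] is the PC of dynamic access i; edges refer to dynamic accesses.
// Returns the SCCs of the PC race graph (as sets of PCs) that are reported as PVSC.
std::vector<std::set<int>> pvscComponents(const std::vector<int>& pcOf,
                                          const std::vector<Edge>& programOrder,
                                          const std::vector<Edge>& race) {
    std::map<int, std::set<int>> adj;  // PC -> successor PCs
    for (int pc : pcOf) adj[pc];       // one node per distinct PC
    for (const Edge& e : programOrder) adj[pcOf[e.first]].insert(pcOf[e.second]);
    for (const Edge& e : race) {
        adj[pcOf[e.first]].insert(pcOf[e.second]);
        adj[pcOf[e.second]].insert(pcOf[e.first]);
    }
    // Tarjan's SCC algorithm over the PC graph.
    std::map<int, int> index, low, compOf;
    std::set<int> onStack;
    std::vector<int> stack;
    std::vector<std::set<int>> comps;
    int counter = 0;
    std::function<void(int)> visit = [&](int v) {
        index[v] = low[v] = counter++;
        stack.push_back(v);
        onStack.insert(v);
        for (int w : adj[v]) {
            if (!index.count(w)) {
                visit(w);
                low[v] = std::min(low[v], low[w]);
            } else if (onStack.count(w)) {
                low[v] = std::min(low[v], index[w]);
            }
        }
        if (low[v] == index[v]) {  // v is the root of a component
            std::set<int> component;
            int w;
            do {
                w = stack.back();
                stack.pop_back();
                onStack.erase(w);
                compOf[w] = static_cast<int>(comps.size());
                component.insert(w);
            } while (w != v);
            comps.push_back(component);
        }
    };
    for (const auto& entry : adj)
        if (!index.count(entry.first)) visit(entry.first);
    assert(compOf.size() == adj.size());  // every PC was assigned a component
    // Report each multi-node component that contains a program order edge.
    std::vector<std::set<int>> result;
    std::set<int> reported;
    for (const Edge& e : programOrder) {
        const int cu = compOf[pcOf[e.first]], cv = compOf[pcOf[e.second]];
        if (cu == cv && comps[cu].size() > 1 && reported.insert(cu).second)
            result.push_back(comps[cu]);
    }
    return result;
}
```

Because all dynamic instances of an instruction share one node, the graph size depends on the number of racing instructions rather than on the length of the execution.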
![Page 20: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/20.jpg)
Outline
- Motivation
- Approach & Implementation
- Results
- Related Work
- Conclusion
![Page 21: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/21.jpg)
Results
- Detected PVSC bugs
- Performance loss after fence insertion
- Cost of PVSC detection over race detection
![Page 22: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/22.jpg)
Part of detected bugs
MySQL 5.0.x, sql/slave.c, handle_slave_io(): Assertion failure during slave shutdown. mi->slave_running=0 could become visible to other threads before the cleanup is completed, which triggers an assertion during slave shutdown.
httpd 2.2.x, modules/cache/mod_cache.c, cache_store_content(): store_header() might be visible to other threads before store_body(), so mod_cache might serve old content even though new content has been fetched.
httpd 2.2.x, prefork/prefork.c, ap_mpm_run(): restart_pending = shutdown_pending = 0; might become visible to child threads after set_signal(), so if httpd receives SIGTERM while child processes are being spawned, the signal is ignored.
![Page 23: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/23.jpg)
Performance loss of SPLASH-2
Figure 10: Performance on Intel Itanium SMP
[Bar chart: normalized execution time (y-axis, 0.4 to 1.6) for water-ns, barnes, fmm, raytrace, ocean, water-sp, fft, cholesky, lu and radix. Series: Non_Fence, Compiler Analysis, Lock-set, Hybrid, Happens-before.]
![Page 24: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/24.jpg)
Cost over data race detection
Figure 13: Cost of PVSC detection over different race detection algorithms
[Bar chart: normalized detection time (y-axis, 0.94 to 1.12) for water-ns, barnes, fmm, raytrace, ocean, water-sp, fft, cholesky, lu and radix. Series: Non_PVSC Detection, Lock-set, Hybrid.]
![Page 25: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/25.jpg)
Related Work
Compiler analysis: Conservative for C/C++ programs; inserts many redundant fences, which hurt performance severely. [K.Yelick@ucb, S.Midkiff@purdue]
Verification: Enumerates all possible executions that fit an RC model. Does not scale to large applications. [S.Burckhardt@msr]
Data race detection: Does not address the problem of SC violation. [many]
Other concurrency bugs: Work on atomicity [AVIO, yyzhou] and correlation [MUVI, yyzhou] does not consider the PVSC problem.
![Page 26: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/26.jpg)
Outline
- Motivation
- Approach & Implementation
- Results
- Related Work
- Conclusion
![Page 27: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/27.jpg)
Conclusion
An effective and efficient scheme to detect Potential Violations of Sequential Consistency in concurrent C/C++ programs:
- Easy to port to mature data race detection tools.
- Retains performance after PVSC elimination.
- Scalable and low-cost.
Current limitations:
- Dynamic data race detection limitations: false positives and false negatives. These can be addressed with progress in data race detection.
- Loop
![Page 28: Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf7a1a28abf838c82df7/html5/thumbnails/28.jpg)
Thanks!
Suggestions?