bda: practical dependence analysis for binary executables ... · intro: existing works •the...

97
BDA: Practical Dependence Analysis for Binary Executables by Unbiased Whole-Program Path Sampling and Per-Path Abstract Interpretation Zhuo Zhang, Wei You, Guanhong Tao, Guannan Wei, Yonghwi Kwon, and Xiangyu Zhang

Upload: others

Post on 19-Jul-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

BDA: Practical Dependence Analysis for Binary Executables by Unbiased Whole-Program Path Sampling and

Per-Path Abstract InterpretationZhuo Zhang, Wei You, Guanhong Tao, Guannan Wei,

Yonghwi Kwon, and Xiangyu Zhang

Page 2: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

• Determine data dependence between instructions in binary executables

Intro: Binary Program Dependence Analysis

Page 3: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

• Determine data dependence between instructions in binary executables

1 ...2 a = 0;3 ... 4 int b = a;

1 ...2 mov [rax], 0x03 ...4 mov rbx, [rcx]

Intro: Binary Program Dependence Analysis

Page 4: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

• Determine data dependence between instructions in binary executables

1 ...2 a = 0;3 ... 4 int b = a;

1 ...2 mov [rax], 0x03 ...4 mov rbx, [rcx]

Intro: Binary Program Dependence Analysis

Page 5: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

• Determine data dependence between instructions in binary executables

1 ...2 a = 0;3 ... 4 int b = a;

1 ...2 mov [rax], 0x03 ...4 mov rbx, [rcx]

Intro: Binary Program Dependence Analysis

Page 6: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

• Determine data dependence between instructions in binary executables

1 ...2 a = 0;3 ... 4 int b = a;

1 ...2 mov [rax], 0x03 ...4 mov rbx, [rcx]

DEPENDENCEIntro: Binary Program Dependence Analysis

Page 7: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

• Determine data dependence between instructions in binary executables• Have many applications• Precise call graph construction• Malware analysis to expose hidden behaviors• Binary rewriting• ……

Intro: Binary Program Dependence Analysis

Page 8: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

1 ...2 a = 0;3 ... 4 int b = a;

1 ...2 mov [rax], 0x03 ...4 mov rbx, [rcx]

• Determine data dependence between instructions in binary executables• Have many applications• A key challenge is to identify if multiple memory

read/write instructions access the same memory location

?

?

Intro: Binary Program Dependence Analysis

Page 9: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

1 ...2 a = 0;3 ... 4 int b = a;

1 ...2 mov [rax], 0x03 ...4 mov rbx, [rcx]

• Determine data dependence between instructions in binary executables• Have many applications• A key challenge is to identify if multiple memory

read/write instructions access the same memory location

?

?

Intro: Binary Program Dependence Analysis

Page 10: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Intro: Existing Works

• The state-of-the-art: Value Set Analysis (VSA)• Integrated into a variety of binary analysis frameworks

(Angr, BAP)• Compute a set of possible values for each operand of an

instruction

Page 11: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Intro: Existing Works

• The state-of-the-art: Value Set Analysis (VSA) • Integrated into a variety of binary analysis frameworks

(Angr, BAP)• Compute a set of possible values for each operand of an

instruction• Use a strided interval to denote the set of values

Page 12: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Intro: Existing Works

• The state-of-the-art: Value Set Analysis (VSA) • Integrated into a variety of binary analysis frameworks

(Angr, BAP)• Compute a set of possible values for each operand of an

instruction• Use a strided interval to denote the set of values

s[lb, ub] representing {lb, lb+s, lb+2s, …, ub}

e.g.,2[0, 8] representing {0, 2, 4, 6, 8}

stride

upper bound

lower bound

Page 13: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Intro: Existing Works

• The state-of-the-art: Value Set Analysis (VSA) • Integrated into a variety of binary analysis frameworks

(Angr, BAP)• Compute a set of possible values for each operand of an

instruction• Use a strided interval to denote the set of values• Have difficulty scaling to complex programs

1 int *p;2 if (...)3 p = ...;4 else5 p = ...;6 *p = 0;

1 ...2 ...3 mov rax, rbx4 ...5 mov rax, [rcx]6 mov [rax], 0

Page 14: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Intro: Existing Works

• The state-of-the-art: Value Set Analysis (VSA)• Integrated into a variety of binary analysis frameworks

(Angr, BAP)• Compute a set of possible values for each operand of an

instruction• Use a strided interval to denote the set of values• Have difficulty scaling to complex programs

2[0, 2] representing {0, 2}1 int *p;2 if (...)3 p = ...;4 else5 p = ...;6 *p = 0;

1 ...2 ...3 mov rax, rbx4 ...5 mov rax, [rcx]6 mov [rax], 0

Page 15: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Intro: Existing Works

• The state-of-the-art: Value Set Analysis (VSA) • Integrated into a variety of binary analysis frameworks

(Angr, BAP)• Compute a set of possible values for each operand of an

instruction• Use a strided interval to denote the set of values• Have difficulty scaling to complex programs

2[0, 2] representing {0, 2}1 int *p;2 if (...)3 p = ...;4 else5 p = ...;6 *p = 0;

1 ...2 ...3 mov rax, rbx4 ...5 mov rax, [rcx]6 mov [rax], 0

100[0, 100]representing {0, 100}

Page 16: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Intro: Existing Works

• The state-of-the-art: Value Set Analysis (VSA) • Integrated into a variety of binary analysis frameworks

(Angr, BAP)• Compute a set of possible values for each operand of an

instruction• Use a strided interval to denote the set of values• Have difficulty scaling to complex programs

100[0, 100]representing {0, 100}

2[0, 2] representing {0, 2}1 int *p;2 if (...)3 p = ...;4 else5 p = ...;6 *p = 0;

1 ...2 ...3 mov rax, rbx4 ...5 mov rax, [rcx]6 mov [rax], 0 2[0, 100] representing

{0, 2, 4, ..., 100}

Page 17: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Intro: Existing Works

• The state-of-the-art: Value Set Analysis (VSA)ANGR-VSA BAP-VSA

164.gzip ✘ TIMEOUT (12 hours)175.vpr ✘ TIMEOUT (12 hours)176.gcc ✘ TIMEOUT (12 hours)181.mcf ✘ ✓

186.crafty ✘ TIMEOUT (12 hours)197.parser ✘ TIMEOUT (12 hours)252.eon ✘ TIMEOUT (12 hours)

253.perlbmk ✘ TIMEOUT (12 hours)254.gap ✘ TIMEOUT (12 hours)

255.vortex ✘ TIMEOUT (12 hours)256.bzip2 ✘ TIMEOUT (12 hours)300.twolf ✘ TIMEOUT (12 hours)

Page 18: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Intro: Existing Works

• The state-of-the-art: Value Set Analysis (VSA)

• More efficient but conservative technique: ALTO

Page 19: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Observation

• Observation 1: probabilistic guarantees are sufficient for many practical applications

Page 20: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Observation

• Observation 1: probabilistic guarantees are sufficient for many practical applications• E.g., indirect control-flow transfer targets can help

construct precise call graphs

has a very low likelihood of missing any true positive

Page 21: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Observation

• Observation 1: probabilistic guarantees are sufficient for many practical applications• E.g., indirect control-flow transfer targets can help

construct precise call graphs

Strict Soundness à Never miss any true positivesà Produce a large number of bogus call edges

Probability Guaranteesà Discover most of the true edges à Have a low chance of missing some true positive edges

has a very low likelihood of missing any true positive

Page 22: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Observation

• Observation 1: probabilistic guarantees are sufficient for many practical applications

• Observation 2: a dependence relation can be disclosed by many whole-program paths

Page 23: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Observation

• Observation 1: probabilistic guarantees are sufficient for many practical applications

• Observation 2: a dependence relation can be disclosed by many whole-program paths• For a program with n statements

• The number of dependences: O(n2) • The number of paths is O(2n)

Page 24: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Observation

• Observation 1: probabilistic guarantees are sufficient for many practical applications

• Observation 2: a dependence relation can be disclosed by many whole-program paths

à BDA: a sampling-based abstract interpretation technique for dependence analysis

Page 25: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Observation

• Observation 1: probabilistic guarantees are sufficient for many practical applications

• Observation 2: a dependence relation can be disclosed by many whole-program paths

à BDA: a sampling-based abstract interpretation technique for dependence analysis

We will use source code examples to explain our idea.But BDA operates on stripped binary executables.

Page 26: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Naïve Sampling Algorithm?To toss a fair coin at each predicate 1. void foo(char *buf){

2. scanf(“%s”, buf);3. if (check1(buf))

5. if (!check2(buf)) 4. return;

15. printf(“%s”, buf);16.}

7. if (!check3(buf)) 6. return;

9. if (!check4(buf)) 8. return;

11. if (!check5(buf)) 10. return;

13. if (!check6(buf)) 12. return;

13. return;

1. void foo(char *buf){ 2. scanf(“%s”, buf); 3. if (!check1(buf))4. return;5. if (!check2(buf))6. return;7. if (!check3(buf))8. return;9. if (!check4(buf))10. return;11. if (!check5(buf))12. return;13. if (!check6(buf))14. return;15. printf(“%s”, buf); 16.}

Page 27: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Naïve Sampling Algorithm?To toss a fair coin at each predicate 1. void foo(char *buf){

2. scanf(“%s”, buf);3. if (check1(buf))

5. if (!check2(buf)) 4. return;

15. printf(“%s”, buf);16.}

7. if (!check3(buf)) 6. return;

9. if (!check4(buf)) 8. return;

11. if (!check5(buf)) 10. return;

13. if (!check6(buf)) 12. return;

13. return;

1. void foo(char *buf){ 2. scanf(“%s”, buf); 3. if (!check1(buf))4. return;5. if (!check2(buf))6. return;7. if (!check3(buf))8. return;9. if (!check4(buf))10. return;11. if (!check5(buf))12. return;13. if (!check6(buf))14. return;15. printf(“%s”, buf); 16.}

Page 28: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Naïve Sampling Algorithm?To toss a fair coin at each predicate 1. void foo(char *buf){

2. scanf(“%s”, buf);3. if (check1(buf))

5. if (!check2(buf)) 4. return;

15. printf(“%s”, buf);16.}

7. if (!check3(buf)) 6. return;

9. if (!check4(buf)) 8. return;

11. if (!check5(buf)) 10. return;

13. if (!check6(buf)) 12. return;

13. return;

1. void foo(char *buf){ 2. scanf(“%s”, buf); 3. if (!check1(buf))4. return;5. if (!check2(buf))6. return;7. if (!check3(buf))8. return;9. if (!check4(buf))10. return;11. if (!check5(buf))12. return;13. if (!check6(buf))14. return;15. printf(“%s”, buf); 16.}

Page 29: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Naïve Sampling Algorithm?To toss a fair coin at each predicate 1. void foo(char *buf){

2. scanf(“%s”, buf);3. if (check1(buf))

5. if (!check2(buf)) 4. return;

15. printf(“%s”, buf);16.}

7. if (!check3(buf)) 6. return;

9. if (!check4(buf)) 8. return;

11. if (!check5(buf)) 10. return;

13. if (!check6(buf)) 12. return;

13. return;

1. void foo(char *buf){ 2. scanf(“%s”, buf); 3. if (!check1(buf))4. return;5. if (!check2(buf))6. return;7. if (!check3(buf))8. return;9. if (!check4(buf))10. return;11. if (!check5(buf))12. return;13. if (!check6(buf))14. return;15. printf(“%s”, buf); 16.}

Page 30: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Naïve Sampling Algorithm?To toss a fair coin at each predicate

DEPENDENCE

1. void foo(char *buf){2. scanf(“%s”, buf);3. if (check1(buf))

5. if (!check2(buf)) 4. return;

15. printf(“%s”, buf);16.}

7. if (!check3(buf)) 6. return;

9. if (!check4(buf)) 8. return;

11. if (!check5(buf)) 10. return;

13. if (!check6(buf)) 12. return;

13. return;

1. void foo(char *buf){ 2. scanf(“%s”, buf); 3. if (!check1(buf))4. return;5. if (!check2(buf))6. return;7. if (!check3(buf))8. return;9. if (!check4(buf))10. return;11. if (!check5(buf))12. return;13. if (!check6(buf))14. return;15. printf(“%s”, buf); 16.}

Page 31: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Naïve Sampling Algorithm? NOTo toss a fair coin at each predicate

DEPENDENCE

1. void foo(char *buf){2. scanf(“%s”, buf);3. if (check1(buf))

5. if (!check2(buf)) 4. return;

15. printf(“%s”, buf);16.}

7. if (!check3(buf)) 6. return;

9. if (!check4(buf)) 8. return;

11. if (!check5(buf)) 10. return;

13. if (!check6(buf)) 12. return;

13. return;

1/2

1/2

1/2

1/2

1/2

1/2 1/2

1/2

1/2

1/2

1/2

1/2

P(Red Path): 1/64Find Dependence

P(Blue Path): 1/2Cannot Find Dependence

Page 32: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Naïve Sampling Algorithm? NOTo toss a fair coin at each predicate

DEPENDENCE

1. void foo(char *buf){2. scanf(“%s”, buf);3. if (check1(buf))

5. if (!check2(buf)) 4. return;

15. printf(“%s”, buf);16.}

7. if (!check3(buf)) 6. return;

9. if (!check4(buf)) 8. return;

11. if (!check5(buf)) 10. return;

13. if (!check6(buf)) 12. return;

13. return;

?/?

?/?

?/?

?/?

?/?

?/? ?/?

?/?

?/?

?/?

?/?

?/?

P(Red Path): 1/7Find Dependence

P(Blue Path): 1/7Cannot Find Dependence

7 paths in total

Page 33: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Workflow of BDA

• Phase 1: Path SamplingSample whole-program paths under a uniform distribution

• Phase 2: Per-path Abstract InterpretationCompute the possible values for individual instructions, following the given sample path

• Phase 3: Posterior AnalysisMitigate the possible incomplete path coverage during sampling

Page 34: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Path Sampling

• Input: Binary executable and its inter-procedural control flow graph

• Output: A number of whole-program path samples

Page 35: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

1. int main(){ 2. int a; 3. if (rand()) 4. gee(&a); 5. else foo(&a) ; 6. }7.8. void foo(int *a){9. gee(a) ; 10. if (rand()) 11. *a+=1; 12. } 13. 14. void gee(int *a){ 15. if (rand()) 16. *a=0; 17. else *a=2; 18. }

Phase 1: Inter-procedural Control Flow Graph gee()

foo()main()

BB_1

BB_5BB_4

BB_6

BB_14

BB_17BB_16

BB_18

BB_8

BB_11

BB_12

BB_10

Entry

Exit

Page 36: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Inter-procedural Control Flow Graph

1. int main(){ 2. int a; 3. if (rand()) 4. gee(&a); 5. else foo(&a) ; 6. }7.8. void foo(int *a){9. gee(a) ; 10. if (rand()) 11. *a+=1; 12. } 13. 14. void gee(int *a){ 15. if (rand()) 16. *a=0; 17. else *a=2; 18. }

gee()foo()main()

BB_1

BB_5BB_4

BB_6

BB_14

BB_17BB_16

BB_18

BB_8

BB_11

BB_12

BB_10

Entry

Exit

Page 37: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

1. int main(){ 2. int a; 3. if (rand()) 4. gee(&a); 5. else foo(&a) ; 6. }7.8. void foo(int *a){9. gee(a) ; 10. if (rand()) 11. *a+=1; 12. } 13. 14. void gee(int *a){ 15. if (rand()) 16. *a=0; 17. else *a=2; 18. }

Phase 1: Inter-procedural Control Flow Graph gee()

foo()main()

BB_1

BB_5BB_4

BB_6

BB_14

BB_17BB_16

BB_18

BB_8

BB_11

BB_12

BB_10

Entry

Exit

Page 38: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Inter-procedural Control Flow Graph

Intra-procedural Control Flow1. int main(){

2. int a; 3. if (rand()) 4. gee(&a); 5. else foo(&a) ; 6. }7.8. void foo(int *a){9. gee(a) ; 10. if (rand()) 11. *a+=1; 12. } 13. 14. void gee(int *a){ 15. if (rand()) 16. *a=0; 17. else *a=2; 18. }

gee()foo()main()

BB_1

BB_5BB_4

BB_6

BB_14

BB_17BB_16

BB_18

BB_8

BB_11

BB_12

BB_10

Entry

Exit

Page 39: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Inter-procedural Control Flow Graph Inter-procedural

Control Flow1. int main(){ 2. int a; 3. if (rand()) 4. gee(&a); 5. else foo(&a) ; 6. }7.8. void foo(int *a){9. gee(a) ; 10. if (rand()) 11. *a+=1; 12. } 13. 14. void gee(int *a){ 15. if (rand()) 16. *a=0; 17. else *a=2; 18. }

gee()foo()main()

BB_1

BB_5BB_4

BB_6

BB_14

BB_17BB_16

BB_18

BB_8

BB_11

BB_12

BB_10

Entry

Exit

Page 40: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Inter-procedural Control Flow Graph

1. int main(){ 2. int a; 3. if (rand()) 4. gee(&a); 5. else foo(&a) ; 6. }7.8. void foo(int *a){9. gee(a) ; 10. if (rand()) 11. *a+=1; 12. } 13. 14. void gee(int *a){ 15. if (rand()) 16. *a=0; 17. else *a=2; 18. }

gee()foo()main()

BB_1

BB_5BB_4

BB_6

BB_14

BB_17BB_16

BB_18

BB_8

BB_11

BB_12

BB_10

Entry

Exit

Page 41: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Path Countinggee()foo()main()

BB_1

BB_5BB_4

BB_6

BB_14

BB_17BB_16

BB_18

BB_8

BB_11

BB_12

BB_10

Entry

Exit

Toss a biased coin at predicates to sample each path uniformly

Page 42: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Path Counting

BB_1

BB_5BB_4

Entry main()• m inter-procedural paths from BB_4 to BB_6 m

Page 43: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Path Counting

BB_1

BB_5BB_4

Entry main()

m n

• m inter-procedural paths from BB_4 to BB_6• n inter-procedural

paths from BB_5 to BB_6

Page 44: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Path Counting

BB_1

BB_5BB_4

Entry main()

m

• m inter-procedural paths from BB_4 to BB_6• n inter-procedural

paths from BB_5 to BB_6

• n/(m+n) probability to take BB_5 from BB_1

nn/(m+n)m/(m+n)

Page 45: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Path Countinggee()foo()main()

BB_1

BB_5BB_4

BB_6

BB_14

BB_17BB_16

BB_18

BB_8

BB_11

BB_12

BB_10

Entry

Exit

Compute the weight for each basic block, which denotes the number of inter-procedural paths from the block to the exit of its enclosing function

Page 46: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Path Countinggee()foo()main()

BB_1

BB_5BB_4

BB_6

BB_14

BB_17BB_16

BB_18

BB_8

BB_11

BB_12

BB_10

Entry

Exit

The path counting is performed in reverse topological order

1. Sort the call graph (geeà fooàmain)

2. Sort the nodes inside each function

1

23

4

56

7

8

9

1011

12

Page 47: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Path Countinggee()

BB_14

BB_17BB_16

BB_18

The path counting is performed in reverse topological order

1. Sort the call graph (geeà fooàmain)

2. Sort the nodes inside each function

1

Page 48: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Path Countinggee()

BB_14

BB_17BB_16

BB_18

The path counting is performed in reverse topological order

1. Sort the call graph (geeà fooàmain)

2. Sort the nodes inside each function

1

The first processed node

1 1

2

There are two paths inside gee()

Page 49: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Path Counting

BB_12

foo()

Return node’s weight is 1

1

W[BB_12] = 1

Page 50: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Path Counting

BB_11

BB_12

BB_10

foo()

1

W[BB_10] = W[BB_12] + W[BB_11]= 1 + 1 = 2

2

1

Non-callsite node’s weight is the sum of successors’ weights

Page 51: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Path Counting

BB_14

BB_8

BB_10

2

foo()

2

W[BB_8]= W[BB_14] x W[BB_10]= 2 x 2 = 4

4Callsite node’s weight is the product of the callee weight and the continuation weight

Page 52: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Path Counting

BB_1

BB_5BB_4

BB_6

BB_8

BB_10

Entry

Exit

main()

1

42

6• 2 inter-procedural paths from BB_4 to BB_6• 4 inter-procedural

paths from BB_5 to BB_6

• 2/3 probability to take BB_5 from BB_1

2/3

Page 53: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Path Samplinggee()foo()main()

BB_1

BB_5BB_4

BB_6

BB_14

BB_17BB_16

BB_18

BB_8

BB_11

BB_12

BB_10

Entry

Exit

1

1 1

2

1

2

1

4

1

4

62/31/3

1/2 1/2

1/2

1/2

BB_1

BB_4

BB_14

BB_16

BB_18

BB_6

1/3

1/2

Path 1:

P1 = 1/6 P2 = 1/6

BB_1

BB_5

BB_8

BB_14

BB_17

BB_18

2/3

1/2

Path 2:

BB_10

BB_12

BB_6

1/2

2

Page 54: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Practical Challenges

• The weight of each block is extremely large• The number of whole-program paths: O(2n)

Page 55: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Practical Challenges

• The weight of each block is extremely large• The number of whole-program paths: O(2n)• How to handle biased distribution (e.g., 1 : 101000)

A simple random number generator will introduce substantial error

on such a biased odds.

Page 56: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Practical Challenges

• The weight of each block is extremely large• The number of whole-program paths: O(2n)• How to handle biased distribution (e.g., 1 : 101000)• We develop a novel algorithm to simulate the biased

distribution

Page 57: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 1: Practical Challenges

• The weight of each block is extremely large • Loops and recursion [Bounded unrolling]• Multi-exit [Two different kinds of weights]

Page 58: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 2: Per-Path Abstract Interpretation• Follow the given sampled path

Page 59: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 2: Per-Path Abstract Interpretation• Follow the given sampled path• Use singleton value, instead of strided interval

Page 60: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 2: Per-Path Abstract Interpretation• Follow the given sampled path• Use singleton value, instead of strided interval• Is to-some-extent similar to concrete execution

Page 61: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 2: Per-Path Abstract Interpretation• Follow the given sampled path• Use singleton value, instead of strided interval• Is to-some-extent similar to concrete execution• Compute the possible values for individual

instructions, which will be used to collect the definition and use information about memory

Page 62: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 2: Per-Path Abstract Interpretation

&v à 0xDEADBEEF

A. if (rand())

B. v = 0; C. v = 1;

D. if (rand())

E. output(v); F. output(-v);

G. return;

Page 63: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 2: Per-Path Abstract Interpretation

A. if (rand())

B. v = 0; C. v = 1;

D. if (rand())

E. output(v); F. output(-v);

G. return;

DEF(0xDEADBEEF) = {B}

&v à 0xDEADBEEF

Page 64: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 2: Per-Path Abstract Interpretation

A. if (rand())

B. v = 0; C. v = 1;

D. if (rand())

E. output(v); F. output(-v);

G. return;

DEF(0xDEADBEEF) = {B}

&v à 0xDEADBEEF

USE(0xDEADBEEF) = {E}

Page 65: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 2: Per-Path Abstract Interpretation

A. if (rand())

B. v = 0; C. v = 1;

D. if (rand())

E. output(v); F. output(-v);

G. return;

DEF(0xDEADBEEF) = {B}

&v à 0xDEADBEEF

USE(0xDEADBEEF) = {E}

Page 66: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 3: Posterior Analysis

• Cannot sample all the whole-program paths• Miss some dependence belonging to uncovered paths

Page 67: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 3: Posterior Analysis

• Cannot sample all the whole-program paths• Miss some dependence belonging to uncovered paths

A. if (rand())

B. v = 0; C. v = 1;

D. if (rand())

E. output(v); F. output(-v);

G. return;

Page 68: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 3: Posterior Analysis

• Cannot sample all the whole-program paths• Miss some dependence belonging to uncovered paths

B. v = 0; C. v = 1;

E. output(v); F. output(-v);

✓ ✓

Page 69: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 3: Posterior Analysis

• Cannot sample all the whole-program paths• Miss some dependence belonging to uncovered paths

C. v = 1;

E. output(v); F. output(-v);

A. if (rand())

D. if (rand())

G. return;

Page 70: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 3: Posterior Analysis

• Merge per-path memory write information at each control-flow joint point

A. if (rand())

B. v = 0; C. v = 1;

D. if (rand())

E. output(v); F. output(-v);

G. return;

&v à 0xDEADBEEF

Page 71: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 3: Posterior Analysis

A. if (rand())

B. v = 0; C. v = 1;

D. if (rand())

E. output(v); F. output(-v);

G. return;

&v à 0xDEADBEEF

• Merge per-path memory write information at each control-flow joint point

DEF(0xDEADBEEF) = {B}

DEF(0xDEADBEEF) = {C}

Page 72: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 3: Posterior Analysis

A. if (rand())

B. v = 0; C. v = 1;

D. if (rand())

E. output(v); F. output(-v);

G. return;

&v à 0xDEADBEEF

• Merge per-path memory write information at each control-flow joint point

DEF(0xDEADBEEF) = {B, C}

Page 73: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Phase 3: Posterior Analysis

• Cross-check the memory read information to detect dependence

B. v = 0; C. v = 1;

E. output(v);

USE(0xDEADBEEF) = {E}

DEF(0xDEADBEEF) = {B, C}

Page 74: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Probabilistic Guarantees

• Assume m out of total n paths disclose a dependence, and let k = m/n • For one sample, the probability pd of observing a

given dependency d is:

• For N sample, the probability Pd of observing a given dependency d is:

BDA: Practical Dependence Analysis for Binary Executables by Unbiased Whole-Program Path Sampling and ... 137:11

By applying Taylor’s Theorem to inequality (4), we can derive inequality (5). In practice,the length L of the longest path of any binary executable (without loops or recursion) satisfiesL≪ (263 + 1), the approximation is hence very tight around 1

n . For example, 176.gcc’s longest path

is nearly 40000, such that 1−(8e−15)n ≤!p ≤ 1+(8e−15)

n .

(1 −2L

263 + 1) ·

1

n≤ !p ≤ (1 +

2L

263 + 1) ·

1

n(5)

We should note that a simple random number generator would not work because of the limitationof floating point representation. Considering selecting a branch with possibility 1e−1000 representedvia 1 : 1e+1000, it would be transformed to 2e−308 (the minimal representable positive value infloat64), suggesting over 2e+692 times undesirable amplification of the likelihood. This would leadto heavily biased sampling. Lumbroso [2013] proposed a heavy-weight algorithm to accuratelysample from strongly-biased distribution, whose average-case time complexity is O(log(p + q))when sampling from p :q. In contrast, Algorithm 2 samples in O(1) with negligible precision loss,and hence is more desirable in our context where the sampling function is frequently invoked.

Probabilistic Guarantee for Disclosing Dependence. As mentioned in Section 2, a (memory)dependence may be disclosed by many paths. Assumem out of total n paths disclose a dependence,and let k = m

n . Following our path sampling algorithm, in a path sample, the probability pd ofobserving a given dependency d satisfies inequality (6).

"263

263 + 1

#2L· k ≤ pd = !p ·m ≤

"263 + 1

263

#2L· k (6)

For N samples, the probability Pd of disclosing dependency d at least once has a lower boundmentioned in inequality (7).

Pd = 1 − (1 − pd )N ≥ 1 −

$

1 −

"263

263 + 1

#2L· k

%N≈ 1 − (1 − k)N (7)

Inequality (7) offers a strong guarantee for finding dependency in practice. Taking 176.gcc as anexample, if L=40000, k=0.0005 and N =10000, we would have Pd ≥ 99.32%, which means that thechance of missing the dependence is only 0.68%.

4.3 Addressing Practical Challenges

Handling Loops. Our discussion so far assumes loop-free and recursion-free programs. BDAdistinguishes two kinds of loops and handles them differently. The first kind is loops whoseiteration numbers are not external input related. We call it constant loops. The other kind is inputrelated, called input-dependent loops.

For an input-dependent loop, it is intractable to determine how many times it iterates. A standardsolution is to compute a fix-point, which often entails substantial over-approximation. Hence,our design is to bound the number of iterations. A naive solution is to give a fixed bound for allinput-dependent loops. However, this could cause non-trivial path explosion in the presence ofnesting loops. Hence, we bound the total number of iterations across all the nesting loops withina function. Such a design also allows easy computation of weight values. Assume the bound foreach function is t = 3, Figure 7a illustrates the idea. For each function F , BDA clones the functiont times, denotes as F0, . . . , Ft−1. For each back-edge in Fi , we reconnect it to the correspondingloop head in Fi+1. For example, back-edge a in Figure 7 becomes a1, a2 and a3 connecting differentversions of F . Note that in the transformed graph at most t = 3 back-edges could be taken (e.g., a 3times and b 0 times; a 2 times and b 1 time; and so on).

Proc. ACM Program. Lang., Vol. 3, No. OOPSLA, Article 137. Publication date: October 2019.

BDA: Practical Dependence Analysis for Binary Executables by Unbiased Whole-Program Path Sampling and ... 137:11

By applying Taylor’s Theorem to inequality (4), we can derive inequality (5). In practice,the length L of the longest path of any binary executable (without loops or recursion) satisfiesL≪ (263 + 1), the approximation is hence very tight around 1

n . For example, 176.gcc’s longest path

is nearly 40000, such that 1−(8e−15)n ≤!p ≤ 1+(8e−15)

n .

(1 −2L

263 + 1) ·

1

n≤ !p ≤ (1 +

2L

263 + 1) ·

1

n(5)

We should note that a simple random number generator would not work because of the limitationof floating point representation. Considering selecting a branch with possibility 1e−1000 representedvia 1 : 1e+1000, it would be transformed to 2e−308 (the minimal representable positive value infloat64), suggesting over 2e+692 times undesirable amplification of the likelihood. This would leadto heavily biased sampling. Lumbroso [2013] proposed a heavy-weight algorithm to accuratelysample from strongly-biased distribution, whose average-case time complexity is O(log(p + q))when sampling from p :q. In contrast, Algorithm 2 samples in O(1) with negligible precision loss,and hence is more desirable in our context where the sampling function is frequently invoked.

Probabilistic Guarantee for Disclosing Dependence. As mentioned in Section 2, a (memory)dependence may be disclosed by many paths. Assumem out of total n paths disclose a dependence,and let k = m

n . Following our path sampling algorithm, in a path sample, the probability pd ofobserving a given dependency d satisfies inequality (6).

"263

263 + 1

#2L· k ≤ pd = !p ·m ≤

"263 + 1

263

#2L· k (6)

For N samples, the probability Pd of disclosing dependency d at least once has a lower boundmentioned in inequality (7).

Pd = 1 − (1 − pd )N ≥ 1 −

$

1 −

"263

263 + 1

#2L· k

%N≈ 1 − (1 − k)N (7)

Inequality (7) offers a strong guarantee for finding dependency in practice. Taking 176.gcc as anexample, if L=40000, k=0.0005 and N =10000, we would have Pd ≥ 99.32%, which means that thechance of missing the dependence is only 0.68%.

4.3 Addressing Practical Challenges

Handling Loops. Our discussion so far assumes loop-free and recursion-free programs. BDAdistinguishes two kinds of loops and handles them differently. The first kind is loops whoseiteration numbers are not external input related. We call it constant loops. The other kind is inputrelated, called input-dependent loops.

For an input-dependent loop, it is intractable to determine how many times it iterates. A standardsolution is to compute a fix-point, which often entails substantial over-approximation. Hence,our design is to bound the number of iterations. A naive solution is to give a fixed bound for allinput-dependent loops. However, this could cause non-trivial path explosion in the presence ofnesting loops. Hence, we bound the total number of iterations across all the nesting loops withina function. Such a design also allows easy computation of weight values. Assume the bound foreach function is t = 3, Figure 7a illustrates the idea. For each function F , BDA clones the functiont times, denotes as F0, . . . , Ft−1. For each back-edge in Fi , we reconnect it to the correspondingloop head in Fi+1. For example, back-edge a in Figure 7 becomes a1, a2 and a3 connecting differentversions of F . Note that in the transformed graph at most t = 3 back-edges could be taken (e.g., a 3times and b 0 times; a 2 times and b 1 time; and so on).

Proc. ACM Program. Lang., Vol. 3, No. OOPSLA, Article 137. Publication date: October 2019.

Page 75: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

to it also being applicable to any supervised learneror data type. However, their approach are limitedin supervised learning, while our technique is moregeneral.

GGDB is also related with adversarial training [7].Adversarial examples are perturbed inputs designedto fool machine learning models. Adversarial train-ing injects such examples into training data to in-crease robustness. Di↵erent from model debugging,adversarial training is aiming at generate adversarialexamples, which would cause model bugs. We believeGGDB is complementary to these work.

pd � 1� (1� 8

15)5 = 0.967

pd � 1� (1� 8

15)50 = 1� 2.2⇥ 10�14

|✏| 2L

263 + 1

References

[1] Gabriel Cadamuro, Ran Gilad-Bachrach, and Xi-aojin Zhu. Debugging machine learning models.In ICML Workshop on Reliable Machine Learn-

ing in the Wild, 2016.

[2] Ekin Dogus Cubuk, Barret Zoph, DandelionMane, Vijay Vasudevan, and Quoc V. Le. Au-toaugment: Learning augmentation policies fromdata. CoRR, abs/1805.09501, 2018.

[3] Ian J. Goodfellow, Jean Pouget-Abadie, MehdiMirza, Bing Xu, David Warde-Farley, SherjilOzair, Aaron C. Courville, and Yoshua Ben-gio. Generative adversarial nets. In Advances in

Neural Information Processing Systems 27: An-

nual Conference on Neural Information Process-

ing Systems 2014, December 8-13 2014, Montreal,

Quebec, Canada, pages 2672–2680, 2014.

[4] Alex Krizhevsky, Ilya Sutskever, and Geo↵rey E.Hinton. Imagenet classification with deep convo-lutional neural networks. In Advances in Neu-

ral Information Processing Systems 25: 26th An-

nual Conference on Neural Information Process-

ing Systems 2012. Proceedings of a meeting held

December 3-6, 2012, Lake Tahoe, Nevada, United

States., pages 1106–1114, 2012.

[5] Shiqing Ma, Yingqi Liu, Wen-Chuan Lee, Xi-angyu Zhang, and Ananth Grama. MODE: auto-mated neural network model debugging via statedi↵erential analysis and input selection. In Pro-

ceedings of the 2018 ACM Joint Meeting on Euro-

pean Software Engineering Conference and Sym-

posium on the Foundations of Software Engineer-

ing, ESEC/SIGSOFT FSE 2018, Lake Buena

Vista, FL, USA, November 04-09, 2018, pages175–186, 2018.

[6] David E Rumelhart, Geo↵rey E Hinton, Ronald JWilliams, et al. Learning representations by back-propagating errors. Cognitive modeling, 5(3):1,1988.

[7] Florian Tramer, Alexey Kurakin, Nicolas Pa-pernot, Ian J. Goodfellow, Dan Boneh, andPatrick D. McDaniel. Ensemble adversarial train-ing: Attacks and defenses. In 6th International

Conference on Learning Representations, ICLR

2018, Vancouver, BC, Canada, April 30 - May 3,

2018, Conference Track Proceedings, 2018.

3

Probabilistic Guarantees

• Loop unrolling = 15• Probabilities of observing the the dependence from

strcpy to line 1:• Sample 5 times

• Sample 50 times

1 src[8] = '\x00’;2 strcpy(src, dst);

to it also being applicable to any supervised learneror data type. However, their approach are limitedin supervised learning, while our technique is moregeneral.

GGDB is also related with adversarial training [7].Adversarial examples are perturbed inputs designedto fool machine learning models. Adversarial train-ing injects such examples into training data to in-crease robustness. Di↵erent from model debugging,adversarial training is aiming at generate adversarialexamples, which would cause model bugs. We believeGGDB is complementary to these work.

pd � 1� (1� 8

15)5 = 0.967

pd � 1� (1� 8

15)50 = 1� 2.2⇥ 10�14

|✏| 2L

263 + 1

References

[1] Gabriel Cadamuro, Ran Gilad-Bachrach, and Xi-aojin Zhu. Debugging machine learning models.In ICML Workshop on Reliable Machine Learn-

ing in the Wild, 2016.

[2] Ekin Dogus Cubuk, Barret Zoph, DandelionMane, Vijay Vasudevan, and Quoc V. Le. Au-toaugment: Learning augmentation policies fromdata. CoRR, abs/1805.09501, 2018.

[3] Ian J. Goodfellow, Jean Pouget-Abadie, MehdiMirza, Bing Xu, David Warde-Farley, SherjilOzair, Aaron C. Courville, and Yoshua Ben-gio. Generative adversarial nets. In Advances in

Neural Information Processing Systems 27: An-

nual Conference on Neural Information Process-

ing Systems 2014, December 8-13 2014, Montreal,

Quebec, Canada, pages 2672–2680, 2014.

[4] Alex Krizhevsky, Ilya Sutskever, and Geo↵rey E.Hinton. Imagenet classification with deep convo-lutional neural networks. In Advances in Neu-

ral Information Processing Systems 25: 26th An-

nual Conference on Neural Information Process-

ing Systems 2012. Proceedings of a meeting held

December 3-6, 2012, Lake Tahoe, Nevada, United

States., pages 1106–1114, 2012.

[5] Shiqing Ma, Yingqi Liu, Wen-Chuan Lee, Xi-angyu Zhang, and Ananth Grama. MODE: auto-mated neural network model debugging via statedi↵erential analysis and input selection. In Pro-

ceedings of the 2018 ACM Joint Meeting on Euro-

pean Software Engineering Conference and Sym-

posium on the Foundations of Software Engineer-

ing, ESEC/SIGSOFT FSE 2018, Lake Buena

Vista, FL, USA, November 04-09, 2018, pages175–186, 2018.

[6] David E Rumelhart, Geo↵rey E Hinton, Ronald JWilliams, et al. Learning representations by back-propagating errors. Cognitive modeling, 5(3):1,1988.

[7] Florian Tramer, Alexey Kurakin, Nicolas Pa-pernot, Ian J. Goodfellow, Dan Boneh, andPatrick D. McDaniel. Ensemble adversarial train-ing: Attacks and defenses. In 6th International

Conference on Learning Representations, ICLR

2018, Vancouver, BC, Canada, April 30 - May 3,

2018, Conference Track Proceedings, 2018.

3

Page 76: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Evaluation

• We implemented BDA in Rust

• The system is available at https://github.com/bda-tool/bda/

Page 77: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Evaluation

• Code and Intra-procedural Path Coverage• Program Dependence Analysis• Necessity of Posterior Analysis• Effect of Sampling• Analysis Overhead• Downstream Analysis

Page 78: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Benchmark: SPECTINT 2000 Timeout Budget: 12 hours

PROGRAM #REFERALTO

#FOUND MISS(%) #EXTRA MISTYPED(%)164.gzip 3,580 2,229,749 0.00 2,226,169 13.55175.vpr 13,042 36,840,012 0.00 36,826,970 72.45181.mcf 2,050 588,076 0.00 586,026 55.20

186.crafty 30,777 44,139,556 0.00 44,108,779 11.16197.parser 15,196 32,905,403 0.00 32,890,207 89.21252.eon 4,401 994,655 0.00 990,264 98.02

253.perlbmk 57,507 102,068,477 0.00 100,349,485 92.69254.gap 7,935 10,611,636 0.00 10,603,701 94.06

255.vortex 29,971 265,981,817 0.00 265,951,846 89.66256.bzip2 4,306 2,466,876 0.00 2,462,570 28.71300.twolf 16,710 44,735,257 0.00 44,718,440 75.42

Avg. 16.861 49,414,683 0.00 49,246,769 65.47

Evaluation: Program Dependence Analysis (Compared with ALTO)

Page 79: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

PROGRAM #REFERALTO

#FOUND MISS(%) #EXTRA MISTYPED(%)164.gzip 3,580 2,229,749 0.00 2,226,169 13.55175.vpr 13,042 36,840,012 0.00 36,826,970 72.45181.mcf 2,050 588,076 0.00 586,026 55.20

186.crafty 30,777 44,139,556 0.00 44,108,779 11.16197.parser 15,196 32,905,403 0.00 32,890,207 89.21252.eon 4,401 994,655 0.00 990,264 98.02

253.perlbmk 57,507 102,068,477 0.00 100,349,485 92.69254.gap 7,935 10,611,636 0.00 10,603,701 94.06

255.vortex 29,971 265,981,817 0.00 265,951,846 89.66256.bzip2 4,306 2,466,876 0.00 2,462,570 28.71300.twolf 16,710 44,735,257 0.00 44,718,440 75.42

Avg. 16.861 49,414,683 0.00 49,246,769 65.47

Dependence observed by reference execution

Evaluation: Program Dependence Analysis (Compared with ALTO)

Page 80: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

PROGRAM #REFERALTO

#FOUND MISS(%) #EXTRA MISTYPED(%)164.gzip 3,580 2,229,749 0.00 2,226,169 13.55175.vpr 13,042 36,840,012 0.00 36,826,970 72.45181.mcf 2,050 588,076 0.00 586,026 55.20

186.crafty 30,777 44,139,556 0.00 44,108,779 11.16197.parser 15,196 32,905,403 0.00 32,890,207 89.21252.eon 4,401 994,655 0.00 990,264 98.02

253.perlbmk 57,507 102,068,477 0.00 100,349,485 92.69254.gap 7,935 10,611,636 0.00 10,603,701 94.06

255.vortex 29,971 265,981,817 0.00 265,951,846 89.66256.bzip2 4,306 2,466,876 0.00 2,462,570 28.71300.twolf 16,710 44,735,257 0.00 44,718,440 75.42

Avg. 16.861 49,414,683 0.00 49,246,769 65.47

Dependence detected by reference execution but not by the tool

Evaluation: Program Dependence Analysis (Compared with ALTO)

Page 81: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

PROGRAM #REFERALTO

#FOUND MISS(%) #EXTRA MISTYPED(%)164.gzip 3,580 2,229,749 0.00 2,226,169 13.55175.vpr 13,042 36,840,012 0.00 36,826,970 72.45181.mcf 2,050 588,076 0.00 586,026 55.20

186.crafty 30,777 44,139,556 0.00 44,108,779 11.16197.parser 15,196 32,905,403 0.00 32,890,207 89.21252.eon 4,401 994,655 0.00 990,264 98.02

253.perlbmk 57,507 102,068,477 0.00 100,349,485 92.69254.gap 7,935 10,611,636 0.00 10,603,701 94.06

255.vortex 29,971 265,981,817 0.00 265,951,846 89.66256.bzip2 4,306 2,466,876 0.00 2,462,570 28.71300.twolf 16,710 44,735,257 0.00 44,718,440 75.42

Avg. 16.861 49,414,683 0.00 49,246,769 65.47

Dependence detected by the tool but not by reference execution

Evaluation: Program Dependence Analysis (Compared with ALTO)

Page 82: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

PROGRAM #REFERALTO

#FOUND MISS(%) #EXTRA MISTYPED(%)164.gzip 3,580 2,229,749 0.00 2,226,169 13.55175.vpr 13,042 36,840,012 0.00 36,826,970 72.45181.mcf 2,050 588,076 0.00 586,026 55.20

186.crafty 30,777 44,139,556 0.00 44,108,779 11.16197.parser 15,196 32,905,403 0.00 32,890,207 89.21252.eon 4,401 994,655 0.00 990,264 98.02

253.perlbmk 57,507 102,068,477 0.00 100,349,485 92.69254.gap 7,935 10,611,636 0.00 10,603,701 94.06

255.vortex 29,971 265,981,817 0.00 265,951,846 89.66256.bzip2 4,306 2,466,876 0.00 2,462,570 28.71300.twolf 16,710 44,735,257 0.00 44,718,440 75.42

Avg. 16.861 49,414,683 0.00 49,246,769 65.47

Dependence whose source and destination have different types

Evaluation: Program Dependence Analysis (Compared with ALTO)

The checker is implemented as an LLVM pass, propagating symbol information to individual instructions, registers and memory locations.

Page 83: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Evaluation: Program Dependence Analysis (Compared with ALTO)

PROGRAM #REFERALTO

#FOUND MISS(%) #EXTRA MISTYPED(%)164.gzip 3,580 2,229,749 0.00 2,226,169 13.55175.vpr 13,042 36,840,012 0.00 36,826,970 72.45181.mcf 2,050 588,076 0.00 586,026 55.20

186.crafty 30,777 44,139,556 0.00 44,108,779 11.16197.parser 15,196 32,905,403 0.00 32,890,207 89.21252.eon 4,401 994,655 0.00 990,264 98.02

253.perlbmk 57,507 102,068,477 0.00 100,349,485 92.69254.gap 7,935 10,611,636 0.00 10,603,701 94.06

255.vortex 29,971 265,981,817 0.00 265,951,846 89.66256.bzip2 4,306 2,466,876 0.00 2,462,570 28.71300.twolf 16,710 44,735,257 0.00 44,718,440 75.42

Avg. 16.861 49,414,683 0.00 49,246,769 65.47

ALTO reports 49M dependence, 65% of them are mis-typed , without missing any.

Page 84: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

PROGRAM #REFERBDA

#FOUND MISS(%) #EXTRA MISTYPED(%)164.gzip 3,580 29,370 0.22 25,798 11.92175.vpr 13,042 559,460 0.08 546,428 61.88181.mcf 2,050 3,347 0.00 1,297 12.94

186.crafty 30,777 1,077,346 0.15 1,046,614 7.31197.parser 15,196 659,867 0.01 644,673 81.12252.eon 4,401 28,855 0.00 24,454 78.11

253.perlbmk 57,507 5,389,973 0.23 5,363,373 82.77254.gap 7,935 205,200 0.52 197,306 74.30

255.vortex 29,971 2,159,444 0.33 2,129,473 64.10256.bzip2 4,306 13,917 0.23 9,621 10.84300.twolf 16,710 2,285,090 0.34 2,268,436 73.45

Avg. 16.861 1,128,352 0.19 1,114,316 50.80

Evaluation: Program Dependence Analysis (Compared with ALTO)

BDA reports 1M dependence (48 times smaller than ALTO’s), 50% of them are mis-typed, only with 0.19% missing rate

Page 85: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

PROGRAM #REFERBDA

#FOUND MISS(%) #EXTRA MISTYPED(%)... ... ... ... ... ...

175.vpr 13,042 559,460 0.08 546,428 61.88181.mcf 2,050 3,347 0.00 1,297 12.94

186.crafty 30,777 1,077,346 0.15 1,046,614 7.31... ... ... ... ... ...

PROGRAM #REFERVSA

#FOUND MISS(%) #EXTRA MISTYPED(%)... ... ... ... ... ...

175.vpr 13,042 TIMEOUT TIMEOUT TIMEOUT TIMEOUT181.mcf 2,050 23,068 0.00 21,018 54.33

186.crafty 30,777 TIMEOUT TIMEOUT TIMEOUT TIMEOUT... ... ... ... ... ...

BAP-VSA only handles 181.mcf within 12 hours.

Evaluation: Program Dependence Analysis (Compared with VSA)

Page 86: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

PROGRAM #REFERBDA

#FOUND MISS(%) #EXTRA MISTYPED(%)... ... ... ... ... ...

175.vpr 13,042 559,460 0.08 546,428 61.88181.mcf 2,050 3,347 0.00 1,297 12.94

186.crafty 30,777 1,077,346 0.15 1,046,614 7.31... ... ... ... ... ...

PROGRAM #REFERVSA

#FOUND MISS(%) #EXTRA MISTYPED(%)... ... ... ... ... ...

175.vpr 13,042 TIMEOUT TIMEOUT TIMEOUT TIMEOUT181.mcf 2,050 23,068 0.00 21,018 54.33

186.crafty 30,777 TIMEOUT TIMEOUT TIMEOUT TIMEOUT... ... ... ... ... ...

BAP-VSA only handles 181.mcf within 12 hours.

Evaluation: Program Dependence Analysis (Compared with VSA)

BDA reports 5 times less dependence, with less mistyped ones.

Page 87: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Evaluation: Downstream Analysis

Page 88: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

PROGRAM#INDIRECT JUMP EDGES #INDIRECT CALL EDGESIDA REFER BDA IDA REFER BDA

164.gzip 0 0 0 0 3 3175.vpr 49 0 49 0 1 1176.gcc 3,628 324 3,628 25 214 853181.mcf 0 0 0 0 1 1

186.crafty 159 38 159 0 1 1197.parser 0 0 0 0 1 1252.eon 17 0 17 0 183 215

253.perlbmk 1,454 229 1,454 24 243 261254.gap 63 5 63 2 1,438 7,836

255.vortex 247 56 247 0 24 27256.bzip2 0 0 0 0 1 1300.twolf 17 0 17 0 1 1

Avg. 470 54 470 4 176 767

Evaluation: Identify Indirect Control Flow Targets

IDA is a widely used commercial disassembling tools.

Page 89: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

PROGRAM#INDIRECT JUMP EDGES #INDIRECT CALL EDGESIDA REFER BDA IDA REFER BDA

164.gzip 0 0 0 0 3 3175.vpr 49 0 49 0 1 1176.gcc 3,628 324 3,628 25 214 853181.mcf 0 0 0 0 1 1

186.crafty 159 38 159 0 1 1197.parser 0 0 0 0 1 1252.eon 17 0 17 0 183 215

253.perlbmk 1,454 229 1,454 24 243 261254.gap 63 5 63 2 1,438 7,836

255.vortex 247 56 247 0 24 27256.bzip2 0 0 0 0 1 1300.twolf 17 0 17 0 1 1

Avg. 470 54 470 4 176 767

Evaluation: Identify Indirect Control Flow Targets

IDA is a widely used commercial disassembling tools.

Page 90: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

PROGRAM#INDIRECT JUMP EDGES #INDIRECT CALL EDGESIDA REFER BDA IDA REFER BDA

164.gzip 0 0 0 0 3 3175.vpr 49 0 49 0 1 1176.gcc 3,628 324 3,628 25 214 853181.mcf 0 0 0 0 1 1

186.crafty 159 38 159 0 1 1197.parser 0 0 0 0 1 1252.eon 17 0 17 0 183 215

253.perlbmk 1,454 229 1,454 24 243 261254.gap 63 5 63 2 1,438 7,836

255.vortex 247 56 247 0 24 27256.bzip2 0 0 0 0 1 1300.twolf 17 0 17 0 1 1

Avg. 470 54 470 4 176 767

IDA is a widely used commercial disassembling tools.

Evaluation: Identify Indirect Control Flow Targets

Page 91: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

PROGRAM#INDIRECT JUMP EDGES #INDIRECT CALL EDGESIDA REFER BDA IDA REFER BDA

164.gzip 0 0 0 0 3 3175.vpr 49 0 49 0 1 1176.gcc 3,628 324 3,628 25 214 853181.mcf 0 0 0 0 1 1

186.crafty 159 38 159 0 1 1197.parser 0 0 0 0 1 1252.eon 17 0 17 0 183 215

253.perlbmk 1,454 229 1,454 24 243 261254.gap 63 5 63 2 1,438 7,836

255.vortex 247 56 247 0 24 27256.bzip2 0 0 0 0 1 1300.twolf 17 0 17 0 1 1

Avg. 470 54 470 4 176 767

Evaluation: Identify Indirect Control Flow Targets

BDA performs as good as IDA in inferring indirect jump targets. (470 in average)

BDA reports 767 indirect call edges, without missing any observer ones .

Page 92: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Evaluation: Malware Analysis

• Malware behavior is largely defined by its system and library calls, together with parameter values.

• BDA performs static constant propagation through dependence, to identify the parameter values.

Page 93: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Evaluation: Malware Analysis

BDA reports 3times more hidden malicious behaviors than cuckoo.

Cuckoo is the state-of-the-art malware analysis tool.

MD5 OF MALWARE REPORT DATE#LIBRARY CALLCUCKOO BDA

1a0b96488c4be390ce2072735ffb0e49 2019-01-22 50 1643fb857173602653861b4d0547a49b395 2018-07-24 20 11249c178976c50cf77db3f6234efce5eeb 2019-01-23 23 485e890cb3f6cba8168d078fdede090996 2019-01-25 28 1386dc1f557eac7093ee9e5807385dbcb05 2018-12-23 20 7572afccb455faa4bc1e5f16ee67c6f915 2019-07-02 6 8174124dae8fdbb903bece57d5be31246b 2019-03-21 36 203912bca5947944fdcd09e9620d7aa8c4a 2019-10-04 20 68a664df72a34b863fc0a6e04c96866d4c 2018-12-20 23 99c38d08b904d5e1c7c798e840f1d8f1ee 2018-08-28 34 151c63cef04d931d8171d0c40b7521855e9 2019-01-23 20 81dc4db38f6d3c1e751dcf06bea072ba9c 2018-10-23 20 77

Avg. / 25 108

Page 94: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Evaluation: Malware Analysis

MD5 OF MALWARE REPORT DATE#LIBRARY CALLCUCKOO BDA

1a0b96488c4be390ce2072735ffb0e49 2019-01-22 50 1643fb857173602653861b4d0547a49b395 2018-07-24 20 11249c178976c50cf77db3f6234efce5eeb 2019-01-23 23 485e890cb3f6cba8168d078fdede090996 2019-01-25 28 1386dc1f557eac7093ee9e5807385dbcb05 2018-12-23 20 7572afccb455faa4bc1e5f16ee67c6f915 2019-07-02 6 8174124dae8fdbb903bece57d5be31246b 2019-03-21 36 203912bca5947944fdcd09e9620d7aa8c4a 2019-10-04 20 68a664df72a34b863fc0a6e04c96866d4c 2018-12-20 23 99c38d08b904d5e1c7c798e840f1d8f1ee 2018-08-28 34 151c63cef04d931d8171d0c40b7521855e9 2019-01-23 20 81dc4db38f6d3c1e751dcf06bea072ba9c 2018-10-23 20 77

Avg. / 25 108

BDA reports 3times more hidden malicious behaviors than cuckoo.

Cuckoo is the state-of-the-art malware analysis tool.

• open(“/etc/passwd”, O_RDONLY) à Read user password

• system(“/bin/sh”) à Return a remote shell

• system(“rm –rf /”) à Remove all the files on disk

Page 95: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Closely Related WorkRandom abstract interpretation

• Discovering affine equalities using random interpretation [POPL 03]

• Global value numbering using random interpretation [POPL 04]

• Precise inter-procedural analysis using random interpretation

[POPL 05]

Path encoding

• Efficient path profiling [MICRO 96]

• Precise Calling Context Encoding [ICSE 10]

Reducing the runtime complexity of path-sensitive analysis

• ESP: Path-sensitive program verification in polynomial time [PLDI 02]

• Sound, complete and scalable path-sensitive analysis [PLDI 08]

Page 96: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Conclusion

• We propose a practical program dependence analysis for binary executables• A novel unbiased whole-program path sampling

algorithm• A per-path abstract interpretation• Probabilistic guarantees in disclosing a dependence

relation

• Result• Improve the state-of-the-art, such as Value Set Analysis• Improve performance of downstream applications

Page 97: BDA: Practical Dependence Analysis for Binary Executables ... · Intro: Existing Works •The state-of-the-art: Value Set Analysis (VSA) ANGR-VSA BAP-VSA 164.gzip TIMEOUT (12 hours)

Thank you!

Q&A

BDA Repohttps://github.com/bda-tool/bda/

Email Address [email protected]