

Page 1: Formalizing Memory Consistency Models  for  Program Analysis

Formalizing Memory Consistency Models for Program Analysis

Jason Yue Yang

This work was supported in part by NSF Research Grant No. CCR-0081406 and SRC Task 1031.001.

Doctoral Dissertation Defense

Page 2: Formalizing Memory Consistency Models  for  Program Analysis

Motivation

Multithreaded software: popular, BUT hard to analyze
- Thread libraries: e.g., Pthreads, Win32, Solaris
- Language-level support of threads: e.g., Java

Memory architectures: more aggressive
- Load/store, data dependence, semaphore, memory fence, load-acquire/store-release, write atomicity

Central Problem: shared memory consistency models
- Need a clear specification of memory ordering rules
- Need an executable version of memory ordering rules
- Need a method to analyze thread executions against the rules

Page 3: Formalizing Memory Consistency Models  for  Program Analysis

What Is a Memory Model?

It defines the legal orderings of memory operations that can be perceived at the user level.

[Figure: several CPUs connected to a shared memory]

Example (Itanium assembly code, initially: a = b = 0)

Plain store/load (less restriction):
  CPU 1: st a,1; st b,1;
  CPU 2: ld r1,b; <1>  ld r2,a; <0>
  Observing 0 for a is OK.

Store-release/load-acquire (more restriction):
  CPU 1: st a,1; st.rel b,1;
  CPU 2: ld.acq r1,b; <1>  ld r2,a; <1>
  Can't observe 0 for a.

Page 4: Formalizing Memory Consistency Models  for  Program Analysis

Classical Memory Models

Sequential Consistency (SC)

Non-operational view:
1. Common total order
2. Program order
3. Read sees the "latest" write

Operational view:
Threads execute as if connected to a single memory through a non-deterministic switch.

Other weaker models: Parallel Random Access Memory (PRAM), Coherence, Causal Consistency, Processor Consistency, Release Consistency, Lazy Release Consistency, Location Consistency, and more ...

Page 5: Formalizing Memory Consistency Models  for  Program Analysis

Industrial Memory Models

Example: The Intel Itanium® Memory Model
• Intel application note contains more than 30 pages of semi-formal rules
• English + large amount of special notations
• Many non-obvious consequences
• Uses litmus tests to illustrate properties
• Cannot automatically execute litmus tests
• Relies on pencil-and-paper reasoning

Page 6: Formalizing Memory Consistency Models  for  Program Analysis

Language Level Memory Models

Example: The Java Memory Model (JMM)

• Original JMM: Chapter 17 of the Java Language Specification
• Poorly understood
• Flawed
  - too weak (may introduce security holes)
  - too strong (prevents common optimizations)
• Currently under revision (JSR-133)
  - Extensive discussions for more than 3 years
  - Several replacement proposals
  - Issues still remain

Page 7: Formalizing Memory Consistency Models  for  Program Analysis

Why Does a Memory Model Matter?

Example: Peterson's Algorithm for Mutual Exclusion

Initially, flag1 = flag2 = false, turn = 0.

Thread 1                            Thread 2
flag1 = true;                       flag2 = true;
turn = 2;                           turn = 1;
while (turn == 2 && flag2) ;        while (turn == 1 && flag1) ;
<critical section>                  <critical section>
flag1 = false;                      flag2 = false;

Can both threads enter the critical section simultaneously?

• For sequential consistency: No (the "intended behavior" is guaranteed)
• For many weaker models: Yes (the algorithm would be broken)
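The two answers above can be checked mechanically on a small model. The following is a minimal sketch, not the tool described in this talk: it explores every reachable state of the two threads, once with stores applied directly to memory (sequential consistency) and once with a per-thread FIFO store buffer. The store-buffer machine is an assumed, simplified stand-in for "a weaker model"; the names `successors` and `mutual_exclusion_holds` are hypothetical, and the `while` condition is evaluated in one atomic step for brevity.

```python
from collections import deque

FLAG1, FLAG2, TURN = 0, 1, 2  # indices into the shared-memory tuple


def read(mem, buf, var):
    """Value a thread sees: its newest buffered store, else shared memory."""
    for v, val in reversed(buf):
        if v == var:
            return val
    return mem[var]


def successors(state, store_buffers):
    mem, pcs, bufs = state
    for t in (0, 1):
        buf, pc, other = bufs[t], pcs[t], 1 - t
        if buf:  # drain the oldest buffered store to shared memory
            var, val = buf[0]
            yield (mem[:var] + (val,) + mem[var + 1:], pcs,
                   bufs[:t] + (buf[1:],) + bufs[t + 1:])
        if pc == 0:
            store = (t, True)            # flag_t = true
        elif pc == 1:
            store = (TURN, other + 1)    # turn = number of the other thread
        elif pc == 2:
            store = None                 # evaluate the while condition
            if read(mem, buf, TURN) == other + 1 and read(mem, buf, other):
                continue                 # keep spinning: no state change
        elif pc == 3:
            store = (t, False)           # leave the critical section
        else:
            continue                     # thread has finished
        new_mem, new_buf = mem, buf
        if store is not None:
            if store_buffers:
                new_buf = buf + (store,)
            else:
                var, val = store
                new_mem = mem[:var] + (val,) + mem[var + 1:]
        yield (new_mem, pcs[:t] + (pc + 1,) + pcs[t + 1:],
               bufs[:t] + (new_buf,) + bufs[t + 1:])


def mutual_exclusion_holds(store_buffers):
    start = ((False, False, 0), (0, 0), ((), ()))
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if state[1] == (3, 3):  # both threads are inside the critical section
            return False
        for nxt in successors(state, store_buffers):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


print(mutual_exclusion_holds(store_buffers=False))  # True
print(mutual_exclusion_holds(store_buffers=True))   # False
```

With store buffers, each thread can read its own buffered `turn` while the other thread's `flag` store is still invisible, which is the reordering that breaks the algorithm.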

Page 8: Formalizing Memory Consistency Models  for  Program Analysis

Do Programmers Really Care?

Another example: Double-Checked Locking for Singleton creation

    class foo {
        private static Helper helper = null;

        public static Helper get() {
            if (helper == null) {                // only use locking as needed
                synchronized (foo.class) {       // static method: lock on the class, "this" is unavailable
                    if (helper == null)          // "double-check" the reference
                        helper = new Helper();
                }
            }
            return helper;
        }
    }

Page 9: Formalizing Memory Consistency Models  for  Program Analysis

Broken Under the Current JMM

    class foo {
        private static Helper helper = null;

        public static Helper get() {
            if (helper == null) {                // only use locking as needed
                synchronized (foo.class) {       // static method: lock on the class, "this" is unavailable
                    if (helper == null)          // "double-check" the reference
                        helper = new Helper();   // reference may become visible too early
                }
            }
            return helper;
        }
    }

Problem: broken under the JMM!
- on weak architectures
- with race conditions
- the reference can be "visible" before the constructor completes

Can't guarantee Helper is fully constructed!

Page 10: Formalizing Memory Consistency Models  for  Program Analysis

Problems with Previous Approaches

For virtually all industrial weak memory models
• They don't have formal specifications

For those that do have a formal spec on paper
• They can't be executed

For those that have a machine-readable formal spec
• They use a "state machine" approach that
  - employs architecture-specific data structures
  - cannot be decomposed into orthogonal components
  - has not been verified against higher level rules

No support for verifying "programmer expectations" in multithreaded software

Page 11: Formalizing Memory Consistency Models  for  Program Analysis

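Analysis of Multithreaded Software

[Figure: design space of analyses. Axes: intra-procedural vs. inter-procedural; intra-thread vs. inter-thread; memory-model insensitive vs. memory-model sensitive. One direction is labeled "more scalable", the other "more precise". "My thesis work" is marked in the figure.]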

Page 12: Formalizing Memory Consistency Models  for  Program Analysis

Contributions

Operational Specification Method
- Operational style framework: UMM
- Applications: Java Memory Model, classical memory models
- Language level memory model issues

Axiomatic Specification Method
- Non-operational style framework: Nemos
- Applications: Intel Itanium Memory Model, classical memory models

Constraint Solving Method
- Prototype tools based on various solvers: CLP, SAT, QBF
- Incremental SAT solving; different encodings

Concurrency Analysis
- Execution validation
- Race detection
- Atomicity verification

Page 13: Formalizing Memory Consistency Models  for  Program Analysis

Operational Approach: UMM (Uniform Memory Model)

1. Supports formal verification
   - Integrates a model checker (Murphi)
   - Inspired by Park & Dill's work on Sparc

2. Employs a generic memory abstraction
   - To eliminate architecture-specific complexities
   - Uniform notation
   - A parameterized method

Page 14: Formalizing Memory Consistency Models  for  Program Analysis

UMM Abstract Machine

[Figure: Thread i and Thread j, each with its own LIB (LIB i, LIB j), above a single shared GIB]

LIB: Local Instruction Buffer
GIB: Global Instruction Buffer

- Only two layers
- GIB can grow as needed

Key insight: make it easy to configure program order and visibility order

Page 15: Formalizing Memory Consistency Models  for  Program Analysis

General Strategy in UMM

Enabling mechanism
- Program order may be relaxed to enable certain interleavings
- Controlled via the bypassing table

Filtering mechanism
- Visibility order is constructed from the GIB following proper ordering requirements
- Enforced in read selection rules

Page 16: Formalizing Memory Consistency Models  for  Program Analysis

UMM Example: Sequential Consistency

Transition Table

Event: read i ∈ LIB_t(i)
  Condition: ready(i) ∧ op(i) = Read ∧ (∃ w ∈ GIB: legalWrite(i, w))
  Action:    i.data := data(w); LIB_t(i) := delete(LIB_t(i), i);

Event: write i ∈ LIB_t(i)
  Condition: ready(i) ∧ op(i) = Write
  Action:    GIB := append(GIB, i); LIB_t(i) := delete(LIB_t(i), i);

Program order:
  ready(i) ≡ ¬∃ j ∈ LIB_t(i): pc(j) < pc(i) ∧ BYPASS[op(j)][op(i)] = No

Visibility order:
  legalWrite(r, w) ≡ op(w) = Write ∧ var(w) = var(r) ∧ ¬(∃ w' ∈ GIB: op(w') = Write ∧ var(w') = var(r) ∧ time(r) > time(w') ∧ time(w') > time(w))
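The transition table can be mimicked in a few lines. This is a hedged sketch of the idea rather than the Murphi model used in the thesis: each thread's LIB is a tuple of pending operations, the GIB is an append-only tuple of writes, `ready` consults a bypass table, and a read takes the most recent write to its variable in the GIB. The function name `outcomes`, the tuple encoding of operations, and the default value 0 for a variable with no write in the GIB are assumptions made for illustration.

```python
def outcomes(threads, bypass):
    """Enumerate final register values of a UMM-style two-layer machine.

    threads: one list of ops per thread, in program order; an op is
             ("W", var, value) or ("R", var, register).
    bypass:  bypass[(earlier_kind, later_kind)] is True when the later op
             may be issued ahead of the earlier one (the BYPASS table).
    """
    results = set()

    def explore(libs, gib, regs):
        if not any(libs):  # every LIB is empty: execution finished
            results.add(tuple(sorted(regs.items())))
            return
        for t, lib in enumerate(libs):
            for k, op in enumerate(lib):
                # ready(i): no earlier op still in the LIB forbids bypassing
                if not all(bypass[(lib[j][0], op[0])] for j in range(k)):
                    continue
                rest = libs[:t] + (lib[:k] + lib[k + 1:],) + libs[t + 1:]
                if op[0] == "W":
                    explore(rest, gib + ((op[1], op[2]),), regs)
                else:
                    # legalWrite: latest write to the same variable in the GIB
                    value = next((val for var, val in reversed(gib)
                                  if var == op[1]), 0)
                    explore(rest, gib, {**regs, op[2]: value})

    explore(tuple(tuple(ops) for ops in threads), (), {})
    return results


# Sequential consistency: nothing may bypass anything.
SC_TABLE = {(a, b): False for a in "RW" for b in "RW"}
# A hypothetical relaxation: a later write may bypass an earlier write.
RELAXED_WW = {**SC_TABLE, ("W", "W"): True}

message_passing = [
    [("W", "a", 1), ("W", "b", 1)],
    [("R", "b", "r1"), ("R", "a", "r2")],
]
stale = (("r1", 1), ("r2", 0))
print(stale in outcomes(message_passing, SC_TABLE))    # False
print(stale in outcomes(message_passing, RELAXED_WW))  # True
```

Changing one table entry is enough to admit the outcome r1 = 1, r2 = 0, which illustrates why the bypass table is a convenient place to configure program order.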

Page 17: Formalizing Memory Consistency Models  for  Program Analysis

Non-Operational Approach: Nemos
(Non-operational yet Executable Memory Ordering Specifications)

Desired Features                Solutions
Easy to understand, flexible    Declarative (axiomatic)
Precise                         Predicate logic
Compositional, modular          "Higher order" logic
Executable                      Make "hidden" rules explicit

Key insights
(1) Make the rules higher order: pass down the order relation through all the rules
    - Compositional, reusable, scalable, easy to compare
(2) Make all rules explicit
    - Executable using a constraint-programming system

Page 18: Formalizing Memory Consistency Models  for  Program Analysis

Nemos Example: Sequential Consistency

Formal Definition of SC (ops is the execution; order is the ordering relation)

legal ops order ≡
    requireProgramOrder ops order ∧        (program order)
    requireReadValue ops order ∧           (read sees "latest" write)
    requireWeakTotalOrder ops order ∧      (common total order)
    requireTransitiveOrder ops order ∧     (common total order)
    requireAsymmetricOrder ops order       (common total order)

requireProgramOrder ops order ≡ ∀ i, j ∈ ops. ((t i = t j ∧ pc i < pc j) ∨ (t i = t_init ∧ t j ≠ t_init)) ⇒ order i j

requireTransitiveOrder ops order ≡ ∀ i, j, k ∈ ops. (order i j ∧ order j k) ⇒ order i k

- order is repeatedly refined
- Hidden rules are explicit
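Because every rule takes the same `ops` and `order` arguments, the definition translates almost line by line into executable predicates. The sketch below is an illustration in Python, not the Nemos implementation: `order` is passed as a function, `legal` is the conjunction of the rules, and `validate_execution` searches for a witness order by brute force over all permutations (feasible only for tiny tests). The `Op` record, the constant `T_INIT`, and the exact reading of `requireWeakTotalOrder` and `requireReadValue` are assumptions.

```python
from itertools import permutations
from typing import NamedTuple

T_INIT = 0  # assumed id of the pseudo-thread that writes initial values


class Op(NamedTuple):
    t: int      # issuing thread
    pc: int     # program counter within the thread
    kind: str   # "W" or "R"
    var: str
    data: int   # value written, or value the read is required to return


def require_program_order(ops, order):
    return all(order(i, j) for i in ops for j in ops
               if (i.t == j.t and i.pc < j.pc)
               or (i.t == T_INIT and j.t != T_INIT))


def require_transitive_order(ops, order):
    return all(order(i, k) for i in ops for j in ops for k in ops
               if order(i, j) and order(j, k))


def require_asymmetric_order(ops, order):
    return not any(order(i, j) and order(j, i) for i in ops for j in ops)


def require_weak_total_order(ops, order):
    # Assumed reading: any two distinct operations are ordered one way or the other.
    return all(order(i, j) or order(j, i) for i in ops for j in ops if i != j)


def require_read_value(ops, order):
    def sees(r, w):  # w is the latest write to r.var that is ordered before r
        return (w.kind == "W" and w.var == r.var and w.data == r.data
                and order(w, r)
                and not any(x.kind == "W" and x.var == r.var
                            and order(w, x) and order(x, r) for x in ops))
    return all(any(sees(r, w) for w in ops) for r in ops if r.kind == "R")


RULES = [require_program_order, require_read_value, require_weak_total_order,
         require_transitive_order, require_asymmetric_order]


def legal(ops, order):
    return all(rule(ops, order) for rule in RULES)


def validate_execution(ops):
    """Brute-force search for an order that makes the execution legal."""
    for perm in permutations(ops):
        pos = {op: n for n, op in enumerate(perm)}
        if legal(ops, lambda i, j: pos[i] < pos[j]):
            return True
    return False


def message_passing(r1, r2):
    return [Op(T_INIT, 0, "W", "a", 0), Op(T_INIT, 1, "W", "b", 0),
            Op(1, 0, "W", "a", 1), Op(1, 1, "W", "b", 1),
            Op(2, 0, "R", "b", r1), Op(2, 1, "R", "a", r2)]


print(validate_execution(message_passing(1, 1)))  # True
print(validate_execution(message_passing(1, 0)))  # False
```

Swapping a rule in `RULES` changes the memory model without touching the search, which is the compositionality the slide refers to.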

Page 19: Formalizing Memory Consistency Models  for  Program Analysis

The Itanium Memory Ordering Rules

legal ops order ≡
    requireLinearOrder ops order ∧
    requireWriteOperationOrder ops order ∧
    requirePO ops order ∧
    requireMemoryDataDependence ops order ∧
    requireDataFlowDependence ops order ∧
    requireCoherence ops order ∧
    requireReadValue ops order ∧
    requireAtomicWBRelease ops order ∧
    requireNoUCBypass ops order

Page 20: Formalizing Memory Consistency Models  for  Program Analysis

Specification Hierarchy for Itanium

- requireLinearOrder
  • Irreflexive
  • Transitive
  • Total
  • Asymmetric
- requireWriteOperationOrder
  • Local/Remote case
  • Remote/Remote case
- requireProgramOrder
  • Acquire Rule
  • Release Rule
  • Fence Rule
- requireMemoryDataDependence
  • MD:RAW
  • MD:WAR
  • MD:WAW
- requireDataFlowDependence
  • DF:RAW
  • DF:WAR
  • DF:WAW
- requireCoherence
  • Local/Local case
  • Remote/Remote case
- requireReadValue
  • ValidWr
    - ValidLocalWr
    - ValidRemoteWr
    - ValidDefaultWr
  • ValidRd
- requireAtomicWBRelease
- requireSequentialUC
  • RAR Rule
  • RAW Rule
  • WAR Rule
  • WAW Rule
- requireNoUCBypass

Page 21: Formalizing Memory Consistency Models  for  Program Analysis

How to Make an Axiomatic Specification Executable?

Execution Validation:

validateExecution ops ≡ ∃ order. legal ops order

[Figure: the test program and the memory model specification are turned into constraints; a solver (CLP, SAT, or QBF) answers SAT or UNSAT]

- Effective for revealing critical properties
- Effective for verifying common programming patterns

Page 22: Formalizing Memory Consistency Models  for  Program Analysis

Using Constraint Logic Programming (CLP)

• Implementation in FD-Prolog is straightforward
• Universal quantification handled via enumeration
• Existential quantification handled via backtracking
• Built-in constraint solver from FD-Prolog:
  - Logical variables
  - Finite-domain (FD) variables

Page 23: Formalizing Memory Consistency Models  for  Program Analysis

How to Encode the Ordering Relation?

n×n Encoding:

Given a test program with N operations, use a 2D precedence matrix M with N² constraint variables.

Values of entry Mij (row i, column j):
  1: i is ordered before j
  0: i is not ordered before j
  x: value not bound yet

Precedence matrix M, initially:

  x x x x x x
  x x x x x x
  x x x x x x
  x x x x x x
  x x x x x x
  x x x x x x

The Method:
- Interpret the symbolic execution, imposing constraints on the 2D matrix
- When an x changes to a 1, an attempt to set it to 0 later triggers backtracking
- When interpretation finishes, the remaining x values reveal the latitude in the weak order
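The three-valued matrix can be sketched directly. The following is an illustrative Python analogue, not the FD-Prolog code: `None` stands for an unbound x, `order_before` binds an entry to 1 and propagates asymmetry and transitivity, and a `Conflict` exception plays the role of the failure that would trigger backtracking in a CLP system. All names here are hypothetical.

```python
class Conflict(Exception):
    """Raised when a constraint contradicts an entry that is already bound."""


def new_matrix(n):
    # None plays the role of 'x' (not bound yet); no op precedes itself.
    return [[0 if i == j else None for j in range(n)] for i in range(n)]


def bind(m, i, j, value):
    """Bind M[i][j]; return True if it was unbound, raise on contradiction."""
    if m[i][j] is None:
        m[i][j] = value
        return True
    if m[i][j] != value:
        raise Conflict(f"M[{i}][{j}] is already {m[i][j]}")
    return False


def order_before(m, i, j):
    """Impose 'i is ordered before j', then propagate its consequences."""
    work = [(i, j)]
    while work:
        a, b = work.pop()
        if not bind(m, a, b, 1):
            continue                 # already known, nothing new to propagate
        bind(m, b, a, 0)             # asymmetry
        for c in range(len(m)):
            if m[b][c] == 1:         # a < b and b < c  =>  a < c
                work.append((a, c))
            if m[c][a] == 1:         # c < a and a < b  =>  c < b
                work.append((c, b))


m = new_matrix(3)
order_before(m, 0, 1)
order_before(m, 1, 2)
print(m[0][2], m[2][0], m[1][0])  # 1 0 0
```

A later call such as `order_before(m, 2, 0)` raises `Conflict`, because the entry is already bound to 0. A real solver would undo the partial bindings on failure; in this sketch the caller would need to work on a copy of the matrix.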

Page 24: Formalizing Memory Consistency Models  for  Program Analysis

Example of Prolog Implementation

Formal Specification (e.g., requireProgramOrder)

requireProgramOrder ops order ≡ ∀ i, j ∈ ops. ((t i = t j ∧ pc i < pc j) ∨ (t i = t_init ∧ t j ≠ t_init)) ⇒ order i j

SICStus Prolog Code

    % Apply doProgramOrder to every pair (I, J) of operations.
    requireProgramOrder(Ops,Order):-
        for_each_elem(Ops,Order,doProgramOrder).

    elem_prog(doProgramOrder,Ops,Order,I,J):-
        nth(I,Ops,Oi), nth(J,Ops,Oj),
        p(Oi,T_i), p(Oj,T_j),          % issuing thread of each operation
        pc(Oi,PC_i), pc(Oj,PC_j),      % program counters
        length(Ops,N), matrix_elem(Order,N,I,J,Oij),
        % thread 0 plays the role of t_init
        ((T_i #= T_j #/\ PC_i #< PC_j) #\/ (T_i #= 0 #/\ T_j #\= 0)) #=> Oij.

Page 25: Formalizing Memory Consistency Models  for  Program Analysis

Interactive and Incremental Analysis

Initially, a = b = 0.

P1: st a,1; st b,1;
P2: ld r1,b; <1>  ld r2,a; <0>

Can r1 = 1 and r2 = 0?

Itanium Test Program Execution (ops)

P1                          P2
(1) st_local(a,1);          (7) ld(1,b);
(2) st_remote1(a,1);        (8) ld(0,a);
(3) st_remote2(a,1);
(4) st_local(b,1);
(5) st_remote1(b,1);
(6) st_remote2(b,1);

Order satisfying all constraints:

     1 2 3 4 5 6 7 8
  1  0 1 1 x x x x x
  2  0 0 1 x x x x x
  3  0 0 0 x x x x 0
  4  x x x 0 1 1 1 x
  5  x x x 0 0 1 1 x
  6  x x x 0 0 0 1 x
  7  x x x 0 0 0 0 x
  8  x x 1 x x x x 0

Result: legal

An instantiated Order:

     1 2 3 4 5 6 7 8
  1  0 1 1 0 0 0 0 0
  2  0 0 1 0 0 0 0 0
  3  0 0 0 0 0 0 0 0
  4  1 1 1 0 1 1 1 0
  5  1 1 1 0 0 1 1 0
  6  1 1 1 0 0 0 1 0
  7  1 1 1 0 0 0 0 0
  8  1 1 1 1 1 1 1 0

Interleaving: 8 4 5 6 7 1 2 3
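The interleaving can be read off the instantiated matrix mechanically: in a total order, an operation that precedes more operations comes earlier. The sketch below restates the instantiated Order from this slide as eight rows of eight entries (operation i is row i) and sorts the operations by row sum; the helper name `interleaving` is hypothetical.

```python
# Instantiated Order from the slide: ORDER[i-1][j-1] == 1 means op i precedes op j.
ORDER = [
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
]


def interleaving(matrix):
    """Sort operations so that those preceding the most others come first."""
    ops = range(1, len(matrix) + 1)
    return sorted(ops, key=lambda op: -sum(matrix[op - 1]))


print(interleaving(ORDER))  # [8, 4, 5, 6, 7, 1, 2, 3]
```

The result matches the interleaving shown on the slide: the load of a (op 8) comes first and observes 0, and the load of b (op 7) follows the stores to b and observes 1.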

Page 26: Formalizing Memory Consistency Models  for  Program Analysis

The SAT/QBF Approach

Initially, we retrofitted our Prolog version with SAT-generating code
- Showed speed improvement in constraint solving, BUT ...
- Still slow in CNF generation
- Very difficult to debug

So we re-engineered our tool (done by Prof. Ganesh Gopalakrishnan):
- "Stamping out" a finite execution as a QBF formula
- "Stamping out" a finite execution as a CNF formula
- Experimenting with different encoding methods: n×n vs. n log n
- Checkpointing SAT generation

Page 27: Formalizing Memory Consistency Models  for  Program Analysis

Gist of Results

1. SAT seems to be better than QBF
2. The n×n encoding method is better than n log n
   - despite using more bits
   - many unit clauses, good for SAT solving
3. The checkpointing method does pay off, up to 64 tuples
4. We can easily handle 128 operations
5. Latest result: completed Intel-provided test run (experiment done by Hemanthkumar Sivaraj)
   - test contains 500 Itanium memory operations
   - had to suppress the total-order constraint, UNSAT
   - takes 10 sec to generate the SAT instance; 0.1 sec to solve
   - still lots of room for improvement

Page 28: Formalizing Memory Consistency Models  for  Program Analysis

How to Verify Programmer Expectations?

(1) Define both intra-thread and inter-thread semantics as constraints
    - Program semantics + memory model semantics
(2) Model correctness properties as additional constraints
    - Program properties, e.g., race / atomicity
(3) Reduce a verification problem to a constraint satisfaction problem and solve it automatically

[Figure: the test program is turned into constraints; a solver answers SAT or UNSAT]

Page 29: Formalizing Memory Consistency Models  for  Program Analysis

Race Detection

What's a data-race? Informally: conflicting and concurrent accesses

Initially, a = b = 0.

Thread 1                  Thread 2
r1 = a;                   r2 = b;
if (r1 > 0) b = 1;        if (r2 > 0) a = 1;

Is this program race-free?
[Callout on two of the instructions: Are these two instructions conflicting and concurrent?]

• Control flow is interwoven with memory consistency requirements
• Hence, the answer depends on the memory model
  - Under SC, this program is race-free
  - Under a weaker model, this program might contain races

Page 30: Formalizing Memory Consistency Models  for  Program Analysis

Constraints for Control Flow

• Treat control operations similarly to memory operations
  - Imagine "assigns" and "uses" of "control variables"
• Add an auxiliary control variable ck for each branch statement k, and convert the if-statement to an auxiliary assign of ck
  - E.g., if(r1>0) becomes c1 = r1>0
• Every op k has a path predicate ctrExpr
  - k is a use of those control variables in ctrExpr
• k is feasible if ctrExpr evaluates to true
• Feasibility of ops is checked when setting the rules

Page 31: Formalizing Memory Consistency Models  for  Program Analysis

Data and Control Dependence

Data/control flow can be treated similarly to the global read value rule, i.e., a read should see the "latest" write

Global Reads: for all r = x, there exists an x = ...
Local Reads: for all x = r, there exists an r = ...
Control Reads: for all ops that depend on c, there exists a c = ...

requireReadValue ops order ≡
    globalReadValue ops order ∧
    localReadValue ops order ∧
    controlReadValue ops order

Page 32: Formalizing Memory Consistency Models  for  Program Analysis

How to Formalize Data-Race?

detectDataRace ops ≡ ∃ scOrder, hbOrder.
    legalSC ops scOrder ∧
    requireHbOrder ops hbOrder ∧
    mapConstraints ops hbOrder scOrder ∧
    existDataRace ops hbOrder

requireHbOrder ops hbOrder ≡
    requireProgramOrder ops hbOrder ∧
    requireSyncOrder ops hbOrder ∧
    requireTransitiveOrder ops hbOrder

existDataRace ops hbOrder ≡ ∃ i, j ∈ ops.
    conflictingAccess i j ∧ ¬(hbOrder i j) ∧ ¬(hbOrder j i)
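A small executable analogue of this definition, applied to the guarded program from the race detection example, is sketched below. It is an illustration under simplifying assumptions, not the constraint-based tool: SC executions are enumerated explicitly instead of solved for, there are no synchronization operations (so happens-before reduces to program order), and the instruction encoding and function names are hypothetical.

```python
def sc_executions(threads):
    """Yield the memory accesses of every sequentially consistent execution.

    An instruction is ("load", reg, var) or ("store", var, value, guard);
    a store with a guard register runs only if that register is > 0.
    """
    def run(pcs, mem, regs, trace):
        done = True
        for t, prog in enumerate(threads):
            if pcs[t] == len(prog):
                continue
            done = False
            ins = prog[pcs[t]]
            new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if ins[0] == "load":
                _, reg, var = ins
                yield from run(new_pcs, mem,
                               {**regs, (t, reg): mem.get(var, 0)},
                               trace + [(t, "R", var)])
            else:
                _, var, value, guard = ins
                if guard is None or regs[(t, guard)] > 0:
                    yield from run(new_pcs, {**mem, var: value}, regs,
                                   trace + [(t, "W", var)])
                else:  # infeasible store: skipped, no memory access
                    yield from run(new_pcs, mem, regs, trace)
        if done:
            yield trace

    yield from run((0,) * len(threads), {}, {}, [])


def happens_before(trace):
    """Program order only (no sync ops here); already transitively closed."""
    return {(i, j) for i in range(len(trace)) for j in range(i + 1, len(trace))
            if trace[i][0] == trace[j][0]}


def exist_data_race(trace):
    hb = happens_before(trace)
    return any(a[2] == b[2] and "W" in (a[1], b[1])         # conflictingAccess
               and (i, j) not in hb and (j, i) not in hb    # unordered by hb
               for i, a in enumerate(trace)
               for j, b in enumerate(trace) if i != j)


def detect_data_race(threads):
    return any(exist_data_race(trace) for trace in sc_executions(threads))


guarded = [[("load", "r1", "a"), ("store", "b", 1, "r1")],
           [("load", "r2", "b"), ("store", "a", 1, "r2")]]
unguarded = [[("load", "r1", "a"), ("store", "b", 1, None)],
             [("load", "r2", "b"), ("store", "a", 1, None)]]
print(detect_data_race(guarded))    # False
print(detect_data_race(unguarded))  # True
```

Under SC neither guarded store is ever feasible, so no conflicting accesses occur; removing the guards makes the stores unconditional and the race appears.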

Page 33: Formalizing Memory Consistency Models  for  Program Analysis

Atomicity Verification

What's Atomicity?
- Informally: a block of code executed atomically
- Neither a necessary nor a sufficient condition for race-freedom

Our approach:
- Annotate the atomic block with AtomicEnter and AtomicExit
- Verify it automatically
- Our definition is generic, can be fine-tuned

Page 34: Formalizing Memory Consistency Models  for  Program Analysis

Constraints for Atomicity

verifyAtomicity ops ≡ ∃ order.
    legalSC ops order ∧ existsAtomicityViolation ops order

existsAtomicityViolation ops order ≡ ∃ i, j, k ∈ ops.
    matchedAtomicPair i j ∧ (t k ≠ t i) ∧ ¬(order k i) ∧ ¬(order j k)
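The violation predicate can be evaluated directly on a given schedule. The sketch below is illustrative only: operations are `(thread, label)` pairs, `order` is a predicate over operation indices built from a total order, and `matched_atomic_pair` assumes at most one atomic block per thread. These encodings are assumptions, not the thesis's definitions.

```python
def matched_atomic_pair(ops, i, j):
    # Assumption for this sketch: at most one atomic block per thread.
    return (ops[i][0] == ops[j][0]
            and ops[i][1] == "AtomicEnter" and ops[j][1] == "AtomicExit")


def exists_atomicity_violation(ops, order):
    """True if an op of another thread is ordered inside an atomic block."""
    idx = range(len(ops))
    return any(matched_atomic_pair(ops, i, j)
               and ops[k][0] != ops[i][0]
               and not order(k, i) and not order(j, k)
               for i in idx for j in idx for k in idx)


def total_order(schedule):
    """Turn a schedule (list of op indices) into an order(i, j) predicate."""
    pos = {op: p for p, op in enumerate(schedule)}
    return lambda i, j: pos[i] < pos[j]


ops = [(1, "AtomicEnter"), (1, "read x"), (1, "write x"), (1, "AtomicExit"),
       (2, "write x")]

print(exists_atomicity_violation(ops, total_order([0, 1, 4, 2, 3])))  # True
print(exists_atomicity_violation(ops, total_order([0, 1, 2, 3, 4])))  # False
```

As written, any operation from another thread that falls between AtomicEnter and AtomicExit counts as a violation; a finer-grained definition could restrict k to conflicting accesses.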

Page 35: Formalizing Memory Consistency Models  for  Program Analysis

Conclusion

My thesis addressed the following issues
- How to make memory ordering rules clear and executable?
- How to analyze thread executions against these rules?

Our methods have been shown to be practical
- A wide range of academic memory models as well as real-world models (Itanium, JMM)
- Validation of test cases far exceeded others' both in speed and scale
- Being applied for post-silicon verification in industry

Many "customers" can benefit from our methods
- Software developers, compiler writers, system designers

Page 36: Formalizing Memory Consistency Models  for  Program Analysis

Publications

Operational Specification Method
• Analyzing the CRF Java Memory Model (APSEC'01)
• Specifying Java Thread Semantics Using a Uniform Memory Model (JGI'02)
• UMM: An Operational Memory Model Specification Framework with Integrated Model Checking Capability (CCPE)

Axiomatic Specification Method
• Analyzing the Intel Itanium Memory Ordering Rules Using Logic Programming and SAT (CHARME'03)
• Nemos: A Framework for Axiomatic and Executable Specifications of Memory Consistency Models (IPDPS'04)
• A Constraint-Based Approach for Specifying Memory Consistency Models (sent to TPLP)

Constraint Solving Method
• QB or not QB: An Efficient Execution Verification Tool for Memory Orderings (sent to CAV)

Concurrency Analysis
• Rigorous Concurrency Analysis of Multithreaded Programs (sent to ISSTA)

Page 37: Formalizing Memory Consistency Models  for  Program Analysis

Continuing Research Opportunities

Scale up our approach even further
- Give up certain precision
- Compositional methods
- Create an assertion language to help abstraction

Improve solving algorithms
- Exploit the structural information

"Memory-model-sensitive" compilers
- Code synthesis, optimization

Other application domains
- Security, embedded systems

Page 38: Formalizing Memory Consistency Models  for  Program Analysis

Thank You!

The dissertation is available at
http://www.cs.utah.edu/~yyang/papers/thesis.pdf

The prototype tools are available at
http://www.cs.utah.edu/~yyang/research.html