Writing and Reasoning about Parallel Programs in Coq
NII Lectures Series
Frederic Loulergue
Université d'Orléans – LIFO – PaMDA Team
October-November 2013
F. Loulergue SyDPaCC – Lecture 3 October-November 2013 1 / 42
Outline
1 Motivations
2 BSML: Bulk Synchronous Parallel ML
    The BSP Model
    Overview of the BSML Language
    Classic BSML Primitives
    Revised Bulk Synchronous Parallel ML
3 BSML in Coq
    Shallow Embedding of BSML in Coq
    BSML Programming and Reasoning in Coq
    Extraction
4 Summary
Our Goal
To ease the development of correct and verifiable parallel programs with predictable performance.
We should address:
- the easy development of correct and verifiable programs
- the easy development of parallel programs
- the easy development of parallel programs with predictable performance
Easy Development of Correct and Verifiable Programs
- high-level languages: expressive, modular, less error-prone
- high-level languages have simpler semantics, and could have a complete formal semantics (e.g. Standard ML, ISO Prolog)
- therefore verification of programs is possible and easier
⇒ a high-level parallel language with formal semantics
The Easy Development of Parallel Programs with Predictable Performance
- assumption: the goal is to program functions
- issues: non-determinism, deadlocks, difficulty of reading programs, complex semantics and verification, portability . . .
- it is also very important for the programmer to be able to reason about the performance of the programs
⇒ a structured parallel model which allows the design of portable parallel algorithms with a simple cost model
The Bulk Synchronous Parallel ML Approach
Choices
- an efficient functional programming language with formal semantics and easy reasoning about the performance of programs (strict evaluation): ML (Objective Caml flavor)
- a restricted model of parallelism with no deadlock, very limited cases of non-determinism, a simple cost model: Bulk Synchronous Parallelism
The result is: Bulk Synchronous Parallel ML (BSML)
Bulk Synchronous Parallelism (BSP)
Research on BSP: 1990s, by Valiant (Cambridge) and McColl (Oxford)
Three models
- abstract architecture
- execution model
- cost model
BSP Computer
- p processor/memory pairs (of speed r)
- a communication network (of speed g)
- a global synchronisation unit (of speed L)
Bulk Synchronous Parallelism
Execution model
Cost model
\[ T(s) = \max_{0 \le i < p} w_i + h \times g + L \]
where \( h = \max_{0 \le i < p} \{ h_i^+, h_i^- \} \)
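As an illustration of the cost model, the following sketch evaluates the cost of a single superstep from per-processor measurements. The record type, its field names and the function `superstep_cost` are hypothetical names chosen for this example; they are not part of BSML.

```ocaml
(* Measurements for one superstep; the record and field names are illustrative. *)
type superstep = {
  work : float array;   (* w_i: local computation time of processor i *)
  sent : int array;     (* h_i^+: words sent by processor i *)
  received : int array; (* h_i^-: words received by processor i *)
}

(* T(s) = max_i w_i + h * g + L, with h = max_i (max h_i^+ h_i^-). *)
let superstep_cost ~g ~l s =
  let w = Array.fold_left max 0.0 s.work in
  let h_plus = Array.fold_left max 0 s.sent in
  let h_minus = Array.fold_left max 0 s.received in
  let h = max h_plus h_minus in
  w +. (float_of_int h *. g) +. l

(* Three processors, g = 2 and L = 10: 5 + 2 * 2 + 10. *)
let example =
  superstep_cost ~g:2.0 ~l:10.0
    { work = [| 3.0; 5.0; 4.0 |]; sent = [| 2; 0; 1 |]; received = [| 1; 1; 1 |] }
```

The slowest processor and the most loaded communication endpoint determine the cost, which is why balanced work and balanced message sizes matter in BSP algorithms.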
Bulk Synchronous Parallel ML
Design principles
- Small set of parallel primitives
- Universal for bulk synchronous parallelism
- Global view of programs
- Simple semantics
BSML
a sequential functional language
+ a parallel data structure
+ parallel operations on this data structure
A Parallel Data Structure
Parallel Vectors
- An abstract polymorphic datatype: 'a par
- Fixed size p: each processor has a value of type 'a
- no nesting allowed
⇒ Direct mapping eases reasoning about performance
Notation: \( \langle v_0, \ldots, v_{p-1} \rangle \)
Classic BSML Primitives
- Access to the BSP parameters:
    bsp_p: int
    bsp_r: float
    bsp_g: float
    bsp_l: float
  ⇒ Programs with performance portability
- Manipulation of parallel vectors:
    mkpar: (int -> 'a) -> 'a par
    proj: 'a par -> (int -> 'a)
    apply: ('a -> 'b) par -> 'a par -> 'b par
    put: (int -> 'a) par -> (int -> 'a) par
Creation of parallel vectors
Signature
mkpar: (int -> 'a) -> 'a par
Informal semantics
mkpar f = \( \langle f\ 0, f\ 1, \ldots, f\ (p-1) \rangle \)
Examples
# let this = mkpar(fun pid -> pid);;
val this : int par = <0, 1, 2, 3, 4, 5, 6, 7>
# let plusMinus = mkpar(fun pid ->
    if pid mod 2 = 0
    then fun x -> x+1
    else fun x -> x-1);;
val plusMinus : (int -> int) par =
  <<fun>, <fun>, <fun>, <fun>, <fun>, <fun>, <fun>, <fun>>
BSP Cost
\( \max_{0 \le i < p} \|f\ i\| \), where \( \|e\| \) is the time required to evaluate e
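The informal semantics of mkpar can be mimicked sequentially. The sketch below assumes a purely illustrative model in which a parallel vector is an OCaml array of length p, with p fixed to 8 as in the session above; the real BSML library keeps 'a par abstract and distributes the components over the processors.

```ocaml
(* Assumed number of processors, matching the 8-processor session above. *)
let bsp_p = 8

(* Sequential stand-in for BSML's abstract type: component i lives at index i. *)
type 'a par = 'a array

(* mkpar f = < f 0, f 1, ..., f (p-1) > *)
let mkpar (f : int -> 'a) : 'a par = Array.init bsp_p f

(* The vector of processor identifiers. *)
let this = mkpar (fun pid -> pid)

(* A vector of functions: increment on even processors, decrement on odd ones. *)
let plus_minus =
  mkpar (fun pid -> if pid mod 2 = 0 then fun x -> x + 1 else fun x -> x - 1)
```

In this model `this` is the array of the integers 0 to 7, which corresponds to the vector printed by the BSML toplevel.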
Projection
Signature
proj: 'a par -> (int -> 'a)
Informal semantics
proj \( \langle v_0, \ldots, v_{p-1} \rangle \) = function 0 -> \( v_0 \) | ... | p-1 -> \( v_{p-1} \)
Remark
- Should not be evaluated in the context of a mkpar
- Returned function is partial: proj (mkpar f) ≠ f
Example
# proj (mkpar string_of_int) 2;;
- : string = "2"
BSP Cost
\( \max_{0 \le i < p} \{ |v_i|, \sum_{j \ne i} |v_j| \} \times g + L \), where \( |e| \) is the size of the value of e
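The partiality remark can be observed in a sequential model. The sketch below again assumes that a parallel vector is an array of length p (an assumption of this example, not the BSML implementation): the function returned by proj is only defined on 0, ..., p-1, even when the function given to mkpar is total.

```ocaml
let bsp_p = 8                      (* assumed number of processors *)
type 'a par = 'a array             (* illustrative sequential model *)

let mkpar (f : int -> 'a) : 'a par = Array.init bsp_p f

(* proj < v_0, ..., v_(p-1) > maps i to v_i, for 0 <= i < p only. *)
let proj (v : 'a par) : int -> 'a = fun pid -> v.(pid)

let two = proj (mkpar string_of_int) 2

(* string_of_int accepts bsp_p, but the projected function does not:
   in this model the array access raises Invalid_argument. *)
let defined_outside_range =
  try
    ignore (proj (mkpar string_of_int) bsp_p);
    true
  with Invalid_argument _ -> false
```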
Point-wise parallel application
Signature
apply: ('a -> 'b) par -> 'a par -> 'b par
Informal semantics
apply \( \langle f_0, \ldots, f_{p-1} \rangle \) \( \langle v_0, \ldots, v_{p-1} \rangle \) = \( \langle f_0\ v_0, \ldots, f_{p-1}\ v_{p-1} \rangle \)
Example
# let v = apply plusMinus this;;
val v : int par = <1, 0, 3, 2, 5, 4, 7, 6>
BSP Cost
\( \max_{0 \le i < p} \|f_i\ v_i\| \)
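The point-wise application can be replayed in a sequential model where a parallel vector is an array of length p (an assumption of this sketch, with p = 8 as in the session above). The helper parfun, which applies the same function on every processor, is derived from mkpar and apply.

```ocaml
let bsp_p = 8                      (* assumed number of processors *)
type 'a par = 'a array             (* illustrative sequential model *)

let mkpar (f : int -> 'a) : 'a par = Array.init bsp_p f

(* apply < f_0, ..., f_(p-1) > < v_0, ..., v_(p-1) > = < f_0 v_0, ..., f_(p-1) v_(p-1) > *)
let apply (vf : ('a -> 'b) par) (vv : 'a par) : 'b par =
  Array.mapi (fun i f -> f vv.(i)) vf

(* Same function everywhere: replicate it, then apply point-wise. *)
let parfun f v = apply (mkpar (fun _ -> f)) v

let this = mkpar (fun pid -> pid)

let plus_minus =
  mkpar (fun pid -> if pid mod 2 = 0 then fun x -> x + 1 else fun x -> x - 1)

let v = apply plus_minus this
let doubled = parfun (fun x -> 2 * x) this
```

No communication is involved: component i of the result only depends on component i of each argument, which is why the cost contains no g or L term.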
Communication I
Signature
put: (int -> 'a) par -> (int -> 'a) par
Informal semantics
put \( \langle f_0, \ldots, f_{p-1} \rangle = \langle g_0, \ldots, g_{p-1} \rangle \) with \( g_j \equiv \) fun src -> \( f_{src}\ j \)
Remark
- function \( f_i \) encodes the p messages to be sent from processor i: \( (f_i\ j) \) is the message to be sent from i to j
- function \( g_j \) encodes the p messages received by processor j: \( (g_j\ i) \) is the message received by j from i
Communication II
Example
apply (mkpar(fun _ -> fun f -> List.map f Stdlib.Base.procs))
      (put(mkpar(fun pid dst ->
             if dst = (pid+1) mod bsp_p
             then Some pid
             else None)));;
- : int option list par =
  < [None; None; None; None; None; None; None; Some 7],
    [Some 0; None; None; None; None; None; None; None],
    [None; Some 1; None; None; None; None; None; None],
    [None; None; Some 2; None; None; None; None; None],
    [None; None; None; Some 3; None; None; None; None],
    [None; None; None; None; Some 4; None; None; None],
    [None; None; None; None; None; Some 5; None; None],
    [None; None; None; None; None; None; Some 6; None] >
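The shift above can be reproduced without a parallel machine. The sketch below assumes a sequential model where a parallel vector is an array of length 8; the list procs stands in for the list of processor identifiers that the BSML standard library provides. It illustrates the semantics of put only, not its implementation.

```ocaml
let bsp_p = 8                      (* assumed number of processors *)
type 'a par = 'a array             (* illustrative sequential model *)

let mkpar (f : int -> 'a) : 'a par = Array.init bsp_p f

(* put < f_0, ..., f_(p-1) > = < g_0, ..., g_(p-1) > with g_j = fun src -> f_src j *)
let put (vf : (int -> 'a) par) : (int -> 'a) par =
  Array.init bsp_p (fun j -> fun src -> vf.(src) j)

(* Stand-in for the list of processor identifiers [0; ...; p-1]. *)
let procs = List.init bsp_p (fun i -> i)

(* Every processor sends its pid to its right neighbour (cyclically);
   each processor then lists what it received from sources 0 to p-1. *)
let received =
  let msg =
    put (mkpar (fun pid dst -> if dst = (pid + 1) mod bsp_p then Some pid else None))
  in
  Array.map (fun g -> List.map g procs) msg
```

Processor 0 finds its only message in the last position of its list, because it was sent by processor 7.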
Communication III
BSP Cost
\[ \max_{0 \le i < p} \Big( \sum_{j=0}^{p-1} \|f_i\ j\| \Big) + \max_{0 \le i < p} \Big\{ \sum_{j \ne i} |f_i\ j|, \sum_{j \ne i} |f_j\ i| \Big\} \times g + L \]
Remark
- The first constant constructor of a sum type has size 0
- Examples: None, [], . . .
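The communication term of this cost can be computed from a table of message sizes. In the sketch below, size.(i).(j) is assumed to hold the size of the message from i to j, with 0 for an empty message such as None; the function name h_relation is chosen for this example.

```ocaml
(* h = max_i max (sum_{j<>i} |f_i j|) (sum_{j<>i} |f_j i|) *)
let h_relation (size : int array array) : int =
  let p = Array.length size in
  let h = ref 0 in
  for i = 0 to p - 1 do
    let sent = ref 0 and received = ref 0 in
    for j = 0 to p - 1 do
      if j <> i then begin
        sent := !sent + size.(i).(j);         (* from i to j *)
        received := !received + size.(j).(i)  (* from j to i *)
      end
    done;
    h := max !h (max !sent !received)
  done;
  !h

(* Cyclic shift on 4 processors, one word per message. *)
let shift = Array.init 4 (fun i -> Array.init 4 (fun j -> if j = (i + 1) mod 4 then 1 else 0))

(* Processor 0 sends 3 words to each of the other processors. *)
let broadcast = Array.init 4 (fun i -> Array.init 4 (fun j -> if i = 0 && j <> 0 then 3 else 0))

let h_shift = h_relation shift
let h_broadcast = h_relation broadcast
```

The shift is a 1-relation whatever the number of processors, whereas the direct broadcast makes the sender a bottleneck with h = 3 × (p - 1).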
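A More Complicated Example
Communication pattern to implement
[Figure: in the vector \( \langle \ldots, a_{j_1} \ldots a_{j_n}, \ldots \rangle \), processor j sends its first element \( a_{j_1} \) to its left neighbour and its last element \( a_{j_n} \) to its right neighbour.]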
Implementation
let getBounds first last v =
  let p = bsp_p in
  let parfun f v = apply (mkpar(fun _ -> f)) v in
  let lasts = parfun last v in
  let firsts = parfun first v in
  let msg = put(apply(apply(mkpar(fun pid first last dst ->
              if dst = (pid+1) mod p then Some last
              else if dst = (p+pid-1) mod p
              then Some first else None))
            firsts) lasts) in
  ( apply msg (mkpar(fun pid -> (p+pid-1) mod p)),
    apply msg (mkpar(fun pid -> (pid+1) mod p)) )
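To see what getBounds returns, the same code can be run against a sequential model of the primitives. The sketch below assumes that parallel vectors are arrays and fixes p to 4; the helper last_of and the sample data are made up for the illustration.

```ocaml
let bsp_p = 4                      (* assumed number of processors *)

(* Illustrative sequential model: parallel vectors are arrays of length p. *)
let mkpar f = Array.init bsp_p f
let apply vf vv = Array.mapi (fun i f -> f vv.(i)) vf
let put vf = Array.init bsp_p (fun j -> fun src -> vf.(src) j)

(* Same structure as the BSML code, with OCaml naming conventions. *)
let get_bounds first last v =
  let p = bsp_p in
  let parfun f v = apply (mkpar (fun _ -> f)) v in
  let lasts = parfun last v in
  let firsts = parfun first v in
  let msg =
    put (apply (apply (mkpar (fun pid first last dst ->
      if dst = (pid + 1) mod p then Some last
      else if dst = (p + pid - 1) mod p then Some first
      else None)) firsts) lasts)
  in
  ( apply msg (mkpar (fun pid -> (p + pid - 1) mod p)),   (* from the left neighbour *)
    apply msg (mkpar (fun pid -> (pid + 1) mod p)) )      (* from the right neighbour *)

(* Last element of a non-empty list. *)
let last_of l = List.nth l (List.length l - 1)

let from_left, from_right =
  get_bounds List.hd last_of [| [1; 2]; [3; 4]; [5; 6]; [7; 8] |]
```

With these data each processor receives the last element of its left neighbour and the first element of its right neighbour, cyclically.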
Levels of Execution in BSML
- Replicated execution (default)
    - “sequential” ML code
    - every processor does the same
- Local execution
    - what happens inside parallel vectors, on each of their components
    - uses local data
    - may be different on different processors
- Global execution
    - concerns the set of all processors as a whole
    - example: communications
Revised BSML
- Classic BSML: impossible to use vectors in a local section
- Revised BSML: access to the local information of vector v is written $v$, possible only in a local section, written << e >>
- Examples:
    let mkpar f = << f $this$ >>
    let apply fv vv = << $fv$ $vv$ >>
    let parfun f v = << f $v$ >>
⇒ mkpar and apply are no longer primitives in Revised BSML
A Revised More Complicated Example
Implementation
let getBounds first last v =
  let p = bsp_p in
  let lasts = << last $v$ >> in
  let firsts = << first $v$ >> in
  let msg = put << fun dst -> if dst = ($this$ + 1) mod p
                              then Some $lasts$
                              else if dst = (p + $this$ - 1) mod p
                              then Some $firsts$
                              else None >> in
  << $msg$ ((p + $this$ - 1) mod p) >>,
  << $msg$ (($this$ + 1) mod p) >>
Parallel Implementation
Version 0.5 (November 2010) & Version 0.51 (November 2013)
- Library for OCaml: Primitives + a Standard Library
- Modular implementation: MPI, TCP/IP, sequential
- What's new?
    - Revised BSML
    - New configuration and build process
- http://traclifo.univ-orleans.fr/BSML
Deep vs. Shallow Embedding
Deep Embedding
- Use Coq as a mathematical modelling language
- Language syntax: new inductive data types
- Language semantics: inductive predicates or functions
Shallow Embedding
- Use Coq as a functional programming language
- Library data structure and operations:
    - signature of a module: types and functions
    - in the Coq way: including properties of these functions
Shallow Embedding of BSML in Coq (1)
Module Type PRIMITIVES.
  Section Processors.
    Parameter bsp_p : nat.
    Axiom bsp_pLtZero : 0 < bsp_p.
    Definition processor : Type := { pid : nat | pid < bsp_p }.
  End Processors.
  Section Parallel_vectors.
    Parameter par : Type -> Type.
    Parameter get : forall A : Type, par A -> processor -> A.
    Parameter par_eq : forall (A : Type) (v v' : par A),
      (forall (i : processor), get v i = get v' i) -> v = v'.
  End Parallel_vectors.
Shallow Embedding of BSML in Coq (2)
  Section Primitives.
    Variable A : Type.
    Parameter mkpar :
      forall f : processor -> A,
        { X : par A | forall i : processor, get X i = f i }.
    Parameter apply :
      forall (B : Type) (vf : par (A -> B)) (vx : par A),
        { X : par B | forall i : processor,
            get X i = (get vf i) (get vx i) }.
Shallow Embedding of BSML in Coq (3)
    Parameter put :
      forall (vf : par (processor -> A)),
        { X : par (processor -> A) |
            forall i j : processor, get X i j = get vf j i }.
    Parameter proj :
      forall (v : par A),
        { X : processor -> A | forall i : processor, X i = get v i }.
  End Primitives.
End PRIMITIVES.
Coherence with Coq’s Logic
- Implementation of this module type using Coq's vectors
- Bonus: it is a verified sequential implementation
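As an OCaml-side analogue of such a sequential implementation, the sketch below gives a signature mirroring the computational part of the module type, and a functor implementing it with arrays. The names PRIMITIVES and Sequential are chosen for this example, processors are plain integers, and the properties that Coq states as types are only checked at run time here, on one instance.

```ocaml
(* Computational content of the module type; the Coq properties are not expressible here. *)
module type PRIMITIVES = sig
  val bsp_p : int
  type 'a par
  val get : 'a par -> int -> 'a
  val mkpar : (int -> 'a) -> 'a par
  val apply : ('a -> 'b) par -> 'a par -> 'b par
  val put : (int -> 'a) par -> (int -> 'a) par
  val proj : 'a par -> int -> 'a
end

(* Sequential implementation: a parallel vector is an array of length bsp_p. *)
module Sequential (P : sig val bsp_p : int end) : PRIMITIVES = struct
  let bsp_p = P.bsp_p
  type 'a par = 'a array
  let get v i = v.(i)
  let mkpar f = Array.init bsp_p f
  let apply vf vx = Array.mapi (fun i f -> f vx.(i)) vf
  let put vf = Array.init bsp_p (fun j -> fun i -> vf.(i) j)
  let proj v = fun i -> v.(i)
end

module Seq4 = Sequential (struct let bsp_p = 4 end)

let pids = List.init Seq4.bsp_p (fun i -> i)

(* get (mkpar f) i = f i, tested for one f *)
let mkpar_spec_holds =
  let f i = i * i in
  List.for_all (fun i -> Seq4.get (Seq4.mkpar f) i = f i) pids

(* get (put vf) i j = get vf j i, tested for one vf *)
let put_spec_holds =
  let vf = Seq4.mkpar (fun i j -> (10 * i) + j) in
  let x = Seq4.put vf in
  List.for_all
    (fun i -> List.for_all (fun j -> Seq4.get x i j = Seq4.get vf j i) pids)
    pids
```

The difference with the Coq development is the level of assurance: a test covers one instance, while the Coq implementation carries a proof for all inputs.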
BSML Programming and Reasoning in Coq (1)
Module Type TYPE
  (Import Bsml : Primitives.Specification.PRIMITIVES)
  (Import PropM : Primitives.Properties.TYPE Bsml)
  (Import ParSM : Support.ParSig.TYPE Bsml PropM).

  Definition replicate (A : Type) (a : A) :
    { vr : par A | forall (i : processor), get vr i = a } :=
    mkpar (fun _ => a).

  Definition lreplicate (A : Type) (a : A) := proj2_sig (replicate a).
  Hint Rewrite lreplicate : bsmlbase.

  Program Definition parfun (A B : Type) (f : A -> B) (v : par A) :
    { vr : par B | forall (i : processor), get vr i = f (get v i) } :=
    apply (replicate f) v.
  Next Obligation.
    autorewrite with bsml. reflexivity.
  Defined.
  . . .
End Make.
BSML Programming and Reasoning in Coq (2)
Program Definition getBoundsAux (A : Type) (l r : A) (v : par (list A))
    (H : forall i, get v i <> []) :
  { vr : par (option A) | forall (i : processor),
      get vr i = Some (if (i == firstProcessor) then l
                       else sLast (get v (i-1))) }
  *
  { vr : par (option A) | forall (i : processor),
      get vr i = Some (if (i == lastProcessor) then r
                       else sHead (get v (min (i+1) lastProcessor))) } :=
  let msg := put (apply (mkpar (fun (pid : processor) data (dst : processor) =>
      if (dst == (pid+1)) && negb (pid == (bsp_p-1)) then Some (sLast data)
      else if (dst == (pid-1)) && (negb (pid == 0)) then Some (sHead data)
      else None)) (parSig v H)) in
  ( applyat firstProcessor (constantFunPar processor (Some l)) msg
      (parSig (mkpar (fun pid => pid-1))),
    applyat lastProcessor (constantFunPar processor (Some r)) msg
      (parSig (mkpar (fun pid => min (pid+1) lastProcessor))) ).
Next Obligation.
. . .
From Coq to OCaml
In Coq
- the BSML module type
- BSML programs: in functors taking an implementation as argument
⇒ extraction provides OCaml functors
In OCaml
- the BSML library provides an implementation of the module type
- almost . . . because processors:
    - type nat in Coq
    - type int in OCaml
⇒ Wrapper OCaml module BsmlNat
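The mismatch between nat and int can be bridged by conversion functions. The sketch below is a hypothetical illustration of the idea and not the contents of BsmlNat: it uses the unary type that Coq's default extraction produces for nat, and an array-based mkpar_int over 4 processors stands in for the int-indexed primitive of the library.

```ocaml
(* Shape of Coq's nat under default extraction to OCaml. *)
type nat = O | S of nat

let rec int_of_nat = function
  | O -> 0
  | S n -> 1 + int_of_nat n

let nat_of_int n =
  if n < 0 then invalid_arg "nat_of_int: negative argument"
  else
    let rec build acc k = if k = 0 then acc else build (S acc) (k - 1) in
    build O n

(* Stand-in for an int-indexed primitive (assumption: 4 processors, arrays as vectors). *)
let mkpar_int (f : int -> 'a) : 'a array = Array.init 4 f

(* nat-indexed version, as expected by code extracted from Coq. *)
let mkpar_nat (f : nat -> 'a) : 'a array =
  mkpar_int (fun pid -> f (nat_of_int pid))

let pids = mkpar_nat int_of_nat
```

The conversions are linear in the value converted, which is acceptable for processor identifiers since they are bounded by p.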
Summary
It is possible:
- to write BSML programs in Coq
- to extract actual OCaml BSML programs
- to compile them with OCaml compilers and a BSML implementation
Current BSML in Coq:
- BSML Primitives
- Big subset of BSML pure functional standard library
- Dedicated tactics
- Some “applications”:
    - Heat Equation
    - Prototype verified implementation of OSL skeletons
    - Verified implementation of the BH skeleton