Dimensions in Program Synthesis
Sumit Gulwani (sumitg@microsoft.com), Microsoft Research, Redmond
ACM Symposium on Principles and Practice of Declarative Programming, 2010

Page 2: Automated Program Synthesis

• What is program synthesis?
– Synthesizing an executable program from user intent expressed in the form of some constraints.

• Compilers vs. synthesizers
– Compilers take structured language input, perform syntax-directed translation, and yield no new algorithmic insights.
– Synthesizers can accept a variety or mix of constraints (e.g., logic, examples, traces, partial programs), use some kind of search, and can discover new algorithmic insights.

• Why today?
– A natural goal given that computing has become accessible, but:
  • the fundamental "how" programming models have not changed;
  • most people are not expert programmers.
– Enabling technology is now available:
  • better search and logical reasoning techniques (SAT/SMT solvers);
  • faster machines (a good application for multi-cores).

Page 3: Outline

• Applications
• Dimension 1: User Intent
• Dimension 2: Search Space
• Dimension 3: Search Technique

Page 4: Applications: Discovery of New Algorithms

Program synthesis can be a useful tool for algorithm designers:
• Bitvector algorithms
• Mutual exclusion algorithms

Page 5: Bitvector Algorithms

• Straight-line programs that use:
– Arithmetic operators: +, -, *, /
– Logical operators: bitwise and/or/not, shift left/right
• Challenge: the combination of arithmetic and logical operators leads to unintuitive algorithms.
• Application: provides the most efficient way to accomplish a given task on a given architecture.

Page 6: Examples of Bitvector Algorithms

Turn off the rightmost 1-bit: Z & (Z-1)

  Z            1 0 1 0 1 1 0 0
  Z-1          1 0 1 0 1 0 1 1
  Z & (Z-1)    1 0 1 0 1 0 0 0
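A minimal Python sketch of the trick above, assuming 8-bit unsigned bitvectors (the width and the function name are our own choices, not from the slides). Python integers are unbounded, so the result of Z-1 is masked back to 8 bits.

```python
WIDTH = 8                   # bit width assumed for this sketch
MASK = (1 << WIDTH) - 1     # keeps intermediate results within WIDTH bits


def turn_off_rightmost_one(z: int) -> int:
    """Clear the lowest set bit of z using Z & (Z-1)."""
    return z & ((z - 1) & MASK)


z = 0b10101100
print(format((z - 1) & MASK, "08b"))             # Z-1:       10101011
print(format(turn_off_rightmost_one(z), "08b"))  # Z & (Z-1): 10101000
```

For Z = 0 the mask makes Z-1 wrap to 11111111, and the result is 0, as fixed-width hardware would give.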

Page 7: Examples of Bitvector Algorithms

Turn off the rightmost contiguous sequence of 1-bits: Z & (1 + (Z | (Z-1)))

  Z                        1 0 1 0 1 1 0 0
  Z & (1 + (Z | (Z-1)))    1 0 1 0 0 0 0 0

Ceiling of the average of two integers without overflowing: (Y | Z) - ((Y ⊕ Z) >> 1), where ⊕ is bitwise XOR
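The same kind of sketch for both identities on this page, again assuming 8-bit unsigned values; `turn_off_rightmost_ones` and `ceil_avg` are hypothetical names. In Python, `^` is bitwise XOR, the operator in the slide's (Y xor Z) term.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1  # emulate WIDTH-bit unsigned arithmetic


def turn_off_rightmost_ones(z: int) -> int:
    """Z & (1 + (Z | (Z-1))), computed modulo 2**WIDTH."""
    return z & ((1 + (z | ((z - 1) & MASK))) & MASK)


def ceil_avg(y: int, z: int) -> int:
    """(Y|Z) - ((Y xor Z) >> 1): ceil((Y+Z)/2) without forming the possibly overflowing Y+Z."""
    return (y | z) - ((y ^ z) >> 1)


print(format(turn_off_rightmost_ones(0b10101100), "08b"))  # 10100000
print(ceil_avg(255, 254))                                  # 255, although 255 + 254 needs 9 bits
```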

Page 8: Examples of Bitvector Algorithms

P25: Higher-order half of the product of x and y
  o1 := and(x,0xFFFF);
  o2 := shr(x,16);
  o3 := and(y,0xFFFF);
  o4 := shr(y,16);
  o5 := mul(o1,o3);
  o6 := mul(o2,o3);
  o7 := mul(o1,o4);
  o8 := mul(o2,o4);
  o9 := shr(o5,16);
  o10 := add(o6,o9);
  o11 := and(o10,0xFFFF);
  o12 := shr(o10,16);
  o13 := add(o7,o11);
  o14 := shr(o13,16);
  o15 := add(o14,o12);
  res := add(o15,o8);

P24: Round up to the next highest power of 2
  o1 := sub(x,1);
  o2 := shr(o1,1);
  o3 := or(o1,o2);
  o4 := shr(o3,2);
  o5 := or(o3,o4);
  o6 := shr(o5,4);
  o7 := or(o5,o6);
  o8 := shr(o7,8);
  o9 := or(o7,o8);
  o10 := shr(o9,16);
  o11 := or(o9,o10);
  res := add(o11,1);

[MSR Techreport 2010] Joint work with Susmit Jha (UC-Berkeley), Ashish Tiwari (SRI) and Venkie (MSR Redmond)
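A hedged Python transcription of P24 and P25, assuming 32-bit unsigned words; variables keep the slide's o-numbering, with the shift-only temporaries folded into the following line. Note that P24's final line adds 1 to o11, the last OR result. Both programs can be checked against Python's unbounded arithmetic.

```python
M32 = 0xFFFFFFFF  # assume 32-bit unsigned words


def p24_round_up_pow2(x: int) -> int:
    """P24: smear the highest 1-bit of x-1 rightwards, then add 1."""
    o1 = (x - 1) & M32
    o3 = o1 | (o1 >> 1)    # o2 = shr(o1, 1) inlined
    o5 = o3 | (o3 >> 2)
    o7 = o5 | (o5 >> 4)
    o9 = o7 | (o7 >> 8)
    o11 = o9 | (o9 >> 16)
    return (o11 + 1) & M32


def p25_mulhi(x: int, y: int) -> int:
    """P25: high 32 bits of the 64-bit product x*y, built from 16-bit halves."""
    o1, o2 = x & 0xFFFF, x >> 16          # low/high halves of x
    o3, o4 = y & 0xFFFF, y >> 16          # low/high halves of y
    o5, o6, o7, o8 = o1 * o3, o2 * o3, o1 * o4, o2 * o4
    o10 = o6 + (o5 >> 16)                 # o9 = shr(o5, 16) inlined
    o11, o12 = o10 & 0xFFFF, o10 >> 16
    o13 = o7 + o11
    return ((o13 >> 16) + o12 + o8) & M32


print(p24_round_up_pow2(5))                    # 8
print(hex(p25_mulhi(0xFFFFFFFF, 0xFFFFFFFF)))  # 0xfffffffe
```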

Page 9: Applications: Software Development

• General-purpose programming assistance
– Program fragments from higher-order logic specifications
– Tricky/mundane details after the user provides the higher-level insight
– Semi-automatic development with various forms of interactivity
  • KIDS, Angelic Programming
– Automated debugging
• Synthesis of program inverses
– Compression/decompression, encryption/decryption
– Serialization/deserialization
– Insert/delete operations, transactional memory rollback
• Program understanding
– Malware deobfuscation
– Reverse engineering binaries
– Programs with missing documentation

Page 10: Pyramid of Technology Users

[Figure: pyramid of technology users (End-Users, Algorithm Designers, Software Developers), annotated "Most Useful Target"]

Page 11: Applications: End-user Programming

Automating repetitive tasks for end-users:

Example | Current technology
Transforming lists of addresses from one format to another | Excel macros
Renaming files in a directory; managing bibliographies | PowerShell scripts
Extracting data from several web pages into a single document | Web programming

Page 12: Pyramid of Technology Users

[Figure: pyramid of technology users (Students, End-Users, Algorithm Designers, Software Developers), annotated "Most Useful Target" and "Most Transformational Target"]

Page 13: Applications: Automating Teaching

• Teaching can be made interactive and fun.
– Problem solving
  • Provide hints.
  • Point out incorrect deductions.
– Grading
  • Explain bugs.
  • Suggest fixes.
– Problem construction
  • Construct different, but equally difficult, exams.
  • Individualize problem difficulty (like GRE exams).

Page 14: Domain: 2D Geometric Constructions

Target: high school students

Example 1: Draw a line perpendicular to a given line.

Program:
Step 1: Select two points P1, P2 on the given line.
Step 2: Draw a circle C1 centered at P1 with radius Dist(P1,P2).
Step 3: Draw a circle C2 centered at P2 with radius Dist(P1,P2).
Step 4: Result = the line joining the two points at which C1 and C2 intersect.

[Ongoing Work] Joint work with Vijay Anand (UIUC), Ashish Tiwari (SRI), Monojit Choudhury, Kalika Bali (MSR Bangalore)
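The construction can be simulated numerically. This sketch assumes points are pairs of floats and uses the standard analytic formula for intersecting two circles; `circle_intersections` and `perpendicular` are our own helper names, not part of the system described here.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def circle_intersections(c1: Point, r1: float, c2: Point, r2: float) -> Tuple[Point, Point]:
    """Intersection points of two circles; assumes they meet in exactly two points."""
    (x1, y1), (x2, y2) = c1, c2
    d = math.dist(c1, c2)
    a = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)  # distance from c1 to the common chord
    h = math.sqrt(r1 ** 2 - a ** 2)             # half the length of the common chord
    mx, my = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    ox, oy = -h * (y2 - y1) / d, h * (x2 - x1) / d
    return (mx + ox, my + oy), (mx - ox, my - oy)


def perpendicular(p1: Point, p2: Point) -> Tuple[Point, Point]:
    """Steps 2-4: circles of radius Dist(P1,P2) around P1 and P2; return their meeting points."""
    r = math.dist(p1, p2)
    return circle_intersections(p1, r, p2, r)


Q1, Q2 = perpendicular((0.0, 0.0), (4.0, 2.0))
print(Q1, Q2)
```

The line through Q1 and Q2 is perpendicular to P1P2 and passes through its midpoint (2, 1), so this program also solves "bisect a given line".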

Page 15: Examples of Geometric Constructions

• Bisect a given line.
• Bisect an angle.
• Copy an angle.
• Draw a line parallel to a given line.
• Draw an equilateral triangle given two points.
• Draw a regular hexagon given a side.

Other applications: new approximate geometric constructions, 3D planning problems

Page 16: Teaching System Architecture

Problem description in English
→ (Natural Language Processing) → Problem description as logical relation
→ (Synthesis Engine) → Solution as functional program
→ (Paraphrasing) → Solution in English

Page 17: Other Domains

What domains should we prioritize for automation?
• Mathematics
– Statistics
  • Can be useful for numerical data analysis in the real world.
– Probability
• Physics
– Mechanics
  • Can be useful for modeling the physical world.
– Optics
• Chemistry
– Quantitative chemistry
– Organic chemistry

Page 18: Applications: Automating Teaching

• Teaching can be made interactive and fun: problem solving, grading, and problem construction (as on Page 13).
• Inter-stellar travel

Page 19: Outline

• Applications
• Dimension 1: User Intent (current)
• Dimension 2: Search Space
• Dimension 3: Search Technique

Page 20: Dimension 1: User Intent

• Logical specifications
– Logical relations between inputs and outputs
• Natural language
• Input-output examples
• Traces
• Programs

Page 21: Logical Specification: Example 1

Problem: Sorting

Logical relation between input array A and output array B, both of size n:

$\forall k: (0 \le k < n-1) \Rightarrow (B[k] \le B[k+1])$
$\land\ \forall k\, \exists j: (0 \le k < n) \Rightarrow (0 \le j < n \land B[j] = A[k])$
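The relation can be read as an executable checker. A sketch with a hypothetical function name, taking k over all indices 0 ≤ k < n of A in the second conjunct:

```python
from typing import List


def satisfies_sort_spec(a: List[int], b: List[int]) -> bool:
    """Check the sorting relation between input A and output B (both of size n)."""
    n = len(a)
    if len(b) != n:
        return False
    # Conjunct 1: for all k with 0 <= k < n-1, B[k] <= B[k+1]
    is_sorted = all(b[k] <= b[k + 1] for k in range(n - 1))
    # Conjunct 2: for all k (0 <= k < n) there is a j (0 <= j < n) with B[j] = A[k]
    covers_input = all(any(b[j] == a[k] for j in range(n)) for k in range(n))
    return is_sorted and covers_input


print(satisfies_sort_spec([3, 1, 2], [1, 2, 3]))  # True
print(satisfies_sort_spec([3, 1, 2], [1, 3, 2]))  # False: not sorted
print(satisfies_sort_spec([2, 1, 1], [1, 2, 2]))  # True, although B is not a permutation of A
```

The last call shows that, as written, the relation only asks that every element of A appear somewhere in B; it does not constrain multiplicities.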

Page 22: Logical Specification: Example 2

Problem: Turn off the rightmost 1-bit

Logical relation between input I and output J, where both I and J are bit-vectors of size b (bits indexed 1 to b from left to right):

$\bigwedge_{p=1}^{b} \Big[ \big(I[p]=1 \land \bigwedge_{j=p+1}^{b} I[j]=0\big) \Rightarrow \big(J[p]=0 \land \bigwedge_{j \ne p} J[j]=I[j]\big) \Big]$

Solution: J := I & (I-1)
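To see that J := I & (I-1) meets this relation, the sketch below checks it exhaustively for b = 8, indexing bits 1..b from most to least significant (the reading under which "bits p+1..b are 0" means "every bit to the right of p is 0"). The width and the helper names are assumptions.

```python
B = 8  # bit-vector size b (chosen for this sketch)


def bit(v: int, p: int) -> int:
    """I[p] for p = 1..B, where bit 1 is the most significant and bit B the least."""
    return (v >> (B - p)) & 1


def spec(i: int, j: int) -> bool:
    """Conjunction over p of: (p is I's rightmost 1-bit) => (J clears bit p and copies the rest)."""
    for p in range(1, B + 1):
        premise = bit(i, p) == 1 and all(bit(i, q) == 0 for q in range(p + 1, B + 1))
        conclusion = bit(j, p) == 0 and all(bit(j, q) == bit(i, q) for q in range(1, B + 1) if q != p)
        if premise and not conclusion:
            return False
    return True


print(all(spec(v, v & (v - 1)) for v in range(1 << B)))  # True
print(spec(0b00001100, 0b00001100))                      # False: bit not cleared
```

For I = 0 no p satisfies the premise, so the relation accepts any J; `spec(0, 0b1111)` is True.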

Page 23: Natural Language

• Advances in NLP allow mapping natural language to logic.
– NL interfaces have been designed for database queries.
• Natural language can be ambiguous.
– This issue can be resolved by paraphrasing.

Page 24: Input-Output Examples

• Advantages of input-output examples
– Easy to provide; no need to remember syntax
– Less chance of mistakes
• What prevents a trivial table-lookup solution on input-output pairs (xi, yi)?

  switch (x)
    case x1: return y1
    case x2: return y2
    ...
    case xn: return yn

– A restriction on the search space!
• How to select examples?
– Interactively!

Page 25: Interaction Model

1. The user provides a few input-output examples I.
2. The synthesizer constructs a program P consistent with I.
3. The process may be repeated after adding new examples.
– User-driven interaction
  • The user tests the program on other inputs I'. If a discrepancy is found, the user provides a new input-output example.
– Synthesizer-driven interaction
  • If the synthesizer finds another program P' consistent with I, it asks the user to provide the output for a distinguishing input.

Typically, few iterations are required:
– small teaching dimension [Goldman, Kearns, JCSS '95]
– low Kolmogorov (descriptive) complexity

[ICSE 2010] Joint work with Susmit Jha, Sanjit Seshia (UC-Berkeley) and Ashish Tiwari (SRI)
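A minimal sketch of the synthesizer-driven variant, assuming a finite pool of candidate programs over 5-bit values and brute-force search for distinguishing inputs (a real synthesizer would use a solver for both). The pool holds the two designs from the dialog on Page 26; `interact`, `distinguishing_input` and `user` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional

WIDTH = 5
MASK = (1 << WIDTH) - 1
Program = Callable[[int], int]

# Hypothetical candidate pool: the two designs from the dialog on Page 26.
DESIGNS: Dict[str, Program] = {
    "(z+1) & (z-1)": lambda z: (z + 1) & (z - 1) & MASK,
    "(z+1) & z": lambda z: (z + 1) & z & MASK,
}


def distinguishing_input(p: Program, q: Program) -> Optional[int]:
    """Return the smallest WIDTH-bit input on which p and q disagree, if any."""
    return next((x for x in range(1 << WIDTH) if p(x) != q(x)), None)


def interact(candidates: Dict[str, Program], examples: Dict[int, int],
             ask_user: Callable[[int], int]) -> str:
    """Query distinguishing inputs until all consistent candidates behave identically."""
    examples = dict(examples)  # do not mutate the caller's examples
    while True:
        consistent: List[str] = [name for name, p in candidates.items()
                                 if all(p(x) == y for x, y in examples.items())]
        if not consistent:
            raise ValueError("no candidate is consistent with the examples")
        first = consistent[0]
        for other in consistent[1:]:
            x = distinguishing_input(candidates[first], candidates[other])
            if x is not None:
                examples[x] = ask_user(x)  # the user supplies the intended output
                break
        else:
            return first  # every remaining candidate agrees on all inputs


def user(z: int) -> int:
    """Stand-in for the user's intent: clear the rightmost run of 1s."""
    return z & (z + (z & -z)) & MASK


print(interact(DESIGNS, {0b01011: 0b01000}, user))  # (z+1) & z
```

With only these two candidates the loop asks about 00000 and settles on (z+1) & z, yet that design still maps 01100 to 01100 rather than 00000. A richer candidate space, and hence more questions, is needed to pin down the intended program.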

Page 26: Example of a Synthesizer-driven Interaction Model

Problem: Turn off the rightmost contiguous sequence of 1s

User: I want a design that maps 01011 -> 01000.
Synthesizer: I can think of two designs,
  Design 1: (z+1) & (z-1)
  Design 2: (z+1) & z
which differ on 00000 (the distinguishing input). What should 00000 be mapped to?
User: 00000 -> 00000

Page 27: Dialog: Interactive Synthesis

Problem: Turn off the rightmost contiguous string of 1s
User: 01011 -> 01000
Synthesizer: 00000?   User: 00000
Synthesizer: 01111?   User: 00000
Synthesizer: 00110?   User: 00000
Synthesizer: 01100?   User: 00000
Synthesizer: 01010?   User: 01000
Synthesizer: Your design is z & (1 + ((z-1) | z))
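The user's answers in this dialog can be checked against the final design. A sketch assuming 5-bit unsigned values, with masking standing in for fixed-width wrap-around; `design` and `DIALOG` are our own names.

```python
WIDTH = 5
MASK = (1 << WIDTH) - 1  # 5-bit values, as in the dialog


def design(z: int) -> int:
    """The synthesized design z & (1 + ((z-1) | z)), with fixed-width wrap-around."""
    return z & ((1 + (((z - 1) & MASK) | z)) & MASK)


# The user's answers from the dialog: input -> intended output.
DIALOG = {
    0b01011: 0b01000,
    0b00000: 0b00000,
    0b01111: 0b00000,
    0b00110: 0b00000,
    0b01100: 0b00000,
    0b01010: 0b01000,
}

print(all(design(z) == out for z, out in DIALOG.items()))  # True
```

Six examples suffice here, which is consistent with the "few iterations" claim for low-complexity targets.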

Page 28: Traces

• A detailed step-by-step description of how the program should behave on a given input.
• Easier to deal with than input-output examples.
– Some synthesizers that accept input-output examples first generate traces.
• A natural model in certain applications.
– Programming-by-demonstration systems for end-users
  • The intermediate states resulting from the user's successive actions on a user interface constitute a valid trace.
– Reverse engineering
• Convenient in certain scenarios.
– E.g., consider describing Factorial(7):
  • Trace: 7*6*5*4*3*2, or recursive trace: 7*Factorial(6)
  • Final simplified output: 5040

Page 29: Programs

• A natural model in certain applications.
– Program inverses
– Deobfuscation
• Convenient in certain scenarios.
– E.g., consider the problem of turning off the rightmost contiguous sequence of 1s in bitvector x (bit 0 is the rightmost bit).

Program:
  TurnOffRightMostOnes(x)
    i := 0; y := x;
    while (i < n && y[i] == 0) i++;
    while (i < n && y[i] == 1) { y[i] := 0; i++; }
    return y;

Logical specification:
  $\exists i, j:\ 0 \le i \land j < n \land (j \le i \lor j = n-1) \land \forall k: (j \le k \le i \Rightarrow x[k]=1) \land \forall k: (0 \le k < j \Rightarrow x[k]=0) \land \forall k: (i < k < n \Rightarrow x[k]=y[k]) \land \forall k: (0 \le k < i \Rightarrow y[k]=0)$
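A runnable Python version of the loop program, assuming x is a list of bits with x[0] the least significant, cross-checked against the bitvector formula z & (1 + (z | (z-1))) used for the same task in the earlier examples. `to_bits` is a hypothetical helper.

```python
from typing import List


def turn_off_rightmost_ones(x: List[int]) -> List[int]:
    """TurnOffRightMostOnes from the slide; x[0] is the least significant (rightmost) bit."""
    n = len(x)
    y = list(x)                  # y := x (copied, so the input is not modified)
    i = 0
    while i < n and y[i] == 0:   # skip the trailing 0s
        i += 1
    while i < n and y[i] == 1:   # clear the rightmost run of 1s
        y[i] = 0
        i += 1
    return y


def to_bits(v: int, n: int) -> List[int]:
    """Little-endian list of the n low bits of v."""
    return [(v >> k) & 1 for k in range(n)]


print(turn_off_rightmost_ones(to_bits(0b10101100, 8)) == to_bits(0b10100000, 8))  # True
```

Testing `i < n` before reading `y[i]` keeps the loops in bounds when x is all 0s or ends in a run of 1s.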

Page 30: Outline

• Applications
• Dimension 1: User Intent
• Dimension 2: Search Space (current)
• Dimension 3: Search Technique

Page 31: Dimension 2: Search Space

• Programs
– Operators
  • Comparison operators
  • Combination of arithmetic and bitwise operators
  • APIs exported from a given library, e.g., SortList = Array2List ◦ SortArray ◦ List2Array
– Control-flow
  • Given looping template
  • Bounded number of statements
  • Partial program with holes
  • Straight-line or loop-free

Page 32: Loop-free Programs

Parameterized by the set of components/operators used:
• Bitvector algorithms
– Components = arithmetic + bitwise operators
• Text-editing programs
– Components = editing commands (insert, locate, select, delete)
• Geometrical constructions
– Components = ruler + compass
• Unbounded data type manipulation
– Components = linear arithmetic/set operators
– [PLDI '10, Viktor Kuncak et al, Complete Functional Synthesis]
• API call sequences [PLDI '05, Bodik et al, Jungloid Mining]
– Components = API calls

Can be likened to putting together jigsaw puzzle pieces.
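To make the jigsaw-puzzle view concrete, here is a brute-force sketch that enumerates straight-line programs over a small, hypothetical component library of 8-bit operations and returns the shortest one consistent with some input-output examples. It is plain exhaustive enumeration, not a constraint-based technique.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Optional, Tuple

MASK = 0xFF  # 8-bit bitvectors (assumption for the sketch)

# Hypothetical component library: name -> (arity, function on 8-bit values)
COMPONENTS: Dict[str, Tuple[int, Callable[..., int]]] = {
    "add1": (1, lambda a: (a + 1) & MASK),
    "sub1": (1, lambda a: (a - 1) & MASK),
    "not": (1, lambda a: ~a & MASK),
    "and": (2, lambda a, b: a & b),
    "or": (2, lambda a, b: a | b),
}

Line = Tuple[str, Tuple[int, ...]]  # (component, operand indices); index 0 is the input x


def run(prog: List[Line], x: int) -> int:
    vals = [x]  # vals[k] is the value produced by line k (vals[0] = x)
    for name, args in prog:
        vals.append(COMPONENTS[name][1](*(vals[a] for a in args)))
    return vals[-1]


def programs(length: int, prefix: Tuple[Line, ...] = ()) -> Iterator[List[Line]]:
    """Yield every straight-line program with exactly `length` lines."""
    if len(prefix) == length:
        yield list(prefix)
        return
    for name, (arity, _) in COMPONENTS.items():
        # Operands may be the input or the result of any earlier line.
        for args in product(range(len(prefix) + 1), repeat=arity):
            yield from programs(length, prefix + ((name, args),))


def synthesize(examples: Dict[int, int], max_len: int = 3) -> Optional[List[Line]]:
    for length in range(1, max_len + 1):  # shortest programs first
        for prog in programs(length):
            if all(run(prog, x) == y for x, y in examples.items()):
                return prog
    return None


# Examples for "turn off the rightmost 1-bit"
print(synthesize({0b10101100: 0b10101000, 0b110: 0b100, 0b1: 0b0}))
# [('sub1', (0,)), ('and', (0, 1))], i.e. o1 := sub1(x); o2 := and(x, o1)
```

The search space grows quickly with program length and library size, which is why the restriction to a given set of components matters.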

Page 33: Dimension 2: Search Space

• Programs
– Operators
– Control-flow
• Grammars
– Examples: regular expressions, DFAs, NFAs, context-free grammars, regular transducers
– Applications: robotics/control systems, pattern recognition, computational linguistics/biology, data compression/mining, etc.
• Logics
– First-order logic + fixed point
  • = PTIME algorithms over ordered structures such as strings and graphs
  • E.g., graph classifiers: bipartite, acyclic, connected; graph computations: shortest path, cycle, 2-coloring
  • [OOPSLA '10] Joint work with Shachar Itzhaky, Mooly Sagiv (Tel Aviv Univ), Neil Immerman (UMass, Amherst)

Page 34: Outline

• Applications
• Dimension 1: User Intent
• Dimension 2: Search Space
• Dimension 3: Search Technique (current)

Page 35: Dimension 3: Search Technique

• Exhaustive search
– Applied to mutual exclusion algorithms, bitvector algorithms

Page 36: Bitvector Algorithms: Exhaustive Search vs. Logical Reasoning

Brahma (logical reasoning) vs. AHA (exhaustive search); X: no AHA result.

Program | Lines | Brahma time | AHA time
P1 | 2 | 3 | 0.1
P2 | 2 | 3 | 0.1
P3 | 2 | 1 | 0.1
P4 | 2 | 3 | 0.1
P5 | 2 | 2 | 0.1
P6 | 2 | 2 | 0.1
P7 | 3 | 1 | 2
P8 | 3 | 1 | 1
P9 | 3 | 6 | 7
P10 | 3 | 76 | 10
P11 | 3 | 57 | 9
P12 | 3 | 67 | 10
P13 | 4 | 6 | X
P14 | 4 | 60 | X
P15 | 4 | 119 | X
P16 | 4 | 62 | X
P17 | 4 | 78 | 109
P18 | 6 | 46 | X
P19 | 6 | 35 | X
P20 | 7 | 108 | X
P21 | 8 | 28 | X
P22 | 8 | 279 | X
P23 | 10 | 1668 | X
P24 | 12 | 224 | X
P25 | 16 | 2779 | X

Page 37: Dimension 3: Search Technique

• Exhaustive search
– Applied to mutual exclusion algorithms, bitvector algorithms
• Version space algebras
– Data structures for efficiently representing the set of all programs that satisfy a given constraint, and for intersecting such sets
– Applied to programming-by-demonstration systems
• Machine learning techniques
– Genetic programming
  • Applied to mutual exclusion algorithms, automated debugging
– Probabilistic inference
  • [POPL '07] Joint work with Nebojsa Jojic
  • [MSR TR '10] Joint work with Aditya Nori, Sriram Rajamani, Rahul Srinivasan
• Logical reasoning techniques
– The most commonly used technique
– Leverages recent engineering advances in SAT/SMT solvers

Page 38: Logical Reasoning Techniques

• Constraint generation
– Macro-level (control-flow encoding)
  • Invariant based
  • Path based
  • Input based
– Micro-level (operator encoding)
• Constraint solving

Page 39: Logical Reasoning Techniques

• Constraint generation
– Macro-level (control-flow encoding)
  • Invariant based (current)
  • Path based
  • Input based
– Micro-level (operator encoding)
• Constraint solving

Page 40: Invariant-based Constraint Generation

<Pre> while c do S <Post>

Constraints:
$Pre \Rightarrow I$
$I \land \lnot c \Rightarrow Post$
$I \land c \land S \Rightarrow I'$
$I \Rightarrow r \ge 0$
$I \land c \land S \Rightarrow r' < r$

Program verification: $\exists I, r\ \forall y, y'$ (constraints above)
Program synthesis: $\exists I, r, c, S\ \forall y, y'$ (constraints above)

Notation: Let S be y := F(y). S as a formula denotes y' = F(y). I' denotes I[y'/y], and r' denotes r[y'/y].

[POPL 2010] Joint work with Saurabh Srivastava/Jeff Foster (Univ. of Maryland at College Park)
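As a concrete, hypothetical instance, take the loop `while x < n do (s, x) := (s + x, x + 1)` with Pre `x = 0 ∧ s = 0 ∧ n ≥ 0` and Post `2s = n(n-1)`. The sketch below checks the five constraints for a candidate invariant I and ranking function r by brute force over a bounded box of states. It only tests the verification constraints on finitely many states; it does not solve for I and r (or c and S), which is what the template-based approach does.

```python
from itertools import product
from typing import Tuple

N_MAX = 12  # bound on the states we check (a stand-in for a real solver)


def pre(x: int, s: int, n: int) -> bool:
    return x == 0 and s == 0 and n >= 0


def post(x: int, s: int, n: int) -> bool:
    return 2 * s == n * (n - 1)


def guard(x: int, s: int, n: int) -> bool:           # c
    return x < n


def body(x: int, s: int, n: int) -> Tuple[int, int, int]:  # S, i.e. y' = F(y)
    return (x + 1, s + x, n)


def ranking(x: int, s: int, n: int) -> int:          # r
    return n - x


def invariant(x: int, s: int, n: int) -> bool:       # candidate I
    return 0 <= x <= n and 2 * s == x * (x - 1)


def check_vcs(inv) -> bool:
    """Check the five invariant-based constraints for every state y in a bounded box."""
    for x, n in product(range(N_MAX + 1), repeat=2):
        for s in range(N_MAX * N_MAX):
            y = (x, s, n)
            y1 = body(*y)                                                  # primed state y'
            holds = [
                not pre(*y) or inv(*y),                                    # Pre => I
                not (inv(*y) and not guard(*y)) or post(*y),               # I ∧ ¬c => Post
                not (inv(*y) and guard(*y)) or inv(*y1),                   # I ∧ c ∧ S => I'
                not inv(*y) or ranking(*y) >= 0,                           # I => r >= 0
                not (inv(*y) and guard(*y)) or ranking(*y1) < ranking(*y), # I ∧ c ∧ S => r' < r
            ]
            if not all(holds):
                return False
    return True


print(check_vcs(invariant))                             # True
print(check_vcs(lambda x, s, n: 2 * s == x * (x - 1)))  # False: drops 0 <= x <= n
```

The weaker candidate fails the second constraint, e.g. at x = 2, s = 1, n = 0, which shows why the bound x ≤ n belongs in the invariant.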

Page 41: Experiments: Program Verification vs. Program Synthesis

Sorting algorithms (verification tool: PLDI 09)
Algorithm | Verification time v | Synthesis time s | Ratio s/v
Insertion Sort | 2.5 | 5.4 | 2.1
Bubble Sort | 1.3 | 3.2 | 2.4
Selection Sort | 24 | 165 | 7
Quick Sort | 1.7 | 160 | 94
Merge Sort | 19 | 50 | 2.7

Arithmetic algorithms (verification tool: PLDI 08)
Algorithm | Verification time v | Synthesis time s | Ratio s/v
Swap | 0.1 | 0.1 | 1
Sqrt (Binary Search) | 0.6 | 1.8 | 3
Sqrt (Linear Tracking) | 0.8 | 10 | 12.5
Strassen Multiplication | 0.1 | 5 | 50
Bresenham Line Drawing | 166 | 9658 | 58

Page 42: Experiments: Program Verification vs. Program Synthesis

Dynamic programming algorithms (verification tool: PLDI 09)
Algorithm | Verification time v | Synthesis time s | Ratio s/v
Fibonacci | 0.4 | 5.9 | 15
Checkerboard | 0.4 | 1.0 | 2.5
Longest Common Subsequence | 0.5 | 14 | 28
Matrix Chain Multiply | 6.9 | 88 | 13
Single-src Shortest Path | 47 | 124 | 2.6

Page 43: Logical Reasoning Techniques

• Constraint generation
– Macro-level (control-flow encoding)
  • Invariant based
  • Path based (current)
  • Input based
– Micro-level (operator encoding)
• Constraint solving

Page 44: Path-based Constraint Generation

<Pre> while c do S <Post>

$\exists c, S\ \forall y_0, \ldots, y_n$:
$Pre_0 \land \lnot c_0 \Rightarrow Post_0$
$Pre_0 \land \varphi_1 \land \lnot c_1 \Rightarrow Post_1$
$Pre_0 \land \varphi_2 \land \lnot c_2 \Rightarrow Post_2$
$\vdots$
$Pre_0 \land \varphi_n \land \lnot c_n \Rightarrow Post_n$

The ith constraint encodes that the pre/post specification holds for all inputs that execute i loop iterations.

Notation: $c_i$ denotes $c[y_i/y]$; $Pre_i$ and $Post_i$ denote $Pre[y_i/y]$ and $Post[y_i/y]$; $S_i$ denotes $S[y_i/y, y_{i+1}/y']$ (with S read as a formula relating y to its next state y'); and $\varphi_i$ denotes $\bigwedge_{j=0}^{i-1} (c_j \land S_j)$, the path that executes i loop iterations.

[MSR Techreport 2010] Joint work with Saurabh Srivastava/Jeff Foster (Univ. of Maryland) and Swarat Chaudhuri (Penn State Univ.)

Page 45: Experiments: Program Inversion

Compressors | LOC | Iterations | Time
Run-length Encoding | 12 | 7 | 26
LZ77 Encoding | 20 | 6 | 1810
LZW Encoding | 25 | 4 | 150

Formatters | LOC | Iterations | Time
Base64 | 22 | 12 | 1376
UUEncode | 12 | 7 | 34
Pkt Wrapper | 10 | 6 | 132
XML Serialize | 8 | 14 | 55

Page 46: Experiments: Program Inversion

Graphics/Arithmetic | LOC | Iterations | Time
Vector Shift | 8 | 3 | 4
Vector Scale | 8 | 3 | 4
Vector Rotate | 8 | 3 | 40
Dijkstra's Permute | 11 | 1 | 8
LU Decomposition | 11 | 1 | 160
1 + ... + n | 5 | 4 | 1

TFTP Server | LOC | Iterations | Time
CMD Loop | 20 | 3 | 22
File get-send | 14 | 1 | 520
Ack get-send | 12 | 1 | 1
Data get-send | 7 | 1 | 442
Packet get-send | 9 | 1 | 1

Page 47: Logical Reasoning Techniques

• Constraint generation
– Macro-level (control-flow encoding)
  • Invariant based
  • Path based
  • Input based (current)
– Micro-level (operator encoding)
• Constraint solving

Page 48: Input-based Constraint Generation

<P> while c do S <Q>

$\exists c, S:\ \Psi(P_1,Q_1) \land \ldots \land \Psi(P_k,Q_k)$

where
$\Psi(P,Q) = \exists y_0, \ldots, y_n:\ (\lnot c_0 \land y_0=P \land y_0=Q) \lor (\varphi_1 \land \lnot c_1 \land y_0=P \land y_1=Q) \lor (\varphi_2 \land \lnot c_2 \land y_0=P \land y_2=Q) \lor \cdots \lor (\varphi_n \land \lnot c_n \land y_0=P \land y_n=Q) \lor (\varphi_n \land c_n \land y_0=P)$

Ψ(P,Q) encodes that the program is consistent with the input-output pair (P,Q) if it terminates within n iterations.

Notation: $(P_i,Q_i)$ denotes an input-output example; $c_i = c[y_i/y]$, $S_i = S[y_i/y, y_{i+1}/y']$ and $\varphi_i = \bigwedge_{j=0}^{i-1} (c_j \land S_j)$, the path that executes i loop iterations.
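A small sketch of the input-based encoding, assuming a deterministic loop over states (x, r) and finite, hypothetical pools of guards c and bodies S. Because the loop is deterministic, the existential over y_0, …, y_n amounts to simulating at most n iterations from P. The last disjunct, (φ_n ∧ c_n), is what lets a candidate that is still running after n iterations count as consistent.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Tuple[int, int]  # (x, r)

# Hypothetical candidate guards c and bodies S for:  while c do S
GUARDS: Dict[str, Callable[[int, int], bool]] = {
    "x > 0": lambda x, r: x > 0,
    "x > 1": lambda x, r: x > 1,
    "r < 6": lambda x, r: r < 6,
}
BODIES: Dict[str, Callable[[int, int], State]] = {
    "x-1, r+1": lambda x, r: (x - 1, r + 1),
    "x-1, r+2": lambda x, r: (x - 1, r + 2),
    "x-2, r+4": lambda x, r: (x - 2, r + 4),
}
# Input-output examples (P_i, Q_i)
EXAMPLES: List[Tuple[State, State]] = [((3, 0), (0, 6)), ((1, 0), (0, 2)), ((0, 0), (0, 0))]


def psi(guard, body, p: State, q: State, n: int) -> bool:
    """Ψ(P,Q): the loop maps P to Q within n iterations, or is still running after n."""
    y = p                          # y_0 = P
    for _ in range(n):
        if not guard(*y):          # ¬c_i: the loop exits with output y_i
            return y == q
        y = body(*y)               # c_i ∧ S_i: one more iteration
    return guard(*y) or y == q     # (φ_n ∧ c_n) or (φ_n ∧ ¬c_n ∧ y_n = Q)


def consistent_candidates(examples, n: int) -> List[Tuple[str, str]]:
    """All (c, S) pairs satisfying Ψ(P_1,Q_1) ∧ ... ∧ Ψ(P_k,Q_k)."""
    return [(g, b) for g, b in product(GUARDS, BODIES)
            if all(psi(GUARDS[g], BODIES[b], p, q, n) for p, q in examples)]


print(consistent_candidates(EXAMPLES, 8))  # [('x > 0', 'x-1, r+2')]
print(consistent_candidates(EXAMPLES, 5))  # [('x > 0', 'x-1, r+2'), ('r < 6', 'x-1, r+1')]
```

With n = 5 the bound is too small for the candidate (r < 6; x-1, r+1), which is still running on every example and is therefore accepted vacuously; raising n to 8 leaves only the intended loop.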

Page 49: Choices for Macro-level Constraint Generation

Aspect | Invariant-based | Path-based | Input-based
Encoding | The unknown program always satisfies the given spec | The given spec is satisfied on certain paths | The given spec is satisfied on certain inputs
Coverage | Full | Medium | Least
Constraint size | Smallest | Medium | Largest
Sophistication of constraints | Most | Medium | Least
Verification analog | Formal verification | Testing based on path coverage | Testing based on an input suite
Synthesis paradigm | Deductive | Deductive + Inductive | Inductive

Page 50: Logical Reasoning Techniques

• Constraint generation
– Macro-level (control-flow encoding)
  • Invariant based
  • Path based
  • Input based
– Micro-level (operator encoding) (current)
• Constraint solving

Page 51: Constraint Generation: Micro-level

Let x = y op z, where op is some arithmetic operator.
• Precise modeling
– x = y op z
– Can be encoded using SMT.
• Abstract/approximate modeling

Page 52: Abstract/Approximate Modeling of Algebraic Operators

• Bit-vector modeling
– F(x1,x2,y1,y2,z1,z2), where x1, x2 denote the bits of x
– Allows encoding using SAT or small tables.
• Arithmetic modulo a small prime
– x = y op z mod p. Allows encoding using small tables.
– E.g., x = y+z mod 3 is encoded as x = G(y,z), where G is the table:

    G | 0 1 2
    0 | 0 1 2
    1 | 1 2 0
    2 | 2 0 1

• Finite-precision modeling
– Represent x by finite-bit integers $(x_m, x_e)$ denoting $x_m \cdot 10^{x_e}$.
– The relationship between $x_m, x_e, y_m, y_e, z_m, z_e$ can be encoded using SAT or tables.
– Allows small unrolling of long-running but converging loops.
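The table G need not be written by hand. A sketch that tabulates any binary operator modulo a small p (the function name `op_table` is ours), together with a check that the abstraction is sound: whenever x = y + z over the integers, x mod 3 = G(y mod 3, z mod 3).

```python
import operator
from typing import Callable, List


def op_table(op: Callable[[int, int], int], p: int) -> List[List[int]]:
    """Tabulate x = y op z mod p as a p-by-p table G[y][z]."""
    return [[op(y, z) % p for z in range(p)] for y in range(p)]


G = op_table(operator.add, 3)
for row in G:
    print(row)
# [0, 1, 2]
# [1, 2, 0]
# [2, 0, 1]

# Soundness: x = y + z over the integers implies x mod 3 = G[y mod 3][z mod 3].
print(all((y + z) % 3 == G[y % 3][z % 3] for y in range(-20, 20) for z in range(-20, 20)))  # True
```

The same function gives tables for other operators, e.g. `op_table(operator.mul, 3)` for multiplication modulo 3.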

Page 53: Sampling

Constraints are of the form $\exists X\, \forall Y\, F(X,Y)$.
• Sampling involves replacing Y by a few concrete instances Y1, Y2, Y3 to yield
  $\exists X\, F(X,Y_1) \land F(X,Y_2) \land F(X,Y_3)$
• Sampling can be done in various ways.

Page 54: Some Sampling Methods

Constraints are of the form $\exists X\, \forall Y\, F(X,Y)$.
• Counterexample-driven
– If a verifier is available
• Random
– When F is an equality of polynomials or free boolean graphs
• Basis
– When F is a comparison machine or vector-space computation
• Biased [ICSE '10]
– E.g., when Y is a bit-vector
– Theorem: the ith output bit of several bit-vector functions depends on the ith bit of the inputs and the bits to its right.
– Hence, $r_1r_2\,00$, $r_3r_4\,01$, $r_5r_6\,10$, $r_7r_8\,11$ provide the largest discriminatory power if 4 instances of Y can be chosen (the $r_i$ stand for the remaining high-order bits; the two lowest-order bits cover all four combinations).
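Counterexample-driven sampling can be sketched as a loop that alternates a solve step on the current samples with a verify step over all of Y. This toy instance is our own and hypothetical: it searches for an 8-bit mask X such that ∀Y: (Y & X) = Y mod 16, using brute force for both the solver and the verifier.

```python
from typing import List, Optional, Tuple

DOMAIN = range(1 << 8)  # 8-bit values for both the unknown X and the inputs Y


def F(x: int, y: int) -> bool:
    """Hypothetical constraint: mask x makes y & x equal y mod 16."""
    return (y & x) == y % 16


def cegis(max_rounds: int = 20) -> Tuple[Optional[int], List[int]]:
    samples = [0]  # initial concrete instance of Y
    for _ in range(max_rounds):
        # Solve: find X satisfying F on the finite samples (brute force stands in for SAT/SMT).
        cand = next((c for c in DOMAIN if all(F(c, y) for y in samples)), None)
        if cand is None:
            return None, samples  # no X works even on the samples
        # Verify: check all Y (exhaustively here); a failing Y becomes a new sample.
        cex = next((y for y in DOMAIN if not F(cand, y)), None)
        if cex is None:
            return cand, samples
        samples.append(cex)
    return None, samples


print(cegis())  # (15, [0, 1, 2, 4, 8])
```

The loop needs only 5 of the 256 possible values of Y before the verifier accepts X = 15.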

Page 55: Logical Reasoning Techniques

• Constraint generation
– Macro-level (control-flow encoding)
  • Invariant based
  • Path based
  • Input based
– Micro-level (operator encoding)
  • Precise or approximate/abstract
• Constraint solving (current)

Page 56: Constraint Solving

Constraints are typically of the form $\exists P\, \forall Y\, F(P,Y)$, with:
• Second-order quantification over unknown program fragments/invariants P
– Can be reduced to first-order using templates.
• Universal (∀) quantifiers over inputs Y
– Can be eliminated using a variety of techniques:
  • Farkas' Lemma [PLDI '08]: applies when F is an arithmetic formula
  • Cover algorithms [PLDI '09]: apply when F is a formula over a given set of predicates
  • Sampling: a general technique

The resulting first-order, existentially quantified constraints can be solved using off-the-shelf SAT/SMT solvers.

Page 57: Farkas Lemma

Constraints are of the form $\exists X\, \forall Y\, F(X,Y)$.

Farkas' Lemma helps translate ∀ into ∃. For linear expressions $e, e_1, e_2$ over Y (with $e_1 \ge 0 \land e_2 \ge 0$ satisfiable):

$\forall Y\, (e_1 \ge 0 \land e_2 \ge 0 \Rightarrow e \ge 0) \iff \exists \lambda_0, \lambda_1, \lambda_2 \ge 0\ \forall Y\, (e \equiv \lambda_0 + \lambda_1 e_1 + \lambda_2 e_2)$

Y can now be eliminated by equating the coefficients on both sides of the polynomial equality.
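A small worked instance (our own example, not from the talk): find the constants a for which a - y ≥ 0 holds for every y with 0 ≤ y ≤ 10, i.e., $e_1 = y$, $e_2 = 10 - y$, $e = a - y$.

```latex
\forall y\,\big(y \ge 0 \land 10 - y \ge 0 \Rightarrow a - y \ge 0\big)
  \iff \exists \lambda_0, \lambda_1, \lambda_2 \ge 0\ \forall y\,
       \big(a - y \equiv \lambda_0 + \lambda_1\, y + \lambda_2\,(10 - y)\big)

% Equate the coefficients of y^0 and y^1:
a = \lambda_0 + 10\,\lambda_2, \qquad -1 = \lambda_1 - \lambda_2

% Hence \lambda_2 = 1 + \lambda_1 \ge 1 and a = \lambda_0 + 10 + 10\,\lambda_1 \ge 10;
% conversely, any a \ge 10 is witnessed by \lambda_1 = 0,\ \lambda_2 = 1,\ \lambda_0 = a - 10.
```

The universally quantified y has disappeared, leaving the purely existential constraint a ≥ 10 on the unknown.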

Page 58: Outline

• Applications
• Dimension 1: User Intent
• Dimension 2: Search Space
• Dimension 3: Search Technique

Page 59: Multi-disciplinary Effort Required

• User intent
– Human-computer interaction
– Natural language processing
• Domain expertise (for the search space)
– Information extraction (for text manipulation)
– Graphics (for image manipulation)
– Mathematics/physics (for teaching domains)
• Search techniques
– Logical reasoning
– Machine learning

Page 60: Research Questions

• How to combine various forms of user intent (logic, natural language, input/output examples, partial programs) in a unified programming interface?
• How to ensure a modular architecture that allows reuse of domain knowledge and search techniques across different synthesis tools/applications?
• How to combine the power of different search techniques?
– Version space algebras
– Logical reasoning techniques
– Machine learning techniques