Exact Programming by Example
Dana Drachsler Cohen
Technion - Computer Science Department - Ph.D. Thesis PHD-2017-09 - 2017
Exact Programming by Example
Research Thesis
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy
Dana Drachsler Cohen
Submitted to the Senate
of the Technion — Israel Institute of Technology
Sivan 5777, Haifa, June 2017
This research was carried out under the supervision of Prof. Eran Yahav, in the Faculty of
Computer Science.
Some results in this thesis have been published as articles by the author and research collaborators
in conferences and journals during the course of the author’s doctoral research period; the most
up-to-date versions are:
Nader Bshouty, Dana Drachsler-Cohen, Martin T. Vechev, and Eran Yahav. Learning disjunctions of predicates. In Proceedings of the 30th Conference on Learning Theory, COLT 2017, 2017.
Dana Drachsler-Cohen, Sharon Shoham, and Eran Yahav. Synthesis with abstract examples. In Computer Aided Verification - 29th International Conference, CAV 2017, 2017.
Dana Drachsler-Cohen, Martin T. Vechev, and Eran Yahav. Optimal learning of specifications from examples (in preparation). CoRR, abs/1608.00089, 2016.
ACKNOWLEDGEMENTS
First and foremost, I would like to thank Prof. Eran Yahav, who I have been fortunate to have
as my advisor. Thank you for your contagious enthusiasm that kept me optimistic throughout
my studies. Thank you for so many insightful discussions, especially those during late nights
before deadlines. Thank you for teaching me how to write in a simple and elegant way, how to
find and explain the essence of any complex idea, and how to shorten my sentences (though we
might have to keep working on this...). Thank you for teaching me to always pursue the most
interesting research questions and overcome any challenge along the way. But above all, thank
you for the endless belief in me. For all these and more, I will be forever grateful.
I would also like to thank my collaborators who contributed greatly to this thesis. To Prof.
Martin Vechev, thank you for the late hours and for the discussions and advice, for the short and
long term. To Prof. Nader H. Bshouty, thank you for the great help with the theoretical aspect
of this thesis; I have learned so much from you. Finally, to Prof. Sharon Shoham, thank you for
the long hours, for teaching me how to always look for ways to simplify ideas, algorithms and
proofs, and how to track obscure pitfalls and elegantly overcome them.
Last but not least, I would like to thank my parents, Ilana and Gabriel, my sister, Dorin,
and my beloved husband, Gal. Thank you for the support through the intense times, for always
putting things in perspective, and above all, for your unconditional love and belief in me. This
thesis is dedicated to you.
The generous financial help of the Technion is gratefully acknowledged.
Contents
List of Figures
Abstract 1
1 Introduction 3
1.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.1 Program Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.2 Exact Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Preliminaries 11
2.1 Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Exact Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Time-Series Patterns from Charts 13
3.1 The Challenge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Definitions and Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.1 Technical Analysis Terms . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.2 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3 Learning Patterns from Charts . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3.1 Learning through Examples . . . . . . . . . . . . . . . . . . . . . . . 17
3.3.2 Learning with an Initial Positive Example . . . . . . . . . . . . . . . . 18
3.4 Synthesizing Code from Formulas . . . . . . . . . . . . . . . . . . . . . . . . 25
3.4.1 The AmiBroker Trading Platform . . . . . . . . . . . . . . . . . . . . 26
3.4.2 Generating AFL Code . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.4.3 Supporting Numerical Constraints . . . . . . . . . . . . . . . . . . . . 28
3.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.5.1 Common Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.5.2 The Efficiency of the Synthesis Process . . . . . . . . . . . . . . . . . 29
3.5.3 The Quality of the Synthesized Queries . . . . . . . . . . . . . . . . . 32
3.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4 Learning Disjunctions and Conjunctions of Predicates 37
4.1 The Search Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.1.1 The Nodes of the Search Space . . . . . . . . . . . . . . . . . . . . . 38
4.1.2 The Edges of the Search Space . . . . . . . . . . . . . . . . . . . . . . 38
4.2 Searching the Space with Witnesses . . . . . . . . . . . . . . . . . . . . . . . 40
4.3 The D-SPEX Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4 The C-SPEX Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.5 A Polynomial Time Algorithm for Variable Inequalities . . . . . . . . . . . . . 47
4.5.1 Acyclic Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.5.2 Cyclic Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5 Learning a DNF of Predicates 55
5.1 The Search Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.2 Searching the Space with Witnesses . . . . . . . . . . . . . . . . . . . . . . . 56
5.3 Learning when Predicates are Closed under Negation . . . . . . . . . . . . . . 59
5.3.1 A Lower and Upper Bound . . . . . . . . . . . . . . . . . . . . . . . . 59
5.3.2 Learning with Representative Positive Examples . . . . . . . . . . . . 60
5.4 Learning when Predicates are Anti-closed under Negation . . . . . . . . . . . . 67
5.4.1 The Search Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.4.2 A Learning Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6 Synthesis with Abstract Examples 71
6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
6.2 Abstract Specifications and Sequence Expressions . . . . . . . . . . . . . . . . 73
6.2.1 Abstract Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.2.2 Sequence Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.2.3 Sequence Expressions as Abstract Examples . . . . . . . . . . . . . . 75
6.3 An Algorithm for Learning Abstract Examples . . . . . . . . . . . . . . . . . 77
6.3.1 Input Generalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.3.2 Completion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.3.3 Guarantees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3.4 Running Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.4 Synthesis with Abstract Examples . . . . . . . . . . . . . . . . . . . . . . . . 82
6.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.5.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.5.2 Synthesis Framework Evaluation . . . . . . . . . . . . . . . . . . . . . 86
6.5.3 Abstract Example Specification Evaluation . . . . . . . . . . . . . . . 88
6.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7 Conclusion 91
Hebrew Abstract i
List of Figures
1.1 Using Flash Fill to send meeting appointments. . . . . . . . . . . . . . . . . . 4
1.2 The differences between (classic) program synthesis, programming by example
(PBE), and exact programming by example. In program synthesis, an expert
user provides a specification in the form of a logical formula and the synthesizer
returns a program meeting the specification. In PBE, an end user provides a
set of input-output examples and the synthesizer returns a program consistent
with the examples, but possibly not fully capturing the user’s intent. In exact
PBE, the synthesizer learns the user’s intent by interacting with the user through
examples. Then, the synthesizer returns a program that captures the user’s intent. 5
3.1 (a) A price chart (from Yahoo Finance). (b) The head and shoulders pattern
(from [Inv]). (c) The complete synthesized program for the head and shoulders
pattern. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Example of the head and shoulders pattern. . . . . . . . . . . . . . . . . . . . 15
3.3 The new patterns (figures taken from [Inv]). . . . . . . . . . . . . . . . . . . . 30
3.4 Recall as a function of the number of questions presented in the learning process. 35
6.1 SE grammar: σ ∈ Σ, x ∈ x, X ∈ X, k ∈ K, R ∈ R, f ∈ F . . . . . . . . . . . 74
6.2 Detailed results for B(8). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Abstract
The vast majority of computer users do not know how to code, and thus can leverage
computers only to the extent provided by off-the-shelf software. The abundance of software
products targeting the same domains demonstrates that such software can be too complex for users or
not sufficiently suited to their needs. Programming by example (PBE) has flourished in recent
years to mitigate exactly this problem and enable users to write their own programs by describing
their intent only through examples, without writing or examining a single piece of code. An
inherent problem of PBE is that examples often under-specify the full intent of the users and thus
PBE algorithms must heuristically choose one program from many non-equivalent programs.
While this approach has been shown to be successful in some cases, it cannot guarantee that the
user’s intent will be fully captured, and is thus impractical in many cases.
In this work, we study the problem of learning the exact user intent from examples. We
model user intent as formulas over an arbitrary set of predicates and study several classes of
formulas. We start with conjunctive formulas capturing patterns in streams. We then study
conjunctive and disjunctive formulas over arbitrary predicates. We finally study the class of
disjunctive normal form (DNF) formulas, which implies that any formula can be learned. These
algorithms are inspired by exact learning algorithms that were shown to be successful in other
domains (e.g., learning automata). Our setting is novel in that previous works limited the
types of predicates, and with them the expressibility of user intent.
In the last part of this work, we define the notion of abstract examples and show that it
can help to drastically reduce the number of examples posed in the learning process. Abstract
examples provide a middle ground between concrete examples and formulas that describe
program behavior on multiple examples. We show an algorithm that describes a program
through abstract examples. This algorithm can extend previous PBE synthesizers with the ability
to communicate to the user a candidate program in an intuitive language. User acceptance
of a set of abstract examples covering the input domain implies that the candidate program
is guaranteed to capture the user’s intent. We exemplify this approach on the string and bit vector
domains.
Chapter 1
Introduction
Program synthesis is the task of automatically generating a (low-level) program from a (high-
level) specification. The specification is often declarative and does not explain how the program
should be implemented. Thus, synthesizers cannot syntactically translate specifications to
executable code, like compilers do. The first synthesizers coped with this challenge using
deductive and transformational methods [MW71]. The nice aspect of these synthesizers is
that the output programs are correct-by-construction. Their disadvantage, however, is that the
choice of rules is non-deterministic, and thus they are not guaranteed to terminate. Though some
techniques considered heuristics to improve the rules chosen [JNR02], modern synthesizers have
turned to constraint-solving approaches [SLJB08] or to enumerative approaches [SSA13]. In
these approaches, specifications are a set of constraints and any solution meeting the constraints
is considered valid. The premise of these approaches is that the specification is complete.
Namely, any solution meeting the specification is also globally correct, even on inputs not
covered by the specification.
At the same time, the setting of programming by example (PBE) has gained popularity [Gul10, LWDW03, DSPGMW10, HG11, Gul11, GHS12, SG12, YTM+13, AGK13, ZS13,
MTG+13, LG14, FCD15, BGHZ15, PG15, SG16, RBVK16]. In programming by example, the
specification is a set of input-output examples. Compared to other synthesis settings, where
the specification is a logical formula or an inefficient program implementation, PBE requires
no a priori knowledge on how to represent the specification. Thus, if in former settings, the
users had to be experts or programmers, in PBE the users can be any end user. This means
that the target audience is significantly larger, which makes the potential impact of PBE much
greater. The premise behind this setting is that users can convey their intent with a few examples.
Unfortunately, this is not true and examples inherently provide an under-specification of the
user’s intent. Thus, PBE algorithms can only guarantee to output a program consistent with the
provided examples, and cannot guarantee to capture user intent on unseen inputs. A user who
wishes to guarantee correctness for all possible inputs has to manually inspect the synthesized
program, an error-prone and challenging task.
Example Eli Gold is a crisis manager at a respected law firm. Due to a crisis, he has to meet all
office members personally. After setting up times and storing the meeting times in an Excel
Figure 1.1: Using Flash Fill to send meeting appointments.
spreadsheet (Fig. 1.1), Eli wants to send emails with a personal message notifying each member
of the time of the meeting. He starts typing the messages in Excel. While typing the third
message, Flash Fill [Gul11] (a PBE synthesizer integrated in Excel) synthesizes a program
and creates messages for all members on the list. In our example, the input-output examples
Flash Fill takes are the first two rows, with columns A–D serving as the input and column
E serving as the output. The rest of the rows are unseen inputs, which the synthesizer does
not consider when generating the output program but executes it on them when it completes.
Without describing the full details of the Flash Fill operation, suffice it to say that it considers
programs over strings that can copy a substring from the input, add a constant, and concatenate
strings. Flash Fill is designed to be invoked when it detects a pattern. With respect to the e-mail
from our crisis manager, this means that after the first two examples, Flash Fill has detected a
pattern. Thus, it synthesizes a program and fills in the missing outputs. At first glance, Flash
Fill seems to have learned the correct program. However, careful inspection reveals that instead
of the desired “Hi” greeting, the message’s first word is an “H” followed by the second letter
of the person’s first name. This demonstrates the importance of inspecting the synthesis result
before relying on it to handle additional examples (e.g., lines 4–8 in the Excel spreadsheet).
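The ambiguity in this example can be made concrete with a small sketch. The two candidate programs below are illustrative and hypothetical (they are not Flash Fill's actual hypothesis space): both agree on the two example rows, yet they diverge on an unseen name.

```python
# Two toy string programs, both consistent with the two provided examples
# "Diane ... 11:00" -> "Hi Diane, ..." and "Will ... 12:00" -> "Hi Will, ...".
# These are illustrative candidates, not Flash Fill's actual internals.

def intended(first_name: str, time: str) -> str:
    # The user's intent: a constant "Hi" greeting.
    return f"Hi {first_name}, please come to my office at {time}. -EG"

def learned(first_name: str, time: str) -> str:
    # A spurious program: "H" concatenated with the 2nd letter of the name.
    # It happens to agree on "Diane" (H+i) and "Will" (H+i).
    return f"H{first_name[1]} {first_name}, please come to my office at {time}. -EG"

# Both programs agree on the two provided examples...
assert intended("Diane", "11:00") == learned("Diane", "11:00")
assert intended("Will", "12:00") == learned("Will", "12:00")
# ...but diverge on an unseen input: "Ha Cary, ..." instead of "Hi Cary, ...".
assert intended("Cary", "15:00") != learned("Cary", "15:00")
```

This is exactly the failure mode above: any finite example set leaves many non-equivalent programs consistent with it.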
In this thesis, we show algorithms that are guaranteed to learn the user’s intent on all inputs,
while still enabling the user to communicate through examples. To this end, we formalize the
problem of learning user intent from examples as an instance of the exact learning problem1.
Exact learning is a field in computational learning theory that is usually associated with one
of the following models: (i) identification in the limit (Gold [Gol67]), (ii) PAC learning (Valiant [Val84]), and (iii) query learning (Angluin [Ang88]). In this thesis, we follow Angluin’s
model, which includes a teacher and a student. The teacher knows a concept and the student’s
goal is to learn this concept. To this end, Angluin defines two types of queries that the student
can pose: membership queries and equivalence queries. The secondary goal of the student
is to pose as few queries as possible. In our context, the teacher is the end user, the student
is the synthesizer, and the concept is a formula describing the user’s intent on all possible
inputs. A membership query is whether a given input-output pair satisfies the target formula.
An equivalence query (or, validation query) is whether a certain formula describes the user’s
intent. If the teacher accepts a validation query, the learning is complete. If not, the teacher
provides a counterexample. Though validation queries are inapplicable in PBE settings, we
note that there are works in synthesis that take this approach [ABJ+13, IGIS10, SL08]. In
these works, the teacher is realized as a verifier with a formal specification (rather than a user).
1This is why the thesis is titled Exact Programming by Example.
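The teacher-student interaction can be sketched in a few lines. Everything below is an illustrative assumption: a tiny finite domain, a teacher simulated as a known set, and a naive student that processes one counterexample per round. It is not one of the algorithms of this thesis, only a minimal instance of the membership/equivalence query model.

```python
# A minimal sketch of Angluin-style exact learning with membership and
# equivalence (validation) queries, over a tiny finite domain.

DOMAIN = list(range(8))                       # hypothetical input domain
TARGET = {x for x in DOMAIN if x % 2 == 0}    # the concept the teacher knows

def membership_query(x):
    # "Does input x belong to the target concept?" -> yes/no
    return x in TARGET

def equivalence_query(hypothesis):
    # "Is this hypothesis the target?" Returns (True, None) on acceptance,
    # otherwise (False, counterexample).
    diff = TARGET.symmetric_difference(hypothesis)
    return (True, None) if not diff else (False, min(diff))

def learn():
    hypothesis = set()
    while True:
        ok, counterexample = equivalence_query(hypothesis)
        if ok:
            return hypothesis
        # Classify the counterexample with a membership query and update.
        if membership_query(counterexample):
            hypothesis.add(counterexample)
        else:
            hypothesis.discard(counterexample)

assert learn() == TARGET
```

In the PBE reading, `membership_query` asks the user about one input-output pair, and `equivalence_query` asks the user to validate a candidate; the student's goal is to minimize the number of such questions.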
[Figure 1.2 shows three panels: program synthesis (a logical specification φ over strings, e.g., φ(X, Y, Z) = concat("Hi", concat(X, concat("please come to my office at", concat(Z, " -EG")))), and a program P ⊨ φ); PBE (two input-output examples, in1 → out1 = Diane Lockhart 11:00 → Hi Diane, please come to my office at 11:00. -EG and in2 → out2 = Will Gardner 12:00 → Hi Will, please come to my office at 12:00. -EG, and a program P consistent with them); and exact PBE (the same examples plus interaction, e.g., the query Cary Agos 15:00 → ? answered by Hi Cary, please come to my office at 15:00. -EG, yielding P ⊨ φ).]
Figure 1.2: The differences between (classic) program synthesis, programming by example (PBE), and exact programming by example. In program synthesis, an expert user provides a specification in the form of a logical formula and the synthesizer returns a program meeting the specification. In PBE, an end user provides a set of input-output examples and the synthesizer returns a program consistent with the examples, but possibly not fully capturing the user’s intent. In exact PBE, the synthesizer learns the user’s intent by interacting with the user through examples. Then, the synthesizer returns a program that captures the user’s intent.
The formal specification provides an efficient way to answer validation questions automatically. Our formulation of PBE as an instance of exact learning is novel. Except for a single
work [JGST10] that exhaustively presents membership queries until a single program remains,
no PBE synthesizers guarantee to learn the user’s intent, beyond the provided examples. While
the approach in [JGST10] guarantees to output a program that fully captures the user’s intent, it
has no non-trivial bounds on the number of membership queries posed. In contrast, in exact
learning the effectiveness of an algorithm is demonstrated by analyzing the membership query
complexity and comparing to the lower bound.
To illustrate the gap our approach addresses, consider Fig. 1.2, which continues our example
from Fig. 1.1. Program synthesis targets settings where an (expert) user provides a formal
specification, e.g., a logical formula (in Fig. 1.2, the specification is a first order formula over the
string theory) and the synthesizer looks for a program that meets this specification. While this
approach guarantees to synthesize programs meeting the specification, writing the specification
is not trivial. Programming by example targets settings where a user provides a set of input-
output examples, and the synthesizer looks for a program consistent with the examples. While
this approach provides an intuitive way to convey the user’s intent, it does not guarantee to
synthesize a program that captures the user’s intent on all possible inputs. Exact programming
by example targets the gap between these approaches – the user and synthesizer interact to learn
the user’s intent through examples. Then, the synthesizer synthesizes a program that guarantees
to capture the user’s intent.
In this thesis, we make the following contributions.
An Exact PBE Synthesizer for Time-Series Patterns (Chapter 3) We begin by showing
an exact PBE algorithm that learns patterns in time-series charts. Time-series charts are used
in many domains, including financial analysis ([Bul05]), medicine ([CF07]), and seismology
([MEMlT+10]). We formalize patterns as conjunctive formulas over variable inequalities (i.e.,
predicates of the form xi > xj over n variables) and study the problem of learning this class
of formulas with membership queries (that take the form of charts). In this setting, we assume
the learning begins with an initial positive example (i.e., chart). We then show how to extend
this algorithm with a synthesizer that takes a formula describing a pattern and generates an
executable program that detects this pattern in stock streams. We experimentally evaluate this
algorithm and show that it learns a range of popular chart patterns with few questions, and that
synthesized programs are able to detect popular pattern occurrences with an average precision
of 95% in real stock streams.
We continue by generalizing the learning problem to learn formulas over an arbitrary set
of predicates. These algorithms make it possible to split the exact PBE problem into two sub-
problems: (i) learning a formula describing the user’s intent on all inputs, and (ii) synthesizing a
program from that formula. We first study the class of disjunctions and conjunctions and then
study the class of disjunctive normal form (DNF) formulas.
Exact Learning Algorithms to Learn Disjunctions and Conjunctions (Chapter 4) In this
chapter, we study the learnability of the class of disjunctions over a set of predicates. In
this setting predicates may be dependent, and thus syntactically different formulas may be
semantically equivalent. Thus the challenge is to identify the non-equivalent formulas to avoid
posing redundant membership queries. Since it is expensive to compute whether two formulas
are equivalent, it is crucial to limit this kind of computation as much as possible. In this chapter,
we present an algorithm that traverses the space of non-equivalent formulas, but in a lazy fashion
– it computes members of the space only when doing so is required for the learning. We then
present the dual algorithm to learn the class of conjunctions. Lastly, we revisit the problem of
learning patterns from charts, but without the requirement that the user provide initial examples.
In this case, the class of formulas can express cyclic constraints (e.g., x1 < x2 ∧ x2 < x1). We
thus first study the class which does not permit cyclic constraints. We show that for this class,
learning can be done in polynomial time. We then study the general case, and show that learning
is equivalent to the problem of enumerating all the maximal acyclic subgraphs of a directed
graph, which is still an open problem ([ABC+12, BCL+13, Was16]).
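The acyclic/cyclic distinction has a natural graph reading, sketched below under our own simplified encoding (a plain DFS cycle check, not the chapter's learning algorithms): a set of inequalities {x_i > x_j} induces a directed graph with an edge i → j per predicate, and the set contains a cyclic constraint exactly when this graph has a directed cycle.

```python
# Cycle check for a set of variable inequalities, viewed as a digraph:
# an edge i -> j for each predicate x_i > x_j. Illustrative sketch only.

def is_acyclic(n, edges):
    adj = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
    state = [0] * n  # 0 = unvisited, 1 = on DFS stack, 2 = done

    def dfs(u):
        state[u] = 1
        for v in adj[u]:
            # A back edge to a node on the stack closes a directed cycle.
            if state[v] == 1 or (state[v] == 0 and not dfs(v)):
                return False
        state[u] = 2
        return True

    return all(state[u] or dfs(u) for u in range(n))

assert is_acyclic(3, [(0, 1), (1, 2)])        # x0 > x1 > x2: consistent order
assert not is_acyclic(2, [(0, 1), (1, 0)])    # x0 > x1 and x1 > x0: cyclic
```

The general case then asks for all maximal acyclic subgraphs of such a digraph, which is the open enumeration problem cited above.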
Exact Learning Algorithms to Learn DNF Formulas (Chapter 5) We continue with a
study of the class of disjunctive normal form (DNF) formulas over arbitrary predefined predicates.
We begin with a general algorithm and then focus on two special settings where the set of
predicates is: (i) closed under negation, and (ii) “anti-closed” under negation. We show for each
setting an algorithm with better query complexity. In particular, for the first setting, we show an
algorithm optimal in the number of membership queries.
This chapter actually completes the topic of learning specifications from examples, as for
any formula there is an equivalent DNF formula. Thus, in the last chapter we show a different
approach to obtain exactness in PBE.
Synthesis from Abstract Examples (Chapter 6) In this chapter we show an exact learning
algorithm that takes a different approach from the previous chapters, where the task of learning
user intent is separate from the program synthesis task (and the former task is the primary
focus). However, PBE experts often believe that the program space itself should drive the
search to the target program. Thus, in the final chapter (which we believe opens a new field
for future work), we show how to obtain exactness while performing the search in the program
space. The main idea is to interact with the user through abstract examples, to be used by
the program synthesizer to communicate its behavior. The abstract examples serve as an
intuitive specification for candidate programs. Thus, through abstract examples, the final
candidate program is guaranteed to capture the user’s intent on all inputs. We have implemented
our approach and we experimentally show that our synthesizer communicates with the user
effectively by presenting on average 3 abstract examples until the user rejects false candidate
programs. Further, we show that a synthesizer that prunes the program space based on the
abstract examples reduces the overall number of required concrete examples in up to 96% of the
cases.
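The idea of an abstract example can be sketched concretely. The thesis's sequence expressions are richer than what follows; here we render one abstract example as a plain regular expression over the inputs, purely as an illustrative stand-in, to show how a single abstract example summarizes infinitely many concrete input-output examples.

```python
# One abstract example standing for a set of concrete examples.
# Illustrative stand-in: the real sequence-expression language of
# Chapter 6 is richer than a regular expression.
import re

# "Any name followed by a time maps to the greeting built from them":
abstract_input = re.compile(r"(\w+) (\d\d:\d\d)")

def covered(concrete_input, concrete_output):
    m = abstract_input.fullmatch(concrete_input)
    if not m:
        return False
    name, time = m.groups()
    return concrete_output == f"Hi {name}, please come to my office at {time}. -EG"

# A single abstract example covers many concrete examples at once:
assert covered("Cary 15:00", "Hi Cary, please come to my office at 15:00. -EG")
assert not covered("Cary 15:00", "Ha Cary, please come to my office at 15:00. -EG")
```

Accepting a set of such abstract examples that covers the whole input domain is what certifies that the candidate program matches the user's intent everywhere.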
1.1 Related Work
In this section, we survey works in program synthesis and exact learning.
1.1.1 Program Synthesis
Some consider the roots of synthesis to be in the works of mathematicians, who have been
developing algorithmic approaches to prove theorems and solve problems since the 30’s [Kol32,
Gre69, DP60]. However, program synthesis, with the aim of generating a program from some
specification, dates to the end of the 60’s [WL69], when Waldinger and Lee showed an algorithm
that takes a first-order logical formula and generates a LISP program meeting this specification.
Their main idea is to phrase LISP instructions as axioms, the formula as a theorem, and use a
theorem prover to find a “proof”. If found, the proof is processed to a program. This approach,
known later as deductive synthesis, was further developed in the 70’s, mostly by Waldinger
and Manna [MW71, MW79, MW80]. In parallel, Waldinger and Manna [MW75] also showed
an artificial intelligence approach to synthesize programs. The idea was to define rules that
gradually transform a goal specification into a program, where each step introduces new sub-
specifications (goals). Both deductive synthesis and goal-based synthesis rely on having a
complete specification, which is often difficult for users to provide. Thus, at the same time,
another approach to program synthesis was developed, where instead of giving the synthesizer
a formal specification, input-output examples were provided [Har74, SSG75, Sum77, Bie78].
Later this setting became known as programming by example. Another setting that was studied
was that of concrete execution examples, which explain how to obtain the output from the
input. This setting was first presented by Smith [Smi75] and implemented in a system called
Pygmalion. Later this setting became known as programming by demonstration. Since then,
works in synthesis have developed new algorithms and paradigms; however, the settings studied
remained mostly the same. A more recent setting assumes that in addition to a formal specification
a (limited) program syntax is provided [SLJB08, SSL11, AFSS16, BTGC16, ABJ+13]. This is
known as syntax-guided synthesis.
Moving forward to more recent years, the development of hardware and the increasing
size of CPU memory made previously impractical solutions viable. In particular, solutions
such as enumerating the program space [URD+13], succinctly representing all programs with
graphs [Gul11], and looking for solutions with constraint-solvers [SL08] have become effective
in domains that once were too large for the search to complete. A very partial list of the studied
domains includes domains for end users, such as string manipulation programs in spreadsheets [Gul11]; data extraction [LG14] and smartphone applications [LGS13]; domains used
by programmers, such as data-structures [SLJB08, SSL11] and SQL-queries [ZS13]; and the
domain of compilers and optimizations, such as optimization of bit manipulations [JGST10],
compilation to low-power architectures [PJS+14], compilation of general-purpose language programs to optimized DSLs [CKSL15], and optimization of programs interacting with databases
via ORMs [CSLM13].
In this thesis, we focus on the setting of programming by example. This setting has gained
increased popularity over the last fifteen years [Gul10, LWDW03, DSPGMW10, HG11, Gul11,
GHS12, SG12, YTM+13, AGK13, ZS13, MTG+13, LG14, FCD15, BGHZ15, PG15, SG16,
RBVK16] due to its simplicity, which makes it accessible to end users who are not programmers.
The vast majority of PBE algorithms synthesize programs consistent with the input-output
examples, which may not capture the user’s intent on unseen inputs. However, some works
guarantee to output the target program. For example, CEGIS [SL08] learns a program by
introducing equivalence queries upon finding a candidate program that is consistent with the
provided examples. By accepting the equivalence query, the user confirms that his intent has
been captured on all inputs. In oracle-guided synthesis [JGST10], the program space is assumed
to be finite (in fact, it is assumed to be a permutation of a fixed number of instructions). This
enables the posing of membership queries to prune the program space until only one program
remains (more precisely, until all programs that remain are equivalent). While correctness is
guaranteed, there is no guarantee on the number of queries posed, as in each step two non-
equivalent programs are chosen arbitrarily and then an input on which they return different
outputs is presented as a membership query.
1.1.2 Exact Learning
Learning from examples has been extensively studied in computational learning theory. As
mentioned, there are three main models: (i) identification in the limit (by Gold [Gol67]),
(ii) query learning (by Angluin [Ang88]), and (iii) PAC learning (by Valiant [Val84]). While
these models vary in their settings and goals, they all learn languages, and although their query
types differ slightly, each supports queries that ask the teacher whether a specific word belongs
to the target language. Our setting follows Angluin's setting, which defines the teacher-student
model and two types of queries: membership and equivalence (also called validation). In a
membership query the student picks a word in the domain and asks whether it is part of the
target language. The teacher responds with a yes or no. In an equivalence query the student
picks a language and asks whether it is the target language. If the teacher accepts that language,
the learning is complete. Otherwise, the teacher provides a counterexample, that is, a word that
belongs to one language but not to the other.
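To make the two query types concrete, here is a minimal sketch of a teacher over a finite language; the `Teacher` class and its methods are illustrative, not part of the formal model:

```python
# A minimal sketch of the teacher side of Angluin's model (names are ours).
class Teacher:
    def __init__(self, target):
        self.target = target  # the target language, as a finite set of words

    def membership(self, word):
        # Membership query: is this word in the target language?
        return word in self.target

    def equivalence(self, hypothesis):
        # Equivalence query: accept, or return a counterexample word on
        # which the hypothesis and the target disagree.
        diff = self.target.symmetric_difference(hypothesis)
        return (True, None) if not diff else (False, min(diff))

teacher = Teacher(target={"ab", "aab", "aaab"})
assert teacher.membership("ab")
ok, cex = teacher.equivalence({"ab", "aab"})
assert not ok and cex == "aaab"       # "aaab" belongs to one language only
```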
In this thesis, we focus on learning with membership queries only. The literature offers
many results for this setting across numerous applications, including group testing [DH00, DH06], blood
testing [Dor43], chemical leak testing, chemical reactions [AC08], electrical short detection,
codes, multi-access channel communications [BG07], molecular biology, VLSI testing, AIDS
screening, whole-genome shotgun sequencing [ABK+02], DNA physical mapping [GK98],
game theory [Pel02], and many other applications [DH00, ND00, BGV05, DH06, Cic13, BG07].
However, no prior work has studied the class we address: formulas over an arbitrary set of predicates.
The setting of learning with membership queries is also related to the notion of teaching
dimension. Goldman and Kearns [GK95] define the teaching dimension of a concept class as
the minimum number of examples a teacher must reveal to uniquely identify any concept in
the class. In particular, Goldman and Kearns have studied the teaching dimension for the class
of formulas over monomials and provided lower bounds on the required number of questions.
In this thesis, we study the class of formulas over an arbitrary set of predicates and analyze
the query complexity compared to the lower bound (i.e., the teaching dimension). Due to the
potential correlations between predicates, their approach is inapplicable in our setting.
Chapter 2
Preliminaries
In this section, we provide preliminaries and notations used throughout the thesis.
2.1 Formulas
In this section, we define common terminology related to sets and formulas.
Throughout the thesis, we focus on quantifier-free first-order logical formulas (or simply,
formulas) restricted to predicate symbols. Formally, we focus on the following class of formulas:
Definition 2.1.1. Let x0, x1, ... be an infinite set of variables and Q be a set of predicate symbols.
The set of formulas is defined inductively:
• Predicate symbols: If q is an n-ary predicate symbol, and xi1 , ..., xin are variables, then
q(xi1 , ..., xin) is a formula.
• If ϕ is a formula, then ¬ϕ is a formula.
• If ϕ and ψ are formulas, then ϕ ∨ ψ and ϕ ∧ ψ are formulas.
We follow the common semantics to evaluate the truth value of a formula. In the following,
we abuse the term predicate to refer both to its syntactic meaning (the symbol that is part of the
formula) and to its semantic meaning (a set).
A formula ϕ is called disjunctive (resp. conjunctive), or a disjunction (resp. conjunction), if
it is of the form ∨q∈Q q (resp. ∧q∈Q q), where Q is a set of literals. The set of literals over Q is
L = Q ∪ {¬q | q ∈ Q}. The set of literals L is closed under negation, where we equate ¬¬l with l.
A formula ϕ is called a DNF if it is of the form (∧q∈Q1 q) ∨ ... ∨ (∧q∈Qk q), where the Qi are
sets of literals. In the following, we denote by ∨Q the disjunctive formula ∨q∈Q q and by ∧Q the
conjunctive formula ∧q∈Q q, where Q is a set of literals.
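These definitions can be sketched programmatically; the representation of literals as (predicate, polarity) pairs and of assignments as sets of satisfied predicates is ours:

```python
# Literals are (predicate_name, positive?) pairs; an assignment is the set of
# predicate names that hold. This mirrors ∨Q, ∧Q and DNF from the text.
def holds(literal, assignment):
    pred, positive = literal
    return (pred in assignment) == positive

def disj(literals, assignment):      # ∨Q
    return any(holds(l, assignment) for l in literals)

def conj(literals, assignment):      # ∧Q
    return all(holds(l, assignment) for l in literals)

def dnf(clauses, assignment):        # (∧Q1) ∨ ... ∨ (∧Qk)
    return any(conj(c, assignment) for c in clauses)

a = {"q1"}                           # q1 holds, q2 does not
assert disj([("q1", True), ("q2", True)], a)
assert not conj([("q1", True), ("q2", True)], a)
assert dnf([[("q1", True), ("q2", False)], [("q2", True)]], a)
```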
2.2 Exact Learning
We follow an exact learning model that is heavily inspired by Angluin's exact learning
model [Ang88]. Given a domain D and a set Q of predicates over D, our goal is to learn a member
ψ in a certain class of formulas over Q, denoted Qclass, where class is either ∨ (the class
of disjunctions over Q), ∧ (the class of conjunctions over Q), or DNF (the class of DNF formulas
over Q). That member ψ, also called the target formula, defines a subset H (the hypothesis) of D
and thus has a single free variable. Formally, denote by M the model whose domain is D and
whose predicates are those in Q. Then, for all e ∈ D, M |= ψ[e] if and only if e ∈ H. If D is a
Cartesian product of size n, i.e., D = X × X × ... × X (n times), we define ψ with n free variables to
address individual elements in the tuples. We sometimes treat formulas as Boolean functions
over D, defined as follows: for all e ∈ D, ϕ(e) = 1 if M |= ϕ[e], and ϕ(e) = 0 otherwise.
We assume a teacher (also called the user) that has a target formula ψ ∈ Qclass and a
learner that knows Qclass but not ψ. The teacher can answer membership queries for the target
function, that is, given e ∈ D (from the learner), the user returns true if e |= ψ and false
otherwise. The goal of the learner (the learning algorithm) is to find the target formula ψ with a
minimum number of membership queries.
Following are a few notations used throughout the thesis. OPT(Qclass) denotes the minimum
worst-case number of membership queries required to learn a formula ψ in Qclass. Given
ψ ∈ Qclass, we denote by Q(ψ) the set of all predicates appearing in ψ. For example,
Q(R1 ∨ R2) = {R1, R2}. Given e ∈ D, we denote by Q(e) the set of all predicates satisfied by
e: Q(e) = {q ∈ Q | e |= q}.
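The notation Q(e) can be sketched as follows (the predicates shown are illustrative, with predicates given as Boolean functions):

```python
# Sketch of Q(e): the set of predicates (as named Boolean functions)
# satisfied by an element e of the domain.
def q_of(e, predicates):
    return {name for name, p in predicates.items() if p(e)}

preds = {"R1": lambda e: e > 0,       # illustrative predicates over integers
         "R2": lambda e: e % 2 == 0}
assert q_of(4, preds) == {"R1", "R2"}
assert q_of(-3, preds) == set()
```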
Chapter 3
Time-Series Patterns from Charts
In this chapter, we show an exact PBE algorithm that learns patterns in time-series charts.
Time-series charts are used in many domains including financial analysis ([Bul05]), medicine
([CF07]), and seismology ([MEMlT+10]). Experts use these charts to predict important events
(e.g., trend changes in a stock price) indicated by special patterns. Common patterns have been
studied extensively, and many software platforms (e.g., MetaTrader, MetaStock, and AmiBroker
for finance analysts) enable these experts to write a program that alerts upon detecting their
customized pattern. Unfortunately, writing programs is a complex task for these experts, who
are not programmers.
We present a novel, interactive synthesis approach that relieves analysts from programming,
and allows them instead to specify their intent directly via visual examples. Our approach is
based on two key ideas: (i) a logical fragment expressive enough to capture interesting chart
patterns, and (ii) an interactive algorithm which leverages our logical fragment to learn target
queries by presenting the user with a polynomial number of examples to be classified. Our
results apply to any application of time-series patterns; however, in the following we focus
on patterns in financial streams. In particular, to evaluate our approach, we implemented a
procedure that transforms the synthesized queries into directly executable programs in a popular
trading platform. Experimental results show that our synthesizer learns a range of popular chart
patterns with few questions, and that synthesized programs are able to detect popular pattern
occurrences with an average precision of 95% in real stock streams.
3.1 The Challenge
Technical analysis is used by millions of traders for trading various assets, including stocks,
futures, and commodities. Technical analysis tries to predict future price movement based on
past price changes visualized in charts (e.g., Fig. 3.1(a)) and on special forms known as patterns.
The occurrence of a pattern in a chart is used as a predictor of future price trends. For example,
the head and shoulders pattern in Fig. 3.1(b) predicts price decline.
To detect chart patterns, analysts use pattern queries: queries that take an input price stream
and report matches of the patterns in the stream. There are many trading platforms that
(c)
1: Price = Close;
2: thrs = 0.5;
3: P5 = Peak(Price, thrs, 1);
4: PB5 = LastValue(PeakBars(Price, thrs, 1));
5: P3 = Peak(Price, thrs, 2);
6: PB3 = LastValue(PeakBars(Price, thrs, 2));
7: P1 = Peak(Price, thrs, 3);
8: PB1 = LastValue(PeakBars(Price, thrs, 3));
9: P-1 = Peak(Price, thrs, 4);
10: PB-1 = LastValue(PeakBars(Price, thrs, 4));
11: P0 = LLV(Ref(Price, -PB1-1), PB-1-PB1-1);
12: P2 = LLV(Ref(Price, -PB3-1), PB1-PB3-1);
13: P4 = LLV(Ref(Price, -PB5-1), PB3-PB5-1);
14: P6 = LLV(Price, PB5);
15: Filter = P0 < P2 AND P2 < P1 AND P1 < P3 AND P2 < P4 AND P4 < P5 AND P5 < P3 AND P6 < P0;

Figure 3.1: (a) A price chart (from Yahoo Finance). (b) The head and shoulders pattern
(from [Inv]). (c) The complete synthesized program for the head and shoulders pattern.
provide built-in queries; however, analysts typically want to define patterns based on their own
viewpoint [LMW00], ideally via a quick and intuitive process that enables them to adapt queries
after obtaining preliminary results.
There are various domain-specific languages (DSLs) for writing pattern queries; however,
the task of writing queries is complex and error-prone, especially for analysts who are not
expert programmers. For example, Fig. 3.1(c) shows a query written in AFL, a DSL of the
well-known AmiBroker trading platform. This query detects the head and shoulders pattern by
locating seven peaks and lows (P0, ..., P6) in a price stream and checking whether they meet the
conditions that characterize the pattern. To write such queries, analysts are not only required to
know the language primitives (marked in bold in Fig. 3.1(c)), but also how to combine them
correctly in the query. For example, when LLV receives as input a mathematical expression
(Lines 11-13), each operand must be defined using the LastValue operation. Failing to do so
results in an incorrect query.
Current Approaches The interest in technical analysis queries led to the development of DSLs
in many trading platforms (e.g., MetaTrader, MetaStock, Amibroker, NinjaTrader). While these
DSLs offer tailored primitives, they are still strict programming languages and require familiarity
with programming. Microsoft’s StreamInsight [CGM10] allows analysts to express patterns via
state machines; however, they are still required to encode them in C#, which is non-trivial even
for programmers. CPL [ACK01] is a Haskell-based language designed to simplify programming
of chart pattern queries. Yet it requires familiarity with Haskell and functional programming,
which even experienced programmers may not have. All these approaches require analysts
to express chart patterns in strict programming languages. In contrast to these approaches,
we present a new synthesis approach that enables analysts to work directly with visual chart
examples and not with programs. To describe patterns, analysts provide visual chart examples
to our interactive synthesizer, called SyFi. SyFi uses visual examples to learn formulas that
Figure 3.2: Example of the head and shoulders pattern.
capture the patterns, and synthesizes efficient queries from the formulas. The elegance of this
approach is that analysts need not understand formulas or programming languages and only deal
with intuitive and familiar charts.
3.2 Definitions and Problem Definition
In this section, we provide definitions and state the problem addressed.
3.2.1 Technical Analysis Terms
Price Streams A price stream is a function mapping time points (such as date or hour) to
prices. For example, Fig. 3.2 shows a price stream at the resolution of days where each date
is mapped to the stock closing price on that date. To simplify presentation, we assume price
streams are stock closing prices and that time points are dates. Formally, a stream is a function
mapping natural numbers to real-valued prices: S : N → R.
Prices do not move in a straight line, but rather in zigzags (e.g., Fig. 3.1(a)). Still, prices
exhibit trends that are the overall direction of the price stream at a certain period of time.
Trends are determined by the notable peak and low points. Notable can be interpreted in many
ways; thus, to accommodate any interpretation, we henceforth assume an extremum function
ES : N → {0, 1}, defined over a stream S, that flags the extremum points:

ES(i) = 1 ⇔ S(i) is an extremum point.
Let i < j be points such that ES(i) = ES(j) = 1 and ES(i′) = 0 for all i < i′ < j.
We say that i and j show an uptrend if S(i) < S(j), a downtrend if S(i) > S(j), or a sideways
trend if S(i) = S(j).¹ For example, the points in the left part of Fig. 3.1(a) show an uptrend.
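A minimal sketch of this trend classification, using the threshold convention from the footnote (the threshold value and stream are illustrative):

```python
# Classify the trend between two consecutive extremum points i < j of a
# price stream S, using a comparison threshold (value is illustrative).
def trend(S, i, j, thrs=0.5):
    if S[j] > S[i] + thrs:
        return "up"
    if S[i] > S[j] + thrs:
        return "down"
    return "sideways"

S = [10.0, 12.0, 11.9, 9.0]          # an illustrative price stream
assert trend(S, 0, 1) == "up"
assert trend(S, 1, 2) == "sideways"  # |12.0 - 11.9| is within the threshold
assert trend(S, 2, 3) == "down"
```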
Line Charts Streams are inspected in bounded time frames, known as charts. A price chart is
a function mapping a finite set of consecutive dates to their corresponding prices. A line chart is
a chart that shows the line connecting the daily closing prices (e.g., Fig. 3.1(a)). We focus on
line charts because many analysts believe that the closing price is the most significant indicator
of price activity, and thus believe that line charts are a more indicative measure of this activity
¹Practically, one defines a threshold thrs and defines S(i) > S(j) if S(i) > S(j) + thrs, S(i) < S(j) if S(i) + thrs < S(j), and S(i) = S(j) if S(i) < S(j) + thrs and S(i) + thrs > S(j).
than other chart types. Formally, given a stream S, a chart from date d of size k is a function
mapping k consecutive dates starting from d to their prices, Sd..k : {0, ..., k−1} → R, such that:

Sd..k(i) = S(d + i)

When referring to an arbitrary chart, where the starting date is not important, we write S0..k.
Patterns Technical analysts predict future trends based on past trends that form into a known
pattern. Namely, a pattern is a sequence of trends. For example, the head and shoulders pattern
(Fig. 3.2) consists of three uptrends each followed by a downtrend. We note that while the trend
sequence is the main characteristic of a pattern, there are other characteristics, such as stock
volume ([Bul05]), which are ignored in this work. Formally, a pattern is a formula defined over
n variables, p0, ..., pn−1, that belongs to the class Qn∧, where:

Qn = {pi≺pj, ¬(pi≺pj) | 0 ≤ i, j < n}
For example, the following is a head and shoulders formula:
ϕHS(p0, . . . , p6) = p0≺p1 ∧ p2≺p1 ∧ p1≺p3 ∧ p5≺p3 ∧ p4≺p5 ∧ p6≺p5
The size of a pattern ϕP ∈ Qn∧, denoted by |ϕP |, is the maximal index of the variables. For
example, |p1≺p3| = 3.
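As a small sanity check (the concrete extremum values are ours), ϕHS can be evaluated directly by interpreting ≺ as < on the extremum values:

```python
# ϕ_HS from above, with ≺ interpreted as < on prices.
def phi_hs(p):
    p0, p1, p2, p3, p4, p5, p6 = p
    return (p0 < p1 and p2 < p1 and p1 < p3 and
            p5 < p3 and p4 < p5 and p6 < p5)

# Illustrative extremum values: shoulders (p1, p5) below the head (p3).
assert phi_hs([1.0, 3.0, 2.0, 5.0, 2.0, 3.0, 1.0])
assert not phi_hs([1.0, 3.0, 2.0, 2.5, 2.0, 3.0, 1.0])  # head not above shoulders
```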
A price chart Sd..k meets a pattern ϕP of size n if its extremum points are a model of ϕP:

Sd..k |= ∃i0, ..., in−1. [∧j∈{i0,...,in−1} 0 ≤ j < k] ∧ ∀j. [j ∈ {i0, ..., in−1} ⇔ ES(d + j) = 1] ∧ ϕP(Sd..k(i0), ..., Sd..k(in−1))
3.2.2 Problem Definition
We address the problem of synthesizing pattern queries from price charts. We split this problem
into two parts:
• Given a price chart, interactively learn a formula ϕP that captures the desired pattern.
• Given a formula, synthesize an executable query that detects charts that meet the pattern
in a stream.
The first task is exactly the problem of learning the class Qn∧ over the domain D = Rn, where
we assume a user who can answer membership queries. Technically, the membership queries
are charts – a sequence of real numbers uniquely defines a price chart – which implies that the
interaction is based on (visual) charts. The second task completes the first task by synthesizing
a query in a programming language. While this task is more straightforward, it requires
detecting charts that have n extremum points and reporting to the user when they meet the pattern.
3.3 Learning Patterns from Charts
In this section, we describe our algorithm for learning formulas from charts, called SyFi
(standing for synthesis of finance queries). We begin with the main insight that guides the
algorithm, then provide the algorithm, and finally illustrate it on an example.
3.3.1 Learning through Examples
In this section, we give a few results that guide our algorithm.
Lemma 3.3.1. Let ψ be the target pattern of size n and let S0..n be a chart. If S0..n |= ψ, then
for any q ∈ Qn such that S0..n ⊭ q, we have ψ ⊭ q. In particular, q ∉ Q(ψ).
Proof. Since S0..n |= ψ and S0..n ⊭ q, it follows that ψ ⊭ q. Since ψ is a
conjunction, if it does not logically imply q, then q ∉ Q(ψ).
Lemma 3.3.2. Let ψ be the target pattern of size n and let Q ⊆ Qn be a set of predicates such
that Q(ψ) ⊆ Q. If q ∈ Q is such that (∧(Q \ {q})) ⊭ ψ, then q ∈ Q(ψ).
Proof. Assume for contradiction that q ∉ Q(ψ), namely Q(ψ) ⊆ Q \ {q}. Then, it must be that
(∧(Q \ {q})) |= ψ, a contradiction.
These lemmas guide our algorithm. It maintains a set Q′ that is known to be a superset of
Q(ψ), and at each step it picks a predicate q and generates a chart. If this chart is a positive
example, then q is not in Q(ψ); otherwise, q is in Q(ψ). Namely, the high-level algorithm is:
Algorithm 1: High-level SyFi
1 Q′ = ?
2 for q ∈ Q′ do
3   Sq = model(∧(Q′ \ {q}) ∧ ¬q)
4   if Sq == null then ?
5   if ψ(Sq) = 1 then // Pose a membership query
6     Q′ = Q′ \ {q} // By Lemma 3.3.1, q ∉ Q(ψ)
7   else
8     // do nothing // By Lemma 3.3.2, q ∈ Q(ψ)
9 return ∧Q′ // At this point, Q′ = Q(ψ)
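A runnable sketch of this high-level loop on a toy predicate set, assuming the initialization question is resolved by passing Q′ explicitly; `find_model`, the toy predicates, and the naive search are our stand-ins for the `model(...)` call:

```python
from itertools import permutations

# Sketch of the high-level loop (Algorithm 1). `membership` stands for the
# user's oracle; `find_model` stands for model(...) and returns a chart
# satisfying (∧(Q' \ {q})) ∧ ¬q, or None if no such chart exists.
def high_level_syfi(Q0, membership, find_model):
    Q = set(Q0)
    for q in list(Q):
        chart = find_model(Q - {q}, q)
        if chart is None:
            continue                  # no separating chart for q
        if membership(chart):         # positive example: drop q (Lemma 3.3.1)
            Q.discard(q)
        # negative example: keep q (Lemma 3.3.2)
    return Q                          # stands for the conjunction ∧Q

# Toy instance: order predicates over 3-point charts (names are ours).
PREDS = {"p0<p1": lambda c: c[0] < c[1],
         "p1<p2": lambda c: c[1] < c[2],
         "p0<p2": lambda c: c[0] < c[2]}

def find_model(keep, negated):
    # Naive model finder: enumerate small charts instead of calling a solver.
    for c in permutations((1, 2, 3)):
        if all(PREDS[q](c) for q in keep) and not PREDS[negated](c):
            return c
    return None

target = lambda c: c[0] < c[1] < c[2]      # ψ = p0<p1 ∧ p1<p2
learned = high_level_syfi(PREDS, target, find_model)
```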
If for every q there exists a chart Sq, the lemmas guarantee that this algorithm returns ψ.
Thus, the questions that remain to be addressed are:
• How should Q′ be initialized?
• How can it be guaranteed that Sq exists for every q?
While a natural candidate for initializing Q′ is Qn (i.e., all possible predicates), doing so would
result in no Sq for any q (assuming |Q| > 2). This follows since Qn is closed under negation,
and thus starting from Qn means that Q′ contains a pair of a predicate and its negation.
Instead, we address both questions by assuming that the learning process starts from a
positive example provided by the user, which is the topic of the next section. This assumption
Algorithm 2: SyFi(Su0..n)
1 Q′ = Q(Su0..n) = {q ∈ Qn | Su0..n |= q}
2 Qψ = ∅ // The set of predicates logically implied by ψ
3 while Q′ \ Qψ ≠ ∅ do
4   pi≺pj = argmin over pi≺pj ∈ Q′\Qψ of |Su0..n(i) − Su0..n(j)|
5   S = model(∧(Q′ \ {pi≺pj}) ∧ ¬pi≺pj)
6   if ψ(S) = 1 then // Pose a membership query
7     Q′ = Q′ \ {pi≺pj} // By Lemma 3.3.1
8     S = model(∧(Q′ \ {¬pj≺pi}) ∧ pj≺pi)
9     if ψ(S) = 1 then // Pose a membership query
10      Q′ = Q′ \ {¬pj≺pi} // By Lemma 3.3.1
11    else
12      Qψ = Qψ ∪ {q′ ∈ Q′ | Qψ ∪ {¬pj≺pi} |= q′} // By Lemma 3.3.2
13  else
14    Qψ = Qψ ∪ {q′ ∈ Q′ | Qψ ∪ {pi≺pj} |= q′} // By Lemma 3.3.2
15 return ∧Q′ // At this point, ∧Q′ ≡ ψ
does not impose any burden on the user and it enables our algorithm to be linear in the size of
Qn if there are no equal points. In the next chapter, we provide a different solution that does
not require starting from an example but is not linear in the number of predicates and does not
support the case where there are equal points.
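The minimal-distance selection in line 4 of Algorithm 2 can be sketched as follows (the function name and sample values are ours):

```python
# Pick the predicate p_i ≺ p_j from the candidate set whose endpoints
# have minimal distance |S(i) - S(j)| in the initial chart S.
def pick_min_distance(candidates, S):
    # candidates: iterable of (i, j) index pairs standing for p_i ≺ p_j
    return min(candidates, key=lambda ij: abs(S[ij[0]] - S[ij[1]]))

S = [1.0, 3.0, 2.0, 5.0]
assert pick_min_distance([(0, 1), (2, 1), (1, 3)], S) == (2, 1)  # |2.0 - 3.0| = 1.0
```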
3.3.2 Learning with an Initial Positive Example
In this section, we address the questions that were raised when describing the high-level
algorithm. We address them by assuming the user provides an initial chart that meets the target
pattern. We denote this chart by Su0..n. We address both points by leveraging Su0..n:
• Initializing Q′: We initialize Q′ to {q ∈ Qn | Su0..n |= q}. By Lemma 3.3.1, it is
guaranteed that Q(ψ) ⊆ Q′.
• Guaranteeing that Sq ≠ null: We address this by imposing an order on the predicates
inspected, such that at each point the predicate considered is guaranteed to have
the required chart. The details are provided in the remainder of this section.
We begin by explaining our solution when there are no equal points in the initial chart (namely,
there are no i, j such that ¬pi≺pj, ¬pj≺pi ∈ Q(Su0..n)), and then explain how to extend this solution
when there are equal points.
Learning with No Equal Points If there are no equal points, then for every pair of points
i ≠ j, either pi≺pj, ¬pj≺pi ∈ Q′ or pj≺pi, ¬pi≺pj ∈ Q′. Our next lemma shows that if
one considers the predicate pi≺pj such that S(i) and S(j) are at minimal distance, then Spi≺pj
and S¬pj≺pi exist.
Lemma 3.3.3. Let ψ be the target pattern of size n, let Su0..n be the initial chart, and let
Q ⊆ Q(Su0..n) be a set such that:
• Q(ψ) ⊆ Q,
• ∧Q is satisfiable, and
• For every i ≠ j and for every k: if pi≺pj, ¬pk≺pi, pk≺pj ∈ Q, then Su0..n(i) <
Su0..n(k) < Su0..n(j).
If pi≺pj = argmin over pi≺pj ∈ Q′\Qψ of |Su0..n(i) − Su0..n(j)|, then the following are satisfiable:
1. ∧(Q \ {pi≺pj}) ∧ ¬pi≺pj
2. ∧(Q \ {pi≺pj, ¬pj≺pi}) ∧ pj≺pi
Proof. We first show that ϕ = (∧Q) ∧ ∀k. (k ≠ i, j → pk≺pi ∨ ¬pk≺pj) is satisfiable.
Since ∧Q is satisfiable, if ϕ is unsatisfiable then there exists k ≠ i, j such that ¬pk≺pi, pk≺pj ∈ Q.
Since Q ⊆ Q(Su0..n), then Su0..n(i) < Su0..n(k) < Su0..n(j). Namely, pi≺pj is not the argmin
over Q′\Qψ of |Su0..n(i) − Su0..n(j)|, a contradiction. Since ϕ is satisfiable, there is a chart
S |= ϕ.
1. Define:
   S′(i′) = S(j) if i′ = i, and S′(i′) = S(i′) otherwise.
We show that S′ |= ∧(Q \ {pi≺pj}) ∧ ¬pi≺pj. Let q ∈ (Q \ {pi≺pj}) ∪ {¬pi≺pj}.
• If q = pk≺pk′ or q = ¬pk≺pk′ with k, k′ ≠ i: S′ |= q since S |= q and
S′(k) = S(k), S′(k′) = S(k′).
• If q = ¬pi≺pj or q = ¬pj≺pi: S′ |= q since S′(i) = S′(j).
• If q = pk≺pi: Since S |= ∧Q, S(k) < S(i), and since S(i) < S′(i), it follows
that S′ |= q.
• If q = ¬pk≺pi for k ≠ j: In this case, by the definition of ϕ, S(k) ≥ S(j), and
thus S′(k) ≥ S(j) = S′(i); hence S′ |= q.
2. Define:
   S′(i′) = S(j) if i′ = i; S′(i′) = S(i) if i′ = j; S′(i′) = S(i′) otherwise.
We show that S′ |= ∧(Q \ {pi≺pj, ¬pj≺pi}) ∧ pj≺pi. Let q ∈ (Q \ {pi≺pj, ¬pj≺pi}) ∪ {pj≺pi}.
• If q = pk≺pk′ or q = ¬pk≺pk′ with k, k′ ≠ i, j: S′ |= q since S |= q and
S′(k) = S(k), S′(k′) = S(k′).
• If q = pj≺pi: Since S(i) < S(j), by the definition of S′, S′(i) > S′(j).
• If q = pk≺pi or q = ¬pk≺pj: These continue to hold by the definition of S′.
• If q = pk≺pj: By the definition of ϕ, S(k) < S(i), and thus in particular
S′(k) < S′(j) = S(i); hence S′ |= q.
• If q = ¬pk≺pi for k ≠ j: In this case, by the definition of ϕ, S(k) ≥ S(j), and
thus in particular S′(k) ≥ S(j) = S′(i); hence S′ |= q.
Lemma 3.3.4. Let Su0..n be a chart such that for every i ≠ j, if ¬pi≺pj ∈ Q, then ¬pj≺pi ∉ Q. Then, SyFi terminates and outputs a formula that is equivalent to the target ψ.
Proof. • SyFi terminates:
– At every iteration, either there are pi, pj such that pi≺pj ∈ Q′ \ Qψ or Q′ \ Qψ = ∅:
This follows because for every pi, pj, either pi≺pj ∈ Q′ or pj≺pi ∈ Q′, and each
iteration either removes from Q′, or adds to Qψ, the predicates pi≺pj and ¬pj≺pi, or pj≺pi and ¬pi≺pj.
– There are models in Lines 5 and 8: By Lemma 3.3.3, it suffices to show that
the lemma's preconditions are met. Initially, ∧Q′ is satisfied by Su0..n. Also, since
there are no equal points, for every i ≠ j and k, if pi≺pj, ¬pk≺pi, pk≺pj ∈ Q′,
then Su0..n(i) < Su0..n(k) < Su0..n(j). Thus, it suffices to show that Q(ψ) ⊆ Q′.
We show this by induction. Base: Follows since Q′ = Q(Su0..n) and by Lemma 3.3.1.
Step: A predicate q is removed only when an example satisfying ¬q is discovered
to be positive, and thus by Lemma 3.3.1, Q(ψ) ⊆ Q′ \ {q}.
• SyFi returns a formula equivalent to the target ψ: Since Q(ψ) ⊆ Q′ throughout the
execution, ∧Q′ |= ψ. We now show that ψ |= ∧Qψ throughout the execution; since
Q′ = Qψ when SyFi terminates, the claim follows. We show this by induction. Base:
Follows since Qψ = ∅. Step: A predicate q is added to Qψ either when:
– An example satisfying ∧(Q′ \ {q}) is discovered to be negative, and thus by
Lemma 3.3.2, q ∈ Q(ψ).
– The predicate is logically implied by Qψ ∪ {q′} for some q′ ∈ Q(ψ). By
transitivity, ψ |= ∧(Qψ ∪ {q′}) |= q.
From this lemma, we get the next theorem.
Theorem 3.1. Let Su0..n be a chart such that for every i ≠ j, if ¬pi≺pj ∈ Q, then ¬pj≺pi ∉ Q.
SyFi learns the target formula with at most |Qn| membership queries.
Learning with Equal Points We next extend the previous results to the case where Su0..n has
equal points. There are two challenges when addressing this setting:
1. Identifying the set of points that are equal in ψ.
2. Learning the relation of the other points to the equal points.
We begin by explaining the second challenge through an example. Assume an initial chart Su0..3
where two points, 0 and 1, are known to be equal in ψ (i.e., ¬p0≺p1, ¬p1≺p0 ∈ Q(ψ)) and the
third point is smaller in Su0..3. Then, by the initialization:

Q′ = {¬p0≺p1, ¬p1≺p0, p2≺p0, ¬p0≺p2, p2≺p1, ¬p1≺p2}
If we let SyFi run as defined in Algorithm 2, it would begin the loop and pick p2≺p0 or p2≺p1.
Assume it picks p2≺p0. Then, SyFi looks for an example satisfying the conjunction:

∧((Q′ \ {p2≺p0}) ∪ {¬p2≺p0}) = ∧{¬p0≺p1, ¬p1≺p0, ¬p2≺p0, ¬p0≺p2, p2≺p1, ¬p1≺p2}
This is equivalent to satisfying (p0 = p1) ∧ (p0 = p2) ∧ (p2≺p1), which is unsatisfiable. The
problem arises because if points are known to be equal (p0 = p1), negating only one predicate
that relates to one of them (p2≺p0) while leaving the equivalent predicate (p2≺p1) results in
an unsatisfiable formula. To avoid this situation, after obtaining the equal points in ψ (which we
describe shortly), we pick a representative for each set of equal points and remove all constraints
that pertain to the other points in T (except the ones involving the representative, which describe
T). Formally, if T is a set of equal points, we define the representative as the minimal point in
T, denoted min(T), and update Q′ as follows:

Q′T = Q′ ∩ {pi≺pj, ¬pi≺pj | i, j ∉ T ∨ i = min(T) ∨ j = min(T)}
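A sketch of this reduction on the running example (the tuple encoding of a predicate pi≺pj or its negation as (i, j, positive) is ours):

```python
# Sketch of the Q'_T reduction: keep only constraints whose endpoints are
# outside T or that involve T's representative min(T).
def restrict(Qp, T):
    r = min(T)                        # the representative min(T)
    return {(i, j, pos) for (i, j, pos) in Qp
            if (i not in T and j not in T) or i == r or j == r}

# The running example: points 0 and 1 are equal, point 2 is below them.
Qp = {(0, 1, False), (1, 0, False), (2, 0, True),
      (0, 2, False), (2, 1, True), (1, 2, False)}
reduced = restrict(Qp, {0, 1})        # drops the constraints involving point 1
```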
We next formalize this, extend the definition to multiple sets of equal points, and prove
that the resulting predicate set is logically equivalent to the original predicate set. We begin by
defining equal point sets and then provide the lemma.
Definition 3.3.5. Let ψ of size n be a target formula. A set T ⊆ {0, ..., n−1} is called an equal
point set of ψ if for all i, j ∈ T, ¬pi≺pj, ¬pj≺pi ∈ Q(ψ). T is a maximal equal point set of
ψ if T is an equal point set and for every i ∈ T, k ∉ T: ¬pi≺pk ∉ Q(ψ) or ¬pk≺pi ∉ Q(ψ). A
set {T1, ..., Tm} is a maximal equal set of ψ if every Ti is a maximal equal point set and no
other subset of {0, ..., n−1} is a maximal equal point set.
Definition 3.3.6. Let T ⊆ {0, ..., n−1} be a set of indices. Given a set Q′, we define

Q′T = Q′ ∩ {pi≺pj, ¬pi≺pj | i, j ∉ T ∨ i = min(T) ∨ j = min(T)}

Given a set of index sets {T1, ..., Tm}, we denote Q′T1,...,Tm = (((Q′T1)T2)...)Tm.
Lemma 3.3.7. Let ψ be a target formula, Su0..n a positive example, Q(Su0..n) the predicates
in Qn satisfied by Su0..n, and {T1, ..., Tm} the maximal equal point sets of ψ. Then,
∧(Q(Su0..n)T1,...,Tm) ≡ ∧Q(Su0..n).
Proof. We prove by induction on m. Base: m = 0 is trivial. Step: We show that
∧(Q(Su0..n)T1,...,Tm) ≡ ∧(Q(Su0..n)T1,...,Tm−1), and by transitivity and the induction hypothesis
we get the result. Since Q(Su0..n)T1,...,Tm ⊆ Q(Su0..n)T1,...,Tm−1, we have ∧(Q(Su0..n)T1,...,Tm−1) |=
∧(Q(Su0..n)T1,...,Tm). To show that ∧(Q(Su0..n)T1,...,Tm) |= ∧(Q(Su0..n)T1,...,Tm−1), we show that
for any q ∈ Q(Su0..n)T1,...,Tm−1, ∧(Q(Su0..n)T1,...,Tm) |= q. We split into cases:
• If q ∈ Q(Su0..n)T1,...,Tm: The claim clearly follows.
• If q ∉ Q(Su0..n)T1,...,Tm: By definition, q = pi≺pj or q = ¬pi≺pj such that either
i ∈ Tm \ {min(Tm)} or j ∈ Tm \ {min(Tm)} (or both). Assume w.l.o.g. that
q = pi≺pj and i ∈ Tm \ {min(Tm)}. Since Q(Su0..n)T1,...,Tm ⊆ Q(Su0..n), and i and
min(Tm) are equal points in ψ, it must be that pmin(Tm)≺pj ∈ Q(Su0..n). We show that
¬pi≺pmin(Tm), ¬pmin(Tm)≺pi, pmin(Tm)≺pj ∈ Q(Su0..n)T1,...,Tm, which implies the
claim. First, since T1, ..., Tm−1, Tm are maximal sets, i, min(Tm) ∉ T1 ∪ ... ∪ Tm−1.
Thus ¬pi≺pmin(Tm), ¬pmin(Tm)≺pi ∈ Q(Su0..n)T1,...,Tm−1, and in particular ¬pi≺pmin(Tm), ¬pmin(Tm)≺pi ∈ Q(Su0..n)T1,...,Tm. Since pi≺pj ∈ Q(Su0..n)T1,...,Tm−1, either
j ∉ T1 ∪ ... ∪ Tm−1 or j = min(Tj′) for some j′ ≤ m−1. In either case, it follows that
pmin(Tm)≺pj ∈ Q(Su0..n)T1,...,Tm−1 and thus pmin(Tm)≺pj ∈ Q(Su0..n)T1,...,Tm.
We next address the first question of how to obtain T1, ..., Tm. If it is known that all equal
points in the given chart are also equal in the target ψ, then the maximal equal set is the set of
equivalence classes {[i]=Su0..n | 0 ≤ i < n}, where (i, j) ∈ =S if and only if S(i) = S(j). If it
is unknown, then we run a procedure called getEqualPointChart (described shortly) that
takes a chart Su0..n and outputs a chart S which is identical, except that points that are equal in
Su0..n but not equal in ψ are not equal in S. This enables us to assume that the maximal equal
set is the set of equivalence classes {[i]=Su0..n | 0 ≤ i < n}. Equipped with this, we show an
extension of SyFi that learns any formula in Q∧:
Algorithm 3: SyFiwEqualPoints(Su0..n)
1 Su0..n = getEqualPointChart(Su0..n)
2 T1, ..., Tm = {[i]=Su0..n | 0 ≤ i < n}
3 Q′ = Q(Su0..n)T1,...,Tm
4 Qψ = {¬pi≺pj, ¬pj≺pi | ¬pi≺pj, ¬pj≺pi ∈ Q′} // Logically implied by ψ
5 while Q′ \ (Qψ ∪ ⋃k∈{1,...,m} {¬pi≺pj, ¬pj≺pi | i, j ∈ Tk}) ≠ ∅ do
6   [SyFi (Algorithm 2), Lines 4–14]
7 return ∧Q′
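The equivalence-class computation in line 2 of Algorithm 3 can be sketched as follows (the function name and sample chart are ours):

```python
# Sketch of line 2: group chart indices into equivalence classes under
# equal prices, i.e., {[i] | 0 <= i < n} with i ~ j iff S(i) = S(j).
def equal_point_sets(S):
    classes = {}
    for i, v in enumerate(S):
        classes.setdefault(v, set()).add(i)
    return list(classes.values())

S = [2.0, 3.0, 2.0, 5.0]             # indices 0 and 2 share a price
```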
Before describing getEqualPointChart, we prove that this algorithm meets the precon-
ditions of the lemmas presented in the previous section and thus SyFiwEqualPoints learns
the class Q∧. To facilitate the proof, we show the following lemma:
Lemma 3.3.8. Given the target formula ψ, a positive example Su0..n, and a maximal equal set
{T1, ..., Tm} of ψ, there exists ψ′ such that ψ′ ≡ ψ and Q(ψ′) ⊆ Q(Su0..n)T1,...,Tm.
Proof. We construct ψ′. We begin with Q(ψ′) = Q(Su0..n)T1,...,Tm ∩ Q(ψ) (so ψ |= ψ′) and
add predicates as follows, maintaining ψ |= ψ′. Let q ∈ Q(ψ) \ Q(ψ′). Then, there exist i, j such
that q = pi≺pj or q = ¬pi≺pj, and k such that w.l.o.g. i ∈ Tk. We split into cases:
• If j /∈ T1 ∪ ... ∪ Tm, then pmin(Tk) ≺ pj (or ¬pmin(Tk) ≺ pj) is in Q(Su0..n)T1,...,Tm .
Also, ¬pi≺ pmin(Tk),¬pmin(Tk)≺ pi are in Q(Su0..n)T1,...,Tm . Thus, we add these three
predicates to Q(ψ′). Since q ∈ Q(ψ), ψ logically implies these three predicates and thus
it continues to hold that ψ |= ψ′.
• If there exists k′ such that j ∈ Tk′, we split into cases:
– If k′ ≠ k: pmin(Tk)≺pmin(Tk′) (or ¬pmin(Tk)≺pmin(Tk′)) is in Q(Su0..n)T1,...,Tm.
Also, ¬pi≺pmin(Tk), ¬pmin(Tk)≺pi, ¬pj≺pmin(Tk′), ¬pmin(Tk′)≺pj are in
Q(Su0..n)T1,...,Tm. Thus, we add these five predicates to Q(ψ′). Since q ∈ Q(ψ), ψ
logically implies these five predicates, and thus it continues to hold that ψ |= ψ′.
– If k′ = k (i.e., q is in fact ¬pi≺pj): ¬pmin(Tk)≺pj, ¬pj≺pmin(Tk) are in
Q(Su0..n)T1,...,Tm. Also, ¬pi≺pmin(Tk), ¬pmin(Tk)≺pi are in Q(Su0..n)T1,...,Tm.
Thus, we add these four predicates to Q(ψ′). Since q ∈ Q(ψ), ψ logically implies
these four predicates, and thus it continues to hold that ψ |= ψ′.
Lastly, we show that ψ′ |= ψ. Let S |= ψ′. We show that for every q ∈ Q(ψ), S |= q, which
implies S |= ψ. Let q ∈ Q(ψ). If q ∈ Q(ψ′), the claim follows. Otherwise, consider the
predicates that were added to Q(ψ′) when q was considered during the construction. Since
S |= ψ′, S satisfies these predicates, and thus S |= q.
Lemma 3.3.9. Given the target formula ψ and an initial chart Su0..n such that the maximal equal
set of ψ equals {T1, ..., Tm} = {[i]=Su0..n | 0 ≤ i < n}, SyFiwEqualPoints terminates
and outputs a formula equivalent to ψ.
Proof. • SyFiwEqualPoints completes:
– At every iteration, either there are pi, pj such that pi≺pj ∈ Q′ \ Qψ, or
Q′ \ (Qψ ∪ ⋃k∈{1,...,m}{¬pi≺pj, ¬pj≺pi | i, j ∈ Tk}) = ∅. This follows because for every
constraint involving pi, pj, one of the following is true:
∗ They belong to a maximal equal point set and thus are not in Q′ or are in
⋃k∈{1,...,m}{¬pi≺pj, ¬pj≺pi | i, j ∈ Tk}. In either case, they are not in
Q′ \ (Qψ ∪ ⋃k∈{1,...,m}{¬pi≺pj, ¬pj≺pi | i, j ∈ Tk}).
∗ Otherwise, they are not in the same maximal equal point set and are in Q′. Thus,
they are not equal in Su0..n and hence either pi≺pj ∈ Q′ or pj≺pi ∈ Q′. In that
case, some iteration removes from Q′ or adds to Qψ the predicates pi≺pj and ¬pj≺pi,
or pj≺pi and ¬pi≺pj, thus reducing the size of Q′ \ (Qψ ∪ ⋃k∈{1,...,m}{¬pi≺pj, ¬pj≺pi | i, j ∈ Tk}).
– There are models in Lines 5 and 8: From Lemma 3.3.3, it is sufficient to show that
the lemma preconditions are met. Initially, Q′ is satisfiable by Su0..n. Also, for every
i ≠ j and k, if pi≺pj, ¬pk≺pi, pk≺pj ∈ Q′, then i, j, k are not in T1, ..., Tm or
belong to different sets. Thus, Su0..n(i) < Su0..n(k) < Su0..n(j). Thus, it remains to
show that Q(ψ) ⊆ Q′. To show that, we use ψ′ ≡ ψ from Lemma 3.3.8, for which
Q(ψ′) ⊆ Q′. The proof is then by induction, identical to the proof of Lemma 3.3.4
(except that it shows that Q(ψ′) ⊆ Q′ throughout the execution).
• SyFiwEqualPoints returns a formula equivalent to the target ψ: Since Q(ψ′) ⊆ Q′
throughout the execution, ∧Q′ |= ψ′ (where ψ′ is the formula from Lemma 3.3.8).
We now show that ψ′ |= ∧(Qψ ∪ ⋃k∈{1,...,m}{¬pi≺pj, ¬pj≺pi | i, j ∈ Tk})
throughout the execution, and since when SyFiwEqualPoints completes Q′ =
Qψ ∪ ⋃k∈{1,...,m}{¬pi≺pj, ¬pj≺pi | i, j ∈ Tk}, the claim follows. We show
this by induction. Base: Follows since Qψ = ∅ and since T1, ..., Tm are maximal equal
sets of ψ and thus of ψ′. Step: A predicate q is added to Qψ either when:
– An example satisfying ∧(Qn \ {q}) is discovered as negative; thus, from
Lemma 3.3.2, q ∈ Q(ψ′).
– The predicate is logically implied by Qψ ∪ {q′} for q′ satisfying q′ ∈ Q(ψ). By
transitivity, ψ |= ∧(Qψ ∪ {q′}) |= q.
We conclude this section by explaining the getEqualPointChart procedure, which takes a
chart Su0..n and returns a chart S as similar to Su0..n as possible, but in which points are equal only if
required by the target formula ψ. The getEqualPointChart procedure (Algorithm 4)
Algorithm 4: getEqualPointChart(Su0..n)
1  S = Su0..n
2  while true do
3    for T ∈ {[i]=S | 0 ≤ i < n} do
4      for i = 1; i ≤ |T|/2; i++ do
5        for T′ ∈ {T′ ∈ 2^T | |T′| = i} do
6          S′ = model(∧[Q(S) \ {¬pk≺pk′ | k ∈ T′, k′ ∈ T \ T′} ∪ {pk≺pk′ | k ∈ T′, k′ ∈ T \ T′}])
7          if ψ(S′) = 0 then // A membership query
8            S′ = model(∧[Q(S) \ {¬pk′≺pk | k ∈ T′, k′ ∈ T \ T′} ∪ {pk′≺pk | k ∈ T′, k′ ∈ T \ T′}])
9            if ψ(S′) = 0 then continue // A membership query
10         S = S′
11         goto Line 2
12   break
13 return S
starts from Su0..n and the equivalence classes of =Su0..n. It then starts a loop that iterates over the
equivalence classes and checks for each whether it has a subset whose values may differ from
those of the other points in the class (but the points in the subset are equal to one another). To
this end, an inner loop checks all possible subsets of the equivalence class (by symmetry, it is
sufficient to check subsets up to half of the class size). For each subset, two charts are generated:
one where the subset has value smaller than the other points and one where the subset has
value greater than the other points. For each chart a membership query is posed. If a chart is
a positive example, then it serves as the new chart to proceed from and the operation restarts.
Eventually, the algorithm completes when all the equal points of the current chart have to be
equal in charts satisfying ψ. To prove correctness, we show that (i) there are models in Lines 6
and 8, (ii) getEqualPointChart completes, and (iii) the final chart meets the requirement.
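The loop structure described above can be sketched in Python. This is a simplification, not SyFi's implementation: the chart is a plain list of numeric values, the membership oracle `is_positive` stands in for the queries ψ(S′) in Lines 7 and 9, and instead of the solver calls model(...) in Lines 6 and 8, the candidate chart is built directly by shifting the subset T′ by half the minimal difference (mirroring the models constructed in Lemma 3.3.10). The function and parameter names are illustrative.

```python
from itertools import combinations

def get_equal_point_chart(chart, is_positive):
    """Sketch of Algorithm 4: split equal points apart whenever the
    membership oracle accepts the resulting chart."""
    S = list(chart)
    restart = True
    while restart:
        restart = False
        # equivalence classes of =_S: indices grouped by value
        classes = {}
        for idx, val in enumerate(S):
            classes.setdefault(val, []).append(idx)
        # half the minimal positive difference between distinct points
        diffs = [b - a for a in S for b in S if b > a]
        delta = min(diffs) / 2 if diffs else 1.0
        for T in classes.values():
            for size in range(1, len(T) // 2 + 1):
                for T_sub in combinations(T, size):
                    for shift in (-delta, delta):  # subset below, then above
                        cand = [v + shift if i in T_sub else v
                                for i, v in enumerate(S)]
                        if is_positive(cand):      # a membership query
                            S, restart = cand, True
                            break
                    if restart:
                        break
                if restart:
                    break
            if restart:
                break
    return S

# Toy target formula requiring only p0 = p1; p2, p3 need not be equal:
oracle = lambda s: s[0] == s[1]
print(get_equal_point_chart([5.0, 5.0, 5.0, 5.0], oracle))
# → [5.0, 5.0, 4.0, 4.5]: p0, p1 stay equal, p2 and p3 are separated
```

As in the algorithm, a successful split restarts the outer loop, and the procedure returns once no subset of any equivalence class can be separated.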
Lemma 3.3.10. For every S, there are models in Lines 6 and 8.
Proof. We show a model for Line 6; the model for Line 8 is similar. Denote the minimal
difference between different points in S by min, i.e., min = min{S(i)−S(j) | S(i)−S(j) > 0}.
We define S′ as follows:
S′(i) = S(i)−min/2 if i ∈ T′, and S′(i) = S(i) otherwise.
We show that S′ |= ∧[Q(S) \ {¬pk≺pk′ | k ∈ T′, k′ ∈ T \ T′} ∪ {pk≺pk′ | k ∈ T′, k′ ∈ T \ T′}].
Let q ∈ Q(S) \ {¬pk≺pk′ | k ∈ T′, k′ ∈ T \ T′} ∪ {pk≺pk′ | k ∈ T′, k′ ∈ T \ T′}:
• If q = pk≺pk′ such that k ∈ T′, k′ ∈ T \ T′: Then, S(k) = S(k′) and thus S′(k) =
S(k)−min/2 < S(k′) = S′(k′).
• If q = pk≺ pk′ or q = ¬pk≺ pk′ for k, k′ /∈ T : Follows since S |= q and since by the
definition of S′, S′(k) = S(k) and S′(k′) = S(k′).
• If q = pk ≺ pk′ or q = ¬pk′ ≺ pk such that k ∈ T ′ and k′ /∈ T : Follows since
S′(k) = S(k)−min/2 < S(k) < S(k′) = S′(k′).
• If q = ¬pk ≺ pk′ or q = pk′ ≺ pk such that k ∈ T ′ and k′ /∈ T : Follows since
S′(k) = S(k)−min/2 > S(k)−min ≥ S(k′) = S′(k′).
• If q = ¬pk≺pk′ such that k, k′ ∈ T ′: Follows since S′(k) = S(k) = S(k′) = S′(k′).
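To make the construction concrete, the following sketch checks it on a small numeric chart. The chart, the class T, and the subset T′ are arbitrary illustrative choices, not taken from the thesis:

```python
# Numeric sanity check of the model construction above: points in T' are
# lowered by min/2, which puts them strictly below the rest of T while
# preserving every other strict order and equality in S.

S = [3.0, 5.0, 5.0, 5.0, 8.0]   # chart; T = {1, 2, 3} share the value 5
T = [1, 2, 3]
T_sub = [1]                      # the subset T' to separate

min_diff = min(b - a for a in S for b in S if b > a)  # = 2.0 here
S2 = [v - min_diff / 2 if i in T_sub else v for i, v in enumerate(S)]

# pk < pk' holds for k in T', k' in T \ T'
assert all(S2[k] < S2[kp] for k in T_sub for kp in T if kp not in T_sub)
# order between points outside T is unchanged
assert (S2[0] < S2[4]) == (S[0] < S[4])
# points strictly below T stay strictly below T' (a min/2 shift cannot cross them)
assert S2[0] < S2[1]
print(S2)  # → [3.0, 4.0, 5.0, 5.0, 8.0]
```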
Lemma 3.3.11. getEqualPointChart terminates.
Proof. By the previous lemma, getEqualPointChart cannot get stuck when looking for
models. Further, at each iteration of the outer loop, either S is replaced with S′, which has fewer
equal points, or S is returned. Thus, the number of iterations is bounded by the number of pairs
of equal points in the initial chart, and hence the procedure is guaranteed to terminate.
Lemma 3.3.12. Let S be a chart returned by getEqualPointChart. Then, for every 0 ≤ i ≠ j < n, if S(i) = S(j), then ¬pi≺pj, ¬pj≺pi ∈ Q(ψ).
Proof. Assume by contradiction that S is returned and there is a pair S(i) = S(j) such that
¬pi ≺ pj /∈ Q(ψ) or ¬pj ≺ pi /∈ Q(ψ). Consider T = {j′ | S(i) = S(j′)}, which is
inspected by getEqualPointChart, and consider all subsets that are equal point sets of ψ:
Ts = {{j′ ∈ T | ¬pi′≺pj′, ¬pj′≺pi′ ∈ Q(ψ)} | i′ ∈ T}. Since S is a positive example, it
satisfies all predicates in Q(ψ) and thus Q(ψ) ⊆ Q(S). We define a partial order < over Ts
as follows: T1 < T2 if there exist i′ ∈ T1 and j′ ∈ T2 such that ψ ⊭ ¬pi′ ≺ pj′. Let T′ be
a minimal element of Ts with respect to <. Namely, Q(ψ) ⊆ Q(S) \ {¬pk≺pk′ | k ∈ T′, k′ ∈ T \ T′}.
This implies that if |T′| ≤ |T|/2, then the S′ defined in Line 6 is a positive example, and otherwise
the S′ defined in Line 8 is a positive example. Thus S will be replaced with S′. Since each iteration
reduces the number of equal points, S will not be reconsidered, and thus will not be returned – a
contradiction.
The above algorithm implies that the query complexity is dominated by the maximal number
of equal points in the original chart, which is bounded by the chart size, n. This provides us
with the following theorem.
Theorem 3.2. The number of membership queries posed by SyFiwEqualPoints is at most
|Qn| + 2^(n/2), where n is the initial chart size.
3.4 Synthesizing Code from Formulas
In this section, we present the query synthesis process, taking the formula we learned from
visual examples, and realizing it as a program for a trading platform. Here, we show one way to
synthesize a query to detect a pattern. We show a program synthesized in AFL, the programming
language of a popular trading platform called AmiBroker. This approach is mostly technical,
and in particular one could design a different synthesizer that compiles patterns (i.e., formulas) to
executable code. We provide the details for completeness of the presentation.
We begin with a high-level description of the query structure. We then provide a short
background on AmiBroker (Section 3.4.1), followed by an explanation of how to synthesize
queries from formulas (Section 3.4.2), and then a description of how to extend the queries with
quantitative constraints (Section 3.4.3).
3.4.1 The AmiBroker Trading Platform
AmiBroker is a popular high-speed trading platform that supports writing and executing pattern
queries. Analysts can use the queries for real-time trading by configuring the query to buy or
sell when the pattern is detected.
Queries are written in a DSL called AFL (other platforms offer their own DSLs, with
similar primitives). AFL is an array-based language and as such its primitives are arrays,
functions receive and return arrays, and expressions (Boolean or computational) consist of
and are evaluated to arrays. Array indices are non-positive and they end at index zero (in the
examples below, the rightmost value is the cell at index zero). In addition, the index zero plays a
special role as arrays are often identified with the value that appears at that cell. For example,
the expression A>3 for A = [2, 4] can be treated as evaluated to true, instead of the array
[false, true]. Thus, and for simplicity’s sake, the functions below are described as returning
values instead of arrays (whereas actually they return arrays whose cells at index zero contain
these values). The one case in which arrays cannot be treated as values is discussed at the end
of Section 3.4.2.
We next present the AFL primitives and functions that appear in the synthesized queries.
The Close array contains the stock closing prices where the value at index zero contains today’s
price and values at negative indices refer to historical prices. Similarly, Open, High, and Low
are arrays containing the opening, high, and low prices. The function Ref(A,n) returns the
value of A n days from today; for example, Ref(Close,-2) returns the closing price two days
ago. The function LLV(A, n) returns the lowest low in A over the last n days; for instance,
LLV([1, 4, 2, 3],3) returns 2. The function LLVBars(A, n) returns the number of days since
the lowest low value was reached over the last n days; e.g., LLVBars([1, 4, 2, 3],3) returns 1.
The function Peak(A, t, n) returns the nth most recent peak in A, where n ≥ 1 and t is the
threshold in percentages for identifying peaks. The function PeakBars(A, t, n) returns the
number of days since the nth recent peak in A. The function LastValue(A) is unique in that it
does not return an array, but rather the value of A at index zero.
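To illustrate these semantics, here is a toy Python model of some of the primitives, reproducing the examples given above. An "array" is a Python list whose rightmost element is index zero (today). The implementations are illustrative sketches of the documented behavior, not AmiBroker's actual code, and the lowercase names are our own:

```python
# Toy model of AFL array primitives; rightmost list element is index zero.

def ref(A, n):
    """Value of A n days from today; n is non-positive, e.g. ref(close, -2)."""
    return A[len(A) - 1 + n]

def llv(A, n):
    """Lowest value of A over the last n days."""
    return min(A[-n:])

def llv_bars(A, n):
    """Days since the lowest value over the last n days was last reached."""
    window = A[-n:]
    low = min(window)
    last_pos = max(i for i, v in enumerate(window) if v == low)
    return (n - 1) - last_pos

print(llv([1, 4, 2, 3], 3))       # → 2, as in the text
print(llv_bars([1, 4, 2, 3], 3))  # → 1, as in the text
print(ref([10, 20, 30], -2))      # → 10 (the value two days ago)
```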
3.4.2 Generating AFL Code
The synthesizeAFL (Algorithm 5) operation synthesizes AFL code from a pattern formula.
It takes as arguments the pattern formula, ϕP , the price stream, Stream, and the threshold for
identifying peaks, K. SynthesizeAFL consists of the following steps:
• Splitting the points in ϕP into peaks and lows (Lines 1–4).
• Generating the query header (Line 5).
• Generating the code finding the peaks (Lines 6–10).
Algorithm 5: synthesizeAFL(ϕP, Stream, K)
1  lastO = max{i | pi ∈ ϕP and i is odd}
2  lastE = max{i | pi ∈ ϕP and i is even}
3  peaks = odd peaks? {lastO, ..., 3, 1, -1} : {lastE, ..., 2, 0}
4  lows = odd peaks? {0, 2, ..., lastE} : {1, 3, ..., lastO}
5  q += “Price = Stream; thrs = K;”
6  n = 1
7  for p in peaks do
8    q += “Pp = Peak(Price, thrs, n);”
9    q += “PBp = LastValue(PeakBars(Price, thrs, n));”
10   n++
11 last = max{i | pi ∈ ϕP}
12 for l in lows do
13   prc = (l < last)? “Ref(Price, -PBl+1-1)” : “Price”
14   n = (l < last)? “PBl−1-PBl+1-1” : “PBl−1”
15   q += “Pl = LLV(prc, n);”
16 q += “Filter = ”
17 for pi < pj in ϕP do
18   q += “Pi < Pj AND”
19 for ¬(pi < pj) in ϕP do
20   q += “Pi ≥ Pj AND”
• Generating the code finding the lows (Lines 11–15).
• Generating the code checking whether these peaks and lows satisfy ϕP (Lines 16–20).
We next explain these steps.
Splitting into Peaks and Lows (Lines 1–4) Since the pattern formula ϕP refers only to the
pattern’s peaks and lows, either the odd points in ϕP are the peaks and the even points are the
lows, or vice versa. Thus, to split the points, we check whether the odd points are the peaks
(e.g., by checking whether p1 is greater than p0) and accordingly initialize the peaks and lows
sets to the peaks’ and lows’ indices. Since lows are later defined relative to their surrounding
peaks, if p0 is a low, a new point, p−1, is added as a peak.
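This split (Lines 1–4 of Algorithm 5) can be sketched as follows; the function name `split_points` and its signature are illustrative, not part of SyFi:

```python
# Sketch of the peaks/lows split from Lines 1-4 of Algorithm 5.

def split_points(indices, odd_are_peaks):
    """indices: the point indices appearing in the pattern formula."""
    last_odd = max(i for i in indices if i % 2 == 1)
    last_even = max(i for i in indices if i % 2 == 0)
    if odd_are_peaks:
        peaks = list(range(last_odd, 0, -2)) + [-1]  # lastO, ..., 3, 1, -1
        lows = list(range(0, last_even + 1, 2))      # 0, 2, ..., lastE
    else:
        peaks = list(range(last_even, -1, -2))       # lastE, ..., 2, 0
        lows = list(range(1, last_odd + 1, 2))       # 1, 3, ..., lastO
    return peaks, lows

# Head and shoulders: seven points p0..p6 whose odd points are the peaks.
peaks, lows = split_points(range(7), odd_are_peaks=True)
print(peaks)  # → [5, 3, 1, -1]  (p-1 added as a peak since p0 is a low)
print(lows)   # → [0, 2, 4, 6]
```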
Generating the Query Header (Line 5) The second step begins to generate the query, which
will be stored in q. This step generates the header, which consists of the price stream to scan
(Price) and the threshold for identifying peaks (thrs). Price is set to the parameter Stream,
which could be any price stream supported by AFL, such as Close, Open, Low, or High, while
thrs is set to the parameter K.
Finding the Peaks (Lines 6–10) The third step scans the set peaks from the most recent peak
to the oldest one and defines each of them using Peak and PeakBars. As mentioned, Peak
and PeakBars take as arguments the stream, Price; the threshold for identifying peaks, thrs;
and the ordinal number of the peak, n, which begins at 1 (the most recent peak) and is increased
by one after each peak definition. In the query generated, Pp is the peak’s price and PBp is the
number of days since the peak was reached. The PBp definition uses the LastValue operation;
we postpone the explanation for why we use this to the end of this section. Also, for simplicity
of presentation, the edge case where the last peak is the last point in ϕP is omitted. In this case,
Peak cannot be used for technical reasons, and another function is used instead.
Finding the Lows (Lines 11–15) The fourth step scans the set lows and defines each of them
between the two peaks surrounding it using the function LLV. LLV searches for the lowest low
point in the stream prc over the last n days. Thus, to define a low l between its surrounding
peaks, Pl−1 and Pl+1, we define prc to be the price stream that ends at the price Pl+1 (exclusive)
and search for the lowest low in this stream that appears after Pl−1. Namely, prc is the stream
Price shifted in -PBl+1-1 days (obtained using Ref) and n is equal to PBl−1-PBl+1-1 (i.e.,
the number of days between Pl−1 and Pl+1). A special case arises if l is the last point in ϕP , in
which case Pl+1 is undefined. Instead, we search for the lowest low that appears after the peak
Pl−1 and thus prc is Price and n is PBl-1.
Checking against ϕP (Lines 16–20) The final step generates the pattern formula. To detect
patterns, AFL allows a formula to be assigned to a variable called Filter, and whenever the
formula is satisfied, a notification is sent to the user. Thus, the final step translates ϕP to code
and assigns it to Filter (for simplicity’s sake, the code presented includes a redundant AND at the
end of the formula). We note that Filter can be replaced with Buy or Sell and then AmiBroker
will buy or sell stocks when the formula is satisfied.
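The final step (Lines 16–20) amounts to string generation; the following sketch shows the idea. The representation of predicates as index pairs and the name `emit_filter` are our own, and unlike Algorithm 5, which appends a redundant trailing AND for simplicity, this sketch joins the conjuncts:

```python
# Sketch of the translation of the pattern formula into the Filter assignment.

def emit_filter(pos, neg):
    """pos: pairs (i, j) with pi < pj in the formula; neg: negated pairs."""
    conjuncts = [f"P{i} < P{j}" for i, j in pos]
    conjuncts += [f"P{i} >= P{j}" for i, j in neg]
    return "Filter = " + " AND ".join(conjuncts) + ";"

# Two equal-height tops P1, P3 (neither strictly below the other),
# with the middle low P2 below P1:
print(emit_filter(pos=[(2, 1)], neg=[(1, 3), (3, 1)]))
# → Filter = P2 < P1 AND P1 >= P3 AND P3 >= P1;
```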
A Note on LastValue We next explain why it is necessary to add LastValue in the instructions
that compute PBp (Line 9). Computational expressions that appear inside LLV and Ref (as
generated in Lines 13–14) may be evaluated to unexpected values if the operands are arrays and
not numbers (i.e., arrays cannot be treated as values, as mentioned in Section 3.4.1). Thus, we
need to access values in the arrays in order to use them in LLV and Ref, and we obtain them
using the LastValue operation.
3.4.3 Supporting Numerical Constraints
The generated code captures the pattern’s formation, but sometimes the user wishes to express
numerical constraints such as: (i) the minimal difference between two points required to consider
them as not equal, (ii) the maximum (or minimum) number of days between two notable peaks
or lows, and (iii) the ratio between two price points.
All these constraints can be easily added to the queries SyFi synthesizes by adding constraints
concerning the Pi and PBi variables. The user can express numerical constraints by configuring
a set of parameters. The parameters may apply to a specific pair of points or to all pairs. We
next show how to express the above numerical constraints.
To express constraints of type (i), SyFi replaces constraints of the form Pi<Pj with
Pi·(1 + diff) < Pj. The parameter that defines diff globally for all pairs is called Klow, and it is
described here since we refer to it in the next section. Klow is the fraction of thrs (the threshold
for identifying peaks) required to determine that one point is lower than the other one, namely,
diff = (thrs · Klow)/100 (the product is divided by 100 since thrs is given in percentages).
To express constraints of type (ii), SyFi adds constraints of the form PBi − PBj ≥ M
where M is an integer. Though the PBi were defined only for the peaks, they can also be defined
for the lows using the LLVBars operation.
To express constraints of type (iii), SyFi adds constraints of the form Pi · r ≥ Pj where r is
a real number.
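The three rewrites can be sketched as simple string generators. The function names and the string-based representation are illustrative, and the diff formula follows the definition given above, which is our reading of the original:

```python
# Sketch of the three numerical-constraint rewrites from this section.

def min_difference(i, j, thrs, k_low):
    """Type (i): replace Pi < Pj with Pi*(1+diff) < Pj."""
    diff = thrs * k_low / 100  # thrs in percentages (assumed reading)
    return f"P{i} * {1 + diff} < P{j}"

def min_days_between(i, j, m):
    """Type (ii): the two points are at least m days apart."""
    return f"PB{i} - PB{j} >= {m}"

def price_ratio(i, j, r):
    """Type (iii): bound the ratio between two price points by r."""
    return f"P{i} * {r} >= P{j}"

print(min_difference(0, 1, thrs=0.5, k_low=1/3))
print(min_days_between(2, 0, 5))  # → PB2 - PB0 >= 5
print(price_ratio(1, 3, 1.5))     # → P1 * 1.5 >= P3
```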
3.5 Evaluation
In this section, we evaluate the effectiveness of SyFi by investigating answers to the following
research questions:
• How long does the synthesis process take to learn common technical analysis patterns?
• How precisely do learned formulas capture the patterns in real stock streams?
We begin by describing the common patterns (Section 3.5.1), continue with a study of the
synthesis process (Section 3.5.2), and conclude with a study of its effectiveness in detecting
patterns in stock prices (Section 3.5.3). Experiments were run on a Sony Vaio PC with Intel i7
processor and 16GB RAM.
3.5.1 Common Patterns
Many of the common patterns are described in textbooks, with most of them being similar to
each other up to slight modifications. Thus, to evaluate the effectiveness of SyFi in capturing
patterns, we selected six basic patterns, of which we believe all the others to be variations.
Hence, we believe that identifying them successfully implies that SyFi is general enough to
capture a wide range of patterns. The selected patterns are: (i) head and shoulders, (ii) cup with
handle, (iii) double tops, (iv) symmetrical triangle, (v) rectangle, and (vi) flag. The last five
patterns are illustrated in Fig. 3.3; for further reading see [Bul12, Bul05, Inv].
3.5.2 The Efficiency of the Synthesis Process
To study the efficiency of the synthesis process, we measured how long it took to learn the
formulas that capture the patterns precisely, namely the formulas that are satisfied by all pattern
occurrences and only by them.
The Evaluated Factors SyFi’s synthesis process consists of the learning process (Section 3.3)
and the code generation process (Section 3.4). Since the code generation completes instantly,
we focus on the learning process. The duration of the learning process is affected by the number
of questions presented to the user and the time it takes to synthesize the charts. Thus, we study
both factors on the different patterns.
The Experiments To study the learning process, we conducted several experiments. In each
experiment, we defined a priori the goal formula that described the pattern, created an example
(shown in Fig. 3.3), and let SyFi learn the formula from the example (interactively). The result
of the experiment was the overall learning time and the number of questions posed.
Coping with Subjectivity Pattern definitions are subjective. To overcome the challenge of
evaluating subjective definitions, which might lead to inconclusive results, we ran several experiments
for each pattern, each with a different formula (but with the same example). The different
[Figure 3.3: for each pattern – Head and Shoulders, Cup with Handle, Two Tops, Symmetrical Triangle, Flag, and Rectangle – a schematic figure is shown alongside an example chart.]
Figure 3.3: The new patterns (figures taken from [Inv]).
definitions, taken from textbooks and online forums, span a range of possible definitions, from
the most permissive to the most restrictive. We next provide a general description of the patterns
and the definitions used.
Head and Shoulders Three peaks, the middle is the highest.
(1) Most permissive – three peaks, middle one is the highest.
(2) (1) with shoulders higher than all lows.
(3) (2) where p0, p6 are lower than the other points.
(4) (3) with ascending “neckline” (p0≺p2≺p4) and p6≺p0.
(5) Most restrictive – the given chart is the only valid chart.
Cup with Handle A rise, followed by a cup-shape, then a decline (“the handle”), and finally
another rise.
(1) Most permissive – all four parts exist.
(2) (1) with significant rise: p5 is higher than the other points.
(3) (2) with p0 lower than the other points.
(4) (3) with handle not lower than the cup (¬(p4≺p2)).
Pattern               |Sall|  Def.  |Spat|  Q's  Avg. Time (std. dev.)  Max Time
Head and Shoulders      42    (1)      6    28   0.074 (0.05)           0.15
                              (2)     10    24   0.09  (0.059)          0.176
                              (3)     10    16   0.086 (0.054)          0.166
                              (4)      7     9   0.103 (0.061)          0.166
                              (5)      6     6   0.115 (0.054)          0.165
Cup with Handle         30    (1)      5    25   0.043 (0.031)          0.1
                              (2)      6    18   0.055 (0.038)          0.119
                              (3)      7    13   0.054 (0.045)          0.127
                              (4)      6    11   0.054 (0.035)          0.109
                              (5)      6    10   0.068 (0.045)          0.141
Two Tops                20    (1)      5    11   0.076 (0.027)          0.107
                              (2)      5     9   0.086 (0.035)          0.131
                              (3)      5     6   0.089 (0.041)          0.134
                              (4)      6     6   0.189 (0.184)          0.562
Symmetrical Triangle    42    (1)      7    15   0.083 (0.076)          0.259
                              (2)      7     7   0.207 (0.111)          0.357
Flag                    42    (1)      7     9   0.094 (0.054)          0.166
                              (2)      6     6   0.107 (0.048)          0.146
Rectangle               20    (1)      6     9   0.159 (0.105)          0.376
                              (2)      6     6   0.183 (0.183)          0.56

Table 3.1: Learning Process Evaluation Results.
(5) Most restrictive – the given chart is the only valid chart.
Two Tops Two peaks of equal height.
(1) Most permissive – there are two equal height tops.
(2) (1) with middle low (p2) not lower than the other lows.
(3) (2) with last point (p4) lower than the other points.
(4) Most restrictive – the given chart is the only valid chart.
The next patterns are captured by constraints that leave little room for different definitions and
thus only two are listed.
Symmetrical Triangle Descending peaks (p1 ≻ p3 ≻ p5), ascending lows (p2≺p4≺p6), and
p2≺p0, p0≺p1.
(1) Most permissive – p0 appears between p1 and p2.
(2) Most restrictive – the given chart is the only valid chart.
Flag A pole followed by descending peaks (p1 ≻ p3 ≻ p5), descending lows (p2 ≻ p4 ≻ p6), and p0
lower than all points.
(1) Most permissive – p2 and p5 may be equal.
(2) Most restrictive – the given chart is the only valid chart.
Rectangle Peaks (p1, p3) are equal, lows (p2, p4) are equal, and p0 is not higher than p1.
(1) Most permissive – p0 is not higher than p1.
(2) Most restrictive – the given chart is the only valid chart.
Results The results are shown in Table 3.1. The table shows the pattern name (Pattern); the
total number of predicates that may appear in the formula (|Sall|); the number of the definition
used (Def.); the number of predicates in the learned formula (|Spat|); the number of questions
presented to the user (Q’s); and the average and maximum time (Avg. Time, Max Time), in seconds,
that passed between the user’s response and the next chart display. The standard deviation is shown
in brackets.
Question Analysis Table 3.1 shows that there are relatively few questions and that their number
correlates with the number of irrelevant constraints satisfied by the example. Namely, the more
restrictive the tested pattern definition, the fewer questions were needed. In particular,
the most restrictive definitions were learned within 10 questions. The table also shows that SyFi
required fewer questions (up to 15) to learn patterns that were more restrictive (triangle, flag,
and rectangle). For patterns that consisted of several parts (head and shoulders and cup with
handle), SyFi required more questions to learn (up to 30). Nevertheless, we believe that even
if the number of questions reaches 30, the overall learning process can be completed by the
analysts quickly as classifying visual charts is a simple and intuitive task.
Time Analysis In all experiments, the average time was < 0.2 seconds and the maximum time
was < 0.6 seconds. These times are not noticeable to users and thus we believe users will not
observe delays during the learning process.
3.5.3 The Quality of the Synthesized Queries
Although SyFi learns the precise formulas that capture patterns, the queries it synthesizes contain
parameters which affect the detection of patterns in charts. In this section, we evaluate the
quality of SyFi’s final outcome – the queries.
The Experiments To evaluate the query quality, we conducted several experiments. In each
experiment, we ran one query over 10 stock streams (taken from [YF]), each containing the
closing prices over the last six years. Thus, each query was evaluated over more than 15,000
charts.
For each pattern, we evaluated one query, the one that detected the most popular definition.
As opposed to the previous section where the definition that was used affected the evaluation
results (the number of questions), in this section the evaluation results (the detection quality)
are affected by the query parameters, which are independent of the pattern definition. The
definitions used were: head and shoulders-(4), cup with handle-(4), two tops-(3), symmetrical
triangle-(1), flag-(1), and rectangle-(1).
In the experiments, thrs (Section 3.4.2) was set to 0.5 and Klow (Section 3.4.3) was set
to 1/3. These values were chosen after studying the values used by technical analysis users (as
described in online forums) and examining the data. We did not examine different parameters
because our goal is to show that the synthesized queries detect patterns well. In real-world
scenarios, such parameters are tuned by the analysts.
The Evaluated Factors To evaluate the detection quality, we measured precision and recall.
Precision is the percentage of detected charts that were pattern occurrences, while recall is the
Pattern:         Head and Shoulders       Cup with Handle          Two Tops                 Symmetrical Triangle
Stock    Pat.  Pre.  Rec.  Rec.0   Pat.  Pre.  Rec.  Rec.0   Pat.  Pre.  Rec.  Rec.0   Pat.  Pre.  Rec.  Rec.0
Symbol   Oc.   (%)   (%)   (%)     Oc.   (%)   (%)   (%)     Oc.   (%)   (%)   (%)     Oc.   (%)   (%)   (%)
AAPL       7   100   100    0      128    86    96    1        6   100    83   33        5   100    80    0
GOOGL     10   100    90    0      112    92    99    4       11   100    55    9        5   100    80    0
MSFT      30   100    93   20      104    87    93   11        2   100    50    0        7   100    71    0
AXP        9   100    67    0      102    88    99   15       12   100    58    0        4   100    50    0
BA         7   100    57    0      116    89   100    3       13   100    92   15        3   100    67    0
CAT        7   100   100    0      119    89   100    2        6   100   100    0        0     -     -    -
CSCO       5   100    80    0       65    59    98    2        8   100   100   13        2   100    50    0
CVX       13   100   100    0      107    88    99   13       17   100    94   29        3   100   100    0
DD        14   100    79    0       87    88    97    7       11   100    73    0        2   100    50    0
DIS       10   100    60    0       91    77    98    4       10   100   100   70        1   100   100    0
Summary  112   100    83    2     1031    84    98    6       96   100    81   17       32   100    72    0

Table 3.2: Detection statistics: number of pattern occurrences, precision, recall, and recall
without SyFi’s learning.
percentage of detected pattern occurrences from all pattern occurrences. Formally, precision
equals TPTP+FP ·100 and recall equals TP
TP+FN ·100 where TP (true positive) is the number of
detected pattern occurrences, FP (false positive) is the number of detected charts that were not
pattern occurrences, and FN (false negative) is the number of pattern occurrences that were
missed. To determine which occurrences were pattern occurrences, we manually classified
streams based on the pattern formulas. We did not consider the values of thrs and Klow during
the manual classification, and instead determined visually whether peaks were significant and
whether two points were equal. We believe such classification simulates the way analysts detect
patterns.
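The precision/recall computation described above can be written out directly; the example numbers below are hypothetical, chosen only to exercise the formulas:

```python
# Precision and recall, in percent, from TP/FP/FN counts as defined above.

def precision_recall(tp, fp, fn):
    precision = 100 * tp / (tp + fp)
    recall = 100 * tp / (tp + fn)
    return precision, recall

# Hypothetical counts: 83 detected occurrences, no false alarms, 17 missed.
p, r = precision_recall(tp=83, fp=0, fn=17)
print(p, r)  # → 100.0 83.0
```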
Results The results are shown in Table 3.2 and Table 3.3: Pat. Oc. is the number of pattern
occurrences (classified manually), that is, charts that meet the pattern, Pre. is precision, Rec. is
recall, and Rec.0 is the recall that would have been obtained had we not applied the learning
process before generating the query (i.e., the formula used is the conjunction of all predicates
over the ≺-predicate satisfied by the initial chart example). Cells containing “-” could not be
computed, either because there were no pattern occurrences or because no charts were detected
by the query.
Precision Analysis Tables 3.2 and 3.3 show that the precision is mostly high (on average, 95%),
namely most detected charts are indeed pattern occurrences. The only exception is the cup with
handle pattern. In this pattern, we observed that capturing patterns through extremum points and
the ≺ predicate cannot capture cup shapes. To extend the query to capture cup shapes, SyFi can
be extended to allow users to mark in the chart example the points that form cup shapes, and
it would add cup shape constraints to the query accordingly. Even without such an extension,
precision is still relatively high (84%). The flag pattern also encounters low precision at times.
Close inspection revealed that SyFi missed occurrences in which the points had very close
values, namely a lower Klow would have made it possible to detect these charts.
Recall Analysis Tables 3.2 and 3.3 show that the recall is relatively high (on average, 77%) and
Pattern:        Flag                     Rectangle
Stock    Pat.  Pre.  Rec.  Rec.0   Pat.  Pre.  Rec.  Rec.0
Symbol   Oc.   (%)   (%)   (%)     Oc.   (%)   (%)   (%)
AAPL       7   100    71   71        8   100    75   25
GOOGL      4   100    75    0        3   100    67   67
MSFT       4   100    25    0        5   100    60   60
AXP        7   100    57    0        6   100    67   67
BA         2     0     0    0        6   100    83   67
CAT        9    89    89   33        5   100    80   80
CSCO       2     -     0    0        5   100   100   60
CVX        1   100   100    0        6   100    67   50
DD         2   100    50   50        7   100    86   43
DIS        6   100    67    0        7   100    86   57
Summary   44    88    53   15       58   100    77   58

Table 3.3: Detection statistics (continued).
is significantly better than Rec.0 (on average, 16%), which is the recall that would have been
obtained without the learning process. We inspected all pattern occurrences that the query did
not detect and observed that the most common reason was that thrs was too low (especially
in the head and shoulders and two tops). Because thrs was low, peaks that visually looked
insignificant were considered significant by the query and thus the query did not check the more
significant peaks, which were required to satisfy the pattern formula. The second reason for
missed occurrences was that Klow was too high. This affected especially the rectangle and two
tops patterns, both of which consist of points of the same height. Flag and triangle were also
affected by the high Klow as some pattern occurrences were missed when the points had very
close prices.
To improve recall, analysts may tune thrs and Klow. Yet, there is some trade-off between
precision and recall, and thus each analyst has to decide which metric is more important. Here,
we chose to show that very high precision could be obtained while maintaining relatively high
recall. This choice is due to our belief that the common approach is to prefer precision over
recall because too many false alarms will result in analysts ignoring the query reports.
Time Analysis To evaluate the efficiency of the queries, we measured the time it took the
query to scan the 10 stock streams. The results are summarized in Table 3.4, which shows the
average time (in seconds) taken for the queries to complete on one stream. Standard deviation
is provided in brackets. The table shows that the queries generated are highly efficient and
complete scanning 1500 charts in a few seconds.
Partial Learning We next study whether the learning process can be beneficial even if the
learning is not run to completion. To this end, we define that learning stopped after the kth
question outputs the formula that is the conjunction of the predicates in Sall \ Snopat. This set is
guaranteed to include all pattern predicates, but it may also include irrelevant predicates (which
were not learned yet)2.
2 The alternative is to generate a conjunction from the predicates in Spat. However, this is likely to result in too many false reports, especially when Spat is empty, in which case every chart will be reported by SyFi.
Pattern Avg. Time (std. dev.)
Head and Shoulders 3.425 (0.245)
Cup with Handle 3.11 (0.274)
Two Tops 2.287 (0.18)
Symmetrical Triangle 3.169 (0.271)
Flag 2.986 (0.269)
Rectangle 2.492 (0.175)
Table 3.4: Pattern detection times for sets of 1500 charts.
[Figure 3.4 shows three panels — Head and Shoulders, Cup with Handle, and Two Tops — each plotting recall (%) against the number of questions.]
Figure 3.4: Recall as a function of the number of questions presented in the learning process.
Fig. 3.4 shows the graphs of recall as the number of questions varies for three patterns. The
graphs show that recall improves as more questions are presented. Further, the graphs show that
partial learning may obtain good recall and is thus preferable to no learning. However, there
is no common behavior for the rate at which the recall is improved. For example, recall of head
and shoulders is low until the learning is almost complete, while that of two tops reaches its
maximum after one question, and that of cup with handle improves consistently.
3.6 Related Work
Queries over Finance Streams Several works aim to help technical analysts. Many trading
software platforms provide domain-specific languages for writing queries where the user defines
the query and the system is responsible for the sliding window mechanism, e.g., MetaTrader,
MetaStock, NinjaTrader, and Microsoft’s StreamInsight [CGM10].
Recently, AmiBroker added a feature that supports writing queries in natural language.
However, this feature is limited to a small set of English phrases provided by AmiBroker that
does not cover all of AmiBroker's instructions. Thus, this feature cannot be used for writing pattern
queries. Another tool designed to help analysts is Stat! [BCD+13], an interactive tool that
enables analysts to write queries in StreamInsight and shows the results of the current query
at each step as it is gradually built. CPL [ACK01] is a Haskell-based high-level language designed for chart
pattern queries. Its unique features include support for fuzzy constraints and pattern composition.
Composition simplifies the encoding of complex patterns by first defining their segments and
then composing them to form the pattern. This approach is applicable only for pattern definitions
that do not have constraints pertaining to pairs of points from different segments. As shown
in Section 3.5, many definitions contain such constraints.
Queries over Streams Many other languages support queries for streams. SASE [WDR06] is a
system designed for RFID streams (Radio Frequency Identification) that offers a user-friendly
language and can handle large volumes of data. Cayuga [BDG+07] is a system for detecting
complex patterns in streams, whose language is based on the Cayuga algebra. SPL [HAG+13]
is IBM’s stream processing language supporting pattern detection. ActiveSheets [VTR+14] is
a platform that extends Microsoft Excel with abilities to process real-time streams from within
spreadsheets. ActiveSheets enables users to process streams using Excel formulas and it can be
used to detect patterns in stock streams by defining corresponding automata and encoding their
states and transitions in the spreadsheet.
3.7 Conclusion
We presented SyFi, a tool for synthesizing pattern queries over finance streams. SyFi receives
an example chart and interacts with the analyst by presenting a series of charts to learn the
pattern formula. SyFi then produces programs that execute over real-time trading platforms and
detect pattern occurrences in price streams. We showed that SyFi learns common patterns and
synthesizes efficient queries that detect these patterns in real stock streams with high precision
and recall.
Chapter 4
Learning Disjunctions and Conjunctions of Predicates
In the previous chapter, we showed an exact learning algorithm that interacts with a user to
learn his intent, which was modeled as a conjunction over a particular set of predicates. In this
chapter, we consider a more general setting, where the user’s intent is a disjunctive (and dually,
a conjunctive) formula over arbitrary predefined predicates. More formally, let Q be a set of
predicates over a domain D. Our goal is to learn the class Q∨ = {∨q∈P q | P ⊆ Q} of all
disjunctions of predicates in Q. We give a learning algorithm D-SPEX that learns any function in
Q∨ with polynomially many queries. We then show that, given some computational complexity
conditions on the set of predicates, D-SPEX runs in polynomial time.
We demonstrate the above on the class of conjunctions over QI, where QI is the set of
variable inequalities, i.e., predicates of the form xi > xj over n variables. If the set is acyclic
(∧QI ≢ false), we show that learning can be done in polynomial time. If the set is cyclic
(∧QI ≡ false), we show that learning is equivalent to the problem of enumerating all the
maximal acyclic subgraphs of a directed graph, which is still an open problem ([ABC+12,
BCL+13, Was16]).
We begin this chapter with notations and main definitions. We then provide our algorithm
that learns disjunctions, discuss complexity, and describe conditions under which D-SPEX is
polynomial. We then provide the dual algorithm to learn conjunctions. Finally, we discuss the
case where the class of predicates is conjunctions over variable inequalities.
4.1 The Search Space
In this section, we describe the search space of the learning problem. We begin with defining
the nodes in the search space. To this end, we define an equivalence relation over the set
of disjunctions and the representatives of the equivalence classes. The nodes are then these
representatives. We continue with defining a partial order over the disjunctions, which defines
the edges between the nodes. We finally present related notions (descendant, ascendant, and
lowest/ greatest common descendant/ ascendant) that are translated later to the search paths.
4.1.1 The Nodes of the Search Space
Clearly, the nodes should correspond to the elements in Q∨. However, formulas in Q∨ may
be equivalent. To reduce the size of the search space, we define a node for each set of
equivalent formulas. To define the node, we first define an equivalence relation over Q∨ and the
representatives of the equivalence classes. Then, the nodes are the representative elements.
Let Q be a set of predicates over the domain D. The equivalence relation ≡ over Q∨ is
defined as follows: two disjunctions ϕ1, ϕ2 ∈ Q∨ are equivalent (ϕ1 ≡ ϕ2) if ϕ1 is logically
equal to ϕ2. We denote equivalence classes by [ϕ], where ϕ ∈ Q∨. Notice that if [ϕ1] = [ϕ2],
then [ϕ1 ∨ ϕ2] = [ϕ1] = [ϕ2]. We define for every [ϕ] the representative element to be
Gϕ = ∨q∈P q, where P ⊆ Q is the maximum-size set that satisfies ∨P ≡ ϕ. We denote by
G(Q∨) the set of all representative elements. That is, G(Q∨) = {Gϕ | ϕ ∈ Q∨}.
Example 1. Consider the domain D = {1, 2} × {1, 2} and the set Q = {x1 ≥ 1, x1 ≥ 2, x2 ≥ 1, x2 ≥ 2}. There are 16 formulas in Q∨ and five representative formulas: G(Q∨) = {(x1 ≥ 1) ∨ (x1 ≥ 2) ∨ (x2 ≥ 1) ∨ (x2 ≥ 2), (x1 ≥ 2) ∨ (x2 ≥ 2), (x1 ≥ 2), (x2 ≥ 2), false} (where
false is a contradiction).
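The five representatives can be checked mechanically. Below is a minimal Python sketch (the encoding of D and Q as data is ours, purely for illustration) that enumerates all 16 disjunctions of Example 1, groups them into equivalence classes by their meaning over D, and picks the maximum-size predicate set of each class as its representative:

```python
from itertools import combinations, product

# Example 1, encoded as data (encoding ours): D = {1,2} x {1,2} and
# Q = {x1>=1, x1>=2, x2>=1, x2>=2}, each predicate a Boolean function on D.
D = list(product([1, 2], repeat=2))
Q = {"x1>=1": lambda e: e[0] >= 1, "x1>=2": lambda e: e[0] >= 2,
     "x2>=1": lambda e: e[1] >= 1, "x2>=2": lambda e: e[1] >= 2}

def semantics(P):
    """A disjunction's meaning: the subset of D it satisfies."""
    return frozenset(e for e in D if any(Q[q](e) for q in P))

# Group the 16 subsets of Q into equivalence classes by their meaning; the
# representative of a class is its maximum-size predicate set.
subsets = [frozenset(c) for r in range(len(Q) + 1) for c in combinations(Q, r)]
classes = {}
for P in subsets:
    classes.setdefault(semantics(P), set()).add(P)
representatives = {max(ps, key=len) for ps in classes.values()}

print(len(subsets), len(representatives))  # 16 5
```

Running it prints `16 5`: the sixteen formulas collapse to the five representatives listed in the example.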
The following facts hold immediately from the above definitions:
Lemma 4.1.1. Let Q be a set of predicates. Then,
1. The size of the search space is |G(Q∨)|.
2. For every ϕ ∈ Q∨: Gϕ ≡ ϕ.
3. For every ϕ ∈ G(Q∨) and q ∈ Q \ Q(ϕ): ϕ ∨ q ≢ ϕ.
4.1.2 The Edges of the Search Space
In this section, we define a partial order over Q∨. This partial order defines a Hasse diagram
over G(Q∨), which serves as the search space and in particular describes the edges between
the nodes in G(Q∨). The partial order, denoted by ⇒, is defined as follows: ϕ1⇒ϕ2 if ϕ1
logically implies ϕ2, i.e., ϕ1 |= ϕ2. Consider the Hasse diagram H(Q∨) of G(Q∨) for this
partial order. The maximum (top) element in the diagram is Gmax = ∨q∈Qq. The minimum
(bottom) element is Gmin ≡ ∨q∈Øq, i.e., a contradiction.
In a Hasse diagram, G1 is a descendant (resp. ascendant) of G2 if there is a (nonempty)
downward path from G2 to G1 (resp. from G1 to G2), i.e., G1⇒G2 (resp. G2⇒G1) and
G1 6= G2. G1 is an immediate descendant of G2 in H(Q∨) if G1⇒G2, G1 6= G2 and there is
no G such that G 6= G1, G 6= G2 and G1⇒G⇒G2. G1 is an immediate ascendant of G2 if G2
is an immediate descendant of G1. We now show some preliminary results.
Properties of the Hasse Diagram
Lemma 4.1.2. Let G1 be an immediate descendant of G2 and ϕ ∈ Q∨. If G1⇒ϕ⇒G2, then
G1 ≡ ϕ or G2 ≡ ϕ.
Proof. Since ϕ ≡ Gϕ, G1⇒Gϕ⇒G2. By the definition of immediate descendant, G1 = Gϕ or
G2 = Gϕ.
Lemma 4.1.3. If G1 is a descendant of G2, then Q(G1) ⊊ Q(G2).
Proof. • Q(G1) ⊆ Q(G2): Assume there is q ∈ Q(G1) \ Q(G2). If G1 is a descendant
of G2, then G1⇒G2 and thus G1 ∨ G2 ≡ G2. By Lemma 4.1.1, q ∨ G2 ≢ G2, which
contradicts G1 ∨ G2 ≡ G2.
• Q(G1) ⊊ Q(G2): Assume otherwise; then G1 = G2 and thus G1 is not a descendant of
G2.
We denote by De(G) and As(G) the sets of all the immediate descendants and immediate
ascendants of G, respectively. We further denote by DE(G), AS(G) the sets of all G’s
descendants and ascendants, respectively.
Lowest Common Ascendant and Greatest Common Descendant For G1 and G2, we define
their lowest common ascendant (resp. greatest common descendant) G = lca(G1, G2)
(resp. G = gcd(G1, G2)) to be the formula G ∈ G(Q∨) that is the minimum (resp. maximum)
element in AS(G1) ∩ AS(G2) (resp. DE(G1) ∩ DE(G2)). This gives us the following lemma.
Lemma 4.1.4. Let G1, G2 ∈ G(Q∨) and ϕ ∈ Q∨.
1. If G1⇒ϕ⇒lca(G1, G2) and G2⇒ϕ⇒lca(G1, G2), then ϕ ≡ lca(G1, G2).
2. If gcd(G1, G2)⇒ϕ⇒G1 and gcd(G1, G2)⇒ϕ⇒G2, then ϕ ≡ gcd(G1, G2).
Proof. 1. Gϕ ≡ ϕ and Gϕ ∈ G(Q∨). Since G1⇒Gϕ⇒lca(G1, G2) and
G2⇒Gϕ⇒lca(G1, G2), by the definition of lca, Gϕ ≡ lca(G1, G2). Bullet 2. is similar.
We next characterize lca and gcd.
Lemma 4.1.5. Let G1, G2 ∈ G(Q∨). Then, lca(G1, G2) ≡ G1 ∨G2. In particular, if G1, G2
are two distinct immediate descendants of G, then G1 ∨G2 ≡ G.
Proof. Since G1⇒lca(G1, G2) and G2⇒lca(G1, G2), we get G1 ∨ G2⇒lca(G1, G2). Since
G1⇒(G1 ∨ G2)⇒lca(G1, G2) and G2⇒(G1 ∨ G2)⇒lca(G1, G2), by Lemma 4.1.4, we get
G1 ∨G2 ≡ lca(G1, G2).
Note that this does not imply that the predicates in these formulas are the same; namely, it
does not imply that Q(G1 ∨ G2) = Q(G1) ∪ Q(G2) = Q(lca(G1, G2)). In particular, G1 ∨ G2
is not necessarily in G(Q∨). However, for the gcd, its predicates are exactly the
intersection of the predicates of G1 and G2, which is our next lemma.
Lemma 4.1.6. Let G1, G2 ∈ G(Q∨). Then, Q(G1) ∩Q(G2) = Q(gcd(G1, G2)).
In particular, if G1, G2 ∈ G(Q∨), then ∨(Q(G1) ∩Q(G2)) ∈ G(Q∨).
Also, if G1, G2 are two distinct immediate ascendants of G, then Q(G1) ∩Q(G2) = Q(G).
Proof. • Q(gcd(G1, G2)) ⊆ Q(G1) ∩ Q(G2): Follows since, by Lemma 4.1.3,
Q(gcd(G1, G2)) ⊆ Q(G1) and Q(gcd(G1, G2)) ⊆ Q(G2).
• Q(G1) ∩ Q(G2) ⊆ Q(gcd(G1, G2)): Since Q(gcd(G1, G2)) ⊆ Q(G1) ∩ Q(G2),
gcd(G1, G2) = ∨Q(gcd(G1, G2)) ⇒ ∨(Q(G1) ∩ Q(G2)). Since ∨(Q(G1) ∩ Q(G2))⇒G1 and ∨(Q(G1) ∩ Q(G2))⇒G2, by Lemma 4.1.4 we get gcd(G1, G2) =
∨(Q(G1) ∩ Q(G2)). Thus, Q(G1) ∩ Q(G2) ⊆ Q(gcd(G1, G2)).
If G1 and G2 are two distinct immediate ascendants of G:
• Q(G) ⊆ Q(G1) ∩ Q(G2) = Q(gcd(G1, G2)): Follows since Q(G) ⊆ Q(G1) and
Q(G) ⊆ Q(G2).
• Q(gcd(G1, G2)) ⊆ Q(G): Q(gcd(G1, G2)) ⊆ Q(G1) ∩ Q(G2) and Q(G) ⊆ Q(G1) ∩ Q(G2). Since G is an immediate descendant of G1 and G2, gcd(G1, G2)⇒G⇒G1 and
gcd(G1, G2)⇒G⇒G2, so by Lemma 4.1.4 we get gcd(G1, G2) = G.
4.2 Searching the Space with Witnesses
In this section, we describe how the search space is traversed. We begin with defining a key
term called witness. Let G1 and G2 be elements in G(Q∨). An element e ∈ D is a witness
for G1 and G2 if G1(e) 6= G2(e) (here we treat formulas as Boolean functions, as described in
Chapter 2).
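Over a finite domain a witness can be found by simply scanning D; in general the role of this scan is played by an SMT solver, which returns a model of a formula on which the two candidates disagree. A small illustrative sketch (the predicate encodings are ours):

```python
from itertools import product

# Witness finding over a finite domain (encoding ours); in general a witness
# is obtained as a model from an SMT solver rather than by scanning D.
D = list(product([1, 2], repeat=2))

def disj(preds):
    """Interpret a list of predicate functions as their disjunction."""
    return lambda e: any(q(e) for q in preds)

def witness(G1, G2):
    """Return some e in D with G1(e) != G2(e), or None if none exists."""
    return next((e for e in D if G1(e) != G2(e)), None)

G1 = disj([lambda e: e[0] >= 2, lambda e: e[1] >= 2])   # (x1>=2) v (x2>=2)
G2 = disj([lambda e: e[0] >= 2])                         # (x1>=2)
e = witness(G1, G2)
print(e)  # (1, 2): satisfies G1 but not G2
```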
We begin with providing a few properties of a witness. The first lemma describes which
predicates are satisfied by the witness.
Lemma 4.2.1. Let G1 be an immediate descendant of G2. If e ∈ D is a witness for G1 and G2,
then:
1. G1(e) = 0 and G2(e) = 1.
2. For every q ∈ Q(G1), q(e) = 0.
3. For every q ∈ Q(G2) \ Q(G1), q(e) = 1.
Proof. Since G1⇒G2 it must be that G2(e) = 1 and G1(e) = 0. Namely, for every q ∈ Q(G1),
q(e) = 0. Let q ∈ Q(G2) \ Q(G1). Consider G1 ∨ q. By bullet 3 in Lemma 4.1.1, G1 ∨ q ≢ G1.
Since G1⇒G1 ∨ q⇒G2, by Lemma 4.1.2, G1 ∨ q ≡ G2. Therefore, 1 = G2(e) = G1(e) ∨ q(e) = q(e).
Our next lemma states that given a node, it has a different witness for every immediate
descendant.
Lemma 4.2.2. Let De(G) = {G1, G2, . . . , Gt} be the set of immediate descendants of G. If e
is a witness for G1 and G, then e is not a witness for Gi and G for all i > 1. That is, G1(e) = 0,
G(e) = 1, and G2(e) = · · · = Gt(e) = 1.
Proof. By Lemma 4.2.1 G(e) = 1 and G1(e) = 0. For any Gi, i ≥ 2, G1 and Gi are
immediate descendants of G and thus, by Lemma 4.1.5, G ≡ G1 ∨Gi. Therefore, 1 = G(e) =
G1(e) ∨Gi(e) = Gi(e).
Algorithm 6: D-SPEX
1 return Learn(Gmax, ∅)
2 Function Learn(G, T):
3   Q = Q(G)  // The set of predicates that the target ψ contains
4   Flag = true  // Indicates whether the target is suspected to be G
5   for G′ ∈ getAllImmDe(G) do
6     if ∃P ∈ T. Q(G′) ⊆ P then continue  // G′ was eliminated by an ancestor
7     e = model(G ∧ ¬G′)  // get a witness for G and G′
8     if ψ(e) = 0 then  // pose membership query
9       Q = Q ∩ Q(G′); Flag = false  // The target is G′ or its descendant
10    else
11      T = T ∪ {Q(G′)}  // Eliminate G′ and all its descendants
12  if Flag then return G
13  Learn(∨Q, T)
Finally, we show how the witness enables the space to be searched for the target formula.
Lemma 4.2.3. Let G′ be an immediate descendant of G, e ∈ D be a witness for G and G′, and
G′′ be a descendant of G.
1. If G′′(e) = 0, G′′ is a descendant of G′ or equal to G′. In particular, Q(G′′) ⊆ Q(G′).
2. If G′′(e) = 1, G′′ is not a descendant of G′ nor equal to G′. In particular, Q(G′′) ⊄ Q(G′).
Proof. Since G′′ is a descendant of G, we have Q(G′′) ⊊ Q(G). By Lemma 4.2.1, for every
q ∈ Q(G′), q(e) = 0 and for every q ∈ Q(G) \ Q(G′), q(e) = 1. Thus, if G′′(e) = 0, then no
q ∈ Q(G) \ Q(G′) is in Q(G′′) (otherwise, G′′(e) = 1). Therefore, Q(G′′) ⊆ Q(G′) and G′′
is a descendant of G′ or equal to G′. Otherwise, if G′′(e) = 1, then G′′ is not a descendant of
G′ nor equal to G′ (since if it were, it must have been that G′′(e) = 0).
4.3 The D-SPEX Algorithm
In this section, we present our algorithm, called D-SPEX, that learns the class Q∨. Our algorithm
relies on the results from the previous section. To find the target formula ψ (more precisely,
its representative Gψ in G(Q∨)), D-SPEX starts from the maximal element in G(Q∨) and
traverses the Hasse diagram downwards. At each step, D-SPEX considers an element G, checks
its witnesses with its immediate descendants, and poses a membership query for each. The
witness is obtained by obtaining a satisfying example for the formula G ∧ ¬G′ (e.g., using an
SMT solver). If ψ and G agree on the witness of G and G′, then by Lemma 4.2.3, ψ cannot be
G′ or its descendant, and thus these are pruned from the search space. Otherwise, if ψ and G′
agree on the witness, then ψ must be G′ or its descendant, and thus all other elements in G(Q∨)
are pruned.
The D-SPEX algorithm is depicted in Algorithm 6. D-SPEX calls the recursive algorithm
Learn, which takes a candidate G and a set T of subsets of Q that stores the already eliminated
elements from Q∨. Learn computes Q, a set of predicates over which ψ (i.e., Gψ) is defined
(i.e., Q(Gψ) ⊆ Q). During the execution, Q may be reduced. If not, then G = ∨Q ≡ ψ. Learn
begins by initializing Q to the predicates in G, i.e., Q(G). Then, it examines the immediate
descendants of G whose ancestors have not been eliminated. When considering G′, a witness
e is obtained and Learn poses a membership query to learn ψ(e). If ψ(e) = 0 (recall that
G(e) = 1 since e is a witness), then G 6≡ ψ and ψ is inferred to be a descendant of G′ and is
thus over the predicates in Q(G′). Thus, Q is reduced. Otherwise, ψ is not a descendant of G′,
and thus G′ and its descendants are eliminated from the search space by adding Q(G′) to T .
Finally, if G and ψ agreed on all witnesses (evident by the Flag variable), then G is returned.
Intuitively, correctness follows since an invariant of the execution is that Gψ is G or one of its
descendants, and if G and ψ agreed on all witnesses, then by Lemma 4.2.3 Gψ is not any of G’s
descendants. Otherwise, if G and ψ did not agree on all witnesses (Flag is false), then Gψ is
inferred to be one of G’s descendants (by Lemma 4.2.3). More precisely, Gψ is a descendant
of the children that agreed with ψ on their witnesses. By the definition of gcd, ψ must be
that gcd or its descendant. Thus, Learn is invoked on their gcd, which by Lemma 4.1.6, is
the disjunction of their common predicates (stored in Q). Note that by the same lemma, this
disjunction is part of the Hasse Diagram (i.e., ∨Q ∈ G(Q∨)).
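To make the traversal concrete, here is a brute-force Python sketch of the Learn loop on the finite setting of Example 1. It is ours and deliberately simplified: witnesses and immediate descendants are computed by exhaustive enumeration rather than by an SMT solver, and the T bookkeeping that avoids re-exploring eliminated branches is omitted.

```python
from itertools import combinations, product

# Brute-force sketch of D-SPEX on Example 1 (encoding ours).
D = list(product([1, 2], repeat=2))
Q = {"x1>=1": lambda e: e[0] >= 1, "x1>=2": lambda e: e[0] >= 2,
     "x2>=1": lambda e: e[1] >= 1, "x2>=2": lambda e: e[1] >= 2}

def sem(P):
    """The meaning of the disjunction over predicate set P."""
    return frozenset(e for e in D if any(Q[q](e) for q in P))

# Representatives: for each equivalence class, the maximum-size predicate set.
classes = {}
for r in range(len(Q) + 1):
    for c in combinations(Q, r):
        classes.setdefault(sem(frozenset(c)), set()).add(frozenset(c))
reps = {max(ps, key=len) for ps in classes.values()}

def imm_descendants(G):
    """Representatives strictly below G with no representative in between."""
    below = [G2 for G2 in reps if sem(G2) < sem(G)]
    return [G2 for G2 in below
            if not any(sem(G2) < sem(G3) < sem(G) for G3 in below)]

def d_spex(psi):
    """Learn the representative of target psi using membership queries psi(e)."""
    G = max(reps, key=len)                     # G_max, top of the Hasse diagram
    while True:
        Qset, flag = set(G), True
        for G2 in imm_descendants(G):
            e = next(x for x in D if x in sem(G) - sem(G2))  # witness for G, G2
            if not psi(e):                     # membership query answered 0
                Qset &= G2                     # target is G2 or a descendant
                flag = False
        if flag:                               # agreed on all witnesses
            return G
        G = frozenset(Qset)                    # recurse on the gcd's predicates

print(sorted(d_spex(lambda e: e[0] >= 2)))     # ['x1>=2']
```

For the target ψ = (x1 ≥ 2), the search descends from Gmax to (x1 ≥ 2) ∨ (x2 ≥ 2) and then to (x1 ≥ 2), where all witnesses agree with ψ and the search stops.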
We now analyze D-SPEX’s complexity.
Theorem 4.1. If the immediate descendants of any G ∈ G(Q∨) can be found in time t, then
D-SPEX learns the target formula in time t · |Q| and with at most |Q| · maxG∈G(Q∨) |De(G)| membership queries.
The complexity proofs follow directly from the height of the Hasse diagram (|Q|) and the
maximal number of immediate descendants (maxG∈G(Q∨) |De(G)|). The fact that D-SPEX
learns the target formula follows from the following lemma.
Lemma 4.3.1. Let ψ be the target formula. If Learn returns G, then Gψ = G (?). Otherwise,
if Learn calls Learn(∨Q, T), then:
1. Q(Gψ) ⊆ Q. That is, Gψ is a descendant of ∨Q or equal to ∨Q.
2. Q(Gψ) ⊄ P for all P ∈ T. That is, Gψ is not a descendant of, or equal to, any ∨P for
P ∈ T.
Proof. The proof is by induction. Obviously, the induction hypothesis is true for (Gmax, ∅).
Assume the induction hypothesis is true for (∨Q, T). That is, Q(Gψ) ⊆ Q and Q(Gψ) ⊄ P
for all P ∈ T. Let G′1, . . . , G′ℓ be all the immediate descendants of ∨Q. If Q(G′i) ⊆ P for
some P ∈ T, then G′i and all its descendants G′′ satisfy Q(G′′) ⊆ Q(G′i) ⊆ P and thus Gψ is not
G′i or a descendant of G′i.
Assume now that Q(G′i) ⊄ P for all P ∈ T. Let e(i) be a witness for ∨Q and G′i. If
ψ(e(i)) = 1, then by Lemma 4.2.3 Gψ is not a descendant of G′i and not equal to G′i. This
implies that Q(Gψ) ⊄ Q(G′i), which is why Q(G′i) is added to T. This proves bullet 2.
If ψ(e(i)) = 1 for all i, then Gψ = G. This follows since, by Lemma 4.2.3, ψ is not any of
G’s descendants, and thus by the induction hypothesis it must be G. This is the case when the
Flag variable does not change to false and D-SPEX outputs G. This proves (?).
If ψ(e(i)) = 0, then by Lemma 4.2.3, Gψ is a descendant of G′i or equal to G′i. Let I
be the set of all indices i for which ψ(e(i)) = 0. Then, Gψ is a descendant of (or equal to)
all G′i, i ∈ I, and therefore Gψ is a descendant of or equal to gcd({G′i}i∈I). By Lemma 4.1.6,
Q(gcd({G′i}i∈I)) = ∩i∈I Q(G′i). Thus, D-SPEX takes the new Q to be ∩i∈I Q(G′i). This
proves bullet 1.
We now prove the lower bound.
Theorem 4.2. Any learning algorithm that learns Q∨ must pose at least
max(log |G(Q∨)|, maxG∈G(Q∨) |De(G)|) membership queries. In particular, D-SPEX
poses at most |Q| · OPT(Q∨) membership queries.
Proof. • OPT(Q∨) ≥ log |G(Q∨)|: The number of different formulas in Q∨ is |G(Q∨)|, and thus from the information-theoretic lower bound we get OPT(Q∨) ≥ ⌈log |G(Q∨)|⌉.
• OPT(Q∨) ≥ maxG∈G(Q∨) |De(G)|: Let G′ be such that m = |De(G′)| =
maxG∈G(Q∨) |De(G)|. Let G1, . . . , Gm be the immediate descendants of G′. If the
target formula is either G′ or one of its immediate descendants, then any learning
algorithm must pose a membership query e(i) such that G′(e(i)) = 1 and Gi(e(i)) = 0.
Without such an assignment the algorithm cannot distinguish between G′ and Gi. By
Lemma 4.2.2, e(i) is a witness only to Gi and therefore the algorithm requires at least m
membership queries.
Finding All Immediate Descendants of G A missing detail in D-SPEX is how to find the
immediate descendants of G in the Hasse diagram H(S(G)) (in Line 5). In this section, we
explain how to obtain them. We first characterize the elements in H(S(G)) (compared to
the other elements in Q∨), which is required because the immediate descendants are part
of H(S(G)). We then give a characterization of the immediate descendants (compared to
other descendants), which leads to an operation that computes an immediate descendant from
a descendant. We finally show how to compute descendants that lead to obtaining different
immediate descendants. This completes the description of how D-SPEX can obtain all immediate
descendants.
By the definition of a representative, for every ϕ ∈ Q∨: Gϕ = ∨q⊨ϕ q. To decide whether
ϕ ∈ Q∨ is a representative, i.e., whether ϕ ∈ G(Q∨), we use the following lemma.
Lemma 4.3.2. Let ϕ ∈ Q∨. ϕ ∈ G(Q∨) if and only if for every q ∈ Q \ Q(ϕ): ϕ ∨ q ≢ ϕ.
Proof. Follows from the definition of G(Q∨) (Lemma 4.1.1).
The next lemma shows how to decide whether G′ is an immediate descendant of G.
Algorithm 7: GetAllImmDe(G)
1 Function GetImmDe(G, G′′):
2   Q′ = Q(G′′)
3   while ∃q ∈ Q(G) \ Q′ : (∨Q′) ∨ q ≢ G do Q′ = Q′ ∪ {q}
4   return Q′
5 De = {GetImmDe(G, false)}
6 e = model(G ∧ ∧_{i=1}^{m} ∨_{q∈Q(G)\Q(Gi)} ¬q)
7 while e ≠ ⊥ do
8   De = De ∪ {GetImmDe(G, ∨{q ∈ Q(G) | e ⊨ ¬q})}
9   e = model(G ∧ ∧_{i=1}^{m} ∨_{q∈Q(G)\Q(Gi)} ¬q)
10 return De
Lemma 4.3.3. Let G, G′ ∈ G(Q∨). G′ is an immediate descendant of G if and only if G′ is a
descendant of G and for every q ∈ Q(G) \ Q(G′) we have G′ ∨ q ≡ G.
Further, if G′ is a descendant of G and for some q ∈ Q(G) \ Q(G′) we have G′ ∨ q ≢ G,
then GG′∨q is a descendant of G and an ascendant of G′.
Proof. Only if: Let G′ be an immediate descendant of G, i.e., G′⇒G. Let q ∈ Q(G) \ Q(G′).
Since G′⇒(G′ ∨ q)⇒G and G′ ≢ G′ ∨ q (since G′ ∈ G(Q∨)), we get from Lemma 4.1.2 that
G′ ∨ q ≡ G.
If: Suppose G′ is a descendant of G and for every q ∈ Q(G) \ Q(G′) we have G′ ∨ q ≡ G.
If G′ is not an immediate descendant of G, then let G′′ be a descendant of G and an immediate
ascendant of G′. Take any q ∈ Q(G′′) \ Q(G′) ⊊ Q(G) \ Q(G′). Then, as before, by
Lemma 4.1.2, G′ ∨ q ≡ G′′. However, G′′ ≢ G and thus G′ ∨ q ≢ G – a contradiction. This
also proves the last statement of the lemma.
The above lemma guides the computation of an immediate descendant from a descendant:
predicates from Q are added to the descendant as long as the resulting formula is not equivalent
to G. We phrase this in an operation called GetImmDe (Algorithm 7). GetImmDe takes G and a
descendant G′′ of G (which can even be the contradiction false), initializes Q′ = Q(G′′), and
repeatedly extends Q′ as follows while possible: for q ∈ Q(G) \ Q′, if (∨Q′) ∨ q ≢ G, q is added
to Q′.
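A finite-domain sketch of GetImmDe (ours; equivalence to G is decided by scanning D instead of querying an SMT solver):

```python
from itertools import product

# GetImmDe over a finite domain (encoding ours): grow a descendant of G by
# adding predicates of Q(G) while the result stays inequivalent to G
# (Lemma 4.3.3); equivalence is decided by scanning D instead of an SMT call.
D = list(product([1, 2], repeat=2))
Q = {"x1>=1": lambda e: e[0] >= 1, "x1>=2": lambda e: e[0] >= 2,
     "x2>=1": lambda e: e[1] >= 1, "x2>=2": lambda e: e[1] >= 2}

def sem(P):
    """The meaning of the disjunction over predicate set P."""
    return frozenset(e for e in D if any(Q[q](e) for q in P))

def get_imm_de(G, G2):
    """From a descendant G2 of G (possibly false, i.e. the empty set),
    return an immediate descendant of G."""
    Qp = set(G2)
    changed = True
    while changed:
        changed = False
        for q in sorted(set(G) - Qp):
            if sem(Qp | {q}) != sem(G):   # (vQ') v q is not equivalent to G
                Qp.add(q)
                changed = True
    return frozenset(Qp)

print(sorted(get_imm_de(frozenset(Q), frozenset())))  # ['x1>=2', 'x2>=2']
```

Starting from false under Gmax of Example 1 it returns {x1 ≥ 2, x2 ≥ 2}, which is indeed Gmax's only immediate descendant there.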
GetImmDe can be used to obtain the first immediate descendant by calling it with the
contradiction false. Then the question is how to obtain a descendant for which GetImmDe will
return a different immediate descendant. More generally, the question is how to obtain a new
immediate descendant after computing a set of immediate descendants, or determine there are
no more immediate descendants. We first give intuition and then formalize it.
Let G′ be an immediate descendant of G. By Lemma 4.1.3, the predicates of any descendant
of G′ are contained in Q(G′) and none are in Q(G) \ Q(G′). Thus, a descendant of G which is
not a descendant of G′ can be constructed by looking for a descendant of G that satisfies one of
the predicates in Q(G) \ Q(G′). This ensures, by the operation of GetImmDe, that the resulting
immediate descendant will also contain that predicate. Technically, a descendant of G might
be found by looking for an element satisfying ∨_{q∈Q(G)\Q(G′)} q. However, that element may
also satisfy the rest of the predicates in Q(G) and thus GetImmDe would result in returning G
itself. To ensure that we find an element that is a descendant of G, we look for a witness of G
and an (unknown) descendant. Namely, we look for an element that satisfies G and falsifies that
descendant’s predicates. Though we do not know that descendant’s predicates, we know that
they intersect with Q(G) \ Q(G′). In general, we know that given a set of immediate descendants
{G1, . . . , Gm}, that descendant’s predicates intersect with each Q(G) \ Q(Gi). This
guides us to look for an example satisfying G ∧ ∧_{i=1}^{m} ∨_{q∈Q(G)\Q(Gi)} ¬q. Given a satisfying
example, we construct the descendant by collecting the predicates whose negation was satisfied
by the example. We next formalize this.
Lemma 4.3.4. Let G1, . . . , Gm be immediate descendants of G. There is no other immediate
descendant of G if and only if
G ∧ ∧_{i=1}^{m} ∨_{q∈Q(G)\Q(Gi)} ¬q    (4.1)
is unsatisfiable. If (4.1) is satisfiable, then for any example e of (4.1), we have that ∨{q ∈ Q(G) | e ⊨ ¬q} is a descendant of G but is not equal to, and is not a descendant of, any Gi, i =
1, . . . , m.
Proof. Only if: Suppose G ∧ ∧_{i=1}^{m} ∨_{q∈Q(G)\Q(Gi)} ¬q is satisfiable, and let e be a satisfying
example. Namely, for every i there is qi ∈ Q(G) \ Q(Gi) such that qi(e) = 0. Denote
Ge = ∨{q ∈ Q(G) | e ⊨ ¬q}. Since qi ∉ Q(Gi), Ge is not a descendant of Gi (Lemma 4.1.3).
Q(Ge) ⊊ Q(G), since there exists q ∈ Q(G) such that e ⊨ q, and thus e ⊭ ¬q and q ∉ Q(Ge).
Thus, since Ge ≢ G and Ge⇒G, Ge is a descendant of G. Since Ge is not a descendant of any
Gi, there must be another immediate descendant of G.
If: Assume that there is another immediate descendant of G, denoted by G′; we show that
(4.1) is satisfiable. Let e be a witness for G and G′. Then, by Lemma 4.2.1, e ⊨ G and for
every q ∈ Q(G′) we have q(e) = 0. Since Q(G′) ⊈ Q(Gi), (Q(G)\Q(Gi)) ∩ Q(G′) is not
empty. Choose qi ∈ (Q(G)\Q(Gi)) ∩ Q(G′). Then, qi ∈ Q(G)\Q(Gi) and since qi ∈ Q(G′),
qi(e) = 0 and ¬qi(e) = 1. Therefore, (4.1) is satisfiable.
4.4 The C-SPEX Algorithm
The C-SPEX algorithm that learns conjunctions over Q is dual to D-SPEX. C-SPEX learns the
class Q∧ = {∧Q′ | Q′ ⊆ Q}. The changes to D-SPEX (Algorithm 6) are: (i) the witness
is obtained by taking an example satisfying ¬G ∧ G′, (ii) the membership-query condition is changed
from 0 to 1, and (iii) Learn is invoked on ∧Q instead of ∨Q. By duality (De Morgan’s law),
all our results are true for learning Q∧ (after swapping ∨ with ∧). In this section, we provide
the lemmas where the changes are more than swapping ∨ with ∧.
Witnesses We begin with lemmas that provide characteristics of the witnesses.
Lemma 4.4.1. Let G1 be an immediate descendant of G2. If e ∈ D is a witness for G1 and G2,
then:
1. G1(e) = 1 and G2(e) = 0.
2. For every q ∈ Q(G1), q(e) = 1.
3. For every q ∈ Q(G2) \ Q(G1), q(e) = 0.
Proof. Since G1⇒G2 it must be that G2(e) = 0 and G1(e) = 1. Namely, for every q ∈ Q(G1),
q(e) = 1. Let q ∈ Q(G2) \ Q(G1). Consider G1 ∧ q. By bullet 3 in (the dual) Lemma 4.1.1,
G1 ∧ q ≢ G1. Since G2⇒G1 ∧ q⇒G1, by (the dual) Lemma 4.1.2, G1 ∧ q ≡ G2. Therefore,
0 = G2(e) = G1(e) ∧ q(e) = q(e).
Lemma 4.4.2. Let De(G) = {G1, G2, . . . , Gt} be the immediate descendants of G. If e is a
witness for G1 and G, then e is not a witness for Gi and G for all i > 1. That is, G1(e) = 1,
G(e) = 0, and G2(e) = · · · = Gt(e) = 0.
Proof. By the previous lemma (Lemma 4.4.1), G(e) = 0 and G1(e) = 1. For any Gi, i ≥ 2, G1
and Gi are immediate descendants of G and thus, by (the dual) Lemma 4.1.5, G ≡ G1 ∧ Gi. Therefore, 0 = G(e) = G1(e) ∧ Gi(e) = Gi(e).
C-SPEX We continue with the lemmas pertaining to the correctness of C-SPEX.
Lemma 4.4.3. Let G′ be an immediate descendant of G, e ∈ D be a witness for G and G′, and
G′′ be a descendant of G.
1. If G′′(e) = 1, G′′ is a descendant of G′ or equal to G′. In particular, Q(G′′) ⊆ Q(G′).
2. If G′′(e) = 0, G′′ is not a descendant of G′ nor equal to G′. In particular, Q(G′′) ⊄ Q(G′).
Proof. Since G′′ is a descendant of G, we have Q(G′′) ⊊ Q(G). By Lemma 4.4.1, for every
q ∈ Q(G′), q(e) = 1 and for every q ∈ Q(G) \ Q(G′), q(e) = 0. Thus, if G′′(e) = 1, then no
q ∈ Q(G) \ Q(G′) is in Q(G′′) (otherwise, G′′(e) = 0). Therefore, Q(G′′) ⊆ Q(G′) and G′′
is a descendant of G′ or equal to G′. Otherwise, if G′′(e) = 0, then G′′ is not a descendant of
G′ nor equal to G′ (since if it were, it must have been that G′′(e) = 1).
Lemma 4.4.4. Let ψ be the target formula. If Learn returns G, then Gψ = G (?). Otherwise,
if Learn calls Learn(∧Q, T), then:
1. Q(Gψ) ⊆ Q. That is, Gψ is a descendant of ∧Q or equal to ∧Q.
2. Q(Gψ) ⊄ P for all P ∈ T. That is, Gψ is not a descendant of, or equal to, any ∧P for
P ∈ T.
Proof. The proof is by induction. The induction hypothesis is true for (Gmax, ∅). Assume the
induction hypothesis is true for (∧Q, T) (Q(Gψ) ⊆ Q and Q(Gψ) ⊄ P for all P ∈ T). Let
G′1, . . . , G′ℓ be all the immediate descendants of ∧Q. If Q(G′i) ⊆ P for some P ∈ T, G′i and
all its descendants G′′ satisfy Q(G′′) ⊆ Q(G′i) ⊆ P, and thus Gψ is not G′i or a descendant of
G′i.
Assume now that Q(G′i) ⊄ P for all P ∈ T. Let e(i) be a witness for ∧Q and G′i. If
ψ(e(i)) = 0, then by Lemma 4.4.3 Gψ is not a descendant of G′i and not equal to G′i. This
implies that Q(Gψ) ⊄ Q(G′i), which is why Q(G′i) is added to T. This proves bullet 2.
If ψ(e(i)) = 0 for all i, then Gψ = G. This follows since by Lemma 4.4.3, ψ is not any of
G’s descendants, and thus by the induction hypothesis it must be G. This is the case when the
Flag variable does not change to false and C-SPEX outputs G. This proves (?).
If ψ(e(i)) = 1, then by Lemma 4.4.3, Gψ is a descendant of G′i or equal to G′i. Let I be
the set of all indices i for which ψ(e(i)) = 1. Then, Gψ is a descendant of (or equal to) all G′i,
i ∈ I, and therefore Gψ is a descendant of or equal to gcd({G′i}i∈I). By (the dual) Lemma 4.1.6,
Q(gcd({G′i}i∈I)) = ∩i∈I Q(G′i). Thus, C-SPEX takes the new Q to be ∩i∈I Q(G′i). This
proves bullet 1.
Immediate Descendants Finally, we provide the dual lemmas pertaining to obtaining the
immediate descendants. Proofs are identical when swapping ∨ with ∧.
Lemma 4.4.5. Let ϕ ∈ Q∧. ϕ ∈ G(Q∧) if and only if for every q ∈ Q \ Q(ϕ): ϕ ∧ q ≢ ϕ.
Lemma 4.4.6. Let G,G′ ∈ G(Q∧). G′ is an immediate descendant of G if and only if G′ is a
descendant of G and, for every q ∈ Q(G) \ Q(G′), we have G′ ∧ q ≡ G.
Further, if G′ is a descendant of G and for some q ∈ Q(G) \ Q(G′) we have G′ ∧ q ≢ G,
then GG′∧q is a descendant of G and an ascendant of G′.
Lemma 4.4.7. Let G1, . . . , Gm be immediate descendants of G. There is no other immediate
descendant of G if and only if (¬G) ∧ ∧_{i=1}^{m} ∨_{q∈Q(G)\Q(Gi)} q is unsatisfiable. If it is satisfiable,
then for any satisfying example e, we have that ∧{q ∈ Q(G) | e ⊨ q} is a descendant of G but is
not equal to, and is not a descendant of, any Gi, i = 1, . . . , m.
Proof. Only if: Suppose (¬G) ∧ ∧_{i=1}^{m} ∨_{q∈Q(G)\Q(Gi)} q is satisfiable, and let e be a satisfying
example. Namely, for every i there is qi ∈ Q(G) \ Q(Gi) such that qi(e) = 1. Denote
Ge = ∧{q ∈ Q(G) | e ⊨ q}. Since qi ∉ Q(Gi), Ge is not a descendant of Gi (the dual
Lemma 4.1.3). Ge is not equivalent to G, since there exists q ∈ Q(G) such that e ⊭ q
(as e ⊨ ¬G) and thus q ∉ Q(Ge). Since Ge is a descendant of G (G⇒Ge and G ≢ Ge) and not a
descendant of any Gi, there must be another immediate descendant of G.
If: Assume that there is another immediate descendant of G, denoted by G′; we show that
the formula is satisfiable. Let e be a witness for G and G′. Then, by Lemma 4.4.1, e ⊭ G and
for every q ∈ Q(G′) we have q(e) = 1. Since Q(G′) ⊈ Q(Gi), (Q(G)\Q(Gi)) ∩ Q(G′) is not
empty. Choose qi ∈ (Q(G)\Q(Gi)) ∩ Q(G′). Then, qi ∈ Q(G)\Q(Gi) and since qi ∈ Q(G′),
qi(e) = 1. Therefore, the formula is satisfiable.
4.5 A Polynomial Time Algorithm for Variable Inequalities
In this section, we study the learnability of conjunctions over variable inequality predicates.
An application of this class was studied in the previous chapter. There, the learning algorithm
started from a positive example, whereas here we make no such assumption. In this section, we
fix the domain to D = Rn (where R is the set of real numbers) and denote an example as a
tuple e = (xe1, . . . , xen). The predicates we consider are pairwise inequalities
over these variables, i.e., they take the form xi > xj. More formally, given I ⊆ [n]2, where
[n] = {1, 2, . . . , n}, the set of predicates we consider is QI = {xi > xj | (i, j) ∈ I}. We
assume throughout this section that (i, i) ∉ I for all i.
We first focus on the subset of conjunctions that do not imply cyclic constraints
(e.g., the conjunction x1 > x2 ∧ x2 > x1 has a cyclic constraint). We give a polynomial-time
algorithm for learning this class. We then study the general case, where any conjunction
over QI is allowed. We show that in this case the learning problem is equivalent to the open
problem of enumerating all the maximal acyclic subgraphs of a given directed graph.
The main idea of the proofs is to represent a conjunction as a directed graph, where the
nodes are the variables and the edges are the constraints. Before introducing the graph, we
provide notations. For a set J ⊆ [n]2, we define ϕJ = ∧_{(i,j)∈J} (xi > xj). For ϕ ∈ QI∧, we define
I(ϕ) = {(i, j) | xi > xj appears in ϕ}. Note that I(ϕJ) = J. For example, I((x1 > x2) ∧ (x3 >
x1)) = {(1, 2), (3, 1)}.
Given a set I ⊆ [n]2, its directed graph is GI = ([n], I). The reachability matrix of I,
denoted by R(I), is an n× n matrix where R(I)i,j is 1 if there is a (directed) path from i to j
in GI ; and 0 otherwise. We say that I is acyclic (resp. cyclic) if the graph GI is acyclic (resp.
cyclic). We say that an assignment to the variables e ∈ Rn is a topological sorting of I if for
every (i, j) ∈ I we have xei > xej . It is known that I has a topological sorting if and only if I is
acyclic. Also, it is known that a topological sorting for an acyclic set can be found in linear time
(see [Knu97], Volume 1, section 2.2.3 and [CSRL01]).
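For concreteness, the reachability matrix and a topological sorting can be computed as in the following minimal Python sketch. The function names are ours, not the thesis's; reachability is computed with Warshall's O(n³) algorithm rather than anything optimized, and the returned assignment plays the role of e = (xe1, . . . , xen).

```python
from itertools import product

def reachability(n, edges):
    # R[i][j] = 1 iff there is a directed path from i to j in ([n], edges).
    R = [[0] * n for _ in range(n)]
    for i, j in edges:
        R[i][j] = 1
    for k, i, j in product(range(n), repeat=3):  # Warshall's algorithm
        if R[i][k] and R[k][j]:
            R[i][j] = 1
    return R

def topological_sorting(n, edges):
    # Returns an assignment e in R^n with e[i] > e[j] for every edge (i, j),
    # or None if the graph is cyclic (Kahn's algorithm).
    indeg = [0] * n
    succ = [[] for _ in range(n)]
    for i, j in edges:
        indeg[j] += 1
        succ[i].append(j)
    order, queue = [], [v for v in range(n) if indeg[v] == 0]
    while queue:
        v = queue.pop()
        order.append(v)
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    if len(order) < n:
        return None  # a cycle remains
    e = [0.0] * n
    for rank, v in enumerate(order):
        e[v] = float(n - rank)  # earlier nodes get larger values
    return e
```

For instance, `topological_sorting(3, [(0, 1), (1, 2)])` returns an assignment with e[0] > e[1] > e[2], i.e., a model of the (0-indexed) conjunction x0 > x1 ∧ x1 > x2.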
We now study the properties of the graph and the reachability matrix, and then the learnability
of QI∧, first when I is acyclic and then when I may be cyclic. Our first lemma states the connection
between satisfiability of a conjunction and topological sortings of the graph.
Lemma 4.5.1. Let ϕ ∈ QI∧, and e ∈ Rn. Then, e |= ϕ if and only if e is a topological sorting
of GI(ϕ).
Proof. If e |= ϕ, then for every (i, j) ∈ I(ϕ), xei > xej, and thus e is a topological sorting of GI(ϕ).
For the other direction, if e is a topological sorting, consider (i, j) ∈ I(ϕ). By the definition of GI(ϕ), it has the edge (i, j).
Since e is a topological sorting, we have xei > xej, and thus e satisfies the conjunct xi > xj. Hence e |= ϕ.
This lemma implies the following corollary.
Corollary 4.3. Let ϕ ∈ QI∧. ϕ is satisfiable if and only if GI(ϕ) has a topological sorting.
In particular, a satisfying assignment e ∈ Rn can be found in linear time.
The next lemma connects equivalence of formulas with equality of the reachability matri-
ces. This will later enable us to focus on reachability matrices when looking for immediate
descendants of a given node: An immediate descendant of G is a formula G′ such that for any
q ∈ Q(G) \ Q(G′), G′ ∧ q ≡ G – this lemma reduces the problem of checking this equivalence
to checking the reachability matrices of G′ ∧ q and G.
Lemma 4.5.2. Let ϕ1, ϕ2 ∈ QI∧. Then, if R(I(ϕ2))=R(I(ϕ1)), we have ϕ1 ≡ ϕ2. Further, if
I is acyclic, then ϕ1 ≡ ϕ2 if and only if R(I(ϕ2))=R(I(ϕ1)).
Proof. 1) Assume R(I(ϕ2)) = R(I(ϕ1)). Suppose on the contrary that ϕ2 ≢ ϕ1. Then, there
is an example e such that ϕ2(e) = 1 and ϕ1(e) = 0 (w.l.o.g.). Since ϕ1(e) = 0, e is not a
topological sorting of GI(ϕ1) (Lemma 4.5.1). Therefore, there is an edge i → j in GI(ϕ1) such
that xei ≤ xej, even though xi > xj is in ϕ1. Since R(I(ϕ2))i,j = R(I(ϕ1))i,j = 1, there is a path from i
to j in GI(ϕ2). We now show that ϕ2(e) = 0 and thus reach a contradiction. Since R(I(ϕ2))i,j = 1,
there is a path i = i1 → i2 → · · · → iℓ = j from i to j in GI(ϕ2). Therefore, ϕ2 contains
ϕ′ = [xi1 > xi2 ] ∧ [xi2 > xi3 ] ∧ · · · ∧ [xiℓ−1 > xiℓ ]. Since ϕ2⇒ϕ′⇒[xi1 > xiℓ ] = [xi > xj ] and
e satisfies xei ≤ xej, we get ϕ2(e) = 0.
2) Assume I is acyclic and ϕ1 ≡ ϕ2. Suppose on the contrary that there are i, j such that
w.l.o.g. R(I(ϕ1))i,j = 0 and R(I(ϕ2))i,j = 1. Since I is acyclic and R(I(ϕ2))i,j = 1, there is
no path from j to i in GI (and therefore none in GI(ϕ1)). Since R(I(ϕ1))i,j = 0, there is also no path
from i to j in GI(ϕ1). Therefore, we can merge the vertices i and j in GI(ϕ1) (unify them into a
single vertex) and get an acyclic graph G′. Using a topological sorting of G′ we get a satisfying
assignment e for ϕ1 that satisfies xei = xej. Namely, ϕ1(e) = 1. We now show that ϕ2(e) = 0
and thus reach a contradiction. Since R(I(ϕ2))i,j = 1, there is a path i = i1 → i2 → · · · → iℓ = j
from i to j in GI(ϕ2). Therefore, ϕ2 contains ϕ′ = [xi1 > xi2 ] ∧ [xi2 > xi3 ] ∧ · · · ∧ [xiℓ−1 > xiℓ ].
Since ϕ2⇒ϕ′⇒[xi1 > xiℓ ] = [xi > xj ] and e satisfies xei = xej, we get
ϕ2(e) = 0.
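For acyclic I, the lemma turns equivalence checking into a comparison of transitive closures. A small illustrative sketch (the helper names are ours), assuming formulas are given as their edge sets I(ϕ):

```python
def closure(n, edges):
    # Transitive closure as the set of reachable pairs (i, j), computed by
    # one DFS per source node.
    adj = {v: [] for v in range(n)}
    for i, j in edges:
        adj[i].append(j)
    reach = set()
    for s in range(n):
        stack, seen = [s], set()
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        reach |= {(s, t) for t in seen}
    return reach

def equivalent(n, I_phi1, I_phi2):
    # Lemma 4.5.2: for acyclic I, phi1 and phi2 are equivalent iff their
    # reachability matrices (here: closure sets) coincide.
    return closure(n, I_phi1) == closure(n, I_phi2)
```

For example, x1 > x2 ∧ x2 > x3 ∧ x1 > x3 is equivalent to x1 > x2 ∧ x2 > x3, since the dropped edge is implied by the remaining path.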
4.5.1 Acyclic Sets
In this section, we study the case when I is acyclic. To make C-SPEX polynomial, we have
to guarantee that the following are polynomial: (i) computing the immediate descendants and
(ii) computing witnesses. We show how to obtain the immediate descendants in quadratic time
(in n) and the witnesses in linear time. Finally, we show that the number of membership queries
is at most |I|.
We show that the immediate descendants of a graph G are the subgraphs obtained by removing
a single edge (r, s) such that s becomes unreachable from r. A witness for G and such an
immediate descendant can be obtained by computing a topological sorting of the descendant that
violates the order between r and s (thus, this topological sorting is not a topological sorting
for G). Before we describe this, we first characterize the members of G(QI∧)
through their reachability matrices.
Lemma 4.5.3. Let I be an acyclic set and ϕ ∈ QI∧. ϕ ∈ G(QI∧) if and only if, for every
(i, j) ∈ I\I(ϕ), there is no path from i to j in GI(ϕ).
Proof. If: If for every (i, j) ∈ I\I(ϕ) there is no path from i to j in GI(ϕ), then for every
(i, j) ∈ I\I(ϕ), R(I(ϕ) ∪ {(i, j)}) ≠ R(I(ϕ)). By Lemma 4.5.2 this implies that ϕ ∧ (xi >
xj) ≢ ϕ. By Lemma 4.4.5 the result follows.
Only if: Now let ϕ ∈ G(QI∧). By Lemma 4.4.5, for every xi > xj not in ϕ we have
ϕ ∧ (xi > xj) 6≡ ϕ. Therefore, there is an assignment e that satisfies xei ≤ xej and ϕ(e) = 1.
As before, if there is a path in GI(ϕ) from i to j, then we get a contradiction.
We now show how to determine the immediate descendants of G in polynomial time.
Lemma 4.5.4. Let I be acyclic. The immediate descendants of G ∈ G(QI∧) are all Gr,s =
ϕ_{I(G)\{(r,s)}} where (r, s) ∈ I(G) and there is no path from r to s in G_{I(G)\{(r,s)}}.
In particular, for all G ∈ G(QI∧), we have |De(G)| ≤ |I(G)| ≤ |I|.
Proof. On the one hand, (r, s) ∈ I(G), and thus R(I(G))r,s = 1. On the other hand, since
there is no path from r to s in G_{I(G)\{(r,s)}}, we have R(I(Gr,s))r,s = 0. Therefore, R(I(G)) ≠
R(I(Gr,s)), and by Lemma 4.5.2, we get G ≢ Gr,s. By Lemma 4.4.6, Gr,s is an immediate
descendant of G.
To show that there is no other immediate descendant, we use Lemma 4.4.7. Note that
Q(G)\Q(Gr,s) = {xr > xs}, and thus by Lemma 4.4.7 it is sufficient to prove that ¬G ∧
∧_{(i,j)∈J} (xi > xj) is unsatisfiable, where J = {(i, j) ∈ I(G) | there is no path from i to
j in G_{I(G)\{(i,j)}}}. To prove this, it is sufficient to show that G ≡ ∧_{(i,j)∈J} (xi > xj). By
Lemma 4.5.2, it is sufficient to show that R(I(G)) = R(J).
If R(J)i,j = 1, then R(I(G))i,j = 1, since GJ is a subgraph of GI(G). If R(I(G))i,j = 1,
there is a path p from i to j in GI(G). Let (r, s) ∈ I(G)\J. Then, (r, s) ∈ I(G) and there is a
path (other than the edge r → s) r → v1 → v2 → · · · → vℓ = s in GI(G). We now show that there is a
path from i to j in G_{I(G)\{(r,s)}}. This is true because if the path p (in GI(G)) contains the edge
r → s, then we can replace this edge with the path r → v1 → v2 → · · · → vℓ = s and get a
new path from i to j in G_{I(G)\{(r,s)}}. Therefore, R(I(G)\{(r, s)})i,j = 1. By repeating this argument for
the other edges in I(G)\J, we get R(J)i,j = 1.
Corollary 4.4. The immediate descendants of G can be found in polynomial time.
Proof. By Lemma 4.5.4, this involves finding a path between every two nodes in the directed
graph GI(G), which can be done in polynomial time (e.g., using Dijkstra's algorithm).
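Following Lemma 4.5.4, the immediate descendants can be enumerated by trying to drop each edge and testing whether an alternative path survives. A minimal sketch (the names are ours; a plain DFS is used for the path test):

```python
def has_path(edges, src, dst):
    # Is there a directed path from src to dst along `edges`? (DFS)
    adj = {}
    for i, j in edges:
        adj.setdefault(i, []).append(j)
    stack, seen = [src], set()
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False

def immediate_descendants(I_G):
    # Lemma 4.5.4: the immediate descendants of G are obtained by dropping a
    # single edge (r, s) such that s is unreachable from r in the remainder.
    descendants = []
    for rs in I_G:
        rest = [e for e in I_G if e != rs]
        if not has_path(rest, rs[0], rs[1]):
            descendants.append(rest)
    return descendants
```

For I(G) = {(0, 1), (1, 2), (0, 2)}, dropping (0, 2) is not allowed (the path 0 → 1 → 2 remains), so only two immediate descendants exist.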
We now show how to find a witness.
Lemma 4.5.5. Let I be acyclic, G ∈ G(QI∧), and Gr,s = ϕ_{I(G)\{(r,s)}} be an immediate
descendant of G. A witness for G and Gr,s can be found in linear time.
Proof. By Lemma 4.5.4, (r, s) ∈ I(G) and there is no path from r to s in G_{I(G)\{(r,s)}}.
Therefore, if we merge vertices r and s in G_{I(G)\{(r,s)}}, we get an acyclic graph G′. Then, a
topological sorting e of G′ is a satisfying assignment for Gr,s that satisfies xer = xes. Since
[xr > xs] ∈ Q(G), we get G(e) = 0. Therefore, e is a witness for G and Gr,s.
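The witness construction of Lemma 4.5.5 (merge r and s, then topologically sort) can be sketched as follows; the names are ours, and the sketch favors clarity over the linear-time bound.

```python
def witness(I_G, r, s):
    # Lemma 4.5.5 sketch: drop the edge (r, s), contract s into r, topologically
    # sort the contracted graph, and give r and s the same value.
    # Assumes (r, s) is removable, so the contracted graph is acyclic.
    edges = {(r if i == s else i, r if j == s else j)
             for i, j in I_G if (i, j) != (r, s)}
    nodes = {v for e in I_G for v in e}
    rep = {v: (r if v == s else v) for v in nodes}
    verts = set(rep.values())
    # Kahn's algorithm on the contracted graph.
    indeg = {v: 0 for v in verts}
    succ = {v: [] for v in verts}
    for i, j in edges:
        indeg[j] += 1
        succ[i].append(j)
    order, queue = [], [v for v in verts if indeg[v] == 0]
    while queue:
        v = queue.pop()
        order.append(v)
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    val = {v: float(len(order) - k) for k, v in enumerate(order)}
    return {v: val[rep[v]] for v in nodes}  # assignment e with e[r] == e[s]
```

For G with I(G) = {(0, 1), (1, 2)} and the removable edge (0, 1), the returned assignment satisfies x1 > x2 while violating x0 > x1 (the two are equal), so it separates G from G0,1.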
Corollary 4.5. The class QI∧ is learnable in polynomial time with at most |I|2 membership
queries.
Proof. Follows from Theorem 4.1 and Corollary 4.4.
We now show that the number of membership queries is in fact lower, namely |I|.
Theorem 4.6. Let I ⊆ [n]2 be acyclic. The class QI∧ is learnable in polynomial time with at
most |I| membership queries.
Proof. Let ψ be the target function. Consider an execution of Learn(G, T). Let G = q1 ∧ q2 ∧ · · · ∧ qt, where qi ∈ QI. By Lemma 4.5.4 and w.l.o.g., G(i) = q1 ∧ q2 ∧ · · · ∧ qi−1 ∧ qi+1 ∧ · · · ∧ qt, for i = 1, . . . , ℓ, are all the immediate descendants of G. Namely, Q = {qi | i = 1, . . . , t}. Let e(i) be the witness for G and G(i), i = 1, . . . , ℓ. If ψ(e(i)) = 1, then Learn removes qi from Q and qi never returns to Q. If ψ(e(i)) = 0, then the set {q1, q2, . . . , qi−1, qi+1, . . . , qt} is added to T, which means that C-SPEX never considers a descendant that does not contain qi.
Namely, for every qi there is at most one membership query that is posed by C-SPEX.
4.5.2 Cyclic Sets
In this section, we consider the general case, that is, where I ⊆ [n]2 can be any set. We first
show that if I is cyclic, then Gmax ≡ false and its immediate descendants are the maximal
acyclic subgraphs. Thus, obtaining them is equivalent to enumerating all maximal acyclic
subgraphs of GI.
Lemma 4.5.6. Let I ⊆ [n]2 be any set with cycles. Then:
1. Gmax ≡ false is in QI∧.
2. The immediate descendants of Gmax are all ∧(i,j)∈J [xi > xj ] where GJ is a maximal
acyclic subgraph of GI .
Proof. 1. I has a cycle, namely, there exists a cycle of constraints xi1 → xi2 → · · · → xic → xi1. Thus, Gmax⇒(xi1 > xi1) ≡ false, and thus Gmax ≡ false.
2. • If GJ is a maximal acyclic subgraph of GI, then adding any edge in I\J to GJ
creates a cycle. This implies that for any xi > xj ∈ S(ϕI)\S(ϕJ), we have
ϕJ ∧ (xi > xj) ≡ false ≡ Gmax. By Lemma 4.4.6, ϕJ is an immediate descendant
of Gmax.
• If ϕJ is an immediate descendant of Gmax, then J is acyclic, because otherwise
ϕJ ≡ false ≡ Gmax. If GJ is not a maximal acyclic subgraph of GI, then there is an
edge (i, j) such that J ∪ {(i, j)} is acyclic. Then, either ϕ_{J∪{(i,j)}} ≡ ϕJ – in which
case ϕJ is not in G(Q∧) and thus not an immediate descendant – or ϕ_{J∪{(i,j)}} ≢ ϕJ
– in which case Gmax⇒ϕ_{J∪{(i,j)}}⇒ϕJ and Gmax ≢ ϕ_{J∪{(i,j)}} ≢ ϕJ, and therefore
ϕJ is not an immediate descendant of Gmax.
Corollary 4.7. Finding all the immediate descendants of Gmax is equivalent to enumerating
all the maximal acyclic subgraphs of GI .
Let G be any directed graph and denote by N(G) the number of the maximal acyclic
subgraphs of G. The following lemma follows immediately from Theorem 5.8 and Lemma 4.5.6.
Lemma 4.5.7. OPT(QI∧) ≥ N(GI).
The problem of enumerating all the maximal acyclic subgraphs of a directed graph is still
an open problem ([ABC+12, BCL+13, Was16]). We next show that learning a function in QI∧ (where I ⊆ [n]2) in polynomial time is possible if and only if the enumeration problem can be
solved in polynomial time.
Theorem 4.8. There is a polynomial time learning algorithm (poly(OPT(QI∧), n, |I|)) that,
for an input I ⊆ [n]2, learns ϕ ∈ QI∧ if and only if there is an algorithm that, for an input
directed graph G = (V, E), enumerates all the maximal acyclic subgraphs of G in polynomial
time (poly(N(G), |V |, |E|)).
Proof. If: Let A be an algorithm that, for an input G, which is a directed graph, enumerates
all the maximal acyclic subgraphs in polynomial time (poly(N(G), |V |, |E|)). The first step
of C-SPEX finds all the immediate descendants of Gmax. By Lemma 4.5.6, this is equivalent
to enumerating all the maximal acyclic subgraphs of GI . This can be done by A in time
poly(N(GI), n, |I|). For every immediate descendant G′ of Gmax ≡ false, any topological
sorting of GI(G′) is a witness for Gmax and G′. Once C-SPEX calls Learn on one of the immediate
descendants of Gmax, the algorithm proceeds as in the acyclic case. This algorithm runs in
poly(N(GI), n, |I|) time and poses at most N(GI) + |I| membership queries. By Lemma 4.5.7,
the algorithm runs in poly(OPT(QI∧), n, |I|) and poses at most OPT(QI∧) + |I| queries.
Only if: Let B be a learning algorithm that runs in poly(OPT(QI∧), n, |I|) time. By the
above argument, it follows that:
OPT(QI∧) ≤ N(GI) + |I|. (4.2)
Let G = ([n], E) be any directed graph. We run the learning algorithm with the target ϕE . For
any membership query posed by the algorithm, we answer 0 until the algorithm outputs the
hypothesis Gmax ≡ false. Let A be the set of all membership queries that are posed by the
algorithm. Then:
1. |A| = poly(N(G), n, |E|): This follows since the algorithm runs in poly(OPT(QI∧), n, |I|)
time and, by (4.2), this is poly(N(G), n, |E|). Thus, the number of membership queries is
poly(N(G), n, |E|).
2. If G′ is a maximal acyclic subgraph of G, then there is an example e ∈ A such that
E(G′) = {(i, j) ∈ E | xei > xej}, where E(G′) is the set of edges of G′: There is an
example e ∈ A that satisfies ϕE(G′)(e) = 1, because otherwise the algorithm cannot
distinguish between ϕE(G′) and Gmax, which violates the correctness of the algorithm.
Now, since ϕE(G′)(e) = 1, we must have E(G′) ⊆ {(i, j) ∈ E | xei > xej}. Since E(G′)
is maximal (adding another edge would create a cycle), E(G′) = {(i, j) ∈ E | xei > xej}.
The algorithm that enumerates all the maximal acyclic subgraphs of G = (V, E) proceeds
as follows. For each e ∈ A, it defines Ee := {(i, j) ∈ E | xei > xej}. If Ge = ([n], Ee) is a
maximal acyclic subgraph, then it lists Ge. Testing whether Ge = ([n], Ee) is maximal can
be done in polynomial time (e.g., by checking edge-by-edge in E). Thus, the overall algorithm
runs in poly(N(G), |V |, |E|) time.
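The edge-by-edge maximality test used by the enumeration algorithm can be sketched as follows (the names are ours; acyclicity is checked with Kahn's algorithm):

```python
def is_acyclic(n, edges):
    # Kahn's algorithm: a directed graph is acyclic iff every node gets ordered.
    indeg = [0] * n
    succ = [[] for _ in range(n)]
    for i, j in edges:
        indeg[j] += 1
        succ[i].append(j)
    queue = [v for v in range(n) if indeg[v] == 0]
    count = 0
    while queue:
        v = queue.pop()
        count += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return count == n

def is_maximal_acyclic(n, E, E_sub):
    # E_sub is a maximal acyclic subgraph of ([n], E) iff it is acyclic and
    # adding any remaining edge of E creates a cycle (checked edge-by-edge).
    if not is_acyclic(n, E_sub):
        return False
    return all(not is_acyclic(n, E_sub + [e]) for e in E if e not in E_sub)
```

For example, within the two-cycle {(0, 1), (1, 0)}, the single edge {(0, 1)} is a maximal acyclic subgraph, whereas {(0, 1)} inside the three-cycle {(0, 1), (1, 2), (2, 0)} is not, since (1, 2) can still be added.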
4.6 Conclusion
In this chapter, we studied the learnability of disjunctions (Q∨) and conjunctions (Q∧) over
a set of predicates Q. We presented two algorithms, D-SPEX and C-SPEX, which pose at most
|Q| · OPT(Q∨) membership queries. We further showed a class that C-SPEX can learn in
polynomial time.
Chapter 5
Learning a DNF of Predicates
In the previous chapter, we introduced algorithms for the general but limited setting of learning
a disjunction or a conjunction over an arbitrary set of predicates. In this chapter, we extend this
setting further and study the problem of learning a disjunctive normal form (DNF) formula (or,
dually, a conjunctive normal form formula) describing the user's intent through membership queries.
DNF formulas are disjunctions of cubes, where a cube is a conjunction of predicates. As in the
previous chapter, the formulas are over arbitrary predefined predicates. More formally, let Q be
a set of predicates over a domain D. Our goal is to learn the class QDNF = {∨_{C∈S} ∧_{q∈C} q |
S ⊆ 2^Q}. We further focus on two settings:
• Predicates that are closed under negation: for all q ∈ Q, ¬q ∈ Q.
• Predicates that are “anti-closed” under negation: for all q ∈ Q, ¬q /∈ Q.
We show for each setting an improved algorithm. For the first setting, we show an algorithm
optimal in the number of membership queries.
5.1 The Search Space
In this section, we describe the search space. Associating each node with a DNF formula
would result in an exponential blow-up. Instead, we associate each node with a cube, and the goal is
to find a set of cubes (nodes) such that the target formula is equivalent to their disjunction.
Formally, our search space is identical to the one presented in Section 4.1: the nodes are the set
of non-equivalent formulas over Q, G(Q∧), and the edges are defined by the Hasse diagram
over the partial order ⇒, where G1⇒G2 if G1 logically implies G2. All notions of (immediate)
descendants/ascendants and lca/gcd are identical. Given this search space, our problem
definition can be stated as follows:
Definition 5.1.1. Given a target DNF formula ψ over Q, find a set of nodes S in the Hasse
diagram H(G(Q∧)) of ⇒, such that ∨_{C∈S} C ≡ ψ.
Algorithm 8: DNF-SPEX
1   return Learn({Gmax}, ∅, ∅)
2   Function Learn(Gs, S, T):
3       if Gs = ∅ then return S
4       NewGs = ∅
5       for G ∈ Gs do
6           Flag = true
7           for G′ ∈ getAllImmDe(G) do
8               if ∃P ∈ T. Q(G′) ⊆ P then continue   // G′ was eliminated by an ancestor
9               e = model(¬G ∧ G′)                   // get a witness for G and G′
10              if ψ(e) = 1 then                     // pose membership query
11                  NewGs = NewGs ∪ {G′}             // a cube is G′ or one of its descendants
12                  Flag = false
13              else
14                  T = T ∪ {Q(G′)}                  // eliminate G′ and all its descendants
15          if Flag then S = S ∪ {G}                 // G is one of the cubes
16      return Learn(NewGs, S, T)
5.2 Searching the Space with Witnesses
The notion of a witness, as defined in Section 4.2, and the lemmas proved there, remain
correct (as the space has not changed, only the goal). In the previous chapter we relied on
the fact that if the target is implied by two (non-comparable) nodes, then the target is implied by
their gcd (which enabled Learn to be invoked on the intersection of these nodes’ predicates in
D-SPEX, Line 13). Here, however, the goal is to find a set of nodes, and it may be that the target
contains these two nodes but is not implied by their gcd. Thus, C-SPEX cannot be used directly
to learn the cubes of a DNF formula. Instead, we consider a new algorithm, called DNF-SPEX,
which modifies C-SPEX for the new goal.
DNF-SPEX, depicted in Algorithm 8, traverses the space G(Q∧) to find the cubes of the
target formula ψ (more precisely, to find a set of cubes whose disjunction is equivalent to ψ).
As in the previous chapter, DNF-SPEX invokes Learn to learn the target. Learn takes three
sets: (i) Gs, a set of nodes from which the cubes are reachable; (ii) S, a set of nodes that are
known to be cubes; and (iii) T, a set of pruned nodes, i.e., nodes such that neither they
nor their descendants are part of the cubes. The first invocation of Learn, called by
DNF-SPEX, is on a set Gs that contains the maximal element of G(Q∧) and on empty sets for
S and T. At each step, Learn examines all members of Gs. For each such element G, it checks
the witnesses of G and its immediate descendants, and poses a membership query for each. If ψ
and G agree on the witness of G and an immediate descendant G′, then by Lemma 4.4.3, ψ is
not implied by G′ or its descendants, and thus these are pruned from the search space by adding
G′ to T. Otherwise, if ψ and G′ agree on the witness, then G′ or one of its descendants is a
cube of ψ. Thus, G′ is added to NewGs (for the next call of Learn).
If G and ψ agreed on all witnesses (evident by the Flag variable), then G is inferred to
be a cube in ψ. As before, correctness follows since an invariant of the execution is that one
of ψ's cubes is G or one of its descendants, and if G and ψ agreed on all witnesses, then by
Lemma 4.4.3 none of G's descendants is a cube of ψ. Eventually, Learn is invoked on
NewGs, S, and T. Learn terminates when NewGs is empty, at which point it is guaranteed
that S contains all cubes, i.e., ψ ≡ ∨_{C∈S} ∧C.
We now analyze DNF-SPEX’s complexity.
Theorem 5.1. If the immediate descendants of any G ∈ G(Q∧) can be found in time t, then
DNF-SPEX learns the target formula in time t · |G(Q∧)|, posing at most |G(Q∧)| membership
queries.
The complexity proofs follow directly from the size of the search space. While this may
seem to be a naïve algorithm (and indeed we will show improvements for some classes), this
complexity should be compared to the size of the search space of all DNF formulas, which is
|G(QDNF)| = Ω(2^{width(G(Q∧))})¹, where width stands for the width of the Hasse diagram of
G(Q∧) (i.e., the maximal number of pairwise non-comparable nodes).
The fact that DNF-SPEX learns the target formula follows from the following lemma.
Lemma 5.2.1. Let ψ be the target formula. If Learn returns S, then ψ ≡ ∨_{C∈S} ∧C.
Otherwise, if Learn calls Learn(Gs, S, T), then for every G ∈ Gs:
1. G |= ψ; that is, there is a cube in ψ that is G or a descendant of G.
2. For all P ∈ T, ∧P ⊭ ψ. That is, no cube in ψ is ∧P or a descendant of ∧P,
for P ∈ T.
3. For every C ∈ S, C |= ψ and no descendant of C logically implies ψ.
Proof. The proof is by induction. Initially, the induction hypothesis holds for ({Gmax}, ∅, ∅).
Assume the induction hypothesis holds for (Gs, S, T). Let G ∈ Gs and let G′1, . . . , G′ℓ be all
its immediate descendants. If Q(G′i) ⊆ P for some P ∈ T, then by the induction hypothesis,
G′i and all its descendants G′′ satisfy G′′ ⊭ ψ (since ∧P ⊭ ψ, there is e |= ∧P such that
e ⊭ ψ; on the other hand, Q(G′′) ⊆ Q(G′i) ⊆ P and thus ∧P |= G′i |= G′′). Thus, no cube is
G′i or a descendant of G′i.
Assume now that Q(G′i) ⊈ P for all P ∈ T. Let e(i) be a witness for G and G′i. If
ψ(e(i)) = 0, then by Lemma 4.4.3 there is no cube in ψ that is G′i or a descendant of
G′i, which is why Q(G′i) is added to T. This proves bullet
2 of the lemma.
If ψ(e(i)) = 0 for all i, then G is a cube in ψ. This follows since, by Lemma 4.4.3, no cube
is a descendant of G, and thus by the induction hypothesis it must be
that G is a cube in ψ. This is the case where the Flag variable does not change to false and
DNF-SPEX adds G to S. This proves bullet 3.
If ψ(e(i)) = 1, then by Lemma 4.4.3, there is a cube that is G′i or a descendant of
G′i, and thus G′i |= ψ. Hence, DNF-SPEX adds G′i to NewGs. This proves bullet 1.
¹This follows since the DNF formula space is at least of magnitude ∑_{i=0}^{width(G(Q∧))} C(width(G(Q∧)), i), where i stands for the number of cubes and C(·, ·) denotes the binomial coefficient.
We finally show that if Learn returns S, then ψ ≡ ∨_{C∈S} ∧C. For every C ∈ S,
C |= ψ (by the induction hypothesis), and thus ∨_{C∈S} ∧C |= ψ. We now prove that for every
e |= ψ, e |= ∨_{C∈S} ∧C. Let e be an example such that e |= ψ, and consider
Ce = ∧{q ∈ Q | e |= q}. We show that DNF-SPEX must have
considered Ce, and thus either Ce ∈ S or a descendant of Ce, denoted G′′, is in S; in either case,
e |= Ce |= G′′ |= ∨_{C∈S} ∧C.
• First, Ce ∈ G(Q∧): For every q ∉ Q(Ce), e ⊭ Ce ∧ q (by the definition of Ce). Thus,
Ce ≢ Ce ∧ q, and thus, by Lemma 4.4.5, Ce ∈ G(Q∧).
• Second, Ce |= ψ: Since e |= ψ, there is a cube C in ψ such that e |= C, and by the definition
of Ce, Ce |= C.
• Third, consider a path from Gmax to Ce: Gmax = G0 |= G1 |= . . . |= Gk = Ce. We
show:
– Each Gi is considered by DNF-SPEX (i.e., at some point it is in Gs): By
induction. The base case is trivial. Assume towards contradiction that Gi is not considered. Then,
there exists P ∈ T such that Q(Gi) ⊆ P. However, in this case Gi ⊭ ψ, and
thus in particular Ce ⊭ ψ (as a descendant of Gi), which contradicts the
previous bullet.
– Either Ce ∈ S or a descendant G′′ of Ce is in S: We prove that if Ce ∈ Gs, then
either Ce is added to S or a descendant of Ce is added to S. If Ce is added to S,
we are done. Thus, assume it is not added to S. Namely, Flag is set to false, and
thus an immediate descendant of Ce is added to NewGs. Then, by the same argument,
either that descendant is added to S or one of its descendants is added to NewGs. We continue
with this argument until Gs contains Gmin. Gmin must be added to S (it has no
immediate descendants and thus Flag remains true), and thus the claim follows.
We now prove the lower bound.
Theorem 5.2. Any learning algorithm that learns QDNF must pose at least log(|G(Q∧)|) membership
queries.
Proof. The number of different formulas in QDNF is at least |G(Q∧)|, and thus from the
information-theoretic lower bound we get OPT(QDNF) ≥ ⌈log |G(Q∧)|⌉.
Note that QDNF can be equal to G(Q∧) for some predicate sets. For example, consider
Q = {(x = 1), (y = 1), (x = 1 ∨ y = 1)}. In this case, G(Q∧) = {(x = 1 ∧ y = 1), (x =
1), (y = 1), (x = 1 ∨ y = 1), true}, and every disjunction over Q is equivalent to some formula
in G(Q∧).
We next show that if further information is available about the search space, we can obtain
improved algorithms and bounds on the number of membership queries.
5.3 Learning when Predicates are Closed under Negation
In this section, we consider the special case where the predicate set is closed under negation.
Namely, for every q ∈ Q: ¬q ∈ Q. We begin this section with a few terms and then outline
our contributions. We say that two examples e1, e2 ∈ D are equivalent with respect to Q if
for all q ∈ Q, e1 |= q ⇔ e2 |= q. If two examples are equivalent with respect to Q, we write
e1 ≡Q e2. A non-equivalent example set is a maximal subset of D such that no pair of examples
is equivalent. Formally, E ⊆ D is a non-equivalent example set if and only if:
• ∀e1, e2 ∈ E : e1 ≢Q e2, and
• ∀e ∈ D \ E : ∃e′ ∈ E : e ≡Q e′.
We next show that, to learn a target DNF formula in this class, the classification of all
non-equivalent examples must be known. We then consider a setting oriented towards program
synthesis applications, where one is given a set of representative positive examples. For this
setting, we show an algorithm optimal in the number of membership queries.
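For a finite domain, the relation ≡Q can be computed by comparing predicate "signatures", and a non-equivalent example set is one representative per realized signature. A minimal sketch (the helper names and the toy predicates are ours):

```python
def signature(Q, e):
    # The Q-signature of an example: which predicates it satisfies.
    # Two examples are equivalent w.r.t. Q iff their signatures coincide.
    return tuple(q(e) for q in Q)

def non_equivalent_set(Q, D):
    # One representative per signature realized in the (finite) domain D.
    reps = {}
    for e in D:
        reps.setdefault(signature(Q, e), e)
    return list(reps.values())

# Toy example over D = {0, 1, 2, 3} with two predicates (ours, for
# illustration only): all four signatures happen to be realized.
Q = [lambda e: e >= 2, lambda e: e % 2 == 0]
E = non_equivalent_set(Q, range(4))
```

Here E has one example per signature; any further example from the domain is ≡Q-equivalent to some member of E.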
5.3.1 A Lower and Upper Bound
In this section, we show that without further assumptions, to learn a DNF formula all non-
equivalent examples must be observed.
Lemma 5.3.1. Let ψ be a target (unknown) DNF formula and E a non-equivalent example set.
Any learning algorithm A that learns ψ from membership queries has to pose, for every e ∈ E,
a membership query a ∈ D such that e ≡Q a. In particular, A has to pose at least |E| membership
queries.
Proof. Let A be such a learning algorithm. Assume towards contradiction that there exists e ∈ E
such that for every membership query a that A posed, e ≢Q a. Let ϕ be the formula A learned.
We show that possibly ψ ≢ ϕ, even though ψ is consistent with the membership queries A
observed. This proves that A cannot distinguish between two non-equivalent formulas, and thus
it may return an incorrect formula.
Let EA be the set of examples A observed that were discovered to be positive examples. We
split into cases:
• If e |= ϕ: Define ψ = ∨_{a∈EA} ∧Ca, where Ca = {q ∈ Q | a |= q}. For every positive
example A observed, ψ returns 1. We now prove:
– e ⊭ ψ: Since for every a ∈ EA, a ≢Q e, we have e ⊭ ∧Ca for every a ∈ EA. This follows
since if a ≢Q e, there exists q ∈ Q such that a |= q and e ⊭ q (this is true because
Q is closed under negation), and thus q ∈ Ca and e ⊭ ∧Ca.
– For every negative example that A observed, ψ returns 0: Let e′ be a negative
example. Then, for every a ∈ EA, a ≢Q e′. As before, e′ ⊭ ∧Ca for every a ∈ EA.
Thus, e′ ⊭ ψ.
Namely, ψ is consistent with the membership queries A observed and e ⊭ ψ. Since
e |= ϕ, ψ ≢ ϕ.
• If e ⊭ ϕ: Define ψ = ∨_{a∈EA∪{e}} ∧Ca, where Ca is defined as before. The proof is similar to
the former case: ψ is consistent with all positive and negative examples (none of which is
equivalent to e with respect to Q) and e |= ψ. Since e ⊭ ϕ, ϕ ≢ ψ.
This lower bound is in fact a tight bound, since if a non-equivalent example set is provided,
the target formula can be inferred immediately. This is our next corollary.
Corollary 5.3. Let ψ be a target (unknown) DNF formula and E a non-equivalent example set.
There exists a learning algorithm that learns ψ with |E| membership queries.
Proof. The algorithm acts as follows. It poses a membership query for every e ∈ E. Then, if
EP ⊆ E is the set of positive examples (i.e., ψ(e) = 1), the algorithm returns ϕ = ∨_{e∈EP} ∧Ce,
where Ce = {q ∈ Q | e |= q}. We now prove that the algorithm is correct on all inputs. Let
e ∈ D. There exists a ∈ E such that e ≡Q a, and thus e |= ψ ⇔ a |= ψ. If a is a positive
example, then Ca is part of ϕ, and since e ≡Q a, e |= ϕ. Otherwise, if a is negative, then since
for every a′ ∈ EP, a ≢Q a′, we have a ⊭ ∧Ca′ and thus e ⊭ ∧Ca′; in particular, e ⊭ ϕ.
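The algorithm of Corollary 5.3 can be sketched directly: query every representative and return the disjunction of the positive examples' cubes. The names and the toy target below are ours; predicates are modeled as Python callables, and Q is closed under negation as the section assumes.

```python
def learn_dnf(Q, E, psi):
    # Corollary 5.3 sketch: pose a membership query for each example in a
    # non-equivalent example set E; the hypothesis is the disjunction of the
    # cubes C_e = {q in Q | e |= q} over the positive examples.
    cubes = [[q for q in Q if q(e)] for e in E if psi(e)]  # membership queries
    def phi(x):
        return any(all(q(x) for q in C) for C in cubes)
    return phi

# Toy target (ours): "x >= 2 or x is even", with Q closed under negation.
Q = [lambda e: e >= 2, lambda e: e < 2,
     lambda e: e % 2 == 0, lambda e: e % 2 != 0]
psi = lambda e: e >= 2 or e % 2 == 0
# [0, 1, 2, 3] realizes all four signatures, so it is a non-equivalent
# example set for this Q.
phi = learn_dnf(Q, [0, 1, 2, 3], psi)
```

The learned phi agrees with psi on the whole domain, since every example is ≡Q-equivalent to one of the four representatives.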
5.3.2 Learning with Representative Positive Examples
In this section, we consider a special setting oriented for program synthesis. A common as-
sumption of programming by example is that the user provides a set of representative positive
examples. Typically, this notion is interpreted intuitively and implies that the provided examples
indicate the desired behavior on all examples: all positive examples “resemble” one of the repre-
sentative examples and none of the negative examples resemble them. Here, we formalize this
notion and then show an algorithm that learns a target DNF formula from a set of representative
examples. If the set of representative examples is minimal, then the algorithm is optimal in the
number of examples it needs for learning. We begin with the main definitions.
Let ψ be a DNF formula and let Sψ be the set of cubes in ψ, that is, Sψ ⊆ 2^Q and
∨_{C∈Sψ} ∧C ≡ ψ. A set of examples E ⊆ D is called a representative set for ψ if for every
C ∈ Sψ there is e ∈ E such that e |= C. Note that examples in E may satisfy more than one
cube (so |E| may be smaller than |Sψ|). A representative set for ψ is minimal if no strict subset
of it is a representative set for ψ.
The search space, G(Q∧), does not allow pruning in this setting. Consider the set of
predicates Q = {x1, ¬x1, . . . , xn, ¬xn}, where the xi are Boolean variables. In this example,
the number of immediate descendants of Gmax is 2^{|Q|/2}. This is because every subset of
Q that contains, for each i, either xi or ¬xi (but not both) is not equivalent to Gmax, but adding to it any
other predicate makes it equivalent to Gmax. Thus, the current algorithm will pose a
membership query for Gmax and each of its immediate descendants, resulting in presenting all
non-equivalent examples.
Instead, we consider a different search space that enables pruning in our setting. The
main insight is that instead of looking for all cubes of the target formula – which cannot
be done without a non-equivalent example set – we look for all cubes satisfied by a given
positive example. To this end, we define a new search space where nodes correspond to non-
equivalent examples from D and we organize them in a way that enables pruning – based on
the given positive example. More specifically, given a positive example e ∈ D, we search for all cubes that are satisfied by e and that logically imply the target DNF ψ, i.e., all sets C ⊆ Q such that e |= ∧C |= ψ. To guarantee that e |= ∧C, the search considers subsets of Q(e) = {q ∈ Q | e |= q}. This is performed by an on-the-fly traversal of a directed graph Ge = (V, Ee) where each node v ∈ V is associated with such a subset C ⊆ Q(e). The traversal ensures that for every node that remains in the graph, the set C associated with it satisfies ∧C |= ψ.
We first present this new search space. We then show how it can be used to optimally search
for the set of conjunctions satisfied by a single positive example. Finally, we show an algorithm
to learn a DNF formula from a representative set. If the representative set is minimal, we prove
that this algorithm is optimal in the number of membership queries.
A New Search Space
Given an example e, we define a new search space, denoted by Ge = (V,Ee). To describe the
nodes, we first divide the predicates into two sets:
• The core predicate set, denoted by Qc, which is the maximal subset of Q that is not closed under negation: ∀q ∈ Qc : ¬q /∈ Qc, and for every q ∈ Q \ Qc : ∃q′ ∈ Qc : q ≡ ¬q′.
• The negated predicate set, denoted by Q¬, which contains all the predicates' negations: Q¬ = Q \ Qc.
In the following, we use the standard names and refer to Q as a literal set, write L = Qc ∪ Q¬, and write l for elements (literals) of L. To further simplify writing, we assume all predicates in Q¬ take the form ¬q where q ∈ Qc. For a literal q ∈ Qc, we say that q is its positive form and ¬q is its negative form. Another notation we use is L(e), for e ∈ D, which is the set of literals satisfied by e: L(e) = {l ∈ L | e |= l}.
Nodes The set V of nodes in Ge is the set of all subsets of L in which every predicate appears exactly once, either in its positive form or its negative form. Namely,

V = {v ⊆ L | ∀l ∈ L. l ∈ v ↔ ¬l ∉ v}
Nodes in V are classified as positive, negative, or unsatisfiable by associating them with concrete examples. For a node v ∈ V and an example e′ ∈ D, if e′ |= ∧v, we say that e′ is an example corresponding to v. If e′ |= ψ (the target DNF formula), v is positive; if e′ ⊭ ψ, v is negative; if there is no e′ ∈ D such that e′ |= ∧v, v is unsatisfiable. Note that a node may have multiple corresponding concrete examples in D; however, they are all equivalent modulo L. Thus, in the following we refer to corresponding examples in singular form (e.g., we say the corresponding example of a node).
We say that nodes v ≠ v′ ∈ V are logically equivalent if ∧v ≡ ∧v′. As we show next, this can only happen if v and v′ are unsatisfiable.
Lemma 5.3.2. Logically equivalent nodes in V are unsatisfiable: if v ≠ v′ and ∧v ≡ ∧v′, then ∧v is unsatisfiable.

Proof. Since v ≠ v′ ∈ V, there exists l ∈ v such that ¬l ∈ v′. Assume towards a contradiction that there exists e |= ∧v. Then, on the one hand, e |= ∧v |= l. On the other hand, ∧v ≡ ∧v′ and thus e |= ∧v′ |= ¬l, a contradiction.
Corollary 5.4. For every e′ ∈ D, there exists a single node v ∈ V such that L(e′) = v.
Note that the set of nodes in Ge, their corresponding examples, and their classification are
independent of e. Next, we define the set of edges, as well as the cube associated with each
node, which depend on e.
Associated Cubes and Edges
To define the set of edges Ee, we first define a labeling of nodes. Given an example e, the e-label of a node v ∈ V, denoted Ce(v), is the set of literals from L(e) that appear in v. That is,
Ce(v) = v ∩ L(e)
Therefore, for every v ∈ V , e |= ∧Ce(v), which ensures that the e-label of each node represents
a candidate set C that satisfies e |= ∧C |= ψ. We next show that each node has a unique e-label
and that for each subset C ⊆ L(e) there is a node with that e-label.
Lemma 5.3.3. If v ≠ v′, then Ce(v) ≠ Ce(v′).

Proof. Since v ≠ v′, there exists l ∈ v such that ¬l ∈ v′. Assume w.l.o.g. that l ∈ L(e). Then, l ∈ Ce(v) and l /∈ Ce(v′).
Lemma 5.3.4. For every C ⊆ L(e) there exists a node v ∈ V such that Ce(v) = C.
Proof. Given C ⊆ L(e), we define v = C ∪ Q1 ∪ Q2 where Q1 = {¬q ∈ L | q ∈ L(e) \ C} and Q2 = {q ∈ L | ¬q ∈ L(e) \ C}. We show that v ∈ V:
• If q /∈ v, then ¬q ∈ v: if q /∈ v, then q /∈ C and q /∈ Q2. If ¬q ∈ C, we are done. Otherwise, since ¬q /∈ C and ¬q /∈ L(e) \ C, then ¬q /∈ L(e). This implies that q ∈ L(e) and thus q ∈ L(e) \ C, which implies ¬q ∈ Q1 and ¬q ∈ v.
• If ¬q /∈ v, then q ∈ v: similar.
• There are no q, ¬q ∈ v: assume towards a contradiction that q, ¬q ∈ v. Since C ⊆ L(e), it cannot be that both q, ¬q ∈ C. Since L(e) cannot contain both q and ¬q, it cannot be that q, ¬q ∈ L(e) \ C, and thus it cannot be that both ¬q ∈ Q1 and q ∈ Q2. If q ∈ C and ¬q ∈ Q1, then q ∈ L(e) \ C – a contradiction. If ¬q ∈ C and q ∈ Q2, then ¬q ∈ L(e) \ C – a contradiction.
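Lemmas 5.3.3 and 5.3.4 together say that the e-labeling is a bijection between V and the subsets of L(e). This can be checked by brute force on a small instance; the encoding below (Boolean predicates, nodes as full assignments) is ours, for illustration only.

```python
from itertools import combinations, product

n = 3
e = (1, 0, 1)                         # a fixed example over 3 Boolean variables
V = list(product((0, 1), repeat=n))   # a node picks one literal per variable
L_e = frozenset((i, e[i]) for i in range(n))   # L(e): literals satisfied by e
label = lambda v: frozenset((i, v[i]) for i in range(n) if v[i] == e[i])

labels = [label(v) for v in V]
assert len(set(labels)) == len(V)     # Lemma 5.3.3: e-labels are distinct
powerset = {frozenset(c) for r in range(n + 1) for c in combinations(L_e, r)}
assert set(labels) == powerset        # Lemma 5.3.4: every C ⊆ L(e) is a label
print(len(V), len(powerset))          # → 8 8
```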
Using the labeling, we define the set of edges, Ee, in a way that allows checking whether
∧Ce(v) |= ψ. The idea is to define the edges such that the set of ancestors of a node v in Ge
represents all the ways of extending Ce(v) into maximal cubes (where every predicate appears exactly once). This ensures that ∧Ce(v) |= ψ if and only if v and all its ancestors are positive or unsatisfiable.
Formally, we define:
Ee = {(v, v′) ∈ V × V | Ce(v′) ⊊ Ce(v) and ¬∃v′′ ∈ V. Ce(v′) ⊊ Ce(v′′) ⊊ Ce(v)}
We next define some terminology. We define the parents of v through a function parentse : V → P(V):

parentse(v) = {v′ ∈ V | (v′, v) ∈ Ee}

The ancestors of a node v, ancestorse(v), is the minimal set containing parentse(v) and closed under the parentse function. The descendants of v, descendantse(v), is the set {v′ ∈ V | v ∈ ancestorse(v′)}.
Observation 5.3.5. The following hold:
1. ancestors(v) = {v′ ∈ V | Ce(v) ⊊ Ce(v′)} = {v′ ∈ V | Ce(v) ⊊ v′}
2. descendants(v) = {v′ ∈ V | Ce(v′) ⊊ Ce(v)} = {v′ ∈ V | v′ ∩ L(e) ⊊ Ce(v)}
Lemma 5.3.6. Let v be a node such that every v′ ∈ {v} ∪ ancestors(v) is either unsatisfiable or positive. Then, ∧Ce(v) |= ψ.

Proof. Assume towards a contradiction that ∧Ce(v) ⊭ ψ. That is, there exists e′ |= ∧Ce(v) such that e′ ⊭ ψ. Consider the node v′ = L(e′). Since e′ |= ∧Ce(v), Ce(v) ⊆ L(e′), and since Ce(v) ⊆ L(e), Ce(v) ⊆ Ce(v′). Namely, v′ is a node in {v} ∪ ancestors(v) whose corresponding example is negative. This contradicts the assumption.
Lemma 5.3.7. Let v be a negative node and v′ ∈ {v} ∪ descendants(v). Then, ∧Ce(v′) ⊭ ψ.

Proof. Let e′ be a corresponding example of v, i.e., e′ ⊭ ψ (since v is negative). Since e′ |= ∧v and ∧v |= ∧Ce(v), we get ∧Ce(v) ⊭ ψ. Also, for every v′ ∈ descendants(v), Ce(v′) ⊆ Ce(v), and thus ∧Ce(v) |= ∧Ce(v′) and we get ∧Ce(v′) ⊭ ψ.
Finally, we show that the graph Ge has no cycles.

Observation 5.3.8. Ge has no cycles.

Proof. If there were a cycle containing v, then v would be an ancestor of itself, and thus Ce(v) ⊊ Ce(v) – a contradiction.
An Algorithm for Learning Cubes from an Example
In this section, we present Cube-SPEX, an algorithm for learning cubes from an example. The algorithm takes a positive example e and learns all implicants of ψ that are over the predicates in Q. Cube-SPEX (Algorithm 9) maintains a set of nodes, Φ, which is
Algorithm 9: Cube-SPEX(e, L)
1  Φ = V
2  Visited = ∅
3  while Φ \ Visited ≠ ∅ do
4      pick v ∈ Φ \ Visited
5      Visited = Visited ∪ {v}
6      e′ = model(v)
7      if e′ == ⊥ then continue
8      if ψ(e′) == 0 then
9          Φ = Φ \ ({v} ∪ descendants(v))   // descendants(v) = {v′ | Ce(v′) ⊊ Ce(v)}
10 return Φ
initially V. At the end of the execution, v ∈ Φ if and only if ∧Ce(v) |= ψ. The algorithm presents a corresponding example for each node that was not pruned from Φ. If a corresponding example is negative, the node and its descendants are pruned (as, by Lemma 5.3.7, none of them logically implies ψ). Eventually, every node v that remains in Φ has the property that v and its ancestors are positive or unsatisfiable nodes. From Lemma 5.3.6, ∧Ce(v) |= ψ.
The algorithm operates as follows. While there is an unpruned node v that has not been
considered yet, a corresponding example is obtained. If there is no such example, v remains in
Φ. Otherwise, a membership query is posed to the oracle. If the example is negative, v and its
descendants are pruned. When all nodes have been considered or pruned, the set Φ is returned. We
note that a clean-up step can be performed afterwards to return only the nodes whose Ce(v) is minimal:

Φcleaned = {v ∈ Φ | ¬∃v′ ∈ Φ. Ce(v′) ⊊ Ce(v)}

The final set of cubes is then {∧Ce(v) | v ∈ Φcleaned}. As another clean-up step (which is computationally more expensive), equivalent labels can be removed from Φcleaned to get a minimal number of conjunctions with the same semantic meaning.
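The following is a minimal executable sketch of Cube-SPEX, under simplifying assumptions that are ours: the predicates are Boolean variables and their negations (so every node is satisfiable and model(v) is immediate), a node is identified with the assignment corresponding to it, and the membership oracle ψ is a function standing in for the user. It also applies the BFS order discussed below (supersets before subsets) and the first clean-up step.

```python
from itertools import combinations

def subsets(s):
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(sorted(s), r)]

def cube_spex(e, n, psi):
    """Return the minimal label sets C ⊆ L(e) with e |= ∧C |= psi.
    A node is the unique assignment agreeing with e exactly on its label."""
    def node_of(C):                       # model(v) for the node labeled C
        return tuple(e[i] if i in C else 1 - e[i] for i in range(n))
    labels = sorted(subsets(range(n)), key=len, reverse=True)  # BFS order
    pruned, phi = set(), []
    for C in labels:
        if C in pruned:
            continue
        if psi(node_of(C)):               # membership query: positive
            phi.append(C)
        else:                             # negative: prune C and descendants
            pruned.update(subsets(C))
    # clean-up: keep only the minimal labels
    return [C for C in phi if not any(D < C for D in phi)]

# target ψ = (x0 ∧ x1) ∨ (¬x0 ∧ x2), positive example e = (1, 1, 1)
psi = lambda v: (v[0] and v[1]) or ((not v[0]) and v[2])
print(sorted(map(sorted, cube_spex((1, 1, 1), 3, psi))))  # → [[0, 1], [1, 2]]
```

The two returned labels encode the cubes x0 ∧ x1 and x1 ∧ x2, the implicants of ψ satisfied by e.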
Lemma 5.3.9 (Soundness). At the end of the execution, for every v ∈ Φ, ∧Ce(v) |= ψ.
Proof. From Lemma 5.3.6, it is sufficient to show that v and its ancestors are positive or unsatisfiable. Assume towards a contradiction that there exists v′ ∈ {v} ∪ ancestors(v) that is negative, i.e., its corresponding example is negative. We show that in this case v must have been pruned from Φ, and thus get a contradiction. Initially, v′ ∈ Φ. Thus, either it is explored by Cube-SPEX or it is pruned before it is explored. If v′ was explored, then its corresponding example is negative and thus v is pruned from Φ, because it is equal to v′ or is a descendant of v′. If v′ was not explored, then another node v′′ caused v′ to be pruned, namely v′ ∈ descendants(v′′). However, in this case v ∈ descendants(v′′) as well, and is thus pruned, too.
Lemma 5.3.10 (Completeness). At the end of the execution, for every C ⊆ L such that e |= ∧C |= ψ, there exists v ∈ V such that v ∈ Φ and Ce(v) = C.
Proof. Let C be such a subset. Since e |= ∧C, by definition C ⊆ L(e). From Lemma 5.3.4,
there exists v ∈ V such that Ce(v) = C. Initially, v ∈ Φ. From Lemma 5.3.7, since ∧C |= ψ,
there is no ancestor of v that is negative and v is also not negative. Thus, v is never pruned
from Φ.
Lemma 5.3.11. If e′ is presented by Cube-SPEX, Cube-SPEX has not previously presented an
example e′′ such that e′ ≡L e′′.
Proof. Assume towards a contradiction that Cube-SPEX presented an example e′′ ≡L e′ before presenting e′. Then, L(e′) = L(e′′). From Corollary 5.4, both correspond to a single node v′. This means that v′ is considered twice by Cube-SPEX; however, every node considered is added to Visited and is thus not considered again – a contradiction.
Optimal Learning Algorithm from an Example
In this section, we optimize the search to leverage pruning as much as possible. The idea is to
search the space in a BFS order, namely by examining (and potentially pruning) sets before considering their subsets. Thus, subsets are only examined, and pose a membership query, if all
their ancestors are in Φ. Namely, we change the unspecified pick of a node v (Line 4) to pick
nodes in a BFS order. We prove that in this case, the number of membership queries is minimal.
We call this variation BFS-Cube-SPEX.
Lemma 5.3.12. Let ψ be the target formula, e be a positive example, and Seψ be the set of cubes
from ψ satisfied by e. Further, let E be a non-equivalent example set and A be a learning
algorithm that learns for every e maximal cubes from Q∧ satisfied by e. Denote by EA the set
of queries that A posed. Then, for every e′ ∈ E, one of the following is true:
• There exists e′′ ∈ EA such that e′′ ≡Q e′.
• There exists e′′ ∈ EA such that Ce(e′′) |= Ce(e′) and e′′ ⊭ ∨_{C∈Seψ} ∧C.
Proof. Let e′ ∈ E and assume there is no e′′ ∈ EA such that e′ ≡Q e′′, and no e′′ ∈ EA such that Ce(e′′) |= Ce(e′) and e′′ ⊭ ∨_{C∈Seψ} ∧C. We split into cases:
• If A returns a conjunction c such that e′ |= c: let EP be the set of positive examples that A observed. By our assumption, for every e′′ ∈ EP: e′′ ≢Q e′. Thus, we set ψ = ∨_{e′′∈EP} ∧L(e′′). This hypothesis aligns with all of A's queries (positive and negative). However, e′ ⊭ ψ and e′ |= c. Thus c ⊭ ψ and c is incorrect.
• If A returns no conjunction c such that e′ |= c: let EP and EN be the sets of positive and negative examples A observed, respectively. By our assumption, for every e′′ ∈ EP: e′′ ≢Q e′, and for every e′′ ∈ EN: either Ce(e′′) ⊭ Ce(e′) or e′′ |= ∨_{C∈Seψ} ∧C. Let v be the node whose e-label is Ce(v) = L(e) ∩ L(e′). Then, we set ψ = ∨_{e′′∈EP∪ancestors(v)∪{e′}} ∧L(e′′). This hypothesis aligns with all of A's queries (positive and negative). Namely, e |= ∧Ce(v) and e′ |= ∧Ce(v). Since A did not return a conjunction satisfied by e′ that is satisfied by e, it did not return ∧Ce(v). Thus, A did not return a maximal set of cubes over Q∧.
From the above lemma, it follows that to show that BFS-Cube-SPEX poses a minimal number of queries, it is sufficient to show that if it presented an example which was discovered to be negative, it was not redundant. That is, there was no e′′ ∈ EA such that Ce(e′′) |= Ce(e′) and e′′ ⊭ ∨_{C∈Seψ} ∧C, and thus by the lemma, this query could not be avoided. We prove this in the following lemma.
Lemma 5.3.13. If BFS-Cube-SPEX poses a membership query for a node v, then every v′ ∈ ancestors(v) is positive or unsatisfiable.
Proof. Let v be a node for which BFS-Cube-SPEX poses a membership query. Assume in
contradiction that there is a node v′ ∈ ancestors(v) that is negative. Then, by the BFS order
and since G has no cycles (Observation 5.3.8), v′ is considered before v. Thus, v′ is pruned
along with its descendants, including v. Thus, BFS-Cube-SPEX does not explore v or present a
membership query for v.
Learning DNF Formulas from Representative Sets
In this section, we provide DNF-SPEX, an algorithm that learns a DNF formula from a representative set. DNF-SPEX (Algorithm 10) is straightforward: it iterates over the positive examples in the representative set and for each of them learns a maximal set of cubes via Cube-SPEX. To avoid posing equivalent queries, DNF-SPEX maintains sets EP and EN of positive and negative examples, to which Cube-SPEX also adds examples. When Cube-SPEX considers a node v, it checks whether there is already an example corresponding to v in EP or EN.
Algorithm 10: Neg-Closed-DNF-SPEX(E)
1  EN = ∅
2  EP = E
3  S = ∅
4  for e ∈ E do
5      S = S ∪ Cube-SPEX(e, L, EP, EN)
6  clean(S)
7  return ∨_{C∈S} ∧C
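A self-contained sketch of this outer loop, again over Boolean predicates; the brute-force implicant search inside stands in for Cube-SPEX, and the query caching via EP/EN is omitted – both are our simplifications for illustration.

```python
from itertools import combinations, product

def dnf_spex(E, n, psi):
    """Learn a DNF equivalent to psi from a representative set E (sketch).
    For each positive example, collect the cubes over its literals that
    imply psi; brute force here stands in for Cube-SPEX."""
    def implies(C, e):
        # does the cube {x_i = e_i | i ∈ C} imply psi?
        free = [i for i in range(n) if i not in C]
        for bits in product((0, 1), repeat=len(free)):
            w = list(e)
            for i, b in zip(free, bits):
                w[i] = b
            if not psi(tuple(w)):
                return False
        return True
    S = set()
    for e in E:
        if psi(e):
            for r in range(n + 1):
                for C in combinations(range(n), r):
                    if implies(set(C), e):
                        S.add(frozenset((i, e[i]) for i in C))
    return S   # each cube is a set of (index, value) literals

# target: (x0 ∧ x1) ∨ (¬x0 ∧ x2); E holds one example per cube
psi = lambda v: (v[0] and v[1]) or ((not v[0]) and v[2])
S = dnf_spex([(1, 1, 0), (0, 0, 1)], 3, psi)
learned = lambda v: any(all(v[i] == b for i, b in C) for C in S)
print(all(learned(v) == bool(psi(v))
          for v in product((0, 1), repeat=3)))   # → True
```

The learned disjunction agrees with ψ on the entire domain, as Theorem 5.5 guarantees.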
Theorem 5.5. At the end of the execution, ψ ≡ ∨_{C∈S} ∧C.
Proof. By the soundness of Cube-SPEX, ∨_{C∈S} ∧C |= ψ. We now prove ψ |= ∨_{C∈S} ∧C. Let e |= ψ. Namely, there exists a cube C in ψ such that e |= ∧C. Since E is a representative set, there exists e′ ∈ E such that e′ |= ∧C. Since e′ |= ∧C |= ψ, by the completeness of Cube-SPEX, it returns the conjunction ∧C, and thus C ∈ S.
Theorem 5.6. If E is a minimal representative set and BFS-Cube-SPEX is executed, Neg-Closed-DNF-SPEX learns ψ with a minimal number of examples.
Proof. Let e be a considered example. If e is presented by BFS-Cube-SPEX, then no equivalent
example has already been considered, and by the optimality of BFS-Cube-SPEX, this example
is required for correctness. If e is part of E, then since E is minimal, there exists a cube C in ψ such that C is not satisfied by any other example in E. Namely, for every e′ ∈ E \ {e}, e′ ⊭ ∧C. In particular, ∧L(e′) ⊭ ∧C. Thus, any cube returned by invoking BFS-Cube-SPEX on e′ is implied by ∧L(e′) and thus does not imply ∧C. Namely, without e, Neg-Closed-DNF-SPEX will return S such that ψ ⊭ ∨_{C′∈S} ∧C′. Thus, e is necessary.
5.4 Learning when Predicates are Anti-closed under Negation
In this section, we study another class of predicate sets – those that are anti-closed under negation. A predicate set is anti-closed under negation if for every q ∈ Q, there is no q′ ∈ Q such that ¬q ≡ q′. The unique aspect of this class is that it enables pruning that follows from the inability to express negation of predicates. We illustrate this class with the following example. Let D be the set of natural numbers and consider the set of predicates Q = {q×2, q×3, q×5}, where q×n is satisfied by all elements of D that are multiples of n. Assume a target formula ψ over Q. If 2 is a positive example, which satisfies q×2, ¬q×3, and ¬q×5, it must be that q×2 |= ψ. This follows because if 2 is a positive example, there must be a cube over Q in ψ that is satisfied by 2. Since the cube {q×2} and the tautology true are the only cubes over Q satisfied by 2, we get q×2 |= ψ.
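The claim about the example 2 can be checked mechanically; the encoding of the predicates as divisors below is ours, for illustration.

```python
from itertools import combinations

Q = (2, 3, 5)   # predicates q×2, q×3, q×5: divisibility by n
sat = lambda e, C: all(e % n == 0 for n in C)   # e satisfies the cube C ⊆ Q

cubes = [set(c) for r in range(len(Q) + 1) for c in combinations(Q, r)]
print([sorted(C) for C in cubes if sat(2, C)])  # → [[], [2]] (true and {q×2})
```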
This intuition guides our algorithm for learning DNF formulas in this class: when considering whether a cube C implies ψ, our algorithm presents an example that satisfies C and the negations of the predicates in Q \ C. If such an example exists and is positive, then ∧C logically implies ψ, and so do cubes containing C's predicates. Otherwise, if ∧C ⊭ ψ, then neither C nor its subsets logically imply ψ. We begin with defining the search space and then provide the algorithm.
5.4.1 The Search Space
We define the search space similarly to the one defined in the previous section, but with respect to the predicate set Q ∪ {¬q | q ∈ Q}. The labeling function is now defined differently (and independently of concrete examples):
C(v) = v ∩Q
The corresponding examples and the edges are defined identically.
We begin with the main lemma that guides the pruning of the search space.
Lemma 5.4.1. Let ψ be a target formula, v a node, and e ∈ D a corresponding example for v.
1. If ψ(e) = 0, then ∧C(v) ⊭ ψ and for every descendant v′ of v: ∧C(v′) ⊭ ψ.
2. If ψ(e) = 1, then ∧C(v) |= ψ and for every ancestor v′ of v: ∧C(v′) |= ψ.

Proof. 1. If ψ(e) = 0: since e is a corresponding example for v, in particular e |= ∧C(v), and thus ∧C(v) ⊭ ψ. For every descendant v′ of v, C(v′) ⊆ C(v), and thus ∧C(v) |= ∧C(v′); in particular e |= ∧C(v′) and ∧C(v′) ⊭ ψ.
2. If ψ(e) = 1: either ψ is a tautology, in which case the claim trivially holds, or there exists a cube C ⊆ Q in ψ such that e |= ∧C. Since e does not satisfy any of the predicates in Q \ C(v), it must be that C ⊆ C(v). Since ∧C |= ψ and ∧C(v) |= ∧C, it follows that ∧C(v) |= ψ. Also, since for every ancestor v′ of v, C(v) ⊆ C(v′), we have ∧C(v′) |= ∧C(v), and the claim follows.
We now characterize when nodes logically imply a target formula.
Lemma 5.4.2. Let ψ be a target formula and v a node. Each one of the following implies
∧C(v) |= ψ:
• v corresponds to a positive example.
• v is unsatisfiable and for every v′ ∈ ancestors(v): ∧C(v′) |= ψ.
• There exists a descendant v′ of v such that ∧C(v′) |= ψ.
Further, if ∧C(v) |= ψ then for every v′ ∈ ancestors(v): ∧C(v′) |= ψ.
Proof.
• If v corresponds to a positive example, then by Lemma 5.4.1, ∧C(v) |= ψ.
• If v is unsatisfiable and every ancestor of v logically implies ψ, then since v is unsatisfiable, ∧C(v) ≡ ∨_{v′∈ancestors(v)} ∧C(v′). Since each ∧C(v′) logically implies ψ, ∧C(v) also logically implies ψ.
• If there exists a descendant of v that logically implies ψ, then by Lemma 5.4.1, ∧C(v) |= ψ.
• Assume towards a contradiction that ∧C(v) |= ψ but there is v′ ∈ ancestors(v) with ∧C(v′) ⊭ ψ. Then, it must be that the corresponding example of v′ is negative (Lemma 5.4.1). Thus, from Lemma 5.4.1, ∧C(v) ⊭ ψ – a contradiction.
5.4.2 A Learning Algorithm
In this section, we present our algorithm for learning a DNF formula over a set of predicates that is anti-closed under negation. The algorithm (Algorithm 11) is almost identical to Cube-SPEX (Algorithm 9), but with one difference: if a corresponding example is positive, then all of the node's ancestors are added to Visited, indicating that they are known to logically imply ψ. This means that these nodes will not be inspected at a later point.
We now prove soundness and completeness. We then provide a bound on the number of
membership queries posed.
Lemma 5.4.3 (Soundness). At the end of the execution, for every v ∈ Φ, ∧C(v) |= ψ.
Proof. From Lemma 5.4.2 it is sufficient to show that one of the following holds:
• v corresponds to a positive example.
• v is unsatisfiable and for every v′ ∈ ancestors(v): ∧C(v′) |= ψ.
• There exists a descendant v′ of v such that ∧C(v′) |= ψ.
Algorithm 11: Neg-Anti-Closed-DNF-SPEX
1  Φ = V
2  Visited = ∅
3  while Φ \ Visited ≠ ∅ do
4      pick v ∈ Φ \ Visited
5      Visited = Visited ∪ {v}
6      e′ = model(v)
7      if e′ == ⊥ then continue
8      if ψ(e′) == 0 then
9          Φ = Φ \ ({v} ∪ descendants(v))
10     else
11         Visited = Visited ∪ ancestors(v)
12 return ∨_{v∈Φ} ∧C(v)
If v remained in Φ by the end of the execution, it was not pruned from Φ and was added
to V isited. A node can be added to V isited either when it is explored or when one of its
descendants adds it to V isited. In the former case, since v remained in Φ, it means that v
corresponded to a positive example or was unsatisfiable. If v was unsatisfiable but was not
pruned, it means that all of its ancestors logically imply ψ (as otherwise, it would have been
pruned). In the latter case, if v was added to V isited by a descendant, then the descendant
corresponded to a positive example. In either case, the claim follows.
Lemma 5.4.4 (Completeness). At the end of the execution, for every C ⊆ Q such that ∧C |= ψ
there exists v ∈ V such that v ∈ Φ and C(v) = C.
Proof. Let C be such a subset. A lemma similar to Lemma 5.3.4 shows that there exists v ∈ V such that C(v) = C. Initially, v ∈ Φ. From Lemma 5.4.2, v has no ancestor that is negative, and v itself is not negative. Thus, v is never pruned from Φ.
Corollary 5.7. Neg-Anti-Closed-DNF-SPEX learns the target formula.
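A minimal executable sketch of the algorithm on the divisibility predicates from the beginning of this section; the encoding (predicates as divisors, model as a bounded search, ψ as a function standing in for the oracle) is ours, for illustration.

```python
from itertools import combinations, product

Q = (2, 3, 5)   # predicates q×2, q×3, q×5 over the naturals

def model(signs, bound=1000):
    # a witness e with (e % n == 0) iff the chosen sign of q×n; None if ⊥
    for e in range(1, bound):
        if all((e % n == 0) == b for n, b in zip(Q, signs)):
            return e
    return None

def anti_closed_spex(psi):
    """Sketch of Neg-Anti-Closed-DNF-SPEX: a positive answer settles the
    node and all its ancestors (superset labels); a negative answer prunes
    the node and all its descendants (subset labels)."""
    nodes = sorted(product((True, False), repeat=len(Q)),
                   key=lambda s: -sum(s))                  # BFS order
    label = lambda s: frozenset(n for n, b in zip(Q, s) if b)
    subs = lambda C: {frozenset(c) for r in range(len(C) + 1)
                      for c in combinations(sorted(C), r)}
    sups = lambda C: {C | D for D in subs(frozenset(Q) - C)}
    pruned, phi = set(), set()
    for s in nodes:
        C = label(s)
        if C in pruned or C in phi:
            continue
        e = model(s)
        if e is None:
            phi.add(C)             # unsatisfiable node remains in Φ
        elif psi(e):               # membership query answered "positive"
            phi |= sups(C)         # C and all its ancestors imply ψ
        else:
            pruned |= subs(C)      # neither C nor its subsets imply ψ
    return phi                     # the cubes C ⊆ Q with ∧C |= ψ

# target ψ = q×2 ∨ (q×3 ∧ q×5)
psi = lambda e: e % 2 == 0 or (e % 3 == 0 and e % 5 == 0)
print(sorted(map(sorted, anti_closed_spex(psi))))
# → [[2], [2, 3], [2, 3, 5], [2, 5], [3, 5]]
```

The returned cubes are exactly the implicants of ψ over Q; the minimal ones ({q×2} and {q×3, q×5}) recover ψ itself.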
Lower Bound
In this section, we show that if pick is implemented as a binary search, then the number of membership queries posed is at most log(|Q|) · OPT.
Theorem 5.8. Any learning algorithm that learns QNeg-Anti-Closed-DNF must pose at least max(log |G(Q∧)|, |Sψ| · (1 + maxC∈G(Q∧) |De(C)|)) membership queries, where Sψ is the minimal set of cubes of the target formula ψ; that is, for every C ∈ Sψ, ∨_{C′∈Sψ\{C}} ∧C′ ≢ ψ. In particular, Neg-Anti-Closed-DNF-SPEX poses at most log(|Q|) · OPT(QNeg-Anti-Closed-DNF) membership queries.
Proof. • OPT(QNeg-Anti-Closed-DNF) ≥ log |G(Q∧)|: the number of different formulas in QNeg-Anti-Closed-DNF is at least |G(Q∧)|, and thus the result follows from the information-theoretic lower bound.
• OPT(QNeg-Anti-Closed-DNF) ≥ |Sψ| · (1 + maxC∈G(Q∧) |De(C)|): let C ∈ Sψ, and let C1, . . . , Cm be the immediate descendants of C. Any learning algorithm must pose a query for C (which is positive) and a query for each immediate descendant (which is negative). Without such queries, the algorithm cannot distinguish between C and the Ci. By our construction, every such example is unique per node. Therefore, the algorithm requires at least 1 + |De(C)| membership queries for C. In total, it requires at least |Sψ| · (1 + maxC∈G(Q∧) |De(C)|) membership queries.
To find every C ∈ Sψ (and its descendants), Neg-Anti-Closed-DNF-SPEX poses at most log(|Q|) queries, and thus the claim follows.
5.5 Conclusion
In this chapter, we studied the learnability of DNFs (QDNF ) over a set of predicates Q. We
showed how to extend C-SPEX to learn DNFs. We then focused on two sub-classes, those
whose predicates are closed under negation, and those whose predicates are “anti-closed” under
negation. We showed for each sub-class an algorithm that poses fewer membership queries than
the extension of C-SPEX.
Chapter 6
Synthesis with Abstract Examples
So far, we have focused on learning specifications that can be used to synthesize executable
programs. Namely, the search was in a specification space. PBE experts often believe that the
program space should be the one to drive the search to the target program. Their motivation
is Occam’s razor principle, which in our context of program synthesis implies that the user’s
intent is likely to be captured by a short program. In this chapter, we show that this approach
can be taken without sacrificing exactness. We present a novel synthesis framework that enables
us to extend PBE synthesizers (under some assumptions) with the ability to communicate a
candidate program’s behavior through a few abstract examples. The abstract examples serve
as an intuitive specification for candidate programs. Thus, through abstract examples, the user
is guaranteed that the final candidate program captures his intent on all inputs. The abstract
examples are a new form of examples that represent a potentially unbounded set of concrete
examples. An abstract example captures how part of the input space is mapped to corresponding
outputs by the synthesized program. Our framework uses a generalization algorithm to compute
abstract examples, which are then presented to the user. The user can accept an abstract example,
or provide a counterexample, in which case the synthesizer will explore a different program.
When the user accepts a set of abstract examples that covers the entire input space, the synthesis
process is completed.
We have implemented our approach and we experimentally show that our synthesizer
communicates with the user effectively by presenting on average 3 abstract examples until
the user rejects false candidate programs. Further, we show that a synthesizer that prunes the
program space based on the abstract examples reduces the overall number of required concrete
examples in up to 96% of the cases.
6.1 Overview
In this section, we provide an informal overview of abstract examples and their use in our
interactive synthesis framework. Our interactive synthesis framework communicates with a user
only through abstract membership queries—asking the user whether an abstract example of the
current candidate program should be accepted or rejected— and guarantees that the synthesized
program is correct on all inputs. Abstract examples are a new form of examples that represent
a potentially unbounded set of concrete examples of a candidate program. Abstract examples
are natural for a user to understand and inspect (similarly to examples), and at the same time
enable validation of the synthesis result without enumerating all concrete examples (which is
only possible for a finite domain, and even then is often prohibitively expensive). In fact, an
abstract membership question can also be viewed as a partial validation question. Instead of
presenting the user with a program and asking him to determine whether or not it is correct (a
validation question), we present an abstract example, which describes (declaratively) how the
candidate program transforms part of the input space. In this way, abstract examples allow us to
perform exact synthesis without a predefined specification.
Throughout the synthesis process, as the synthesizer explores the space of candidate pro-
grams to find the one that matches the user’s intent, it presents to the user abstract examples of
candidate programs. The user can accept an abstract example, or provide a counterexample, in
which case the synthesizer will explore a different candidate program. By accepting an abstract
example, the user confirms the behavior of the candidate program on part of the input space.
That is, the synthesizer learns the desired behavior for an unbounded number of concrete inputs.
Thus, it can prune every program that does not meet the confirmed abstract example. This
pruning is correct even if later the candidate program is rejected by another abstract example.
Generally, pruning based on an abstract example removes more programs than pruning based
on a concrete example. Thus, our synthesizer is likely to converge faster to the target program
compared to the current alternative (see Section 6.5). When the user accepts a set of abstract
examples that covers the entire input space, our synthesizer returns the corresponding candidate
program and the synthesis process is completed.
A key ingredient of our synthesizer is a generalization algorithm, called L-SEP. L-SEP
takes a concrete example and a candidate program, and generalizes the example to a maximally
general abstract example consistent with the candidate program. We illustrate this on our motivating example from the Introduction (Chapter 1), where the candidate program is the one synthesized by Flash Fill (which returns “H” followed by the second letter of the person’s first name, etc.) and the initial concrete example is the first member on the list (i.e., Diane). Our
generalization algorithm produces the following abstract example:
a0a1A2 B C → Ha1 a0a1A2, please come to my office at C . -EG
This example describes the program behavior on the cells in columns A, B, and C, for the
case where the string in cell A has at least two characters, denoted by a0 and a1, followed by
a string sequence of arbitrary size (including 0), denoted by A2. For such inputs, the example
describes the output as a sequence consisting of: (i) the string “H” followed by a1, (ii) the entire
string at A followed by a comma, (iii) the string: “please come to my office at”, (iv) the string at
C, and (v) the string: “. -EG”.
This abstract example is presented to the user. The user rejects it and provides a concrete
counterexample (e.g., line 4 in the Excel spreadsheet). Thus, the synthesizer prunes the space
of candidate programs and generates a new candidate program. Eventually, the synthesizer
generates the target program (as a candidate program), and our synthesizer presents the following
abstract example:
A B C → Hi A, please come to my office at C . -EG
This time, the user accepts it. Since this abstract example covers the entire input space, the
synthesizer infers that this program captures the user’s intent on all inputs and returns it. In
general, covering the input space may require multiple abstract examples.
6.2 Abstract Specifications and Sequence Expressions
In this section, we define the key terms pertaining to abstract examples. We then present a
special class of abstract examples for programs that manipulate strings. For simplicity’s sake,
from here on we assume that programs take one input. This is not a limitation, as multiple inputs
(or outputs) can be joined with a predefined delimiter (e.g., the inputs in the motivating example
can be considered as one string separated by spaces).
6.2.1 Abstract Examples
Program Semantics The semantics of a program P is a function over a domain D: ⟦P⟧ : D → D. We equate ⟦P⟧ with its input-output pair set: {(in, ⟦P⟧(in)) | in ∈ D}.
Abstract Examples An abstract example ae defines a set ⟦ae⟧ ⊆ D × D, which represents a partial function: if (in, out1), (in, out2) ∈ ⟦ae⟧, then out1 = out2. An abstract example ae is an abstract example for program P if ⟦ae⟧ ⊆ ⟦P⟧. We define the domain of ae to be dom(ae) = {in ∈ D | ∃out. (in, out) ∈ ⟦ae⟧}.
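These definitions can be checked concretely on a finite toy domain; everything below (the domain, the candidate program, and the abstract example) is a hypothetical encoding for illustration, not the thesis’s representation.

```python
from itertools import product

# Toy domain: strings over {a, b} of length at most 3.
D = [''.join(s) for n in range(4) for s in product('ab', repeat=n)]

P = lambda w: w.upper()                      # a candidate program
# Abstract example: "inputs starting with 'a' map to their uppercase form".
ae = {(w, w.upper()) for w in D if w.startswith('a')}

is_partial_fn = len({i for i, _ in ae}) == len(ae)
is_for_P = all(P(i) == o for i, o in ae)     # checks ⟦ae⟧ ⊆ ⟦P⟧
print(is_partial_fn, is_for_P)               # → True True
```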
Abstract Example Specifications An abstract example specification of P is a set A of abstract examples for P such that ⋃_{ae∈A} dom(ae) = D. Note that A need not be finite and the example domains need not be disjoint.
6.2.2 Sequence Expressions
In this work, we focus on programs that manipulate strings, i.e., D = Σ∗ for a finite alphabet Σ.
Thus, it is desirable to represent abstract examples as expressions that represent collections of
concrete strings and can be readily interpreted by humans. A prominent candidate for this goal
is regular expressions, which are widely used to succinctly represent a set of strings. However,
regular expressions are restricted to constant symbols (from Σ). Thus, they cannot relate outputs
to inputs, which is desirable when describing partial functions (abstract examples). To obtain
this property, we introduce a new language, Sequence Expressions (SE), that extends regular
expressions with the ability to relate the outputs to their inputs via shared variables. We begin
this section with a review of regular expressions, and then introduce the two types of sequence
expressions: input SEs, for describing inputs, and output SEs, for describing outputs.
(a) Input SE:   S_I ::= S_I · S_I | ε | σ | x_R | X_R | σ^k
(b) Output SE:  S_O ::= S_O · S_O | ε | σ | x | f(x) | X | f(X) | σ^k
Figure 6.1: SE grammar: σ ∈ Σ, x ∈ x, X ∈ X, k ∈ K, R ∈ R, f ∈ F .
Regular Expressions (RE) The set of regular languages over a finite alphabet Σ is the minimal
set containing ε, σ1, ..., σ|Σ| that is closed under concatenation, union, and Kleene star. A
regular expression r is a text representation of a regular language over the symbols in Σ and the
operators ·, |,∗ (concatenation, or, and Kleene star).
Input SE Syntax Fig. 6.1(a) shows the grammar of input SEs. In contrast to RE, SEs are
extended with three kinds of variables that later help to relate the output to the input:
• Character variables, denoted x ∈ x, used to denote an arbitrary letter from Σ.
• Sequence variables, denoted X ∈ X, used to denote a sequence of arbitrary size.
• Star variables, denoted k ∈ K, used instead of the Kleene star to indicate the number of
consecutive repeating occurrences of a symbol. For example, 0^k has the same meaning as
the RE 0*.
To eliminate ambiguity, in our examples we underline letters from Σ. For example, xXa
represents the set of words that have at least two letters and end with an a (a ∈ Σ).
We limit each variable (i.e., x,X, k) to appear at most once in an input SE. We also limit
the use of a Kleene star to single letters from the alphabet. Also, since the goal of each SE is to
describe a single behavior of the program, we exclude the ‘or’ operator. Instead, we extend the
grammar so that ‘or’ can be expressed, to some extent, via predefined predicates that put constraints
on the variables. We denote these predicates by R ∈ R, and their meaning (i.e., the set of
words that satisfy them) by ⟦R⟧ ⊆ Σ*. We note that we do not impose restrictions on the set R;
however, our algorithm relies on an SMT solver, and thus predicates in R have to be encodable
as formulas.
Some examples of predicates and their meanings are: ⟦num⟧ = {w ∈ Σ* | w consists of digits only}, ⟦anum⟧ = {w ∈ Σ* | w consists of letters and digits only}, ⟦del⟧ = {., \t, ;}, and ⟦no_del⟧ = (Σ \ ⟦del⟧)*.
We assume that the predicate T satisfied by any string (i.e., ⟦T⟧ = Σ*) is always in
R. We abbreviate x_T, X_T to x, X. In the following, we refer to σ, x_R, X_R, and σ^k as the
atomic constructs. Given an input SE se, we denote by x_se, X_se, and K_se the sets of variables in se.
Input SE Semantics  To define the semantics, we first define interpretations of an SE, which
depend on assignments. An assignment env for an input SE se maps every x ∈ x_se to a letter in
Σ, every X ∈ X_se to a sequence in Σ*, and every k ∈ K_se to a natural number (including 0). We
denote by env[se] the sequence over Σ obtained by substituting the variables with their interpretations.
Formally: (i) env[ε] = ε, (ii) env[σ] = σ, (iii) env[x_R] = env(x), (iv) env[X_R] = env(X),
(v) env[σ^k] = σ^{env(k)}, and (vi) env[S1 · S2] = env[S1] · env[S2] (where · denotes string concatenation).
An assignment is valid if for every x_R and X_R in se, env(x), env(X) ∈ ⟦R⟧. In the
following we always refer to valid assignments.
The semantics of an input SE se, denoted by ⟦se⟧, is the set of strings obtained by the set
of all valid assignments, i.e., ⟦se⟧ = {s ∈ Σ* | ∃env. env[se] = s}. For example, ⟦σ⟧ = {σ}, ⟦x⟧ = Σ, ⟦X⟧ = Σ*, and ⟦σ^k⟧ = {ε, σ, σσ, ...}.
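As a sanity check, the interpretation env[se] of cases (i)–(vi) above can be sketched in Python (the tuple encoding of atomic constructs is our own illustration, not the thesis's):

```python
# Interpreting an input SE under an assignment, following cases (i)-(vi).
# Atoms are encoded as tuples (our own encoding): ("lit", σ), ("char", x),
# ("seq", X), ("star", σ, k); an SE is a list of atoms (concatenation).

def interpret(se, env):
    out = []
    for atom in se:
        kind = atom[0]
        if kind == "lit":            # (ii) env[σ] = σ
            out.append(atom[1])
        elif kind == "char":         # (iii) env[x_R] = env(x)
            out.append(env[atom[1]])
        elif kind == "seq":          # (iv) env[X_R] = env(X)
            out.append(env[atom[1]])
        elif kind == "star":         # (v) env[σ^k] = σ repeated env(k) times
            out.append(atom[1] * env[atom[2]])
    return "".join(out)              # (vi) concatenation; the empty SE gives ε

# Example: the SE x X a from the text, under env = {x ↦ 'b', X ↦ 'cd'}.
se = [("char", "x"), ("seq", "X"), ("lit", "a")]
assert interpret(se, {"x": "b", "X": "cd"}) == "bcda"
# The SE 0^k with env(k) = 3 yields "000", an element of [[0^k]] = {ε, 0, 00, ...}.
assert interpret([("star", "0", "k")], {"k": 3}) == "000"
```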
Output SE Fig. 6.1(b) shows the grammar of output SEs. Output SEs are defined with respect
to an input SE and can only refer to its variables. Formally, given an input SE se, an output
SE over se is restricted to the variables in x_se, X_se, and K_se. Unlike input SEs, an output SE is
allowed to have multiple occurrences of the same variable, and variables are not constrained
by predicates. In addition, output SEs can express invocations of unary functions over the
variables. Namely, the grammar is extended with f(x) and f(X), where x ∈ x_se and X ∈ X_se,
and f : Σ→ Σ∗ is a function.
An interpretation of an output SE is defined with respect to an assignment, similarly to the
interpretation of an input SE. We extend the interpretation definition for the functions as follows:
env[f(x)] = f(env(x)) and if env(X) = σ1 · · ·σn then env[f(X)] = f(σ1) · · · f(σn), i.e.,
env[f(X)] is the concatenation of the results of invoking f on the characters of the interpretation
of X . (If env(X) = ε, env[f(X)] = ε.)
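A hedged sketch of this extended interpretation (again with our own atom encoding; f is applied character-wise, so a character variable is simply the length-one case):

```python
# Interpreting output-SE function atoms: env[f(x)] = f(env(x)) and env[f(X)]
# is the concatenation of f over the characters of env(X). Encoding is ours.

def interpret_out(se, env, funcs):
    out = []
    for atom in se:
        kind = atom[0]
        if kind == "lit":
            out.append(atom[1])
        elif kind in ("char", "seq"):
            out.append(env[atom[1]])
        elif kind == "fun":          # ("fun", f, var): f applied char-wise
            f = funcs[atom[1]]
            out.append("".join(f(c) for c in env[atom[2]]))
    return "".join(out)

# f_lowercase applied to a character and a sequence variable, in the spirit
# of the email example below.
funcs = {"lowercase": lambda c: c.lower()}
env = {"x0": "D", "X2": "Lockhart"}
se_out = [("fun", "lowercase", "x0"), ("lit", "."),
          ("fun", "lowercase", "X2"), ("lit", "@lockhart-gardner.com")]
assert interpret_out(se_out, env, funcs) == "d.lockhart@lockhart-gardner.com"
```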
Input-Output SE Pairs  An input-output SE (interchangeably, an SE pair) is a pair io = se_in → se_out consisting of an input SE, se_in, and an output SE, se_out, defined over se_in. Given io = se_in → se_out, we denote in(io) = se_in and out(io) = se_out. The semantics of io is the set of pairs ⟦io⟧ = {(s_in, s_out) ∈ Σ* × Σ* | ∃env. s_in = env[in(io)] ∧ s_out = env[out(io)]}. The domain of io is dom(io) = ⟦in(io)⟧.
Example  An input-output SE for the pattern of column D based on columns A, B in Fig. 1.1 is:
x0^{no_del} X1^{no_del} ␣ X2 → f_lowercase(x0).f_lowercase(X2)@lockhart-gardner.com
where x0 is a character variable, X1 and X2 are sequence variables, and ␣ denotes a column
delimiter (taken from Σ). The predicate no_del is satisfied by words that do not contain a
delimiter. The semantics of this SE pair is the set of all word pairs whose first element is a
string consisting of a first name, a delimiter, and a last name, and the second element is the
email address, which is the sequence of the first letter of the first name in lower case, a dot, the
lower-cased last name, and the suffix “@lockhart-gardner.com”.
6.2.3 Sequence Expressions as Abstract Examples
SE pairs provide an intuitive means to describe relations between outputs and inputs. In this work,
we focus on learning abstract examples that can be described with SE pairs. For simplicity’s
sake, in the following we ignore predicates and functions (i.e., R and F). Our definitions and
algorithms can be easily extended to arbitrary (but finite) sets R and F.
We say that an input-output SE io is an abstract example if ⟦io⟧ describes a partial function.
Note that in general, an SE pair is not necessarily an abstract example. For example, the pair
io_XY = XY → XaY can be interpreted to (bbb, babb) (by env1 = {X ↦ b, Y ↦ bb}) and to
(bbb, bbab) (by env2 = {X ↦ bb, Y ↦ b}). Thus, ⟦io_XY⟧ is not a partial function and hence
not an abstract example.
Given a program P, we say that an input-output SE io is an abstract example for P if ⟦io⟧ ⊆ ⟦P⟧. Since ⟦P⟧ is a function, this requirement subsumes the abstract example requirement.
Given an input SE se_in, we say that an output SE se_out over se_in is a completion of se_in for P
if se_in → se_out is an abstract example for P.
Example We next exemplify how SEs can provide an abstract example specification to describe
a program behavior. Assume a user has a list of first names and middle names (space delimited),
some of which are only initials, and the goal is to create a greeting message of the form “Dear
<name>”. The name in the greeting is the first string if it is identified as a name, i.e., has at
least two letters; otherwise, the name is the entire string. For example: (i) Adam→ Dear Adam,
(ii) Adam R.→ Dear Adam, (iii) A. Robert→ Dear A. Robert (iv) A.R.→ Dear A.R.. In this
example, we assume the predicate set contains the predicates R = {T, name, other}, where
⟦name⟧ = {A, a, ..., Z, z}+ \ {A, a, ..., Z, z} (i.e., sequences of at least two letters), and ⟦other⟧ = (Σ \ {␣})* \ ⟦name⟧. An abstract
example specification is: (i) X0^name → Dear X0, (ii) X0^name ␣ X1 → Dear X0, (iii) X0^other → Dear X0, (iv) X0^other ␣ X1 → Dear X0 ␣ X1.
Discussion  While SEs can capture many program behaviors, they have limitations. One limitation
is that an SE can only describe relations between output characters and input characters,
but not among input characters. For example, it cannot capture inputs that are palindromes or
inputs of the form XX (e.g., abab). This limitation arises because we chose input SEs to be
(a subset of) regular expressions, which cannot capture such languages. Also, tasks that are
not string manipulations are likely to have a specification that contains (many) trivial abstract
examples (i.e., concrete input-output examples). For example, consider a program that takes
two digits and returns their product. Some abstract examples describing it are X 1 → X
and 1 X → X. However, the specification also contains 9 2 → 18, 9 3 → 27, ..., 9 9 → 81.
Moreover, an abstract example specification consists of a set of independent abstract examples,
with no particular order. As a result, describing if-else rules requires encoding the negation of
the “if” condition explicitly in order to obtain the same case splitting as an if-else structure.
Generalization Order We next define a partial order between SEs that are abstract examples.
This order is leveraged by our algorithm in the next section. We call this order the generalization
order and if an abstract example is greater than another one, we say it is more general or abstract.
We begin by defining a partial order ⪯ on the atomic constructs of SEs: σ ⪯ x, σ ⪯ σ^k,
x ⪯ X, and σ^k ⪯ X (where σ ∈ Σ, x ∈ x, X ∈ X, and k ∈ K). That is, a concrete letter σ is
the most specific construct, x and σ^k are two (incomparable) generalizations of it, and a
sequence variable X is the most general construct.
We say that an input SE se′ is more general than se, written se ⪯ se′, if its atomic constructs
are pointwise more general than the atomic constructs of se. Namely, for se = a1 · · · an and
se′ = a′1 · · · a′n (where the ai and a′i are atomic constructs), se ⪯ se′ if for every 1 ≤ i ≤ n,
ai ⪯ a′i. If se ⪯ se′ ∧ se ≠ se′, we write se ≺ se′. For example, abc ≺ ab^k c ≺ xYZ. In
addition, we define that for any atomic construct a, a ⋠ ε and ε ⋠ a. The generalization order
implies the following:
Lemma 6.2.1. Let se, se′ be two input SEs. If se ⪯ se′, then ⟦se⟧ ⊆ ⟦se′⟧.
The proof follows directly from the definition of ⪯ and the semantics of an input SE. Note that
the converse does not necessarily hold. For example, ⟦XY⟧ = ⟦Z⟧, but XY ⋠ Z and Z ⋠ XY.
In fact, ⪯ may only relate SEs of the same length. In practice, we partly support generalizations
beyond ⪯ (see Section 6.3).
The generalization order on input SEs induces a generalization order on input-output SEs:
io ⪯ io′ if in(io) ⪯ in(io′). If io and io′ are abstract examples for the same program P,
this implies that ⟦io⟧ ⊆ ⟦io′⟧. Moreover, in that case, ⟦io⟧ ⊆ ⟦io′⟧ if and only if ⟦in(io)⟧ ⊆ ⟦in(io′)⟧. This observation enables our algorithm to focus on generalizing the input SE instead
of generalizing the pair as a whole.
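The pointwise check behind ⪯ can be sketched as follows (the encoding is our own; only the atomic order σ ⪯ x, σ ⪯ σ^k, x ⪯ X, σ^k ⪯ X is taken from the text):

```python
# Pointwise generalization check on input SEs (sketch; encoding is ours).
# Atomic order: σ ⪯ x, σ ⪯ σ^k (same letter), x ⪯ X, σ^k ⪯ X.

def atom_leq(a, b):
    if a == b:
        return True
    if b[0] == "seq":                          # X generalizes everything
        return True
    if a[0] == "lit":
        if b[0] == "char":                     # σ ⪯ x
            return True
        if b[0] == "star" and b[1] == a[1]:    # σ ⪯ σ^k for the same letter
            return True
    return False

def se_leq(se1, se2):
    """se1 ⪯ se2: same length and pointwise more general constructs."""
    return (len(se1) == len(se2)
            and all(atom_leq(a, b) for a, b in zip(se1, se2)))

# The chain abc ≺ ab^k c ≺ xYZ from the text.
abc  = [("lit", "a"), ("lit", "b"), ("lit", "c")]
abkc = [("lit", "a"), ("star", "b", "k"), ("lit", "c")]
xYZ  = [("char", "x"), ("seq", "Y"), ("seq", "Z")]
assert se_leq(abc, abkc) and se_leq(abkc, xYZ)
assert not se_leq(xYZ, abc)       # the order is not symmetric
```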
6.3 An Algorithm for Learning Abstract Examples
In this section, we describe L-SEP, our algorithm for automatically Learning an SE Pair. This
pair is an abstract example for a given program and it generalizes a given concrete example. In
Section 6.4, we will use L-SEP repeatedly in order to generate an abstract example specification.
L-SEP (Algorithm 12) takes as input a program P (e.g., the program Flash Fill learned)
and a (concrete) input in (e.g., Diane). These two define the initial SE with which to begin:
(in, ⟦P⟧(in)) (namely, the concrete example). The algorithm outputs an input-output SE,
io = s_in → s_out, such that (in, ⟦P⟧(in)) ∈ ⟦io⟧ ⊆ ⟦P⟧. Namely, io generalizes (or abstracts)
the concrete example and is consistent with P. L-SEP's goal is to find an io that is maximal
with respect to ⪯.
The high-level operation of L-SEP is as follows. First, it sets io = in → ⟦P⟧(in). Then, it
gradually generalizes io as long as this results in pairs that are abstract examples for P. The
main insight of L-SEP is that instead of generalizing io as a whole, it generalizes the input SE,
in(io), and then checks whether there is a completion of in(io) for P, namely an output SE
over in(io) such that the resulting pair is an abstract example for P. This is justified by the
property that io ⪯ io′ if and only if in(io) ⪯ in(io′).
6.3.1 Input Generalization
We now explain the pseudo-code of L-SEP. After initializing io by setting s_in = in and
s_out = ⟦P⟧(in), L-SEP stores in InCands the set of candidates generalizing s_in (which are
the input components of io's generalizations). Then, a loop attempts to generalize s_in as long as
InCands ≠ ∅. Each iteration picks a minimal element s′_in from InCands, which is a candidate
to generalize s_in. To determine whether s′_in can generalize s_in, findCompletion is called.
If it succeeds, it returns s′_out such that s′_in → s′_out is an abstract example for P. If it fails, ⊥
is returned. Either way, the search space InCands is pruned: if the generalization succeeds,
the candidates are pruned to those generalizing s′_in; otherwise, the candidates generalizing
s′_in are removed. If the generalization succeeds, s_in and s_out are updated to s′_in and s′_out.
Our next lemma states that if findCompletion returns ⊥, pruning InCands does not
Algorithm 12: L-SEP(P, in)
 1  s_in = in; s_out = ⟦P⟧(in)
 2  InCands = {s ∈ SE_in | s ⪰ s_in}
 3  while InCands ≠ ∅ do
 4      s′_in = pick a minimal element from InCands
 5      s′_out = findCompletion(P, s′_in)    // if it succeeds, ⟦s′_in → s′_out⟧ ⊆ ⟦P⟧
 6      if s′_out ≠ ⊥ then
 7          s_in = s′_in; s_out = s′_out
 8          InCands = InCands ∩ {s ∈ SE_in | s ⪰ s_in}
 9      else
10          InCands = InCands \ {s ∈ SE_in | s ⪰ s′_in}
11  return (s_in, s_out)
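The pruning structure of this loop can be illustrated on a toy lattice (this is a sketch of the loop's logic only, not of the SE machinery: candidates are modeled as sets of input positions that have been generalized, and has_completion stands in for findCompletion; it must be closed under specialization, as Lemma 6.3.1 below guarantees for the real check):

```python
# Skeleton of L-SEP's candidate-pruning loop on a toy lattice (sketch).
from itertools import combinations

def lsep_skeleton(n, has_completion):
    """n = input length; a candidate is a frozenset of generalized positions.
    has_completion must hold for every specialization of a candidate it
    accepts (the analogue of Lemma 6.3.1)."""
    all_cands = {frozenset(c) for r in range(n + 1)
                 for c in combinations(range(n), r)}
    current = frozenset()                       # start from the concrete input
    in_cands = {c for c in all_cands if c > current}
    while in_cands:
        cand = min(in_cands, key=len)           # pick a minimal element
        if has_completion(cand):                # generalization succeeds
            current = cand
            in_cands = {c for c in in_cands if c > current}
        else:                                   # prune all its generalizations
            in_cands = {c for c in in_cands if not c >= cand}
    return current

# Toy: positions 0 and 2 of a length-3 input can be generalized; generalizing
# position 1 breaks the completion, so every superset of {1} is pruned.
result = lsep_skeleton(3, lambda cand: 1 not in cand)
assert result == frozenset({0, 2})              # a maximal feasible candidate
```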
remove input SEs that have a completion for P . The lemma guarantees that L-SEP cannot miss
abstract examples for P because of this pruning.
Lemma 6.3.1. If s′′_in ⪰ s′_in and s′_in has no completion for P, then s′′_in has no completion for P.
Proof sketch. By induction on the number of generalization steps required to get from s′_in to
s′′_in. The base case is trivial. Assume the last generalization step replaces a′_i in s′_in with a′′_i in s′′_in. If
s′′_in has a completion s′′_out for P, then substituting a′′_i in s′′_out by a′_i yields a completion for s′_in,
contradicting our assumption.
InCands  For ease of presentation, L-SEP defines InCands as the set of all generalizations
of in that remain to be checked; initially it contains all generalizations. However,
the size of this set is exponential in the length of in, and thus in practice L-SEP does not
maintain it explicitly. Instead, it maintains two sets: MinCands, which records the minimal
generalizations of the current candidate s_in that remain to be checked, and Pruned, which
records the minimal generalizations that were overruled (and hence none of their generalizations
need to be inspected). In Line 2 and Line 8, L-SEP initializes MinCands based on the
current candidate s_in by computing all of its minimal generalizations. In Line 10 it removes
from MinCands the generalization that was last checked and failed, and also records this
generalization in Pruned to indicate that none of its generalizations need to be inspected.
Pruned is used immediately after initializing MinCands in Line 8 to remove from MinCands
any generalization that generalizes a member of Pruned – this efficiently implements the update
of InCands in Line 10. Using this representation of InCands, we can now establish:
Lemma 6.3.2. The number of iterations of L-SEP is O(|in|² · |R|²).
Proof. The number of iterations is at most the maximal size of MinCands multiplied by the
number of initializations of MinCands based on a new candidate s_in in Line 8. The size
of MinCands computed for some s_in is at most |in| · (|R| + 1). This follows since
a minimal generalization of s_in differs from s_in in a single construct that is more general
than the corresponding construct in s_in (with respect to the partial order of constructs). The
number of initializations of MinCands in Line 8 is bounded by the length of the longest (possible) chain of
generalizations. This follows because each such initialization is triggered by the update of s_in
to a more general SE. Since the longest chain of generalizations has length at most |in| · (|R| + 1), the
number of iterations is O(|in|² · |R|²).
Lemma 6.3.2 implies that MinCands and Pruned provide a polynomial representation of
InCands (even though the latter is exponential). Further, the use of these sets enables L-SEP
to run in polynomial time because they provide a quadratic bound on the number of iterations,
and because findCompletion is also polynomial, as we shortly prove.
Picking a Minimal Generalization  We now discuss how L-SEP picks a minimal generalization
of s_in in Line 4. One option is to do so arbitrarily. However, this greedy approach may result
in a sub-optimal maximal generalization, namely, a maximal generalization that concretizes
to fewer concrete inputs than some other possible maximal generalization. On the other hand,
to obtain an optimal generalization, all generalizations that have a completion have to be
computed and only then can the best one be picked by comparing the number of concretizations.
Unfortunately, this approach results in an exponential time complexity and is thus impractical.
Instead, our implementation of L-SEP takes an intermediate approach: it considers all minimal
generalizations that have a completion and picks one that concretizes to a maximal number of
inputs. To avoid counting the number of inputs (which may be computationally expensive), our
implementation employs the following heuristic. It syntactically compares the generalizations
by comparing, in each of them, the construct that is not in s_in (i.e., where the generalization took
place). It then picks the generalization whose construct is maximal with respect to the order
X > σ^k > x. If there are generalized constructs that are not comparable w.r.t. this order (e.g.,
σ^{k1} vs. σ^{k2}), one is picked arbitrarily.
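This heuristic order can be sketched directly (the encoding of construct kinds is our own):

```python
# The heuristic order on generalized constructs: X > σ^k > x (sketch).

RANK = {"seq": 2, "star": 1, "char": 0}   # our own encoding of construct kinds

def pick_generalization(cands):
    """cands: the constructs where generalization took place, one per minimal
    generalization that has a completion. Picks one whose construct is maximal
    w.r.t. X > σ^k > x; incomparable ties (e.g., two star constructs) are
    broken arbitrarily -- here, by first occurrence."""
    return max(cands, key=lambda c: RANK[c[0]])

assert pick_generalization([("char", "x0"), ("star", "a", "k0"),
                            ("seq", "X0")])[0] == "seq"
# Two star constructs are incomparable; one is picked arbitrarily.
assert pick_generalization([("star", "a", "k1"), ("star", "b", "k2")])[0] == "star"
```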
6.3.2 Completion
findCompletion (Algorithm 13) takes P and an input generalization s′_in, and returns a
completion of s′_in for P if one exists, or ⊥ otherwise.
In contrast to the input SE search, if a certain candidate s′′_out is not a completion of s′_in for P, this does
not imply that its generalizations are not completions of s′_in either. Thus, a pruning procedure
similar to the one in L-SEP may miss completions. Consider, for example, a program
P whose abstract example specification is {xX → bX}. Assume that while L-SEP looks for a
completion for s′_in = ax, it considers s′_out = ba, which is not a completion. Pruning SEs that are
more general than s′_out would prune the completion bx. Likewise, pruning elements that
are more specific than a candidate that is not a completion may prune completions.
Since the former pruning cannot be used when searching for the output SE, findCompletion
searches differently, by making gradual attempts to construct a completion s′_out construct
by construct. If an attempt fails, it backtracks and attempts a different construction. This is
implemented via the recursive function findOutputPrefix. At each step, a current prefix
s_out^pref (initially ε) is extended with a single atomic construct sym (i.e., σ, x, X, or σ^k). Then, it
checks whether the current extended construction is partially consistent with P (Line 7). If the
Algorithm 13: findCompletion(P, s′_in)
 1  return findOutputPrefix(P, s′_in, ε)
 2  Function findOutputPrefix(P, s′_in, s_out^pref):
 3      if ⟦s′_in → s_out^pref⟧ ⊆ ⟦P⟧ then return s_out^pref
 4      Cands = {s ∈ SE_out(s′_in) | s is an atomic construct}
 5      while Cands ≠ ∅ do
 6          sym = pick and remove a minimal element from Cands
 7          if ⟦s′_in → s_out^pref · sym⟧ ⊆ {(in, op) | ∃os ∈ Σ*. (in, op · os) ∈ ⟦P⟧} then
 8              s_out^pref = s_out^pref · sym
 9              s′_out = findOutputPrefix(P, s′_in, s_out^pref)
10              if s′_out ≠ ⊥ then return s′_out
11      return ⊥
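The backtracking structure of this algorithm can be illustrated in a toy setting (one character variable over Σ = {a, b}, a brute-force partial-consistency check over all assignments, and a candidate program that prepends 'H'; the encoding and the toy program are our own, not the thesis's):

```python
# Backtracking construction of an output SE prefix (sketch of Algorithm 13).

SIGMA = "ab"

def interp(se, env):
    return "".join(env[a[1]] if a[0] == "char" else a[1] for a in se)

def partially_consistent(se_in, prefix, program):
    """env[prefix] must be a prefix of P(env[se_in]) for every assignment."""
    return all(program(interp(se_in, {"x": c}))
               .startswith(interp(prefix, {"x": c})) for c in SIGMA)

def fully_consistent(se_in, se_out, program):
    return all(program(interp(se_in, {"x": c})) == interp(se_out, {"x": c})
               for c in SIGMA)

ATOMS = [("lit", "H"), ("lit", "a"), ("lit", "b"), ("char", "x")]

def find_output_prefix(se_in, prefix, program):
    if fully_consistent(se_in, prefix, program):    # a completion is found
        return prefix
    for sym in ATOMS:
        ext = prefix + [sym]
        if partially_consistent(se_in, ext, program):  # otherwise: prune ext
            res = find_output_prefix(se_in, ext, program)
            if res is not None:
                return res
    return None                                     # plays the role of ⊥

P = lambda s: "H" + s
completion = find_output_prefix([("char", "x")], [], P)
assert completion == [("lit", "H"), ("char", "x")]
```

The recursion terminates because a prefix longer than the program's output can never pass the partial-consistency check, mirroring Lemma 6.3.4.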
check fails, this extended prefix is discarded, thereby pruning its extensions from the search
space. Otherwise, further extension of the extended prefix is attempted. We next define partial
consistency.
Definition 6.3.3. An SE pair s′_in → s_out^pref is partially consistent with P if for every assignment
env, env[s_out^pref] is a prefix of ⟦P⟧(env[s′_in]).
When s′_in is clear from the context, we say that s_out^pref is partially consistent with P.
By the semantics definition, a pair s′_in → s_out^pref is partially consistent with P if and only
if ⟦s′_in → s_out^pref⟧ ⊆ {(in, op) | ∃os ∈ Σ*. (in, op · os) ∈ ⟦P⟧} (which is the check of Line 7).
Partial consistency is a necessary condition (albeit not a sufficient one) for s_out^pref · sym to be a prefix
of a completion s′_out. Thus, if s_out^pref · sym is not partially consistent, there is no need to check
its extensions. Note that even if a certain prefix s_out^pref · sym is partially consistent, it may be
that this prefix cannot be further extended (namely, the suffixes cannot be realized by an SE).
In this case, this prefix will be discarded in later iterations, and a different attempt to
extend s_out^pref will be made. This extension process terminates when an extension results in a
completion, in which case it is returned, or when all extensions fail, in which case ⊥ is returned.
Lemma 6.3.4. The recursion depth of Algorithm 13 is bounded by the length of ⟦P⟧(in).
Proof. Denote by n the length of ⟦P⟧(in). Assume toward a contradiction that the recursion depth
exceeds n; namely, the current prefix s_out^pref is strictly longer than n. We show that in this case,
the partial consistency check is guaranteed to fail. To this end, we show an assignment env
to s′_in such that env[s_out^pref] is not a prefix of ⟦P⟧(env[s′_in]). Consider the assignment env that
maps each variable in s′_in to its original value in in (namely, env[s′_in] = in). This assignment
maps each variable to exactly one letter. By our assumption, the length of env[s_out^pref] is greater
than n. Thus, env[s_out^pref] (of length > n) cannot be a prefix of ⟦P⟧(in) (of length n).
6.3.3 Guarantees
Lemma 6.3.2 and Lemma 6.3.4 ensure that both the input generalization and the completion
algorithms terminate in polynomial time. Thus, the overall runtime of L-SEP is polynomial.
Finally, we discuss the guarantees of these algorithms.
Lemma 6.3.5. findCompletion is sound and complete: if it returns s′_out, then s′_out is a
completion of s′_in for P, and if it returns ⊥, then s′_in has no completion for P.
Soundness follows since findOutputPrefix returns s′_out only after validating that
⟦s′_in → s′_out⟧ ⊆ ⟦P⟧. Completeness follows since s′_out is constructed gradually and every
possible extension is examined.
Lemma 6.3.6. L-SEP is sound and complete: for every (in, out) pair, an SE pair is returned,
and if L-SEP returns an SE pair, then it is an abstract example for P .
Soundness is guaranteed from findCompletion. Completeness follows since even if all
generalizations fail, L-SEP returns the concrete example as an SE pair.
Theorem 6.1. L-SEP returns an abstract example io for P such that (in, ⟦P⟧(in)) ∈ ⟦io⟧ and
io is maximal w.r.t. ⪯.
This follows from Lemma 6.3.1, Lemma 6.3.5, and the fact that L-SEP terminates only when InCands
is empty (i.e., when there are no more input generalizations to explore).
We note that in our implementation, findCompletion runs heuristics instead of the
expensive backtracking. In this case, maximality is no longer guaranteed.
6.3.4 Running Example
We next exemplify L-SEP on the (shortened) example from the Overview (Section 6.1), where
we start from the concrete example in = Diane and we wish to obtain the abstract example
x0x1Y → Hx1 x0x1Y. L-SEP starts with s_in = Diane and s_out = Hi Diane. It then picks
a minimal candidate that generalizes s_in. A minimal candidate differs from s_in in one atomic
construct at some position i. By ⪯, if s_in[i] = σ, then s′_in[i] is x or σ^k.
Assume that L-SEP first tests the minimal candidate s′_in = D^{k0}iane. To test it, L-SEP
calls findCompletion to look for a completion. The completion is defined over s′_in and in
particular can use the variable k0. Then, findCompletion invokes findOutputPrefix(P,
D^{k0}iane, ε). In the first call of findOutputPrefix, all extensions of the current prefix, ε,
except for H, fail the partial consistency check. This follows since the output of P always
starts with an ‘H’ (and not, e.g., with H^{k0}). Thus, a recursive call is invoked (only) for the
output SE prefix H. In this call, all extensions (i.e., Hσ or Hσ^{k0}) fail. For example, Hi fails since
the output prefix is not always “Hi” (e.g., P(DDiane) = HD DDiane). Since the prefix H cannot be
extended further, ⊥ is returned. This indicates that the input generalization s′_in = D^{k0}iane fails.
Thus, L-SEP removes from InCands all generalizations whose first construct generalizes D^{k0}.
L-SEP then tests another minimal generalization: s′_in = x0iane. It then calls
findCompletion (which can use x0). As before, (only) the prefix SE H is found partially
consistent. Next, a second call attempts to extend H. This time, the extension Hi succeeds
because for all interpretations of x0iane, the output prefix is “Hi”. The recursion continues, until
obtaining and returning the completion Hi x0iane.
When L-SEP learns that s′_in is a feasible generalization, it updates s_in and s_out, and prunes
InCands to candidates generalizing x0iane (for example, InCands contains x0x1ane). Eventually,
s_in is generalized to s′_in = x0x1X2X3X4 with the completion s′_out = Hx1 x0x1X2X3X4.
In a postprocessing step (performed when L-SEP is done), X2X3X4 is simplified to Y, resulting
in the abstract example x0x1Y → Hx1 x0x1Y. Note that this last “generalization” is no longer
according to ⪯.
6.4 Synthesis with Abstract Examples
In this section, we present our framework for synthesis with abstract examples. We assume
the existence of an oracle O (e.g., a user) with a fixed target program P_tar. Our framework
is parameterized by a synthesizer S that takes concrete or abstract examples and returns a
consistent program. Note that the guarantee to eventually output a program equivalent to P_tar is the
responsibility of our framework and not of S. Nonetheless, candidate programs are provided by S.
Goal The goal of our framework is to learn a program equivalent to the target program. Note
that this is different from the traditional goal of PBE synthesizers, which learn a program that
agrees with the target program at least on the observed inputs. More formally, our goal is to
learn a program P′ such that ⟦P_tar⟧ = ⟦P′⟧, whereas PBE synthesizers that are given a set
of input-output examples E ⊆ D × D can only guarantee to output a program P′′ such that
⟦P_tar⟧ ∩ E = ⟦P′′⟧ ∩ E.
Interaction Model  We assume that the oracle O can accept abstract examples or reject them and
provide a counterexample. If the oracle accepts an abstract example io, then ⟦io⟧ ⊆ ⟦P_tar⟧. If it
returns a counterexample cex = (in′, out′), then (i) (in′, out′) ∈ ⟦P_tar⟧, (ii) (in′, out′) ∉ ⟦io⟧,
and (iii) in′ ∈ ⟦in(io)⟧.
Operation Our framework (Algorithm 14) takes an initial (nonempty) set of input-output
examples E ⊆ D ×D. This set may be extended during the execution. The algorithm consists
of two loops: an outer one that searches for a candidate program and an inner one that computes
abstract examples for a given candidate program. The inner loop terminates when one of the
abstract examples is rejected (in which case a new iteration of the outer loop begins) or when the
input space is covered (in which case the candidate program is returned along with the abstract
example specification).
The algorithm begins by initializing A to the empty set. This set accumulates abstract
examples that eventually form an abstract example specification of Ptar. Then the outer loop
begins (Lines 2–10). Each iteration starts by asking the synthesizer for a program P consistent
with the current set of concrete examples in E and abstract examples in A. Then, the inner loop
begins (Lines 4–9). At each inner iteration, an input in is picked and L-SEP(P, in) is invoked.
When an abstract example io is returned, it is presented to the oracle. If the oracle provides a
counterexample cex = (in′, out′), then ⟦P⟧ ≠ ⟦P_tar⟧ (see Lemma 6.4.1). In this case, E is
extended with cex, and a new outer iteration begins. If the oracle accepts the abstract example,
Algorithm 14: synthesisWithAbstractExamples(E)
 1  A = ∅                                    // initialize the set of abstract examples
 2  while true do
 3      P = S(E, A)                          // obtain a program consistent with the examples
 4      while ⋃_{io∈A} ⟦in(io)⟧ ≠ D do       // A does not cover D
 5          let in ∈ D \ ⋃_{io∈A} ⟦in(io)⟧   // obtain an uncovered input
 6          io = L-SEP(P, in)                // learn an abstract example
 7          cex = O(io)                      // ask the oracle
 8          if cex = ⊥ then A = A ∪ {io}     // the abstract example is correct
 9          else E = E ∪ {cex}; break        // add a counterexample
10  return (P, A)
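The loop structure of Algorithm 14 can be illustrated on a finite toy domain (the candidate space, oracle, and the stubbed generalizer below are our own simplifications; in particular, the stub returns concrete examples, which are trivially sound "abstract examples", so only the shape of the interaction is shown):

```python
# Sketch of the CEGIS-style loop of Algorithm 14 on a finite toy domain.

D = ["a", "b", "ab", "ba"]                     # finite input domain
P_tar = lambda s: s[::-1]                      # target: reverse the string

CANDIDATES = [lambda s: s, lambda s: s + s, lambda s: s[::-1]]  # toy space

def synthesizer(E):
    """Return the first candidate consistent with the concrete examples E."""
    return next(p for p in CANDIDATES if all(p(i) == o for i, o in E))

def oracle(io):
    """Accept io (return None) or return a counterexample from its domain."""
    for inp, out in io:
        if P_tar(inp) != out:
            return (inp, P_tar(inp))
    return None

def generalize(P, inp):
    """Stub for L-SEP: here, just the concrete example (always sound)."""
    return {(inp, P(inp))}

def synthesize(E):
    A = set()
    while True:                                          # outer loop
        P = synthesizer(E)
        while {i for io in A for i, _ in io} != set(D):  # A does not cover D
            inp = next(i for i in D
                       if i not in {j for io in A for j, _ in io})
            io = frozenset(generalize(P, inp))
            cex = oracle(io)
            if cex is None:
                A.add(io)                                # accepted: keep it
            else:
                E = E | {cex}                            # rejected: new round
                break
        else:
            return P, A                                  # D is covered

P, A = synthesize({("ab", "ba")})
assert all(P(i) == P_tar(i) for i in D)
```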
io, it is added to A (since acceptance establishes that it is an abstract example for P_tar). The idea is that
the synthesizer extends its set of examples with more examples (potentially an infinite number).
This (potentially) enables faster convergence to Ptar (in case additional outer iterations are
needed). If the inner loop terminates without encountering counterexamples, then A covers
the input domain D. At this point it is guaranteed that ⟦P⟧ = ⟦P_tar⟧ (see Theorem 6.2). Thus,
P is returned, along with the abstract example specification A. Note that A has already been
validated and need not be inspected again.
We remark that although abstract examples can help the synthesizer to converge faster to
the target program, the convergence speed (and the number of counterexamples required to
converge) still depends on the synthesizer (which is a parameter to our framework) and not on
L-SEP or our synthesis framework.
Lemma 6.4.1. If O(io) = (in′, out′) (≠ ⊥), then ⟦P⟧ ≠ ⟦P_tar⟧.
Proof. From the oracle properties, (in′, out′) ∈ ⟦P_tar⟧, (in′, out′) ∉ ⟦io⟧, and in′ ∈ ⟦in(io)⟧.
Thus, there exists out′′ ≠ out′ such that (in′, out′′) ∈ ⟦io⟧. Since, by construction, ⟦io⟧ ⊆ ⟦P⟧,
it follows that (in′, out′′) ∈ ⟦P⟧. Thus, ⟦P⟧ ≠ ⟦P_tar⟧.
Theorem 6.2. Upon termination, Algorithm 14 returns a program P s.t. ⟦P⟧ = ⟦P_tar⟧.
Proof. Upon termination, for every in ∈ D there exists io ∈ A s.t. in ∈ ⟦in(io)⟧. By
construction, ⟦io⟧ ⊆ ⟦P⟧, and thus (in, ⟦P⟧(in)) ∈ ⟦io⟧. By the oracle properties, ⟦io⟧ ⊆ ⟦P_tar⟧; thus (in, ⟦P_tar⟧(in)) ∈ ⟦io⟧. Altogether, ⟦P⟧(in) = ⟦P_tar⟧(in).
We emphasize that the interaction with the oracle (user) takes place only after both a
candidate program and an abstract example have been obtained; the goal of the interaction is
to determine whether the candidate program is correct. Rejection of the abstract example by
the user means rejection of the candidate program, in which case the PBE synthesizer S looks
for a new candidate program. In particular, the goal of the interaction is not to confirm the
correctness of the abstract examples – L-SEP always returns (without any interaction) a correct
generalization with respect to the candidate program.
E                                         P(x)               Abstract Examples       Counterexample?
{(10101,10111)}                           P(x) = 10111       X → 10111               1 → 11
{(10101,10111), (1,11)}                   P(x) = OR(x, 2)    X0x1x2 → X01x2          0 → 1
{(10101,10111), (1,11), (0,1)}            P(x) = OR(x+1, 1)  X00x2 → X0x21           No
                                                             X00 → X01               No
                                                             X00x11 → X0x1x̄11        11 → 111
{(10101,10111), (1,11), (0,1), (11,111)}  P(x) = OR(x+1, x)  X001^k → X01^k1         No

Table 6.1: A running example for learning a program that flips the rightmost 0 bit with our synthesis framework. The target program is P_tar(x) = OR(x + 1, x).
Example  We next exemplify our synthesis framework in the bit-vector domain. We consider
a program space P defined inductively as follows. The identity function and all constant
functions are in P. For every op ∈ {Not, Neg} and P ∈ P, op(P) ∈ P, and for every
op ∈ {AND, OR, +, −, SHL, XOR, ASHR} and P1, P2 ∈ P, op(P1, P2) ∈ P. We assume a
naïve synthesizer that enumerates the program space by considering programs of increasing
size and returning the first program consistent with the examples. In this setting, we consider
the task of flipping the rightmost 0 bit, e.g., 10101 → 10111 (taken from the SyGuS competition
[AFSS16]). While this task is easy to explain intuitively through examples, phrasing
it as a logical formula is cumbersome. Assume a user provides to Algorithm 14 the set of
examples E = {(10101, 10111)}. Table 6.1 shows the execution steps taken by our synthesis
framework: E shows the current set of examples, P(x) shows the candidate program
synthesized by the naïve synthesizer, Abstract Examples shows the abstract examples computed
by L-SEP, and Counterexample? is either No, if the user accepts the current abstract
example (to its left), or an input-output example pair contradicting the current abstract example.
In this example, L-SEP uses the set of functions F = {fneg} in the output SE, where
fneg(0) = 1 and fneg(1) = 0, and we abbreviate fneg(y) with ȳ. Further, since the bit-vector domain
consists of vectors of a fixed size (namely, Σ^n for a fixed n instead of Σ*), the SE semantics
in this domain is defined as the suffixes of size n of its (normal) interpretation. Formally,
⟦se⟧_n = {s ∈ Σ^n | ∃env. s is a suffix of env[se]}. The semantics of an input-output SE is
defined similarly. In the example, the first two programs are eliminated immediately by the user,
whereas the third program is eliminated only after showing the third abstract example describing
it. This enables the synthesizer to prune a significant portion of the search space. Note that
since abstract examples are interpreted over fixed sized vectors (as explained above), the last
abstract example covers the input space: if k = n, the input isn times︷ ︸︸ ︷11...1; if k = 0, the input takes
the form of b0...bn−10 (where the bi-s are bits); and if 0 < k < n, the input takes the form of
b0...bn−k−101k.
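For intuition, one can check directly that OR(x + 1, x) indeed flips the rightmost 0 bit of an 8-bit vector. The following Python sketch is our own sanity check against a reference implementation; the framework itself reasons symbolically rather than by enumeration:

```python
# Reference check that OR(x + 1, x) flips the rightmost 0 bit of an
# 8-bit vector (our own illustration, not part of the framework).
N = 8
MASK = (1 << N) - 1

def flip_rightmost_zero(x):
    """Reference semantics: flip the rightmost 0 bit of an N-bit vector."""
    for i in range(N):
        if not (x >> i) & 1:          # found the rightmost 0 bit
            return x | (1 << i)
    return x                          # all ones: nothing to flip

def p_tar(x):
    """The target program from Table 6.1: Ptar(x) = OR(x + 1, x)."""
    return ((x + 1) & MASK) | x

assert p_tar(0b10101) == 0b10111      # the example from the text
assert all(p_tar(x) == flip_rightmost_zero(x) for x in range(1 << N))
```

The equivalence holds because x + 1 turns the trailing block of 1-bits into 0-bits and sets the rightmost 0 bit; OR-ing with x restores the trailing 1-bits.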
Leveraging Counterexamples for Learning Abstract Examples A limitation of L-SEP is that
it only generalizes the existing characters of the concrete input. For example, consider a
candidate program generated by S that returns the first and last character of the string, which
can be summarized by the abstract example x0X1x2 → x0x2. In the process of generating
an abstract example specification for the candidate program, if the first example provided by
Algorithm 14 to L-SEP for generalization is ab, then it is generalized to x0x1 → x0x1. On the
other hand, if the first example is acb, then it is generalized to x0X1x2 → x0x1, whose domain
is a strict superset of the former’s domain. This exemplifies that some inputs may provide
better generalizations than others. Although eventually our framework will learn the better
generalizations, if Algorithm 14 starts from the less generalizing examples, then its termination
is delayed, and unnecessary questions are presented to the oracle (in our example, x0x1 → x0x1
will be presented, followed by x0X1x2 → x0x2, both of which are accepted, but the former
perhaps could have been avoided). We believe that the way to avoid this delay in the algorithm’s
termination is to pick “good” examples. We leave the question of how to identify them to future
work, but note that if the oracle is assumed to provide “good” examples (e.g., representative),
then Line 5 can be changed to first look for an uncovered input in E.
6.5 Evaluation
In this section, we discuss our implementation and evaluate L-SEP and our synthesis framework.
We evaluate our algorithms in two domains: strings and bit vectors (of size 8). The former
domain is suitable for end users, as targeted by approaches like Flash Fill or learning regular
expressions. The latter domain is of interest to the synthesis community (evident by the SyGuS
competition [AFSS16]). We begin with our implementation and then discuss the experiments.
All experiments ran on a Sony Vaio PC with Intel(R) Core(TM) i7-3612QM processor and
8GB RAM.
6.5.1 Implementation
We implemented our algorithms in Java. We next provide the main details.
Program Spaces The program space we consider for bit vectors is the one defined in the example
at the end of Section 6.4. The program space P we consider for the string domain is defined
inductively as follows. The identity function and all constant functions are in P . For every
P1, P2 ∈ P , concat(P1, P2) ∈ P . For P ∈ P and integers i1, i2, Extract(P, i1, i2) ∈ P . For
P1, P2 ∈ P , and a condition e over string programs and integer symbols, ITE(e, P1, P2) ∈ P .
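This inductive program space can be sketched in Python, with each constructor modeled as a higher-order function. The names below are illustrative only; the actual implementation is in Java:

```python
# A sketch of the string program space: identity, constants, concat,
# Extract, and ITE. (Our own illustration of the grammar above.)
identity = lambda s: s

def const(c):
    return lambda s: c                          # a constant function

def concat(p1, p2):
    return lambda s: p1(s) + p2(s)              # concat(P1, P2)

def extract(p, i1, i2):
    return lambda s: p(s)[i1:i2]                # Extract(P, i1, i2)

def ite(e, p1, p2):
    return lambda s: p1(s) if e(s) else p2(s)   # ITE(e, P1, P2)

# Example: return the first letter of the input followed by "!".
first_excl = concat(extract(identity, 0, 1), const("!"))
assert first_excl("dana") == "d!"
```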
SE Spaces In the bit vector domain we consider F = {fneg}, where fneg(b) = 1 − b.
findCompletion To answer the containment queries (Lines 3 and 7), we use the Z3 SMT-
solver [DMB08]. To this end, we encode the candidate program P and the SEs as formulas.
Roughly speaking, an SE is encoded as a conjunction of sequence predicates, each encoding a
single atomic construct. A sequence predicate extends the equality predicate with a start position
and is denoted by t1 =_i t2. An interpretation d1, d2 for t1, t2 satisfies t1 =_i t2 if, starting from the
i-th character of d1, the next |d2| characters are equal to d2. The term t1 is either a unique variable
tin, representing the input (for input SEs), or P(tin) (for output SEs). The term t2 can be (i) σ
(a letter from Σ), (ii) σ^k where k is a star variable, or (iii) a character or sequence variable. For
example, X0 a b^{k2} x3 is encoded as:

tin =_0 X0 ∧ tin =_{|X0|} a ∧ (∀i. 1 + |X0| ≤ i < 1 + |X0| + k2 → tin =_i b) ∧ tin =_{1+|X0|+k2} x3.

Note that the positions can be a function of the variables. In the string
domain, the formulas are encoded in string theory (except for i and k2, which are integers). In
the bit vector domain, entities are encoded as bit vectors and =_i is implemented with masks.
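The sequence predicate has a simple operational reading. The following pure-Python sketch (our own reading of the definition, not the SMT encoding itself) checks the conjuncts encoding X0 a b^{k2} x3 against one concrete interpretation:

```python
# t1 =_i t2 holds iff, starting from the i-th character of d1,
# the next |d2| characters equal d2.
def seq_eq(d1, i, d2):
    return i + len(d2) <= len(d1) and d1[i:i + len(d2)] == d2

# One concrete interpretation: X0 = "cd", k2 = 3, x3 = "e",
# so the input is "cd" + "a" + "bbb" + "e".
t_in, X0, k2, x3 = "cdabbbe", "cd", 3, "e"
assert seq_eq(t_in, 0, X0)                                 # t_in =_0 X0
assert seq_eq(t_in, len(X0), "a")                          # t_in =_{|X0|} a
assert all(seq_eq(t_in, i, "b")                            # the run of b's
           for i in range(1 + len(X0), 1 + len(X0) + k2))
assert seq_eq(t_in, 1 + len(X0) + k2, x3)                  # t_in =_{1+|X0|+k2} x3
```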
Synthesis Framework To check whether A covers the input domain and obtain an uncovered
input in if not, we encode the abstract examples in A as formulas. We then check whether one
of the concrete examples from E does not satisfy any of these formulas. If so, it is taken as in.
Otherwise, we check whether there is another input that does not satisfy the formulas, and if so
it is taken as in; otherwise the input domain is covered.
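The coverage check above can be read as follows. This Python sketch uses brute force over 8-bit inputs and hypothetical predicates standing in for the abstract examples' domains, whereas the implementation encodes both as formulas for an SMT-solver:

```python
# Brute-force reading of the coverage check. The two predicates in A are
# hypothetical abstract-example domains, used only for illustration.
A = [
    lambda x: x % 2 == 1,         # domain of one abstract example: odd inputs
    lambda x: x == 0b11111110,    # domain of another abstract example
]
E = [(0b10101, 0b10111)]          # concrete examples seen so far

def uncovered_input(A, E, domain=range(256)):
    # First try the concrete examples from E, then the rest of the domain.
    for x in [i for i, _ in E] + list(domain):
        if not any(a(x) for a in A):
            return x              # an uncovered input
    return None                   # the input domain is covered

assert uncovered_input(A, E) == 0                            # 0 is even: uncovered
assert uncovered_input(A + [lambda x: x % 2 == 0], E) is None
```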
Synthesizer Our synthesizer is a naïve one that enumerates the program space by considering
programs of increasing size and returning the first program consistent with the examples.
Technically, we check consistency by submitting the formula P(in) = out to an SMT-solver
for every (in, out) ∈ E. Likewise, P is checked for consistency with the abstract examples
by encoding them as formulas and testing whether they imply the formula of P. More
sophisticated PBE synthesizers, such as Flash Fill, can in many cases be extended to handle
abstract examples in a straightforward manner.
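A toy version of such a naïve enumerative synthesizer for the bit vector domain can be sketched in Python, over a small subset of the operators and with direct evaluation in place of the SMT-based consistency check (both simplifications are ours):

```python
from itertools import count, product

# Enumerate programs by size; return the first one consistent with E.
MASK = 0xFF  # 8-bit domain

def programs(size):
    """All programs of exactly the given size, as (description, function)."""
    if size == 1:
        yield "x", lambda x: x                    # the identity
        for c in (0, 1, 2):                       # a few constants
            yield str(c), (lambda c: lambda x: c)(c)
        return
    for left in range(1, size - 1):               # the operator counts as size 1
        for (d1, p1), (d2, p2) in product(programs(left),
                                          programs(size - 1 - left)):
            yield f"OR({d1},{d2})", (lambda a, b: lambda x: a(x) | b(x))(p1, p2)
            yield f"({d1}+{d2})", (lambda a, b: lambda x: (a(x) + b(x)) & MASK)(p1, p2)

def synthesize(E):
    for size in count(1):
        for desc, p in programs(size):
            if all(p(i) == o for i, o in E):
                return desc, p

# The examples accumulated in Table 6.1 suffice to pin down a program
# equivalent to OR(x + 1, x):
E = [(0b10101, 0b10111), (0b1, 0b11), (0b0, 0b1), (0b11, 0b111)]
desc, p = synthesize(E)
assert all(p(i) == o for i, o in E)
assert p(0b1100) == 0b1101            # flips the rightmost 0 bit
```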
6.5.2 Synthesis Framework Evaluation
In this section, we evaluate our synthesis framework on the bit vector domain. We consider
three experimental questions: (1) Do abstract examples reduce the number of concrete examples
required from the user? (2) Do abstract examples enable better pruning for the synthesizer?
(3) How many abstract examples are presented to the user before he rejects a program? To
answer these questions, we compare our synthesis framework (denoted AE) to a baseline
that implements the current popular alternative ([SL08]), which guarantees that a synthesized
program is correct. The baseline acts as follows. It looks for the first program that is consistent
with the provided examples and then asks the oracle whether this program is correct. The oracle
checks whether there is an input for which the synthesized program and the target program return
different outputs. If so, the oracle provides this input and its correct output to the synthesizer,
which in turn looks for a new program. If there is no such input, the oracle reports success,
and the synthesis completes. We assume a knowledgeable user (oracle), implemented by an
SMT-solver, which is oblivious to whether the program is easy for a human to understand,
making the comparison especially challenging.
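The baseline loop can be sketched as follows, with brute force over the 8-bit domain standing in for the SMT-based oracle and a small fixed pool of candidates standing in for the enumerative synthesizer (both simplifications are ours):

```python
# Counterexample-guided baseline: propose the first consistent program,
# ask the oracle for a counterexample, repeat until none exists.
MASK = 0xFF
candidates = [
    ("x",           lambda x: x),
    ("OR(x, 2)",    lambda x: x | 2),
    ("OR(x+1, 1)",  lambda x: ((x + 1) & MASK) | 1),
    ("OR(x+1, x)",  lambda x: ((x + 1) & MASK) | x),
]
target = candidates[-1][1]                   # Ptar(x) = OR(x + 1, x)

E = []
while True:
    # First candidate consistent with the examples gathered so far.
    desc, p = next((d, q) for d, q in candidates
                   if all(q(i) == o for i, o in E))
    cex = next((x for x in range(256) if p(x) != target(x)), None)
    if cex is None:                          # oracle reports success
        break
    E.append((cex, target(cex)))             # oracle returns a counterexample

assert desc == "OR(x+1, x)"
```

Note that the baseline interacts only through concrete counterexamples; unlike AE, it never shows the user a description of the candidate's behavior on multiple inputs at once.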
Benchmarks We consider three benchmarks, B(4), B(6), and B(8), each consisting of 50
programs. A program is in B(n) if the baseline required at least n examples to find it. To find such
programs, we randomly select programs of size 4; for each, we execute the baseline (to find it), and
if it required at least n examples, we add it to B(n) and execute our synthesis framework (AE)
to find the same (or an equivalent) program.
Consistency of Examples The convergence of these algorithms is highly dependent on the
examples the oracle provides. To guarantee a fair comparison, we make sure that the same
                                              B(4)             B(6)             B(8)
                                          AE    baseline   AE    baseline   AE    baseline
#Concrete examples (candidate programs)   4.42  5.64       5.50  7.68       6.62  10.26
Spec-final                                11.04            9.36             13.22
#AE-intermediate                          1.98             2.00             3.23
%Better than baseline                     68%              76%              96%
%Equal to baseline                        30%              22%              2%
%Worse than baseline                      2%               2%               2%

Table 6.2: Experimental results on the bit vector domain.
Figure 6.2: Detailed results for B(8).
examples are presented to both algorithms whenever possible. To this end, we use a cache
that stores the examples observed by the baseline. When our algorithm asks the oracle for an
example, it first looks for an example in the cache. Only if none meets its requirements does it
ask (an SMT-solver) for a new concrete example.
Results Table 6.2 summarizes the results. It reports the following:
• #Concrete examples: the average number of concrete examples the oracle provided, which
is also the number of candidate programs.
• Spec-final: the average size of the final abstract example specification (after removing
implied abstract examples).
• #AE-intermediate: the average number of abstract examples shown to the user before he
rejected the corresponding candidate program.
• %Better/equal/worse than baseline: the percentage of all programs in the benchmark
that required fewer/the same number of/more (concrete) examples than the baseline.
We observed that the time to generate a single abstract example is a few seconds (≈ 6 seconds).
Results indicate that our synthesis framework (AE), which prunes the program space based
on the abstract examples, improves on the baseline in terms of the number of examples the user
needs to provide. This becomes more significant as the number of required examples increases:
AE improves on the baseline by 22% on B(4), by 30% on B(6), and by 37% on B(8). Moreover, in
each benchmark AE performed worse than the baseline only in a single case – and the common
case was that it performed better (in B(8), AE performed better on all cases except two).
Fig. 6.2 provides detailed evidence of the improvement: it shows for each experiment (the
x-axis) the number of concrete examples each algorithm required (the y-axis). The figure
illustrates that the improvement can be significant. For example, in the 47th experiment, AE
reduced the number of examples from 17 to 7.
The String Program                                                       #Abstract Examples
Concatenates the string "Dear" to the last name.                                   1
Concatenates the first letter of the first name to the last name.                  1
Concatenates the first letter of the first name to the last name and to
"@lockhart-gardner.com".                                                           1
Generates the message presented in the motivating example.                         2
Concatenates the first two characters of the first name to the third and
fourth characters of the last name and to the second digit of the
meeting time.                                                                      6.57

Table 6.3: Experimental results on the string domain.
The number of concrete examples is also the number of candidate programs generated by the
synthesizer. Thus, the lower number of examples indicates that the abstract examples improve
the pruning of the program space. Namely, abstract examples help the overall synthesis to
converge faster to the target program.
6.5.3 Abstract Example Specification Evaluation
In this section, we evaluate our generalization algorithm, L-SEP, in the string domain and check
how well it succeeds in learning small specifications. To this end, we fix a program and a
concrete example to start with and run L-SEP. We repeat this with uncovered inputs until the
set of abstract examples covers the string domain. We then check how many abstract examples
were computed.
The programs we considered are related to the motivating example. For each program, we
run five experiments. Each experiment uses a different Excel row (lawyer) as the first concrete
example. We note that our implementation assumes that the names and meeting times are
non-empty strings and are space-delimited. Table 6.3 reports the programs and the average
number of abstract examples. Results indicate that the average number of abstract examples
required to describe the entire string domain is low.
6.6 Related Work
In this section, we survey the work closely related to ours.
Learning Specifications Learning regular languages from examples has been extensively studied
in computational learning theory, under different models: (i) identification in the limit
(Gold [Gol67]), (ii) query learning (Angluin [Ang88]), and (iii) PAC learning (Valiant [Val84]).
Our setting is closest to Angluin’s setting, which defines a teacher-student model and two types
of queries: membership (concrete examples) and equivalence (validation). The literature has
many results for this setting, including learning automata, context-free grammars, and regular
expressions (see [Sak97]). In the context of learning regular expressions, current algorithms
impose restrictions on the target regular expression. For example, [BC94] allows at most one
union operator, [Kin10] disallows unions and allows loops only up to depth 2, [Fer09] assumes that
input samples are finite and Kleene stars are not nested, and [BNST06] assumes that expressions
consist of chains that have at most one occurrence of every symbol. In contrast, we learn an
extended form of regular expressions but we also impose some restrictions. In the context of
learning specifications, [TLHL11] learns specifications for programs in the form of logical
formulas, which are not intuitive for most users. Symbolic transducers [VHL+12, BB13]
describe input-output specifications, but these are more natural for describing functions over
streams than for describing input manipulations.
Least General Generalization L-SEP takes the approach of least general generalization to
compute an abstract example. The approach of least general generalization was first introduced
by Plotkin [Plo70], who pioneered inductive logic programming and showed how to generalize
formulas. This approach was later used to synthesize programs from examples in a PBE
setting [MF90, RGMF14]. In contrast, we use this approach not to learn the low-level program,
but the high-level specification in the form of abstract examples.
Pre/Post- Condition Inference Learning specifications is related to finding the weakest pre-
conditions, strongest post-conditions, and inductive invariants [Dij75, GT07, Riv05, CCL11,
CCFL13, GLMN14]. Current inference approaches are mostly for program analysis and aim to
learn the conditions under which a bad behavior cannot occur. Our goal is different: we learn
the (good and bad) behaviors of the program and present them through a high-level language.
Applications of Regular Expressions There are many applications of regular expressions, for
example in data filtering (e.g., [WGS16]), learning XML file schemes (DTD) (e.g., [Fer09,
BNST06]), and program boosting (e.g., [CDL+15]). All of these learn expressions that are
consistent with the provided examples and have no guarantee on the target expression. In
contrast, we learn expressions that precisely capture program specifications.
6.7 Conclusion
We presented a novel synthesizer that interacts with the user via abstract examples and is
guaranteed to return a program that is correct on all inputs. The main idea is to use abstract
examples to describe a program behavior on multiple concrete inputs. To that end, we showed
L-SEP, an algorithm that generates maximal abstract examples. L-SEP enables our synthesizer
to describe candidate programs’ behavior through abstract examples. We implemented our
synthesizer and experimentally showed that it required few abstract examples to reject false
candidates and reduced the overall number of concrete examples required.
Chapter 7
Conclusion
In this thesis, we studied the problem of exact programming by example. In programming by
example, a user provides a set of input-output examples, and a synthesizer generates a program
consistent with these examples. The premise of programming by example is that these examples
capture the user’s intent, and thus the synthesizer will return a program that captures it even on
unseen inputs. Unfortunately, this is typically not the case and examples often under-specify the
user’s intent, especially when they are few and the input domain is large or infinite. Previous
approaches in programming by example either assumed that the user can inspect the final
program (directly or by looking at its outputs on new inputs) and provide more examples if
the outcome is incorrect, or they exhaustively presented membership queries to the user until
converging to a single program without providing bounds on the number of queries.
In this research, we formalized the problem of learning the user’s intent from examples as
an instance of exact learning. We captured user intent as a formula over arbitrary predicates
and limited the student’s (i.e., the synthesizer’s) queries to membership queries. We began by
studying a novel domain for program synthesis – patterns in time-series charts. We formalized
patterns as conjunctions over variable inequalities and showed an exact learning algorithm that
learns the pattern from charts. We then generalized this algorithm to algorithms that learn the
class of conjunctions and disjunctions over arbitrary predicates. The crux of these algorithms
is to identify non-equivalent formulas, which is crucial for reducing the search space size and
lowering the number of queries posed. Finally, we turned to the most general class: DNF
formulas over arbitrary predicates. We showed algorithms to learn this class and further studied
two important sub-classes: DNF formulas over predicates that are closed under negation, and
DNF formulas over predicates that are anti-closed under negation. Since any formula has a
representation as a DNF formula over the same predicates, this implies that any user intent can
be learned from examples with algorithms that minimize the number of membership queries
posed. In the final chapter, we investigated a different approach to guarantee exactness while
interacting through abstract examples. Abstract examples provide a middle ground between
membership queries and validation queries: they provide the same guarantee as validation
queries, while enjoying the simplicity of examples. We demonstrated how synthesizers can
benefit from abstract examples, both in communicating a candidate program’s specification
in an intuitive language and in pruning the program space quickly to speed the convergence
towards the target program.
The novelty of our research can be summarized as follows. Previous works in programming
by example did not learn the exact user intent or did not provide a bound on the number of
queries presented. Previous works in exact learning have focused on specific types of predicates,
and thus could not capture any user intent. Thus, our work is a contribution to both program
synthesis and exact learning, and a demonstration of their tight connection. We hope this
research will inspire others to pursue this exciting field of exact programming by example. Some
interesting directions for future study are:
• Improving query complexity: Some of the algorithms presented are not optimal, which
leaves room for improved algorithms with better query complexity.
• Improving query complexity for special predicate sets: We presented general algorithms
to learn formulas over arbitrary predicate sets. As Chapter 3 demonstrates, fixing a
predicate set may yield algorithms with better query complexity. As many program
synthesis works focus on specific domains, their authors may design domain-specific learning
algorithms with better query complexity than our general-purpose algorithms.
• Learning the important predicates: An inherent assumption of our algorithms is that a
predicate set is provided. This raises the question of how to obtain the predicates. Fixing
a domain can significantly help in this task; however, there is a tradeoff between the
expressiveness of the predicate set (i.e., its ability to separate elements of the input domain) and
the size of the predicate set, which is an important factor in the number of queries posed.
• Developing abstract examples: Finally, we have shown that abstract examples can serve
as a middle ground between validation queries and membership queries. However, the
effectiveness of abstract examples is highly dependent on their representation. We have
shown one representation that is suitable to describe string-manipulation programs. An
interesting future direction is to find succinct representations of concrete examples in
other domains.
Bibliography
[ABC+12] Vicente Acuña, Étienne Birmelé, Ludovic Cottret, Pierluigi Crescenzi,
Fabien Jourdan, Vincent Lacroix, Alberto Marchetti-Spaccamela, Andrea
Marino, Paulo Vieira Milreu, Marie-France Sagot, and Leen Stougie.
Telling stories: Enumerating maximal directed acyclic graphs with a
constrained set of sources and targets. Theoretical Computer Science,
457:1–9, 2012.
[ABJ+13] Rajeev Alur, Rastislav Bodík, Garvit Juniwal, Milo M. K. Martin,
Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-
Lezama, Emina Torlak, and Abhishek Udupa. Syntax-guided synthesis.
In Formal Methods in Computer-Aided Design, FMCAD 2013, Portland,
OR, USA, October 20-23, 2013, pages 1–8, 2013.
[ABK+02] Noga Alon, Richard Beigel, Simon Kasif, Steven Rudich, and Benny
Sudakov. Learning a hidden matching. In Proceedings of the 43rd
Symposium on Foundations of Computer Science, FOCS ’02, pages 197–
206, Washington, DC, USA, 2002. IEEE Computer Society.
[AC08] Dana Angluin and Jiang Chen. Learning a hidden graph using O(log n)
queries per edge. Journal of Computer and System Sciences, 74(4):546–556,
2008. Carl Smith Memorial Issue.
[ACK01] Saswat Anand, Wei-Ngan Chin, and Siau-Cheng Khoo. Charting patterns
on price history. In Proceedings of the Sixth ACM SIGPLAN International
Conference on Functional Programming (ICFP ’01), Firenze (Florence),
Italy, September 3-5, 2001, pages 134–145, 2001.
[AFSS16] Rajeev Alur, Dana Fisman, Rishabh Singh, and Armando Solar-Lezama.
SyGuS-Comp 2016: Results and analysis. In Proceedings Fifth Workshop
on Synthesis, SYNT@CAV 2016, Toronto, Canada, July 17-18, 2016,
pages 178–202, 2016.
[AGK13] Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. Recursive
program synthesis. In Computer Aided Verification - 25th International
Conference, CAV 2013, Saint Petersburg, Russia, July 13-19, 2013.
Proceedings, pages 934–950, 2013.
[Ang88] Dana Angluin. Queries and concept learning. Machine Learning,
2(4):319–342, 1988.
[BB13] Matko Botinčan and Domagoj Babić. Sigma*: Symbolic learning of
input-output specifications. In Proceedings of the 40th Annual ACM
SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
POPL '13, pages 443–456, New York, NY, USA, 2013. ACM.
[BC94] Alvis Brazma and Karlis Cerans. Efficient learning of regular expressions
from good examples. In 5th International Workshop on Algorithmic
Learning Theory, ALT ’94, Reinhardsbrunn Castle, Germany, October
10-15, 1994, Proceedings, pages 76–90, 1994.
[BCD+13] Mike Barnett, Badrish Chandramouli, Robert DeLine, Steven Drucker,
Danyel Fisher, Jonathan Goldstein, Patrick Morrison, and John Platt.
Stat!:an interactive analytics environment for big data. In Proceedings of
the ACM SIGMOD International Conference on Management of Data,
SIGMOD 2013, New York, NY, USA, June 22-27, 2013, pages 1013–1016,
2013.
[BCL+13] Michele Borassi, Pierluigi Crescenzi, Vincent Lacroix, Andrea Marino,
Marie-France Sagot, and Paulo Vieira Milreu. Telling stories fast. In
Experimental Algorithms: 12th International Symposium, SEA 2013,
Rome, Italy, June 5-7, 2013. Proceedings, pages 200–211, 2013.
[BDG+07] Lars Brenna, Alan Demers, Johannes Gehrke, Mingsheng Hong, Joel
Ossher, Biswanath Panda, Mirek Riedewald, Mohit Thatte, and Walker
White. Cayuga: A high-performance event processing engine. In
Proceedings of the ACM SIGMOD International Conference on Management of
Data, Beijing, China, June 12-14, 2007, pages 1100–1102, 2007.
[BG07] E. Biglieri and L. Györfi. Multiple Access Channels: Theory and Practice.
IOS Press, Amsterdam, The Netherlands, 2007.
[BGHZ15] Daniel W. Barowy, Sumit Gulwani, Ted Hart, and Benjamin Zorn.
Flashrelate: Extracting relational data from semi-structured spreadsheets
using examples. In Proceedings of the 36th ACM SIGPLAN Conference
on Programming Language Design and Implementation, Portland, OR,
USA, June 15-17, 2015, pages 218–228, 2015.
[BGV05] Annalisa De Bonis, Leszek Gąsieniec, and Ugo Vaccaro. Optimal
two-stage algorithms for group testing problems. SIAM Journal on Computing,
34(5):1253–1270, 2005.
[Bie78] Alan W. Biermann. The inference of regular lisp programs from examples.
IEEE Transactions on Systems, Man, and Cybernetics, 8(8):585–600,
1978.
[BNST06] Geert Jan Bex, Frank Neven, Thomas Schwentick, and Karl Tuyls.
Inference of concise DTDs from XML data. In Proceedings of the 32nd
International Conference on Very Large Data Bases, Seoul, Korea, Sep-
tember 12-15, 2006, pages 115–126, 2006.
[BTGC16] James Bornholt, Emina Torlak, Dan Grossman, and Luis Ceze.
Optimizing synthesis with metasketches. In Proceedings of the 43rd Annual
ACM SIGPLAN-SIGACT Symposium on Principles of Programming Lan-
guages, POPL 2016, St. Petersburg, FL, USA, January 20 - 22, 2016,
pages 775–788, 2016.
[Bul05] Thomas N. Bulkowski. Encyclopedia of Chart Patterns. Wiley, 2nd
edition, 2005.
[Bul12] T. N. Bulkowski. Visual Guide to Chart Patterns. Bloomberg Financial,
2012.
[CCFL13] Patrick Cousot, Radhia Cousot, Manuel Fähndrich, and Francesco
Logozzo. Automatic inference of necessary preconditions. In Verification,
Model Checking, and Abstract Interpretation, 14th International
Conference, VMCAI 2013, Rome, Italy, January 20-22, 2013. Proceedings,
pages 128–148, 2013.
[CCL11] Patrick Cousot, Radhia Cousot, and Francesco Logozzo. Precondition
inference from intermittent assertions and application to contracts on
collections. In Verification, Model Checking, and Abstract Interpretation
- 12th International Conference, VMCAI 2011, Austin, TX, USA, January
23-25, 2011. Proceedings, pages 150–168, 2011.
[CDL+15] Robert A. Cochran, Loris D'Antoni, Benjamin Livshits, David
Molnar, and Margus Veanes. Program boosting: Program synthesis via
crowd-sourcing. In Proceedings of the 42nd Annual ACM SIGPLAN-
SIGACT Symposium on Principles of Programming Languages, POPL
2015, Mumbai, India, January 15-17, 2015, pages 677–688, 2015.
[CF07] Mooi Choo Chuah and Fen Fu. ECG Anomaly Detection via Time Series
Analysis, pages 123–135. Springer Berlin Heidelberg, 2007.
[CGM10] Badrish Chandramouli, Jonathan Goldstein, and David Maier. High-
performance dynamic pattern matching over disordered streams. PVLDB,
3(1):220–231, 2010.
[Cic13] Ferdinando Cicalese. Group testing. In Fault-Tolerant Search Algorithms,
pages 139–173. Springer, 2013.
[CKSL15] Alvin Cheung, Shoaib Kamil, and Armando Solar-Lezama. Bridging
the gap between general-purpose and domain-specific compilers with
synthesis. In 1st Summit on Advances in Programming Languages,
SNAPL 2015, May 3-6, 2015, Asilomar, California, USA, pages 51–62,
2015.
[CSLM13] Alvin Cheung, Armando Solar-Lezama, and Samuel Madden. Optimizing
database-backed applications with query synthesis. In ACM SIGPLAN
Conference on Programming Language Design and Implementation,
PLDI ’13, Seattle, WA, USA, June 16-19, 2013, pages 3–14, 2013.
[CSRL01] Thomas H. Cormen, Clifford Stein, Ronald L. Rivest, and Charles E.
Leiserson. Introduction to Algorithms. McGraw-Hill Higher Education,
2nd edition, 2001.
[DH00] D. Du and F. Hwang. Combinatorial Group Testing and Its Applications.
Applied Mathematics. World Scientific, 2000.
[DH06] D. Du and F. Hwang. Pooling Designs and Nonadaptive Group Testing:
Important Tools for DNA Sequencing. Series on applied mathematics.
World Scientific, 2006.
[Dij75] Edsger W. Dijkstra. Guarded commands, nondeterminacy and formal
derivation of programs. Commun. ACM, 18(8), 1975.
[DMB08] Leonardo de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver.
In Tools and Algorithms for the Construction and Analysis of Systems,
14th International Conference, TACAS 2008, Held as Part of the Joint
European Conferences on Theory and Practice of Software, ETAPS 2008,
Budapest, Hungary, March 29-April 6, 2008. Proceedings, pages 337–
340, 2008.
[Dor43] Robert Dorfman. The detection of defective members of large
populations. The Annals of Mathematical Statistics, 14(4):436–440, 1943.
[DP60] Martin Davis and Hilary Putnam. A computing procedure for
quantification theory. J. ACM, 7(3):201–215, July 1960.
[DSPGMW10] Anish Das Sarma, Aditya Parameswaran, Hector Garcia-Molina, and
Jennifer Widom. Synthesizing view definitions from data. In
Database Theory - ICDT 2010, 13th International Conference, Lausanne,
Switzerland, March 23-25, 2010, Proceedings, pages 89–103, 2010.
[FCD15] John K. Feser, Swarat Chaudhuri, and Isil Dillig. Synthesizing data
structure transformations from input-output examples. In Proceedings of
the 36th ACM SIGPLAN Conference on Programming Language Design
and Implementation, Portland, OR, USA, June 15-17, 2015, pages 229–
239, 2015.
[Fer09] Henning Fernau. Algorithms for learning regular expressions from
positive data. Inf. Comput., 2009.
[GHS12] Sumit Gulwani, William R. Harris, and Rishabh Singh. Spreadsheet data
manipulation using examples. Commun. ACM, 55(8):97–105, 2012.
[GK95] S.A. Goldman and M.J. Kearns. On the complexity of teaching. J.
Comput. Syst. Sci., 50(1):20–31, February 1995.
[GK98] Vladimir Grebinski and Gregory Kucherov. Reconstructing a Hamiltonian
cycle by querying the graph: Application to DNA physical mapping.
Discrete Appl. Math., 88(1-3):147–165, November 1998.
[GLMN14] Pranav Garg, Christof Löding, P. Madhusudan, and Daniel Neider. ICE:
A robust framework for learning invariants. In Computer Aided
Verification - 26th International Conference, CAV 2014, Held as Part of the
Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 18-22, 2014.
Proceedings, pages 69–87, 2014.
[Gol67] E. Mark Gold. Language identification in the limit. Information and
Control, 10(5):447–474, 1967.
[Gre69] Cordell Green. Application of theorem proving to problem solving.
In Proceedings of the 1st International Joint Conference on Artificial
Intelligence, IJCAI’69, pages 219–239, 1969.
[GT07] Sumit Gulwani and Ashish Tiwari. Computing procedure summaries for
interprocedural analysis. In Programming Languages and Systems, 16th
European Symposium on Programming, ESOP 2007, Held as Part of the
Joint European Conferences on Theory and Practice of Software, ETAPS
2007, Braga, Portugal, March 24 - April 1, 2007, Proceedings, pages
253–267, 2007.
[Gul10] Sumit Gulwani. Dimensions in program synthesis. In Proceedings of
the 12th International ACM SIGPLAN Conference on Principles and
Practice of Declarative Programming, July 26-28, 2010, Hagenberg,
Austria, pages 13–24, 2010.
[Gul11] Sumit Gulwani. Automating string processing in spreadsheets using
input-output examples. In Proceedings of the 38th ACM SIGPLAN-
SIGACT Symposium on Principles of Programming Languages, POPL
2011, Austin, TX, USA, January 26-28, 2011, pages 317–330, 2011.
[HAG+13] M. Hirzel, H. Andrade, B. Gedik, G. Jacques-Silva, R. Khandekar,
V. Kumar, M. Mendell, H. Nasgaard, S. Schneider, R. Soulé, and K.-L. Wu.
IBM streams processing language: Analyzing big data in motion. IBM J.
Res. Dev., 57(3-4), 2013.
[Har74] Steven Hardy. Automatic induction of lisp functions. In Proceedings of
the 1st Summer Conference on Artificial Intelligence and Simulation of
Behaviour, AISB’74, pages 50–62, 1974.
[HG11] William R. Harris and Sumit Gulwani. Spreadsheet table transformations
from examples. In Proceedings of the 32nd ACM SIGPLAN Conference
on Programming Language Design and Implementation, PLDI 2011, San
Jose, CA, USA, June 4-8, 2011, pages 317–328, 2011.
[IGIS10] Shachar Itzhaky, Sumit Gulwani, Neil Immerman, and Mooly Sagiv. A
simple inductive synthesis methodology and its applications. In Proceedings
of the 25th Annual ACM SIGPLAN Conference on Object-Oriented
Programming, Systems, Languages, and Applications, OOPSLA 2010,
October 17-21, 2010, Reno/Tahoe, Nevada, USA, pages 36–46, 2010.
[Inv] Investopedia. http://www.investopedia.com/university/technical/techanalysis8.asp.
[JGST10] Susmit Jha, Sumit Gulwani, Sanjit A. Seshia, and Ashish Tiwari. Oracle-
guided component-based program synthesis. In Proceedings of the 32nd
ACM/IEEE International Conference on Software Engineering - Volume
1, ICSE 2010, Cape Town, South Africa, 1-8 May 2010, pages 215–224,
2010.
[JNR02] Rajeev Joshi, Greg Nelson, and Keith Randall. Denali: A goal-directed
superoptimizer. In Proceedings of the 2002 ACM SIGPLAN Conference
on Programming Language Design and Implementation (PLDI), Berlin,
Germany, June 17-19, 2002, pages 304–314, 2002.
[Kin10] Efim B. Kinber. Learning regular expressions from representative examples
and membership queries. In Grammatical Inference: Theoretical
Results and Applications, 10th International Colloquium, ICGI 2010,
Valencia, Spain, September 13-16, 2010. Proceedings, pages 94–108,
2010.
[Knu97] Donald E. Knuth. The Art of Computer Programming, Volume 1 (3rd
Ed.): Fundamental Algorithms. Addison Wesley Longman Publishing
Co., Inc., Redwood City, CA, USA, 1997.
[Kol32] A. Kolmogoroff. Zur Deutung der intuitionistischen Logik. Mathematische
Zeitschrift, 35(1):58–65, 1932.
[LG14] Vu Le and Sumit Gulwani. Flashextract: A framework for data extraction
by examples. In ACM SIGPLAN Conference on Programming Language
Design and Implementation, PLDI ’14, Edinburgh, United Kingdom -
June 09 - 11, 2014, pages 542–553, 2014.
[LGS13] Vu Le, Sumit Gulwani, and Zhendong Su. Smartsynth: synthesizing
smartphone automation scripts from natural language. In The 11th Annual
International Conference on Mobile Systems, Applications, and Services,
MobiSys’13, Taipei, Taiwan, June 25-28, 2013, pages 193–206, 2013.
[LMW00] Andrew W. Lo, Harry Mamaysky, and Jiang Wang. Foundations of
technical analysis: Computational algorithms, statistical inference, and
empirical implementation. The Journal of Finance, 55(4):1705–1765,
2000.
[LWDW03] Tessa A. Lau, Steven A. Wolfman, Pedro Domingos, and Daniel S. Weld.
Programming by demonstration using version space algebra. Machine
Learning, 53(1-2):111–156, 2003.
[MEMlT+10] A. Morales-Esteban, F. Martínez-Álvarez, A. Troncoso, J.L. Justo, and
C. Rubio-Escudero. Pattern recognition to forecast seismic time series.
Expert Systems with Applications, 37(12):8333 – 8342, 2010.
[MF90] S. Muggleton and C. Feng. Efficient induction of logic programs. In
First Conference on Algorithmic Learning Theory, pages 368–381, 1990.
[MTG+13] Aditya Krishna Menon, Omer Tamuz, Sumit Gulwani, Butler W. Lampson,
and Adam Kalai. A machine learning framework for programming
by example. In Proceedings of the 30th International Conference on
Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013,
pages 187–195, 2013.
[MW71] Zohar Manna and Richard J. Waldinger. Toward automatic program
synthesis. Commun. ACM, 14(3):151–165, March 1971.
[MW75] Zohar Manna and Richard Waldinger. Knowledge and reasoning in
program synthesis. Artificial Intelligence, 6(2):175 – 208, 1975.
[MW79] Z. Manna and R. Waldinger. Synthesis: Dreams => programs. IEEE
Trans. Softw. Eng., 5(4):294–328, July 1979.
[MW80] Zohar Manna and Richard Waldinger. A deductive approach to program
synthesis. ACM Trans. Program. Lang. Syst., 2(1):90–121, January 1980.
[ND00] Hung Q Ngo and Ding-Zhu Du. A survey on combinatorial group testing
algorithms with applications to DNA library screening. DIMACS Series
in Discrete Mathematics and Theoretical Computer Science, 2000.
[Pel02] Andrzej Pelc. Searching games with errors—fifty years of coping with
liars. Theor. Comput. Sci., 270(1-2):71–109, January 2002.
[PG15] Oleksandr Polozov and Sumit Gulwani. Flashmeta: A framework for
inductive program synthesis. In Proceedings of the 2015 ACM SIGPLAN
International Conference on Object-Oriented Programming, Systems,
Languages, and Applications, OOPSLA 2015, part of SPLASH 2015,
Pittsburgh, PA, USA, October 25-30, 2015, pages 107–126, 2015.
[PJS+14] Phitchaya Mangpo Phothilimthana, Tikhon Jelvis, Rohin Shah, Nishant
Totla, Sarah Chasins, and Rastislav Bodik. Chlorophyll: Synthesis-
aided compiler for low-power spatial architectures. In ACM SIGPLAN
Conference on Programming Language Design and Implementation,
PLDI ’14, Edinburgh, United Kingdom - June 09 - 11, 2014, pages
396–407, 2014.
[Plo70] G. D. Plotkin. A note on inductive generalization. Machine Intelligence,
5, 1970.
[RBVK16] Veselin Raychev, Pavol Bielik, Martin Vechev, and Andreas Krause.
Learning programs from noisy data. In Proceedings of the 43rd Annual
ACM SIGPLAN-SIGACT Symposium on Principles of Programming Lan-
guages, POPL 2016, St. Petersburg, FL, USA, January 20 - 22, 2016,
pages 761–774, 2016.
[RGMF14] Mohammad Raza, Sumit Gulwani, and Natasa Milic-Frayling. Program-
ming by example using least general generalizations. In Proceedings of
the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27
-31, 2014, Quebec City, Quebec, Canada., pages 283–290, 2014.
[Riv05] Xavier Rival. Understanding the origin of alarms in Astrée. In Static
Analysis, 12th International Symposium, SAS 2005, London, UK, September
7-9, 2005, Proceedings, pages 303–319, 2005.
[Sak97] Yasubumi Sakakibara. Recent advances of grammatical inference. Theo-
retical Computer Science, 185(1):15 – 45, 1997.
[SG12] Rishabh Singh and Sumit Gulwani. Learning semantic string transforma-
tions from examples. PVLDB, 5(8):740–751, 2012.
[SG16] Rishabh Singh and Sumit Gulwani. Transforming spreadsheet data types
using examples. In Proceedings of the 43rd Annual ACM SIGPLAN-
SIGACT Symposium on Principles of Programming Languages, POPL
2016, St. Petersburg, FL, USA, January 20 - 22, 2016, pages 343–356,
2016.
[SL08] Armando Solar-Lezama. Program synthesis by sketching. ProQuest,
2008.
[SLJB08] Armando Solar-Lezama, Christopher Grant Jones, and Rastislav Bodik.
Sketching concurrent data structures. In Proceedings of the ACM
SIGPLAN 2008 Conference on Programming Language Design and
Implementation, Tucson, AZ, USA, June 7-13, 2008, pages 136–148, 2008.
[Smi75] David Canfield Smith. Pygmalion: A Creative Programming Environ-
ment. PhD thesis, Stanford, CA, USA, 1975. AAI7525608.
[SSA13] Eric Schkufza, Rahul Sharma, and Alex Aiken. Stochastic superoptimization.
In Architectural Support for Programming Languages and
Operating Systems, ASPLOS ’13, Houston, TX, USA - March 16 - 20,
2013, pages 305–316, 2013.
[SSG75] David E. Shaw, William R. Swartout, and C. Cordell Green. Inferring
lisp programs from examples. In Proceedings of the 4th International
Joint Conference on Artificial Intelligence - Volume 1, IJCAI’75, pages
260–267, 1975.
[SSL11] Rishabh Singh and Armando Solar-Lezama. Synthesizing data structure
manipulations from storyboards. In SIGSOFT/FSE'11 19th ACM
SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19)
and ESEC’11: 13th European Software Engineering Conference (ESEC-
13), Szeged, Hungary, September 5-9, 2011, pages 289–299, 2011.
[Sum77] Phillip D. Summers. A methodology for lisp program construction from
examples. J. ACM, 24(1):161–175, January 1977.
[TLHL11] Stavros Tripakis, Ben Lickly, Thomas A. Henzinger, and Edward A. Lee.
A theory of synchronous relational interfaces. ACM Trans. Program.
Lang. Syst., 33(4), 2011.
[URD+13] Abhishek Udupa, Arun Raghavan, Jyotirmoy V. Deshmukh, Sela Mador-
Haim, Milo M.K. Martin, and Rajeev Alur. Transit: Specifying protocols
with concolic snippets. In ACM SIGPLAN Conference on Programming
Language Design and Implementation, PLDI ’13, Seattle, WA, USA, June
16-19, 2013, pages 287–296, 2013.
[Val84] L. G. Valiant. A theory of the learnable. Commun. ACM, Nov. 1984.
[VHL+12] Margus Veanes, Pieter Hooimeijer, Benjamin Livshits, David Molnar, and
Nikolaj Bjørner. Symbolic finite state transducers: Algorithms and
applications. In Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium
on Principles of Programming Languages, POPL 2012, Philadelphia,
Pennsylvania, USA, January 22-28, 2012, pages 137–150, 2012.
[VTR+14] Mandana Vaziri, Olivier Tardieu, Rodric Rabbah, Philippe Suter, and
Martin Hirzel. Stream processing with a spreadsheet. In ECOOP 2014
- Object-Oriented Programming - 28th European Conference, Uppsala,
Sweden, July 28 - August 1, 2014. Proceedings, pages 360–384. 2014.
[Was16] Kunihiro Wasa. Enumeration of enumeration algorithms. CoRR,
abs/1605.05102, 2016.
[WDR06] Eugene Wu, Yanlei Diao, and Shariq Rizvi. High-performance complex
event processing over streams. In Proceedings of the ACM SIGMOD
International Conference on Management of Data, Chicago, Illinois,
USA, June 27-29, 2006, pages 407–418, 2006.
[WGS16] Xinyu Wang, Sumit Gulwani, and Rishabh Singh. FIDEX: filtering
spreadsheet data using examples. In Proceedings of the 2016 ACM
SIGPLAN International Conference on Object-Oriented Programming,
Systems, Languages, and Applications, OOPSLA 2016, part of SPLASH
2016, Amsterdam, The Netherlands, October 30 - November 4, 2016,
pages 195–213, 2016.
[WL69] Richard J. Waldinger and Richard C. T. Lee. Prow: A step toward
automatic program writing. In Proceedings of the 1st International Joint
Conference on Artificial Intelligence, IJCAI’69, pages 241–252, San
Francisco, CA, USA, 1969. Morgan Kaufmann Publishers Inc.
[YF] Yahoo!-Finance. finance.yahoo.com.
[YTM+13] Kuat Yessenov, Shubham Tulsiani, Aditya Krishna Menon, Robert C.
Miller, Sumit Gulwani, Butler W. Lampson, and Adam Kalai. A colorful
approach to text processing by example. In The 26th Annual ACM
Symposium on User Interface Software and Technology, UIST’13, St.
Andrews, United Kingdom, October 8-11, 2013, pages 495–504, 2013.
[ZS13] Sai Zhang and Yuyin Sun. Automatically synthesizing SQL queries
from input-output examples. In 2013 28th IEEE/ACM International
Conference on Automated Software Engineering, ASE 2013, Silicon
Valley, CA, USA, November 11-15, 2013, pages 224–234, 2013.
Abstract

Most computer users do not know how to program, and thus their use of computers is limited to the supply of existing programs. The abundance of programs intended to help with the same problem domain demonstrates that existing programs are either too complicated for users or do not answer their needs satisfactorily. Program synthesis techniques, and in particular programming by example, have flourished in recent years with the goal of preventing exactly these problems and allowing users to write programs of their own by describing their intent with examples, without writing or inspecting a single code snippet. The goal of program synthesis is to generate a program (in a low-level language) from a specification (in a high-level language). The specification is usually descriptive and does not explain how it should be implemented. Synthesizers therefore cannot syntactically translate specifications into executable code, as compilers do. The first synthesizers addressed this challenge through deduction and transformation techniques [MW71]. The advantage of such synthesizers is that, by their mode of operation, the generated programs are guaranteed to be correct, that is, to implement the specification. Their drawback is that their operation depends on inconclusive rules, and thus the process is not guaranteed to terminate. Although some approaches introduced heuristics to improve the choice of rules [JNR02], modern synthesizers have moved to constraint-solving techniques [SLJB08] or to enumeration techniques over the program space [SSA13]. In these approaches, the specification is a set of constraints, and every solution (program) that satisfies these constraints is considered a valid solution. That is, the assumption of these techniques is that the specification is complete: every solution that satisfies the specification is correct on every possible input of the program, even if the specification did not describe the desired behavior on it explicitly.

In parallel, the approach of programming by example (PBE) has gained popularity [Gul10, LWDW03, DSPGMW10, HG11, Gul11, GHS12, SG12, YTM+13, AGK13, ZS13, MTG+13, LG14, FCD15, BGHZ15, PG15, SG16, RBVK16]. In PBE, the specification is a set of input-output examples. Compared to other program synthesis techniques, in which the specification is a logical formula or an inefficient implementation of a program, PBE requires no prior knowledge of how to express the specification mathematically. Thus, whereas in the previous techniques the users had to be experts or programmers, in PBE the user can be anyone. That is, the potential impact of PBE is far greater. The assumption of this approach is that users can express their intent with a small number of examples. Unfortunately, this is not necessarily true, and examples, by their very definition, form a partial description of the user's intent. PBE algorithms can therefore only guarantee that they generate a program consistent with the examples the user provided; they cannot guarantee to capture the user's intent on inputs they have not observed. A user who wants to guarantee correctness on all possible inputs must inspect the program manually, a difficult and error-prone task.

In this thesis, we present algorithms that guarantee to learn the user's intent on all possible inputs while still allowing the user to communicate through examples. To this end, we formulate the problem of learning the user's intent from examples as a problem in the field of exact learning (hence the thesis title, Exact Programming by Example). Exact learning is a field of computational learning theory that is usually associated with one of the following three areas: learning in the limit [Gol67], PAC learning [Val84], and learning from queries [Ang88]. In this thesis, we follow the last model. In this model, there is a teacher and a student. The teacher knows a concept, and the student's goal is to learn that concept. To this end, the student may pose one of two kinds of queries: membership queries and equivalence queries. The student's secondary goal is to ask as few queries as possible. In our setting, the teacher is the user, the student is the synthesizer, and the concept is a formula expressing the user's intent on all possible inputs. A membership query asks whether an input-output pair matches the user's intent. An equivalence query (or validation query) asks whether a given formula describes the user's intent. If the teacher answers an equivalence query with 'yes', the learning is complete; otherwise, the teacher provides a counterexample. Although equivalence queries are infeasible in the PBE setting, there are synthesis works that do allow such queries [ABJ+13, IGIS10, SL08]. In these works, the teacher is realized by a verifier equipped with a formal specification (rather than by a user). The formal specification provides an efficient way to answer equivalence queries automatically.

Our formulation of PBE as an exact learning problem is novel. Except for a single work [JGST10], which poses membership queries until the space of consistent programs contains a single program, none of the PBE synthesizers guarantees to learn the user's intent beyond the examples it was given. While the approach of that single work guarantees to learn a program expressing the user's intent, it has no nontrivial bound on the number of membership queries it poses. In contrast, works in exact learning must demonstrate the efficiency of the algorithm by analyzing its membership query complexity and comparing it to a lower bound. The contributions of this thesis are:

1. An exact PBE synthesizer for time-series patterns: We begin by presenting an exact PBE synthesizer that learns patterns in charts displaying data values as a function of time (e.g., stock prices). We represent patterns as conjunctive formulas over inequalities and present a learning algorithm for this class that relies on membership queries only (which are translated into visual charts). We assume that the learning begins from a single positive example (i.e., a chart). We show how to extend this algorithm with a synthesizer that, given a formula describing a pattern, generates a program that detects this pattern in stock prices. We evaluated the algorithm empirically and showed that it learns a variety of popular patterns and detects them with 95% accuracy.

2. Exact learning algorithms for the classes of conjunctions and disjunctions over a set of predicates: The predicates may be dependent, and thus syntactically different formulas may be logically equivalent. The challenge is therefore to identify the inequivalent formulas in order to avoid posing redundant membership queries. Since checking whether formulas are logically equivalent is expensive, it is important to limit such checks. We present an algorithm that searches the space of inequivalent formulas lazily: it computes elements of the space only when this is required for the learning.

3. Exact learning algorithms for the class of disjunctive normal form (DNF) formulas over an arbitrary set of predicates: We begin with a general algorithm and then focus on two subclasses: (1) a class in which the predicates are closed under negation, and (2) a class in which the predicates are "anti-closed" under negation. For each subclass, we present an algorithm whose membership query complexity is better than that of the general algorithm. In particular, we present an optimal algorithm for the first subclass.

4. Synthesis from abstract examples: We present an exact algorithm that follows a different approach from the previous algorithms. Previous approaches separated the task of learning the user's intent from the task of generating the program (and studied mainly the first task). PBE experts often believe that the program space should drive the search for the program to generate. Therefore, as a last contribution (which we believe opens a new direction for future work), we present a different approach to guaranteeing exactness while searching the program space. The key idea is to communicate with the user through abstract examples: the synthesizer picks a program that seems suitable and describes it to the user with a small number of abstract examples. The abstract examples serve as an intuitive specification of the programs that the synthesizer considers during its search. Through the abstract examples, the user is guaranteed that the final program the synthesizer generates captures their intent on all possible inputs.
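The teacher-student query protocol described in the abstract can be illustrated with a minimal sketch. The code below shows exact learning of a monotone conjunction over boolean variables using membership queries only, starting from a single positive example, in the spirit of the setting the abstract describes; the `Teacher` class and all names here are illustrative assumptions, not an API from the thesis.

```python
# Illustrative sketch (not the thesis's actual algorithm or API): exact
# learning of a monotone conjunction over boolean variables, using one
# membership query per variable, starting from a single positive example.

class Teacher:
    """Knows a hidden conjunction, given as a set of variable indices."""
    def __init__(self, target_vars):
        self.target = frozenset(target_vars)

    def membership_query(self, assignment):
        """Does this assignment (tuple of bools) satisfy the hidden conjunction?"""
        return all(assignment[i] for i in self.target)


def learn_conjunction(teacher, positive_example):
    """Recover the hidden conjunction exactly, with membership queries only.

    For each variable that is True in the positive example, flip it to False
    and query: if the flipped example is still positive, the variable is
    irrelevant; if it becomes negative, the variable is in the conjunction.
    """
    learned = set()
    for i, value in enumerate(positive_example):
        if not value:
            continue  # a False variable cannot appear in a satisfied conjunction
        flipped = tuple(v if j != i else False
                        for j, v in enumerate(positive_example))
        if not teacher.membership_query(flipped):
            learned.add(i)
    return learned
```

For a hidden conjunction x0 AND x2 over three variables, `learn_conjunction(Teacher({0, 2}), (True, True, True))` recovers `{0, 2}` with exactly three membership queries, one per variable set in the positive example.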
This research was carried out under the supervision of Prof. Eran Yahav, in the Faculty of Computer Science.

Some results in this thesis have been published as articles by the author and research collaborators in conferences and journals during the course of the author's doctoral research period, the most up-to-date versions of which being:

Nader Bshouty, Dana Drachsler-Cohen, Martin T. Vechev, and Eran Yahav. Learning disjunctions of predicates. In Proceedings of the 30th Conference on Learning Theory, COLT 2017, 2017.
Dana Drachsler-Cohen, Sharon Shoham, and Eran Yahav. Synthesis with abstract examples. In Computer Aided Verification - 29th International Conference, CAV 2017, 2017.
Dana Drachsler-Cohen, Martin T. Vechev, and Eran Yahav. Optimal learning of specifications from examples (in preparation). CoRR, abs/1608.00089, 2016.

Acknowledgements

First and foremost, I would like to thank Prof. Eran Yahav, whom I have been very fortunate to have as my advisor. Thank you for the contagious enthusiasm that kept me optimistic throughout my studies. Thank you for many enriching discussions, especially during the long nights before deadlines. Thank you for teaching me how to write in a simple and elegant way, how to find and explain the essence of any complex idea, and how to shorten my sentences (though we may need to keep working on that...). Thank you for teaching me to always pursue the most interesting research questions and to overcome every challenge along the way. Above all, thank you for your endless faith in me. For that, and for much more, I will always be grateful.

I also thank the collaborators who contributed greatly to this thesis. To Prof. Martin Vechev, thank you for the long hours and for the discussions and advice, both near-term and long-term. To Prof. Nader Bshouty, thank you for the extensive help with the theoretical part of this thesis; I have learned a great deal from you. Finally, to Prof. Sharon Shoham, thank you for the long hours, and for teaching me to always look for ways to simplify ideas, algorithms, and proofs, and to identify subtle points and resolve them elegantly.

Finally, I thank my parents, Ilana and Gabriel, my sister, Dorin, and my dear husband, Gal. Thank you for your support during the busy periods, for always putting things in perspective, and above all for your love and faith in me. This thesis is dedicated to you.

I thank the Technion for the generous financial support during my studies.
Exact Programming by Example

Research Thesis

Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy

Dana Drachsler Cohen

Submitted to the Senate
of the Technion — Israel Institute of Technology
Sivan 5777 Haifa June 2017
Exact Programming by Example

Dana Drachsler Cohen