1 a survey on theorem provers in formal methods

26
1 A Survey on Theorem Provers in Formal Methods M. Saqib Nawaz, Moin Malik, Yi Li, Meng Sun and M. Ikram Ullah Lali Abstract—Mechanical reasoning is a key area of research that lies at the crossroads of mathematical logic and artificial intelligence. The main aim to develop mechanical reasoning systems (also known as theorem provers) was to enable mathematicians to prove theorems by computer programs. However, these tools evolved with time and now play vital role in the modeling and reasoning about complex and large-scale systems, especially safety-critical systems. Technically, mathematical formalisms and automated reasoning based-approaches are employed to perform inferences and to generate proofs in theorem provers. In literature, there is a shortage of comprehensive documents that can provide proper guidance about the preferences of theorem provers with respect to their designs, performances, logical frameworks, strengths, differences and their application areas. In this work, more than 40 theorem provers are studied in detail and compared to present a comprehensive analysis and evaluation of these tools. Theorem provers are investigated based on various parameters, which includes: implementation architecture, logic and calculus used, library support, level of automation, programming paradigm, programming language, differences and application areas. Index Terms—Mathematical logics, Reasoning, Interactive theorem provers, Automated theorem provers, Proof automation, Survey. 1 I NTRODUCTION Recent developments and evolution in Information and Com- munication Technology (ICT) have made computing systems more and more complex. The criteria that how much we can rely on these systems is based on their correctness. Bugs or any loopholes in the system lead to severe risks that endanger human safety or financial loss. In recent times, bugs ratio has increased due to the complex designs of the modern systems under market pressure and user requirements. The efforts and cost required to correct bugs increases as the gap widens between their introduction and detection. Table 1, taken from [272], shows the relative costs to fix bugs that are introduced in the requirements phase. A bug introduced in a particular stage of a system development is relatively cheap to fix if also detected in that stage. It becomes more hard and expensive to fix a bug that is introduced in one stage and detected in the other stage. For safety-critical systems, the impact of bugs can be so large as to make a fix effectively mandatory. TABLE 1: Cost to fix a bug introduced in requirements phase Bug Found at Stage Relative Cost to Fix Requirements 1 (definition) Architecture 3 Design 5-10 System Test 10 Post-Release 10-100 Testing and verification techniques are used in the system test phase to empirically check their correctness. In testing, a system is tested against the software/hardware requirements [201]. Similarly, simulation provides virtual environments for any real M. Saqib Nawaz is with School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. E- mail:[email protected] Moin Malik is with Department of Computer Science & IT, University of Sargodha, Pakistan. E-mail:[email protected] Yi Li and Meng Sun are with School of Mathematical Sciences, Peking University, Beijing, China. E-mail:{liyi math, sunm}@pku.edu.cn M. Ikram Ullah Lali is with Department of Computer Science, Faculty of Computing and IT, University of Gujrat, Pakistan. E- mail:[email protected] events. However, there are some inherent limitations of these techniques. A program can be only tested against the functional requirements of the system, which may be not refined and may contain ambiguities that leads to inadequate testing. Exhaustive testing of systems is not possible. Moreover, the time and budget constraints may affect the testing process. Simulations are also based on assumptions and does not always cover all the aspects of the system [77]. Furthermore, both testing and simulation can not be used efficiently for analyzing the continuous or hybrid systems. According to [82]: “Program testing is an effective way to find errors, but it does not guarantee the absence of errors”. On the other hand, formal methods formally verify the system correctness. Formal methods are “mathematical-based techniques that are used in the modeling, analysis and verification of both the software and hardware systems” [275]. Formal methods allow the early introduction of models in the development life-cycle, against which the system can be proved by using appropriate mathematics. As mathematics is involved in the modeling and analysis, 100% accuracy is guaranteed [123]. But why we need formal methods in place of other well-known, widely acceptable and easy to use techniques such as testing and simulation? To answer this question, we first provide a few examples where testing and simulation failed. Air France Flight 447 crashed in June 2009, which resulted into hundred of casualties. During investigation, it was found that the probe sensors were unable to measure the accurate speed of the plane, which provides the automatic disengagement of autopilot. Similarly, in August 2005, Malaysian Airbus 124 landed unexpectedly 18 minutes after taking off due to a fault in its air data inertial reference unit. There are two accelerators that control the airspeed of the flight, but one of them failed, which resulted into a sudden rapid climbing and passed almost 4000 feet higher than expected without any warning. After investigation, it came to know that on the failure of the first accelerator, the second one used the falsy data of former due to input anomaly in the software. The probe, which laid hidden for a decade, was not arXiv:1912.03028v1 [cs.SE] 6 Dec 2019

Upload: others

Post on 18-Dec-2021

30 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 A Survey on Theorem Provers in Formal Methods

1

A Survey on Theorem Provers in FormalMethods

M. Saqib Nawaz, Moin Malik, Yi Li, Meng Sun and M. Ikram Ullah Lali

Abstract—Mechanical reasoning is a key area of research that lies at the crossroads of mathematical logic and artificial intelligence.The main aim to develop mechanical reasoning systems (also known as theorem provers) was to enable mathematicians to provetheorems by computer programs. However, these tools evolved with time and now play vital role in the modeling and reasoning aboutcomplex and large-scale systems, especially safety-critical systems. Technically, mathematical formalisms and automated reasoningbased-approaches are employed to perform inferences and to generate proofs in theorem provers. In literature, there is a shortage ofcomprehensive documents that can provide proper guidance about the preferences of theorem provers with respect to their designs,performances, logical frameworks, strengths, differences and their application areas. In this work, more than 40 theorem provers arestudied in detail and compared to present a comprehensive analysis and evaluation of these tools. Theorem provers are investigatedbased on various parameters, which includes: implementation architecture, logic and calculus used, library support, level ofautomation, programming paradigm, programming language, differences and application areas.

Index Terms—Mathematical logics, Reasoning, Interactive theorem provers, Automated theorem provers, Proof automation, Survey.

F

1 INTRODUCTION

Recent developments and evolution in Information and Com-munication Technology (ICT) have made computing systems moreand more complex. The criteria that how much we can rely onthese systems is based on their correctness. Bugs or any loopholesin the system lead to severe risks that endanger human safety orfinancial loss. In recent times, bugs ratio has increased due to thecomplex designs of the modern systems under market pressureand user requirements. The efforts and cost required to correctbugs increases as the gap widens between their introduction anddetection. Table 1, taken from [272], shows the relative coststo fix bugs that are introduced in the requirements phase. Abug introduced in a particular stage of a system development isrelatively cheap to fix if also detected in that stage. It becomesmore hard and expensive to fix a bug that is introduced in onestage and detected in the other stage. For safety-critical systems,the impact of bugs can be so large as to make a fix effectivelymandatory.

TABLE 1: Cost to fix a bug introduced in requirements phaseBug Found at Stage Relative Cost to Fix

Requirements 1 (definition)Architecture 3

Design 5-10System Test 10Post-Release 10-100

Testing and verification techniques are used in the systemtest phase to empirically check their correctness. In testing, asystem is tested against the software/hardware requirements [201].Similarly, simulation provides virtual environments for any real

• M. Saqib Nawaz is with School of Computer Science andTechnology, Harbin Institute of Technology, Shenzhen, China. E-mail:[email protected]

• Moin Malik is with Department of Computer Science & IT, University ofSargodha, Pakistan. E-mail:[email protected]

• Yi Li and Meng Sun are with School of Mathematical Sciences, PekingUniversity, Beijing, China. E-mail:{liyi math, sunm}@pku.edu.cn

• M. Ikram Ullah Lali is with Department of Computer Science,Faculty of Computing and IT, University of Gujrat, Pakistan. E-mail:[email protected]

events. However, there are some inherent limitations of thesetechniques. A program can be only tested against the functionalrequirements of the system, which may be not refined and maycontain ambiguities that leads to inadequate testing. Exhaustivetesting of systems is not possible. Moreover, the time and budgetconstraints may affect the testing process. Simulations are alsobased on assumptions and does not always cover all the aspectsof the system [77]. Furthermore, both testing and simulation cannot be used efficiently for analyzing the continuous or hybridsystems. According to [82]: “Program testing is an effective wayto find errors, but it does not guarantee the absence of errors”.On the other hand, formal methods formally verify the systemcorrectness. Formal methods are “mathematical-based techniquesthat are used in the modeling, analysis and verification of boththe software and hardware systems” [275]. Formal methods allowthe early introduction of models in the development life-cycle,against which the system can be proved by using appropriatemathematics. As mathematics is involved in the modeling andanalysis, 100% accuracy is guaranteed [123]. But why we needformal methods in place of other well-known, widely acceptableand easy to use techniques such as testing and simulation? Toanswer this question, we first provide a few examples wheretesting and simulation failed.

Air France Flight 447 crashed in June 2009, which resultedinto hundred of casualties. During investigation, it was found thatthe probe sensors were unable to measure the accurate speedof the plane, which provides the automatic disengagement ofautopilot. Similarly, in August 2005, Malaysian Airbus 124 landedunexpectedly 18 minutes after taking off due to a fault in its airdata inertial reference unit. There are two accelerators that controlthe airspeed of the flight, but one of them failed, which resultedinto a sudden rapid climbing and passed almost 4000 feet higherthan expected without any warning. After investigation, it cameto know that on the failure of the first accelerator, the secondone used the falsy data of former due to input anomaly in thesoftware. The probe, which laid hidden for a decade, was not

arX

iv:1

912.

0302

8v1

[cs

.SE

] 6

Dec

201

9

Page 2: 1 A Survey on Theorem Provers in Formal Methods

2

found in testing because the designer had never assumed thatsuch an event might occur [58]. In June 2009, Metro Train inWashington crashed and as a result, the operator of the train and80 other people got injured severely. The cause of this incidentwas the design anomaly in the safety signal system. The safetysystem sent a green signal to the upcoming station, while thetrack was not empty. Similarly, other examples are failure of theLondon Ambulance Service’s computer added dispatch system[217], Therac 25 [184], Anaesthetic equipment and the respirationmonitoring device [189] which resulted into the casualties andfinancial losses. All listed accidents could have been avoided ifthe design of the systems were analyzed mathematically.

Formal methods techniques in contrast to testing permit theexhaustive investigation and reveals those bugs which are missedby testing methods. Actual requirements of the system in suchtechniques are translated into formal specifications which aremathematically verified and elaborate the actual behavior of thesystem in real scenarios. Two most popular formal verificationmethods are model checking and theorem proving. Inmodel checking, a finite model of the system is developed first,whose state space is then explored by the model checker toexamine whether a desired property is satisfied in the model ornot [28]. Model checking is automatic, fast, effective and it canbe used to check the partial specification of the system. However,model checkers still face the state-space explosion problem [62].Model state space grows infinitely large with increase in the totalnumber of variables and components used and the number ofdistinct values assigned to these variables [204].

Theorem proving on the other hand, can be used to handleinfinite systems. In theorem proving, systems are defined andspecified by users in an appropriate mathematical logic. Im-portant/critical properties of the system are verified by theoremprovers. Theorem prover checks that whether a statement (goal)can be derived from a logical set of statements (axiom/hypothesis).It can model and verify any system that can be defined withthe help of mathematical logics. It is akin to Computer AlgebraSystem (CAS) because both are used for symbolic computation.However, theorem provers have some advantages over CAS suchas: flexibility in logic expressiveness, clear expression and morerigor. Theorem provers can be further categorized into two maintypes: Automated Theorem Provers (ATPs) and Interactive Theo-rem Provers (ITPs). ATPs deal with the development of automatedcomputer programs to prove the goals [254]. In contrast, ITPsinvolve human interaction with computer in the process of proofsearching and development. That is why ITPs are also known asproof-assistants. Due to practical limitations in pure automation,interactive proving is the suitable way for the formalization of“most non trivial theorems in mathematics or computer systemcorrectness” [121]. Theorem provers have been used successfullyin various domains such as biomedical [232], game theory [159],machine learning [147], economy [160], computer science [97],artificial intelligence [268] and self-adaptive systems [271]. Notethat the terms theorem provers and mechanical reasoning systemsare used interchangeably in this paper.

We believe that a comprehensive review on mechanical rea-soning systems is strongly needed. People having little knowledgeabout them generally think that all the systems based on math-ematics have similar nature. However, it is not the case. Eachsystem has different functionality and it is not an easy task toselect which system should be used for the formalization efforts.Theorem provers are diverse in nature and the main aim of this

work is to demonstrate how different they are. Moreover, the goalis to provide a proper guidance to new researchers in formalverification. In order to substantiate our work, a questionnairehas been designed for the evaluation of theorem provers. Thequestionnaire is then filled by the developers and active researchersof theorem provers. Mechanical reasoning systems are investigatedfor the following parameters:

• Mathematical logic used in the system,• Implementation language of the system,• System type,• Platform-support in the system,• System category, whether it belongs to ATP or ITP,• Truth value of the system (binary/fuzzy),• Calculus (deductive/inductive) of the system,• Set (ZF/fuzzy) theoretic support in the system,• Programming paradigm of the system,• User Interface (UI) of the system,• Scalability of the system,• Distributed/multi-threaded,• Integrated development environment (IDE) support,• Library support in the the system,• Whether the system satisfy the de Bruijn criterion, and• Whether the system satisfy the Poincare principle of au-

tomation.

Primarily, this survey is a collection of tables and figuresthat illustrates various aspects of theorem provers. We receivedreplies from experts/developers of 16 theorem provers. Another27 theorem provers characteristics are investigated through onlinedatabases and research articles. We also report the top scientistswho have proved maximum number of theorems in provers andtop provers in which most number of mathematical theoremsare proved till now. ATP that performed best at CADE ATPsystem competition (CASC) are also discussed. CASC is a yearlycompetition for the first-order logic fully ATPs. Moreover, afore-mentioned parameters are used to compare the provers and topresent their main characteristics and differences. Finally, theirapplications are discussed along-with the existing recent work andthe potential application/problems, where they can be used.

The rest of the survey is structured as follows: An overviewon the historical background of theorem provers in the light oflogical frameworks is provided in Section 2. Related work is alsodiscussed. Section 3 elaborates the research methodology that isbased on the systematic literature review in software engineering.Section 4 presents the results of the questionnaire, where answersof the experts and developers are presented. In Section 5, topscientist that proved most of the theorems and top provers in whichmaximum number of theorems are proved till now are listed.ATP that performed best at CASC competition are also listed.Finally, theorem provers are compared for aforementioned criteria.The strengths, in-depth analysis of the differences among theexisting theorem provers and their application areas are discussedin Section 6, along-with the future research directions. The surveyconcludes with some remarks in Section 7.

2 BACKGROUND

In this section, historical background of the logical frameworksthat are used in mechanical reasoning systems is presented. Fur-thermore, related work on the surveys and previous comparisonsof theorem provers is discussed.

Page 3: 1 A Survey on Theorem Provers in Formal Methods

3

2.1 Mathematical Logic

Treating mathematics as a formal theory where all the mathemat-ical statements are proved with a set of few basic axioms andinference rules is a long-standing goal. However, formal proofs oftheorems require a lot of steps, effort and time. Many ancientGreek logicians, mathematicians and philosophers successfullyexpressed reasoning through syllogism [124]. Syllogism dealswith formalization of deductive reasoning on logical arguments toarrive at a conclusion based on two or more propositions. Leibnizworked on the ways to reduce human reasoning calculationsin symbolic logic and embodied these deductive reasoning intomathematics, making it easier to be implemented in computerprograms. Mathematical reasoning provide objectivity and cre-ativity that is hard to be found in any other fields. In mechanicalreasoning, two paths were presented. One was to analyze humanproof creation process and implement it using computationalresources. The other was to utilize the work of logicians andtransform the logical reasoning into a standard form on whichalgorithms are based [97]. In the mid of 1950s, the relationshipof the computer to mathematics has emerged in the form ofautomated reasoning, especially the automation of mathematicalproofs [188]. More discussion on formal deduction, LCF (Logicof Computable Functions) and modern type theory role in thedevelopment of theorem provers can be found at [31], [191].

Emergence of automated reasoning initiated the earliest workon computer-assisted proofs, when the first general purpose com-puters became available [121]. The field of computer supportedtheorem proving gets attention in the second half of 20th century.In 1970s, theorem provers were investigated for verification ofcomputers systems. Extensive research in this area was done in late1980s when these tools were used in the verification of hardwaresystems. In mid 1990s, a bug in Intel’s Pentium II processor causedby floating point division increased the interest in formal methodsand formal hardware verification tools were used by industry intheir system design in late 1990s [176]. In 1994, Boyer proposedthe QED manifesto [47] for a “computer-based database of allmathematical knowledge” (formalized mathematics). During theyears, the QED manifesto is adopted by many theorem provers.In recent years, the mechanical mathematical proofs for systemverification has gained popularity [265]. Future of formal methodslooks promising and practical. Big companies such as Google,Facebook, IBM, Intel and Microsoft are now using and conductingresearch in formal methods.

Theorem provers are fundamentally based on mathematicallogics. Among the most popular theorem provers, we take threewidely used ones: Propositional Logic, First-Order Logic (FOL)and Higher-Order Logic (HOL). Each logic is further discussednext.

2.1.1 Propositional Logic

Propositional logic is used to represent atomic proposi-tions or declarative sentences with the help of mathematicalBoolean operators such as and, or, not, implicationand equivalence. It is also known as axiomatization ofBoolean logic. Logical operators such as conjunction (∧),disjunction (∨), not (¬), implication (=⇒) andbi-implication (⇐⇒) are used to bind propositions tomake sentences. Truth values (True or False) are assigned tothese propositions for evaluation of a sentence. Axiomatizationof Boolean algebra is also performed through propositional logic.

Strong argument about propositional logic is that it is decidablewith the help of truth tables.

Newell developed the first theorem proving program LogicTheorist in 1956 [245]. This program proved propositional logictheorems using axioms/inference rules. It not only worked fornumeric expressions but also for symbolic formulas and proofsearching is guided through heuristics. It proved 38 theoremsout of 52 and proofs of the theorems were more elegant. An-other contribution in mechanical reasoning system was Davisand Putnam’s semantics-based procedure reference [78]. It wasa decision method for checking whether a formula in Conjunc-tive Normal Form (CNF) is satisfiable or not. Such problemsnowadays are called SAT (satisfiability) problems. They used“ground resolution” for proving mathematical formulas in theform of predicate logic. Ground resolution used two propositionalclauses and generate another propositional clause. For testingBoolean formulas, Davis and Putnam’s procedure implementeda series of ground resolution for proving satisfiability of theseformulas [95]. Satisfiability of Boolean formulas was also testedwith Davis-Putnam-Longemann-Loveland (DPLL) method [79].This was a searching algorithm based on backtracking mechanism.DPLL was used in checking the satisfiability of propositional logicformulas that are in CNF. This is an improved version of Davisand Putnam’s procedure. This method used backtracking searchinstead of ground resolution. DPLL as compared to Davis andPutnam’s method was much faster. It has a semantics searchingprocess which helps in truth assignment. NP complete problem[93] was the new notion, which was introduced several years laterafter DPLL. These problems were developed and proved by Cook[65]. This motivated the development of SAT for solving hardproblems, which has been improved over the last few years bothon algorithmic and implementation level.

2.1.2 First-Order LogicPropositional logic has less power as it is based on propositionsand have no ability to predict any complex behavior. Furthermore,it does not support variables, functions, relations and quantifica-tions to compute complex problems. For example, “Socrates isa man” can be represented by propositional logic, but “all menare mortal” can not be represented in it because of quantificationinvolvement. First-Order Logic (FOL) is the extension of propo-sitional logic that allows quantifiers. Predicate logic is the generalcategory to which FOL belongs. A predicate is a two valued orBoolean function that maps a given domain of objects onto aBoolean set, and this function is used to show specific quality orproperty between variables. For example, we assumed that Q(n)is a predicate where n is an even integer. Domain of discoursefor this predicate is set of all integers. Therefore Q(n) dependson the value of n. Logical operators and quantifiers (universal(∀), existential (∃)) are used in predicate logic to expressproblems. It is less challenging to build automated theorem proverbased on FOL. However, satisfiability checking of FOL is semi-decidable. Skolem and Herbrand first designed a semi-decidableprocedure in 1920 [141]. Their method was based on unguidedretrieving for proof searching process and enumeration of groundterms. This method was useless in practical terms such as provingnon-trivial theorems. However, it played an important role forimplementing theorem provers based on FOL.

Prawitz was the first one who developed a general mechanicalreasoning system for FOL [39]. FOL provers such as Otter[197] and Setheo [183] are mostly based on resolution and

Page 4: 1 A Survey on Theorem Provers in Formal Methods

4

tableaux methods and they are used in solving puzzles, algebraicproblems, software retrieval and verification of protocols. Sometheorem provers even support recursive functions, dependent and(co)inductive types, e.g. ACL2 [156], Metamath [198] and E[240]. Such systems are used in mathematics (number theoryand set theory), compiler verification, hardware verification andcommercial applications. However, exponential time algorithmsare required for automatic proving in practice. Therefore, proofsin FOL are generally achieved by changing the FOL formula intoa tautology or Boolean satisfiability problem. In such way, BDD,DPLL based SAT solvers can be used to automatically checkthe formulas. Satisfiability Modulo Theories (SMT) deals withthe satisfiability of formulas against some logical theory [34]. Inrecent years, SMT solvers further extended the capabilities of SATsolvers.

2.1.3 Higher-Order LogicFOL is more expressive as compared to propositional logic but lessthan Higher-Order Logic (HOL). FOL only quantifies variables.Predicates, propositions and functions are not quantified by FOL.For example: quantifier quantifying a proposition

∃s(R(y) −→ s) (1)

Another example is quantifier quantifying a predicate.

∀S(Q(s) ∧ ¬R(s)) (2)

HOL extends FOL by supporting many types of quantifi-cation. HOL permits predicates to accept premises which arealso predicates, and allows quantification over predicates andfunctions which is not the case for FOL. Based on HOL, onecan construct a proof environment which is logically sound, dometa reasoning, interactive as well as automated and practicallyimplementable. HOL is mostly used for ITPs. Various methodsused in HOL-based theorem provers are decision procedures,inductions, tableaux, rewriting, interactions and many heuristics.They are used in formalizing mathematics and verification ofprogramming languages, distributed system, compiler, softwareand hardware systems. Some well known HOL based theoremprovers are: HOL [212], Coq [38], PVS [237] and Agda [211].

HOL is undecidable, so proving HOL properties is not fully-automatic and thus human assistance is required. It is moreexpressive and has the ability to prove complex problems andtheorems. But it is more challenging to build an ATP or ITP basedon HOL. Generally, the proof process in ITPs is as follows. Userfirst states the property or feature (in the form of a theorem) that iscalled a goal. User then applies proof commands to solve it. Proofcommands may decompose the goal into sub-goals. The proofprocess is completed once all the sub-goals are solved [185].

2.2 Related WorkOur survey on mechanical reasoning systems is certainly notthe first one. Some work is done in the past on the surveyand detailed discussion and comparison on theorem provers [23],[113], [117], [118], [121], [191], [275]. A detailed description ofproof-assistants and their theoretical foundations is given in [31]along-with the comparison for nine theorem provers. However,their comparison results take only one and a half pages. 17theorem provers are compared in [274] for three parameters: (i)library size of each prover, (ii) strength of the logic used in theprover, and (iii) level of automation in each prover. Similarly, [46]

surveyed Coq, PVS, Mizar, ProofPower-Hol, HOL4, Isabelle/HOLand HOL Light for formally analyzing the real time systems.They also investigated the extended standard libraries that playmain role in proof automations: C-CoRN/MathClasses for Coq,ACL2(r) for ACL2 and the NASA PVS library. The applicationof theorem provers in economics is discussed in [160], with thefocus on two domains: social choice theory and auction theory.

In literature, other comparisons between provers can be found.However, in most of those works, only two systems are comparedgenerally. Such works include the comparisons between HOL andPVS [112], HOL and Isabelle [9], NuPRL and Nqthm [35], Coqand HOL [139], HOL and ALF [8] and Isabelle/HOL and Coq[277]. Some works have also been done on how to adapt the proofsto different systems [108], [202], [213]. In [99], Reentrant ReadersWriters problem is first modeled in UPPAAL model checker andfound a possible deadlock scenario. They further converted theUPPAAL model and analyzed the model in PVS and checked thePVS model for arbitrary number of processes. Moreover, SPINmodel checker is used in [100] for modeling and analysis ofReentrant Reader Writers problem. Promela model is convertedto PVS specification and the correctness of the model was thenverified.

3 RESEARCH METHODOLOGY

Core part of a Systematic Literature Review (SLR) is the require-ment of research questions. Right and meaningful questions aredemanded to ask during the review process and we also need topinpoint the scope of research accomplishments. Following theguidelines of [162], we have structured the research questionswith the support of PIOC (Population, Intervention, Outcome,and Context) standard for applying the SLR process in softwareengineering field. PIOC for this work is presented in Table 2.

TABLE 2: PIOC for this workPopulation Mechanical reasoning systems

Intervention Theorem proving approachOutcome Comprehensive document, evaluation and comparisonContext Developers and experts from industry and researchers

from academia

Efforts are made to collect evidence on recent scenarios ofresearch in the development of mechanical reasoning tools. Forthis purpose, we have designed research questions which arepresented in Table 3. We forwarded our questionnaire to a numberof theorem prover developers, experts and forums. Names andanswers of the domain experts that responded are presented inSection 4.

3.1 Keywords Retrieval for Search StringsKeywords (question elements) are find out from relevant researcharticles that we studied. These keywords are used for retriev-ing more information related to the development of mechanicalreasoning tools from electronic databases. These keywords arepresented in Table 4. Alternative words for the keywords are findout by using synonyms and thesaurus. These alternative keywordsare also used for the searching process. Keywords are linkedtogether with Boolean OR and search strings are constructed bylinking the four OR lists with Boolean AND.

Online databases, journals and conferences related to mechan-ical reasoning tools are used for comparison, analysis and evalua-tion. Seven electronic sources of relevance in software engineering

Page 5: 1 A Survey on Theorem Provers in Formal Methods

5

TABLE 3: Research questions from our questionnaireResearch questions about system general information

What are the names of people who contributed into the system?When system first time (date, year) appeared in the market?

What is the latest version of the system?When was the system updated last time?

What is the address of web page for accessing it online?What are the unique features of the system?What are the success stories of the system?

Research questions about system category informationWhat is the type of the theorem prover (ATP/ITP)?

What is the type of the system w.r.t. reasoning (mathematical)?What is the type of the system w.r.t. logic (FOL or HOL)?

What is the type of the system w.r.t. truth values (binary/fuzzy)?What is the type of the system w.r.t. calculus (deductive/inductive)?Either set theoretic support (ZF/fuzzy) is available in the system?

Research questions on system programming frameworkWhat is the paradigm (functional/imperative/other) of the system?

What is the programming language (C/C++/Java/Other) of the system?What is the user interface (GUI/CLI) of the system?

What is the scalability (distributed/multi-threaded) of the system?Either library support is available in the system?

TABLE 4: Keywords from mechanical reasoning surveysArmstronget al. [19]

Formal tools, software and hardware correctness, provablycorrect design, theorem provers, Satisfiability ModuloTheory, abstract models, Event-B, digital systems, formalmethods, model correctness.

Mackenzie[188]

Mathematical proofs, interactive theorem proving, auto-matic theorem proving, mathematical logic, mathematicalreasoning, formal logic, proof automation, classical andconstructive logic, machine intelligence.

Azurat &Prasetya[25]

Theorem prover, proof checker, formal verification, pro-gramming logic, formal representation, HOL, PVS, auto-matic proof generation, Coq.

Harrison[117]

Logic and program meaning, mathematical logic, sym-bolic manipulation, algebraic interpretation, formal lan-guages, software engineering.

Harrisonet al.[121]

Formal proof, interactive theorem provers, proof goals,semi automated mathematics, Automath, Coq, NuPRL,Agda, Logic of computable functions, HOL, PVS, prooflanguage, proof automation.

Boldo etal. [46]

Proof assistant, formalization, proof libraries, interac-tive theorem provers, PVS, Coq, HOL4, Isabelle/HOL,ProofPower-HOL, HOL Light, proof automation.

Wiedijk[274]

Proof assistants, proof kernel, logical framework, decid-able types, dependable types, de Bruijn criterion, Isabelle,Theorema, HOL, Coq, Metamath, PVS, Nuprl, Otter, Alfa,Mizar, ACL2.

Maric[191]

Decision procedures, proof search, theorem provers, soft-ware correctness, interactive theorem provers survey, for-mal deduction, proof checking, logical frameworks, SATsolvers, SMT solvers, Poincare principle.

Hales[113]

Computer proofs, proof assistant, small proof kernel, logi-cal framework, proof tactics, first-order automated reason-ing, mathematical proof, theorem provers.

Avigad &Harrison[23]

Axiomatic set theory, mathematical proof, calculus ofreasoning, formalized mathematics, Formal verification,interactive theorem proving, Poincare conjecture, formalproof systems.

Barendregt&Geuvers[31]

proof checking, mathematical logic, type theory and typechecking, type systems, predicate logic, higher-order logicproof development, proof-assistants, Coq, Agda, NUPRL,HOL, Isabelle, Mizar, PVS, ACL2.

is identified in [48]. However, in last few years, many new andfamous libraries are developed especially in computer sciencefield. Therefore, it may also be necessary to consider other sources.The search strings were used on 10 digital libraries: (i) DBLP, (ii)IEEE Explore, (iii) ACM Digital Library, (iv) Springer Link, (v)Science Direct, (vi) CiteSeerX , (vii) Scopus, (viii) Inspec, (ix) EICompendex, and (x) Web of Science.

4 PROVERS AND THEIR CHARACTERISTICS

In this section, answers of the developers and experts thatresponded to our questionnaire are presented. The order in whichwe present the answers of our respondents is the order in whichwe received their replies. In this way, we wish to express ourgratitude to them.

Matt Kaufman (ACL2): Matt Kaufman is a senior researchscientist working at Department of Computer Science, Universityof Texas, Austin. Matt provided information about “ACL2” [156].Main authors are M. Kaufman and J. Moore. However, severalothers have also made significant contributions. The first publicrelease of ACL2 was 1.9 in 1994. The latest version is 8.2 andupdated last time in May 2019. Basically, it is a monolithicsystem, but the applicative style of programming often makesit straightforward to use pieces of the system. It is generallyclassified as an ITP. However, it takes automation seriously; inthat sense it shares characteristics with ATP. Its logic is FOL withinduction and is written mostly in the ACL2 language, which isan applicative language extending a non-trivial subset of CommonLisp. Its UI is typically Emacs based. ACL2 is a cross platformtool: it runs on Linux, MacOS and Windows. It also runs on thetop of 6 different common Lisp implementations. Input format ofACL2 is s-expressions, though output can be pro-grammaticallyproduced. Web address is cs.utexas.edu/users/moore/acl2. ACL2has been scaled to large applications, recently at Centaur andOracle. Its users seem pretty happy with readability, but othersmight be put off by the s-expression format. Inter-operabilitybetween different provers is limited, though there has beenwork [108], [109] that connects ACL2 and HOL4, e.g. ACL2is first-order, but is still quite expressive because of its supportfor recursive definitions. Several capabilities allow it to do somethings that might be considered higher-order in nature: macros, aproof technique called functional instantiation and oracle-apply.ACL2(p) [227] supports parallelism for execution, proof andother infrastructure supports parallelism at the level of collectionsof files. Run-time assertions are supported and lots of debuggingtools are available for program execution and proof. ACL2 canoften emulate other logics by formalizing their proof theories. Itis extensible or programmable by the user via rule classes anddirectly via meta rules and clause-processors. ACL2 may be theonly ITP that presents a single logic for both its programminglanguage (provide efficient execution) and its theorem prover(including definitions and theorems to prove). There is a largelibrary of “Community Books” developed over many years byusers, in daily use. There are users in academia, government andindustry.

Stephen Schulz (E): Stephen Schulz is the next person whoresponded to our questionnaire. Stephen designed and developed“E” theorem prover [240]. The first public release of E was 0.2 in1998. The latest version is 2.4 and updated last time in October2019. E was originally developed at TU Munich, but now it ismaintained and extended at DHBW Stuttgart, Germany. Licensetype is open source/free software under GNU GPL Version2. It is an ATP for full FOL with equality, where first-orderproblems are reduced to clause normal form and uses a saturationprocedure based on the equational superposition calculus. Mainuser community of the system is mathematician. Web address iswww.eprover.org. E won several CADE ATP competition and

Page 6: 1 A Survey on Theorem Provers in Formal Methods

6

has a good ranking. The type of E with respect to reasoning ismathematical, type with respect to the logic is classical FOLand with respect to the truth value is binary. Calculus usedin E is deductive and set theoretic support is available but onlogical level via axiomatization for ZF. Programming paradigmis imperative, it is purely developed in C and it supports CLI(command line interface). E is officially distributed in sourcefiles and supports Linux, Mac OS, FreeBSD, Solaris, Windowsand w/Cygwin. It is not a multi-threaded system, has a mixedarchitecture (modular + monolithic) and has its own library. Proofcan be generated in TPTP-3, PCL2 and Graphviz format. Someinput codes are generated automatically from test data. Systemhas no dedicated proof kernel, but has explicit proof-object.Any standard text editor can be used for files input. E has beencombined with other systems (Waldmeister, LEO-II, Vampire, Z3,etc.) at Isabelle Sledgehammer tool [223] to increase the level ofproof automation.

Makarius Wenzel (Isabelle): Makarius Wenzel providedinformation about “Isabelle” [222], originally published by L.Paulson (Cambridge, UK). Many people have contributed toIsabelle in the last 30 years. It was released in 1986 and its purelogical framework first came up in 1989. Latest version of thesystem was released in June 2019. Isabelle grew out of universityresearch projects, but it is of industrial quality, or even beyondthat, because it is not subjected to constraints imposed by marketeconomy. The full distribution uses add-on tools with variousstandard open-source licenses: LGPL, GPL, etc. Web address isisabelle.in.tum.de. Isabelle unique features is a huge integratedenvironment for interactive and automated theorem proving. It islike a word-processor for formal logic, with specifications andproofs. Main user community is the people interested in formallogic and formalized mathematics and people doing proofs aboutsoftware and hardware. Isabelle is in fact a multiplicity of ITPand ATP systems. Type of the system with respect to reasoningis mathematical, type with respect to logic is mostly HOL, butusers can also do something else if they really want to. Its typewith respect to truth value is mostly classical logic/Boolean.Programming paradigm is purely functional (ML and Scala) andUI is a full-scale IDE. It supports multiple operating systemssuch as: Linux, Windows, Mac and is available both for 32-bitand 64-bit architectures. For scalability, it provides support forclassic shared-memory workstations with many cores. Isabelleis highly modular, to the extent that it is hard to tell where itstarts and ends and what is actually its true structure. It providescode generation facility for SML, OCaml, Haskell and Scala.Isabelle has a small proof kernel, according to the classic “LCFapproach”, but with many add-ons and reforms over the decades.It is based on λ-calculus with simple types and natural deduction.Moreover, it supports inductive recursion and has very powerfulderived principles for inductive sets, predicates, primitive andgeneral recursive functions.

Thierry Lecomte (Atelier B): Thierry Lecomte works asdirector at ClearSy organization. Under his supervision, “AtelierB” was developed. Atelier B implements the B method [4] andoffers a theorem prover. It was released first time in 1994. Latestversion is 4.5.1 and updated last time in May 2018. AtelierB is an interactive rule based theorem prover plus interactiveand dedicated tableau method. Logic of the system is classicalFOL and truth value is traditional Boolean. Calculus type is

deductive and supports ZF set theory. Programming paradigm isimperative and programming language is similar to Prolog. Webaddress is clearsy.com/en/our-tools/atelier-b. UI of Atelier B isgraphical-based and it operates on various operating systems. Italso provides support for the Linux based clusters. Its architectureis monolithic, where inheritance and library support is notavailable. Atelier B (CASE tool) provides C and Ada codegeneration. It has no small proof kernel but has proof-objectsfor more then 130 axioms. Mathematical rules (transformation,rewriting, hypotheses generation) are added by users, but it isnot programmable by users. Syntax is inspired from Haskell,ML, Java, C, C++ and Prolog languages. Infix/postfix/mixfixoperators’ support are available and also for Unicode, Binaryand ASCII coding schemes. Native support for B language is theunique feature of the system. Rich tactic language is available forwriting proof instead by hand and it does not support inductiverecursion.

David Crocker (Escher Verifier): David Crocker is serving atMISRA C++ working group. He developed “Escher” verifier[53]. Its latest version is 6.10.02 and was updated last time in2015. It is an industrial product and license type of the systemis commercial. Web address is eschertech.com/products/ecv.php.One of the unique feature of Escher Verifier is that sometimesit suggests missing preconditions/assertions/invariants, etc. inthe model or software being verified when a proof is not found.Main user community is the defense industry. It falls in ATPcategory. Reasoning type of the system is mathematical, logicaltype is a combination of FOL, SOL and some HOL. Type withrespect to the truth value is mostly binary but triadic wherenecessary. Calculus of the system is deductive and programmingparadigm is mostly functional, but imperative in speed-criticalparts. Programming language is C++. There is no direct interfacefor the theorem prover. However, GUI is available for theverification tools that uses it. It runs on Windows and Linuxoperating systems. System scalability is limited to a singlethread and architecture is monolithic and standalone. It does nothave small proof kernel and editor support. It is not extensibleor programmable by its user. Moreover, it does not supportconstructive logic.

Norman Megill (Metamath): Norman Megill is the nextrespondent of the questionnaire. He is the originator of the“Metamath” [198]. There are 34 other contributors who helpedto extend the system. It was introduced first time in 1993.Latest version is 0.131 and was updated last time in June 2016.It is an independent development by Norman. Web page isus.metamath.org. License type of the system is GPL. User FOLscheme is the unique feature of the system. It is an ITP andalso used as a proof checker. Reasoning type of the system ismathematical, logical type is FOL. HOL is also possible but notdeveloped yet. Truth value of the system is binary and deductivecalculus is used. There are 12 independent verifiers availablein C, Java, C#, Lua, Mathematica, Julia, Rust, Python, Haskell,C++, and JavaScript. Metamath supports both CLI, GUI andruns on almost all operating systems. Architecture of the systemvaries according to the environment. Metamath also displayscomprehensive error message for the debug output and runtimeassertion. Library of the system contains over 20000 theoremsthat covers results in logic, algebra, set and group theory, topologyanalysis, Hilbert spaces and quantum logic. It is a standalone

Page 7: 1 A Survey on Theorem Provers in Formal Methods

7

system and no code generation facility is available. It is extensibleand programmable by the user. Metamath has a small proofkernel. Human readability feature according to the syntax isunique as compared to others. Unification process is used forpattern matching. Argument handling is implicitly available andit is lightweight. Metamath supports inductive recursion and doesnot allow to write non-terminating programs. The system is easyto learn, but require experience with library and reasoning foradvanced proofs.

Frank Pfenning (Twelf); Frank Pfenning works as professorat Computer Science Department, Carnegie Mellon University,USA and is the creator of “Twelf” [226]. C. Schurman alsocontributed to the system. It was released publicly in January1999. Latest version is 1.7.1 and updated last time in January2015. Website address is twelf.org. Twelf is an ITP and simplifiedBSD is the type of system license. Unique features of the systemare meta-theorem proving for programming languages and logics.It is mainly developed for academia community. Type of thereasoning system with respect to logic is type theory and its typewith respect to the truth value is intuitionistic. Twelf is built ondeductive calculus and is developed in standard ML. UI of thesystem is CLI and runs on almost all operating systems. It maybe scalable but it is not distributed or multi-threaded system.Twelf supports IDE and has its own libraries. System is extensibleand programmable by users. It supports no tactic language andproofs are written by hand. Twelf has been used to formalizemany different logics and programming languages (examples areincluded with the distribution).

Ulf Norell (Agda): Ulf Norell works as a principalresearch engineer at University of Gothenburg, Sweden. Hedeveloped “Agda” system [211]. Latest version is 2.6.0.1,which was updated last time in May 2019. Web address iswiki.portal.chalmers.se/agda/pmwiki.php. License type of thesystem is BSD-like. Dependent types are the unique feature andacademia is the main user community of the system. Popularresearch language is the main success story of the system. It isan ITP and based on functional programming. System type withrespect to the logic is intuitionistic HOL, type with respect tothe truth value is binary and is built on inductive calculus. Itsupport constructive type theory. Haskell programming languageis used for developing the system and UI of the system is graphicbased. Agda supports and runs on all popular operating systems.It is scalable but not used for distributed or multi-threadedenvironment. It supports IDE, has its own proof kernel andlibrary. It is extensible and programmable by users and has atactic language for proof writing. An important aspect of Agda isits dependence on Unicode. Its standard library is under constantdevelopment and includes many useful definitions and theoremsabout basic mathematical designs.

Adam Naumowicz (Mizar): Adam Naumowicz provideshis services to computer science institute at University ofBialystok, Poland. He gave information on “Mizar” [110], whichwas publicly announced in November 1973. Latest versionis 8.1.09 and updated last time in June 2019. Web addressis mizar.org. Andrzej Trybulec is the founder and Mizar isdeveloped at University of Bialystok. The system is free for anynoncommercial purposes. User friendly input language based onnatural language and a large library of formalized mathematics

are the unique features. Mathematicians, computer scientistsand students are the main user community. Mizar is an ITPand based on syllogism or mathematical statements. FOL withschemes (statements with free second-order variables) is thesystem type with respect to the logic and is based on binarytruth value. It is based on deductive calculus and ZF set theoreticsupport is available. Declarative is the programming paradigmand object Pascal programming language is used for developingthe system. Type of the interface is CLI and runs on almost alloperating systems. It is scalable, but not suitable for distributedor multi-threaded environment. It supports IDE, has its ownlibrary and proof kernel. However, it is not extensible and notactic language support is available for proof writing. MizarMathematical Library (MML) contains approximately 10,000formal definitions and 52,000 lemmas and theorems.

Michael Norrish (HOL): Michael Norrish has been working asprincipal research engineer at Australian National University. Hetalked about “HOL” theorem prover [212]. It was publicly releasedin January 1985. Latest version is Kananaskis-13 and was updatedlast time in August 2019. Web address is hol-theorem-prover.org.HOL was developed at Cambridge University. Four toolsnow comes in HOL family: HOL4 [247], HOL Light [120],ProofPower [20] and HOLZero [7]. Other tools that come in HOLfamily are developed jointly by Cambridge University, Data61,CSIRO and Chalmers University of Technology. License type ofHOL is BSD. It is an ITP based on syllogism or mathematicalstatements. HOL is the logical framework of the system andis based on binary truth value. System is based on deductivecalculus and does not supports set theory directly, but it has aset-theoretic model. Programming paradigm is functional anddeveloped in SML programming language. UI is command line. Itsupports and runs on all famous operating systems. It is scalable,but not suitable for distributed or multi-threaded environment. Itdoes not support IDE, but has its own proof kernel and library.System is extensible and support tactic language for proof writing.

Jonathan Sterling (RedPRL): Jonathan Sterling is a graduateresearch assistant at School of Computer Science, CarnegieMellon University and creator of “RedPRL” [250]. Web addressis redprl.org. MIT is the license type of the system. Uniquefeatures of RedPRL are higher dimensional types, support forstrict equality and tactic scripts, refinement of dependent proofsand functional extensionality. Main user community is homotopytype theory. It is an ITP and based on syllogism or mathematicalstatements. Type theory is the logical framework and is basedon intuitionistic truth values. System is based on deductivecalculus. Programming paradigm is functional and it is developedin standard ML. Visual studio code extension is the UI of thesystem. It runs on almost all major operating systems. RedPRLis not suitable for distributed or multi-thread environment, buthas its own IDE. It has no library, but has its own proof kernel.RedPRL is extensible by the user and syntax of the system isinspired by Nurpl programming language [13]. Tactic languagesupport is also available for proof writing.

Oleg Okhotnikov (Class & Int Proof Checker): YuriVtorushin and Oleg Okhotnikov implemented the “Class andInt proof checker”. It was publicly announced first time inOctober 2007. Latest version of the system is Class 2.0 andInt 2.0 and was updated last time in November 2017. Web

Page 8: 1 A Survey on Theorem Provers in Formal Methods

8

address is class-int.narod.ru/. Automated proof search for naturalreasoning and support for iterative equalities are the uniquefeatures of the system. System is mainly developed for studentsand teachers. Vtorushin and Okhotnikov uses Class and Intprograms on seminars with students in courses “MathematicalLogics and Algorithm Theory”, “Artificial intelligence”, etc.It is an ATP based on syllogism or mathematical statements.It is based on FOL, supports binary truth value and built ondeductive calculus. Axiomatic method set theoretic support isavailable. Programming paradigm is declarative and developedin C++. Moreover, it supports Windows operating system onlyand has a CLI. It is scalable and also supports distributed andmulti-threaded environment. System supports IDE and has itsown proof kernel. System is not extensible by the user and syntaxis inspired by Mizar and SAD. Tactic language is also availablefor proof writing.

Hans de Nivelle (GEO): Hans de Nivelle from School of Scienceand Technology, Nazarbayev University, Kazakhstan developedthe “Geo” [61] prover. It was released first time in August 2015and latest version is Geo2016C. It is an ATP for FOL that is basedon graph theory and supports partial classical logic (PCL) with3-valued logic as a truth value. Calculus of the system is basedon geometric resolution and is developed in C++. It supportsCLI and only runs on Mac. The system takes geometric formulasand FOL formulas as input, where FOL formulas are changed togeometric formulas. During proof search, it looks for a geometricformulas model through backtracking. Main success story is itsexistence in the current scenario. License type is GNU GPL,Version 3. System is scalable, but not designed for distributed ormulti-threaded environment and it has no IDE. Library support isnot available but proof kernel is owned by the system. Syntax ofthe system is inspired by TPTP-language and it is not extensibleby the user with no tactic support. Web page of the system ishttp://www.ii.uni.wroc.pl/∼nivelle/software/geo III/CASC.html.

Hugo Herbelin (Coq): Hugo Heberlin is working as researcherat INRIA, France. He talked about the “Coq” system [38]. Itwas first released in May 1989. Latest version is 8.10.1 and lasttime updated in October 2019. Web address of the system iscoq.inria.fr. It is developed by INRIA and academic partners.LPGL 2.1 is the license type. Unique features of the system areexpressive logic and programming language, program extraction,elaborated certification language, proof techniques and a tacticlanguage that allows users to define proof methods. Teaching,formalization of mathematics and certified programming are mainuser community of the system. The specification language of Coqis called Gallina (based on Calculus of Inductive Constructions),which allows its users to write their specification by developingtheories. Coq follows the Curry-Howard isomorphism [248]and uses Calculus of Inductive Constructions language [67]to formalize programs, properties and proofs. Curry-Howardisomorphism provides a direct relation between programing andproving and says that proofs in a given subset of mathematicsare exactly programs from a particular language. It means thatone can use a programming paradigm to encode propositionsand their proofs. Coq is an ITP and supports various decision orsemi-decision procedures produced proof-terms checked validby a kernel. Logical framework of the system is based on HOL,λ-calculus and is built on both inductive as well deductivecalculus. Different ways to represent sets are available in the

system. Programming paradigm of the system is functional anddeveloped in OCaml programming language. System supportsboth graphical as well as CLI and run on almost all operatingsystems. System is scalable, but not designed for distributed ormulti-threaded environment. System has IDE, library support isavailable and proof kernel is also owned by the system. Syntax ofthe system is inspired by ML language and is extensible by theusers.

Clark Barrett (CVC4): Clark Barret is the last respondentof the questionnaire and provided information about CVC4 [32],developed at Stanford University and University of Iowa. CVC4is an ATP for SMT problems and was released first time inDecember 2014. Latest version is 1.7 and last time updated inApril 2019. Web address is http://cvc4.cs.stanford.edu and licensetype is BSD 3-clause. CVC4 is based on DPLL(T) calculus [33]and its type with respect to reasoning is mathematical and isbased on standard many-sorted FOL, with limited support forHOL. It also supports finite sets. Main user communities ofthe system are people that are interested in program analysis.Programming paradigm is logical and is developed in C++.CVC4 supports CLI, offers API’s for C, C++, Java, Pythonand runs on Mac and Windows. Support for finite sets is alsoavailable. Moreover, system is modular, does not provide anysupport for distributed computing and offers limited supportfor multi-threading. CVC4 offers solvers for separation logic,sets and relations, where models assign every formulas eithertrue or false. Debug output and run time assertions support isavailable, whereas code generation support is not available. Inputlanguage to CVC4 is SMT-Lib, which is inspired by LISP. It alsosupport CVC input language, which is more human-readable thanSMT-LIB. Moreover, limited support is available for inductivereasoning. CVC4 is used as the main engine in Altran SPARKtoolset and at GE and Google. CVC4 comes first in variousdivisions of Satisfiability Modulo Theories (SMT-COMP), CASCand SyGuS (Syntax-Guided Synthesis) competitions.

The summary of the main characteristics of theorem proversfor which we received answers from experts/developers is listed inTable 5. The answers for PVS is provided by authors of this paperas they have done some work in PVS in the past.

We also developed a layout for the survey questionnaire,which is listed in Table 6. Mnemonics codes are used to rep-resent headlines. These abbreviations are: CLang = Computa-tional Language, 1st Rel = First Release, Ind/Uni/Inde = Indus-try/University/Independent, Prog.P = Programming Paradigm, LV= Latest Version, LT = License Type, UI = User Interface, OS =Operating System, Lib = Library, CG = Code Generation, Ed =Editor, Ext = Extendable, I/O = Input/Output, TType = Tool Type,CLogic = Computational Logic, TV = Truth Value, ST = SetTheory, App.Areas = Application Areas and Eval = Evaluation.We filled the layout for 27 more theorem provers. We collect thedata from various resources such as electronic databases, researcharticles and dissertations. Complete detail for each system is listedin Appendix A.

Page 9: 1 A Survey on Theorem Provers in Formal Methods

9

TABLE 5: Main characteristics of 16 theorem provers

Cha

ract

eris

tics

AC

L2

EIs

abel

leA

telie

rB

Esc

her

Met

amat

hTw

elf

Agd

aM

izar

HO

LR

edPR

LC

lass

&In

tG

eoC

oqPV

SC

VC

4

Syst

emTy

peT

PT

PT

PT

PT

PT

PT

PT

PT

PT

PT

PT

PT

PT

PT

PT

P4SM

T

The

orem

Prov

erC

ateg

ory

ITP

AT

PA

TP+

ITP

ITP

AT

PIT

PIT

PIT

PIT

PIT

PIT

PA

TP

AT

PIT

PIT

PA

TP

Syst

emB

ased

onSy

llogi

smSy

llogi

smSy

llogi

smSy

llogi

smSy

llogi

smSy

llogi

smL

FFP

Syllo

gism

Syllo

gism

Syllo

gism

Syllo

gism

GT

DP

Syllo

gism

DPL

L

Log

icU

sed

FOL

FOL

HO

LFO

LFO

L+H

OL

FOL

+HO

LT

TH

OL

FOL

HO

LT

TFO

LPC

LH

OT

TH

OL

FOL

Syst

em’s

Trut

hV

alue

Bin

ary

Bin

ary

Bin

ary

Bin

ary

Bin

+Tri

Bin

ary

Intu

ition

Bin

ary

Bin

ary

Bin

ary

Intu

ition

Bin

ary

3-va

lue

Bin

ary

Bin

ary

Bin

ary

Cal

culu

sIn

duct

ive

Ded

uctiv

eD

ed+I

ndu

Ded

uctiv

eD

educ

tive

Ded

uctiv

eD

educ

tive

Indu

ctiv

eD

educ

tive

Ded

uctiv

eD

educ

tive

Ded

uctiv

eD

educ

tive

Ded

+Ind

uD

educ

tive

Ded

uctiv

e

SetT

heor

etic

Supp

ort

No

Yes

Yes

Yes

No

Yes

Yes

Yes

Yes

No

No

Yes

No

Yes

Yes

Yes

Prog

ram

min

gPa

radi

gmFu

ncIm

peFu

ncIm

peFu

nc+I

mp

Func

LP

Func

Dec

lFu

ncFu

ncD

ecl

Dec

lFu

ncFu

nc+O

OL

ogic

al

Syst

emA

rchi

tect

ure

Mod

ular

Mod

+Mon

oM

odul

arM

onol

ithic

Mon

olith

icM

od+M

ono

Mon

olith

icM

odul

arM

odul

arM

odul

arM

odul

arM

odul

arM

onol

ithic

Mod

ular

Mod

ular

Mod

ular

Prog

ram

min

gL

angu

age

AC

L2

CM

L+S

cala

Prol

ogC

++M

MSM

LH

aske

llPa

scal

SML

SML

C++

C++

OC

aml

CL

isp

C++

Use

rIn

terf

ace

CL

IC

LI

GU

IG

UI

CL

I+G

UI

CL

I+G

UI

CL

IG

UI

CL

IC

LI

GU

IC

LI

CL

IC

LI+

GU

IG

UI

CL

I

Plat

form

Supp

ort

Cro

ssC

ross

Cro

ssC

ross

Win

+Lin

uxC

ross

Cro

ssC

ross

Cro

ssC

ross

Cro

ssW

indo

ws

Mac

Cro

ssM

ac+L

inux

Mac

+Win

Scal

abili

tyY

esY

esY

esY

esN

oY

esY

esY

esY

esY

esN

oY

esY

esY

esY

esN

o

Mul

ti-th

read

edY

esN

oY

esN

oN

oY

esN

oN

oN

oN

oN

oY

esN

oN

oY

esY

es

IDE

Yes

Yes

Yes

Yes

No

Yes

Yes

Yes

Yes

No

Yes

Yes

No

Yes

Yes

No

Lib

rary

Supp

ort

Yes

Yes

Yes

No

No

Yes

Yes

Yes

Yes

Yes

Yes

Yes

No

Yes

Yes

Yes

Prog

ram

mab

ility

Yes

No

Yes

No

No

Yes

Yes

Yes

No

Yes

Yes

No

No

No

Yes

No

Tact

icL

angu

age

Supp

ort

Yes

No

Yes

Yes

No

Yes

No

Yes

No

Yes

Yes

Yes

No

Yes

Yes

No

TABLE 6: Systematic literature review designTheorem provers

Gen

eral Name

Contributor1st RelInd/Uni/Ind

Impl

emen

tatio

n

CLangProg.PLVLTUIOSLibCGEdExt

Log

ico-

Mat

h TTypeCLogicTVSTCalculusProofKernel

Oth

ers App. Areas

EvalUnique Features

5 COMPARISON

In this section, we first listed those scientists that contributed mostin the formalization of mathematical theorems. We also presentthose theorem provers in which most of the theorems are proved.Furthermore, the top systems (from 1996 till 2019) in CADEATP system competition are described. Finally, we showed thecomparison of more than 40 provers for parameters mentioned inSection 1.

5.1 Top Scientists and Theorem ProversEfficiency and power of theorem provers are generally evaluatedon the number of theorems they proved from top hundred theoremslist (available at: http://www.cs.ru.nl/∼freek/100/). People whocontributed most in verifying these theorems are presented inFigure 1. John Harrison currently working at Intel proved 84theorems and he used HOL (particularly HOL Light) and Isabelle.Rob Arthan (second in the list) also used family of HOL theoremprovers for theorem proofs. Theorem provers which proved mostof the theorems from top hundred theorems list are presented inthe order:

HOL Light (86)→ Isabelle (81)→ Metamath (71)→ Coq (69)→ Mizar (69)→ ProofPower (43)→ Nqhtm/ACL2 (18)→ PVS

(16)→ NuPRL/MetaPRL (8)

HOL Light’s performance is outstanding and it is at the top byformalizing and proving 86 theorems. Isabelle, another powerfultool is at the second number in the list. Metamath, Coq, Mizar andProofPower are also the computationally strong tools and playa vital role in the formalization of top hundred theorems. Twosystems in HOL family (HOL Light and ProofPower) are includedin the list.

5.2 Best FOL Theorem Provers at CASCEach year, FOL based ATPs performances are checked in theCADE ATP System Competition (CASC). This competition wasstarted in 1996. CASC consists of various divisions and thesedivisions are categorized based on the type of their problems andthe characteristic of systems. There are two major divisions. First

Page 10: 1 A Survey on Theorem Provers in Formal Methods

10

8

43

8

5

13

14

35

13

17

9

9

10

84

6

6

12

10

7

7

Ruben Gambao

Rob Arthan

Ricky Butler

Paul Jackson

Norman Megill

NASA Library

Mario Carneiro

Marco Riccardi

Manual Eberl

Lukas Bulwahn

Laurent Thery

Karol Pak

John Harrison

Jacques D. Fleuriot

Grzegrorz Bancerek

Frederique Guilhot

C-CoRN

Benjamin Porter

Amnie Chaieb

Fig. 1: Theorems proved by scientists in ITPs

one is the competition division which ranks the reasoning system,second one is the demonstration division which enables the systemto demonstrate its potential without ranking. These divisions arefurther divided into subdivision on the basis of problem categories.Competition division is an open platform for automated reasoningsystems that meet the requirements of this division. System se-lected for the competition division tries to attempt all the problemsof this division. Subdivisions of this division are: THF, THN, TFA,TFN, FOF, FNT, CNF, SAT, EPR, SLH (changed to UEQ in 2015and back to SLH in 2018) and LTB. More details on subdivisionscan be found in [251]. These divisions are presented on horizontalaxis in Figure 2, while vertical axis represents the competitionyear. Some tools are specified to only one division, while some aretested on different problem divisions which show excellent results.Figure 2 has sketched the overall results for each division. Arrowsin the figure show the continuous winners in a particular division.For example, Vampire [235] in the FOF division is performing bestfrom 2002 to 2019 and Satallax [49] is coming first in the THFdivision from last seven years. Vampire system topped the TFA,FOF, FNT and EPR divisions respectively. Similarly, iProver [170]dominates the EPR division from 2008 to 2014 and 2016 to 2018.These provers perform best at CASC due to the following reasons:

1) Sound theoretical foundations,2) Thorough tuning and testing,3) Huge implementation efforts, and4) Understanding of how to optimize for the competition.

5.3 Provers Comparative Analysis ResultsThis subsection presents the comparative analysis of theoremprovers for various parameters, that includes: theorem prover cate-gory, mechanical reasoning system type, logical framework, truthvalue, calculus of the system, set theoretic support, programmingparadigm and programming language of the system, UI, systemscalability, distributed/multi-threaded, IDE and library support.Results for each parameter is presented next.

5.3.1 Theorem Prover CategoryFigure 3a shows the category of theorem provers such as ATPs,ITPs, geometric theorem provers, decision procedures and theorygenerators, etc. Major portion is shared by ATPs (44%) and ITPs(31%). Systems working as both ATP and ITP take 11%. Whereas,systems that work as either theory generator, as ATP and model

1996

2004

2001

20032002

2000199919981997

2005

2013

2010

20122011

2009200820072006

2018

2015

20172016

2014

2019

THF THN TFA TFN FOF FNT CNF SAT EPR SLH LTBE-SETHEO Otter

SPASS Gandalf SPASS

SPASS

SPASS

E-SETHEO

Gandalf SPASS Waldmeister

Vampire

Vampire

Vampire

TPS

LEO-I

Satallax

Isabelle

Satallax

Satallax

SPASS + T

Princess

CVC4

VampireZ3

Vampire

Vampire

Niptick

SPASS + T

CVC4

Beagle

Paradox

Meta Prover

Paradox

Paradox

iProver

iProver

Vampire

Vampire

Vampire

E

E-SETHEO

Vampire

Vampire

E

E-SETHEO

OtterMACE

GandalfSAT

Gandalf

Gandalf

Paradox

Paradox

Meta Prover

Paradox

DCTP

DCTP

Darwin

Darwin

iProver

iProver

Vampire

iProver

iProver

Vampire

Waldmeister

Waldmeister

UEQ

Vampire

SLH

E

E-SETHEOSEM

SInE

Vampire

Vampire

MaLARea

Vampire

VampireMaLARea

LEO-III

Fig. 2: Top ATPs at CASC

generator and as automated geometric theorem prover share 6%.Systems that work as decision procedures and ATP for SMTproblems take 8%.

5.3.2 Mechanical Reasoning System Type

Figure 3b represents the theorem provers grounding either theseare based on syllogism, mathematical statements or various logictheories. 54.5% of the systems are based on syllogism or mathe-matical statements. While other types take 9.1% each.

5.3.3 Logical Framework

Figure 3c shows the logical framework of theorem provers. Mostsystems (59%) are FOL based systems. 16% of the systems arebased on HOL. Systems that are based on graph theory, dynamicmodal logic, FOL/HOL, pure classical logic, equational logic, typetheory and higher order type theory share collectively 25%.

5.3.4 Truth Value and System Calculus

Figure 3d shows 86% of theorem provers are based on Booleanlogic or binary logic, while 7% of the systems are based onintuitionistic logic. Systems that result in triadic (3-value) share2% and both binary and triadic value systems take 5%.

Figure 3e shows 52% of the systems are based on deductivecalculus while 10% are based on inductive calculus. Theoremprovers which are based on both inductive and deductive calculustake 5%. Systems based on euclidean and differential geometry,first order predicate calculus, fixed point co-induction, λ-calculus,sequent calculus, tableau calculus, instantiations calculus, hypertableau calculus and typed λ-calculus are respectively 3%, 3%,3%, 5%, 5%, 8%, 2%, 2% and 2%.

5.3.5 Set Theoretic Support

Figure 3f shows that only 30% of theorem provers provide settheoretic support, while 56% do not support set theory. Systemsthat support Horn theory, swinging type theory and ZF set theorytake 2% each. Systems that supports Quine’s and B-Method settheory also take 3% each.

Page 11: 1 A Survey on Theorem Provers in Formal Methods

11

5.3.6 Programming Paradigm and Programming LanguageFigure 3g presents the programming paradigm of the systemssuch as functional, imperative and declarative, etc. Programmingparadigms of 23% theorem provers are functional. While systemshaving functional, imperative and object oriented paradigms take16%. Systems that belong to logic programming paradigm take9%. Provers that belong to both procedural and object orientedparadigm are 7% and 12% systems belong to the declarativeparadigm. 2% of the systems belong to functional, concurrent,and object oriented programming paradigm. Systems that be-long to functional and imperative paradigm take 5%. Theoremprovers which come under the functional and procedural paradigmtake 5%. Systems that only belong to concurrent programmingparadigm take 2%. Systems belonging to both functional andmodular paradigm take 2%.

Figure 3h presents the overview of programming languageswhich are used to develop theorem provers: Ocaml (20%), C/C++(17%), Common Lisp (10%), Java (10%), Prolog (10%), SML(10%), Haskell (7%), Mathematica (5%), Pascal (3%), Metamath(2%), Perl (2%), Scala (2%), ML and Scala (2%).

5.3.7 User Interface and Operating SystemTheorem provers are mostly available with CLI (54%) only, while30% percent of the systems are available with GUI only and 16%of the systems provide both CLI and GUI.

On the other hand, 51% theorem provers run on cross platformas shown in Figure 3i. 15 % of the systems support Linux,Mac and Windows operating systems. 3% of the systems runon all Unixoids-based operating systems. Systems that run onlyon Windows take 3%, Unix (5%), Linux (5%), and Mac (2%).Systems that support Unix as well as Linux are 2%. Systemsthat run on Linux, Unix, Windows and Mac are 5%. Systemssupporting Linux, Solaris and Mac take 2%. Systems that onlysupport Linux and Solaris take 2%.

5.3.8 Distributed/Multi-threaded, IDE and Library SupportOnly 17% (8 out of 43) of theorem provers support distributed ormulti-threaded environment, while 83% of the systems does notsupport such environments. Further, our results showed that allof the systems have the ability of scalability according to futureneeds. On the other hand, 65% (30 out of 43) of the systemssupport IDE. Furthermore, 56% (26 out of 43) of the systemshave their own libraries while 44% of the systems does not havetheir own libraries.

5.4 The de Bruijn CriterionAccording to the de Bruijn criterion “the correctness of the math-ematics in the system should be guaranteed by a small checker”[30]. This means that a system has a ‘proof kernel’ (also calledproof checker) that is used to filter all the mathematics. Table 7shows whether 15 theorem provers for which we received answersfrom experts/developers have small proof kernels or not + standsfor yes and - for no). Proof kernel for other 27 theorem provers areshown in Appendix A, and majority of them have no proof kernel.Whereas, HOL Light has extremely small proof kernel containngonly several hundred lines of OCaml.

In some ITPs (e.g., Coq and Agda), the proof kernel alsochecks the correctness of proof-objects that are generated by othertools included in the whole system. For an ITP with proof-objects,the proof-script for a statement (theorem or lemma) contain a list

TABLE 7: de Bruijn CriterionSystem De Bruijn Criterion Proof-objectACL2 - -

E - +Isabelle + x

Atelier B - -Escher - -

Metamath + +Twelf - -Agda + +Mizar - -HOL + x

RedPRL/Nuprl + xClass & Int + +

Geo + +Coq + +PVS - -

of tactics/strategies that are required to make the proof-assistantto verify the validity of the statement. The proof-script generatesand stores a term that is a proof that can be checked by a simpleproof kernel. The reliability of the whole system depends on thesoundness of proof-objects and the proof kernel. Even if someonehave doubts about the validity of certain statements or if someparts of the systems contain bugs, the proof-object for a givenstatement and the proof kernel can be used to locally verify thestatement within the corresponding logical system.

HOL, Isabelle and Nuprl come in the class of ITPs that havea proof kernel but no proof-objects. In such systems, the proof-script are considered as no-standard proof-object (shown with xin Table 7). They translate the proof-script into a proof-object thatrequires some system specific preprocessing. The trustworthinessof the translation is verified with the proof kernel. For ITPs withno proof kernel (e.g., PVS, Mizar and ACL2), there is no way(yet) that provides a proof-object with high reliability. One has totrust these systems for the correctness of statement accepted bythe assistants. The advantage of these kind of systems generallyis their larger automated deduction facilities and user-friendliness[31].

5.5 The Poincare Principle and Automation

For theorem provers, one of the important aspects is the automa-tion of trivial tasks [274]. It means that a user is not requiredto explain all the details of the calculations to a theorem prover.A theorem prover satisfies the Poincare principle (formulated by[29]) if it has the ability to automatically prove the correctnessof calculations. For example 3 + 4 = 7 holds by computation andit should not be justified with long chains of logical inferences.Table 8 lists whether a prover satisfies Poincare principle or not.

Another important feature of a theorem prover is whether itenables its user to write programs that can solve proof problemsalgorithmically. HOL, PVS, Coq and Isabelle offers such kindof user automation. Strong built-in automation (proof tactics,decision and search procedures, and induction automation etc.) isanother important aspect of a theorem prover. ACL2 and PVS hasthe powerful built-in automation. Then comes HOL, Isabelle andthen Mizar and Coq. The following order indicates the reliabilityof ITPs with most reliable on the left side and least reliable on theright side.

Agda, Coq, Metamath, Nuprl, HOL, Isabelle, Twelf, Mizar, PVS,ACL2.

Page 12: 1 A Survey on Theorem Provers in Formal Methods

12

(a) Theorem provers category (b) Mechanical reasoning system type

(c) Logical framework of systems(d) Truth value of the systems

(e) Calculus of the systems

(f) Set theoretic support

(g) System programming paradigm (h) System programming language

(i) Operating system

Fig. 3: Theorem provers comparisons

Page 13: 1 A Survey on Theorem Provers in Formal Methods

13

TABLE 8: The Poincare principleSystem Poincare principle User automationACL2 + +

E - +Isabelle + +

Atelier B + +Escher - -

Metamath - -Twelf - -Agda - -Mizar - -HOL + +

RedPRL/Nuprl + -Class & Int + -

Geo - +Coq + +PVS + +

Agda is placed at first place as it only uses predicative logic.The middle places are occupied by Nuprl, HOL and Isabellebecause of their non-standard proof-objects. Finally, the leastreliable are those that do not work wit proof-objects. On the otherhand, the order for internal automation in ITPs is opposite withACL2 and PVS on the top.

6 STRENGTHS, ANALYSIS, APPLICATIONS ANDFUTURE DIRECTIONS

In this section, we provide the details on the strengths, in-depthanalysis of the differences and applications of theorem provers.Moreover, some future research directions based on recent workis also discussed. Some theorem provers such as Geo, Class & Intproof checker are omitted in this section and some other famousprovers such as Nuprl, Vampire, Prover9/iProver and MaLAReaare included.

6.1 ITPsACL2 main strengths are: state-of-the-art prover heuristics, robustengineering and extensive hyperlinked documentation. Among allITPs, ACl2 uses FOL instead of higher-order logics. Only twoITPs, Isabelle and ACl2 offer parallel proof checking facility.ACl2 also supports program extraction (also called code gener-ation) by translating the specification in ACL2 language to Com-mon Lisp. Theories can also be developed and executed in ACL2as it is built around a real programming language. Users can notconstruct inductive types, but powerful built-in induction schemein ACl2 allows users to define their own recursive functions.The inference engine is based on the waterfall design of Nqthmprover [219]. ACL2 has been used extensively to verify hardwareand software designs at AMD, Centaur, Oracle, Intel, IBM andto prove separation kernel properties at Rockwell Collins [137].Moreover, ACL2 is also used successfully in processor modeling[91], digital systems [157], programming languages [200], asyn-chronous circuits [59] and concurrency [199]. ACl2(ml) [125] usesmachine learning to facilitate users in the proof process. In past,some work has been done in integrating SAT solvers into ACL2[224], [255]. However, it has generated new issues because oftheir support for a wide range of domains including real numbersand uninterpreted functions. The x86isa library [104] in ACL2offers a formal model for reasoning about x86 machine-codeprograms. Adding several features to current x86isa library suchas exceptions, interrupts handling and extending I/O capabilitieswill enable us to reason about real system codes.

Atelier B offers a framework that automatically prove andreview user added mathematical proof rules. Proof obligations inAtelier B contain traceability information that helps in locatingmodeling errors and model editor allows the navigation of modelsand operations [177]. B-method allows one to develop many mod-els of the same system with the refinement technique. However,one is required to explain the B model while proving theoremsand the proof obligation generator may generates small proofobligations that need to be discharged. Atelier B is used in thedevelopment of safety-critical systems [178] and communicationprotocol [145]. Moreover, B-method is also used successfully inthe development of safety-critical parts in two railway projects [5],[27] and byte code verifier of the Java card [57].

Main strength of Metamath is that it uses the minimumpossible framework that is required to express mathematics andtheir proofs. Unlike most ITPs, no assumption is made by theMetamath’s verification engine about the underlying logic and themain verification algorithm is very simple (essentially nothingmore than substitution of variables expression, enhanced withchecking for conflicts in distinct variables). Weaker logics suchas quantum or intuitionistic can be handled in Metamath withdifferent sets of axioms. Proofs in Metamath are generally verylong, but the proofs are completely transparent and Metamathdatabase contains over 30,000 human readable formal proofs.In the proof development process, users prove a theorem/lemmainteractively within the program, which is then written to thesource by the program. A definition is provided in [54] for modelsof Metamath style formal systems, which are demonstrated onpropositional calculus examples. From mathematical foundations,Hilbert space and quantum logic is developed in Metamath, whichare used in the verification of some new results in these fields.Metamath is used in the formalization of Dirichlet’s theorem andSelberg’s proof of the prime number theorem [55]. An algorithm ispresented in [56] that converts HOL Light proofs into Metamath.

Twelf strengths are representing deductive systems with sideconditions and judgments with contexts. It offers an environ-ment for experimenting with encodings and to verify their meta-theoretic properties. It also provides a module system for the or-ganization of large developments. Twelf implements the λProlog[89] and its logic is very close to the Edinburgh Logical Frame-work (LF) [226]. Twelf specifications can be executed througha search procedure, which means that it can also be used as alogical programming language. Twelf is used in proving the safetyof standard ML programming language [179], in typed assemblylanguage system [69], in foundational Proof Carrying-Code sys-tem [18], in cut-elimination proofs for classical and intuitionisticlogic [225], for specifying and validating logic morphisms [242]and construction of a safety policy for the IA-32 architecture [70].

Main features of Agda is the interactive construction of pro-grams and proofs with meta-variables and place-holders. Unlikeother ITPs that work with proof-scripts, Agda acts as a structureeditor, providing support for term construction. Users can editthe proof-object by focusing on a hole and executing one of theoperations (tactics) that is applied to that hole. It is the only ITPthat offers a functional programming language with dependenttypes. Moreover, strictly positive inductive and inductive-recursivedata types are supported in Agda. Emacs interface for Agdaallows the incremental development of programs [15]. Agda isused to formally verify railway interlocking systems [155], webclient application development [143], fully certified merge sort[66], hardware description and verification [92], formalizing Type-

Page 14: 1 A Survey on Theorem Provers in Formal Methods

14

Logical Grammars [168], Curry programs verification [17] andformalization of Valiant’s Algorithm [37]. In [94], automated the-orem prover Waldmeister [129] is integrated in Agda to facilitatethe proof automation. Similarly, another tool is proposed in [186]for automated theorem proving in Agda.

Mizar received popularity because of its huge repository offormalized mathematical knowledge, which has been used indeveloping AI/ATP methods to solve conjectures in mathematics[260]. Over the years, the syntax of Mizar is improved, simpli-fied and the Mizar language now contains a rich set of logicalquantifiers and connectives. The evolution of Mizar in first 30years is presented in [194]. In Mizar, proofs are written in adeclarative way and proofs are developed according to the rules ofthe Jaskowski style of natural deduction [142]. This characteristicsinfluenced other systems to build similar kind of proof layerson top of several other systems, such as the Mizar mode forHOL [115], the Isar language for Isabelle [270], Mizar-Light forHOL-Light [273] and the declarative proof language (DPL) forCoq [68]. Mizar does not support the Poincare principle, yet ithas some automated deduction and a set of tactics that are veryuser-friendly. The main unique feature is that the proof-scriptis close to an ordinary proof in mathematics. Apart from largeMML, Mizar is used in the development of rigorous mathematics,in hardware/software verification and in mathematical education[110]. Some recent work in Mizar includes the formalization ofPell’s equation [6], Nominative Algorithmic Algebra [169] andformalization of bounded functions for cryptology [214]. A suiteof AI/ATP system is developed on Mizar library in [151] thatcontains approximately 58000 theorems. 14 strongest methods thatexecuted in parallel proved 40.6% of the theorems. Moreover, anindependent certification mechanism is developed in Mizar that isbased on Isabelle framework [148]. In [149], Mizar environmentis also emulated inside the Isabelle.

HOL was build upon the Cambridge LCF approach [221] thatis now referred to as HOL/88 with the purpose of hardware veri-fication. It has influenced the development of other famous ITPssuch as Isabelle/HOL, HOL Zero, ProofPower and HOL Light. InHOL, the computations that involves recursion can become quitelengthy and complex when they are converted to proof-objects.Thus, the proof-objects are not stored, only the proof-scripts.This is the reason why non-standard proof-objects are used inHOL. Users in HOL generally work inside the implementationlanguage. As HOL is fully programmable, various other means ofinteracting with HOL have also been developed. HOL Light is themost widely used provers of this family and it is probably the onlyprover that represents the LCF approach in its purest form. Its logicis based on simple type theory with polymorphic type variable.The terms in HOL Light are of simply typed λ−calculus, withjust two primitive types: bool (Booleans) and ind (individuals).HOL has been used extensively for formal verification projectsin industry. HOL provers are used widely in the formalization ofmathematical theorems [23], hardware design [2], [116], commu-nication protocols verification [122], programs [119] and controlsystem analysis [232]. Similarly, HOL(y) Hammer offers machinelearning based premise selection and automated reasoning bothfor HOL Light and HOL4 [151]. Recent work in HOL includesthe formalization of quaternions [96], linear analog circuits [257],process algebra CCS [258] and metric spaces [190]. A library forcombinational circuits is developed in [244]. Moreover, formalizedLambek calculus has been ported from Coq to HOL4 in [259] withsome new theorems and improvements. A technique based on A*

algorithm is presented in [101] to automate the selection processof tactics and tactic-sequences in HOL4.

Nuprl [64], which inspired the development of RedPRL, is aproof development system based on Computational Type Theory.Nuprl has evolved significantly in years and now can handlethose logics where inference rules can be declared in a sequentstyle. It follows the LCF approach and the type theory includeless-common subtype, very-dependent function types and typeconstructors of quotient type. Type checking in Nuprl is unde-cidable as subtypes can be defined with arbitrary types. Whereasin PVS, an algorithm for type-checking automates all simpler typechecking tasks. Moreover, the computation language is untypedand judgments are also not decidable because the Poincare prin-ciple is assumed not only for intensional equality but also forextensional equality. Users can interact with Nuprl only throughstructural editors where proofs can be edited and viewed as prooftrees. Nuprl is used in mathematics formalization and hundredof theorems are proved in the system [11]. Nuprl is also usedin protocol verification [41], hardware and software specificationand verification [1], [181], reasoning about functional programs[135], design of reliable networks [175], and the development ofdemonstrably correct distributed systems [40]. Some work on theintegration of Nuprl with other systems is done in the past. In[12], PVS is integrated in Nuprl to enables users to access PVSfrom the Nuprl environment. A new semantics is provided in [134]to embed the logic of the HOL prover inside Nuprl. Similarly,Nuprl’s meta-theory is formalized in Coq [16], which is later usedin the Nuprl proof for the validity of Brouwer’s Bar Inductionprinciple [228].

Coq also follows the LCF approach and probably the mostdeveloped ITP after HOL Light. The logic used in Coq is veryexpressive that can define rich mathematical objects. Moreover,Coq has the ability to explicitly manipulate proof-objects fromproof-scripts, which makes the integrity of the syetm dependent onthe correctness of the type-checking algorithm. As Coq is based onconstructive foundations, it has two basic meta-types (also calledsorts): Prop (as a type of logical propositions) and Set (as a typeof other types (eg., Booleans, naturals, subsets, etc) [277]. Thisallows Coq to distinguish between terms that represent proofsand terms that represent programs. A program extractor can alsobe used to synthesize and extract verified programs (in OCaml,Haskell or Scheme) from their formal specifications [182]. Coquses two languages for proofs: Gallina (a pure functional program-ming language) for writing specification and LTac (a procedurallanguage) for the proof process manipulation. Main success storiesof the system are formalization of fully certified C-compiler [44],[180], disjoint-set data structure [69], formalization of two wayfinite automata [83], multiplier circuit formalization [220] andcoordination language [185]. Main mathematical formalizationsdone in Coq include the formalization of Feit-Thompson theorem[107], four-color theorem [106], three gap theorem [195], realanalysis [45] and theory of algebra [102]. A new approach inCoq is presented in [256] that directly generates provably-safe Ccode. Recently, Coq is used in the formal verification of dynamicalsystems [63], password quality checkers [90], security protocol[218], QWIRE quantum circuit language [230], complex datastructure invariants [146], component connectors [133], [279] andthe control function for the inverted pendulum [236]. Moreover, aplug-in (called SMTCoq) is developed in [86] to integrate externalSMT solvers in Coq. An IDE for integration of Coq projects intoEclipse is also developed in [88].

Page 15: 1 A Survey on Theorem Provers in Formal Methods

15

Compared to other ITPs, PVS is based on classical simpletype theory and is without proof-objects, which allows all kindsof rewriting for numeric as well as symbolic equalities. It of-fers theory interpretation, dependent types, predicate sub-typing,powerful decision procedures, Latex support for specifications andproofs, and is user-friendly due to highly expressive specificationlanguage and powerful built-in automated deduction. It is alsointegrated with other outside systems such as a BDD-based modelchecker and also serves as a back-end verification tool for com-puter algebra and code verification systems [191]. During proofconstruction, PVS builds a graphical proof tree in which remainingproof obligations are at the leaves of tree. Each node in the treerepresents a sequence and each branch is considered as a (sub)proof goal that is followed from its off-spring branch with thehelp of a proof step. PVS prover is based on sequent calculuswhere each proof goal is a sequent consisting of a sequence offormulas called antecedents and a sequence of formulas calledconsequents. The type system of PVS is not algorithmicallydecidable and theorem proving may be required to establish thetype-consistency of a PVS specification. Theorems that need to beproved are called type-correctness conditions (TCCs). PVS is usedin hardware and software verification [161], [215], concurrencyproblems verification [127], file systems verification [128], controlsystems [266], cryptographic protocol [24], microprocessor veri-fication [249], real time systems [243], formalization of integralcalculus [51] and medical devices [193]. Some recent work in PVSincludes the specification of multi-window user interface [246],formalization of component connectors [206], [208], analysis ofdistributed cognition systems [192], genetic algorithms operators[205] and cloud services [207]. PVS along with its libraries istranslated to the OMDoc/MMT framework in [167]. The proposedtranslation provides a universal representation format for theoremprover libraries. Similarly, PVS is allowed in [103] to export proofcertificates that can be verified externally. Moreover, denotationalsemantics for Answer Set Programming (ASP) is encoded in [10]and fundamental properties of ASP are proved with PVS theoremprover. Some of the differences between top ten famous proof-assistants are listed in Table 9.

TABLE 9: Comparison of proof-assistants

ITPs rel T dep. T dec. T state. R rpif LLibACL2 - - - - + + +

Isabelle ++ + - + + + +Metamath + - - - + + +

Twelf + + + + - - -Agda +++ + + + - - +Mizar + + + + + + +HOL ++ + - + + - +Nuprl ++ + + - - - -Coq + + + + + - +PVS - + + - + - +

rel: reliability, T: typed, dep. T: dependent type, dec. T: decidable type,state. R: statement about R, rpif: readable proof input files, LLib: largelibrary

6.2 ATPsIsabelle is built around a relatively small core that implementsmany theories as classical FOL, constructive type theory, intu-itionistic natural deduction and ZFC. Its meta-logic is based onthe fragment of intuitionistic simple type theory that includesbasic types as functional types. Whereas the terms are of typedλ-calculus. Only the type prop (proposition) is defined by the

meta-logic and the formulas are terms of type prop. The meta-logicsupports implication, the universal quantification and equality, andthe inference rule is provided in natural-deduction style [191]. Forproofs, Isabelle combines HOL for writing specification and Isaras the language to describe procedures for proofs manipulation.Isabelle/HOL [210] is the most widely used system nowadays.Isabelle offers a rich infrastructure for high-level proof schemes.During theory development, both structured and unstructuredproofs can be mixed freely. It is important to state that both HOLand Isabelle use non-standard proof-objects in the form of tacticsfor equational reasoning. This makes formalization relatively easyin both systems but it has the disadvantage that the proof-objectscan not be used to see the proof details. In principal, both systemscan be modified for proof-objects creation and storing.

HP used Isabelle in the design and analysis of the HP 9000 lineof servers’ Runway bus [52]. The L4.verified project at NICTAused Isabelle to prove the functional correctness of seL4 micro-kernel [163]. Moreover, Isabelle is successfully used in securityprotocols’ correctness [50], formalization of Java programminglanguage [267], Java virtual machine code soundness and com-pleteness [164], property verification of programming languagesemantics [165]. A list of research projects that uses Isabelle canbe found at: isabelle.in.tum.de/community/projects. Recent worksin Isabelle include the verification of Ethereum smart contractbytecode [14], imperative programs asymptotic time complexityverification [278] and formalization of Akra-Bazzi method [84],deep learning theoretical foundations [36], Green’s theorem [3]and Markov chains and Markov decision processes with discretetime and discrete state-spaces [132].

From last 15 years, E theorem prover is constantly participat-ing at CASC in more than one category (winnrer in SLH divisonin CASC-27 (2019)). The semantics of E is purely decelrative andinternal unique features are: shared terms with cached rewriting,folding feature vector indexing and fingerprint indexing. Uniquefeatures that are visible to the users are advanced and highlyflexible search heuristics. E main strengths are the generation ofproof-objects, the automatic problem analysis and the support forthe TPTP standrad for answers [241]. E is successfully used in thereasoning of large ontologies [229], software verification [231]and certification [81]. One of the main limitations in ATP is thelack of mechanism that allows proofs to guide the proof searchfor a new conjecture. In this regard, E is extended in [140] withvarious new clause selection strategies. These strategies are basedon similarity of a clause with the conjecture. The use of watchlists(also known as hint list) in large E theories is explored in [105],to improve the proof process automation.

Escher Verifier (the successor of Perfect Developer [71]) isbased on the Verified Design-by-Contract Paradigm [72] inspiredfrom Hoare logic and weakest precondition. It performs staticanalysis and formal verification of C programs by checking theout-of-bounds array indexing, arithmetic overflow, null-pointerde-referencing and other undefined behavior in the program. Itextends the C language with additional keywords and constructsthat are required in programs specifications expression and to givestrength to the C type system [53]. Term rewriting and FOL basedtheorem prover is implemented for the verification purpose. Theunique feature of the tool is that it provides hints for the caseswhen the provers is unable to verify the code automatically [114].Escher verifier is used in the verification of C programs [74],compilers [73] and formal analysis of web applications [75].

Vampire [235] is an ATP for FOL, based on equational super-

Page 16: 1 A Survey on Theorem Provers in Formal Methods

16

position calculus and is one of the best ATPs at CASC. Uniquefeatures of Vampire include the generation of interpolants andimplementation of a limited resource strategy (LRS). Moreover,symbol elimination is also implemented in Vampire that is used toautomatically find first-order program properties. It has a specialmode for working with very large knowledge bases and can answerqueries to them according to TPTP standars. On a multi-coresystem, Vampire can perform several proof attempts in parallel[173]. The strength of ATPs such as Vampire, E and SPASS inproving theorems from MML is presented in [262]. Some work onadding arithmetic to Vampire is done in [171]. Vampire is used in[111] to automate the soundness proofs of type systems. Vampireis also used for program analysis and in proving properties of loopswith arrays [172]. Cheap SAT solvers such as Avatar [233] playsan important role in the success of Vampire. In general, Vampireis well-suited in the domain of type soundness proofs. However,the use of Vampire relies heavily on the size of the chosen axiomset and on the concrete form of individual axioms.

Prover9 [196], the successor of the Otter prover, is a resolutionbased automated prover for equational logic and FOL. Mainstrength of Prover9 is that it is paired with Mace4. Users givesformulas and Prover9 attempts to find a proof. If proof is not findthen Mace4 looks for a counter-example. Similalry, Prooftrans canbe used to transform Prover9 proofs into more detailed proofs,simplify the justifications, re-number and expand proof steps,produce them in XML format, generate hints to guide subsequentsearches and produces proofs for input to proof checkers suchas IVY. Prover9 is used in the analysis of cryptographic-keyAssignment schemes [239], verification of Alloy specificationlanguage [187] and access control policies [238]. Moreover sometasks in combinatorics on words [131] and geometric procedure[216] is also formalized in Prover9, along with proofs of theoremsin Tarskian geometry [264]. Similarly, iProver [170] is based onan instantiation framework for FOL called Inst-Gen [98]. Firstorder reasoning is combined with ground reasoning in iProverwith the help of SAT solver called MiniSat [85]. Main strengths ofiProver are: reasoning with large theories, a predicate eliminationprocedure as a preprocessing technique, EPR-based k-inductionwith counterexample, model representation with first order defini-tions in term algebra and proof extraction for resolution as wellas instantiation. More details on iProver and other ATPs can befound in Appendix A.

Table 10 compares the famous ATPs that perform best atCASC for some features. SonTPTP (Systems on TPTP) is anonline interface for ATPs. It can be used by the users to run theATP on TPTP (thousand problems for theorem provers) library ortheir own problems in the TPTP language [252]. It is important topoint out that MaLARea [261] is not an ATP. It is a simple meta-system that combines several ATPs (E, SPASS, Vampire, etc) witha machine learning based component (SNoW system). MaLAReainterleaves the ATPs by first running them (in cycles) on problems,followed by machine learning from successful proofs. The learnedinformation is then used to limit the set of axioms provided toATPs in the next cycle. In CASC-J9 (2018), MaLARea comesfirst in the LTB division.

ATPs can be integrated (through hammers [43]) with ITPs forthe proof automation in interactive proof development process.Hammers use ATPs to automatically find the proofs for userdefined proof goals. They combine the learning from previousproofs with translation of the goals to the logics of ATPs andreconstruction of the successfully found proofs for the goals.

TABLE 10: Comparison of famous ATPs

Type ILang Lib SS Web serviceE SB-FOP C - + SonTPTP

Vampire SB-FOP C++ + - SonTPTPProver9/Otter RB-FOP C + - SonTPTP

SPASS SB-FOP Java/C - + SonTPTPSatallax TB-HOP OCaml - + SonTPTPiProver IB-FOP OCaml - + SonTPTPLEO-II RB-HOP OCaml + + SonTPTP

MaLARea MS-ATP Perl + - -

SS: standalone system, FOP: first-order prover SB-FOP: super-position-based FOP, RB-FOP: resolution-based FOP, TB-FOP:tableau-based FOP, TB-HOP: tableau-based higher-order prover,IB-FOP: instantaition-based FOP, RB-HOP: resolution-based HOP,SonTPTP: systems on TPTP. MS-ATP: metasystem for ATP

Similarly, the SAT/SMT solvers can be used in ITPs by firsttranslating and passing the goals to the fragment supported bya SAT/SMT solver. The SAT/SMT solver then solve the translatedgoal without human intervention and guidance. Table 11 lists someof the hammers and SAT/SMT solvers that are developed andintegrated with ITPs. Waldmeister [129] is integrated with Agdain [94], but no SAT/SMT solvers is yet integrated with Agda.Moreover, PVS employs the Yices SMT solver as an oracle [237],but not integrated with any ATPs yet. Some new theorem proversthat aims to fill the gap between interactive and automated theoremproving such as Lean [22] offers APIs to access features of SMTsolvers (CVC4, Z3) and ATPs.

TABLE 11: ATPs and SAT/SMT solvers for ITPs

ITP Hammers SAT/SMT solversIsabelle/HOL Sledgehammer [223] Yices in Isabelle/HOL [87]

HOL Light/HOL4 HOLyHammer [152] SMT solvers for HOL4 [269]Mizar MizAR [153] MiniSAT for Mizar [203]Coq Hammer for Coq [76] SMTCoq [86]

ACL2 ATPs for ACL2 [144] Smtlink for ACL2 [224]

6.3 Some Future DirectionsActive research activity is going on in both ITPs and ATPs.Despite the great progress in last three decades, general purposeFOL based theorem provers are still unable to directly determinethe satisfiability of a first-order formula. SMT problem dealswith whether a formula written in first-order is satisfiable insome logical theory. One of the famous theorem prover for SMTproblem is CVC4 [32]. SMT solvers may not terminate on someproblems due to undecidability of FOL. In such cases, users wouldlike to know the reason why the solver failed. Developing toolsfor SMT solvers which allows developers and users in helping thesystem to finish some proofs is an interesting research area.

Currently, most of the SMT solvers display “unknown” whenthey are unable in proving the unsatisfiability of quantified formu-las [234]. Main research direction in SMT solvers is to enablethem to return counter models in case they fail to prove theunsatisfiability of quantified formulas that ranges from integersand inductive datatypes to free sorts. Popular SMT solvers (CVC4,Yices, Z3 etc) generally work in a sequential manner. Oneanother research area is to parallelize SMT solving to betterutilize the capability of hardware in multi-core systems. PZ3[60] (the parallel solver for Z3) is one example for this kindof parallelization. Another important area is the development oftools that can integrate SMT solvers in ITPs to increase the levelof automation by offering safe tactics/strategies for solving proof

Page 17: 1 A Survey on Theorem Provers in Formal Methods

17

goals automatically with the help of external solvers. SMT solversfor ITPs listed in the preceding Table 11 are some example of this.

One of the main challenge in ATPs is reasoning with large on-tologies, which are becoming more dominant as large knowledgebases. Some techniques used for reasoning with large theoriesare based on methods for different axiom selection [130], [253].Machine learning is also used for axiom selection where previousknowledge is used to prove a conjecture [261], [263]. Frameworkfor abstract refinement, where the axioms selection and reasoningphases are interleaved, can also be used in reasoning of largeontologies, as shown recently in [126].

Theorem provers have the limitations that it is not fast enough,the logic is inconvenient as a scripting language and majority oftheorem provers do not support graphics and visualization tools.Similarly, ITPs requires heavy interactions between a user and theproof-assistant, which consumes a lot of time. IDE’s in theoremprovers , especially in ITPs, can substantially facilitate the creationof large proofs. However, very few of them are equipped withfull-fledged IDE’s. Some future work in this direction includes:(i) making the provers fast and efficient, and (ii) developmentof IDE’s and integration of IDE’s in different provers. This willmake these tools more acceptable in industrial sector. Similarly, inITPs, users make use of tactics that reduces a goal to simpler andsmaller sub-goals. Another interesting area is the development ofstrategies/tactics by using tactic languages, such as HITAC [21]and Ltac [80] which will allow users to elaborate proof strategiesand combine tactics.

ITPs also lack the inter-operability among proof-assistants andother related tools, which means that tool support cannot be easilyshared between ITPs. Translation of ITPs to a universal format isneeded to overcome the duplication efforts in the development ofsystems, their libraries and supporting periphery tools. Similarly,some work [99], [100] is done on integrating the model checkingwith theorem provers. However, integrating model checking withtheorem proving is more difficult as it involves the mapping ofmodels and mathematics involved in the analysis of the systems.

The vision of QED manifesto is to develop a universal,computer-based database for all mathematical knowledge thatis formalized in logic and is supported by proofs that can bevalidated mechanically. As shown in previous sections, theoremprovers are diverse in nature with radically different foundations.On one hand, using various provers offers a diverse experience,which helps to better understand the strengths and weaknessesof provers. However, on the other hand, the effort and overheadneeded to learn even one prover effectively makes researchersto stick to using just one system. This results in duplication ofsimilar work. A way of sharing the work and knowledge amongprovers would not be just appealing but it would also make proversmore powerful and practical. One feasible approach is to importtheorems from one prover to another.

Theorems between different systems are transported by trans-lating the libraries between systems, as done in [158], [174],[213]. The main challenge in sharing theorems is to ensure ameaningful semantic match between the provers, meaning thatlogic, definitions, types and treatment of functions, etc. in proversare compatible with each other. Isabelle’s sledgehammer [223]describes the way for integrating different automation tools. How-ever, sledgehammer has number of limitations, such as unsoundtranslation, primitive relevance filter and low performance onhigher-order problems. Integrating ITPs with ATPs still requiresa lot of research into approaches of interfacing. One of the main

challenge is a sound and reliable translation among differentlanguages and logics. Similarly, other main concern is the inter-pretation of ATP outputs back into ITP environment.

ITPs does have a large corpora of computer-understandablereasoning knowledge [42], [121] in the form of libraries. Thesecorpora can play an important role in artificial intelligence basedmethods, such as concept matching, structure formation and theoryexploration. The ongoing fast progress in machine learning anddata mining made it possible to use these learning techniqueson such corpora in guiding the proof search process, in proofautomation and in developing proof tactics/strategies, as indicatedin recent works [101], [105], [147], [154], [209]. Such machinelearning systems can also be combined with ATPs on the largeITPs libraries to develop an artificial intelligence based feedbackloops. Another interesting area is to use evolutionary algorithms[26] in ITPs to find and evolve proofs. Till now, effective searchmechanisms for formal proofs are lacking and we believe thatevolutionary algorithms are more suitable for this task due to theirsuitability to solve search and optimization problems. Moreover,investigating evolutionary/heuristic algorithms (as the program(proof) generator) and ITPs (as the proof verifier) to automaticallyfind formal proofs for new conjectures is also worth pursuing.Some initial work can be found in [136], [276], where a GA is usedwith the Coq to automatically find formal proofs for theorems.

A single tool based on machine learning is developed in[151] that can be used for every ITP on one condition: both thelanguage and its corresponding library are available in a universalformat, so that they can be easily put into the common selectionalgorithm. However, the universal format is generally infeasibleand expensive for many applications. The reason is that it isvery hard to built a universal format that can offer a good trade-off between universality and simplicity. Even if such a universalformat is available, the implementation of library export into theuniversal format is laborious.

One of the such universal format for formal knowledge isthe OMDoc/MMT framework [166], which has been used totranslate Mizar [138], HOL Light [150] and PVS [167] librariesinto OMDoc/MMT framework . Their work has made the librariesbecomes accessible to a wide range of OMDoc-based tools. Itwould be interesting to translate other famous ITPs logics andlibraries to OMDoc/MMT framework for formal mathematics andknowledge management. Also, translation of ITPs to one universalformat will make the machine learning based premise selectionto provers much easier. Moreover, with flexible alignments [149]between the libraries, the developers of different provers can beguided in the approximate translation of contents across librariesand in reuse notations, such as to show one prover content in aform that looks familiar to other prover users.

7 CONCLUSION

Mechanical reasoning systems are actively developed since thebirth of modern logic, and now these state-of-the-art tools areused in proving complicated mathematical theorems and verifyinglarge computer systems. A comprehensive survey on mechanicalreasoning systems (both ITPs and ATPs) is presented in this work.Main characteristics, strengths, differences and application areasof the systems are investigated. Some future research directionsbased on recent work are also discussed. In summary, we findthat formalization with theorem provers have not become ma-ture enough to adopt the working style of vast mathematical

Page 18: 1 A Survey on Theorem Provers in Formal Methods

18

community. It still needs: (i) better libraries support for proofsautomation, (ii) better ways of knowledge sharing between theproof systems, (iii) computation incorporation and verification,(iv) better means to store and search the background facts, (v)improved interfaces and integrations of ATPs, ITPs and SAT/SMTsolvers, and (vi) better support for the machine learning and deepmining techniques for proof guidance and automation. Currently,a wider researcher community is working on these problems and itis sanguinely estimated that mechanically formalized and verifiedmathematics will be a commonplace till the mid of this century.

In this survey, judgments which we made about theoremprovers may be subjective. The authors hope that this surveywill provide a quick and easy guide to the interested users andpeople without good mathematics knowledge into the work oftheorem provers. In advance, we allege for misrepresentationof these systems from their perspective developers or owners.We feel happy to be informed via e-mail about any lapses [email protected].

ACKNOWLEDGMENTS

The work has been supported by the National Natural ScienceFoundation of China under grant no. 61772038, 61532019 and61272160, and the Guandong Science and Technology Depart-ment (Grant no. 2018B010107004).

REFERENCES

[1] M. Aagaard and M. Leeser. Verifying a logic synthesis tool in Nuprl:A case study in software verification. In Proceedings of InternationalConference on Computer Aided Verification, pages 69–81, 1992.

[2] A. T. Abdel-Hamid, S. Tahar, and J. Harrison. Enabling hardwareverification through design changes. In Proceedings of the InternationalConference on Formal Engineering Methods, pages 459–470, 2002.

[3] M. Abdulaziz and L. C. Paulson. An Isabelle/HOL formalisationof Green’s theorem. In Proceedings of International Conference onInteractive Theorem Proving, pages 3–19, 2016.

[4] J. R. Abrial. The B-book: Assigning programs to meanings. CambridgeUniversity Press, 2005.

[5] J. R. Abrial. Formal methods: Theory becoming practice. Journal ofUniversal Computer Science, 13(5):619–628, 2007.

[6] M. Acewicz and K. Pak. Formalization of Pell’s equations in the Mizarsystem. In Proceedings of the Federated Conference on ComputerScience and Information Systems, pages 223–226, 2017.

[7] M. Adams. Introducing HOL Zero. In Proceedings of 3rd InternationalCongress on Mathematical Software, pages 142–143. Springer, 2010.

[8] S. Agerholm, I. Beylin, and P. Dybjer. A comparison of HOL andALF formalizations of a categorical coherence theorem. In Proceedingsof 9th International Conference on Theorem Proving in Higher OrderLogics, pages 17–32, 1996.

[9] S. Agerholm and M. Gordon. Experiments with ZF set theory inHOL and Isabelle. In Proceedings of 8th International Conferenceon Higher Order Logic Theorem Proving and its Applications, pages32–45. Springer, 1995.

[10] F. Aguado, P. Ascariz, P. Cabalar, G. Perez, and C. Vidal. Verificationfor ASP denotational semantics: A case study using the PVS theoremprover. Logic Journal of the IGPL, 25(2):195–213, 2015.

[11] S. Allen, R. Constable, R. Eaton, C. Kreitz, and L. Lorigo. TheNuprl open logical environment. In Proceedings of 17th InternationalConference on Automated Deduction, pages 170–176, 2000.

[12] S. F. Allen, M. Bickford, R. Constable, R. Eaton, and C. Kreitz. ANuprl–PVS connection: Integrating libraries of formal mathematics.Technical Report TR2003-1889, Cornell University, USA, 2003.

[13] S. F. Allen, M. Bickford, R. L. Constable, R. Eaton, C. Kreitz, L. Lorigo,and E. Moran. Innovations in computational type theory using Nuprl.Journal of Applied Logic, 4(4):428–469, 2006.

[14] S. Amani, M. Begel, M. Bortin, and M. Staples. Towards verifyingEthereum smart contract bytecode in Isabelle/HOL. In Proceedings ofthe 7th ACM SIGPLAN International Conference on Certified Programsand Proofs, pages 66–77, 2018.

[15] P. D. Ana Bove and U. Norell. A brief overview of Agda-A functionallanguage with dependent types. In Proceedings of 22nd InternationalConference on Theorem Proving in Higher Order Logic, pages 73–78,2009.

[16] A. Anand and V. Rahli. Towards a formally verified proof assistant.In Proceedings of International Conference on Interactive TheoremProving, pages 27–44, 2014.

[17] S. Antoy, M. Hanus, and S. Libby. Proving non-deterministic com-putations in Agda. In Proceedings of WLP’15/’16/WFLP’16, pages180–195, 2016.

[18] A. W. Appel. Foundational proof-carrying code. In Proceedings of16th Annual Symposium on Logic in Computer Science, pages 247–256,2001.

[19] R. C. Armstrong, R. J. Punnoose, M. H. Wong, and J. R. Mayo.Survey of existing tools for formal verification. Technical report, SandiaNational Laboratories, USA, 2014.

[20] R. Arthan. ProofPower–SLRP user guide. Technical report, Lemma 1Limited, 2005.

[21] D. Aspinall, E. Denney, and C. Luth. A tactic language for Hiproofs.In Proceedings of International Conference on Intelligent ComputerMathematics, pages 339–354, 2008.

[22] J. Avigad, L. de Moura, and S. Kong. Theorem proving in Lean, release3.4.0, 2019.

[23] J. Avigad and J. Harrison. Formally verified mathematics. Communica-tions of the ACM, 57(4):66–75, 2014.

[24] M. Ayala-Rincon and Y. S. Rego. Formalization in PVS of balancingproperties necessary for proving security of the Dolev-Yao cascadeprotocol model. Journal of Formalized Reasoning, 6(1):31–61, 2013.

[25] A. Azurat and W. Prasetya. A survey on embedding programminglogics in a theorem prover. Technical report, UU-CS-2002-007, UtrechtUniversity, Netharlands, 2002.

[26] T. Back. Evolutionary Algorithms in Theory and Practice. OxfordUniversity Press, 1996.

[27] F. Badeau and A. Amelot. Using B as a high level programminglanguage in an industrial project: Roissy VAL. In Proceedings of theInternational Conference of B and Z Users, pages 334–354, 2005.

[28] C. Baier, J. P. Katoen, and K. G. Larsen. Principles of model checking.MIT press, 2008.

[29] H. Barendregt. The impact of the lambda calculus in logic and computerscience. Bulletin of Symbolic Logic, 3(2):181–215, 1997.

[30] H. Barendregt and E. Barendsen. Autarkic computations in formalproofs. Journal of Automated Reasoning, 28(3):321–336, 2002.

[31] H. Barendregt and H. Geuvers. Proof-assistants using dependent typesystems. In Handbook of Automated Reasoning (in 2 volumes), pages1149–1238. 2001.

[32] C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanovic,T. King, A. Reynolds, and C. Tinelli. CVC4. In Proceedings of 23rdInternational Conference on Computer Aided Verification, pages 171–177, 2011.

[33] C. Barrett, R. Nieuwenhuis, A. Oliveras, and C. Tinelli. Splitting ondemand in SAT modulo theories. In Proceedings of 13th InternationalConference on Logic for Programming, Artificial Intelligence, andReasoning, pages 512–526, 2006.

[34] C. W. Barrett, R. Sebastiani, S. A. Seshia, and C. Tinelli. Satisfiabilitymodulo theories. In Handbook of Satisfiability, pages 825–885. 2009.

[35] D. Basin and M. Kaufmann. The Boyer-Moore prover and Nuprl: Anexperimental comparison, logical frameworks, 1991.

[36] A. Bentkamp, J. C. Blanchette, and D. Klakow. A formal proof of theexpressiveness of deep learning. In Proceedings of 8th InternationalConference on International Conference on Interactive Theorem Prov-ing, pages 46–64, 2017.

[37] J. P. Bernardy and P. Jansson. Certified context-free parsing: Aformalisation of Valiant’s algorithm in Agda. Logical Methods inComputer Science, 12, 2016.

[38] Y. Bertot and P. Casteran. Interactive theorem proving and pro-gram development-Coq’Art: The calculus of inductive constructions.Springer, 2013.

[39] W. Bibel. Research perspectives for logic and deduction. In Proceedingsof Reasoning, Action and Interaction in AI Theories and Systems, pages25–43, 2006.

[40] M. Bickford and D. Guaspari. A programming logic for distributedsystems. Technical report, ATC-NY, 2005.

[41] M. Bickford, C. Kreitz, R. van Renesse, and X. Liu. Proving hybridprotocols correct. In Proceedings of 14th International Conference onTheorem Proving in Higher Order Logics, pages 105–120, 2001.

Page 19: 1 A Survey on Theorem Provers in Formal Methods

19

[42] J. C. Blanchette, M. P. L. Haslbeck, D. Matichuk, and T. Nipkow.Mining the archive of formal proofs. In Proceedings of InternationalConference on Intelligent Computer Mathematics, pages 3–17, 2015.

[43] J. C. Blanchette, C. Kaliszyk, L. C. Paulson, and J. Urban. Hammeringtowards QED. Journal of Formalized Reasoning, 9(1):101–148, 2016.

[44] S. Blazy, Z. Dargaye, and X. Leroy. Formal verification of a C compilerfront-end. In Proceedings of 2006 International Symposium on FormalMethods, pages 460–475, 2006.

[45] S. Boldo, C. Lelay, and G. Melquiond. Coquelicot: A user-friendlylibrary of real analysis for Coq. Technical report, INRIA, France, 2013.

[46] S. Boldo, C. Lelay, and G. Melquiond. Formalization of real analysis:A survey of proof assistants and libraries. Mathematical Structures inComputer Science, 26(7):1196–1233, 2016.

[47] R. Boyer. The QED manifesto. In Proceedings of 12th InternationalConference on Automated Deduction, pages 238–251, 1994.

[48] P. Brereton, B. A. Kitchenham, D. Budgen, M. Turner, and M. Khalil.Lessons from applying the systematic literature review process withinthe software engineering domain. Journal of Systems and Softwares,80(4):571–583, 2007.

[49] C. Brown. Satallax: An automatic higher-order prover. In Proceedingsof 6th International Joint Conference on Automated Reasoning, pages111–117. Springer, 2012.

[50] D. F. Butin. Inductive analysis of security protocols in Isabelle/HOLwith applications to electronic voting. PhD thesis, 2012.

[51] R. W. Butler. Formalization of the integral calculus in the PVS theoremprover. Journal of Formalized Reasoning, 2(1):1–26, 2009.

[52] J. Camilleri. A hybrid approach to verifying liveness in a symmetricmultiprocessor. In 10th International Conference on Theorem Provingin Higher-Order Logics, pages 49–67, 1997.

[53] J. Carlton and D. Crocker. Escher verification studio perfect developerand Escher C verifier. Industrial Use of Formal Methods: FormalVerification, pages 155–193, 2013.

[54] M. Carneiro. Models for Metamath. Technical report, The Ohio StateUniversity, USA, 2015.

[55] M. Carneiro. Formalization of the prime number theorem and Dirich-let’s theorem. In Joint Proceedings of the FM4M, MathUI, and ThEduWorkshops, pages 10–13, 2016.

[56] M. M. Carneiro. Conversion of HOL Light proofs into Metamath.Journal of Formalized Reasoning, 9:187–200, 2016.

[57] L. Casset. Development of an embedded verifier for Java card bytecode using formal methods. In Proceedings of Formal Methods Europe,pages 20–309, 2002.

[58] R. N. Charette. Automated to death. IEEE Spectrum, 15, 2009.[59] C. K. Chau, W. A. Hunt, M. Roncken, and I. E. Sutherland. A

framework for asynchronous circuit modeling and verification in ACL2.In Proceedings of 13th International Haifa Conference on Hardwareand Software: Verification and Testing, pages 3–18, 2017.

[60] X. Cheng, M. Zhou, X. Song, M. Gu, and J. Sun. Parallelizing SMTsolving: Lazy decomposition and conciliation. Artificial Intelligence,257:127–157, 2018.

[61] S. J. Chou. Geo prover- A geometry theorem prover developed atUT. In Proceedings of the 8th International Conference on AutomatedDeduction, pages 679–680. Springer, 1986.

[62] E. M. Clarke, W. Klieber, M. Novacek, and P. Zuliani. Model checkingand the state explosion problem. In Tools for Practical SoftwareVerification, pages 1–30. 2012.

[63] C. Cohen and D. Rouhling. A formal proof in Coq of LaSalles’sinvariance principle. In Proceedings of International Conference onInteractive Theorem Proving, pages 148–163, 2017.

[64] R. L. Constable, S. F. Allen, H. M. Bromley, W. R. Cleaveland, J. F.Cremer, R. W. Harper, D. J. Howe, T. B. Knoblock, N. P. Mendler,P. Panangaden, J. T. Sasaki, and S. F. Smith. Implementing Mathematicswith the Nuprl proof development system. Prentice Hall, 1986.

[65] S. A. Cook. The complexity of theorem-proving procedures. InProceedings of the 3rd Annual Symposium on Theory of Computing,pages 151–158. ACM, 1971.

[66] E. Copello, A. Tasistro, and B. Bianchi. Case of (quite) painlessdependently typed programming: Fully certified merge sort in Agda.In Proceedings of Brazilian Symposium on Programming Languages,pages 62–76, 2014.

[67] T. Coquand and G. Huet. The calculus of constructions. Informationand Computation, 76(2–3):95–120, 1988.

[68] P. Corbineau. A declarative language for the Coq proof assistant.In Proceedings of International Workshop on Types for Proofs andPrograms, pages 69–84, 2007.

[69] K. Crary. Toward a foundational typed assembly language. In Pro-ceedings of International Symposium on the Principles of ProgrammingLanguages, pages 198–212, 2003.

[70] K. Crary and S. Sarkar. Foundational certified code in the Twelfmetalogical framework. ACM Transactions on Computational Logic,9(3):1–26, 2008.

[71] D. Crocker. Perfect developer: A tool for object-oriented formalspecification and refinement. In Tools Exhibition Notes at FormalMethods Europe, 2003.

[72] D. Crocker. Safe object-oriented software: The verified design-by-contract paradigm. In Proceedings of 12th Safety-Critical SystemsSymposium, pages 19–41, 2004.

[73] D. Crocker. Verifying compilers for financial applications. In GrandChallenge 6 workshop of Formal Methods, 2005.

[74] D. Crocker and J. Carlton. Verification of C programs using automatedreasoning. In Proceedings of 5th International Conference on SoftwareEngineering and Formal Methods, pages 1–8, 2007.

[75] D. Crocker and J. H. Warren. Generating commercial web applicationsfrom precise requirements and formal specifications. In 1st Interna-tional Workshop on Automated Specification and Verification of WebSites, pages 1–6, 2005.

[76] L. Czajka and C. Kaliszyk. Hammer for Coq: Automation for dependenttype theory. Journal of Automated Reasoning, 61(1-4):423–453, 2018.

[77] E. Davis and G. Marcus. The scope and limits of simulation inautomated reasoning. Artificial Intelligence, 233:60–72, 2016.

[78] M. Davis. The prehistory and early history of automated deduction. InAutomation of Reasoning 1: Classical Papers on Computational Logic1957-1966, pages 1–28. 1983.

[79] M. Davis, G. Logemann, and D. Loveland. A machine program fortheorem-proving. Communications of the ACM, 5(7):394–397, 1962.

[80] D. Delahaye. A tactic language for the system Coq. In Proceedings of7th International Conference on Logic for Programming and AutomatedReasoning, pages 85–95, 2000.

[81] E. Denney, B. Fischer, and J. Schumann. An empirical evaluationof automated theorem provers in software certification. InternationalJournal on Artificial Intelligence Tools, 15(1):81–108, 2006.

[82] E. W. Dijkstra. Cooperating sequential processes. In The Origin ofConcurrent Programming, pages 65–138. 1968.

[83] C. Doczkal and G. Smolka. Two-way automata in Coq. In Proceedingsof International Conference on Interactive Theorem Proving, pages151–166, 2016.

[84] M. Eberl. Proving divide and conquer complexities in Isabelle/HOL.Journal of Automated Reasoning, 58(4):483–508, 2017.

[85] N. Een and N. Sorensson. An extensible SAT-solver. In Proceedings of6th International Conference on Theory and Applications of Satisfiabil-ity Testing, pages 502–518, 2003.

[86] B. Ekici, A. Mebsout, C. Tinelli, C. Keller, G. Katz, A. Reynolds, andC. W. Barrett. Smtcoq: A plug-in for integrating SMT solvers into Coq.In Proceedings of 29th International Conference on Computer AidedVerification, pages 126–133, 2017.

[87] L. Erkk and J. Matthews. Using Yices as an automated solver inIsabelle/HOL. In Proceedings of Automated Formal Methods, pages3–13, 2008.

[88] A. Faithfull, J. Bengtson, E. Tassi, and C. Tankink. Coqoon - An IDEfor interactive proof development in Coq. International Journal onSoftware Tools for Technology Transfer, 20(2):125–137, 2018.

[89] A. P. Felty, E. L. Gunter, J. Hannan, D. Miller, G. Nadathur, and A. Sce-drov. Lambda-Prolog: An extended logic programming language. InProceedings of 9th International Conference on Automated Deduction,pages 754–755, 1988.

[90] J. F. Ferreira, S. A. Johnson, A. Mendes, and P. J. Brooke. Certifiedpassword quality - A case study using Coq and Linux pluggableauthentication modules. In Proceedings of International Conferenceon Integrated Formal Methods, pages 407–421, 2017.

[91] A. Flatau, M. Kaufmann, D. Reed, D. Russinoff, E. Smith, and R. Sum-ners. Formal verification of microprocessors at AMD. In Proceedingsof Designing Correct Circuits, 2002.

[92] J. P. P. Flor, W. Swierstra, and Y. Sijsling. Pi-Ware: Hardwaredescription and verification in Agda. In 21st International Conferenceon Types for Proofs and Programs, pages 1–27, 2015.

[93] L. Fortnow. The status of the P versus NP problem. Communicationsof the ACM, 52(9):78–86, 2009.

[94] S. Foster and G. Struth. Integrating an automated theorem prover intoAgda. In Proceedings of NASA Formal Methods Symposium, pages116–130, 2011.

[95] J. Franco and J. Martin. A history of satisfiability. volume 185 ofFrontiers in Artificial Intelligence and Applications. IOS Press, 2009.

Page 20: 1 A Survey on Theorem Provers in Formal Methods

20

[96] A. Gabrielli and M. Maggesi. Formalizing basic quaternionic analysis.In Proceedings of International Conference on Interactive TheoremProving, pages 225–240, 2017.

[97] J. H. Gallier. Logic for Computer Science: Foundations of automatictheorem proving. Courier Dover Publications, 2015.

[98] H. Ganzinger and K. Korovin. New directions in instantiation-basedtheorem proving. In Proceedings of 18th Symposium on Logic inComputer Science, pages 55–64, 2003.

[99] B. V. Gastel, L. Lensink, S. Smetsers, and M. van Eekelen. Reentrantreaders-writers: A case study combining model checking with theoremproving. In Proceedings of International Worksop on Formal Methodsfor Industrial Critical Systems, pages 85–102, 2008.

[100] B. V. Gastel, L. Lensink, S. Smetsers, and M. van Eekelen. Deadlockand starvation free reentrant readers–writers: A case study combiningmodel checking with theorem proving. Science of Computer Program-ming, 76(2):82–99, 2011.

[101] T. Gauthier, C. Kaliszyk, and J. Urban. Tactictoe: Learning to reasonwith HOL4 tactics. In Proceedings of 21st International Conferenceon Logic for Programming, Artificial Intelligence and Reasoning, pages125–143, 2017.

[102] H. Geuvers, R. Pollack, F. Wiedijk, and J. Zwanenburg. A construc-tive algebraic hierarchy in Coq. Journal of Symbolic Computation,34(4):271–286, 2002.

[103] F. Gilbert. Proof certificates in PVS. In Proceedings of InternationalConference on Interactive Theorem Proving, pages 262–268, 2017.

[104] S. Goel. The x86isa books: Features, usage, and future plans. InProceedings of 14th International Workshop on the ACL2 TheoremProver and its Applications, pages 1–17, 2017.

[105] Z. Goertzel, J. Jakubuv, S. Schulz, and J. Urban. Proofwatch: Watchlistguidance for large theories in E. In Proceedings of 9th InternationalConference on Interactive Theorem Proving, pages 270–288, 2018.

[106] G. Gonthier. Formal proof of the Four-Color theorem. Notices of theAmerican Mathematical Society, 55:182–1393, 2008.

[107] G. Gonthier, A. Asperti, J. Avigad, Y. Bertot, C. Cohen, F. Garillot,S. L. Roux, A. Mahboubi, R. OConnor, S. O. Biha, I. Pasca, L. Rideau,A. Solovyev, E. Tassi, and L. Thery. A machine-checked proof of theodd order theorem. In Proceedings of 4th International Conference onInteractive Theorem Proving, pages 163–179, 2013.

[108] M. J. C. Gordon, W. A. Hunt, M. Kaufmann, and J. Reynolds. Anembedding of the ACL2 logic in HOL. In Proceedings of the 6th Inter-national Workshop on the ACL2 Theorem Prover and its Applications,pages 40–46. ACM, 2006.

[109] M. J. C. Gordon, J. Reynolds, W. A. Hunt, and M. Kaufmann. Anintegration of HOL and ACL2. In Proceedings of the 6th InternationalConference on Formal Methods in Computer-Aided Design, pages 153–160, 2006.

[110] A. Grabowski, A. Kornilowicz, and A. Naumowicz. Mizar in a nutshell.Journal of Formalized Reasoning, 3(2):153–245, 2010.

[111] S. Grewe, S. Erdweg, and M. Mezini. Using vampire in soundnessproofs of type systems. Technical report, 2016.

[112] D. Griffioen and M. Huisman. A comparison of PVS and Isabelle/HOL.In Proccedings of 11th International Conference on Theorem Provingin Higher Order Logics, pages 123–142, 1998.

[113] T. C. Hales. Mathematics in the age of the turing machine. Turing’slegacy. Technical report, University of Pittsburgh, 2013.

[114] M. R. Harbach. Methods and tools for the formal verification ofsoftware. Master’s thesis, Technical University Wien, 2011.

[115] J. Harrison. A Mizar mode for HOL. In Proceedings of 9th InternationalConference on Theorem Proving in Higher Order Logics, pages 203–220, 1996.

[116] J. Harrison. Floating-point verification. Journal of Universal ComputerScience, 13(5):629–638, 2007.

[117] J. Harrison. A short survey of automated reasoning. In Proceedings ofInternational Conference on Algebraic Biology, pages 334–349, 2007.

[118] J. Harrison. Formal proof theory and practice. Notices of the AmericanMathematical Society, 55(11):1395–1406, 2008.

[119] J. Harrison. Handbook of Practical Logic and Automated Reasoning.Cambridge University Press, 2009.

[120] J. Harrison. Hol Light: An overview. In Proceedings of 22st Interna-tional Conference on Theorem Proving in Higher Order Logics, pages60–66. Springer, 2009.

[121] J. Harrison, J. Urban, and F. Wiedijk. History of interactive theoremproving. In Computational Logic, volume 9, pages 135–214, 2014.

[122] O. Hasan and S. Tahar. Performance analysis and functional verificationof the stop-and-wait protocol in HOL. Journal of Automated Reasoning,42(1):1–33, 2009.

[123] O. Hasan and S. Tahar. Formal verification methods. In Encyclopedia ofInformation Science and Technology, Third Edition, pages 7162–7170.IGI Global, 2015.

[124] J. V. Heijenoort. Historical development of modern logic. LogicaUniversalis, pages 1–11, 2012.

[125] J. Heras and E. Komendantskaya. ACL2(ml): Machine-learning forACL2. In Proceedings of 12th International Workshop on ACL2Theorem Prover and its Applications, pages 461–75, 2014.

[126] J. C. L. Hernandez and K. Korovin. Towards an abstraction-refinementframework for reasoning with large theories. In Proceedings of IWILWorkshop and LPAR Short Presentations, pages 119–123, 2017.

[127] W. Hesselink and M. IJbema. Starvation-free mutual exclusion withsemaphores. Formal Aspects of Computing, 25:947–969, 2013.

[128] W. Hesselink and M. I. Lali. Formalizing a hierarchical file system.Formal Aspects of Computing, 24:27–44, 2012.

[129] T. Hillenbrand, A. Buch, R. Vogt, and B. Lochner. Waldmeister: Highperformance equational deduction. Journal of Automated Reasoning,18:265–270, 1997.

[130] K. Hoder and A. Voronkov. Sine qua non for large theory reasoning. InProceedings of 23rd International Conference on Automated Deduction,pages 299–314, 2011.

[131] S. Holub and R. Veroff. Formalizing a fragment of combinatorics onwords. In Proceedings of 13th International Conference on Computabil-ity in Europe, pages 24–31, 2017.

[132] J. Holzl. Markov chains and markov decision processes in Is-abelle/HOL. Journal of Automated Reasoning, 59(3):345–387, 2017.

[133] W. Hong, M. S. Nawaz, X. Zhang, Y. Li, and M. Sun. Using Coq forformal modeling and verification of timed connectors. In Proceedingsof Software Engineering and Formal Methods: SEFM 2017 CollocatedWorkshops, Revised Selected Papers, pages 558–573.

[134] D. J. Howe. Semantic foundations for embedding HOL in Nuprl. InProceedings of International Conference on Algebraic Methodology andSoftware Technology, pages 85–101, 1996.

[135] D. J. Howe. Reasoning about functional programs in Nuprl. InProceedings of Functional Programming, Concurrency, Simulation andAutomated Reasoning, pages 145–164, 2005.

[136] S. Huang and Y. Chen. Proving theorems by using evolutionary searchwith human involvement. In Proceedings of Congress on EvolutionaryComputation, pages 1495–1502, 2017.

[137] W. A. Hunt, M. Kaufmann, J. S. Moore, and A. Slobodova. Industrialhardware and software verification with ACL2. Philosophical Transac-tions A, 375:20150399, 2017.

[138] M. Iancu, M. Kohlhase, F. Rabe, and J. Urban. The Mizar mathematicallibrary in OMDoc: Translation and applications. Journal of AutomatedReasoning, 50(2):191–202, 2013.

[139] L. Jakubiec, S. Coupet-Grimal, and P. Curzon. A comparison of the Coqand HOL proof systems for specifying hardware. Short Presentationsat 10th International Conference on Theorem Proving in Higher OrderLogics, 63:78, 1997.

[140] J. Jakubuv and J. Urban. Extending E prover with similarity basedclause selection strategies. In Proceedings of 9th International Confer-ence on Intelligent Computer Mathematics, pages 151–156, 2016.

[141] P. Janicic. Automated reasoning: Some successes and new challenges.In Proceedings of the 22nd Central European Conference on Informa-tion and Intelligent Systems, 2011.

[142] S. Jaskowski. On the rules of supposition in formal logic. Studia Logica,1, 1934.

[143] A. Jeffrey. Dependently typed web client applications. In Proceed-ings of International Symposium on Practical Aspects of DeclarativeLanguages, pages 228–243, 2013.

[144] S. J. C. Joosten, C. Kaliszyk, and J. Urban. Initial experimentswith TPTP-style automated theorem provers on ACL2 problems. InProceedings of 12th International Workshop on the ACL2 TheoremProver and its Applications, pages 77–85, 2014.

[145] J. Julliand, B. Legeard, T. Machicoane, B. Parreaux, and B. Tatibouet.Specification of an integrated circuit card protocol application using theB method and linear temporal logic. In Proceedings of InternationalConference of B Users, pages 273–292, 1998.

[146] J. Kaiser, B. Pientka, and G. Smolka. Relating system F and Lambda2:A case study in Coq, Abella and Beluga. In Proceedings of 2ndInternational Conference on Formal Structures for Computation andDeduction, pages 21:1–21:19, 2017.

[147] C. Kaliszyk, F. Chollet, and C. Szegedy. HolStep : A machine learningdataset for higher order logic theorem proving. In Proceedings of 5thInternational Conference on Learning Representations, 2017.

[148] C. Kaliszyk and K. Pak. Progress in the independent certificationof Mizar mathematical library in Isabelle. In Proceedings of 12th

Page 21: 1 A Survey on Theorem Provers in Formal Methods

21

Federated Conference on Computer Science and Information Systems,pages 227–236, 2017.

[149] C. Kaliszyk, K. Pak, and J. Urban. Towards a Mizar environment forIsabelle: Foundations and language. In Proceedings of the Conferenceon Certified Programs and Proofs, pages 58–65, 2016.

[150] C. Kaliszyk and F. Rabe. Towards knowledge management for HOLLight. In Proceedings of Internaional Conference on Intelligent Com-puter Mathematics, pages 357–372, 2014.

[151] C. Kaliszyk and J. Urban. Hol(y)hammer: Online ATP service for HOLLight. Mathematics in Computer Science, 9(1):5–22, 2015.

[152] C. Kaliszyk and J. Urban. Hol(y)hammer: Online ATP service for HOLLight. Mathematics in Computer Science, 9(1):5–22, 2015.

[153] C. Kaliszyk and J. Urban. MizAR 40 for Mizar 40. Journal ofAutomated Reasoning, 55(3):245–256, 2015.

[154] C. Kaliszyk, J. Urban, H. Michalewski, and M. Olsak. Reinforcementlearning of theorem proving. In Proceedings of Annual Conference onNeural Information Processing Systems, pages 8836–8847, 2018.

[155] K. Kanso. Agda as a platform for the development of verified railwayinterlocking systems. PhD thesis, 2012.

[156] M. Kaufmann and J. S. Moore. An industrial strength theorem proverfor a logic based on Common Lisp. IEEE Transactions on SoftwareEngineering, 23(4):203–213, 1997.

[157] M. Kaufmann and J. S. Moore. ACL2 and its applications to digitalsystem verification. In Proceedings of the International Conference onDesign and Verification of Microprocessor Systems for High-AssuranceApplications, 2010.

[158] C. Keller and B. Werner. Importing HOL Light into Coq. In Proceedingsof International Conference on Interactive Theorem Proving, pages307–322, 2010.

[159] M. Kerber, C. Lange, and C. Rowat. Formal representation andproof for cooperative games. In Symposium on Mathematical Practiceand Cognition II. Society for the Study of Artificial Intelligence andSimulation of Behaviour, pages 15–18, 2012.

[160] M. Kerber, C. Lange, and C. Rowat. An introduction to mechanizedreasoning. Journal of Mathematical Economics, 66:26–39, 2016.

[161] T. Kim, D. Stringer-Calvert, and S. Cha. Formal verification of func-tional properties of an SCR-style software requirements specificationusing PVS. In Proceedings of 8th International Conference on Toolsand Algorithms for the Construction and Analysis of Systems, pages205–220, 2002.

[162] B. Kitchenham, H. Al-Khilidar, M. A. Babar, M. Berry, K. Cox,J. Keung, F. Kurniawati, M. Staples, H. Zhang, and L. Zhu. Evalu-ating guidelines for reporting empirical software engineering studies.Emperical Software Engineering, 13(1):97–121, 2008.

[163] G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin,D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell,H. Tuch, and S. Winwood. seL4: Formal verification of an OS kernel. InProceedins of 22nd Symposium on Operating System Principles, pages200–220, 2009.

[164] G. Klein and T. Nipkow. Verified lightweight bytecode verification.Concurrency and Computataion: Practice and Experience, 13:1133–1151, 2001.

[165] G. Klein and T. Nipkow. Applications of interactive proof to data flowanalysis and security. In Software Systems Safety, pages 77–134, 2014.

[166] M. Kohlhase. OMDoc - An Open Markup Format for MathematicalDocuments [version 1.2], volume 4180 of LNCS. Springer, 2006.

[167] M. Kohlhase, D. Mller, S. Owre, and F. Rabe. Making PVS accessible togeneric services by interpretation in a universal format. In Proceedingsof 8th International Conference on Interactive Theorem Proving, pages319–335, 2017.

[168] W. Kokke. Formalising type-logical grammars in Agda. In Proceedingsof 1st Workshop on Type Theory and Lexical Semantics, 2015.

[169] A. Kornilowicz, A. Kryvolap, M. Nikitchenko, and I. Ivanov. Formal-ization of the nominative algorithmic algebra in Mizar. In Proceedins of38th Iternational Conference on Information Systems Architecture andTechnology, pages 176–186, 2017.

[170] K. Korovin. iProver - an instantiation-based theorem prover for first-order logic. In Proceedings of 4th International Joint Conference onAutomated Reasoning, pages 292–298, 2008.

[171] K. Korovin and A. Voronkov. Solving systems of linear inequalities bybound propagation. In Proceedings of 23rd International Conferenceon Automated Deduction, pages 269–383, 2011.

[172] L. Kovacs and A. Voronkov. Finding loop invariants for programs overarrays using a theorem prover. In Proceedings of the International Con-ference on Fundamental Approaches to Software Engineering, pages470–485, 2009.

[173] L. Kovacs and A. Voronkov. First-order theorem proving and Vampire.In Proceedings of 25th International Conference on Computer AidedVerification, pages 1–35, 2013.

[174] A. Krauss and A. Schropp. A mechanized translation from higher-order logic to set theory. In Proceedings of International Conference onInteractive Theorem Proving, pages 323–338, 2010.

[175] C. Kreitz. Building reliable, high-performance networks with the Nuprlproof development system. Journal of Functional Programming, 14:21–68, 2004.

[176] T. Kropf. Introduction to formal hardware verification. Springer, 2013.[177] T. Lecomte, D. Deharbe, E. Prun, and E. Mottin. Applying a formal

method in industry: A 25-year trajectory. In Proceedings of 20thBrazilian Symposium on Formal Methods, pages 70–87, 2017.

[178] T. Lecomte, T. Servat, and G. Pouzancre. Formal methods in safety-critical railway systems. In 10th Brasilian Symposium on FormalMethods, 2007.

[179] D. Lee, K. Crary, and R. Harper. Towards a mechanized metatheory ofstandard ML. In Proceedings of 34th Symposium on the Principles ofProgramming Languages, pages 173–184, 2007.

[180] X. Leroy. A formally verified compiler back-end. Journal of AutomatedReasoning, 43:363–446, 2009.

[181] M. Lesser. Using Nuprl for the verification and synthesis of hardware.Philosophical Transactions of the Royal Society of London A: Mathe-matical, Physical and Engineering Sciences, 339(1652):49–68, 1992.

[182] P. Letouzey. Extraction in Coq: An overview. In Proceedings of 4thConference on Computability in Logic and Theory of Algorithms, pages359–369, 2008.

[183] R. Letz, J. Schuman, S. Bayerl, and W. Bibel. SETHEO: Ahigh-performance theorem prover. Journal of Automated Reasoning,8(2):183–212, 1992.

[184] N. G. Levenson. System safety and computers. Addison Wesley, 1995.[185] Y. Li and M. Sun. Modeling and verification of component connectors

in Coq. Science in Computer Progamming, 113:285–301, 2015.[186] F. Lindblad and M. Benke. A tool for automated theorem proving in

Agda. In Proceedings of International Workshop on Types for Proofsand Programs, pages 154–169, 2004.

[187] N. Macedo and A. Cunha. Automatic unbounded verification of Alloyspecifications with Prover9. CoRR, abs/1209.5773, 2012.

[188] D. Mackenzie. The automation of proof: A historical and sociologicalexploration. IEEE Annals of the History of Computing, 17(3):7–29,1995.

[189] J. Mackie and I. Sommerville. Failures of healthcare systems. InProceedings of the 1st Dependability IRC Workshop, pages 1–8, 2000.

[190] M. Maggesi. A formalization of metric spaces in HOL light. Journal ofAutomated Reasoning, 60(2):237–254, 2018.

[191] F. Maric. A survey of interactive theorem proving. Zb. Rad, 18:173–223, 2015.

[192] P. Masci, P. Curzon, D. Furniss, and A. Blandford. Using PVS tosupport the analysis of distributed cognition systems. Innovations inSystems and Software Engineering, 11:113–130, 2015.

[193] P. Masci, Y. Zhang, P. Jones, P. Curzon, and H. Thimbleby. Formal ver-ification of medical device user interfaces using PVS. In Proceedings ofthe International Conference on Fundamental Approaches to SoftwareEngineering, pages 200–214, 2014.

[194] R. Matuszewski and P. Rudnicki. Mizar: The first 30 years. MechanizedMathematics and Its Applications, 4:3–24, 2005.

[195] M. Mayero. The three gap theorem (steinhauss conjecture). Technicalreport, INRIA, France, 2006.

[196] W. McCune. Prover9 and Mace4. Available at:cs.unm.edu/˜mccune/prover9, 2005-2010.

[197] W. W. McCune. Otter 3.0 reference manual and guide. Technical report,ANL-94/6, Argonne National Laboratory, 1994.

[198] N. Megill. Metamath: A Computer Language for Pure Mathematics.Lulu Press, USA, 2007.

[199] J. S. Moore. A mechanical analysis of program verification strategies.Formal Methods in System Design, 14:213–228, 1999.

[200] J. S. Moore. Proving theorems about Java-Like byte code. In CorrectSystem Design, 2000.

[201] G. J. Myers, T. Badgett, and C. Snadler. The art of software testing,Third Edition. John Wiley & Sons Publishers, 2011.

[202] P. Naumov, M.-O. Stehr, and J. Meseguer. The HOL/NuPRL prooftranslator. Proceedings of 14th International Conference on TheoremProving in Higher Order Logic, 2152:329–345, 2001.

[203] A. Naumowicz. SAT-enhanced Mizar proof checking. In Proceedingsof 7th International Conference on Intelligent Computer Mathematics,pages 449–452, 2014.

Page 22: 1 A Survey on Theorem Provers in Formal Methods

22

[204] M. S. Nawaz, M. I. Lali, and S. Meng. Formal modeling, analysisand verification of Black White Bakery algorithm. In 9th InternationalConference on Intelligent Human-Machine Systems and Cybernatics,pages 407–410. IEEE, 2017.

[205] M. S. Nawaz and M. Sun. A formal design model for genetic algorithmsoperators and its encoding in PVS. In Proceedings of 2nd InternationalConference on Big Data and Internet of Things, pages 2186–190, 2018.

[206] M. S. Nawaz and M. Sun. Reo2PVS: Formal specification and verifica-tion of component connectors. In Proceedings of 30th InternationalConference on Software Engineering and Knowledge Engineering,pages 391–396, 2018.

[207] M. S. Nawaz and M. Sun. Using PVS for modeling and verifying cloudservices and their composition. In Proceedings of 6th InternationalConference on Advanced Cloud and Big Data, pages 41–46, 2018.

[208] M. S. Nawaz and M. Sun. Using PVS for modeling and verificationof probabilistic connectors. In Proceedings of International Conferenceon Fundamentals of Software Engineering, pages 61–76, 2019.

[209] M. S. Nawaz, M. Sun, and P. Fournier-Viger. Proof guidance inPVS with sequential pattern mining. In Proceedings of InternationalConference on Fundamentals of Software Engineering, pages 45–60,2019.

[210] T. Nipkow, L. C. Paulson, and M. Wenzel. Isabelle/HOL - A proofassistant for higher-order logic, volume 2283 of LNCS. Springer, 2002.

[211] U. Norell. Dependently typed programming in Agda. In Proceedingsof the 4th International Workshop on Types in Language Design andImplementation, pages 1–2. ACM, 2009.

[212] M. Norrish. Formalising C in HOL. PhD thesis, Computer Laboratory,University of Cambridge, 1998.

[213] S. Obua and S. Skalberg. Importing HOL into Isabelle/HOL. In Pro-ceedings of 3rd Inernational Joint Conference on Automated Reasoning,pages 298–302, 2006.

[214] H. Okazaki and Y. Futa. Formalization of polynomially bounded andnegligible functions using the computer-aided proof-checking systemMizar. In Proceedings of 9th Conference on Intelligent ComputerMathematics, pages 117–131, 2016.

[215] S. Owre, J. M. Rushby, N. Shankar, and M. K. Srivas. A tutorial on us-ing PVS for hardware verification. In Proceedings of 2nd InternationalConference on Theorem Provers in Circuit Design - Theory, Practiceand Experience, pages 258–279, 1994.

[216] R. Padmanabhan and R. Veroff. A geometric procedure with Prover9. InAutomated Reasoning and Mathematics - Essays in Memory of WilliamW. McCune, pages 139–150, 2013.

[217] D. Page. Report of the inquiry into the london ambulance service, 1993.[218] H. M. Palombo, H. Zheng, and J. Ligatti. POSTER: Towards precise

and automated verification of security protocols in Coq. In Proceedingsof the 2017 Conference on Computer and Communications Security,pages 2567–2569, 2017.

[219] P. Papapanagiotou and J. D. Fleuriot. The Boyer-Moore waterfall modelrevisited. CoRR, abs/1808.03810, 2018.

[220] C. Paulin-Mohring. Circuits as streams in Coq: Verification of asequential multiplier. In Proceedings of the International Workshop onTypes for Proofs and Programs, slected papers, pages 216–230, 2005.

[221] L. C. Paulson. Logic and Computation: Interactive Proof with Cam-bridge LCF. Cambridge University Press, 1987.

[222] L. C. Paulson. Isabelle: A generic theorem prover. Springer, 1994.[223] L. C. Paulson and J. Blanchette. Three years of experience with

Sledgehammer, a practical link between automated and interactivetheorem provers. In Invited talk at 8th International Workshop on theImplementation of Logics, 2010.

[224] Y. Peng and M. R. Greenstreet. Extending ACL2 with SMT solvers.In Proceedings of 13th International Workshop on the ACL2 TheoremProver and Its Applications, pages 61–77, 2015.

[225] F. Pfenning. Structural cut elimination. In Proceedings of 10th AnnualSymposium on Logic in Computer Science, pages 156–166, 1995.

[226] F. Pfenning and C. Schurmann. System description: Twelf-a meta-logical framework for deductive systems. In Proceedings of Interna-tional Conference on Automated Deduction, pages 202–206, 1999.

[227] D. L. Rager. Adding parallelism capabilities to ACL2. In Proceedingsof the 6th International Workshop on the ACL2 Theorem Prover and itsapplications, pages 90–94. ACM, 2006.

[228] V. Rahli and M. Bickford. Coq as a metatheory for Nuprl with barinduction. Technical report, Cornell University, 2015.

[229] D. Ramachandran, P. Reagan, and K. Goolsbery. First-orderized re-searchcyc : Expressivity and efficiency in a common-sense ontology. InProceedings of AAAI Workshop on Contexts and Ontologies: Theory,Practice and Applications, 2005.

[230] R. Rand, J. Paykin, and S. Zdancewic. QWIRE practice: Formal verifi-cation of quantum circuits in Coq. In Proceedings of 14th InternationalConference on Quantum Physics and Logic, pages 119–132, 2017.

[231] S. Ranise and D. Deharbe. Applying light-weight theorem provingto debugging and verifying pointer programs. In Proceedings of 4thInternational Workshop on First-Order Theorem Proving, pages 109–119, 2009.

[232] A. Rashid, O. Hasan, U. Siddique, and S. Tahar. Formal reasoning aboutsystems biology using theorem proving. PLoS ONE, 12(7):e0180179,2017.

[233] G. Reger, M. Suda, and A. Voronkov. Playing with AVATAR. InProceedings of the 25th Internal Conference on Automated Deduction,pages 399–415, 2015.

[234] A. Reynolds, C. Tinelli, and C. Barrett. Constraint solving for finitemodel finding in SMT solvers. Theory and Practice of Logic Program-ming, 17(4):516–558, 2017.

[235] A. Riazanov and A. Voronkov. Vampire 1.1 (system description).In Proceedings of 1st International Joint Conference on AutomatedReasoning, pages 376–380, 2001.

[236] D. Rouhling. A formal proof in Coq of a control function for theinverted pendulum. In Proceedings of the 7th International Conferenceon Certified Programs and Proofs, pages 28–41, 2018.

[237] J. M. Rushby. Tutorial: Automated formal methods with PVS, SAL,and Yices. In 4th International Conference on Software Engineeringand Formal Methods, page 262, 2006.

[238] K. E. Sabri. Automated verification of role-based access control policiesconstraints using Prover9. CoRR, abs/1503.07645, 2015.

[239] K. E. Sabri and R. Khedri. A generic algebraic model for the analysisof cryptographic-key assignment schemes. In Proceedings of 5thInternational Symposium on Foundations and Practice of Security,pages 62–77, 2012.

[240] S. Schulz. E-A brainiac theorem prover. Ai Communications, 15(2,3):111–126, 2002.

[241] S. Schulz, S. Cruanes, and P. Vukmirovic. Faster, higher, stronger: E2.3. In Proceedings of 27th International Conference on AutomatedDeduction, pages 495–507, 2019.

[242] C. Schurmann and M. Stehr. An executable formalization of theHOL/Nuprl connection in the meta-logical framework Twelf. InProceedings of International Conference on Logic for ProgrammingArtificial Intelligence and Reasoning, pages 150–166, 2006.

[243] N. Shankar. Verification of real-time systems using PVS. In Proceedingsof the International Conference on Computer Aided Verification, pages280–291, 1993.

[244] S. Shiraz and O. Hasan. A library for combinational circuit verificationusing the HOL theorem prover. IEEE Transaction on CAD of IntegratedCircuits and Systems, 37(2):512–516, 2018.

[245] J. Siekmann and G. Wrightson. Automation of Reasoning: 2: ClassicalPapers on Computational Logic 1967–1970. Springer, 2012.

[246] K. Singh and B. Auernheimer. Formal specification of Multi-Windowuser interface in PVS. In Proceedings of International Conference onHuman-Computer Interaction, pages 144–149, 2016.

[247] K. Slind and M. Norrish. A brief overview of HOL4. In Proceedingsof 21st International Conference on Theorem Proving in Higher OrderLogics, pages 28–32, 2008.

[248] M. H. Sorensen and P. Urzyczyn. Lectures on the Curry-HowardIsomorphism. Elsevier, 2006.

[249] M. K. Srivas and S. P. Miller. Applying formal verification to theAAMP5 microprocessor: A case study in the industrial use of formalmethods. Formal Methods in System Design, 8(2):153–188, 1996.

[250] J. Sterling, D. Gratzer, V. Rahli, D. Morrison, E. Akentyev, andA. Tosun. RedPRL–The people’s refinement logic, available at:http://www.redprl.org/, 2016.

[251] G. Sutcliffe. The CADE ATP system competition - CASC. AIMagazine, 37(2):99–101, 2016.

[252] G. Sutcliffe. The TPTP problem library and associated infrastructure- From CNF to TH0, TPTP v6.4.0. Journal of Automated Reasoning,59(4):483–502, 2017.

[253] G. Sutcliffe and Y. Puzis. SRASS - A semantic relevance axiomselection system. In Proceedings of 21st International Conference onAutomated Deduction, pages 295–310, 2007.

[254] G. Sutcliffe and C. Suttner. Evaluating general purpose automatedtheorem proving systems. Artificial Intelligence, 131(1):39–54, 2001.

[255] S. Swords and J. Davis. Bit-blasting ACL2 theorems. In Proceedingsof 10th International Workshop on the ACL2 Theorem Prover and itsApplications, pages 84–102, 2011.

Page 23: 1 A Survey on Theorem Provers in Formal Methods

23

[256] A. Tanaka, R. Affeldt, and J. Garrigue. Safe low-level code genera-tion in Coq using monomorphization and monadification. Journal ofInformaion Processing, 26:54–72, 2018.

[257] S. H. Taqdees and O. Hasan. Formally verifying transfer functions oflinear analog circuits. IEEE Design & Test, 34(5):30–37, 2017.

[258] C. Tian. A formalization of the process algebra CCS in HOL4. CoRR,abs/1705.07313, 2017.

[259] C. Tian. Formalized Lambek calculus in higher order logic (HOL4).CoRR, abs/1705.07318, 2017.

[260] J. Urban. Translating Mizar for first order theorem provers. In Pro-ceedings of 2nd International Conference on Mathematical KnowledgeManagement, pages 203–215, 2003.

[261] J. Urban. MaLARea: A metasystem for automated reasoning in largetheories. In Proceedings of 21th Workshop on Empirically SuccessfulAutomated Reasoning in Large Theories, 2007.

[262] J. Urban, K. Hoder, and A. Voronkov. Evaluation of automatedtheorem proving on the Mizar mathematical library. In Proceedingsof International Congress on Mathematical Software, pages 155–166,2010.

[263] J. Urban, G. Sutcliffe, P. Pudlak, and J. Vyskocil. MaLARea SG1-machine learner for automated reasoning with semantic guidance. InProceedings of 4th International Conference on Automated Reasoning,pages 441–456, 2008.

[264] J. Urban and R. Veroff. Experiments with state-of-the-art automatedprovers on problems in tarskian geometry. In Procedings of 11thInternational Workshop on the Implementation of Logics, pages 122–126, 2015.

[265] J. Urban and J. Vyskocil. Theorem proving in large formal mathematicsas an emerging AI field. In Automated Reasoning and Mathematics:Essays in Memory of William McCune, pages 240–257, 2013.

[266] J. Vitt and J. Hooman. Assertional specification and verificationusing PVS of the steam boiler control system. In Proceedings of theInternational Conference Formal Methods for Industrial Applications,pages 453–472, 1995.

[267] D. von Oheimb. Hoare logic for Java in Isabelle/HOL. Concurrencyand Computation: Practive and Experience, 13(13):1173–1214, 2001.

[268] H. Wang. Computer theorem proving and artificial intelligence. Com-putational Logic, pages 63–75, 1990.

[269] T. Weber. SMT solvers: New oracles for the HOL theorem prover. Inter-national Journal on Software Tools for Technology Transfer, 13(5):419–429, 2011.

[270] W. F. Wenzel, M. A comparison of Mizar and Isar. Journal of AutomatedReasoning, 29(3–4):389–411, 2002.

[271] D. Weyns, M. U. Iftikhar, D. G. de la Iglesia, and T. Ahmad. Asurvey of formal methods in self-adaptive systems. In Proceedingsof 5th International Conference on Computer Science & SoftwareEngineering, pages 67–79, 2012.

[272] N. White, S. Matthews, and R. Chapman. Formal verification: Willthe seedling ever flower? Philosophical Transactions A, 375:20150402,2017.

[273] F. Wiedijk. Mizar Light for HOL Light. In Proceedings of InternationalConference on Theorem Proving in Higher Order Logic, pages 378–393,2001.

[274] F. Wiedijk. The seventeen provers of the world: Foreword by Dana S.Scott, volume 3600. Springer, 2006.

[275] J. Woodcock, P. G. Larsen, J. Bicarregui, and J. S. Fitzgerald. Formalmethods: Practice and experience. ACM Computing Surveys, 41(4):1–36, 2009.

[276] L. A. Yang, J. P. Liu, C. H. Chen, and Y. ping Chen. Automaticallyproving mathematical theorems with evolutionary algorithms and proofassistants. In Proceedings of Congress on Evolutionary Computation,pages 4421–4428, 2016.

[277] A. Yushkovskiy. Comparison of two theorem provers: Isabelle/HOL andCoq. In Proceedings of the Seminar in Computer Science (CS-E4000),pages 1–17, 2017.

[278] B. Zhan and M. P. L. Haslbeck. Verifying asymptotic time complexityof imperative programs in Isabelle. In Proceedings of 9th InternationalJoint Conference on Automated Reasoning, pages 532–548, 2018.

[279] X. Zhang, W. Hong, Y. Li, and M. Sun. Reasoning about connectorsusing Coq and Z3. Science of Computer Programming, 170:27–44,2019.

APPENDIXSystem details of 27 theorem provers.

General

Nam

eH

RIC

SA

naly

tica

Zen

onC

ontr

ibut

orSi

mon

Col

ton

Jean

-Chr

isto

phe

Filli

tre

E.C

lark

e&

X.Z

hao

R.B

onic

hon,

D.D

elah

aye

&D

.Dol

igez

1stR

el20

0220

0219

9020

07In

d/U

ni/I

ndU

nive

rsity

ofE

dinb

urgh

SRI

Inte

rnat

iona

l,U

SAC

arne

gie

Mel

lon

Uni

vers

ityIn

depe

nden

t

Implementation

CL

ang

Java

Oca

ml

Mat

hem

atic

aO

Cam

lPr

og.P

Func

tiona

l,C

oncu

rren

tFu

nctio

nal,

Impe

rativ

e,O

OPr

oced

ural

,Fun

ctio

nal,

OO

Func

tiona

l,Im

pera

tive,

OO

LVH

R2.

0Y

ices

Ana

lytic

a2

0.8.

2LT

Ope

nso

urce

Ope

nso

urce

Free

BSD

BSD

and

MIT

UI

CL

IC

LI

GU

IC

LI

OS

Lin

ux,W

indo

ws

Lin

ux,S

olar

is,M

AC

Cro

ssC

ross

Lib

API

No

Stan

dard

libra

rySt

anda

rdlib

rary

CG

No

No

No

No

Ed

Yes

Yes

Yes

No

Ext

Yes

Yes

Yes

Yes

I/O

No

No

Tex

files

No

Logico-Math

TTy

peT

heor

emge

nera

tor

Dec

isio

npr

oced

ure

AT

PA

TP

(Alg

ebra

icsp

ecifi

catio

n&

proo

fsy

stem

)C

Log

icFO

LFO

Lw

itheq

ualit

y&

quan

tifier

free

FOL

FOL

(with

poly

mor

phic

&eq

ualit

y)T

VB

inar

yB

inar

yB

inar

yB

inar

yST

No

No

No

B-M

etho

dse

tthe

ory

Cal

culu

sIn

duct

ive

Ded

uctiv

eD

educ

tive

Ded

uctio

nm

odul

o(T

able

aum

etho

d)Pr

oofK

erne

lN

oN

oY

esN

o

Others

App

.Are

asPr

oduc

ela

rge

num

ber

ofth

eore

ms

for

test

ing

AT

PSy

stem

sE

mbe

dded

inap

plic

atio

nto

prov

ide

dedu

ctiv

ese

rvic

esSy

mbo

licco

mpu

tatio

nsy

stem

Use

din

foca

len

viro

nmen

t,O

bjec

tor

i-en

ted

alge

bra

spec

ifica

tion

Eva

lZ

aris

kiSp

ecifi

catio

nN

ASA

,par

tof

PVS

Polic

yan

alys

isT

PTP

Cat

egor

ySe

t(2

27ou

tof

462)

SEU

(110

outo

f90

0)U

niqu

eFe

atur

esM

achi

neL

earn

ing

API

for

proo

fse

arch

and

sym

bolic

sim

ulat

ion

Tran

slat

edto

OM

Doc

fram

ewor

kPr

oduc

elo

wle

velp

roof

dire

ctly

forC

oq

Page 24: 1 A Survey on Theorem Provers in Formal Methods

24

General

Nam

eYa

rrow

Wat

son

KR

Hyp

er/E

-KR

Hyp

er/H

yper

iPro

ver

Con

trib

utor

Jan

Zw

anen

burg

M.R

anda

llH

olm

esB

jorn

Pelz

erK

onst

antin

Kor

ovin

1stR

el19

9720

0620

0720

08In

d/U

ni/I

ndE

indh

oven

Uni

vers

ityB

oise

Stat

eU

nive

rsity

Kob

lenz

Uni

vers

ityU

nive

rsity

ofM

anch

este

r

ImplementationC

Lan

gH

aske

llSt

anda

rdM

LO

Cam

lO

Cam

lPr

og.P

Func

tiona

l,M

odul

arFu

nctio

nala

ndIm

pera

tive

Func

tiona

l,Im

pera

tive,

OO

Func

tiona

l,Im

pera

tive,

OO

LVV

1.20

0.8.

21.

4V

0.99

LTFr

eeB

SDB

SDan

dM

ITG

NU

Gen

eral

GN

UG

PLG

NU

UI

GU

I&

CL

IC

LI

CL

IC

LI

OS

Uni

x,L

inux

Lin

uxW

indo

ws

and

Uni

xL

inux

Lib

Fudg

etfo

rgr

aphi

cali

nter

face

No

No

No

CG

No

No

No

No

Ed

Yes

No

Yes

Yes

Ext

Yes

Yes

Yes

Yes

I/O

Poly

mor

phic

No

TPT

Psu

ppor

ted

prot

ein

form

atN

o

Logico-Math

TTy

peIT

PIT

PA

TP

and

mod

elge

nera

tor

AT

PC

Log

icC

onst

ruct

ive

HO

LFO

Lw

itheq

ualit

yFO

LT

VB

inar

yB

inar

yB

inar

yB

inar

yST

No

Qui

ne’s

sett

heor

yN

oN

oC

alcu

lus

Type

-cal

culu

sTy

pedλ

-cal

culu

sH

yper

tabl

eau

calc

ulus

Inst

antia

tion

calc

ulus

Proo

fKer

nel

Yes

No

No

No

Others

App

.Are

asR

epre

sent

atio

nen

viro

nmen

tfo

rlo

g-ic

s&

prog

ram

min

gla

ngua

ges.

Soft

war

eve

rific

atio

n,m

odel

chec

k-in

g,ed

ucat

ion

&m

athe

mat

ics

Em

bedd

edin

know

ledg

ere

pres

enta

-tio

nsy

stem

sH

ardw

are

veri

ficat

ion

and

finite

mod

elin

gE

val

Form

aliz

edty

peth

eory

TPT

Pca

tego

rySo

lved

74%

ofth

esu

best

ofT

PTP

CA

SC-2

6E

PRdi

visi

onw

inne

rU

niqu

eFe

atur

esE

xper

imen

tco

ntex

tfo

rte

stin

gpu

rety

pesy

stem

Type

free

HO

Lsu

ppor

tU

sed

for

desc

ript

ion

logi

cpr

oble

ms

Mod

ular

com

bina

tion

ofpr

opos

ition

and

inst

antia

tiona

lrea

soni

ng

General

Nam

eJA

PEE

-Dar

win

lean

CoP

LE

O-I

IC

ontr

ibut

orR

icha

rdB

orna

tB

aum

gart

ner

Otte

nJe

nsO

tten

C.B

enzm

ulle

r,F.

The

iss,

N.S

ulta

na1s

tRel

1996

2005

2003

2012

Ind/

Uni

/Ind

Que

enM

arry

,Uni

vers

ityof

Lon

don

Kob

lenz

Uni

vers

ityU

nive

rsity

ofO

slo

Frei

eU

nive

rsity

Ber

linan

dC

am-

brid

geU

nive

rsity

Implementation

CL

ang

Java

OC

aml

Prol

ogO

Cam

lPr

og.P

Obj

ecto

rien

ted

(OO

)Fu

nctio

nal,

impe

rativ

e,O

OL

ogic

prog

ram

min

gFu

nctio

nal,

Impe

rativ

e,O

OLV

v7-d

151.

52.

11.

720

15LT

GN

UG

PLG

NU

Gen

eral

GN

Uge

nera

lB

SDU

IG

UI

CL

IC

LI

CL

IO

SL

inux

,Mac

Uni

x,W

indo

ws

Win

dow

s,U

nix,

Lin

ux,M

acU

nix,

Win

dow

sL

ibN

oN

oN

oY

esC

GN

oN

oN

oN

oE

dY

esY

esY

esY

esE

xtY

esY

esY

esY

esI/

ON

oIn

putT

PTP

orT

ME

form

atL

eanC

oPor

TPT

Psy

ntax

TPT

PT

HF

lang

uage

Logico-Math

TTy

pePr

oof

assi

stan

tA

TP

AT

PA

TP+

ITP

CL

ogic

FOL

FOL

clau

salw

itheq

ualit

yFi

rst-

orde

rin

tuiti

onis

ticH

OL

TV

Bin

ary

Bin

ary

Bin

ary

Bin

ary

STY

esN

oN

oN

oC

alcu

lus

Ded

uctiv

e&

Sequ

entc

alcu

lus

Mod

elev

alua

tion

Con

nect

ion/

tabl

eau

calc

ulus

Res

olut

ion

byun

ifica

tion

and

equa

l-ity

(RU

E)

Proo

fKer

nel

No

No

No

No

Others

App

.Are

asU

sed

asa

proo

fas

sist

anta

ndim

ple-

men

tJA

PEth

eori

esE

ncry

ptan

dso

lve

prob

lem

sFo

rmal

izat

ion

Coo

pera

tion

with

first

-ord

erA

TP

Eva

lTe

achi

ngpu

rpos

eto

olE

PRw

inne

rat

CA

SC-2

0an

dJ3

Thi

rdin

FOF

divi

sion

atC

ASC

-22

CA

SC-J

5w

inne

rin

TH

Fdi

visi

onU

niqu

eFe

atur

esFo

rwar

dre

ason

ing

and

logi

cen

cod-

ing

Bac

k-ju

mpi

ngan

ddy

nam

icba

ck-

trac

king

Prog

ram

can

beea

sily

mod

ified

for

spec

ific

task

orap

plic

atio

ndu

eto

itsco

mpa

ctco

de

Coo

pera

tive

Proo

fSe

arch

Page 25: 1 A Survey on Theorem Provers in Formal Methods

25

General

Nam

eM

aLA

Rea

Mus

cade

tPr

ince

ssSa

talla

xC

ontr

ibut

orJo

sef

Urb

anD

omin

ique

Past

rePh

ilipp

Rum

mer

Cha

dE

.Bro

wn

1stR

el20

0720

0320

0820

10In

d/U

ni/I

ndC

harl

esU

nive

rsity

Uni

vers

ityPa

ris

Des

cart

esU

ppsa

laun

iver

sity

Saar

land

Uni

vers

ity

Implementation

CL

ang

Perl

SWI-

Prol

ogSc

ala

OC

aml

Prog

.PFu

nctio

nal,

Impe

rativ

e,O

OL

ogic

prog

ram

min

gO

O,f

unct

iona

l,co

ncur

rent

Func

tiona

l,im

pera

tive,

OO

LV0.

54.

5V

2.1

3.0

LTG

PL2

BSD

LG

PLB

SDU

IC

LI

CL

IC

LI

CL

IO

SL

inux

Uni

x,L

inux

Lin

uxL

inux

Lib

No

No

No

No

CG

No

No

No

No

Ed

Yes

Yes

Yes

Yes

Ext

Yes

Yes

Yes

Yes

I/O

No

No

Nat

ive

lang

uage

SMT

No

Logico-Math

TTy

peA

TP

Kno

wle

dge

base

TP

The

orem

prov

erA

TP

CL

ogic

HO

Lse

cond

-ord

erlo

gic

FOL

HO

LT

VB

inar

yB

inar

yB

inar

yB

inar

yST

No

No

Mod

ular

linea

rin

tege

rar

ithm

etic

Chu

rch’

ssi

mpl

ety

peth

eory

Cal

culu

sD

educ

tive

Nat

ural

dedu

ctio

nFr

eeva

riab

leta

blea

u,C

onst

rain

edse

-qu

entc

alcu

lus

Tabl

eau

calc

ulus

Proo

fKer

nel

No

No

No

No

Others

App

.Are

asL

earn

ing

and

reas

onin

gsy

stem

for

prov

ing

inla

rge

form

allib

rari

esTo

polo

gica

llin

ear

spac

es,

cellu

lar

auto

mat

aSo

ftw

are

veri

ficat

ion

and

mod

elch

ecki

ngFo

rmal

izat

ion

Eva

lC

ASC

-24

LTB

divi

sion

win

ner

Win

ner

CA

SC-J

Cin

IJC

AR

2001

Solv

edpr

oble

ms

from

QF-

LIA

cate

-go

ryof

SMT

Lib

rary

CA

SC-2

6T

HF

divi

sion

win

ner

Uni

que

Feat

ures

Util

ize

AT

Pas

core

syst

emw

ithA

Ite

chni

ques

.E

ffici

entf

orth

epr

oble

ms

cont

aini

ngto

om

any

axio

ms

and

form

ulas

Solv

edqu

antifi

edm

odul

olin

eari

nte-

ger

arith

met

icSe

man

ticem

bedd

ing

and

cuts

imul

a-tio

n

General

Nam

eSP

ASS

Cor

alD

ISC

OU

NT

DO

RIS

Con

trib

utor

Chr

isto

phe

Wei

denb

ach

Ala

nB

undy

,G

raha

mSt

eel

and

Mon

ika

Mai

dlJo

rgD

enzi

nger

Joha

nB

os

1stR

el19

9920

0619

9719

98In

d/U

ni/I

nde

Max

Plan

ckIn

stitu

tefo

rC

ompu

ter

Scie

nce

Uni

vers

ityof

Edi

nbur

ghU

nive

rsity

ofC

alga

ryU

nive

rsity

ofE

dinb

urgh

Implementation

CL

ang

Java

Bui

lton

SPA

SSth

eore

mpr

over

CPr

olog

Prog

.PFu

nctio

nal

Func

tiona

lPr

oced

ural

Log

icpr

ogra

mm

ing

LV3.

920

082.

0D

OR

IS20

01LT

Free

BSD

Free

BSD

Free

BSD

Free

BSD

UI

CL

I+G

UI

GU

IC

LI

CL

IO

SW

indo

ws,

MA

C,L

inux

Cro

ssL

inux

,Sol

aris

Cro

ssPl

atfo

rmL

ibY

esY

esY

esN

oC

GN

oN

oN

oN

oE

dY

esY

esY

esY

esE

xtY

esY

esN

oY

esI/

ON

oN

oE

(Uni

vers

alIm

plic

atio

n)N

o

Logico-Math

TTy

peA

TP

Indu

ctiv

eth

eore

mpr

over

Dis

trib

uted

equa

tiona

lTP

TP+

Sem

antic

anal

yzer

CL

ogic

FOL

FOL

FOL

FOL

TV

Bin

ary

Bin

ary

Bin

ary

Bin

ary

STN

oN

oN

oN

oC

alcu

lus

Supe

rpos

ition

calc

ulus

Tabl

eau

calc

ulus

Pure

unit

equa

lity

Lam

bda

calc

ulus

Proo

fKer

nel

No

No

Yes

No

Others

App

.Are

asA

naly

sis

ofse

curi

typr

otoc

ols,

colli

-si

onav

oida

nce

prot

ocol

sC

rypt

ogra

phic

secu

rity

prot

ocol

anal

ysis

,fin

dat

tack

son

faul

tyse

curi

typr

otoc

ols

Mac

hine

lear

ning

Com

puta

tiona

lse

man

tics,

cove

rva

r-io

uslin

guis

ticph

enom

enas

Eva

lD

isco

vern

ewat

tack

son

the

Aso

kan-

Gin

zboo

rgpr

otoc

olD

isco

vern

ewat

tack

son

the

Tagh

idri

and

Jack

son

impr

oved

prot

ocol

Ent

ranc

eco

mpe

titio

nD

isco

unt/G

LSt

udy

beha

vior

ofR

OB

’sal

gori

thm

Uni

que

Feat

ures

AT

Pw

itheq

ualit

yFi

ndco

unte

rexa

mpl

eto

indu

ctiv

eco

njec

ture

Bas

edon

team

wor

km

etho

dfo

rkn

owle

dge

base

ddi

stri

butio

nTr

ansl

ate

Eng

lish

text

into

disc

ours

ere

pres

enta

tion

stru

ctur

e

Page 26: 1 A Survey on Theorem Provers in Formal Methods

26

General

Nam

eG

etfo

lG

oede

lE

xpan

der

Geo

met

ryE

xper

tC

ontr

ibut

orFa

usto

Giu

nchi

glia

Joha

nB

elin

fant

ePe

ter

Pada

witz

Xia

o-Sh

anG

ao1s

tRel

1994

2005

2007

1998

Ind/

Uni

/Ind

Uni

vers

ityof

Tren

toG

eorg

iaIn

stitu

teof

Tech

nolo

gyD

ortm

und

univ

ersi

tyK

eyL

abor

ator

yof

Chi

na

ImplementationC

Lan

gC

omm

onL

isp

Mat

hem

atic

aO

’Has

kell

(ex

tent

ion

ofH

aske

ll)Ja

vaPr

og.P

Met

are

flect

ive,

OO

,fun

ctio

nal

Func

tiona

l,pr

oced

ural

Con

curr

entp

rogr

amm

ing

Func

tiona

lLV

2.00

120

14E

xpan

der

MM

P/G

eom

eter

LTFr

eeB

SDFr

eeB

SDFr

eeB

SDG

NU

gene

ralp

ublic

lisce

nce

UI

GU

IG

UI

GU

IG

UI

OS

Uni

xC

ross

Cro

ssC

ross

Lib

Yes

No

Yes

No

CG

Yes

No

No

No

Ed

Yes

Yes

Yes

Yes

Ext

Yes

Yes

Yes

Yes

I/O

Proo

f-sc

ript

No

No

No

Logico-Math

TTy

peIT

PA

TP

AT

PA

utom

atic

Geo

met

ric

TP

CL

ogic

FOL

FOL

FOL

Dyn

amic

logi

cm

odel

sT

VB

inar

yB

inar

yB

inar

yB

inar

yST

No

ZF

sett

heor

ySw

ingi

ngty

pes

No

Cal

culu

sN

atur

alde

duct

ion

Nat

ural

dedu

ctio

nN

arro

w&

Fixe

dpo

intc

o-in

duct

ion

Euc

lidea

nan

ddi

ffer

entia

lgeo

met

ryPr

oofK

erne

lN

oN

oN

oN

o

Others

App

.Are

asIn

vari

ous

data

stru

ctur

ere

alw

orld

embe

ddin

gD

eriv

ene

wth

eore

mfo

rau

tom

ated

reas

onin

gTe

stin

gal

gebr

aic

data

type

and

func

-tio

nall

ogic

prog

ram

Teac

hing

geom

etry

,al

gebr

aan

dph

ysic

sin

Chi

naat

scho

olE

val

Ent

ranc

ein

CA

DE

-17

393

exam

ple

ofQ

AIF

in91

seco

nds

Supe

rco

ncen

trat

ors

Wu’

sm

etho

dim

plem

ente

dU

niqu

eFe

atur

esM

eta-

theo

ryim

plem

enta

tion

Red

uce

num

ber

ofst

eps

Inte

ract

ive

term

rew

ritin

g,gr

aph

tran

sfor

mat

ion,

seve

ral

repr

esen

ta-

tion

offo

rmal

expr

essi

on

Aut

omat

edge

omet

ric

diag

ram

con-

stru

ctio

n

General

Nam

eD

TP

Z/E

VE

SG

raffi

tiC

ontr

ibut

orD

onG

eddi

sM

ark

Saal

tinkz

Eps

tein

1stR

el19

9519

9719

86In

d/U

ni/I

ndSt

anfo

rdU

nive

rsity

Uni

vers

ityof

Ken

tU

nive

rsity

ofH

oust

on

Implementation

CL

ang

Com

mon

Lis

pC

omm

onL

isp

C++

Prog

.PM

eta

refle

ctiv

e,O

O,f

unct

iona

lM

eta

refle

ctiv

e,O

O,f

unct

iona

l,pr

o-ce

dura

lPr

oced

ural

,OO

,sta

tical

lyty

pe,t

ype

chec

king

LV3.

02.

4.1

GR

AFF

ITI

LTFr

eeB

SDFr

eeB

SDFr

eeB

SDU

IC

LI

GU

I/C

LI

GU

IO

SC

ross

Win

dow

s,U

nix,

Lin

ux,M

acU

inx

Lib

Yes

Epi

log

Yes

Yes

CG

No

Yes

No

Ed

Yes

Yes

Yes

Ext

Yes

Yes

Yes

I/O

KIF

(Kno

wle

dge

Inte

rcha

nge

For-

mat

)L

atex

No

Logico-Math

TTy

peM

odal

elim

inat

ion

theo

rem

prov

erA

TP+

ITP

Dec

isio

npr

oced

ure

CL

ogic

FOL

FOL

Gra

phth

eory

TV

Bin

ary

Bin

ary

Bin

ary

STH

orn

theo

ries

ZF

sett

heor

y,A

xiom

atic

sett

heor

yY

esC

alcu

lus

Firs

t-or

der

pred

icat

eca

lcul

usλ

-cal

culu

sD

educ

tive

Proo

fKer

nel

No

No

No

Others

App

.Are

asB

lack

box

infe

renc

een

gine

for

vari

-ou

sm

achi

nele

arni

ngpr

ogra

mG

ener

alth

eore

mpr

over

Use

din

chem

istr

yan

dm

athe

mat

ics

(gra

phth

eory

)for

mak

ing

conj

ectu

reE

val

Spec

ifica

tion

Bar

bara

Proj

ect

Uni

que

Feat

ures

Dom

ain

inde

pend

entc

ontr

olof

infe

r-en

ceT

heor

empr

over

,sy

ntax

and

type

chec

ker,

dom

ain

chec

ker

Peda

gogi

cto

ol