Supporting Dependently Typed Functional Programming with Proof Automation and Testing

Sean Wilson

Doctor of Philosophy
Centre for Intelligent Systems and their Applications
School of Informatics
University of Edinburgh
2011

Abstract

Dependent types can be used to capture useful properties about programs at compile time. However, developing dependently typed programs can be difficult in current systems. Capturing interesting program properties usually requires the user to write proofs, and constructing these proofs can be both a difficult and tedious process. Additionally, finding and fixing errors in program scripts can be challenging.

This thesis concerns ways in which functional programming with dependent types can be made easier. In particular, we focus on providing help for developing programs that incorporate user-defined types and user-defined functions. For the purpose of supporting dependently typed programming, we have designed a framework that provides improved proof automation and error feedback.

Proof automation is provided with the use of heuristic-based tactics that automate common patterns of proofs that arise when programming with dependent types. In particular, we use heuristics for generalising goals and employ the rippling heuristic for guiding inductive and non-inductive proofs. The automation we describe includes features for caching and reusing lemmas proven during proof search and, whenever proof search fails, the user can assist the prover by providing high-level hints.

We concentrate on providing improved feedback for the errors that occur when there is a mismatch between the specification of a program, described with the use of dependent types, and the behaviour of the program. We employ a QuickCheck-like testing tool for automatically identifying these forms of errors, where the counterexamples generated are used as error messages.
To demonstrate the effectiveness of our framework for supporting dependently typed programming, we have developed a prototype based around the Coq theorem prover. We demonstrate that the framework as a whole makes program development easier by conducting a series of case studies. In these case studies, which involved verifying properties of tail recursive functions, sorting functions and a binary adder, a significant number of the proofs required were automated.

Acknowledgements

Firstly, thank you to my supervisors Jacques Fleuriot and Alan Smaill for all their time, advice and encouragement over the years. I would also like to thank Lucas Dixon for his help and for initially suggesting the field of dependently typed programming to me. Thank you to my examiners Ewen Denney and Stephen Gilmore for their helpful feedback. Finally, thank you to Emer, Priya and my family for their support.

Declaration

I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

(Sean Wilson)

Table of Contents

1 Introduction
  1.1 Hypothesis
  1.2 Contributions
  1.3 Thesis Outline
  1.4 Publications
2 Background
  2.1 Types and Programming
  2.2 Dependently Typed Programming
    2.2.1 Uses of Dependent Types in Programming
    2.2.2 The Curry-Howard Isomorphism
    2.2.3 Type Erasure
    2.2.4 A Brief Introduction to Coq
    2.2.5 Dependent Types
    2.2.6 Recursion and Termination
    2.2.7 Impossible Cases
  2.3 Dependently Typed Programming Languages
    2.3.1 DML
    2.3.2 ATS
    2.3.3 Sage
    2.3.4 Concoqtion
    2.3.5 Cayenne
    2.3.6 Epigram
    2.3.7 Agda
    2.3.8 Idris
  2.4 Inductive Theorem Proving
  2.5 Rippling
    2.5.1 Overview
    2.5.2 Differences and Embeddings
    2.5.3 Fertilisation
    2.5.4 Ripple Measures
    2.5.5 Example: A Rippling Proof
  2.6 Proof Planning and Critics
  2.7 Lemma Discovery
  2.8 Generalisation
  2.9 Counterexample Generation
    2.9.1 QuickCheck
    2.9.2 SmallCheck and Lazy SmallCheck
    2.9.3 Testing in Agda
  2.10 Summary
3 Programming with Dependent Types
  3.1 Coq
    3.1.1 Reductions, Normalisation and Convertibility
    3.1.2 Equality
  3.2 Program Construction
    3.2.1 Constructing Programs Manually
    3.2.2 Proof Construction with Tactics
    3.2.3 Program Construction with Tactics
    3.2.4 Constructing Computational and Logical Terms Separately
  3.3 Dependently Typed Programming in Russell
    3.3.1 Inductive Family Coercions
    3.3.2 Subset Type Coercions
    3.3.3 Impossible Case Proof Obligations
    3.3.4 Termination Measures
  3.4 Program Specifications
    3.4.1 Strong and Weak Specifications
    3.4.2 Transparent and Opaque Definitions
    3.4.3 Type Refinement Choices
    3.4.4 Functions and Inductive Predicates
  3.5 Conclusions
4 Challenges when Programming with Dependent Types
  4.1 User-Defined Properties
  4.2 Proof Construction
  4.3 Coping with Errors
  4.4 Conclusions
5 A Framework for Supporting Dependently Typed Programming
  5.1 Framework Features
  5.2 Components and Interactions
  5.3 Usage Storyboards
  5.4 Intended Audience
  5.5 Prototype Implementation
  5.6 Conclusions
6 Proof Patterns of Dependently Typed Programs
  6.1 The simplify Proof Pattern
  6.2 The trivial Proof Pattern
  6.3 The impossible case Proof Pattern
  6.4 The induction Proof Pattern
  6.5 The recursive call Proof Pattern
    6.5.1 Recursive Calls and Embeddings
    6.5.2 Example
    6.5.3 Multiple Recursive Calls
    6.5.4 Pattern Description
  6.6 The ripple Proof Pattern
  6.7 The cross fertilise Proof Pattern
  6.8 The generalise Proof Pattern
    6.8.1 Pattern Description
  6.9 Combining Proof Patterns
  6.10 Conclusion
7 Automation of Proof Patterns
  7.1 Top-Level Tactic Description
    7.1.1 Relation to Proof Planning
  7.2 Lemma Caching
    7.2.1 Irrelevant Assumptions and Caching Reusable Lemmas
  7.3 The simplify Tactic
  7.4 The trivial Tactic
  7.5 The generalise Tactic
    7.5.1 Overview: An Aggressive Generalisation Algorithm
    7.5.2 Step 1: Inverse Functionality
    7.5.3 Step 2: Common Subterm Generalisation
    7.5.4 Step 3: Generalising Apart
    7.5.5 Step 4: Eliminating Irrelevant Assumptions (the irrelevance Tactic)
    7.5.6 Step 5: Checking for overgeneralisations
    7.5.7 Unblocking Rippling by Generalising Apart
  7.6 The induction Tactic
    7.6.1 Inductive Variable and Induction Principle Choice
    7.6.2 Modifying the Conclusion Before Performing Induction
  7.7 The recursive call Tactic
  7.8 The ripple Tactic
    7.8.1 Generating Equations from Function Definitions
    7.8.2 Rippling Annotations
    7.8.3 Ripple Measures
    7.8.4 Conclusion Transformations
    7.8.5 Weak Fertilisation
  7.9 The cross fertilise Tactic
  7.10 Delayed Generalisation
    7.10.1 Illustrative Examples
    7.10.2 Algorithm Description
  7.11 Automatic Identification of Simplification Rules
    7.11.1 Motivation
    7.11.2 Heuristics for Identifying Simplification Rules
  7.12 Lemma Discovery
  7.13 User Hints
    7.13.1 Proof Search Feedback
    7.13.2 User Hinting Mechanism
    7.13.3 Providing Productive Hints
  7.14 A Comparison with IsaPlanner
    7.14.1 Experimental Setup
    7.14.2 Results
    7.14.3 Analysis
    7.14.4 Summary
  7.15 Conclusions
8 Supporting Dependently Typed Programming with Testing
  8.1 Providing Error Feedback with Testing
    8.1.1 Error Feedback Procedure
    8.1.2 Concise Counterexample Evaluation Traces
    8.1.3 Weak Specifications and Counterexamples
  8.2 Testing and Proving
    8.2.1 Testing as Part of Proof Automation
    8.2.2 Feedback for Faulty Hints with Testing
    8.2.3 Supporting Manual Proofs with Testing
  8.3 Design of a Testing Tool
    8.3.1 Requirements
    8.3.2 Counterexample Generation
    8.3.3 Testing Propositions
    8.3.4 Instantiating Type Variables
    8.3.5 Random Term Generation
    8.3.6 Generating Small Counterexamples
    8.3.7 Testing within Coq
  8.4 Related Work
  8.5 Conclusions
9 Case Studies
  9.1 Research Questions
  9.2 Procedure
    9.2.1 Choice of Examples
    9.2.2 Conducting a Case Study
    9.2.3 Reporting Case Studies
    9.2.4 System Configuration
  9.3 Case Study: Tail Recursive Functions
    9.3.1 List Sum
    9.3.2 Factorial
    9.3.3 Inorder Tree Traversal
    9.3.4 Error Feedback
  9.4 Case Study: Insertion Sort, Tree Sort and Quicksort
    9.4.1 Insertion Sort
    9.4.2 Tree Sort
    9.4.3 Quicksort
    9.4.4 Error Feedback
  9.5 Case Study: Binary Adder
    9.5.1 Inductive Families Representation
    9.5.2 Inductive Families Representation: Variation
    9.5.3 Subset Types Representation
    9.5.4 Error Feedback
  9.6 Results from Case Studies
  9.7 Lemma Caching Evaluation
    9.7.1 Experimental Setup
    9.7.2 Results
    9.7.3 Analysis
    9.7.4 Summary
  9.8 A Comparison with IsaPlanner
    9.8.1 Experimental Setup
    9.8.2 Results
    9.8.3 Analysis
    9.8.4 Summary
  9.9 Answers to Research Questions
  9.10 Related Work in Dependently Typed Programming Environments
  9.11 Related Work in Inductive Proof Automation
  9.12 Conclusions
10 Conclusions and Further Work
  10.1 Framework Overview
  10.2 Contributions
  10.3 Hypothesis
  10.4 Further Work
    10.4.1 Inductive Families
    10.4.2 Integrating Domain Specific Tools and Libraries
    10.4.3 Non-Structural Recursion
    10.4.4 Inductive Predicates
    10.4.5 Infinite Data Structures
    10.4.6 Piecewise Fertilisation
    10.4.7 Improved Error Feedback
    10.4.8 Existential Quantifiers
  10.5 Summary
Bibliography
A Function and Type Definitions
  A.1 Peano Arithmetic
  A.2 Lists
  A.3 Binary Trees
  A.4 IsaPlanner Theorem Corpus Definitions
B Case Study Results
C IsaPlanner Theorem Corpus Experiment

Chapter 1 Introduction

Dependent types [Martin-Löf, 1971] can be used to verify useful program properties at compile time. For example, dependent types can be used to statically verify that a program only performs safe array accesses [Xi, 1999a] and that a list sorting function always returns a sorted list [Altenkirch et al., 2005]. As well as providing an approach to reducing software faults, dependent types have been previously used as a means for performing compile time optimisations [Xi and Pfenning, 1998, Brady, 2005] and eliminating dead code from programs [Xi, 1999a].

Several dependently typed programming languages exist that support programming with what we will call user-defined properties. By this, we mean any program properties that are described with the use of data types or functions whose definitions were introduced by the user. Current dependently typed languages that allow programming with user-defined properties include Coq [Bertot and Castéran, 2004], Cayenne [Augustsson, 1998], Agda [Coquand, 1998], Epigram [McBride and McKinna, 2004] and ATS [Cui et al., 2005]. However, developing programs in these languages can be challenging for a number of reasons:

• To capture interesting program properties, the user is typically required to construct proofs. Proof construction can range from being simple yet tedious to complex and challenging.
• Errors occur when there is a mismatch between the specification of a program, described with the use of dependent types, and the actual behaviour of the program. With the exception of a tool available for Agda [Qiao Haiyan, 2003], there is little support in current systems for identifying or giving feedback for such errors.

1.1 Hypothesis

This thesis describes a framework designed to make dependently typed functional programming more practical. The challenges we described above are addressed by this framework through the application of proof automation and testing.

Proof automation is used to assist in the construction of any proofs required during program development. In cases where the proof automation fails, a hinting mechanism is available where the user can suggest important lemmas that should be proven before another proof search is attempted. Testing is used to provide feedback for errors and faulty hints, as well as for guiding proof search. The intended audience for this framework is existing users of dependently typed programming languages who are comfortable with writing proofs.

The hypothesis we argue in this thesis is that:

“This framework makes dependently typed programming significantly easier”

By this, we mean that 1) the framework should be able to automate a significant number of the proofs that are required when writing dependently typed programs in practice and that 2) the error feedback feature helps the user to correct errors more quickly than without this feature. We give evidence for the above by providing an analysis of our experiences when developing programs with the whole framework, where we show that many of the proofs required can be automated.
1.2 Contributions

The primary contribution of this thesis is as follows: We show that by integrating ideas from the domains of proof automation and testing, we can make dependently typed functional programming more practical, specifically when working with user-defined properties. This contribution involves two key areas:

• We present generic and modular proof automation designed to construct the proofs required when programming with dependent types. In particular, this automation is shown to provide significant support for verifying program properties concerning inductively defined types and recursively defined functions.
• We demonstrate how this proof automation can be combined with testing to create an effective tool for supporting dependently typed programming. Testing is used to give feedback on faulty programs and faulty user hints, as well as for avoiding unnecessary search during proof automation attempts.

In the course of establishing the above, we have created a concrete implementation of our ideas in the Coq theorem prover. The practical contributions of this are as follows:

• We have made significant contributions to the automation power of Coq by introducing rippling-based [Bundy et al., 2005] inductive proof automation.
• We have created a QuickCheck-like [Claessen and Hughes, 2000] testing tool for testing Coq goals, developed mostly within Coq itself. Surprisingly, we are not aware of any tools with similar functionality available in Coq.

1.3 Thesis Outline

The structure of the thesis can be summarised as follows:

Chapter 2: We describe necessary background information by summarising current dependently typed programming environments, as well as surveying automated reasoning and program testing techniques.
Chapter 3: We describe the ways in which dependent types can be employed to write programs that capture useful program properties.
Chapter 4: We explain how dependently typed programming can be challenging and identify the need for improved proof automation and error feedback.
Chapter 5: We give a high-level overview of our framework for supporting dependently typed programming.
Chapter 6: We give an analysis of what we find are the common patterns of proof that arise when programming with dependent types.
Chapter 7: We describe how effective proof automation can be provided for automating the proof patterns we identified.
Chapter 8: We describe how testing can be used to provide error feedback and explain how testing and proof automation are combined to create our framework.
Chapter 9: We evaluate the effectiveness of our framework as a whole by conducting several case studies. These case studies explore the use of a variety of data types, program properties and representations.
Chapter 10: We finish by giving the conclusions of the thesis along with a discussion of further work.

1.4 Publications

Work from this thesis has previously been published [Wilson et al., 2010a, Wilson et al., 2010b]. These publications mostly concern work from Chapters 6 and 7.

Chapter 2 Background

In this chapter, we present background material and summarise the previous research that our work builds upon. We start with an overview of the general features of dependently typed programming and then describe the various development environments currently available.

The framework we have designed to support dependently typed programming uses ideas from the proof automation and program testing communities. We give background information on these two domains in the later sections of this chapter.

2.1 Types and Programming

The use of types in programming languages is widely recognised as making software development more practical. Types are used in programs to describe properties about data and the behaviour of functions.
Typically, this is done so that type checking can be used to detect program faults at compile time. For instance, working within Church’s simply typed lambda calculus [Church, 1940] and given that the type int represents integers, the type int → int → int could be used to describe the behaviour of a function called plus for summing two integers. Type checking will identify the term plus 1 2 as well-typed and reject nonsensical terms like plus 1 2 3 and plus plus.

We now note some of the important uses of types in software development, based on observations made in [Pierce, 2002]:

Safety: As shown above, type checking can be used to guarantee at compile time that certain forms of errors are absent from programs. This is captured by the slogan “well-typed programs do not go wrong” [Milner, 1978].
Abstraction: Types can be used to abstract away implementation details of programs to create modular code, where modules communicate through well-defined interfaces. Modular code is widely recognised as being easier to develop, maintain and reuse.
Documentation: Types are a form of machine-checked documentation. For example, the type of a function can give hints about its behaviour and purpose.
Efficiency: Machines can make use of types to improve program execution performance. For example, Fortran’s [IBM, 1954] type system was introduced so that computers could use appropriate machine instructions depending on whether arithmetic was being performed with integers or real numbers.

2.2 Dependently Typed Programming

Dependent types [Martin-Löf, 1971] allow the behaviour of programs to be described more accurately than is possible with simple types. The underlying idea of dependent types is that types are allowed to depend on values. The standard example is the type vect A n, where A is some type, which represents lists of length n whose elements are drawn from type A.
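To make the idea concrete, the following is a minimal Coq sketch of such a type. The thesis gives its own definition of vect in §2.2.5; the names vect, vnil, vcons, vhead and two_items used here are illustrative assumptions and may differ from it. The function vhead only accepts arguments of type vect A (S n), so applying it to an empty vector is rejected by the type checker instead of failing at run time.

```coq
(* Hypothetical sketch: vect A n holds exactly n elements of type A.
   Names are illustrative, not the thesis's own definitions. *)
Inductive vect (A : Type) : nat -> Type :=
| vnil : vect A 0
| vcons : forall n : nat, A -> vect A n -> vect A (S n).

(* Leave the element type implicit; the length argument of vcons stays explicit. *)
Arguments vnil {A}.
Arguments vcons {A} _ _ _.

(* vhead only accepts vectors whose length index is a successor S n.
   The return clause computes the result type from the index m:
   unit when m = 0 and A when m = S _. For an argument of type
   vect A (S n) this reduces to A, so the vnil branch can return tt. *)
Definition vhead {A : Type} {n : nat} (v : vect A (S n)) : A :=
  match v in vect _ m return match m with 0 => unit | S _ => A end with
  | vnil => tt
  | vcons _ x _ => x
  end.

(* A term of type vect nat 2: a list containing exactly two nat items. *)
Definition two_items : vect nat 2 := vcons 1 10 (vcons 0 20 vnil).

Check vhead two_items.

(* Rejected at compile time: vnil has type vect nat 0, which does not
   unify with vect nat (S ?n). Fail succeeds because the check errors. *)
Fail Check vhead (@vnil nat).
```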
For example, given that nat represents the type of natural numbers, each term belonging to the type vect nat 2 represents a list containing two nat items. By making use of the vect type, we can verify program properties concerning list lengths at compile time. We give a definition for vect along with some examples in §2.2.5.

Due to their expressive power, dependently typed languages have a long history of being used as the foundation of theorem proving systems. Such systems include NuPrl [Constable et al., 1986], Coq [Bertot and Castéran, 2004], LEGO [Pollack, 1994] and Alf [Magnusson and Nordström, 1994]. More recently, languages have been designed for the purpose of programming with dependent types. These languages include Cayenne [Augustsson, 1998], Agda [Coquand, 1998], Epigram [McBride and McKinna, 2004] and ATS [Cui et al., 2005].

In this section, we describe the uses of dependent types in programming and introduce the general features of dependently typed languages. As further reading, Altenkirch et al.’s paper “Why Dependent Types Matter” gives a practical introduction to these topics [Altenkirch et al., 2005].

2.2.1 Uses of Dependent Types in Programming

In the following, we summarise some of the practical applications of dependent types to software development:

Program verification: Dependent types have been used in previous work to capture useful program properties. Examples of this include verifying that arrays are always accessed safely [Xi and Pfenning, 1998], verifying the correctness of a stack-based compiler [McKinna and Wright, 2006] and verifying the properties of finger trees [Sozeau, 2007a].
Expressivity: Dependent types can be used to describe functions that cannot be given satisfactory types in simply typed languages. The common example is to use dependent types to define a C-style printf function, where the type of this function is computed from the input formatting string [Augustsson, 1998]. Moreover, dependent types can be used to reason that certain branches of a program will never be executed, avoiding the need to include redundant error handling code [Xi, 1999a].
Optimisations: When the machine knows more about the static behaviour of a program, additional optimisations can be made at compile time. Dependent types have been used to safely remove dynamic array bounds checks [Xi and Pfenning, 1998], eliminate the checks associated with dead code [Xi, 1999a] and eliminate the need to store run-time type tags when implementing type-safe language interpreters [Augustsson and Carlsson, 1999].

Outside of the dependently typed programming community, there are of course many other approaches for developing verified software. For example, two of the most well-known software verification tools are Spec# [Barnett et al., 2005] and ESC/Java [Leino et al., 2000]. These extend C# and Java respectively with support for describing program specifications using Hoare-style pre- and post-conditions. However, dependently typed languages allow more expressive specifications to be captured than is possible in either of these tools. In particular, specifications in dependently typed languages such as Coq, Epigram and ATS can incorporate any program term (including higher-order functions), which is not possible in Spec# or ESC/Java.

2.2.2 The Curry-Howard Isomorphism

The Curry-Howard isomorphism [Howard, 1980] describes the correspondence between types and propositions, and between programs and proofs. In other words, given that type T can be viewed as a propositional statement P, a term t that inhabits T is simultaneously a program of type T and a proof that P holds. By using types to describe program specifications, dependently typed languages therefore give an approach to programming and proving in the same language.
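A small Coq sketch of this correspondence follows; the names and_swap and pair_swap are our own, chosen only for illustration. The type of and_swap reads as the proposition that P ∧ Q implies Q ∧ P, so the fact that the definition type-checks is a proof of that proposition. The function pair_swap is the analogous program over Set, with the same structure, that swaps the components of a pair.

```coq
(* Illustrative sketch; and_swap and pair_swap are hypothetical names. *)

(* Logical reading: for all propositions P and Q, P /\ Q implies Q /\ P.
   Computational reading: take a pair of proofs and swap its components. *)
Definition and_swap (P Q : Prop) : P /\ Q -> Q /\ P :=
  fun h => match h with
           | conj p q => conj q p
           end.

(* The same shape of program over Set: swap the components of a pair. *)
Definition pair_swap (A B : Set) : A * B -> B * A :=
  fun x => match x with
           | (a, b) => (b, a)
           end.

(* Coq reports the type of each term; for and_swap that type is the theorem. *)
Check and_swap.
Check pair_swap.
```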
2.2.3 Type Erasure

Type erasure is the process of removing information related to types from a program, typically for the purpose of efficiency when the program is run. With simply typed languages, we usually consider the type checking phase of compilation to be separate from the run-time program execution phase. This phase distinction is not so clear in dependently typed languages, which can make type erasure problematic [Mckinna and Brady, 2005]. Coq addresses this issue by providing an explicit type for proofs called Prop and a type for computational terms called Set (see §2.2.4.1 for more details), so that type erasure becomes a relatively simple process.

2.2.4 A Brief Introduction to Coq

As we make use of Coq for demonstrating our ideas in this thesis, we use Coq as a formal notation to give illustrative examples of dependently typed programming in the following sections. We are only concerned with describing the general concepts of dependently typed programming at the moment, but we go into more specific details about programming in Coq in Chapter 3.

The core language used by Coq [Bertot and Castéran, 2004] is based on the Calculus of Inductive Constructions (CIC), a variant of intuitionistic type theory. CIC is both a dependently typed functional programming language and a constructive logic. In other words, we can use CIC to write regular functional programs and verify properties about those programs within the same language. Coq is designed to adhere to the de Bruijn principle [de Bruijn, 1980], where the correctness of a Coq proof relies only on a small trusted kernel.

Notable uses of Coq include the formal proof of the Four Colour Theorem [Gonthier, 2007] and the construction of a verified compiler for a large subset of the C programming language [Leroy, 2009]. Most would not consider Coq a practical everyday programming language due to the level of experience required to write Coq programs and work with formal proofs.
However, recent work, which we cover in §3.3, has aimed to make programming with dependent types in Coq more convenient [Sozeau, 2007b, Sozeau, 2008].

In this section, we give a brief introduction to Coq. We start with the features it shares with simply typed languages and then discuss concepts specific to dependently typed languages. We make use of the following Coq conventions:

• t:T means “term t has type T”.
• t1 . . . tn:T means “the terms t1 . . . tn have type T”.
• id := t:T means “id is an identifier with the value t of type T”.

2.2.4.1 Sorts

In Coq, the type of a type is referred to as a sort. There are two standard sorts, called Prop and Set [Coq development team, 2006]:

• The Prop sort is the type of propositions. Given x:P and P:Prop, x can be interpreted as a proof that proposition P holds.
• The Set sort is the type of program specifications. Given x:S and S:Set, x can be interpreted as a program that satisfies the specification S.

When writing Coq programs, we generally want computationally relevant terms to have types of sort Set. Terms that are only required for reasoning about program behaviour at compile time should have types of sort Prop. Terms whose types have sort Prop are erased during Coq’s program extraction process [Paulin-Mohring, 1989].

Sorts themselves have types. There is an infinite family of sorts called Type(i), where i is a natural number. The types of the various sorts are Prop:Type(0), Set:Type(0) and Type(i):Type(i + 1). The index i of a Type term is usually left implicit in Coq programs.

2.2.4.2 Inductively Defined Types and Function Definitions

Inductively defined types are a common tool for representing data in simply typed functional languages. For example, Peano numbers can be defined inductively in Coq with the following ML-like definition:

    Inductive nat : Set :=
    | O : nat
    | S : nat → nat.
The above definition introduces nat as a new type, with the constructor O (note the uppercase letter) representing zero and S n representing the successor of n. We can now construct nat terms such as S O and S (S O) to represent the numbers “one” and “two” respectively. 0, 1 and 2 are used as syntactic sugar for the corresponding nat terms. The following function to sum two nat terms can be defined in the usual functional programming style, using pattern matching and recursion:

    Fixpoint plus (n m : nat) : nat :=
      match n with
      | O ⇒ m
      | S p ⇒ S (plus p m)
      end.

We use the infix notation + for plus in later examples.

Inductive types can be parametrised by another type to create a family of inductive types. For instance, an inductive type for polymorphic lists, parametrised by a type A, can be defined in Coq as follows:

    Inductive list (A : Type) : Type :=
    | nil : list A
    | cons : A → list A → list A.

The constructor nil constructs the empty list and, given a head element h and another list t, cons prepends h onto t. Notice that using the Type sort allows the list type to be parametrised by types that have type Set or type Prop. We use the notation h :: t for cons h t, [] for nil and [x1; . . . ; xn] as shorthand for x1 :: . . . :: xn :: nil. Note that, for brevity, we sometimes leave type parameters implicit in examples.

The following definitions give further examples and introduce two common functions: length, for calculating the length of a list, and app, for appending two lists:

    Fixpoint length (A : Type) (a : list A) : nat :=
      match a with
      | [] ⇒ O
      | h :: t ⇒ S (length t)
      end.

    Fixpoint app (A : Type) (a b : list A) : list A :=
      match a with
      | [] ⇒ b
      | h :: t ⇒ h :: app t b
      end.

We use the infix notation ++ for app in later examples.

2.2.5 Dependent Types

We use the phrase full dependent types to mean that types can depend on any value.
As we will discuss in §2.3, some languages, like DML, make use of only restricted forms of dependent types. For now, we discuss the general features common to languages with full dependent types, such as Agda, Epigram, ATS and Coq.

2.2.5.1 Dependent Function Types

Dependent function types generalise the usual function space by allowing functions to be defined that have the type ∀ (x:A), B, where A and B are types and B can depend on x. In particular, when A is a sort such as Set, this gives second-order polymorphism. Dependent function types are also known as Π types or dependent product types. In the special case when x does not occur free in B, we simply write A → B.

Recall that we use the type vect A n to represent lists of length n. We can use dependent function types to accurately describe the length of the list returned by a function vapp for appending two vect terms. A suitable type for such a function is as follows:

    vapp : ∀ (A : Set) (n m : nat), vect A n → vect A m → vect A (n + m).

We give definitions for vect and vapp over the next two sections.

2.2.5.2 Inductive Families

Dependently typed languages differ from simply typed languages in that they allow inductive types to be defined that are parametrised by a value. Such types are commonly referred to as inductive families [Dybjer, 1991]. The value that a type is parametrised over is sometimes referred to as the type index. As mentioned previously, the traditional example of an inductive family is the type vect n, which represents lists of length n. The vect type is defined by indexing a list data structure with a natural number representing the list length, as in the following definition:

    Inductive vect (A : Set) : nat → Set :=
    | vnil : vect A O
    | vcons : ∀ (n : nat), A → vect A n → vect A (S n).

The vnil constructor creates the empty list with type index O, representing a list of zero length.
The vcons constructor prepends an item onto a list of length n, creating a list of length S n. Notice that the constructors have dependent function types.

2.2.5.3 Capturing Program Properties

The type indices of inductive families can be used to statically verify properties of data structures. As an illustrative example, we now define a Coq function for appending vect lists:

    Fixpoint vapp (A : Set) (n m : nat) (a : vect A n) (b : vect A m) : vect A (n + m) :=
      match a with
      | vnil ⇒ b
      | vcons n' h t ⇒ vcons h (vapp t b)
      end.

The return type of vapp states that the resulting list has the same length as the sum of the lengths of the input lists. The function takes five parameters: a type parameter A, the lengths of the two input lists (parameters n and m) and the two vect lists to append (parameters a and b). The form of this definition is similar to app from §2.2.4.2, with the exception of the added type indices.

Notice that Coq automatically infers that the type parameter for vcons and vapp in the body of the function must be A. Additionally, Coq is able to infer the values of the type indices for these terms. However, Coq requires the type indices to be given in each pattern, i.e. we must write vcons n' h t instead of vcons h t in the step case pattern.

2.2.5.4 Type Equality

Coq’s type checker will verify that vapp is well typed and thus always returns a list of the expected length. To help describe what type checking dependently typed programs involves, we first consider two things for each pattern matching clause: the type of the term that is required, and how it compares to the type of the term given:

• For the vnil case, the type expected is vect (0 + m). The term given for this case is b, which has type vect m.
• For the vcons case, the type expected is vect (S n' + m).
The term given for this case is vcons h (vapp t b), which has type vect (S (n' + m)) as vapp t b has type vect (n' + m).

In both cases, the indices of the expected type and the actual type differ. The definition of vapp type-checks because Coq uses intensional type equality. Intensional equality dictates that two types are equal if they have the same normal form. We explain how normal forms are calculated in Coq in §3.1.1. For now, it suffices to say that this involves applying standard reduction rules to terms until no more reductions apply. We now reconsider the cases from the vapp example as follows:

• For the vnil case, the normal form of the expected type vect (0 + m) is vect m, which matches the type of term b.
• For the vcons case, the normal form of the expected type vect (S n' + m) is vect (S (n' + m)), which matches the type of term vcons h (vapp t b).

In Coq, only total functions are allowed, and calculating the normal form of a type is a decidable operation (see §3.1.1).

Another approach to comparing types is to use extensional type equality. This notion of equality says that two types are equal if both types are inhabited by the same members. Determining this typically requires constructing proofs as part of type checking, so this notion of equality generally leads to undecidable type checking. For instance, NuPrl [Allen et al., 2000] uses extensional equality when type checking dependently typed functions and has undecidable type checking. Extensional equality tends to identify more types as equal than intensional equality does.

2.2.5.5 Proof Construction

When the unrestricted use of dependent types is allowed, proofs must typically be constructed when writing programs. We now give an example of a program that requires proofs to be written. The following attempt to implement list reversal for vect terms looks as if it should be accepted by Coq but is actually ill-typed (the need for the in . . . return annotation is not important to this example):

    Fixpoint vrev (A : Set) (n : nat) (a : vect A n) : vect A n :=
      match a in vect n return vect n with
      | vnil ⇒ vnil A
      | vcons n' h t ⇒ vapp (vrev t) (vcons h (vnil A))
      end.

Coq’s type checker will report that this program is ill-typed with the following message, pinpointing the vcons case as the problem:

    The term “vapp (vrev A n' t) (vcons h (vnil A))” has type “vect A (n' + 1)”
    while it is expected to have type “vect A (S n')”.

This error arises because vect A (n' + 1) is not the same type as vect A (S n') under intensional type equality, as the type indices do not share the same normal form. The + function is defined recursively on its first argument, meaning that computations with + involve examining the structure of the first argument. As n' + 1 has a variable in the recursive position, it will not reduce to and match S n'.

One approach to fixing the faulty vrev function is to justify that n' + 1 means the same as S n' with a proof of the lemma plus_S : ∀ n, n + 1 = S n. This can be achieved here with the help of eq_rec, a standard Coq term that can be used to replace a term x with a term y given a proof of x = y. In other words, eq_rec is a theorem that represents the substitution property of equality. To fix the error reported for vrev, we can make explicit use of eq_rec with plus_S to justify that the type index of vapp (vrev t) (vcons h vnil) has the same meaning as the expected type index. The following script uses this approach to give a well-typed version of the vrev function. The terms that go in place of the _ symbols are implicit and can be determined by Coq:

    Fixpoint vrev (n : nat) (a : vect n) : vect n :=
      match a in vect n return vect n with
      | vnil ⇒ vnil
      | vcons n' h t ⇒ eq_rec _ _ (vapp (vrev t) (vcons h vnil)) _ (plus_S n')
      end.
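The listings above leave type parameters implicit and follow the pattern syntax described in the text. The following is a self-contained version of vect, vapp and the repaired vrev that we would expect to compile in a recent Coq 8.x release. It is a hedged sketch, and it has not been checked against the Coq version used for the thesis. The Arguments declarations, the fully explicit @eq_rec application and the term-level proof of plus_S are our own choices rather than the thesis's code. plus_S is written as a transparent recursive proof so that vrev still computes on concrete inputs.

```coq
(* Length-indexed lists, as in Section 2.2.5.2. *)
Inductive vect (A : Set) : nat -> Set :=
| vnil : vect A O
| vcons : forall n : nat, A -> vect A n -> vect A (S n).

(* Make the element type and the length index implicit. *)
Arguments vnil {A}.
Arguments vcons {A n} _ _.

(* Append. Both branches type-check by conversion: 0 + m reduces to m,
   and S n' + m reduces to S (n' + m). *)
Fixpoint vapp {A : Set} {n m : nat} (a : vect A n) (b : vect A m)
  : vect A (n + m) :=
  match a in vect _ n return vect A (n + m) with
  | vnil => b
  | vcons h t => vcons h (vapp t b)
  end.

(* plus_S : n + 1 = S n, proved by structural recursion on n. It is a
   plain (transparent) definition, so on closed numbers it reduces to eq_refl. *)
Fixpoint plus_S (n : nat) : n + 1 = S n :=
  match n return n + 1 = S n with
  | O => eq_refl
  | S p => f_equal S (plus_S p)
  end.

(* Reverse. The recursive case builds a vect A (n' + 1); eq_rec uses
   plus_S to cast it to the expected type vect A (S n'). *)
Fixpoint vrev {A : Set} {n : nat} (a : vect A n) : vect A n :=
  match a in vect _ n return vect A n with
  | vnil => vnil
  | vcons h t =>
      @eq_rec _ _ (vect A) (vapp (vrev t) (vcons h vnil)) _ (plus_S _)
  end.
```

If plus_S were instead proved with tactics and closed with Qed, vrev would still type-check. However, eq_rec only reduces once its equality proof reduces to eq_refl, so the vrev test below would then fail to compute.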
When we refer to providing proof automation for dependently typed programming, we are referring to machine support for constructing proofs such as the one needed in the above example.

2.2.5.6 Dependent Sums

Dependent sum types, also known as Σ types, represent pairs where the type of the second component can depend on the first. Typically, the first component is a value and the second component is a proof that some property holds for this value. Coq includes definitions for what are called weak and strong dependent sums. Strong dependent sums correspond most closely to the standard notion of dependent sums. When pattern matching on a strong dependent sum term, both components are accessible, whereas for weak dependent sums only the first component is accessible. In this thesis, we favour the term subset types for weak dependent sums.

Given A : Type and P : A → Prop, a Coq subset type is denoted as {x:A | P x}. A member of this subset type is constructed using a term y : A and a term of type P y. As an example of a subset type, the type {x: list A | length x = n} describes lists of length n. This particular subset type offers an alternative to the type vect A n for representing a list of a fixed length. Notably, the subset type representation uses the type list to represent the items and the simply typed length function to describe the number of items. In comparison, the vect inductive family represents the list structure itself, and its type indices describe the number of items in the list.

Related to the above, the notation {A}+{B} is used in Coq to denote the constructive sum of the propositions A and B.
This is represented as the type sumbool using the following definition:

    Inductive sumbool (A B : Prop) : Set :=
    | left : A → sumbool A B
    | right : B → sumbool A B.

Constructive sums can be thought of as a more informative version of booleans. For example, a function that decides whether two nat terms are equal could have the following type:

    nat_eq_dec : ∀ (x y : nat), {x = y}+{x ≠ y}

The nat_eq_dec x y function returns either a proof of x = y or a proof of x ≠ y.

2.2.6 Recursion and Termination

Most popular programming languages allow unrestricted recursion, and it is not uncommon for programming errors to result in programs that fail to terminate at run time. As type checking dependently typed languages can involve performing computations at compile time, non-terminating functions can cause type checking to loop. For this reason, languages based on type theory, like Coq, typically only allow terminating functions to be defined.

A common restricted form of recursion that ensures termination is structural recursion, where the argument to the recursive call must be structurally smaller than the argument to the parent call. For example, vapp (see §2.2.5.3) and app (see §2.2.4.2) were defined by structural recursion. To define a structurally recursive function in Coq, we can use the Fixpoint command with the following syntax:

    Fixpoint f (x1:A1) . . . (xn:An) {struct xi} : T := t

This introduces the function definition f that has the parameters (x1:A1) . . . (xn:An) and returns a term of type T. The function f is implemented by the term t. The expression {struct xi}, where xi is one of the function parameters, specifies which argument becomes smaller at each recursive call. As an example of using the Fixpoint command, the following code introduces a function called plus, defined by structural recursion on its first argument:

    Fixpoint plus (n : nat) (m : nat) {struct n} : nat :=
      match n with
      | O ⇒ m
      | S p ⇒ S (plus p m)
      end.

Coq can usually infer the structurally recursive parameter, so the struct expression can be left out in many cases. Sometimes we may wish to define functions that are not structurally recursive. One approach to defining such functions in type theory is to use an accessibility predicate to prove that recursion is well-founded [Aczel, 1977]. In §3.3.4, we give an example of how to define a non-structurally recursive function in Coq.

2.2.7 Impossible Cases

Dependent types can be used to prove that certain program branches will never be executed at run time. We call such branches impossible cases. For example, consider implementing a function hd that returns the head of a list. The programmer must decide what to do when the function is applied to the empty list. In simply typed languages, choices here include returning a default value, returning an option type, throwing an exception or defining hd as a partial function. We can use dependent types to specify that hd can only be applied to non-empty lists by giving an appropriate type to hd, such as the following:

    hd : ∀ (A : Type), {a : list A | a ≠ []} → A

Here, we use a subset type to specify that the input list to hd cannot be empty. The hd function can be defined as usual by pattern matching on the input list. For the case of the empty list, we can use the assumption that the input list must be non-empty to prove that such a scenario is impossible. We give a definition for hd in §3.3.3.

2.3 Dependently Typed Programming Languages

We now give an overview of the dependently typed programming languages that are currently available. As the line between proving and programming is blurred with dependent types, it can at times be unclear where to draw a distinction between proof assistants and programming languages.
Our intention is to survey systems that are primarily intended to be used for programming with dependent types, as opposed to proof development. We begin with languages, such as DML [Xi, 1998] and Concoqtion [Fogarty and Pasalic, 2007], where dependent types have been added to existing simply typed languages with varying levels of restriction. We then discuss languages with full dependent types based on type theory, such as Epigram [McBride and McKinna, 2004] and Agda [Norell, 2007, Bove et al., 2009].

2.3.1 DML

DML [Xi, 1998] is a conservative extension of the ML programming language that allows the use of a restricted form of dependent types. Type indices are restricted to what can be represented in a constraint domain C, where type checking involves solving constraint satisfaction problems in C. The authors demonstrate DML with C instantiated to the domain of linear integer arithmetic. Type checking in this domain is decidable, and a variant of the Fourier-Motzkin method [Dantzig and Eaves, 1973] is used in DML’s implementation for constraint solving. Examples of the program properties that can be captured in DML include verifying that array accesses are safe and that binary tree operations preserve tree balancing properties.

2.3.2 ATS

The obvious limitation of the current implementation of DML is that type indices are limited to the domain of linear arithmetic. The ATS language extends DML with an expressive type system for encoding arbitrary program properties [Cui et al., 2005]. Automation for linear arithmetic proofs is still provided, but the user is expected to construct any other required proofs themselves in the form of total functions. ATS uses a Coq-like separation between the logical and computational parts of programs. The logical part is insulated from problematic features that the language uses for computation, such as side-effects and general recursion.
2.3.3 Sage

Sage [Gronski et al., 2006] is a pure functional programming language with dependent types where any program term can appear in a type. Proof obligations generated during type checking are translated into a form that can be processed by an external theorem prover. The current implementation uses the Simplify theorem prover [Detlefs et al., 2005] for type checking, which is assumed to cope well with linear arithmetic theorems. There appear to be no facilities in Sage for the user to write proofs when this automation fails.

An interesting feature of Sage is that program properties that can be neither verified nor refuted statically are enforced at run time using a dynamic check. This idea offers a practical solution for when we wish to experiment with a dependently typed program but are unable or unwilling to construct the proofs necessary for type checking. If a run-time check in a Sage program fails during execution, the counterexample found is added to a database. This database is used at compile time to statically detect future errors of the same form.

2.3.4 Concoqtion

Concoqtion [Fogarty and Pasalic, 2007] refers to an approach for extending the type system of an existing language with a constructive type theory for which proof-checking software already exists. The logical and computational languages are kept separate to allow for decidable type checking. The Concoqtion approach is demonstrated with MetaOCaml Concoqtion, which conservatively extends MetaOCaml [Calcagno et al., 2003] with indexed types. Coq terms are used as the logical language for type indices, and type checking involves the use of the Coq theorem prover. Program scripts can include Coq proofs and make use of Coq’s tactics and decision procedures.

2.3.5 Cayenne

Cayenne [Augustsson, 1998] is a Haskell-like language where type indices can depend on any program expression.
As Cayenne programs can make use of unguarded general recursion, it is possible for a non-terminating term to appear in a type. As type checking involves term evaluation, the type checker can therefore fail to terminate, and type checking is undecidable. As a workaround, the number of reduction steps performed while type checking a term can be given an upper bound. The type checker is then sometimes unable to tell whether a term is well-typed or not. To allow type erasure, there are no constructs in Cayenne that allow programs to depend on a type.

2.3.6 Epigram

Epigram [McBride and McKinna, 2004] is a functional programming language with full dependent types. Epigram programs are elaborated into an intensional type theory based on UTT [Luo, 1994], a strongly normalising language with decidable type checking. There is no explicit separation between the logical and computational languages in Epigram programs.

Epigram programs are written in a structured editor that is similar in style to the editors for Alf [Magnusson and Nordström, 1994] and Agda [Coquand, 1998]. Programs in Epigram are constructed incrementally by invoking tactics to construct terms. These terms can contain holes that represent missing subterms that the user must complete. However, Epigram includes little in the way of proof automation, and even simple arithmetic proofs must be proven by hand.

2.3.7 Agda

Agda [Norell, 2007, Bove et al., 2009] is a dependently typed programming language based on intuitionistic type theory that has many similarities to Epigram. Unlike Coq, Agda does not support tactic-based theorem proving and instead relies on terms being manipulated by hand. Moreover, there is no Coq-like distinction between computational and logical terms in Agda.
Programming in Agda has been made more practical with the introduction of a program testing tool [Dybjer et al., 2003b, Dybjer et al., 2003a, Qiao Haiyan, 2003] and a tool for automating inductive proofs [Lindblad and Benke, 2006]. However, as far as we know, these tools have not been integrated in any way.

2.3.8 Idris

Idris [Brady, 2008] is a language with full dependent types that is described as being closely related to Epigram and Agda. Idris has Haskell-like syntax and is implemented on top of Ivor [Brady, 2007], a theorem-proving library written in Haskell. A compelling practical feature of Idris is that it supports I/O operations and communication with external programs. Similarly to Epigram and ATS, Idris does not offer any significant proof automation to make dealing with proof obligations easier.

2.4 Inductive Theorem Proving

We now move on to discussing aspects of automated theorem proving. Specifically, inductive theorem proving is an important technique for reasoning about functional programs and is the focus of much of our proof automation work.

Induction is a common technique for reasoning about recursively defined data structures and recursively defined functions. Induction is usually performed on a free variable in the goal using a suitable induction principle. For example, when a new inductive data type is declared, Coq automatically generates a standard induction principle for that type. The following induction principle is generated for Coq’s standard list type:

    ∀ (A : Type) (P : list A → Prop),
      P [] →
      (∀ (h : A) (t : list A), P t → P (h :: t)) →
      (∀ (x : list A), P x)

As an example of an inductive proof, consider the following theorem:

    ∀ (A : Type) (a b : list A), length (a ++ b) = length a + length b

To prove this theorem, we can proceed by induction on the variable a, using the standard induction principle for the list type given above. We must then prove two subgoals.
The first subgoal is the base case of the inductive proof, which is trivially true by reflexivity in this example:

    length ([] ++ b) = length [] + length b

The second subgoal is the step case of the inductive proof, which is as follows:

    IH : ∀ b, length (a ++ b) = length a + length b
    ============================
    length ((h :: a) ++ b) = length (h :: a) + length b

The assumption introduced in the step case is usually referred to as the inductive hypothesis. Proving the step case of an inductive proof generally relies on transforming the conclusion of the goal so that the inductive hypothesis can be used.

It is well known that providing automation for inductive proofs is challenging. Inductive theorem proving is undecidable in general, and the failure of cut elimination means most inductive proofs require new lemmas to be speculated [Bundy, 2001]. Moreover, proof search involves choices such as the following:

Variable choice: When a conjecture includes several variables, we must decide which variable or variables to perform induction on.

Induction principle choice: Sometimes the standard induction principle is not appropriate for the proof at hand. We may have to choose from a selection of induction principles, and sometimes we may even need to invent a new one.

Generalisation: Instead of performing induction directly on the goal we are trying to prove, it is sometimes necessary to first prove a more general version of the original goal. We discuss this technique further in §2.8.

2.5 Rippling

Rippling [Bundy et al., 2005] is an automated theorem proving technique that has been successfully used to automate nontrivial inductive proofs. This technique has been implemented in several theorem provers, including Clam [Bundy et al., 1990], INKA [Hutter and Sengler, 1996], NuPrl [Pientka and Kreitz, 1998b] and Isabelle [Dixon and Fleuriot, 2003]. The use of rippling is a central feature of our proof automation.
We give a brief overview of the rippling approach in this section. A more formal and in-depth discussion of rippling can be found in [Bundy et al., 2005].

2.5.1 Overview

Rippling applies whenever a theorem (labelled the given) shares syntactic similarities with the conclusion (labelled the goal). Rippling directs a proof attempt by using rules that reduce syntactic differences between the goal and the given, with the aim of using the given to advance the proof.

When applied to inductive theorem proving, the inductive hypothesis in the step case of an inductive proof is considered to be the given. Rippling is applicable here because, with typical induction principles, the inductive hypothesis and the goal share syntactic similarities. Applying the inductive hypothesis to the conclusion is usually crucial to proving the step case.

In traditional rippling proofs, the goal may only be modified if the differences with respect to the given are reduced. This requirement greatly restricts which rules can be applied. Difference-reducing rules are called wave rules. As differences can only be reduced a finite number of times in a proof, rippling always terminates.

The rippling technique is known to provide advantages over typical simplification tactics. For example, rippling can guide the use of associativity and commutativity lemmas in step case proofs in ways that allow the inductive hypothesis to be applied [Bundy et al., 2005]. Lemmas such as these can lead to non-terminating behaviour when used naively as part of simplification. Rippling can also guide the use of case splits in proofs, whereas the naive use of case splitting during simplification can lead to non-terminating behaviour [Johansson, 2009].

2.5.2 Differences and Embeddings

Consider again the step case of the inductive proof described in §2.4 from a rippling perspective.
The given and the goal are as follows:

    Given: ∀ b, length (a ++ b) = length a + length b
    Goal:  length ((h :: a) ++ b) = length (h :: a) + length b

The given is syntactically similar to the goal in that, if we remove certain terms from the goal, the given will match against what remains. We can annotate which terms in the goal differ from the given by shading in those terms. Here the shaded terms are the two occurrences of h :: wrapped around a:

    length ((h :: a) ++ b) = length (h :: a) + length b

Intuitively, we know that the differences have been correctly annotated when removing the annotated terms produces the given. As such, the term that was used as the given can be inferred from the annotated term. Each collection of shaded terms that represents a difference is referred to as a wave front. The unshaded subterm within each wave front, which is part of the given, is referred to as a wave hole. Note that a goal can have multiple valid annotations. When we can annotate a goal in the above way, we say that an embedding exists and that the given embeds into the goal.

2.5.3 Fertilisation

When the entire given matches a subterm within the goal, the matching term in the goal can be replaced with True. This step is called strong fertilisation.

When the given is an equation and one side of this equation matches a term t in the goal, the given can be used as a rewrite rule to replace t. This step is called weak fertilisation.

Rippling proofs usually involve “rippling-out” wave fronts to the top of the term tree of the goal to allow fertilisation to occur. In rippling proofs, we indicate the direction in which a wave front is moving during the proof attempt with a small arrow on the right of the wave front. Wave fronts that are being rippled-out are indicated with an upwards arrow. Differences can also be “rippled-in”, shown with a downwards arrow on the right of the wave front, to allow fertilisation.
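To connect this vocabulary to a concrete proof, the step case from §2.5.2 can be replayed as an ordinary Coq tactic script. This is a hedged sketch that assumes a recent Coq release with the standard List library; it is not an implementation of rippling. The lemmas w1 to w3 restate the wave rules listed with the example in §2.5.5 (in Coq each holds by computation). The rewrites with w1 to w3 move the wave fronts h :: _ outwards, and the final rewrite with the induction hypothesis is a weak fertilisation step. The theorem name length_app_sketch is hypothetical.

```coq
Require Import List.
Import ListNotations.

(* Wave rules w1-w3 from Section 2.5.5; each holds by computation. *)
Lemma w1 : forall (A : Type) (h : A) (t b : list A), (h :: t) ++ b = h :: (t ++ b).
Proof. reflexivity. Qed.

Lemma w2 : forall (A : Type) (h : A) (t : list A), length (h :: t) = S (length t).
Proof. reflexivity. Qed.

Lemma w3 : forall x y : nat, S x + y = S (x + y).
Proof. reflexivity. Qed.

Theorem length_app_sketch : forall (A : Type) (a b : list A),
  length (a ++ b) = length a + length b.
Proof.
  intros A a.
  (* Induct on a, keeping b universally quantified in the hypothesis,
     so that b is a sink in the given. *)
  induction a as [| h a IH]; intros b.
  - (* Base case: both sides compute to length b. *)
    reflexivity.
  - (* Step case: ripple the LHS with w1 and w2, then the RHS with w2 and w3. *)
    rewrite w1, w2, w2, w3.
    (* Goal is now: S (length (a ++ b)) = S (length a + length b) *)
    rewrite IH.  (* weak fertilisation: the given used as a rewrite rule *)
    reflexivity.
Qed.
```

Figure 2.2 finishes with w4 and strong fertilisation. Here the equivalent step is weak fertilisation followed by reflexivity, so no counterpart to w4 (Coq's f_equal) is needed.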
When the given contains a universally quantified variable x, strong or weak fertilisation can still occur if differences are moved next to the position in the goal that corresponds to the position of x. Any differences moved to this position can be used to instantiate x in the given when fertilising. The positions of variables like x in the given are referred to as sinks. The b variable in the annotated term from the previous section is a sink variable. We can indicate that b is in a sink position by annotating the variable as follows: ⌊b⌋.

2.5.4 Ripple Measures

Rippling proofs use a measure to determine whether a transformation has reduced the differences between the goal and the given. One example is the “sum of distances” measure for an annotated goal [Dixon and Fleuriot, 2004]. To calculate it, we add up the distance from each outward wave front to the top of the term tree, then add the distances from each inward wave front to its closest sink. The measure of the goal can therefore be reduced by moving outward wave fronts towards the top of the term tree and moving inward wave fronts towards sinks. Figure 2.1 gives a concrete example of how to calculate the measure of an annotated term.

    Annotated term:
      length ((h :: (a ++ ⌊b⌋))↓) = length ((h :: a)↑) + length ⌊b⌋

    Annotated syntax tree:
      =
      ├── length
      │   └── ++        [wave front h :: _ ↓]
      │       ├── a
      │       └── ⌊b⌋
      └── +
          ├── length
          │   └── a     [wave front h :: _ ↑]
          └── length
              └── ⌊b⌋

Figure 2.1: The above shows the syntax tree for an annotated term. Nodes are decorated with wave fronts that indicate the differences in the goal. The inward wave front is 1 step from its nearest sink and the outward wave front is 3 steps from the top of the term tree. Thus, the measure for this annotation is 1 + 3 = 4.

2.5.5 Example: A Rippling Proof

Figure 2.2 shows a rippling proof of the step case example from §2.5.2. This proof makes use of the following lemmas (note that it would be up to the user to decide on the set of lemmas that should be used during rippling proofs):
w1 : ∀ h t b, (h :: t) ++ b = h :: (t ++ b)
w2 : ∀ h t, length (h :: t) = S (length t)
w3 : ∀ x y, S x + y = S (x + y)
w4 : ∀ x y, x = y → S x = S y

length ((h :: a)↑ ++ ⌊b⌋) = length ((h :: a)↑) + length ⌊b⌋
⇓ LHS rippled out using w1.
length ((h :: (a ++ ⌊b⌋))↑) = length ((h :: a)↑) + length ⌊b⌋
⇓ LHS rippled out using w2.
(S (length (a ++ ⌊b⌋)))↑ = length ((h :: a)↑) + length ⌊b⌋
⇓ RHS rippled out using w2.
(S (length (a ++ ⌊b⌋)))↑ = (S (length a))↑ + length ⌊b⌋
⇓ RHS rippled out using w3.
(S (length (a ++ ⌊b⌋)))↑ = (S (length a + length ⌊b⌋))↑
⇓ Final differences removed using w4.
length (a ++ ⌊b⌋) = length a + length ⌊b⌋
Figure 2.2: An example rippling proof for the step case from §2.5.2 (wave fronts are written as parenthesised terms followed by their arrow; ⌊b⌋ marks a sink). After rewriting the conclusion with a lemma, we recalculate the differences between the modified conclusion and the given. Notice that each step reduces the differences between the goal and the given until, at the end, strong fertilisation is possible.

2.6 Proof Planning and Critics

Proof planning is an approach that provides high-level guidance to a proof attempt with the use of so-called proof plans [Bundy, 1988]. A proof plan can be thought of as a high-level outline of the proof that we intend to generate. The purpose of a proof planner is to construct an appropriate proof plan for the goal we wish to prove, where this proof plan is then used to guide proof search.

Proof plans are composed of methods. A method consists of a tactic along with a formal specification of the pre-conditions and post-conditions for applying that tactic. For example, given a rippling tactic, a rippling method that uses this tactic could have the pre-condition that the goal must contain an assumption that embeds into the conclusion of the goal, and the post-condition that strong fertilisation has taken place when the tactic succeeds.

A critic describes a strategy for fixing failed proof attempts during the execution
of a proof plan [Ireland and Bundy, 1996]. Critics are attached to methods and are invoked when a method fails in a specific way. Several critics have been developed specifically for patching failed rippling proofs, each addressing a different failure condition. The most commonly applicable critic is lemma calculation, which can be used to discover lemmas that rippling needs in order to succeed. Lemma calculation is invoked when there are no more difference-reducing rules to apply in the proof attempt and only weak fertilisation is possible. Lemma calculation proceeds by starting a proof of a new conjecture whose form is derived from a weak-fertilised and generalised version of the current goal. If this new conjecture is proven, the resulting lemma is used to unblock the rippling proof.

2.7 Lemma Discovery

The lemma calculation critic for rippling attempts to conjecture a missing lemma at the stage in a proof where the lemma is needed. An alternative to this lazy approach to lemma discovery is to attempt to eagerly discover lemmas that may be useful prior to attempting proofs. We briefly discuss this domain of lemma discovery for inductive theorems here.

IsaCoSy is a system for Isabelle for discovering inductive theorems [Johansson, 2009]. Given a set of initial theorems, IsaCoSy will generate a set of constraints for how these theorems should be used to generate conjectures. Constraints are generated in such a way as to avoid naively generating conjectures that trivially follow from currently known theorems. A counterexample checker is first used to filter non-theorems from this list of conjectures. IsaPlanner's rippling-based proof automation then attempts to find a proof for each conjecture.

MATHsAiD [McCaslan et al., 2007] aims to discover inductive theorems and employs heuristics to identify theorems that mathematicians would consider interesting, as opposed to generating an exhaustive list of all theorems that can be found.
Whereas IsaCoSy generates conjectures to prove, MATHsAiD uses forward chaining to discover new theorems.

2.8 Generalisation

Generalisation [Aubin, 1976, Boyer and Moore, 1979, Aderhold, 2007] is a theorem proving technique where, instead of proving the current goal g, we prove a lemma g′ that is a generalised version of g and use g′ to prove g. Somewhat counter-intuitively, g′ is usually easier to prove than the more specialised goal g. In some cases, generalisation is required before performing induction so that the inductive hypothesis is strong enough for a proof to be found. Following the terminology in [Walther, 1994], some common generalisation approaches are as follows:

Common subterm generalisation involves identifying a set of common non-variable subterms in the conclusion and replacing these subterms with a fresh variable. For example, the statement (x + 1) + y = y + (x + 1) can be made more general by generalising the common subterm x + 1 to z to produce the statement z + y = y + z.

Generalising apart is performed by replacing only some of the occurrences of a repeated variable in a statement with a fresh variable. For example, the statement x + x + y = x + y + x can be made more general by generalising apart the first occurrence of x on each side to z, producing z + x + y = z + y + x.

Inverse functionality is used to generalise equations where both sides have the same top-level function. Statements of the form f x1 … xn = f y1 … yn are generalised by removing the application of f to produce the statement (x1 = y1) ∧ … ∧ (xn = yn). For example, S (x + 1) = S (S x) can be generalised to x + 1 = S x.

Inverse weakening involves removing unnecessary conditions from a statement. For example, the statement ∀ x y, x ≠ y → x + y = y + x can be generalised to the statement ∀ x y, x + y = y + x, as the condition x ≠ y is not needed for the proof.
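Common subterm generalisation is straightforward to mechanise. The Python sketch below assumes terms are nested tuples with atoms as strings; it finds compound subterms that occur more than once and replaces a chosen one with a fresh variable. The helper names are hypothetical, and choosing which common subterm to generalise, and the fresh name z, is left to the caller.

```python
from collections import Counter
from typing import Iterator, List, Union

Term = Union[str, tuple]  # atom, or (function_symbol, arg1, ..., argN)


def subterms(t: Term) -> Iterator[Term]:
    """Yield t followed by all of its subterms."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)


def common_subterms(t: Term) -> List[tuple]:
    """Non-variable (compound) subterms that occur more than once in t."""
    counts = Counter(s for s in subterms(t) if isinstance(s, tuple))
    return [s for s, n in counts.items() if n > 1]


def replace(t: Term, old: Term, new: Term) -> Term:
    """Replace every occurrence of the subterm old in t by new."""
    if t == old:
        return new
    if isinstance(t, tuple):
        return (t[0],) + tuple(replace(arg, old, new) for arg in t[1:])
    return t


# (x + 1) + y = y + (x + 1)
stmt = ("=", ("+", ("+", "x", "1"), "y"), ("+", "y", ("+", "x", "1")))
[common] = common_subterms(stmt)   # ('+', 'x', '1')
print(replace(stmt, common, "z"))  # ('=', ('+', 'z', 'y'), ('+', 'y', 'z'))
```

Since the generalised statement may be false even when the original is true, such a step is normally paired with a counterexample check.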
However, generalisation can be an unsafe proof step in that a provable goal can be overgeneralised to form a new goal that is not provable. A typical safeguard is to use a counterexample finder to detect overgeneralisations. We discuss counterexample generation in the next section.

2.9 Counterexample Generation

Testing can be used for checking the correctness of a conjecture before attempting a proof and as a lightweight alternative to formal verification. Testing usually refers to checking whether a conjecture holds for some finite number of example instantiations to increase confidence that the conjecture is actually a theorem. A counterexample to a conjecture is an example that falsifies that conjecture. To falsify a universally quantified conjecture of the form ∀ x, P x, it suffices to find one instance of x for which P x is false. We examine some practical tools for finding counterexamples in this section.

2.9.1 QuickCheck

QuickCheck [Claessen and Hughes, 2000] is a well-known tool for Haskell that offers automated assistance for program testing. The programmer supplies QuickCheck with universally quantified conjectures about the behaviour of their program and QuickCheck tests these conjectures by searching for counterexamples. To check for a counterexample, QuickCheck replaces the universally quantified variables in the conjecture with appropriately typed, randomly generated terms. QuickCheck then evaluates the truth of the conjecture with these concrete values. If the conjecture evaluates to false, a counterexample has been found.

2.9.1.1 Generators

Testing statements that include pre-conditions can be challenging as randomly generated data is unlikely to satisfy the necessary conditions.
For example, consider testing the following statement, where sorted x is true when x is a sorted list and, under the condition that y is a sorted list, insert i y inserts the item i into list y at its sorted position:

∀ x i, sorted x → sorted (insert i x)

Except for small terms, random instantiations of x are highly unlikely to satisfy the condition of being sorted. Naive term generation will result in most test cases being trivially true, leading to poor test coverage. One approach to this problem in QuickCheck is to use a custom generator function. The purpose of a generator is to generate random terms that always satisfy a certain condition. For example, a generator for the condition above would randomly generate sorted lists.

2.9.2 SmallCheck and Lazy SmallCheck

SmallCheck and Lazy SmallCheck offer an alternative to QuickCheck's random generation of terms for finding counterexamples [Runciman et al., 2008]. SmallCheck searches for counterexamples by exhaustively testing all possible term instantiations up to some fixed maximum term size. This approach has the advantage that the smallest, and thus usually easiest to inspect, counterexample will be found, and the need for writing custom generators is reduced.

Lazy SmallCheck makes use of partially-defined inputs to prune large areas of the test space [Runciman et al., 2008]. For example, given that sorted [2; 1; x] evaluates to false without evaluating x, testing variants of x is unnecessary. Lazy SmallCheck results in significant performance improvements in many cases and allows a greater search depth to be tested in less time than with SmallCheck [Runciman et al., 2008].

2.9.3 Testing in Agda

Of particular relevance, a QuickCheck-like tool is available for Agda and has been used in case studies to develop dependently typed programs [Dybjer et al., 2003b, Dybjer et al., 2003a, Qiao Haiyan, 2003].
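Since the Agda tool, like QuickCheck, relies on custom generators, it helps to see concretely what a generator buys over naive generation, and how exhaustive SmallCheck-style search differs, on the sorted/insert conjecture above. The following is a small Python approximation, not the API of QuickCheck, SmallCheck or the Agda tool: quick_check, small_check, naive_list, sorted_list and buggy_insert are hypothetical names introduced here, lists are drawn from the digits 0 to 9, and the conjecture is encoded as a predicate that is vacuously true when its pre-condition fails.

```python
import itertools
import random


def is_sorted(xs):
    """True when xs is in non-decreasing order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def insert(i, xs):
    """Insert i into the sorted list xs at its sorted position."""
    for k, x in enumerate(xs):
        if i <= x:
            return xs[:k] + [i] + xs[k:]
    return xs + [i]


def buggy_insert(i, xs):
    """Deliberately wrong: appends i instead of inserting it in order."""
    return xs + [i]


def make_prop(ins):
    """The conjecture  forall x i, sorted x -> sorted (ins i x)."""
    return lambda xs, i: not is_sorted(xs) or is_sorted(ins(i, xs))


def naive_list(rng):
    """Naive generation: a random list of digits of length 0 to 8."""
    return [rng.randint(0, 9) for _ in range(rng.randint(0, 8))]


def sorted_list(rng):
    """Custom generator: every list it produces satisfies the pre-condition."""
    return sorted(naive_list(rng))


def quick_check(prop, gen, trials=500, seed=0):
    """Random testing. Returns (counterexample or None,
    share of tests whose input met the pre-condition)."""
    rng = random.Random(seed)
    hits = 0
    for n in range(1, trials + 1):
        xs, i = gen(rng), rng.randint(0, 9)
        hits += is_sorted(xs)
        if not prop(xs, i):
            return (xs, i), hits / n
    return None, hits / trials


def small_check(prop, depth=3, values=range(4)):
    """Exhaustive testing of every list up to length depth, shortest first."""
    for n in range(depth + 1):
        for xs in itertools.product(values, repeat=n):
            for i in values:
                if not prop(list(xs), i):
                    return list(xs), i
    return None


ok, bad = make_prop(insert), make_prop(buggy_insert)
print(quick_check(ok, naive_list))   # (None, r) with r < 1.0
print(quick_check(ok, sorted_list))  # (None, 1.0)
print(small_check(bad))              # ([1], 0)
```

With naive generation only a fraction of the random lists reach the interesting part of the property, whereas every list produced by sorted_list does. The exhaustive search enumerates lists in order of length, so the counterexample it reports for buggy_insert is one of minimal length.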
The Agda tool has been mostly implemented within Agda itself, where custom generators are written as Agda functions. A useful consequence is that we can formally verify within Agda that such generators have the expected property of being surjective functions. In other words, we can verify that the generator for a type is able to generate all possible terms of that type.

2.10 Summary

We have introduced the primary features of dependently typed programming languages and, in particular, described the need for proof construction when writing programs. We then surveyed the development environments currently available for dependently typed programming, followed by an introduction to topics concerning automated inductive theorem proving and program testing. In the next chapters, we elaborate further on how dependently typed programs are constructed before introducing our framework for supporting this process.

Chapter 3 Programming with Dependent Types

This chapter describes ways in which dependently typed programs can be constructed and summarises design choices that need to be considered when capturing program properties with types. These topics will become important later when we discuss the design and scope of our framework for supporting dependently typed programming.

As we want to provide support for programming with full dependent types, we focus on features common to languages like Agda, Epigram, ATS and Coq. Moreover, we introduce the Russell language for Coq [Sozeau, 2008], which we utilise in our framework prototype.

3.1 Coq

In this section, we describe some specific details related to programming in Coq to provide more clarity to the examples we give later. We recommend [Bertot and Castéran, 2004, Giménez and Castéran, 2005] as further material for learning how to program in Coq, and Coq's manual [Coq development team, 2006] as a reference guide.
3.1.1 Reductions, Normalisation and Convertibility

We now describe the reduction rules used in Coq to perform computations. These rules are of particular relevance because they are used to compute normal forms when comparing types during type checking (see §2.2.5.4). The notation t{x/u} represents the term that results from substituting u for all free occurrences of x in term t, where α-conversion is used to avoid variable capture. The reduction rules used in Coq are as follows [Coq development team, 2006]:

δ-reduction: This reduction is used for unfolding definitions. If id is an identifier with the value v in the current context, then δ-reducing id in the term t results in the term t{id/v}.

β-reduction: A term of the form (fun x ⇒ s) t is called a β-redex. Performing β-reduction on such a term results in the term s{x/t}.

ζ-reduction: Performing ζ-reduction on a term of the form let x := s in t results in the term t{x/s}.

ι-reduction: Informally, this reduction simplifies a pattern matching expression by determining which pattern matches and making the appropriate simplification (see [Coq development team, 2006] for further details).

A term is said to be in normal form when none of the above reductions can be applied. In Coq, sequences of reductions on terms have several important properties, including:

Strong normalisation: A term can only be reduced a finite number of times and will eventually reach a normal form.

Confluence: If t1 can be reduced to the terms t2 and t3, then there is a term t4 to which both t2 and t3 can be reduced.

As reductions are strongly normalising and confluent, all terms have unique normal forms [Bertot and Castéran, 2004]. If two terms reduce to the same term, the terms are said to be convertible.
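To make normal forms and convertibility concrete, the following Python sketch models a tiny fragment: natural numbers built from O and S, variables, and an addition function plus that is assumed to be structurally recursive on its first argument. It is an illustration only, not how Coq implements reduction; each rewrite below bundles the δ-step that unfolds plus with the β- and ι-steps that select the matching case.

```python
from typing import Union

# A nat term: "O", a variable name, ("S", t) or ("plus", a, b).
Term = Union[str, tuple]


def normalise(t: Term) -> Term:
    """Compute the normal form of t.

    plus is recursive on its first argument:
      plus O m      ~>  m
      plus (S n) m  ~>  S (plus n m)
    When the first argument is a variable, no rule applies (the term is stuck)."""
    if isinstance(t, str):  # "O" or a variable
        return t
    if t[0] == "S":
        return ("S", normalise(t[1]))
    a, b = normalise(t[1]), normalise(t[2])
    if a == "O":
        return b
    if isinstance(a, tuple) and a[0] == "S":
        return ("S", normalise(("plus", a[1], b)))
    return ("plus", a, b)  # stuck: already a normal form


def convertible(s: Term, t: Term) -> bool:
    """Two terms are convertible when they have the same normal form."""
    return normalise(s) == normalise(t)


one, two = ("S", "O"), ("S", ("S", "O"))
print(convertible(("plus", one, one), two))                               # True
print(convertible(("plus", ("S", "x"), "y"), ("S", ("plus", "x", "y"))))  # True
print(convertible(("plus", "x", "O"), "x"))                               # False
```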
For example, the terms 1 + 1 and 2 are convertible because 1 + 1 can be reduced to 2 using a combination of δ, β and ι reduction. However, if + is defined to be structurally recursive on its first argument, x and x + 0 are not convertible, as these terms are distinct normal forms.

3.1.2 Equality

Equality in Coq is represented as a parametrised inductive definition called eq, which has the following type:

    eq : ∀ (A : Type), A → A → Prop

We write eq A x y as x = y, where the type parameter is implicit. The only constructor for eq is refl_equal, which has the following type:

    refl_equal : ∀ (A : Type) (x : A), x = x

For example, the term 1 = 1 represents a proposition and refl_equal nat 1 is a proof of this proposition. The term 1 = 0 represents a proposition for which there exists no proof. Notice that the definition of = only allows two terms that have convertible types to be compared. This can be problematic when, for example, we want to compare a term of type vect n and a term of type vect (n + 0). McBride's so-called “John Major” equality, which can be defined in Coq, provides an approach for making such comparisons [McBride, 2000].

3.2 Program Construction

In this section, we consider approaches for constructing dependently typed programs. We consider manual term construction and term construction with the use of tactics. We then introduce the Russell language for Coq [Sozeau, 2008], which we utilise in our framework prototype. This language gives a convenient approach to dependently typed programming as it allows the computational and logical parts of a program to be constructed separately.

3.2.1 Constructing Programs Manually

Dependently typed programs can be constructed by working directly with the term language to build programs by hand.
The following Coq function for reversing vect lists from §2.2.5.5 was written in this manner:

    Fixpoint vrev (n : nat) (a : vect n) : vect n :=
      match a in vect n return vect n with
      | vnil ⇒ vnil
      | vcons n' h t ⇒ eq_rec (vapp (vrev t) (vcons h vnil)) (plus_S n')
      end.

Recall that we were required to add propositional terms to the step case of the function so that the program would be well-typed. Dependently typed functions like the above can be built incrementally by hand, using feedback from the type checker to construct terms of the appropriate type.

Proofs in Coq can also be constructed directly in the style of a dependently typed program. However, this approach is uncommon in Coq scripts, as even relatively simple proofs correspond to terms that are large and difficult to interpret. As such, proofs in Coq are usually constructed with machine assistance, which we cover in the next section.

3.2.2 Proof Construction with Tactics

Proof assistants usually include tactics that provide machine assistance for incrementally building proofs. In systems based on type theory, tactics are used to build a term whose type is the proposition we want to prove. For example, we can construct a proof for the proposition ∀ n, n + 1 = S n with the following annotated Coq script:

    Lemma plus_S : ∀ n, n + 1 = S n.
      (* Perform induction on n and label the inductive hypothesis H *)
      induction n as [| n H].
      (* Base case subgoal: 0 + 1 = 1 *)
      reflexivity. (* Proof by reflexivity *)
      (* In the step case subgoal, the inductive hypothesis H is n + 1 = S n
         and the conclusion is S n + 1 = S (S n) *)
      simpl. (* Simplify conclusion to: S (n + 1) = S (S n) *)
      rewrite H. (* Fertilise conclusion to: S (S n) = S (S n) *)
      reflexivity. (* Proof by reflexivity *)
    Qed.
When the proof begins, the only information known about the structure of the term being built is that the type of the final term should be ∀ n, n + 1 = S n. When the induction tactic is invoked in the script above, the term representing the proof is partially constructed using the induction principle for nat. This partial term contains holes for the subterms that correspond to the base case and step case proofs, and Coq's user interface shows these holes as subgoals. The next lines in the script discharge these subgoals and instantiate these holes to create the complete term.

3.2.3 Program Construction with Tactics

As well as proofs, tactics can be used to construct dependently typed programs. This is done by specifying the type of the function we want to build and using tactics to incrementally build a term of the corresponding type. For example, the following proof script will build a term with the same type and computational behaviour as the vrev function from §2.2.5.5:

    Definition vrev : ∀ (A : Set) (n : nat) (a : vect A n), vect A n.
      intros.
      (* Perform induction on term a; the tail is named t and the inductive hypothesis H *)
      induction a as [| n' h t H].
      (* Base case: vect A 0 *)
      exact (vnil A). (* Supply term to use *)
      (* Step case: vect A (S n') *)
      rewrite <- plus_S. (* Rewrite conclusion: vect A (n' + 1) *)
      exact ((vapp H) (vcons h (vnil A))). (* Supply term to use *)
    Qed.

Intuitively, when considering the term constructed by this script, the induction step in the proof corresponds to performing pattern matching on the input vect term, and the use of the inductive hypothesis (labelled H in the script) corresponds to the recursive call. When writing programs with tactics in this way, it is important to be aware of which tactic calls are used to construct computationally relevant terms in the final program.
3.2.4 Constructing Computational and Logical Terms Separately

Sometimes it can be preferable to construct the computational and logical parts of a program separately. For example, this methodology was employed when verifying a Java Card tokenization algorithm in Coq [Denney, 2001, §5]. Coq's Program tactic [Sozeau, 2007a, Sozeau, 2007b, Sozeau, 2008] gives a convenient method for constructing programs in this fashion, where we write the computational part of a program first and then write the proof of correctness.

3.3 Dependently Typed Programming in Russell

By using the Program tactic, we can write the computationally relevant parts of a dependently typed function and defer the construction of the required proofs to a later time. Previous work in Coq by Parent [Parent, 1995] provided similar facilities. The Program tactic accepts function definitions written in a language called Russell [Sozeau, 2007a, Sozeau, 2007b, Sozeau, 2008]. Russell functions share much of the syntax and typing rules of regular Coq definitions, except that Russell permits certain terms to be omitted. After a decidable type-checking procedure, a Russell program is interpreted into a Calculus of Inductive Constructions term that contains uninstantiated typed metavariables in place of the missing proofs. These missing proofs become proof obligations that must be solved to complete the program definition. The Program tactic will attempt to automatically discharge proof obligations with a configurable tactic. When a proof cannot be found automatically, the user is asked to interactively construct a suitable term using tactics in the form of a Coq proof goal.

We give an introduction to programming in Russell in the next few sections. For a more formal treatment of Russell and its typing rules, see [Sozeau, 2007b].
3.3.1 Inductive Family Coercions

When a subterm of a Russell program is expected to belong to an inductive family I with type index x, it is permitted to use a term from the same inductive family I with the type index y, as long as we later discharge a proof obligation of the form x = y. We now give an example of a Russell function that generates proof obligations such as this.

Russell definitions are prefixed with the Program keyword. The following Russell function uses vect to capture the length of the list returned by a function that concatenates a list of n lists, each of length m:

    Program Fixpoint vconcat (A : Set) (n m : nat) (a : vect (vect A m) n) : vect A (m * n) :=
      match a with
      | vnil ⇒ vnil
      | vcons A h t ⇒ vapp h (vconcat t)
      end.

As can be seen from the above, Russell programs have the appearance of regular Coq functions. The important difference here is that, for the function body to be a valid Coq program, type coercions would need to be added to each pattern matching clause. As explained at the start of this subsection, Russell allows type coercions to be omitted in certain situations. Due to the type coercions we omitted, this Russell definition generates a proof obligation for each pattern matching clause. We introduce the terminology base case proof obligation for proof obligations produced by the base case term of a function, and recursive call proof obligation for proof obligations produced by the step case term of a function.

The base case of vconcat generates a proof obligation because the result term vnil A has type index 0 when it is expected to have type index m * 0. We therefore need to prove that ∀ m, 0 = m * 0.
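Before attempting these obligations with tactics, they can be tested in the spirit of §2.9. The Python sketch below checks the base case obligation above, and the step case obligation, for all small m and n. It assumes that vapp returns a vector of index p + q for inputs of index p and q (an assumption about the vapp of §2.2.5.5), so that the step case asks for m + m * n' = m * S n'. The mult function mirrors multiplication defined by recursion on its first argument, so 0 * m reduces but m * 0 does not when m is a variable; this is why the base case needs a proof at all. Passing tests only raise confidence and do not replace the inductive proofs.

```python
def mult(n: int, m: int) -> int:
    """Multiplication by recursion on the first argument:
    0 * m = 0 and (p + 1) * m = m + p * m."""
    return 0 if n == 0 else m + mult(n - 1, m)


def base_obligation(m: int) -> bool:
    # vnil has index 0 where vect A (m * 0) is expected
    return 0 == mult(m, 0)


def step_obligation(m: int, n1: int) -> bool:
    # vapp h (vconcat t) has index m + m * n1 where vect A (m * S n1) is expected
    return m + mult(m, n1) == mult(m, n1 + 1)


def failures(bound: int = 20) -> list:
    """Every (m,) or (m, n1) below bound for which an obligation fails."""
    bad = [(m,) for m in range(bound) if not base_obligation(m)]
    bad += [(m, n1) for m in range(bound) for n1 in range(bound)
            if not step_obligation(m, n1)]
    return bad


print(failures())  # []
```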
The actual proof obligation generated by Program is shown as follows in the form of a Coq proof goal:

    A : Set
    n : nat
    m : nat
    a : vect (vect A m) n
    Heq_n : 0 = n
    Heq_a : vnil ≃ a
    ============================
    0 = m * 0

The proof obligation contains the assumptions A, n, m and a as these are the input terms to vconcat. The assumptions Heq_n and Heq_a are generated from the use of pattern matching in the definition of vconcat. The symbol ≃ represents McBride's John Major equality [McBride, 2000] (which we mentioned in §3.1.2). Briefly, proof obligations contain the information that is deduced about terms when pattern matching is performed [Sozeau, 2007a, §3.3]. In this case, the term a was matched against the pattern vnil, so the proof obligation contains the assumptions 0 = n and vnil ≃ a. We refer to any equations produced by pattern matching as pattern matching equations.

3.3.2 Subset Type Coercions

Russell provides support for programming with subset types (see §2.2.5.6) using a mechanism based on the predicate subtyping feature from PVS [Shankar and Owre, 1999]. The essential idea is that, when a term in a Russell function is expected to have type {x : A | P x}, Russell allows the use of a term t : A if we later provide a proof of P t in the typing context of t.

The proj1_sig t function is a standard Coq definition for returning only the computational part of a subset type term t. If a Russell program includes a term t : {x : A | P x} when a term of type A was expected, the Program tactic will use the proj1_sig function to coerce t to the expected type.

We now give an example of a Russell function that includes the use of subset types. The following Russell function reverses the input list a and returns a subset type term that contains the reversed list r with a proof that a and r share the same length:
    Program Fixpoint srev (A : Set) (a : list A) : {r : list A | length r = length a} :=
      match a with
      | [] ⇒ []
      | h :: t ⇒ (srev t) ++ [h]
      end.

For the base case of the function, we are required to show that the list returned (i.e. []) has the same length as the input list a. This is done by proving the base case proof obligation, which has the following form:

    A : Set
    a : list A
    Heq_a : [] = a
    ============================
    length [] = length []

Again, notice the equational assumption Heq_a that was produced from the use of pattern matching.

The term given for the step case of srev is (srev t) ++ [h]. As ++ expects terms of type list and srev t is a subset type term, the Program tactic uses the proj1_sig function mentioned above to coerce the srev t term to the expected type. For the step case, we therefore must prove that (proj1_sig (srev t)) ++ [h] has the expected length. The proof obligation generated for the step case is as follows:

    srev : ∀ (A : Set) (a : list A), {r : list A | length r = length a}
    A : Set
    a : list A
    h : A
    t : list A
    Heq_a : h :: t = a
    ============================
    length ((proj1_sig (srev A t)) ++ [h]) = length (h :: t)

Proving recursive call proof obligations usually involves making use of the propositional part of the subset type term returned by the recursive call. The first step in doing this is typically to destructure the result term r of the recursive call into its computational part s and propositional part p. This allows terms of the form proj1_sig r to simplify to s, and then p can be used as part of the proof.
For example, destructuring srev A t and simplifying away the proj1_sig term in the above goal produces the following:

    srev : ∀ (A : Set) (a : list A), {r : list A | length r = length a}
    A : Set
    a : list A
    h : A
    t : list A
    Heq_a : h :: t = a
    srev_s : list A
    srev_p : length srev_s = length t
    ============================
    length (srev_s ++ [h]) = length (h :: t)

3.3.3 Impossible Case Proof Obligations

The ! symbol is used in Russell programs to indicate that a particular branch of a program is an impossible case (see §2.2.7). For example, in the following function that returns the head of a list, we mark the base case as being an impossible case:

    Program Fixpoint hd (A : Type) (a : list A | a ≠ []) : A :=
      match a with
      | nil ⇒ !
      | h :: t ⇒ h
      end.

Note that, in the above function, (a : list A | a ≠ []) is convenient shorthand notation for a : {x : list A | x ≠ []}. The use of the ! symbol will produce a proof obligation where we must show that the typing context where the symbol appeared contains a contradiction. For the above program, the generated proof obligation has the form {a : list A | [] ≠ []} → False.

3.3.4 Termination Measures

Only terminating functions can be defined in Coq. Non-structurally recursive functions can be defined using Russell, provided we construct the proofs required to show that each function terminates. The Russell syntax {measure x} is used to state that a function terminates because the nat term x becomes smaller at each recursive call. For example, we can define a function for calculating Fibonacci numbers using a decreasing measure as follows:

    Program Fixpoint fib (n : nat) {measure n} : nat :=
      match n with
      | O ⇒ 1
      | S O ⇒ 1
      | S (S x) ⇒ fib x + fib (S x)
      end.

In this specific example, {measure n} states that the value of n passed to each recursive call to fib should always be less than the value of n passed to the parent call.
Therefore, to prove this function terminates, we must discharge a proof obligation of the form ∀ x, x < S (S x) and another of the form ∀ x, S x < S (S x). These proof obligations are produced by the first and second recursive calls respectively.

3.4 Program Specifications

When writing dependently typed programs, we want to make use of types in such a way that type checking enforces the program properties we are interested in capturing. In this section, we refer to the type given to a dependently typed function as being its program specification. We now discuss some design choices for writing program specifications and describe some of the features of working with different representations.

3.4.1 Strong and Weak Specifications

Specifications can be described as fitting within two general categories:

Strong specifications precisely describe all valid input and output pairs we would expect from a correct function. For example, a strong specification for a list sorting function f is “f returns a sorted list that is a permutation of its input”.

Weak specifications only specify some of the behaviour we would expect from a correct function. Such specifications are also referred to as being “loose” or underspecified. For example, a weak specification for the list sorting function f is “f returns a sorted list”. This captures some of the properties expected in a sorting function, but notice that a function that always returns the empty list would satisfy this specification.

3.4.2 Transparent and Opaque Definitions

Proof assistants make use of the notion of transparent and opaque definitions. Transparent definitions can be unfolded in proofs. Opaque definitions differ in that, by design, they cannot be unfolded and only their typing can be observed.
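The gap between the strong and weak sorting specifications of §3.4.1 is easy to demonstrate with run-time checks. In the Python sketch below, a specification is modelled as a predicate on an input/output pair (a simplification: in Coq these would be types), and the permutation requirement is checked by comparing multisets with collections.Counter. The names weak_spec, strong_spec, satisfies and always_empty are our own; always_empty is a deliberately wrong sorting function.

```python
from collections import Counter
from typing import Callable, List


def weak_spec(inp: List[int], out: List[int]) -> bool:
    """Weak specification: the output is sorted."""
    return all(a <= b for a, b in zip(out, out[1:]))


def strong_spec(inp: List[int], out: List[int]) -> bool:
    """Strong specification: sorted, and a permutation of the input."""
    return weak_spec(inp, out) and Counter(inp) == Counter(out)


def always_empty(xs: List[int]) -> List[int]:
    """A wrong 'sorting' function that always returns the empty list."""
    return []


def satisfies(spec: Callable, f: Callable, inputs) -> bool:
    """Check spec on the input/output pairs of f for each input."""
    return all(spec(xs, f(xs)) for xs in inputs)


inputs = [[], [3, 1, 2], [2, 2, 1]]
print(satisfies(weak_spec, always_empty, inputs))    # True
print(satisfies(strong_spec, always_empty, inputs))  # False
print(satisfies(strong_spec, sorted, inputs))        # True
```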
Dependently typed functions are generally declared opaque to hide their implementation details, so only the information given by the typing of such functions can be relied on in proofs. For example, the typing of srev from §3.3.2 only provides a guarantee regarding the length of the list that srev returns. In contrast, simply typed functions are usually transparent when used in the typing or the implementation of a dependently typed function.

3.4.3 Type Refinement Choices

When developing dependently typed programs, we have a design choice of which functions we make dependently typed and opaque, and which functions we make simply typed and transparent. The degree to which the typing of a program is made more specific, at varying levels, is known as type refinement [Pfenning, 1993]. For example, reconsider the srev function:

    Program Fixpoint srev (A : Set) (a : list A) : {r : list A | length r = length a} :=
      match a with
      | [] ⇒ []
      | h :: t ⇒ (srev t) ++ [h]
      end.

Here, we have chosen to use a transparent simply typed append function (i.e. ++) in the body of srev. As ++ is transparent, we can write proofs that rely on any of the usual properties of append.

An alternative approach to implementing srev is to replace ++ with an opaque dependently typed append function with the following type:

    sapp : ∀ (A : Type) (a b : list A), {r : list A | length r = length a + length b}

If we then implement srev using sapp, a call to sapp will appear in the recursive call proof obligation. Destructuring this call to sapp will produce a propositional term that describes the length of the list sapp returns, resulting in a proof obligation with a different form to before. In several examples in Chapter 9, we explore how varying the level of type refinement in a program can lead to more challenging proofs.
3.4.4 Functions and Inductive Predicates

Program properties can be described with the use of functions as well as with inductive predicates. We describe these representations here.

3.4.4.1 Inductive Predicates

Predicates can be defined using inductive types to create so-called inductive predicates. In Coq, the commonly used operators for conjunction (∧), disjunction (∨) and equality (=) are defined as inductive predicates. For example, the following inductive predicate defines a type that can be used to build a proof that a natural number is even:

    Inductive p_even : nat → Prop :=
    | even_O : p_even O
    | even_S : ∀ n, p_even n → p_even (S (S n)).

The proposition p_even n can be interpreted as “n is even”. To prove p_even n, we must show that the type p_even n is inhabited. For example, we can show the proposition p_even (S (S O)) holds by building the witness term even_S O even_O. Proofs concerning inductive predicates typically involve determining which constructors should be used to build a term of the appropriate type in this way.

Inversion is a common tool for reasoning about inductive predicates [Cornes and Terrasse, 1995]. Briefly, inversion is used to reason about which constructors could have been used to construct a term of a certain type. For example, given the constructors for p_even, we can reason that a term of type p_even 4 must have been constructed with the constructor even_S and a term of type p_even 2. We can also reason that a term of type p_even 1 is impossible to construct.

3.4.4.2 Predicates as Functions

Instead of using inductive predicates, predicates can also be defined as regular functions. For example, the following function f_even returns True when n is even and False otherwise:

    Fixpoint f_even (n : nat) : Prop :=
      match n with
      | O ⇒ True
      | S O ⇒ False
      | S (S p) ⇒ f_even p
      end.

In contrast to the inductive predicate p_even from before, we can perform computations with f_even.
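The two styles can be mimicked outside Coq. In the Python sketch below, a proof of p_even n is modelled as a nested tuple recording which constructor was applied: ('even_O',) for even_O, and ('even_S', k, d) for even_S applied to k and a proof d of p_even k. The function even_witness searches for such a derivation and returns None when, as the inversion argument shows for 1, none exists. By contrast, f_even decides evenness by computation, returning a Python bool where the Coq function returns the proposition True or False. Both names and the tuple encoding are our own illustration.

```python
from typing import Optional


def even_witness(n: int) -> Optional[tuple]:
    """Build a derivation of p_even n from the constructors, or None."""
    if n == 0:
        return ("even_O",)
    if n == 1:
        return None  # no constructor can produce p_even 1
    d = even_witness(n - 2)
    return None if d is None else ("even_S", n - 2, d)


def f_even(n: int) -> bool:
    """Mirror of the recursive f_even: decide evenness by computation."""
    if n == 0:
        return True
    if n == 1:
        return False
    return f_even(n - 2)


print(even_witness(2))  # ('even_S', 0, ('even_O',))
print(f_even(2))        # True
```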
With f_even, to check that S (S O) is even, we only have to simplify the term f_even (S (S O)) by computation and check that the result is True. The work in this thesis mainly concerns the use of recursive functions in preference to inductive predicates.

3.5 Conclusions

We have given a summary of the main approaches for constructing dependently typed programs and discussed some of the design choices available when capturing program specifications. We looked at how programs can be built directly by hand and how assistance can be given to term construction with the use of tactics. We then introduced the Russell language, which gives a convenient approach to programming in Coq, where tactics can be used to construct any required proofs. The proofs required typically involve manipulating equations and reasoning about inductively defined types. We describe the general pattern of these proofs in Chapter 6.

We then described some of the choices that are available when capturing program specifications, such as which functions in dependently typed programs should be simply typed and how to represent predicates. These choices will become relevant when describing what style of programming we can support with our framework.

Chapter 4
Challenges when Programming with Dependent Types

In this chapter, we discuss some of the challenging aspects of programming with dependent types. Specifically, we explain why we believe user assistance is needed for constructing proofs and coping with errors in programs. The purpose here is to describe the motivation behind the framework we have designed, which aims to make dependently typed programming more practical. We introduce this framework in the next chapter.

4.1 User-Defined Properties

In this thesis, we use the phrase user-defined properties to refer to program properties, captured with the use of dependent types, that involve data types and functions that were introduced by the user.
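As a concrete instance of a user-defined property, the following sketch defines a factorial function and a tail recursive variant (both hypothetical definitions written for illustration) and proves that they agree, a program equivalence property of the kind listed next. Note that the accumulator acc is kept universally quantified when induction on n begins; fixing it first would make the inductive hypothesis too weak.

```coq
Require Import Arith ArithRing.

Fixpoint fact (n : nat) : nat :=
  match n with
  | O => 1
  | S p => n * fact p
  end.

(* Tail recursive version with an accumulator. *)
Fixpoint fact_acc (n acc : nat) : nat :=
  match n with
  | O => acc
  | S p => fact_acc p (n * acc)
  end.

(* acc is left universally quantified so that the inductive
   hypothesis holds for every accumulator value. *)
Lemma fact_acc_correct : forall n acc, fact_acc n acc = fact n * acc.
Proof.
  induction n as [| p IH]; intros acc; simpl.
  - ring.
  - rewrite IH. ring.
Qed.

Theorem fact_equiv : forall n, fact_acc n 1 = fact n.
Proof. intros n. rewrite fact_acc_correct. ring. Qed.
```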
We aim to support dependently typed programming with user-defined properties and not, for instance, limit the user to only working with some restricted, predefined set of definitions. In particular, we want to provide support for capturing program behaviour such as the following with user-defined properties:

Membership properties, such as those involving subcollections and permutations, e.g. "reversing a list results in a permutation of the initial list".

Ordering properties, such as stating that a collection is sorted and that an item has been added to a particular position in a data structure, e.g. "an insertion sort function produces a sorted list".

Program equivalence properties, such as showing that an optimised version of a program produces the same results as a simpler but unoptimised version, e.g. "a tail recursive version of a factorial function gives the same results as a non-tail recursive version".

Arithmetic properties, such as those involving the number of items in a collection and the size/height/depth of a data structure. We are particularly interested in providing some support for non-linear arithmetic properties, e.g. "a complete binary tree of depth n has 2^n − 1 nodes".

We note that typical dependently typed programs will also make use of types and functions taken from the standard libraries that form part of the programming environment being used. In this thesis, we primarily focus on providing support for types and definitions introduced by the user.

4.2 Proof Construction

As seen previously, developing dependently typed programs typically requires that we construct proofs (see §3.3). Requiring users to construct proofs themselves makes dependently typed programming less practical for the following reasons:

Proof construction can be difficult. It is well known that constructing formal proofs is a challenging task and can be particularly daunting for beginners.
Without suitable proof automation, dependently typed programming will require that users have theorem proving experience.

Proof construction is time consuming. Even when a skilled user is able to construct the required proofs, this process can take a lot of effort. As such, without proof automation, users will be discouraged from capturing properties that involve time consuming proofs.

In this thesis, we focus on capturing user-defined properties that involve inductively defined types and recursively defined functions. As we describe in Chapter 6, we find that capturing these forms of properties typically requires inductive proofs to be written. Inductive proofs are also common in Agda, ATS, Coq and Epigram programs. Most users are likely to find writing such proofs manually both challenging and time consuming. As such, we believe proof automation support is important to making programming with user-defined properties more practical.

We note here that there are benefits to constructing proofs manually as, for example, this process can give insight into the behaviour of a program. However, we leave the question of where automation is suitable open to debate.

4.3 Coping with Errors

Recall that in Russell programs, proof obligations have to be discharged to complete the definition of a dependently typed program (see §3.3). When there is a mismatch between the specification of a program, described with the use of dependent types, and the actual behaviour of the program, some of the proof obligations generated will be unprovable. An unprovable proof obligation indicates that there is an error in the specification of the program, in the behaviour of the program, or in both (see §8.1 for an example of this). In most systems, determining that a proof obligation is unprovable, and hence that the program contains an error, is left to the user.
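The following minimal sketch, written in plain Coq rather than in our framework, shows how a counterexample exposes such a mismatch. It assumes a hypothetical buggy reversal, rev_bug, whose step case drops the head of the list, so the specification length r = length a cannot hold. A bounded search over a small hand-picked list of candidate inputs (a simplification of QuickCheck's random generation, not the testing tool described in Chapter 8) finds a failing input, which is then used to refute the specification.

```coq
Require Import Coq.Lists.List.
Import ListNotations.

(* Hypothetical buggy function: the step case forgets the head. *)
Fixpoint rev_bug (a : list nat) : list nat :=
  match a with
  | [] => []
  | _ :: t => rev_bug t
  end.

(* Executable form of the specification length r = length a. *)
Definition spec_holds (a : list nat) : bool :=
  Nat.eqb (length (rev_bug a)) (length a).

(* Small hand-picked inputs standing in for randomly generated ones. *)
Definition candidates : list (list nat) := [[]; [0]; [0; 1]].

(* The first candidate that violates the specification, if any. *)
Definition counterexample : option (list nat) :=
  find (fun a => negb (spec_holds a)) candidates.

Compute counterexample.  (* = Some [0] : option (list nat) *)

(* The counterexample shows that the specification is unprovable. *)
Lemma spec_fails : exists a, length (rev_bug a) <> length a.
Proof. exists [0]. simpl. discriminate. Qed.
```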
Leaving this task to the user makes dependently typed programming less practical for the following reasons:

Identifying errors can be challenging. In our experience, errors are difficult to identify by hand and it is often not immediately obvious that a proof obligation is unprovable. In particular, much time can be wasted during development attempting to discharge unprovable proof obligations before an error is noticed.

Fixing errors can be challenging. When we are aware that an error exists, identifying where the fault is and what modifications need to be made to correct the problem can be difficult. For example, it can be unclear whether there is a fault in the program specification, the program behaviour or both.

We therefore believe support is needed for coping with errors to make dependently typed programming easier.

4.4 Conclusions

We have described aspects of dependently typed programming that we believe need support to make development more practical. We discussed how users are likely to find the need for manual proof construction challenging, as this can be both a difficult and a time consuming activity. Moreover, we explained why support is needed for coping with errors that are indicated by unprovable proof obligations. In the next chapter, we introduce our framework designed to address these aspects of dependently typed programming.

Chapter 5
A Framework for Supporting Dependently Typed Programming

In the previous chapter, we described how programming with dependent types can be challenging because of the difficulties involved in manually constructing proofs and coping with errors. We now introduce our framework for supporting dependently typed programming, which is designed to address these areas. The framework combines ideas from the domains of proof automation and testing, and includes features for automatically discharging proof obligations and for giving feedback on errors.
In this chapter, we give a high-level description of this framework and of how its features are designed to make dependently typed programming more practical.

5.1 Framework Features

We first describe the high-level features provided by our framework. These features are described from the perspective of the framework being integrated with a Russell-like language where the tasks of programming and proving are separated (see §3.3). However, the ideas we present also apply to other methods of program construction. As we explain in §5.4, the intended audience for this work is existing users of dependently typed programming languages. The main features offered by our framework are as follows:

Error feedback: A testing tool is used to automatically identify errors that are indicated by unprovable proof obligations. When an unprovable proof obligation is identified, error feedback is provided to the user in the form of a counterexample description. This feedback is designed to give information that can be used to fix the error. We describe this error feedback and the design of the testing tool in Chapter 8.

Proof automation for user-defined properties: Generic heuristic-based proof automation is provided that is designed to be effective for discharging the proof obligations that arise from dependently typed programs. In particular, this automation makes use of the rippling technique [Bundy et al., 2005] and supports working with user-defined properties that involve inductively defined types and recursively defined functions. Moreover, we have focused on support for capturing program properties using subset types. We provide an in-depth description of this automation in Chapters 6 and 7.

User hinting facilities: If a proof obligation cannot be discharged by the automation, the search tree of the failed proof attempt is shown to the user.
The user can examine the search tree and attempt to help the prover by providing a hint. A hint takes the form of a conjecture that the proof automation will try to prove. If successful, the proof found is stored as a new lemma and the automation then tries to discharge the original proof obligation again with the help of this lemma. In cases where the user gives a non-theorem as a hint, the testing tool is employed again to give counterexample-based error feedback. This hinting mechanism, described in §7.13, gives an alternative to having to resort to a manual proof when the automation fails.

Lemma caching and lemma reuse: To make the proof automation more powerful and scalable, lemmas proven by the proof automation during proof search are cached for reuse in future proofs. Several of our design choices centre around the desire to cache lemmas that can be more easily reused. We give an overview of the lemma caching feature in §7.2.

Tactics for use in manual proofs: In cases where the proof automation fails and the user cannot help automate the proof by providing hints, the individual tactics that make up the proof automation can be usefully employed as part of manual proofs. These tactics are described in Chapter 7.

This framework is novel in that it presents a combination of integrated features that are not available in current dependently typed programming environments.

5.2 Components and Interactions

We now describe the components of the framework and explain how these are used to provide the features described above. The main components of the framework are as follows:

• The testing tool component is used to find counterexamples to proof goals, where the counterexample descriptions are designed to be readable by the user. We employ a QuickCheck-like approach [Claessen and Hughes, 2000] for finding counterexamples.
• The proof automation component is composed of several tactics that are integrated to provide inductive proof automation. For example, we have designed tactics for simplifying goals, generalising goals [Aubin, 1976, Boyer and Moore, 1979, Aderhold, 2007] and performing rippling proofs [Bundy et al., 2005]. These tactics are structured using the waterfall approach of the Boyer-Moore theorem prover [Boyer and Moore, 1979].

• The lemma database component is used to store lemmas for use by the automation during proof attempts. Lemmas cached during proof search are stored here, as well as the lemmas proven when the user supplies hints to the prover.

The following describes how the above components interact with each other when assisting the user in constructing a dependently typed program:

1. The user inputs a dependently typed function into the system in the style of a Russell program (see §3.3), and proof obligations are generated that must be discharged.
2. Generated proof obligations are sent to the proof automation component to be discharged.
3. The testing tool is employed by the proof automation to identify unprovable proof obligations, as well as to detect overgeneralisations made during proof search. Moreover, the proof automation utilises the lemma database during proof search as a source of lemmas and as a place to cache lemmas.
4. There are then three possible forms of feedback that the framework can give to the user:

Success: If the proof automation can discharge all the generated proof obligations, the user is informed that their function has now been defined.

Error detected: If the testing tool generates a counterexample to any of the top-level proof obligations, the user is told an error has been found. A description of the counterexample found is displayed and the term in the body of the function that generated the unprovable proof obligation is identified to the user (see §3.3 for a description of how terms in the body of Russell functions can generate proof obligations). This information is intended to help the user identify and correct the error.

Proof automation failure: If the proof automation fails to discharge a proof obligation and the testing tool could not find a counterexample, the user will be shown a trace of the failed proof attempt. Assuming a proof is possible, the user can sometimes avoid having to resort to a manual proof by supplying a hint to the automation.

Figure 5.1 gives a high-level overview of how the framework components communicate and summarises how the user interacts with and gets feedback from the framework.

Figure 5.1: High-level overview of the framework components and their interactions.

5.3 Usage Storyboards

To give a better understanding of the dialogue that is meant to take place between the user and the framework, we now present some typical usage scenarios. We describe how the user interacts with the system, how the system responds and how this feedback is used to construct a dependently typed program.

Correcting a Program Error

The following scenario involves the user correcting a program error:

1. The user enters a dependently typed function definition into the system.
2. When processing the function, the system generates several proof obligations. One of these proof obligations is unprovable because the function supplied contains an error.
3. The testing tool identifies that one of the proof obligations is unprovable because a counterexample was found.
4. The user is presented with a description of the counterexample and is told which term and property in their program generated the unprovable proof obligation.
5. The user considers this feedback and uses the information given to identify and fix an error in the body of the function supplied previously.
6. Several proof obligations are again generated when the function is processed. This time, the proof automation is able to discharge all of these and the function definition is accepted.

Providing a Proof Hint

The following scenario involves the user providing a hint to help the proof automation solve a proof obligation:

1. The user enters a dependently typed function definition into the system.
2. When processing the function, the system generates several proof obligations.
3. The proof automation discharges all of the proof obligations except for one. The user is presented with a trace of the failed proof attempt.
4. The user identifies from the proof trace that a certain lemma could be useful in the proof attempt. The user inputs their proof hint by entering a conjecture that represents this lemma.
5. The automation proves the conjecture and adds the lemma proven to the lemma database.
6. The automation then reattempts the proof that failed previously. This time, with the help of the new lemma in the lemma database, the proof obligation is discharged and the function definition is accepted.

Notice here that, using proof planning terminology, the user is playing the role of a proof critic by suggesting how a failed proof can be patched (see §2.6).

5.4 Intended Audience

Our main audience is users of current dependently typed programming languages like Epigram, Agda, Coq and ATS. The features described should make program development easier and allow these users to be more proficient. Users in this audience who are familiar with formal proofs are likely to appreciate the lemma hinting mechanism and find the individual tactics useful for writing semi-automated proofs.
Ultimately, we would like to make programming with dependent types easy for users who have only had experience with regular functional programming languages, such as Haskell, ML and OCaml. We believe these users, who are unlikely to have theorem proving experience, would benefit from the framework features we described, especially the improved proof automation. However, we note that members of this audience would need training in at least some aspects of formal proofs to make use of the lemma hinting feature.

We note that the features of the framework will also be useful to proof assistant users who wish to construct dependently typed definitions and, in particular, inductive proofs. Inductive theorem proving is a common tool in formal reasoning, so improved automation here is likely to be appreciated. Likewise, testing tools are widely known to be useful for testing and refining conjectures during theory developments.

5.5 Prototype Implementation

We have built a proof-of-concept prototype to provide evidence that our framework can be used to make dependently typed programming more practical. This prototype is implemented within the Coq proof assistant and is based around the Russell language. Specifically, the proof automation component of the framework acts upon the proof obligations generated from Russell programs. Coq is a good choice as a foundation for demonstrating our ideas for the following reasons:

• The Russell language separates the tasks of programming and proving. This makes it easier to produce a prototype where the user interacts with the framework in the way that we have envisaged.

• Coq is a mature system with a large, active user and development community. The practical benefit of this when developing a prototype is that there are many places to find help about Coq and plenty of documentation is available.
• Coq is packaged with many powerful tactics, such as decision procedures for linear arithmetic and propositional logic [Coq development team, 2006]. Moreover, Coq includes Ltac, a domain-specific language for writing new tactics that can make proof automation development easier [Delahaye, 2000].

In principle, systems such as Epigram, ATS and Agda could have been used to prototype our framework. However, these systems lack the maturity and level of built-in proof automation that Coq offers.

One possibility we considered was to develop our prototype in Isabelle so we could take advantage of Isabelle's existing rippling tactic [Dixon and Fleuriot, 2003]. The approach considered was to port Hurd's PVS-like predicate subtyping work from HOL [Hurd, 2001] for use in capturing program specifications. However, we felt that having to effectively design a new language in Isabelle would be more work and less straightforward than implementing rippling in Coq.

5.6 Conclusions

In this chapter, we introduced our framework for supporting dependently typed programming. The framework aims to make dependently typed programming more practical by providing assistance for coping with errors and constructing proofs. We have included features for identifying errors, giving feedback on errors, inductive proof automation for discharging proof obligations and a facility where the user can help the automation by providing high-level hints. In the next chapters, we describe the design of the framework components and their implementation details. We then present a series of case studies in Chapter 9 where we find that our prototype framework makes developing dependently typed functional programs significantly easier.
Chapter 6
Proof Patterns of Dependently Typed Programs

As described in the previous chapter, one important feature of our framework is to provide automation that is effective at discharging the proof obligations that arise from dependently typed programs. To implement suitable automation, we first need an understanding of the steps taken to discharge proof obligations manually. In this chapter, based on our own experiences of discharging proof obligations by hand, we describe these high-level steps in the form of proof patterns.

The purpose of each proof pattern is to identify a pattern of proof that we need to provide automation for and to give an analysis of the situations where these patterns arise. A proof pattern consists of the following: the features that a goal should have for a pattern to be applicable (the pre-conditions), the high-level proof steps that are carried out on the goal (the description) and a description of any notable features that the modified goal will have afterwards (the post-conditions). The pre- and post-conditions are intended to explain the rationale behind how the proof patterns can be combined to describe the steps needed to discharge entire goals.

The proof pattern descriptions in this chapter were used as the foundation for the design of our proof automation. For each proof pattern in this chapter, we designed and implemented a tactic (described in the next chapter) that provides automation for that pattern of proof. For example, in this chapter, the purpose of the ripple proof pattern description is to identify the places where the rippling technique is applicable when discharging proof obligations (specifically, it is not obvious that rippling applies in recursive call proof obligations). The ripple tactic in the next chapter gives a concrete implementation of a tactic that automates this proof pattern where, it should be noted, there are many design choices available in how a rippling tactic can be implemented.

6.1 The simplify Proof Pattern

A common proof step is to transform a goal into a simpler form before any complex reasoning techniques, like induction, are used. In particular, we find that top-level proof obligations can frequently be simplified for the following reasons:

• Goals that contain subset types can almost always be simplified by destructuring all subset type terms. Doing so gives access to the propositional parts of the subset type terms, which is usually needed to advance the proof.

• For each pattern matching equation x (see §3.3.1), x can almost always be used to rewrite the goal and then x can be discarded to make the goal simpler.

• Top-level proof obligations can usually be simplified by performing computations.

We now describe the simplify pattern:

Pre-conditions: None.

Description: The following describes the general steps used to simplify goals, where these steps are performed in a loop until no further progress can be made:

1. Simplify the goal using computation. For example, Coq's simpl tactic does this by applying appropriate reductions [Bertot and Castéran, 2004].
2. If the goal contains a subterm t with a type of the form {x | P}, destructure t into its computational part s and propositional part p. After doing this, p is accessible for use in the proof and terms that were of the form proj1_sig t can be reduced to s. These simplification steps can be seen in the example from §3.3.2.
3. For each assumption of the form H : x = t, where x is a variable and x is not a subterm of term t (i.e. a non-recursive equation), replace all occurrences of x by t and discard the assumption H. Each of these assumptions can be safely discarded after use because the variable replaced will have been eliminated from the goal (i.e. x has been substituted everywhere by its definition). Using these assumptions in this manner can allow further simplification to take place. Pattern matching equations are generally non-recursive equations.
4. For each subterm in the goal of the form match x with . . . , we can sometimes simplify the goal by destructuring x. This step performs case analysis on conditional statements, possibly producing subgoals. For example, x could be a boolean variable.
5. Rewrite the goal with equations that are known to be useful simplification rules. For example, it is common to rewrite occurrences of x ++ [] to x and occurrences of x * 0 to 0.
6. Repeat the above steps until no further progress can be made.

Post-conditions: If any of the above steps applied, the resulting goal will generally be easier to prove than before.

6.2 The trivial Proof Pattern

Before attempting complex reasoning techniques like induction, it is usually sensible to first check whether the goal is solvable by any standard automated tactics that are available. For example, base case proof obligations and the base cases of inductive proofs are sometimes solvable without performing induction. We now describe what we have named the trivial pattern:

Pre-conditions: None.

Description: The goal is proven using standard reasoning techniques such as propositional reasoning, proof by reflexivity or the application of a previously proven lemma.

Post-conditions: The goal is either discharged or unaltered.

6.3 The impossible case Proof Pattern

The impossible case pattern describes proofs where we must find a contradiction amongst the assumptions (in other words, reductio ad absurdum). Impossible case proof obligations (see §3.3.3) usually have this form. Moreover, the base cases of some inductive proofs have this form also. This proof pattern is described as follows:

Pre-conditions: There are no obvious ways to determine when this pattern applies. Although this pattern is always applicable when the conclusion is of the form False, this pattern can also apply when the conclusion does not have this form.

Description: The proof is completed by finding a contradiction amongst the assumptions. The method of doing this is influenced by representation choices. Some typical methods for showing contradictions are as follows:

• Propositional reasoning is used to show that the assumptions P and ∼P lead to a contradiction.
• We must sometimes prove the goal by reasoning that an assumption has a type that is uninhabited. For example, types like 0 = 1, h :: t = [] and 0 ≠ 0 require such reasoning.

Post-conditions: The goal is either discharged or unaltered.

6.4 The induction Proof Pattern

Proof by induction is an essential tool for proving universally quantified statements about inductively defined data types and recursively defined functions. The proof obligations generated by the use of inductive families (see §3.3.1) and subset types (see §3.3.2) are always universally quantified statements and usually contain inductively defined data types and recursively defined functions. Thus, we find inductive reasoning common when discharging proof obligations. The induction pattern, which is used to begin an inductive proof, is as follows:

Pre-conditions: Induction can be applied whenever the conclusion contains a universally quantified variable or a free variable that is of an inductively defined type.

Description: Induction is performed using a suitable variable and induction principle. Inductive hypotheses can usually be made stronger by first making sure as many free variables as possible are universally quantified before performing induction (see §7.6).

Post-conditions: When induction is performed, base case and step case subgoals are produced.
When standard induction principles are used, the inductive hypotheses in each step case are guaranteed to embed into the conclusion, i.e. the rippling heuristic will be applicable to such subgoals (see §2.5).

6.5 The recursive call Proof Pattern

The recursive call proof pattern requires some analysis before it is presented. This pattern applies when a recursively defined function has a subset type as its output type. Assume that we are defining a dependently typed function that matches the following template, where function g has type T → T, P is a function that returns Prop, and y1 . . . yn are arbitrary terms:

    Program Fixpoint f x1 x2 . . . xn : {o : T | P o x1 x2 . . . xn} :=
      match . . .
      | . . .
      | . . . ⇒ g (f y1 y2 . . . yn)

The term f y1 y2 . . . yn, which represents a recursive call to f, will generate a proof obligation because of the way subset types are used above (see §3.3.2). The first step of this proof pattern is to substitute with any pattern matching equations. The conclusion of the goal for this proof obligation will then have the following form:

    P (g (proj1_sig (f y1 y2 . . . yn))) x1 x2 . . . xn

If the f y1 y2 . . . yn term is destructured into its computational term f_s and propositional term f_p, and the proj1_sig is simplified away, the goal is transformed into the following form:

    f_p : P f_s y1 y2 . . . yn
    ============================
    P (g f_s) x1 x2 . . . xn

Notice that the type of f_p and the conclusion share syntactic similarities, as both contain the terms P and f_s. These similarities exist because the shape of both of these terms is determined by the output type of f.

6.5.1 Recursive Calls and Embeddings

In fact, it is common for an embedding (see §2.5) to exist between assumption f_p and the conclusion. The presence of an embedding is useful as rippling can then be used to guide the proof search.
An embedding will exist when, for each argument position i, the ith argument to P in f_p embeds into the ith argument to P in the conclusion. The first argument will always embed in this scenario, as f_s embeds into g f_s. The rest of the arguments will embed when, in the program that produced the proof obligation, f was called recursively with each argument yi being a subterm of the corresponding xi. Many structurally recursive functional programs are defined using recursive calls that match this form. As such, we can expect embeddings to occur frequently in recursive call proof obligations, and these embeddings can be used to guide proofs.

6.5.2 Example

We now reconsider the recursive call proof obligation generated from the function srev from §3.3.2. The shape of the step case of srev matches the description from the previous section and, as such, the recursive call proof obligation generated contains an embedding. The recursive call proof obligation can therefore be annotated as follows:

    srev_p : length srev_s = length t
    ============================
    length (srev_s ++ [h]) = length (h :: t)

The rippling heuristic can then be used to determine what rules should be used to modify the conclusion such that srev_p can be used.

6.5.3 Multiple Recursive Calls

When a dependently typed function f contains multiple recursive calls in the step case, the recursive call proof obligation produced will contain multiple calls to f. Destructuring the result of each call to f produces a propositional term. By the same reasoning as before, it is possible for all of these propositional terms to embed into the conclusion. The proofs for such goals resemble the step case of inductive proofs that contain multiple inductive hypotheses.
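The destructuring step of this pattern, followed by the rippling and fertilisation steps, can be sketched in plain Coq for the srev obligation above. In this minimal sketch (not the tactic described in Chapter 7), the recursive function is abstracted as a local function srev, mirroring how it appears in the context of a Russell proof obligation. The tactic destruct_calls and the lemmas len_app and srev_recursive_call are hypothetical names. The tactic destructures every subset-typed term under proj1_sig and simplifies it away, leaving an assumption p that embeds into the conclusion.

```coq
Require Import Coq.Lists.List.
Require Import Lia.
Import ListNotations.

Lemma len_app : forall (A : Type) (l1 l2 : list A),
  length (l1 ++ l2) = length l1 + length l2.
Proof.
  intros A l1 l2. induction l1 as [| x l1 IH]; simpl.
  - reflexivity.
  - rewrite IH. reflexivity.
Qed.

(* Destructure each subset-typed term c occurring as proj1_sig c into
   its computational part s and propositional part p, then reduce. *)
Ltac destruct_calls :=
  repeat match goal with
  | [ |- context [ proj1_sig ?c ] ] =>
      let s := fresh "s" in
      let p := fresh "p" in
      destruct c as [s p]; simpl in *
  end.

(* The recursive call obligation of srev, before destructuring. *)
Lemma srev_recursive_call :
  forall (A : Type) (srev : forall l : list A, {r : list A | length r = length l})
         (h : A) (t : list A),
    length (proj1_sig (srev t) ++ [h]) = length (h :: t).
Proof.
  intros A srev h t.
  destruct_calls.   (* p : length s = length t |- length (s ++ [h]) = S (length t) *)
  rewrite len_app.  (* len_app plays the role of a wave rule here *)
  simpl.            (* length s + 1 = S (length t) *)
  lia.              (* closes the goal using p, standing in for fertilisation *)
Qed.
```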
6.5.4 Pattern Description

We now describe the recursive call proof pattern:

Pre-conditions: The goal is a recursive call proof obligation (see §3.3.2) that was generated when defining some function f, where f returns a subset type and the conclusion of the goal contains an occurrence of f.

Description: Firstly, pattern matching equations are substituted. Then the subset type result from each call to f is destructured into the propositional term p and the computational term s. This allows terms of the form proj1_sig (f ...) to be simplified to s.

Post-conditions: The resulting goal is now likely to contain an assumption that embeds into the conclusion.

6.6 The ripple Proof Pattern

As we have seen, the recursive call and induction patterns can produce goals that contain embeddings. We can use rippling to guide the proof for such goals. The ripple proof pattern is described as follows:

Pre-conditions: One or more assumptions embed into the conclusion.

Description: The rippling heuristic is used to apply proof steps that reduce the differences between the embeddable assumptions and the conclusion. If all differences between the conclusion and the embeddable assumptions can be eliminated, the goal can be strongly fertilised. If only weak fertilisation is possible, the lemma calculation technique can be used to conjecture a missing lemma (see §2.6).

Post-conditions: Conjectures from lemma calculation are universally quantified statements about inductively defined types.

6.7 The cross fertilise Proof Pattern

The cross fertilise pattern describes a common pattern that arises when we write a program composed of several dependently typed functions that each have an output type of the form {x | P x = ...}, for some function P. This pattern involves the somewhat ad hoc use of equations to move the proof forward.
To describe this pattern by example, consider the following program. Here srev is a weakly specified function that reverses a list, and it is defined in terms of sapp for appending lists:

    Program Fixpoint sapp (A : Set) (a b : list A)
      : {r : list A | length r = length a + length b} :=
      (* ... *)

    Program Fixpoint srev (A : Set) (a : list A)
      : {r : list A | length r = length a} :=
      match a with
      | [] => []
      | h :: t => sapp (srev t) [h]
      end.

Notice that the return types of both sapp and srev have the following form:

    {r : list A | length r = ...}

After destructuring the recursive call and substituting the pattern matching equations, the recursive call proof obligation of the srev function has the following form:

    srev_p : length srev_s = length t
    ============================
    length (proj1_sig (sapp srev_s [h])) = length (h :: t)

Following the ripple pattern, we can ripple out the RHS of the conclusion and weak fertilise with srev_p from right to left. If we then destructure the result from sapp, the goal has the following form:

    sapp_p : length sapp_s = length srev_s + length [h]
    ============================
    length sapp_s = S (length srev_s)

Notice that there are no embeddings to guide the use of sapp_p here. However, we are able to rewrite the conclusion using sapp_p from left to right to move the proof forward. As the output types of srev and sapp share a common form, we can expect opportunities such as this when these functions appear in the same proof obligation.

As in the above, we find that making use of equations whenever there is an opportunity to do so is frequently useful. For example, in some situations, an available equational assumption could be used to rewrite another assumption instead of the conclusion. There will be situations when such an ad hoc approach is not productive. However, when there are no embeddings to guide the proof, this seems a reasonable last resort.
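The rewriting step used above on sapp_p can be sketched as follows. An equation H : s = t is used to replace s by t in the rest of the goal, and H is then dropped. This is a minimal Python sketch over a toy term representation (a symbol string, or a tuple (head, arg1, ..., argn)). The function names and the choice to try left to right before right to left are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple, Union

Term = Union[str, Tuple]             # symbol, or (head, arg1, ..., argn)
Goal = Tuple[Dict[str, Term], Term]  # (named assumptions, conclusion)


def occurs(s: Term, t: Term) -> bool:
    """True if s occurs as a subterm of t (heads are not subterms)."""
    if s == t:
        return True
    return isinstance(t, tuple) and any(occurs(s, a) for a in t[1:])


def replace(s: Term, r: Term, t: Term) -> Term:
    """Replace every occurrence of s in t by r, in a single pass."""
    if t == s:
        return r
    if isinstance(t, tuple):
        return (t[0],) + tuple(replace(s, r, a) for a in t[1:])
    return t


def cross_fertilise(hyps: Dict[str, Term], concl: Term,
                    name: str) -> Optional[Goal]:
    """Rewrite the rest of the goal with the equation hyps[name], then drop it.

    Left to right is tried before right to left. Returns None when the
    assumption is not an equation or neither side occurs elsewhere.
    """
    eq = hyps[name]
    if not (isinstance(eq, tuple) and eq[0] == "eq"):
        return None
    rest = {k: v for k, v in hyps.items() if k != name}
    others = list(rest.values()) + [concl]
    for old, new in ((eq[1], eq[2]), (eq[2], eq[1])):
        if any(occurs(old, u) for u in others):
            new_hyps = {k: replace(old, new, v) for k, v in rest.items()}
            return new_hyps, replace(old, new, concl)
    return None


# sapp_p : length sapp_s = length srev_s + length [h]
hyps = {"sapp_p": ("eq", ("length", "sapp_s"),
                   ("plus", ("length", "srev_s"),
                    ("length", ("cons", "h", "nil"))))}
# length sapp_s = S (length srev_s)
concl = ("eq", ("length", "sapp_s"), ("S", ("length", "srev_s")))

new_hyps, new_concl = cross_fertilise(hyps, concl, "sapp_p")
print(new_hyps)   # {}
print(new_concl)  # the tuple for: length srev_s + length [h] = S (length srev_s)
```

The remaining goal contains no user-defined functions applied to sapp_s. What is left is arithmetic about length srev_s, which simplification and the cached lemmas are then expected to finish off.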
Simplifying goals by rewriting with available equational assumptions is a common strategy used in proof automation [Boyer and Moore, 1979, Kaufmann and Moore, 1997, Dixon, 2005]. The cross fertilise pattern is described as follows:

Pre-conditions: The goal contains an assumption of the form H : s = t or H : t = s, and the term s occurs elsewhere in the goal.

Description: All occurrences of s outside H are replaced with t. Assumption H is then discarded.

Post-conditions: H has been removed from the goal and all occurrences of s have been replaced with t.

6.8 The generalise Proof Pattern

We observe that, after simplification is performed, there are frequently opportunities to generalise top-level proof obligations, for example by generalising common subterms. This pattern is seen in many places in our case studies (see Chapter 9). For example, in §9.5 we describe how common subterm generalisation is used on several occasions when discharging the top-level proof obligations generated when verifying a binary adder.

As well as containing common subterms, proof obligations quite often contain assumptions that are irrelevant to discharging the goal. For example, consider the following weakly specified program, which inserts an item into a sorted list in sorted position. The function le_gt_dec : ∀ n m, {n ≤ m} + {n > m} is used to compare list items:

    Program Fixpoint insert (x : nat) (a : list nat)
      : {r : list nat | length r = S (length a)} :=
      match a with
      | nil => [x]
      | h :: t => if le_gt_dec x h then x :: a else h :: (insert x t)
      end.

The proof obligation generated by the term x :: a in the step case has the following form:

    Heq_a : h :: t = a
    H : x ≤ h
    ============================
    length (x :: a) = S (length (h :: t))

Assumption H comes from performing pattern matching on the term le_gt_dec x h in the program.
However, this assumption is not required for the proof, as we are only interested in verifying the length of the list returned in this case. Likewise, when a function has a subset type term as an input parameter, each proof obligation will contain a corresponding assumption with that type. As with the above, this assumption may not always be needed to discharge each proof obligation. When the proof of a goal is cached as a lemma, irrelevant assumptions can be problematic, as they can make the cached lemma less general (see §7.2). Moreover, irrelevant assumptions can complicate inductive proofs (see §7.6).

6.8.1 Pattern Description

We now describe the generalise pattern:

Pre-conditions: Applies to any goal.

Description: The goal is generalised, for example by replacing common subterms with fresh universally quantified variables, generalising variables apart or eliminating irrelevant assumptions.

Post-conditions: The goal produced will be a more general version of the original. However, there is a danger of overgeneralising the goal.

6.9 Combining Proof Patterns

We find that the proofs required to discharge typical proof obligations can be described by a combination of the previously identified patterns. The following describes how these patterns can be composed to give the general shape of the proofs that we want to automate:

• As recursive call proof obligations can contain embeddings if manipulated correctly, it is important not to apply generic simplification steps to these proof obligations initially. The recursive call pattern is followed on these proof obligations to reveal potential embeddings. If embeddings are found, the ripple pattern is followed.

• Simplification followed by basic reasoning techniques will discharge some proof obligations. Proofs for base case proof obligations, and for base cases produced by performing induction, typically have this shape.
These proofs are described by the simplify pattern followed by either the trivial or the impossible case pattern.

• For theorems that require induction, it is beneficial to first put the current goal into its simplest and most general form. This process can be described by following the simplify, generalise and then induction patterns. Step cases of induction follow the ripple pattern.

In the next chapter, we present a concrete implementation of a tactic that automates proofs using the strategy described above. In Chapter 9, we evaluate this tactic against the proof obligations generated from a set of dependently typed programs, and we demonstrate that it provides a high level of proof automation.

6.10 Conclusion

In this chapter, we have identified the patterns of proof that commonly emerge when discharging proof obligations that arise from dependently typed programs. We described how top-level proof obligations in particular benefit from being simplified and generalised before a proof attempt is made. This is in contrast to the proofs automated by IsaPlanner, where top-level goals are not generalised before induction is performed [Dixon, 2005]. IsaPlanner is typically used to prove theorem statements that are hand crafted by the user. The top-level goal is therefore assumed to be in its most general form, and generalisation is not attempted. We then remarked that inductive proofs are frequently required in practice, and that the rippling technique can be used to guide the proof attempt for the step case [Bundy et al., 2005]. In recursive call proof obligations for programs that use subset types, we identified the non-obvious presence of embeddings in common situations. Rippling can thus be applied to guide these proofs as well. In the next chapter, we describe tactics that are designed to automate the proof patterns described here, so that practical support can be given for programming with dependent types.
Chapter 7 Automation of Proof Patterns

In the previous chapter, we described proof patterns that frequently occur when discharging proof obligations generated from dependently typed programs. As part of our framework for supporting dependently typed programming, we now describe tactics designed to automate these patterns. These tactics have been implemented within the Coq proof assistant so that, in Chapter 9, we can investigate the effectiveness of this automation using case studies. For each proof pattern in the previous chapter, we describe the design of a corresponding tactic with the same name in this chapter. For example, the ripple tactic provides automation for the ripple proof pattern.

The tactics presented in this chapter have been implemented in Coq using a combination of OCaml and Coq's tactic language, Ltac. We describe the high-level algorithm implemented for each tactic rather than showing the actual code. The algorithms are more concise and easier to understand for those unfamiliar with Coq tactic development.

We name the important existing Coq tactics that we have used to implement our tactics. The exceptions are basic tactics common to most theorem provers, such as tactics for rewriting terms, generalising specified terms and performing structural induction. For example, our simplify and trivial tactics provide their functionality by calling several nontrivial Coq tactics, whereas our ripple, generalise and induction tactics are implemented using basic Coq tactics.

In addition to the design of these tactics, we also describe several extra features that have been added to make the proof automation a more practical tool. In §7.2, we describe our approach to caching lemmas found during proof search, and ways that these lemmas can be reused by the tactics in future proof attempts. This is supplemented by a simple template-based technique for automatically conjecturing common forms
of lemmas, such as commutativity, prior to proof attempts (see §9.7.3). Moreover, we employ heuristics for automatically identifying rules that can be used for simplifying goals (see §7.11). We then explain the feature of our automation that allows the user to provide hints to help the prover when proof search fails (see §7.13).

7.1 Top-Level Tactic Description

The top-level tactic that is invoked to automate proof obligations uses the waterfall approach of the Boyer-Moore theorem prover to structure calls to the tactics we have designed [Boyer and Moore, 1979]. In the waterfall approach, a fixed sequence of tactics is invoked on the current goal. The first tactic in the sequence is referred to as the top of the waterfall. When a tactic generates subgoals, each subgoal is processed from the top of the waterfall.

The rationale for the ordering of the tactic calls is as follows. Rippling should be used to guide the proof when embeddings are present. When there are no embeddings, the goal should be simplified and a trivial proof attempted. When a trivial proof fails, the goal usually requires an inductive proof, and generalising the goal beforehand typically makes the inductive proof easier. As we describe later, the functionality of the cross fertilise tactic has been merged into the simplify tactic and, likewise, the impossible case tactic has been merged into the trivial tactic.

The top-level tactic thus performs the following steps for each goal:

1. The recursive call tactic is invoked to destructure recursive calls. Recall that this can potentially produce assumptions that embed into the conclusion.

2. If an assumption embeds into the conclusion, the following steps are performed:

   (a) The simplify and trivial tactics are invoked in an attempt to discharge the goal trivially. Any changes made to the goal are undone on failure.
When a proof is found here for a step case goal, this can indicate that induction was performed unnecessarily and that only case analysis was needed.

   (b) The ripple tactic is invoked, with backtracking occurring if ripple fails to fertilise the conclusion. Specifically, ripple must succeed for the next steps to be applied.

3. The simplify and trivial tactics are invoked.

4. The generalise tactic is invoked, with backtracking taking place if an overgeneralisation is detected. If the proof fails after this point, we allow backtracking to the point before generalise was invoked, to cover cases where an overgeneralisation went undetected.

5. The induction tactic is invoked, and the top-level tactic is called on each subgoal generated. The intended behaviour here is that the ripple tactic, which is part of the top-level tactic, will exploit the presence of embeddings in step case goals. Subgoals generated here that contain embeddings are processed first. Ripple must fertilise the goal before induction is performed again, and we find that processing these subgoals first limits unproductive proof search.

Goals are processed in a depth-first manner. To prevent looping, the top-level tactic takes a parameter that limits the number of times the induction tactic can be invoked on a sequence of subgoals (we use a default limit of 5). IsaPlanner's prover, which also uses rippling to automate inductive proofs, experiences similar looping behaviour [Dixon, 2005].

7.1.1 Relation to Proof Planning

Although we make use of rippling and the lemma calculation technique as part of our proof automation, we do not use the proof planning approach (see §2.6) in this thesis. We would say that the proof planning approach was being used if each tactic was formalised as a method (i.e.
with pre-conditions and post-conditions) and some reasoning was being performed by the machine to determine which combination of methods should be used to conduct the proof search. In our case, the same tactic is always used on every proof attempt.

7.2 Lemma Caching

Three databases are used to store cached lemmas, and the same lemma can be included in more than one database. Each database is intended to contain lemmas that are suitable for use by certain tactics. For example, as we mention below, certain lemmas should not be used by the ripple tactic for efficiency reasons, and only lemmas that simplify the goal should be used by the simplify tactic. The databases used are as follows:

• The simplify lemma database contains directed equations for use by the simplify tactic. These lemmas are used when performing exhaustive rewriting to simplify the goal. In §7.11, we describe a heuristic that is useful for identifying obvious simplification rules for this purpose. For flexibility, we include commands that let the user add rules to this database directly. The user is trusted to only add rules under which rewriting will terminate.

• The ripple lemma database contains directed equations that are used by the ripple tactic when performing rippling proof steps. Rippling is able to productively use any rule that can reduce differences in rippling proofs, so suitable rules can increase proof coverage. As rippling always terminates when meta-variables are absent [Bundy et al., 2005], we do not need to be concerned that some combination of cached lemmas might lead to non-terminating behaviour. However, for efficiency, an equation is not added to the ripple lemma database for use from left to right if the LHS of the equation embeds into the RHS or if the LHS is a ground term.
For example, the rules ∀ x, x = x + 0 and ∀ x, 0 = x ∗ 0 usually only serve to increase differences in rippling proofs when used from left to right.

• The trivial lemma database is used by the trivial tactic to automatically prove conjectures that are instances of goals that have been seen before. There are no restrictions on the contents of this database, and all cached lemmas are added to it. To quickly determine whether the current goal is an instance of a cached lemma, we use the standard technique of discrimination trees, which are related to tries [Christian, 1993].

7.2.1 Irrelevant Assumptions and Caching Reusable Lemmas

In §6.8, we noted that top-level proof obligations can contain irrelevant assumptions. Proving goals with irrelevant assumptions can make the lemmas we cache less general. We therefore need a strategy for handling such assumptions so that cached lemmas are useful in future proofs. For example, suppose we had to prove the following goal:

    (x : nat) (y : nat) (z : nat) (H : y ≠ 0) ⊢ x + y = y + x

Assumptions H and z are superfluous to proving this goal. If these assumptions are not discarded and we go on to prove the goal, the cached lemma will have the following form:

    L : ∀ x y z, y ≠ 0 → x + y = y + x

As seen above, irrelevant subformulae can make a cached lemma less general as well as cumbersome to use. For example, the ripple tactic would only be able to use L as a rewrite rule when the side-condition y ≠ 0 was satisfied and some instantiation was given for z.

The problems that irrelevant assumptions cause for lemma caching have also been identified in IsaPlanner, but they are not addressed there [Johansson, 2009, §5.6.1].
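Part of the difficulty is that cheap syntactic relevance tests cannot tell that H is superfluous in the goal above. As an illustration, the following minimal Python sketch implements the variable-sharing marking of the irrelevance tactic (rules R1 to R3 and the treatment of Set, Prop and Type from §7.5.5). It assumes a toy term representation (a symbol string, or a tuple (head, arg1, ..., argn)). The function name probably_relevant is hypothetical, and this is not the thesis's OCaml implementation. On the goal above, the sketch discards z but keeps H, because H shares the variable y with the conclusion.

```python
from typing import Dict, Set, Tuple, Union

Term = Union[str, Tuple]   # symbol, or (head, arg1, ..., argn)
SORTS = {"Set", "Prop", "Type"}


def symbols(t: Term) -> Set[str]:
    """All symbols occurring in t, heads included."""
    if isinstance(t, tuple):
        return set().union(*(symbols(a) for a in t))
    return {t}


def probably_relevant(hyps: Dict[str, Term], concl: Term) -> Set[str]:
    """Mark assumptions with rules R1-R3, then step 3; return the marked names.

    hyps maps each assumption name to its type. Assumptions typed by Set,
    Prop or Type are excluded from R1-R3. Unmarked assumptions would be
    discarded by the tactic.
    """
    ordinary = {x for x, ty in hyps.items() if ty not in SORTS}
    relevant = ordinary & symbols(concl)                       # R1
    changed = True
    while changed:
        changed = False
        for x in sorted(ordinary - relevant):
            r2 = any(x in symbols(hyps[y]) for y in relevant)  # R2
            r3 = bool(relevant & symbols(hyps[x]))             # R3
            if r2 or r3:
                relevant.add(x)
                changed = True
    for a in set(hyps) - ordinary:                             # step 3
        if any(a in symbols(hyps[y]) for y in relevant):
            relevant.add(a)
    return relevant


# (x y z : nat) (H : y <> 0) |- x + y = y + x
hyps = {"x": "nat", "y": "nat", "z": "nat", "H": ("neq", "y", "0")}
concl = ("eq", ("plus", "x", "y"), ("plus", "y", "x"))
print(sorted(set(hyps) - probably_relevant(hyps, concl)))  # ['z']
```

Because variable sharing alone keeps H here, the proof itself has to be inspected to drop such assumptions. The strategy summarised next relies on that inspection.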
The following summarises the strategy that we use to eliminate certain irrelevant subformulae from cached lemmas:

• By examining the proof found for a goal, we identify which assumptions from the goal were not used in the proof. We can then remove the corresponding irrelevant subformulae from the cached lemma. This technique, which we call delayed generalisation, is described in §7.10.

• Before performing induction, the induction tactic manipulates the goal so that potentially irrelevant assumptions do not form part of the inductive hypothesis (see §7.6). This avoids irrelevant assumptions being used unnecessarily in proofs, which then allows delayed generalisation to eliminate such assumptions when the proof is finished.

We also investigate heuristics for removing irrelevant assumptions during the proof attempt as part of generalisation (see §7.5.5). This step is commonly seen in other generalisation algorithms [Aderhold, 2007, Boyer and Moore, 1979].

7.3 The simplify Tactic

We now begin our discussion of the tactics we have designed for automating proof patterns, starting with the simplify tactic, which automates the simplify proof pattern from §6.1. The simplify tactic applies the following steps in sequence and repeats until no progress is made:

Subset types: All subset type terms in the goal are destructured.

Reductions: The goal is simplified using Coq's simpl tactic, which simplifies the goal by performing computations [Bertot and Castéran, 2004].

Conditional statements: To simplify conditional statements, we identify terms of the form match x ... and destructure x when x has a non-recursively defined type such as bool.

Substitution: For each assumption of the form H : x = t, where x is a variable and x is not a subterm of t (i.e. a non-recursive equation), we replace all occurrences of x by t and discard H.
This is implemented with Coq's subst tactic [Bertot and Castéran, 2004].

Injectivity: Equational assumptions are simplified using the fact that constructors are injective functions. For example, given H : cons h t = cons 0 nil, we can generate the assumptions h = 0 and t = nil and discard H. We use Coq's injection tactic for this [Bertot and Castéran, 2004].

Rewriting: The goal is exhaustively rewritten with cached lemmas that have been selected for use as simplification rules (see §7.2). A conditional rewrite rule can only be used when the subgoals it generates are discharged by the trivial tactic.

Use equational assumptions: The cross fertilise tactic is called to rewrite with any available equational assumptions.

Removal of Non-informative Equations: We automatically discard assumptions of the form x = x from the goal. Such equations are often introduced when performing case splits, and they only serve to clutter goals. Non-informative assumptions can of course take other forms, but so far in this work we have only considered those of the form x = x.

7.4 The trivial Tactic

The trivial tactic is intended to automate the trivial proof pattern as well as the impossible case pattern. A propositional logic decision procedure is naturally required to automate both patterns. For efficiency reasons, we therefore decided against implementing a separate tactic for each of them. The trivial tactic attempts the following procedures in sequence:

Lemma cache: If the goal matches a lemma from the lemma database used by this tactic, the lemma is used to prove the goal. The symmetry property of equality is used so that, for example, a lemma of the form s = t will match a goal of the form t = s as well as s = t.

Decision procedures: We use Coq's decision procedure for intuitionistic propositional logic (tauto) to attempt to prove the goal [Bertot and Castéran, 2004].
Impossible cases: When the goal contains an assumption that has an uninhabited type, such as h :: t = [], we can discharge the goal by reasoning that it is impossible to construct a term of this type. We implement this using Coq's discriminate tactic [Bertot and Castéran, 2004].

7.5 The generalise Tactic

In this section, we describe the design of the generalise tactic. This tactic uses several heuristics to generalise goals automatically.

7.5.1 Overview: An Aggressive Generalisation Algorithm

More cautious approaches only generalise as much as is needed to allow the proof to succeed. In contrast, our tactic generalises more aggressively. For example, the generalisation heuristics in Verifun will only generalise common subterms that occur in recursive positions [Aderhold, 2007], whereas we always attempt to generalise all common subterms. Generalising more aggressively has the following benefits:

Reusable lemmas: The primary reason for generalising more aggressively is that more general lemmas are cached, and such lemmas are more reusable in future proofs.

Efficiency: The proof search space for generalised lemmas tends to be smaller. Generalising often makes the goal simpler and, for example, reduces the number of variables on which induction can be performed.

Conciseness: The proofs of more general lemmas tend to be more concise, less cluttered and easier to read. This is particularly important when we want the user to examine failed proof traces in order to provide hints to the prover.

Our generalisation algorithm is based on the common subterm generalisation algorithm used in IsaPlanner [Dixon and Fleuriot, 2004]. During development, we added further generalisation stages to the algorithm.
IsaPlanner's generalisation algorithm only generalises common subterms. Our algorithm also generalises by inverse functionality, generalises apart and attempts to eliminate irrelevant assumptions. We now give an overview of our generalisation algorithm, using the terminology introduced in §2.8. The algorithm generalises the current goal by performing the following steps in sequence:

1. Generalise by inverse functionality (see §7.5.2).
2. Generalise common subterms (see §7.5.3).
3. Generalise apart (see §7.5.4).
4. Eliminate irrelevant assumptions (see §7.5.5).
5. Check for overgeneralisations (see §7.5.6).

We explain the details of these stages in the following sections.

7.5.2 Step 1: Inverse Functionality

In its most general form, inverse functionality can be used to generalise statements of the form f x1 ... xn = f y1 ... yn by removing the application of f to produce the statement (x1 = y1) ∧ ... ∧ (xn = yn). We find that naive use of inverse functionality frequently leads to overgeneralisations, so we are more cautious in this stage than in the others. For this reason, we restrict the use of inverse functionality to cases where n = 1. For example, this strategy will successfully generalise in the following cases:

    rev (x ++ y) = rev (rev ((rev y) ++ (rev x)))
        generalises to  x ++ y = rev ((rev y) ++ (rev x))
    length (x ++ (y ++ z)) = length ((x ++ y) ++ z)
        generalises to  x ++ (y ++ z) = (x ++ y) ++ z
    S (x + 1) = S (1 + x)
        generalises to  x + 1 = 1 + x
    x ++ (x ++ y ++ z) = x ++ ((x ++ y) ++ z)
        generalises to  x ++ y ++ z = (x ++ y) ++ z
    x ∗ (y + z) = x ∗ (z + y)
        generalises to  y + z = z + y

In the last example, for instance, generalisation by inverse functionality on x ∗ (y + z) = x ∗ (z + y) is allowed because f is taken to be the curried function mult x, which has only one parameter.
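The n = 1 restriction, including the curried reading used in the mult x example, can be sketched as follows. This is a minimal Python sketch over a toy term representation (a symbol string, or a tuple (head, arg1, ..., argn)). We assume that a step is allowed exactly when both sides apply the same head to identical leading arguments, so that only the last argument differs. The function name is hypothetical.

```python
from typing import Optional, Tuple, Union

Term = Union[str, Tuple]   # symbol, or (head, arg1, ..., argn)


def inverse_functionality(lhs: Term, rhs: Term) -> Optional[Tuple[Term, Term]]:
    """Turn f ... a = f ... b into a = b, restricted to the case n = 1.

    Both sides must apply the same head to identical leading arguments, so
    the curried function f (e.g. mult x) has exactly one differing argument.
    Returns the new (lhs, rhs), or None if the step does not apply.
    """
    if not (isinstance(lhs, tuple) and isinstance(rhs, tuple)):
        return None
    if len(lhs) < 2 or len(lhs) != len(rhs) or lhs[:-1] != rhs[:-1]:
        return None
    if lhs[-1] == rhs[-1]:
        return None   # both sides identical: nothing to generalise
    return lhs[-1], rhs[-1]


# length (x ++ (y ++ z)) = length ((x ++ y) ++ z)
print(inverse_functionality(("length", ("app", "x", ("app", "y", "z"))),
                            ("length", ("app", ("app", "x", "y"), "z"))))
# x * (y + z) = x * (z + y): f is the curried function mult x
print(inverse_functionality(("mult", "x", ("plus", "y", "z")),
                            ("mult", "x", ("plus", "z", "y"))))
# x + 1 = 1 + x: both arguments differ, so n would be 2
print(inverse_functionality(("plus", "x", "1"), ("plus", "1", "x")))  # None
```

The same check also turns length x = length (rev x) into x = rev x. The step is purely syntactic and does not guard against such overgeneralisations.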
As with all generalisation heuristics, the generalisations made will be productive in some cases (as shown above), while in other cases overgeneralisations can occur. For example, the goal length x = length (rev x) would be overgeneralised to x = rev x by inverse functionality. Such overgeneralisations would be identified by the counterexample finder.

7.5.3 Step 2: Common Subterm Generalisation

In IsaPlanner, generalising all maximal common subterms in a goal was found to be a successful strategy when performing lemma calculation, provided that terms of higher-order type are not treated as generalisation candidates [Dixon, 2005]. We use the same strategy when generalising Coq terms. Our algorithm performs as follows:

1. The set s of all subterms that occur more than once in the conclusion is generated.
2. A subterm t from s is generalised if the following criteria are satisfied:
   (a) The term t is not a subterm of any of the other terms from s.
   (b) The type of t is not Prop (e.g. int → int and x + 0 = x have this type) or Set (e.g. nat has this type). This criterion prevents generalising terms of the form fun x ⇒ ... and type variables.

Notice that we allow constants to be generalised. Such generalisations can sometimes be important in rippling proofs [Ireland, 1995].

7.5.4 Step 3: Generalising Apart

When the conclusion is an equation, a variable x is generalised apart if it occurs at least twice on each side of the equation and the same number of times on both sides. We require the latter condition because, when generalising apart, we always simultaneously generalise one occurrence of x from the LHS along with one occurrence of x from the RHS. We never generalise two occurrences of x on just one side of the equation.
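The applicability condition above, together with the leftmost-first simultaneous replacement that this step performs, can be sketched as follows. This is a minimal Python sketch over a toy term representation (a symbol string, or a tuple (head, arg1, ..., argn)). The function names are hypothetical. Fresh variables are drawn from a, b, c, ..., skipping symbols already in the goal, which is our own convention and matches the names in the thesis's examples.

```python
from typing import Iterator, List, Set, Tuple, Union

Term = Union[str, Tuple]   # symbol, or (head, arg1, ..., argn)


def symbols(t: Term) -> Set[str]:
    """All symbols occurring in t, heads included."""
    if isinstance(t, tuple):
        return set().union(*(symbols(a) for a in t))
    return {t}


def occurrences(x: str, t: Term) -> int:
    """Number of occurrences of x in argument positions of t."""
    if isinstance(t, tuple):
        return sum(occurrences(x, a) for a in t[1:])
    return int(t == x)


def replace_in_order(x: str, names: List[str], t: Term) -> Term:
    """Replace the k-th occurrence of x (leftmost first) by names[k]."""
    supply = iter(names)

    def go(u: Term) -> Term:
        if isinstance(u, tuple):
            return (u[0],) + tuple(go(a) for a in u[1:])
        return next(supply) if u == x else u

    return go(t)


def generalise_apart(lhs: Term, rhs: Term, variables: List[str]):
    """Generalise apart each variable occurring at least twice on each side
    of lhs = rhs and equally often on both sides."""
    used = symbols(lhs) | symbols(rhs)
    fresh: Iterator[str] = (c for c in "abcdefghijklmnopqrstuvw" if c not in used)
    for x in variables:
        k = occurrences(x, lhs)
        if k >= 2 and occurrences(x, rhs) == k:
            names = [next(fresh) for _ in range(k)]
            # The k-th occurrence on each side receives the same fresh name.
            lhs = replace_in_order(x, names, lhs)
            rhs = replace_in_order(x, names, rhs)
    return lhs, rhs


# x + S x = S (x + x), the example from Section 7.5.7
print(generalise_apart(("plus", "x", ("S", "x")),
                       ("S", ("plus", "x", "x")), ["x"]))
# -> a + S b = S (a + b), i.e. x + S y = S (x + y) up to renaming
```

Pairing the k-th occurrence on the left with the k-th occurrence on the right reproduces the four example equations in this section.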
Generalising apart x proceeds by simultaneously replacing the leftmost remaining occurrence of x on both sides of the equation with a fresh variable. This process is repeated until all occurrences of x have been replaced. For example, this strategy will successfully generalise apart the following equations:

    x + (x + x) = (x + x) + x
        generalises to  a + (b + c) = (a + b) + c
    length (x ++ x) = length x + length x
        generalises to  length (a ++ b) = length a + length b
    x ∗ (y + y) = x ∗ y + x ∗ y
        generalises to  x ∗ (a + b) = x ∗ a + x ∗ b
    max x (max x y) = max y (max x x)
        generalises to  max a (max b y) = max y (max a b)

7.5.5 Step 4: Eliminating Irrelevant Assumptions (the irrelevance Tactic)

We now describe an algorithm for eliminating assumptions that are likely to be irrelevant to proving the current goal. It is implemented as a tactic named the irrelevance tactic. This tactic is also useful in manual proofs, as it can make goals more readable by discarding assumptions that only serve to obscure the goal. The algorithm is similar to the irrelevance heuristic of the Boyer-Moore theorem prover, which considers variable sharing between terms to determine relevance [Boyer and Moore, 1979].

The irrelevance tactic works by recursively marking the assumptions that it guesses are "probably relevant" to proving the goal. When it has finished, all the assumptions that have not been marked are discarded, on the guess that they are irrelevant. The irrelevance tactic operates as follows. The special treatment of assumptions of type Set, Prop and Type is explained afterwards:

1. Initially, no assumptions are marked as probably relevant.
2. With the exception of assumptions that have type Set, Prop or Type, we recursively reclassify assumptions according to the following criteria until no further reclassifications occur:
   R1: All assumptions that occur in the conclusion are probably relevant.
   R2: Assumption x is probably relevant if y : t is a probably relevant assumption and x occurs in t.
   R3: Assumption x : t is probably relevant if a probably relevant assumption occurs in t.
3. Assumptions that have type Set, Prop or Type and that occur in the type of a probably relevant assumption are marked as probably relevant.
4. All assumptions that are not marked as probably relevant are discarded.

We now give an example, constructed to show how assumptions are incrementally classified by the rules above. Consider the following goal, which contains several irrelevant assumptions:

    (w x y z : nat) (H1 : w = y) (H2 : y = 1) (H3 : z = 0) ⊢ S x = x + w

The algorithm above correctly identifies that only the assumptions z and H3 are irrelevant, with the following reasoning:

1. x and w are probably relevant because they occur in the conclusion (R1).
2. H1 is probably relevant because w is probably relevant and w occurs in the type of H1 (R3).
3. y is probably relevant because y occurs in the type of the probably relevant assumption H1 (R2).
4. H2 is probably relevant because y is probably relevant and y occurs in the type of H2 (R3).

To explain the special treatment of assumptions of type Set, Prop and Type, consider the following goal. Here y is an irrelevant assumption, and the type parameter of length, which is usually implicit, is shown:

    (A : Set) (x y : list A) ⊢ length A x = length A x + 0

If assumptions of type Set were considered by rules R1, R2 and R3, y would be incorrectly marked as probably relevant by rule R3, because A is used in the conclusion and A occurs in the type of y.

7.5.5.1 Limitations

We now note some cases that demonstrate the limitations of the irrelevance tactic:

Overclassifications: Relevant assumptions that do not share variables with the conclusion are never marked as probably relevant.
For example, this happens in the following goals, where in each case H is relevant to proving the goal because it shows that a contradiction exists:

    (x : nat) (H : 0 ≠ 0) ⊢ x + 1 = x
    (x y : nat) (H : y ≠ y) ⊢ x + 1 = x

However, such examples are not an issue in practice, as we would usually expect the impossible_case tactic to discharge these goals before generalisation is attempted.

Underclassifications: Irrelevant assumptions that share variables with the conclusion will always be marked as probably relevant. For example, this happens in the following goal, where H is irrelevant but is incorrectly marked as probably relevant:

    (x : nat) (H : x ≠ 0) ⊢ x + 0 = x

It is unclear how the irrelevance tactic could be extended to generalise correctly in the above case without risking overgeneralisation in others. For instance, our tactic correctly identifies H as relevant in the following similar-looking goal:

    (x : nat) (H : x ≠ 0) ⊢ (x − 1) + 1 = x

Note that the minus operator for natural numbers is defined in Coq in such a way that 0 − 1 = 0. Assumption H is therefore relevant here, as discarding H would make the goal unprovable.

To generalise correctly in both of these examples, some domain-specific knowledge or analysis of how the functions being used are defined would be needed. One approach to identifying irrelevant assumptions here would be to only allow an assumption to be discarded as irrelevant if a counterexample check determined that doing so would not transform the goal into a nontheorem. This approach has been used in Verifun [Aderhold, 2007].

7.5.6 Step 5: Checking for overgeneralisations

We use the typical approach of detecting overgeneralisations with a testing tool. For simplicity, we only test for overgeneralisations after all generalisation steps have been made, rather than after each generalisation step. For this, we use the QuickCheck-like testing tool we have developed for Coq (see Chapter 8).
With this tool, we find that a small number of tests (e.g. 10) is sufficient to detect overgeneralisations in most cases, as long as a goal does not contain any propositional assumptions that are difficult to satisfy.

7.5.7 Unblocking Rippling by Generalising Apart

We incorporated the heuristic for generalising apart into the generalise tactic after we observed that rippling proofs tend to fail when induction is performed on a goal to which generalising apart is applicable. For example, when only basic definitions are used, the rippling proof of the theorem x + S y = S (x + y) is trivial but the proof of x + S x = S (x + x) is problematic. After performing induction on x in the latter statement, rippling becomes blocked in the step case and lemma calculation does not apply. This problematic theorem is given as an example of a rippling proof that can be unblocked by discovering the wave rule ∀ x y, x + S y = S (x + y) using a lemma speculation critic [Ireland and Bundy, 1996, theorem T15]. Johansson shows in IsaPlanner that rippling can also succeed here without lemma speculation if lemma synthesis is first used to find the same lemma before the proof attempt [Johansson, 2009, p114]. We advocate that a conceptually simpler and more natural technique is to generalise apart, where possible, before performing induction. In this example, the occurrences of x in the top-level goal x + S x = S (x + x) are generalised apart by the generalise tactic to give x + S y = S (x + y), which is then trivial to prove by induction.

7.6 The induction Tactic

We now discuss the induction tactic, which initiates an inductive proof.

7.6.1 Inductive Variable and Induction Principle Choice

To start an inductive proof, we must pick a variable on which to perform induction and select an induction principle to use.
The induction tactic first performs exhaustive universal introduction and then collects a list of all unique free variables used in the conclusion that are of an inductively defined type. Induction is then performed on the first variable collected, using the standard induction principle for the type of that variable. When the induction tactic is invoked in the top-level tactic (see §7.1), the subgoals produced are processed by further calls to the top-level tactic. If any of these subgoals cannot be discharged, backtracking occurs to the point where induction was performed. The induction tactic then performs induction on the next variable in the list, and the top-level tactic is invoked on the resulting subgoals. This continues until the variable list is exhausted.

We find that this naive approach to selecting the induction variable and induction principle performs well enough in practice. If an unproductive induction variable is chosen, backtracking tends to occur quickly, as the ripple tactic is unable to make any progress. A similar approach is used in IsaPlanner when performing induction and has also been found to perform well in practice [Dixon, 2005].

7.6.2 Modifying the Conclusion Before Performing Induction

Before performing induction, it is usually advantageous to modify the conclusion so that certain variables are universally quantified. Quantifying variables can strengthen inductive hypotheses and gives rippling more opportunities to fertilise in step cases. For example, consider the following two goals, where each goal can be transformed into the other with appropriate universal introduction and reintroduction:

1. (x : nat) (y : nat) ⊢ x + y = y + x
2. (x : nat) ⊢ ∀ (y : nat), x + y = y + x

Performing induction on x in the first goal would result in a weaker induction hypothesis than performing induction on x in the second goal, as y would not be universally quantified in the former.
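The selection-and-backtracking loop of §7.6.1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions. Goals and variables are plain Python values, and `induct` and `prove` are hypothetical callbacks. They stand in for applying Coq's standard induction principle and for a recursive call to the top-level tactic, and neither name comes from the prototype.

```python
from typing import Callable, List, Optional, Sequence


def induction_candidates(conclusion_vars: Sequence[str],
                         is_inductive: Callable[[str], bool]) -> List[str]:
    """Unique variables of an inductively defined type, in order of first occurrence."""
    seen = set()
    candidates = []
    for v in conclusion_vars:
        if v not in seen and is_inductive(v):
            seen.add(v)
            candidates.append(v)
    return candidates


def prove_by_induction(goal, conclusion_vars, is_inductive, induct, prove) -> Optional[str]:
    """Try induction on each candidate in turn, backtracking when a subgoal fails.

    induct(goal, v) is a hypothetical stand-in for applying the standard
    induction principle for the type of v. prove(subgoal) is a hypothetical
    stand-in for a recursive call to the top-level tactic.
    """
    for v in induction_candidates(conclusion_vars, is_inductive):
        subgoals = induct(goal, v)
        if all(prove(sg) for sg in subgoals):
            return v  # every subgoal was discharged: keep this induction
    return None  # variable list exhausted: the induction tactic fails
```

`all()` stops at the first subgoal that cannot be discharged, so a candidate is abandoned as soon as one of its subgoals fails. This corresponds to the backtracking behaviour described above.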
We use the following procedure before performing induction on a variable x. The exceptions to which assumptions we reintroduce are explained in the following sections:

1. Exhaustively perform universal introduction.
2. Reintroduce into the conclusion each assumption that meets all of the following criteria:
   - the assumption is not the induction variable x;
   - the assumption is not propositional (i.e. not of type Prop);
   - the assumption occurs in the conclusion.

Note that if some assumption P is defined in terms of another assumption Q, P has to be reintroduced before it is valid to reintroduce Q. Apart from cases such as this, the order of reintroduction is unimportant. For example, ∀ x y, R x y is equivalent to ∀ y x, R x y, and there is no reason for the tactics we use to prefer one of these forms over the other.

7.6.2.1 Treatment of Irrelevant Assumptions

We prevent the reintroduction of variables that do not occur in the conclusion to avoid complicating inductive hypotheses when irrelevant assumptions are present. For example, if induction is performed on x in the goal (x : nat) ⊢ ∀ (y : nat), x + 0 = x, where y is irrelevant, the inductive hypothesis is needlessly complex and requires us to instantiate y during fertilisation.

Likewise, reintroducing propositional assumptions (i.e. of type Prop) can make an inductive proof more complex than necessary. Consider the following variants of a goal, where y ≠ 0 is irrelevant in both:

1. (x : nat) ⊢ ∀ (y : nat), y ≠ 0 → x + y = y + x
2. (x : nat) (y : nat) (P : y ≠ 0) ⊢ x + y = y + x

If we attempt to prove both goals by induction on x, we find that the second goal is simple to prove but the first goal is unnecessarily complex to discharge. When induction is performed on the first goal, the step case inductive hypothesis will contain an implication.
In general, step cases of this form require piecewise fertilisation [Armando et al., 1999], which means that the rippling tactic used must be more sophisticated. By not reintroducing propositional assumptions before performing induction, we therefore avoid complicating some proofs when goals contain irrelevant assumptions. This strategy also avoids making use of certain irrelevant assumptions in a proof. Our delayed generalisation algorithm can then identify and eliminate these assumptions later, when caching lemmas (see §7.10).

As we note in §10.4.6, we currently do not support piecewise fertilisation [Armando et al., 1999], which is needed to solve inductive proofs where an implication appears in the inductive hypothesis. As such, the ripple tactic will fail when relevant or irrelevant assumptions appear as implications in the inductive hypothesis.

7.7 The recursive_call Tactic

The recursive_call tactic is straightforward to implement and follows the steps described in §6.5.

7.8 The ripple Tactic

We now discuss the ripple tactic for automating proofs with rippling. This is largely based on the implementation of dynamic rippling in IsaPlanner [Dixon, 2005, Johansson, 2009]. We first give a high-level overview of the ripple tactic; each stage described below is elaborated on over the next few sections. The ripple tactic works as follows:

1. The assumptions of type Prop (i.e. propositions) that embed into the conclusion are taken as the list of givens to use in the rippling proof attempt. This is intended to automatically identify inductive hypotheses and other assumptions that embed. The restriction on the type of the assumption prevents, for example, the assumption H : list nat from being considered as a given.
   Treating such assumptions as givens is rarely useful and needs to be avoided, as these forms of assumption occur frequently in the conclusion as type variables.
2. The tactic generates all the ways that the current goal conclusion can be modified using the available equational lemmas (see §7.8.4). The list of equational lemmas used is initially populated with equations generated from function definitions (see §7.8.1). Only modifications that reduce differences in the conclusion with respect to the list of givens are allowed. Depth-first search is then used to explore the search space. We limit the search depth to 10 so that the proof fails faster in cases where it will not be possible to fertilise the goal. This limit seemed reasonable: when our prover was run against a theorem corpus from IsaPlanner (see §7.14), the longest rippling proof it found involved 4 rippling proof steps.
3. Fertilisation is attempted when no further difference-reducing transformations can be found (see §7.8.5). We do not allow backtracking on the way weak fertilisation is performed, as we find the choice is typically unimportant to a proof. Likewise, givens are rarely useful after weak fertilisation and so are discarded afterwards.

7.8.1 Generating Equations from Function Definitions

When performing rippling proof steps, we make use of equations that are generated from function definitions. For example, the standard functions plus and max (see §A.1) can be represented with the following equations, where each equation targets one pattern-matching clause of the function definition:

    plus_base : ∀ m, plus 0 m = m.
    plus_step : ∀ p m, plus (S p) m = S (plus p m).
    max_base : ∀ m, max 0 m = m.
    max_step : ∀ m n, max (S n) m = match m with
                                    | 0 ⇒ S n
                                    | S m' ⇒ S (max n m')
                                    end.

Each equation trivially follows from the function definition and is provable by reflexivity.
The form of these equations is similar to the form seen in other presentations of rippling [Dixon, 2005]. Notice that we can use these equations from right to left, which can be useful in rippling proofs, whereas no similar transformation can be made when performing reductions on terms in Coq.

We have implemented an algorithm in Coq that automatically generates these forms of equations from structurally recursive functions. This algorithm is sufficient for all the examples considered in this thesis. The generation process is best explained by example. Consider the following definition of plus:

    Fixpoint plus (n m : nat) : nat :=
      match n with
      | 0 ⇒ m
      | S p ⇒ S (plus p m)
      end.

We first transform the function definition into an equational goal. For plus, we generate the following goal:

    ∀ n m, plus n m = match n with
                      | 0 ⇒ m
                      | S p ⇒ S (plus p m)
                      end.

The goal, which is provable by reflexivity, is derived by a simple syntactic transformation of the original function definition. The LHS consists of the function name and parameters, the RHS consists of the body of the function, and the variables for the input parameters of the function are universally quantified.

To generate the defining equations, exhaustive universal introduction is performed and case splitting occurs on the recursion variable. This produces subgoals, each of which corresponds to an equation we want to generate. Each subgoal is proven by reflexivity, and each goal and its proof are cached as a lemma. Delayed generalisation is used to remove any unnecessary variables from these cached lemmas (see §7.10).

The equation generation process is implemented using tactics because this was deemed the simplest implementation approach. Specifically, each stage in the generation process is trivial to implement with standard tactics.
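The statement-level effect of the case split can be sketched outside Coq. In this sketch, each match clause replaces the recursion variable with its constructor pattern and quantifies over the pattern's variables and the remaining parameters. This is a minimal sketch, not the prototype's implementation. It assumes a hypothetical string-based clause representation, `(pattern, pattern variables, body)`, and the helper names `as_argument` and `defining_equations` are illustrative only. The sketch proves nothing: in the thesis each resulting statement is proved by reflexivity, cached, and cleaned up by delayed generalisation.

```python
from typing import List, Sequence, Tuple

# Hypothetical clause representation: (constructor pattern, variables bound by the pattern, body).
Clause = Tuple[str, Sequence[str], str]


def as_argument(term: str) -> str:
    """Parenthesise compound terms such as 'S p' when they appear as arguments."""
    return f"({term})" if " " in term else term


def defining_equations(name: str, params: Sequence[str], rec_var: str,
                       clauses: Sequence[Clause]) -> List[str]:
    """Produce one equation per clause, mimicking a case split on rec_var."""
    equations = []
    for pattern, pattern_vars, body in clauses:
        # The recursion variable is replaced by this clause's constructor pattern.
        args = [as_argument(pattern) if p == rec_var else p for p in params]
        # Quantify over the pattern's variables and the remaining parameters.
        bound = list(pattern_vars) + [p for p in params if p != rec_var]
        binder = f"∀ {' '.join(bound)}, " if bound else ""
        equations.append(f"{binder}{' '.join([name] + args)} = {body}")
    return equations


plus_clauses = [("0", [], "m"), ("S p", ["p"], "S (plus p m)")]
for eq in defining_equations("plus", ["n", "m"], "n", plus_clauses):
    print(eq)
# ∀ m, plus 0 m = m
# ∀ p m, plus (S p) m = S (plus p m)
```

For plus, the two printed statements coincide with plus_base and plus_step above.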
7.8.2 Rippling Annotations

After transforming the conclusion, we generate all first-order rippling annotations with respect to the list of givens. A rippling annotation is represented as a regular Coq term, where identity functions are used to decorate terms in the traditional way to represent annotation features such as wave fronts and holes [Basin and Walsh, 1996]. For example, to represent a wave hole we introduce the function wave_hole, which is defined as fun (A:Type) (x:A) ⇒ x. We can then represent that some subterm t of a term is a wave hole by replacing t with wave_hole t. Functions representing inward wave fronts, outward wave fronts and sinks can be introduced in the same fashion, and these can then be used to annotate Coq terms with rippling annotations.

Due to time constraints, our function for calculating annotations does not look inside the following term constructs when searching for differences between terms: let x := ... in ..., fun x ⇒ ... or match x with .... For example, when calculating embeddings, the term t will embed into the term fun x ⇒ s only when both of these terms are syntactically the same, for any term s. As seen in previous work, special treatment is needed to annotate goals that contain higher-order features such as λ expressions [Smaill and Green, 1996, Dixon, 2005]. However, we can still support rippling proofs that include λ expressions as long as the differences in the conclusion do not appear inside them.

7.8.3 Ripple Measures

To check whether a modification to the conclusion is difference reducing, we use the sum-of-distances ripple measure [Dixon, 2005]. When there are multiple givens, a transformation is only allowed when the following holds: all the givens that embedded before still embed, the measure for at least one given has improved, and the measures for the remaining givens are no worse than before.

When the conclusion can be annotated in multiple ways, it can have multiple measures.
In such cases, we use Dixon's notion of a threshold measure to efficiently decide when conclusion transformations should be allowed [Dixon and Fleuriot, 2004, §7.10].

7.8.4 Conclusion Transformations

When searching for ways to transform the conclusion, we consider every way the conclusion can be modified by rewriting only one subterm. For example, given the commutativity of + and the conclusion (a + b) + c = a + b, we would want to generate the transformations (b + a) + c = a + b, c + (a + b) = a + b and (a + b) + c = b + a. We only allow conditional rewrite rules to be applied when the side-conditions can be discharged by calling simplify and trivial in sequence. As with IsaPlanner, the state of the conclusion is stored each time a transformation is made, and we backtrack if the same state is seen again during search.

7.8.4.1 Controlling Case Splits with Rippling

We use a technique similar to IsaPlanner's to control case splitting during rippling proofs [Johansson, 2009]. Briefly, when the conclusion contains a subterm of the form match x ..., a case split is automatically performed on x before we check whether a conclusion transformation is measure reducing. In each subgoal produced, consider any newly introduced assumption of the form H : x = t, where x is a variable that is not a subterm of the term t. Each such x is substituted using H, and H is then discarded. The case split is only allowed if the ripple measure has been reduced in each subgoal that contains an embedding. If a generated subgoal contains no embeddings, it must be discharged by invoking simplify and trivial in sequence for the proof to continue. The rippling proof then continues within the remaining subgoals.

7.8.5 Weak Fertilisation

There are usually choices in the way a conclusion can be weak fertilised.
The general heuristic we use for weak fertilisation is that the LHS of the conclusion should only be rewritten by using a given from left to right, and the RHS of the conclusion should only be rewritten by using a given from right to left [Boyer and Moore, 1979, Dixon, 2005]. To weak fertilise with a given, we first attempt to rewrite the LHS of the conclusion and, on failure, attempt to rewrite the RHS of the conclusion. For multiple givens, we repeat this procedure for each given individually, and weak fertilisation only succeeds when all givens can be used.

7.9 The cross_fertilise Tactic

We now explain the implementation of the cross_fertilise tactic. For each assumption H in the goal of the form ∀ x1 ... xn, s = t, we perform the following procedure:

1. For each assumption P, we attempt to rewrite P using H as a left-to-right rewrite rule. This is repeated for the conclusion.
2. If no terms were rewritten in the previous step, attempt the rewriting operation again using H as a right-to-left rewrite rule.
3. If any terms were rewritten in the previous two steps, discard H.

This tactic will produce subgoals when H is a conditional rewrite rule. Note that the tactic always terminates. Whenever terms are rewritten, H is discarded, so the number of assumptions in the goal decreases, and this can only happen a finite number of times.

Recall that the ripple tactic discards inductive hypotheses after fertilisation is performed (see §7.8.5). This step is important when the cross_fertilise tactic may be called, as calls to it could otherwise undo the progress made by the weak fertilisation step of the ripple tactic.

7.10 Delayed Generalisation

Now that we have described all of the tactics that make up our automation, we move on to describing additional features that concern lemma caching and the ability to give hints.
In this section, we describe an algorithm, which we have named delayed generalisation, that is used when caching lemmas. Given a lemma statement P and its proof t, the purpose of this algorithm is to produce a more general lemma by inspecting both P and t to identify irrelevant subformulae that can be safely removed. As an example, consider a lemma of the form ∀ x y, y ≠ 0 → x + y = y + x, where the proof of this lemma did not make use of the witness for y ≠ 0. By inspecting the lemma statement and the proof, delayed generalisation can produce a more general lemma of the form ∀ x y, x + y = y + x. This offers an alternative to eagerly guessing which assumptions are irrelevant and removing them in the middle of a proof [Aderhold, 2007, Boyer and Moore, 1979], which can cause overgeneralisations to occur.

However, note that delayed generalisation would not, for example, be able to remove the y ≠ 0 assumption from the lemma above if the assumption had been needlessly used in some way in the proof. For future work, it would be useful to explore how proofs could be simplified to reduce the unnecessary use of assumptions before delayed generalisation is applied.

7.10.1 Illustrative Examples

We begin with a lemma statement P and its proof t : P. Given that P has the form ∀ (x1:T1) ... (xn:Tn), Q, the task of delayed generalisation is to identify which universally quantified variables can be removed from the start of P and to produce a more general lemma t' : P' such that P' subsumes P. To understand how we can identify which variables from P should be removed, first consider the case where P is the following:

    ∀ (x y z : nat), y ≠ 0 → x + y + y = y + x + y

Notice that, to prove this theorem, we should not have to make use of z or y ≠ 0.
The following is a Coq term for t that gives a proof of P. The proof performs exhaustive universal introduction, rewrites the LHS of the conclusion using the lemma plus_comm and finishes with a proof by reflexivity:

    fun (x y z : nat) (H : y ≠ 0) ⇒ eq_ind_r (fun t ⇒ t + y = y + x + y) refl (plus_comm x y)

The exact meaning of each subterm in t is unimportant, apart from a few features. Firstly, because the proof begins by exhaustive universal introduction, t begins with a sequence of λ terms, each of which corresponds to a universally quantified variable from P. When one of the λ terms at the start of t introduces a variable that is not used in the body of t, this represents an assumption that was not required to construct the proof. In this case, the variables z and H are not used in the proof and are thus superfluous to the lemma statement.

A special case to be aware of is that universally quantified variables that occur in the conclusion of P should always be retained when generating t' : P'. For example, consider the case where P is ∀ n, 0 ∗ n = 0. A standard proof t for this lemma in Coq is fun n ⇒ refl_equal 0. Although the variable n is not used in t, it is nonsensical to eliminate the corresponding variable n from P. For this reason, the delayed generalisation algorithm never eliminates universally quantified variables that occur in the conclusion of P when generating t' : P'.

7.10.2 Algorithm Description

With the previous examples in mind, the delayed generalisation algorithm is as follows:

1. We assume that P has the form ∀ (x1:T1) ... ∀ (xn:Tn), Q and that its proof t was constructed by first performing exhaustive universal introduction. Under these assumptions, the term t will have the form fun (y1:T1) ⇒ ... fun (yn:Tn) ⇒ R.
2. P' and t' are initially taken as copies of P and t respectively.
   The following operation is then performed on each pair (xi, yi) from P' and t'. Let (∀ (xi:Ti), U) and (fun (yi:Ti) ⇒ V) be the corresponding subterms of P' and t'. If xi does not occur in U and yi does not occur in V, then (∀ (xi:Ti), U) is replaced with U in P', and (fun (yi:Ti) ⇒ V) is replaced with V in t'. Pairs are processed from the innermost to the outermost. For example, when x1 and x2 are irrelevant and x1 occurs in T2, x2 must be removed first for x1 to be identified as irrelevant.
3. P' and t' are then used to define a new lemma which, assuming some pairs were removed in the previous step, will be a more general version of P.

The above is implemented in Coq as a command that, when supplied with a theorem, will attempt to derive and store a more general version of that theorem using delayed generalisation.

Suppose our automation finishes constructing a proof t for a goal g, and t and g are cached as the lemma P. Delayed generalisation is then used on P to produce P', and P' is used to prove g. This step is important because, if P were used to prove g, P would be instantiated with, and thus make use of, all the assumptions in g. These include any assumptions that delayed generalisation has just identified as irrelevant. Proving the goal with P could therefore prevent delayed generalisation from identifying further irrelevant assumptions in the proof of the top-level goal.

Finally, we note that implementing delayed generalisation is fairly straightforward in Coq. Proofs are represented as regular Coq terms, so they can be manipulated with the same techniques used to write tactics.

7.11 Automatic Identification of Simplification Rules

To provide better support for working with definitions introduced by the user, we describe two simple heuristics that we have found successful for automatically identifying equational lemmas that the simplify tactic can use productively.
For example, rewriting from left to right with the rules ∀ a, a ++ [] = a and ∀ a, rev (rev a) = a can be useful for simplifying goals. Simplification rule sets are normally chosen by hand based on experience, and users are trusted to choose rules that do not cause simplification tactics to loop. As we discuss in the next section, the power of our proof automation can be increased by supplying the simplify tactic with appropriate simplification rules. Similarly, we note that the simplification tactic in ACL2 [Kaufmann and Moore, 1997] contributes significantly to the power of that prover.

7.11.1 Motivation

In the following, we describe several reasons why we found simplification rule detection to be an important feature to add to our framework:

Increasing proof coverage: Appropriate use of simplification can allow a theorem to be solved trivially without induction and rippling. For example, suppose we were required to prove ∀ a, length (rev (rev a)) = length a and had access to the lemma L : ∀ a, rev (rev a) = a. The goal can be solved easily by rewriting it using L from left to right and then finishing with a proof by reflexivity. This avoids the more complex approach of performing a proof by induction. Moreover, in cases where the inductive proof is difficult and rippling can become blocked, simplifying the goal first can make the inductive proof more likely to succeed.

Producing Reusable Lemmas: Neglecting to apply obvious simplification rules before performing induction can result in less reusable lemmas being cached. For example, consider the following goal:

    ∀ a b c, (a ++ b ++ []) ++ c = a ++ (b ++ c)

Given that the prover knows the lemma L : ∀ a, a ++ [] = a, performing simplification with L before proving the goal by induction results in the general lemma ∀ a b c, (a ++ b) ++ c = a ++ (b ++ c) being cached. Alternatively, proving the top-level goal directly by induction leads to a less general lemma being cached.
Additionally, as discharging a single goal can involve several lemmas being cached in the process, simplification of the top-level goal can also make these additional lemmas more general.

Efficiency: Simplification can make proof search more efficient, in that costly inductive proof attempts can be avoided entirely. When induction is needed, simplification can reduce the proof search space by removing variables and terms from the goal.

7.11.2 Heuristics for Identifying Simplification Rules

We now describe heuristics for identifying simplification rules. Whenever a lemma is cached, these heuristics are used to automatically identify lemmas that are appropriate for simplifying goals. Suitable lemmas are added to the lemma database used by the simplify tactic. Recall that the simplify tactic exhaustively rewrites the goal with all equational lemmas in its lemma database, with the restriction that any side-conditions produced during rewriting must be solved by a call to the trivial tactic.

7.11.2.1 Basic Terms

We introduce the notion of a basic term to describe terms that intuitively cannot be simplified any further. A basic term is defined recursively as a term of the form f x1 ... xn where the following is true:

1. The term f is either a constructor (e.g. S or cons), a type constructor (e.g. list) or an inductive data type (e.g. nat).
2. Individually, each of x1 ... xn is either a basic term or has the type Type or Set.

In other words, a basic term can only include variables that act as type variables, and the only function symbols allowed are constructors. For example:

• The terms O, (S O), nil nat and cons nat O (nil nat) are basic terms.
• Given A:Type, the term nil A is a basic term.
• The term 0 + 0 is not a basic term because + is not a constructor.
• Given x:nat, the terms S x and cons nat x (nil nat) are not basic terms because x does not have the type Type or Set.

7.11.2.2 Basic Terms Heuristic (BH)

The first heuristic we use for identifying simplification rules is as follows:

BH: The lemma ∀ x1 ... xn, s = t should be used as a left-to-right rewrite rule for simplification when the term s in the lemma statement is not a basic term and the term t in the lemma statement is a basic term.

Intuitively, this rule says that we can simplify a goal by using any lemma that replaces non-basic terms with basic terms. The following are some example lemmas that BH identifies as left-to-right rewrite rules for simplification:

    ∀ x, x − x = 0
    ∀ x, x ∗ 0 = 0
    ∀ x, 1 ^ x = 1
    ∀ x, min x 0 = 0
    ∀ x, length x = 0 → x = []

We find that this heuristic selects obvious simplification rules, as it is almost always beneficial to eliminate variables from a goal in this manner.

7.11.2.3 Embeddings Heuristic (EH)

The second heuristic we use for identifying simplification rules is as follows:

EH: The lemma ∀ x1 ... xn, s = t should be used as a left-to-right rewrite rule when the term t in the lemma statement embeds into the term s using first-order embeddings and s is syntactically different from t.

In the simplest cases, the RHS of a selected equation is a term that occurs in the LHS of the equation. For example, EH identifies the following equations as left-to-right rewrite rules for simplification:

    ∀ x, x ∗ 1 = x
    ∀ x, x ++ [] = x
    ∀ start len, length (seq start len) = len
    ∀ a, rev (rev a) = a
    ∀ a, mirror (mirror a) = a
    ∀ x, max x 0 = x
    ∀ x, min x 0 = 0
    ∀ x, x ∗ 0 = 0

In more complex cases, the RHS is not a subterm of the LHS. For example:

    ∀ x y, max x (max y x) = max x y
    ∀ a, length (rev a) = length a
    ∀ f a, length (map f a) = length a
    ∀ a, num_nodes (mirror a) = num_nodes a
    ∀ f a, num_nodes (map f a) = num_nodes a
    ∀ a, sum (rev a) = sum a
    ∀ a x, list_count (rev a) x = list_count a x
    ∀ h x a, h ≠ x → list_count (a ++ [h]) x = list_count a x

We find that this heuristic identifies many useful simplification rules. For example, as seen above, there exist many simplification rules of the form ∀ x, f (g x) = f x. Notice that, in cases where the RHS is both a basic term and a subterm of the LHS, the lemma will be selected by both EH and BH. However, BH only selects lemmas whose RHS is a basic term, whereas EH has no such restriction.

7.11.2.4 Termination

We now justify that exhaustively rewriting a term t using any combination of rules selected by BH and EH must always terminate:

• Let m(s) be the sum of (1) the number of function symbols in the term s that are neither type constructors nor constructors (e.g. + would be counted but list would not) and (2) the number of variables in s whose type is neither Set nor Type.
• Let n(s) be the number of nodes in the syntax tree for s.
• When a BH rule transforms the term t into the term t' by replacing some subterm s of t with the term s', it must hold that m(s) > 0 and m(s') = 0, and therefore m(t') < m(t). In other words, BH rules can only ever eliminate, and never introduce, function symbols like + and variables of type nat, so BH rules must always decrease the value of m.
• When an EH rule transforms the term t into the term t' by replacing some subterm s of t with the term s', it must hold that m(s') ≤ m(s) and n(s') < n(s), and therefore m(t') ≤ m(t) and n(t') < n(t). Specifically, an EH rule can only transform s by stripping nodes from its syntax tree and can never introduce new nodes.
• Exhaustive rewriting with any rules selected by BH or EH must therefore terminate, as the pair (m(t), n(t)) decreases lexicographically each time a rule is used to rewrite t.

7.12 Lemma Discovery

As demonstrated by IsaCoSy [Johansson, 2009], conjecturing and caching lemmas about function definitions before attempting to prove the theorems we are interested in can improve the proof coverage of rippling-based proof automation. We describe here a limited form of lemma discovery that aims to conjecture only a small number of common lemma forms. It is intended to be used whenever a new simply typed function definition is entered. The procedure we use for lemma discovery is as follows:

1. The system is provided with a list of hand-crafted lemma templates that state generic operator laws. We make use of templates for involution, commutativity and associativity laws. For example, the template ∀ (x y : t), f x y = f y x describes commutativity, where f must be instantiated to some binary function and type inference is used to instantiate t with an appropriate type.
2. After the user defines a new simply typed function g, g is used to instantiate all available lemma templates to create a list of terms representing conjectures. For example, after plus is defined, the template given above would be instantiated to create the conjecture ∀ (x y : nat), plus x y = plus y x. In cases where g has arguments of type Type, all these arguments are instantiated to the same universally quantified variable T of type Type. For example, consider instantiating f in the involution template ∀ (x : t), f (f x) = x with the following function:

       rev : ∀ (T : Type), list T → list T

   The instantiated template would then have the following form:

       ∀ (T : Type) (x : list T), rev T (rev T x) = x

3. Any ill-typed conjectures that are generated are discarded.
For example, this would happen when the commutativity template is instantiated with a single-argument function.
4. Our testing tool (see Chapter 8) is used to identify and discard faulty conjectures.
5. The proof automation is then used to attempt to prove the remaining conjectures. Successful proofs result in lemmas being cached.

We have only partially automated the above: lemma discovery must be invoked manually on a function when it is introduced.

7.13 User Hints

In this section, we describe how our proof automation gives feedback on failed proof attempts and explain the feature that allows the user to help the proof automation by providing hints.

7.13.1 Proof Search Feedback

When a proof attempt fails, the proof automation displays a trace of the failed proof search. The user can use the information in the trace to work out which hints would help the automation.

A proof trace consists of a tree of goal nodes, where each goal node contains a description of a Coq goal that was seen during proof search. Each goal node is connected by a single directed edge to either a "tactic" node or a "branch" node. When a goal node g is connected to a tactic node labelled with the name of some Coq tactic t, this means that t was invoked on g during proof search. Each tactic node is joined by directed edges to the goal nodes that represent the subgoals produced when this tactic was invoked during proof search.

When a goal node is connected to a branch node, this represents a point in the search space where backtracking was possible. A branch node is connected by directed edges to one or more tactic nodes, each of which represents a tactic that was invoked on the goal connected to the branch node. Each node in the proof trace is annotated with either "fail" or "success".
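The tree structure just described can be sketched as a small datatype. This is a minimal Python sketch, not the prover's implementation (which emits XML): the class names Goal, Tactic and Branch and the field ok are hypothetical, and the sketch assumes that a goal counts as discharged when its tactic call succeeded and closed every subgoal it produced, or, at a branch node, when any alternative did so.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Goal:
    """A goal seen during proof search (hypothetical representation)."""
    text: str
    child: Optional[Union["Tactic", "Branch"]] = None  # None: nothing was tried


@dataclass
class Tactic:
    """A tactic invoked on its parent goal; ok is False if the call itself failed."""
    name: str
    ok: bool = True
    subgoals: List[Goal] = field(default_factory=list)


@dataclass
class Branch:
    """A backtracking point: alternative tactics tried on the same goal."""
    alternatives: List[Tactic]


def tactic_closes(t: Tactic) -> bool:
    # The call must succeed and every subgoal it produced must be discharged.
    return t.ok and all(discharged(g) for g in t.subgoals)


def discharged(g: Goal) -> bool:
    if g.child is None:
        return False
    if isinstance(g.child, Tactic):
        return tactic_closes(g.child)
    # At a branch node, one successful alternative is enough.
    return any(tactic_closes(t) for t in g.child.alternatives)


def render(g: Goal, depth: int = 0) -> List[str]:
    """Flatten a trace into indented lines with fail/success annotations.

    A branch node shares its parent goal's status; each tactic node is
    marked by whether it closed the goal it was applied to.
    """
    status = "success" if discharged(g) else "fail"
    pad = "  " * depth
    lines = [f"{pad}goal [{status}]: {g.text}"]
    if isinstance(g.child, Branch):
        lines.append(f"{pad}  branch [{status}]")
        tactics, tdepth = g.child.alternatives, depth + 2
    elif isinstance(g.child, Tactic):
        tactics, tdepth = [g.child], depth + 1
    else:
        tactics, tdepth = [], depth + 1
    for t in tactics:
        t_status = "success" if tactic_closes(t) else "fail"
        lines.append(f"{'  ' * tdepth}tactic {t.name} [{t_status}]")
        for sub in t.subgoals:
            lines.extend(render(sub, tdepth + 1))
    return lines


# Shape of the failed attempt on rev (x ++ x) = rev x ++ rev x:
# induction on x, with rippling failing in the step case.
base = Goal("base case", Tactic("trivial"))
step = Goal("step case", Tactic("ripple", ok=False))
top = Goal("rev (x ++ x) = rev x ++ rev x",
           Branch([Tactic("induction x", subgoals=[base, step])]))

for line in render(top):
    print(line)
```

Here the base case is marked "success", while the step case, the induction tactic and the top-level goal are all marked "fail".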
A goal node and all of its child nodes are annotated with "fail" if the goal this node represents was not discharged during proof search, and with "success" if it was.

Figure 7.1: Screenshot showing a trace of a failed proof attempt.

7.13.1.1 XML Representation

The prover displays a proof trace of the form described above as an XML tree. For example, Figure 7.1 shows the failed proof trace produced when the prover is asked to prove ∀ x, rev (x ++ x) = rev x ++ rev x from basic definitions. The indentation of the XML represents the depth of the search. We intended to convert the XML tree into a more human-readable format before displaying it to the user, but this was not done due to time constraints.

This particular trace shows that the prover failed to simplify or generalise the goal, performed induction on x, failed to fertilise in the rippling proof for the step case, and then the proof attempt failed. We explain in the next section how the user can provide a hint to automate this particular proof.

We now describe the meaning of the XML tags:

• A goal tag describes the state of the goal at a given stage in the proof attempt. Tags nested within a goal tag describe the attempt to prove that goal.
• A tactic tag gives the name of the high-level tactic that was invoked on the current goal. Each goal tag that appears in sequence after the closing tactic tag represents a subgoal produced by the tactic that was called.
• A tag that comes before the tag that closes a tactic, goal or branch block indicates that the last tactic call failed and backtracking occurred.
• Tag sequences of the form t1 t2 . . .
represent choice points in the search space, meaning that what is described in t1 was performed first, backtracking occurred, then what is described in t2 was performed next, and so on. The branch tags in this trace indicate the choice of whether or not to generalise the goal, and the choice of which variable to perform induction on.
• Between the opening and closing tags for a tactic tag, trace information produced by the tactic called is displayed. Of particular interest, the ripple tactic displays the sequence of transformations it makes to the conclusion during the rippling proof search. For each transformation, the current ripple measure and the wave front annotations are shown. To improve readability, wave fronts are coloured differently from the rest of the term. We find this helps significantly when inspecting rippling proof attempts.

Proof traces are written to the standard output stream so they can be displayed in a terminal window alongside Coq's standard IDE. We initially tried displaying proof traces inside the error feedback window in Coq's IDE, but we found typical proof traces too long and too wide for this to be practical. For future work, it would be useful to present the user with a graphical overview of proof attempts. For example, a so-called hiproof could be presented to show the user which subgoals were considered and which tactics were invoked in a proof attempt [Denney et al., 2006, Aspinall et al., 2008].

7.13.2 User Hinting Mechanism

After a failed proof attempt, the user can invoke the hint (c) command, entered as part of the program script, to give conjecture c as a hint. The prover then tries to prove c, and the following scenarios can occur:

• If the prover is successful, c and its proof are cached as a lemma in the same manner as lemmas are cached during proof search.
As described in §7.2, cached lemmas of appropriate forms can be used automatically by the ripple, trivial and simplify tactics. The failed proof attempt can then be reattempted, in the hope that the prover can use the extra lemma productively to complete the proof this time. Note that the proof is reattempted from scratch when a hint is given; no progress from the previous search is reused.
• If the prover is unable to prove c, a proof trace is displayed. In such cases, the user can give a further hint to help produce a proof for c.
• The hint may be faulty in that c is a nontheorem. In §8.2.2, we describe how we make use of our testing tool to give useful feedback, in the form of a counterexample, to help the user refine faulty hints.

7.13.3 Providing Productive Hints

To give useful hints, the user must consider how their hint could be used productively by the various tactics. The following describes the primary ways that useful hints can be given using the information in proof traces:

Wave rule hints: When a proof trace indicates that the ripple tactic failed to fertilise in a goal, the user can conjecture a lemma that can be used as a wave rule to help. The wave front annotations in the proof trace can give the user a strong indication of the shapes of lemmas that would be useful for this.

Generalisation hints: When a proof trace indicates that the generalise tactic did not generalise some goal g appropriately, the user can suggest a generalised form of the goal as a hint. To do this, the user suggests a conjecture g′ as a hint, where g′ subsumes g. If the lemma for g′ is proven, the trivial tactic can then use this lemma to prove g automatically. Returning to the example in the previous section, the proof trace shows that the generalise tactic was unable to suggest a way to generalise the top-level goal.
The top-level goal can be stated more generally, as we show with the following hint, which the prover is able to prove automatically:

hint (∀ x y, rev (x ++ y) = rev y ++ rev x).

Simplification hints: The user can give hints to help the simplify tactic perform more effective simplification. For example, if the user noticed that an unprovable subgoal contained the term rev (rev x), the user might be able to help the prover succeed by giving the following hint:

hint (∀ x, rev (rev x) = x).

The user can then use an extra command to manually add the lemma proven for this hint to the rewrite rule database used by the simplify tactic. However, this manual step is not required for this particular rule, as the prover has heuristics that will automatically use this rule for simplification (see §7.11).

It is worth noting the similarities between the role of the user in providing hints and the role of a proof critic in proof planning [Ireland and Bundy, 1996]. Like the user, a proof critic uses the information gained from a failed proof attempt to suggest a way to patch the failing proof. As such, critics could potentially be applied here to reduce the need for manual user hints. However, as proof critics are not guaranteed to succeed in all cases, we feel it is important to provide facilities for manual user hinting.

7.14 A Comparison with IsaPlanner

In this section, we evaluate the utility of our top-level tactic as an inductive proof automation tool. To do this, we evaluate our tactic against a theorem corpus that has previously been used to evaluate the inductive proof automation power of IsaPlanner [Johansson, 2009, §5.5]. This corpus contains 87 theorems concerning arithmetic, lists and binary trees, many of which are taken from Isabelle's standard theorem libraries. The corpus can be found in Appendix C, and we refer to its labelled theorems throughout this section.
The corpus was devised to test IsaPlanner's ability to automate proofs that require case splits. IsaPlanner was able to automate 47 of these theorems. In this section, we test our prover against the same theorems and compare the results with IsaPlanner.

7.14.1 Experimental Setup

When IsaPlanner was evaluated against the theorem corpus, only basic function definitions (and no extra lemmas) were supplied to it. For a fair comparison, we configured our prover to use only the same definitions.

To perform the experiment, we first had to translate the IsaPlanner theorem corpus and function definitions to Coq. This translation was mostly straightforward (see Appendix A.4 for the definitions used), except for the following design choices:

• The function last x (which returns the last item in the list x) was defined in Isabelle as a partial function; specifically, the case in which x is empty is ignored. As Coq does not allow partial functions to be defined, we instead implemented this as the function last x d, which returns the last item in list x when x is nonempty and returns a default result d when x is empty.
• In Isabelle, theorem 11 (see Appendix C) has the form (max a b = a) = (b ≤ a). A literal translation of this statement to Coq is a nontheorem. A natural Coq interpretation of this statement that is a theorem is (max a b = a) ↔ (b ≤ a), where the outermost = operator has been replaced with ↔. A similar interpretation was required to translate theorems 12, 15 and 16 from Isabelle.

7.14.2 Results

Of the 87 theorems, our prover was able to automate 45 (52%), while IsaPlanner was able to automate 47 (54%). Of the theorems IsaPlanner could automate (those labelled 1 to 47), we could automate 39. Of the remaining theorems, which IsaPlanner could not automate (those labelled 48 to 87), we could automate 6.
See Appendix C for detailed results showing which theorems our prover could automate.

7.14.3 Analysis

As our rippling tactic relies on the same ripple measure as IsaPlanner and implements the same technique for reasoning about case splits, we would expect our system to automate many of the rippling proofs that IsaPlanner can automate. The results of the experiment show that this is the case, with only a few exceptions. We now analyse the results of our prover compared to IsaPlanner. We first consider the 8 theorems that our prover failed to automate but IsaPlanner was able to prove:

• Theorems 6, 11, 12, 15 and 16 are not equality statements. We currently only support rippling proofs where the conclusion and the given are equality statements, so the inductive proof attempts for these theorems fail. Specifically, the weak fertilisation step of our prover only works when the given is an equation. These theorems succeed in IsaPlanner as the Isabelle versions of the theorems have the form P = Q (see §7.14.1), which IsaPlanner's rippling tactic can support.
• Theorem 31 has the form member x l → member x (l @ t). Notice that, after performing universal introduction, the goal for this theorem will contain an assumption that embeds into the conclusion. Our top-level tactic therefore invokes rippling on this goal (see §7.1). The proof attempt fails because the lemmas needed to complete the rippling proof are not available. Theorem 32 fails for the same reason. IsaPlanner only ever invokes rippling in step case goals and never on top-level goals; it automates both of these theorems by first performing induction on the top-level goal and then using rippling in the step cases. The behaviour of our system here was somewhat surprising, as we usually only expect our rippling tactic to be invoked in recursive call proof obligations and in the step cases of inductive proofs.
Our prover can automate both of these proofs if it is forced to perform induction on the top-level goal instead of rippling. To remedy the above problem, we could consider hard-wiring our top-level tactic to only use rippling in recursive call proof obligations and in the step cases of inductive proofs. However, while verifying a quicksort program, we came across a surprising situation where rippling was applicable in a useful way after a top-level goal was simplified (see §9.4.3). We would need to consider more examples to determine the best general approach for deciding when rippling should and should not be invoked when embeddings are found outside of step case goals.
• When proving theorem 43, IsaPlanner performs a case split during a rippling proof where one of the subgoals produced contains an assumption of the form x ≠ x. When IsaPlanner performs case splits, it automatically discharges any generated subgoals that contain contradictions of this form [Johansson, 2009]. As we do not check for contradictions when performing case splits during rippling proofs, our rippling tactic fails to automate this subgoal. This could be fixed by calling our trivial tactic when case splits are performed.

We now consider the 6 theorems we are able to automate that IsaPlanner could not:

• Theorem 76 has the form butlast (xs ++ ys) = match ys with [] ⇒ . . . . Our prover proceeds by performing induction on the goal for this theorem. Unexpectedly, the rippling tactic is not invoked in the step case. The reason is that our algorithm for generating rippling annotations cannot annotate conclusions that contain match constructs (see §7.8.2). As the rippling tactic reports that there are no assumptions that embed, the top-level tactic performs simplification on the goal (without ever using the inductive hypothesis) and, after another similar inductive proof, eventually discharges the goal. Theorem 80 is proven in a similar fashion.
IsaPlanner fails to automate these theorems because it attempts rippling in the step case goals. As the proofs our prover found did not make use of the inductive hypotheses in the step case goals, this indicates that case analysis would be more suitable than induction for automating these theorems.
• Whenever our prover performs induction, it first modifies the conclusion of the goal to avoid introducing implications into inductive hypotheses (see §7.6). For example, when proving theorem 87, a simple inductive proof is performed on the modified goal n ≠ h ⊢ count n (x ++ (h :: [])) as opposed to the initial goal ⊢ n ≠ h → count n (x ++ (h :: [])). Theorem 69 is automated in a similar manner. In IsaPlanner, induction is performed in such a way that the inductive hypotheses in the step cases for these examples contain implications. IsaPlanner is known to fail in these situations, as its rippling tactic lacks support for step cases of this form [Johansson, 2009, §5.6].
• Theorems 74 and 84 are automated by our prover using simplification followed by induction. For example, for theorem 74, the initial goal is:

ys0 = [ ] → last (xs ++ ys0) default = last xs default

This goal is simplified to last (xs ++ [ ]) default = last xs default, and the inductive proof is straightforward. IsaPlanner does not simplify top-level goals before performing induction and, with this strategy, fails to automate these theorems.

For the remaining theorems that cannot be automated by either system, our prover fails for the same primary reasons that IsaPlanner does [Johansson, 2009, §5.6]. Specifically, both systems lack support for induction principles other than structural induction and cannot perform rippling proofs that require piecewise fertilisation [Armando et al., 1999].
7.14.4 Summary

In this experiment, we observed that IsaPlanner and our prover provide similar proof coverage for the theorem corpus considered. The results did, however, highlight the difference in the automation approaches used by the two systems. Specifically, our prover always attempts simplification on top-level goals and performs rippling if applicable. In contrast, IsaPlanner never simplifies top-level goals and only performs rippling in step case proofs. We identified cases where simplifying the top-level goal would improve the proof coverage of IsaPlanner, and observed the unexpected behaviour that allowing rippling to be invoked on top-level goals can block proofs in some situations.

It should be noted that the theorem corpus used here contained theorems where rippling and simplification were rarely applicable to the top-level goals. In §9.8, we compare the proof coverage of these two systems on proof obligations generated from dependently typed programs, where our prover is found to perform significantly better.

7.15 Conclusions

We have described tactics designed to automate the proof patterns that arise when programming with dependent types. This automation uses heuristics for simplifying and generalising goals, and employs the rippling heuristic for guiding induction-like proofs. Notably, several design choices were made so that more general, and thus more reusable, lemmas would be cached during proof attempts. In particular, we use a liberal generalisation tactic, and the delayed generalisation algorithm is used to identify and remove irrelevant subformulae from cached lemmas. We further improve the reusability of cached lemmas with heuristics that identify those which are appropriate for use by the simplification tactic. We then described the proof traces that are produced when proof search fails and explained how the user can provide hints that help the prover overcome failures.
In the next chapter, we introduce our testing tool and explain how it is integrated into our framework to provide further support.

Chapter 8 Supporting Dependently Typed Programming with Testing

In this chapter, we explain the role of testing in our framework for supporting dependently typed programming. In the next two sections, we describe how testing is utilised in the following ways:

1. When a dependently typed function generates an unprovable proof obligation, testing is used to identify this and an error message is shown to the user (see §8.1). The error message shows a counterexample that demonstrates why the identified proof obligation is unprovable. When no counterexample can be found for a proof obligation and the automation fails to find a proof, the user is instead presented with a trace of the failed proof attempt (the latter feedback was described previously in §7.13).
2. Testing is used by the proof automation to prune overgeneralisations from the search space (see §8.2.1).
3. When the user supplies a nontheorem as a hint to the prover, testing is used to give feedback that helps the user fix their faulty hint (see §8.2.2).
4. When the user is performing a manual proof, testing can be used as a tool during the proof attempt to identify unprovable goals (see §8.2.3).

We describe a testing tool that we have designed and developed for Coq for the above purposes in §8.3. In the meantime, it suffices to know that this tool uses a QuickCheck-like approach [Claessen and Hughes, 2000] to find counterexamples to universally quantified conjectures.

8.1 Providing Error Feedback with Testing

In §4.3, we described how identifying and fixing errors in dependently typed programs can be challenging. Recall that the kinds of errors we are interested in are those indicated by unprovable proof obligations.
We now describe how we can apply testing to identify such errors and provide useful feedback. Before explaining the implementation details of our procedure for providing error feedback, we give an example of a program that contains an error and show the style of feedback that we want to provide.

For this example, we want to define the function intersperse x y, which returns the list x with the items in list y inserted after every item. For instance, intersperse [1; 2; 3] [4; 5] would return [1; 4; 5; 2; 4; 5; 3; 4; 5]. We also want to use subset types to verify that the length of the output list from intersperse is correct. The following faulty definition almost achieves this task but contains an error:

    Require Import Program List.
    Import ListNotations.

    Program Fixpoint intersperse (x : list nat) (y : list nat)
      : {r : list nat | length r = (length x) * (length y)} :=
      match x with
      | [] => []
      | h :: t => [h] ++ y ++ (intersperse t y)
      end.

The body of the function has the expected behaviour but the output type is faulty. This faulty typing leads to the recursive call proof obligation being unprovable. The mistake is that the output type should actually be the following (notice the extra S term):

    {r : list nat | length r = (length x) * S (length y)}

The unprovable recursive call proof obligation produced by the faulty function has the following form after destructuring the recursive call and substituting pattern matching equations (as we detail in §8.1.1, these steps form part of the testing procedure):

    y : list nat
    h : nat
    t : list nat
    intersperse_s : list nat
    intersperse_p : length intersperse_s = length t * length y
    ============================
    length ([h] ++ y ++ intersperse_s) = length (h :: t) * length y

We find the above unprovable proof obligation typical, in that it is not obvious from casual inspection that the proof obligation is unprovable.
Furthermore, even when we know the above is unprovable, identifying where the fault lies is challenging.

We propose applying QuickCheck-style testing to proof goals to identify unprovable proof obligations, and using the counterexamples found during testing to provide helpful error feedback. To do this, the testing tool we have designed is automatically invoked on each proof obligation as it is generated. For the above proof obligation, the testing tool quickly finds a counterexample. When this happens, the term that produced the unprovable proof obligation is underlined in Coq's IDE¹ and an error message containing the counterexample found, together with an evaluation trace, is displayed to the user (see Figure 8.1). The error message displayed for this example is as follows:

    *** COUNTEREXAMPLE FOUND ***
    Variable instantiations:
    y := [], h := 1, t := [], intersperse_s := []
    All side-conditions were satisfied:
    intersperse_p : length [] = length [] * length []
    Instantiated and simplified conclusion showing the contradiction:
    length ([h] ++ y ++ intersperse_s) = length (h :: t) * length y
    length ([1] ++ [] ++ []) = length [1] * length []
    length ([1] ++ []) = 1 * 0
    length [1] = 0
    1 = 0

The error message contains the variable instantiations for the counterexample and shows, with a step-by-step trace, how the conclusion evaluates to a contradiction. This information is intended to help the user isolate the cause of the error and suggest what changes need to be made. In the above, we can see that when h is 1 and t, y and intersperse_s are empty, this leads to a contradiction (i.e. when the inputs to intersperse are [1] and []).
Notice that the LHS of the conclusion concerns the term from the implementation of intersperse that produced the proof obligation. The RHS of the conclusion is supposed to capture the length of the list we expect from a valid implementation of intersperse in this case.

¹Thanks to Matthieu Sozeau for making the modification to his Program tactic needed to allow for this behaviour.

Figure 8.1: A screenshot that demonstrates what happens in Coq's IDE when an unprovable proof obligation is identified. The term in the program script in the left pane is underlined to indicate that this term produced the unprovable proof obligation. The bottom right pane shows the counterexample-based error message.

By considering that a valid implementation of intersperse should return a list of length 1 for the given variable instantiations (because y is empty, the output list should have the same length as list x), we can reason that the LHS is correct but the RHS is incorrect. The latter indicates that the error is caused by a faulty output type. The evaluation trace, which shows how the value for the RHS of the conclusion was calculated for this counterexample, suggests that adding an S around the length y term would fix the problem. The evaluation trace is also useful when a function that appears in the proof obligation is faulty. For example, if length had been wrongly defined to always return 0, this would become obvious from looking at the steps in the evaluation trace.

8.1.1 Error Feedback Procedure

We now describe the procedure used for generating the error feedback shown above. Whenever a dependently typed program generates proof obligations, we apply the following procedure to each proof obligation before our proof automation is invoked:

1. Exhaustive universal introduction is performed on the proof goal and pattern matching equations are substituted.
2. All subset type terms in the conclusion and subset type assumptions are destructured. This is currently required for our testing tool to work. As it is important for the user to be able to trace the origin of the terms produced here back to their program, we must employ some form of origin tracking [van Deursen et al., 1993]. As we only perform minimal modifications to a goal before testing occurs, we use the following basic procedure for this: when destructuring the subset type term returned by a call to function f, the two assumptions produced are given labels of the form f_s and f_p for the computational term and the propositional term respectively, so that the user can determine their origin from the labels.
3. The testing tool is then invoked on the proof goal, resulting in one of the following:
   • If the goal is falsified, we display the counterexample found with an evaluation trace in the form of an error message. We explain how to create concise evaluation traces in §8.1.2. Additionally, the term that produced the unprovable proof obligation is indicated to the user.
   • If the goal cannot be falsified, the proof automation is invoked on the original unmodified goal to attempt to discharge the proof obligation. The modifications to the goal must be undone, as these can interfere with the recursive call proof pattern (see §6.5).

Note that finding a counterexample to the modified goal in step 3 means that a counterexample exists for the top-level goal, as performing universal introduction and destructuring subset types can never turn a provable goal into an unprovable goal or vice versa.
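To illustrate step 3 on the running example, the following minimal Python sketch mimics QuickCheck-style falsification of the intersperse obligation. It is not the Coq testing tool: Python lists stand in for list nat, the functions gen_list, faulty_obligation, fixed_obligation and find_counterexample are hypothetical, and generating random cases and discarding those that violate a side-condition is an assumed strategy.

```python
import random
from typing import Dict, List, Optional, Tuple

# Python lists stand in for Coq's list nat; `+` plays the role of ++.


def gen_list(rng: random.Random, max_len: int) -> List[int]:
    """Random small list of naturals, in the spirit of a QuickCheck generator."""
    return [rng.randint(0, 3) for _ in range(rng.randint(0, max_len))]


def faulty_obligation(y, h, t, s) -> Tuple[bool, bool]:
    """(side-condition, conclusion) for the obligation from the faulty type."""
    side = len(s) == len(t) * len(y)                     # intersperse_p
    concl = len([h] + y + s) == len([h] + t) * len(y)
    return side, concl


def fixed_obligation(y, h, t, s) -> Tuple[bool, bool]:
    """The same obligation once the output type uses S (length y)."""
    side = len(s) == len(t) * (len(y) + 1)
    concl = len([h] + y + s) == len([h] + t) * (len(y) + 1)
    return side, concl


def find_counterexample(obligation, trials: int = 1000,
                        seed: int = 0) -> Optional[Dict[str, object]]:
    rng = random.Random(seed)
    for _ in range(trials):
        case = {"y": gen_list(rng, 2), "h": rng.randint(0, 3),
                "t": gen_list(rng, 2), "intersperse_s": gen_list(rng, 6)}
        side, concl = obligation(case["y"], case["h"], case["t"],
                                 case["intersperse_s"])
        # Cases that violate the side-condition are discarded; a
        # counterexample satisfies every side-condition but not the conclusion.
        if side and not concl:
            return case
    return None


print(find_counterexample(faulty_obligation))
print(find_counterexample(fixed_obligation))  # None: the fixed obligation holds
```

For the faulty type, every case that satisfies the side-condition is a counterexample, since the LHS always exceeds the RHS by one; the thesis's counterexample y := [], h := 1, t := [], intersperse_s := [] is one such case.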
8.1.2 Concise Counterexample Evaluation Traces

When showing error messages, we use a procedure that generates compact yet easy-to-follow evaluation traces of counterexamples. Notice in the evaluation trace from the previous section how incremental simplification is performed on both the LHS and RHS of the equation on each line. The underlying motivation was to replicate how an algebraic equation is simplified over several lines in pen-and-paper proofs. Typically, a handful of simplifications are made on each new line, in a way that keeps the number of lines small while keeping each step that led to the final result easy to check.

To make the simplifications for one line of the evaluation trace, the procedure performs a postorder traversal (i.e. the innermost subterms are considered first) of the syntax tree of the current conclusion term. For each node n in the tree, we simplify n by performing computations (i.e. using Coq's simpl tactic [Bertot and Castéran, 2004]) only if none of the child nodes of n have been modified so far. This procedure is implemented as an OCaml function within Coq that, when supplied with a Coq term, displays an evaluation trace. It is not implemented as a tactic, as no proof is being constructed.

For example, consider the following conclusion:

1 + (2 + 3) = 4 + 5

In one traversal, the first subterms that are simplified are 2 + 3 and 4 + 5. This simplifies the conclusion to 1 + 5 = 9. No more terms are simplified for this line of the evaluation trace. In particular, the terms 1 + 5 and 1 + 5 = 9 are not simplified because they are composed of subterms that have already been simplified. For the next line of the evaluation trace, the conclusion is simplified to 6 = 9. The evaluation trace is then finished, as there are no more terms to simplify.
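The one-line-per-traversal procedure can be sketched on arithmetic terms as follows. This is a minimal Python sketch, not the OCaml implementation: integer arithmetic stands in for Coq's simpl, the tuple encoding of terms and the names step, show and trace are hypothetical, and a node counts as modified whenever any subterm below it changed during the current pass.

```python
from typing import List, Tuple, Union

# A term is an integer literal or (operator, left, right).
Term = Union[int, Tuple[str, "Term", "Term"]]

# Operators that can be computed; "=" is displayed but never reduced.
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def step(term: Term) -> Tuple[Term, bool]:
    """One postorder pass; returns the new term and whether it changed."""
    if isinstance(term, int):
        return term, False
    op, left, right = term
    new_left, left_changed = step(left)
    new_right, right_changed = step(right)
    if left_changed or right_changed:
        # A child was modified in this pass, so this node waits for the next line.
        return (op, new_left, new_right), True
    if op in OPS and isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right), True
    return term, False


def show(term: Term) -> str:
    if isinstance(term, int):
        return str(term)
    op, left, right = term
    if op == "=":
        return f"{show(left)} = {show(right)}"

    def wrap(t: Term) -> str:
        # Parenthesise compound operands.
        return show(t) if isinstance(t, int) else f"({show(t)})"

    return f"{wrap(left)} {op} {wrap(right)}"


def trace(term: Term) -> List[str]:
    """One line per pass, stopping once a pass changes nothing."""
    lines = [show(term)]
    while True:
        term, changed = step(term)
        if not changed:
            return lines
        lines.append(show(term))


for line in trace(("=", ("+", 1, ("+", 2, 3)), ("+", 4, 5))):
    print(line)
# 1 + (2 + 3) = 4 + 5
# 1 + 5 = 9
# 6 = 9
```

The printed lines reproduce the worked example above: 1 + 5 is left alone on the second line because its child 2 + 3 was rewritten in the same pass.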
8.1.3 Weak Specifications and Counterexamples

We note that weakly specified programs (see §3.4.1) can be somewhat problematic when generating easy-to-understand error messages. For example, reconsider the unprovable proof obligation from the weakly specified intersperse program, where only the length property of the output list was captured:

    y : list nat
    h : nat
    t : list nat
    intersperse_s : list nat
    intersperse_p : length intersperse_s = length t * length y
    ============================
    length ([h] ++ y ++ intersperse_s) = length (h :: t) * length y

Given y := [1] and t := [2], a valid instantiation for the recursive call result intersperse_s that satisfies the constraint given by intersperse_p is [3]. However, the computational part of the definition we gave for intersperse could never return such a result from the recursive call given those instantiations for y and t.

The issue here is that the terms generated for the counterexample only have to satisfy the constraints given by the weak specification of the intersperse function, i.e. as specified by the intersperse_p assumption. Specifically, the intersperse_p assumption constrains the length of the lists generated, but the contents of the lists are unconstrained. The user must keep this in mind when interpreting the error messages. This issue is not present with strongly specified functions, as the valid term instantiations will be precisely constrained by the assumptions. The error messages produced when working with weak specifications could be improved if the testing procedure generated counterexamples by making use of the computational content of the function the user was trying to define. However, the latter information is currently not accessible in the proof obligations Russell produces.

8.2 Testing and Proving

In this section, we explain how testing is integrated with the proof automation described in the previous chapter to provide extra support for the user.
8.2.1 Testing as Part of Proof Automation

During proof search, we use testing to prune unprovable proof goals from the search space. Nontheorems can result from the overgeneralisations produced by the generalise tactic (see §7.5). This use of testing during proof search has the following benefits:

Efficiency: Attempting to prove nontheorems can have a significant performance cost, as sometimes large search trees must be exhausted for the proof automation to terminate. For example, a nontheorem that contains many variables can result in a large proof search tree, as each variable could be considered a candidate for induction by the induction tactic.

Concise Search Traces: When proof search fails, determining what hints might be useful to help the prover can involve inspecting a trace of the failed proof attempt (see §7.13). Pruning fruitless paths from the search tree makes this task easier, as the trace will be more concise.

8.2.2 Feedback for Faulty Hints with Testing

In §7.13, we describe the feature of our proof automation where the user can aid failed proof searches by supplying appropriate lemma hints. To improve the usability of our framework, we employ testing to identify cases where the user supplies a nontheorem as a lemma hint.

When the user supplies a conjecture as a lemma hint, we use testing, before invoking the proof automation, to attempt to falsify the conjecture. If a counterexample is found, this is shown to the user in the form of an error message (in the same way as shown in §8.1). The use of testing to identify nontheorems supplied as lemma hints has the following benefits:

Nontheorem detection: If the user is only told that the prover was unable to prove the conjecture, the user may make further attempts to prove the conjecture themselves. When the conjecture is identified as a nontheorem, this will prevent the user from wasting time attempting a manual proof.
Refinement help: Counterexamples can aid the user in refining nontheorems into theorem statements. For example, an error message might suggest to the user what side-conditions must be added to refine their nontheorem into a theorem. In future work, automation could be provided here by using counterexamples to guide the refinement of a nontheorem into a theorem [Colton and Pease, 2005].

8.2.3 Supporting Manual Proofs with Testing

For situations where the user decides to construct a proof manually, we have packaged our testing tool as a tactic so that it can be invoked to look for counterexamples to the current goal. Testing is useful for identifying initially unprovable conjectures and for identifying proof steps that change a provable goal into an unprovable goal. Additionally, any counterexamples found can help the user refine unprovable conjectures into theorem statements and help explain why a certain proof step was unsafe.

8.3 Design of a Testing Tool

In the previous sections, we described the various ways we can use testing to improve the support given by our framework. In this section, we describe the design of a QuickCheck-like [Claessen and Hughes, 2000] testing tool, implemented in Coq, that can be used for the purposes we have mentioned. As with the testing tool for Agda, testing in our tool occurs within the framework of the goals being tested [Qiao Haiyan, 2003].

8.3.1 Requirements

The testing tool we have designed is intended to meet the following requirements:

Coverage: When used for providing error feedback, the testing tool should be able to identify a significant number of unprovable proof obligations to be useful. If the user-supplied program contains an error that is not identified by testing, the usability penalty can be high, as the user can be unsure whether the proof obligation is provable or not.
Likewise, we would expect similar reliability when testing is used to identify faulty lemma hints.

Easy to interpret counterexamples: As the user is expected to inspect the counterexamples found when these are shown as error messages, we want the counterexamples to be easy to interpret. It is generally agreed that inspecting a minimal counterexample is easier than inspecting a large and complex one.

Efficiency: Searching for counterexamples needs to be reasonably efficient. For example, we would expect a user to be frustrated if they had to wait more than a few seconds to test the proof obligation generated from their program. Additionally, testing for unprovable goals during proof search only saves time if testing takes less time than waiting for failing proof attempts to finish.

8.3.2 Counterexample Generation

We start by giving a high-level overview of our testing tool. To find counterexamples to Coq goals, we use the generate-and-test approach used by QuickCheck-like tools [Claessen and Hughes, 2000]. In contrast to, for example, the testing tool available for Isabelle [Nipkow, 2004], testing is performed within the same framework as the goals being tested. We discuss the merits of this approach later in §8.3.7.

Our testing tool is first supplied with a Coq goal of the following form:

    (H1 : T1) ... (Hn : Tn) ⊢ P

Firstly, to test a goal, we must have a method to give a concrete instantiation to each variable H : T in the goal when T : Set or T : Type. We explain term generation in §8.3.5. Our generator can randomly generate terms for ML-like types (i.e. not for dependent types).

Secondly, P and each assumption H : T where T : Prop must be testable after the above variable instantiations have been made. We explain how instantiated propositions are tested in §8.3.3.
For example, we can provide support for testing propositions of the form s = t, where s and t are instantiated to ground terms, but not for propositions that include quantifiers. Supporting quantifiers would require some fast procedure for verifying, during testing, that statements of the form (∀ (n:nat), P n) and (∃ (n:nat), Q n) hold (for some propositions P and Q). This is problematic for infinite types like nat using the basic QuickCheck approach.

The procedure to search for a counterexample is as follows:
1. For each assumption A : R, A is added to the set
   (a) T when R is the term Type or Set. These represent type variables that need to be instantiated.
   (b) V when R : Set or R : Type. These represent variables for which we have to generate concrete terms.
   (c) S when R : Prop. These represent side-conditions that need to be tested after instantiations are made.
   For example, members of the sets T, V and S might be the terms A : Type, x : list A and p : length x = 1 respectively. Testing fails if an assumption does not match any of the patterns given. For instance, a goal with assumption P : Prop cannot be tested. However, for the examples considered in this thesis, assumptions of this particular form are unlikely to arise in practice.
2. For each assumption A in T, A is replaced in the conclusion and all assumptions in S and V by a concrete data type. We explain this step in §8.3.4.
3. For each assumption x : R in V, a random term t : R is generated and x is replaced with t in all assumptions in S and the goal conclusion. We explain term generation in §8.3.5.
4. The conclusion and all members of S are then simplified by computation (i.e. in the same way Coq's simpl tactic operates on terms [Bertot and Castéran, 2004]).
5. If all properties in S are true and the conclusion is false when tested, a counterexample has been found. We describe how tests are performed in §8.3.3.
6. The search for a counterexample can be continued by repeating the above from step 2. We stop searching after a user-defined number of attempts is reached, where we use 100 as the default.

We label a generated example as “vacuous” when testing shows that a member of S was false. Like QuickCheck, we report to the user the percentage of vacuous examples generated. We now describe the details of some of the above steps.

8.3.3 Testing Propositions

To test a Coq proposition P, we have implemented a function called test that either returns a boolean result when P is testable or fails with an error when P is determined to be untestable. The test function is defined recursively as follows (note that the Coq operators ∧, ∨, →, ∼, < and > are represented as inductive predicates):
1. test True returns true.
2. test False returns false.
3. test (P ∧ Q) returns (test P) && (test Q), where && is the boolean “and” operator. We can give similar definitions to support the operators ∨, → and ∼.
4. Given n and m have type nat and are both ground terms: test (n < m) returns the result of lt_bool n m, where lt_bool is a function that returns a boolean result deciding whether n is less than m. We can give a similar definition to support the > operator.
5. Given s and t are ground terms: test (s = t) returns true when s and t are convertible and false when they are not convertible.
6. If the term t being tested does not match any of the above forms, t is untestable. For example, the propositions ∀ x, ..., ∃ x, ... and, given x : nat, x + 1 = 1 are untestable. Our testing tool fails with a warning when an attempt is made to test an untestable term.

In further work, we would use a testing procedure that could be extended to work with user-defined inductive predicates.
As we tend to use functions as opposed to inductive predicates to represent program properties in this thesis (see §3.4.4), we find the testing procedure above sufficient for our purposes.

8.3.4 Instantiating Type Variables

Some goals require that type variables be instantiated before they are tested. For example, consider the following goal:

    A : Type
    x : list A
    ============================
    x = x ++ []

To generate x and test the conclusion, we must first instantiate A to some concrete data type. To do this, we currently require the user to issue a command, prior to testing, that tells the testing tool what type a named variable should always be replaced with. For example, when testing a goal such as the above, we can instruct the tool to replace occurrences of any variable named A (which must have type Type or Set) in a goal with nat. Testing will fail with a warning if there are type variables that could not be instantiated. It would be useful, and also simple, to modify the testing tool so that when an instantiation for a type variable is not specified by the user, some appropriate random instantiation is chosen instead from the types available.

8.3.5 Random Term Generation

To generate random terms for use in our testing procedure, we have implemented a random term generator for ML-like simply typed terms, e.g. types such as nat, list nat and btree. To generate a random term t that is a member of the inductive type T, we perform the following procedure, where s is a natural number variable supplied to limit the size of the term generated:
1. If T has no base case constructors or T is not an inductively defined type, term generation fails (e.g. we cannot generate terms of type Prop or list Prop).
2. If s equals 0, we randomly choose a base case constructor c for type T. Otherwise, we randomly choose any constructor c for T.
3. A term is generated for each of constructor c's arguments by repeating the term generation process for each argument type with s set to half of its current value. These generated terms are then used as arguments to constructor c to construct term t.

The use of the variable s guarantees termination and gives some control over the size of the term generated. Custom generators for recursive types in QuickCheck typically use a size parameter for the same purposes. We do not currently provide facilities for writing custom term generators, nor do we support random generation of dependently typed terms such as vect. The testing tool will fail with a warning if it needs to generate a term of a type that is not supported. For future work, it would be desirable to adopt the feature from Agda's testing tool that allows custom generators to be written for inductive families [Qiao Haiyan, 2003].

8.3.6 Generating Small Counterexamples

We use a simple mechanism, which is also used in QuickCheck [Claessen and Hughes, 2000], to increase the likelihood of generating counterexamples that are close to being minimal. When testing the goal, we set the size parameter of the term generator to match the number of tests that have been performed on the goal so far. For instance, when generating test data to test the current goal for the fifth time, the size parameter for the term generator is set to five. This means that small terms are generated for the initial examples and gradually larger terms are used as more tests are performed. We find this approach is generally effective at finding small, and thus more readable, counterexamples for many goals.

Additionally, this approach makes it easier to find counterexamples in goals with side-conditions (recall that we have not yet implemented facilities for custom term generators).
For example, consider the following goal:

    x : list nat
    y : list nat
    P : length x = length y
    ...

If we allow large random terms to be generated for x and y, it is highly unlikely that P will hold for these values. When we limit the term generator size parameter to a small value, we are much more likely to find a pair of lists that satisfy P.

8.3.6.1 Testing and Higher-Order Functions

To test proof goals that contain variables that represent higher-order functions, some mechanism is required for replacing these variables with appropriate concrete functions. The testing tools for Agda [Qiao Haiyan, 2003] and PVS [Owre, 2006], as well as Gast [Koopman et al., 2002], SmallCheck [Runciman et al., 2008] and QuickCheck [Claessen and Hughes, 2000], each have different approaches for dealing with this problem. In each case, there is support for generating random functions for use in testing.

For our testing tool, we use a simple solution to provide some support for testing with higher-order functions:
1. The user first supplies the testing tool with a list of functions that should be used when testing goals. For example, the user might supply the functions S : nat → nat and eq_nat_dec : ∀ (x y : nat), {x = y} + {x ≠ y}.
2. To test a goal containing some variable f : T, where T is a function type, f is randomly replaced by a user-supplied function that has type T. Testing fails if no such replacement is available.

For example, consider the following unprovable goal:

    A : Type
    B : Type
    f : A → B
    x : list A
    y : list A
    ============================
    rev (map f (x ++ y)) = rev (map f x) ++ rev (map f y)

The conclusion contains an easily made mistake: the x and y variables on the RHS of the equation appear in the wrong order.
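The user-supplied function approach can be sketched in Python for the rev/map goal above. The registry USER_FUNCTIONS and the helper names are hypothetical. Only the successor function S is registered, and A and B are fixed to nat. The lists are produced by a simple size-bounded generator rather than the constructor-based one of §8.3.5.

```python
import random

# Hypothetical registry of user-supplied functions, keyed by
# (argument type, result type).
USER_FUNCTIONS = {("nat", "nat"): [("S", lambda n: n + 1)]}


def pick_function(arg_ty, res_ty, rng):
    """Replace a function variable by a random user-supplied function."""
    candidates = USER_FUNCTIONS.get((arg_ty, res_ty), [])
    if not candidates:
        raise LookupError(f"no user-supplied function of type {arg_ty} -> {res_ty}")
    return rng.choice(candidates)


def conclusion(f, x, y):
    # rev (map f (x ++ y)) = rev (map f x) ++ rev (map f y)
    lhs = [f(v) for v in x + y][::-1]
    rhs = [f(v) for v in x][::-1] + [f(v) for v in y][::-1]
    return lhs == rhs


def find_counterexample(attempts=100, seed=0):
    rng = random.Random(seed)
    for size in range(attempts):
        # A and B are instantiated to nat, so f needs type nat -> nat.
        name, f = pick_function("nat", "nat", rng)
        x = [rng.randint(0, size) for _ in range(rng.randint(0, size))]
        y = [rng.randint(0, size) for _ in range(rng.randint(0, size))]
        if not conclusion(f, x, y):
            return {"A": "nat", "B": "nat", "f": name, "x": x, "y": y}
    return None
```

Reporting the name S, rather than an opaque generated function, is what keeps the resulting counterexample readable.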
Given that the testing tool has been supplied with the successor function S, the following readable counterexample is easily found by instantiating A to nat and then instantiating f to S:

    Variable instantiations: A := nat, B := nat, y := [1], x := [0], f := S
    Conclusion:
    (rev (map f (x ++ y)) = rev (map f x) ++ rev (map f y))
    (rev (map S ([0] ++ [1])) = rev (map S [0]) ++ rev (map S [1]))
    (rev (map S [0; 1]) = rev [1] ++ rev [2])
    (rev [1; 2] = [1] ++ [2])
    ([2; 1] = [1; 2])

This simple approach has an obvious limitation: testing fails when functions of the appropriate types are not available. Despite this, it is useful in practice. When counterexamples are generated, replacement with user-supplied functions arguably makes the counterexamples more readable than replacement with randomly generated functions. With randomly generated functions, the user will be presented with functions that perform unfamiliar computations, which will make understanding the counterexample challenging. With user-supplied functions, the user will be familiar with the names and computations of the functions used.

For future work, it would be useful to consider how user-defined functions could be randomly combined to generate new functions that could be used by the testing tool. Compared to only using user-defined functions, this would give access to a larger pool of functions when testing goals. Moreover, compared to functions that are generated completely at random, functions created by combining user-defined functions are likely to be easier for users to interpret when used in error messages.

8.3.7 Testing within Coq

Testing occurs within the framework of Coq, where the term generator generates Coq terms and testing involves simplifying Coq terms by computation.
This approach differs from, for example, the testing tool for Isabelle, where Isabelle specifications are first translated to an ML representation and testing takes place in ML for efficiency [Nipkow, 2004]. We find our approach has the following advantages:
• Displaying the counterexample message and the trace with the same notation, formatting and labels used by Coq is trivial. This makes it much easier to generate readable counterexample descriptions and traces.
• We have the option of writing term generators in the same language as the programs we want to test. This approach has been demonstrated in Agda's testing tool [Qiao Haiyan, 2003].
• We can avoid the issue of unsound counterexamples being found when converting between representations, e.g. when arbitrary-precision arithmetic is used by only one language. However, this issue is harder to avoid if, for example, the goal of constructing a verified Coq program was to extract it to ML and then use machine integers in place of nat terms in the ML code. In this scenario, we would want our testing tool to identify properties that hold in Coq but do not hold in the ML code. One approach to increase confidence that the properties of the verified code hold for the extracted code is to perform testing on the extracted code. This approach is used in the Focal programming language [Carlier and Dubois, 2008].

However, one issue we did encounter with our approach is that evaluating Coq terms can be slow when the operations being used produce large Coq terms. For example, evaluating 200 ∗ 200 using Peano arithmetic in Coq takes 0.35s (using an Intel E5200 CPU and 4GB of RAM). This problem can be worked around by configuring the term generator to only generate small terms when conducting tests (see §8.3.6). With this approach, it takes only 0.04 seconds on average for our testing tool to run 100 tests on each proof obligation generated from our case study programs in Chapter 9.
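The extraction concern can be illustrated with a small Python model. It assumes, hypothetically, that extraction maps nat to OCaml's native int, which is 63 bits wide on 64-bit platforms and wraps around on overflow. Under that assumption, the property n + 1 > n holds for every nat but fails for the largest machine integer.

```python
# OCaml's native int on 64-bit platforms is 63 bits wide.
MAX_INT = (1 << 62) - 1


def ocaml_int(n):
    """Wrap an unbounded integer into the 63-bit two's complement range."""
    n &= (1 << 63) - 1
    return n - (1 << 63) if n >= (1 << 62) else n


def succ_greater_nat(n):
    # Holds for every natural number, as it would for Coq's nat.
    return n + 1 > n


def succ_greater_extracted(n):
    # What the extracted code would compute with wrapping machine integers.
    return ocaml_int(n + 1) > n
```

A size-bounded random generator such as the one in §8.3.5 would be unlikely to reach this boundary value. Testing extracted code may therefore also call for generators biased towards boundary values.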
8.4 Related Work

We now mention some related testing tools available for programming languages and proof assistants. One of the most well-known testing tools is QuickCheck, which is built for testing Haskell programs [Claessen and Hughes, 2000]. Our Coq testing tool is based on the same approach QuickCheck uses for testing goals (see §2.9.1). In particular, both systems generate random test data when searching for counterexamples. However, unlike our system, QuickCheck provides facilities for writing custom generators so that test data can be generated more efficiently.

The approach used by SmallCheck for generating test data is to exhaustively search, up to some size limit, all possible term instantiations [Runciman et al., 2008]. This has the benefit of finding the smallest possible counterexample, which is useful for generating readable feedback. The Gast tool for the language Clean combines random testing with the systematic checking that SmallCheck uses [Koopman et al., 2002]. A QuickCheck-like testing tool also exists for the Focal environment [Carlier and Dubois, 2008]. This tool is similar to ours in that both generate test data by randomly selecting constructors from the type we wish to construct [Carlier and Dubois, 2008].

There are also QuickCheck-inspired tools available for many proof assistants. Most closely related to our work is the testing tool for Agda [Qiao Haiyan, 2003], as both tools function in a dependently typed setting. As with our work, this tool performs testing within the same framework in which the goals being tested are represented. Custom generators can be written for Agda's tool, where the generator functions are implemented as Agda functions. A practical benefit of this is that a generator can be proved to be a surjective function within Agda [Qiao Haiyan, 2003].
Agda's tool also offers support for generating dependently typed terms as well as random functions. The QuickCheck-like tool for Isabelle takes a different approach to testing by first translating the goals being tested to ML [Nipkow, 2004]. This is in contrast to our tool, where testing takes place within the same environment as the goal we want to test. The Isabelle tool supports testing for goals that include inductive datatypes as well as inductive predicates, whereas we do not support the latter. A QuickCheck tool has also been developed for PVS which, similarly to our work, can be used to test goals that include subset types [Owre, 2006].

Another approach to testing is to translate the goal we want to test to propositional logic and then employ a SAT solver. This technique is used to test first-order goals in MACE [McCune, 2001] and higher-order goals in both Refute [Weber, 2008] and Nitpick [Blanchette and Nipkow, 2009].

8.5 Conclusions

In this chapter, we described the testing tool component of our framework and how it is integrated with our proof automation to provide further support for programming with dependent types. The testing component employs a QuickCheck-like approach to find counterexamples to unprovable proof obligations, and we presented an implementation of such a tool for the Coq proof assistant. One purpose of testing in our framework is to provide error feedback to the user in the form of counterexample descriptions for unprovable proof obligations. We explained how testing is used to give feedback on faulty user hints as well as to identify overgeneralisations made by our prover during proof search. The testing tool can also be employed manually to help during interactive proofs.
In the next chapter, we make use of our testing tool as part of our complete prototype to conduct case studies that examine what support our framework can give when programming with dependent types.

Chapter 9 Case Studies

We have now described the design of our framework for supporting dependently typed programming. In this chapter, we evaluate our claim that this framework can make development significantly easier. To do this, we discuss the results of several case studies where the framework was used to provide help for writing dependently typed programs. The case studies are used to illustrate the strengths and weaknesses of our framework and to compare the support it gives with that of currently available programming environments.

9.1 Research Questions

We first consider the research questions that we would like our case studies to answer. Regarding the support provided for developing dependently typed programs, we would like to answer the following questions:
• Which data type representations, program property representations and levels of type refinement are well supported by the proof automation?
• How often can proof automation failures be overcome with user hints? What level of expertise is required to give effective hints?
• How helpful were the error feedback facilities during development?

9.2 Procedure

We now describe the procedure we used to carry out our case studies, where we aim to give a broad picture of how our framework can support dependently typed programming in practice.

9.2.1 Choice of Examples

We began the work in this chapter with a fixed list of the programs that needed to be developed. These were chosen so that we would be required to make use of a variety of data types and program properties, allowing us to determine what was well supported by our framework. Most of the case studies are based on or inspired by example programs we have seen from current dependently typed programming languages.
This is to show what support our framework provides for the kinds of programs current developers want to write, and so that we can more easily make comparisons between the support our framework provides and the support provided in current environments. For example, the tail recursion case study is based on an example from ATS (see §9.3), the quicksort example is based on a Coq program example (see §9.3) and the binary adder is based on an Idris program (see §9.5).

9.2.2 Conducting a Case Study

Each case study involved implementing a dependently typed program and reporting on our experiences. The basic components of the instructions for each case study consisted of descriptions for the following:

Functionality: An informal description of the tasks the finished program should perform, e.g. “implement an insertion sort function”.

Program properties: An informal description of the program properties that should be captured with the use of dependent types, e.g. “verify that the list returned by the insertion sort function is always a permutation of the original list”.

When conducting the case studies, we intentionally avoided representations that the prototype has not been built to support. Specifically, we avoided the use of inductive predicates, as these are not supported by our testing tool or our proof automation.

9.2.3 Reporting Case Studies

For each case study carried out, we give a factual account of the following:
• We describe the program written and the reason behind any relevant design choices. Standard function definitions used can be found in Appendix A.
• We describe how the proof automation performed when discharging proof obligations. We describe any proof attempt successes or failures worthy of note, but we do not exhaustively discuss each proof attempt.
For brevity, unless otherwise stated, we describe the initial form of each proof obligation as being the form after exhaustive universal introduction is performed, pattern matching equations are substituted and subset type terms are destructured.
• When a proof obligation could not be automatically discharged, we describe our attempts to complete the proof with the use of the hinting feature and the various tactics our framework provides.
• We describe where the error feedback facilities were particularly helpful or unhelpful for developing certain kinds of programs. We did not attempt to exhaustively catalogue data regarding the errors we made during development.

When developing a program script, it is typical to first write one function, then another, and then return to the first function to correct an error. As a program script goes through many changes before it reaches its final form, it would be problematic to report on each and every proof obligation we encountered during the case studies. Our pragmatic approach is to describe only the proof obligations produced by the final script. Moreover, whilst conducting a case study, once we were satisfied that the specification of a function captured the property we had intended and were convinced that the proof obligations generated by the function should be provable, we made no further modifications to that function definition.

9.2.4 System Configuration

The following describes how the system was configured when we conducted the case studies and why this configuration was chosen:

Initial lemmas: Each example program was developed from an empty proof script (i.e. cached lemmas were not shared between examples). Additionally, the lemma databases (see §7.2) were empty at the start of each example. This design choice was made to show how the prover copes without domain-specific lemmas.
Note that, for the purposes of describing the behaviour of our prover, we added a feature to make the prover report if a proof could only be found when cached lemmas were used.

Lemma discovery: We used a conservative configuration for the lemma discovery component (see §9.7.3), where only commutativity, associativity and involution properties were conjectured about each simply typed function definition used. We checked only for these very common properties, and deliberately not for additional ones, to avoid criticisms that the prover had been fine-tuned to the examples.

Decision procedures: We note here that the prover does not make use of Coq's Presburger arithmetic procedure, which can easily be added as part of the trivial tactic. This choice was made to demonstrate how the prover would perform in environments without such a decision procedure, such as Epigram.

Modifications: The only modifications to the prototype we allowed during the case studies were patches for easily fixable minor bugs that were preventing intended behaviour. Fortunately, no such modifications were required.

9.3 Case Study: Tail Recursive Functions

We now begin our description of the case studies that we conducted. We start by looking at a set of examples which involved writing efficient tail recursive functions. Each example involves making use of dependent types to verify that a tail recursive version of a function always computes the same result as a naive definition, which is simpler to define. For each example function, we experiment with different possible representations, including the use of helper functions and higher-order fold functions. This case study was inspired by an example ATS program, where a tail recursive factorial function is verified [Xi, 2010].

9.3.1 List Sum

We start with the somewhat simpler task of implementing a function to sum a list of natural numbers.
For this set of examples, our proof automation was able to discharge all 5 of the proof obligations that arose. The standard naive definition of such a function is as follows:

    Fixpoint sum (a : list nat) : nat :=
      match a with
      | [] => 0
      | h :: t => h + sum t
      end.

9.3.1.1 Using a Helper Function

The first representation we use defines our tail recursive sum function with a helper function that takes an accumulator variable, as follows:

    Program Fixpoint sum_tail_aux (a : list nat) (acc : nat)
      : { r : nat | r = acc + sum a } :=
      match a with
      | [] => acc
      | h :: t => sum_tail_aux t (acc + h)
      end.

    Program Fixpoint sum_tail (a : list nat) : { r : nat | r = sum a } :=
      sum_tail_aux a 0.

Here we have used subset types to verify that sum_tail and sum always compute the same result. The recursive call proof obligation of sum_tail_aux is as follows:

    sum_tail_aux_p : sum_tail_aux_s = acc + h + sum t
    ============================
    sum_tail_aux_s = acc + sum (h :: t)

Somewhat unexpectedly, the recursive call pattern does not apply here, as sum_tail_aux_p does not embed into the conclusion. As we commented in §6.5.1, this can happen when the arguments in a recursive call are not all subterms of the corresponding arguments to the parent call. Here, the second argument of the recursive call is acc + h, which is not a subterm of acc.

As the output type of the recursive call here has the form {r | r = t}, where variable r does not occur in the term t, the propositional term produced by destructuring the recursive call will always be a non-recursive equation. The simplify tactic will always substitute with such equations. This goal thus simplifies to the following:

    acc + h + sum t = acc + (h + sum t)

We found that this pattern of simplification recurs in the recursive call proof obligations of the later tail recursive examples that use helper functions.
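As a sanity check of the specification, rather than a proof, the output type of sum_tail_aux can be tested in the QuickCheck style that underlies our error feedback. The Python below is a hypothetical transliteration, not part of the prototype. Python ints stand in for nat, and a loop replaces the tail call because Python does not eliminate tail calls. All function names are our own.

```python
import random

def naive_sum(a):
    # sum [] = 0; sum (h :: t) = h + sum t
    return 0 if not a else a[0] + naive_sum(a[1:])

def sum_tail_aux(a, acc):
    # sum_tail_aux (h :: t) acc = sum_tail_aux t (acc + h), written as a loop
    for h in a:
        acc = acc + h
    return acc

def sum_tail(a):
    return sum_tail_aux(a, 0)

def find_counterexample(trials=500, seed=0):
    """Random search for (a, acc) violating the output type r = acc + sum a."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randint(0, 20) for _ in range(rng.randint(0, 10))]
        acc = rng.randint(0, 20)
        if sum_tail_aux(a, acc) != acc + naive_sum(a):
            return a, acc
    return None  # no counterexample found (evidence, not a proof)

print(find_counterexample())  # None
print(sum_tail([1, 2, 3]))    # 6
```

A counterexample returned here would play the role of the error messages described in §9.3.4.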
Returning to the goal above, as lemma discovery had already been used to prove that + is associative, the prover trivially discharges this proof obligation. Finally, the sum_tail function generates no proof obligations, as the return type of sum_tail_aux normalises to the expected type.

9.3.1.2 Using a Helper Function: Variant

To experiment with representation changes, we now take the program from the previous section and simply swap the order of the arguments to + in the output type and function body of sum_tail_aux. We would expect our framework to support such a minor representation change, given that the previous example was unproblematic.

This time, the recursive call proof obligation for sum_tail_aux simplifies to the following:

    sum t + (h + acc) = h + sum t + acc

This variation of associativity is not yet known by the prover. The goal is discharged automatically by first generalising sum t and then performing a simple inductive proof.

9.3.1.3 Using Fold

Finally, we attempt to define sum_tail using a fold function as follows:

    Program Fixpoint sum_tail (a : list nat) : { r : nat | r = sum a } :=
      fold_left plus a 0.

This representation is pleasingly concise and similar to the style that is encouraged when writing regular functional programs. The above function generates the following proof obligation, which notably contains a higher-order function:

    fold_left plus a 0 = sum a

The prover manages to discharge this proof obligation using induction and lemma calculation (i.e. another inductive proof is required after fertilising and then generalising the goal in the step case).

9.3.2 Factorial

We now perform a similar investigation into defining a tail recursive version of a factorial function, where we would expect to have to prove non-linear arithmetic properties. For this set of examples, our proof automation was able to automate 7 of the 9 proof obligations that arose.
Properties of a tail recursive factorial function have been captured previously in ATS examples [Xi, 2010], where manual proofs are required. Xi comments that it is “really tedious to establish [the] properties”¹ needed for this in ATS. We use the following function as the naive definition of factorial:

    Fixpoint fact (n : nat) : nat :=
      match n with
      | O => 1
      | S p => S p * fact p
      end.

9.3.2.1 Using a Helper Function

As before, we start by defining a tail recursive version of fact with a helper function and an accumulator variable, as follows:

    Program Fixpoint fact_tail_aux (n acc : nat)
      : { r : nat | r = acc * (fact n) } :=
      match n with
      | O => acc
      | S p => fact_tail_aux p (acc * n)
      end.

    Program Definition fact_tail (n : nat) : { r : nat | r = fact n } :=
      fact_tail_aux n 1.

The recursive call proof obligation of fact_tail_aux simplifies to the following:

    acc * S p * fact p = acc * (fact p + p * fact p)

This goal is proven by first generalising the common subterm fact p and then performing an inductive proof on acc.

¹ Personal communication.

9.3.2.2 Using a Helper Function: Variant

For this example, we copy the previous program script and swap the arguments to the * operator in both the output type and the function body of fact_tail_aux.

The recursive call proof obligation from fact_tail_aux now simplifies to the following goal, which turns out to be more challenging than before:

    fact p * (acc + p * acc) = (fact p + p * fact p) * acc

The prover solves this goal by generalising the common subterm fact p and then performing an inductive proof on the fresh variable introduced by this step.

9.3.2.3 Using Fold

We now attempt a definition of a tail recursive factorial function that makes use of folding.
To do this, we use the following function from Coq’s standard library, which generates a sequence of numbers:

    Fixpoint seq (start len : nat) : list nat :=
      match len with
      | 0 => []
      | S p => start :: seq (S start) p
      end.

For example, seq 2 4 generates the list [2; 3; 4; 5]. A typical definition of factorial using fold is as follows, where mult is the function for which * is notation:

    Program Definition fact_tail (n : nat) : { r : nat | r = fact n } :=
      fold_left mult (seq 1 n) 1.

This function generates the following simplified proof obligation:

    fold_left mult (seq 1 n) 1 = fact n

Unfortunately, the prover fails to prove this goal, although it makes some progress towards a proof. Induction on n yields a trivial base case and, in the step case, rippling can fully ripple the RHS. Lemma calculation then conjectures the following theorem:

    ∀ n, fold_left mult (seq 2 n) 1 =
         fold_left mult (seq 1 n) 1 + n * fold_left mult (seq 1 n) 1

The prover is unable to prove this goal and we could see no obvious hints that would help.

9.3.2.4 Another Attempt Using Fold

We next tried reimplementing the helper function, here named fact_aux, using fold instead of fact_tail, to see how the prover copes. We predicted that this proof would be easier to automate. We defined fact_aux as follows:

    Program Definition fact_aux (n acc : nat) : { r : nat | r = acc * fact n } :=
      fold_left mult (seq 1 n) acc.

The proof obligation generated from this function is as follows:

    fold_left mult (seq 1 n) acc = acc * fact n

Unfortunately, this again turned out to be too difficult for the prover to discharge.
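The prover could not discharge the fold-based obligations, but a quick random test suggests that they, and the conjecture from lemma calculation, are true. The same test also shows how the mistaken output type described in §9.3.4 is refuted. The Python sketch below is our own illustrative transliteration, not the thesis's Coq-based testing tool. Python ints stand in for nat, functools.reduce stands in for fold_left, and all helper names are hypothetical. Passing these tests is evidence, not a proof.

```python
import random
from functools import reduce
from operator import mul

def fact(n):
    # fact O = 1; fact (S p) = S p * fact p
    return 1 if n == 0 else n * fact(n - 1)

def fact_tail_aux(n, acc):
    # Helper from 9.3.2.1 as a loop: (S p, acc) steps to (p, acc * S p)
    while n > 0:
        acc, n = acc * n, n - 1
    return acc

def seq(start, length):
    # Mirrors Coq's seq: `length` consecutive numbers beginning at `start`
    return list(range(start, start + length))

def fold_mult(xs, init):
    # fold_left mult xs init
    return reduce(mul, xs, init)

def counterexample(prop, trials=300, seed=0):
    """Return the first random (n, acc) falsifying prop, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        n, acc = rng.randint(0, 8), rng.randint(0, 8)
        if not prop(n, acc):
            return n, acc
    return None

properties = {
    # Output type of fact_tail_aux
    "helper": lambda n, acc: fact_tail_aux(n, acc) == acc * fact(n),
    # Obligation from 9.3.2.4 (fact_aux via fold)
    "fold": lambda n, acc: fold_mult(seq(1, n), acc) == acc * fact(n),
    # Conjecture produced by lemma calculation in 9.3.2.3 (acc is unused)
    "conjecture": lambda n, acc: (
        fold_mult(seq(2, n), 1)
        == fold_mult(seq(1, n), 1) + n * fold_mult(seq(1, n), 1)
    ),
    # Mistaken output type r = acc + fact n, as described in 9.3.4
    "wrong_spec": lambda n, acc: fact_tail_aux(n, acc) == acc + fact(n),
}

for name, prop in properties.items():
    print(name, counterexample(prop))
```

Only the wrong_spec entry should report a counterexample. The conjecture holds because fold_left mult (seq 2 n) 1 is (n + 1)!, which equals n! + n * n!.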
9.3.3 Inorder Tree Traversal

As a final example in this case study, we consider writing an optimised version of the following inorder traversal function for binary trees:

    Fixpoint inorder (a : btree A) : list A :=
      match a with
      | empty => []
      | node v l r => (inorder l) ++ [v] ++ (inorder r)
      end.

For this set of examples, our automation was able to discharge 4 of the 5 proof obligations that arose.

9.3.3.1 Using a Helper Function

As there are two recursive calls, there is no simple way to write this function using tail recursion. We can, however, settle for a definition in which the left subtree is traversed with tail recursion and an accumulator replaces the expensive ++ operator with ::. This definition, which uses a helper function, is as follows:

    Program Fixpoint inorder_aux (a : btree A) (acc : list A)
      : { r : list A | r = inorder a ++ acc } :=
      match a with
      | empty => acc
      | node v l r => inorder_aux l (v :: (inorder_aux r acc))
      end.

    Program Fixpoint inorder_tail (a : btree A)
      : { r : list A | r = inorder a } :=
      inorder_aux a [].

The base case proof obligation of inorder_aux is proven by reflexivity. The recursive call proof obligation simplifies to the following:

    inorder l ++ v :: inorder r ++ acc = (inorder l ++ v :: inorder r) ++ acc

The proof proceeds by generalising the common subterms inorder l and inorder r to produce the following:

    c1 ++ v :: (c2 ++ acc) = (c1 ++ v :: c2) ++ acc

This goal is then proven with a simple inductive proof. Finally, the proof obligation generated by inorder_tail is trivially automated.

9.3.3.2 Using Fold

We move on to defining a version of inorder_tail that uses fold. We implement such a function as follows, where fold here performs an inorder traversal of a tree:

    Program Fixpoint inorder_tail (a : btree A)
      : { r : list A | r = inorder a } :=
      fold (fun acc v => v :: acc) a [].
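The tree traversals can be exercised in the same testing style. In the Python sketch below, the tuple encoding of btree, the random tree generator and, in particular, the definition of fold are assumptions. The thesis does not show fold, so we use fold f (node v l r) acc = fold f l (f (fold f r acc) v). With f as fun acc v => v :: acc, this agrees with the step case computation shown in §9.3.3.3. Random testing supports, but does not prove, both the output type of inorder_aux and the fold-based equation fold f a acc = inorder a ++ acc.

```python
import random

# Assumed encoding of btree: None is `empty`, (v, l, r) is `node v l r`.
def inorder(t):
    if t is None:
        return []
    v, l, r = t
    return inorder(l) + [v] + inorder(r)

def inorder_aux(t, acc):
    # node v l r => inorder_aux l (v :: inorder_aux r acc)
    if t is None:
        return acc
    v, l, r = t
    return inorder_aux(l, [v] + inorder_aux(r, acc))

def fold(f, t, acc):
    # Assumed tree fold: the right subtree is folded first, so that
    # consing with f builds the inorder list.
    if t is None:
        return acc
    v, l, r = t
    return fold(f, l, f(fold(f, r, acc), v))

def cons(acc, v):
    # fun acc v => v :: acc
    return [v] + acc

def random_tree(rng, depth):
    if depth == 0 or rng.random() < 0.3:
        return None
    return (rng.randint(0, 9), random_tree(rng, depth - 1),
            random_tree(rng, depth - 1))

def find_counterexample(trials=300, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        t = random_tree(rng, 4)
        acc = [rng.randint(0, 9) for _ in range(rng.randint(0, 3))]
        if inorder_aux(t, acc) != inorder(t) + acc:
            return "inorder_aux", t, acc
        if fold(cons, t, acc) != inorder(t) + acc:
            return "fold", t, acc
    return None
```

Taking acc = [] in the second check covers the obligation fold (fun acc v => v :: acc) a [] = inorder a. This suggests, on the tested inputs, that the failure reported for that obligation is a limitation of proof search rather than a false goal.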
Notice that a λ term, fun acc v => v :: acc, features in this definition of inorder_tail. The proof obligation generated by this function is as follows:

    fold (fun acc v => v :: acc) a [] = inorder a

Unfortunately, the prover is unable to automate this proof. After induction on variable a, lemma calculation results in the following goal, which the prover is unable to solve:

    fold_right (fun acc v => v :: acc) b l =
      fold_right (fun acc v => v :: acc) b [] ++ l

9.3.3.3 Using Fold: Variant

This time we attempt to reimplement inorder_aux using a fold as follows:

    Program Fixpoint inorder_aux (a : btree A) (acc : list A)
      : { r : list A | r = inorder a ++ acc } :=
      fold (fun acc v => v :: acc) a acc.

The proof obligation generated by inorder_aux is as follows where, for brevity, we have replaced the λ term with f:

    fold f a acc = inorder a ++ acc

The prover succeeds in finding a proof. The proof begins by induction on the tree a, where the step case goal has multiple givens. The following shows the step case, where we display one annotated conclusion for each hypothesis (wave fronts are marked with ↑ and sinks with ⌊ ⌋):

    H1 : ∀ acc, fold f r acc = inorder r ++ acc
    H2 : ∀ acc, fold f l acc = inorder l ++ acc
    ============================
    for H1:  fold f (node v l r)↑ ⌊acc⌋ = inorder (node v l r)↑ ++ ⌊acc⌋
    for H2:  fold f (node v l r)↑ ⌊acc⌋ = inorder (node v l r)↑ ++ ⌊acc⌋

After computing with the fold function, the LHS is rippled out to the following:

    for H1:  fold f l (v :: fold f r ⌊acc⌋)↑ = inorder (node v l r)↑ ++ ⌊acc⌋
    for H2:  fold f l ⌊v :: fold f r acc⌋ = inorder (node v l r)↑ ++ ⌊acc⌋

Weak fertilisation proceeds by rewriting the LHS with H1 and then H2 in sequence. Lemma calculation is then used to finish the proof. Compared with the failed proof attempt in the previous section, weak fertilisation is possible this time because of the presence of the acc sink in the conclusion.

9.3.4 Error Feedback

We found the error feedback particularly useful for developing this style of example.
In particular, as these examples used strong specifications, we found the error messages produced easy to follow (we mentioned in §8.1.3 that weak specifications can make the error messages harder to understand). The following are some examples of the ways in which the error feedback helped:
• When giving the output type of the helper functions, it is relatively easy to treat the accumulator variable incorrectly. For example, we initially wrote r = acc + (fact n) instead of r = acc * (fact n) in the output type for fact_aux. Likewise, we accidentally wrote r = acc ++ inorder a instead of r = inorder a ++ acc in the output type for inorder_aux. In each case, we were alerted to the error, and examining the counterexample trace made the cause of the problem obvious.
• When defining fact, we accidentally made the base case return 0 instead of 1. This became obvious when examining the trace of a counterexample found when defining the first fact_aux function.

9.4 Case Study: Insertion Sort, Tree Sort and Quicksort

In this section, we use subset types to capture properties of programs that implement insertion sort, tree sort and quicksort. We explore what support can be given for capturing length and permutation properties. Note that, for ease of presentation, the implementations given are specialised to collections of natural numbers. We use the following comparison function in each sorting program:

    le_gt_dec : ∀ n m : nat, {n ≤ m} + {n > m}

9.4.1 Insertion Sort

The first sorting algorithm that we verify is insertion sort. Our automation managed to discharge all 6 of the proof obligations that arose for this example. The first property that we wish to capture is that the insertion sort function we implement returns a list of the expected length.
Such a program can be implemented in a straightforward way as follows, where we have used subset types to capture the length of the sorted output list:

    Program Fixpoint insert (x : nat) (a : list nat)
      : { r : list nat | length r = S (length a) } :=
      match a with
      | nil => [x]
      | h :: t => if le_gt_dec x h then x :: a else h :: (insert x t)
      end.

    Program Fixpoint insertion_sort (a : list nat)
      : { r : list nat | length r = length a } :=
      match a with
      | nil => nil
      | h :: t => insert h (insertion_sort t)
      end.

The insert function adds an item to a sorted list such that the resulting list is also sorted. The insertion_sort function uses insert to add each item from an unsorted list a, recursively, to an initially empty list, so that the resulting list is a sorted permutation of a.

9.4.1.1 Length Property

We now consider the proof obligations produced by the program above, ignoring the more trivial ones. The recursive call proof obligation produced by insert is as follows:

    insert_p : length insert_s = S (length t)
    ============================
    length (h :: insert_s)↑ = S (length (h :: t)↑)

The proof of this goal follows the recursive call proof pattern and is discharged automatically.

Notice that the insertion_sort function contains a call to insert, where insert returns a subset type. The recursive call proof obligation produced by insertion_sort, without destructuring the subset type terms, is as follows:

    length (proj1_sig (insert h (proj1_sig (insertion_sort t)))) = length (h :: t)

To prove this goal, the prover follows the recursive call pattern. This first involves destructuring only the recursive call term, producing the following goal:

    e : length x = length t
    ============================
    length (proj1_sig (insert h x))↑ = length (h :: t)↑
The prover then fully ripples out and weak fertilises the RHS to produce the following goal:

    length (proj1_sig (insert h x)) = S (length x)

The prover finishes the proof by destructuring the result of insert and directly applying the propositional term produced to prove the conclusion. Note that, if the prover had blindly destructured both of the subset type terms, the following goal would have been produced:

    insertion_sort_p : length insertion_sort_s = length t
    insert_p : length insert_s = S (length insertion_sort_s)
    ============================
    length insert_s = length (h :: t)

Notice that there are no embeddings in this goal, so a more ad hoc and less guided approach than rippling would have been needed to solve it.

9.4.1.2 Length Property: Variation

Next, we consider what happens when the insert function is changed to be simply typed, so that it no longer specifies the length of the list it returns. By making this change, we can see how the framework copes when the programmer decides to use functions with less informative types. From experience, we know that this can make the proofs involved more challenging.

This time, the recursive call proof obligation produced by insertion_sort contains only one subset type term (i.e. the call to insertion_sort). The recursive call pattern is followed where, after rippling out and weak fertilising, the following goal is produced:

    length (insert h insertion_sort_s) = S (length insertion_sort_s)

The prover discharges this goal by induction over insertion_sort_s. Notice that this goal encodes the information that we chose to specify by hand in the output type of insert in the previous section. We see here that the convenience of fewer annotations can result in more challenging proofs. In this case, the prover supports both representations.
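Both length specifications can be sanity-checked with a random test. This includes the goal length (insert h insertion_sort_s) = S (length insertion_sort_s) that arises in the simply typed variant. The Python version below is an illustrative transliteration, not code from the thesis. Python lists stand in for list nat, the comparison x <= h stands in for le_gt_dec x h, and the function names are ours.

```python
import random

def insert(x, a):
    # if le_gt_dec x h then x :: a else h :: insert x t
    if not a:
        return [x]
    h, t = a[0], a[1:]
    return [x] + a if x <= h else [h] + insert(x, t)

def insertion_sort(a):
    return [] if not a else insert(a[0], insertion_sort(a[1:]))

def find_counterexample(trials=300, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randint(0, 9) for _ in range(rng.randint(0, 8))]
        x = rng.randint(0, 9)
        s = insertion_sort(a)
        # Output type of insert: length r = S (length a)
        if len(insert(x, a)) != len(a) + 1:
            return "insert", x, a
        # Output type of insertion_sort: length r = length a
        if len(s) != len(a):
            return "insertion_sort", a
        # Goal left after weak fertilisation in the simply typed variant
        if len(insert(x, s)) != len(s) + 1:
            return "variant", x, s
    return None
```

The third check corresponds to the induction over insertion_sort_s described above. Because it is tested on an arbitrary sorted list s, it also reflects why generalising insertion_sort_s to a variable is sound.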
9.4.1.3 Permutation Property

We now consider capturing the property that insertion_sort returns a permutation of its input. To represent a permutation, we use the following function, which returns the number of elements in a list that have the same value as x:

    Fixpoint list_count (a : list nat) (x : nat) : nat :=
      match a with
      | nil => O
      | h :: t => if nat_eq_dec h x then S (list_count t x) else list_count t x
      end.

We use list_perm x y as shorthand for ∀ n, list_count x n = list_count y n, to represent that list x is a permutation of list y. We chose this representation as it was simple to define and generally useful for capturing other properties about lists.

To create our program, we simply copy the one from §9.4.1.1 and replace the length propositions with propositions concerning permutation properties as follows:

    Program Fixpoint insert (x : nat) (a : list nat)
      : { r : list nat | list_perm r (x :: a) } :=
      (* as before *)

    Program Fixpoint insertion_sort (a : list nat)
      : { r : list nat | list_perm r a } :=
      (* as before *)

The recursive call proof obligation generated by the insert function, which can be automatically discharged, has the following form:

    insert_p : ∀ n : nat, list_count insert_s n = list_count (x :: t) n
    ============================
    list_count (h :: insert_s)↑ ⌊n⌋ = list_count (x :: h :: t)↑ ⌊n⌋

As when the length property of this function was captured, this goal contains embeddings and the proof again follows the recursive call pattern. In this case, the goal features sinks, because list_perm includes a universal quantifier. Additionally, because list_count is defined with a conditional statement, a case split must be performed before weak fertilisation can occur.

The recursive call proof obligation produced by insertion_sort follows a similar course to before, using the recursive call pattern.
Again, the difference here is that a case split is required before weak fertilisation can occur.

9.4.1.4 Permutation Property: Variation

For this example, we modified our previous program so that insert is a simply typed function instead of one that returns a subset type. This produces the following program (notice that insertion_sort still returns a subset type):

    Fixpoint insert (x : nat) (a : list nat) : list nat :=
      (* as before *)

    Program Fixpoint insertion_sort (a : list nat)
      : { r : list nat | list_perm r a } :=
      (* as before *)

Again, the prover is able to automate all the proofs required. As with the previous program, the proof of the recursive call proof obligation generated by insertion_sort follows the recursive call pattern. Lemma calculation is required to finish this proof, and the following two lemmas are proven and cached in the process:

    1) list_count (insert n insertion_sort_s) n = S (list_count insertion_sort_s n)
    2) h ≠ n → list_count (insert h insertion_sort_s) n = list_count insertion_sort_s n

The second lemma is automatically identified as a right-to-left simplification rule (see §7.11). It was not immediately obvious to us that this was a useful simplification rule, but we agreed with the classification on inspection.

9.4.2 Tree Sort

We now look at implementing a program to sort lists using tree sort. To sort a list with this algorithm, items from an unsorted list are first inserted one by one into a binary tree. An inorder traversal of this tree then yields a sorted permutation of the original list. For this set of examples, our automation discharged 10 of the 12 proof obligations that arose. To implement tree sort, we use the simple type btree to represent binary trees (see §A.3).
The following gives a straightforward implementation of tree sort, where subset types are used to capture the length property of the resulting sorted list:

    Program Fixpoint insert (x : nat) (a : btree nat)
      : { r : btree nat | num_nodes r = num_nodes (node x a empty) } :=
      match a with
      | empty => node x empty empty
      | node y l r => if le_gt_dec x y then node y (insert x l) r
                      else node y l (insert x r)
      end.

    Program Fixpoint sorted_tree_of_list (a : list nat)
      : { r : btree nat | num_nodes r = length a } :=
      match a with
      | [] => empty
      | h :: t => insert h (sorted_tree_of_list t)
      end.

    Program Fixpoint tree_sort (a : list nat)
      : { r : list nat | length r = length a } :=
      inorder (sorted_tree_of_list a).

The insert function inserts an item into a sorted binary tree. Note that we do not consider balanced trees in this implementation. The sorted_tree_of_list function converts an unsorted list into a sorted binary tree. The tree_sort function first converts the unsorted input list into a sorted binary tree and then returns the inorder traversal of this tree to give the final sorted list. The simply typed functions inorder and num_nodes return the inorder traversal of a tree and the number of nodes in a tree, respectively.

9.4.2.1 Length Property

We begin by considering the proof obligations produced by the program above, where we have captured the length property of the final sorted list. Compared with the insertion sort examples in the previous section, the proofs involved this time naturally require reasoning about trees as well as lists.

For the insert function, there are two recursive call proof obligations to discharge. In each case, the proof is automated by an inductive proof of a simple linear arithmetic property after the use of the recursive call pattern.
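The tree sort program and its length specifications can be transliterated for testing in the same way. In the following Python sketch, the tuple encoding of btree and all function names are our assumptions. Besides the output types of insert and sorted_tree_of_list, it tests the relationship between length (inorder t) and num_nodes t on which the tree_sort obligation depends.

```python
import random

# Assumed encoding of btree: None is `empty`, (v, l, r) is `node v l r`.
def insert(x, t):
    # if le_gt_dec x y then node y (insert x l) r else node y l (insert x r)
    if t is None:
        return (x, None, None)
    y, l, r = t
    return (y, insert(x, l), r) if x <= y else (y, l, insert(x, r))

def sorted_tree_of_list(a):
    return None if not a else insert(a[0], sorted_tree_of_list(a[1:]))

def num_nodes(t):
    return 0 if t is None else 1 + num_nodes(t[1]) + num_nodes(t[2])

def inorder(t):
    return [] if t is None else inorder(t[1]) + [t[0]] + inorder(t[2])

def tree_sort(a):
    return inorder(sorted_tree_of_list(a))

def find_counterexample(trials=300, seed=0):
    """Random search for a violation of one of the length specifications."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randint(0, 9) for _ in range(rng.randint(0, 8))]
        x = rng.randint(0, 9)
        tree = sorted_tree_of_list(a)
        # insert: num_nodes r = num_nodes (node x a empty)
        if num_nodes(insert(x, tree)) != num_nodes((x, tree, None)):
            return "insert", x, a
        # sorted_tree_of_list: num_nodes r = length a
        if num_nodes(tree) != len(a):
            return "sorted_tree_of_list", a
        # Goal left for tree_sort after cross fertilisation
        if len(inorder(tree)) != num_nodes(tree):
            return "inorder", a
    return None
```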
For the sorted_tree_of_list function, the proof of the recursive call proof obligation involves using the propositional terms returned by both the recursive call and the call to insert. In this sense, this proof resembles the proof of the recursive call proof obligation for the insertion_sort function from §9.4.1.1. For this proof, the recursive call terms are destructured and rippling is used to weak fertilise the goal with the propositional terms generated. When the result from insert is then destructured, the goal has the following form:

    insert_p : num_nodes insert_s = num_nodes (node h x empty)
    ============================
    num_nodes insert_s = S (num_nodes x)

The proof is completed by first using the cross fertilise tactic to rewrite the conclusion with insert_p from left to right. Simplification, generalisation and induction are then used to finish the proof.

The tree_sort function produces one proof obligation. Here, the cross fertilise tactic is used to fertilise the conclusion with the propositional term returned by the sorted_tree_of_list function, resulting in the following goal:

    length (inorder sorted_tree_of_list_s) = num_nodes sorted_tree_of_list_s

As the inorder function is simply typed, induction is naturally needed to prove this goal. This goal is proven by induction over the variable sorted_tree_of_list_s.

9.4.2.2 Permutation Property

Reusing the previous program, we now capture the property that the output list from tree_sort is a permutation of the input list to the function. To do this, we reuse the list_count function and list_perm notation from §9.4.1.3. We introduce the notation btree_perm x y to denote that tree x is a permutation of tree y. This notation is shorthand for ∀ n, btree_count x n = btree_count y n, where btree_count x n returns the number of elements in tree x with the same value as n.
The btree_count function is defined as follows:

    Fixpoint btree_count (a : btree nat) (x : nat) : nat :=
      match a with
      | empty => O
      | node v l r =>
          let count_lr := btree_count l x + btree_count r x in
          if nat_eq_dec v x then (S count_lr) else count_lr
      end.

This function is more complex than list_count in that the members of both subtrees must be considered. We modify the output types of the tree sort implementation as follows:

    Program Fixpoint insert (x : nat) (a : btree nat)
      : { r : btree nat | btree_perm r (node x a empty) } :=
      (* as before *)

    Program Fixpoint sorted_tree_of_list (a : list nat)
      : { r : btree nat | ∀ n, btree_count r n = list_count a n } :=
      (* as before *)

    Program Fixpoint tree_sort (a : list nat)
      : { r : list nat | list_perm r a } :=
      (* as before *)

We now consider the proof obligations produced by this program. The proofs required for the sorted_tree_of_list and insert functions have a similar shape to the ones required in the previous section. This time, the proofs are more complex in that case splits are performed during rippling.

Unfortunately, the prover fails to automate the proof obligation generated by the tree_sort function.
After simplification, the proof obligation has the following form:

    list_count (inorder sorted_tree_of_list_s) n = btree_count sorted_tree_of_list_s n

After induction on variable sorted_tree_of_list_s, a case split is performed in the step case proof. Lemma calculation then results in the following goal, which the prover is unable to automate:

    list_count (inorder l ++ [n] ++ inorder r) n =
      S (list_count (inorder l) n + list_count (inorder r) n)

A productive step here is to generalise the common subterms inorder l and inorder r as follows:

    ∀ x y n, list_count (x ++ [n] ++ y) n = S (list_count x n + list_count y n)

The generalise tactic makes this step but, unfortunately, then overgeneralises the goal by generalising apart the occurrences of the variable n. Our testing tool detects the overgeneralisation, but this causes all the generalisation steps to be undone (see §7.5.6). As the alternative proof path is to attempt induction on the ungeneralised conjecture, the prover eventually fails. To work around this, after identifying from the proof search trace that this goal was not being generalised correctly, we provided the correct generalisation above as a hint. The prover successfully proved this lemma and then discharged the failing proof obligation by using the new lemma to trivially prove the problematic goal.

9.4.2.3 Permutation Property: Variation

The output type of the previous definition of sorted_tree_of_list checks that the output tree contains the same elements as the input list by making use of btree_count and list_count.
To experiment with different representation choices, we now consider the following alternative representation for the output type:

    Program Fixpoint sorted_tree_of_list (a : list nat)
      : { r : btree nat | list_perm (inorder r) a } :=
      (* as before *)

This time, the output type converts the output tree to a list using inorder and then checks that this list is a permutation of the input list.

Unfortunately, the prover is unable to automate the recursive call proof obligation generated by this function. Briefly, the proof attempt follows the recursive call pattern, where rippling produces two subgoals. The prover is then unable to automate either of these subgoals. We consider only the first subgoal here, which is as follows:

    insert_p : ∀ n : nat, btree_count insert_s n = btree_count (node n x empty) n
    ============================
    list_count (inorder insert_s) n = S (list_count (inorder x) n)

Rippling does not apply and the simplify tactic cannot do anything productive. Any successful inductive proof would likely require piecewise fertilisation [Armando et al., 1999], which the current system does not support.

To work around this, we considered how the goal above could be modified to allow fertilisation with insert_p. We reasoned that the following lemma, when used from left to right to rewrite the conclusion, would allow this:

    L : ∀ x n, list_count (inorder x) n = btree_count x n

We asked the system to prove L but the automation failed. Notice that L has the same form as the proof obligation that could not be fully automated in the previous section because of an overgeneralisation in the proof attempt. We thus provided the same generalisation hint as in the previous section (recall that we are not sharing cached lemmas between example programs), and this allowed L to be proven.
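A small test sketch can illustrate two points: the difference between the hinted generalisation of §9.4.2.2 and its overgeneralised form, and the claim that lemma L holds. In the Python below, list.count plays the role of list_count, and the tuple encoding of btree and all helper names are assumptions. We exhaustively enumerate short lists of 0s and 1s (a SmallCheck-style enumeration rather than the random generation our framework uses). This finds no counterexample to the hint, but immediately refutes the overgeneralised form in which the two occurrences of n are generalised apart. Lemma L is spot-checked on random trees.

```python
import itertools
import random

# Assumed encoding of btree: None is `empty`, (v, l, r) is `node v l r`.
def inorder(t):
    return [] if t is None else inorder(t[1]) + [t[0]] + inorder(t[2])

def btree_count(t, x):
    if t is None:
        return 0
    v, l, r = t
    count_lr = btree_count(l, x) + btree_count(r, x)
    return count_lr + 1 if v == x else count_lr

def random_tree(rng, depth):
    if depth == 0 or rng.random() < 0.3:
        return None
    return (rng.randint(0, 3), random_tree(rng, depth - 1),
            random_tree(rng, depth - 1))

def small_lists(values=(0, 1), max_len=2):
    # All lists over `values` of length at most max_len, shortest first
    for k in range(max_len + 1):
        for xs in itertools.product(values, repeat=k):
            yield list(xs)

def hint(x, y, n):
    # ∀ x y n, list_count (x ++ [n] ++ y) n = S (list_count x n + list_count y n)
    return (x + [n] + y).count(n) == 1 + x.count(n) + y.count(n)

def overgeneralised(x, y, m, n):
    # The same goal with the occurrences of n generalised apart into m and n
    return (x + [m] + y).count(n) == 1 + x.count(n) + y.count(n)

def first_counterexample_overgeneralised():
    for x in small_lists():
        for y in small_lists():
            for m, n in itertools.product((0, 1), repeat=2):
                if not overgeneralised(x, y, m, n):
                    return x, y, m, n
    return None

def lemma_L_holds(trials=200, seed=0):
    # L : ∀ x n, list_count (inorder x) n = btree_count x n
    rng = random.Random(seed)
    for _ in range(trials):
        t, n = random_tree(rng, 4), rng.randint(0, 3)
        if inorder(t).count(n) != btree_count(t, n):
            return False
    return True

print(first_counterexample_overgeneralised())  # ([], [], 0, 1)
```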
After L was added as a left to right simplification rule, the problematic goal above was successfully automated. The automation succeeded this time because the simplify tactic was able to rewrite the conclusion so that fertilisation could occur, and induction was then used to finish the proof. We note that this solution is not ideal, because converting uses of list_count to btree_count will not be desirable in every proof. Finally, unlike in §9.4.2.2, the proof obligation produced by tree_sort is trivially automated with the use of simplification.

9.4.3 Quicksort

As the final set of examples in this case study, we now consider sorting with the well-known and efficient quicksort algorithm. Our automation was only moderately successful for this example, discharging 5 of the 10 proof obligations that arose.

Quicksort is traditionally not written in a structurally recursive manner and can be problematic to define in languages where all function definitions are required to terminate. Sozeau’s Program tactic includes a feature that allows a non-structurally recursive function to be defined if a decreasing measure is provided (see §3.3.4). This feature can be used to give a natural definition of quicksort in Coq. For this set of examples, we have adapted an implementation of quicksort written by Sozeau² that uses this feature.

In his proof script, Sozeau captures the full specification of quicksort using subset types. Unfortunately, the approach used to do this is not well supported by our framework. Specifically, the propositional parts of the subset types are defined using inductive predicates. We noted earlier, in §7.14.3, that working with such a representation is not currently supported by our prototype.
We therefore chose to adapt this quicksort implementation, maintaining the same program structure (and making some minor cosmetic changes) but changing the way the propositional statements were expressed. Our adaptation, in which the length property of the resulting list has been captured, is as follows:

    Program Fixpoint split (pivot : nat) (a : list nat)
      : { (lower, higher) : list nat * list nat |
          length lower + length higher = length a } :=
      match a with
      | nil => ([], [])
      | h :: t =>
          match split pivot t with
          | (lower, higher) =>
              if le_gt_dec h pivot then (h :: lower, higher)
              else (lower, h :: higher)
          end
      end.

    Program Fixpoint quicksort (a : list nat) {measure (length a)}
      : { r : list nat | length r = length a } :=
      match a with
      | [] => []
      | h :: t =>
          match split h t with
          | (lower, higher) => quicksort lower ++ [h] ++ quicksort higher
          end
      end.

² Available at http://mattam.org/repos/coq/misc/sort/quicksort.v

The split function splits a list into two sublists based on a pivot item: the members of the first list are all less than or equal to the pivot, and the members of the second list are all greater than the pivot. The quicksort function uses the head value as the pivot, calls split to partition the rest of the list in two, and then applies quicksort recursively to the two sublists. The decreasing measure (specified with the measure keyword) is the length of the input list. The resulting obligation is that the list passed to each recursive call of quicksort is strictly shorter than the input to the parent call.

9.4.3.1 Length Property

We begin by considering the proof obligations generated by the above program. Firstly, the proof obligations from the split function are successfully automated; these require proofs of simple arithmetic properties.

The recursive call proof obligation generated by the quicksort function is interesting in that it contains two recursive call terms.
The usual strategy of following the recursive call pattern fails for this obligation, as no embeddings are found between the conclusion and the proof terms generated from these recursive calls. This is perhaps unsurprising when we remember that quicksort has not been defined by structural recursion. The proof thus proceeds by destructing all subset type terms to produce the following goal:

```coq
quicksort1_p : length quicksort1_s = length higher
quicksort2_p : length quicksort2_s = length lower
split_p : length lower + length higher = length t
============================
length (quicksort2_s ++ [h] ++ quicksort1_s) = length (h :: t)
```

As there are no embeddable assumptions to ripple with here, the simplify tactic proceeds by simplifying the RHS of the conclusion and then rewriting it with the split_p, quicksort1_p and quicksort2_p equations from right to left in sequence, giving the following:

```coq
length (quicksort2_s ++ h :: quicksort1_s) = S (length quicksort2_s + length quicksort1_s)
```

This goal is then discharged automatically with a simple inductive proof over the variable quicksort2_s.

We now consider the proof required to show that quicksort always terminates. We must prove that the supplied measure decreases for both of the recursive calls to quicksort. The first call to quicksort requires a proof of the following:

```coq
length lower < S (length lower + length higher)
```

The prover is unable to solve this goal: it attempts a proof by induction but cannot reason about the inductive predicate <. The second measure proof obligation has a similar shape. Note that we can work around these proof automation failures by manually calling Coq’s Presburger arithmetic procedure.

9.4.3.2 Permutation Property

We now capture the property that the result of the quicksort function is a permutation of its input.
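Before looking at how the output types change, the length and permutation properties can be illustrated on a small executable model. The following is a minimal Python sketch of our own, not part of the Coq development: split and quicksort mirror the definitions above, and list_perm is assumed to mean "equal counts for every element", which matches the form the permutation obligations take later in this section. Random tests check both specifications of split and quicksort, and that each recursive call receives a strictly shorter list (the termination measure).

```python
import random


def split(pivot, a):
    """Mirror of the Coq split: elements <= pivot go to lower, others to higher."""
    if not a:
        return [], []
    h, t = a[0], a[1:]
    lower, higher = split(pivot, t)
    if h <= pivot:  # the le_gt_dec h pivot branch where h <= pivot
        return [h] + lower, higher
    return lower, [h] + higher


def quicksort(a, calls=None):
    """Mirror of the Coq quicksort; optionally records (input length, argument length)."""
    if not a:
        return []
    h, t = a[0], a[1:]
    lower, higher = split(h, t)
    if calls is not None:
        calls.append((len(a), len(lower)))
        calls.append((len(a), len(higher)))
    return quicksort(lower, calls) + [h] + quicksort(higher, calls)


def list_count(xs, n):
    """Number of occurrences of n in xs."""
    return sum(1 for x in xs if x == n)


def list_perm(xs, ys):
    """Assumed meaning of list_perm: equal counts for every element."""
    return all(list_count(xs, n) == list_count(ys, n) for n in set(xs) | set(ys))


def check(trials=500, seed=0):
    """Return a failing input list, or None if every random test passes."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randint(0, 9) for _ in range(rng.randint(0, 12))]
        lower, higher = split(rng.randint(0, 9), a)
        calls = []
        r = quicksort(a, calls)
        ok = (
            len(lower) + len(higher) == len(a)        # split, length property
            and list_perm(lower + higher, a)          # split, permutation property
            and len(r) == len(a)                      # quicksort, length property
            and list_perm(r, a)                       # quicksort, permutation property
            and all(arg < inp for inp, arg in calls)  # measure decreases on each call
            and r == sorted(a)                        # sanity check: output is sorted
        )
        if not ok:
            return a
    return None


print(check())  # None
```

Passing tests are only evidence: the point of the Coq development is that these properties are established by proof, and the model makes no claim about how hard those proofs are.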
We capture this property by changing the output types of the previous program to the following:

```coq
Program Fixpoint split (pivot : nat) (a : list nat)
  : { (lower, higher) : list nat * list nat | list_perm (lower ++ higher) a } :=
  (* as before *)

Program Fixpoint quicksort (a : list nat) {measure (length a)}
  : { r : list nat | list_perm r a } :=
  (* as before *)
```

The proof obligations generated by the split function are both successfully automated. Notably, when discharging the proof obligation generated by the second branch of the if expression in this function, the following lemma is cached:

```coq
count_simp : forall h x y n,
  h <> n -> list_count (x ++ h :: y) n = list_count (x ++ y) n
```

This lemma was then automatically added to the simplification lemma database as a left-to-right simplification rule, because the RHS of the equation embeds into the LHS (see §7.11). Note that, to produce this reusable lemma, delayed generalisation (see §7.10) was used to remove irrelevant assumptions, such as pivot < h, from the original proof found for this lemma. This simplification rule becomes relevant for discharging the next proof obligation.

The recursive call proof obligation for the quicksort function has a similar shape to the one seen in the previous section. This time, however, it cannot be automated without help. After destructuring the subset type terms, the proof obligation has the following form:

```coq
quicksort1_p : forall n, list_count quicksort1_s n = list_count higher n
quicksort2_p : forall n, list_count quicksort2_s n = list_count lower n
split_p : forall n, list_count (lower ++ higher) n = list_count t n
============================
list_count (quicksort2_s ++ [h] ++ quicksort1_s) n = list_count (h :: t) n
```

Again, there are no embeddings to ripple with.
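The cached lemma count_simp is a plain conditional equation over list_count, so it can be sanity-checked by random testing. The following is a minimal Python sketch of our own; it assumes list_count counts the occurrences of n in a list, which is consistent with how it is used in this section, and the helper names are hypothetical. It tests count_simp under its side condition h ≠ n, and shows that dropping the side condition immediately produces counterexamples.

```python
import random


def list_count(xs, n):
    """Number of occurrences of n in xs."""
    return sum(1 for x in xs if x == n)


def count_simp_holds(h, x, y, n):
    """count_simp: h <> n -> list_count (x ++ h :: y) n = list_count (x ++ y) n."""
    if h == n:  # side condition fails, so the implication holds vacuously
        return True
    return list_count(x + [h] + y, n) == list_count(x + y, n)


def unconditional_holds(h, x, y, n):
    """The same equation with the side condition h <> n dropped."""
    return list_count(x + [h] + y, n) == list_count(x + y, n)


def find_counterexample(prop, trials=1000, seed=0):
    """Return the first failing (h, x, y, n), or None if all random tests pass."""
    rng = random.Random(seed)
    for _ in range(trials):
        h, n = rng.randint(0, 3), rng.randint(0, 3)
        x = [rng.randint(0, 3) for _ in range(rng.randint(0, 4))]
        y = [rng.randint(0, 3) for _ in range(rng.randint(0, 4))]
        if not prop(h, x, y, n):
            return h, x, y, n
    return None


print(find_counterexample(count_simp_holds))             # None
print(find_counterexample(unconditional_holds) is None)  # False
```

Any counterexample to the unconditional version necessarily has h = n, which is why the side condition, here discharged by the assumption h ≠ n, matters when count_simp is used as a rewrite rule.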
The simplify tactic proceeds by simplifying the conclusion and then performing a case split on the if construct produced on the RHS, giving two subgoals. In both subgoals, the split_p equation is used to rewrite the RHS of the conclusion from right to left. Unlike last time, quicksort1_p and quicksort2_p cannot yet be used to rewrite the conclusion. Somewhat surprisingly, however, these two terms now embed into the conclusion. At this stage in the proof attempt, the second subgoal has the following form:

```
quicksort1_p : forall n, list_count quicksort1_s n = list_count higher n
quicksort2_p : forall n, list_count quicksort2_s n = list_count lower n
c : h <> n
============================
list_count (quicksort2_s ++ h :: quicksort1_s)↑ ⌊n⌋ = list_count (lower ++ higher)↑ ⌊n⌋
```

Notice that the LHS of the conclusion matches the LHS of the cached simplification rule count_simp found at the start of this section, where assumption c is a proof of the necessary side-condition. The simplify tactic continues its work by using count_simp to eliminate the h :: term from the conclusion. Unfortunately, instead of guiding the proof with rippling, the prover then tries to perform an inductive proof and fails. This happens because the prover never expects to see any terms that embed into the conclusion between its simplification and induction steps (see §7.1).

Noticing from the proof trace that the propositional terms from the recursive calls were not being used, we attempted a manual proof. We invoked the simplify tactic and, upon noticing the embeddings in each subgoal, called the rippling tactic manually. However, rippling was immediately blocked.
From the rippling annotations, it was easy to see that the following rule would unblock this rippling proof:

```coq
forall x y n, list_count (x ++ y) n = list_count x n + list_count y n
```

After being supplied with the above statement as a hint, the prover was able to use it to finish the proof automatically via rippling.

A Challenging Termination Proof

When capturing only the length property of the quicksort program in §9.4.3.1, the termination proof obligation could not be automated by the prover, but it was possible to work around this by simply invoking Coq’s Presburger arithmetic procedure manually. For this program, the required proof is more challenging. For the first recursive call to quicksort, we are required to prove the following goal:

```coq
split_p : list_perm (lower ++ higher) t
============================
length lower < S (length t)
```

The prover is unable to make any useful progress on this goal, so we have to resort to a manual proof. Our approach was to first prove the following lemmas:

```coq
perm_length : forall x y, list_perm x y -> length x = length y
length_app : forall x y, length (x ++ y) = length x + length y
```

We can then manually apply perm_length to assumption split_p to produce a proof of length (lower ++ higher) = length t. After rewriting this new assumption with the lemma length_app from left to right, giving length lower + length higher = length t, the goal can be discharged by invoking Coq’s Presburger arithmetic procedure. The prover can help with this manual proof in that it can automatically prove length_app for us when asked to do so. However, the prover cannot automate the proof of perm_length. The second recursive call to quicksort requires a similar termination proof.

9.4.4 Error Feedback

We again found the error feedback helpful for quickly identifying errors when developing this set of examples, but did experience some problems.
As the testing tool cannot test goals whose assumptions contain universal quantifiers, error feedback could not be given for most of the examples where we captured permutation properties. Specifically, this was the case when the tree_perm and list_perm notations were used. For instance, when the prover failed to prove the goal from §9.4.3.2, we were less sure the goal was provable because testing was not available. However, as the permutation property examples were created by modifying the output types of previous examples (where the length property was captured first), the lack of error feedback for the former examples was less of a problem in practice. When capturing the length properties, the feedback was generally useful in alerting us to problems, and the error messages produced offered some help in fixing the errors. As we noted in §8.1.3, some thought is needed when interpreting the error messages when weak specifications are being used, which is the case for the length property examples.

9.5 Case Study: Binary Adder

In this case study, we port a program written in Idris to Coq to see what support can be provided. This Idris program³ makes use of inductive families to verify that a binary adder performs as expected [Brady, 2008]. We chose this example because the program makes interesting use of inductive families, non-linear arithmetic properties are involved, and the comments in Brady’s program script imply that the proofs required to define the program were tedious to write. Note that, due to some bugs we encountered in the Program tactic, some of the Russell functions in this section had to be written as regular Coq functions. Our automation was able to discharge all but one of the 20 proof obligations that arose in this case study.

9.5.1 Inductive Families Representation

We start by introducing all the data types that will be needed in the main program.
The following type represents a binary bit indexed by its natural number representation:

³ Available at http://www.cs.st-andrews.ac.uk/~eb/drafts/binary.idr

```coq
Inductive Bit : nat -> Set :=
| bit0 : Bit 0
| bit1 : Bit 1.
```

For example, we know from the type index that bit0 represents the nat value 0. The following type, indexed in the same way, represents a pair of bits, where the leftmost bit is taken as the most significant bit:

```coq
Inductive BitPair : nat -> Set :=
| bitPair : forall c v, Bit c -> Bit v -> BitPair (v + 2 * c).
```

For example, bitPair bit1 bit0 has type BitPair (0 + 2 * 1). The type can be interpreted as "a bit pair whose decimal value is 2". The following type represents a binary number composed of Bit terms, where the type is indexed by its length as well as its natural number representation:

```coq
Inductive Number : nat -> nat -> Set :=
| none : Number 0 0
| bit : forall b n val, Bit b -> Number n val -> Number (S n) ((2 ^ n) * b + val).
```

For example, the type Number 8 32 represents "a binary number composed of 8 bits that has the decimal value 32". Similarly indexed, the following type represents a binary number coupled with a carry bit:

```coq
Inductive NumCarry : nat -> nat -> Set :=
| numCarry : forall c n val, Bit c -> Number n val -> NumCarry n ((2 ^ n) * c + val).
```

We now define several utility functions before defining the binary adder function. The first function adds a pair of bits x to the leftmost position of a binary number num:

```coq
Program Fixpoint msPair (b n val : nat) (x : BitPair b) (num : Number n val)
  : NumCarry (S n) ((2 ^ n) * b + val) :=
  match x with
  | bitPair c v => numCarry c (bit v num)
  end.
```

We must discharge a proof obligation to show that the binary result has the expected natural number representation. After simplification by computation, the conclusion of the generated proof obligation is as follows:
```coq
(2 ^ n + (2 ^ n + 0)) * s + (2 ^ n * t + val) = 2 ^ n * (t + (s + (s + 0))) + val
```

The prover is able to discharge this proof obligation automatically. In the proof, the simplify tactic simplifies the goal using the rule ∀ x, x + 0 = x. This rule was found during lemma discovery and automatically added as a simplification rule. The 2^n term is then identified as a common subterm and generalised to produce the following:

```coq
(c + c) * s + (c * t + val) = c * (t + (s + s)) + val
```

The prover then automates this proof via induction on c, with lemma calculation being needed several times. This proof is challenging in that arithmetic lemmas found during lemma discovery are required to find a proof. The initial steps of Brady’s handwritten proof for the above goal similarly involve making the same simplifications and generalising the 2^n term. However, instead of induction, Brady uses commutativity, associativity and distributivity theorems about + and * to rewrite the goal and finish the proof.

The next function we need to define sums three bits labelled x, y and z, returning a pair of bits:

```coq
Program Definition addBit (r l c : nat) (x : Bit c) (y : Bit l) (z : Bit r)
  : BitPair (c + (l + r)) :=
  match x, y, z with
  | bit0, bit0, bit0 => bitPair bit0 bit0
  | bit0, bit0, bit1 => bitPair bit0 bit1
  | bit0, bit1, bit0 => bitPair bit0 bit1
  | bit0, bit1, bit1 => bitPair bit1 bit0
  | bit1, bit0, bit0 => bitPair bit0 bit1
  | bit1, bit0, bit1 => bitPair bit1 bit0
  | bit1, bit1, bit0 => bitPair bit1 bit0
  | bit1, bit1, bit1 => bitPair bit1 bit1
  end.
```

This function is simply implemented as a lookup table and generates no proof obligations.
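Because every row of this table fixes the type indices to concrete values, Coq can check each case by computation alone. The same exhaustive check can be mirrored outside Coq. In the following minimal Python sketch (our own; the names ADD_BIT, bit_pair_value and check_add_bit are hypothetical), a BitPair is modelled as a (high, low) tuple whose value is low + 2 * high, following the index v + 2 * c of the bitPair constructor, and every row is compared against the index c + (l + r).

```python
from itertools import product

# (x, y, z) -> (high, low), copied row by row from the addBit table.
ADD_BIT = {
    (0, 0, 0): (0, 0),
    (0, 0, 1): (0, 1),
    (0, 1, 0): (0, 1),
    (0, 1, 1): (1, 0),
    (1, 0, 0): (0, 1),
    (1, 0, 1): (1, 0),
    (1, 1, 0): (1, 0),
    (1, 1, 1): (1, 1),
}


def bit_pair_value(pair):
    """Value of bitPair c v, i.e. v + 2 * c."""
    high, low = pair
    return low + 2 * high


def check_add_bit():
    """Return the rows whose BitPair value differs from c + (l + r)."""
    bad = []
    for x, y, z in product((0, 1), repeat=3):
        if bit_pair_value(ADD_BIT[(x, y, z)]) != x + (y + z):
            bad.append((x, y, z))
    return bad


print(check_add_bit())  # []
```

An empty list means every one of the eight rows agrees with its index, which is exactly what Coq verifies when it type checks the definition.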
The next function adds the two bits x and y to a binary number with a carry bit nc:

```coq
Program Fixpoint addNumberAux (l r n val : nat) (x : Bit l) (y : Bit r) (nc : NumCarry n val)
  : NumCarry (S n) ((2 ^ n) * (l + r) + val) :=
  match nc with
  | numCarry c num => msPair (addBit x y c) num
  end.
```

This function generates the following proof obligation:

```coq
2 ^ n * (l + (r + c)) + val = 2 ^ n * (l + r) + (2 ^ n * c + val)
```

The proof found by the prover is similar to the proof required for the msPair function: the 2^n term is first generalised and induction is used to finish the proof. Brady’s proof again involves the same generalisation and uses rewriting instead of induction.

We now consider the function that sums two binary numbers. This function is defined to only accept two binary numbers that have the same length, where summing is performed recursively by adding together the leftmost bits of the two numbers:

```coq
Program Fixpoint addNumber (n l r c : nat) (x : Number n l) (y : Number n r) (b : Bit c)
  : NumCarry n (c + (l + r)) :=
  match x, y with
  | bit _ _, none => !
  | none, bit _ _ => !
  | none, none => numCarry b none
  | bit b1 num1, bit b2 num2 => addNumberAux b1 b2 (addNumber num1 num2 b)
  end.
```

The first two match clauses are marked as impossible cases, as the numbers being added must be the same length. All the proof obligations generated by this function were discharged automatically by our top-level tactic. The proof obligations generated by the first three match clauses are trivially automated. The final match clause produces the following proof obligation, where some simplification has already been performed:

```coq
2 ^ n * (x + y) + (c + (l + r)) = c + (2 ^ n * x + l + (2 ^ n * y + r))
```

Again, the prover manages to discharge this goal by generalising the common subterm 2^n and performing an inductive proof. The binary adder program has now been defined.
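To see the indices at work, the following minimal Python sketch (our own model, not part of the Coq or Idris developments; all names are hypothetical) represents a Number as a list of bits, most significant first, and mirrors addNumber, addNumberAux, msPair and addBit. It randomly checks that the resulting carry and bits satisfy the NumCarry index c + (l + r), and that the three arithmetic obligations above hold once 2^n is generalised to an arbitrary p (named p rather than c here to avoid a clash with the carry). Random testing of this kind gives evidence, not a proof.

```python
import random


def value(bits):
    """Value index of a Number: bit b in front of a rest of length n adds 2 ** n * b."""
    if not bits:
        return 0
    return 2 ** (len(bits) - 1) * bits[0] + value(bits[1:])


def add_bit(x, y, z):
    """Sum three bits, returning a (high, low) pair as addBit does."""
    total = x + y + z
    return total // 2, total % 2


def add_number(xs, ys, carry):
    """Mirror of addNumber on equal-length bit lists; returns (carry_out, bits)."""
    if len(xs) != len(ys):
        raise ValueError("impossible case: numbers of different lengths")
    if not xs:
        return carry, []                         # numCarry b none
    c, rest = add_number(xs[1:], ys[1:], carry)  # recursive call on the tails
    high, low = add_bit(xs[0], ys[0], c)         # addNumberAux: addBit, then msPair
    return high, [low] + rest


def num_carry_value(carry_out, bits):
    """Value index of numCarry: 2 ** n * carry_out + value(bits)."""
    return 2 ** len(bits) * carry_out + value(bits)


def check_adder(trials=1000, max_len=8, seed=0):
    """Return a counterexample (xs, ys, carry) to the NumCarry index, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(0, max_len)
        xs = [rng.randint(0, 1) for _ in range(n)]
        ys = [rng.randint(0, 1) for _ in range(n)]
        carry = rng.randint(0, 1)
        if num_carry_value(*add_number(xs, ys, carry)) != carry + (value(xs) + value(ys)):
            return xs, ys, carry
    return None


def check_generalised(trials=1000, seed=1):
    """Test the three arithmetic obligations with 2 ** n generalised to p."""
    rng = random.Random(seed)
    for _ in range(trials):
        p, s, t, val, l, r, c, x, y = (rng.randint(0, 30) for _ in range(9))
        goals = (
            (p + p) * s + (p * t + val) == p * (t + (s + s)) + val,        # msPair
            p * (l + (r + c)) + val == p * (l + r) + (p * c + val),        # addNumberAux
            p * (x + y) + (c + (l + r)) == c + (p * x + l + (p * y + r)),  # addNumber
        )
        if not all(goals):
            return p, s, t, val, l, r, c, x, y
    return None


print(check_adder(), check_generalised())  # None None
```

The generalised forms are what make the obligations amenable to induction: once 2^n is replaced by a fresh variable, each goal is an ordinary polynomial identity.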
Brady’s proof for the final addNumber goal above involves the same generalisation step and again uses rewriting instead of induction.

9.5.2 Inductive Families Representation: Variation

To see how robust our prover is to simple representation changes, we modified the type definitions used for the above program by arbitrarily swapping the arguments to * in the definition of BitPair and the arguments to + in the definition of NumCarry. The prover again successfully automated all the proofs needed. Had we defined this program with a manual proof, as in the Idris script, such a representation change would have required the manual proof to be updated as well.

9.5.3 Subset Types Representation

To test an alternative representation, we modified the binary adder program from the previous section to capture properties using subset types instead of inductive families. We were interested in seeing whether our framework was able to support both representations. All but one of the proof obligations produced by our subset type version were successfully automated. We now briefly describe this new version of the previous program.

This time, we represent binary numbers using only simply typed inductive types. For example, reusing the same type names from before, we can define a binary number as follows:

```coq
Inductive Bit : Set :=
| bit0
| bit1.

Inductive Number : Set :=
| none : Number
| bit : Bit -> Number -> Number.
```

To capture properties of binary numbers with subset types, we make use of several functions that convert the binary number types above to their nat representation:

```coq
Definition nat_of_bit (b : Bit) : nat :=
  match b with
  | bit0 => 0
  | bit1 => 1
  end.

Fixpoint num_length (n : Number) : nat :=
  match n with
  | none => 0
  | bit b m => S (num_length m)
  end.
```
```coq
Fixpoint nat_of_num (n : Number) : nat :=
  match n with
  | none => 0
  | bit bit0 m => nat_of_num m
  | bit bit1 m => 2 ^ (num_length m) + nat_of_num m
  end.
```

For example, the nat_of_num function above converts a Number term into a nat term. Given a suitable implementation of addNumberAux, the following function gives a definition of addNumber, where the same properties as before are captured using subset types:

```coq
Program Fixpoint addNumber (x : Number) (y : Number | num_length y = num_length x)
  (carry : Bit)
  : { r : NumCarry | nat_of_nc r = nat_of_bit carry + (nat_of_num x + nat_of_num y)
                     /\ num_length (num_of_nc r) = num_length x } :=
  match x, y with
  | none, none => numCarry carry none
  | bit b1 t1, none => !
  | none, bit b1 t2 => !
  | bit b1 num1, bit b2 num2 => addNumberAux b1 b2 (addNumber num1 num2 carry)
  end.
```

We found the use of subset types here cumbersome, as the output types were verbose and complex. Fortunately, our testing tool alerted us to mistakes we made, and the prover was able to automate all but one of the needed proofs.

The prover was unable to automate the recursive call proof obligation generated by the addNumber function. The conclusion of this goal after destructing the result of the recursive call is as follows (the rippling annotations indicate the differences between the conclusion and the recursive call result):

```
nat_of_nc (proj1_sig (addNumberAux b1 b2 addNumber_s))↑
  = nat_of_bit carry + (nat_of_num (bit b1 num1)↑ + nat_of_num (bit b2 num2)↑)
/\ num_length (num_of_nc (proj1_sig (addNumberAux b1 b2 addNumber_s))↑)
  = num_length (bit b1 num1)↑
```

As the conclusion is not an equation, the rippling tactic does not have the option of weak fertilising the goal. The rippling tactic fails to strong fertilise, and the proof attempt fails.
However, we can help the prover succeed by destructuring the conjunction in the given, splitting the conjunction in the goal and then invoking the rippling tactic on the two subgoals produced. Rippling succeeds because it is able to weak fertilise in both subgoals, as the given is an equation in each case. Clearly, it would be desirable to adapt the rippling tactic to perform these manual steps so that givens that contain conjunctions do not block rippling proofs.

9.5.4 Error Feedback

We found the error feedback helpful for the binary adder examples, but with some caveats:

• We made many errors when developing the subset type version of the binary adder, as we found the output types tricky to write due to the number of terms they included. The testing tool helped greatly in that it automatically told us that an error had been made. Unfortunately, as the proof obligations generated in this case study tended to contain many assumptions and a complex conclusion, the counterexample descriptions were less useful due to their verbosity. We thus found it easier to inspect the program script for errors once we were told that an error was present. Additionally, when we made an error writing the output type of the addNumber function, the testing tool initially did not identify the unprovable recursive call proof obligation that arose. In this case, the testing tool warned (see §8.3.2) that it could only generate test data satisfying the side-conditions in this proof obligation approximately 1% of the time. On noticing this percentage, we manually ran the testing tool two more times before it found a counterexample. Test data was difficult to generate in this case because the tool lacks support for custom generators and the strong specification of the program being tested tightly constrained which variable instantiations were allowed.
• Our testing tool is unable to generate terms that have dependent types, such as the indexed Number type, when testing goals. So that we would get error feedback while developing the binary adder program that uses such types, whenever the automation failed we checked for an error by manually calling the testing tool after simplifying top-level goals and discarding all assumptions. As each of the nontrivial goals simplified to an arithmetic equation that was solvable without using any of the goal assumptions, this enabled us to get feedback on when we had made an error. However, as with the binary adder program written with subset types, we found the indication that an error had been made useful but tended not to inspect the error messages closely.

9.6 Results from Case Studies

The table in Appendix B presents a summary of how the proof automation performed in the case studies described in the previous sections. We ignore what we have labelled "trivial" proof obligations in the table of results. A proof obligation is labelled as trivial if it can be proven using a propositional logic decision procedure, or by reflexivity, after destructuring all subset type terms and substituting using all assumptions of the form x = ... for some variable x. Over all of the case studies, 67 nontrivial proof obligations were generated. Of these, 84% were successfully proven, showing that our framework offers a high degree of automation. The mean time spent on proof search was 1.60 seconds, which we consider satisfactory performance for the prover to be a practical tool.

9.7 Lemma Caching Evaluation

In this section, we evaluate the impact that the lemma caching feature of our prover (see §7.2) had when automating the case study proof obligations. This includes an examination of the utility of the lemma discovery feature (see §9.7.3) of our system.
9.7.1 Experimental Setup

In this experiment, we consider the following three configurations of our prover when run against the case studies described at the start of this chapter:

DiscoverOnly: The lemmas found by the lemma discovery tool are available for use by the prover. Lemmas are not cached after proof attempts.

CacheOnly: The lemmas found by the lemma discovery tool are not available for use by the prover. However, lemmas are cached after proof attempts and these lemmas can be used. The lemma cache is not cleared after each verification task (the tasks are attempted in the order presented in this chapter).

BasicDefs: The prover only uses basic definitions during proofs, i.e. it is not allowed to use discovered or cached lemmas.

9.7.2 Results

The prover was able to automate 47 (70%) of the goals under the BasicDefs configuration. Under both the DiscoverOnly and the CacheOnly configurations, the prover was able to automate 56 (84%) of the goals. The DiscoverOnly and CacheOnly configurations succeeded on the same goals, and both automated all the goals proven under the BasicDefs configuration. See Appendix B for detailed results showing which theorems could be automated by our prover under each configuration.

9.7.3 Analysis

The results show that both the lemma discovery tool and the lemma caching feature increase the proof coverage of our system. We first consider the behaviour of the lemma discovery tool. For these case studies, the lemma discovery tool discovered lemmas concerning the +, * and ++ operators.
The tool conjectured and proved the following lemmas about these operators in 1.7 seconds:

```
(x ++ y) ++ z = x ++ y ++ z
x * y * z = x * (y * z)
x * y = y * x
x + y + z = x + (y + z)
x + y = y + x
```

By comparing the results of the DiscoverOnly and BasicDefs configurations, it can be seen that the lemmas proven during lemma discovery were required to solve 1 of the goals that arose when verifying the tail recursive factorial program (see §9.3.2) and 8 of the goals that arose when verifying the binary adder programs (see §9.5). Each of these goals was arithmetic in nature, and each proof involved induction and rippling. In each case, lemmas found during lemma discovery were required by rippling during the proof attempt to allow fertilisation to take place. For example, our prover requires more than basic definitions to discharge the following goal that arose from the tail recursive factorial case study:

```coq
n * (acc + p * acc) = (n + p * n) * acc
```

For the CacheOnly configuration, the results show that caching lemmas between proof attempts improved the proof coverage of the prover compared to when no lemmas were cached. For example, the CacheOnly configuration was able to automate the example goal above by using arithmetic lemmas that were cached before this goal was attempted.

It is interesting that, despite the different approaches used, CacheOnly and DiscoverOnly succeeded on the same goals. We would not expect this result in general, and further evaluation would be required to see how commonly it occurs. For instance, the order in which the goals are attempted will affect the results of the CacheOnly configuration. Specifically, if the goal above from the tail recursive factorial case study had been attempted before any others, the proof attempt would have failed, as no lemmas would have been cached at that point.
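The discovered lemmas and the factorial goal quoted in this section are ordinary equations over natural numbers and lists, so they can at least be sanity-checked by random testing, in the spirit of the QuickCheck-like tool used for error feedback. The following Python sketch is our own illustration, not the lemma discovery tool itself, and all names in it are hypothetical; Python's list + plays the role of Coq's ++. A deliberately false conjecture is included for contrast. Passing tests only give evidence; the prover still has to find proofs, which for the factorial goal depend on lemmas like these.

```python
import random


def rand_nat(rng, bound=20):
    return rng.randint(0, bound)


def rand_list(rng, max_len=5):
    return [rand_nat(rng) for _ in range(rng.randint(0, max_len))]


# Each property draws its own random arguments and reports whether the
# equation holds for them.
def app_assoc(rng):
    x, y, z = rand_list(rng), rand_list(rng), rand_list(rng)
    return (x + y) + z == x + (y + z)


def mult_assoc(rng):
    x, y, z = rand_nat(rng), rand_nat(rng), rand_nat(rng)
    return (x * y) * z == x * (y * z)


def mult_comm(rng):
    x, y = rand_nat(rng), rand_nat(rng)
    return x * y == y * x


def plus_assoc(rng):
    x, y, z = rand_nat(rng), rand_nat(rng), rand_nat(rng)
    return (x + y) + z == x + (y + z)


def plus_comm(rng):
    x, y = rand_nat(rng), rand_nat(rng)
    return x + y == y + x


def fact_goal(rng):
    # The goal from the tail recursive factorial case study.
    n, acc, p = rand_nat(rng), rand_nat(rng), rand_nat(rng)
    return n * (acc + p * acc) == (n + p * n) * acc


def app_comm(rng):
    # A deliberately false conjecture, for contrast.
    x, y = rand_list(rng), rand_list(rng)
    return x + y == y + x


def quick_check(prop, trials=500, seed=0):
    """True if prop holds on every one of the random trials."""
    rng = random.Random(seed)
    return all(prop(rng) for _ in range(trials))


results = {prop.__name__: quick_check(prop)
           for prop in (app_assoc, mult_assoc, mult_comm, plus_assoc, plus_comm, fact_goal)}
print(results)                # every value is True
print(quick_check(app_comm))  # False
```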
One benefit of the lemma discovery tool is that the proof coverage of the prover is less reliant on the order in which the goals are attempted. However, unlike the lemma caching mechanism, the lemma discovery tool can only generate lemmas that have the same form as the supplied lemma templates (see §). Thus, there can be goals that are automated with a cached lemma that the lemma discovery tool could not have generated.

9.7.4 Summary

From this experiment, it can be seen that the lemma discovery tool and the lemma caching feature of our prover increase proof coverage in practice. This agrees with previous observations that rippling-based proof automation becomes more powerful when given access to extra lemmas [Bundy et al., 1993, Dixon, 2005, Johansson, 2009]. Given these results, we consider extending our lemma discovery tool so that it is able to conjecture and prove more complex lemmas to be useful further work.

9.8 A Comparison with IsaPlanner

In this section, we examine how many of the proof obligations from the case studies can be automated by IsaPlanner (a tool for the Isabelle theorem prover) compared to our prover. As IsaPlanner is designed to automate inductive proofs and many of the proof obligations naturally require induction to discharge, we would expect IsaPlanner to offer some level of automation.

9.8.1 Experimental Setup

For this experiment, IsaPlanner and our prover were only supplied with basic definitions and no additional lemmas. Recall that, using this configuration, IsaPlanner and our system gave a similar level of automation for the theorem corpus described in §7.14. To run the experiment, we first had to translate all the case study proof obligations to Isabelle so that they could be attempted by IsaPlanner. Translating the simply typed Coq functions that appear in the proof obligations to Isabelle was straightforward.
Each proof obligation was translated to Isabelle in the following manner:

1. For each proof obligation, we first substituted any pattern matching equations that were present (see §3.3.1). This trivial simplification step is part of the default behaviour of the Program tactic and is also always performed by our prover on top-level goals.

2. The case study proof obligations contain dependently typed terms, and there is no obvious way such terms can be represented in Isabelle. These features were therefore eliminated from each proof obligation before it was translated to Isabelle. For each proof obligation, this was achieved by destructuring all subset type terms and discarding any assumptions concerning the inductive families used to represent binary numbers. The former step is always safe, and the latter step is safe for the proof obligations from our case studies, as these assumptions are not needed in any of the proofs.

For a fair comparison with IsaPlanner, our prover was run against the case study proof obligations after the transformations described above were applied.

9.8.2 Results

Out of 67 goals, IsaPlanner was able to automate 8 (12%). In comparison, our prover was able to automate 46 of the goals (69%) while similarly configured to only use basic definitions during proofs. All of the goals discharged by IsaPlanner were also discharged by our prover. See Appendix B for detailed results showing which theorems could be automated by the two systems.

9.8.3 Analysis

IsaPlanner failed to automate a significant number of theorems, and the results show that our prover is more effective at automating the proof obligations that arose from the case studies.
We now describe the primary differences between the approach our prover uses to automate the proof obligations and the approach used by IsaPlanner:

• IsaPlanner does not take advantage of embeddings that exist in top-level goals and will perform induction when rippling could be used to guide the proof. For example, IsaPlanner fails to automate the recursive call proof obligation that arose from the example in §9.4.1.2. This proof obligation has the following form:

```
insertion_sort_p : length insertion_sort_s = length t
============================
length (insert h insertion_sort_s)↑ = length (h :: t)↑
```

As the assumption embeds into the conclusion, rippling can be used to guide the proof. Our prover discharges this goal by rippling out the differences on the RHS of the conclusion, weak fertilising and then performing an inductive proof on the resulting goal (see §9.4.1.2). Instead of using rippling, IsaPlanner performs induction on the top-level goal and fails to find a proof.

• IsaPlanner does not attempt to simplify or generalise goals before performing induction on top-level goals. For example, IsaPlanner fails to automate the goal r = 1 * fact n → r = fact n (which arose from the factorial case study) with an inductive proof. Our prover automates this by simplifying the goal to fact n + 0 = fact n, generalising this goal to x + 0 = x and then performing a simple inductive proof. IsaPlanner is, however, able to automate this generalised goal.

• IsaPlanner lacks the reasoning techniques needed for proving impossible case proof obligations. Specifically, IsaPlanner fails to automate any of the six impossible case proof obligations that arose from the binary adder case study (see §9.5.1). For example, one of these proof obligations has the form (P : 0 = S x) ⊢ False. Our prover discharges this goal with the trivial tactic by reasoning that the assumption has an uninhabited type.
• IsaPlanner fails to automate top-level goals that can be proven with only basic simplification. A simple example of this comes from the binary adder case study (see §9.5.1), where one proof obligation has the form (P : S x = S y) ⊢ x = y. Our prover solves this by simplifying assumption P to x = y and then using P to trivially discharge the goal. IsaPlanner instead attempts an inductive proof and fails.

9.8.4 Summary

In this experiment, we found that our prover was able to automate significantly more of the proof obligations that arose from our case studies than IsaPlanner. This is in contrast to a previous experiment that compared the proof coverage of IsaPlanner and our prover on a theorem corpus (see §7.14), where both systems gave a comparable level of automation. The primary difference is that almost all of the theorems from that corpus required induction to be performed directly on the top-level goals, whereas this is not always the case for discharging proof obligations. IsaPlanner has primarily been designed to automate lemma statements that require induction to be performed directly on top-level goals. In contrast, our prover is designed to expect simplification, generalisation and rippling to be applicable to top-level goals. The results of this experiment suggest that the generality of IsaPlanner could be improved by adopting the strategy employed by our prover.

9.9 Answers to Research Questions

In this section, we provide answers to the research questions we proposed in §9.1 by summarising our experiences of conducting the case studies.

Which data type representations, program property representations and levels of type refinement are well supported by the proof automation?

We believe our case studies show that our framework offers broad proof automation support for programming with dependent types.
In the following, we comment on several aspects of the proof automation support demonstrated:
Program properties: In the case studies, we verified the correctness of tail recursive functions, length and permutation properties of sorting functions, and the correctness of variants of a binary adder program. These program properties were described using a combination of functions that operate over lists, trees, Peano arithmetic and binary numbers. The variety of program properties and data types used gives good evidence that the automation provides generic support that will work for other types and functions that we have not yet experimented with.
Subset types: The majority of the example programs involved programming with subset types, where the propositional parts of subset types generally consisted of equational statements, the use of simple types (e.g. list and nat) and structurally recursive functions. We found the framework offered significant automation for working with this style of representation. In particular, the recursive call, induction, ripple and generalise patterns were frequently used in the proofs required.
Inductive families: In the binary adder case study, we demonstrated that automation could be provided for working with inductive families, where the proof obligations involved had the form of non-linear arithmetic equations. Although we should examine further examples involving inductive families, we would expect that support could be given whenever the type indices used produce proof obligations that involve equations, recursively defined functions and inductively defined types.
Recursion: The case studies show that the prover provides good support for working with structurally recursive functions. The quicksort example involved non-structural recursion, where partial automation was achieved. However, further work will be needed to understand how the automation copes with other examples of non-structural recursion.
Type refinement: In several examples, we changed a function that had previously returned a subset type term to one that only returned a simply typed term. For example, we did this for the helper functions in the tail recursion examples and for the insert function in the insertion sort examples. In each case, the automation provided support for the proof obligations produced, where the use of simple types typically made the proofs more challenging.
Robustness to change: In each set of case studies, we examined programs that were created by making slight changes to previously written programs to see how the automation coped. For example, in the tail recursion and binary adder examples, we changed the order of certain arguments in such a way that the program behaviour and the property being verified were the same but the proofs required were different. In each case, the automation was robust to these small changes. This gives further evidence of the generality of the tactics. Moreover, this support is useful when refactoring programs as, without proof automation, small representation changes usually require that we update manual proofs.
As seen in the quicksort and tree sort examples, the prover is unable to automate any proofs that require piecewise fertilisation. Additionally, the termination proofs for the quicksort examples were problematic as the prover does not yet support the use of inductive predicates.
How often can proof automation failures be overcome with user hints? What level of expertise is required to give effective hints?
On several occasions, we managed to give hints to help the prover discharge a problematic proof obligation. In the quicksort case study, we supplied a wave rule hint to unblock a rippling proof. In the tree sort case study, we supplied a generalisation hint to work around a case where the generalisation tactic was overgeneralising a goal.
We personally found formulating these particular hints intuitive, but we are aware that a good understanding of what each tactic does during proof search is needed for this to be a realistic option in many cases. For example, some of the hints we gave would require the user to understand rippling and rippling annotations. We note that, in cases where the proof was blocked because of the lack of support for piecewise fertilisation and inductive predicates, there was no obvious way we could give a hint to help the prover.
How helpful were the error feedback facilities during development?
At the end of each case study, we commented on the usefulness of the error feedback feature of our framework. In general, we felt the feedback greatly improved our ability to identify and fix errors. For example, we believe that the binary adder case study would have been much more difficult to complete without this feature, since we made numerous errors that were caught by our testing tool. However, we did note that when capturing the permutation properties of the sorting algorithms, the testing tool was less useful as it was unable to test goals that included universally quantified assumptions. Additionally, when writing the binary adder with the use of subset types, we noted that, although the indication that there was an error was helpful, many of the actual error messages were difficult to interpret.
9.10 Related Work in Dependently Typed Programming Environments
We now compare the facilities offered by our framework, which provides integrated proof automation and error feedback, to the support provided by other dependently typed programming environments. We consider Agda first, which has both an inductive proof automation tool and a testing tool available for it.
The Agsy proof automation tool for Agda [Lindblad and Benke, 2006] has similarities to our tool in that the former is implemented in a similar setting to Coq and automates proofs using generalisation and induction, where proofs can include case splits. However, Agsy has limited support for rewriting with equations [Lindblad and Benke, 2006, §4] and so would be unable to support proofs that rely on the controlled use of equational lemmas made possible by rippling. The author of the tool comments that Agsy is unable to discover simple lemmas that are needed during some proofs [Lindblad and Benke, 2006, §4]. It is difficult to make a more formal comparison between our system and Agsy as we are unable to obtain a version of Agsy to perform experiments with, and there is no corpus available that documents which theorems Agsy can automate. Although Agsy does not currently cache and reuse the proofs it finds, delayed generalisation could be implemented in a similar way in Agda. An interesting difference with our work is that the Agsy tool directly inserts the terms it constructs into the program script. For this reason, Agsy contains features for improving the readability of the terms found, such as searching for short proofs and naming variables in a readable manner.
The testing tool for Agda [Qiao Haiyan, 2003] (which we mentioned in §8.4) is similar to ours in that it uses a QuickCheck-like approach and testing occurs within the target language. However, Agda’s testing tool is more powerful than ours in that it supports generators for inductive families, generators for functions and custom generators. For example, this support for inductive families would have been useful for finding counterexamples to the proof obligations that arose in our binary adder case study.
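A generator for an inductive family produces values whose type index is fixed in advance. The following Python sketch imitates that idea in a setting without dependent types: the generator first picks an index (the bit width n) and then produces two bit vectors of exactly that width, which are fed to a ripple-carry adder and checked against the arithmetic specification value(sum) + carry_out · 2^n = value(a) + value(b) + carry_in. Everything here is an illustrative assumption: the little-endian bit-list encoding, the function names and the deliberately faulty adder are hypothetical and are taken neither from Agda's testing tool nor from our Coq development.

```python
import random
from typing import Callable, List, Optional, Tuple

Bits = List[int]  # little-endian: bits[0] is the least significant bit
Adder = Callable[[Bits, Bits, int], Tuple[Bits, int]]


def to_nat(bits: Bits) -> int:
    """Interpret a little-endian bit list as a natural number."""
    return sum(b << i for i, b in enumerate(bits))


def gen_bits(rng: random.Random, width: int) -> Bits:
    """Generate a bit vector whose length is exactly the index `width`."""
    return [rng.randint(0, 1) for _ in range(width)]


def ripple_add(a: Bits, b: Bits, carry: int) -> Tuple[Bits, int]:
    """Ripple-carry adder returning (sum bits, carry out)."""
    out = []
    for x, y in zip(a, b):  # a and b have equal width by construction
        total = x + y + carry
        out.append(total % 2)
        carry = total // 2
    return out, carry


def faulty_add(a: Bits, b: Bits, carry: int) -> Tuple[Bits, int]:
    """Deliberately wrong variant: the incoming carry is never added."""
    out = []
    for x, y in zip(a, b):
        total = x + y  # bug: should be x + y + carry
        out.append(total % 2)
        carry = total // 2
    return out, carry


def spec_holds(adder: Adder, a: Bits, b: Bits, c: int) -> bool:
    """Specification: value(sum) + carry_out * 2^n == value(a) + value(b) + c."""
    s, cout = adder(a, b, c)
    return to_nat(s) + (cout << len(a)) == to_nat(a) + to_nat(b) + c


def check(adder: Adder, trials: int = 500, max_width: int = 6,
          seed: int = 1) -> Optional[Tuple[int, Bits, Bits, int]]:
    """Return a counterexample (n, a, b, carry_in), or None if none was found."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(0, max_width)              # choose the index first...
        a, b = gen_bits(rng, n), gen_bits(rng, n)  # ...then values of that index
        c = rng.randint(0, 1)
        if not spec_holds(adder, a, b, c):
            return n, a, b, c
    return None


print(check(ripple_add))  # None: no counterexample found
print(check(faulty_add))  # a tuple (n, a, b, carry_in) refuting the specification
```

Generating the index first keeps both operands the same width by construction, which plays the role that the type index plays in an inductive family; a generator producing arbitrary lists would instead have to discard mismatched pairs.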
We note that Agsy and Agda’s testing tool are not integrated in any fashion. This is in contrast to our framework, where testing is integrated to provide error feedback and feedback for faulty hints, and is also used to guide proof search. However, Agda’s testing tool has been used in combination with interactive theorem proving and a boolean formula model checker to conduct program verification case studies [Qiao Haiyan, 2003].
Outside of Agda, we are not aware of any environment intended for dependently typed programming that includes inductive proof automation. For example, all of our case studies would require manual proofs if written in Sage, Epigram, ATS or Idris, although we note that Sage and ATS provide automation for linear arithmetic goals. Epigram, ATS and Idris also lack facilities for identifying errors and providing error feedback compared to what our framework offers. However, Sage does make use of counterexamples as a way to detect errors. If Sage cannot verify a program property statically, a dynamic check is added to the compiled program that checks the property at run-time. If this run-time check is violated, the user is presented with a corresponding counterexample. The counterexample is stored in a database so that future proof obligations of the same form can be rejected at compile time.
9.11 Related Work in Inductive Proof Automation
We are unaware of any tactics in Coq that can automate theorems that require induction to be performed. However, we note that a Coq tool is available that is intended to make writing inductive proofs about recursive functions easier through the generation of suitable induction principles [Barthe et al., 2006]. Moreover, Coq’s auto tactic [Bertot and Castéran, 2004], which uses a Prolog-like resolution approach, can provide help for proving examples similar to those that we have looked at, but only when induction is not required and only when auto is supplied with carefully chosen theorems to use. In contrast, our automation requires minimal setup to be useful and is able to support working with new definitions.
The Boyer-Moore theorem prover [Boyer and Moore, 1979], and its successor ACL2 [Kaufmann and Moore, 1997], are well known for their inductive proof automation and are likely to be able to automate theorems similar to those that we have presented. ACL2 features a complex and fine-tuned simplification tactic which is used to simplify step case proofs, in contrast to our approach based on rippling. ACL2 also uses heuristics for choosing appropriate induction principles and induction variables. At the moment, our prover only supports proofs that use standard induction principles, which can limit the kinds of inductive proofs rippling can automate [Johansson, 2009, §5.6]. As with our work, when the automation fails in ACL2, the user has the option of providing hints. ACL2’s hint feature is more advanced than ours in that many more options are available. For example, the user can instruct the prover to apply a lemma to a specific subgoal in the proof attempt and dictate which induction principle to use.
Rippling has been implemented in other systems, such as Clam [Bundy et al., 1990], Nuprl [Pientka and Kreitz, 1998a] and IsaPlanner [Dixon, 2005]. IsaPlanner uses rippling to automate inductive proofs in a simply typed setting within a proof planning framework. Like our prover, IsaPlanner includes support for rippling proofs that involve case splits and multiple hypotheses [Johansson, 2009]. An important difference between IsaPlanner and our prover is that we always attempt rippling, simplification and generalisation on a top-level goal before an inductive proof is considered. In contrast, IsaPlanner always attempts to discharge top-level goals by performing induction first.
A comparison of the level of automation offered by our system and IsaPlanner can be found in §7.14 and §9.8.
9.12 Conclusions
In our case studies, we have shown that our framework provides practical support for developing dependently typed programs. These case studies involved verifying tail recursive functions, sorting functions, and a binary adder. A variety of data types were used and different program properties were captured to demonstrate the generality of our approach and framework. The proof automation was found to discharge a significant proportion of the proof obligations produced. Moreover, we found the error feedback and hinting facilities of our framework useful in practice during our case studies.
Chapter 10
Conclusions and Further Work
We now review the contributions of the thesis and then discuss whether the hypothesis presented in the first chapter has been verified. This is followed by some suggestions on areas for future research.
10.1 Framework Overview
In this thesis, we have presented a framework designed to make dependently typed functional programming with user-defined properties easier. The primary features of this framework are as follows:
• Testing is employed to identify, and give feedback on, errors that are indicated by unprovable proof obligations (see Chapter 8).
• Proof automation that supports reasoning about inductively defined types and recursively defined functions is employed to discharge proof obligations (see Chapters 6 and 7).
• To increase proof coverage, lemmas found during proof search are cached for reuse in future proof attempts (see §7.2).
• Should the automation fail to discharge a goal, a trace of the proof attempt is given and the user has the opportunity to help the prover find a proof by giving high-level hints in the form of lemma conjectures (see §7.13). Testing is also used here to give feedback on faulty hints.
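The proof automation listed above is structured as a Boyer-Moore style waterfall [Boyer and Moore, 1979], combined with a lemma cache and a trace that is reported when an attempt fails. The following Python sketch is a toy model of that control flow only, under strong assumptions: goals are plain strings, and every tactic is a hypothetical stub that looks the goal up in a fixed table, scripted to mirror the factorial example from the IsaPlanner comparison (r = 1 ∗ fact n → r = fact n is simplified to fact n + 0 = fact n, generalised to x + 0 = x and then proved by induction). None of these names correspond to the tactics of our Coq prototype.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# A stage returns None if it does not apply, [] if it closes the goal,
# or a list of new subgoals, each of which re-enters the waterfall at the top.
Stage = Tuple[str, Callable[[str], Optional[List[str]]]]


def table_stage(name: str, table: Dict[str, List[str]]) -> Stage:
    """Hypothetical stub tactic: looks the goal up in a fixed table."""
    return name, lambda goal: table.get(goal)


def waterfall(goal: str, stages: List[Stage], cache: Set[str],
              trace: List[str], depth: int = 0, max_depth: int = 10) -> bool:
    """Pour a goal through the stages; the first applicable stage is committed to."""
    indent = "  " * depth
    if goal in cache:
        trace.append(f"{indent}cached: {goal}")
        return True
    if depth > max_depth:
        trace.append(f"{indent}depth limit: {goal}")
        return False
    for name, tactic in stages:
        subgoals = tactic(goal)
        if subgoals is None:
            continue  # not applicable: the goal falls through to the next stage
        trace.append(f"{indent}{name}: {goal}")
        proved = all(waterfall(g, stages, cache, trace, depth + 1, max_depth)
                     for g in subgoals)
        if proved:
            cache.add(goal)  # cache the proved goal for reuse in later attempts
        return proved        # no backtracking into later stages
    trace.append(f"{indent}stuck: {goal}")  # this trace is what the user would see
    return False


STAGES: List[Stage] = [
    table_stage("simplify", {"r = 1 * fact n -> r = fact n": ["fact n + 0 = fact n"],
                             "0 + 0 = 0": []}),
    table_stage("ripple", {"S x + 0 = S x": []}),  # step case: ripple, then fertilise
    table_stage("generalise", {"fact n + 0 = fact n": ["x + 0 = x"]}),
    table_stage("induction", {"x + 0 = x": ["0 + 0 = 0", "S x + 0 = S x"]}),
]

cache: Set[str] = set()
trace: List[str] = []
proved = waterfall("r = 1 * fact n -> r = fact n", STAGES, cache, trace)
print(proved)            # True
print("\n".join(trace))  # one indented line per stage that fired
```

Committing to the first applicable stage, rather than backtracking, is what makes this a waterfall rather than a general search. In the sketch, a second attempt at x + 0 = x is answered directly from the cache, and a goal that no stage accepts ends the attempt with a "stuck" entry in the trace.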
The two main components of our framework are as follows:
• The proof automation component is composed of several tactics that are structured using the Boyer-Moore waterfall approach [Boyer and Moore, 1979]. Inductive proof automation is provided, primarily with the use of tactics that perform simplification, rippling [Bundy et al., 2005] and generalisation [Aubin, 1976, Boyer and Moore, 1979, Aderhold, 2007]. For the latter, the testing component is employed to identify overgeneralisations.
• The testing tool component uses a QuickCheck-like [Claessen and Hughes, 2000] approach to identify unprovable goals, where the counterexamples found are used for providing error feedback.
We believe that this framework is effective at making development more practical as it gives integrated support for several common, and frequently challenging, activities that take place when programming with dependent types: identifying errors, fixing errors and constructing proofs. The generic nature of the framework means support can be given for capturing a wide range of program properties and the user is not restricted to working with predefined definitions. Further to this, a benefit of our approach is that the modular architecture of the proof automation gives a foundation that can be built upon for addressing new domains. For example, to give support for further program properties, new tactics could be introduced to various stages of the waterfall and existing tactics could be extended.
In DML [Xi, 1998], the program properties the user can capture are restricted to those that concern linear arithmetic, but the proof automation provided is decidable. With our system, we can support many more forms of program properties but, with this freedom, it is harder to make guarantees to the user about what proofs can be automated.
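The use of testing to catch overgeneralisations can be illustrated with a small Python sketch. Before a generalised goal is accepted, it is tested QuickCheck-style on natural numbers; if a counterexample is found, the generalisation is rejected because a true goal has been turned into a false one. The first example is the factorial goal discussed in §9.8 (fact n + 0 = fact n generalised to x + 0 = x); the second (1 ≤ fact n generalised to 1 ≤ x) is an illustrative overgeneralisation of our own. The function names, trial counts and value bounds are assumptions for illustration, not details of our Coq implementation.

```python
import math
import random
from typing import Callable, Iterator, Optional

Prop = Callable[[int], bool]


def candidates(trials: int, max_nat: int, seed: int) -> Iterator[int]:
    """Small edge cases first, then random natural numbers."""
    yield from range(3)
    rng = random.Random(seed)
    for _ in range(trials):
        yield rng.randint(0, max_nat)


def find_counterexample(prop: Prop, trials: int = 200, max_nat: int = 10,
                        seed: int = 0) -> Optional[int]:
    """Return a natural number falsifying prop, or None if none was found."""
    for n in candidates(trials, max_nat, seed):
        if not prop(n):
            return n
    return None


def accept_generalisation(original: Prop, generalised: Prop) -> bool:
    """Accept a generalisation only if testing refutes neither goal."""
    if find_counterexample(original) is not None:
        return False  # the goal itself is false: report an error instead
    # A counterexample here means the generalisation turned a true goal into a false one.
    return find_counterexample(generalised) is None


fact = math.factorial

# fact n + 0 = fact n  ~>  x + 0 = x   (sound; then proved by induction)
print(accept_generalisation(lambda n: fact(n) + 0 == fact(n), lambda x: x + 0 == x))  # True
# 1 <= fact n  ~>  1 <= x              (overgeneralised: fails at x = 0)
print(accept_generalisation(lambda n: 1 <= fact(n), lambda x: 1 <= x))  # False
print(find_counterexample(lambda x: 1 <= x))  # 0
```

The second case shows why testing is useful here: the original goal holds because fact 0 = 1, but replacing fact n by an unconstrained variable discards that fact, and the generated value 0 exposes the loss immediately.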
The hinting feature was added so that users can sometimes avoid having to resort to manual proofs on the occasions where the automation fails. However, the user has to have some knowledge of theorem proving and how the proof automation works to take advantage of this. Likewise, error feedback is only available when the user works with properties that can be tested.
10.2 Contributions
The combination of features and their integration within this framework is novel compared to what is available in current dependently typed programming environments. We note that the underlying ideas and approach used could be equally applied to give support in other languages with dependent types, such as ATS, Epigram, Agda and Idris. In particular, we have shown how to provide effective proof automation for supporting program properties that involve inductively defined types and recursively defined functions.
As a by-product of developing a prototype of our framework, we have introduced inductive proof automation and a QuickCheck-like testing tool to Coq. As far as we know, there are no existing tools that provide similar capabilities in this system. As such, we believe this contribution can be of practical value to the Coq community.
Moreover, aspects of our work concerning inductive proof automation have applications elsewhere:
• We devised delayed generalisation as a technique for identifying irrelevant assumptions at the end of a proof for the purpose of eliminating the corresponding irrelevant subformulae from cached lemmas (see §7.10). This idea could also be applied in other provers that cache lemmas, such as in IsaPlanner [Dixon, 2005].
• We introduced heuristics that can automatically identify, from a collection of lemmas, a terminating set of rewrite rules suitable for simplification (see §7.11).
These heuristics could also be employed by IsaPlanner to make more productive use of cached lemmas, including those found by IsaPlanner’s lemma discovery tool IsaCoSy [Dixon, 2005, Johansson, 2009].
• We have given further evidence that the rippling technique can be productively applied outside of its traditional use in proof planning. In most presentations, the utility of rippling is demonstrated as part of proof planning [Bundy, 1988, Dixon, 2005]. In contrast, our proof automation is structured using a Boyer-Moore style waterfall, where this waterfall includes a call to a rippling tactic. Rippling has also been used without proof planning in Nuprl [Pientka and Kreitz, 1998a].
10.3 Hypothesis
We now consider the evidence we have produced for the hypothesis presented in the first chapter. The hypothesis was as follows:
“This framework makes dependently typed programming significantly easier”
We believe that this has been shown with the evidence from our case studies, where we reported on our experiences developing dependently typed programs with the help of our framework (see Chapter 9). In these case studies, we verified tail recursive functions, sorting functions, and variations of a binary adder. These included programs based on examples found in ATS, Coq and Idris respectively.
The programs that we wrote made use of a variety of data types, program properties and various levels of type refinement. Program properties were described using inductively defined types and recursively defined functions, generally with the use of equality statements. The majority of our examples made use of subset types, although the binary adder case study involved the use of inductive families.
A significant proportion of the proof obligations generated (84%) were discharged automatically, thereby providing good evidence that the prover is effective in practice.
Moreover, the use of example programs that only differed by small representation changes demonstrated the robustness of the proof automation. The high level of automation relieved the burden of writing a large portion of the proofs by hand, a fact that undoubtedly made development significantly easier. In addition, in several cases where the proof automation failed, we found that we were able to use the lemma hinting facilities to help the prover find a proof.
We also reported that the error feedback provided by our framework identified many errors for us in practice and that the error messages given were usually helpful in suggesting what changes to make. However, we did note that the error messages were sometimes hard to interpret when we were capturing weak specifications and when the proof obligations generated were complex.
10.4 Further Work
We now describe several areas of future research into providing better support for programming with dependent types. The first topics we cover concern providing more support for inductive families and non-structural recursion, as well as integrating domain specific techniques into our prover. From the work done so far, it is not entirely clear what extensions would need to be added to our automation to support these. We then cover topics where it is comparatively easier to know what work needs to be done next. For example, we can look to previous work to incorporate extensions that give automation for proofs involving piecewise fertilisation and existential quantifiers.
We consider adding support for inductive families and inductive predicates to be among the most important topics for making dependently typed programming more practical, as such representations appear often in most Coq, Agda and Epigram programs.
Moreover, the integration of domain specific techniques into our prover is likely to be important for scaling to more complex program verification tasks.
10.4.1 Inductive Families
The majority of the dependently typed programs we have considered so far capture program properties using subset types as opposed to dependently typed inductive families. For future work, we would want to give further support for the latter. This would require looking at more varied and more complex examples of the proof patterns that arise when programming with inductive families. For example, we could consider inductive families for representing ordered lists [Altenkirch et al., 2005] and list permutations [Brady et al., 2008]. Sozeau’s formalisation of finger trees would be a challenging case study to consider, as it makes extensive use of inductive families in combination with inductive predicates and subset types [Sozeau, 2007a].
For providing error feedback for the above, we would need to extend our testing tool with term generators for inductive families. Agda’s testing tool can test goals that include inductive families as long as the user writes custom generators for the inductive families used [Qiao Haiyan, 2003]. For a practical testing tool, however, it would be desirable to minimise the need for users to write the generators themselves.
10.4.2 Integrating Domain Specific Tools and Libraries
Our automation work has focused on constructing proofs about user-defined properties with the use of induction. Typical dependently typed programs will make use of user-defined types in combination with common standard definitions, such as simply typed lists and Peano arithmetic. Useful further work would involve integrating domain specific tactics into our automation along with methods for making use of theorems from existing libraries.
These topics will be important for creating a tool that scales to larger and more complex programs than those which we have looked at so far.
10.4.3 Non-Structural Recursion
The majority of the example programs we have considered so far have involved structural recursion. If we wish to extend our proof automation support to non-structurally recursive functions, further examples will need to be examined to determine what extensions would be required. Although we showed in §9.4.3 that our framework gave some automated support for the proof obligations that arose in the case study involving quicksort (which was defined using non-structural recursion), it is unlikely that other examples will be as straightforward.
10.4.4 Inductive Predicates
Currently, we do not provide support for working with user-defined properties that involve inductive predicates. The latter are used heavily in many dependently typed programs, and support for inductive predicates would make our framework even more practical.
Our rippling tactic would need several extensions to support inductive predicates. For example, if even n is defined as an inductive predicate, we would need to allow the use of rules such as ∀ n m, even n → even (n ∗ m) and ∀ n, even n ↔ even (S (S n)) for rippling proof steps. Furthermore, we would need to extend the weak fertilisation step to allow the use of user-defined relations. The recently improved rewriting support for working with arbitrary relations in Coq is likely to make implementing these extensions easier [Sozeau, 2009]. For our simplification tactic, we would need additional heuristics for simplifying goals that contain inductive predicates, which would likely include the use of inversion [Cornes and Terrasse, 1995]. Our testing tool would need to be extended to test goals that contain user-defined inductive predicates.
Agda’s testing tool shows how Prolog-like search can be used to test goals that include user-defined inductive predicates in limited cases [Qiao Haiyan, 2003]. In other situations, testing support could be given by asking the user to supply a mapping between an inductive predicate and an equivalent function so that the goal can be transformed to a testable form.
10.4.5 Infinite Data Structures
An interesting extension would be to provide support for writing dependently typed programs that involve infinite data types, such as lazy lists. Such types can be defined in Coq using coinduction [Bertot, 2005]. Of particular relevance to extending our prover to support coinductive proofs, we are aware that work exists on coinductive proof automation in Clam [Dennis, 1998]. For providing error feedback, we would need to extend the generate and test phases of our testing tool to cope with the inclusion of infinite data types.
10.4.6 Piecewise Fertilisation
We do not currently support rippling proofs where a given includes an implication. For example, during our case studies, we required an inductive proof of the following theorem to show that an implementation of quicksort would terminate (see §9.4.3.2):
    ∀ x y z, list_perm (x ++ y) z → length x < S (length z)
The step case of this inductive proof requires rippling with a given that contains an implication. Likewise, when the propositional part of a subset type contains an implication, similar reasoning will be required to discharge recursive call proof obligations. Proofs such as these could be supported by extending rippling to perform piecewise fertilisation [Armando et al., 1999]. IsaPlanner is similarly unable to automate these kinds of proofs at the moment [Johansson, 2009], but there are plans to add the necessary extensions in the future [Dennis and Dixon, 2009].
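The mapping idea from §10.4.4 can be combined with the termination lemma from §10.4.6 in a small Python sketch: the inductive predicate list_perm is replaced by an equivalent boolean function (multiset equality), and the resulting property ∀ x y z, list_perm (x ++ y) z → length x < S (length z) is tested on random lists. Because the premise is an implication, drawing z independently at random rarely satisfies it, so most such tests would pass vacuously; the sketch therefore also offers a generator that builds z as a shuffle of x ++ y. The function names, the generator design and the use of Python lists of small integers are assumptions made for illustration only.

```python
import random
from collections import Counter
from typing import List, Optional, Tuple

Case = Tuple[List[int], List[int], List[int]]


def is_perm(xs: List[int], ys: List[int]) -> bool:
    """Boolean stand-in for the inductive predicate list_perm (multiset equality)."""
    return Counter(xs) == Counter(ys)


def lemma(x: List[int], y: List[int], z: List[int]) -> bool:
    """list_perm (x ++ y) z -> length x < S (length z)."""
    return not is_perm(x + y, z) or len(x) < len(z) + 1


def gen_list(rng: random.Random, max_len: int = 5, max_val: int = 3) -> List[int]:
    return [rng.randint(0, max_val) for _ in range(rng.randint(0, max_len))]


def run_tests(trials: int = 300, seed: int = 2,
              build_premise: bool = True) -> Tuple[Optional[Case], int]:
    """Return (counterexample or None, number of tests whose premise held)."""
    rng = random.Random(seed)
    premise_held = 0
    for _ in range(trials):
        x, y = gen_list(rng), gen_list(rng)
        if build_premise:
            z = x + y
            rng.shuffle(z)     # z is a permutation of x ++ y by construction
        else:
            z = gen_list(rng)  # independent z: the premise is usually false
        premise_held += is_perm(x + y, z)
        if not lemma(x, y, z):
            return (x, y, z), premise_held
    return None, premise_held


print(run_tests())                     # no counterexample; premise held in every test
print(run_tests(build_premise=False))  # no counterexample, but mostly vacuous tests
```

Building the premise into the generator is an ad hoc choice for this one lemma; a general tool would need a systematic way to generate inputs that satisfy inductive predicates, which is the problem the Prolog-like search in Agda’s testing tool addresses in limited cases.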
10.4.7 Improved Error Feedback
The error messages discussed in Chapter 8 use a counterexample to show the user how a proof obligation generated by a faulty program is unprovable. We noted that the error messages could be more helpful when weakly specified functions were used (see §8.1.3), and we found in the binary adder case study (see §9.5) that the error messages produced were tricky to interpret as the proof obligations contained numerous assumptions and complex terms.
To produce more helpful error messages, we have considered generating error messages that make direct use of the top-level function being defined. To illustrate this idea, the counterexample to the proof obligation shown in §8.1 could be presented in the following manner:
    Given the following output from "intersperse":
      output = intersperse x y = intersperse [1] [] = [1]
    The propositional part of the output type of "intersperse" is uninhabitable:
      length output = (length x) ∗ (length y)
      length [1] = (length [1]) ∗ (length [])
      1 = 1 ∗ 0
      1 = 0
We think this style of error message is likely to be easier to understand, and in many cases more concise, than examining counterexamples to proof obligations.
10.4.8 Existential Quantifiers
We have yet to consider support for proof obligations that contain existential quantifiers. Of relevance, rippling has been applied in Nuprl to automate proofs that contain existential quantifiers [Pientka and Kreitz, 1998a], and such an extension is likely to be useful to our proof automation. For testing support, we are aware that SmallCheck has some support for testing existentially quantified conjectures [Runciman et al., 2008].
10.5 Summary
In this thesis, we presented a framework that combines proof automation and testing for the purpose of supporting dependently typed programming.
In this concluding chapter, we described the contributions of the thesis and outlined the evidence for our hypothesis that the framework presented makes dependently typed programming significantly easier. We then discussed possible further research that we believe can be used to make programming with dependent types more practical in future.
Bibliography
[Aczel, 1977] Aczel, P. (1977). An introduction to inductive definitions. In Barwise, J., editor, Handbook of Mathematical Logic, pages 739–782. North-Holland.
[Adams and Dennis, 2003] Adams, A. A. and Dennis, L. A. (2003). Rippling in PVS. In Archer, M., Vito, B. D., and Munoz, C., editors, Proceedings of Design and Application of Strategies/Tactics in Higher Order Logics (STRATA 2003), pages 84–91. NASA Technical Report CP-2003-212448.
[Aderhold, 2007] Aderhold, M. (2007). Improvements in formula generalization. In Pfenning, F., editor, Automated Deduction - CADE-21, volume 4603 of Lecture Notes in Computer Science, pages 231–246. Springer.
[Allen et al., 2000] Allen, S. F., Constable, R. L., Eaton, R., Kreitz, C., and Lorigo, L. (2000). The Nuprl open logical environment. In CADE-17: Proceedings of the 17th International Conference on Automated Deduction, pages 170–176, London, UK. Springer-Verlag.
[Altenkirch et al., 2005] Altenkirch, T., McBride, C., and McKinna, J. (2005). Why dependent types matter. http://www.e-pig.org/downloads/ydtm.pdf.
[Armando et al., 1999] Armando, A., Smaill, A., and Green, I. (1999). Automatic synthesis of recursive programs: The proof-planning paradigm. Autom. Softw. Eng., 6(4):329–356.
[Aspinall et al., 2008] Aspinall, D., Denney, E., and Lüth, C. (2008). A tactic language for hiproofs. In Autexier, S., Campbell, J., Rubio, J., Sorge, V., Suzuki, M., and Wiedijk, F., editors, AISC/MKM/Calculemus, volume 5144 of Lecture Notes in Computer Science, pages 339–354. Springer.
[Aubin, 1976] Aubin, R. (1976). Mechanizing structural induction. PhD thesis, The University of Edinburgh.
[Augustsson, 1984] Augustsson, L. (1984). A compiler for Lazy ML. In LFP ’84: Proceedings of the 1984 ACM Symposium on LISP and functional programming, pages 218–227, New York, NY, USA. ACM Press.
[Augustsson, 1998] Augustsson, L. (1998). Cayenne - a language with dependent types. In International Conference on Functional Programming, pages 239–250.
[Augustsson and Carlsson, 1999] Augustsson, L. and Carlsson, M. (1999). An exercise in dependent types: A well-typed interpreter. In Workshop on Dependent Types in Programming, Gothenburg.
[Barnett et al., 2005] Barnett, M., Leino, K. R. M., and Schulte, W. (2005). The Spec# programming system: an overview. In Barthe, G., Burdy, L., Huisman, M., Lanet, J.-L., and Muntean, T., editors, Post Conference Proceedings of CASSIS: Construction and Analysis of Safe, Secure and Interoperable Smart devices, Marseille, volume 3362 of LNCS, pages 49–69. Springer-Verlag.
[Barthe et al., 2006] Barthe, G., Forest, J., Pichardie, D., and Rusu, V. (2006). Defining and reasoning about recursive functions: A practical tool for the Coq proof assistant. In Proceedings of 8th International Symposium on Functional and Logic Programming (FLOPS’06), volume 3945 of Lecture Notes in Computer Science, pages 114–129. Springer-Verlag.
[Basin and Walsh, 1996] Basin, D. A. and Walsh, T. (1996). A calculus for and termination of rippling. Journal of Automated Reasoning, 16(1–2):147–180.
[Bertot, 2005] Bertot, Y. (2005). Coinduction in Coq. In Lecture Notes of TYPES Summer School 2005, Sweden, Volume II.
[Bertot, 2008] Bertot, Y. (2008). Coq in a hurry. http://arxiv.org/abs/cs/0603118.
[Bertot and Castéran, 2004] Bertot, Y. and Castéran, P. (2004). Interactive Theorem Proving and Program Development. Coq’Art: The Calculus of Inductive Constructions. Texts in Theoretical Computer Science. Springer Verlag.
[Bertot and Théry, 2008] Bertot, Y. and Théry, L. (2008).
Dependent types, theorem proving, and applications for a verifying compiler. In Verified Software: Theories, Tools, Experiments: First IFIP TC 2/WG 2.3 Conference, VSTTE 2005, pages 173– 181, Berlin, Heidelberg. Springer-Verlag. Bibliography 177 [Blanchette and Nipkow, 2009] Blanchette, J. C. and Nipkow, T. (2009). Nitpick: A counterexample generator for higher-order logic based on a relational model finder. Technical report, In Tests and Proofs 2009: Short Papers, ETH. [Boulton, 1993] Boulton, R. J. (1993). Boyer-Moore Automation for the HOL System. In HOL’92: Proceedings of the IFIP TC10/WG10.2 Workshop on Higher Order Logic Theorem Proving and its Applications, pages 133–142. North- Holland/Elsevier. [Bove and Capretta, 2005] Bove, A. and Capretta, V. (2005). Modelling general re- cursion in type theory. Mathematical Structures in Computer Science, 15:671–708. Cambridge University Press. [Bove et al., 2009] Bove, A., Dybjer, P., and Norell, U. (2009). A brief overview of Agda - a functional language with dependent types. In TPHOLs, 22nd International Conference, LNCS 5674, pages 73–78. [Boyer and Moore, 1979] Boyer, R. S. and Moore, J. S. (1979). A Computational Logic. New York: Academic Press, Orlando. [Boyer and Moore, 1988] Boyer, R. S. and Moore, J. S. (1988). Integrating decision procedures into heuristic theorem provers: a case study of linear arithmetic. Ma- chine intelligence, 11:83–124. [Brady, 2007] Brady, E. (2007). Ivor, a proof engine. In Proceedings of Implementa- tion of Functional Languages, volume 4449 of Lecture Notes in Computer Science. Springer. [Brady, 2008] Brady, E. (2008). Idris, a language with dependent types. In IFL 2008. [Brady et al., 2008] Brady, E., Herrmann, C., and Hammond, K. (2008). Lightweight invariants with full dependent types. In Proceedings of TFP 2008. [Brady, 2005] Brady, E. C. (2005). Practical Implementation of a Dependently Typed Functional Programming Language. PhD thesis, Durham University. [Bundy, 1988] Bundy, A. 
(1988). The use of explicit plans to guide inductive proofs. In Proceedings of the 9th International Conference on Automated Deduction, pages 111–120, London, UK. Springer-Verlag. Bibliography 178 [Bundy, 2001] Bundy, A. (2001). The automation of proof by mathematical induction. In Robinson, A. and Voronkov, A., editors, Handbook of Automated Reasoning, volume I, chapter 13, pages 845–911. Elsevier Science. [Bundy et al., 2005] Bundy, A., Basin, D., Hutter, D., and Ireland, A. (2005). Rip- pling: Meta-Level Guidance for Mathematical Reasoning. Cambridge University Press. [Bundy et al., 1993] Bundy, A., Stevens, A., van Harmelen, F., Ireland, A., and Smaill, A. (1993). Rippling: a heuristic for guiding inductive proofs. Artif. Intell., 62(2):185–253. [Bundy et al., 1990] Bundy, A., van Harmelen, F., Horn, C., and Smaill, A. (1990). The Oyster-Clam System. In Proceedings of the 10th International Conference on Automated Deduction, pages 647–648, London, UK. Springer-Verlag. [Burton, 1982] Burton, F. W. (1982). An efficient functional implementation of FIFO queues. Inf. Process. Lett., 14(5):205–206. [Calcagno et al., 2003] Calcagno, C., Taha, W., Huang, L., and Leroy, X. (2003). Im- plementing multi-stage languages using ASTs, GenSym, and Reflection. In In Krzysztof Czarnecki, Frank Pfenning, and Yannis Smaragdakis, editors, Generative Programming and Component Engineering (GPCE), Lecture Notes in Computer Science, pages 57–76. Springer-Verlag. [Cardelli, 1994] Cardelli, L. (1994). The Quest language and system. [Carlier and Dubois, 2008] Carlier, M. and Dubois, C. (2008). Functional testing in the Focal environment. In Tests and Proofs, pages 84–98. [Chen and Xi, 2005] Chen, C. and Xi, H. (2005). Combining Programming with The- orem Proving. In Proceedings of the Tenth ACM SIGPLAN International Confer- ence on Functional Programming, pages 66–77, Tallinn, Estonia. [Chlipala, 2007] Chlipala, A. J. (2007). Position paper: Thoughts on programming with proof assistants. 
Electr. Notes Theor. Comput. Sci, 174(7):17–21. [Christian, 1993] Christian, J. (1993). Flatterms, discrimination nets, and fast term rewriting. Journal of Automated Reasoning, 10:95–113. 10.1007/BF00881866. Bibliography 179 [Church, 1940] Church, A. (1940). A formulation of the simple theory of types. The Journal of Symbolic Logic, 5(2):56–68. [Church, 1941] Church, A. (1941). The Calculi of Lambda Conversion. Princeton University Press. [Claessen and Hughes, 2000] Claessen, K. and Hughes, J. (2000). QuickCheck: a lightweight tool for random testing of haskell programs. In Proceedings of the ACM Sigplan International Conference on Functional Programming (ICFP-00), volume 35.9 of ACM Sigplan Notices, pages 268–279, N.Y. ACM Press. [Colton and Pease, 2005] Colton, S. and Pease, A. (2005). The TM System for Repair- ing Non-Theorems. Electronic Notes in Theoretical Computer Science, 125(3):87– 101. [Constable et al., 1986] Constable, R. L., Allen, S. F., Bromley, H. M., Cleaveland, W. R., Cremer, J. F., Harper, R. W., Howe, D. J., Knoblock, T. B., Mendler, N. P., Panangaden, P., Sasaki, J. T., and Smith, S. F. (1986). Implementing Mathematics with the Nuprl Development System. Prentice-Hall, NJ. [Coq development team, 2006] Coq development team (2006). The Coq proof assis- tant reference manual. LogiCal Project. Version 8.1. [Coquand, 1998] Coquand, C. (1998). The AGDA proof system homepage. http: //www.cs.chalmers.se/˜catarina/agda/. [Cornes and Terrasse, 1995] Cornes, C. and Terrasse, D. (1995). Automating inver- sion of inductive predicates in Coq. In TYPES, pages 85–104. [Cui et al., 2005] Cui, S., Donnelly, K., and Xi, H. (2005). ATS: A language that com- bines programming with theorem proving. In Lecture Notes in Computer Science, pages 310–320. Springer. [Dantzig and Eaves, 1973] Dantzig, G. B. and Eaves, B. C. (1973). Fourier-Motzkin elimination and its dual. Journal of Combinatorial Theory, Series. A, 14(3):288– 297. [de Bruijn, 1980] de Bruijn, N. (1980). 
A survey of the project AUTOMATH. In Seldin, J. P. and Hindley, J. R., editors, To H.B. Curry: Essays in Combinatory Logic, Lambda Calculus and Formalism, pages 579–606. Academic Press. Bibliography 180 [Delahaye, 2000] Delahaye, D. (2000). A Tactic Language for the System Coq. In Parigot, M. and Voronkov, A., editors, Logic for Programming and Automated Rea- soning (LPAR), volume 1955 of Lecture Notes in Computer Science (LNCS)/Lecture Notes in Artificial Intelligence (LNAI), pages 85–95, Reunion Island (France). Springer. [Denney, 2001] Denney, E. (2001). The synthesis of a Java card tokenization algo- rithm. In Automated Software Engineering, pages 43–50. IEEE Computer Society. [Denney et al., 2006] Denney, E., Power, J., and Tourlas, K. (2006). Hiproofs: A hierarchical notion of proof tree. Electronic Notes in Theoretical Computer Science, 155:341–359. [Dennis, 1998] Dennis, L. A. (1998). Proof Planning Coinduction. PhD thesis, Edin- burgh University. [Dennis and Dixon, 2009] Dennis, L. A. and Dixon, L. (2009). Adapting piecewise fertilisation to reason about hypotheses. In Hustadt, U., editor, Proceedings of the Automated Reasoning Workshop 2009. [Dennis and Nogueira, 2005] Dennis, L. A. and Nogueira, P. (2005). What can be learned from failed proofs of non-theorems? In Hurd, J., Smith, E., and Darbari, A., editors, Theorem Proving in Higher Order Logics (TPHOLs 2005): Emerging Trends Proceedings, pages 45–58. Technical Report PRG-RP-05-2, Oxford Univer- sity Computer Laboratory. [Dennis and Smaill, 2001] Dennis, L. A. and Smaill, A. (2001). Ordinal arithmetic: A case study for rippling in a higher order domain. In TPHOLs ’01: Proceedings of the 14th International Conference on Theorem Proving in Higher Order Logics, pages 185–200, London, UK. Springer-Verlag. [Detlefs et al., 2005] Detlefs, D., Nelson, G., and Saxe, J. B. (2005). Simplify: a theorem prover for program checking. J. ACM, 52(3):365–473. [Dixon, 2005] Dixon, L. (2005). 
A Proof Planning Framework for Isabelle. PhD thesis, University of Edinburgh. [Dixon and Fleuriot, 2003] Dixon, L. and Fleuriot, J. D. (2003). IsaPlanner: A proto- type proof planner in Isabelle. In Proceedings of CADE’03, volume 2741 of LNCS, pages 279–283. Bibliography 181 [Dixon and Fleuriot, 2004] Dixon, L. and Fleuriot, J. D. (2004). Higher order rippling in IsaPlanner. In Theorem Proving in Higher Order Logics, volume 3223 of LNCS, pages 83–98. [Dybjer, 1991] Dybjer, P. (1991). Inductive sets and families in Martin-Lo¨f’s type theory and their set-theoretic semantics. In Logical Frameworks, pages 280–306. Cambridge University Press. [Dybjer, 1994] Dybjer, P. (1994). Inductive families. Formal Asp. Comput., 6(4):440– 465. [Dybjer et al., 2003a] Dybjer, P., Haiyan, Q., and Takeyama, M. (2003a). Combin- ing testing and proving in dependent type theory. In 16th International Confer- ence on Theorem Proving in Higher Order Logics (TPHOLs 2003), pages 188–203. SpringerVerlag. [Dybjer et al., 2003b] Dybjer, P., Haiyan, Q., and Takeyama, M. (2003b). Verifying Haskell programs by combining testing and proving. Quality Software, Interna- tional Conference on, 0:272. [Fogarty and Pasalic, 2007] Fogarty, S. and Pasalic, E. (2007). Concoqtion: indexed types now. In In Workshop on Partial Evaluation and Semantics-Based Program Manipulation, pages 112–121. ACM Press. ISBN. [Gime´nez and Caste´ran, 2005] Gime´nez, E. and Caste´ran, P. (2005). A tutorial on [co-]inductive types in Coq. available at http://coq.inria.fr/doc. [Gonthier, 2007] Gonthier, G. (2007). The Four Colour Theorem: Engineering of a formal proof. In Kapur, D., editor, ASCM, volume 5081 of Lecture Notes in Computer Science, page 333. Springer. [Gow, 2004] Gow, J. (2004). The Dynamic Creation of Induction Rules Using Proof Planning. PhD thesis, School of Informatics, University of Edinburgh. [Griffioen and Huisman, 1998] Griffioen, D. and Huisman, M. (1998). A comparison of PVS and Isabelle/HOL. 
In Proceedings of the 11th International Conference on Theorem Proving in Higher Order Logics, pages 123–142, London, UK. Springer- Verlag. Bibliography 182 [Grobauer, 2001] Grobauer, B. (2001). Cost recurrences for DML programs. In ICFP ’01: Proceedings of the sixth ACM SIGPLAN international conference on Func- tional programming, pages 253–264, New York, NY, USA. ACM Press. [Gronski et al., 2006] Gronski, J., Knowles, K., Tomb, A., Freund, S. N., and Flana- gan, C. (2006). Sage: Hybrid checking for flexible specifications. In In Scheme and Functional Programming Workshop, pages 93–104. [Hallgren, 1998] Hallgren, T. (1998). The proof editor Alfa. http://www.cs. chalmers.se/˜hallgren/Alfa/. [Howard, 1980] Howard, W. (1980). The formulas-as-types notion of construction. In Seldin, J. P. and Hindley, J. R., editors, To H. B. Curry: Essays on Combinatory Logic, Lambda-Calculus, and Formalism, pages 479–490. Academic Press, New York, NY. [Hudak et al., 1992] Hudak, P., Jones, S. P., Wadler, P., Boutel, B., Fairbairn, J., Fasel, J., Guzma´n, M. M., Hammond, K., Hughes, J., Johnsson, T., Kieburtz, D., Nikhil, R., Partain, W., and Peterson, J. (1992). Report on the programming language Haskell: a non-strict, purely functional language version 1.2. SIGPLAN Not., 27(5):1–164. [Hurd, 2001] Hurd, J. (2001). Predicate subtyping with predicate sets. In TPHOLs, pages 265–280. [Hutter and Sengler, 1996] Hutter, D. and Sengler, C. (1996). INKA: The next gen- eration. In McRobbie, M. A. and Slaney, J. K., editors, CADE, volume 1104 of Lecture Notes in Computer Science, pages 288–292. Springer. [IBM, 1954] IBM (1954). Specifications for the IBM Mathematical FORmula TRANSlating system. Preliminary report, IBM Corp., Programming Research Group, Applied Sciences Division, New York, NY, USA. [Ireland, 1992] Ireland, A. (1992). The use of planning critics in mechanizing induc- tive proofs. In Logic Programming and Automated Reasoning, pages 178–189. [Ireland, 1995] Ireland, A. (1995). 
Rippling to meet the challenge. Edinburgh Dream Group Blue Book Note 1049. Bibliography 183 [Ireland and Bundy, 1996] Ireland, A. and Bundy, A. (1996). Productive use of failure in inductive proof. Journal of Automated Reasoning, 16:79–111. [Johansson, 2009] Johansson, M. (2009). Automated Discovery of Inductive Lemmas. PhD thesis, University of Edinburgh. [Johansson et al., 2006] Johansson, M., Bundy, A., and Dixon, L. (2006). Best-first rippling. In Stock, O. and Schaerf, M., editors, Reasoning, Action and Interaction in AI Theories and Systems, volume 4155 of Lecture Notes in Computer Science, pages 83–100. Springer. [Kammu¨ller, 2000] Kammu¨ller, F. (2000). Modular reasoning in Isabelle. In CADE- 17: Proceedings of the 17th International Conference on Automated Deduction, pages 99–114, London, UK. Springer-Verlag. [Kaufmann and Moore, 1997] Kaufmann, M. and Moore, J. S. (1997). An industrial strength theorem prover for a logic based on Common Lisp. IEEE Transactions on Software Engineering, 23(4):203–213. [Koopman et al., 2002] Koopman, P., Alimarine, A., Tretmans, J., and Plasmeijer, R. (2002). Gast: Generic automated software testing. In The 14th International Work- shop on the Implementation of Functional Languages, IFL02, Selected Papers, vol- ume 2670 of LNCS, pages 84–100. Springer. [Kreisel, 1965] Kreisel, G. (1965). Mathematical logic. In Saaty, T. L., editor, Lec- tures on Modern Mathematics, page 95. John Wiley and Sons, New York. [Leino et al., 2000] Leino, K. R. M., Nelson, G., and Saxe, J. B. (2000). ESC/Java user’s manual. Technical note, Compaq Systems Research Center. [Leroy, 2009] Leroy, X. (2009). Formal verification of a realistic compiler. Commun. ACM, 52(7):107–115. [Lindblad and Benke, 2006] Lindblad, F. and Benke, M. (2006). A tool for automated theorem proving in Agda. Lecture Notes in Computer Science, 3839/2006:154–169. [Loader, 1998] Loader, R. (1998). Notes on simply typed lambda calculus. 
Technical report, The University of Edinburgh. Report number ECS-LFCS-98-381. Bibliography 184 [Luo, 1994] Luo, Z. (1994). Computation and Reasoning: A Type Theory for Com- puter Science. Number 11 in International Series of Monographs on Computer Science. Oxford University Press. [Magaud, 2003] Magaud, N. (2003). Programming with dependent types in Coq: a study of square matrices. http://dpt-info.u-strasbg.fr/˜magaud/UNSW/ Coq/Matrices/. [Magnusson and Nordstro¨m, 1994] Magnusson, L. and Nordstro¨m, B. (1994). The ALF proof editor and its proof engine. In TYPES ’93: Proceedings of the interna- tional workshop on Types for proofs and programs, pages 213–237, Secaucus, NJ, USA. Springer-Verlag New York, Inc. [Martin-Lo¨f, 1971] Martin-Lo¨f, P. (1971). A theory of types. Manuscript. [McBride, 2000] McBride, C. (2000). Dependently Typed Functional Programs and their Proofs. PhD thesis, LFCS, University of Edinburgh, Edinburgh, Scotland. [McBride, 2004] McBride, C. (2004). Epigram: Practical programming with depen- dent types. In Advanced Functional Programming, pages 130–170. [McBride, 2005] McBride, C. (2005). The Epigram prototype: a nod and two winks. [McBride and McKinna, 2004] McBride, C. and McKinna, J. (2004). The view from the left. Journal of Functional Programing, 14(1):69–111. [McCaslan et al., 2007] McCaslan, R., Bundy, A., and Autexier, S. (2007). Auto- mated discovery of inductive theorems. In Zalewska, R. M. A., editor, From insight to proof - Jubilee Book for Andrzej Trybulec, volume 10 (23), pages 135–150. Uni- versity of Bialystok. [McCune, 2001] McCune, W. (2001). MACE 2.0 reference manual and guide. http: //arxiv.org/abs/cs/0106042. [Mckinna and Brady, 2005] Mckinna, J. and Brady, E. (2005). Phase distinctions in the compilation of Epigram. [McKinna and Wright, 2006] McKinna, J. and Wright, J. (2006). A type-correct, stack-safe, provably correct, expression compiler in Epigram. Bibliography 185 [Milner, 1978] Milner, R. (1978). 
A theory of type polymorphism in programming. Journal of Computer and System Sciences, 17:348–375. [Milner et al., 1997] Milner, R., Tofte, M., and Macqueen, D. (1997). The Definition of Standard ML. MIT Press, Cambridge, MA, USA. [Murthy, 1990] Murthy, C. R. (1990). Extracting constructive content from classical proofs. PhD thesis, Cornell University, Ithaca, NY, USA. [Nash, 2000] Nash, J. C. (2000). The (Dantzig) simplex method for linear program- ming. Computing in Science and Eng., 2(1):29–31. [Necula, 1997] Necula, G. C. (1997). Proof-carrying code. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Langauges (POPL ’97), pages 106–119, Paris. [Nipkow, 2004] Nipkow, S. B. T. (2004). Random testing in Isabelle/HOL. In Cuellar, J. and Liu, Z., editors, Software Engineering and Formal Methods (SEFM 2004), pages 230–239. IEEE Computer Society. [Nipkow et al., 2002] Nipkow, T., Paulson, L. C., and Wenzel, M. (2002). Is- abelle/HOL — A Proof Assistant for Higher-Order Logic, volume 2283 of LNCS. Springer. [Norell, 2007] Norell, U. (2007). Towards a practical programming language based on dependent type theory. PhD thesis, Department of Computer Science and Engi- neering, Chalmers University of Technology, SE-412 96 Go¨teborg, Sweden. [Okasaki, 1998] Okasaki, C. (1998). Purely Functional Data Structures. Cambridge University Press, Cambridge, England. [Owre, 2006] Owre, S. (2006). Random testing in PVS. In Workshop on Automated Formal Methods. [Owre et al., 1999] Owre, S., Shankar, N., Rushby, J. M., and Stringer-Calvert, D. W. J. (1999). PVS System Guide. Computer Science Laboratory, SRI International, Menlo Park, CA. [Papapanagiotou, 2007] Papapanagiotou, P. (2007). On the automation of inductive proofs in HOL light. Master Thesis, University of Edinburgh. Bibliography 186 [Parent, 1995] Parent, C. (1995). Synthesizing proofs from programs in the Calculus of Inductive Constructions. In MPC, pages 351–379. 
[Paulin-Mohring, 1989] Paulin-Mohring, C. (1989). Extraction de programmes dans le Calcul des Constructions. PhD thesis (Thèse d’université), Université Paris 7.
[Paulin-Mohring, 1993] Paulin-Mohring, C. (1993). Inductive definitions in the system Coq - rules and properties. In TLCA ’93: Proceedings of the International Conference on Typed Lambda Calculi and Applications, pages 328–345, London, UK. Springer-Verlag.
[Paulson, 1991] Paulson, L. C. (1991). Isabelle system for constructive type theory. http://www.cl.cam.ac.uk/Research/HVG/Isabelle/dist/library/CTT/.
[Pfenning, 1993] Pfenning, F. (1993). Refinement types for logical frameworks. In Informal Proceedings of the Workshop on Types for Proofs and Programs, pages 285–299.
[Pientka and Kreitz, 1998a] Pientka, B. and Kreitz, C. (1998a). Automating inductive specification proofs in Nuprl. Fundamenta Informaticae, 1(2):189–209.
[Pientka and Kreitz, 1998b] Pientka, B. and Kreitz, C. (1998b). Instantiation of existentially quantified variables in inductive specification proofs. In AISC ’98: Proceedings of the International Conference on Artificial Intelligence and Symbolic Computation, pages 247–258, London, UK. Springer-Verlag.
[Pierce, 2002] Pierce, B. C. (2002). Types and programming languages. MIT Press, Cambridge, MA, USA.
[Pollack, 1994] Pollack, R. (1994). The Theory of LEGO: A Proof Checker for the Extended Calculus of Constructions. PhD thesis, University of Edinburgh.
[Qiao Haiyan, 2003] Qiao Haiyan (2003). Testing and Proving in Dependent Type Theory. PhD thesis, School of Computer Science and Engineering, Chalmers University of Technology.
[Runciman et al., 2008] Runciman, C., Naylor, M., and Lindblad, F. (2008). SmallCheck and Lazy SmallCheck: automatic exhaustive testing for small values. In Haskell ’08: Proceedings of the first ACM SIGPLAN symposium on Haskell, pages 37–48, New York, NY, USA. ACM.
[Shankar and Owre, 1999] Shankar, N. and Owre, S. (1999). Principles and pragmatics of subtyping in PVS. In Bert, D., Choppy, C., and Mosses, P., editors, Recent Trends in Algebraic Development Techniques, WADT ’99, volume 1827 of Lecture Notes in Computer Science, pages 37–52, Toulouse, France. Springer-Verlag.
[Slaney, 1994] Slaney, J. (1994). FINDER: Finite domain enumerator system description. In Bundy, A., editor, Automated Deduction-CADE-12, pages 798–801. Springer, Berlin, Heidelberg.
[Slind et al., 1998] Slind, K., Gordon, M., Boulton, R., and Bundy, A. (1998). System description: An interface between CLAM and HOL. In Kirchner, C. and Kirchner, H., editors, Proceedings of the Fifteenth International Conference on Automated Deduction (CADE-15), volume 1421 of Lecture Notes in Artificial Intelligence, pages 134–138, Lindau, Germany. Springer.
[Smaill and Green, 1996] Smaill, A. and Green, I. (1996). Higher-order annotated terms for proof search. In TPHOLs ’96: Proceedings of the 9th International Conference on Theorem Proving in Higher Order Logics, pages 399–413, London, UK. Springer-Verlag.
[Sozeau, 2007a] Sozeau, M. (2007a). Program-ing Finger Trees in Coq. In ICFP’07: Proceedings of the 2007 ACM SIGPLAN International Conference on Functional Programming, pages 13–24. ACM Press.
[Sozeau, 2007b] Sozeau, M. (2007b). Subset coercions in Coq. In TYPES’06, volume 4502 of Lecture Notes in Computer Science, pages 237–252. Springer.
[Sozeau, 2008] Sozeau, M. (2008). Un environnement pour la programmation avec types dépendants. PhD thesis, Université Paris-Sud.
[Sozeau, 2009] Sozeau, M. (2009). A New Look at Generalized Rewriting in Type Theory. Journal of Formalized Reasoning, 2(1):41–62.
[van Deursen et al., 1993] van Deursen, A., Klint, P., and Tip, F. (1993). Origin tracking. Journal of Symbolic Computation, 15(5-6):523–545.
[Walther, 1994] Walther, C. (1994). Mathematical induction. In Handbook of logic in artificial intelligence and logic programming, pages 127–228. Oxford University Press, Inc., New York, NY, USA.
[Weber, 2008] Weber, T. (2008). SAT-based Finite Model Generation for Higher-Order Logic. PhD thesis, Institut für Informatik, Technische Universität München, Germany.
[Werner, 1994] Werner, B. (1994). Méta-théorie du Calcul des Constructions Inductives. PhD thesis, Université Paris VII.
[Whittle and Cumming, 2000] Whittle, J. and Cumming, A. (2000). Evaluating environments for functional programming. International Journal of Human-Computer Studies, 52:847–878.
[Wilson et al., 2010a] Wilson, S., Fleuriot, J., and Smaill, A. (2010a). Automation for dependently typed functional programming. To appear in: Special Issue of Fundamenta Informaticae on Dependently Typed Programming.
[Wilson et al., 2010b] Wilson, S., Fleuriot, J., and Smaill, A. (2010b). Inductive proof automation for Coq. In Proceedings of the 2nd Coq Workshop, EPTCS.
[Xi, 1998] Xi, H. (1998). Dependent Types in Practical Programming. PhD thesis, Carnegie Mellon University.
[Xi, 1999a] Xi, H. (1999a). Dead code elimination through dependent types. Lecture Notes in Computer Science, 1551:228–242.
[Xi, 1999b] Xi, H. (1999b). Dependently Typed Data Structures. In Proceedings of Workshop of Algorithmic Aspects of Advanced Programming Languages (WAAAPL ’99), pages 17–32, Paris.
[Xi, 2000] Xi, H. (2000). Imperative programming with dependent types. In LICS ’00: Proceedings of the 15th Annual IEEE Symposium on Logic in Computer Science, page 375, Washington, DC, USA. IEEE Computer Society.
[Xi, 2001] Xi, H. (2001). Dependent types for program termination verification. In Proceedings of 16th IEEE Symposium on Logic in Computer Science, Boston.
[Xi, 2010] Xi, H. (2010). The ATS Programming Language.
[Xi and Harper, 2001] Xi, H. and Harper, R. (2001). A dependently typed assembly language. In International Conference on Functional Programming, pages 169–180.
[Xi and Pfenning, 1998] Xi, H. and Pfenning, F. (1998). Eliminating array bound checking through dependent types. In SIGPLAN Conference on Programming Language Design and Implementation, pages 249–257.
[Xi and Pfenning, 1999] Xi, H. and Pfenning, F. (1999). Dependent types in practical programming. In Conference Record of POPL 99: The 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Antonio, Texas, pages 214–227, New York, NY.

Appendix A Function and Type Definitions

The following sections give the definitions of the functions and types used in this thesis.

A.1 Peano Arithmetic

Inductive nat : Set :=
  | O : nat
  | S : nat → nat.

Fixpoint plus (n m : nat) : nat :=
  match n with
  | O ⇒ m
  | S p ⇒ S (plus p m)
  end.
Infix "+" := plus.

Fixpoint minus (n m : nat) {struct n} : nat :=
  match n, m with
  | O, _ ⇒ 0
  | S k, O ⇒ S k
  | S k, S l ⇒ minus k l
  end.
Infix "-" := minus.

Fixpoint mult (n m : nat) : nat :=
  match n with
  | O ⇒ 0
  | S p ⇒ m + mult p m
  end.
Infix "*" := mult.

Fixpoint pow (r n : nat) : nat :=
  match n with
  | O ⇒ 1
  | S n ⇒ r * pow r n
  end.
Infix "^" := pow.

Fixpoint max (n m : nat) : nat :=
  match n, m with
  | O, _ ⇒ m
  | S n', O ⇒ n
  | S n', S m' ⇒ S (max n' m')
  end.

Fixpoint min (n m : nat) : nat :=
  match n, m with
  | O, _ ⇒ 0
  | S n', O ⇒ 0
  | S n', S m' ⇒ S (min n' m')
  end.

le_gt_dec : ∀ (n m : nat), {n ≤ m} + {n > m}
nat_eq_dec : ∀ (n m : nat), {n = m} + {n ≠ m}

A.2 Lists

Inductive list (A : Type) : Type :=
  | nil : list A
  | cons : A → list A → list A.

Inductive vect (A : Set) : nat → Set :=
  | vnil : vect A O
  | vcons : ∀ (n : nat), A → vect A n → vect A (S n).

Fixpoint length (A : Type) (a : list A) : nat :=
  match a with
  | [] ⇒ O
  | h :: t ⇒ S (length t)
  end.

Fixpoint app (A : Type) (a b : list A) : list A :=
  match a with
  | [] ⇒ b
  | h :: t ⇒ h :: app t b
  end.
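The vect family above records the length of a list in its type index. As a rough illustration of what this buys, the following sketch (written in plain ASCII Coq syntax so that it can be loaded directly) restates vect, defines an append whose result type is computed from the indices of its arguments, and converts vectors back to ordinary lists. The names vapp, to_list and to_list_length are hypothetical and are not part of the definitions used in the case studies.

```coq
(* Illustrative sketch only: vapp, to_list and to_list_length are
   hypothetical names, not definitions from this appendix. *)
Require Import List.
Import ListNotations.

(* The vect family of Appendix A.2, in ASCII syntax. *)
Inductive vect (A : Set) : nat -> Set :=
| vnil : vect A O
| vcons : forall (n : nat), A -> vect A n -> vect A (S n).

Arguments vnil {A}.
Arguments vcons {A} n _ _.

(* Appending a vector of length n to one of length m yields length n + m.
   The "in ... return" clause lets each branch refine the index n0 of v. *)
Fixpoint vapp {A : Set} {n m : nat} (v : vect A n) (w : vect A m) {struct v}
  : vect A (n + m) :=
  match v in vect _ n0 return vect A (n0 + m) with
  | vnil => w                                  (* 0 + m reduces to m *)
  | vcons k h t => vcons (k + m) h (vapp t w)  (* S k + m reduces to S (k + m) *)
  end.

(* Forget the index and recover an ordinary list. *)
Fixpoint to_list {A : Set} {n : nat} (v : vect A n) {struct v} : list A :=
  match v with
  | vnil => []
  | vcons _ h t => h :: to_list t
  end.

(* The index really is the length of the underlying list. *)
Lemma to_list_length : forall (A : Set) (n : nat) (v : vect A n),
  length (to_list v) = n.
Proof.
  intros A n v.
  induction v as [| k h t IH]; simpl.
  - reflexivity.
  - now rewrite IH.
Qed.
```

Because the length is part of the type, no separate lemma is needed to know that vapp returns a vector of length n + m; to_list_length shows the kind of proof obligation that appears once such vectors are converted back to plain lists.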
Fixpoint rev (a : list A) : list A :=
  match a with
  | nil ⇒ nil
  | h :: t ⇒ rev t ++ h :: nil
  end.

Fixpoint sum (a : list nat) : nat :=
  match a with
  | [] ⇒ 0
  | h :: t ⇒ h + sum t
  end.

Fixpoint fold_left (A B : Type) (f : A → B → A) (a : list B) (i : A) : A :=
  match a with
  | nil ⇒ i
  | cons h t ⇒ fold_left f t (f i h)
  end.

Fixpoint list_count (a : list nat) (x : nat) : nat :=
  match a with
  | nil ⇒ 0
  | h :: t ⇒ if nat_eq_dec h x then S (list_count t x) else list_count t x
  end.

Notation list_perm x y := (∀ n, list_count x n = list_count y n).

A.3 Binary Trees

Inductive btree (A : Type) : Type :=
  | empty : btree A
  | node : A → btree A → btree A → btree A.

Fixpoint inorder (a : btree A) : list A :=
  match a with
  | empty ⇒ []
  | node v l r ⇒ (inorder l) ++ [v] ++ (inorder r)
  end.

Fixpoint fold_left (A B : Type) (f : B → A → B) (l : btree A) (i : B) : B :=
  match l with
  | empty ⇒ i
  | node v l r ⇒ fold_left f r (f (fold_left f l i) v)
  end.

Fixpoint btree_count (a : btree nat) (x : nat) : nat :=
  match a with
  | empty ⇒ 0
  | node v l r ⇒
      let count_lr := btree_count l x + btree_count r x in
      if nat_eq_dec v x then S count_lr else count_lr
  end.

Notation btree_perm x y := (∀ n, btree_count x n = btree_count y n).

A.4 IsaPlanner Theorem Corpus Definitions

The following Coq definitions were used for the IsaPlanner theorem corpus experiment (see §7.14):

Fixpoint last (l : list A) (d : A) : A :=
  match l with
  | [] ⇒ d
  | [a] ⇒ a
  | a :: l ⇒ last l d
  end.

Fixpoint less_eq m n {struct m} : Prop :=
  match m, n with
  | 0, _ ⇒ True
  | S m', 0 ⇒ False
  | S m', S n' ⇒ (less_eq m' n')
  end.

Fixpoint less m n {struct m} : Prop :=
  match m, n with
  | _, 0 ⇒ False
  | 0, S n' ⇒ True
  | S m', S n' ⇒ (less m' n')
  end.

Lemma less_eq_dec : ∀ (x y : nat), {less_eq x y} + {~ less_eq x y}.
  induction x; induction y; simpl in *; try tauto.
  apply IHx.
Defined.

Lemma less_dec : ∀ (x y : nat), {less x y} + {~ less x y}.
  induction x; induction y; simpl in *; try tauto.
  apply IHx.
Defined.

Fixpoint max n m {struct n} : nat :=
  match n, m with
  | O, _ ⇒ m
  | S n', O ⇒ n
  | S n', S m' ⇒ S (max n' m')
  end.

Fixpoint min n m {struct n} : nat :=
  match n, m with
  | O, _ ⇒ 0
  | S n', O ⇒ 0
  | S n', S m' ⇒ S (min n' m')
  end.

Inductive btree (A : Type) : Type :=
  | empty : btree A
  | node : A → btree A → btree A → btree A.

Fixpoint mirror (A : Type) (a : btree A) : btree A :=
  match a with
  | empty ⇒ empty A
  | node v l r ⇒ node v (mirror r) (mirror l)
  end.

Fixpoint height (A : Type) (a : btree A) : nat :=
  match a with
  | empty ⇒ 0
  | node v l r ⇒ 1 + max (height l) (height r)
  end.

Fixpoint drop (A : Type) (n : nat) (a : list A) {struct a} : list A :=
  match a with
  | nil ⇒ []
  | h :: t ⇒
      match n with
      | O ⇒ a
      | S p ⇒ drop p t
      end
  end.

Fixpoint take (A : Type) (n : nat) (a : list A) {struct a} : list A :=
  match a with
  | nil ⇒ []
  | h :: t ⇒
      match n with
      | O ⇒ []
      | S p ⇒ h :: (take p t)
      end
  end.

Fixpoint takeWhile (A : Type) (P : A → bool) (a : list A) {struct a} : list A :=
  match a with
  | nil ⇒ []
  | h :: t ⇒ if P h then h :: (takeWhile P t) else []
  end.

Fixpoint dropWhile (A : Type) (P : A → bool) (a : list A) {struct a} : list A :=
  match a with
  | nil ⇒ []
  | h :: t ⇒ if P h then (dropWhile P t) else a
  end.

Fixpoint butlast (A : Type) (a : list A) : list A :=
  match a with
  | nil ⇒ []
  | h :: t ⇒
      match t with
      | [] ⇒ []
      | _ ⇒ h :: (butlast t)
      end
  end.
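Theorem 24 of the IsaPlanner corpus in Appendix C, take n xs ++ drop n xs = xs, relates the take and drop definitions above. The sketch below, again in ASCII Coq syntax, restates those two functions with an implicit type argument and proves the theorem by hand. It only illustrates the shape of the inductive proof (generalising over n, induction on the list, then a case split on n); it is not the proof found by the prover described in this thesis. Note that the {struct a} annotations matter: without them Coq may choose n as the decreasing argument, and take n [] would then no longer reduce to [] for a variable n.

```coq
(* Hand-written sketch; not output of the prover described in this thesis. *)
Require Import List.
Import ListNotations.

(* take and drop as in Appendix A.4: recurse on the list, then inspect n. *)
Fixpoint take {A : Type} (n : nat) (a : list A) {struct a} : list A :=
  match a with
  | [] => []
  | h :: t =>
      match n with
      | O => []
      | S p => h :: take p t
      end
  end.

Fixpoint drop {A : Type} (n : nat) (a : list A) {struct a} : list A :=
  match a with
  | [] => []
  | h :: t =>
      match n with
      | O => a
      | S p => drop p t
      end
  end.

(* Corpus theorem 24: take n xs ++ drop n xs = xs. *)
Theorem take_drop : forall (A : Type) (n : nat) (xs : list A),
  take n xs ++ drop n xs = xs.
Proof.
  intros A n xs.
  revert n.                       (* generalise n so the IH holds for every n *)
  induction xs as [| h t IH]; intros n; simpl.
  - reflexivity.                  (* both sides reduce to [] *)
  - destruct n as [| p]; simpl.
    + reflexivity.                (* take 0 gives [], drop 0 keeps the list *)
    + now rewrite IH.             (* h :: (take p t ++ drop p t) = h :: t *)
Qed.
```

Generalising n before the induction is the step that makes the induction hypothesis strong enough: in the cons case the recursive calls use the predecessor p rather than the original n.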
Fixpoint member (A : Type) (eqA : ∀ (x y : A), {x = y} + {x ≠ y}) (x : A) (a : list A) {struct a} : Prop :=
  match a with
  | nil ⇒ False
  | h :: t ⇒ if eqA x h then True else (member eqA x t)
  end.

Fixpoint insert (x : nat) (a : list nat) : list nat :=
  match a with
  | nil ⇒ [x]
  | h :: t ⇒ if less_dec x h then x :: a else h :: (insert x t)
  end.

Fixpoint insert1' (x : nat) (a : list nat) : list nat :=
  match a with
  | nil ⇒ [x]
  | h :: t ⇒ if eq_nat_decide x h then x :: t else h :: (insert1' x t)
  end.

Fixpoint sorted (a : list nat) : Prop :=
  match a with
  | nil ⇒ True
  | h1 :: t1 ⇒
      match t1 with
      | nil ⇒ True
      | h2 :: t2 ⇒ if less_eq_dec h1 h2 then (sorted t1) else False
      end
  end.

Fixpoint count (A : Type) (eqA : ∀ (x y : A), {x = y} + {x ≠ y}) (x : A) (a : list A) : nat :=
  match a with
  | nil ⇒ 0
  | h :: t ⇒ if eqA x h then S (count eqA x t) else (count eqA x t)
  end.

Fixpoint insort (x : nat) (a : list nat) {struct a} : list nat :=
  match a with
  | nil ⇒ [x]
  | h :: t ⇒ if less_eq_dec x h then x :: a else h :: (insort x t)
  end.

Fixpoint sort (a : list nat) : list nat :=
  match a with
  | nil ⇒ []
  | h :: t ⇒ insort h (sort t)
  end.

Fixpoint zip (l : list A) (l' : list B) : list (A * B) :=
  match l, l' with
  | x :: tl, y :: tl' ⇒ (x, y) :: (zip tl tl')
  | _, _ ⇒ nil
  end.

Appendix B Case Study Results

The table in this appendix gives the results of running our prover against the proof obligations generated from the case study programs of Chapter 9. The labels in the header of the table denote which configuration of our prover was used for each set of results:

DiscoverAndCache (DAC): The lemmas found by the lemma discovery tool are available for use by the prover. The lemma cache is cleared after each verification task (where the tasks are attempted in the order given in the table). This is the configuration used when conducting the case studies described in detail in Chapter 9.

DiscoverOnly (DO): The lemmas found by the lemma discovery tool are available for use by the prover. Lemmas are not cached after proof attempts.

CacheOnly (CO): The lemmas found by the lemma discovery tool are not available for use by the prover. Lemmas are cached after proof attempts. The lemma cache is not cleared after each verification task (where the tasks are attempted in the order given in the table).

BasicDefs (BD): The prover only uses basic definitions during proofs (i.e. it is not allowed to use cached lemmas).

The ST label denotes the results of our prover when run against simply typed versions of the goals from the case studies (see §9.8.1), using the same configuration as BD above. The IsaP label denotes results from IsaPlanner (configured to only use basic definitions) running against Isabelle versions of these goals. In the table, ✓ marks a goal that was proved automatically and ✗ a goal that was not.

Goal             DAC     DO      CO      BD      ST      IsaP
Tail recursive sum (without fold):
1                ✓       ✓       ✓       ✓       ✓       ✓
2                ✓       ✓       ✓       ✓       ✓       ✗
Tail recursive sum without fold (variant):
3                ✓       ✓       ✓       ✓       ✓       ✗
4                ✓       ✓       ✓       ✓       ✓       ✗
Tail recursive sum with fold:
5                ✓       ✓       ✓       ✓       ✓       ✗
Tail recursive factorial without fold:
6                ✓       ✓       ✓       ✓       ✓       ✓
7                ✓       ✓       ✓       ✓       ✓       ✗
8                ✓       ✓       ✓       ✓       ✓       ✗
Tail recursive factorial without fold (variant):
9                ✓       ✓       ✓       ✓       ✓       ✓
10               ✓       ✓       ✓       ✗       ✗       ✗
11               ✓       ✓       ✓       ✓       ✓       ✗
Tail recursive factorial with fold:
12               ✗       ✗       ✗       ✗       ✗       ✗
Tail recursive factorial with fold (variant):
13               ✗       ✗       ✗       ✗       ✗       ✗
14               ✓       ✓       ✓       ✓       ✓       ✗
Tail recursive inorder without fold:
15               ✓       ✓       ✓       ✓       ✓       ✗
16               ✓       ✓       ✓       ✓       ✓       ✗
Tail recursive inorder with fold:
17               ✗       ✗       ✗       ✗       ✗       ✗
Tail recursive inorder with fold (variant):
18               ✓       ✓       ✓       ✓       ✓       ✗
19               ✓       ✓       ✓       ✓       ✓       ✗
Insertion sort (length property):
20               ✓       ✓       ✓       ✓       ✓       ✓
21               ✓       ✓       ✓       ✓       ✓       ✗
Insertion sort (length property variant):
22               ✓       ✓       ✓       ✓       ✓       ✗
Insertion sort (permutation property):
23               ✓       ✓       ✓       ✓       ✓       ✗
24               ✓       ✓       ✓       ✓       ✓       ✗
Insertion sort (permutation property variant):
25               ✓       ✓       ✓       ✓       ✓       ✗
Treesort (length property):
26               ✓       ✓       ✓       ✓       ✓       ✓
27               ✓       ✓       ✓       ✓       ✓       ✓
28               ✓       ✓       ✓       ✓       ✓       ✗
29               ✓       ✓       ✓       ✓       ✓       ✗
Treesort (permutation property):
30               ✓       ✓       ✓       ✓       ✓       ✗
31               ✓       ✓       ✓       ✓       ✓       ✗
32               ✓       ✓       ✓       ✓       ✓       ✗
33               ✗       ✗       ✗       ✗       ✗       ✗
Treesort (permutation property variant):
34               ✓       ✓       ✓       ✓       ✓       ✗
35               ✓       ✓       ✓       ✓       ✓       ✗
36               ✗       ✗       ✗       ✗       ✗       ✗
37               ✓       ✓       ✓       ✓       ✓       ✗
Quicksort (length property):
38               ✓       ✓       ✓       ✓       ✓       ✓
39               ✓       ✓       ✓       ✓       ✓       ✗
40               ✗       ✗       ✗       ✗       ✗       ✗
41               ✗       ✗       ✗       ✗       ✗       ✗
42               ✓       ✓       ✓       ✓       ✓       ✗
Quicksort (permutation property):
43               ✓       ✓       ✓       ✓       ✓       ✗
44               ✓       ✓       ✓       ✓       ✓       ✗
45               ✗       ✗       ✗       ✗       ✗       ✗
46               ✗       ✗       ✗       ✗       ✗       ✗
47               ✗       ✗       ✗       ✗       ✗       ✗
Binary adder using inductive families:
48               ✓       ✓       ✓       ✗       ✗       ✗
49               ✓       ✓       ✓       ✗       ✗       ✗
50               ✓       ✓       ✓       ✓       ✓       ✓
51               ✓       ✓       ✓       ✓       ✓       ✗
52               ✓       ✓       ✓       ✗       ✗       ✗
53               ✓       ✓       ✓       ✓       ✓       ✗
54               ✓       ✓       ✓       ✓       ✓       ✗
Binary adder using inductive families (variant):
55               ✓       ✓       ✓       ✗       ✗       ✗
56               ✓       ✓       ✓       ✗       ✗       ✗
57               ✓       ✓       ✓       ✓       ✓       ✗
58               ✓       ✓       ✓       ✗       ✗       ✗
59               ✓       ✓       ✓       ✓       ✓       ✗
60               ✓       ✓       ✓       ✓       ✓       ✗
Binary adder using subset types:
61               ✓       ✓       ✓       ✗       ✗       ✗
62               ✓       ✓       ✓       ✗       ✗       ✗
63               ✓       ✓       ✓       ✓       ✗       ✗
64               ✓       ✓       ✓       ✓       ✓       ✗
65               ✓       ✓       ✓       ✓       ✓       ✗
66               ✓       ✓       ✓       ✓       ✓       ✗
67               ✗       ✗       ✗       ✗       ✗       ✗
Total successes  56      56      56      47      46      8
Total failures   11      11      11      20      21      59
Time (s)         107.51  103.83  138.71  122.93  149.59  341.25

Appendix C IsaPlanner Theorem Corpus Experiment

The following table contains the experimental results generated from running our prover against a theorem corpus that has been used to evaluate IsaPlanner (see §7.14). For successfully automated theorems (indicated with a tick mark, ✓), the time indicates how long our prover took to find a proof. For theorems that could not be automated (indicated with a cross mark, ✗), the time indicates how long the prover took to fail.

No.  Theorem                                           Result  Time (s)
01   m − m = 0                                         ✓       0.01
02   n − (n + m) = 0                                   ✓       0.06
03   n + m − n = m                                     ✓       0.06
04   k + m − (k + n) = m − n                           ✓       0.08
05   i − j − k = i − (j + k)                           ✓       0.07
06   less_eq n 0 ↔ n = 0                               ✗       0.47
07   less_eq n (n + m)                                 ✓       0.20
08   less i (S (i + m))                                ✓       0.01
09   max a b = max b a                                 ✓       0.06
10   max (max a b) c = max a (max b c)                 ✓       0.09
11   max a b = a ↔ less_eq b a                         ✗       0.67
12   max a b = b ↔ less_eq a b                         ✗       0.12
13   min a b = min b a                                 ✓       0.06
14   min (min a b) c = min a (min b c)                 ✓       0.07
15   min a b = a ↔ less_eq a b                         ✗       0.80
16   min a b = b ↔ less_eq b a                         ✗       0.83
17   drop 0 xs = xs                                    ✓       0.02
18   drop (S n) (x :: xs) = drop n xs                  ✓       0.00
19   drop n (map f xs) = map f (drop n xs)             ✓       11.77
20   len (drop n xs) = len xs − n                      ✓       0.09
21   take 0 xs = []                                    ✓       0.01
22   take (S n) (x :: xs) = x :: take n xs             ✓       0.00
23   take n (map f xs) = map f (take n xs)             ✓       14.26
24   take n xs ++ drop n xs = xs                       ✓       0.22
25   zip [] ys = []                                    ✓       0.00
26   zip (x :: xs) ys = match ys with [] ⇒ [] | (z :: zs) ⇒ (x, z) :: zip xs zs end  ✓       0.00
27   zip (x :: xs) (y :: ys) = (x, y) :: zip xs ys     ✓       0.00
28   height (mirror t) = height t                      ✓       0.20
29   member x (l ++ (x :: []))                         ✓       0.06
30   ~member x (delete x l)                            ✓       0.08
31   member x l → member x (l ++ t)                    ✗       0.01
32   member x t → member x (l ++ t)                    ✗       0.01
33   member x (insert x l)                             ✓       0.45
34   member x (insert1' x l)                           ✓       0.25
35   len (insert x l) = S (len l)                      ✓       0.21
36   len (sort l) = len l                              ✓       0.32
37   xs = [] → last (x :: xs) default = x              ✓       0.00
38   1 + count n l = count n (n :: l)                  ✓       0.01
39   n = x → 1 + count n l = count n (x :: l)          ✓       0.01
40   count n l + count n m = count n (l ++ m)          ✓       0.22
41   count n (x ++ (n :: [])) = S (count n x)          ✓       0.19
42   count n (h :: []) + count n t = count n (h :: t)  ✓       0.00
43   less_eq (count n l) (count n (l ++ m))            ✗       0.26
44   dropWhile (fun _ ⇒ false) xs = xs                 ✓       0.02
45   takeWhile (fun _ ⇒ true) xs = xs                  ✓       0.04
46   takeWhile P xs ++ dropWhile P xs = xs             ✓       1.81
47   filter P (xs ++ ys) = filter P xs ++ filter
P ys 3 0.21 48 m + n − n = m 5 0.38 49 S m − n − S k = m − n − k 5 10.30 Continued on next page Appendix C. IsaPlanner Theorem Corpus Experiment 206 No. Theorem Result Time (s) 50 less i (S (m + i) ) 5 0.06 51 less eq n (m + n) 5 0.06 52 less eq m n → less eq m (S n) 5 0.02 53 drop n (drop m xs) = drop (n + m) xs 5 0.14 54 drop n (xs ++ ys) = drop n xs ++ drop (n − len xs) ys 5 0.29 55 drop n (take m xs) = take (m − n)(drop n xs) 5 0.14 56 drop n (zip xs ys) = zip (drop n xs) (drop n ys) 5 0.19 57 rev (drop i xs) = take (len xs − i) (rev xs) 5 0.15 58 rev (take i xs) = drop (len xs − i) (rev xs) 5 0.12 59 rev ( filter P xs) = filter P (rev xs) 5 0.26 60 take n (xs ++ ys) = take n xs ++ take (n − len xs) ys 5 0.23 61 take n (drop m xs) = drop m (take (n + m) xs) 5 0.16 62 take n (zip xs ys) = zip (take n xs) (take n ys) 5 0.16 63 less eq (len ( filter P xs)) (len xs) 5 1.41 64 zip (xs ++ ys) zs = zip xs (take (len xs) zs) ++ 5 0.53 zip ys (drop (len xs) zs) 65 zip xs (ys ++ zs) = zip (take (len ys) xs) ys ++ 5 0.43 zip (drop (len ys) xs) zs 66 len xs = len ys → zip (rev xs) (rev ys) = rev (zip xs ys) 5 0.13 67 less eq (len (delete x l ) ) (len l ) 5 0.14 68 less x y →member x (insert y l ) = member x l 5 12.59 69 x 6= y →member x (insert y l ) = member x l 3 2.50 70 sorted l → sorted ( insert x l ) 5 0.01 71 sorted (sort l ) 5 0.03 72 last (xs ++ (x :: []) ) default = x 5 0.07 73 xs 6= [] → last (x :: xs) default = last xs default 5 0.02 74 ys0 = [] → last (xs ++ ys0) default = last xs default 3 0.04 75 ys 6= [] → last (xs ++ ys) default = last ys default 5 0.25 76 last (xs ++ ys) default = match ys with [] ⇒ 3 0.13 ( last xs default ) | ⇒ ( last ys default ) end 77 less n (len xs) → last (drop n xs) default = last xs default 5 0.25 78 butlast (xs ++ (x :: []) ) = xs 5 0.08 79 xs 6= [] → butlast xs ++ ( last xs default :: []) = xs 5 0.05 80 butlast (xs ++ ys) = match ys with [] ⇒ 3 0.15 ( butlast xs) | ⇒ (xs ++ butlast ys) end Continued on next page Appendix C. 
IsaPlanner Theorem Corpus Experiment 207 No. Theorem Result Time (s) 81 butlast xs = take (len xs − 1) xs 5 0.09 82 len ( butlast xs) = len xs − 1 5 0.07 83 less eq (len (delete x l ) ) (len l ) 5 0.10 84 count n t + count n (h :: []) = count n (h :: t ) 3 0.14 85 count n l = count n (rev l ) 5 0.67 86 count x l = count x (sort l ) 5 1.65 87 n 6= h → count n (x ++ (h :: []) ) = count n x 3 0.29