annotate src/Large.v @ 241:cb3f3ef9d5bb

slow
author Adam Chlipala <adamc@hcoop.net>
date Wed, 09 Dec 2009 11:36:37 -0500
parents b28c6e6eca0c
children 5a32784e30f3
rev   line source
adamc@235 1 (* Copyright (c) 2009, Adam Chlipala
adamc@235 2 *
adamc@235 3 * This work is licensed under a
adamc@235 4 * Creative Commons Attribution-Noncommercial-No Derivative Works 3.0
adamc@235 5 * Unported License.
adamc@235 6 * The license text is available at:
adamc@235 7 * http://creativecommons.org/licenses/by-nc-nd/3.0/
adamc@235 8 *)
adamc@235 9
adamc@235 10 (* begin hide *)
adamc@236 11 Require Import Arith.
adamc@236 12
adamc@235 13 Require Import Tactics.
adamc@235 14
adamc@235 15 Set Implicit Arguments.
adamc@235 16 (* end hide *)
adamc@235 17
adamc@235 18
adamc@235 19 (** %\chapter{Proving in the Large}% *)
adamc@235 20
adamc@236 21 (** It is somewhat unfortunate that the term "theorem-proving" looks so much like the word "theory." Most researchers and practitioners in software assume that mechanized theorem-proving is profoundly impractical. Indeed, until recently, most advances in theorem-proving for higher-order logics have been largely theoretical. However, starting around the beginning of the 21st century, there was a surge in the use of proof assistants in serious verification efforts. That line of work is still quite new, but I believe it is not too soon to distill some lessons on how to work effectively with large formal proofs.
adamc@236 22
adamc@236 23 Thus, this chapter gives some tips for structuring and maintaining large Coq developments. *)
adamc@236 24
adamc@236 25
adamc@236 26 (** * Ltac Anti-Patterns *)
adamc@236 27
adamc@237 28 (** In this book, I have been following an unusual style, where proofs are not considered finished until they are "fully automated," in a certain sense. Each such theorem is proved by a single tactic. Since Ltac is a Turing-complete programming language, it is not hard to squeeze arbitrary heuristics into single tactics, using operators like the semicolon to combine steps. In contrast, most Ltac proofs "in the wild" consist of many steps, performed by individual tactics followed by periods. Is it really worth drawing a distinction between proof steps terminated by semicolons and steps terminated by periods?
adamc@236 29
adamc@237 30 I argue that this is, in fact, a very important distinction, with serious consequences for a majority of important verification domains. The more uninteresting drudge work a proof domain involves, the more important it is to work to prove theorems with single tactics. From an automation standpoint, single-tactic proofs can be extremely effective, and automation becomes more and more critical as proofs are populated by more uninteresting detail. In this section, I will give some examples of the consequences of more common proof styles.
adamc@236 31
adamc@236 32 As a running example, consider a basic language of arithmetic expressions, an interpreter for it, and a transformation that scales up every constant in an expression. *)
adamc@236 33
adamc@236 34 Inductive exp : Set :=
adamc@236 35 | Const : nat -> exp
adamc@236 36 | Plus : exp -> exp -> exp.
adamc@236 37
adamc@236 38 Fixpoint eval (e : exp) : nat :=
adamc@236 39 match e with
adamc@236 40 | Const n => n
adamc@236 41 | Plus e1 e2 => eval e1 + eval e2
adamc@236 42 end.
adamc@236 43
adamc@236 44 Fixpoint times (k : nat) (e : exp) : exp :=
adamc@236 45 match e with
adamc@236 46 | Const n => Const (k * n)
adamc@236 47 | Plus e1 e2 => Plus (times k e1) (times k e2)
adamc@236 48 end.
adamc@236 49
adamc@236 50 (** We can write a very manual proof that [times] really scales an expression's value by [k]. *)
adamc@236 51
adamc@236 52 Theorem eval_times : forall k e,
adamc@236 53 eval (times k e) = k * eval e.
adamc@236 54 induction e.
adamc@236 55
adamc@236 56 trivial.
adamc@236 57
adamc@236 58 simpl.
adamc@236 59 rewrite IHe1.
adamc@236 60 rewrite IHe2.
adamc@236 61 rewrite mult_plus_distr_l.
adamc@236 62 trivial.
adamc@236 63 Qed.
adamc@236 64
adamc@236 65 (** We use spaces to separate the two inductive cases. The second case mentions automatically-generated hypothesis names explicitly. As a result, innocuous changes to the theorem statement can invalidate the proof. *)
adamc@236 66
adamc@236 67 Reset eval_times.
adamc@236 68
adamc@236 69 Theorem eval_double : forall k x,
adamc@236 70 eval (times k x) = k * eval x.
adamc@236 71 induction x.
adamc@236 72
adamc@236 73 trivial.
adamc@236 74
adamc@236 75 simpl.
adamc@236 76 (** [[
adamc@236 77 rewrite IHe1.
adamc@236 78
adamc@236 79 Error: The reference IHe1 was not found in the current environment.
adamc@236 80
adamc@236 81 ]]
adamc@236 82
adamc@236 83 The inductive hypotheses are named [IHx1] and [IHx2] now, not [IHe1] and [IHe2]. *)
adamc@236 84
adamc@236 85 Abort.
adamc@236 86
adamc@236 87 (** We might decide to use a more explicit invocation of [induction] to give explicit binders for all of the names that we will reference later in the proof. *)
adamc@236 88
adamc@236 89 Theorem eval_times : forall k e,
adamc@236 90 eval (times k e) = k * eval e.
adamc@236 91 induction e as [ | ? IHe1 ? IHe2 ].
adamc@236 92
adamc@236 93 trivial.
adamc@236 94
adamc@236 95 simpl.
adamc@236 96 rewrite IHe1.
adamc@236 97 rewrite IHe2.
adamc@236 98 rewrite mult_plus_distr_l.
adamc@236 99 trivial.
adamc@236 100 Qed.
adamc@236 101
adamc@236 102 (** We pass [induction] an %\textit{%#<i>#intro pattern#</i>#%}%, using a [|] character to separate out instructions for the different inductive cases. Within a case, we write [?] to ask Coq to generate a name automatically, and we write an explicit name to assign that name to the corresponding new variable. It is apparent that, to use intro patterns to avoid proof brittleness, one needs to keep track of the seemingly unimportant facts of the orders in which variables are introduced. Thus, the script keeps working if we replace [e] by [x], but it has become more cluttered. Arguably, neither proof is particularly easy to follow.
adamc@236 103
adamc@237 104 That category of complaint has to do with understanding proofs as static artifacts. As with programming in general, with serious projects, it tends to be much more important to be able to support evolution of proofs as specifications change. Unstructured proofs like the above examples can be very hard to update in concert with theorem statements. For instance, consider how the last proof script plays out when we modify [times] to introduce a bug. *)
adamc@236 105
adamc@236 106 Reset times.
adamc@236 107
adamc@236 108 Fixpoint times (k : nat) (e : exp) : exp :=
adamc@236 109 match e with
adamc@236 110 | Const n => Const (1 + k * n)
adamc@236 111 | Plus e1 e2 => Plus (times k e1) (times k e2)
adamc@236 112 end.
adamc@236 113
adamc@236 114 Theorem eval_times : forall k e,
adamc@236 115 eval (times k e) = k * eval e.
adamc@236 116 induction e as [ | ? IHe1 ? IHe2 ].
adamc@236 117
adamc@236 118 trivial.
adamc@236 119
adamc@236 120 simpl.
adamc@236 121 (** [[
adamc@236 122 rewrite IHe1.
adamc@236 123
adamc@236 124 Error: The reference IHe1 was not found in the current environment.
adamc@236 125
adamc@236 126 ]] *)
adamc@236 127
adamc@236 128 Abort.
adamc@236 129
adamc@237 130 (** Can you spot what went wrong, without stepping through the script step-by-step? The problem is that [trivial] never fails. Originally, [trivial] had been succeeding in proving an equality that follows by reflexivity. Our change to [times] leads to a case where that equality is no longer true. [trivial] happily leaves the false equality in place, and we continue on to the span of tactics intended for the second inductive case. Unfortunately, those tactics end up being applied to the %\textit{%#<i>#first#</i>#%}% case instead.
adamc@237 131
adamc@237 132 The problem with [trivial] could be "solved" by writing [solve [trivial]] instead, so that an error is signaled early on if something unexpected happens. However, the root problem is that the syntax of a tactic invocation does not imply how many subgoals it produces. Much more confusing instances of this problem are possible. For example, if a lemma [L] is modified to take an extra hypothesis, then uses of [apply L] will generate more subgoals than before. Old unstructured proof scripts will become hopelessly jumbled, with tactics applied to inappropriate subgoals. Because of the lack of structure, there is usually relatively little to be gleaned from knowledge of the precise point in a proof script where an error is raised. *)
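
(** A minimal sketch of the [solve] variant, assuming the buggy [times] above is still in scope:

[[
Theorem eval_times : forall k e,
  eval (times k e) = k * eval e.
  induction e as [ | ? IHe1 ? IHe2 ].

  (* [solve] fails unless its argument completely proves the goal. *)
  solve [ trivial ].

]]

Here the script stops at [solve [ trivial ]], inside the constant case that the bug actually broke, rather than at a later [rewrite] meant for a different case. *)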
adamc@236 133
adamc@236 134 Reset times.
adamc@236 135
adamc@236 136 Fixpoint times (k : nat) (e : exp) : exp :=
adamc@236 137 match e with
adamc@236 138 | Const n => Const (k * n)
adamc@236 139 | Plus e1 e2 => Plus (times k e1) (times k e2)
adamc@236 140 end.
adamc@236 141
adamc@237 142 (** Many real developments try to make essentially unstructured proofs look structured by applying careful indentation conventions, idempotent case-marker tactics included solely to serve as documentation, and so on. All of these strategies suffer from the same kind of failure of abstraction that was just demonstrated. I like to say that if you find yourself caring about indentation in a proof script, it is a sign that the script is structured poorly.
adamc@236 143
adamc@236 144 We can rewrite the current proof with a single tactic. *)
adamc@236 145
adamc@236 146 Theorem eval_times : forall k e,
adamc@236 147 eval (times k e) = k * eval e.
adamc@236 148 induction e as [ | ? IHe1 ? IHe2 ]; [
adamc@236 149 trivial
adamc@236 150 | simpl; rewrite IHe1; rewrite IHe2; rewrite mult_plus_distr_l; trivial ].
adamc@236 151 Qed.
adamc@236 152
adamc@236 153 (** This is an improvement in robustness of the script. We no longer need to worry about tactics from one case being applied to a different case. Still, the proof script is not especially readable. Probably most readers would not find it helpful in explaining why the theorem is true.
adamc@236 154
adamc@236 155 The situation gets worse in considering extensions to the theorem we want to prove. Let us add multiplication nodes to our [exp] type and see how the proof fares. *)
adamc@236 156
adamc@236 157 Reset exp.
adamc@236 158
adamc@236 159 Inductive exp : Set :=
adamc@236 160 | Const : nat -> exp
adamc@236 161 | Plus : exp -> exp -> exp
adamc@236 162 | Mult : exp -> exp -> exp.
adamc@236 163
adamc@236 164 Fixpoint eval (e : exp) : nat :=
adamc@236 165 match e with
adamc@236 166 | Const n => n
adamc@236 167 | Plus e1 e2 => eval e1 + eval e2
adamc@236 168 | Mult e1 e2 => eval e1 * eval e2
adamc@236 169 end.
adamc@236 170
adamc@236 171 Fixpoint times (k : nat) (e : exp) : exp :=
adamc@236 172 match e with
adamc@236 173 | Const n => Const (k * n)
adamc@236 174 | Plus e1 e2 => Plus (times k e1) (times k e2)
adamc@236 175 | Mult e1 e2 => Mult (times k e1) e2
adamc@236 176 end.
adamc@236 177
adamc@236 178 Theorem eval_times : forall k e,
adamc@236 179 eval (times k e) = k * eval e.
adamc@236 180 (** [[
adamc@236 181 induction e as [ | ? IHe1 ? IHe2 ]; [
adamc@236 182 trivial
adamc@236 183 | simpl; rewrite IHe1; rewrite IHe2; rewrite mult_plus_distr_l; trivial ].
adamc@236 184
adamc@236 185 Error: Expects a disjunctive pattern with 3 branches.
adamc@236 186
adamc@236 187 ]] *)
adamc@236 188
adamc@236 189 Abort.
adamc@236 190
adamc@236 191 (** Unsurprisingly, the old proof fails, because it explicitly says that there are two inductive cases. To update the script, we must, at a minimum, remember the order in which the inductive cases are generated, so that we can insert the new case in the appropriate place. Even then, it will be painful to add the case, because we cannot walk through proof steps interactively when they occur inside an explicit set of cases. *)
adamc@236 192
adamc@236 193 Theorem eval_times : forall k e,
adamc@236 194 eval (times k e) = k * eval e.
adamc@236 195 induction e as [ | ? IHe1 ? IHe2 | ? IHe1 ? IHe2 ]; [
adamc@236 196 trivial
adamc@236 197 | simpl; rewrite IHe1; rewrite IHe2; rewrite mult_plus_distr_l; trivial
adamc@236 198 | simpl; rewrite IHe1; rewrite mult_assoc; trivial ].
adamc@236 199 Qed.
adamc@236 200
adamc@236 201 (** Now we are in a position to see how much nicer is the style of proof that we have followed in most of this book. *)
adamc@236 202
adamc@236 203 Reset eval_times.
adamc@236 204
adamc@238 205 Hint Rewrite mult_plus_distr_l : cpdt.
adamc@238 206
adamc@236 207 Theorem eval_times : forall k e,
adamc@236 208 eval (times k e) = k * eval e.
adamc@236 209 induction e; crush.
adamc@236 210 Qed.
adamc@236 211
adamc@237 212 (** This style is motivated by a hard truth: one person's manual proof script is almost always mostly inscrutable to most everyone else. I claim that step-by-step formal proofs are a poor way of conveying information. Thus, we might as well cut out the steps and automate as much as possible.
adamc@237 213
adamc@237 214 What about the illustrative value of proofs? Most informal proofs are read to convey the big ideas of proofs. How can reading [induction e; crush] convey any big ideas? My position is that any ideas that standard automation can find are not very big after all, and the %\textit{%#<i>#real#</i>#%}% big ideas should be expressed through lemmas that are added as hints.
adamc@237 215
adamc@237 216 An example should help illustrate what I mean. Consider this function, which rewrites an expression using associativity of addition and multiplication. *)
adamc@237 217
adamc@237 218 Fixpoint reassoc (e : exp) : exp :=
adamc@237 219 match e with
adamc@237 220 | Const _ => e
adamc@237 221 | Plus e1 e2 =>
adamc@237 222 let e1' := reassoc e1 in
adamc@237 223 let e2' := reassoc e2 in
adamc@237 224 match e2' with
adamc@237 225 | Plus e21 e22 => Plus (Plus e1' e21) e22
adamc@237 226 | _ => Plus e1' e2'
adamc@237 227 end
adamc@237 228 | Mult e1 e2 =>
adamc@237 229 let e1' := reassoc e1 in
adamc@237 230 let e2' := reassoc e2 in
adamc@237 231 match e2' with
adamc@237 232 | Mult e21 e22 => Mult (Mult e1' e21) e22
adamc@237 233 | _ => Mult e1' e2'
adamc@237 234 end
adamc@237 235 end.
adamc@237 236
adamc@237 237 Theorem reassoc_correct : forall e, eval (reassoc e) = eval e.
adamc@237 238 induction e; crush;
adamc@237 239 match goal with
adamc@237 240 | [ |- context[match ?E with Const _ => _ | Plus _ _ => _ | Mult _ _ => _ end] ] =>
adamc@237 241 destruct E; crush
adamc@237 242 end.
adamc@237 243
adamc@237 244 (** One subgoal remains:
adamc@237 245 [[
adamc@237 246 IHe2 : eval e3 * eval e4 = eval e2
adamc@237 247 ============================
adamc@237 248 eval e1 * eval e3 * eval e4 = eval e1 * eval e2
adamc@237 249
adamc@237 250 ]]
adamc@237 251
adamc@237 252 [crush] does not know how to finish this goal. We could finish the proof manually. *)
adamc@237 253
adamc@237 254 rewrite <- IHe2; crush.
adamc@237 255
adamc@237 256 (** However, the proof would be easier to understand and maintain if we separated this insight into a separate lemma. *)
adamc@237 257
adamc@237 258 Abort.
adamc@237 259
adamc@237 260 Lemma rewr : forall a b c d, b * c = d -> a * b * c = a * d.
adamc@237 261 crush.
adamc@237 262 Qed.
adamc@237 263
adamc@237 264 Hint Resolve rewr.
adamc@237 265
adamc@237 266 Theorem reassoc_correct : forall e, eval (reassoc e) = eval e.
adamc@237 267 induction e; crush;
adamc@237 268 match goal with
adamc@237 269 | [ |- context[match ?E with Const _ => _ | Plus _ _ => _ | Mult _ _ => _ end] ] =>
adamc@237 270 destruct E; crush
adamc@237 271 end.
adamc@237 272 Qed.
adamc@237 273
adamc@237 274 (** In the limit, a complicated inductive proof might rely on one hint for each inductive case. The lemma for each hint could restate the associated case. Compared to manual proof scripts, we arrive at more readable results. Scripts no longer need to depend on the order in which cases are generated. The lemmas are easier to digest separately than are fragments of tactic code, since lemma statements include complete proof contexts. Such contexts can only be extracted from monolithic manual proofs by stepping through scripts interactively.
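
As a hypothetical illustration of a lemma that restates one inductive case (the name [eval_times_Plus] is invented for this sketch and is not part of the development), the addition case of [eval_times] could be stated on its own, with the two inductive hypotheses as explicit premises:

[[
Lemma eval_times_Plus : forall k e1 e2,
  eval (times k e1) = k * eval e1
  -> eval (times k e2) = k * eval e2
  -> eval (times k (Plus e1 e2)) = k * eval (Plus e1 e2).
  (* Name the premises explicitly, then follow the steps of the manual proof. *)
  intros k e1 e2 IH1 IH2; simpl; rewrite IH1; rewrite IH2;
    rewrite mult_plus_distr_l; trivial.
Qed.

Hint Resolve eval_times_Plus.

]]

The statement shows the complete context of the case, so a reader can understand it without replaying the induction.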
adamc@237 275
adamc@237 276 The more common situation is that a large induction has several easy cases that automation makes short work of. In the remaining cases, automation performs some standard simplification. Among these cases, some may require quite involved proofs; such a case may deserve a hint lemma of its own, where the lemma statement may copy the simplified version of the case. Alternatively, the proof script for the main theorem may be extended with some automation code targeted at the specific case. Even such targeted scripting is more desirable than manual proving, because it may be read and understood without knowledge of a proof's hierarchical structure, case ordering, or name binding structure. *)
adamc@237 277
adamc@235 278
adamc@238 279 (** * Debugging and Maintaining Automation *)
adamc@238 280
adamc@238 281 (** Fully-automated proofs are desirable because they open up possibilities for automatic adaptation to changes of specification. A well-engineered script within a narrow domain can survive many changes to the formulation of the problem it solves. Still, as we are working with higher-order logic, most theorems fall within no obvious decidable theories. It is inevitable that most long-lived automated proofs will need updating.
adamc@238 282
adamc@238 283 Before we are ready to update our proofs, we need to write them in the first place. While fully-automated scripts are most robust to changes of specification, it is hard to write every new proof directly in that form. Instead, it is useful to begin a theorem with exploratory proving and then gradually refine it into a suitable automated form.
adamc@238 284
adamc@238 285 Consider this theorem from Chapter 7, which we begin by proving in a mostly manual way, invoking [crush] after each step to discharge any low-hanging fruit. Our manual effort involves choosing which expressions to case-analyze on. *)
adamc@238 286
adamc@238 287 (* begin hide *)
adamc@238 288 Require Import MoreDep.
adamc@238 289 (* end hide *)
adamc@238 290
adamc@238 291 Theorem cfold_correct : forall t (e : exp t), expDenote e = expDenote (cfold e).
adamc@238 292 induction e; crush.
adamc@238 293
adamc@238 294 dep_destruct (cfold e1); crush.
adamc@238 295 dep_destruct (cfold e2); crush.
adamc@238 296
adamc@238 297 dep_destruct (cfold e1); crush.
adamc@238 298 dep_destruct (cfold e2); crush.
adamc@238 299
adamc@238 300 dep_destruct (cfold e1); crush.
adamc@238 301 dep_destruct (cfold e2); crush.
adamc@238 302
adamc@238 303 dep_destruct (cfold e1); crush.
adamc@238 304 dep_destruct (expDenote e1); crush.
adamc@238 305
adamc@238 306 dep_destruct (cfold e); crush.
adamc@238 307
adamc@238 308 dep_destruct (cfold e); crush.
adamc@238 309 Qed.
adamc@238 310
adamc@238 311 (** In this complete proof, it is hard to avoid noticing a pattern. We rework the proof, abstracting over the patterns we find. *)
adamc@238 312
adamc@238 313 Reset cfold_correct.
adamc@238 314
adamc@238 315 Theorem cfold_correct : forall t (e : exp t), expDenote e = expDenote (cfold e).
adamc@238 316 induction e; crush.
adamc@238 317
adamc@238 318 (** The expression we want to destruct here turns out to be the discriminee of a [match], and we can easily enough write a tactic that destructs all such expressions. *)
adamc@238 319
adamc@238 320 Ltac t :=
adamc@238 321 repeat (match goal with
adamc@238 322 | [ |- context[match ?E with NConst _ => _ | Plus _ _ => _
adamc@238 323 | Eq _ _ => _ | BConst _ => _ | And _ _ => _
adamc@238 324 | If _ _ _ _ => _ | Pair _ _ _ _ => _
adamc@238 325 | Fst _ _ _ => _ | Snd _ _ _ => _ end] ] =>
adamc@238 326 dep_destruct E
adamc@238 327 end; crush).
adamc@238 328
adamc@238 329 t.
adamc@238 330
adamc@238 331 (** This tactic invocation discharges the whole case. It does the same on the next two cases, but it gets stuck on the fourth case. *)
adamc@238 332
adamc@238 333 t.
adamc@238 334
adamc@238 335 t.
adamc@238 336
adamc@238 337 t.
adamc@238 338
adamc@238 339 (** The subgoal's conclusion is:
adamc@238 340 [[
adamc@238 341 ============================
adamc@238 342 (if expDenote e1 then expDenote (cfold e2) else expDenote (cfold e3)) =
adamc@238 343 expDenote (if expDenote e1 then cfold e2 else cfold e3)
adamc@238 344
adamc@238 345 ]]
adamc@238 346
adamc@238 347 We need to expand our [t] tactic to handle this case. *)
adamc@238 348
adamc@238 349 Ltac t' :=
adamc@238 350 repeat (match goal with
adamc@238 351 | [ |- context[match ?E with NConst _ => _ | Plus _ _ => _
adamc@238 352 | Eq _ _ => _ | BConst _ => _ | And _ _ => _
adamc@238 353 | If _ _ _ _ => _ | Pair _ _ _ _ => _
adamc@238 354 | Fst _ _ _ => _ | Snd _ _ _ => _ end] ] =>
adamc@238 355 dep_destruct E
adamc@238 356 | [ |- (if ?E then _ else _) = _ ] => destruct E
adamc@238 357 end; crush).
adamc@238 358
adamc@238 359 t'.
adamc@238 360
adamc@238 361 (** Now the goal is discharged, but [t'] has no effect on the next subgoal. *)
adamc@238 362
adamc@238 363 t'.
adamc@238 364
adamc@238 365 (** A final revision of [t] finishes the proof. *)
adamc@238 366
adamc@238 367 Ltac t'' :=
adamc@238 368 repeat (match goal with
adamc@238 369 | [ |- context[match ?E with NConst _ => _ | Plus _ _ => _
adamc@238 370 | Eq _ _ => _ | BConst _ => _ | And _ _ => _
adamc@238 371 | If _ _ _ _ => _ | Pair _ _ _ _ => _
adamc@238 372 | Fst _ _ _ => _ | Snd _ _ _ => _ end] ] =>
adamc@238 373 dep_destruct E
adamc@238 374 | [ |- (if ?E then _ else _) = _ ] => destruct E
adamc@238 375 | [ |- context[match pairOut ?E with Some _ => _
adamc@238 376 | None => _ end] ] =>
adamc@238 377 dep_destruct E
adamc@238 378 end; crush).
adamc@238 379
adamc@238 380 t''.
adamc@238 381
adamc@238 382 t''.
adamc@238 383 Qed.
adamc@238 384
adamc@238 385 (** We can take the final tactic and move it into the initial part of the proof script, arriving at a nicely-automated proof. *)
adamc@238 386
adamc@238 387 Reset t.
adamc@238 388
adamc@238 389 Theorem cfold_correct : forall t (e : exp t), expDenote e = expDenote (cfold e).
adamc@238 390 induction e; crush;
adamc@238 391 repeat (match goal with
adamc@238 392 | [ |- context[match ?E with NConst _ => _ | Plus _ _ => _
adamc@238 393 | Eq _ _ => _ | BConst _ => _ | And _ _ => _
adamc@238 394 | If _ _ _ _ => _ | Pair _ _ _ _ => _
adamc@238 395 | Fst _ _ _ => _ | Snd _ _ _ => _ end] ] =>
adamc@238 396 dep_destruct E
adamc@238 397 | [ |- (if ?E then _ else _) = _ ] => destruct E
adamc@238 398 | [ |- context[match pairOut ?E with Some _ => _
adamc@238 399 | None => _ end] ] =>
adamc@238 400 dep_destruct E
adamc@238 401 end; crush).
adamc@238 402 Qed.
adamc@238 403
adamc@240 404 (** Even after we put together nice automated proofs, we must deal with specification changes that can invalidate them. It is not generally possible to step through single-tactic proofs interactively. There is a command [Debug On] that lets us step through points in tactic execution, but the debugger tends to make counterintuitive choices of which points we would like to stop at, and per-point output is quite verbose, so most Coq users do not find this debugging mode very helpful. How are we to understand what has broken in a script that used to work?
adamc@240 405
adamc@240 406 An example helps demonstrate a useful approach. Consider what would have happened in our proof of [reassoc_correct] if we had first added an unfortunate rewriting hint. *)
adamc@240 407
adamc@240 408 Reset reassoc_correct.
adamc@240 409
adamc@240 410 Theorem confounder : forall e1 e2 e3,
adamc@240 411 eval e1 * eval e2 * eval e3 = eval e1 * (eval e2 + 1 - 1) * eval e3.
adamc@240 412 crush.
adamc@240 413 Qed.
adamc@240 414
adamc@240 415 Hint Rewrite confounder : cpdt.
adamc@240 416
adamc@240 417 Theorem reassoc_correct : forall e, eval (reassoc e) = eval e.
adamc@240 418 induction e; crush;
adamc@240 419 match goal with
adamc@240 420 | [ |- context[match ?E with Const _ => _ | Plus _ _ => _ | Mult _ _ => _ end] ] =>
adamc@240 421 destruct E; crush
adamc@240 422 end.
adamc@240 423
adamc@240 424 (** One subgoal remains:
adamc@240 425
adamc@240 426 [[
adamc@240 427 ============================
adamc@240 428 eval e1 * (eval e3 + 1 - 1) * eval e4 = eval e1 * eval e2
adamc@240 429
adamc@240 430 ]]
adamc@240 431
adamc@240 432 The poorly-chosen rewrite rule fired, changing the goal to a form where another hint no longer applies. Imagine that we are in the middle of a large development with many hints. How would we diagnose the problem? First, we might not be sure which case of the inductive proof has gone wrong. It is useful to separate out our automation procedure and apply it manually. *)
adamc@240 433
adamc@240 434 Restart.
adamc@240 435
adamc@240 436 Ltac t := crush; match goal with
adamc@240 437 | [ |- context[match ?E with Const _ => _ | Plus _ _ => _
adamc@240 438 | Mult _ _ => _ end] ] =>
adamc@240 439 destruct E; crush
adamc@240 440 end.
adamc@240 441
adamc@240 442 induction e.
adamc@240 443
adamc@240 444 (** Since we see the subgoals before any simplification occurs, it is clear that this is the case for constants. [t] makes short work of it. *)
adamc@240 445
adamc@240 446 t.
adamc@240 447
adamc@240 448 (** The next subgoal, for addition, is also discharged without trouble. *)
adamc@240 449
adamc@240 450 t.
adamc@240 451
adamc@240 452 (** The final subgoal is for multiplication, and it is here that we get stuck in the proof state summarized above. *)
adamc@240 453
adamc@240 454 t.
adamc@240 455
adamc@240 456 (** What is [t] doing to get us to this point? The [info] command can help us answer this kind of question. *)
adamc@240 457
adamc@240 458 (** remove printing * *)
adamc@240 459 Undo.
adamc@240 460 info t.
adamc@240 461 (** [[
adamc@240 462 == simpl in *; intuition; subst; autorewrite with cpdt in *;
adamc@240 463 simpl in *; intuition; subst; autorewrite with cpdt in *;
adamc@240 464 simpl in *; intuition; subst; destruct (reassoc e2).
adamc@240 465 simpl in *; intuition.
adamc@240 466
adamc@240 467 simpl in *; intuition.
adamc@240 468
adamc@240 469 simpl in *; intuition; subst; autorewrite with cpdt in *;
adamc@240 470 refine (eq_ind_r
adamc@240 471 (fun n : nat =>
adamc@240 472 n * (eval e3 + 1 - 1) * eval e4 = eval e1 * eval e2) _ IHe1);
adamc@240 473 autorewrite with cpdt in *; simpl in *; intuition;
adamc@240 474 subst; autorewrite with cpdt in *; simpl in *;
adamc@240 475 intuition; subst.
adamc@240 476
adamc@240 477 ]]
adamc@240 478
adamc@240 479 A detailed trace of [t]'s execution appears. Since we are using the very general [crush] tactic, many of these steps have no effect and only occur as instances of a more general strategy. We can copy-and-paste the details to see where things go wrong. *)
adamc@240 480
adamc@240 481 Undo.
adamc@240 482
adamc@240 483 (** We arbitrarily split the script into chunks. The first few seem not to do any harm. *)
adamc@240 484
adamc@240 485 simpl in *; intuition; subst; autorewrite with cpdt in *.
adamc@240 486 simpl in *; intuition; subst; autorewrite with cpdt in *.
adamc@240 487 simpl in *; intuition; subst; destruct (reassoc e2).
adamc@240 488 simpl in *; intuition.
adamc@240 489 simpl in *; intuition.
adamc@240 490
adamc@240 491 (** The next step is revealed as the culprit, bringing us to the final unproved subgoal. *)
adamc@240 492
adamc@240 493 simpl in *; intuition; subst; autorewrite with cpdt in *.
adamc@240 494
adamc@240 495 (** We can split the steps further to assign blame. *)
adamc@240 496
adamc@240 497 Undo.
adamc@240 498
adamc@240 499 simpl in *.
adamc@240 500 intuition.
adamc@240 501 subst.
adamc@240 502 autorewrite with cpdt in *.
adamc@240 503
adamc@240 504 (** It was the final of these four tactics that made the rewrite. We can find out exactly what happened. The [info] command presents hierarchical views of proof steps, and we can zoom down to a lower level of detail by applying [info] to one of the steps that appeared in the original trace. *)
adamc@240 505
adamc@240 506 Undo.
adamc@240 507
adamc@240 508 info autorewrite with cpdt in *.
adamc@240 509 (** [[
adamc@240 510 == refine (eq_ind_r (fun n : nat => n = eval e1 * eval e2) _
adamc@240 511 (confounder (reassoc e1) e3 e4)).
adamc@240 512
adamc@240 513 ]]
adamc@240 514
adamc@240 515 The way a rewrite is displayed is somewhat baroque, but we can see that theorem [confounder] is the final culprit. At this point, we could remove that hint, prove an alternate version of the key lemma [rewr], or come up with some other remedy. Fixing this kind of problem tends to be relatively easy once the problem is revealed. *)
adamc@240 516
adamc@240 517 Abort.
adamc@240 518
adamc@240 519 (** printing * $\times$ *)
adamc@240 520
adamc@241 521 (** Sometimes a change to a development has undesirable performance consequences, even if it does not prevent any old proof scripts from completing. If the performance consequences are severe enough, the proof scripts can be considered broken for practical purposes.
adamc@241 522
adamc@241 523 Here is one example of a performance surprise. *)
adamc@241 524
adamc@239 525 Section slow.
adamc@239 526 Hint Resolve trans_eq.
adamc@239 527
adamc@241 528 (** The central element of the problem is the addition of transitivity as a hint. With transitivity available, it is easy for proof search to wind up exploring exponential search spaces. We also add a few other arbitrary variables and hypotheses, designed to lead to trouble later. *)
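
(** To see why transitivity matches so freely, here is a small standalone sketch (the goal is an arbitrary illustration, not part of the development). Applying [trans_eq] to any equality leaves two further equalities, linked by an intermediate term that is not yet known:

[[
Goal forall a b : nat, a = b -> a = b.
  intros a b H.
  eapply trans_eq.
  (* Two equality subgoals remain, sharing an undetermined middle term. *)
  eexact H.
  (* Solving the first subgoal fixed the middle term to [b]. *)
  reflexivity.
Qed.

]]

Since each new subgoal is again an equality, a search procedure can apply the hint to it once more, and so on up to the depth bound. *)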
adamc@241 529
adamc@239 530 Variable A : Set.
adamc@239 531 Variables P Q R S : A -> A -> Prop.
adamc@239 532 Variable f : A -> A.
adamc@239 533
adamc@239 534 Hypothesis H1 : forall x y, P x y -> Q x y -> R x y -> f x = f y.
adamc@239 535 Hypothesis H2 : forall x y, S x y -> R x y.
adamc@239 536
adamc@241 537 (** We prove a simple lemma very quickly, using the [Time] command to measure exactly how quickly. *)
adamc@241 538
adamc@239 539 Lemma slow : forall x y, P x y -> Q x y -> S x y -> f x = f y.
adamc@241 540 Time eauto 6.
adamc@241 541 (** [[
adamc@241 542 Finished transaction in 0. secs (0.068004u,0.s)
adamc@241 543 ]] *)
adamc@241 544
adamc@239 545 Qed.
adamc@239 546
adamc@241 547 (** Now we add a different hypothesis, which is innocent enough; in fact, it is even provable as a theorem. *)
adamc@241 548
adamc@239 549 Hypothesis H3 : forall x y, x = y -> f x = f y.
adamc@239 550
adamc@239 551 Lemma slow' : forall x y, P x y -> Q x y -> S x y -> f x = f y.
adamc@241 552 Time eauto 6.
adamc@241 553 (** [[
adamc@241 554 Finished transaction in 2. secs (1.264079u,0.s)
adamc@241 555
adamc@241 556 ]]
adamc@241 557
adamc@241 558 Why has the search time gone up so much? The [info] command is not much help, since it only shows the result of search, not all of the paths that turned out to be worthless. *)
adamc@241 559
adamc@241 560 Restart.
adamc@241 561 info eauto 6.
adamc@241 562 (** [[
adamc@241 563 == intro x; intro y; intro H; intro H0; intro H4;
adamc@241 564 simple eapply trans_eq.
adamc@241 565 simple apply refl_equal.
adamc@241 566
adamc@241 567 simple eapply trans_eq.
adamc@241 568 simple apply refl_equal.
adamc@241 569
adamc@241 570 simple eapply trans_eq.
adamc@241 571 simple apply refl_equal.
adamc@241 572
adamc@241 573 simple apply H1.
adamc@241 574 eexact H.
adamc@241 575
adamc@241 576 eexact H0.
adamc@241 577
adamc@241 578 simple apply H2; eexact H4.
adamc@241 579
adamc@241 580 ]]
adamc@241 581
adamc@241 582 This output does not tell us why proof search takes so long, but it does provide a clue that would be useful if we had forgotten that we added transitivity as a hint. The [eauto] tactic is applying depth-first search, and the proof script where the real action is ends up buried inside a chain of pointless invocations of transitivity, where each invocation uses reflexivity to discharge one subgoal. Each increment to the depth argument to [eauto] adds another silly use of transitivity. This wasted proof effort only adds linear time overhead, as long as proof search never makes false steps. No false steps were made before we added the new hypothesis, but somehow the addition made possible a new faulty path. To understand which paths we enabled, we can use the [debug] command. *)
adamc@241 583
adamc@241 584 Restart.
adamc@241 585 debug eauto 6.
adamc@241 586
adamc@241 587 (** The output is a large proof tree. The beginning of the tree is enough to reveal what is happening:
adamc@241 588
adamc@241 589 [[
adamc@241 590 1 depth=6
adamc@241 591 1.1 depth=6 intro
adamc@241 592 1.1.1 depth=6 intro
adamc@241 593 1.1.1.1 depth=6 intro
adamc@241 594 1.1.1.1.1 depth=6 intro
adamc@241 595 1.1.1.1.1.1 depth=6 intro
adamc@241 596 1.1.1.1.1.1.1 depth=5 apply H3
adamc@241 597 1.1.1.1.1.1.1.1 depth=4 eapply trans_eq
adamc@241 598 1.1.1.1.1.1.1.1.1 depth=4 apply refl_equal
adamc@241 599 1.1.1.1.1.1.1.1.1.1 depth=3 eapply trans_eq
adamc@241 600 1.1.1.1.1.1.1.1.1.1.1 depth=3 apply refl_equal
adamc@241 601 1.1.1.1.1.1.1.1.1.1.1.1 depth=2 eapply trans_eq
adamc@241 602 1.1.1.1.1.1.1.1.1.1.1.1.1 depth=2 apply refl_equal
adamc@241 603 1.1.1.1.1.1.1.1.1.1.1.1.1.1 depth=1 eapply trans_eq
adamc@241 604 1.1.1.1.1.1.1.1.1.1.1.1.1.1.1 depth=1 apply refl_equal
adamc@241 605 1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1 depth=0 eapply trans_eq
adamc@241 606 1.1.1.1.1.1.1.1.1.1.1.1.1.1.2 depth=1 apply sym_eq ; trivial
adamc@241 607 1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1 depth=0 eapply trans_eq
adamc@241 608 1.1.1.1.1.1.1.1.1.1.1.1.1.1.3 depth=0 eapply trans_eq
adamc@241 609 1.1.1.1.1.1.1.1.1.1.1.1.2 depth=2 apply sym_eq ; trivial
adamc@241 610 1.1.1.1.1.1.1.1.1.1.1.1.2.1 depth=1 eapply trans_eq
adamc@241 611 1.1.1.1.1.1.1.1.1.1.1.1.2.1.1 depth=1 apply refl_equal
adamc@241 612 1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1 depth=0 eapply trans_eq
adamc@241 613 1.1.1.1.1.1.1.1.1.1.1.1.2.1.2 depth=1 apply sym_eq ; trivial
adamc@241 614 1.1.1.1.1.1.1.1.1.1.1.1.2.1.2.1 depth=0 eapply trans_eq
adamc@241 615 1.1.1.1.1.1.1.1.1.1.1.1.2.1.3 depth=0 eapply trans_eq
adamc@241 616
adamc@241 617 ]]
adamc@241 618
adamc@241 619 The first choice [eauto] makes is to apply [H3], since [H3] has the fewest hypotheses of all of the hypotheses and hints that match. However, it turns out that the single hypothesis generated is unprovable. That does not stop [eauto] from trying to prove it with an exponentially-sized tree of applications of transitivity, reflexivity, and symmetry of equality. It is the children of the initial [apply H3] that account for all of the noticeable time in proof execution. In a more realistic development, we might use this output of [debug] to realize that adding transitivity as a hint was a bad idea. *)
adamc@241 620
adamc@239 621 Qed.
adamc@239 622 End slow.
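
(** To check that the transitivity hint is what makes the new hypothesis expensive, we can replay the same lemma in a section that never registers [trans_eq] as a hint. The following is only a sketch: the section name [fast] and the lemma name [fast'] are hypothetical, and the variable and hypothesis declarations are assumed to mirror those of the [slow] section. Without the hint, the failed attempt to use [H3] should be abandoned almost immediately, after which [eauto] can fall back on [H1] and [H2]. *)

Section fast.
  (* Assumed to match the declarations used in the [slow] section. *)
  Variable A : Set.
  Variables P Q R S : A -> A -> Prop.
  Variable f : A -> A.

  Hypothesis H1 : forall x y, P x y -> Q x y -> R x y -> f x = f y.
  Hypothesis H2 : forall x y, S x y -> R x y.
  Hypothesis H3 : forall x y, x = y -> f x = f y.

  (* No [Hint Resolve trans_eq] here, so [apply H3] has no large subtree to explore. *)
  Lemma fast' : forall x y, P x y -> Q x y -> S x y -> f x = f y.
    eauto 6.
  Qed.
End fast.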
adamc@239 623
adamc@241 624 (** It is also easy to end up with a proof script that uses too much memory. As tactics run, they avoid generating proof terms, since serious proof search will consider many possible avenues, and we do not want to build proof terms for subproofs that end up unused. Instead, tactic execution maintains %\textit{%#<i>#thunks#</i>#%}% (suspended computations, represented with closures), such that a tactic's proof-producing thunk is only executed when we run [Qed]. These thunks can use up large amounts of space, such that a proof script exhausts available memory, even when we know that we could have used much less memory by forcing some thunks earlier.
adamc@241 625
adamc@241 626 The [abstract] tactical helps us force thunks by proving some subgoals as their own lemmas. For instance, a proof [induction x; crush] can in many cases be made to use significantly less peak memory by changing it to [induction x; abstract crush]. The main limitation of [abstract] is that it can only be applied to subgoals that are proved completely, with no undetermined unification variables remaining. Still, many large automated proofs can realize vast memory savings via [abstract]. *)
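
(** As a minimal sketch of the [abstract] idiom, consider the small lemma below. The lemma and its name [abstract_demo] are hypothetical, chosen only for illustration; a goal this small will show no measurable difference in memory use. Each subgoal produced by [induction] is closed by [crush] and saved as its own auxiliary lemma, rather than being kept as part of one large suspended proof. *)

Lemma abstract_demo : forall n : nat, n + 0 = n.
  (* Both subgoals are proved completely, so [abstract] applies to each. *)
  induction n; abstract crush.
Qed.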
adamc@241 627
adamc@238 628
adamc@235 629 (** * Modules *)
adamc@235 630
adamc@235 631 Module Type GROUP.
adamc@235 632 Parameter G : Set.
adamc@235 633 Parameter f : G -> G -> G.
adamc@235 634 Parameter e : G.
adamc@235 635 Parameter i : G -> G.
adamc@235 636
adamc@235 637 Axiom assoc : forall a b c, f (f a b) c = f a (f b c).
adamc@235 638 Axiom ident : forall a, f e a = a.
adamc@235 639 Axiom inverse : forall a, f (i a) a = e.
adamc@235 640 End GROUP.
adamc@235 641
adamc@235 642 Module Type GROUP_THEOREMS.
adamc@235 643 Declare Module M : GROUP.
adamc@235 644
adamc@235 645 Axiom ident' : forall a, M.f a M.e = a.
adamc@235 646
adamc@235 647 Axiom inverse' : forall a, M.f a (M.i a) = M.e.
adamc@235 648
adamc@235 649 Axiom unique_ident : forall e', (forall a, M.f e' a = a) -> e' = M.e.
adamc@235 650 End GROUP_THEOREMS.
adamc@235 651
adamc@239 652 Module Group (M : GROUP) : GROUP_THEOREMS with Module M := M.
adamc@235 653 Module M := M.
adamc@235 654
adamc@235 655 Import M.
adamc@235 656
adamc@235 657 Theorem inverse' : forall a, f a (i a) = e.
adamc@235 658 intro.
adamc@235 659 rewrite <- (ident (f a (i a))).
adamc@235 660 rewrite <- (inverse (f a (i a))) at 1.
adamc@235 661 rewrite assoc.
adamc@235 662 rewrite assoc.
adamc@235 663 rewrite <- (assoc (i a) a (i a)).
adamc@235 664 rewrite inverse.
adamc@235 665 rewrite ident.
adamc@235 666 apply inverse.
adamc@235 667 Qed.
adamc@235 668
adamc@235 669 Theorem ident' : forall a, f a e = a.
adamc@235 670 intro.
adamc@235 671 rewrite <- (inverse a).
adamc@235 672 rewrite <- assoc.
adamc@235 673 rewrite inverse'.
adamc@235 674 apply ident.
adamc@235 675 Qed.
adamc@235 676
adamc@235 677 Theorem unique_ident : forall e', (forall a, M.f e' a = a) -> e' = M.e.
adamc@235 678 intros.
adamc@235 679 rewrite <- (H e).
adamc@235 680 symmetry.
adamc@235 681 apply ident'.
adamc@235 682 Qed.
adamc@235 683 End Group.
adamc@239 684
adamc@239 685 Require Import ZArith.
adamc@239 686 Open Scope Z_scope.
adamc@239 687
adamc@239 688 Module Int.
adamc@239 689 Definition G := Z.
adamc@239 690 Definition f x y := x + y.
adamc@239 691 Definition e := 0.
adamc@239 692 Definition i x := -x.
adamc@239 693
adamc@239 694 Theorem assoc : forall a b c, f (f a b) c = f a (f b c).
adamc@239 695 unfold f; crush.
adamc@239 696 Qed.
adamc@239 697 Theorem ident : forall a, f e a = a.
adamc@239 698 unfold f, e; crush.
adamc@239 699 Qed.
adamc@239 700 Theorem inverse : forall a, f (i a) a = e.
adamc@239 701 unfold f, i, e; crush.
adamc@239 702 Qed.
adamc@239 703 End Int.
adamc@239 704
adamc@239 705 Module IntTheorems := Group(Int).
adamc@239 706
adamc@239 707 Check IntTheorems.unique_ident.
adamc@239 708
adamc@239 709 Theorem unique_ident : forall e', (forall a, e' + a = a) -> e' = 0.
adamc@239 710 exact IntTheorems.unique_ident.
adamc@239 711 Qed.
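
(** The other theorems exported by the functor can be specialized to the integers in the same way. As a sketch, the theorem below (whose name [inverse'_Z] is hypothetical) restates [inverse'] using ordinary integer notation. This relies on [Int] being defined without an opaque signature, so that [f], [i], and [e] unfold to integer addition, negation, and zero. *)

Theorem inverse'_Z : forall a, a + -a = 0.
  (* The statement is convertible with [IntTheorems.inverse'] after unfolding [Int]'s definitions. *)
  exact IntTheorems.inverse'.
Qed.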