Research Highlight: Certifying a File System using Crash Hoare Logic: Correctness in the Presence of Crashes

Tej Chajed, Haogang Chen, Adam Chlipala, Frans Kaashoek, Nickolai Zeldovich, Daniel Ziegler. Research Highlight: Certifying a File System using Crash Hoare Logic: Correctness in the Presence of Crashes. Communications of the ACM (CACM). 60(4). 75-84, 2017. Association for Computing Machinery.


FSCQ is the first file system with a machine-checkable proof that its implementation meets a specification, even in the presence of fail-stop crashes. FSCQ provably avoids bugs that have plagued previous file systems, such as performing disk writes without sufficient barriers or forgetting to zero out directory blocks. If a crash happens at an inopportune time, these bugs can lead to data loss. FSCQ's theorems prove that, under any sequence of crashes followed by reboots, FSCQ will recover its state correctly without losing data.

To state FSCQ's theorems, this paper introduces Crash Hoare logic (CHL), which extends traditional Hoare logic with a crash condition, a recovery procedure, and logical address spaces for specifying disk states at different abstraction levels. CHL also reduces the proof effort for developers through proof automation. Using CHL, we developed, specified, and proved the correctness of the FSCQ file system. Although FSCQ's design is relatively simple, experiments with FSCQ as a user-level file system show that it is sufficient to run Unix applications with usable performance. FSCQ's specifications and proofs required significantly more work than the implementation, but the work was manageable even for a small team of researchers.
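
As a rough illustration of what a crash condition and a recovery procedure add to an ordinary pre/postcondition, the following Python sketch exhaustively checks a toy logged update. It is not FSCQ's code or CHL itself (FSCQ is developed and proved in Coq). The five-block disk layout, the synchronous-write model, and every name here (atomic_pair_write, commit_first_buggy, recover, check_spec) are assumptions made up for this example.

```python
# Hypothetical five-block disk: two data blocks, a commit flag, two log blocks.
DATA_A, DATA_B, COMMIT, LOG_A, LOG_B = range(5)


def atomic_pair_write(new_a, new_b):
    """Ordered (address, value) writes for a logged two-block update."""
    return [
        (LOG_A, new_a), (LOG_B, new_b),    # 1. record the new values in the log
        (COMMIT, 1),                       # 2. commit point
        (DATA_A, new_a), (DATA_B, new_b),  # 3. apply the log to the data blocks
        (COMMIT, 0),                       # 4. retire the log
    ]


def commit_first_buggy(new_a, new_b):
    """Same writes, but the commit flag is set before the log is filled."""
    return [
        (COMMIT, 1),
        (LOG_A, new_a), (LOG_B, new_b),
        (DATA_A, new_a), (DATA_B, new_b),
        (COMMIT, 0),
    ]


def recover(disk):
    """Writes issued by recovery after a reboot: replay a committed log."""
    if disk[COMMIT] == 1:
        return [(DATA_A, disk[LOG_A]), (DATA_B, disk[LOG_B]), (COMMIT, 0)]
    return []


def run_prefix(disk, writes, n):
    """Disk contents after the first n writes (assumes synchronous writes)."""
    disk = list(disk)
    for addr, val in writes[:n]:
        disk[addr] = val
    return disk


def crash_states(disk, writes):
    """Every disk reachable by crashing before the last write completes."""
    return [run_prefix(disk, writes, n) for n in range(len(writes))]


def check_spec(update, old, new, max_crashes=3):
    """Check the postcondition and the crash-plus-recovery condition.

    Precondition: data blocks hold `old`, commit flag and log are zero.
    Postcondition (no crash): data blocks hold `new`.
    Crash condition: after any crash, including up to `max_crashes` nested
    crashes during recovery, a completed recovery leaves the data blocks
    holding either `old` or `new`, with the commit flag cleared.
    """
    pre = [old[0], old[1], 0, 0, 0]
    writes = update(*new)
    final = run_prefix(pre, writes, len(writes))
    if (final[DATA_A], final[DATA_B]) != new:
        return False
    frontier = crash_states(pre, writes)
    for _ in range(max_crashes):
        next_frontier = []
        for disk in frontier:
            rec = recover(disk)
            done = run_prefix(disk, rec, len(rec))
            if (done[DATA_A], done[DATA_B]) not in (old, new):
                return False
            if done[COMMIT] != 0:
                return False
            # Recovery itself may crash; those states get recovered again.
            next_frontier.extend(crash_states(disk, rec))
        frontier = next_frontier
    return True
```

Calling check_spec(atomic_pair_write, (1, 2), (3, 4)) succeeds, while the same call with commit_first_buggy fails, because a crash right after the premature commit makes recovery replay an empty log. The sketch enumerates a bounded set of crash points on a synchronous disk, whereas a machine-checked proof of the kind described above covers all executions, and it leaves out the asynchronous writes that make barriers necessary.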