(Final Project for CS 6115, Fall 2024)
This repo contains mechanized proofs about the equivalence of Brzozowski and Antimirov derivatives, as well as the zipper representation of Brzozowski derivatives (from Romain Edelmann's PhD dissertation).
There are two subdirectories: coq
& ocaml
, which contain Coq and OCaml code respectively. Instructions on running the demo executables are in the demo subsection of this README.
This project compiles with Coq 8.19.2 and OCaml 5.2.0. We recommend setting up a fresh Opam switch with these versions:
opam switch create [switch-name] ocaml-base-compiler.5.2.0
eval $(opam env)
opam pin add coq 8.19.2
Our Coq code uses the coq-std++
library, which can be installed as follows:
opam repo add coq-released https://coq.inria.fr/opam/released
opam install coq-stdpp
- To compile:
cd
into thecoq
subdirectory & runmake
/make clean
as appropriate. - (Optional) We recommend viewing our Coq code in VS Code using the coq-lsp VS Code extension (instead of VSCoq).
- To install
coq-lsp
: Runopam install coq-lsp
and install the corresponding VS Code extension.
- To install
- Note: if you are using VS Code, open VS Code in the
coq
subdirectory bycd
-ing to thecoq
subdirectory and runningcode .
in the terminal (this is needed forcoq-lsp
to work properly).
Regex.v
: Definitions related to regular expressions (adapted from Jules)Antimirov.v
: Antimirov derivativesBrzozowski.v
: Brzozowski derivativesEquivalent.v
: Proof that Antimirov and Brzozowski matchers are equivalentFinite.v
: Proof that there are finitely many Antimirov partial derivatives of a regexHeight.v
: Proofs that height and size of Antimirov derivatives are boundedEdelmann.v
: Romain Edelmann's Coq formalization of the zipper representation of Brzozowski derivatives, modified to usegset
sZipperAntimirov.v
: Proof that the underlying sets of regexes for zippers and Antimirov derivatives are equivalent
The ocaml
subdirectory contains executable implementations of regex matchers:
regex.ml
: Regex definitions, smart constructors, and functions for computing regex height and sizebrzozowski.ml
: Brzozowski derivative-based regex matcherantimirov.ml
: Antimirov derivative-based regex matcherzipper.ml
: An implementation of Huet's zipper (adapted from chapter 2.3 of Romain Edelmann's PhD dissertation)extracted_brzozowski_zipper.ml
: the zipper representation of Brzozowski derivatives, due to Romain Edelmann (extracted from the Coq code inEdelmannOriginal.v
)quickcheck_properties.ml
: QuickCheck random generators and properties regarding derivativesutils.ml
: Pretty-printers for regexes, other miscellaneous utilszipper_antimirov.ml
: Demo illustrating that zippers and Antimirov derivatives represent the same set of regexesthree_matchers.ml
: Demo illustrating that the three matchers (Brzozowski, Antimirov, zippers) behave the same given the same inputs
First, cd
into the ocaml
subdirectory. Then:
- Run
make install
to install all OCaml dependencies - Run
make
to build the OCaml code - (Optional) Run
make test
to run our QuickCheck test suite
We have two demo executables:
make demo1
: Generates 15 random (regex, string) pairs and shows that the three regex matchers based on Brzozowski/Antimirov/zippers return the same acceptance resultmake demo2
: Generates 15 random (regex, char) pairs and shows that the underlying regex sets for both Antimirov derivatives and zippers are the same
(Note: we use QuickCheck to generate the random inputs. For reproducibility, we force QuickCheck to use the same seed across different invocations of the executable.)
brzozowski_zipper.ml
: An implementation of Brzozowski derivatives via zippers (translated from the Scala code in chapter 2.6 of Edelmann's dissertation)- Note: We found out (via QuickCheck tests) that this file is buggy, possibly because we manually translated Edelmann's Scala code to OCaml. This file has been superceded by
extracted_brzozowski_zipper.ml
- Note: We found out (via QuickCheck tests) that this file is buggy, possibly because we manually translated Edelmann's Scala code to OCaml. This file has been superceded by
krishnaswami.ml
: Builds a DFA corresponding to a regex using Antimirov derivatives (adapted from Neel Krishnaswami's blogpost)ListMonad.v
: The list monad (currently unused in the rest of our Coq development)RegexOpt.v
: Smart constructors for regexesEdelmannOriginal.v
: Edelmann's code, using lists instead ofgset
sZipperAntimirovOriginal.v
: First attempt at proof that underlying sets for zippers and Antimirov derivatives are equivalent- We previously tried to mechanize Filinski's JFP 2021 paper "Proof-directed program transformation: A functional account of efficient regular expression matching," but we decided to switch to work on Brzozowski/Antimirov derivatives instead. Our (previous) work involving the Filinski paper is contained in the following files:
harper.ml
: The code from "Proof-directed debugging" (Harper 1998), translated from SML to OCamlfilinski.ml
: The code from "Proof-directed program transformation: A functional account of efficient regular expression matching" (Filinski 2021), translated from SML to OCamlFilinski.v
: Our (abandoned) attempt at mechanizing the Filinski paper