Skip to content

Proving Brzozowski & Antimirov derivatives (and zippers) equivalent in Coq

Notifications You must be signed in to change notification settings

ngernest/regexes

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Verified Derivative-Based Regular Expression Matching

(Final Project for CS 6115, Fall 2024)

This repo contains mechanized proofs about the equivalence of Brzozowski and Antimirov derivatives, as well as the zipper representation of Brzozowski derivatives (from Romain Edelmann's PhD dissertation).

There are two subdirectories: coq & ocaml, which contain Coq and OCaml code respectively. Instructions on running the demo executables are in the demo subsection of this README.

Coq Environment Setup

Creating a new Opam switch

This project compiles with Coq 8.19.2 and OCaml 5.2.0. We recommend setting up a fresh Opam switch with these versions:

opam switch create [switch-name] ocaml-base-compiler.5.2.0
eval $(opam env)
opam pin add coq 8.19.2

Installing the std++ library

Our Coq code uses the coq-std++ library, which can be installed as follows:

opam repo add coq-released https://coq.inria.fr/opam/released
opam install coq-stdpp

Compiling and viewing our Coq code

  • To compile: cd into the coq subdirectory & run make/make clean as appropriate.
  • (Optional) We recommend viewing our Coq code in VS Code using the coq-lsp VS Code extension (instead of VSCoq).
  • Note: if you are using VS Code, open VS Code in the coq subdirectory by cd-ing to the coq subdirectory and running code . in the terminal (this is needed for coq-lsp to work properly).

Coq Code

  • Regex.v: Definitions related to regular expressions (adapted from Jules)
  • Antimirov.v: Antimirov derivatives
  • Brzozowski.v: Brzozowski derivatives
  • Equivalent.v: Proof that Antimirov and Brzozowski matchers are equivalent
  • Finite.v: Proof that there are finitely many Antimirov partial derivatives of a regex
  • Height.v: Proofs that height and size of Antimirov derivatives are bounded
  • Edelmann.v: Romain Edelmann's Coq formalization of the zipper representation of Brzozowski derivatives, modified to use gsets
  • ZipperAntimirov.v: Proof that the underlying sets of regexes for zippers and Antimirov derivatives are equivalent

OCaml Code

The ocaml subdirectory contains executable implementations of regex matchers:

Building & Testing the OCaml Code

First, cd into the ocaml subdirectory. Then:

  • Run make install to install all OCaml dependencies
  • Run make to build the OCaml code
  • (Optional) Run make test to run our QuickCheck test suite

Demo

We have two demo executables:

  • make demo1: Generates 15 random (regex, string) pairs and shows that the three regex matchers based on Brzozowski/Antimirov/zippers return the same acceptance result
  • make demo2: Generates 15 random (regex, char) pairs and shows that the underlying regex sets for both Antimirov derivatives and zippers are the same

(Note: we use QuickCheck to generate the random inputs. For reproducibility, we force QuickCheck to use the same seed across different invocations of the executable.)

Deprecated Project Work

  • brzozowski_zipper.ml: An implementation of Brzozowski derivatives via zippers (translated from the Scala code in chapter 2.6 of Edelmann's dissertation)
    • Note: We found out (via QuickCheck tests) that this file is buggy, possibly because we manually translated Edelmann's Scala code to OCaml. This file has been superceded by extracted_brzozowski_zipper.ml
  • krishnaswami.ml: Builds a DFA corresponding to a regex using Antimirov derivatives (adapted from Neel Krishnaswami's blogpost)
  • ListMonad.v: The list monad (currently unused in the rest of our Coq development)
  • RegexOpt.v: Smart constructors for regexes
  • EdelmannOriginal.v: Edelmann's code, using lists instead of gsets
  • ZipperAntimirovOriginal.v: First attempt at proof that underlying sets for zippers and Antimirov derivatives are equivalent
  • We previously tried to mechanize Filinski's JFP 2021 paper "Proof-directed program transformation: A functional account of efficient regular expression matching," but we decided to switch to work on Brzozowski/Antimirov derivatives instead. Our (previous) work involving the Filinski paper is contained in the following files:
    • harper.ml: The code from "Proof-directed debugging" (Harper 1998), translated from SML to OCaml
    • filinski.ml: The code from "Proof-directed program transformation: A functional account of efficient regular expression matching" (Filinski 2021), translated from SML to OCaml
    • Filinski.v: Our (abandoned) attempt at mechanizing the Filinski paper

About

Proving Brzozowski & Antimirov derivatives (and zippers) equivalent in Coq

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published