Merge pull request #8 from EngineeringSoftware/readme-update
Readme update
pengyunie authored Nov 23, 2020
2 parents 3a1c8fd + 2a4ece9 commit 2990f7b
Showing 4 changed files with 184 additions and 62 deletions.
130 changes: 72 additions & 58 deletions README.md
Original file line number Diff line number Diff line change
@@ -8,6 +8,7 @@ section below.

## Requirements

+- Linux or macOS
- [OCaml 4.07.1](https://ocaml.org)
- [SerAPI 0.7.1](https://github.com/ejgallego/coq-serapi)
- [Coq 8.10.2](https://coq.inria.fr/download)
@@ -16,7 +17,19 @@ section below.

## Installation

-### Installation of OCaml, Coq, and SerAPI
+You can install Roosterize from source code by cloning this GitHub
+repository and setting up the dependencies following steps 1 & 2.
+(Alternatively, you can download a [binary
+distribution](https://github.com/EngineeringSoftware/roosterize/releases)
+which already contains the Python dependencies, and then you only need
+step 1.)
+
+```
+git clone https://github.com/EngineeringSoftware/roosterize.git
+cd roosterize
+```
+
+### 1. Installation of OCaml, Coq, and SerAPI

We strongly recommend installing the required versions of OCaml, Coq,
and SerAPI via the [OPAM package manager](https://opam.ocaml.org),
@@ -35,93 +48,94 @@ opam pin add coq 8.10.2
opam pin add coq-serapi 8.10.0+0.7.1
```

-### Installation of PyTorch and Python libraries
+### 2. Installation of PyTorch and Python libraries

We strongly recommend installing the required versions of Python and
PyTorch using [Conda](https://docs.conda.io/en/latest/miniconda.html).

-First, create a Python 3.7 environment using Conda:
-```
-conda create --name roosterize python=3.7 pip
-conda activate roosterize
-```
-
-Then, install PyTorch 1.1.0 by following the
-[official instructions](https://pytorch.org/get-started/previous-versions/#conda-5),
-using the command suitable for your operating system and whether you want to use
-it on a CPU or GPU. For example, the following command installs PyTorch for
-CPU-only use on Linux or Windows:
-```
-conda install pytorch-cpu==1.1.0 torchvision-cpu==0.3.0 cpuonly -c pytorch
-```
-
-Finally, install the required Python libraries:
-```
-conda install configargparse nltk future seutil==0.4.12 six torchtext==0.4.0 tqdm==4.30.*
-```
-
-### Installation of Roosterize and trained models
-
-To install Roosterize itself, clone the GitHub repository and enter the root directory:
-```
-git clone https://github.com/EngineeringSoftware/roosterize.git
-cd roosterize
-```
-
-Next, you need to obtain pre-trained models that capture
-naming conventions. We have trained several models using our
-[corpus][math-comp-corpus], which follows the conventions
-used in the [Mathematical Components][math-comp-website]
-family of projects. These models are available for separate
-[download][models-link].
-
-To use our pre-trained models, go to the Roosterize
-root directory, and download and unpack the archive (`models.tgz`):
-```
-wget https://github.com/EngineeringSoftware/roosterize/releases/download/v8.10.0/models.tgz
-tar xzf models.tgz
-```
-
-[models-link]: https://github.com/EngineeringSoftware/roosterize/releases/download/v8.10.0/models.tgz
+To set up the Conda environment, use one of the following commands,
+choosing the one suitable for your operating system and for whether
+you want to run Roosterize on a CPU or GPU.
+
+- Linux, CPU:
+```
+conda env create --name roosterize --file conda-envs/cpu.yml
+```
+
+- Linux, GPU w/ CUDA 10.0:
+```
+conda env create --name roosterize --file conda-envs/gpu-cuda10.yml
+```
+
+- Linux, GPU w/ CUDA 9.0:
+```
+conda env create --name roosterize --file conda-envs/gpu-cuda9.yml
+```
+
+- Mac, CPU:
+```
+conda env create --name roosterize --file conda-envs/mac-cpu.yml
+```
+
+Finally, activate the Conda environment before using Roosterize:
+```
+conda activate roosterize
+```
+
+### Installation of trained models
+
+Next, you need to obtain a pre-trained model that captures naming
+conventions. The default pre-trained model, which was trained using
+our [corpus][math-comp-corpus] and follows the conventions used in the
+[Mathematical Components][math-comp-website] family of projects, can
+be obtained by running the command:
+
+```
+./bin/roosterize download_global_model
+```
+
+The model will be downloaded to `$HOME/.roosterize/`. To use a
+different model (that we [released][latest-release] or you trained),
+simply put it in `$HOME/.roosterize/`.
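The `conda-envs/*.yml` files referenced in the installation steps above ship with the repository and are not shown in this diff. For orientation only, here is a hypothetical sketch of what such a Conda environment file could contain; the names and version pins are illustrative assumptions (PyTorch 1.1.0 matches the version named in the replaced instructions), not the actual file contents.

```
# Hypothetical sketch of a conda-envs/cpu.yml-style environment file.
# The real files in the repository are authoritative; use those instead.
name: roosterize
channels:
  - pytorch
  - defaults
dependencies:
  - python=3.7
  - pip
  - pytorch-cpu=1.1.0
```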

## Usage

To use Roosterize on a Coq verification project, you first need to
build the Coq project using a command provided by the project (usually
`make`).
-Then, at the root directory of the Roosterize project repository, run
-the command
-```
-python -m roosterize.main suggest_lemmas \
-  --project=$PATH_TO_PROJECT \
-  --serapi-options=$SERAPI_OPTIONS \
-  --model-dir=./models/roosterize-ta \
-  --output=./output
-```
-where `$PATH_TO_PROJECT` should be replaced with the path to the
-Coq project, and `$SERAPI_OPTIONS` should be replaced with the SerAPI
-command line options for mapping logical paths to directories
-(see [SerAPI's documentation][serapi-faq-link]). For example,
-if the logical path (inside Coq) for the project is `Verified`,
-you should set `SERAPI_OPTIONS="-R $PATH_TO_PROJECT,Verified"`.
-
-The above command extracts all lemmas from the project, uses Roosterize's
-pre-trained model (at `./models/roosterize-ta`) to predict a lemma name
-for each lemma, and finally prints the lemma name update suggestions,
-i.e., the predicted lemma names that are different from the existing
-ones. Below is an example of printed suggestions:
-```
->>>>> Suggestions:
-infotheo/ecc_classic/bch.v: infotheo.ecc_classic.bch.BCH.BCH_PCM_altP1 -> inde_F2
-infotheo/ecc_classic/bch.v: infotheo.ecc_classic.bch.BCH.BCH_PCM_altP2 -> inde_mul
-infotheo/ecc_classic/bch.v: infotheo.ecc_classic.bch.BCH.PCM_altP -> F2_eq0
-...
-```
+Then, run this command to get the lemma name suggestions for the
+lemmas in a Coq document (.v file):
+
+```
+python -m roosterize.main suggest_naming --file=PATH_TO_FILE
+```
+
+Roosterize automatically infers the root directory of the project by
+finding `_CoqProject`, and reads `_CoqProject` to infer the SerAPI
+command line options (for mapping logical paths to directories, see
+[SerAPI's documentation][serapi-faq-link]). If you don't have a
+`_CoqProject` file, you need to provide an additional argument
+`--project_root=PATH_TO_PROJECT_ROOT`.
+
+The above command extracts all lemmas from the file, uses the
+pre-trained model to generate likely lemma names for each lemma, and
+finally prints the lemma name update suggestions, i.e., the generated
+lemma names that are different from the existing ones. See an
+[example suggestion report](./docs/example-suggestion.txt).
+
+For other usages and command line interfaces of Roosterize, please
+check the help:
+```
+python -m roosterize.main help
+```
+
+[latest-release]: https://github.com/EngineeringSoftware/roosterize/releases/latest

[serapi-faq-link]: https://github.com/ejgallego/coq-serapi/blob/v8.10/FAQ.md#does-serapi-support-coqs-command-line-flags
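As noted in the usage instructions, Roosterize infers SerAPI options from the project's `_CoqProject` file, which maps logical paths to directories using Coq's standard project-file syntax. A hypothetical minimal `_CoqProject` for a project whose logical path (inside Coq) is `Verified` might look like this (the `.v` file names are illustrative):

```
-R . Verified
theories/defs.v
theories/lemmas.v
```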

## Technique
108 changes: 108 additions & 0 deletions docs/example-suggestion.txt
@@ -0,0 +1,108 @@
$ ~/projects/roosterize/bin/roosterize suggest_naming --file=$PWD/finmap/finmap.v
== Analyzed 110 lemma names, 8 (7.3%) conform to the learned naming conventions.
==========
== 21 can be improved and here are Roosterize's suggestions:
Line 851: fcatsK => eq_fcat (likelihood: 0.45)
Line 822: fcatC => eq_fcat (likelihood: 0.44)
Line 862: fcatKs => eq_fcat (likelihood: 0.43)
Line 1118: map_key_zip' => eq_zip (likelihood: 0.31)
Line 1178: zip_supp' => eq_zip (likelihood: 0.31)
Line 1258: zunit_fcat => zip_fcat (likelihood: 0.30)
Line 769: disjC => eq_disj (likelihood: 0.30)
Line 962: mapf_disj => eq_map (likelihood: 0.29)
Line 526: fcats0 => fcat_nil (likelihood: 0.28)
Line 1273: zunit_disj => disj_zip (likelihood: 0.27)
Line 1186: zip_supp => eq_zip (likelihood: 0.27)
Line 937: mapf_ins => map_ins (likelihood: 0.26)
Line 525: fcat0s => fcat_nil (likelihood: 0.25)
Line 443: seqof_ins => path_ordP (likelihood: 0.24)
Line 945: mapf_fcat => map_fcat (likelihood: 0.23)
Line 1139: zip_sorted' => map_zip (likelihood: 0.23)
Line 1006: path_mapk => path_ord_supp (likelihood: 0.23)
Line 953: mapf_disjL => eq_map (likelihood: 0.23)
Line 158: fmapP => eq_fnd (likelihood: 0.22)
Line 1075: zipC' => eq_zip (likelihood: 0.21)
Line 982: foldf_ins => path_foldfmap (likelihood: 0.20)
==========
== 81 can be improved but Roosterize cannot provide good suggestion:
Line 1082: zipA' (best guess: zip_bind; likelihood: 0.20)
Line 716: kfilter_pred0 (best guess: kfilter0s; likelihood: 0.19)
Line 462: path_supp_ins_inv (best guess: path_ord_supp; likelihood: 0.19)
Line 719: kfilter_predT (best guess: kfilterTE; likelihood: 0.18)
Line 1017: mapk_ins (best guess: path_ord; likelihood: 0.17)
Line 528: fcat_inss (best guess: eq_fcat; likelihood: 0.17)
Line 1194: zip_filter' (best guess: filter_zip; likelihood: 0.16)
Line 447: path_supp_ins (best guess: path_ord1; likelihood: 0.16)
Line 532: fcat_sins (best guess: eq_fcat; likelihood: 0.15)
Line 209: suppE (best guess: eq_suppP; likelihood: 0.15)
Line 1215: zip_fnd (best guess: eq_zip; likelihood: 0.14)
Line 979: foldf_nil (best guess: foldfmap_nil; likelihood: 0.14)
Line 606: supp_eq_ins (best guess: eq_supp; likelihood: 0.13)
Line 549: fcat_rems (best guess: eq_fcat; likelihood: 0.13)
Line 97: sorted_filter' (best guess: map_filter; likelihood: 0.13)
Line 650: sorted_kfilter (best guess: map_kfilter; likelihood: 0.13)
Line 665: kfilt_nil (best guess: nil_kfilter; likelihood: 0.12)
Line 120: last_ins' (best guess: path_ord; likelihood: 0.12)
Line 319: ins_ins (best guess: eq_ins; likelihood: 0.12)
Line 86: sorted_ins' (best guess: map_ins; likelihood: 0.12)
Line 657: supp_kfilt (best guess: supp_filter; likelihood: 0.12)
Line 922: map_key_mapf (best guess: map_inj; likelihood: 0.11)
Line 134: notin_path (best guess: path_ord; likelihood: 0.11)
Line 903: feqP (best guess: feqmP; likelihood: 0.11)
Line 1164: zip_unitE (best guess: existsP; likelihood: 0.11)
Line 358: rem_empty (best guess: nil_rem; likelihood: 0.11)
Line 233: supp_nilE (best guess: nilP; likelihood: 0.10)
Line 518: fcat_nil' (best guess: fcatP; likelihood: 0.09)
Line 253: fnd_empty (best guess: fnd_eq; likelihood: 0.09)
Line 262: fnd_ins (best guess: eq_fnd; likelihood: 0.09)
Line 830: fcatA (best guess: eq_fcat; likelihood: 0.09)
Line 1153: zip_unit_sorted' (best guess: zip_unit; likelihood: 0.09)
Line 709: kfilt_fcat (best guess: fcat_kfilter; likelihood: 0.08)
Line 668: fnd_kfilt (best guess: fndP; likelihood: 0.08)
Line 844: fcatCA (best guess: eq_fcat; likelihood: 0.08)
Line 558: fcat_srem (best guess: eq_fcat; likelihood: 0.08)
Line 621: fmap_ind2 (best guess: eq_supp; likelihood: 0.08)
Line 415: cancel_ins (best guess: eq_ins; likelihood: 0.08)
Line 1250: zunit_ins (best guess: zip_unit; likelihood: 0.07)
Line 76: path_ins' (best guess: map_ord; likelihood: 0.07)
Line 510: fcat_ins' (best guess: eq_ins'; likelihood: 0.07)
Line 63: sorted_nil (best guess: ord_sorted; likelihood: 0.07)
Line 776: disj_nil (best guess: disj_fin; likelihood: 0.07)
Line 255: fnd_rem (best guess: optio_fnd; likelihood: 0.07)
Line 142: path_supp_ord (best guess: path_ord; likelihood: 0.07)
Line 56: fmapE (best guess: prodP; likelihood: 0.07)
Line 1001: sorted_mapk (best guess: ord_supp; likelihood: 0.06)
Line 839: fcatAC (best guess: eq_fcat; likelihood: 0.06)
Line 793: disj_rem (best guess: eqseq_disj; likelihood: 0.05)
Line 760: disjP (best guess: eq_spec; likelihood: 0.05)
Line 722: kfilter_predI (best guess: kfilterI1; likelihood: 0.05)
Line 925: sorted_map (best guess: map_sorted; likelihood: 0.05)
Line 680: kfilt_ins (best guess: ins_kfilter; likelihood: 0.04)
Line 726: kfilter_predU (best guess: kfilter_fcat; likelihood: 0.04)
Line 1206: zip_rem (best guess: fin_zip; likelihood: 0.04)
Line 586: supp_fcat (best guess: eq_fcat; likelihood: 0.04)
Line 1115: zip_unitL' (best guess: zip_unit; likelihood: 0.04)
Line 236: supp_rem (best guess: meC_supp1; likelihood: 0.04)
Line 277: ins_rem (best guess: eq_ins; likelihood: 0.04)
Line 1028: map_id (best guess: fin_mapk; likelihood: 0.04)
Line 698: kfilt_rem (best guess: rem_kfilter; likelihood: 0.03)
Line 1126: zip_unitE' (best guess: eq_zipP; likelihood: 0.03)
Line 370: rem_ins (best guess: eq_rem; likelihood: 0.03)
Line 1033: map_comp (best guess: mapkC_comp; likelihood: 0.03)
Line 800: disj_remE (best guess: eq_disj; likelihood: 0.03)
Line 870: disj_kfilt (best guess: eqseq_kfilter; likelihood: 0.03)
Line 1267: zunit_supp (best guess: zip_unit; likelihood: 0.03)
Line 243: supp_ins (best guess: mort_supp; likelihood: 0.03)
Line 689: rem_kfilt (best guess: rem_kfilter; likelihood: 0.03)
Line 779: disj_ins (best guess: eqseq_disj; likelihood: 0.03)
Line 471: fmap_ind' (best guess: fin_supp; likelihood: 0.03)
Line 150: notin_filter (best guess: filter_predk; likelihood: 0.03)
Line 411: fnd_supp_in (best guess: mex_exists; likelihood: 0.03)
Line 191: suppP (best guess: supp_spec; likelihood: 0.03)
Line 361: rem_rem (best guess: eqseq_rem; likelihood: 0.02)
Line 1247: zunit0 (best guess: zip_comp; likelihood: 0.02)
Line 124: first_ins' (best guess: map_ord; likelihood: 0.02)
Line 181: predkN (best guess: prod0_predk; likelihood: 0.02)
Line 398: rem_supp (best guess: min_rem; likelihood: 0.02)
Line 483: fmap_ind'' (best guess: min_ord; likelihood: 0.01)
Line 881: in_disj_kfilt (best guess: in_supp; likelihood: 0.01)
6 changes: 3 additions & 3 deletions roosterize/interface/CommandLineInterface.py
@@ -61,7 +61,7 @@ def __init__(self):
        self.exclude_files = None
        self.exclude_pattern = None
        self.serapi_options = None
-        self.model_url = "https://github.com/EngineeringSoftware/roosterize/releases/download/v1.1.0+8.10.0/roosterize-model-t1.tgz"
+        self.model_url = "https://github.com/EngineeringSoftware/roosterize/releases/download/v1.1.0+8.10.2-beta/roosterize-model-t1.tgz"
        self.compile_cmd = None
        self.loaded_config_prj: Path = None

@@ -241,15 +241,15 @@ def report_predictions(self, data: ProcessedFile, candidates_logprobs: List[List
        # Print suggestions
        total = len(good_names) + len(bad_names_and_suggestions) + len(bad_names_no_suggestion)
        print(f"== Analyzed {total} lemma names, "
-              f"{len(good_names)} ({len(good_names)/total:.1%}) look good.")
+              f"{len(good_names)} ({len(good_names)/total:.1%}) conform to the learned naming conventions.")
        if len(bad_names_and_suggestions) > 0:
            print(f"==========")
            print(f"== {len(bad_names_and_suggestions)} can be improved and here are Roosterize's suggestions:")
            for lemma, suggestion, score in sorted(bad_names_and_suggestions, key=lambda x: x[2], reverse=True):
                print(f"Line {lemma.vernac_command[0].lineno}: {lemma.name} => {suggestion} (likelihood: {score:.2f})")
        if len(bad_names_no_suggestion) > 0:
            print(f"==========")
-            print(f"== {len(bad_names_no_suggestion)} can be improved but Roosterize cannot provide good suggestion (please consider improve_project_model):")
+            print(f"== {len(bad_names_no_suggestion)} can be improved but Roosterize cannot provide good suggestion:")
            for lemma, suggestion, score in sorted(bad_names_no_suggestion, key=lambda x: x[2], reverse=True):
                print(f"Line {lemma.vernac_command[0].lineno}: {lemma.name} (best guess: {suggestion}; likelihood: {score:.2f})")
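The reporting loop in the hunk above sorts candidate renamings by likelihood, descending, before printing. A minimal self-contained sketch of that logic, using hypothetical `(name, suggestion, score)` tuples in place of the real lemma objects:

```python
# Hypothetical stand-ins for the (lemma, suggestion, score) triples that
# the real code builds from Roosterize's model predictions.
bad_names_and_suggestions = [
    ("fcatC", "eq_fcat", 0.44),
    ("fcatsK", "eq_fcat", 0.45),
    ("zipC'", "eq_zip", 0.21),
]

# Sort by score, descending, so the most likely suggestions print first,
# mirroring `sorted(..., key=lambda x: x[2], reverse=True)` above.
for name, suggestion, score in sorted(
    bad_names_and_suggestions, key=lambda x: x[2], reverse=True
):
    print(f"{name} => {suggestion} (likelihood: {score:.2f})")
    # First line printed: fcatsK => eq_fcat (likelihood: 0.45)
```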

2 changes: 1 addition & 1 deletion roosterize/interface/VSCodeInterface.py
@@ -56,7 +56,7 @@ def report_predictions(self, data: ProcessedFile, candidates_logprobs: List[List

        total = len(good_names) + len(bad_names_and_suggestions) + len(bad_names_no_suggestion)
        self.show_message(f"{data.path}: Analyzed {total} lemma names, "
-                          f"{len(good_names)} ({len(good_names)/total:.1%}) look good. "
+                          f"{len(good_names)} ({len(good_names)/total:.1%}) conform to the learned naming conventions. "
                          f"Roosterize made {len(bad_names_and_suggestions)} suggestions.")

        # Publish suggestions as diagnostics
