Merge pull request #10 from SasCezar/dev-doc
Added workflow for documentation building and deployment
SasCezar authored Dec 7, 2023
2 parents 015eeb2 + bd6c70c commit 7de26bb
Showing 6 changed files with 302 additions and 1 deletion.
34 changes: 34 additions & 0 deletions .github/workflows/docs.yaml
@@ -0,0 +1,34 @@
name: mkdocs

on:
  push:
    branches:
      - main
      - dev-doc
permissions:
  contents: write

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure Git Credentials
        run: |
          git config user.name github-actions[bot]
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
      - uses: actions/setup-python@v4
        with:
          python-version: 3.x
      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
      - uses: actions/cache@v3
        with:
          key: mkdocs-material-${{ env.cache_id }}
          path: .cache
          restore-keys: |
            mkdocs-material-
      - name: Install requirements
        run: python3 -m pip install -r mkdocs_requirements.txt
      - name: Deploy 🚀
        run: python3 -m mkdocs gh-deploy --force
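The cache key in the workflow above changes once per ISO week, because `date --utc '+%V'` prints the ISO 8601 week number. As a rough illustration only (the workflow itself uses the shell command, and the helper name `weekly_cache_key` is made up for this sketch), the same key could be computed in Python:

```python
from __future__ import annotations

from datetime import datetime, timezone


def weekly_cache_key(now: datetime | None = None, prefix: str = "mkdocs-material-") -> str:
    """Return a cache key that changes once per ISO 8601 week (UTC).

    `weekly_cache_key` is a hypothetical helper, not part of the workflow.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # isocalendar() yields (ISO year, ISO week, ISO weekday); `date +%V`
    # prints the week number zero-padded to two digits.
    iso_week = now.astimezone(timezone.utc).isocalendar()[1]
    return f"{prefix}{iso_week:02d}"
```

Because the `restore-keys` entry is only the prefix `mkdocs-material-`, a cache saved in an earlier week can still be restored as a fallback when the current week's key has no exact match.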
176 changes: 176 additions & 0 deletions docs/index.md
@@ -0,0 +1,176 @@
# AutoFL
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![DOI](https://zenodo.org/badge/644095707.svg)](https://zenodo.org/doi/10.5281/zenodo.10255367)
[![Docker](https://img.shields.io/badge/Docker-blue.svg)](https://img.shields.io/badge/Docker-blue)

Automatic source code file annotation using weak labelling.

## Setup
Clone the repository and the UI submodule [autofl-ui](https://github.com/SasCezar/autofl-ui) by running the following command:
```bash
git clone --recursive git@github.com:SasCezar/AutoFL.git AutoFL
```

### Optional Setup
To use certain features, such as semantic-based labelling functions, you need to download the corresponding model.
For example, for **w2v-so**, download the model from [here](https://github.com/vefstathiou/SO_word2vec) and place it in the [data/models/w2v-so](data/models/w2v-so) folder, or in a custom
path that you then set in the configs.

## Usage

Run the Docker Compose file [docker-compose.yaml](docker-compose.yaml) by executing:
```shell
docker compose up
```
in the project folder.

### API Endpoint
You can analyze the files of a project by making a request to the endpoint:
```shell
curl -X POST -d '{"name": "<PROJECT_NAME>", "remote": "<PROJECT_REMOTE>", "languages": ["<PROGRAMMING_LANGUAGE>"]}' localhost:8000/label/files -H "content-type: application/json"
```
For example, to analyze the files of [https://github.com/mickleness/pumpernickel](https://github.com/mickleness/pumpernickel), you can make the following request:
```shell
curl -X POST -d '{"name": "pumpernickel", "remote": "https://github.com/mickleness/pumpernickel", "languages": ["java"]}' localhost:8000/label/files -H "content-type: application/json"
```
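The same request can also be sent from Python. The sketch below uses only the standard library and mirrors the `curl` call above; the function names `build_label_request` and `label_files` are illustrative, and since the response format is not described here, the body is returned as raw text.

```python
import json
from urllib import request


def build_label_request(name, remote, languages, base_url="http://localhost:8000"):
    """Build (but do not send) the POST request for the /label/files endpoint."""
    payload = {"name": name, "remote": remote, "languages": languages}
    return request.Request(
        url=f"{base_url}/label/files",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def label_files(name, remote, languages, base_url="http://localhost:8000"):
    """Send the request and return the raw response body as text.

    Requires the docker compose stack to be running locally.
    """
    req = build_label_request(name, remote, languages, base_url)
    with request.urlopen(req) as response:
        return response.read().decode("utf-8")
```

With the services running, `label_files("pumpernickel", "https://github.com/mickleness/pumpernickel", ["java"])` should be equivalent to the second `curl` command.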

### UI

The tool also offers a web UI that is available at the following page (when running locally):
[http://localhost:8501](http://localhost:8501)

![UI](resources/ui-screenshots/landing-page.png)

For more details, check the [UI repo](https://github.com/SasCezar/autofl-ui).

[//]: # (For more details, check the [UI repo]&#40;https://github.com/SasCezar/autofl-ui&#41;)

## Configuration
AutoFL uses [Hydra](https://hydra.cc/) to manage the configuration. The configuration files are located in the [config](config) folder.
The main configuration file is [main.yaml](./config/main.yaml), which contains the following options:

- **local**: which environment to use, either local or docker. [Docker](./config/local/docker.yaml) is default.
- **taxonomy**: which taxonomy to use. Currently only [gitranking](./config/taxonomy/gitranking.yaml) is supported.
- **annotator**: which annotators to use. Default is [simple](./config/annotator/simple.yaml), which allows good results without extra dependencies on models.
- **version_strategy**: which version strategy to use. Default is [latest](./config/version_strategy/latest.yaml), which will only analyze the latest version of the project.
- **dataloader**: which dataloader to use. Default is [postgres](./config/dataloader/postgres.yaml) which allows the API to fetch already analysed projects.
- **writer**: which writer to use. Default is [postgres](./config/writer/postgres.yaml) which allows the API to store the results in a database.

Other configurations can be defined by creating a new file in the folder of the specific component.
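Hydra selects an option from a config group with command-line overrides of the form `group=option`. Assuming the options listed above are config groups in the defaults list of `main.yaml` (an assumption based on the folder layout, not something verified here), a small helper can turn a set of choices into such override strings. The name `hydra_overrides` is hypothetical.

```python
# Documented defaults for each option of main.yaml, as listed above.
DEFAULTS = {
    "local": "docker",
    "taxonomy": "gitranking",
    "annotator": "simple",
    "version_strategy": "latest",
    "dataloader": "postgres",
    "writer": "postgres",
}


def hydra_overrides(**choices: str) -> list:
    """Return Hydra-style `group=option` strings for non-default choices."""
    unknown = sorted(set(choices) - set(DEFAULTS))
    if unknown:
        raise KeyError(f"Unknown configuration option(s): {unknown}")
    # Keep the order of DEFAULTS and skip choices equal to the default.
    return [
        f"{group}={choices[group]}"
        for group in DEFAULTS
        if group in choices and choices[group] != DEFAULTS[group]
    ]
```

For example, `hydra_overrides(local="local")` yields `["local=local"]`, which could be appended to the command that launches the Hydra application.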

## Functionalities

- Annotation (UI/API/Script)
    - File
    - Package
    - Project
- Batch Analysis (Script Only)
- Temporal Analysis (**TODO**)
- Classification (**TODO**)

## Supported Languages

- Java
- Python (untested)
- C (untested)
- C++ (untested)
- C# (untested)

## Development

### Add New Languages

To support more languages, a new language-specific parser is needed.
One can be created quickly by using [tree-sitter](https://tree-sitter.github.io/tree-sitter/)
and a custom parser class.

#### Parser
The parser needs to be in the [parser/languages](./src/parser/languages) folder.
It has to extend the ```ParserBase``` class, which has the following interface.

```python
from abc import ABC
from pathlib import Path


class ParserBase(ABC):
    """
    Abstract class for a programming language parser.
    """

    def __init__(self, library_path: Path | str):
        """
        :param library_path: Path to the tree-sitter languages.so file. The file has to contain the
                             language parser. See tree-sitter for more details
        """
        ...
```
The language-specific class has to contain the logic that parses the language and extracts the identifiers.
For example, for Python, the class looks like this:

```python
import keyword
from pathlib import Path

from tree_sitter import Language

# ParserBase and Extension come from the AutoFL source; their imports are omitted here.


class PythonParser(ParserBase, lang=Extension.python.name):  # The lang argument is used to register the parser in the ParserFactory class.
    """
    Python specific parser. Uses a generic grammar for multiple versions of python. Uses tree_sitter to get the AST
    """

    def __init__(self, library_path: Path | str):
        super().__init__(library_path)
        self.language: Language = Language(library_path, Extension.python.name)  # Creates the tree-sitter language for python
        self.parser.set_language(self.language)  # Sets tree-sitter parser to parse the language

        # Pattern used to match the identifiers; it depends on the language. Check tree-sitter
        self.identifiers_pattern: str = """
                                        ((identifier) @identifier)
                                        """

        # Creates the query used to find the identifiers in the AST produced by tree-sitter
        self.identifiers_query = self.language.query(self.identifiers_pattern)

        # Keywords that will be ignored, in this case the language specific keywords, as the query extracts them as well.
        self.keywords = set(keyword.kwlist)  # Use python's built in keyword list
        self.keywords.update(['self', 'cls'])
```

A custom class that does not rely on [tree-sitter](https://github.com/tree-sitter/tree-sitter) can also be used; however, more methods of ParserBase then need to be
overridden. Check the implementation of [ParserBase](src/parser/parser.py).
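The `lang=` keyword in the class statement of `PythonParser` is a class keyword argument. One common way to implement this kind of registration is `__init_subclass__`; the sketch below is a standalone illustration of that pattern under this assumption and is not taken from the AutoFL source. The classes here are simplified stand-ins that only reuse the names from the text.

```python
from __future__ import annotations

from abc import ABC


class ParserFactory:
    """Illustrative registry mapping a language name to its parser class."""

    _parsers: dict = {}

    @classmethod
    def register(cls, lang: str, parser_cls: type) -> None:
        cls._parsers[lang] = parser_cls

    @classmethod
    def create(cls, lang: str, *args, **kwargs):
        try:
            parser_cls = cls._parsers[lang]
        except KeyError:
            raise ValueError(f"No parser registered for language: {lang!r}") from None
        return parser_cls(*args, **kwargs)


class ParserBase(ABC):
    def __init_subclass__(cls, lang: str | None = None, **kwargs):
        # Runs once per subclass definition; `lang` is the keyword given in
        # the class statement, e.g. `class X(ParserBase, lang="python")`.
        super().__init_subclass__(**kwargs)
        if lang is not None:
            ParserFactory.register(lang, cls)

    def __init__(self, library_path):
        self.library_path = library_path


class PythonParser(ParserBase, lang="python"):
    """Defining the class is enough to register it under "python"."""
```

With this pattern, `ParserFactory.create("python", "languages.so")` returns a `PythonParser` instance without the factory ever importing the subclass by name.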

## Disclaimer

The project is still in development, and it might not work as expected in some cases.
It has been developed and tested on Docker 24.0.7 for ```Ubuntu 22.04```. While minor testing has been done on ```Windows``` and ```macOS```,
some functionality might not work due to differences in Docker on these OSs (e.g. Windows uses WSL 2).

In case of any problems, please open an issue, make a pull request, or contact me at ```c.a.sas@rug.nl```.

## Cite

If you use this work, please cite us:

### Paper
```text
@article{sas2024multigranular,
title = {Multi-granular Software Annotation using File-level Weak Labelling},
author = {Cezar Sas and Andrea Capiluppi},
journal = {Empirical Software Engineering},
volume = {29},
number = {1},
pages = {12},
year = {2024},
url = {https://doi.org/10.1007/s10664-023-10423-7},
doi = {10.1007/s10664-023-10423-7}
}
```

**Note**: The code used in the paper is available in the [https://github.com/SasCezar/CodeGraphClassification](https://github.com/SasCezar/CodeGraphClassification) repository.
However, this tool is more up to date, is easier to use, configurable, and also offers a UI.

### Tool
```text
@software{sas2023autofl,
author = {Sas, Cezar and Capiluppi, Andrea},
month = dec,
title = {{AutoFL}},
url = {https://github.com/SasCezar/AutoFL},
version = {0.3.0},
year = {2023},
url = {https://doi.org/10.5281/zenodo.10255368},
doi = {10.5281/zenodo.10255368}
}
```
41 changes: 41 additions & 0 deletions mkdocs.yaml
@@ -0,0 +1,41 @@
# yaml-language-server: $schema=https://squidfunk.github.io/mkdocs-material/schema.json

### Site metadata ###

site_name: AutoFL
site_description: Documentation for AutoFL
site_url: https://sascezar.github.io/AutoFL

repo_url: https://github.com/SasCezar/AutoFL
edit_uri: edit/main/docs/docs

### Build settings ###

theme:
  name: material

plugins:
  - search

  - gen-files:
      scripts:
        - scripts/python/gen_ref_pages.py

  - literate-nav:
      nav_file: SUMMARY.md
      implicit_index: false
      tab_length: 4

  - section-index

  - mkdocstrings:
      default_handler: python
      handlers:
        python:
          paths: [ src ]
          options:
            docstring_style: sphinx

nav:
  - Home: index.md
  - Code Reference: reference/SUMMARY.md
7 changes: 7 additions & 0 deletions mkdocs_requirements.txt
@@ -0,0 +1,7 @@
mkdocs==1.5.3
mkdocstrings==0.24.0
mkdocs-gen-files==0.5.0
mkdocstrings-python==1.7.5
mkdocs-literate-nav==0.6.1
mkdocs-section-index==0.3.8
mkdocs-material==9.4.14
10 changes: 9 additions & 1 deletion pyproject.toml
@@ -24,7 +24,7 @@ python-rake = "^1.5.0"
more-itertools = "^10.1.0"
tree-sitter = "^0.20.4"
sqlalchemy = "^2.0.21"
-psycopg = {extras = ["binary"], version = "^3.1.13"}
+psycopg = {extras = ["binary"], version = "^3.1.14"}
gensim = "^4.3.2"
fasttext-wheel = "^0.9.2"
transformers = "^4.35.2"
@@ -33,6 +33,14 @@ sentence-transformers = "^2.2.2"
[tool.poetry.group.dev.dependencies]
notebook = "^6.5.4"
jupyter = "^1.0.0"
+mkdocs = "^1.5.3"
+mkdocstrings = "^0.24.0"
+mkdocs-gen-files = "^0.5.0"
+mkdocstrings-python = "^1.7.5"
+mkdocs-literate-nav = "^0.6.1"
+mkdocs-section-index = "^0.3.8"
+mkdocs-material = "^9.4.14"
+

[build-system]
requires = ["poetry-core"]
35 changes: 35 additions & 0 deletions scripts/python/gen_ref_pages.py
@@ -0,0 +1,35 @@
"""Generate the code reference pages and navigation."""

from pathlib import Path

import mkdocs_gen_files

nav = mkdocs_gen_files.Nav()

src = Path(__file__).parent.parent.parent / "src"

for path in sorted(src.rglob("*.py")):
    module_path = path.relative_to(src).with_suffix("")
    doc_path = path.relative_to(src).with_suffix(".md")
    full_doc_path = Path("reference", doc_path)

    parts = tuple(module_path.parts)

    if parts[-1] == "__init__":
        parts = parts[:-1]
        doc_path = doc_path.with_name("index.md")
        full_doc_path = full_doc_path.with_name("index.md")
    elif parts[-1] == "__main__":
        continue

    nav[parts] = doc_path.as_posix()

    with mkdocs_gen_files.open(full_doc_path, "w") as fd:
        ident = ".".join(parts)
        fd.write(f"::: {ident}")

    #mkdocs_gen_files.set_edit_path(full_doc_path, Path('../') / path)
    mkdocs_gen_files.set_edit_path(full_doc_path, path)

with mkdocs_gen_files.open("reference/SUMMARY.md", "w") as nav_file:
    nav_file.writelines(nav.build_literate_nav())
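The path handling in the loop above can be looked at in isolation. The sketch below reproduces only the mapping from a source file to its module identifier and generated page, using the standard library and no `mkdocs_gen_files`; the function name `reference_entry` and the example file names in the usage note are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def reference_entry(relative_path: str) -> tuple | None:
    """Map a file path relative to src to (module identifier, doc page path).

    Returns None for __main__ modules, which the script skips.
    """
    path = PurePosixPath(relative_path)
    parts = path.with_suffix("").parts
    doc_path = path.with_suffix(".md")

    if parts[-1] == "__init__":
        # A package is documented on its index page, named after the package.
        parts = parts[:-1]
        doc_path = doc_path.with_name("index.md")
    elif parts[-1] == "__main__":
        return None

    return ".".join(parts), PurePosixPath("reference", doc_path).as_posix()
```

For instance, `reference_entry("parser/languages/python.py")` gives the identifier `parser.languages.python` and the page `reference/parser/languages/python.md`, while a package's `__init__.py` maps to that package's `index.md`.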
