Crux is a suite of LLM-empowered summarization and retrieval services for academic activity. Crux is developed by XCCV group of cvpaper.challenge.

cvpaperchallenge/Crux

Crux

Badges: stable release, Python versions, tests, MIT License, Code style: Black, Code style: Flake8, Imports: isort, Typing: Mypy, DOI

What is Crux?

Crux (Accelerator of SCiENtific DEvelopment and Research) is a GitHub repository template for research projects that use Python as the development language. The following features are pre-implemented to accelerate your development:

  • Container: Docker reduces dependencies on the development environment and improves code portability.
  • Virtual environment / package management: Package management with Poetry makes the environment reproducible.
  • Coding style: Automatic code formatting and style checking using Black, Flake8, and isort.
  • Static type check: Static type checking with Mypy helps find bugs.
  • pytest: Test code can be added easily using pytest.
  • GitHub features: Useful features such as a workflow that runs style checks and tests on pull requests, issue templates, etc. are pre-implemented.

Please also see the slides about Crux (in Japanese).

Project Organization

    ├── .github/           <- Settings for GitHub.
    │
    ├── data/              <- Datasets.
    │
    ├── environments/      <- Environment-specific provisioning.
    │
    ├── models/            <- Pretrained and serialized models.
    │
    ├── notebooks/         <- Jupyter notebooks.
    │
    ├── outputs/           <- Outputs.
    │
    ├── src/               <- Source code. This should be a Python module.
    │
    ├── tests/             <- Test code.
    │
    ├── .flake8            <- Setting file for Flake8.
    ├── .dockerignore
    ├── .gitignore
    ├── LICENSE
    ├── Makefile           <- Makefile used as task runner.
    ├── poetry.lock        <- Lock file. DON'T edit this file manually.
    ├── poetry.toml        <- Setting file for Poetry.
    ├── pyproject.toml     <- Setting file for Project. (Poetry, Black, isort, Mypy)
    └── README.md          <- The top-level README for developers.

Prerequisites

NOTE: The example commands in this README.md are written for Docker Compose v2. Crux should also work with Docker Compose v1. If you are using Docker Compose v1, replace docker compose with docker-compose in the examples.
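As a convenience, the choice between the two command forms can be detected at run time. The following is a minimal sketch, not part of Crux itself; the function name detect_compose is hypothetical, and it only assumes that Docker Compose v2 answers to docker compose version while v1 is installed as a separate docker-compose executable.

```shell
# Print the Docker Compose command available on this machine.
# Prefers the v2 plugin ("docker compose") and falls back to v1 ("docker-compose").
# detect_compose is a hypothetical helper name, not something Crux provides.
detect_compose() {
  if docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"
  else
    echo "Docker Compose not found" >&2
    return 1
  fi
}
```

For example, sudo $(detect_compose) up -d would then run whichever form is installed.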

Prerequisites installation

Here, we show example commands for installing the prerequisites on Ubuntu. If the prerequisites are already installed in your environment, please skip this section. If you want to install them in another environment, please follow the official documentation.

Install Docker and Docker Compose

# Set up the repository
$ sudo apt update
$ sudo apt install ca-certificates curl gnupg lsb-release
$ sudo mkdir -p /etc/apt/keyrings
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
$ echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker and Docker Compose
$ sudo apt update
$ sudo apt install docker-ce docker-ce-cli containerd.io docker-compose-plugin

If sudo docker run hello-world works, the installation succeeded.

(Optional) NVIDIA Container Toolkit

If you want to use a GPU in Crux, please install NVIDIA Container Toolkit (nvidia-docker2) too. NVIDIA Container Toolkit has its own prerequisites, so please check its official documentation first.

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

$ sudo apt update
$ sudo apt install -y nvidia-docker2
$ sudo systemctl restart docker

If sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base nvidia-smi works, the installation succeeded.

Quick start

This section explains how to start using Crux. Please refer to the slides (in Japanese) for detailed information.

Create GitHub repo from Crux

First of all, create your own GitHub repo from Crux, using it as a repository template.

Now, a new repo should be created from Crux in your GitHub account.

Start development

# Clone repo
$ git clone git@github.com:<YOUR_USER_NAME>/<YOUR_REPO_NAME>.git
$ cd <YOUR_REPO_NAME>

# Build Docker image and run container
$ cd environments/gpu  # if you want to use only cpu, `cd environments/cpu`
$ sudo docker compose up -d

# Run bash inside the container (jump into the container)
$ sudo docker compose exec core bash

# Create virtual environment and install dependent packages by Poetry
$ poetry install

Now, you are ready to start development with Crux.
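Once poetry install has finished, the tools listed under "What is Crux?" can be run through Poetry. The following is a minimal sketch, assuming that Black, isort, Flake8, Mypy, and pytest are installed as dependencies by poetry install. The function name run_checks and the RUN variable are hypothetical, and the Makefile in the repository may already define equivalent tasks that are not shown in this README.

```shell
# Run the style, type, and test tools inside the Poetry virtual environment.
# Assumption: the tools are installed by `poetry install`.
# run_checks and RUN are hypothetical names used only for this sketch.
# Set RUN=echo to print the commands instead of executing them.
run_checks() {
  ${RUN:-} poetry run black --check . &&
  ${RUN:-} poetry run isort --check-only . &&
  ${RUN:-} poetry run flake8 . &&
  ${RUN:-} poetry run mypy . &&
  ${RUN:-} poetry run pytest
}
```

Calling run_checks inside the container stops at the first tool that reports a problem, because the commands are chained with &&.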

Stop development

# Stop container
$ cd environments/gpu  # or `cd environments/cpu` 
$ sudo docker compose stop

FAQ

Use Crux without Docker

We recommend using Crux with Docker as described above. However, you might not be able to install Docker in your development environment, for example because of permission issues.

In such cases, Crux can be used without Docker. To do so, install Poetry on your computer and follow the steps described in the "Start development" section, skipping the steps related to Docker.

# Install Poetry
$ pip3 install poetry

# Clone repo
$ git clone git@github.com:<YOUR_USER_NAME>/<YOUR_REPO_NAME>.git
$ cd <YOUR_REPO_NAME>

# Create virtual environment and install dependent packages by Poetry
$ poetry install

NOTE: The CI job (GitHub Actions workflow) of Crux uses the Dockerfile. Therefore, using Crux without Docker might cause errors in the CI job. In that case, please modify the Dockerfile appropriately or delete the CI job (.github/workflows/lint-and-test.yaml).

Permission error is raised when executing poetry install

Sometimes poetry install might raise a permission error like the following:

$ poetry install
...

virtualenv: error: argument dest: the destination . is not write-able at /home/challenger/crux

In that case, please check the UID (user ID) and GID (group ID) on your local PC as follows:

$ id -u $USER  # check UID
$ id -g $USER  # check GID

In Crux, both default to 1000. If the UID or GID on your local PC is not 1000, change the corresponding value in docker-compose.yaml from 1000 to match your local PC. Alternatively, if the environment variables HOST_UID and HOST_GID are defined on the host PC, Crux uses those values.
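The environment-variable route can be sketched as follows. This is a minimal example that only assumes the variable names HOST_UID and HOST_GID mentioned above. The commented sudo line is an additional assumption: sudo often resets the environment by default, and whether --preserve-env is permitted depends on the local sudo policy.

```shell
# Export the current user's UID and GID under the names mentioned in this README.
export HOST_UID="$(id -u)"
export HOST_GID="$(id -g)"
echo "HOST_UID=${HOST_UID} HOST_GID=${HOST_GID}"

# Assumption: sudo resets the environment by default, so the variables are
# passed through explicitly when starting the container (run this in
# environments/gpu or environments/cpu):
# sudo --preserve-env=HOST_UID,HOST_GID docker compose up -d --build
```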

Compatibility issue between PyTorch and Poetry

NOTE: Crux now uses Poetry 1.2, so this issue is expected to be resolved.

Currently, there is a compatibility issue between PyTorch and Poetry. This issue is being worked on by the Poetry community and is expected to be resolved in 1.2.0. (You can check the pre-release of 1.2.0 here.)

We plan to incorporate Poetry 1.2.0 into Crux immediately after its release. In the meantime, please consider using the workaround described in this issue.

Some related GitHub issues

Change the Python version to run CI jobs

By default, the CI job (GitHub Actions workflow) of Crux runs against Python 3.8 and 3.9. If you want to change the target Python versions, please modify the matrix part of .github/workflows/lint-and-test.yaml.

When changes to the Dockerfile are not reflected correctly on the image build

When you run sudo docker compose up after modifying the Dockerfile, you may find that the changes are not reflected in the built image. In that case, please try the following commands:

$ sudo docker compose build --no-cache
$ sudo docker compose up --force-recreate -d

When changes to the Dockerfile are not reflected, the potential reasons are:

  1. Docker uses the cache to build the image
  2. Docker does not recreate the container

The sudo docker compose build --no-cache command builds the Docker image without using the cache (the solution for the first case), and the sudo docker compose up --force-recreate -d command recreates and starts the containers (the solution for the second case).

Activate/deactivate caching in CI job

Caching was introduced in the CI job (lint-and-test.yaml) in v0.1.2 to reduce the latency caused by the Docker image build and Poetry install. However, this feature has not yet been fully tested, so if you do not want to use it in the CI job, please change the value of the USE_CACHE variable in lint-and-test.yaml to false.
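To find where the variable is set before editing it, a small search helper can be used. This is a minimal sketch: the function name show_ci_setting is hypothetical, and the default path is the workflow path given elsewhere in this README. The exact layout of the workflow file is not shown here, so the helper only prints matching lines and changes nothing.

```shell
# Print every line of a workflow file that mentions a given key, with line numbers.
# show_ci_setting is a hypothetical helper name used only for this sketch.
# $1: key to search for (for example USE_CACHE)
# $2: workflow file (defaults to the path named in this README)
show_ci_setting() {
  key="$1"
  file="${2:-.github/workflows/lint-and-test.yaml}"
  grep -n "$key" "$file"
}
```

Running show_ci_setting USE_CACHE from the repository root lists the lines to edit.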
