Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use maturin as a build backend and improve github workflows #165

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

yvan-sraka
Copy link
Contributor

@yvan-sraka yvan-sraka commented Jan 31, 2025

Why

maturin is the best for Rust-based extensions and makes much faster builds. For cross-compilation cross could be used, it works well with maturin, it's much faster than CPU emulation since it natively compiles code, has pre-configured cross-compilers. Linux aarch64 build was taking more than an hour most likely due to QEMU, because it translates every instruction: #62

Example
image

In this PR

Openssl

Switching from openssl to rustls, which drastically simplifies absolutely everything 🎉

It was possible to opt out by:

  • enabling rustls-tls feature for hf-hub dependency
  • creating PR in tokenizers to add rustls-tls feature, which was quickly merged on their side, so we're pinning to that commit (also Rust code was adjusted due to new version)

Workflows

Changes:

  • new dry-run release workflow
  • new custom action to bump version automatically, it's a custom action, because we need it within job's context
  • new reusable workflow to build the wheels: for release workflows & for dry-run release workflows on merges to main
  • uniting release PyPi & release Rust workflows, so that if one fails, the other won't be published, otherwise we're in situation with releases mismatch in Python & Rust registries

Results

Each job:

  • creates 5 wheels, one per python version
  • runs between 3 mins (macos) and 8 mins (windows), when hits all the caches

image

https://github.com/dottxt-ai/outlines-core/actions/runs/13306992049/job/37160143310?pr=165

@yvan-sraka yvan-sraka requested review from torymur and rlouf January 31, 2025 15:13
@yvan-sraka yvan-sraka self-assigned this Jan 31, 2025
@yvan-sraka yvan-sraka linked an issue Feb 3, 2025 that may be closed by this pull request
@yvan-sraka yvan-sraka force-pushed the maturin branch 3 times, most recently from c1f307d to 7d7d102 Compare February 4, 2025 12:43
@torymur
Copy link
Contributor

torymur commented Feb 4, 2025

Agree with the steps, especially this:

merging jobs wasn't the best approach

And we need to take into consideration, that currently for macos builds there is inconsistency #168, I will fix this using macos-latest: #169
But this reopens the issue to have builds both for macos-14 & macos-15: #75

Maybe having jobs per each os with strategy matrix per python version and target:

  release:
    name: Release to PyPI
    runs-on: ubuntu-latest
    needs:
      - linux
      - windows
      - macos-14
      - macos-15
      - linux-cross

@yvan-sraka yvan-sraka requested a review from torymur February 4, 2025 12:45
@yvan-sraka yvan-sraka marked this pull request as ready for review February 4, 2025 12:46
@yvan-sraka
Copy link
Contributor Author

yvan-sraka commented Feb 4, 2025

 __________________ ERROR collecting tests/test_json_schema.py __________________
ImportError while importing test module '/home/runner/work/outlines-core/outlines-core/tests/test_json_schema.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/_pytest/python.py:493: in importtestmodule
    mod = import_path(
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/_pytest/pathlib.py:549: in import_path
    mod = _import_module_using_spec(
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/_pytest/pathlib.py:715: in _import_module_using_spec
    spec.loader.exec_module(mod)  # type: ignore[union-attr]
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/_pytest/assertion/rewrite.py:184: in exec_module
    exec(co, module.__dict__)
tests/test_json_schema.py:5: in <module>
    from outlines_core.json_schema import build_regex_from_schema
E   ModuleNotFoundError: No module named 'outlines_core.json_schema'       

@yvan-sraka yvan-sraka force-pushed the maturin branch 2 times, most recently from e180ccc to df00717 Compare February 4, 2025 23:21
@yvan-sraka yvan-sraka requested a review from torymur February 4, 2025 23:22
@yvan-sraka
Copy link
Contributor Author

I gave a try to % maturin generate-ci > .github/workflows/release_pypi.yaml, WDYT?

@yvan-sraka
Copy link
Contributor Author

Run python -m coverage combine
Combined data file .coverage.8cc0c411e24d7bac9c3565f8296dcf91
No source for code: '/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/outlines_core/__init__.py'.
Error: Process completed with exit code 1.

@torymur
Copy link
Contributor

torymur commented Feb 5, 2025

To address questions:

  1. generated release file could be a good starting point (maybe, if it's easier), but it can't be used as it is, we need to do match exactly what we currently have and then iterate if needed
  2. scopes seem okay, tests are failing, I think you might need to adjust the testing workflow to do maturin develop before running them

@yvan-sraka yvan-sraka force-pushed the maturin branch 4 times, most recently from aef5905 to 4782a04 Compare February 5, 2025 14:21
@torymur
Copy link
Contributor

torymur commented Feb 17, 2025

Some updates:

  • python version to rely on maturin action's interpreter
  • python 3.13
  • twine to check the wheels & check imports where appropriate
  • build_wheels reusable workflow
  • custom bump version action

Versioning investigation:

  • setuptools_scm is designed for setuptools and relies on extension APIs of setuptools, maturin as a pep517 build backend doesn't even use setuptools so it can't support setuptools_scm
  • maturin relies on Cargo.toml's version and correctly writes to wheels's metadata (manually confirmed), so that version via from importlib.metadata import version will show a correct version. But we will also repeat the existing logic and separately provide __version__ module's attribute, since it was provided before
  • experimented with build.rs to handle version bumping in case of release & dev work. But after setting up everything I wasn't able to find a way not to overwrite Cargo.toml file. Even though it'll work, it will also create new Cargo.toml with a different layout of dependencies and written version locally. It looked good generally as a way to do things, but it would only make sense if we'll stop bumping version automatically and switch again to manual bumps (which is, by looking around, actually a very common way to do this)
  • if we'll stick to automatic version bumping via script: we won't be able to setup dev version like 2.1.3-dev, because Rust uses SemVer while python uses PEP 440, which have e.g. some differences when declaring prereleases. Python expects version to be defined like 2.1.3.dev and like Rust 2.1.3-dev and there is only one Cargo.toml. So, for dev work we'd stick with 0.0.0 placeholder versions.
It was possible in build.rs via `semver::Prerelease::new("dev")`
[build-dependencies]
semver = "1.0.25"
toml = "0.8.20"
...

fn git_tag() -> Option<semver::Version> {
    let mut cmd = std::process::Command::new("git");
    cmd.args(["describe", "--exact-match", "--tags"]);

    let tag = cmd.output().ok()?;
    if !tag.status.success() {
        return None;
    }

    let version = String::from_utf8_lossy(&tag.stdout[1..]);
    semver::Version::parse(version.trim()).ok()
}

fn dev_version() -> semver::Version {
    let mut cmd = std::process::Command::new("git");
    cmd.args(["describe", "--tags", "--abbrev=0"]);

    let latest_tag = match cmd.output() {
        Ok(tag) => String::from_utf8_lossy(&tag.stdout).to_string(),
        Err(_) => "0.0.0".to_string(),
    };

    let mut version =
        semver::Version::parse(latest_tag.trim()).expect("Parsing latest version failed");
    version.pre = semver::Prerelease::new("dev").expect("Pre-release");
    version
}

fn bump_cargo_toml(version: &semver::Version) -> Result<(), Box<dyn std::error::Error>> {
    let cargo_toml_path = std::path::Path::new(env!("CARGO_MANIFEST_DIR")).join("Cargo.toml");
    let cargo_toml = std::fs::read_to_string(&cargo_toml_path)?;

    let mut parsed_toml: toml::Value =
        toml::de::from_str(&cargo_toml).expect("Unable to parse Cargo.toml");
    parsed_toml["package"]["version"] = toml::Value::String(version.to_string());

    let updated_toml = toml::ser::to_string(&parsed_toml).expect("Unable to serialize TOML");
    std::fs::write(&cargo_toml_path, updated_toml).expect("Unable to write updated Cargo.toml");

    Ok(())
}

fn main() {
    println!("cargo:rerun-if-changed=build.rs");
    println!("cargo:rerun-if-changed=Cargo.toml");

    let version = git_tag().unwrap_or(dev_version());
    bump_cargo_toml(&version).expect("Version update of Cargo.toml failed");
}

@torymur torymur force-pushed the maturin branch 2 times, most recently from 981ce78 to ccc52aa Compare February 18, 2025 11:44
@torymur torymur force-pushed the maturin branch 3 times, most recently from 46ef0f5 to 279b640 Compare February 18, 2025 12:24
@torymur
Copy link
Contributor

torymur commented Feb 18, 2025

1. Incorrect bumping action

Current action for bumping is incorrect, with regex it gets wrong CARGO_VERSION It was working before because we didn't have dependencies defined separately like [dependencies.hf-hub]

Details
   - name: Check Cargo.toml version matches Release tag
      run: |
        CARGO_VERSION=$(grep '^version =' Cargo.toml | sed 's/.*"\(.*\)".*/\1/')
        echo "Cargo version is $CARGO_VERSION"
        echo "Tags: ${GITHUB_REF#refs/tags/}"
        if [ "${GITHUB_REF#refs/tags/}" != "$CARGO_VERSION" ]; then
          echo "Version mismatch: Cargo.toml ($CARGO_VERSION) doesn't match Release tag (${GITHUB_REF#refs/tags/})"
          exit 1
        fi
      shell: bash

It matches also versions for dependencies, defined lower (0.4.1):

Cargo version is 0.2.3
=0.4.1
Tags: refs/pull/165/merge
Version mismatch: Cargo.toml (0.2.3
=0.4.1) doesn't match Release tag (refs/pull/165/merge)

Confirmed that now it works correctly: https://github.com/dottxt-ai/outlines-core/actions/runs/13390036577/job/37395484642?pr=165#step:4:186 with:

VERSION=$(cargo metadata --format-version=1 --no-deps | jq -r '.packages[0].version')

Result as expected:

Package version is 0.2.3
Tags: refs/pull/165/merge
Version mismatch: Cargo.toml (0.2.3) doesn't match Release tag (refs/pull/165/merge)

2. Wheels checks

Checked written versions in wheels with automated script, which bumps version, also scopes & overall functioning. All looks good.

Wheels for checking could be downloaded here: https://github.com/dottxt-ai/outlines-core/actions/runs/13390483062

Scopes & versions confirmed
In [2]: import outlines_core

In [3]: dir(outlines_core)
Out[3]: 
['Guide',
 'Index',
 'Vocabulary',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 'json_schema',
 'outlines_core']

In [4]: outlines_core.__version__
Out[4]: '0.2.3'

In [5]: from importlib.metadata import version

In [6]: version("outlines_core")
Out[6]: '0.2.3'
outlines_core-0.2.3.dist-info$ cat METADATA 
Metadata-Version: 2.4
Name: outlines_core
Version: 0.2.3

3. Dry-run publish workflow

It ran successfully in terms of building wheels, but cargo publish --dry-run failed with:

error: all dependencies must have a version specified when publishing.
dependency `tokenizers` does not specify a version
Note: The published dependency will use the version from crates.io,
the `git` specification will be removed from the dependency declaration.

https://github.com/dottxt-ai/outlines-core/actions/runs/13392912542/job/37405398157

Indeed, we're using tokenizers from git with specific commit, meanwhile for publishing on crates.io it's required to be with a version, available on crates.io.

I asked for a fresh release: huggingface/tokenizers#1736

@torymur torymur changed the title Use maturin as build-backend Use maturin as a build backend and improve github workflows Feb 18, 2025
@torymur torymur force-pushed the maturin branch 5 times, most recently from 068789f to b147dcf Compare February 24, 2025 13:13
@torymur
Copy link
Contributor

torymur commented Feb 25, 2025

Hey @yvan-sraka Could you please review this one?

Copy link
Contributor Author

@yvan-sraka yvan-sraka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not the right to approve (but I would do otherwise) as I opened that PR with my user … even if clearly most of the work here was made by @torymur.

I left a lot of comments because that's a huge PR that fixes so many useful things, like documenting and cleaning the Python interfaces exposed, but overall great work that could be merged as is!

run: |
VERSION=$(cargo metadata --format-version=1 --no-deps | jq -r '.packages[0].version')
if [ "${GITHUB_REF#refs/tags/}" != "$VERSION" ]; then
echo "Version mismatch: Cargo.toml ($VERSION) doesn't match Release tag (${GITHUB_REF#refs/tags/})"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great to check that assertion here!

strategy:
matrix:
platform:
# older ubuntu to avoid messing with glibc version
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a link here or a short paragraph that explain what's exactly the kind of glibc-related bug we try to avoid here? and since we're not distributing a static binary (e.g., musl-based) this kind of issue can still affect users in the end, right?

- name: Install required packages
run: |
sudo apt update
sudo apt install pkg-config gcc-aarch64-linux-gnu g++-aarch64-linux-gnu -qy
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why build-essential isn't enough here? if not for openssl, what's provided by pkg-config? I'm btw curious if gcc could be replaced by clang here without breaking anything?

run: twine check --strict dist/*

- name: Install built wheel
if: matrix.platform.target == 'x86_64'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we filtering out aarch64 here?

platform:
- runner: windows-latest
target: x86
alias-target: i686-pc-windows-msvc
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are Windows 32 bits programs still a thing?

workflow_dispatch:
push:
branches: [main]
# TODO: remove
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not addressing those left TODO comments in that PR?

@@ -76,11 +79,15 @@ jobs:
- uses: actions/setup-python@v4
with:
cache: pip
python-version: "3.11"
python-version: "3.10"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's broken with Python 3.10?

@@ -24,16 +24,6 @@ repos:
rev: 23.3.0
hooks:
- id: black
- repo: https://github.com/pre-commit/mirrors-mypy
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look like an unrelated change to that PR?

@@ -1,11 +1,27 @@
# How to release
## Overview
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great guide! Rendered

@@ -12,7 +12,6 @@ dependencies:
- pydantic
- pytest
- pre-commit
- jsonschema
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why that change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI enhancement New feature or request
Projects
None yet
2 participants