Standardize airflow build process and switch to Hatchling build backend #36536

Closed
wants to merge 1 commit into from

Commits on Jan 2, 2024

  1. Standardize airflow build process and switch to Hatchling build backend

    This PR changes the Airflow installation and build backend to use
    the new standard Python ways of building Python applications.
    
    We've been trying to do this for quite a while. Airflow has
    traditionally used a complex and convoluted build process based on
    setuptools and an (extremely) custom setup.py file. That process
    survived the migration to Airflow 2.0, splitting the Airflow
    monorepo into Airflow and Providers, adding pre-installed providers,
    and switching providers to flit (and to following build standards).
    
    So far, tooling in the Python ecosystem had not been able to fulfill
    our needs, and we refrained from developing our own tooling. With
    the appearance of Hatch (managed by the Python Packaging Authority)
    and a few recent advancements there, we are finally able to switch
    to the standard Python ways of managing project dependency
    configuration and project build setup (with a few customizations).
    
    This PR makes the Airflow build process follow these standard PEPs:
    
    * Airflow has all build configuration stored in pyproject.toml
      following PEP 518, which allows any frontend (`pip`, `poetry`,
      `hatch`, `flit`, or any other) to install the required build
      dependencies, install Airflow locally, and build distribution
      packages (sdist/wheel).
    
    * The Hatchling backend follows PEP 517 for the standard source tree
      and build backend implementation, which allows executing the
      build in a frontend-independent way.
    
    * We store all project metadata in pyproject.toml, following
      PEP 621, where all necessary project metadata components are
      defined.
    
    * We plug into Hatchling "editable build" hooks following
      PEP 660. Hatchling internally builds an editable wheel that
      serves as an ephemeral step and as communication between the
      backend and the frontend. This ephemeral wheel is used to make
      an editable installation of the project, suitable for fast
      iteration on code without reinstalling the package.
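
As a rough illustration of the PEP 518 / PEP 621 points above, the sketch below reads the tables a build frontend would look at before calling the backend. The TOML content is a hypothetical, heavily trimmed stand-in for the real pyproject.toml (the `dynamic` entry and the single `graphviz` extra are assumptions made for the example), and `describe_build` is an illustrative helper, not part of Airflow or of any frontend.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    # Assumption: on older interpreters the `tomli` backport is installed.
    import tomli as tomllib

# Hypothetical, trimmed-down pyproject.toml used only for this example.
PYPROJECT = """
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "apache-airflow"
dynamic = ["version"]

[project.optional-dependencies]
graphviz = ["graphviz"]
"""


def describe_build(pyproject_text: str) -> dict:
    """Collect what a frontend needs to know before it builds the project."""
    data = tomllib.loads(pyproject_text)
    build_system = data["build-system"]  # PEP 518 table
    project = data.get("project", {})    # PEP 621 table
    return {
        "requires": build_system["requires"],
        "backend": build_system["build-backend"],
        "name": project.get("name"),
        "extras": sorted(project.get("optional-dependencies", {})),
    }


info = describe_build(PYPROJECT)
print(info)
```

Because every frontend reads the same two tables, none of them needs Airflow-specific knowledge to install the build requirements and hand the build over to the backend.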
    
    Airflow has many provider packages in a single source tree, and we
    want to be able to install and develop Airflow and providers
    together. That makes this no small feat: an editable installation
    (where you want to edit sources directly) has to behave quite
    differently, in terms of packaging and dependencies, from an
    installable package (where you want separate Airflow and provider
    packages). Fortunately, the standardisation efforts in the Python
    packaging community and the tooling implementing them have finally
    made it possible.
    
    Some of the important ways this has been achieved:
    
    * pyproject.toml is generally managed manually, but the part that
      lists provider dependencies and bundle dependencies is
      automatically updated by a pre-commit hook whenever provider
      dependencies change.
    
    * We have dedicated (generated) `[devel_provider_*]` extras
      that install only provider dependencies in editable mode
      (not the final provider packages). This allows installing
      dependencies of providers individually or in groups in the
      editable installation of Airflow, without installing provider
      packages (i.e. we can use provider code directly from the sources
      of the editable Airflow installation).
    
    * We have some generated `[devel_*]` bundle extras that bundle
      together all or selected provider dependencies for installation
      in the CI image and in local editable virtualenv installations.
    
    * We use custom Hatchling build hooks (PEP 660 standard) that
      modify the 'standard' wheel package on the fly while the wheel
      is being prepared, by adding preinstalled package dependencies
      (which are not needed in an editable build) and by removing all
      devel extras (which are not needed in the wheel package
      distributed on PyPI). This solves the conundrum of having
      different "editable" and "standard" behaviour while keeping the
      same project specification in pyproject.toml.
    
    * We added a description of how `Hatch` can be employed as a build
      frontend to manage a local virtualenv and install Airflow in an
      editable way easily, while keeping all properties of the
      installed application (including a working airflow CLI and
      package metadata discovery), as well as how to use PEP-standard
      ways of building wheel and sdist packages.
    
    * We have a custom step (following PEP standards) to inject
      Airflow-specific build steps: compiling www assets and
      generating the git commit hash version to display in the UI.
    
    * We also show how all this makes it easy to manage local
      virtualenvs and editable installations for Airflow contributors
      without vendor lock-in to particular build tools. Because Airflow
      follows standard PEPs, anyone can install it locally and editably
      using any build frontend that follows the standards. Whether you
      use `pip`, `poetry`, `Hatch`, `flit` or any other frontend build
      tool, local installation and package building work the same way,
      and both "editable" and "standard" package preparation is
      managed by the `hatchling` backend.
    
    * Previously our extras contained a ".", which is not a normalized
      name for extras; `pip` and other tools replaced it automatically
      with "_". This change updates the extra names to contain
      "_" rather than "." in the name. This should be fully backwards
      compatible: users will still be able to use ".", but it will be
      normalized to "_" in Airflow packages.
    
    * Some of the problematic extras (graphviz, docgen) have been
      moved out of the core extras to optional ones. In particular,
      graphviz has been difficult to install on macOS ARM. This
      is slightly backwards incompatible, but we should treat it
      as a bugfix: the only feature Airflow will no longer handle out
      of the box is producing DAG output as an image (and installing
      graphviz is all it takes to bring it back). The difficulty of
      installing graphviz as a required dependency justifies the
      slight backwards-incompatible change.
    
    * Additionally, this change organizes the documentation around
      the extras and dependencies, explaining the reasoning behind
      all the different extras we have.
    
    * As a bonus (and this is what we used to test it all) we are
      documenting how to use the Hatch frontend to:
    
      * manage multiple Python installations
      * manage multiple Python virtualenv environments
      * build Airflow packages for release management
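
The "editable" versus "standard" behaviour and the extras renaming described in the list above can be sketched in plain Python. This is only an illustration of the idea under stated assumptions, not the hook code from this PR: `normalize_extra`, `metadata_for_build`, the `devel` prefix check and all sample dependency names are hypothetical, and a real Hatchling hook would apply similar logic through Hatchling's own plugin interfaces.

```python
# Assumption for this sketch: development-only extras start with "devel".
DEVEL_PREFIX = "devel"


def normalize_extra(name: str) -> str:
    """Replace "." with "_" in an extra name, as described above."""
    return name.replace(".", "_")


def metadata_for_build(version, dependencies, extras, preinstalled):
    """Return (dependencies, extras) for an "editable" or "standard" build.

    Editable builds keep the devel extras and skip preinstalled provider
    packages, because provider code is used directly from the sources.
    Standard builds add the preinstalled packages and drop devel extras.
    """
    normalized = {normalize_extra(k): list(v) for k, v in extras.items()}
    if version == "editable":
        return list(dependencies), normalized
    published = {
        name: deps
        for name, deps in normalized.items()
        if not name.startswith(DEVEL_PREFIX)
    }
    return list(dependencies) + list(preinstalled), published


# Hypothetical sample data, much smaller than the real project metadata.
core_deps = ["packaging"]
all_extras = {
    "apache.hdfs": ["hdfs"],
    "devel_provider_amazon": ["boto3"],
}
preinstalled_providers = ["apache-airflow-providers-http"]

editable_deps, editable_extras = metadata_for_build(
    "editable", core_deps, all_extras, preinstalled_providers
)
standard_deps, standard_extras = metadata_for_build(
    "standard", core_deps, all_extras, preinstalled_providers
)
print(editable_deps, sorted(editable_extras))
print(standard_deps, sorted(standard_extras))
```

Both results come from the same input metadata, which mirrors the goal stated above: one project specification in pyproject.toml, with the differences applied only at build time.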
    potiuk committed Jan 2, 2024
    d496faa