Add description of role of infrastructure in package building #2150

Merged
166 changes: 162 additions & 4 deletions docs/maintainer/infrastructure.md
@@ -222,8 +222,7 @@ the bot now runs directly in `regro/cf-scripts`.

This web application powers several services, like:

- the `@conda-forge-admin` bot and its `@conda-forge-admin, please ...` commands
- the `cf-staging` to `conda-forge` validation (plus copy)
- status monitoring

@@ -469,6 +468,165 @@ uploaded from the new feedstock, by design.
Once this is done correctly and the package is uploaded,
you can then request the conda-forge core devs to archive the old feedstock.

## Stages of package building and involved infrastructure

Packages in conda-forge are almost[^manual-builds] always built through CI.
However, a new package enters conda-forge via a pull request to the [`staged-recipes` repository](#staged-recipes), while every subsequent build happens in the package's own repository, the so-called feedstock.
Both places use slightly different CI setups and interact with the infrastructure accordingly.
Hence, we first describe these interactions for a newly submitted package and then for existing packages in their respective feedstocks.

[^manual-builds]:
Very few packages cannot be built through CI due to special resource
requirements. These packages may be built and uploaded manually following the rules
laid out in [CFEP-3](https://github.com/conda-forge/cfep/blob/main/cfep-03.md).

### Initial submission to staged-recipes

The `conda-forge/staged-recipes` repository uses several pieces of infrastructure.

On pull requests:

- Package building pipelines. These are slightly different from the ones running in feedstocks (they are not automatically generated by `conda-smithy`, but they do use the same underlying components).
- The linter is provided by `conda-smithy recipe-lint`, run by `@conda-forge-admin`.
- Auto-labeling logic, run by GitHub Actions workflows.

Authenticated services involved:

- GitHub, with permissions for:
  - PR labeling
- Azure Pipelines

The conversion of new recipes in `staged-recipes` to their respective feedstocks
happens in a cron job run by `admin-requests`. For more details see [admin-requests](#admin-requests).
As part of the feedstock creation, the new feedstock receives a webhook connecting it with the [webservices](#admin-web-services).

### Feedstock changes

A feedstock can receive changes for several reasons.

Pushes to `main` or other branches:

- The automated initialization commits following approval in `staged-recipes`.
These are generated by `conda-smithy` and pushed by the automation in `admin-requests`.
- Automated maintenance commits triggered from `admin-migrations`.
- Rerender requests are handled by instances of `conda-forge/webservices-dispatch-action` and triggered by the [webservices](#admin-web-services).

Automatic pull requests can be opened by...

- `@conda-forge-admin`, responding to some issues with titles like `@conda-forge-admin, please...`.
- `@regro-cf-autotick-bot`, handling migrations and new upstream versions.

...and closed by:

- `conda-forge/automerge-action`, if labeled accordingly.

On an open pull request:

- The package building pipelines (more [below](#package-building)).
- The linter is provided by `conda-smithy recipe-lint`, run by `@conda-forge-admin`.
- The `@conda-forge-admin, please...` command comments, answered by `@conda-forge-admin`.

On issues:

- `@conda-forge-admin, please...` command issues, handled by `@conda-forge-admin`.

### Package building {#package-building}

The pipelines that build conda packages are used for both pull requests and push events in `main` and other branches.
The only difference is that the packages built during a pull request are not uploaded to the staging channel.
Keeping these up to date across all feedstocks involves several repositories:

- `conda-smithy` is in charge of generating the CI pipelines themselves, together with the supporting scripts and configuration files.
These pipelines and scripts can rely on code and data defined in the repositories below.
- `conda-forge-ci-setup-feedstock` provides the code needed to prepare and homogenize the CI runners across providers.
It also does some checks before the artifacts are uploaded to `cf-staging`.
- `conda-forge-pinning-feedstock` defines which versions are supported for a number of runtimes and libraries, as well as the compilers used for certain languages and platforms.
- `docker-images` builds the standardized container images for Linux runners.
This repository has additional authentication needs for Docker Hub and Quay.io.

The pipelines can run on several CI providers supported by `conda-smithy`, including:

- Azure DevOps Pipelines
- Travis CI
- CircleCI
- AppVeyor
- Self-hosted GitHub Actions runners

Registration of hooks and triggers is also done by the `conda-smithy` app.

:::tip
`conda-smithy` supports more CI providers.
Check [its repository](https://github.com/conda-forge/conda-smithy) for more details.
:::

Authenticated services involved:

- Anaconda.org uploads to `cf-staging`
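The upload rule stated at the start of this section (pull-request builds run the same pipeline but are never uploaded) can be summarized as a tiny decision function. This is a hypothetical sketch: `Trigger` and `upload_target` are illustrative names, not part of `conda-smithy` or `conda-forge-ci-setup`, and the real pipelines make this decision in their own generated scripts.

```python
from enum import Enum
from typing import Optional

STAGING_CHANNEL = "cf-staging"


class Trigger(Enum):
    """What started a CI run on a feedstock (hypothetical model)."""

    PUSH = "push"                  # push to main or another branch
    PULL_REQUEST = "pull_request"  # build only, for review


def upload_target(trigger: Trigger) -> Optional[str]:
    """Return the channel the built artifacts go to, or None to discard them."""
    # The build steps are identical for both triggers; only this final step differs.
    if trigger is Trigger.PULL_REQUEST:
        return None
    return STAGING_CHANNEL


for trigger in Trigger:
    print(trigger.value, "->", upload_target(trigger))
```

Keeping the build identical and branching only at the upload step is what lets a green pull request serve as a reliable preview of what the next push to `main` will publish.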

### Package validation and publication

Once built on `main` (or other branches), the conda packages are uploaded to an intermediary channel named `cf-staging`.
From there, the packages are downloaded by the validation server and, if validation succeeds, copied over to `conda-forge` itself.

- The validation logic is defined in `conda-forge/artifact-validation`.
- If the validation finds problems, the results are posted as issues in the same repository.
- This logic runs in `conda-forge/conda-forge-webservices`.
This web app also copies the artifacts from `cf-staging` to `conda-forge`.
- Part of the validation includes checking for cross-package clobbering.
The list of authorized feedstocks per package name is maintained at `conda-forge/feedstock-outputs`.
- Some further analysis might be performed _after_ publication.
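At its core, the clobbering check is a lookup: every package name in an upload must list the uploading feedstock as authorized. The sketch below assumes a simplified in-memory mapping; the `AUTHORIZED` data (with made-up package names) and the `find_unauthorized_outputs` helper are hypothetical, and the actual layout of `conda-forge/feedstock-outputs` and the code in `conda-forge/conda-forge-webservices` may differ.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical snapshot of the "package name -> authorized feedstocks" data
# kept in conda-forge/feedstock-outputs (names are made up for illustration).
AUTHORIZED: Dict[str, Set[str]] = {
    "example-pkg": {"example-pkg"},
    "libexample": {"example-pkg", "example-alt"},
}


def find_unauthorized_outputs(
    feedstock: str,
    output_names: Iterable[str],
    authorized: Dict[str, Set[str]] = AUTHORIZED,
) -> List[str]:
    """Return the package names that `feedstock` may not publish."""
    unauthorized = set()
    for name in output_names:
        # Names missing from the mapping are rejected too: a conservative
        # choice of this sketch, not necessarily what the real service does.
        if feedstock not in authorized.get(name, set()):
            unauthorized.add(name)
    return sorted(unauthorized)


print(find_unauthorized_outputs("example-alt", ["libexample", "example-pkg"]))  # ['example-pkg']
```

In this sketch, a non-empty result would mean the upload is held back instead of being copied from `cf-staging` to `conda-forge`.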

Authenticated services involved:

- Anaconda.org uploads to `conda-forge`
- The `conda-forge-webservices` app deployment itself (currently on Heroku)
- (?) Post new issues to `conda-forge/artifact-validation`

### Post-publication

Once uploaded to anaconda.org/conda-forge, packages are not immediately available to CLI clients.
They have to be replicated in the Content Distribution Network (CDN).
This step should ideally take around 15 minutes. In some circumstances, longer delays are possible. Check [conda-forge.org/status](https://conda-forge.org/status) in case of doubt.

After CDN replication, most packages available on anaconda.org/conda-forge won't undergo any further modifications.
However, in some cases, maintainers might need to perform some actions on the published packages:

- Patching their repodata
- Marking them as broken

#### Repodata patch

The metadata for `conda` packages is initially contained in each package archive (under `info/`).
`conda index` iterates over the published `conda` packages, extracts their metadata and consolidates the resulting JSON blobs into a single JSON file: `repodata.json`.
This step also adds the hashes and file sizes.
This is the metadata file that the CLI clients download initially to _solve_ the environment.
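The consolidation step can be pictured with the simplified sketch below. It is not `conda index` itself: it assumes each package's `info/index.json` has already been parsed into a dict, computes the `md5`, `sha256` and `size` fields from the archive bytes, and files `.tar.bz2` and `.conda` archives under the `packages` and `packages.conda` keys. The `build_repodata` helper is hypothetical, and the real tool records more fields and handles many more details.

```python
import hashlib
import json
from typing import Any, Dict, Tuple


def build_repodata(
    subdir: str, artifacts: Dict[str, Tuple[Dict[str, Any], bytes]]
) -> Dict[str, Any]:
    """Consolidate per-package metadata into a single repodata-like dict.

    `artifacts` maps each archive filename to a pair of
    (metadata parsed from its info/index.json, raw archive bytes).
    """
    repodata: Dict[str, Any] = {
        "info": {"subdir": subdir},
        "packages": {},        # .tar.bz2 archives
        "packages.conda": {},  # .conda archives
        "repodata_version": 1,
    }
    for filename, (index_json, payload) in sorted(artifacts.items()):
        record = dict(index_json)
        # Hashes and sizes are computed from the archive itself,
        # since they cannot be stored inside the archive they describe.
        record["md5"] = hashlib.md5(payload).hexdigest()
        record["sha256"] = hashlib.sha256(payload).hexdigest()
        record["size"] = len(payload)
        key = "packages.conda" if filename.endswith(".conda") else "packages"
        repodata[key][filename] = record
    return repodata


# Hypothetical single-artifact channel; the bytes stand in for a real archive.
example = build_repodata(
    "noarch",
    {
        "example-1.0-py_0.tar.bz2": (
            {"name": "example", "version": "1.0", "build": "py_0", "depends": []},
            b"fake archive bytes",
        )
    },
)
print(json.dumps(example, indent=2))
```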

Since the metadata is external to the package files, some details can be modified without rebuilding packages, which notably simplifies some maintenance tasks.

Repodata patches are created in `conda-forge/conda-forge-repodata-patches-feedstock`, which generates (and uploads) a regular `conda` package as a result:
[`conda-forge-repodata-patches`](https://anaconda.org/conda-forge/conda-forge-repodata-patches/files).
Each of these timestamped packages contains the patch instructions for each channel subdir on conda-forge.
The Anaconda infrastructure takes the JSON files from these packages and applies them on top of the vanilla `repodata.json` (which remains available for download as `repodata_from_packages.json`).
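Applying patches on top of the vanilla repodata can be sketched as follows. The sketch assumes a simplified instruction format, loosely modeled on the patch instructions shipped in `conda-forge-repodata-patches`: per-file field overrides under `packages` / `packages.conda`, and a `remove` list of filenames to drop. The field names and the `apply_patch_instructions` helper are assumptions for illustration; the exact format and the code Anaconda runs may differ.

```python
import copy
from typing import Any, Dict


def apply_patch_instructions(
    repodata: Dict[str, Any], instructions: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a patched copy of `repodata`, leaving the vanilla input untouched."""
    patched = copy.deepcopy(repodata)
    for key in ("packages", "packages.conda"):
        records = patched.get(key, {})
        # Override selected fields (e.g. a tightened `depends`) of existing records.
        for filename, changes in instructions.get(key, {}).items():
            if filename in records:
                records[filename].update(changes)
        # Drop whole records, e.g. for artifacts that should no longer be solvable.
        for filename in instructions.get("remove", []):
            records.pop(filename, None)
    return patched


# Hypothetical example: tighten a dependency and hide one artifact.
vanilla = {
    "packages": {
        "a-1.0-0.tar.bz2": {"name": "a", "depends": ["b"]},
        "c-1.0-0.tar.bz2": {"name": "c", "depends": []},
    },
    "packages.conda": {},
}
patched = apply_patch_instructions(
    vanilla,
    {
        "packages": {"a-1.0-0.tar.bz2": {"depends": ["b <2"]}},
        "remove": ["c-1.0-0.tar.bz2"],
    },
)
print(patched["packages"])
```

Returning a copy mirrors the split described above: the unpatched data stays available (as `repodata_from_packages.json`), while clients receive the patched view.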

Since `conda-forge-repodata-patches-feedstock` operates as a regular feedstock for package publication, there are no further infrastructural details to cover.

#### Mark a package as broken

Sometimes a package is faulty in ways that a repodata patch cannot amend (e.g. a bad binary).
In these cases, conda-forge does not remove packages from Anaconda.org.
Instead, it marks them with the `broken` label, which has a special meaning:
packages labeled as such will be removed from the repodata via automated patches.
This action is reversible and doesn't change the direct URL of the artifact, so the
file can still be downloaded directly, e.g. via the URL recorded in a lockfile.
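How a `broken` label could translate into an automated removal patch is sketched below. The input format (a mapping from filename to its Anaconda.org labels) and the `removal_instructions_for_broken` helper are hypothetical, and the actual automation may be structured differently.

```python
from typing import Dict, Iterable, List

BROKEN_LABEL = "broken"


def removal_instructions_for_broken(
    labels_by_file: Dict[str, Iterable[str]]
) -> Dict[str, List[str]]:
    """Build a patch that hides every artifact carrying the `broken` label.

    Only repodata entries are affected; the files stay on Anaconda.org,
    so their direct URLs keep working.
    """
    broken = sorted(
        filename
        for filename, labels in labels_by_file.items()
        if BROKEN_LABEL in labels
    )
    return {"remove": broken}


labels = {
    "pkg-1.0-h123_0.conda": ["broken"],  # hypothetical faulty build
    "pkg-1.0-h123_1.conda": ["main"],
}
print(removal_instructions_for_broken(labels))  # {'remove': ['pkg-1.0-h123_0.conda']}
```

Because the instructions in this sketch are regenerated from the current labels, taking the `broken` label off a file drops it from the `remove` list on the next run, which matches the reversibility described above.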

The main repository handling this is `conda-forge/admin-requests`, which runs several
GitHub Actions workflows every 15 minutes.

For this task, the GitHub Actions workflow needs access to:

- Anaconda.org, to add (or remove) labels
- GitHub, to modify and commit the input files after a successful run

## Inventory of services & providers

### GitHub resources
@@ -526,15 +684,15 @@ Most of them are associated with a feedstock, but there are a few special ones t

- [`conda-forge-admin`](https://github.com/conda-forge-admin)
- [`conda-forge-daemon`](https://github.com/conda-forge-daemon)
- [`regro-cf-autotick-bot`](https://github.com/regro-cf-autotick-bot)

:::info
These accounts exist but are not in active usage anymore:

- [`conda-forge-bot`](https://github.com/conda-forge-bot)
- [`conda-forge-coordinator`](https://github.com/conda-forge-coordinator)
- [`conda-forge-drone-ci`](https://github.com/conda-forge-drone-ci)
- [`conda-forge-linter`](https://github.com/conda-forge-linter)
- [`conda-forge-manager`](https://github.com/conda-forge-manager)
- [`conda-forge-status`](https://github.com/conda-forge-status)
