Table of Contents
- Automated cache refreshing in CI
- Manually generating constraint files
- Manually refreshing the images
Our CI system is built so that it maintains itself. Regular scheduled builds and merges to the main
branch have a separate maintenance step that takes care of refreshing the cache that is
used to speed up our builds and to speed up rebuilding of Breeze images for development
purposes. This all happens automatically, usually:
- The latest constraints are pushed to the appropriate branch after all tests succeed in a main merge or in a scheduled build.
- The images in the ghcr.io registry are refreshed after every successful merge to main or scheduled build and after pushing the constraints. This means that the latest image cache also uses the latest tested constraints.
Sometimes, however, during a prolonged period of fighting flakiness in GitHub Actions runners or in our tests, the refresh might not be triggered because the tests do not succeed for some time. In this case a manual refresh might be needed.
breeze build-image --run-in-parallel --upgrade-to-newer-dependencies --answer yes
breeze generate-constraints --airflow-constraints-mode constraints --run-in-parallel --answer yes
breeze generate-constraints --airflow-constraints-mode constraints-source-providers --run-in-parallel --answer yes
breeze generate-constraints --airflow-constraints-mode constraints-no-providers --run-in-parallel --answer yes
AIRFLOW_SOURCES=$(pwd)
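The three generate-constraints calls above differ only in the constraints mode, so they can be driven from a loop. The following is a minimal sketch, not part of Breeze: the function name generate_all_constraints and its optional "runner" argument are hypothetical, introduced only so the commands can be printed as a dry run before being executed for real.

```shell
# Hypothetical helper (not part of Breeze): run generate-constraints once per mode.
# The flags are exactly the ones used in the commands listed above.
generate_all_constraints() {
    # First argument: the command to invoke. Defaults to breeze;
    # pass "echo" to print the commands instead of running them.
    runner="${1:-breeze}"
    for mode in constraints constraints-source-providers constraints-no-providers; do
        "${runner}" generate-constraints \
            --airflow-constraints-mode "${mode}" \
            --run-in-parallel --answer yes || return 1
    done
}

# Dry run: prints the three command lines without executing breeze.
generate_all_constraints echo
```

Calling generate_all_constraints with no argument would run the real commands and stop at the first failing mode.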
The constraints will be generated in files/constraints-PYTHON_VERSION/constraints-*.txt files. You need to
check out the right 'constraints-' branch in a separate repository, and then you can copy, commit and push the
generated files:
cd <AIRFLOW_WITH_CONSTRAINTS-MAIN_DIRECTORY>
git pull
cp "${AIRFLOW_SOURCES}"/files/constraints-*/constraints*.txt .
git diff
git add .
git commit -m "Your commit message here" --no-verify
git push
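Because the copy step relies on a glob, it can fail unclearly when no constraints were generated. The sketch below wraps the copy in a check; copy_constraints is a hypothetical helper name, and the only assumption about the file layout is the files/constraints-*/constraints*.txt pattern used above.

```shell
# Hypothetical helper: copy generated constraints files and fail loudly if none exist.
copy_constraints() {
    src_root="$1"   # Airflow sources directory (AIRFLOW_SOURCES above)
    dest_dir="$2"   # checkout of the constraints branch
    copied=0
    for f in "${src_root}"/files/constraints-*/constraints*.txt; do
        # When the glob matches nothing, the pattern is left as a literal string; skip it.
        [ -f "${f}" ] || continue
        cp "${f}" "${dest_dir}/" || return 1
        copied=$((copied + 1))
    done
    if [ "${copied}" -eq 0 ]; then
        echo "No generated constraints files found under ${src_root}/files" >&2
        return 1
    fi
    echo "Copied ${copied} constraints file(s) to ${dest_dir}"
}

# Usage, from inside the constraints checkout:
#   copy_constraints "${AIRFLOW_SOURCES}" . && git diff
```

Reviewing the output of git diff before committing remains the main safeguard; the helper only guards against an empty copy.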
Note that in order to refresh the images you not only need the buildx command installed for docker,
but you should also make sure that you have the buildkit builder configured and set. Since we also build
multi-platform images (for both AMD and ARM), you need to have support for qemu or hardware ARM/AMD builders
configured.
According to the official installation instructions this can be achieved via:
docker run --privileged --rm tonistiigi/binfmt --install all
More information can be found here
However, emulation is very slow - more than 10x slower than hardware-backed builds.
If you plan to build a number of images, a better solution is probably to set up a hardware remote builder for your ARM or AMD builds (depending on which platform you build images on, the "other" platform should be remote).
This can be achieved by setting up a builder as described in this guideline and
adding it to the docker buildx airflow_cache builder.
This can usually be done with these two commands:
docker buildx create --name airflow_cache # your local builder
docker buildx create --name airflow_cache --append HOST:PORT # your remote builder
One way to obtain HOST:PORT is to log in to the remote machine via SSH and forward a port to the docker engine running on the remote machine.
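As a rough illustration of such port forwarding, the sketch below only prints an ssh command; it does not open a connection. It rests on several assumptions: the remote docker engine listens on the default Unix socket /var/run/docker.sock, the SSH user is allowed to access that socket, and the OpenSSH client is recent enough to forward a local TCP port to a remote Unix socket. The name tunnel_command and the user@remote-builder target are hypothetical placeholders.

```shell
# Hypothetical helper: print (not run) an ssh command that forwards a local
# TCP port to the docker socket on the remote machine.
tunnel_command() {
    remote="$1"        # SSH target, for example user@remote-builder (placeholder)
    port="${2:-2375}"  # local port; 2375 matches the sample buildx output in this document
    # -N: do not run a remote command, only forward.
    # -L: local 127.0.0.1:PORT is forwarded to the remote Unix socket (assumed path).
    echo "ssh -N -L 127.0.0.1:${port}:/var/run/docker.sock ${remote}"
}

tunnel_command "user@remote-builder"

# With the tunnel running in another terminal, HOST:PORT for the append command
# shown above would be 127.0.0.1:2375.
```

Binding the forwarded port to 127.0.0.1 keeps the remote engine reachable only from the local machine.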
When everything is fine, you should see both the local and the remote builder configured and reporting status:
docker buildx ls
airflow_cache docker-container
airflow_cache0 unix:///var/run/docker.sock
airflow_cache1 tcp://127.0.0.1:2375
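The listing above can also be checked from a script. The sketch below assumes that the full docker buildx ls output includes a platforms column containing strings such as linux/amd64 and linux/arm64 (the excerpt above omits it); has_both_platforms is a hypothetical helper that reads the listing from standard input. It looks at the whole listing, so it does not verify that both platforms belong to the airflow_cache builder specifically.

```shell
# Hypothetical helper: succeed only if the listing on stdin mentions both platforms.
has_both_platforms() {
    listing="$(cat)"
    printf '%s\n' "${listing}" | grep -q 'linux/amd64' || return 1
    printf '%s\n' "${listing}" | grep -q 'linux/arm64' || return 1
}

# Usage (requires docker with buildx):
#   docker buildx ls | has_both_platforms && echo "AMD and ARM builders available"
```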
The images can be rebuilt and refreshed after the constraints are pushed. Refreshing the images for all Python versions is as simple as running the refresh_images.sh script, which sequentially rebuilds all the images. Building several images in parallel on one machine usually does not speed up the build significantly, which is why the images are built sequentially.
./dev/refresh_images.sh