
Archive home directory using multi-stage build #781

Merged: 24 commits merged into main from smaller-image on Jul 25, 2024

Conversation

@danielhollas (Contributor) commented Jul 22, 2024

Supersedes #778, hopefully the last iteration!

The main goal here is to reduce the complexity of the status quo and of #740.

The strategy of archiving the home directory and extracting it at startup allows for a lot of simplification of the Dockerfile, since everything can be prepared directly in the home folder without intermediary steps, and this lets us get rid of the current startup scripts (70_, 71_).

All startup scripts from full-stack are preserved and reused, which minimizes duplication, resolves the SSH key issue, and should be more maintainable.

The only new startup script is 00_untar_home.sh, which is basically the same here as in #740.
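
To make the idea concrete, here is a minimal sketch of the archive-at-build / extract-at-startup approach. The base image name, archive path, and user home directory below are illustrative assumptions, not taken from this PR; only the home_stage name and the home.tar archive appear in the discussion.

```Dockerfile
# Sketch only: archive the fully prepared home folder in a build stage...
FROM aiidalab/full-stack:latest AS home_stage
# Prepare the AiiDA profile, localhost computer, QE codes and pseudopotentials
# directly in the home folder, then pack the whole folder into one archive.
RUN tar -cf /opt/home.tar -C /home/jovyan .

FROM aiidalab/full-stack:latest
# ...and carry only the archive into the final image. The home folder itself
# stays empty so it can live on a persistent volume.
COPY --from=home_stage /opt/home.tar /opt/home.tar
# At container startup, 00_untar_home.sh extracts the archive into $HOME
# (roughly `tar -xf /opt/home.tar -C "$HOME"`) before the other startup
# scripts run.
```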

I've done some quick benchmarking: starting the container takes around 12 s on my machine. The image size is around 5.8 GB. We could trade around 300 MB of image size for an extra 3 s of startup time if we compressed the home.tar archive. (My timings seem roughly consistent with those observed in #740.)

Reducing the image size will be done in a subsequent PR.

codecov bot commented Jul 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.21%. Comparing base (4d92c54) to head (bf57f63).
Report is 43 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #781   +/-   ##
=======================================
  Coverage   68.21%   68.21%           
=======================================
  Files          45       45           
  Lines        4147     4147           
=======================================
  Hits         2829     2829           
  Misses       1318     1318           
Flag          Coverage Δ
python-3.10   68.21% <ø> (ø)
python-3.9    68.25% <ø> (ø)

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@danielhollas force-pushed the smaller-image branch 2 times, most recently from 1a0c3ae to c9be311, on July 22, 2024 18:36
@danielhollas marked this pull request as ready for review on July 22, 2024 18:58
@danielhollas (Contributor, Author) commented:

@unkcpz @superstar54 This is hopefully the final iteration of the docker build. 😅 For easier review, I've split chunks of this PR into 3 extra pull requests that should be reviewed and merged first: #782, #783, #784

The startup takes slightly less than 10 s. I think more speedup is possible:

  • Updating to Python 3.11
  • Speeding up verdi storage migrate (I'll open an issue on aiida-core)
  • Speeding up verdi in general (not sure if possible)

The size is currently 5.8 GB, but I don't understand where the increase comes from, even though I've tried very hard to get rid of it. Let's merge the three PRs first; I'll continue to investigate.

@unkcpz (Member) left a comment


Thanks @danielhollas! The implementation looks super clear, I have one minor request.

@superstar54, you showed interest in learning more about Docker; I'd say this is a nice PR to read if you have time.

Comment on lines +38 to +40
RUN --mount=from=uv,source=/uv,target=/bin/uv \
uv pip install --strict --system --cache-dir=${UV_CACHE_DIR} .

Member

This also happens in stage 4, but I understand it is not avoidable.

Contributor Author

Yes. It is unfortunate that we need to install all the dependencies just to install the QE codes and pseudos. But uv is so fast, and I am reusing its cache, so in terms of speed it doesn't matter much.
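
For readers unfamiliar with the mechanism: a BuildKit cache mount is one way to get this kind of cache reuse, sketched below. The cache path /root/.cache/uv stands in for ${UV_CACHE_DIR} and is an assumption; the PR may share the cache between stages differently.

```Dockerfile
# Illustrative only: keep the uv cache outside the image layers and share it
# between builds and stages, so repeated `uv pip install` runs mostly hit the
# cache instead of re-downloading and re-building wheels.
RUN --mount=from=uv,source=/uv,target=/bin/uv \
    --mount=type=cache,target=/root/.cache/uv \
    uv pip install --strict --system --cache-dir=/root/.cache/uv .
```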

Dockerfile Outdated
Comment on lines 42 to 45
# STAGE 3:
# - Prepare AiiDA profile and localhost computer
# - Install QE codes and pseudopotentials
# - Archive home folder
Member

Wouldn't it be better to merge stage 2 and stage 3? The comment says "to run aiidalab_qe CLI commands", so it seems clearer to run them directly in the same stage. I believe the final size will be the same.

Contributor Author

You are right. I did this mainly as a logical separation, but it is not needed and might be confusing. I'll merge them.

Contributor Author

Actually, it is beneficial to leave this as a separate stage, because the uv cache can then be used immediately in the final stage, without waiting for the rest of the home_stage build (which is the longest part of the build). I've rearranged things a bit for better cache utilization. Now, when you modify the Dockerfile and rebuild, it only takes 10 s!
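
A rough sketch of why the separate stage helps, relying on BuildKit building independent stages in parallel and on a shared uv cache (the cache-sharing mechanism is not shown). Stage names other than home_stage and the base image are made up, and the uv mount is borrowed from the snippet quoted above.

```Dockerfile
# Illustrative stage layout, not the PR's exact Dockerfile.
FROM aiidalab/full-stack:latest AS deps_stage
RUN --mount=from=uv,source=/uv,target=/bin/uv \
    uv pip install --system .   # fast: installs the Python dependencies

FROM deps_stage AS home_stage
# Slow: set up the AiiDA profile, install QE codes and pseudopotentials,
# then tar the home folder.

FROM aiidalab/full-stack:latest
RUN --mount=from=uv,source=/uv,target=/bin/uv \
    uv pip install --system .   # can run before home_stage has finished
# Only this COPY has to wait for the slow home_stage build.
COPY --from=home_stage /opt/home.tar /opt/home.tar
```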

@danielhollas marked this pull request as ready for review on July 23, 2024 16:12

@danielhollas (Contributor, Author) commented Jul 23, 2024

Hmm, after rearranging things a little bit, the image size dropped from 5.8 GB to 5.1 GB (compared to 4.1 GB on main), although I have no idea why.

540 MB comes from the home.tar file, and ~100 MB comes from Python bytecode *.pyc files, which we were previously not compiling. So there is still ~300 MB hiding somewhere, but I have no idea where; in any case, this is now ready from my side.
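
For context, ahead-of-time bytecode compilation is typically a single extra build step; below is a sketch of the kind of command that produces those *.pyc files (the exact step in this PR may differ).

```Dockerfile
# Illustrative: pre-compile installed packages to *.pyc files, trading image
# size (~100 MB here) for faster first imports at container startup.
RUN python -m compileall -q "$(python -c 'import site; print(site.getsitepackages()[0])')"
```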

@unkcpz perhaps you can deploy again to the demo server for testing?

@unkcpz (Member) commented Jul 23, 2024

> the image size dropped from 5.8 GB to 5.1 GB (compared to 4.1 GB on main), although I have no idea why.

If you click on a specific tag on Docker Hub, you'll see how much size each layer generates: https://hub.docker.com/r/aiidalab/qe/tags
Apparently, you removed uploading images to Docker Hub ;)

I also found this fancy tool to check the details of image layer sizes: https://github.com/wagoodman/dive

@unkcpz (Member) commented Jul 25, 2024

Forgot to mention: the image was redeployed to Azure and works as I expected. I think we can merge this, and from next week I'll be working on the HyperQueue integration.
Let me know if you still want to take a closer look at the image size.

@danielhollas merged commit 82b329b into main on Jul 25, 2024
12 checks passed
@danielhollas deleted the smaller-image branch on July 25, 2024 17:07
@unkcpz (Member) commented Jul 31, 2024

> Apparently, you removed uploading images to Docker Hub ;)

Hi @danielhollas, I guess you missed the comment above?

@danielhollas (Contributor, Author) commented:

Hi. I am aware of that, although I should have been more explicit. Does publishing to Docker Hub bring any benefits? Given what you told me about this image being mostly important for the demo server deployment, I think publishing only to ghcr.io is fine?

Publishing to Docker Hub would complicate the GitHub Actions workflow, so unless there is a clear benefit I'd advise against it.

@unkcpz (Member) commented Jul 31, 2024

One thing that is a bit annoying is that I cannot tell which tags are available in the ghcr.io registry, since we have a lot of images with pr-xx tags and digests directly.

@danielhollas (Contributor, Author) commented:

Yeah, the ghcr.io interface is not great. But with the significantly simpler workflow, you don't really need to search for tags, do you? If you look at the workflow, we no longer push by digest or commit SHA, only pr-xxx on PRs, edge on main, and the version when a new tag is pushed.

@unkcpz (Member) commented Jul 31, 2024

Makes sense; I think we want a highly maintainable repo that involves as few outside tools as possible to fit that goal. Would you then mind adding a paragraph to the README explaining which tags can be used and which branch each comes from? Sort of like the "Supported tags" section of aiidalab-docker-stack.
