Procedure for reinstalling software with the bot (and preventing chown/chmod issues in the container) #312

Closed
satishskamath opened this issue Jul 21, 2023 · 9 comments · Fixed by #488
@satishskamath
Collaborator

satishskamath commented Jul 21, 2023

EasyBuild crashed because it could not chown/chmod some /cvmfs/... paths. Most likely this only happens when the bot is rebuilding something; otherwise we would have seen this error with every build.
easybuild-9ukb1c3n.log

Originally posted by @satishskamath in #311 (comment)

@bedroge
Collaborator

bedroge commented Jul 24, 2023

I have seen this before, mostly in cases where we were updating the existing compatibility layer. That was often because the original installation had been done with, for instance, a different user ID. I don't think that is the case here, as we're doing all installations with the bot. Instead, I think this is due to the --read-only-installdir option that we added a few months ago: #245.

Apptainer> cd /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/amd/zen3/software/ReFrame/
Apptainer> ls -l
total 5
dr-xr-xr-x. 10 bedroge users 4096 Jul  4 10:13 4.2.0
Apptainer> touch 4.2.0/test
touch: cannot touch '4.2.0/test': Permission denied
Apptainer> chmod +w 4.2.0 
chmod: changing permissions of '4.2.0': Permission denied
Apptainer> rm -f 4.2.0/bin/pip
rm: cannot remove '4.2.0/bin/pip': Permission denied

Seems like I'm not allowed to change it in any way, even with a writable overlay.

Comparing it to the older 2021.12 pilot, where we didn't use this option, I am allowed to do whatever I want:

Apptainer> cd /cvmfs/pilot.eessi-hpc.org/versions/2021.12/software/linux/x86_64/amd/zen3/software/TensorFlow/
Apptainer> ls -l
total 5
drwxr-xr-x. 5 bedroge users 4096 Jan 15  2022 2.3.1-foss-2020a-Python-3.8.2
Apptainer> rm 2.3.1-foss-2020a-Python-3.8.2/bin/
estimator_ckpt_converter  pyrsa-decrypt             pyrsa-priv2pub            saved_model_cli           tflite_convert            
google-oauthlib-tool      pyrsa-encrypt             pyrsa-sign                tensorboard               toco                      
markdown_py               pyrsa-keygen              pyrsa-verify              tf_upgrade_v2             toco_from_protos          
Apptainer> rm 2.3.1-foss-2020a-Python-3.8.2/bin/tensorboard 
Apptainer> chmod g+w 2.3.1-foss-2020a-Python-3.8.2/
Apptainer> 

@bedroge
Collaborator

bedroge commented Jul 24, 2023

Using Apptainer's --fakeroot seems to solve the issue:

Apptainer> cd /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/amd/zen3/software/ReFrame/
Apptainer> ls -l
total 5
dr-xr-xr-x 10 root root 4096 Jul  4 10:13 4.2.0
Apptainer> rm 4.2.0/bin/pip
Apptainer> rm -rf 4.2.0/
Apptainer> 

But it probably also defeats the purpose of using --read-only-installdir ;-).

@ocaisa
Member

ocaisa commented Jul 24, 2023

I'm not sure what the right move here is. I think the use of --read-only-installdir is a good idea to protect our installations from accidental changes. In the case of @satishskamath, he is looking to rebuild an existing installation, which one would expect to be a rare case.

In #295, there is a discussion about what the workflow should be in that scenario. Perhaps the use of --fakeroot should be part of that workflow? Perhaps there should be a special easystack file for rebuilds so that the impact can be contained and recorded?

@satishskamath
Collaborator Author

satishskamath commented Jul 24, 2023

  1. But is it still possible to make the bot install in a local eb prefix if the rebuild option is given? Then it doesn't need to modify the /cvmfs paths at all. I do agree with the read only install directory in Make sure installations are read only by enabling --read-only-installdir EasyBuild configure option #245 .
  2. Also can such a local build be ingested?

@ocaisa
Member

ocaisa commented Jul 24, 2023

  1. But is it still possible to make the bot install in a local eb prefix if the rebuild option is given? Then it doesn't need to modify the /cvmfs paths at all. I do agree with the read only install directory in Make sure installations are read only by enabling --read-only-installdir EasyBuild configure option #245 .

It's always possible to do this, but you have then changed the path, so the installation in the old location is still available.

  2. Also can such a local build be ingested?

Not in the correct location. Trying to "move" an installation can very easily be problematic (because of CMake paths, pkgconfig, not to mention what the package itself may do).

@bedroge
Collaborator

bedroge commented Feb 22, 2024

Looking into this again now, since it's also an issue for #404 where we need to replace the OpenMPI installation. This looks similar to containers/fuse-overlayfs#377 and containers/fuse-overlayfs#374. I'm not sure how we can easily work around this; the --fakeroot option is still the only workaround that I've found so far. We could consider adding a rebuild command to the bot, which would then set this option. It would also need to run eb with --allow-use-as-root-and-accept-consequences, though, as otherwise EB will complain ("You seem to be running EasyBuild with root privileges which is not wise, so let's end this here."). I'm not sure if this has any other implications for the way we build and ingest things, e.g. in terms of ownership.
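
To make the combination of the two flags concrete, here is a minimal sketch that only prints the resulting command line and does not run anything. The function name and its two arguments (a container image and an easyconfig file) are hypothetical placeholders; the bot's real build scripts launch the container through their own wrapper with an overlay, which this sketch leaves out.

```shell
# Sketch only: print (do not run) a rebuild command that combines
# Apptainer's --fakeroot with the EasyBuild flag mentioned above.
# "image" and "easyconfig" are placeholder arguments (assumptions).
fakeroot_rebuild_cmd() (
    image="$1"
    easyconfig="$2"
    printf 'apptainer exec --fakeroot %s eb --rebuild --allow-use-as-root-and-accept-consequences %s\n' \
        "$image" "$easyconfig"
)
```

Printing rather than executing makes it easy to review the command before using it; whether files built this way end up with acceptable ownership after ingestion is still the open question raised above.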

@bedroge bedroge changed the title Easybuild crashed because it could not chown/chmod some /cvmfs/... paths. Most likely this is only while the bot is rebuilding something otherwise this error would have been seen while we build anything. Procedure for reinstalling software with the bot (and preventing chown/chmod issues in the container) Feb 26, 2024
@bedroge
Collaborator

bedroge commented Feb 26, 2024

After discussing this a bit with @akesandgren on Slack, I now have another workaround/solution to hide an existing software installation from the container: you can make a character special file using mknod <path to file/dir> c 0 0 in the overlay's upper directory. For instance, if we want to hide some OpenMPI version, you can do the following:

cd /tmp/eessi.fu3sWt8xpS/overlay-upper
mkdir -p versions/2023.06/software/linux/x86_64/amd/zen3/software/OpenMPI
mknod versions/2023.06/software/linux/x86_64/amd/zen3/software/OpenMPI/4.1.5-GCC-12.3.0 c 0 0

Looks like it has to be done before launching the container, though.
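
The manual steps above could be wrapped roughly as follows. This is a sketch under assumptions: the function names are made up, the upper directory is passed in by the caller, and the mapping simply strips the leading /cvmfs/<repo>/ from the path. Depending on the kernel version and the privileges of the calling user, mknod may be refused.

```shell
# Map a path under /cvmfs/<repo>/ to its counterpart in the overlay's upper directory.
# Hypothetical helper; prints e.g. <upper_dir>/versions/2023.06/...
upper_path() (
    upper_dir="$1"
    cvmfs_path="$2"
    # Strip the shortest prefix matching /cvmfs/<repo>/
    rel="${cvmfs_path#/cvmfs/*/}"
    printf '%s/%s\n' "$upper_dir" "$rel"
)

# Hide an existing installation by creating a 0/0 character device in the upper directory.
# Must be run before the container is launched, as noted above.
hide_installation() (
    target="$(upper_path "$1" "$2")"
    mkdir -p "$(dirname "$target")"
    mknod "$target" c 0 0
)
```

For example, `hide_installation /tmp/eessi.fu3sWt8xpS/overlay-upper /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/amd/zen3/software/OpenMPI/4.1.5-GCC-12.3.0` would perform the same mkdir and mknod calls as the commands shown above.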

Letting the bot's build script do that is still not that easy, though, as it would need to figure out which things to delete/hide before even entering the container, i.e. before calling EasyBuild. So the bot will have to provide this info to the build scripts in some way, which brings me to another point: how do we even want to specify that we need to rebuild something? We've done that before with ReFrame, and then we (temporarily) added rebuild: True to the easystack, and removed that before merging the PR. That's also a bit cumbersome...

While thinking about possible ways to do it and wondering if you could make an empty PR where you just tell the bot to rebuild OpenMPI-......eb for arch ... and repo ..., I realized that it's better anyway to log this in some file that can be updated by a PR. Similar to the known issues YAML file, we could make a YAML file for all installations that ever had to be rebuilt, with maybe a date and/or link to an issue to make clear when and why it was done. The build script called by the bot should then check if this PR is changing this particular YAML file, and in that case only rebuild the easyconfig(s) that were added to the file in this PR. It should then be clear which directory has to be hidden in the overlay.

@bedroge
Collaborator

bedroge commented Feb 27, 2024

I've tried to use a Debian 12 container with newer FUSE libraries and various versions of fuse-overlayfs. In all cases, I still got the Permission denied error when trying to do anything with an existing software installation. So it doesn't look like we can work around it that way, unfortunately.

@boegel
Contributor

boegel commented Feb 28, 2024

Letting the bot's build script do that is still not that easy, though, as it would need to figure out which things to delete/hide before even entering the container, i.e. before calling EasyBuild. So the bot will have to provide this info to the build scripts in some way, which brings me to another point: how do we even want to specify that we need to rebuild something? We've done that before with ReFrame, and then we (temporarily) added rebuild: True to the easystack, and removed that before merging the PR. That's also a bit cumbersome...

While thinking about possible ways to do it and wondering if you could make an empty PR where you just tell the bot to rebuild OpenMPI-......eb for arch ... and repo ..., I realized that it's better anyway to log this in some file that can be updated by a PR. Similar to the known issues YAML file, we could make a YAML file for all installations that ever had to be rebuilt, with maybe a date and/or link to an issue to make clear when and why it was done. The build script called by the bot should then check if this PR is changing this particular YAML file, and in that case only rebuild the easyconfig(s) that were added to the file in this PR. It should then be clear which directory has to be hidden in the overlay.

I like this idea...
This could boil down to adding a file like easystacks/software.eessi.io/2023.06/rebuilds/20240228-OpenMPI-fix-smcuda.yml?
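
With a layout like the one suggested above, the build script could decide whether a PR is a rebuild by filtering its list of changed files. A minimal sketch, assuming the rebuild easystacks live under easystacks/<repo>/<version>/rebuilds/ and end in .yml; the function name is hypothetical and the list of changed files is read from standard input.

```shell
# Print only those changed files that look like rebuild easystacks.
# Reads one path per line from stdin; the glob is an assumption based on
# the example path suggested in this comment.
rebuild_easystacks() {
    while IFS= read -r changed_file; do
        case "$changed_file" in
            easystacks/*/rebuilds/*.yml) printf '%s\n' "$changed_file" ;;
        esac
    done
}
```

It could be fed with something like `git diff --name-only <base>...HEAD | rebuild_easystacks`, where `<base>` is whatever branch the PR targets; empty output would mean the PR is a regular build.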

One thing to keep in mind is that we should somehow try to retain the possibility to "replay" a series of installations for a new CPU target (like x86_64/amd/zen4 soon). I'm not sure how we would do that with the rebuilds in the mix...
