In memory security contexts are not supported #520
I guess the main challenge will be handling the temporary files, as they need to survive until the cluster is closed.
Can you think of another solution to avoid having to go through a file (which seems to involve a lot of unwanted code...)?
Not to defend myself: temporary certificates always go through a file: https://github.com/dask/distributed/blob/8fa3854514c5ede53e9c0381776f6109882ff778/distributed/security.py#L213 . However, you are correct that the lifecycle of the file, and using the API correctly, is a problem. The command line interface of the workers that dask_jobqueue uses does not seem to support any other way to ingest a certificate into the process (I have tried putting the whole key on the command line). What dask-cloudprovider does is create a dask config: https://github.com/jacobtomlinson/dask-cloudprovider/blob/feb86a8da834bc0b34d6408686c683b976d85a10/dask_cloudprovider/generic/vmcluster.py#L253-L266 . My rationale was that this might change/break more things in dask-jobqueue, because it could have rather global effects. But maybe I am mistaken here.
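To make the lifecycle problem concrete, writing an in-memory key out so the file survives until teardown could look roughly like this. `pem_to_tempfile` is a hypothetical helper (not existing dask-jobqueue code), and `atexit` stands in for hooking deletion into the cluster's close:

```python
import atexit
import os
import tempfile

def pem_to_tempfile(pem_contents: str) -> str:
    # Write inline PEM data to a file so it can be referenced on the
    # worker command line.  Deletion is deferred to interpreter exit
    # here as a stand-in for "when the cluster is closed" -- the file
    # must outlive job submission and worker startup.
    fd, path = tempfile.mkstemp(suffix=".pem")
    with os.fdopen(fd, "w") as f:
        f.write(pem_contents)
    atexit.register(os.unlink, path)
    return path
```

The real fix would tie the deletion to the cluster object's close/finalize hook rather than to interpreter exit.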
Okay, so I guess we should stick with this approach, and say we do not support in-memory contexts.
Sorry for asking, but do you mean that a) in-memory contexts should not be automatically converted into temp files for the workers, and we do not support them, or b) we just do not support keeping them in memory for the workers that are sent to the batch scheduler (which would in fact add support from a user perspective)? In case of a), I guess an exception should be thrown and there would be no sound way of automatic certificate generation. I could then strip down #519 to a bare minimum and add a "not supported" exception instead. This would force users to take care of temp certificates themselves. Case b) is covered in #519. As the tempfile code is only triggered anyway in case people use
Sorry, I thought I had answered this... I meant b).
I was actually thinking this the whole time, until you reminded me of dask_jobqueue/htcondor.py (line 236 in 26fd3ae).
However, when setting up the new CI I think I completely ignored this: ci/htcondor/docker-compose.yml (lines 56 to 57 in 9944654).
So it probably does not depend on this. Here is a hint that this can be very different on different setups: dask_jobqueue/lsf.py (line 201 in 1f9ae1e).
Here you find hints that even Python might not be shared.

Bottom line: IMHO, security testing should actually be done on all schedulers, and the assumption should be dropped. It might well be the case that file-based certificates also need to be copied to the worker (#524). What do you think?

@guillaumeeb: thanks for nagging me here. I think this led to much clearer thinking. One positive outcome should be that the testing strategy gets overhauled a bit, since the CI run on #519 for all the schedulers should be a warning...
I confirmed that file-based certificates actually won't work on the HTCondor and Slurm CI either: #532. Now the million-dollar question: is this a bug in the CI or in the code? It should at least be documented that one has to be smart enough to put the certificates in some accessible place. On the other hand: how would code fixing this issue know where that is...
Sorry for the delay @riedel, and as always thanks for your investment in dask-jobqueue!
I'd say neither. I think all dask-jobqueue Clusters can work without any shared file system in some specific setups; some CI environments show this, and that should remain the case.

The code for generating a temporary file should just try to do it in the user folder, and be explicit (with some logging?) that the place where the file is generated must be accessible from all workers. Maybe we can add a configuration option for this location (as we do for the worker log directory). All this should also be clearly documented.

For the CI, we could test this only in CI environments with shared folders. I don't know if we want to add a shared file system to all tests just to test this; I don't think so.
See https://github.com/dask/distributed/blob/3754a6694f72207b69aaf661445aecf18fb5f0cb/distributed/security.py#L211
That is, one cannot use Security.temporary() to create a cluster.
distributed allows using in-memory keys, where the certificate content itself is held in memory. It detects this by checking whether the value is multiline.
The expansion to the extra arguments does not support this:
dask_jobqueue/core.py (line 224 in 79a9a3b)
In this case a temporary file needs to be created.
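A hedged sketch of the conversion described above. `materialize` is an illustrative helper, not existing dask-jobqueue code; it mimics the multiline heuristic used by distributed's security module and writes inline content out before the extra arguments are expanded:

```python
import os
import tempfile

def materialize(value: str) -> str:
    # A multiline value is treated as inline PEM content rather than a
    # path (the heuristic used in the linked security.py).  The worker
    # command line needs a real file, so inline content is written out.
    if "\n" not in value:
        return value  # already a file path; pass through unchanged
    fd, path = tempfile.mkstemp(suffix=".pem")
    with os.fdopen(fd, "w") as f:
        f.write(value)
    return path

# Illustrative worker arguments built from an in-memory key
# (the flag name here is an assumption, not dask-jobqueue's actual CLI):
inline_key = "-----BEGIN PRIVATE KEY-----\nMIIB...\n-----END PRIVATE KEY-----"
extra = ["--tls-key", materialize(inline_key)]
```

The file lifecycle (deleting the file when the cluster closes) is the part this sketch leaves out, and is the main open question in the discussion below.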