Parallel build support #5449

Open
cjac opened this issue Aug 8, 2024 · 1 comment
Labels: type::feature request for a new feature or capability

cjac commented Aug 8, 2024

Checklist

  • I added a descriptive title
  • I searched open requests and couldn't find a duplicate

What is the idea?

Conda could build packages in parallel. After an analysis of the DAG of package dependencies, leaf nodes and their hierarchy could be built in parallel. Most of my system is idle during installation of conda packages.

[screenshot: system resource monitor showing mostly idle cores during a conda install]
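
A minimal sketch of that DAG analysis, using Python's stdlib graphlib on a hypothetical deps mapping (the package names and graph are illustrative, not conda's actual solver output): at each level, every package whose dependencies are already satisfied is a leaf that could be handled in parallel.

```python
import os
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical dependency graph: package -> packages it depends on.
deps = {
    "numpy": set(),
    "pandas": {"numpy"},
    "dask": {"numpy", "pandas"},
    "cudatoolkit": set(),
    "rapids": {"dask", "cudatoolkit"},
}

ts = TopologicalSorter(deps)
ts.prepare()
level = 0
while ts.is_active():
    ready = list(ts.get_ready())  # leaves whose dependencies are all done
    print(f"level {level}: {ready} could be processed on up to "
          f"{os.cpu_count()} workers")
    ts.done(*ready)
    level += 1
```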

Why is this needed?

Tests for rapids [1], which include installation of cudatools, dask, pandas, and other ML tools, take a very long time and spend a good portion of the workflow blocked on a single-threaded application.

[1] GoogleCloudDataproc/initialization-actions#1219

What should happen?

The work should be broken down into a DAG and delegated to worker threads, à la make -j$(nproc).
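
A sketch of that delegation, assuming the per-package step (here a placeholder install() function, not conda's real extract/link code) is safe to run concurrently; the thread pool is sized like make -j$(nproc) and drains the DAG as dependencies finish:

```python
import os
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
from graphlib import TopologicalSorter

def install(pkg):
    # Placeholder for the real per-package work (extract, link, post-link).
    print(f"installing {pkg}")
    return pkg

def parallel_install(deps, max_workers=os.cpu_count()):
    ts = TopologicalSorter(deps)
    ts.prepare()
    pending = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while ts.is_active():
            # Submit every package whose dependencies are already installed.
            for pkg in ts.get_ready():
                pending.add(pool.submit(install, pkg))
            # Wait for at least one to finish, then unlock its dependents.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                ts.done(fut.result())
```

The scheduling shape is the easy part; whether conda's extract/link/post-link steps can actually run concurrently (shared caches, file locks, scripts touching the same environment) is the real question this request raises.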

Additional Context

I appreciate the work done on parallelizing the package downloads. I've included export CONDA_FETCH_THREADS="$(nproc)" to accelerate that portion of the workflow.

cjac added the type::feature label Aug 8, 2024

cjac commented Aug 8, 2024

For the record, here is the command that's taking a while to run. I am running this on a Rocky Linux 8 base image. I can gather metrics for the Debian and Ubuntu variants as well if that would help.

time conda create -n rapids-24.06 -c rapidsai -c conda-forge -c nvidia rapids=24.06 python=3.11 cuda-version=12.4

It was using more than the 15 GB of memory available to the n1-standard-4 machine type, and during some portions of the installation CPU load was near 100% across all 4 processors, so I've increased the machine type to n1-standard-16.

This improves the performance of the GPU driver build script, which uses make -j$(nproc) to parallelize compilation of the NVIDIA kernel driver. With -j1 the build takes much longer than with -j16. I would hope the same would be true of the conda build process, but it seems to be single-threaded.
