Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] use mamba instead of conda in macOS and Linux CI jobs #6140

Merged
merged 4 commits into from
Oct 25, 2023
Merged

[ci] use mamba instead of conda in macOS and Linux CI jobs #6140

merged 4 commits into from
Oct 25, 2023

Conversation

borchero
Copy link
Collaborator

@borchero borchero commented Oct 12, 2023

Motivation

In #6034, the Python package bdist build on macOS times out after an hour as the environment for testing cannot be solved in time. This is the result from using conda with its slow solver. Replacing conda with mamba causes the same CI job to complete successfully in under 20 minutes.

This PR allows to optionally use mamba rather than conda to allow certain CI jobs to take advantage of mamba's faster solver. The optional usage alleviates the need to build new docker images for the CI for the time being.

@borchero borchero changed the title Start to replace conda with mamba in some CI jobs Replace conda with mamba in some CI jobs Oct 12, 2023
@jameslamb jameslamb changed the title Replace conda with mamba in some CI jobs [ci] Replace conda with mamba in some CI jobs Oct 12, 2023
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this but simply changing conda to mamba everywhere you find it is not going to work in this project's CI. Most of the Linux *** jobs running on Azure Devops are failing, for example, with an error like this:

/__w/1/s/.ci/test.sh: line 131: mamba: command not found

(example build link)

Because they run in a container whose image is published from here: https://github.com/guolinke/lightgbm-ci-docker/tree/9121c2852e1cbbc556f12b4ba70b335989e0d9b4/dockers/ubuntu-14.04

And that image does not have mamba in it.

If you want to make this change, you'll have to install mamba in those images, and make other changes similar to those in #4953.

@borchero
Copy link
Collaborator Author

@jameslamb yep, realized that already. The second paragraph of my description describes what I now wanted to do instead, will implement in the next couple minutes.

@borchero borchero changed the title [ci] Replace conda with mamba in some CI jobs [ci] Optionally allow to use mamba instead of conda in CI jobs Oct 12, 2023
@borchero borchero requested a review from jameslamb October 12, 2023 22:28
@borchero
Copy link
Collaborator Author

@jameslamb I also tried to build Docker containers in https://github.com/guolinke/lightgbm-ci-docker. However, I completely fail to do so on an AWS-hosted Linux machine and it also seems to be flaky in the GitHub CI. How do you build these images locally?

@jameslamb
Copy link
Collaborator

I completely fail to do so on an AWS-hosted Linux machine and it also seems to be flaky in the GitHub CI.

Please be specific about the issues you're encountering (logs? error messages?) and I will try to help.

@borchero
Copy link
Collaborator Author

Issues in the CI are visible here: https://github.com/borchero/lightgbm-ci-docker/actions/runs/6502543755. Issues on AWS were similar. I got a little farther with some adjustments but failed to install PoCL due to an unknown target for the ubuntu-14.04 container 👀 unfortunately, I don't have any logs available right now.

In any case, would it be able to merge this PR regardless? It would at least fix some issue

@jameslamb
Copy link
Collaborator

would it be able to merge this PR regardless?

No, it is still not passing. The latest Linux sdist build on Azure DevOps hit its 1-hour timeout while trying to solve for the test environment in this PR: https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=15329&view=logs&j=02a2c3ba-81f8-54e3-0767-5d5adbb0daa9&t=720ee3fa-96d4-5b47-dbf4-01607b74ade2

In addition, I'm concerned...where is mamba coming from? It isn't listed at https://github.com/actions/runner-images/blob/main/images/macos/macos-13-Readme.md and you haven't added any code to the CI setup scripts to add it.

We try in this project to not rely on the things that come pre-installed in the VM images from CI providers, as those can change without our knowledge and in some cases can go without updates for an unacceptably long time.

I'd expected to see a PR that adds the use of mamba also adding some setup of mamba, similar to this:

LightGBM/.ci/setup.sh

Lines 127 to 133 in 8ed371c

if [[ $SETUP_CONDA != "false" ]]; then
ARCH=$(uname -m)
curl \
-sL \
-o miniforge.sh \
https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-${ARCH}.sh
fi

LightGBM/.ci/setup.sh

Lines 137 to 139 in 8ed371c

if [[ $SETUP_CONDA != "false" ]]; then
sh miniforge.sh -b -p $CONDA
fi

Can you please add something like that and update env variables PATH, CONDA, etc. to ensure that the expected copy of mamba is being used?


Also, since you're pursuing this in support of #6034... could you consider other approaches to try to get that environment with cffi and pyarrow to solve faster? For example, putting floors on the cffi and pyarrow dependencies (and maybe others like pandas) might cut out some combinations the solver has to search through.


Please note that I'll have very limited availability for the next week, so I personally won't be able to provide thorough reviews for this or #6034 for the next 8 days or so.

@borchero
Copy link
Collaborator Author

The latest Linux sdist build on Azure DevOps hit its 1-hour timeout while trying to solve for the test environment in this PR

I think that this is a general issue of the current CI setup 👀 I wouldn't know how this is connected to changes in this PR and I would expect it to fail (non-deterministically) in other PRs as well 😬

where is mamba coming from?

Since July 2023, Miniforge includes mamba (cf. https://github.com/conda-forge/miniforge#whats-the-difference-between-mambaforge-and-miniforge) and Mambaforge (= Miniforge as of July 2023) is the recommended way to install mamba (cf. https://mamba.readthedocs.io/en/latest/mamba-installation.html#fresh-install-recommended).

For example, putting floors on the cffi and pyarrow dependencies (and maybe others like pandas) might cut out some combinations the solver has to search through.

I can give this a try but it feels like a band-aid solution that is likely to break easily in the future.

@jameslamb
Copy link
Collaborator

Since July 2023, Miniforge includes mamba (cf. https://github.com/conda-forge/miniforge#whats-the-difference-between-mambaforge-and-miniforge) and Mambaforge (= Miniforge as of July 2023) is the recommended way to install mamba

Oh very cool, did not know that! Thanks very much for the links.

Given that, I definitely support this PR. I think it's ok to mix conda and mamba across different CI jobs for now.

but it feels like a band-aid solution that is likely to break easily in the future

I definitely agree that having to provide hints like this shouldn't be necessary in the presence of well-formed package metadata for all of the things we install from conda-forge + a lack of bugs in the mamba solver. But the reality is that that's a fairly large surface area, and sometimes things do break in surprising ways solving such large conda environments. Putting floors on the dependencies we install would give us faster feedback that the root of some issue (like ) is actually "got downgraded to a surprisingly-old version of a dependency".

So my proposal is this:

  1. let's merge this PR (assuming my manually triggering that failing job leads to it succeeding)
  2. Please add floors on cffi and pyarrow in the testing scripts over in [python-package] Allow to pass Arrow table as training data #6034
  3. I'll add floors on other dependencies in test scripts in a separate PR
  4. at some point, I or maybe @jmoralez will look into using mamba more widely in the project

@borchero
Copy link
Collaborator Author

assuming my manually triggering that failing job leads to it succeeding

That hasn't been too successful, unfortunately :/

Since July 2023, Miniforge includes mamba

Given that, do you think we can just rebuild the images in https://github.com/guolinke/lightgbm-ci-docker without making any changes to the Dockerfile? That would give us mamba in the CI running on Azure wouldn't it?

@jameslamb
Copy link
Collaborator

can we just rebuild the images

good idea! It's been a while since that was done, I just put up guolinke/lightgbm-ci-docker#30 to make some changes that'll hopefully make that a bit easier to do.

@borchero
Copy link
Collaborator Author

Nice! Let me know once new images are used in the Azure CI 😄

@jameslamb
Copy link
Collaborator

The new images built successfully!

(the failed ubuntu-14.04 job isn't relevant... we don't use that image any more. I'll remove that task + Dockerfile entirely)

So I just pushed 7e2114b, switching this PR back to using mamba unconditionally for all the Linux and macOS CI jobs.

Hopefully it helps 🤞🏻

@jameslamb
Copy link
Collaborator

Looks to me like that worked! https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=15335&view=logs&j=02a2c3ba-81f8-54e3-0767-5d5adbb0daa9

As of this branch, LightGBM's CI would now use mamba unconditionally for all macOS and Linux CI jobs.

Hopefully that'll help with with #6034, and with development in the repo in general. The builds all look a few minutes faster than the last build on master (link).

@jmoralez since I pushed commits here, I don't want to approve and merge it. Could you help with a review?

@jameslamb jameslamb dismissed their stale review October 24, 2023 04:29

dismissing to wait for another reviewer, since I pushed changes here

@jameslamb jameslamb changed the title [ci] Optionally allow to use mamba instead of conda in CI jobs [ci] use mamba instead of conda in macOS and Linux CI jobs Oct 24, 2023
@borchero
Copy link
Collaborator Author

@jameslamb awesome! Glad to see that this helped, hope it gets rid of the CI failures in #6034 🙏🏼

@jezdez
Copy link

jezdez commented Oct 24, 2023

Hey y'all, conda maintainer here, in case you want to use mamba's solver without switching to the mamba CLI, you might also want to switch to the integration into conda called conda-libmamba-solver, that the community worked on a for a while.

It's already installed by default by recent miniconda/miniforge installers and can be enabled by setting CONDA_SOLVER=libmamba or passing --solver=libmamba (more info).

Very soon® we'll even switch the default solver used by conda to using that, see all the details of our rollout in our blog post from earlier this year.

@jameslamb
Copy link
Collaborator

Thanks so much for the links @jezdez !

I was aware that you the conda CLI was in the process of switching its default solver to libmamba.

For us here in this project's testing jobs, what are the benefits of keeping the conda CLI + passing --solver instead of just switching to the mamba CLI?

If the only benefit is "you don't have to update your scripts", then I think I'd still prefer the switch to the mamba CLI here to passing through --solver or setting an environment variable.

@jameslamb
Copy link
Collaborator

I'm going to merge this so @borchero can keep making progress, but it's easily reversible so @jezdez whenever you have time I'd be very interested to hear your answer to #6140 (comment).

@jameslamb jameslamb merged commit 2d358d5 into microsoft:master Oct 25, 2023
@borchero borchero deleted the mamba branch October 25, 2023 14:49
Ten0 pushed a commit to Ten0/LightGBM that referenced this pull request Jan 12, 2024
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 30, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants