[ci] use conda-forge in Linux and macOS CI jobs #4953

Merged on Feb 11, 2022 (86 commits)

@jameslamb
Collaborator

jameslamb commented Jan 16, 2022

Contributes to #4948.

This PR attempts to switch LightGBM's Linux and macOS CI jobs to use conda-forge instead of the Anaconda default channels. Based on the discussion in #4948, this PR proposes doing that with mambaforge.

Note for reviewers

Opening this as a draft, to test changes on LightGBM's real CI setup.

@jameslamb
Collaborator Author

Even after cutting over to conda-forge, I'm still seeing one job fail with a timeout similar to #4948:

Linux sdist (build)

@StrikerRUS could you look at those logs and the changes I've made in this PR so far and let me know if you have any ideas for fixes to investigate? If we can't think of anything else, then next I'll try setting MKL_THREADING_LAYER=GNU to see if it helps (#4948 (comment))

@StrikerRUS
Collaborator

@jameslamb Thanks for your efforts!

#4948 affects only the Linux_latest family of jobs. Linux sdist runs inside our Ubuntu 14.04 Docker image and compiles LightGBM with gcc:

COMPILER: gcc

So this is another problem. According to the logs,

============================= test session starts ==============================
platform linux -- Python 3.7.12[pypy-7.3.7-final], pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /__w/1/s
collected 647 items

../tests/python_package_test/test_basic.py F............................ [  4%]
F..FFFFFF...................                                             [  8%]
../tests/python_package_test/test_consistency.py FFFFFF                  [  9%]
../tests/python_package_test/test_dask.py 

a lot of tests fail and the job hangs while trying to run the first Dask test.

I guess the reason is here:

platform linux -- Python 3.7.12[pypy-7.3.7-final], pytest-6.2.5, py-1.11.0, pluggy-1.0.0

Somehow pypy implementation Python 3.7.12[pypy-7.3.7-final] was configured to run tests.
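
A minimal sketch of how a CI script could catch this before the tests start, assuming the pytest header format shown above. It is not part of this PR: `python_impl_from_header` is a hypothetical helper that only looks for the `[pypy-...]` suffix pytest appends to the Python version when running on PyPy.

```shell
# Hypothetical helper (not from .ci/test.sh): classify the interpreter
# named in a pytest "platform ..." header line.
python_impl_from_header() {
    case "$1" in
        # pytest appends "[pypy-<version>]" to the Python version on PyPy
        *"[pypy-"*) echo "pypy" ;;
        *) echo "other" ;;
    esac
}

header='platform linux -- Python 3.7.12[pypy-7.3.7-final], pytest-6.2.5, py-1.11.0, pluggy-1.0.0'
python_impl_from_header "$header"   # prints "pypy"

# The same question can be asked of a live interpreter:
#   python -c 'import platform; print(platform.python_implementation())'
```

A job could run the live check right after environment creation and exit early when the result is not `CPython`, instead of waiting for a timeout.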

@jameslamb
Collaborator Author

Somehow pypy implementation Python 3.7.12[pypy-7.3.7-final] was configured to run tests.

Ah! weird! Ok thank you for the help, I'll investigate that.

.ci/test.sh Outdated
Comment on lines 145 to 147
# python-graphviz has to be installed separately to prevent conda from downgrading to pypy
${CONDA_INSTALL} -q -y -n $CONDA_ENV \
python-graphviz || exit -1
Collaborator Author

Only for the specific case of Linux sdist (Python 3.7, ubuntu:14.04, gcc), I found that conda was downgrading Python from a CPython-based Python 3.7.12 to a PyPy-based Python 3.7.12: #4953 (comment)

Was able to reproduce this locally in Docker.

docker run \
    --rm \
    --env CONDA=/opt/conda \
    --entrypoint="" \
    -it lightgbm/vsts-agent:ubuntu-14.04 \
    /bin/bash

# then run the following in the container
export PATH=${CONDA}/bin:${PATH}

conda create \
    --name test-env \
    -c conda-forge \
    --override-channels \
        python=3.7

conda config --set always_yes yes --set changeps1 no
conda update -q -y conda

source activate test-env
conda install -c conda-forge --override-channels -q -y -n test-env \
    cloudpickle \
    dask \
    distributed \
    joblib \
    matplotlib \
    numpy \
    pandas \
    psutil \
    python-graphviz \
    pytest \
    scikit-learn \
    scipy

This produced the following in the conda install logs:

The following packages will be DOWNGRADED:

  python                        3.7.12-hf930737_100_cpython --> 3.7.12-0_73_pypy
  python_abi                                    3.7-2_cp37m --> 3.7-2_pypy37_pp73
  setuptools                          60.5.0-py37h89c1867_0 --> 59.8.0-py37h9c2f6ca_0

So I started thinking, "ok, what is changing in this PR?" One thing I'm proposing to change in this PR is installing python-graphviz with conda instead of as a separate pip install, to reduce the risk of conflicts.

Funnily enough, that change seems to be why conda chose PyPy. Using the sample code above, I found that installing python-graphviz by itself, in a separate conda install step, does not cause conda to downgrade from the CPython-based Python to the PyPy one.
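
A rough sketch of a guard against this kind of silent swap, assuming the conda install output has been saved to a file. `log_switches_to_pypy` is a hypothetical helper, not something in this PR; it only matches plan lines shaped like the `python ... --> ..._pypy` line in the log above.

```shell
# Hypothetical guard: succeed (exit 0) if a saved conda install log shows
# the "python" package itself being switched to a PyPy build.
log_switches_to_pypy() {
    # Matches plan lines such as:
    #   python   3.7.12-hf930737_100_cpython --> 3.7.12-0_73_pypy
    # "python_abi" does not match, because whitespace must follow "python".
    grep -Eq '^[[:space:]]*python[[:space:]]+[^[:space:]]+[[:space:]]+-->[[:space:]]+[^[:space:]]*pypy' "$1"
}

# Example using the log lines quoted in this comment.
log_file="$(mktemp)"
cat > "$log_file" <<'EOF'
The following packages will be DOWNGRADED:

  python                        3.7.12-hf930737_100_cpython --> 3.7.12-0_73_pypy
  python_abi                                    3.7-2_cp37m --> 3.7-2_pypy37_pp73
  setuptools                          60.5.0-py37h89c1867_0 --> 59.8.0-py37h9c2f6ca_0
EOF

if log_switches_to_pypy "$log_file"; then
    echo "conda plans to replace CPython with PyPy" >&2
fi
rm -f "$log_file"
```

This only inspects the solver's printed plan, so it depends on that plan being captured in the log rather than suppressed.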

@StrikerRUS
Collaborator

wat?

image
image

PR doesn't see new commits in the base branch...

@StrikerRUS StrikerRUS closed this Feb 5, 2022
@StrikerRUS StrikerRUS reopened this Feb 5, 2022
@StrikerRUS
Collaborator

OK, GitHub is lagging.

image

Feel free to continue my plan while I'm sleeping.

@StrikerRUS
Collaborator

I can see only one difference between the successful and the most recent failing Linux GPU jobs:

    scipy-1.7.3                |   py39hee8e79c_0        22.0 MB  conda-forge

    scipy-1.8.0                |   py39hee8e79c_0        25.4 MB  conda-forge
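
For anyone repeating this comparison, here is a minimal sketch of pulling package versions out of two saved job logs and diffing them. It assumes rows in the `name-version | build  size  channel` layout shown above; `package_versions` and `diff_package_versions` are hypothetical helper names, and the log file names in the usage comment are placeholders.

```shell
# Hypothetical helper: print the sorted "name-version" column from conda
# download-table rows like
#   "    scipy-1.7.3   |   py39hee8e79c_0   22.0 MB  conda-forge"
package_versions() {
    awk -F'|' 'NF >= 2 {
        gsub(/[ \t]/, "", $1)          # strip padding around the first column
        if ($1 ~ /-[0-9]/) print $1    # skip header and separator rows
    }' "$1" | sort
}

# Print a diff of package versions between two logs.
# Exit status follows diff: 0 if identical, 1 if the versions differ.
diff_package_versions() {
    dpv_a="$(mktemp)"
    dpv_b="$(mktemp)"
    package_versions "$1" > "$dpv_a"
    package_versions "$2" > "$dpv_b"
    diff "$dpv_a" "$dpv_b"
    dpv_status=$?
    rm -f "$dpv_a" "$dpv_b"
    return "$dpv_status"
}

# Usage (placeholder file names):
#   diff_package_versions passing-job.log failing-job.log
```

Sorting before diffing keeps the output limited to real version changes even when the two logs list packages in different orders.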

@StrikerRUS
Collaborator

@jameslamb @jmoralez Please check the following results: #4953 (comment).

@StrikerRUS
Collaborator

@jameslamb @jmoralez I'd like to push this blocking PR forward. Could you please take a look when you have time?

@jameslamb
Collaborator Author

Could you please take a look when you have time?

Sorry, didn't realize you were waiting on me. Changes all look fine to me, ok with me if we merge them.

@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023