From 2933efcfeeb99b9059be1ee5b97a004c67992160 Mon Sep 17 00:00:00 2001
From: panbingkun
Date: Wed, 20 Mar 2024 07:15:32 -0700
Subject: [PATCH] [SPARK-47481][INFRA][3.4] Pin `matplotlib<3.3.0` to fix Python linter failure

### What changes were proposed in this pull request?
This PR fixes the Python linter failure on branch-3.4 by pinning `matplotlib<3.3.0`.

### Why are the changes needed?
- Through https://github.com/apache/spark/pull/45600, we found that the version of `matplotlib` in our Docker image was `3.8.2`, which does not meet the original requirement for `branch-3.4`.
  - CI run: https://github.com/panbingkun/spark/actions/runs/8354370179/job/22869580038
  - Original requirement: https://github.com/apache/spark/blob/branch-3.4/dev/requirements.txt#L12
- The fix pins `matplotlib<3.3.0` in the `pip install` commands of `dev/infra/Dockerfile`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45608 from panbingkun/branch_3.4_pin_matplotlib.

Authored-by: panbingkun
Signed-off-by: Dongjoon Hyun
---
 dev/infra/Dockerfile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 68d27052437b0..5ebd10339be95 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -37,6 +37,7 @@ RUN add-apt-repository ppa:pypy/ppa
 RUN apt update
 RUN $APT_INSTALL gfortran libopenblas-dev liblapack-dev
 RUN $APT_INSTALL build-essential
+RUN $APT_INSTALL python3-matplotlib
 
 RUN mkdir -p /usr/local/pypy/pypy3.7 && \
     curl -sqL https://downloads.python.org/pypy/pypy3.7-v7.3.7-linux64.tar.bz2 | tar xjf - -C /usr/local/pypy/pypy3.7 --strip-components=1 && \
@@ -64,8 +65,8 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht
 # See more in SPARK-39735
 ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
-RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib
-RUN python3.9 -m pip install 'numpy==1.23.5' 'pyarrow==12.0.1' 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage 'matplotlib<3.3.0'
+RUN python3.9 -m pip install 'numpy==1.23.5' 'pyarrow==12.0.1' 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage 'matplotlib<3.3.0' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
 
 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install grpcio protobuf googleapis-common-protos grpcio-status