Update to latest Hadoop 3.3.6 #1937
Comments
Also, per the image-specific documentation you are supposed to be able to specify the Hadoop version when building the image: docker build --rm --force-rm -t jupyter/all-spark-notebook:spark-3.4.1 . --build-arg hadoop_version=3.3.6 This also failed to set Hadoop to version 3.3.6.
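One quick way to check whether the build argument had any effect is to list the bundled Hadoop client jars inside the resulting image. This is only a sketch: the tag is the local one built above, and SPARK_HOME is assumed to point at the bundled Spark distribution, as it does in these images.
docker run --rm jupyter/all-spark-notebook:spark-3.4.1 bash -c 'ls "${SPARK_HOME}/jars" | grep hadoop-client'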
It appears that Hadoop is bundled with Spark, so this is likely not a Jupyter build issue. In other words, Hadoop 3.3.4 is bundled with Spark 3.4.1: michael@PC:/mnt/c/Users/mvier/code/helium/spark-3.4.1-bin-hadoop3$ find . -name "hadoop*"
3.3.6 was added to the Spark build files last week. It appears we just need to wait for the next Spark 3.4.2 release, which will include Hadoop 3.3.6.
Before this issue is closed, I'm wondering why --build-arg hadoop_version=3.3.6 has no effect. Per the image-specific documentation, you are supposed to be able to specify the Hadoop version when building the image. Is there a workaround to configure a different Hadoop version?
Recap: I attempted to dynamically update Hadoop to 3.3.6 via three methods. None of them worked.
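As for a possible workaround, here is an untested sketch: since the Hadoop jars ship inside the Spark distribution under ${SPARK_HOME}/jars, you could extend the image and swap the bundled 3.3.4 client jars for 3.3.6 builds from Maven Central. The jar names below mirror the ones found in the image; pairing newer Hadoop client jars with a Spark build compiled against 3.3.4 is not guaranteed to be compatible.
FROM jupyter/all-spark-notebook:spark-3.4.1
USER root
# Swap the bundled Hadoop client jars (3.3.4) for 3.3.6 builds from Maven Central.
RUN cd "${SPARK_HOME}/jars" && \
    rm -f hadoop-client-api-3.3.4.jar hadoop-client-runtime-3.3.4.jar hadoop-yarn-server-web-proxy-3.3.4.jar && \
    wget -q https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client-api/3.3.6/hadoop-client-api-3.3.6.jar && \
    wget -q https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client-runtime/3.3.6/hadoop-client-runtime-3.3.6.jar && \
    wget -q https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-yarn-server-web-proxy/3.3.6/hadoop-yarn-server-web-proxy-3.3.6.jar
USER ${NB_UID}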
You need to build |
Overall, you're right, and we're only using the bundled Hadoop. |
Yes, Hadoop is bundled in Apache Spark. Apache Spark 3.5.0 will soon start its RC process.
Was this for Hadoop version 2 or version 3?
There are some problems with Hadoop 3.3.6: apache/hadoop#5706, https://lists.apache.org/thread/o7ockmppo5yqk2cm7f1kvo7plfgx6xnc
What docker image(s) are you using?
all-spark-notebook
Host OS system and architecture running docker image
Ubuntu 22.04
What Docker command are you running?
docker run -it -p 8888:8888 --user root -e GRANT_SUDO=yes -v $(pwd):/home/jovyan/work jupyter/all-spark-notebook:spark-3.4.1
How to Reproduce the problem?
Visit localhost:8888
Open Terminal from Launcher
(base) jovyan@745e84c0ed21:/home$ find /usr/local/spark-3.4.1-bin-hadoop3/ -name "hadoop*"
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-yarn-server-web-proxy-3.3.4.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-shaded-guava-1.1.1.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-client-runtime-3.3.4.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-client-api-3.3.4.jar
(base) jovyan@745e84c0ed21:/home$
Command output
No response
Expected behavior
Expected to see hadoop-client-api-3.3.6.jar. Hadoop should be updated to the latest release, which is 3.3.6 or greater.
Actual behavior
Although Spark is at version 3.4.1, the Hadoop library is still at 3.3.4:
(base) jovyan@745e84c0ed21:/home$ find /usr/local/spark-3.4.1-bin-hadoop3/ -name "hadoop*"
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-yarn-server-web-proxy-3.3.4.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-shaded-guava-1.1.1.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-client-runtime-3.3.4.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-client-api-3.3.4.jar
(base) jovyan@745e84c0ed21:/home$
Anything else?
Our project uses AWS S3 and requires the requester-pays header on all S3 requests. This issue was described and fixed in Hadoop 3.3.5.
https://issues.apache.org/jira/browse/HADOOP-14661
The patch is here:
https://issues.apache.org/jira/secure/attachment/12877218/HADOOP-14661.patch
Per the patch, we're required to set "fs.s3a.requester-pays.enabled" to "true".
This fix landed in hadoop-aws 3.3.5, released on Mar 27, 2023.
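For reference, a minimal sketch of enabling the flag once a Hadoop/hadoop-aws 3.3.5+ build is actually on the classpath. Note that the property name that shipped with the released feature appears to be fs.s3a.requester.pays.enabled (dots rather than the hyphen in the patch), and spark.hadoop. is the usual prefix for passing Hadoop settings through Spark configuration.
# Append to spark-defaults.conf so every Spark session picks up the setting
# (only effective once the bundled Hadoop/hadoop-aws jars are at 3.3.5 or newer).
echo "spark.hadoop.fs.s3a.requester.pays.enabled true" >> "${SPARK_HOME}/conf/spark-defaults.conf"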
I've tried to upgrade Hadoop in various ways and it still doesn't work. I finally noticed that my Hadoop is pinned at version 3.3.4, and somehow I can't seem to upgrade to 3.3.5. Since Hadoop 3.3.5 was released only recently, maybe something extra is needed to get the upgrade into Jupyter.
Latest Docker version