[SPARK-26015][K8S] Set a default UID for Spark on K8S Images #23017

Closed · wants to merge 8 commits
16 changes: 13 additions & 3 deletions bin/docker-image-tool.sh
@@ -146,6 +146,12 @@ function build {
fi

local BUILD_ARGS=(${BUILD_PARAMS})

# If a custom SPARK_UID was set, add it to the build arguments
if [ -n "$SPARK_UID" ]; then
BUILD_ARGS+=(--build-arg spark_uid=$SPARK_UID)
fi

local BINDING_BUILD_ARGS=(
${BUILD_PARAMS}
--build-arg
@@ -207,8 +213,10 @@ Options:
-t tag Tag to apply to the built image, or to identify the image to be pushed.
-m Use minikube's Docker daemon.
-n Build docker image with --no-cache
-b arg Build arg to build or push the image. For multiple build args, this option needs to
be used separately for each build arg.
-u uid UID to use in the USER directive to set the user the main Spark process runs as inside the
resulting container
-b arg Build arg to build or push the image. For multiple build args, this option needs to
be used separately for each build arg.

Using minikube when building images will do so directly into minikube's Docker daemon.
There is no need to push the images into minikube in that case, they'll be automatically
@@ -243,7 +251,8 @@ PYDOCKERFILE=
RDOCKERFILE=
NOCACHEARG=
BUILD_PARAMS=
while getopts f:p:R:mr:t:nb: option
SPARK_UID=
while getopts f:p:R:mr:t:nb:u: option
do
case "${option}"
in
@@ -263,6 +272,7 @@ do
fi
eval $(minikube docker-env)
;;
u) SPARK_UID=${OPTARG};;
esac
done
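With this wiring in place, a build that overrides the default UID might look like the following sketch (repository and tag are placeholders; `-u` simply populates the `spark_uid` build argument added above):

```
./bin/docker-image-tool.sh -r <repo> -t my-tag -u 1000 build
```

This is roughly equivalent to passing `--build-arg spark_uid=1000` to `docker build` directly.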

5 changes: 3 additions & 2 deletions docs/running-on-kubernetes.md
@@ -19,9 +19,9 @@ Please see [Spark Security](security.html) and the specific advice below before

## User Identity

Images built from the project provided Dockerfiles do not contain any [`USER`](https://docs.docker.com/engine/reference/builder/#user) directives. This means that the resulting images will be running the Spark processes as `root` inside the container. On unsecured clusters this may provide an attack vector for privilege escalation and container breakout. Therefore security conscious deployments should consider providing custom images with `USER` directives specifying an unprivileged UID and GID.
Images built from the project provided Dockerfiles contain a default [`USER`](https://docs.docker.com/engine/reference/builder/#user) directive with a default UID of `185`. This means that the resulting images will be running the Spark processes as this UID inside the container. Security conscious deployments should consider providing custom images with `USER` directives specifying their desired unprivileged UID and GID. The resulting UID should include the root group in its supplementary groups in order to be able to run the Spark executables. Users building their own images with the provided `docker-image-tool.sh` script can use the `-u <uid>` option to specify the desired UID.
Contributor:
Given the docs you quoted before, you can't override the container's GID, right?

Member Author:
Yes you can: the `USER` directive allows you to specify both a UID and a GID by separating them with a `:`. However, as noted here, changing the GID is likely to have side effects that are difficult for Spark as a project to deal with, hence the note about ensuring your UID belongs to the root group. If end users need more complex setups then they are free to create their own custom images.

Contributor @skonto (Dec 12, 2018):

@rvesse I think using GID 0 is safe; I am using that on OpenShift.
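To illustrate the `USER uid:gid` point from the reply above, a minimal custom-image sketch (the base image name and UID are hypothetical) could look like:

```
# Run as an unprivileged UID while keeping the root group (GID 0) as the
# primary group, so the group-writable Spark directories remain usable
FROM <your-spark-base-image>
USER 1000:0
```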


Alternatively the [Pod Template](#pod-template) feature can be used to add a [Security Context](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#volumes-and-file-systems) with a `runAsUser` to the pods that Spark submits. Please bear in mind that this requires cooperation from your users and as such may not be a suitable solution for shared environments. Cluster administrators should use [Pod Security Policies](https://kubernetes.io/docs/concepts/policy/pod-security-policy/#users-and-groups) if they wish to limit the users that pods may run as.
Alternatively the [Pod Template](#pod-template) feature can be used to add a [Security Context](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#volumes-and-file-systems) with a `runAsUser` to the pods that Spark submits. This can be used to override the `USER` directives in the images themselves. Please bear in mind that this requires cooperation from your users and as such may not be a suitable solution for shared environments. Cluster administrators should use [Pod Security Policies](https://kubernetes.io/docs/concepts/policy/pod-security-policy/#users-and-groups) if they wish to limit the users that pods may run as.
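For instance, a pod template sketch along these lines (the file name and UID are arbitrary) would override the image's `USER` directive:

```
# pod-template.yaml (hypothetical example)
apiVersion: v1
kind: Pod
spec:
  securityContext:
    runAsUser: 1000
```

supplied through the pod template configuration properties (e.g. `spark.kubernetes.driver.podTemplateFile`).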

## Volume Mounts

@@ -87,6 +87,7 @@ Example usage is:
$ ./bin/docker-image-tool.sh -r <repo> -t my-tag build
$ ./bin/docker-image-tool.sh -r <repo> -t my-tag push
```
This will build using the project's provided default `Dockerfiles`. To see more options available for customising the behaviour of this tool, including providing custom `Dockerfiles`, please run with the `-h` flag.

By default `bin/docker-image-tool.sh` builds docker image for running JVM jobs. You need to opt-in to build additional
language binding docker images.
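For example, opting in might look like this sketch (the binding Dockerfile paths assume the files shown later in this diff):

```
./bin/docker-image-tool.sh -r <repo> -t my-tag \
  -p ./resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile \
  -R ./resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile \
  build
```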
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
@@ -17,6 +17,8 @@

FROM openjdk:8-alpine

ARG spark_uid=185

# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
@@ -47,5 +49,9 @@ COPY data /opt/spark/data
ENV SPARK_HOME /opt/spark

WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir

ENTRYPOINT [ "/opt/entrypoint.sh" ]

# Specify the User that the actual main process will run as
USER ${spark_uid}
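A quick sanity check that the directive took effect (the image name is a placeholder; `--entrypoint` bypasses `/opt/entrypoint.sh` so `id` runs directly as the image's configured user):

```
docker run --rm --entrypoint id <repo>/spark:my-tag -u
# prints 185 unless spark_uid was overridden at build time
```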
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile
@@ -16,8 +16,14 @@
#

ARG base_img
ARG spark_uid=185

FROM $base_img
WORKDIR /

# Reset to root to run installation tasks
USER 0

RUN mkdir ${SPARK_HOME}/R

RUN apk add --no-cache R R-dev
@@ -27,3 +33,6 @@ ENV R_HOME /usr/lib/R

WORKDIR /opt/spark/work-dir
ENTRYPOINT [ "/opt/entrypoint.sh" ]

# Specify the User that the actual main process will run as
USER ${spark_uid}
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile
@@ -16,8 +16,14 @@
#

ARG base_img
ARG spark_uid=185

FROM $base_img
WORKDIR /

# Reset to root to run installation tasks
USER 0

RUN mkdir ${SPARK_HOME}/python
# TODO: Investigate running both pip and pip3 via virtualenvs
RUN apk add --no-cache python && \
@@ -37,3 +43,6 @@ ENV PYTHONPATH ${SPARK_HOME}/python/lib/pyspark.zip:${SPARK_HOME}/python/lib/py4

WORKDIR /opt/spark/work-dir
ENTRYPOINT [ "/opt/entrypoint.sh" ]

# Specify the User that the actual main process will run as
USER ${spark_uid}
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
@@ -30,7 +30,7 @@ set -e
# If there is no passwd entry for the container UID, attempt to create one
if [ -z "$uidentry" ] ; then
if [ -w /etc/passwd ] ; then
echo "$myuid:x:$myuid:$mygid:anonymous uid:$SPARK_HOME:/bin/false" >> /etc/passwd
echo "$myuid:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous uid}:$SPARK_HOME:/bin/false" >> /etc/passwd
else
echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID"
fi
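Concretely, for a container running as the default UID 185 with GID 0 and `SPARK_HOME=/opt/spark`, the appended entry would be:

```
185:x:185:0:anonymous uid:/opt/spark:/bin/false
```

with `anonymous uid` replaced by the value of `SPARK_USER_NAME` when that variable is set.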
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/ClientModeTestsSuite.scala
@@ -62,11 +62,12 @@ private[spark] trait ClientModeTestsSuite { k8sSuite: KubernetesSuite =>
.endMetadata()
.withNewSpec()
.withServiceAccountName(kubernetesTestComponents.serviceAccountName)
.withRestartPolicy("Never")
.addNewContainer()
.withName("spark-example")
.withImage(image)
.withImagePullPolicy("IfNotPresent")
.withCommand("/opt/spark/bin/run-example")
.addToArgs("/opt/spark/bin/run-example")
.addToArgs("--master", s"k8s://https://kubernetes.default.svc")
.addToArgs("--deploy-mode", "client")
.addToArgs("--conf", s"spark.kubernetes.container.image=$image")