Add requirement about Taints to the deployment guide #183

Merged
merged 1 commit into from
Jul 14, 2022

Conversation

@kota2and3kan kota2and3kan commented Jul 7, 2022

This PR adds an explanation of Taints to the EKS/AKS deployment guide.
I will reorganize these documents later, but I added these sentences now since I think this is a critical issue.

If we don't add these Taints to the worker nodes, the pods of Scalar products will not be deployed with the sample configurations, since the sample configurations include tolerations that are meant to prevent application pods from being deployed to the worker nodes for Scalar products.

Also, in the Japanese document, I already added the explanation of Taints.

Please take a look!

@@ -69,6 +69,8 @@ Install the following tools on your bastion for controlling the EKS cluster:
* You must create a Kubernetes cluster with version 1.19 or higher for Scalar DL deployment.
* You must create a node group with the label, key as `agentpool` and value as `scalardlpool` for Scalar DL deployment.
* You must add a rule in the EKS security group to **enable HTTPS access (Port 443)** to the private EKS cluster from the bastion server.
* You must add Taints `kubernetes.io/app=scalardlpool:NoSchedule` to each worker node.
* You need to add the Taints using the `kubectl taint node` command after you created the cluster since EKS does not support adding Taints with key `kubernetes.io/app` to the node group.

This is a limitation of EKS.
aws/containers-roadmap#1451

@kota2and3kan kota2and3kan added the documentation Improvements or additions to documentation label Jul 7, 2022

@feeblefakie feeblefakie left a comment

LGTM! Thank you!

BTW, I think Boney and Jishnu deployed scalar products without taints, but why did it work well?

@scalar-boney @inv-jishnu

@feeblefakie feeblefakie merged commit 58bfc95 into master Jul 14, 2022
@feeblefakie feeblefakie deleted the add-taints-explanation branch July 14, 2022 11:16
@@ -69,6 +69,8 @@ Install the following tools on your bastion for controlling the EKS cluster:
* You must create a Kubernetes cluster with version 1.19 or higher for Scalar DL deployment.
* You must create a node group with the label, key as `agentpool` and value as `scalardlpool` for Scalar DL deployment.
* You must add a rule in the EKS security group to **enable HTTPS access (Port 443)** to the private EKS cluster from the bastion server.
* You must add Taints `kubernetes.io/app=scalardlpool:NoSchedule` to each worker node.

@scalar-boney scalar-boney Jul 14, 2022

Manual deployment of Scalar DL worked fine for me without taint.
I did not face any issues related to this.
Let me check more about it.
Sorry for the delay.

@kota2and3kan

@feeblefakie @scalar-boney
Ah, sorry. I misunderstood this a bit. I mixed up Node Affinity and Taints.
Strictly speaking, the Scalar DL pods will be deployed and work well even if there are no Taints on the worker node for Scalar DL.
However, without Taints, the application pods can also be deployed to the worker node for Scalar DL.

I think the main purpose of the Taints / Tolerations in the sample configurations is to dedicate specific nodes to the Scalar DL pods.
(In other words, we don't want to deploy application pods to the dedicated worker node for Scalar DL.)

If there are no Taints on the dedicated worker node for Scalar DL, we cannot achieve this purpose.
So, Taints are not required for Scalar DL to work, but it is better to add them to dedicate specific nodes to Scalar DL.


When we use Taints / Tolerations with the NoSchedule effect, the deployment behavior is as follows.

[Node (Taints)] <------(Deploy OK)--- [Pods (Tolerations)]
[Node (Taints)] <------(Deploy NG)--- [Pods (No tolerations)]

[Node (No taints)] <---(Deploy OK)--- [Pods (Tolerations)]
[Node (No taints)] <---(Deploy OK)--- [Pods (No tolerations)]

Taints prevent pods from being deployed to a node unless the pods have matching Tolerations.
In other words, any pod can be deployed to a node that has no Taints.
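
The four cases above can be reproduced with a small model. This is only a sketch of the NoSchedule rule, not the Kubernetes scheduler itself: the `Taint` and `Toleration` classes and the `tolerates` / `can_schedule` helpers are hypothetical names introduced for illustration, and the matching logic is a simplified version of the documented `Equal` / `Exists` operators.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # e.g. "NoSchedule"


@dataclass(frozen=True)
class Toleration:
    key: str
    value: str
    effect: str
    operator: str = "Equal"  # "Equal" or "Exists"


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified match: an empty toleration effect matches any effect."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # "Exists" ignores the value; an empty key matches every taint key.
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(node_taints: List[Taint], pod_tolerations: List[Toleration]) -> bool:
    """A pod may be scheduled only if every NoSchedule taint is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in node_taints
        if taint.effect == "NoSchedule"
    )


# The taint from the deployment guide and a matching toleration.
taint = Taint("kubernetes.io/app", "scalardlpool", "NoSchedule")
toleration = Toleration("kubernetes.io/app", "scalardlpool", "NoSchedule")

for node_name, taints in (("Node (Taints)", [taint]), ("Node (No taints)", [])):
    for pod_name, tols in (("Pods (Tolerations)", [toleration]), ("Pods (No tolerations)", [])):
        verdict = "Deploy OK" if can_schedule(taints, tols) else "Deploy NG"
        print(f"[{node_name}] <--({verdict})-- [{pod_name}]")
```

Note that a toleration never attracts a pod to a node. It only removes the restriction that the taint imposes.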

In the case of a Scalar DL deployment, the behavior is as follows.

[Node (Taints)] <------(Deploy OK)--- [Scalar DL pods (Tolerations)]
[Node (Taints)] <------(Deploy NG)--- [Application pods (No tolerations)]

[Node (No taints)] <---(Deploy OK)--- [Scalar DL pods (Tolerations)]
[Node (No taints)] <---(Deploy OK)--- [Application pods (No tolerations)]

As shown above, Scalar DL pods can be deployed even if there are no Taints on the node.
This is why the existing test environment worked well with this manual deployment guide.

However, we cannot dedicate a worker node to the Scalar DL pods if there are no Taints.
Application pods may end up on the node that is supposed to be dedicated to Scalar DL.
This may cause unexpected issues such as performance degradation of Scalar DL.
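
To illustrate the difference between selecting a node by label and repelling pods with Taints, here is a rough sketch. It assumes that the Scalar DL pods are pinned to nodes labeled `agentpool=scalardlpool` (the label comes from the deployment guide, but modeling it as a plain node selector is an assumption). The node names, pod names, and the `eligible_nodes` helper are hypothetical, and tolerations are matched by exact equality only.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

TaintSpec = Tuple[str, str, str]  # (key, value, effect)


class Node(NamedTuple):
    name: str
    labels: Dict[str, str]
    taints: Set[TaintSpec]


class Pod(NamedTuple):
    name: str
    node_selector: Dict[str, str]
    tolerations: Set[TaintSpec]


def eligible_nodes(pod: Pod, nodes: List[Node]) -> List[str]:
    """Return the names of nodes that the pod could be scheduled on."""
    result = []
    for node in nodes:
        # The label selector restricts where the pod is allowed to go.
        labels_ok = all(node.labels.get(k) == v for k, v in pod.node_selector.items())
        # NoSchedule taints keep out pods that do not tolerate them.
        taints_ok = all(t in pod.tolerations for t in node.taints if t[2] == "NoSchedule")
        if labels_ok and taints_ok:
            result.append(node.name)
    return result


SCALARDL_TAINT: TaintSpec = ("kubernetes.io/app", "scalardlpool", "NoSchedule")
SCALARDL_LABELS = {"agentpool": "scalardlpool"}

scalardl_pod = Pod("scalardl-pod", dict(SCALARDL_LABELS), {SCALARDL_TAINT})
app_pod = Pod("app-pod", {}, set())

# Case 1: the dedicated node has the taint.
tainted_nodes = [
    Node("scalardl-node", dict(SCALARDL_LABELS), {SCALARDL_TAINT}),
    Node("app-node", {}, set()),
]
# Case 2: the taint was never added.
untainted_nodes = [
    Node("scalardl-node", dict(SCALARDL_LABELS), set()),
    Node("app-node", {}, set()),
]

for label, nodes in (("with taint", tainted_nodes), ("without taint", untainted_nodes)):
    for pod in (scalardl_pod, app_pod):
        print(f"{label}: {pod.name} -> {eligible_nodes(pod, nodes)}")
```

In this model the Scalar DL pod lands on the dedicated node in both cases, which matches the test environment that worked without Taints. Only the application pod's set of eligible nodes changes.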
