
[FEATURE]:Auto Removal of Taint in PODMON #1827

Open
JWilsonDell opened this issue Apr 3, 2025 · 2 comments
Labels
type/feature-request New feature request. This is the default label associated with a feature request issue.

Comments

@JWilsonDell

Issue:
CSI Podmon applies taints to all worker nodes when storage connectivity is lost. When connectivity is restored, however, it does not remove those taints automatically. As a result, restoring production traffic after a storage disconnection event is significantly delayed.

Action Taken:
We opened Dell support request 206899925 and reproduced the issue in a lab environment with the support team present. We also provided all the logs needed for further analysis and tested the issue against different versions of the CSI.

Dell Support Team’s Recommendation:
After analyzing the issue and consulting with the Dell engineering team, the support engineer suggested submitting an enhancement request through the account team. They determined that automatic taint removal is not currently supported in any version of the CSI (refer to the attached email for further details).

Enhancement Request:
We recommend enhancing CSI Podmon to automatically remove taints from all worker nodes as soon as storage connectivity is restored. This improvement is crucial for large environments where more than 300 pods run at a single site. It would allow production traffic to be restored immediately after a storage disconnection, minimizing downtime.
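
To make the request concrete, here is a rough sketch of the behavior we have in mind. It is not Podmon code and does not call the Kubernetes API. The taint key, the `Node` and `Taint` types, the `reconcile_taints` function, and the `storage_connected` check are all hypothetical names made up for illustration. The idea is that on each check, any node whose storage is reachable again has only the Podmon taint removed, while unrelated taints and still-disconnected nodes are left alone.

```python
from dataclasses import dataclass, field

# Hypothetical taint key: the key Podmon actually applies is not stated in this issue.
PODMON_TAINT_KEY = "example.com/podmon-storage-down"


@dataclass
class Taint:
    key: str
    effect: str = "NoSchedule"


@dataclass
class Node:
    name: str
    taints: list = field(default_factory=list)


def reconcile_taints(nodes, storage_connected):
    """Remove the Podmon taint from every node whose storage is reachable.

    storage_connected is a caller-supplied function (an assumption here)
    that takes a Node and returns True when connectivity is restored.
    Returns the names of the nodes that were untainted.
    """
    cleaned = []
    for node in nodes:
        has_taint = any(t.key == PODMON_TAINT_KEY for t in node.taints)
        if has_taint and storage_connected(node):
            # Drop only the Podmon taint; keep any other taints on the node.
            node.taints = [t for t in node.taints if t.key != PODMON_TAINT_KEY]
            cleaned.append(node.name)
    return cleaned


# Example: worker-3 is still disconnected, so it stays tainted.
nodes = [
    Node("worker-1", [Taint(PODMON_TAINT_KEY)]),
    Node("worker-2", [Taint(PODMON_TAINT_KEY), Taint("other/taint")]),
    Node("worker-3", [Taint(PODMON_TAINT_KEY)]),
]
reachable = {"worker-1", "worker-2"}
removed = reconcile_taints(nodes, lambda n: n.name in reachable)
print(removed)
```

In a real implementation the connectivity check and the node update would go through the driver and the Kubernetes API; the sketch only shows the selection logic we are asking for.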

@JWilsonDell added the needs-triage and type/feature-request labels Apr 3, 2025
@csmbot
Collaborator

csmbot commented Apr 3, 2025

@JWilsonDell: Thank you for submitting this issue!

The issue is currently awaiting triage. Please make sure you have given us as much context as possible.

If the maintainers determine this is a relevant issue, they will remove the needs-triage label and respond appropriately.


We want your feedback! If you have any questions or suggestions regarding our contributing process/workflow, please reach out to us at container.storage.modules@dell.com.

@alikdell
Contributor

alikdell commented Apr 3, 2025

@JWilsonDell Please work with Pdm on the enhancement request.

@alikdell removed the needs-triage label Apr 4, 2025
@alikdell closed this as completed Apr 4, 2025
@alikdell reopened this Apr 4, 2025
3 participants