
PWX-30276: Do not delete pods running on storage down nodes. #1385

Merged: 2 commits into master on May 9, 2023

Conversation

@adityadani (Contributor):

What type of PR is this?

bug

What this PR does / why we need it:

  • Introduce a new state in Stork for storage down nodes.
  • Remap the Portworx StorageDown state to Stork's StorageDown state.
  • For the Degraded node state:
    • filter such nodes out of extender requests.
    • the monitor should delete pods running on them.
  • For the StorageDown state:
    • do not filter out such nodes.
    • give a low score to StorageDown nodes (see the sketch after this list).
    • the monitor should not delete pods running on them.
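A minimal sketch of that filter/score split, in Go. The NodeStatus type, node struct, and score constants are hypothetical stand-ins for illustration, not Stork's actual types or the Kubernetes scheduler-extender API:

```go
package main

import "fmt"

// Hypothetical node-status values standing in for the driver states.
type NodeStatus int

const (
	NodeOnline NodeStatus = iota
	NodeDegraded
	NodeStorageDown
)

type Node struct {
	Name   string
	Status NodeStatus
}

const (
	defaultScore     = 100
	storageDownScore = 10 // assumed low score for StorageDown nodes
)

// filterNodes drops Degraded nodes from an extender filter request but
// keeps StorageDown nodes, since the storage driver can still serve
// volumes for pods running on them.
func filterNodes(nodes []Node) []Node {
	kept := make([]Node, 0, len(nodes))
	for _, n := range nodes {
		if n.Status == NodeDegraded {
			continue // Degraded nodes are filtered out entirely
		}
		kept = append(kept, n)
	}
	return kept
}

// scoreNodes gives StorageDown nodes a low priority so the scheduler
// prefers healthy nodes but can still fall back to StorageDown ones.
func scoreNodes(nodes []Node) map[string]int {
	scores := make(map[string]int, len(nodes))
	for _, n := range nodes {
		if n.Status == NodeStorageDown {
			scores[n.Name] = storageDownScore
		} else {
			scores[n.Name] = defaultScore
		}
	}
	return scores
}

func main() {
	nodes := []Node{
		{Name: "n1", Status: NodeOnline},
		{Name: "n2", Status: NodeDegraded},
		{Name: "n3", Status: NodeStorageDown},
	}
	kept := filterNodes(nodes)          // n1 and n3 survive filtering
	fmt.Println(kept, scoreNodes(kept)) // n3 gets the low score
}
```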

Does this PR change a user-facing CRD or CLI?:
No

Is a release note needed?:

Issue: Stork would delete pods running on Degraded or Portworx StorageDown nodes.
User Impact: Applications would face disruption even though drivers like Portworx support running applications on StorageDown nodes.
Resolution: Stork will not delete pods running on StorageDown nodes.

Does this change need to be cherry-picked to a release branch?:
Yes - 23.4

Testing Notes

  • Added a UT to ensure pods are not deleted on StorageDown nodes (see the sketch after this list).
  • Modified existing UTs to use the StorageDown state instead of the NodeDegraded state.
  • Added a UT in the extender to verify pods are not scheduled on Degraded nodes.
  • Currently working on adding integration tests.
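A self-contained sketch of the first check above. shouldDeletePods and the status constants are hypothetical stand-ins for Stork's monitor logic and mock-driver fixtures, not the actual test code:

```go
package monitor

import "testing"

// Hypothetical stand-ins for the driver node states.
type nodeStatus int

const (
	statusOnline nodeStatus = iota
	statusDegraded
	statusStorageDown
)

// shouldDeletePods mirrors the monitor decision described in this PR:
// evict pods only from Degraded nodes, never from StorageDown ones.
// (Assumed helper; not the actual Stork function.)
func shouldDeletePods(s nodeStatus) bool {
	return s == statusDegraded
}

func TestPodsNotDeletedOnStorageDownNode(t *testing.T) {
	if shouldDeletePods(statusStorageDown) {
		t.Fatal("pods on a StorageDown node must not be deleted")
	}
	if !shouldDeletePods(statusDegraded) {
		t.Fatal("pods on a Degraded node should be deleted")
	}
}
```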

@cnbu-jenkins (Collaborator):

Can one of the admins verify this patch?

codecov bot commented Apr 29, 2023

Codecov Report

Patch coverage: 92.85%; project coverage change: +0.06% 🎉

Comparison is base (2bebe98) 67.76% compared to head (2982478) 67.82%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1385      +/-   ##
==========================================
+ Coverage   67.76%   67.82%   +0.06%     
==========================================
  Files          42       42              
  Lines        4889     4889              
==========================================
+ Hits         3313     3316       +3     
+ Misses       1242     1240       -2     
+ Partials      334      333       -1     
| Impacted Files | Coverage | Δ |
| --- | --- | --- |
| pkg/monitor/monitor.go | 66.13% <50.00%> | +1.58% ⬆️ |
| pkg/extender/extender.go | 82.56% <100.00%> | ø |


(Two resolved, outdated review threads on pkg/extender/extender_test.go and one on pkg/monitor/monitor_test.go.)
@pp511 (Contributor) left a comment:

Overall looks good.
Some nitpicks.
One comment in monitor.go needs to be addressed before we merge.

@pp511 (Contributor) left a comment:

lgtm. just one nit that you missed addressing. ok to merge after CBTs pass.

@adityadani (Contributor, Author):

Thanks @pp511! Will fix the missed comment. Currently trying to get a successful integration test run with the newly added tests. Will merge it once I get a success.

@adityadani adityadani force-pushed the PWX-30726 branch 2 times, most recently from 7f6c49d to f142f33 Compare May 6, 2023 00:13
- Introduce a new state in Stork for storage down nodes.
- Remap portworx StorageDown state to Stork's StorageDown state
- For degraded node state:
  - filter out the nodes from extender requests.
  - monitor should delete the pods running on them
- For storage down state:
  - do not filter out such nodes.
  - give a low score to nodes which are storage down.
  - monitor should not delete pods running on them

Signed-off-by: Aditya Dani <aditya@portworx.com>
require.NotEqual(t, poolMaintenanceNode.Name, scheduledNodes[0].Name, "Pod should not be scheduled on node in PoolMaintenance state")

err = volumeDriver.ExitPoolMaintenance(poolMaintenanceNode)
require.NoError(t, err, "Error starting driver on Node %+v", scheduledNodes[0])
Contributor review comment:

Nit: the message should read "Error exiting pool maintenance".

verifyScheduledNode(t, scheduledNodes[0], volumeNames)

err = volumeDriver.EnterPoolMaintenance(scheduledNodes[0])
require.NoError(t, err, "Error stopping driver on scheduled Node %+v", scheduledNodes[0])
Contributor review comment:

Nit: "Error stopping driver" -> "Error entering pool maintenance".

@pp511 (Contributor) commented May 8, 2023:

Integration tests look good. Just two nits that need to be addressed.

- Added integration tests for the extender and health-monitor.
- Simulating a StorageDown state by using the Portworx driver's Pool Maintenance operation (see the sketch below).
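A compact sketch of that simulation pattern. The EnterPoolMaintenance and ExitPoolMaintenance method names match the quoted test hunks above; the interface and the wrapper helper are hypothetical, not Stork's actual test harness:

```go
package integration

import "testing"

// poolMaintenanceDriver captures the two driver calls used in the quoted
// hunks; the interface itself is a hypothetical stand-in for the
// volume-driver abstraction in Stork's test framework.
type poolMaintenanceDriver interface {
	EnterPoolMaintenance(node string) error
	ExitPoolMaintenance(node string) error
}

// simulateStorageDown puts a node's storage pool into maintenance (which
// surfaces to Stork as StorageDown), runs the supplied check, and always
// restores the pool afterwards.
func simulateStorageDown(t *testing.T, d poolMaintenanceDriver, node string, check func()) {
	t.Helper()
	if err := d.EnterPoolMaintenance(node); err != nil {
		t.Fatalf("error entering pool maintenance on %s: %v", node, err)
	}
	defer func() {
		if err := d.ExitPoolMaintenance(node); err != nil {
			t.Fatalf("error exiting pool maintenance on %s: %v", node, err)
		}
	}()
	check() // e.g. assert the pod is still running and was not rescheduled
}
```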
@adityadani adityadani merged commit 07ea4ff into master May 9, 2023
@adityadani adityadani deleted the PWX-30726 branch May 9, 2023 16:02