[WIP] OCPBUGS-45924: for PodSucceeded static pod should always be pending #1911

Open
wants to merge 1 commit into
base: master
from


tkashem
Contributor

@tkashem tkashem commented Dec 20, 2024

Consider the following conditions on a static pod:

  • a) pod.Status.Phase == corev1.PodSucceeded
  • b) pod.Status.Conditions[type==corev1.PodReady]: status == corev1.ConditionTrue

If kubelet does not update the Pod status (a and b) atomically, there is a gap between when 1) pod.Status.Phase is set to PodSucceeded and 2) the PodReady condition is updated to False.

If a and b are both true, we may signal that "static pod is ready" on the designated node, which may cause the installer controller to create a new installer pod on a different node.
If a is true, that alone is enough to say that "static pod is pending" on the designated node.
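
The rule described above can be sketched in Go. This is only an illustration, not the code in this PR: the types below are local stand-ins that mirror a few fields of the corev1 API so the example runs without dependencies, and `classify`, `isReady`, `statePending`, and `stateReady` are hypothetical names. The sketch assumes the only inputs are the pod phase and the PodReady condition.

```go
package main

import "fmt"

// Local stand-ins for the corev1 types (assumption: real code would
// import k8s.io/api/core/v1 instead of redefining these).
type PodPhase string
type ConditionStatus string
type PodConditionType string

const (
	PodRunning     PodPhase         = "Running"
	PodSucceeded   PodPhase         = "Succeeded"
	ConditionTrue  ConditionStatus  = "True"
	ConditionFalse ConditionStatus  = "False"
	PodReady       PodConditionType = "Ready"
)

type PodCondition struct {
	Type   PodConditionType
	Status ConditionStatus
}

type PodStatus struct {
	Phase      PodPhase
	Conditions []PodCondition
}

// staticPodState is a hypothetical result type for this sketch.
type staticPodState string

const (
	statePending staticPodState = "pending"
	stateReady   staticPodState = "ready"
)

// isReady reports whether the PodReady condition is present and True.
func isReady(status PodStatus) bool {
	for _, c := range status.Conditions {
		if c.Type == PodReady {
			return c.Status == ConditionTrue
		}
	}
	return false
}

// classify checks the phase first: a pod in PodSucceeded is always
// treated as pending, even if a stale PodReady condition is still True.
func classify(status PodStatus) staticPodState {
	if status.Phase == PodSucceeded {
		return statePending
	}
	if isReady(status) {
		return stateReady
	}
	return statePending
}

func main() {
	// The gap described above: phase already Succeeded, Ready not yet False.
	inGap := PodStatus{
		Phase:      PodSucceeded,
		Conditions: []PodCondition{{Type: PodReady, Status: ConditionTrue}},
	}
	fmt.Println(classify(inGap)) // prints "pending"
}
```

Checking the phase before the Ready condition means the result does not depend on whether kubelet writes the two fields atomically.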

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 20, 2024
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 20, 2024
@openshift-ci-robot

@tkashem: This pull request references Jira Issue OCPBUGS-45924, which is invalid:

  • expected the bug to target the "4.19.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

I don't know much about how the Ready condition of a static pod is updated, but if, for some reason, kubelet can't update the status of the Ready condition to "False" when the Pod gracefully exits, then the installer controller may deem a static pod that has just completed to be "static pod is ready".

Hypothetically, this may cause the installer controller to create an installer Pod on a different node, which could lead to quorum loss?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Dec 20, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tkashem

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 20, 2024
@tkashem tkashem force-pushed the experiment branch 2 times, most recently from dff6593 to 24c1781 Compare January 13, 2025 12:24
the following conditions:
- a) pod.Status.Phase == PodSucceeded
- b) pod.Status.Conditions[type==corev1.PodReady]: status ==
corev1.ConditionTrue

if kubelet does not update the Pod status (a and b) atomically
then there is a gap between when 1) pod.Status.Phase is set to
PodSucceeded and 2) the PodReady condition is updated to False

if a and b are true then we may signal that "static pod is ready"
on the designated node, which may cause the installer controller
to create a new installer pod on a different node

if a is true, this alone can be used to say that "static pod is
pending" on the designated node.
@tkashem
Contributor Author

tkashem commented Jan 13, 2025

proof PR: openshift/cluster-etcd-operator#1385

@tkashem
Contributor Author

tkashem commented Jan 13, 2025

/cc @dgrisonnet @benluddy (I am probably missing something obvious)

Contributor

openshift-ci bot commented Jan 13, 2025

@tkashem: all tests passed!


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

2 participants