Disable tolerateAllTaints on kOps #1806

Merged 1 commit on Oct 26, 2023


torredil
Member

@torredil torredil commented Oct 25, 2023

CI is failing because the node pod on the control plane node is stuck in CrashLoopBackOff. This PR disables tolerateAllTaints on kOps so that the node pod is no longer scheduled on the control plane node.

See logs:

## Printing pod ebs-csi-node-x4nn2 ebs-plugin container logs
#
I1020 18:45:18.555825       1 driver.go:77] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.24.0"
I1020 18:45:18.555934       1 node.go:82] "[Debug] Retrieving node info from metadata service"
I1020 18:45:18.555942       1 node.go:84] "regionFromSession Node service" region=""
I1020 18:45:18.555953       1 metadata.go:85] "retrieving instance data from ec2 metadata"
I1020 18:45:21.697673       1 metadata.go:88] "ec2 metadata is not available"
I1020 18:45:21.697695       1 metadata.go:96] "retrieving instance data from kubernetes api"
I1020 18:45:21.698336       1 metadata.go:101] "kubernetes api is available"
panic: error getting Node i-03843a6ba2de8e36b: Get "https://100.64.0.1:443/api/v1/nodes/i-03843a6ba2de8e36b": dial tcp 100.64.0.1:443: i/o timeout

goroutine 1 [running]:
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newNodeService(0xc000094d20)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/node.go:87 +0x3e5
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver({0xc00073ff18, 0xa, 0x4?})
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:99 +0x425
main.main()
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:64 +0x4da
###
...

  ENDLOG for container kube-system:ebs-csi-node-vdcwm:node-driver-registrar
  Oct 26 14:15:26.196: INFO: Logs of kube-system/ebs-csi-node-vdcwm:liveness-probe on node i-07b3d93dec6e3f366
  Oct 26 14:15:26.196: INFO:  : STARTLOG
  W1026 14:00:04.692838       1 connection.go:173] Still connecting to unix:///csi/csi.sock
  W1026 14:00:14.692979       1 connection.go:173] Still connecting to unix:///csi/csi.sock
  W1026 14:00:24.693685       1 connection.go:173] Still connecting to unix:///csi/csi.sock
  W1026 14:15:24.692814       1 connection.go:173] Still connecting to unix:///csi/csi.sock

  ENDLOG for container kube-system:ebs-csi-node-vdcwm:liveness-probe
  [FAILED] in [SynchronizedBeforeSuite] - test/e2e/e2e.go:242 @ 10/26/23 14:15:26.196
  << Timeline

  [FAILED] Error waiting for all pods to be running and ready: Timed out after 600.000s.
  Expected all pods (need at least 0) in namespace "kube-system" to be running and ready (except for 0).
  25 / 26 pods were running and ready.
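
For context, here is a rough sketch of why tolerating all taints lets the node pod land on the control plane node. It is a simplified Python model of Kubernetes taint/toleration matching, not the scheduler's code, and it assumes that tolerateAllTaints renders as a catch-all toleration (`operator: Exists` with no key) and that the control plane node carries the usual `node-role.kubernetes.io/control-plane:NoSchedule` taint. The `example.com/hypothetical` key is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"
    value: str = ""
    effect: str = ""


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified model of whether one toleration matches one taint."""
    # An empty effect on the toleration matches every taint effect.
    if tol.effect and tol.effect != taint.effect:
        return False
    # An empty key is only meaningful with Exists, and then matches every key.
    if tol.key == "":
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value


def schedulable(tolerations, taints) -> bool:
    """A pod fits only if every NoSchedule/NoExecute taint is tolerated."""
    blocking = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)


# Assumed control plane taint (the standard one, not taken from this PR).
control_plane = Taint("node-role.kubernetes.io/control-plane", "NoSchedule")

# Catch-all toleration: what "tolerate all taints" is assumed to render as.
tolerate_all = [Toleration(operator="Exists")]
# A narrower toleration for a hypothetical, unrelated taint key.
narrow = [Toleration(key="example.com/hypothetical", operator="Exists")]

print(schedulable(tolerate_all, [control_plane]))
print(schedulable(narrow, [control_plane]))
```

With the catch-all toleration the pod is admitted to the control plane node; with only the narrower toleration it is not, which is the behavior this PR is after.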

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 25, 2023
@torredil torredil force-pushed the ci-test-1823 branch 4 times, most recently from eff4a16 to 57b1211 Compare October 25, 2023 19:45
@torredil
Member Author

/retest

@torredil torredil force-pushed the ci-test-1823 branch 3 times, most recently from 1c9e63b to d5620d8 Compare October 26, 2023 13:19
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 26, 2023
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 26, 2023
@torredil
Member Author

/retest

@torredil torredil force-pushed the ci-test-1823 branch 6 times, most recently from 6fdba7b to 76ef28b Compare October 26, 2023 17:05
Signed-off-by: Eddie Torres <torredil@amazon.com>
@torredil torredil changed the title CI test Disable tolerateAllTaints on kOps Oct 26, 2023
@torredil
Member Author

/retest

@ConnorJC3
Contributor

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ConnorJC3

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 26, 2023
@AndrewSirenko
Contributor

/lgtm

@torredil
Member Author

/test pull-aws-ebs-csi-driver-external-test-eks

@torredil
Member Author

Need to request a NATGateway quota increase:

2023-10-26 18:01:04 [✖]  AWS::EC2::NatGateway/NATGateway: CREATE_FAILED – "Resource handler returned message: \"Performing this operation would exceed the limit of 5 NAT gateways (Service: Ec2, Status Code: 400, Request ID: 813d64cd-fecc-4b79-9831-86da84e28989)\" (RequestToken: 3a961337-8a6d-6e8d-4c45-290ba9e92dc9, HandlerErrorCode: ServiceLimitExceeded)"
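
To tell this kind of quota failure apart from a real test failure before retesting, something like the following could scan the cluster-creation output. It is only a sketch: the `classify` helper is hypothetical, the sample event is an abbreviated copy of the line above, and the patterns are assumed from that single line rather than from any documented format.

```python
import re

# Patterns assumed from the single event line quoted above.
HANDLER_CODE = re.compile(r"HandlerErrorCode: (\w+)")
NAT_LIMIT = re.compile(r"exceed the limit of (\d+) NAT gateways")


def classify(event_line: str):
    """Return (is_quota_error, nat_gateway_limit) for one event line."""
    code = HANDLER_CODE.search(event_line)
    limit = NAT_LIMIT.search(event_line)
    is_quota = bool(code) and code.group(1) == "ServiceLimitExceeded"
    return is_quota, int(limit.group(1)) if limit else None


# Abbreviated copy of the failure above (request IDs omitted).
event = (
    "AWS::EC2::NatGateway/NATGateway: CREATE_FAILED "
    "Performing this operation would exceed the limit of 5 NAT gateways "
    "(HandlerErrorCode: ServiceLimitExceeded)"
)

print(classify(event))
```

A line that matches is an account limit problem rather than a driver regression, so a plain /retest will not help until the quota is raised or fewer jobs run at once.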

@torredil
Member Author

/test pull-aws-ebs-csi-driver-external-test-eks

@AndrewSirenko
Contributor

/retest

@AndrewSirenko
Contributor

/retest

@ConnorJC3
Contributor

I'm going to try testing 1 at a time to try to dodge the quota

/test pull-aws-ebs-csi-driver-external-test-eks

@AndrewSirenko
Contributor

/retest

@k8s-ci-robot k8s-ci-robot merged commit 09466e8 into kubernetes-sigs:master Oct 26, 2023
8 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

4 participants