Fix managed suite failures #5696
Merged
Conversation
TiberiuGC approved these changes on Sep 14, 2022

Outstanding work debugging this! ✨
some changes to generated files and packages due to running make unit-test
Himangini force-pushed the fix-managed-suite branch from 7e9c901 to a1fc8e3 on September 15, 2022 at 12:41
cPu1 reviewed on Sep 16, 2022

Comment on lines -164 to -292:
			},
		},
		{
			NodeGroupBase: &api.NodeGroupBase{
				Name:       ubuntuNodegroup,
				VolumeSize: aws.Int(25),
				AMIFamily:  "Ubuntu2004",
			},
		},
	}

	cmd := params.EksctlCreateCmd.
		WithArgs(
			"nodegroup",
			"--config-file", "-",
			"--verbose", "4",
		).
		WithoutArg("--region", params.Region).
		WithStdin(clusterutils.Reader(clusterConfig))

	Expect(cmd).To(RunSuccessfully())

	tests.AssertNodeVolumes(params.KubeconfigPath, params.Region, bottlerocketNodegroup, "/dev/sdb")

	By("correctly configuring the bottlerocket nodegroup")
	kubeTest, err := kube.NewTest(params.KubeconfigPath)
	Expect(err).NotTo(HaveOccurred())

	nodeList := kubeTest.ListNodes(metav1.ListOptions{
		LabelSelector: fmt.Sprintf("%s=%s", "eks.amazonaws.com/nodegroup", bottlerocketNodegroup),
	})
	Expect(nodeList.Items).NotTo(BeEmpty())
	for _, node := range nodeList.Items {
		Expect(node.Status.NodeInfo.OSImage).To(ContainSubstring("Bottlerocket"))
		Expect(node.Labels["kubernetes.io/hostname"]).To(Equal("custom-bottlerocket-host"))
	}
	kubeTest.Close()
})

It("should have created an EKS cluster and 4 CloudFormation stacks", func() {
	awsSession := NewConfig(params.Region)

	Expect(awsSession).To(HaveExistingCluster(params.ClusterName, string(ekstypes.ClusterStatusActive), params.Version))

	Expect(awsSession).To(HaveExistingStack(fmt.Sprintf("eksctl-%s-cluster", params.ClusterName)))
	Expect(awsSession).To(HaveExistingStack(fmt.Sprintf("eksctl-%s-nodegroup-%s", params.ClusterName, initialAl2Nodegroup)))
	Expect(awsSession).To(HaveExistingStack(fmt.Sprintf("eksctl-%s-nodegroup-%s", params.ClusterName, bottlerocketNodegroup)))
	Expect(awsSession).To(HaveExistingStack(fmt.Sprintf("eksctl-%s-nodegroup-%s", params.ClusterName, ubuntuNodegroup)))
})

It("should have created a valid kubectl config file", func() {
	config, err := clientcmd.LoadFromFile(params.KubeconfigPath)
	Expect(err).ShouldNot(HaveOccurred())

	err = clientcmd.ConfirmUsable(*config, "")
	Expect(err).ShouldNot(HaveOccurred())

	Expect(config.CurrentContext).To(ContainSubstring("eksctl"))
	Expect(config.CurrentContext).To(ContainSubstring(params.ClusterName))
	Expect(config.CurrentContext).To(ContainSubstring(params.Region))
})

Context("and listing clusters", func() {
	It("should return the previously created cluster", func() {
		cmd := params.EksctlGetCmd.WithArgs("clusters", "--all-regions")
		Expect(cmd).To(RunSuccessfullyWithOutputString(ContainSubstring(params.ClusterName)))
	})
})

Context("and checking the nodegroup health", func() {
	It("should return healthy", func() {
		checkNg := func(ngName string) {
			cmd := params.EksctlUtilsCmd.WithArgs(
				"nodegroup-health",
				"--cluster", params.ClusterName,
				"--name", ngName,
			)

			Expect(cmd).To(RunSuccessfullyWithOutputString(ContainSubstring("active")))
		}

		checkNg(initialAl2Nodegroup)
		checkNg(bottlerocketNodegroup)
		checkNg(ubuntuNodegroup)
	})
})

Context("and scale the initial nodegroup", func() {
	It("should not return an error", func() {
		cmd := params.EksctlScaleNodeGroupCmd.WithArgs(
			"--cluster", params.ClusterName,
			"--nodes-min", "2",
			"--nodes", "3",
			"--nodes-max", "4",
			"--name", initialAl2Nodegroup,
		)
		Expect(cmd).To(RunSuccessfully())
	})
})
Do the tests fail if this code isn't moved around?
No. I moved it around to speed up the tests mostly.
cPu1 approved these changes on Sep 16, 2022
Description
Closes https://github.com/weaveworks/eksctl-ci/issues/57, #5682, #5683, #5684, #5685, #5686
The managed suite had been failing consistently for the past two weeks. Logs from debugging are below.
Following the troubleshooting guides and increasing the liveness and readiness probe timeouts didn't help; the test suite was still failing.
https://aws.amazon.com/premiumsupport/knowledge-center/eks-failed-create-pod-sandbox/
aws/amazon-vpc-cni-k8s#1847
aws/amazon-vpc-cni-k8s#1055
aws/amazon-vpc-cni-k8s#1425
https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html#updating-vpc-cni-eks-add-on
Also following the troubleshooting guide https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md#known-issues and applying the suggested fix

kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.11.3/config/master/aws-k8s-cni.yaml

didn't work; after running the apply command, the aws-node pods would start crashing again after some time.

Finally found the cause 😅 The nodes kept restarting due to insufficient memory.
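For context on confirming a symptom like this: a node that is running out of memory reports a MemoryPressure condition in its status. Below is a minimal client-go sketch (not part of this PR; the kubeconfig path is an assumption) that prints that condition for every node. The same information is visible under Conditions in kubectl describe node.

package main

import (
	"context"
	"fmt"
	"path/filepath"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	// Load a kubeconfig; ~/.kube/config is assumed here for illustration.
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// A node starved of memory reports MemoryPressure=True in its status conditions.
	for _, node := range nodes.Items {
		for _, cond := range node.Status.Conditions {
			if cond.Type == corev1.NodeMemoryPressure {
				fmt.Printf("%s: MemoryPressure=%s (%s)\n", node.Name, cond.Status, cond.Message)
			}
		}
	}
}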
I updated the instanceType from m5.large to t3a.xlarge, since burstable instances are well suited to running build/test environments and are more cost-effective than general-purpose instances; a sketch of the change follows the links below.

https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/burstable-performance-instances.html
https://aws.amazon.com/ec2/instance-types/t3/
https://aws.amazon.com/ec2/instance-types/m5/
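As a rough sketch only, using the same v1alpha5 API as the diff above (the nodegroup name here is illustrative, not taken from the PR), the change amounts to swapping the instance type in the nodegroup spec:

package suite

import (
	api "github.com/weaveworks/eksctl/pkg/apis/eksctl.io/v1alpha5"
)

// newManagedNodegroup mirrors the nodegroup definitions in the diff above;
// the nodegroup name is illustrative.
func newManagedNodegroup() *api.ManagedNodeGroup {
	return &api.ManagedNodeGroup{
		NodeGroupBase: &api.NodeGroupBase{
			Name: "managed-ng-1",
			// m5.large (8 GiB) left the nodes short on memory for the CNI pods and
			// test workloads; t3a.xlarge (16 GiB) gives more headroom and, being
			// burstable, is cheaper for CI-style workloads.
			InstanceType: "t3a.xlarge",
		},
	}
}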
Last note on performance: after the restructuring and cleanup changes in this PR, the execution time of the managed suite went down from 2h 30m 53s to 1h 46m 47s 🤑 ⚡
The rest of the commits are package updates, an update of the vpc-cni plugin version to 1.11.3 as suggested here, and updates to auto-generated files resulting from running make unit-test locally.
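As a hedged sketch only (this PR may apply the version bump elsewhere, e.g. in defaults or test fixtures), one way to request a specific vpc-cni version on an eksctl ClusterConfig via the addons API looks like this:

package suite

import (
	api "github.com/weaveworks/eksctl/pkg/apis/eksctl.io/v1alpha5"
)

// withPinnedVPCCNI adds a vpc-cni addon entry with an explicit version to the
// given ClusterConfig; the helper name is illustrative, not from the PR.
func withPinnedVPCCNI(cfg *api.ClusterConfig) *api.ClusterConfig {
	cfg.Addons = append(cfg.Addons, &api.Addon{
		Name:    "vpc-cni",
		Version: "1.11.3",
	})
	return cfg
}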
Checklist

- Added/modified documentation as required (such as the README.md, or the userdocs directory)
- Added labels for change area (e.g. area/nodegroup) and kind (e.g. kind/improvement)

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯