
[EKS] [request]: InvalidDiskCapacity warning on EKS Fargate #1403

Open
mickael-ange opened this issue Jun 14, 2021 · 4 comments
Labels
EKS Amazon Elastic Kubernetes Service Fargate AWS Fargate Proposed Community submitted issue

Comments

@mickael-ange

mickael-ange commented Jun 14, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
What do you want us to build?

Since #625 shipped, we have started scheduling workloads on EKS Fargate. Every time a new Fargate node is created, the node reports the following warning:

invalid capacity 0 on image filesystem.

kubectl describe node fargate-ip-172-17-8-64.ap-northeast-1.compute.internal

Events:
  Type     Reason                   Age    From                                                             Message
  ----     ------                   ----   ----                                                             -------
  Normal   Starting                 9m25s  kubelet, fargate-ip-172-17-8-64.ap-northeast-1.compute.internal  Starting kubelet.
  Warning  InvalidDiskCapacity      9m25s  kubelet, fargate-ip-172-17-8-64.ap-northeast-1.compute.internal  invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  9m25s  kubelet, fargate-ip-172-17-8-64.ap-northeast-1.compute.internal  Node fargate-ip-172-17-8-64.ap-northeast-1.compute.internal status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    9m25s  kubelet, fargate-ip-172-17-8-64.ap-northeast-1.compute.internal  Node fargate-ip-172-17-8-64.ap-northeast-1.compute.internal status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     9m25s  kubelet, fargate-ip-172-17-8-64.ap-northeast-1.compute.internal  Node fargate-ip-172-17-8-64.ap-northeast-1.compute.internal status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  9m24s  kubelet, fargate-ip-172-17-8-64.ap-northeast-1.compute.internal  Updated Node Allocatable limit across pods
  Normal   NodeReady                9m15s  kubelet, fargate-ip-172-17-8-64.ap-northeast-1.compute.internal  Node fargate-ip-172-17-8-64.ap-northeast-1.compute.internal status is now: NodeReady

We use BotKube to monitor our EKS clusters. Warnings and errors are sent to our Slack channels. The above InvalidDiskCapacity is now "spamming" us for each scheduled pod on EKS Fargate.

I'm wondering whether we are the only ones affected by this issue, whether it is a temporary issue with the EKS Fargate scheduler, and whether AWS plans to address this warning in the near future.

Which service(s) is this request for?
EKS Fargate

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.

I'm trying to avoid false-positive alarms on EKS clusters with workloads scheduled on Fargate.

Are you currently working around this issue?
How are you currently solving this problem?

I have implemented a custom BotKube filter to ignore the invalid capacity 0 on image filesystem Node event on Fargate nodes.

Here is the custom filter for those who want to have a look: botkube/pkg/filterengine/filters/custom_node_event_checker.go

// CustomNodeEventsChecker filter to send notifications on critical node events

package filters

import (
	"strings"

	"github.com/infracloudio/botkube/pkg/events"
	"github.com/infracloudio/botkube/pkg/filterengine"
	"github.com/infracloudio/botkube/pkg/log"
)

const (
	// InvalidDiskCapacity EventReason when Node has InvalidDiskCapacity
	InvalidDiskCapacity string = "InvalidDiskCapacity"
)

// CustomNodeEventsChecker inspects node events and skips known false-positive warnings
type CustomNodeEventsChecker struct {
	Description string
}

// Register filter
func init() {
	filterengine.DefaultFilterEngine.Register(CustomNodeEventsChecker{
		Description: "Sends notifications on node level critical events.",
	})
}

// Run filters the event and modifies the event struct
func (f CustomNodeEventsChecker) Run(object interface{}, event *events.Event) {

	// Run filter only on Node events
	if event.Kind != "Node" {
		return
	}

	log.Debugf("CustomNodeEventsChecker, object: %+v\n------------", object)
	log.Debugf("CustomNodeEventsChecker, event: %+v\n------------", event)

	// Skip the known false-positive InvalidDiskCapacity warning emitted
	// during Fargate node creation
	if event.Reason == InvalidDiskCapacity && strings.Contains(event.Name, "fargate-ip-") {
		for _, m := range event.Messages {
			// As of 2021/06/17, Fargate nodes report "invalid capacity 0 on
			// image filesystem" at startup.
			// See https://github.com/aws/containers-roadmap/issues/1403
			if strings.Contains(m, "invalid capacity 0 on image filesystem") {
				log.Debug("Skipping Node event with InvalidDiskCapacity for EKS Fargate")
				event.Skip = true
			}
		}
	}

	log.Debug("Node Critical Event filter successful!")
}

// Describe filter
func (f CustomNodeEventsChecker) Describe() string {
	return f.Description
}
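The filter's decision boils down to a small predicate: skip only Node events whose reason is InvalidDiskCapacity, whose node name matches the Fargate naming pattern, and whose message contains the known warning text. A minimal, self-contained sketch of that logic (using a hypothetical shouldSkipInvalidDiskCapacity helper, not part of BotKube itself) is:

```go
package main

import (
	"fmt"
	"strings"
)

// shouldSkipInvalidDiskCapacity reports whether a node event is the known
// false-positive Fargate startup warning and can safely be suppressed.
func shouldSkipInvalidDiskCapacity(kind, reason, nodeName string, messages []string) bool {
	if kind != "Node" || reason != "InvalidDiskCapacity" {
		return false
	}
	// Fargate node names look like fargate-ip-172-17-8-64.<region>.compute.internal
	if !strings.Contains(nodeName, "fargate-ip-") {
		return false
	}
	for _, m := range messages {
		if strings.Contains(m, "invalid capacity 0 on image filesystem") {
			return true
		}
	}
	return false
}

func main() {
	// Fargate node with the known warning: suppressed
	fmt.Println(shouldSkipInvalidDiskCapacity(
		"Node", "InvalidDiskCapacity",
		"fargate-ip-172-17-8-64.ap-northeast-1.compute.internal",
		[]string{"invalid capacity 0 on image filesystem"},
	))
	// Non-Fargate node: still alerts, so real disk problems are not hidden
	fmt.Println(shouldSkipInvalidDiskCapacity(
		"Node", "InvalidDiskCapacity",
		"ip-10-0-1-5.ec2.internal",
		[]string{"invalid capacity 0 on image filesystem"},
	))
}
```

Keeping the node-name check narrow means only Fargate nodes are silenced; the same warning on a self-managed node would still be forwarded to Slack.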

Additional context
Anything else we should know?

We don't have this issue with self-managed EKS nodes or AWS-managed EKS node groups.

Thanks in advance for your time.

@mickael-ange mickael-ange added the Proposed Community submitted issue label Jun 14, 2021
@mikestef9 mikestef9 added EKS Amazon Elastic Kubernetes Service Fargate AWS Fargate labels Jun 14, 2021
@Hunter-Thompson

We have the same issue. This issue has been spamming our BotKube channel since way before #625.

@herod2k

herod2k commented Oct 29, 2021

Same here, same error message:

invalid capacity 0 on image filesystem

EKS on Fargate too.

@FireballDWF

I see the same warning event type when I run kubectl describe node instance_name, where instance_name is the DNS name of an EKS Local Clusters control-plane/master node.

@Narsilion

Same warning, but I can't see any bad effects from it.
