KubePrism generates a lot of DNS queries #7690

Closed
ruifung opened this issue Aug 31, 2023 Discussed in #7689 · 2 comments · Fixed by #7692

ruifung commented Aug 31, 2023

Discussed in #7689

Originally posted by ruifung August 31, 2023
Is it just me, or does Talos 1.5.1 with KubePrism enabled constantly run DNS queries for the controlplane endpoint?

Since I updated to 1.5.1, the top queried domain is my controlplane endpoint DNS name, at over 40k queries in the last hour from 6 nodes (3 control, 3 worker).

I noticed this while looking at the stats on my local DNS server (Technitium DNS).

Addendum:

I just tested, and KubePrism does appear to be the cause. Compared to everything else on my network, it generates an excessive number of queries for the controlplane DNS name (i.e. controlplane.cluster.home.arpa): 6 nodes (3 control, 3 worker) produced more than 40k queries for it per hour. Disabling KubePrism seems to resolve it.

On average, it appears to be generating 2 queries per second per node.
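
As a rough sanity check of that estimate, the arithmetic can be sketched in Go using only the figures quoted above (about 40k queries per hour across 6 nodes). The function name is illustrative and not part of Talos.

```go
package main

import "fmt"

// queriesPerSecondPerNode converts an hourly query total into a per-node,
// per-second rate. Hypothetical helper, for illustration only.
func queriesPerSecondPerNode(queriesPerHour float64, nodes int) float64 {
	return queriesPerHour / 3600 / float64(nodes)
}

func main() {
	// Figures from the report: roughly 40k queries per hour from 6 nodes.
	rate := queriesPerSecondPerNode(40000, 6)
	fmt.Printf("%.2f queries/s per node\n", rate) // prints "1.85 queries/s per node"
}
```

That is consistent with the "about 2 queries per second per node" figure.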

Is something not respecting the TTL set on the DNS records?
I'll leave KubePrism disabled for now because it's been filling the query logs and query stats.

ruifung changed the title from "Talos 1.5.1 controlplane endpoint DNS query flood" to "KubePrism generates a lot of DNS queries" on Aug 31, 2023

smira (Member) commented Aug 31, 2023

With KubePrism enabled, Talos runs health checks against all controlplane endpoints. Talos doesn't use a local DNS cache, so every check resolves the endpoint name again. I see the problem: the checks run too aggressively (too often), and that needs to be fixed.

smira (Member) commented Aug 31, 2023

PR #7692 doesn't fully solve the issue: it will make fewer DNS requests, but it still won't do proper caching.

I created #7693 to track DNS caching.

smira added a commit to smira/talos that referenced this issue Sep 5, 2023
The default timeouts are very aggressive, and we should use explicit
timeouts so that health checks don't run that often.

Fixes siderolabs#7690

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 79bbdf4)
github-actions bot locked this issue as resolved and limited conversation to collaborators on Jun 11, 2024