-
Notifications
You must be signed in to change notification settings - Fork 797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spoofcheck: Make use of go-nft's ApplyConfigEcho() #902
Conversation
@EdDev here we are, could you please review the commit? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very nice, thank you!
I have added a few points inline.
I noticed the new logic is not covered by unit tests, which may be a bit tricky now. I think we can improve on that in a follow up.
e759696
to
6c8f8c8
Compare
Next round: I introduced SpoofChecker::findPreroutingRule() which is a wrapper around the LookupRule() method, taking care of gathering the relevant ruleset snippet if not cached already. Cache dropping happens in Teardown() which knows if it is no longer valid or not. |
func (dnc defaultNftConfigurer) Apply(cfg *nft.Config) error { | ||
return nft.ApplyConfig(cfg) | ||
func (dnc defaultNftConfigurer) Apply(cfg *nft.Config) (*nft.Config, error) { | ||
const timeout = 55 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need a separate timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since nft.ApplyConfigEcho() requires a context, I had to create one so I just copied the respective code from Read(). Not sure if it is possible to create a context without timeout, but looking at commit 135292e ("bridge, del: timeout after 55 secs of trying to list rules"), which introduced the 55s timeout to Read(), it seems sensible to make ApplyConfigEcho() behave the same.
Or are you suggesting to introduce a common timeout value for use in both methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a big problem, but it's certainly not necessary. A CNI plugin binary is executed for every call, so if the runtime (read: containerd / cri-o) decides to cancel the CNI execution, it will just kill the binary. That will certainly stop any nft calls :-). In that case, you can just use context.Background()
which will never be cancelled.
That said, 55 seconds seems like more than enough for nft to apply (I hope!), so this is probably academic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, frequent updates to an oversized nftables ruleset have a realistic chance of starving readers. Each time the ruleset's generation ID bumps, the reader discards the already fetched cache and restarts. Older nft versions always fetched the full ruleset irrespective of the actual command (so listing a small chain would still take long if a large other one was present).
I don't think this is possible to happen with echo mode, but I don't see why having the timeout should hurt.
Your explanation sounds like the context with timeout is pointless, though I assume @maiqueb fixed a real issue when introducing the Read() timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A CNI plugin binary is executed for every call, so if the runtime (read: containerd / cri-o) decides to cancel the CNI execution, it will just kill the binary. That will certainly stop any nft calls :-). In that case, you can just use
context.Background()
which will never be cancelled.
We saw in k8s based systems that there is no timeout that eventually kills the call to the CNI.
We identified thousands of nft
instances hanging with no one canceling them.
I think it is not safe to assume someone will cancel the CNI call after a timeout, it depends on the runtime or even the one who calls the runtime in the first place.
3bd5d12
to
3f48434
Compare
Two updates:
@EdDev do you think the testsuite part is sufficient now? Any idea what to test for in addition? |
I am good with the current change, thank you. I need now to release go-nft to add the new API in a formal release. |
@SirPhuttel , new go-nft release is out: https://github.com/networkplumbing/go-nft/releases/tag/v0.4.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just two things
- a few minor lint issues
- can you pick up the latest go-nft release?
Thanks
2674966
to
b59274e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
@SirPhuttel we merged some dep updates, can you rebase? Then we'll merge this. |
b59274e
to
6cc3eca
Compare
DONE! Applied with just a context conflict in go.mod. |
6cc3eca
to
be4c52f
Compare
And a (final?) whitespace fix to make the linter happy. |
@squeed Ping! This is stuck somehow. Could you please approve the workflow so we see if it passes CI? Thanks! |
Hmm, the tests are failing for no clear reason. I'm taking a look. |
tests are broken; I'm investigating. Huh |
@SirPhuttel can you rebase on |
Store the relevant applied config part for later to extract the rule to delete from there instead of having to list the ruleset. This is much faster especially with large rulesets. Signed-off-by: Phil Sutter <psutter@redhat.com>
be4c52f
to
2ba7f16
Compare
@squeed DONE, thanks for investigating. |
Store the relevant applied config part for later to extract the rule to delete from there instead of having to list the ruleset. This is much faster especially with large rulesets.