Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spoofcheck: Make use of go-nft's ApplyConfigEcho() #902

Merged
merged 1 commit into from
Jul 20, 2023

Conversation

SirPhuttel
Copy link
Contributor

Store the relevant applied config part for later to extract the rule to delete from there instead of having to list the ruleset. This is much faster especially with large rulesets.

@SirPhuttel
Copy link
Contributor Author

@EdDev here we are, could you please review the commit?

Copy link
Contributor

@EdDev EdDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very nice, thank you!

I have added a few points inline.

I noticed the new logic is not covered by unit tests, which may be a bit tricky now. I think we can improve on that in a follow up.

go.mod Outdated Show resolved Hide resolved
pkg/link/spoofcheck.go Outdated Show resolved Hide resolved
pkg/link/spoofcheck.go Outdated Show resolved Hide resolved
@SirPhuttel
Copy link
Contributor Author

Next round: I introduced SpoofChecker::findPreroutingRule() which is a wrapper around the LookupRule() method, taking care of gathering the relevant ruleset snippet if not cached already.

Cache dropping happens in Teardown() which knows if it is no longer valid or not.

func (dnc defaultNftConfigurer) Apply(cfg *nft.Config) error {
return nft.ApplyConfig(cfg)
func (dnc defaultNftConfigurer) Apply(cfg *nft.Config) (*nft.Config, error) {
const timeout = 55 * time.Second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need a separate timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since nft.ApplyConfigEcho() requires a context, I had to create one so I just copied the respective code from Read(). Not sure if it is possible to create a context without timeout, but looking at commit 135292e ("bridge, del: timeout after 55 secs of trying to list rules"), which introduced the 55s timeout to Read(), it seems sensible to make ApplyConfigEcho() behave the same.
Or are you suggesting to introduce a common timeout value for use in both methods?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a big problem, but it's certainly not necessary. A CNI plugin binary is executed for every call, so if the runtime (read: containerd / cri-o) decides to cancel the CNI execution, it will just kill the binary. That will certainly stop any nft calls :-). In that case, you can just use context.Background() which will never be cancelled.

That said, 55 seconds seems like more than enough for nft to apply (I hope!), so this is probably academic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, frequent updates to an oversized nftables ruleset have a realistic chance of starving readers. Each time the ruleset's generation ID bumps, the reader discards the already fetched cache and restarts. Older nft versions always fetched the full ruleset irrespective of the actual command (so listing a small chain would still take long if a large other one was present).
I don't think this is possible to happen with echo mode, but I don't see why having the timeout should hurt.
Your explanation sounds like the context with timeout is pointless, though I assume @maiqueb fixed a real issue when introducing the Read() timeout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A CNI plugin binary is executed for every call, so if the runtime (read: containerd / cri-o) decides to cancel the CNI execution, it will just kill the binary. That will certainly stop any nft calls :-). In that case, you can just use context.Background() which will never be cancelled.

We saw in k8s based systems that there is no timeout that eventually kills the call to the CNI.
We identified thousands of nft instances hanging with no one canceling them.

I think it is not safe to assume someone will cancel the CNI call after a timeout, it depends on the runtime or even the one who calls the runtime in the first place.

@SirPhuttel SirPhuttel force-pushed the applyconfigecho branch 2 times, most recently from 3bd5d12 to 3f48434 Compare June 27, 2023 19:52
@SirPhuttel
Copy link
Contributor Author

Two updates:

  • Dropped a leftover line from the commit message, no functional changes.
  • Extended spoofcheck_test.go to cover the fallback to Read() if Apply() should not return anything.

@EdDev do you think the testsuite part is sufficient now? Any idea what to test for in addition?

@EdDev
Copy link
Contributor

EdDev commented Jun 28, 2023

Two updates:

* Dropped a leftover line from the commit message, no functional changes.

* Extended spoofcheck_test.go to cover the fallback to Read() if Apply() should not return anything.

@EdDev do you think the testsuite part is sufficient now? Any idea what to test for in addition?

I am good with the current change, thank you.

I need now to release go-nft to add the new API in a formal release.
There may be another change I want to include, so I will update soon about it.

@EdDev
Copy link
Contributor

EdDev commented Jun 28, 2023

I need now to release go-nft to add the new API in a formal release.
There may be another change I want to include, so I will update soon about it.

@SirPhuttel , new go-nft release is out: https://github.com/networkplumbing/go-nft/releases/tag/v0.4.0
I suggest you update go.mog and sync the vendoring to use it (instead of the one pointing to a specific commit).

pkg/link/spoofcheck_test.go Outdated Show resolved Hide resolved
Copy link
Member

@squeed squeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just two things

  • a few minor lint issues
  • can you pick up the latest go-nft release?

Thanks

@SirPhuttel SirPhuttel force-pushed the applyconfigecho branch 3 times, most recently from 2674966 to b59274e Compare June 28, 2023 16:08
Copy link
Contributor

@EdDev EdDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much!

Copy link
Contributor

@s1061123 s1061123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@squeed
Copy link
Member

squeed commented Jul 3, 2023

@SirPhuttel we merged some dep updates, can you rebase? Then we'll merge this.

@SirPhuttel
Copy link
Contributor Author

@SirPhuttel we merged some dep updates, can you rebase? Then we'll merge this.

DONE! Applied with just a context conflict in go.mod.

@SirPhuttel
Copy link
Contributor Author

And a (final?) whitespace fix to make the linter happy.

@SirPhuttel
Copy link
Contributor Author

@squeed Ping! This is stuck somehow. Could you please approve the workflow so we see if it passes CI? Thanks!

@squeed
Copy link
Member

squeed commented Jul 19, 2023

Hmm, the tests are failing for no clear reason. I'm taking a look.

@squeed
Copy link
Member

squeed commented Jul 19, 2023

tests are broken; I'm investigating. Huh

@squeed
Copy link
Member

squeed commented Jul 20, 2023

@SirPhuttel can you rebase on main? There was an issue with CI, sorry.

Store the relevant applied config part for later to extract the rule to
delete from there instead of having to list the ruleset. This is much
faster especially with large rulesets.

Signed-off-by: Phil Sutter <psutter@redhat.com>
@SirPhuttel
Copy link
Contributor Author

@squeed DONE, thanks for investigating.

@squeed squeed merged commit 438548a into containernetworking:main Jul 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants