Use Pod IP for peer communication #220

clobrano · 2024-06-13T10:47:54Z

Why we need this PR

SNR peer communication uses the host network (Node IP), which exposes an HTTP/2 endpoint.
Using the Pod IP instead makes the port harder to attack.

Changes made

Instead of looking up the Nodes' IPs, we look up the other agents' Pod IPs.
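As a rough illustration of the lookup described above, here is a minimal sketch that maps each node to the IP of the agent pod scheduled on it. The `Pod` and `Node` structs are simplified, hypothetical stand-ins for the Kubernetes core/v1 types, and `peerPodIPs` is an illustrative name, not the function used in this PR. The sketch indexes pods by node name, whereas the diff in this PR uses a nested loop.

```go
package main

import "fmt"

// Pod and Node are simplified, hypothetical stand-ins for the Kubernetes
// core/v1 types; only the fields needed for the lookup are modelled.
type Pod struct {
	NodeName string // mirrors pod.Spec.NodeName
	PodIP    string // mirrors pod.Status.PodIP
}

type Node struct {
	Name string
}

// peerPodIPs returns, in node order, the IP of the agent pod scheduled on
// each node. Nodes without a matching pod are skipped.
func peerPodIPs(nodes []Node, pods []Pod) []string {
	// Index pods by the node they run on, so each node is one map lookup.
	byNode := make(map[string]string, len(pods))
	for _, p := range pods {
		byNode[p.NodeName] = p.PodIP
	}

	ips := make([]string, 0, len(nodes))
	for _, n := range nodes {
		if ip, ok := byNode[n.Name]; ok {
			ips = append(ips, ip)
		}
	}
	return ips
}

func main() {
	nodes := []Node{{Name: "worker-0"}, {Name: "worker-1"}, {Name: "worker-2"}}
	pods := []Pod{
		{NodeName: "worker-1", PodIP: "10.128.2.7"},
		{NodeName: "worker-0", PodIP: "10.128.0.5"},
	}
	fmt.Println(peerPodIPs(nodes, pods)) // prints [10.128.0.5 10.128.2.7]
}
```

The node names and addresses are made-up sample data.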

Which issue(s) this PR fixes

https://issues.redhat.com/browse/ECOPROJECT-1879

Test plan

clobrano · 2024-06-14T12:35:52Z

/test 4.15-openshift-e2e

mshitrit · 2024-06-16T12:04:49Z

/test 4.15-openshift-e2e

pkg/apicheck/check.go

clobrano · 2024-06-17T13:18:52Z

/test 4.15-openshift-e2e

slintes · 2024-06-17T14:31:23Z

pkg/peers/peers.go

-		addresses[i] = node.Status.Addresses
+		for _, pod := range pods.Items {
+			if pod.Spec.NodeName == node.Name {
+				addresses[i] = pod.Status.PodIP


I was wondering why we have a string type now, and indeed there is a better choice IMHO: what about using pod.Status.PodIPs[0]?

Do you mean passing this data around as PodIP and then letting popPeerIPs deal with it, returning the []string of IPs?

I mean, at which point of the chain would it be better to use the underlying PodIP.IP?
IIUC, the only interface requiring the string is grpc.DialContext

I would use PodIP everywhere where we used NodeAddress before.
But: oh, it's just a wrapper around a string, I expected a more IP-ish thing 😁
And: oh, we did not even check the type of the NodeAddress in the old version of popNodes, and just took the first one 🙈

But: oh, it's just a wrapper around a string, I expected a more IP-ish thing 😁

😁 yep, moreover at the end of the day we use the string, so not sure it's worth it
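To make the point of this thread concrete, here is a small sketch of what "just a wrapper around a string" means in practice. The local `PodIP` struct mirrors the shape of the core/v1 type (a struct with a single `IP string` field); `dialTarget` and the port value are hypothetical and not taken from this PR. The string is only validated and turned into a `host:port` target at the point where a dial function would need it.

```go
package main

import (
	"fmt"
	"net"
)

// PodIP mirrors the shape of the Kubernetes core/v1 PodIP type:
// a struct that wraps the address as a plain string.
type PodIP struct {
	IP string
}

// dialTarget validates the wrapped string and builds a host:port target
// suitable for a dial call. The port is a hypothetical parameter.
func dialTarget(ip PodIP, port string) (string, error) {
	if net.ParseIP(ip.IP) == nil {
		return "", fmt.Errorf("invalid pod IP %q", ip.IP)
	}
	// JoinHostPort adds the brackets that IPv6 literals require.
	return net.JoinHostPort(ip.IP, port), nil
}

func main() {
	for _, ip := range []PodIP{{IP: "10.128.0.5"}, {IP: "fd00::5"}, {IP: ""}} {
		target, err := dialTarget(ip, "30001")
		fmt.Printf("target=%q err=%v\n", target, err)
	}
}
```

Keeping the `PodIP` type through the call chain and converting only at the dial site is one option; passing plain strings, as discussed above, ends up with the same value.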

slintes · 2024-06-17T14:33:45Z

pkg/apicheck/check.go

 		}

-		chosenNodesAddresses := c.popNodes(&nodesToAsk, nodesBatchCount)
-		healthyResponses, unhealthyResponses, apiErrorsResponses, _ := c.getHealthStatusFromPeers(chosenNodesAddresses)
+		chosenPodIPs := c.popPeerIPs(&peersToAsk, nodesBatchCount)


nit: at other places we use peer instead of node or pod, what about naming this var chosenPeerIPs as well?

you're right, just a typo here
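For readers unfamiliar with the call shape `c.popPeerIPs(&peersToAsk, nodesBatchCount)`, the following is a minimal sketch of a batch "pop" over a slice passed by pointer. It is an assumption about the general pattern, written as a free function over strings, and is not the implementation from this PR.

```go
package main

import "fmt"

// popPeerIPs removes up to count entries from the front of *peers and
// returns them as a new slice. Hypothetical sketch, not the PR's code.
func popPeerIPs(peers *[]string, count int) []string {
	if count < 0 {
		count = 0
	}
	if count > len(*peers) {
		count = len(*peers)
	}
	// Copy the batch so later changes to *peers cannot affect it.
	batch := make([]string, count)
	copy(batch, (*peers)[:count])
	*peers = (*peers)[count:]
	return batch
}

func main() {
	peersToAsk := []string{"10.128.0.5", "10.128.2.7", "10.129.0.9"}
	for len(peersToAsk) > 0 {
		chosenPeerIPs := popPeerIPs(&peersToAsk, 2)
		fmt.Println("asking", chosenPeerIPs, "remaining", len(peersToAsk))
	}
}
```

With three sample peers and a batch size of two, the loop asks two peers first and the remaining one on the second pass.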

clobrano · 2024-06-25T08:44:02Z

/test 4.15-openshift-e2e

clobrano · 2024-06-25T08:54:55Z

/test 4.15-openshift-e2e

mshitrit · 2024-06-26T06:38:12Z

/lgtm
/hold
Holding since I'm not sure whether the other threads are resolved - feel free to unhold if that is the case.

slintes · 2024-06-27T11:53:19Z

code lgtm, but I would prefer to have enabled peer check e2e tests before merging this

- re-enable and fix api check log tests in e2e test
- use service IP for killing API connection
- kill API connection on SNR DS pod
- add peer check server logs and use them for test which can't get logs from unhealthy node's SNR agent pod
- wait for pod deletion only, not restart (restart is caused by reboot, not SNR)
- refactor / cleanup e2e tests
- fix owner check / node name / machine name in peer check server and agent reconciler
- update sort-imports, which ignores generated files now

Signed-off-by: Carlo Lobrano <c.lobrano@gmail.com>

At startup (but it might happen in other moments too), some peers' Pod IP can still be empty, which means that until the next peers update we cannot check the connection with the other peers. Return an error in case a peer's Pod IP is empty.

Signed-off-by: Carlo Lobrano <c.lobrano@gmail.com>
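The empty Pod IP check described in this commit message could look roughly like the sketch below. The `Pod` struct, the `collectPeerIPs` name, and the `errEmptyPodIP` sentinel are hypothetical; the sketch only shows the idea of failing the whole peers update when any peer has not been assigned an IP yet, so that a caller can retry later.

```go
package main

import (
	"errors"
	"fmt"
)

// errEmptyPodIP is a hypothetical sentinel error for a peer whose pod
// has not been assigned an IP yet.
var errEmptyPodIP = errors.New("peer pod IP is empty")

// Pod is a simplified stand-in for the Kubernetes core/v1 Pod type.
type Pod struct {
	Name  string
	PodIP string // mirrors pod.Status.PodIP, empty until assigned
}

// collectPeerIPs returns the IPs of all peers, or an error as soon as one
// peer has an empty IP.
func collectPeerIPs(pods []Pod) ([]string, error) {
	ips := make([]string, 0, len(pods))
	for _, p := range pods {
		if p.PodIP == "" {
			return nil, fmt.Errorf("pod %s: %w", p.Name, errEmptyPodIP)
		}
		ips = append(ips, p.PodIP)
	}
	return ips, nil
}

func main() {
	pods := []Pod{
		{Name: "agent-a", PodIP: "10.128.0.5"},
		{Name: "agent-b", PodIP: ""}, // still starting up
	}
	if _, err := collectPeerIPs(pods); errors.Is(err, errEmptyPodIP) {
		fmt.Println("peers update failed, will retry:", err)
	}
}
```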

clobrano · 2024-07-09T14:37:40Z

/test 4.15-openshift-e2e

slintes

/hold

wait for #226 to be merged

openshift-ci · 2024-07-09T15:59:23Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: clobrano, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [clobrano,slintes]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

slintes · 2024-07-09T18:03:05Z

/test all

slintes · 2024-07-09T18:03:27Z

/hold cancel

slintes · 2024-07-09T19:56:57Z

/retest

clobrano · 2024-07-09T20:15:07Z

could not run steps: step [input:ocp-4.12-upi-installer] failed: failed to wait for importing imagestreamtag

4.12?

clobrano · 2024-07-09T20:22:24Z

It doesn't seem to be related to our test.

/retest

slintes · 2024-07-09T20:25:50Z

could not run steps: step [input:ocp-4.12-upi-installer] failed: failed to wait for importing imagestreamtag

4.12?

looks very unrelated, upi is also wrong, should be ipi IIUC

slintes · 2024-07-09T22:14:34Z

/cherry-pick release-0.9

openshift-cherrypick-robot · 2024-07-09T22:15:14Z

@slintes: new pull request created: #234

In response to this:

/cherry-pick release-0.9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

slintes · 2024-07-09T22:21:01Z

/cherry-pick release-0.9

openshift-cherrypick-robot · 2024-07-09T22:21:41Z

@slintes: #220 failed to apply on top of branch "release-0.9":

Applying: Some fixes and e2e test improvements:
Using index info to reconstruct a base tree...
M	Makefile
M	controllers/selfnoderemediation_controller.go
M	controllers/tests/controller/selfnoderemediation_controller_test.go
M	e2e/self_node_remediation_test.go
M	e2e/suite_test.go
A	e2e/utils/node.go
M	e2e/utils/pod.go
M	go.mod
M	go.sum
M	main.go
M	pkg/apicheck/check.go
M	pkg/peerhealth/client.go
M	pkg/peerhealth/client_server_test.go
M	pkg/peerhealth/peerhealth.pb.go
M	pkg/peerhealth/peerhealth_grpc.pb.go
M	pkg/peerhealth/server.go
M	pkg/peerhealth/suite_test.go
M	vendor/modules.txt
Falling back to patching base and 3-way merge...
Auto-merging pkg/peerhealth/server.go
CONFLICT (content): Merge conflict in pkg/peerhealth/server.go
Auto-merging controllers/selfnoderemediation_controller.go
CONFLICT (content): Merge conflict in controllers/selfnoderemediation_controller.go
CONFLICT (add/add): Merge conflict in controllers/owner_and_name.go
Auto-merging controllers/owner_and_name.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Some fixes and e2e test improvements:
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-0.9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

slintes · 2024-07-09T22:28:52Z

meh, reopening and fixing #234

openshift-ci bot requested review from beekhof and razo7 June 13, 2024 10:47

openshift-ci bot added the approved label Jun 13, 2024

clobrano marked this pull request as draft June 13, 2024 10:48

openshift-ci bot added the do-not-merge/work-in-progress label Jun 13, 2024

clobrano force-pushed the e1879-use-pod-ip-port-0 branch from 8a0f42e to a71be9c Compare June 13, 2024 12:14

mshitrit reviewed Jun 16, 2024

pkg/apicheck/check.go (outdated, resolved)

mshitrit reviewed Jun 16, 2024

pkg/apicheck/check.go (outdated, resolved)

mshitrit reviewed Jun 16, 2024

pkg/apicheck/check.go (outdated, resolved)

slintes reviewed Jun 17, 2024

clobrano force-pushed the e1879-use-pod-ip-port-0 branch from c6a8cd3 to a8900ec Compare June 24, 2024 17:39

openshift-merge-robot added the needs-rebase label Jun 24, 2024

clobrano force-pushed the e1879-use-pod-ip-port-0 branch from a8900ec to 15dc2cc Compare June 25, 2024 08:16

openshift-merge-robot removed the needs-rebase label Jun 25, 2024

openshift-ci bot assigned mshitrit Jun 26, 2024

openshift-ci bot added do-not-merge/hold lgtm labels Jun 26, 2024

clobrano force-pushed the e1879-use-pod-ip-port-0 branch from 15dc2cc to 070f0bd Compare July 3, 2024 12:06

openshift-ci bot removed the lgtm label Jul 3, 2024

clobrano force-pushed the e1879-use-pod-ip-port-0 branch from 070f0bd to 7f0cf11 Compare July 3, 2024 12:08

clobrano added 3 commits July 9, 2024 14:41

Use Pod IP for peer communication

3eeeaeb

Signed-off-by: Carlo Lobrano <c.lobrano@gmail.com>

Update terminology to reflect Pod IP usage in place of Node IP

54086ff

Signed-off-by: Carlo Lobrano <c.lobrano@gmail.com>

clobrano added 2 commits July 9, 2024 16:28

Use core/v1 PodIP type in place of string

99a70db

Signed-off-by: Carlo Lobrano <c.lobrano@gmail.com>

clobrano force-pushed the e1879-use-pod-ip-port-0 branch from 7f0cf11 to cda2f3f Compare July 9, 2024 14:35

slintes approved these changes Jul 9, 2024

openshift-ci bot assigned slintes Jul 9, 2024

openshift-ci bot added the lgtm label Jul 9, 2024

clobrano marked this pull request as ready for review July 9, 2024 17:10

openshift-ci bot removed the do-not-merge/work-in-progress label Jul 9, 2024

openshift-ci bot requested review from mshitrit and slintes July 9, 2024 17:10

slintes mentioned this pull request Jul 9, 2024

Some fixes and e2e test improvements #226

Merged

openshift-ci bot removed the do-not-merge/hold label Jul 9, 2024

openshift-merge-bot bot merged commit 22336d0 into medik8s:main Jul 9, 2024
26 checks passed

openshift-cherrypick-robot mentioned this pull request Jul 9, 2024

[release-0.9] Use Pod IP for peer communication #234

Merged
