Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance LinuxCollector to support detecting multiple app VIF IPs #4253

Merged
merged 1 commit into from
Sep 14, 2024

Conversation

milan-zededa
Copy link
Contributor

@milan-zededa milan-zededa commented Sep 13, 2024

Application VIF may have multiple IP addresses assigned. They can be either assigned directly on the same interface, or used on separate VLAN sub-interfaces which share the parent interface MAC address.

LinuxCollector should detect and publish all of them, instead of flapping between them and generating many IP address change notifications, which trigger a flood of NI and App info messages published to the controller.

For DHCP assigned IP addresses, we use lease time to determine if a previously detected IP address is still valid. For statically assigned IP, we expect to see at least one associated ARP packet every 10 minutes. Otherwise, we consider the IP address to be removed or simply not used anymore.

This commit also enhances LinuxCollector to support enabling or disabling ARP snooping in runtime, without requiring to recreate all switch network instances or rebooting device.

Please note that I have added the stable tag because we will need to backport this to stable releases to fix the NI info message flooding caused by arp snooping when app interface has multiple IPs assigned.

Application VIF may have multiple IP address assigned. They can be
either assigned directly on the same interface, or used on separate
VLAN sub-interfaces which share the parent interface MAC address.

LinuxCollector should detect and publish all of them, instead of
flapping between them and generating many IP address change
notifications, which trigger a flood of NI and App info messages
published to the controller.

For DHCP assigned IP addresses, we use lease time to determine if
a previously detected IP address is still valid. For statically
assigned IP, we expect to see at least one associated ARP packet
every 10 minutes. Otherwise, we consider the IP address to be removed
or simply not used anymore.

This commit also enhances LinuxCollector to support enabling or
disabling ARP snooping in runtime, without requiring to recreate all
switch network instances or rebooting device.

Signed-off-by: Milan Lenco <milan@zededa.com>
@OhmSpectator
Copy link
Member

👀

@OhmSpectator
Copy link
Member

I have some doubts about using the absence of ARP packets over a 10-minute period to determine whether a statically assigned IP address is no longer in use. First, the lack of network traffic doesn't necessarily mean the IP isn't assigned; devices might be idle but still active on the network. Second, ARP traffic only occurs when a device needs to resolve an IP address to a MAC address, so if there's no communication initiated, you won't see ARP packets—meaning the method assumes regular communication that might not exist. Third, devices maintain ARP caches with varying timeouts, often between 2 to 30 minutes, so ARP activity might not happen within your 10-minute window even if the device is active.

@naiming-zededa
Copy link
Contributor

I have some doubts about using the absence of ARP packets over a 10-minute period to determine whether a statically assigned IP address is no longer in use. First, the lack of network traffic doesn't necessarily mean the IP isn't assigned; devices might be idle but still active on the network. Second, ARP traffic only occurs when a device needs to resolve an IP address to a MAC address, so if there's no communication initiated, you won't see ARP packets—meaning the method assumes regular communication that might not exist. Third, devices maintain ARP caches with varying timeouts, often between 2 to 30 minutes, so ARP activity might not happen within your 10-minute window even if the device is active.

I don't think there is a way to precisely to determine when the app's IP address is gone, but we don't have any other way to gauge when this ip will be gone if through the ARP snooping. But the worse case is just like today's behavior, we will keep toggling and send updated NI status message and causing more info messages to the cloud. Although now it will be limited to only send one every 10 minutes (if the ARP happens to delay for that long, which is not very likely). If the App don't have traffic, we likely to toggle the info messages much longer than 10 minutes.

@OhmSpectator
Copy link
Member

I have some doubts about using the absence of ARP packets over a 10-minute period to determine whether a statically assigned IP address is no longer in use. First, the lack of network traffic doesn't necessarily mean the IP isn't assigned; devices might be idle but still active on the network. Second, ARP traffic only occurs when a device needs to resolve an IP address to a MAC address, so if there's no communication initiated, you won't see ARP packets—meaning the method assumes regular communication that might not exist. Third, devices maintain ARP caches with varying timeouts, often between 2 to 30 minutes, so ARP activity might not happen within your 10-minute window even if the device is active.

I don't think there is a way to precisely to determine when the app's IP address is gone, but we don't have any other way to gauge when this ip will be gone if through the ARP snooping. But the worse case is just like today's behavior, we will keep toggling and send updated NI status message and causing more info messages to the cloud. Although now it will be limited to only send one every 10 minutes (if the ARP happens to delay for that long, which is not very likely). If the App don't have traffic, we likely to toggle the info messages much longer than 10 minutes.

Okay, thanks, I understand now that it's not critical, and these heuristics already help.

is required, IP addresses must be assigned statically within the connected applications
or provided by an external DHCP server or another application offering DHCP services.

Since EVE is not in control of IP address allocations and leases, it must monitor application
Copy link
Member

@OhmSpectator OhmSpectator Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I didn't know EVE works that way =)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OhmSpectator for a switch network instance that is the case - EVE provides L2 connectivity but no IP and DNS. For a local network instance it is different.

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kick off tests

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
One eden test fails with
app_test.go:184: state: INSTALLED: [description:"failed to build domain config: can't write to config file /run/domainmgr/xen/xen1.cfg (template: qemuSwtpm:14:10: executing "qemuSwtpm" at <.Machine>: can't evaluate field Machine in type struct { CtrlSocket string })" timestamp:{seconds:1726263300 nanos:848559480} severity:SEVERITY_ERROR] received in: 2024-09-13T21:35:01.269742497Z

but that bug was in master and is now fixed.
ztests looks fine when run manually.

Do we want to backport this to the next 12.0 release?

@eriknordmark eriknordmark merged commit 5b2acf4 into lf-edge:master Sep 14, 2024
50 of 51 checks passed
@OhmSpectator
Copy link
Member

Do we want to backport this to the next 12.0 release?

I think so... At least @milan-zededa has marked it as stable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stable Should be backported to stable release(s)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants