An `ArpTable` is a wrapper around an `ArpCache` object that services ARP get and set operations for `Router` RCU objects; it is also responsible for emitting ARP requests as necessary in order to fulfill those `get()` operations. The `ArpCache` object it wraps is itself a thin, thread-safe wrapper around a ZooKeeper-backed `ReplicatedMap` (also called `ArpTable`) which lives in the `org.midonet.midolman.state` package.
`ArpTable` objects are created by the corresponding `RouterManager` actor. Only one instance exists during the lifetime of a `RouterManager`, as is the case for the `ArpCache` object, which the actor receives before it can start creating `Router` RCU objects.
There is one `RouterManager` actor per virtual router, and it hands the `ArpTable` reference to every `Router` RCU object it creates, so they all share it.
`ArpTable` contains two lifecycle management methods, `start()` and `stop()`, which respectively make the object start or stop watching for `ArpCache` entry updates. `stop()` is currently unused, but it should be used by `RouterManager` once a tear-down mechanism is added to it.
All operations run out of the simulation that requests them. This happens when an RCU `Router` needs to resolve a MAC address, or when it discovers a new IP-MAC mapping to add to the ARP table while processing a received ARP packet.
When a `Router` asks for an IP/MAC address mapping, this is what happens:

- The `ArpCache` is queried for the corresponding `ArpCacheEntry`.
- A callback is registered on the `ArpCacheEntry` promise that will start the ARP request loop if necessary (the entry is null or stale).
- If the received entry is invalid or nonexistent, a promise is placed in the `arpWaiters` map so that a value can be returned to the caller once the address is resolved.
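The `get()` flow above can be sketched as follows. This is a hypothetical, simplified model, not the actual midolman code: the names `arpWaiters` and `ArpCacheEntry` mirror the text, while `GetSketch` and its fields are illustrative inventions, and `CompletableFuture` stands in for the promises the text describes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical, simplified model of the get() path described above.
class GetSketch {
    static final class ArpCacheEntry {
        final String mac;      // null while unresolved
        final boolean stale;
        ArpCacheEntry(String mac, boolean stale) { this.mac = mac; this.stale = stale; }
    }

    // Stands in for the ZooKeeper-backed ArpCache.
    static final Map<String, ArpCacheEntry> cache = new HashMap<>();
    // Futures for callers waiting on an unresolved or stale address.
    static final Map<String, CompletableFuture<String>> arpWaiters = new HashMap<>();

    static CompletableFuture<String> get(String ip) {
        ArpCacheEntry e = cache.get(ip);
        if (e != null && e.mac != null && !e.stale) {
            // Valid, fresh entry: answer immediately.
            return CompletableFuture.completedFuture(e.mac);
        }
        // Null or stale entry: the real code would also start the ARP
        // request loop here; we just park the caller in arpWaiters.
        return arpWaiters.computeIfAbsent(ip, k -> new CompletableFuture<>());
    }
}
```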
The ARP request loop works as follows. Each iteration checks that an ARP request needs to be sent:

- the ARP cache entry is still stale or null,
- no one else has taken over the sending of the ARP requests, and
- the ARP cache entry has not expired.

If so:

- an ARP request is emitted,
- a new value for `lastArp` is written to the `ArpCacheEntry`, and
- the loop adds itself to the `arpWaiters` map, with a timeout that equals the retry interval. If the future fails with a timeout, another iteration of the loop starts.
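The loop's control flow can be sketched like this. The hooks `emitArp`, `waitForReply` and `someoneElseTookOver` are stand-ins for the real midolman machinery, and `ArpLoopSketch` is an invented name; only the shape of the iteration follows the text.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongPredicate;

// Hypothetical sketch of the ARP request loop described above.
class ArpLoopSketch {
    static final class Entry {
        String mac;               // null while unresolved
        boolean stale, expired;
        long lastArp;
    }

    /** Returns the number of ARP requests emitted before resolution or bail-out. */
    static int arpLoop(Entry entry,
                       BooleanSupplier someoneElseTookOver,
                       Runnable emitArp,
                       LongPredicate waitForReply,   // true if a reply arrived in time
                       long retryIntervalMillis) {
        int sent = 0;
        while (true) {
            // Conditions for sending another request, as listed above.
            boolean mustSend = (entry.mac == null || entry.stale)
                    && !someoneElseTookOver.getAsBoolean()
                    && !entry.expired;
            if (!mustSend)
                return sent;
            emitArp.run();                               // an ARP request is emitted
            sent++;
            entry.lastArp = System.currentTimeMillis();  // record lastArp in the entry
            // Wait as an arpWaiter with a timeout equal to the retry
            // interval; a timeout starts another iteration of the loop.
            if (waitForReply.test(retryIntervalMillis))
                return sent;
        }
    }
}
```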
The `set()` method notifies all the interested local waiters about the newly learned MAC address and writes an entry to the `ArpCache` so that it will make its way to ZooKeeper and to waiters on other nodes.
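A sketch of that `set()` behaviour, with `SetSketch` and its fields invented for illustration and a plain map standing in for the ZooKeeper-backed `ArpCache`:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of set(): complete any local waiters, then persist
// the entry so it propagates through ZooKeeper to waiters on other nodes.
class SetSketch {
    static final Map<String, String> cache = new HashMap<>();  // stands in for the ArpCache
    static final Map<String, CompletableFuture<String>> arpWaiters = new HashMap<>();

    static void set(String ip, String mac) {
        // Notify interested local waiters about the newly learned MAC.
        CompletableFuture<String> waiter = arpWaiters.remove(ip);
        if (waiter != null)
            waiter.complete(mac);
        // Write the entry to the (ZooKeeper-backed) cache so it makes its
        // way to other nodes.
        cache.put(ip, mac);
    }
}
```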
Received ARP packets are not processed by the `ArpTable` but by the RCU `Router`. However, this processing usually results in calls to `arpTable.set()`.
In particular, reception of the following packets will cause an `ArpTable` update, as long as the IP address of the sender falls within the network address of the ingress port of the RCU `Router`:

- An ARP request addressed to the `Router` ingress port's MAC and IP addresses.
- An ARP reply addressed similarly.
- A gratuitous ARP reply.
Whenever the `ArpTable` sets an entry in the `ArpCache`, it schedules an expiry callback that will clean up the entry if it has not been refreshed.
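A sketch of that refresh-aware expiry logic, under the assumption that each set records an absolute deadline and a callback compares against the deadline captured when it was scheduled; `ExpirySketch` and its members are illustrative names, not midolman APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the expiry behaviour described above: a refresh
// moves the entry's deadline forward, turning the stale callback into a no-op.
class ExpirySketch {
    static final class Entry {
        final String mac;
        final long expiry;     // absolute deadline recorded at set() time
        Entry(String mac, long expiry) { this.mac = mac; this.expiry = expiry; }
    }

    static final Map<String, Entry> cache = new HashMap<>();

    static void set(String ip, String mac, long now, long ttlMillis) {
        cache.put(ip, new Entry(mac, now + ttlMillis));
    }

    // The callback scheduled when the entry was set; scheduledExpiry is the
    // deadline captured at scheduling time. Only remove the entry if it has
    // not been refreshed since then.
    static void expiryCallback(String ip, long scheduledExpiry) {
        Entry e = cache.get(ip);
        if (e != null && e.expiry <= scheduledExpiry)
            cache.remove(ip);
    }
}
```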
The `ArpCache` (implemented inside `ClusterRouterManager`) is the object responsible for delivering notifications from ZooKeeper (the `ReplicatedMap`) down to the `ArpTable`, which will then notify the waiters (RCU `Router`s doing a simulation) that are waiting for that particular MAC address.
- If a node crashes, who cleans up the ARP cache entries whose expiration the deceased node was responsible for?

  The `ReplicatedMap` that backs the `ArpCache` writes ephemeral nodes to ZooKeeper, so if a node goes down, all the entries it was responsible for will be cleaned up.
- While sending ARP requests, `ArpTable` does a read->change->write on the affected `ArpCacheEntry`; this is a race condition. Preventing this sort of race is a TODO item for the Cluster design. It is harmful in two cases:

  - A node writing an entry with a null MAC address (the purpose of these writes is to track retries). If one of these null writes races with a write made by a node that discovered the actual MAC address, it could overwrite it and thus delete a freshly created entry. The node that overwrote the valid entry would continue ARPing for the address, so the consequences would just be some extra traffic and latency. This case is likely to happen, albeit infrequently: the retry interval for ARP requests is 10 seconds, and null entries are written before sending an ARP request. It may be triggered by an ARP being resolved due to an event unrelated to the ARP request loop in question, or by a host replying to an ARP about 10 seconds after the request was sent.

  - A node expiring an `ArpCacheEntry` could race with a node that happens to refresh it just before expiration. The consequences of this case would be similar to the above, but it is so unlikely that it is not worth worrying about: cache entries have a 1-hour expiration period, and they become stale after 30 minutes. Triggering this would mean either keeping an entry stale for 30 minutes and resolving it exactly at the end of that period, or having the entry be refreshed at that very moment for an unrelated reason, such as a gratuitous ARP reply.
- If a node is sending ARP requests for an IP address and crashes, other nodes will take over if they need the MAC and the sender has skipped two retries, but two different nodes could take over at the same time. This is not serious, because at the second iteration one of them would bail out.
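One plausible form of the take-over check, assuming "skipped two retries" means more than two retry intervals have elapsed since `lastArp`; the actual midolman condition may differ, and `TakeoverSketch` is an invented name.

```java
// Hypothetical sketch: a node assumes the current ARP sender is gone once
// more than two retry intervals have passed since it last wrote lastArp.
class TakeoverSketch {
    static boolean shouldTakeOver(long now, long lastArp, long retryIntervalMillis) {
        return now - lastArp > 2 * retryIntervalMillis;
    }
}
```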