Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EVPN]Handling error scenarios during route programming and IMR add #2670

Merged
merged 1 commit into from
Mar 7, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions orchagent/routeorch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -665,6 +665,8 @@ void RouteOrch::doTask(Consumer& consumer)
NextHopGroupKey& nhg = ctx.nhg;
vector<string> srv6_segv;
vector<string> srv6_src;
bool l3Vni = true;
uint32_t vni = 0;

/* Check if the next hop group is owned by the NhgOrch. */
if (nhg_index.empty())
Expand Down Expand Up @@ -696,6 +698,23 @@ void RouteOrch::doTask(Consumer& consumer)
ipv.resize(alsv.size());
}

for (auto &vni_str: vni_labelv)
{
vni = static_cast<uint32_t>(std::stoul(vni_str));
if (!m_vrfOrch->isL3VniVlan(vni))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the vrf-vni configuration is removed from OA and not from FRR after routes are installed then the routes still remain in the HW.
Is this scenario handled ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our case the VNI removal leads to error logs in SAI since it has references. This scenario again depends on references to maintain which I believe we don't do it. IMO I am fixing the scenarios which are easier to fix with the current implementation and then document the ones which are harder to fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes thats correct we don't maintain the routes in the OA and handling this will be harder. If this leads to a SAI error and an OA crash we will need to see whether this can be handled at the SAI level.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently SAI error is not creating orchagent crash since we are not handling status for this call. So we will get syslog error. For now I consider this a lower priority and document this.

{
SWSS_LOG_WARN("Route %s is received on non L3 VNI %s", key.c_str(), vni_str.c_str());
l3Vni = false;
break;
}
}

if (!l3Vni)
{
it++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as for IMR handling. Not Removing this from the m_tosyncqueue in a scale scenario and in a misconfiguration case will unnecessarily load the OA.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, completely ignoring will lead to the message being lost. Since this is a error scenario, I would rather prefer retrying here. It won't affect regular use case scenarios

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we see this in startup scenarios as well ? In which case the retry becomes necessary. 

For example in an L3VNI scenario, FRR is configured correctly and SWSS is configured for both L2 and L3VNIs. 

The vrforch processes the L3VNI configuration after receiving the update from fpmsyncd. 

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srj102 It is quite possible but I haven't encountered it. I believe my change would handle that scenario as well

continue;
}

/* Set the empty ip(s) to zero
* as IpAddress("") will construct a incorrect ip. */
for (auto &ip : ipv)
Expand Down
14 changes: 14 additions & 0 deletions orchagent/vxlanorch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2376,6 +2376,13 @@ bool EvpnRemoteVnip2pOrch::addOperation(const Request& request)
return false;
}

VRFOrch* vrf_orch = gDirectory.get<VRFOrch*>();
if (vrf_orch->isL3VniVlan(vni_id))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. wouldn't this check is applicable for fdborch also ?
  2. If the vni became an L3VNI after IMR routes were installed then OA will remove VLAN-VNI mappings and add the VRF-VNI mapping, leaving us with a scenario where the SAI has MAC/IMR routes with no corresponding vlan vni mappings.
  3. What happens without this fix ? Are there SAI errors we see ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. In case of FDB orch we have check to ensure Vlan membership which will prevent FDB getting added since the IMR is prevented
  2. Yes that's true. But as per the current behavior we will fail at removing VLAN VNI mapping at SAI level due to references. Do you think we should add a check at orchagent. I am not sure if this check is valid for other platforms and so I didn't prefer to add
  3. Without the fix SAI returns error and orchagent crashes.

{
SWSS_LOG_WARN("Ignoring remote VNI add for L3 VNI:%d, remote:%s", vni_id, remote_vtep.c_str());
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be return true instead of return false ?

The log says "Ignore" but in reality it is being deferred.
Also a large number of IMR routes being held in the m_tosync queue and being processed in a continous loop will load the OA unnecessarily.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can correct the log. I prefer not to ignore but defer. If we ignore we might miss those remote. Since this is a corner error scenario I don't think it will overwhelm orchagent during regular processing

}

if (tunnel_orch->getTunnelPort(remote_vtep,tunnelPort))
{
SWSS_LOG_INFO("Vxlan tunnelPort exists: %s", remote_vtep.c_str());
Expand Down Expand Up @@ -2531,6 +2538,13 @@ bool EvpnRemoteVnip2mpOrch::addOperation(const Request& request)
return false;
}

VRFOrch* vrf_orch = gDirectory.get<VRFOrch*>();
if (vrf_orch->isL3VniVlan(vni_id))
{
SWSS_LOG_WARN("Ignoring remote VNI add for L3 VNI:%d, remote:%s", vni_id, end_point_ip.c_str());
return false;
}

auto src_vtep = vtep_ptr->getSrcIP().to_string();
if (tunnel_orch->getTunnelPort(src_vtep,tunnelPort, true))
{
Expand Down
51 changes: 51 additions & 0 deletions tests/evpn_tunnel.py
Original file line number Diff line number Diff line change
Expand Up @@ -723,6 +723,17 @@ def check_vlan_extension_delete_p2mp(self, dvs, vlan_name, sip, dip):
status, fvs = tbl.get(self.l2mcgroup_member_map[dip+vlan_name])
assert status == False, "L2MC Group Member entry not deleted"

def check_vlan_obj(self, dvs, vlan_name):
asic_db = swsscommon.DBConnector(swsscommon.ASIC_DB, dvs.redis_sock, 0)
expected_vlan_attributes = {
'SAI_VLAN_ATTR_VLAN_ID': vlan_name,
}
ret = self.helper.get_key_with_attr(asic_db, 'ASIC_STATE:SAI_OBJECT_TYPE_VLAN', expected_vlan_attributes)
assert len(ret) > 0, "VLAN entry not created"
assert len(ret) == 1, "More than 1 VLAN entry created"

self.vlan_id_map[vlan_name] = ret[0]

def check_vlan_extension(self, dvs, vlan_name, dip):
asic_db = swsscommon.DBConnector(swsscommon.ASIC_DB, dvs.redis_sock, 0)
expected_vlan_attributes = {
Expand All @@ -743,6 +754,25 @@ def check_vlan_extension(self, dvs, vlan_name, dip):
assert len(ret) == 1, "More than 1 VLAN member created"
self.vlan_member_map[dip+vlan_name] = ret[0]

def check_vlan_extension_not_created(self, dvs, vlan_name, dip):
asic_db = swsscommon.DBConnector(swsscommon.ASIC_DB, dvs.redis_sock, 0)
expected_vlan_attributes = {
'SAI_VLAN_ATTR_VLAN_ID': vlan_name,
}
ret = self.helper.get_key_with_attr(asic_db, 'ASIC_STATE:SAI_OBJECT_TYPE_VLAN', expected_vlan_attributes)
assert len(ret) > 0, "VLAN entry not created"
assert len(ret) == 1, "More than 1 VLAN entry created"

self.vlan_id_map[vlan_name] = ret[0]

if dip in self.bridgeport_map:
expected_vlan_member_attributes = {
'SAI_VLAN_MEMBER_ATTR_VLAN_ID': self.vlan_id_map[vlan_name],
'SAI_VLAN_MEMBER_ATTR_BRIDGE_PORT_ID': self.bridgeport_map[dip],
}
ret = self.helper.get_key_with_attr(asic_db, 'ASIC_STATE:SAI_OBJECT_TYPE_VLAN_MEMBER', expected_vlan_member_attributes)
assert len(ret) == 0, "VLAN member created"

def check_vlan_extension_p2mp(self, dvs, vlan_name, sip, dip):
asic_db = swsscommon.DBConnector(swsscommon.ASIC_DB, dvs.redis_sock, 0)
tbl = swsscommon.Table(asic_db, 'ASIC_STATE:SAI_OBJECT_TYPE_VLAN')
Expand Down Expand Up @@ -986,6 +1016,27 @@ def check_vrf_routes(self, dvs, prefix, vrf_name, endpoint, tunnel, mac="", vni=

return True

def check_vrf_routes_absence(self, dvs, prefix, vrf_name, endpoint, tunnel, mac="", vni=0, no_update=0):
asic_db = swsscommon.DBConnector(swsscommon.ASIC_DB, dvs.redis_sock, 0)

vr_ids = self.vrf_route_ids(dvs, vrf_name)
count = len(vr_ids)

# Check routes in ingress VRF
expected_attr = {
"SAI_NEXT_HOP_ATTR_TYPE": "SAI_NEXT_HOP_TYPE_TUNNEL_ENCAP",
"SAI_NEXT_HOP_ATTR_IP": endpoint,
"SAI_NEXT_HOP_ATTR_TUNNEL_ID": self.tunnel[tunnel],
}

if vni:
expected_attr.update({'SAI_NEXT_HOP_ATTR_TUNNEL_VNI': vni})

if mac:
expected_attr.update({'SAI_NEXT_HOP_ATTR_TUNNEL_MAC': mac})

self.helper.get_created_entries(asic_db, self.ASIC_NEXT_HOP, self.nhops, 0)

def check_vrf_routes_ecmp(self, dvs, prefix, vrf_name, tunnel, nh_count, no_update=0):
asic_db = swsscommon.DBConnector(swsscommon.ASIC_DB, dvs.redis_sock, 0)

Expand Down
64 changes: 6 additions & 58 deletions tests/test_evpn_l3_vxlan.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ def test_sip_tunnel_vrf_vni_map(self, dvs, testlog):
print ("\n\nTesting Create and Delete SIP Tunnel and VRF VNI Map entries")
print ("\tCreate SIP Tunnel")
vxlan_obj.create_vlan1(dvs,"Vlan100")
vxlan_obj.check_vlan_obj(dvs, "100")
vxlan_obj.create_vxlan_tunnel(dvs, tunnel_name, '6.6.6.6')
vxlan_obj.create_evpn_nvo(dvs, 'nvo1', tunnel_name)

Expand Down Expand Up @@ -97,7 +98,7 @@ def test_sip_tunnel_vrf_vni_map(self, dvs, testlog):
# Test 2 - Create and Delete DIP Tunnel on adding and removing prefix route
# @pytest.mark.skip(reason="Starting Route Orch, VRF Orch to be merged")
# @pytest.mark.dev_sanity
def test_prefix_route_create_dip_tunnel(self, dvs, testlog):
def test_prefix_route_create_tunnel(self, dvs, testlog):
vxlan_obj = self.get_vxlan_obj()
helper = self.get_vxlan_helper()

Expand Down Expand Up @@ -161,17 +162,11 @@ def test_prefix_route_create_dip_tunnel(self, dvs, testlog):
vxlan_obj.create_vrf_route(dvs, "80.80.1.0/24", 'Vrf-RED', '7.7.7.7', "Vlan100", "00:11:11:11:11:11", '1000')
vxlan_obj.check_vrf_routes(dvs, "80.80.1.0/24", 'Vrf-RED', '7.7.7.7', tunnel_name, "00:11:11:11:11:11", '1000')

print ("\tTesting DIP tunnel 7.7.7.7 creation")
vxlan_obj.check_vxlan_dip_tunnel(dvs, tunnel_name, '6.6.6.6', '7.7.7.7')

print ("\tTest VRF IPv4 Route with Tunnel Nexthop Delete")
vxlan_obj.delete_vrf_route(dvs, "80.80.1.0/24", 'Vrf-RED')
vxlan_obj.check_del_tunnel_nexthop(dvs, 'Vrf-RED', '7.7.7.7', tunnel_name, "00:11:11:11:11:11", '1000')
vxlan_obj.check_del_vrf_routes(dvs, "80.80.1.0/24", 'Vrf-RED')

print ("\tTesting DIP tunnel 7.7.7.7 deletion")
vxlan_obj.check_vxlan_dip_tunnel_delete(dvs, '7.7.7.7')

print ("\tTesting Tunnel Vrf Map Entry removal")
vxlan_obj.remove_vxlan_vrf_tunnel_map(dvs, 'Vrf-RED')
vxlan_obj.check_vxlan_tunnel_vrf_map_entry_remove(dvs, tunnel_name, 'Vrf-RED', '1000')
Expand All @@ -198,7 +193,7 @@ def test_prefix_route_create_dip_tunnel(self, dvs, testlog):
# Test 3 - Create and Delete DIP Tunnel and Test IPv4 route and overlay nexthop add and delete
# @pytest.mark.skip(reason="Starting Route Orch, VRF Orch to be merged")
# @pytest.mark.dev_sanity
def test_dip_tunnel_ipv4_routes(self, dvs, testlog):
def test_tunnel_ipv4_routes(self, dvs, testlog):
vxlan_obj = self.get_vxlan_obj()
helper = self.get_vxlan_helper()

Expand All @@ -211,6 +206,7 @@ def test_dip_tunnel_ipv4_routes(self, dvs, testlog):
print ("\n\nTesting IPv4 Route and Overlay Nexthop Add and Delete")
print ("\tCreate SIP Tunnel")
vxlan_obj.create_vlan1(dvs,"Vlan100")
vxlan_obj.check_vlan_obj(dvs, "100")
vxlan_obj.create_vxlan_tunnel(dvs, tunnel_name, '6.6.6.6')
vxlan_obj.create_evpn_nvo(dvs, 'nvo1', tunnel_name)

Expand Down Expand Up @@ -252,21 +248,6 @@ def test_dip_tunnel_ipv4_routes(self, dvs, testlog):
print ("\tTesting Tunnel Vrf Map Entry")
vxlan_obj.check_vxlan_tunnel_vrf_map_entry(dvs, tunnel_name, 'Vrf-RED', '1000')

print ("\tTesting First DIP tunnel creation to 7.7.7.7")
vxlan_obj.create_evpn_remote_vni(dvs, 'Vlan100', '7.7.7.7', '1000')
vxlan_obj.check_vxlan_dip_tunnel(dvs, tunnel_name, '6.6.6.6', '7.7.7.7')

print ("\tTesting VLAN 100 extension")
vxlan_obj.check_vlan_extension(dvs, '100', '7.7.7.7')

print ("\tTesting Second DIP tunnel creation to 8.8.8.8")
vxlan_obj.create_evpn_remote_vni(dvs, 'Vlan100', '8.8.8.8', '1000')
vxlan_obj.check_vxlan_dip_tunnel(dvs, tunnel_name, '6.6.6.6', '8.8.8.8')

print ("\tTesting VLAN 100 extension to 8.8.8.8 and 7.7.7.7")
vxlan_obj.check_vlan_extension(dvs, '100', '8.8.8.8')
vxlan_obj.check_vlan_extension(dvs, '100', '7.7.7.7')

print ("\tTesting VLAN 100 interface creation")
vxlan_obj.create_vlan_interface(dvs, "Vlan100", "Ethernet24", "Vrf-RED", "100.100.3.1/24")
vxlan_obj.check_router_interface(dvs, 'Vrf-RED', vxlan_obj.vlan_id_map['100'], 2)
Expand Down Expand Up @@ -373,16 +354,6 @@ def test_dip_tunnel_ipv4_routes(self, dvs, testlog):
vxlan_obj.remove_vxlan_vrf_tunnel_map(dvs, 'Vrf-RED')
vxlan_obj.check_vxlan_tunnel_vrf_map_entry_remove(dvs, tunnel_name, 'Vrf-RED', '1000')

print ("\tTesting LastVlan removal and DIP tunnel delete for 7.7.7.7")
vxlan_obj.remove_evpn_remote_vni(dvs, 'Vlan100', '7.7.7.7')
vxlan_obj.check_vlan_extension_delete(dvs, '100', '7.7.7.7')
vxlan_obj.check_vxlan_dip_tunnel_delete(dvs, '7.7.7.7')

print ("\tTesting LastVlan removal and DIP tunnel delete for 8.8.8.8")
vxlan_obj.remove_evpn_remote_vni(dvs, 'Vlan100', '8.8.8.8')
vxlan_obj.check_vlan_extension_delete(dvs, '100', '8.8.8.8')
vxlan_obj.check_vxlan_dip_tunnel_delete(dvs, '8.8.8.8')

print ("\tTesting Vlan 100 interface delete")
vxlan_obj.delete_vlan_interface(dvs, "Vlan100", "100.100.3.1/24")
vxlan_obj.check_del_router_interface(dvs, "Vlan100")
Expand All @@ -405,7 +376,7 @@ def test_dip_tunnel_ipv4_routes(self, dvs, testlog):
# Test 4 - Create and Delete DIP Tunnel and Test IPv6 route and overlay nexthop add and delete
# @pytest.mark.skip(reason="Starting Route Orch, VRF Orch to be merged")
# @pytest.mark.dev_sanity
def test_dip_tunnel_ipv6_routes(self, dvs, testlog):
def test_tunnel_ipv6_routes(self, dvs, testlog):
vxlan_obj = self.get_vxlan_obj()
helper = self.get_vxlan_helper()

Expand All @@ -418,6 +389,7 @@ def test_dip_tunnel_ipv6_routes(self, dvs, testlog):
print ("\n\nTesting IPv6 Route and Overlay Nexthop Add and Delete")
print ("\tCreate SIP Tunnel")
vxlan_obj.create_vlan1(dvs,"Vlan100")
vxlan_obj.check_vlan_obj(dvs, "100")
vxlan_obj.create_vxlan_tunnel(dvs, tunnel_name, '6.6.6.6')
vxlan_obj.create_evpn_nvo(dvs, 'nvo1', tunnel_name)

Expand Down Expand Up @@ -460,20 +432,6 @@ def test_dip_tunnel_ipv6_routes(self, dvs, testlog):
print ("\tTesting Tunnel Vrf Map Entry")
vxlan_obj.check_vxlan_tunnel_vrf_map_entry(dvs, tunnel_name, 'Vrf-RED', '1000')

print ("\tTesting First DIP tunnel creation to 7.7.7.7")
vxlan_obj.create_evpn_remote_vni(dvs, 'Vlan100', '7.7.7.7', '1000')
vxlan_obj.check_vxlan_dip_tunnel(dvs, tunnel_name, '6.6.6.6', '7.7.7.7')

print ("\tTesting VLAN 100 extension")
vxlan_obj.check_vlan_extension(dvs, '100', '7.7.7.7')

print ("\tTesting Second DIP tunnel creation to 8.8.8.8")
vxlan_obj.create_evpn_remote_vni(dvs, 'Vlan100', '8.8.8.8', '1000')
vxlan_obj.check_vxlan_dip_tunnel(dvs, tunnel_name, '6.6.6.6', '8.8.8.8')

print ("\tTesting VLAN 100 extension to 8.8.8.8 and 7.7.7.7")
vxlan_obj.check_vlan_extension(dvs, '100', '8.8.8.8')
vxlan_obj.check_vlan_extension(dvs, '100', '7.7.7.7')

vxlan_obj.fetch_exist_entries(dvs)
print ("\tTesting VLAN 100 interface creation")
Expand Down Expand Up @@ -582,16 +540,6 @@ def test_dip_tunnel_ipv6_routes(self, dvs, testlog):
vxlan_obj.remove_vxlan_vrf_tunnel_map(dvs, 'Vrf-RED')
vxlan_obj.check_vxlan_tunnel_vrf_map_entry_remove(dvs, tunnel_name, 'Vrf-RED', '1000')

print ("\tTesting LastVlan removal and DIP tunnel delete for 7.7.7.7")
vxlan_obj.remove_evpn_remote_vni(dvs, 'Vlan100', '7.7.7.7')
vxlan_obj.check_vlan_extension_delete(dvs, '100', '7.7.7.7')
vxlan_obj.check_vxlan_dip_tunnel_delete(dvs, '7.7.7.7')

print ("\tTesting LastVlan removal and DIP tunnel delete for 8.8.8.8")
vxlan_obj.remove_evpn_remote_vni(dvs, 'Vlan100', '8.8.8.8')
vxlan_obj.check_vlan_extension_delete(dvs, '100', '8.8.8.8')
vxlan_obj.check_vxlan_dip_tunnel_delete(dvs, '8.8.8.8')

print ("\tTesting Vlan 100 interface delete")
vxlan_obj.delete_vlan_interface(dvs, "Vlan100", "2001::8/64")
vxlan_obj.check_del_router_interface(dvs, "Vlan100")
Expand Down
Loading