ECMP acceleration for physical i/f down events #351

nikos-github · 2017-10-13T23:59:38Z

These changes introduce support for ECMP acceleration on physical i/f down events. The design document was reviewed a couple of months back and checked into the repo.

The functionality has been verified through manual testing in a physical topology with 2 T0 switches (asw01, asw02) connected 4-way to 4 T1 switches (csw01, csw02, csw03, csw04). IXIA was connected to both T0 switches. Prefixes were advertised to asw02 and traffic was sent towards asw01. The ECMP acceleration was tested by bringing down the links in different combinations on the T1 devices, checking the group membership through logs on asw01, and having IXIA measure the traffic loss duration.

Nikos.-

msftclas · 2017-10-13T23:59:46Z

All CLA requirements met.

prsunny · 2017-10-23T17:14:11Z

orchagent/neighorch.cpp

+{
+    SWSS_LOG_ENTER();
+
+    for (auto nhop = m_syncdNextHops.begin(); nhop != m_syncdNextHops.end(); ++nhop) {


Just checking: for any interface change, this will scan the entire nexthop cache and check for an interface match. Does this cause any performance issues with a large nexthop set?

How big do you expect the nexthop set to be? A typical data center won't have more than 100.

Ok. Since it is handled in the same context as the link down event, my concern was whether it could result in delays or side effects. The other option would have been to act on the link down notification from the kernel. Closing this on the assumption that the nexthop set will be comparatively small.
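
For readers following this thread, the scan being discussed can be sketched roughly as below. This is only an illustration under stated assumptions, not the PR's code: `NextHopEntry`, `kIfDown`, and `markNextHopsOnInterface` are hypothetical names, and a plain `std::map` keyed by the next hop IP string stands in for `m_syncdNextHops`.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-ins for the orchagent types; not the PR's definitions.
constexpr uint32_t kIfDown = 0x1;

struct NextHopEntry {
    std::string if_alias;   // interface the next hop resolves over
    uint32_t flags = 0;     // bit flags such as kIfDown
};

// Linear scan over the whole cache: O(n) in the number of cached next hops.
// Sets or clears the "interface down" flag on every entry that uses `alias`
// and returns how many entries actually changed state.
inline int markNextHopsOnInterface(std::map<std::string, NextHopEntry>& cache,
                                   const std::string& alias, bool up) {
    int changed = 0;
    for (auto& kv : cache) {
        NextHopEntry& nh = kv.second;
        if (nh.if_alias != alias) {
            continue;  // next hop is on a different interface
        }
        const uint32_t before = nh.flags;
        if (up) {
            nh.flags &= ~kIfDown;
        } else {
            nh.flags |= kIfDown;
        }
        if (nh.flags != before) {
            ++changed;
        }
    }
    return changed;
}
```

With a cache of around 100 entries, as estimated above, each link event costs one pass over the map plus a string comparison per entry.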

prsunny · 2017-10-23T17:17:05Z

orchagent/orchdaemon.cpp

 FdbOrch *gFdbOrch;
+NeighOrch *neigh_orch;


It looks like global variables use names prefixed with "g". Do you want to follow this convention now that the variable has moved to global scope?

Sure I can change that.

prsunny · 2017-10-23T17:40:33Z

orchagent/portsorch.cpp

+        }
+        SWSS_LOG_NOTICE("Set operation status %s to host interface %s",
+                        up ? "UP" : "DOWN", it->second.m_alias.c_str());
+        if (neigh_orch->ifChangeInformNextHop(it->second.m_alias, up) == false) {


I would just like to mention that having the nexthop operation in the same context where we handle link up/down events may delay subsequent link up/down operations.

This is per design document. Please refer to it.

prsunny · 2017-10-23T18:29:45Z

orchagent/routeorch.cpp

+        }
+
+        next_hop_id = next_hop_group_entry.nhopgroup_members[nhop];
+        status = sai_next_hop_group_api->remove_next_hop_group_member(next_hop_id);


IMO, adding a nexthop_group_member to SAI above and then checking for IFDOWN to remove the group member here seems inefficient. I think you could check for this in the first for-loop ("for (auto it : next_hop_set)") and create the next_hop_ids vector accordingly. What do you think?

The orch agent needs to always have a copy of the nexthop since the owner is a routing protocol. Please refer to the design doc.

The next_hop_id that was created and then deleted, but is still saved here, is only used again in removeNextHopGroup(). I think what @prsunny suggested is valid. The orch agent needs to keep a copy of the nexthop, but not the obsolete next_hop_id.

@nikos-github did I miss anything?
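
One way to read the suggestion above is sketched here, as an illustration only. Every name in it (`buildGroupMembers`, `kNullObjectId`, and the two callbacks standing in for the flag check and the SAI create call) is hypothetical and not taken from the PR. The group keeps an entry for every next hop, so the orch agent's copy stays complete, but a hardware member is only created for next hops whose interface is up.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <string>

using ObjectId = uint64_t;
constexpr ObjectId kNullObjectId = 0;  // assumed "no hardware object" marker

struct GroupMembers {
    // next hop -> hardware member id, or kNullObjectId if none was created
    std::map<std::string, ObjectId> members;
};

// Single pass over the next hop set: decide per next hop whether to create
// a hardware member, instead of creating it first and removing it afterwards.
inline GroupMembers buildGroupMembers(
        const std::set<std::string>& next_hop_set,
        const std::function<bool(const std::string&)>& is_if_down,
        const std::function<ObjectId(const std::string&)>& create_member) {
    GroupMembers out;
    for (const auto& nh : next_hop_set) {
        out.members[nh] = is_if_down(nh) ? kNullObjectId : create_member(nh);
    }
    return out;
}
```

Whether this fits the design document's requirements is for the author to confirm; the sketch only shows that the software copy and the hardware membership can be tracked separately.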

prsunny · 2017-10-23T18:52:45Z

orchagent/routeorch.cpp

+    for (auto nhop = next_hop_group_entry.nhopgroup_members.begin();
+         nhop != next_hop_group_entry.nhopgroup_members.end(); ++nhop) {
+
+        if (m_neighOrch->isNextHopFlagSet(nhop->first, NHFLAGS_IFDOWN)) {


Why do you check flags here? I think the request is to delete the nexthop group, and irrespective of whether a member is "down" or "up", it should be deleted. Also, if it is not deleted here, when will it be deleted?

If the flag is set, the member has already been removed. Calling the SAI API will cause a crash with the current sonic design.
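
The behaviour described in this reply can be sketched as follows. This is a hedged illustration, not the PR's code: `removeLiveMembers` and both callbacks are hypothetical, with `remove_member` standing in for the SAI member removal call.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>

using ObjectId = uint64_t;

// Removes the group's members from the hardware layer, skipping next hops
// whose interface is flagged down: their member object was already removed
// when the link went down, so removing it a second time would be an error.
// Returns the number of removal calls actually issued.
inline int removeLiveMembers(
        const std::map<std::string, ObjectId>& members,
        const std::function<bool(const std::string&)>& is_if_down,
        const std::function<void(ObjectId)>& remove_member) {
    int removed = 0;
    for (const auto& kv : members) {
        if (is_if_down(kv.first)) {
            continue;  // already removed on the link down event
        }
        remove_member(kv.second);
        ++removed;
    }
    return removed;
}
```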

oleksandrivantsiv · 2017-10-24T10:16:43Z

orchagent/neighorch.cpp

    m_syncdNextHops[ipAddress] = next_hop_entry;

    m_intfsOrch->increaseRouterIntfsRefCount(alias);

    return true;
 }

+bool
+NeighOrch::setNextHopFlag (const IpAddress &ipaddr, const uint32_t nh_flag)


Code style doesn't match SONiC's.

SONiC has an established code style? Can you please provide a pointer to where the current SONiC code style requirements are laid out?

SONiC has an established code style. However, it is not described in a document as in many other open source projects. If you take a look at the *.h and *.cpp files, they all look similar. The code provided here looks completely different.

Indeed. The code here is more readable and makes it easier to find where a function is defined when any tool is used on the code base (even grep). If you do insist against those improvements (which should be SONiC-wide), I guess I will have to change it.

I have no objection to changing the code style. But it should be done in all files at once and not in a couple of places in the modified files. Otherwise we will have a mess. I also think that a new code style should be discussed.

Addressed. You should apply the same diligence to code coming from your team.

Next week I'm OOO and won't be able to review.

lguohan · 2017-11-09T06:17:26Z

Did we have a test case review for this feature?

nikos-github · 2017-11-09T19:20:46Z

We had agreed to let the feature in, with test cases and their review to follow. Why are we coming back a month later, asking for something on which we had already agreed a course of action, and blocking the commit? Why don't you outline what your expectations are for the code to go in, and whether by testing you mean automated testing of this feature in the MSFT test framework or the feature testing and test cases that I performed?

lguohan · 2017-11-30T11:27:40Z

I do not see the test cases defined in the design document. https://github.com/Azure/SONiC/blob/gh-pages/doc/sonic-ecmp-acceleration.docx

These are mostly control plane changes. I think integrating them into the swss integration test should be ok, but we need to have the test cases designed and reviewed before merge.

lguohan

as comments

nikos-github · 2017-11-30T16:02:58Z

@lguohan If you are referring to UT test cases, they have been defined in the document. Xin hasn't yet uploaded the updated version I sent her. If you are referring to test cases for the framework, those are being implemented and should be ready soon. Once they are, we will send them for review.

lguohan · 2017-12-07T01:45:36Z

@nikos-li, please add an swss integration test for this PR. This is a fundamental feature. We need an integration test for it to make sure we won't have a regression.

lguohan · 2018-03-15T00:07:21Z

retest this please

lguohan · 2018-03-15T02:13:41Z

@nikos-li, the build failure for vs seems to be related to the crm PR.

nikos-github · 2018-03-15T03:37:19Z

@lguohan It's not a build issue.

nikos-github · 2018-03-15T08:01:43Z

@lguohan All checks pass.

lguohan · 2018-03-18T18:44:33Z

There is a potential race condition, since ifChangeInformNextHop is called in the context of the port notification thread. ifChangeInformNextHop in turn calls validnexthopinNextHopGroup/invalidnexthopinNextHopGroup, which iterate over m_syncdNextHopGroups and modify the next hop group members without a lock.

orchagent is essentially a single-threaded application, so we need to schedule the port notification event onto the main loop.

nikos-github · 2018-03-19T01:18:05Z

@lguohan validnexthopinNextHopGroup/invalidnexthopinNextHopGroup don't add/delete group members at the orchagent level. They do so only in syncd, and there is a mutex on the db. I don't see the race condition you are referring to.

lguohan · 2018-03-19T01:26:04Z

There are now two threads accessing m_syncdNextHopGroups: one is the main thread, and now you add another one from the port notification. The data needs to be protected.

nikos-github · 2018-03-19T01:26:33Z

@lguohan membership of m_syncdNextHopGroups is not modified.

lguohan · 2018-03-19T01:28:45Z

It does not matter: the main thread could modify it, so you can still get into trouble.
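
The approach requested earlier in this thread, handing the port event over to the main loop instead of touching `m_syncdNextHopGroups` from the notification thread, could look roughly like the sketch below. It is an assumption-laden illustration: `PortEvent` and `PortEventQueue` are hypothetical names, and the mechanism that wakes the main loop when an event arrives is left out.

```cpp
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct PortEvent {
    std::string alias;  // port name
    bool up;            // new operational state
};

// The notification thread only calls push(); the main loop calls drain()
// and then updates the next hop state itself, so that state is only ever
// touched from one thread. The mutex protects the queue alone.
class PortEventQueue {
public:
    void push(PortEvent ev) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_events.push(std::move(ev));
    }

    // Takes all pending events in arrival order and leaves the queue empty.
    std::vector<PortEvent> drain() {
        std::queue<PortEvent> pending;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            std::swap(pending, m_events);
        }
        std::vector<PortEvent> out;
        while (!pending.empty()) {
            out.push_back(std::move(pending.front()));
            pending.pop();
        }
        return out;
    }

private:
    std::mutex m_mutex;
    std::queue<PortEvent> m_events;
};
```

The trade-off is latency: the reaction to a link down event now waits for the main loop's next iteration, which works against the acceleration goal to some degree.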

lguohan · 2018-03-19T04:15:31Z

Did we ever test the case where all nexthops are down and a route points to a nexthop group with zero members?

nikos-github · 2018-03-19T04:27:10Z

@lguohan Yes, 8 months ago during UT. Shuotian also tested this case and verified it.

lguohan · 2018-03-19T04:28:24Z

Hmm, SAI changed from 1.0 to 1.2. Have we done any further testing since SAI 1.2 was released?

nikos-github · 2018-03-19T04:33:11Z

@lguohan Did Broadcom SAI implementation change the behavior of the remove_next_hop_group_member API and all of a sudden it deletes the nexthop groups that have no members? I don't see something like that in the code and if that was the case, then the subsequent call to remove the group on a delete after all the members are deleted, would have resulted in a crash and would be visible even without my code changes. A lot more things would be broken by now.

lguohan · 2018-03-19T05:38:15Z

retest this please

stcheng requested review from prsunny and stcheng October 14, 2017 00:11

stcheng added the Enhancement ➕ label Oct 17, 2017

prsunny reviewed Oct 23, 2017

oleksandrivantsiv previously requested changes Oct 25, 2017

stcheng assigned nikos-github Oct 27, 2017

prsunny approved these changes Nov 10, 2017

lguohan requested changes Nov 30, 2017

ECMP acceleration for physical i/f down events

faff0b1

nikos-github force-pushed the master branch from 5bf79a8 to faff0b1 Compare March 14, 2018 23:45

nikos-github added 3 commits March 14, 2018 23:25

ECMP acceleration for physical i/f down events

73a108a

ECMP acceleration for physical i/f down events

f3c220c

ECMP acceleration for physical i/f down events

344dfa9

add next hop group test

c94ac10

Signed-off-by: Guohan Lu <lguohan@gmail.com>

reformat the code to align with current style

19f346f

Signed-off-by: Guohan Lu <lguohan@gmail.com>

lguohan approved these changes Mar 22, 2018

lguohan merged commit 8e635fd into sonic-net:master Mar 22, 2018

wendani mentioned this pull request Nov 1, 2020

[sub intf] ecmp hardware convergence acceleration at parent port oper status changes #1492

Merged

oleksandrivantsiv pushed a commit to oleksandrivantsiv/sonic-swss that referenced this pull request Mar 1, 2023

Add special comparison logic for LAG (sonic-net#351)

05ef677
