-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sairedis: Fixing race condition for rif counters #1136
sairedis: Fixing race condition for rif counters #1136
Conversation
|
Added @prsunny for review |
This PR along with (sonic-net/sonic-swss#2488) handles the below issue : |
syncd/FlexCounter.cpp
Outdated
return "RATES:" + key + ":RIF"; | ||
} | ||
|
||
void FlexCounter::cleanUpRifFromCounterDb(_In_ sai_object_id_t rifId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please try reuse removeDataFromCountersDB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Junchao-Mellanox , Sure , I will use as suggested and check/UT . If it works fine, will submit a patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Junchao-Mellanox , The changes you suggested worked fine , submitting a new patch for this.
is this backward compatible? |
@kcudnik , it is backward compatible. The stale rif counter db table/entries was cleaned up in orchagent in previous version while now it will be cleaned up in syncd. |
can you solve build issues? |
sure , will resolve it (Please note the new patch will be submitted if it works fine with @Junchao-Mellanox suggestion) |
* Fixing issue #11621 * The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue #2193 the cleanup for stale rif counters was added. * But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them. * So as a fix all such cleanup has been moved to syncd. Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
/azpw run Azure.sonic-sairedis |
/AzurePipelines run Azure.sonic-sairedis |
Azure Pipelines successfully started running 1 pipeline(s). |
/AzurePipelines run Azure.sonic-sairedis |
Commenter does not have sufficient privileges for PR 1136 in repo sonic-net/sonic-sairedis |
/AzurePipelines run Azure.sonic-sairedis |
Commenter does not have sufficient privileges for PR 1136 in repo sonic-net/sonic-sairedis |
/AzurePipelines run Azure.sonic-sairedis |
Commenter does not have sufficient privileges for PR 1136 in repo sonic-net/sonic-sairedis |
/AzurePipelines run Azure.sonic-sairedis |
Commenter does not have sufficient privileges for PR 1136 in repo sonic-net/sonic-sairedis |
/AzurePipelines run Azure.sonic-sairedis |
Commenter does not have sufficient privileges for PR 1136 in repo sonic-net/sonic-sairedis |
/AzurePipelines run |
Commenter does not have sufficient privileges for PR 1136 in repo sonic-net/sonic-sairedis |
/AzurePipelines run Azure.sonic-sairedis |
/AzurePipelines run Azure.sonic-sairedis |
Commenter does not have sufficient privileges for PR 1136 in repo sonic-net/sonic-sairedis |
Please try with "/azpw run" |
/azpw run |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azpw run |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
@prsunny @liat-grozovik @kcudnik |
* Fixing issue #11621 * The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue #2193 the cleanup for stale rif counters was added. * But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them. * So as a fix all such cleanup has been moved to syncd. Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
* Fixing issue #11621 * The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue #2193 the cleanup for stale rif counters was added. * But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them. * So as a fix all such cleanup has been moved to syncd. Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
* Fixing issue #11621 * The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue #2193 the cleanup for stale rif counters was added. * But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them. * So as a fix all such cleanup has been moved to syncd. Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
@sumanbrcm this change cannot be cherry-picked cleanly, can you help enter an new PR for 202205 branch? |
* Fixing issue #11621 * The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue #2193 the cleanup for stale rif counters was added. * But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them. * So as a fix all such cleanup has been moved to syncd. Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
* Fixing issue #11621 * The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue #2193 the cleanup for stale rif counters was added. * But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them. * So as a fix all such cleanup has been moved to syncd. Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
@adyeung can you help ping the author? This change cannot be cherry-picked cleanly to 202205 branch. a separate PR is needed. |
Requested submitter @sumanbrcm to prioritize next wk |
Changes for 202205 branch Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
Changes for 202205 branch Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
Changes for 202205 branch Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
Changes for 202205 branch Fixing issue #11621 All the details of the fix in master branch is in the PR : - sairedis: Fixing race condition for rif counters #1136 swss: Fixing race condition for rif counters sonic-swss#2488 This change is already merged to master branch , but sonic-sairedis change (sairedis: Fixing race condition for rif counters #1136) could not be cherry picked to 202205 branch as the base file FlexCounter.cpp differs in both the branches. Also, the API removeDataFromCountersDB is not available in 202205 branch for the cleanup. Hence the current fix takes care of cleaning up the stale rif counters issue through newly added API cleanUpRifFromCounterDb . Unit Tests:- The steps followed are same as in sonic-net/sonic-buildimage#11621 Create RIF in SONiC, wait till RIF rates are populated in COUNTERS DB Remove RIF Repeat the steps multiple times and check if any error syslog is seen (No error syslog is seen) Also checked cleanup for rif counters. After RIF creation derived info of oid for RIF from "COUNTERS_RIF_NAME_MAP" 127) "Vlan100" 128) "oid:0x6000000000aa5" Checked all the tabled in COUNTER_DB which has same OID in keys 127.0.0.1:6379[2]> keys 6000000000aa5 "RATES:oid:0x6000000000aa5:RIF" "COUNTERS:oid:0x6000000000aa5" "RATES:oid:0x6000000000aa5" 127.0.0.1:6379[2]> Deleted the RIF by removing the ip on the intf. Checked COUNTER_DB again with same OID if there are stale entries or not. No stale entries exist now. 127.0.0.1:6379[2]> keys 6000000000aa5 (empty array) 127.0.0.1:6379[2]> Signed-off-by: Suman Kumar suman.kumar@broadcom.com Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
Signed-off-by: Suman Kumar suman.kumar@broadcom.com
All the details are mentioned in :
sonic-net/sonic-swss#2488