
Commit 4bc6659

Committed Feb 6, 2025
Optimize counter polling interval by making it more accurate
**What I did**

Optimize counter-polling performance in terms of polling-interval accuracy.

1. Enable bulk counter polling to run at a smaller chunk size.

There is one counter-polling thread for each counter group. All such threads compete for critical sections at the vendor SAI level, so a counter-polling thread may have to wait for a critical section that another thread currently holds, which adds latency for the waiting counter group. An example is the competition between the PFC watchdog and the port counter groups. The port counter group contains many counters and is polled in bulk mode, which takes a relatively long time. The PFC watchdog counter group contains only a few counters and is polled quickly. Sometimes the PFC watchdog counters must wait before polling, which makes their polling interval inaccurate and prevents a PFC storm from being detected in time.

To resolve this issue, we reduce the chunk size of the port counter group. By default, the port counter group polls the counters of all ports in a single bulk operation. With a smaller chunk size, it polls the counters in several bulk operations, each covering a subset of all ports whose size equals the `chunk size`. Furthermore, we support setting the chunk size on a per-counter-ID basis. As a result, the port counter group stays in the critical section for a shorter time, and the PFC watchdog is more likely to be scheduled in time to poll its counters and detect a PFC storm.

2. Collect the timestamp immediately after the vendor SAI API returns.

Currently, many counter groups require a Lua plugin that runs on each polling interval to calculate rates, detect certain events, etc., e.g. the PFC watchdog counter group detecting a PFC storm. In this case, the effective polling interval is calculated from the difference between the timestamps of the current and the last poll, to avoid deviation caused by scheduling latency. However, the timestamp is currently collected in the Lua plugin, several steps after the SAI API returns and in a different context (redis-server). Both introduce even larger deviations. To overcome this, we collect the timestamp immediately after the SAI API returns.

Depends on:
1. sonic-net/sonic-swss-common#950
2. sonic-net/sonic-sairedis#1519

**Why I did it**

**How I verified it**

Run regression tests and observe counter-polling performance. A comparison test shows very good results when any or all of the above optimizations are applied.
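For reference, a minimal sketch of how the new knobs could be exercised during such a comparison. The field names `BULK_CHUNK_SIZE` and `BULK_CHUNK_SIZE_PER_PREFIX` and the example values are taken from the unit test below; the `FLEX_COUNTER_TABLE|PORT` CONFIG_DB key and the use of `redis-cli` are assumptions for illustration only (this commit adds no CLI support):

```shell
# Illustrative only: ask the PORT counter group to poll in chunks of 64 ports,
# and override the chunk size per counter-ID prefix ("<prefix>:<size>" pairs separated by ';').
redis-cli -n 4 HSET "FLEX_COUNTER_TABLE|PORT" \
    BULK_CHUNK_SIZE 64 \
    BULK_CHUNK_SIZE_PER_PREFIX "SAI_PORT_STAT_IF_OUT_QLEN:0;SAI_PORT_STAT_IF_IN_FEC:32"
```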
**Details if related**

For item 2, each counter group contains more than one counter context, keyed by (group, object type). However, counters fetched by different counter groups are pushed into the same entry for the same object. E.g. the PFC_WD group contains counters of ports and queues, the PORT group contains counters of ports, and the QUEUE_STAT group contains counters of queues; both the PFC_WD and PORT groups push counter data into the entry representing a port. Each counter group has its own polling interval, which means counter IDs polled by different counter groups can carry different timestamps. We therefore use the name of a counter group to identify its timestamp. E.g. in a port counter entry, `PORT_timestamp` represents the last time the port counter group polled the counters, and `PFC_WD_timestamp` represents the last time the PFC watchdog counter group polled the counters.
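As a sketch of how those per-group timestamps can be inspected, one can read the port entry in COUNTERS_DB directly. The OID below is a placeholder, and note that the Mellanox detection plugin in this commit reads the fields as `PFC_WD_time_stamp` / `PFC_WD_time_stamp_last`:

```shell
# Hypothetical port OID; look up a real one in COUNTERS_PORT_NAME_MAP first.
redis-cli -n 2 HMGET "COUNTERS:oid:0x1000000000002" \
    PFC_WD_time_stamp PFC_WD_time_stamp_last SAI_PORT_STAT_PFC_3_RX_PAUSE_DURATION_US
```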
1 parent 822310d · commit 4bc6659

File tree

6 files changed, +137 −4 lines changed

 

‎orchagent/flexcounterorch.cpp

+23
```diff
@@ -118,6 +118,8 @@ void FlexCounterOrch::doTask(Consumer &consumer)
         {
             auto itDelay = std::find(std::begin(data), std::end(data), FieldValueTuple(FLEX_COUNTER_DELAY_STATUS_FIELD, "true"));
             string poll_interval;
+            string bulk_chunk_size;
+            string bulk_chunk_size_per_counter;

             if (itDelay != data.end())
             {
@@ -141,6 +143,14 @@ void FlexCounterOrch::doTask(Consumer &consumer)
                        }
                    }
                }
+                else if (field == BULK_CHUNK_SIZE_FIELD)
+                {
+                    bulk_chunk_size = value;
+                }
+                else if (field == BULK_CHUNK_SIZE_PER_PREFIX_FIELD)
+                {
+                    bulk_chunk_size_per_counter = value;
+                }
                else if(field == FLEX_COUNTER_STATUS_FIELD)
                {
                    // Currently, the counters are disabled for polling by default
@@ -256,6 +266,19 @@ void FlexCounterOrch::doTask(Consumer &consumer)
                    SWSS_LOG_NOTICE("Unsupported field %s", field.c_str());
                }
            }
+
+            if (!bulk_chunk_size.empty() || !bulk_chunk_size_per_counter.empty())
+            {
+                m_groupsWithBulkChunkSize.insert(key);
+                setFlexCounterGroupBulkChunkSize(flexCounterGroupMap[key],
+                                                 bulk_chunk_size.empty() ? "NULL" : bulk_chunk_size,
+                                                 bulk_chunk_size_per_counter.empty() ? "NULL" : bulk_chunk_size_per_counter);
+            }
+            else if (m_groupsWithBulkChunkSize.find(key) != m_groupsWithBulkChunkSize.end())
+            {
+                setFlexCounterGroupBulkChunkSize(flexCounterGroupMap[key], "NULL", "NULL");
+                m_groupsWithBulkChunkSize.erase(key);
+            }
        }

        consumer.m_toSync.erase(it++);
```

‎orchagent/flexcounterorch.h

+1
```diff
@@ -67,6 +67,7 @@ class FlexCounterOrch: public Orch
         Table m_bufferQueueConfigTable;
         Table m_bufferPgConfigTable;
         Table m_deviceMetadataConfigTable;
+        std::unordered_set<std::string> m_groupsWithBulkChunkSize;
 };

 #endif
```

‎orchagent/pfc_detect_mellanox.lua

File mode changed: 100644 → 100755

+40 −4

```diff
@@ -18,13 +18,20 @@ local timestamp_struct = redis.call('TIME')
 local timestamp_current = timestamp_struct[1] + timestamp_struct[2] / 1000000
 local timestamp_string = tostring(timestamp_current)
 redis.call('HSET', 'TIMESTAMP', 'pfcwd_poll_timestamp_last', timestamp_string)
-local effective_poll_time = poll_time
-local effective_poll_time_lasttime = redis.call('HGET', 'TIMESTAMP', 'effective_pfcwd_poll_time_last')
+local global_effective_poll_time = poll_time
+local global_effective_poll_time_lasttime = redis.call('HGET', 'TIMESTAMP', 'effective_pfcwd_poll_time_last')
 if timestamp_last ~= false then
-    effective_poll_time = (timestamp_current - tonumber(timestamp_last)) * 1000000
-    redis.call('HSET', 'TIMESTAMP', 'effective_pfcwd_poll_time_last', effective_poll_time)
+    global_effective_poll_time = (timestamp_current - tonumber(timestamp_last)) * 1000000
+    redis.call('HSET', 'TIMESTAMP', 'effective_pfcwd_poll_time_last', global_effective_poll_time)
 end

+local effective_poll_time
+local effective_poll_time_lasttime
+local port_timestamp_last_cache = {}
+
+local debug_storm_global = redis.call('HGET', 'DEBUG_STORM', 'enabled') == 'true'
+local debug_storm_threshold = tonumber(redis.call('HGET', 'DEBUG_STORM', 'threshold'))
+
 -- Iterate through each queue
 local n = table.getn(KEYS)
 for i = n, 1, -1 do
@@ -56,12 +63,37 @@
     local pfc_rx_pkt_key = 'SAI_PORT_STAT_PFC_' .. queue_index .. '_RX_PKTS'
     local pfc_duration_key = 'SAI_PORT_STAT_PFC_' .. queue_index .. '_RX_PAUSE_DURATION_US'

+    -- Get port specific timestamp
+    local port_timestamp_current = tonumber(redis.call('HGET', counters_table_name .. ':' .. port_id, 'PFC_WD_time_stamp'))
+    if port_timestamp_current ~= nil then
+        local port_timestamp_lasttime = port_timestamp_last_cache[port_id]
+        if port_timestamp_lasttime == nil then
+            port_timestamp_lasttime = tonumber(redis.call('HGET', counters_table_name .. ':' .. port_id, 'PFC_WD_time_stamp_last'))
+            port_timestamp_last_cache[port_id] = port_timestamp_lasttime
+            redis.call('HSET', counters_table_name .. ':' .. port_id, 'PFC_WD_time_stamp_last', port_timestamp_current)
+        end
+
+        if port_timestamp_lasttime ~= nil then
+            effective_poll_time = (port_timestamp_current - port_timestamp_lasttime) / 1000
+        else
+            effective_poll_time = global_effective_poll_time
+        end
+        effective_poll_time_lasttime = false
+    else
+        effective_poll_time = global_effective_poll_time
+        effective_poll_time_lasttime = global_effective_poll_time_lasttime
+    end
+
     -- Get all counters
     local occupancy_bytes = redis.call('HGET', counters_table_name .. ':' .. KEYS[i], 'SAI_QUEUE_STAT_CURR_OCCUPANCY_BYTES')
     local packets = redis.call('HGET', counters_table_name .. ':' .. KEYS[i], 'SAI_QUEUE_STAT_PACKETS')
     local pfc_rx_packets = redis.call('HGET', counters_table_name .. ':' .. port_id, pfc_rx_pkt_key)
     local pfc_duration = redis.call('HGET', counters_table_name .. ':' .. port_id, pfc_duration_key)

+    if debug_storm_global then
+        redis.call('PUBLISH', 'PFC_WD_DEBUG', 'Port ID ' .. port_id .. ' Queue index ' .. queue_index .. ' occupancy ' .. occupancy_bytes .. ' packets ' .. packets .. ' pfc rx ' .. pfc_rx_packets .. ' pfc duration ' .. pfc_duration .. ' effective poll time ' .. tostring(effective_poll_time) .. '(global ' .. tostring(global_effective_poll_time) .. ')')
+    end
+
     if occupancy_bytes and packets and pfc_rx_packets and pfc_duration then
         occupancy_bytes = tonumber(occupancy_bytes)
         packets = tonumber(packets)
@@ -82,6 +114,10 @@
         pfc_duration_last = tonumber(pfc_duration_last)
         local storm_condition = (pfc_duration - pfc_duration_last) > (effective_poll_time * 0.99)

+        if debug_storm_threshold ~= nil and (pfc_duration - pfc_duration_last) > (effective_poll_time * debug_storm_threshold / 100) then
+            redis.call('PUBLISH', 'PFC_WD_DEBUG', 'Port ID ' .. port_id .. ' Queue index ' .. queue_index .. ' occupancy ' .. occupancy_bytes .. ' packets ' .. packets .. ' pfc rx ' .. pfc_rx_packets .. ' pfc duration ' .. pfc_duration .. ' effective poll time ' .. tostring(effective_poll_time) .. ', triggered by threshold ' .. debug_storm_threshold .. '%')
+        end
+
         -- Check actual condition of queue being in PFC storm
         if (occupancy_bytes > 0 and packets - packets_last == 0 and storm_condition) or
             -- DEBUG CODE START. Uncomment to enable
```

‎orchagent/saihelper.cpp

+23
```diff
@@ -851,6 +851,8 @@ static inline void initSaiRedisCounterEmptyParameter(sai_redis_flex_counter_grou
     initSaiRedisCounterEmptyParameter(flex_counter_group_param.stats_mode);
     initSaiRedisCounterEmptyParameter(flex_counter_group_param.plugin_name);
     initSaiRedisCounterEmptyParameter(flex_counter_group_param.plugins);
+    initSaiRedisCounterEmptyParameter(flex_counter_group_param.bulk_chunk_size);
+    initSaiRedisCounterEmptyParameter(flex_counter_group_param.bulk_chunk_size_per_prefix);
 }

 static inline void initSaiRedisCounterParameterFromString(sai_s8_list_t &sai_s8_list, const std::string &str)
@@ -935,6 +937,8 @@ void setFlexCounterGroupParameter(const string &group,
     attr.id = SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP;
     attr.value.ptr = &flex_counter_group_param;

+    initSaiRedisCounterEmptyParameter(flex_counter_group_param.bulk_chunk_size);
+    initSaiRedisCounterEmptyParameter(flex_counter_group_param.bulk_chunk_size_per_prefix);
     initSaiRedisCounterParameterFromString(flex_counter_group_param.counter_group_name, group);
     initSaiRedisCounterParameterFromString(flex_counter_group_param.poll_interval, poll_interval);
     initSaiRedisCounterParameterFromString(flex_counter_group_param.operation, operation);
@@ -1014,6 +1018,25 @@ void setFlexCounterGroupStatsMode(const std::string &group,
     notifySyncdCounterOperation(is_gearbox, attr);
 }

+void setFlexCounterGroupBulkChunkSize(const std::string &group,
+                                      const std::string &bulk_chunk_size,
+                                      const std::string &bulk_chunk_size_per_prefix,
+                                      bool is_gearbox)
+{
+    sai_attribute_t attr;
+    sai_redis_flex_counter_group_parameter_t flex_counter_group_param;
+
+    attr.id = SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP;
+    attr.value.ptr = &flex_counter_group_param;
+
+    initSaiRedisCounterEmptyParameter(flex_counter_group_param);
+    initSaiRedisCounterParameterFromString(flex_counter_group_param.counter_group_name, group);
+    initSaiRedisCounterParameterFromString(flex_counter_group_param.bulk_chunk_size, bulk_chunk_size);
+    initSaiRedisCounterParameterFromString(flex_counter_group_param.bulk_chunk_size_per_prefix, bulk_chunk_size_per_prefix);
+
+    notifySyncdCounterOperation(is_gearbox, attr);
+}
+
 void delFlexCounterGroup(const std::string &group,
                          bool is_gearbox)
 {
```

‎orchagent/saihelper.h

+5
```diff
@@ -39,6 +39,11 @@ void setFlexCounterGroupStatsMode(const std::string &group,
                                   const std::string &stats_mode,
                                   bool is_gearbox=false);

+void setFlexCounterGroupBulkChunkSize(const std::string &group,
+                                      const std::string &bulk_size,
+                                      const std::string &bulk_chunk_size_per_prefix,
+                                      bool is_gearbox=false);
+
 void delFlexCounterGroup(const std::string &group,
                          bool is_gearbox=false);

```

‎tests/mock_tests/flexcounter_ut.cpp

+45
```diff
@@ -111,6 +111,10 @@ namespace flexcounter_test
        }
        else
        {
+            if (flexCounterGroupParam->bulk_chunk_size.list != nullptr || flexCounterGroupParam->bulk_chunk_size_per_prefix.list != nullptr)
+            {
+                return SAI_STATUS_SUCCESS;
+            }
            mockFlexCounterGroupTable->del(key);
        }

@@ -824,6 +828,47 @@
        consumer->addToSync(entries);
        entries.clear();
        static_cast<Orch *>(gBufferOrch)->doTask();
+
+        if (!gTraditionalFlexCounter)
+        {
+            // Verify bulk chunk size fields which can be verified in any combination of parameters.
+            // We verify it here just for convenience.
+            consumer = dynamic_cast<Consumer *>(flexCounterOrch->getExecutor(CFG_FLEX_COUNTER_TABLE_NAME));
+
+            entries.push_back({"PORT", "SET", {
+                {"FLEX_COUNTER_STATUS", "enable"},
+                {"BULK_CHUNK_SIZE", "64"}
+            }});
+            consumer->addToSync(entries);
+            entries.clear();
+            static_cast<Orch *>(flexCounterOrch)->doTask();
+            ASSERT_TRUE(flexCounterOrch->m_groupsWithBulkChunkSize.find("PORT") != flexCounterOrch->m_groupsWithBulkChunkSize.end());
+
+            entries.push_back({"PORT", "SET", {
+                {"FLEX_COUNTER_STATUS", "enable"}
+            }});
+            consumer->addToSync(entries);
+            entries.clear();
+            static_cast<Orch *>(flexCounterOrch)->doTask();
+            ASSERT_EQ(flexCounterOrch->m_groupsWithBulkChunkSize.find("PORT"), flexCounterOrch->m_groupsWithBulkChunkSize.end());
+
+            entries.push_back({"PORT", "SET", {
+                {"FLEX_COUNTER_STATUS", "enable"},
+                {"BULK_CHUNK_SIZE_PER_PREFIX", "SAI_PORT_STAT_IF_OUT_QLEN:0;SAI_PORT_STAT_IF_IN_FEC:32"}
+            }});
+            consumer->addToSync(entries);
+            entries.clear();
+            static_cast<Orch *>(flexCounterOrch)->doTask();
+            ASSERT_TRUE(flexCounterOrch->m_groupsWithBulkChunkSize.find("PORT") != flexCounterOrch->m_groupsWithBulkChunkSize.end());
+
+            entries.push_back({"PORT", "SET", {
+                {"FLEX_COUNTER_STATUS", "enable"}
+            }});
+            consumer->addToSync(entries);
+            entries.clear();
+            static_cast<Orch *>(flexCounterOrch)->doTask();
+            ASSERT_EQ(flexCounterOrch->m_groupsWithBulkChunkSize.find("PORT"), flexCounterOrch->m_groupsWithBulkChunkSize.end());
+        }
    }

    // Remove buffer pools
```
