Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Xcvrd changes to support 400G ZR configuration #270

Merged
merged 4 commits into from
Aug 2, 2022
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 28 additions & 0 deletions sonic-xcvrd/tests/test_xcvrd.py
Original file line number Diff line number Diff line change
Expand Up @@ -443,6 +443,27 @@ def test_CmisManagerTask_handle_port_change_event(self):
task.on_port_update_event(port_change_event)
assert len(task.port_dict) == 1


@patch('xcvrd.xcvrd.XcvrTableHelper')
def test_CmisManagerTask_get_configured_freq(self, mock_table_helper):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add failure case? For instance, 1. when config_freq is out of allowed freq range. 2. when config_freq does not fall on 75GHz grid.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API tests get_configured_laser_freq() which obtains the configured value from the PORT table of CONFIG_DB. Invalid frequency is validated here sonic-net/sonic-utilities#2197

Copy link
Contributor

@qinchuanares qinchuanares Jul 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I looking at the correct place https://github.com/prgeor/sonic-utilities/blob/ba58d985b2483d24fb68821f11c6507dd08de567/scripts/portconfig#L337? seems like it's only validating the case when freq is <= 0? Does it verify the provisioned frequency is within the min and max advertised supported freq values?

port_mapping = PortMapping()
task = CmisManagerTask(DEFAULT_NAMESPACE, port_mapping)
cfg_port_tbl = MagicMock()
cfg_port_tbl.get = MagicMock(return_value=(True, (('laser_freq', 193100),)))
mock_table_helper.get_cfg_port_tbl = MagicMock(return_value=cfg_port_tbl)
task.xcvr_table_helper.get_cfg_port_tbl = mock_table_helper.get_cfg_port_tbl
assert task.get_configured_laser_freq('Ethernet0') == 193100

@patch('xcvrd.xcvrd.XcvrTableHelper')
def test_CmisManagerTask_get_configured_tx_power(self, mock_table_helper):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly can we add failure case? 1. when configured tx power is out of range.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API simply get the configured tx_power from the CONFIG_DB. The user input for TX power is validated here sonic-net/sonic-utilities#2197. Please do note, tx_power can be configured even if transceiver is NOT plugged in, so we cannot validate tx_power in that case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right place that it validates tx power https://github.com/prgeor/sonic-utilities/blob/ba58d985b2483d24fb68821f11c6507dd08de567/scripts/portconfig#L332? Does it mean it only allows one digit after decimal point? Does it verify the configured tx power is within the min and max advertised supported tx power values?

port_mapping = PortMapping()
task = CmisManagerTask(DEFAULT_NAMESPACE, port_mapping)
cfg_port_tbl = MagicMock()
cfg_port_tbl.get = MagicMock(return_value=(True, (('tx_power', -10),)))
mock_table_helper.get_cfg_port_tbl = MagicMock(return_value=cfg_port_tbl)
task.xcvr_table_helper.get_cfg_port_tbl = mock_table_helper.get_cfg_port_tbl
assert task.get_configured_tx_power('Ethernet0') == -10

@patch('xcvrd.xcvrd.platform_chassis')
@patch('xcvrd.xcvrd_utilities.port_mapping.subscribe_port_update_event', MagicMock(return_value=(None, None)))
@patch('xcvrd.xcvrd_utilities.port_mapping.handle_port_update_event', MagicMock())
Expand All @@ -469,6 +490,9 @@ def test_CmisManagerTask_task_worker(self, mock_chassis):
mock_xcvr_api.set_lpmode = MagicMock(return_value=True)
mock_xcvr_api.set_application = MagicMock(return_value=True)
mock_xcvr_api.is_flat_memory = MagicMock(return_value=False)
mock_xcvr_api.is_coherent_module = MagicMock(return_value=True)
mock_xcvr_api.get_tx_config_power = MagicMock(return_value=0)
mock_xcvr_api.get_laser_config_freq = MagicMock(return_value=0)
mock_xcvr_api.get_module_type_abbreviation = MagicMock(return_value='QSFP-DD')
mock_xcvr_api.get_application_advertisement = MagicMock(return_value={
1: {
Expand Down Expand Up @@ -551,6 +575,10 @@ def test_CmisManagerTask_task_worker(self, mock_chassis):

task.get_host_tx_status = MagicMock(return_value='true')
task.get_port_admin_status = MagicMock(return_value='up')
task.get_configured_tx_power = MagicMock(return_value=-13)
task.get_configured_laser_freq = MagicMock(return_value=193100)
task.configure_tx_output_power = MagicMock(return_value=1)
task.configure_laser_frequency = MagicMock(return_value=1)

# Case 1: Module Inserted --> DP_DEINIT
task.task_stopping_event.is_set = MagicMock(side_effect=[False, False, True])
Expand Down
96 changes: 90 additions & 6 deletions sonic-xcvrd/xcvrd/xcvrd.py
Original file line number Diff line number Diff line change
Expand Up @@ -999,6 +999,11 @@ def on_port_update_event(self, port_change_event):
# We dont have better way to check if 'admin_status' is from APPL_DB or STATE_DB so this
# check is put temporarily to listen only to APPL_DB's admin_status and ignore that of STATE_DB
self.port_dict[lport]['admin_status'] = port_change_event.port_dict['admin_status']
if 'laser_freq' in port_change_event.port_dict:
self.port_dict[lport]['laser_freq'] = int(port_change_event.port_dict['laser_freq'])
if 'tx_power' in port_change_event.port_dict:
self.port_dict[lport]['tx_power'] = float(port_change_event.port_dict['tx_power'])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With CMIS state reset to INSERTED upon any event for the port, the coherent module will be read continuously for Tx power and frequency.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

port_dict[] is updated by reading the configuration values from the DB, not by reading the module

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not about port_dict[].
When the CMIS state is reset, the state changes to INSERTED. Once the CMIS state is INSERTED, the below piece of code executed continuously in the CMIS mgr task worker. api.get_tx_config_power() and api.get_laser_config_freq() query the device with that.

Configure the target output power if ZR module

                    if api.is_coherent_module():
                       tx_power = self.port_dict[lport]['tx_power']
                       # Prevent configuring same tx power multiple times
                       if 0 != tx_power and tx_power != api.get_tx_config_power():

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed offline, this is not specific to tx_power or laser_freq. Currently don't have a subscription mechanism to listen only to the interested fields. This is the git issue to improve on this #259

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree and hopefully #259 converges soon.
Note: currently there is no overhead due to this issue. But with the this PR changes the optical module will be read for both freq and tx-power register offsets in continuous fashion.

self.force_cmis_reinit(lport, 0)
else:
self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_REMOVED
Expand Down Expand Up @@ -1121,7 +1126,6 @@ def is_cmis_application_update_required(self, api, channel, speed):
skip = False
break
return (not skip)

return True

def force_cmis_reinit(self, lport, retries=0):
Expand Down Expand Up @@ -1203,6 +1207,32 @@ def check_datapath_state(self, api, channel, states):

return done

def get_configured_laser_freq(self, lport):
"""
Return the Tx power configured by user in CONFIG_DB's PORT table
"""
freq = 0
asic_index = self.port_mapping.get_asic_id_for_logical_port(lport)
port_tbl = self.xcvr_table_helper.get_cfg_port_tbl(asic_index)

found, port_info = port_tbl.get(lport)
if found and 'laser_freq' in dict(port_info):
freq = dict(port_info)['laser_freq']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if found is True but no frequency is configured in port_info? It's likely to happen when a ZR module is plugged but freq config is missing. Do we see the port to be err-disabled?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the port_info{} dictionary represents the PORT table in CONFIG_DB. If laser frequency is NOT configured, then we move on with the module's default frequency.

Copy link
Contributor

@qinchuanares qinchuanares Jul 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

freq init value == 0 in this function. Does it mean when we see freq == 0, it will set module frequency to the default frequency? Can we test this feature with a ZR module by erasing the port's frequency config and see how the module will be configured.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes module comes with default frequency if not configured

return int(freq)

def get_configured_tx_power(self, lport):
"""
Return the Tx power configured by user in CONFIG_DB's PORT table
"""
power = 0
asic_index = self.port_mapping.get_asic_id_for_logical_port(lport)
port_tbl = self.xcvr_table_helper.get_cfg_port_tbl(asic_index)

found, port_info = port_tbl.get(lport)
if found and 'tx_power' in dict(port_info):
power = dict(port_info)['tx_power']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly. What if found is True but no tx power is configured on the port. Should we enable a default power setting?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default is the module's default. This API gets the value explicitly configured by the user

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the previous comment, Can we test this feature with a ZR module by erasing the port's tx power config and see how the module will be configured.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If tx power is not configured, module comes with module's default Tx power.

return float(power)

def get_host_tx_status(self, lport):
host_tx_ready = 'false'

Expand All @@ -1226,11 +1256,31 @@ def get_port_admin_status(self, lport):
admin_status = dict(port_info)['admin_status']
return admin_status

def configure_tx_output_power(self, api, lport, tx_power):
min_p, max_p = api.get_supported_power_config()
if tx_power < min_p:
self.log_error("{} configured tx power {} < minimum power {} supported".format(lport, tx_power, min_p))
if tx_power > max_p:
self.log_error("{} configured tx power {} > maximum power {} supported".format(lport, tx_power, max_p))
return api.set_tx_power(tx_power)

def configure_laser_frequency(self, api, lport, freq):
_, _, _, lowf, highf = api.get_supported_freq_config()
if freq < lowf:
self.log_error("{} configured freq:{} GHz is lower than the supported freq:{} GHz".format(lport, freq, lowf))
if freq > highf:
self.log_error("{} configured freq:{} GHz is higher than the supported freq:{} GHz".format(lport, freq, highf))
chan = int(round((freq - 193100)/25))
if chan % 3 != 0:
self.log_error("{} configured freq:{} GHz is NOT in 75GHz grid".format(lport, freq))
if api.get_tuning_in_progress():
self.log_error("{} Tuning in progress, channel selection may fail!".format(lport))
return api.set_laser_freq(freq)

def task_worker(self):
self.xcvr_table_helper = XcvrTableHelper(self.namespaces)

self.log_notice("Starting...")
print("Starting")

# APPL_DB for CONFIG updates, and STATE_DB for insertion/removal
sel, asic_context = port_mapping.subscribe_port_update_event(self.namespaces)
Expand Down Expand Up @@ -1309,9 +1359,16 @@ def task_worker(self):
if (type is None) or (type not in self.CMIS_MODULE_TYPES):
self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_READY
continue

if api.is_coherent_module():
if 'tx_power' not in self.port_dict[lport]:
self.port_dict[lport]['tx_power'] = self.get_configured_tx_power(lport)
if 'laser_freq' not in self.port_dict[lport]:
self.port_dict[lport]['laser_freq'] = self.get_configured_laser_freq(lport)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think get_configured_tx_power_from_dict() naming is appropriate as it reading from dict and not from the device. similar for get_configured_laser_freq().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaganbal-a the configured frequency is read from the DB. see comment inside get_configured_tx_power()

Copy link

@jaganbal-a jaganbal-a Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the get_configured_tx_power() read from DB, but the naming sounds like it read from the device.
So I would suggest the naming get_configured_tx_power_from_dict()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaganbal-a please check latest changes

except AttributeError:
# Skip if these essential routines are not available
self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_READY
assert 1 == 0
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this

continue

# CMIS expiration and retries
Expand Down Expand Up @@ -1339,20 +1396,38 @@ def task_worker(self):
api.tx_disable_channel(host_lanes, True)
self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_READY
continue
# Configure the target output power if ZR module
if api.is_coherent_module():
tx_power = self.port_dict[lport]['tx_power']
# Prevent configuring same tx power multiple times
if 0 != tx_power and tx_power != api.get_tx_config_power():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does it mean when tx_power == 0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it means no 'tx_power' missing in the PORT table of CONFIG_DB. See get_configured_tx_power()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check helps preventing reprogramming during xcvrd restart?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check helps preventing reprogramming during xcvrd restart?

@jaganbal-a YES

if 1 != self.configure_tx_output_power(api, lport, tx_power):
self.log_error("{} failed to configure Tx power = {}".format(lport, tx_power))
else:
self.log_notice("{} Successfully configured Tx power = {}".format(lport, tx_power))

appl = self.get_cmis_application_desired(api, host_lanes, host_speed)
if appl < 1:
self.log_error("{}: no suitable app for the port".format(lport))
self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_FAILED
continue

has_update = self.is_cmis_application_update_required(api, host_lanes, host_speed)
if not has_update:
need_update = self.is_cmis_application_update_required(api, host_lanes, host_speed)

# For ZR module, Datapath needes to be re-initlialized on new channel selection
if api.is_coherent_module():
freq = self.port_dict[lport]['laser_freq']
# If user requested frequency is NOT the same as configured on the module
# force datapath re-initialization
if 0 != freq and freq != api.get_laser_config_freq():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does it mean when freq == 0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means 'laser_freq' is missing in PORT table of CONFIG_DB. see get_configured_laser_freq()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. It means the no laser_freq is configured in config_db.

need_update = True

if not need_update:
# No application updates
self.log_notice("{}: no CMIS application update required...READY".format(lport))
self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_READY
continue

self.log_notice("{}: force Datapath reinit".format(lport))
self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_DP_DEINIT
elif state == self.CMIS_STATE_DP_DEINIT:
# D.2.2 Software Deinitialization
Expand Down Expand Up @@ -1382,6 +1457,15 @@ def task_worker(self):
self.force_cmis_reinit(lport, retries + 1)
continue

if api.is_coherent_module():
# For ZR module, configure the laser frequency when Datapath is in Deactivated state
freq = self.port_dict[lport]['laser_freq']
if 0 != freq:
if 1 != self.configure_laser_frequency(api, lport, freq):
self.log_error("{} failed to configure laser frequency {} GHz".format(lport, freq))
else:
self.log_notice("{} configured laser frequency {} GHz".format(lport, freq))

# D.1.3 Software Configuration and Initialization
appl = self.get_cmis_application_desired(api, host_lanes, host_speed)
if appl < 1:
Expand Down Expand Up @@ -2080,7 +2164,7 @@ def init(self):
if multi_asic.is_multi_asic():
# Load the namespace details first from the database_global.json file.
swsscommon.SonicDBConfig.initializeGlobalConfig()
# To prevent race condition in get_all_namespaces() we cache the namespaces before
# To prevent race condition in get_all_namespaces() we cache the namespaces before
# creating any worker threads
self.namespaces = multi_asic.get_front_end_namespaces()

Expand Down