Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dhcp_relay] [vlan] [CLI] Add/Del VLAN config should not restart dhcp_relay service #13910

Closed
vivekrnv opened this issue Feb 21, 2023 · 14 comments

Comments

@vivekrnv
Copy link
Contributor

vivekrnv commented Feb 21, 2023

Description

Add/Del a VLAN shouldn't restart the dhcp_relay service

Steps to reproduce the issue:

1. config vlan add 30

Describe the results you received:

Restarting DHCP relay service...
root@sonic:/home/admin#

Describe the results you expected:

Add/Del a vlan should not restart the dhcpv6 relay service for myriad of reasons:

  1. DHCP_RELAY is only enabled on ToRRouter. On other types. the cli will fail
root@sonic:/home/admin# config vlan add 40
Restarting DHCP relay service...
Failed to reset failed state of unit dhcp_relay.service: Unit dhcp_relay.service not loaded.
Usage: config vlan add [OPTIONS] <vid>
Try "config vlan add -h" for help.

Error: Restart service dhcp_relay failed with error 1
  1. Adding VLAN doesn't imply anything for dhcp_relay, only when a dhcp server is to be added or when an ipv4 interface is attached to the VLAN, the service must be restarted.

  2. Even for Delete VLAN, we can't assume a restart is required since isc-dhcp-relay only uses the VLAN's which has an interface configured as the upstream interfaces https://github.com/sonic-net/sonic-buildimage/blob/master/dockers/docker-dhcp-relay/dhcpv4-relay.agents.j2#L17

So, i think further analysis is required to understand how to dynamically restart the dhcp_relay service when there is a change to VLAN table but the current change for sure doesn't solve the problem. Thus i'd suggest fixing the changes made here:

sonic-net/sonic-utilities#2660

Output of show version:

SONiC Software Version: SONiC.202211_RC5.6-005b4e301_Internal
Distribution: Debian 11.6
Kernel: 5.10.0-18-2-amd64
Build commit: 005b4e301
Build date: Thu Feb 16 12:20:00 UTC 2023
Built by: sw-r2d2-bot@r-build-sonic-ci03-241

Platform: x86_64-mlnx_msn2100-r0
HwSKU: ACS-MSN2100
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1752X06330
Model Number: MSN2100-CB2F
Hardware Revision: A1
Uptime: 22:40:40 up 5 days,  2:10,  1 user,  load average: 0.41, 0.40, 0.49
Date: Tue 21 Feb 2023 22:40:40

Docker images:
REPOSITORY                                         TAG                               IMAGE ID       SIZE
docker-platform-monitor                            202211_RC5.6-005b4e301_Internal   9fa84f0d69e4   941MB
docker-platform-monitor                            latest                            9fa84f0d69e4   941MB
docker-syncd-mlnx                                  202211_RC5.6-005b4e301_Internal   13d7f54fa389   936MB
docker-syncd-mlnx                                  latest                            13d7f54fa389   936MB
docker-orchagent                                   202211_RC5.6-005b4e301_Internal   10e4bea591f7   538MB
docker-orchagent                                   latest                            10e4bea591f7   538MB
docker-fpm-frr                                     202211_RC5.6-005b4e301_Internal   22c5eaabdcdc   548MB
docker-fpm-frr                                     latest                            22c5eaabdcdc   548MB
docker-teamd                                       202211_RC5.6-005b4e301_Internal   793b96145060   519MB
docker-teamd                                       latest                            793b96145060   519MB
docker-macsec                                      latest                            13cf01cbba97   521MB
docker-sonic-p4rt                                  202211_RC5.6-005b4e301_Internal   411cbcb5241a   585MB
docker-sonic-p4rt                                  latest                            411cbcb5241a   585MB
docker-nat                                         202211_RC5.6-005b4e301_Internal   4b62de182f4b   489MB
docker-nat                                         latest                            4b62de182f4b   489MB
docker-sflow                                       202211_RC5.6-005b4e301_Internal   cd7df3e0438c   487MB
docker-sflow                                       latest                            cd7df3e0438c   487MB
docker-sonic-telemetry                             202211_RC5.6-005b4e301_Internal   84f45b4152ac   792MB
docker-sonic-telemetry                             latest                            84f45b4152ac   792MB
docker-dhcp-relay                                  latest                            ebb92216d8ba   503MB
docker-snmp                                        202211_RC5.6-005b4e301_Internal   ec167ba9f3dc   540MB
docker-snmp                                        latest                            ec167ba9f3dc   540MB
docker-eventd                                      202211_RC5.6-005b4e301_Internal   e4fcf11a22c1   494MB
docker-eventd                                      latest                            e4fcf11a22c1   494MB
docker-router-advertiser                           202211_RC5.6-005b4e301_Internal   698dea4ca495   494MB
docker-router-advertiser                           latest                            698dea4ca495   494MB
docker-mux                                         202211_RC5.6-005b4e301_Internal   88902826ce1f   542MB
docker-mux                                         latest                            88902826ce1f   542MB
docker-database                                    202211_RC5.6-005b4e301_Internal   c5e6375418a4   494MB
docker-database                                    latest                            c5e6375418a4   494MB
docker-lldp                                        202211_RC5.6-005b4e301_Internal   403f159de975   536MB
docker-lldp                                        latest                            403f159de975   536MB
docker-sonic-mgmt-framework                        202211_RC5.6-005b4e301_Internal   84bec287c46f   612MB
docker-sonic-mgmt-framework                        latest                            84bec287c46f   612MB

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@vivekrnv
Copy link
Contributor Author

@yaqiangz, @jcaiMR , @yxieca PFA

@yaqiangz
Copy link
Contributor

@yaqiangz, @jcaiMR , @yxieca PFA

Hi @vivekrnv. This change is for DHCPv6Relay.

  1. In previous code, add/del a vlan will refresh DHCPv4Relay config in VLAN table, but do nothing to DHCPv6Relay config in DHCP_RELAY table. It would cause the same vlan to be added after the vlan is deleted, and the previous DHCPv6Relay config will still exist, but this is outdated. We think this is somewhat unreasonable. This is why refresh DHCP_RELAY table after adding/deleting a vlan was added.
  2. DHCPv6Relay config is in DHCP_RELAY table and doesn't support dynamic change yet, so after it is changed, we have to restart dhcp_relay.
  3. Now operation to restart dhcp_relay service is in the end of add or del a vlan, restart failed wouldn't affect the process of adding or deleting a vlan. But I think for the issue you found, we can add checks in the code, if the device does not have dhcp_relay service, we will not restart it while adding or deleting a vlan, then this error will not appear.

@vivekrnv
Copy link
Contributor Author

  1. In previous code, add/del a vlan will refresh DHCPv4Relay config in VLAN table, but do nothing to DHCPv6Relay config in DHCP_RELAY table. It would cause the same vlan to be added after the vlan is deleted, and the previous DHCPv6Relay config will still exist, but this is outdated. We think this is somewhat unreasonable. This is why refresh DHCP_RELAY table after adding/deleting a vlan was added.
  2. DHCPv6Relay config is in DHCP_RELAY table and doesn't support dynamic change yet, so after it is changed, we have to restart dhcp_relay.

Even the updated code today doesn't refresh DHCPv6Relay Config,

root@r-leopard-simx-79:/home/admin# config vlan add 40
Restarting DHCP relay service...
root@r-leopard-simx-79:/home/admin# sonic-db-cli CONFIG_DB KEYS DHCP_RELAY\*

root@r-leopard-simx-79:/home/admin# 

this'll the delete the key from redis https://github.com/sonic-net/sonic-utilities/blob/master/config/vlan.py#L50, so there won't be any change for in the config for dhcpv6_relay and restarting the service is not required for v6.

As for v4, this is only relevant when an interface is attached to the VLAN https://github.com/sonic-net/sonic-buildimage/blob/master/dockers/docker-dhcp-relay/dhcpv4-relay.agents.j2#L18, if the VLAN doesn't have an interface attached, then the arguments for the v4 binary will look just the same.

Same applies to VLAN del, if the VLAN netdev has no iface, the delete of this VLAN doesn't affect DHCP relay

So, restarting the service during vlan add/del is not the ideal solution

If we really want to make DHCP RELAY work properly in case of dynamic changes to other tables, We should do the restart/refresh of dhcp_relay during the updates to one of VLAN_INTERFACE, PORTCHANNEL_INTERFACE & INTERFACE tables and and not add/del to VLAN table

@yaqiangz
Copy link
Contributor

yaqiangz commented Feb 23, 2023

  1. In previous code, add/del a vlan will refresh DHCPv4Relay config in VLAN table, but do nothing to DHCPv6Relay config in DHCP_RELAY table. It would cause the same vlan to be added after the vlan is deleted, and the previous DHCPv6Relay config will still exist, but this is outdated. We think this is somewhat unreasonable. This is why refresh DHCP_RELAY table after adding/deleting a vlan was added.
  2. DHCPv6Relay config is in DHCP_RELAY table and doesn't support dynamic change yet, so after it is changed, we have to restart dhcp_relay.

Even the updated code today doesn't refresh DHCPv6Relay Config,

root@r-leopard-simx-79:/home/admin# config vlan add 40
Restarting DHCP relay service...
root@r-leopard-simx-79:/home/admin# sonic-db-cli CONFIG_DB KEYS DHCP_RELAY\*

root@r-leopard-simx-79:/home/admin# 

this'll the delete the key from redis https://github.com/sonic-net/sonic-utilities/blob/master/config/vlan.py#L50, so there won't be any change for in the config for dhcpv6_relay and restarting the service is not required for v6.

As for v4, this is only relevant when an interface is attached to the VLAN https://github.com/sonic-net/sonic-buildimage/blob/master/dockers/docker-dhcp-relay/dhcpv4-relay.agents.j2#L18, if the VLAN doesn't have an interface attached, then the arguments for the v4 binary will look just the same.

Same applies to VLAN del, if the VLAN netdev has no iface, the delete of this VLAN doesn't affect DHCP relay

So, restarting the service during vlan add/del is not the ideal solution

If we really want to make DHCP RELAY work properly in case of dynamic changes to other tables, We should do the restart/refresh of dhcp_relay during the updates to one of VLAN_INTERFACE, PORTCHANNEL_INTERFACE & INTERFACE tables and and not add/del to VLAN table

Hi @vivekrnv, let me clarify:

  1. While deleting vlan, delete config in DHCP_RELAY table is for dhcpv6 relay and it is to make sure once a vlan is deleted, its dhcpv6 relay config would not be in effect and not useless config is left.
  2. While adding vlan, delete config in DHCP_RELAY table is for dhcpv6 relay and it is to make sure that possibly exists dhcpv6 relay config (added by sonic-db-cli or config load or something else) about this vlan can be deleted and not be in effect. So this is why below facts heppened. After adding a vlan, DHCP_RELAY config related to it should be empty.
root@r-leopard-simx-79:/home/admin# config vlan add 40
Restarting DHCP relay service...
root@r-leopard-simx-79:/home/admin# sonic-db-cli CONFIG_DB KEYS DHCP_RELAY\*

root@r-leopard-simx-79:/home/admin# 
  1. According to 1. and 2. This change is not mainly to deal with changes in other tables (VLAN), but to ensure that outdated and incorrect dhcpv6 relay config can be deleted and will no longer work

So, we think maybe keeping restart dhcp_relay while add/del a vlan and adding check to make sure not restart dhcp_relay in not supported device is better.

@vivekrnv
Copy link
Contributor Author

Point 2 will not happen since there is a check already in config command https://github.com/sonic-net/sonic-utilities/blob/master/config/vlan.py#L45

As for point 1, i'd suggest adding a similar check i.e. CLI should fail saying DHCP_RELAY is present.

@yaqiangz
Copy link
Contributor

Point 2 will not happen since there is a check already in config command https://github.com/sonic-net/sonic-utilities/blob/master/config/vlan.py#L45

As for point 1, i'd suggest adding a similar check i.e. CLI should fail saying DHCP_RELAY is present.

@vivekrnv, for adding vlan, agree with you. Because the config has been guaranteed to be removed when vlan del, if it is modified through sonic-db-cli or somewhat else later, it is an unexpected action, vlan add cli should not be responsible for this. But for delete vlan we think it should not be blocked by DHCP_RELAY config, so removing config in DHCP_RELAY table and restart dhcp_relay is necessary (will add some check to avoid restarting dhcp_relay failed). But by our discussion, we can add a parameter for vlan del cli, by default we remove config and restart dhcp_relay, and if the parameter is given, it can do check as what you said. If someone don't want to restart dhcp_relay, can use this parameter.

@vivekrnv
Copy link
Contributor Author

Okay, so the decision it to:

  1. not restart during vlan add
  2. Del the DHCP_RELAY config (if present) and restart the service (if config deleted and feature enabled)

Crct?

@yaqiangz
Copy link
Contributor

Okay, so the decision it to:

  1. not restart during vlan add
  2. Del the DHCP_RELAY config (if present) and restart the service (if config deleted and feature enabled)

Crct?

Point 1 is correct.
For del vlan, our plan is to add a parameter to indicate whether 'not to restart dhcp_relay' (not required and by default is False). If the parameter is False, restart dhcp_relay service (if config deleted and feature enabled). If the parameter is True, check whether related dhcpv6 relay config is exist. If not exist, del vlan without restart dhcp_relay service. If exist, fail with notification that related config is exist.

@vivekrnv
Copy link
Contributor Author

Hmm, Understand. I'm okay with this change

@vivekrnv
Copy link
Contributor Author

vivekrnv commented Mar 8, 2023

@yaqiangz, what is the ETA for fixing this?

@yaqiangz
Copy link
Contributor

yaqiangz commented Mar 8, 2023

@yaqiangz, what is the ETA for fixing this?

@vivekrnv Fix in sonic-utilities repo has been merged. The last step is to advance sonic-utilities submodule in sonic-buildimage repo, would be done soon.

@vivekrnv
Copy link
Contributor Author

vivekrnv commented Mar 8, 2023

Fixed by sonic-net/sonic-utilities#2688

@vivekrnv vivekrnv closed this as completed Mar 8, 2023
@vivekrnv
Copy link
Contributor Author

vivekrnv commented Mar 8, 2023

@yaqiangz, what is the ETA for fixing this?

@vivekrnv Fix in sonic-utilities repo has been merged. The last step is to advance sonic-utilities submodule in sonic-buildimage repo, would be done soon.

Thanks

@vivekrnv
Copy link
Contributor Author

vivekrnv commented Mar 8, 2023

@yaqiangz, can you also update the HLD with the new config/show commands. I believe the changes are present in all the branches 202012+, correct?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants