Make swssconfig status FATAL when it fails #1009

Merged
merged 3 commits into from
Oct 4, 2017

qiluo-msft
Collaborator

@qiluo-msft qiluo-msft commented Oct 3, 2017

Make the supervisor-controlled one-shot program autorestart 0 times, so that its status becomes FATAL instead of EXITED if it fails. 'supervisorctl status' then differentiates between a normal exit and a failure.

This will help with diagnosis on a running switch.
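
As a rough illustration of how the FATAL/EXITED distinction could be consumed by a script, here is a minimal sketch. The helper names `program_state` and `is_healthy` are hypothetical, and the two sample lines are written by hand to resemble `supervisorctl status` output (name, state, description); they are not captured from a switch.

```shell
#!/bin/sh
# Hypothetical helper: print the STATE column from one line of
# `supervisorctl status` output (columns: NAME, STATE, DESCRIPTION).
program_state() {
    printf '%s\n' "$1" | awk '{ print $2 }'
}

# Hypothetical helper: succeed unless the program ended in the FATAL state.
is_healthy() {
    [ "$(program_state "$1")" != "FATAL" ]
}

# Sample lines written by hand for illustration, not captured from a switch.
ok_line='swssconfig                       EXITED    Oct 04 06:25 AM'
bad_line='swssconfig                       FATAL     Exited too quickly (process log may have details)'

if is_healthy "$ok_line"; then echo "ok_line: healthy"; else echo "ok_line: failed"; fi
if is_healthy "$bad_line"; then echo "bad_line: healthy"; else echo "bad_line: failed"; fi
```

On a real system the argument would presumably come from something like `"$(supervisorctl status swssconfig)"`, run wherever that supervisor instance lives.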

Signed-off-by: Qi Luo qiluo-msft@users.noreply.github.com

- What I did
The status of the supervisor-controlled one-shot program becomes FATAL instead of EXITED if it fails.
- How I did it
Made the supervisor-controlled one-shot program autorestart 0 times.
- How to verify it
Tested on a lab switch.
- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

…e status will become FATAL instead of EXITED if failure happens

Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
@@ -46,7 +46,7 @@ elif [ "$HWSKU" == "Force10-S6000-Q32" ]; then
 elif [ "$HWSKU" == "Arista-7050-QX32" ]; then
     SWSSCONFIG_ARGS+="td2.32ports.buffers.json td2.32ports.qos.json "
 elif [[ "$HWSKU" == "ACS-MSN27"* ]]; then
-    sonic-cfggen -m /etc/sonic/minigraph.xml -t /usr/share/sonic/templates/msn27xx.32ports.buffers.json.j2 > /etc/swss/config.d/msn27xx.32ports.buffers.json
+    sonic-cfggen -m /etc/sonic/minigraph.xml -t /usr/share/sonic/templates/msn27xx.32ports.buffers.json.j2 > /etc/swss/config.d/msn27xx.32ports.buffers.json || exit 1
Contributor

What about using set -e for this script?

Collaborator Author

swssconfig.sh could be improved in the future. Currently the script does not assume that every command in it will succeed, except for obvious lines such as the sonic-cfggen call.

Contributor

I don't understand why the script can't assume that every command will succeed. Right now it runs the sonic-cfggen and swssconfig commands, and I think both of them need to succeed.
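
The trade-off in this thread can be sketched roughly as follows. This is an illustration only: `true` and `false` stand in for real commands such as sonic-cfggen, and `exit_status_of` is a hypothetical helper that runs a snippet in a child shell and prints its exit status.

```shell
#!/bin/sh
# Hypothetical helper: run a snippet in a child shell and print its exit status.
exit_status_of() {
    sh -c "$1" >/dev/null 2>&1 && echo 0 || echo $?
}

# Guarded command fails: the script stops right there with status 1.
guard_hit='false || exit 1; echo "not reached"'

# Guarded command succeeds, a later unguarded command fails: the failure is
# ignored and the script's status is that of its last command.
guard_miss='true || exit 1; false; true'

# set -e: the first failing command anywhere aborts the script.
errexit_all='set -e; true; false; true'

echo "guard_hit:   $(exit_status_of "$guard_hit")"
echo "guard_miss:  $(exit_status_of "$guard_miss")"
echo "errexit_all: $(exit_status_of "$errexit_all")"
```

The `|| exit 1` form only protects the line it is attached to, which matches the narrower change in this diff, while `set -e` would change the behavior of every command in the script.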

@jleveque
Contributor

jleveque commented Oct 3, 2017

What is the ultimate behavior change you're trying to achieve with this?

Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
@qiluo-msft qiluo-msft changed the title from "Make supervisor controled one-shot program autorestart 0 time, so the status will become FATAL instead of EXITED if failure happens" to "Make swssconfig status FATAL when it fails" on Oct 3, 2017
Contributor

@stcheng stcheng left a comment

Please also check with the other reviewers.

@lguohan
Collaborator

lguohan commented Oct 3, 2017

Does this change the syslog message?

@qiluo-msft
Collaborator Author

qiluo-msft commented Oct 3, 2017

It only changes the syslog in this way:

Oct  4 06:25:42.643077 sonic INFO supervisord: start.sh neighsyncd: started
Oct  4 06:25:42.801254 sonic INFO swss.sh[26769]: 2017-10-04 06:25:42,800 INFO spawned: 'swssconfig' with pid 53
Oct  4 06:25:43.402113 sonic INFO supervisord: swssconfig Traceback (most recent call last):
Oct  4 06:25:43.402279 sonic INFO supervisord: swssconfig   File "/usr/local/bin/sonic-cfggen", line 215, in <module>
Oct  4 06:25:43.402502 sonic INFO supervisord: swssconfig     main()
Oct  4 06:25:43.402593 sonic INFO supervisord: swssconfig   File "/usr/local/bin/sonic-cfggen", line 196, in main
Oct  4 06:25:43.402676 sonic INFO supervisord: swssconfig     print template.render(data)
Oct  4 06:25:43.402755 sonic INFO supervisord: swssconfig   File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 1008, in render
Oct  4 06:25:43.403235 sonic INFO supervisord: swssconfig     return self.environment.handle_exception(exc_info, True)
Oct  4 06:25:43.403330 sonic INFO supervisord: swssconfig   File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 780, in handle_exception
Oct  4 06:25:43.403695 sonic INFO supervisord: swssconfig     reraise(exc_type, exc_value, tb)
Oct  4 06:25:43.403790 sonic INFO supervisord: swssconfig   File "/usr/share/sonic/templates/msn27xx.32ports.buffers.json.j2", line 209, in top-level template code
Oct  4 06:25:43.404032 sonic INFO supervisord: swssconfig     {%- set port_config = speed + '_' + cable -%}
Oct  4 06:25:43.404121 sonic INFO supervisord: swssconfig jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'speed'
Oct  4 06:25:43.423018 sonic INFO swss.sh[26769]: 2017-10-04 06:25:43,422 INFO exited: swssconfig (exit status 1; not expected)
Oct  4 06:25:44.424811 sonic INFO swss.sh[26769]: 2017-10-04 06:25:44,424 INFO gave up: swssconfig entered FATAL state, too many start retries too quickly
Oct  4 06:25:44.428831 sonic INFO supervisord: start.sh swssconfig: ERROR (spawn error)
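
For diagnosis on a running switch, the "gave up" line in the log above is the one that marks the FATAL transition. A minimal sketch of pulling the affected program names out of such a log, assuming the message format shown above (the helper name `fatal_programs` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read syslog-style lines on stdin and print the name of
# every program for which supervisord logged a "gave up ... FATAL" message.
fatal_programs() {
    sed -n 's/.*INFO gave up: \([^ ]*\) entered FATAL state.*/\1/p'
}

# Two lines copied from the log above: one unrelated, one FATAL transition.
sample_log="Oct  4 06:25:42.643077 sonic INFO supervisord: start.sh neighsyncd: started
Oct  4 06:25:44.424811 sonic INFO swss.sh[26769]: 2017-10-04 06:25:44,424 INFO gave up: swssconfig entered FATAL state, too many start retries too quickly"

printf '%s\n' "$sample_log" | fatal_programs
```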

Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
@lguohan lguohan merged commit 554114c into sonic-net:master Oct 4, 2017
@qiluo-msft qiluo-msft deleted the qiluo/supervisorfatal branch October 4, 2017 17:31
zhenggen-xu pushed a commit to zhenggen-xu/sonic-buildimage that referenced this pull request Oct 17, 2019
* msft_github/master:
  [DHCP Relay]: Support Multiple VLANs (Separate DHCP Relay Agents, One Per VLAN) (sonic-net#999)
  [build]: sonic-utilities package depends on swsssdk; build as wheel and add build dependency (sonic-net#1011)
  Make swssconfig status FATAL when it fails (sonic-net#1009)
  [swss]: Update swss-common/sairedis/swss submodules (sonic-net#1008)
  [config-engine]: Fix bug multiple ports connecting to same neighbor (sonic-net#1005)
stephenxs added a commit to stephenxs/sonic-buildimage that referenced this pull request Sep 8, 2020
Fix a typo in mellanox_buffer_migrator (sonic-net#1090)
[CLI][PFCWD][Multi-ASIC] Added multi ASIC support to 'pfcwd' CLI command (sonic-net#1080)
Add namespace of the process in the coredump filename. (sonic-net#1091)
[setup.py] Add aliases.ini to sonic_installer package (sonic-net#1088)
[pfcwd] Add single asic unit tests for show commands (sonic-net#1085)
Enhance SONiC with kubernetes management commands (sonic-net#962)
[counterpoll] add port buffer drop group (sonic-net#1009)
[CLI][PFC] Add multi ASIC options for pfcstat and 'show pfc counters' (sonic-net#1057)

Signed-off-by: Stephen Sun <stephens@nvidia.com>
volodymyrsamotiy added a commit to volodymyrsamotiy/sonic-buildimage that referenced this pull request Sep 8, 2020
5c173f7 [counterpoll] add port buffer drop group (sonic-net#1009)
62e44d9 [CLI][PFC] Add multi ASIC options for pfcstat and 'show pfc counters' (sonic-net#1057)

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
stepanblyschak pushed a commit to stepanblyschak/sonic-buildimage that referenced this pull request May 10, 2021
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
Co-authored-by: Danny Allen <daall@microsoft.com>
Co-authored-by: Volodymyr Samotiy <volodymyrs@mellanox.com>