Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bgpmon to be started as a new daemon under BGP docker #5329

Merged
merged 4 commits into from
Sep 20, 2020
Merged

Add bgpmon to be started as a new daemon under BGP docker #5329

merged 4 commits into from
Sep 20, 2020

Conversation

gechiang
Copy link
Collaborator

@gechiang gechiang commented Sep 6, 2020

This PR is to auto start the bgpmon (BGP Monitor) Daemon under the BGP docker.

The purpose of bgpmon (BGP Monitor daemon) is to assist gathering BGP related information periodically based on BGP activity detection to store BGP Neighbor information into State DB. For components such as SNMP agent that uses this new BGP related DB in state DB to support the MIB operation.
This PR is to auto start the bgpmon daemon under the BGP docker. There is another PR which I originally used to implements the bgpmon processing code (sonic-net/sonic-swss#1429) that was later decided to be moved into this PR instead of under the sonic-swss repo. Many valuable comments are still in that PR so reader should refer to that PR for review comments history purpose only.

Note: This new daemon is also added to be monitored by Monit so that in case it stops running it will be reported.

The HLD doc PR can be found here:

The following PR (SNMP Agent Changes to use State DB for BGP MIB )has a dependency on this PR:

The following PR contains the Pytest code that validate this feature functionality whenever VS Image test is run as part of all PRs as well as used in DUT pytest:
-[Azure-sonic-mgmt] (sonic-net/sonic-mgmt#2241)

Signed-off-by: Gen-Hwa Chiang gechiang@microsoft.com

- What I did

Added a new daemon to be run under BGP docker periodically to gather the BGP peer state information and update them in
the state DB. In order not to continuously running even when there are no BGP peer activities, this new daemon will use the BGP frr.log file timestamp as an indicator whether there are any activities in BGP that warrants it to inspect the peer state info for possible updates.

- How I did it

  • Start a periodic timer to get triggered every 15 second.
  • Check if there is a change in the timestamp of the frr.log file against a cached timestamp from the previous update.
  • Request the back-end handler (FRR/Zebra) to dump out the "show bgp summary json"
  • For each peer state that it just received, check it against the previous state that was cached and if there is a change or this is new peer, update/add the corresponding state DB table entry.
  • At the end of processing all the peer info, check if there are any stale peer entry present and delete the corresponding state DB entry accordingly.

- How to verify it

once the BGP docker is up and running, you can perform "show services" to see that this new daemon "bgpmon.py" is running under the BGP docker service similar to the following output:

admin@SONic:~$ show services
...
bgp     docker
---------------------------
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.2  0.2  28576 22632 pts/0    Ss+  19:10   0:08 /usr/bin/python2 /usr/bin/supervisord
root        21  0.0  0.2  22664 16716 pts/0    S    19:11   0:00 python /usr/bin/supervisor-proc-exit-listener --container-name bgp
root        25  0.0  0.0 225856  3512 pts/0    Sl   19:11   0:00 /usr/sbin/rsyslogd -n -iNONE
frr         29  1.0  0.4 525176 36580 pts/0    Sl   19:11   0:40 /usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M fpm -M snmp
frr         30  0.0  0.0  43304  6080 pts/0    S    19:11   0:00 /usr/lib/frr/staticd -A 127.0.0.1
frr         31  2.5  1.0 356196 84108 pts/0    Sl   19:11   1:36 /usr/lib/frr/bgpd -A 127.0.0.1 -M snmp
root        40  0.1  0.6  69172 55876 pts/0    S    19:11   0:05 /usr/bin/python /usr/local/bin/bgpcfgd
root        41  0.0  0.1  20988 15508 pts/0    S    19:11   0:00 /usr/bin/python /usr/local/bin/bgpmon
root        42  0.2  0.0  82108  6136 pts/0    Sl   19:11   0:10 fpmsyncd
...

For IPV4 You can perform "show ip bgp summary" and use that output to check against the state DB for each peer.
The state DB key would be something like "NEIGH_STATE_TABLE|10.0.0.33" where 10.0.0.33 is the peer ip address.
you can then read out the state information by issuing the redis cmd hgetall "NEIGH_STATE_TABLE|10.0.0.33". Compare that with the corresponding output of "show ip bgp summary".
Here is a sample output from the State DB:
admin@str-s6000-acs-8:~$ redis-cli -n 6
127.0.0.1:6379[6]> keys "NEIGH*"

  1. "NEIGH_STATE_TABLE|fc00::1a"
  2. "NEIGH_STATE_TABLE|fc00::32"
  3. "NEIGH_STATE_TABLE|10.0.0.33"
  4. "NEIGH_STATE_TABLE|fc00::66"
  5. "NEIGH_STATE_TABLE|fc00::62"
  6. "NEIGH_STATE_TABLE|fc00::46"
  7. "NEIGH_STATE_TABLE|10.0.0.35"
  8. "NEIGH_STATE_TABLE|10.0.0.53"
  9. "NEIGH_STATE_TABLE|10.0.0.47"
  10. "NEIGH_STATE_TABLE|10.0.0.9"
  11. "NEIGH_STATE_TABLE|fc00::6e"
  12. "NEIGH_STATE_TABLE|10.0.0.41"
  13. "NEIGH_STATE_TABLE|10.0.0.63"
  14. "NEIGH_STATE_TABLE|10.0.0.17"
  15. "NEIGH_STATE_TABLE|fc00::4a"
  16. "NEIGH_STATE_TABLE|fc00::52"
  17. "NEIGH_STATE_TABLE|fc00::5e"
  18. "NEIGH_STATE_TABLE|10.0.0.29"
  19. "NEIGH_STATE_TABLE|10.0.0.45"
  20. "NEIGH_STATE_TABLE|fc00::2a"
  21. "NEIGH_STATE_TABLE|fc00::42"
  22. "NEIGH_STATE_TABLE|10.0.0.1"
  23. "NEIGH_STATE_TABLE|10.0.0.59"
    127.0.0.1:6379[6]> hgetall "NEIGH_STATE_TABLE|10.0.0.1"
  24. "state"
  25. "Established"
    127.0.0.1:6379[6]>

you can do the same for ipv6 by performing "show ipv6 bgp summary" and follow the same steps above to validate with the corresponding peer states stored in state DB.

qiluo-msft
qiluo-msft previously approved these changes Sep 9, 2020
@gechiang
Copy link
Collaborator Author

Added one more change to include Monit to monitor this new daemon under BGP so that when it failed/crashed, it will raise an alert to syslog the issue. It will show up something like following:

Sep 10 03:36:24.532960 str-s6000-acs-8 ERR monit[21086]: 'bgpmon' process is not running
Sep 10 03:37:24.615276 str-s6000-acs-8 ERR monit[21086]: 'sflowmgrd' process is not running
Sep 10 03:37:24.661284 str-s6000-acs-8 ERR monit[21086]: 'bgpmon' process is not running

@gechiang
Copy link
Collaborator Author

@qiluo-msft Sorry... Have to add one more change to ensure this new daemon is monitored by MONIT and raise an alert (syslog the issue) when bgpmon stops running (crashed).
Please help review the new added file.
Thanks!

@@ -21,3 +22,6 @@ check process staticd matching "/usr/lib/frr/staticd"

check process bgpcfgd matching "python /usr/local/bin/bgpcfgd"
if does not exist for 5 times within 5 cycles then alert

check process bgpmon matching "python2 /usr/bin/bgpmon.py"
Copy link
Collaborator

@qiluo-msft qiluo-msft Sep 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/usr/bin/bgpmon.py [](start = 39, length = 18)

Does the process name include "python2"? If yes, possible to remove? #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiluo-msft Per comment in sonic-net/sonic-swss#1429 to make the daemon specific to python2, once that change is in place we see it gets reflected under the output of show services somethis as following:

admin@SONic:~$ show services
...
bgp     docker
---------------------------
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  5.2  0.2  28344 22532 pts/0    Ss+  03:47   0:03 /usr/bin/python2 /usr/bin/supervisord
root        21  0.7  0.2  22660 16684 pts/0    S    03:47   0:00 python /usr/bin/supervisor-proc-exit-listener --container-name bgp
root        25  0.1  0.0 225856  3488 pts/0    Sl   03:47   0:00 /usr/sbin/rsyslogd -n -iNONE
frr         29  0.8  0.1 503176 16156 pts/0    Sl   03:47   0:00 /usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M fpm -M snmp
frr         30  0.2  0.0  43176  6140 pts/0    S    03:47   0:00 /usr/lib/frr/staticd -A 127.0.0.1
frr         31  1.3  0.2 294536 20960 pts/0    Sl   03:47   0:00 /usr/lib/frr/bgpd -A 127.0.0.1 -M snmp
root        38  6.1  0.6  69172 55992 pts/0    S    03:47   0:03 /usr/bin/python /usr/local/bin/bgpcfgd
root        41  0.6  0.1  20996 15028 pts/0    S    03:47   0:00 python2 /usr/bin/bgpmon.py
...

For this reason the Monit config specification needs to match it.

qiluo-msft
qiluo-msft previously approved these changes Sep 15, 2020
Copy link
Collaborator

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

qiluo-msft
qiluo-msft previously approved these changes Sep 18, 2020
@gechiang
Copy link
Collaborator Author

Based on offline comment from Guohan we have decided to pull in the changes done in (sonic-net/sonic-swss#1429) to here and just have one single PR instead of 2 separate PRs. Another reason is that the sonic-swss repo was not the right place for this new bgpmon daemon code to reside with. Its functionality make more sense to be under "sonic-bgpcfgd".

@@ -9,7 +9,7 @@
author_email='pavelsh@microsoft.com',
url='https://github.com/Azure/sonic-buildimage',
packages=setuptools.find_packages(),
scripts=['bgpcfgd'],
scripts=['bgpcfgd','bgpmon.py'],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do console_script packaging? it is better to do that and we can get rid of .py suffix.

https://python-packaging.readthedocs.io/en/latest/command-line-scripts.html

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lguohan Agreed. I have modified to use console_scripts packaging and it indeed got rid of the .py suffix.
Code re-tested to be functioning as before.

qiluo-msft
qiluo-msft previously approved these changes Sep 18, 2020
Copy link
Collaborator

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please also resolve others' comments.

@gechiang
Copy link
Collaborator Author

retest vsimage please

@gechiang gechiang merged commit 128def6 into sonic-net:master Sep 20, 2020
@qiluo-msft
Copy link
Collaborator

@gechiang Could you add some unit test or vs test?

santhosh-kt pushed a commit to santhosh-kt/sonic-buildimage that referenced this pull request Feb 25, 2021
…5329)

* Add bgpmon under sonic-bgpcfgd to be started as a new daemon under BGP docker

* Added bgpmon to be monitored by Monit so that if it crashed, it gets alerted

* use console_scripts entry point to package bgpmon
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants