new BGP Monitor Daemon to pull BGP peer state and store in State DB for MIB consumption #1429

gechiang · 2020-09-05T07:55:08Z

Added bgpmon (BGP Monitor Daemon) support to assist gathering BGP related information periodically based on BGP activity detection to store those information into State DB. For components such as SNMP agent that uses this new BGP related DB in state DB to support the MIB operation.
This PR has the implementation of the bgpmon processing code. There is another PR (sonic-net/sonic-buildimage#5329) needed in the dockers/docker-fpm-frr to start up this bgpmon inside of the BGP docker.
Both of these PRs are needed to achieve the full functionality of bgpmon running inside of BGP docker.

The HLD doc PR can be found here:

[Azure-SONiC] (CiscoBgp4MIB implementation changes SONiC#667)

The following PR (SNMP Agent Changes to use State DB for BGP MIB )has a dependency on this PR:

[Azure-sonic-snmpagent] (Implementation changes for CiscoBgp4MIB sonic-snmpagent#158)

Signed-off-by: Gen-Hwa Chiang gechiang@microsoft.com

- What I did

Added a new daemon to be run under BGP docker periodically to gather the BGP peer state information and update them in
the state DB. In order not to continuously running even when there are no BGP peer activities, this new daemon will use the BGP frr.log file timestamp as an indicator whether there are any activities in BGP that warrants it to inspect the peer state info for possible updates.

- How I did it

Start a periodic timer to get triggered every 15 second.
Check if there is a change in the timestamp of the frr.log file against a cached timestamp from the previous update.
Request the back-end handler (FRR/Zebra) to dump out the "show bgp summary json"
For each peer state that it just received, check it against the previous state that was cached and if there is a change or this is new peer, update/add the corresponding state DB table entry.
At the end of processing all the peer info, check if there are any stale peer entry present and delete the corresponding state DB entry accordingly.

- How to verify it
For IPV4 You can perform "show ip bgp summary" and use that output to check against the state DB for each peer.
The state DB key would be something like "NEIGH_STATE_TABLE|10.0.0.33" where 10.0.0.33 is the peer ip address.
you can then read out the state information by issuing the redis cmd hgetall "NEIGH_STATE_TABLE|10.0.0.33". Compare that with the corresponding output of "show ip bgp summary".
Here is a sample output from the State DB:
admin@str-s6000-acs-8:~$ redis-cli -n 6
127.0.0.1:6379[6]> keys "NEIGH*"

"NEIGH_STATE_TABLE|fc00::1a"
"NEIGH_STATE_TABLE|fc00::32"
"NEIGH_STATE_TABLE|10.0.0.33"
"NEIGH_STATE_TABLE|fc00::66"
"NEIGH_STATE_TABLE|fc00::62"
"NEIGH_STATE_TABLE|fc00::46"
"NEIGH_STATE_TABLE|10.0.0.35"
"NEIGH_STATE_TABLE|10.0.0.53"
"NEIGH_STATE_TABLE|10.0.0.47"
"NEIGH_STATE_TABLE|10.0.0.9"
"NEIGH_STATE_TABLE|fc00::6e"
"NEIGH_STATE_TABLE|10.0.0.41"
"NEIGH_STATE_TABLE|10.0.0.63"
"NEIGH_STATE_TABLE|10.0.0.17"
"NEIGH_STATE_TABLE|fc00::4a"
"NEIGH_STATE_TABLE|fc00::52"
"NEIGH_STATE_TABLE|fc00::5e"
"NEIGH_STATE_TABLE|10.0.0.29"
"NEIGH_STATE_TABLE|10.0.0.45"
"NEIGH_STATE_TABLE|fc00::2a"
"NEIGH_STATE_TABLE|fc00::42"
"NEIGH_STATE_TABLE|10.0.0.1"
"NEIGH_STATE_TABLE|10.0.0.59"
127.0.0.1:6379[6]> hgetall "NEIGH_STATE_TABLE|10.0.0.1"
"state"
"Established"
127.0.0.1:6379[6]>

you can do the same for ipv6 by performing "show ipv6 bgp summary" and follow the same steps above to validate with the corresponding peer states stored in state DB.

…ker to assist storing some BGP DB info into State DB for others to consume

lgtm-com · 2020-09-05T07:59:53Z

This pull request introduces 5 alerts when merging f805107 into 65f63c1 - view on LGTM.com

new alerts:

5 for Unused import

…ker to assist storing some BGP DB info into State DB for others to consume

gechiang · 2020-09-06T06:13:37Z

retest lgtm please

fpmsyncd/bgpmon.py

qiluo-msft · 2020-09-08T06:26:49Z

fpmsyncd/bgpmon.py

+                if self.ipv4_n_state[neighb] != self.new_ipv4_n_state[neighb]:
+                    # state changed. Update state DB for this entry
+                    state = self.new_ipv4_n_state[neighb]
+                    self.db.set(self.db.STATE_DB, key, 'state', state)


set [](start = 28, length = 3)

Suggest to use https://redis.io/commands/expire in case this process crashes or always fails.

If this is the right direction, "only update the entry if sate changed" may be not that important since we use pipeline and there is only one transaction. #WontFix

@qiluo-msft Since this StateDB is required by SNMP proxy agent and potential future consumers, if this process always fails, it will not matter what the BGP StateDB entries gets properly deleted or not... those agents will not operate correctly until bgpmon daemon is healthy again. If my understanding is correct after reading through the "expire" documentation it will require bgpmon to periodically update the expire to ensure those BGP key states does not get expired/deleted when there is no changes needed. Since we want to favor the design for the steady state condition (no more peer state changes in steady state), we should not use "expire" or else bgpmon in steady state will end up having to periodically update the state DB even there are no state changes. In this desing, bgpmon when restart will force the state DB to clean up and provide the most updated state from the new snapshot it gathered from BGP/Zebra. So StateDB clean up is taken cared of when necessary to prevent stale entries from staying in the state DB. Also, using "expire" means someone (Redis) will need to keep track of the time for each key. This operation is not free no matter how efficient Redis implements it...

Let's assume there is a bug inside this daemon and it is stuck somewhere. It is better to expire and propagate this error to monitors. In that situation, it's dangerous to keep as false stable.

In reply to: 484678284 [](ancestors = 484678284)

@qiluo-msft Per our phone discussion I think we agreed that this is ok for now.

qiluo-msft

As comments

…host ip when connect stateDB, moved cleaning snapshot DS only when no exception, removed unused log and debug counter

fpmsyncd/bgpmon.py

…de. Specified python2

qiluo-msft · 2020-09-14T15:55:40Z

fpmsyncd/bgpmon.py

+            return True
+
+    # Get a new snapshot of BGP neighbors and store them in the "new" location
+    def get_all_neigh_states(self):


get_all_neigh_states [](start = 8, length = 20)

Could you add some unit test or vs test?

@qiluo-msft Will add unit test in a separate PR.

Prefer adding in the this PR if you can. It even benefit yourself in iterations.

… error code, use Redis pipeline for stateDB operations

lgtm-com · 2020-09-16T06:59:41Z

This pull request introduces 1 alert when merging 85dce8e into f041c84 - view on LGTM.com

new alerts:

1 for Unused import

qiluo-msft · 2020-09-16T17:57:34Z

fpmsyncd/bgpmon.py

+                self.peer_state[peer] = state
+            if pipe_count > PIPE_BATCH_MAX_COUNT:
+                self.flush_pipe(data)
+                data = {}


Move data = {} into flush_pipe() function body #Closed

@qiluo-msft Agreed. unlike "c", python pass the object instead of copy the value so it make sense to clear the data{} inside this flush_pipe() method.

qiluo-msft · 2020-09-16T17:58:46Z

fpmsyncd/bgpmon.py

+                data = {}
+                pipe_count = 0
+        # If anything in the pipeline not yet flushed, flush them now
+        if pipe_count > 0:


Suggested change

if pipe_count > 0:

if len(data) > 0:

``` #Closed

@qiluo-msft Agreed. There is no need to introduce the extra counter "pipe_count" since python dict implementation maintains an item count that can be accessed with O(1) time so it make sense to use the len(data) instead.

qiluo-msft · 2020-09-16T18:04:03Z

fpmsyncd/bgpmon.py

+            return True
+
+    # Get a new snapshot of BGP neighbors and store them in the "new" location
+    def get_all_neigh_states(self):


Prefer adding in the this PR if you can. It even benefit yourself in iterations.

qiluo-msft

LGTM except missing test

gechiang · 2020-09-18T20:28:57Z

Based on offline comment from Guohan, the bgpmon code should not reside in this repo. Based on its functionality it should be moved to under sonic-buildimage/src/sonic-bgpcfgd .
I am closing this PR and have moved this same code to the following PR:
sonic-net/sonic-buildimage#5329
Please use the above for further review.
There will be no more activities on this PR.

Warn user while deleting VLAN if it has IP addresses. Signed-off-by: d-dashkov <Dmytro_Dashkov@Jabil.com>

gechiang added 2 commits September 4, 2020 18:12

Implement bgpmon (BGP Monitor) daemon to run under docker-fpm-frr doc…

c31e43c

…ker to assist storing some BGP DB info into State DB for others to consume

Implement bgpmon (BGP Monitor) daemon to run under docker-fpm-frr doc…

f805107

…ker to assist storing some BGP DB info into State DB for others to consume

gechiang requested review from lguohan, qiluo-msft, pavel-shirshov, rlhui and SuvarnaMeenakshi September 5, 2020 07:55

Implement bgpmon (BGP Monitor) daemon to run under docker-fpm-frr doc…

c6a9a80

…ker to assist storing some BGP DB info into State DB for others to consume

gechiang mentioned this pull request Sep 6, 2020

Add bgpmon to be started as a new daemon under BGP docker sonic-net/sonic-buildimage#5329

Merged