-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[doc][static-route] Static Route BFD HLD document #1216
Conversation
Minor editorial and formatting changes.
Update SONiC_static_route_bfd_hld.md
Minor editorial and format updates
Update SONiC_static_route_bfd_hld.md
Typo fix.
should no data carried in the event of key deletion
Follow the StaticRouteTimer style, rename the component to StaticRouteBfd. StaticRouteBfd does NOT use bgpcfgd/manager framework
change the "refreshable" field name to "expiry", because StaticRouteTimer already implemented the required feature with field name "expiry"
do we have test for this? |
Yes. The code (PR13789 and PR 13764) has passed the following tests (from the test section in this document): |
To avoid StaticRouteTimer deleting StaticRouteBfd created static route entry in appl_db, a new field "expiry" is introduced in appl_db STATIC_ROUTE_TABLE schema. StaticRouteBfd sets "expiry"="false" when it writes a static route to appl_db STATIC_ROUTE_TABLE. When StaticRouteTimer see "expiry"="false", it ignores this static route entry. <br> | ||
|
||
|
||
<img src="static_rt_bfd_overview.png" width="500"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I assume red color coded are newly introduced modules/interactions? if no, please color code accordingly. There are some typos (e.g bd, bfd stete 1). in the diagram, please fix that as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, only persistent static routes are pushed to config-db and this route will be configured to FRR and handled back as regular route via fpmsyncd for HW programming. staticroutebfd handles only the persistent static route case? how about non-persistent static route? we should handle static route entries from both config-db (persistent) and app-db (non-persistent) in staticroutebfd module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to add the details for SAI BFD offload APIs being used for this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the diagram: bfd and expiry are newly added fields. the interactions are color coded (see color code at top-right corner). fixed typos. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The target of this design is to handle static routes entry from config_db only.
If there is requirement for non-persistent static route, it could be the feature enhancement in the future.
non-persistent static route case is a more complicated scenario (it requires the client be aware of new "bfd" field, etc ). Because that StaticRouteBfd also need to write static route to AppDB to let StaticRouteMgr (appl_db) to install the route to FRR, so it may need to use different key for BFD enabled route, otherwise the static route entry will be overwritten if use same key to write to same table.
so this design support the static route in config_db only.
1. find the interface name for the nexthop | ||
2. lookup the config_db, find the IP address of that interface | ||
3. choose the IP address which has same type of nexthop (IPv4 or IPv6) | ||
4. If no IP address found, use local loopback IP address, and log a warning message for that. <br /><br /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if there are more than one local loopback address? suggest to log a warning message when local IP addresses are not found, we should be explicit if possible instead of deciding the source IP internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently it looks up 2 loopback ip addresses:
LOOPBACK0_NAME = "Loopback0"
LOOPBACK4096_NAME = "Loopback4096"
(see the code PR, the link is in this PR description)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay as long as this is assumed by bgpcfgd and works for all cases.
4. If no IP address found, use local loopback IP address, and log a warning message for that. <br /><br /> | ||
|
||
In the example (assume the "bfd" field is "true"), for the nexthop "20.0.10.3", the corresponding interface name is PortChannel10. | ||
From the interface configuration, we can found two IP addresses, an IPv4 address 20.0.10.1 and an IPv6 address 2603:10E2:400:10::1. we choose 20.0.10.1 because the nexthop IP address is IPv4 address. <br /><br /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, when there are more than one IPv4 or IPv6 addresses, we should choose based on subnet match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a note at the beginning of this document to limit the number of IP address (one IPv4 IP address or one IPv6 IP address) for static route BFD case. The reason is that, BFD need to work in pair, two BFD sessions in local system and remote system need to talk to each other to get BFD sessions UP, remote system's peer IP address need to be same as local system's local_addr (the IP assigned to the interface in this static route BFD case). If there are more than one IPv4 (or IPv6) addresses, it will be confusing when config the remote system, and the BFD session could not be UP if the IP address mismatches.
|
||
Two optional fields are introduced to STATIC_ROUTE_TABLE: <br> | ||
1. "bfd" -- Default value is false when this field is not configured | ||
2. "expiry" -- Default value is true when this field is not configured |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why we need this "expiry" field, as we expect to have only non-persistent STATIC_ROUTE_TABLE in the APPL_DB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related code:
https://github.com/sonic-net/sonic-buildimage/blob/bce824723c0d2c5d9cb7ae34bb830bb632d43c83/src/sonic-bgpcfgd/bgpcfgd/static_rt_timer.py#L36
to let static_rt_timer not remove the static route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code is being used only for non-persistent route but in this HLD we focus only persistent route (CONFIG_DB) which means there should not be any STATIC_ROUTE_TABLE in the APPL_DB or in other words, we should not worry about STATIC_ROUTE_TABLE presence in the APPL_DB, do we use STATIC_ROUTE_TABLE present in APPL_DB for something?
I thought, we could send only the BFD_SESSION information all the way to SAI via APPL_DB & ASIC_DB and get back the notification if the session is down/up for us to react in the StaticRouteBfd module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The process is that, StaticRouteBfd read static route from config_db (persistent), and write the processed route to appl_db STATIC_ROUTE_TABLE, to utilize StaticRouteMgr(appl_db) to install the route to FRR. so the processed route looks exact same as non-persistent route (to StaticRouteMgr(appl_db)). that is the reason we set expiry=false to avoid StaticRouteTimer deleting it.
for your last comments, the bfdorch is already in the system to provide bfd service, write request to appl_db BFD_SESSION_TABLE can utilize the current bfdorch service. bfdorch maintains local_discriminator and BFD sessions in sonic system, other service should not create BFD session by directly calling SAI API.
1. TABLE_CONFIG: config_db STATIC_ROUTE_TABLE cache (for the route with "bfd"="true" only) | ||
2. TABLE_NEXTHOP: different prefixes may have same nexthop. This table is used to track which prefix is using the nexthop. | ||
3. TABLE_BFD: bfd session created by StaticRouteBfd. The contents are part of appl_db BFD_SESSION_TABLE (for the session its peer IP is in nexthop table). *Assumption: BFD session is not shared with other applications* | ||
4. TABLE_SR: the static routes written to appl_db STATIC_ROUTE_TABLE by BfdRouteManager (with "expiry"="false"). It's nexthop list might be different from the configuration depends on BFD session state.<br> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to STATIC_ROUTE instead of SR as it conflicts with segment routing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the table name to TABLE_SRT (avoiding SR, but want to keep the name short).
|
||
### Start event processing | ||
Start event processing after the above table and redis DB reconciliation.<br> | ||
In the rare case that, data were read from DB but also get an event later, cause rewrite BFD session or Static Route to appl_DB with same contents should not be a problem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add sections for the new tables schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to create a PR to update the schema on the original static route document?
https://github.com/sonic-net/SONiC/blob/master/doc/static-route/SONiC_static_route_hdl.md#3211-static_route
There is a link to the above document.
if the schema in two different documents, need to update both documents to keep them in sync.
Follow the StaticRouteTimer style, rename the component to StaticRouteBfd. StaticRouteBfd does NOT use bgpcfgd/manager framework
change the "refreshable" field name to "expiry", because StaticRouteTimer already implemented the required feature with field name "expiry"
change table name TABLE_SR to TABLE_SRT (static route). SR seems like "segment routing" related.
Added a new section to describe the design to support the bfd flag changing at runtime.
Change the logic when bfd flag changes from false to true, to not interrupt the traffic.
update bfd field dynamic change design. the redis events are not reliable from T2 end-to-end test result. when there is consecutive write to a specific key, event there is 0.1 second delay, the consumer side may miss the redis "set" event. Here is the log: May 4 22:47:22.341016 sfd-lt2-lc2 NOTICE bgp0#bgpcfgd: :- pops: Miss table key STATIC_ROUTE:default:3.3.3.3/32, possibly outdated
don't need to check vnet prefix, remove that step
…ddress is not available
d575bd3
to
f527967
Compare
@prsunny can you please help with review/sign-off this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please fix typos
<br> | ||
<br> | ||
|
||
## Dynamc static route "bfd" config chagne support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change - typo
<br> | ||
|
||
## Dynamc static route "bfd" config chagne support | ||
When a static route "bfd" field changes (from "true" to "false", or from "false" to "true"), the owner(StaticRouteMgr or StaticRouteBfd) of this static routes also changes. StaticRouteMgr and StaticRouteBfd need to work together to handle this runtime ownershop change.<br> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ownership - typo
|
||
## Dynamc static route "bfd" config chagne support | ||
When a static route "bfd" field changes (from "true" to "false", or from "false" to "true"), the owner(StaticRouteMgr or StaticRouteBfd) of this static routes also changes. StaticRouteMgr and StaticRouteBfd need to work together to handle this runtime ownershop change.<br> | ||
When StaticRouteMgr gets a static route update (redis SET event) event, it checks the static route "bfd" field and its local cache. If the "bfd" field is "true" and the prefix is in its local cache(it handles this route before the update), the StaticRouteMgr delete it from its local cache and does not do any furhter processing for this route. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
further - typo
* [BFD] staticroutebfd implementation * To enable the BFD for static route HLD: sonic-net/SONiC#1216
1, BfdRouteMgr design to monitor static route nexthop reachability and update static route based on BFD session state.
2, testcases