-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bmp various fixes #17373
Bmp various fixes #17373
Conversation
bgpd/bgp_route.c
Outdated
@@ -4182,6 +4182,12 @@ static void bgp_process_internal(struct bgp *bgp, struct bgp_dest *dest, | |||
return; | |||
} | |||
|
|||
void bgp_process_hook(struct bgp *bgp, struct peer *peer, struct bgp_dest *dest, afi_t afi, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'll bite: why?
why add a layer that just rearranges the arguments, and hides the "withdraw" boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- the layer, it is because it is not possible to call a HOOK from outside of bgp_route.c
- the withdraw, I can maintain it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but ... isn't that what hooks are for - they're hooks from our code out to other callbacks? what would it mean to have the callbacks for an existing hook are called from ... some other places?
this looks like a "leak" event - maybe it should have its own hook (if you think there's a use-case for that)
and yes, since there are both "update" and "withdraw" leak events, you should ... have a way to express that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, even if parameters are similar, this is not exactly the correct hook I wanted to call.
BMP is interested in having nexthop tracking hooks, in order to consider a path as reachable or not, and then further update the post-rib.
58dbc35
to
efbc62f
Compare
efbc62f
to
13caa1b
Compare
Can we fix also https://ci1.netdef.org/browse/FRR-PULLREQ3-CHECKOUT-6046? |
13caa1b
to
2fa79fb
Compare
2fa79fb
to
5d5d48e
Compare
I satisfied it, but frrbot will not be happy |
6ed7ba5
to
ec11074
Compare
@Mergifyio backport stable/10.2 stable/10.1 stable/10.0 stable/9.1 |
🟠 Waiting for conditions to match
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've already encountered this issue before, thanks for taking care of it with this PR :)
prefix2str(&dest->rn->p, pfxprint, sizeof(pfxprint)); | ||
frrtrace(4, frr_bgp, bmp_nht_path_valid, bgp, pfxprint, path, valid); | ||
} | ||
if (bgp->peer_self == path->peer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why aren't "redistributed networks relevant for bmp" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redistributed networks are the ones that are added in BGP when doing:
router bgp 65500
address-family ipv4 unicast
redistribute static
because they originate from already selected routes, those routes are always available, and are not impacted by BGP nexthop tracking. If there is a reachability problem, BGP will receive a bgp_redistribute_delete command from the route that disappeared from static route list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh okay i see, you meant "not relevant for next hop tracking" not "for bmp"
i understand it now thx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to clarify, when nht does not work, here is the output on a BMP collector:
{"peer_type": "route distinguisher instance", "policy": "pre-policy", "ipv6": true, "peer_ip": "192:167::3", "peer_distinguisher": "555:1", "peer_asn": 65501, "peer_bgp_id": "192.168.1.3", "timestamp": "2024-11-28 17:05:45.111963", "bmp_log_type": "update", "origin": "IGP", "as_path": "", "afi": 2, "safi": 1, "nxhp_ip": "192:167::3", "nxhp_link-local": "fe80::7caf:c7ff:fe63:2b4d", "ip_prefix": "2001::1125/128", "seq": 32}
{"peer_type": "route distinguisher instance", "policy": "post-policy", "ipv6": true, "peer_ip": "192:167::3", "peer_distinguisher": "555:1", "peer_asn": 65501, "peer_bgp_id": "192.168.1.3", "timestamp": "2024-11-28 17:05:45.111963", "bmp_log_type": "update", "origin": "IGP", "as_path": "", "afi": 2, "safi": 1, "nxhp_ip": "192:167::3", "nxhp_link-local": "fe80::7caf:c7ff:fe63:2b4d", "ip_prefix": "2001::1125/128", "seq": 33}
{"peer_type": "route distinguisher instance", "policy": "pre-policy", "ipv6": true, "peer_ip": "192:167::3", "peer_distinguisher": "555:1", "peer_asn": 65501, "peer_bgp_id": "192.168.1.3", "timestamp": "2024-11-28 17:05:45.111962", "bmp_log_type": "update", "origin": "IGP", "as_path": "", "afi": 2, "safi": 1, "nxhp_ip": "192:167::3", "nxhp_link-local": "fe80::7caf:c7ff:fe63:2b4d", "ip_prefix": "2001::1125/128", "seq": 34}
{"peer_type": "loc-rib instance", "is_filtered": false, "policy": "loc-rib", "peer_distinguisher": "555:1", "peer_asn": 65501, "peer_bgp_id": "192.168.1.1", "timestamp": "2024-11-28 17:05:45.111963", "bmp_log_type": "update", "origin": "IGP", "as_path": "", "afi": 2, "safi": 1, "nxhp_ip": "192:167::3", "nxhp_link-local": "fe80::7caf:c7ff:fe63:2b4d", "ip_prefix": "2001::1125/128", "seq": 35}
As you can see, the pre-policy message for 2001::1125/128 is sent twice.
But I think it is not a big deal, because all the information in the entry tells if it is an addpath entry or not, or if it is an ECMP entry. so we can not make mistakes when reading both messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see, in #14847 the adj-rib-in post-policy is moved to the queue of loc-rib since they run lookups on the same structure (the bgp->rib)
because multiple monitoring modes share queues, i added a flag field to the bqe that tells which rib must send a message. this should fix this double message issue too :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and i agree, it's not ideal but it's not a huge deal either. it will not confuse a collector
0e91188
to
0a37e19
Compare
ci:rerun unrelated ldp test failed |
0a37e19
to
1cd55d9
Compare
The following ASAN error can be seen. > ERROR: AddressSanitizer: attempting to call malloc_usable_size() for pointer which is not owned: 0x608000036c20 > #0 0x7f3d7a4b5425 in __interceptor_malloc_usable_size ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:198 > FRRouting#1 0x7f3d7a426a16 in __sanitizer::BufferedStackTrace::Unwind(unsigned long, unsigned long, void*, bool, unsigned int) ../../../../src/libsanitizer/sanitizer_common > /sanitizer_stacktrace.h:122 > FRRouting#2 0x7f3d7a426a16 in __asan::asan_malloc_usable_size(void const*, unsigned long, unsigned long) ../../../../src/libsanitizer/asan/asan_allocator.cpp:1074 > FRRouting#3 0x7f3d7a03f330 in mt_count_free lib/memory.c:78 > FRRouting#4 0x7f3d7a03f330 in qfree lib/memory.c:130 > FRRouting#5 0x7f3d76ccf89b in bmp_peer_status_changed bgpd/bgp_bmp.c:982 > FRRouting#6 0x560ae2aa6a94 in hook_call_peer_status_changed bgpd/bgp_fsm.c:47 > FRRouting#7 0x560ae2aa6a94 in bgp_fsm_change_status bgpd/bgp_fsm.c:1287 > FRRouting#8 0x560ae2c4f2e5 in peer_delete bgpd/bgpd.c:2777 > FRRouting#9 0x560ae2c58d24 in bgp_delete bgpd/bgpd.c:4140 > FRRouting#10 0x560ae2bbb47e in no_router_bgp bgpd/bgp_vty.c:1764 > FRRouting#11 0x7f3d79fb74ed in cmd_execute_command_real lib/command.c:1003 > FRRouting#12 0x7f3d79fb78a3 in cmd_execute_command lib/command.c:1062 > FRRouting#13 0x7f3d79fb7e03 in cmd_execute lib/command.c:1228 > FRRouting#14 0x7f3d7a107b53 in vty_command lib/vty.c:625 > FRRouting#15 0x7f3d7a109902 in vty_execute lib/vty.c:1388 > FRRouting#16 0x7f3d7a10cc32 in vtysh_read lib/vty.c:2400 > FRRouting#17 0x7f3d7a0f848b in event_call lib/event.c:2019 > FRRouting#18 0x7f3d7a01e627 in frr_run lib/libfrr.c:1232 > FRRouting#19 0x560ae29e0037 in main bgpd/bgp_main.c:555 > FRRouting#20 0x7f3d79a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 > FRRouting#21 0x7f3d79a29e3f in __libc_start_main_impl ../csu/libc-start.c:392 > FRRouting#22 0x560ae29e4ef4 in _start (/usr/lib/frr/bgpd+0x2eeef4) > > 0x608000036c20 is located 0 bytes inside of 81-byte region [0x608000036c20,0x608000036c71) > freed by thread T0 here: > #0 0x7f3d7a4b4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127 > FRRouting#1 0x7f3d76ccf85f in bmp_peer_status_changed bgpd/bgp_bmp.c:981 > FRRouting#2 0x560ae2aa6a94 in hook_call_peer_status_changed bgpd/bgp_fsm.c:47 > FRRouting#3 0x560ae2aa6a94 in bgp_fsm_change_status bgpd/bgp_fsm.c:1287 > FRRouting#4 0x560ae2c4f2e5 in peer_delete bgpd/bgpd.c:2777 > FRRouting#5 0x560ae2c58d24 in bgp_delete bgpd/bgpd.c:4140 > FRRouting#6 0x560ae2bbb47e in no_router_bgp bgpd/bgp_vty.c:1764 > FRRouting#7 0x7f3d79fb74ed in cmd_execute_command_real lib/command.c:1003 > FRRouting#8 0x7f3d79fb78a3 in cmd_execute_command lib/command.c:1062 > FRRouting#9 0x7f3d79fb7e03 in cmd_execute lib/command.c:1228 > FRRouting#10 0x7f3d7a107b53 in vty_command lib/vty.c:625 > FRRouting#11 0x7f3d7a109902 in vty_execute lib/vty.c:1388 > FRRouting#12 0x7f3d7a10cc32 in vtysh_read lib/vty.c:2400 > FRRouting#13 0x7f3d7a0f848b in event_call lib/event.c:2019 > FRRouting#14 0x7f3d7a01e627 in frr_run lib/libfrr.c:1232 > FRRouting#15 0x560ae29e0037 in main bgpd/bgp_main.c:555 > FRRouting#16 0x7f3d79a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 > > previously allocated by thread T0 here: > #0 0x7f3d7a4b4887 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145 > FRRouting#1 0x7f3d7a03f0e9 in qmalloc lib/memory.c:101 > FRRouting#2 0x7f3d76cd0166 in bmp_bgp_peer_vrf bgpd/bgp_bmp.c:2194 > FRRouting#3 0x7f3d76cd0166 in bmp_bgp_update_vrf_status bgpd/bgp_bmp.c:2236 > FRRouting#4 0x7f3d76cd29b8 in bmp_vrf_state_changed bgpd/bgp_bmp.c:3479 > FRRouting#5 0x560ae2c45b34 in hook_call_bgp_instance_state bgpd/bgpd.c:88 > FRRouting#6 0x560ae2c4d158 in bgp_instance_up bgpd/bgpd.c:3936 > FRRouting#7 0x560ae29e5ed1 in bgp_vrf_enable bgpd/bgp_main.c:299 > FRRouting#8 0x7f3d7a0ff8b1 in vrf_enable lib/vrf.c:286 > FRRouting#9 0x7f3d7a0ff8b1 in vrf_enable lib/vrf.c:275 > FRRouting#10 0x7f3d7a12ab66 in zclient_vrf_add lib/zclient.c:2561 > FRRouting#11 0x7f3d7a12eb43 in zclient_read lib/zclient.c:4624 > FRRouting#12 0x7f3d7a0f848b in event_call lib/event.c:2019 > FRRouting#13 0x7f3d7a01e627 in frr_run lib/libfrr.c:1232 > FRRouting#14 0x560ae29e0037 in main bgpd/bgp_main.c:555 > FRRouting#15 0x7f3d79a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The following warning can be seen: > In file included from ./bgpd/bgp_trace.h:21, > from bgpd/bgp_io.c:27: > bgpd/bgp_io.c: In function ‘read_ibuf_work’: > bgpd/bgp_io.c:202:53: warning: passing argument 1 of ‘lttng_ust_tracepoint_cb_frr_bgp___packet_read’ from incompatible pointer type [-Wincompatible-pointer-types] > 202 | frrtrace(2, frr_bgp, packet_read, connection->peer, pkt); > | ~~~~~~~~~~^~~~~~ > | | > | struct peer * > bgpd/bgp_io.c:202:9: note: in expansion of macro ‘frrtrace’ > 202 | frrtrace(2, frr_bgp, packet_read, connection->peer, pkt); > | ^~~~~~~~ > In file included from ./bgpd/bgp_trace.h:21, > from bgpd/bgp_io.c:27: > ./bgpd/bgp_trace.h:57:43: note: expected ‘struct peer_connection *’ but argument is of type ‘struct peer *’ > 57 | TP_ARGS(struct peer_connection *, connection, struct stream *, pkt), > | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~ Use the appropriate connection parameter when calling the trace. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When a BGP listener configured with BMP receives the first BGP IPv6 update from a connected BGP IPv6 peer, the BMP collector receives a withdraw post-policy message. > {"peer_type": "route distinguisher instance", "policy": "post-policy", > "ipv6": true, "peer_ip": "192:167::3", "peer_distinguisher": "444:1", > "peer_asn": 65501, "peer_bgp_id": "192.168.1.3", "timestamp": > "2024-10-29 11:44:47.111962", "bmp_log_type": "withdraw", "afi": 2, > "safi": 1, "ip_prefix": "2001::1125/128", "seq": 22} > {"peer_type": "route distinguisher instance", "policy": "pre-policy", > "ipv6": true, "peer_ip": "192:167::3", "peer_distinguisher": "444:1", > "peer_asn": 65501, "peer_bgp_id": "192.168.1.3", "timestamp": > "2024-10-29 11:44:47.111963", "bmp_log_type": "update", "origin": > "IGP", "as_path": "", "afi": 2, "safi": 1, "nxhp_ip": "192:167::3", > "nxhp_link-local": "fe80::7063:d8ff:fedb:9e11", "ip_prefix": "2001::1125/128", "seq": 23} Actually, the BGP update is not valid, and BMP considers it as a withdraw message. The BGP upate is not valid, because the nexthop reachability is unknown at the time of reception, and no other BMP message is sent. Fix this by re-sending a BMP post update message when nexthop tracking becomes successfull. Generalise the re-sending of messages when nexthop tracking changes. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
1cd55d9
to
454b721
Compare
replaced/enlarged by #17733 . |
why replaced? |
it is enlarged. |
A commit to re-send post-policy BGP updates when nexthop reachability is changed.
A commit related to an ASAN issue encountered with rx_open and tx_open buffers.