Optimizations and problem fixing for large scale ecmp from bgp #17229
Merged: ton31337 merged 5 commits into FRRouting:master from donaldsharp:bgp_update_optimizations on Oct 25, 2024
Diff: +91 −44
bgp_update is a very expensive call. Calling evpn_overlay_free when there is no EVPN data to free still has a non-trivial cost. Only call into this function when there is actually data to free.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
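The idea in this commit message can be sketched as a cheap pointer test in front of the cleanup routine. This is a minimal illustration, not the FRR code: `struct route_attr`, `struct overlay_data`, `overlay_free`, `attr_cleanup` and the `overlay_free_calls` counter are all hypothetical names invented for the example.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the real FRR structures. */
struct overlay_data {
	int gw_ip;
};

struct route_attr {
	struct overlay_data *overlay; /* NULL when no EVPN data is attached */
};

/* Counts how often the slow path is actually entered. */
static unsigned long overlay_free_calls;

/* The comparatively expensive cleanup routine. */
static void overlay_free(struct route_attr *attr)
{
	overlay_free_calls++;
	free(attr->overlay);
	attr->overlay = NULL;
}

/* Hot-path helper: a pointer test is cheaper than an unconditional call. */
static inline void attr_cleanup(struct route_attr *attr)
{
	if (attr->overlay)
		overlay_free(attr);
}
```

On the common path, where no overlay data exists, `attr_cleanup` costs one comparison and never enters `overlay_free`.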
donaldsharp force-pushed the bgp_update_optimizations branch 2 times, most recently from fab3725 to 0ea35c6 (October 24, 2024 23:42)
When running bestpath on a very large number of ECMP paths, BGP ends up calling aspath_count a very large number of times, which results in ~15% of CPU runtime being spent in aspath_count_hops. Modify the aspath to keep track of its own count. This results in the function now taking up ~1.5% of the CPU runtime, which is low enough to be ignored for the moment.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
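A minimal sketch of the caching technique described here, assuming a simplified path made of linked segments. The types and functions (`struct seg`, `struct path`, `path_prepend`, `path_count`, `path_count_walk`) are hypothetical and only illustrate keeping a running count next to the data instead of recomputing it on every comparison; the real aspath code is more involved.

```c
#include <stddef.h>

/* Hypothetical simplified AS path: a singly linked list of segments. */
struct seg {
	struct seg *next;
	unsigned int hops; /* hops contributed by this segment */
};

struct path {
	struct seg *segments;
	unsigned int count; /* cached total, updated on every change */
};

/* O(n) walk: the cost paid on every call when nothing is cached. */
static unsigned int path_count_walk(const struct path *p)
{
	unsigned int total = 0;

	for (const struct seg *s = p->segments; s; s = s->next)
		total += s->hops;
	return total;
}

/* Mutators keep the cache in sync so readers never walk the list. */
static void path_prepend(struct path *p, struct seg *s)
{
	s->next = p->segments;
	p->segments = s;
	p->count += s->hops;
}

/* O(1) read intended for the comparison hot path. */
static inline unsigned int path_count(const struct path *p)
{
	return p->count;
}
```

The trade-off is that every function that modifies the path must also update `count`; a missed update would silently give wrong comparison results, so the walk is worth keeping around for consistency checks.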
This is just a small optimization, but when calling path_info_cmp hundreds of millions of times this adds up.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(gdb) bt
0  futex_wait (private=0, expected=2, futex_word=0x5c438e9a98d8) at ../sysdeps/nptl/futex-internal.h:146
1  __GI___lll_lock_wait (futex=futex@entry=0x5c438e9a98d8, private=0) at ./nptl/lowlevellock.c:49
2  0x00007af16d698002 in lll_mutex_lock_optimized (mutex=0x5c438e9a98d8) at ./nptl/pthread_mutex_lock.c:48
3  ___pthread_mutex_lock (mutex=0x5c438e9a98d8) at ./nptl/pthread_mutex_lock.c:93
4  0x00005c4369c17e70 in _frr_mtx_lock (mutex=0x5c438e9a98d8, func=0x5c4369dc2750 <__func__.265> "bgp_notify_send_internal") at ./lib/frr_pthread.h:258
5  0x00005c4369c1a07a in bgp_notify_send_internal (connection=0x5c438e9a98c0, code=8 '\b', sub_code=0 '\000', data=0x0, datalen=0, use_curr=true) at bgpd/bgp_packet.c:928
6  0x00005c4369c1a707 in bgp_notify_send (connection=0x5c438e9a98c0, code=8 '\b', sub_code=0 '\000') at bgpd/bgp_packet.c:1069
7  0x00005c4369bea422 in bgp_stop_with_notify (connection=0x5c438e9a98c0, code=8 '\b', sub_code=0 '\000') at bgpd/bgp_fsm.c:1597
8  0x00005c4369c18480 in bgp_packet_add (connection=0x5c438e9a98c0, peer=0x5c438e9b6010, s=0x7af15c06bf70) at bgpd/bgp_packet.c:151
9  0x00005c4369c19816 in bgp_keepalive_send (peer=0x5c438e9b6010) at bgpd/bgp_packet.c:639
10 0x00005c4369bf01fd in peer_process (hb=0x5c438ed05520, arg=0x7af16bdffaf0) at bgpd/bgp_keepalives.c:111
11 0x00007af16dacd8e6 in hash_iterate (hash=0x7af15c000be0, func=0x5c4369bf005e <peer_process>, arg=0x7af16bdffaf0) at lib/hash.c:252
12 0x00005c4369bf0679 in bgp_keepalives_start (arg=0x5c438e0db110) at bgpd/bgp_keepalives.c:214
13 0x00007af16dac9932 in frr_pthread_inner (arg=0x5c438e0db110) at lib/frr_pthread.c:180
14 0x00007af16d694ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
15 0x00007af16d726850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb)

The bgp keepalive pthread gets deadlocked with itself and consequently the bgp master pthread gets locked when it attempts to lock the peerhash_mtx, since it is also locked by the keepalive_pthread.
The keepalive pthread locks the peerhash_mtx in bgp_keepalives_start. Next, the connection->io_mtx mutex is locked in bgp_keepalive_send, and when that code notices a problem it invokes bgp_stop_with_notify, which tries to lock the same mutex again (and of course the relock causes it to get stuck on itself). This generates a deadlock condition. Modify the code to hold the connection->io_mtx for as short a time as possible.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
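The fix described above can be sketched as follows: take the non-recursive mutex only around the queue update, record whether an error occurred, release the lock, and only then call the error path that takes the same mutex itself. The `struct conn`, `conn_enqueue`, `conn_stop` and `QUEUE_LIMIT` names are hypothetical and stand in for the real connection code, which this sketch does not reproduce.

```c
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_LIMIT 2 /* hypothetical output queue limit */

struct conn {
	pthread_mutex_t io_mtx; /* non-recursive mutex */
	int queued;
	bool stopped;
};

/* Error path: takes io_mtx itself, so callers must not hold it. */
static void conn_stop(struct conn *c)
{
	pthread_mutex_lock(&c->io_mtx);
	c->queued = 0;
	c->stopped = true;
	pthread_mutex_unlock(&c->io_mtx);
}

/* Queue one packet; the lock covers only the queue update. */
static bool conn_enqueue(struct conn *c)
{
	bool overflow;

	pthread_mutex_lock(&c->io_mtx);
	overflow = (c->queued >= QUEUE_LIMIT);
	if (!overflow)
		c->queued++;
	pthread_mutex_unlock(&c->io_mtx);

	/* The lock is already released when the error path re-acquires it. */
	if (overflow)
		conn_stop(c);

	return !overflow;
}
```

Had `conn_stop` been called while `io_mtx` was still held, the second `pthread_mutex_lock` on a default mutex would block the thread on itself, which is the situation the backtrace shows.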
0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:44
1  __pthread_kill_internal (signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:78
2  __GI___pthread_kill (threadid=130719886083648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
3  0x000076e399e42476 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
4  0x000076e39a34f950 in core_handler (signo=6, siginfo=0x76e3985fca30, context=0x76e3985fc900) at lib/sigevent.c:258
5  <signal handler called>
6  __pthread_kill_implementation (no_tid=0, signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:44
7  __pthread_kill_internal (signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:78
8  __GI___pthread_kill (threadid=130719886083648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
9  0x000076e399e42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
10 0x000076e399e287f3 in __GI_abort () at ./stdlib/abort.c:79
11 0x000076e39a39874b in _zlog_assert_failed (xref=0x76e39a46cca0 <_xref.27>, extra=0x0) at lib/zlog.c:789
12 0x000076e39a369dde in cancel_event_helper (m=0x5eda32df5e40, arg=0x5eda33afeed0, flags=1) at lib/event.c:1428
13 0x000076e39a369ef6 in event_cancel_event_ready (m=0x5eda32df5e40, arg=0x5eda33afeed0) at lib/event.c:1470
14 0x00005eda0a94a5b3 in bgp_stop (connection=0x5eda33afeed0) at bgpd/bgp_fsm.c:1355
15 0x00005eda0a94b4ae in bgp_stop_with_notify (connection=0x5eda33afeed0, code=8 '\b', sub_code=0 '\000') at bgpd/bgp_fsm.c:1610
16 0x00005eda0a979498 in bgp_packet_add (connection=0x5eda33afeed0, peer=0x5eda33b11800, s=0x76e3880daf90) at bgpd/bgp_packet.c:152
17 0x00005eda0a97a80f in bgp_keepalive_send (peer=0x5eda33b11800) at bgpd/bgp_packet.c:639
18 0x00005eda0a9511fd in peer_process (hb=0x5eda33c9ab80, arg=0x76e3985ffaf0) at bgpd/bgp_keepalives.c:111
19 0x000076e39a2cd8e6 in hash_iterate (hash=0x76e388000be0, func=0x5eda0a95105e <peer_process>, arg=0x76e3985ffaf0) at lib/hash.c:252
20 0x00005eda0a951679 in bgp_keepalives_start (arg=0x5eda3306af80) at bgpd/bgp_keepalives.c:214
21 0x000076e39a2c9932 in frr_pthread_inner (arg=0x5eda3306af80) at lib/frr_pthread.c:180
22 0x000076e399e94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
23 0x000076e399f26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) f 12
12 0x000076e39a369dde in cancel_event_helper (m=0x5eda32df5e40, arg=0x5eda33afeed0, flags=1) at lib/event.c:1428
1428 assert(m->owner == pthread_self());

In this decode, the attempt to cancel the connection's events from the wrong thread is causing the crash. Modify the code to create an event on the bm->master to cancel the events for the connection.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
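The pattern described here, handing the cancel over to the thread that owns the event loop instead of performing it from another thread, can be sketched with a small request queue. This is an assumption-laden illustration and not the FRR event API: `struct loop`, `loop_request_cancel`, `loop_run_pending` and `loop_cancel_events` are hypothetical names, and the real code schedules an event on bm->master rather than using a hand-rolled array.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8 /* hypothetical fixed queue size */

/* Hypothetical miniature event loop. */
struct loop {
	pthread_t owner;            /* only this thread may cancel events */
	pthread_mutex_t mtx;        /* protects the pending request list */
	void *pending[MAX_PENDING]; /* connections waiting for a cancel */
	size_t npending;
	size_t cancelled;           /* cancels performed by the owner */
};

/* Only valid on the owner thread; the code in the backtrace asserts this. */
static bool loop_cancel_events(struct loop *l, void *conn)
{
	(void)conn;
	if (!pthread_equal(l->owner, pthread_self()))
		return false;
	l->cancelled++;
	return true;
}

/* Safe from any thread: record the request for the owner to handle. */
static bool loop_request_cancel(struct loop *l, void *conn)
{
	bool ok = false;

	pthread_mutex_lock(&l->mtx);
	if (l->npending < MAX_PENDING) {
		l->pending[l->npending++] = conn;
		ok = true;
	}
	pthread_mutex_unlock(&l->mtx);
	return ok;
}

/* Run by the owner thread on its next pass through the loop. */
static size_t loop_run_pending(struct loop *l)
{
	void *batch[MAX_PENDING];
	size_t n;

	/* Copy the requests out so the cancels run without the lock held. */
	pthread_mutex_lock(&l->mtx);
	n = l->npending;
	for (size_t i = 0; i < n; i++)
		batch[i] = l->pending[i];
	l->npending = 0;
	pthread_mutex_unlock(&l->mtx);

	for (size_t i = 0; i < n; i++)
		loop_cancel_events(l, batch[i]);
	return n;
}
```

A non-owner thread such as the keepalive pthread would call `loop_request_cancel`, and the cancel itself then happens later inside `loop_run_pending` on the owner thread, so the ownership check is never violated.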
donaldsharp force-pushed the bgp_update_optimizations branch from 0ea35c6 to 138935a (October 25, 2024 01:01)
Jafaral approved these changes on Oct 25, 2024
lguohan pushed a commit to sonic-net/sonic-buildimage that referenced this pull request on Dec 19, 2024

#### Why I did it
Added patches from FRR to support scale of BGP neighbors to 256/514. Below are the patches

| Patch | FRR Pull request |
| ------ | --------- |
| 0069-lib-nexthop-code-should-use-uint16_t-for-nexthop-cou.patch | FRRouting/frr#16967 |
| 0070-Allow-16-bit-size-for-nexthops.patch | FRRouting/frr#17023 |
| 0071-zebra-Only-notify-dplane-work-pthread-when-needed.patch | FRRouting/frr#17062 |
| 0072-Fix-up-improper-handling-of-nexthops-for-nexthop-tra.patch | FRRouting/frr#17076 |
| 0073-remove-in6addr-cmp.patch | FRRouting/frr#17312 |
| 0074-bgp-best-port-reordering.patch | FRRouting/frr#15572 |
| 0075-bgp-mp-info-changes.patch | FRRouting/frr#16961 |
| 0076-Optimizations-and-problem-fixing-for-large-scale-ecmp-from-bgp.patch | FRRouting/frr#17229 |
dgsudharsan added a commit to dgsudharsan/sonic-buildimage that referenced this pull request on Dec 23, 2024
github-actions bot pushed a commit to bradh352/sonic-buildimage that referenced this pull request on Jan 2, 2025
mssonicbld added a commit to mssonicbld/sonic-buildimage-msft that referenced this pull request on Jan 8, 2025
VladimirKuk pushed a commit to Marvell-switching/sonic-buildimage that referenced this pull request on Jan 21, 2025
See the individual commits.