teamd gets stuck in a state and stays in it until we restart it. #48

Open
lguohan opened this issue Mar 9, 2020 · 0 comments

lguohan commented Mar 9, 2020

We recently found a critical issue in libteam.
Occasionally (very rarely), teamd gets stuck in a state and stays in it until we restart it.
In this state:

  1. teamd produces no log output.
  2. teamd sends out LACP packets as usual. The packets carry the State Flags [Activity, Aggregation, Synchronization, Collecting, Distributing].
  3. The teamd member ports are disabled:
    root@host:/# teamnl PortChannel1013 options
    lb_port_stats (port:Ethernet48) \00\00\00\00\00\00\00\00
    queue_id (port:Ethernet48) 0
    priority (port:Ethernet48) 0
    user_linkup_enabled (port:Ethernet48) false
    user_linkup (port:Ethernet48) true
    enabled (port:Ethernet48) false
    lb_stats_refresh_interval 0
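
One way to spot this state from a script is to parse the `teamnl <team> options` output shown above. The sketch below is a heuristic, not part of libteam: the function names are hypothetical, and it assumes that a port reporting `user_linkup true` together with `enabled false` is worth flagging. A port can legitimately look like this for a short time while LACP negotiates, so only a persistent match suggests the stuck state.

```python
import re

# Matches per-port lines such as "enabled (port:Ethernet48) false".
# Team-wide lines like "lb_stats_refresh_interval 0" do not match.
PORT_OPTION = re.compile(r"^\s*(\S+)\s+\(port:([^)\s]+)\)\s+(.*)$")


def parse_port_options(text):
    """Return {port: {option: value}} parsed from `teamnl <team> options` output."""
    ports = {}
    for line in text.splitlines():
        match = PORT_OPTION.match(line)
        if match:
            name, port, value = match.groups()
            ports.setdefault(port, {})[name] = value.strip()
    return ports


def suspect_ports(text):
    """List ports whose link is up but which teamd reports as disabled.

    Assumption (not from libteam): this combination is treated as suspicious.
    """
    return sorted(
        port
        for port, opts in parse_port_options(text).items()
        if opts.get("user_linkup") == "true" and opts.get("enabled") == "false"
    )


# Output copied from the report above.
SAMPLE = r"""
lb_port_stats (port:Ethernet48) \00\00\00\00\00\00\00\00
queue_id (port:Ethernet48) 0
priority (port:Ethernet48) 0
user_linkup_enabled (port:Ethernet48) false
user_linkup (port:Ethernet48) true
enabled (port:Ethernet48) false
lb_stats_refresh_interval 0
"""

if __name__ == "__main__":
    print(suspect_ports(SAMPLE))
```

In practice the text would come from running `teamnl PortChannel1013 options` on the device and checking the result repeatedly over some interval.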

I analyzed our teamd logs and found that:

  1. When teamd first enables the member ports and then receives "carrier up", everything works as expected. This is the order in 99% of cases.
  2. When teamd receives the "carrier up" message first, it never logs anything after that.

For example:
Working session:

Feb  7 23:16:59.385812 str-s6100-acs-1 DEBUG teamd#teamd_PortChannel16[37]: Ethernet16: Adding port (found ifindex "28").
Feb  7 23:16:59.464625 str-s6100-acs-1 DEBUG teamd#teamd_PortChannel16[37]: Ethernet17: Adding port (found ifindex "29").
Feb  7 23:17:10.455175 str-s6100-acs-1 DEBUG teamd#teamd_PortChannel16[37]: Ethernet17: Enabling port
Feb  7 23:17:10.477280 str-s6100-acs-1 DEBUG teamd#teamd_PortChannel16[37]: Ethernet16: Enabling port
Feb  7 23:17:10.477280 str-s6100-acs-1 DEBUG teamd#teamd_PortChannel16[37]: Enable carrier. Number of enabled ports 2 >= configured min_ports 2
Feb  7 23:17:10.477280 str-s6100-acs-1 INFO teamd#teamd_PortChannel16[37]: carrier changed to UP

Stuck session:

Feb 19 22:57:29.826177 str-s6100-acs-1 DEBUG teamd#teamd_PortChannel1019[170]: Ethernet72: Adding port (found ifindex "59").
Feb 19 22:57:30.246222 str-s6100-acs-1 DEBUG teamd#teamd_PortChannel1019[170]: Enable carrier. Number of enabled ports 1 >= configured min_ports 1
Feb 19 22:57:30.252943 str-s6100-acs-1 INFO teamd#teamd_PortChannel1019[170]: carrier changed to UP
Feb 19 22:57:30.263326 str-s6100-acs-1 DEBUG teamd#teamd_PortChannel1019[170]: Enable carrier. Number of enabled ports 1 >= configured min_ports 1
Feb 19 22:57:30.263707 str-s6100-acs-1 DEBUG teamd#teamd_PortChannel1019[170]: Enable carrier. Number of enabled ports 1 >= configured min_ports 1

After that, teamd logs no further messages, but it still sends LACP updates, and traffic is blackholed.
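
The ordering difference between the two sessions can be checked mechanically. The following is a minimal sketch, assuming the syslog format shown in the excerpts above; the function name and the regular expression are hypothetical and not part of teamd. It flags any teamd instance that logs "Enable carrier" before it has logged "Enabling port" for any member.

```python
import re

# Extracts the instance name and message from lines such as
# "... DEBUG teamd#teamd_PortChannel16[37]: Ethernet17: Enabling port".
LINE = re.compile(r"teamd#(teamd_\S+?)\[\d+\]:\s+(.*)$")


def carrier_before_enable(log_text):
    """Return teamd instances whose 'Enable carrier' precedes any 'Enabling port'."""
    seen_enable = set()
    flagged = []
    for line in log_text.splitlines():
        match = LINE.search(line)
        if not match:
            continue
        team, msg = match.groups()
        if "Enabling port" in msg:
            seen_enable.add(team)
        elif msg.startswith("Enable carrier") and team not in seen_enable:
            if team not in flagged:
                flagged.append(team)
    return flagged


# Log lines copied from the report above.
WORKING = """
Feb  7 23:16:59.385812 str-s6100-acs-1 DEBUG teamd#teamd_PortChannel16[37]: Ethernet16: Adding port (found ifindex "28").
Feb  7 23:17:10.455175 str-s6100-acs-1 DEBUG teamd#teamd_PortChannel16[37]: Ethernet17: Enabling port
Feb  7 23:17:10.477280 str-s6100-acs-1 DEBUG teamd#teamd_PortChannel16[37]: Enable carrier. Number of enabled ports 2 >= configured min_ports 2
"""

STUCK = """
Feb 19 22:57:29.826177 str-s6100-acs-1 DEBUG teamd#teamd_PortChannel1019[170]: Ethernet72: Adding port (found ifindex "59").
Feb 19 22:57:30.246222 str-s6100-acs-1 DEBUG teamd#teamd_PortChannel1019[170]: Enable carrier. Number of enabled ports 1 >= configured min_ports 1
Feb 19 22:57:30.252943 str-s6100-acs-1 INFO teamd#teamd_PortChannel1019[170]: carrier changed to UP
"""

if __name__ == "__main__":
    print(carrier_before_enable(WORKING))
    print(carrier_before_enable(STUCK))
```

On these excerpts the working session yields an empty list and the stuck session yields `['teamd_PortChannel1019']`. The check only reflects the correlation observed in these logs; it does not establish the root cause.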
