[fast-reboot] arp table will be cleared after swssconfig restores it #5841

Closed
dennis0113 opened this issue Nov 6, 2020 · 1 comment · Fixed by sonic-net/sonic-swss#1498


Description

Steps to reproduce the issue:
1. Run the fast-reboot command.
2. Compare the entries in arp.json with the ARP entries in the DB.

Describe the results you received:
=> None of the ARP entries in arp.json appear in the DB.

Describe the results you expected:
=> All ARP entries backed up in arp.json are restored to the DB.

Additional information you deem important (e.g. issue happens only occasionally):

**Root Cause:**
```  
swssconfig and neighsyncd race when writing to and clearing the producer state table (APP_NEIGH_TABLE_NAME).

1. When swssconfig restores arp.json, the data is set in redis under the key "_NEIGH_TABLE".
   Orch is then expected to process it and write the data back to redis under the key "NEIGH_TABLE".

2. However, during neighsyncd initialization, "m_AppRestartAssist->registerAppTable" is called,
   and it clears the data in redis.

3. Currently, neighsyncd is started after swssconfig exits:
   [program:neighsyncd]
   command=/usr/bin/neighsyncd
   priority=7
   autostart=false
   autorestart=false
   stdout_logfile=syslog
   stderr_logfile=syslog
   dependent_startup=true
   dependent_startup_wait_for=swssconfig:exited

   So if orch has not processed the data before neighsyncd initializes, this issue appears.

4. Could other daemons that also call 'm_AppRestartAssist->registerAppTable'
   during initialization hit the same race condition?
```
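
The ordering problem described above can be reproduced with a toy in-memory model. This is only a sketch, not sonic-swss code: the dictionary stands in for redis, the function names are hypothetical, and it assumes (following point 3) that the cleanup only removes data still staged under "_NEIGH_TABLE".

```python
# Toy model of the race; a plain dict stands in for redis.
# All function names are hypothetical, not sonic-swss APIs.
PENDING = "_NEIGH_TABLE"  # staging key written by swssconfig
APPLIED = "NEIGH_TABLE"   # key written back once orch has processed the data


def swssconfig_restore(db, entries):
    """Point 1: stage the entries from arp.json under the pending key."""
    db.setdefault(PENDING, {}).update(entries)


def orch_consume(db):
    """Orch moves whatever is staged into the applied table."""
    staged = db.pop(PENDING, {})
    db.setdefault(APPLIED, {}).update(staged)


def neighsyncd_init(db):
    """Point 2: the init-time cleanup drops data that is still staged (assumption)."""
    db.pop(PENDING, None)


def run(order):
    """Restore one neighbor, run the remaining steps in the given order."""
    db = {}
    swssconfig_restore(
        db, {"Vlan1000:192.168.0.2": {"neigh": "52:54:00:00:00:01", "family": "IPv4"}}
    )
    for step in order:
        step(db)
    return db.get(APPLIED, {})


lost = run([neighsyncd_init, orch_consume])  # neighsyncd wins the race
kept = run([orch_consume, neighsyncd_init])  # orch wins the race
print(len(lost), len(kept))  # prints "0 1"
```

In this model the restored entry survives only when orch consumes the staged data before neighsyncd initializes, which matches the behaviour reported in the issue.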

dennis0113 commented Nov 6, 2020

[fast-reboot] Fix arp entries will be cleared after swssconfig restores #5842

I have made a change that resolves the race condition.
Could someone review it and check whether it has any side effects, or whether there is a more suitable solution?

@dennis0113 dennis0113 reopened this Nov 6, 2020
qiluo-msft pushed a commit to sonic-net/sonic-swss that referenced this issue Nov 18, 2020
…s enable. (#1498)

This commit addresses the issue where the NEIGH_TABLE loaded by swssconfig
after fast-reboot is cleared by neighsyncd.

**What I did**
Fix sonic-net/sonic-buildimage#5841 and sonic-net/sonic-buildimage#5580

We found that the neighbor table loaded by ```swssconfig``` from ```arp.json``` after ```fast-reboot``` is mistakenly cleared by ```neighsyncd``` during its initialization. This PR adds a ```WarmStart``` check before the cleanup, so the cleanup only runs if ```WarmStart``` is enabled.
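
The shape of this guard can be sketched in a few lines. This is an illustration only, not the code from the PR: `register_app_table` and its `warm_start_enabled` parameter are hypothetical stand-ins, a dict stands in for redis, and the sketch assumes that WarmStart is not enabled during a fast-reboot, as the description above implies.

```python
PENDING = "_NEIGH_TABLE"  # staging key named in the issue


def register_app_table(db, warm_start_enabled):
    """Hypothetical stand-in for the init-time cleanup.

    Clears staged data only when warm start is enabled.
    Returns True if the cleanup ran.
    """
    if not warm_start_enabled:
        return False
    db.pop(PENDING, None)
    return True


staged = {PENDING: {"Vlan1000:192.168.0.2": {"neigh": "52:54:00:00:00:01"}}}

# Fast-reboot: warm start is not enabled, so the restored data is left alone.
fast_reboot_db = dict(staged)
register_app_table(fast_reboot_db, warm_start_enabled=False)

# Warm start: the cleanup still runs as before.
warm_start_db = dict(staged)
register_app_table(warm_start_db, warm_start_enabled=True)

print(PENDING in fast_reboot_db, PENDING in warm_start_db)  # prints "True False"
```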

**Why I did it**
This PR fixes the issue where the ARP table is not recovered after fast-reboot.

**How I verified it**
Verified on an Arista-7260 running the 201911 image.
1. Run a test that populates ARP entries on the DUT, such as ```test_fast_reboot```
2. Issue a fast-reboot
3. Verify that the ```arp.json``` backed up by ```fast-reboot-dump.py``` is loaded and NEIGH_TABLE is restored.
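
A check like step 3 could be scripted roughly as follows. This is a sketch under assumptions: it assumes `arp.json` is a list of records whose keys look like `NEIGH_TABLE:<interface>:<ip>` alongside an `OP` field, and that the NEIGH_TABLE keys present in the DB have already been collected into a list by some other means. The helper names are hypothetical.

```python
import json


def expected_neigh_keys(arp_json_text):
    """Collect the NEIGH_TABLE keys from an arp.json dump (assumed layout)."""
    keys = set()
    for record in json.loads(arp_json_text):
        for key in record:
            if key.startswith("NEIGH_TABLE:"):
                keys.add(key)
    return keys


def missing_neighbors(arp_json_text, db_keys):
    """Return the keys that are in arp.json but absent from the DB, sorted."""
    return sorted(expected_neigh_keys(arp_json_text) - set(db_keys))


# Made-up sample data for illustration.
sample = json.dumps([
    {"NEIGH_TABLE:Vlan1000:192.168.0.2": {"neigh": "52:54:00:00:00:01", "family": "IPv4"},
     "OP": "SET"},
    {"NEIGH_TABLE:Vlan1000:192.168.0.3": {"neigh": "52:54:00:00:00:02", "family": "IPv4"},
     "OP": "SET"},
])

print(missing_neighbors(sample, ["NEIGH_TABLE:Vlan1000:192.168.0.2"]))
# prints ['NEIGH_TABLE:Vlan1000:192.168.0.3']
```

An empty result would mean every backed-up neighbor was found in the DB.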
abdosi pushed a commit to sonic-net/sonic-swss that referenced this issue Dec 4, 2020
…s enable. (#1498)

daall pushed a commit to daall/sonic-swss that referenced this issue Dec 7, 2020
…s enable. (sonic-net#1498)
