[fast-reboot] Add a check for warmstart before cleaning up neigh table #1498

Merged: 1 commit, Nov 18, 2020
7 changes: 5 additions & 2 deletions warmrestart/warmRestartAssist.cpp
@@ -79,7 +79,10 @@ void AppRestartAssist::registerAppTable(const std::string &tableName, ProducerSt
     m_psTables[tableName] = psTable;
 
     // Clear the producerstate table to make sure no pending data for the AppTable
-    psTable->clear();
+    if (m_warmStartInProgress)
+    {
+        psTable->clear();
Contributor

It looks like this change will affect both neighsyncd and natsyncd. @lguohan, is it OK to not clear the table for natsyncd when the DUT is warm-rebooting? If not, and to limit the change to neighsyncd, we could also check that the table name is the one neighsyncd uses.

Contributor

Wouldn't NAT hit the same issue if this protection were not there?

Contributor

It is not clear whether the NAT tables need to be cleared unconditionally. This needs a bit of digging into how NAT uses this shared class. NAT calls this method for 4 tables.

Collaborator

This is a bug introduced by #1126, where the refactoring of the library to support multiple tables made it call psTable->clear() unconditionally. The fix as done here should be correct. NAT tables, or any other client, should not use the library to flush the producer state table in non-warm-reboot cases.

Collaborator

By the way, the reason we call psTable->clear() is to make sure the relevant table does not change in some corner cases after we dump it to memory; this is required only in the warm-reboot/restart case, before we dump the table. Also, since the daemon using the library was itself the producer, it was safe to do so. However, that assumption breaks if swssconfig loads the table at the same time, which is what happens in non-warm-reboot cases for the ARP and NAT tables, etc. In those cases, the library incorrectly cleared the requests from swssconfig and caused the reported issues.

+    }
     m_appTables[tableName] = new Table(m_pipeLine, tableName, false);
 }
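
The failure mode discussed in the review thread can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `MockProducerStateTable`, `MockRestartAssist`, `survivingEntries`, and the key string are hypothetical stand-ins invented for illustration, not the real swss classes or data. It models a second writer (such as swssconfig) queuing an entry before the daemon registers its table, and compares the unconditional clear with the guarded one.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for a producer state table: it only models a queue of
// pending (not yet consumed) entries and a clear() that drops them all.
class MockProducerStateTable
{
public:
    void set(const std::string &key) { m_pending.push_back(key); }
    void clear() { m_pending.clear(); }
    std::size_t pending() const { return m_pending.size(); }

private:
    std::deque<std::string> m_pending;
};

// Hypothetical stand-in for the restart-assist library, reduced to the guard.
class MockRestartAssist
{
public:
    explicit MockRestartAssist(bool warmStartInProgress)
        : m_warmStartInProgress(warmStartInProgress) {}

    // Pre-fix behaviour: pending entries are dropped on every start.
    void registerUnconditional(MockProducerStateTable &psTable) const
    {
        psTable.clear();
    }

    // Post-fix behaviour: pending entries are dropped only during warm start.
    void registerGuarded(MockProducerStateTable &psTable) const
    {
        if (m_warmStartInProgress)
        {
            psTable.clear();
        }
    }

private:
    bool m_warmStartInProgress;
};

// Returns how many pending entries survive table registration.
std::size_t survivingEntries(bool warmStart, bool guarded)
{
    MockProducerStateTable psTable;
    // Another writer queues an entry before the daemon registers the table.
    // The key is an illustrative placeholder, not real data.
    psTable.set("NEIGH_TABLE:Ethernet0:10.0.0.1");

    MockRestartAssist assist(warmStart);
    if (guarded)
    {
        assist.registerGuarded(psTable);
    }
    else
    {
        assist.registerUnconditional(psTable);
    }
    return psTable.pending();
}
```

In this model, `survivingEntries(false, false)` returns 0 (the other writer's entry is lost on a non-warm start), while `survivingEntries(false, true)` returns 1. With `warmStart` set to true, both variants return 0, which matches the intent of clearing only before the warm-restart dump.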

@@ -150,7 +153,7 @@ void AppRestartAssist::readTablesToMap()
}
return;
}

/*
* Check and insert to CacheMap Logic:
* if delete_key: