-
Notifications
You must be signed in to change notification settings - Fork 748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve CRI->checkpoint logic in the face of downgrades #1123
Conversation
3807f3f
to
88292c8
Compare
88292c8
to
10508b3
Compare
Commit e38a4d8 introduced a method of checkpointing IPAM state to disk and avoiding CRI access. As written, that commit failed to handle upgrade/downgrade/upgrade: on the second upgrade the checkpoint file would exist but contain stale information. This change improves the logic to use a more robust "expand/contract" rollout: Migration phase0 (CNI 1.6): Read/write[*] from CRI only. Migration phase1 (CNI 1.7): Read from CRI. Write to CRI+file. Migration phase2 (CNI 1.8?): Read from file. Write to CRI+file. Migration phase3 (hypothetical): Read/write from file only. This (and the previous) commit implements phase1. [*] There is no "write" to CRI directly, but the logic assumes it happens indirectly as a consequence of a successful CNI add/delete operation. Phase2 and phase3 are thus equivalent in this migration. Note this supports upgrades/downgrades across a single version only (without a node reboot).
10508b3
to
7863f8b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, added some comments.
Will create a follow up CR with some updated documentation and an added test case.
// Migration phase0 (CNI 1.6): Read/write from CRI only. | ||
// Migration phase1 (CNI 1.7): Read from CRI. Write to CRI+file. | ||
// Migration phase2 (CNI 1.8?): Read from file. Write to CRI+file. | ||
// Migration phase3 (hypothetical): Read/write from file only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the clear documentation. I'll update the readme as well in a follow up PR
@@ -503,7 +504,9 @@ func TestIPAMContext_nodeIPPoolTooLow(t *testing.T) { | |||
} | |||
|
|||
func testDatastore() *datastore.DataStore { | |||
return datastore.NewDataStore(log, datastore.NewTestCheckpoint(datastore.CheckpointData{Version: datastore.CheckpointFormatVersion})) | |||
ds := datastore.NewDataStore(log, datastore.NewTestCheckpoint(datastore.CheckpointData{Version: datastore.CheckpointFormatVersion})) | |||
ds.CheckpointMigrationPhase = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since CheckpointMigrationPhase
is still 1
in the default code, we should have a TestNodeInitPhase1
as well. I'll add another test to cover this.
// If the entry was discovered by querying CRI at | ||
// restart rather than by observing an ADD operation | ||
// directly, then we won't have captured the true | ||
// networkname/ifname. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have unified key for in-memory data store and File ? If so we can avoid the above logic. Also AssignIPv4Address doesnt account for key that has been built during restore(i.e., const BackfillNetworkName = "_migrated-from-cri" and const BackfillNetworkIface = "unknown")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have unified key for in-memory data store and File
We do: IPAMKey
is used for both in-memory and file, as you can see in the call to FindAddressForSandbox
2 lines above this diff chunk.
I think the placeholder values are not relevant for AssignIPv4Address
? AssignIPv4Address
is invoked for new CNI invocations, and the placeholder values are only used for assignments that were already created as part of a previous ipamd (and learned through CRI).
Oh, if you're referring to the "return existing assignment for existing IPAMKey" logic in AssignPodIPv4Address
, then yes. We can extend that to also include this _migrated-from-cri fallback, or I would rather just delete those lines tbh. There is no legitimate reason that that function should ever be called twice for the same IPAMKey (without an intervening DEL) - I just postponed this conversation in my earlier PR...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By unified key, i meant, can we populate entire key into file and restore it back so that we don't have to use const BackfillNetworkName = "_migrated-from-cri" and const BackfillNetworkIface = "unknown ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't. The only case where we ever use these placeholder values is the CRI->file upgrade case where we don't know the correct values (because CRI doesn't record/expose it).
We could drop the network name and iface from the key in all cases (and assume there's only a single network and interface per sandbox), and thus be consistent across upgrade-from-CRI and restart-with-file. I added them because I was trying to actually conform to the CNI spec for the first time, which is quite clear on the intended key:
Plugins that store state should do so using a primary key of (network name, CNI_CONTAINERID, CNI_IFNAME).
For completeness: if we just didn't support live CNI upgrades (without waiting for the next node reboot), then we would never have this whole class of challenges - because ADD/DEL would always be paired correctly on a consistent CNI version with a clear "reset" epoch at node reboot. Not a real suggestion, but just pointing out that live-upgrade is a problem we have created for ourselves and why the CNI interface doesn't handle this better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Gus for the detailed note. If i understand your explanation, CRI api called during nodeInit doesn't seem to provide all the information required to form complete IPAM key. If this is the case, then CNI spec guideline that you have quoted above cannot be met using current CRI API. I would say operational correctness should win over spec conformance and we can revisit conformance once CRI API can provide everything required to form the key. If CNI_CONTAINERID is guaranteed to be unique enough, we could just have this as part of the key
Commit e38a4d8 introduced a method of
checkpointing IPAM state to disk and avoiding CRI access. As written,
that commit failed to handle upgrade/downgrade/upgrade: on the second
upgrade the checkpoint file would exist but contain stale information.
This change improves the logic to use a more robust "expand/contract"
rollout:
Migration phase0 (CNI 1.6): Read/write[*] from CRI only.
Migration phase1 (CNI 1.7): Read from CRI. Write to CRI+file.
Migration phase2 (CNI 1.8?): Read from file. Write to CRI+file.
Migration phase3 (hypothetical): Read/write from file only.
This (and the previous) commit implements phase1.
[*] There is no "write" to CRI directly, but the logic assumes it
happens indirectly as a consequence of a successful CNI add/delete
operation. Phase2 and phase3 are thus equivalent in this migration.
Note this supports upgrades/downgrades across a single version
only (without a node reboot).
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.