Improve CRI->checkpoint logic in the face of downgrades #1123

Merged
mogren merged 2 commits into aws:master on Aug 11, 2020

Conversation

anguslees (Contributor) commented Aug 7, 2020

Commit e38a4d8 introduced a method of
checkpointing IPAM state to disk and avoiding CRI access. As written,
that commit failed to handle upgrade/downgrade/upgrade: on the second
upgrade the checkpoint file would exist but contain stale information.

This change improves the logic to use a more robust "expand/contract"
rollout:

Migration phase0 (CNI 1.6): Read/write[*] from CRI only.
Migration phase1 (CNI 1.7): Read from CRI. Write to CRI+file.
Migration phase2 (CNI 1.8?): Read from file. Write to CRI+file.
Migration phase3 (hypothetical): Read/write from file only.

This commit (and the previous one) implements phase1.

[*] There is no "write" to CRI directly, but the logic assumes it
happens indirectly as a consequence of a successful CNI add/delete
operation. Phase2 and phase3 are thus equivalent in this migration.

Note that this supports upgrades/downgrades across a single version
only (without a node reboot).
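
To make the phased logic concrete, here is a minimal sketch of how restart-recovery and checkpointing could branch on the phase. The helper names (readFromCRI, readFromFile, writeCheckpointFile) and their stub bodies are assumptions for illustration, not the merged code:

package datastore

import "fmt"

// Minimal sketch: the real DataStore carries much more state.
type DataStore struct {
	CheckpointMigrationPhase int
}

func (ds *DataStore) readFromCRI() error         { return nil } // stub: query CRI for running sandboxes
func (ds *DataStore) readFromFile() error        { return nil } // stub: parse the on-disk checkpoint
func (ds *DataStore) writeCheckpointFile() error { return nil } // stub: atomically rewrite the checkpoint

// restoreState runs once at ipamd startup.
func (ds *DataStore) restoreState() error {
	switch ds.CheckpointMigrationPhase {
	case 1:
		// Phase1 (CNI 1.7): CRI is still the source of truth at
		// restart, so a downgrade back to 1.6 loses nothing.
		return ds.readFromCRI()
	case 2:
		// Phase2 (CNI 1.8?): the checkpoint file is now authoritative.
		return ds.readFromFile()
	default:
		return fmt.Errorf("unsupported checkpoint migration phase %d", ds.CheckpointMigrationPhase)
	}
}

// checkpoint runs after every successful add/delete. In both phase1 and
// phase2 the file is rewritten, so it can never go stale across an
// upgrade/downgrade/upgrade cycle.
func (ds *DataStore) checkpoint() error {
	return ds.writeCheckpointFile()
}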

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

mogren (Contributor) left a comment


Looks good, added some comments.

Will create a follow-up CR with some updated documentation and an added test case.

Comment on lines +66 to +69
// Migration phase0 (CNI 1.6): Read/write from CRI only.
// Migration phase1 (CNI 1.7): Read from CRI. Write to CRI+file.
// Migration phase2 (CNI 1.8?): Read from file. Write to CRI+file.
// Migration phase3 (hypothetical): Read/write from file only.
Contributor

Thanks a lot for the clear documentation. I'll update the readme as well in a follow-up PR.

@@ -503,7 +504,9 @@ func TestIPAMContext_nodeIPPoolTooLow(t *testing.T) {
 }
 
 func testDatastore() *datastore.DataStore {
-	return datastore.NewDataStore(log, datastore.NewTestCheckpoint(datastore.CheckpointData{Version: datastore.CheckpointFormatVersion}))
+	ds := datastore.NewDataStore(log, datastore.NewTestCheckpoint(datastore.CheckpointData{Version: datastore.CheckpointFormatVersion}))
+	ds.CheckpointMigrationPhase = 2
+	return ds
Contributor

Since CheckpointMigrationPhase is still 1 in the default code, we should have a TestNodeInitPhase1 as well. I'll add another test to cover this.
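
For illustration, a hypothetical sketch of what a phase1 variant of the fixture could look like (testDatastorePhase1 is an assumed name, not necessarily the test that was added):

// Hypothetical phase1 fixture: mirrors testDatastore() above but keeps
// the default migration phase, so nodeInit exercises the read-from-CRI path.
func testDatastorePhase1() *datastore.DataStore {
	ds := datastore.NewDataStore(log, datastore.NewTestCheckpoint(datastore.CheckpointData{Version: datastore.CheckpointFormatVersion}))
	ds.CheckpointMigrationPhase = 1
	return ds
}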

@jayanthvn jayanthvn requested a review from srini-ram August 7, 2020 20:13
@mogren mogren merged commit 170d3c1 into aws:master Aug 11, 2020
// If the entry was discovered by querying CRI at
// restart rather than by observing an ADD operation
// directly, then we won't have captured the true
// networkname/ifname.
Contributor

Can we have a unified key for the in-memory data store and the file? If so, we can avoid the above logic. Also, AssignIPv4Address doesn't account for the key that gets built during restore (i.e., const BackfillNetworkName = "_migrated-from-cri" and const BackfillNetworkIface = "unknown").

Contributor Author

> Can we have a unified key for the in-memory data store and the file?

We do: IPAMKey is used both in memory and in the file, as you can see in the call to FindAddressForSandbox two lines above this diff chunk.

I think the placeholder values are not relevant for AssignIPv4Address? AssignIPv4Address is invoked for new CNI invocations, and the placeholder values are only used for assignments that were already created as part of a previous ipamd (and learned through CRI).

Oh, if you're referring to the "return existing assignment for existing IPAMKey" logic in AssignPodIPv4Address, then yes. We could extend that to also include this _migrated-from-cri fallback, or, to be honest, I would rather just delete those lines. There is no legitimate reason for that function to be called twice for the same IPAMKey (without an intervening DEL); I just postponed this conversation in my earlier PR...
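
For reference, a minimal sketch of the key shape and the fallback lookup under discussion, assuming paraphrased field names and a plain map in place of the real datastore internals:

// IPAMKey sketch: the (network name, container ID, ifname) triple used
// as the primary key both in memory and in the checkpoint file.
type IPAMKey struct {
	NetworkName string `json:"networkName"`
	ContainerID string `json:"containerID"`
	IfName      string `json:"ifname"`
}

// Placeholders for assignments learned from CRI at restart, quoted from
// the comment above.
const (
	BackfillNetworkName  = "_migrated-from-cri"
	BackfillNetworkIface = "unknown"
)

type AddressInfo struct{ Address string } // trimmed to one field for the sketch

// findAddress sketches the two-step lookup: exact key first, then the
// backfill key for entries whose true networkname/ifname was never seen.
func findAddress(assigned map[IPAMKey]*AddressInfo, key IPAMKey) *AddressInfo {
	if addr, ok := assigned[key]; ok {
		return addr
	}
	backfill := IPAMKey{
		NetworkName: BackfillNetworkName,
		ContainerID: key.ContainerID,
		IfName:      BackfillNetworkIface,
	}
	return assigned[backfill]
}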

Contributor

By unified key, I meant: can we populate the entire key into the file and restore it back, so that we don't have to use const BackfillNetworkName = "_migrated-from-cri" and const BackfillNetworkIface = "unknown"?

anguslees (Contributor Author) commented Aug 12, 2020

We can't. The only case where we ever use these placeholder values is the CRI->file upgrade case, where we don't know the correct values (because CRI doesn't record/expose them).

We could drop the network name and iface from the key in all cases (and assume there's only a single network and interface per sandbox), and thus be consistent across upgrade-from-CRI and restart-with-file. I added them because I was trying to actually conform to the CNI spec for the first time, which is quite clear on the intended key:

> Plugins that store state should do so using a primary key of (network name, CNI_CONTAINERID, CNI_IFNAME).

For completeness: if we just didn't support live CNI upgrades (without waiting for the next node reboot), we would never have this whole class of challenges, because ADD/DEL would always be paired correctly on a consistent CNI version, with a clear "reset" epoch at node reboot. Not a real suggestion; I'm just pointing out that live-upgrade is a problem we have created for ourselves, and why the CNI interface doesn't handle this better.
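
To illustrate why the placeholders are unavoidable here, a sketch of key construction when reading from CRI at nodeInit, reusing the IPAMKey sketch above; the Sandbox type is an assumed stand-in for what CRI actually returns:

// Sandbox is a trimmed stand-in for what nodeInit learns from the CRI:
// an ID and the IP it holds (assumed shape, for this sketch only).
type Sandbox struct {
	ID string
	IP string
}

// backfillKeys shows the gap: CRI exposes only the container/sandbox ID,
// so the other two components of the spec's primary key must be filled
// with the backfill placeholder constants.
func backfillKeys(sandboxes []Sandbox) map[IPAMKey]string {
	out := make(map[IPAMKey]string, len(sandboxes))
	for _, sb := range sandboxes {
		key := IPAMKey{
			NetworkName: BackfillNetworkName,  // "_migrated-from-cri"
			ContainerID: sb.ID,
			IfName:      BackfillNetworkIface, // "unknown"
		}
		out[key] = sb.IP
	}
	return out
}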

Contributor

Thanks, Gus, for the detailed note. If I understand your explanation, the CRI API called during nodeInit doesn't provide all the information required to form the complete IPAM key. If that's the case, then the CNI spec guideline you quoted above cannot be met using the current CRI API. I would say operational correctness should win over spec conformance, and we can revisit conformance once the CRI API can provide everything required to form the key. If CNI_CONTAINERID is guaranteed to be unique enough, we could just have that as the key.
