Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[nexus] Affinity and Anti-Affinity Groups #7076

Open
wants to merge 45 commits into
base: main
Choose a base branch
from
Open

[nexus] Affinity and Anti-Affinity Groups #7076

wants to merge 45 commits into from

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Nov 15, 2024

This PR is a partial implementation of RFD 522

It adds:

  • Affinity and Anti-Affinity groups, contained within projects. These groups are configured with a policy and failure domain can currently contain zero or more members. Affinity groups attempt to co-locate members, anti-affinity groups attempt to avoid co-locating members.
    • Policy describes "what to do if we cannot fulfill the co-location request". Currently, these options are "fail" (reject the request) or "allow" (continue with provisioning of the group member regardless).
    • Failure Domain describes the scope of what is considered "co-located". In this PR, the only option is "sled", but in the future, this may be expanded to e.g. "rack".
    • Members describe what can be added to affinity/anti-affinity groups. In this PR, the only option is "instance". RFD 522 describes how "anti-affinity groups may also contain affinity groups" -- which is why this "member" terminology is introduced -- but it is not yet implemented.
  • (anti-)Affinity groups are exposed by the API, through a CRUD interface
  • (anti-)Affinity groups are considered during "sled reservation", where instances are placed on a sled. This is most significantly implemented (and tested) within nexus/db-queries/src/db/datastore/sled.rs.

See: #1705

  • API & DB
    • Affinity...
      • List groups
      • List group members
      • Get Group
      • Get Group member
      • Create Group
      • Add Group Member
      • Delete Group
      • Delete Group Member
      • Update Group
    • Anti-Affinity...
      • List groups
      • List group members
      • Get Group
      • Get Group member
      • Create Group
      • Add Group Member
      • Delete Group
      • Delete Group Member
      • Update Group
  • Integrations
    • Affinity...
      • authz integration
      • omdb support
      • Instance provisioning
      • Instance Deletion (removal from groups)
      • Project Deletion (no deletion while groups exist)
    • Anti-Affinity...
      • authz integration
      • omdb support
      • Instance provisioning
      • Instance Deletion (removal from groups)
      • Project Deletion (no deletion while groups exist)
  • Tests
    • Affinity...
      • authz tests
      • database tests
      • integration tests
    • Anti-affinity...
      • authz tests
      • database tests
      • integration tests
    • Schema migration tests

common/src/api/external/mod.rs Outdated Show resolved Hide resolved

#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub enum AffinityGroupMember {
Instance(Uuid),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is an "enum of one", but RFD 522 discusses having anti-affinity groups which contain "either instances or affinity groups". I figured I'd just define these as "members" to be flexible for future work

common/src/api/external/mod.rs Show resolved Hide resolved
@hawkw hawkw self-requested a review November 15, 2024 18:03
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully a quick drive-by review wasn't too premature, but this stuff is relevant to my interests, so I wanted to take a peek. Overall, everything looks very straightforward and reasonable --- I commented on a few small things, but it's quite possible you were planning to get to all of them and just hadn't gotten around to it yet.

common/src/api/external/mod.rs Outdated Show resolved Hide resolved
nexus/db-model/src/affinity.rs Outdated Show resolved Hide resolved
Comment on lines 73 to 85
#[derive(Queryable, Insertable, Clone, Debug, Selectable)]
#[diesel(table_name = affinity_group_instance_membership)]
pub struct AffinityGroupInstanceMembership {
pub group_id: DbTypedUuid<AffinityGroupKind>,
pub instance_id: DbTypedUuid<InstanceKind>,
}

#[derive(Queryable, Insertable, Clone, Debug, Selectable)]
#[diesel(table_name = anti_affinity_group_instance_membership)]
pub struct AntiAffinityGroupInstanceMembership {
pub group_id: DbTypedUuid<AntiAffinityGroupKind>,
pub instance_id: DbTypedUuid<InstanceKind>,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note that these lack created/deleted timestamps, implying that:

  1. we intend to hard-delete rather than soft-delete them, and,
  2. we don't presently record when instances were added to affinity/anti-affinity groups, so we can't present that in UIs in the future.

I'm not sure if either of these matter to us, but I figured I'd comment on it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I wrote this I was kinda planning on just using hard deletion for membership. I could definitely add the "time_modified" and "time_deleted" columns, and use soft-deletion here too, but as usual, we'll need to be more cautious with our indexing.

("Why hard delete" -> this was kinda arbitrary, my decision here isn't super strong, but I'm more familiar with us using soft deletion for user-facing objects that have full CRUD APIs, and hard-deletion for more internal-facing stuff, to avoid the cost of garbage collecting later, which we haven't really done at all)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's totally fair!

The reason I brought up deletion was because if we start out with a schema that uses hard-deletion, and decide to switch to soft-deletion later in order to do something like display affinity group histories in the UI, we can't get back records from before that change, since...they've been deleted. On the other hand, if we started with soft-deletion, we could switch to hard-deletion and blow away any soft-deleted records if we decide to not use them in that way.

On the other hand, maybe the problem of displaying historical affinity group changes is better solved by other things, like audit logging! I dunno.

nexus/db-model/src/schema.rs Show resolved Hide resolved
nexus/external-api/src/lib.rs Outdated Show resolved Hide resolved
nexus/external-api/src/lib.rs Outdated Show resolved Hide resolved
common/src/api/external/mod.rs Show resolved Hide resolved
common/src/api/external/http_pagination.rs Outdated Show resolved Hide resolved

async fn sled_reservation_create_inner(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @hawkw , you asked about this in the Hypervisor Sync.

The original API of the function pub async fn sled_reservation_create still exists above, but I made this "inner" function to get better testing of errors.

But we could also report these errors through e.g. an alerting system or something.

@smklein smklein changed the title [wip] Sketching out API, DB models for affinity and anti-affinity [nexus] Affinity and anti-affinity Jan 29, 2025
@@ -312,6 +312,19 @@ pub type PaginatedByNameOrId<Selector = ()> = PaginationParams<
pub type PageSelectorByNameOrId<Selector = ()> =
PageSelector<ScanByNameOrId<Selector>, NameOrId>;

pub fn id_pagination<'a, Selector>(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using this in http_entrypoints.rs to paginate over "group members", which only have UUIDs, not names.

instance_id,
).get_results_async::<(AffinityPolicy, Uuid)>(&conn).await?;

// We use the following logic to calculate a desirable sled,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sorta the "meat and potatoes" of this PR -- specifically, updating the logic for instance placement based on affinity groups.

This file also contains several tests for various combinations of affinity groups / instances, and I'm always happy to add more.

@@ -213,7 +213,7 @@ CREATE INDEX IF NOT EXISTS lookup_sled_by_policy_and_state ON omicron.public.sle
);

CREATE TYPE IF NOT EXISTS omicron.public.sled_resource_kind AS ENUM (
-- omicron.public.instance
-- omicron.public.vmm ; this is called "instance" for historical reasons.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tripped me up. This is not the UUID of an instance, it's the UUID of a propolis VMM that has been allocated. These are very subtly different objects.

Comment on lines +242 to +244

-- The UUID of an instance, if this resource belongs to an instance.
instance_id UUID
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... I do, however, want to know the instance UUID of this object, if it exists.

I don't think I can rely on the omicron.public.vmm table existing by the time reservations are made - note that e.g. for instance start, "allocating a server" happens before "VMM record creation":

builder.append(alloc_server_action());
builder.append(alloc_propolis_ip_action());
builder.append(create_vmm_record_action());

So - when we create a sled_resource record on behalf of an instance, we need a way for other requests to identify "what instance does this VMM belong to?". Hence the addition of this optional field.

@smklein smklein marked this pull request as ready for review January 29, 2025 17:37
@smklein smklein changed the title [nexus] Affinity and anti-affinity [nexus] Affinity and Anti-Affinity Groups Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants