Mutexes are a way to coordinate multiple instances of Terraform by making them take turns modifying a Blueprint. They can also be tricky to set up and should be on the "figure that out later" list if you're just starting out with the Apstra Terraform provider.
If you're learning, you're in development, or you're in a production environment where you're not worried about multiple instances of Terraform running at the same time, disabling the mutex feature will probably help maintain your sanity:
provider "apstra" {
blueprint_mutex_enabled = false
}
Definitely turn it back on if you have multiple jobs waiting to spring into action when the bell rings at the end of the business day. It's safer to force these processes to take turns modifying a blueprint, rather than thrashing changes and commits into a single blueprint all at once.
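Re-enabling locking is just the same attribute flipped back to true:
provider "apstra" {
  blueprint_mutex_enabled = true
}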
If your terraform apply seems stuck, it's probably a mutex problem:
apstra_datacenter_virtual_network.b: Still creating... [10s elapsed]
apstra_datacenter_virtual_network.b: Still creating... [20s elapsed]
apstra_datacenter_virtual_network.b: Still creating... [30s elapsed]
apstra_datacenter_virtual_network.b: Still creating... [40s elapsed]
apstra_datacenter_virtual_network.b: Still creating... [50s elapsed]
apstra_datacenter_virtual_network.b: Still creating... [1m0s elapsed]
apstra_datacenter_virtual_network.b: Still creating... [1m10s elapsed]
See Manually Clearing Mutexes (below) to handle this situation.
Pending changes to an Apstra Blueprint are "staged" in a non-operational clone of the live Blueprint. When changes are promoted from the staging Blueprint, they're all sent together. Every detail configured in the staging Blueprint is promoted at the same time.
This all-or-nothing strategy requires that Apstra administrators coordinate their efforts: each administrator should start with a clean slate (an unmodified staging blueprint), make their changes, and complete their deployment expeditiously, so that other administrators aren't left to contend with half-baked changes and nobody inadvertently promotes somebody else's configuration experiment.
The same holds for non-human Apstra clients like Terraform, but these clients are more likely to kick off parallel execution workstreams than human users. So how do we keep two invocations of Terraform run by the "jenkins" Apstra user from polluting each other's staging blueprints?
A mutex is any object which signifies "mutual exclusion", or a need for exclusive access to some resource. You can think of it as a "do not disturb" sign hanging on a hotel room doorknob. It's trivially bypassed by anyone with a room key, but the expectation is that anyone with access to the room will honor the sign placer's desire for exclusive use of the room.
The important features of our mutex are:
- Everyone can see the mutex.
- Nobody wonders: Is the mutex mine?
- The mutex identifies a specific blueprint.
- The mutex doesn't affect the blueprint.
Tags in the Global Catalog are well positioned to satisfy our requirements:
Automation processes aren't constrained to running on a single system, so an in-memory or on-filesystem mutex doesn't fit the bill. The mutex needs to live on a network service accessible to all systems which might attempt writes to a single blueprint. Perhaps the Apstra API is appropriate?
Multiple Tags in the Apstra API cannot share a single name. When a client attempts to create a tag using an existing name, the Apstra API returns an error indicating this tag already exists. There's no risk of two clients simultaneously attempting to create the same mutex/tag, and both believing that they have succeeded, because one will have received an error.
The mutex/tags use a well-known scheme which uniquely identifies a specific blueprint.
We hope that other automation systems adopt this strategy so that the Terraform provider won't find itself in conflict with them when using the Apstra API.
Because these tags live in the Global Catalog, creating and deleting them does not revise any blueprint.
The Terraform provider for Apstra has two tag-related configuration attributes:
blueprint_mutex_enabled (Boolean)
- true - When true, the provider creates a blueprint-specific mutex / tag before modifying any Blueprint. If it is unable to create the tag because it already exists (the blueprint is locked), the provider will wait until the tag is removed (the blueprint is unlocked). There is no timeout. This setting is probably appropriate for a production network environment.
- false - When false, the provider neither creates mutex / tags, nor checks whether one exists before making changes. This setting is reasonable in a development environment, or anywhere there is no concern about concurrent access by multiple instances of Terraform (or similar automation software).
- null - When null or omitted (the default), the provider behaves the same as the false case, but also prints a warning that exclusive access is not guaranteed and that the user should take steps to understand the risks and then explicitly opt in or out of locking.
blueprint_mutex_message (String, Optional)
The mutex / tag's Description field is not prescribed by the locking scheme and can be used to indicate what system or process created the mutex, or other information which might be useful in the event that it must be manually cleared. Environment variables are expanded in the message, so it can include usernames, PIDs, etc.
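For example, the following provider configuration enables locking and records who created the mutex. This is a minimal sketch: the message text is arbitrary, and it assumes USER is set in the environment where Terraform runs.
provider "apstra" {
  blueprint_mutex_enabled = true
  # "$USER" is expanded from the environment when the mutex / tag is created
  blueprint_mutex_message = "locked by $USER via terraform -- check for a running apply before deleting"
}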
When locking is enabled, every resource which modifies a Blueprint will ensure that the Blueprint is locked before executing any Create, Update, or Delete operation. Only the first resource to assert a specific mutex in any terraform apply run actually causes the mutex to be created. Subsequent resources within a single run of terraform apply re-use the earlier mutex. It is not created and then destroyed by each resource in sequence, because doing so would create an opening for a different Terraform to pollute our staging blueprint.
Mutexes are automatically cleared in exactly two circumstances:
- When the blueprint is deleted by destruction of the apstra_datacenter_blueprint resource
- When the blueprint is committed by the apstra_blueprint_deployment resource
Practically, these rules mean that automatically asserting and clearing locks is only possible when the Terraform resources are arranged with appropriate Terraform lifecycle decorators to ensure the apstra_blueprint_deployment resource completes after every other blueprint resource has completed its changes.
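A minimal sketch of that arrangement is shown below. The resource types appear elsewhere in this guide, but the attributes (name, template_id, blueprint_id, and the virtual network details) are illustrative placeholders rather than a complete configuration; check the provider documentation for the full set of required attributes.
# Blueprint under management.
resource "apstra_datacenter_blueprint" "example" {
  name        = "example"        # illustrative
  template_id = "my_template_id" # illustrative
}

# One of possibly many resources which modify the staging blueprint.
resource "apstra_datacenter_virtual_network" "b" {
  blueprint_id = apstra_datacenter_blueprint.example.id
  name         = "vn_b" # illustrative; other required attributes omitted
}

# Commits the staging blueprint and clears the mutex. depends_on ensures it
# runs only after every other blueprint-modifying resource has completed.
resource "apstra_blueprint_deployment" "example" {
  blueprint_id = apstra_datacenter_blueprint.example.id
  depends_on   = [apstra_datacenter_virtual_network.b]
}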
When a mutex has been left behind after Terraform exits, either because of a bug, loss of connectivity to the Apstra API, or a premature exit due to invalid or impossible resource configuration, no system which honors the mutex will be able to proceed until the mutex is manually cleared.
In the specific case of a terraform apply, this will look like an interminable apply run which never seems to make any progress. In reality, it's regularly polling the API, waiting for the offending mutex to disappear so that it can get to work.
It is reasonable and safe to manually clear a mutex any time one has been left behind, so long as it's clear that the mutex's creator (probably some earlier instance of terraform apply) has exited without clearing it.
To manually clear a mutex:
- Open the Apstra Web UI.
- Navigate to [Design] -> [Tags]
- Identify the offending mutex / tag and remove it using its "Delete" button.