-
Notifications
You must be signed in to change notification settings - Fork 38
Discuss filesystem locks and the need for repeated calls to get the operation status #36
Comments
With crossplane/crossplane-runtime#283 , generic reconciler doesn't call Until we lift the requirement that the controller needs to call |
I agree that we should simplify the flow considering well-known environment we know our controllers running:
So, I don't see much benefit in trying to keep things synced with filesystem locks. |
Closing this, let's add comments to #74 for further discussion. |
What problem are you facing?
During the initial implementation, in order to move fast and see our limits we deferred some of the discussion about the implementation and how we should do the bookkeeping was one of them. Currently, we manage the state and bookkeeping using filesystem. However, we also interact with Terraform's lock and state files, which makes the code complex to understand for readers and we don't really need to keep that information out of memory since the crash of the process means a restart of the
Pod
anyway. Additionally, there are some opportunities to merge some of the functionality in the code to get a simpler flow that is easier to read and change.How could Terrajet help solve your problem?
I suggest that we use a global map that does the bookkeeping of the running CLI executions. Roughly, I think we need the following improvements in the initial implementation. Please feel free to add more things that we had deferred to after PoC phase. cc @ulucinar @turkenh
map[string]Workspace
whose key ismetadata.uuid
.Workspace
, should update the map before it starts and after it completes, similar to today.Describe
method on thatWorkspace
object that will tell you the current status, i.e.creating
,destroying
orno-op
.apply
pipeline to be run repeatedly to check the status. But no other provider does that except some of Azure APIs, which is not user friendly at all in a controller, so we run into possibility of violating generic reconciler assumptions.Create
repeatedly because we need to make the sameApply
call to finalize it but since Account for two different kinds of consistency issues crossplane-runtime#283 , this won't be allowed because generic reconciler won't callCreate
again in the grace period.map[string]Workspace
can be treated just like a cloud provider SDK client. We don't need to know Terraform CLI, pipeline/operation details or propagating status as error etc. We can usemap[string]interface{}
to pass down spec configuration and receiveStateV4
and information about whether it'screating
ordestroying
.ExternalClient
implementation directly call a single interface. Because if you look at it today, it's merely a shim on top of conversion layer. Since we use similar names, it makes it really hard to follow the layers.ApplyResult
part, reporting current status should be the job ofDescribe
) which is very similar to the set of functions we use in native providerExternalClient
implementations.Note that some of the technical debt is accrued on purpose because we needed to be able to work in parallel during PoC and discover things from scratch. Before declaring terrajet as prod-ready and solidify the implementation with more tests, I think it's time to revisit the parts that we completed during that phase and refactor into a more cohesive structure because thanks to that initial effort we know what works, how Terraform behaves and the rough performance implications of the decisions.
The text was updated successfully, but these errors were encountered: