Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

helper/resource: Use base 36 to generate incremental ids #16997

Closed
wants to merge 9 commits into from

Conversation

ColinHebert
Copy link
Contributor

@ColinHebert ColinHebert commented Dec 28, 2017

After reading discussions (1, 2, 3, 4, 5, 6) about the managing of the "prefixed names", here's a suggestion to improve the automatic generation of suffixes in terraform.

What changes

  • Instead of using a human readable representation of a date (such as "201712280947212674") which doesn't fully use base 10 (months go only up to 12, days up to 31, minutes and seconds up to 60), use a timestamp which is an always incrementing value, which does not skip steps and is therefore more compact.
  • Rather than using base 10 to represent the timestamp, use base 36 ([a-z0-9]) this gives a wider range of characters to express the number making the suffix even more compact. I went with 36 instead of the classic 64 or 62 ([a-zA-Z0-9]) because some systems do not care about the case or do not deal with 'special characters'.
  • Instead of using a base16 incremental count, use base36 (same reasoning as above)
  • Instead of having a fixed length which requires padding and leads to waste, concatenate the timestamp to the id and let the suffix grow.

This leads to these changes:

  • 20171228094721267400000004 becomes p1o0ix4
  • 20171228094721267400010000 becomes p1o0ixcra

Notes

  • The length of the suffix is not fixed
  • The timestamp and the id have to be be converted independently to keep the same "prefix"
  • The creation order of the resources across multiple runs is kept as the suffixes can be sorted alphabetically (note that 9 < a).
  • The creation order of the resources within the same run is not going to be easily accessible (as we're not padding with 0), for example p1o0ix4 is the 4th object created, p1o0ix5 is the 5th one and p1o0ix40 is the 144th, ordering alphabetically will show the order p1o0ix4, p1o0ix40, p1o0ix5
  • The milliseconds have been dropped from the timestamp. They could be added easily (we could even use nanoseconds, but it feels like a waste of space.
  • The order of the resources will be less accessible when the timestamp grows by one order of magnitude. Today we got the timestamp as p1o0ix, the next move will be at 1000000 which is 2176782336 in base 10 or on the 24th of December 2038 (the following one being in the year 4453).
  • If the above point is a concern, adding milliseconds change the timestamp into jbqb2lgo which moves the next order of magnitude change to 100000000 which is 2821109907456 base 10 which is the 25th of May 2059. I'm not sure that the minor inconvenience of not being able to sort warrants the extra length, but for two characters it isn't the end of the world either.

Tools:

@ColinHebert ColinHebert changed the title Use base 36 to generate incremental ids [WIP] Use base 36 to generate incremental ids Dec 28, 2017
@ColinHebert ColinHebert changed the title [WIP] Use base 36 to generate incremental ids Use base 36 to generate incremental ids Dec 28, 2017
@zymzxq
Copy link

zymzxq commented Dec 29, 2017

+1

@moofish32
Copy link
Contributor

moofish32 commented Jan 22, 2018

I got here from one of the many references. I have a couple of questions:

  1. I am not sure I like the variable length. While I agree on the efficiency, I hate to have a working solution that was fortunate to fit under the max character length only to later break unexpectedly because this length increases. As a user I think I would rather have the padding to prevent unlucky deploys? I don't see this safe guarded in the current code. Is this expected to be handled by resource name logic, somewhere else, or is my concern invalid?
  2. Is even this implementation overkill? So far every time I have used name_prefix is for create_before_destroy in those situations you could literally increment from 0-9 and start over and never break my use case. How many use cases does something as simple as increment solve? Should we add that helper here and allow resource developers the ability to expose it as name_increment in addition to name_prefix? Yes this is scope creep and we can separate the ticket to keep progress moving if this is ready to get merged.

Finally @apparentlymart what's the appetite/timeline to get this type of change into Core? I have hit this issue with ALB and it's quite annoying.

cc/ @weavenet

@ColinHebert
Copy link
Contributor Author

  • Yes I agree that the variable length is a big difference, if you really wish to do that you should be able to enable padding with 0s, but how many zeros would you need to feel confident enough, knowing that 99% of the time the ID is going to look like a 0 filled string (a bit like IDs right now are mostly 0 filled) just for the padding.

  • If your recommendation is not to use prefixes then the question is moot? The reason I'm using prefixes is that external factors can trigger the creation/re-creation of resources without having any human interaction or changes in the configuration of terraform. This "increment your counter by one" is a poorman's way of handling suffixes which I suppose anyone could use but I would rather stay away from that.

@moofish32
Copy link
Contributor

  1. The variable length and the pragmatic problem of how big to make the number is always the challenge. However, we have the time stamp so let's just say we made it 4 characters (base 36). This would mean that a user would need to create 36^4 instances of a resource initiated within the given second to break the convention (double check and make sure I did that right?). To be honest I can't think of use case where I would be doing that; honestly just 3 characters and 46656 is probably overkill. Drop the timestamp to millis and I would hope 36^2 would be sufficient.
  2. I'm not saying the increment replaces this need entirely. However, the create_before_destroy lifecycle mathematically needs only a toggle (true xor true) -> false ; (false xor true) -> true because as long as I am creating with a name that does not exist then it should succeed. I used a single digit because once I burn up the space I might as well use base ten because it's just the same (or for this matter base 36 just to be consistent). AWS is well known for the eventual consistency challenges (specifically when cross account look ups are happening with roles), so the extra buffer would be nice too. The entire question is should we enable a helper that does this simpler algorithm that a resource developer could expose as something in addition to name_prefix (e.g. name_toggle). Perhaps increment was a poor name and I'm indicating this is a separate use case.

@ColinHebert
Copy link
Contributor Author

ColinHebert commented Jan 23, 2018

I could get behind that.

Say we go with date kept in 6 characters, that covers timestamps (in seconds) until zzzzzz (2176782335 or 12/24/2038, can we safely assume that in the next 20 years a different solution will be used? :P)
and 3 for the resource count, up to zzz (or 46655 resources in the second), that leaves a generated prefix of 9 characters fixed (often ending in "001") which, while not as compact as the padding-less version, is still a huge improvement over the 26 characters string we currently have.

Heck make it 7 characters for the timestamp and we are covered until 04/05/4453 and have a nice round 10 characters generated for the suffix (starting with 0 for the next 20 years).

@mgood
Copy link
Contributor

mgood commented Feb 8, 2018

Even with random suffixes, the number of resources this would need to handle is pretty low, so it seems like collisions can be minimized even with a fairly small number of bits.

1,000 ALBs with the same prefix seems like a fairly high number, but (if I did the math right) with just 16 hex digits (64 bits) the chance of a single collision is on the order of 10^-12%.

Does Terraform already check for collisions internally within the resource names in a given configuration? For example, if it's assigning 1,000 ALB names with a given prefix, it could generate them randomly, and detect and regenerate any duplicates before assigning the names to the resources. It doesn't take many bits to minimize the collisions and reduce how long it would take to assign them.

Is the implementation also able to check if a resource name is already taken on AWS before assigning it?

@KursLabIgor
Copy link

any updates here?

@teamterraform teamterraform changed the title Use base 36 to generate incremental ids helper/resource: Use base 36 to generate incremental ids Aug 8, 2019
@teamterraform
Copy link
Contributor

Hello, and sorry for the long silence!
The Terraform SDK has been extracted to its own repository (https://github.com/hashicorp/terraform-plugin-sdk). In support of this we are freezing the related codepaths in Terraform and closing PRs. Thank you!

@ghost
Copy link

ghost commented Nov 24, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Nov 24, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants