-
Notifications
You must be signed in to change notification settings - Fork 9
Further clarify UUIDv1 (UUIDv6) Clock Sequence Generation? #41
Comments
If I read your request correctly: Else new timestamp or unknown previous timestamp: |
@kyzer-davis I think your summarization is almost right, with one minor comment. I wasn't thinking of it in a sense of "versus" stalling when I wrote up this issue, but this would definitely reduce the number of situations where an implementer following the RFC as written would need to do that. For UUIDv1, it seems extremely unlikely a single stateful generator could generate enough UUIDs (4096) to loop back to the same clock sequence value without the timestamp having changed on most systems. There may be systems in use that don't have a precise clock and could generate 4096 UUIDs before the timestamp changes, so that assumption may not hold true. Considering that, we may still want to include language that explicitly guards against the clock sequence looping back around and generating duplicate UUIDs. I think either erroring or stalling when you detect you're about to generate a duplicate clock sequence is appropriate, so this is why I wouldn't summarize it as a "versus" but more of a "yes and". I do have a similar open question (#40) as to whether we should error/stall UUIDv7s when the sequence clock is about to roll over. Maybe we want to unify the language for both to have the same behaviors. With UUIDv1/6 starting at a random value, and UUIDv7 starting at 0. |
TL;DRThe 'clock sequence' and the timestamp 'count' are not the same thing. They have different goals. The 'clock sequence' is a misnomer. The timestamp 'count' is just a counter with many names. The 'count'Many people confuse the 'clock sequence' with the 'count' used to simulate a high resolution timestamp. RFC-4122 says the following in the fourth paragraph of Section-4.2.1.2:
Each implementation calls the 'count' with a different name. These are some names I found:
Note that all of the above 'count' implementations have the same function: simulate a high resolution timestamp. This is the common formula used, assuming a system time with millisecond precision:
For example, a generator that relies on a system clock that ticks every millisecond must use a 'count' variable that can be initialized to zero and increments whenever the system millisecond repeats. But it has a limit of 10,000 increment operations. When the 'count' reaches this limit, the Section-4.2.1.2 says that the generator should either throw an error or wait the clock to advance. If the generator is allowed to generate more than 10,000 UUIDs in the same millisecond, the generator would be violating the limit of 10,000,000 per second per machine:
The clock sequenceThe clock sequence has a very different goal:
The RFC-4122 says that the clock sequence must be changed when:
The clock sequence 'can just be incremented' (like a sequence). But it is a MAY, not a SHOULD. I've found some implementations that always randomize the clock sequence here and here. I think the word 'sequence' is the cause of the confusion. Although the 'clock sequence' behaves like a sequence when it is incremented, it is a random number. So, the UUID v1 has not just two parts, namely time (100-nanos of the Gregorian epoch) and place (IEEE address), but THREE: time, place and some entropy. Some people add another task to the 'clock sequence': to be an extension of the timestamp. I don't see any problem with this, but it's not explicit in RFC-4122. Read this text from Wikipedia:
If a developer wants to add the task of timestamp extension, he/she just needs to replace the comparison operator '<' with '<=': if (timestamp <= last_time)
clockseq++; Thanks for reading! :) |
@fabiolimace This is a terrific explanation! I'm not sure exactly how, but this clarification should absolutely appear in the new draft. |
Hello,
I understand that this may be beyond the scope of what you intended with your draft, but since it's slated to update 4122, and it has an impact on UUIDv6, I thought I'd drop this here in case you wanted some well-intentioned scope creep. 😄
While going through and manually revalidating my implementation of UUIDv6, I realized the library I maintained was generating UUIDv1 clock sequences subtly different than how it is defined in RFC 4122. This library was forked from an earlier one, and so this implementation detail of my library originated there. Here is what section 4.1.5 of RFC 4122 says:
This says the clock sequence is used to help avoid duplicates that could arise when the clock is set backwards in time. What this means is that the number of UUIDs a stateful generator can generate is limited by the precision of the timestamp, whereas a stateless generator using a random clock sequence does not face the same limitations. That seems like an unintentional side effect of the wording of the RFC. Especially when the RFC says you should either fail to generate the UUID or block until it's safe to do so:
So the person who originally implemented the UUIDv1 generator I now maintain, did more than the RFC said to do to avoid this failure mode and also incremented the clock sequence whenever a UUIDv1 has already been generated for this timestamp. This effectively removed the limitation on the number of UUIDv1s the stateful generator could generate, albeit while technically violating the language of the RFC.
After I discovered this, I became curious about other UUIDv1 implementations in Go, and other languages, and I came to find that every stateful UUIDv1 implementation I looked at chose to not follow this section of the RFC as it was written. Instead, they either implemented it the same as the code I maintain, or they took the simpler approach of incrementing the sequence on every generation. The C implementation provided in RFC 4122 was implemented as the RFC itself was written, but that was the only stateful one I found doing so.
So that makes me wonder if we should use this opportunity to update the clock sequence generation language from 4122, so that UUIDv6 stateful generator implementers aren't also encouraged to violate the RFC as it's currently written.
I'd be interested in your thoughts.
The text was updated successfully, but these errors were encountered: