Add support for UUID version 7 #19

Merged (1 commit) on Sep 19, 2023
119 changes: 119 additions & 0 deletions lib/random/formatter.rb
@@ -174,6 +174,125 @@ def uuid
  "%08x-%04x-%04x-%04x-%04x%08x" % ary
end

alias uuid_v4 uuid

# Generate a random v7 UUID (Universally Unique IDentifier).
#
#   require 'random/formatter'
#
#   Random.uuid_v7 # => "0188d4c3-1311-7f96-85c7-242a7aa58f1e"
#   Random.uuid_v7 # => "0188d4c3-16fe-744f-86af-38fa04c62bb5"
#   Random.uuid_v7 # => "0188d4c3-1af8-764f-b049-c204ce0afa23"
#   Random.uuid_v7 # => "0188d4c3-1e74-7085-b14f-ef6415dc6f31"
#                      # |<--sorted-->| |<----- random ---->|
#
#   # or
#   prng = Random.new
#   prng.uuid_v7 # => "0188ca51-5e72-7950-a11d-def7ff977c98"
#
# The version 7 UUID starts with the least significant 48 bits of a 64-bit
# Unix timestamp (milliseconds since the epoch) and fills the remaining bits
# with random data, excluding the version and variant bits.
#
# This allows version 7 UUIDs to be sorted by creation time. Time-ordered
# UUIDs can be used for better database index locality of newly inserted
# records, which may have a significant performance benefit compared to
# inserting records keyed by random data.
#
# The result contains 74 random bits (9.25 random bytes).
#
# Note that this method cannot be made reproducible with Kernel#srand, which
# can only affect the random bits. The sorted bits will still be based on
# Process.clock_gettime.
#
# See draft-ietf-uuidrev-rfc4122bis[https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/]
# for details of UUIDv7.
#
Comment (Member):

@nevans It would be good to note that fixing the random number seed (e.g., by Kernel#srand) does not make this method reproducible.

Comment (Contributor Author):

Good point. I'll add a note for that. If I remember correctly, I had originally written a version that allowed a keyword argument for the timestamp. But when I changed the code to allow extra_timestamp_bits, it didn't seem like it was worth the complexity.

Comment (Contributor Author, @nevans, Sep 15, 2023):

What do you think about the following:

Suggested change
#
#
# Note that this method cannot be made reproducible with Kernel#srand, which
# can only affect the random bits. The sorted bits will still be based on
# Process.clock_gettime.
#

Comment (Member):

Looks good, thanks!

Comment (Member):

@nevans Ah, there are other ways to fix the random seed besides Kernel#srand, such as Random.new(0).uuid_v7.

How about this? This PR is already merged, so I will create another PR if you are OK with it.

Note that this method cannot be made reproducible because its output includes not only random bits but also the timestamp.

Comment (Contributor Author):

@mame Looks good. I like your text better than mine. It's simpler, clearer, and covers the Random.new(0) scenario too. :)
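To illustrate why seeding alone is not enough, here is a minimal sketch that takes the millisecond timestamp as an explicit argument. `uuid_v7_sketch` is a hypothetical helper written for this note (it mirrors the `extra_timestamp_bits: 0` branch of the PR but is not part of `Random::Formatter`), and the two timestamps are arbitrary values. Two generators seeded with `Random.new(0)` produce the same random tail, but the leading timestamp still differs whenever the clock does.

```ruby
# Hypothetical helper mirroring the extra_timestamp_bits: 0 branch of the PR.
# The timestamp is passed in explicitly so the effect of seeding is visible.
def uuid_v7_sketch(prng, ms)
  rand = prng.bytes(10)
  rand.setbyte(0, rand.getbyte(0) & 0x0f | 0x70) # version 7
  rand.setbyte(2, rand.getbyte(2) & 0x3f | 0x80) # RFC variant (0b10)
  "%08x-%04x-%s" % [
    (ms & 0x0000_ffff_ffff_0000) >> 16,
    (ms & 0x0000_0000_0000_ffff),
    rand.unpack("H4H4H12").join("-")
  ]
end

# Same seed, two different clock readings (arbitrary example values).
a = uuid_v7_sketch(Random.new(0), 1_687_000_000_000)
b = uuid_v7_sketch(Random.new(0), 1_687_000_000_001)

puts a
puts b
puts a[14..] == b[14..] # everything from the version digit on comes from the seeded PRNG
puts a == b             # but the timestamp prefix makes the full UUIDs differ
```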

# ==== Monotonicity
#
# UUIDv7 has millisecond precision by default, so multiple UUIDs created
# within the same millisecond are not guaranteed to be issued in monotonically
# increasing order. To create UUIDs that are time-ordered with sub-millisecond
# precision, up to 12 bits of additional timestamp may be added with
# +extra_timestamp_bits+. The extra timestamp precision comes at the expense
# of random bits. Setting <tt>extra_timestamp_bits: 12</tt> provides ~244ns
# of precision, but only 62 random bits (7.75 random bytes).
#
#   prng = Random.new
#   Array.new(4) { prng.uuid_v7(extra_timestamp_bits: 12) }
#   # =>
#   ["0188d4c7-13da-74f9-8b53-22a786ffdd5a",
#    "0188d4c7-13da-753b-83a5-7fb9b2afaeea",
#    "0188d4c7-13da-754a-88ea-ac0baeedd8db",
#    "0188d4c7-13da-7557-83e1-7cad9cda0d8d"]
#   # |<--- sorted --->| |<-- random --->|
#
#   Array.new(4) { prng.uuid_v7(extra_timestamp_bits: 8) }
#   # =>
#   ["0188d4c7-3333-7a95-850a-de6edb858f7e",
#    "0188d4c7-3333-7ae8-842e-bc3a8b7d0cf9", # <- out of order
#    "0188d4c7-3333-7ae2-995a-9f135dc44ead", # <- out of order
#    "0188d4c7-3333-7af9-87c3-8f612edac82e"]
#   # |<--- sorted -->||<---- random --->|
#
# Any rollbacks of the system clock will break monotonicity. UUIDv7 is based
# on UTC, which excludes leap seconds and can roll back the clock. To avoid
# this, the system clock can synchronize with an NTP server configured to use
# a "leap smear" approach. NTP or PTP will also be needed to synchronize
# across distributed nodes.
#
# Counters and other mechanisms for stronger guarantees of monotonicity are
# not implemented. Applications with stricter requirements should follow
# {Section 6.2}[https://www.ietf.org/archive/id/draft-ietf-uuidrev-rfc4122bis-07.html#monotonicity_counters]
# of the specification.
#
def uuid_v7(extra_timestamp_bits: 0)
Comment (Contributor Author):

Any suggestions for a better keyword parameter name?

  case (extra_timestamp_bits = Integer(extra_timestamp_bits))
  when 0 # min timestamp precision
    ms = Process.clock_gettime(Process::CLOCK_REALTIME, :millisecond)
    rand = random_bytes(10)
    rand.setbyte(0, rand.getbyte(0) & 0x0f | 0x70) # version
    rand.setbyte(2, rand.getbyte(2) & 0x3f | 0x80) # variant
    "%08x-%04x-%s" % [
      (ms & 0x0000_ffff_ffff_0000) >> 16,
      (ms & 0x0000_0000_0000_ffff),
      rand.unpack("H4H4H12").join("-")
    ]

  when 12 # max timestamp precision
    ms, ns = Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond)
      .divmod(1_000_000)
    extra_bits = ns * 4096 / 1_000_000
    rand = random_bytes(8)
    rand.setbyte(0, rand.getbyte(0) & 0x3f | 0x80) # variant
    "%08x-%04x-7%03x-%s" % [
      (ms & 0x0000_ffff_ffff_0000) >> 16,
      (ms & 0x0000_0000_0000_ffff),
      extra_bits,
      rand.unpack("H4H12").join("-")
    ]

  when (0..12) # the generic version is slower than the special cases above
Comment:
Would this be clearer as when (1..11)?

Comment (Contributor Author, @nevans, Sep 15, 2023):

I don't know. :) I can go either way:

  • On the one hand, because 0 and 12 are handled by the prior clauses, it makes sense to only show the numbers that will be handled by this clause.
  • On the other hand, since this is the generic version, it can handle 0..12, and it's nice to "document" that here.

Although I did consider this earlier, I think I left it as 0..12 mostly by accident: While I was testing and benchmarking the code, I would copy/paste the entire method and then comment out or delete one of the other clauses. So it was temporarily simpler to keep this as 0..12.

So, what do you think?
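One way to check that 0..12 accurately describes what the generic branch can handle is to compare its packing of the third UUID group with the two special cases at the boundaries. The sketch below copies the third-group expression from each branch into lambdas (the lambda names are illustrative, not from the PR) and compares them for a few arbitrary inputs, where `ns` is the sub-millisecond remainder and `rand_a` a 16-bit random value.

```ruby
# Third UUID group as computed by the generic branch (expression copied from the PR).
generic_group3 = lambda do |extra_timestamp_bits, ns, rand_a|
  rand_mask_bits = 12 - extra_timestamp_bits
  "%04x" % (0x7000 |
    ((ns * (1 << extra_timestamp_bits) / 1_000_000) << rand_mask_bits) |
    rand_a & ((1 << rand_mask_bits) - 1))
end

# Third group from the extra_timestamp_bits: 12 branch: "7" followed by 12 timestamp bits.
max_precision_group3 = lambda do |ns|
  "7%03x" % (ns * 4096 / 1_000_000)
end

# Third group from the extra_timestamp_bits: 0 branch: version nibble over 12 random bits.
min_precision_group3 = lambda do |rand_a|
  "%04x" % (rand_a & 0x0fff | 0x7000)
end

[0, 1, 499_999, 999_999].each do |ns|        # sub-millisecond remainder, always < 1_000_000
  [0x0000, 0x1234, 0xffff].each do |rand_a|  # arbitrary 16-bit random values
    raise "12-bit case differs" unless generic_group3.(12, ns, rand_a) == max_precision_group3.(ns)
    raise "0-bit case differs" unless generic_group3.(0, ns, rand_a) == min_precision_group3.(rand_a)
  end
end
puts "generic branch matches both special cases"
```

Since the generic expression reduces to each special case at 0 and 12, keeping 0..12 documents its full range, while 1..11 documents which values actually reach it.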

    rand_a, rand_b1, rand_b2, rand_b3 = random_bytes(10).unpack("nnnN")
    rand_mask_bits = 12 - extra_timestamp_bits
    ms, ns = Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond)
      .divmod(1_000_000)
    "%08x-%04x-%04x-%04x-%04x%08x" % [
      (ms & 0x0000_ffff_ffff_0000) >> 16,
      (ms & 0x0000_0000_0000_ffff),
      0x7000 |
        ((ns * (1 << extra_timestamp_bits) / 1_000_000) << rand_mask_bits) |
        rand_a & ((1 << rand_mask_bits) - 1),
      0x8000 | (rand_b1 & 0x3fff),
      rand_b2,
      rand_b3
    ]

  else
    raise ArgumentError, "extra_timestamp_bits must be in 0..12"
  end
end

private def gen_random(n)
  self.bytes(n)
end
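The bit budget in the doc comment follows from simple arithmetic: 128 bits minus 48 timestamp bits, 4 version bits and 2 variant bits leaves 74 random bits, and each extra timestamp bit moves one bit from the random part into the timestamp while halving the size of a timestamp step. A small sketch (the constant and method names are hypothetical, chosen for this note):

```ruby
UUID_BITS    = 128
UNIX_MS_BITS = 48 # millisecond Unix timestamp
VERSION_BITS = 4  # the "7" nibble
VARIANT_BITS = 2  # the RFC variant bits "10"

# Random bits left over for a given extra_timestamp_bits setting.
def random_bits(extra_timestamp_bits)
  UUID_BITS - UNIX_MS_BITS - VERSION_BITS - VARIANT_BITS - extra_timestamp_bits
end

# Size of one step of the sub-millisecond timestamp field, in nanoseconds.
def resolution_ns(extra_timestamp_bits)
  1_000_000.0 / (1 << extra_timestamp_bits)
end

[0, 8, 12].each do |bits|
  printf("extra_timestamp_bits: %2d  random bits: %d (%.2f bytes)  step: %.1f ns\n",
         bits, random_bits(bits), random_bits(bits) / 8.0, resolution_ns(bits))
end
```

For 0 and 12 this reproduces the 74 and 62 random bits and the ~244 ns step quoted in the doc comment.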
48 changes: 48 additions & 0 deletions test/ruby/test_random_formatter.rb
@@ -75,6 +75,54 @@ def test_uuid
  assert_match(/\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/, uuid)
end

def test_uuid_v7
  t1 = current_uuid7_time
  uuid = @it.uuid_v7
  t3 = current_uuid7_time

  assert_match(/\A\h{8}-\h{4}-7\h{3}-[89ab]\h{3}-\h{12}\z/, uuid)

  t2 = get_uuid7_time(uuid)
  assert_operator(t1, :<=, t2)
  assert_operator(t2, :<=, t3)
end

def test_uuid_v7_extra_timestamp_bits
  0.upto(12) do |extra_timestamp_bits|
    t1 = current_uuid7_time extra_timestamp_bits: extra_timestamp_bits
    uuid = @it.uuid_v7 extra_timestamp_bits: extra_timestamp_bits
    t3 = current_uuid7_time extra_timestamp_bits: extra_timestamp_bits

    assert_match(/\A\h{8}-\h{4}-7\h{3}-[89ab]\h{3}-\h{12}\z/, uuid)

    t2 = get_uuid7_time uuid, extra_timestamp_bits: extra_timestamp_bits
    assert_operator(t1, :<=, t2)
    assert_operator(t2, :<=, t3)
  end
end

# It would be nice to simply use Time#floor here. But that is problematic
# due to the difference between decimal vs binary fractions.
def current_uuid7_time(extra_timestamp_bits: 0)
  denominator = (1 << extra_timestamp_bits).to_r
  Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond)
    .then {|ns| ((ns / 1_000_000r) * denominator).floor / denominator }
    .then {|ms| Time.at(ms / 1000r, in: "+00:00") }
end

def get_uuid7_time(uuid, extra_timestamp_bits: 0)
  denominator = (1 << extra_timestamp_bits) * 1000r
  extra_chars = extra_timestamp_bits / 4
  last_char_bits = extra_timestamp_bits % 4
  extra_chars += 1 if last_char_bits != 0
  timestamp_re = /\A(\h{8})-(\h{4})-7(\h{#{extra_chars}})/
  timestamp_chars = uuid.match(timestamp_re).captures.join
  timestamp = timestamp_chars.to_i(16)
  timestamp >>= 4 - last_char_bits unless last_char_bits == 0
  timestamp /= denominator
  Time.at timestamp, in: "+00:00"
end

def test_alphanumeric
  65.times do |n|
    an = @it.alphanumeric(n)
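Because the 48-bit millisecond timestamp is written as fixed-width lowercase hex at the start of the string, sorting UUIDv7 strings lexically orders them by timestamp (to the millisecond), and the timestamp can be read back from the first 12 hex digits, much as `get_uuid7_time` does in the tests for the default case. A minimal sketch, assuming timestamps that fit in 48 bits; `make_v7` and `uuid7_ms` are hypothetical helpers and the timestamps are arbitrary:

```ruby
require "securerandom"

# Hypothetical helper: builds a v7 UUID string for a given millisecond timestamp
# (assumes ms < 2**48, so ms >> 16 fits in the first 8 hex digits).
def make_v7(ms)
  rand = SecureRandom.random_bytes(10)
  rand.setbyte(0, rand.getbyte(0) & 0x0f | 0x70) # version 7
  rand.setbyte(2, rand.getbyte(2) & 0x3f | 0x80) # RFC variant
  "%08x-%04x-%s" % [ms >> 16, ms & 0xffff, rand.unpack("H4H4H12").join("-")]
end

# Hypothetical helper: recovers the millisecond timestamp from the first two groups.
def uuid7_ms(uuid)
  uuid.delete("-")[0, 12].to_i(16)
end

timestamps = [1_687_000_000_123, 1_600_000_000_000, 1_687_000_000_122]
uuids = timestamps.map { |ms| make_v7(ms) }

p uuids.sort.map { |u| uuid7_ms(u) } # decoded timestamps, in string-sort order
p timestamps.sort                    # same sequence: string order follows time order
```

Within a single millisecond the order is decided by the random bits, which is the limitation the Monotonicity section describes.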