-
Notifications
You must be signed in to change notification settings - Fork 1
/
index.html
377 lines (300 loc) · 14.5 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<title>"Version 6" UUIDs</title>
<meta name="description" content="UUID 'Version 6' is an alternate format which sorts correctly as raw bytes and is suitable for use as a primary key value in a database.">
<link rel="stylesheet" href="stylesheets/styles.css">
<link rel="stylesheet" href="stylesheets/github-light.css">
<script src="javascripts/scale.fix.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no">
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body>
<div class="wrapper">
<header>
<h1 class="header">UUID "Version 6"</h1>
<p class="header">The version <a target="rfc4122" href="https://tools.ietf.org/html/rfc4122">RFC 4122</a> forgot.</p>
<hr/>
<p>Implementation in Go:
<ul>
<!-- <li class="download"><a class="buttons" href="https://github.com/bradleypeabody/uuidv6/zipball/master">Download ZIP</a></li> -->
<!-- <li class="download"><a class="buttons" href="https://github.com/bradleypeabody/uuidv6/tarball/master">Download TAR</a></li> -->
<li><a class="buttons github" href="https://github.com/bradleypeabody/gouuidv6">View on GitHub</a></li>
</ul>
<!-- <p class="header">This project is maintained by <a class="header name" href="https://github.com/bradleypeabody">bradleypeabody</a></p> -->
</header>
<section>
<h3>UPDATE</h3>
<p>A proposal has been created <a href="https://tools.ietf.org/html/draft-peabody-dispatch-new-uuid-format" target="_blank">here</a>
in the form of an IETF draft.
It is newer than the format described on this page and the result of that proposal will
likely determine the direction for this idea.</p>
<h3>UUIDs: A Common Case</h3>
<p>Universally Unique Identifiers are useful in many scenarios. And
<a target="rfc4122" href="https://tools.ietf.org/html/rfc4122">RFC 4122</a>
describes 5 versions, indicated for use in various cases.</p>
<p>This document describes a proposed sixth version which in the author's
opinion addresses a relatively minor but nonetheless significant shortcoming
in the UUID specification, specifically when compared to the Version 1
UUIDs from the RFC.</p>
<h3>Motivation</h3>
<p>RFC 4122 is dated July 2005. Times have changed. Today, CPUs are
faster and applications are frequently far more distributed. <strong>This
results in an increased need to generate globally unique identifiers
on different nodes in a distributed system and have these contain
properties appropriate for a database primary key.</strong></p>
<p>The existing RFC comes close, but stops short of describing a UUID
suitable to this scenario. Specifically, a value useful for
this case would have the following properties:
<ul>
<li><strong>Sorting by its raw bytes results in a sequence equivalent
to sorting by its
embedded timestamp,</strong> thus making it naturally more useful as a
primary key and providing improved reference locality and thus
insert performance. (I.e. primary keys that are out of sequence are
not just useless for sorting, but can also cause unnecessary disk
access due to occupying significantly different locations in
the index. <a target="_blank" href="https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/">[example]</a>)</li>
<li><strong>The embedded time should be extractable</strong> for use as the
creation time of the record.</li>
<li><strong>Global uniqueness,</strong> which is of course a basic requirement
of all types of UUIDs.</li>
</ul>
<p>Unfortunately, none of the existing 5 UUID versions meet all
3 requirements. The main issue being that the byte sequence
for the time field makes meeting the first requirement impossible
with the existing specification.</p>
<!-- <h3>Useful Scenarios</h3>
<p>Here are just a few examples of cases where such a UUID would be
applicable:
<ul>
<li>https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/</li>
<li>https://blog.codinghorror.com/primary-keys-ids-versus-guids/</li>
<li>http://forums.mysql.com/read.php?98,49626,172029</li>
</ul>
-->
<h3>Specification for 'Version 6'</h3>
<p><em>TL;DR: 'Version 6' UUIDs have the date/time encoded from high to low
bytes (bit-shifting around the version field in order to preserve its
location) and thus sort correctly by time when treated as an
opaque bunch of bytes; the clock
sequence can be used to avoid duplicates generated at the same
time; it's okay to use random data from a good PRNG in place
of the MAC address; the rest is the same as RFC 4122.</em></p>
<p>The description here will be kept primarily to the differences
from what is given in the RFC.</p>
<h4>Timestamp</h4>
<p>The only fields that change layout are the ones related to the timestamp.
The version field retains its same position for backward compatibility,
so applications can test for the version number in a consistent
way. Assuming the timestamp begins as a 60-bit unsigned integer
(usually implemented as a 64-bit unsigned int with the top 4 bits
unused), the most significant 48 bits of the timestamp appear
first (in order from most significant first, toward least significant),
followed by the 4-bit version field, followed by the remaining 12 bits
from the timestamp.<p>
<p>That gives a layout as follows: (timestamp shown in red)</p>
<pre><code><strong>Bytes 0-7:</strong> (each digit shown is hex, 4 bits)
<span class="color-red">00 00 00 00 00 00</span> 0<span class="color-red">0 00</span>
| | || |
---------------- || |
timestamp || |
bits 59-12 || |
version| |
--
timestamp
bits 11-0
<strong>Bytes 8-15:</strong> (same as RFC 4122)
00 00 00 00 00 00 00 00
|| | | |
|| | ________________
|| | node
| --
| clock seq bits 11-0
2 bits variant, 2 bits
are 13-12 of clock seq
<span class="grayish">NOTE: Just to be perfectly clear, even though the bit numbers
are given from high to low, what is being indicated is that
the byte sequence is big-endian, NOT that the bits should somehow
be reversed bit by bit. I would have used bytes but the version
field is only 4 bits, so it's necessary to describe the fields
in terms of their bit sizes. Hopefully all that is apparent.</span>
</code></pre>
<p>Assuming <code>t</code> is a 64-bit unsigned integer (<code>uint64_t</code>)
containing the timestamp as described in the RFC (in hundred-nanosecond intervals
since midnight 15 October 1582 UTC), the C code to make the adjustment
described above would be:
<p><code>((t << 4) & 0xFFFFFFFFFFFF0000) | (t & 0x0FFF) | 0x6000</code></p>
<p><span class="grayish">n.b. (the <code>0x6000</code> value is of course the version number - 6)</span>
<p>This would then be written in network byte order/big-endian to output
the first 8 bytes of the UUID, i.e. the first byte would get the value
<code>(t >> 56)</code> the second would be <code>((t >> 48) & 0xFF)</code>, etc.</p>
<h4>Clock Sequence</h4>
<p>In order to prevent duplicates on modern systems where it is very
possible to have two UUIDs created within the same hundred-nanosecond
interval, a minor difference is indicated in the behavior of the clock
sequence field (see RFC <a target="rfc4122" href="https://tools.ietf.org/html/rfc4122#section-4.1.5">section 4.1.5</a>). Instead of <a target="rfc4122"
href="https://tools.ietf.org/html/rfc4122#section-4.2.1.2">returning an
error or blocking</a> (both of which are undesirable and unnecessary) if
a duplicate UUID would be returned due to this case, instead the
clock sequence should be incremented and used for the would-be duplicate ID -
thus making it unique. This is the same action described in the RFC
should the clock move backward and it works just as well in this case.
In practice, this can be implemented as simple as (C) code like the
following, run in a critical section:
<p><pre><code>if (last_timestamp >= this_timestamp) {
clock_sequence++;
}</code></pre>
<h4>Node Field Value</h4>
<p>In section <a target="rfc4122" href="https://tools.ietf.org/html/rfc4122#section-4.5">4.5
of the RFC</a> it describes an alternate method for populating the node
field if the use of a MAC address (IEEE 802) is not desired. Statements
earlier in the document (e.g. <em>"For systems with no IEEE address, a
randomly or pseudo-randomly generated value may be used"</em>) seem to
indicate that using the MAC address on the system is the more correct
or normal case.
<p>So just for clarity I want to specifically point out here that if
there is a concern about privacy, just use a random value instead -
that is endorsed as a correct way of generating UUIDs in Version 6.
As specified in the RFC, random bytes should be generated
from a high quality source (i.e. use /dev/urandom instead of something
based on a deterministic algorithm [Mersenne Twister].) The multicast
bit should be set as indicated in the RFC.
<h3>Implementation</h3>
<p><a href="https://github.com/bradleypeabody/gouuidv6">A reference implementation has been written in Go.</a> The actual code involved is relatively simple and seems
to perform decently, feedback and pull requests welcome.<p>
<h3>Hacks</h3>
<p>It is somewhat trivial to convert between Version 1 and Version 6
UUIDs, since the information they contain is the same just in
different positions. In environments where an implementation of
Version 1 exists, converting this output to Version 6 may be
the simplest implementation. Here are some examples of how to do this
in various languages:</p>
<h4>C:</h4>
<pre><code>#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <arpa/inet.h>
#include <uuid/uuid.h>
/* Converts UUID version 1 to version 6 in place. */
void uuidv1tov6(uuid_t u) {
uint64_t ut;
unsigned char *up = (unsigned char *)u;
// load ut with the first 64 bits of the UUID
ut = ((uint64_t)ntohl(*((uint32_t*)up))) << 32;
ut |= ((uint64_t)ntohl(*((uint32_t*)&up[4])));
// dance the bit-shift...
ut =
((ut >> 32) & 0x0FFF) | // 12 least significant bits
(0x6000) | // version number
((ut >> 28) & 0x0000000FFFFF0000) | // next 20 bits
((ut << 20) & 0x000FFFF000000000) | // next 16 bits
(ut << 52); // 12 most significant bits
// store back in UUID
*((uint32_t*)up) = htonl((uint32_t)(ut >> 32));
*((uint32_t*)&up[4]) = htonl((uint32_t)(ut));
}
int main(int argc, char **argv) {
uuid_t u;
char str[37];
uuid_generate_time(u);
uuid_unparse(u, str);
printf("UUIDv1: %s\n", str);
uuidv1tov6(u);
uuid_unparse(u, str);
printf("UUIDv6: %s\n", str);
return 0;
}
</code></pre>
<h4>Python 2/3:</h4>
<pre><code>import uuid
def uuidv1tov6(u):
uh = u.hex
tlo1 = uh[:5]
tlo2 = uh[5:8]
tmid = uh[8:12]
thig = uh[13:16]
rest = uh[16:]
uh6 = thig + tmid + tlo1 + '6' + tlo2 + rest
return uuid.UUID(hex=uh6)
u = uuidv1tov6(uuid.uuid1())
</code></pre>
<h4>MySQL/MariaDB/Percona:</h4>
<p>n.b. Basic idea <a target="_blank" href="https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/">borrowed from here,</a> adjusted to match definition above.</p>
<pre><code>DELIMITER //
CREATE DEFINER=`root`@`localhost` FUNCTION `uuidv1atov6b`(u1 BINARY(36))
RETURNS BINARY(16) DETERMINISTIC
RETURN UNHEX(CONCAT(
SUBSTR(u1, 16, 3),
SUBSTR(u1, 10, 4),
SUBSTR(u1, 1, 5),
'6',
SUBSTR(u1, 6, 3),
SUBSTR(u1, 20, 4),
SUBSTR(u1, 25)
));
//
DELIMITER ;
DELIMITER //
CREATE DEFINER=`root`@`localhost` FUNCTION `uuidbtoa`(u BINARY(16))
RETURNS BINARY(36) DETERMINISTIC
RETURN CONCAT(
HEX(SUBSTR(u, 1, 4)),
"-",
HEX(SUBSTR(u, 5, 2)),
"-",
HEX(SUBSTR(u, 7, 2)),
'-',
HEX(SUBSTR(u, 9, 2)),
"-",
HEX(SUBSTR(u, 11, 6))
);
//
DELIMITER ;
-- for use as primary key:
select uuidv1atov6b(uuid());
-- to display:
select uuidbtoa(uuidv1atov6b(uuid()));
</code></pre>
<h3>Feedback:</h3>
<!-- <hr/> -->
<!-- GA tracking code -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-82103174-1', 'auto');
ga('send', 'pageview');
</script>
<div id="disqus_thread"></div>
<script>
/**
* RECOMMENDED CONFIGURATION VARIABLES: EDIT AND UNCOMMENT THE SECTION BELOW TO INSERT DYNAMIC VALUES FROM YOUR PLATFORM OR CMS.
* LEARN WHY DEFINING THESE VARIABLES IS IMPORTANT: https://disqus.com/admin/universalcode/#configuration-variables */
/*
var disqus_config = function () {
this.page.url = PAGE_URL; // Replace PAGE_URL with your page's canonical URL variable
this.page.identifier = PAGE_IDENTIFIER; // Replace PAGE_IDENTIFIER with your page's unique identifier variable
};
*/
(function() { // DON'T EDIT BELOW THIS LINE
var d = document, s = d.createElement('script');
s.src = '//uuidv6.disqus.com/embed.js';
s.setAttribute('data-timestamp', +new Date());
(d.head || d.body).appendChild(s);
})();
</script>
<noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
</section>
<footer>
<!-- <p><small>Hosted on GitHub Pages.</small></p> -->
</footer>
</div>
<!--[if !IE]><script>fixScale(document);</script><![endif]-->
</body>
</html>