Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add generation of COMB UUID's #43

Merged
merged 2 commits into from
Mar 10, 2015
Merged

Add generation of COMB UUID's #43

merged 2 commits into from
Mar 10, 2015

Conversation

fabre-thibaud
Copy link

No description provided.

@fabre-thibaud
Copy link
Author

@ramsey @marijn @ghola I'd appreciate a review on this updated version (I won't go haywire, promised ! )

@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) when pulling ab1d1ae on aztech-digital:comb into ec8d7a7 on ramsey:3.0.

@ramsey
Copy link
Owner

ramsey commented Dec 29, 2014

@aztech-dev This appears to be failing on HHVM only, with this error:

Failed asserting that '30303030-3030-4030-b030-002109109f93' is greater than '30303030-3030-4030-b030-0ce78a7e4626'.

See here: https://travis-ci.org/ramsey/uuid/builds/44397555

@fabre-thibaud
Copy link
Author

Actually, I ran more tests it's not an HHVM only error, it was just luck. Current version (using string based microtime) seems to overflow as @ghola mentionned on #40. I think microsecond precision is just not suitable considering the available bytes for the timestamp part, but I havent sat down and done the math yet.

Quickfix is to revert to using float based microtime with rounding to ms precision. I'm a bit busy these days with non-OSS work (bills gotta be paid...), but I'll try to clear some time today to investigate a bit more and push a fix.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.31%) when pulling 3882510 on aztech-digital:comb into ec8d7a7 on ramsey:3.0.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) when pulling d015d6f on aztech-digital:comb into ec8d7a7 on ramsey:3.0.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.1%) to 86.12% when pulling 8bece7c on aztech-digital:comb into db357ed on ramsey:3.0.

@fabre-thibaud
Copy link
Author

Finally got around to fixing the implementation.

I'm using a 0.00001 seconds precision, which should be safe for ~89 years (2**48 / (365 * 86400 * 100000)), lest I am wrong in my math. Going down to 0.000001 feels a bit short (8.9 years), though if you feel it's safe to go down that path, I'll amend my commit to do so.

@fabre-thibaud
Copy link
Author

Ping ? Pong ? @ramsey @marijn @ghola

Thoughts/feedback on the latest version ?

@ramsey
Copy link
Owner

ramsey commented Feb 15, 2015

Sorry. I must have missed this. I'll take a look this week.

@fabre-thibaud
Copy link
Author

fabre-thibaud commented Feb 15, 2015 via email

@fabre-thibaud
Copy link
Author

Speaking of oblivion :p still havent had time to review ? Let me know if anything needs changing

@ramsey
Copy link
Owner

ramsey commented Mar 10, 2015

This looks good to me. I'm sorry for taking so long to review it. I think v3 might be in a good place now. I need to go through a clean up some documentation and make sure we've got good code coverage. Once that's done, I think I'll be able to cut a release.

Thanks!

ramsey added a commit that referenced this pull request Mar 10, 2015
Add generation of COMB UUID's
@ramsey ramsey merged commit 12c9227 into ramsey:3.0 Mar 10, 2015
This was referenced Mar 10, 2015
@fabre-thibaud
Copy link
Author

No worries :) I admit I didnt really push test coverage up, but let me know if you need a hand with that considering your schedule seems tight these days.

@fabre-thibaud fabre-thibaud deleted the comb branch March 10, 2015 19:22
@fabre-thibaud
Copy link
Author

@ghola As I mentionned in another comment, the original intent was to have a third optional parameter in the constructor's signature in case some users wanted to change the precision of the timestamp. I'm open to debate as to whether this is really useful, and if not, it should be a constant, otherwise the missing parameter on the constructor should be added:

  public function __construct(RandomGeneratorInterface $generator, NumberConverterInterface $numberConverter, $timestampBytes = 6)

@ghola
Copy link
Contributor

ghola commented Mar 17, 2016

@aztech-dev Yep, I read that comment. I side with making it a constant. If it's not easy to come up with a scenario in which having it as a parameter would be useful or if noone needed/requested it there is no point in implementing it like that ... YAGNI.

@fabre-thibaud
Copy link
Author

I'm quite OK with either approach actually :)

@ramsey
Copy link
Owner

ramsey commented Mar 17, 2016

@aztech-dev, I see the use of microtime() now. I was thinking just of time(). :-)

Trying to wrap my head around this math to understand it…

The contents of this if statement are executed if $lsbTime is longer than $this->timestampBytes * 2 (12):

if ($this->timestampBytes > 0 && strlen($lsbTime) > $this->timestampBytes * 2) {

Most of the time, $lsbTime will be exactly 12 bytes (padded with 0's on the left side) because of this line:

$lsbTime = str_pad($this->converter->toHex($this->timestamp()), $this->timestampBytes * 2, '0', STR_PAD_LEFT);

However, if the hex value of timestamp is already longer than 12 bytes, then no padding will take place and the if condition will execute. The lowest possible hex value (I think) for this condition to occur is 1000000000000, which has 13 bytes and the decimal value 281474976710656.

Now, the timestamp is being generated with the following code:

$time = explode(' ', microtime(false));
return $time[1] . substr($time[0], 2, 5);

So, in order to achieve a return value of 281474976710656 from the timestamp method, we need a microtime value that looks something like:

0.10656000 2814749767

So, that puts the high-end boundary date at 2059-03-13 02:56:07.106560.

Does that sound correct?

@fabre-thibaud
Copy link
Author

Looks right :)

@ludofleury
Copy link

Pardon me to jump in there, but I'm interested in sequential uuid for time-series.
I'm quite new to the uuid world, so maybe I'm missing something.

Is there anyway to get the date, time and millisecond out of this generated Uuid ?
How do you recommend the byte storage to optimize time queries ?

@ludofleury
Copy link

Dismiss my previous message, I was hacking a bit and found the TimestampFirstCombCodec... then everything else suddenly made sense. But on another question: Is it a good idea to keep this under uuid v4 ? As it seems to me that this is not exactly a v4 right ?

And shouldn't it be better if it could provide the timestamp part ? Right now, we're face the error of v4 when requiring the time part.

@ramsey
Copy link
Owner

ramsey commented Jul 7, 2016

A COMB UUID is a combined (hence "COMB") random UUID with timestamp. Its format is defined in this article by Jimmy Nilsson. The goal of a COMB UUID is to replace the least significant bytes of the node field with the current timestamp, attempting to to compensate for the reduced clustering in database indexes. It's not an end-all-be-all solution, but it's an option.

You're right that it's not exactly a v4 UUID, but it is based on a v4 UUID. And it's definitely not a v1 UUID.

You can create one like this:

$factory = new UuidFactory();

$generator = new CombGenerator(
    $factory->getRandomGenerator(),
    $factory->getNumberConverter()
);

$factory->setRandomGenerator($generator);

$combUuid = $factory->uuid4();

Then, I'd recommend using a CHAR(16) database column to store the bytes value of this UUID ($combUuid->getBytes()).

You might also be interested in PR #118, which introduces a codec to store v1 UUIDs in an optimized way for InnoDB tables in MySQL. I'm waiting to reproduce the benchmarks found in this Percona blog post before merging the PR.

Garbee referenced this pull request in laravel/framework Apr 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants