Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util/uuid: fork gofrs, generate with math/rand instead of crypto/rand #32238

Merged
merged 1 commit into from
Nov 14, 2018

Conversation

ajwerner
Copy link
Contributor

@ajwerner ajwerner commented Nov 12, 2018

Fixes #30236
Copies the source from github.com/gofrs/uuid into util/uuid at 7077aa61129615a0d7f45c49101cd011ab221c27.
Modifies the copyright notice to refer to the license at licenses/MIT-gofrs.txt.
Renames the spurious Bytes method to bytes.
Adds the ability to create a Gen with a random byte Reader.
Add FastMakeV4() which uses a generator which draws from a math/rand.Rand.
Use FastMakeV4 in roachpb.MakeTransaction.

Release note (performance improvement): Use faster randomness source to generate transaction IDs.

@ajwerner ajwerner requested review from bdarnell, benesch, nvanbenschoten and a team November 12, 2018 23:46
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@bdarnell
Copy link
Contributor

Sets the DefaultGenerator to one based on math/rand.Rand.
Use secure crypto/rand in sql uuid builtins.

This should go the other way: use crypto/rand by default, and only switch to math/rand where we know performance is critical and we don't need stronger randomness (just transaction ids?)

@knz
Copy link
Contributor

knz commented Nov 13, 2018

  1. this:

This should go the other way: use crypto/rand by default, and only switch to math/rand where we know performance is critical and we don't need stronger randomness (just transaction ids?)

What Ben says.

  1. in general a commit message / PR description should contain the following, in this order:

    1. a summary of the current situation
    2. a short explanation of why the current situation is problematic
    3. what the commit is doing to change the current situation
    4. what the new situation is as a result

When the change is aiming to improve performance:

  • there must be a benchmark that validates the performance change:

    • either the benchmark exists already, in which case the commit message should describe which benchmark was used to validate
    • or the commit should contain new benchmark code
  • the commit message should include a copy of the benchmark before/after comparison, eg via the benchstat tool.

@knz
Copy link
Contributor

knz commented Nov 13, 2018

Also looking at the linked issue #30236 there is only mention of accelerating txn ID generation. So as a word of wisdom, to be reused generally when thinking of changing the "provider" of some of the util packages:

  • audit every caller of the util package, and make a list of every context where the caller is used -- answer the question "is there anywhere else that's possibly impacted besides the specific problem I'm looking at?"

  • share this list in the commit message

  • for each of them make your own assessment of how the change is going to impact.

    • if the change is only performance-oriented, assess using a benchmark
    • if the change is potentially correctness-related (as is the case here), make an assessment of how the new behavior differs from the expectations from specs
      • in your case here, for the txn ID case this assessment was made by Nathan in the linked issue. You can copy-paste his argument in your commit message, or make a summary thereof
      • in your case here, for the SQL usage you need to assess yourself
      • there may be other callers I'm not seeing yet (you'd need to assess yourself too)
    • share the assessments in the commit message / PR description

Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the comment with regards to the default. I'll go through the other callers and see if they occur on hot paths, but for now updated the diff to just touch the transaction ID.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@ajwerner ajwerner force-pushed the ajwerner/fork-uuid branch 2 times, most recently from d50a3a0 to 024c5b4 Compare November 13, 2018 15:12
@ajwerner ajwerner requested a review from a team November 13, 2018 15:12
@ajwerner ajwerner force-pushed the ajwerner/fork-uuid branch 2 times, most recently from 38aef5c to 41fee30 Compare November 13, 2018 15:39
Copy link
Contributor

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: but please wait for someone else to sign off as well!

Reviewed 60 of 60 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: although I'd like to see a benchmark demonstrating the benefit that we observe from this change.

Reviewed 18 of 60 files at r1.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh wait

There is a huge chunk of new code in the pkg/util package. There is zero explanation for this code in the commit message. This contrast between the two things does not sit well with me.

See my comments above about what a commit message should state.

See also the comment above by Ben - how did you actually address the requirement to provide different UUID sources for SQL? Did you do so actually? (It's not so clear to me how/where this is happening. The commit message should guide me.)

It seems you copied code from 3rd party sources into the pkg/util directory. Code imports, together with a highlight of the license that allowed you to import this code, should be called out in the commit message.

Also repeat of my invitation to provide benchmark comparisons.

Reviewed 3 of 60 files at r1.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained

Copy link
Contributor

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we looking at different commit messages, @knz? This is the one I'm looking at:

Copies the source from github.com/gofrs/uuid into util/uuid at 7077aa61129615a0d7f45c49101cd011ab221c27.
Remove github.com/satori/go.uuid dep
Modifies the copyright notice to refer to the license at licenses/MIT-gofrs.txt.
Renames the spurious Bytes method to bytes.
Adds the ability to create a Gen with a random byte Reader.
Add FastMakeV4() which uses a generator which draws from a math/rand.Rand.
Use FastMakeV4 in roachpb.MakeTransaction.

I suppose there could be more rationale but it seems to speak to the questions you've asked.

Here's how I would phrase the commit message, @ajwerner, just FYI. I'm also fine with what you already have!

util/uuid: support generation of fast but non-cryptographically secure UUIDs

Replace our UUID library, github.com/satori/go.uuid, which was unmaintained and did not provide a facility for customizing the source of randomness. The new library is based upon the MIT-licensed github.com/gofrs/uuid library, which is better maintained and allows customizing the source of randomness. Because the code is so simple, copy and paste it wholesale rather than vendoring it. This will ease future customizations.

The new library exposes a fast UUID generation function, FastUUIDV4, that uses a non-cryptographically secure function for generating UUIDs. For now, only transaction ID generation uses the new fast generation function, as transaction IDs are only used internally and therefore do not need a cryptographically-secure source of randomness. Other calls to UUID generation, like the SQL builtins, continue to use a cryptographically-secure source of randomness.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained

@knz
Copy link
Contributor

knz commented Nov 13, 2018 via email

@nvanbenschoten
Copy link
Member

Part of the expected improvement from #30236 is Mac-specific. I know you don't have a Mac laptop, so I took this for a quick spin to verify the effect. Sure enough, the effect is immediately clear on Mutex profiles when running kv95 --concurrency=64 on my laptop.

before:

screen shot 2018-11-14 at 10 11 31 am

after:

screen shot 2018-11-14 at 10 12 32 am

The baseline throughput difference on the benchmark was somewhere around .5%, but I didn't run enough trials for the results to be statistically significant:

name  old ops/sec  new ops/sec  delta
KV95   7.95k ± 1%   8.00k ± 1%   ~     (p=0.200 n=4+4)

…e UUIDs

Replace our UUID library, github.com/satori/go.uuid, which was unmaintained and did not provide a facility for customizing the source of randomness. The new library is based upon the MIT-licensed github.com/gofrs/uuid library (at 7077aa61129615a0d7f45c49101cd011ab221c27), which is a better maintained fork of the previous dependency. Because the code is so simple, copy and paste it wholesale rather than vendoring it to ease modification. Modify the forked library to allow customization of the source of randomness and remove problematic Bytes method on the UUID object which complicates its use inside protobuf generated structs.

The new library exposes a fast UUID generation function, FastMakeV4, that uses a non-cryptographically secure function for generating UUIDs. For now, only transaction ID generation uses the new fast generation function, as transaction IDs are only used internally and therefore do not need a cryptographically-secure source of randomness. Other calls to UUID generation, like the SQL builtins, continue to use a cryptographically-secure source of randomness.

Fixes cockroachdb#30236

Release note (performance improvement): Use faster randomness source to generate transaction IDs.
@ajwerner
Copy link
Contributor Author

From micro benchmarks on my Linux laptop I see the below compelling win. I'm running kv benchmarks on roachprod now.

name          time/op
FastMakeV4-8  65.5ns ±14%
MakeV4-8       646ns ± 0%

Also added the benchmark to the code and updated the commit message along the lines of @benesch's suggestion.

@ajwerner
Copy link
Contributor Author

Didn't see any statistically significant improvement on 32 core machines but got the below results on 3 64 core machines running kv with 95% reads, with concurrency set to 512:

name       old ops/s   new ops/s   delta
KV95     9.36k ± 1%  9.52k ± 1%  +1.81%  (p=0.016 n=5+5)

bors r+

craig bot pushed a commit that referenced this pull request Nov 14, 2018
32238: util/uuid: fork gofrs, generate with math/rand instead of crypto/rand r=ajwerner a=ajwerner

Fixes #30236
Copies the source from github.com/gofrs/uuid into util/uuid at 7077aa61129615a0d7f45c49101cd011ab221c27.
Modifies the copyright notice to refer to the license at licenses/MIT-gofrs.txt.
Renames the spurious Bytes method to bytes.
Adds the ability to create a Gen with a random byte Reader.
Add FastMakeV4() which uses a generator which draws from a math/rand.Rand.
Use FastMakeV4 in roachpb.MakeTransaction.

Release note (performance improvement): Use faster randomness source to generate transaction IDs.

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Nov 14, 2018

Build succeeded

@craig craig bot merged commit fd9705b into cockroachdb:master Nov 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants