-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make RandomExample/RandomItemGenerator deterministic (use seed) #56
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work Chris, sorry I hit submit too soon.
I think this is going in a good direction but I don't think this tool should do the seeding but more be friendly to a world where another app has done the seeding.
lib/govuk_schemas/random.rb
Outdated
def number | ||
numbers = ("0".."9").to_a | ||
numbers[rand(0..numbers.count - 1)] | ||
end | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code in this class looks good, might be worth making it a commit of it's own about deterministic random usage so there's good emphasis on the why as that could be quite a future headscratcher (current commit looks good but is wrapped up with the seed stuff which seems distinct).
It's probably worth writing some unit tests for it too since we expect the logic to reflect the external condition of srand seeding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken the Content Publisher approach here, where I'm confident every change in this commit is being tested thoroughly by the unit test I've added in random_example_spec.rb
. It seems a little unnecessary to write tests explicitly for the hex
/uuid
methods, especially as other public methods in this class also don't have an explicit test (e.g. base_path, anchor). Let me know if you think I should write more tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it's a pretty minor point. The thing that I find a little peculiar is that we've added unit test for a class that didn't actually change for this. So it feels like we're doing an integration test rather than a unit test. I don't know this code base well enough to know if that's inconsistent for this project or not though. So happy to go with your call on what feels right.
8451736
to
f84e26d
Compare
66f61d0
to
12decd2
Compare
lib/govuk_schemas/random.rb
Outdated
def number | ||
numbers = ("0".."9").to_a | ||
numbers[rand(0..numbers.count - 1)] | ||
end | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it's a pretty minor point. The thing that I find a little peculiar is that we've added unit test for a class that didn't actually change for this. So it feels like we're doing an integration test rather than a unit test. I don't know this code base well enough to know if that's inconsistent for this project or not though. So happy to go with your call on what feels right.
We're now building govuk-developer-docs hourly, but our heavy use of GovukSchemas::RandomExample is causing ~50,000 line changes for every commit to gh-pages, despite the underlying schemas remaining unchanged. We want the example to be random and schema-validated, but not necessarily a new random on every build. Hence we would like to set the 'seed' so that we can predictably execute the same random generation every time. In theory, this means our gh-pages output will be identical between builds until an underlying schema changes. govuk_schemas did not support this out of the box because of its reliance on SecureRandom, which, as the name implies, does not allow outside interference/configuration by way of seed values (it gives a truly random result every time). We don't _need_ a 'secure' random output - just something we can be reasonably confident will give a different result every time if the seed is omitted. I've reproduced its 'uuid' and 'hex' methods using arrays of chars/numbers and the native 'rand' method, which _does_ respect the global seed value.
I noticed that the test would occasionally fail because the time would be different between the first run and second run of `GovukSchemas::RandomExample.new(schema: schema).payload` The `public_updated_at` might be 16:06:03 for one and 16:06:04 for the other. This is just a natural consequence of the time it takes to execute the code, and is not an indicator of the randomness not being deterministic, which is what the test is testing.
The previous approach to deterministic behaviour was to rely on 'srand' being set upstream, so that it is up to the app to define a seed if they wish to have consistent results. However, if the srand is not set explicitly by the developer, and if the test framework (such as RSpec) does not set the seed to a random value, then the 'random example' will be the same example every time, by default. This could be surprising. We decided it is better to make the developer explicitly opt in to seed behaviour, than it is to make them set a random seed in order to get a 'true' random on every test run. This commit is part 1: defining the 'seed' parameter. Part 2 comes in the next commit, where we'll swap out the call to 'srand' and use an instance of Random, to prevent changing the global state of things. Full discussion: https://github.com/alphagov/govuk_schemas/pull/56/files\#r458182188
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one Chris, this looks good without the side effects. I've just made a few small suggestions. Biggest thing seems to be dealing with Time.now
This is to avoid clashing with the ruby standard lib 'Random' module, which we'll be using in the next commit. At the same time, I've renamed 'random_item_generator.rb' to 'random_schema_generator.rb', which is more reflective of its purpose and helps to distinguish between the two classes.
Prior to this commit, we relied on the global random seed set via 'srand', insofar as if somebody set 'srand', we'd return the same result every time. However, this could be surprising behaviour, as devs calling RandomExample would expect a random example every time it is called. We decided in an earlier [discussion] that it would be better to pass an explicit 'seed' parameter if the dev wants consistent results. Whilst we could have used this 'seed' parameter to call 'srand(seed)', the behaviour becomes non-deterministic because it changes the nature of the global environment. I.e. if the dev is relying on setting 'srand' for some other part of their test suite, we would now have skewed it by calling 'srand' again. Hence the need to move to a sandbox/isolated way of setting seeded random behaviour. We used 'Random.new(seed)' and called all randomness-inducing methods on that returned object, i.e. 'rand => @random.rand', and 'arr.sample => arr.sample(random: @random)'. [discussion]: https://github.com/alphagov/govuk_schemas/pull/56/files\#r458182188
Using a seed, we were now returning a deterministic random schema for everything except time fields, which would vary according to the current time. This broke the nature of being able to supply a seed and reliably retrieve the same schema every time. By setting a random time relative to an arbitrary date (1st Feb 2012; GOV.UK's birthday), we respect the seed and will return the same 'random' time every time. This means we no longer need to alter time, so can remove our dependency on Timecop.
This only had one (untested) method, called in one place. I've now moved its contents directly to where it was being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one, looks good to me 👍
Just to check, have we given this a try in developer docs to confirm it resolves the problem?
I did, yes 👍 (albeit with an earlier version of the code, but am confident it will still work). |
Make RandomExample/RandomItemGenerator deterministic (use seed)
We're now building govuk-developer-docs hourly, but our heavy use
of GovukSchemas::RandomExample is causing ~50,000 line changes
for every commit to gh-pages, despite the underlying schemas
remaining unchanged.
We want the example to be random and schema-validated, but not
necessarily a new random on every build. Hence we would like to
set the 'seed' so that we can predictably execute the same random
generation every time. In theory, this means our gh-pages output
will be identical between builds until an underlying schema changes.
govuk_schemas did not support this out of the box because of its
reliance on SecureRandom, which, as the name implies, does not
allow outside interference/configuration by way of seed values
(it gives a truly random result every time). We don't need a
'secure' random output - just something we can be reasonably
confident will give a different result every time if the seed is
omitted. I've reproduced its 'uuid' and 'hex' methods using arrays
of chars/numbers and the native 'rand' method, which does
respect the global seed value.