Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make RandomExample/RandomItemGenerator deterministic (use seed) #56

Merged
merged 10 commits into from
Jul 27, 2020

Conversation

ChrisBAshton
Copy link
Contributor

@ChrisBAshton ChrisBAshton commented Jul 21, 2020

Make RandomExample/RandomItemGenerator deterministic (use seed)

We're now building govuk-developer-docs hourly, but our heavy use
of GovukSchemas::RandomExample is causing ~50,000 line changes
for every commit to gh-pages, despite the underlying schemas
remaining unchanged.

We want the example to be random and schema-validated, but not
necessarily a new random on every build. Hence we would like to
set the 'seed' so that we can predictably execute the same random
generation every time. In theory, this means our gh-pages output
will be identical between builds until an underlying schema changes.

govuk_schemas did not support this out of the box because of its
reliance on SecureRandom, which, as the name implies, does not
allow outside interference/configuration by way of seed values
(it gives a truly random result every time). We don't need a
'secure' random output - just something we can be reasonably
confident will give a different result every time if the seed is
omitted. I've reproduced its 'uuid' and 'hex' methods using arrays
of chars/numbers and the native 'rand' method, which does
respect the global seed value.

@ChrisBAshton ChrisBAshton marked this pull request as ready for review July 21, 2020 12:25
barrucadu
barrucadu previously approved these changes Jul 21, 2020
Copy link
Member

@kevindew kevindew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work Chris, sorry I hit submit too soon.

I think this is going in a good direction but I don't think this tool should do the seeding but more be friendly to a world where another app has done the seeding.

lib/govuk_schemas/random.rb Outdated Show resolved Hide resolved
def number
numbers = ("0".."9").to_a
numbers[rand(0..numbers.count - 1)]
end
end
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in this class looks good, might be worth making it a commit of it's own about deterministic random usage so there's good emphasis on the why as that could be quite a future headscratcher (current commit looks good but is wrapped up with the seed stuff which seems distinct).

It's probably worth writing some unit tests for it too since we expect the logic to reflect the external condition of srand seeding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken the Content Publisher approach here, where I'm confident every change in this commit is being tested thoroughly by the unit test I've added in random_example_spec.rb. It seems a little unnecessary to write tests explicitly for the hex/uuid methods, especially as other public methods in this class also don't have an explicit test (e.g. base_path, anchor). Let me know if you think I should write more tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's a pretty minor point. The thing that I find a little peculiar is that we've added unit test for a class that didn't actually change for this. So it feels like we're doing an integration test rather than a unit test. I don't know this code base well enough to know if that's inconsistent for this project or not though. So happy to go with your call on what feels right.

@ChrisBAshton ChrisBAshton force-pushed the seed branch 2 times, most recently from 8451736 to f84e26d Compare July 21, 2020 14:01
@ChrisBAshton ChrisBAshton changed the title Add seed functionality to RandomExample/RandomItemGenerator Make RandomExample/RandomItemGenerator deterministic (use seed) Jul 21, 2020
@ChrisBAshton ChrisBAshton requested a review from kevindew July 21, 2020 14:26
@ChrisBAshton ChrisBAshton force-pushed the seed branch 3 times, most recently from 66f61d0 to 12decd2 Compare July 21, 2020 15:13
def number
numbers = ("0".."9").to_a
numbers[rand(0..numbers.count - 1)]
end
end
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's a pretty minor point. The thing that I find a little peculiar is that we've added unit test for a class that didn't actually change for this. So it feels like we're doing an integration test rather than a unit test. I don't know this code base well enough to know if that's inconsistent for this project or not though. So happy to go with your call on what feels right.

spec/lib/random_example_spec.rb Outdated Show resolved Hide resolved
spec/lib/random_example_spec.rb Outdated Show resolved Hide resolved
We're now building govuk-developer-docs hourly, but our heavy use
of GovukSchemas::RandomExample is causing ~50,000 line changes
for every commit to gh-pages, despite the underlying schemas
remaining unchanged.

We want the example to be random and schema-validated, but not
necessarily a new random on every build. Hence we would like to
set the 'seed' so that we can predictably execute the same random
generation every time. In theory, this means our gh-pages output
will be identical between builds until an underlying schema changes.

govuk_schemas did not support this out of the box because of its
reliance on SecureRandom, which, as the name implies, does not
allow outside interference/configuration by way of seed values
(it gives a truly random result every time). We don't _need_ a
'secure' random output - just something we can be reasonably
confident will give a different result every time if the seed is
omitted. I've reproduced its 'uuid' and 'hex' methods using arrays
of chars/numbers and the native 'rand' method, which _does_
respect the global seed value.
I noticed that the test would occasionally fail because the time
would be different between the first run and second run of
`GovukSchemas::RandomExample.new(schema: schema).payload`

The `public_updated_at` might be 16:06:03 for one and 16:06:04 for
the other. This is just a natural consequence of the time it takes
to execute the code, and is not an indicator of the randomness not
being deterministic, which is what the test is testing.
The previous approach to deterministic behaviour was to rely on
'srand' being set upstream, so that it is up to the app to define
a seed if they wish to have consistent results.

However, if the srand is not set explicitly by the developer,
and if the test framework (such as RSpec) does not set the seed
to a random value, then the 'random example' will be the same
example every time, by default. This could be surprising.

We decided it is better to make the developer explicitly opt in
to seed behaviour, than it is to make them set a random seed in
order to get a 'true' random on every test run.

This commit is part 1: defining the 'seed' parameter. Part 2
comes in the next commit, where we'll swap out the call to 'srand'
and use an instance of Random, to prevent changing the global
state of things.

Full discussion:
https://github.com/alphagov/govuk_schemas/pull/56/files\#r458182188
Copy link
Member

@kevindew kevindew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one Chris, this looks good without the side effects. I've just made a few small suggestions. Biggest thing seems to be dealing with Time.now

lib/govuk_schemas/random_helpers.rb Outdated Show resolved Hide resolved
lib/govuk_schemas/random_helpers.rb Outdated Show resolved Hide resolved
lib/govuk_schemas/random_item_generator.rb Outdated Show resolved Hide resolved
lib/govuk_schemas/random_item_generator.rb Outdated Show resolved Hide resolved
This is to avoid clashing with the ruby standard lib
'Random' module, which we'll be using in the next
commit.

At the same time, I've renamed 'random_item_generator.rb' to
'random_schema_generator.rb', which is more reflective of its
purpose and helps to distinguish between the two classes.
Prior to this commit, we relied on the global random seed set via
'srand', insofar as if somebody set 'srand', we'd return the same
result every time.
However, this could be surprising behaviour, as devs calling
RandomExample would expect a random example every time it is
called. We decided in an earlier [discussion] that it would be
better to pass an explicit 'seed' parameter if the dev wants
consistent results.

Whilst we could have used this 'seed' parameter to call
'srand(seed)', the behaviour becomes non-deterministic because
it changes the nature of the global environment. I.e. if the dev
is relying on setting 'srand' for some other part of their test
suite, we would now have skewed it by calling 'srand' again.
Hence the need to move to a sandbox/isolated way of setting
seeded random behaviour.

We used 'Random.new(seed)' and called all randomness-inducing
methods on that returned object, i.e. 'rand => @random.rand',
and 'arr.sample => arr.sample(random: @random)'.

[discussion]: https://github.com/alphagov/govuk_schemas/pull/56/files\#r458182188
Using a seed, we were now returning a deterministic random schema
for everything except time fields, which would vary according to
the current time. This broke the nature of being able to supply a
seed and reliably retrieve the same schema every time.

By setting a random time relative to an arbitrary date (1st Feb
2012; GOV.UK's birthday), we respect the seed and will return the
same 'random' time every time. This means we no longer need to
alter time, so can remove our dependency on Timecop.
This only had one (untested) method, called in one place.
I've now moved its contents directly to where it was being
used.
Copy link
Member

@kevindew kevindew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one, looks good to me 👍

Just to check, have we given this a try in developer docs to confirm it resolves the problem?

@ChrisBAshton ChrisBAshton merged commit e05adef into master Jul 27, 2020
@ChrisBAshton ChrisBAshton deleted the seed branch July 27, 2020 09:29
@ChrisBAshton
Copy link
Contributor Author

I did, yes 👍 (albeit with an earlier version of the code, but am confident it will still work).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants