Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid double validation of payloads #113

Merged
merged 2 commits into from
May 22, 2024
Merged

Avoid double validation of payloads #113

merged 2 commits into from
May 22, 2024

Conversation

kevindew
Copy link
Member

The call to validate a schema is quite expensive:

[3] pry(#<GovukSchemas::RandomExample>)> puts Benchmark.measure { 100.times { JSON::Validator.fully_validate(@Schema, item, errors_as_objects: true) } }
  7.614167   1.532485   9.146652 ( 10.468765)

This updates the code to avoid making two calls to the validator when a
payload is generated with a block if the customised version is valid.

Before:

irb(main):010:0> puts Benchmark.measure { 100.times { GovukSchemas::RandomExample.for_schema(frontend_schema: "news_artic
le") { |schema| schema["title"] = "Custom"; schema } } }
 13.308641   3.087679  16.396320 ( 18.977005)

After:

irb(main):002:0> puts Benchmark.measure { 100.times { GovukSchemas::RandomExample.for_schema(frontend_schema: "news_artic
le") { |schema| schema["title"] = "Custom"; schema } } }
  7.654303   1.561620   9.215923 ( 10.604802)

The motivation for this was that the GOV.UK Chat test suite makes quite
heavy use of random schemas with customisations for checking supported
schemas. On a MBP with GOV.UK Docker this reduces the time of the
running the test suite from around ~55s to around ~35s.

@kevindew kevindew force-pushed the iterate-random-example branch from bdfc679 to 48130ba Compare May 22, 2024 08:45
Copy link
Member

@yndajas yndajas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert on this, but it generally looks sensible to me. A few minor comments, including about the change in error messaging

lib/govuk_schemas/random_example.rb Outdated Show resolved Hide resolved
lib/govuk_schemas/random_example.rb Outdated Show resolved Hide resolved
lib/govuk_schemas/random_example.rb Outdated Show resolved Hide resolved
The call to validate a schema is quite expensive:

```
[3] pry(#<GovukSchemas::RandomExample>)> puts Benchmark.measure { 100.times { JSON::Validator.fully_validate(@Schema, item, errors_as_objects: true) } }
  7.614167   1.532485   9.146652 ( 10.468765)
```

This updates the code to avoid making two calls to the validator when a
payload is generated with a block if the customised version is valid.

Before:

```
irb(main):010:0> puts Benchmark.measure { 100.times { GovukSchemas::RandomExample.for_schema(frontend_schema: "news_artic
le") { |schema| schema["title"] = "Custom"; schema } } }
 13.308641   3.087679  16.396320 ( 18.977005)
```

After:

```
irb(main):002:0> puts Benchmark.measure { 100.times { GovukSchemas::RandomExample.for_schema(frontend_schema: "news_artic
le") { |schema| schema["title"] = "Custom"; schema } } }
  7.654303   1.561620   9.215923 ( 10.604802)
```

The motivation for this was that the GOV.UK Chat test suite makes quite
heavy use of random schemas with customisations for checking supported
schemas. On a MBP with GOV.UK Docker this reduces the time of the
running the test suite from around ~55s to around ~35s.
@kevindew kevindew force-pushed the iterate-random-example branch from 48130ba to 3a7e16b Compare May 22, 2024 10:51
@kevindew
Copy link
Member Author

Thanks for picking this up @yndajas 👍

@kevindew kevindew merged commit 08d5006 into main May 22, 2024
9 checks passed
@kevindew kevindew deleted the iterate-random-example branch May 22, 2024 11:30
kevindew added a commit that referenced this pull request May 23, 2024
The change in #113 didn't
handle a common usage scenario where a user modifies the payload passed
into the block (example usage [1]).

In order to handle this we need to make a copy of the original payload
before it is modified. As this project doesn't have an ActiveSupport
dependency I wrote the equivalent of a deep_dup method. using
`Marshal.load(Marshal.dump(object))`, with an expectation that given the
data generated is JSON in a Ruby Hash/Array structure there'd be no
risk of objects not having Marshal support.

An earlier approach was to iterate ourselves, but that was a bit more
verbose:

```
    def deep_dup_payload(item)
      case item
      when Array
        item.map { |i| deep_dup_payload(i) }
      when Hash
        item.dup.transform_values { |v| deep_dup_payload(v) }
      else
        item.dup
      end
    end
```

Anecdotally it does seem that previously this project considered
modifying the payload as unsupported [2], but this wasn't enforced by a
frozen object and would now be a breaking change to require this.

[1]: https://github.com/alphagov/content-data-api/blob/cddc744baa0a073868a527b5513a7943fa53ddc6/spec/factories/messages.rb#L19-L26
[2]: https://github.com/alphagov/govuk_schemas/blob/e1da17587b10df8ae3b29d7583c3a4881a377f97/spec/lib/random_example_spec.rb#L45-L53
kevindew added a commit that referenced this pull request May 23, 2024
The change in #113 didn't
handle a common usage scenario where a user modifies the payload passed
into the block (example usage [1]).

In order to handle this we need to make a copy of the original payload
before it is modified. As this project doesn't have an ActiveSupport
dependency I wrote the equivalent of a deep_dup method. using
`Marshal.load(Marshal.dump(object))`, with an expectation that given the
data generated is JSON in a Ruby Hash/Array structure there'd be no
risk of objects not having Marshal support.

Marshal is a Ruby module that can serialise Ruby objects into a byte
stream and can take a byte stream and re-create the objects. By doing
this for a payload it creates new objects for the entire payload hash.
This means that any modifications to the payload are not represented in
the original_payload object.

An earlier approach was to iterate ourselves, but that was a bit more
verbose:

```
    def deep_dup_payload(item)
      case item
      when Array
        item.map { |i| deep_dup_payload(i) }
      when Hash
        item.dup.transform_values { |v| deep_dup_payload(v) }
      else
        item.dup
      end
    end
```

Anecdotally it does seem that previously this project considered
modifying the payload as unsupported [2], but this wasn't enforced by a
frozen object and would now be a breaking change to require this.

[1]: https://github.com/alphagov/content-data-api/blob/cddc744baa0a073868a527b5513a7943fa53ddc6/spec/factories/messages.rb#L19-L26
[2]: https://github.com/alphagov/govuk_schemas/blob/e1da17587b10df8ae3b29d7583c3a4881a377f97/spec/lib/random_example_spec.rb#L45-L53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants