Emit JSON schema and BigQuery schema #6

Merged
merged 5 commits into from
Jun 20, 2022

Conversation

duncanjbrown

When setting up a new project using dfe-analytics, it's necessary to create a schema in BigQuery that's compatible with the schema in this project.

BigQuery uses its own schema format, so we need to map the JSON schema to that format. The BigQuery schema contains less information than the JSON schema (for example, it lacks enums and additionalProperties). We keep the JSON schema as the source of truth for testing, and generate a BigQuery schema from it.
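
To illustrate the kind of mapping described above, here is a minimal sketch, not the code from this PR. It assumes a flat JSON schema with scalar properties only; the method name big_query_columns, the type table, and the example properties (event_type, response_status) are hypothetical. Constraints such as enum and additionalProperties are simply dropped, because the BigQuery column format has no place for them.

```ruby
require 'json'

# Hypothetical lookup from JSON schema scalar types to BigQuery column types.
JSON_TO_BIGQUERY_TYPES = {
  'string' => 'STRING',
  'integer' => 'INTEGER',
  'number' => 'FLOAT',
  'boolean' => 'BOOLEAN'
}.freeze

# Convert the top-level properties of a JSON schema (a Hash) into an array
# of BigQuery column definitions. Anything BigQuery cannot express
# (enum, additionalProperties) is ignored.
def big_query_columns(json_schema)
  required = json_schema.fetch('required', [])

  json_schema.fetch('properties', {}).map do |name, property|
    {
      'name' => name,
      'type' => JSON_TO_BIGQUERY_TYPES.fetch(property['type']),
      'mode' => required.include?(name) ? 'REQUIRED' : 'NULLABLE'
    }
  end
end

# Illustrative schema only; the real event-schema.json may differ.
example_schema = {
  'type' => 'object',
  'required' => ['event_type'],
  'additionalProperties' => false,
  'properties' => {
    'event_type' => { 'type' => 'string', 'enum' => %w[web_request create_entity] },
    'response_status' => { 'type' => 'integer' }
  }
}

puts JSON.pretty_generate(big_query_columns(example_schema))
```

The printed array of name/type/mode objects follows the JSON column format that BigQuery accepts for table schemas, which is why the output can be pasted into the web interface.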

This PR adds two new rake tasks: dfe:analytics:schema and dfe:analytics:big_query_schema. The BigQuery schema is suitable for copy-pasting into the BigQuery web interface. It could also be used to generate a table via the bq CLI tool.

Changes to the schema!

During this process I noticed that event-schema.json did not match the BigQuery schema (for Apply, at least), so I amended it. The changes only relax requirements, so they are not breaking.

@duncanjbrown duncanjbrown requested a review from misaka May 19, 2022 10:52
@@ -0,0 +1,73 @@
module DfE
module Analytics
class EventSchema

Suggested change
- class EventSchema
+ module EventSchema

Why is this a class and not a module?

misaka commented Jun 16, 2022

I can't for the life of me understand why RuboCop is singling out lib/dfe/analytics/event_schema.rb for not having top-level documentation. Other classes and modules don't have top-level docs. I even tried copying over the entire contents of load_entities.rb into event_schema.rb, and RuboCop complains about event_schema.rb and not load_entities.rb! 🤯

$ be rubocop lib/dfe/analytics/load_entities.rb lib/dfe/analytics/event_schema.rb
Inspecting 2 files
.C

Offenses:

lib/dfe/analytics/event_schema.rb:5:5: C: Style/Documentation: Missing top-level documentation comment for class DfE::Analytics::LoadEntities.
    class LoadEntities
    ^^^^^^^^^^^^^^^^^^

2 files inspected, 1 offense detected

This makes some assumptions about the content of repeated fields, but as
we control the event schema we think it is unlikely to break
unexpectedly
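
The commit message does not spell out which assumptions it makes. As one plausible reading, the sketch below assumes that every array property holds objects whose fields are all strings, and maps it to a REPEATED RECORD column. The helper name repeated_column and the example property (data, with key and value fields) are hypothetical and not taken from the PR.

```ruby
# Hypothetical helper: map a JSON schema array property to a REPEATED
# BigQuery column. Assumes the array items are objects and that every
# field inside them can be stored as a nullable STRING.
def repeated_column(name, property)
  items = property.fetch('items')
  raise ArgumentError, "#{name}: expected an array of objects" unless items['type'] == 'object'

  fields = items.fetch('properties').keys.map { |field_name|
    { 'name' => field_name, 'type' => 'STRING', 'mode' => 'NULLABLE' }
  }

  { 'name' => name, 'type' => 'RECORD', 'mode' => 'REPEATED', 'fields' => fields }
end

# Illustrative property only; the real event schema may differ.
data_property = {
  'type' => 'array',
  'items' => {
    'type' => 'object',
    'properties' => {
      'key' => { 'type' => 'string' },
      'value' => { 'type' => 'string' }
    }
  }
}

p repeated_column('data', data_property)
```

Raising on anything other than an array of objects makes the assumption fail loudly if the event schema ever gains a repeated field of a different shape.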
@duncanjbrown duncanjbrown merged commit f10c3da into main Jun 20, 2022
@duncanjbrown duncanjbrown deleted the schema-task branch June 20, 2022 14:00
3 participants