Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EES-5152 Add Public API Data Processor tests for ImportMetadata and ImportData #4971

Merged
merged 4 commits into from
Jun 18, 2024

Conversation

benoutram
Copy link
Collaborator

@benoutram benoutram commented Jun 17, 2024

This PR adds further tests to the Public API Data Processor's ImportMetadata and ImportData activities.

It makes a change to the ProcessorTestData class to support adding sets of expected test data. At present there is only one data set - AbsenceSchool. The intention is that further sets can easily be added if required without writing new tests as no tests are tied to specific data.

ImportMetadata tests

Two different sets of tests were added here:

  • Add tests to assert the contents of the PostgreSQL metadata for the imported version against the expected data set.
  • Add tests to assert the contents of the DuckDb metadata tables for the imported version against the expected data set.

Also assert the version has the correct DataSetVersionMetaSummary and correct TotalResults.

ImportData tests

Tests are added to assert the DuckDb data table has the correct row count, correct expected columns (base columns Id, GeographicLevel and TimePeriodId and all of the flexible location, filter and indicator columns), and that the data table contains the correct distinct set of values for time periods, geographic levels, location options and and filter options.

WriteDataFiles tests

No further tests were written for the WriteDataFiles activity. The reason being that we are now testing the contents of the DuckDb metadata and data tables, and the existing ProcessInitialDataSetVersionFunctionTests.WriteDataFilesTests.Success test is already testing that the table files which should be exported are present on the filesystem.

Any additional tests to check the data would be duplication of the tests added for the ImportMetadata or ImportData activities, or testing the correctness of DuckDb's own export feature.

Further considerations

  • This PR fixes a problem where the PublicId Sqids which were offset from the Id/Index that we expect them to be generated from by a value of one. However thinking about this has identified another problem that due to the way they are using the table counts as the starting index for inserts, any deletions from these tables can result in newer data set versions reusing old PublicId's. This will be addressed by EES-5235.

  • The test data set is not large enough to fully exercise the batching that's used to to generate filter options and location options so I was unsure if dedicated tests should be written for this or if a larger test data set should be added. This has been left for now.

  • The tests are not tied to any specific data set but this means the ImportData tests are not testing the integrity of specific rows. They are only testing that the columns (excluding indicator columns) contain the expected distinct set of labels and id's. We might want to do additional work to compare expected row values either for all rows or a small subset of rows, e.g. the first row.

@benoutram benoutram merged commit c2bd98d into dev Jun 18, 2024
7 checks passed
@benoutram benoutram deleted the EES-5152 branch June 18, 2024 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants