Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boolean columns #384

Merged
merged 22 commits into from
Apr 11, 2024
Merged

Boolean columns #384

merged 22 commits into from
Apr 11, 2024

Conversation

pflanze
Copy link
Contributor

@pflanze pflanze commented Apr 8, 2024

resolves #219

@pflanze
Copy link
Contributor Author

pflanze commented Apr 8, 2024

Looks like the e2e tests fail with hash based ordering issues. How to solve?

And the linter fails with cognitive complexity: fair enough, it does strike me that the code is missing some abstractions for these things in general. But this will need a bit more study on my part and it's not the right time to do this now. Alternatively could introduce a macro or template here as a hack but I'm not sure the linter will be happier about it?

Copy link
Contributor

@fengelniederhammer fengelniederhammer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far 👍

include/silo/storage/column/bool_column.h Outdated Show resolved Hide resolved
@fengelniederhammer
Copy link
Contributor

And please squash your commits before merging such that the message conforms to https://www.conventionalcommits.org/.

@pflanze
Copy link
Contributor Author

pflanze commented Apr 9, 2024

Looks like the e2e tests fail with hash based ordering issues.

That wasn't actually the case; it didn't get the new column in the test results because the database_config.yaml for the ndjson tests wasn't updated (which I didn't notice because my test runner didn't run the ndjson tests; I'll make a PR with a proper test runner in the future).

I have also pushed a commit that replaces exampleDatasetAsNdjson/database_config.yaml with a symlink that points to exampleDataset/database_config.yaml, since the two files are identical. In the future if the files ever diverge, the symlink could be replaced with the changed content again.

@pflanze
Copy link
Contributor Author

pflanze commented Apr 9, 2024

Should we add unit tests that test the BooleanEquals filter?

There are 3 e2e tests that do so?: endToEndTests/test/queries/booleanEquals*.json

I guess you really mean unit tests?

@fengelniederhammer
Copy link
Contributor

I guess you really mean unit tests?

Yes, I mean unit tests in https://github.com/GenSpectrum/LAPIS-SILO/tree/003c7ca53cb7915493e1a0871519a397dde77de6/src/silo/test

@pflanze pflanze force-pushed the boolean_columns branch 2 times, most recently from 9670911 to a38de21 Compare April 9, 2024 12:24
@pflanze
Copy link
Contributor Author

pflanze commented Apr 9, 2024

And please squash your commits before merging such that the message conforms to https://www.conventionalcommits.org/.

I've edited the messages to follow conventional commits, while keeping the commits separate as per discussion; except I've reduced the commits with test updates to only 2.

Copy link
Contributor

@fengelniederhammer fengelniederhammer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

- {addValueToColumn,reserveSpaceInColumn,getSubgroup,getValue}()

- Had to bring 'complexity' down to satisfy clangd, hence separate
  addNullToColumn() method and early value.IsNull() handling.
- add DatabasePartition::insertColumn( storage::column::BoolColumnPartition )
- update DatabasePartition::validateMetadataColumns
No need for a wrapper, a type alias is enough.
std::string has no `+` overload for string_view, hence use fmt::format
pflanze and others added 5 commits April 11, 2024 16:27
…esults

- Update {exampleDataset,exampleDatasetAsNdjson}/database_config.yaml.

- `small_metadata_set.tsv`: add test_boolean_column column with
  manually-randomized choices.

- Add metadata.test_boolean_column to `input_file.ndjson`. Copied over
  from `small_metadata_set.tsv` using
  https://github.com/pflanze/ndjson-updater/.

- Add new column to test results; sorting changed because it is
  currently dependent on the hash of the object contents.
Test results were verified using
https://github.com/pflanze/ndjson-updater/.

In `booleanEquals_Or.json`, `"value": "B.1"` would return `"count":
97`, but the second subquery would include all of those from the first
hence not so interesting, thus `"B.1.1"` was chosen.
…symlink

The contents was and likely will remain identical to
exampleDataset/database_config.yaml
@pflanze
Copy link
Contributor Author

pflanze commented Apr 11, 2024

Changed the commit messages to properly follow conventional commit spec, and only using issue number in the last commit to avoid spamming the linked GitHub issue.

This concludes the work, if there are no objections I will merge.

@pflanze
Copy link
Contributor Author

pflanze commented Apr 11, 2024

I don't know what failed in the CI. It appears to be something building the docker container? The tests run fine for me locally and the content didn't change from before the last force push.

@pflanze
Copy link
Contributor Author

pflanze commented Apr 11, 2024

After re-running for about the 4th time both tests went through, so it was an issue with GitHub running out of some resources, or some indeterminism in the tests (race condition).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement column type boolean
2 participants