Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent VARCHAR round-trip auto cast, check for duplicate primary keys and add preprocessing tests #624

Merged
merged 4 commits into from
Oct 25, 2024

Conversation

Taepper
Copy link
Collaborator

@Taepper Taepper commented Oct 24, 2024

Summary

We now prevent VARCHAR round-trip auto cast which lead to a VARCHAR column being first cast into a json and back by duckdb automatically. We also added a test for this.

Additionally, I added a check and test for duplicate primary keys

PR Checklist

- [ ] All necessary documentation has been adapted or there is an issue to do so.

  • The implemented feature is covered by an appropriate test.

Copy link
Contributor

github-actions bot commented Oct 24, 2024

This is a preview of the changelog of the next release. If this branch is not up-to-date with the current main branch, the changelog may not be accurate. Rebase your branch on the main branch to get the most accurate changelog.

Note that this might contain changes that are on main, but not yet released.

Changelog:

0.3.1 (2024-10-24)

Bug Fixes

  • preprocessing: error on duplicate primary keys (d28cae7)
  • validate the column types earlier to prevent duckdb auto-casting to cast types with differing round-trip conversions (092e601)

Copy link
Contributor

@fengelniederhammer fengelniederhammer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fengelniederhammer fengelniederhammer force-pushed the varchar-autocast branch 2 times, most recently from 1c02fb7 to 59b41f3 Compare October 24, 2024 13:00
… to cast types with differing round-trip conversions

We had a problem where DuckDB would wrongly autodetect types in the NDJSON input.
Types were inferred to "json" where they should have been strings.
@anna-parker
Copy link
Contributor

Since this change the dummyOrganism is failing to start because

[2024-10-25 10:29:27.109] [logger] [error] [api.cpp:289] Unknown metadata type: pango_lineage

This isn't very important as we only use it in the dummyOrganism - but would be good to figure out why

@Taepper
Copy link
Collaborator Author

Taepper commented Oct 25, 2024

Apparently, the linter problem in the github runners is more involved than I can reproduce. I will merge this anyways, as these bugfixes are quite good to have upstream. Work on the build fixes will meanwhile continue, but can take up to 4 hours to see the result, as the ARM images often need to be rebuild (.. and use the very slow github virtualization)

@Taepper Taepper merged commit 7062eb3 into main Oct 25, 2024
8 of 9 checks passed
@Taepper Taepper deleted the varchar-autocast branch October 25, 2024 10:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants