kartothek 4.0 update_dataset_from_ddf corrupts datasets if a different table name is used #445
Comments
The bug is at https://github.com/JDASoftwareGroup/kartothek/blob/master/kartothek/io_components/metapartition.py#L468, which silently drops the table_name and falls back to the default. Constructor should add
stephan-hesselmann-by added a commit to stephan-hesselmann-by/kartothek that referenced this issue on Apr 8, 2021:
When updating a dataset with a table name other than 'table', an additional table named 'table' is erroneously created. This corrupts the dataset. The issue was introduced after deprecating the table name feature in the 4.0.0 release. The root cause is not passing the table name as an argument within `partition_on` and `add_metapartition`, which leads to the default table name "table" being used.
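The failure mode described in this commit message can be illustrated with a toy model. This is a minimal sketch, not kartothek's code: `ToyMetaPartition`, `partition_on_buggy`, and `partition_on_fixed` are hypothetical names used only to show how a constructor default silently replaces a table name that is not forwarded.

```python
from dataclasses import dataclass
from typing import Dict, List

DEFAULT_TABLE = "table"  # stands in for the library's default table name


@dataclass
class ToyMetaPartition:
    """Hypothetical stand-in for a metapartition; not the kartothek class."""

    label: str
    rows: List[dict]
    table_name: str = DEFAULT_TABLE

    def _group(self, column: str) -> Dict[object, List[dict]]:
        # Group rows by the value of the partitioning column.
        groups: Dict[object, List[dict]] = {}
        for row in self.rows:
            groups.setdefault(row[column], []).append(row)
        return groups

    def partition_on_buggy(self, column: str) -> List["ToyMetaPartition"]:
        # Bug pattern: table_name is not forwarded, so every new object
        # falls back to DEFAULT_TABLE without any warning.
        return [
            ToyMetaPartition(label=f"{column}={key}", rows=rows)
            for key, rows in self._group(column).items()
        ]

    def partition_on_fixed(self, column: str) -> List["ToyMetaPartition"]:
        # Fix pattern: pass the table name through explicitly.
        return [
            ToyMetaPartition(
                label=f"{column}={key}", rows=rows, table_name=self.table_name
            )
            for key, rows in self._group(column).items()
        ]


mp = ToyMetaPartition("p", [{"c": 1}, {"c": 2}], table_name="predictions")
buggy_tables = {part.table_name for part in mp.partition_on_buggy("c")}
fixed_tables = {part.table_name for part in mp.partition_on_fixed("c")}
```

In this sketch the buggy path yields only the default name, while the fixed path keeps `predictions`. A default argument on a constructor makes this kind of omission easy to miss, because nothing fails at the call site.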
stephan-hesselmann-by added a commit that referenced this issue on Apr 12, 2021
stephan-hesselmann-by added a commit to stephan-hesselmann-by/kartothek that referenced this issue on Apr 12, 2021
Fixed by #451
ilia-zaitcev-by added a commit to ilia-zaitcev-by/kartothek that referenced this issue on May 26, 2021:
Revert "Bump codecov/codecov-action from v1.4.1 to v1.5.0 (JDASoftwareGroup#466)" This reverts commit fdc9779.
Revert "fix mistakes in documentation" This reverts commit 4e4b5e0.
Revert "Bump pre-commit/action from v2.0.0 to v2.0.3 (JDASoftwareGroup#460)" This reverts commit d027ca2.
Revert "Bump codecov/codecov-action from v1.4.0 to v1.4.1 (JDASoftwareGroup#461)" This reverts commit 97cd553.
Revert "Bump codecov/codecov-action from v1.3.1 to v1.4.0 (JDASoftwareGroup#458)" This reverts commit e48d67a.
Revert "Fix bug when loading few columns of a dataset with many primary indices (JDASoftwareGroup#446)" This reverts commit 90ee486.
Revert "Prepare release 4.0.1" This reverts commit b278503.
Revert "Fix tests for dask dataframe and delayed backends" This reverts commit 5520f74.
Revert "Add end-to-end regression test" This reverts commit 8a3e6ae.
Revert "Fix dataset corruption after updates (JDASoftwareGroup#445)" This reverts commit a26e840.
Revert "Set release date for 4.0" This reverts commit 08a8094.
Revert "Return dask scalar for store and update from ddf (JDASoftwareGroup#437)" This reverts commit 494732d.
Revert "Add tests for non-default table (JDASoftwareGroup#440)" This reverts commit 3807a02.
Revert "Bump codecov/codecov-action from v1.2.2 to v1.3.1 (JDASoftwareGroup#441)" This reverts commit f7615ec.
Revert "Set default for dates_as_object to True (JDASoftwareGroup#436)" This reverts commit 75ffdb5.
Revert "Remove inferred indices (JDASoftwareGroup#438)" This reverts commit b1e2535.
Revert "fix typo: 'KTK_CUBE_UUID_SEPERATOR' -> 'KTK_CUBE_UUID_SEPARATOR' (JDASoftwareGroup#422)" This reverts commit b349cee.
Revert "Remove all deprecated arguments (JDASoftwareGroup#434)" This reverts commit 74f0790.
Revert "Remove multi table feature (JDASoftwareGroup#431)" This reverts commit 032856a.
ilia-zaitcev-by added a commit to ilia-zaitcev-by/kartothek that referenced this issue on Jun 11, 2021
steffen-schroeder-by pushed a commit that referenced this issue on Jun 11, 2021
Problem description
Kartothek 4.0 breaks updating existing datasets that use a table name other than the default.
Expected Behaviour
The table name keyword is respected when updating datasets.
Example code
Output:
Expected Output:
The example above is just a minimal example showing that the table name is not used in update_dataset_from_ddf. The corruption occurs when you update an existing dataset (created with kartothek before 4.0) using the new release: the dataset suddenly contains two table names ('predictions' and 'table').
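One way to check whether a dataset has ended up with more than one table is to inspect the storage keys. The following is a rough sketch under an assumed key layout of `<dataset_uuid>/<table>/<partition dirs>/<file>.parquet`, with index files under `<dataset_uuid>/indices/`; the helper `table_names_in_keys` and the sample keys are hypothetical and not part of kartothek.

```python
from typing import Iterable, Set


def table_names_in_keys(keys: Iterable[str], dataset_uuid: str) -> Set[str]:
    """Collect table names from storage keys (assumed layout, see above)."""
    prefix = dataset_uuid + "/"
    tables: Set[str] = set()
    for key in keys:
        # Only look at parquet files that live under the dataset prefix.
        if not key.startswith(prefix) or not key.endswith(".parquet"):
            continue
        parts = key[len(prefix):].split("/")
        # Need at least <table>/<file>; skip the assumed index directory.
        if len(parts) < 2 or parts[0] == "indices":
            continue
        tables.add(parts[0])
    return tables


# Hypothetical keys for a dataset that was updated with the buggy release.
sample_keys = [
    "ds/predictions/P=1/a.parquet",
    "ds/table/P=1/b.parquet",
    "ds/indices/P/2021.by-dataset-index.parquet",
    "ds.by-dataset-metadata.json",
]
found = table_names_in_keys(sample_keys, "ds")
is_corrupted = len(found) > 1
```

With the sample keys, `found` contains both `predictions` and `table`, which matches the corrupted state described in this issue. The keys can come from any listing of the underlying store.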
raises the following exception
Dataset layout
Used versions