Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In TaggedTable override methods of Table that return a Table #58

Closed
lars-reimann opened this issue Mar 22, 2023 · 10 comments · Fixed by #332
Closed

In TaggedTable override methods of Table that return a Table #58

lars-reimann opened this issue Mar 22, 2023 · 10 comments · Fixed by #332
Assignees
Labels
released Included in a release

Comments

@lars-reimann
Copy link
Member

lars-reimann commented Mar 22, 2023

Is your feature request related to a problem?

Now that TaggedTable is a subclass of Table, it also inherits some methods that return a new Table. However, currently those methods lose the tags, since a Table simply doesn't know about them. Methods that remove tagged Columns must
throw an IllegalSchemaModificationException.

Desired solution

Override the methods and return a new TaggedTable instead.

Possible alternatives (optional)

No response

Screenshots (optional)

No response

Additional Context (optional)

No response

@github-project-automation github-project-automation bot moved this to Backlog in Library Mar 22, 2023
@lars-reimann lars-reimann moved this from Backlog to 🧱 Blocked in Library May 6, 2023
@lars-reimann
Copy link
Member Author

Blocked by #190.

@alex-senger alex-senger moved this from 🧱 Blocked to Todo in Library May 12, 2023
@sibre28 sibre28 moved this from Todo to 🧱 Blocked in Library May 12, 2023
@sibre28 sibre28 moved this from 🧱 Blocked to Todo in Library May 12, 2023
@Marsmaennchen221 Marsmaennchen221 moved this from Todo to 🧱 Blocked in Library May 12, 2023
@PhilipGutberlet PhilipGutberlet moved this from 🧱 Blocked to Todo in Library May 19, 2023
@zzril zzril linked a pull request May 29, 2023 that will close this issue
@zzril zzril moved this from Todo to In Progress in Library May 29, 2023
@zzril
Copy link
Contributor

zzril commented May 30, 2023

Converning the InvalidSchemaModificationError that shall be thrown when attempting to remove the target column from a TaggedTable: Should this exception also be documented in the docstring of the parent class's implementation of the corresponding method?

I guess in a proper statically-typed language like Java, this would be the way to go (subclass can only throw fewer or more concrete exceptions than the parent class), but I'm not sure how this is typically done in Python.

@lars-reimann
Copy link
Member Author

Only put it in the docstring of the subclass.

@zzril
Copy link
Contributor

zzril commented May 30, 2023

Regarding the transform (and possibly inverse_transform) methods of TableTransformer: Shall these also maintain tagging?

That could not be achieved by overriding. Instead, one could add a check in the fit method to see if the table is tagged and if so, save the corresponding column name in the transformer. Then restore tagging at the end of the transform methods.
Might be a bit trickier with those transformers that can also change / remove columns (e.g. the OneHotEncoder).

Might be worth a new issue, though.

@lars-reimann
Copy link
Member Author

No, only the transform and inverse_transform methods of TaggedTable should have that property.

@zzril
Copy link
Contributor

zzril commented Jun 1, 2023

Only put it in the docstring of the subclass.

Turns out this behaviour of letting the subclass throw more exceptions is problematic.

Some of the tests (e.g. test_should_raise_if_dataset_misses_features in tests/safeds/ml/classical/regression/test_regressor.py mistakenly call remove_columns on a TaggedTable when they really should be using a normal Table.

Now while we could just go ahead and change the tests to use Tables instead, the fact that we use inheritance for the Table / TaggedTable would indicate that this should not really be a problem. Replacing a usage of a Table with a subclass, e.g. TaggedTable, should - in theory - not break anything. And this is also what users of the library may expect.

So, we either need to change the signature/docstring of the method in the parent class to make clear that the method can always throw the exception, so the caller must be prepared to handle it.
Or we could have remove_columns return a Table even in the child class. In this case, we could additionally provide a new method remove_feature_columns in the subclass that returns a TaggedTable and possibly throws the exception.

@lars-reimann

@guenterk
Copy link

guenterk commented Jun 9, 2023

For simplicity and uniformity, implement the first solution proposed by @zzril : "Change the signature/docstring of the method in the parent class to make clear that the method can always throw the exception, so the caller must be prepared to handle it."

For uniformity, apply the same schema if similar cases occur in the future (allways make sure that the supertype's contract is general enought).

@zzril
Copy link
Contributor

zzril commented Jun 9, 2023

Alright.

Some of the ML tests currently use remove_columns to extract the feature columns out of a TaggedTable, though, which will not be possible after changing it to always return a TaggedTable or throw.

So, we will have to implement a get_features_as_table or remove_target_column method anyway, for these tests to use.

Example from test_classifier.py:

    def test_should_include_complete_input_table(self, classifier: Classifier, valid_data: TaggedTable) -> None:
        fitted_regressor = classifier.fit(valid_data)
        prediction = fitted_regressor.predict(valid_data.remove_columns(["target"]))
        assert prediction.remove_columns(["target"]) == valid_data.remove_columns(["target"])

@guenterk
Copy link

guenterk commented Jun 9, 2023

End of day status report by @zzril: Implemented the above solution, except for the "keep_only" method of TaggedTable, which is heavily used throughout tests and the implementation of the TaggedTable class itself (e.g. in the constructor) in a way where returning a TaggedTable breaks existing code. Wee need to discuss these cases and determine how to treat them before we can complete this issue.

@Marsmaennchen221 Marsmaennchen221 moved this from In Progress to Ready for Review in Library Jul 7, 2023
@alex-senger alex-senger moved this from Ready for Review to Ready to Merge in Library Jul 7, 2023
@zzril zzril moved this from Ready to Merge to Ready for Review in Library Jul 9, 2023
@zzril zzril closed this as completed in #332 Jul 9, 2023
zzril added a commit that referenced this issue Jul 9, 2023
Closes #58.

### Summary of Changes

* Overrode methods in `TaggedTable` inherited from `Table` to maintain
tagging.
* Introduced new exceptions to be thrown when the overriden method
cannot be applied
* Added tests.

---------

Co-authored-by: megalinter-bot <129584137+megalinter-bot@users.noreply.github.com>
Co-authored-by: Alexander Gréus <alexgreus51@gmail.com>
Co-authored-by: Alexander <47296670+Marsmaennchen221@users.noreply.github.com>
Co-authored-by: patrikguempel <patrikguempel@gmail.com>
Co-authored-by: Simon Breuer <86068340+sibre28@users.noreply.github.com>
@github-project-automation github-project-automation bot moved this from Ready for Review to ✔️ Done in Library Jul 9, 2023
lars-reimann pushed a commit that referenced this issue Jul 13, 2023
## [0.15.0](v0.14.0...v0.15.0) (2023-07-13)

### Features

* Add copy method for tables ([#405](#405)) ([72e87f0](72e87f0)), closes [#275](#275)
* add gaussian noise to image ([#430](#430)) ([925a505](925a505)), closes [#381](#381)
* add schema conversions when adding new rows to a table and schema conversion when creating a new table ([#432](#432)) ([6e9ff69](6e9ff69)), closes [#404](#404) [#322](#322) [#127](#127) [#322](#322) [#127](#127)
* add test for empty tables for the method `Table.sort_rows` ([#431](#431)) ([f94b768](f94b768)), closes [#402](#402)
* added color adjustment feature ([#409](#409)) ([2cbee36](2cbee36)), closes [#380](#380)
* added test_repr table tests ([#410](#410)) ([cb77790](cb77790)), closes [#349](#349)
* discretize table ([#327](#327)) ([5e3da8d](5e3da8d)), closes [#143](#143)
* Improve error handling of TaggedTable ([#450](#450)) ([c5da544](c5da544)), closes [#150](#150)
* Maintain tagging in methods inherited from `Table` class ([#332](#332)) ([bc73a6c](bc73a6c)), closes [#58](#58)
* new error class `OutOfBoundsError` ([#438](#438)) ([1f37e4a](1f37e4a)), closes [#262](#262)
* rename several `Table` methods for consistency ([#445](#445)) ([9954986](9954986)), closes [#439](#439)
* suggest similar columns if column gets accessed that doesnt exist ([#385](#385)) ([6a097a4](6a097a4)), closes [#203](#203)

### Bug Fixes

* added the missing ids in parameterized tests ([#412](#412)) ([dab6419](dab6419)), closes [#362](#362)
* don't warn if `Imputer` transforms column without missing values ([#448](#448)) ([f0cb6a5](f0cb6a5))
* Warnings raised by underlying seaborn and numpy libraries  ([#425](#425)) ([c4143af](c4143af)), closes [#357](#357)
@lars-reimann
Copy link
Member Author

🎉 This issue has been resolved in version 0.15.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@lars-reimann lars-reimann added the released Included in a release label Jul 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
released Included in a release
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants