fix: OneHotEncoder no longer creates duplicate column names #271
Also count the number of occurrences of column names
Tests pass; column name format not yet as specified (one "_" instead of two)
Co-authored-by: ilkajw <123072184+ilkajw@users.noreply.github.com>
🦙 MegaLinter status: ✅ SUCCESS
See detailed report in MegaLinter reports
Co-authored-by: ilkajw <123072184+ilkajw@users.noreply.github.com>
reverse_transform still missing
Still need to fix one test which checks the wrapped_encoder. Still need to change single to double underscore and update tests accordingly. Co-authored-by: ilkajw <123072184+ilkajw@users.noreply.github.com>
…lumns-with-same-name
Codecov Report

| Coverage Diff | main | #271 | +/- |
| --- | --- | --- | --- |
| Coverage | 100.00% | 100.00% | |
| Files | 43 | 43 | |
| Lines | 1761 | 1786 | +25 |
| Hits | 1761 | 1786 | +25 |
Changed to double underscore. Adapted tests accordingly.
Note that this breaks code that depends on the old column naming scheme (single underscore as separator). I'm not sure whether the keyword in the PR message is enough to flag that. Also note that this PR does not yet include performance tests.
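To make the breakage concrete, here is a minimal sketch of a lookup that relied on the old names. The helpers `old_name` and `new_name` are hypothetical, and the format strings are inferred from this thread rather than copied from the library.

```python
# Hypothetical helpers; the format strings are assumptions based on this thread.
def old_name(column: str, value: object) -> str:
    # Naming before this PR: single underscore as separator.
    return f"{column}_{value}"


def new_name(column: str, value: object) -> str:
    # Naming after this PR: double underscore as separator.
    return f"{column}__{value}"


# Column names an encoder following the new scheme would generate.
encoded = {new_name("color", value) for value in ("red", "blue")}

# Downstream code that still builds names the old way no longer finds its column.
lookup_still_works = old_name("color", "red") in encoded
print(lookup_still_works)
```

Code that addresses encoded columns by name would need to switch to the double-underscore form.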
While the version of this library is in the |
Once #301 is implemented, we can shorten the implementation of |
Looks great, thanks!
## [0.12.0](v0.11.0...v0.12.0) (2023-05-11)

### Features

* add `learning_rate` to AdaBoost classifier and regressor. ([#251](#251)) ([7f74440](7f74440)), closes [#167](#167)
* add alpha parameter to `lasso_regression` ([#232](#232)) ([b5050b9](b5050b9)), closes [#163](#163)
* add parameter `lasso_ratio` to `ElasticNetRegression` ([#237](#237)) ([4a1a736](4a1a736)), closes [#166](#166)
* Add parameter `number_of_tree` to `RandomForest` classifier and regressor ([#230](#230)) ([414336a](414336a)), closes [#161](#161)
* Added `Table.plot_boxplots` to plot a boxplot for each numerical column in the table ([#254](#254)) ([0203a0c](0203a0c)), closes [#156](#156) [#239](#239)
* Added `Table.plot_histograms` to plot a histogram for each column in the table ([#252](#252)) ([e27d410](e27d410)), closes [#157](#157)
* Added `Table.transform_table` method which returns the transformed Table ([#229](#229)) ([0a9ce72](0a9ce72)), closes [#110](#110)
* Added alpha parameter to `RidgeRegression` ([#231](#231)) ([1ddc948](1ddc948)), closes [#164](#164)
* Added Column#transform ([#270](#270)) ([40fb756](40fb756)), closes [#255](#255)
* Added method `Table.inverse_transform_table` which returns the original table ([#227](#227)) ([846bf23](846bf23)), closes [#111](#111)
* Added parameter `c` to `SupportVectorMachines` ([#267](#267)) ([a88eb8b](a88eb8b)), closes [#169](#169)
* Added parameter `maximum_number_of_learner` and `learner` to `AdaBoost` ([#269](#269)) ([bb5a07e](bb5a07e)), closes [#171](#171) [#173](#173)
* Added parameter `number_of_trees` to `GradientBoosting` ([#268](#268)) ([766f2ff](766f2ff)), closes [#170](#170)
* Allow arguments of type pathlib.Path for file I/O methods ([#228](#228)) ([2b58c82](2b58c82)), closes [#146](#146)
* convert `Schema` to `dict` and format it nicely in a notebook ([#244](#244)) ([ad1cac5](ad1cac5)), closes [#151](#151)
* Convert between Excel file and `Table` ([#233](#233)) ([0d7a998](0d7a998)), closes [#138](#138) [#139](#139)
* convert containers for tabular data to HTML ([#243](#243)) ([683c279](683c279)), closes [#140](#140)
* make `Column` a subclass of `Sequence` ([#245](#245)) ([a35b943](a35b943))
* mark optional hyperparameters as keyword only ([#296](#296)) ([44a41eb](44a41eb)), closes [#278](#278)
* move exceptions back to common package ([#295](#295)) ([a91172c](a91172c)), closes [#177](#177) [#262](#262)
* precision metric for classification ([#272](#272)) ([5adadad](5adadad)), closes [#185](#185)
* Raise error if an untagged table is used instead of a `TaggedTable` ([#234](#234)) ([8eea3dd](8eea3dd)), closes [#192](#192)
* recall and F1-score metrics for classification ([#277](#277)) ([2cf93cc](2cf93cc)), closes [#187](#187) [#186](#186)
* replace prefix `n` with `number_of` ([#250](#250)) ([f4f44a6](f4f44a6)), closes [#171](#171)
* set `alpha` parameter for regularization of `ElasticNetRegression` ([#238](#238)) ([e642d1d](e642d1d)), closes [#165](#165)
* Set `column_names` in `fit` methods of table transformers to be required ([#225](#225)) ([2856296](2856296)), closes [#179](#179)
* set learning rate of Gradient Boosting models ([#253](#253)) ([9ffaf55](9ffaf55)), closes [#168](#168)
* Support vector machine for regression and for classification ([#236](#236)) ([7f6c3bd](7f6c3bd)), closes [#154](#154)
* usable constructor for `Table` ([#294](#294)) ([56a1fc4](56a1fc4)), closes [#266](#266)
* usable constructor for `TaggedTable` ([#299](#299)) ([01c3ad9](01c3ad9)), closes [#293](#293)

### Bug Fixes

* OneHotEncoder no longer creates duplicate column names ([#271](#271)) ([f604666](f604666)), closes [#201](#201)
* selectively ignore one warning instead of all warnings ([#235](#235)) ([3aad07d](3aad07d))
🎉 This PR is included in version 0.12.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
Closes #201.
### Summary of Changes

* Changed OneHotEncoder to manually implement the encoding.
* (Breaking) Changed the format of newly generated columns to use two underscores as separator. In case of naming conflicts, a hash and a unique ID will be appended to the column name.
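The naming scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: "a hash" is read as the `#` character, the unique ID is assumed to be a counter starting at 2, and `unique_name` and `one_hot_encode` are hypothetical helpers operating on a plain dict of columns instead of a `Table`.

```python
def unique_name(base: str, taken: set[str]) -> str:
    """Return base, or base#<n> with the smallest n >= 2 that is still free.

    Assumption: conflicts are resolved with '#' plus a counter.
    """
    if base not in taken:
        return base
    counter = 2
    while f"{base}#{counter}" in taken:
        counter += 1
    return f"{base}#{counter}"


def one_hot_encode(columns: dict[str, list], column: str) -> dict[str, list]:
    """One-hot encode one column of a column-oriented table (hypothetical helper)."""
    # The remaining columns keep their names; new columns must not shadow them.
    encoded = {name: data for name, data in columns.items() if name != column}
    taken = set(encoded)
    # dict.fromkeys drops duplicate values while keeping first-seen order.
    for value in dict.fromkeys(columns[column]):
        name = unique_name(f"{column}__{value}", taken)  # two underscores
        taken.add(name)
        encoded[name] = [1 if cell == value else 0 for cell in columns[column]]
    return encoded


# The table already has a column called "a__x", so the generated name collides.
table = {"a": ["x", "y", "x"], "a__x": [10, 20, 30]}
result = one_hot_encode(table, "a")
print(list(result))  # ['a__x', 'a__x#2', 'a__y']
```

Checking each candidate against the set of taken names keeps the result free of duplicates even when an existing column already looks like a generated one.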