New initial assumption #1350
Hello @dmitryglhf! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2025-01-07 15:38:49 UTC
RIDGE = 'ridge'

# Parameters of models
models_params = {
Are these some kind of effective hyperparameters?
When I tested the pipelines on Kaggle with these hyperparameters, the metric improved slightly. I stopped tuning a parameter once the run time dropped off significantly or the quality got worse. For CatBoost and the linear models the parameters stay at their defaults: they already showed good results out of the box, and tuning made the metric worse and/or increased the run time (for CatBoost).
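The stopping rule described in this reply can be sketched roughly as follows. This is only an illustration, not the procedure actually used for the PR: `tune_with_guards`, the `evaluate` callback, and the `max_slowdown` threshold are hypothetical names and values, and a higher metric is assumed to be better.

```python
from typing import Callable, Dict, Iterable, Tuple

Params = Dict[str, float]


def tune_with_guards(candidates: Iterable[Params],
                     evaluate: Callable[[Params], Tuple[float, float]],
                     baseline: Params,
                     max_slowdown: float = 2.0) -> Tuple[Params, float]:
    """Walk through candidate settings and stop at the first bad step.

    evaluate(params) must return (metric, fit_seconds); higher metric is better.
    A step is "bad" if the metric drops below the best seen so far or the fit
    time exceeds max_slowdown times the baseline fit time (assumed threshold).
    """
    best_params = baseline
    best_metric, baseline_seconds = evaluate(baseline)
    for params in candidates:
        metric, seconds = evaluate(params)
        if metric < best_metric or seconds > max_slowdown * baseline_seconds:
            break  # quality fell or training became too slow: stop tuning
        best_params, best_metric = params, metric
    return best_params, best_metric
```

With such a guard, a model whose defaults are already the best option (as reported here for CatBoost and the linear models) simply keeps its baseline parameters.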
.add_branch((CATBOOSTREG, models_params[CATBOOSTREG]),
            (XGBOOSTREG, models_params[XGBOOSTREG]),
            (LGBMREG, models_params[LGBMREG])) \
.join_branches(CATBOOSTREG, models_params[CATBOOSTREG])
Why are the models joined by CATBOOSTREG rather than by a linear model?
I tried joining the branches with Random Forest and with linear models. In both cases the metric got slightly worse. The result was similar when these models were placed before the branching. This testing was done on Kaggle, not on the full benchmark. I think I should run more tests here with a linear model as the joining node.
Well, the presentation praised the linear model specifically. Of course, both variants can be added to the population; with caching this is not very expensive computationally.
OK, I will test this.
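A library-free sketch of the suggestion in this thread, keeping both join variants as candidates for the initial population. `PipelineSpec` and `initial_population` are hypothetical names used only for illustration and are not FEDOT classes; the operation strings other than 'ridge' are assumed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PipelineSpec:
    """Hypothetical description of a branch-and-join pipeline (not FEDOT's API)."""
    branches: Tuple[str, ...]
    join: str


def initial_population(branches: Iterable[str],
                       join_candidates: Iterable[str]) -> List[PipelineSpec]:
    """Build one candidate per joining model over the same set of branches."""
    branch_tuple = tuple(branches)
    return [PipelineSpec(branch_tuple, join) for join in join_candidates]


# Operation names are assumptions, except 'ridge', which appears in the diff.
BOOSTING_BRANCHES = ('catboostreg', 'xgboostreg', 'lgbmreg')
population = initial_population(BOOSTING_BRANCHES, ['catboostreg', 'ridge'])

for spec in population:
    print(' + '.join(spec.branches), '->', spec.join)
```

Because both candidates share the same three branches, fitted branch models could in principle be reused between them, which is the reason caching keeps the extra variant cheap.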
It seems to be in the initial population already. Is it not used at all, or does it simply lose during selection?
It is there now, but it was not used; only the 'gbm' initial assumption was tested in the benchmark.
/fix-pep8
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1350   +/-   ##
=======================================
  Coverage   80.33%   80.34%
=======================================
  Files         146      146
  Lines       10464    10469    +5
=======================================
+ Hits         8406     8411    +5
  Misses       2058     2058
Nodes fit time in Node: scaling, fit time: 1.89 seconds
Training duration reduced in Full table and Details: catboost_only.csv
Full table and Details: catboost_new_params.csv
This is a 🔨 code refactoring.
Summary
This PR introduces the following key updates:
New Initial Assumptions: Updates initial assumptions by adding boosting-based solutions.
Comparison table between old and new assumptions (validated on automlbenchmark):
Full table and Details
On kc1 (openml.org/t/3917) an issue occurred, but the run succeeds on a custom launch.
Bug Fix: Resolved issues in the convert_to_dataframe method for both XGBoost and LightGBM models:
Follow this link to reproduce the bug:
https://colab.research.google.com/drive/1r09xVZeVYSaTcQmG8-0r0RJyPfGesqq9?usp=sharing
This case occurs specifically in XGBoost and LGBM because of the convert_to_dataframe method, when a container is passed as the target in model.fit(features=X_train, target=y_train) instead of model.fit(features=train, target='target').
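The two calling conventions above can be illustrated with a small, library-free sketch. It is not the actual convert_to_dataframe code: `split_features_target` is a hypothetical helper, the table is a plain dict of columns, and it assumes that a string target is a column name while any other target is a container of values.

```python
from typing import Dict, List, Sequence, Tuple, Union

Table = Dict[str, List[float]]


def split_features_target(table: Table,
                          target: Union[str, Sequence[float]]
                          ) -> Tuple[Table, List[float]]:
    """Return (features, target_values) for either calling convention."""
    if isinstance(target, str):
        # Convention 1: target is the name of a column inside the table.
        if target not in table:
            raise KeyError(f'target column {target!r} not found')
        features = {name: col for name, col in table.items() if name != target}
        return features, list(table[target])

    # Convention 2: target is a separate container of values.
    values = list(target)
    n_rows = len(next(iter(table.values()))) if table else 0
    if len(values) != n_rows:
        raise ValueError('target length does not match the number of rows')
    return dict(table), values


train = {'a': [1.0, 2.0], 'b': [3.0, 4.0], 'target': [0.0, 1.0]}
X_train = {'a': [1.0, 2.0], 'b': [3.0, 4.0]}
y_train = [0.0, 1.0]

by_name = split_features_target(train, 'target')
by_container = split_features_target(X_train, y_train)
```

Both calls yield the same features and target values, which is the behaviour the fix aims for when a container is passed instead of a column name.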
Context
Closes #1341