-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring of benchmarks #133
Conversation
@Alexsandruss Thank you for your hard work here! I think this pull request is getting quite large and includes multiple features. It contains too much code to be reviewed. It's useful to have a way to split it into smaller tasks for your reviewers so that they can focus on specific portions of it. Wouldn't it be better to move an already finished feature to a separate PR? In this case we can quickly and efficiently review and merge it. |
This CODEOWNERS file contains errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am running Python3.9 and this was required to get it to work (requires changing some unions to an older style)
So far I am also running into an issue downloading datasets in 3.9 with urllib, specifically with the SSL certs. To get it to download I had to export SSL_CERT_DIR=/etc/ssl/certs To get it to properly download. Somehow this wasnt a problem with 3.10 ¯\(ツ)/¯ |
Also, some of the errors seen in the PR associated with KNN/sklearn in the CI checks have been solved with this PR: But likely a handling of Errors needs to occur to make sure that it will run through even though part of sklearn doesn't |
It looks like problem of specific python environment. Python 3.9 from conda-forge works fine. |
There is no external fix for this sklearn version, so previous version is pinned to dependencies from now. |
Naive question: could we catch Errors in such a way to allow for reports to be generated even in the case of a failure in a certain test case? Something like how its done in pytest, or would that be out of scope of this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Alexsandruss , just as couple follow ups to my general comment.
8e9dde5
to
a3776ef
Compare
a3776ef
to
1510e96
Compare
- change parameters in sklearn and xgboost example configs - fix default value of `result-files` argument - change right border of `n_informative` parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partially reviewed. Overall have some questions/clarifications, minor suggestions, and potential revisions for examples that may not be functional but its functional and not much left to do before merge
configs/common/xgboost.json
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@razdoburdin please review xgboost part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Alexsandruss for the work done.
I am confident that new features and capabilities will only bring benefits to the projects we are working on.
For me, the description does not provide more argumentation and motivation for why changes were made in the whole project and in its individual parts. The changes affected practically every part of the libraries. Thus, I can call this implementation not “refactoring”, but scikit-learn_bench 2.0.
This PR goes beyond the usual contributions to the project. It is just remaking it. Therefore, I looked at it from the point of view of not a change, but rather as a new project that was written from scratch and can be compared with the previous main.
I would suggest:
- Taking measurements of the current main branch and this one. And compare what discrepancies there are. If there are no critical cases, then there are no blocks for merging.
- I also think it’s necessary to do initial checks using valgrind or other tools to track resources.
- Add new code owners, and try to split ownership of the project between several people.
- It makes sense to collect comments from the current review, add them to the backlog and add minor changes to the updated repo in small commits/PRs.
Assuming also review of people who already touch this code and contributed in the branch. @icfaust @ethanglaser @Vika-F
Many thanks!
- Add `false` and `true` to CLI parameters parser - Sync PyPI and conda-forge envs - Remove tabs from configs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some expansion of the CI checking and clarification of the python versions are necessary before merging. As far as I can tell only 3.10 is tested, though python 3.8+ is supported. I added some suggestions for some english corrections, and have slightly touched the code in some places.
Should the codebase use sklearnex as a submodule, and is it possible to link the dependencies to it to save maintenance time?
@Alexsandruss After discussions with others, for this PR, just limit everything to 3.10 so as to be able to merge it ASAP Please also change things related to private CI related to the python versions (so some of my previous comments can be disregarded short-term). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving pending a few minor revisions from comments. It's ready enough and can revise issues in follow-ups. Infra branch also looks like it supports (at least on CPU).
Benchmarks rework
Entry points: README, Developer Guide
Key features:
pd.DataFrame
's to perform comparison operations and use openpyxl functionality to write dataframes nativelySolved issues