-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: pyemd to POT for EMD computation in wmdistance
#3327
Conversation
Thanks. Can you add some benchmarks into the PR description? Something to point people to, a TL;DR for the release notes. Also a link to this "POT" dependency, its license etc. Is it a maintained / stable library? |
I've updated my first comment with the info, but maybe you meant I should amend the first commit's message with all this info? |
Hi - Is this PR still being worked on? Happy to take over if needed. I just swapped out pyemd for POT and it improved performance by around a factor of 3. |
I'm still able to work on the PR, but now I'm simply waiting for a review. |
From a read-over on Github (no personal testing), this looks good. The As an aside – not at all a blocker for this issue, and not at all any knock about the work done here, which follows project precedents – it's a bit unfortunate, a "project smell", that the tiny functional change requires changes scattered across 16 files, with many bits of updated comments/boilerplate. The core of the change is small & well-contained: import a different library, replace one function call, in But all the conditionals in tests would ideally be oblivious to implementation, perhaps by using a single is-support-available test rather than a dependency-specific check. And all the autogenerated output changes (eg in But that's all more: thoughts for project going forward than anything requiring immediate changes. |
Thanks @TLouf! One other thing we may wish to do - Do we want to switch to that? |
Switch to it in which places, in pursuit of what advantage? |
This line to
First, since we are already using POT for Neither are strong reasons but thought I'd raise it in case others feel more strongly. |
As it, (Perhaps, making distance-measure configurable improves performance in some cases. If we know that to be true via investigation, then it could be a good change... but even then, better as a focused PR with its own rationale, rather than mixed with this sort of drop-in, functionality-maintaining performance boost.) |
I got this in CI, can @TLouf please double check?
|
@mpenkov just went along with CI's recommendation, however it added a ton of files, some of which I should not |
No, I think you did everything right, with the exception of those two files that the CI thinks you did not add. Can you add them? Or am I misunderstanding something? |
With what I added in my last commit I think CI should be happy, the 2 python files were already added, but I hadn't included some cache files generated by those scripts, from what I understood. |
Hi all - Can this be merged? I'd love to use this from the dev branch. Thanks for your work on this. |
Yes, of course. Thank you for reminding me. @TLouf Thank you for your work! |
fixes #3312
Why this change?
Timings for the following call (took a subset because I'm on an old laptop right now):
pyemd