Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make most_similar accept topn of any integer type #2497

Merged
merged 5 commits into from
Jun 21, 2019

Conversation

Witiko
Copy link
Contributor

@Witiko Witiko commented May 17, 2019

This PR continues #2356 and #2461 and closes #2496. The PR makes the following changes:

@Witiko Witiko force-pushed the fix-most-similar branch 3 times, most recently from 7060ee8 to eefeb83 Compare May 18, 2019 08:35
@Witiko Witiko force-pushed the fix-most-similar branch from eefeb83 to 83455e2 Compare May 18, 2019 23:19
@Witiko
Copy link
Contributor Author

Witiko commented May 19, 2019

I tried to force-push a couple of times, but AppVeynor consistently fails to install dependencies.

@piskvorky
Copy link
Owner

piskvorky commented May 19, 2019

CC @mpenkov – can you please assist Vitek with the CI (AppVeyor) problems?

@mpenkov
Copy link
Collaborator

mpenkov commented May 21, 2019

Looks like appveyor is failing to install numpy under Py2:

distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('numpy<2.0.0,>=1.9.0')

@menshikh-iv Any idea why that could be?

On a related note, numpy will be dropping Py2 support in less than a year (https://www.numpy.org/neps/nep-0014-dropping-python2.7-proposal.html)... @piskvorky What does that mean for gensim and our plans for Py2 support in the future?

@menshikh-iv
Copy link
Contributor

menshikh-iv commented May 21, 2019

@mpenkov

Thanks to pyemd https://github.com/wmayner/pyemd/blob/develop/setup.py#L95
I recommend do not install it on windows with python 2 (just avoid this problem, if we are lucky, related tests will be skip).

upd: as an alternative way (if you looking to link above) - install numpy first (in separate command BEFORE pyemd install), this can help, but also can produce new issue (need to check).

@piskvorky
Copy link
Owner

piskvorky commented May 21, 2019

@mpenkov I don't think it means anything for us. We're not using any functionality from the newer NumPy releases. That pyemd dependency is marginal, we could drop it altogether if it gives us trouble.

In other words, we can continue to require a fairly old version of NumPy (the latest that supports py2?). But users are free to install newer versions of NumPy too — we don't care.

@Witiko
Copy link
Contributor Author

Witiko commented May 21, 2019

Please note that if we drop the pyemd dependency, 12 tests will be skipped.

@Witiko
Copy link
Contributor Author

Witiko commented May 21, 2019

I removed pyemd from win_testenv in setup.py, but this does not appear to solve the issue. The issue could be with numpy, scipy, or any of the packages in .[test-win], i.e. in win_testenv:

deps =
    py37: numpy==1.14.5
    py37: scipy==1.1.0

    py27: numpy==1.11.3
    py27: scipy==0.18.1
    py35: numpy==1.11.3
    py35: scipy==0.18.1
    py36: numpy==1.11.3
    py36: scipy==0.18.1

    linux: .[test]
    win: .[test-win]
    extras_require={
        'distributed': distributed_env,
        'test-win': win_testenv,
win_testenv = [
    'pytest',
    'pytest-rerunfailures',
    'mock',
    'cython',
    'testfixtures',
    'scikit-learn',
    'Morfessor==2.0.2a4',
    'python-Levenshtein >= 0.10.2',
    'visdom >= 0.1.8, != 0.1.8.7',
]

@menshikh-iv
Copy link
Contributor

menshikh-iv commented May 21, 2019

So, now the issue with scikit-learn, because they don't support python2 more
Possible solutions

@Witiko Witiko force-pushed the fix-most-similar branch from 976f734 to 83455e2 Compare May 21, 2019 23:22
@Witiko
Copy link
Contributor Author

Witiko commented May 21, 2019

I reverted the pyemd removal, so that we can solve this general issue separately.

@mpenkov
Copy link
Collaborator

mpenkov commented May 22, 2019

@menshikh-iv I'm +1 for removing Py2 support. We could do this in the next major release.

If people are keen to continue using Py2, then that's fine, they can keep using gensim 3.x. @piskvorky WDYT?

@piskvorky
Copy link
Owner

piskvorky commented May 22, 2019

Yes, next major release (4.x) is OK. No clear release date nor milestones for 4.x though, so this is theoretical for now. Getting Gensim into a better shape is more important (pyemd, sklearn are largely irrelevant to Gensim; we do need numpy though).

Sorry for hijacking your PR @Witiko :)

@Witiko
Copy link
Contributor Author

Witiko commented May 24, 2019

I don't mind, dropping the Python 2 support is an interesting topic. Please, let me know if I can help, or when I can rerun the tests for this PR.

@mpenkov
Copy link
Collaborator

mpenkov commented May 31, 2019

@piskvorky @Witiko I fixed the AppVeyor problems. Is this ready to merge?

@Witiko
Copy link
Contributor Author

Witiko commented May 31, 2019

@mpenkov Great work! Yes, this PR should be ready to merge.

@piskvorky
Copy link
Owner

piskvorky commented Jun 19, 2019

@mpenkov ping. After fixing smart_open, perhaps we can make a bugfix release? Anything else you want in? Maybe remove that stillborn NMF code?

@mpenkov mpenkov merged commit 369cc63 into piskvorky:develop Jun 21, 2019
@mpenkov
Copy link
Collaborator

mpenkov commented Jun 21, 2019

OK, merged. @Witiko Thank you for your contribution and your patience.

@piskvorky What do you think about the scope of this change? This seems to be new functionality: does it warrant a minor version bump?

@piskvorky
Copy link
Owner

Looking at the PR description: this looks like a textbook bug fix.

Which part do you see as substantial?

@mpenkov
Copy link
Collaborator

mpenkov commented Jun 21, 2019

I think I got it mixed up with another PR. We'll treat this as a bugfix.

@mpenkov mpenkov added the bug Issue described a bug label Jun 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue described a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SparseTermSimilarityMatrix - TypeError: 'numpy.float32' object is not iterable
4 participants