-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make show_topics more consistent across models #448
Conversation
bd755f4
to
ed97132
Compare
ed97132
to
e6ac51c
Compare
FYI, I think the latest numpy will make these tests failing (at least on my local Python 3 environment). Seems that, at least for |
e6ac51c
to
1069477
Compare
Great, thanks! Can you also add a short summary to README, explaining whom this affects and how? Re. failing tests: what's the problem with int64? Re. indexing: I know this one. It's due to this line: that doc = [(1, 0.5), (2, 0.3)]
print numpy.asarray(doc)
array([[ 1. , 0.5],
[ 2. , 0.3]]) (the feature indexes become floats). This was an optimization for sending data chunks via Pyro in distributed mode, so that they serialize to smaller chunks + 6x speed up in transfer. I think it's no longer important => let's |
w.r.t., int64: It should be handled now in the test explicitly, but numpy was returning indexes as int in Python 2 while numpy.int64 in Python 3. Very strange. |
@cscorley does this need anything else, or are we ready to merge? (after resolving the conflict) |
@cscorley we'll make a new release again this week. Any chance we could get this in? |
Sorry @piskvorky, only really have the chance to work on weekends :) Luckily, the merge conflict was only in CHANGELOG.txt, so this should be ready to go. |
Perfect, thanks. @tmylk please review & merge. This is a major, breaking change, so we have to be very explicit about this change in the release notes / announcement! Probably warrants a major version bump too. |
Make show_topics more consistent across models
@piskvorky more or less, I didn't change it here because I don't know how that would affect the Pyro stuff, so I left it as is for this PR. No idea how to make a test for that, not familiar with Pyro at all. |
Alright! Let's change it in a separate PR. It will affect Pyro performance (=will need more memory in the distributed computing), but that use case is so marginal I can't say I care much. Let's just mention it visibly in the CHANGELOG, for people who do use the distributed functionality. Maybe we can even add this is an LDA parameter (default @mattilyra are you up for this PR? |
@piskvorky Yes I can take it. Having a parameter |
This PR contains changes to address issues #354 and #389. Note that this change breaks compatibility for anyone that has come to rely on the format for a particular model.
Additional changes include splitting up the increasingly growing test_models.py file into individual files :)