-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove unnecessary assert blocking direct usage of CSC for LSI #1622
Conversation
@isamaru IIRC, the code path that handles CSC was there for the distributed version only. The master worker prepares batches for workers as CSC matrices; the workers process the batches and send the results back to master over the network. The code was never meant to be used in this way, as the "main" worker, it's a different use case. I actually wouldn't expect it to work so easily (do all the necessary model setup etc, as if it was the master itself). But if it does, that's great. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check that distributed onepass
works properly, look at this tutorial: https://radimrehurek.com/gensim/dist_lsi.html
@radim Thanks for the explanation! Makes much more sense now. Distributed algorithm with not work on CSC directly as written now: the dispatcher code is only enabled in the top branch ( To sum up: serial (non-distributed) algorithm will work directly on CSC. Distributed will not work, failing on the very same assert than before ('must be in serial mode to receive jobs'). I could rewrite that assert message to be more meaningful to the user: "Distributed mode not supported for CSC matrices directly. Use csc2corpus." |
I see, thanks. I think that's fine as-is, no changes needed -- nobody really uses the (ancient) distributed mode, AFAIK. It's one of the candidate features for removal. |
@piskvorky ok, I close this PR |
Why? We definitely want this PR. |
@piskvorky as I understand, |
No, it means we don't need to rewrite the assert message. |
Thanks @isamaru 🥇 |
@piskvorky @janpom @menshikh-iv
Removes the assert which is blocking usage of CSC directly as corpus for LsiModel with multiple power iterations.
I could not figure out why the assert was there in the first place: it doesn't make sense to me: The code path handles CSC exclusively but I don't see why CSC would require a single pass only. It works just fine with multiple power iterations.