Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add export_phrases function to Phrase model (TODO: cythonize the primitive loops) #588

Merged
merged 18 commits into from
Jun 9, 2016
Merged

Conversation

hanabi1224
Copy link
Contributor

No description provided.

@hanabi1224
Copy link
Contributor Author

The .h file under models may not be included in system C++ compiler,
This change is to avoid manually copying that file
But problem here is when models is imported in gensim.init, it starts compiling word2vec_inner before user call gensim.get_include (like numpy), so initialize pyximport in gensim.init or export get_include and stop importing models in gensim.init

@tmylk
Copy link
Contributor

tmylk commented Jan 23, 2016

Please add tests and CHANGELOG for the new functionality. Thanks.

@tmylk tmylk closed this Jan 23, 2016
@tmylk tmylk reopened this Jan 23, 2016
@gojomo
Copy link
Collaborator

gojomo commented Jan 24, 2016

The cython-related commits seem unrelated to the headline title?

@hanabi1224 hanabi1224 changed the title Add export_phrases function to Phrase model Add export_phrases function to Phrase model & try to initialize pyximport in gensim.__init__ Jan 24, 2016
@hanabi1224
Copy link
Contributor Author

@tmylk @gojomo comments resolved, thanks!

@tmylk
Copy link
Contributor

tmylk commented Jan 28, 2016

@gojomo Should this be merged?

@gojomo
Copy link
Collaborator

gojomo commented Jan 28, 2016

There's still two unrelated things here:

(1) Some sort of Windows-specific workaround for an issue with the cython code. I don't know who's been affected by that and have no way to evaluate if the change is a net improvement, since I don't build/use on Windows (and maybe this only affects some compilers/pythons, or it would have been an issue for others). It seems to me that change should get a separate PR.

(2) A utility method to yield just the phrases and their scores from a block of sentences (instead of the usual mix of phrase-ified content). I don't know the exact motivation for that need – was it a feature request? But, the method export_phrases() appears to be a straightforward way to dump that info, if someone needed it.

@hanabi1224
Copy link
Contributor Author

@gojomo
for 1, I'm currently using windows and MSVC for python27 contains neither .h file under models nor under num.get_include out of box, what I originally thought was to expose an get_include method from gensim, however, models package is included in gensim.init, which directly triggers cython compile logic before caller can call gensim.get_include, and I don't think it's possible to remove models import in gensim.init (this is will make 'import gensim;gensim.models.Word2Vec' not working), and in such situation, the only work around is to copy .h file into compiler includes

for 2, yes, it is a feature request, I'd like to dump phrase, score pair to do some rough manual review

@piskvorky
Copy link
Owner

Re. item 1: we definitely don't want the pyximport parts, that's a regression. Gensim doesn't need cython (it ships with already-compiled C modules), and the dynamic pyximport doesn't work well in any case (it messes up imports of other, unrelated libraries that use cython; according to cython devs, it's an experimental/development feature, not meant for end use in libraries).

@hanabi1224
Copy link
Contributor Author

@piskvorky good to know we have already-compiled C modules, How to make it work? Does below words in tutorials implies it will compile once during installation, what if user misses that point? Does he have to uninstall, make c compiler ready and install again? In my case, an on-demand runtime-compile does not seems to make things worse anyway :)

P.S. I have not tried gensim on linux yet, which may mitigate all the trouble I met on windows hopefully : )

Make sure you have a C compiler before installing gensim, to use optimized (compiled) word2vec training

@piskvorky
Copy link
Owner

Yes, the C code is compiled once, during gensim install. The install outputs a warning in case the compilation fails, notifying the user that some functionality will be slow as a result.

@hanabi1224
Copy link
Contributor Author

@piskvorky got it, how about the feature request of export_phrases? If feasible, I can revert cython related change

@piskvorky
Copy link
Owner

Sure, that sounds good.

Incidentally, the phrases module is a great adept at cythonization = rewriting the primitive loops in cython, compiling an (optional) C module so that it all runs faster.

@hanabi1224 hanabi1224 changed the title Add export_phrases function to Phrase model & try to initialize pyximport in gensim.__init__ Add export_phrases function to Phrase model (TODO: cythonize the primitive loops) Jan 29, 2016
@tmylk
Copy link
Contributor

tmylk commented Jun 9, 2016

@hanabi1224 Thanks for the PR!

@tmylk tmylk merged commit 650cf55 into piskvorky:develop Jun 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants