Wrong calculation for max_iter_dump #1216
In https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/wrappers/wordrank.py#L144, shouldn't this line

    max_iter_dump = iter / dump_period * dump_period - 1

just be the following?

    max_iter_dump = iter - dump_period

To reproduce, try these parameters:

    model = Wordrank.train(wr_path, data, out_dir, iter=100, dump_period=5)

It will error out, mainly because

    max_iter_dump = iter / dump_period * dump_period - 1

calculates max_iter_dump=99 instead of 95.
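For reference, here is that arithmetic spelled out with the issue's parameters (a standalone check; `iter_` stands in for the wrapper's `iter` argument to avoid shadowing the builtin, and `//` reproduces Python 2's integer `/`):

```python
# Standalone check of both formulas with the issue's parameters.
iter_, dump_period = 100, 5

current = iter_ // dump_period * dump_period - 1  # 100 // 5 * 5 - 1 = 99
proposed = iter_ - dump_period                    # 100 - 5 = 95

# WordRank only writes embedding files at multiples of dump_period, so
# there is no iteration-99 file to load; hence the crash.
print(current, proposed)  # 99 95
```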
Cc @parulsethi
Actually, WordRank dumps the current iteration's file at the start of the next iteration, so we basically need to run 101 iterations to get the 100th iteration's file. I've mentioned this under Train Model in the wordrank quickstart, but yeah, anyone could easily miss it. I guess a better behavior would be to simply add one more iter in the wrapper's train() itself (and update the max_iter_dump calculation accordingly).
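To make that timing concrete, here is a toy model of the behaviour described in this comment (an assumption drawn from the comment, not WordRank's actual file I/O):

```python
# Toy model of WordRank's dump timing, as described above: the file for
# iteration k is only written once iteration k+1 starts, i.e. only when
# k < total_iterations.
def dumped_iterations(total_iterations, dump_period):
    """Iterations whose embedding files exist after the run finishes."""
    return [k for k in range(dump_period, total_iterations, dump_period)]

print(dumped_iterations(100, 5)[-1])  # 95  -- the 100th-iter file is missing
print(dumped_iterations(101, 5)[-1])  # 100 -- one extra iteration produces it
```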
Just saw your note in the quickstart notebook. That's implicit knowledge, and I think the code should handle it or warn, not crash. This is roughly what I think it should be (see the sketch below):
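One hypothetical reading of "handle it or warn" (the helper name `newest_dump` and the logging setup are illustrative, not from the issue):

```python
import logging

logger = logging.getLogger(__name__)

def newest_dump(iter_count, dump_period):
    """Hypothetical sketch: newest embedding file WordRank actually wrote."""
    # Iteration k's file exists only once iteration k+1 starts, so the last
    # dumped iteration is the largest multiple of dump_period below iter_count.
    max_iter_dump = (iter_count - 1) // dump_period * dump_period
    logger.warning(
        "files are dumped every %d iterations; loading iteration %d, not %d",
        dump_period, max_iter_dump, iter_count,
    )
    return max_iter_dump

print(newest_dump(100, 5))  # 95 -- instead of the non-existent 99
```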
Yep, I still think we can do n+1 iterations, because then if someone inputs, for example, 100 iterations, they get the result of the 100th iteration and not the 95th. (Am I missing any drawback of doing n+1 iterations? If the concern is time, the current behavior already wastes the 5 iterations that run after the 95th-iteration dump anyway.) So IMO the solution could be to add the extra iteration inside train() itself and then compute max_iter_dump from the adjusted value, as in the sketch below.
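A minimal sketch of that proposal (the helper name `adjust_iter` and the logging setup are illustrative; the actual patch changes train() directly and may differ in detail):

```python
import logging

logger = logging.getLogger(__name__)

def adjust_iter(iter_count, dump_period):
    """Sketch of the proposed n+1 behaviour (assumed, not the verbatim patch)."""
    if iter_count % dump_period == 0:
        # Run one extra iteration so WordRank dumps the requested iteration's file.
        iter_count += 1
    else:
        logger.warning(
            "resultant embedding will be from %d iterations rather than the "
            "requested %d, since files are only dumped every %d iterations",
            iter_count - iter_count % dump_period, iter_count, dump_period,
        )
    # The newest file WordRank will have written after the (possibly bumped) run.
    max_iter_dump = iter_count - iter_count % dump_period
    return iter_count, max_iter_dump

print(adjust_iter(100, 5))  # (101, 100) -- user gets the 100th-iter embedding
print(adjust_iter(99, 5))   # (99, 95)   -- warns and falls back to iteration 95
```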
Sounds good, updated the PR.