-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filename overwrite in FastText.load_fasttext_format
#2407
Comments
Not necessarily a bug, but I do think we could support this. Especially since we're already using The other option is to inform users through documentation/tutorials that they need to unpack their models first. @menshikh-iv can you open a PR? |
from user PoV - definitely a bug (fasttext try to open a file that user don't specify, instead of passed file and failed on it).
No time, sorry |
@menshikh-iv what is the |
@piskvorky ask an author of this code, not me (btw @mpenkov isn't an author too, they just refactored it, according to git blame). I guess that this is for the case when user have 2 files |
Well asking you, the project maintainer from when these changes happened and were merged, seemed like a good start to understand them :) If the filename change is not needed (or its reason is undocumented/unknown/outdated), then I propose simply removing it. Silently modifying user input looks like an anti-pattern to me. Feel free to open a PR when you have the time. We'll have to be careful with updating tutorials too (if they rely on this silent extension promotion). |
I'm already not a maintainer @piskvorky, this change was introduced 2 years ago (when I only start, just merged PR by Jayant approve), I really remember nothing about it. I found an "source" of this changes, this code firstly was introduced in #1341 (and after moves to other classes, like @jayantj @prakhar2b can you comment that? |
My understanding is this mirrors FB behavior. You specify model_name, and it creates model_name.bin and model_name.vec. This feature allows the user to specify model_name only and not worry about the file extension. In my opinion, we’re better off without it. |
I made these changes two years ago. There was a bug in facebook's french pretrained vector As a workaround, we made changes in gensim to load fastText models using only bin files. #1341 A lot of changes were made after that in Gensim and fastText by someone else, and this bug has arisen most probably because backward compatibility was not taken care of. |
I download FB pretrained common-crawl model using https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz from https://fasttext.cc/docs/en/crawl-vectors.html page.
When I try to load it, I get an error
Code:
Stacktrace:
Problem in filename overwriting that happens
https://github.com/RaRe-Technologies/gensim/blob/cebc9dbeacddf7689c35aadb8854ae850d4561d2/gensim/models/fasttext.py#L1302-L1303
To avoid that, I should
gunzip
model file first (that's definitely not a thing that user expect).Versions:
no matters, all python & gensim version affected, including last release.
The text was updated successfully, but these errors were encountered: