MmCorpus.load --> UnpicklingError: invalid load key, '%'. #1889
Comments
Thanks for the report, @obeavers. Can you share your
Are you definitely calling MmCorpus.load('file.mm') or are you calling MmCorpus('file.mm')?
@obeavers also, you said that corpus = MmCorpus.serialize('file.mm') #breaks here, but in your stack trace I see a different line: c = MmCorpus.load(str(path)). This looks strange; can you fix your first message and share the file? Also, as @arlenk suggested, if you call "serialize", you should load it as
So, I investigated it again; the problem is really with You should call
Error while executing the following command:
@sreenathelloti how is this related to the current thread? What is this code?
Here's my two cents: I had serialized a corpus on a Linux server and transferred the .mm and .mm.index files to my Windows 10 environment, then tried to load the corpus. try #1: resulting error: try #2: resulting error:
The MmCorpus file (path_to_mm_file) should just be a plain text file. Have you tried looking at the file to make sure the transfer from Linux didn't somehow corrupt it?
I compared checksums and they match, at least. One thing that caught my eye is
The first way is correct and should work:
The error seems unrelated to this ticket. Please open a new ticket with the necessary info (including a minimal example, if possible). Thanks.
Description
I'm getting an error when using MmCorpus.load('file.mm'), even immediately after saving with MmCorpus.serialize('file.mm', corpus). I am using Windows 10.
Steps/Code/Corpus to Reproduce
Corpus created with:
corpus = [dictionary.doc2bow(text) for text in texts]
MmCorpus.serialize('file.mm', corpus)
corpus = MmCorpus.serialize('file.mm') #breaks here
Expected Results
Expecting corpus to load as called.
Actual Results
1 c = MmCorpus.load(str(path))
c:\users\user.virtualenvs\key_log-v5coq-ss\lib\site-packages\gensim\utils.py in load(cls, fname, mmap)
393 compress, subname = SaveLoad._adapt_by_suffix(fname)
394
--> 395 obj = unpickle(fname)
396 obj._load_specials(fname, mmap, compress, subname)
397 logger.info("loaded %s", fname)
c:\users\user.virtualenvs\key_log-v5coq-ss\lib\site-packages\gensim\utils.py in unpickle(fname)
1300 # Because of loading from S3 load can't be used (missing readline in smart_open)
1301 if sys.version_info > (3, 0):
-> 1302 return _pickle.load(f, encoding='latin1')
1303 else:
1304 return _pickle.loads(f.read())
UnpicklingError: invalid load key, '%'.
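A possible reading of the traceback above: a Matrix Market file is plain text whose first line begins with "%%MatrixMarket", and the unpickler rejects the leading '%' because it is not a pickle opcode. The sketch below uses only the standard library to reproduce that message; it does not call gensim. The file contents and the helper names (looks_like_matrix_market, try_unpickle, demo) are assumptions made for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the contents of 'file.mm': a tiny Matrix Market file.
MM_TEXT = (
    "%%MatrixMarket matrix coordinate real general\n"
    "2 3 2\n"
    "1 1 1.0\n"
    "2 3 2.0\n"
)


def looks_like_matrix_market(path):
    """Return True if the file starts with the Matrix Market banner."""
    with open(path, "rb") as f:
        return f.read(14) == b"%%MatrixMarket"


def try_unpickle(path):
    """Try to unpickle the file; return the error message, or None on success."""
    with open(path, "rb") as f:
        try:
            pickle.load(f)
        except pickle.UnpicklingError as exc:
            return str(exc)
    return None


def demo():
    """Write the sample file, then report (is_matrix_market, unpickle_error)."""
    fd, path = tempfile.mkstemp(suffix=".mm")
    try:
        with os.fdopen(fd, "w", newline="\n") as f:
            f.write(MM_TEXT)
        return looks_like_matrix_market(path), try_unpickle(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

If this reading is right, the file on disk is not corrupt; it is simply not a pickle. That would be consistent with the question in the comments about opening it with MmCorpus('file.mm') rather than MmCorpus.load('file.mm').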
Versions
Windows-10-10.0.16299-SP0
Python 3.6.3 |Anaconda, Inc.| (default, Oct 15 2017, 03:27:45) [MSC v.1900 64 bit (AMD64)]
NumPy 1.14.0
SciPy 1.0.0
gensim 3.3.0
FAST_VERSION 0