
TypeError: a bytes-like object is required, not 'str' #698

Closed
Tesfamariam opened this issue May 12, 2016 · 15 comments
Labels
bug Issue described a bug


@Tesfamariam

I am trying to implement dynamic topic modeling with the Python Anaconda 3.4 distribution on Linux. However, I am getting the following error:
TypeError: a bytes-like object is required, not 'str'
Any idea how I could solve this problem?

@gojomo
Collaborator

gojomo commented May 13, 2016

The issue tracker is for bugs/feature-requests, not support questions – those are better handled at the project discussion list: https://groups.google.com/forum/#!forum/gensim

And, you'd have to provide a lot more context/code/logging-info for us to have any idea what line of your code is triggering that error. So if you ask on the list, please better describe what you're trying to accomplish, and how.

@gojomo gojomo closed this as completed May 13, 2016
@piskvorky piskvorky reopened this May 13, 2016
@piskvorky
Owner

piskvorky commented May 13, 2016

Sounds like a bug report for the DTM wrapper in gensim... but a very incomplete one.

@Tesfamariam, please review the contributing guide. Add relevant information so we know what you're talking about.

@Tesfamariam
Author

Sorry for the incomplete information! Sample dataset:
['lecture', 'notes', 'edited', 'goos', 'hartmanis', 'van', 'leeuwen', 'berlin', 'heidelberg', 'york', 'barcelona', 'hong', 'kong', 'london', 'milan', 'paris', 'singapore', 'tokyo', 'vassil', 'alexandrov', 'jack', 'dongarra', 'benjoe', 'juliano', 'renner', 'kenneth', 'tan', 'eds', 'san', 'francisco', 'usa', 'proceedings', 'volume', 'editors', 'vassil', 'alexandrov', 'university', 'reading', 'school', 'cybernetics', 'electronic', 'engineering', 'whiteknights', 'box', 'reading', 'mail', 'alexandrov', 'rdg', 'jack', 'dongarra']
Then I feed the whole dataset to:
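from gensim import corpora

class DTMcorpus(corpora.textcorpus.TextCorpus):

    def get_texts(self):
        # self.input is the list of tokenized documents passed to the constructor
        return self.input

    def __len__(self):
        return len(self.input)

corpus = DTMcorpus(texts)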
Then I defined the time slices and created the model:
my_timeslices = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
model = gensim.models.wrappers.DtmModel('/media/tesfish/data/Topic Modeling/dtm-master/bin/dtm-linux64', corpus, my_timeslices, num_topics=15, id2word=dictionary_text, initialize_lda=True)
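The call above passes fifteen time slices of one document each. Assuming, as the convert_input() lines in the traceback suggest, that DTM's time-slice file lists the number of slices followed by the number of documents in each slice, those counts should add up to the corpus size. A hypothetical pre-flight check (not part of gensim) might look like this:

```python
def check_time_slices(time_slices, num_docs):
    """Hypothetical helper: True if the per-slice document counts cover the corpus exactly."""
    # Each entry is assumed to be the number of documents in one time slice.
    return sum(time_slices) == num_docs


my_timeslices = [1] * 15
print(check_time_slices(my_timeslices, 15))  # True
print(check_time_slices(my_timeslices, 50))  # False
```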
Finally, I got the following error:
TypeError Traceback (most recent call last)
in <module>()
1 my_timeslices = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1,1,1, 1, 1, 1]
----> 2 model = gensim.models.wrappers.DtmModel('/media/tesfish/data/Topic Modeling/dtm-master/bin/dtm-linux64', corpus, my_timeslices, num_topics=15, id2word=dictionary_text, initialize_lda=True)

/home/tesfish/anaconda3/lib/python3.5/site-packages/gensim/models/wrappers/dtmmodel.py in __init__(self, dtm_path, corpus, time_slices, mode, model, num_topics, id2word, prefix, lda_sequence_min_iter, lda_sequence_max_iter, lda_max_em_iter, alpha, top_chain_var, rng_seed, initialize_lda)
123
124 if corpus is not None:
--> 125 self.train(corpus, time_slices, mode, model)
126
127 def fout_liklihoods(self):

/home/tesfish/anaconda3/lib/python3.5/site-packages/gensim/models/wrappers/dtmmodel.py in train(self, corpus, time_slices, mode, model)
183
184 """
--> 185 self.convert_input(corpus, time_slices)
186
187 arguments = "--ntopics={p0} --model={mofrl} --mode={p1} --initialize_lda={p2} --corpus_prefix={p3} --outname={p4} --alpha={p5}".format(

/home/tesfish/anaconda3/lib/python3.5/site-packages/gensim/models/wrappers/dtmmodel.py in convert_input(self, corpus, time_slices)
174
175 with utils.smart_open(self.ftimeslices(), 'wb') as fout:
--> 176 fout.write(six.u(str(len(self.time_slices)) + "\n"))
177 for sl in time_slices:
178 fout.write(six.u(str(sl) + "\n"))

TypeError: a bytes-like object is required, not 'str'
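On Python 3, six.u() simply returns a str, while a file opened with 'wb' accepts only bytes, which is what the last frame shows. A minimal reproduction without gensim or the DTM binary, using io.BytesIO as a stand-in for the binary-mode file and a hypothetical write_slice_header() helper that mirrors the failing line:

```python
import io


def write_slice_header(fout, time_slices):
    # Hypothetical helper mirroring the failing line in convert_input():
    # it writes a str to whatever stream it is given.
    fout.write(str(len(time_slices)) + "\n")


buf = io.BytesIO()  # a binary stream, like a file opened with 'wb'
try:
    write_slice_header(buf, [1, 1, 1])
    error = None
except TypeError as exc:
    error = exc

print(error)  # a bytes-like object is required, not 'str'
```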

@piskvorky piskvorky added the bug Issue described a bug label May 19, 2016
@tmylk
Contributor

tmylk commented May 19, 2016

Ping @bhargavvader

@tmylk
Contributor

tmylk commented May 27, 2016

@bhargavvader Do you have any thoughts on this?

@bhargavvader
Contributor

@tmylk, I'll have a look.

@jonathanicholas

Just a +1 -- also having this error.

@jonathanicholas

Changing
with utils.smart_open(self.ftimeslices(), 'wb') as fout:
to
with utils.smart_open(self.ftimeslices(), 'w') as fout:
works around the error, as suggested in: http://stackoverflow.com/questions/34283178/typeerror-a-bytes-like-object-is-required-not-str-in-python-and-csv

@piskvorky
Owner

@boomsbloom that is not a good idea, as text mode (w) behaves differently on Windows; for example, it translates \n into \r\n.
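To make the Windows point concrete: in Python 3, open(path, 'w') uses newline=None, which translates every '\n' written into os.linesep, and that is '\r\n' on Windows. The sketch below imitates both platforms in memory with io.TextIOWrapper and an explicit newline argument (a stand-in for a real text-mode file on each OS), so identical writes produce different bytes:

```python
import io


def text_mode_bytes(lines, newline):
    """Write lines through a text layer and return the bytes that reach the file."""
    raw = io.BytesIO()
    # newline="\r\n" imitates open(path, "w") on Windows (os.linesep == "\r\n");
    # newline="\n" imitates Linux/macOS, where no translation happens.
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    for line in lines:
        text.write(line + "\n")
    text.flush()
    return raw.getvalue()


print(text_mode_bytes(["3", "1"], "\n"))    # b'3\n1\n'
print(text_mode_bytes(["3", "1"], "\r\n"))  # b'3\r\n1\r\n'
```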

The proper solution is to open the file in binary mode and write byte strings.
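A minimal sketch of that approach, assuming plain open() in place of gensim's utils.smart_open and a hypothetical write_time_slices() helper; it is not the patch that was eventually merged. It writes the same layout as the failing convert_input() code (slice count, then one count per line), encoding each line to UTF-8 before writing, so the file is byte-identical on every platform:

```python
import os
import tempfile


def write_time_slices(path, time_slices):
    """Hypothetical helper: write the slice count, then one count per line, in binary mode."""
    with open(path, "wb") as fout:
        # Encode each str to bytes explicitly; a 'wb' file rejects str on Python 3.
        fout.write((str(len(time_slices)) + "\n").encode("utf-8"))
        for sl in time_slices:
            fout.write((str(sl) + "\n").encode("utf-8"))


# The file name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), "example-seq.dat")
write_time_slices(path, [1, 1, 1])
with open(path, "rb") as fin:
    print(fin.read())  # b'3\n1\n1\n1\n'
```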

@bhargavvader
Contributor

@piskvorky, could you elaborate a bit on your proposed solution? I tried poking around but am not sure how to fix this.

@piskvorky
Owner

piskvorky commented Jun 23, 2016

I meant simply opening files in binary mode (rb or wb) and then writing byte strings to them. So, if the input is Unicode, encode it first, e.g. to UTF-8 (see gensim.utils.to_utf8()).
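The idea behind that suggestion, sketched with a hypothetical as_utf8_bytes() helper rather than gensim's own to_utf8(): str values are encoded to UTF-8, and bytes values are assumed to be in a known encoding and re-encoded, so whatever reaches a binary-mode file is always bytes.

```python
def as_utf8_bytes(text, encoding="utf-8"):
    """Hypothetical helper: return text as UTF-8 bytes, accepting str or bytes.

    Bytes input is assumed to be in `encoding` and is re-encoded to UTF-8.
    """
    if isinstance(text, str):
        return text.encode("utf-8")
    return text.decode(encoding).encode("utf-8")


print(as_utf8_bytes("café"))                                # b'caf\xc3\xa9'
print(as_utf8_bytes("café".encode("latin-1"), "latin-1"))  # b'caf\xc3\xa9'
```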

I am not familiar with this particular issue, though; maybe it's something different. What is the actual problem? Why are we writing Unicode strings to binary files in this wrapper?

@tmylk
Contributor

tmylk commented Jun 27, 2016

Ping @bhargavvader

@bhargavvader
Contributor

@Tesfamariam, do have a look at the PR; it will fix the problem.
I think this issue can be closed now.

@tmylk
Contributor

tmylk commented Jul 1, 2016

Fixed in #768

@tmylk tmylk closed this as completed Jul 1, 2016
@gopi3e

gopi3e commented Dec 12, 2019

Nice blog post on addressing this issue:
https://webkul.com/blog/string-and-bytes-conversion-in-python3-x/


7 participants