'NoneType' object is not iterable when using summarize #1531
Comments
Thanks for reporting. @menshikh-iv, let's link to the examples from the documentation (I think there was a blog post or notebook for summarization, at the very least).
@nabergh Docs: ipython tutorial and blog post. But you are using it correctly, so it looks like a bug. @olavurmortensen, please look at this problem.
@menshikh-iv I wrote that tutorial on
Ok, ping @fedelopez77 @fbarrios
John Mercer sent me an email
@nabergh please check this.
I just checked this out and it seems John is correct. In some cases, a period followed directly by a non-space character throws the same error, which can be fixed by adding a space after the period. Although I do not agree with this design decision (at the very least, a more informative error should be thrown), this case does not apply to the example I posted above and does not solve this issue.
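For anyone hitting the period-without-space variant, the suggested workaround (adding a space after the period) can be sketched as a regex pass before summarizing. This is a hypothetical helper, not part of gensim, and the rule it applies is an assumption: only a lowercase letter, a period and an uppercase letter with nothing in between are treated as a missing sentence break, so decimals like 3.14 and abbreviations like "e.g. this" stay untouched.

```python
import re

# Heuristic (assumption): a period sandwiched between a lowercase letter and an
# uppercase letter with no space, e.g. "end.Start", is a missing sentence break.
_MISSING_SPACE = re.compile(r"(?<=[a-z])\.(?=[A-Z])")


def add_space_after_period(text: str) -> str:
    """Insert a space after periods that run directly into the next sentence."""
    return _MISSING_SPACE.sub(". ", text)


if __name__ == "__main__":
    print(add_space_after_period("The card was charged.Nobody called back."))
```

It will miss breaks after digits, quotes or closing brackets and can misfire on mixed-case tokens, so treat it as a starting point rather than a fix.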
Looks like a bug to me, not a design decision. My guess is that it is related to the bug in this ticket, though, so it's a good lead.
I can confirm this bug. It showed up with the following text, which does not contain any period followed directly by a non-space character.
In the meantime, I'm using this function to work around the bug.
Here's another self-contained example.
I'll look into it during the day. Hopefully I can propose a PR fixing this by today or tomorrow :)
The NoneType issue was caused by this return: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/summarization/summarizer.py#L163 It happened simply because the summarizer module doesn't have enough material to work with (a warning was in fact being issued). I created #1570 to fix the NoneType error.
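The failure mode described above can be reproduced without gensim: an internal step gives up and returns None when there is too little material, and the caller then iterates over that value. The sketch below uses hypothetical names (rank_sentences, summarize_buggy, summarize_fixed) and an assumed threshold of 3 distinct sentences; it is not gensim's code, only an illustration of why the TypeError appears and how treating "nothing ranked" as an empty list avoids it.

```python
from typing import List, Optional

MIN_DISTINCT_SENTENCES = 3  # assumed threshold for this illustration


def rank_sentences(sentences: List[str]) -> Optional[List[str]]:
    """Hypothetical ranking step: bails out with None when there is too
    little material to work with."""
    distinct = list(dict.fromkeys(sentences))  # drop duplicates, keep order
    if len(distinct) < MIN_DISTINCT_SENTENCES:
        return None
    return distinct  # placeholder "ranking": original order


def summarize_buggy(sentences: List[str]) -> str:
    # Iterating over None raises TypeError: 'NoneType' object is not iterable
    return " ".join(s for s in rank_sentences(sentences))


def summarize_fixed(sentences: List[str]) -> str:
    # Treat "nothing ranked" as an empty result instead of crashing
    return " ".join(rank_sentences(sentences) or [])
```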
By the way, @diegospd, the summarizer module is splitting your text in a way that you probably don't want. You should remove the newlines so TextRank can make a proper summary:
Thanks @fbarrios! I was having a hard time figuring out why my texts had so many sentences. I think the documentation should warn that the summarizer can't handle texts with fewer than 10 sentences and that newlines in the middle of sentences will confuse it. I'm cleaning my texts with this to replace newlines, carriage returns and tabs with spaces:

```python
import re

text = re.sub(r'\n|\r|\t', ' ', text)  # newlines, carriage returns, tabs -> spaces
text = re.sub(r'\s+', ' ', text)       # collapse runs of whitespace into one space
```
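Building on that cleanup, a pre-check can normalize whitespace and estimate the sentence count before summarizing, so short inputs are caught up front. The sentence splitter here is a naive regex (an assumption), not gensim's own tokenizer, so its count is only an approximation; MIN_SENTENCES = 10 is the limit mentioned in this thread, and prepare_text and long_enough are hypothetical helper names.

```python
import re
from typing import Tuple

MIN_SENTENCES = 10  # the minimum mentioned in this thread

# Naive sentence boundary (assumption): ., ! or ? followed by whitespace.
# gensim uses its own splitting rules, so counts can differ from its own.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def prepare_text(text: str) -> Tuple[str, int]:
    """Collapse newlines, tabs and repeated whitespace into single spaces and
    return the cleaned text together with an approximate sentence count."""
    cleaned = re.sub(r"\s+", " ", text).strip()
    if not cleaned:
        return cleaned, 0
    return cleaned, len(_SENTENCE_END.split(cleaned))


def long_enough(text: str) -> bool:
    """True if the text appears to have at least MIN_SENTENCES sentences."""
    _, count = prepare_text(text)
    return count >= MIN_SENTENCES
```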
At the moment, the logger issues a warning in those cases.
I created issue #1575 |
…red, a warning will be logged and the original text will be returned as if it had been successfully summarized
For those reading this thread, the result of running the original code in the newest release of gensim (3.1.0) is for
@nabergh The rationale behind that decision was to be consistent with the return types.
Maybe return the entire original text? Or raise a
I'm also not sure what the expected behaviour is. What is your suggestion, @nabergh?
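The first suggestion, returning the original text when no summary can be produced, can be sketched as a thin wrapper around whatever summarizer is in use. The wrapper name summarize_or_original is hypothetical; the summarizer is passed in as a callable so the sketch runs without gensim, and the exceptions it catches (the TypeError from this issue, plus ValueError as an assumed "input too short" signal) are assumptions to adjust for the gensim version at hand.

```python
import logging
from typing import Callable, List, Optional, Union

logger = logging.getLogger(__name__)

# A summarizer may return a string, a list of sentences, or (buggy case) None.
SummaryResult = Optional[Union[str, List[str]]]


def summarize_or_original(text: str,
                          summarizer: Callable[[str], SummaryResult]) -> str:
    """Return a summary of `text`, or `text` itself if none can be produced."""
    try:
        result = summarizer(text)
    except (TypeError, ValueError) as err:  # assumed failure signals
        logger.warning("Summarizer failed (%s); returning the original text.", err)
        return text
    if not result:  # None, "" or []
        logger.warning("Summarizer produced no output; returning the original text.")
        return text
    # Join list-style results (one sentence per item) into a single string
    return result if isinstance(result, str) else " ".join(result)
```

Catching TypeError this broadly can also mask unrelated bugs, so it is only reasonable as a stopgap on versions affected by this issue.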
Of course, ideally we'd want to summarize even the shorter text passages. @fbarrios how central is the length limitation to the algorithm, how was it chosen? What happens if it is relaxed? |
Ping @fedelopez77, can you answer #1531 (comment), please? What do you think the behavior should look like?
Hmm, I don't see the warning when running the code I posted in the interactive shell or in a Jupyter notebook. Do I have to enable warnings somehow? @fbarrios, I like both of @piskvorky's suggestions for expected behavior. The original text is a better summary than an empty string, so I (and I assume most users) would expect that first. If this only happens for very short texts, the original text is probably short enough to be a summary anyway. The documentation for the summarizer says "The input must be longer than INPUT_MIN_LENGTH". How long is INPUT_MIN_LENGTH, and is this the length you are currently using to decide whether to return a summary? Again, thanks for working on this.
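On why no warning appears: the warning comes from a logger (Python's standard logging module), not the warnings module, and as far as we know gensim attaches a NullHandler to its top-level "gensim" logger, so nothing is printed until logging is configured. The usual fix is a single logging.basicConfig call; the sketch also shows how to capture those messages in memory, using a hypothetical ListHandler class and capture_gensim_warnings helper.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Keeps formatted log messages in memory (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(self.format(record))


def capture_gensim_warnings(level: int = logging.WARNING) -> ListHandler:
    """Attach a ListHandler to the "gensim" logger; records from child
    loggers such as "gensim.summarization.summarizer" propagate to it."""
    handler = ListHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s : %(message)s"))
    gensim_logger = logging.getLogger("gensim")
    gensim_logger.addHandler(handler)
    gensim_logger.setLevel(level)
    return handler


if __name__ == "__main__":
    # Typical interactive setup: print WARNING and above to stderr.
    logging.basicConfig(format="%(asctime)s : %(levelname)s : %(message)s",
                        level=logging.WARNING)
```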
When using `gensim.summarization.summarize`, `TypeError: 'NoneType' object is not iterable` is occasionally thrown when summarizing random complaints in this dataset of consumer complaints. I'm not sure what it is about the strings that causes this error, but I feel it should not be thrown regardless. Here is a self-contained example:
I pasted the stack trace below, but I'm sure you'll see it when running the above example.
I'm assuming this is a bug, but there's no example on the gensim documentation API page, so I'm not entirely sure I'm using the function correctly.