-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding type check for corpus_file argument #2469
Changes from 12 commits
40989ba
ad76b83
e1f32a2
a9eeddf
6639089
2ca51ca
6458aea
bc9dce6
97d5619
cdaff9f
b046292
298e8f0
4fb01f4
07a7d5d
8821b92
bfc4360
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -70,7 +70,7 @@ | |||||
except ImportError: | ||||||
from Queue import Queue # noqa:F401 | ||||||
|
||||||
from collections import namedtuple, defaultdict | ||||||
from collections import namedtuple, defaultdict, Iterable | ||||||
from timeit import default_timer | ||||||
|
||||||
from numpy import zeros, float32 as REAL, empty, ones, \ | ||||||
|
@@ -794,6 +794,23 @@ def train(self, documents=None, corpus_file=None, total_examples=None, total_wor | |||||
|
||||||
""" | ||||||
kwargs = {} | ||||||
|
||||||
# Check if both documents and corpus_file are None | ||||||
if corpus_file is None and documents is None: | ||||||
raise TypeError("Either one of corpus_file or documents value must be provided.") | ||||||
|
||||||
# Check if both documents and corpus_file are not None | ||||||
if corpus_file is not None and documents is not None: | ||||||
raise TypeError("Instead provide value to either of corpus_file or documents parameter but not both.") | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For consistency with the above.
Suggested change
|
||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd be tempted to combine these two cases with an XOR: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did it this way for readability. Also, I noticed There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, it'd actually need to be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Strong -1 on this: hard to read and reason about. Please stick to simpler (and easier to maintain) constructs . There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The clearest approach would be not to have a method with such delicate XOR one-but-not-the-other positional parameters - use separate methods with distinct names! But failing that, a one-line XOR, whose next line is an error message with a plain-english description of the constraint, seems pretty simple to me - just accurately reflecting the design choice already made. |
||||||
# Check if corpus_file is string type | ||||||
if documents is None and not os.path.isfile(corpus_file): | ||||||
raise TypeError("Parameter corpus_file must be a valid path to a file, got %s instead." % corpus_file) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Better than %s, because corpus_file may contain spaces. |
||||||
|
||||||
# Check if documents is iterable | ||||||
if documents is not None and not isinstance(documents, Iterable): | ||||||
raise TypeError("documents must be an iterable of list, got %s instead." % documents) | ||||||
|
||||||
if corpus_file is not None: | ||||||
# Calculate offsets for each worker along with initial doctags (doctag ~ document/line number in a file) | ||||||
offsets, start_doctags = self._get_offsets_and_start_doctags_for_corpusfile(corpus_file, self.workers) | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to include a period. Please apply this to other messages as well.
I tried committing these nitpicks to your branch myself, but I don't have permissions.