[WIP] Subpackage refactor. Fix 1584 #1607

menshikh-iv · 2017-10-03T10:01:11Z

Fixes #1584.

menshikh-iv · 2017-10-03T11:54:24Z

@macks22 please review my commit on the CoherenceModel refactoring; maybe you have some suggestions about it.

macks22 · 2017-10-06T12:11:11Z

gensim/models/coherence_utils.py

+EPSILON = 1e-12  # Should be small. Value as suggested in paper.
+
+
+"""


The presence of these docstrings throughout the module makes me think the existing module structure was actually a pretty useful way to separate these things out. Have you considered adding a coherence_utils package with from <existing_module> import * instead of throwing them all in the same module? This seems cleaner to me. I think it makes things easier to find and navigate.

menshikh-iv · 2017-10-06

So maybe simply moving and renaming coherence_model -> gensim.models.coherence_utils OR gensim.models.coherence_inner is better, and you are right (I don't think the current variant is sufficiently consistent). That is why I'm asking you for a review, thanks 👍
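For illustration, the package-with-aggregation layout suggested above could look roughly like the sketch below. It builds a throwaway package in a temporary directory so it can be run as is. The names coherence_utils_demo, segmentation, and pairs_with_predecessors are hypothetical placeholders, not names taken from this PR or from gensim.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical layout: a package with one submodule per concern.
root = Path(tempfile.mkdtemp())
pkg = root / "coherence_utils_demo"
pkg.mkdir()

# Each submodule keeps its own module docstring and declares its public names.
(pkg / "segmentation.py").write_text(textwrap.dedent('''
    """Segmentation helpers (placeholder content for this sketch)."""
    __all__ = ["pairs_with_predecessors"]

    def pairs_with_predecessors(topics):
        """Pair each word with every word that precedes it in its topic."""
        return [[(w, v) for i, w in enumerate(t) for v in t[:i]] for t in topics]
'''))

# The package __init__ only aggregates, so callers can import from one place.
(pkg / "__init__.py").write_text("from .segmentation import *  # noqa: F401,F403\n")

sys.path.insert(0, str(root))
demo = importlib.import_module("coherence_utils_demo")

print(demo.pairs_with_predecessors([["a", "b", "c"]]))
# [[('b', 'a'), ('c', 'a'), ('c', 'b')]]
```

With this shape, the per-module docstrings stay where they are, and the star import in __init__ re-exports only what each submodule lists in __all__.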

macks22 · 2017-10-06T12:16:26Z

gensim/utils.py

@@ -1278,3 +1277,456 @@ def lazy_flatten(nested_list):
                yield sub
        else:
            yield el
+
+
+class PorterStemmer(object):


I think my comment above about the coherence_utils module applies here too. This would be better organized as a package, with subpackages for preprocessing, corpora, etc. The existing utils module was already getting a bit bloated. I would hypothesize that very long files with a wide variety of contents make it hard for folks who are new to the codebase to find things. I might even advocate a more granular decomposition such as utils/stemming, utils/lemmatizing, utils/corpora, utils/helpers, or something of that sort, with aggregation in utils/__init__.

menshikh-iv · 2017-10-06

In general, I agree. Maybe the utils don't need to be that "detailed", but splitting gensim.utils into gensim.utils.utils (the old content) and gensim.utils.preprocessing is better.

macks22 · 2017-10-06T12:22:14Z

gensim/utils.py

+    return " ".join(w for w in s.split() if w not in STOPWORDS)
+
+
+RE_PUNCT = re.compile('([%s])+' % re.escape(string.punctuation), re.UNICODE)


What are your thoughts on moving this into some kind of namespace object called regexes, or something of that sort, either as a module or a class with these as class attributes?

The dedicated namespace would make it easier to find these, provide a single place for users to look for existing regex patterns, and also make it easier to document them. The downside I see is moving their definitions away from where they are used.

menshikh-iv · 2017-10-06

I agree with you; maybe a simple storage class would be better for this purpose.
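A minimal sketch of the "simple storage class" idea, assuming a class whose attributes are compiled patterns. The pattern itself is copied from the diff above; the class name Regexes and the helper replace_punct_runs are hypothetical and only serve the illustration.

```python
import re
import string


class Regexes(object):
    """Hypothetical storage class: one documented home for shared patterns."""

    # One or more consecutive ASCII punctuation characters (pattern from the diff).
    PUNCT = re.compile('([%s])+' % re.escape(string.punctuation), re.UNICODE)


def replace_punct_runs(s):
    """Replace each run of punctuation in `s` with a single space."""
    return Regexes.PUNCT.sub(" ", s)


print(replace_punct_runs("a,b"))    # a b
print(replace_punct_runs("a!!!b"))  # a b
```

The trade-off raised in the review still holds: callers now read Regexes.PUNCT rather than a constant defined next to the function that uses it.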

macks22 · 2017-10-06T12:26:38Z

@menshikh-iv I left a few comments, all regarding modularity. I really like the work you've been doing to make the code better organized and more consistently styled. I think this is a good direction, and I hope my comments are useful. Cheers!

menshikh-iv · 2017-10-10T10:45:15Z

Continued in #1618

menshikh-iv added 3 commits September 19, 2017 13:39

Remove useless and broken 'examples' subpackage

1ddc489

Remove 'parsing' package. Move all used functions to gensim.utils. Fi…

479e088

…x imports (in py and ipynb).

Remove unused nose runner

9bcdc97

menshikh-iv added the breaks backward-compatibility Change breaks backward compatibility label Oct 3, 2017

menshikh-iv added 2 commits October 3, 2017 16:46

Fix PEP8 in untils

72c64e9

Remove topic_coherence submodule, create new file with all additional…

50d7592

… functions.

menshikh-iv added 3 commits October 5, 2017 17:20

Merge remote-tracking branch 'remotes/origin/develop' into subpackage…

128596f

…-refactor # Conflicts: # gensim/corpora/textcorpus.py # gensim/topic_coherence/indirect_confirmation_measure.py

Update doc structure accordance with refactoring

82be36f

Fix endline

c6a0085

macks22 reviewed Oct 6, 2017

menshikh-iv mentioned this pull request Oct 9, 2017

Remove/refactor useless subpackages #1584

Open

6 tasks

menshikh-iv closed this Oct 10, 2017

menshikh-iv deleted the subpackage-refactor branch October 10, 2017 10:45

menshikh-iv mentioned this pull request Oct 10, 2017

Subpackage refactor. Fix 1584 #1618

Closed


menshikh-iv commented Oct 3, 2017 • edited by piskvorky
