Refactor line2doc methods of LowCorpus and MalletCorpus #2269

Merged: 6 commits, Jan 11, 2019

@horpto horpto commented Nov 16, 2018

No description provided.

@horpto horpto force-pushed the nonoptimal-lowcorpora branch 2 times, most recently from 5ba87a5 to ef27d11 Compare November 17, 2018 14:29
@horpto horpto force-pushed the nonoptimal-lowcorpora branch from ef27d11 to 5f163f4 Compare December 12, 2018 01:12
@horpto horpto force-pushed the nonoptimal-lowcorpora branch from 5f163f4 to fcc9bc1 Compare December 12, 2018 01:43
- docid, doclang, words = splited_line[0], splited_line[1], splited_line[2:]
+ split_line = utils.to_unicode(line).strip().split(None, 2)
+ docid, doclang = split_line[0], split_line[1]
+ words = split_line[2] if len(split_line) >= 3 else ''
Contributor

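Why `>=` rather than `==`? I ask because, per the `str.split` docs, "If maxsplit is given, at most maxsplit splits are done", so with `maxsplit=2` the list can never have more than 3 elements.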

Contributor Author

Just a habit of writing more flexible code in case of future changes.
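To make the point of this thread concrete, here is a minimal sketch of the split behaviour under discussion. `parse_line` is a hypothetical standalone helper, not the actual `line2doc` method: it drops the `utils.to_unicode` call and assumes plain `str` input, but otherwise mirrors the three added lines of the diff.

```python
def parse_line(line):
    """Hypothetical helper mirroring the split logic from the diff (str input assumed)."""
    # maxsplit=2 performs at most two splits, so the result has at most 3 items.
    split_line = line.strip().split(None, 2)
    docid, doclang = split_line[0], split_line[1]
    # With maxsplit=2, `>= 3` and `== 3` behave identically here;
    # `>=` only matters if maxsplit is raised in a future change.
    words = split_line[2] if len(split_line) >= 3 else ''
    return docid, doclang, words


# A line with a document body: the remainder is kept as one unsplit string.
full = parse_line("doc1 en  hello   world again\n")

# A line with only id and language: the guard falls back to an empty string.
short = parse_line("doc2 en\n")

# The split can never yield more than maxsplit + 1 items.
max_items = len("a b c d e".split(None, 2))
```

Note that, unlike the removed line, `words` is now a single string rather than a list, so any tokenisation of the document body would have to happen in a later step.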

@menshikh-iv menshikh-iv changed the title Refactor to more optimal line2doc method of LowCorpus and MalletCorpus [WIP] Refactor to more optimal line2doc method of LowCorpus and MalletCorpus Jan 9, 2019
@horpto horpto changed the title [WIP] Refactor to more optimal line2doc method of LowCorpus and MalletCorpus Refactor to more optimal line2doc method of LowCorpus and MalletCorpus Jan 10, 2019
@menshikh-iv menshikh-iv changed the title Refactor to more optimal line2doc method of LowCorpus and MalletCorpus Refactor line2doc method of LowCorpus and MalletCorpus Jan 11, 2019
@menshikh-iv menshikh-iv changed the title Refactor line2doc method of LowCorpus and MalletCorpus Refactor line2doc methods of LowCorpus and MalletCorpus Jan 11, 2019
@menshikh-iv menshikh-iv merged commit 680de8d into piskvorky:develop Jan 11, 2019
@horpto horpto deleted the nonoptimal-lowcorpora branch January 19, 2019 12:06

2 participants