Allow name_only option gensim downloader api #2143

Merged
merged 35 commits into from
Aug 3, 2018
Changes from 34 commits
Commits
e249ed4
handle deprecation
aneesh-joshi Feb 8, 2018
62f6c82
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
aneesh-joshi Feb 14, 2018
1677e98
handle max_count
aneesh-joshi Feb 18, 2018
e8c08f8
change flag name
aneesh-joshi Feb 18, 2018
258d033
make flake8 compatible
aneesh-joshi Feb 18, 2018
875c65c
move max_vocab to prepare vocab
aneesh-joshi Feb 20, 2018
0aa8426
correct max_vocab semantics
aneesh-joshi Feb 20, 2018
390f333
remove unnecessary nextline
aneesh-joshi Feb 20, 2018
8c508c7
fix bug and make flake8 complaint
aneesh-joshi Feb 21, 2018
c826b19
refactor code and change sorting to key based
aneesh-joshi Feb 22, 2018
35dc681
add tests
aneesh-joshi Mar 5, 2018
67f6a14
introduce effective_min_count
aneesh-joshi Mar 5, 2018
7b1f612
make flake8 compliant
aneesh-joshi Mar 5, 2018
fafee70
remove clobbering of min_count
aneesh-joshi Mar 7, 2018
9d99660
remove min_count assertion
aneesh-joshi Mar 7, 2018
6c06fbc
.\gensim\models\word2vec.py
aneesh-joshi Mar 7, 2018
c5a0e6e
Revert ".\gensim\models\word2vec.py"
aneesh-joshi Mar 7, 2018
fdd2aab
rename max_vocab to max_final_vocab
aneesh-joshi Mar 7, 2018
974d587
update test to max_final_vocab
aneesh-joshi Mar 7, 2018
ddb3556
move and modify comment docs
aneesh-joshi Mar 7, 2018
c54d8a9
make flake8 compliant
aneesh-joshi Mar 7, 2018
f379616
refactor word2vec.py
aneesh-joshi Mar 8, 2018
46d3885
handle possible old model load errors
aneesh-joshi Mar 11, 2018
2cf5625
include effective_min_count tests
aneesh-joshi Mar 11, 2018
8578e3d
make flake compliant
aneesh-joshi Mar 11, 2018
a43fea3
remove check for max_final_vocab
aneesh-joshi Mar 13, 2018
340a8cf
include backward compat for 3.3 models
aneesh-joshi Mar 15, 2018
0b62407
remove unnecessary newline
aneesh-joshi Mar 15, 2018
5b7a6c2
add test case for max_final_vocab
aneesh-joshi Mar 19, 2018
48ad4dc
merge master
aneesh-joshi May 14, 2018
f282a56
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
aneesh-joshi Jul 31, 2018
738a018
add name only option to downloader api
aneesh-joshi Jul 31, 2018
3a91142
add tests
aneesh-joshi Jul 31, 2018
2571e85
make single argument option for name_only
aneesh-joshi Aug 2, 2018
0839f5c
make name_only into name
aneesh-joshi Aug 2, 2018
15 changes: 12 additions & 3 deletions gensim/downloader.py
@@ -29,6 +29,7 @@
Also, this API available via CLI::

python -m gensim.downloader --info <dataname> # same as api.info(dataname)
+python -m gensim.downloader --info name_only # same as api.info(name_only=True)
Review comment from a contributor:
--info name, please :) (but keep the name_only parameter for the CLI)

python -m gensim.downloader --download <dataname> # same as api.load(dataname, return_path=True)

"""
@@ -154,7 +155,7 @@ def _calculate_md5_checksum(fname):
return hash_md5.hexdigest()


-def info(name=None, show_only_latest=True):
+def info(name=None, show_only_latest=True, name_only=False):
"""Provide the information related to model/dataset.

Parameters
@@ -164,6 +165,8 @@ def info(name=None, show_only_latest=True):
show_only_latest : bool, optional
If storage contains different versions for one data/model, this flag allow to hide outdated versions.
Affects only if `name` is None.
+name_only : bool, optional
+If True, will return only the names of available models and corpora.

Returns
-------
@@ -205,6 +208,9 @@ def info(name=None, show_only_latest=True):
if not show_only_latest:
return information

+if name_only:
+    return {"corpora": list(information['corpora'].keys()), "models": list(information['models'])}

return {
"corpora": {name: data for (name, data) in information['corpora'].items() if data.get("latest", True)},
"models": {name: data for (name, data) in information['models'].items() if data.get("latest", True)}
@@ -444,5 +450,8 @@ def load(name, return_path=False):
data_path = load(args.download[0], return_path=True)
logger.info("Data has been installed and data path is %s", data_path)
elif args.info is not None:
-output = info() if (args.info == full_information) else info(name=args.info)
-print(json.dumps(output, indent=4))
+if args.info == 'name_only':
+    print(json.dumps(info(name_only=True), indent=4))
+else:
+    output = info() if (args.info == full_information) else info(name=args.info)
+    print(json.dumps(output, indent=4))
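
For readers following the diff, the new control flow can be sketched without network access. This is a minimal sketch, not the gensim implementation: `SAMPLE_INFORMATION`, `info_sketch`, and `run_cli` are hypothetical names, the catalogue entries are made up, and both the single-name lookup and the `full_information = 1` sentinel with `nargs="?"` are assumptions about code that lies outside the hunks shown.

```python
import argparse
import json

# Hypothetical stand-in for the catalogue that info() retrieves; the real
# names and fields come from the remote listing and are not reproduced here.
SAMPLE_INFORMATION = {
    "corpora": {
        "example-corpus": {"latest": True},
        "example-corpus-old": {"latest": False},
    },
    "models": {
        "example-model": {"latest": True},
    },
}


def info_sketch(information, name=None, show_only_latest=True, name_only=False):
    """Mirror the branch order of the patched info() on an in-memory dict."""
    if name is not None:
        # Assumed lookup for a single entry; this part is outside the hunks shown.
        for section in ("corpora", "models"):
            if name in information[section]:
                return information[section][name]
        raise ValueError("Incorrect model/corpus name")
    if not show_only_latest:
        # Checked before name_only, so name_only is ignored in this case.
        return information
    if name_only:
        # Names only; the "latest" filter below is not applied on this path.
        return {
            "corpora": list(information["corpora"].keys()),
            "models": list(information["models"]),
        }
    return {
        section: {n: d for (n, d) in information[section].items() if d.get("latest", True)}
        for section in ("corpora", "models")
    }


def run_cli(argv, information):
    """Dispatch on --info the way the patched __main__ block does."""
    full_information = 1  # assumed sentinel meaning "--info with no value"
    parser = argparse.ArgumentParser()
    parser.add_argument("--info", nargs="?", const=full_information)
    args = parser.parse_args(argv)
    if args.info is None:
        return None
    if args.info == "name_only":
        return info_sketch(information, name_only=True)
    if args.info == full_information:
        return info_sketch(information)
    return info_sketch(information, name=args.info)


print(json.dumps(run_cli(["--info", "name_only"], SAMPLE_INFORMATION), indent=4))
```

As the hunks read, three things follow from this ordering: the name-only result includes outdated entries, `name_only` has no effect when `show_only_latest` is false, and the CLI treats the literal string `name_only` as a keyword, so an entry with that exact name could not be looked up through `--info`.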
3 changes: 3 additions & 0 deletions gensim/test/test_api.py
@@ -72,6 +72,9 @@ def test_info(self):
self.assertEqual(sorted(data.keys()), sorted(['models', 'corpora']))
self.assertTrue(len(data['models']))
self.assertTrue(len(data['corpora']))
+name_only_data = api.info(name_only=True)
+self.assertEqual(len(name_only_data.keys()), 2)
+self.assertTrue({'models', 'corpora'} == set(name_only_data))


if __name__ == '__main__':
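
The added assertions check only the two top-level keys. A slightly stricter shape check might also confirm that each value is a list of strings. The following is a sketch under that assumption: `fake_info` is a hypothetical stand-in for `api.info` so that the example runs offline, and the names it returns are invented.

```python
import unittest


def fake_info(name_only=False):
    """Hypothetical offline stand-in for api.info(name_only=True)."""
    if name_only:
        return {"corpora": ["example-corpus"], "models": ["example-model"]}
    return {"corpora": {"example-corpus": {}}, "models": {"example-model": {}}}


class NameOnlyShapeTest(unittest.TestCase):
    def test_name_only_shape(self):
        data = fake_info(name_only=True)
        # Same key check as in the diff, written as a set comparison.
        self.assertEqual(set(data), {"models", "corpora"})
        # Additional check: every value is a list of plain strings.
        for names in data.values():
            self.assertIsInstance(names, list)
            self.assertTrue(all(isinstance(n, str) for n in names))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(NameOnlyShapeTest)
result = unittest.TextTestRunner().run(suite)
```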