This repository has been archived by the owner on Mar 9, 2023. It is now read-only.

easy installable dictionary #73

Closed
izziiyt opened this issue Aug 1, 2019 · 11 comments


@izziiyt
Collaborator

izziiyt commented Aug 1, 2019

explosion/spaCy#3756 (comment)

Ask the PyPI organization to allow an exception to the 60MB size limit for the full and core dictionaries.
This issue is heavily related to https://github.com/WorksApplications/SudachiDict

@izziiyt
Collaborator Author

izziiyt commented Oct 27, 2019

@sorami

@polm
Contributor

polm commented Nov 10, 2019

Any progress on this? I think all you need to do is open an issue here and explain why the package is going to be large.

@sorami
Collaborator

sorami commented Nov 12, 2019

Hi, thanks for the reminder. Let me have a look and take action in the next few days.

@sorami
Collaborator

sorami commented Nov 15, 2019

Okay, so I will confirm the details of the dictionary files with the team and then open an issue on the PyPA repo early next week.

@sorami
Collaborator

sorami commented Nov 20, 2019

I've added the following three dictionary packages to PyPI:

  1. small (40MB): https://pypi.org/project/SudachiDict-small/20191030/
  2. core (70MB): https://pypi.org/project/SudachiDict-core/0.0.0/
  3. full (150MB): https://pypi.org/project/SudachiDict-full/0.0.0/

The PyPI size limit is 60MB. For small, I have already uploaded the dictionary resource. For core and full, I have created PyPI package version 0.0.0 without the resource files and am waiting for the PyPA to increase the size limit.
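
As a quick check of the arithmetic, here is a minimal sketch that compares the approximate sizes listed above against the 60MB limit. The names `SIZES_MB`, `PYPI_LIMIT_MB`, and `over_limit` are made up for illustration, and the sizes are the rounded figures from the list, not exact file sizes.

```python
# Approximate package sizes in MB, taken from the list in this comment.
SIZES_MB = {"small": 40, "core": 70, "full": 150}

# PyPI size limit quoted in this comment.
PYPI_LIMIT_MB = 60


def over_limit(sizes, limit):
    """Return the names whose size exceeds the limit, in insertion order."""
    return [name for name, size in sizes.items() if size > limit]


if __name__ == "__main__":
    # Only the packages returned here would need a limit exception.
    print(over_limit(SIZES_MB, PYPI_LIMIT_MB))
```

Under these figures only small fits under the limit, which is why it could be uploaded right away.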

So you can already run the following to start using the tokenizer:

$ pip install sudachipy sudachidict-small

I have filed an issue to request the size limit increase:
pypa/packaging-problems#299

Once they increase the limit, I will upload the core and full resources to PyPI, notify you here, and update the SudachiPy readme.

@hiroshi-matsuda-rit
Contributor

@sorami Did you get a response from PyPI? I'm going to release a new version of GiNZA in the next two weeks. I'd like to make the GiNZA packages available via PyPI if SudachiDict-core will also be available from PyPI.

@sorami
Collaborator

sorami commented Dec 7, 2019

@hiroshi-matsuda
Unfortunately, I haven't heard anything from the PyPA team.

The same request, made by someone else a day before ours ( PyPI package size limit for splice-beakerx · Issue #298 · pypa/packaging-problems ), is in the same situation.

I've added a comment to the issue just now to ping them.

As soon as they change the size limit, we are ready to release all three dictionaries (small, core, full) on PyPI.

@polm
Contributor

polm commented Jan 4, 2020

Just poking this issue. It looks like the correct place to make a request for a size increase is actually pypi-support, not the place linked to before. The 298 issue linked here was migrated by the admin there but for some reason the SudachiPy one wasn't, so I guess you need to open a new issue.

@sorami
Collaborator

sorami commented Jan 6, 2020

Thank you very much for the information, @polm !!

I have opened an issue there: pypi/support#131

@sorami
Collaborator

sorami commented Jan 6, 2020

Limit Request: sudachidict-{core,full} - {75, 160MB} · Issue #131 · pypa/pypi-support

Okay, so we can't distribute the dictionaries on PyPI. Let me consider the alternative approaches Jason introduced to us in the above issue.

@sorami
Collaborator

sorami commented May 12, 2020

I've (finally) set up the Python packages for the dictionaries. You can now install them from PyPI:

$ pip install sudachidict_core
$ pip install sudachidict_small
$ pip install sudachidict_full

The dictionary binary files are not in the packages, but they are downloaded upon installation.
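
For readers wondering what "download the resource if it is not already there" can look like in general, here is a minimal standard-library sketch. It is not the actual SudachiDict packaging code, which may work differently. The function name `ensure_resource`, the file name, and the URL in the usage note are all hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def ensure_resource(url, dest):
    """Download `url` to `dest` unless `dest` already exists; return the Path."""
    dest = Path(dest)
    if dest.exists():
        # Already fetched earlier, so skip the network round trip.
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated file at the final location.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)
    return dest
```

A call might look like `ensure_resource("https://example.invalid/system_core.dic", "resources/system.dic")`, where both arguments are placeholders. Keeping the binary outside the uploaded package is what lets the package itself stay under the PyPI size limit.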

sorami closed this as completed May 12, 2020