Bundle emojis in package #23

Closed
impredicative opened this issue Mar 1, 2021 · 5 comments · Fixed by #29

Comments

@impredicative

It is difficult for me to provide reusable software if the emojis are not already bundled in the demoji package. I understand and acknowledge that the list of emojis may be routinely updated, but that is not a reason to leave the list out of the package itself. For instance, I believe Python comes bundled with Unicode data. You can still offer users a way to update the emoji list, and otherwise fall back to what's in the package.

@bsolomon1124
Owner

@impredicative Thanks for raising this request. I am open to it and think you make a valid argument. I like the proposed behavior of defaulting to the bundled version that is distributed with the repo, while still offering the ability to live-download and update if needed. I'm going to think about which storage format offers the best tradeoff between space savings and load time.
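
The fallback behavior discussed here (prefer a user-updated list, otherwise use the copy shipped with the package) can be illustrated with a minimal sketch. This is not demoji's implementation: `BUNDLED_CODES`, `load_codes`, and the temporary cache path are hypothetical names chosen for illustration, and the `"codes"` key is assumed from the cache file layout mentioned later in this thread.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the data shipped inside the package.
BUNDLED_CODES = {"\U0001F525": "fire", "\U0001F44D": "thumbs up"}


def load_codes(cache_path):
    """Return codes from a user-updated cache file, else the bundled copy."""
    try:
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)["codes"]
    except (OSError, ValueError, KeyError, TypeError):
        # Cache missing, unreadable, or malformed: fall back to bundled data.
        return dict(BUNDLED_CODES)


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "codes.json"
    from_bundle = load_codes(cache)  # no cache yet, so bundled data is used

    # Simulate a user-triggered update that writes a newer list to the cache.
    updated = {**BUNDLED_CODES, "\U0001F980": "crab"}
    cache.write_text(json.dumps({"codes": updated}), encoding="utf-8")
    from_cache = load_codes(cache)  # cache exists now, so it takes priority
```

With this shape, a network outage can only prevent an update; it cannot prevent the library from working.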

@bsolomon1124
Owner

bsolomon1124 commented Mar 10, 2021

Interestingly enough, it looks like CPython distributes Unicode data as C header files.

Options I could see here would be:

  • plain .py module containing a dict
  • JSON
  • compressed JSON
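
As a rough sketch of how the three options differ, the snippet below round-trips one small mapping through each format using only the standard library. The mapping and variable names are made up for illustration, and `exec` merely stands in for importing a generated `.py` module.

```python
import gzip
import json

# Tiny illustrative mapping; a real emoji list has thousands of entries.
codes = {"\U0001F525": "fire", "\U0001F44D": "thumbs up", "\U0001F980": "crab"}

# Option 1: a plain .py module is source text containing a dict literal.
py_source = "CODES = " + repr(codes) + "\n"
namespace = {}
exec(py_source, namespace)  # stands in for importing such a module
from_py = namespace["CODES"]

# Option 2: JSON text, encoded as UTF-8 bytes.
json_bytes = json.dumps(codes, ensure_ascii=False).encode("utf-8")
from_json = json.loads(json_bytes.decode("utf-8"))

# Option 3: the same JSON, gzip-compressed.
gz_bytes = gzip.compress(json_bytes)
from_gz = json.loads(gzip.decompress(gz_bytes).decode("utf-8"))

# Compare the on-disk size of each representation.
print(len(py_source.encode("utf-8")), len(json_bytes), len(gz_bytes))
```

For a sample this small, gzip's fixed overhead can outweigh any savings; compression only pays off on a full-size list, and it adds a decompression step at load time.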

@Ronserruya

Ronserruya commented Jun 21, 2021

unicode.org was down many times in the past few days, which took some of our systems down. I decided to inject the codes into the package myself to solve this:

  1. I took the file that demoji writes to disk (you can grab it from the gist below, or from ~/.demoji/codes.json); it's only 324 KB: https://gist.github.com/Ronserruya/80b2174957ab95b0f8cd88516037cf44

  2. Before calling demoji.download_codes(), I replace the function that streams from unicode.org:

import json

import demoji


def _replace_demoji_method(URL):
    # The URL argument is accepted for compatibility but ignored.
    with open('codes.json') as f:
        data = json.load(f)
        yield from data['codes'].items()

# Read from file instead of going to unicode.org
demoji.stream_unicodeorg_emojifile = _replace_demoji_method
demoji.download_codes()

Demoji will now read all the codes from the file instead of going to unicode.org.

@impredicative
Author

@Ronserruya I chose to just go with the emoji package, which doesn't have this issue:

import emoji

EMOJI_REGEXP = emoji.get_emoji_regexp()
EMOJI_REGEXP.sub("", "your text, optionally having emojis")

bsolomon1124 changed the title from "Bundle emjois in package" to "Bundle emojis in package" Jul 18, 2021
bsolomon1124 added a commit that referenced this issue Jul 18, 2021
...at install time, rather than requiring a runtime download
of the codes from unicode.org.

SemVer MAJOR:

- Drop support for Python 2 and Python 3.5
- The `demoji` package now bundles emoji data that is distributed with the
  package at install time, rather than requiring a download of the codes
  from the unicode.org site at runtime (closes #23)
- As a result of the above change, the following functions are **removed**
  from the `demoji` API:
  - `download_codes()`
  - `parse_unicode_sequence()`
  - `parse_unicode_range()`
  - `stream_unicodeorg_emojifile()`

SemVer MINOR:

- The `demoji.DIRECTORY` and `demoji.CACHEPATH` attributes are deprecated
  due to no longer being functionally used by the package. Accessing them
  will warn with a `FutureWarning`, and these attributes may be removed
  completely in a future release
- `demoji` can now be installed with optional `ujson` support for faster loading
  of emoji data from file (versus the standard library's `json`, which is the
  default); use `python -m pip install demoji[ujson]`
- The dependencies `requests` and `colorama` have been removed completely
- `importlib_resources` (a backport module) is now required for Python < 3.7
- The `EMOJI_VERSION` attribute, newly added to `demoji`, is a `str` denoting
  the Unicode database version in use

SemVer PATCH:

- Fix a typo in `demoji.__all__` to properly include `demoji.findall_list()`
- Internal change: Functions that call `set_emoji_pattern()` are now decorated
  with a `@cache_setter` to set the cache
- Some unit tests have been removed to reflect the change in behavior from
  downloading codes to bundling codes with install

Closes #28
Closes #27
Closes #23
Closes #11
Closes #4
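
The deprecation behavior described in the commit message above (a `FutureWarning` when a deprecated attribute is accessed) can be sketched as follows. This is only an illustration of the general technique, not demoji's actual code: the `_Config` class is hypothetical, and the path value is taken from the cache location mentioned earlier in this thread.

```python
import warnings


class _Config:
    """Hypothetical holder for a deprecated setting; not demoji's code."""

    _cachepath = "~/.demoji/codes.json"

    @property
    def CACHEPATH(self):
        # Warn on every access, then return the legacy value unchanged.
        warnings.warn(
            "CACHEPATH is deprecated and may be removed in a future release",
            FutureWarning,
            stacklevel=2,
        )
        return self._cachepath


config = _Config()

# Capture warnings so the effect of the access can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = config.CACHEPATH

warning_categories = [w.category for w in caught]
```

`FutureWarning` is shown by default to end users, unlike `DeprecationWarning`, which makes it a reasonable choice when the audience is application code rather than other library authors.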
@bsolomon1124
Owner

bsolomon1124 commented Jul 18, 2021

@impredicative

This issue has been tentatively resolved by demoji release 1.0.0, which is in the release candidate stage right now.
This release bundles a static copy of Unicode emoji data with the package rather than requiring a runtime download.

You can get this pre-release using --pre:

python3 -m pip install --pre -U demoji

I appreciate your feedback prior to releasing 1.0.0!
