Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update dependency charset-normalizer to v3 #46

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

renovate[bot]
Copy link
Contributor

@renovate renovate bot commented Oct 24, 2023

This PR contains the following updates:

Package Change Age Adoption Passing Confidence
charset-normalizer (changelog) ==2.1.1 -> ==3.4.1 age adoption passing confidence

Release Notes

jawah/charset_normalizer (charset-normalizer)

v3.4.1

Compare Source

Changed
  • Project metadata are now stored using pyproject.toml instead of setup.cfg using setuptools as the build backend.
  • Enforce annotation delayed loading for a simpler and consistent types in the project.
  • Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8
Added
  • pre-commit configuration.
  • noxfile.
Removed
  • build-requirements.txt as per using pyproject.toml native build configuration.
  • bin/integration.py and bin/serve.py in favor of downstream integration test (see noxfile).
  • setup.cfg in favor of pyproject.toml metadata configuration.
  • Unused utils.range_scan function.
Fixed
  • Converting content to Unicode bytes may insert utf_8 instead of preferred utf-8. (#​572)
  • Deprecation warning "'count' is passed as positional argument" when converting to Unicode bytes on Python 3.13+

v3.4.0

Compare Source

Added
  • Argument --no-preemptive in the CLI to prevent the detector to search for hints.
  • Support for Python 3.13 (#​512)
Fixed
  • Relax the TypeError exception thrown when trying to compare a CharsetMatch with anything else than a CharsetMatch.
  • Improved the general reliability of the detector based on user feedbacks. (#​520) (#​509) (#​498) (#​407) (#​537)
  • Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes. (#​381)

v3.3.2

Compare Source

Fixed
  • Unintentional memory usage regression when using large payload that match several encoding (#​376)
  • Regression on some detection case showcased in the documentation (#​371)
Added
  • Noise (md) probe that identify malformed arabic representation due to the presence of letters in isolated form (credit to my wife)

v3.3.1

Compare Source

Changed
  • Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
  • Improved the general detection reliability based on reports from the community

v3.3.0

Compare Source

Added
  • Allow to execute the CLI (e.g. normalizer) through python -m charset_normalizer.cli or python -m charset_normalizer
  • Support for 9 forgotten encoding that are supported by Python but unlisted in encoding.aliases as they have no alias (#​323)
Removed
  • (internal) Redundant utils.is_ascii function and unused function is_private_use_only
  • (internal) charset_normalizer.assets is moved inside charset_normalizer.constant
Changed
  • (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
  • Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.8
Fixed
  • Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in __lt__ (#​350)

v3.2.0

Compare Source

Changed
  • Typehint for function from_path no longer enforce PathLike as its first argument
  • Minor improvement over the global detection reliability
Added
  • Introduce function is_binary that relies on main capabilities, and optimized to detect binaries
  • Propagate enable_fallback argument throughout from_bytes, from_path, and from_fp that allow a deeper control over the detection (default True)
  • Explicit support for Python 3.12
Fixed
  • Edge case detection failure where a file would contain 'very-long' camel cased word (Issue #​289)

v3.1.0

Compare Source

Added
  • Argument should_rename_legacy for legacy function detect and disregard any new arguments without errors (PR #​262)
Removed
  • Support for Python 3.6 (PR #​260)
Changed
  • Optional speedup provided by mypy/c 1.0.1

v3.0.1

Compare Source

Fixed
  • Multi-bytes cutter/chunk generator did not always cut correctly (PR #​233)
Changed
  • Speedup provided by mypy/c 0.990 on Python >= 3.7

v3.0.0

Compare Source

Added
  • Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
  • Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
  • Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
  • normalizer --version now specify if current version provide extra speedup (meaning mypyc compilation whl)
Changed
  • Build with static metadata using 'build' frontend
  • Make the language detection stricter
  • Optional: Module md.py can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1
Fixed
  • CLI with opt --normalize fail when using full path for files
  • TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha character have been fed to it
  • Sphinx warnings when generating the documentation
Removed
  • Coherence detector no longer return 'Simple English' instead return 'English'
  • Coherence detector no longer return 'Classical Chinese' instead return 'Chinese'
  • Breaking: Method first() and best() from CharsetMatch
  • UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)
  • Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
  • Breaking: Top-level function normalize
  • Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
  • Support for the backport unicodedata2

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot force-pushed the renovate/charset-normalizer-3.x branch from 041fc98 to 721e103 Compare November 1, 2023 07:39
@renovate renovate bot force-pushed the renovate/charset-normalizer-3.x branch from 721e103 to 1eaa7ee Compare October 9, 2024 11:51
@renovate renovate bot force-pushed the renovate/charset-normalizer-3.x branch from 1eaa7ee to 3f516a0 Compare December 24, 2024 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0 participants