html file is not reported as UTF8 after conversion #381
hrvoj3e added the labels "detection" (related to the charset detection mechanism, chaos/mess/coherence) and "help wanted" (extra attention is needed) on Nov 8, 2023.
hrvoj3e changed the title from "html file is not UTF8 after conversion" to "html file is not reported as UTF8 after conversion" on Nov 8, 2023.
Yes, you are correct.
Not entirely true, it's more complicated than that. Fortunately, I know how to fix this. I don't know exactly when, but soon.
Ousret added the labels "bug" (something isn't working) and "CLI" (anything related to the CLI script, normalizer), and removed the labels "help wanted" and "detection", on Sep 21, 2024.
zemnmez-renovate-bot added a commit to zemn-me/monorepo that referenced this issue on Oct 9, 2024:

##### v3.4.0 (`https://github.com/Ousret/charset_normalizer/blob/HEAD/CHANGELOG.md#340-2024-10-08`)
##### Added
- Argument `--no-preemptive` in the CLI to prevent the detector to search for hints.
- Support for Python 3.13 ([#512](jawah/charset_normalizer#512))
##### Fixed
- Relax the TypeError exception thrown when trying to compare a CharsetMatch with anything else than a CharsetMatch.
- Improved the general reliability of the detector based on user feedbacks. ([#520](jawah/charset_normalizer#520)) ([#509](jawah/charset_normalizer#509)) ([#498](jawah/charset_normalizer#498)) ([#407](jawah/charset_normalizer#407)) ([#537](jawah/charset_normalizer#537))
- Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes. ([#381](jawah/charset_normalizer#381))
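The last Fixed entry can be illustrated with a minimal sketch. It assumes the in-content hint is an HTML-style `charset=` declaration such as `<meta charset="windows-1252">`; `DECLARED_CHARSET`, `find_declared_charset` and `convert_to_utf8` are hypothetical names, and this is not charset_normalizer's actual implementation. The idea: re-encoding the bytes to UTF-8 is not enough, the declaration inside the content has to be rewritten too, otherwise anything that reads the hint first keeps reporting the old charset.

```python
import re
from typing import Optional

# Hypothetical pattern for an in-content charset declaration, e.g.
# <meta charset="windows-1252"> or content="text/html; charset=ISO-8859-1".
# Real detectors also look at other forms (XML prolog, PEP 263 cookies, ...).
DECLARED_CHARSET = re.compile(r"(charset\s*=\s*[\"']?)([\w.:-]+)", re.IGNORECASE)


def find_declared_charset(payload: bytes) -> Optional[str]:
    """Return the charset declared inside the content, if any."""
    # latin-1 maps every byte to exactly one character, so an ASCII
    # declaration is found in any ASCII-compatible encoding.
    head = payload[:4096].decode("latin-1")
    match = DECLARED_CHARSET.search(head)
    return match.group(2).lower() if match else None


def convert_to_utf8(payload: bytes, source_encoding: str) -> bytes:
    """Re-encode payload to UTF-8 and keep its declared charset in sync."""
    text = payload.decode(source_encoding)
    # Point every declaration at the new encoding before re-encoding.
    text = DECLARED_CHARSET.sub(r"\g<1>utf-8", text)
    return text.encode("utf-8")


if __name__ == "__main__":
    page = '<html><head><meta charset="windows-1252"></head><body>café</body></html>'
    original = page.encode("cp1252")
    converted = convert_to_utf8(original, "cp1252")
    print(find_declared_charset(original))   # windows-1252
    print(find_declared_charset(converted))  # utf-8
```

Rewriting the declaration at conversion time keeps the file self-consistent, so a later detection run sees a hint that agrees with the bytes.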
Provide the file
110-original.zip
Verbose output
Using the CLI, run `normalizer -v ./my-file.txt` and paste the result in here.
`enca`, however, will detect UTF-8 as it should.
Expected encoding
I expected normalizer to report UTF-8 after the file had been converted to UTF-8.
Am I wrong here?
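The mismatch described above can be reproduced with a small, self-contained sketch that does not use charset_normalizer at all. `naive_convert` and `hint_first_guess` are hypothetical helpers, and the "declared charset wins outright" rule is an assumed, simplified model of preemptive detection. A conversion that only re-encodes the bytes produces a valid UTF-8 file that still declares its old charset, so a detector that trusts the declaration first reports the old charset.

```python
import re

# Hypothetical in-content hint pattern (bytes mode: \w is ASCII-only here).
HINT = re.compile(rb"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


def naive_convert(payload: bytes, source_encoding: str) -> bytes:
    # Re-encode only; the <meta charset=...> text is left untouched.
    return payload.decode(source_encoding).encode("utf-8")


def hint_first_guess(payload: bytes) -> str:
    # Hypothetical detector: a declared charset wins outright,
    # otherwise fall back to checking whether the bytes are UTF-8.
    match = HINT.search(payload)
    if match:
        return match.group(1).decode("ascii").lower()
    try:
        payload.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"


if __name__ == "__main__":
    original = '<meta charset="windows-1252"><p>café</p>'.encode("cp1252")
    converted = naive_convert(original, "cp1252")
    converted.decode("utf-8")            # succeeds: the bytes really are UTF-8
    print(hint_first_guess(converted))   # windows-1252
```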
Desktop (please complete the following information):
Additional context
I know HTML is not the same as plain text, but I will document this here.
I think the "declarative mark" (the charset declared inside the file) should not take precedence like that. But I am new to this encoding world...
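One way to keep a declared charset from "taking over" is to treat it as a hint that has to agree with the bytes. The sketch below is an illustration under stated assumptions, not how charset_normalizer works internally (since 3.4.0 its CLI instead offers `--no-preemptive` to skip the hint search). `guess_encoding` and its helpers are hypothetical: it prefers UTF-8 whenever the bytes contain non-ASCII data and decode strictly as UTF-8, and otherwise accepts the declared charset only if the bytes decode strictly under it.

```python
import re
from typing import Optional

# Hypothetical in-content hint pattern (bytes mode: \w is ASCII-only here).
HINT = re.compile(rb"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


def declared_charset(payload: bytes) -> Optional[str]:
    match = HINT.search(payload)
    return match.group(1).decode("ascii").lower() if match else None


def decodes_as(payload: bytes, encoding: str) -> bool:
    try:
        payload.decode(encoding)  # strict: any invalid byte raises
    except (UnicodeDecodeError, LookupError):  # LookupError: unknown codec name
        return False
    return True


def guess_encoding(payload: bytes) -> str:
    # Valid UTF-8 containing non-ASCII bytes is strong evidence on its own:
    # legacy 8-bit text rarely forms well-formed multi-byte sequences.
    if not payload.isascii() and decodes_as(payload, "utf-8"):
        return "utf-8"
    hint = declared_charset(payload)
    # The declaration is only a hint: keep it when the bytes agree with it.
    if hint is not None and decodes_as(payload, hint):
        return hint
    return "ascii" if payload.isascii() else "unknown"
```

With this ordering, a file converted to UTF-8 but still carrying `charset="windows-1252"` is reported as UTF-8, while a genuine windows-1252 file keeps its declared charset. The trade-off is that a hint can no longer rescue bytes that happen to be valid UTF-8.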