Use correct encoding when fetching non-UTF-8 site metadata #2015

Merged on Jan 6, 2022 (2 commits)

rgroothuijsen (Contributor)

Site metadata is currently assumed to be UTF-8 when it is fetched, but this is not always the case. When a page uses another encoding, its metadata shows up in the frontend as garbled characters. This PR adds a check on the fetched page's charset property, if present, and re-decodes the fetched bytes with that encoding when possible. If the page specifies an unknown encoding, it falls back to the original UTF-8 data.
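
To illustrate the decode-with-fallback flow described above, here is a minimal standard-library sketch. It is not the code from this PR: `sniff_charset` and `decode_page` are hypothetical names, only ISO-8859-1 is handled (strictly, byte for byte), and a real implementation would more likely hand the label to an encoding library.

```rust
/// Hypothetical helper: find a `charset=` label near the top of an HTML
/// page and return it lowercased.
fn sniff_charset(bytes: &[u8]) -> Option<String> {
    // Only scan the head of the document. A lossy conversion is enough
    // here because the charset label itself is plain ASCII.
    let head_len = bytes.len().min(1024);
    let head = String::from_utf8_lossy(&bytes[..head_len]).to_ascii_lowercase();
    let start = head.find("charset=")? + "charset=".len();
    let label: String = head[start..]
        .trim_start_matches(|c: char| c == '"' || c == '\'')
        .chars()
        .take_while(|c| c.is_ascii_alphanumeric() || *c == '-' || *c == '_')
        .collect();
    if label.is_empty() {
        None
    } else {
        Some(label)
    }
}

/// Hypothetical helper: decode fetched bytes, honoring a declared charset
/// when it is one this sketch knows, and keeping the UTF-8 data otherwise.
fn decode_page(bytes: &[u8]) -> String {
    // Default assumption: the page is UTF-8.
    let utf8 = String::from_utf8_lossy(bytes).into_owned();
    match sniff_charset(bytes).as_deref() {
        // Simplification: treat ISO-8859-1 strictly, where every byte maps
        // to the Unicode code point with the same value.
        Some("iso-8859-1") | Some("latin1") => {
            bytes.iter().map(|&b| b as char).collect()
        }
        // No charset, UTF-8, or an unknown label: fall back to UTF-8.
        _ => utf8,
    }
}
```

Calling `decode_page` on a page that declares ISO-8859-1 recovers accented characters that a UTF-8-only decode would garble, while a page with an unrecognized label comes back as the plain UTF-8 decode.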

Fixes #1858

NOTE: An unrelated fix is also included: the website in the original issue started its response with a blank line before the DOCTYPE declaration, so trim_start() was added to the HTML parsing.
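
As a rough illustration of why the leading blank line matters, the sketch below checks for a DOCTYPE with and without tolerating leading whitespace. `looks_like_html` is a hypothetical helper, and how the project's HTML parsing actually uses the trimmed input is an assumption.

```rust
/// Hypothetical check: does the body start with an HTML doctype once any
/// leading whitespace (such as a blank first line) is skipped?
fn looks_like_html(body: &str) -> bool {
    body.trim_start()
        .as_bytes()
        .get(..9) // length of "<!doctype"
        .map_or(false, |prefix| prefix.eq_ignore_ascii_case(b"<!doctype"))
}
```

Without the `trim_start()` call, a response beginning with `"\n<!DOCTYPE html>"` would fail this prefix check even though the document is valid HTML.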

dessalines (Member) left a comment:

Thanks so much for this!

Linked issue: Card is garbled.