
Response bodies aren’t encoded when charset in header doesn’t have the right case #542

Closed
Mr0grog opened this issue May 27, 2017 · 3 comments · Fixed by #543

Comments

@Mr0grog
Contributor

Mr0grog commented May 27, 2017

I’ve been having some character encoding issues with responses that seemed ok. For example, a response with the header:

Content-Type: text/plain; charset=utf-8

would come back with its body tagged as ASCII-8BIT (Ruby's binary encoding) when I received it:

> response.body.encoding
=> #<Encoding:ASCII-8BIT>

This appears to be because Request#encode_with_ruby_encoding does an exact, case-sensitive name match against Ruby's list of encodings to decide whether the body can be re-tagged:

def encode_with_ruby_encoding(body, charset)
  if !body.nil? && Encoding.name_list.include?(charset)
    body.force_encoding(charset)
    ...

…but Ruby's canonical name for that encoding is UTF-8, not the lowercase utf-8 that the HTTP header carried in my example above.
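You can see the mismatch in irb (a quick sketch; output trimmed, and the exact name list varies by Ruby version):

Encoding.name_list.include?("UTF-8")   # => true
Encoding.name_list.include?("utf-8")   # => false, so the check above bails out
Encoding.find("utf-8")                 # => #<Encoding:UTF-8> (find matches case-insensitively)
Encoding.find("bogus-charset")         # raises ArgumentError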

It looks like this broke in c46663d, where the check switched from Encoding.find(charset) to Encoding.name_list.include?(charset), presumably to avoid the rescue block. We could do a case-insensitive search on name_list, but that could still produce false positives or miss aliases. (I didn’t even know encoding aliases were a thing until today!) So I think the right fix here is to switch back to Encoding.find with a rescue, but only rescue ArgumentError.
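Something like this is what I have in mind (just a sketch of the idea, not necessarily the exact code I'd put in a PR):

def encode_with_ruby_encoding(body, charset)
  # Encoding.find matches names case-insensitively (and resolves aliases),
  # and raises ArgumentError for charsets Ruby doesn't know about.
  encoding = Encoding.find(charset)
  body.nil? ? body : body.force_encoding(encoding)
rescue ArgumentError
  body
end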

I think this might be the cause behind #541 as well; the timing of the change seems right.

I’m happy to submit a PR if desired; just let me know!

Side note: I thought my conclusion here was crazy at first because there is a test for exactly this scenario. It turns out the test doesn’t really work: the stubbed response is already a UTF-8 string, so when httparty fails to apply the encoding from the HTTP header, the test passes anyway because the body was already properly encoded to begin with.
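A fixed version of that test would need to start from a binary-tagged body, so it can only pass if HTTParty actually applies the charset from the header. Roughly this shape (using WebMock-style stubbing for illustration, not the project's actual spec helpers):

require "httparty"
require "webmock/rspec"

it "re-tags the body using the charset from the Content-Type header" do
  # Start from ASCII-8BIT so the assertion fails unless the charset gets applied.
  body = "already readable text".dup.force_encoding(Encoding::ASCII_8BIT)
  stub_request(:get, "http://example.com/")
    .to_return(body: body, headers: { "Content-Type" => "text/plain; charset=utf-8" })

  response = HTTParty.get("http://example.com/")
  expect(response.body.encoding).to eq(Encoding::UTF_8)
end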

Mr0grog added a commit to edgi-govdata-archiving/web-monitoring-db that referenced this issue May 31, 2017
It turns out HTTParty has a pretty bad bug with response encodings: jnunemaker/httparty#542
This is hopefully just a temporary patch.
Mr0grog added a commit to edgi-govdata-archiving/web-monitoring-db that referenced this issue Jun 1, 2017
It turns out HTTParty has a pretty bad bug with response encodings: jnunemaker/httparty#542
This is hopefully just a temporary patch.
@jaypatrickhoward

FWIW we're pinned to 0.14.x until this fix makes its way into a release.
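(Concretely, that's just a pessimistic version constraint in the Gemfile:)

gem "httparty", "~> 0.14.0"  # stay on 0.14.x until the fix ships in a release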

@jaypatrickhoward

Happy to test and verify the fix also addresses #541 once that happens.

Mr0grog added a commit to Mr0grog/httparty that referenced this issue Jun 2, 2017
As long as the charset is a known one according to Ruby's matching algorithm (which appears to be case insensitive and also includes aliases), we should use that encoding.

Fixes jnunemaker#542 (and maybe jnunemaker#541 as well)
@Mr0grog
Contributor Author

Mr0grog commented Jun 2, 2017

@jaypatrickhoward I just posted a PR if you want to take a look: #543
