
Response bodies aren’t encoded when charset in header doesn’t have the right case #542

Closed
Mr0grog opened this issue May 27, 2017 · 3 comments · Fixed by #543

Comments

@Mr0grog
Contributor

Mr0grog commented May 27, 2017

I’ve been having some character encoding issues with responses that seemed ok. For example, a response with the header:

Content-Type: text/plain; charset=utf-8

would come back with its body tagged as ASCII-8BIT (Ruby's binary encoding) when I received it:

> response.body.encoding
=> #<Encoding:ASCII-8BIT>

This appears to be because Request#encode_with_ruby_encoding does an exact, case-sensitive name match against Ruby's list of encodings to decide whether the body can be re-tagged:

def encode_with_ruby_encoding(body, charset)
  if !body.nil? && Encoding.name_list.include?(charset)
    body.force_encoding(charset)
    ...

…but Ruby's canonical name for that encoding is UTF-8, not the lowercase utf-8 that the HTTP header carried in my example above.
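You can see the mismatch in irb (a quick sketch; output trimmed, and the exact name list varies by Ruby version):

Encoding.name_list.include?("UTF-8")   # => true
Encoding.name_list.include?("utf-8")   # => false, so the check above bails out
Encoding.find("utf-8")                 # => #<Encoding:UTF-8> (find matches case-insensitively)
Encoding.find("bogus-charset")         # raises ArgumentError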

It looks like this broke in c46663d, where the check switched from Encoding.find(charset) to Encoding.name_list.include?(charset), presumably to avoid the rescue block. We could do a case-insensitive search on name_list, but that could still produce false positives or miss aliases. (I didn’t even know encoding aliases were a thing until today!) So I think the right fix here is to switch back to Encoding.find with a rescue, but only rescue ArgumentError.
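Something like this is what I have in mind (just a sketch of the idea, not necessarily the exact code I'd put in a PR):

def encode_with_ruby_encoding(body, charset)
  # Encoding.find matches names case-insensitively (and resolves aliases),
  # and raises ArgumentError for charsets Ruby doesn't know about.
  encoding = Encoding.find(charset)
  body.nil? ? body : body.force_encoding(encoding)
rescue ArgumentError
  body
end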

I think this might be the cause behind #541 as well; the timing of the change seems right.

I’m happy to submit a PR if desired; just let me know!

Side note: I thought my conclusion here was crazy at first because there is a test for exactly this scenario. It turns out the test doesn’t really work: the stubbed response is already a UTF-8 string, so when httparty fails to apply the encoding from the HTTP header, the test passes anyway because the body was already properly encoded to begin with.
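A fixed version of that test would need to start from a binary-tagged body, so it can only pass if HTTParty actually applies the charset from the header. Roughly this shape (using WebMock-style stubbing for illustration, not the project's actual spec helpers):

require "httparty"
require "webmock/rspec"

it "re-tags the body using the charset from the Content-Type header" do
  # Start from ASCII-8BIT so the assertion fails unless the charset gets applied.
  body = "already readable text".dup.force_encoding(Encoding::ASCII_8BIT)
  stub_request(:get, "http://example.com/")
    .to_return(body: body, headers: { "Content-Type" => "text/plain; charset=utf-8" })

  response = HTTParty.get("http://example.com/")
  expect(response.body.encoding).to eq(Encoding::UTF_8)
end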

Mr0grog added a commit to edgi-govdata-archiving/web-monitoring-db that referenced this issue May 31, 2017
It turns out HTTParty has a pretty bad bug with response encodings: jnunemaker/httparty#542
This is hopefully just a temporary patch.
Mr0grog added a commit to edgi-govdata-archiving/web-monitoring-db that referenced this issue Jun 1, 2017
It turns out HTTParty has a pretty bad bug with response encodings: jnunemaker/httparty#542
This is hopefully just a temporary patch.
@jaypatrickhoward

FWIW we're pinned to 0.14.x until this fix makes its way into a release.
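(Concretely, that's just a pessimistic version constraint in the Gemfile:)

gem "httparty", "~> 0.14.0"  # stay on 0.14.x until the fix ships in a release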

@jaypatrickhoward

Happy to test and verify the fix also addresses #541 once that happens.

Mr0grog added a commit to Mr0grog/httparty that referenced this issue Jun 2, 2017
As long as the charset is a known one according to Ruby's matching algorithm (which appears to be case insensitive and also includes aliases), we should use that encoding.

Fixes jnunemaker#542 (and maybe jnunemaker#541 as well)
@Mr0grog
Contributor Author

Mr0grog commented Jun 2, 2017

@jaypatrickhoward I just posted a PR if you want to take a look: #543
