-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
THREESCALE-11063: Fix parsing response if Content-Type contains charset #528
Conversation
I think that if we support |
I opened https://issues.redhat.com/browse/THREESCALE-11073 for dealing with it, as I think we need to think a bit more about the best way to approach it. |
app/adapters/abstract_adapter.rb
Outdated
@@ -59,6 +59,9 @@ def headers | |||
JSON_TYPE = Mime[:json] | |||
private_constant :JSON_TYPE | |||
|
|||
DEFAULT_ENCODING = 'utf-8' | |||
private_constant :DEFAULT_ENCODING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you checked utf-8
is how rails parses the string by default :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think json
does that indeed, while I'm not 100% sure that's the path: https://github.com/flori/json/blob/v2.6.1/lib/json/pure/parser.rb#L185-L187
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that strings are converted to UTF8 from whatever they originally were, they are not read as UTF-8 unconditionally. Seems like in the part you link to, they have generated an ASCII-8BIT string that they want now to treat as UTF8 (because it was generated to be valid UTF-8). See
https://github.com/flori/json/blob/v2.6.1/lib/json/pure/parser.rb#L80
So the question is whether body
is loaded with the supplied charset=
property or is always treated as ASCII 8bit or is always treated as UTF-8.
My understanding is that if we see charset in the content type to be something, we need to treat that string as the specified charset. It would be nice if rack did that automatically but does it do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more information about the charset.
https://medium.com/@lefloh/when-to-use-a-charset-attribute-for-a-content-type-header-ffeb5838d3c3
json content type is supposed to default to utf-8. In this case I expect things to work with UTF-8. Interesting to know whether Rails automatically uses that information. I'll try to test that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this has nothing to do with Rails. We are using httpclient to make requests to the IDP. And it does take into account the charset
parameter. And interprets body
according to it.
So excuse me for the trouble 🐛
I think we can just go ahead with your original commit, no need to process the charset by ourselves.
$ cat /tmp/index.http
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-16
Content-Length: 7
Server: netcat!
asdfgh
$ nc -l 8000 < /tmp/index.http # in a separate terminal
$ bundle exec httpclient
irb(HTTPClient):023:0> resp = get 'http://localhost:8000'
=>
#<HTTP::Message:0x00007f296c299388
...
irb(HTTPClient):024:0> resp.content_type
=> "text/html; charset=UTF-16"
irb(HTTPClient):025:0> resp.body
=> "\x61\x73\x64\x66\x67\x68\n"
irb(HTTPClient):026:0> resp.body.encoding
=> #<Encoding:UTF-16 (dummy)>
If charset
is UTF-8
then httpclient sets UTF-8
for the body. So I believe we are fine.
And while on it - yes, the body encoding for application/json
is UTF-8
if no charset
is specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's all great info @akostadinov Thank you for digging into it and providing the verification steps 👏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note: httpclient
implements this feature here nahi/httpclient@e5efea5
ans fixes this issue nahi/httpclient#26
app/adapters/abstract_adapter.rb
Outdated
charset_match = content_type.match(/charset=(\S+)/) | ||
raise InvalidResponseError.new response: response, message: "Invalid charset: #{match[0]}" if charset_match && charset_match[1].downcase != DEFAULT_ENCODING | ||
|
||
JSON.parse(body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing JSON.parse(body.force_encoding('utf-8'))
here makes test complicated, because it throws "can't modify frozen string".
app/adapters/abstract_adapter.rb
Outdated
when JSON_TYPE then JSON.parse(body) | ||
else raise InvalidResponseError.new response: response, message: 'Unknown Content-Type' | ||
end | ||
raise InvalidResponseError.new response: response, message: 'Unknown Content-Type' unless Mime::Type.lookup(content_type).match? JSON_TYPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think though, we may want to use start_with?
instead of match?
6491a71
to
b39d620
Compare
@akostadinov I reverted the 2nd commit, does it look good now? :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this code suggestion works, it would be much preferable. Otherwise we should be good with match?. Thank you!
case Mime::Type.lookup(content_type) | ||
when JSON_TYPE then JSON.parse(body) | ||
else raise InvalidResponseError.new response: response, message: 'Unknown Content-Type' | ||
if Mime::Type.lookup(content_type).match? JSON_TYPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if Mime::Type.lookup(content_type).match? JSON_TYPE | |
if Mime::Type.lookup(content_type).start_with? JSON_TYPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this code suggestion works, it would be much preferable. Otherwise we should be good with match?
. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this code suggestion works, it would be much preferable. Otherwise we should be good with
match?
. Thank you!
Well, they are not strings, they are instances of Mime::Type
, and this match?
is "special": https://github.com/rails/rails/blob/v7.0.8.1/actionpack/lib/action_dispatch/http/mime_type.rb#L287-L291
It takes into account some "synonyms", which I don't know what it is, and we probably don't need it. Potentially, we could either 1. apply to_s.downcase
to both and match with start_with?
as you suggest, or 2. drop he whole Mime::Type.lookup
thing and work with plain strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then keep it as is. Fine with me, thank you!
b39d620
to
38e3794
Compare
Fixing the issue: https://issues.redhat.com/browse/THREESCALE-11063
It was failing previously with: