Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

THREESCALE-11063: Fix parsing response if Content-Type contains charset #528

Merged
merged 1 commit into from
Jun 3, 2024

Conversation

mayorova
Copy link
Contributor

Fixing the issue: https://issues.redhat.com/browse/THREESCALE-11063

It was failing previously with:

AbstractAdapter::OIDC::AuthenticationError: Unknown Content-Type

@akostadinov
Copy link
Contributor

I think that if we support charset=UTF-8, we also need to #force_encoding.

@mayorova
Copy link
Contributor Author

I think that if we support charset=UTF-8, we also need to #force_encoding.

I opened https://issues.redhat.com/browse/THREESCALE-11073 for dealing with it, as I think we need to think a bit more about the best way to approach it.

akostadinov
akostadinov previously approved these changes May 28, 2024
@@ -59,6 +59,9 @@ def headers
JSON_TYPE = Mime[:json]
private_constant :JSON_TYPE

DEFAULT_ENCODING = 'utf-8'
private_constant :DEFAULT_ENCODING
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you checked utf-8 is how rails parses the string by default :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think json does that indeed, while I'm not 100% sure that's the path: https://github.com/flori/json/blob/v2.6.1/lib/json/pure/parser.rb#L185-L187

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that strings are converted to UTF8 from whatever they originally were, they are not read as UTF-8 unconditionally. Seems like in the part you link to, they have generated an ASCII-8BIT string that they want now to treat as UTF8 (because it was generated to be valid UTF-8). See

https://github.com/flori/json/blob/v2.6.1/lib/json/pure/parser.rb#L80

So the question is whether body is loaded with the supplied charset= property or is always treated as ASCII 8bit or is always treated as UTF-8.

My understanding is that if we see charset in the content type to be something, we need to treat that string as the specified charset. It would be nice if rack did that automatically but does it do it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more information about the charset.

https://medium.com/@lefloh/when-to-use-a-charset-attribute-for-a-content-type-header-ffeb5838d3c3

json content type is supposed to default to utf-8. In this case I expect things to work with UTF-8. Interesting to know whether Rails automatically uses that information. I'll try to test that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this has nothing to do with Rails. We are using httpclient to make requests to the IDP. And it does take into account the charset parameter. And interprets body according to it.

So excuse me for the trouble 🐛

I think we can just go ahead with your original commit, no need to process the charset by ourselves.

$ cat /tmp/index.http 
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-16
Content-Length: 7
Server: netcat!

asdfgh

$ nc -l 8000 < /tmp/index.http # in a separate terminal

$ bundle exec httpclient
irb(HTTPClient):023:0> resp = get 'http://localhost:8000'
=> 
#<HTTP::Message:0x00007f296c299388
...
irb(HTTPClient):024:0> resp.content_type
=> "text/html; charset=UTF-16"
irb(HTTPClient):025:0> resp.body
=> "\x61\x73\x64\x66\x67\x68\n"
irb(HTTPClient):026:0> resp.body.encoding
=> #<Encoding:UTF-16 (dummy)>

If charset is UTF-8 then httpclient sets UTF-8 for the body. So I believe we are fine.

And while on it - yes, the body encoding for application/json is UTF-8 if no charset is specified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's all great info @akostadinov Thank you for digging into it and providing the verification steps 👏

Copy link
Contributor Author

@mayorova mayorova May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: httpclient implements this feature here nahi/httpclient@e5efea5

ans fixes this issue nahi/httpclient#26

charset_match = content_type.match(/charset=(\S+)/)
raise InvalidResponseError.new response: response, message: "Invalid charset: #{match[0]}" if charset_match && charset_match[1].downcase != DEFAULT_ENCODING

JSON.parse(body)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing JSON.parse(body.force_encoding('utf-8')) here makes test complicated, because it throws "can't modify frozen string".

when JSON_TYPE then JSON.parse(body)
else raise InvalidResponseError.new response: response, message: 'Unknown Content-Type'
end
raise InvalidResponseError.new response: response, message: 'Unknown Content-Type' unless Mime::Type.lookup(content_type).match? JSON_TYPE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think though, we may want to use start_with? instead of match?

@mayorova
Copy link
Contributor Author

@akostadinov I reverted the 2nd commit, does it look good now? :)

Copy link
Contributor

@akostadinov akostadinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this code suggestion works, it would be much preferable. Otherwise we should be good with match?. Thank you!

case Mime::Type.lookup(content_type)
when JSON_TYPE then JSON.parse(body)
else raise InvalidResponseError.new response: response, message: 'Unknown Content-Type'
if Mime::Type.lookup(content_type).match? JSON_TYPE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if Mime::Type.lookup(content_type).match? JSON_TYPE
if Mime::Type.lookup(content_type).start_with? JSON_TYPE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this code suggestion works, it would be much preferable. Otherwise we should be good with match?. Thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this code suggestion works, it would be much preferable. Otherwise we should be good with match?. Thank you!

Well, they are not strings, they are instances of Mime::Type, and this match? is "special": https://github.com/rails/rails/blob/v7.0.8.1/actionpack/lib/action_dispatch/http/mime_type.rb#L287-L291
It takes into account some "synonyms", which I don't know what it is, and we probably don't need it. Potentially, we could either 1. apply to_s.downcase to both and match with start_with? as you suggest, or 2. drop he whole Mime::Type.lookup thing and work with plain strings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then keep it as is. Fine with me, thank you!

@mayorova mayorova force-pushed the fix-content-type-with-charset branch from b39d620 to 38e3794 Compare June 3, 2024 08:29
@mayorova mayorova merged commit 28d1619 into master Jun 3, 2024
9 checks passed
@mayorova mayorova deleted the fix-content-type-with-charset branch June 3, 2024 08:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants