
HTTP content-type isn't using charset #689

Closed


Cap-JaTu

In a typical API schema, content types don't carry charsets. For application/json, for example, the charset is assumed to be UTF-8 per RFC 8259. For text/xml, the charset is relevant. In neither case does the API schema specify multiple contents for different charsets.

Thus, using the HTTP-response charset for API-schema verification doesn't make sense. For those response verifications with assumed charset, there is no need to fail. For those responses which are in varying charsets, verifying data is impossible as there is no decoding the response data into Unicode used by Python internally. Again, the charset doesn't have any relevance and can be ignored.

Ref.: https://datatracker.ietf.org/doc/html/rfc2046#section-4.1.2
Ref.: https://datatracker.ietf.org/doc/html/rfc8259#section-8.1
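
To illustrate the idea of matching a response against the schema while ignoring the charset, here is a minimal sketch. It is not this library's implementation: `split_content_type`, `find_schema_media_type`, and the `schema_content` fragment are hypothetical names made up for the example, and the parsing is deliberately simple (it does not handle every quoting rule for header parameters).

```python
from typing import Dict, Optional, Tuple


def split_content_type(header: str) -> Tuple[str, Dict[str, str]]:
    """Split a Content-Type header into (media type, parameters)."""
    media_type, *raw_params = header.split(";")
    params = {}
    for raw in raw_params:
        name, sep, value = raw.partition("=")
        if sep:
            # Parameter names are case-insensitive; values may be quoted.
            params[name.strip().lower()] = value.strip().strip('"')
    return media_type.strip().lower(), params


def find_schema_media_type(
    header: str, schema_content: Dict[str, dict]
) -> Optional[str]:
    """Return the schema content key matching the header, ignoring parameters."""
    media_type, _params = split_content_type(header)
    for key in schema_content:
        if split_content_type(key)[0] == media_type:
            return key
    return None


# Hypothetical schema fragment: the content map declares no charset.
schema_content = {"application/json": {"schema": {"type": "object"}}}
matched = find_schema_media_type("application/json; charset=utf-8", schema_content)
print(matched)  # application/json
```

With this approach the `charset` parameter is still available for decoding the body, but it plays no part in deciding which schema entry applies.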

@p1c2u
Collaborator

p1c2u commented Oct 12, 2023

Hi @Cap-JaTu

Do you have a specific use case where you need to drop the charset? It looks similar to #378.

Unreleased version (current master) has charset handling for deserializing data (See #678).

For those response verifications with assumed charset, there is no need to fail.

This one was fixed by the change mentioned above. Please feel free to test it.

For those responses which are in varying charsets, verifying data is impossible as there is no decoding the response data into Unicode used by Python internally.

This one is handled by the library's media type deserialization process.

@Cap-JaTu
Author

The easy stuff: obviously, I have no idea about the product's roadmap. There isn't a single word of documentation on charset handling.

As for a use case, I could go to ChatGPT and ask it to restate my PR description in different words. That would be fruitless, rude even. Instead: as a person living in the real world, where HTTP responses do have a charset specifier in them, I'd simply love for it not to be part of the validation pass/fail test.

@p1c2u
Collaborator

p1c2u commented Oct 12, 2023

As I mentioned, the only place where charsets are considered in validation is the media type object. I believe you ran into the charset issue that was solved recently. Please consider testing the unreleased version (or wait for the alpha release) and let me know whether it solves your issue.

@Cap-JaTu
Author

In

return self.deserializer_callable(value, **self.parameters)

Error:
TypeError: __init__() got an unexpected keyword argument 'charset'

Obviously, the JSON library doesn't understand the parameter:
{'charset': 'utf-8'}

I'd like to emphasize one fact: in the real world, HTTP responses have Content-Types with "; charset=" definitions. For reasons unknown to me, this library ignores this.
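
The failure above can be reproduced outside the library with a short sketch, assuming the deserializer callable is `json.loads` (the thread only says "JSON-library", so that is an assumption). The helper `deserialize_json` is hypothetical and shows one possible way to consume the charset for decoding instead of forwarding it; it is not the fix that was eventually merged.

```python
import json


def deserialize_json(body: bytes, parameters: dict) -> object:
    """Decode the body with the declared charset, then parse it as JSON."""
    # Consume "charset" here instead of forwarding it to json.loads.
    charset = parameters.get("charset", "utf-8")
    return json.loads(body.decode(charset))


body = '{"name": "café"}'.encode("utf-8")
parameters = {"charset": "utf-8"}

# Forwarding the media type parameters unchanged reproduces the reported
# failure: json.loads passes unknown keyword arguments to the decoder class.
try:
    json.loads(body, **parameters)
except TypeError as exc:
    print(f"TypeError: {exc}")

# Using the charset only for decoding avoids the error.
print(deserialize_json(body, parameters))
```

The same helper also handles a non-UTF-8 body, for example `deserialize_json('{"name": "café"}'.encode("latin-1"), {"charset": "latin-1"})`, because the bytes are decoded before the JSON parser sees them.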

@p1c2u
Collaborator

p1c2u commented Oct 17, 2023

@Cap-JaTu thanks for the report. I will fix this.

I'd like to emphasize one fact: in the real world, HTTP responses have Content-Types with "; charset=" definitions. For reasons unknown to me, this library ignores this.

I'm in the process of re-implementing deserialization. Do you know of other places where the charset is ignored?

@p1c2u
Collaborator

p1c2u commented Oct 31, 2023

This was fixed with #699, hence closing.

@p1c2u p1c2u closed this Oct 31, 2023