Consider adding a JavaScriptEncoder implementation that doesn't encode the block list or surrogate pairs. #42847
Comments
There is no built-in mechanism for allowing this. You'd have to subclass the @layomia - We could consider allowing the unsafe relaxed escaper to allow known-good supplementary characters. This isn't trivial work, though. |
Thank you for the reply. This is blocking us from migrating off of Newtonsoft. |
Could you describe why this is a blocker? Those values are correctly deserialized; it just means that if you look at the raw JSON payload you'll see them escaped. |
@terrajobst sure. Our mobile apps display the escaped text as is instead of the emojis. |
That sounds like a problem with your decoding app not correctly handling valid JSON. The JSON specification says: 'To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".' Technically only the quotation mark and reverse solidus require escaping (it is the only way to represent them), so maybe they could add switches to control whether other characters do or do not get encoded (tabs, line feeds, control characters, values outside ASCII, values outside BMP). E.g. the UTF-8 byte sequence will be shorter than the 12-byte escaped surrogate pair, so it would save a few bytes. But the decoding app should still handle all valid encodings. You are asking for a workaround to compensate for a bug in your mobile app. |
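To make the point above concrete, here is a minimal sketch (the class and method names are made up for illustration) showing that System.Text.Json decodes the escaped surrogate pair and the literal character to the same string, and that the literal form is the smaller of the two on the wire. It only uses `JsonSerializer.Deserialize<T>`, `char.ConvertToUtf32`, and `Encoding.UTF8.GetByteCount`.

```csharp
using System;
using System.Text;
using System.Text.Json;

public static class SurrogateEscapeDemo
{
    // JSON text with the G clef (U+1D11E) written as an escaped UTF-16 surrogate pair.
    public const string EscapedJson = "\"\\uD834\\uDD1E\"";

    // JSON text with the same character written literally.
    public const string LiteralJson = "\"\U0001D11E\"";

    // True when both JSON forms decode to the same one-code-point string.
    public static bool DecodesToSameString()
    {
        string fromEscaped = JsonSerializer.Deserialize<string>(EscapedJson);
        string fromLiteral = JsonSerializer.Deserialize<string>(LiteralJson);
        return fromEscaped == fromLiteral
            && char.ConvertToUtf32(fromEscaped, 0) == 0x1D11E;
    }

    // UTF-8 size of the string content, without the surrounding quotes.
    public static int ContentBytes(string json) =>
        Encoding.UTF8.GetByteCount(json) - 2;

    public static void Main()
    {
        Console.WriteLine(DecodesToSameString());      // True
        Console.WriteLine(ContentBytes(EscapedJson));  // 12
        Console.WriteLine(ContentBytes(LiteralJson));  // 4
    }
}
```

Any consumer that follows the JSON specification should behave the same way for both inputs; the only difference is payload size.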
I disagree with this take.
I'm honestly surprised more people aren't having this issue considering how prevalent emojis are in user generated content. |
Because as mentioned previously, this appears to be a bug in the component that is deserializing the JSON payload. All JSON deserializers in wide use that we're aware of follow the RFC and properly turn the \uXXXX\uYYYY pair back into the correct emoji. Would subclassing the escaper (see my comment from a few months ago) work for your scenario? This puts you in full control over how the payload is generated, which should give you sufficient berth to work around other bugs in the deserialization component you use. I suspect if that component isn't properly decoding \uXXXX substrings, it probably has other bugs that you may need to work around. |
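For anyone who wants to try the subclassing route, below is a rough sketch of what such an escaper might look like. It is an assumption-laden illustration, not the maintainers' recommended implementation: `SupplementaryPassThroughEncoder` is a hypothetical name, the choice to delegate everything else to `JavaScriptEncoder.Create(UnicodeRanges.All)` is mine (it mirrors the settings in the original report), and the project must be compiled with unsafe blocks enabled because the abstract `TextEncoder` members take `char*`. The idea is to let well-formed surrogate pairs through untouched and hand every other decision to the built-in encoder.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Hypothetical type for illustration; not part of System.Text.Encodings.Web.
public sealed class SupplementaryPassThroughEncoder : JavaScriptEncoder
{
    // Assumption: reuse the built-in encoder for everything in the BMP,
    // so quotes, backslashes, and control characters are still escaped.
    private static readonly JavaScriptEncoder Inner =
        JavaScriptEncoder.Create(UnicodeRanges.All);

    public override int MaxOutputCharactersPerInputCharacter =>
        Inner.MaxOutputCharactersPerInputCharacter;

    // Supplementary-plane scalars (above U+FFFF) are never escaped here.
    public override bool WillEncode(int unicodeScalar) =>
        unicodeScalar <= 0xFFFF && Inner.WillEncode(unicodeScalar);

    public override unsafe int FindFirstCharacterToEncode(char* text, int textLength)
    {
        for (int i = 0; i < textLength; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < textLength
                && char.IsLowSurrogate(text[i + 1]))
            {
                i++; // skip the low half of a well-formed pair
                continue;
            }
            if (Inner.WillEncode(c))
            {
                return i; // includes lone surrogates, which stay escaped
            }
        }
        return -1; // nothing in the input needs escaping
    }

    public override unsafe bool TryEncodeUnicodeScalar(
        int unicodeScalar, char* buffer, int bufferLength,
        out int numberOfCharactersWritten) =>
        Inner.TryEncodeUnicodeScalar(
            unicodeScalar, buffer, bufferLength, out numberOfCharactersWritten);
}

public static class EncoderDemo
{
    public static string SerializeEmoji()
    {
        var options = new JsonSerializerOptions
        {
            Encoder = new SupplementaryPassThroughEncoder()
        };
        return JsonSerializer.Serialize("📲", options);
    }

    public static void Main() => Console.WriteLine(SerializeEmoji());
}
```

Treat this as a starting point only. Emitting supplementary characters unescaped bypasses the built-in block list, so it is worth checking that every consumer of the payload (and any HTML or script context it may be embedded in) handles raw UTF-8 safely before relying on it.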
I will look into a workaround. Thank you all for the responses. Will close this issue. |
The decision that was made here conflicts with major browsers' behavior, including Microsoft Edge.
Demonstration of JSON serialization and deserialization that contains an emoji. |
Can you provide an example of a System.Text.Json-serialized payload that the browser API |
Hi there! On the other hand, browsers by default serialize and deserialize the emojis as is, as seen in the screenshots. |
I switched to Newtonsoft Json after spending several hours on this.
|
Reopening for future consideration. |
Tagging subscribers to this area: @dotnet/area-system-text-encodings-web
Issue Details
I would like to know if there is a way or any intention for the System.Text.Json serializer to support emojis defined with surrogate pairs? When we serialize "📲" we get its representation as escaped unicode values. I would like it to remain unescaped.
static void Main()
{
var encoderSettings = new TextEncoderSettings();
encoderSettings.AllowRange(UnicodeRanges.All);
var serializerOptions = new JsonSerializerOptions
{
Encoder = JavaScriptEncoder.Create(encoderSettings),
WriteIndented = true
};
var json = JsonSerializer.Serialize("📲", serializerOptions);
Console.WriteLine(json);
// "\uD83D\uDCF2"
}
Thank you in advance.
|
Also fixes dotnet#86800. Also fixes dotnet#87138 (except docs outside this repo).
Closing in favor of #87153. |
I would like to know if there is a way or any intention for the System.Text.Json serializer to support emojis defined with surrogate pairs?
When we serialize "📲" we get its representation as escaped unicode values. I would like it to remain unescaped.
Thank you in advance.