Utf8JsonReader/Writer should expose useful (currently internal) methods. #54016

olivier-spinelli · 2021-06-10T17:47:02Z

Background and Motivation

The goal is to implement a Json read/write default behavior that correctly handles "big integers" in the context of ECMAScript without data loss and using strings for all numbers.
Important: This is at the Utf8JsonReader/Writer level, not at the serializer level (the actual de/serialization code is generated and uses the reader/writer).

JSON format defines no limitation on numeric values. 64 bits integer and BigInteger can be write and read without loss in JSON.

Unfortunately (n°1), ECMAScript only handles 64 bits floats: 64 bits integer (and larger) cannot be write and read back safely. Only integer values between Number.MIN_SAFE_INTEGER and Number.MAX_SAFE_INTEGER (53 bits) are safe on the ECMAScript side.
Among the basic types, the long (Int64), ulong (UInt64) and Decimal (and of course, the new BigInt) representations are not ECMAScript compliant. They must use a string representation, just like JsonNumberHandling.WriteAsString specifies it.
Unfortunately (n°2), this option is NOT available on the Utf8JsonWriter API. It is reserved for the "Serialization" layer (and it seems to be a global option: all the numbers are impacted).
Unfortunately (n°3), the APIs to write (WriteNumberValueAsString(ulong)) or read (GetUInt64WithQuotes) quoted long or unsigned long (same for Decimal) are not publicly exposed on the Utf8Writer/Reader.

These APIs are de facto required if one want to implement a specific treatement for numbers (and may also help people that deal with strange/exotic/ill-formed Json in input as well as for output). Currently, only the (internal) converters can use them.

Proposed API

The proposal is rather simple since it's about publishing existing internals. Only numeric types that are not "ECMAScript compliant" can be concerned, or all the types.

Minimalist approach (publicly expose the currently internal following methods):

On the Utf8JsonWriter, there is nothing to do:

        void WriteNumberValueAsString(ulong value);
        void WriteNumberValueAsString(long value);
        void WriteNumberValueAsString(decimal value);

On the Utf8JsonReader. The Get/TryGet pattern should be exposed:

       bool TryGetInt64WithQuotes( out long value );
       long GetInt64WithQuotes();
       bool TryGetUInt64WithQuotes( out ulong value );
       ulong GetUInt64WithQuotes();
       bool TryGetDecimalWithQuotes( out decimal value );
       decimal GetDecimalWithQuotes();

Note: The current GetXXXWithQuotes implementations are called by converters that guard their calls.

Full types covering

For the sake of completness, the same API for other numbers (that are "ECMAScript compliant") can be exposed: byte, sbyte, short, ushort, etc. (even if it's not strictly required to support ECMAScript).

Alternative Designs

The Get name pattern may be changed from [Try]GetXXXWithQuotes to [Try]GetQuotedXXX or (to enforce the parallel with the writer) to [Try]GetXXXFromString.

Usage

Instead of writing this unefficient code:

writer.WriteStringValue( d.ToString( System.Globalization.NumberFormatInfo.InvariantInfo ) );
...
d = Decimal.Parse( reader.GetString() );

We'll be able to write:

writer.WriteNumberValueAsString( d );
...
d = reader.GetDecimalWithQuotes();

Risks

None (since this doesn't weaken the Validation layer).

The text was updated successfully, but these errors were encountered:

ghost · 2021-06-10T17:47:06Z

Tagging subscribers to this area: @eiriktsarpalis, @layomia
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and Motivation

The goal is to implement a Json read/write default behavior that correctly handles "big integers" in the context of ECMAScript without data loss and using strings for all numbers.
Important: This is at the Utf8JsonReader/Writer level, not at the serializer level (the actual de/serialization code is generated and uses the reader/writer).

JSON format defines no limitation on numeric values. 64 bits integer and BigInteger can be write and read without loss in JSON.

Unfortunately (n°1), ECMAScript only handles 64 bits floats: 64 bits integer (and larger) cannot be write and read back safely. Only integer values between Number.MIN_SAFE_INTEGER and Number.MAX_SAFE_INTEGER (53 bits) are safe on the ECMAScript side.
Among the basic types, the long (Int64), ulong (UInt64) and Decimal (and of course, the new BigInt) representations are not ECMAScript compliant. They must use a string representation, just like JsonNumberHandling.WriteAsString specifies it.
Unfortunately (n°2), this option is NOT available on the Utf8JsonWriter API. It is reserved for the "Serialization" layer (and it seems to be a global option: all the numbers are impacted).
Unfortunately (n°3), the APIs to write (WriteNumberValueAsString(ulong)) or read (GetUInt64WithQuotes) quoted long or unsigned long (same for Decimal) are not publicly exposed on the Utf8Writer/Reader.

These APIs are de facto required if one want to implement a specific treatement for numbers (and may also help people that deal with strange/exotic/ill-formed Json in input as well as for output). Currently, only the (internal) converters can use them.

Proposed API

The proposal is rather simple since it's about publishing existing internals. Only numeric types that are not "ECMAScript compliant" can be concerned, or all the types.

Minimalist approach (publicly expose the currently internal following methods):

On the Utf8JsonWriter, there is nothing to do:

        void WriteNumberValueAsString(ulong value);
        void WriteNumberValueAsString(long value);
        void WriteNumberValueAsString(decimal value);

On the Utf8JsonReader. The Get/TryGet pattern should be exposed:

       bool TryGetInt64WithQuotes( out long value );
       long GetInt64WithQuotes();
       bool TryGetUInt64WithQuotes( out ulong value );
       ulong GetUInt64WithQuotes();
       bool TryGetDecimalWithQuotes( out decimal value );
       decimal GetDecimalWithQuotes();

Note: The current GetXXXWithQuotes implementations are called by converters that guard their calls.

Full types covering.

For the sake of completness, the same API for other numbers (that are "ECMAScript compliant") can be exposed: byte, sbyte, short, ushort, etc.

Alternative Designs

The Get name pattern may be changed from [Try]GetXXXWithQuotes to [Try]GetQuotedXXX or (to enforce the parallel with the writer) to [Try]GetXXXFromString.

Risks

None (since this doesn't weaken the Validation layer).

Author:	olivier-spinelli
Assignees:	-
Labels:	`api-suggestion`, `area-System.Text.Json`, `untriaged`
Milestone:	-

olivier-spinelli · 2021-06-11T04:35:56Z

My personal choice would be the "FromString" naming:

writer.WriteNumberValueAsString( d );
...
d = reader.GetDecimalFromString();

eiriktsarpalis · 2021-10-14T18:21:37Z

If I'm understanding the request correctly, it should still be possible to achieve this using a custom converter that converts the number to a string and passing to either of the existing writer.WriteStringValue/reader.GetString methods. This can result in more allocations but you can still avoid these if you're willing to go the extra mile and do a bit of buffer management on the converter level.

Honestly, I don't believe this meets the bar for inclusion to System.Text.Json. The appropriate layer for expressing these types of concerns is converter layer and I believe the existing APIs should be sufficient to achieve this.

olivier-spinelli · 2021-10-15T09:11:48Z

So you and/or Microsoft and/or the official Json serializer layer would ever be the only one to implement basically optimized (understand: not stupidly unefficient) serialization code?

I may have missed to explain one important thing: I'm working at the writer level, I have NO "serializer" at hand (this beast is "above" the level of the generated code that implement serialization).

May you, please, reconsider this (hard to understand) answer?

eiriktsarpalis · 2021-10-15T10:42:13Z

So you and/or Microsoft and/or the official Json serializer layer would ever be the only one to implement basically optimized (understand: not stupidly unefficient) serialization code?

I'm not sure I follow, would it not be possible to implement a similarly efficient method on your side if necessary? Utf8JsonWriter exposes string writing overloads that permit serializing pre-encoded buffers and as of .NET 6 we there's the new WriteRaw methods. For the deserialization side you can use the ValueSpan and ValueSequence properties, although care needs to be taken so that escaping is taken into account: we are planning to address that kink with #54410 in .NET 7.

I may have missed to explain one important thing: I'm working at the writer level, I have NO "serializer" at hand (this beast is "above" the level of the generated code that implement serialization).

I'm sure this should not be a blocker, since the above could be expressed as extension methods on the writer and reader classes.

olivier-spinelli · 2021-10-15T13:48:00Z

With the new Raw write, you're right. I will happily have to duplicate code... ;)
Thanks for your time.

eiriktsarpalis · 2021-10-15T13:55:16Z

With the new Raw write, you're right. I will happily have to duplicate code... ;)

We can revisit if there is strong evidence that lots of people need this, but in practice there are always minute variations in user requirements. In the interest of keeping the library size small, we prioritize having a small set of APIs that unblock users (even if it does require a bit of boilerplate on their side) rather than exposing functionality for every conceivable use case.

olivier-spinelli added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 10, 2021

dotnet-issue-labeler bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Jun 10, 2021

olivier-spinelli mentioned this issue Jun 11, 2021

Writing raw JSON values when using Utf8JsonWriter #1784

Closed

eiriktsarpalis added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Jul 16, 2021

eiriktsarpalis added this to the 7.0.0 milestone Jul 16, 2021

eiriktsarpalis added the api-needs-work API needs work before it is approved, it is NOT ready for implementation label Jul 16, 2021

eiriktsarpalis closed this as completed Oct 14, 2021

ghost locked as resolved and limited conversation to collaborators Nov 14, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Utf8JsonReader/Writer should expose useful (currently internal) methods. #54016

Utf8JsonReader/Writer should expose useful (currently internal) methods. #54016

olivier-spinelli commented Jun 10, 2021 •

edited

Loading

ghost commented Jun 10, 2021

Background and Motivation

Proposed API

Minimalist approach (publicly expose the currently internal following methods):

Full types covering.

Alternative Designs

Risks

olivier-spinelli commented Jun 11, 2021

eiriktsarpalis commented Oct 14, 2021

olivier-spinelli commented Oct 15, 2021 •

edited

Loading

eiriktsarpalis commented Oct 15, 2021

olivier-spinelli commented Oct 15, 2021

eiriktsarpalis commented Oct 15, 2021

Utf8JsonReader/Writer should expose useful (currently internal) methods. #54016

Utf8JsonReader/Writer should expose useful (currently internal) methods. #54016

Comments

olivier-spinelli commented Jun 10, 2021 • edited Loading

Background and Motivation

Proposed API

Minimalist approach (publicly expose the currently internal following methods):

Full types covering

Alternative Designs

Usage

Risks

ghost commented Jun 10, 2021

Background and Motivation

Proposed API

Minimalist approach (publicly expose the currently internal following methods):

Full types covering.

Alternative Designs

Risks

olivier-spinelli commented Jun 11, 2021

eiriktsarpalis commented Oct 14, 2021

olivier-spinelli commented Oct 15, 2021 • edited Loading

eiriktsarpalis commented Oct 15, 2021

olivier-spinelli commented Oct 15, 2021

eiriktsarpalis commented Oct 15, 2021

olivier-spinelli commented Jun 10, 2021 •

edited

Loading

olivier-spinelli commented Oct 15, 2021 •

edited

Loading