Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utf8JsonReader/Writer should expose useful (currently internal) methods. #54016

Closed
olivier-spinelli opened this issue Jun 10, 2021 · 7 comments
Closed
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.Json needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Milestone

Comments

@olivier-spinelli
Copy link

olivier-spinelli commented Jun 10, 2021

Background and Motivation

The goal is to implement a Json read/write default behavior that correctly handles "big integers" in the context of ECMAScript without data loss and using strings for all numbers.
Important: This is at the Utf8JsonReader/Writer level, not at the serializer level (the actual de/serialization code is generated and uses the reader/writer).

JSON format defines no limitation on numeric values. 64 bits integer and BigInteger can be write and read without loss in JSON.

  • Unfortunately (n°1), ECMAScript only handles 64 bits floats: 64 bits integer (and larger) cannot be write and read back safely. Only integer values between Number.MIN_SAFE_INTEGER and Number.MAX_SAFE_INTEGER (53 bits) are safe on the ECMAScript side.
    Among the basic types, the long (Int64), ulong (UInt64) and Decimal (and of course, the new BigInt) representations are not ECMAScript compliant. They must use a string representation, just like JsonNumberHandling.WriteAsString specifies it.
  • Unfortunately (n°2), this option is NOT available on the Utf8JsonWriter API. It is reserved for the "Serialization" layer (and it seems to be a global option: all the numbers are impacted).
  • Unfortunately (n°3), the APIs to write (WriteNumberValueAsString(ulong)) or read (GetUInt64WithQuotes) quoted long or unsigned long (same for Decimal) are not publicly exposed on the Utf8Writer/Reader.

These APIs are de facto required if one want to implement a specific treatement for numbers (and may also help people that deal with strange/exotic/ill-formed Json in input as well as for output). Currently, only the (internal) converters can use them.

Proposed API

The proposal is rather simple since it's about publishing existing internals. Only numeric types that are not "ECMAScript compliant" can be concerned, or all the types.

Minimalist approach (publicly expose the currently internal following methods):

On the Utf8JsonWriter, there is nothing to do:

        void WriteNumberValueAsString(ulong value);
        void WriteNumberValueAsString(long value);
        void WriteNumberValueAsString(decimal value);

On the Utf8JsonReader. The Get/TryGet pattern should be exposed:

       bool TryGetInt64WithQuotes( out long value );
       long GetInt64WithQuotes();
       bool TryGetUInt64WithQuotes( out ulong value );
       ulong GetUInt64WithQuotes();
       bool TryGetDecimalWithQuotes( out decimal value );
       decimal GetDecimalWithQuotes();

Note: The current GetXXXWithQuotes implementations are called by converters that guard their calls.

Full types covering

For the sake of completness, the same API for other numbers (that are "ECMAScript compliant") can be exposed: byte, sbyte, short, ushort, etc. (even if it's not strictly required to support ECMAScript).

Alternative Designs

The Get name pattern may be changed from [Try]GetXXXWithQuotes to [Try]GetQuotedXXX or (to enforce the parallel with the writer) to [Try]GetXXXFromString.

Usage

Instead of writing this unefficient code:

writer.WriteStringValue( d.ToString( System.Globalization.NumberFormatInfo.InvariantInfo ) );
...
d = Decimal.Parse( reader.GetString() ); 

We'll be able to write:

writer.WriteNumberValueAsString( d );
...
d = reader.GetDecimalWithQuotes(); 

Risks

None (since this doesn't weaken the Validation layer).

@olivier-spinelli olivier-spinelli added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 10, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Jun 10, 2021
@ghost
Copy link

ghost commented Jun 10, 2021

Tagging subscribers to this area: @eiriktsarpalis, @layomia
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and Motivation

The goal is to implement a Json read/write default behavior that correctly handles "big integers" in the context of ECMAScript without data loss and using strings for all numbers.
Important: This is at the Utf8JsonReader/Writer level, not at the serializer level (the actual de/serialization code is generated and uses the reader/writer).

JSON format defines no limitation on numeric values. 64 bits integer and BigInteger can be write and read without loss in JSON.

  • Unfortunately (n°1), ECMAScript only handles 64 bits floats: 64 bits integer (and larger) cannot be write and read back safely. Only integer values between Number.MIN_SAFE_INTEGER and Number.MAX_SAFE_INTEGER (53 bits) are safe on the ECMAScript side.
    Among the basic types, the long (Int64), ulong (UInt64) and Decimal (and of course, the new BigInt) representations are not ECMAScript compliant. They must use a string representation, just like JsonNumberHandling.WriteAsString specifies it.
  • Unfortunately (n°2), this option is NOT available on the Utf8JsonWriter API. It is reserved for the "Serialization" layer (and it seems to be a global option: all the numbers are impacted).
  • Unfortunately (n°3), the APIs to write (WriteNumberValueAsString(ulong)) or read (GetUInt64WithQuotes) quoted long or unsigned long (same for Decimal) are not publicly exposed on the Utf8Writer/Reader.

These APIs are de facto required if one want to implement a specific treatement for numbers (and may also help people that deal with strange/exotic/ill-formed Json in input as well as for output). Currently, only the (internal) converters can use them.

Proposed API

The proposal is rather simple since it's about publishing existing internals. Only numeric types that are not "ECMAScript compliant" can be concerned, or all the types.

Minimalist approach (publicly expose the currently internal following methods):

On the Utf8JsonWriter, there is nothing to do:

        void WriteNumberValueAsString(ulong value);
        void WriteNumberValueAsString(long value);
        void WriteNumberValueAsString(decimal value);

On the Utf8JsonReader. The Get/TryGet pattern should be exposed:

       bool TryGetInt64WithQuotes( out long value );
       long GetInt64WithQuotes();
       bool TryGetUInt64WithQuotes( out ulong value );
       ulong GetUInt64WithQuotes();
       bool TryGetDecimalWithQuotes( out decimal value );
       decimal GetDecimalWithQuotes();

Note: The current GetXXXWithQuotes implementations are called by converters that guard their calls.

Full types covering.

For the sake of completness, the same API for other numbers (that are "ECMAScript compliant") can be exposed: byte, sbyte, short, ushort, etc.

Alternative Designs

The Get name pattern may be changed from [Try]GetXXXWithQuotes to [Try]GetQuotedXXX or (to enforce the parallel with the writer) to [Try]GetXXXFromString.

Risks

None (since this doesn't weaken the Validation layer).

Author: olivier-spinelli
Assignees: -
Labels:

api-suggestion, area-System.Text.Json, untriaged

Milestone: -

@olivier-spinelli
Copy link
Author

My personal choice would be the "FromString" naming:

writer.WriteNumberValueAsString( d );
...
d = reader.GetDecimalFromString(); 

@eiriktsarpalis eiriktsarpalis added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Jul 16, 2021
@eiriktsarpalis eiriktsarpalis added this to the 7.0.0 milestone Jul 16, 2021
@eiriktsarpalis eiriktsarpalis added the api-needs-work API needs work before it is approved, it is NOT ready for implementation label Jul 16, 2021
@eiriktsarpalis
Copy link
Member

If I'm understanding the request correctly, it should still be possible to achieve this using a custom converter that converts the number to a string and passing to either of the existing writer.WriteStringValue/reader.GetString methods. This can result in more allocations but you can still avoid these if you're willing to go the extra mile and do a bit of buffer management on the converter level.

Honestly, I don't believe this meets the bar for inclusion to System.Text.Json. The appropriate layer for expressing these types of concerns is converter layer and I believe the existing APIs should be sufficient to achieve this.

@olivier-spinelli
Copy link
Author

olivier-spinelli commented Oct 15, 2021

So you and/or Microsoft and/or the official Json serializer layer would ever be the only one to implement basically optimized (understand: not stupidly unefficient) serialization code?

I may have missed to explain one important thing: I'm working at the writer level, I have NO "serializer" at hand (this beast is "above" the level of the generated code that implement serialization).

May you, please, reconsider this (hard to understand) answer?

@eiriktsarpalis
Copy link
Member

So you and/or Microsoft and/or the official Json serializer layer would ever be the only one to implement basically optimized (understand: not stupidly unefficient) serialization code?

I'm not sure I follow, would it not be possible to implement a similarly efficient method on your side if necessary? Utf8JsonWriter exposes string writing overloads that permit serializing pre-encoded buffers and as of .NET 6 we there's the new WriteRaw methods. For the deserialization side you can use the ValueSpan and ValueSequence properties, although care needs to be taken so that escaping is taken into account: we are planning to address that kink with #54410 in .NET 7.

I may have missed to explain one important thing: I'm working at the writer level, I have NO "serializer" at hand (this beast is "above" the level of the generated code that implement serialization).

I'm sure this should not be a blocker, since the above could be expressed as extension methods on the writer and reader classes.

@olivier-spinelli
Copy link
Author

With the new Raw write, you're right. I will happily have to duplicate code... ;)
Thanks for your time.

@eiriktsarpalis
Copy link
Member

With the new Raw write, you're right. I will happily have to duplicate code... ;)

We can revisit if there is strong evidence that lots of people need this, but in practice there are always minute variations in user requirements. In the interest of keeping the library size small, we prioritize having a small set of APIs that unblock users (even if it does require a bit of boilerplate on their side) rather than exposing functionality for every conceivable use case.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.Json needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Projects
None yet
Development

No branches or pull requests

2 participants