Support non-allocating view on string properties values of JSON in System.Text.Json.Utf8JsonReader #54410
Comments
Tagging subscribers to this area: @eiriktsarpalis, @layomia
Issue Details
Background and Motivation
Currently there is no way to get view on string property value of JSON without allocating string, except cases when string property is in fact number,
Non-internally non-allocating view on string properties can be used for creating custom
Proposed API
namespace System.Text.Json
{
public ref partial struct Utf8JsonReader
{
+ public void GetChars(Span<char> buffer, out int written){}
+ public void GetChars(char[] buffer, out int written){} // Overload for convenience
+ public bool TryGetChars(Span<char> buffer, out int written){}
+ public bool TryGetChars(char[] buffer, out int written){} // Overload for convenience
// Optional
+ public char[] GetChars(){} // will allocate buffer inside itself
+ public bool TryGetChars(out char[] result){} // will allocate buffer inside itself
}
}
Usage Examples
// The consumer of the API somehow needs to figure out the buffer length;
// maybe it can be deduced from the Utf8JsonReader.ValueSpan / Utf8JsonReader.ValueSequence lengths?
Span<char> buffer = stackalloc char[length];
// OR
char[] buffer = ArrayPool<char>.Shared.Rent(length);
reader.GetChars(buffer/*.AsSpan()*/, out int written);
// Case with parsing
var classInstance = ClassThatSupportsParsingFromSpanOfChars.Parse(buffer.Slice(0, written));
// Optionally return buffer to ArrayPool<char>
return classInstance;
// OR
// Case with `StringPool` from Microsoft.Toolkit.HighPerformance package
// (https://github.com/windows-toolkit/WindowsCommunityToolkit/blob/main/Microsoft.Toolkit.HighPerformance)
string pooledNonAllocatedString = StringPool.Shared.GetOrAdd(buffer.Slice(0, written));
// Optionally return buffer to ArrayPool<char>
return pooledNonAllocatedString;
Alternative Designs
Can't think of any.
Risks
Name
Notes
What should happen when the provided buffer is not of sufficient length? Should an exception be thrown, or should the buffer be filled to capacity and the method then return?
|
This is already possible using the
Utf8JsonReader reader = ...;
char[] rentedBuffer;
int charLength;
if (reader.HasValueSequence)
{
ReadOnlySequence<byte> bytes = reader.ValueSequence;
int maxCharLength = Encoding.UTF8.GetMaxCharCount(checked((int)bytes.Length));
rentedBuffer = ArrayPool<char>.Shared.Rent(maxCharLength);
charLength = Encoding.UTF8.GetChars(bytes, rentedBuffer);
}
else
{
ReadOnlySpan<byte> bytes = reader.ValueSpan;
int maxCharLength = Encoding.UTF8.GetMaxCharCount(bytes.Length);
rentedBuffer = ArrayPool<char>.Shared.Rent(maxCharLength);
charLength = Encoding.UTF8.GetChars(bytes, rentedBuffer);
}
Span<char> copiedChars = rentedBuffer.AsSpan(0, charLength);
try
{
// consume the chars
}
finally
{
copiedChars.Clear();
ArrayPool<char>.Shared.Return(rentedBuffer);
}
I'm not sure how the proposed methods might simplify the above. As you have already mentioned, allocating the right char buffer size would still require accessing the raw bytes and decoding the character length. One might also argue that |
|
As @huoyaoyuan noticed, current implementation of
In case of
I don't know much |
I've looked into |
Right, it does not.
I suppose it addresses the theoretical concern that the source |
I guess in such cases |
We should also include an overload that accepts Span<byte> for unescaped Utf8 encoded strings. See #1563 (comment) |
Should I update the proposal? Also, I thought of another internal use for the proposed API: it can replace usage of |
The concern specifically raised in #23433 is set to be addressed in C# 11 via scoped parameter values. @jaredpar has indicated that it should be possible to add the |
I'm proposing the following API for this change:
namespace System.Text.Json
{
public ref partial struct Utf8JsonReader
{
public bool ValueIsEscaped { get; } // API already approved via https://github.com/dotnet/runtime/issues/45167
public void GetString(scoped Span<byte> utf8Destination, out int bytesWritten);
public void GetString(scoped Span<char> destination, out int charsWritten);
}
public partial class JsonEncodedText
{
public static void Unescape(ReadOnlySpan<byte> utf8Value, Span<byte> utf8Destination, out int bytesWritten);
}
}
Additional notes:
cc @davidfowl |
Shouldn't Perhaps both
enum JsonUnescapingResult
{
Success = 0,
TooSmallDestinationSize = 1
}
or is the intended behavior to throw an exception in such cases, like runtime/src/libraries/System.Private.CoreLib/src/System/Text/Encoding.cs Lines 819 to 830 in 6ca8c9b
|
Ah excellent, I had forgotten about that one. I'm updating its milestone given that the two features are related.
My understanding is that this functionality specifically concerns unescaping JSON strings, and different rules might govern unescaping in different contexts.
That's the idea. Currently we only check that the destination buffer is at least as large as the source buffer and throw if it isn't. I think it's good enough for most intents and purposes. |
@N0D4N I've updated your original post and I'm marking the issue as ready for review. |
namespace System.Text.Json;
public ref partial struct Utf8JsonReader
{
// Existing APIs
// public ReadOnlySpan<byte> ValueSpan { get; }
// public ReadOnlySequence<byte> ValueSequence { get; }
//
// public bool ValueIsEscaped { get; } // Whether the JSON string contains escaped characters
// public bool HasValueSequence { get; } // The string can either be stored in a span or a ReadOnlySequence
//
// public string? GetString(); // How we currently decode JSON strings
public readonly int CopyString(Span<byte> utf8Destination);
public readonly int CopyString(Span<char> destination);
} |
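For readers following the thread, here is a minimal sketch of how the CopyString(Span<byte>) overload above might be used together with the existing ValueSpan, ValueIsEscaped and HasValueSequence members. The class and method names (CopyStringUtf8Sketch, FirstStringEquals) are hypothetical and exist only for illustration. The sketch assumes that the unescaped value is never longer than the raw token, so the raw length is used as the buffer size.

```csharp
using System;
using System.Buffers;
using System.Text.Json;

public static class CopyStringUtf8Sketch
{
    // Hypothetical helper: compares the first JSON string token with the
    // expected UTF-8 bytes without allocating a System.String.
    public static bool FirstStringEquals(ReadOnlySpan<byte> json, ReadOnlySpan<byte> expectedUtf8)
    {
        var reader = new Utf8JsonReader(json);
        while (reader.Read())
        {
            if (reader.TokenType != JsonTokenType.String)
            {
                continue;
            }

            // Fast path: a contiguous, unescaped token can be read directly from ValueSpan.
            if (!reader.HasValueSequence && !reader.ValueIsEscaped)
            {
                return reader.ValueSpan.SequenceEqual(expectedUtf8);
            }

            // Assumption: the raw token length is an upper bound for the unescaped length.
            int maxLength = reader.HasValueSequence
                ? checked((int)reader.ValueSequence.Length)
                : reader.ValueSpan.Length;

            byte[] rented = ArrayPool<byte>.Shared.Rent(maxLength);
            try
            {
                // CopyString unescapes into the destination and returns the bytes written.
                int written = reader.CopyString(rented);
                return rented.AsSpan(0, written).SequenceEqual(expectedUtf8);
            }
            finally
            {
                ArrayPool<byte>.Shared.Return(rented);
            }
        }

        return false;
    }
}
```

The fast path avoids copying entirely. The rented buffer is only needed when the token is escaped or split across segments.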
void M(Utf8JsonReader reader) {
Span<byte> span = stackalloc byte[42];
reader.CopyString(span); // Error
}
If the definition is |
@tannergooding tested this and concluded the compiler accepts this when |
Ah that's right cause the |
Is the policy then to essentially use In my mind it's better to always use |
This was a consideration around S.T.Json needing to target .NET Standard and the possibility (even if niche) around what to do if
I'd agree it'd be better to explicitly declare it as scoped still to help enforce that it's non-capturing, but we can do/consider that once the feature exists and is usable. |
This won't be an issue. The compiler synthesizes the metadata supporting
Understood. It's a sensible concern.
👍 |
If the compiler synthesizes the attribute then I don't think there are any (real) .NET Standard concerns (assuming While Said differently, I'd say whenever we see an API accepting a |
💯 |
Background and Motivation
Currently there is no way to get a view of a JSON string property value without allocating a string, except when the string property is in fact a number, DateTime, or anything else that System.Buffers.Text.Utf8Parser supports.
But many converters, even inside System.Text.Json, need the string representation of a string property only to parse it and never use it again. Examples of such converters are:
VersionConverter, which uses the allocated string from the GetString() method only to pass it to a TryParse method that accepts ReadOnlySpan<char> in one of its overloads;
CharConverter, which allocates a string only to get its first char;
EnumConverter, which uses the allocated string from the GetString() method only to pass it to a TryParse method that has a ReadOnlySpan<char> overload as of Add overloads for Enum.Parse/TryParse with ReadOnlySpan<char> #43255.
A public (non-internal), non-allocating view of string properties can be used for creating a custom StringConverter that uses, for example, a custom StringPool operating on a small set of strings that is not known at compile time.
My proposal is to add methods to Utf8JsonReader that accept a buffer of chars into which the value of the string property will be written.
Proposed API
Usage Examples
Get an allocation-free view of the unescaped UTF8 string
Handling of ValueSpan representations only:
Copying to char buffers
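The original snippet for this heading did not survive in this copy of the thread. As a stand-in, here is a hedged sketch of copying a JSON string into a char buffer with the CopyString(Span<char>) overload discussed in the comments and then parsing it without creating a string. The names CopyStringCharSketch, ReadVersion and StackallocThreshold are hypothetical. The sketch assumes that the UTF-8 byte length of the raw token is a safe upper bound for the number of UTF-16 chars written.

```csharp
using System;
using System.Buffers;
using System.Text.Json;

public static class CopyStringCharSketch
{
    private const int StackallocThreshold = 128; // illustrative value, not taken from the thread

    // Hypothetical helper: parses a JSON string token as a Version
    // without allocating an intermediate System.String.
    public static Version ReadVersion(ReadOnlySpan<byte> json)
    {
        var reader = new Utf8JsonReader(json);
        if (!reader.Read() || reader.TokenType != JsonTokenType.String)
        {
            return null;
        }

        // Assumption: the decoded char count never exceeds the raw UTF-8 byte count.
        int maxLength = reader.HasValueSequence
            ? checked((int)reader.ValueSequence.Length)
            : reader.ValueSpan.Length;

        char[] rented = null;
        Span<char> buffer = maxLength <= StackallocThreshold
            ? stackalloc char[StackallocThreshold]
            : (rented = ArrayPool<char>.Shared.Rent(maxLength));
        try
        {
            // CopyString unescapes and transcodes, returning the number of chars written.
            int written = reader.CopyString(buffer);
            return Version.TryParse(buffer.Slice(0, written), out Version version) ? version : null;
        }
        finally
        {
            if (rented != null)
            {
                ArrayPool<char>.Shared.Return(rented);
            }
        }
    }
}
```

Short values stay on the stack. Longer ones fall back to ArrayPool<char>, which mirrors the renting pattern shown earlier in the comments.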
Alternative Designs
Can't think of any.
Risks
The name GetChars can be confusing for some users; maybe there is a better-fitting name for such a method?
Notes
What should happen when the provided buffer is not of sufficient length? Should an exception be thrown, or should the buffer be filled to capacity and the method then return?