Synchronous deserialize and serialize to a stream with JsonSerializer. #1574
Comments
To what? You haven't actually said what class(es) you'd like to change.

Can you quantify this? A lot of work has been done over the life of async to bring overhead down, and it would be useful to have a nice example of what is missing.
Most of the cases where we found async serialization painful involve message-bus handlers: many of them don't support async, so you end up doing sync-over-async. A second scenario is deserializing a cache hit, say from Redis. The high rate of serialization calls also leads to excessive state-machine allocation and extra work when the calls could simply run synchronously. We did a lot of profiling in https://github.com/exceptionless/Exceptionless with the amazing help of @redknightlois. On the flip side, I totally get that you might be doing sync IO over an IO stream (which could be really bad). The code examples below show that you don't always have a happy path for async serialization, and even if you did, you probably wouldn't want async because these calls need to happen quickly and synchronously (I could be wrong on this, but if I deserialized a message and it was awaited, a message could now arrive out of order).
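To illustrate the first scenario, here is a minimal sketch of a synchronous message-bus handler forced into sync-over-async blocking because only async Stream overloads exist. The `IMessageHandler` contract, `OrderHandler`, and `Order` types are hypothetical, not taken from Exceptionless or Foundatio:

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical bus contract: many message buses only expose a synchronous
// Handle(Stream) callback, so an async-only serializer forces blocking.
public interface IMessageHandler
{
    void Handle(Stream payload);
}

public class OrderHandler : IMessageHandler
{
    public Order? LastOrder { get; private set; }

    public void Handle(Stream payload)
    {
        // Only async Stream overloads exist today, so we must block the
        // calling thread on the ValueTask (sync-over-async):
        LastOrder = JsonSerializer.DeserializeAsync<Order>(payload)
                                  .AsTask().GetAwaiter().GetResult();
        // ... process LastOrder ...
    }
}

public record Order(int Id, decimal Total);
```

With a `MemoryStream` payload the work always completes synchronously, so the async machinery (task allocation, state machine) is pure overhead here.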
public static partial class JsonSerializer
{
// Existing APIs:
public static ValueTask<object> DeserializeAsync(Stream utf8Json,
Type returnType,
JsonSerializerOptions options = null,
CancellationToken cancellationToken = default);
public static ValueTask<TValue> DeserializeAsync<TValue>(Stream utf8Json,
JsonSerializerOptions options = null,
CancellationToken cancellationToken = default);
public static Task SerializeAsync(Stream utf8Json,
object value,
Type inputType,
JsonSerializerOptions options = null,
CancellationToken cancellationToken = default);
public static Task SerializeAsync<TValue>(Stream utf8Json,
TValue value,
JsonSerializerOptions options = null,
CancellationToken cancellationToken = default);
// Proposal to add following synchronous APIs:
public static object Deserialize(Stream utf8Json,
Type returnType,
JsonSerializerOptions options = null);
public static TValue Deserialize<TValue>(Stream utf8Json,
JsonSerializerOptions options = null);
public static void Serialize(Stream utf8Json,
object value,
Type inputType,
JsonSerializerOptions options = null);
public static void Serialize<TValue>(Stream utf8Json,
TValue value,
JsonSerializerOptions options = null);
}

@niemyjski - can you update your original post with the API proposal? As a workaround (for now), you could write JSON to a byte[]/string using the existing sync overloads, and then copy that over to the underlying stream yourself. Edit: Just modified some indentation/spacing so the API surface aligns better.
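The byte[]/string workaround suggested above can be sketched as a pair of helpers. `SyncJson` is a hypothetical name, and this is illustrative rather than anyone's actual code from the thread:

```csharp
using System;
using System.IO;
using System.Text.Json;

public static class SyncJson
{
    // Sync serialize workaround: render to a temporary byte[] first,
    // then copy it to the target stream with a synchronous Write.
    public static void Serialize<T>(Stream stream, T value, JsonSerializerOptions? options = null)
    {
        byte[] utf8Json = JsonSerializer.SerializeToUtf8Bytes(value, options);
        stream.Write(utf8Json, 0, utf8Json.Length);
    }

    // Sync deserialize workaround: drain the stream into memory, then use
    // the existing ReadOnlySpan<byte>-based overload.
    public static T? Deserialize<T>(Stream stream, JsonSerializerOptions? options = null)
    {
        using var buffer = new MemoryStream();
        stream.CopyTo(buffer); // synchronous copy; cheap for in-memory sources
        return JsonSerializer.Deserialize<T>(
            buffer.GetBuffer().AsSpan(0, (int)buffer.Length), options);
    }
}
```

The obvious cost is the intermediate buffer, which is exactly what true Stream-based sync overloads would avoid for large payloads.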
Thanks! I've updated the original post, and I've already started going down the path this morning of writing wrappers for converting to byte[]/string :).
Do you have benchmarks or profile data that showcase the performance impact of using async over sync for calls that always complete synchronously (which they should in your case of using memory streams)? It would be really helpful if you could share that so we can understand the perf issue better. As @scalablecory mentioned, the overhead should be really small and not impact overall perf at all (especially on deserialization, where we return a ValueTask).

Can you please share the perf numbers for this? Does changing these calls to a sync-over-async wrapper show up as a regression? cc @stephentoub
I'll try to dig up the specific benchmark results (not sure if I still have them, as I've gotten a new machine since then and Slack history is limited, but I'll look). When we did this it was September 2017, and I know I logged the numbers someplace. The improvements we made around that time were pretty massive; I think it was roughly a 40% improvement in CPU usage from the serializer changes alone. Combined with other changes at the time (we were benchmarking for months and also converted to Core), it caused our Azure app instance to look like this (I'll try to find the specific benchmark results). I'm not sure if @redknightlois has any more information, but it came down to the async serialization operations being, in effect, wrapped sync operations (the data/payloads were local and in-memory), which added up big time in execution cost and time spent. It was the hottest spot in our app.
@niemyjski - were you able to find benchmarks that showcased performance regressions when doing async serialization compared to sync (preferably ones that are reproducible/isolated)?
From @CodeBlanch in #31669:
@ahsonkhan I was not able to find the benchmarks when I looked. :\ I thought I wrote them down for release notes, but I must have lost them when my Mac motherboard went...
Has this API change been accepted? Noticed it was put in the |
The workaround is to manually convert to/from byte[]. For deserialize:

ReadOnlySpan<byte> utf8Json = DrainStream(stream); // DrainStream manages the buffer allocation(s) and pooling
Poco poco = JsonSerializer.Deserialize<Poco>(utf8Json);

and for serialize:

byte[] utf8Json = JsonSerializer.SerializeToUtf8Bytes(poco);
stream.Write(utf8Json.AsSpan());

Besides avoiding having to implement "DrainStream()", the main benefit of having Stream-based synchronous overloads would be avoiding large intermediate buffers. However, given the original description of small payloads, the workaround is probably sufficient for that scenario. In any case, the API request is valid: it avoids large-buffer issues and having to implement "DrainStream()".
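The thread never shows a body for `DrainStream`. One way it could look, using `ArrayPool<byte>` for the pooling the comment mentions, is sketched below; the tuple-returning shape and the class name are assumptions, and the caller is responsible for returning the rented buffer:

```csharp
using System;
using System.Buffers;
using System.IO;

public static class StreamDraining
{
    // Reads the whole stream into a pooled buffer; returns the rented array
    // and the number of valid bytes. Caller must Return() the array after
    // it is done with the span.
    public static (byte[] Rented, int Length) DrainStream(Stream stream)
    {
        // Size the first rent from Length when the stream is seekable.
        int initialSize = stream.CanSeek
            ? Math.Max(1, checked((int)(stream.Length - stream.Position)))
            : 4096;
        byte[] rented = ArrayPool<byte>.Shared.Rent(initialSize);

        int total = 0;
        int read;
        while ((read = stream.Read(rented, total, rented.Length - total)) > 0)
        {
            total += read;
            if (total == rented.Length)
            {
                // Buffer full: rent a larger one, copy, and return the old one.
                byte[] larger = ArrayPool<byte>.Shared.Rent(rented.Length * 2);
                rented.AsSpan(0, total).CopyTo(larger);
                ArrayPool<byte>.Shared.Return(rented);
                rented = larger;
            }
        }
        return (rented, total);
    }
}
```

Usage would be `var (buf, len) = StreamDraining.DrainStream(stream);` followed by `JsonSerializer.Deserialize<Poco>(buf.AsSpan(0, len))` and then `ArrayPool<byte>.Shared.Return(buf)`.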
Adding synchronous method parity sounds good.

namespace System.Text.Json
{
partial class JsonSerializer
{
// Proposal to add following synchronous APIs:
public static object Deserialize(Stream utf8Json,
Type returnType,
JsonSerializerOptions options = null);
public static TValue Deserialize<TValue>(Stream utf8Json,
JsonSerializerOptions options = null);
public static void Serialize(Stream utf8Json,
object value,
Type inputType,
JsonSerializerOptions options = null);
public static void Serialize<TValue>(Stream utf8Json,
TValue value,
JsonSerializerOptions options = null);
}
}

Also add the overloads for the new
We have a serializer interface (https://github.com/FoundatioFx/Foundatio/blob/master/src/Foundatio/Serializer/ISerializer.cs#L6-L9) that takes a stream as the input, and we have extensions that build on it. In our use cases the payloads are usually very small and fit inside normal buffer lengths. I totally understand async IO and its benefits, but we use in-memory streams for our serialization, and we don't want to do async serialization because it totally kills performance and really complicates some scenarios (especially in message handlers at scale). Can you add overloads for this? It would make this use case quite a bit easier.
I propose that we add overloads to both System.Text.Json.JsonSerializer.Serialize and System.Text.Json.JsonSerializer.Deserialize that take streams. @ahsonkhan provided code (quoted earlier in this thread) which sums up what needs to happen signature-wise.

Edit by @ahsonkhan: Just modified some indentation/spacing so the API surface aligns better.
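For context, a Foundatio-style serializer contract maps onto the proposed overloads roughly like this. This is an illustrative sketch, not the actual Foundatio implementation, and it assumes the proposed sync Stream overloads are available (they match the signatures listed in the proposal):

```csharp
using System.IO;
using System.Text.Json;

// Sketch of an ISerializer-like adapter over the proposed synchronous
// Stream overloads; names are illustrative only.
public class SystemTextJsonSerializer
{
    private readonly JsonSerializerOptions? _options;

    public SystemTextJsonSerializer(JsonSerializerOptions? options = null)
        => _options = options;

    public void Serialize<T>(T value, Stream output) =>
        JsonSerializer.Serialize(output, value, _options);   // proposed sync overload

    public T? Deserialize<T>(Stream input) =>
        JsonSerializer.Deserialize<T>(input, _options);      // proposed sync overload
}
```

With a `MemoryStream` behind the interface, every call completes synchronously and no async state machine is allocated, which is exactly the small-payload, in-memory scenario described above.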