Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synchronous deserialize and serialize to a stream with JsonSerializer. #1574

Closed
niemyjski opened this issue Sep 24, 2019 · 14 comments · Fixed by #54632
Closed

Synchronous deserialize and serialize to a stream with JsonSerializer. #1574

niemyjski opened this issue Sep 24, 2019 · 14 comments · Fixed by #54632
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Milestone

Comments

@niemyjski
Copy link

We have a serializer interface (https://github.com/FoundatioFx/Foundatio/blob/master/src/Foundatio/Serializer/ISerializer.cs#L6-L9) which takes a stream as the input and we have extensions that work from it. In our use cases the payloads are usually very small and fit inside normal buffer lengths. Now I totally understand async io and the benefits but we use in memory streams for our serialization and we don't want to do async serialization as it totally kills performance and really complicates some scenarios (esp in messaging handlers at scale). Can you add overloads for this? It would make it a little bit easier for this use case?

I propose that we add overloads to both System.Text.Json.JsonSerializer.Serialize and System.Text.Json.JsonSerializer.Deserialize that take streams. As @ahsonkhan provided the following code which sums up what needs to happen signature wise.

public static partial class JsonSerializer
{
    // Existing APIs:
    public static ValueTask<object> DeserializeAsync(Stream utf8Json, 
        Type returnType, 
        JsonSerializerOptions options = null, 
        CancellationToken cancellationToken = default);
    
    public static ValueTask<TValue> DeserializeAsync<TValue>(Stream utf8Json,
        JsonSerializerOptions options = null,
        CancellationToken cancellationToken = default);

    public static Task SerializeAsync(Stream utf8Json, 
        object value, 
        Type inputType, 
        JsonSerializerOptions options = null, 
        CancellationToken cancellationToken = default);
    
    public static Task SerializeAsync<TValue>(Stream utf8Json, 
        TValue value, 
        JsonSerializerOptions options = null, 
        CancellationToken cancellationToken = default);

    // Proposal to add following synchronous APIs:
    public static object Deserialize(Stream utf8Json, 
        Type returnType, 
        JsonSerializerOptions options = null);
    
    public static TValue Deserialize<TValue>(Stream utf8Json,
        JsonSerializerOptions options = null);

    public static void Serialize(Stream utf8Json, 
        object value, 
        Type inputType, 
        JsonSerializerOptions options = null);
    
    public static void Serialize<TValue>(Stream utf8Json, 
        TValue value, 
        JsonSerializerOptions options = null);
}

Edit by @ahsonkhan: Just modified some indentation/spacing so the API surface aligns better.

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 24, 2019

Can you add overloads for this?

To what? You haven't actually said what class(es) you'd like to change.

@niemyjski
Copy link
Author

niemyjski commented Sep 24, 2019

System.Text.Json.JsonSerializer.Serialize(stream, data, options) and System.Text.Json.JsonSerializer.Deserialize(stream, options)

@niemyjski niemyjski changed the title Synchronous deserialize and serialize to a stream. Synchronous deserialize and serialize to a stream with JsonSerializer. Sep 24, 2019
@scalablecory
Copy link
Contributor

we don't want to do async serialization as it totally kills performance

Can you quantify this? A lot of work has been done over the life of async to bring overhead down, and it would be useful to have a nice example of what is missing.

@niemyjski
Copy link
Author

niemyjski commented Sep 25, 2019

Most of our use cases where we found async serialization bad is a lot of message bus message handlers don't support async and now you are doing async over sync. A second scenario would be deserializing a cache hit say from Redis. Also the high rate of serialization calls leads to excessive state machine generation and more work when it could just be run sync. We did a lot of profiling in (https://github.com/exceptionless/Exceptionless) with the amazing help of @redknightlois. On the flip side I totally get you might be doing sync io over an io stream (which could be really bad).

The following three code examples shows off that you don't always have a happy path for async serialization and even if you did you probably didn't want to do async because these calls need to happen quickly and sync ((I could be wrong on this) If I deserialized a message, but it was awaited and now a message could arrive out of order).

  1. https://github.com/FoundatioFx/Foundatio.AzureServiceBus/blob/master/src/Foundatio.AzureServiceBus/Messaging/AzureServiceBusMessageBus.cs#L67
  2. https://github.com/FoundatioFx/Foundatio.RabbitMQ/blob/master/src/Foundatio.RabbitMQ/Messaging/RabbitMQMessageBus.cs#L84
  3. https://github.com/FoundatioFx/Foundatio.Redis/blob/master/src/Foundatio.Redis/Messaging/RedisMessageBus.cs#L44

@ahsonkhan
Copy link
Member

ahsonkhan commented Sep 25, 2019

public static partial class JsonSerializer
{
    // Existing APIs:
    public static ValueTask<object> DeserializeAsync(Stream utf8Json, 
        Type returnType, 
        JsonSerializerOptions options = null, 
        CancellationToken cancellationToken = default);
    
    public static ValueTask<TValue> DeserializeAsync<TValue>(Stream utf8Json,
        JsonSerializerOptions options = null,
        CancellationToken cancellationToken = default);

    public static Task SerializeAsync(Stream utf8Json, 
        object value, 
        Type inputType, 
        JsonSerializerOptions options = null, 
        CancellationToken cancellationToken = default);
    
    public static Task SerializeAsync<TValue>(Stream utf8Json, 
        TValue value, 
        JsonSerializerOptions options = null, 
        CancellationToken cancellationToken = default);

    // Proposal to add following synchronous APIs:
    public static object Deserialize(Stream utf8Json, 
        Type returnType, 
        JsonSerializerOptions options = null);
    
    public static TValue Deserialize<TValue>(Stream utf8Json,
        JsonSerializerOptions options = null);

    public static void Serialize(Stream utf8Json, 
        object value, 
        Type inputType, 
        JsonSerializerOptions options = null);
    
    public static void Serialize<TValue>(Stream utf8Json, 
        TValue value, 
        JsonSerializerOptions options = null);
}

@niemyjski - can you update your original post with the API proposal?

As a workaround (for now), you could write JSON to a byte[]/string using the existing sync overloads, and then copy that over to the underlying stream yourself.

Edit: Just modified some indentation/spacing so the API surface aligns better.

@niemyjski
Copy link
Author

Thanks! I've updated the original post and I've already started going down the path this morning to write wrappers for converting to byte[]/string :).

@ahsonkhan
Copy link
Member

ahsonkhan commented Sep 26, 2019

In our use cases the payloads are usually very small and fit inside normal buffer lengths. Now I totally understand async io and the benefits but we use in memory streams for our serialization and we don't want to do async serialization as it totally kills performance and really complicates some scenarios (esp in messaging handlers at scale).

Do you have benchmarks or profile data that show-cases the performance impact of using async over sync for calls that always complete synchronously (which they should in your case of using memory streams)? It would be really helpful if you could share that so we can understand the perf issue better. As @scalablecory mentioned, the overhead should be really small and not impact overall perf at all (especially on deserialization where we return a valuetask).

A second scenario would be deserializing a cache hit say from Redis. Also the high rate of serialization calls leads to excessive state machine generation and more work when it could just be run sync. We did a lot of profiling in (https://github.com/exceptionless/Exceptionless) with the amazing help of @redknightlois

even if you did you probably didn't want to do async because these calls need to happen quickly and sync

Can you please share the perf numbers for this? Does changing these calls to a sync over async wrapper show up as a regression?

cc @stephentoub

@niemyjski
Copy link
Author

niemyjski commented Sep 27, 2019

I'll try to dig up the specific benchmark results (not sure If I have them anymore as I've gotten a new machine since then and slack history is limited, but I'll look) when we did it was in Sep 2017 and I know I logged them someplace.. I know it was pretty massive the improvements we did around that time. I think It was like ~40% improvement in cpu usage from just serializer changes. Combined with other changes at the time (we were benchmarking for months and also converted to core) that we did caused our azure app instance to look like this (I'll try to find the specific benchmark results).
cpu usage
memory usage

I'm not sure if @redknightlois has any more information, but it came down to where the async operations with serialization were basically wrapped sync operations (due to being local in memory data / payloads) which added up big time in execution time cost + time spent. It was the hottest spot in our app.

@ericstj ericstj transferred this issue from dotnet/corefx Jan 9, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Jan 9, 2020
@ahsonkhan ahsonkhan added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jan 25, 2020
@ahsonkhan ahsonkhan added this to the 5.0 milestone Jan 25, 2020
@ahsonkhan ahsonkhan removed the untriaged New issue has not been triaged by the area owner label Jan 25, 2020
@ahsonkhan
Copy link
Member

@niemyjski - were you able to find benchmarks that showcased performance regressions when doing async serialization compared to sync only (preferable ones that are reproducible/isolated)?

@ahsonkhan
Copy link
Member

From @CodeBlanch in #31669:

Proposed API

The current System.Text.Json API supports these two methods:

Task SerializeAsync<TValue>(Stream utf8Json, TValue value, JsonSerializerOptions? options = null, CancellationToken cancellationToken = default);
Task SerializeAsync(Stream utf8Json, object? value, Type inputType, JsonSerializerOptions? options = null, CancellationToken cancellationToken = default);

I'm proposing we add:

void Serialize<TValue>(Stream utf8Json, TValue value, JsonSerializerOptions? options = null);
void Serialize(Stream utf8Json, object? value, Type inputType, JsonSerializerOptions? options = null);

Which are basically overloads for synchronized writing to a Stream.

Rationale & Usage

I have a logging library that maintains its own dedicated thread for pumping messages to a file via FileStream. The current API does not provide an efficient way to handle this.

Using the existing async API:

		private void DrainMessagesThreadBody()
		{
			Stream LogFile = _LogFileManager.FindCurrentLogFile();

			while (true)
			{
				if (!_Messages.TryDequeue(out Message Message))
					break;

				SerializeMessageToJson(LogFile, Message).GetAwaiter().GetResult();
			}
		}

		private async Task SerializeMessageToJson(Stream stream, Message message)
		{
			await JsonSerializer.SerializeAsync(stream, message, _JsonOptions).ConfigureAwait(false);

			await stream.WriteAsync(s_NewLine, 0, s_NewLine.Length).ConfigureAwait(false);

			await stream.FlushAsync().ConfigureAwait(false);
		}

This is the "Sync over Async" anti-pattern. There is a dedicated thread but it has to use the pool to do its work.

Using the existing sync API:

		private void DrainMessagesThreadBody()
		{
			Stream LogFile = _LogFileManager.FindCurrentLogFile();

			while (true)
			{
				if (!_Messages.TryDequeue(out Message Message))
					break;

				SerializeMessageToJson(LogFile, Message);
			}
		}

		private void SerializeMessageToJson(Stream stream, Message message)
		{
			BufferWriter BufferWriter = new BufferWriter();
			using (Utf8JsonWriter Writer = new Utf8JsonWriter(BufferWriter))
			{
				JsonSerializer.Serialize(Writer, message, _JsonOptions);
			}
			BufferWriterr.WriteToStream(stream);
			stream.Write(s_NewLine, 0, s_NewLine.Length);
			stream.Flush();
		}

Works fine, but the problem is BufferWriter doesn't exist. I had to roll that based on the System.Text.Json PooledByteBufferWriter, which is internal. That is a lot of heavy lifting in order to make the usage efficent via buffer reuse. What I really wanted to do is reuse the pooling buffer code already in the lib (used by SerializeAsync calls).

Here's how it would look using the proposed API:

		private void DrainMessagesThreadBody()
		{
			Stream LogFile = _LogFileManager.FindCurrentLogFile();

			while (true)
			{
				if (!_Messages.TryDequeue(out Message Message))
					break;

				SerializeMessageToJson(LogFile, Message);
			}
		}

		private void SerializeMessageToJson(Stream stream, Message message)
		{
			JsonSerializer.Serialize(stream, message, _JsonOptions);
			stream.Write(s_NewLine, 0, s_NewLine.Length);
			stream.Flush();
		}

With the new API caller doesn't need to worry about correct usage (buffer pooling), it just happens under the hood as it does with SerializeAsync calls.

@niemyjski
Copy link
Author

niemyjski commented Feb 21, 2020

@ahsonkhan I was not able to find the benchmarks when I looked :\ . I thought I wrote them down for release notes but I must have lost them when my mac motherboard went...

@Mpdreamz
Copy link
Contributor

Has this API change been accepted? Noticed it was put in the 5.0 milestone, looking to confirm. Performance reasons aside it would be great to have these overloads ship out of the box for method signature parity reasons.

@eiriktsarpalis eiriktsarpalis modified the milestones: Future, 6.0.0 Feb 25, 2021
@steveharter steveharter self-assigned this Mar 24, 2021
@steveharter steveharter added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Mar 24, 2021
@steveharter
Copy link
Member

The workaround is to manually convert to\from byte[]. For deserialize:

ReadOnlySpan<byte> utf8Json = DrainStream(stream); // DrainStream manages the buffer allocation(s) and pooling
Poco poco = JsonSerializer.Deserialize<Poco>(utf8Json);    

and for serialize:

byte[] utf8Json = JsonSerializer.SerializeToUtf8Bytes(poco);
stream.Write(uf8Json.AsSpan());

Besides avoiding having to implement "DrainStream()", the main benefit of having JsonSerializer.Deserialize(stream) do this is that the serializer, as it also does for the DeserializeAsync(), doesn't drain the Stream upfront and thus minimizes the buffer size - it would do this by calling stream.Read() in manageable chunks (~16K each time by default) while deserializing a graph. The same also applies to JsonSerializer.Serialize(stream, value) where stream.Write() is called multiple times to minimize the buffer size.

However, given the original description of small payloads, the workaround is probably sufficient for that scenario. In any case, the API request is valid and avoids large buffer issues and having to implement "DrainStream()".

@bartonjs
Copy link
Member

bartonjs commented May 18, 2021

Video

Adding synchronous method parity sounds good.

namespace System.Text.Json
{
    partial class JsonSerializer
    {
        // Proposal to add following synchronous APIs:
        public static object Deserialize(Stream utf8Json, 
            Type returnType, 
            JsonSerializerOptions options = null);
        
        public static TValue Deserialize<TValue>(Stream utf8Json,
            JsonSerializerOptions options = null);
    
        public static void Serialize(Stream utf8Json, 
            object value, 
            Type inputType, 
            JsonSerializerOptions options = null);
        
        public static void Serialize<TValue>(Stream utf8Json, 
            TValue value, 
            JsonSerializerOptions options = null);
    }
}

Also add the overloads for the new JsonTypeInfo<T> versions from the source generator.

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels May 18, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 23, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 24, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 24, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants