-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Writing raw JSON values when using Utf8JsonWriter #1784
Comments
Yep. In the initial design discussions of
We could consider revisiting that decision, if there is evidence of a significant performance benefit in doing so. Do you have some benchmarks for your scenario that would potentially show a win or is there a general concern for too much unnecessary work being done?
One problem with trying to expose the destination is that Is your use case specific to uses within
Generally speaking, how large are these nested JSON payloads that are stored in their original form, on average (looking for order of magnitude such as 1 KB, 1 MB, 1 GB)?
What about validation though? Do they care about making sure the JSON is valid though before just writing the string directly calling
No, you aren't missing anything. What you found here is the only way you could do this within the Not knowing the specifics of your scenario, I explored the scenario a bit with some arbitrary JSON data and these are the results: BenchmarkDotNet=v0.12.0, OS=Windows 10.0.19041
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-alpha1-015914
[Host] : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT
Job-DHBJYG : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT
PowerPlanMode=00000000-0000-0000-0000-000000000000
|
Thanks for taking a stub at this and doing some benchmarking! At this point it is a general concern for too much unnecessary work being done - this is part of an ASP.NET Core web app hosted in Azure App Service and I'd like to host on as little nodes as possible using smallest SKU possible, so that we scale horizontally, rather than vertically when needed. This looked like a low-hanging fruit to optimize, if all necessary APIs would exist, of course. Payloads are not big in size (limited by Azure Table Storage entity size anyway), ranging from 1 to 32 KB. But it is an OLTP workload (think online gaming), so we get thousands requests per second and this is not a heavy load yet, of course. Since serialization/deserialization happens within ASP.NET infrastructure, the easiest would be to hook through Validation-wise - it is there, but we only need to validate properties at the root level, which we do. Think the following json: {
"username": "foo",
"accessToken": "bar",
"payload": { "complex": "object" }
} As long as I thought about using Based on your benchmarks I'll probably switch to |
|
What is the workaround? |
@mikemcdougall Going back to good ol' Newtonsoft.Json and its JsonWriter.WriteRawValue. |
Work-around as described above:
string rawJson = ...; // Raw value can also be `byte`s
using (JsonDocument document = JsonDocument.Parse(rawJson))
{
document.RootElement.WriteTo(writer);
} |
We're in a similar situation, we heavily use PG json(b) columns for user defined metadata, our backend reads intentially know nothing about their contents, we'd ideally not pay the price of parsing and serializing something we only pass through. Api proposal seems solid, good to see it's not based on String. For large payloads -- LOH allocs are easier to avoid by deserializing, which we're trying to prevent here -- a We're looking at introducing |
A bit late to the party here, but this issue is what stops me from converting to system.text.json from newtonsoft.json. My issue is related to the GeoJSON format, which is a subset of JSON, used to describe geospatial features. Support for this format and mapping it to C# classes is provided by the NetTopologySuite package. The reading and writing of GeoJson to corresponding C# classes is complex, and not something I want to re-implement. Neither is it something I wish to be included in the standard library. The problem is that a GeoJSON geometry is a complex structure. In order to handle this with Newtonsoft I've done the following: https://gist.github.com/atlefren/4ec30d46af1dde48c4a461abc5b0b3c5 However, I cannot find a way to handle this in system.text.json. As far as I'm concerned, validation and such is no issue here, as the NTS reader/writer is responsible for that part. EDIT: I See that NTS has made this possible on their side, as documented here: https://github.com/NetTopologySuite/NetTopologySuite.IO.GeoJSON/wiki/Using-NTS.IO.GeoJSON4STJ-with-ASP.NET-Core-MVC#basic-usage |
@atlefren did you take a look at the workaround in #1784 (comment)? Does it help? |
Sorry for the noob question but @StephenBonikowsky moving this issue to .NET 6 Committed does it mean that it is planned for .Net 6? (The milestone on the card is "Future", so I doubt it.) (If it is, it's cool! I'd very like to see |
@olivier-spinelli thank you for pointing out that discrepancy. |
Any chance to "open" a little bit the Utf8JsonReader/Writer API? |
@olivier-spinelli per #30194 (comment) please feel free to open a new issue addressing your read requirements. |
Thanks! |
@layomia Apologies if I've misread this on my phone but I'm not sure this would fully support my scenario above? Could it be adapted that way? In that case I want to write complete JSON, not just a property value. Should I perhaps open a specific issue to discuss? |
@stevejgordon I'll circle back with more detail, but yes you'd be able to write complete JSON like top-level JSON objects, objects within JSON arrays, etc., just like the existing |
Update: I made the following changes to the proposal in the description above #1784 (comment)
|
A small remark on the API proposal about the nullable default parameter: public void WriteRawValue(string json, JsonWriteRawOptions? options = null); The May be the following could be more explicit? public void WriteRawValue( string json, JsonWriteRawOptions options = JsonWriteRawOptions.Default ); |
@olivier-spinelli I tried to explain this in the description:
Re:
With the proposed API, passing a Now, say the API we ship now is: public sealed partial class Utf8JsonWriter
{
public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default) { }
} And we want to add API like this in this future: public struct JsonWriterOptions
{
// Existing
// public JavaScriptEncoder? Encoder { readonly get; set; }
// public bool Indented { get; set; }
// public bool SkipValidation { get; set; }
// Specify write-raw options to use for all WriteRawValue calls on the writer instance
public JsonWriteRawOptions WriteRawOptions { get; set; }
} What would the writer behavior in this scenario be? JsonWriterOptions options = new() { WriteRawOptions = JsonWriterOptions.SkipValidation };
using Utf8JsonWriter writer = new(ms, options);
// Option a: Writer uses `JsonWriterOptions.SkipValidation` as specified globally
// Option b: Writer uses `JsonWriterOptions.Default` as specified in method call
writer.WriteRawValue(blob); // Same thing as writer.WriteRawValue(blob, JsonWriterOptions.Default) If we don't think we'd ever want a "global" |
Ouch! I missed your remark, sorry! If the "global" exists, wouldn't be clearer to use overloads rather than default (of default) values. Something like: public sealed partial class Utf8JsonWriter
{
/// <summary>
/// Writes the value directly according to the specified options.
/// </summary>
public void WriteRawValue( ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options ) { }
/// <summary>
/// Writes the value directly using this writer's <see cref="JsonWriterOptions.WriteRawOptions" />.
/// </summary>
public void WriteRawValue( ReadOnlySpan<byte> utf8Json ) { }
} But if you don't include it now, these overloads will be really strange :). IMHO whether the global option exists should be settled right now... Now, my very personal thought about this is that this "global" should not be here, here's why: |
@olivier-spinelli yeah I agree that "global" configuration is not needed for this low-level API. Your API sketch does seem like a better approach if we wanted one. I'll revert to simply the following and not propose a global setting now or in the future: public sealed partial class Utf8JsonWriter
{
public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default) { }
} |
Mentioned in internal chat: if the scenario is that you want to write a [potentially untrusted] payload as an object enveloped by some outer wrapper, make sure you're also checking that incoming data is well-formed UTF-8 and doesn't have malformed |
namespace System.Text.Json
{
public sealed partial class Utf8JsonWriter
{
public void WriteRawValue(ReadOnlySpan<byte> utf8Json, bool skipInputValidation = false);
public void WriteRawValue(ReadOnlySpan<char> json, bool skipInputValidation = false);
public void WriteRawValue(string json, bool skipInputValidation = false);
}
} |
Calling namespace System.Text.Json
{
public sealed partial class Utf8JsonWriter
{
public void RawWriteValue(ReadOnlySpan<byte> utf8Json, bool skipInputValidation = false);
public void RawWriteValue(ReadOnlySpan<char> json, bool skipInputValidation = false);
public void RawWriteValue(string json, bool skipInputValidation = false);
public void RawWrite(JsonEncodedText propertyName, string json, bool skipInputValidation = false);
public void RawWrite(string propertyName, ReadOnlySpan<byte> utf8Json, bool skipInputValidation = false);
}
} Using |
Edited by @layomia
Original proposal by @nahk-ivanov (click to view)
According to the image in the roadmap -
Utf8JsonWriter
would be the lowest level of API we are going to get, but it's missingWriteRawValue
functionality from JSON.NET. Seems like it is "by design", as roadmap statesIn this case, it would be nice to have access to the output stream directly in the custom
JsonConverter
, as there are still scenarios where for performance we want to write raw JSON. It's quite often that application layer doesn't care about particular branches of the JSON document and just wants to pass them through to/from the data store (especially when using NoSQL). What we used to do with JSON.NET in such cases is maintain it as string property and define custom converter that would useWriteRawValue
, which took astring
argument.One of the examples would be using Azure Table Storage, where we want to extract few top-level properties to be PartitionKey/RowKey and store other nested properties in their original JSON form. I don't see any reason why application would unnecessarily deserialize and serialize these parts of the tree - all we need is to just run forward-only reader to return the value of the given property in it's raw form and later write it as such.
I looked at the current
System.Text.Json
APIs and can't find a good way of doing this. It seems like (for performance reasons) we would want to maintain raw data in a binary form now (compared to string previously), which is fine, but there is no way to read/write next value as bytes. The best I could find is to useJsonDocument.ParseValue
+document.RootElement.GetRawText()
when reading it and parse it again into aJsonDocument
when writing to just callWriteTo
.Am I missing some APIs?
Introduction
There are scenarios where customers want to integrate existing, "raw" JSON payloads when writing new JSON payloads with
Utf8JsonWriter
. There's no first-class API for providing this functionality today, but there's a workaround usingJsonDocument
:This implementation builds a metadata database for navigating a JSON document, which is unnecessary given the desired functionality and has a performance overhead.
The goal of this feature is to provide performant and safe APIs to write raw JSON values.
API Proposal
Alternate Design (click to view)
Potential additions
Scenarios
The major considerations for this feature are:
The answers depend on the scenario. Is the raw JSON payload trusted or arbitrary? Is the enveloping of the raw payload done on behalf of users? This proposal seeks to make all of these post-processing options fully configurable by users, allowing them to tweak the behavior in a way that satisfies their performance, correctness, and security requirements.
In this proposal, raw JSON values will not be reformatted to align with whatever whitespace and indentation settings are specified on
JsonWriterOptions
, but left as-is. We can provide a future API to reformat raw JSON values. We'll also not encode values by default but can provide API to do it in the future. If users need all three processing options today, the workaround shown above works for that scenario.Let's go over two of the expected common scenarios, and what API would be configured to enable them.
I have a blob which I think represents JSON content and which I want to envelope, and I need to make sure the envelope & its inner contents remain well-formed
Consider the Azure SDK team which provides a service protocol where the payload content is JSON. Part of the JSON is internal protocol information, and a nested part of the JSON is user provided data. When serializing protocol information, Azure cares about the raw user JSON being structurally valid, and ultimately that the overall JSON payload is valid. They don't wish to alter the formatting of the raw JSON, or preemptively encode it. Given these considerations, they might write their serialization logic as follows:
The resulting JSON payload might look like this:
Notice that no
JsonWriteRawOptions
had to be specified.I have a deliberate sequence of bytes I want to write out on the wire, and I know what I'm doing
Consider a user who needs to format
double
values differently from theUtf8JsonWriter.WriteNumber[X]
methods. Those methods write the shortest possible JSON output required for round-tripping on deserialization. This means that afloat
ordouble
value that can be expressed simply as89
would not be written as89.0
. However, our user needs to send numeric JSON content to a service which treats values without decimal points as integers, and values with decimal points as doubles. This distinction is important to our user. TheWriteRawValue
APIs would allow our custom to format their numeric values as they wish, and write them directly to the destination buffer. In this scenario, their numeric values are trusted, and our user would not want the writer to perform any encoding or structural validation on their raw JSON payloads, to give the fastest possible perf. To satisfy this trusted scenario, our user might write serialization logic as follows:The resulting JSON payload might look like this:
Notes
WriteRawValue
APIs would work not just for writing raw values as JSON property values (i.e. as POCO properties and dictionary values), but also as root level JSON objects, arrays, and primitives; and as JSON array elements. This functions in the same way as the existingUtf8JsonWriter.WriteXValue
methods.What's next?
The text was updated successfully, but these errors were encountered: