Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Writing raw JSON values when using Utf8JsonWriter #1784

Closed
nahk-ivanov opened this issue Jan 15, 2020 · 36 comments · Fixed by #54254
Closed

Writing raw JSON values when using Utf8JsonWriter #1784

nahk-ivanov opened this issue Jan 15, 2020 · 36 comments · Fixed by #54254
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json tenet-performance Performance related issue
Milestone

Comments

@nahk-ivanov
Copy link

nahk-ivanov commented Jan 15, 2020

Edited by @layomia

Original proposal by @nahk-ivanov (click to view)

According to the image in the roadmap - Utf8JsonWriter would be the lowest level of API we are going to get, but it's missing WriteRawValue functionality from JSON.NET. Seems like it is "by design", as roadmap states

Does not allow writing invalid JSON according to the JSON RFC

In this case, it would be nice to have access to the output stream directly in the custom JsonConverter, as there are still scenarios where for performance we want to write raw JSON. It's quite often that application layer doesn't care about particular branches of the JSON document and just wants to pass them through to/from the data store (especially when using NoSQL). What we used to do with JSON.NET in such cases is maintain it as string property and define custom converter that would use WriteRawValue, which took a string argument.

One of the examples would be using Azure Table Storage, where we want to extract few top-level properties to be PartitionKey/RowKey and store other nested properties in their original JSON form. I don't see any reason why application would unnecessarily deserialize and serialize these parts of the tree - all we need is to just run forward-only reader to return the value of the given property in it's raw form and later write it as such.

I looked at the current System.Text.Json APIs and can't find a good way of doing this. It seems like (for performance reasons) we would want to maintain raw data in a binary form now (compared to string previously), which is fine, but there is no way to read/write next value as bytes. The best I could find is to use JsonDocument.ParseValue + document.RootElement.GetRawText() when reading it and parse it again into a JsonDocument when writing to just call WriteTo.

Am I missing some APIs?


Introduction

There are scenarios where customers want to integrate existing, "raw" JSON payloads when writing new JSON payloads with Utf8JsonWriter. There's no first-class API for providing this functionality today, but there's a workaround using JsonDocument:

// UTF8 representation of {"Hello":"World"}
byte[] rawJson = GetRawPayload();

writer.WriteStartObject();
writer.WritePropertyName("Payload");

// No easy way to write this raw JSON payload. Here's a workaround:
using (JsonDocument document = JsonDocument.Parse(rawJson))
{
    document.RootElement.WriteTo(writer);
}

writer.WriteEndObject():

This implementation builds a metadata database for navigating a JSON document, which is unnecessary given the desired functionality and has a performance overhead.

The goal of this feature is to provide performant and safe APIs to write raw JSON values.

API Proposal

namespace System.Text.Json
{
  public sealed partial class Utf8JsonWriter
  {
    // Writes the span of bytes directly as JSON content.
    public void WriteRawValue(ReadOnlySpan<byte> utf8Json, bool skipValidation = false) { }

    // Writes the span of bytes directly as JSON content.
    public void WriteRawValue(ReadOnlySpan<char> json, bool skipValidation = false) { }

    // Writes the span of bytes directly as JSON content.
    public void WriteRawValue(string json, bool skipValidation = false) { }
  }
}
Alternate Design (click to view)
namespace System.Text.Json
{
  // Specifies processing options when writing raw JSON values.
  [Flags]
  public enum JsonWriteRawOptions
  {
    // Raw JSON values will be validated for structural correctness and encoded if required.
    Default = 0,

    // Whether to skip validation. Independent of JsonWriterOptions.SkipValidation.
    SkipValidation = 1,

    // Whether to skip encoding specified via JsonWriterOptions.Encoder.
    SkipEncoding = 2,
  }

  public sealed partial class Utf8JsonWriter
  {
    // Writes the span of bytes directly as JSON content, according to the specified options.
    public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default) { }
  }
}

Potential additions

namespace System.Text.Json
{
  [Flags]
  public enum JsonWriteRawOptions
  {
    // Existing
    // Default = 0,
    // SkipValidation = 1,
    // SkipEncoding = 2,

    // Whether to reformat to raw payload according to the JsonWriterOptions, or write it as-is.
    // Reformatting is influenced by JsonWriterOptions.Indented, and may result in whitespace changes.
    Reformat = 4
  }

  public sealed partial class Utf8JsonWriter
  {
    // Existing
    // public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions? options = null) { }

    // Write raw value overload that takes ROS<char>.
    public void WriteRawValue(ReadOnlySpan<char> json, JsonWriteRawOptions options = default) { }
    
    // Write raw value overload that takes string.
    public void WriteRawValue(string json, JsonWriteRawOptions options = default) { }
  }
}

Scenarios

The major considerations for this feature are:

  • Should we validate raw payloads for structural correctness?
  • Should we encode the raw JSON payload according to the writer options?
  • Should we maintain the formatting of the raw payload when writing, or should we re-format it according to the writer options?

The answers depend on the scenario. Is the raw JSON payload trusted or arbitrary? Is the enveloping of the raw payload done on behalf of users? This proposal seeks to make all of these post-processing options fully configurable by users, allowing them to tweak the behavior in a way that satisfies their performance, correctness, and security requirements.

In this proposal, raw JSON values will not be reformatted to align with whatever whitespace and indentation settings are specified on JsonWriterOptions, but left as-is. We can provide a future API to reformat raw JSON values. We'll also not encode values by default but can provide API to do it in the future. If users need all three processing options today, the workaround shown above works for that scenario.

Let's go over two of the expected common scenarios, and what API would be configured to enable them.

I have a blob which I think represents JSON content and which I want to envelope, and I need to make sure the envelope & its inner contents remain well-formed

Consider the Azure SDK team which provides a service protocol where the payload content is JSON. Part of the JSON is internal protocol information, and a nested part of the JSON is user provided data. When serializing protocol information, Azure cares about the raw user JSON being structurally valid, and ultimately that the overall JSON payload is valid. They don't wish to alter the formatting of the raw JSON, or preemptively encode it. Given these considerations, they might write their serialization logic as follows:

JsonWriterOptions writerOptions = new() { WriteIndented = true };

using MemoryStream ms = new();
using UtfJsonWriter writer = new(ms, writerOptions);

writer.WriteStartObject();
writer.WriteString("id", protocol.Id);
writer.WriteString("eventType", protocol.EventType);
writer.WriteString("eventTime", DateTimeOffset.UtcNow);
// Write user-provided data.
writer.WritePropertyName("data");
writer.WriteRawValue(protocol.UserBlob);
writer.WriteEndObject();

The resulting JSON payload might look like this:

{
    "id": "a20299ee-f239-42ce-8eee-2da562848fbe",
    "eventType": "Microsoft.Resources.ResourceWriteSuccess",
    "eventTime": "2021-06-10T14:29:10.3363984+00:00",
    "data":{"auth":"{az_resource_mgr_auth}","correlationId":"76d8b695-e98b-4fb6-81c4-1edc1aa22dfc","tenantId":"76d8b695-e98b-4fb6-81c4-1edc1aa22dfc"}
}

Notice that no JsonWriteRawOptions had to be specified.

I have a deliberate sequence of bytes I want to write out on the wire, and I know what I'm doing

Consider a user who needs to format double values differently from the Utf8JsonWriter.WriteNumber[X] methods. Those methods write the shortest possible JSON output required for round-tripping on deserialization. This means that a float or double value that can be expressed simply as 89 would not be written as 89.0. However, our user needs to send numeric JSON content to a service which treats values without decimal points as integers, and values with decimal points as doubles. This distinction is important to our user. The WriteRawValue APIs would allow our custom to format their numeric values as they wish, and write them directly to the destination buffer. In this scenario, their numeric values are trusted, and our user would not want the writer to perform any encoding or structural validation on their raw JSON payloads, to give the fastest possible perf. To satisfy this trusted scenario, our user might write serialization logic as follows:

JsonWriterOptions writerOptions = new() { WriteIndented = true, };

using MemoryStream ms = new();
using UtfJsonWriter writer = new(ms, writerOptions);

writer.WriteStartObject();
writer.WritePropertyName("dataType", "CalculationResults");

writer.WriteStartArray("data");

foreach (CalculationResult result in results)
{
    writer.WriteStartObject();
    writer.WriteString("measurement", result.Measurement);

    writer.WritePropertyName("value");
    // Write raw JSON numeric value
    byte[] formattedValue = FormatNumberValue(resultValue);
    writer.WriteRawValue(formattedValue, skipValidation: true);

    writer.WriteEndObject();
}

writer.WriteEndArray();
writer.WriteEndObject();

The resulting JSON payload might look like this:

{
    "dataType": "CalculationResults",
    "data": [
        {
            "measurement": "Min",
            "value": 50.4
        },
        {
            "measurement": "Max",
            "value": 2
        }
    ]
}

Notes

  • The WriteRawValue APIs would work not just for writing raw values as JSON property values (i.e. as POCO properties and dictionary values), but also as root level JSON objects, arrays, and primitives; and as JSON array elements. This functions in the same way as the existing Utf8JsonWriter.WriteXValue methods.

What's next?

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Jan 15, 2020
@ahsonkhan
Copy link
Member

Seems like it is "by design"

Yep. In the initial design discussions of Utf8JsonWriter, we had considered adding WriteRaw APIs, but decided against it: https://github.com/dotnet/corefx/issues/33552

there are still scenarios where for performance we want to write raw JSON

We could consider revisiting that decision, if there is evidence of a significant performance benefit in doing so. Do you have some benchmarks for your scenario that would potentially show a win or is there a general concern for too much unnecessary work being done?

In this case, it would be nice to have access to the output stream directly in the custom JsonConverter

One problem with trying to expose the destination is that Utf8JsonWriter could have been constructed to write to different output destinations: Stream (like FileStream) or IBufferWriter<byte> (like ArrayBufferWriter<byte>).

Is your use case specific to uses within JsonConverter? If you are passing the output stream to the JsonSerializer, do you still have access to it?

One of the examples would be using Azure Table Storage, where we want to extract few top-level properties to be PartitionKey/RowKey and store other nested properties in their original JSON form.

Generally speaking, how large are these nested JSON payloads that are stored in their original form, on average (looking for order of magnitude such as 1 KB, 1 MB, 1 GB)?

It's quite often that application layer doesn't care about particular branches of the JSON document and just wants to pass them through to/from the data store (especially when using NoSQL).

What about validation though? Do they care about making sure the JSON is valid though before just writing the string directly calling WriteRawValue or is it assumed the JSON to already be validated elsewhere (like in your case where the JSON retrieved was previously written to storage by the JsonSerializer)?

The best I could find is to use JsonDocument.ParseValue + document.RootElement.GetRawText() when reading it and parse it again into a JsonDocument when writing to just call WriteTo.

Am I missing some APIs?

No, you aren't missing anything. What you found here is the only way you could do this within the JsonConverter. However, one thing you could do to avoid calling GetRawText and an extra re-parse is instead of storing the JSON as string to write later, store the JsonElement from the JsonDocument and call WriteTo on it. Is that feasible for your scenario? It still won't beat the performance of writing the raw bytes directly, but it should improve significantly.

Not knowing the specifics of your scenario, I explored the scenario a bit with some arbitrary JSON data and these are the results:

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.19041
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-alpha1-015914
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT
  Job-DHBJYG : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  
Method Mean Error StdDev Median Min Max Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
RoundTripWithString 95.44 us 1.467 us 1.372 us 95.01 us 93.23 us 98.31 us 1.00 0.00 6.2256 0.8545 - 25.59 KB
RoundTripWithJsonElement 67.49 us 0.543 us 0.454 us 67.47 us 66.65 us 68.14 us 0.71 0.01 4.2725 0.3662 - 17.95 KB
RoundTripRaw 44.97 us 0.688 us 0.610 us 44.83 us 44.31 us 46.29 us 0.47 0.01 7.3853 0.0610 - 30.37 KB
RoundTripWithBytes 92.23 us 1.715 us 1.432 us 91.68 us 90.31 us 94.93 us 0.97 0.02 3.1738 0.2441 - 13.02 KB
RoundTripWithBytesRaw 44.31 us 0.871 us 0.932 us 44.08 us 43.32 us 46.20 us 0.47 0.01 3.1128 0.2441 - 12.95 KB

@ahsonkhan ahsonkhan added this to the 5.0 milestone Jan 25, 2020
@ahsonkhan ahsonkhan added tenet-performance Performance related issue api-suggestion Early API idea and discussion, it is NOT ready for implementation and removed untriaged New issue has not been triaged by the area owner labels Jan 25, 2020
@nahk-ivanov
Copy link
Author

nahk-ivanov commented Jan 25, 2020

Thanks for taking a stub at this and doing some benchmarking!

At this point it is a general concern for too much unnecessary work being done - this is part of an ASP.NET Core web app hosted in Azure App Service and I'd like to host on as little nodes as possible using smallest SKU possible, so that we scale horizontally, rather than vertically when needed. This looked like a low-hanging fruit to optimize, if all necessary APIs would exist, of course.

Payloads are not big in size (limited by Azure Table Storage entity size anyway), ranging from 1 to 32 KB. But it is an OLTP workload (think online gaming), so we get thousands requests per second and this is not a heavy load yet, of course.

Since serialization/deserialization happens within ASP.NET infrastructure, the easiest would be to hook through JsonConverter. I'll need to check ASP.NET Core APIs to see if I can get a hold of the serializer or request body stream directly to override standard deserialization behavior. I think it might be possible, if I just claim that my action expects a raw binary input, rather than a JSON payload and use JSON reader directly myself. It would be more convenient to have it extract some of the properties automatically though, and only ask me what to with the subset of properties - that's why I was leaning towards JsonConverter approach.

Validation-wise - it is there, but we only need to validate properties at the root level, which we do. Think the following json:

{
  "username": "foo",
  "accessToken": "bar",
  "payload": { "complex": "object" }
}

As long as username and accessToken are valid we don't care if user wants to "shoot himself in a foot" with bogus payload. There are, of course, scenarios where even the payload needs to be validated (e.g. shared with other users), but in this particular case it doesn't matter, as long as overall request size is valid. So what I used to do is map top-level properties and use custom converter on the payload, so that it is preserved in raw form within the application code.

I thought about using JsonElement for the payload property, yeah. This was possible even with JSON.NET as they had similar type, but I don't like serialization-specific types to "leak" into the data model. I'm not happy with using custom attributes, but I can live with it (even though I may not be sleeping well). If I have to use serialization framework-specific types for the properties in the model classes - then it will be even harder to switch from one framework to another.
With attributes I can add custom wrapper types and define converters for them externally to the model, but with public JsonElement Payload { get; set; } it's hard to do anything about it.

Based on your benchmarks I'll probably switch to JsonElement (even though it affects my sleep), but it would be nice if we could easily get performance of RoundTripWithBytesRaw using simple public APIs.

@layomia
Copy link
Contributor

layomia commented Feb 27, 2020

From @declspec in #32849:

I would like to propose an addition to the System.Text.Json.Utf8JsonWriter API which would allow writing existing UTF8-encoded JSON fragments into the current JSON document.

Something like the following:

public  sealed class Utf8JsonWriter {
    public void WriteUtf8JsonValue(ReadOnlySpan<byte> utf8json);

    public void WriteUtf8Json(string propertyName, ReadOnlySpan<byte> utf8json);
    // ... and the rest of the standard overloads for the different types of `propertyName`
}

An example of usage would be:

var utf8json = Encoding.UTF8.GetBytes("{ \"foo\": 42, \"bar\": \"baz\" }").AsSpan();

using (var ms = new MemoryStream()) {
    using (var writer = new Utf8JsonWriter(ms)) {
        writer.WriteStartObject();
        writer.WriteUtf8Json("fragment", utf8json);
        writer.WriteEndObject();
    }

    Debug.WriteLine(Encoding.UTF8.GetString(ms.ToArray()));

    // Expected output:
    // { "fragment": { "foo": 42, "bar": "baz" } }
}

I believe this would be useful when you are trying to augment or wrap existing JSON payloads (i.e. from external services) in a parent document without needing to deserialize and reserialize them.

I have read through all the documentation I can find and as far as I can see there doesn't currently seem to be any way of doing what I'm proposing. If there is already a better way of doing this please let me know.

@mikemcdougall
Copy link

What is the workaround?

@george-chakhidze
Copy link

What is the workaround?

@mikemcdougall Going back to good ol' Newtonsoft.Json and its JsonWriter.WriteRawValue.

@layomia
Copy link
Contributor

layomia commented Apr 3, 2020

Work-around as described above:

No, you aren't missing anything. What you found here is the only way you could do this within the JsonConverter. However, one thing you could do to avoid calling GetRawText and an extra re-parse is instead of storing the JSON as string to write later, store the JsonElement from the JsonDocument and call WriteTo on it.

string rawJson = ...; // Raw value can also be `byte`s

using (JsonDocument document = JsonDocument.Parse(rawJson))
{
    document.RootElement.WriteTo(writer);
}

@NinoFloris
Copy link
Contributor

We're in a similar situation, we heavily use PG json(b) columns for user defined metadata, our backend reads intentially know nothing about their contents, we'd ideally not pay the price of parsing and serializing something we only pass through.

Api proposal seems solid, good to see it's not based on String.

For large payloads -- LOH allocs are easier to avoid by deserializing, which we're trying to prevent here -- a ReadOnlySequence overload could be of use too.

We're looking at introducing ReadOnlySequence in Npgsql (PG driver) too, meaning we could remove all LOH allocs in an end to end scenario involving potentially large column values.

@layomia layomia modified the milestones: 5.0.0, Future Jul 14, 2020
@atlefren
Copy link

atlefren commented Sep 2, 2020

A bit late to the party here, but this issue is what stops me from converting to system.text.json from newtonsoft.json.

My issue is related to the GeoJSON format, which is a subset of JSON, used to describe geospatial features.

Support for this format and mapping it to C# classes is provided by the NetTopologySuite package. The reading and writing of GeoJson to corresponding C# classes is complex, and not something I want to re-implement. Neither is it something I wish to be included in the standard library.

The problem is that a GeoJSON geometry is a complex structure.

In order to handle this with Newtonsoft I've done the following: https://gist.github.com/atlefren/4ec30d46af1dde48c4a461abc5b0b3c5

However, I cannot find a way to handle this in system.text.json. As far as I'm concerned, validation and such is no issue here, as the NTS reader/writer is responsible for that part.

EDIT: I See that NTS has made this possible on their side, as documented here: https://github.com/NetTopologySuite/NetTopologySuite.IO.GeoJSON/wiki/Using-NTS.IO.GeoJSON4STJ-with-ASP.NET-Core-MVC#basic-usage

@layomia
Copy link
Contributor

layomia commented Sep 3, 2020

@atlefren did you take a look at the workaround in #1784 (comment)? Does it help?

@olivier-spinelli
Copy link

Sorry for the noob question but @StephenBonikowsky moving this issue to .NET 6 Committed does it mean that it is planned for .Net 6? (The milestone on the card is "Future", so I doubt it.)

(If it is, it's cool! I'd very like to see WriteRaw available method(s).)

@StephenBonikowsky
Copy link
Member

@olivier-spinelli thank you for pointing out that discrepancy.
@ericstj is this still planned for .NET 6?

@ericstj ericstj modified the milestones: Future, 6.0.0 Feb 17, 2021
@ericstj
Copy link
Member

ericstj commented Feb 17, 2021

This is part of the #43620 epic -> #45190 user story. We expect to get to this in 6.0. I've adjusted the milestone.

@layomia
Copy link
Contributor

layomia commented Jun 10, 2021

I updated the description above with an API proposal for this feature. @Tornhoof I opened #54005 to address writing raw property names.

@layomia layomia added api-ready-for-review API is ready for review, it is NOT ready for implementation blocking Marks issues that we want to fast track in order to unblock other important work and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Jun 10, 2021
@olivier-spinelli
Copy link

Any chance to "open" a little bit the Utf8JsonReader/Writer API?
I described my issue here: #30194 (comment)
(Wondering if I open an issue... If you can tell me what you think, it'll be cool :))

@layomia
Copy link
Contributor

layomia commented Jun 10, 2021

@olivier-spinelli per #30194 (comment) please feel free to open a new issue addressing your read requirements.

@olivier-spinelli
Copy link

olivier-spinelli commented Jun 10, 2021

Thanks!
The new issue is #54016.

@stevejgordon
Copy link
Contributor

@layomia Apologies if I've misread this on my phone but I'm not sure this would fully support my scenario above? Could it be adapted that way? In that case I want to write complete JSON, not just a property value. Should I perhaps open a specific issue to discuss?

@layomia
Copy link
Contributor

layomia commented Jun 10, 2021

@stevejgordon I'll circle back with more detail, but yes you'd be able to write complete JSON like top-level JSON objects, objects within JSON arrays, etc., just like the existing Utf8JsonWriter.WriteXValue methods.

@layomia
Copy link
Contributor

layomia commented Jun 11, 2021

Update: I made the following changes to the proposal in the description above #1784 (comment)

@layomia layomia changed the title Writing raw property values when using System.Text.Json Writing raw JSON values when using Utf8JsonWriter Jun 11, 2021
@olivier-spinelli
Copy link

A small remark on the API proposal about the nullable default parameter:

public void WriteRawValue(string json, JsonWriteRawOptions? options = null);

The JsonWriteRawOptions.Default exists (and is the default value 0). defaulting the parameter to null here seems somehow redundant and/or may be ambiguous (what's the final choice for the null?) and is, for sure, 5 bytes (+padding) on the stack instead of 4...

May be the following could be more explicit?

public void WriteRawValue( string json, JsonWriteRawOptions options = JsonWriteRawOptions.Default );

@layomia
Copy link
Contributor

layomia commented Jun 13, 2021

@olivier-spinelli I tried to explain this in the description:

The options parameter on the new Utf8JsonWriter.WriteRawValue method is nullable for forward compatibility - that is, if we wish to have an option on JsonWriterOptions to configure what the write raw settings for any WriteRawValue method call should be (as shown in the "Potential future API additions" section). Options passed directly to the method would win over the global option. The parameter is nullable so that we can distinguish between users passing default(JsonWriteRawOptions) (and meaning it), and users passing nothing (i.e. null, meaning the global option should apply).

Re:

what's the final choice for the null?

With the proposed API, passing a null value means we should use default(JsonWriteRawOptions).

Now, say the API we ship now is:

public sealed partial class Utf8JsonWriter
{
    public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default) { }
}

And we want to add API like this in this future:

public struct JsonWriterOptions
{
    // Existing
    // public JavaScriptEncoder? Encoder { readonly get; set; }
    // public bool Indented { get; set; }
    // public bool SkipValidation { get; set; }

    // Specify write-raw options to use for all WriteRawValue calls on the writer instance
    public JsonWriteRawOptions WriteRawOptions { get; set; }
}

What would the writer behavior in this scenario be?

JsonWriterOptions options = new() { WriteRawOptions = JsonWriterOptions.SkipValidation };
using Utf8JsonWriter writer = new(ms, options);

// Option a: Writer uses `JsonWriterOptions.SkipValidation` as specified globally
// Option b: Writer uses `JsonWriterOptions.Default` as specified in method call
writer.WriteRawValue(blob); // Same thing as writer.WriteRawValue(blob, JsonWriterOptions.Default)

If we don't think we'd ever want a "global" WriteRawOptions option to influence all WriteRawValue calls, so that users don't have to manually pass them each time, then my above question wouldn't apply and we can simpy go with public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default) { }.

@olivier-spinelli
Copy link

Ouch! I missed your remark, sorry!
But... this may be a sign that the "Least Surprise Principle" has somehow been violated here...

If the "global" exists, wouldn't be clearer to use overloads rather than default (of default) values. Something like:

public sealed partial class Utf8JsonWriter
{
    /// <summary>
    ///  Writes the value directly according to the specified options.
    /// </summary>
    public void WriteRawValue( ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options ) { }

    /// <summary>
    ///  Writes the value directly using this writer's <see cref="JsonWriterOptions.WriteRawOptions" />.
    /// </summary>
    public void WriteRawValue( ReadOnlySpan<byte> utf8Json ) { }
}

But if you don't include it now, these overloads will be really strange :). IMHO whether the global option exists should be settled right now...

Now, my very personal thought about this is that this "global" should not be here, here's why:
We're dealing here with a low-level API. That is meant to be used to handle specific cases. In these kind of scenario, there's a big probability that other (very specific) options are useful (like for instance: AllowNumberInfinity, NumberInfinityClampedToMax, UseByteArrayForGuid, etc.): the "global" Utf8JsonWriter options will not "be enough", these specific options will "naturally" be available in the calling context.

@layomia
Copy link
Contributor

layomia commented Jun 14, 2021

@olivier-spinelli yeah I agree that "global" configuration is not needed for this low-level API. Your API sketch does seem like a better approach if we wanted one. I'll revert to simply the following and not propose a global setting now or in the future:

public sealed partial class Utf8JsonWriter
{
    public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default) { }
}

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Jun 14, 2021

Mentioned in internal chat: if the scenario is that you want to write a [potentially untrusted] payload as an object enveloped by some outer wrapper, make sure you're also checking that incoming data is well-formed UTF-8 and doesn't have malformed "\uXXXX" sequences or encoded standalone surrogates. This isn't needed for "put these bytes on the wire as-is" scenarios, which should only ever be run over trusted inputs.

@terrajobst
Copy link
Member

terrajobst commented Jun 14, 2021

Video

  • We should rename skipValidation to skipInputValidation to make it clear that this about catching issues with untrusted user input, rather than structural coding issues from the side of the caller.
  • When skipInputValidation is set to false we should enforce that no comments are present (or alternatively we strip them).
  • If we need to add options, we can either add an overload with an enum, a struct, or just more Booleans
  • The name RawValue is fine, but if we ever add the overload that takes a property name the convention would be to call it WriteRaw which is a bit odd
namespace System.Text.Json
{
    public sealed partial class Utf8JsonWriter
    {
        public void WriteRawValue(ReadOnlySpan<byte> utf8Json, bool skipInputValidation = false);
        public void WriteRawValue(ReadOnlySpan<char> json, bool skipInputValidation = false);
        public void WriteRawValue(string json, bool skipInputValidation = false);
    }
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation blocking Marks issues that we want to fast track in order to unblock other important work labels Jun 14, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 15, 2021
@olivier-spinelli
Copy link

Calling WriteRawValue is "raw writing". What about:

namespace System.Text.Json
{
    public sealed partial class Utf8JsonWriter
    {
        public void RawWriteValue(ReadOnlySpan<byte> utf8Json, bool skipInputValidation = false);
        public void RawWriteValue(ReadOnlySpan<char> json, bool skipInputValidation = false);
        public void RawWriteValue(string json, bool skipInputValidation = false);

        public void RawWrite(JsonEncodedText propertyName, string json, bool skipInputValidation = false);
        public void RawWrite(string propertyName, ReadOnlySpan<byte> utf8Json, bool skipInputValidation = false);
    }
}

Using Raw prefix rather than suffix may be simpler.

@ghost ghost closed this as completed in #54254 Jul 3, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jul 3, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Aug 2, 2021
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.