Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Developers can use System.Text.Json to serialize type hierarchies securely #63747

Closed
4 tasks done
Tracked by #63762
eiriktsarpalis opened this issue Jan 13, 2022 · 60 comments · Fixed by #71346
Closed
4 tasks done
Tracked by #63762

Developers can use System.Text.Json to serialize type hierarchies securely #63747

eiriktsarpalis opened this issue Jan 13, 2022 · 60 comments · Fixed by #71346
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json Cost:L Work that requires one engineer up to 4 weeks Priority:2 Work that is important, but not critical for the release Team:Libraries User Story A single user-facing feature. Can be grouped under an epic.
Milestone

Comments

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Jan 13, 2022

Background and Motivation

Serializing type hierarchies (aka polymorphic serialization) has been a feature long requested by the community (cf. #29937, #30083). We took a stab at producing a polymorphism feature during the .NET 6 development cycle, ultimately though the feature was cut since we eventually reached the conclusion that unconstrained polymorphism does not meet our security bar, under any circumstance.

For context, polymorphic serialization has long been associated with security vulnerabilities. More specifically, unconstrained polymorphic serialization can result in accidental data disclosure and unconstrained polymorphic deserialization can result in remote code execution when used with untrusted input. See the BinaryFormatter security guide for an explanation of such vulnerabilities.

We acknowledge that serializing type hierarchies is an important feature, and we are committed to delivering a secure implementation in a future release of System.Text.Json. This implies that we will be releasing a brand of polymorphism that is restricted by design, requiring users to explicitly opt-in to supported subtypes.

Basic Polymorphism

At the core of the design is the introduction of the JsonDerivedType attribute:

[JsonDerivedType(typeof(Derived))]
public class Base
{
    public int X { get; set; }
}

public class Derived : Base
{
    public int Y { get; set; }
}

This configuration enables polymorphic serialization for Base, specifically when the runtime type is Derived:

Base value = new Derived();
JsonSerializer.Serialize<Base>(value); // { "X" : 0, "Y" : 0 }

Note that this does not enable polymorphic deserialization since the payload would roundtripped as Base:

Base value = JsonSerializer.Deserialize<Base>(@"{ ""X"" : 0, ""Y"" : 0 }");
value is Derived; // false

Polymorphism using Type Discriminators

To enable polymorphic deserialization, users need to specify a type discriminator for the derived class:

[JsonDerivedType(typeof(Base), typeDiscriminatorId: "base")]
[JsonDerivedType(typeof(Derived), typeDiscriminatorId: "derived")]
public class Base
{
    public int X { get; set; }
}

public class Derived : Base
{
    public int Y { get; set; }
}

Which will now emit JSON along with type discriminator metadata:

Base value = new Derived();
JsonSerializer.Serialize<Base>(value); // { "$type" : "derived", "X" : 0, "Y" : 0 }

which can be used to deserialize the value polymorphically:

Base value = JsonSerializer.Deserialize<Base>(@"{  ""$type"" : ""derived"", ""X"" : 0, ""Y"" : 0 }");
value is Derived; // true

Type discriminator identifiers can also be integers, so the following form is valid:

[JsonDerivedType(typeof(Derived1), 0)]
[JsonDerivedType(typeof(Derived2), 1)]
[JsonDerivedType(typeof(Derived3), 2)]
public class Base { }

JsonSerializer.Serialize<Base>(new Derived2()); // { "$type" : 1, ... }

Mixing and matching configuration

It is possible to mix and match type discriminator configuration for a given type hierarchy

[JsonPolymorphic]
[JsonDerivedType(typeof(Derived1))]
[JsonDerivedType(typeof(Derived2), "derived2")]
[JsonDerivedType(typeof(Derived3), 3)]
public class Base
{
    public int X { get; set; }
}

resulting in the following serializations:

var json1 = JsonSerializer.Serialize<Base>(new Derived1()); // { "X" : 0, "Y" : 0 }
var json2 = JsonSerializer.Serialize<Base>(new Derived2()); // { "$type" : "derived2", "X" : 0, "Z" : 0 }

Customizing Type Discriminators

It is possible to customize the property name of the type discriminator metadata like so:

[JsonPolymorphic(CustomTypeDiscriminatorPropertyName = "$case")]
[JsonDerivedType(typeof(Derived1), "derived1")]
public class Base
{
    public int X { get; set; }
}

resulting in the following JSON:

JsonSerializer.Serialize<Base>(new Derived1()); // { "$case" : "derived1", "X" : 0, "Y" : 0 }

Unknown Derived Type Handling

Consider the following type hierarchy:

[JsonDerivedType(typeof(DerivedType1))]
public class Base { }
public class DerivedType1 : Base { }
public class DerivedType2 : Base { }

Since the configuration does not explicitly opt-in support for DerivedType2, attempting to serialize instances of DerivedType2 as Base will result in a runtime exception:

JsonSerializer.Serialize<Base>(new DerivedType2()); // throws NotSupportedException

The default behavior can be tweaked using the JsonUnknownDerivedTypeHandling enum, which can be specified like so:

[JsonPolymorphic(UnknownDerivedTypeHandling = JsonUnknownDerivedTypeHandling.FallBackToBaseType)]
[JsonDerivedType(typeof(DerivedType1))]
public class Base { }

JsonSerializer.Serialize<Base>(new DerivedType2()); // serialize using the contract for `Base`

The FallBackToNearestAncestor setting can be used to fall back to the contract of the nearest declared derived type:

[JsonPolymorphic(UnknownDerivedTypeHandling = JsonUnknownDerivedTypeHandling.FallBackToNearestAncestor)]
[JsonDerivedType(typeof(MyDerivedClass)]
public interface IMyInterface { }
public class MyDerivedClass : IMyInterface { }

public class TestClass : MyDerivedClass { }

JsonSerializer.Serialize<IMyInterface>(new TestClass()); // serializes using the contract for `MyDerivedClass`

It should be noted that falling back to the nearest ancestor admits the possibility of diamond ambiguity:

[JsonPolymorphic(UnknownDerivedTypeHandling = JsonUnknownDerivedTypeHandling.FallBackToNearestAncestor)]
[JsonDerivedType(typeof(MyDerivedClass)]
public interface IMyInterface { }

public interface IMyDerivedInterface : IMyInterface { }
public class MyDerivedClass : IMyInterface { }

public class TestClass : MyDerivedClass, IMyDerivedInterface { }

JsonSerializer.Serialize<IMyInterface>(new TestClass()); // throws NotSupportedException

Configuring Polymorphism via the Contract model

For use cases where attribute annotations are impractical or impossible (large domain models, cross-assembly hierarchies, hierarchies in third-party dependencies, etc.), it should still be possible to configure polymorphism using the JSON contract model:

public class MyPolymorphicTypeResolver : DefaultJsonTypeInfoResolver
{
    public override JsonTypeInfo GetTypeInfo(Type type, JsonSerializerOptions options)
    {
        JsonTypeInfo jsonTypeInfo = base.GetTypeInfo(type, options);
        if (jsonTypeInfo.Type == typeof(Base))
        {
            jsonTypeInfo.PolymorphismOptions =
                new JsonPolymorphismOptions
                {
                     TypeDiscriminatorPropertyName = "_case",
                     UnknownDerivedTypeHandling = JsonUnknownDerivedTypeHandling.FallBackToNearestAncestor,
                     DerivedTypes =
                     {
                          new JsonDerivedType(typeof(DerivedType1)),
                          new JsonDerivedType(typeof(DerivedType2), "derivedType2"),
                          new JsonDerivedType(typeof(DerivedType3), 42),
                     }
               }
        }

        return jsonTypeInfo;
    }
}

Additional details

  • Polymorphic serialization only supports derived types that have been explicitly opted in via the JsonDerivedType attribute. Undeclared runtime types will result in a runtime exception. The behavior can be changed by configuring the JsonPolymorphicAttribute.UnknownDerivedTypeHandling property.
  • Polymorphic configuration specified in derived types is not inherited by polymorphic configuration in base types. These need to be configured independently.
  • Polymorphic hierarchies are supported for both classes and interface types.
  • Polymorphism using type discriminators is only supported for type hierarchies that use the default converters for objects, collections and dictionary types.
  • Polymorphism is supported in metadata-based sourcegen, but not fast-path sourcegen.

API Proposal

namespace System.Text.Json.Serialization
{
    [AttributeUsage(AttributeTargets.Class | AttributeTargets.Interface, AllowMultiple = false, Inherited = false)]
    public sealed class JsonPolymorphicAttribute : JsonAttribute
    {
        public string? TypeDiscriminatorPropertyName { get; set; }
        public bool IgnoreUnrecognizedTypeDiscriminators { get; set; }
        public JsonUnknownDerivedTypeHandling UnknownDerivedTypeHandling { get; set; }
    }

    [AttributeUsage(AttributeTargets.Class | AttributeTargets.Interface, AllowMultiple = true, Inherited = false)]
    public class JsonDerivedTypeAttribute : JsonAttribute
    {
        public JsonDerivedTypeAttribute(Type derivedType);
        public JsonDerivedTypeAttribute(Type derivedType, string typeDiscriminatorId);
        public JsonDerivedTypeAttribute(Type derivedType, int typeDiscriminatorId);

        public Type DerivedType { get; }
        public object? TypeDiscriminatorId { get; }
    }

    public enum JsonUnknownDerivedTypeHandling
    {
        FailSerialization = 0, // Default, fail serialization on undeclared derived type
        FallBackToBaseType = 1, // Fall back to the base type contract
        FallBackToNearestAncestor = 2 // Fall back to the nearest declared derived type contract (admits diamond ambiguities in cases of interface hierarchies)
    }
}
namespace System.Text.Json.Metadata;

public partial class JsonTypeInfo
{
    public JsonPolymorphismOptions? PolymorphismOptions { get; set; } = null;
}

public class JsonPolymorphismOptions
{
    public JsonPolymorphismOptions();

    public ICollection<JsonDerivedType> DerivedTypes { get; }

    public bool IgnoreUnrecognizedTypeDiscriminators { get; set; } = false;
    public JsonUnknownDerivedTypeHandling UnknownDerivedTypeHandling { get; set; } = JsonUnknownDerivedTypeHandling.Default;
    public string TypeDiscriminatorPropertyName { get; set; } = "$type";
}

// Use a dedicated struct instead of ValueTuple that handles type checking of the discriminator id
public struct JsonDerivedType
{
    public JsonDerivedType(Type derivedType);
    public JsonDerivedType(Type derivedType, int typeDiscriminatorId);
    public JsonDerivedType(Type derivedType, string typeDiscriminatorId);

    public Type DerivedType { get; }
    public object? TypeDiscriminatorId { get; }
}

Anti-Goals

Progress

  • Bring prototype branch up to date.
  • Implementation & testing.
  • API proposal & review.
  • Conceptual documentation & blog posts.
@eiriktsarpalis eiriktsarpalis added area-System.Text.Json User Story A single user-facing feature. Can be grouped under an epic. Priority:2 Work that is important, but not critical for the release Cost:L Work that requires one engineer up to 4 weeks Team:Libraries labels Jan 13, 2022
@eiriktsarpalis eiriktsarpalis added this to the 7.0.0 milestone Jan 13, 2022
@eiriktsarpalis eiriktsarpalis self-assigned this Jan 13, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Jan 13, 2022
@ghost
Copy link

ghost commented Jan 13, 2022

Tagging subscribers to this area: @dotnet/area-system-text-json
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and Motivation

Serializing type hierarchies (aka polymorphic serialization) has been a long requested feature by the community (cf. #29937, #30083). We took a stab at producing a polymorphism feature during the .NET 6 development cycle, ultimately though the feature was cut since we eventually reached the conclusion that unconstrained polymorphism does not meet our security bar, under any circumstance.

For context, polymorphic serialization has long been associated with security vulnerabilities. More specifically, unconstrained polymorphic serialization can result in accidental data disclosure and unconstrained polymorphic deserialization can result in remote code execution when used with untrusted input. See the BinaryFormatter security guide for an explanation of such vulnerabilities.

We acknowledge that serializing type hierarchies is an important feature, and we are committed at delivering a secure implementation in a future release of System.Text.Json. This implies that we will be releasing a brand of polymorphism that is restricted by design, requiring users to explicitly opt-in to supported subtypes.

Examples

At the core of the design is the introduction of JsonKnownType attribute that can be applied to type hierarchies like so:

[JsonTypeDiscriminator("$type")]
[JsonKnownType(typeof(Derived1), "derived1")]
[JsonKnownType(typeof(Derived2), "derived2")]
public class Base
{
    public int X { get; set; }
}

public class Derived1 : Base
{
    public int Y { get; set; }
}

public class Derived2 : Base
{
    public int Z { get; set; }
}

This allows roundtrippable polymorphic serialization using the following schema:

var json1 = JsonSerializer.Serialize<Base>(new Derived1()); // { "$type" : "derived1", "X" : 0, "Y" : 0 }
var json2 = JsonSerializer.Serialize<Base>(new Derived2()); // { "$type" : "derived2", "X" : 0, "Z" : 0 }

JsonSerializer.Deserialize<Base>(json1) is Derived1; // true
JsonSerializer.Deserialize<Base>(json2) is Derived2; // true

Type discriminators can optionally be omitted from the contract like so:

[JsonKnownType(typeof(Derived1))]
[JsonKnownType(typeof(Derived2))]
public class Base
{
    public int X { get; set; }
}

resulting in the following serializations:

var json1 = JsonSerializer.Serialize<Base>(new Derived1()); // { "X" : 0, "Y" : 0 }
var json2 = JsonSerializer.Serialize<Base>(new Derived2()); // { "X" : 0, "Z" : 0 }

Note however that the particular contract does not support deserialization.

See also #30083 (comment) for details on the most recent API proposal.

Goals

  • Users should be able to serialize and deserialize type hierarchies using type discriminators.
  • Users must explicitly specify all supported subtypes for a given base type, following the approach used by DataContractSerializer's KnownTypeAttribute.
  • Users can optionally omit type discriminators from the serialized value, losing the ability to roundtrip the serialized value.
  • Users should be able to specify type hierarchies for types via APIs in JsonSerializerOptions.
  • Users should be able to specify type hierarchies via customized JSON contracts.
  • Type discriminators can either be of type string or type int, but a single hierarchy cannot mix and match types.
  • We should be able to reuse the polymorphism infrastructure when delivering related features, such as System.Text.Json : Consider supporting F# discriminated unions #55744.

Anti-Goals

Open Questions

  • Should we support polymorphic serialization for collection types?
  • How should we handle serialization of types in the hierarchy not explicitly opted in (adopt contract of nearest ancestor vs. throwing an exception). The former approach suffers from diamond ambiguity in the context of interface hierarchies.
  • How should we handle serialization of values of the base type if the base type has not been explicitly opted in?

Progress

  • Bring prototype branch up to date.
  • API proposal & review.
  • Implementation & testing.
  • Conceptual documentation & blog posts.
Author: eiriktsarpalis
Assignees: eiriktsarpalis
Labels:

area-System.Text.Json, User Story, Priority:2, Cost:L, Team:Libraries

Milestone: 7.0.0

@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Jan 13, 2022
@SCLDGit
Copy link

SCLDGit commented Jan 13, 2022

I'm curious for more explanation by the open question "Should we support polymorphic serialization for collection types?".

If it simply means "Should we allow users to serialize a collection of objects of various types that all derive from IType?" (e.g. List) , then I can't possibly imagine a scenario where the answer would be no.

@eiriktsarpalis
Copy link
Member Author

eiriktsarpalis commented Jan 13, 2022

If it simply means "Should we allow users to serialize a collection of objects of various types that all derive from IType?" (e.g. List) , then I can't possibly imagine a scenario where the answer would be no.

That's right. The issue primarily is that collections serialize as JSON arrays. As such introducing a type discriminator would serve to complicate the contract, i.e. it would likely require nesting the array payload inside an object with metadata:

{ "$type" : "derivedList", "$values" : [1,2,3,4,5] }

There is precedent for such encodings in the reference preservation feature.

@IFYates
Copy link

IFYates commented Jan 13, 2022

unconstrained polymorphic serialization can result in accidental data disclosure
(Anti-Goal) No support for serializing open hierarchies

I completely understand these arguments and decisions, but I think it's a shame that this option is taken away from the developer. An opt-in to serializing open hierarchies would have quickly bridged a large gap from Newtonsoft.Json (from what I've seen in other comments) and could prove useful in some diagnostic situations.
I've taken someone's example for a write-only polymorphic converter, which fills my needs but feels overdue for official support.

For the above examples, would it be too much to also support an inverse of the attribute:

[JsonTypeDiscriminator("$type")]
public class Base
{
    public int X { get; set; }
}

[JsonKnownDerivedType("derived2")]
public class Derived1 : Base
{
    public int Y { get; set; }
}

This provides a clean solution for when you don't own the Base class.
I feel like it might complicate deserialization, but even for a serialize-only approach, it feels better than having to maintain a central list of type mappings. (Assuming I've understood "Users should be able to specify type hierarchies for types via APIs in JsonSerializerOptions" correctly)

@SCLDGit
Copy link

SCLDGit commented Jan 13, 2022

It does feel a bit insulting to have a much-requested feature completely disallowed simply because people might misuse it. Even if we had to toss it in an unsafe block, it would be nice to have simple does-what-it-says-on-the-box JSON serialization/deserialization for the (very common) case where security is not a concern. I've used JSON as a quick, simple data interchange format (most recently to store hierarchical scene data for rendering software) more times than I can count, and not having to rely on Newtonsoft.JSON would be great for those cases.

@eiriktsarpalis
Copy link
Member Author

For the above examples, would it be too much to also support an inverse of the attribute:

Such an attribute would likely need to explicitly specify the base type, since otherwise that would be ambiguous: was it meant for Base? or object? or some interface that Derived1 happens to implement?

In any case, it should be possible to work around not owning the base type by specifying the hierarchy via JsonSerializerOptions.

@jeffhandley
Copy link
Member

We acknowledge and understand the frustration that folks have with open hierarchies being disallowed. Our intent with publishing these goals and anti-goals is to help bring clarity to expectations and engage in these conversations. This should help us find a viable path forward together.

It does feel a bit insulting to have a much-requested feature completely disallowed simply because people might misuse it.

The challenge with open hierarchies isn't that people might misuse it, it's that any use of it is insecure. There are scenarios in which that vulnerability is masked by defenses elsewhere in a system, including the "I wrote this code and it's only ever going to run on my machine with my own data" scenario, but we cannot design this feature set with reliance on outside defenses. We must approach this design knowing that foundational security is paramount and then enable as many scenarios as we can with features that are still usable. We want to be clear about how we're approaching the problem space.

By designing with security first, we can keep building new capabilities outward to incrementally shrink the set of scenarios that cannot be accomplished. But if we start with features that are vulnerable by design, then we'll inevitably reach a point where it's difficult to use the features securely. That was the fate of BinaryFormatter, which is planned to be removed from .NET as a result.

We want to hear more about scenarios where open hierarchies have been used, and explore approaches for addressing those scenarios without relying on open hierarchies as an implementation detail. Thank you for helping us keep this moving forward!

@eiriktsarpalis
Copy link
Member Author

And, for what it's worth, it should still be possible to implement unconstrained polymorphism using custom converters. I've seen a few examples like that on the internet (including a couple of threads in this repo), although I wouldn't want to share links to them here.

@IFYates
Copy link

IFYates commented Jan 14, 2022

Such an attribute would likely need to explicitly specify the base type, since otherwise that would be ambiguous: was it meant for Base? or object? or some interface that Derived1 happens to implement?

Yes, I originally wrote [JsonKnownDerivedType(typeof(Base), "derived1")], but went for the stripped-down example.
I agree that forcing to define the base class is clearer. Not sure the implications of specifying a base higher up the hierarchy.

In any case, it should be possible to work around not owning the base type by specifying the hierarchy via JsonSerializerOptions.

Can you show an example? This sounds like (unless I've misunderstood) requiring the types to all be registered when the serializer is configured (e.g., in an OWIN startup).
While that feels correct if I'm tying together both things outside of my control, when I own the derived type, it feels smelly compared to your example of owning the the base class.

Secondarily related, would this structure work to expose the necessary in the derived type?

[JsonTypeDiscriminator("$type")]
[JsonKnownType(typeof(IExposeY), "derived1")] // Serialize based on interface; acknowledging that deserialization is impossible
public class Base
{
    public int X { get; set; }
}

public class Derived1 : Base, IExposeY
{
    public int Y { get; set; } // From IExposeY
}

@Symbai
Copy link

Symbai commented Jan 14, 2022

And, for what it's worth, it should still be possible to implement unconstrained polymorphism using custom converters.

But we don't want hand crafted solutions. If we would, the request wouldn't have so many up votes where people are telling you (and everyone posting his custom solution) this over and over again. The problem is rather simple: If the JSON .NET team make any change in the future because they are not aware of everyone's custom solution, and this change breaks this custom solution. But this custom solution has been used for so long that so many JSON (files for example) already exist. Then we have the worst case scenario ever. As a developer we want reliability, we want something where we can be sure it also works in 5 years or more. And we want it simply, something to opt-in. Something that works everywhere the same. No matter which developer wrote the code for it.

Newtonsoft JSON provides it. System.Text.JSON does not. And since .NET 3.1 we are asking for it and now that it has been delayed for so many times telling us to use custom converters is a bit odd. If that was ever an option, we would have used it already. But we are not, we are waiting for an official solution. Please keep this in mind, thank you.

@eiriktsarpalis
Copy link
Member Author

But we are not, we are waiting for an official solution.

Just to be absolutely clear, there will not be an officially supported way to do unconstrained polymorphic serialization in the foreseeable future. We are always open to offering better extensibility points that make such features easier to build as extensions, but the onus for doing this will always be on the developer/ecosystem.

@SCLDGit
Copy link

SCLDGit commented Jan 14, 2022

This is an especially annoying situation for those of us on the desktop app side of the equation who use JSON as a data storage or round-trip save format. The only security concern to us, at least as far as I understand it, is that someone could alter an on-disk JSON file that our software has generated that could do something malicious when loaded back into the system. If this is indeed the case (and please correct me if it's not), then there are already myriad data integrity validation tools at our disposal, including cryptographic hashing and singing, that make this concern a non issue. In every scenario where I want to serialize and deserialize JSON, I am in control of the data from end to end. Ignoring this use case in favor of only addressing the concerns of anonymous web transfers is ignoring a large chunk of the equation. No one is suggesting that security not be the prime concern by default, but if every library design approached its domain problem with the idea that developers should be absolutely disallowed from getting ourselves into potential trouble, even if it's opt-in, nothing would ever get done in the field of software development. Please reconsider this stance. Many of us don't require this level of hand holding.

@jeremyVignelles
Copy link

I agree with the fact that unconstrained polymorphism is a security issue, and as far as I understand, the only real counter-arguments here are that it worked before and that it's easy to dump process memory and load it back.

If it can be done with the proper extensions method, and that it's as useful as you say, I'm sure there will be an open-source package that will fill the gap, without tempting new devs that don't understand the security implications of this. What do you think?

@ishepherd
Copy link

ishepherd commented Jan 14, 2022

I'm sure there will be an open-source package that will fill the gap,

This seems logical. As S.T.J necessarily evolves slowly, an open source extensions package it is a natural idea. S.T.J should provide good enough extension points to let this package be built.

Because the package is not by MS, MS cannot rightly be blamed for the vulns it introduces.

@quixoticaxis
Copy link

Am I wrong in my assumtion that the issue being spoken about arises not from uncostrained polymorphism per se, but from the fact that traditionally multiple popular libraries provide the serialization facilities in ways that do not enforce class invariants and de-facto do binary deserialization, and also from the common issue (as I subjectively see it) of providing loose invariants throughout the ecosystem?
If everything is deserialized via constructor calls and most of the types in use guarantee invariants, the open polymorphism is not an issue, right?

On topic tl;dr: as an optional mode, open hierarchies may be allowed if the library is set to use constructors.

@eiriktsarpalis
Copy link
Member Author

provide the serialization facilities in ways that do not enforce class invariants

Can you clarify a bit more what you mean by class invariants?

@Symbai
Copy link

Symbai commented Jan 14, 2022

If it can be done with the proper extensions method, and that it's as useful as you say, I'm sure there will be an open-source package that will fill the gap

There are already open source packages which provide polymorphism support. But I found they don't work in all cases or aren't updated anymore. This is what can(!) happen to all third party packages. If you rely on them and got thousands of JSON files and it doesn't work anymore and you have to use another package perhaps and all of your previous JSON becomes invalid.. good luck. Not to say that any further updates to System.Text.Json can break it as well. I don't think this an appropriated solution for a serious developer/company but that's just my opinion.

@quixoticaxis
Copy link

quixoticaxis commented Jan 14, 2022

provide the serialization facilities in ways that do not enforce class invariants

Can you clarify a bit more what you mean by class invariants?

@eiriktsarpalis
I mean the set of assertions that stay true through the lifetime of the object, often enforced by the constructors.

I can understand that the class that provides public setters with no validation for properties of non specific types (for example, the class uses signed integers to model natural numbers) can be insecurely deserialized even without polymorphism, but I fail to imagine an example of a class that 1) would be insecure to recreate on deserialization through constructor, and 2) would not have other obvious design issues.
A quick Internet search didn't help.

@BreyerW
Copy link

BreyerW commented Jun 15, 2022

A question how configurating polymorphism for non-owned types work in case of metadata source gen? AFAIK it was suggested that custom contract resolver will NOT be picked up by source gen path rendering JsonPolymorphimsOptions nonviable for sourcegen. Or it is nonissue by virtue of static nature of source gen (that is, it knows all types that are to be serialized and thus can generate exhaustive discriminator mappings on its own)?

@eiriktsarpalis
Copy link
Member Author

A question how configurating polymorphism for non-owned types work in case of metadata source gen? AFAIK it was suggested that custom contract resolver will NOT be picked up by source gen path rendering JsonPolymorphimsOptions nonviable for sourcegen. Or it is nonissue by virtue of static nature of source gen (that is, it knows all types that are to be serialized and thus can generate exhaustive discriminator mappings on its own)?

Good question. There currently is no way make the source generator opt-in non-owned types into polymorphism, but it should still be possible to define a custom contract resolver deriving from JsonSerializerContext:

[JsonSerializable(typeof(NonOwnedBaseType))]
[JsonSerializable(typeof(NonOwnedDerivedType))]
public MyContext : JsonSerializerContext
{
}

public class MyContractResolver : IJsonTypeInfoResolver
{
     public JsonTypeInfo GetTypeInfo(Type type, JsonSerializerOptions options)
     {
          JsonTypeInfo sourceGenTypeInfo = ((IJsonTypeInfoResolver)MyContext.Default).GetTypeInfo(type, options);

          if (type == typeof(NonOwnedBaseType))
          {
               sourceGenTypeInfo.PolymorphismOptions = /* add polymorphism metadata as usual */
          }

          return sourceGenTypeInfo;
     }
}

var options = new JsonSerializerOptions { TypeInfoResolver = new MyContractResolver() };
JsonSerializer.Serialize(new NonOwnedDerivedType(), options);

In the future we might consider exposing polymorphism configuration options via the JsonSourceGenerationOptionsAttribute to make the experience better for source gen scenaria.

@BreyerW
Copy link

BreyerW commented Jun 15, 2022

@eiriktsarpalis hmm if sourcegen can actually pick up contracts and afaik public contract design with polymorphism is still in flight maybe there should be public static (or singleton) PolymorphismOptions mapping to which sourcegen would populate mapping based on attributes (also user could add programmatically mappings of lower priority than contract's mapping) and in actual serialization it would look up contract's mapping first (which would be empty in most cases) then public static mapping (which would be mostly filled with attribute based mapping).

Filling of static/singleton options could be done in generated module initializer or require custom contracts to be partial for source gen and generate population there (maybe in constructor or Initialize method). The latter option would let us delete static/singleton options and rely only on contract's options and in case someone wanted default contract with polymorphism then they can inhert default contract resolver with empty, partial class and register it, introducing minimal boilerplate for this scenario

The idea is to ensure theres no two separate ways for reflection and source gen serialization to enable non-owned types in polymorphism, which would ideally be achieved by fully sharing contract's interface between reflection and sourcegen mode

However if u dont want to invest into filling this gap for NET 7 then you should document somewhere this discrepancy and maybe even edit blog post about NET 7 preview 5 to reflect actual support https://devblogs.microsoft.com/dotnet/announcing-dotnet-7-preview-5/

EDIT: thinking about it further i might misunderstood you (english isnt my main language sorry). Did you mean only scenario where we want to enable non-owned types without custom contract isnt supported? But we can still add them manually via custom contract and avoiding custom contract would be enabled by future JsonSourceGenerationOptionsAttribute? At first I was under impression that sourceGenTypeInfo.PolymorphismOptions would be noop in sourcegen scenario but im not sure anymore ... Cant play with preview system.text.json and sourcegen for now so cant verify

@Hawxy
Copy link

Hawxy commented Jun 17, 2022

One limitation I noticed with the current implementation, it seems that $type is ordering sensitive.

If you're using jsonb PostgresSQL columns such as with EFCore or MartenDB, keys can end up in any order the database considers ideal for performance.

For example:

var databaseState = @"{
  ""BaseDictionary"": {
    ""6ed6e524-2ca4-4fea-8e21-7245ccb61863"": {
      ""Id"": ""6ed6e524-2ca4-4fea-8e21-7245ccb61863"",
      ""Name"": ""Something"",
      ""$type"": ""Derived"",
      ""OtherGuid"": ""d471c77d-5412-4e7a-a98d-8304e87792ed""
    }
  }
}";

JsonSerializer.Deserialize<WrapperType>(databaseState);

public record WrapperType(Dictionary<Guid, WrapperType.Base> BaseDictionary)
{
    [JsonDerivedType(typeof(Derived), nameof(Derived))]
    [JsonDerivedType(typeof(AlsoDerived), nameof(AlsoDerived))]
    public abstract record Base(Guid Id, string Name);
    public record Derived(Guid Id, string Name, Guid OtherGuid): Base(Id, Name);
    public record AlsoDerived(Guid Id, string Name) : Base(Id, Name);

}

Results in:

System.NotSupportedException: 'Deserialization of types without a parameterless constructor, a singular parameterized constructor, or a parameterized constructor annotated with 'JsonConstructorAttribute' is not supported. Type 'WrapperType+Base'. Path: $.BaseDictionary.6ed6e524-2ca4-4fea-8e21-7245ccb61863 | LineNumber: 3 | BytePositionInLine: 11.'

Newtonsoft had a MetadataPropertyHandling.ReadAhead setting to get around this, but I don't see anything similar for STJ.

@ishepherd
Copy link

ishepherd commented Jun 17, 2022

@Hawxy ugh, is that the STJ behaviour? That seems broken.

@eiriktsarpalis
Copy link
Member Author

eiriktsarpalis commented Jun 17, 2022

@Hawxy yes, this is a known limitation of the JSON metadata reader implementation (also used in features like reference preservation). Without getting into too much detail, it boils down to the fact that the internal converters have been written with async streaming in mind, as such the type discriminator might not even have been loaded in memory when the first properties are being read. In the future we might consider exposing an optional flag that works around this, at the expense of reading the same JSON twice and buffering excessive data in memory.

In the meantime, I think it might be reasonable to throw a better error message in cases where the base type is abstract and no type discriminators have been detected at the start of the object.

@BreyerW
Copy link

BreyerW commented Jun 17, 2022

@eiriktsarpalis Could you answer my question i made in edit? Need to understand actual limitations since im thinking of moving away from Newtonsoft to STJ and last obstacles are polymorphism and publicly exposed contract (well and support for private members at least just fields but i think i will be able to workaround it)

EDIT: thinking about it further i might misunderstood you (english isnt my main language sorry). Did you mean only scenario where we want to enable non-owned types without custom contract isnt supported? But we can still add them manually via custom contract and avoiding custom contract would be enabled by future JsonSourceGenerationOptionsAttribute? At first I was under impression that sourceGenTypeInfo.PolymorphismOptions would be noop in sourcegen scenario but im not sure anymore ... Cant play with preview system.text.json and sourcegen for now so cant verify

@eiriktsarpalis
Copy link
Member Author

At first I was under impression that sourceGenTypeInfo.PolymorphismOptions would be noop in sourcegen scenario but im not sure anymore ... Cant play with preview system.text.json and sourcegen for now so cant verify

The only difference between source gen and reflection is that the former generates metadata at compile-time while the latter uses reflection. The core serialization logic is agnostic as to the source of metadata so if PolymorphismOptions is specified it would be honored.

@BreyerW
Copy link

BreyerW commented Jun 17, 2022

@eiriktsarpalis in that case i was indeed under wrong impression and everything is okay with JsonSourceGenerationOptionsAttribute idea (but also feels kinda redundant if default resolver wont be sealed and be public and can be extended). Thanks for clarification.

@onionhammer
Copy link

onionhammer commented Jun 19, 2022

One thing that my library appears to have over this is compile time build errors for enum valued type discriminators.. not sure how this proposal handles enums here, the samples rely on ints and magic strings

https://github.com/wivuu/Wivuu.JsonPolymorphism

@AignerGames
Copy link

AignerGames commented Jun 22, 2022

I like the idea, but after reading this I have a few questions, because Im not sure if / what will be implemented of the following things:

  • Will this system also work for generic types (or interfaces), for example IMyGenericInterface < T > which is implemented by MyGenericClass < T > or MyStringClass < string > and uses as a class property of type public ISomeGeneric < string > MyStringProperty {get; set;}

  • Adding all the attribute for every "valid" type by hand seems a bit annoying, everytime you add a new class you also have to remember to add this attribute for the new derived type, is there no way to "automatically" detect and fill the list? I mean I can use reflection myself to detect all "valid" types, but how can I tell your system about this list? I can't add the attribute at runtime (correct me if Im wrong there)

The JsonTypeInfo examples posted above seem like a good option for that case, because then I can collect all "valid" types myself, even at runtime, and just add the info, but Im not sure if this is / will actually be added or was that just an example?

Ok now for the dumb question; I still don't fully understand the security issue, if I add a type property to the json like "Type" : "MyDerivedType" and the property is of type "MyBaseType", how is this a issue? It will find MyDerivedType and deserialize it, but only if it is an actual derived type of "MyBaseType" (it's easy to check). So it shouldn't be a issue, because MyDerivedType will only exist and be derived from MyBaseType if I added it to my code, it would only be an issue if I allow additional dlls to be loaded at runtime (but this has many other security issues anyways if not done with care) and someone adds a "bad MyEvilDerivedType" and then adds the type name in the json file, but unless I allow loading of additional dlls at runtime I don't see the issue?

Wouldn't it be a much simpler solution to just add a property to the (de)serializer settings with a list of "safe derived type" assembilies and allow the system to use all derived types (and interface implementations) of the defined assemblies without hard coding boilerplate code with every possible dervied type to the base type?
With this solution you would still restrict the scope of the dervied types and the system will not find / create instances of random evil types, because it will only check types in your predefined, "safe" assembly, which contains only your own types.

@eiriktsarpalis
Copy link
Member Author

Will this system also work for generic types (or interfaces)

Attribute annotations will not work for generic derived types, such configuration will need to happen using the contract resolver model. See the "Configuring Polymorphism via the Contract model" section in the OP.

Adding all the attribute for every "valid" type by hand seems a bit annoying

Likewise, authoring a custom contract resolver should let you define polymorphic hierarchies programmatically.

Ok now for the dumb question; I still don't fully understand the security issue, if I add a type property to the json like "Type" : "MyDerivedType" and the property is of type "MyBaseType", how is this a issue?

It becomes an issue when "MyBaseType" is high enough in the type hierarchy (e.g. object or IEnumerable<T>) to make it possible for attackers to force remote code execution using types already present in the BCL. It's not an issue of loading other assemblies as much as it is incorporating arbitrary types in the deserialization graph.

Wouldn't it be a much simpler solution to just add a property to the (de)serializer settings with a list of "safe derived type" assembilies

That's precisely what contract customization would be making, only configuration would be scoped to polymorphic types rather than the assembly level.

@AignerGames
Copy link

Thank you for the answers @eiriktsarpalis, I must have skipped the contract model section before. I checked the example and it seems to be exactly what I need.

I currently have a custom json serializer which collects all "valid" derived types from a specific "safe" assembly, which contains all the checks to find interface implementations and generic types etc. So it should be possible to simply change my implementaton to this the new feature and feed the custom contract model with my already existing code to collect the types.

@Michael-Bohin
Copy link

Thank you for the awesome work! I just copied one of the code snippets:

`[JsonPolymorhic(UnknownDerivedTypeHandling = JsonUnknownDerivedTypeHandling.FallBackToNearestAncestor)]
[JsonDerivedType(typeof(MyDerivedClass)]
public interface IMyInterface { }
public class MyDerivedClass : IMyInterface { }

public class TestClass : MyDerivedClass { }

JsonSerializer.Serialize(new TestClass()); // serializes using the contract for `MyDerivedClass``

At the first line there is a missing 'p', you might want to fix it. The typo is actually in more of those snippets. Thanks again for this update, will use it heavily.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 27, 2022
@eiriktsarpalis eiriktsarpalis removed the blocking Marks issues that we want to fast track in order to unblock other important work label Jun 28, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 28, 2022
@ilya-scale
Copy link

ilya-scale commented Jul 14, 2022

The new model that you have introduced seems like a big improvement and is a very welcome addition! One thing I was wondering and cannot find an answer to is how a type discriminator would behave if we will have the exact same field that we want to use in a runtime:

Example:

[JsonPolymorphic(CustomTypeDiscriminatorPropertyName = "Type")]
[JsonDerivedType(typeof(Base), typeDiscriminatorId: "base")]
[JsonDerivedType(typeof(Derived), typeDiscriminatorId: "derived")]
public class Base
{
    public string Type {get; }
    public int X { get; set; }
}

public class Derived : Base
{
    public int Y { get; set; }
}

What will happen if the DiscriminatorPropertyName will collide with an existing field and if it is possible to make it work "out of the box" that a field would both work as discriminator and a property.

Maybe this is really an anti-pattern to do it this way, and it is best to just check the C# type, but I have seen many places where code is using some additional Type field for various checks.

@eiriktsarpalis
Copy link
Member Author

@ilya-scale that's a good question actually. It would seem like the metadata writer does not check for property name conflicts even though the metadata reader does and will throw an exception. I think it might make sense to introduce a fix so that some form of validation takes place on the serialization side as well. I'll fork this into a separate issue.

@ilya-scale
Copy link

I have also found what seems to be a bug in the implementation and not sure if I should open a new github issue or not (since this is a preview package), I can just post it here perhaps:

It seems that the deserialization of the types is dependent that the descriminator type is the first field in Json. If I create json myself (i.e. not using System.Text.Json to serialize it first) and put the property not on top, then I get an exception:

Deserialization of types without a parameterless constructor, a singular parameterized constructor, or a parameterized constructor annotated with 'JsonConstructorAttribute' is not supported. Type 'MyType'. Path: $.SomePath.

While it is probably not a big issue, the json is order agnostic as far as I know and it should work anyway.

@bachratyg
Copy link

@ilya-scale this is a known limitation. See #63747 (comment)

@Hawxy
Copy link

Hawxy commented Jul 21, 2022

@eiriktsarpalis Is there an issue that tracks the above $type ordering limitation?

@eiriktsarpalis
Copy link
Member Author

There isn't one. Would you like to create it?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json Cost:L Work that requires one engineer up to 4 weeks Priority:2 Work that is important, but not critical for the release Team:Libraries User Story A single user-facing feature. Can be grouped under an epic.
Projects
None yet
Development

Successfully merging a pull request may close this issue.