Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modification to F# discriminated unions implementation to support treating Fields as a Record #1662

Open
CarlKinghorn opened this issue Mar 29, 2018 · 14 comments

Comments

@CarlKinghorn
Copy link

Modification to F# discriminated unions implementation to support treating Fields as a Record

Serialization of F# discriminated unions in the current Json.NET implementation and in the option presented in issue #1547 result in unfavorable output in my opinion. Below I will list problems, propose solutions, and present a semi-trivial case with sample code to support these opinions. I have already begun work on this issue here.

Unfavorable result 1 - Current implemtation

Example of F# code and resultant Json from the current implementation

type DUSerializationTest =
    | CaseA of id:int * name:string * dateTimeStamp:DateTime
    | CaseB of string * int
    | CaseC of description:string * DUSerializationTest

let serializeTest =
    CaseA (1, "Carl", DateTime.Now)
    |> JsonConvert.SerializeObject

let serializeTest2 =
    CaseB ("CaseB Item1 Example", 2)
    |> JsonConvert.SerializeObject

let serializeTest3 =
    CaseC ("Recursive Type Definition Example", CaseB ("CaseB Item1", 2))
    |> JsonConvert.SerializeObject
{
    "Case": "CaseA",
    "Fields": [
        1,
        "Carl",
        "2018-03-28T00:54:08.0652638+00:00"
    ]
}

{
    "Case": "CaseB",
    "Fields": [
        "CaseB Item1 Example",
        2
    ]
}

{
    "Case": "CaseC",
    "Fields": [
        "Recursive Type Definition Example",
        {
            "Case": "CaseB",
            "Fields": [
                "CaseB Item1",
                2
            ]
        }
    ]
}

I find this to be functional but not usable in scenarios where interoperability or discovery is important. Having a simple array of "Fields" as a series of ordered values without labels does not give any kind of hint of intent to the consumer.

Furthermore, it is my experience that when looking at API documentation or working with 3rd parties (who sometimes just send sample Json payloads as documentation) it is common practice to use a tool like json2csharp to generate classes at least as a basis for implementing the interaction. Doing this on the output detailed above results in the following class definition.

public class RootObject
{
    public string Case { get; set; }
    public List<object> Fields { get; set; }
}

As you can see the List<object> Fields property leaves us with more responsibility being placed on the consuming developer to get the implementation correct based on documentation. It would be preferable to have Json that would result in generated code that is named and explictly typed.

Unfavorable result 2 - Issue #1547 proposed struct-based implementation

While issue #1547 presents a good potential solution to the 1st unfavorable result I believe there are three problems which should prevent this library from adopting it.

Issue 1 - Current usage

The current implementation of serialization of Discriminated Unions by Json.NET is already published and available for common use; this means utilizing the existing [Struct] common attribute of Discriminated Unions would mean that developers who have implemented and potentially persisted [Struct] attributed unions in data stores would unexpectedly and unexplainably (without researching) have to deal with their programs not behaving properly or data loss.

Issue 2 - Limited to [Struct]

Outside of the obvious desire one would have to minimize restrictions on functionality the [Struct] attribute limits the ability to have a recursive type definition.

Issue 3 - "Case" becomes a reserved word with no compiler warning

The following F# would result in two instances of "case" being used within the Json output.

[<Struct>]
type DUSerializationCaseTest =
    | CaseA of case:string * dateTimeStamp:DateTime
    | CaseB of Case:string * DateTimeStamp:DateTime

let serializeTest =
    CaseA ("Not ideal", DateTime.Now)
    |> JsonConvert.SerializeObject

let serializeTest2 =
    CaseB ("Problem", DateTime.Now)
    |> JsonConvert.SerializeObject

Example 1 - technically valid because of the case-sensitivity of Json which, because of the common F# convention to use camelCasing in union case fields, could have a low impact on the majority of users.

{
    "Case": "CaseA",
    "case": "Not ideal",
    "dateTimeStamp": "2018-03-28T00:54:08.0652638+00:00"
}

Example 2 - invalid, duplicate key

{
    "Case": "CaseA",
    "Case": "Problem",
    "DateTimeStamp": "2018-03-28T00:54:08.0652638+00:00"
}

Proposed Solution

Begin by solving the "current usage" problem listed above by leaving the default functionality and adding an attribute which can be used to decorate Discriminated Unions so that they may be treated differently when serializing and deserializing.

public class DiscriminatedUnionFieldsAsRecordAttribute : Attribute { }

Modify the output of serializing a Discriminated Union decorated in this fashion in the following ways:

  • Leave the "Case" and "Fields" top level keyword paradigm but change "Fields" to "Record". This is beneficial to isolate and differentiate the label:value pairs of the union case from the wrapper identifiers.
  • Treat the fields of the Case being serialized as a record. This is beneficial for interoperability, discovery, and readability purposes. Additionally, the implementation of the deserialization of the "Record" does not have to follow a strict ordering of values since we can match on the labels and create the union case more safely.

Example of proposed changes

[<DiscriminatedUnionFieldsAsRecordAttribute>]
type DUSerializationTest =
    | CaseA of id:int * name:string * dateTimeStamp:DateTime
    | CaseB of string * int
    | CaseC of description:string * DUSerializationTest

let serializeTest =
    CaseA (1, "Carl", DateTime.Now)
    |> JsonConvert.SerializeObject

Results in the following Json

{
    "Case": "CaseA",
    "Record": {
        "id": 1,
        "name": "Carl",
        "dateTimeStamp": "2018-03-28T00:54:08.0652638+00:00"
    }
}

Example of json2csharp output when these changes have been applied

public class Record
{
    public int id { get; set; }
    public string name { get; set; }
    public DateTime dateTimeStamp { get; set; }
}

public class RootObject
{
    public string Case { get; set; }
    public Record Record { get; set; }
}

As I hope you will agree, the results of the proposed modification yield better results.

Sample Usage

This concept uses a Discriminated Union to model events for an ordering system. The goal is that these events can be folded into a state representation of an "Order" which a different system will display, modify, and return changes as a series of events. A benefit of utilizing a Discriminated Union for our model here is that we get to enforce completeness when doing operations on these events (e.g. folding events to state).

Model of events

type OrderEventData = {id:Guid; dateTimeStamp:DateTime; orderId:string; reason:string}

[<DiscriminatedUnionFieldsAsRecordAttribute>]
type OrderEvents =
  | OrderCreated of orderEvent:OrderEventData
  | CustomerChanged of orderEvent:OrderEventData * customerId:string
  | ItemAdded of orderEvent:OrderEventData * itemId:string * row:Guid
  | ItemDeleted of orderEvent:OrderEventData * row:Guid
  | ItemQuantityChange of orderEvent:OrderEventData * row:Guid * newQuantity:decimal
  | ItemPriceChange of orderEvent:OrderEventData * row:Guid * newPrice:decimal

With sample data serialized from above as "documentation" it becomes easy to run that through something like json2csharp and do some light renaming / plumbing to get a useful base for interoperability.

Sample "documentation" (note how the "Case" gives us info on the "Record" naming we'll use below)

{
    "Case": "CustomerChanged",
    "Record": {
        "orderEvent": {
            "id": "626d7012-acb0-4eec-99dd-072e2dd34e6d",
            "dateTimeStamp": "2018-03-28T00:54:08.061706+00:00",
            "orderId": "12345",
            "reason": "user selected"
        },
        "customerId": "10080"
    }
}

{
    "Case": "ItemAdded",
    "Record": {
        "orderEvent": {
            "id": "8d15144d-bf40-4c15-b0a7-2f5ec77f8847",
            "dateTimeStamp": "2018-03-28T00:54:08.0621004+00:00",
            "orderId": "12345",
            "reason": "user selected"
        },
        "itemId": "80511",
        "row": "9ca04207-2c77-4d3a-9d38-1f6f3a632fb9"
    }
}

Generated and modified C# which could be created entirely from the context of the above "documentation" though I will admit I think it would be important to point out to a consumer that the "Case" value is significant.

    //Create some base abstract types to support the event model
    public abstract class EventBase { }
    public abstract class EventRecordBase { }


    //Limited this example to two of the six cases defined in the F# code since this will take up many lines
    public class OrderEventMetadata
    {
        public string id { get; set; }
        public DateTime dateTimeStamp { get; set; }
        public string orderId { get; set; }
        public string reason { get; set; }
    }

    public class CustomerChangedEvent : EventBase
    {
        public OrderEventMetadata orderEvent { get; set; }
        public string customerId { get; set; }
    }

    public class ItemAddedEvent : EventBase
    {
        public OrderEventMetadata orderEvent { get; set; }
        public string itemId { get; set; }
        public string row { get; set; }
    }


    //Define the wrapper for events in support of the Case / Record root object
    public abstract class TypedEventRecord<T> : EventRecordBase where T : EventBase
    {
        public TypedEventRecord() => Case = GetCase();

        protected abstract string GetCase();

        public string Case { get; set; }
        public T Record { get; set; }
    }

    public class CustomerChangedRecord : TypedEventRecord<CustomerChangedEvent>
    {
        protected override string GetCase() => "CustomerChanged";
    }

    public class ItemAddedRecord : TypedEventRecord<ItemAddedEvent>
    {
        protected override string GetCase() => "ItemAdded";
    }

    //Create a factory for taking in events and producing event records
    public static class EventRecordFactory
    {
        public static EventRecordBase CreateEventRecord(EventBase record)
        {
            if (record.GetType() == typeof(CustomerChangedEvent))
            {
                return new CustomerChangedRecord()
                {
                    Record = (CustomerChangedEvent)record
                };
            }
            if (record.GetType() == typeof(ItemAddedEvent))
            {
                return new ItemAddedRecord()
                {
                    Record = (ItemAddedEvent)record
                };
            }

            throw new ArgumentException("Record is not a valid supported event type.", "record");
        }
    }
@cmeeren
Copy link

cmeeren commented Mar 29, 2018

I like the idea of an attribute to control this behavior to preserve backwards compatibility, but serializing to Case and Record looks like a leaky abstraction, since with a Struct union, the fact that it's an F# union is just an implementation detail. If I'm not mistaken, you could skip Case and Record altogether and only serialize the Record fields directly. Then, by looking at which of the record fields are present in the JSON, you'd be able to determine which F# DU case to deserialize to. If there are fields from different one could throw a deserialization exception.

@CarlKinghorn
Copy link
Author

CarlKinghorn commented Mar 29, 2018

I agree that the Case and Record paradigm seems like a bit of a leaky abstraction however I'm stumped trying to find I may have found a better alternative that fits my goals. (see bottom)

What you've suggested works fine for the case of strictly [Struct] Discriminated Unions since part of the requirement is that, for multicase unions, each field is assigned a unique name.

Example

[<Struct>]
[<DiscriminatedUnionFieldsAsRecordAttribute>]
type DUStructTest =
  | CaseA of id:string * name:string
  | CaseB of int
  | CaseC of numberOfThings:int

"Flattens" easily to the following, which is all unique and all easily identifiable when deserializing back to a case of the type DUStructTest

{
    "id": "1234",
    "name": "Carl"
}
{
    "Item1": 1
}
{
    "numberOfThings": 35
}

The problem, of course, is that it is only for those [Struct] unions and I think the ideal solution is something that functions consistently based on either the default functionality or this new proposed attribute [DiscriminatedUnionFieldsAsRecordAttribute]

With more thought, I asked myself "Self, what if we could go without the Case root level label and just have the case of the union be the key that describes the record?"

Example

[<DiscriminatedUnionFieldsAsRecordAttribute>]
type DUNoCaseKeywordTest =
  | CaseA of id:string * name:string
  | CaseB of int
  | CaseC of numberOfThings:int

Serializes CaseA into the following

{
    "CaseA": {
        "id": "1234",
        "name": "Carl"
    }
}

This is definitely possible and gives us identifiable uniqueness. However I'm up in the air on if I like the resultant interoperability better as it would be created from json2csharp as follows:

public class CaseA
{
    public string id { get; set; }
    public string name { get; set; }
}

public class RootObject
{
    public CaseA CaseA { get; set; }
}

If you go back and evaluate the C# I built in my sample usage similar but less plumbing would be required to allow me to deal with just the event classes and emit those events with their wrapper when needing to send data.

Simplified Sample (no change to F# so not copied here)

{
    "CustomerChanged": {
        "orderEvent": {
            "id": "b676ed22-6c45-49d8-b72c-4bfa8a144343",
            "dateTimeStamp": "2018-03-29T23:38:08.2207764+00:00",
            "orderId": "12345",
            "reason": "user selected"
        },
        "customerId": "100080"
    }
}

{
    "ItemAdded": {
        "orderEvent": {
            "id": "4ada3558-0226-436b-a10b-58f0153d4c81",
            "dateTimeStamp": "2018-03-29T23:38:08.2267768+00:00",
            "orderId": "12345",
            "reason": "user selected"
        },
        "itemId": "09878",
        "row": "7a4e6a1b-6fc1-4505-947f-fa4c5d82ccca"
    }
}
    //Create some base abstract types to support the event model
    public abstract class EventBase { }
    public abstract class EventRecordBase { }


    //Limited this example to two of the six cases defined in the F# code since this will take up many lines
    public class OrderEventMetadata
    {
        public string id { get; set; }
        public DateTime dateTimeStamp { get; set; }
        public string orderId { get; set; }
        public string reason { get; set; }
    }

    public class CustomerChangedEvent : EventBase
    {
        public OrderEventMetadata orderEvent { get; set; }
        public string customerId { get; set; }
    }

    public class ItemAddedEvent : EventBase
    {
        public OrderEventMetadata orderEvent { get; set; }
        public string itemId { get; set; }
        public string row { get; set; }
    }

    public class CustomerChangedRecord : EventRecordBase
    {
        public CustomerChangedEvent CustomerChanged { get; set; }
    }

    public class ItemAddedRecord : EventRecordBase
    {
        public ItemAddedEvent ItemAdded { get; set; }
    }

    //Create a factory for taking in events and producing event records
    public static class EventRecordFactory
    {
        public static EventRecordBase CreateEventRecord(EventBase record)
        {
            if (record.GetType() == typeof(CustomerChangedEvent))
            {
                return new CustomerChangedRecord()
                {
                    CustomerChanged = (CustomerChangedEvent)record
                };
            }
            if (record.GetType() == typeof(ItemAddedEvent))
            {
                return new ItemAddedRecord()
                {
                    ItemAdded = (ItemAddedEvent)record
                };
            }

            throw new ArgumentException("Record is not a valid supported event type.", "record");
        }
    }

Which means I still have the option to do nice things like this

public void PostEvents(List<EventBase> eventList, OrderEventRecordRepository orderEventRecordRepository)
{
    eventList.ForEach
        (e => orderEventRecordRepository.Insert(EventRecordFactory.CreateEventRecord(e)));
}

So I'll be honest as I wrote this response and examples I began to like this last option of dropping the Case keyword and turning the Record keyword into the union case name best, what do you think?

CarlKinghorn added a commit to CarlKinghorn/Newtonsoft.Json that referenced this issue Mar 30, 2018
@CarlKinghorn
Copy link
Author

CarlKinghorn commented Mar 30, 2018

Published a new branch (serialization only so far no deserialization yet) with the changes from the above comment.

edit: ok cool Github adds a reference to a commit if you reference the issue number there... (it didn't pop up until after I added this comment)

@DaveEmmerson
Copy link

DUs having poor serialization is a pain. I've been looking at how to work around it, but didn't think about a PR to here. It's a good idea.

Your use case actually looks similar to mine - want to be able to sensibly serialize events.

+1 from me

@cmeeren
Copy link

cmeeren commented Mar 31, 2018

Check out Microsoft.FSharpLu.Json, which is more or less exactly what you're proposing, plus a little more (Option to null, unwrapping of single-case DUs), and indeed the package I have used in my own projects to serialize unions.

It's a good format for making the json less verbose and more "natural", but the compatibility with C# is not as nice as using my struct suggestion in #1547, hence my proposing that solution in the first place. I deliberately addressed only struct unions there because my suggestion would serialize struct unions to JSON that could easily be deserialized to a single POCO. That was the whole point of #1547.

With that in mind, should I re-open it? It seems we're trying to fix slightly different problems. You seem to be after a general solution to serializing any kind of union, but with no special handling of struct unions. I consider union serialization in general a solved problem with FSharpLu.Json, and just want special (opt-in is fine) handling of struct unions for better interoperability with C#.

@CarlKinghorn
Copy link
Author

I checked out FSharpLu.Json prior to writing the issue above. It is different than the current Json.NET implementation but does not solve the issues I raise in my first post. Also, I believe many .Net developers use Json.NET instead of other various solutions out there (including FSharpLu) and it would be nice to have the best solution as part of this project.

Example

type DUFSharpLuTest =
  | CaseA of id:string * name:string
  | CaseB of int
  | CaseC of numberOfThings:int

Still results in an array of fields with unlabeled values.

{
  "CaseA": [
    "1234",
    "Carl"
  ]
}

Regarding the reopening of #1547, of course if you do not like my take on this I wouldn't discourage having an open issue to solve your problem. However, I did not spend any time on this issue this weekend and I'm responding to your comment late (for me) Sunday. Let me think on a possible solution to this and maybe we can get the best of both worlds somehow.

CarlKinghorn added a commit to CarlKinghorn/Newtonsoft.Json that referenced this issue Apr 6, 2018
CarlKinghorn added a commit to CarlKinghorn/Newtonsoft.Json that referenced this issue May 7, 2018
- Added region to discriminated union converter tests with DiscriminatedUnionFieldsAsRecord Tests
- Covered all cases the standard discriminated union converter tests cover, except those not applicable.
- Added new Shape and Currency classes with DiscriminatedUnionFieldsAsRecord attribute applied

Changes related to issues found / clarification needed during test additions
- DiscriminatedUnionFieldsAsRecord Attribute target changed to Class from Struct
- Changed to write case property name regardless of whether or not the case has fields
- Change unexpected token exception message to make more consistent with others
- Add conditional to throw with a different exception message when no property with a union name is found.
@daniellittledev
Copy link

We've been using a custom serialiser for unions which formats based on the shape.

Enum Union
If the union only has case labels then it will serialise exactly the same as an enum would

type UniverseType =
    | Infinite
    | Finite

Value Union
If at least one case has a payload we fall back to an object with a property for the case. I believe this is the SharpLu.Json format.

type MyRecord = { Name: string }

type MyUnion = 
    | CaseA of int
    | CaseB of MyRecord

{
    "CaseA": 1
}
// or
{
    "CaseA": { "Name": "Rabbit" }
}

Fields Union
If the union contains multiple fields then we use an array

type MyUnion = 
    | CaseA of int * int * int

{
    "CaseA": [1, 2, 3]
}

Anyone interested in a Pull Request? (in F#, or not)

@C0DK
Copy link

C0DK commented Jul 22, 2023

This is still an open issue, as far as can tell. We currently don't have any great solution, and it is a huge impediment, essentially making Newtonsoft unusable for my cases. (pun sorta intended)

@bartelink
Copy link

bartelink commented Jul 23, 2023

FsCodec's UnionConverter does close to what you specify
And has an interoperable equivalent for STJ

One slight gap is that I'd recommend backporting the autoTypeSafeEnumToJsonString and autoUnionToJsonObject options and implementing jet/FsCodec#96 to avoid the nasty surprised the default impl causes (I use FsCodec for current work and we have stacks of disgusting renderings of type safe enums in our stores)

One nice to have would be an interoperable reader that can accept a mangled rendering but write the correct one for new things, i.e. accept:

"items": [
        {
          "serviceId": "5c8795be52e34e82883d61babed19513",
          "serviceKind": {
            "Case": "ProductB"
          }
        },
        {
          "serviceId": "8a55ebbf0d404485b50da95bdb53f7b3",
          "serviceKind": {
            "Case": "ProductB"
          }
        }
      ]
    },

but write:

"items": [
                            {
                                "serviceId": "5c8795be52e34e82883d61babed19513",
                                "serviceKind": "ProductB"
                            },
                            {
                                "serviceId": "8a55ebbf0d404485b50da95bdb53f7b3",
                                "serviceKind": "ProductA"
                            }
                        ]

In case anyone has not read the tea leaves, NSJ is never going to fix this in the box - it's nowhere near as high priority as lots of other issues in this repo.


NOTE FsCodec is very much a live project with a future, and PRs are accepted and released pronto (by me),

Added issues: jet/FsCodec#98 and jet/FsCodec#97 covering the above (arguably...) nice to haves

Though I will admit to defaulting to just using the STJ variant for new projects so I don't personally feel the pain on a daily basis which would otherwise drive me to implement these already!

@CarlKinghorn
Copy link
Author

Should this issue remain open? The open pull request had a ton of opinions on it, and I ran into time issues trying to address back in 2018. If someone wants to take over, you're welcome to.

Since the current guidance in .NET is to use System.Text.Json, and attempting to serialize discriminated unions throws a System.NotSupportedException with the recommendation to write a custom converter. Maybe the effort should be spent there?

Current System.Text.Json "JsonSerializer.Serialize" exception on a DU:
image

@bartelink
Copy link

Not convinced you read my message but...

I dont think this issue is any more or less valid than the rest of the issues here.

The problem is simply that the project is unmaintained (though it is receiving minimal attention wrt extremely obvious bugs and good attention wrt security bugs etc, which is not to be sniffed at)

Plenty people need to deal with NSJ. FsCodec is an example of an NSJ union encoding impl that is maintained

New projects definitely should use STJ. It's not as feature rich, but that's not always a bad thing.

FsCodec has a UnionEncoder that implements the same semantics for STJ and NSJ.

My main point was to call out that a) there is an impl both sides and b) that represents a bridge

And that exception, I love it - its far better than the NSJ equivalent, hence jet/FsCodec#96. I'd argue that if this issue is to be closed, it should be replaced with a clone of that issue - the current behavior is a very unfortunate trap, and having an easy to opt into guard in the box would be a great thing to have (and is also way more likely to get merged and released).

There is an open issue on STJ regarding adding an impl in the box. As noted in dotnet/runtime#55744 there are current good solutions that make the absence of an in the box impl effectively a non issue (IMO)

  • FsCodec does something (that's intentionally minimal, and interoperates with FsCodec.NewtonsoftJson but there are also equivalent impls that consume that format for Java etc). Think 200 LOC for the union encoding.
  • FSharp.SystemTextJson has a very complete F#-specific end to end serialisation story, with good features and test coverage. It also has extremely rich DU encoding support. Think 5000 LOC (Union Encoding is not seperable).

In the long term, Union support in STJ is likely to be driven by C# language features (and that's also a likely impediment with them taking anything into the box too)

@CarlKinghorn
Copy link
Author

I'm sorry, I was responding more to the comment above yours that this was still an open issue and it being a huge impediment; I should have been clearer. Your recommendations are on point, and I appreciate your feedback. I'm going to take a look at FsCodec!

@bartelink
Copy link

👍 Contributions are definitely welcome in FsCodec.

In general it's trying to be minimal, and have a story about STJ vs NSJ interop (which makes the full semantics in the OP probably a stretch).

Having said that, if there is an NSJ converter that is likely to be useful in multiple codebases, I don't necessarily have a problem with it living there, even without an equivalent STJ side implementation.

But, most dearly of all, I'd love to see that glorious exception you showed, (which saved you, and people following in your footsteps from a random ill thought out behavior) could also be implemented for NSJ!

@ingted
Copy link

ingted commented Sep 16, 2024

type DUFSharpLuTest =
  | CaseA of id:string * name:string
  | CaseB of int
  | CaseC of numberOfThings:int

Still results in an array of fields with unlabeled values.

{
  "CaseA": [
    "1234",
    "Carl"
  ]
}

Actually there is no way to access id, name and numberOfThings in F#... Don't see any reason to have them in the serialized Json?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants