Add XML Serialization classes #557

fredrikhr · 2024-05-28T20:10:23Z

XML Serialization

When the ClangSharp P/Invoke generator operates in output mode Xml, the generator outputs Xml data that includes the information the generator has inferred by processing the output from ClangSharp. As such the Xml data can be seen as an intermediate step to producing C# P/Invoke bindings for the C/C++ input.

This PR includes C# DTO-classes that can be used to deserialize the output generated by the ClangSharp P/Invoke generator. This is useful for tools down the line that want to create different code to the C# code generated by ClangSharp, e.g. in order to introduce some case-specific knowledge into the code generator that a general-purpose tool like the ClangSharp P/Invoke generator cannot have.

This PR is currently in draft-state to provide an example for a general for-or-against discussion originally posed by issue #550.

There are several aspects that should be adressed code-wise specifically in this PR:

Where should the serialization classes be located?

The ClangSharp.PInvokeGenerator library and nuget package contains a namespace XML which is the current location of the XmlOutputBuilder class, the class that generates the XML output. Placing the serialization classes in the same namespace as the class that actually generates the XML keeps the code in the same place.
Create a new class library (and nuget package) purely meant to for deserialization of generated XML output. This would imply a shared contract and the ClangSharp.PInvokeGenerator could be a tool that creates data conforming to that contract whereas other applications could be designed to consume data conforming to that contract (without depending on the generator itself).

This PR chose option 1. of the options above for simplicit. Option 2 while having some benefits seems too heavy handed.

Use `System.Xml.Serialization`

This PR uses annotations from the System.Xml.Serialization namespace. System.Runtime.Serialization.DataContractAttribute could possible be used as an alternative. Since the output mode is XML specific, using the XML-specific serialization technology seems like the better choice.

Naming and class design

The classes in this PR are meant to act as DTO-types only, i.e. they only represent the data exacly as emitted in the generated XML. For that purpose I simply added the round-tripping code in the Unit test library and tweaked the classes until all Unit tests that involve XML generation passed with a round-trippable output.

However, naming things is hard, and I tried to follow naming inspired by the type names passed as input to the various methods on XmlOutputBuilder (FieldDesc is one example leading to the name FieldDescElement for the type representing the <field> element in the output XML). Some of the methods on XmlOutputBuilder are a little more complex however and create multiple elements as part of their implementation, which is why some classes don't have a corresponding type in the Abstractions namespace.

We should have a discussion on what a reasonable naming scheme should be here and make the serialization classes be named consistently. One possiblity could be to just simply make the types be named by their XML-element name (e.g. ConstantElement for the <constant> tag).

I have also introduced some hierarchical structure to the types and created common base types in an attempt to de-duplicate some aspects of the generated XML (e.g. NamedAccessDescElement which describes an Element with a name and access attribute).

In order to guarantee round-trippable XML in the unit tests the serialization types needed to structure some properties in a way that guarateed identical ordering of XML elements (e.g. in the NamespaceElement class). Obviously fields in structs, v-tables and interfaces need to preserve ordering, but in other situations the ordering from the XML output is probably not required.

The `<value>` and `<code>` elements

A tricky part in the serialization classes are the <value> and <code> elements in the XML output. Currently, this PR exposes this by simply having a List<XmlNode> property on the CSharpCodeElement class which may not be ideal. Given that the generator code emits these elements with mixed XML Text and element contents we should discuss the level of detail that should be represented in the serialization types and how to best expose this. Especially code related to bitfields contains a lot of extra elements that might occur as descandants to <code>.

Should `XmlOutputBuilder` re-use the serialization classes to generate XML?

Currently the ClangSharp P/Invoke generator uses a simple StringBuilder approach to generate the XML output. The generated output contains no XML namespaces and the generated XML is fairly simple. However, once you have serialization types available that other applications make use of, there is the question of how to enforce that the serialization types match the generated output produced by the generator? Sure, the Unit tests do help in that regard, but an argument can be made that the XmlOutputBuilder should re-use the serialization types to procude the XML. That way it would become very unlikely that the XmlOutputBuilder generates something that is not losslessly deserializable with the serialization types.

fredrikhr · 2024-06-01T15:16:29Z

Since this is still in draft, I have expanded the desciption with more detail. Despite discussing the more general aspects in the related issue, I'd like some feedback on the points I outlined in the PR description which might greatly affect the code changes for this PR before it gets into the ready-for-review stage.

Add XML Serialization classes

5c7c16c

fredrikhr mentioned this pull request May 28, 2024

Standardization of XML Output #550

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add XML Serialization classes #557

Add XML Serialization classes #557

fredrikhr commented May 28, 2024 •

edited

Loading

fredrikhr commented Jun 1, 2024

Add XML Serialization classes #557

Are you sure you want to change the base?

Add XML Serialization classes #557

Conversation

fredrikhr commented May 28, 2024 • edited Loading

XML Serialization

Where should the serialization classes be located?

Use System.Xml.Serialization

Naming and class design

The <value> and <code> elements

Should XmlOutputBuilder re-use the serialization classes to generate XML?

fredrikhr commented Jun 1, 2024

fredrikhr commented May 28, 2024 •

edited

Loading

Use `System.Xml.Serialization`

The `<value>` and `<code>` elements

Should `XmlOutputBuilder` re-use the serialization classes to generate XML?