Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reflection-based XmlSerializer - Deserialize empty collections and allow for sub-types in collection items. #111723

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

StephenMolloy
Copy link
Member

The XmlSerializer has two implementations that it selects based on various criteria. The preferred implementation is ILGenerator-based, and its behavior closely mimics that of XmlSerializer in NetFx. The reflection-based implementation has some areas where behavior does not match. This PR addresses two of those.

Null Collections:

When deserializing, the NetFx and ILGen serializers will by default create empty collections for any list-like-but-not-array members. This means that even if a list is left out of an xml payload or explicitly tagged as 'xsi:nil', the deserializer will still create an empty list. This is unfortunate, but it is how XmlSerializer has behaved for decades.

The reflection-based serializer actually gets this "correct", but in doing so it behaves differently from other implementations of XmlSerializer that are used in 99% of cases. This PR shifts the reflection-based serializer to creating empty collections instead of allowing them to be null.

Sub-Types as Collection items:

The NetFx and ILGen serializers both correctly allow for items within a collection to be a known XmlInclude-d sub type. The reflection-based serializer was not accounting for known sub-types. This PR also addresses that.

Test Enhancements:

Added new test cases to handle scenarios with empty and null lists, ensuring that the serialization and deserialization processes handle these cases correctly. Also added new test scenarios to test the ability of both serializers to handle known sub-types as collection items.

Fixes #107252, #108432

@StephenMolloy StephenMolloy added this to the 10.0.0 milestone Jan 22, 2025
@StephenMolloy StephenMolloy self-assigned this Jan 22, 2025
@StephenMolloy StephenMolloy requested a review from Copilot January 22, 2025 21:05
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 6 out of 8 changed files in this pull request and generated 2 comments.

Files not reviewed (2)
  • src/libraries/System.Private.Xml/src/System/Xml/Serialization/XmlSerializer.cs: Evaluated as low risk
  • src/libraries/System.Runtime.Serialization.Xml/tests/SerializationTypes.RuntimeOnly.cs: Evaluated as low risk
Comments suppressed due to low confidence (1)

src/libraries/System.Private.Xml/tests/XmlSerializer/XmlSerializerTests.cs:2523

  • The method 'AssertTypeAndUnwrap' is introduced but it is not clear if it is covered by tests. Ensure this method is tested.
private static Exception AssertTypeAndUnwrap<T>(object exception, string? message = null) where T : Exception

@StephenMolloy StephenMolloy requested a review from Copilot February 3, 2025 20:45

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again, by re-requesting a review.

@StephenMolloy StephenMolloy requested a review from Copilot February 3, 2025 20:48
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 6 out of 11 changed files in this pull request and generated 1 comment.

Files not reviewed (5)
  • src/libraries/System.Private.Xml/src/System/Xml/Serialization/Mappings.cs: Evaluated as low risk
  • src/libraries/System.Private.Xml/src/System/Xml/Serialization/ReflectionXmlSerializationReader.cs: Evaluated as low risk
  • src/libraries/System.Private.Xml/src/System/Xml/Serialization/ReflectionXmlSerializationWriter.cs: Evaluated as low risk
  • src/libraries/System.Private.Xml/src/System/Xml/Serialization/XmlSerializer.cs: Evaluated as low risk
  • src/libraries/System.Private.Xml/src/System/Xml/Serialization/Types.cs: Evaluated as low risk
Comments suppressed due to low confidence (2)

src/libraries/System.Private.Xml/tests/XmlSerializer/XmlSerializerTests.RuntimeOnly.cs:1036

  • The word 'defference' is misspelled. It should be 'deference'.
// Order matters. Legacy behavior gives defference to base mappings. So we start there.

src/libraries/System.Private.Xml/src/System/Xml/Serialization/XmlReflectionImporter.cs:2243

  • Ensure that the RemoveUniqueAccessor method is covered by tests to verify its correctness.
private static void RemoveUniqueAccessor(MemberMapping member, INameScope elements, INameScope attributes, bool isSequence)

@StephenMolloy StephenMolloy marked this pull request as ready for review February 5, 2025 22:08
@HongGit HongGit requested a review from Copilot February 6, 2025 22:14

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 6 out of 11 changed files in this pull request and generated no comments.

Files not reviewed (5)
  • src/libraries/System.Runtime.Serialization.Xml/tests/SerializationTypes.RuntimeOnly.cs: Evaluated as low risk
  • src/libraries/System.Runtime.Serialization.Xml/tests/SerializationTypes.cs: Evaluated as low risk
  • src/libraries/System.Private.Xml/src/System/Xml/Serialization/XmlSerializer.cs: Evaluated as low risk
  • src/libraries/System.Private.Xml/src/System/Xml/Serialization/Types.cs: Evaluated as low risk
  • src/libraries/System.Private.Xml/src/System/Xml/Serialization/XmlReflectionImporter.cs: Evaluated as low risk
Comments suppressed due to low confidence (11)

src/libraries/System.Private.Xml/src/System/Xml/Serialization/Mappings.cs:985

  • Ensure that the new Hides method is covered by tests to verify its correctness in identifying hidden members.
internal bool Hides(MemberMapping mapping)

src/libraries/System.Private.Xml/tests/XmlSerializer/XmlSerializerTests.RuntimeOnly.cs:902

  • The error message is unclear and could be improved for better understanding.
Assert.Equal("Cannot deserialize type 'SerializationTypes.TypeWithPrivateSetters' because it contains property 'PrivateSetter' which has no public setter.", ex.Message);

src/libraries/System.Private.Xml/tests/XmlSerializer/XmlSerializerTests.RuntimeOnly.cs:1032

  • The error message is unclear and could be improved for better understanding.
Assert.Contains("AlwaysNullPropertyPrivateSetter", ex.Message);

src/libraries/System.Private.Xml/tests/XmlSerializer/XmlSerializerTests.RuntimeOnly.cs:1092

  • The error message is unclear and could be improved for better understanding.
AssertXmlMappingException(ex, "SerializationTypes.HideArrayWithRenamedElement", "ListField", "Member 'HideArrayWithRenamedElement.ListField' hides inherited member 'BaseWithElementsAttributesPropertiesAndLists.ListField', but has different custom attributes.");

src/libraries/System.Private.Xml/tests/XmlSerializer/XmlSerializerTests.RuntimeOnly.cs:2855

  • The error message is unclear and could be improved for better understanding.
Assert.Contains("Invalid or missing value of the choice identifier", ex.Message);

src/libraries/System.Private.Xml/tests/XmlSerializer/XmlSerializerTests.RuntimeOnly.cs:2914

  • The error message is unclear and could be improved for better understanding.
Assert.Contains("is missing enumeration value", ex.Message);

src/libraries/System.Private.Xml/src/System/Xml/Serialization/ReflectionXmlSerializationReader.cs:1690

  • The method signature for CreateDelegate does not match the delegate type Func<MemberInfo, ReflectionXmlSerializationReaderHelper.SetMemberValueDelegate>. This could lead to runtime errors.
var getSetMemberValueDelegateWithType = getSetMemberValueDelegateWithTypeMi.CreateDelegate<Func<MemberInfo, ReflectionXmlSerializationReaderHelper.SetMemberValueDelegate>>();

src/libraries/System.Private.Xml/src/System/Xml/Serialization/ReflectionXmlSerializationReader.cs:1648

  • The changes introduce new behaviors for handling list-like members without public setters and ensuring empty collections. There should be test cases that specifically cover these scenarios to ensure the new behavior works as expected.
var isList = mapping.TypeDesc.IsArrayLike && !mapping.TypeDesc.IsArray;

src/libraries/System.Private.Xml/src/System/Xml/Serialization/ReflectionXmlSerializationWriter.cs:162

  • Potential NullReferenceException if choiceSources is not an array or is null. Suggest using 'as' operator for safe casting.
var choiceSource = ((Array?)choiceSources)?.GetValue(i);

src/libraries/System.Private.Xml/src/System/Xml/Serialization/ReflectionXmlSerializationWriter.cs:178

  • Potential NullReferenceException if choiceSources is not an array or is null. Suggest using 'as' operator for safe casting.
var choiceSource = ((Array?)choiceSources)?.GetValue(c++);

src/libraries/System.Private.Xml/src/System/Xml/Serialization/ReflectionXmlSerializationWriter.cs:232

  • Potential NullReferenceException if choiceSource is null. Suggest adding a null check for choiceSource.
if (choiceFullName == enumFullName)
string enumTypeName = choice.Mapping!.TypeDesc!.FullName;
string enumFullName = (enumUseReflection ? "" : enumTypeName + ".@") + FindChoiceEnumValue(element, (EnumMapping)choice.Mapping, enumUseReflection);
string choiceFullName = (enumUseReflection ? "" : choiceSource!.GetType().FullName + ".@") + choiceSource!.ToString();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might have a problem here if the enum has the [Flags] attribute as ToString might return multiple names that are comma delimited.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a little trouble reasoning about this. Which is why I decided to not be clever and just do what ILGen does. This sequence here was pretty much modeled after the code here: https://github.com/StephenMolloy/runtime/blob/36093e63bbba5bbea9c970fb45adf336181aa86b/src/libraries/System.Private.Xml/src/System/Xml/Serialization/XmlSerializationWriter.cs#L3684

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I don't think the 'enum' here is the property itself. Rather it refers to the list of enum values that are used to identify the type of an object in a choice list. Like is used in the 'choice' classes here: https://github.com/StephenMolloy/runtime/blob/36093e63bbba5bbea9c970fb45adf336181aa86b/src/libraries/System.Runtime.Serialization.Xml/tests/SerializationTypes.cs#L970-L1000
I'm not sure [Flags] has anything to do with this scenario, although I suppose it wouldn't hurt to double check.

for (int i = 0; i < list.Count; i++)
{
if (memberMapping.Hides(list[i]))
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this code run on every serialization/deserialization or is it initialization code only? If it's on every serialization/deserialization, has the performance impact of this been measured? If you have a class with a lot of members, this is O(n^2) so has the potential to be a problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the ILGen serializer, this only runs once on initialization. Currently in the reflection-based implementation, this xml-mapping process happens for each call to serialize/deserialize. Which is one of the first things I'd look at when getting around to general reflection-serializer improvements, because it seems like it really only needs to happen once. But I did not address that here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this is the xml-mapping code that I wouldn't want to bring back to previous versions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just re-read my comment here and think I can state this a little better. It is the case for both serializers that when they are building their tree of code/IL/reflection-calls/whatever, they enter this code path to get member details for the type they are working with. In the case of the ILGen serializer it uses the results of this to generate IL that gets persisted in a TempAssembly and reused on each serialization, so it never needs to get this information again. In the case of the reflection-based serializer though, the reflection-call tree is not preserved. So this does get called multiple times. But this is the crux of the major perf improvements we want to make with the reflection serializer. When those updates come, this should only be called once.

Comment on lines +1636 to +1640
// Always create a collection for (non-array) collection-like types, even if the XML data says the collection should be null.
if (!mapping.TypeDesc.IsArray)
{
member.Collection ??= new CollectionMember();
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When considering whether it would make more sense to initialize empty lists above here after creating our object - rather than cleaning up after deserialization with an 'ensure' approach - I noticed that this case here was missed. Added one more property to one of the new testcases to hit this code path and ensure serializer parity here.

FWIW, I decided to stick with the 'ensure' approach because to do the proactive initialization above here would require duplicating a lot of the silly property-detection logic happening below here. I was getting messy and duplicative.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants