Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Obsolete SerializationFormat.Binary in DataTable/DataSet and put it behind an appcontext switch #39289

Closed
GrabYourPitchforks opened this issue Jul 14, 2020 · 12 comments · Fixed by #65139
Assignees
Milestone

Comments

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Jul 14, 2020

Ref:

for (int i = 0; i < Tables.Count; i++)
{
BinaryFormatter bf = new BinaryFormatter(null, new StreamingContext(context.State, false));
MemoryStream memStream = new MemoryStream();
bf.Serialize(memStream, Tables[i]);
memStream.Position = 0;
info.AddValue(string.Format(CultureInfo.InvariantCulture, "DataSet.Tables_{0}", i), memStream.GetBuffer());
}

//Tables, Columns, Rows
for (int i = 0; i < tableCount; i++)
{
byte[] buffer = (byte[])info.GetValue(string.Format(CultureInfo.InvariantCulture, "DataSet.Tables_{0}", i), typeof(byte[]))!;
MemoryStream memStream = new MemoryStream(buffer);
memStream.Position = 0;
BinaryFormatter bf = new BinaryFormatter(null, new StreamingContext(context.State, false));
DataTable dt = (DataTable)bf.Deserialize(memStream);
Tables.Add(dt);
}

This issue tracks the removal of this code per the BinaryFormatter obsoletion plan.

From offline discussions with Arthur and friends, we may want to eliminate support for SerializationFormat.Binary anyway, since it's not compatible between .NET Framework and .NET Core / .NET 5.0+. The only data format compatible between the different runtimes is SerializationFormat.Xml.

@ghost
Copy link

ghost commented Jul 14, 2020

Tagging subscribers to this area: @roji, @ajcvickers
Notify danmosemsft if you want to be subscribed.

@Wraith2
Copy link
Contributor

Wraith2 commented Jul 15, 2020

What should the behaviour be if a user attempts to use binary serialization once the code to allow it has been removed? silent no-op or throwing of an exception? If an exception there is already an invalid enum value exception (I think ArgumentException) present which could be thrown or should there be a new one indicating that it has been removed? Should SerializationFormat.Binary be marked as obsolete? Should any exceptions in System.Data.Common have their serialization/deserialization methods marked as obsolete while this work is being done?

@GrabYourPitchforks
Copy link
Member Author

From offline conversations the most straightforward way to do this would be to mark the Binary enum field as obsolete and to have the SerializationFormat property setter throw ArgumentException if any value other than Xml is provided. Let that bake for a release, then if all is well rip out the now-unused logic.

@ajcvickers ajcvickers removed the untriaged New issue has not been triaged by the area owner label Jul 16, 2020
@jeffhandley
Copy link
Member

jeffhandley commented Jul 18, 2020

@GrabYourPitchforks would we want to reuse the same diagnostic id as binary formatter’s obsoletion (SYSLIB0011)?

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented Jul 18, 2020

@jeffhandley I don't think so. There are two issues at play here.

The first issue (as tracked here) is the removal of BinaryFormatter within DataSet itself. This would still allow DataSet instances themselves to be serialized with BinaryFormatter, it's just that they'd be forced to use the same XML-based payload format that Full Framework uses. This is different from "consumers of .NET should stop using BinaryFormatter" and probably deserves its own obsoletion code.

The second issue (tracked by dotnet/designs#141) is the eventual removal of [Serializable] attributes, serialization ctors, and ISerializable interfaces from all built-in .NET types, including DataSet. At that point, it won't matter whether you specify SerializationFormat.Binary or SerializationFormat.Xml. Neither will work since the code paths which consumed them will no longer exist.

Note: I should clarify a possible point of confusion. The SerializationFormat enum is applicable only when using BinaryFormatter to serialize or deserialize an instance of DataSet or DataTable. The enum value controls whether BinaryFormatter itself should use a binary or an XML format for the persisted payload. You can continue to use XmlSerializer to serialize / deserialize instances of DataSet and DataTable (though this is discouraged for untrusted input) because XmlSerializer never looks at the value of this enum.

@pgovind
Copy link
Contributor

pgovind commented Feb 11, 2021

The SerializationFormat enum is applicable only when using BinaryFormatter to serialize or deserialize an instance of DataSet or DataTable. The enum value controls whether BinaryFormatter itself should use a binary or an XML format for the persisted payload.

Can someone clarify this? This statement seems to suggest that BinaryFormatter can be used regardless of what SerializationFormat's value is? I'm looking at SerializeDataSet and see this code

if (remotingFormat != SerializationFormat.Xml)
{
    use BF to serialize. Nothing here suggests XML unless I'm missing something?
}
else
{
                // old behaviour
                string strSchema = GetXmlSchemaForRemoting(null);

                string? strData = null;
                info.AddValue(KEY_XMLSCHEMA, strSchema);

                StringBuilder strBuilder = new StringBuilder(EstimatedXmlStringSize() * 2);
                StringWriter strWriter = new StringWriter(strBuilder, CultureInfo.InvariantCulture);
                XmlTextWriter w = new XmlTextWriter(strWriter);
                WriteXml(w, XmlWriteMode.DiffGram);
                strData = strWriter.ToString();
                info.AddValue(KEY_XMLDIFFGRAM, strData);

}

It looks like SerializationFormat.Binary uses BinaryFormatter while SerializationFormat.XML uses XMLWriter?

cc @GrabYourPitchforks @ajcvickers

@GrabYourPitchforks
Copy link
Member Author

DataTable and SerializationFormat are something of strange beasts. The best way to think about DataTable is as a container around other data. The SerializationFormat enum controls how the data within the container is serialized, and this could be using a format different than the container (DataTable) itself.

The default value is DataTable.RemotingFormat = RemotingFormat.Xml. This default value means that if you're using code like the following:

var dt = new DataTable(/* ... */) { RemotingFormat = RemotingFormat.Xml };
var bf = new BinaryFormatter(/* ... */);
bf.Serialize(memStream, dt);

The container (DataTable) is written using a BinaryFormatter-serialized format, because the caller instantiated a BinaryFormatter and passed a DataTable to it. But under the covers, DataTable is really XML-serializing its internal values and writing a bunch of XML literal strings to the underlying formatter. If you look at the raw payload bytes, it basically looks like the payload says "Hi, I'm a BinaryFormatter-serialized DataTable, and I have a bunch of string members!" It's akin to BinaryFormatter-serializing the following pseudo-code:

[Serializable]
public class MyDataTable
{
    public string ColumnName1;
    public Type ColumnType1;
    public string ColumnValue1SerializedAsXml;
    public string ColumnName2;
    public Type ColumnType2;
    public string ColumnValue2SerializedAsXml;
    /* ... */
}

If you instead set DataTable.RemotingFormat = RemotingFormat.Binary, then it recursively calls back into BinaryFormatter for each value it's serializing. So both the container (the DataTable itself) and all of its internal values are using BinaryFormatter. It's akin to BinaryFormatter-serializing the following pseudo-code:

[Serializable]
public class MyDataTable
{
    public string ColumnName1;
    public Type ColumnType1;
    public byte[] ColumnValue1SerializedAsBinaryFormatter; // note: raw bytes instead of string
    public string ColumnName2;
    public Type ColumnType2;
    public byte[] ColumnValue2SerializedAsBinaryFormatter; // note: raw bytes instead of string
    /* ... */
}

When serializing, DataTable also inserts a marker specifying which RemotingFormat was in use, that way it can call back into the correct code paths during deserialization. If the incoming payload contains the RemotingFormat.Xml marker, it unwraps its inner contents using XML serialization. If the incoming payload contains the RemotingFormat.Binary marker, it unwraps its inner contents using BinaryFormatter serialization.

The suggestion here is to dishonor the RemotingFormat.Binary switch for both serialization and deserialization unless the app developer has explicitly consented to its usage.

@roji roji modified the milestones: Future, 7.0.0 Feb 10, 2022
@roji roji self-assigned this Feb 10, 2022
roji added a commit to roji/runtime that referenced this issue Feb 10, 2022
@roji roji changed the title Remove BinaryFormatter usage from DataSet Obsolete SerializationFormat.Binary in DataTable/DataSet and put it behind an appcontext switch Feb 10, 2022
@roji
Copy link
Member

roji commented Feb 10, 2022

Submitted #65139 to obsolete SerializationFormat.Binary and put it behind an appcontext switch, for .NET 7.0. #65140 tracks removing the logic entirely in .NET 8.0.

Additional tasks:

@GrabYourPitchforks let me know if this all looks OK and if there's anything I've missed.

@gewarren
Copy link
Contributor

@roji Obsoleting SerializationFormat.Binary is a breaking change, correct? If so, can you log an issue at https://github.com/dotnet/docs/issues/new?assignees=gewarren&labels=breaking-change%2CPri1%2Cdoc-idea&template=breaking-change.yml&title=%5BBreaking+change%5D%3A+? I can add the docs for the new SYSLIB0038 diagnostic.

@roji
Copy link
Member

roji commented Mar 18, 2022

@gewarren thanks - opened dotnet/docs#28726.

Can you point me to where I'd add some docs on the DataSet and DataTable security guidance page?

@gewarren
Copy link
Contributor

gewarren commented Mar 18, 2022

Can you point me to where I'd add some docs on the DataSet and DataTable security guidance page?

You can edit that file here: https://github.com/dotnet/docs/blob/main/docs/framework/data/adonet/dataset-datatable-dataview/security-guidance.md. There's a section titled "Deserialize a DataSet or DataTable via BinaryFormatter".

@roji
Copy link
Member

roji commented Aug 8, 2022

Tracking the remaining docs work on my list, will do that as coding work winds down on 7.0.

@roji roji closed this as completed Aug 8, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Sep 7, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.