Add a fast and convenient deserialization API to ADO.NET #25739

Closed
GSPP opened this issue Apr 3, 2018 · 33 comments
Labels
area-System.Data · enhancement (Product code improvement that does NOT require public API changes/additions) · tenet-performance (Performance related issue)

Comments

@GSPP

GSPP commented Apr 3, 2018

When using ADO.NET to read rows, the deserialization cost can be very significant. This is important in scenarios where a lot of data is streamed. Here is directly executable code to demonstrate it:

    using System;
    using System.Data.SqlClient;

    static class Program
    {
        static void Main()
        {
            DeserializationBenchmark.Run();
        }

        public static class DeserializationBenchmark
        {
            public static void Run()
            {
                using (var connection = new SqlConnection("Data Source=(local); Integrated Security=True; Initial Catalog=master;"))
                {
                    connection.Open();

                    PrepareTestData(connection);

                    RunBenchmarkLoop();
                }
            }

            static void PrepareTestData(SqlConnection connection)
            {
                using (var command = new SqlCommand(@"SELECT o1.object_id value1, o1.object_id value2, o1.object_id value3, o1.object_id value4, o1.object_id value5 INTO ##BenchmarkRows FROM sys.all_objects o1 cross join (select top 100 * from sys.all_objects o2) o2", connection))
                {
                    command.ExecuteNonQuery();
                }
            }

            static void RunBenchmarkLoop()
            {
                long count = 0;

                while (true)
                {
                    ReadData();

                    count++;

                    if (count % 10 == 0)
                        Console.WriteLine($"Completed {count} iterations.");
                }
            }

            static void ReadData()
            {
                using (var connection = new SqlConnection("Data Source=(local); Integrated Security=True; Initial Catalog=master;"))
                {
                    connection.Open();

                    using (var transaction = connection.BeginTransaction(System.Data.IsolationLevel.ReadCommitted))
                    {
                        using (var command = new SqlCommand(@"SELECT * FROM ##BenchmarkRows", connection, transaction))
                        using (var reader = command.ExecuteReader())
                        {
                            while (reader.Read())
                            {
                                var value1 = reader.GetInt32(0);
                                var value2 = reader.GetInt32(1);
                                var value3 = reader.GetInt32(2);
                                var value4 = reader.GetInt32(3);
                                var value5 = reader.GetInt32(4);
                            }
                        }

                        transaction.Commit();
                    }
                }
            }
        }
    }

This code simply deserializes a table of a few thousand rows with five int columns. The benchmark is designed to demonstrate the issue, but it is not unrealistic. I also followed ADO.NET best practices.

[profiler screenshot: the large majority of CPU time is spent inside SqlDataReader.GetInt32]

I might be mistaken but it seems that a lot of the time is spent calling chatty APIs on SqlDataReader. It looks like there is a lot of API overhead and comparatively little work actually parsing the network protocol. The console process uses 100% of one CPU core. SQL Server uses only about 1/3 of one CPU core. This is further (heuristic) evidence that the client is less efficient than the server.

There is no way to avoid this cost at the moment. The code is as fast as it gets. I suggest the following enhancement: Provide an API that directly deserializes into a user-provided type.

Old:

IDataReader reader = command.ExecuteReader();
while (reader.Read()) {
    // Read all fields individually
}

New:

IEnumerable<MyDTO> resultRows = command.ExecuteObjects<MyDTO>();

This API could be made extremely fast because:

  • It is less chatty. ADO.NET can access its internal buffers more directly.
  • Metadata overhead has to be incurred only once for the whole result set. Many per-row and per-field metadata accesses and validations can be eliminated.
  • ADO.NET can compile a lambda expression dynamically that pulls data from the internal byte buffers and pushes it into the user-provided type. To illustrate the point: The expression could look like this: new MyDTO() { Field1 = internalReader.GetNextFieldAsInt32(), Field2 = internalReader.GetNextFieldAsString(), ... }.
  • The lambda can be cached internally. Alternatively an API consumer would use a factory to create a "fast reader" and cache the reader.
  • The entire row can be read in one go. I believe this makes it faster to parse the TDS protocol.
  • If async is in use there will be less async overhead because the APIs are less chatty.

Possible extensions of this idea would be:

  • Allow structs as well as classes
  • Allow invoking a constructor
  • Allow sequences of the form IEnumerable<object[]> where all field values are provided as objects.
  • Add a pull-style API. Very rough API idea: readerObject.GetNextObject<MyDTO>().
  • Add a way to provide and reuse preallocated objects e.g. readerObject.GetNextObject<MyDTO>(existingObject).
  • Async support (IAsyncEnumerable or other suitable mechanisms).
  • Maybe there can be a parallel mode that hands out a ParallelQuery. I'm not optimistic this can be done.
  • Allow sequences of the form IEnumerable<MyDTO[]>. Here, ADO.NET would hand out rows in chunks. The chunk size would be chosen optimally. Probably, ADO.NET would convert each network buffer into a chunk. This would make the API even less chatty.
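The chunked shape from the last bullet could look roughly like this when expressed over today's public API (a sketch only: the materialize delegate and the caller-supplied chunk size are stand-ins for what a provider would derive from its own network buffers):

using System;
using System.Collections.Generic;
using System.Data.Common;

static class ChunkedReadExtensions
{
    // Yields rows in fixed-size chunks; a provider implementation could instead
    // size each chunk to whatever fits in one network buffer.
    public static IEnumerable<T[]> ReadChunked<T>(
        this DbDataReader reader, int chunkSize, Func<DbDataReader, T> materialize)
    {
        var chunk = new List<T>(chunkSize);

        while (reader.Read())
        {
            chunk.Add(materialize(reader));

            if (chunk.Count == chunkSize)
            {
                yield return chunk.ToArray();
                chunk.Clear();
            }
        }

        if (chunk.Count > 0)
            yield return chunk.ToArray();
    }
}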

I understand that this proposal is a bit raw. The API that I proposed is a straw man that clearly has some flaws. I think an API in this spirit could improve efficiency a lot. It would also be very convenient. The existing pattern is extremely awkward to use. People are using lightweight wrappers like Dapper, which look just like this API proposal.


The API should be designed so that ORMs such as Entity Framework can use it as well. This might not be trivial depending on what Entity Framework does. Here's an idea on how to support arbitrary object instantiation:

Instead of making ADO.NET figure out how to construct the new MyDTO()... part of the expression tree, let the user provide that. ADO.NET provides one Expression for each field to be deserialized. It could look like this:

var fastReader = FastReader.Create<MyDTO>(
    columnDescriptions /* contains the type and order of columns */,
    (Expression[] columnValueExpressions) =>
        Expression.New(typeof(MyDTO), columnValueExpressions) // Create the new MyDTO(...) expression
);

//cache fastReader

IEnumerable<MyDTO> objects = fastReader.Read(mySqlDataReader);

In this example, the API caller can create an arbitrary deserialization expression that uses the ADO.NET-provided column values in columnValueExpressions. columnValueExpressions[0] would correspond to internalReader.GetNextFieldAsInt32() in the code further above. Entity Framework could construct an expression tree that does anything it wants, including creating multiple objects, calling factory methods and calling into the object state manager. All of this processing would be very close to obtaining the actual column values from internal ADO.NET data structures. No chattiness.
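As a rough sketch of how such a shaping API could be implemented today over the public DbDataReader surface (FastReader itself does not exist; GetFieldValue<T> stands in for the internal buffer reads the proposal envisions, and the column types are assumed to be known up front):

using System;
using System.Data.Common;
using System.Linq.Expressions;

static class RowMaterializer
{
    // Compiles: reader => shape(reader.GetFieldValue<T0>(0), reader.GetFieldValue<T1>(1), ...)
    public static Func<DbDataReader, T> Create<T>(
        Type[] columnTypes, Func<Expression[], Expression> shape)
    {
        var reader = Expression.Parameter(typeof(DbDataReader), "reader");

        var columnValues = new Expression[columnTypes.Length];
        for (int i = 0; i < columnTypes.Length; i++)
        {
            // Stand-in for the GetNextFieldAsInt32()-style internal calls described above.
            var getFieldValue = typeof(DbDataReader)
                .GetMethod(nameof(DbDataReader.GetFieldValue))
                .MakeGenericMethod(columnTypes[i]);
            columnValues[i] = Expression.Call(reader, getFieldValue, Expression.Constant(i));
        }

        // The caller decides how the column expressions become a T:
        // a constructor call, member initialization, a factory method, etc.
        return Expression.Lambda<Func<DbDataReader, T>>(shape(columnValues), reader).Compile();
    }
}

The returned delegate would be cached by the caller (or by the ORM) and invoked once per row inside the usual reader.Read() loop.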

@divega
Contributor

divega commented Apr 3, 2018

System.Data Triage: This proposal has a lot in common with ideas we have been discussing as part of the data access performance effort, in particular the "shaper" API that materializes objects straight out of buffers that @anpete has prototyped.

We can use this issue to track the general idea, so moving to the Future milestone for now.

cc also @roji @ajcvickers in case they want to comment on the proposal.

@divega
Contributor

divega commented Apr 3, 2018

Here are a couple of initial comments:

In https://github.com/aspnet/dataaccessperformance/, we happened to do some analysis of the inherent overhead of the ADO.NET API itself: we built a simple but very fast provider for PostgreSQL that exposes a somewhat similar API to what is described in this proposal, then we created a façade ADO.NET API over it. We ran some benchmarks using both APIs, and the difference turned out to be negligible.

In some respects, IEnumerable may have more allocation overhead than the current data reader API: after all, each new row is a new instance, while you reuse the same instance of DbDataReader in ADO.NET. However, we agree that this overhead may be offset by new opportunities to do things more efficiently, especially in the cases in which the desired way to consume the results is an object instance per row anyway.

@roji
Member

roji commented Apr 3, 2018

Just to state the obvious, it's important to test with another provider before determining that the chattiness of the DbDataReader API is a major cause of slowdown. I don't know how things work with SQL Server/TDS, but at least in PostgreSQL the protocol overhead really shouldn't be that big a factor. It's definitely worth profiling other databases (SQLite, PostgreSQL, MySQL) before going for something like this for performance reasons.

However, that doesn't mean that a new API to deserialize a row into an object isn't interesting, even from an API convenience standpoint (making ADO.NET more modern). I agree with @divega regarding allocations, but unless I'm mistaken it may be possible to do this without any allocations - the IEnumerable returned could be recycled across queries, keeping it on the DbCommand instance, or even on the DbConnection instance (if only one query is supported at a given time). If the implementation of this API really causes an allocation per row I think it's a pretty bad idea.

One nice aspect here is that a default implementation could be added to DbCommand (or even DbConnection?), which would simply call DbDataReader.GetFieldValue<T>(), but a provider could override this to provide a more efficient implementation if they wanted.
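For illustration, that "slow but general" default could look roughly like the following extension method (ExecuteObjects and the name-based property mapping are assumptions for this sketch, not an existing API):

using System.Collections.Generic;
using System.Data.Common;

static class ExecuteObjectsExtensions
{
    public static IEnumerable<T> ExecuteObjects<T>(this DbCommand command) where T : new()
    {
        using (var reader = command.ExecuteReader())
        {
            var properties = typeof(T).GetProperties();

            while (reader.Read())
            {
                var row = new T(); // this is the per-row allocation discussed in this thread

                foreach (var property in properties)
                {
                    int ordinal = reader.GetOrdinal(property.Name);
                    if (!reader.IsDBNull(ordinal))
                        property.SetValue(row, reader.GetValue(ordinal));
                }

                yield return row;
            }
        }
    }
}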

Finally, this is probably related to #25298.

@divega
Contributor

divega commented Apr 3, 2018

If the implementation of this API really causes an allocation per row I think it's a pretty bad idea.

@roji FWIW, what I mean is that when you consume an IEnumerable<T>, the object returned by Enumerator<T>.Current is normally a new instance for every row, while when you consume a DbDataReader, you keep using the same object instance to read the field values from the next row. My second point is that if you want to expose query results as an IEnumerable<T> anyway (as most O/RMs and micro-O/RMs do), then you may as well want to do it as efficiently as possible, which is where an API like this could provide some value.

@roji
Member

roji commented Apr 4, 2018

@divega of course, you're right - in my response above I concentrated on the IEnumerable instance itself, but of course that's much less important.

There's the theoretical possibility of returning the same T from Enumerator<T>.Current, and caching T. However, that's going to break the usual enumerable behavior (stuff like ToList(), ToArray() will totally fail). I do wonder if there's any precedent for this in ASP.NET or corefx to avoid allocations; it may be worth checking.

In any case, what's clear in my mind is that adding an API that's supposed to help efficiency (e.g. via code-generating row parsing) but which allocates on every row seems like a very self-defeating thing. As usual it won't necessarily show up on a microbenchmark, but the overall impact on an application is likely to be negative.

Another comment is that stuff like code-generating row parsing is expensive, and it isn't clear at what point this would happen nor how it could be cached. If you code-generate a row parser every time a query is executed, then again it's likely that the overall effect would be negative, because of the generation overhead when executing small, easy-to-parse queries. However, this could be a good fit for preparation, which is meant precisely for doing one-time heavy operations in preparation for many executions.

@divega
Contributor

divega commented Apr 4, 2018

@roji yes, I was thinking the same thing re preparation.

@GSPP
Author

GSPP commented Apr 4, 2018

@divega

It's great to hear that a performance effort is underway. Many times I felt the SQL Server ADO.NET client was too slow for what it did.

I do not believe that allocations would be a significant concern. I mean, look at all the work that is being performed in this profile. Allocating and deleting even a million short-lived objects each second would not create much GC impact compared to all this work. while(true) new object(); runs at about 100m objects per second if I remember correctly. No data access API can ever provide anywhere near 100m objects per second. This is a heuristic argument for the idea that allocations would always be de minimis compared to the other work going on.

I generally feel that avoiding allocations is a bit over-prioritized these days. Sometimes it's valuable, but often the real work overshadows any allocation issues. On the other hand, if this API really gets as fast as I think it might, then allocations might turn up in the profile.

I proposed a clean way to reuse objects: let the caller hand in objects so that he is responsible for object reuse. He can then even reuse them across calls.

@roji great idea to see how other providers do. On the other hand I would argue that SQL Server is the "elephant in the room" use case. It's most important. So maybe it's just an internal SqlDataReader fix but given how the profile looks I kind of doubt it. Also, I do not believe the TDS protocol is at fault at all. It's the code that turns TDS into user-accessible data.

Code should never be generated per query as that is too expensive. As a first layer it would be an API where the user explicitly creates a FastReader. Creating this is a slow process and the user is expected to create one and cache it. Entity Framework has compiled queries already, so it must have some internal per-query cache object. It can easily perform the FastReader caching itself.

ADO.NET could have some default caching built-in. I believe the TDS protocol starts by sending the structure of rows and then the actual data for those rows. Each query result set returns schema info. So the cache could be keyed on the type of the user provided DTO and the schema info sent by the server. Then, the generated code can be perfectly specialized not only to the DTO type but also the binary row layout! This would become insanely fast. Then, the generated code would become what I showed in the proposal:

(InternalReader internalReader) =>
    new MyDTO() {
        Field1 = internalReader.GetNextFieldAsInt32(),
        Field2 = internalReader.GetNextFieldAsString(),
        ...
    }

Currently, the SqlDataReader.GetInt32(index) API must take care to read fields out of order and validate all kinds of things. Also, I think it reads rows incrementally from the network and it supports streaming large byte/char streams. That can all go away. Parsing would be almost as efficient as manually written code using something like BinaryReader. This would be the end state of efficiency. Cannot be improved further.
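For reference, the kind of hand-written baseline being alluded to might look like this for five fixed-size int columns (a sketch under strong assumptions: the row is already buffered, all columns are non-null 4-byte ints, and real TDS rows carry additional per-column metadata):

using System;
using System.Buffers.Binary;

static class HandRolledRowParser
{
    // Reads five little-endian int32 columns straight out of an already-buffered row.
    public static (int, int, int, int, int) ReadRow(ReadOnlySpan<byte> row) =>
        (BinaryPrimitives.ReadInt32LittleEndian(row),
         BinaryPrimitives.ReadInt32LittleEndian(row.Slice(4)),
         BinaryPrimitives.ReadInt32LittleEndian(row.Slice(8)),
         BinaryPrimitives.ReadInt32LittleEndian(row.Slice(12)),
         BinaryPrimitives.ReadInt32LittleEndian(row.Slice(16)));
}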

@roji
Member

roji commented Apr 4, 2018

I do not believe that allocations would be a significant concern.

I think that depends on which allocations are being discussed... If you're referring to an allocation that happens once per command execution, then you're likely right (since a command execution basically means network traffic). However, an allocation that happens on each row, or worse, on each column, could have a significant impact on performance. In my benchmarks of Npgsql I definitely saw some significant improvements when eliminating row- and column-based allocations.

Basically I think that saying "allocations" doesn't mean much without specifying exactly when these allocations take place.

@roji great idea to see how other providers do. On the other hand I would argue that SQL Server is the "elephant in the room" use case. It's most important.

I'm not sure exactly how you mean that - why is SQL Server the most important? Lots of people use .NET Core with Sqlite, PostgreSQL or MySQL.

So maybe it's just an internal SqlDataReader fix but given how the profile looks I kind of doubt it. Also, I do not believe the TDS protocol is at fault at all. It's the code that turns TDS into user-accessible data.

So again, I'm not sure what the evidence is that parsing TDS is a significant performance bottleneck in SQL Client, and much less in ADO.NET drivers in general (after all we're discussing a general improvement to ADO.NET that doesn't affect only SQL Server). Unless I'm mistaken the Techempower benchmarks currently perform significantly worse for SQL Server compared to PostgreSQL/Npgsql. Of course that doesn't tell you if the problem is in the ADO.NET client, the server or both (the benchmark measures the entire stack), but this should be explored and proved before making any assumptions.

Code should never be generated per query as that is too expensive. As a first layer it would be an API where the user explicitly creates a FastReader. Creating this is a slow process and the user is expected to create one and cache it. Entity Framework has compiled queries already. So it must have some internal per-query cache object. It can easy perform the FastReader caching itself.

I think what you're referring to would be better done as part of command preparation, which is already a part of the standard ADO.NET API. I don't really see any need to introduce a new API such as FastReader - DbCommand.Prepare() could be the user-induced trigger to do the code generation.

ADO.NET could have some default caching built-in. I believe the TDS protocol starts by sending the structure of rows and then the actual data for those rows. Each query result set returns schema info. So the cache could be keyed on the type of the user provided DTO and the schema info sent by the server.

Again, that sounds like a good fit for a more generalized auto-preparation mechanism (see #25185). Compiling a row reader is just one part of optimizing repeated execution of the same command, and the concept of preparation already exists as a standard database API. Npgsql specifically already has a facility whereby when an SQL is executed more than X times, it gets automatically prepared, with the prepared sources being keyed on the SQL. This currently involves server statement preparation, but it could just as well include client-side code generation.
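A hand-wavy sketch of that kind of cache, keyed on the command text plus the target row type (compileMaterializer stands for the expensive code-generation step and is not shown):

using System;
using System.Collections.Concurrent;
using System.Data.Common;

static class MaterializerCache
{
    static readonly ConcurrentDictionary<(string Sql, Type RowType), Delegate> Cache =
        new ConcurrentDictionary<(string, Type), Delegate>();

    // Returns a cached row materializer, compiling one only the first time
    // this particular SQL text + row type combination is seen.
    public static Func<DbDataReader, T> GetOrCompile<T>(
        string sql, Func<Func<DbDataReader, T>> compileMaterializer)
        => (Func<DbDataReader, T>)Cache.GetOrAdd((sql, typeof(T)), _ => compileMaterializer());
}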

But I think there are more important questions to be answered.

First, are we sure that there actually is a problem here? Your original profiling session shows that 77% of the time is spent in SqlDataReader.GetInt32(). However, your benchmark doesn't actually do anything with the values it reads, so it makes sense for the bulk of the time to be spent in SqlDataReader. It doesn't mean that anything is actually wrong or slow...

Second, even if we agree that SqlDataReader.GetInt32() is somehow too slow, you still need to show that this is a systematic problem across ADO.NET providers in order to justify a new, cross-provider ADO.NET feature for row parsing. Otherwise you may be projecting from an SQL Client inefficiency onto the entire database API.

Finally, even if we accept that a new code-generating API is needed, it still isn't clear what the API shape should be. The original API proposed here (IEnumerable<MyDTO> resultRows = command.ExecuteObjects<MyDTO>()) suffers from the allocation-per-row issue, unless you break IEnumerable semantics and return the same MyDTO instance for every row (in which case ToList() and related stop working). Maybe you have another API in mind...

Then, the generated code can be perfectly specialized not only to the DTO type but also the binary row layout!

The binary layout doesn't depend just on the resultset schema/metadata, since fields have different lengths in different rows. In other words, the same string column could have different lengths in different rows, changing the position of all fields coming afterwards. That's one reason why I think the value of code generation isn't huge here.

Currently, the SqlDataReader.GetInt32(index) API must take care to read fields out of order and validate all kinds of things. Also, I think it reads rows incrementally from the network and it supports streaming large byte/char streams. That can all go away. Parsing would be almost as efficient as manually written code using something like BinaryReader. This would be the end state of efficiency. Cannot be improved further.

Reading fields out of order really isn't much overhead. Also, unless CommandBehavior.SequentialAccess is specified, the row should in general be buffered in memory, making all column read operations very light memory operations - there's no reason for there to be any sort of overhead for reading rows incrementally from the network, supporting large byte/char streams, etc. Maybe this isn't the way SQL Client operates (I have no idea), but it definitely could be.
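For reference, sequential streaming is an explicit opt-in; in the default mode a provider is free to buffer the whole row and serve the getters from memory. Reusing the command from the benchmark above:

using (var reader = command.ExecuteReader(System.Data.CommandBehavior.SequentialAccess))
{
    while (reader.Read())
    {
        // In sequential mode columns must be read in increasing ordinal order,
        // and large values can be streamed instead of fully buffered.
        var value1 = reader.GetInt32(0);
    }
}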

I'd really like to see some hard figures with Npgsql before concluding that ADO.NET needs to be changed somehow.

@jnm2
Contributor

jnm2 commented Apr 5, 2018

I'd absolutely love the ability to obtain an IAsyncEnumerable<T> from a query. Most of the time I do this, T will either be a primitive type or an immutable struct (or less frequently, immutable class). So I'd expect the API to pass the values to the constructor in those cases rather than setting properties or fields.

To me performance is secondary. The real win is that the data conveniently comes back already in the form that I want to pass around.

@roji
Member

roji commented Apr 7, 2018

Regarding IAsyncEnumerable<T>, see #25298

@jnm2
Contributor

jnm2 commented Apr 7, 2018

Isn't #25298 about IAsyncEnumerable<DbDataReader>, or does it cover IAsyncEnumerable<ImmutableDTO> too?

@roji
Member

roji commented Apr 7, 2018

Isn't #25298 about IAsyncEnumerable<DbDataReader>, or does it cover IAsyncEnumerable<ImmutableDTO> too?

#25298 is about making "DbDataReader implement IAsyncEnumerable", I don't really know what IAsyncEnumerable<DbDataReader> would mean (an enumeration of resultsets?).

The question is still whether that IAsyncEnumerable would be of some sort of general record API (like a modern DbDataRecord) or of a user-provided DTO type. In any case, #25298 suffers from the same design issue as this proposal - either it constantly returns the same instance (in which case it breaks enumerable semantics and is only useful with foreach) or it allocates on each row, which isn't good for perf.

@jnm2
Contributor

jnm2 commented Apr 7, 2018

I don't really know what IAsyncEnumerable<DbDataReader> would mean (an enumeration of resultsets?).

Right now it's IEnumerable and returns itself (or a proxy object implementing IDataRecord?) as enumerator.Current, right? I would have said IAsyncEnumerable<IDataRecord> but I'd want to use the async getters which are only available on DbDataReader. That's where that came from.

Anyway, my interest is really in materialized immutable read model objects. (Usually value types.) Even though IAsyncEnumerable<DbDataRecord> would be cool if DbDataRecord had async getters, it would only be a means to that end. Which thread is closest to tracking that particular end; #25298?

@roji
Member

roji commented Apr 7, 2018

I don't really know what IAsyncEnumerable<DbDataReader> would mean (an enumeration of resultsets?).

Right now it's IEnumerable and returns itself (or a proxy object implementing IDataRecord?) as enumerator.Current, right? I would have said IAsyncEnumerable<IDataRecord> but I'd want to use the async getters which are only available on DbDataReader. That's where that came from.

Ah OK, I understand better. Yeah, the current IEnumerable implementation returns IDataRecord, which exposes pretty much the same API as DbDataReader.

Regarding implementing IAsyncEnumerable<T> vs. IAsyncEnumerable<IDataRecord>, the two could be useful. Once again, rather than (or in addition to) IAsyncEnumerable<T>, DbDataReader could simply have something like ReadRows<T>(T[] rows) which populates row objects given by the user - this would eliminate any allocations.
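Expressed over today's API, that buffer-filling shape might look something like the following (ReadRows and the materialize delegate are assumptions; the point is only that the caller owns the buffer, so the API itself forces no per-row allocation):

using System;
using System.Data.Common;

static class ReadRowsExtensions
{
    // Fills the caller-provided buffer with as many rows as are available.
    // Returns the number of rows written; 0 means the result set is exhausted.
    public static int ReadRows<T>(this DbDataReader reader, T[] buffer, Func<DbDataReader, T> materialize)
    {
        int count = 0;

        while (count < buffer.Length && reader.Read())
            buffer[count++] = materialize(reader);

        return count;
    }
}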

I guess the bigger question is whether ADO.NET should start mapping rows to user DTOs, which is a step in the direction of an O/RM (or at least something like Dapper).

@jnm2
Contributor

jnm2 commented Apr 7, 2018

ReadRows<T>(T[] rows) works if the DTOs are mutable. You can still be allocation-free with IAsyncEnumerable<ImmutableStructDTO>, IAsyncEnumerable<int>, etc.

I guess the bigger question is whether ADO.NET should start mapping rows to user DTOs, which is a step in the direction of an O/RM (or at least something like Dapper).

How ORMy? I like the idea of ADO.NET staying close to the metal with this. Having relationships between objects seems like something you'd use EF for. I'm envisioning ADO.NET mapping a row to a constructor call, forcing each column name to match each constructor parameter. If you want nested structured types, you'd do the translation yourself from the primitive values you receive in the DTO's constructor.
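A rough reflection-based sketch of that convention, just to pin the idea down (AsObjects is a made-up name; a real implementation would compile the mapping once per result-set schema instead of reflecting on every row):

using System;
using System.Collections.Generic;
using System.Data.Common;
using System.Linq;

static class ConstructorMappingExtensions
{
    // Maps each row to a constructor call, matching column names to constructor parameter names.
    public static IEnumerable<T> AsObjects<T>(this DbDataReader reader)
    {
        var ctor = typeof(T).GetConstructors().Single();
        var parameters = ctor.GetParameters();

        while (reader.Read())
        {
            var args = new object[parameters.Length];
            for (int i = 0; i < parameters.Length; i++)
                args[i] = reader.GetValue(reader.GetOrdinal(parameters[i].Name)); // GetOrdinal falls back to a case-insensitive match

            yield return (T)ctor.Invoke(args);
        }
    }
}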

@roji
Member

roji commented Apr 7, 2018

ReadRows<T>(T[] rows) works if the DTOs are mutable. You can still be allocation-free with IAsyncEnumerable<ImmutableStructDTO>, IAsyncEnumerable<int>, etc.

I don't think it's a good idea to introduce an API that works well only for value types, isn't it better to have a single API that's allocation-free for both value and reference types? Regarding IAsyncEnumerable<int>, you're suggesting that ADO.NET detect that it's a one-field row and unwrap that, right?

I guess the bigger question is whether ADO.NET should start mapping rows to user DTOs, which is a step in the direction of an O/RM (or at least something like Dapper).

How ORMy? I like the idea of ADO.NET staying close to the metal with this. Having relationships between objects seems like something you'd use EF for. I'm envisioning ADO.NET mapping a row to a constructor call, forcing each column name to match each constructor parameter. If you want nested structured types, you'd do the translation yourself from the primitive values you receive in the DTO's constructor.

I'm not saying it's not possible, but this would be the first time ADO.NET maps rows to user objects (as opposed to columns with UDTs, which are pretty much opaque). That brings in new functionality that currently exists in Dapper; it's something to be considered. Again, the first thing people would (understandably) ask for is to relax the restriction on value types. Then, database columns typically have different names than your constructor arguments (my_field vs myField), so people will be asking for some sort of mapping...

I'm just saying this is probably significantly more than just adding the feature etc.

@jnm2
Contributor

jnm2 commented Apr 7, 2018

isn't it better to have a single API that's allocation-free for both value and reference types?

Absolutely, if the APIs enable the same abilities. ReadRows won't be an option for me because it won't work with immutable structs or immutable classes.

Regarding IAsyncEnumerable<int>, you're suggesting that ADO.NET detect that it's a one-field row and unwrap that, right?

Yes, please! Like ExecuteScalarAsync does, but in one dimension not two! 😃

Again, the first thing people would (understandably) ask for is to relax the restriction on value types

I'm sorry, I didn't mean to ask for such a restriction. I'd use both immutable structs and immutable classes depending on the size and scenario. Hopefully that resolves that question?

ADO.NET already has a concept of mapping in SqlBulkCopy, but even so, how common can it be? If you are not able to make the constructor parameter names be OrdinalIgnoreCase equal to the query column names, you'll have this problem with every potential API as long as DTOs are involved. It sounds like a 0.1% case.

If you want to start with an API that makes you do a little more work but is a fallback for the 0.1% case (or it turns out to be a 20% case or something), that would be IAsyncEnumerable<DbDataRecord>, right? Can we have async getters on DbDataRecord?

@roji
Member

roji commented Apr 7, 2018

Again, the first thing people would (understandably) ask for is to relax the restriction on value types

I'm sorry, I didn't mean to ask for such a restriction. I'd use both immutable structs and immutable classes depending on the size and scenario. Hopefully that resolves that question?

I guess I should have written mutable rather than reference types. The point (IMHO) is an API that works for both.

I'm not sure why an API that accepts an array doesn't work for immutable structs... You can always overwrite a slot in the array with a newly created (and populated) struct, no?

Anyway, I definitely agree that ADO.NET can and should be made more usable and convenient, it just has to be done in a way that addresses all the issues etc.

@jnm2
Contributor

jnm2 commented Apr 7, 2018

I'm not sure why an API that accepts an array doesn't work for immutable structs... You can always overwrite a slot in the array with a newly created (and populated) struct, no?

True, if it decides to call the constructor. Then it will incur allocations if it doesn't act differently depending on whether T is a struct or a class, but that's fine. What happens if T is a class and you pass an array of nulls?

There are two benefits of IAsyncEnumerable which I'm looking forward to: foreach async in C#, and LINQ composability. That could be implemented as an extension method on top of DbDataReader.ReadRowsAsync<T>(T[]) by allocating new T[1] and awaiting ReadRowsAsync in MoveNextAsync.

@jnm2
Contributor

jnm2 commented Apr 7, 2018

Guessing from the open questions in https://github.com/dotnet/csharplang/blob/master/proposals/async-streams.md, it would be something like this (but you'd use struct enumerators instead of an iterator method):

public static async IAsyncEnumerable<T> AsEnumerable<T>(this DbDataReader reader, CancellationToken cancellationToken)
{
    if (reader == null) throw new ArgumentNullException(nameof(reader));

    var singleRow = new T[1];

    while (await reader.ReadRowsAsync(singleRow, cancellationToken).ConfigureAwait(false))
    {
        yield return singleRow[0];
    }
}

Another question you'd have to answer if ADO.NET touched IAsyncEnumerable is whether you'd dispose the reader when the enumerator is disposed. It can't always dispose or this API could only be used with the final result set. Maybe DbDataReader.AsEnumerable<T>(bool leaveOpen, CancellationToken cancellationToken) would be useful to folks? Or would it be clearest to emphasize that the enumerator never disposes the reader and disposal must be tracked as you always would?
(I'm interested in what the guidance would be because even if you avoid IAsyncEnumerable, people will write this wrapper so they can foreach async and compose LINQ operations.)

@GSPP
Author

GSPP commented Apr 20, 2018

There has been a lot of discussion around what result shapes would be supported. My expression-based proposal shows how the user can customize this API for the shape of any T at all. The construction of a T item can be provided by the user.

This means that value types, custom constructors, factory methods, field and property writes and nested objects are all automatically supported.

The user could then also reuse a single object instance if he wants. He would just pass an expression tree like this:

(myCol1, myCol2, ...) => {
    var rowObject = GetSharedInstance(); // This could also be an Expression.Constant
    rowObject.MyCol1 = myCol1;           // set values using mutation
    rowObject.MyCol2 = myCol2;           // set values using mutation
    return rowObject;                    // always return the same instance
}

@GSPP
Author

GSPP commented Apr 20, 2018

I now benchmarked the impact of allocations (I updated the code in the opening post; just change the value of allocationsPerRow and run). I tested allocating N objects per row.

0 allocations/row: 18.0 ms/iteration
1 allocation/row: 20.5 ms/iteration
10 allocations/row: 22.9 ms/iteration
100 allocations/row: 37.3 ms/iteration

"1" would be the typical use case. Each row normally turns into one DTO instance. This benchmark works on small rows of a fixed size. This should be a hard case for allocations. The benchmark results were a bit unstable but seemed good enough. Later iterations tended to be faster. Maybe this was the GC and SQL Server tuning themselves to the load.

I ran this on .NET 4.7.1, x64 and with <gcConcurrent enabled="true" /><gcServer enabled="true" />.

It seems that allocations do not have that much of an impact but they certainly are above "noise" level.

@roji
Member

roji commented Apr 21, 2018

@GSPP what are the datatypes of myCol1, myCol2 in your code sample? If they're object you're boxing all value types coming from the database, right? Otherwise there needs to be some sort of generic API which I'm not seeing in your sample - it would be good to understand exactly what you have in mind.

More generally, there's the question here of the extent to which we want to start including features in ADO.NET which are currently implemented in an upper layer such as Dapper. This isn't to say it shouldn't be done - the question is just how much we view ADO.NET as a low-level API. What exactly are your issues with the way things are currently doable with Dapper?

@GSPP
Author

GSPP commented Apr 22, 2018

@roji I know that I have not explained it very well. I'll try harder:

Let's assume ADO.NET has an internal class TdsReader that has a method ReadInt32 that behaves essentially like BinaryReader.ReadInt32(). Then, ADO.NET would create the expression tree instanceOfTdsReader.ReadInt32() and pass this tree as myCol1.

Then, application code can use this incoming expression tree to obtain the value of the first column without boxing. From those subexpressions, the application code creates a lambda expression that produces the output for that row (such as a new object instance). This lambda expression tree is handed back to ADO.NET.

When the final integrated tree is compiled, there will be no boxing. ReadInt32 could even be inlined by the JIT. Performance would be as if a human had hand-written deserialization code specialized for the specific query and for the specific row type.

The expression myCol1 serves as a placeholder for ADO.NET to fill in the code needed to read data from the TDS stream. That way user code does not need to directly see and call the internal TdsReader class. This is abstracted away, making the API very clean from the application code perspective.

Why integrate this into ADO.NET? Because only ADO.NET has access to the internal TDS parsing infrastructure. Only ADO.NET can therefore help create the optimal expression tree that really has the minimal amount of abstraction cost possible.

@roji
Member

roji commented Apr 22, 2018

@GSPP thanks for the details. OK, so you're proposing that ADO.NET providers expose an expression tree for reading an int, alongside the current method which reads an int. This fragment would then be combined by the user to a larger expression tree, typically reading an entire row (by combining column reading fragments), and compiled. Is that a fair description?

It's clear that this indeed wouldn't box, but I think it's important to compare the perf benefits against today's method of calling DbDataReader.GetInt32() and similar methods. It's true that this technique would eliminate the overhead of calling the method itself (and it's a virtual one), but the actual code contained in GetInt32() would be included exactly as it is.

Using code generation (which is what this is) for optimization is usually a good idea when it allows doing an expensive thing upfront and using it to shape the logic of your generated code... For example, if you need to access some object via reflection, it makes a lot of sense to perform the reflection once and then compile an expression tree to avoid paying a lot of reflection overhead on every access. Here, however, the only thing you'd be saving would be the virtual call overhead when getting each column. I'm not saying it's necessarily worthless, just that the usefulness of code generation here seems much more limited than when we usually use it.

One more general note... Before diving into the details and API shape, I think we need to understand what it is we're looking to accomplish here. I think there are two motivators here:

  1. A more usable/modern API (this is what @jnm2 seems to want). This is definitely interesting, but it begs the question of why Dapper isn't a good answer here; it allows you to easily read your own DTOs without doing DbDataReader column reads. If the driver for this is usability rather than perf, then Dapper seems to be a good answer (incidentally with minimal perf overhead).
  2. A performance optimization. If we're trying to speed up ADO.NET access here, then I think we really need to make sure we actually have a problem with ADO.NET column access (as I wrote above in https://github.com/dotnet/corefx/issues/28769#issuecomment-378748599). We haven't yet seen a benchmark that shows a bottleneck in a non-SQL Client provider (so this could be a simple SQL Client problem)...

To clarify, I'm not trying to shoot this down or anything, just to advance the conversation and make sure we understand why we're discussing this.

@jnm2
Contributor

jnm2 commented Apr 22, 2018

A more usable/modern API (this is what @jnm2 seems to want). This is definitely interesting, but it begs the question of why Dapper isn't a good answer here; it allows you to easily read your own DTOs without doing DbDataReader column reads. If the driver for this is usability rather than perf, then Dapper seems to be a good answer (incidentally with minimal perf overhead).

Yes, very fair. My interest is in IAsyncEnumerable<>. For my own uses I'd want T to be able to be:

  • a column type (primitive type, DateTime, string, Decimal, enum)
  • an immutable user-defined struct or class

This would require ADO.NET to call user-defined constructors to pass in the primitive column values.

ℹ As a cool side benefit, I'd be able to use user-defined domain primitives for single-column returns. For example, instead of IAsyncEnumerable<DateTime> I'd be able to receive an IAsyncEnumerable<LocalDate>. Instead of IAsyncEnumerable<int>, I'd receive an IAsyncEnumerable<Id<Foo>>. This would fall out automatically because these simple wrapper types expose constructors which receive parameters that ADO.NET would fill exactly like it would for multi-column user-defined types.
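For example, such a wrapper can be as small as this (Id<TEntity> is a made-up illustration; its single int constructor parameter is what the column value would bind to):

// A single-column "domain primitive": the constructor takes exactly one column value,
// so the same constructor-injection rules as for multi-column DTOs would apply.
public readonly struct Id<TEntity>
{
    public Id(int value) => Value = value;

    public int Value { get; }
}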

Generalizing

Now stepping back, we'd be asking ADO.NET to do two things.

  • Call a user-defined delegate, injecting column values as arguments (what I think @GSPP might be asking for)
  • Provide the delegate's return values through IAsyncEnumerable<>.

I think this would be really powerful. You might want an IAsyncEnumerable<ThirdPartyFoo> which doesn't provide a constructor that would meet ADO.NET's focused requirements for column value injection. Rather than writing your own class to wrap it and using IAsyncEnumerable.Select to convert it, it would be cool to be able to do:

reader.AsAsyncEnumerable<ThirdPartyFoo>(
    (int x, DateTime y, int z) => new ThirdPartyFoo());

Even though most of the time you wouldn't need custom logic, so you'd let ADO.NET locate the constructor which has the correct parameters:

reader.AsAsyncEnumerable<NormalFoo>(); // Directly calls new NormalFoo(int x, DateTime y, int z)

@jnm2
Contributor

jnm2 commented Apr 22, 2018

If you don't want IAsyncEnumerable, I'd also be interested in reader.ReadRows<T>. This can be generalized in the same way. Rather than filling an array element per row, ADO.NET would call an action per row and leave all the decisions to the user delegate (whether to allocate, how to instantiate, etc):

await reader.ReadRowsAsync(
    (int x, DateTime y, int z) => buffer.Add(new Foo()),
    min: 1, // Keep waiting until at least 1 row is available and sent to the delegate
    max: buffer.Capacity); // If extra rows are *synchronously* available

A user like me could easily wrap an IAsyncEnumerable API around this with min: 1, max: 1.

@roji
Member

roji commented Nov 12, 2018

@GSPP, I executed your original benchmark on Npgsql. The following is DotTrace analyzing a console program running .NET Framework 4.6.1 on my local laptop running Windows 10, against a local PostgreSQL 10. This is definitely not the way to run a benchmark - the server needs to be on another machine etc. - but it gives an idea:

[profiler screenshot: time split roughly evenly between NpgsqlDataReader.GetInt32(), NpgsqlDataReader.Read() and NpgsqlDataReader.ExecuteReader()]

As you can see, GetInt32() doesn't take nearly as much time as it took in your own run on MSSQL. The running time is split more or less equally between NpgsqlDataReader.GetInt32() (called 5 times per iteration), NpgsqlDataReader.Read() (1 per iteration) and NpgsqlDataReader.ExecuteReader() (1 per iteration). I think this more or less makes sense, especially since these are the only things happening in the benchmark. Note that we know that there's room for improvement in Npgsql's implementation of column getters such as GetInt32() - we currently have too much virtual call indirection, but that's an internal implementation detail.

Based on the above, I can't say I see evidence for perf value in a new API deserializing an entire row - what you're seeing with MSSQL seems to be more of a perf issue in MSSQL than something that justifies a general ADO.NET change. In addition, such an API would still internally do more or less the same things as the current one - decoding integers from a memory buffer - so actual perf increases are likely to be negligible at best (i.e. you'd be saving column index checks but nothing more).

Note that this doesn't say anything about introducing a new API for usability, e.g. based on the new IAsyncEnumerable (#27684 tracks that). But I do think it shows that we currently have no reason to think about new ADO.NET column access APIs for performance.

@GSPP
Author

GSPP commented Nov 12, 2018

@roji well, MSSQL has 70% time in GetInt32 and Npgsql has 40% time there. That is a tremendous amount for something that should be similar to *((int*)&networkBuffer[offset]). 10 instructions get turned into 1000 with all that indirection.

I imagine that the generated expression tree would look something like:

return new MyRow() {
    Field1 = *((int*)&networkBuffer[rowOffset + constantColumnOffset1]),
    Field2 = *((int*)&networkBuffer[rowOffset + constantColumnOffset2]),
    ...
}

I'm ignoring things like variable length columns here. Surely it would not be quite as slim, but far better than huge call trees, vcalls, no inlining, and fewer JIT optimizations because everything is so spread out. The code can be much slimmer because the network data format is known. We know the columns and in some cases even their direct offsets. I bet the existing code must do all kinds of dynamic lookups and validations. It's like reflection vs. directly written code.

@jnm2
Contributor

jnm2 commented Nov 12, 2018

I have actually implemented a data reader that asynchronously reads all rows of a result set and calls the specified delegate (any void-returning delegate type) once per row, injecting the row's column values as arguments. Makes it easy to project the data however you like. Emitted IL is as good as handwritten.

@bgrainger
Contributor

@GSPP I ran your benchmark against MySQL (.NET 4.7.2 console app running against local MySQL Server on Windows 10) and got these results:

[profiler screenshot: roughly 60% of time in Read and 40% in GetInt32]

It's 60% in Read (which is performing I/O and fetching new rows on the server) and 40% in GetInt32, 40% of which is parsing ASCII strings (which is how MySQL formats numbers on the wire) into ints. Like @roji, I also see little value in a new API for deserialising an entire row when it seems likely that this would be a complex subsystem that would have to be implemented and maintained by each ADO.NET library in order to get any performance gains over a generic implementation.

That is a tremendous amount for something that should be similar to *((int*)&networkBuffer[offset]).

It's certainly nowhere near this simple for the MySQL text protocol.

@roji
Member

roji commented Nov 12, 2018

MSSQL has 70% time in GetInt32 and Npgsql has 40% time there. That is a tremendous amount for something that should be similar to *((int*)&networkBuffer[offset]).

Whether a certain percentage is "a tremendous time" or not depends on what else is happening in the benchmark. As written above, in this case the benchmark isn't doing any processing on the results, so it could be (and probably is) completely fine for this to take 40%.

To make this simple... If you want to show that this is a worthwhile optimization, then the best way to do that is a PoC of your API in an existing driver (e.g. Npgsql or SqlClient). Then you would be able to show actual improvement (e.g. in operations per second) in a benchmark, directly as a result of your change. Continuing to argue about call percentages in a profiler isn't very meaningful in my opinion.

But far better than a huge call trees, vcalls, no inlining, less JIT optimizations because everything is so spread out.

That's a lot of assumptions right there - and assumptions in optimizations are a pretty bad idea. The size of the call tree, vcalls and JIT optimizations are going to vary from provider to provider, and there's a very good chance that, taken all together, they're negligible in the grander scheme of things.

The code can be much slimmer because the network data format is known. We know the columns and in some cases even their direct offsets.

This doesn't make sense to me. Offsets usually vary because column sizes differ - not all data is fixed-size ints; I assume you're not pushing for an optimization which would only work for fixed-length-only rows. In any case, the time spent on finding the column really seems quite negligible, at least in Npgsql. And what does "knowing the columns" mean if you don't know the offset? If you need to get the value of column 2 in the current row, you have to go to where it starts regardless of whether it's via GetInt32() or some row API.

I bet the existing code must do all kinds of dynamic lookups and validations.

Which existing code, MSSQL? Npgsql? Does it have to be that way or is it just badly-written? Can't we just make it better without a whole new API that every single ADO.NET provider will have to implement?

As I wrote above, Npgsql's implementation of GetInt32() does indeed suffer from too many vcalls (@YohDeadfall is currently working on that). But that really is an internal implementation issue. Nothing is stopping an ADO.NET driver from implementing GetInt32() without vcalls.

The only validation I'm aware of is making sure that the parameter to GetInt32() isn't out-of-bounds. Are you referring to this? Is this really worth optimizing away?

It's like reflection vs. directly written code.

It really isn't. Reflection has a (huge) overhead over direct access, which simply does not exist here. In all cases, reading an int column of a row consists of finding where that column starts, deserializing it (e.g. endian handling for ints) and returning the value. There's nothing I can see that a row-based API (or a code-generation approach) can do to optimize these away: they have to be done regardless. And if you did want to code-generate a delegate for the column access (again, I don't see why), the driver could do this internally and execute that code-generated delegate from GetInt32(). In other words, it's once again an internal implementation detail of the provider.

To summarize, all we have at the moment is a profiler snapshot and a claim that 40% (or 70%) is "too much". In my opinion, if you want to show that your approach makes sense, then we need to see a simplified implementation of what you're proposing, along with a benchmark that shows a significant increase in operations per second.

@msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan added the untriaged label Feb 23, 2020
@ajcvickers removed the untriaged label Jun 22, 2020
@ajcvickers
Contributor

Closing as this is not something we plan to implement.

@ajcvickers removed this from the Future milestone Jun 23, 2020
@ghost locked as resolved and limited conversation to collaborators Dec 17, 2020