Add a fast and convenient deserialization API to ADO.NET #25739
System.Data Triage: This proposal has a lot in common with ideas we have been discussing as part of the data access performance effort, in particular the "shaper" API that @anpete has prototyped, which materializes objects straight out of buffers. We can use this issue to track the general idea, so moving to the Future milestone for now. cc also @roji @ajcvickers in case they want to comment on the proposal.
Here are a couple of initial comments:

In https://github.com/aspnet/dataaccessperformance/, we happened to do some analysis of the inherent overhead of the ADO.NET API itself: we built a simple but very fast provider for PostgreSQL that exposes a somewhat similar API to what is described in this proposal, then we created a façade ADO.NET API over it. When we ran some benchmarks using both APIs, the difference turned out to be negligible.

In some respects, IEnumerable may have more allocation overhead than the current data reader API: after all, each new row is a new instance, while you re-use the same instance of DbDataReader in ADO.NET. However, we agree that this overhead may be offset by new opportunities to do things more efficiently, especially in the cases in which the desired way to consume the results is an object instance per row anyway.
Just to state the obvious, it's important to test with another provider before determining that the chattiness of the DbDataReader API is a major cause of slowdown. I don't know how things work with SQL Server/TDS, but at least in PostgreSQL the protocol overhead really shouldn't be that big a factor. It's definitely worth profiling other databases (SQLite, PostgreSQL, MySQL) before going for something like this for performance reasons. However, that doesn't mean that a new API to deserialize a row into an object isn't interesting, even from an API convenience standpoint (making ADO.NET more modern).

I agree with @divega regarding allocations, but unless I'm mistaken it may be possible to do this without any allocations - the IEnumerable returned could be recycled across queries, keeping it on the DbCommand instance, or even on the DbConnection instance (if only one query is supported at a given time). If the implementation of this API really causes an allocation per row, I think it's a pretty bad idea.

One nice aspect here is that a default implementation could be added to DbCommand (or even DbConnection?), which would simply call

Finally, this is probably related to #25298.
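A minimal sketch of what such a default implementation might look like, built entirely on the existing DbDataReader API. `ExecuteQuery<T>` and the name-based property mapping are hypothetical here, not an existing ADO.NET method; it also allocates one DTO per row, which is exactly the trade-off being discussed.

```csharp
using System.Collections.Generic;
using System.Data.Common;
using System.Reflection;

public static class DbCommandDefaults
{
    // Hypothetical default: providers could override this with something faster.
    public static IEnumerable<T> ExecuteQuery<T>(this DbCommand command) where T : class, new()
    {
        using (DbDataReader reader = command.ExecuteReader())
        {
            // Resolve the column-to-property mapping once per result set, not once per row.
            var properties = new PropertyInfo[reader.FieldCount];
            for (int i = 0; i < reader.FieldCount; i++)
            {
                properties[i] = typeof(T).GetProperty(
                    reader.GetName(i),
                    BindingFlags.Public | BindingFlags.Instance | BindingFlags.IgnoreCase);
            }

            while (reader.Read())
            {
                var row = new T();
                for (int i = 0; i < reader.FieldCount; i++)
                {
                    if (properties[i] != null && !reader.IsDBNull(i))
                        properties[i].SetValue(row, reader.GetValue(i));
                }
                yield return row;
            }
        }
    }
}
```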
@roji FWIW, what I mean is that when you consume an
@divega of course, you're right - in my response above I concentrated on the

There's the theoretical possibility of returning the same

In any case, what's clear in my mind is that adding an API that's supposed to help efficiency (e.g. via code-generating row parsing) but which allocates on every row seems like a very self-defeating thing. As usual it won't necessarily show up in a microbenchmark, but the overall impact on an application is likely to be negative.

Another comment is that stuff like code-generating row parsing is expensive, and it isn't clear at what point this would happen nor how it could be cached. If you code-generate a row parser every time a query is executed, then again it's likely that the overall effect would be negative, because of the generation overhead when executing small, easy-to-parse queries. However, this could be a good fit for preparation, which is meant precisely for doing one-time heavy operations in preparation for many executions.
@roji yes, I was thinking the same thing re preparation.
It's great to hear that a performance effort is underway. Many times I felt the SQL Server ADO.NET client was too slow for what it did.

I do not believe that allocations would be a significant concern. I mean, look at all the work that is being performed in this profile. Allocating and deleting even a million short-lived objects each second would not create much GC impact compared to all this work. I generally feel that avoiding allocations is a bit over-prioritized these days. Sometimes it's valuable, but often the real work overshadows any allocation issues. On the other hand, if this API really gets as fast as I think it might, then allocations might turn up in the profile. I proposed a clean way to reuse objects: let the caller hand in objects so that he is responsible for object reuse. He can then reuse them across calls even.

@roji great idea to see how other providers do. On the other hand, I would argue that SQL Server is the "elephant in the room" use case. It's most important. So maybe it's just an internal SqlDataReader fix, but given how the profile looks I kind of doubt it. Also, I do not believe the TDS protocol is at fault at all. It's the code that turns TDS into user-accessible data.

Code should never be generated per query as that is too expensive. As a first layer it would be an API where the user explicitly creates a

ADO.NET could have some default caching built-in. I believe the TDS protocol starts by sending the structure of rows and then the actual data for those rows. Each query result set returns schema info. So the cache could be keyed on the type of the user-provided DTO and the schema info sent by the server. Then, the generated code can be perfectly specialized not only to the DTO type but also to the binary row layout! This would become insanely fast. Then, the generated code would become what I showed in the proposal:
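The snippet referenced at the end of that comment isn't reproduced above. Separately, the caching idea it describes could be sketched roughly like this (all names hypothetical): a compiled materializer cached per DTO type and per result-set schema fingerprint, so code generation runs only once per unique combination.

```csharp
using System;
using System.Collections.Concurrent;
using System.Data.Common;
using System.Linq;

static class MaterializerCache<T>
{
    // One cache per DTO type T; the key is a fingerprint of the result-set schema.
    private static readonly ConcurrentDictionary<string, Func<DbDataReader, T>> Cache =
        new ConcurrentDictionary<string, Func<DbDataReader, T>>();

    public static Func<DbDataReader, T> GetOrCompile(
        DbDataReader reader,
        Func<DbDataReader, Func<DbDataReader, T>> compileMaterializer)
    {
        // Column names and types stand in for the provider's own schema metadata
        // (e.g. the column descriptions sent at the start of a result set).
        string schemaKey = string.Join("|",
            Enumerable.Range(0, reader.FieldCount)
                      .Select(i => reader.GetName(i) + ":" + reader.GetFieldType(i).FullName));

        // The expensive step (code generation/compilation) runs once per (T, schema) pair.
        return Cache.GetOrAdd(schemaKey, _ => compileMaterializer(reader));
    }
}
```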
Currently, the
I think that depends on which allocations are being discussed... If you're referring to an allocation that happens once per command execution, then you're likely right (since a command execution basically means network traffic). However, an allocation that happens on each row, or worse, on each column, could have a significant impact on performance. In my benchmarks of Npgsql I definitely saw some significant improvements when eliminating row- and column-based allocations. Basically I think that saying "allocations" doesn't mean much without specifying exactly when these allocations take place.
I'm not sure exactly what you mean by that - why is SQL Server the most important? Lots of people use .NET Core with SQLite, PostgreSQL or MySQL.
So again, I'm not sure what the evidence is that parsing TDS is a significant performance bottleneck in SqlClient, much less in ADO.NET drivers in general (after all, we're discussing a general improvement to ADO.NET that doesn't affect only SQL Server). Unless I'm mistaken, the TechEmpower benchmarks currently perform significantly worse for SQL Server compared to PostgreSQL/Npgsql. Of course that doesn't tell you if the problem is in the ADO.NET client, the server or both (the benchmark measures the entire stack), but this should be explored and proved before making any assumptions.
I think what you're referring to would be better done as part of command preparation, which is already a part of the standard ADO.NET API. I don't really see any need to introduce a new API such as FastReader -
Again, that sounds like a good fit for a more generalized auto-preparation mechanism (see #25185). Compiling a row reader is just one part of optimizing repeated execution of the same command, and the concept of preparation already exists as a standard database API. Npgsql specifically already has a facility whereby when an SQL statement is executed more than X times, it gets automatically prepared, with the prepared statements being keyed on the SQL. This currently involves server statement preparation, but it could just as well include client-side code generation.

But I think there are more important questions to be answered. First, are we sure that there actually is a problem here? Your original profiling session shows that 77% of the time is spent in

Second, even if we agree that

Finally, even if we accept that a new code-generating API is needed, it still isn't clear what the API shape should be. The original API proposed here (
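The auto-preparation policy described above (prepare after X executions, keyed on the SQL text) might be sketched generically like this. This is not Npgsql's actual code; the threshold and names are hypothetical, and the comment only notes where client-side code generation could hook in.

```csharp
using System.Collections.Concurrent;
using System.Data.Common;

class AutoPreparer
{
    private const int PrepareThreshold = 5; // hypothetical threshold ("X times")
    private readonly ConcurrentDictionary<string, int> _useCounts =
        new ConcurrentDictionary<string, int>();

    public void MaybePrepare(DbCommand command)
    {
        // Count executions per SQL text.
        int count = _useCounts.AddOrUpdate(command.CommandText, 1, (_, c) => c + 1);

        if (count == PrepareThreshold)
        {
            // One-time cost: server-side preparation. Client-side code generation
            // for a row materializer could be triggered at the same point.
            command.Prepare();
        }
    }
}
```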
The binary layout doesn't depend just on the resultset schema/metadata, since fields have different lengths in different rows. In other words, the same string column could have different lengths in different rows, changing the position of all fields coming afterwards. That's one reason why I think the value of code generation isn't huge here.
Reading fields out of order really isn't much. Also, unless I'd really like to see some hard figures with Npgsql before concluding that ADO.NET needs to be changed somehow.
I'd absolutely love the ability to obtain an

To me performance is secondary. The real win is that the data conveniently comes back already in the form that I want to pass around.
Regarding
Isn't #25298 about
#25298 is about making "DbDataReader implement IAsyncEnumerable"; I don't really know what

The question is still whether that IAsyncEnumerable would be of some sort of general record API (like a modern DbDataRecord) or of a user-provided DTO type. In any case, #25298 suffers from the same design issue as this proposal - either it constantly returns the same instance (in which case it breaks enumerable semantics and is only useful with foreach) or it allocates on each row, which isn't good for perf.
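A tiny illustration of that trade-off, using a hypothetical mutable row type: recycling one instance avoids per-row allocations, but is only safe for row-at-a-time consumption.

```csharp
using System.Collections.Generic;

class RowBuffer { public int Id; }

static class ReuseExample
{
    // Yields the *same* mutable instance for every row: zero per-row allocations,
    // but only safe to consume with foreach-style, row-at-a-time processing.
    public static IEnumerable<RowBuffer> ReadRows(IEnumerable<int> ids)
    {
        var row = new RowBuffer();
        foreach (var id in ids)
        {
            row.Id = id;
            yield return row;
        }
    }
}

// ReadRows(new[] { 1, 2, 3 }).ToList() would contain three references to one object,
// all reporting Id == 3 - the "breaks enumerable semantics" problem mentioned above.
```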
Right now it's

Anyway, my interest is really in materialized immutable read model objects. (Usually value types.) Even though
Ah OK, I understand better. Yeah, the current

Regarding implementing

I guess the bigger question is whether ADO.NET should start mapping rows to user DTOs, which is a step in the direction of an O/RM (or at least something like Dapper).
How ORMy? I like the idea of ADO.NET staying close to the metal with this. Having relationships between objects seems like something you'd use EF for. I'm envisioning ADO.NET mapping a row to a constructor call, forcing each column name to match each constructor parameter. If you want nested structured types, you'd do the translation yourself from the primitive values you receive in the DTO's constructor.
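A rough sketch of that constructor mapping, under the stated assumptions (column names must match constructor parameter names, case-insensitively). None of this is an actual ADO.NET API; the reflection work happens once per result set, and a real implementation would replace the per-row `Invoke` with compiled code.

```csharp
using System;
using System.Data.Common;
using System.Linq;
using System.Reflection;

static class ConstructorMapper
{
    public static Func<DbDataReader, T> CreateMapper<T>(DbDataReader reader)
    {
        string[] columns = Enumerable.Range(0, reader.FieldCount).Select(reader.GetName).ToArray();

        // Pick the constructor whose parameters match the result set's columns by name.
        ConstructorInfo ctor = typeof(T).GetConstructors()
            .Single(c => c.GetParameters().Length == columns.Length &&
                         c.GetParameters().All(p =>
                             columns.Contains(p.Name, StringComparer.OrdinalIgnoreCase)));

        // Order the arguments by constructor parameter position, not column position.
        int[] ordinals = ctor.GetParameters()
            .Select(p => Array.FindIndex(columns,
                c => string.Equals(c, p.Name, StringComparison.OrdinalIgnoreCase)))
            .ToArray();

        return r =>
        {
            var args = new object[ordinals.Length];
            for (int i = 0; i < ordinals.Length; i++)
                args[i] = r.GetValue(ordinals[i]);
            return (T)ctor.Invoke(args);
        };
    }
}
```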
I don't think it's a good idea to introduce an API that works well only for value types; isn't it better to have a single API that's allocation-free for both value and reference types? Regarding
I'm not saying it's not possible, but this would be the first time ADO.NET maps rows to user objects (as opposed to columns with UDTs, which are pretty much opaque). That brings in new functionality that currently exists in Dapper; it's something to be considered. Again, the first thing people would (understandably) ask for is to relax the restriction on value types. Then, database columns typically have different names than your constructor arguments (

I'm just saying this is probably significantly more than just adding the feature etc.
Absolutely, if the APIs enable the same abilities.
Yes, please! Like
I'm sorry, I didn't mean to ask for such a restriction. I'd use both immutable structs and immutable classes depending on the size and scenario. Hopefully that resolves that question?

ADO.NET already has a concept of mapping in SqlBulkCopy, but even so, how common can it be? If you are not able to make the constructor parameter names OrdinalIgnoreCase-equal to the query column names, you'll have this problem with every potential API as long as DTOs are involved. It sounds like a 0.1% case. If you want to start with an API that makes you do a little more work but is a fallback for the 0.1% case (or it turns out to be a 20% case or something), that would be
I guess I should have written mutable rather than reference types. The point (IMHO) is an API that works for both. I'm not sure why an API that accepts an array doesn't work for immutable structs... You can always overwrite a slot in the array with a newly created (and populated) struct, no? Anyway, I definitely agree that ADO.NET can and should be made more usable and convenient; it just has to be done in a way that addresses all the issues etc.
True, if it decides to call the constructor. Then it will incur allocations if it doesn't act differently depending on whether

There are two benefits of IAsyncEnumerable which I'm looking forward to:
Guessing from the open questions in https://github.com/dotnet/csharplang/blob/master/proposals/async-streams.md, it would be something like this (but you'd use struct enumerators instead of an iterator method):

```csharp
public static async IAsyncEnumerable<T> AsEnumerable<T>(this DbDataReader reader, CancellationToken cancellationToken)
{
    if (reader == null) throw new ArgumentNullException(nameof(reader));

    var singleRow = new T[1];

    while (await reader.ReadRowsAsync(singleRow, cancellationToken).ConfigureAwait(false))
    {
        yield return singleRow[0];
    }
}
```

Another question you'd have to answer if ADO.NET touched IAsyncEnumerable is whether you'd dispose the reader when the enumerator is disposed. It can't always dispose, or this API could only be used with the final result set. Maybe
There was a lot of discussion now around what result shapes would be supported. My expression-based proposal shows how the user can customize this API for the shape of any

This means that value types, custom constructors, factory methods, field and property writes and nested objects are all automatically supported. The user could then also reuse a single object instance if he wants. He would just pass an expression tree like this:
I now benchmarked the impact of allocations (I updated the code in the opening post; just change the value of allocationsPerRow and run). I tested allocating N objects per row. 0: 18.0 ms/iteration.

"1" would be the typical use case. Each row normally turns into one DTO instance. This benchmark works on small rows of a fixed size. This should be a hard case for allocations.

The benchmark results were a bit unstable but seemed good enough. Later iterations tended to be faster. Maybe this was the GC and SQL Server tuning themselves to the load. I ran this on .NET 4.7.1, x64 and with

It seems that allocations do not have that much of an impact but they certainly are above "noise" level.
@GSPP what are the datatypes of

More generally, there's the question here of the extent to which we want to start including features in ADO.NET which are currently implemented in an upper layer such as Dapper. This isn't to say it shouldn't be done - it's just a question of how much we view ADO.NET as a low-level API. What exactly are your issues with the way things are currently doable with Dapper?
@roji I know that I have not explained it very well. I'll try harder.

Let's assume ADO.NET has an internal class

Then, application code can use this incoming expression tree to obtain the value of the first column without boxing. From those subexpressions, the application code creates a lambda expression that produces the output for that row (like a new object instance). This lambda expression tree is handed back to ADO.NET. When the final integrated tree is compiled, there will be no boxing. The expression

Why integrate this into ADO.NET? Because only ADO.NET has access to the internal TDS parsing infrastructure. Only ADO.NET can therefore help create the optimal expression tree, with the minimal amount of abstraction cost possible.
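A hedged sketch of that flow, with `InternalReader` and its `GetNextFieldAs*` methods standing in for provider internals (none of this is an existing API): the provider hands out one expression per column, the caller combines them into a `new MyDTO(...)` body, and the whole tree is compiled once.

```csharp
using System;
using System.Linq.Expressions;

class InternalReader
{
    public int GetNextFieldAsInt32() => 42;          // placeholder for real buffer parsing
    public string GetNextFieldAsString() => "hello"; // placeholder for real buffer parsing
}

class MyDTO
{
    public MyDTO(int id, string name) { Id = id; Name = name; }
    public int Id { get; }
    public string Name { get; }
}

static class ShaperSketch
{
    public static Func<InternalReader, MyDTO> Build()
    {
        // 1. The provider supplies one expression per column, bound to its internal reader.
        ParameterExpression reader = Expression.Parameter(typeof(InternalReader), "reader");
        Expression[] columnValueExpressions =
        {
            Expression.Call(reader, typeof(InternalReader).GetMethod(nameof(InternalReader.GetNextFieldAsInt32))),
            Expression.Call(reader, typeof(InternalReader).GetMethod(nameof(InternalReader.GetNextFieldAsString))),
        };

        // 2. The caller (EF, Dapper, user code) combines them into whatever shape it wants.
        NewExpression body = Expression.New(
            typeof(MyDTO).GetConstructor(new[] { typeof(int), typeof(string) }),
            columnValueExpressions);

        // 3. The combined tree is compiled once and then invoked per row, with no per-column boxing.
        return Expression.Lambda<Func<InternalReader, MyDTO>>(body, reader).Compile();
    }
}
```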
@GSPP thanks for the details. OK, so you're proposing that ADO.NET providers expose an expression tree for reading an int, alongside the current method which reads an int. This fragment would then be combined by the user into a larger expression tree, typically reading an entire row (by combining column-reading fragments), and compiled. Is that a fair description?

It's clear that this indeed wouldn't box, but I think it's important to compare the perf benefits to today's method of calling

Using code generation (which is what this is) for optimization is usually a good idea when it allows doing an expensive thing upfront, using it to shape the logic of your generated code... For example, if you need to access some object via reflection, it makes a lot of sense to perform the reflection once and then compile an expression tree to avoid paying a lot of reflection overhead on every access. Here, however, the only thing you'd be saving would be the virtual call overhead when getting each column. I'm not saying it's necessarily worthless, just that the usefulness of code generation here seems much more limited than when we usually use it.

One more general note... Before diving into the details and API shape, I think we need to understand what it is we're looking to accomplish here. I think there are two motivators here:
To clarify, I'm not trying to shoot this down or anything, just to advance the conversation and make sure we understand why we're discussing this.
Yes, very fair. My interest is in
This would require ADO.NET to call user-defined constructors to pass in the primitive column values.

ℹ As a cool side benefit, I'd be able to use user-defined domain primitives for single-column returns. For example, instead of

Generalizing

Now stepping back, we'd be asking ADO.NET to do two things.
I think this would be really powerful. You might want an

```csharp
reader.AsAsyncEnumerable<ThirdPartyFoo>(
    (int x, DateTime y, int z) => new ThirdPartyFoo(…));
```

Most of the time you wouldn't need custom logic, though, so you'd let ADO.NET locate the constructor which has the correct parameters:

```csharp
reader.AsAsyncEnumerable<NormalFoo>(); // Directly calls new NormalFoo(int x, DateTime y, int z)
```
If you don't want IAsyncEnumerable, I'd also be interested in

```csharp
await reader.ReadRowsAsync(
    (int x, DateTime y, int z) => buffer.Add(new Foo(…)),
    min: 1,                 // Keep waiting until at least 1 row is available and sent to the delegate
    max: buffer.Capacity);  // If extra rows are *synchronously* available
```

A user like me could easily wrap an
@GSPP, I executed your original benchmark on Npgsql. The following is DotTrace analyzing a console program running .NET Framework 4.6.1 on my local laptop running Windows 10, against a local PostgreSQL 10. This is definitely not the way to run a benchmark - the server needs to be on another machine etc. - but it gives an idea:

As you can see,

Based on the above, I can't say I see evidence for perf value in a new API deserializing an entire row - what you're seeing with MSSQL seems to be more of a perf issue in MSSQL than something that justifies a general ADO.NET change. In addition, such an API would still internally do more or less the same things as the current one - decoding integers from a memory buffer - so actual perf increases are likely to be negligible at best (i.e. you'd be saving column index checks but nothing more).

Note that this doesn't say anything about introducing a new API for usability, e.g. based on the new IAsyncEnumerable (#27684 tracks that). But I do think it shows that we currently have no reason to think about new ADO.NET column access APIs for performance.
@roji well, MSSQL has 70% time in GetInt32 and Npgsql has 40% time there. That is a tremendous amount for something that should be similar to

I imagine that the generated expression tree would look something like:
I'm ignoring things like variable-length columns here. Surely it would not be quite as slim. But it would be far better than huge call trees, vcalls, no inlining and fewer JIT optimizations because everything is so spread out. The code can be much slimmer because the network data format is known. We know the columns and in some cases even their direct offsets. I bet the existing code must do all kinds of dynamic lookups and validations. It's like reflection vs. directly written code.
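The snippet referenced above isn't reproduced here; as a guess at the kind of specialized code being described, assuming five fixed-width int32 columns laid out back to back and a big-endian wire encoding purely for illustration:

```csharp
using System;
using System.Buffers.Binary;

readonly struct FiveIntRow
{
    public readonly int C0, C1, C2, C3, C4;
    public FiveIntRow(int c0, int c1, int c2, int c3, int c4)
        { C0 = c0; C1 = c1; C2 = c2; C3 = c3; C4 = c4; }
}

static class GeneratedRowReader
{
    // A hand-written stand-in for what per-schema generated code could look like:
    // fixed offsets, no per-column virtual calls, no ordinal checks.
    public static FiveIntRow ReadRow(ReadOnlySpan<byte> rowBuffer) =>
        new FiveIntRow(
            BinaryPrimitives.ReadInt32BigEndian(rowBuffer),
            BinaryPrimitives.ReadInt32BigEndian(rowBuffer.Slice(4)),
            BinaryPrimitives.ReadInt32BigEndian(rowBuffer.Slice(8)),
            BinaryPrimitives.ReadInt32BigEndian(rowBuffer.Slice(12)),
            BinaryPrimitives.ReadInt32BigEndian(rowBuffer.Slice(16)));
}
```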
I have actually implemented a data reader that asynchronously reads all rows of a result set and calls the specified delegate (any void-returning delegate type) once per row, injecting the row's column values as arguments. Makes it easy to project the data however you like. Emitted IL is as good as handwritten.
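That implementation isn't shown here; a much-simplified, strongly typed sketch of the same shape (without the IL emission, and with hypothetical names) might look like:

```csharp
using System;
using System.Data.Common;
using System.Threading;
using System.Threading.Tasks;

static class RowDelegateReader
{
    // Read every row asynchronously and hand its column values to a per-row delegate.
    public static async Task ReadAllRowsAsync<T1, T2, T3>(
        DbDataReader reader,
        Action<T1, T2, T3> onRow,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        while (await reader.ReadAsync(cancellationToken).ConfigureAwait(false))
        {
            // GetFieldValue<T> avoids boxing for value-type columns.
            onRow(reader.GetFieldValue<T1>(0),
                  reader.GetFieldValue<T2>(1),
                  reader.GetFieldValue<T3>(2));
        }
    }
}

// Usage, mirroring the earlier examples:
// await RowDelegateReader.ReadAllRowsAsync<int, DateTime, int>(
//     reader, (x, y, z) => buffer.Add(new Foo(x, y, z)));
```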
@GSPP I ran your benchmark against MySQL (.NET 4.7.2 console app running against local MySQL Server on Windows 10) and got these results:

It's 60% in
It's certainly nowhere near this simple for the MySQL text protocol.
Whether a certain percentage is "a tremendous time" or not depends on what else is happening in the benchmark. As written above, in this case the benchmark isn't doing any processing on the results, so it could be (and probably is) completely fine for this to take 40%.

To make this simple... If you want to show that this is a worthwhile optimization, the best way to do that is a PoC of your API in an existing driver (e.g. Npgsql or SqlClient). Then you would be able to show an actual improvement (e.g. in operations per second) in a benchmark, directly as a result of your change. Continuing to argue about call percentages in a profiler isn't very meaningful in my opinion.
That's a lot of assumptions right there - and assumptions in optimizations are a pretty bad idea. The size of the call tree, vcalls and JIT optimizations are going to vary from provider to provider, and there's a very good chance that, taken together, they're negligible in the grander scheme of things.
This doesn't make sense to me. Offsets usually vary because column sizes differ - not all data is fixed-size ints; I assume you're not pushing for an optimization which would only work for fixed-length-only rows. In any case, the time spent on finding the column really seems quite negligible, at least in Npgsql. And what does "knowing the columns" mean if you don't know the offset? If you need to get the value of column 2 in the current row, you have to go to where it starts regardless of whether it's from
Which existing code, MSSQL? Npgsql? Does it have to be that way, or is it just badly written? Can't we just make it better without a whole new API that every single ADO.NET provider will have to implement? As I wrote above, Npgsql's implementation of

The only validation I'm aware of is making sure that the parameter to
It really isn't. Reflection has a (huge) overhead over direct access, which simply does not exist here. In all cases, reading an int column of a row consists of finding where that column starts, deserializing it (e.g. endian handling for ints) and returning the value. There's nothing I can see that a row-based API (or a code-generation approach) can do to optimize these away: they have to be done regardless. And if you did want to code-generate a delegate for the column access (again, I don't see why), the driver could do this internally and execute that code-generated delegate from

To summarize, all we have at the moment is a profiler snapshot and a claim that 40% (or 70%) is "too much". In my opinion, if you want to show that your approach makes sense, then we need to see a simplified implementation of what you're proposing, along with a benchmark that shows a significant increase in operations per second.
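Schematically, the three steps described above might look like this (a hypothetical layout with a per-row offset table; real providers differ in the details):

```csharp
using System;
using System.Buffers.Binary;

static class ColumnAccessSketch
{
    public static int GetInt32(ReadOnlySpan<byte> rowBuffer, int[] columnOffsets, int ordinal)
    {
        int start = columnOffsets[ordinal];                             // 1. locate the column
        ReadOnlySpan<byte> slice = rowBuffer.Slice(start, sizeof(int)); // 2. slice its bytes
        return BinaryPrimitives.ReadInt32BigEndian(slice);              // 3. decode and return
    }
}
```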
Closing as this is not something we plan to implement.
When using ADO.NET to read rows, the deserialization cost can be very significant. This is important in scenarios where a lot of data is streamed. Here is directly executable code to demonstrate this:
This code simply deserializes a table of a few thousand rows. There are 5 int columns. The benchmark is designed to demonstrate the issue but it's not unrealistic. I also used ADO.NET best practices.
I might be mistaken but it seems that a lot of the time is spent calling chatty APIs on SqlDataReader. It looks like there is a lot of API overhead and comparatively little work actually parsing the network protocol. The console process uses 100% of one CPU core. SQL Server uses only about 1/3 of one CPU core. This is further (heuristic) evidence that the client is less efficient than the server.
There is no way to avoid this cost at the moment. The code is as fast as it gets. I suggest the following enhancement: provide an API that directly deserializes into a user-provided type.
Old:
New:
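The original "Old:"/"New:" snippets aren't preserved above. As a rough reconstruction of the contrast being drawn (`MyDTO` is from the proposal, and `ExecuteQuery<T>` is a hypothetical name for the proposed API):

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

class MyDTO
{
    public int Field1, Field2, Field3, Field4, Field5;
}

static class OldVsNew
{
    // Old: the chatty per-row, per-column pattern on SqlDataReader.
    public static List<MyDTO> ReadOld(SqlCommand command)
    {
        var result = new List<MyDTO>();
        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                result.Add(new MyDTO
                {
                    Field1 = reader.GetInt32(0),
                    Field2 = reader.GetInt32(1),
                    Field3 = reader.GetInt32(2),
                    Field4 = reader.GetInt32(3),
                    Field5 = reader.GetInt32(4),
                });
            }
        }
        return result;
    }

    // New: the proposed one-call deserialization into a user-provided type
    // (ExecuteQuery<MyDTO> is the hypothetical API this issue asks for):
    //
    //     IEnumerable<MyDTO> rows = command.ExecuteQuery<MyDTO>();
}
```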
This API could be made extremely fast because:

```csharp
new MyDTO() { Field1 = internalReader.GetNextFieldAsInt32(), Field2 = internalReader.GetNextFieldAsString(), ... }
```

Possible extensions of this idea would be:

- `IEnumerable<object[]>` where all field values are provided as objects.
- `readerObject.GetNextObject<MyDTO>()`.
- `readerObject.GetNextObject<MyDTO>(existingObject)`.
- `IAsyncEnumerable` (or other suitable mechanisms).
- `ParallelQuery`. I'm not optimistic this can be done.
- `IEnumerable<MyDTO[]>`. Here, ADO.NET would hand out rows in chunks. The chunk size would be chosen optimally. Probably, ADO.NET would convert each network buffer into a chunk. This would make the API even less chatty.

I understand that this proposal is a bit raw. The API that I proposed is a straw man that clearly has some flaws. I think an API in this spirit could improve efficiency a lot. It also would be very convenient. The existing pattern is extremely awkward to use. People are using light-weight wrappers like Dapper, which look just like this API proposal.
The API should be designed so that ORMs such as Entity Framework can use it as well. This might not be trivial depending on what Entity Framework does. Here's an idea on how to support arbitrary object instantiation:
Instead of making ADO.NET figure out how to construct the `new MyDTO()...` part of the expression tree, let the user provide that. ADO.NET provides one `Expression` for each field to be deserialized. It could look like this:

In this example, the API caller can create an arbitrary deserialization expression that uses the ADO.NET-provided column values in `columnValueExpressions`. `columnValueExpressions[0]` would correspond to `internalReader.GetNextFieldAsInt32()` in the code further above. Entity Framework could construct an expression tree that does anything it wants, including creating multiple objects, calling factory methods and calling into the object state manager. All of this processing would be very close to obtaining the actual column values from internal ADO.NET data structures. No chattiness.
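The snippet the paragraph above refers to isn't preserved here. As a hedged sketch of the API shape being described (all names hypothetical, and the body deliberately left unimplemented since it would be provider-specific):

```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;
using System.Linq.Expressions;

public static class DeserializationExtensions
{
    // ADO.NET would supply columnValueExpressions bound to its internal reader state;
    // the user callback returns the body of the per-row materializer.
    public static IEnumerable<T> ExecuteQuery<T>(
        this DbCommand command,
        Func<IReadOnlyList<Expression>, Expression> buildRowMaterializer)
    {
        throw new NotImplementedException(
            "Provider-specific: compile the returned tree once, then run it per row.");
    }
}

// Usage: Entity Framework, Dapper or user code shapes the row however it wants, e.g.:
//
// var rows = command.ExecuteQuery<MyDTO>(columnValueExpressions =>
//     Expression.New(
//         typeof(MyDTO).GetConstructor(new[] { typeof(int), typeof(string) }),
//         columnValueExpressions[0],      // e.g. internalReader.GetNextFieldAsInt32()
//         columnValueExpressions[1]));    // e.g. internalReader.GetNextFieldAsString()
```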