Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ValueTask<T> returning async APIs to ADO.NET #25297

Open
roji opened this issue Mar 3, 2018 · 52 comments
Open

Add ValueTask<T> returning async APIs to ADO.NET #25297

roji opened this issue Mar 3, 2018 · 52 comments

Comments

@roji
Copy link
Member

roji commented Mar 3, 2018

With https://github.com/dotnet/corefx/issues/27445 allowing elimination of allocations in asynchronously-executing async methods, we should look into adding ValueTask-returning counterparts to ADO.NET. This would potentially allow zero-allocation database access.

Here are the current async methods:

Task DbConnection.OpenAsync(...);

Task<DbDataReader> DbCommand.ExecuteReaderAsync(...);
Task<DbDataReader> DbCommand.ExecuteDbDataReaderAsync(...);
Task<object> DbCommand.ExecuteScalarAsync(...);
Task DbCommand.ExecuteNonQueryAsync(...);

Task<bool> DbDataReader.ReadAsync(...);
Task<T> DbDataReader.GetFieldValueAsync<T>(...);
Task<bool> DbDataReader.IsDBNullAsync(...);

Notes:

  1. Naming the new methods is going to be complicated. Some parts of corefx are lucky in that they're introducing ValueTask<T> overloads along with Span<T> (e.g. Stream), but ADO.NET has no such upcoming parameter changes/additions. Since there can't be yet another overload that only differs on the return type, and the "standard" name with just the Async suffix is taken, we need to come up with a new naming pattern.

  2. There are some missing async methods in ADO.NET in general (e.g. DbCommand.Prepare(), DbConnection.BeginTransactionAsync(...), DbTransaction.CommitAsync(...)...). These can be added directly with return value ValueTask<T> (covered by issue https://github.com/dotnet/corefx/issues/35012). Naming of these new methods should probably be decided in the context of whatever new naming pattern we use for the ValueTask<T> returning variants of the existing async methods.

  3. Also note that async APIs that return Task or Task<bool> (such as ReadAsync() and IsDBNullAsync()) can be implemented to return cached Task<T> instances when they complete synchronously to gain efficiency instead of adopting ValueTask<TResult>. This mitigates the need for ValueTask<T> returning variants for most APIs. Also, in terms of allocations, the advantage of ValueTask<T> is much greater the more fine grained the async methods are.

  4. Compatibility for providers: It seems desirable to offer a default implementation of the Task-based method that calls AsTask() on the ValueTask<TResult> instance returned by the new API, so that new providers don't need to implement the old Task<TResult> version of the API but instead only implement the more efficient ValueTask<TResult> version. It also seems good not to force existing providers to switch.


**Updated by @divega on 1/13/2019 to consolidate with https://github.com/dotnet/corefx/issues/15011.

@divega
Copy link
Contributor

divega commented Mar 3, 2018

System.Data Triage: We should consider this, but since it isn’t for 2.1, moving to future.

@davidfowl
Copy link
Member

Are any of these methods streaming? Would IAsyncEnumerable be interesting here?

@divega
Copy link
Contributor

divega commented Mar 3, 2018

@davidfowl DbDataReader.Read is streaming. The current API does not resemble IEnumerable (possibly because it predates it 😄), but the new API could use IAsyncEnumerable if the timing is good.

See this was mentioned on aspnet/DataAccessPerformance#32.

@roji
Copy link
Member Author

roji commented Mar 3, 2018

DbDataReader implement IEnumerable, exposing rows as DbDataRecord. It's an old API (non-generic, sync-only), and the default implementation, which uses DbEnumerator, is also pretty inefficient: it calls DbDataReader.GetValues(), which copies the row's values into an object[] (which gets allocated on each MoveNext()...).

But I actually don't really see a reason not to also have DbDataReader implement IAsyncEnumerable... It could provide a modern and efficient way to stream over the the rows of a resultset. Opened dotnet/corefx#27684 to track.

@GSPP
Copy link

GSPP commented Apr 2, 2018

Is the allocation overhead really significant when opening a connection or executing a reader? Executing select null against SQL Server on the same machine over shared memory and using synchronous IO takes about 200us. This already is so much more than allocating a few objects and collecting them.

Adding ValueTask versions for reading rows and fields might be valuable.

@roji
Copy link
Member Author

roji commented Apr 2, 2018

You're right that when OpenAsync() returns asynchronously (i.e. a physical open or wait), the memory overhead is probably negligible. For synchronous invocations (no I/O, pooled idle connection is returned) there is no memory overhead, since OpenAsync() returns a non-generic task which can be cached. So I agree this has more value for DbDataReadet.Read().

However, don't forget that ADO.NET is sometimes used to access databases like Sqlite, where the physical open can be much much faster (no networking), so it could be significant there.

@Grauenwolf
Copy link

Grauenwolf commented Nov 7, 2018

ValueTask DblDataReader.GetFieldValueAsync(...);
ValueTask DbDataReader.IsDBNullAsync(...);

If we're going to touch this, we might as well fix the IsDBNull problem at the same time. Currently we need to make two calls, once to check for nulls and again to actually read the value. When I profile my ORM, IsDBNull is often flagged as most expensive operation in aggregate. (EDIT: this is no longer true.)

We can probably make it a lot more convenient efficient with methods such as

ValueTask<T?> DblDataReader.GetFieldValueOrNullAsync(...);

@roji
Copy link
Member Author

roji commented Nov 7, 2018

Depending on the ADO.NET provider implementation, calling IsDBNull() separately from GetFieldValue<T>() isn't necessary a big perf problem (I'm not sure it's the case with Npgsql). But introducing something like GetFieldValueOrNull() (and its async counterpart) would be a good idea regardless, if only for improving the API/usability.

Note that using the async versions of these methods is probably a bad idea unless CommandBehavior.SequentialAccess is specified, since the entire row is pretty much guaranteed to be buffered in memory anyway...

@Grauenwolf
Copy link

You're right. When I last benchmarked it on .NET Framework a year or two ago it was a major issue. Now it's barely visible in the performance profiler.

@Grauenwolf
Copy link

Still, it would be nice for those who are using ADO.NET directly to not have to write those checks.

@roji
Copy link
Member Author

roji commented Nov 8, 2018

Absolutely, it would be great if ADO.NET got a bit of a face-lift in terms of API usability, regardless of any perf aspects...

Good to know that perf-wise things are better, there have been a lot of changes in recent two years.

@GSPP
Copy link

GSPP commented Nov 8, 2018

If this proposed deserialization API is implemented then the ADO.NET APIs will not be very chatty. There will be less of a need for ValueTask. ValueTask can be used internally.

@divega
Copy link
Contributor

divega commented Nov 9, 2018

@roji @Grauenwolf I would like us to track the idea of GetFieldValueOrNull[Async] as a separate issue. Would you like to create it?

@Grauenwolf
Copy link

I haven't read the underlying code yet, so I think someone else would be better suited to proposing an API.

@MgSam
Copy link

MgSam commented Nov 9, 2018

I vote against mixing ValueTask and Task return types arbitrarily based solely on the age of the method. ValueTask return types should only be highly targeted to methods likely to be called over and over.

I do agree BeginTransactionAsync and CommitTransactionAsync are needed, though they should not be ValueTask.

@divega
Copy link
Contributor

divega commented Nov 9, 2018

@Grauenwolf stating the problem with the current API would be enough. But no worries.

@Grauenwolf
Copy link

I'm ok with BeginTransactionAsync and CommitTransactionAsync not being ValueTask.

Really it is only DBDataReader that I care regarding ValueTask because it is in a tight loop.

@divega divega changed the title Add ValueTask overloads to ADO.NET async APIs Add ValueTask<T> returning async APIs to ADO.NET Feb 1, 2019
@divega
Copy link
Contributor

divega commented Apr 3, 2019

@roji I wonder how this would look like now that we have a better understanding of the guidance re ValueTask<T> vs. Task<T> (from dotnet/efcore#15184).

It seems to me that only the API on DbDataReader is worth considering.

@roji
Copy link
Member Author

roji commented Apr 3, 2019

@divega I agree - for the execution APIs which never return synchronously, it doesn't make much sense to think about ValueTask. I will update the description soon to reflect this.

There's also dotnet/corefx#35012 (adding missing async methods, as opposed to ValueTask counterparts to existing async methods). It seems that proposal already follows the guidance - the only ValueTask-returning methods are DisposeAsync() (so coming from IAsyncDisposable) and CloseAsync(), which is almost identical to DisposeAsync().

@divega
Copy link
Contributor

divega commented Apr 3, 2019

@roji perhaps we should get everything we want from this issue copied into dotnet/corefx#35012 and bring that one to API review. I am saying this because the scope of this issue (#27682) is probably now small enough that it wouldn't affect the chances of dotnet/corefx#35012. I also think it helps have a more holistic view during the API review.

@divega
Copy link
Contributor

divega commented Apr 3, 2019

Though there is still the naming problem 😄

@roji
Copy link
Member Author

roji commented Apr 4, 2019

Yeah, and for GetFieldValue there's also (in theory) still the question of better nullability handling (even though right now we don't seem to have anything better). That's why it may be good to handle them differently - I think dotnet/corefx#35012 can be brought to design review right away.

@divega
Copy link
Contributor

divega commented Apr 4, 2019

I added api-ready-for-review on dotnet/corefx#35012. Let's at least discuss the ValueTask<bool> version of ReadAsync there.

@roji
Copy link
Member Author

roji commented Apr 4, 2019

@divega a small note after the recent discussions: ValueTask<bool> seems to be the same case as ValueTask, in that it provides no value for synchronous completion: Task is already cached (only two possible values). It may still make sense to add ValueTask<bool> overloads in case a provider ever decides to optimize the asynchronous completion case (via IValueTaskSource) - although at this point that seems pretty unlikely, given the effort required (see @stephentoub's comment quoted in dotnet/efcore#15184 (comment)).

We could defer adding those APIs until an easier alternative is introduced (as is planned for some point) - since until then the ValueTask<bool> version will actually be a bit slower than the Task one (due to the overhead of the added layer etc.). What do you think?

However, in any case we still have GetFieldValueAsync() which would benefit from an overload returning a ValueTask<T>.

@divega
Copy link
Contributor

divega commented Apr 4, 2019

Yes @roji, however DbDataReader.ReadAsync is exactly like IAsyncEnumerable<T>.MoveNextAsync, which is already optimized for the async completion case. I suspect someone willing to optimize it could reuse ManualResetValueTaskSourceCore<TResult> and duplicate the logic used to build async iterators.

I am hoping that in the API review meeting we can get the right people to ask about this.

@roji
Copy link
Member Author

roji commented Apr 5, 2019

@divega I'll try to investigate a bit how that works. But you're that in any case, even if today it's difficult to benefit from the async completion optimization, it may become easier in the future, and this specific API is definitely a place where that may make sense. As you say, raising this in the API review meeting would probably get us the answers we need.

Regardless, I'll soon update the description to reflect this discussion.

@GSPP
Copy link

GSPP commented Apr 9, 2019

An argument can be made for providing the highest possible performance alternative (likely ValueTask): Who still uses raw ADO.NET these days? That is rare. Normally, people use an ORM or a thin wrapper such as Dapper. Inconvenient APIs are hidden from most callers.

@MgSam
Copy link

MgSam commented Apr 9, 2019

@GSPP Tons and tons of people still use ADO.NET. There are huge classes of developers that refuse to use anything that is not shipped with .NET. Not to mention all the legacy code out there.

@roji
Copy link
Member Author

roji commented Apr 9, 2019

@GSPP can you explain how your argument that users don't usually use ADO.NET directly relates to this issue? At least at this point it isn't really recommended to add APIs everywhere that return non-generic ValueTask, as it is very unlikely that this would help perf in any way (it would actually degrade perf in some cases compared to returning non-generic Task).

There's also a direction of thought about making ADO.NET more usable directly (as opposed to usable only via other layers).

@jaredpar
Copy link
Member

jaredpar commented Sep 4, 2019

VB didn’t get Task likes?

Correct. It is still limited to defining async methods that return Task / Task<T>. It can consume ValueTask / ValueTask<T>, or any other type that implements the awaiter pattern.

https://sharplab.io/#v2:EYLgtghgzgLgpgJwDQBsQDdhJiAligHwEkwAHAewRigAIBlAT1jjAFgAoEiq2x5sAHQAVABYI4EACa4AdgHNh0ANZQOABQCuwFLgDGNAMIpotAxxoWam7XpoBBKAxn6AYhucxc5GTQCyAChh7WiFlAEpgmlCoJXNLeLsAdwhcIJg4iwBRGUkaNw8vGQyaYusdfQcnV3ddT28/ACZAyIA1CBQNOGilCIcaNo6u5WKE5NSadPZ47Nz82sKOGcNjKCggA==

@Grauenwolf
Copy link

Grauenwolf commented Sep 4, 2019 via email

@jaredpar
Copy link
Member

jaredpar commented Sep 4, 2019

There is nothing preventing VB from implementing a member that has a ValueTask return type. VB just can't do it using the Async function style. If an Async style method was needed then a simple thunking approach can be used to put the logic in an Async style method and call that in a ValueTask returning method directly.

@Grauenwolf
Copy link

Grauenwolf commented Sep 4, 2019 via email

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@ajcvickers ajcvickers removed the untriaged New issue has not been triaged by the area owner label Jun 23, 2020
@ajcvickers ajcvickers modified the milestones: 5.0.0, Future Jun 23, 2020
Copy link
Contributor

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of our issue cleanup automation.

@dotnet-policy-service dotnet-policy-service bot added backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity labels Dec 23, 2024
Copy link
Contributor

This issue will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the issue, but please note that the issue will be locked if it remains inactive for another 30 days.

@dotnet-policy-service dotnet-policy-service bot removed this from the Future milestone Jan 6, 2025
@Wraith2
Copy link
Contributor

Wraith2 commented Jan 6, 2025

I'm not sure if we want this closed. It would be very useful to have ValueTask field value getters at least. We can make them provider specific if we need to but it would be preferable not to. The problem is naming. The name GetFieldValueValueTaskAsync<T> is clumsy and the best alternative I can come up with is GetValueTask2Async<T>

@dotnet-policy-service dotnet-policy-service bot removed no-recent-activity backlog-cleanup-candidate An inactive issue that has been marked for automated closure. labels Jan 6, 2025
@roji
Copy link
Member Author

roji commented Jan 7, 2025

Reopening to keep this on our radar... FWIW I think GetFieldValueAsync() is generally rarely needed, since it's mainly useful when CommandBehavior.SequentialAccess is specified (i.e. big rows being streamed); in the regular/default mode, the entire row is buffered anyway (and can be read in random order), so using the sync API (GetFieldValue) is perfectly fine.

But yeah, for sequential access mode we should indeed have a non-allocating API - which would ideally also improve the usability around nulls as well.

@roji roji reopened this Jan 7, 2025
@roji roji added this to the Future milestone Jan 7, 2025
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jan 7, 2025
@roji roji removed the untriaged New issue has not been triaged by the area owner label Jan 7, 2025
@Wraith2
Copy link
Contributor

Wraith2 commented Jan 7, 2025

The knowledge you need to have to understand that using GetFieldValue is probably ok is quite high. Enabling users in async contexts to casually use await GetFieldValue2Async without exposing them to the current high costs of Task versions is desirable.

@roji
Copy link
Member Author

roji commented Jan 7, 2025

Fair enough, though I wouldn't call the costs of Task allocations "high" for the typical application. In any case I agree this is a good thing to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests