-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better API for ADO.NET command batching #15375
Comments
@roji Does PostgreSQL support SQL Batching ? It is used for SQL Batching support. e.g "select 1; update *...; select 2;" |
@kkurni, here are some details. PostgreSQL supports two network protocols: simple and extended (here are the detailed docs). In the simple protocol you can send a single message with "select 1; update *...; select 2;", PostgresSQL will parse out the multiple statements and return 2 result sets. However, the simple protocol has several shortcomings: no binary data, no parameter support, no prepared statements. For these reasons, Npgsql 3.x switched to the extended protocol. In the extended protocol you send a series of message - Parse, Describe, Bind, Execute, Sync. This allows you to do everything, but you can send only one SQL statement each time. You can still pack multiple messages into a single packet, achieving good network performance. Npgsql 3.x does allow users to set DbCommand.CommandText to "select 1; update *...; select 2;". However, since it uses the extended protocol it is forced to parse the statements client-side, and send multiple sets of messages in a single packet (2 Parse messages each with a single statement, 2 Describe messages, and so on). To summarize, PostgreSQL does support SQL batching, but requires you to submit the different SQL statements in separate protocol messages (if you use the extended protocol). Therefore, the ADO.NET batching API, which requires you to pack multiple statements into a single string with semicolon separators, is unsuitable... Let me know if you need any more details! |
Thanks @roji for providing us with this detail. This specification seems to be a provider specific. If we end up implementing your request, this will be breaking changes for other provider as well. public class MyCommand : DbCommand
{
public List<string> CommandList
{
get;
set;
}
private string _commandText;
public override string CommandText
{
get { return _commandText; }
set
{
if (IsExtendedQuery(value))
{
throw new InvalidOperationException("Use CommandList for extended Query");
}
_commandText = value;
}
}
private bool IsExtendedQuery(string query)
{
//check if it is extended query
return false; //or true
}
} |
@kkurni, I think I didn't explain myself clearly... The PostgreSQL extended vs. simple protocol isn't really relevant here, it was just to give more context. The problem is that ADO.NET's current batching API assumes that provider drivers can simply send a single string with multiple SQL statements and semicolons, which the backend will take care of splitting and executing. This simply isn't the case for PostgreSQL (extended protocol). The solution could simply be to add a This would provide a structured, fully-flexible batching API while fully preserving all existing ADO.NET providers. Please let me know if this is unclear in any way. |
Thanks @roji for clarifying this. by adding
We can potentially add a flag like you suggested above It will be simpler to add string[] CommandTexts only to provider which need this. E.g PostGreSQL. |
The addition of a CommandTexts property definitely adds a little of complexity (as always when something is added), but I'm not sure this complexity is added to other providers. If, as I proposed above, the default implementation of this property is to simply set and get the CommandText property, then things are extremely simple - there is only one CommandText value, and CommandTexts reads and writes to it). No need for any sort of legacy flag - the last set of either CommandText or CommandTexts will be the definitive one. So again, unless I'm missing something this adds a useful, more structured batching API without complexifying existing providers in any way. Of course I could add CommandTexts to Npgsql, but then I'm forcing people to write Npgsql-specific code in order to do batching. The point is that ADO.NET is supposed to be a provider-independent API, but its batching API models things in a way which isn't. |
How will CommandTexts read back the value from CommandText ? Except that we only allow to SetCommandTexts without able to read them back. |
Providing a set-only function definitely could work, as you suggest. |
@YoungGah, please let me know if you have any questions on this. It's extremely problematic to require providers to fully parse SQL only because their database protocol doesn't allow sending semicolon-joined statements. |
@roji, I understand the issue and I agree that the API is not optimal. In the past, we discussed providing a better API (similar to what you proposed) within the owning team, but due to complexity of error handling and other higher priority works, we never got to add the new API. We will evaluate and prioritize this request post V1, but the ask is clear and make sense. We will update on this request after we release V1. |
Great, thanks @YoungGah. This is obviously not high-priority - existing providers such as Npgsql already have some sort of answer to the problem. But it's an important long-term goal to be able to cut out SQL parsing. |
There's one more related issue with the ADO.NET batching API: managing parameters. Parameters are currently managed as a collection on the DbCommand - they are therefore "common" to all statements being sent in a batch; there's no association between a parameter and a specific statement in a batch. Here's why this is a problem. In the PostgreSQL wire protocol, each statement (whether batched or not) comes with its own parameters - there's no way to send a single parameter and then to reference it from two different statements. This makes ADO.NET's "one parameter collection for all batched statements" approach totally incompatible with PostgreSQL. The current workaround is, again, to parse the statement SQLs, identify parameter placeholders, and send each statement in the batch with its own set of parameters. If and when you guys get around to working on this, I'd recommend you review how the JDBC API works, it may provide some answers. |
@roji I'm curious here: for what range do you consider this API useful? By that I mean, what's the reasonable upper-bound before we're into bulk insert territory anyway? I'm thinking about how Dapper would support these scenarios across providers as well and just trying to gain an idea of expected use. |
That's a very good question - there's definitely overlap between bulk insert and command batching. One key thing about batching is the ability to do anything, not just insert - I can batch DELETEs, UPDATEs, or even stored procedure calls. On the other hand, bulk insert is supposed to provide a way of inserting even more optimized than batching INSERT commands. My main goal here is to make sure that ADO.NET's batching capabilities are more compatible with databases other than SqlServer. After all, a batching "API" already exists in ADO.NET (if you can call shoving a semicolon in the text an API); but while it works well for SqlServer, it's not well-suited for PostgreSQL, for example. Does that make sense? |
Note: have opened npgsql/npgsql#1042 to handle this specifically in Npgsql, in case you guys don't get around to working on it. |
FYI, I'm working on implementing this for PostgreSQL, i.e. a structured statement API within DbCommand which doesn't depend on semicolon-delimited statements packed inside CommandText. The API looks like this: using (var cmd = new NpgsqlCommand
{
Connection = conn,
Statements =
{
new NpgsqlStatement
{
SQL = "SELECT $1",
InputParameters = { new NpgsqlParameter { Value = 8 } }
},
new NpgsqlStatement
{
SQL = "SELECT $1",
InputParameters = { new NpgsqlParameter { Value = 9 } }
}
}
}) Note how each NpgsqlStatement has its own parameter list, this again corresponds to how the PostgreSQL protocol actually represents statements (parameter per statement rather than a shared parameter list for all statements of a command). To decide which mode the command uses, i.e. whether it looks at CommandText or at the structured Statements list, it's simply a matter of which API is used last. Setting CommandText sets a flag one way, and modifying the Statements list in any way sets it the other way. Any comments would be welcome, I'd love to see something like this in standard ADO.NET one day. |
in abo.net , we can use System.Data.SqlClient.SqlCommandSet for batch insert although SqlCommandSet actually internal
|
@fs7744 thanks for commenting, this is the first time I hear of SqlCommandSet! This could indeed be a way forward, but this class seems to be internal? How are you using it? |
Oh sorry, I missed that you actually said that. Any idea what this class is used for? |
in abo.net's design, SqlCommandSet is actually for SqlDataAdapter, so it's internal someone like use SqlCommandSet when without DataAdapters like http://stackoverflow.com/questions/402630/batching-in-ado-net-without-dataadapters You cane see SqlDataAdapter how to use SqlCommandSet in SqlDataAdapter.cs |
Well, it's true that this class is an API for performing batched operations without CommandText concatentation. However, it's not clear what this contributes to the overall problem - the class isn't a public API... @YoungGah, do you have any guidance here? I'd hate to go and implement my API in Npgsql just to change things again later. Does the team have any direction in mind, or at least an idea whether this will be tackled at some point? |
NHibernate, for example, got around |
I wonder if there's really a performance benefit associated with using |
@YoungGah, it's been a long while since someone from Microsoft commented on this. Is it possible to get some visibility on whether the ADO.NET API is going to get any attention soon? I'm specifically looking for some guidance on what to do here, and I'd like to avoid going down a path that wouldn't be compatible with ADO.NET later. Specifically, SqlClient appears to support batching via two separate APIs - a single command with semicolons in its CommandText, and the internal SqlCommandSet. As I said above, I have a branch with a an NpgsqlRawCommand, which itself can contain multiple NpgsqlStatements - but I'd be happy to have a serious conversation about where this is going before merging this kind of thing. |
MySQL has a similar problem/opportunity. There is a text protocol that allows multiple commands to be sent, and returns multiple result sets. However, it doesn't support parameter binding, prepared statements, and transmits all data types in a textual format. Then there is a binary protocol that supports parameter binding, prepared statements, and an efficient binary resultset; however, it only supports one command at a time. As a result, in order to use the binary protocol, the client has to be restricted to only executing a single command at a time (which is inefficient in terms of network round-trips), or the connector library has to parse the SQL and split it into multiple commands (which runs into all the problems @roji has mentioned with SQL parsing, parameter management, etc.). |
@bgrainger that's interesting, the MySQL protocol situation seems to be exactly identical to PostgreSQL's protocol situation... originally there was only a simple protocol which support multiple statements in the same protocol message, but no parameters, prepared statements or binary data representation. Then they added the "extended protocol", which supports all the above features but does one-statement-per-message. Statements can still easily be batched by packing several messages into the same packet (which is what Npgsql does), but ADO.NET forces Npgsql to parse since it doesn't has a proper batching API and forces users to shove multiple statements into the same CommandText. I really hope you guys at Microsoft start looking at ADO.NET soon, these issues have been around for quite a while and the API has stagnated. |
There's actually a MS SQL Server use case for this feature request, as well. With SQL Server and ADO.Net, as has already been pointed out, it's currently possible to build a single command consisting of multiple SQL statements. However, the same doesn't hold true for stored procedures: It's currently not possible to invoke multiple stored procedures using a single Implementing this feature request could make this use case possible. Keep in mind that, if a query uses parameters, SqlCommand converts the query into a stored procedure call behind-the-scenes. So, supporting this use case would affect a lot more than just direct user invocations of stored procedures--it would allow more efficient transmission of sets of queries that use parameters. |
For batching I'm using solution from NHibernate. Unfortunately on .net core is not implemented. SqlCommandSet type doesn't exist. PS |
@dario-l SqlCommandSet is an SqlClient-specific API, and is not even exposed publicly. The point here would be to add a standard API in ADO.NET, which may or may not end up resembling SqlCommandSet. |
@roji That's the point. I must use hacky way to do things performant. There should be the way to do this with official standard API. If batching is not implemented in .net core yet then, maybe, it is a chance to do this properly even if this requires some changes/extensions in API. |
@dario-l, unless I'm mistaken you can do batching with the official standard API, as described above - you just pack your multiple SQL statements into a single string and set CommandText to it. So the issue is not that an API is lacking, it's that it's unstructured and therefore problematic. |
@roji Yes, I can but this leads to rebuild query by database server every time it will change and will be not cached. I've tested that scenario. |
What do you mean when you say that the query "will change" and have to be rebuilt? If you properly use parameters why would the query have to change? |
@dario-l that's interesting. I wonder if you have the same speed difference if the queries are not identical between them (i.e. two different updates). In any case, we both agree and are asking for the same thing - a standard, structured API in ADO.NET for batching. |
Useful article on SqlClient batching options: https://blogs.msdn.microsoft.com/dataaccess/2005/05/19/does-ado-net-update-batching-really-do-something/ (copied from https://github.com/dotnet/corefx/issues/31070#issuecomment-405393102). |
It's important to note that batching via An ADO.NET batching API would permit exposing SqlCommandSet in a portable way and boost performance there as well. |
The SQL Anywhere ADO.NET driver solved this problem quite elegantly:
Example: cmd.CommandText = "UPDATE message SET text = @text WHERE id = @id";
cmd.Parameters.Add(new SqlParameter("text", "text1");
cmd.Parameters.Add(new SqlParameter("id", 1);
cmd.Parameters.Add(new SqlParameter("text", "text2");
cmd.Parameters.Add(new SqlParameter("id", 2); EDIT: Added example |
One problematic issue with this approach is that it doesn't allow you to batch different statements, only reexecute the same statement with different parameter sets (IIRC the JDBC batching API has the same limitation). One of the goals here is to allow batching of arbitrary statements together (different types of inserts, updates and even selects). Another limitation is that it's difficult to imagine how information on a specific statement in the batch would be returned. For example, it's important to be able to return the records affected for each individual statement in the batch, but this doesn't seem possible without a more structured API that has an explicit idea of a batch and a contained statement. Nevertheless the approach above is elegant within the confines of the current API. |
Superceded by dotnet/corefx#35135 |
ADO.NET currently supports batching several statements in one command by packing them into a single string as follows:
While workable, it's a problematic API. The PostgreSQL protocol, for example, needs to send each single SQL statement as a separate protocol message, forcing Npgsql to split CommandText by semicolon at the client side, a pretty complex task (need to understand when the semicolon is quoted, etc.).
It would be much better to provide a list-based API, where the user provides, say, an array of strings. A specific provider can still choose to join by semicolon if needed, but this should be done at the provider level rather than by the user.
For backwards compatibility, the CommandText property would still support the legacy semicolon batching mode. A boolean LegacyBatchingMode connection string parameter, true by default, could be turned off to disable it (thereby removing the client-side SQL parsing in Npgsql, etc.).
The text was updated successfully, but these errors were encountered: