Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Span-based way to read binary data in ADO.NET #24899

Open
roji opened this issue Feb 2, 2018 · 15 comments
Open

Span-based way to read binary data in ADO.NET #24899

roji opened this issue Feb 2, 2018 · 15 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Data
Milestone

Comments

@roji
Copy link
Member

roji commented Feb 2, 2018

DbDataReader includes a method which allow reading sliced binary data from the database: GetBytes().

With Span coming, it makes sense to add a similar method which simply return a ReadOnlySpan, that can slice the provider's internal buffer - this would provide a zero-copy way to get binary data out. A default implementation could be provided in DbDataReader which internally allocates the byte array and returns a ReadOnlySpan to it, but providers could override it to provide an efficient implementation.

Note that the same isn't possible for char-based data (GetChars()) because text decoding needs to be done by someone.

@davidfowl
Copy link
Member

davidfowl commented Feb 3, 2018

It probably wouldn't return a Span but instead would copy into one.

@roji
Copy link
Member Author

roji commented Feb 4, 2018

@davidfowl part of the idea here is to allow getting the data without any copying. Unless CommandBehavior.Sequential is specified (the vast majority of cases), the user is supposed to be able to seek back and forth in the row, implying that it is already buffered internally in the provider. If this is indeed the case, a ReadOnlySpan can be used to slice the internal buffer and avoid any copying.

To argue against myself, it could be claimed that binary data is usually large and would there be used frequently with CommandBehavior.Sequential, in which case this optimization isn't possible (we don't necessarily have the entire data in memory). But we could always fall back to allocating internally and returning a Span to that.

Don't get me wrong, the version you propose, which copies data into a Span is also needed - it's the same thing as the GetBytes() that's already in ADO but more general for Span.

@divega
Copy link
Contributor

divega commented Feb 8, 2018

System.Data Triage: This looks like a good idea, although we are not going to implement it right away. Moving to Future.

@bricelam
Copy link
Contributor

I bet you could enable this scenario using the existing GetStream() method--just return a span-optimized stream.

var buffer = new Span<byte>();
reader.GetStream(ordinal).Read(span);
var bytes = buffer.Slice(dataOffset, length);

@roji
Copy link
Member Author

roji commented Nov 28, 2018

@bricelam

I bet you could enable this scenario using the existing GetStream() method--just return a span-optimized stream.

Unless I'm confused, this will still copy your data out from the driver's buffer and into the user-provided Span, in that sense it's very similar to @davidfowl's suggestion above.

I was interested in thinking about getting a ReadOnlySpan<byte> directly into the buffered memory in the driver. As I wrote above, I'm not 100% convinced of how relevant this is, since binary data will tend to be long and therefore not necessarily buffered (that would probably depend on CommandBehavior.Sequential is specified or not).

If we're looking for a copying method, than I agree that doing it via the stream is probably best. On a somewhat unrelated note, this seems like a good example of why some sort of compliance testing suite would be helpful (#7810), so that missing support for this in a given provider could be discovered/flagged right away.

@bgrainger
Copy link
Contributor

so that missing support for this in a given provider could be discovered/flagged right away

For example, MySqlDataReader.GetStream in Oracle's Connector/NET doesn't work.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@ajcvickers ajcvickers added the untriaged New issue has not been triaged by the area owner label Jun 22, 2020
@ajcvickers ajcvickers modified the milestones: 5.0.0, Future Jun 23, 2020
@ajcvickers ajcvickers removed the untriaged New issue has not been triaged by the area owner label Jun 23, 2020
@bgrainger
Copy link
Contributor

add a similar method which simply return a ReadOnlySpan, that can slice the provider's internal buffer

I found myself wishing for a high-performance way to retrieve binary data from SQLite; a method like the one proposed that could return the results of sqlite3_column_blob (which returns a pointer into SQLite's native memory) as a ReadOnlySpan<byte> would be very useful.

Note that SQLite introduces some complex lifetime issues, though:

The pointers returned are valid until a type conversion occurs as described above, or until sqlite3_step() or sqlite3_reset() or sqlite3_finalize() is called. The memory space used to hold strings and BLOBs is freed automatically.

That means that the data returned by "GetReadOnlySpan(ordinal)" might only be valid until the very next column is read (if the method called to read that column requires a type conversion). (And of course it's invalidated if DbDataReader.Read() or DbDataReader.Close() is called.)

@roji
Copy link
Member Author

roji commented May 10, 2022

The lifetime issues are indeed the tricky problem here...

@davidfowl
Copy link
Member

The span needs to be passed in. Take a look at all of the exiting APIs that deal with spans, they all have this pattern.

@roji
Copy link
Member Author

roji commented May 11, 2022

@davidfowl we had this conversation above :) #24899 (comment)

But yeah, to develop this further... It's true that if the row is being streamed rather than buffered (CommandBehavior.SequentialAccess), it may make sense to allow users to pass in a (writable) Span, and to have the driver read into that directly. API-wise this is actually already possible via GetBytes), and Npgsql actually implements it in a zero-copy way (when SequentialAccess is specified).

What we could do is add an overload accepting a Span rather than byte[], for other memory types.

@davidfowl
Copy link
Member

Right, that would allow you to pass a stackalloced span. What happens if the buffer is too small today?

@bgrainger
Copy link
Contributor

Npgsql actually implements it in a zero-copy way

I may be misunderstanding what you're saying, but how does NpgsqlDataReader.GetBytes(int, long, byte[], int, int) fill the user's buffer (the third parameter) in a zero-copy way?

@vonzshik
Copy link
Contributor

I may be misunderstanding what you're saying, but how does NpgsqlDataReader.GetBytes(int, long, byte[], int, int) fill the user's buffer (the third parameter) in a zero-copy way?

NpgsqlDataReader.GetBytes(int, long, byte[], int, int) calls NpgsqlReadBuffer.Read, which reads directly into the provided buffer from the stream (that's true only if CommandBehavior.SequentialAccess is provided, because otherwise the entire row is buffered).

@roji
Copy link
Member Author

roji commented May 11, 2022

Right, that would allow you to pass a stackalloced span.

Yep.

What happens if the buffer is too small today?

GetBytes accepts parameters saying how many bytes to read, from which position in the column and where to put them in the buffer, so I don't think there's a problem (unless the user passes a length that's bigger than the buffer they themselves provided, in which case we throw).

If we added a Span overload, it would proabbly look something like:

public virtual long GetBytes(int ordinal, long dataOffset, Span<bytes> span);

This would allow reading different segments from the same binary column in different calls (the offset would need to always increment if streaming with SequentialAccess). Does this make sense and align with other APIs @davidfowl?

@dmcnaughton-nbc
Copy link

Just wanted to add to this by highlighting the fact that the GetStream() call won't work for the RowVersion/Timestamp data type in MSSQL. So having a Span based read operation on DbDataReader would reduce the need for a byte buffer when reading the RowVersion into something such as a single long value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Data
Projects
None yet
Development

No branches or pull requests

9 participants