Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

SqlClient memory performance changes #32811

Closed
wants to merge 16 commits into from
Closed

Conversation

Wraith2
Copy link
Contributor

@Wraith2 Wraith2 commented Oct 14, 2018

While doing some profiling I realised that SqlClient related classes were causing a lot of memory churn so I looked into why. I've made some performance changes which affect all query paths to some extent but at most effective when using a single command object with a prepared query, in that case the memory usage is almost 10x lower.

Points to note:

  • I have made changes to the _SqlRPC object which change how the system and user parameters are allocated and stored. Originally the system parameters were allocated each execution, I've altered this so the correct number are allocated as needed and re-used cleanly if the query type changes (between prepare, unprepared, proc and exec). User parameters are no longer copied in but referenced by index into the parameter collection. This will need careful review.
  • Many synchronous code paths contained lambda allocations for async usage which weren't needed. There were already a small number of places where these were split into their own functions, I have extended the use of this pattern. In these cases I have marked the broken out functions with NoInlining even though they're unlikely to meet the jit requirements because the jit will continue to change and improve and I don't want those functions to ever be eligible.
  • I spanified the WriteByteArray path and used that to allow the new BitConverter.TryWriteBytes overload with stackalloced buffers to remove memory allocations for writing of 16 and 32 bit reals. This path contains an async continuation when the buffer needs to be expanded which now allocates and copies to a byte[] only when needed.

Things that can still be improved with more extensive knowledge:

  • A major cause of allocations is the boxing of IntPtr to return as an object in TdsStateParserNative.ReadSyncOverAsync when calling into sni.dll. If the return type could be changed to IntPtr it would improve things.
  • SqlCommand.ValidateCommand unconditionally calls into ValidateAsyncCommand which allocates an async state object. ValidateCommand has a bool async arg but it isn't used which I find suspicious that someone would miss, I haven't changed this in case it's deliberate. Please review.

All the SqlClient tests pass locally. All the manual tests I'm able to run pass. All my testing of using it with various queries works. So I think it's stable and working, given the complexity of the library and the wide usage it will need some careful prodding from reviewers.

I can't get benchmarkdotnet to co-operate with the project despite help from people in gitter/corefx's help. I do have profiling output which shows the level of improvement. I have written the benchmarks and was using them on the original problem code I used the same setup/run/teardown loop from the test to profile with. The tests consist of generating 1000 numbers and then inserting them into a table using a prepared/unprepared/sproc query, and looping that.

I'm not an async expert so I haven't done much work on the async paths. They still work, they're no slower and they do benefit from the changes I've done. The profiling shows that async machinery dominates the allocations running the same test rig using async versions of functions.

Before changes, 100x1000 float32 prepared insert:
sqlclient-reference

After changes:
sqlclient-development

FAO: @keeratsingh, @afsanehr, @David-Engel

@saurabh500
Copy link
Contributor

@Wraith2 Are you still working on this PR? There is a bunch of commented out code, which can be deleted. I notice the [WIP] in the title, but wanted to double check.

@Wraith2 Wraith2 changed the title [WIP] SqlClient memory performance changes SqlClient memory performance changes Oct 18, 2018
@Wraith2
Copy link
Contributor Author

Wraith2 commented Oct 18, 2018

The build failure is caused by the new BitConverter.TryWriteBytes methods not being available in some builds. How would you suggest I deal with that? The changes are desirable and remove small buffer allocs. Would it be a good idea to #ifdef them or should I simply revert the change?

@danmoseley
Copy link
Member

Re BitConverter - if it's just one or two lines it's probably best to use #if netcoreapp with some .NET Standard compatible alternative in the #else.

The project file has <DefineConstants Condition="'$(TargetsNetCoreApp)' == 'true'">$(DefineConstants);netcoreapp</DefineConstants>

@Wraith2
Copy link
Contributor Author

Wraith2 commented Oct 18, 2018

Thanks, done. There are a small number of other places in the project that could also use those BitConverter overloads but the gains would be extremely minor and I'm not intending to change them in this PR.

@keeratsingh
Copy link

@Wraith2
The following Manual Tests are failing on Linux and Mac with the error, can you kindly confirm and fix the same.

Exception

System.ArgumentException : Destination is too short.

Failing Tests

System.Data.SqlClient.ManualTesting.Tests.DDDataTypesTest.MaxTypesTest [FAIL]
System.Data.SqlClient.ManualTesting.Tests.DDDataTypesTest.XmlTest [FAIL]
System.Data.SqlClient.ManualTesting.Tests.DataReaderTest.CheckSparseColumnBit [FAIL]
System.Data.SqlClient.ManualTesting.Tests.MARSSessionPoolingTest.MarsExecuteReader_Text_NoGC [FAIL]
System.Data.SqlClient.ManualTesting.Tests.MARSSessionPoolingTest.MarsExecuteReader_RPC_NoGC [FAIL]
System.Data.SqlClient.ManualTesting.Tests.MARSSessionPoolingTest.MarsExecuteReader_StoredProcedure_WithGC [FAIL]
System.Data.SqlClient.ManualTesting.Tests.MARSSessionPoolingTest.MarsExecuteScalar_AllFlavors [FAIL]
System.Data.SqlClient.ManualTesting.Tests.MARSSessionPoolingTest.MarsExecuteReader_Text_WithGC [FAIL]
System.Data.SqlClient.ManualTesting.Tests.MARSSessionPoolingTest.MarsExecuteNonQuery_AllFlavors [FAIL]
System.Data.SqlClient.ManualTesting.Tests.SqlBulkCopyTest.Bug903514Test [FAIL]

Stack Trace System.Data.SqlClient.ManualTesting.Tests.DDDataTypesTest.XmlTest

at System.ReadOnlySpan`1.CopyTo(Span`1 destination) in /root/coreclr/src/System.Private.CoreLib/shared/System/ReadOnlySpan.Fast.cs:line 181 
at System.Data.SqlClient.TdsParserStateObject.WriteByteArray(ReadOnlySpan`1 b, Int32 len, Int32 offsetBuffer, Boolean canAccumulate, TaskCompletionSource`1 completion, Byte[] array) in /mnt/vsoagent1/1/s/src/System.Data.SqlClient/src/System/Data/SqlClient/TdsParserStateObject.cs:line 3079 
at System.Data.SqlClient.TdsParser.WriteString(String s, Int32 length, Int32 offset, TdsParserStateObject stateObj, Boolean canAccumulate) in /mnt/vsoagent1/1/s/src/System.Data.SqlClient/src/System/Data/SqlClient/TdsParser.cs:line 5685 
at System.Data.SqlClient.TdsParser.WriteUnterminatedValue(Object value, MetaType type, Byte scale, Int32 actualLength, Int32 encodingByteSize, Int32 offset, TdsParserStateObject stateObj, Int32 paramSize, Boolean isDataFeed) in /mnt/vsoagent1/1/s/src/System.Data.SqlClient/src/System/Data/SqlClient/TdsParser.cs:line 9546 
at System.Data.SqlClient.TdsParser.TDSExecuteRPCAddParameter(TdsParserStateObject stateObj, SqlParameter param, MetaType mt, Byte options) in /mnt/vsoagent1/1/s/src/System.Data.SqlClient/src/System/Data/SqlClient/TdsParser.cs:line 7313 
at System.Data.SqlClient.TdsParser.TdsExecuteRPC(_SqlRPC[] rpcArray, Int32 timeout, Boolean inSchema, SqlNotificationRequest notificationRequest, TdsParserStateObject stateObj, Boolean isCommandProc, Boolean sync, TaskCompletionSource`1 completion, Int32 startRpc, Int32 startParam) in /mnt/vsoagent1/1/s/src/System.Data.SqlClient/src/System/Data/SqlClient/TdsParser.cs:line 7165 
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds) in /mnt/vsoagent1/1/s/src/System.Data.SqlClient/src/System/Data/SqlClient/SqlCommand.cs:line 2592 
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite, String method) in /mnt/vsoagent1/1/s/src/System.Data.SqlClient/src/System/Data/SqlClient/SqlCommand.cs:line 2469 
at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, Boolean sendToPipe, Int32 timeout, Boolean asyncWrite, String methodName) in /mnt/vsoagent1/1/s/src/System.Data.SqlClient/src/System/Data/SqlClient/SqlCommand.cs:line 1176 
at System.Data.SqlClient.SqlCommand.ExecuteNonQuery() in /mnt/vsoagent1/1/s/src/System.Data.SqlClient/src/System/Data/SqlClient/SqlCommand.cs:line 871 
at System.Data.SqlClient.ManualTesting.Tests.DDDataTypesTest.XmlTest() in /mnt/vsoagent1/1/s/src/System.Data.SqlClient/tests/ManualTests/DDBasics/DDDataTypesTest/DDDataTypesTest.cs:line 43

Copy link

@keeratsingh keeratsingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requested changes for failing tests on Linux and Mac

@Wraith2
Copy link
Contributor Author

Wraith2 commented Oct 19, 2018

I mentioned in my original post that I had run the manual tests that I could, there are ones I can't run because I don't have the databases and setup needed. If there's some information about how to set all of that up then i'll give it a try.
Mac and Linux are also outside my dev experience and while in theory I could setup a Linux vm I lack the knowledge to get to advanced enough a position to successfully use it as a test platform. So unfortunately I can't confirm.

Logically I can only see one way for that particular error to happen in that place, if I'm right it's just a case of the input buffer being longer than the value we want to consume and me being careless about my span length. I've pushed a tiny change which would correct this, can you pull and retry?

@saurabh500
Copy link
Contributor

@Wraith2 For Linux test execution you could either install and setup the SQL Server just the way you did on Windows, or better approach, you could simply point the Linux test environment variable to point to the already setup, test Sql server.
That should take care of running the tests on Linux. using a Linux VM would be a good way to do this. Ubuntu 16.04 would be good to test out these changes.

The steps to setup and build on Ubuntu are at https://github.com/dotnet/corefx/blob/master/Documentation/building/unix-instructions.md

@Wraith2
Copy link
Contributor Author

Wraith2 commented Oct 19, 2018

The part I'm missing is Linux experience, I have none. Going from zero to corefx and sql server dev/testing is a big task. I couldn't run most of the manual tests anyway because there no setup documented for them.

@saurabh500
Copy link
Contributor

saurabh500 commented Oct 19, 2018

@Wraith2 Did your Manual Tests execution on Windows, execute the failing tests above?
You mentioned that you have run a subset of Manual tests, because you were probably facing some issues. What tests ran and what issues were you facing?

@keeratsingh
Copy link

@Wraith2 I still see the same failures on the Linux with the latest changes. Mac run is still in progress.
@afsanehr Will be able to update you once it is done.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Oct 19, 2018

I doubt this test was executed, the only ones I can run are the basic ones that only need an sql server instance and northwind. These probably weren't in that list. As @keeratsingh has said these don't fail on windows though so I couldn't have caught it anyway.

As you can see from the code changes in that area the problem is occurring when there is too much information to continue the write synchronously, since it was passed in as a span it's allocating space and trying to copy into it before passing that into the continuation creation method.
How you can manage to have insufficient space to copy when the slice is length limited to the same value as the temp array I don't know.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Oct 21, 2018

I've spent the weekend getting an Ubuntu vm setup with sql2017 and the parts needed to compile corefx. I can now compile it and attempt to run tests. I say attempt because even the master tests won't work, the first manual test core dumps with an unrelated assertion fault. A tooling warning from adcade seems to block the sqlperf branch tests running and the build.sh script won't recognise -- as a valid argument so I can't force the test with /p:ForceRunTests=true . Combine this with the StressRunner not working at all on windows and the tests for this module seem to very difficult to setup correctly.

The change to WriteByteArray is only needed by the change for float and double at the moment and if needed that set of changes could be dropped. I'd rather not, I'd much rather have a repeatable cross platform set of tests that let me pinpoint the issue but after my last change took 3 months to integrate I don't want the same to happen here.

@AfsanehR-zz
Copy link
Contributor

Just an update that I am still seeing the failures on Mac, similar to Linux.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Oct 22, 2018

Expected. But I have nowhere to go with this unless someone willing to help more than telling me it's broken.

@karelz karelz added the tenet-performance Performance related issue label Oct 23, 2018
//
// Takes a byte array and writes it to the buffer.
//
internal Task WriteByteArray(byte[] b, int len, int offsetBuffer, bool canAccumulate = true, TaskCompletionSource<object> completion = null)
internal Task WriteByteArray(ReadOnlySpan<byte> b, int len, int offsetBuffer, bool canAccumulate = true, TaskCompletionSource<object> completion = null, byte[] array = null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flow of this function is dependent on array . It is a bit confusing, what is the source of data for this method.
If the byte[] array is defined then it is used to create a Span, else b is used.
Will it make sense to create an overload of this function, one with ReadOnlySpan b and one which takes byte[] array ? From the looks of it, if b is passed in and array is passed in, then the data in array takes precedence, which means that the data in b will be ignored.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you pass in a span and no byte array then it will attempt to use only the span.
If you pass in an array then the array will be used as the source of the span.
if the span cannot be written into the current packet then the remaining contents of the span are "promoted" to a heap allocated array so they can be part of the async closure that will used to callback into the method to continue.
Once you're got a full array the method will use that in preference to anything else so you can't be in a position where you accidentally pass a span and an array, array wins.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Wraith2 This is what I want to prevent in the method. Having to read through code to understand what is the data source 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's clear enough I can put it into the code at the top of the function, would that work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, make a comment on top of the function explaining this. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a coment to the header of the function explaining how it's used. I've also changed it from internal to private because i'd already provided wrappers for both array and span only, this way you have to be in the same file as the confusing parametered version to call it and at that point you chould read the comment. I also decided to rename it to WriteBytes because it's more accurate. Externally you have to call WriteByteArray or WriteByteSpan.

…ake private to prevent confused external callers
@danmoseley
Copy link
Member

@Wraith2 is this ready for re-review?

@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 12, 2018

Yes, has been for a while, I thought that's what @saurabh500 was doing...

@saurabh500
Copy link
Contributor

I think the code looks good to me. However I want to hold off the merge, till the EntityFrameworkCore tests have been run with SqlClient with these changes.
That's not an automated process and we haven't gotten to it yet. However running EF tests with changes have helped figure out issues in the past.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 12, 2018

Ok, that makes sense. Is the EF test suite available for me to test against?

When this PR gets tested it might be a good idea to try #33155 as well, both contribute performance increases and open the way for further work I've identified but am blocked on.

@saurabh500
Copy link
Contributor

Yes EF tests are available. It is an open source project as well.
A few months ago I had put together instructions to run the tests. In case EF dev workflow has changed then the instructions may not work.

https://gist.github.com/saurabh500/7aea92c5bd0f2ad3d12f1dd751e6aec8

The over all idea is to clone and build https://github.com/aspnet/EntityFrameworkCore
Once that is done, replace the SqlClient DLL being used by EF, with the version built with the dev branch and run the tests again.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 15, 2018

The EFCore tests run successfully.

repo verision:

Total tests: 10965. Passed: 10827. Failed: 0. Skipped: 138.
Test Run Successful.
Test execution time: 2.9681 Minutes

my version:

Total tests: 10965. Passed: 10827. Failed: 0. Skipped: 138.
Test Run Successful.
Test execution time: 3.5565 Minutes

Your instructions were good but slightly off at the end i think because efcore have changed how it works. Once you update the package version in build\dependencies.props you then have to provide a way to restore that package, not just the dll. I edited build\sources.props to add my corefx bin\packages\Debug directory to the list. Before doing that i ensured that the build was broken by the change to the package version alone so i know it picked up my package from the right place and had to use it.

Mine takes longer because i had to use the Debug build nuget packages. -allconfigurations doesn't seem to build the release packages and i can't find how to force build packages, the tooling docs only reference a batch file that no longer exists. There shouldn't be any behavioural difference between the debug and release versions though, if anything more checks are done with the assertions in debug so it may be a better test.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 19, 2018

Nudge.

@benaadams
Copy link
Member

@dotnet-bot test UWP CoreCLR x64 Debug Build

@saurabh500
Copy link
Contributor

@Wraith2 Nudge acknowledged.
@afsanehr is taking care of testing these changes with different platforms especially Linux. We will take care of merging the PR once we have more confidence with the tests.
Since this change modifies the core part of SqlClient where network packet serialization and deserialization takes place, we are concentrating on this PR before we move on to the other PRs. Any regression due to this PR will break many scenarios .
As a result you won't see much activity on the other SQL PRs of yours as well.
Once we have built confidence about this change, we will merge it in.
Thanks for your patience.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 28, 2018

@dotnet-bot test UWP CoreCLR x64 Debug Build

@AfsanehR-zz
Copy link
Contributor

@Wraith2 Running EFCore tests on Linux, I see a crash in the test EFCore.SqlServer.FunctionalTests

The active test run was aborted. Reason: Assertion Failed
AsyncResult null on callback
   at System.Data.SqlClient.TdsParserStateObject.ReadAsyncCallback[T](IntPtr key, T packet, UInt32 error)
   at System.Data.SqlClient.TdsParserStateObject.ReadAsyncCallback[T](T packet, UInt32 error)
   at System.Data.SqlClient.SNI.SNIMarsHandle.HandleReceiveError(SNIPacket packet)
   at System.Data.SqlClient.SNI.SNIMarsConnection.HandleReceiveError(SNIPacket packet)
   at System.Data.SqlClient.SNI.SNIMarsConnection.HandleReceiveComplete(SNIPacket packet, UInt32 sniErrorCode)
   at System.Data.SqlClient.SNI.SNIPacket.<>c__DisplayClass32_0.<ReadFromStreamAsync>b__0(Task`1 t)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()

As Entity frameworks tests enable MARS randomly and for some tests MARS is always enabled, these failures are happening with MARS enabled code path.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 29, 2018

EF tests enable MARS randomly? that would mean they weren't repeatable and would defeat their purpose wouldn't it? That doesn't make sense...

Interesting but not related as far as I can tell. The error is coming from

Debug.Assert(CheckPacket(packet, source) && source != null, "AsyncResult null on callback");
which is related to a bunch of async machinery I didn't go anywhere near. Either the packet has a null data array or the TCS is null and there's no way to tell which at the moment. Can you verify that test passes without my changes?

My behavioural changes are on write paths, so not involved in this read operation (and _networkPacketTaskSource is exclusively used for reads), the other changes are all minor movements of async out of method bodies and again none of them are involved in what's going wrong, the order of operations hasn't changed and the affected variable isn't present in those paths.

If these non-CI tests need to be run for both state object implementations it would make things easier if there were a way to switch between the two without having to recompile or fiddle with app context switches. Can the implementation switch be re-enabled and put on an environment variable?

@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 29, 2018

Ok, I see what you're seeing now. I'm still certain it isn't directly caused by a change I've made. The non-deterministic behaviour makes me think it's a timing problem. This combined with the quite random order that tests are run makes the error appear in different places each time, even multiple times in the same test when they're run in parallel.

I've managed verify that the TCS isn't null, that means that packet._data is null and that can only be the case when there's a problem receiving data in

Exception e = t.Exception?.InnerException;

In order to test I have to do a build -allconfigurations because it's the only way to force a package build. I've tried the Debugger.Break() trick and that won't work, the code is present in the binary but doesn't function for no reason I can determine. So at the moment I can iterate very slowly and my only method of inspection is assertion. This is possibly the worst debugging environment possible, even messageboxes are friendlier. Attempting to isolate the problem under these conditions is totally impractical.

@saurabh500
Copy link
Contributor

To remove the MARS randomization in EF tests, you could change the tests here #32811 (comment) to make sure that MARS is always set to true. MARS - MultipleActiveResultSets
In the connection string, add MultipleActiveResultSets=true as well.
About debugging, we have used asserts and console logs.

cc @danmosemsft @karelz, in case there are any pointers for debugging on Linux?

@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 30, 2018

Debugging on Linux isn't needed. The managed/unmanaged switch can be hardcoded to force use of the managed path on windows, this is what I've done and replicated the issues reported. It would be useful if this switch was available at runtime even if only in debug builds but I can get around it.

In order to replicate the issue seen at the moment you have to have build sqlclient, publish the nuget, consume that nuget from efcore and run the tests until they fail at some point, which they will but not reliably and not in the same test. When they do fail there is no debug output, attaching a debugger won't get you much information because you don't know when the problem will occur and when it does occur since you're using a package you don't get code or il you're down at assembly level. It may be possible to fix some of these issues but it'll still be slow.

I don't believe that the managed implementation is working correctly. As I said above none of the changes I've made have affected this path through the code. The problem we're seeing looks like the tcp socket is disconnecting and I've been nowhere near that and any timing changes should not affect how this functions. As you're aware that's a large complex and async set of code if there are problems inside it then I'm not going to attempt to isolate and quantify them with the absolute minimum level of debugging support currently available. My day job is maintenance programming and I wanted to work on core to get away from inheriting problems and having to fix them without help and have a community to work with and improve something.

@AfsanehR-zz
Copy link
Contributor

@Wraith2 We understand it's difficult to debug this issue given that you have to generate a package and use the same on EFCore tests. What we would suggest is to break down the PR into three different sections as you mentioned above.
This way it will be easier to track down where the issue is coming from and we can get the working pieces in the driver sooner. Thanks again for all your contributions 😄

@Wraith2
Copy link
Contributor Author

Wraith2 commented Dec 7, 2018

I can split apart the changes but they will have the same effect when you combine them. If i'm right that the error isn't actually coming from my code changes then there's no reason we won't just end up in the same state again, and it won't be because the last change is the cause it'll be that it's the one that changed the timing enough to make it apparent. Have you investigated and do you agree with my conclusions? or have you found information that leads you to think differently?

If you want me to close this and rework in parts can you review the other two open PR's, they're entirely independant of this one but weren't being looked at waiting for this one. #33580 and #33155

@AfsanehR-zz
Copy link
Contributor

AfsanehR-zz commented Dec 7, 2018

@Wraith2

Ok, I see what you're seeing now. I'm still certain it isn't directly caused by a change I've made. The non-deterministic behaviour makes me think it's a timing problem.

I was able to run the EFCore tests on Linux successfully without your changes
and I only see the issue happening running the tests with this pr. I still think it would be beneficial to separate this pull request in three different sections, since we see valuable contribution here and that way we would be able to merge it sooner.

If you want me to close this and rework in parts can you review the other two open PR's, they're entirely independant of this one but weren't being looked at waiting for this one. #33580 and #33155

👍 Will start looking in to these PRs too. I was just wondering if you had already run EFCore tests with those changes as well and if not can you please run them and let us know the results? Thanks!

@Wraith2
Copy link
Contributor Author

Wraith2 commented Dec 7, 2018

I've run the EFCore tests on #33155 in windows mode, they passed, as noted in the issue. I haven't run them for #33580 but I'm not sure it's useful to do so given that it's a tiny change.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Dec 7, 2018

Good news! I have a minimal reproduction of the problem you found with the ReadAsyncCallback. Steps to repro are pretty simple. Checkout current master, change UseManagedSNI to true, build and run the manual tests, watch it fail:

===========================================================================================================
    Discovering: System.Data.SqlClient.ManualTesting.Tests (method display = ClassAndMethod, method display options = None)
    Discovered:  System.Data.SqlClient.ManualTesting.Tests (found 203 test cases)
    Starting:    System.Data.SqlClient.ManualTesting.Tests (parallel test collections = on, max threads = 4)
      System.Data.SqlClient.ManualTesting.Tests.AADAccessTokenTest.AccessTokenTest [SKIP]
        Condition(s) not met: "IsAccessTokenSetup", "IsAzureServer"
      System.Data.SqlClient.ManualTesting.Tests.TvpTest.TestMain [FAIL]
  Assertion Failed
  Assertion Failed
  Assertion Failed
  AsyncResult null on callback

     at System.Data.SqlClient.TdsParserStateObject.ReadAsyncCallback[T](IntPtr key, T packet, UInt32 error) in E:\Programming\csharp7\corefx\src\System.Data.SqlClient\src\System\Data\SqlClient\TdsParserStateObject.cs:line 2722
     at System.Data.SqlClient.TdsParserStateObject.ReadAsyncCallback[T](T packet, UInt32 error) in E:\Programming\csharp7\corefx\src\System.Data.SqlClient\src\System\Data\SqlClient\TdsParserStateObject.cs:line 2688
     at System.Data.SqlClient.SNI.SNIMarsHandle.HandleReceiveError(SNIPacket packet) in E:\Programming\csharp7\corefx\src\System.Data.SqlClient\src\System\Data\SqlClient\SNI\SNIMarsHandle.cs:line 321
     at System.Data.SqlClient.SNI.SNIMarsConnection.HandleReceiveError(SNIPacket packet) in E:\Programming\csharp7\corefx\src\System.Data.SqlClient\src\System\Data\SqlClient\SNI\SNIMarsConnection.cs:line 136
     at System.Data.SqlClient.SNI.SNIMarsConnection.HandleReceiveComplete(SNIPacket packet, UInt32 sniErrorCode) in E:\Programming\csharp7\corefx\src\System.Data.SqlClient\src\System\Data\SqlClient\SNI\SNIMarsConnection.cs:line 165
     at System.Data.SqlClient.SNI.SNIPacket.<>c__DisplayClass32_0.<ReadFromStreamAsync>b__0(Task`1 t) in E:\Programming\csharp7\corefx\src\System.Data.SqlClient\src\System\Data\SqlClient\SNI\SNIPacket.cs:line 276
     at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state) in E:\A\_work\104\s\src\System.Private.CoreLib\shared\System\Threading\ExecutionContext.cs:line 321
     at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread) in E:\A\_work\104\s\src\System.Private.CoreLib\src\System\Threading\Tasks\Task.cs:line 2448
     at System.Threading.ThreadPoolWorkQueue.Dispatch() in E:\A\_work\104\s\src\System.Private.CoreLib\src\System\Threading\ThreadPool.cs:line 621

Tracking back you can find the inner exception from here which will show you a SocketException containing an IOException identifying that the socket connection was closed. This causes null data in the packet and that bubbles up to where you see it.

I opened an issue for this with some more details, https://github.com/dotnet/corefx/issues/33930

@Wraith2
Copy link
Contributor Author

Wraith2 commented Dec 12, 2018

I have split this apart as requested, it is now in 3 parts. #34047, #34048 , #34047 which will allow the various changes to be reviewed more easily.

@Wraith2 Wraith2 closed this Dec 12, 2018
@Wraith2 Wraith2 deleted the sqlperf branch December 15, 2018 11:33
@karelz karelz added this to the 3.0 milestone Dec 21, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants