Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How bulk insert with CDC enabled on SQL Server? #24597

Closed
tecnologer opened this issue Apr 6, 2021 · 2 comments
Closed

How bulk insert with CDC enabled on SQL Server? #24597

tecnologer opened this issue Apr 6, 2021 · 2 comments

Comments

@tecnologer
Copy link

Ask a question

I'm using SQL Server with Entity Framework Core in an ASP.net Core app.

  1. There is a hosted (IHostedService) service with a Thread.Timer that runs once a week, on the DoWork of the timer I have async functions (total: 13) to clone a lot of data into the same table (one function per table).
  2. MS SQL Server has enabled CDC globally

The issue is when it tries to save the new data for Table2 (approx 2k new records) Entity Framework never completes the SaveChangesAsync this process is blocked and it starts consuming the whole RAM available.

I've created a question on Stackoverflow, and I was able to save data calculating all properties on the client-side, but I also had to reduce the number of records. If some table tries to insert more than ~10k the process is blocked on the next table. I mean, if the table1 inserts more than ~10k records when the table2 tries to insert its data (~2k records) the process will be blocked.

If I disable CDC on the SQL side, everything works fine. But I can't disable it because is used by another project.

How should manage this kind of task?

Include your code

I call the cloner functions like this:

public async Task CallAll(ApplicationDbContext context)
{
    try
    {
        _context = context;

        await CloneDataTable1();
        await CloneDataTable2();
        //...
        await CloneDataTableN();
    }
    catch (Exception e)
    {
        //log 
    }
}

Each cloner function is like this one, there are other functions that require a loop to update randomly FKs:

public async Task CloneStudents()
{
    using (var context = _context.CreateNewInstance())
    {
        var existingStudents = context.Students.Where(s => s.UniversityId == _destUniversity.Id);
        var sourceStudents = context.Students.Where(s => s.UniversityId == _sourceUniversity.Id)
                                                .Select(s => new Student()
                                                {
                                                    //clone properties
                                                });

        var newStudents = sourceStudents.Where(s => !existingStudents.Any(es => es.Key == s.Key)).ToArray();

        if (newStudents.Length == 0)
        {
            existingStudents = null;
            sourceStudents = null;
            return;
        }

        try
        {
            context.Students.AddRange(newStudents);
            await context.SaveChangesAsync();
        }
        catch (Exception e)
        {
            //log error
        }
        existingStudents = null;
        sourceStudents = null;
    }
}

Include stack traces

There is no stack trace, I've never got an error.

Include verbose output

dotnet exec --depsfile $SOURCE\Company.DataWarehouse.Access\Company.Product.AdminCenter\bin\Debug\netcoreapp3.1\Company.Product.AdminCenter.deps.json --additionalprobingpath $HOME\.nuget\packages --additionalprobingpath "C:\Program Files\dotnet\sdk\NuGetFallbackFolder" --runtimeconfig $SOURCE\Company.DataWarehouse.Access\Company.Product.AdminCenter\bin\Debug\netcoreapp3.1\Company.Product.AdminCenter.runtimeconfig.json $HOME\.dotnet\tools\.store\dotnet-ef\5.0.4\dotnet-ef\5.0.4\tools\netcoreapp3.1\any\tools\netcoreapp2.0\any\ef.dll dbcontext list --assembly $SOURCE\Company.DataWarehouse.Access\Company.Product.AdminCenter\bin\Debug\netcoreapp3.1\Company.Product.AdminCenter.dll --startup-assembly $SOURCE\Company.DataWarehouse.Access\Company.Product.AdminCenter\bin\Debug\netcoreapp3.1\Company.Product.AdminCenter.dll --project-dir $SOURCE\Company.DataWarehouse.Access\Company.Product.AdminCenter\ --language C# --working-dir $SOURCE\Company.DataWarehouse.Access\Company.Product.AdminCenter --verbose --root-namespace Company.Product.AdminCenter
Using assembly 'Company.Product.AdminCenter'.
Using startup assembly 'Company.Product.AdminCenter'.
Using application base '$SOURCE\Company.DataWarehouse.Access\Company.Product.AdminCenter\bin\Debug\netcoreapp3.1'.
Using working directory '$SOURCE\Company.DataWarehouse.Access\Company.Product.AdminCenter'.
Using root namespace 'Company.Product.AdminCenter'.
Using project directory '$SOURCE\Company.DataWarehouse.Access\Company.Product.AdminCenter\'.
Remaining arguments: .
Finding DbContext classes...
Finding IDesignTimeDbContextFactory implementations...
Finding application service provider in assembly 'Company.Product.AdminCenter'...
Finding Microsoft.Extensions.Hosting service provider...
Using environment 'Development'.
Using application service provider from Microsoft.Extensions.Hosting.
Found DbContext 'ApplicationDbContext'.
Finding DbContext classes in the project...

Include provider and version information

Local:

  • EF Core version: 5.0.2
  • Database provider: Microsoft.EntityFrameworkCore.SqlServer 5.0.2
  • Target framework: dotnet core 3.1.407
  • Operating system: Windows 10 Pro 20H2 (19042.867)
  • IDE: Microsoft Visual Studio Community 2019 Version 16.9.2

SQL Server remote:

  • Database: Microsoft SQL Server Standard (14.0.3356.20)
  • Operating System: Windows Server 2016 Datacenter (10.0)
@ajcvickers
Copy link
Contributor

@tecnologer As far as we can tell, this looks like slowness of inserting when using CDC. You might want to try disabling EF Core batching since it is possible CDC doesn't handle large batches of inserts in one go. Beyond that, the performance here is likely to improve with future implementation of #15059, and the associated #9118 and #10443,

@roji
Copy link
Member

roji commented Apr 14, 2021

To adjust the batching size (set to 1 to disable batching): https://docs.microsoft.com/en-us/ef/core/performance/efficient-updating#batching

@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants