Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with RollbackToSavepointAsync not working after a DbCommand failure due to Geography column (NetTopologySuite) #24818

Closed
jondmcelroy opened this issue Apr 30, 2021 · 12 comments

Comments

@jondmcelroy
Copy link

File a bug

We are wrapping calls to the DbContext SaveChangesAsync with our own transaction, savepoint and rollback.
We noticed that when a command failure happened the rollback would not work due to the system thinking that transaction was no longer available.

internal class ReadWriteUnitOfWork : UnitOfWork, IReadWriteUnitOfWork
    {
        private const string SavepointName = nameof(ReadWriteUnitOfWork) + "Savepoint";
        private readonly IEntityCacheFacade _entityCache;
        private IDbContextTransaction? _transaction;
        private bool _disposed;

        public ReadWriteUnitOfWork(
            UnitOfWorkContext unitOfWorkContext,
            DbContext dbContext,
            IEntityCacheFacade entityCache)
            : base(unitOfWorkContext, dbContext)
        {
            _entityCache = entityCache;
        }

        public async Task SaveChangesAsync()
        {
            var dbContext = DbContext;
            _transaction ??= await dbContext.Database.BeginTransactionAsync();

            await _transaction.CreateSavepointAsync(SavepointName);
            try
            {
                await dbContext.SaveChangesAsync(false);
                await _entityCache.UpdateEntriesAsync(dbContext.ChangeTracker.Entries());
            }
            catch
            {
                await _transaction.RollbackToSavepointAsync(SavepointName);
                throw;
            }
            finally
            {
                await _transaction.ReleaseSavepointAsync(SavepointName);
            }

            // Accept changes after successfully updating cache.
            dbContext.ChangeTracker.AcceptAllChanges();
        }

   ....
}

Application insights show the timeline when we are attempting to save changes on a known sql command failure:
image
You can see that both the Save Transactions (the one we are doing and the one that efcore is doing) work properly.
Then the sql db command fails.
Following that both of the transaction rollbacks fail with this error: The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.

When I attempt to dive into it, I noticed that when Creating the Savepoint the _dbTransaction in the RelationalTransction has both a set Connection and a InternalTransaction object.

image

After it calls SaveChangesAsync when I look at that same RelationTransaction, the fields on the _dbTransation have been set to null.

image

This happens to the __EFSavePoint as well as the one we have wrapped around it.

Has anyone else experienced this problem or have any ideas about what could cause it?

SQLServer Multiple Active Result Sets is not turned on.

Include provider and version information

EF Core version: 5.0.1
Database provider: Microsoft.EntityFrameworkCore.SqlServer - 5.0.1
Target framework: .NET 5.0
Operating system: WINDOWS 10
IDE: Rider 2020.3

@roji
Copy link
Member

roji commented May 2, 2021

It's difficult to understand from the code above what exactly your issue is and how it can be reproduced - I've created a minimal sample with an user savepoint wrapping SaveChanges in addition to the internal EF one, and it seems to work fine. Can you please tweak my code sample below to show what's not working?

Attempted repro
await using var ctx = new BlogContext();
await ctx.Database.EnsureDeletedAsync();
await ctx.Database.EnsureCreatedAsync();

var transaction = await ctx.Database.BeginTransactionAsync();
await transaction.CreateSavepointAsync("MySavepoint");

var blog = await ctx.Blogs.SingleAsync();

// Concurrent change to make the main change fail
await using (var otherCtx = new BlogContext())
{
    var otherBlog = await otherCtx.Blogs.SingleAsync();
    otherBlog.Name = "changed";
    await otherCtx.SaveChangesAsync();
}

blog.Name = "failed_changed";

try
{
    await ctx.SaveChangesAsync();
}
catch (DbUpdateConcurrencyException)
{
    await transaction.RollbackToSavepointAsync("MySavepoint");
    await transaction.RollbackAsync();
    Console.WriteLine("Concurrent update exception occurred");
    return;
}

throw new Exception("Concurrent update exception did not occur");

public class BlogContext : DbContext
{
    public DbSet<Blog> Blogs { get; set; }

    static ILoggerFactory ContextLoggerFactory
        => LoggerFactory.Create(b => b.AddConsole()
            .AddFilter("", LogLevel.Information)
            .AddFilter(DbLoggerCategory.Database.Transaction.Name, LogLevel.Debug));

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0")
            .EnableSensitiveDataLogging()
            .UseLoggerFactory(ContextLoggerFactory);

    protected override void OnModelCreating(ModelBuilder modelBuilder)
        => modelBuilder.Entity<Blog>().HasData(new Blog { Id = 1, Name = "orig" });
}

public class Blog
{
    public int Id { get; set; }
    public string Name { get; set; }
    [Timestamp]
    public byte[] Rowversion { get; set; }
}

@jondmcelroy
Copy link
Author

jondmcelroy commented May 4, 2021

@roji
Thanks for looking at this. I was able to reproduce the issue when using a db table that contained a geography type.

Need to add <PackageReference Include="NetTopologySuite.Core" Version="1.15.3" /> to your test csproj.

The exception that is thrown on SaveChangesAsync is due to the invalid coordinates:

Microsoft.Data.SqlClient.SqlException (0x80131904): The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 2 ("@p0"): The supplied value is not a valid instance of data type geography. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision.

Something about this exception is causing this problem. I am not sure if this is an efcore bug or a nettopologysuite bug at this point.

using System;
using System.ComponentModel.DataAnnotations;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;
using NetTopologySuite.Geometries;

namespace TestEFCoreSavePoint
{
    class Program
    {
        static async Task Main(string[] args)
        {
            await using var ctx = new BlogContext();
            await ctx.Database.EnsureDeletedAsync();
            await ctx.Database.EnsureCreatedAsync();

            var transaction = await ctx.Database.BeginTransactionAsync();
            await transaction.CreateSavepointAsync("MySavepoint");

            var blog = await ctx.Blogs.SingleAsync();

            blog.GeographicCoordinates = new Point(new Coordinate(20, 100)) { SRID = BlogContext.Srid };

            try
            {
                await ctx.SaveChangesAsync();
            }
            catch (Exception)
            {
                await transaction.RollbackToSavepointAsync("MySavepoint");
                await transaction.RollbackAsync();
                Console.WriteLine("Concurrent update exception occurred");
                return;
            }

            throw new Exception("Concurrent update exception did not occur");
        }
    }

    public class BlogContext : DbContext
    {
        
        public const int Srid = 4326;
        public DbSet<Blog> Blogs { get; set; }

        static ILoggerFactory ContextLoggerFactory
            => LoggerFactory.Create(b => b.AddConsole()
                .AddFilter("", LogLevel.Information)
                .AddFilter(DbLoggerCategory.Database.Transaction.Name, LogLevel.Debug));

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
            => optionsBuilder
                .UseSqlServer(
                    @"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0",
                    x => x.UseNetTopologySuite()
                )
                .EnableSensitiveDataLogging()
                .UseLoggerFactory(ContextLoggerFactory);

        protected override void OnModelCreating(ModelBuilder modelBuilder)
            => modelBuilder.Entity<Blog>()
                .HasData(
                    new Blog { Id = 1, Name = "orig", GeographicCoordinates = new Point(new Coordinate(20, 90)) { SRID = Srid } }
                );
    }

    public class Blog
    {
        public int Id { get; set; }

        public string Name { get; set; }

        [Timestamp]
        public byte[] Rowversion { get; set; }

        public Point GeographicCoordinates { get; set; } = null!;
    }
}

@jondmcelroy jondmcelroy changed the title Issue with RollbackToSavepointAsync not working after a DbCommand failure in SaveChanges Issue with RollbackToSavepointAsync not working after a DbCommand failure due to Geography column (NetTopologySuite) May 4, 2021
@roji
Copy link
Member

roji commented May 6, 2021

Thanks for the code repro, I can see the problem happening now.

First, the specific error you're seeing seems to be the same as #19416; switching your two coordinates around makes it go away (new Coordinate(100, 20)).

Now, it seems that this specific exception implicitly terminates the transaction in SQL Server... I've verified this with a raw SqlClient test, without EF being involved in any way (see the code below). You should probably wrap your RollbackToSavepointAsync call in its own try/catch - this is a good idea regardless of this scenario, since any exception coming out of that would bubble up and hide the original exception that caused the rollback to get triggered.

Raw SqlClient test

Insert the following in the csproj:

<PackageReference Include="Microsoft.Data.SqlClient" Version="2.1.0" />
<PackageReference Include="NetTopologySuite" Version="2.3.0" />
<PackageReference Include="NetTopologySuite.IO.SqlServerBytes" Version="2.0.0" />

Program code:

await using var conn = new SqlConnection("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0");
await conn.OpenAsync();

await using var cmd = conn.CreateCommand();

cmd.CommandText = @"IF OBJECT_ID('dbo.Blogs', 'U') IS NOT NULL DROP TABLE [Blogs];";
await cmd.ExecuteNonQueryAsync();
cmd.CommandText = @"CREATE TABLE [Blogs] ([Id] int NOT NULL IDENTITY, [GeographicCoordinates] geography NULL)";
await cmd.ExecuteNonQueryAsync();

var tx = conn.BeginTransaction();
tx.Save("MySavepoint");

cmd.CommandText = "INSERT INTO [Blogs] ([GeographicCoordinates]) VALUES (@p)";
cmd.Transaction = tx;
var value = new Point(new Coordinate(20, 100)) { SRID = 4326 };

var writer = new SqlServerBytesWriter { IsGeography = true };
cmd.Parameters.Add(new SqlParameter
{
    ParameterName = "p",
    SqlDbType = SqlDbType.Udt,
    UdtTypeName = "geography",
    Value = new SqlBytes(writer.Write(value))
});

try
{
    await cmd.ExecuteNonQueryAsync();
}
catch (SqlException)
{
    Console.WriteLine("Caught expected exception");
}

try
{
    tx.Rollback("MySavepoint");
}
catch (SqlException)
{
    Console.WriteLine("Failed to roll back savepoints");
}

tx.Rollback();

@roji roji added the closed-no-further-action The issue is closed and no further action is planned. label May 6, 2021
@ajcvickers ajcvickers added type-bug area-save-changes and removed closed-no-further-action The issue is closed and no further action is planned. labels May 12, 2021
@ajcvickers ajcvickers added this to the 6.0.0 milestone May 12, 2021
@ajcvickers
Copy link
Contributor

Note from triage: we may not be able to do anything here, but we should check that there isn't anything we can do from EF code to make this better.

@ajcvickers ajcvickers reopened this May 12, 2021
@roji
Copy link
Member

roji commented May 12, 2021

The main difficulty is that I don't think ADO.NET / SqlClient exposes information on whether a transaction is still (or currently in progress). There may some information in the error code on SqlException, we can take a look.

@roji
Copy link
Member

roji commented Aug 25, 2021

@cheenamalhotra @David-Engel @Wraith2 would it be possible to have your input on this?

In a nutshell, the repro code causes a legitimate exception with Number=8023, State=4, Class/Severity=16 - all is well and good. The problem is that it also causes the transaction to be terminated; an attempt to roll back to a savepoint from before the offendnig operation throws the exception "This SqlTransaction has completed; it is no longer usable".

I can't see anything in the docs or on the SqlClient API that exposes the fact that this exception terminates the exception - this information is useful in order to correctly react to it. For example, I can see in the docs that Severity 20 and above usually terminates the connection, but no similar information is given on the status of the transaction. In the same way, there doesn't seem to be a way to check if the SqlTransaction instance is still active.

Any ideas or input on this? Does it make sense to escalate the transaction termination to the SQL Server team, maybe this isn't reasonable behavior?

@cheenamalhotra
Copy link
Member

cheenamalhotra commented Aug 25, 2021

@roji

Thanks for the repro 👍🏻
I was able to capture server traces and there seems to be an exception occurring when RPC is executed.
This is what triggers the transaction termination.

image

It could be a driver issue, please transfer this to SqlClient repository and we will investigate further.

@cheenamalhotra
Copy link
Member

This is the complete error:

The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 1 ("@p"): The supplied value is not a valid instance of data type geography. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision.

I'm still not 100% sure if this is SqlClient issue and wondering if this is due to geography type issue? Could it be related to value generated by : var value = new Point(new Coordinate(20, 100)) { SRID = 4326 }; is not correct?

@roji
Copy link
Member

roji commented Aug 25, 2021

@cheenamalhotra I'm not convinced it's actually a SqlClient error here: as you wrote, this is related to the geography type being written as raw bytes, probably in an incorrect way (if you switch the parameters around things are OK). But even if that's the case, it's odd for the transaction to simply disappear without a trace (and for the connection to stay usable) - that's the larger question here from my perspective.

Let me know how you want to proceed, I can transfer this to SqlClient if you want.

@cheenamalhotra
Copy link
Member

Well actually if you fetch "tx.SupportsSavepoints" it says "False".

@roji
Copy link
Member

roji commented Sep 7, 2021

(got back to this after a while)

Well actually if you fetch "tx.SupportsSavepoints" it says "False".

@cheenamalhotra that's true, but isn't that because it always returns false? :) AFAIK SqlClient hasn't overridden that property, which was introduced in .NET 5.0 as part of dotnet/runtime#33397. I couldn't find an issue tracking that, so I opened dotnet/SqlClient#1256.

Stepping back... So far, the only scenario we know about where this happens is the above, which is the result of a programmer bug, rather than an actual exception that is expected to happen at runtime. I think it's OK to not do anything until we have a more relevant runtime repro - clearing milestone to bring to triage on the EF side.

@roji roji removed this from the 6.0.0 milestone Sep 7, 2021
@ajcvickers
Copy link
Contributor

Note from triage: no-action on the EF side for this case, since the exception is caused by bad binary data. However, it does seem like this may be an issue in other places (and other providers) if the transaction is reset by an exception.

@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants