Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing ordering on ORDINALITY column when projecting out primitive collections with composed LINQ operators #3207

Closed
pinkfloydx33 opened this issue Jun 22, 2024 · 8 comments · Fixed by #3209
Assignees
Labels
bug Something isn't working
Milestone

Comments

@pinkfloydx33
Copy link

I am not sure if this is an EF issue or a efcore.pg issue so I am starting here.

We have an entity with a primitive collection represented as a Postgres array column. We have found an issue in EF Core wherein the collection order is not preserved by EF. In our case, the incorrect ordering seems dependent upon various combinations of:

  1. OrderBy on the main query
  2. Number of entities returned by the main query
  3. Projection of the primitive collection to a DTO or just ToList
  4. Number of items in the primitive collections

In our actual application any one of the first three items seems to have an impact; we whittled our queries down such that we were only dealing with one of 1, 2, or 3 at a time and each would reproduce depending on the number of items in the related collections (4).

For the sake of this bug report, I've boiled it down to a minimal reproducible example (see below). However I've only been able to reproduce it when OrderBy is involved. I've not be able to reproduce it otherwise, so will continue trying to reduce our real application down until I can.

Entity and DTO Types
public sealed class Thing
{
    public int Id { get; set; }
    public string Name { get; set; } = null!;
    public int[] Related { get; set; } = [ ];
}

public sealed class Dto
{
    public int Id { get; set; }
}
Context
public sealed class TestContext : DbContext
{
    public DbSet<Thing> Things => Set<Thing>();

    private const string Conn = "Host=localhost;Port=5432;Database=arraytest;Username=postgres;Password=postgres;Include Error Detail=true";
    
    private static readonly DbDataSource Datatsource = new Npgsql.NpgsqlDataSourceBuilder(Conn)
        .Build();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder.UseNpgsql(Datatsource, e => e.SetPostgresVersion(15, 0))
            .EnableDetailedErrors()
            .EnableSensitiveDataLogging()
            .LogTo(s => Debug.WriteLine(s));
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        base.OnModelCreating(modelBuilder);

        modelBuilder.Entity<Thing>(e =>
        {
            e.ToTable("things");
            e.HasKey(p => p.Id);

            e.Property(p => p.Id)
                .HasColumnName("id")
                .UseIdentityByDefaultColumn();

            e.Property(p => p.Name)
                .HasColumnName("name")
                .IsRequired()
                .HasColumnType("text");

            // primitive collection has the issue
            e.PrimitiveCollection(p => p.Related)
                .HasColumnName("related")
                .HasColumnType("integer[]")
                .HasDefaultValueSql("ARRAY[]::integer[]")
                .IsRequired()
                .ElementType(i => i.IsRequired());

            // As does defining this way instead
            // e.Property(p => p.Related)
            //     .HasColumnName("related")
            //     .HasColumnType("integer[]")
            //     .HasDefaultValueSql("ARRAY[]::integer[]");

            e.HasIndex(p => p.Related)
                .HasMethod("gin");
        });
    }
}

Here is the application code/reproduction.

Application code
using Microsoft.EntityFrameworkCore;
using System.Data.Common;
using System.Diagnostics;

var alt = false;

await using (var ctx = new TestContext())
{
    await ctx.Database.EnsureDeletedAsync();
    await ctx.Database.EnsureCreatedAsync();
}

await using (var ctx = new TestContext())
{
    var a = new Thing
    {
        Id = 1,
        Name = "Thing A",
        Related = [1,2,3]
    };
    var b = new Thing
    {
        Id = 2,
        Name = "B Thing",
        Related = [3,2]
    };

    var c = new Thing
    {
        Id = 3,
        Name = "Empty Thing",
        Related = [  ]
    };

    var d = new Thing
    {
        Id = 4,
        Name = "Third Thing",
        Related = [ 4,5,6,7,8 ]
    };

    ctx.Add(a);
    ctx.Add(b);

    if (alt)
        ctx.Add(c);

    ctx.Add(d);
    await ctx.SaveChangesAsync();
}

await using (var ctx = new TestContext())
{
    WriteSpacer("#1 Correct (OrderBy + Basic Projection)");

    var res1 = await ctx.Things
        .OrderBy(e => e.Name)
        .Select(e => new { e.Id, e.Related })
        .ToListAsync();

    foreach (var output in res1)
        Console.WriteLine("{0} - {1}", output.Id, string.Join("-", output.Related));

    WriteSpacer($"#2 {(alt ? "Correct?!?!": "Incorrect")} (OrderBy + ProjectRelated)");

    var res2 = await ctx.Things
        .OrderBy(e => e.Name)
        .Select(e => new { e.Id, Related = e.Related.Select(r => new Dto { Id=r }).ToArray() })
        .ToListAsync();

    foreach (var output in res2)
        Console.WriteLine("{0} - {1}", output.Id, string.Join("-", output.Related.Select(o => o.Id)));

    WriteSpacer("#3 Correct (ProjectRelated)");

    var res3 = await ctx.Things
        .Select(e => new { e.Id, Related = e.Related.Select(r => new Dto { Id=r }).ToArray() })
        .ToListAsync();

    foreach (var output in res3)
        Console.WriteLine("{0} - {1}", output.Id, string.Join("-", output.Related.Select(o => o.Id)));

    WriteSpacer($"#4 {(alt ? "Correct?!?!": "Incorrect")} (OrderBy + ToList)");

    var res4 = await ctx.Things
        .OrderBy(e => e.Name)
        .Select(e => new { e.Id, r = e.Related.ToList() })
        .ToListAsync();

    foreach (var output in res4)
        Console.WriteLine("{0} - {1}", output.Id, string.Join("-", output.r));

    WriteSpacer("#5 Always Incorrect (TotalItems + OrderBy + ToList)");

    var res5 = await ctx.Things
        .Where(e => e.Id != 3)
        .OrderBy(e => e.Name)
        .Select(e => new { e.Id, r = e.Related.ToList() })
        .ToListAsync();

    foreach (var output in res5)
        Console.WriteLine("{0} - {1}", output.Id, string.Join("-", output.r));

}

void WriteSpacer(string title)
{
    Console.WriteLine("");
    Console.WriteLine($"=== {title} ".PadRight(70, '='));
    Console.WriteLine("");
}

Run the code once, as is. The expected result is that the output of the primitive collection matches the original order, despite the ordering of the main query itself. Meaning that (order of this list aside), the values should always be print to the console as:

1 - 1-2-3
2 - 3-2
4 - 4-5-6-7-8

Instead, under result sets #2, #4, and #5 you will see that the ordering for items with Id's 1 and 4 are incorrectly output as:

1 - 2-3-1
4 - 8-4-5-6-7

Now run the code again, this time with the alt variable set to true. This adds a fourth entity to the database with an empty primitive collection. The outputs of tests #2 and #4 are as expected this time, while #5 remains incorrect.

Output with alt=false
=== #1 Correct (OrderBy + Basic Projection) ==========================

2 - 3-2
1 - 1-2-3
4 - 4-5-6-7-8

=== #2 Incorrect (OrderBy + ProjectRelated) ==========================

2 - 3-2
1 - 2-3-1
4 - 8-4-5-6-7

=== #3 Correct (ProjectRelated) ======================================

1 - 1-2-3
2 - 3-2
4 - 4-5-6-7-8

=== #4 Incorrect (OrderBy + ToList) ==================================

2 - 3-2
1 - 2-3-1
4 - 8-4-5-6-7

=== #5 Always Incorrect (TotalItems + OrderBy + ToList) ==============

2 - 3-2
1 - 2-3-1
4 - 8-4-5-6-7
Output with alt=true
=== #1 Correct (OrderBy + Basic Projection) ==========================

2 - 3-2
3 -
1 - 1-2-3
4 - 4-5-6-7-8

=== #2 Correct?!?! (OrderBy + ProjectRelated) ========================

2 - 3-2
3 -
1 - 1-2-3
4 - 4-5-6-7-8

=== #3 Correct (ProjectRelated) ======================================

1 - 1-2-3
2 - 3-2
3 -
4 - 4-5-6-7-8

=== #4 Correct?!?! (OrderBy + ToList) ================================

2 - 3-2
3 -
1 - 1-2-3
4 - 4-5-6-7-8

=== #5 Always Incorrect (TotalItems + OrderBy + ToList) ==============

2 - 3-2
1 - 2-3-1
4 - 8-4-5-6-7

We were getting random reports of incorrect ordering from customers. Two of us spent a combined 48 hours trying to debug into this and figure out what was going on. Every time we thought we had a solution, a new test case would start failing because ultimately it all seemed dependent on ordering, filtering, and total counts. We ended up working around this by writing a Postgres function that preserved and sorted by the Ordinality of the array collection, unnesting it into the type we were trying to project to.

While we have the workaround, we are concerned the issue may creep up in other forms. We'd also like to ultimately remove the workaround since a function manually added into migrations won't benefit from changes to the model.

Versions used

  • Npgsql.EntityFrameworkCore.PostgreSQL - 8.0.4
  • Npgsql - 8.0.3
  • Microsoft.EntityFrameworkCore - 8.0.6
  • .NET 8
@roji roji changed the title Collection Properties lose database-side ordering Missing ordering on ORDINALITY column when projecting out primitive collections Jun 23, 2024
@roji roji self-assigned this Jun 23, 2024
@roji roji added the bug Something isn't working label Jun 23, 2024
@roji roji added this to the 8.0.5 milestone Jun 23, 2024
@roji roji changed the title Missing ordering on ORDINALITY column when projecting out primitive collections Missing ordering on ORDINALITY column when projecting out primitive collections with LINQ operators composed Jun 23, 2024
@roji roji changed the title Missing ordering on ORDINALITY column when projecting out primitive collections with LINQ operators composed Missing ordering on ORDINALITY column when projecting out primitive collections with composed LINQ operators Jun 23, 2024
@roji
Copy link
Member

roji commented Jun 23, 2024

@pinkfloydx33 thanks for the report and for the repro - I can confirm there's a bug here (see at the bottom for a minimal repro). I'll do my best to fix this ASAP for an upcoming 8.0 patch release.

When a primitive collection is unnested to a rowset for composition of LINQ operators, we use the PostgreSQL WITH ORDINALITY clause to add an ordering column to preserve the ordering of the original array. However, when the results of that query is projected out, we don't order by the ordinality column, so the arrayordering is lost.

Query:

_ = await context.Blogs
    .Select(b => b.Ints.Select(i => i + 1).ToArray())
    .ToListAsync();

SQL produced:

SELECT b."Id", i.value + 1, i.ordinality
FROM "Blogs" AS b
LEFT JOIN LATERAL unnest(b."Ints") WITH ORDINALITY AS i(value) ON TRUE
ORDER BY b."Id" -- missing ordering by i.ordinality

The SQL Server provider doesn't have this issue (the key column projected out by OPENJSON fulfills the same role as the PG ORDINALITY column):

SELECT [b].[Id], CAST([i].[value] AS int) + 1, [i].[key]
FROM [Blogs] AS [b]
OUTER APPLY OPENJSON([b].[Ints]) AS [i]
ORDER BY [b].[Id], CAST([i].[key] AS int)
Full minimal repro
await using var context = new BlogContext();
await context.Database.EnsureDeletedAsync();
await context.Database.EnsureCreatedAsync();

_ = await context.Blogs
    .Select(b => b.Ints.Select(i => i + 1).ToArray())
    .ToListAsync();

public class BlogContext : DbContext
{
    public DbSet<Blog> Blogs { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseNpgsql("Host=localhost;Username=test;Password=test")
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();
}

public class Blog
{
    public int Id { get; set; }
    public List<int> Ints { get; set; }
}

@pinkfloydx33
Copy link
Author

pinkfloydx33 commented Jun 23, 2024

Thanks @roji. If/when you have a nightly build I'm more than happy to verify expected results

@roji
Copy link
Member

roji commented Jun 23, 2024

Note: we currently don't add an ordering by the ordinality column, since PG's unnest is documented to return values in storage order. However, the moment one integrates the unnest into a SELECT via a JOIN, that "natural" ordering is lost and an explicit ordering must be present. I'll add an explicit ordering just like for the other providers.

@roji
Copy link
Member

roji commented Jun 23, 2024

Backported to 8.0.5 via 8a4c25b

@roji
Copy link
Member

roji commented Jun 23, 2024

@pinkfloydx33 I've pushed a fix and backported to 8.0.5; can you please try version 8.0.5-ci.20240623T224824.nupkg from the stable feed and confirm that the bug is gone for you?

@pinkfloydx33
Copy link
Author

@roji it worked as expected. Thanks for looking into this so quick.

When's the expected release of the 8.0.5 package?

@roji
Copy link
Member

roji commented Jun 24, 2024

@pinkfloydx33 great. I'll probably do a patch release in the next few weeks, based on what's pending also in Npgsql - stay tuned (and don't hesitate to use the prerelease nuget in the meantime).

@pinkfloydx33
Copy link
Author

@roji any updated estimate on an official 8.0.5 release?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants