Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop escaping Unicode characters unnecessarily in relational JSON #32152

Open
ajcvickers opened this issue Oct 24, 2023 · 9 comments
Open

Stop escaping Unicode characters unnecessarily in relational JSON #32152

ajcvickers opened this issue Oct 24, 2023 · 9 comments
Assignees
Labels
area-json consider-for-current-release customer-reported needs-design punted-for-9.0 Originally planned for the EF Core 9.0 (EF9) release, but moved out due to resource constraints. type-enhancement
Milestone

Comments

@ajcvickers
Copy link
Contributor

The JSON stored in a SQL Server column always has its Unicode characters escaped. For example, this code:

context.Add(new Customer { Json = new MyJson { Be = "Füzér Castle in the Zemplén Mountains" } } );
await context.SaveChangesAsync();

Results in the following JSON in the database:

{"Be":"F\u00FCz\u00E9r Castle in the Zempl\u00E9n Mountains"}

However, there is nothing preventing the JSON being stored as:

{"Be":"Füzér Castle in the Zemplén Mountains"}

EF reads the correct string back in both cases, but other tools may not be expecting escaped JSON for all Unicode characters.

Test code:

using (var context = new SomeDbContext())
{
    await context.Database.EnsureDeletedAsync();
    await context.Database.EnsureCreatedAsync();

    context.Add(new Customer { Json = new MyJson { Be = "Füzér Castle in the Zemplén Mountains" } });

    await context.SaveChangesAsync();
}

using (var context = new SomeDbContext())
{
    foreach (var c in context.Set<Customer>().ToList())
    {
        Console.WriteLine(c.Json.Be);
    }
}

public class SomeDbContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer(@"Data Source=(LocalDb)\MSSQLLocalDB;Database=AllTogetherNow")
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Customer>().OwnsOne(e => e.Json).ToJson();
    }
}

public class Customer
{
    public int Id { get; set; }
    public MyJson Json { get; set; }
}

public class MyJson
{
    public string? Be { get; set; }
}
@maumar
Copy link
Contributor

maumar commented Oct 25, 2023

related: #30315

@roji
Copy link
Member

roji commented Nov 2, 2023

Here's the relevant docs page on encoding/escaping in System.Text.Json; we may want to specify JavaScriptEncoder.UnsafeRelaxedJsonEscaping.

However, for people using non-Unicode encodings in the database, or wanting to interact with the JSON documents from other tools which may not support special chars, we should probably have some sort of mechanism for specifying the encoding/escaping policy; this basically boils down to allowing users to specify the JsonSerializerOptions (which has this one it).

Maybe it makes sense to keep the current behavior as the default (as that's the S.T.Json default), and just tell people to specify UnsafeRelaxedJsonEscaping via JsonSerializerOptions if they want unescaped chars.

@maumar
Copy link
Contributor

maumar commented May 10, 2024

related: #33443

We also escape all unicode chars in JSON property names, however when building a JSON path for query or update we don't escape those chars. This leads to us not fetching/updating the requested JSON fragments

@roji
Copy link
Member

roji commented Jun 2, 2024

Note that we'd need to not escape - and deal with non-escaped JSON data - everywhere in our stack (SaveChanges, JSON path generation, shaper - see #33771.

As in #33771 (review), we should consider doing this for 9.0 as well, since #33771 is a breaking change for people querying externally-inputted JSON data that isn't escaped.

@maumar
Copy link
Contributor

maumar commented Jun 3, 2024

Note for the implementer: (as pointed out by @Charlieface here: #33443 (comment)) when we allow un-escaped JSON strings, we need to escape single and double quotes and also pre-pend N to the JSON path string.

@roji
Copy link
Member

roji commented Jun 4, 2024

Design discussion:

  • We'll indeed try to get this in for 9.0, along with Fix to #33443 - JSON path is not properly formatted #33771 (but not critical).
  • We won't make any effort to support both escaped and non-escaped JSON content in the same database. For example, a single context option is probably sufficient as an application-wide setting, as opposed to e.g. a column-by-column setting.

@ajcvickers ajcvickers added this to the 9.0.0 milestone Jun 15, 2024
@ajcvickers ajcvickers removed this from the 9.0.0 milestone Jul 1, 2024
@AndriySvyryd AndriySvyryd assigned roji and unassigned maumar Aug 8, 2024
@ajcvickers ajcvickers added the punted-for-9.0 Originally planned for the EF Core 9.0 (EF9) release, but moved out due to resource constraints. label Aug 14, 2024
@ajcvickers ajcvickers added this to the Backlog milestone Aug 14, 2024
@ajcvickers ajcvickers removed their assignment Aug 31, 2024
@fiseni-pcl
Copy link

Sorry to chime in here. Currently in EF Core 8, there is no way we can persist Unicode characters in non-escaped form for JSON columns? There is no easy workaround, correct?
If this feature makes it to EF9, any plans to backport it to EF8 too? Or we'll be forced to update to .NET 9 and use EF9?

@tecxx

This comment has been minimized.

@roji

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-json consider-for-current-release customer-reported needs-design punted-for-9.0 Originally planned for the EF Core 9.0 (EF9) release, but moved out due to resource constraints. type-enhancement
Projects
None yet
Development

No branches or pull requests

6 participants