Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add snake_case support for System.Text.Json #42009

Closed
wants to merge 3 commits into from

Conversation

huoyaoyuan
Copy link
Member

Closes #782
Implementation and tests are both copied from Json.NET

@Dotnet-GitSync-Bot
Copy link
Collaborator

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

/// </summary>
public static JsonNamingPolicy SnakeCase { get; } = new JsonSnakeCaseNamingPolicy();

// KebabCase was added together with SnakeCase because sharing implementation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally we don't add code in case it might be exposed in future. We would add the implementation when the API is approved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return s;
}

StringBuilder sb = new StringBuilder();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a hot path? If so should it use ValueStringBuilder?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied it as-is from Json.NET as a first step. I guess that naming policy only runs once during class initialization?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming policies are also used when serializing dictionary keys and the results are not cached (TBD whether it makes sense to). We want to do the most performant thing here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what can be used here to reduce the allocation. Unlike camel case, the output length of snake case cannot be predicted.
ValueStringBuilder? But we are not in corelib here.
Just maintain a stackalloc buffer here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the team was against ValueStringBuilder since it makes libraries to bloat, but this is one of cleanest ways to improve performance. Another one is ArrayPool<char>, but it means that you should repeat whole logic of ValueStringBuilder which isn't a great idea.

FYI StringBuilder has a thread local instance which reused for .NET internals and as I can remember even for string formatting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worst-case, isn't the required length appx s.Length * 2 (i.e. a separator between every char)? If so, this can be done simply with:

char[] arr = ArrayPool<char>.Shared.Rent(s.Length * 2 - 1); // worst-case length for a separator between every char
int pos = 0;
...
arr[pos++] = ...; // every time a char is added
...
string result = new string(arr, 0, pos);
ArrayPool<char>.Shared.Return(arr);
return result;

Assert.Equal("shouting_case", ConvertToSnakeCase("SHOUTING_CASE"));
Assert.Equal("9999-12-31_t23:59:59.9999999_z", ConvertToSnakeCase("9999-12-31T23:59:59.9999999Z"));
Assert.Equal("hi!!_this_is_text._time_to_test.", ConvertToSnakeCase("Hi!! This is text. Time to test."));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about all whitespace input?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Json.NET didn't test it. I don't want to compose new test myself, because there are no well-defined correct result for corner cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see additional test cases and implementation from an initial PR from @YohDeadfall which was extensively reviewed - dotnet/corefx#41354.

there are no well-defined correct result for corner cases.

As much as possible, we need to determine the correct behavior for edge cases. A good first step here is to add tests for as many edge scenarios that we can think of, including all whitespace input as mentioned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huoyaoyuan, I suggest you to wait a little bit more and concentrate on other things you're working on. There're a lot of corner cases, but no idea how the API should work. We discovered how external and existing libraries work, but no decision was made, unfortunately. This way you won't change the code after each comment here and would be able to focus on things which could be made right now.

Copy link
Contributor

@layomia layomia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. With this feature, there are a few design decisions to be fleshed out. See dotnet/corefx#41354 (comment). As we're currently wrapping up 5.0, it will probably be a couple weeks before I can take a deep dive here, but feel free to address them in the mean time.

The existing camel-case policy implementation might also need to be updated to use the same splitting mechanism as snake-case so we have a standard way across all built-in policies.

FYI @YohDeadfall

return s;
}

StringBuilder sb = new StringBuilder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming policies are also used when serializing dictionary keys and the results are not cached (TBD whether it makes sense to). We want to do the most performant thing here.

/// </summary>
public static JsonNamingPolicy SnakeCase { get; } = new JsonSnakeCaseNamingPolicy();

// KebabCase was added together with SnakeCase because sharing implementation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert.Equal("shouting_case", ConvertToSnakeCase("SHOUTING_CASE"));
Assert.Equal("9999-12-31_t23:59:59.9999999_z", ConvertToSnakeCase("9999-12-31T23:59:59.9999999Z"));
Assert.Equal("hi!!_this_is_text._time_to_test.", ConvertToSnakeCase("Hi!! This is text. Time to test."));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see additional test cases and implementation from an initial PR from @YohDeadfall which was extensively reviewed - dotnet/corefx#41354.

there are no well-defined correct result for corner cases.

As much as possible, we need to determine the correct behavior for edge cases. A good first step here is to add tests for as many edge scenarios that we can think of, including all whitespace input as mentioned.

@YohDeadfall
Copy link
Contributor

@layomia, the problem here is the same as with mine PR. There is no agreement on compatibility issues with external translators. Otherwise, my PR was ready for merge for a long time ago.

@YohDeadfall
Copy link
Contributor

Here is a list of open questions we had before and other information which characters should be omitted and which should stay in the resulting string: dotnet/corefx#41354 (comment)

The whole research was did to not only add the snake case policy, but others too. Also we found that the current camel case policy doesn't work the same way as external.

If you're okay with what I already did I can to resurrect the PR in that repo. Otherwise, we should get consensus first, better in the linked issue, but has no tools to copy all comments from dotnet/corefx#41354.

/cc @stephentoub @jkotas @khellang

Comment on lines +56 to +58
char c;
c = char.ToLowerInvariant(s[i]);
sb.Append(c);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
char c;
c = char.ToLowerInvariant(s[i]);
sb.Append(c);
sb.Append(char.ToLowerInvariant(s[i]));

Comment on lines +40 to +41
bool hasNext = (i + 1 < s.Length);
if (i > 0 && hasNext)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool hasNext = (i + 1 < s.Length);
if (i > 0 && hasNext)
if (i > 0 && i + 1 < s.Length)

@steveharter
Copy link
Member

steveharter commented Sep 14, 2020

From @YohDeadfall:

@huoyaoyuan, I suggest you to wait a little bit more and concentrate on other things you're working on. There're a lot of corner cases, but no idea how the API should work. We discovered how external and existing libraries work, but no decision was made, unfortunately. This way you won't change the code after each comment here and would be able to focus on things which could be made right now.

Yes there are several implementations for how snake case should work, and no standard. The previous attempt from @YohDeadfall at dotnet/corefx#41354 never made it in due to those concerns.

Assuming this PR has the exact same behavior as Newtonsoft (I didn't verify), even with bugs, then the approach aligns to what we did when we added camel case (which was based on Newtonsoft 100%). Even with the built-in implementation matching Newtonsoft, it would still possible in a one-off manner to add a custom naming policy at runtime (as it is possible today) to match Node or Python semantics.

Instead of JsonNamingPolicy.SnakeCase, we could use JsonNamingPolicy.SnakeCaseNewtonsoftCompat or something similar to make that clear.

Once a naming policy is added, any change is breaking since previously serialized JSON may not be able to be deserialized.

@@ -154,6 +154,7 @@ public abstract partial class JsonNamingPolicy
{
protected JsonNamingPolicy() { }
public static System.Text.Json.JsonNamingPolicy CamelCase { get { throw null; } }
public static System.Text.Json.JsonNamingPolicy SnakeCase { get { throw null; } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the many implementations of snake case, do we want to make it clear what algorithm we're using?

For example, instead of JsonNamingPolicy.SnakeCase, we could use JsonNamingPolicy.SnakeCaseNewtonsoftCompat or something similar to make that clear? I realize this passed API review, but we should revisit to ensure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@layomia I'd require an api design/review from you.

@ericstj
Copy link
Member

ericstj commented Jan 22, 2021

@layomia it looks like the open question right now is around the naming of the Public API describing this policy. Perhaps we can review that, share thoughts, then bring it to API review for approval?

@YohDeadfall
Copy link
Contributor

@ericstj, it would be much better if someone find some time to address questions first in dotnet/corefx#41354 instead of approving/reviewing this. Otherwise, there will be another mindless change which cannot be reverted or changed due to backward compatibility.

@layomia, please, find take a loot at the issue I mentioned. I really would like to implement it properly and update the translator in Npgsql too.

@ericstj
Copy link
Member

ericstj commented Jan 22, 2021

I see, apologies @YohDeadfall I missed that there were still some design issues to resolve. Agreed that we need to close on those. Is dotnet/corefx#41354 (comment) the right thing to look at?

@YohDeadfall
Copy link
Contributor

Right, @ericstj. It contains unresolved questions and a research in this area. When all of them get answered, it will be possible to implement a lot of policies including lower and upper cases and so on (take a look at JS translators mentioned in the same PR).

Base automatically changed from master to main March 1, 2021 09:06
@layomia
Copy link
Contributor

layomia commented Mar 10, 2021

This feature needs more research, but isn't being actively worked on right now. It remains important, but we have to circle back a little later in the .NET 6 release. I'll close it for now and ping this thread later on.

@layomia layomia closed this Mar 10, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 9, 2021
@karelz karelz added this to the 6.0.0 milestone May 20, 2021
@huoyaoyuan huoyaoyuan deleted the snake-case branch March 4, 2023 04:13
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add snake_case support for System.Text.Json
9 participants