Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should StringBuilder.Equals ignore Capacity? #27684

Closed
svick opened this issue Oct 20, 2018 · 19 comments
Closed

Should StringBuilder.Equals ignore Capacity? #27684

svick opened this issue Oct 20, 2018 · 19 comments

Comments

@svick
Copy link
Contributor

svick commented Oct 20, 2018

The implementation of StringBuilder.Equals(StringBuilder) includes this code:

if (Capacity != sb.Capacity || MaxCapacity != sb.MaxCapacity || Length != sb.Length)
    return false;

This means that even if two StringBuilders have the same content (i.e. sb1.ToString() == sb2.ToString() would return true), calling Equals() on them could return false. Is this the right behavior? Would it be possible to change it? (Or is that an unacceptable breaking change?)


As a motivating example, consider improving performance of comparing the text of Roslyn SyntaxNodes. SyntaxNode.ToString() uses (pooled) StringBuilder to build the returned string, so it might be tempting to compare just the two StringBuilder, avoiding a string allocation. But this is not guaranteed to work, because it's possible the two StringBuilders will have the same contents, but different Capacity, in which case Equals() would still return false.

(But note that this is just an example. I assume Roslyn wouldn't be able to use StringBuilder.Equals directly even if it was improved, because it has to run on older frameworks.)


A concrete code example:

var sb1 = new StringBuilder();
var sb2 = new StringBuilder();

sb1.Append(new string('A', 16));
sb2.Append(new string('A', 16));

Console.WriteLine("sb1.Equals(sb2): " + sb1.Equals(sb2));
Console.WriteLine();

sb1.Insert(0, 'A');
sb2.Append('A');

Console.WriteLine("sb1.Equals(sb2): " + sb1.Equals(sb2));
Console.WriteLine("sb1.ToString() == sb2.ToString(): " + (sb1.ToString() == sb2.ToString()));

Because of the way StringBuilder growing heuristic is implemented, this will print:

sb1.Equals(sb2): True

sb1.Equals(sb2): False
sb1.ToString() == sb2.ToString(): True
@AraHaan
Copy link
Member

AraHaan commented Oct 20, 2018

I think that it should not ignore capacity as a string with shorter length as the input, or the other way around should definitely not be equal and if the function was to do as you suggested then possibly a lot of not only the framework, but also 99.9999999999% of user code would break. But @safern or someone knowing the parts to that well should know for sure.

@svick
Copy link
Contributor Author

svick commented Oct 20, 2018

@AraHaan I'm not saying it should ignore Length (how many characters the StringBuilder contains), only Capacity (how many characters it can contain without reallocation).

@AraHaan
Copy link
Member

AraHaan commented Oct 20, 2018

Ah, ok. 🤔

@Clockwork-Muse
Copy link
Contributor

....Part of the problem, I guess, is that Capacity and MaxCapacity are readable properties, so the capacity isn't just an internal implementation detail.

Perhaps what we want is an EqualContents(StringBuilder other) method, instead?
For that matter, perhaps a general ContentEquals(<various-string-like> other) method? String, ReadOnlyMemory, etc.

@svick
Copy link
Contributor Author

svick commented Oct 20, 2018

@Clockwork-Muse Yes, they are readable, but I don't think that means they have to influence equality.

And yes, I think something like EqualContents would be a reasonable alternative if changing Equals was not an option.

As for the general ContentEquals, I'm not sure. There's already StringBuilder.Equals(ReadOnlySpan<char>), which takes care of comparing StringBuilder with string, (RO)Span<char> and (RO)Memory<char>. And I think you can use the SequenceEqual extension method for all the other combinations. Is there anything missing?

@Clockwork-Muse
Copy link
Contributor

Clockwork-Muse commented Oct 21, 2018

Huh, missed those overloads somehow.

Ah, I was looking at the framework version, not the net-core version

@danmoseley
Copy link
Member

I don't know why StringBuilder.Equals(StringBuilder) checks Capacity, it does not seem what you would expect, but it appears to have been that way for some time and it is documented. I am not sure we have good enough reason to change it now, unless lots of people complain their code has a bug because they were expecting it to only compare content. @vancem any idea about it? If we don't, I think adding EqualContents is reasonable.

It looks like the S.R.M clone of StringBuilder does not do it.

If you want to compare content, and cannot use StringBuilder.Equals(ReadOnlySpan<char> (eg., you're comparing with another StringBuilder) the most efficient way would be to use StringBuilder.GetChunks. Everything else either indexes into the StringBuilder (which is really slow) or creates a string (which allocates arbitrarily)

@vancem
Copy link
Contributor

vancem commented Oct 22, 2018

@danmosemsft - As you point out the fact that StringBuilder checks Capacity is a very hold behavior in StringBuilder.

I do however, completely believe that it was and still is a mistake to have this behavior (I agree with @svick). The reason it is a mistake is because frankly Capacity is NOT something that a user has much control over (the internal implementation decides this), and thus Equals is next to useless in its current form (you can only rely on it if it returns true).

Both Capacity and MaxCapacity are artifacts that we have kept for compatibility reasons. Ideally Equals does not pay attention to either of them (MaxCapacity is not quite as harmful because the user has control over it (you call a special constructor), and thus can insure that basically you never use that constructor, and thus avoid the problem).

While making up a 'EqualsContent' method is a step in the right direction, this is not ideal because 'Equals' is wired into many things (e.g. Dictionary lookup) by default. It is clearly unfortunate that this doesn't 'just work'.

I also note that capacity is NOT a useful property on StringBuilder. No matter what value it has, it might have a new value after any operation. It is also simply not useful. Who cares what the capacity is since the implementation changes it to what it needs anyway.

In this world (where Capacity can change at the implemenation's whim), arguably the implemenation should be allowed to set the capacities of its arguments to be the same before going the equals opeations. This is of course silly, but it drives home the point that this change is really not likely to break anything that was not already pretty broken.

I think this is the kind of thing that we should entertain in .NET Core.

Realistically, it is a long term play. People probably won't take a dependency on it for years. It just make the conversation easier in several years from now.

Bottom line: I am for it (in .NET Core). It is a trivial change and arguably is not breaking the 'contract' (since the implemenation owns Capacity), and is very unlikley to break actual user code, and makes StringBuilder.Equals actually useful.

For what it is worth...

@svick
Copy link
Contributor Author

svick commented Oct 22, 2018

@vancem

'Equals' is wired into many things (e.g. Dictionary lookup) by default

I believe only object.Equals(object) is (and IEquatable<T>.Equals(T)). But StringBuilder does not override that, all it does is that it provides its own overloads of Equals. This means it's not usable as a key in Dictionary, unless you want reference equality (and also because it doesn't override GetHashCode).

Not sure if that makes a difference.

@danmoseley
Copy link
Member

Interestingly it appears in as much as 5% of apps run through the APIPort tool, although that may be simply because it's used somewhere in one common library.

I'm OK with changing this. @stephentoub do you have any concerns?

@stephentoub
Copy link
Member

stephentoub commented Oct 22, 2018

I'm OK with changing this. @stephentoub do you have any concerns?

I'd rather change it than add an alternative that gets it right. Let's do it. I have my normal hesitancy that someone could take a dependency on the new behavior and then things behave incorrectly on netfx and previous releases, but that's generally true for any bug fix.

@danmoseley
Copy link
Member

@svick are you interested in offering a PR?

@danmoseley
Copy link
Member

It will also need a PR into docs.

@vancem
Copy link
Contributor

vancem commented Oct 22, 2018

I agree that the most likely breakage risk is that code that works correctly on .NET Core will fail on Desktop. However, as Stephen points out, that is true of any fix in .NET Core, and should not block fixes.

@svick
Copy link
Contributor Author

svick commented Oct 23, 2018

@danmosemsft Yeah, I have created the PR: dotnet/corefx#32991.

@stephentoub
Copy link
Member

@svick, this can now be closed, correct?

@svick
Copy link
Contributor Author

svick commented Nov 23, 2018

@stephentoub There is one remaining piece of work: making sure the documentation is updated. I have just opened dotnet/dotnet-api-docs#1177 for that, so I'm closing this issue here.

@svick svick closed this as completed Nov 23, 2018
@stephentoub
Copy link
Member

Thanks

@danmoseley
Copy link
Member

Thank you for spotting this @svick.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants