Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Fix System.String over-allocation #17876

Merged
merged 3 commits into from
May 4, 2018
Merged

Fix System.String over-allocation #17876

merged 3 commits into from
May 4, 2018

Conversation

jkotas
Copy link
Member

@jkotas jkotas commented May 3, 2018

BaseSize for System.String was not set correctly. It caused unnecessary extra 8 bytes to be allocated at the end of strings that had Length % 4 < 2 on 64-bit platforms.

This change makes affected strings proportionally cheaper. For example, new string('a', 1) in a long-running loop is 7% faster.

BaseSize for System.String was not set correctly. It caused unnecessary extra 8 bytes to be allocated at the end of strings that had `Length % 4 < 2` on 64-bit platforms.

This change makes affected strings proportionally cheaper. For example, `new string('a', 1)` in a long-running loop is 7% faster.
@danmoseley
Copy link
Member

Would it be worth considering a 2.1 port, after bake time? That's a nice space win for some apps.

@jkotas
Copy link
Member Author

jkotas commented May 4, 2018

@dotnet-bot test Windows_NT arm64 Cross Checked Innerloop Build and Test please

@jkotas
Copy link
Member Author

jkotas commented May 4, 2018

Would it be worth considering a 2.1 port, after bake time? That's a nice space win for some apps.

The typical win is going to be a few percent at best. I do not think we are back porting improvements of this magnitude.

@jkotas jkotas requested a review from janvorli May 4, 2018 05:03
{
LIMITED_METHOD_DAC_CONTRACT;

return ObjSizeOf(Object) + sizeof(DWORD) + sizeof(WCHAR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the extra sizeof(WCHAR) is for the terminating null, would be nice to copy the comment from before

@jkotas jkotas merged commit 37e6436 into dotnet:master May 4, 2018
@jkotas jkotas deleted the string-alloc branch May 4, 2018 17:50
@glenn-slayden
Copy link

glenn-slayden commented May 4, 2018

Although not exactly a string overflow, the x64 implementation of String.GetHashCode in 4.7.2 runs over into its zero-terminator ushort, XORing that value into the result. As with the situation described here, the behavior is likewise sensitive only two out of the four Length % 4 cases. (For this case, the zero-terminator is incorporated into the hash computation--with no effect if it's value is 0 as supposed--when that remainder is odd, if I recall correctly).

There wouldn't be a bug danger from 60e9eba (and it's possible that the hashing code 'of sketchy elegance' itself, that I just mentioned, may have since been updated) but I wanted to point out the type of hazard that can lurk, perhaps especially in backporting this sort of thing. This class of read-overrun hazard, say, 'unsafe' code carelessly scanning an odd-sized string by (e.g.) ulong at a time, may only manifest a hard crash when the end of an oddly-sized string exactly abuts the end of a VM block. So FWIW, in such a case, 60e9eba may newly introduce (i.e., breaking-change) the possibility of hard crash to such ugly albeit existing, non-crashing, code.

To mitigate this, one would want to continue to guarantee that at least four (4) bytes beyond the zero-terminator of every string are always readable, this amount being sufficient to protect a sloppy/careless ulong read at the last character address (the read needs 6 bytes protection, but the zero-terminator provides 2). In other words, if 60e9eba is a breaking change, it would be because it reduces the (previously erroneous/unwarranted) guarantee of 8 accessible bytes to 0, yet some awful code may be (un-)reasonably relying on up to 4.

Of course ensuring this 4-byte padding guarantee only becomes relevant for strings which abut the end of a VM block. All other strings can be tightly packed.

@GrabYourPitchforks
Copy link
Member

@glenn-slayden, which line from that source file runs past the null terminator? I just looked through the source but I don't see an overrun.

@glenn-slayden
Copy link

glenn-slayden commented Jul 26, 2018

@GrabYourPitchforks I didn't say it runs past the null terminator, I said it incorporates its (putative) zero value. If I recall correctly, I was probably referring to https://referencesource.microsoft.com/#mscorlib/system/string.cs,875.

If the 'referencesource' code I just cited hasn't yet been entirely deprecated, and you need a much-improved replacement that exactly preserves the legacy behavior, you're welcome to use the code I posted at https://stackoverflow.com/a/48775953/147511.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants