Memory Randomization #1587
…-size array between iterations and calls global setup after it
With MemoryRandomization enabled and outlier removal disabled, we get a distribution (3 buckets/modes) similar to what we have gathered from multiple runs in the past: https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmaster_x64_Windows%2010.0.18362%2fSystem.Collections.CopyTo(Int32).Array(Size%3a%202048).html
@kunalspathak if this build gets green (a matter of 30 minutes from now) then a new package
<packageSources>
  <add key="bdn-ci" value="https://ci.appveyor.com/nuget/benchmarkdotnet" />
</packageSources>
cc @AndyAyersMS
This looks really promising. I wonder if we might need something more sophisticated eventually. What we don't know is which GC heap(s) the benchmark is accessing. We can impact Gen0/LOH alignments, but it's trickier to impact Gen1/Gen2. Stack alignment might also come into play. Perhaps the benchmark runner can do random-sized stackallocs too? I suspect for Gen0 we only need very small alignment changes, perhaps just fractions of cache line sizes, though we might need to go all the way up to fractions of page sizes (though it is hard to imagine us doing enough iterations to really cover the space of possibilities here). Likewise for the LOH: some 85K+ base plus a cache-line-sized random amount on top. This also tells us we might need to pay more attention to controlling the alignment of some key data in the runtime (e.g. frequently accessed static arrays?). Worth thinking about, anyway.
- make sure that the random-sized object gets promoted to Gen 1 and Gen 2
- allocate something on the LOH too
By default every benchmark is single-threaded, so it should not be a problem in most cases. When it comes to multithreaded benchmarks, we could achieve that by, for example, having an affinitized thread per core, but this would require much more work... I'll keep this in the back of my head and try to improve the solution when it becomes a problem.
Very good point! I've modified the code and made sure that the object gets promoted to Gen 1 and Gen 2 by keeping it alive for two GC collections:
Debug.Assert(GC.GetGeneration(gen0object) == 0);
GC.Collect(0); // get it promoted to Gen 1
GC.Collect(1); // get it promoted to Gen 2
GC.KeepAlive(gen0object);
And another great point! Edit: I've added a way to allocate stack memory and keep it alive for the duration of the iteration.
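The random-sized stack allocation described here can be sketched roughly as follows. This is only an illustration and not necessarily how the PR implements it: the class and method names (`StackRandomizationSketch`, `RunIteration`, `Consume`) are hypothetical, and the upper bound of 32 bytes is an assumption borrowed from the LOH snippet in this thread.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper, not part of BenchmarkDotNet's public API.
public static class StackRandomizationSketch
{
    // Runs one iteration of `workload` with the stack pointer shifted
    // by a random number of bytes (assumed bound: fewer than 32).
    public static long RunIteration(Random random, Func<long> workload)
    {
        // stackalloc moves the stack pointer, so locals of the callee
        // land at a slightly different address on each iteration.
        Span<byte> padding = stackalloc byte[random.Next(32)];

        long result = workload();

        // Passing the span to a non-inlined method after the workload keeps
        // the stack memory in use for the whole iteration.
        Consume(padding);
        return result;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Consume(in Span<byte> _) { }
}
```

Calling `StackRandomizationSketch.RunIteration(new Random(), () => Workload())` once per iteration would give each iteration a different stack offset; whether that is enough to change the measured distribution is exactly the open question in this thread.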
I have added that as well:
var lohObject = new byte[85 * 1024 + random.Next(32)];
Debug.Assert(GC.GetGeneration(lohObject) == 2);
GC.KeepAlive(lohObject);
Maybe we should make the |
If anyone wants to give it a try then you need to use the
<packageSources>
  <add key="bdn-ci" value="https://ci.appveyor.com/nuget/benchmarkdotnet" />
</packageSources>
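For readers unfamiliar with NuGet feeds: a `<packageSources>` fragment like the one above normally lives inside the `<configuration>` root of a `nuget.config` file. A minimal sketch, assuming the file sits next to the project or solution (the file location is an assumption, the feed key and URL are taken from the comment above):

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <!-- CI feed quoted in this thread -->
    <add key="bdn-ci" value="https://ci.appveyor.com/nuget/benchmarkdotnet" />
  </packageSources>
</configuration>
```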
I'm not sure if promoting will help or not... I guess my point was that the code being benchmarked may read from all sorts of objects and as far as I can tell we can't reliably randomize all their addresses. What you had initially may end up working better, if benchmarks tend to read from objects allocated after the random object. And perhaps the random LOH allocation will help too. But influencing Gen1 / Gen2 addresses seems harder.
I am not sure if targeting specific GC behaviors (i.e. promotions) is necessary. Besides, some of those behaviors could change with different tunings, the state of the machine, server vs. workstation GC, etc. It is fairly certain, though, that objects allocated together in a sequence will typically be placed together and will likely stay together even when relocated. I think just allocating batches of objects of varying size should be sufficient, if I get the idea here. Varying the size differences within a cache line could be enough. 64 bits is the largest granularity that the GC aligns to by itself.
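The "batches of objects of varying size" suggestion could look something like the sketch below. The names (`VaryingSizeBatchSketch`, `Allocate`) are hypothetical, and the default bound of 64 bytes is an assumed cache line size rather than a value taken from the PR.

```csharp
using System;

// Hypothetical helper illustrating the suggestion above; not code from the PR.
public static class VaryingSizeBatchSketch
{
    // Allocates `count` byte arrays whose lengths vary randomly in
    // [0, maxSizeInBytes). The caller should keep the returned batch
    // alive (e.g. in a field) while the benchmark allocates its own
    // objects, so that those objects start at shifted addresses.
    public static byte[][] Allocate(Random random, int count, int maxSizeInBytes = 64)
    {
        var batch = new byte[count][];
        for (int i = 0; i < count; i++)
        {
            batch[i] = new byte[random.Next(maxSizeInBytes)];
        }
        return batch;
    }
}
```

Objects allocated right after such a batch should begin at an offset that depends on the random sizes, without relying on any particular promotion behavior of the GC.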
A very simple implementation of #1513 that I hope is going to help us answer the question of whether we should invest more in this direction.
I am creating the PR just to have a NuGet package published by our CI so others can easily give it a try.
Sample benchmark:
Default settings
-------------------- Histogram --------------------
[502.859 ns ; 508.045 ns) | @@@@@@@@@@@@@@@
---------------------------------------------------
MemoryRandomization set to true and Default Outliers setting (remove upper)
dotnet run -c Release -f netcoreapp2.1 --filter IntroMemoryRandomization --memoryRandomization true --maxIterationCount 50
MemoryRandomization set to true and custom Outliers setting: don't remove any
dotnet run -c Release -f netcoreapp2.1 --filter IntroMemoryRandomization --memoryRandomization true --outliers DontRemove --maxIterationCount 50
@kunalspathak