Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up SuperPMI mcs remove dup process #33946

Merged
merged 7 commits into from
Mar 27, 2020

Conversation

BruceForstall
Copy link
Member

To speed up the "remove dup" code, create a "Hash" class that encapsulates the MD5 hashing
that is used to determine if two MCs are equivalent. Primarily, this allows caching the
Windows Crypto provider, which it is very slow to acquire.

In addition, make some changes to avoid unnecessary memory allocations
and other unnecessary work.

The result is that mcs -removeDup is about 4x faster.

Much of the remaining cost is that we read, deserialize the MC,
then reserialize the MC (if unique), and finally destroy the in-memory MC.
There is a lot of memory allocation/deallocation in this process that
could possibly be avoided or improved for this scenario.

In addition, add support to mcs -merge to also remove duplicates at the same time.
This removes the need to create merged file with duplicates, and thus significantly reduces the
disk space required to do a collection, as well as speeds up the collection processing.

Update the superpmi.py drive script, superpmicollect unit test, and documentation to match.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 23, 2020
@BruceForstall BruceForstall reopened this Mar 23, 2020
@BruceForstall
Copy link
Member Author

@dotnet/runtime-infrastructure It looks like almost all the CI jobs are "cancelled" with "#[error]The remote provider was unable to process the request."

Create a "Hash" class that encapsulates the MD5 hashing that is
used to determine if two MCs are equivalent. Primarily, this
allows caching the Windows Crypto provider, which it is very slow
to acquire.

In addition, make some changes to avoid unnecessary memory allocations
and other unnecessary work.

The result is that `mcs -removeDup` is about 4x faster.

Much of the remaining cost is that we read, deserialize the MC,
then reserialize the MC (if unique), and finally destroy the in-memory MC.
There is a lot of memory allocation/deallocation in this process that
could possibly be avoided or improved for this scenario.
Also add `-thin`.

With this,

```
mcs.exe -merge base.mch *.mc -recursive
mcs.exe -removeDup -thin base.mch nodup.mch
```

can be replaced with:

```
mcs.exe -merge -recursive -dedup -thin nodup.mch *.mc
```

The main benefit is avoiding creating a potentially very large base.mch file.
Related, the data being processed only needs to be loaded once.
Adjust superpmi.py script and superpmicollect.cs unit test.
Add description of `mcs -merge -dedup -thin` arguments and usage.
@BruceForstall
Copy link
Member Author

@dotnet/jit-contrib

Comment on lines +102 to +105
if (m_inFileLegacy->GetIndex(newInfo.ILCodeSize) == -1)
m_inFileLegacy->Add(newInfo.ILCodeSize, new DenseLightWeightMap<MethodContext*>());

DenseLightWeightMap<MethodContext*>* ourRank = m_inFileLegacy->Get(newInfo.ILCodeSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe avoid the lookup here when GetIndex() returns -1 and set ourRank to the newly-constructed DenseLightWeightMap?

@dotnet dotnet deleted a comment from azure-pipelines bot Mar 24, 2020
@dotnet dotnet deleted a comment from azure-pipelines bot Mar 24, 2020
@BruceForstall
Copy link
Member Author

libraries test failure is infrastructure / System.Net flakiness (of course)

@BruceForstall
Copy link
Member Author

ping

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall.

What do we need the legacy mode for?

@BruceForstall
Copy link
Member Author

What do we need the legacy mode for?

I don't think I've ever used it. I should probably ask Scott sometime. We could probably get rid of all that code.

@BruceForstall BruceForstall merged commit bd30e15 into dotnet:master Mar 27, 2020
@BruceForstall BruceForstall deleted the AddRemoveDupToMerge branch March 27, 2020 02:43
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants