Speed up SuperPMI mcs remove dup process #33946

Merged: 7 commits, Mar 27, 2020

Commits on Mar 23, 2020

  1. Speed up SuperPMI mcs -removeDup

    Create a "Hash" class that encapsulates the MD5 hashing used to
    determine whether two MCs are equivalent. Primarily, this allows
    caching the Windows Crypto provider, which is very slow to acquire.
    
    In addition, make some changes to avoid unnecessary memory allocations
    and other unnecessary work.
    
    The result is that `mcs -removeDup` is about 4x faster.
    
    Much of the remaining cost comes from reading and deserializing each MC,
    reserializing it (if unique), and finally destroying the in-memory MC.
    This process involves a lot of memory allocation and deallocation that
    could possibly be avoided or reduced for this scenario.
    BruceForstall committed Mar 23, 2020
    c291681
  2. 71426ee
  3. Add new RemoveDup class

    BruceForstall committed Mar 23, 2020
    c947029
  4. Add -dedup deduplication to mcs -merge

    Also add `-thin`.
    
    With this,
    
    ```
    mcs.exe -merge base.mch *.mc -recursive
    mcs.exe -removeDup -thin base.mch nodup.mch
    ```
    
    can be replaced with:
    
    ```
    mcs.exe -merge -recursive -dedup -thin nodup.mch *.mc
    ```
    
    The main benefit is that this avoids creating a potentially very large base.mch file.
    Relatedly, the data being processed needs to be loaded only once.
    BruceForstall committed Mar 23, 2020
    8bec36b
  5. Fix Linux build break

    BruceForstall committed Mar 23, 2020
    41912fd
  6. Adjust tools to use new mcs -merge -dedup -thin arguments

    Adjust superpmi.py script and superpmicollect.cs unit test.
    BruceForstall committed Mar 23, 2020
    7a4517d
  7. Update readme documentation for SuperPMI

    Add description of `mcs -merge -dedup -thin` arguments and usage.
    BruceForstall committed Mar 23, 2020
    2e67374
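
The two ideas in the commit messages above (set up the hasher once and reuse it, and deduplicate while merging so no intermediate file is needed) can be illustrated with a minimal Python sketch. It is not the mcs implementation, which is C++. The names `Hasher`, `_acquire_provider`, `acquire_count`, and `merge_dedup` are hypothetical, records are assumed to be raw byte strings, and `-thin` is not modeled.

```python
import hashlib
from typing import Iterable, Iterator


class Hasher:
    """Hypothetical stand-in for the PR's "Hash" class.

    The expensive setup (in the PR, acquiring the Windows Crypto provider)
    is modeled by _acquire_provider(), which runs once per Hasher rather
    than once per record.
    """

    def __init__(self) -> None:
        self.acquire_count = 0
        self._provider = self._acquire_provider()

    def _acquire_provider(self):
        # Placeholder for a slow, one-time acquisition step (assumption).
        self.acquire_count += 1
        return hashlib.md5

    def digest(self, record: bytes) -> bytes:
        # Reuse the cached provider for every record.
        return self._provider(record).digest()


def merge_dedup(sources: Iterable[Iterable[bytes]], hasher: Hasher) -> Iterator[bytes]:
    """Yield each distinct record once, in first-seen order, across all sources."""
    seen = set()  # MD5 digests of records already emitted
    for source in sources:
        for record in source:
            key = hasher.digest(record)
            if key not in seen:
                seen.add(key)
                yield record


if __name__ == "__main__":
    hasher = Hasher()
    file_a = [b"method-1", b"method-2", b"method-1"]
    file_b = [b"method-2", b"method-3"]
    for record in merge_dedup([file_a, file_b], hasher):
        print(record)
```

Because `merge_dedup` is a generator, each input is read once and unique records can be written out as they are found, which mirrors the stated benefit of not materializing a large merged file first. Only the 16-byte digests are kept in memory, not the records themselves.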