Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LTO] Introduce a new class ImportIDTable #106503

Merged

Conversation

kazutakahirata
Copy link
Contributor

The new class implements a deduplication table to convert import list
elements:

{SourceModule, GUID, Definition/Declaration}

into 32-bit integers, and vice versa. This patch adds a unit test but
does not add a use yet.

To be precise, the deduplication table holds {SourceModule, GUID}
pairs. We use the bottom one bit of the 32-bit integers to indicate
whether we have a definition or declaration.

A subsequent patch will collapse the import list hierarchy --
FunctionsToImportTy holding many instances of FunctionsToImportTy --
down to DenseSet<uint32_t> with each element indexing into the
deduplication table above. This will address multiple sources of
space inefficiency.

The new class implements a deduplication table to convert import list
elements:

  {SourceModule, GUID, Definition/Declaration}

into 32-bit integers, and vice versa.  This patch adds a unit test but
does not add a use yet.

To be precise, the deduplication table holds {SourceModule, GUID}
pairs.  We use the bottom one bit of the 32-bit integers to indicate
whether we have a definition or declaration.

A subsequent patch will collapse the import list hierarchy --
FunctionsToImportTy holding many instances of FunctionsToImportTy --
down to DenseSet<uint32_t> with each element indexing into the
deduplication table above.  This will address multiple sources of
space inefficiency.
@llvmbot
Copy link
Collaborator

llvmbot commented Aug 29, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Kazu Hirata (kazutakahirata)

Changes

The new class implements a deduplication table to convert import list
elements:

{SourceModule, GUID, Definition/Declaration}

into 32-bit integers, and vice versa. This patch adds a unit test but
does not add a use yet.

To be precise, the deduplication table holds {SourceModule, GUID}
pairs. We use the bottom one bit of the 32-bit integers to indicate
whether we have a definition or declaration.

A subsequent patch will collapse the import list hierarchy --
FunctionsToImportTy holding many instances of FunctionsToImportTy --
down to DenseSet<uint32_t> with each element indexing into the
deduplication table above. This will address multiple sources of
space inefficiency.


Full diff: https://github.com/llvm/llvm-project/pull/106503.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Transforms/IPO/FunctionImport.h (+71)
  • (modified) llvm/unittests/Transforms/IPO/CMakeLists.txt (+1)
  • (added) llvm/unittests/Transforms/IPO/ImportIDTableTests.cpp (+79)
diff --git a/llvm/include/llvm/Transforms/IPO/FunctionImport.h b/llvm/include/llvm/Transforms/IPO/FunctionImport.h
index b7280c56be9cc8..99fe110191dec9 100644
--- a/llvm/include/llvm/Transforms/IPO/FunctionImport.h
+++ b/llvm/include/llvm/Transforms/IPO/FunctionImport.h
@@ -10,6 +10,7 @@
 #define LLVM_TRANSFORMS_IPO_FUNCTIONIMPORT_H
 
 #include "llvm/ADT/DenseSet.h"
+#include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/IR/GlobalValue.h"
 #include "llvm/IR/ModuleSummaryIndex.h"
@@ -96,6 +97,76 @@ class FunctionImporter {
                std::tuple<unsigned, const GlobalValueSummary *,
                           std::unique_ptr<ImportFailureInfo>>>;
 
+  // Issues import IDs.  Each ID uniquely corresponds to a tuple of
+  // (FromModule, GUID, Definition/Declaration).
+  //
+  // The import IDs make the import list space efficient by referring to each
+  // import with a 32-bit integer ID while maintaining a central table that maps
+  // those integer IDs to tuples of (FromModule, GUID, Def/Decl).
+  //
+  // In one large application, a pair of (FromModule, GUID) is mentioned in
+  // import lists more than 50 times on average across all destination modules.
+  // Mentioning the 32-byte tuple:
+  //
+  // std::tuple<StringRef, GlobalValue::GUID, GlobalValueSummary::ImportKind>
+  //
+  // 50 times by value in various import lists would be costly.  We can reduce
+  // the memory footprint of import lists by placing one copy in a central table
+  // and referring to it with 32-bit integer IDs.
+  //
+  // To save space within the central table, we only store pairs of
+  // (FromModule, GUID) in the central table.  In the actual 32-bit integer ID,
+  // the top 31 bits index into the central table while the bottom 1 bit
+  // indicates whether an ID is for GlobalValueSummary::Declaration or
+  // GlobalValueSummary::Definition.
+  class ImportIDTable {
+  public:
+    using ImportIDTy = uint32_t;
+
+    // Create a pair of import IDs [Def, Decl] for a given pair of FromModule
+    // and GUID.
+    std::pair<ImportIDTy, ImportIDTy> createImportIDs(StringRef FromModule,
+                                                      GlobalValue::GUID GUID) {
+      auto Key = std::make_pair(FromModule, GUID);
+      auto InsertResult = TheTable.try_emplace(Key, TheTable.size());
+      return makeIDPair(InsertResult.first->second);
+    }
+
+    // Get a pair of previously created import IDs [Def, Decl] for a given pair
+    // of FromModule and GUID.  Returns std::nullopt if not available.
+    std::optional<std::pair<ImportIDTy, ImportIDTy>>
+    getImportIDs(StringRef FromModule, GlobalValue::GUID GUID) {
+      auto Key = std::make_pair(FromModule, GUID);
+      auto It = TheTable.find(Key);
+      if (It != TheTable.end())
+        return makeIDPair(It->second);
+      return std::nullopt;
+    }
+
+    // Return a tuple of [FromModule, GUID, Def/Decl] that a given ImportID
+    // corresponds to.
+    std::tuple<StringRef, GlobalValue::GUID, GlobalValueSummary::ImportKind>
+    lookup(ImportIDTy ImportID) const {
+      GlobalValueSummary::ImportKind Kind =
+          (ImportID & 1) ? GlobalValueSummary::Declaration
+                         : GlobalValueSummary::Definition;
+      auto It = TheTable.begin() + (ImportID >> 1);
+      StringRef FromModule = It->first.first;
+      GlobalValue::GUID GUID = It->first.second;
+      return std::make_tuple(FromModule, GUID, Kind);
+    }
+
+  private:
+    // Make a pair of import IDs [Def, Decl] from an index into TheTable.
+    static std::pair<ImportIDTy, ImportIDTy> makeIDPair(ImportIDTy Index) {
+      ImportIDTy Def = Index << 1;
+      ImportIDTy Decl = Def | 1;
+      return std::make_pair(Def, Decl);
+    }
+
+    MapVector<std::pair<StringRef, GlobalValue::GUID>, ImportIDTy> TheTable;
+  };
+
   /// The map maintains the list of imports.  Conceptually, it is a collection
   /// of tuples of the form:
   ///
diff --git a/llvm/unittests/Transforms/IPO/CMakeLists.txt b/llvm/unittests/Transforms/IPO/CMakeLists.txt
index ac632450d57305..65be2966bec049 100644
--- a/llvm/unittests/Transforms/IPO/CMakeLists.txt
+++ b/llvm/unittests/Transforms/IPO/CMakeLists.txt
@@ -13,4 +13,5 @@ add_llvm_unittest(IPOTests
   WholeProgramDevirt.cpp
   AttributorTest.cpp
   FunctionSpecializationTest.cpp
+  ImportIDTableTests.cpp
   )
diff --git a/llvm/unittests/Transforms/IPO/ImportIDTableTests.cpp b/llvm/unittests/Transforms/IPO/ImportIDTableTests.cpp
new file mode 100644
index 00000000000000..7554ee59c6869a
--- /dev/null
+++ b/llvm/unittests/Transforms/IPO/ImportIDTableTests.cpp
@@ -0,0 +1,79 @@
+//===- ImportIDTableTests.cpp - Unit tests for ImportIDTable --------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/IPO/FunctionImport.h"
+#include "gmock/gmock.h"
+#include "gtest/gtest.h"
+#include <set>
+#include <type_traits>
+
+using namespace llvm;
+
+TEST(ImportIDTableTests, Basic) {
+  FunctionImporter::ImportIDTable Table;
+
+  auto [Def, Decl] = Table.createImportIDs("mod", 123U);
+  auto [Def2, Decl2] = Table.createImportIDs("stuff", 456U);
+
+  // Def and Decl must be of the same unsigned integer type.
+  static_assert(
+      std::is_unsigned_v<FunctionImporter::ImportIDTable::ImportIDTy>);
+  static_assert(std::is_same_v<FunctionImporter::ImportIDTable::ImportIDTy,
+                               decltype(Def)>);
+  static_assert(std::is_same_v<FunctionImporter::ImportIDTable::ImportIDTy,
+                               decltype(Decl)>);
+
+  // Check that all IDs are unique.
+  std::set<FunctionImporter::ImportIDTable::ImportIDTy> IDs = {Def, Decl, Def2,
+                                                               Decl2};
+  EXPECT_THAT(IDs, ::testing::SizeIs(4));
+
+  // Verify what Def maps to.
+  auto DefTuple = Table.lookup(Def);
+  EXPECT_EQ(std::get<0>(DefTuple), StringRef("mod"));
+  EXPECT_EQ(std::get<1>(DefTuple), 123U);
+  EXPECT_EQ(std::get<2>(DefTuple), GlobalValueSummary::Definition);
+
+  // Verify what Def maps to.
+  auto DeclTuple = Table.lookup(Decl);
+  EXPECT_EQ(std::get<0>(DeclTuple), StringRef("mod"));
+  EXPECT_EQ(std::get<1>(DeclTuple), 123U);
+  EXPECT_EQ(std::get<2>(DeclTuple), GlobalValueSummary::Declaration);
+
+  // Verify what Def2 maps to.
+  auto Def2Tuple = Table.lookup(Def2);
+  EXPECT_EQ(std::get<0>(Def2Tuple), StringRef("stuff"));
+  EXPECT_EQ(std::get<1>(Def2Tuple), 456U);
+  EXPECT_EQ(std::get<2>(Def2Tuple), GlobalValueSummary::Definition);
+
+  // Verify what Def2 maps to.
+  auto Decl2Tuple = Table.lookup(Decl2);
+  EXPECT_EQ(std::get<0>(Decl2Tuple), StringRef("stuff"));
+  EXPECT_EQ(std::get<1>(Decl2Tuple), 456U);
+  EXPECT_EQ(std::get<2>(Decl2Tuple), GlobalValueSummary::Declaration);
+}
+
+TEST(ImportIDTableTests, Duplicates) {
+  FunctionImporter::ImportIDTable Table;
+
+  auto [Def1, Decl1] = Table.createImportIDs("mod", 123U);
+  auto [Def2, Decl2] = Table.createImportIDs("mod", 123U);
+
+  // Verify we get the same IDs back.
+  EXPECT_EQ(Def1, Def2);
+  EXPECT_EQ(Decl1, Decl2);
+}
+
+TEST(ImportIDTableTests, Missing) {
+  FunctionImporter::ImportIDTable Table;
+
+  auto Result = Table.getImportIDs("mod", 123U);
+
+  // Verify that we get std::nullopt for a non-existent pair.
+  EXPECT_EQ(Result, std::nullopt);
+}

Add a test for getImportIDs returning existing IDs.
@kazutakahirata
Copy link
Contributor Author

Addressed the feedback. PTAL. Thanks!

@kazutakahirata kazutakahirata merged commit bd6531b into llvm:main Aug 29, 2024
5 of 7 checks passed
@kazutakahirata kazutakahirata deleted the cleanup_thinlto_ImportIDTable branch August 29, 2024 16:45
@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 30, 2024

LLVM Buildbot has detected a new failure on builder lld-x86_64-win running on as-worker-93 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/999

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM-Unit :: Support/./SupportTests.exe/42/86' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe-LLVM-Unit-1948-42-86.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=86 GTEST_SHARD_INDEX=42 C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe
--

Script:
--
C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe --gtest_filter=ProgramEnvTest.CreateProcessLongPath
--
C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(160): error: Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(163): error: fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied



C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:160
Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:163
fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied




********************


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants