Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ConstantFold] Drop gep of gep fold entirely #95126

Merged
merged 1 commit into from
Jun 12, 2024

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Jun 11, 2024

This is a followup to #93823 and drops the DataLayout-unaware GEP of GEP fold entirely. All cases are now left to the DataLayout-aware constant folder, which will fold everything to a single i8 GEP.

We didn't have any test coverage for this fold in LLVM, but some Clang tests change.

This is a followup to llvm#93823
and drop the DataLayout-unaware GEP of GEP fold entirely. All cases
are now left to the DataLayout-aware constant folder, which will
fold everything to a single i8 GEP.

We didn't have any test coverage for this fold in LLVM, but some
Clang tests change.
@nikic nikic requested review from aeubanks and dtcxzyw June 11, 2024 14:35
@llvmbot llvmbot added clang Clang issues not falling into any other category llvm:ir llvm:analysis clang:openmp OpenMP related changes to Clang labels Jun 11, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jun 11, 2024

@llvm/pr-subscribers-llvm-analysis

Author: Nikita Popov (nikic)

Changes

This is a followup to #93823 and drops the DataLayout-unaware GEP of GEP fold entirely. All cases are now left to the DataLayout-aware constant folder, which will fold everything to a single i8 GEP.

We didn't have any test coverage for this fold in LLVM, but some Clang tests change.


Full diff: https://github.com/llvm/llvm-project/pull/95126.diff

10 Files Affected:

  • (modified) clang/test/CodeGenCUDA/managed-var.cu (+9-7)
  • (modified) clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp (+4-4)
  • (modified) clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/ms-inline-asm-fields.cpp (+1-1)
  • (modified) clang/test/OpenMP/threadprivate_codegen.cpp (+8-8)
  • (modified) llvm/include/llvm/IR/ConstantFold.h (+1-1)
  • (modified) llvm/lib/Analysis/InstructionSimplify.cpp (+2-3)
  • (modified) llvm/lib/IR/ConstantFold.cpp (-33)
  • (modified) llvm/lib/IR/Constants.cpp (+1-2)
diff --git a/clang/test/CodeGenCUDA/managed-var.cu b/clang/test/CodeGenCUDA/managed-var.cu
index 5206acc62fe00..07e1a1e692c75 100644
--- a/clang/test/CodeGenCUDA/managed-var.cu
+++ b/clang/test/CodeGenCUDA/managed-var.cu
@@ -127,9 +127,10 @@ __device__ __host__ float load2() {
 
 // HOST-LABEL: define {{.*}}@_Z5load3v()
 // HOST:  %ld.managed = load ptr, ptr @v2, align 16
-// HOST:  %0 = getelementptr inbounds [100 x %struct.vec], ptr %ld.managed, i64 0, i64 1, i32 1
-// HOST:  %1 = load float, ptr %0, align 4
-// HOST:  ret float %1
+// HOST:  %0 = getelementptr inbounds [100 x %struct.vec], ptr %ld.managed, i64 0, i64 1
+// HOST:  %1 = getelementptr inbounds %struct.vec, ptr %0, i32 0, i32 1
+// HOST:  %2 = load float, ptr %1, align 4
+// HOST:  ret float %2
 float load3() {
   return v2[1].y;
 }
@@ -139,10 +140,11 @@ float load3() {
 // HOST:  %0 = getelementptr inbounds [100 x %struct.vec], ptr %ld.managed, i64 0, i64 1
 // HOST:  %1 = ptrtoint ptr %0 to i64
 // HOST:  %ld.managed1 = load ptr, ptr @v2, align 16
-// HOST:  %2 = getelementptr inbounds [100 x %struct.vec], ptr %ld.managed1, i64 0, i64 1, i32 1
-// HOST:  %3 = ptrtoint ptr %2 to i64
-// HOST:  %4 = sub i64 %3, %1
-// HOST:  %sub.ptr.div = sdiv exact i64 %4, 4
+// HOST:  %2 = getelementptr inbounds [100 x %struct.vec], ptr %ld.managed1, i64 0, i64 1
+// HOST:  %3 = getelementptr inbounds %struct.vec, ptr %2, i32 0, i32 1
+// HOST:  %4 = ptrtoint ptr %3 to i64
+// HOST:  %5 = sub i64 %4, %1
+// HOST:  %sub.ptr.div = sdiv exact i64 %5, 4
 // HOST:  %conv = sitofp i64 %sub.ptr.div to float
 // HOST:  ret float %conv
 float addr_taken2() {
diff --git a/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp b/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp
index 4033adc8f0390..14557829268ef 100644
--- a/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp
+++ b/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp
@@ -21,6 +21,6 @@ struct S {
 // CHECK: store i32 0, ptr @arr
 // CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr @arr, i32 0, i32 1), ptr noundef @.str)
 // CHECK: store i32 1, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 1)
-// CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr @arr, i64 1, i32 1), ptr noundef @.str.1)
+// CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 1), i32 0, i32 1), ptr noundef @.str.1)
 // CHECK: store i32 2, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 2)
-// CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr @arr, i64 2, i32 1), ptr noundef @.str.2)
+// CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 2), i32 0, i32 1), ptr noundef @.str.2)
diff --git a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
index 6fbe4c7fd17a7..caa92f47a93c2 100644
--- a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
+++ b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
@@ -79,12 +79,12 @@ std::initializer_list<std::initializer_list<int>> nested = {
 // CHECK-DYNAMIC-BL: store i32 {{.*}}, ptr getelementptr inbounds (i32, ptr @_ZGR6nested1_, i64 1)
 // CHECK-DYNAMIC-BL: store ptr @_ZGR6nested1_,
 // CHECK-DYNAMIC-BL:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), align 8
-// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1, i32 1), align 8
+// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), i32 0, i32 1), align 8
 // CHECK-DYNAMIC-BL: store i32 5, ptr @_ZGR6nested2_
 // CHECK-DYNAMIC-BL: store i32 {{.*}}, ptr getelementptr inbounds (i32, ptr @_ZGR6nested2_, i64 1)
 // CHECK-DYNAMIC-BL: store ptr @_ZGR6nested2_,
 // CHECK-DYNAMIC-BL:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), align 8
-// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2, i32 1), align 8
+// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), i32 0, i32 1), align 8
 // CHECK-DYNAMIC-BL: store ptr @_ZGR6nested_,
 // CHECK-DYNAMIC-BL:       ptr @nested, align 8
 // CHECK-DYNAMIC-BL: store i64 3, ptr getelementptr inbounds ({{.*}}, ptr @nested, i32 0, i32 1), align 8
@@ -119,13 +119,13 @@ std::initializer_list<std::initializer_list<int>> nested = {
 // CHECK-DYNAMIC-BE: store ptr @_ZGR6nested1_,
 // CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), align 8
 // CHECK-DYNAMIC-BE: store ptr getelementptr inbounds ([2 x i32], ptr @_ZGR6nested1_, i64 0, i64 2),
-// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1, i32 1), align 8
+// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), i32 0, i32 1), align 8
 // CHECK-DYNAMIC-BE: store i32 5, ptr @_ZGR6nested2_
 // CHECK-DYNAMIC-BE: store i32 {{.*}}, ptr getelementptr inbounds (i32, ptr @_ZGR6nested2_, i64 1)
 // CHECK-DYNAMIC-BE: store ptr @_ZGR6nested2_,
 // CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), align 8
 // CHECK-DYNAMIC-BE: store ptr getelementptr inbounds ([2 x i32], ptr @_ZGR6nested2_, i64 0, i64 2),
-// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2, i32 1), align 8
+// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), i32 0, i32 1), align 8
 // CHECK-DYNAMIC-BE: store ptr @_ZGR6nested_,
 // CHECK-DYNAMIC-BE:       ptr @nested, align 8
 // CHECK-DYNAMIC-BE: store ptr getelementptr inbounds ([3 x {{.*}}], ptr @_ZGR6nested_, i64 0, i64 3),
diff --git a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
index 3d0cf968bfd54..ef05f0334cd75 100644
--- a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
+++ b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
@@ -365,12 +365,12 @@ namespace partly_constant {
   //
   // Second init list.
   // CHECK: store ptr {{.*}}@[[PARTLY_CONSTANT_SECOND]]{{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 1)
-  // CHECK: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 1, i32 1)
+  // CHECK: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 1), i32 0, i32 1)
   //
   // Third init list.
   // CHECK-NOT: @[[PARTLY_CONSTANT_THIRD]],
   // CHECK: store ptr {{.*}}@[[PARTLY_CONSTANT_THIRD]]{{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 2)
-  // CHECK: store i64 4, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 2, i32 1)
+  // CHECK: store i64 4, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 2), i32 0, i32 1)
   // CHECK-NOT: @[[PARTLY_CONSTANT_THIRD]],
   //
   // Outer init list.
diff --git a/clang/test/CodeGenCXX/ms-inline-asm-fields.cpp b/clang/test/CodeGenCXX/ms-inline-asm-fields.cpp
index 403e9c8427760..e3441d0b4614e 100644
--- a/clang/test/CodeGenCXX/ms-inline-asm-fields.cpp
+++ b/clang/test/CodeGenCXX/ms-inline-asm-fields.cpp
@@ -24,7 +24,7 @@ extern "C" int test_param_field(A p) {
 
 extern "C" int test_namespace_global() {
 // CHECK: define{{.*}} i32 @test_namespace_global()
-// CHECK: call i32 asm sideeffect inteldialect "mov eax, $1", "{{.*}}"(ptr elementtype(i32) getelementptr inbounds (%struct.A, ptr @_ZN4asdf8a_globalE, i32 0, i32 2, i32 1))
+// CHECK: call i32 asm sideeffect inteldialect "mov eax, $1", "{{.*}}"(ptr elementtype(i32) getelementptr inbounds (%"struct.A::B", ptr getelementptr inbounds (%struct.A, ptr @_ZN4asdf8a_globalE, i32 0, i32 2), i32 0, i32 1))
 // CHECK: ret i32
   __asm mov eax, asdf::a_global.a3.b2
 }
diff --git a/clang/test/OpenMP/threadprivate_codegen.cpp b/clang/test/OpenMP/threadprivate_codegen.cpp
index 2ee328008cc4b..7a6269954d39e 100644
--- a/clang/test/OpenMP/threadprivate_codegen.cpp
+++ b/clang/test/OpenMP/threadprivate_codegen.cpp
@@ -2586,7 +2586,7 @@ int foobar() {
 // SIMD1-NEXT:    [[TMP12:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD1-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP12]], [[TMP11]]
 // SIMD1-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4
-// SIMD1-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4
+// SIMD1-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4
 // SIMD1-NEXT:    [[TMP14:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD1-NEXT:    [[ADD4:%.*]] = add nsw i32 [[TMP14]], [[TMP13]]
 // SIMD1-NEXT:    store i32 [[ADD4]], ptr [[RES]], align 4
@@ -2663,7 +2663,7 @@ int foobar() {
 // SIMD1-NEXT:    [[TMP6:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD1-NEXT:    [[ADD2:%.*]] = add nsw i32 [[TMP6]], [[TMP5]]
 // SIMD1-NEXT:    store i32 [[ADD2]], ptr [[RES]], align 4
-// SIMD1-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4
+// SIMD1-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4
 // SIMD1-NEXT:    [[TMP8:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD1-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP8]], [[TMP7]]
 // SIMD1-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4
@@ -3052,7 +3052,7 @@ int foobar() {
 // SIMD2-NEXT:    [[TMP12:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG187:![0-9]+]]
 // SIMD2-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP12]], [[TMP11]], !dbg [[DBG187]]
 // SIMD2-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4, !dbg [[DBG187]]
-// SIMD2-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4, !dbg [[DBG188:![0-9]+]]
+// SIMD2-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4, !dbg [[DBG188:![0-9]+]]
 // SIMD2-NEXT:    [[TMP14:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG189:![0-9]+]]
 // SIMD2-NEXT:    [[ADD4:%.*]] = add nsw i32 [[TMP14]], [[TMP13]], !dbg [[DBG189]]
 // SIMD2-NEXT:    store i32 [[ADD4]], ptr [[RES]], align 4, !dbg [[DBG189]]
@@ -3133,7 +3133,7 @@ int foobar() {
 // SIMD2-NEXT:    [[TMP6:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG222:![0-9]+]]
 // SIMD2-NEXT:    [[ADD2:%.*]] = add nsw i32 [[TMP6]], [[TMP5]], !dbg [[DBG222]]
 // SIMD2-NEXT:    store i32 [[ADD2]], ptr [[RES]], align 4, !dbg [[DBG222]]
-// SIMD2-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4, !dbg [[DBG223:![0-9]+]]
+// SIMD2-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4, !dbg [[DBG223:![0-9]+]]
 // SIMD2-NEXT:    [[TMP8:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG224:![0-9]+]]
 // SIMD2-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP8]], [[TMP7]], !dbg [[DBG224]]
 // SIMD2-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4, !dbg [[DBG224]]
@@ -5707,7 +5707,7 @@ int foobar() {
 // SIMD3-NEXT:    [[TMP12:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD3-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP12]], [[TMP11]]
 // SIMD3-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4
-// SIMD3-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4
+// SIMD3-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4
 // SIMD3-NEXT:    [[TMP14:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD3-NEXT:    [[ADD4:%.*]] = add nsw i32 [[TMP14]], [[TMP13]]
 // SIMD3-NEXT:    store i32 [[ADD4]], ptr [[RES]], align 4
@@ -5784,7 +5784,7 @@ int foobar() {
 // SIMD3-NEXT:    [[TMP6:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD3-NEXT:    [[ADD2:%.*]] = add nsw i32 [[TMP6]], [[TMP5]]
 // SIMD3-NEXT:    store i32 [[ADD2]], ptr [[RES]], align 4
-// SIMD3-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4
+// SIMD3-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4
 // SIMD3-NEXT:    [[TMP8:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD3-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP8]], [[TMP7]]
 // SIMD3-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4
@@ -6173,7 +6173,7 @@ int foobar() {
 // SIMD4-NEXT:    [[TMP12:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG187:![0-9]+]]
 // SIMD4-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP12]], [[TMP11]], !dbg [[DBG187]]
 // SIMD4-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4, !dbg [[DBG187]]
-// SIMD4-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4, !dbg [[DBG188:![0-9]+]]
+// SIMD4-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4, !dbg [[DBG188:![0-9]+]]
 // SIMD4-NEXT:    [[TMP14:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG189:![0-9]+]]
 // SIMD4-NEXT:    [[ADD4:%.*]] = add nsw i32 [[TMP14]], [[TMP13]], !dbg [[DBG189]]
 // SIMD4-NEXT:    store i32 [[ADD4]], ptr [[RES]], align 4, !dbg [[DBG189]]
@@ -6254,7 +6254,7 @@ int foobar() {
 // SIMD4-NEXT:    [[TMP6:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG222:![0-9]+]]
 // SIMD4-NEXT:    [[ADD2:%.*]] = add nsw i32 [[TMP6]], [[TMP5]], !dbg [[DBG222]]
 // SIMD4-NEXT:    store i32 [[ADD2]], ptr [[RES]], align 4, !dbg [[DBG222]]
-// SIMD4-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4, !dbg [[DBG223:![0-9]+]]
+// SIMD4-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4, !dbg [[DBG223:![0-9]+]]
 // SIMD4-NEXT:    [[TMP8:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG224:![0-9]+]]
 // SIMD4-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP8]], [[TMP7]], !dbg [[DBG224]]
 // SIMD4-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4, !dbg [[DBG224]]
diff --git a/llvm/include/llvm/IR/ConstantFold.h b/llvm/include/llvm/IR/ConstantFold.h
index 9b3c8a0e5a632..42043d365b2d3 100644
--- a/llvm/include/llvm/IR/ConstantFold.h
+++ b/llvm/include/llvm/IR/ConstantFold.h
@@ -52,7 +52,7 @@ namespace llvm {
                                           Constant *V2);
   Constant *ConstantFoldCompareInstruction(CmpInst::Predicate Predicate,
                                            Constant *C1, Constant *C2);
-  Constant *ConstantFoldGetElementPtr(Type *Ty, Constant *C, bool InBounds,
+  Constant *ConstantFoldGetElementPtr(Type *Ty, Constant *C,
                                       std::optional<ConstantRange> InRange,
                                       ArrayRef<Value *> Idxs);
 } // End llvm namespace
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 8b2aa6b9f18b0..00895860ca49b 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -5068,9 +5068,8 @@ static Value *simplifyGEPInst(Type *SrcTy, Value *Ptr,
     return nullptr;
 
   if (!ConstantExpr::isSupportedGetElementPtr(SrcTy))
-    // TODO(gep_nowrap): Pass on the whole GEPNoWrapFlags.
-    return ConstantFoldGetElementPtr(SrcTy, cast<Constant>(Ptr),
-                                     NW.isInBounds(), std::nullopt, Indices);
+    return ConstantFoldGetElementPtr(SrcTy, cast<Constant>(Ptr), std::nullopt,
+                                     Indices);
 
   auto *CE =
       ConstantExpr::getGetElementPtr(SrcTy, cast<Constant>(Ptr), Indices, NW);
diff --git a/llvm/lib/IR/ConstantFold.cpp b/llvm/lib/IR/ConstantFold.cpp
index 77a833610a3a9..34bcf36ec212c 100644
--- a/llvm/lib/IR/ConstantFold.cpp
+++ b/llvm/lib/IR/ConstantFold.cpp
@@ -1404,35 +1404,7 @@ Constant *llvm::ConstantFoldCompareInstruction(CmpInst::Predicate Predicate,
   return nullptr;
 }
 
-// Combine Indices - If the source pointer to this getelementptr instruction
-// is a getelementptr instruction, combine the indices of the two
-// getelementptr instructions into a single instruction.
-static Constant *foldGEPOfGEP(GEPOperator *GEP, Type *PointeeTy, bool InBounds,
-                              ArrayRef<Value *> Idxs) {
-  if (PointeeTy != GEP->getResultElementType())
-    return nullptr;
-
-  // Leave inrange handling to DL-aware constant folding.
-  if (GEP->getInRange())
-    return nullptr;
-
-  // Only handle simple case with leading zero index. We cannot perform an
-  // actual addition as we don't know the correct index type size to use.
-  Constant *Idx0 = cast<Constant>(Idxs[0]);
-  if (!Idx0->isNullValue())
-    return nullptr;
-
-  SmallVector<Value*, 16> NewIndices;
-  NewIndices.reserve(Idxs.size() + GEP->getNumIndices());
-  NewIndices.append(GEP->idx_begin(), GEP->idx_end());
-  NewIndices.append(Idxs.begin() + 1, Idxs.end());
-  return ConstantExpr::getGetElementPtr(
-      GEP->getSourceElementType(), cast<Constant>(GEP->getPointerOperand()),
-      NewIndices, InBounds && GEP->isInBounds());
-}
-
 Constant *llvm::ConstantFoldGetElementPtr(Type *PointeeTy, Constant *C,
-                                          bool InBounds,
                                           std::optional<ConstantRange> InRange,
                                           ArrayRef<Value *> Idxs) {
   if (Idxs.empty()) return C;
@@ -1462,10 +1434,5 @@ Constant *llvm::ConstantFoldGetElementPtr(Type *PointeeTy, Constant *C,
                      cast<VectorType>(GEPTy)->getElementCount(), C)
                : C;
 
-  if (ConstantExpr *CE = dyn_cast<ConstantExpr>(C))
-    if (auto *GEP = dyn_cast<GEPOperator>(CE))
-      if (Constant *C = foldGEPOfGEP(GEP, PointeeTy, InBounds, Idxs))
-        return C;
-
   return nullptr;
 }
diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp
index a76be441875a1..d07907372f0e4 100644
--- a/llvm/lib/IR/Constants.cpp
+++ b/llvm/lib/IR/Constants.cpp
@@ -2438,8 +2438,7 @@ Constant *ConstantExpr::getGetElementPtr(Type *Ty, Constant *C,
   assert(Ty && "Must specify element type");
   assert(isSupportedGetElementPtr(Ty) && "Element type is unsupported!");
 
-  if (Constant *FC =
-          ConstantFoldGetElementPtr(Ty, C, NW.isInBounds(), InRange, Idxs))
+  if (Constant *FC = ConstantFoldGetElementPtr(Ty, C, InRange, Idxs))
     return FC; // Fold a few common cases.
 
   assert(GetElementPtrInst::getIndexedType(Ty, Idxs) && "GEP indices invalid!");

@llvmbot
Copy link
Collaborator

llvmbot commented Jun 11, 2024

@llvm/pr-subscribers-llvm-ir

Author: Nikita Popov (nikic)

Changes

This is a followup to #93823 and drops the DataLayout-unaware GEP of GEP fold entirely. All cases are now left to the DataLayout-aware constant folder, which will fold everything to a single i8 GEP.

We didn't have any test coverage for this fold in LLVM, but some Clang tests change.


Full diff: https://github.com/llvm/llvm-project/pull/95126.diff

10 Files Affected:

  • (modified) clang/test/CodeGenCUDA/managed-var.cu (+9-7)
  • (modified) clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp (+4-4)
  • (modified) clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/ms-inline-asm-fields.cpp (+1-1)
  • (modified) clang/test/OpenMP/threadprivate_codegen.cpp (+8-8)
  • (modified) llvm/include/llvm/IR/ConstantFold.h (+1-1)
  • (modified) llvm/lib/Analysis/InstructionSimplify.cpp (+2-3)
  • (modified) llvm/lib/IR/ConstantFold.cpp (-33)
  • (modified) llvm/lib/IR/Constants.cpp (+1-2)
diff --git a/clang/test/CodeGenCUDA/managed-var.cu b/clang/test/CodeGenCUDA/managed-var.cu
index 5206acc62fe00..07e1a1e692c75 100644
--- a/clang/test/CodeGenCUDA/managed-var.cu
+++ b/clang/test/CodeGenCUDA/managed-var.cu
@@ -127,9 +127,10 @@ __device__ __host__ float load2() {
 
 // HOST-LABEL: define {{.*}}@_Z5load3v()
 // HOST:  %ld.managed = load ptr, ptr @v2, align 16
-// HOST:  %0 = getelementptr inbounds [100 x %struct.vec], ptr %ld.managed, i64 0, i64 1, i32 1
-// HOST:  %1 = load float, ptr %0, align 4
-// HOST:  ret float %1
+// HOST:  %0 = getelementptr inbounds [100 x %struct.vec], ptr %ld.managed, i64 0, i64 1
+// HOST:  %1 = getelementptr inbounds %struct.vec, ptr %0, i32 0, i32 1
+// HOST:  %2 = load float, ptr %1, align 4
+// HOST:  ret float %2
 float load3() {
   return v2[1].y;
 }
@@ -139,10 +140,11 @@ float load3() {
 // HOST:  %0 = getelementptr inbounds [100 x %struct.vec], ptr %ld.managed, i64 0, i64 1
 // HOST:  %1 = ptrtoint ptr %0 to i64
 // HOST:  %ld.managed1 = load ptr, ptr @v2, align 16
-// HOST:  %2 = getelementptr inbounds [100 x %struct.vec], ptr %ld.managed1, i64 0, i64 1, i32 1
-// HOST:  %3 = ptrtoint ptr %2 to i64
-// HOST:  %4 = sub i64 %3, %1
-// HOST:  %sub.ptr.div = sdiv exact i64 %4, 4
+// HOST:  %2 = getelementptr inbounds [100 x %struct.vec], ptr %ld.managed1, i64 0, i64 1
+// HOST:  %3 = getelementptr inbounds %struct.vec, ptr %2, i32 0, i32 1
+// HOST:  %4 = ptrtoint ptr %3 to i64
+// HOST:  %5 = sub i64 %4, %1
+// HOST:  %sub.ptr.div = sdiv exact i64 %5, 4
 // HOST:  %conv = sitofp i64 %sub.ptr.div to float
 // HOST:  ret float %conv
 float addr_taken2() {
diff --git a/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp b/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp
index 4033adc8f0390..14557829268ef 100644
--- a/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp
+++ b/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp
@@ -21,6 +21,6 @@ struct S {
 // CHECK: store i32 0, ptr @arr
 // CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr @arr, i32 0, i32 1), ptr noundef @.str)
 // CHECK: store i32 1, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 1)
-// CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr @arr, i64 1, i32 1), ptr noundef @.str.1)
+// CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 1), i32 0, i32 1), ptr noundef @.str.1)
 // CHECK: store i32 2, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 2)
-// CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr @arr, i64 2, i32 1), ptr noundef @.str.2)
+// CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 2), i32 0, i32 1), ptr noundef @.str.2)
diff --git a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
index 6fbe4c7fd17a7..caa92f47a93c2 100644
--- a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
+++ b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
@@ -79,12 +79,12 @@ std::initializer_list<std::initializer_list<int>> nested = {
 // CHECK-DYNAMIC-BL: store i32 {{.*}}, ptr getelementptr inbounds (i32, ptr @_ZGR6nested1_, i64 1)
 // CHECK-DYNAMIC-BL: store ptr @_ZGR6nested1_,
 // CHECK-DYNAMIC-BL:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), align 8
-// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1, i32 1), align 8
+// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), i32 0, i32 1), align 8
 // CHECK-DYNAMIC-BL: store i32 5, ptr @_ZGR6nested2_
 // CHECK-DYNAMIC-BL: store i32 {{.*}}, ptr getelementptr inbounds (i32, ptr @_ZGR6nested2_, i64 1)
 // CHECK-DYNAMIC-BL: store ptr @_ZGR6nested2_,
 // CHECK-DYNAMIC-BL:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), align 8
-// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2, i32 1), align 8
+// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), i32 0, i32 1), align 8
 // CHECK-DYNAMIC-BL: store ptr @_ZGR6nested_,
 // CHECK-DYNAMIC-BL:       ptr @nested, align 8
 // CHECK-DYNAMIC-BL: store i64 3, ptr getelementptr inbounds ({{.*}}, ptr @nested, i32 0, i32 1), align 8
@@ -119,13 +119,13 @@ std::initializer_list<std::initializer_list<int>> nested = {
 // CHECK-DYNAMIC-BE: store ptr @_ZGR6nested1_,
 // CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), align 8
 // CHECK-DYNAMIC-BE: store ptr getelementptr inbounds ([2 x i32], ptr @_ZGR6nested1_, i64 0, i64 2),
-// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1, i32 1), align 8
+// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), i32 0, i32 1), align 8
 // CHECK-DYNAMIC-BE: store i32 5, ptr @_ZGR6nested2_
 // CHECK-DYNAMIC-BE: store i32 {{.*}}, ptr getelementptr inbounds (i32, ptr @_ZGR6nested2_, i64 1)
 // CHECK-DYNAMIC-BE: store ptr @_ZGR6nested2_,
 // CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), align 8
 // CHECK-DYNAMIC-BE: store ptr getelementptr inbounds ([2 x i32], ptr @_ZGR6nested2_, i64 0, i64 2),
-// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2, i32 1), align 8
+// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), i32 0, i32 1), align 8
 // CHECK-DYNAMIC-BE: store ptr @_ZGR6nested_,
 // CHECK-DYNAMIC-BE:       ptr @nested, align 8
 // CHECK-DYNAMIC-BE: store ptr getelementptr inbounds ([3 x {{.*}}], ptr @_ZGR6nested_, i64 0, i64 3),
diff --git a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
index 3d0cf968bfd54..ef05f0334cd75 100644
--- a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
+++ b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
@@ -365,12 +365,12 @@ namespace partly_constant {
   //
   // Second init list.
   // CHECK: store ptr {{.*}}@[[PARTLY_CONSTANT_SECOND]]{{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 1)
-  // CHECK: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 1, i32 1)
+  // CHECK: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 1), i32 0, i32 1)
   //
   // Third init list.
   // CHECK-NOT: @[[PARTLY_CONSTANT_THIRD]],
   // CHECK: store ptr {{.*}}@[[PARTLY_CONSTANT_THIRD]]{{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 2)
-  // CHECK: store i64 4, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 2, i32 1)
+  // CHECK: store i64 4, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 2), i32 0, i32 1)
   // CHECK-NOT: @[[PARTLY_CONSTANT_THIRD]],
   //
   // Outer init list.
diff --git a/clang/test/CodeGenCXX/ms-inline-asm-fields.cpp b/clang/test/CodeGenCXX/ms-inline-asm-fields.cpp
index 403e9c8427760..e3441d0b4614e 100644
--- a/clang/test/CodeGenCXX/ms-inline-asm-fields.cpp
+++ b/clang/test/CodeGenCXX/ms-inline-asm-fields.cpp
@@ -24,7 +24,7 @@ extern "C" int test_param_field(A p) {
 
 extern "C" int test_namespace_global() {
 // CHECK: define{{.*}} i32 @test_namespace_global()
-// CHECK: call i32 asm sideeffect inteldialect "mov eax, $1", "{{.*}}"(ptr elementtype(i32) getelementptr inbounds (%struct.A, ptr @_ZN4asdf8a_globalE, i32 0, i32 2, i32 1))
+// CHECK: call i32 asm sideeffect inteldialect "mov eax, $1", "{{.*}}"(ptr elementtype(i32) getelementptr inbounds (%"struct.A::B", ptr getelementptr inbounds (%struct.A, ptr @_ZN4asdf8a_globalE, i32 0, i32 2), i32 0, i32 1))
 // CHECK: ret i32
   __asm mov eax, asdf::a_global.a3.b2
 }
diff --git a/clang/test/OpenMP/threadprivate_codegen.cpp b/clang/test/OpenMP/threadprivate_codegen.cpp
index 2ee328008cc4b..7a6269954d39e 100644
--- a/clang/test/OpenMP/threadprivate_codegen.cpp
+++ b/clang/test/OpenMP/threadprivate_codegen.cpp
@@ -2586,7 +2586,7 @@ int foobar() {
 // SIMD1-NEXT:    [[TMP12:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD1-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP12]], [[TMP11]]
 // SIMD1-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4
-// SIMD1-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4
+// SIMD1-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4
 // SIMD1-NEXT:    [[TMP14:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD1-NEXT:    [[ADD4:%.*]] = add nsw i32 [[TMP14]], [[TMP13]]
 // SIMD1-NEXT:    store i32 [[ADD4]], ptr [[RES]], align 4
@@ -2663,7 +2663,7 @@ int foobar() {
 // SIMD1-NEXT:    [[TMP6:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD1-NEXT:    [[ADD2:%.*]] = add nsw i32 [[TMP6]], [[TMP5]]
 // SIMD1-NEXT:    store i32 [[ADD2]], ptr [[RES]], align 4
-// SIMD1-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4
+// SIMD1-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4
 // SIMD1-NEXT:    [[TMP8:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD1-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP8]], [[TMP7]]
 // SIMD1-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4
@@ -3052,7 +3052,7 @@ int foobar() {
 // SIMD2-NEXT:    [[TMP12:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG187:![0-9]+]]
 // SIMD2-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP12]], [[TMP11]], !dbg [[DBG187]]
 // SIMD2-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4, !dbg [[DBG187]]
-// SIMD2-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4, !dbg [[DBG188:![0-9]+]]
+// SIMD2-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4, !dbg [[DBG188:![0-9]+]]
 // SIMD2-NEXT:    [[TMP14:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG189:![0-9]+]]
 // SIMD2-NEXT:    [[ADD4:%.*]] = add nsw i32 [[TMP14]], [[TMP13]], !dbg [[DBG189]]
 // SIMD2-NEXT:    store i32 [[ADD4]], ptr [[RES]], align 4, !dbg [[DBG189]]
@@ -3133,7 +3133,7 @@ int foobar() {
 // SIMD2-NEXT:    [[TMP6:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG222:![0-9]+]]
 // SIMD2-NEXT:    [[ADD2:%.*]] = add nsw i32 [[TMP6]], [[TMP5]], !dbg [[DBG222]]
 // SIMD2-NEXT:    store i32 [[ADD2]], ptr [[RES]], align 4, !dbg [[DBG222]]
-// SIMD2-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4, !dbg [[DBG223:![0-9]+]]
+// SIMD2-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4, !dbg [[DBG223:![0-9]+]]
 // SIMD2-NEXT:    [[TMP8:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG224:![0-9]+]]
 // SIMD2-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP8]], [[TMP7]], !dbg [[DBG224]]
 // SIMD2-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4, !dbg [[DBG224]]
@@ -5707,7 +5707,7 @@ int foobar() {
 // SIMD3-NEXT:    [[TMP12:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD3-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP12]], [[TMP11]]
 // SIMD3-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4
-// SIMD3-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4
+// SIMD3-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4
 // SIMD3-NEXT:    [[TMP14:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD3-NEXT:    [[ADD4:%.*]] = add nsw i32 [[TMP14]], [[TMP13]]
 // SIMD3-NEXT:    store i32 [[ADD4]], ptr [[RES]], align 4
@@ -5784,7 +5784,7 @@ int foobar() {
 // SIMD3-NEXT:    [[TMP6:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD3-NEXT:    [[ADD2:%.*]] = add nsw i32 [[TMP6]], [[TMP5]]
 // SIMD3-NEXT:    store i32 [[ADD2]], ptr [[RES]], align 4
-// SIMD3-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4
+// SIMD3-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4
 // SIMD3-NEXT:    [[TMP8:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD3-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP8]], [[TMP7]]
 // SIMD3-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4
@@ -6173,7 +6173,7 @@ int foobar() {
 // SIMD4-NEXT:    [[TMP12:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG187:![0-9]+]]
 // SIMD4-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP12]], [[TMP11]], !dbg [[DBG187]]
 // SIMD4-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4, !dbg [[DBG187]]
-// SIMD4-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4, !dbg [[DBG188:![0-9]+]]
+// SIMD4-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4, !dbg [[DBG188:![0-9]+]]
 // SIMD4-NEXT:    [[TMP14:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG189:![0-9]+]]
 // SIMD4-NEXT:    [[ADD4:%.*]] = add nsw i32 [[TMP14]], [[TMP13]], !dbg [[DBG189]]
 // SIMD4-NEXT:    store i32 [[ADD4]], ptr [[RES]], align 4, !dbg [[DBG189]]
@@ -6254,7 +6254,7 @@ int foobar() {
 // SIMD4-NEXT:    [[TMP6:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG222:![0-9]+]]
 // SIMD4-NEXT:    [[ADD2:%.*]] = add nsw i32 [[TMP6]], [[TMP5]], !dbg [[DBG222]]
 // SIMD4-NEXT:    store i32 [[ADD2]], ptr [[RES]], align 4, !dbg [[DBG222]]
-// SIMD4-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4, !dbg [[DBG223:![0-9]+]]
+// SIMD4-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4, !dbg [[DBG223:![0-9]+]]
 // SIMD4-NEXT:    [[TMP8:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG224:![0-9]+]]
 // SIMD4-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP8]], [[TMP7]], !dbg [[DBG224]]
 // SIMD4-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4, !dbg [[DBG224]]
diff --git a/llvm/include/llvm/IR/ConstantFold.h b/llvm/include/llvm/IR/ConstantFold.h
index 9b3c8a0e5a632..42043d365b2d3 100644
--- a/llvm/include/llvm/IR/ConstantFold.h
+++ b/llvm/include/llvm/IR/ConstantFold.h
@@ -52,7 +52,7 @@ namespace llvm {
                                           Constant *V2);
   Constant *ConstantFoldCompareInstruction(CmpInst::Predicate Predicate,
                                            Constant *C1, Constant *C2);
-  Constant *ConstantFoldGetElementPtr(Type *Ty, Constant *C, bool InBounds,
+  Constant *ConstantFoldGetElementPtr(Type *Ty, Constant *C,
                                       std::optional<ConstantRange> InRange,
                                       ArrayRef<Value *> Idxs);
 } // End llvm namespace
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 8b2aa6b9f18b0..00895860ca49b 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -5068,9 +5068,8 @@ static Value *simplifyGEPInst(Type *SrcTy, Value *Ptr,
     return nullptr;
 
   if (!ConstantExpr::isSupportedGetElementPtr(SrcTy))
-    // TODO(gep_nowrap): Pass on the whole GEPNoWrapFlags.
-    return ConstantFoldGetElementPtr(SrcTy, cast<Constant>(Ptr),
-                                     NW.isInBounds(), std::nullopt, Indices);
+    return ConstantFoldGetElementPtr(SrcTy, cast<Constant>(Ptr), std::nullopt,
+                                     Indices);
 
   auto *CE =
       ConstantExpr::getGetElementPtr(SrcTy, cast<Constant>(Ptr), Indices, NW);
diff --git a/llvm/lib/IR/ConstantFold.cpp b/llvm/lib/IR/ConstantFold.cpp
index 77a833610a3a9..34bcf36ec212c 100644
--- a/llvm/lib/IR/ConstantFold.cpp
+++ b/llvm/lib/IR/ConstantFold.cpp
@@ -1404,35 +1404,7 @@ Constant *llvm::ConstantFoldCompareInstruction(CmpInst::Predicate Predicate,
   return nullptr;
 }
 
-// Combine Indices - If the source pointer to this getelementptr instruction
-// is a getelementptr instruction, combine the indices of the two
-// getelementptr instructions into a single instruction.
-static Constant *foldGEPOfGEP(GEPOperator *GEP, Type *PointeeTy, bool InBounds,
-                              ArrayRef<Value *> Idxs) {
-  if (PointeeTy != GEP->getResultElementType())
-    return nullptr;
-
-  // Leave inrange handling to DL-aware constant folding.
-  if (GEP->getInRange())
-    return nullptr;
-
-  // Only handle simple case with leading zero index. We cannot perform an
-  // actual addition as we don't know the correct index type size to use.
-  Constant *Idx0 = cast<Constant>(Idxs[0]);
-  if (!Idx0->isNullValue())
-    return nullptr;
-
-  SmallVector<Value*, 16> NewIndices;
-  NewIndices.reserve(Idxs.size() + GEP->getNumIndices());
-  NewIndices.append(GEP->idx_begin(), GEP->idx_end());
-  NewIndices.append(Idxs.begin() + 1, Idxs.end());
-  return ConstantExpr::getGetElementPtr(
-      GEP->getSourceElementType(), cast<Constant>(GEP->getPointerOperand()),
-      NewIndices, InBounds && GEP->isInBounds());
-}
-
 Constant *llvm::ConstantFoldGetElementPtr(Type *PointeeTy, Constant *C,
-                                          bool InBounds,
                                           std::optional<ConstantRange> InRange,
                                           ArrayRef<Value *> Idxs) {
   if (Idxs.empty()) return C;
@@ -1462,10 +1434,5 @@ Constant *llvm::ConstantFoldGetElementPtr(Type *PointeeTy, Constant *C,
                      cast<VectorType>(GEPTy)->getElementCount(), C)
                : C;
 
-  if (ConstantExpr *CE = dyn_cast<ConstantExpr>(C))
-    if (auto *GEP = dyn_cast<GEPOperator>(CE))
-      if (Constant *C = foldGEPOfGEP(GEP, PointeeTy, InBounds, Idxs))
-        return C;
-
   return nullptr;
 }
diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp
index a76be441875a1..d07907372f0e4 100644
--- a/llvm/lib/IR/Constants.cpp
+++ b/llvm/lib/IR/Constants.cpp
@@ -2438,8 +2438,7 @@ Constant *ConstantExpr::getGetElementPtr(Type *Ty, Constant *C,
   assert(Ty && "Must specify element type");
   assert(isSupportedGetElementPtr(Ty) && "Element type is unsupported!");
 
-  if (Constant *FC =
-          ConstantFoldGetElementPtr(Ty, C, NW.isInBounds(), InRange, Idxs))
+  if (Constant *FC = ConstantFoldGetElementPtr(Ty, C, InRange, Idxs))
     return FC; // Fold a few common cases.
 
   assert(GetElementPtrInst::getIndexedType(Ty, Idxs) && "GEP indices invalid!");

@llvmbot
Copy link
Collaborator

llvmbot commented Jun 11, 2024

@llvm/pr-subscribers-clang

Author: Nikita Popov (nikic)

Changes

This is a followup to #93823 and drops the DataLayout-unaware GEP of GEP fold entirely. All cases are now left to the DataLayout-aware constant folder, which will fold everything to a single i8 GEP.

We didn't have any test coverage for this fold in LLVM, but some Clang tests change.


Full diff: https://github.com/llvm/llvm-project/pull/95126.diff

10 Files Affected:

  • (modified) clang/test/CodeGenCUDA/managed-var.cu (+9-7)
  • (modified) clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp (+4-4)
  • (modified) clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/ms-inline-asm-fields.cpp (+1-1)
  • (modified) clang/test/OpenMP/threadprivate_codegen.cpp (+8-8)
  • (modified) llvm/include/llvm/IR/ConstantFold.h (+1-1)
  • (modified) llvm/lib/Analysis/InstructionSimplify.cpp (+2-3)
  • (modified) llvm/lib/IR/ConstantFold.cpp (-33)
  • (modified) llvm/lib/IR/Constants.cpp (+1-2)
diff --git a/clang/test/CodeGenCUDA/managed-var.cu b/clang/test/CodeGenCUDA/managed-var.cu
index 5206acc62fe00..07e1a1e692c75 100644
--- a/clang/test/CodeGenCUDA/managed-var.cu
+++ b/clang/test/CodeGenCUDA/managed-var.cu
@@ -127,9 +127,10 @@ __device__ __host__ float load2() {
 
 // HOST-LABEL: define {{.*}}@_Z5load3v()
 // HOST:  %ld.managed = load ptr, ptr @v2, align 16
-// HOST:  %0 = getelementptr inbounds [100 x %struct.vec], ptr %ld.managed, i64 0, i64 1, i32 1
-// HOST:  %1 = load float, ptr %0, align 4
-// HOST:  ret float %1
+// HOST:  %0 = getelementptr inbounds [100 x %struct.vec], ptr %ld.managed, i64 0, i64 1
+// HOST:  %1 = getelementptr inbounds %struct.vec, ptr %0, i32 0, i32 1
+// HOST:  %2 = load float, ptr %1, align 4
+// HOST:  ret float %2
 float load3() {
   return v2[1].y;
 }
@@ -139,10 +140,11 @@ float load3() {
 // HOST:  %0 = getelementptr inbounds [100 x %struct.vec], ptr %ld.managed, i64 0, i64 1
 // HOST:  %1 = ptrtoint ptr %0 to i64
 // HOST:  %ld.managed1 = load ptr, ptr @v2, align 16
-// HOST:  %2 = getelementptr inbounds [100 x %struct.vec], ptr %ld.managed1, i64 0, i64 1, i32 1
-// HOST:  %3 = ptrtoint ptr %2 to i64
-// HOST:  %4 = sub i64 %3, %1
-// HOST:  %sub.ptr.div = sdiv exact i64 %4, 4
+// HOST:  %2 = getelementptr inbounds [100 x %struct.vec], ptr %ld.managed1, i64 0, i64 1
+// HOST:  %3 = getelementptr inbounds %struct.vec, ptr %2, i32 0, i32 1
+// HOST:  %4 = ptrtoint ptr %3 to i64
+// HOST:  %5 = sub i64 %4, %1
+// HOST:  %sub.ptr.div = sdiv exact i64 %5, 4
 // HOST:  %conv = sitofp i64 %sub.ptr.div to float
 // HOST:  ret float %conv
 float addr_taken2() {
diff --git a/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp b/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp
index 4033adc8f0390..14557829268ef 100644
--- a/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp
+++ b/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp
@@ -21,6 +21,6 @@ struct S {
 // CHECK: store i32 0, ptr @arr
 // CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr @arr, i32 0, i32 1), ptr noundef @.str)
 // CHECK: store i32 1, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 1)
-// CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr @arr, i64 1, i32 1), ptr noundef @.str.1)
+// CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 1), i32 0, i32 1), ptr noundef @.str.1)
 // CHECK: store i32 2, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 2)
-// CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr @arr, i64 2, i32 1), ptr noundef @.str.2)
+// CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 2), i32 0, i32 1), ptr noundef @.str.2)
diff --git a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
index 6fbe4c7fd17a7..caa92f47a93c2 100644
--- a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
+++ b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
@@ -79,12 +79,12 @@ std::initializer_list<std::initializer_list<int>> nested = {
 // CHECK-DYNAMIC-BL: store i32 {{.*}}, ptr getelementptr inbounds (i32, ptr @_ZGR6nested1_, i64 1)
 // CHECK-DYNAMIC-BL: store ptr @_ZGR6nested1_,
 // CHECK-DYNAMIC-BL:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), align 8
-// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1, i32 1), align 8
+// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), i32 0, i32 1), align 8
 // CHECK-DYNAMIC-BL: store i32 5, ptr @_ZGR6nested2_
 // CHECK-DYNAMIC-BL: store i32 {{.*}}, ptr getelementptr inbounds (i32, ptr @_ZGR6nested2_, i64 1)
 // CHECK-DYNAMIC-BL: store ptr @_ZGR6nested2_,
 // CHECK-DYNAMIC-BL:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), align 8
-// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2, i32 1), align 8
+// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), i32 0, i32 1), align 8
 // CHECK-DYNAMIC-BL: store ptr @_ZGR6nested_,
 // CHECK-DYNAMIC-BL:       ptr @nested, align 8
 // CHECK-DYNAMIC-BL: store i64 3, ptr getelementptr inbounds ({{.*}}, ptr @nested, i32 0, i32 1), align 8
@@ -119,13 +119,13 @@ std::initializer_list<std::initializer_list<int>> nested = {
 // CHECK-DYNAMIC-BE: store ptr @_ZGR6nested1_,
 // CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), align 8
 // CHECK-DYNAMIC-BE: store ptr getelementptr inbounds ([2 x i32], ptr @_ZGR6nested1_, i64 0, i64 2),
-// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1, i32 1), align 8
+// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), i32 0, i32 1), align 8
 // CHECK-DYNAMIC-BE: store i32 5, ptr @_ZGR6nested2_
 // CHECK-DYNAMIC-BE: store i32 {{.*}}, ptr getelementptr inbounds (i32, ptr @_ZGR6nested2_, i64 1)
 // CHECK-DYNAMIC-BE: store ptr @_ZGR6nested2_,
 // CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), align 8
 // CHECK-DYNAMIC-BE: store ptr getelementptr inbounds ([2 x i32], ptr @_ZGR6nested2_, i64 0, i64 2),
-// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2, i32 1), align 8
+// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), i32 0, i32 1), align 8
 // CHECK-DYNAMIC-BE: store ptr @_ZGR6nested_,
 // CHECK-DYNAMIC-BE:       ptr @nested, align 8
 // CHECK-DYNAMIC-BE: store ptr getelementptr inbounds ([3 x {{.*}}], ptr @_ZGR6nested_, i64 0, i64 3),
diff --git a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
index 3d0cf968bfd54..ef05f0334cd75 100644
--- a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
+++ b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
@@ -365,12 +365,12 @@ namespace partly_constant {
   //
   // Second init list.
   // CHECK: store ptr {{.*}}@[[PARTLY_CONSTANT_SECOND]]{{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 1)
-  // CHECK: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 1, i32 1)
+  // CHECK: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 1), i32 0, i32 1)
   //
   // Third init list.
   // CHECK-NOT: @[[PARTLY_CONSTANT_THIRD]],
   // CHECK: store ptr {{.*}}@[[PARTLY_CONSTANT_THIRD]]{{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 2)
-  // CHECK: store i64 4, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 2, i32 1)
+  // CHECK: store i64 4, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 2), i32 0, i32 1)
   // CHECK-NOT: @[[PARTLY_CONSTANT_THIRD]],
   //
   // Outer init list.
diff --git a/clang/test/CodeGenCXX/ms-inline-asm-fields.cpp b/clang/test/CodeGenCXX/ms-inline-asm-fields.cpp
index 403e9c8427760..e3441d0b4614e 100644
--- a/clang/test/CodeGenCXX/ms-inline-asm-fields.cpp
+++ b/clang/test/CodeGenCXX/ms-inline-asm-fields.cpp
@@ -24,7 +24,7 @@ extern "C" int test_param_field(A p) {
 
 extern "C" int test_namespace_global() {
 // CHECK: define{{.*}} i32 @test_namespace_global()
-// CHECK: call i32 asm sideeffect inteldialect "mov eax, $1", "{{.*}}"(ptr elementtype(i32) getelementptr inbounds (%struct.A, ptr @_ZN4asdf8a_globalE, i32 0, i32 2, i32 1))
+// CHECK: call i32 asm sideeffect inteldialect "mov eax, $1", "{{.*}}"(ptr elementtype(i32) getelementptr inbounds (%"struct.A::B", ptr getelementptr inbounds (%struct.A, ptr @_ZN4asdf8a_globalE, i32 0, i32 2), i32 0, i32 1))
 // CHECK: ret i32
   __asm mov eax, asdf::a_global.a3.b2
 }
diff --git a/clang/test/OpenMP/threadprivate_codegen.cpp b/clang/test/OpenMP/threadprivate_codegen.cpp
index 2ee328008cc4b..7a6269954d39e 100644
--- a/clang/test/OpenMP/threadprivate_codegen.cpp
+++ b/clang/test/OpenMP/threadprivate_codegen.cpp
@@ -2586,7 +2586,7 @@ int foobar() {
 // SIMD1-NEXT:    [[TMP12:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD1-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP12]], [[TMP11]]
 // SIMD1-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4
-// SIMD1-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4
+// SIMD1-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4
 // SIMD1-NEXT:    [[TMP14:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD1-NEXT:    [[ADD4:%.*]] = add nsw i32 [[TMP14]], [[TMP13]]
 // SIMD1-NEXT:    store i32 [[ADD4]], ptr [[RES]], align 4
@@ -2663,7 +2663,7 @@ int foobar() {
 // SIMD1-NEXT:    [[TMP6:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD1-NEXT:    [[ADD2:%.*]] = add nsw i32 [[TMP6]], [[TMP5]]
 // SIMD1-NEXT:    store i32 [[ADD2]], ptr [[RES]], align 4
-// SIMD1-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4
+// SIMD1-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4
 // SIMD1-NEXT:    [[TMP8:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD1-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP8]], [[TMP7]]
 // SIMD1-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4
@@ -3052,7 +3052,7 @@ int foobar() {
 // SIMD2-NEXT:    [[TMP12:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG187:![0-9]+]]
 // SIMD2-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP12]], [[TMP11]], !dbg [[DBG187]]
 // SIMD2-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4, !dbg [[DBG187]]
-// SIMD2-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4, !dbg [[DBG188:![0-9]+]]
+// SIMD2-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4, !dbg [[DBG188:![0-9]+]]
 // SIMD2-NEXT:    [[TMP14:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG189:![0-9]+]]
 // SIMD2-NEXT:    [[ADD4:%.*]] = add nsw i32 [[TMP14]], [[TMP13]], !dbg [[DBG189]]
 // SIMD2-NEXT:    store i32 [[ADD4]], ptr [[RES]], align 4, !dbg [[DBG189]]
@@ -3133,7 +3133,7 @@ int foobar() {
 // SIMD2-NEXT:    [[TMP6:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG222:![0-9]+]]
 // SIMD2-NEXT:    [[ADD2:%.*]] = add nsw i32 [[TMP6]], [[TMP5]], !dbg [[DBG222]]
 // SIMD2-NEXT:    store i32 [[ADD2]], ptr [[RES]], align 4, !dbg [[DBG222]]
-// SIMD2-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4, !dbg [[DBG223:![0-9]+]]
+// SIMD2-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4, !dbg [[DBG223:![0-9]+]]
 // SIMD2-NEXT:    [[TMP8:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG224:![0-9]+]]
 // SIMD2-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP8]], [[TMP7]], !dbg [[DBG224]]
 // SIMD2-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4, !dbg [[DBG224]]
@@ -5707,7 +5707,7 @@ int foobar() {
 // SIMD3-NEXT:    [[TMP12:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD3-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP12]], [[TMP11]]
 // SIMD3-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4
-// SIMD3-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4
+// SIMD3-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4
 // SIMD3-NEXT:    [[TMP14:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD3-NEXT:    [[ADD4:%.*]] = add nsw i32 [[TMP14]], [[TMP13]]
 // SIMD3-NEXT:    store i32 [[ADD4]], ptr [[RES]], align 4
@@ -5784,7 +5784,7 @@ int foobar() {
 // SIMD3-NEXT:    [[TMP6:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD3-NEXT:    [[ADD2:%.*]] = add nsw i32 [[TMP6]], [[TMP5]]
 // SIMD3-NEXT:    store i32 [[ADD2]], ptr [[RES]], align 4
-// SIMD3-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4
+// SIMD3-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4
 // SIMD3-NEXT:    [[TMP8:%.*]] = load i32, ptr [[RES]], align 4
 // SIMD3-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP8]], [[TMP7]]
 // SIMD3-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4
@@ -6173,7 +6173,7 @@ int foobar() {
 // SIMD4-NEXT:    [[TMP12:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG187:![0-9]+]]
 // SIMD4-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP12]], [[TMP11]], !dbg [[DBG187]]
 // SIMD4-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4, !dbg [[DBG187]]
-// SIMD4-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4, !dbg [[DBG188:![0-9]+]]
+// SIMD4-NEXT:    [[TMP13:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4, !dbg [[DBG188:![0-9]+]]
 // SIMD4-NEXT:    [[TMP14:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG189:![0-9]+]]
 // SIMD4-NEXT:    [[ADD4:%.*]] = add nsw i32 [[TMP14]], [[TMP13]], !dbg [[DBG189]]
 // SIMD4-NEXT:    store i32 [[ADD4]], ptr [[RES]], align 4, !dbg [[DBG189]]
@@ -6254,7 +6254,7 @@ int foobar() {
 // SIMD4-NEXT:    [[TMP6:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG222:![0-9]+]]
 // SIMD4-NEXT:    [[ADD2:%.*]] = add nsw i32 [[TMP6]], [[TMP5]], !dbg [[DBG222]]
 // SIMD4-NEXT:    store i32 [[ADD2]], ptr [[RES]], align 4, !dbg [[DBG222]]
-// SIMD4-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1, i64 1), align 4, !dbg [[DBG223:![0-9]+]]
+// SIMD4-NEXT:    [[TMP7:%.*]] = load i32, ptr getelementptr inbounds ([3 x %struct.S1], ptr getelementptr inbounds ([2 x [3 x %struct.S1]], ptr @arr_x, i64 0, i64 1), i64 0, i64 1), align 4, !dbg [[DBG223:![0-9]+]]
 // SIMD4-NEXT:    [[TMP8:%.*]] = load i32, ptr [[RES]], align 4, !dbg [[DBG224:![0-9]+]]
 // SIMD4-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP8]], [[TMP7]], !dbg [[DBG224]]
 // SIMD4-NEXT:    store i32 [[ADD3]], ptr [[RES]], align 4, !dbg [[DBG224]]
diff --git a/llvm/include/llvm/IR/ConstantFold.h b/llvm/include/llvm/IR/ConstantFold.h
index 9b3c8a0e5a632..42043d365b2d3 100644
--- a/llvm/include/llvm/IR/ConstantFold.h
+++ b/llvm/include/llvm/IR/ConstantFold.h
@@ -52,7 +52,7 @@ namespace llvm {
                                           Constant *V2);
   Constant *ConstantFoldCompareInstruction(CmpInst::Predicate Predicate,
                                            Constant *C1, Constant *C2);
-  Constant *ConstantFoldGetElementPtr(Type *Ty, Constant *C, bool InBounds,
+  Constant *ConstantFoldGetElementPtr(Type *Ty, Constant *C,
                                       std::optional<ConstantRange> InRange,
                                       ArrayRef<Value *> Idxs);
 } // End llvm namespace
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 8b2aa6b9f18b0..00895860ca49b 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -5068,9 +5068,8 @@ static Value *simplifyGEPInst(Type *SrcTy, Value *Ptr,
     return nullptr;
 
   if (!ConstantExpr::isSupportedGetElementPtr(SrcTy))
-    // TODO(gep_nowrap): Pass on the whole GEPNoWrapFlags.
-    return ConstantFoldGetElementPtr(SrcTy, cast<Constant>(Ptr),
-                                     NW.isInBounds(), std::nullopt, Indices);
+    return ConstantFoldGetElementPtr(SrcTy, cast<Constant>(Ptr), std::nullopt,
+                                     Indices);
 
   auto *CE =
       ConstantExpr::getGetElementPtr(SrcTy, cast<Constant>(Ptr), Indices, NW);
diff --git a/llvm/lib/IR/ConstantFold.cpp b/llvm/lib/IR/ConstantFold.cpp
index 77a833610a3a9..34bcf36ec212c 100644
--- a/llvm/lib/IR/ConstantFold.cpp
+++ b/llvm/lib/IR/ConstantFold.cpp
@@ -1404,35 +1404,7 @@ Constant *llvm::ConstantFoldCompareInstruction(CmpInst::Predicate Predicate,
   return nullptr;
 }
 
-// Combine Indices - If the source pointer to this getelementptr instruction
-// is a getelementptr instruction, combine the indices of the two
-// getelementptr instructions into a single instruction.
-static Constant *foldGEPOfGEP(GEPOperator *GEP, Type *PointeeTy, bool InBounds,
-                              ArrayRef<Value *> Idxs) {
-  if (PointeeTy != GEP->getResultElementType())
-    return nullptr;
-
-  // Leave inrange handling to DL-aware constant folding.
-  if (GEP->getInRange())
-    return nullptr;
-
-  // Only handle simple case with leading zero index. We cannot perform an
-  // actual addition as we don't know the correct index type size to use.
-  Constant *Idx0 = cast<Constant>(Idxs[0]);
-  if (!Idx0->isNullValue())
-    return nullptr;
-
-  SmallVector<Value*, 16> NewIndices;
-  NewIndices.reserve(Idxs.size() + GEP->getNumIndices());
-  NewIndices.append(GEP->idx_begin(), GEP->idx_end());
-  NewIndices.append(Idxs.begin() + 1, Idxs.end());
-  return ConstantExpr::getGetElementPtr(
-      GEP->getSourceElementType(), cast<Constant>(GEP->getPointerOperand()),
-      NewIndices, InBounds && GEP->isInBounds());
-}
-
 Constant *llvm::ConstantFoldGetElementPtr(Type *PointeeTy, Constant *C,
-                                          bool InBounds,
                                           std::optional<ConstantRange> InRange,
                                           ArrayRef<Value *> Idxs) {
   if (Idxs.empty()) return C;
@@ -1462,10 +1434,5 @@ Constant *llvm::ConstantFoldGetElementPtr(Type *PointeeTy, Constant *C,
                      cast<VectorType>(GEPTy)->getElementCount(), C)
                : C;
 
-  if (ConstantExpr *CE = dyn_cast<ConstantExpr>(C))
-    if (auto *GEP = dyn_cast<GEPOperator>(CE))
-      if (Constant *C = foldGEPOfGEP(GEP, PointeeTy, InBounds, Idxs))
-        return C;
-
   return nullptr;
 }
diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp
index a76be441875a1..d07907372f0e4 100644
--- a/llvm/lib/IR/Constants.cpp
+++ b/llvm/lib/IR/Constants.cpp
@@ -2438,8 +2438,7 @@ Constant *ConstantExpr::getGetElementPtr(Type *Ty, Constant *C,
   assert(Ty && "Must specify element type");
   assert(isSupportedGetElementPtr(Ty) && "Element type is unsupported!");
 
-  if (Constant *FC =
-          ConstantFoldGetElementPtr(Ty, C, NW.isInBounds(), InRange, Idxs))
+  if (Constant *FC = ConstantFoldGetElementPtr(Ty, C, InRange, Idxs))
     return FC; // Fold a few common cases.
 
   assert(GetElementPtrInst::getIndexedType(Ty, Idxs) && "GEP indices invalid!");

@dtcxzyw
Copy link
Member

dtcxzyw commented Jun 11, 2024

See 51e459a:

This causes some non-trivial text size increases in unoptimized builds for Bullet. Revert while I investigate.

Can you double check this?

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Jun 11, 2024
@nikic
Copy link
Contributor Author

nikic commented Jun 11, 2024

See 51e459a:

This causes some non-trivial text size increases in unoptimized builds for Bullet. Revert while I investigate.

Can you double check this?

This was mitigated by #93956. I did check that this patch does not cause code size regression for O0.

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@nikic nikic merged commit 3b3b839 into llvm:main Jun 12, 2024
12 checks passed
@nikic nikic deleted the const-fold-gep-gep-simple branch June 12, 2024 07:50
@jplehr
Copy link
Contributor

jplehr commented Jun 12, 2024

I believe this broke our flang+openmp+offload bot: https://lab.llvm.org/staging/#/builders/140/builds/10168
Happy to help looking into it.

nikic added a commit that referenced this pull request Jun 12, 2024
This reverts commit 3b3b839.

This broke the flang+openmp+offload buildbot, as reported in
#95126 (comment).
@nikic
Copy link
Contributor Author

nikic commented Jun 12, 2024

I believe this broke our flang+openmp+offload bot: https://lab.llvm.org/staging/#/builders/140/builds/10168 Happy to help looking into it.

That's for the heads-up, I've reverted this patch in cece0a1.

I would appreciate some help with debugging this, I don't think I can run this test locally. Probably just having the pre-optimization LLVM IR/bitcode would be enough. (Though with offload, there is probably more than one module involved?)

@jplehr
Copy link
Contributor

jplehr commented Jun 12, 2024

Thank you @nikic.
I'll see to reproduce locally and narrow down as much as possible to provide small reproducer.

@agozillon
Copy link
Contributor

Hey, @jplehr notified me of this issue (thank you), I've had a little look at it and it's related to an issue this PR fixes: #94541 which you are currently a reviewer of ironically (as I use and alter slightly a piece of functionality you ported, so thank you very much for that)! A slightly funny convergence there :-)

Currently when generating the kernel we try to replace uses of values with their appropriate kernel input argument, in certain cases these are Constant's which poses an issue as we can't replace them with the Input arguments which are themselves not Constant I believe. So we can only replace instructions, to do so we transform the Constants to Instructions where necessary, the current function that's in place (a small naive function from myself) will do this within the kernel to a small extent, it doesn't handle nesting's very well for example, and apparently IR this patch will produce in the test cases that are failing. However, the PR I have opened above uses the more robust and comprehensive piece of functionality you made/ported via the function convertUsersOfConstantsToInstructions which solves the issues encountered when this PR is landed alongside some other issues I encountered recently.

So two ways forward I can see, depending on how urgent/annoying it is for this PR to be stalled:

  1. We disable the offending tests that are breaking the buildbot until the PR that will fix them lands, which will allow you to commit this as soon as they're disabled without angering the buildbot
  2. We hold off landing this until the PR that fixes the issues in the tests lands, I am unfortunately not sure how long this will take as it's been up for a while with a lack of reviewers (having a review from you @nikic would be excellent as well for the minor additions to the functionality you added), although, I can likely chase some people up to give it a look over as it's urgent.

I don't mind which route we go down, if we opt to disable the tests, I can land a commit shortly to do so, unless either of you @nikic @jplehr would prefer to do so :-)

@nikic
Copy link
Contributor Author

nikic commented Jun 12, 2024

@agozillon Thanks for looking into this! This change isn't urgent, so I'm happy to wait for your PR to go in first.

@agozillon
Copy link
Contributor

@nikic sounds good, if that changes don't hesitate to ping me and I'll be happy to deactivate the tests for a while (or feel free to do so yourself, just ping to tell me they're off so I know to turn them on again when the PR lands!)

@agozillon
Copy link
Contributor

Just landed the PR, so it should be good to land this PR now without causing issues with the flang+openmp+offload buildbot, however, if any other issues do arise, I am happy to look into them! But from testing this PR in conjunction with the one I just landed locally, it does seem like everything should be fine.

nikic added a commit that referenced this pull request Jun 13, 2024
Reapplying without changes. The flang+openmp buildbot failure
should be addressed by #94541.

-----

This is a followup to #93823
and drops the DataLayout-unaware GEP of GEP fold entirely. All cases are
now left to the DataLayout-aware constant folder, which will fold
everything to a single i8 GEP.

We didn't have any test coverage for this fold in LLVM, but some Clang
tests change.
EthanLuisMcDonough pushed a commit to EthanLuisMcDonough/llvm-project that referenced this pull request Aug 13, 2024
Reapplying without changes. The flang+openmp buildbot failure
should be addressed by llvm#94541.

-----

This is a followup to llvm#93823
and drops the DataLayout-unaware GEP of GEP fold entirely. All cases are
now left to the DataLayout-aware constant folder, which will fold
everything to a single i8 GEP.

We didn't have any test coverage for this fold in LLVM, but some Clang
tests change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category llvm:analysis llvm:ir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants