[ConstantFold] Remove non-trivial gep-of-gep fold #93823

nikic · 2024-05-30T13:58:23Z

This fold is subtly incorrect, because DL-unaware constant folding does not know the correct index type to use, and just performs the addition in the type that happens to already be there. This is incorrect, since sext(X)+sext(Y) is generally not the same as sext(X+Y). See the @constexpr_gep_of_gep_with_narrow_type() for a miscompile with the current implementation.

One could try to restrict the fold to cases where no overflow occurs, but I'm not bothering with that here, because the DL-aware constant folding will take care of this anyway. I've only kept the straightforward zero-index case, where we just concatenate two GEPs.

llvmbot · 2024-05-30T13:58:54Z

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-ir

Author: Nikita Popov (nikic)

Changes

This fold is subtly incorrect, because DL-unaware constant folding does not know the correct index type to use, and just performs the addition in the type that happens to already be there. This is incorrect, since sext(X)+sext(Y) is generally not the same as sext(X+Y). See the @constexpr_gep_of_gep_with_narrow_type() for a miscompile with the current implementation.

One could try to restrict the fold to cases where no overflow occurs, but I'm not bothering with that here, because the DL-aware constant folding will take care of this anyway. I've only kept the straightforward zero-index case, where we just concatenate two GEPs.

Patch is 47.07 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/93823.diff

9 Files Affected:

(modified) clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp (+1-1)
(modified) clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp (+4-4)
(modified) clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp (+2-2)
(modified) clang/test/OpenMP/threadprivate_codegen.cpp (+48-48)
(modified) llvm/lib/IR/ConstantFold.cpp (+4-52)
(modified) llvm/test/Other/constant-fold-gep.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/gepgep.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/getelementptr.ll (+1-2)
(modified) llvm/test/Transforms/InstCombine/ptrtoint-nullgep.ll (+8-8)

diff --git a/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp b/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp
index 4033adc8f0390..6495fdebacf02 100644
--- a/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp
+++ b/clang/test/CodeGenCXX/2011-12-19-init-list-ctor.cpp
@@ -22,5 +22,5 @@ struct S {
 // CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr @arr, i32 0, i32 1), ptr noundef @.str)
 // CHECK: store i32 1, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 1)
 // CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr @arr, i64 1, i32 1), ptr noundef @.str.1)
-// CHECK: store i32 2, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 2)
+// CHECK: store i32 2, ptr getelementptr inbounds (%struct.S, ptr getelementptr inbounds (%struct.S, ptr @arr, i64 1), i64 1)
 // CHECK: call void @_ZN1AC1EPKc(ptr {{[^,]*}} getelementptr inbounds (%struct.S, ptr @arr, i64 2, i32 1), ptr noundef @.str.2)
diff --git a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
index 6fbe4c7fd17a7..4f4c88d9e96a5 100644
--- a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
+++ b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
@@ -83,8 +83,8 @@ std::initializer_list<std::initializer_list<int>> nested = {
 // CHECK-DYNAMIC-BL: store i32 5, ptr @_ZGR6nested2_
 // CHECK-DYNAMIC-BL: store i32 {{.*}}, ptr getelementptr inbounds (i32, ptr @_ZGR6nested2_, i64 1)
 // CHECK-DYNAMIC-BL: store ptr @_ZGR6nested2_,
-// CHECK-DYNAMIC-BL:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), align 8
-// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2, i32 1), align 8
+// CHECK-DYNAMIC-BL:       ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), i64 1), align 8
+// CHECK-DYNAMIC-BL: store i64 2, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), i64 1, i32 1), align 8
 // CHECK-DYNAMIC-BL: store ptr @_ZGR6nested_,
 // CHECK-DYNAMIC-BL:       ptr @nested, align 8
 // CHECK-DYNAMIC-BL: store i64 3, ptr getelementptr inbounds ({{.*}}, ptr @nested, i32 0, i32 1), align 8
@@ -123,9 +123,9 @@ std::initializer_list<std::initializer_list<int>> nested = {
 // CHECK-DYNAMIC-BE: store i32 5, ptr @_ZGR6nested2_
 // CHECK-DYNAMIC-BE: store i32 {{.*}}, ptr getelementptr inbounds (i32, ptr @_ZGR6nested2_, i64 1)
 // CHECK-DYNAMIC-BE: store ptr @_ZGR6nested2_,
-// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2), align 8
+// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), i64 1), align 8
 // CHECK-DYNAMIC-BE: store ptr getelementptr inbounds ([2 x i32], ptr @_ZGR6nested2_, i64 0, i64 2),
-// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 2, i32 1), align 8
+// CHECK-DYNAMIC-BE:       ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr @_ZGR6nested_, i64 1), i64 1, i32 1), align 8
 // CHECK-DYNAMIC-BE: store ptr @_ZGR6nested_,
 // CHECK-DYNAMIC-BE:       ptr @nested, align 8
 // CHECK-DYNAMIC-BE: store ptr getelementptr inbounds ([3 x {{.*}}], ptr @_ZGR6nested_, i64 0, i64 3),
diff --git a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
index aa2f078a5fb0c..ac05910151a46 100644
--- a/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
+++ b/clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
@@ -370,8 +370,8 @@ namespace partly_constant {
   //
   // Third init list.
   // CHECK-NOT: @[[PARTLY_CONSTANT_THIRD]],
-  // CHECK: store ptr {{.*}}@[[PARTLY_CONSTANT_THIRD]]{{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 2)
-  // CHECK: store i64 4, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 2, i32 1)
+  // CHECK: store ptr {{.*}}@[[PARTLY_CONSTANT_THIRD]]{{.*}}, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 1), i64 1)
+  // CHECK: store i64 4, ptr getelementptr inbounds ({{.*}}, ptr getelementptr inbounds ({{.*}}, ptr {{.*}}@[[PARTLY_CONSTANT_INNER]]{{.*}}, i64 1), i64 1, i32 1)
   // CHECK-NOT: @[[PARTLY_CONSTANT_THIRD]],
   //
   // Outer init list.
diff --git a/clang/test/OpenMP/threadprivate_codegen.cpp b/clang/test/OpenMP/threadprivate_codegen.cpp
index b27783be829d6..7094decd41041 100644
--- a/clang/test/OpenMP/threadprivate_codegen.cpp
+++ b/clang/test/OpenMP/threadprivate_codegen.cpp
@@ -1260,8 +1260,8 @@ int foobar() {
 // CHECK1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i32 noundef 2)
 // CHECK1-NEXT:            to label [[INVOKE_CONT2:%.*]] unwind label [[LPAD]]
 // CHECK1:       invoke.cont2:
-// CHECK1-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
-// CHECK1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), i32 noundef 3)
+// CHECK1-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
+// CHECK1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), i32 noundef 3)
 // CHECK1-NEXT:            to label [[INVOKE_CONT3:%.*]] unwind label [[LPAD]]
 // CHECK1:       invoke.cont3:
 // CHECK1-NEXT:    store ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), ptr [[ARRAYINIT_ENDOFINIT]], align 8
@@ -1273,8 +1273,8 @@ int foobar() {
 // CHECK1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i32 noundef 5)
 // CHECK1-NEXT:            to label [[INVOKE_CONT8:%.*]] unwind label [[LPAD6]]
 // CHECK1:       invoke.cont8:
-// CHECK1-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
-// CHECK1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), i32 noundef 6)
+// CHECK1-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
+// CHECK1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), i32 noundef 6)
 // CHECK1-NEXT:            to label [[INVOKE_CONT9:%.*]] unwind label [[LPAD6]]
 // CHECK1:       invoke.cont9:
 // CHECK1-NEXT:    [[TMP0:%.*]] = call i32 @__cxa_atexit(ptr @__cxx_global_array_dtor, ptr null, ptr @__dso_handle) #[[ATTR3]]
@@ -1783,8 +1783,8 @@ int foobar() {
 // CHECK2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i32 noundef 2)
 // CHECK2-NEXT:            to label [[INVOKE_CONT2:%.*]] unwind label [[LPAD]]
 // CHECK2:       invoke.cont2:
-// CHECK2-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
-// CHECK2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), i32 noundef 3)
+// CHECK2-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
+// CHECK2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), i32 noundef 3)
 // CHECK2-NEXT:            to label [[INVOKE_CONT3:%.*]] unwind label [[LPAD]]
 // CHECK2:       invoke.cont3:
 // CHECK2-NEXT:    store ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), ptr [[ARRAYINIT_ENDOFINIT]], align 8
@@ -1796,8 +1796,8 @@ int foobar() {
 // CHECK2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i32 noundef 5)
 // CHECK2-NEXT:            to label [[INVOKE_CONT8:%.*]] unwind label [[LPAD6]]
 // CHECK2:       invoke.cont8:
-// CHECK2-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
-// CHECK2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), i32 noundef 6)
+// CHECK2-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
+// CHECK2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), i32 noundef 6)
 // CHECK2-NEXT:            to label [[INVOKE_CONT9:%.*]] unwind label [[LPAD6]]
 // CHECK2:       invoke.cont9:
 // CHECK2-NEXT:    [[TMP0:%.*]] = call i32 @__cxa_atexit(ptr @__cxx_global_array_dtor, ptr null, ptr @__dso_handle) #[[ATTR3]]
@@ -2458,8 +2458,8 @@ int foobar() {
 // SIMD1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i32 noundef 2)
 // SIMD1-NEXT:            to label [[INVOKE_CONT2:%.*]] unwind label [[LPAD]]
 // SIMD1:       invoke.cont2:
-// SIMD1-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
-// SIMD1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), i32 noundef 3)
+// SIMD1-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
+// SIMD1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), i32 noundef 3)
 // SIMD1-NEXT:            to label [[INVOKE_CONT3:%.*]] unwind label [[LPAD]]
 // SIMD1:       invoke.cont3:
 // SIMD1-NEXT:    store ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), ptr [[ARRAYINIT_ENDOFINIT]], align 8
@@ -2471,8 +2471,8 @@ int foobar() {
 // SIMD1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i32 noundef 5)
 // SIMD1-NEXT:            to label [[INVOKE_CONT8:%.*]] unwind label [[LPAD6]]
 // SIMD1:       invoke.cont8:
-// SIMD1-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
-// SIMD1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), i32 noundef 6)
+// SIMD1-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
+// SIMD1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), i32 noundef 6)
 // SIMD1-NEXT:            to label [[INVOKE_CONT9:%.*]] unwind label [[LPAD6]]
 // SIMD1:       invoke.cont9:
 // SIMD1-NEXT:    [[TMP0:%.*]] = call i32 @__cxa_atexit(ptr @__cxx_global_array_dtor, ptr null, ptr @__dso_handle) #[[ATTR3]]
@@ -2922,8 +2922,8 @@ int foobar() {
 // SIMD2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i32 noundef 2)
 // SIMD2-NEXT:            to label [[INVOKE_CONT2:%.*]] unwind label [[LPAD]], !dbg [[DBG158:![0-9]+]]
 // SIMD2:       invoke.cont2:
-// SIMD2-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), ptr [[ARRAYINIT_ENDOFINIT1]], align 8, !dbg [[DBG156]]
-// SIMD2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), i32 noundef 3)
+// SIMD2-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT1]], align 8, !dbg [[DBG156]]
+// SIMD2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), i32 noundef 3)
 // SIMD2-NEXT:            to label [[INVOKE_CONT3:%.*]] unwind label [[LPAD]], !dbg [[DBG159:![0-9]+]]
 // SIMD2:       invoke.cont3:
 // SIMD2-NEXT:    store ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), ptr [[ARRAYINIT_ENDOFINIT]], align 8, !dbg [[DBG154]]
@@ -2935,8 +2935,8 @@ int foobar() {
 // SIMD2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i32 noundef 5)
 // SIMD2-NEXT:            to label [[INVOKE_CONT8:%.*]] unwind label [[LPAD6]], !dbg [[DBG162:![0-9]+]]
 // SIMD2:       invoke.cont8:
-// SIMD2-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), ptr [[ARRAYINIT_ENDOFINIT5]], align 8, !dbg [[DBG160]]
-// SIMD2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), i32 noundef 6)
+// SIMD2-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT5]], align 8, !dbg [[DBG160]]
+// SIMD2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), i32 noundef 6)
 // SIMD2-NEXT:            to label [[INVOKE_CONT9:%.*]] unwind label [[LPAD6]], !dbg [[DBG163:![0-9]+]]
 // SIMD2:       invoke.cont9:
 // SIMD2-NEXT:    [[TMP0:%.*]] = call i32 @__cxa_atexit(ptr @__cxx_global_array_dtor, ptr null, ptr @__dso_handle) #[[ATTR3]], !dbg [[DBG164:![0-9]+]]
@@ -3451,8 +3451,8 @@ int foobar() {
 // CHECK-TLS1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i32 noundef 2)
 // CHECK-TLS1-NEXT:            to label [[INVOKE_CONT2:%.*]] unwind label [[LPAD]]
 // CHECK-TLS1:       invoke.cont2:
-// CHECK-TLS1-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
-// CHECK-TLS1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), i32 noundef 3)
+// CHECK-TLS1-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
+// CHECK-TLS1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), i32 noundef 3)
 // CHECK-TLS1-NEXT:            to label [[INVOKE_CONT3:%.*]] unwind label [[LPAD]]
 // CHECK-TLS1:       invoke.cont3:
 // CHECK-TLS1-NEXT:    store ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), ptr [[ARRAYINIT_ENDOFINIT]], align 8
@@ -3464,8 +3464,8 @@ int foobar() {
 // CHECK-TLS1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i32 noundef 5)
 // CHECK-TLS1-NEXT:            to label [[INVOKE_CONT8:%.*]] unwind label [[LPAD6]]
 // CHECK-TLS1:       invoke.cont8:
-// CHECK-TLS1-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
-// CHECK-TLS1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 2), i32 noundef 6)
+// CHECK-TLS1-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT5]], align 8
+// CHECK-TLS1-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i64 1), i32 noundef 6)
 // CHECK-TLS1-NEXT:            to label [[INVOKE_CONT9:%.*]] unwind label [[LPAD6]]
 // CHECK-TLS1:       invoke.cont9:
 // CHECK-TLS1-NEXT:    [[TMP0:%.*]] = call i32 @__cxa_thread_atexit(ptr @__cxx_global_array_dtor, ptr null, ptr @__dso_handle) #[[ATTR3]]
@@ -4178,8 +4178,8 @@ int foobar() {
 // CHECK-TLS2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i32 noundef 2)
 // CHECK-TLS2-NEXT:            to label [[INVOKE_CONT2:%.*]] unwind label [[LPAD]]
 // CHECK-TLS2:       invoke.cont2:
-// CHECK-TLS2-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
-// CHECK-TLS2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 2), i32 noundef 3)
+// CHECK-TLS2-NEXT:    store ptr getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), ptr [[ARRAYINIT_ENDOFINIT1]], align 8
+// CHECK-TLS2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([[STRUCT_S1]], ptr @arr_x, i64 1), i64 1), i32 noundef 3)
 // CHECK-TLS2-NEXT:            to label [[INVOKE_CONT3:%.*]] unwind label [[LPAD]]
 // CHECK-TLS2:       invoke.cont3:
 // CHECK-TLS2-NEXT:    store ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), ptr [[ARRAYINIT_ENDOFINIT]], align 8
@@ -4191,8 +4191,8 @@ int foobar() {
 // CHECK-TLS2-NEXT:    invoke void @_ZN2S1C1Ei(ptr noundef nonnull align 4 dereferenceable(4) getelementptr inbounds ([[STRUCT_S1]], ptr getelementptr inbounds ([3 x %struct.S1], ptr @arr_x, i64 1), i64 1), i32 noundef 5)
 // CHECK-TLS2-NEXT:            to label [[INVOKE_CONT8:%.*]] unwind label [[LPAD6]]
 // CHECK-TLS2:       invoke.con...
[truncated]

dtcxzyw

because the DL-aware constant folding will take care of this anyway.

Can you point out where we do this fold?

I've only kept the straightforward zero-index case, where we just concatenate two GEPs.

Is there a test for this case?

PR Link: llvm/llvm-project#93823

nikic · 2024-05-30T14:18:15Z

because the DL-aware constant folding will take care of this anyway.

Can you point out where we do this fold?

llvm-project/llvm/lib/Analysis/ConstantFolding.cpp

Line 865 in 1f46729

Constant *SymbolicallyEvaluateGEP(const GEPOperator *GEP,

I've only kept the straightforward zero-index case, where we just concatenate two GEPs.

Is there a test for this case?

It looks like we don't have an LLVM test, but it affects a few clang tests:

  Clang :: CodeGenCUDA/managed-var.cu
  Clang :: CodeGenCXX/2011-12-19-init-list-ctor.cpp
  Clang :: CodeGenCXX/cxx0x-initializer-stdinitializerlist-pr12086.cpp
  Clang :: CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
  Clang :: CodeGenCXX/ms-inline-asm-fields.cpp
  Clang :: OpenMP/threadprivate_codegen.cpp

I kind of expected more clang fallout, so I'd be open to just dropping this fold from the DL-unaware folder completely, even the basic case.

dtcxzyw

LGTM.

aeubanks

this PR looks good, and further dropping the fold is also good

This fold is subtly incorrect, because DL-unaware constant folding does not know the correct index type to use, and just performs the addition in the type that happens to already be there. This is incorrect, since sext(X)+sext(Y) is generally not the same as sext(X+Y). See the `@constexpr_gep_of_gep_with_narrow_type()` for a miscompile with the current implementation. One could try to restrict the fold to cases where no overflow occurs, but I'm not bothering with that here, because the DL-aware constant folding will take care of this anyway. I've only kept the straightforward zero-index case, where we just concatenate two GEPs.

This reverts commit e1cc9e4. This causes some non-trivial text size increases in unoptimized builds for Bullet. Revert while I investigate.

This makes codegen for array initialization simpler in two ways: 1. Drop the zero-index GEP at the start, which is no longer needed with opaque pointers. 2. Emit GEPs directly to the correct element, instead of having a long chain of +1 GEPs. This is more canonical, and also avoids regressions in unoptimized builds from llvm#93823.

This makes codegen for array initialization simpler in two ways: 1. Drop the zero-index GEP at the start, which is no longer needed with opaque pointers. 2. Emit GEPs directly to the correct element, instead of having a long chain of +1 GEPs. This is more canonical, and also avoids regressions in unoptimized builds from #93823.

This fold is subtly incorrect, because DL-unaware constant folding does not know the correct index type to use, and just performs the addition in the type that happens to already be there. This is incorrect, since sext(X)+sext(Y) is generally not the same as sext(X+Y). See the `@constexpr_gep_of_gep_with_narrow_type()` for a miscompile with the current implementation. One could try to restrict the fold to cases where no overflow occurs, but I'm not bothering with that here, because the DL-aware constant folding will take care of this anyway. I've only kept the straightforward zero-index case, where we just concatenate two GEPs.

Reapply after #93956, which changed clang array initialization codegen to avoid size regressions for unoptimized builds. ----- This fold is subtly incorrect, because DL-unaware constant folding does not know the correct index type to use, and just performs the addition in the type that happens to already be there. This is incorrect, since sext(X)+sext(Y) is generally not the same as sext(X+Y). See the `@constexpr_gep_of_gep_with_narrow_type()` for a miscompile with the current implementation. One could try to restrict the fold to cases where no overflow occurs, but I'm not bothering with that here, because the DL-aware constant folding will take care of this anyway. I've only kept the straightforward zero-index case, where we just concatenate two GEPs.

This is a followup to llvm#93823 and drop the DataLayout-unaware GEP of GEP fold entirely. All cases are now left to the DataLayout-aware constant folder, which will fold everything to a single i8 GEP. We didn't have any test coverage for this fold in LLVM, but some Clang tests change.

This is a followup to #93823 and drops the DataLayout-unaware GEP of GEP fold entirely. All cases are now left to the DataLayout-aware constant folder, which will fold everything to a single i8 GEP. We didn't have any test coverage for this fold in LLVM, but some Clang tests change.

Reapplying without changes. The flang+openmp buildbot failure should be addressed by #94541. ----- This is a followup to #93823 and drops the DataLayout-unaware GEP of GEP fold entirely. All cases are now left to the DataLayout-aware constant folder, which will fold everything to a single i8 GEP. We didn't have any test coverage for this fold in LLVM, but some Clang tests change.

Reapplying without changes. The flang+openmp buildbot failure should be addressed by llvm#94541. ----- This is a followup to llvm#93823 and drops the DataLayout-unaware GEP of GEP fold entirely. All cases are now left to the DataLayout-aware constant folder, which will fold everything to a single i8 GEP. We didn't have any test coverage for this fold in LLVM, but some Clang tests change.

nikic requested review from aeubanks and dtcxzyw May 30, 2024 13:58

llvmbot added clang Clang issues not falling into any other category llvm:ir llvm:transforms clang:openmp OpenMP related changes to Clang labels May 30, 2024

dtcxzyw reviewed May 30, 2024

View reviewed changes

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request May 30, 2024

pre-commit: test PR93823

c504b3d

PR Link: llvm/llvm-project#93823

dtcxzyw mentioned this pull request May 30, 2024

pre-commit: test PR93823 dtcxzyw/llvm-opt-benchmark#648

Closed

dtcxzyw approved these changes May 30, 2024

View reviewed changes

aeubanks approved these changes May 30, 2024

View reviewed changes

nikic added 2 commits May 31, 2024 08:10

Update one more test

7299a3b

nikic force-pushed the const-fold-gep-gep branch from 1f0b92d to 7299a3b Compare May 31, 2024 06:34

nikic merged commit e1cc9e4 into llvm:main May 31, 2024
5 of 6 checks passed

nikic deleted the const-fold-gep-gep branch May 31, 2024 07:25

nikic added a commit that referenced this pull request May 31, 2024

Revert "[ConstantFold] Remove non-trivial gep-of-gep fold (#93823)"

51e459a

This reverts commit e1cc9e4. This causes some non-trivial text size increases in unoptimized builds for Bullet. Revert while I investigate.

nikic mentioned this pull request May 31, 2024

[CodeGen] Simplify codegen for array initialization #93956

Merged

nikic mentioned this pull request Jun 11, 2024

[ConstantFold] Drop gep of gep fold entirely #95126

Merged

HerrCai0907 mentioned this pull request Jun 13, 2024

tidy #95384

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ConstantFold] Remove non-trivial gep-of-gep fold #93823

[ConstantFold] Remove non-trivial gep-of-gep fold #93823

nikic commented May 30, 2024

llvmbot commented May 30, 2024 •

edited

Loading

dtcxzyw left a comment

nikic commented May 30, 2024

dtcxzyw left a comment

aeubanks left a comment

[ConstantFold] Remove non-trivial gep-of-gep fold #93823

[ConstantFold] Remove non-trivial gep-of-gep fold #93823

Conversation

nikic commented May 30, 2024

llvmbot commented May 30, 2024 • edited Loading

dtcxzyw left a comment

Choose a reason for hiding this comment

nikic commented May 30, 2024

dtcxzyw left a comment

Choose a reason for hiding this comment

aeubanks left a comment

Choose a reason for hiding this comment

llvmbot commented May 30, 2024 •

edited

Loading