Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstCombine] Extend folding of aggregate construction to cases when source aggregates are partially available #100828

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

weiguozhi
Copy link
Contributor

Function foldAggregateConstructionIntoAggregateReuse can fold
insertvalue(phi(extractvalue(src1), extractvalue(src2)))
into
phi(src1, src2)
when we can find source aggregates in all predecessors.

This patch extends it to handle following case
insertvalue(phi(extractvalue(src1), elm2))
into
phi(src1, insertvalue(elm2))
with the condition that the predecessor without source aggregate has only one successor.

…source aggregates are partially available

Function foldAggregateConstructionIntoAggregateReuse can fold
  insertvalue(phi(extractvalue(src1), extractvalue(src2)))
into
  phi(src1, src2)
when we can find source aggregates in all predecessors.

This patch extends it to handle following case
  insertvalue(phi(extractvalue(src1), elm2))
into
  phi(src1, insertvalue(elm2))
with the condition that the predecessor without source aggregate has only one
successor.
@llvmbot
Copy link
Collaborator

llvmbot commented Jul 26, 2024

@llvm/pr-subscribers-llvm-transforms

Author: None (weiguozhi)

Changes

Function foldAggregateConstructionIntoAggregateReuse can fold
insertvalue(phi(extractvalue(src1), extractvalue(src2)))
into
phi(src1, src2)
when we can find source aggregates in all predecessors.

This patch extends it to handle following case
insertvalue(phi(extractvalue(src1), elm2))
into
phi(src1, insertvalue(elm2))
with the condition that the predecessor without source aggregate has only one successor.


Full diff: https://github.com/llvm/llvm-project/pull/100828.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp (+30-3)
  • (added) llvm/test/Transforms/InstCombine/fold-aggregate-reconstruction.ll (+215)
  • (modified) llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll (+3-4)
  • (modified) llvm/test/Transforms/InstCombine/phi-aware-aggregate-reconstruction.ll (+12-14)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
index 753ed55523c84..5377bf6f033d0 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
@@ -1106,6 +1106,7 @@ Instruction *InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse(
   // from which all the elements were originally extracted from?
   // Note that we want for the map to have stable iteration order!
   SmallDenseMap<BasicBlock *, Value *, 4> SourceAggregates;
+  bool FoundSrcAgg = false;
   for (BasicBlock *Pred : Preds) {
     std::pair<decltype(SourceAggregates)::iterator, bool> IV =
         SourceAggregates.insert({Pred, nullptr});
@@ -1117,9 +1118,35 @@ Instruction *InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse(
     // aggregate produced by OrigIVI must have been originally extracted from
     // the same aggregate. Is that so? Can we find said original aggregate?
     SourceAggregate = FindCommonSourceAggregate(UseBB, Pred);
-    if (Describe(SourceAggregate) != AggregateDescription::Found)
-      return nullptr; // Give up.
-    IV.first->second = *SourceAggregate;
+    if (Describe(SourceAggregate) == AggregateDescription::Found) {
+      FoundSrcAgg = true;
+      IV.first->second = *SourceAggregate;
+    } else {
+      // If UseBB is the single successor of Pred, we can add InsertValue to
+      // Pred.
+      if (succ_size(Pred) != 1)
+        return nullptr; // Give up.
+    }
+  }
+
+  if (!FoundSrcAgg)
+    return nullptr;
+
+  // For predecessors without appropriate source aggregate, create one in the
+  // predecessor.
+  for (auto &It : SourceAggregates) {
+    if (Describe(It.second) == AggregateDescription::Found)
+      continue;
+
+    BasicBlock *Pred = It.first;
+    Builder.SetInsertPoint(Pred, Pred->getTerminator()->getIterator());
+    Value *V = PoisonValue::get(AggTy);
+    for (auto I : enumerate(AggElts)) {
+      Value *Elt = (*I.value())->DoPHITranslation(UseBB, Pred);
+      V = Builder.CreateInsertValue(V, Elt, I.index());
+    }
+
+    It.second = V;
   }
 
   // All good! Now we just need to thread the source aggregates here.
diff --git a/llvm/test/Transforms/InstCombine/fold-aggregate-reconstruction.ll b/llvm/test/Transforms/InstCombine/fold-aggregate-reconstruction.ll
new file mode 100644
index 0000000000000..a75e9f1580ab9
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/fold-aggregate-reconstruction.ll
@@ -0,0 +1,215 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=instcombine < %s | FileCheck %s
+
+declare {ptr, i64} @bar(i64)
+
+; Basic test.
+define {ptr, i64} @test1(i1 %cond1, ptr %p1, ptr %p2) {
+; CHECK-LABEL: define { ptr, i64 } @test1(
+; CHECK-SAME: i1 [[COND1:%.*]], ptr [[P1:%.*]], ptr [[P2:%.*]]) {
+; CHECK-NEXT:    br i1 [[COND1]], label %[[BBB1:.*]], label %[[BBB2:.*]]
+; CHECK:       [[BBB1]]:
+; CHECK-NEXT:    [[CALL1:%.*]] = call { ptr, i64 } @bar(i64 0)
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[BBB2]]:
+; CHECK-NEXT:    [[VAL21:%.*]] = load ptr, ptr [[P1]], align 8
+; CHECK-NEXT:    [[VAL22:%.*]] = load i64, ptr [[P2]], align 4
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { ptr, i64 } poison, ptr [[VAL21]], 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { ptr, i64 } [[TMP1]], i64 [[VAL22]], 1
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[RES_MERGED:%.*]] = phi { ptr, i64 } [ [[CALL1]], %[[BBB1]] ], [ [[TMP2]], %[[BBB2]] ]
+; CHECK-NEXT:    ret { ptr, i64 } [[RES_MERGED]]
+;
+  br i1 %cond1, label %bbb1, label %bbb2
+
+bbb1:
+  %call1 = call {ptr, i64} @bar(i64 0)
+  %val11 = extractvalue { ptr, i64 } %call1, 0
+  %val12 = extractvalue { ptr, i64 } %call1, 1
+  br label %exit
+
+bbb2:
+  %val21 = load ptr, ptr %p1
+  %val22 = load i64, ptr %p2
+  br label %exit
+
+exit:
+  %val1 = phi ptr [%val11, %bbb1], [%val21, %bbb2]
+  %val2 = phi i64 [%val12, %bbb1], [%val22, %bbb2]
+  %tmp = insertvalue { ptr, i64 } poison, ptr %val1, 0
+  %res = insertvalue { ptr, i64 } %tmp, i64 %val2, 1
+  ret {ptr, i64} %res
+}
+
+; Test with more predecessors.
+define {ptr, i64} @test2(i1 %cond1, i1 %cond2, ptr %p1, ptr %p2) {
+; CHECK-LABEL: define { ptr, i64 } @test2(
+; CHECK-SAME: i1 [[COND1:%.*]], i1 [[COND2:%.*]], ptr [[P1:%.*]], ptr [[P2:%.*]]) {
+; CHECK-NEXT:    br i1 [[COND1]], label %[[BBB1:.*]], label %[[BBB4:.*]]
+; CHECK:       [[BBB1]]:
+; CHECK-NEXT:    br i1 [[COND2]], label %[[BBB2:.*]], label %[[BBB3:.*]]
+; CHECK:       [[BBB2]]:
+; CHECK-NEXT:    [[CALL1:%.*]] = call { ptr, i64 } @bar(i64 0)
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[BBB3]]:
+; CHECK-NEXT:    [[CALL2:%.*]] = call { ptr, i64 } @bar(i64 1)
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[BBB4]]:
+; CHECK-NEXT:    [[VAL31:%.*]] = load ptr, ptr [[P1]], align 8
+; CHECK-NEXT:    [[VAL32:%.*]] = load i64, ptr [[P2]], align 4
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { ptr, i64 } poison, ptr [[VAL31]], 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { ptr, i64 } [[TMP1]], i64 [[VAL32]], 1
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[RES_MERGED:%.*]] = phi { ptr, i64 } [ [[CALL1]], %[[BBB2]] ], [ [[CALL2]], %[[BBB3]] ], [ [[TMP2]], %[[BBB4]] ]
+; CHECK-NEXT:    ret { ptr, i64 } [[RES_MERGED]]
+;
+  br i1 %cond1, label %bbb1, label %bbb4
+
+bbb1:
+  br i1 %cond2, label %bbb2, label %bbb3
+
+bbb2:
+  %call1 = call {ptr, i64} @bar(i64 0)
+  %val11 = extractvalue { ptr, i64 } %call1, 0
+  %val12 = extractvalue { ptr, i64 } %call1, 1
+  br label %exit
+
+bbb3:
+  %call2 = call {ptr, i64} @bar(i64 1)
+  %val21 = extractvalue { ptr, i64 } %call2, 0
+  %val22 = extractvalue { ptr, i64 } %call2, 1
+  br label %exit
+
+bbb4:
+  %val31 = load ptr, ptr %p1
+  %val32 = load i64, ptr %p2
+  br label %exit
+
+exit:
+  %val1 = phi ptr [%val11, %bbb2], [%val21, %bbb3], [%val31, %bbb4]
+  %val2 = phi i64 [%val12, %bbb2], [%val22, %bbb3], [%val32, %bbb4]
+  %tmp = insertvalue { ptr, i64 } poison, ptr %val1, 0
+  %res = insertvalue { ptr, i64 } %tmp, i64 %val2, 1
+  ret {ptr, i64} %res
+}
+
+; Test with multiple PHI instructions.
+define {ptr, i64} @test3(i1 %cond1, i1 %cond2, ptr %val31, i64 %val32) {
+; CHECK-LABEL: define { ptr, i64 } @test3(
+; CHECK-SAME: i1 [[COND1:%.*]], i1 [[COND2:%.*]], ptr [[VAL31:%.*]], i64 [[VAL32:%.*]]) {
+; CHECK-NEXT:    br i1 [[COND1]], label %[[BBB1:.*]], label %[[BBB4:.*]]
+; CHECK:       [[BBB1]]:
+; CHECK-NEXT:    br i1 [[COND2]], label %[[BBB2:.*]], label %[[BBB3:.*]]
+; CHECK:       [[BBB2]]:
+; CHECK-NEXT:    [[CALL1:%.*]] = call { ptr, i64 } @bar(i64 0)
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[BBB3]]:
+; CHECK-NEXT:    [[CALL2:%.*]] = call { ptr, i64 } @bar(i64 1)
+; CHECK-NEXT:    br label %[[BBB5:.*]]
+; CHECK:       [[BBB4]]:
+; CHECK-NEXT:    [[CALL3:%.*]] = call { ptr, i64 } @bar(i64 2)
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { ptr, i64 } poison, ptr [[VAL31]], 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { ptr, i64 } [[TMP1]], i64 [[VAL32]], 1
+; CHECK-NEXT:    br label %[[BBB5]]
+; CHECK:       [[BBB5]]:
+; CHECK-NEXT:    [[DOTMERGED:%.*]] = phi { ptr, i64 } [ [[CALL2]], %[[BBB3]] ], [ [[TMP2]], %[[BBB4]] ]
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[RES_MERGED:%.*]] = phi { ptr, i64 } [ [[CALL1]], %[[BBB2]] ], [ [[DOTMERGED]], %[[BBB5]] ]
+; CHECK-NEXT:    ret { ptr, i64 } [[RES_MERGED]]
+;
+  br i1 %cond1, label %bbb1, label %bbb4
+
+bbb1:
+  br i1 %cond2, label %bbb2, label %bbb3
+
+bbb2:
+  %call1 = call {ptr, i64} @bar(i64 0)
+  %val11 = extractvalue { ptr, i64 } %call1, 0
+  %val12 = extractvalue { ptr, i64 } %call1, 1
+  br label %exit
+
+bbb3:
+  %call2 = call {ptr, i64} @bar(i64 1)
+  %val21 = extractvalue { ptr, i64 } %call2, 0
+  %val22 = extractvalue { ptr, i64 } %call2, 1
+  br label %bbb5
+
+bbb4:
+  %call3 = call {ptr, i64} @bar(i64 2)
+  br label %bbb5
+
+bbb5:
+  %val41 = phi ptr [%val21, %bbb3], [%val31, %bbb4]
+  %val42 = phi i64 [%val22, %bbb3], [%val32, %bbb4]
+  br label %exit
+
+exit:
+  %val1 = phi ptr [%val11, %bbb2], [%val41, %bbb5]
+  %val2 = phi i64 [%val12, %bbb2], [%val42, %bbb5]
+  %tmp = insertvalue { ptr, i64 } poison, ptr %val1, 0
+  %res = insertvalue { ptr, i64 } %tmp, i64 %val2, 1
+  ret {ptr, i64} %res
+}
+
+; Negative test, bbb4 has multiple successors, so we don't add insertvalue to it.
+define {ptr, i64} @test4(i1 %cond1, i1 %cond2, ptr %p1, ptr %p2) {
+; CHECK-LABEL: define { ptr, i64 } @test4(
+; CHECK-SAME: i1 [[COND1:%.*]], i1 [[COND2:%.*]], ptr [[P1:%.*]], ptr [[P2:%.*]]) {
+; CHECK-NEXT:    br i1 [[COND1]], label %[[BBB1:.*]], label %[[BBB4:.*]]
+; CHECK:       [[BBB1]]:
+; CHECK-NEXT:    br i1 [[COND2]], label %[[BBB2:.*]], label %[[BBB3:.*]]
+; CHECK:       [[BBB2]]:
+; CHECK-NEXT:    [[CALL1:%.*]] = call { ptr, i64 } @bar(i64 0)
+; CHECK-NEXT:    [[VAL11:%.*]] = extractvalue { ptr, i64 } [[CALL1]], 0
+; CHECK-NEXT:    [[VAL12:%.*]] = extractvalue { ptr, i64 } [[CALL1]], 1
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[BBB3]]:
+; CHECK-NEXT:    [[CALL2:%.*]] = call { ptr, i64 } @bar(i64 1)
+; CHECK-NEXT:    [[VAL21:%.*]] = extractvalue { ptr, i64 } [[CALL2]], 0
+; CHECK-NEXT:    [[VAL22:%.*]] = extractvalue { ptr, i64 } [[CALL2]], 1
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[BBB4]]:
+; CHECK-NEXT:    [[VAL31:%.*]] = load ptr, ptr [[P1]], align 8
+; CHECK-NEXT:    [[VAL32:%.*]] = load i64, ptr [[P2]], align 4
+; CHECK-NEXT:    [[COND3_NOT:%.*]] = icmp eq i64 [[VAL32]], 0
+; CHECK-NEXT:    br i1 [[COND3_NOT]], label %[[EXIT]], label %[[BBB4]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[VAL1:%.*]] = phi ptr [ [[VAL11]], %[[BBB2]] ], [ [[VAL21]], %[[BBB3]] ], [ [[VAL31]], %[[BBB4]] ]
+; CHECK-NEXT:    [[VAL2:%.*]] = phi i64 [ [[VAL12]], %[[BBB2]] ], [ [[VAL22]], %[[BBB3]] ], [ [[VAL32]], %[[BBB4]] ]
+; CHECK-NEXT:    [[TMP:%.*]] = insertvalue { ptr, i64 } poison, ptr [[VAL1]], 0
+; CHECK-NEXT:    [[RES:%.*]] = insertvalue { ptr, i64 } [[TMP]], i64 [[VAL2]], 1
+; CHECK-NEXT:    ret { ptr, i64 } [[RES]]
+;
+  br i1 %cond1, label %bbb1, label %bbb4
+
+bbb1:
+  br i1 %cond2, label %bbb2, label %bbb3
+
+bbb2:
+  %call1 = call {ptr, i64} @bar(i64 0)
+  %val11 = extractvalue { ptr, i64 } %call1, 0
+  %val12 = extractvalue { ptr, i64 } %call1, 1
+  br label %exit
+
+bbb3:
+  %call2 = call {ptr, i64} @bar(i64 1)
+  %val21 = extractvalue { ptr, i64 } %call2, 0
+  %val22 = extractvalue { ptr, i64 } %call2, 1
+  br label %exit
+
+bbb4:
+  %val31 = load ptr, ptr %p1
+  %val32 = load i64, ptr %p2
+  %cond3 = icmp ne i64 %val32, 0
+  br i1 %cond3, label %bbb4, label %exit
+
+exit:
+  %val1 = phi ptr [%val11, %bbb2], [%val21, %bbb3], [%val31, %bbb4]
+  %val2 = phi i64 [%val12, %bbb2], [%val22, %bbb3], [%val32, %bbb4]
+  %tmp = insertvalue { ptr, i64 } poison, ptr %val1, 0
+  %res = insertvalue { ptr, i64 } %tmp, i64 %val2, 1
+  ret {ptr, i64} %res
+}
diff --git a/llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll b/llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll
index fbf58d47a32d2..41e3197e5a2f0 100644
--- a/llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll
+++ b/llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll
@@ -166,14 +166,13 @@ define %struct.half @one_elem_struct_merge(i1 %cond, %struct.half %a, half %b) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 [[COND:%.*]], label [[BB0:%.*]], label [[BB1:%.*]]
 ; CHECK:       BB0:
-; CHECK-NEXT:    [[TMP0:%.*]] = extractvalue [[STRUCT_HALF:%.*]] [[A:%.*]], 0
 ; CHECK-NEXT:    br label [[SINK:%.*]]
 ; CHECK:       BB1:
+; CHECK-NEXT:    [[TMP0:%.*]] = insertvalue [[STRUCT_HALF:%.*]] poison, half [[B:%.*]], 0
 ; CHECK-NEXT:    br label [[SINK]]
 ; CHECK:       sink:
-; CHECK-NEXT:    [[STOREMERGE:%.*]] = phi half [ [[TMP0]], [[BB0]] ], [ [[B:%.*]], [[BB1]] ]
-; CHECK-NEXT:    [[VAL1:%.*]] = insertvalue [[STRUCT_HALF]] poison, half [[STOREMERGE]], 0
-; CHECK-NEXT:    ret [[STRUCT_HALF]] [[VAL1]]
+; CHECK-NEXT:    [[VAL1_MERGED:%.*]] = phi [[STRUCT_HALF]] [ [[A:%.*]], [[BB0]] ], [ [[TMP0]], [[BB1]] ]
+; CHECK-NEXT:    ret [[STRUCT_HALF]] [[VAL1_MERGED]]
 ;
 entry:
   %alloca = alloca i64
diff --git a/llvm/test/Transforms/InstCombine/phi-aware-aggregate-reconstruction.ll b/llvm/test/Transforms/InstCombine/phi-aware-aggregate-reconstruction.ll
index 6b7ac445839d2..706ce4e02f231 100644
--- a/llvm/test/Transforms/InstCombine/phi-aware-aggregate-reconstruction.ll
+++ b/llvm/test/Transforms/InstCombine/phi-aware-aggregate-reconstruction.ll
@@ -97,28 +97,26 @@ end:
   ret { i32, i32 } %i8
 }
 
-; When coming from %left, elements are swapped
-define { i32, i32 } @negative_test2({ i32, i32 } %agg_left, { i32, i32 } %agg_right, i1 %c) {
-; CHECK-LABEL: @negative_test2(
+; When coming from %left, elements are swapped, and new insertvalue is created.
+; From right side the old aggregate can be directly reused.
+define { i32, i32 } @positive_test2({ i32, i32 } %agg_left, { i32, i32 } %agg_right, i1 %c) {
+; CHECK-LABEL: @positive_test2(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 [[C:%.*]], label [[LEFT:%.*]], label [[RIGHT:%.*]]
 ; CHECK:       left:
 ; CHECK-NEXT:    [[I2:%.*]] = extractvalue { i32, i32 } [[AGG_LEFT:%.*]], 1
 ; CHECK-NEXT:    [[I0:%.*]] = extractvalue { i32, i32 } [[AGG_LEFT]], 0
 ; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    [[TMP0:%.*]] = insertvalue { i32, i32 } poison, i32 [[I2]], 0
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { i32, i32 } [[TMP0]], i32 [[I0]], 1
 ; CHECK-NEXT:    br label [[END:%.*]]
 ; CHECK:       right:
-; CHECK-NEXT:    [[I4:%.*]] = extractvalue { i32, i32 } [[AGG_RIGHT:%.*]], 1
-; CHECK-NEXT:    [[I3:%.*]] = extractvalue { i32, i32 } [[AGG_RIGHT]], 0
 ; CHECK-NEXT:    call void @bar()
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[I5:%.*]] = phi i32 [ [[I2]], [[LEFT]] ], [ [[I3]], [[RIGHT]] ]
-; CHECK-NEXT:    [[I6:%.*]] = phi i32 [ [[I0]], [[LEFT]] ], [ [[I4]], [[RIGHT]] ]
+; CHECK-NEXT:    [[I8_MERGED:%.*]] = phi { i32, i32 } [ [[TMP1]], [[LEFT]] ], [ [[AGG_RIGHT:%.*]], [[RIGHT]] ]
 ; CHECK-NEXT:    call void @baz()
-; CHECK-NEXT:    [[I7:%.*]] = insertvalue { i32, i32 } undef, i32 [[I5]], 0
-; CHECK-NEXT:    [[I8:%.*]] = insertvalue { i32, i32 } [[I7]], i32 [[I6]], 1
-; CHECK-NEXT:    ret { i32, i32 } [[I8]]
+; CHECK-NEXT:    ret { i32, i32 } [[I8_MERGED]]
 ;
 entry:
   %i0 = extractvalue { i32, i32 } %agg_left, 0
@@ -417,14 +415,14 @@ define { i32, i32 } @test8({ i32, i32 } %agg_left, { i32, i32 } %agg_right, i1 %
 ; CHECK:       left:
 ; CHECK-NEXT:    call void @foo()
 ; CHECK-NEXT:    switch i32 [[VAL_LEFT:%.*]], label [[IMPOSSIBLE:%.*]] [
-; CHECK-NEXT:    i32 -42, label [[END:%.*]]
-; CHECK-NEXT:    i32 42, label [[END]]
+; CHECK-NEXT:      i32 -42, label [[END:%.*]]
+; CHECK-NEXT:      i32 42, label [[END]]
 ; CHECK-NEXT:    ]
 ; CHECK:       right:
 ; CHECK-NEXT:    call void @bar()
 ; CHECK-NEXT:    switch i32 [[VAL_RIGHT:%.*]], label [[IMPOSSIBLE]] [
-; CHECK-NEXT:    i32 42, label [[END]]
-; CHECK-NEXT:    i32 -42, label [[END]]
+; CHECK-NEXT:      i32 42, label [[END]]
+; CHECK-NEXT:      i32 -42, label [[END]]
 ; CHECK-NEXT:    ]
 ; CHECK:       impossible:
 ; CHECK-NEXT:    unreachable

@weiguozhi weiguozhi requested a review from LebedevRI July 26, 2024 22:52
dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Jul 27, 2024
dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Jul 30, 2024
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch causes InstCombine to hang when building https://github.com/jqlang/jq:

; timeout 5s bin/opt -passes=instcombine reduced.ll -S
define { i64, ptr } @minmax_by({ i64, ptr } %0, i1 %1, i1 %2, { i64, ptr } %3) {
  %5 = extractvalue { i64, ptr } %0, 0
  %6 = extractvalue { i64, ptr } %0, 1
  br label %7

7:                                                ; preds = %14, %4
  %.sroa.01.0 = phi i64 [ %5, %4 ], [ %.sroa.01.1, %14 ]
  %.sroa.3.0 = phi ptr [ %6, %4 ], [ %.sroa.3.1, %14 ]
  br i1 %1, label %8, label %15

8:                                                ; preds = %7
  br i1 %2, label %9, label %12

9:                                                ; preds = %8
  %10 = extractvalue { i64, ptr } %3, 0
  %11 = extractvalue { i64, ptr } %3, 1
  br label %14

12:                                               ; preds = %8
  %13 = getelementptr { i64, ptr }, ptr null, i32 0, i32 0
  br label %14

14:                                               ; preds = %12, %9
  %.sroa.01.1 = phi i64 [ %10, %9 ], [ %.sroa.01.0, %12 ]
  %.sroa.3.1 = phi ptr [ %11, %9 ], [ %.sroa.3.0, %12 ]
  br label %7

15:                                               ; preds = %7
  %.fca.0.insert = insertvalue { i64, ptr } zeroinitializer, i64 %.sroa.01.0, 0
  %.fca.1.insert = insertvalue { i64, ptr } %.fca.0.insert, ptr %.sroa.3.0, 1
  ret { i64, ptr } %.fca.1.insert
}

@weiguozhi
Copy link
Contributor Author

This patch causes InstCombine to hang when building https://github.com/jqlang/jq:

; timeout 5s bin/opt -passes=instcombine reduced.ll -S
define { i64, ptr } @minmax_by({ i64, ptr } %0, i1 %1, i1 %2, { i64, ptr } %3) {
  %5 = extractvalue { i64, ptr } %0, 0
  %6 = extractvalue { i64, ptr } %0, 1
  br label %7

7:                                                ; preds = %14, %4
  %.sroa.01.0 = phi i64 [ %5, %4 ], [ %.sroa.01.1, %14 ]
  %.sroa.3.0 = phi ptr [ %6, %4 ], [ %.sroa.3.1, %14 ]
  br i1 %1, label %8, label %15

8:                                                ; preds = %7
  br i1 %2, label %9, label %12

9:                                                ; preds = %8
  %10 = extractvalue { i64, ptr } %3, 0
  %11 = extractvalue { i64, ptr } %3, 1
  br label %14

12:                                               ; preds = %8
  %13 = getelementptr { i64, ptr }, ptr null, i32 0, i32 0
  br label %14

14:                                               ; preds = %12, %9
  %.sroa.01.1 = phi i64 [ %10, %9 ], [ %.sroa.01.0, %12 ]
  %.sroa.3.1 = phi ptr [ %11, %9 ], [ %.sroa.3.0, %12 ]
  br label %7

15:                                               ; preds = %7
  %.fca.0.insert = insertvalue { i64, ptr } zeroinitializer, i64 %.sroa.01.0, 0
  %.fca.1.insert = insertvalue { i64, ptr } %.fca.0.insert, ptr %.sroa.3.0, 1
  ret { i64, ptr } %.fca.1.insert
}

@weiguozhi weiguozhi closed this Jul 30, 2024
@weiguozhi
Copy link
Contributor Author

Thanks for the quick test.

The new insertvalue is added to a predecessor with only one successor, so it intends to reduce the number of executed insertvalue instruction. But it doesn't behave like this when a loop is involved. The loop exiting edges and loop latches don't guarantee reduced dynamic number of insertvalue instructions. We should check these cases.

@weiguozhi weiguozhi reopened this Jul 30, 2024
@weiguozhi
Copy link
Contributor Author

ping

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Aug 21, 2024
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another reproducer causing instcombine to hang:

bin/opt -passes=instcombine reduced.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

define { i64, ptr } @minmax_by({ i64, ptr } %0, i1 %1, i1 %2, { i64, ptr } %3) {
  %5 = extractvalue { i64, ptr } %0, 0
  %6 = extractvalue { i64, ptr } %0, 1
  br label %7

7:                                                ; preds = %14, %4
  %.sroa.01.0 = phi i64 [ %5, %4 ], [ %.sroa.01.1, %14 ]
  %.sroa.3.0 = phi ptr [ %6, %4 ], [ %.sroa.3.1, %14 ]
  br i1 %1, label %8, label %15

8:                                                ; preds = %7
  br i1 %2, label %9, label %12

9:                                                ; preds = %8
  %10 = extractvalue { i64, ptr } %3, 0
  %11 = extractvalue { i64, ptr } %3, 1
  br label %14

12:                                               ; preds = %8
  %13 = getelementptr { i64, ptr }, ptr null, i32 0, i32 0
  br label %14

14:                                               ; preds = %12, %9
  %.sroa.01.1 = phi i64 [ %10, %9 ], [ %.sroa.01.0, %12 ]
  %.sroa.3.1 = phi ptr [ %11, %9 ], [ %.sroa.3.0, %12 ]
  br label %7

15:                                               ; preds = %7
  %.fca.0.insert = insertvalue { i64, ptr } zeroinitializer, i64 %.sroa.01.0, 0
  %.fca.1.insert = insertvalue { i64, ptr } %.fca.0.insert, ptr %.sroa.3.0, 1
  ret { i64, ptr } %.fca.1.insert
}

BTW, I don't think this transform is beneficial since it blocks some select-related optimizations:
Before this patch:

define hidden { i64, i64 } @"_ZN4core6result19Result$LT$T$C$E$GT$7map_err17h84015741f1851455E.llvm.9397651224479784633"(i64 noundef %0, i64 %1) unnamed_addr #0 {
   %3 = insertvalue { i64, i64 } poison, i64 %0, 0
   %4 = insertvalue { i64, i64 } %3, i64 %1, 1
   ret { i64, i64 } %4
 }

After this patch:

define hidden { i64, i64 } @"_ZN4core6result19Result$LT$T$C$E$GT$7map_err17h84015741f1851455E.llvm.9397651224479784633"(i64 noundef %0, i64 %1) unnamed_addr #0 {
   %3 = icmp eq i64 %0, -9223372036854775807
   %4 = insertvalue { i64, i64 } poison, i64 %0, 0
   %5 = insertvalue { i64, i64 } %4, i64 %1, 1
   %.merged = select i1 %3, { i64, i64 } { i64 -9223372036854775807, i64 undef }, { i64, i64 } %5
   ret { i64, i64 } %.merged
 }

Can you provide some performance data to demonstrate the usefulness of this patch?

@weiguozhi
Copy link
Contributor Author

The test case is extracted from hot path of tcmalloc, so it has real world impact.

This patch uses LoopInfo, so you need to run it with

bin/opt -passes='instcombine<use-loop-info>' reduced.ll

I ran the patch on your smaller test case, and got

opt -passes='instcombine<use-loop-info>' reduced2.ll -S

define hidden { i64, i64 } @"_ZN4core6result19Result$LT$T$C$E$GT$7map_err17h84015741f1851E.llvm"(i64 noundef %0, i64 %1) unnamed_addr {
  %3 = insertvalue { i64, i64 } poison, i64 %0, 0
  %4 = insertvalue { i64, i64 } %3, i64 %1, 1
  ret { i64, i64 } %4
}

How did you run it?

@dtcxzyw
Copy link
Member

dtcxzyw commented Aug 21, 2024

The test case is extracted from hot path of tcmalloc, so it has real world impact.

This patch uses LoopInfo, so you need to run it with

bin/opt -passes='instcombine<use-loop-info>' reduced.ll

I ran the patch on your smaller test case, and got

opt -passes='instcombine<use-loop-info>' reduced2.ll -S

define hidden { i64, i64 } @"_ZN4core6result19Result$LT$T$C$E$GT$7map_err17h84015741f1851E.llvm"(i64 noundef %0, i64 %1) unnamed_addr {
  %3 = insertvalue { i64, i64 } poison, i64 %0, 0
  %4 = insertvalue { i64, i64 } %3, i64 %1, 1
  ret { i64, i64 } %4
}

How did you run it?

I run it with opt -O3. UseLoopInfo is disabled by default due to compilation time regression: ef6f235

@weiguozhi
Copy link
Contributor Author

How did you run it?

I run it with opt -O3. UseLoopInfo is disabled by default due to compilation time regression: ef6f235

With -O3 I got

opt -O3 reduced2.ll -S

; ModuleID = 'reduced2.ll'
source_filename = "reduced2.ll"

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
define hidden { i64, i64 } @"_ZN4core6result19Result$LT$T$C$E$GT$7map_err17h84015741f1851E.llvm"(i64 noundef %0, i64 %1) unnamed_addr #0 {
  %3 = insertvalue { i64, i64 } poison, i64 %0, 0
  %4 = insertvalue { i64, i64 } %3, i64 %1, 1
  ret { i64, i64 } %4
}

Thank you for the LoopInfo information. In order to make it more usable I should avoid the LoopInfo usage. I will think it more.

@weiguozhi
Copy link
Contributor Author

Now if I need to add insertvalue into predecessor, I restrict it to

1, UseBB == OrigBB, then it can not cross a loop
2, If profile is available, we can only add insertvalue into colder BB.

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Sep 10, 2024
@weiguozhi
Copy link
Contributor Author

ping

@nikic
Copy link
Contributor

nikic commented Sep 16, 2024

Some quick thoughts:

  • Looking at https://github.com/dtcxzyw/llvm-opt-benchmark/pull/1288/files, a significant number of the transforms look non-profitable. This transforms is reasonable if it actually drops extractelement + insertelement pairs, but if you just move insertelements to a predecessor while converting scalar phis into aggregate phis, I think that's a loss, because optimizations generally have troubles with aggregate SSA values. I think this happens mostly if the other phi operands are constants.
  • BFI in InstCombine -- wat? It looks like it's currently only used in SLC. Let's not use it in InstCombine proper.
  • To protect against cycles you probably want to check isBackEdge() instead, which was recently added.
  • succ_size(Pred) != 1 => This should probably explicitly check for an unconditional branch, so you don't need to deal with the weird edge cases? (Like say, a single-successor callbr with aggregate return).

BFI in InstCombine has limited usage, it's usually not available, so
remove it. Also avoid constructing constant aggregate.
@weiguozhi
Copy link
Contributor Author

  • Looking at https://github.com/dtcxzyw/llvm-opt-benchmark/pull/1288/files, a significant number of the transforms look non-profitable. This transforms is reasonable if it actually drops extractelement + insertelement pairs, but if you just move insertelements to a predecessor while converting scalar phis into aggregate phis, I think that's a loss, because optimizations generally have troubles with aggregate SSA values. I think this happens mostly if the other phi operands are constants.

The original code removes fully redundant insertvalue, this patch enhances it to remove partial redundant insertvalue, so it actually drop redundant insertvalue. Avoiding constant aggregate sounds reasonable, I added code to check it.

  • BFI in InstCombine -- wat? It looks like it's currently only used in SLC. Let's not use it in InstCombine proper.

Deleted.

  • To protect against cycles you probably want to check isBackEdge() instead, which was recently added.

Thanks for the info. Currently I restrict it to UseBB == OrigBB && succ_size(Pred) == 1, it can already avoid cycles.

  • succ_size(Pred) != 1 => This should probably explicitly check for an unconditional branch, so you don't need to deal with the weird edge cases? (Like say, a single-successor callbr with aggregate return).

Changed it to checking for unconditional branch.

@weiguozhi
Copy link
Contributor Author

@dtcxzyw could you help to explain these numbers? Execution time? File size? Number of llvm IR instructions?

Top 5 improvements:
  minetest/s_client.cpp.ll 300581269 294367598 -2.07%
  abseil-cpp/kernel_timeout_test.cc.ll 1973629781 1937840415 -1.81%
  cmake/cmFindBase.cxx.ll 1913629020 1891008778 -1.18%
  assimp/IFCReaderGen2_2x3.cpp.ll 3046110312 3017099134 -0.95%
  darktable/lut3dgmic.cpp.ll 236183674 234289794 -0.80%
Top 5 regressions:
  minetest/test_connection.cpp.ll 682795465 698591227 +2.31%
  cmake/cmQtAutoMocUic.cxx.ll 8264379610 8383878253 +1.45%
  tokio-rs/132n8ebvfaa2dg8b.ll 82150296 82583502 +0.53%
  tokio-rs/40sgumesnmyyffj3.ll 67652318 67979824 +0.48%
  spike/f128_rem.ll 123368228 123914940 +0.44%

Overall: 0.01762297%

I can guess out the numbers in other two parts, the llvm stats result and the changed lines in diff result.

@dtcxzyw
Copy link
Member

dtcxzyw commented Sep 21, 2024

@dtcxzyw could you help to explain these numbers? Execution time? File size? Number of llvm IR instructions?

Top 5 improvements:
  minetest/s_client.cpp.ll 300581269 294367598 -2.07%
  abseil-cpp/kernel_timeout_test.cc.ll 1973629781 1937840415 -1.81%
  cmake/cmFindBase.cxx.ll 1913629020 1891008778 -1.18%
  assimp/IFCReaderGen2_2x3.cpp.ll 3046110312 3017099134 -0.95%
  darktable/lut3dgmic.cpp.ll 236183674 234289794 -0.80%
Top 5 regressions:
  minetest/test_connection.cpp.ll 682795465 698591227 +2.31%
  cmake/cmQtAutoMocUic.cxx.ll 8264379610 8383878253 +1.45%
  tokio-rs/132n8ebvfaa2dg8b.ll 82150296 82583502 +0.53%
  tokio-rs/40sgumesnmyyffj3.ll 67652318 67979824 +0.48%
  spike/f128_rem.ll 123368228 123914940 +0.44%

Overall: 0.01762297%

I can guess out the numbers in other two parts, the llvm stats result and the changed lines in diff result.

It is the number of instructions executed by opt -O3.

@weiguozhi
Copy link
Contributor Author

It is the number of instructions executed by opt -O3.

So it's the number of instructions executed by opt, not the number of instructions executed by the generated code. Am I right?

@arsenm
Copy link
Contributor

arsenm commented Sep 21, 2024

It is the number of instructions executed by opt -O3.

So it's the number of instructions executed by opt, not the number of instructions executed by the generated code. Am I right?

Yes

@weiguozhi
Copy link
Contributor Author

It is the number of instructions executed by opt -O3.

So it's the number of instructions executed by opt, not the number of instructions executed by the generated code. Am I right?

Yes

Thanks for the confirmation. So it seems compile time impact is neutral.

I can only see two files from dtcxzyw/llvm-opt-benchmark@78a3cec. Both files reduced the dynamic number of insertvalue instructions, so it has positive performance impact.

@weiguozhi
Copy link
Contributor Author

ping

// If UseBB is the single successor of Pred, we can add InsertValue to
// Pred.
auto *BI = dyn_cast<BranchInst>(Pred->getTerminator());
if (!(BI && BI->isUnconditional()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!(BI && BI->isUnconditional()))
if (!BI || !BI->isUnconditional())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants