[LoopUnroll] Add flag to disable profile usage #102950

ellishg · 2024-08-12T18:36:28Z

Create the -loop-unroll-use-branch-weights LLVM flag to enable/disable the usage of getLoopEstimatedTripCount() which uses branch weights to determine the peel count.

llvm-project/llvm/include/llvm/Transforms/Utils/LoopUtils.h

Lines 318 to 324 in dc21cb5

    
           /// Returns a loop's estimated trip count based on branch weight metadata. 
        
           /// In addition if \p EstimatedLoopInvocationWeight is not null it is 
        
           /// initialized with weight of loop's latch leading to the exit. 
        
           /// Returns 0 when the count is estimated to be 0, or std::nullopt when a 
        
           /// meaningful estimate can not be made. 
        
           std::optional<unsigned> 
        
           getLoopEstimatedTripCount(Loop *L,

When building with -Oz, consuming profiles can drastically increase binary size. We found -loop-unroll-use-branch-weights=false can give us a 1.6% text size win when profiles are used, which mitigates some of this regression.

llvmbot · 2024-08-12T18:37:00Z

@llvm/pr-subscribers-llvm-transforms

Author: Ellis Hoag (ellishg)

Changes

Create the -loop-unroll-use-branch-weights LLVM flag to enable/disable the usage of getLoopEstimatedTripCount() which uses branch weights to determine the peel count.

llvm-project/llvm/include/llvm/Transforms/Utils/LoopUtils.h

Lines 318 to 324 in dc21cb5

    
           /// Returns a loop's estimated trip count based on branch weight metadata. 
        
           /// In addition if \p EstimatedLoopInvocationWeight is not null it is 
        
           /// initialized with weight of loop's latch leading to the exit. 
        
           /// Returns 0 when the count is estimated to be 0, or std::nullopt when a 
        
           /// meaningful estimate can not be made. 
        
           std::optional<unsigned> 
        
           getLoopEstimatedTripCount(Loop *L,

When building with -Oz, consuming profiles can drastically increase binary size. We found -loop-unroll-use-branch-weights=false can give us a 1.6% text size win when profiles are used, which mitigates some of this regression.

Full diff: https://github.com/llvm/llvm-project/pull/102950.diff

4 Files Affected:

(modified) llvm/include/llvm/Transforms/Utils/LoopPeel.h (+2-1)
(modified) llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp (+9-2)
(modified) llvm/lib/Transforms/Utils/LoopPeel.cpp (+3-3)
(modified) llvm/test/Transforms/LoopUnroll/peel-loop-conditions-pgo-1.ll (+4-3)

diff --git a/llvm/include/llvm/Transforms/Utils/LoopPeel.h b/llvm/include/llvm/Transforms/Utils/LoopPeel.h
index 0b78700ca71bb9..987c21b7ca5610 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopPeel.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopPeel.h
@@ -37,7 +37,8 @@ gatherPeelingPreferences(Loop *L, ScalarEvolution &SE,
 void computePeelCount(Loop *L, unsigned LoopSize,
                       TargetTransformInfo::PeelingPreferences &PP,
                       unsigned TripCount, DominatorTree &DT,
-                      ScalarEvolution &SE, AssumptionCache *AC = nullptr,
+                      ScalarEvolution &SE, bool UseBranchWeights,
+                      AssumptionCache *AC = nullptr,
                       unsigned Threshold = UINT_MAX);
 
 } // end namespace llvm
diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
index cbc35b6dd4292a..0a446851acf2d3 100644
--- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
@@ -179,6 +179,12 @@ static cl::opt<unsigned> PragmaUnrollFullMaxIterations(
     "pragma-unroll-full-max-iterations", cl::init(1'000'000), cl::Hidden,
     cl::desc("Maximum allowed iterations to unroll under pragma unroll full."));
 
+static cl::opt<bool>
+    UseBranchWeights("loop-unroll-use-branch-weights", cl::init(true),
+                     cl::Hidden,
+                     cl::desc("Estimate loop trip counts with branch weight "
+                              "metadata to help determine the peel count"));
+
 /// A magic value for use with the Threshold parameter to indicate
 /// that the loop unroll should be performed regardless of how much
 /// code expansion would result.
@@ -1012,7 +1018,8 @@ bool llvm::computeUnrollCount(
   }
 
   // 5th priority is loop peeling.
-  computePeelCount(L, LoopSize, PP, TripCount, DT, SE, AC, UP.Threshold);
+  computePeelCount(L, LoopSize, PP, TripCount, DT, SE, UseBranchWeights, AC,
+                   UP.Threshold);
   if (PP.PeelCount) {
     UP.Runtime = false;
     UP.Count = 1;
@@ -1081,7 +1088,7 @@ bool llvm::computeUnrollCount(
   }
 
   // Check if the runtime trip count is too small when profile is available.
-  if (L->getHeader()->getParent()->hasProfileData()) {
+  if (UseBranchWeights && L->getHeader()->getParent()->hasProfileData()) {
     if (auto ProfileTripCount = getLoopEstimatedTripCount(L)) {
       if (*ProfileTripCount < FlatLoopTripCountThreshold)
         return false;
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index 5d7c0d947facc4..9557d31a122a63 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -538,8 +538,8 @@ static bool violatesLegacyMultiExitLoopCheck(Loop *L) {
 void llvm::computePeelCount(Loop *L, unsigned LoopSize,
                             TargetTransformInfo::PeelingPreferences &PP,
                             unsigned TripCount, DominatorTree &DT,
-                            ScalarEvolution &SE, AssumptionCache *AC,
-                            unsigned Threshold) {
+                            ScalarEvolution &SE, bool UseBranchWeights,
+                            AssumptionCache *AC, unsigned Threshold) {
   assert(LoopSize > 0 && "Zero loop size is not allowed!");
   // Save the PP.PeelCount value set by the target in
   // TTI.getPeelingPreferences or by the flag -unroll-peel-count.
@@ -632,7 +632,7 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize,
   // hit the peeled section.
   // We only do this in the presence of profile information, since otherwise
   // our estimates of the trip count are not reliable enough.
-  if (L->getHeader()->getParent()->hasProfileData()) {
+  if (UseBranchWeights && L->getHeader()->getParent()->hasProfileData()) {
     if (violatesLegacyMultiExitLoopCheck(L))
       return;
     std::optional<unsigned> EstimatedTripCount = getLoopEstimatedTripCount(L);
diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-conditions-pgo-1.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-conditions-pgo-1.ll
index e3cfe53950f572..c7fb389c635957 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-conditions-pgo-1.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-conditions-pgo-1.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt < %s -S -passes=loop-unroll,loop-unroll -verify-dom-info -debug-only=loop-unroll -unroll-peel-max-count=7 2>&1 | FileCheck %s
+; RUN: opt < %s -S -passes=loop-unroll,loop-unroll -verify-dom-info -debug-only=loop-unroll -unroll-peel-max-count=7 2>&1 | FileCheck %s --check-prefixes=CHECK,PGO
+; RUN: opt < %s -S -passes=loop-unroll,loop-unroll -verify-dom-info -debug-only=loop-unroll -unroll-peel-max-count=7 -loop-unroll-use-branch-weights=false 2>&1 | FileCheck %s
 ; REQUIRES: asserts
 
 declare void @f1()
@@ -11,8 +12,8 @@ declare void @f2()
 define void @test1(i32 %k) !prof !4 {
 ; CHECK: Loop Unroll: F[test1] Loop %for.body
 ; CHECK: PEELING loop %for.body with iteration count 2!
-; CHECK: PEELING loop %for.body with iteration count 5!
-; CHECK: llvm.loop.unroll.disable
+; PGO: PEELING loop %for.body with iteration count 5!
+; PGO: llvm.loop.unroll.disable
 for.body.lr.ph:
   br label %for.body

nikic

This isn't the correct way to fix the stated problem. This particular peeling form shouldn't be performed when optimizing for size at all. It should pretty much always be bad for code size.

ellishg · 2024-08-30T19:52:00Z

Thanks @nikic, using -fno-unroll-loops did indeed give us a size win and I think that's the direction we will go in. However, I still see value in this PR. The -loop-unroll-use-branch-weights=false flag would have been hugely helpful for me to triage the size regression I saw when consuming profiles. This also allows users to consume profiles without changing the behavior of this pass.

In fact, I believe that all optimizations passes that use profile data should have the option to disable profile data usage. Many passes already have flags like this (although the flag names are widely inconsistent and difficult to find). I also added a similar flag in #102956.

[LoopUnroll] Add flag to disable PGO usage

9a0e480

ellishg requested review from LebedevRI, arsenm and modiking August 12, 2024 18:36

llvmbot added the llvm:transforms label Aug 12, 2024

ellishg changed the title ~~[LoopUnroll] Add flag to disable PGO usage~~ [LoopUnroll] Add flag to disable profile usage Aug 12, 2024

nikic requested changes Aug 12, 2024

View reviewed changes

ellishg mentioned this pull request Aug 30, 2024

[BlockPlacement] Add flag to disable profile usage #102956

Closed

ellishg closed this Oct 2, 2024

ellishg deleted the loop-peel-pgo branch October 2, 2024 23:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LoopUnroll] Add flag to disable profile usage #102950

[LoopUnroll] Add flag to disable profile usage #102950

ellishg commented Aug 12, 2024

llvmbot commented Aug 12, 2024

nikic left a comment

ellishg commented Aug 30, 2024

	/// Returns a loop's estimated trip count based on branch weight metadata.
	/// In addition if \p EstimatedLoopInvocationWeight is not null it is
	/// initialized with weight of loop's latch leading to the exit.
	/// Returns 0 when the count is estimated to be 0, or std::nullopt when a
	/// meaningful estimate can not be made.
	std::optional<unsigned>
	getLoopEstimatedTripCount(Loop *L,

[LoopUnroll] Add flag to disable profile usage #102950

[LoopUnroll] Add flag to disable profile usage #102950

Conversation

ellishg commented Aug 12, 2024

llvmbot commented Aug 12, 2024

nikic left a comment

Choose a reason for hiding this comment

ellishg commented Aug 30, 2024