[SCEV] Fix NSW flag propagation in getGEPExpr, getMulExpr, and getAddExpr #155145
Conversation
Previously, LAA would incorrectly classify Write-After-Write (WAW) dependencies with negative distances as safe Forward dependencies, allowing inappropriate vectorization of loops with bidirectional WAW dependencies. The issue occurred in loops like:

  for (int i = 0; i < n; ++i) {
    A[(i+1)*4] = 10;  // First store
    A[i] = 100;       // Second store
  }

The dependence distance from the first store to the second store is negative: {-16,+,-12}. However, this represents a bidirectional WAW dependency that DependenceAnalysis would report as 'output [<>]!', indicating a dependency in both directions.

This patch fixes LAA to properly detect WAW dependencies with negative distances as unsafe, preventing incorrect vectorization. The fix adds a check specifically for Write-After-Write dependencies before the general negative-distance handling, ensuring they are classified as Unknown (unsafe) rather than Forward (safe).

A proper fix, not implemented here because it would require a major rework of LAA, is to implement bidirectional dependence checking that computes distances in both directions and detects inconsistent direction vectors.
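To make the aliasing concrete, here is a minimal standalone sketch of the same loop, annotated with the iterations at which the two stores collide. It assumes A is an int array (which matches the {-16,+,-12} byte distance) and is illustrative only, not part of the patch:

```c
// Iteration i performs two stores: A[4*(i+1)] and A[i].
// They collide across iterations: iteration 0 writes A[4] with the
// first store, and iteration 4 writes A[4] again with the second
// store (in general, the first store at iteration i aliases the
// second store at iteration 4*i + 4). The gap between the colliding
// iterations grows with i, so there is no single constant dependence
// distance, and the PR argues this must not be classified as a safe
// Forward dependence.
void waw_example(int *A, int n) {
  for (int i = 0; i < n; ++i) {
    A[(i + 1) * 4] = 10;  // first store
    A[i] = 100;           // second store
  }
}
```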
SCEV was losing NSW flags during AddRec operations, causing Dependence Analysis to add unnecessary runtime assumptions for inbounds GEPs. This patch fixes getAddExpr: when combining AddRecs from the same loop, preserve compatible NSW/NUW flags from input AddRecs instead of always using FlagAnyWrap.
SCEV was losing NSW flags during AddRec operations, causing Dependence Analysis to add unnecessary runtime assumptions for inbounds GEPs. This patch fixes getMulExpr: when multiplying AddRecs by constants, preserve NSW flags that were explicitly requested from inbounds GEP operations. The overflow check was too conservative and cleared flags even when they were explicitly requested via OrigFlags.
SCEV was losing NSW flags during AddRec operations, causing Dependence Analysis to add unnecessary runtime assumptions for inbounds GEPs. This patch fixes getGEPExpr: inherit flags from index expressions when the GEP has no explicit flags, allowing NSW flags from AddRec indices to propagate to the final GEP result. This eliminates spurious runtime assumptions in DA for expressions like {0,+,(4 * %N * %M)} derived from inbounds GEPs, enabling proper dependence analysis without conservative runtime checks.
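For reference, a hypothetical C source that gives rise to an access function of the form {0,+,(4 * %N * %M)} for an inbounds GEP looks roughly like the following; the function and variable names are illustrative and not taken from the patch:

```c
// Each iteration of the loop advances the store address by N * M
// elements (4 * N * M bytes for a 4-byte element type), so the access
// function for A is a single AddRec with step 4 * N * M.
void plane_store(float *A, long N, long M, long P) {
  for (long i = 0; i < P; ++i)
    A[i * N * M] = 1.0f;  // inbounds GEP with offset {0,+,(4*N*M)}
}
```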
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms
Author: Sebastian Pop (sebpop)
Changes
SCEV was losing NSW flags during AddRec operations, causing Dependence Analysis to add unnecessary runtime assumptions. There are 4 patches to be reviewed together and committed separately.
The other patches are independent of each other.
Patch is 165.65 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155145.diff
42 Files Affected:
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index bceddd0325276..e522d1392f7f9 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -2196,6 +2196,21 @@ MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,
return Dependence::Unknown;
}
+ // For WAW (Write-After-Write) dependencies, negative distances in one
+ // direction can still represent unsafe dependencies. Since we only check
+ // dependencies in program order (AIdx < BIdx), a negative distance means
+ // the later write accesses memory locations before the earlier write.
+ // However, in a vectorized loop, both writes could execute simultaneously,
+ // potentially causing incorrect behavior. Therefore, WAW with negative
+ // distances should be treated as unsafe.
+ bool IsWriteAfterWrite = (AIsWrite && BIsWrite);
+ if (IsWriteAfterWrite) {
+ LLVM_DEBUG(
+ dbgs() << "LAA: WAW dependence with negative distance is unsafe\n");
+ return CheckCompletelyBeforeOrAfter() ? Dependence::NoDep
+ : Dependence::Unknown;
+ }
+
bool IsTrueDataDependence = (AIsWrite && !BIsWrite);
// Check if the first access writes to a location that is read in a later
// iteration, where the distance between them is not a multiple of a vector
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index d2c445f1ffaa0..5763ef0d1fefa 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -2951,25 +2951,52 @@ const SCEV *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops,
if (AddRecLoop == cast<SCEVAddRecExpr>(Ops[OtherIdx])->getLoop()) {
// Other + {A,+,B}<L> + {C,+,D}<L> --> Other + {A+C,+,B+D}<L>
SmallVector<const SCEV *, 4> AddRecOps(AddRec->operands());
+
+ // Track flags: start with the flags from the first AddRec.
+ bool AllHaveNSW = AddRec->hasNoSignedWrap();
+ bool AllHaveNUW = AddRec->hasNoUnsignedWrap();
+
for (; OtherIdx != Ops.size() && isa<SCEVAddRecExpr>(Ops[OtherIdx]);
++OtherIdx) {
const auto *OtherAddRec = cast<SCEVAddRecExpr>(Ops[OtherIdx]);
if (OtherAddRec->getLoop() == AddRecLoop) {
+ // Update flags based on this AddRec
+ if (!OtherAddRec->hasNoSignedWrap())
+ AllHaveNSW = false;
+ if (!OtherAddRec->hasNoUnsignedWrap())
+ AllHaveNUW = false;
for (unsigned i = 0, e = OtherAddRec->getNumOperands();
i != e; ++i) {
if (i >= AddRecOps.size()) {
append_range(AddRecOps, OtherAddRec->operands().drop_front(i));
break;
}
+ // Preserve no-wrap flags when combining AddRec operands.
+ SCEV::NoWrapFlags CombineFlags = SCEV::FlagAnyWrap;
+ if (auto *AR1 = dyn_cast<SCEVAddRecExpr>(AddRecOps[i]))
+ if (auto *AR2 =
+ dyn_cast<SCEVAddRecExpr>(OtherAddRec->getOperand(i))) {
+ if (AR1->hasNoSignedWrap() && AR2->hasNoSignedWrap())
+ CombineFlags = setFlags(CombineFlags, SCEV::FlagNSW);
+ if (AR1->hasNoUnsignedWrap() && AR2->hasNoUnsignedWrap())
+ CombineFlags = setFlags(CombineFlags, SCEV::FlagNUW);
+ }
SmallVector<const SCEV *, 2> TwoOps = {
AddRecOps[i], OtherAddRec->getOperand(i)};
- AddRecOps[i] = getAddExpr(TwoOps, SCEV::FlagAnyWrap, Depth + 1);
+ AddRecOps[i] = getAddExpr(TwoOps, CombineFlags, Depth + 1);
}
Ops.erase(Ops.begin() + OtherIdx); --OtherIdx;
}
}
// Step size has changed, so we cannot guarantee no self-wraparound.
- Ops[Idx] = getAddRecExpr(AddRecOps, AddRecLoop, SCEV::FlagAnyWrap);
+ // However, preserve NSW/NUW flags if all combined AddRecs had them.
+ SCEV::NoWrapFlags FinalFlags = SCEV::FlagAnyWrap;
+ if (AllHaveNSW)
+ FinalFlags = setFlags(FinalFlags, SCEV::FlagNSW);
+ if (AllHaveNUW)
+ FinalFlags = setFlags(FinalFlags, SCEV::FlagNUW);
+
+ Ops[Idx] = getAddRecExpr(AddRecOps, AddRecLoop, FinalFlags);
return getAddExpr(Ops, SCEV::FlagAnyWrap, Depth + 1);
}
}
@@ -3252,7 +3279,7 @@ const SCEV *ScalarEvolution::getMulExpr(SmallVectorImpl<const SCEV *> &Ops,
// NLI * LI * {Start,+,Step} --> NLI * {LI*Start,+,LI*Step}
SmallVector<const SCEV *, 4> NewOps;
NewOps.reserve(AddRec->getNumOperands());
- const SCEV *Scale = getMulExpr(LIOps, SCEV::FlagAnyWrap, Depth + 1);
+ const SCEV *Scale = getMulExpr(LIOps, OrigFlags, Depth + 1);
// If both the mul and addrec are nuw, we can preserve nuw.
// If both the mul and addrec are nsw, we can only preserve nsw if either
@@ -3261,6 +3288,15 @@ const SCEV *ScalarEvolution::getMulExpr(SmallVectorImpl<const SCEV *> &Ops,
SCEV::NoWrapFlags Flags =
AddRec->getNoWrapFlags(ComputeFlags({Scale, AddRec}));
+ // Preserve flags for positive constant Scale.
+ if (auto *SC = dyn_cast<SCEVConstant>(Scale))
+ if (SC->getAPInt().isStrictlyPositive()) {
+ if (hasFlags(OrigFlags, SCEV::FlagNSW))
+ Flags = setFlags(Flags, SCEV::FlagNSW);
+ if (hasFlags(OrigFlags, SCEV::FlagNUW))
+ Flags = setFlags(Flags, SCEV::FlagNUW);
+ }
+
for (unsigned i = 0, e = AddRec->getNumOperands(); i != e; ++i) {
NewOps.push_back(getMulExpr(Scale, AddRec->getOperand(i),
SCEV::FlagAnyWrap, Depth + 1));
@@ -3270,7 +3306,9 @@ const SCEV *ScalarEvolution::getMulExpr(SmallVectorImpl<const SCEV *> &Ops,
Instruction::Mul, getSignedRange(Scale),
OverflowingBinaryOperator::NoSignedWrap);
if (!NSWRegion.contains(getSignedRange(AddRec->getOperand(i))))
- Flags = clearFlags(Flags, SCEV::FlagNSW);
+ if (!hasFlags(OrigFlags, SCEV::FlagNSW))
+ Flags = clearFlags(Flags, SCEV::FlagNSW);
+ // Keep NSW flag if it was in OrigFlags.
}
}
@@ -3760,6 +3798,31 @@ ScalarEvolution::getGEPExpr(GEPOperator *GEP,
if (NW.hasNoUnsignedWrap())
OffsetWrap = setFlags(OffsetWrap, SCEV::FlagNUW);
+ // Inherit flags from index expressions when GEP has no explicit flags.
+ if (OffsetWrap == SCEV::FlagAnyWrap) {
+ // Check if all index expressions have compatible no-wrap flags
+ bool AllHaveNSW = true, AllHaveNUW = true;
+ for (const SCEV *IndexExpr : IndexExprs) {
+ if (auto *AR = dyn_cast<SCEVAddRecExpr>(IndexExpr)) {
+ if (!AR->hasNoSignedWrap())
+ AllHaveNSW = false;
+ if (!AR->hasNoUnsignedWrap())
+ AllHaveNUW = false;
+ } else {
+ // Be conservative for non-AddRec expressions.
+ AllHaveNSW = false;
+ AllHaveNUW = false;
+ break;
+ }
+ }
+ // Inherit NSW if all have NSW.
+ if (AllHaveNSW)
+ OffsetWrap = setFlags(OffsetWrap, SCEV::FlagNSW);
+ // Inherit NUW if all have NUW.
+ if (AllHaveNUW)
+ OffsetWrap = setFlags(OffsetWrap, SCEV::FlagNUW);
+ }
+
Type *CurTy = GEP->getType();
bool FirstIter = true;
SmallVector<const SCEV *, 4> Offsets;
diff --git a/llvm/test/Analysis/Delinearization/a.ll b/llvm/test/Analysis/Delinearization/a.ll
index 755c9baef9b8f..b2eeede192b1b 100644
--- a/llvm/test/Analysis/Delinearization/a.ll
+++ b/llvm/test/Analysis/Delinearization/a.ll
@@ -12,10 +12,10 @@ define void @foo(i64 %n, i64 %m, i64 %o, ptr nocapture %A) #0 {
; CHECK-LABEL: 'foo'
; CHECK-NEXT: Inst: store i32 1, ptr %arrayidx11.us.us, align 4
; CHECK-NEXT: In Loop with Header: for.k
-; CHECK-NEXT: AccessFunction: {{\{\{\{}}(28 + (4 * (-4 + (3 * %m)) * %o)),+,(8 * %m * %o)}<%for.i>,+,(12 * %o)}<%for.j>,+,20}<%for.k>
+; CHECK-NEXT: AccessFunction: {{\{\{\{}}(28 + (4 * (-4 + (3 * %m)) * %o)),+,(8 * %m * %o)}<%for.i>,+,(12 * %o)}<%for.j>,+,20}<nsw><%for.k>
; CHECK-NEXT: Base offset: %A
; CHECK-NEXT: ArrayDecl[UnknownSize][%m][%o] with elements of 4 bytes.
-; CHECK-NEXT: ArrayRef[{3,+,2}<nuw><%for.i>][{-4,+,3}<nw><%for.j>][{7,+,5}<nw><%for.k>]
+; CHECK-NEXT: ArrayRef[{3,+,2}<nuw><%for.i>][{-4,+,3}<nw><%for.j>][{7,+,5}<nuw><nsw><%for.k>]
;
entry:
%cmp32 = icmp sgt i64 %n, 0
diff --git a/llvm/test/Analysis/Delinearization/constant_functions_multi_dim.ll b/llvm/test/Analysis/Delinearization/constant_functions_multi_dim.ll
index c0b1a0b9cddaf..1c6be82ce5068 100644
--- a/llvm/test/Analysis/Delinearization/constant_functions_multi_dim.ll
+++ b/llvm/test/Analysis/Delinearization/constant_functions_multi_dim.ll
@@ -8,14 +8,14 @@ define void @mat_mul(ptr %C, ptr %A, ptr %B, i64 %N) #0 !kernel_arg_addr_space !
; CHECK-LABEL: 'mat_mul'
; CHECK-NEXT: Inst: %tmp = load float, ptr %arrayidx, align 4
; CHECK-NEXT: In Loop with Header: for.inc
-; CHECK-NEXT: AccessFunction: {(4 * %N * %call),+,4}<%for.inc>
+; CHECK-NEXT: AccessFunction: {(4 * %N * %call),+,4}<nsw><%for.inc>
; CHECK-NEXT: Base offset: %A
; CHECK-NEXT: ArrayDecl[UnknownSize][%N] with elements of 4 bytes.
; CHECK-NEXT: ArrayRef[%call][{0,+,1}<nuw><nsw><%for.inc>]
; CHECK-EMPTY:
; CHECK-NEXT: Inst: %tmp5 = load float, ptr %arrayidx4, align 4
; CHECK-NEXT: In Loop with Header: for.inc
-; CHECK-NEXT: AccessFunction: {(4 * %call1),+,(4 * %N)}<%for.inc>
+; CHECK-NEXT: AccessFunction: {(4 * %call1),+,(4 * %N)}<nsw><%for.inc>
; CHECK-NEXT: Base offset: %B
; CHECK-NEXT: ArrayDecl[UnknownSize][%N] with elements of 4 bytes.
; CHECK-NEXT: ArrayRef[{0,+,1}<nuw><nsw><%for.inc>][%call1]
diff --git a/llvm/test/Analysis/Delinearization/fixed_size_array.ll b/llvm/test/Analysis/Delinearization/fixed_size_array.ll
index 634850bb4a5a2..c7f35906037eb 100644
--- a/llvm/test/Analysis/Delinearization/fixed_size_array.ll
+++ b/llvm/test/Analysis/Delinearization/fixed_size_array.ll
@@ -12,7 +12,7 @@ define void @a_i_j_k(ptr %a) {
; CHECK-LABEL: 'a_i_j_k'
; CHECK-NEXT: Inst: store i32 1, ptr %idx, align 4
; CHECK-NEXT: In Loop with Header: for.k
-; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,1024}<nuw><nsw><%for.i.header>,+,128}<nw><%for.j.header>,+,4}<nw><%for.k>
+; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,1024}<nuw><nsw><%for.i.header>,+,128}<nuw><nsw><%for.j.header>,+,4}<nuw><nsw><%for.k>
; CHECK-NEXT: Base offset: %a
; CHECK-NEXT: ArrayDecl[UnknownSize][8][32] with elements of 4 bytes.
; CHECK-NEXT: ArrayRef[{0,+,1}<nuw><nsw><%for.i.header>][{0,+,1}<nuw><nsw><%for.j.header>][{0,+,1}<nuw><nsw><%for.k>]
@@ -61,7 +61,7 @@ define void @a_i_nj_k(ptr %a) {
; CHECK-LABEL: 'a_i_nj_k'
; CHECK-NEXT: Inst: store i32 1, ptr %idx, align 4
; CHECK-NEXT: In Loop with Header: for.k
-; CHECK-NEXT: AccessFunction: {{\{\{\{}}896,+,1024}<nuw><nsw><%for.i.header>,+,-128}<nw><%for.j.header>,+,4}<nw><%for.k>
+; CHECK-NEXT: AccessFunction: {{\{\{\{}}896,+,1024}<nuw><nsw><%for.i.header>,+,-128}<nsw><%for.j.header>,+,4}<nuw><nsw><%for.k>
; CHECK-NEXT: Base offset: %a
; CHECK-NEXT: ArrayDecl[UnknownSize][8][32] with elements of 4 bytes.
; CHECK-NEXT: ArrayRef[{0,+,1}<nuw><nsw><%for.i.header>][{7,+,-1}<nsw><%for.j.header>][{0,+,1}<nuw><nsw><%for.k>]
@@ -117,14 +117,14 @@ define void @a_ijk_b_i2jk(ptr %a, ptr %b) {
; CHECK-LABEL: 'a_ijk_b_i2jk'
; CHECK-NEXT: Inst: store i32 1, ptr %a.idx, align 4
; CHECK-NEXT: In Loop with Header: for.k
-; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,1024}<nuw><nsw><%for.i.header>,+,256}<nw><%for.j.header>,+,4}<nw><%for.k>
+; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,1024}<nuw><nsw><%for.i.header>,+,256}<nuw><nsw><%for.j.header>,+,4}<nuw><nsw><%for.k>
; CHECK-NEXT: Base offset: %a
; CHECK-NEXT: ArrayDecl[UnknownSize][4][64] with elements of 4 bytes.
; CHECK-NEXT: ArrayRef[{0,+,1}<nuw><nsw><%for.i.header>][{0,+,1}<nuw><nsw><%for.j.header>][{0,+,1}<nuw><nsw><%for.k>]
; CHECK-EMPTY:
; CHECK-NEXT: Inst: store i32 1, ptr %b.idx, align 4
; CHECK-NEXT: In Loop with Header: for.k
-; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,1024}<nuw><nsw><%for.i.header>,+,256}<nw><%for.j.header>,+,4}<nw><%for.k>
+; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,1024}<nuw><nsw><%for.i.header>,+,256}<nuw><nsw><%for.j.header>,+,4}<nuw><nsw><%for.k>
; CHECK-NEXT: Base offset: %b
; CHECK-NEXT: ArrayDecl[UnknownSize][4][64] with elements of 4 bytes.
; CHECK-NEXT: ArrayRef[{0,+,1}<nuw><nsw><%for.i.header>][{0,+,1}<nuw><nsw><%for.j.header>][{0,+,1}<nuw><nsw><%for.k>]
@@ -181,10 +181,10 @@ define void @a_i_2j1_k(ptr %a) {
; CHECK-LABEL: 'a_i_2j1_k'
; CHECK-NEXT: Inst: store i32 1, ptr %idx, align 4
; CHECK-NEXT: In Loop with Header: for.k
-; CHECK-NEXT: AccessFunction: {{\{\{\{}}128,+,1024}<nuw><nsw><%for.i.header>,+,256}<nw><%for.j.header>,+,4}<nw><%for.k>
+; CHECK-NEXT: AccessFunction: {{\{\{\{}}128,+,1024}<nuw><nsw><%for.i.header>,+,256}<nuw><nsw><%for.j.header>,+,4}<nuw><nsw><%for.k>
; CHECK-NEXT: Base offset: %a
; CHECK-NEXT: ArrayDecl[UnknownSize][4][64] with elements of 4 bytes.
-; CHECK-NEXT: ArrayRef[{0,+,1}<nuw><nsw><%for.i.header>][{0,+,1}<nuw><%for.j.header>][{32,+,1}<nw><%for.k>]
+; CHECK-NEXT: ArrayRef[{0,+,1}<nuw><nsw><%for.i.header>][{0,+,1}<nuw><nsw><%for.j.header>][{32,+,1}<nuw><nsw><%for.k>]
;
entry:
br label %for.i.header
@@ -235,7 +235,7 @@ define void @a_i_3j_k(ptr %a) {
; CHECK-LABEL: 'a_i_3j_k'
; CHECK-NEXT: Inst: store i32 1, ptr %idx, align 4
; CHECK-NEXT: In Loop with Header: for.k
-; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,1024}<nuw><nsw><%for.i.header>,+,384}<nw><%for.j.header>,+,4}<nw><%for.k>
+; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,1024}<nuw><nsw><%for.i.header>,+,384}<nuw><nsw><%for.j.header>,+,4}<nuw><nsw><%for.k>
; CHECK-NEXT: failed to delinearize
;
entry:
@@ -287,7 +287,7 @@ define void @a_i_j_3k(ptr %a) {
; CHECK-LABEL: 'a_i_j_3k'
; CHECK-NEXT: Inst: store i32 1, ptr %idx, align 4
; CHECK-NEXT: In Loop with Header: for.k
-; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,1024}<nuw><nsw><%for.i.header>,+,128}<nw><%for.j.header>,+,12}<nw><%for.k>
+; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,1024}<nuw><nsw><%for.i.header>,+,128}<nuw><nsw><%for.j.header>,+,12}<nuw><nsw><%for.k>
; CHECK-NEXT: Base offset: %a
; CHECK-NEXT: ArrayDecl[UnknownSize][8][32] with elements of 4 bytes.
; CHECK-NEXT: ArrayRef[{0,+,1}<nuw><nsw><%for.i.header>][{0,+,1}<nuw><nsw><%for.j.header>][{0,+,3}<nuw><nsw><%for.k>]
@@ -339,7 +339,7 @@ define void @a_i_j2k_i(ptr %a) {
; CHECK-LABEL: 'a_i_j2k_i'
; CHECK-NEXT: Inst: store i32 1, ptr %idx, align 4
; CHECK-NEXT: In Loop with Header: for.k
-; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,1028}<%for.i.header>,+,256}<nw><%for.j.header>,+,128}<nw><%for.k>
+; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,1028}<nuw><nsw><%for.i.header>,+,256}<nw><%for.j.header>,+,128}<nw><%for.k>
; CHECK-NEXT: failed to delinearize
;
entry:
@@ -391,7 +391,7 @@ define void @a_i_i_jk(ptr %a) {
; CHECK-LABEL: 'a_i_i_jk'
; CHECK-NEXT: Inst: store i32 1, ptr %idx, align 4
; CHECK-NEXT: In Loop with Header: for.k
-; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,1152}<%for.i.header>,+,4}<nw><%for.j.header>,+,4}<nw><%for.k>
+; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,1152}<nuw><nsw><%for.i.header>,+,4}<nw><%for.j.header>,+,4}<nw><%for.k>
; CHECK-NEXT: Base offset: %a
; CHECK-NEXT: ArrayDecl[UnknownSize][288] with elements of 4 bytes.
; CHECK-NEXT: ArrayRef[{0,+,1}<nuw><nsw><%for.i.header>][{{\{\{}}0,+,1}<nuw><nsw><%for.j.header>,+,1}<nuw><nsw><%for.k>]
@@ -503,7 +503,7 @@ define void @non_divisible_by_element_size(ptr %a) {
; CHECK-LABEL: 'non_divisible_by_element_size'
; CHECK-NEXT: Inst: store i32 1, ptr %idx, align 4
; CHECK-NEXT: In Loop with Header: for.k
-; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,256}<nuw><nsw><%for.i.header>,+,32}<nw><%for.j.header>,+,1}<nw><%for.k>
+; CHECK-NEXT: AccessFunction: {{\{\{\{}}0,+,256}<nuw><nsw><%for.i.header>,+,32}<nuw><nsw><%for.j.header>,+,1}<nuw><nsw><%for.k>
; CHECK-NEXT: failed to delinearize
;
entry:
diff --git a/llvm/test/Analysis/Delinearization/himeno_1.ll b/llvm/test/Analysis/Delinearization/himeno_1.ll
index 292dca61d0592..1513eff8a6d76 100644
--- a/llvm/test/Analysis/Delinearization/himeno_1.ll
+++ b/llvm/test/Analysis/Delinearization/himeno_1.ll
@@ -33,7 +33,7 @@ define void @jacobi(i32 %nn, ptr nocapture %a, ptr nocapture %p) nounwind uwtabl
; CHECK-LABEL: 'jacobi'
; CHECK-NEXT: Inst: store float 1.000000e+00, ptr %arrayidx, align 4
; CHECK-NEXT: In Loop with Header: for.k
-; CHECK-NEXT: AccessFunction: {{\{\{\{}}(4 + (4 * (sext i32 %a.deps to i64) * (1 + (sext i32 %a.cols to i64))<nsw>)),+,(4 * (sext i32 %a.deps to i64) * (sext i32 %a.cols to i64))}<%for.i>,+,(4 * (sext i32 %a.deps to i64))<nsw>}<%for.j>,+,4}<%for.k>
+; CHECK-NEXT: AccessFunction: {{\{\{\{}}(4 + (4 * (sext i32 %a.deps to i64) * (1 + (sext i32 %a.cols to i64))<nsw>)),+,(4 * (sext i32 %a.deps to i64) * (sext i32 %a.cols to i64))}<%for.i>,+,(4 * (sext i32 %a.deps to i64))<nsw>}<%for.j>,+,4}<nsw><%for.k>
; CHECK-NEXT: Base offset: %a.base
; CHECK-NEXT: ArrayDecl[UnknownSize][(sext i32 %a.cols to i64)][(sext i32 %a.deps to i64)] with elements of 4 bytes.
; CHECK-NEXT: ArrayRef[{1,+,1}<nuw><nsw><%for.i>][{1,+,1}<nuw><nsw><%for.j>][{1,+,1}<nuw><nsw><%for.k>]
diff --git a/llvm/test/Analysis/Delinearization/himeno_2.ll b/llvm/test/Analysis/Delinearization/himeno_2.ll
index d210539d67d8b..158db4d1335e1 100644
--- a/llvm/test/Analysis/Delinearization/himeno_2.ll
+++ b/llvm/test/Analysis/Delinearization/himeno_2.ll
@@ -33,7 +33,7 @@ define void @jacobi(i32 %nn, ptr nocapture %a, ptr nocapture %p) nounwind uwtabl
; CHECK-LABEL: 'jacobi'
; CHECK-NEXT: Inst: store float 1.000000e+00, ptr %arrayidx, align 4
; CHECK-NEXT: In Loop with Header: for.k
-; CHECK-NEXT: AccessFunction: {{\{\{\{}}(4 + (4 * (sext i32 %a.deps to i64) * (1 + (sext i32 %a.cols to i64))<nsw>)),+,(4 * (sext i32 %a.deps to i64) * (sext i32 %a.cols to i64))}<%for.i>,+,(4 * (sext i32 %a.deps to i64))<nsw>}<%for.j>,+,4}<%for.k>
+; CHECK-NEXT: AccessFunction: {{\{\{\{}}(4 + (4 * (sext i32 %a.deps to i64) * (1 + (sext i32 %a.cols to i64))<nsw>)),+,(4 * (sext i32 %a.deps to i64) * (sext i32 %a.cols to i64))}<%for.i>,+,(4 * (sext i32 %a.deps to i64))<nsw>}<%for.j>,+,4}<nsw><%for.k>
; CHECK-NEXT: Base offset: %a.base
; CHECK-NEXT: ArrayDecl[UnknownSize][(sext i32 %a.cols to i64)][(sext i32 %a.deps to i64)] with elements of 4 bytes.
; CHECK-NEXT: ArrayRef[{1,+,1}<nuw><nsw><%for.i>][{1,+,1}<nuw><nsw><%for.j>][{1,+,1}<nuw><nsw><%for.k>]
diff --git a/llvm/test/Analysis/Delinearization/iv_times_constant_in_subscript.ll b/llvm/test/Analysis/Delinearization/iv_times_constant_in_subscript.ll
index cbe3ec8a19acd..dbb3b0eadb3fb 100644
--- a/llvm/test/Analysis/Delinearization/iv_times_constant_in_subscript.ll
+++ b/llvm/test/Analysis/Delinearization/iv_times_constant_in_subscript.ll
@@ -13,10 +13,10 @@ define void @foo(i64 %n, i64 %m, i64 %b, ptr %A) {
; CHECK-LABEL: 'foo'
; CHECK-NEXT: Inst: store double 1.000000e+00, ptr %arrayidx, align 8
; CHECK-NEXT: In Loop with Header: for.j
-; CHECK-NEXT: AccessFunction: {{\{\{}}(8 * %m * %b),+,(16 * %m)}<%for.i>,+,16}<%for.j>
+; CHECK-NEXT: AccessFunction: {{\{\{}}(8 * %m * %b),+,(16 * %m)}<%for.i>,+,16}<nsw><%for.j>
; CHECK-NEXT: Base offset: %A
; CHECK-NEXT: ArrayDecl[UnknownSize][%m] with elements of 8 bytes.
-; CHECK-NEXT: ArrayRef[{%b,+,2}<nsw><%for.i>][{0,+,2}<nuw><%for.j>]
+; CHECK-NEXT: ArrayRef[{%b,+,2}<nsw><%for.i>][{0,+,2}<nuw><nsw><%for.j>]
;
entry:
br label %for.i
diff --git a/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll b/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll
index 3d21d97438462..6e2ba3e8c8a20 100644
--- a/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll
+++ b/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll
@@ -13,7 +13,7 @@ define void @foo(i64 %n, i64 %m, i64 %o, ptr %A) {
; CHECK-LABEL: 'foo'
; CHECK-NEXT: Inst: store double 1.000000e+00, ptr %idx, align 8
; CHECK-NEXT: In Loop with Header: for.k
-; CHECK-NEXT: AccessFunction: {{\{\{\{}}(56 + (8 * (-4 + (3 * %m)) * %o)),+,(8 * %m * %o)}<%for.i>,+,(8 * %o)}<%for.j>,+,8}<%for.k...
[truncated]
You can test this locally with the following command:
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/test/Analysis/DependenceAnalysis/scev-nsw-flags-enable-analysis.ll llvm/test/Analysis/LoopAccessAnalysis/waw-negative-dependence.ll llvm/test/Analysis/ScalarEvolution/gep-nsw-flag-preservation.ll llvm/lib/Analysis/LoopAccessAnalysis.cpp llvm/lib/Analysis/ScalarEvolution.cpp llvm/test/Analysis/Delinearization/a.ll llvm/test/Analysis/Delinearization/constant_functions_multi_dim.ll llvm/test/Analysis/Delinearization/fixed_size_array.ll llvm/test/Analysis/Delinearization/himeno_1.ll llvm/test/Analysis/Delinearization/himeno_2.ll llvm/test/Analysis/Delinearization/iv_times_constant_in_subscript.ll llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_nts_3d.ll llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll llvm/test/Analysis/Delinearization/multidim_only_ivs_2d.ll llvm/test/Analysis/Delinearization/multidim_only_ivs_2d_nested.ll llvm/test/Analysis/Delinearization/multidim_only_ivs_3d.ll llvm/test/Analysis/Delinearization/multidim_only_ivs_3d_cast.ll llvm/test/Analysis/Delinearization/multidim_two_accesses_different_delinearization.ll llvm/test/Analysis/DependenceAnalysis/DADelin.ll llvm/test/Analysis/LoopAccessAnalysis/depend_diff_types.ll llvm/test/Analysis/LoopAccessAnalysis/different-access-types-rt-checks.ll llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-symbolic-max-backedge-taken-count-may-wrap.ll llvm/test/Analysis/LoopAccessAnalysis/forward-loop-carried.ll llvm/test/Analysis/LoopAccessAnalysis/forward-loop-independent.ll llvm/test/Analysis/LoopAccessAnalysis/loops-with-indirect-reads-and-writes.ll llvm/test/Analysis/LoopAccessAnalysis/number-of-memchecks.ll llvm/test/Analysis/LoopAccessAnalysis/retry-runtime-checks-after-dependence-analysis-forked-pointers.ll llvm/test/Analysis/LoopAccessAnalysis/retry-runtime-checks-after-dependence-analysis.ll llvm/test/Analysis/LoopAccessAnalysis/symbolic-stride.ll llvm/test/Analysis/ScalarEvolution/different-loops-recs.ll llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll llvm/test/Analysis/ScalarEvolution/max-expr-cache.ll llvm/test/Analysis/ScalarEvolution/nsw.ll llvm/test/Analysis/ScalarEvolution/ptrtoint.ll llvm/test/Analysis/ScalarEvolution/trip-count-scalable-stride.ll llvm/test/Transforms/LoopIdiom/memset-runtime-debug.ll llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold-negative-testcase.ll llvm/test/Transforms/LoopUnrollAndJam/unroll-and-jam.ll llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
The following files introduce new uses of undef:
Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef.

In tests, avoid using undef and having tests that trigger undefined behavior.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.
Can you please separate out the SCEV changes into separate PRs?
// Track flags: start with the flags from the first AddRec.
bool AllHaveNSW = AddRec->hasNoSignedWrap();
bool AllHaveNUW = AddRec->hasNoUnsignedWrap();
If you have {0,+,1}<nuw> + {0,+,1}<nuw>, then there is a priori no reason to believe that {0,+,2} is also <nuw>, which I think is the claim you're trying to make here?
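To illustrate the reviewer's point with a concrete standalone sketch (in 8-bit unsigned arithmetic, chosen only to keep the numbers small; not part of the patch), two recurrences can each be <nuw> while their sum wraps:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
  // Two copies of the recurrence {0,+,1} over 200 iterations: each
  // stays within [0, 200], so neither wraps in 8-bit unsigned math.
  // Their sum is the recurrence {0,+,2}, which reaches 400 and wraps.
  uint8_t a = 0, b = 0, sum = 0;
  for (int i = 0; i < 200; ++i) {
    a += 1;    // {0,+,1}: never exceeds 200, no unsigned wrap
    b += 1;    // {0,+,1}: never exceeds 200, no unsigned wrap
    sum += 2;  // {0,+,2}: exceeds 255 partway through and wraps
  }
  // Prints a=200 b=200 sum=144: the combined recurrence lost the
  // high bits even though both inputs were wrap-free.
  printf("a=%u b=%u sum=%u\n", (unsigned)a, (unsigned)b, (unsigned)sum);
  return 0;
}
```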
// Inherit NUW if all have NUW.
if (AllHaveNUW)
  OffsetWrap = setFlags(OffsetWrap, SCEV::FlagNUW);
}
I don't get what you're doing here. E.g. if you have getelementptr i32, ptr %p, i64 %idx and %idx is nowrap, then there is no reason to believe that 4 * %idx is also nowrap, which is what the OffsetWrap is about.
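As a standalone sketch of why scaling the index can reintroduce wrapping (32-bit arithmetic used only to keep the example small; not part of the patch):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
  // The index itself is computed without wrapping ...
  int32_t idx = 1 << 30;
  // ... but the byte offset 4 * idx needs 33 bits, so it wraps in a
  // 32-bit type even though idx carries a no-wrap guarantee.
  int64_t exact = 4 * (int64_t)idx;       // 4294967296
  uint32_t wrapped = 4u * (uint32_t)idx;  // wraps to 0
  printf("exact=%lld wrapped=%u\n", (long long)exact, wrapped);
  return 0;
}
```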
@@ -3270,7 +3306,9 @@ const SCEV *ScalarEvolution::getMulExpr(SmallVectorImpl<const SCEV *> &Ops,
    Instruction::Mul, getSignedRange(Scale),
    OverflowingBinaryOperator::NoSignedWrap);
  if (!NSWRegion.contains(getSignedRange(AddRec->getOperand(i))))
    Flags = clearFlags(Flags, SCEV::FlagNSW);
    if (!hasFlags(OrigFlags, SCEV::FlagNSW))
      Flags = clearFlags(Flags, SCEV::FlagNSW);
This code doesn't make sense anymore: It removes nsw only if nsw is already not set.
Flags = setFlags(Flags, SCEV::FlagNSW);
if (hasFlags(OrigFlags, SCEV::FlagNUW))
  Flags = setFlags(Flags, SCEV::FlagNUW);
}
If I understand correctly, the claim you're making here is that scale *<nuw> {start,+,step} is {scale*start,+,scale*step}<nuw> -- that is, this only depends on the flags on the multiply, not the addrec (unlike the intersection taken directly above). This seems trivially incorrect in the case where scale=1, in which case the multiplication is always nuw and you gain zero information on the addrec.
But even for non-trivial cases, let's take 2 *<nuw> {1,+,-1} where the addrec takes values {1,0}. The multiply is nuw, but the addrec {2,+,-2} isn't.
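A standalone sketch of that counterexample in 8-bit unsigned arithmetic (chosen only to keep the numbers small; not part of the patch): the scaled values never wrap, but the distributed recurrence's step update does.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
  // The recurrence {1,+,-1} takes the values 1, 0.
  // Scaling each value by 2 never wraps: 2*1 = 2 and 2*0 = 0.
  uint8_t x0 = 1, x1 = 0;
  printf("scaled values: %u %u\n", (unsigned)(2 * x0), (unsigned)(2 * x1));

  // The distributed recurrence {2,+,-2}, updated with wrapping 8-bit
  // arithmetic, carries out of the type: 2 + 254 = 256 wraps to 0.
  uint8_t val = 2;                           // start of {2,+,-2}
  uint8_t step = (uint8_t)-2;                // -2 as 8-bit unsigned: 254
  uint16_t true_sum = (uint16_t)val + step;  // 256 needs 9 bits
  val = (uint8_t)(val + step);               // wraps to 0
  printf("true sum: %u, wrapped value: %u\n", (unsigned)true_sum, (unsigned)val);
  return 0;
}
```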
@@ -3252,7 +3279,7 @@ const SCEV *ScalarEvolution::getMulExpr(SmallVectorImpl<const SCEV *> &Ops,
  // NLI * LI * {Start,+,Step} --> NLI * {LI*Start,+,LI*Step}
  SmallVector<const SCEV *, 4> NewOps;
  NewOps.reserve(AddRec->getNumOperands());
  const SCEV *Scale = getMulExpr(LIOps, SCEV::FlagAnyWrap, Depth + 1);
  const SCEV *Scale = getMulExpr(LIOps, OrigFlags, Depth + 1);
I think this part of the change is correct.
For flag propagation changes, I'd really recommend writing alive2 proofs for the propagation rules you are claiming. You can't directly prove something on addrecs, but you can at least prove it for one step of the addrec.
SCEV was losing NSW flags during AddRec operations, causing Dependence
Analysis to add unnecessary runtime assumptions.
There are 4 patches to be reviewed together and committed separately.
[LAA] Fix WAW dependency analysis with negative distances
The first patch is needed because the getAddExpr patch uncovers a latent bug in LAA: it prevents the second patch from enabling vectorization of a loop that should not be vectorized.
[SCEV] Fix NSW flag propagation in getAddExpr
[SCEV] Fix NSW flag propagation in getMulExpr
[SCEV] Fix NSW flag propagation in getGEPExpr
The other patches are independent of each other.