Skip to content

Commit

Permalink
update_test_checks: improve IR value name stability
Browse files Browse the repository at this point in the history
By default, UTC attempts to keep the produced diff small by keeping IR
value name variables stable. The old algorithm was roughly:

1. Compute a diff between the old and new check lines, where
   "uncommitted" variable names are replaced by a wildcard.
   This leads to a set of non-crossing "candidate" pairs of
   (old line, new line) that we can try to make equal.

2. Greedily walk this list of candidates, committing to variable names
   that make candidate lines equal if possible.

The greedy approach in the second step has the downside that committing
to a variable name greedily can sometimes prevent many subsequent
candidates from getting the variable name assignment that would make
them equal.

We keep the first step as-is, but replace the second one by an algorithm
that finds a large independent set of candidates, i.e. candidate pairs
of (old line, new line) which are non-conflicting in the sense that
their desired variable name mappings are not in conflict.

We find the large independent set by greedily assigning a coloring to
the conflict graph and taking the largest color class. We then commit to
all the variable name mappings which are desired by candidates in this
largest color class.

As before, we then recurse into regions between matching lines. This is
required in large cases. For example, running this algorithm at the
top-level of the new test case (stable_ir_values5.ll) matches up most of
the instructions, but not the names of the result values of all the
`load`s. This is because (unlike e.g. the getelementptrs) the load
instructions are all equal except for variable names, and so step 1 (the
diff algorithm) doesn't consider them as candidates. However, they are
trivially matched by recursion.

As is usually the case with these changes, the quality improvement is
hard to see from the diff of this patch. However, it becomes obvious when
comparing the diff of stable_ir_values5.ll against stable_ir_value5.ll.expected
before and after this change.
  • Loading branch information
nhaehnle committed Oct 2, 2024
1 parent d87de5e commit 992462a
Show file tree
Hide file tree
Showing 2 changed files with 102 additions and 58 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -14,9 +14,9 @@ define dso_local void @main.resume.0(i64 %0, %structA %arg01, [23 x i32] %arg12,
; CHECK-SAME: i64 [[TMP0:%.*]], [[STRUCTA:%.*]] [[ARG01:%.*]], [23 x i32] [[ARG12:%.*]], [30 x i32] [[ARG23:%.*]]) {
; CHECK-NEXT: entryresume.0:
; CHECK-NEXT: [[FOO:%.*]] = call ptr @getter(i32 108)
; CHECK-NEXT: [[TMP5:%.*]] = insertvalue { [[STRUCTA]], [23 x i32], [30 x i32] } poison, [[STRUCTA]] [[ARG01]], 0
; CHECK-NEXT: [[TMP6:%.*]] = insertvalue { [[STRUCTA]], [23 x i32], [30 x i32] } [[TMP5]], [23 x i32] [[ARG12]], 1
; CHECK-NEXT: [[TMP1:%.*]] = insertvalue { [[STRUCTA]], [23 x i32], [30 x i32] } [[TMP6]], [30 x i32] [[ARG23]], 2
; CHECK-NEXT: [[TMP6:%.*]] = insertvalue { [[STRUCTA]], [23 x i32], [30 x i32] } poison, [[STRUCTA]] [[ARG01]], 0
; CHECK-NEXT: [[TMP8:%.*]] = insertvalue { [[STRUCTA]], [23 x i32], [30 x i32] } [[TMP6]], [23 x i32] [[ARG12]], 1
; CHECK-NEXT: [[TMP1:%.*]] = insertvalue { [[STRUCTA]], [23 x i32], [30 x i32] } [[TMP8]], [30 x i32] [[ARG23]], 2
; CHECK-NEXT: [[TMP3:%.*]] = extractvalue { [[STRUCTA]], [23 x i32], [30 x i32] } [[TMP1]], 2
; CHECK-NEXT: [[DOTFCA_0_EXTRACT:%.*]] = extractvalue [30 x i32] [[TMP3]], 0
; CHECK-NEXT: [[DOTFCA_1_EXTRACT:%.*]] = extractvalue [30 x i32] [[TMP3]], 1
Expand Down Expand Up @@ -100,59 +100,59 @@ define dso_local void @main.resume.0(i64 %0, %structA %arg01, [23 x i32] %arg12,
; CHECK-NEXT: [[DOTFCA_0_48_EXTRACT:%.*]] = extractvalue [[STRUCTC]] [[TMP4]], 0, 48
; CHECK-NEXT: [[DOTFCA_0_49_EXTRACT:%.*]] = extractvalue [[STRUCTC]] [[TMP4]], 0, 49
; CHECK-NEXT: [[TMP2:%.*]] = inttoptr i32 [[DOTFCA_0_EXTRACT]] to ptr
; CHECK-NEXT: [[TMP8:%.*]] = load i32, ptr [[TMP2]], align 4
; CHECK-NEXT: [[TMP5:%.*]] = load i32, ptr [[TMP2]], align 4
; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 1
; CHECK-NEXT: [[TMP10:%.*]] = load i32, ptr [[TMP27]], align 4
; CHECK-NEXT: [[TMP7:%.*]] = load i32, ptr [[TMP27]], align 4
; CHECK-NEXT: [[TMP29:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 2
; CHECK-NEXT: [[TMP12:%.*]] = load i32, ptr [[TMP29]], align 4
; CHECK-NEXT: [[TMP9:%.*]] = load i32, ptr [[TMP29]], align 4
; CHECK-NEXT: [[TMP31:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 3
; CHECK-NEXT: [[TMP14:%.*]] = load i32, ptr [[TMP31]], align 4
; CHECK-NEXT: [[TMP11:%.*]] = load i32, ptr [[TMP31]], align 4
; CHECK-NEXT: [[TMP33:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 4
; CHECK-NEXT: [[TMP16:%.*]] = load i32, ptr [[TMP33]], align 4
; CHECK-NEXT: [[TMP13:%.*]] = load i32, ptr [[TMP33]], align 4
; CHECK-NEXT: [[TMP35:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 5
; CHECK-NEXT: [[TMP18:%.*]] = load i32, ptr [[TMP35]], align 4
; CHECK-NEXT: [[TMP15:%.*]] = load i32, ptr [[TMP35]], align 4
; CHECK-NEXT: [[TMP37:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 6
; CHECK-NEXT: [[TMP20:%.*]] = load i32, ptr [[TMP37]], align 4
; CHECK-NEXT: [[TMP17:%.*]] = load i32, ptr [[TMP37]], align 4
; CHECK-NEXT: [[TMP39:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 7
; CHECK-NEXT: [[TMP22:%.*]] = load i32, ptr [[TMP39]], align 4
; CHECK-NEXT: [[TMP19:%.*]] = load i32, ptr [[TMP39]], align 4
; CHECK-NEXT: [[TMP41:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 8
; CHECK-NEXT: [[TMP24:%.*]] = load i32, ptr [[TMP41]], align 4
; CHECK-NEXT: [[TMP21:%.*]] = load i32, ptr [[TMP41]], align 4
; CHECK-NEXT: [[TMP43:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 9
; CHECK-NEXT: [[TMP26:%.*]] = load i32, ptr [[TMP43]], align 4
; CHECK-NEXT: [[TMP24:%.*]] = load i32, ptr [[TMP43]], align 4
; CHECK-NEXT: [[TMP45:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 10
; CHECK-NEXT: [[TMP28:%.*]] = load i32, ptr [[TMP45]], align 4
; CHECK-NEXT: [[TMP25:%.*]] = load i32, ptr [[TMP45]], align 4
; CHECK-NEXT: [[TMP47:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 11
; CHECK-NEXT: [[TMP30:%.*]] = load i32, ptr [[TMP47]], align 4
; CHECK-NEXT: [[TMP28:%.*]] = load i32, ptr [[TMP47]], align 4
; CHECK-NEXT: [[TMP49:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 12
; CHECK-NEXT: [[TMP32:%.*]] = load i32, ptr [[TMP49]], align 4
; CHECK-NEXT: [[TMP30:%.*]] = load i32, ptr [[TMP49]], align 4
; CHECK-NEXT: [[TMP51:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 13
; CHECK-NEXT: [[TMP34:%.*]] = load i32, ptr [[TMP51]], align 4
; CHECK-NEXT: [[TMP32:%.*]] = load i32, ptr [[TMP51]], align 4
; CHECK-NEXT: [[TMP53:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 14
; CHECK-NEXT: [[TMP36:%.*]] = load i32, ptr [[TMP53]], align 4
; CHECK-NEXT: [[TMP34:%.*]] = load i32, ptr [[TMP53]], align 4
; CHECK-NEXT: [[TMP55:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 15
; CHECK-NEXT: [[TMP38:%.*]] = load i32, ptr [[TMP55]], align 4
; CHECK-NEXT: [[TMP36:%.*]] = load i32, ptr [[TMP55]], align 4
; CHECK-NEXT: [[TMP57:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 16
; CHECK-NEXT: [[TMP40:%.*]] = load i32, ptr [[TMP57]], align 4
; CHECK-NEXT: [[TMP38:%.*]] = load i32, ptr [[TMP57]], align 4
; CHECK-NEXT: [[TMP59:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 17
; CHECK-NEXT: [[TMP42:%.*]] = load i32, ptr [[TMP59]], align 4
; CHECK-NEXT: [[TMP40:%.*]] = load i32, ptr [[TMP59]], align 4
; CHECK-NEXT: [[TMP61:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 18
; CHECK-NEXT: [[TMP44:%.*]] = load i32, ptr [[TMP61]], align 4
; CHECK-NEXT: [[TMP42:%.*]] = load i32, ptr [[TMP61]], align 4
; CHECK-NEXT: [[TMP63:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 19
; CHECK-NEXT: [[TMP46:%.*]] = load i32, ptr [[TMP63]], align 4
; CHECK-NEXT: [[TMP44:%.*]] = load i32, ptr [[TMP63]], align 4
; CHECK-NEXT: [[TMP65:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 20
; CHECK-NEXT: [[TMP48:%.*]] = load i32, ptr [[TMP65]], align 4
; CHECK-NEXT: [[TMP46:%.*]] = load i32, ptr [[TMP65]], align 4
; CHECK-NEXT: [[TMP67:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 21
; CHECK-NEXT: [[TMP50:%.*]] = load i32, ptr [[TMP67]], align 4
; CHECK-NEXT: [[TMP48:%.*]] = load i32, ptr [[TMP67]], align 4
; CHECK-NEXT: [[TMP69:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 22
; CHECK-NEXT: [[TMP52:%.*]] = load i32, ptr [[TMP69]], align 4
; CHECK-NEXT: [[TMP50:%.*]] = load i32, ptr [[TMP69]], align 4
; CHECK-NEXT: [[TMP71:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 23
; CHECK-NEXT: [[TMP54:%.*]] = load i32, ptr [[TMP71]], align 4
; CHECK-NEXT: [[TMP52:%.*]] = load i32, ptr [[TMP71]], align 4
; CHECK-NEXT: [[TMP73:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 24
; CHECK-NEXT: [[TMP56:%.*]] = load i32, ptr [[TMP73]], align 4
; CHECK-NEXT: [[TMP54:%.*]] = load i32, ptr [[TMP73]], align 4
; CHECK-NEXT: [[TMP75:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 25
; CHECK-NEXT: [[TMP62:%.*]] = load i32, ptr [[TMP75]], align 4
; CHECK-NEXT: [[TMP56:%.*]] = load i32, ptr [[TMP75]], align 4
; CHECK-NEXT: [[TMP77:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i32 26
; CHECK-NEXT: [[TMP64:%.*]] = load i32, ptr [[TMP77]], align 4
; CHECK-NEXT: [[TMP62:%.*]] = load i32, ptr [[TMP77]], align 4
; CHECK-NEXT: [[TMP60:%.*]] = inttoptr i32 [[DOTFCA_0_EXTRACT]] to ptr
; CHECK-NEXT: [[TMP58:%.*]] = extractvalue { [[STRUCTA]], [23 x i32], [30 x i32] } [[TMP1]], 0
; CHECK-NEXT: [[DOTFCA_0_EXTRACT57:%.*]] = extractvalue [[STRUCTA]] [[TMP58]], 0
Expand Down
100 changes: 72 additions & 28 deletions llvm/utils/UpdateTestChecks/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -1534,14 +1534,35 @@ def diffify_line(line, mapper):

candidate_matches = find_diff_matching(lhs_lines, rhs_lines)

# Apply commits greedily on a match-by-match basis
matches = [(-1, -1)]
committed_anything = False
for lhs_idx, rhs_idx in candidate_matches:
candidate_matches = [
(old_begin + lhs_idx, new_begin + rhs_idx)
for lhs_idx, rhs_idx in candidate_matches
]

# Candidate matches may conflict if they require conflicting mappings of
# names.
#
# Treat the candidate matches as vertices in a conflict graph. Greedily
# color the vertices.
class Color:
def __init__(self):
# (lhs_idx, rhs_idx) of matches in this color
self.matches = []

# rhs_name -> lhs_name mappings required by this color
self.mapping = {}

# lhs_names committed for this color
self.committed = set()
colors = []

for match_idx, (lhs_idx, rhs_idx) in enumerate(candidate_matches):
lhs_line = old_line_infos[lhs_idx]
rhs_line = new_line_infos[rhs_idx]

local_commits = {}
compatible_colors = colors[:]
new_color = Color()
new_color.matches.append((lhs_idx, rhs_idx))

for lhs_value, rhs_value in zip(lhs_line.values, rhs_line.values):
if new_mapping[rhs_value.name] in committed_names:
Expand All @@ -1554,39 +1575,62 @@ def diffify_line(line, mapper):
else:
break

if rhs_value.name in local_commits:
if rhs_value.name in new_color.mapping:
# Same, but for a possible commit happening on the same line
if local_commits[rhs_value.name] == lhs_value.name:
if new_color.color[rhs_value.name] == lhs_value.name:
continue
else:
break

if lhs_value.name in committed_names:
# We can't map this value because the name we would map it to has already been
# committed for something else. Give up on this line.
if lhs_value.name in committed_names or lhs_value.name in new_color.committed:
# We can't map this value because the name we would map it
# to has already been committed for something else. Give up
# on this line.
break

local_commits[rhs_value.name] = lhs_value.name
else:
# No reason not to add any commitments for this line
for rhs_var, lhs_var in local_commits.items():
new_mapping[rhs_var] = lhs_var
committed_names.add(lhs_var)
committed_anything = True

if (
lhs_var != rhs_var
and lhs_var in new_mapping
and new_mapping[lhs_var] == lhs_var
):
new_mapping[lhs_var] = "conflict_" + lhs_var
new_color.mapping[rhs_value.name] = lhs_value.name
new_color.committed.add(lhs_value.name)

matches.append((lhs_idx, rhs_idx))
color_idx = 0
while color_idx < len(compatible_colors):
color = compatible_colors[color_idx]
compatible = True
if rhs_value.name in color.mapping:
compatible = color.mapping[rhs_value.name] == lhs_value.name
else:
compatible = lhs_value.name not in color.committed
if compatible:
color_idx += 1
else:
del compatible_colors[color_idx]
else:
# At a minimum, this line is viable standalone
if compatible_colors:
compatible_colors[0].mapping.update(new_color.mapping)
compatible_colors[0].committed.update(new_color.committed)
compatible_colors[0].matches.append((lhs_idx, rhs_idx))
else:
colors.append(new_color)

if colors:
# Pick the largest color class. This gives us a large independent
# (non-conflicting) set of candidate matches. Assign all names
# required by the independent set and recurse.
max_color = max(colors, key=lambda color: len(color.matches))

for rhs_var, lhs_var in max_color.mapping.items():
new_mapping[rhs_var] = lhs_var
committed_names.add(lhs_var)

if (
lhs_var != rhs_var
and lhs_var in new_mapping
and new_mapping[lhs_var] == lhs_var
):
new_mapping[lhs_var] = "conflict_" + lhs_var

matches.append((old_end, new_end))
matches = [(old_begin - 1, new_begin - 1)] + max_color.matches + [(old_end, new_end)]

# Recursively handle sequences between matches
if committed_anything:
for (lhs_prev, rhs_prev), (lhs_next, rhs_next) in zip(matches, matches[1:]):
recurse(lhs_prev + 1, lhs_next, rhs_prev + 1, rhs_next)

Expand Down

0 comments on commit 992462a

Please sign in to comment.