JIT: Add 3-opt implementation for improving upon RPO-based layout #103450
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
src/coreclr/jit/fgopt.cpp
improvedLayout = false;
BasicBlock* const exitBlock = blockVector[blockCount - 1];

for (unsigned i = 1; i < (blockCount - 1); i++)
I think the root of the throughput (TP) cost is here -- we want to avoid having to search for possible cut points.
One approach is to just pick randomly, but I think we can do better for now. Roughly speaking, in the pass above we should find all blocks that are not just before their optimal successor and/or not just after their optimal predecessor.
We can rank these by the difference between the current and optimal scores, then greedily pick the worst; that gives the first cut point. For the second cut point you can pick the best pred for the first cut point's current next block, or the best succ for the current pred of the first cut point's ideal successor. That is, if we have
S ~~~~ 1|2 ~~~ 3|4 ~~~ 5|6 ~~~ E
1's ideal succ is 4
reordering is
S ~~~~ 1|4 ~~~ 5|2 ~~~ 3|6 ~~~ E
So we either try and find a 5 which is the ideal pred of 2, or a 6 which is the ideal succ of 3.
Failing that we might pick some other block that is not currently followed by its ideal succ.
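The reordering in the diagram above amounts to swapping two adjacent segments of the layout. A minimal sketch of that step, assuming the layout is held in a plain `std::vector` and the three cut points are given as indices; `swapSegments` is a hypothetical name for illustration, not a function from the JIT:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): the cuts fall after positions i, j and k.
// The layout  [0..i] [i+1..j] [j+1..k] [k+1..]
// becomes     [0..i] [j+1..k] [i+1..j] [k+1..]
template <typename T>
void swapSegments(std::vector<T>& order, std::size_t i, std::size_t j, std::size_t k)
{
    assert(i < j && j < k && k < order.size());

    // Rotating [i+1, k+1) so that element j+1 comes first swaps the two segments
    // while keeping the blocks inside each segment in their original order.
    std::rotate(order.begin() + i + 1, order.begin() + j + 1, order.begin() + k + 1);
}
```

With the blocks `S 1 2 3 4 5 6 E` stored at indices 0 through 7, `swapSegments(order, 1, 3, 5)` gives `S 1 4 5 2 3 6 E`, matching the diagram when each `~~~` run is empty.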
So one idea is to keep 3 values for each block: its min score, current score, and best score (lower is better). Order the blocks by current-min. Pick the best of these as the first split, and then see if any of the next few provide a good second split.
Likely though this ends up needing a priority queue or similar as once we accept an arrangement we need to update some of the costings...
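The ranking idea could look roughly like the sketch below. It assumes each block carries a min score and a current score (the "best score" mentioned above is left out for brevity) and orders candidates by `current - min`, largest gap first. `BlockScore`, `GapLess`, `CutQueue` and `buildCutQueue` are all hypothetical names, not types from the JIT:

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical per-block costs; lower is better.
struct BlockScore
{
    unsigned blockNum;
    double   minScore;     // cost if the block were followed by its ideal successor
    double   currentScore; // cost with the successor it has in the current layout

    double gap() const { return currentScore - minScore; }
};

// Orders the queue so that the block with the largest gap is on top.
struct GapLess
{
    bool operator()(const BlockScore& a, const BlockScore& b) const
    {
        return a.gap() < b.gap();
    }
};

using CutQueue = std::priority_queue<BlockScore, std::vector<BlockScore>, GapLess>;

// Collects every block that is not already followed by its ideal successor.
CutQueue buildCutQueue(const std::vector<BlockScore>& scores)
{
    CutQueue queue;
    for (const BlockScore& s : scores)
    {
        assert(s.currentScore >= s.minScore);
        if (s.gap() > 0.0)
        {
            queue.push(s);
        }
    }
    return queue;
}
```

The top of the queue is the candidate for the first cut point. After a reordering is accepted, the scores of the blocks around the three cuts change, so their entries would need to be re-pushed or invalidated; that bookkeeping is not shown here.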
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.
@AndyAyersMS thanks for bearing with me on this. I've implemented your suggestion of building and maintaining a priority queue of cut points, and this seems to be sufficiently cheap. Diffs show plenty of variance in asmdiffs across platforms, though this looks like a net PerfScore win. To contain the number of iterations, we currently consider each edge at most once; we probably don't want to limit the search space too much, though these limitations had pretty small diffs locally, so it seems like the current approach is fixing the most obvious instances of subpar layout.

I haven't implemented this for methods with EH yet, though I'm thinking of leaving the cutpoint search as-is, and then after reordering blocks, we can make EH regions contiguous by "bubbling up" the next EH block we see to its predecessor. This fixup can break up fallthrough from EH exits into non-EH blocks, but it will maintain the relative ordering such that the exit jump is forward; for now, breaking up such fallthrough seems necessary. With this approach, we can get rid of the EH fixup logic in earlier ordering passes (RPO layout, ...).

I think this PR is in good shape, so I thought I'd ping you now in case you want to take a look, though I don't plan to push to merge this until we get the LSRA changes where we want them.
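One possible reading of the "bubbling up" fix-up for EH regions, as a sketch only: walk the reordered layout and move each EH block up to sit directly after the last block of the same region that has already been placed. The `Block` struct, the flat `ehRegion` id (0 meaning "not in any EH region") and `makeRegionsContiguous` are assumptions for illustration; nested regions and the JIT's actual EH table are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block record; ehRegion == 0 means the block is not in an EH region.
struct Block
{
    unsigned num;
    unsigned ehRegion;
};

// Returns a layout in which the blocks of each EH region are contiguous.
// Blocks keep their relative order within a region and among non-EH blocks.
std::vector<Block> makeRegionsContiguous(const std::vector<Block>& order)
{
    std::vector<Block> result;
    for (const Block& b : order)
    {
        // Default: append at the end, as for non-EH blocks or the first block of a region.
        std::size_t insertAt = result.size();
        if (b.ehRegion != 0)
        {
            // Search backwards for the last placed block of the same region.
            for (std::size_t i = result.size(); i > 0; i--)
            {
                if (result[i - 1].ehRegion == b.ehRegion)
                {
                    insertAt = i; // bubble up to just after that block
                    break;
                }
            }
        }
        result.insert(result.begin() + insertAt, b);
    }
    return result;
}
```

For the layout `1 2* 3 4* 5`, where `*` marks blocks of the same EH region, this produces `1 2* 4* 3 5`: block 4 no longer falls through into block 5, but a jump from 4 to 5 is still forward. The repeated backward scan and vector insert make this quadratic in the worst case, which is acceptable for a sketch but not necessarily for the JIT.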