JIT: Add 3-opt implementation for improving upon RPO-based layout #103450

Draft
wants to merge 24 commits into
base: main

amanasifkhalid

No description provided.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 13, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

improvedLayout = false;
BasicBlock* const exitBlock = blockVector[blockCount - 1];

for (unsigned i = 1; i < (blockCount - 1); i++)

I think the root of the TP cost is here -- we want to avoid having to search for possible cut points.

One approach is to pick cut points randomly, but I think we can do better for now. Roughly speaking, in the pass above we should find all blocks that are not immediately before their optimal successor and/or not immediately after their optimal predecessor.

We can rank these by the difference between the current and optimal scores, then greedily pick the worst; that gives the first cut point. For the second cut point, you can pick the best pred for the first cut point's current next block, or the best succ for the current pred of the first cut point's ideal successor. That is, if we have

S ~~~~ 1|2 ~~~ 3|4 ~~~ 5|6 ~~~ E

1's ideal succ is 4

reordering is

S ~~~~ 1|4 ~~~ 5|2 ~~~ 3|6 ~~~ E

So we either try and find a 5 which is the ideal pred of 2, or a 6 which is the ideal succ of 3.
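As a rough illustration of the reordering above (not the code in this PR), the swap of the two middle segments can be expressed as a single rotation. The sketch assumes the layout is a plain vector of block numbers; `SwapSegments` and its index parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: given cut points after positions i, j and k
// (i < j < k), swap the segments (i, j] and (j, k] of the layout.
//   before: [0..i] [i+1..j] [j+1..k] [k+1..]
//   after:  [0..i] [j+1..k] [i+1..j] [k+1..]
inline void SwapSegments(std::vector<int>& layout, std::size_t i, std::size_t j, std::size_t k)
{
    assert(i < j && j < k && k < layout.size());
    // std::rotate makes the element at j+1 the first element of the range
    // [i+1, k+1), which is exactly the segment swap described above.
    std::rotate(layout.begin() + i + 1, layout.begin() + j + 1, layout.begin() + k + 1);
}
```

With blocks numbered as in the diagram (S = 0, E = 7) and cuts after 1, 3 and 5, this turns `0 1 2 3 4 5 6 7` into `0 1 4 5 2 3 6 7`.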

Failing that we might pick some other block that is not currently followed by its ideal succ.

So one idea is to keep 3 values for each block: its min score, current score, and best score (lower is better). Order the blocks by current-min. Pick the best of these as the first split, and then see if any of the next few provide a good second split.
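A minimal sketch of that ordering step, assuming each block carries only a min score and a current score (the third value is left out here). `BlockScore` and `RankCutCandidates` are made-up names for illustration, not JIT types.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-block record; lower scores are better.
struct BlockScore
{
    int    id;
    double minScore;     // score if the block had its ideal neighbor
    double currentScore; // score in the current layout
};

// Returns the ids of blocks that are not already at their min score,
// ordered by (currentScore - minScore), largest gap first.
inline std::vector<int> RankCutCandidates(std::vector<BlockScore> scores)
{
    // Drop blocks that cannot be improved.
    scores.erase(std::remove_if(scores.begin(), scores.end(),
                                [](const BlockScore& s) {
                                    assert(s.currentScore >= s.minScore);
                                    return s.currentScore == s.minScore;
                                }),
                 scores.end());

    // Stable sort keeps layout order among blocks with equal gaps.
    std::stable_sort(scores.begin(), scores.end(),
                     [](const BlockScore& a, const BlockScore& b) {
                         return (a.currentScore - a.minScore) > (b.currentScore - b.minScore);
                     });

    std::vector<int> ids;
    for (const BlockScore& s : scores)
    {
        ids.push_back(s.id);
    }
    return ids;
}
```

The first id returned would be the candidate for the first split, and the next few would be checked as second splits.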

Likely though this ends up needing a priority queue or similar as once we accept an arrangement we need to update some of the costings...
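One way such a queue could cope with costs changing after an arrangement is accepted is lazy invalidation: push a fresh entry whenever a block's gain changes and skip stale entries on pop. This is only a sketch under that assumption; `Candidate`, `CutPointQueue`, and the version counter are hypothetical and not taken from the JIT sources.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical queue entry: the gain from fixing a block's successor,
// tagged with the version of the block's cost it was computed from.
struct Candidate
{
    double   gain;
    int      block;
    unsigned version;

    bool operator<(const Candidate& other) const { return gain < other.gain; }
};

class CutPointQueue
{
public:
    explicit CutPointQueue(std::size_t blockCount) : m_version(blockCount, 0) {}

    // Record a new gain for a block. Older entries for the same block stay
    // in the heap but become stale because the version no longer matches.
    void Update(int block, double gain)
    {
        assert(block >= 0 && static_cast<std::size_t>(block) < m_version.size());
        const unsigned version = ++m_version[block];
        if (gain > 0.0)
        {
            m_heap.push(Candidate{gain, block, version});
        }
    }

    // Returns the block with the largest up-to-date gain, or -1 if none.
    int PopBest()
    {
        while (!m_heap.empty())
        {
            const Candidate top = m_heap.top();
            m_heap.pop();
            if (top.version == m_version[top.block])
            {
                return top.block;
            }
        }
        return -1;
    }

private:
    std::priority_queue<Candidate> m_heap;
    std::vector<unsigned>          m_version;
};
```

The trade-off is that stale entries linger in the heap until popped, in exchange for not needing a decrease-key operation.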


Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 14, 2024
@amanasifkhalid

@AndyAyersMS thanks for bearing with me on this. I've implemented your suggestion of building and maintaining a priority queue of cut points, and this seems to be sufficiently cheap. Asmdiffs vary quite a bit across platforms, though this looks like a net PerfScore win. To bound the number of iterations, we currently consider each edge at most once. We probably don't want to limit the search space too much, but these limitations produced pretty small diffs locally, so the current approach seems to be fixing the most obvious instances of subpar layout.

I haven't implemented this for methods with EH yet, though I'm thinking of leaving the cutpoint search as-is, and then after reordering blocks, we can make EH regions contiguous by "bubbling up" the next EH block we see to its predecessor. This fixup can break up fallthrough from EH exits into non-EH blocks, but it will maintain the relative ordering such that the exit jump is forward; for now, breaking up such fallthrough seems necessary. With this approach, we can get rid of the EH fixup logic in earlier ordering passes (RPO layout, fgMoveColdBlocks, etc) and win back some TP -- but I wanted to evaluate that separately from this PR.
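To make the "bubbling up" idea concrete, here is a simplified sketch. It assumes flat, non-nested regions identified by an integer, with 0 meaning "not in an EH region"; `LayoutBlock` and `MakeRegionsContiguous` are hypothetical names, and real EH regions nest, which this does not handle.

```cpp
#include <cassert>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical block record: region 0 means the block is not in an EH region.
struct LayoutBlock
{
    int id;
    int region;
};

// Moves every EH block up so it sits with the earlier blocks of its region.
// Non-EH blocks and the blocks within a region keep their relative order.
inline std::vector<LayoutBlock> MakeRegionsContiguous(const std::vector<LayoutBlock>& layout)
{
    // First pass: collect each region's blocks in layout order.
    std::unordered_map<int, std::vector<LayoutBlock>> members;
    for (const LayoutBlock& block : layout)
    {
        assert(block.region >= 0);
        if (block.region != 0)
        {
            members[block.region].push_back(block);
        }
    }

    // Second pass: emit non-EH blocks in place, and emit a whole region
    // the first time one of its blocks is seen.
    std::vector<LayoutBlock> result;
    result.reserve(layout.size());
    std::unordered_set<int> emitted;
    for (const LayoutBlock& block : layout)
    {
        if (block.region == 0)
        {
            result.push_back(block);
        }
        else if (emitted.insert(block.region).second)
        {
            const std::vector<LayoutBlock>& group = members[block.region];
            result.insert(result.end(), group.begin(), group.end());
        }
    }
    return result;
}
```

For a layout `A T1 B T2 C`, where `T1` and `T2` share a region, this yields `A T1 T2 B C`: a fallthrough from `T2` into `C` is lost, but `C` still comes after `T2`, so the exit jump stays forward.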

I think this PR is in good shape, so I thought I'd ping you now in case you want to take a look, though I don't plan to push to merge this until we get the LSRA changes where we want them.
