Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contain memory operands under casts #72719

Merged
merged 6 commits into from
Aug 24, 2022

Conversation

SingleAccretion
Copy link
Contributor

@SingleAccretion SingleAccretion commented Jul 23, 2022

Recognize legal patterns in lowering and then fold the sign/zero-extension into an appropriate load at codegen time.

The change implements full support on XARCH, partial support on ARM/64, and punts LA. A follow-up issue will be filed to track the remaining work. The ARM/64 commit contains some notes on what is required for said support on our load/store architectures.

The diffs are nice and simple:

-            ldr     lr, [sp+0x34]      // [V07 arg7]
-            uxtb    lr, lr
+            ldrb    lr, [sp+0x34]      // [V07 arg7]

Regressions are RA/alignment.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 23, 2022
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jul 23, 2022
@ghost
Copy link

ghost commented Jul 23, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Fold the sign/zero-extension into an appropriate load.

The change implements full support on XARCH, partial support on ARM/64, and punts LA. A follow-up issue will be filed to track the remaining work. The ARM/64 commit contains some notes on what is required for said support on our load/store architectures.

The diffs are nice and simple:

-            ldr     lr, [sp+0x34]      // [V07 arg7]
-            uxtb    lr, lr
+            ldrb    lr, [sp+0x34]      // [V07 arg7]
Author: SingleAccretion
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@SingleAccretion SingleAccretion force-pushed the Casts-Containment branch 3 times, most recently from d3d8af4 to 6ffddae Compare July 23, 2022 23:14
@SingleAccretion SingleAccretion marked this pull request as ready for review July 24, 2022 12:19
@SingleAccretion
Copy link
Contributor Author

@dotnet/jit-contrib

Copy link
Contributor

@TIHan TIHan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diffs look really good.

I tried doing something similar here: #70756 - but I didn't get to work on it longer.

@SingleAccretion
Copy link
Contributor Author

I think it would be good to run the fuzzers and [libraries] stress on this change.

@EgorBo
Copy link
Member

EgorBo commented Jul 28, 2022

/azp run Fuzzlyn, Antigen, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr gcstress0x3-gcstress0xc, runtime-coreclr jitstressregs

@azure-pipelines
Copy link

Azure Pipelines successfully started running 6 pipeline(s).

@jakobbotsch jakobbotsch added this to the 8.0.0 milestone Jul 29, 2022
TODO: consider using a dedicated IND_EXT oper for ARM/ARM64 instead of containment.
This would allow us to cleany handle all indirections. It would not mean we'd give
up on the casts containment, as we'd still need to handle the "reg optional" case.

IND_EXT will be much like an ordinary IND, but have a "source" and "target" types.
The "target" type would always be int/long, while "source" could be of any integral
type.

This design would be a bit more natural, and nicely separable from casts. However,
the main problem with the current state of things, apart from the fact codegen of
indirections is tied strongly to "GenTreeIndir", is the fact that changing type of
the load can invalidate LEA containment. One would think this is solvable with some
tricks, like re-running containment analysis on an indirection after processing the
cast, but ARM64 codegen doesn't support uncontained LEAs in some cases.

A possible solution to that problem is uncontaining the whole address tree. That
would be messy, but ought to work. An additional complication is that these trees
can contain a lot of contained operands as part of ADDEX and BFIZ, so what would
have to be done first is the making of these into proper EXOPs.

In any case, this is all future work.
In CAST<short>(IND<byte>(...)), "m_extendSrcSize" must be "1".

Modulo the above, stress runs looked clean.
Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's curious to me that there are so many x86 improvements. Do we not already removed most of these unnecessary casts there by retyping loads in morph?

@SingleAccretion
Copy link
Contributor Author

It's curious to me that there are so many x86 improvements. Do we not already removed most of these unnecessary casts there by retyping loads in morph?

I think most of these come from normalize-on-load stack parameters, which we do not retype.

@jakobbotsch
Copy link
Member

I think most of these come from normalize-on-load stack parameters, which we do not retype.

Doesn't morph introduce this IR shape itself in fgMorphLocalVar? So doing that is a large regression?

@SingleAccretion
Copy link
Contributor Author

Doesn't morph introduce this IR shape itself in fgMorphLocalVar? So doing that is a large regression?

It of course does, and it is, but it is necessary for in-register parameters (and on-stack ones as well, though that should be relatively easy to fix by using extending loads in genEnregisterIncomingStackArgs).

Morph's logic there is questionable in more than one way. If we know the local will be DNER at that point, we could retype it to its small type instead of introducing a cast. However, this turns out to sometimes not be a win because we don't CSE locals, but do CSE casts.

@jakobbotsch jakobbotsch merged commit 0469020 into dotnet:main Aug 24, 2022
@jakobbotsch
Copy link
Member

Makes sense about CSE. The way fgMorphLocalVar worked always seemed odd to me.

Anyway, thanks for the contribution as usual.

@DrewScoggins
Copy link
Member

DrewScoggins commented Aug 30, 2022

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Sep 1, 2022

Possible regressions:
ubuntu arm64: dotnet/perf-autofiling-issues#8221
ubuntu x64: dotnet/perf-autofiling-issues#8255
windows x64: dotnet/perf-autofiling-issues#8258

@ghost ghost locked as resolved and limited conversation to collaborators Oct 2, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants