Adding EVEX encoding support for emitOutputRRR(). #75934

DeepakRajendrakumaran · 2022-09-20T21:52:51Z

Overview

This change adds the following:

Adds a flag to turn on EVEX encoding: JitStressEVEXEncoding.
Updates emitOutputRRR(), as well as the other encoding logic it uses, to support EVEX encoding.

EVEX prefix

// 4-byte EVEX prefix = 62 <R, X, B, R', 0, 0, m, m> <W, v, v, v, v, 1, p, p> <z, L', L, b, V', a, a, a>
// - R, X, B, W - bits to express corresponding REX prefixes. Additionally, X combines with B to expand r/m to 32 SIMD registers
// - R' - combines with R to expand reg to 32 SIMD registers
// - mm - lower 2 bits of m-mmmmm (5-bit) in corresponding VEX prefix
// - vvvv (4-bits) - register specifier in 1's complement form; must be 1111 if unused
// - pp (2-bits) - opcode extension providing equivalent functionality of a SIMD size prefix
//                 these prefixes are treated as mandatory when used with escape opcode 0Fh for
//                 some SIMD instructions
//   00  - None   (0F    - packed float)
//   01  - 66     (66 0F - packed double)
//   10  - F3     (F3 0F - scalar float)
//   11  - F2     (F2 0F - scalar double)
// - z - bit to select zeroing (z=1) or merging (z=0) masking
// - L - scalar or 128-bit AVX operations (L=0), 256-bit operations (L=1)
// - L'- bit to support 512-bit operations or rounding control mode.
// - b - broadcast/rc/sae context
// - V'- bit to extend vvvv
// - aaa - specifies mask register

From the above, the following bits are stored in inverted form: R, X, B, R', vvvv, V'. This results in the following initial EVEX prefix mask: 62 <11110000> <01111100> <00001000> = 0x62F07C0800000000ULL

Since we are primarily interested in the RRR path, I have added the logic to set the following bits (some will need additional logic once we support zmm registers and/or additional vector registers):
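As a quick sanity check of that mask, here is a minimal sketch that packs the three payload bytes behind the 0x62 escape byte, assuming (as the 64-bit constant above implies) that the four prefix bytes live in the upper 32 bits of the value. `PackEvexPrefix` and the `kInitial*` names are hypothetical and are not the JIT's code.

```cpp
#include <cstdint>

// Hypothetical helper: places 0x62, P0, P1, P2 in bits 63..32 of a 64-bit value.
constexpr uint64_t PackEvexPrefix(uint8_t p0, uint8_t p1, uint8_t p2)
{
    return (uint64_t(0x62) << 56) | (uint64_t(p0) << 48) | (uint64_t(p1) << 40) | (uint64_t(p2) << 32);
}

// P0 = <R, X, B, R', 0, 0, m, m>: R, X, B, R' are stored inverted, so 1 means "not extended".
constexpr uint8_t kInitialP0 = 0b11110000; // 0xF0
// P1 = <W, v, v, v, v, 1, p, p>: vvvv = 1111 means "unused"; bit 2 is fixed to 1.
constexpr uint8_t kInitialP1 = 0b01111100; // 0x7C
// P2 = <z, L', L, b, V', a, a, a>: only the inverted V' bit is set.
constexpr uint8_t kInitialP2 = 0b00001000; // 0x08

constexpr uint64_t kDefaultEvexPrefix = PackEvexPrefix(kInitialP0, kInitialP1, kInitialP2);

// Compile-time check against the constant quoted in the description.
static_assert(kDefaultEvexPrefix == 0x62F07C0800000000ULL, "initial EVEX prefix mask");
```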

Added/Modified bit set logic

Vector Width - Done in AddEvexPrefix(). Currently we only set L, since we only support xmm and ymm.
To Do: once zmm is enabled, we need to set L'L.
EVEX.W - Done in AddRexWPrefix(). This is a legacy bit from VEX; we just needed to set the right bit in EVEX.
Choosing when to set the W bit is a little more interesting, since it can be an opcode extension. There are cases where W is WIG (ignored) for VEX but must be set for EVEX. For now, this is handled in IsWEvexOpcodeExtension(), which returns true if W=1.
EVEX.R - Done in AddRexRPrefix() via insEncodeReg345(). Encodes the target register.
To Do: once zmm is enabled, we need to set R'. This will require RegEncoding() to return 32 values.
EVEX.X - Done in AddRexBPrefix() via insEncodeReg012(). Encodes the 2nd source register.
EVEX.vvvv - Done in insEncodeReg3456(). The offset had to change to set the correct bits in the EVEX prefix. Encodes the 1st source register.
To Do: once zmm is enabled, we need to set V'. This will require RegEncoding() to return 32 values.
pp and mm bits - Use the same logic as VEX, but with a changed offset.
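To illustrate what "changing the offset" means for a few of these fields, here is a sketch that sets L, W, R and vvvv in a 64-bit code value. It assumes the same layout as the initial mask (0x62 | P0 | P1 | P2 in bits 63..32); the `SetEvex*` helpers and `kEvex*` masks are hypothetical and are not the JIT's AddRex*Prefix / insEncodeReg* implementations.

```cpp
#include <cassert>
#include <cstdint>

using code_t = uint64_t;

// Bit positions under the assumed layout (P0 = bits 55..48, P1 = 47..40, P2 = 39..32).
constexpr code_t kEvexR    = 1ULL << 55;   // P0 bit 7, stored inverted
constexpr code_t kEvexW    = 1ULL << 47;   // P1 bit 7, stored as-is
constexpr code_t kEvexVvvv = 0xFULL << 43; // P1 bits 6..3, 1's complement
constexpr code_t kEvexL    = 1ULL << 37;   // P2 bit 5

// Vector width: L = 1 selects 256-bit (ymm); zmm would need L'L instead.
code_t SetEvexL(code_t code, bool is256) { return is256 ? (code | kEvexL) : code; }

// W is not inverted, so it is simply ORed in.
code_t SetEvexW(code_t code) { return code | kEvexW; }

// R is inverted: extending the target register (bit 3 of its number) clears the stored bit.
code_t SetEvexR(code_t code, unsigned reg) { return (reg & 0x8) ? (code & ~kEvexR) : code; }

// vvvv holds the 1st source register in 1's complement form (xmm0..xmm15 only).
code_t SetEvexVvvv(code_t code, unsigned reg)
{
    assert(reg < 16); // registers 16..31 would also need V'
    return (code & ~kEvexVvvv) | (code_t(~reg & 0xF) << 43);
}
```

Because R and vvvv are inverted, "setting" them relative to the initial mask means clearing bits, which is why the initial mask starts with those bits at 1.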

Bits not set anywhere - b and z

Relevant Details

How it works now

IsAVX512OnlyInstruction() - Always returns false, since no AVX-512-only instructions have been added yet.
IsAVX512Instruction() - Is not a strict superset of IsAVXInstruction() and must be explicitly paired with a TakesEvexPrefix(ins) check. Some instructions can be VEX-encoded but not EVEX-encoded (e.g., pmovmskb).
Functions with SIMD in their name are now used as a superset of VEX, AVX, and AVX512; they are equivalent to VEXOrAVXorAVX512.
Currently, since we do not support opmask registers (k masks), we cannot EVEX-encode instructions whose EVEX form requires a k mask. See HasKMaskRegisterDest() (e.g., pcmpgtb: the EVEX-encoded form has a k register as its destination).
TakesEvexPrefix(ins) will have to grow to actually check these cases and will need the instruction descriptor.
codeEvexMigrationCheck() is a temporary check. It is required because we are not adding EVEX support for all paths at the same time, so common functions have to distinguish between cases where EVEX encoding is required and cases where VEX encoding is.
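The selection rule above can be modeled in a few lines. This is a self-contained sketch only: the `Ins` enum, the `InsInfo` table and the `stressEvex` parameter are hypothetical stand-ins for the JIT's instruction tables and config, and the function merely mirrors the shape of `IsAvx512Instruction(ins) && !HasKMaskRegisterDest(ins)` gated on the opt-in switch.

```cpp
#include <cassert>

// Hypothetical instruction subset used only for illustration.
enum class Ins { addps, pmovmskb, pcmpgtb };

struct InsInfo
{
    bool isAvx512;     // has an EVEX form
    bool hasKMaskDest; // EVEX form writes a k register
};

InsInfo Info(Ins ins)
{
    switch (ins)
    {
        case Ins::addps:    return {true, false};  // vaddps has an EVEX form
        case Ins::pmovmskb: return {false, false}; // VEX only, no EVEX form
        case Ins::pcmpgtb:  return {true, true};   // EVEX vpcmpgtb writes a k register
    }
    return {false, false};
}

// EVEX is used only when it is requested and the instruction has a usable EVEX form.
bool TakesEvexPrefix(Ins ins, bool stressEvex)
{
    InsInfo info = Info(ins);
    return stressEvex && info.isAvx512 && !info.hasKMaskDest;
}
```

With the switch off, every instruction falls back to its existing VEX encoding, which matches the "off by default" behavior described below.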

How to Test

This feature is turned off by default. To turn it on, set the following environment variable:

set COMPlus_JitStressEVEXEncoding=1

ghost · 2022-09-20T21:53:02Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

DeepakRajendrakumaran · 2022-09-20T22:35:13Z

@tannergooding

tannergooding · 2022-09-26T18:06:30Z

src/coreclr/jit/emitxarch.cpp

-#ifdef TARGET_AMD64
-    return emitter::code_t(code | 0x4800000000ULL);
-#else
-    assert(!"UNREACHED");
-    return code;
-#endif
-}
+#ifdef TARGET_AMD64
+    return emitter::code_t(code | 0x4800000000ULL);
+#else
+    assert(!"UNREACHED");
+    return code;
+#endif
+}


Do we know what changed as part of this diff? It looks identical to me, so maybe line endings or something else got changed by accident?

I went over this and re-did the changes to make it cleaner, but it still looks weird. Not sure what's going on.
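One way to confirm the line-ending theory is to compare the old and new text after normalizing CRLF to LF. The sketch below is a hypothetical helper, not part of the PR or of git; it only checks for the CRLF vs LF case, not other invisible differences such as trailing whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts every "\r\n" pair to "\n"; lone '\r' characters are kept.
std::string NormalizeLineEndings(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
            continue; // drop the '\r' of a CRLF pair
        out.push_back(text[i]);
    }
    return out;
}

// True when the texts differ, but only because of CRLF vs LF line endings.
bool DiffersOnlyInLineEndings(const std::string& a, const std::string& b)
{
    return a != b && NormalizeLineEndings(a) == NormalizeLineEndings(b);
}
```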

tannergooding · 2022-09-26T18:11:36Z

src/coreclr/jit/emitxarch.cpp


 #ifdef TARGET_AMD64
-
-emitter::code_t emitter::AddRexRPrefix(instruction ins, code_t code)


Just noting that it's unfortunate the diff is represented this way. It makes reviewing the changes around emitOutputSIMDPrefixIfNeeded much harder.

This might be related to that place I called out above where the diff is reporting a change where visibly there is none.

I tried to clean this up but could not find a way to do so. Might have to add the changes back manually as a last resort.

tannergooding · 2022-09-26T18:16:58Z

The changes generally LGTM.

There are a few comments on possible naming changes and where we need to add "method headers" describing the functions. There's also a place where the diff seems to be particularly bad and makes it harder to review.

No major changes needed from what I can tell, and it should be generally good to merge once we get the items raised above covered.

CC. @dotnet/jit-contrib

tannergooding · 2022-10-03T18:43:52Z

nit: Formatting job is failing

tannergooding · 2022-10-03T18:47:48Z

src/coreclr/jit/emitxarch.cpp

+        }
+    }
+
+    return IsAvx512OrPriorInstruction(ins);


This is going to be dependent on AVX512VL being supported for many instructions.

Does the other change resolve this issue for now? As we add new instructions, the logic in TakesEvexPrefix should evolve to handle newer cases.
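For context on the AVX512VL dependency: EVEX-encoding a 128-bit or 256-bit (xmm/ymm) form requires AVX512VL on top of AVX512F. A hypothetical sketch of such a gate follows; `CpuFeatures` and `CanEvexEncode` are illustrative names, and the sketch deliberately ignores AVX512BW/DQ, which some instructions additionally need.

```cpp
#include <cassert>

// Hypothetical feature flags, for illustration only.
struct CpuFeatures
{
    bool avx512f;
    bool avx512vl;
};

// xmm/ymm EVEX forms need AVX512VL; zmm forms of AVX512F instructions need only AVX512F.
bool CanEvexEncode(const CpuFeatures& cpu, unsigned vectorBits)
{
    if (!cpu.avx512f)
        return false;
    if (vectorBits == 128 || vectorBits == 256)
        return cpu.avx512vl;
    return vectorBits == 512;
}
```

Since only xmm and ymm are handled today, a stress switch that forces EVEX would effectively require AVX512VL on the test machine.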

The code still looks generally correct and works on my Intel Core i9-11900H (11th Gen Tiger Lake) and my AMD Ryzen 7950X (Zen4) CPU.

There is a concern around some of the EVEX checks and the new encoding stress switch in relation to AVX512VL.

@tannergooding Did the changes resolve all of your concerns or are there any other changes you're waiting on?

I think so, but I'll need to give it another pass when I'm back in office near the end of next week.

tannergooding · 2022-10-03T18:52:04Z

The code still looks generally correct and works on my Intel Core i9-11900H (11th Gen Tiger Lake) and my AMD Ryzen 7950X (Zen4) CPU.

There is a concern around some of the EVEX checks and the new encoding stress switch in relation to AVX512VL.

Adding flag to turn on EVEX encoding.

BruceForstall

I left a few comments, but they are minor, so I won't ask for updates in this change, but would appreciate consideration of them for follow-up changes.

Sorry it took me so long to look at this.

BruceForstall · 2022-10-25T23:58:01Z

src/coreclr/jit/jitconfigvalues.h

-                                                                   // defined or used in a multireg context.
+CONFIG_INTEGER(EnableMultiRegLocals, W("EnableMultiRegLocals"), 1)   // Enable the enregistration of locals that are
+                                                                     // defined or used in a multireg context.
+CONFIG_INTEGER(JitStressEVEXEncoding, W("JitStressEVEXEncoding"), 0) // Enable EVEX encoding for SIMD instructions.


Do we need this switch to be available in Release builds? If not, put it in an `#ifdef DEBUG` block (it's not currently in one AFAICT).

Suggested change

CONFIG_INTEGER(JitStressEVEXEncoding, W("JitStressEVEXEncoding"), 0) // Enable EVEX encoding for SIMD instructions.

#ifdef DEBUG

CONFIG_INTEGER(JitStressEVEXEncoding, W("JitStressEVEXEncoding"), 0) // Use EVEX encoding for SIMD instructions if possible.

#endif // DEBUG
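If the switch is made DEBUG-only, code that consumes it also needs a Release fallback. A minimal sketch of that pattern, with hypothetical names (`s_jitStressEVEXEncoding`, `StressEvexEncodingEnabled`); the JIT reads CONFIG_INTEGER values through its own config machinery rather than a static variable.

```cpp
#include <cassert>

#ifdef DEBUG
// Hypothetical stand-in for the configured value; defaults to 0 (off).
static int s_jitStressEVEXEncoding = 0;
#endif

bool StressEvexEncodingEnabled()
{
#ifdef DEBUG
    return s_jitStressEVEXEncoding != 0;
#else
    return false; // Release: the switch is compiled out, so EVEX stress is never on
#endif
}
```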

BruceForstall · 2022-10-26T00:11:29Z

src/coreclr/jit/compiler.h

+    // canUseEvexEncoding - Answer the question: Is Evex encoding supported on this target.
+    //
+    // Returns:
+    //    TRUE if Evex encoding is supported, FALSE if not.


nit: for bool returning functions, write the comment using true and false that match the casing of the C++ keywords, and not TRUE and FALSE. For some of us, TRUE and FALSE map to the Win32 concepts with BOOL type. You can enclose them in single back-ticks like in GitHub markdown formatting if that helps distinguish them from other text, e.g.:

// `true` if Evex encoding is supported, `false` if not.

nit2: please include a blank // line at the end of the header comment block before the function so the comment text doesn't "run into" the function declaration.
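Applied to the header above, both nits would look roughly like this. The surrounding `Compiler` struct and its `avx512Supported` field are hypothetical stand-ins so the example compiles on its own; they are not the JIT's actual implementation of canUseEvexEncoding.

```cpp
#include <cassert>

struct Compiler
{
    bool avx512Supported = false; // hypothetical stand-in for the real ISA query

    // canUseEvexEncoding - Answer the question: Is EVEX encoding supported on this target?
    //
    // Returns:
    //    `true` if EVEX encoding is supported, `false` if not.
    //
    bool canUseEvexEncoding() const
    {
        return avx512Supported;
    }
};
```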

BruceForstall · 2022-10-26T00:16:53Z

src/coreclr/jit/emitxarch.cpp

+// Returns:
+//    TRUE if ins is a SIMD instruction.
+//
+bool emitter::IsSimdInstruction(instruction ins) const


Seems like an odd name: aren't SSE2 instructions "SIMD"?

Maybe IsAvxOrNewerInstruction?

BruceForstall · 2022-10-26T00:22:48Z

src/coreclr/jit/emitxarch.cpp

+    return IsAvx512Instruction(ins) && !HasKMaskRegisterDest(ins);
+}
+
+// Add base EVEX prefix without setting W, R, X, or B bits


It would be useful to leave a reference here so people can find the definition of EVEX encoding. Something like:

// Intel AVX-512 encoding is defined in "Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2", Section 2.6.

dotnet-issue-labeler bot added the area-CodeGen-coreclr label Sep 20, 2022

ghost added the community-contribution label Sep 20, 2022

DeepakRajendrakumaran force-pushed the avx512-RRR branch 2 times, most recently from b9e433d to 63056c7 September 21, 2022 17:54

dakersnar added the avx512 label Sep 21, 2022

DeepakRajendrakumaran force-pushed the avx512-RRR branch from 63056c7 to 764eb58 September 21, 2022 22:46

tannergooding reviewed Sep 22, 2022

src/coreclr/jit/compiler.cpp Outdated

tannergooding reviewed Sep 22, 2022

src/coreclr/jit/compiler.cpp

tannergooding reviewed Sep 22, 2022

src/coreclr/jit/emitxarch.h

tannergooding reviewed Sep 22, 2022

src/coreclr/jit/emitxarch.cpp Outdated