
Adding EVEX encoding support for emitOutputRRR(). #75934

Merged
1 commit merged into dotnet:main on Oct 26, 2022


DeepakRajendrakumaran (Contributor)

Overview

This change adds the following:

  1. A flag to turn on EVEX encoding: JitStressEVEXEncoding.
  2. Updates to emitOutputRRR(), and to the other encoding logic it calls, to support EVEX encoding.

EVEX prefix

// 4-byte EVEX prefix = 62 <R, X, B, R', 0, 0, m, m> <W, v, v, v, v, 1, p, p> <z, L', L, b, V', a, a, a>
// - R, X, B, W - bits to express corresponding REX prefixes. Additionally, X combines with B to expand r/m to 32 SIMD registers
// - R' - combines with R to expand reg to 32 SIMD registers
// - mm - lower 2 bits of m-mmmmm (5-bit) in corresponding VEX prefix
// - vvvv (4-bits) - register specifier in 1's complement form; must be 1111 if unused
// - pp (2-bits) - opcode extension providing equivalent functionality of a SIMD size prefix
//                 these prefixes are treated as mandatory when used with escape opcode 0Fh for
//                 some SIMD instructions
//   00  - None   (0F    - packed float)
//   01  - 66     (66 0F - packed double)
//   10  - F3     (F3 0F - scalar float)
//   11  - F2     (F2 0F - scalar double)
// - z - bit to select zeroing (z=1) or merging (z=0) masking mode
// - L - scalar or AVX-128 bit operations (L=0), 256-bit operations (L=1)
// - L'- bit to support 512-bit operations or rounding control mode
// - b - broadcast/rc/sae context
// - V'- bit to extend vvvv
// - aaa - specifies mask register

From the above, the following bits are stored in inverted form: R, X, B, R', vvvv, and V'. Setting all of them to 1 (unextended/unused), together with the fixed 1 bit in the second payload byte, gives the initial EVEX prefix mask 62 <11110000> <01111100> <00001000> = 0x62F07C0800000000ULL.
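As a check on the arithmetic, the initial mask can be rebuilt from the field layout. This is a minimal sketch, assuming (as the 0x62F07C0800000000ULL value implies) that the 0x62 byte and the three payload bytes occupy the top 32 bits of a 64-bit code word, leaving the low 32 bits for the opcode; MakeInitialEvexMask() is a hypothetical name, not a JIT function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the JIT's code): rebuild the initial EVEX prefix
// mask from the field layout. Assumed layout of the 64-bit code word:
//   bits 63-56: 0x62, 55-48: P0, 47-40: P1, 39-32: P2, 31-0: opcode bytes.
constexpr uint64_t MakeInitialEvexMask()
{
    // P0 = <R, X, B, R', 0, 0, m, m>: inverted R, X, B, R' start as 1; mm = 00.
    constexpr uint64_t p0 = 0b11110000; // 0xF0
    // P1 = <W, v, v, v, v, 1, p, p>: W = 0, vvvv = 1111 (unused), fixed 1, pp = 00.
    constexpr uint64_t p1 = 0b01111100; // 0x7C
    // P2 = <z, L', L, b, V', a, a, a>: only the inverted V' starts as 1.
    constexpr uint64_t p2 = 0b00001000; // 0x08
    return (0x62ULL << 56) | (p0 << 48) | (p1 << 40) | (p2 << 32);
}

static_assert(MakeInitialEvexMask() == 0x62F07C0800000000ULL,
              "matches the initial mask quoted in the description");
```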

Since we are primarily interested in the RRR path, I have added the logic to set the following bits (some will need additional logic once we support zmm registers and/or additional vector registers):

Added/Modified bit set logic

  • Vector width - Done in AddEvexPrefix(). Currently we set only L, since only xmm and ymm are supported.
    To Do: once zmm is enabled, L'L needs to be set.

  • EVEX.W - Done in AddRexWPrefix(). This bit is carried over from VEX; we just needed to set the right bit in the EVEX prefix.
    Choosing when to set W is more interesting, since W can act as an opcode extension. There are cases where W is WIG (ignored) for VEX but must be set for EVEX. For now, this is handled in IsWEvexOpcodeExtension(), which returns true if W=1 is required.

  • EVEX.R - Done in AddRexRPrefix() via insEncodeReg345(). Encodes the target register.
    To Do: once zmm is enabled, R' needs to be set. This will require RegEncoding() to return 32 values.

  • EVEX.X - Done in AddRexBPrefix() via insEncodeReg012(). Encodes the 2nd source register.

  • EVEX.vvvv - Done in insEncodeReg3456(). The offset had to change to set the correct bits in the EVEX prefix. Encodes the 1st source register.
    To Do: once zmm is enabled, V' needs to be set. This will require RegEncoding() to return 32 values.

  • pp and mm bits - Use the same logic as VEX, but with a changed offset.

Bits not set anywhere yet: b and z.
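The bit-setting steps above can be sketched as plain bit operations on a 64-bit code word. This is a minimal sketch, not the JIT's implementation: it assumes the layout implied by the initial mask (0x62 in bits 63-56, then P0, P1, P2, with the opcode in the low 32 bits), and every helper name is hypothetical. Because R, B, and vvvv are stored inverted, selecting a register clears bits rather than setting them. For the 2nd source register the sketch uses EVEX.B, the bit that extends ModRM.rm for registers 8-15; EVEX.X only comes into play as the fifth r/m bit for registers 16-31.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating where each field lives, assuming:
//   bits 63-56 = 0x62, 55-48 = P0, 47-40 = P1, 39-32 = P2, 31-0 = opcode.
// Names are illustrative, not the JIT's functions.
using code_t = uint64_t;

constexpr code_t kEvexInitial = 0x62F07C0800000000ULL;

// Inverted fields: the bit starts at 1 and is cleared to select a high register.
constexpr code_t kEvexR = 1ULL << 55; // extends ModRM.reg (target register)
constexpr code_t kEvexB = 1ULL << 53; // extends ModRM.rm (2nd source register)
// Non-inverted fields: the bit starts at 0 and is set when needed.
constexpr code_t kEvexW = 1ULL << 47;
constexpr code_t kEvexL = 1ULL << 37; // 256-bit vector length

code_t SetW(code_t code) { return code | kEvexW; }
code_t SetVectorLength256(code_t code) { return code | kEvexL; }

// regEnc is the 4-bit encoding (0-15) of an xmm/ymm register.
code_t EncodeTargetReg(code_t code, unsigned regEnc)
{
    return (regEnc & 0x8) ? (code & ~kEvexR) : code;
}

code_t EncodeSecondSrcReg(code_t code, unsigned regEnc)
{
    return (regEnc & 0x8) ? (code & ~kEvexB) : code;
}

// vvvv (bits 46-43) holds the 1st source register in 1's complement form:
// starting from 1111, clearing the bits that are 1 in regEnc leaves ~regEnc.
code_t EncodeFirstSrcReg(code_t code, unsigned regEnc)
{
    return code & ~(code_t(regEnc & 0xF) << 43);
}

// pp (bits 41-40) and mm (bits 49-48) reuse the VEX values at new offsets.
code_t SetPp(code_t code, unsigned pp) { return code | (code_t(pp & 0x3) << 40); }
code_t SetMm(code_t code, unsigned mm) { return code | (code_t(mm & 0x3) << 48); }
```

For example, EncodeTargetReg(SetVectorLength256(kEvexInitial), 9) yields 0x62707C2800000000ULL: L is set in P2 (0x08 becomes 0x28) and the inverted R bit is cleared in P0 (0xF0 becomes 0x70).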

Relevant Details

How it works now

  • IsAVX512OnlyInstruction() - Always returns false, since AVX-512 instructions haven't been added yet.

  • IsAVX512Instruction() - Is not a strict superset of IsAVXInstruction() and must be explicitly paired with a TakesEvexPrefix(ins) check, because some instructions can be VEX-encoded but not EVEX-encoded (e.g., pmovmskb).

  • Functions with SIMD in their name are now used as a superset of VEX, AVX, and AVX512; they are equivalent to VEXOrAVXorAVX512.

  • Currently, since we do not support opmask registers (k masks), we cannot EVEX-encode instructions whose EVEX form requires a k mask. See HasKMaskRegisterDest() (e.g., pcmpgtb: its EVEX-encoded form has a k register as the destination).

  • TakesEvexPrefix(ins) will have to grow to check actual cases, and it will need the instruction descriptor.

  • codeEvexMigrationCheck() is a temporary check. It is required because EVEX support is not being added to all paths at the same time, so common functions have to distinguish between cases where EVEX encoding is required and cases where VEX encoding is.
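The pairing described above (an AVX-512 check combined with the k-mask exclusion) can be modeled with a toy instruction table. This is a simplified, hypothetical sketch: the Ins enum, InsInfo, GetInsInfo(), and TakesEvexPrefixSketch() are illustrative names, not the JIT's, and the assumption that the check is only consulted when the stress switch is on is ours. Only the shape of the check, `IsAvx512Instruction(ins) && !HasKMaskRegisterDest(ins)`, follows the source.

```cpp
#include <cassert>

// Hypothetical toy instruction set covering the three cases discussed above.
enum class Ins { addps, pmovmskb, pcmpgtb };

struct InsInfo
{
    bool isAvx512;     // has an EVEX form
    bool hasKMaskDest; // EVEX form writes a k (opmask) register
};

InsInfo GetInsInfo(Ins ins)
{
    switch (ins)
    {
        case Ins::addps:    return {true, false};
        case Ins::pmovmskb: return {false, false}; // VEX-encodable only
        case Ins::pcmpgtb:  return {true, true};   // EVEX form needs a k register
    }
    return {false, false};
}

// Mirrors `IsAvx512Instruction(ins) && !HasKMaskRegisterDest(ins)`,
// gated by an assumed stress switch (off by default).
bool TakesEvexPrefixSketch(Ins ins, bool stressEvexEncoding)
{
    if (!stressEvexEncoding)
    {
        return false; // keep the existing VEX encoding
    }
    InsInfo info = GetInsInfo(ins);
    return info.isAvx512 && !info.hasKMaskDest;
}
```

With the switch on, only addps takes the EVEX path: pmovmskb has no EVEX form, and pcmpgtb's EVEX form needs k-mask support that does not exist yet.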

How to Test

This feature is turned off by default. To turn it on, set the following environment variable:

set COMPlus_JitStressEVEXEncoding=1

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Sep 20, 2022
ghost added the community-contribution label (Indicates that the PR has been added by a community member) on Sep 20, 2022
ghost commented Sep 20, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: DeepakRajendrakumaran
Assignees: -
Labels: area-CodeGen-coreclr

Milestone: -

DeepakRajendrakumaran (Contributor Author)

@tannergooding

Comment on lines 870 to 1145
#ifdef TARGET_AMD64
return emitter::code_t(code | 0x4800000000ULL);
#else
assert(!"UNREACHED");
return code;
#endif
}
#ifdef TARGET_AMD64
return emitter::code_t(code | 0x4800000000ULL);
#else
assert(!"UNREACHED");
return code;
#endif
}
Member

Do we know what changed as part of this diff? It looks identical to me, so maybe line endings or something else got changed by accident?

Contributor Author

I went over this and redid the changes to make them cleaner, but it still looks weird. Not sure what's going on.


#ifdef TARGET_AMD64

emitter::code_t emitter::AddRexRPrefix(instruction ins, code_t code)
tannergooding (Member), Sep 26, 2022

Just noting that it's unfortunate the diff is represented this way. It makes reviewing the changes around emitOutputSIMDPrefixIfNeeded much harder.

This might be related to that place I called out above where the diff is reporting a change where visibly there is none.

Contributor Author

I tried to clean this up but could not find a way to do so. Might have to add the changes back manually as a last resort.

tannergooding (Member)

The changes generally LGTM.

There are a few comments on possible naming changes and where we need to add "method headers" describing the functions. There's also a place where the diff seems to be particularly bad and makes it harder to review.

No major changes needed from what I can tell, and it should be generally good to merge once we get the items raised above covered.

CC. @dotnet/jit-contrib

tannergooding (Member)

nit: Formatting job is failing

}
}

return IsAvx512OrPriorInstruction(ins);
Member

This is going to be dependent on AVX512VL being supported for many instructions.

Contributor Author

Does the other change resolve this issue for now? As we add new instructions, the logic in TakesEvexPrefix should evolve to handle newer cases.
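To make the AVX512VL concern concrete: EVEX-encoded forms that operate on 128-bit (xmm) or 256-bit (ymm) vectors require AVX512VL on top of AVX512F, while 512-bit (zmm) forms need only AVX512F. The following is a minimal, hypothetical sketch of gating on that; CpuFeatures and CanUseEvexForVectorSize are illustrative names, not JIT APIs, and many instructions additionally depend on subsets such as AVX512BW or AVX512DQ, which this sketch ignores.

```cpp
#include <cassert>

// Hypothetical CPU feature flags (illustrative only).
struct CpuFeatures
{
    bool avx512f;
    bool avx512vl;
};

// Returns true if an EVEX form at the given vector width is usable,
// considering only the AVX512F/AVX512VL split.
bool CanUseEvexForVectorSize(const CpuFeatures& cpu, unsigned vectorBytes)
{
    if (!cpu.avx512f)
    {
        return false; // no AVX-512 at all
    }
    if (vectorBytes == 16 || vectorBytes == 32) // xmm or ymm
    {
        return cpu.avx512vl; // 128/256-bit EVEX forms need AVX512VL
    }
    return vectorBytes == 64; // zmm needs only AVX512F
}
```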

Contributor Author

The code still looks generally correct and works on my Intel Core i9-11900H (11th Gen Tiger Lake) and my AMD Ryzen 7950X (Zen4) CPU.

There is a concern around some of the EVEX checks and the new encoding stress switch in relation to AVX512VL.

@tannergooding, did the changes resolve all of your concerns, or are there any other changes you're waiting on?

Member

I think so, but I'll need to give it another pass when I'm back in office near the end of next week.

tannergooding (Member)

The code still looks generally correct and works on my Intel Core i9-11900H (11th Gen Tiger Lake) and my AMD Ryzen 7950X (Zen4) CPU.

There is a concern around some of the EVEX checks and the new encoding stress switch in relation to AVX512VL.

Adding flag to turn on EVEX encoding.
BruceForstall (Member) left a comment

I left a few comments. They are minor, so I won't ask for updates in this change, but I would appreciate your considering them in follow-up changes.

Sorry it took me so long to look at this.

// defined or used in a multireg context.
CONFIG_INTEGER(EnableMultiRegLocals, W("EnableMultiRegLocals"), 1) // Enable the enregistration of locals that are
// defined or used in a multireg context.
CONFIG_INTEGER(JitStressEVEXEncoding, W("JitStressEVEXEncoding"), 0) // Enable EVEX encoding for SIMD instructions.
Member

Do we need this switch to be available in Release builds? If not, put it in an "#ifdef DEBUG" block (it's not currently in one, AFAICT).

Suggested change
CONFIG_INTEGER(JitStressEVEXEncoding, W("JitStressEVEXEncoding"), 0) // Enable EVEX encoding for SIMD instructions.
#ifdef DEBUG
CONFIG_INTEGER(JitStressEVEXEncoding, W("JitStressEVEXEncoding"), 0) // Use EVEX encoding for SIMD instructions if possible.
#endif // DEBUG
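One consequence of the suggestion above: if the switch is defined only under DEBUG, the code that consults it also needs a DEBUG guard, so release builds fold the check to false. This is a hypothetical sketch; ReadJitStressEVEXEncoding() is a placeholder standing in for the real config lookup, not a JIT API.

```cpp
#include <cassert>

#ifdef DEBUG
// Placeholder for the CONFIG_INTEGER lookup (default 0); hypothetical.
static int ReadJitStressEVEXEncoding()
{
    return 0;
}
#endif // DEBUG

// In release builds the stress path compiles out entirely.
bool StressEvexEncodingEnabled()
{
#ifdef DEBUG
    return ReadJitStressEVEXEncoding() != 0;
#else
    return false;
#endif // DEBUG
}
```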

// canUseEvexEncoding - Answer the question: Is Evex encoding supported on this target.
//
// Returns:
// TRUE if Evex encoding is supported, FALSE if not.
Member

nit: for bool-returning functions, write the comment using true and false, matching the casing of the C++ keywords, and not TRUE and FALSE. For some of us, TRUE and FALSE map to the Win32 concepts with the BOOL type. You can enclose them in single back-ticks, as in GitHub markdown formatting, if that helps distinguish them from other text, e.g.:

 //    `true` if Evex encoding is supported, `false` if not.

nit2: please include a blank // line at the end of the header comment block before the function so the comment text doesn't "run into" the function declaration.
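Applying both nits to the header quoted above gives something like the following. The Arguments section and the hasAvx512 parameter are hypothetical, added only so the placeholder body has something to return; they stand in for whatever ISA check the real function performs.

```cpp
#include <cassert>

// canUseEvexEncoding - Answer the question: Is Evex encoding supported on this target.
//
// Arguments:
//    hasAvx512 - hypothetical stand-in for the real ISA support check
//
// Returns:
//    `true` if Evex encoding is supported, `false` if not.
//
bool canUseEvexEncoding(bool hasAvx512)
{
    return hasAvx512;
}
```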

// Returns:
// TRUE if ins is a SIMD instruction.
//
bool emitter::IsSimdInstruction(instruction ins) const
Member

Seems like an odd name: aren't SSE2 instructions "SIMD"?

Maybe IsAvxOrNewerInstruction?

return IsAvx512Instruction(ins) && !HasKMaskRegisterDest(ins);
}

// Add base EVEX prefix without setting W, R, X, or B bits
Member

It would be useful to leave a reference here so people can find the definition of EVEX encoding. Something like:

// Intel AVX-512 encoding is defined in "Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2", Section 2.6.

BruceForstall merged commit 7b5ab35 into dotnet:main on Oct 26, 2022
ghost locked as resolved and limited conversation to collaborators on Nov 25, 2022
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), avx512 (Related to the AVX-512 architecture), community-contribution (Indicates that the PR has been added by a community member)

5 participants