Emit AVX-512 vector instructions #8264

itsamelambda · 2017-05-31T14:23:52Z

With the announcement of Skylake-X, AVX-512 is going mainstream.

The CLR should emit AVX-512 vector instructions that System.Numerics.Vector can use.

category:cq
theme:vector-codegen
skill-level:expert
cost:extra-large

jkotas · 2017-05-31T20:33:11Z

cc @CarolEidt

RussKeldorph · 2017-05-31T23:23:22Z

@dotnet/jit-contrib

oscarbg · 2018-04-11T22:12:20Z

News on this?

CarolEidt · 2018-04-11T22:25:55Z

This is likely to be a large work item, and I don't know how high this will land on the priority list for future work.
Much of the work will be up-front investigation work, and that could be particularly amenable to community contribution:

What are the VM requirements? That is, how does the VM communicate with the various OSes, not just to identify whether the OS supports the feature, but also to determine how context is or should be saved?
What are the ABI implications? That is, how are 512-bit vectors passed in the standard calling convention (on each OS), and how do they affect a potential future implementation of the vector calling convention?
Do the AVX-512 instructions have characteristics that must be modeled by the JIT and that are not exposed by existing hardware intrinsics?
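
As a rough illustration of the first question, here is a minimal sketch in C of the usual two-part check: the CPU must report AVX-512 Foundation through CPUID, and the OS must have enabled saving of the opmask and ZMM state in XCR0. The function and macro names are hypothetical and are not taken from the CLR. The raw register values are passed in as arguments so the logic stays portable; on real hardware they would come from CPUID leaf 7 and XGETBV, after first confirming OSXSAVE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XCR0 state-component bits (hypothetical macro names, bit positions per the Intel SDM). */
#define XCR0_SSE        (1ull << 1)  /* XMM registers */
#define XCR0_AVX        (1ull << 2)  /* upper halves of the YMM registers */
#define XCR0_OPMASK     (1ull << 5)  /* k0-k7 mask registers */
#define XCR0_ZMM_HI256  (1ull << 6)  /* upper halves of ZMM0-ZMM15 */
#define XCR0_HI16_ZMM   (1ull << 7)  /* ZMM16-ZMM31 */

#define XCR0_AVX512_STATE \
    (XCR0_SSE | XCR0_AVX | XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM)

/* CPUID.(EAX=7,ECX=0):EBX bit 16 reports AVX-512 Foundation. */
static bool cpu_reports_avx512f(uint32_t cpuid7_ebx)
{
    return (cpuid7_ebx & (1u << 16)) != 0;
}

/* The OS must enable every state component that AVX-512 code touches. */
static bool os_saves_avx512_state(uint64_t xcr0)
{
    return (xcr0 & XCR0_AVX512_STATE) == XCR0_AVX512_STATE;
}

/* Both conditions must hold before AVX-512 instructions may be emitted. */
static bool avx512f_usable(uint32_t cpuid7_ebx, uint64_t xcr0)
{
    return cpu_reports_avx512f(cpuid7_ebx) && os_saves_avx512_state(xcr0);
}

/* Small self-check: an OS that only enables SSE and AVX state is not enough. */
static void avx512_detect_self_check(void)
{
    assert(avx512f_usable(1u << 16, 0xE7));
    assert(!avx512f_usable(1u << 16, 0x07));
}
```

This only covers detection. How each OS actually saves and restores the wider context across thread switches, signals, and exceptions is the part of the question that still needs investigation.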

zhongkaifu · 2018-06-14T22:58:09Z

Can you please prioritize it? Our project depends heavily on System.Numerics.Vector, and we are using an Intel(R) Xeon(R) Platinum 8168 CPU (Skylake).

tannergooding · 2018-06-14T23:18:07Z

@zhongkaifu, do you have any numbers on how much of a performance increase AVX-512 is (both for a specific workload, and for applications as a whole)?

zhongkaifu · 2018-06-14T23:40:46Z

@tannergooding Our existing code gets a 2x performance increase going from 128-bit to 256-bit vectors. As for AVX-512, since System.Numerics.Vector doesn't support it yet, we cannot test it.

tannergooding · 2018-06-15T14:06:03Z

@zhongkaifu, it may be worth getting some experimental numbers using native code. Having some information showing that this improves your overall scenario would help to prioritize the work appropriately.

Like with many new ISAs or alternative algorithms, using AVX-512 isn't always a clear-cut perf gain, and benchmarking/profiling is important. Depending on the processor, workload, etc., AVX-512 instructions can actually reduce the frequency of your processor (temporarily) and hurt the overall performance of the process (or of other processes).

A simple search will show some blog posts from various consumers and some technical sheets from Intel which describe both the benefits and the drawbacks that AVX-512 can bring.

You might want to see the following from Cloudflare: https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/

and this spec update from Intel (see Erratum 24, and others): https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-spec-update.html

zhongkaifu · 2018-06-15T18:17:41Z

Thanks @tannergooding. I've read these articles, and they are really helpful. I wasn't aware of this problem before. I may use MKL to run some tests and figure out how much gain we can get.

EgorBo · 2019-01-03T23:16:36Z

https://godbolt.org/z/bX3h2h
I ported a piece of HashCode.cs to C and allowed clang to use AVX-512.
It emitted vprold (instead of vpsrld, vpslld, vpor) 🙂 so I could try to vectorize it in C#:

public static int Combine(Vector128<uint> values)
{
    Vector128<uint> hash = seedVec;

    hash = Sse2.Add(hash, Sse41.MultiplyLow(values, Vector128.Create(Prime2)));

    // these three instructions could be a single `vprold` with AVX-512
    hash = Sse2.Or(
        Sse2.ShiftLeftLogical(hash, 13),
        Sse2.ShiftRightLogical(hash, 19));

    hash = Sse41.MultiplyLow(hash, Vector128.Create(Prime1));

    // same here - `vprold`
    hash = Sse2.Or(
        Avx2.ShiftLeftLogicalVariable(hash, Vector128.Create(1u, 7u, 12u, 18u)),
        Avx2.ShiftRightLogicalVariable(hash, Vector128.Create(31u, 25u, 20u, 14u)));

    // horizontal sum and add 16 to the result
    var hashAsInt32 = hash.AsInt32();
    hashAsInt32 = Ssse3.HorizontalAdd(hashAsInt32, hashAsInt32);
    hashAsInt32 = Ssse3.HorizontalAdd(hashAsInt32, hashAsInt32);
    var sum16 = Sse41.Extract(hashAsInt32.AsUInt32(), 0) + 16;

    return (int)MixFinal(sum16);
}
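
For readers unfamiliar with the pattern, the following is a minimal scalar sketch in C of what the shift/shift/or triples above compute: a 32-bit rotate left, per lane. The helper names are hypothetical and are not part of HashCode.cs. The second triple uses a different count in each lane (1, 7, 12, 18), so the matching single AVX-512 instruction there would presumably be the variable-count form, `vprolvd`, rather than the immediate-count `vprold`.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left written as the shift/shift/or triple from the C# snippet. */
static uint32_t rotl32_shift_or(uint32_t x, unsigned r)
{
    r &= 31u;                /* keep the count in 0..31 */
    if (r == 0)
        return x;            /* avoid an undefined shift by 32 */
    return (x << r) | (x >> (32u - r));
}

/* Per-lane rotate over four 32-bit lanes, mirroring a Vector128<uint>. */
static void rotl32x4(uint32_t lanes[4], const unsigned counts[4])
{
    for (int i = 0; i < 4; ++i)
        lanes[i] = rotl32_shift_or(lanes[i], counts[i]);
}

/* Small self-check: rotating 1 left by 13 sets bit 13. */
static void rotl32_self_check(void)
{
    assert(rotl32_shift_or(1u, 13) == (1u << 13));
}
```

The left and right shift counts in each pair sum to 32 (13 + 19, 1 + 31, 7 + 25, and so on), which is what makes each pair a rotation and therefore a candidate for a single rotate instruction.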

BruceForstall · 2022-10-30T20:40:41Z

I'm going to close this in favor of #77034

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

Symbai mentioned this issue May 3, 2020

AVX-512 support in System.Runtime.Intrinsics.X86 #35773

Closed

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

BruceForstall added the avx512 Related to the AVX-512 architecture label Oct 13, 2022

BruceForstall removed arch-x86 arch-x64 JitUntriaged CLR JIT issues needing additional triage labels Oct 26, 2022

BruceForstall closed this as completed Oct 30, 2022

ghost locked as resolved and limited conversation to collaborators Nov 30, 2022
