Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avx512f #917

Merged
merged 25 commits into from
Sep 26, 2020
Merged

Avx512f #917

merged 25 commits into from
Sep 26, 2020

Conversation

minybot
Copy link
Contributor

@minybot minybot commented Sep 24, 2020

castps128_512, castps_pd, castps_si512, castps512_ps256, castps512_ps128, castps128_ps512, castps256_ps512
castpd_ps, castpd_si512, castpd512_pd256, castpd512_pd128, castpd128_pd512, castpd256_pd512
castsi512_si128, castsi512_si256, castsi128_si512, castsi256_si512, castsi512_ps, castsi512_pd

broadcastsd_pd, broadcastss_ps, broadcastd_epi32, broadcastq_epi64
broadcast_i32x4, broadcast_i64x4, broadcast_f32x4, broadcast_f64x4
addnot: epi32, epi64
insertf32x4, insertf64x4, inserti32x4, inserti64x4
mask_blend: epi32,epi64,ps,pd
unpackhi: epi32,epi64,ps,pd
unpacklo: epi32,epi64,ps,pd

@rust-highfive
Copy link

r? @Amanieu

(rust_highfive has picked a reviewer for you, use r? to override)

@minybot
Copy link
Contributor Author

minybot commented Sep 24, 2020

For boradcast_f32x4,i32x4,f64x4,i64x4
The instruction generated from Linux and msvc are totally different. (shuffle vs broadcast)
Any suggestion to solve the check errors?

@Amanieu
Copy link
Member

Amanieu commented Sep 24, 2020

Add const_fn_transmute to the feature list in lib.rs. This issue is not caused by you, it's from the latest Rust nightly.

#[target_feature(enable = "avx512f")]
#[cfg_attr(test, assert_instr(vbroadcas))] //should be vpbroadcastq
pub unsafe fn _mm512_broadcastq_epi64(a: __m128i) -> __m512i {
simd_shuffle8(a, a, [1, 1, 1, 1, 1, 1, 1, 1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be [0, 0, 0, 0, 0, 0, 0, 0]? The documentation says the low element is broadcast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be [0, 0, 0, 0, 0, 0, 0, 0]? The documentation says the low element is broadcast.

Yes. You are correct. Thanks.

#[target_feature(enable = "avx512f")]
#[cfg_attr(test, assert_instr(vperm))] //should be vpunpckhqdq
pub unsafe fn _mm512_unpackhi_epi64(a: __m512i, b: __m512i) -> __m512i {
simd_shuffle8(a, b, [2, 10, 3, 11, 2 + 4, 10 + 4, 3 + 4, 11 + 4])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct shuffle should be [1, 9, 3, 11, 5, 13, 7, 15].

You can get this by compiling the intrinsic with Clang and extracting the LLVM IR: https://godbolt.org/z/PzoKPz

b,
[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 12, 13, 14, 15],
),
_ => simd_shuffle16(a, b, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 17, 18, 19]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change this to panic if an invalid imm8 is passed in? Same for all the others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change this to panic if an invalid imm8 is passed in? Same for all the others.

As an example _mm512_inserti32x4(a, b, imm8)
In clang, it will check if imm8 is [0..3] in compile time. But in Rust we need to use constify_imm2 to check it in the run time.

But, constify_imm2 uses & 0b11 to make everything pass.
macro_rules! constify_imm2 {
($imm8:expr, $expand:ident) => {
#[allow(overflowing_literals)]
match ($imm8) & 0b11 {
0 => $expand!(0),
1 => $expand!(1),
2 => $expand!(2),
_ => $expand!(3),
}
};
}

So I need to create something like constify_imm2_sae?
match ($imm8)
0=>...
1=>...
...
_=> panic!(xxxxx)?

In x86_64: avx. Clang will check if index is [0..3], so we need to add the invalid check on this?
pub unsafe fn _mm256_insert_epi64(a: __m256i, i: i64, index: i32) -> __m256i {
let a = a.as_i64x4();
match index & 3 {
0 => transmute(simd_insert(a, 0, i)),
1 => transmute(simd_insert(a, 1, i)),
2 => transmute(simd_insert(a, 2, i)),
_ => transmute(simd_insert(a, 3, i)),
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or we just use "assert!(imm8 >= 0 && imm8 <= 255)"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the compiler would reject out-of-range values at compile-time, but rustc doesn't support this yet. So instead we should panic if an out-of-range value is passed. This allows us to reject out-of-range values in the future once rustc has that functionality.

You can either do the same as consitfy_imm2_sae or just assert!(imm8 <= 3).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I will do assert!(imm8 <= 3) and update a new version.

#[target_feature(enable = "avx512f")]
#[cfg_attr(test, assert_instr(vperm))] //should be vunpackhpd
pub unsafe fn _mm512_unpackhi_pd(a: __m512d, b: __m512d) -> __m512d {
simd_shuffle8(a, b, [2, 10, 3, 11, 2 + 4, 10 + 4, 3 + 4, 11 + 4])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct shuffle here should be [1, 9, 3, 11, 5, 13, 7, 15].

/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=512_mask_broadcastd_epi32&expand=546)
#[inline]
#[target_feature(enable = "avx512f")]
#[cfg_attr(test, assert_instr(vpbroadcast))] //should be vpbroadcastd
Copy link
Member

@Amanieu Amanieu Sep 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the one below generate the correct vpbroadcastd instruction. The comment is wrong.

/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=512_broadcast_i32x4&expand=510)
#[inline]
#[target_feature(enable = "avx512f")]
//#[cfg_attr(test, assert_instr(vbroadcast))] should be vbroadcasti32x4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment explaining that LLVM doesn't generate a vbroadcast instruction for this intrinsic. Please don't leave commented-out code lying around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment explaining that LLVM doesn't generate a vbroadcast instruction for this intrinsic. Please don't leave commented-out code lying around.

Yes, For boradcast_f32x4,i32x4,f64x4,i64x4
The instruction generated from Linux and msvc are totally different. (shuffle vs broadcast).
So I remove the test and add a comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you can remove the test and add a comment.

@Amanieu Amanieu merged commit 7895ab1 into rust-lang:master Sep 26, 2020
@minybot minybot deleted the avx512f branch September 26, 2020 15:31
jyn514 added a commit to jyn514/rust that referenced this pull request Oct 1, 2020
The primary purpose is to get the fixes from
rust-lang/stdarch#920
and rust-lang/stdarch#922.

The other changes included are
rust-lang/stdarch#917 and
rust-lang/stdarch#919.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants