Move from #[inline(always)] to #[inline] (#306)
* Move from #[inline(always)] to #[inline]

This commit blanket-changes all `#[inline(always)]` annotations to `#[inline]`.
Fear not, though: this should not be a regression! To clarify, this change is
made for correctness, to ensure that we don't hit stray LLVM errors.

Most of the LLVM intrinsics and various LLVM functions we actually lower down
to only work correctly if they are invoked from a function with an appropriate
target-feature set. For example, if we invoke an AVX intrinsic out of the blue,
we get a [codegen error][avx-error]. This error comes about because the
surrounding function doesn't enable the AVX feature. Now, in general we don't
have a lot of control over how this crate is consumed by downstream crates, and
it'd be pretty bad if every such mistake showed up as a scary, un-debuggable
codegen error in LLVM!
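
To make the failure mode concrete, here is a minimal sketch of the problematic
shape, written against today's `std::arch::x86_64` names purely for
illustration (the linked playground gist is the actual reproduction):

```rust
// Illustrative sketch only; assumes an x86_64 target built without
// `-C target-feature=+avx2`.
use std::arch::x86_64::{__m256i, _mm256_add_epi32, _mm256_setzero_si256};

// No `#[target_feature(enable = "avx2")]` here, so this function's own
// feature set does not include the feature the intrinsics below require.
unsafe fn add_out_of_the_blue() -> __m256i {
    let zero = _mm256_setzero_si256();
    // If the intrinsic wrapper is force-inlined into this non-AVX2 function,
    // LLVM must emit AVX2 instructions in a function that doesn't allow them,
    // and that surfaces as a codegen error rather than a nice diagnostic.
    _mm256_add_epi32(zero, zero)
}
```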

On the other side of this issue, *we*, as the invokers of these intrinsics, are
"doing the right thing": all the functions in this crate are tagged with the
appropriate target features so that they are codegen'd correctly. Indeed, we
have plenty of tests asserting that we can codegen everything across multiple
platforms!

The error here comes about precisely because of the `#[inline(always)]`
attribute. Typically LLVM *won't* inline functions across target-feature sets.
For example, if a normal function calls a function that enables AVX2, then the
callee, no matter how small, won't be inlined into the caller. This is done for
correctness (register preservation and all that), but it is also how these
codegen errors are prevented in practice.
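
A hedged sketch of that normal, safe arrangement (`sum_avx2` and `sum` are
made-up names, and the runtime check uses the `is_x86_feature_detected!` macro
from today's standard library for illustration):

```rust
// Made-up example of a cross-feature call that LLVM handles correctly.
#[inline] // only a hint; LLVM may still decline to inline across feature sets
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[i32]) -> i32 {
    // An AVX2-accelerated body would go here; a plain sum stands in for it.
    xs.iter().sum()
}

fn sum(xs: &[i32]) -> i32 {
    if is_x86_feature_detected!("avx2") {
        // This call crosses a target-feature boundary, so LLVM keeps it as a
        // real call instead of inlining the AVX2 body into this plain
        // function -- which is exactly what keeps the codegen correct.
        unsafe { sum_avx2(xs) }
    } else {
        xs.iter().sum()
    }
}
```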

We in stdsimd, however, are currently tagging every function with "always
inline this, no matter what". That apparently ends up bypassing the "is this
even possible to inline?" check. In turn, we start inlining things like AVX
intrinsics into functions that can't actually call AVX intrinsics, creating
codegen errors at compile time.

So, with all that motivation, this commit switches these functions to the
normal inline hint, plain `#[inline]`, instead of `#[inline(always)]`. Now, for
the stdsimd crate it is absolutely critical that all of these functions are
inlined to have good performance. Using `#[inline]`, however, shouldn't hamper
that!

The compiler will recognize the `#[inline]` attribute and make sure that each
of these functions is a *candidate* for being inlined into any and all
downstream codegen units (in other words, if `#[inline]` were missing, LLVM
often wouldn't even have the definition available to inline). After that,
though, we're relying on LLVM to inline these functions naturally, as opposed
to forcing it to do so. Typically, however, these intrinsics are one-liners and
are trivially inlineable, so I'd imagine that LLVM will go ahead and inline
everything all over the place.
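
The intended shape is roughly the following (illustrative only;
`_mm_do_tiny_thing` is a made-up stand-in for the crate's real one-line
intrinsic wrappers):

```rust
// Made-up stand-in for one of the crate's tiny intrinsic wrappers.
#[inline] // candidate for inlining into downstream codegen units
#[target_feature(enable = "sse2")]
pub unsafe fn _mm_do_tiny_thing(a: u64, b: u64) -> u64 {
    a.wrapping_add(b) // the real wrappers are similarly trivial one-liners
}

// A downstream caller that enables the same feature gives LLVM matching
// feature sets on both sides, so the one-liner should inline away even though
// it is only hinted with `#[inline]` rather than forced.
#[target_feature(enable = "sse2")]
pub unsafe fn downstream_kernel(a: u64, b: u64) -> u64 {
    _mm_do_tiny_thing(a, b)
}
```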

All in all, this change is brought about by #253, which noticed various codegen
errors. I originally thought they were due to ABI issues, but that turned out
to be wrong (although that was also a bug, and it has since been resolved). In
any case, after this change I was able to get the example in #253 to execute in
both release and debug mode.

Closes #253

[avx-error]: https://play.rust-lang.org/?gist=50cb08f1e2242e22109a6d69318bd112&version=nightly

* Add `#[inline(always)]` on eflags intrinsics

Their ABI actually relies on it!

* Leave `#[inline(always)]` on portable types

Switching these to `#[inline]` causes test failures on ARM; let's investigate
later.
alexcrichton authored Jan 29, 2018
1 parent cd74516 commit e38d5ac
Showing 41 changed files with 1,046 additions and 1,046 deletions.
8 changes: 4 additions & 4 deletions coresimd/src/aarch64/neon.rs
@@ -8,31 +8,31 @@ use simd_llvm::simd_add;
use v128::f64x2;

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(fadd))]
pub unsafe fn vadd_f64(a: f64, b: f64) -> f64 {
a + b
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(fadd))]
pub unsafe fn vaddq_f64(a: f64x2, b: f64x2) -> f64x2 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vaddd_s64(a: i64, b: i64) -> i64 {
a + b
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vaddd_u64(a: u64, b: u64) -> u64 {
10 changes: 5 additions & 5 deletions coresimd/src/aarch64/v8.rs
@@ -9,14 +9,14 @@
use stdsimd_test::assert_instr;

/// Reverse the order of the bytes.
-#[inline(always)]
+#[inline]
#[cfg_attr(test, assert_instr(rev))]
pub unsafe fn _rev_u64(x: u64) -> u64 {
x.swap_bytes() as u64
}

/// Count Leading Zeros.
-#[inline(always)]
+#[inline]
#[cfg_attr(test, assert_instr(clz))]
pub unsafe fn _clz_u64(x: u64) -> u64 {
x.leading_zeros() as u64
@@ -29,7 +29,7 @@ extern "C" {
}

/// Reverse the bit order.
-#[inline(always)]
+#[inline]
#[cfg_attr(test, assert_instr(rbit))]
pub unsafe fn _rbit_u64(x: u64) -> u64 {
rbit_u64(x as i64) as u64
@@ -39,7 +39,7 @@ pub unsafe fn _rbit_u64(x: u64) -> u64 {
///
/// When all bits of the operand are set it returns the size of the operand in
/// bits.
-#[inline(always)]
+#[inline]
#[cfg_attr(test, assert_instr(cls))]
pub unsafe fn _cls_u32(x: u32) -> u32 {
u32::leading_zeros((((((x as i32) >> 31) as u32) ^ x) << 1) | 1) as u32
@@ -49,7 +49,7 @@ pub unsafe fn _cls_u32(x: u32) -> u32 {
///
/// When all bits of the operand are set it returns the size of the operand in
/// bits.
-#[inline(always)]
+#[inline]
#[cfg_attr(test, assert_instr(cls))]
pub unsafe fn _cls_u64(x: u64) -> u64 {
u64::leading_zeros((((((x as i64) >> 63) as u64) ^ x) << 1) | 1) as u64
46 changes: 23 additions & 23 deletions coresimd/src/arm/neon.rs
@@ -9,135 +9,135 @@ use v64::{f32x2, i16x4, i32x2, i8x8, u16x4, u32x2, u8x8};
use v128::{f32x4, i16x8, i32x4, i64x2, i8x16, u16x8, u32x4, u64x2, u8x16};

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vadd_s8(a: i8x8, b: i8x8) -> i8x8 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vaddq_s8(a: i8x16, b: i8x16) -> i8x16 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vadd_s16(a: i16x4, b: i16x4) -> i16x4 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vaddq_s16(a: i16x8, b: i16x8) -> i16x8 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vadd_s32(a: i32x2, b: i32x2) -> i32x2 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vaddq_s32(a: i32x4, b: i32x4) -> i32x4 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vaddq_s64(a: i64x2, b: i64x2) -> i64x2 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vadd_u8(a: u8x8, b: u8x8) -> u8x8 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vaddq_u8(a: u8x16, b: u8x16) -> u8x16 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vadd_u16(a: u16x4, b: u16x4) -> u16x4 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vaddq_u16(a: u16x8, b: u16x8) -> u16x8 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vadd_u32(a: u32x2, b: u32x2) -> u32x2 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vaddq_u32(a: u32x4, b: u32x4) -> u32x4 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(add))]
pub unsafe fn vaddq_u64(a: u64x2, b: u64x2) -> u64x2 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(fadd))]
pub unsafe fn vadd_f32(a: f32x2, b: f32x2) -> f32x2 {
simd_add(a, b)
}

/// Vector add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(fadd))]
pub unsafe fn vaddq_f32(a: f32x4, b: f32x4) -> f32x4 {
simd_add(a, b)
}

/// Vector long add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(saddl))]
pub unsafe fn vaddl_s8(a: i8x8, b: i8x8) -> i16x8 {
@@ -147,7 +147,7 @@ pub unsafe fn vaddl_s8(a: i8x8, b: i8x8) -> i16x8 {
}

/// Vector long add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(saddl))]
pub unsafe fn vaddl_s16(a: i16x4, b: i16x4) -> i32x4 {
@@ -157,7 +157,7 @@ pub unsafe fn vaddl_s16(a: i16x4, b: i16x4) -> i32x4 {
}

/// Vector long add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(saddl))]
pub unsafe fn vaddl_s32(a: i32x2, b: i32x2) -> i64x2 {
@@ -167,7 +167,7 @@ pub unsafe fn vaddl_s32(a: i32x2, b: i32x2) -> i64x2 {
}

/// Vector long add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(uaddl))]
pub unsafe fn vaddl_u8(a: u8x8, b: u8x8) -> u16x8 {
@@ -177,7 +177,7 @@ pub unsafe fn vaddl_u8(a: u8x8, b: u8x8) -> u16x8 {
}

/// Vector long add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(uaddl))]
pub unsafe fn vaddl_u16(a: u16x4, b: u16x4) -> u32x4 {
@@ -187,7 +187,7 @@ pub unsafe fn vaddl_u16(a: u16x4, b: u16x4) -> u32x4 {
}

/// Vector long add.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(uaddl))]
pub unsafe fn vaddl_u32(a: u32x2, b: u32x2) -> u64x2 {
@@ -205,7 +205,7 @@ extern "C" {
}

/// Reciprocal square-root estimate.
-#[inline(always)]
+#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(frsqrte))]
pub unsafe fn vrsqrte_f32(a: f32x2) -> f32x2 {
4 changes: 2 additions & 2 deletions coresimd/src/arm/v6.rs
@@ -10,14 +10,14 @@
use stdsimd_test::assert_instr;

/// Reverse the order of the bytes.
-#[inline(always)]
+#[inline]
#[cfg_attr(test, assert_instr(rev))]
pub unsafe fn _rev_u16(x: u16) -> u16 {
x.swap_bytes() as u16
}

/// Reverse the order of the bytes.
-#[inline(always)]
+#[inline]
#[cfg_attr(test, assert_instr(rev))]
pub unsafe fn _rev_u32(x: u32) -> u32 {
x.swap_bytes() as u32
8 changes: 4 additions & 4 deletions coresimd/src/arm/v7.rs
@@ -13,28 +13,28 @@ pub use super::v6::*;
use stdsimd_test::assert_instr;

/// Count Leading Zeros.
-#[inline(always)]
+#[inline]
#[cfg_attr(test, assert_instr(clz))]
pub unsafe fn _clz_u8(x: u8) -> u8 {
x.leading_zeros() as u8
}

/// Count Leading Zeros.
-#[inline(always)]
+#[inline]
#[cfg_attr(test, assert_instr(clz))]
pub unsafe fn _clz_u16(x: u16) -> u16 {
x.leading_zeros() as u16
}

/// Count Leading Zeros.
-#[inline(always)]
+#[inline]
#[cfg_attr(test, assert_instr(clz))]
pub unsafe fn _clz_u32(x: u32) -> u32 {
x.leading_zeros() as u32
}

/// Reverse the bit order.
-#[inline(always)]
+#[inline]
#[cfg_attr(test, assert_instr(rbit))]
#[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
#[cfg(dont_compile_me)] // FIXME need to add `v7` upstream in rustc