Field ordering still causes extra memcpy #58082

Open
jrmuizel opened this issue Feb 2, 2019 · 4 comments
Labels
A-LLVM: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
A-mir-opt: MIR optimizations
A-mir-opt-nrvo: Fixed by NRVO
C-enhancement: An issue proposing an enhancement or a PR with one.
I-slow: Problems and improvements with respect to performance of generated code.
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@jrmuizel (Contributor) commented Feb 2, 2019

This is similar to #56356, except this time with more fields:

use std::mem;

struct SV {
    capacity: usize,
    disc: usize,
    data: [usize; 40],
}

impl SV {
    fn new() -> SV {
        SV {
            data: unsafe { mem::uninitialized() },
            disc: 0,
            capacity: 0,
        }
    }
}

pub struct L {
    a: SV,
    b: SV
}

pub struct Allocation<T> {
    f: *mut T,
}

impl<T> Allocation<T> {
    pub fn init(self, value: T) {
        use std::ptr;
        unsafe {
            ptr::write(self.f, value);
        }
    }
}

#[inline(never)]
pub fn foo(a: Allocation<L>) {
    a.init(L {
        a: SV::new(),
        b: SV::new()
    });
}

gives:

example::foo:
        sub     rsp, 680
        xorps   xmm0, xmm0
        movaps  xmmword ptr [rsp], xmm0
        movaps  xmmword ptr [rsp + 336], xmm0
        mov     rsi, rsp
        mov     edx, 672
        call    qword ptr [rip + memcpy@GOTPCREL]
        add     rsp, 680
        ret

vs moving capacity to the end of the struct:

example::foo:
        mov     qword ptr [rdi], 0
        xorps   xmm0, xmm0
        movups  xmmword ptr [rdi + 328], xmm0
        mov     qword ptr [rdi + 664], 0
        ret
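
For reference, here is a sketch of the reordered layout (the original comment doesn't spell it out, so the declaration below is inferred from the store offsets in the second listing):

struct SV {
    disc: usize,
    data: [usize; 40],
    // Moved from first to last: with this declaration order the zero
    // stores for disc and capacity land directly in the destination
    // (offsets 0, 328, 336 and 664), so no stack temporary and no
    // memcpy are needed.
    capacity: usize,
}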

The problem is reproducible in C++, so I'll file a new LLVM bug too.

@jrmuizel (Contributor, Author) commented Feb 2, 2019

The LLVM bug is https://bugs.llvm.org/show_bug.cgi?id=40574

@nikic added the A-LLVM and I-slow labels Feb 2, 2019
@jrmuizel (Contributor, Author) commented:

@nikic do you think you might be able to look at what's going wrong here?

@jonas-schievink added the T-compiler label Mar 26, 2019
@jonas-schievink added the A-mir-opt, C-enhancement, and A-mir-opt-nrvo labels Jun 10, 2020
@nikic (Contributor) commented Mar 13, 2021

Issue still exists: https://godbolt.org/z/j6GEeo

Not sure why SROA doesn't handle this. IR test case:

%L = type { [0 x i64], %SV, [0 x i64], %SV, [0 x i64] }
%SV = type { [0 x i64], i64, [0 x i64], i64, [0 x i64], [40 x i64], [0 x i64] }

define void @test(i64* nocapture %a) unnamed_addr #0 {
start:
  %_4 = alloca %L, align 8
  %_4.0.sroa_cast = bitcast %L* %_4 to i8*
  call void @llvm.lifetime.start.p0i8(i64 672, i8* nonnull %_4.0.sroa_cast)
  %_41213 = bitcast %L* %_4 to i8*
  call void @llvm.memset.p0i8.i64(i8* nonnull align 8 dereferenceable(16) %_41213, i8 0, i64 16, i1 false)
  %0 = getelementptr inbounds %L, %L* %_4, i64 0, i32 3
  %1 = bitcast %SV* %0 to i8*
  call void @llvm.memset.p0i8.i64(i8* nonnull align 8 dereferenceable(16) %1, i8 0, i64 16, i1 false)
  %2 = bitcast i64* %a to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(672) %2, i8* nonnull align 8 dereferenceable(672) %_4.0.sroa_cast, i64 672, i1 false) #3
  call void @llvm.lifetime.end.p0i8(i64 672, i8* nonnull %_4.0.sroa_cast)
  ret void
}

declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1

declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #1

declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1

declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg) #2

Now, even if I comment out the memcpy, the alloca still isn't split; I also need to comment out the lifetime start and end intrinsics for anything to happen.

@nikic (Contributor) commented Mar 13, 2021

I think the problem here is that https://github.com/llvm/llvm-project/blob/b26c953f55d659ed5148f38e34716efb696b5016/llvm/lib/Transforms/Scalar/SROA.cpp#L559-L572 will always place all overlapping splittable slices into one partition if there are no overlapping unsplittable slices. If we replace the memset with a store (which is unsplittable), then we create a new partition at the store boundary. I think this code should treat splittable slices that only overlap as subsets similarly to unsplittable slices.
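
To make the described behavior concrete, here is a toy model in Rust (a sketch only: the real SROA code is C++, and the Slice/partitions names below are invented for illustration):

// Toy model of the partition building described above, not the actual
// SROA implementation: slices sorted by begin offset are greedily
// merged into one partition whenever they overlap, regardless of
// splittability.
#[allow(dead_code)]
struct Slice {
    begin: u64,
    end: u64,
    // memset/memcpy slices are splittable, plain stores are not; in
    // this simplified model the flag is informational only.
    splittable: bool,
}

fn partitions(mut slices: Vec<Slice>) -> Vec<(u64, u64)> {
    slices.sort_by_key(|s| s.begin);
    let mut parts: Vec<(u64, u64)> = Vec::new();
    for s in slices {
        match parts.last_mut() {
            // Any overlap with the current partition merges the slice
            // in; the suggestion above amounts to not doing this when a
            // splittable slice is merely a subset of another one.
            Some((_, end)) if s.begin < *end => *end = (*end).max(s.end),
            _ => parts.push((s.begin, s.end)),
        }
    }
    parts
}

fn main() {
    // The slices from the IR test case: two 16-byte memsets at offsets
    // 0 and 336, plus the 672-byte memcpy covering the whole alloca.
    let slices = vec![
        Slice { begin: 0, end: 16, splittable: true },
        Slice { begin: 336, end: 352, splittable: true },
        Slice { begin: 0, end: 672, splittable: true },
    ];
    // Prints [(0, 672)]: all three slices land in a single partition,
    // so nothing gets split.
    println!("{:?}", partitions(slices));
}

Replacing the whole-alloca memcpy with an unsplittable store would, per the description above, force a partition boundary at the store's offsets, which is why the store variant behaves differently.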
