Segfault in rustc while cross-compiling core #53099

Open
bheisler opened this issue Aug 5, 2018 · 12 comments
Labels
C-bug Category: This is a bug. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. O-NVPTX Target: the NVPTX LLVM backend for running rust on GPUs, https://llvm.org/docs/NVPTXUsage.html T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@bheisler

bheisler commented Aug 5, 2018

I'm trying to compile the kernel code from japaric/nvptx to PTX. This requires cross-compiling the core crate. When I run the command to do this, rustc crashes with a segfault. I'm using cargo-xbuild because xargo doesn't seem to work anymore.

cargo xbuild --target nvptx64-nvidia-cuda.json --verbose

Output:

   Compiling core v0.0.0 (file:///home/brook/.rustup/toolchains/nightly-2018-04-10-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore)
     Running `rustc --crate-name core /home/brook/.rustup/toolchains/nightly-2018-04-10-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/lib.rs --crate-type lib --emit=dep-info,link -C opt-level=3 --sysroot /mnt/c/Users/Brook/workspace/nvptx/kernel/target/sysroot -Z force-unstable-if-unmarked -C metadata=4bb3cbb45694acfc -C extra-filename=-4bb3cbb45694acfc --out-dir /tmp/xargo.pCW5tJxxVfmO/target/nvptx64-nvidia-cuda/release/deps --target /mnt/c/Users/Brook/workspace/nvptx/kernel/nvptx64-nvidia-cuda.json -C incremental=/tmp/xargo.pCW5tJxxVfmO/target/nvptx64-nvidia-cuda/release/incremental -L dependency=/tmp/xargo.pCW5tJxxVfmO/target/nvptx64-nvidia-cuda/release/deps -L dependency=/tmp/xargo.pCW5tJxxVfmO/target/release/deps`
error: Could not compile `core`.

Caused by:
  process didn't exit successfully: `rustc --crate-name core /home/brook/.rustup/toolchains/nightly-2018-04-10-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/lib.rs --crate-type lib --emit=dep-info,link -C opt-level=3 --sysroot /mnt/c/Users/Brook/workspace/nvptx/kernel/target/sysroot -Z force-unstable-if-unmarked -C metadata=4bb3cbb45694acfc -C extra-filename=-4bb3cbb45694acfc --out-dir /tmp/xargo.pCW5tJxxVfmO/target/nvptx64-nvidia-cuda/release/deps --target /mnt/c/Users/Brook/workspace/nvptx/kernel/nvptx64-nvidia-cuda.json -C incremental=/tmp/xargo.pCW5tJxxVfmO/target/nvptx64-nvidia-cuda/release/incremental -L dependency=/tmp/xargo.pCW5tJxxVfmO/target/nvptx64-nvidia-cuda/release/deps -L dependency=/tmp/xargo.pCW5tJxxVfmO/target/release/deps` (signal: 11, SIGSEGV: invalid memory reference)
error: `"cargo" "rustc" "-p" "core" "--release" "--manifest-path" "/tmp/xargo.pCW5tJxxVfmO/Cargo.toml" "--target" "nvptx64-nvidia-cuda.json" "-v" "--" "--sysroot" "/mnt/c/Users/Brook/workspace/nvptx/kernel/target/sysroot" "-Z" "force-unstable-if-unmarked"` failed with exit code: Some(101)
note: run with `RUST_BACKTRACE=1` for a backtrace

The important bit is: (signal: 11, SIGSEGV: invalid memory reference). I've tried this under Windows, WSL and MINGW (as well as on a second machine running Arch Linux), and the crash is completely consistent.

Version: cargo 1.29.0-nightly (15433e8cc 2018-08-02) (I have also seen a similar crash with 2018-04-04).

I've tried to build a version of rustc with debug symbols so that I could collect more information, but have been unable to get a working build.

@bheisler
Author

bheisler commented Aug 5, 2018

As an aside, I'd gladly help debug and potentially even fix this issue if someone can help me figure out my build system problems. So far, I've tried to build the MSVC version of Rust, and it fails with this error message:

Building stage0 compiler artifacts (x86_64-pc-windows-msvc -> x86_64-pc-windows-msvc)
   Compiling rustc v0.0.0 (file:///C:/Users/Brook/workspace/rust/src/librustc)
error: linking with `C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\bin\HostX64\x64\link.exe` failed: exit code: 1189
<lots of text removed>
note: LINK : fatal error LNK1189: library limit of 65535 objects exceeded

I have no idea what to do about that. I've also tried building the MINGW variant, but that fails with a different error message about an unknown argument to ld. I would prefer to build Rust with the MSVC target if possible, but I can try again with MINGW if you want the full error.

@bheisler
Author

I was able to build rustc on my Linux machine and try it there. The development version gives this error:

rustc: /home/brook/rust/src/llvm/lib/CodeGen/MachineFunction.cpp:186: void llvm::MachineFunction::init(): Assertion `Target.isCompatibleDataLayout(getDataLayout()) && "Can't create a MachineFunction using a Module with a " "Target-incompatible DataLayout attached\n"' failed.

LLVM assertions are disabled in release builds of rustc, so it makes sense that the release build doesn't report this assertion.

@bheisler
Author

bheisler commented Aug 15, 2018

I was able to collect a stack-trace from rustc when it segfaults:

#0  0x00007f9c54b59ea6 in llvm::CallSiteBase<llvm::Function const, llvm::BasicBlock const, llvm::Value const, llvm::User const, llvm::Use const, llvm::Instruction const, llvm::CallInst const, llvm::InvokeInst const, llvm::Use const*>::getCalledValue (this=<synthetic pointer>) at /home/brook/rust/src/llvm/include/llvm/ADT/PointerIntPair.h:155
#1  llvm::CallSiteBase<llvm::Function const, llvm::BasicBlock const, llvm::Value const, llvm::User const, llvm::Use const, llvm::Instruction const, llvm::CallInst const, llvm::InvokeInst const, llvm::Use const*>::getCalledFunction (this=<synthetic pointer>) at /home/brook/rust/src/llvm/include/llvm/IR/CallSite.h:108
#2  llvm::NVPTXTargetLowering::getPrototype[abi:cxx11](llvm::DataLayout const&, llvm::Type*, std::vector<llvm::TargetLoweringBase::ArgListEntry, std::allocator<llvm::TargetLoweringBase::ArgListEntry> > const&, llvm::SmallVectorImpl<llvm::ISD::OutputArg> const&, unsigned int, llvm::ImmutableCallSite) const ()
    at /home/brook/rust/src/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp:1276
#3  0x00007f9c54b5c7de in llvm::NVPTXTargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const ()
    at /home/brook/rust/src/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp:1661
#4  0x00007f9c55545cbb in llvm::TargetLowering::LowerCallTo (this=0x564cefae0618, CLI=...) at /home/brook/rust/src/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:8537
#5  0x00007f9c554d19c5 in llvm::DAGTypeLegalizer::ExpandIntRes_XMULO (this=this@entry=0x7f9c5348db90, N=N@entry=0x7f9c300ff288, Lo=..., Hi=...)
    at /home/brook/rust/src/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:2778
#6  0x00007f9c554db6c5 in llvm::DAGTypeLegalizer::ExpandIntegerResult (this=this@entry=0x7f9c5348db90, N=N@entry=0x7f9c300ff288, ResNo=ResNo@entry=0)
    at /home/brook/rust/src/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:1477
#7  0x00007f9c554e5940 in llvm::DAGTypeLegalizer::run() () at /home/brook/rust/src/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:257
#8  0x00007f9c554e5ccd in llvm::SelectionDAG::LegalizeTypes (this=<optimized out>) at /home/brook/rust/src/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h:175
#9  0x00007f9c555bc059 in llvm::SelectionDAGISel::CodeGenAndEmitDAG() () at /home/brook/rust/src/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:770
#10 0x00007f9c555bc7a0 in llvm::SelectionDAGISel::SelectBasicBlock (this=this@entry=0x7f9c3001a790, Begin=..., Begin@entry=..., End=..., End@entry=...,
    HadTailCall=<error reading variable>) at /home/brook/rust/src/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:667
#11 0x00007f9c555c1311 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) () at /home/brook/rust/src/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1733
#12 0x00007f9c555c37a9 in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) () at /home/brook/rust/src/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:469
#13 0x00007f9c557a4143 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) () at /home/brook/rust/src/llvm/lib/CodeGen/MachineFunctionPass.cpp:61
#14 0x00007f9c5615b548 in llvm::FPPassManager::runOnFunction (this=0x7f9c300a6820, F=...) at /home/brook/rust/src/llvm/lib/IR/LegacyPassManager.cpp:1593
#15 0x00007f9c5615b5c3 in llvm::FPPassManager::runOnModule (this=0x7f9c300a6820, M=...) at /home/brook/rust/src/llvm/lib/IR/LegacyPassManager.cpp:1609
#16 0x00007f9c5615ac70 in (anonymous namespace)::MPPassManager::runOnModule (M=..., this=0x7f9c3000c4e0) at /home/brook/rust/src/llvm/lib/IR/LegacyPassManager.cpp:1669
#17 llvm::legacy::PassManagerImpl::run(llvm::Module&) () at /home/brook/rust/src/llvm/lib/IR/LegacyPassManager.cpp:1774
#18 0x00007f9c5615af49 in llvm::legacy::PassManager::run (this=this@entry=0x7f9c300e8010, M=...) at /home/brook/rust/src/llvm/lib/IR/LegacyPassManager.cpp:1805
#19 0x00007f9c546d4497 in LLVMRustWriteOutputFile (Target=<optimized out>, PMR=0x7f9c300e8010, M=0x564ceac35760,
    Path=0x7f9c300d2eb0 "/tmp/xargo.JwjTnk5yZGqd/target/nvptx64-nvidia-cuda/release/deps/core-61b4d04ef2a9c7b3.1i4jyjk3gxvvvngj.rcgu.o", RustFileType=<optimized out>)
    at /home/brook/rust/src/llvm/include/llvm/IR/Module.h:877
#20 0x00007f9c5463791a in rustc_codegen_llvm::back::write::write_output_file (handler=0x7f9c5348fb60, target=0x564cf1fa74e8, pm=0x80, m=0xffffffffffffffe8,
    output=<error reading variable: access outside bounds of object referenced via synthetic pointer>, file_type=rustc_codegen_llvm::llvm::ffi::ObjectFile)
    at librustc_codegen_llvm/back/write.rs:100
#21 0x00007f9c545facba in rustc_codegen_llvm::back::write::codegen::{{closure}}::{{closure}} (cpm=0x7f9c300e8010) at librustc_codegen_llvm/back/write.rs:804
#22 rustc_codegen_llvm::back::write::codegen::with_codegen (tm=<optimized out>, llmod=<optimized out>, no_builtins=<optimized out>, f=...)
    at librustc_codegen_llvm/back/write.rs:684
#23 0x00007f9c54601478 in rustc_codegen_llvm::back::write::codegen::{{closure}} () at librustc_codegen_llvm/back/write.rs:803
#24 0x00007f9c545ffac7 in rustc::util::common::time_ext (do_it=<optimized out>, sess=..., what=..., f=...) at /home/brook/rust/src/librustc/util/common.rs:163
#25 0x00007f9c54639b4d in rustc_codegen_llvm::back::write::codegen (cgcx=<optimized out>, diag_handler=<optimized out>, module=..., config=<optimized out>,
    timeline=<optimized out>) at librustc_codegen_llvm/back/write.rs:740
#26 0x00007f9c54641314 in rustc_codegen_llvm::back::write::execute_work_item (cgcx=<optimized out>, work_item=..., timeline=0x7f9c5348fcf0)
    at librustc_codegen_llvm/back/write.rs:1404
#27 0x00007f9c5466440b in rustc_codegen_llvm::back::write::spawn_work::{{closure}} () at librustc_codegen_llvm/back/write.rs:2049
#28 std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /home/brook/rust/src/libstd/sys_common/backtrace.rs:136
#29 0x00007f9c5462c698 in std::thread::Builder::spawn::{{closure}}::{{closure}} () at /home/brook/rust/src/libstd/thread/mod.rs:409
#30 <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=<optimized out>) at /home/brook/rust/src/libstd/panic.rs:313
#31 0x00007f9c545ebbd8 in std::panicking::try::do_call (data=<optimized out>) at /home/brook/rust/src/libstd/panicking.rs:310
#32 0x00007f9c60706db1 in __rust_maybe_catch_panic (f=0x7f9c545ebbb0 <std::panicking::try::do_call>, data=0x564cf1fa74e8 "\260e\374\360LV", data_ptr=0x7f9c53490438,
    vtable_ptr=0x7f9c53490440) at libpanic_unwind/lib.rs:105
#33 0x00007f9c545eba5b in std::panicking::try (f=...) at /home/brook/rust/src/libstd/panicking.rs:289
#34 0x00007f9c5462c758 in std::panic::catch_unwind (f=...) at /home/brook/rust/src/libstd/panic.rs:392
#35 0x00007f9c545666df in std::thread::Builder::spawn::{{closure}} () at /home/brook/rust/src/libstd/thread/mod.rs:408
#36 <F as alloc::boxed::FnBox<A>>::call_box (self=0x7f9c380014d0, args=<optimized out>) at /home/brook/rust/src/liballoc/boxed.rs:642
#37 0x00007f9c606f97fc in std::sys_common::thread::start_thread (main=0x564cf0b4af90 "\320\024") at libstd/sys_common/thread.rs:24
#38 0x00007f9c606b9c66 in std::sys::unix::thread::Thread::new::thread_start (main=0x564cf1fa74e8) at libstd/sys/unix/thread.rs:90
#39 0x00007f9c588e2075 in start_thread () from /usr/lib/libpthread.so.0
#40 0x00007f9c6036b53f in clone () from /usr/lib/libc.so.6

@bheisler
Author

Still working on debugging this. I've never worked with LLVM before (let alone the internals of instruction selection) so I only have a vague idea of how this code works. I'd appreciate it if someone more familiar could comment on this.

The segfault happens in NVPTXISelLowering.cpp:1277, when the ImmutableCallSite CS points to null (CS.I.Value = 0).

    } else if (retTy->isAggregateType() || retTy->isVectorTy() || retTy->isIntegerTy(128)) {
      auto &DL = CS.getCalledFunction()->getParent()->getDataLayout();
      O << ".param .align " << retAlignment << " .b8 _["
        << DL.getTypeAllocSize(retTy) << "]";
    } else {

Specifically, it segfaults inside the CallSiteBase::getCalledFunction function, but I don't think that's really material. Other code in this file has a special-case to handle CS pointing to NULL (see line 1356):

  if (!CS) {
    // CallSite is zero, fallback to ABI type alignment
    return DL.getABITypeAlignment(Ty);
  }

This suggests that an empty call site is an expected case. However, the branch above has no such guard, so when execution reaches it, it dereferences a null pointer and segfaults.
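Frame #2 of the stack trace shows that getPrototype already receives a DataLayout argument, so one possible fix is to fall back to that layout when CS is empty, mirroring the guard at line 1356, instead of reaching the layout through the called function. The sketch below is a Rust model of that idea, not LLVM code: DataLayout, Module, Function, CallSite, RetTy and ret_param_decl are all hypothetical stand-ins, and None plays the role of the empty ImmutableCallSite.

```rust
// Rust model of the return-value declaration in NVPTX getPrototype.
// All types here are hypothetical stand-ins, not LLVM's API.

/// The return-type kinds handled by the failing branch.
#[derive(Clone, Copy)]
enum RetTy {
    I128,
    Aggregate { size: u64 },
}

/// Stand-in for llvm::DataLayout; only the size query is modelled.
struct DataLayout {
    i128_alloc_size: u64,
}

impl DataLayout {
    fn type_alloc_size(&self, ty: RetTy) -> u64 {
        match ty {
            RetTy::I128 => self.i128_alloc_size,
            RetTy::Aggregate { size } => size,
        }
    }
}

struct Module {
    dl: DataLayout,
}

struct Function<'a> {
    parent: &'a Module,
}

struct CallSite<'a> {
    called: &'a Function<'a>,
}

/// Builds the ".param .align N .b8 _[SIZE]" declaration.
/// `cs == None` models an empty call site; instead of dereferencing it,
/// the function falls back to the DataLayout passed in by the caller.
fn ret_param_decl(dl: &DataLayout, cs: Option<&CallSite<'_>>, ret_ty: RetTy, align: u32) -> String {
    let layout = match cs {
        Some(site) => &site.called.parent.dl,
        None => dl,
    };
    format!(".param .align {} .b8 _[{}]", align, layout.type_alloc_size(ret_ty))
}
```

With an empty call site, ret_param_decl(&dl, None, RetTy::I128, 16) produces the same kind of declaration as the original C++ would for a real call site, rather than crashing.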

@denzp
Contributor

denzp commented Aug 22, 2018

Did you disable LLVM assertions in order to get the stack trace? I'm hitting a failed assertion earlier:

rustc: /home/den/rust/src/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp:1729: virtual llvm::SDValue llvm::NVPTXTargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const: 
Assertion `VTs.size() == Ins.size() && "Bad value decomposition"' failed.

Looks like some edge case of the i128 support from #38824. Right now I'm trying to put together a minimal reproducible example so I can fix this on the LLVM side.

Running llc on the emitted core.ll suggests it's related to core::num::from_str_radix:

llc: /home/den/rust/src/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp:1729: virtual llvm::SDValue llvm::NVPTXTargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const: Assertion `VTs.size() == Ins.size() && "Bad value decomposition"' failed.
Stack dump:
0.      Program arguments: /home/den/rust/build/x86_64-unknown-linux-gnu/llvm/build/bin/llc core.ll
1.      Running pass 'Function Pass Manager' on module 'core.ll'.
2.      Running pass 'NVPTX DAG->DAG Pattern Instruction Selection' on function '@_ZN4core3num14from_str_radix17hb496b766c386ac7aE'
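For context, here is a hypothetical, simplified parser (not core's actual from_str_radix) showing the kind of code that appears to be involved: digits accumulated into an i128 with overflow-checked arithmetic. i128::checked_mul is an overflow-checked multiply, and frame #5 of the earlier stack trace shows the crash starting in DAGTypeLegalizer::ExpandIntRes_XMULO, which expands overflow-checked multiplies on integer types the target cannot handle natively. Whether this exact loop reproduces the crash on nvptx64-nvidia-cuda is an assumption, not something verified here.

```rust
/// Hypothetical, simplified digit-accumulation loop in the spirit of
/// core::num::from_str_radix (unsigned digits only, no sign handling).
/// Each step performs an overflow-checked i128 multiply and add.
fn parse_i128(src: &str, radix: u32) -> Option<i128> {
    assert!((2..=36).contains(&radix), "radix must be in 2..=36");
    if src.is_empty() {
        return None;
    }
    let mut acc: i128 = 0;
    for c in src.chars() {
        let digit = c.to_digit(radix)?;
        // checked_mul on i128 is an overflow-checked 128-bit multiply, which
        // LLVM must legalize on targets without native 128-bit arithmetic.
        acc = acc.checked_mul(i128::from(radix))?.checked_add(i128::from(digit))?;
    }
    Some(acc)
}
```

On a host target this behaves like an ordinary parser, e.g. parse_i128("ff", 16) returns Some(255) and a value one past i128::MAX returns None.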

@bheisler
Author

I did disable LLVM assertions, but I didn't see that particular assertion, only the one mentioned above relating to incompatible data layouts. I found that adding i128:128 to the data-layout field in the target JSON bypasses that assertion. That change isn't what causes the segfault, though: the segfault still happens with an unmodified target file.

@bheisler
Author

@denzp reports (#38789 (comment)) that setting the obj-is-bitcode flag to true prevents the segfault. My understanding, however, is that this causes LLVM to write out LLVM bitcode rather than PTX assembly. If so, it's just bypassing the broken code. Have I misunderstood?

@denzp
Contributor

denzp commented Aug 31, 2018

To be honest, I don't understand why rustc --emit=link --target nvptx64-nvidia-cuda is using instruction selection code from LLVM when obj-is-bitcode is disabled. When it's enabled, it simply stores a bitcode without extra steps.

Currently, there are two main issues you might face when you are trying to produce PTX assembly for a whole libcore crate, and both of them are related to i128 numbers:

  • The failed assertion VTs.size() == Ins.size() && "Bad value decomposition" might seem unimportant, but it still contributes to producing invalid PTX. I didn't take into account all possible uses of i128 numbers, especially when they are used in structs. The fix is trivial and I'll send the patch to LLVM upstream in the next few days.
  • Once we work around the segfault on the empty call site CS (this is related to intrinsics that aren't defined for NVPTX), we get stuck at something like:
LLVM ERROR: Cannot select: t7: i64 = ExternalSymbol'__powidf2'
In function: powi

I tried to implement a solution for this last year, but I'm not satisfied with it myself. I abandoned the changes to focus on ptx-linker, which helps avoid the problems: as long as you don't use i128 types in your kernels, no assembly is ever generated for the "unsupported" code.

Even in the best case, when LLVM no longer complains about Cannot select ... ExternalSymbol, the output assembly for libcore will still be useless and "invalid" unless you provide an implementation [1] or [2] of the missing intrinsics. And for that, you need some kind of linker anyway.
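As an illustration of what providing an implementation of one missing intrinsic could look like, here is a Rust sketch of __powidf2 (a double raised to an integer power), using exponentiation by squaring, modeled on compiler-rt's C version. The name powidf2 is hypothetical; to actually satisfy the undefined reference in a kernel, the function would presumably have to be exported under the exact symbol name __powidf2 (for example with #[no_mangle] and extern "C"), which is an assumption not tested here.

```rust
/// Hypothetical Rust version of the __powidf2 libcall: computes a^b for an
/// integer exponent b using exponentiation by squaring.
fn powidf2(mut a: f64, mut b: i32) -> f64 {
    let recip = b < 0;
    let mut r = 1.0;
    loop {
        // For negative b, `b & 1` still detects odd values (two's complement)
        // and `b /= 2` truncates toward zero, so the loop terminates.
        if b & 1 != 0 {
            r *= a;
        }
        b /= 2;
        if b == 0 {
            break;
        }
        a *= a;
    }
    // A negative exponent is handled by taking the reciprocal at the end.
    if recip { 1.0 / r } else { r }
}
```

Squaring keeps the work logarithmic in |b|, which is why runtime libraries typically implement powi this way rather than with a simple loop.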

I'm going to make another attempt at a proper solution for the undefined intrinsics, and hopefully we will get rid of the compilation errors!

@bheisler
Author

bheisler commented Sep 5, 2018

Thanks for looking into this. Now that ptx-linker is available, this issue is no longer blocking me. I'd be happy to test out changes if needed, though.

I'm not sure I understand this part:

To be honest, I don't understand why rustc --emit=link --target nvptx64-nvidia-cuda is using instruction selection code from LLVM when obj-is-bitcode is disabled. When it's enabled, it simply stores a bitcode without extra steps.

I wasn't running that command (unless cargo-xbuild uses it behind the scenes), so I'm not sure how that's relevant.

@cyplo
Contributor

cyplo commented Oct 28, 2018

Heya! Is this still a problem, or were you able to fix it? Thank you :)

@bheisler
Author

I was never able to fix it. I haven't tried recently, so it may have been fixed by someone else.

@jonas-schievink jonas-schievink added I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. O-NVPTX Target: the NVPTX LLVM backend for running rust on GPUs, https://llvm.org/docs/NVPTXUsage.html labels Jan 27, 2019
@jonas-schievink jonas-schievink added the C-bug Category: This is a bug. label Aug 19, 2019
@kjetilkjeka
Contributor

Is this issue still relevant now that -Z build-std has been implemented? I'm currently using it to build core for nvptx64-nvidia-cuda and haven't seen any problems like this on a couple of different recent nightly versions.


5 participants