
1.12.0: High memory usage when linking in release mode with debug info #36926

Closed
rphmeier opened this issue Oct 3, 2016 · 33 comments
Labels
P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@rphmeier
Contributor

rphmeier commented Oct 3, 2016

Reports of OOM, using 18GB of memory while building https://github.com/ethcore/parity in release mode with debuginfo.

Known to affect OSX and Linux, possibly Windows but unconfirmed.


@bluss
Member

bluss commented Oct 3, 2016

Not a solution, but a useful tip might be to try the gold linker (not yet the default in rustc), if available. The rustc flag is -Clink-args=-fuse-ld=gold
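
For a whole project, that flag can also go through cargo's config file. A minimal sketch, assuming ld.gold is installed and on the PATH:

```toml
# .cargo/config — pass the gold linker flag to every rustc invocation
[build]
rustflags = ["-C", "link-args=-fuse-ld=gold"]
```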

@alexcrichton
Member

@rphmeier is this a regression for 1.11, or just a bug in general?

@rphmeier
Contributor Author

rphmeier commented Oct 3, 2016

Appears to be a regression from 1.11.0, we didn't experience any abnormally high memory usage until then. Looks like it mostly affects users of OSX.

@alexcrichton alexcrichton added I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. regression-from-stable-to-stable Performance or correctness regression from one stable version to another. labels Oct 3, 2016
@alexcrichton
Member

Ok, thanks for the info! Tagged as such.

@cuviper
Member

cuviper commented Oct 4, 2016

You might try 1.11 again with -Zorbit to see if it's a MIR issue.
(And please see #36774 for another release+debuginfo MIR issue.)

@nikomatsakis
Contributor

cc @michaelwoerister @rust-lang/compiler

@tomusdrw

tomusdrw commented Oct 4, 2016

1.11 with -Zorbit fails to compile at all

$ rustc --version
rustc 1.11.0 (9b21dcd6a 2016-08-15)
$ cat ~/.cargo/config
[build]
rustflags = ["-Zorbit"]
$ cd parity && cargo clean && cargo build --release -j 1
  [truncated irrelevant output]
  Compiling dtoa v0.2.2
warning: the option `Z` is unstable and should only be used on the nightly compiler, but it is currently accepted for backwards compatibility; this will soon change, see issue #31847 for more details
error: internal compiler error: ../src/librustc_trans/mir/lvalue.rs:100: using operand local var4 as lvalue
note: the compiler unexpectedly panicked. this is a bug.
note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
note: run with `RUST_BACKTRACE=1` for a backtrace
thread 'rustc' panicked at 'Box<Any>', ../src/librustc_errors/lib.rs:619
stack backtrace:
   1:     0x7f531ab8be2f - std::sys::backtrace::tracing::imp::write::h46e546df6e4e4fe6
   2:     0x7f531ab9a13b - std::panicking::default_hook::_$u7b$$u7b$closure$u7d$$u7d$::h077deeda8b799591
   3:     0x7f531ab99cd8 - std::panicking::default_hook::heb8b6fd640571a4f
   4:     0x7f531ab5fade - std::panicking::rust_panic_with_hook::hd7b83626099d3416
   5:     0x7f5313714187 - std::panicking::begin_panic::h3e029d5f110b4661
   6:     0x7f5313713ad1 - rustc_errors::Handler::bug::he60f76b829c68950
   7:     0x7f531712ceb4 - rustc::session::opt_span_bug_fmt::_$u7b$$u7b$closure$u7d$$u7d$::h11fa08f71c1e9d84
   8:     0x7f531712ccbd - rustc::session::opt_span_bug_fmt::hdc2517bf24a762d0
   9:     0x7f5317147886 - rustc::session::bug_fmt::h4bff3cf11871f37a
  10:     0x7f5319b948ae - rustc_trans::mir::lvalue::_<impl rustc_trans..mir..MirContext<'bcx, 'tcx>>::trans_lvalue::h2a6d486404ea098d
  11:     0x7f5319b93c35 - rustc_trans::mir::lvalue::_<impl rustc_trans..mir..MirContext<'bcx, 'tcx>>::trans_lvalue::h2a6d486404ea098d
  12:     0x7f5319b956c6 - rustc_trans::mir::operand::_<impl rustc_trans..mir..MirContext<'bcx, 'tcx>>::trans_consume::h73d5de2d873e0881
  13:     0x7f5319b939bd - rustc_trans::mir::operand::_<impl rustc_trans..mir..MirContext<'bcx, 'tcx>>::trans_operand::hc01879dc825e1a6b
  14:     0x7f5319ba2403 - rustc_trans::mir::rvalue::_<impl rustc_trans..mir..MirContext<'bcx, 'tcx>>::trans_rvalue_operand::h4cf1fcaf74fe9313
  15:     0x7f5319b8a2ef - rustc_trans::mir::block::_<impl rustc_trans..mir..MirContext<'bcx, 'tcx>>::trans_block::h62f4aec8e51f60c5
  16:     0x7f5319a89fe2 - rustc_trans::mir::trans_mir::hae8e3e3c01dc03cf
  17:     0x7f5319a82d07 - rustc_trans::base::trans_closure::h1ab33a1b60511d91
  18:     0x7f5319a8bde3 - rustc_trans::base::trans_fn::he8613d5f36dd799b
  19:     0x7f5319a952be - rustc_trans::base::trans_item::hcfc6f2f68b9918b0
  20:     0x7f5319aad893 - _<rustc_trans..base..TransItemsWithinModVisitor<'a, 'tcx> as rustc..hir..intravisit..Visitor<'v>>::visit_item::hfffd713d2b3de363
  21:     0x7f5319aad045 - rustc::hir::intravisit::Visitor::visit_stmt::hb5f2d3beb07f428e
  22:     0x7f5319aad27e - rustc::hir::intravisit::Visitor::visit_fn::h80e8da1b47487d41
  23:     0x7f5319aac14d - rustc::hir::intravisit::walk_item::hd4e84b3fe58a4a0c
  24:     0x7f5319aad8ac - _<rustc_trans..base..TransItemsWithinModVisitor<'a, 'tcx> as rustc..hir..intravisit..Visitor<'v>>::visit_item::hfffd713d2b3de363
  25:     0x7f5319a9e337 - rustc_trans::base::trans_crate::h75826f6271b49faf
  26:     0x7f531b0e007f - rustc_driver::driver::phase_4_translate_to_llvm::hbc7e9672529bb439
  27:     0x7f531b0dd06c - rustc_driver::driver::compile_input::_$u7b$$u7b$closure$u7d$$u7d$::h7168080c5b7e33b9
  28:     0x7f531b0d977d - rustc_driver::driver::phase_3_run_analysis_passes::_$u7b$$u7b$closure$u7d$$u7d$::hded790081e457a76
  29:     0x7f531b0d2f49 - rustc::ty::context::TyCtxt::create_and_enter::h7622c0f52ea2e7fe
  30:     0x7f531b09080f - rustc_driver::driver::compile_input::hdfe4405d66704c31
  31:     0x7f531b07cf44 - rustc_driver::run_compiler::h581448fb74257353
  32:     0x7f531b07a04e - std::panicking::try::call::hf081e8ea5e252d1a
  33:     0x7f531aba863b - __rust_try
  34:     0x7f531aba85de - __rust_maybe_catch_panic
  35:     0x7f531b07ab34 - _<F as alloc..boxed..FnBox<A>>::call_box::h2d5dcb354b3ff8db
  36:     0x7f531ab98264 - std::sys::thread::Thread::new::thread_start::hf2eed4b6f7149599
  37:     0x7f5312e01183 - start_thread
  38:     0x7f531a7d237c - clone
  39:                0x0 - <unknown>

error: Could not compile `dtoa`.

@eddyb
Member

eddyb commented Oct 4, 2016

The other way around should also work: 1.12 with -Zorbit=off.
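
A sketch of that reverse experiment, reusing the same .cargo/config mechanism shown earlier in the thread (assumes 1.12, where -Zorbit defaults to on):

```toml
# .cargo/config — disable MIR trans on 1.12 to compare memory usage
[build]
rustflags = ["-Zorbit=off"]
```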

@nikomatsakis
Contributor

@rphmeier so I'm trying to build this on a fresh Windows VM but encountering some weird errors like this (which seem quite unrelated to the problem you are seeing):

"C:\Users\niko\.cargo\registry\src\github.com-1ecc6299db9ec823\ring-0.4.3\crypto\libring.Windows.vcxproj" (default target) (1) ->
(ClCompile target) ->
  rand\sysrand.c(50): fatal error C1083: Cannot open include file: 'windows.h': No such file or directory [C:\Users\niko\.cargo\registry\src\github.com-1ecc6299db9ec823\ring-0.4.3\crypto\libring.Windows.vcxproj]

    2 Warning(s)
    1 Error(s)

Time Elapsed 00:00:01.93

--- stderr
thread 'main' panicked at 'C:\Program Files (x86)\MSBuild\14.0\bin\amd64\msbuild.exe execution failed', C:\Users\niko\.cargo\registry\src\github.com-1ecc6299db9ec823\ring-0.4.3\build.rs:211
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Any thoughts on what I have to do here? I did install Visual Studio, but perhaps the right directories are not in my path?

@arkpar

arkpar commented Oct 4, 2016

@nikomatsakis I believe you need to use the "VS2015 x64 Native Tools Command Prompt"
see https://github.com/ethcore/parity#build-dependencies

@arkpar

arkpar commented Oct 4, 2016

@nikomatsakis the latest master has debug = false in Cargo.toml to work around this issue. Set it back to true or use this commit to reproduce:
b1d8b84eb96ebd8a8fe4da1014a344fcd8b43085
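
For reference, the combination that triggers the problem is release mode plus debuginfo, i.e. a Cargo profile along these lines (a sketch; the exact parity Cargo.toml may differ):

```toml
# Cargo.toml — optimized build with debuginfo, the combination that OOMs
[profile.release]
debug = true
```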

@nikomatsakis
Contributor

@arkpar thanks (for both comments =)

@nikomatsakis
Contributor

nikomatsakis commented Oct 4, 2016

OK, so, my efforts to reproduce this have not, in fact, reproduced the memory usage. I do however see this:

warning: struct is never used: `Headers`, #[warn(dead_code)] on by default
   --> ethcore\src\verification\queue\kind.rs:167:2
    |
167 |   pub struct Headers;
    |   ^^^^^^^^^^^^^^^^^^^

EH pad must be jumped to via an unwind edge
  %cleanuppad13 = cleanuppad within none []
  br i1 %1927, label %bb99, label %bb102_cleanup_trampoline_bb99, !dbg !727554
LLVM ERROR: Broken function found, compilation aborted!
error: Could not compile `ethcore`.

which I guess is #36924

@arkpar

arkpar commented Oct 4, 2016

The memory issue has been reported for OSX and Linux specifically. Sorry for the confusion

@nikomatsakis
Contributor

@arkpar

The memory issue has been reported for OSX and Linux specifically. Sorry for the confusion

No need to apologize, I see that @rphmeier in fact wrote this specifically:

Looks like it mostly affects users of OSX.

I'm just juggling too many things I guess. =( I'll try again on OS X. Good to be able to reproduce #36924 regardless, I suppose.

@arielb1
Contributor

arielb1 commented Oct 5, 2016

High memory usage (causing my Windows to page to death - when will Microsoft understand that the alternative to the OOM killer is the power switch) confirmed on Rust 1.14 on 64-bit MSVC.

@alexcrichton
Member

I was able to reproduce this on Linux with the 1.12 compiler, but I was unable to get Valgrind's massif tool to work with that compiler because of jemalloc. I recompiled 3191fba (the current master as of a few hours ago) with alloc_system and was able to get a massif profile of the run. The full output is: https://gist.github.com/alexcrichton/0f1f043442ef1628fe887d27fe3ed436. The corresponding -Z time-passes -Z time-llvm-passes output is here, and I saw the massive spike in memory happen just after the LLVM module passes, presumably during the codegen passes.

The first peak in the profile has 80% of the memory with one stack trace, which to me looks like LLVM.

This may already be fixed upstream; if not, it's something we should probably report upstream.
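
For anyone wanting to repeat the measurement, a rough sketch of the setup (assumes a rustc rebuilt with alloc_system, since massif cannot follow jemalloc's allocations; the input file name is purely illustrative):

```shell
# Sketch: heap-profile an optimized-with-debuginfo rustc run under massif.
# Requires valgrind; big_crate.rs stands in for the real crate root.
if command -v valgrind >/dev/null 2>&1 && [ -f big_crate.rs ]; then
    valgrind --tool=massif --massif-out-file=massif.out \
        rustc -C opt-level=3 -C debuginfo=2 big_crate.rs
    # summarize the heap snapshots massif recorded
    ms_print massif.out | head -n 30
    ran=1
else
    echo "valgrind or big_crate.rs not available; skipping massif run"
    ran=0
fi
```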

@alexcrichton
Member

I suppose this could also be how MIR inevitably changed our debuginfo output from old trans, so something there may have accidentally triggered something pathological.

@michaelwoerister
Member

This might be relevant: https://llvm.org/bugs/show_bug.cgi?id=26055
It says that LLVM's LiveDebugValues pass is very slow, especially in the presence of function inlining (which would explain why this only occurs in optimized builds). There is also a patch addressing this: https://reviews.llvm.org/D24994. I don't know how hard it would be to backport it and see if it solves the problem.

@michaelwoerister
Member

From the commit message of the patch:

I have a benchmark that is a large C++ source with an enormous amount of inlined "this"-pointers that would previously eat >24GiB (most of them for DBG_VALUE intrinsics) and whose compile time was dominated by LiveDebugValues. With this patch applied the memory consumption is 1GiB and 1.7% of the time is spent in LiveDebugValues.

@brson brson added the P-high High priority label Oct 6, 2016
@rphmeier
Contributor Author

rphmeier commented Oct 7, 2016

@michaelwoerister Great find! I'm not too knowledgeable about the Rust release conventions, but if this LLVM patch were to be backported, would it find its way into the beta branch or only nightly?

@michaelwoerister
Member

@rphmeier If the patch is small enough, maybe? Do we have precedent for that, updating LLVM in beta, @brson @alexcrichton ?

@alexcrichton
Member

I'm not sure if we have precedent, but that patch seems small enough that I'd be fine backporting. I think we'd have to prep two separate LLVM branches (to backport to both nightly and beta), but that seems fine to me.

@michaelwoerister
Member

A little update: I was able to reproduce the memory spike in LLVM codegen passes on Linux (stable + nightly) and OS X (stable + nightly).

@michaelwoerister
Member

Alright, cherry-picking the following two commits solves the problem on nightly:
llvm-mirror/llvm@275a9fe
llvm-mirror/llvm@b835e6e

How should we proceed? Apply the patches on all 3 channels? It is something of a corner case that is only hit when optimizations and debuginfo are both enabled, and probably only with a big function with lots of inlining that triggers the pathological behavior.

@alexcrichton
Member

Thanks for the investigation @michaelwoerister! My preferred course of action would be:

  1. Backport the commits to our LLVM fork, land it on nightly
  2. Backport the commits to our LLVM fork, land it on beta

Then determine if we'd like to land on stable (doing the same backporting). @brson thoughts?

@michaelwoerister
Member

Removed I-nominated tag since we've talked about this in the compiler meeting.

@michaelwoerister
Member

I've opened a PR with the fixes against the LLVM version that nightly uses: rust-lang/llvm#53

bors added a commit that referenced this issue Oct 10, 2016
…crichton

llvm: Update LLVM to include fix for pathologic case in its LiveDebugValues pass.

See #36926.
r? @alexcrichton
@nikomatsakis
Contributor

This should be fixed in nightly builds, right? (We're still waiting on a backport, AFAIK.)

@michaelwoerister
Member

Yes, the latest nightly should contain the fix.

@arkpar

arkpar commented Oct 14, 2016

Confirmed fixed in rustc 1.14.0-nightly (098d22845 2016-10-13)

@brson
Contributor

brson commented Oct 20, 2016

Thanks @arkpar!

@brson brson closed this as completed Oct 20, 2016