Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a generic Destination Propagation optimization on MIR #72632

Merged
merged 20 commits into from
Sep 20, 2020
Merged

Implement a generic Destination Propagation optimization on MIR #72632

merged 20 commits into from
Sep 20, 2020

Conversation

jonas-schievink
Copy link
Contributor

@jonas-schievink jonas-schievink commented May 26, 2020

This takes the work that was originally started by @eddyb in #47954, and then explored by me in #71003, and implements it in a general (ie. not limited to acyclic CFGs) and dataflow-driven way (so that no additional infrastructure in rustc is needed).

The pass is configured to run at mir-opt-level=2 and higher only. To enable it by default, some followup work on it is still needed:

  • Performance needs to be evaluated. I did some light optimization work and tested against tuple-stress, which caused trouble in my last attempt, but didn't go much in depth here.
    • We can also enable the pass only at opt-level=2 and higher, if it is too slow to run in debug mode, but fine when optimizations run anyways.
  • Debuginfo needs to be fixed after locals are merged. I did not look into what is required for this.
  • Live ranges of locals (aka StorageLive and StorageDead) are currently deleted. We either need to decide that this is fine, or if not, merge the variable's live ranges (or remove these statements entirely – StorageLive (and even StorageDead) may be unnecessary in MIR. #68622).

Some benchmarks of the pass were done in #72635.

@rust-highfive
Copy link
Collaborator

r? @eddyb

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 26, 2020
@jonas-schievink jonas-schievink changed the title Implement a generic Destination Propagation optimization on MIR [WIP] Implement a generic Destination Propagation optimization on MIR May 27, 2020
@jonas-schievink

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Jun 3, 2020

☔ The latest upstream changes (presumably #72935) made this pull request unmergeable. Please resolve the merge conflicts.

@jrmuizel
Copy link
Contributor

#45663, #58082, #56172 are some unnecessary memcpy issues. It would be interesting to know how many of them are solved by this approach.

@goffrie
Copy link
Contributor

goffrie commented Jun 10, 2020

#45663, #58082, #56172 are some unnecessary memcpy issues. It would be interesting to know how many of them are solved by this approach.

I tried it out and it looks like #56172 is improved with this PR (no memcpys) but the others are unchanged.

@jonas-schievink
Copy link
Contributor Author

jonas-schievink commented Jun 10, 2020

I haven't looked into these in detail, but:

Seems like we're very close to handling them though.

@jrmuizel
Copy link
Contributor

Seems like we're very close to handling them though.

Exciting!

@rust-highfive

This comment has been minimized.

@jonas-schievink jonas-schievink changed the title [WIP] Implement a generic Destination Propagation optimization on MIR Implement a generic Destination Propagation optimization on MIR Jun 24, 2020
@dotdash
Copy link
Contributor

dotdash commented Jun 26, 2020

This modified example from #56172 regresses with this pass:

#[inline(never)]
pub fn g(clip: Option<&bool>) {
    clip.unwrap();
    let item = SpecificDisplayItem::PopStackingContext;
    do_item(&DI {
            item,
    });
    do_item(&DI {
            item,
    });
}

pub enum SpecificDisplayItem {
    PopStackingContext,
    Other([f64; 22]),
}

struct DI {
    item: SpecificDisplayItem,
}


fn do_item(di: &DI) { unsafe { ext(di) } }
extern {
    fn ext(di: &DI);
}

Nightly optimizes this properly, but with this pass, one of the DI instances is directly initialized (good), but then this is used to initialize the other instance, which is bad because that means that the memcpy can no longer be omitted and the lifetimes of those stack allocations are now overlapping so stack coloring can no longer collapse them into a single stack slot, increasing memory usage.

@dotdash
Copy link
Contributor

dotdash commented Jun 26, 2020

Since I always expected this to work the other way around, replacing dest with source, not the other way around, what's the reasoning behind doing it this way? Feel free to just point me somewhere where I can read up on it myself, I just didn't find anything right away.

@bjorn3
Copy link
Member

bjorn3 commented Jun 26, 2020

The return variable is always called _0, so if you have _0 = _10, then every assignment of _10 must be replaced with an assignment of _0 to perform destination propagation.

@dotdash
Copy link
Contributor

dotdash commented Jun 26, 2020

That's what RVO does, assigning the return slot to a value that would otherwise use a separate stack slot, but that's just one way. Doing copy propagation works just as well as long as there is only one possible value that is being assigned.

_5 = "Hello"
_0 = _5

is transformed to the following by this pass:

_0 = "Hello"
nop

but copy propagation would lead to:

_5 = "Hello"
_0 = "Hello"

Something like dead store elimination would then have clean up the dead assignment to _5 (or have a built-in DSE).

The example from above is akin to

_5 = "Hello"
_2 = _5
call(_2)
_3 = _5
call(_3)

and gets converted to

_3 = "Hello"
_2 = _3
call(_2)
nop
call(_3)

while copy propagation would give

_5 = "Hello"
_2 = "Hello"
call(_2)
_3 = "Hello"
call(_3)

The difference being that copy propagation doesn't force the two destinations to be alive at the same time, and apparently allowing more optimizations to happen e.g. in case of aggregates.

The case where the approach taken here works better is like this:

if _10, goto bb1 else goto bb2

bb1:
  _2 = "Hello"
  goto bb3

bb2:
  _2 = "Bye"
  goto bb3

bb3:
  _5 = _2

I'm wondering whether it's more common to use a single value in multiple places, or assign to single destination from a value that has been set in multiple places.

I suppose the truth is somewhere in between... Maybe limiting this pass to cases where only a single replacement is possible is useful?

@bors

This comment has been minimized.

@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 14, 2020
@Dylan-DPC-zz Dylan-DPC-zz added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 30, 2020
@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Sep 18, 2020
@bors
Copy link
Contributor

bors commented Sep 19, 2020

⌛ Testing commit 2f9271b with merge 5b31eada25b8031d6ddf1ef9c45878b0751431e7...

@bors
Copy link
Contributor

bors commented Sep 19, 2020

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Sep 19, 2020
@jonas-schievink
Copy link
Contributor Author

Assertion failed: ((((rwlock_t *)*rwl)->valid == LIFE_RWLOCK) && (((rwlock_t *)*rwl)->busy > 0)), file ../../src/mingw-w64/mingw-w64-libraries/winpthreads/src/rwlock.c, line 40

Scary but spurious?

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 19, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Sep 20, 2020
Implement a generic Destination Propagation optimization on MIR

This takes the work that was originally started by @eddyb in rust-lang#47954, and then explored by me in rust-lang#71003, and implements it in a general (ie. not limited to acyclic CFGs) and dataflow-driven way (so that no additional infrastructure in rustc is needed).

The pass is configured to run at `mir-opt-level=2` and higher only. To enable it by default, some followup work on it is still needed:
* Performance needs to be evaluated. I did some light optimization work and tested against `tuple-stress`, which caused trouble in my last attempt, but didn't go much in depth here.
  * We can also enable the pass only at `opt-level=2` and higher, if it is too slow to run in debug mode, but fine when optimizations run anyways.
* Debuginfo needs to be fixed after locals are merged. I did not look into what is required for this.
* Live ranges of locals (aka `StorageLive` and `StorageDead`) are currently deleted. We either need to decide that this is fine, or if not, merge the variable's live ranges (or remove these statements entirely – rust-lang#68622).

Some benchmarks of the pass were done in rust-lang#72635.
@bors
Copy link
Contributor

bors commented Sep 20, 2020

⌛ Testing commit 2f9271b with merge 255a4c5...

@bors
Copy link
Contributor

bors commented Sep 20, 2020

☀️ Test successful - checks-actions, checks-azure
Approved by: oli-obk
Pushing 255a4c5 to master...

@rust-log-analyzer
Copy link
Collaborator

The job i686-mingw-1 of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
[RUSTC-TIMING] rustc_resolve test:false 127.764
[RUSTC-TIMING] rustc_plugin_impl test:false 13.873
[RUSTC-TIMING] rustc_privacy test:false 49.555
   Compiling rustc_interface v0.0.0 (D:\a\rust\rust\compiler\rustc_interface)
Assertion failed: ((((rwlock_t *)*rwl)->valid == LIFE_RWLOCK) && (((rwlock_t *)*rwl)->busy > 0)), file ../../src/mingw-w64/mingw-w64-libraries/winpthreads/src/rwlock.c, line 40
error: could not compile `rustc_interface`.

To learn more, run the command again with --verbose.



command did not execute successfully: "D:\\a\\rust\\rust\\build\\i686-pc-windows-gnu\\stage0\\bin\\cargo.exe" "test" "--target" "i686-pc-windows-gnu" "-Zbinary-dep-depinfo" "-j" "8" "--release" "--locked" "--color" "always" "--features" " llvm max_level_info" "--manifest-path" "D:\\a\\rust\\rust\\compiler/rustc/Cargo.toml" "-p" "rustc_interface" "--"


failed to run: D:\a\rust\rust\build\bootstrap\debug\bootstrap test --stage 2 --exclude src/test/ui --exclude src/test/compile-fail
Build completed unsuccessfully in 0:45:08
Build completed unsuccessfully in 0:45:08
make: *** [Makefile:82: ci-mingw-subset-1] Error 1
  local time: Sat Sep 19 23:05:35 CUT 2020
  network time: Sat, 19 Sep 2020 23:05:35 GMT
== end clock drift check ==
== end clock drift check ==
##[error]Process completed with exit code 2.
Terminate orphan process: pid (3068) (node)
Terminate orphan process: pid (7104) (python)
Terminate orphan process: pid (3936) (sccache)

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @rust-lang/infra. (Feature Requests)

@mati865
Copy link
Contributor

mati865 commented Sep 22, 2020

@jonas-schievink is there issue open for that assertion?

@jonas-schievink
Copy link
Contributor Author

No, I haven't seen that happen anywhere else so far

Comment on lines +659 to +660
// The intended semantics here aren't documented, we just assume that nothing that
// could be written to by the assembly may overlap with any other operands.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics of inline assembly are documented in the unstable book:

Operand expressions are evaluated from left to right, just like function call arguments. After the asm! has executed, outputs are written to in left to right order. This is significant if two outputs point to the same place: that place will contain the value of the rightmost output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.