interpret: make typed copies lossy wrt provenance and padding #129778

RalfJung · 2024-08-30T08:57:21Z

A "typed copy" in Rust can be a lossy process: when copying at type usize (or any other non-pointer type), if the original memory had any provenance, that provenance is lost. When copying at pointer type, if the original memory had partial provenance (i.e., not the same provenance for all bytes), that provenance is lost. When copying any type with padding, the contents of padding are lost.

This PR equips our validity-checking pass with the ability to reset provenance and padding according to those rules. Can be reviewed commit-by-commit. The first three commits are just preparation without any functional change.

Fixes rust-lang/miri#845
Fixes rust-lang/miri#2182

rustbot · 2024-08-30T08:57:29Z

r? @michaelwoerister

rustbot has assigned @michaelwoerister.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot · 2024-08-30T08:57:31Z

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

The Miri subtree was changed

cc @rust-lang/miri

michaelwoerister · 2024-09-03T08:32:46Z

r? compiler

saethlin · 2024-09-03T14:35:46Z

r? saethlin

compiler/rustc_const_eval/src/interpret/validity.rs

saethlin · 2024-09-05T04:44:05Z

compiler/rustc_const_eval/src/const_eval/machine.rs

@@ -65,6 +65,9 @@ pub struct CompileTimeMachine<'tcx> {
    /// storing the result in the given `AllocId`.
    /// Used to prevent reads from a static's base allocation, as that may allow for self-initialization loops.
    pub(crate) static_root_ids: Option<(AllocId, LocalDefId)>,
+
+    /// A cache of "data range" computations for unions (i.e., the offsets of non-padding bytes).


This seems like the sort of thing that would normally be a query. Is there a reason it isn't?

I'm not sure we even want this cached during a normal compilation (but it probably won't hurt much either), and the RangeSet type is currently only declared in the interpreter so can't be used as a query. Also I am largely unfamiliar with the query system so adding a query is scary. ;)

Ah, because we won't encounter unions repeatedly in const eval. That makes sense.

If we have many constants of the same (union) type, then we would encounter it repeatedly and caching might be worth it. But we don't do validation during const-eval by default so the cache is not nearly as necessary as for Miri.

I can see how making this a query makes sense. But it would probably be a query we wouldn't want to serialize for incremental (I assume there's a way to do that), and as I said I am not sure where we would then put the RangeSet. Probably rustc_data_structures, but that seems odd for such a specific data structure. I'm open for suggestions though. :)

You've made a good case that we shouldn't make this into a query in this PR, I'm filing away this conversation for the next time I'm squinting at interpreter profiles :)

saethlin · 2024-09-05T05:08:06Z

compiler/rustc_const_eval/src/util/check_validity_requirement.rs

+        .validate_operand(
+            &allocated.into(),
+            /*recursive*/ false,
+            /*reset_provenance_and_padding*/ false,


Why does this single caller not want provenance and padding reset? I looked at the docs for this function for context and I think they refer to a function might_permit_raw_init that doesn't exist?

This is used for assert_zero_valid / assert_mem_uninitialized_valid. The CompileTimeMachine that we operate in here is discarded at the end of this function, alongside the data we are validating here. So there's no reason to reset its padding and provenance, nobody will ever see those writes anyway.

So this function parameter and all the branching on it is supposed to be a performance optimization?

FYI I have taken to always using the hyperlink rustdoc syntax when mentioning other functions so that the link checker ensures that I can't mention a function that doesn't exist. The habit has already prevented a few of my doc comments from rotting. It's extra work, up to you if you want to adopt it.

So this function parameter and all the branching on it is supposed to be a performance optimization?

That's one part.

Also this is not the only caller that doesn't to the reset -- const_validate_operand also doesn't reset. Basically I wanted this change to be Miri-only.

And validate_operand in the recursive case, when it validates behind references, also doesn't reset padding and provenance -- and that is crucial, it would be wrong to reset this (and indeed the Miri test suite fails when you try, since if you ever have a reference to read-only memory it then blows up when trying to reset padding and provenance there). So branching in the validator on whether we reset provenance and padding is definitely required.

saethlin · 2024-09-05T05:14:03Z

I'm very happy to see less interior mutability. The compiler uses more casual Cell and RefCell than any other codebase I've worked in which seems ironic.

@rustbot author

rust-timer · 2024-09-10T08:03:37Z

Finished benchmarking commit (304b7f8): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.1%	[0.2%, 4.7%]	13
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results (secondary -2.8%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.8%	[-3.7%, -1.4%]	3
All ❌✅ (primary)	-	-	0

Cycles

Results (secondary -2.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.3%	[-2.3%, -2.3%]	1
All ❌✅ (primary)	-	-	0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 757.734s -> 756.628s (-0.15%)
Artifact size: 341.11 MiB -> 341.23 MiB (0.03%)

RalfJung · 2024-09-10T08:13:31Z

Oh wow, just having these branches sit there, even if they are never taken, shows up so strongly in the benchmarks?

I could add a Machine associated const that should make the branches trivially optimized away... but that will also mean one cannot enable resetting padding with -Zextra-const-ub-checks any more.

Kobzol · 2024-09-10T08:15:57Z

It only got a significant regression on CTFE stress tests. If you don't think that this should affect real programs, I think it would be fine to just take the hit (as it sounds that the proposed optimization would introduce a lot of complexity into the code).

RalfJung · 2024-09-10T08:30:10Z

It wouldn't be a lot of complexity, but it would make the (nightly-only) const-eval UB check less effective. But those checks already miss a bunch of UB anyway so maybe that's okay.

…venance-reset, r=<try> interpreter typed copy validation: statically disable padding/provenance in CTFE machine Let's see if this fixes the perf impact in rust-lang#129778. r? `@saethlin`

experiment: see where the perf regressions in rust-lang#129778 come from Let's see if we can figure out what caused the perf impact in rust-lang#129778. There are some extra functions in a few places so maybe more `inline(always)` helps... r? `@saethlin`

recovers some of the perf regressions from rust-lang#129778

djkoloski · 2024-09-11T03:25:29Z

Is it intended that after this change MaybeUninit::zeroed() no longer returns a completely-zeroed value? Because it's being returned from a function, the padding bytes of the interior T are being uninitialized.

I ran into this just now in rkyv, as I was using MaybeUninit::zeroed() to make a zeroed struct that eventually gets cast to bytes and written to a vector.

saethlin · 2024-09-11T04:22:26Z

Function returns are a typed copy. This PR only changed the behavior of the const eval interpreter, codegen has always been capable of breaking such code.

RalfJung · 2024-09-11T06:04:02Z

Yes, this is intended -- padding bytes are always uninitialized. And for very unfortunate reasons related to repr(transparent) and how the C ABI works on some targets, MaybeUninit<T> inherits the padding of T.

Happy to hear that is already found a bug in the wild. :D Is there something we can link to for this bug? If you wouldn't mind I'd like to add this to the "bugs Miri found" list.

djkoloski · 2024-09-11T06:36:45Z

Function returns are a typed copy.

Of course. I think the more salient question here is "what does a typed copy mean for unions?". Is there any documentation about what bytes are considered padding in unions? Is just MaybeUninit special? I think there's an argument to be made that unions having padding doesn't make sense, as what's stored in a union is not generally known to the compiler. I personally think "unions have padding where every field has padding" could make sense too, but I haven't seen any documentation about this as far as I can remember.

Is there something we can link to for this bug?

I think the best I have is this failed CI run. I haven't checked for other potentially-affected sources yet. Feel free to add whatever you'd like! 🙂

RalfJung · 2024-09-11T08:16:59Z

I think the more salient question here is "what does a typed copy mean for unions?".

Yes, that is indeed non-trivial.

I would love it to be a fully copy of all bytes, but sadly, that does not work. So currently we use "a union has padding where all fields have padding". I'd love to change this (see rust-lang/unsafe-code-guidelines#518), but I am not sure if that is possible.

interpret: mark some hot functions inline(always) That seems to recover a good part of the perf impact of rust-lang#129778. r? `@saethlin`

interpret: mark some hot functions inline(always) That seems to recover a good part of the perf impact of rust-lang/rust#129778. r? `@saethlin`

MaybeUninit::zeroed: mention that padding is not zeroed That should clarify cases like [this](rust-lang#129778 (comment)).

Rollup merge of rust-lang#130214 - RalfJung:zeroed, r=Mark-Simulacrum MaybeUninit::zeroed: mention that padding is not zeroed That should clarify cases like [this](rust-lang#129778 (comment)).

recovers some of the perf regressions from rust-lang#129778

rustbot assigned michaelwoerister Aug 30, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 30, 2024

RalfJung force-pushed the interp-lossy-typed-copy branch 3 times, most recently from ea84409 to 8d243c6 Compare August 30, 2024 09:22

This comment has been minimized.

Sign in to view

RalfJung force-pushed the interp-lossy-typed-copy branch from 8d243c6 to c9f962d Compare August 30, 2024 10:31

This comment has been minimized.

Sign in to view

RalfJung force-pushed the interp-lossy-typed-copy branch 4 times, most recently from 3c0e117 to cd76305 Compare August 30, 2024 12:06

This comment has been minimized.

Sign in to view

RalfJung force-pushed the interp-lossy-typed-copy branch from cd76305 to 327cf31 Compare August 30, 2024 12:54

rustbot assigned BoxyUwU and unassigned michaelwoerister Sep 3, 2024

rustbot assigned saethlin and unassigned BoxyUwU Sep 3, 2024

RalfJung commented Sep 3, 2024

View reviewed changes

compiler/rustc_const_eval/src/interpret/validity.rs Show resolved Hide resolved

saethlin reviewed Sep 5, 2024

View reviewed changes

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 5, 2024

RalfJung force-pushed the interp-lossy-typed-copy branch from e140716 to 9529feb Compare September 5, 2024 19:51

rustbot added this to the 1.83.0 milestone Sep 10, 2024

This was referenced Sep 10, 2024

const-eval interning: accept interior mutable pointers in final value #128543

Merged

miri: treat non-memory local variables properly for data race detection #129828

Merged

rustbot added the perf-regression Performance regression. label Sep 10, 2024

RalfJung deleted the interp-lossy-typed-copy branch September 10, 2024 08:16

RalfJung mentioned this pull request Sep 10, 2024

interpreter typed copy validation: statically disable padding/provenance in CTFE machine #130182

Closed

RalfJung added a commit to RalfJung/rust that referenced this pull request Sep 10, 2024

experiment: see where the perf regressions in rust-lang#129778 come from

0852bf2

RalfJung mentioned this pull request Sep 10, 2024

interpret: mark some hot functions inline(always) #130197

Merged

RalfJung added a commit to RalfJung/rust that referenced this pull request Sep 10, 2024

interpret: mark some hot functions inline(always)

d104ded

recovers some of the perf regressions from rust-lang#129778

RalfJung mentioned this pull request Sep 11, 2024

MaybeUninit::zeroed: mention that padding is not zeroed #130214

Merged

jswrenn mentioned this pull request Sep 12, 2024

Defend against padding instability google/zerocopy#1648

Closed

Zalathar added a commit to Zalathar/rust that referenced this pull request Sep 15, 2024

Rollup merge of rust-lang#130214 - RalfJung:zeroed, r=Mark-Simulacrum

235644f

MaybeUninit::zeroed: mention that padding is not zeroed That should clarify cases like [this](rust-lang#129778 (comment)).

Zalathar added a commit to Zalathar/rust that referenced this pull request Sep 15, 2024

Rollup merge of rust-lang#130214 - RalfJung:zeroed, r=Mark-Simulacrum

0648987

MaybeUninit::zeroed: mention that padding is not zeroed That should clarify cases like [this](rust-lang#129778 (comment)).

Mark-Simulacrum added the perf-regression-triaged The performance regression has been triaged. label Sep 16, 2024

RalfJung mentioned this pull request Oct 1, 2024

Automatic Rustup rust-lang/miri-test-libstd#79

Merged

shrirambalaji pushed a commit to shrirambalaji/rust that referenced this pull request Oct 6, 2024

interpret: mark some hot functions inline(always)

7a0e2ef

recovers some of the perf regressions from rust-lang#129778

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

interpret: make typed copies lossy wrt provenance and padding #129778

interpret: make typed copies lossy wrt provenance and padding #129778

RalfJung commented Aug 30, 2024 •

edited

Loading

rustbot commented Aug 30, 2024

rustbot commented Aug 30, 2024

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

michaelwoerister commented Sep 3, 2024

saethlin commented Sep 3, 2024

saethlin Sep 5, 2024

RalfJung Sep 5, 2024

saethlin Sep 5, 2024

RalfJung Sep 5, 2024

saethlin Sep 5, 2024

saethlin Sep 5, 2024

RalfJung Sep 5, 2024

saethlin Sep 5, 2024

saethlin Sep 5, 2024

RalfJung Sep 5, 2024 •

edited

Loading

saethlin commented Sep 5, 2024

rust-timer commented Sep 10, 2024

RalfJung commented Sep 10, 2024

Kobzol commented Sep 10, 2024

RalfJung commented Sep 10, 2024

djkoloski commented Sep 11, 2024

saethlin commented Sep 11, 2024

RalfJung commented Sep 11, 2024 •

edited

Loading

djkoloski commented Sep 11, 2024

RalfJung commented Sep 11, 2024

interpret: make typed copies lossy wrt provenance and padding #129778

interpret: make typed copies lossy wrt provenance and padding #129778

Conversation

RalfJung commented Aug 30, 2024 • edited Loading

rustbot commented Aug 30, 2024

rustbot commented Aug 30, 2024

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

michaelwoerister commented Sep 3, 2024

saethlin commented Sep 3, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RalfJung Sep 5, 2024 • edited Loading

Choose a reason for hiding this comment

saethlin commented Sep 5, 2024

rust-timer commented Sep 10, 2024

Overall result: ❌ regressions - ACTION NEEDED

RalfJung commented Sep 10, 2024

Kobzol commented Sep 10, 2024

RalfJung commented Sep 10, 2024

djkoloski commented Sep 11, 2024

saethlin commented Sep 11, 2024

RalfJung commented Sep 11, 2024 • edited Loading

djkoloski commented Sep 11, 2024

RalfJung commented Sep 11, 2024

RalfJung commented Aug 30, 2024 •

edited

Loading

RalfJung Sep 5, 2024 •

edited

Loading

RalfJung commented Sep 11, 2024 •

edited

Loading