Remove special-case handling of `vec.split_off(0)`
#119917
r? @cuviper (rustbot has picked a reviewer for you, use r? to override)
That isn't really a cost since it's the same old allocation. I'd say the extra cost is the
😮💨 I wish people stopped documenting implementation details that aren't necessary to the function of a method. I think we should ping @richkadel, though; perhaps he can give arguments in favor of the existing code. Also note that we now have
The “cost” that I have in mind is not just the larger allocation for In many cases, that cost won't matter. The returned vector might happen to be more than half-full, or it might be very short-lived, or it might be only a negligible fraction of the overall program's memory usage. But if (If the user knows this quirk of the implementation, and wants to avoid it, they may end up having to copy the returned vector into a new vector with a more appropriate capacity. Doing so completely undermines the benefit of trying to avoid the copy in the first place.) If the standard library decides to incur that extra cost on the user's behalf in the (I think it's also significant that this choice of implementation only comes up for the
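To make that cost concrete, here is a minimal sketch that emulates the removed `at == 0` special case with a hypothetical helper, `split_off_zero_stealing` (modeled on how this thread describes #76682, not copied from it), and recycles one buffer across several short batches. Each returned batch holds only a few elements but keeps a buffer of at least the original capacity.

```rust
use std::mem;

/// Hypothetical emulation of the removed `at == 0` special case:
/// the caller receives the old allocation (with its full capacity),
/// and `v` is left empty with a fresh buffer of the same capacity.
fn split_off_zero_stealing<T>(v: &mut Vec<T>) -> Vec<T> {
    let cap = v.capacity();
    mem::replace(v, Vec::with_capacity(cap))
}

/// Recycles one large buffer across several short batches and returns
/// the emitted batches; each one still owns a large allocation.
fn recycle_in_loop(buffer_capacity: usize, batch_lens: &[usize]) -> Vec<Vec<u32>> {
    let mut buf: Vec<u32> = Vec::with_capacity(buffer_capacity);
    let mut out = Vec::new();
    for &len in batch_lens {
        // Fill the buffer with a short batch (no reallocation needed).
        buf.extend(0..len as u32);
        out.push(split_off_zero_stealing(&mut buf));
    }
    out
}

fn main() {
    let batches = recycle_in_loop(1024, &[3, 1, 2]);
    let total_len: usize = batches.iter().map(Vec::len).sum();
    let total_cap: usize = batches.iter().map(Vec::capacity).sum();
    println!("{total_len} elements held in {total_cap} slots of capacity");
}
```

With a plain copying `split_off(0)`, each batch would instead get a new allocation sized by the standard library from its length, while the recycled buffer keeps its own capacity.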
Curiously, Calling
Thank you for identifying this issue and raising some great points! I really appreciate that @Zalathar took the time to read through my original defense in #76682, so I don't need to reiterate everything. I do want to raise awareness (for anyone that didn't read through that PR) that there were rudimentary tests done to show the performance benefit of the change, for the use case I was trying to solve. So to revert it now will negatively impact existing code that currently depends on the behavior from 2021 until now. In the PR, we did discuss the fact that the rustdoc for I did mention So to simply revert it puts us back in what I think is still a less than ideal place. But I 100% agree that the capacity issues of both the initial value and the returned value are problematic. My opinion is just mine, so I hope you get input from a few others, including @Mark-Simulacrum who approved the original change, but I suggest:
If you change it, I think it's especially important to update the rustdoc this time around. Recommending the
There are other possible approaches, such as only moving the allocation when it's not significantly oversized.
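A minimal sketch of that idea, using a hypothetical helper `take_front` and an arbitrary threshold: the allocation is moved to the caller only when at least half of its capacity is in use; otherwise the elements are copied out with `drain(..).collect()`, and the caller gets a new allocation whose exact size is left to the standard library. The half-full threshold is illustrative, not something the thread settled on.

```rust
use std::mem;

/// Hypothetical helper: takes every element out of `v`.
/// - If the buffer is at least half full, the allocation itself is moved to
///   the caller (no element copy) and `v` gets a fresh buffer of equal capacity.
/// - Otherwise the elements are copied into a new vector, and `v` keeps its
///   (mostly empty) allocation for reuse.
fn take_front<T>(v: &mut Vec<T>) -> Vec<T> {
    let cap = v.capacity();
    // Equivalent to `2 * len >= cap`, written to avoid overflow
    // (`cap >= len` always holds, so the subtraction cannot underflow).
    if v.len() >= cap - v.len() {
        mem::replace(v, Vec::with_capacity(cap))
    } else {
        v.drain(..).collect()
    }
}

fn main() {
    let mut sparse: Vec<u8> = Vec::with_capacity(1000);
    sparse.push(7);
    let taken = take_front(&mut sparse);
    println!("taken {taken:?}; buffer keeps capacity {}", sparse.capacity());
}
```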
Yeah, regardless of whether this PR is accepted or not, it would be good to add suggestions for other similar options, especially in the
Also, as mentioned above, the current implementation is inconsistent with the “newly allocated vector” wording, so that tension ought to be resolved in some way.
I have also found that https://doc.rust-lang.org/std/collections/index.html#sequences lists the performance of
As long as no call to realloc happens, capacity shouldn't factor into performance, since requesting uninitialized memory from the OS doesn't depend on the size of the allocation.
I think this makes sense, especially with doc additions to discuss the various capacity options. But it's enough of a change to warrant talking as a team. @rustbot +I-libs-nominated
This was raised in https://rust-lang.zulipchat.com/#narrow/stream/259402-t-libs.2Fmeetings/topic/Meeting.202024-01-17/near/416062684, but that particular discussion didn't reach a specific conclusion/consensus before moving on to other nominated issues.
Rollup merge of rust-lang#120180 - Zalathar:vec-split-off-alternatives, r=dtolnay Document some alternatives to `Vec::split_off` One of the discussion points that came up in rust-lang#119917 is that some people use `Vec::split_off` in cases where they probably shouldn't, because the alternatives (like `mem::take`) are hard to discover. This PR adds some suggestions to the documentation of `split_off` that should point people towards alternatives that might be more appropriate for their use-case. I've deliberately tried to keep these changes as simple and uncontroversial as possible, so that they don't depend on how the team decides to handle the concerns raised in rust-lang#119917. That's why I haven't touched the existing documentation for `split_off`, and haven't added links to `split_off` to the documentation of other methods.
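The commit message names `mem::take` as one of the alternatives that is hard to discover. The sketch below puts it next to `Vec::truncate` and `Vec::drain`, two other standard methods that cover common `split_off` uses; whether #120180 recommends exactly these is an assumption here, since the message does not list them. The function name `alternatives_demo` is hypothetical.

```rust
use std::mem;

/// Hypothetical demo: returns what each alternative produces on small
/// sample vectors as (taken, capacity left behind, truncated, drained, rest).
fn alternatives_demo() -> (Vec<&'static str>, usize, Vec<i32>, Vec<i32>, Vec<i32>) {
    // `mem::take`: move out the whole contents *and* the allocation; the
    // original is left as `Vec::new()`, so no elements are copied.
    let mut log = vec!["a", "b", "c"];
    let taken = mem::take(&mut log);

    // `truncate`: drop the tail in place when the removed items are not
    // needed as a separate vector.
    let mut nums = vec![1, 2, 3, 4, 5];
    nums.truncate(2);

    // `drain`: take an arbitrary range as an iterator; `v` keeps its
    // allocation, and the caller decides where the items go.
    let mut v = vec![10, 20, 30, 40];
    let middle: Vec<i32> = v.drain(1..3).collect();

    (taken, log.capacity(), nums, middle, v)
}

fn main() {
    let (taken, left_cap, truncated, middle, rest) = alternatives_demo();
    println!("{taken:?} {left_cap} {truncated:?} {middle:?} {rest:?}");
}
```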
After a follow-up discussion, and with the docs already added in #120180, I think we're good! @bors r+
…iaskrgr Rollup of 9 pull requests Successful merges: - rust-lang#117420 (Make `#![allow_internal_unstable(..)]` work with `stmt_expr_attributes`) - rust-lang#117678 (Stabilize `slice_group_by`) - rust-lang#119917 (Remove special-case handling of `vec.split_off(0)`) - rust-lang#120117 (Update `std::io::Error::downcast` return type) - rust-lang#120329 (RFC 3349 precursors) - rust-lang#120339 (privacy: Refactor top-level visiting in `NamePrivacyVisitor`) - rust-lang#120345 (Clippy subtree update) - rust-lang#120360 (Don't fire `OPAQUE_HIDDEN_INFERRED_BOUND` on sized return of AFIT) - rust-lang#120372 (Fix outdated comment on Box) r? `@ghost` `@rustbot` modify labels: rollup
Rollup merge of rust-lang#119917 - Zalathar:split-off, r=cuviper Remove special-case handling of `vec.split_off(0)` rust-lang#76682 added special handling to `Vec::split_off` for the case where `at == 0`. Instead of copying the vector's contents into a freshly-allocated vector and returning it, the special-case code steals the old vector's allocation, and replaces it with a new (empty) buffer with the same capacity. That eliminates the need to copy the existing elements, but comes at a surprising cost, as seen in rust-lang#119913. The returned vector's capacity is no longer determined by the size of its contents (as would be expected for a freshly-allocated vector), and instead uses the full capacity of the old vector. In cases where the capacity is large but the size is small, that results in a much larger capacity than would be expected from reading the documentation of `split_off`. This is especially bad when `split_off` is called in a loop (to recycle a buffer), and the returned vectors have a wide variety of lengths. I believe it's better to remove the special-case code, and treat `at == 0` just like any other value: - The current documentation states that `split_off` returns a “newly allocated vector”, which is not actually true in the current implementation when `at == 0`. - If the value of `at` could be non-zero at runtime, then the caller has already agreed to the cost of a full memcpy of the taken elements in the general case. Avoiding that copy would be nice if it were close to free, but the different handling of capacity means that it is not. - If the caller specifically wants to avoid copying in the case where `at == 0`, they can easily implement that behaviour themselves using `mem::replace`. Fixes rust-lang#119913.
Pkgsrc changes:
* Adapt checksums and patches.

Upstream changes:

Version 1.77.0 (2024-03-21)
==========================

- [Reveal opaque types within the defining body for exhaustiveness checking.](rust-lang/rust#116821)
- [Stabilize C-string literals.](rust-lang/rust#117472)
- [Stabilize THIR unsafeck.](rust-lang/rust#117673)
- [Add lint `static_mut_refs` to warn on references to mutable statics.](rust-lang/rust#117556)
- [Support async recursive calls (as long as they have indirection).](rust-lang/rust#117703)
- [Undeprecate lint `unstable_features` and make use of it in the compiler.](rust-lang/rust#118639)
- [Make inductive cycles in coherence ambiguous always.](rust-lang/rust#118649)
- [Get rid of type-driven traversal in const-eval interning](rust-lang/rust#119044), only as a [future compatibility lint](rust-lang/rust#122204) for now.
- [Deny braced macro invocations in let-else.](rust-lang/rust#119062)

Compiler
--------

- [Include lint `soft_unstable` in future breakage reports.](rust-lang/rust#116274)
- [Make `i128` and `u128` 16-byte aligned on x86-based targets.](rust-lang/rust#116672)
- [Use `--verbose` in diagnostic output.](rust-lang/rust#119129)
- [Improve spacing between printed tokens.](rust-lang/rust#120227)
- [Merge the `unused_tuple_struct_fields` lint into `dead_code`.](rust-lang/rust#118297)
- [Error on incorrect implied bounds in well-formedness check](rust-lang/rust#118553), with a temporary exception for Bevy.
- [Fix coverage instrumentation/reports for non-ASCII source code.](rust-lang/rust#119033)
- [Fix `fn`/`const` items implied bounds and well-formedness check.](rust-lang/rust#120019)
- [Promote `riscv32{im|imafc}-unknown-none-elf` targets to tier 2.](rust-lang/rust#118704)
- Add several new tier 3 targets:
  - [`aarch64-unknown-illumos`](rust-lang/rust#112936)
  - [`hexagon-unknown-none-elf`](rust-lang/rust#117601)
  - [`riscv32imafc-esp-espidf`](rust-lang/rust#119738)
  - [`riscv32im-risc0-zkvm-elf`](rust-lang/rust#117958)

Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support.

Libraries
---------

- [Implement `From<&[T; N]>` for `Cow<[T]>`.](rust-lang/rust#113489)
- [Remove special-case handling of `vec.split_off(0)`.](rust-lang/rust#119917)

Stabilized APIs
---------------

- [`array::each_ref`](https://doc.rust-lang.org/stable/std/primitive.array.html#method.each_ref)
- [`array::each_mut`](https://doc.rust-lang.org/stable/std/primitive.array.html#method.each_mut)
- [`core::net`](https://doc.rust-lang.org/stable/core/net/index.html)
- [`f32::round_ties_even`](https://doc.rust-lang.org/stable/std/primitive.f32.html#method.round_ties_even)
- [`f64::round_ties_even`](https://doc.rust-lang.org/stable/std/primitive.f64.html#method.round_ties_even)
- [`mem::offset_of!`](https://doc.rust-lang.org/stable/std/mem/macro.offset_of.html)
- [`slice::first_chunk`](https://doc.rust-lang.org/stable/std/primitive.slice.html#method.first_chunk)
- [`slice::first_chunk_mut`](https://doc.rust-lang.org/stable/std/primitive.slice.html#method.first_chunk_mut)
- [`slice::split_first_chunk`](https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_first_chunk)
- [`slice::split_first_chunk_mut`](https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_first_chunk_mut)
- [`slice::last_chunk`](https://doc.rust-lang.org/stable/std/primitive.slice.html#method.last_chunk)
- [`slice::last_chunk_mut`](https://doc.rust-lang.org/stable/std/primitive.slice.html#method.last_chunk_mut)
- [`slice::split_last_chunk`](https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_last_chunk)
- [`slice::split_last_chunk_mut`](https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_last_chunk_mut)
- [`slice::chunk_by`](https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunk_by)
- [`slice::chunk_by_mut`](https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunk_by_mut)
- [`Bound::map`](https://doc.rust-lang.org/stable/std/ops/enum.Bound.html#method.map)
- [`File::create_new`](https://doc.rust-lang.org/stable/std/fs/struct.File.html#method.create_new)
- [`Mutex::clear_poison`](https://doc.rust-lang.org/stable/std/sync/struct.Mutex.html#method.clear_poison)
- [`RwLock::clear_poison`](https://doc.rust-lang.org/stable/std/sync/struct.RwLock.html#method.clear_poison)

Cargo
-----

- [Extend the build directive syntax with `cargo::`.](rust-lang/cargo#12201)
- [Stabilize metadata `id` format as `PackageIDSpec`.](rust-lang/cargo#12914)
- [Pull out `cargo-util-schemas` as a crate.](rust-lang/cargo#13178)
- [Strip all debuginfo when debuginfo is not requested.](rust-lang/cargo#13257)
- [Inherit jobserver from env for all kinds of runners.](rust-lang/cargo#12776)
- [Deprecate rustc plugin support in cargo.](rust-lang/cargo#13248)

Rustdoc
-------

- [Allows links in markdown headings.](rust-lang/rust#117662)
- [Search for tuples and unit by type with `()`.](rust-lang/rust#118194)
- [Clean up the source sidebar's hide button.](rust-lang/rust#119066)
- [Prevent JS injection from `localStorage`.](rust-lang/rust#120250)

Misc
----

- [Recommend version-sorting for all sorting in style guide.](rust-lang/rust#115046)

Internal Changes
----------------

These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools.

- [Add more weirdness to `weird-exprs.rs`.](rust-lang/rust#119028)
#76682 added special handling to `Vec::split_off` for the case where `at == 0`. Instead of copying the vector's contents into a freshly-allocated vector and returning it, the special-case code steals the old vector's allocation, and replaces it with a new (empty) buffer with the same capacity.

That eliminates the need to copy the existing elements, but comes at a surprising cost, as seen in #119913. The returned vector's capacity is no longer determined by the size of its contents (as would be expected for a freshly-allocated vector), and instead uses the full capacity of the old vector.

In cases where the capacity is large but the size is small, that results in a much larger capacity than would be expected from reading the documentation of `split_off`. This is especially bad when `split_off` is called in a loop (to recycle a buffer), and the returned vectors have a wide variety of lengths.

I believe it's better to remove the special-case code, and treat `at == 0` just like any other value:

- The current documentation states that `split_off` returns a “newly allocated vector”, which is not actually true in the current implementation when `at == 0`.
- If the value of `at` could be non-zero at runtime, then the caller has already agreed to the cost of a full memcpy of the taken elements in the general case. Avoiding that copy would be nice if it were close to free, but the different handling of capacity means that it is not.
- If the caller specifically wants to avoid copying in the case where `at == 0`, they can easily implement that behaviour themselves using `mem::replace`.

Fixes #119913.
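The last point can be sketched as a small caller-side helper. `split_off_or_steal` is a hypothetical name: it defers to `Vec::split_off` for every `at` except zero, where it uses `mem::replace` to hand over the existing allocation and leave behind an empty buffer of the same capacity, matching how the special case is described above. Callers who opt into this also opt into the capacity behaviour discussed in #119913.

```rust
use std::mem;

/// Hypothetical helper: like `Vec::split_off(at)`, except that `at == 0`
/// moves the whole allocation to the caller instead of copying, leaving
/// `v` empty with a fresh buffer of the same capacity.
fn split_off_or_steal<T>(v: &mut Vec<T>, at: usize) -> Vec<T> {
    if at == 0 {
        let cap = v.capacity();
        mem::replace(v, Vec::with_capacity(cap))
    } else {
        // Panics if `at > v.len()`, just like `split_off`.
        v.split_off(at)
    }
}

fn main() {
    let mut v = vec![1, 2, 3, 4];
    let tail = split_off_or_steal(&mut v, 2); // copies [3, 4] out
    let all = split_off_or_steal(&mut v, 0); // moves the buffer holding [1, 2]
    println!("{tail:?} {all:?} {}", v.len());
}
```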