Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Iterator::collect_array method #79659

Closed
wants to merge 1 commit into from

Conversation

matklad
Copy link
Member

@matklad matklad commented Dec 3, 2020

With const-generics, we can now collect exactly N elements out of
the iterator.

Very similar API existed in the itertools crate for quite some time:

https://docs.rs/itertools/0.9.0/itertools/trait.Itertools.html#method.collect_tuple

This needs more thorough tests before merging (esp around Drop & panic), I'll add them if this API as a whole feels like a good idea.

@matklad matklad added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-feature-request Category: A feature request, i.e: not implemented / a PR. A-const-generics Area: const generics (parameters and arguments) labels Dec 3, 2020
@rust-highfive
Copy link
Collaborator

r? @sfackler

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 3, 2020
where
Self: Sized,
{
struct Buf<T, const N: usize> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have this type somewhere already?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there is one in [T; N]::map() method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there are more copies in other PRs, like #79451.

Maybe it'd be worth waiting for #75644? Then this might essentially just be [T; N]::try_generate(|_| it.next())...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we need core::collections::array_vec already :D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm currently writing Iterator::array_chunks and I have +- the same code :)

(though it's a bit more convoluted since DoubleEndedItetator impl requires an ability to push_back)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we rarely get to our needs decision items 🙃

If we keep tabs on these unwind-safe-partially-initialized-array types we're introducing I'd be happy not to block any of these PRs individually on them and run a pass over later to replace them with something more general.

@matklad matklad force-pushed the collect_array branch 3 times, most recently from 9cf42aa to 7db95bf Compare December 3, 2020 11:20
With const-generics, we can now collect exactly `N` elements out of
the iterator.

Very similar API existed in the itertools crate for quite some time:

https://docs.rs/itertools/0.9.0/itertools/trait.Itertools.html#method.collect_tuple
@lcnr
Copy link
Contributor

lcnr commented Dec 3, 2020

is there a reason to prefer this over a more general FromIterator impl for arrays?

Do you intend to keep this as unstable until we are comfortable adding the impl as that is impl stable, similar to array::IntoIter::new?

@matklad
Copy link
Member Author

matklad commented Dec 3, 2020

is there a reason to prefer this over a more general FromIterator impl for arrays?

FromIterator can't file, this method returns an Option (and there's also a can-of-worms question of whether it should return a Result).

Do you intend to keep this as unstable until we are comfortable adding the impl as that is impl stable, similar to array::IntoIter::new?

I think I don't understand the question, could you rephrase it? Shooting in the dark, yes, I expect this to be unstable for quite some time, both because it uses const-generics, and because I think the API itself needs to bake in for some times. However, long term we absolutely should have something like this on stable, in one form or another.

@lcnr
Copy link
Contributor

lcnr commented Dec 3, 2020

We can add FromIterator for Result<[T; N]; Err> or FromIterator for Option<[T; N]> just fine and have a PR open which implements this: #69985

I think I don't understand the question, could you rephrase it?

My second question was based on my understanding that FromIterator for Option<[T; N]> would be preferable to a separate method. If we do want to add a FromIterator impl here I do not see the use of adding collect_array as well but I still think it would be useful to add an unstable method until FromIterator can be landed, because implementing that trait would be instantly stable.

@matklad
Copy link
Member Author

matklad commented Dec 3, 2020

Thanks for clarifying, I indeed misunderstood the question. I'll continue discussion in that PR to not split it over many threads.

@matklad
Copy link
Member Author

matklad commented Dec 3, 2020

I still think it would be useful to add an unstable method until FromIterator can be landed, because implementing that trait would be instantly stable.

I don't have a string opinion here, but it indeed might help with nailing down the fiddly Error type, for example!

@sfackler
Copy link
Member

sfackler commented Dec 4, 2020

I don't think there's any prior art for this kind of fallible collection (even the result propagation impls only fail due to an inner Err rather than a size mismatch). The "collect-to-result" impl is already a bit undiscoverable, and I'd worry that a similar impl would be even more awkward to work with since you'd potentially need to explicitly name the error type chosen.

/// ```
#[inline]
#[unstable(reason = "new API", issue = "none", feature = "iter_collect_array")]
fn collect_array<const N: usize>(self) -> Option<[Self::Item; N]>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pondering: given that this will potentially collect only a prefix, should it be &mut self? The Option is enough to know whether the iterator was exhausted in the call, so one could use .collect_array::<4>() in a loop...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I misunderstood initially... You want to have "exact N" semantics, but you want &mut self at the same time?

This is interesting!

The problem I am seeing here is that, to get exactly N, you'll need to fetch N + 1 elements, so one would be dropped on the floor or stuffed into the Err variant somehow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that it.collect_array::<3>() would be equivalent to try { [it.next()?, it.next()?, it.next()?] }. So you'd get None if the length is shorter than the number requested, but if there's more you'd be able to call again to get more of them.

(Maybe with a rename like sfackler mentions -- this is reminding me of as_chunks and array_chunks on slices, so maybe something like that?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scottmcm for iterators, self and &mut self is the same thing - you can always use .by_ref()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WaffleLapkin Conceptually yes. But in implementation not exactly -- see all these benchmarks for example.

And if this doesn't exhaust the iterator, then it should be &mut self for consistency with other iterator methods that don't necessarily exhaust the iterator (like all and any, which are also &mut self).

@sfackler
Copy link
Member

sfackler commented Dec 4, 2020

One option to avoid dealing with the "fallible collect" question could be to instead treat this as fn next_n<const N: usize>(&mut self) -> Option<[Self::Item; N]>, to make it more like a variant of next than a variant of collect.

@matklad
Copy link
Member Author

matklad commented Dec 4, 2020

Answering both #79659 (comment) and #79659 (comment): I think we should do what itertool does.

It has (modernizing API) .next_array<const N: usize>(&mut self) -> Option<[T; N]> and .collect_array<const N: usize>(self) -> Option<[T; N]>, where next gets the first N element and fails only for short iterators, while collect checks that there are exactly N elements.

It just seems wrong for .collect to drop excess elements on the floor, it should ensure that the number of elements matches exactly, just like TryFrom<Vec<T>> for [T; N].

To be clear, I am still not sure whether .collect_array or generic .collect is best for collect API, I am talking only about exact N semantcis here.

@matklad
Copy link
Member Author

matklad commented Dec 4, 2020

Urgh, the previous comment actually referes to #69985, as I am confusing the two (didn't have my morning pu'er yet :(). To clarify:

@CryZe
Copy link
Contributor

CryZe commented Dec 4, 2020

Maybe it might make sense to return some kind of Result that lets you decide how you want to deal with the overflowing case, so Result<[T; N], [T; N]> or so with maybe some convenience methods for unwrapping either Ok or Err (maybe as a general method on Result<T, T>, seemingly people want such a method anyway: https://twitter.com/djco/status/1334535545905672192 )

@matklad
Copy link
Member Author

matklad commented Dec 14, 2020

Attempting to summarize discussion here:

  • we need both "take the first n elements" and "check that there are exactly n elements" methods.

  • the first one should clearly be a fn next_n<const: N: usize>(&mut self) -> ??? dedicated method.

  • for the latter, we have to choices:

    • a dedicated collect_array method
    • a collect::<Result<[T; N], _>>() API

    collect would be more composable (Iterator<Item=io::Result<String>> -> io::Result<Result<[String; 2], CollectArrayError>>), but it would also be absolutely undiscoverable.

Implementation wise, we def should wait for core::collections::ArrayVec. We might also need to wait for that for stabilization, as the perfect signature for next_n would be

fn next_n<const N: usize>(&mut self) -> Result<[T; N], ArrayVec<T, N - 1>>;

I suggest the following plan for this PR:

  • repurpose it from "exactly N collect" to "at least N next_n"
  • land with -> Option<[T; N]> signature with the understanding that we'd revise the return type once ArrayVec work lands
  • postpone (close) until ArrayVec or equivalent is avaiable

@scottmcm
Copy link
Member

scottmcm commented Dec 16, 2020

FWIW, I just had a situation where I wanted exactly next_n in a loop -- trying to do a simd version of Sum<f32> following bluss's idea. I tried at least 4 different approaches, and they all optimized poorly even on a copied slice iter -- even a .size_hint().0 check couldn't make multiple .next()s compile without a check per call.

EDIT: for clarity, I did need partial results for this use, so unlike my other comment I wouldn't have been able to use -> Option<[T; N]>.

A thought if we don't want to wait for ArrayVec -- array::IntoIter would work for this.

EDIT2: That version as

fn next_n<const N: usize>(&mut self) -> Result<[T; N], ArrayVec<T, N - 1>>;

sounds amazing, though -- would be exactly right for the simd-sum.

@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 11, 2021
@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 1, 2021
@the8472
Copy link
Member

the8472 commented Feb 1, 2021

Due to all the similar PRs floating around #81615 should help to centralize design discussion.

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Feb 20, 2021
…ray, r=dtolnay

Add internal `collect_into_array[_unchecked]` to remove duplicate code

Unlike the similar PRs  rust-lang#69985, rust-lang#75644 and rust-lang#79659, this PR only adds private functions and does not propose any new public API. The change is just for the purpose of avoiding duplicate code.

Many array methods already contained the same kind of code and there are still many array related methods to come (e.g. `Iterator::{chunks, map_windows, next_n, ...}`, `[T; N]::{cloned, copied, ...}`, ...) which all basically need this functionality. Writing custom `unsafe` code for each of those doesn't seem like a good idea. I added two functions in this PR (and not just the `unsafe` version) because I already know that I need the `Option`-returning version for `Iterator::map_windows`.

This is closely related to rust-lang#81615. I think that all options listed in that issue can be implemented using the function added in this PR. The only instance where `collect_array_into` might not be general enough is when the caller want to handle incomplete arrays manually. Currently, if `iter` yields fewer than `N` items, `None` is returned and the already yielded items are dropped. But as this is just a private function, it can be made more general in future PRs.

And while this was not the goal, this seems to lead to better assembly for `array::map`: https://rust.godbolt.org/z/75qKTa (CC `@JulianKnodt)`

Let me know what you think :)

CC `@matklad` `@bstrie`
@bors
Copy link
Contributor

bors commented Feb 22, 2021

☔ The latest upstream changes (presumably #81732) made this pull request unmergeable. Please resolve the merge conflicts.

JohnTitor added a commit to JohnTitor/rust that referenced this pull request Feb 22, 2021
…ray, r=dtolnay

Add internal `collect_into_array[_unchecked]` to remove duplicate code

Unlike the similar PRs  rust-lang#69985, rust-lang#75644 and rust-lang#79659, this PR only adds private functions and does not propose any new public API. The change is just for the purpose of avoiding duplicate code.

Many array methods already contained the same kind of code and there are still many array related methods to come (e.g. `Iterator::{chunks, map_windows, next_n, ...}`, `[T; N]::{cloned, copied, ...}`, ...) which all basically need this functionality. Writing custom `unsafe` code for each of those doesn't seem like a good idea. I added two functions in this PR (and not just the `unsafe` version) because I already know that I need the `Option`-returning version for `Iterator::map_windows`.

This is closely related to rust-lang#81615. I think that all options listed in that issue can be implemented using the function added in this PR. The only instance where `collect_array_into` might not be general enough is when the caller want to handle incomplete arrays manually. Currently, if `iter` yields fewer than `N` items, `None` is returned and the already yielded items are dropped. But as this is just a private function, it can be made more general in future PRs.

And while this was not the goal, this seems to lead to better assembly for `array::map`: https://rust.godbolt.org/z/75qKTa (CC ``@JulianKnodt)``

Let me know what you think :)

CC ``@matklad`` ``@bstrie``
@bstrie
Copy link
Contributor

bstrie commented Feb 22, 2021

I suspect that this PR wants to become a thin wrapper around the internal API that landed in #82098 , yes?

@crlf0710
Copy link
Member

Triage: what's next steps here?

@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 29, 2021
@matklad
Copy link
Member Author

matklad commented Apr 2, 2021

Urgh, seems like I dropped the ball on this one. I feel that the plan outlined in #79659 (comment) is still valid, but I doubt that I'll pursue it to completion soon. So optimistically closing!

@matklad matklad closed this Apr 2, 2021
@ebenali
Copy link

ebenali commented Feb 1, 2024

and... ?

all the little features that I always end up needing and assuming rust supports have closed down PRs...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-const-generics Area: const generics (parameters and arguments) C-feature-request Category: A feature request, i.e: not implemented / a PR. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.