Add Iterator::collect_array method #79659

matklad · 2020-12-03T11:09:18Z

With const-generics, we can now collect exactly N elements out of
the iterator.

Very similar API existed in the itertools crate for quite some time:

https://docs.rs/itertools/0.9.0/itertools/trait.Itertools.html#method.collect_tuple

This needs more thorough tests before merging (esp around Drop & panic), I'll add them if this API as a whole feels like a good idea.

rust-highfive · 2020-12-03T11:09:21Z

r? @sfackler

(rust-highfive has picked a reviewer for you, use r? to override)

matklad · 2020-12-03T11:09:59Z

library/core/src/iter/traits/iterator.rs

+    where
+        Self: Sized,
+    {
+        struct Buf<T, const N: usize> {


Do we have this type somewhere already?

I guess there is one in [T; N]::map() method

Looks like there are more copies in other PRs, like #79451.

Maybe it'd be worth waiting for #75644? Then this might essentially just be [T; N]::try_generate(|_| it.next())...

Seems like we need core::collections::array_vec already :D

There's an item for the next libs meeting to talk about that https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/PRs.20need.20a.20decision.20on.20.60try_generate.60.20or.20.60ArrayVec.60.20or.20.2E.2E.2E/near/218779392 :)

I'm currently writing Iterator::array_chunks and I have +- the same code :)

(though it's a bit more convoluted since DoubleEndedItetator impl requires an ability to push_back)

Unfortunately we rarely get to our needs decision items 🙃

If we keep tabs on these unwind-safe-partially-initialized-array types we're introducing I'd be happy not to block any of these PRs individually on them and run a pass over later to replace them with something more general.

With const-generics, we can now collect exactly `N` elements out of the iterator. Very similar API existed in the itertools crate for quite some time: https://docs.rs/itertools/0.9.0/itertools/trait.Itertools.html#method.collect_tuple

lcnr · 2020-12-03T11:39:33Z

is there a reason to prefer this over a more general FromIterator impl for arrays?

Do you intend to keep this as unstable until we are comfortable adding the impl as that is impl stable, similar to array::IntoIter::new?

matklad · 2020-12-03T11:58:25Z

is there a reason to prefer this over a more general FromIterator impl for arrays?

FromIterator can't file, this method returns an Option (and there's also a can-of-worms question of whether it should return a Result).

Do you intend to keep this as unstable until we are comfortable adding the impl as that is impl stable, similar to array::IntoIter::new?

I think I don't understand the question, could you rephrase it? Shooting in the dark, yes, I expect this to be unstable for quite some time, both because it uses const-generics, and because I think the API itself needs to bake in for some times. However, long term we absolutely should have something like this on stable, in one form or another.

lcnr · 2020-12-03T12:14:14Z

We can add FromIterator for Result<[T; N]; Err> or FromIterator for Option<[T; N]> just fine and have a PR open which implements this: #69985

I think I don't understand the question, could you rephrase it?

My second question was based on my understanding that FromIterator for Option<[T; N]> would be preferable to a separate method. If we do want to add a FromIterator impl here I do not see the use of adding collect_array as well but I still think it would be useful to add an unstable method until FromIterator can be landed, because implementing that trait would be instantly stable.

matklad · 2020-12-03T12:39:19Z

Thanks for clarifying, I indeed misunderstood the question. I'll continue discussion in that PR to not split it over many threads.

matklad · 2020-12-03T12:43:02Z

I still think it would be useful to add an unstable method until FromIterator can be landed, because implementing that trait would be instantly stable.

I don't have a string opinion here, but it indeed might help with nailing down the fiddly Error type, for example!

sfackler · 2020-12-04T00:59:42Z

I don't think there's any prior art for this kind of fallible collection (even the result propagation impls only fail due to an inner Err rather than a size mismatch). The "collect-to-result" impl is already a bit undiscoverable, and I'd worry that a similar impl would be even more awkward to work with since you'd potentially need to explicitly name the error type chosen.

scottmcm · 2020-12-04T01:42:49Z

library/core/src/iter/traits/iterator.rs

+    /// ```
+    #[inline]
+    #[unstable(reason = "new API", issue = "none", feature = "iter_collect_array")]
+    fn collect_array<const N: usize>(self) -> Option<[Self::Item; N]>


pondering: given that this will potentially collect only a prefix, should it be &mut self? The Option is enough to know whether the iterator was exhausted in the call, so one could use .collect_array::<4>() in a loop...

Ah, I think I misunderstood initially... You want to have "exact N" semantics, but you want &mut self at the same time?

This is interesting!

The problem I am seeing here is that, to get exactly N, you'll need to fetch N + 1 elements, so one would be dropped on the floor or stuffed into the Err variant somehow.

I was thinking that it.collect_array::<3>() would be equivalent to try { [it.next()?, it.next()?, it.next()?] }. So you'd get None if the length is shorter than the number requested, but if there's more you'd be able to call again to get more of them.

(Maybe with a rename like sfackler mentions -- this is reminding me of as_chunks and array_chunks on slices, so maybe something like that?)

@scottmcm for iterators, self and &mut self is the same thing - you can always use .by_ref()

@WaffleLapkin Conceptually yes. But in implementation not exactly -- see all these benchmarks for example.

And if this doesn't exhaust the iterator, then it should be &mut self for consistency with other iterator methods that don't necessarily exhaust the iterator (like all and any, which are also &mut self).

sfackler · 2020-12-04T02:00:55Z

One option to avoid dealing with the "fallible collect" question could be to instead treat this as fn next_n<const N: usize>(&mut self) -> Option<[Self::Item; N]>, to make it more like a variant of next than a variant of collect.

matklad · 2020-12-04T08:29:32Z

Answering both #79659 (comment) and #79659 (comment): I think we should do what itertool does.

It has (modernizing API) .next_array<const N: usize>(&mut self) -> Option<[T; N]> and .collect_array<const N: usize>(self) -> Option<[T; N]>, where next gets the first N element and fails only for short iterators, while collect checks that there are exactly N elements.

It just seems wrong for .collect to drop excess elements on the floor, it should ensure that the number of elements matches exactly, just like TryFrom<Vec<T>> for [T; N].

To be clear, I am still not sure whether .collect_array or generic .collect is best for collect API, I am talking only about exact N semantcis here.

matklad · 2020-12-04T08:37:16Z

Urgh, the previous comment actually referes to #69985, as I am confusing the two (didn't have my morning pu'er yet :(). To clarify:

this PR, Add Iterator::collect_array method #79659, implements "exactly N" semantics
the Add FromIterator impl for [T; N] #69985 implements "at least N" semantics (unless I am misreading code), and I think this would be confusing.

CryZe · 2020-12-04T09:00:55Z

Maybe it might make sense to return some kind of Result that lets you decide how you want to deal with the overflowing case, so Result<[T; N], [T; N]> or so with maybe some convenience methods for unwrapping either Ok or Err (maybe as a general method on Result<T, T>, seemingly people want such a method anyway: https://twitter.com/djco/status/1334535545905672192 )

matklad · 2020-12-14T13:30:27Z

Attempting to summarize discussion here:

we need both "take the first n elements" and "check that there are exactly n elements" methods.
the first one should clearly be a fn next_n<const: N: usize>(&mut self) -> ??? dedicated method.
for the latter, we have to choices:
- a dedicated collect_array method
- a collect::<Result<[T; N], _>>() API
collect would be more composable (Iterator<Item=io::Result<String>> -> io::Result<Result<[String; 2], CollectArrayError>>), but it would also be absolutely undiscoverable.

Implementation wise, we def should wait for core::collections::ArrayVec. We might also need to wait for that for stabilization, as the perfect signature for next_n would be

fn next_n<const N: usize>(&mut self) -> Result<[T; N], ArrayVec<T, N - 1>>;

I suggest the following plan for this PR:

repurpose it from "exactly N collect" to "at least N next_n"
land with -> Option<[T; N]> signature with the understanding that we'd revise the return type once ArrayVec work lands
postpone (close) until ArrayVec or equivalent is avaiable

scottmcm · 2020-12-16T07:06:35Z

FWIW, I just had a situation where I wanted exactly next_n in a loop -- trying to do a simd version of Sum<f32> following bluss's idea. I tried at least 4 different approaches, and they all optimized poorly even on a copied slice iter -- even a .size_hint().0 check couldn't make multiple .next()s compile without a check per call.

EDIT: for clarity, I did need partial results for this use, so unlike my other comment I wouldn't have been able to use -> Option<[T; N]>.

A thought if we don't want to wait for ArrayVec -- array::IntoIter would work for this.

EDIT2: That version as

fn next_n<const N: usize>(&mut self) -> Result<[T; N], ArrayVec<T, N - 1>>;

sounds amazing, though -- would be exactly right for the simd-sum.

the8472 · 2021-02-01T22:00:50Z

Due to all the similar PRs floating around #81615 should help to centralize design discussion.

…ray, r=dtolnay Add internal `collect_into_array[_unchecked]` to remove duplicate code Unlike the similar PRs rust-lang#69985, rust-lang#75644 and rust-lang#79659, this PR only adds private functions and does not propose any new public API. The change is just for the purpose of avoiding duplicate code. Many array methods already contained the same kind of code and there are still many array related methods to come (e.g. `Iterator::{chunks, map_windows, next_n, ...}`, `[T; N]::{cloned, copied, ...}`, ...) which all basically need this functionality. Writing custom `unsafe` code for each of those doesn't seem like a good idea. I added two functions in this PR (and not just the `unsafe` version) because I already know that I need the `Option`-returning version for `Iterator::map_windows`. This is closely related to rust-lang#81615. I think that all options listed in that issue can be implemented using the function added in this PR. The only instance where `collect_array_into` might not be general enough is when the caller want to handle incomplete arrays manually. Currently, if `iter` yields fewer than `N` items, `None` is returned and the already yielded items are dropped. But as this is just a private function, it can be made more general in future PRs. And while this was not the goal, this seems to lead to better assembly for `array::map`: https://rust.godbolt.org/z/75qKTa (CC `@JulianKnodt)` Let me know what you think :) CC `@matklad` `@bstrie`

bors · 2021-02-22T07:00:09Z

☔ The latest upstream changes (presumably #81732) made this pull request unmergeable. Please resolve the merge conflicts.

…ray, r=dtolnay Add internal `collect_into_array[_unchecked]` to remove duplicate code Unlike the similar PRs rust-lang#69985, rust-lang#75644 and rust-lang#79659, this PR only adds private functions and does not propose any new public API. The change is just for the purpose of avoiding duplicate code. Many array methods already contained the same kind of code and there are still many array related methods to come (e.g. `Iterator::{chunks, map_windows, next_n, ...}`, `[T; N]::{cloned, copied, ...}`, ...) which all basically need this functionality. Writing custom `unsafe` code for each of those doesn't seem like a good idea. I added two functions in this PR (and not just the `unsafe` version) because I already know that I need the `Option`-returning version for `Iterator::map_windows`. This is closely related to rust-lang#81615. I think that all options listed in that issue can be implemented using the function added in this PR. The only instance where `collect_array_into` might not be general enough is when the caller want to handle incomplete arrays manually. Currently, if `iter` yields fewer than `N` items, `None` is returned and the already yielded items are dropped. But as this is just a private function, it can be made more general in future PRs. And while this was not the goal, this seems to lead to better assembly for `array::map`: https://rust.godbolt.org/z/75qKTa (CC ``@JulianKnodt)`` Let me know what you think :) CC ``@matklad`` ``@bstrie``

bstrie · 2021-02-22T20:22:40Z

I suspect that this PR wants to become a thin wrapper around the internal API that landed in #82098 , yes?

crlf0710 · 2021-03-12T16:54:59Z

Triage: what's next steps here?

matklad · 2021-04-02T12:22:37Z

Urgh, seems like I dropped the ball on this one. I feel that the plan outlined in #79659 (comment) is still valid, but I doubt that I'll pursue it to completion soon. So optimistically closing!

ebenali · 2024-02-01T00:30:15Z

and... ?

all the little features that I always end up needing and assuming rust supports have closed down PRs...

matklad added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-feature-request Category: A feature request, i.e: not implemented / a PR. A-const-generics Area: const generics (parameters and arguments) labels Dec 3, 2020

rust-highfive assigned sfackler Dec 3, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 3, 2020

matklad commented Dec 3, 2020

View reviewed changes

matklad force-pushed the collect_array branch 3 times, most recently from 9cf42aa to 7db95bf Compare December 3, 2020 11:20

Add Iterator::collect_array method

84bc38a

With const-generics, we can now collect exactly `N` elements out of the iterator. Very similar API existed in the itertools crate for quite some time: https://docs.rs/itertools/0.9.0/itertools/trait.Itertools.html#method.collect_tuple

matklad force-pushed the collect_array branch from 7db95bf to 84bc38a Compare December 3, 2020 11:22

matklad mentioned this pull request Dec 3, 2020

Tracking Issue for str_split_once #74773

Closed

4 tasks

matklad mentioned this pull request Dec 3, 2020

Add FromIterator impl for [T; N] #69985

Closed

scottmcm reviewed Dec 4, 2020

View reviewed changes

JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 11, 2021

c410-f3r mentioned this pull request Feb 1, 2021

Add 'core::array::from_fn' and 'core::array::try_from_fn' #75644

Merged

JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 1, 2021

the8472 mentioned this pull request Feb 1, 2021

Provide a means of turning iterators into fixed-size arrays #81615

Open

LukasKalbertodt mentioned this pull request Feb 14, 2021

Add internal collect_into_array[_unchecked] to remove duplicate code #82098

Merged

JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 29, 2021

matklad closed this Apr 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Iterator::collect_array method #79659

Add Iterator::collect_array method #79659

matklad commented Dec 3, 2020

rust-highfive commented Dec 3, 2020

matklad Dec 3, 2020

MikailBag Dec 3, 2020

scottmcm Dec 4, 2020

matklad Dec 4, 2020

scottmcm Dec 4, 2020

WaffleLapkin Dec 16, 2020

KodrAus Feb 12, 2021

lcnr commented Dec 3, 2020

matklad commented Dec 3, 2020

lcnr commented Dec 3, 2020

matklad commented Dec 3, 2020

matklad commented Dec 3, 2020

sfackler commented Dec 4, 2020

scottmcm Dec 4, 2020

matklad Dec 4, 2020

scottmcm Dec 4, 2020

WaffleLapkin Dec 16, 2020

scottmcm Feb 12, 2021

sfackler commented Dec 4, 2020

matklad commented Dec 4, 2020 •

edited

Loading

matklad commented Dec 4, 2020

CryZe commented Dec 4, 2020 •

edited

Loading

matklad commented Dec 14, 2020

scottmcm commented Dec 16, 2020 •

edited

Loading

the8472 commented Feb 1, 2021

bors commented Feb 22, 2021

bstrie commented Feb 22, 2021 •

edited

Loading

crlf0710 commented Mar 12, 2021

matklad commented Apr 2, 2021

ebenali commented Feb 1, 2024

Add Iterator::collect_array method #79659

Add Iterator::collect_array method #79659

Conversation

matklad commented Dec 3, 2020

rust-highfive commented Dec 3, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lcnr commented Dec 3, 2020

matklad commented Dec 3, 2020

lcnr commented Dec 3, 2020

matklad commented Dec 3, 2020

matklad commented Dec 3, 2020

sfackler commented Dec 4, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sfackler commented Dec 4, 2020

matklad commented Dec 4, 2020 • edited Loading

matklad commented Dec 4, 2020

CryZe commented Dec 4, 2020 • edited Loading

matklad commented Dec 14, 2020

scottmcm commented Dec 16, 2020 • edited Loading

the8472 commented Feb 1, 2021

bors commented Feb 22, 2021

bstrie commented Feb 22, 2021 • edited Loading

crlf0710 commented Mar 12, 2021

matklad commented Apr 2, 2021

ebenali commented Feb 1, 2024

matklad commented Dec 4, 2020 •

edited

Loading

CryZe commented Dec 4, 2020 •

edited

Loading

scottmcm commented Dec 16, 2020 •

edited

Loading

bstrie commented Feb 22, 2021 •

edited

Loading