Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add par_extend, partition, unzip, and unzip_into #326

Merged
merged 11 commits into from
May 3, 2017

Conversation

cuviper
Copy link
Member

@cuviper cuviper commented Apr 26, 2017

  • ParallelExtend mirrors std::iter::Extend, with one method par_extend. It works basically like FromParallelIterator, except it adds to an existing container instead of creating a new one. In fact, most of the FromParallelIterators we implement now just create an empty container and then par_extend it.
  • ParallelIterator::partition mirrors Iterator::partition, separating the iterator into two ParallelExtend containers based on the user's predicate function.
  • ParallelIterator::unzip mirrors Iterator::unzip, separating an iterator over (A, B) into two ParallelExtend containers, one with just A and the other with just B.
  • IndexedParallelIterator::unzip_into is analogous to collect_into, writing directly into the target vectors.

@cuviper
Copy link
Member Author

cuviper commented Apr 26, 2017

@nikomatsakis In the wiki you said that Iterator::partition really ought to allow two distinct output types. It wouldn't be very hard for me to do that here if you want.

@cuviper
Copy link
Member Author

cuviper commented Apr 26, 2017

The nightly failures are rust-lang/rust#41479. It does work with nightly-2017-04-21.


fn opt_len(&mut self) -> Option<usize> {
self.base.opt_len()
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one got me thinking. It's fine to leak the source-iterator's index-ness through opt_len() here, and it should result in the optimization we want. However, with an eye on moving that mechanism to specialization, I can't actually implement a true IndexedParallelIterator for this!

I could implement IndexedParallelIterator::drive(), but not with_producer() because of the iterator requirement in Producer. We need to stay in "push" mode here to feed values into multiple consumers, but once we flip to an iterator we're left pulling. And that means we can't support zip et al. on this -- which I wouldn't expect in a par_extend() implementation anyway, but with specialization one could encounter that.

Maybe this is a distinction which deserves to split up the traits again. An IndexedParallelIterator that can drive(), enumerate(), etc., and a subset ProducableParallelIterator that supports with_producer(), zip(), etc.

Ugh.

(Incidentally, this "push vs. pull" is basically why Iterator::unzip requires Extend instead of FromIterator.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. It's also worth considering whether using specialization is worth it versus the current opt_len() trick.

@cuviper
Copy link
Member Author

cuviper commented Apr 27, 2017

There's also Itertools::partition_map we could emulate, which can map to different item types and allows different collection types. That would be yet another full implementation, where unzip.rs and partition.rs are already very similar... maybe they can be merged and abstracted so that only the Folder::consume is really different. Kind of like we used to with MapOp, but this code is trickier and also hidden from the API, so it's more favorable to sharing.

@cuviper
Copy link
Member Author

cuviper commented Apr 27, 2017

I merged that common functionality, generalized partition, and added partition_map.

@cuviper
Copy link
Member Author

cuviper commented Apr 28, 2017

Added FromParallelIterator<&char> for String since Rust 1.17 now has it -- rust-lang/rust#40028.
(not directly related to this PR, but it touches the same code as extend, so consider it a bonus...)

Copy link
Member

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome. I had plumb forgot that we still had a few things unchecked (and that they included such generally useful things as unzip() and partition()!). We should open issues for the remainder (which just appears to be scan() and cycle() -- scan() may just not make sense, cycle() seems achievable though).



/// Unzip an `IndexedParallelIterator` into two arbitrary `Consumer`s.
pub fn unzip_indexed<I, A, B, CA, CB>(pi: I, left: CA, right: CB)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these intended to be Rayon public APIs? (I'd appreciate a comment if not.)

@@ -39,6 +40,26 @@ fn special_extend<I, T>(pi: I, len: usize, v: &mut Vec<T>)
collect.complete();
}

/// Unzips the results of the exact iterator into the specified vectors.
pub fn unzip_into<I, A, B>(mut pi: I, left: &mut Vec<A>, right: &mut Vec<B>)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these intended to be Rayon public APIs? (I'd appreciate a comment if not.)


fn opt_len(&mut self) -> Option<usize> {
self.base.opt_len()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. It's also worth considering whether using specialization is worth it versus the current opt_len() trick.

{
let mut result = None;
{
// Now it's time to find the consumer for non-matching items
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever trick, I will say. I wonder if we can use some internal traits + generics to "package up" this pattern rather than repeating it. It would probably make things less clear though, although it would provide a good place to document what is going on.

@@ -1,15 +1,24 @@
use super::internal::*;
use super::*;

trait UnzipOp<T>: Sync {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think perhaps you were a step ahead of me. Good place for comments, though!

src/iter/mod.rs Outdated
where C: Default + ParallelExtend<Self::Item>,
fn partition<A, B, P>(self, predicate: P) -> (A, B)
where A: Default + ParallelExtend<Self::Item>,
B: Default + ParallelExtend<Self::Item>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay! My only hesitation is that it means that transition to rayon may require more type annotations. e.g., with seq iterators, people can do:

let (x, y): (Vec<_>, _) = iter.partition();

maybe we should add another method for this case? (We could then propose the same method for seq iterators, I suppose). But I don't know what would be a good name.

src/iter/mod.rs Outdated
/// Partitions and maps the items of a parallel iterator into a pair of
/// arbitrary `ParallelExtend` containers. `Either::Left` items go into
/// the first container, and `Either::Right` items go into the second.
fn partition_map<A, B, P, L, R>(self, predicate: P) -> (A, B)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is meant for parity with itertools, I guess? Makes sense. I was wondering if we should link directly to itertools, so we can use the same Either type, but it's problematic that it is at only 0.6 I suppose, and it seems a bit silly to add an itertools dependency just for Either.

(cc @bluss -- as an aside, any reason itertools can't be 1.0?)

@nikomatsakis
Copy link
Member

So the only real question that came up in my review is whether to have partition() return two collections or not. It depends how much we care about compatibility. We could include a note in the docs that our signature is more flexible that the iterator one and hence may require an additional type annotation. I think I'm inclined to do that for now, and just leave it the way you have it.

(I have definitely needed two distinct collection types in the past and been annoyed that I can't have it.)

@cuviper
Copy link
Member Author

cuviper commented Apr 30, 2017

Yeah, I'm ok with the possible ambiguity, since it would only bite at compile time, not a runtime surprise, and it's simple to correct.

I'll add more comments where requested.

@nikomatsakis
Copy link
Member

r=me, after rebase

@cuviper cuviper merged commit 81e3d2c into rayon-rs:master May 3, 2017
@bluss
Copy link
Contributor

bluss commented Aug 4, 2017

Sorry for the very late reply 😄. Itertools is working towards 1.0, it doesn't feel very far off.

@bluss
Copy link
Contributor

bluss commented Aug 4, 2017

That said we don't have a "tracking issue" for it. One reason might be that if std adopts anything from itertools using the same method name, we have a problem.

@cuviper cuviper deleted the extend-unzip-partition branch September 23, 2017 04:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants