RFC: Add support for parallel iteration of query results using Rayon. #17

Merged 4 commits on Nov 3, 2021

adamreichold commented Oct 8, 2021

No description provided.

adamreichold commented Oct 8, 2021

Performance does indeed seem rather mixed. The first variant, which uses ParallelIterator::flat_map, beats the serial version as long as there are not too many archetypes:

test query_many_archetypes                     ... bench:      64,626 ns/iter (+/- 3,566)
test query_parallel_many_archetypes            ... bench:      45,171 ns/iter (+/- 10,898)

test query_single_archetype                    ... bench:      64,301 ns/iter (+/- 11,162)
test query_parallel_single_archetype           ... bench:      46,401 ns/iter (+/- 8,104)

test query_very_many_small_archetypes          ... bench:       1,249 ns/iter (+/- 93)
test query_parallel_very_many_small_archetypes ... bench:      32,429 ns/iter (+/- 4,760)

The second variant, which provides an indexed parallel iterator, always loses out to the serial version without tuning

test query_many_archetypes                     ... bench:      63,730 ns/iter (+/- 17,927)
test query_parallel_many_archetypes            ... bench:      93,060 ns/iter (+/- 10,848)

test query_single_archetype                    ... bench:      63,785 ns/iter (+/- 6,693)
test query_parallel_single_archetype           ... bench:      88,996 ns/iter (+/- 11,143)

test query_very_many_small_archetypes          ... bench:       1,238 ns/iter (+/- 314)
test query_parallel_very_many_small_archetypes ... bench:      31,583 ns/iter (+/- 4,252)

but it can be tuned using .with_min_len(1024), which results in

test query_many_archetypes                     ... bench:      63,946 ns/iter (+/- 25,165)
test query_parallel_many_archetypes            ... bench:      55,595 ns/iter (+/- 7,167)

test query_single_archetype                    ... bench:      63,443 ns/iter (+/- 10,203)
test query_parallel_single_archetype           ... bench:      55,543 ns/iter (+/- 11,323)

test query_very_many_small_archetypes          ... bench:       1,231 ns/iter (+/- 127)
test query_parallel_very_many_small_archetypes ... bench:       2,143 ns/iter (+/- 76)

which is a speed-up over the serial version but worse than the first variant. The exception is the case of very many small archetypes, where it is much faster than the first variant yet still slower than the serial version.

(The benchmark is probably unrealistic insofar as parallel queries would only be used if the body of the system is substantial, so that the overhead of iteration itself is less pronounced. But then again, making things slower by mistakenly opting for parallel iteration is probably not good usability either.)

@adamreichold
Owner Author

> Performance seems indeed rather mixed

Things improve considerably with a specialised implementation of Producer::fold_with that avoids the overhead of double-ended iteration. It yields measurable speed-ups over both the serial version and the first variant in all three cases.

test query_many_archetypes                     ... bench:      63,744 ns/iter (+/- 657)
test query_parallel_many_archetypes            ... bench:      22,249 ns/iter (+/- 4,616)

test query_single_archetype                    ... bench:      63,506 ns/iter (+/- 3,409)
test query_parallel_single_archetype           ... bench:      21,612 ns/iter (+/- 3,644)

test query_very_many_small_archetypes          ... bench:       1,225 ns/iter (+/- 54)
test query_parallel_very_many_small_archetypes ... bench:       1,062 ns/iter (+/- 45)

@adamreichold adamreichold changed the title Add support for parallel iteration of query results using Rayon. RFC: Add support for parallel iteration of query results using Rayon. Oct 9, 2021
@adamreichold adamreichold marked this pull request as ready for review October 9, 2021 00:05
@mlange-42
Collaborator

👍

@adamreichold adamreichold merged commit a184536 into main Nov 3, 2021
@adamreichold adamreichold deleted the rayon branch November 3, 2021 15:30