
Track execute() and enqueue() tasks separately from scheduled tasks. #2645

Merged

Conversation


@gmilos gmilos commented Feb 12, 2024

Motivation:

SelectableEventLoop uses ScheduledTask to track tasks scheduled for some specific point in time, but also tasks that are supposed to be executed immediately.

This causes performance issues when the number of immediate tasks grows, as they need to be kept sorted in the _scheduledTasks PriorityQueue. Additionally, that data structure is protected by lock(s), which causes lock contention from other threads.

Finally, immediate tasks differ from future scheduled tasks in that they don't have a failFn to handle destruction at the end of SelectableEventLoop.run() execution.

Modifications:

  • Modify ScheduledTask to only track closure tasks and require failFn to be provided.
  • Retain UnderlyingTask for tracking either a closure or an ErasedUnownedJob for async-await tasks.
  • Add SelectableEventLoop._immediateTasks to track immediate tasks (collection of UnderlyingTask-s).
  • Split production / consumption of the tasks to match the data structure changes described above.
  • Modify the defer block of SelectableEventLoop.run() to execute immediate tasks, instead of dropping them silently.
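The shape of the split can be sketched roughly like this. This is a hypothetical, simplified illustration (TaskQueues, schedule, and execute are stand-in names, not the real NIO API; the actual SelectableEventLoop uses NIO's PriorityQueue, a Deque, and locking):

```swift
// Hypothetical sketch of keeping immediate tasks out of the priority queue.
struct TaskQueues {
    // Scheduled tasks keep deadline order (stand-in for a binary heap).
    var scheduledTasks: [(deadline: Int, task: () -> Void)] = []
    // Immediate tasks only need FIFO order: a flat collection avoids the
    // O(log n) heap maintenance and shortens the time spent under the lock.
    var immediateTasks: [() -> Void] = []

    mutating func schedule(deadline: Int, _ task: @escaping () -> Void) {
        scheduledTasks.append((deadline: deadline, task: task))
        scheduledTasks.sort { $0.deadline < $1.deadline } // heap sift-up stand-in
    }

    mutating func execute(_ task: @escaping () -> Void) {
        immediateTasks.append(task) // O(1) append, no ordering work
    }
}
```

The key design point is that execute() and enqueue() callers never pay for deadline ordering they don't need.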

Result:

Performance of workloads relying on queuing lots of tasks gets better.


gmilos commented Feb 12, 2024

The test failure is ... because the number of allocations was lower than expected. By about 1k. That's a good thing.

17:35:51  ++ assert_greater_than 155002 156050
17:35:51  ++ [[ ! 155002 -gt 156050 ]]

And ... I see the fix just got merged: e4102ae#diff-660361849e6074a975e1787c5836a6138eef55096f2b80fb76d5da0af73b48d5R40

@gmilos gmilos requested a review from Lukasa February 12, 2024 18:57
@gmilos gmilos marked this pull request as ready for review February 12, 2024 18:57

gmilos commented Feb 12, 2024

@weissi this is the change we discussed in person.


gmilos commented Feb 12, 2024

@swift-server-bot test this please


@Lukasa Lukasa left a comment


Sources/NIOPosix/SelectableEventLoop.swift (review comment, resolved)
// therefore the best course of action is to run them.
for task in immediateTasksCopy {
self.run(task)
}
Contributor:

This is a fairly substantial behavioural change. Previously these tasks got dropped. Normally I’d say the logic here is good but we should prepare ourselves for a few subtle bugs popping out of this.

@gmilos gmilos Feb 12, 2024

Yes, true. But without this, applications would have their Tasks/functions never executed at all. I don't think any behaviour can be worse than that. We haven't noticed because, before Tasks, this path wasn't frequently exercised. But yes, there could be some unintended consequences.

Member:

Agreed it's a substantial change but it's definitely a good one. I think this will actually make tearing down an EventLoop much easier than it was before this patch. Previously we just couldn't implement this sensibly, now we can.

But of course, it'll likely trigger a few subtle changes here or there but I can't see this changing anything that was reliably working fine for the worse.

Sources/NIOPosix/SelectableEventLoop.swift (two review comments, resolved)

weissi commented Feb 12, 2024

@swift-nio-bot test perf please


weissi commented Feb 12, 2024

@swift-nio-bot perf test please


weissi commented Feb 12, 2024

@gmilos nice!

asyncwriter_single_writes_1M_times | 1.464787299 | 1.596832264 | current | -8%


8% better in one of the perf tests that actually uses async stuff. That's expected to be one of the few ones because most perf tests won't use execute. But that one will!


Weirdly, it claims that this one is 8% worse

schedule_and_run_100k_tasks | 0.252803345 | 0.233980813 | previous | 8%



gmilos commented Feb 13, 2024

@weissi how much signal is in the performance tests? Looks like there are a few that did regress, including:

lock_8_threads_10M_ops | 0.957658591 | 0.873560162 | previous | 9%


gmilos commented Feb 13, 2024

And another:

future_whenallcomplete_100k_deferred_on_loop | 0.084619949 | 0.08085791 | previous | 4%

There seems to be a pattern where the tests with large queues degraded in performance. I'll take a look.

@gmilos gmilos force-pushed the gm-selectableeventloop-job-queueing-optimisations branch from daf6e3d to 1b7ad7d Compare February 13, 2024 10:04

gmilos commented Feb 13, 2024

@swift-nio-bot perf test please


gmilos commented Feb 13, 2024

I reviewed the test responsible for:

lock_8_threads_10M_ops | 0.957658591 | 0.873560162 | previous | 9%

This doesn't exercise my code change in any meaningful way. So the degradation must be due to an unrelated difference. I'm reviewing the test @weissi pointed out:

schedule_and_run_100k_tasks | 0.252803345 | 0.233980813 | previous | 8%

This may be down to me.


gmilos commented Feb 13, 2024

I reviewed:

schedule_and_run_100k_tasks | 0.252803345 | 0.233980813 | previous | 8%

In the current version of the PR (which isn't the same as what was perf tested before), I see a ~2x improvement. I had to raise the number of repeats by 100x to make it human scale. I also recorded samples, which point to better ARC performance with the simpler task type stored in ScheduledTask.


gmilos commented Feb 13, 2024

@swift-server-bot test this please


weissi commented Feb 13, 2024

@swift-server-bot test perf please


weissi commented Feb 13, 2024

@swift-server-bot perf test please


weissi commented Feb 13, 2024

@swift-server-bot add to allowlist


weissi commented Feb 13, 2024

@Lukasa the perf tests are broken: Both of the results are from 3rd Jan (run id 156). Don't know why it still posts them as if they were current

@gmilos gmilos force-pushed the gm-selectableeventloop-job-queueing-optimisations branch from e05c0c7 to a51573a Compare February 13, 2024 15:41

gmilos commented Feb 13, 2024

I did a manual performance test run on my desktop. I ran the perf tests on main and on my branch. They show major wins:

future_whenallsucceed_10k_deferred_off_loop	0.011824291	0.0222875	current	-46%
future_whenallcomplete_10k_deferred_off_loop	0.005605708	0.016276791	current	-65%
schedule_and_run_100k_tasks	0.108089916	0.200684584	current	-46%
execute_100k_tasks	0.014859583	0.153192833	current	-90%

There are a couple of regressions too:
1.

lock_8_threads_10M_ops	0.215040375	0.18947725	previous	13%

but it looks like the lock tests are noisy (some of the others go faster, so it's probably just natural variability in the OS locking)

schedule_100k_tasks	0.037446041	0.033186875	previous	12%

When I started looking into the problem, I realised the test is badly implemented. The test prewarms the heap with 100k tasks, runs them to completion, and hands over to the measured test run. But the measured test loop then runs 10x, placing 10x as many tasks into the heap as the prewarm did. We're therefore testing the efficiency of heap doubling and not just the scheduling. I was able to confirm this is responsible for the test instability by running it with a much bigger set of tasks:

measuring: schedule_10000k_tasks: 3.708622625, 4.334766542, 3.842835292, 4.574104792, 7.119135958, 4.038103833, 4.324497875, 4.464178167, 4.49384075, 4.640537542,

The runs range from 3.7 to 7.1.

I'll try to fix this test separately from this PR.
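The arithmetic behind the flaw is simple enough to show directly (numbers taken from the comment above; the variable names are illustrative, not from the actual performance test):

```swift
// Back-of-envelope illustration of the prewarm mismatch: the prewarm
// sizes the heap for one batch, but the measured loop inserts ten.
let prewarmTasks = 100_000
let measuredLoopIterations = 10
let tasksInsertedDuringMeasurement = prewarmTasks * measuredLoopIterations
// 1_000_000 tasks go into a heap prewarmed for 100_000, so the measured
// time includes repeated storage doubling, not just scheduling cost.
```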

I'm also attaching the full manual perf test results.

With all of this, @Lukasa the PR is ready for re-review.

swift-nio-perf-test-comparison-82fd942745b11ccebbf0db3e9e4bf150b60e5e44-to-e05c0c73d658ea6c4a38df866294f00941479b55.md


gmilos commented Feb 13, 2024

@swift-server-bot test this please

2 similar comments

gmilos commented Feb 13, 2024

@swift-server-bot test this please


gmilos commented Feb 13, 2024

@swift-server-bot test this please


weissi commented Feb 13, 2024

I did a manual performance test run on my desktop. I ran the perf tests on main and on my branch. They show major wins:

future_whenallsucceed_10k_deferred_off_loop	0.011824291	0.0222875	current	-46%
future_whenallcomplete_10k_deferred_off_loop	0.005605708	0.016276791	current	-65%
schedule_and_run_100k_tasks	0.108089916	0.200684584	current	-46%
execute_100k_tasks	0.014859583	0.153192833	current	-90%

nice!!


There are a couple of regressions too: 1.

lock_8_threads_10M_ops 0.215040375 0.18947725 previous 13%


but it looks like the lock tests are noisy (some of the others go faster, so it's probably just natural variability in the OS locking)

yes, ignore the locks one for this. They depend on the contention reached.

schedule_100k_tasks	0.037446041	0.033186875	previous	12%

When I started looking into the problem, I realised the test is badly implemented. The test prewarms the heap with 100k tasks, runs them to completion, and hands over to the measured test run. But the measured test loop then runs 10x, placing 10x as many tasks into the heap as the prewarm did. We're therefore testing the efficiency of heap doubling and not just the scheduling. I was able to confirm this is responsible for the test instability by running it with a much bigger set of tasks:

measuring: schedule_10000k_tasks: 3.708622625, 4.334766542, 3.842835292, 4.574104792, 7.119135958, 4.038103833, 4.324497875, 4.464178167, 4.49384075, 4.640537542,

The runs range from 3.7 to 7.1.

I'll try to fix this test separately from this PR.

awesome, thank you!!

Co-authored-by: Johannes Weiss <johannesweiss@apple.com>

gmilos commented Feb 14, 2024

@weissi that's all the feedback handled. lmk if there is any more.


gmilos commented Feb 14, 2024

@swift-server-bot test this please

1 similar comment

gmilos commented Feb 14, 2024

@swift-server-bot test this please


@weissi weissi left a comment


Thank you! That looks good to me.

@Lukasa Lukasa added the semver/minor (Adds new public API.) label and removed the semver/patch (No public API change.) label Mar 1, 2024

Lukasa commented Mar 1, 2024

From a "strict semver" standpoint this is a patch change, but I'm going to take some editorial discretion and mark it "minor". We're making a subtle but observable behavioural change, and while it will probably be fine, I want to make it possible for users to ensure they don't have to deal with both variants, by calling this a minor instead of a patch.


@Lukasa Lukasa left a comment


Generally looking really nice. I have a few small refactoring suggestions, as we're in the space and you've already started with refactors ;)

Sources/NIOPosix/SelectableEventLoop.swift (three review comments, resolved)
}
}

assert(self.tasksCopy.count <= Self.taskCopyBatchSize)
Contributor:

This assertion seems like it is straightforward to break. We can add two tasks in each loop iteration above, but we only check the length once. I think the loop above needs an extra length check.

Contributor Author:

Good catch. Reorganising the loop to check the length of the tasksCopy array at each point we're mutating (adding to) it.

Contributor Author:

Heh, I tried to write a test for this, and realised the code before was fine, but only just. It's because:

  • if _scheduledTasks and _immediateTasks are non-empty, we'd be adding 2 items to the tasksCopy each time round the loop. And taskCopyBatchSize is even, so one test every 2 items is sufficient
  • if either _scheduledTasks or _immediateTasks run out of tasks, we'd be testing the size of tasksCopy once per each addition. This applies even if we just run out when we're terminating the batch.

This wasn't intentional micro-optimisation, so I'm adding the additional length check back to make the code safer against future refactoring (including making the size of the batch odd).
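The invariant being discussed can be sketched hypothetically. The names tasksCopy and taskCopyBatchSize follow the diff; everything else here (Int stand-ins for tasks, removeLast in place of the real queue operations) is simplified and not the actual NIO code:

```swift
// Sketch of a batched drain that checks the length after *every* append,
// so the assert survives future refactors (including an odd batch size).
let taskCopyBatchSize = 4096
var scheduledReady = Array(repeating: 0, count: 10_000) // due scheduled tasks
var immediate = Array(repeating: 1, count: 10_000)      // immediate tasks
var tasksCopy: [Int] = []
tasksCopy.reserveCapacity(taskCopyBatchSize)

outer: while !scheduledReady.isEmpty || !immediate.isEmpty {
    if !scheduledReady.isEmpty {
        tasksCopy.append(scheduledReady.removeLast())
        if tasksCopy.count >= taskCopyBatchSize { break outer }
    }
    if !immediate.isEmpty {
        tasksCopy.append(immediate.removeLast())
        if tasksCopy.count >= taskCopyBatchSize { break outer }
    }
}
assert(tasksCopy.count <= taskCopyBatchSize)
```

With only one check per iteration the invariant holds just by parity (two appends, even batch size); the per-append check removes that hidden coupling.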

Contributor:

Yeah my concern was very much about future refactors 😉

immediateTasksCopy.reserveCapacity(self._immediateTasks.count)
while let immediate = self._immediateTasks.popFirst() {
immediateTasksCopy.append(immediate)
}
Contributor:

Instead of this while let, can we use swap?

Contributor Author:

This is removing tasks from self._immediateTasks and putting them to a different array immediateTasksCopy. So swap isn't applicable, unless there is some other swap I'm not aware of.
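For context, one reading of the swap suggestion is the standard-library `swap(_:_:)`, which exchanges the whole backing storage with an empty container in O(1) rather than popping element by element. Whether that fits here depends on the surrounding locking and ownership; this is a hedged sketch with stand-in values, not the actual NIO code:

```swift
// Swap-based drain: exchange storage wholesale instead of popping in a loop.
var pendingImmediateTasks = [10, 20, 30]   // stand-in for the Deque of tasks
var immediateTasksCopy: [Int] = []
swap(&pendingImmediateTasks, &immediateTasksCopy)
// The original is left empty; the copy now owns all the queued tasks.
```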


@gmilos gmilos requested a review from Lukasa March 4, 2024 16:17
@gmilos gmilos force-pushed the gm-selectableeventloop-job-queueing-optimisations branch from 0ab81f4 to 4fb9f15 Compare March 5, 2024 08:17
@gmilos
Copy link
Contributor Author

gmilos commented Mar 5, 2024

Looks like Deque has different growth properties compared to CircularBuffer. I had to bump some allocation-count budgets to account for that.


gmilos commented Mar 5, 2024



// nextScheduledTaskDeadline is the overall next deadline, but iff there are no more immediate tasks left.
return moreImmediateTasksToConsider ? now : nextScheduledTaskDeadline
Contributor:

Will we ever hit now? We can only get here if both moreImmediateTasksToConsider and moreScheduledTasksToConsider are both false, which should imply we never hit now.

Contributor Author:

Ah, yes. This changed because the batch size checks got moved to each of the if-s, and not the loop overall. I think the statement is harmless, but can be simplified. Checking it carefully.

@gmilos gmilos Mar 5, 2024

The statement was indeed harmless: while it would always return the deadline for the next scheduled task, that deadline is only enacted iff there are no more tasks to run. In all other cases we have to recheck, because more tasks may get queued up. Still, this deserved a cleanup, with now returned if there are any immediate tasks left, plus an associated assert. I also added a test that tries to get immediate tasks stuck (it never reproduced a problem, but it's a useful defence for the future).
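The cleaned-up deadline choice can be sketched as follows. The names approximate those in the diff, but this is a simplified illustration (Int in place of NIODeadline, nil meaning no pending scheduled tasks), not the actual NIO code:

```swift
// Pick the event loop's next wake-up time.
func nextWakeup(now: Int,
                moreImmediateTasksToConsider: Bool,
                nextScheduledTaskDeadline: Int?) -> Int? {
    // Any immediate task left means we must come straight back around.
    if moreImmediateTasksToConsider { return now }
    // Otherwise sleep until the earliest scheduled deadline, if any.
    return nextScheduledTaskDeadline
}
```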

@gmilos gmilos merged commit 5e47077 into apple:main Mar 5, 2024
10 checks passed
Labels
semver/minor Adds new public API.