
Remove unreliable SchedulingBenchmark #2650

Merged: 4 commits merged into apple:main on Mar 1, 2024


gmilos (Contributor) commented Feb 14, 2024

Motivation:

At the moment, `SchedulingBenchmark` preheats a single EventLoop (EL) with a specified number of tasks. However, the actual performance test runs a number of times, and the EL doesn't get drained between runs.

There are two problems with it:

  • the perf test will trigger task heap doublings, which the preheating is meant to avoid
  • each test run works with a proportionally larger (and therefore deeper) heap, which means the run times grow with each run
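A back-of-the-envelope sketch of both problems. This is a hypothetical model, not NIO's scheduled-task queue: it assumes an array-backed binary heap whose storage doubles when full, that preheating grows the storage to one run's worth of tasks and then empties (as the `.nanoseconds(0)` tasks in the preheat loop quoted in the review would fire right away), and 10 measured runs of 100k tasks each. `HeapModel` and the run count are illustrative only.

```swift
// Hypothetical model of an array-backed binary heap whose storage doubles
// when it runs out of room. This is NOT NIO's actual implementation.
struct HeapModel {
    private(set) var count = 0
    private(set) var capacity = 1
    private(set) var doublings = 0

    // Add `n` elements, doubling capacity (and counting each doubling) as needed.
    mutating func insert(_ n: Int) {
        count += n
        while capacity < count {
            capacity *= 2
            doublings += 1
        }
    }

    // Remove every element but keep the storage, like a drained queue.
    mutating func drain() {
        count = 0
    }

    // Levels in a complete binary heap of `count` elements: floor(log2(count)) + 1.
    var depth: Int {
        count == 0 ? 0 : Int.bitWidth - count.leadingZeroBitCount
    }
}

let numTasks = 100_000
let runs = 10  // assumed number of measured runs

var heap = HeapModel()
heap.insert(numTasks)  // preheat: grow storage outside the measurement...
heap.drain()           // ...then the preheat tasks fire and leave the heap empty
let preheatDoublings = heap.doublings

var depthPerRun: [Int] = []
for _ in 0..<runs {
    heap.insert(numTasks)  // each run's tasks are never drained
    depthPerRun.append(heap.depth)
}
let measuredDoublings = heap.doublings - preheatDoublings
print("doublings during measured runs: \(measuredDoublings)")
print("heap depth after each run: \(depthPerRun)")
```

Under these assumptions the first measured run fits in the preheated storage, but later runs still trigger doublings and operate on an ever-larger heap.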

I proposed a fix that made the benchmark work reliably, but @weissi and then others came to a consensus that we're better off without it.

Modifications:

  • Removed `SchedulingBenchmark` along with its dispatch site.

Result:

  • No schedule-but-don't-run perf tests.

Earlier description (from when the PR was titled "Fix SchedulingBenchmark preheating logic."):

Motivation:

At the moment, `SchedulingBenchmark` preheats a single EL with a specified number of tasks. However, the actual performance test runs a number of times, and the EL doesn't get drained between runs.

There are two problems with it:
* the perf test will trigger task heap doublings, which the preheating is meant to avoid
* each test run will be working with a proportionally larger (and therefore deeper) heap, which means the run times are going to grow with each run

Modifications:

I plumbed the number of runs through to `Benchmark.setUp`, and prepare an ELG with as many ELs as the expected number of runs.

Result:

The performance test will be more reliable.
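A minimal sketch of the idea behind this earlier fix, written without NIO: each EL is modelled as its own queue, one is preheated per expected run, and run *i* gets its own queue, so no run inherits tasks left behind by a previous one. `LoopModel` and the round-robin indexing are hypothetical stand-ins for an ELG with `runs` event loops.

```swift
// Hypothetical stand-in for one event loop's scheduled-task queue.
// It only tracks sizes; nothing here comes from NIO.
final class LoopModel {
    private(set) var pending = 0
    private(set) var capacity = 0

    // Preheat: grow storage to `n` tasks; the tasks then fire, emptying the queue.
    func preheat(_ n: Int) {
        capacity = max(capacity, n)
        pending = 0
    }

    // Schedule `n` tasks that will not fire during the run.
    // Returns true if the queue had to grow beyond its preheated storage.
    func schedule(_ n: Int) -> Bool {
        pending += n
        guard pending > capacity else { return false }
        capacity = pending
        return true
    }
}

let numTasks = 100_000
let runs = 10  // assumed number of measured runs

// One preheated loop per expected run, mirroring the proposed ELG sizing.
let loops = (0..<runs).map { _ -> LoopModel in
    let loop = LoopModel()
    loop.preheat(numTasks)
    return loop
}

var anyGrowth = false
for run in 0..<runs {
    // Round-robin selection: run `run` lands on a loop no earlier run has used.
    let loop = loops[run % loops.count]
    if loop.schedule(numTasks) {
        anyGrowth = true
    }
}
print("queue growth during measured runs: \(anyGrowth)")
print("pending per loop: \(loops.map { $0.pending })")
```

The catch, raised later in the review, is that a real ELG sized this way has one thread per run.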
Review context:

```swift
for _ in 0..<self.numTasks {
    self.loop.scheduleTask(in: .nanoseconds(0)) {
        counter &+= 1
    }
}
```

```swift
func setUp(runs: Int) throws {
    // ...
}
```
Member:

@gmilos instead of introducing the runs everywhere, can we not just use an ELG with just 1 thread? Then it should be fine, right?

Member:

wait, hang on, it's already tied to exactly one EventLoop, `self.loop`

Member:

This whole test is completely busted. We need to either fix it (wait for the scheduled tasks) or just delete this perf test, currently it doesn't add value.

Member:

yeah, here we can see that this is not useful

schedule_100k_tasks 0.063766844 0.105724529 0.07314636890000001 0.012951859063592227

min runtime: 0.063s
mean runtime: 0.073s
max runtime: 0.105s
std deviation: 0.012s

that's way too high of a std dev to be useful
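One way to put a number on "too high" is to divide the std by the mean (the coefficient of variation). The sketch below does that for `schedule_100k_tasks` and two neighbouring task benchmarks, using values copied from the performance report in this thread; the two comparison rows were picked for illustration.

```swift
// Coefficient of variation (std / mean) for three rows of the perf report
// in this thread; values are copied from its "results" table.
struct Row {
    let name: String
    let mean: Double
    let std: Double

    var cv: Double { std / mean }
}

let rows = [
    Row(name: "schedule_100k_tasks", mean: 0.07314636890000001, std: 0.012951859063592227),
    Row(name: "schedule_and_run_100k_tasks", mean: 0.2608068538, std: 0.004253384282073607),
    Row(name: "execute_100k_tasks", mean: 0.1042747825, std: 0.0009121376774441264),
]

for row in rows {
    // Express CV as a percentage rounded to one decimal place.
    let percent = (row.cv * 1000).rounded() / 10
    print("\(row.name): std is \(percent)% of the mean")
}
```

For `schedule_100k_tasks` the std comes out at roughly 18% of the mean, versus under 2% for the other two.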

gmilos (Contributor Author):

@weissi where did you get these test results? #2650 (comment)

It looks like these are still the broken test results (the 3 Jan ones, re-copied), right?

Member:

I'd strongly suggest deleting this test. If we don't delete it, then the only thing we can/should do is spawn 1 EventLoopThread in each test's setUp. Relying on round robin and messing with existing groups (by enqueuing 100k tasks that will never run) is not a good idea, especially if other perf tests might be running on the same loop.

But again: I'd say we should delete the test; I don't think it adds value.

Contributor:

Fine by me.

gmilos (Contributor Author), Feb 27, 2024:

@FranzBusch do you have opinions (it seems you originally added it in #2009)? If you're happy to delete it, we can do that.

Member:

Fine by me as well.

weissi (Member) commented Feb 14, 2024

> [...] runs number of times, and EL doesn't get drained in between the runs.

@gmilos that's the actual issue here. It should be completely drained after each run.

> [...]
> Modifications:
>
> I plumbed through the # of runs to the Benchmark.setUp, and prepare ELG with # of ELs that match the expected number of runs.

That doesn't sound ideal: if it's actually the case that the other loops aren't drained, this will now cause high CPU load and require us to have #runs CPUs available, etc.
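weissi's point that the EL should be completely drained after each run can be modelled with a toy queue. In NIO, `EventLoop.scheduleTask(in:_:)` returns a `Scheduled` value that has a `cancel()` method, so one plausible way to drain would be to cancel those handles once the measured part of a run is over. The sketch below does not use NIO; `ScheduledQueue`, its integer handles, and the task/run counts are hypothetical.

```swift
// Hypothetical toy queue: each scheduled task is identified by an Int handle.
struct ScheduledQueue {
    private var nextID = 0
    private(set) var pending: Set<Int> = []

    // Schedule one task and hand back a handle that can cancel it later.
    mutating func schedule() -> Int {
        let id = nextID
        nextID += 1
        pending.insert(id)
        return id
    }

    // Cancelling removes the task, so it no longer occupies the queue.
    mutating func cancel(_ id: Int) {
        pending.remove(id)
    }
}

let numTasks = 1_000
let runs = 5  // assumed number of measured runs
var queue = ScheduledQueue()
var pendingAtEndOfRun: [Int] = []

for _ in 0..<runs {
    // Measured part: schedule tasks that will not fire, keeping their handles.
    var handles: [Int] = []
    for _ in 0..<numTasks {
        handles.append(queue.schedule())
    }
    pendingAtEndOfRun.append(queue.pending.count)

    // Unmeasured part: drain by cancelling everything this run scheduled.
    for id in handles {
        queue.cancel(id)
    }
}
print("pending at the end of each run: \(pendingAtEndOfRun)")
```

With the drain in place, every run starts from an empty queue, so the queue size seen by each run stays constant instead of growing.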

weissi (Member) commented Feb 14, 2024

@swift-nio-bot test perf please

swift-server-bot commented:

performance report

build id: 156

timestamp: Wed Jan 3 13:45:50 UTC 2024

results

name min max mean std
write_http_headers 0.042907723 0.043165247 0.042973653 9.70556125379205e-05
http_headers_canonical_form 0.10455533 0.10730633 0.10511068750000001 0.000809335616920894
http_headers_canonical_form_trimming_whitespace 0.020678702 0.021188336 0.0207580329 0.00015479947182040933
http_headers_canonical_form_trimming_whitespace_from_short_string 0.018708244 0.01923846 0.0187952384 0.00015906405380027787
http_headers_canonical_form_trimming_whitespace_from_long_string 0.030301067 0.030804568 0.030385388800000003 0.00015088716639322867
bytebuffer_write_12MB_short_string_literals 0.143270983 0.14943897 0.1441855301 0.0018536697851552317
bytebuffer_write_12MB_short_calculated_strings 0.067587874 0.069486342 0.0687012671 0.0005389776086500247
bytebuffer_write_12MB_medium_string_literals 0.938363651 0.97485219 0.9508204281999999 0.013250425598645685
bytebuffer_write_12MB_medium_calculated_strings 0.086556923 0.089021016 0.0870135612 0.000731880891976859
bytebuffer_write_12MB_large_calculated_strings 0.163417139 0.164472042 0.1641449972 0.0003404484955280193
bytebuffer_lots_of_rw 0.044265314 0.044929763 0.044431870299999995 0.00023316710060290754
bytebuffer_write_http_response_ascii_only_as_string 0.029828004 0.030381939 0.0299376602 0.00016310420389622758
bytebuffer_write_http_response_ascii_only_as_staticstring 0.029231652 0.029859072 0.0294445389 0.00017518514336073792
bytebuffer_write_http_response_some_nonascii_as_string 0.028767805 0.029312969 0.0288888285 0.00021134165086169015
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.028939677 0.030695064 0.029339388700000003 0.0005196977050649629
no-net_http1_1k_reqs_1_conn 0.011615747 0.012100875 0.0117296609 0.00013522579434613994
http1_1k_reqs_1_conn 0.060492661 0.061901803 0.0612277168 0.0004381693782502116
http1_1k_reqs_100_conns 0.090465821 0.090860192 0.0906602483 0.00011466376957197908
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.080549118 0.082637506 0.081468365 0.0007871206764443573
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.080940765 0.088269588 0.0824305449 0.002134804165856356
future_whenallsucceed_10k_deferred_off_loop 0.023354389 0.023773316 0.023462138 0.0001324403603740188
future_whenallsucceed_10k_deferred_on_loop 0.014468765 0.014600609 0.0145289954 4.971332905815584e-05
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.040924739 0.041610577 0.041152195600000004 0.00023931122711101126
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.041419036 0.041985228 0.0416712287 0.0001902419231779779
future_whenallcomplete_10k_deferred_off_loop 0.016106523 0.017959537 0.0167431273 0.0006612598717008233
future_whenallcomplete_100k_deferred_on_loop 0.084619949 0.087602916 0.08551536779999999 0.0008921542183957266
future_reduce_10k_futures 0.017307059 0.017845536 0.0175047041 0.00015765937100206426
future_reduce_into_10k_futures 0.015271552 0.015405853 0.0153215517 4.1640959098918226e-05
channel_pipeline_1m_events 0.099658043 0.099798237 0.09974338660000001 4.948307737039316e-05
websocket_encode_50b_space_at_front_100k_frames_cow 0.049749614 0.050189388 0.0498925438 0.00020159322906883591
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.657249144 0.660868877 0.6581938381000001 0.001124422093879371
websocket_encode_1kb_space_at_front_1m_frames_cow 0.526078369 0.526747823 0.5263413171 0.00018762750014942448
websocket_encode_50b_no_space_at_front_100k_frames_cow 0.050102496 0.05058825 0.0502541197 0.0002134421036789604
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.052487978 0.052930183 0.052637525500000004 0.0001995756449003469
websocket_encode_50b_space_at_front_100k_frames 0.073903666 0.074350969 0.0741008752 0.00020326936093480533
websocket_encode_50b_space_at_front_10k_frames_masking 0.00889408 0.008927912 0.008907087599999999 9.962841542451471e-06
websocket_encode_1kb_space_at_front_10k_frames 0.012442082 0.012877802 0.0125207999 0.00013005579675017288
websocket_encode_50b_no_space_at_front_100k_frames 0.071874749 0.072902787 0.0723683473 0.0003529541819519128
websocket_encode_1kb_no_space_at_front_10k_frames 0.011704753 0.011798532 0.011730439300000001 3.002173715104898e-05
websocket_decode_125b_10k_frames 0.012596187 0.013051204 0.012710906 0.0001339040473863448
websocket_decode_125b_with_a_masking_key_10k_frames 0.013026622 0.01625301 0.0137532196 0.0011449994208533418
websocket_decode_64kb_10k_frames 0.012870337 0.013384654 0.0130012155 0.00014205029683355097
websocket_decode_64kb_with_a_masking_key_10k_frames 0.013328684 0.013495727 0.0134062059 5.629989460509189e-05
websocket_decode_64kb_+1_10k_frames 0.012897385 0.016614399 0.013328305499999998 0.0011557375071808039
websocket_decode_64kb_+1_with_a_masking_key_10k_frames 0.013289503 0.013809198 0.0134011819 0.00014745835970232413
circular_buffer_into_byte_buffer_1kb 0.033002613 0.033536173 0.0331520546 0.00018518534877924055
circular_buffer_into_byte_buffer_1mb 0.064661982 0.065130012 0.0648244472 0.00020092638953595694
byte_buffer_view_iterator_1mb 0.01756013 0.018081564 0.0176232754 0.00016123574967605693
byte_buffer_view_contains_12mb 0.052910349 0.053560145 0.0531561787 0.00021187141882076117
byte_to_message_decoder_decode_many_small 0.041325565 0.041860639 0.0415019664 0.00023618235080039308
generate_10k_random_request_keys 0.091185533 0.091505915 0.09138744169999999 0.00010962978709684585
bytebuffer_rw_10_uint32s 0.04080077 0.041416125 0.0409719066 0.00021478161702229284
bytebuffer_multi_rw_10_uint32s 0.074633317 0.075221512 0.0748953122 0.00024639593300070653
lock_1_thread_10M_ops 0.151529459 0.152741502 0.1520439887 0.0003694800419021466
lock_2_threads_10M_ops 0.786501782 0.909838773 0.8525907718999999 0.03231466284401043
lock_4_threads_10M_ops 0.937752797 0.959532161 0.9473351966999999 0.007973824451028328
lock_8_threads_10M_ops 0.957658591 0.987632844 0.9778233794 0.008966224334225099
schedule_100k_tasks 0.063766844 0.105724529 0.07314636890000001 0.012951859063592227
schedule_and_run_100k_tasks 0.252803345 0.267169133 0.2608068538 0.004253384282073607
execute_100k_tasks 0.103045814 0.105475272 0.1042747825 0.0009121376774441264
bytebufferview_copy_to_array_100k_times_1kb 0.010984296 0.011033014 0.0109959675 1.4597997512977405e-05
circularbuffer_copy_to_array_10k_times_1kb 0.019746973 0.020199469 0.019804835 0.00013886138398657403
deadline_now_1M_times 0.024568465 0.024832095 0.0246682263 9.23638256450479e-05
asyncwriter_single_writes_1M_times 1.464787299 1.467467645 1.4662272632 0.0008228612221267796
asyncsequenceproducer_consume_1M_times 0.907417083 0.910416828 0.9089906522 0.0010595941502413693
udp_10k_writes 0.37901331 0.379875118 0.3793720076 0.0002815189041030453
udp_10k_vector_writes 0.205883308 0.206418052 0.20622898890000002 0.00016904891221474557
udp_10k_vector_reads 0.386684625 0.387768161 0.3872836356 0.00033373534307398853
udp_10k_vector_reads_and_writes 0.109082179 0.109593621 0.1093604517 0.00017203582618684093
tcp_100k_messages_throughput 0.75330207 0.787823236 0.7734669324000001 0.010813256483347567

comparison

name current previous winner diff
write_http_headers 0.042907723 0.042886202 previous 0%
http_headers_canonical_form 0.10455533 0.106193642 current -1%
http_headers_canonical_form_trimming_whitespace 0.020678702 0.021160017 current -2%
http_headers_canonical_form_trimming_whitespace_from_short_string 0.018708244 0.019237102 current -2%
http_headers_canonical_form_trimming_whitespace_from_long_string 0.030301067 0.031139957 current -2%
bytebuffer_write_12MB_short_string_literals 0.143270983 0.143459794 current 0%
bytebuffer_write_12MB_short_calculated_strings 0.067587874 0.07066772 current -4%
bytebuffer_write_12MB_medium_string_literals 0.938363651 0.94105786 current 0%
bytebuffer_write_12MB_medium_calculated_strings 0.086556923 0.08698647 current 0%
bytebuffer_write_12MB_large_calculated_strings 0.163417139 0.165702724 current -1%
bytebuffer_lots_of_rw 0.044265314 0.043246136 previous 2%
bytebuffer_write_http_response_ascii_only_as_string 0.029828004 0.028208719 previous 5%
bytebuffer_write_http_response_ascii_only_as_staticstring 0.029231652 0.028714732 previous 1%
bytebuffer_write_http_response_some_nonascii_as_string 0.028767805 0.027803065 previous 3%
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.028939677 0.028839596 previous 0%
no-net_http1_1k_reqs_1_conn 0.011615747 0.011778778 current -1%
http1_1k_reqs_1_conn 0.060492661 0.061404357 current -1%
http1_1k_reqs_100_conns 0.090465821 0.09061921 current 0%
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.080549118 0.080259785 previous 0%
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.080940765 0.079877066 previous 1%
future_whenallsucceed_10k_deferred_off_loop 0.023354389 0.023212502 previous 0%
future_whenallsucceed_10k_deferred_on_loop 0.014468765 0.014316848 previous 1%
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.040924739 0.040145402 previous 1%
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.041419036 0.0405237 previous 2%
future_whenallcomplete_10k_deferred_off_loop 0.016106523 0.015676013 previous 2%
future_whenallcomplete_100k_deferred_on_loop 0.084619949 0.08085791 previous 4%
future_reduce_10k_futures 0.017307059 0.016911554 previous 2%
future_reduce_into_10k_futures 0.015271552 0.014511281 previous 5%
channel_pipeline_1m_events 0.099658043 0.101659459 current -1%
websocket_encode_50b_space_at_front_100k_frames_cow 0.049749614 0.049812283 current 0%
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.657249144 0.668089258 current -1%
websocket_encode_1kb_space_at_front_1m_frames_cow 0.526078369 0.523242559 previous 0%
websocket_encode_50b_no_space_at_front_100k_frames_cow 0.050102496 0.04962388 previous 0%
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.052487978 0.052218856 previous 0%
websocket_encode_50b_space_at_front_100k_frames 0.073903666 0.072742069 previous 1%
websocket_encode_50b_space_at_front_10k_frames_masking 0.00889408 0.008845607 previous 0%
websocket_encode_1kb_space_at_front_10k_frames 0.012442082 0.012337981 previous 0%
websocket_encode_50b_no_space_at_front_100k_frames 0.071874749 0.07207833 current 0%
websocket_encode_1kb_no_space_at_front_10k_frames 0.011704753 0.011690726 previous 0%
websocket_decode_125b_10k_frames 0.012596187 0.012334842 previous 2%
websocket_decode_125b_with_a_masking_key_10k_frames 0.013026622 0.01274516 previous 2%
websocket_decode_64kb_10k_frames 0.012870337 0.012671642 previous 1%
websocket_decode_64kb_with_a_masking_key_10k_frames 0.013328684 0.013136916 previous 1%
websocket_decode_64kb_+1_10k_frames 0.012897385 0.012642493 previous 2%
websocket_decode_64kb_+1_with_a_masking_key_10k_frames 0.013289503 0.013195296 previous 0%
circular_buffer_into_byte_buffer_1kb 0.033002613 0.033011484 current 0%
circular_buffer_into_byte_buffer_1mb 0.064661982 0.06466184 previous 0%
byte_buffer_view_iterator_1mb 0.01756013 0.017563643 current 0%
byte_buffer_view_contains_12mb 0.052910349 0.052952322 current 0%
byte_to_message_decoder_decode_many_small 0.041325565 0.041571445 current 0%
generate_10k_random_request_keys 0.091185533 0.090277131 previous 1%
bytebuffer_rw_10_uint32s 0.04080077 0.041266035 current -1%
bytebuffer_multi_rw_10_uint32s 0.074633317 0.072410584 previous 3%
lock_1_thread_10M_ops 0.151529459 0.15131291 previous 0%
lock_2_threads_10M_ops 0.786501782 0.820194284 current -4%
lock_4_threads_10M_ops 0.937752797 0.87456686 previous 7%
lock_8_threads_10M_ops 0.957658591 0.873560162 previous 9%
schedule_100k_tasks 0.063766844 0.062177021 previous 2%
schedule_and_run_100k_tasks 0.252803345 0.233980813 previous 8%
execute_100k_tasks 0.103045814 0.099815383 previous 3%
bytebufferview_copy_to_array_100k_times_1kb 0.010984296 0.010981564 previous 0%
circularbuffer_copy_to_array_10k_times_1kb 0.019746973 0.019756913 current 0%
deadline_now_1M_times 0.024568465 0.024640981 current 0%
asyncwriter_single_writes_1M_times 1.464787299 1.596832264 current -8%
asyncsequenceproducer_consume_1M_times 0.907417083 0.885448468 previous 2%
udp_10k_writes 0.37901331 0.375730776 previous 0%
udp_10k_vector_writes 0.205883308 0.204086694 previous 0%
udp_10k_vector_reads 0.386684625 0.38397455 previous 0%
udp_10k_vector_reads_and_writes 0.109082179 0.10824488 previous 0%
tcp_100k_messages_throughput 0.75330207 0.778933674 current -3%

significant differences found
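The comparison rows look consistent with a simple rule, inferred here from the numbers rather than from the bot's source (which this thread does not show): the "current" column is each benchmark's min from the results table (for example, 0.042907723 for `write_http_headers`), the winner is the smaller min, and the diff is the relative change from previous to current, truncated toward zero (`bytebuffer_write_http_response_ascii_only_as_string` is about 5.7% slower but listed as 5%). A sketch of that assumed rule, checked against three rows:

```swift
// Assumed comparison rule, reverse-engineered from the table above:
// the smaller min wins, and diff = (current - previous) / previous,
// expressed in percent and truncated toward zero.
func compare(current: Double, previous: Double) -> (winner: String, diffPercent: Int) {
    let winner = current < previous ? "current" : "previous"
    // Int(_:) truncates toward zero, so 5.74 becomes 5 and -8.27 becomes -8.
    let diffPercent = Int((current - previous) / previous * 100)
    return (winner, diffPercent)
}

let schedule = compare(current: 0.063766844, previous: 0.062177021)
let asyncWriter = compare(current: 1.464787299, previous: 1.596832264)
let asciiString = compare(current: 0.029828004, previous: 0.028208719)

print("schedule_100k_tasks:", schedule)
print("asyncwriter_single_writes_1M_times:", asyncWriter)
print("bytebuffer_write_http_response_ascii_only_as_string:", asciiString)
```

Because only the min is compared, a benchmark as noisy as `schedule_100k_tasks` can still show a small diff in this table.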

weissi (Member) commented Feb 14, 2024

@swift-nio-bot perf test please

gmilos (Contributor Author) commented Feb 27, 2024

@weissi re #2650 (comment)

> That doesn't sound ideal as if it's actually the case that the others aren't drained, then this will now cause high CPU load and expects us to have #runs CPUs available etc.

No, because the tasks never run. They are just scheduled for some future date (that never arrives during the test run). So the tasks scheduled in the past runs are effectively dormant.
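gmilos's argument can be illustrated with a toy timer queue in which a tick only runs tasks whose deadline has passed. With deadlines far beyond the benchmark's duration (one hour here, an arbitrary choice), the tasks cost no execution time, but they do stay queued, which is the heap growth described in the PR's motivation. `TimerQueue`, the run length, and the task counts are hypothetical and do not reflect NIO's internals.

```swift
// Hypothetical timer queue: tasks carry a deadline (seconds since start)
// and a tick runs only the tasks whose deadline has passed.
struct TimerQueue {
    private(set) var deadlines: [Double] = []
    private(set) var executed = 0

    mutating func schedule(at deadline: Double) {
        deadlines.append(deadline)
    }

    // Run every due task; anything scheduled later stays queued.
    mutating func tick(now: Double) {
        let dueCount = deadlines.filter { $0 <= now }.count
        executed += dueCount
        deadlines.removeAll { $0 <= now }
    }
}

let oneHour = 3600.0   // arbitrary "far future" offset
let runDuration = 0.1  // assumed length of one benchmark run, in seconds
var timers = TimerQueue()

for run in 0..<3 {
    let start = Double(run) * runDuration
    // Each run schedules 1,000 tasks that are due an hour after it starts...
    for _ in 0..<1_000 {
        timers.schedule(at: start + oneHour)
    }
    // ...while the loop only advances to the end of that run.
    timers.tick(now: start + runDuration)
}
print("executed: \(timers.executed), still queued: \(timers.deadlines.count)")
```

No task ever executes, so there is no extra CPU load, yet the queue keeps every task from every run.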

gmilos changed the title from "Fix SchedulingBenchmark preheating logic." to "Remove unreliable SchedulingBenchmark" on Feb 28, 2024
gmilos enabled auto-merge (squash) on Feb 29, 2024 17:15
gmilos merged commit 325f762 into apple:main on Mar 1, 2024 (9 of 10 checks passed)
Lukasa added the semver/none label (No version bump required.) on Mar 1, 2024
Labels: semver/none (No version bump required.)

5 participants