add poller autoscaling in activity and decision workers #1186

Merged · 7 commits merged into uber-go:master on Sep 2, 2022

Conversation

@shijiesheng (Contributor) commented on Aug 30, 2022

What changed?

  • add pollerAutoScalerOptions
  • surface user-set options and the feature flag that enables poller autoscaling
  • start autoscaling in the workers

Why?

How did you test it?

Potential risks

@coveralls commented on Aug 31, 2022

Pull Request Test Coverage Report for Build 0182fefc-7a13-49a1-8887-001a9f9b0ce0

  • 79 of 81 (97.53%) changed or added relevant lines in 3 files are covered.
  • 13 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.04%) to 64.164%

Changes Missing Coverage:
File | Covered Lines | Changed/Added Lines | %
internal/internal_worker_base.go | 32 | 34 | 94.12%

Files with Coverage Reduction:
File | New Missed Lines | %
internal/compatibility/thrift/enum.go | 4 | 21.2%
internal/compatibility/thrift/types.go | 9 | 48.42%
Totals Coverage Status
Change from base Build 0182dc27-c213-4eb3-a77e-c81ef840dc7b: 0.04%
Covered Lines: 12598
Relevant Lines: 19634

💛 - Coveralls

@shijiesheng changed the title from "[WIP] add poller autoscaling in activity and decision workers" to "add poller autoscaling in activity and decision workers" on Aug 31, 2022

// optional: Sets the interval of poller autoscaling, between which poller autoscaler changes the poller count
// based on poll result. It takes effect if FeatureFlags.PollerAutoScalerEnabled is set to true.
// Default value is 1 min
A reviewer (Contributor) commented:

Is 1 minute fast enough?

@shijiesheng (Author) replied:

I'm still experimenting on this. Since we ensure at least one poller is always working, the impact of this parameter should be low.
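
For context, a hypothetical sketch of how a user might opt in. Only FeatureFlags.PollerAutoScalerEnabled is named in this PR (in the doc comment quoted above); where that flag lives in worker.Options and the interval option name below are assumptions for illustration, not the PR's actual API:

```go
// Hypothetical sketch only. FeatureFlags.PollerAutoScalerEnabled is the flag
// quoted in the doc comment above; the surrounding worker.Options wiring and
// PollerAutoScalerCooldown are assumed names used purely for illustration.
opts := worker.Options{
	FeatureFlags: worker.FeatureFlags{
		PollerAutoScalerEnabled: true, // turn the autoscaler on
	},
	// Assumed option name; the PR documents a 1-minute default interval.
	PollerAutoScalerCooldown: time.Minute,
}
w := worker.New(service, "sample-domain", "sample-tasklist", opts)
_ = w
```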

targetMilliUsage uint64,
isDryRun bool,
cooldownTime time.Duration,
options pollerAutoScalerOptions,
logger *zap.Logger,
hooks ...func()) *pollerAutoScaler {
ctx, cancel := context.WithCancel(context.Background())
A reviewer (Contributor) commented:

is this a sign we should be passing in a context? e.g. to propagate shutdown information maybe.
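
A minimal sketch of that suggestion, keeping the parameters from the hunk above; the constructor name and struct fields here are assumptions. The caller supplies the parent context, so worker shutdown propagates into the autoscaler instead of the autoscaler owning a detached context.Background():

```go
// Sketch: derive the autoscaler's context from a caller-supplied parent so
// shutdown propagates naturally. Names other than the quoted parameters are
// assumptions, not the PR's actual code.
func newPollerScaler(
	parent context.Context, // new: caller-provided parent context
	targetMilliUsage uint64,
	isDryRun bool,
	cooldownTime time.Duration,
	options pollerAutoScalerOptions,
	logger *zap.Logger,
	hooks ...func(),
) *pollerAutoScaler {
	// was: context.WithCancel(context.Background())
	ctx, cancel := context.WithCancel(parent)
	return &pollerAutoScaler{
		ctx:    ctx,
		cancel: cancel,
		// ... remaining fields wired up as in the PR
	}
}
```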


// Stop stops the poller autoscaler
func (p *pollerAutoScaler) Stop() {
p.cancel()
A reviewer (Contributor) commented:

not relevant to this PR, but in general: we probably want a (time-bounded?) blocking "... and wait for shutdown to complete" behavior here.
otherwise it's impossible to rule out dangling goroutines (e.g. goleak tests) or race conditions (background loop triggering at the same time as cancelation).
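
A minimal sketch of the time-bounded blocking Stop the comment describes; the wg and logger fields are assumptions here, standing in for however the autoscaler tracks its background goroutine:

```go
// Sketch: cancel the autoscaler's context, then wait (with an upper bound)
// for the background loop to finish so callers can't race with it.
func (p *pollerAutoScaler) Stop() {
	p.cancel()
	done := make(chan struct{})
	go func() {
		p.wg.Wait() // assumed sync.WaitGroup incremented by the autoscale loop
		close(done)
	}()
	select {
	case <-done:
	case <-time.After(10 * time.Second):
		p.logger.Warn("poller autoscaler did not stop within timeout")
	}
}
```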

Comment on lines 278 to 285
if bw.pollLimiter == nil || bw.pollLimiter.Wait(bw.limiterContext) == nil {
if bw.pollerAutoScaler != nil {
if pErr := bw.pollerAutoScaler.Acquire(1); pErr == nil {
defer bw.pollerAutoScaler.Release(1)
} else {
bw.logger.Warn("poller auto scaler acquire error", zap.Error(pErr))
}
}
@Groxx (Contributor) commented on Aug 31, 2022:

somewhat pedantic / realistic values seem unlikely to cause problems here, but I'm pretty sure this ordering means this is possible:

  • autoscale from N to 1
  • allow N to get through pollLimiter, while autoscaler allows one long request and blocks N-1
  • long request finishes, N-1 run through as fast as possible, ignoring the rate limiter
    • sequentially, as autoscaler is at 1. or 2 at once if at 2, etc.

swapping the order seems better, since "waiting on the per-second request limiter" feels like "a concurrent poller" to me. the request itself isn't running, but we've reserved space for it to run "on time". so we'll degrade "concurrent requests on the network" (allowing it to go lower than specified, though it is in principle just a maximum) in favor of preserving "maximum requests per second".
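
A sketch of the swapped ordering described above, using the same identifiers as the quoted hunk: hold the autoscaler permit first, then wait on the per-second limiter, so a request queued behind the limiter still counts toward the concurrent-poller budget.

```go
// Sketch of the reordering suggested in this thread (same identifiers as the
// hunk quoted above): acquire the autoscaler permit before the rate limiter.
if bw.pollerAutoScaler != nil {
	if pErr := bw.pollerAutoScaler.Acquire(1); pErr == nil {
		defer bw.pollerAutoScaler.Release(1)
	} else {
		bw.logger.Warn("poller auto scaler acquire error", zap.Error(pErr))
	}
}
if bw.pollLimiter == nil || bw.pollLimiter.Wait(bw.limiterContext) == nil {
	// ... issue the poll request as before
}
```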

@Groxx (Contributor) left a review comment:

A minor thing worth changing, but otherwise this all looks reasonable / is mostly "just" a (good) refactor.

so 👍 structurally LGTM. I should read more of the autoscaler before reviewing in more depth though.


unrelated / not a problem for any realistic values for us, but the semaphore library doesn't detect overflows: https://github.com/marusama/semaphore/blob/master/semaphore.go#L98
we should probably make a PR for that. it'd be easy to fix.
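
Not the library's actual code, just a generic illustration of the kind of guard that comment has in mind when a new limit is derived from arithmetic on the old one:

```go
// Generic overflow guard (illustrative only): reject a limit change whose
// signed arithmetic wrapped around or produced a non-positive limit.
func nextLimit(current, delta int32) (int32, error) {
	next := current + delta
	if delta > 0 && next < current {
		return 0, fmt.Errorf("semaphore limit overflow: %d + %d", current, delta)
	}
	if next < 1 {
		return 0, fmt.Errorf("semaphore limit must be positive, got %d", next)
	}
	return next, nil
}
```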

@shijiesheng merged commit 3a730da into uber-go:master on Sep 2, 2022
maeer pushed a commit to maeer/cadence-client that referenced this pull request Jun 29, 2023
* add pollerAutoScalerOptions

* add tests for options

* change autoscaler interface to include Stop method

* add poller auto scaler to task worker

* gracefully stop autoscaler

* move up autoscaler in pollTask

* add unit tests