
Enable HPA by default #27

Merged
merged 3 commits into from
Feb 27, 2020
Conversation

sslavic

@sslavic sslavic commented Feb 10, 2020

@sslavic sslavic self-assigned this Feb 10, 2020
- `controller.autoscaling.enabled` | Enables or disables Horizontal Pod Autoscaler (HPA) for NGINX Ingress Controller Deployment. | `false`
- `controller.autoscaling.minReplicas` | Configures HPA min replicas. | `1`
+ `controller.autoscaling.enabled` | Enables or disables Horizontal Pod Autoscaler (HPA) for NGINX Ingress Controller Deployment. | `true`
+ `controller.autoscaling.minReplicas` | Configures HPA min replicas. | `2`
Author


Syncs docs with actual configured default
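For reference, the documented defaults above correspond to a values fragment along these lines (a sketch; only the autoscaling keys come from the table, the surrounding structure is an assumption about the chart layout):

```yaml
# Sketch of the relevant values.yaml fragment; only the autoscaling keys
# reflect the documented defaults, everything else is illustrative.
controller:
  autoscaling:
    enabled: true     # HPA on by default after this PR
    minReplicas: 2    # HPA lower bound
```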

@@ -26,7 +26,7 @@ configmap:
   # Increase hash table size to allow more server names for stability reasons
   server-name-hash-bucket-size: "1024"
   server-tokens: "false"
-  worker-processes: "4"
+  worker-processes: "1"
Author


Each nginx replica Pod starts a number of processes:

  • controller
  • cache
  • master
  • workers

and some processes get started periodically or on-demand, e.g. to load configuration changes.

The upstream default for the number of worker processes is `auto`, which means the number of worker processes in each nginx replica/Pod is set to the detected number of CPUs. The problem is that the current logic for detecting available CPUs is not cgroups aware.

So far we've been overriding worker-processes to 4. Given that we've been requesting only 500m CPU and 600Mi memory, this was obviously misconfigured. From HPA trials on adidas tenant clusters we observed that each worker process seems to keep its own copy of the entire configuration derived from the Ingress definitions, and on dev clusters with a large number of Ingress CRs, especially during configuration validation/reload, resource usage would double and didn't fit in the requested resources.

Reducing worker-processes to just 1, and granting enough memory and CPU, made the resource usage of each replica more predictable and stable, with enough capacity to hold the configuration even during reloads.

Currently these settings are sized to handle the largest known clusters.

Once Prometheus per tenant cluster is available, we'll be able to collect more metrics, such as nginx's built-in ones (connection count/rate, request rate) and Ingress count/rate, analyze the variance in resource utilization across clusters, and potentially discover different usage profiles, so we'll very likely revisit these settings in the future.

Contributor


I have two concerns here:

  1. Increasing from 500m to 2G will have an impact on the allocatable memory/CPU of each worker node. We need to check whether the default instance type (or the available instance types) has enough resources to give customers functional workers.

  2. If we set 2 CPUs, would it make more sense to set worker processes to 2 instead of 1?
    According to https://nginx.org/en/docs/ngx_core_module.html#worker_processes, a good initial configuration is to align worker processes with CPU cores.

Author


  1. I share your concern; I'm actually testing this and won't merge until verifying it fits. I got an ack earlier today from @puja108 on Slack that, if needed, it's OK even to raise the minimal requirements.

  2. I had the same thoughts, though I have to admit that relatively naive experiments so far didn't show a perf benefit, but they did show better stability with a single worker; maybe that's because it's not only worker processes that run in each nginx Pod: there's the controller, master, cache, and periodically kicked-off processes for validating and applying configuration updates. So 1 worker process now seems like a safe start; we can always relatively easily adjust to 2, revert to 4, or even disable HPA on demand on a case-by-case basis, e.g. if the rollout to existing production tenant clusters causes previously unobserved issues. Once metrics are available and we collect more info/experience from the live deployments, we can set objective performance and capacity SLOs per replica and continuously ensure they are met with automated testing.

Contributor


  1. What about adding an entry in the changelog? I will try to give it more visibility so users are aware of the implications.

  2. OK, all good.

Author


@pipo02mix I've updated the changelog, PTAL; we'll likely repeat this in some form and may even expand on it in the legacy platform release notes.

Notice I've bumped the release with HPA enabled to 1.6.0 instead of 1.5.0; I pushed HPA back a bit in order to first release a regression fix for 1.4.0 plus an upgrade to the latest nginx, and that will be released as 1.5.0 instead. The regression was discovered while testing legacy WIP releases on AWS, which we discussed a bit earlier today in the Slack #batman channel thread https://gigantic.slack.com/archives/C76JX6YLQ/p1581929367347400

resources:
  requests:
    cpu: 500m
    memory: 600Mi
Author


The e2e tests environment seems to be constrained, so without adjusting resource usage like this, the 3rd replica wasn't able to start due to insufficient resources.

Contributor


We could maybe increase the size of the kind nodes but I think this makes more sense. 👍

@@ -0,0 +1,15 @@
{{- if or (and .Values.controller.autoscaling.enabled (gt (.Values.controller.autoscaling.minReplicas | int) 1)) (gt (.Values.ingressController.replicas | int) 1) }}
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
Author


With HPA enabled there's an additional source of voluntary disruptions. This PDB helps us additionally protect the nginx Deployment by controlling how many instances can be unavailable at the same time.
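For illustration, the rendered PDB might look roughly like the following (a sketch; the selector labels and the maxUnavailable value are assumptions, not necessarily the chart's actual output):

```yaml
# Sketch of a rendered PodDisruptionBudget for the controller; the
# metadata name, selector labels, and maxUnavailable are assumed values.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: nginx-ingress-controller
spec:
  maxUnavailable: 1                  # at most one replica down voluntarily
  selector:
    matchLabels:
      k8s-app: nginx-ingress-controller
```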

@sslavic
Author

sslavic commented Feb 11, 2020

Tested on giraffe (AWS, China), with nginx as an optional app on top of GS 11.0.0.

@sslavic sslavic marked this pull request as ready for review February 11, 2020 16:40
@sslavic sslavic requested review from a team February 11, 2020 16:40
Contributor

@pipo02mix pipo02mix left a comment


Some concerns raised, but it is a wonderful step forward.


@sslavic
Author

sslavic commented Feb 17, 2020

@marians suggested https://docs.giantswarm.io/basics/cluster-size-autoscaling/#ingress-controller-replicas-with-autoscaling needs to be updated

@pipo02mix
Contributor

@sslavic do you think then it is a good idea to add an entry to the changelog with the changes introduced?

@sslavic
Author

sslavic commented Feb 17, 2020

@pipo02mix yes, I did add a (too brief) thumbs-up when you brought it up last time via #27 (comment),
but I didn't get to it yet since I was working on the legacy release first and had intense on-call shifts; I plan to work on it today, as I shared via https://gigantic.slack.com/archives/CRBH3QNNS/p1581932970023000

@pipo02mix
Contributor

no worries, take your time. :)

Contributor

@pipo02mix pipo02mix left a comment


Just some wording to tighten the assessments

@sslavic sslavic force-pushed the enable-hpa branch 3 times, most recently from a1eb4cb to 30f6a55 Compare February 18, 2020 14:09
@sslavic sslavic force-pushed the enable-hpa branch 2 times, most recently from dae407a to a6e7bf0 Compare February 25, 2020 14:58
@sslavic
Author

sslavic commented Feb 25, 2020

@rossf7 @pipo02mix @puja108 I've updated the PR to include the changes we discussed earlier today in #team-batman https://gigantic.slack.com/archives/C76JX6YLQ/p1582626800037000: break away from cluster-operator's dynamically configurable ingressController.replicas

PTAL

Contributor

@pipo02mix pipo02mix left a comment


LGTM

use-forwarded-headers: "true"

controller:
  name: nginx-ingress-controller
  k8sAppLabel: nginx-ingress-controller

  replicaCount: 2
Author


@pipo02mix @puja108 @rossf7 recently the GS AWS worker instance type with 2 CPUs was removed, so the min instance type now has 4 CPUs, yet the ASG allowed min and max remained 1. Whatever the use case for clusters with at most a single worker node is (test/playground/learning; I'm trying to discover that via https://gigantic.slack.com/archives/C0251MXL5/p1582671214019900), 2 nginx replicas with 2 CPUs each couldn't fit on such a minimal single-node 4-CPU tenant cluster, especially not together with all the other pre-installed managed workloads.

Maybe we can consider increasing min worker count to 2.

Still, assuming we won't increase the minimum worker count to 2, I'm thinking to either

  • reduce the nginx replica count default to 1, and HPA minReplicas to 1 too, so that hopefully at least 1 can get scheduled on a minimal single worker node cluster; the con is that even in production environments the nginx Deployment could be dropped by HPA to a single replica only, a risky setup with no redundancy; or alternatively
  • keep 2 as the default replica count, tolerating that not both of them would be scheduled; I'd test that at least one does get scheduled. Having both of them scheduled alongside the other pre-installed components is, I think, not currently possible, since nginx gets installed relatively late in cluster init; the customer could be instructed to manually customize/optimize resource requests and autoscaling settings in this non-production corner case. The pro would be that the more important and more common non-idle use case is better covered with automation out of the box.

Longer term we could figure out the supported use cases / usage profiles. If we can categorize them into a few T-shirt sizes, determined by the resources assigned to the cluster (worker count range and worker size / instance type), cluster-operator could hold that resources-to-profile mapping logic and publish the determined usage profile/category in the cluster values ConfigMap; based on the profile, apps could then adjust their default resource requests and redundancy settings, which would hopefully minimize the need to customize resources/scaling for the supported usage profiles.
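As a sketch of the idea, the mapping might look something like this in chart values (all profile names, thresholds, and numbers here are illustrative assumptions, not an existing implementation):

```yaml
# Hypothetical "resources -> profile" mapping; cluster-operator would
# derive clusterProfile and publish it in the cluster values ConfigMap.
clusterProfile: medium        # assumed value, derived from cluster size

profiles:
  small:                      # e.g. 1-3 small workers
    replicaCount: 1
    autoscaling: { minReplicas: 1, maxReplicas: 3 }
  medium:                     # e.g. 3-10 workers
    replicaCount: 2
    autoscaling: { minReplicas: 2, maxReplicas: 5 }
  large:                      # e.g. 10+ workers
    replicaCount: 2
    autoscaling: { minReplicas: 4, maxReplicas: 20 }
```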

What do you think?

Contributor


My understanding (which is limited) is that you can create single node clusters but they are not fully supported.

The 1st option of reducing to 1 replica seems simpler. There is a risk of downtime to the ingress but if you've configured a single node cluster you've increased your risk of downtime.

Contributor


Can we do an intermediate solution, like setting a min value depending on the size of the cluster?

-> cluster, single NP with 1 min / 3 max workers -> HPA IC min 1
-> cluster, single NP with 3 min / 10 max workers -> HPA IC min 2
-> cluster, single NP with 10 min / 50 max workers -> HPA IC min 4

The problem is that cluster scaling actions would then have to modify the HPA values of some components (here the IC), but IMO it would be the ideal scenario. I talked to a customer with a big node pool cluster and they were surprised that the HPA min value for CoreDNS was 2 replicas. I think conditioning the min (or max) on cluster size is ideal. Can we do it somehow?
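One way the proposed tiers could be expressed is a conditional in the chart's HPA template, keyed off a value describing cluster size (a sketch; `.Values.workerCountMin` and the thresholds are assumptions, not existing chart values):

```yaml
# Hypothetical Helm template fragment implementing the proposed tiers;
# workerCountMin would have to be injected per cluster, e.g. by
# cluster-operator via the cluster values ConfigMap.
minReplicas: {{ if ge (.Values.workerCountMin | int) 10 }}4{{ else if ge (.Values.workerCountMin | int) 3 }}2{{ else }}1{{ end }}
```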

Author


@rossf7 thanks for your feedback on the single node cluster use case and its support.

It makes no sense to have the HPA min higher than the default replica count, as HPA would immediately scale up to its min.
I think defaulting to 1 replica (and so also an HPA min of 1) is an availability risk even for tenant clusters with more than 1 worker node: nginx or the node will crash or get recycled, and as we saw in recent incidents with Pods stuck Terminating for ages, it takes time for a replacement to come up, so 1 replica is a risk even in multi-node clusters.

Therefore I'm more for defaulting to 2 replicas, letting 1 hang unscheduled on playground single small node clusters.
I'll go with that for now (i.e. keep the settings as is) and test a WIP release even on a single node cluster with the smallest instance type; we can always come back and make adjustments in a follow-up patch release.

Author


@pipo02mix yes, I mentioned earlier the idea of having cluster-operator do the T-shirt size / usage profile estimation. The estimation would be based on what the customer assigned to the tenant cluster (scaling range, i.e. instance count, and instance type / capacity); the estimated profile would be saved/updated in the (tenant) cluster values ConfigMap that is provided to every installed app, and app charts could map T-shirt sizes to app configuration (replica count, resource assignment, HPA range, ...).

Start simple with 2-3 levels/profiles; I think anything is better than the current one-size-fits-all, since apps are not smart enough to deal with the full range of clusters/usage, and we practically can't apply both HPA and VPA at the same time to a single app.

Author


Tested HPA-enabled nginx with 2 initial replicas and an HPA min of 2 replicas on the smallest single node cluster we currently support on AWS, using the 11.0.1 platform release and installing the nginx App as optional/on-demand without any adjustments. Other things started as expected, while there was a single running nginx replica and 2 Pending; HPA was trying to scale out since the desired replica count was lower than the minimum replica count.

The problem with this configuration is that we'll get paged about the nginx Deployment not being satisfied. For this reason (to cover the smallest cluster without paging) I'll reduce the initial replica count to 1 and the HPA min to 1 as well.
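Concretely, the rendered HPA after this change might look roughly like the following (a sketch; only `minReplicas: 1` reflects the decision above, while the names, upper bound, and CPU target are assumed values):

```yaml
# Illustrative HPA manifest; only minReplicas: 1 comes from this change,
# the other fields are assumptions for illustration.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-ingress-controller
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-ingress-controller
  minReplicas: 1
  maxReplicas: 20                      # assumed upper bound
  targetCPUUtilizationPercentage: 50   # assumed CPU target
```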

Contributor

@rossf7 rossf7 left a comment


LGTM

This is great Stevo. Thanks for all your efforts on this. I mainly reviewed in the context of the cluster-operator changes, which all make sense to me.



Stevo Slavić and others added 2 commits February 26, 2020 17:20
Co-Authored-By: Ross Fairbanks <rossf7@users.noreply.github.com>
@sslavic
Author

sslavic commented Feb 27, 2020

Tested on gauss on the smallest single worker node cluster with 4 CPUs; the nginx upgrade works.

@sslavic sslavic merged commit 741a18a into master Feb 27, 2020
@sslavic sslavic deleted the enable-hpa branch February 27, 2020 17:47