[k8s] Add support for autoscaling kubernetes clusters #3513

Merged
romilbhardwaj merged 7 commits into master from k8s_autoscaling_support
May 7, 2024

@romilbhardwaj romilbhardwaj commented May 7, 2024

Adds an autoscaler field to config.yaml that lets the user specify which autoscaler the underlying Kubernetes cluster uses:

kubernetes:
  autoscaler: gke  # or karpenter, or generic

Setting this field:

  1. Disables GPU capacity checks during provisioning
  2. Uses the appropriate label formatter for specifying GPU labels

When used in conjunction with provision_timeout, this allows users to bring in their scale-to-zero k8s clusters and use them with SkyPilot.
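To illustrate why the capacity check has to be skipped on a scale-to-zero cluster, here is a minimal sketch. The function name and parameters are hypothetical and are not SkyPilot's actual code; the sketch only assumes that, without an autoscaler, a request is rejected when existing nodes lack enough GPUs.

```python
from typing import Optional


def should_attempt_provisioning(requested_gpus: int,
                                gpus_on_existing_nodes: int,
                                autoscaler: Optional[str]) -> bool:
    """Hypothetical pre-provisioning gate (illustration only)."""
    if autoscaler is not None:
        # An autoscaler may add GPU nodes on demand, so the capacity
        # visible right now does not decide whether the request can be
        # met. Skip the check and let provision_timeout bound the wait.
        return True
    # Fixed-size cluster: fail fast if the GPUs are not there.
    return gpus_on_existing_nodes >= requested_gpus
```

With a scale-to-zero cluster, gpus_on_existing_nodes is 0 before the first launch, so the check would always fail unless it is bypassed.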

Currently supported autoscalers: GKE node pools, Karpenter, and any generic cluster autoscaler (CA) that can label new nodes with skypilot.co/accelerator (related: #3432).
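A rough sketch of what choosing a label formatter per autoscaler could look like. Only the skypilot.co/accelerator key comes from this PR description. The GKE and Karpenter label keys, the GKE value format, and all names in the code are assumptions made for illustration, not SkyPilot's implementation.

```python
from typing import Dict

# Label key per autoscaler. The generic key is from the PR description;
# the GKE and Karpenter keys are assumed here.
LABEL_KEYS: Dict[str, str] = {
    'gke': 'cloud.google.com/gke-accelerator',
    'karpenter': 'karpenter.k8s.aws/instance-gpu-name',
    'generic': 'skypilot.co/accelerator',
}


def gpu_node_selector(autoscaler: str, accelerator: str) -> Dict[str, str]:
    """Build a node selector for the requested GPU (hypothetical helper)."""
    if autoscaler not in LABEL_KEYS:
        raise ValueError(f'Unsupported autoscaler: {autoscaler!r}')
    name = accelerator.lower()
    if autoscaler == 'gke':
        # Assumed value format. It matches the T4 tested in this PR,
        # but newer GPU types on GKE use different naming.
        name = f'nvidia-tesla-{name}'
    return {LABEL_KEYS[autoscaler]: name}
```

For example, gpu_node_selector('generic', 'T4') yields a selector on skypilot.co/accelerator, which a generic autoscaler would need to apply to the nodes it creates.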

More context: https://docs.google.com/document/d/17LRYGCKDsY9AygAJbUIiQR4KkHZqSakL8G0SF7WNOR8/edit?usp=sharing

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Tested manually on a Karpenter cluster with autoscaling T4 GPU nodes
  • Tested manually on a GKE fixed cluster with 1x T4 GPU nodes
  • Tested manually on a GKE autoscaling cluster with T4 GPU nodepool

@Michaelvll Michaelvll left a comment

Thanks for quickly adding this support @romilbhardwaj! Looks mostly good to me.

docs/source/reference/config.rst (outdated, resolved)
docs/source/reference/config.rst (resolved)
sky/cli.py (outdated, resolved)
sky/cli.py Outdated
Comment on lines 3051 to 3052
if cloud_is_kubernetes and kubernetes_autoscaling:
    yield kubernetes_utils.KUBERNETES_AUTOSCALER_NOTE

Does this mean that when the cloud is not specified (i.e., None), we will not print the hint? Should we print the hint when cloud is None as well?

Also, in the show_all case, should we print this hint as well?

Collaborator Author

Ahh, good point, fixed now! The hint is shown when cloud is not specified and also in the show_all case.
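The behavior agreed on in this thread can be sketched roughly as follows. This is not the code in sky/cli.py: the function name, its parameters, and the note text are placeholders. The condition does not depend on a show-all flag, so under this sketch the hint would appear in that case too.

```python
from typing import Iterator, Optional

# Placeholder text; the real note lives in
# kubernetes_utils.KUBERNETES_AUTOSCALER_NOTE.
AUTOSCALER_NOTE = 'Note: Kubernetes cluster autoscaling is enabled.'


def gpu_listing_hints(cloud: Optional[str],
                      autoscaler: Optional[str]) -> Iterator[str]:
    """Yield hints to print after a GPU listing (hypothetical helper)."""
    cloud_is_kubernetes = (cloud is not None and
                           cloud.lower() == 'kubernetes')
    kubernetes_autoscaling = autoscaler is not None
    # Show the hint when Kubernetes is requested explicitly, and also
    # when no cloud is given, since the listing then covers Kubernetes.
    if kubernetes_autoscaling and (cloud is None or cloud_is_kubernetes):
        yield AUTOSCALER_NOTE
```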

sky/provision/kubernetes/utils.py (outdated, resolved)
sky/provision/kubernetes/utils.py (outdated, resolved)
@Michaelvll Michaelvll left a comment

LGTM! Thanks @romilbhardwaj !

@romilbhardwaj
Collaborator Author

Thanks @Michaelvll! Tested on Karpenter and GKE nodepool autoscaling, merging now.

@romilbhardwaj romilbhardwaj merged commit 10340f8 into master May 7, 2024
20 checks passed
@romilbhardwaj romilbhardwaj deleted the k8s_autoscaling_support branch May 7, 2024 07:51