Add fallback to node drain using --disable-eviction flag #8094

Merged
merged 5 commits into from Oct 20, 2021

@utkuozdemir utkuozdemir commented Oct 18, 2021

Signed-off-by: Utku Ozdemir <uoz@protonmail.com>

What type of PR is this?
/kind feature

What this PR does / why we need it:
Explained in #8093:

Kubespray drains the nodes one by one during a cluster upgrade.
Node drain uses the eviction logic by default.
When a workload is covered by a PodDisruptionBudget (PDB), that PDB can prevent the pods it protects from being evicted from the node.
When this happens, the Kubespray upgrade fails and can leave the cluster half-upgraded (some nodes on the older version, some on the newer one).
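
To make the failure mode above concrete, here is a minimal sketch of how a PDB ends up blocking an eviction. It is a simplified model, not Kubernetes source code: the class and function names are hypothetical, and only integer `minAvailable` / `maxUnavailable` values are handled (percentages are left out).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PodDisruptionBudget:
    """Simplified PDB model (hypothetical); set exactly one of the two fields."""
    min_available: Optional[int] = None
    max_unavailable: Optional[int] = None


def disruptions_allowed(pdb: PodDisruptionBudget, expected_pods: int, healthy_pods: int) -> int:
    """Return how many pods may be evicted right now under this budget."""
    if pdb.min_available is not None:
        desired_healthy = pdb.min_available
    elif pdb.max_unavailable is not None:
        desired_healthy = expected_pods - pdb.max_unavailable
    else:
        raise ValueError("set either min_available or max_unavailable")
    # An eviction is only granted while healthy pods exceed the desired count.
    return max(0, healthy_pods - desired_healthy)


def eviction_permitted(pdb: PodDisruptionBudget, expected_pods: int, healthy_pods: int) -> bool:
    return disruptions_allowed(pdb, expected_pods, healthy_pods) > 0


if __name__ == "__main__":
    # A single-replica workload with minAvailable: 1 can never be evicted,
    # so an eviction-based drain of its node keeps failing.
    single = PodDisruptionBudget(min_available=1)
    print(eviction_permitted(single, expected_pods=1, healthy_pods=1))
```

A budget like the single-replica one in the usage example is the typical case that stalls a drain: the number of allowed disruptions is zero no matter how long the drain retries.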

The `kubectl drain` command has a `--disable-eviction` flag for such cases. When the flag is set, pods are removed with the regular `kubectl delete pod` logic instead of eviction, which bypasses the PodDisruptionBudget checks.
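
As an illustration of the difference, the sketch below builds the two command lines side by side. The helper name and the node name are hypothetical, and the extra flags (`--ignore-daemonsets`, `--delete-emptydir-data`) are an assumption about a typical drain invocation, not taken from this PR; older kubectl versions spell the second one `--delete-local-data`.

```python
from typing import List


def drain_command(node: str, disable_eviction: bool = False) -> List[str]:
    """Build a kubectl drain command line for the given node (illustrative helper)."""
    cmd = ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"]
    if disable_eviction:
        # Delete pods directly instead of evicting them; PDBs are not consulted.
        cmd.append("--disable-eviction")
    return cmd


if __name__ == "__main__":
    # "node-1" is a placeholder node name.
    print(" ".join(drain_command("node-1")))
    print(" ".join(drain_command("node-1", disable_eviction=True)))
```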

There are setups where the cluster owners and the application owners are not the same people or team. The most common example is a multi-tenant cluster, where many PDBs from different tenants can exist, and any one of them can block cluster upgrades until it is deleted.

This proposal makes it possible to "forcefully" upgrade clusters in such cases, while still giving the PDB-protected pods a grace period.
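
The fallback idea can be sketched roughly as follows: try the normal eviction-based drain a few times, and only if that keeps failing, and the operator has opted in, retry with `--disable-eviction` and a grace period. This is a sketch of the control flow only, not the Ansible tasks added by this PR; the function name, parameter names, and default values are all hypothetical.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], int]


def run_kubectl(cmd: List[str]) -> int:
    """Run a command and return its exit code (0 means success)."""
    return subprocess.run(cmd, check=False).returncode


def drain_with_fallback(
    node: str,
    run: Runner = run_kubectl,
    retries: int = 3,
    fallback_enabled: bool = False,
    fallback_grace_period: int = 90,
) -> bool:
    """Drain a node, optionally falling back to deletion without eviction."""
    base = ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"]

    # First, the regular eviction-based drain, which respects PDBs.
    for _ in range(retries):
        if run(base) == 0:
            return True

    if not fallback_enabled:
        return False

    # Fallback: bypass PDBs, but still give pods time to shut down cleanly.
    fallback = base + ["--disable-eviction", f"--grace-period={fallback_grace_period}"]
    return run(fallback) == 0
```

Keeping the fallback opt-in means that clusters which rely on PDBs as a hard guarantee see no change in behavior unless the operator explicitly enables it.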

Which issue(s) this PR fixes:
Fixes #8093

Special notes for your reviewer:
If this gets approved, I would like to backport it to the release-2.16 and release-2.15 branches.

Does this PR introduce a user-facing change?:

Add an optional fallback to node drain during cluster upgrades using `--disable-eviction` flag

Signed-off-by: Utku Ozdemir <uoz@protonmail.com>
@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 18, 2021
@k8s-ci-robot
Contributor

Hi @utkuozdemir. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 18, 2021
Contributor

@oomichi oomichi left a comment

/cc @oomichi

roles/upgrade/pre-upgrade/tasks/main.yml (review comment, outdated and resolved)
Signed-off-by: Utku Ozdemir <uoz@protonmail.com>
Signed-off-by: Utku Ozdemir <uoz@protonmail.com>
Signed-off-by: Utku Ozdemir <uoz@protonmail.com>
Signed-off-by: Utku Ozdemir <uoz@protonmail.com>
@oomichi
Contributor

oomichi commented Oct 19, 2021

Thanks for updating.

/ok-to-test
/lgtm

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 19, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 19, 2021
Member

@floryut floryut left a comment

@utkuozdemir Thank you for that 👍

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: floryut, utkuozdemir

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 20, 2021
@k8s-ci-robot k8s-ci-robot merged commit 10c30ea into kubernetes-sigs:master Oct 20, 2021
@utkuozdemir
Contributor Author

@floryut Happy to contribute :). Is it ok if I backport this to 2.16 and 2.15?

@floryut
Member

floryut commented Oct 20, 2021

> @floryut Happy to contribute :). Is it ok if I backport this to 2.16 and 2.15?

2.16 might be possible, 2.15 not really, way too much work to make it usable as of now :)

forselli-stratio pushed a commit to forselli-stratio/kubespray that referenced this pull request Mar 1, 2022
…sigs#8094)

* Add fallback to node drain using --disable-eviction flag

Signed-off-by: Utku Ozdemir <uoz@protonmail.com>

* Move drain fallback tasks to separate file

Signed-off-by: Utku Ozdemir <uoz@protonmail.com>

* Add delegate_facts to fix the drain fallback

Signed-off-by: Utku Ozdemir <uoz@protonmail.com>

* Fix ansible-lint error

Signed-off-by: Utku Ozdemir <uoz@protonmail.com>

* Move drain fallback into block

Signed-off-by: Utku Ozdemir <uoz@protonmail.com>
forselli-stratio added a commit to Stratio/kubespray that referenced this pull request Mar 1, 2022
…sigs#8094) (#33)

Co-authored-by: Utku Özdemir <uoz@protonmail.com>
sakuraiyuta pushed a commit to sakuraiyuta/kubespray that referenced this pull request Apr 16, 2022
…sigs#8094)

LuckySB pushed a commit to southbridgeio/kubespray that referenced this pull request Jun 27, 2023
…sigs#8094)

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Add a fallback method to node drain using --disable-eviction to ignore PodDisruptionBudgets
4 participants