helm: add configurable liveness&readiness probes for master topology-updater and worker #1801

Merged on Jul 23, 2024 (1 commit)

omerap12
Member

@omerap12 omerap12 commented Jul 18, 2024

Fixes: #1730
The following values.yaml:

master:
  enable: true
  args: []
  config: ### <NFD-MASTER-CONF-START-DO-NOT-REMOVE>
    # noPublish: false
    # autoDefaultNs: true
    # extraLabelNs: ["added.ns.io","added.kubernetes.io"]
    # denyLabelNs: ["denied.ns.io","denied.kubernetes.io"]
    # resourceLabels: ["vendor-1.com/feature-1","vendor-2.io/feature-2"]
    # enableTaints: false
    # labelWhiteList: "foo"
    # resyncPeriod: "2h"
    # klog:
    #    addDirHeader: false
    #    alsologtostderr: false
    #    logBacktraceAt:
    #    logtostderr: true
    #    skipHeaders: false
    #    stderrthreshold: 2
    #    v: 0
    #    vmodule:
    ##   NOTE: the following options are not dynamically run-time configurable
    ##         and require a nfd-master restart to take effect after being changed
    #    logDir:
    #    logFile:
    #    logFileMaxSize: 1800
    #    skipLogHeaders: false
    # leaderElection:
    #   leaseDuration: 15s
    #   # this value has to be lower than leaseDuration and greater than retryPeriod*1.2
    #   renewDeadline: 10s
    #   # this value has to be greater than 0
    #   retryPeriod: 2s
    # nfdApiParallelism: 10
  ### <NFD-MASTER-CONF-END-DO-NOT-REMOVE>
  # The TCP port that nfd-master listens for incoming requests. Default: 8080
  # Deprecated: this parameter is related to the deprecated gRPC API and will
  # be removed with it in a future release
  port: 8080
  metricsPort: 8081
  instance:
  featureApi:
  resyncPeriod:
  denyLabelNs: []
  extraLabelNs: []
  resourceLabels: []
  enableTaints: false
  crdController: null
  featureRulesController: null
  nfdApiParallelism: null
  deploymentAnnotations: {}
  replicaCount: 1

  podSecurityContext: {}
    # fsGroup: 2000

  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop: [ "ALL" ]
    readOnlyRootFilesystem: true
    runAsNonRoot: true
    # runAsUser: 1000

  serviceAccount:
    # Specifies whether a service account should be created
    create: true
    # Annotations to add to the service account
    annotations: {}
    # The name of the service account to use.
    # If not set and create is true, a name is generated using the fullname template
    name:

  # specify how many old ReplicaSets for the Deployment to retain.
  revisionHistoryLimit:

  rbac:
    create: true

  service:
    type: ClusterIP
    port: 8080

  resources:
    limits:
      memory: 4Gi
    requests:
      cpu: 100m
      # You may want to use the same value for `requests.memory` and `limits.memory`. The “requests” value affects scheduling to accommodate pods on nodes.
      # If there is a large difference between “requests” and “limits” and nodes experience memory pressure, the kernel may invoke
      # the OOM Killer, even if the memory does not exceed the “limits” threshold. This can cause unexpected pod evictions. Memory
      # cannot be compressed and once allocated to a pod, it can only be reclaimed by killing the pod.
      # Natan Yellin 22/09/2022 https://home.robusta.dev/blog/kubernetes-memory-limit
      memory: 128Mi

  nodeSelector: {}

  tolerations:
  - key: "node-role.kubernetes.io/master"
    operator: "Equal"
    value: ""
    effect: "NoSchedule"
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Equal"
    value: ""
    effect: "NoSchedule"

  annotations: {}

  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: "node-role.kubernetes.io/master"
                operator: In
                values: [""]
        - weight: 1
          preference:
            matchExpressions:
              - key: "node-role.kubernetes.io/control-plane"
                operator: In
                values: [""]
                
  livenessProbe:
    grpc:
      port: 8082
    initialDelaySeconds: 10
    periodSeconds: 10
  readinessProbe:
    grpc:
      port: 8082
    initialDelaySeconds: 5
    periodSeconds: 10
    failureThreshold: 10

Will generate:

---
# Source: node-feature-discovery/templates/master.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name:  release-name-node-feature-discovery-master
  namespace: default
  labels:
    helm.sh/chart: node-feature-discovery-0.2.1
    app.kubernetes.io/name: node-feature-discovery
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/version: "master"
    app.kubernetes.io/managed-by: Helm
    role: master
spec:
  replicas: 1
  revisionHistoryLimit: 
  selector:
    matchLabels:
      app.kubernetes.io/name: node-feature-discovery
      app.kubernetes.io/instance: release-name
      role: master
  template:
    metadata:
      labels:
        app.kubernetes.io/name: node-feature-discovery
        app.kubernetes.io/instance: release-name
        role: master
    spec:
      serviceAccountName: release-name-node-feature-discovery
      enableServiceLinks: false
      securityContext:
        {}
      containers:
        - name: master
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            readOnlyRootFilesystem: true
            runAsNonRoot: true
          image: "gcr.io/k8s-staging-nfd/node-feature-discovery:master"
          imagePullPolicy: Always
          livenessProbe:
            grpc:
              port: 8082
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            failureThreshold: 10
            grpc:
              port: 8082
            initialDelaySeconds: 5
            periodSeconds: 10
          ports:
          - containerPort: 8080
            name: grpc
          - containerPort: 8081
            name: metrics
          env:
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
          command:
            - "nfd-master"
          resources:
            limits:
              memory: 4Gi
            requests:
              cpu: 100m
              memory: 128Mi
          args:
            ## By default, disable crd controller for other than the default instances
            - "-crd-controller=true"
            # Go over featureGates and add the feature-gate flag
            - "-feature-gates=NodeFeatureAPI=true"
            - "-feature-gates=NodeFeatureGroupAPI=false"
            - "-metrics=8081"
          volumeMounts:
            - name: nfd-master-conf
              mountPath: "/etc/kubernetes/node-feature-discovery"
              readOnly: true
      volumes:
        - name: nfd-master-conf
          configMap:
            name: release-name-node-feature-discovery-master-conf
            items:
              - key: nfd-master.conf
                path: nfd-master.conf
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: node-role.kubernetes.io/master
                operator: In
                values:
                - ""
            weight: 1
          - preference:
              matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: In
                values:
                - ""
            weight: 1
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
          operator: Equal
          value: ""
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Equal
          value: ""
---
# Source: node-feature-discovery/templates/worker.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name:  release-name-node-feature-discovery-worker
  namespace: default
  labels:
    helm.sh/chart: node-feature-discovery-0.2.1
    app.kubernetes.io/name: node-feature-discovery
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/version: "master"
    app.kubernetes.io/managed-by: Helm
    role: worker
spec:
  revisionHistoryLimit: 
  selector:
    matchLabels:
      app.kubernetes.io/name: node-feature-discovery
      app.kubernetes.io/instance: release-name
      role: worker
  template:
    metadata:
      labels:
        app.kubernetes.io/name: node-feature-discovery
        app.kubernetes.io/instance: release-name
        role: worker
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccountName: release-name-node-feature-discovery-worker
      securityContext:
        {}
      containers:
      - name: worker
        securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            readOnlyRootFilesystem: true
            runAsNonRoot: true
        image: "gcr.io/k8s-staging-nfd/node-feature-discovery:master"
        imagePullPolicy: Always
        livenessProbe:
            grpc:
              port: 8082
            initialDelaySeconds: 10
        readinessProbe:
            failureThreshold: 10
            grpc:
              port: 8082
            initialDelaySeconds: 5
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_UID
          valueFrom:
            fieldRef:
              fieldPath: metadata.uid
        resources:
            limits:
              memory: 512Mi
            requests:
              cpu: 5m
              memory: 64Mi
        command:
        - "nfd-worker"
        args:
# Go over featureGate and add the feature-gate flag
        - "-feature-gates=NodeFeatureAPI=true"
        - "-feature-gates=NodeFeatureGroupAPI=false"
        - "-metrics=8081"
        ports:
          - name: metrics
            containerPort: 8081
        volumeMounts:
        - name: host-boot
          mountPath: "/host-boot"
          readOnly: true
        - name: host-os-release
          mountPath: "/host-etc/os-release"
          readOnly: true
        - name: host-sys
          mountPath: "/host-sys"
          readOnly: true
        - name: host-usr-lib
          mountPath: "/host-usr/lib"
          readOnly: true
        - name: host-lib
          mountPath: "/host-lib"
          readOnly: true
        - name: host-proc-swaps
          mountPath: "/host-proc/swaps"
          readOnly: true
        - name: source-d
          mountPath: "/etc/kubernetes/node-feature-discovery/source.d/"
          readOnly: true
        - name: features-d
          mountPath: "/etc/kubernetes/node-feature-discovery/features.d/"
          readOnly: true
        - name: nfd-worker-conf
          mountPath: "/etc/kubernetes/node-feature-discovery"
          readOnly: true
      volumes:
        - name: host-boot
          hostPath:
            path: "/boot"
        - name: host-os-release
          hostPath:
            path: "/etc/os-release"
        - name: host-sys
          hostPath:
            path: "/sys"
        - name: host-usr-lib
          hostPath:
            path: "/usr/lib"
        - name: host-lib
          hostPath:
            path: "/lib"
        - name: host-proc-swaps
          hostPath:
            path: "/proc/swaps"
        - name: source-d
          hostPath:
            path: "/etc/kubernetes/node-feature-discovery/source.d/"
        - name: features-d
          hostPath:
            path: "/etc/kubernetes/node-feature-discovery/features.d/"
        - name: nfd-worker-conf
          configMap:
            name: release-name-node-feature-discovery-worker-conf
            items:
              - key: nfd-worker.conf
                path: nfd-worker.conf
---
# Source: node-feature-discovery/templates/topologyupdater.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: release-name-node-feature-discovery-topology-updater
  namespace: default
  labels:
    helm.sh/chart: node-feature-discovery-0.2.1
    app.kubernetes.io/name: node-feature-discovery
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/version: "master"
    app.kubernetes.io/managed-by: Helm
    role: topology-updater
spec:
  revisionHistoryLimit: 
  selector:
    matchLabels:
      app.kubernetes.io/name: node-feature-discovery
      app.kubernetes.io/instance: release-name
      role: topology-updater
  template:
    metadata:
      labels:
        app.kubernetes.io/name: node-feature-discovery
        app.kubernetes.io/instance: release-name
        role: topology-updater
    spec:
      serviceAccountName: release-name-node-feature-discovery-topology-updater
      dnsPolicy: ClusterFirstWithHostNet
      securityContext:
        {}
      containers:
      - name: topology-updater
        image: "gcr.io/k8s-staging-nfd/node-feature-discovery:master"
        imagePullPolicy: "Always"
        livenessProbe:
          grpc:
            port: 8082
          initialDelaySeconds: 10
        readinessProbe:
          failureThreshold: 10
          grpc:
            port: 8082
          initialDelaySeconds: 5
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: NODE_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        command:
          - "nfd-topology-updater"
        args:
          - "-podresources-socket=/host-var/lib/kubelet-podresources/kubelet.sock"
          - "-sleep-interval=60s"
          - "-watch-namespace=*"
          - -metrics=8081
        ports:
          - name: metrics
            containerPort: 8081
        volumeMounts:
        - name: kubelet-podresources-sock
          mountPath: /host-var/lib/kubelet-podresources/kubelet.sock
        - name: host-sys
          mountPath: /host-sys
        - name: kubelet-state-files
          mountPath: /host-var/lib/kubelet
          readOnly: true
        - name: nfd-topology-updater-conf
          mountPath: "/etc/kubernetes/node-feature-discovery"
          readOnly: true

        resources:
            limits:
              memory: 60Mi
            requests:
              cpu: 50m
              memory: 40Mi
        securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            readOnlyRootFilesystem: true
            runAsUser: 0
      volumes:
      - name: host-sys
        hostPath:
          path: "/sys"
      - name: kubelet-podresources-sock
        hostPath:
          path: /var/lib/kubelet/pod-resources/kubelet.sock
      - name: kubelet-state-files
        hostPath:
          path: /var/lib/kubelet
      - name: nfd-topology-updater-conf
        configMap:
          name: release-name-node-feature-discovery-topology-updater-conf
          items:
            - key: nfd-topology-updater.conf
              path: nfd-topology-updater.conf
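
For reference, the worker and topology-updater probes in the rendered output above would correspond to values along the following lines. This is only a sketch: the description shows just the `master` section, so the `worker` and `topologyUpdater` top-level keys are assumed to mirror `master.livenessProbe` and `master.readinessProbe`, and the numbers are read back from the rendered manifests rather than copied from the chart.

```yaml
# Hypothetical values.yaml fragment. The key names under `worker` and
# `topologyUpdater` are assumptions that mirror the `master` section above.
worker:
  livenessProbe:
    grpc:
      port: 8082
    initialDelaySeconds: 10
  readinessProbe:
    grpc:
      port: 8082
    initialDelaySeconds: 5
    failureThreshold: 10

topologyUpdater:
  livenessProbe:
    grpc:
      port: 8082
    initialDelaySeconds: 10
  readinessProbe:
    grpc:
      port: 8082
    initialDelaySeconds: 5
    failureThreshold: 10
```

Each probe block is rendered into the corresponding container spec, which is why the keys appear in the output in a different order than in the values file.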

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 18, 2024
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 18, 2024

netlify bot commented Jul 18, 2024

Deploy Preview for kubernetes-sigs-nfd ready!

Name Link
🔨 Latest commit b2222e2
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-nfd/deploys/669eaaf409165f00084172e9
😎 Deploy Preview https://deploy-preview-1801--kubernetes-sigs-nfd.netlify.app

Contributor

@marquiz marquiz left a comment

Thanks @omerap12 for the patch.

In addition to nfd-master, we should also add support for nfd-topology-updater.

@marquiz
Contributor

marquiz commented Jul 19, 2024

@ArangoGutierrez you were yearning for this, please verify and report back 😊

/assign @ArangoGutierrez

@marquiz marquiz mentioned this pull request Jul 19, 2024
15 tasks
@omerap12
Member Author

omerap12 commented Jul 19, 2024

> nfd-topology-updater.

sure. do you want it in the same PR?

@marquiz
Contributor

marquiz commented Jul 19, 2024

> sure. do you want it in the same PR?

Yes, enable them all at the same time

@omerap12
Member Author

@marquiz do we want to add failureThreshold to the livenessProbes as well?

@omerap12 omerap12 closed this Jul 19, 2024
@omerap12 omerap12 reopened this Jul 19, 2024
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 19, 2024
Contributor

@marquiz marquiz left a comment

> do we want to add failureThreshold to the livenessProbes as well?

You could add it as a commented out entry in the values.yaml. Let's not change any defaults in this PR, though
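
A commented-out entry of the kind suggested here might look like the sketch below. The numbers in the comments are the Kubernetes defaults for these probe fields, not settings taken from this PR, and the exact set of commented fields is an assumption.

```yaml
# Hypothetical values.yaml fragment: optional probe fields are left
# commented out so the chart's effective defaults stay unchanged.
master:
  livenessProbe:
    grpc:
      port: 8082
    initialDelaySeconds: 10
    # Uncomment to override the Kubernetes defaults shown here.
    # failureThreshold: 3
    # periodSeconds: 10
    # timeoutSeconds: 1
```

Because the extra fields are only comments, the rendered manifests are identical to what the chart produced before; users who need different thresholds can uncomment and edit them.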

@omerap12 omerap12 force-pushed the issue_1730 branch 4 times, most recently from 92fe52e to 1dfe9fe Compare July 19, 2024 13:56
@omerap12 omerap12 requested a review from marquiz July 19, 2024 13:56
Contributor

@marquiz marquiz left a comment

Could you add support for nfd-worker, too, to make this comprehensive and consistent?

Also, adjust the commit message (and the PR title) accordingly, e.g. "helm: add support for configuring liveness and readiness probes" or something similar.

@omerap12 omerap12 changed the title Add liveness&readiness probes for NFD master helm: add liveness&readiness probes for nfd master,topologyUpdater and nfd worker Jul 22, 2024
@omerap12 omerap12 requested a review from marquiz July 22, 2024 07:19
Contributor

@marquiz marquiz left a comment

While we're at it, I suggest rewording and fixing the spelling of the commit message (and PR title):

helm: add configurable liveness&readiness probes for master topology-updater and worker

EDIT: *configurable

@omerap12 omerap12 changed the title helm: add liveness&readiness probes for nfd master,topologyUpdater and nfd worker helm: add configurable liveness&readiness probes for master topology-updater and worker Jul 22, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 22, 2024
@omerap12
Member Author

Added some alignment, maybe that ok? @marquiz

@ArangoGutierrez
Contributor

> Added some alignment, maybe that ok? @marquiz

Looks fine now to me, let's wait for @marquiz, and we can move forward with this PR

@marquiz
Contributor

marquiz commented Jul 22, 2024

> Added some alignment, maybe that ok? @marquiz

Looks good to me 👍

…updater and worker

Signed-off-by: Omer Aplatony <omerap12@gmail.com>
Contributor

@marquiz marquiz left a comment

Thank you @omerap12 for the persistence on this. Nice job, looks good to me
/cherry-pick release-0.16

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 23, 2024
@marquiz
Contributor

marquiz commented Jul 23, 2024

ping @ArangoGutierrez

/cherry-pick release-0.16

@k8s-infra-cherrypick-robot

@marquiz: once the present PR merges, I will cherry-pick it on top of release-0.16 in a new PR and assign it to you.

In response to this:

ping @ArangoGutierrez

/cherry-pick release-0.16

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Contributor

@ArangoGutierrez ArangoGutierrez left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 23, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 6b763b4390cc42d8e7b658b9b688132da1e37658

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ArangoGutierrez, marquiz, omerap12

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ArangoGutierrez,marquiz]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 493aa0c into kubernetes-sigs:master Jul 23, 2024
9 checks passed
@k8s-infra-cherrypick-robot

@marquiz: #1801 failed to apply on top of branch "release-0.16":

Applying: helm: add configurable liveness&readiness probes for master topology-updater and worker
.git/rebase-apply/patch:86: trailing whitespace.
                
.git/rebase-apply/patch:142: trailing whitespace.
  
warning: 2 lines add whitespace errors.
Using index info to reconstruct a base tree...
M	deployment/helm/node-feature-discovery/templates/master.yaml
M	deployment/helm/node-feature-discovery/templates/topologyupdater.yaml
M	deployment/helm/node-feature-discovery/templates/worker.yaml
M	deployment/helm/node-feature-discovery/values.yaml
M	docs/deployment/helm.md
Falling back to patching base and 3-way merge...
Auto-merging docs/deployment/helm.md
CONFLICT (content): Merge conflict in docs/deployment/helm.md
Auto-merging deployment/helm/node-feature-discovery/values.yaml
Auto-merging deployment/helm/node-feature-discovery/templates/worker.yaml
Auto-merging deployment/helm/node-feature-discovery/templates/topologyupdater.yaml
Auto-merging deployment/helm/node-feature-discovery/templates/master.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 helm: add configurable liveness&readiness probes for master topology-updater and worker
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

ping @ArangoGutierrez

/cherry-pick release-0.16

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@marquiz
Contributor

marquiz commented Jul 23, 2024

Cherry-picked in #1808

Linked issue: Make readiness and liveness probes configurable (#1730)