
Documentation for ignore-taint #5251

@hterik

Description

Which component are you using?:
Cluster Autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
I would like to understand how to best use ignore-taint.
Both the command-line argument --ignore-taint and the taint-key prefix ignore-taint.cluster-autoscaler.kubernetes.io/.
See below for exact use case.

Describe the solution you'd like.:
Behaviour, use cases and examples for ignore-taint explained in cluster-autoscaler/FAQ.md

Describe any alternative solutions you've considered.:
go run main.go --help, which gives only a very brief explanation.

Reading the source code. It gets quite hairy for an outsider, especially once you hit the ClusterSnapshot stuff.

Digging through issues:

  • Here is where I found the best explanation so far (see the sketch after this list): Tainting node using ignored taint causes node groups to become unhealthy #3985 (comment)

    The only use-case supported by ignore-taint is one where a node requires additional custom initialization (ex. installing drivers, starting some DS) before it can accept pods. In this case node can be started with ignore-taint and the taint can be removed once the initialization is done.

    As long as the node has an ignore-taint CA will treat it as still booting up / unready. This is needed to avoid infinite scale-up (node is created with ignore-taint, pods remain pending, CA immediately triggers scale-up again, this cycle repeats until the taint is removed from some nodes). Since a node with ignore-taint is unready as far as CA is concerned, once there are enough such nodes in the NodeGroup CA will start treating the NG as unhealthy (thresholds are controlled by --max-total-unready-percentage, --ok-total-unready-count).

  • PR describing use case waiting for DaemonSets to init: cluster-autoscaler/taints: ignore taints on existing nodes #2758

  • PR adding annotation-prefix: Remove taints with specific prefix from template node #2733

  • PR adding --ignore-taints flag: Add support for passing custom ignored labels #2493

    for example if they have their own readiness labels or other markings that should not affect balancing behaviour.
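
Putting those pieces together, my current mental model is roughly the sketch below (the node name, the key suffix cache-prepared, and the value are made up by me; only the ignore-taint.cluster-autoscaler.kubernetes.io/ prefix itself comes from #2733): a node boots carrying a taint under that prefix, CA strips it when comparing the node against its node-group template, and whatever does the initialization removes the taint once it is done.

```yaml
# Sketch only: a freshly booted node carrying a taint that CA should ignore
# when comparing the node against its node-group template. The key prefix is
# the one from #2733; the node name, key suffix and value are hypothetical.
apiVersion: v1
kind: Node
metadata:
  name: example-node-1
spec:
  taints:
  - key: ignore-taint.cluster-autoscaler.kubernetes.io/cache-prepared
    value: "false"
    effect: NoSchedule
```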


Additional context.:

To explain the situation I'm trying to solve:
We have batch-oriented pods that need to run on nodes with a pre-warmed cache in a shared hostPath; network PVs don't work for this. Each batch takes around 10 minutes to process, while pre-baking the cache takes 30 minutes. Each warm node can handle 3 such pods at a time.
Imagine you have one node N1, already serving 3 pods, P1, P2, P3. Now a fourth pod, P4, is added to the scheduler queue. The autoscaler sees that one pod is unschedulable and starts a new node N2.
Now N2 will not be able to serve the fourth pod until it has prepared its cache, which will take 30 minutes, but we also know that any of P1, P2, P3 will finish in only 10, so when that happens it's better to put P4 onto N1 while N2 is still warming up. (The load pattern is such that even after this, N2 will not be redundant, because the queue is under consistent pressure from additional pods P5, P6, P7, ...)

To do this, I was imagining one could add a taint to the NodePool, CachePrepared=false:NoSchedule. A DaemonSet tolerating this taint would start on the node, prepare the cache, and then remove the taint. All other pods would be blocked from scheduling on the node until the taint is removed.
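
For illustration, a minimal sketch of that DaemonSet (the names, image, and hostPath are all hypothetical; the load-bearing part is the toleration matching the CachePrepared=false:NoSchedule taint, and the warmer would additionally need RBAC to patch the Node when it removes the taint):

```yaml
# Sketch only: the cache-warmer DaemonSet tolerates the CachePrepared taint,
# so it is the only workload that can land on the node before the cache is
# ready. All names and the image are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cache-warmer
spec:
  selector:
    matchLabels:
      app: cache-warmer
  template:
    metadata:
      labels:
        app: cache-warmer
    spec:
      tolerations:
      - key: CachePrepared
        operator: Equal
        value: "false"
        effect: NoSchedule
      containers:
      - name: warm
        # Placeholder image: bakes the cache, then removes the taint from its node.
        image: example.com/cache-warmer:latest
        volumeMounts:
        - name: cache
          mountPath: /cache
      volumes:
      - name: cache
        hostPath:
          path: /var/cache/batch  # the shared hostPath mentioned above
```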

From my understanding, --ignore-taint CachePrepared would solve the problem of creating N2 even though it cannot schedule P4 according to its template. What I'm unsure about is what happens after that: will the autoscaler understand that N2 is upcoming, or will it still consider P4 unschedulable and allocate a new N3?

If it makes any difference, we are using AKS with the built-in autoscaler.
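
For completeness, on a self-managed cluster-autoscaler I picture the flag being passed roughly as in the sketch below (the image tag and the azure wiring are assumptions on my part; with the AKS managed autoscaler I expect the equivalent would have to be configured through the AKS API instead):

```yaml
# Sketch only: fragment of a self-managed cluster-autoscaler Deployment
# showing where --ignore-taint would go. Tag and flags are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.25.0  # placeholder tag
        command:
        - ./cluster-autoscaler
        - --cloud-provider=azure        # assumption: AKS through the azure provider
        - --ignore-taint=CachePrepared  # the taint key from the scenario above
```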
