
Pod spec equivalency checks can break Cluster Autoscaler scalability #4724

@towca

The logic in buildPodEquivalenceGroups and filterOutSchedulable groups pods by their scheduling requirements as a scalability optimization. This is done by first grouping pods by their controller's UID, and then comparing pod specs between pods from the same controller. If some field in the pod spec is unique per pod within a controller, every pod ends up in a group of its own and the optimization breaks, as sketched below.
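
For illustration, here is a minimal Go sketch of that grouping idea. This is not the actual CA code: `equivalenceGroup` and `buildGroups` are illustrative names, and plain `reflect.DeepEqual` stands in for the real spec comparison.

```go
// A minimal sketch of the grouping idea, assuming the k8s.io/api and
// k8s.io/apimachinery modules; names here are illustrative, not the CA code.
package sketch

import (
	"reflect"

	apiv1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

type equivalenceGroup struct {
	representative *apiv1.Pod // first pod seen with this spec
	pods           []*apiv1.Pod
}

// buildGroups buckets pods by controller UID, then by spec equality within
// each bucket. Plain DeepEqual stands in for PodSpecSemanticallyEqual here.
func buildGroups(pods []*apiv1.Pod) map[types.UID][]*equivalenceGroup {
	groups := map[types.UID][]*equivalenceGroup{}
	for _, pod := range pods {
		ref := metav1.GetControllerOf(pod)
		if ref == nil {
			continue // pods without a controller are handled individually
		}
		matched := false
		for _, g := range groups[ref.UID] {
			if reflect.DeepEqual(g.representative.Spec, pod.Spec) {
				g.pods = append(g.pods, pod)
				matched = true
				break
			}
		}
		// A per-pod unique field means no existing group ever matches, so
		// every pod lands here and the group count grows with the pod count.
		if !matched {
			groups[ref.UID] = append(groups[ref.UID], &equivalenceGroup{
				representative: pod,
				pods:           []*apiv1.Pod{pod},
			})
		}
	}
	return groups
}
```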

In extreme cases with a lot of such pods (a couple thousand can be enough), CA can spend so long in a single loop iteration that it fails its health checks and is killed by the kubelet. Everything then repeats once it comes back up, and CA is effectively broken until the pods are scheduled or deleted.

One trigger for pod specs being different is the BoundServiceAccountTokenVolume feature, which injects uniquely-named projected volumes into each pod's spec. This was taken into account by CA in #4441.
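The fix there, roughly, was to ignore the injected volumes when comparing specs. A simplified sketch of that idea, continuing the code above (the helper name is mine, and the real patch in #4441 is more careful):

```go
// Simplified sketch of the #4441 idea: drop projected volumes (the shape
// BoundServiceAccountTokenVolume injects, with unique per-pod names) from a
// copy of the spec before comparison. The helper name is illustrative.
func sanitizeProjectedVolumes(spec apiv1.PodSpec) apiv1.PodSpec {
	sanitized := *spec.DeepCopy()
	kept := sanitized.Volumes[:0]
	for _, v := range sanitized.Volumes {
		if v.Projected == nil {
			kept = append(kept, v)
		}
	}
	sanitized.Volumes = kept
	return sanitized
}
```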

We've just run into another trigger: Jobs using completionMode: Indexed. In this mode, each pod gets a unique, indexed hostname in its spec. This is documented here: https://kubernetes.io/docs/concepts/workloads/controllers/job/#completion-mode. AFAIU the hostname shouldn't affect scheduling, so sanitizing it in PodSpecSemanticallyEqual should be enough to fix this particular issue.
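If that's right, the fix could look something like the sketch below: blank the Hostname on deep copies of both specs before the semantic comparison. The function name and placement are my assumptions, not the actual patch.

```go
// Uses apiequality "k8s.io/apimachinery/pkg/api/equality".
// Illustrative stand-in for the extra sanitization PodSpecSemanticallyEqual
// would need: the indexed-Job hostname is cleared on copies before
// comparing, so it can no longer split pods into separate groups.
func specsEqualIgnoringHostname(a, b apiv1.PodSpec) bool {
	sa, sb := *a.DeepCopy(), *b.DeepCopy()
	sa.Hostname, sb.Hostname = "", ""
	return apiequality.Semantic.DeepEqual(sa, sb)
}
```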

However, this approach of "fixing" individual fields as issues pop up doesn't scale well. We should come up with a more generic solution to this class of problem. One idea could be a cutoff on the number of groups within one controller, as proposed in #4441 (comment); see the sketch after this paragraph.
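
To make that concrete, here is one shape such a cutoff could take, continuing the sketch above. The constant and the fallback behaviour are purely illustrative, not a settled design.

```go
// Purely illustrative cutoff: once a controller has accumulated this many
// distinct groups, stop spec-matching and treat further unmatched pods as
// unique, capping the comparison cost per pod instead of letting it grow
// with the number of stuck pods.
const maxGroupsPerController = 10

func groupFor(groups []*equivalenceGroup, pod *apiv1.Pod) (*equivalenceGroup, bool) {
	for _, g := range groups {
		if reflect.DeepEqual(g.representative.Spec, pod.Spec) {
			return g, true
		}
	}
	if len(groups) >= maxGroupsPerController {
		return nil, false // cutoff hit: fall back to per-pod handling
	}
	return &equivalenceGroup{representative: pod, pods: []*apiv1.Pod{pod}}, true
}
```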
