Cluster Autoscaler 1.25.0

towca released this 06 Sep 08:44


cluster-autoscaler-1.25.0

e8d3e9b

Changelog

General

Parallel node drain for scale down is partially implemented and is controlled by the --max-scale-down-parallelism and --max-drain-parallelism flags. The feature is still under development and is not recommended for production use. Currently, setting --max-drain-parallelism to a value greater than 1 is a no-op, while --max-scale-down-parallelism can be used instead of the existing --max-empty-bulk-delete flag to control the maximum number of empty nodes deleted at the same time. The full feature is targeting Cluster Autoscaler 1.26 (tracking issue).
Introduces the --max-pod-eviction-time flag to configure MaxPodEvictionTime, i.e., the maximum time Cluster Autoscaler keeps trying to evict a pod (#4842).
Backoff time parameters are now configurable via the --initial-node-group-backoff-duration, --max-node-group-backoff-duration, and --node-group-backoff-reset-timeout flags (#3853).
Event de-duplication is now configurable via the --enable-event-duplication flag (#4921).
Fix an issue where CA could drastically overshoot scale-up for pods using zonal scheduling constraints (PodTopologySpreading or PodAntiAffinity on zonal topology) (#4970).
Limit the maximum duration of the binpacking simulation to prevent CA from becoming unresponsive in huge scale-up scenarios. Introduce the --max-nodes-per-scaleup and --max-nodegroup-binpacking-duration flags to control this behavior (note: these flags are only meant for fine-tuning scale-up calculation latency; they are not intended for rate-limiting scale-up) (#4970).
The PodDisruptionBudget API version is bumped from policy/v1beta1 to policy/v1 (#4990).
Cluster Autoscaler's base image now runs as a non-root user (#4728).
Add an option to balance node groups exclusively by a set of labels defined by the --balancing-label flag (#4174).
A new metric (cluster_autoscaler_skipped_scale_events_count) has been added to count scale events skipped because CPU or memory resource limits were exceeded (#5059).
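The three backoff flags listed above (--initial-node-group-backoff-duration, --max-node-group-backoff-duration, --node-group-backoff-reset-timeout) can be pictured with the minimal Python sketch below. It is a simplified model under stated assumptions, not Cluster Autoscaler's own code: the class NodeGroupBackoff and its methods are hypothetical, times are plain seconds, each failed scale-up doubles the backoff window up to the maximum, and the window starts over from the initial duration once the reset timeout has passed since the last failure.

```python
class NodeGroupBackoff:
    """Hypothetical model of exponential backoff for a single node group.

    The constructor arguments stand in for --initial-node-group-backoff-duration,
    --max-node-group-backoff-duration and --node-group-backoff-reset-timeout.
    All times are plain numbers of seconds (an assumption of this sketch).
    """

    def __init__(self, initial, max_backoff, reset_timeout):
        self.initial = initial
        self.max_backoff = max_backoff
        self.reset_timeout = reset_timeout
        self.duration = 0          # length of the current backoff window
        self.last_failure = None   # time of the most recent failed scale-up

    def record_failure(self, now):
        """Register a failed scale-up at `now`; return when the backoff ends."""
        if self.last_failure is None or now - self.last_failure > self.reset_timeout:
            # First failure, or no failure for longer than the reset timeout: start over.
            self.duration = self.initial
        else:
            # Repeated failure: double the window, but never exceed the maximum.
            self.duration = min(self.duration * 2, self.max_backoff)
        self.last_failure = now
        return now + self.duration

    def is_backed_off(self, now):
        """True while the node group should be skipped for scale-up."""
        return self.last_failure is not None and now < self.last_failure + self.duration


if __name__ == "__main__":
    # Illustrative values only: 5m initial, 30m maximum, 3h reset timeout.
    backoff = NodeGroupBackoff(initial=300, max_backoff=1800, reset_timeout=10800)
    for t in (0, 400, 1100, 2400):
        print(t, backoff.record_failure(t))
```

With these illustrative values, successive failures back the node group off for 300, 600, 1200, and then 1800 seconds, where the last window is capped by the maximum.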
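Balancing node groups exclusively by the labels given via --balancing-label, as described in the list above, amounts to treating two node groups as similar only when their values for every balancing label match. The Python sketch below illustrates that grouping rule under stated assumptions: the function group_by_balancing_labels and its input shape (a mapping from node-group name to node labels) are hypothetical, and a missing label is treated as None, which may differ from how Cluster Autoscaler itself compares absent labels.

```python
from collections import defaultdict


def group_by_balancing_labels(node_groups, balancing_labels):
    """Partition node groups into sets that would be balanced together.

    node_groups: hypothetical input mapping node-group name -> labels of its nodes.
    balancing_labels: label keys, as if supplied via --balancing-label.
    Two node groups land in the same set only when every balancing label has
    the same value on both; no other node properties are compared.
    """
    buckets = defaultdict(list)
    for name, labels in node_groups.items():
        # A missing label is represented as None (an assumption of this sketch).
        key = tuple(labels.get(label) for label in balancing_labels)
        buckets[key].append(name)
    return [sorted(names) for names in buckets.values()]


if __name__ == "__main__":
    node_groups = {
        "ng-gpu-a": {"pool": "gpu", "zone": "a"},
        "ng-gpu-b": {"pool": "gpu", "zone": "b"},
        "ng-cpu-a": {"pool": "cpu", "zone": "a"},
    }
    print(group_by_balancing_labels(node_groups, ["pool"]))
    # → [['ng-gpu-a', 'ng-gpu-b'], ['ng-cpu-a']]
```

Adding "zone" to the balancing labels in this example would split every node group into its own set, since no two groups share both values.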

GCE

Correct memory and ephemeral storage capacity calculations for ARM instances (#4899).
Add ephemeral storage pricing (#4911).
Correct invalid pricing for n2-highmem-128 and the n2d family (#4959).
Fix support for unusual custom machine types (from families other than n1, or using extended memory) (#5024, #5103).
Make VM_EXTERNAL_IP_ACCESS_POLICY_CONSTRAINT error code recognizable (#5057).
Add pricing for new A2 shapes and GPUs (#5070).

AWS

Support for NVIDIA A10G GPU type added (#4920).
Instance type list is updated, including c7g, i4i, x2i(e)dn, c6id, m6id (#4917, #4925).
Cluster Autoscaler can still work if instance type listing fails (#4873).
DescribeAutoScalingGroups now supports including tag filters directly, which results in fewer API calls to AWS. Users of the AWS cloud provider may want to update their IAM roles to remove the DescribeTags action, as it is no longer used.
Add support for attribute-based instance type selection for AWS using available instance requirements (#4588).

Azure

Update instance types, including Standard_Ls_v3, Standard_HB120, and Standard_NC (#5037).
Effectively cache instance-type SKUs (#5047).

Hetzner

Add support for hcloud firewall feature (#4185).
Add Hetzner public IPv4 and IPv6 configuration (#5001).
Add metrics for API calls (#5049).
Cache Hetzner Cloud API requests (#5055).

Cluster API

Drop deprecated annotations (#4928).
Add support for scaling to and from zero nodes (#4840). Enabling this feature requires changes by the user; for instructions, see the Cluster API (clusterapi) provider README file.

OVHcloud

Various bug fixes (#4874).

OCI

Support for skipping time-consuming findsInstanceByDetails API calls, turned off by default (#4860).

External gRPC

A proxy cloud provider that enables pluggable, out-of-tree cloud provider implementations over gRPC is implemented (#4654).

CherryServers

Cluster Autoscaler support for CherryServers is implemented (#4843).
Support for adding SSH keys to node pools (#4867).
Support for passing the OS partition size when creating nodes (#4955).

Civo

Cluster Autoscaler support for Civo is implemented (#4852).

Scaleway

Cluster Autoscaler support for Scaleway is implemented (#5062).

Rancher

Cluster Autoscaler support for Rancher with RKE2 is implemented (#4975).

Kamatera

Cluster Autoscaler support for Kamatera is implemented (#5101).

Images

k8s.gcr.io/autoscaling/cluster-autoscaler:v1.25.0
k8s.gcr.io/autoscaling/cluster-autoscaler-arm64:v1.25.0
k8s.gcr.io/autoscaling/cluster-autoscaler-amd64:v1.25.0