Cluster Autoscaler 1.25.0
Changelog
General
- Parallel node drain for scale down is partially implemented, controlled by the --max-scale-down-parallelism and --max-drain-parallelism flags. The feature is still under development and not recommended for production use. Currently, setting --max-drain-parallelism to >1 is a no-op, while --max-scale-down-parallelism can be used to control the maximum number of empty nodes deleted at the same time instead of the existing --max-empty-bulk-delete flag. The full feature is targeting Cluster Autoscaler 1.26 (tracking issue).
- Introduces the --max-pod-eviction-time flag to allow configuration of MaxPodEvictionTime, i.e., the maximum time the cluster autoscaler tries to evict a pod (#4842).
- Backoff time parameters are now configurable via the --initial-node-group-backoff-duration, --max-node-group-backoff-duration, and --node-group-backoff-reset-timeout flags (#3853).
- Event de-duplication is now configurable via the --enable-event-duplication flag (#4921).
- Fix an issue where CA could drastically overshoot scale-up for pods using zonal scheduling constraints (PodTopologySpreading or PodAntiAffinity on zonal topology) (#4970).
- Limit maximum duration of binpacking simulation to prevent CA becoming unresponsive in huge scale-up scenarios. Introduce --max-nodes-per-scaleup and --max-nodegroup-binpacking-duration, which can be used to control this behavior (note: those flags are only meant for fine-tuning scale-up calculation latency; they're not intended for rate-limiting scale-up) (#4970).
- PodDisruptionBudget is bumped from v1beta1 to v1 (#4990).
- Non-root user is now used for Cluster Autoscaler's base image (#4728).
- Add an option to balance node groups exclusively by a set of labels defined by the --balancing-label flag (#4174).
- A new metric (cluster_autoscaler_skipped_scale_events_count) has been added to monitor when CPU and memory resource limits have been exceeded (#5059).
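As a rough illustration of how the new tuning flags listed above might be combined, the following Python sketch renders a flag mapping into command-line arguments. The flag names come from this changelog; every value shown is an illustrative assumption rather than a recommendation, and the build_args helper is hypothetical.

```python
# Flag names are taken from the changelog above.
# All values are illustrative assumptions, not recommendations.
flag_values = {
    "max-scale-down-parallelism": "10",
    "max-pod-eviction-time": "2m",
    "initial-node-group-backoff-duration": "5m",
    "max-node-group-backoff-duration": "30m",
    "node-group-backoff-reset-timeout": "3h",
    "max-nodes-per-scaleup": "1000",
    "max-nodegroup-binpacking-duration": "10s",
}


def build_args(values):
    """Render a flag mapping as '--name=value' strings, sorted by flag name."""
    return [f"--{name}={value}" for name, value in sorted(values.items())]


args = build_args(flag_values)
for arg in args:
    print(arg)
```

The resulting list could be pasted into the args of a container spec or a launch script; check each value against the flag documentation for the deployed version before use.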
GCE
- Correct memory and ephemeral storage capacity calculations for ARM instances (#4899).
- Add ephemeral storage pricing (#4911).
- Correct invalid pricing for n2-highmem-128 and the n2d family (#4959).
- Fix support for unusual custom machine types (from families other than n1, or using extended memory) (#5024, #5103).
- Make VM_EXTERNAL_IP_ACCESS_POLICY_CONSTRAINT error code recognizable (#5057).
- Add pricing for new A2 shapes and GPUs (#5070).
AWS
- Support for NVIDIA A10G GPU type added (#4920).
- Instance type list is updated, including c7g, i4i, x2i(e)dn, c6id, m6id (#4917, #4925).
- Cluster Autoscaler can still work if instance type listing fails (#4873).
- DescribeAutoScalingGroups now supports directly including tag filters, which results in fewer API calls to AWS. Users of the AWS cloud provider may want to update their IAM roles to remove the DescribeTags action, as it is no longer used.
- Add support for attribute-based instance type selection for AWS using available instance requirements (#4588).
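The IAM cleanup mentioned above can be sketched in Python as a small transformation on a policy document. This is a minimal sketch under stated assumptions: the fully qualified action name autoscaling:DescribeTags is assumed (the changelog only says DescribeTags), the example policy is hypothetical, and drop_action is an illustrative helper, not part of any AWS SDK.

```python
import copy


def drop_action(policy, action):
    """Return a copy of an IAM policy document with `action` removed.

    Statements whose Action list becomes empty are dropped; statements
    without an Action key (for example, NotAction) are left untouched.
    """
    result = copy.deepcopy(policy)
    kept = []
    for stmt in result.get("Statement", []):
        if "Action" not in stmt:
            kept.append(stmt)
            continue
        actions = stmt["Action"]
        if isinstance(actions, str):
            actions = [actions]  # IAM allows a single action as a bare string
        remaining = [a for a in actions if a != action]
        if remaining:
            stmt["Action"] = remaining
            kept.append(stmt)
    result["Statement"] = kept
    return result


# Hypothetical policy for illustration only.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeTags",  # assumed full name of DescribeTags
                "autoscaling:SetDesiredCapacity",
            ],
            "Resource": "*",
        }
    ],
}

trimmed = drop_action(example_policy, "autoscaling:DescribeTags")
print(trimmed["Statement"][0]["Action"])
```

The helper works on a copy, so the original document stays available for comparison before any change is applied to a real role.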
Azure
- Update instance types, including Standard_Ls_v3, Standard_HB120, and Standard_NC (#5037).
- Effectively cache instance-types SKUs (#5047).
Hetzner
- Add support for hcloud firewall feature (#4185).
- Add Hetzner public IPv4 and IPv6 configuration (#5001).
- Add metrics for API calls (#5049).
- Cache Hetzner Cloud API requests (#5055).
Cluster API
- Drop deprecated annotations (#4928).
- Add support for scaling to and from zero nodes (#4840). Enabling this feature requires changes by the user; for instructions, see the Cluster API (clusterapi) provider README file.
OVHcloud
- Various bug fixes (#4874).
OCI
- Support for skipping time-consuming findsInstanceByDetails API calls, turned off by default (#4860).
External gRPC
- Proxy cloud provider for pluggable out-of-tree cloud provider implementations over gRPC is implemented (#4654).
CherryServers
- Cluster Autoscaler support for CherryServers is implemented (#4843).
- Support for including SSH keys to node pools (#4867).
- Support for passing OS partition size when creating nodes (#4955).
Civo
- Cluster Autoscaler support for Civo is implemented (#4852).
Scaleway
- Cluster Autoscaler support for Scaleway is implemented (#5062).
Rancher
- Cluster Autoscaler support for Rancher with RKE2 is implemented (#4975).
Kamatera
- Cluster Autoscaler support for Kamatera is implemented (#5101).
Images
- k8s.gcr.io/autoscaling/cluster-autoscaler:v1.25.0
- k8s.gcr.io/autoscaling/cluster-autoscaler-arm64:v1.25.0
- k8s.gcr.io/autoscaling/cluster-autoscaler-amd64:v1.25.0