@yaroslava-serdiuk
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

Added a label to the outer loop and break out of that loop when the timer fires. Previously CA did nothing on timeout, so the scale-down simulation had no time limit.

I also changed the default value of the scaleDownSimulationTimeout flag: 30 seconds is enough to process ~1000 non-empty nodes, and at the same time the cluster snapshot does not become as stale as it would with a 5-minute timeout.
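For readers less familiar with the Go detail behind this fix: a bare `break` inside a `select` only leaves the `select`, so a surrounding `for` loop keeps running. The following is a minimal sketch, not the autoscaler code; the function names, the non-blocking `default` case, and the durations are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// simulateUntilTimeout is a hypothetical stand-in for the simulation loop.
// It returns how many nodes were processed before the timer fired.
func simulateUntilTimeout(nodes []string, timeout time.Duration, work func(string)) int {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	processed := 0
RemovalSimulation:
	for _, node := range nodes {
		select {
		case <-timer.C:
			// A bare "break" here would only leave the select statement,
			// and the for loop would carry on with the next node.
			break RemovalSimulation
		default:
			// Timer has not fired yet: fall through and process the node.
		}
		work(node)
		processed++
	}
	return processed
}

// demo runs the loop with deliberately slow work so the timeout is hit early.
func demo() (processed, total int) {
	nodes := []string{"n1", "n2", "n3", "n4", "n5"}
	slow := func(string) { time.Sleep(20 * time.Millisecond) }
	return simulateUntilTimeout(nodes, 30*time.Millisecond, slow), len(nodes)
}

func main() {
	processed, total := demo()
	fmt.Printf("processed %d of %d nodes before the timeout\n", processed, total)
}
```

With the label removed, the same loop would process all five nodes regardless of the timer.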

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 28, 2023
@k8s-ci-robot k8s-ci-robot requested a review from x13n February 28, 2023 19:22
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 28, 2023

// Contains returns true iff a given node is unremovable.
func (n *Nodes) Contains(nodeName string) bool {
_, found := n.ttls[nodeName]
Member

This works for nodes added with AddTimeout, but not for Add/AddReason. I think checking n.reasons would actually deliver on the promise from the function comment. Also, since you're extending the public interface of this struct, it might be worth adding tests verifying that nodes added via any of the Add* methods can then be checked for presence using Contains.
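The point of this comment can be sketched with a simplified tracker. Everything below is an assumption for illustration: the struct layout, the method signatures, and the idea that every Add* method writes to `reasons` while only `AddTimeout` writes to `ttls` are inferred from the comment, not copied from the real code.

```go
package main

import (
	"fmt"
	"time"
)

// Nodes is a simplified, hypothetical stand-in for the tracker under review.
type Nodes struct {
	reasons map[string]string    // every unremovable node, keyed by name
	ttls    map[string]time.Time // only nodes added with an expiry
}

func NewNodes() *Nodes {
	return &Nodes{reasons: map[string]string{}, ttls: map[string]time.Time{}}
}

// Add records a node without an expiry.
func (n *Nodes) Add(name, reason string) {
	n.reasons[name] = reason
}

// AddTimeout records a node together with an expiry time.
func (n *Nodes) AddTimeout(name, reason string, until time.Time) {
	n.reasons[name] = reason
	n.ttls[name] = until
}

// Contains reports whether the node is tracked as unremovable.
// Looking in reasons covers both Add and AddTimeout; looking in ttls
// would miss nodes recorded through Add.
func (n *Nodes) Contains(name string) bool {
	_, found := n.reasons[name]
	return found
}

// demoNodes builds a tracker with one node per Add* method.
func demoNodes() *Nodes {
	n := NewNodes()
	n.Add("plain", "example reason")
	n.AddTimeout("expiring", "example reason", time.Now().Add(time.Minute))
	return n
}

func main() {
	n := demoNodes()
	fmt.Println(n.Contains("plain"), n.Contains("expiring"), n.Contains("never-added"))
}
```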

}
p.nodeUtilizationMap = utilizationMap
timer := time.NewTimer(p.context.ScaleDownSimulationTimeout)
RemovalSimulation:
Member

nit, optional: I'd consider extracting an extra function (e.g. timedOut(time.Timer) bool) instead of introducing the loop label. I'm a fan of short functions :)

Contributor Author

Not sure I understand your proposal. Do you mean extracting the whole loop into another function, or giving the for loop an additional condition, timedOut(time.Timer)?

Contributor Author

In my opinion, extracting the whole loop into a function would reduce readability, because the function would have a lot of parameters.

Member

Yeah, extracting the loop might not be very readable. I was thinking about replacing the whole select statement with:

if timedOut(timer) {
    break
}

to avoid the need for a label.
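One way the suggested helper could look is sketched below. This is an illustration under assumptions, not code from the PR: the helper takes a `*time.Timer` (timers are normally passed by pointer), and the surrounding loop and durations are invented for the demo.

```go
package main

import (
	"fmt"
	"time"
)

// timedOut reports whether the timer has fired, without blocking.
// It consumes the value from timer.C, so it returns true only once.
func timedOut(timer *time.Timer) bool {
	select {
	case <-timer.C:
		return true
	default:
		return false
	}
}

// processUntilTimeout is a hypothetical loop; perNode simulates the cost
// of one removal simulation.
func processUntilTimeout(total int, timeout, perNode time.Duration) int {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	processed := 0
	for i := 0; i < total; i++ {
		if timedOut(timer) {
			break // a plain break suffices: no select encloses it here
		}
		time.Sleep(perNode)
		processed++
	}
	return processed
}

// demo runs the loop with work that is slow relative to the timeout.
func demo() (processed, total int) {
	total = 5
	return processUntilTimeout(total, 30*time.Millisecond, 20*time.Millisecond), total
}

func main() {
	processed, total := demo()
	fmt.Printf("processed %d of %d nodes before the timeout\n", processed, total)
}
```

Because the helper drains the channel, it fits a loop that stops on the first `true`; code that needs to ask repeatedly would have to remember the result itself.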

case <-timer.C:
	klog.Warningf("%d out of %d nodes skipped in scale down simulation due to timeout.", len(currentlyUnneededNodeNames)-i, len(currentlyUnneededNodeNames))
-	break
+	break RemovalSimulation
Member

WDYT about testing if the timeout is actually honored?

Contributor Author

That requires a timer mock, because even with ScaleDownSimulationTimeout=0 seconds, the channel still sends its signal later than the loop finishes processing a small number of nodes.
I don't think we want to move the timer into categorizeNodes() variables. Do you have other ideas for how to mock the timer?

Member

I was thinking of using a real timer instead, but mocking removalSimulator to make sure it is not called after the timeout.
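A rough sketch of that idea follows: keep the real timer, and substitute a fake simulator that is slow and counts its calls, so the check becomes "fewer calls than nodes". The interface, its `Simulate` method, the `categorize` function, and all durations are hypothetical and only stand in for the real types.

```go
package main

import (
	"fmt"
	"time"
)

// removalSimulator is a hypothetical interface standing in for the real one.
type removalSimulator interface {
	Simulate(node string)
}

// slowFakeSimulator counts calls and sleeps so each call is measurably slow.
type slowFakeSimulator struct {
	delay time.Duration
	calls int
}

func (f *slowFakeSimulator) Simulate(string) {
	f.calls++
	time.Sleep(f.delay)
}

// categorize is a toy version of the loop under test: it checks a real
// timer before each node and stops once the timer has fired.
func categorize(nodes []string, sim removalSimulator, timeout time.Duration) {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	for _, node := range nodes {
		select {
		case <-timer.C:
			return
		default:
		}
		sim.Simulate(node)
	}
}

// demo runs the toy loop over many slow nodes with a short timeout and
// returns how often the fake was called.
func demo() (calls, total int) {
	nodes := make([]string, 100)
	for i := range nodes {
		nodes[i] = fmt.Sprintf("node-%d", i)
	}
	fake := &slowFakeSimulator{delay: 5 * time.Millisecond}
	categorize(nodes, fake, 20*time.Millisecond)
	return fake.calls, len(nodes)
}

func main() {
	calls, total := demo()
	fmt.Printf("simulator called %d times for %d nodes\n", calls, total)
}
```

Asserting only an upper bound on the call count keeps the check independent of exactly when the timer fires.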

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 1, 2023
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 1, 2023
}
}

func TestContains(t *testing.T) {
Member

Thanks for the test! It would be good to have a test case for a node that wasn't added. Right now it seems the test checks whether Contains always returns true :)
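A table with both a positive and a negative case might look like the sketch below. The `tracker` type and its methods are hypothetical stand-ins, and failures are collected in a slice so the snippet runs on its own; in a real `_test.go` file the loop body would call `t.Errorf` instead.

```go
package main

import "fmt"

// tracker is a hypothetical, minimal stand-in for the struct under test.
type tracker struct {
	reasons map[string]string
}

func (t *tracker) add(name, reason string) { t.reasons[name] = reason }

func (t *tracker) contains(name string) bool {
	_, found := t.reasons[name]
	return found
}

// containsFailures runs the table and returns one message per failed case.
func containsFailures() []string {
	tr := &tracker{reasons: map[string]string{}}
	tr.add("added-node", "example reason")

	cases := []struct {
		desc string
		node string
		want bool
	}{
		{desc: "node that was added", node: "added-node", want: true},
		// Without this case, a contains that always returns true would pass.
		{desc: "node that was never added", node: "other-node", want: false},
	}

	var failures []string
	for _, tc := range cases {
		if got := tr.contains(tc.node); got != tc.want {
			failures = append(failures,
				fmt.Sprintf("%s: contains(%q) = %v, want %v", tc.desc, tc.node, got, tc.want))
		}
	}
	return failures
}

func main() {
	fmt.Println("failed cases:", len(containsFailures()))
}
```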

@yaroslava-serdiuk yaroslava-serdiuk changed the title from "Fix RemovalSimulation" to "Fix RemovalSimulation for parallel scale down" Mar 1, 2023
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 1, 2023
@yaroslava-serdiuk
Contributor Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 1, 2023
p := New(&context, NewTestProcessors(&context), deleteOptions)
p.eligibilityChecker = &fakeEligibilityChecker{eligible: asMap(tc.eligible)}
if tc.isSimulationTimeout {
context.AutoscalingOptions.ScaleDownSimulationTimeout = 1 * time.Nanosecond
Member

I'm a bit worried ns/ms scale may make this test flaky - in particular 1ns can easily pass before we have a chance to process the first node.

@x13n
Member

x13n commented Mar 1, 2023

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 1, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: x13n, yaroslava-serdiuk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 1, 2023
@yaroslava-serdiuk
Contributor Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 1, 2023
@k8s-ci-robot k8s-ci-robot merged commit e1d9861 into kubernetes:master Mar 1, 2023
@yaroslava-serdiuk yaroslava-serdiuk deleted the scalability branch May 28, 2024 23:34
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

3 participants