Closed
coder/coder
#20577
Description
CI Run Link: https://github.com/coder/coder/actions/runs/18689201542
Branch: main
Commit: 86f0f39863a27040acd17dd6bc354cc6a430df7c (Steven Masley) — coder/coder@86f0f39
Summary:
- The "required" aggregator job failed because it detected a cancelled required check: gen.
- There were no failing tests; every test job listed in the required check succeeded. The gen job shows an explicit cancellation.
Evidence:
- required job log shows cancelled check and exits 1:
    Checking required checks
    - changes: success
    - fmt: success
    - lint: success
    - gen: cancelled
    - test-go-pg: success
    - test-go-pg-17: success
    - test-go-race-pg: success
    - test-js: success
    - test-e2e: success
    - offlinedocs: success
    - check-build: skipped
    One of the required checks has failed or has been cancelled
    ##[error]Process completed with exit code 1.
- gen job log shows cancellation (no test failure):
    ##[error]The operation was canceled.
    Post job cleanup.
    Terminate orphan process: pid (...) (make)
Classification: Infrastructure/CI pipeline
- Root cause: The required checks aggregator treats cancelled jobs as failures. The gen job was cancelled mid-run, which caused the aggregator to fail and send a Slack alert, even though all test suites passed.
- Not a test flake, not a data race, and no panic/OOM found.
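The aggregator behaviour described in this classification can be modeled in a few lines. This is a minimal sketch in Python, not the actual logic in .github/workflows/ci.yaml (which is not reproduced in this issue). The function name `required_passes` and the `BLOCKING` set are hypothetical; the job results are copied from the required job log under Evidence.

```python
# Hypothetical model of the "required" aggregator. The real check lives in
# .github/workflows/ci.yaml and may differ in detail.

# Job results as reported in the required job log for this run.
RESULTS = {
    "changes": "success",
    "fmt": "success",
    "lint": "success",
    "gen": "cancelled",
    "test-go-pg": "success",
    "test-go-pg-17": "success",
    "test-go-race-pg": "success",
    "test-js": "success",
    "test-e2e": "success",
    "offlinedocs": "success",
    "check-build": "skipped",
}

# States assumed to make the aggregator exit 1, based on the log message
# "One of the required checks has failed or has been cancelled".
BLOCKING = {"failure", "cancelled"}


def required_passes(results: dict[str, str]) -> bool:
    """Return False if any required job ended in a blocking state."""
    return not any(state in BLOCKING for state in results.values())


if __name__ == "__main__":
    for job, state in RESULTS.items():
        print(f"- {job}: {state}")
    print("pass" if required_passes(RESULTS) else "fail")
```

Under this model a single cancelled job is enough to fail the aggregate, and a skipped job (check-build) is not, which matches what the log shows.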
Timing verification:
- required job failed at 2025-10-21T15:45:36Z, matching the Slack notification window for this run.
Duplicates search (last 30 days & historical keywords):
- Queries used in coder/internal:
  - "One of the required checks has failed or has been cancelled"
  - gen cancelled required checks
  - ci required checks cancelled
  - required checks aggregator
- Closest prior issue: #929, "flake: CI failure in main - gen job (pnpm setup) and build job (Java setup)" (different root cause: network timeouts in setup steps). No active duplicate found for a cancelled gen job causing required to fail.
Assignment analysis (component ownership):
- This is owned by CI workflow maintainers. Recent ownership signals for .github/workflows/ci.yaml:
  - ci: make changes required (#20131), by Ethan
  - ci: make test-go-pg-17 a required check (#19722), by Ethan
  - ci: ping slack on ci failures / prompt changes (#19835, #19435), by Ethan
  - Recent edits also by Michael for Slack agent wiring (#20379)
- Given that the recent substantive changes adding and tuning the required checks mechanism were made by Ethan, assigning to @ethanndickson for triage.
Recommendations:
- Consider allowing specific jobs (e.g., gen) to be treated as neutral on cancellation, or gate the aggregator until all required jobs are completed/succeeded/explicitly skipped.
- Investigate why gen was cancelled mid-run within the same workflow (concurrency/cancel-in-progress or runner preemption). If expected, update the aggregator’s logic; if not, address job stability.
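One way to picture the first recommendation is a policy function that distinguishes three outcomes instead of two. The sketch below is illustrative Python only, not a proposed patch to ci.yaml: the function `evaluate`, the `neutral_on_cancel` parameter, and the "pending" verdict are all hypothetical names, and it assumes each job reports a single state string such as "success", "failure", "cancelled", or "skipped".

```python
# Hypothetical policy sketch: treat cancellation of selected jobs as neutral,
# and report "pending" while any required job has not reached a final state.

FINAL_OK = {"success", "skipped"}


def evaluate(results, neutral_on_cancel=frozenset()):
    """Return "pass", "fail", or "pending" for a mapping of job -> state."""
    verdict = "pass"
    for job, state in results.items():
        if state == "failure":
            return "fail"
        if state == "cancelled":
            # Cancellation only fails the aggregate for jobs that are not
            # explicitly listed as neutral on cancellation.
            if job not in neutral_on_cancel:
                return "fail"
        elif state not in FINAL_OK:
            # Any other state (e.g. still running) holds the verdict open.
            verdict = "pending"
    return verdict


if __name__ == "__main__":
    run = {"gen": "cancelled", "test-js": "success", "check-build": "skipped"}
    print(evaluate(run))                             # strict policy
    print(evaluate(run, neutral_on_cancel={"gen"}))  # gen neutral on cancel
```

Whether gen should be neutral on cancellation depends on the outcome of the second recommendation: if the cancellation turns out to be unexpected, relaxing the aggregator would hide a real stability problem.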
Repro/Next steps:
- Review the required job logic in .github/workflows/ci.yaml to see how it handles cancelled jobs.
- Check workflow concurrency settings and any conditions that could cancel gen.
- If cancellations are expected in some paths, adjust required to ignore those paths or re-run gen automatically.
Quality Checklist:
- Identified actual failing job and validated timing
- Downloaded and reviewed failing job logs (required) and cancelled job logs (gen)
- Searched coder/internal for duplicates with multiple query patterns
- Classified as Infrastructure, not a test/data race/process crash
- Assignment based on component ownership history, not PR/commit author of the failing run