feat: sandcastle reworks PRs with failing CI #29

Merged
weiwen merged 2 commits from sandcastle/issue-20 into main 2026-07-05 12:02:34 +08:00

Closes #20

What changed and why

The sandcastle loop now handles two classes of work: PR reworks (open PRs whose CI is red) and new issues (the existing path). PR-rework items are prioritized — they claim concurrency slots before new issues are scheduled.

New components

  • .sandcastle/pr-fixer-prompt.md — drives a new pr-fixer agent stage that reproduces the CI failure locally (nix develop .#ci -c just check), fixes it minimally, pushes to the existing branch, and posts a comment. It explicitly must not open a new PR or change any labels.
  • plan-prompt.md — extended to query tea pulls list for PRs with ci == "failure" and include them in the plan output as a prReworks array, listed before issues.
  • main.mts — the work-item list is now a WorkItem union (PrReworkItem | NewIssueItem). A MAX_CONCURRENCY = 5 cap slices the combined list; excess items defer to the next iteration. Dispatch branches on item.kind.
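
The main.mts change described above could look roughly like the sketch below. It is not the code from this PR: only `WorkItem`, `PrReworkItem`, `NewIssueItem`, `MAX_CONCURRENCY`, `prReworks`, `issues`, and `item.kind` come from the description. The field names (`prNumber`, `branch`, `issueNumber`), the `kind` string values, and the helpers `selectWorkItems` and `stageFor` are hypothetical.

```typescript
// Hypothetical shapes: the fields and the `kind` literals are assumptions,
// chosen only to make the union and the dispatch concrete.
type PrReworkItem = { kind: "pr-rework"; prNumber: number; branch: string };
type NewIssueItem = { kind: "new-issue"; issueNumber: number };
type WorkItem = PrReworkItem | NewIssueItem;

// Assumed shape of the planner output: two groups, reworks and issues.
type PlanOutput = {
  prReworks: { prNumber: number; branch: string }[];
  issues: { issueNumber: number }[];
};

const MAX_CONCURRENCY = 5;

// Map reworks first, then issues, then cut the combined list at the cap.
// Anything past the cap is simply not returned, so it waits for the next
// iteration's plan.
function selectWorkItems(plan: PlanOutput, cap: number = MAX_CONCURRENCY): WorkItem[] {
  const combined: WorkItem[] = [
    ...plan.prReworks.map((p): PrReworkItem => ({ kind: "pr-rework", ...p })),
    ...plan.issues.map((i): NewIssueItem => ({ kind: "new-issue", ...i })),
  ];
  return combined.slice(0, cap);
}

// Dispatch on the discriminant; the switch narrows `item` in each branch.
function stageFor(item: WorkItem): string {
  switch (item.kind) {
    case "pr-rework":
      return `pr-fixer for PR #${item.prNumber} on ${item.branch}`;
    case "new-issue":
      return `implement -> review -> pr-open for issue #${item.issueNumber}`;
  }
}
```

Because the reworks are placed at the front of the combined list before slicing, a cap smaller than `prReworks.length` yields only rework items, which is the ordering the reviewer checklist asks about.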

Key design decisions

  • Label-free CI path: PRs stay ready-for-human throughout rework. No relabelling happens — the live ci field in the Forgejo API governs re-pick on each iteration.
  • pending is a no-op: only failure triggers a rework attempt. After a fix push, CI re-runs; if it fails again, the PR is re-picked on a later iteration.
  • Concurrency budget: MAX_CONCURRENCY prevents runaway parallelism. PR-rework items are slotted first so a busy backlog of new issues never starves CI fixes.
  • Type derivation refactor (second commit): Plan, PrReworkItem, and NewIssueItem are now derived from the zod schema rather than maintained as parallel hand-written types.
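
To make the label-free re-pick rule concrete, here is a small TypeScript restatement of it. In this PR the selection is done by a `jq` filter in plan-prompt.md, not by TypeScript, so this is only an illustration. The `OpenPr` shape, the `index` field, and the `pickReworks` helper are assumptions; the three `ci` values are the ones named in the description.

```typescript
// Assumed shape of one open PR as seen by the planner; only the `ci` field
// and its three values are taken from the PR description.
type CiStatus = "failure" | "pending" | "success";
type OpenPr = { index: number; ci: CiStatus };

// Only a red CI status selects a PR for rework. `pending` and `success`
// are both skipped, and no label is read or written.
function pickReworks(openPrs: OpenPr[]): OpenPr[] {
  return openPrs.filter((pr) => pr.ci === "failure");
}

// Three consecutive iterations for the same hypothetical PR #7:
// red, then pending right after the fix push, then red again.
const iterations: OpenPr[][] = [
  [{ index: 7, ci: "failure" }, { index: 8, ci: "success" }],
  [{ index: 7, ci: "pending" }, { index: 8, ci: "success" }],
  [{ index: 7, ci: "failure" }, { index: 8, ci: "success" }],
];

// Indices picked per iteration: PR #7 is picked, skipped while CI runs,
// then picked again once CI has failed a second time.
const pickedPerIteration = iterations.map((prs) => pickReworks(prs).map((pr) => pr.index));
```

Since the decision depends only on the live status at planning time, no state has to be carried between iterations.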

Documentation

docs/agents/triage-labels.md now documents the CI-rework path so it's discoverable alongside the label lifecycle.

Reviewer checklist

  • pr-fixer-prompt.md: are the push command and comment command correct? Does the "STOP on push failure" rule prevent half-done state?
  • plan-prompt.md: does the jq filter (select(.ci == "failure")) correctly exclude pending and success PRs?
  • main.mts: in the concurrency slice, does PR-rework-first ordering hold when MAX_CONCURRENCY < prReworks.length? (It does: prReworks are mapped before issues.)
  • MAX_CONCURRENCY = 5: reasonable default?
Task: Extend the sandcastle orchestration to detect open PRs with failing
CI and route them through a pr-fixer stage before picking up new issues.
PRD: issue #20.

Key decisions:
- plan-prompt.md now fetches open PRs via `tea pulls list --fields ci` and
  includes any with `ci == "failure"` as prReworks in the plan output.
  Pending PRs are skipped; no dependency analysis is needed for reworks.
- planSchema extended with a `prReworks` array alongside the existing
  `issues` array. The planner emits two groups; the orchestrator reads both.
- MAX_CONCURRENCY = 5 caps active pipelines per iteration. prRework items
  claim slots first; new issues fill the remainder; overflow defers
  naturally to the next iteration.
- PR-rework items route to a new pr-fixer stage (pr-fixer-prompt.md): runs
  `nix develop .#ci -c just check`, fixes the failure, pushes to the
  existing branch, and posts a brief comment. No label changes — CI status
  governs re-pick.
- When no PRs need rework, the new-issue pipeline (implement → review →
  pr-open) is unchanged.
- tsconfig.json + typecheck/test npm scripts added so `npm run typecheck`
  works on the .mts orchestration code.

Files changed:
- .sandcastle/plan-prompt.md: adds failing-PRs section and two-group output
- .sandcastle/main.mts: schema, MAX_CONCURRENCY, work-item routing, logging
- .sandcastle/pr-fixer-prompt.md: new pr-fixer agent prompt
- docs/agents/triage-labels.md: documents CI-rework path
- package.json, tsconfig.json: typecheck infrastructure

Blockers/notes:
- No live sandcastle run was possible in this sandbox; the TypeScript typecheck passes cleanly.
- The `tea pulls list` ci field values (failure/pending/success) match
  Forgejo's API — verified via `tea pulls list --help`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
refactor: derive plan item types from schema in sandcastle loop
Some checks failed
CI / check (pull_request) Failing after 1m54s
e8be5471c4
Replace the duplicated inline object-shape annotations on the workItems
map callbacks with types inferred from planSchema. plan.output is typed
'any' by the library, so the annotations were load-bearing; routing it
through a typed 'const output: Plan' restores type safety while keeping
the shape defined once, and shortens the over-long map lines.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
weiwen merged commit 5e324e5cdc into main 2026-07-05 12:02:34 +08:00
Reference
weiwen/evie!29