fix(kanban): recover and self-heal stuck agent runs
A run could get stranded at 'running' in the UI after a crash, disconnect,
or restart, with no way to clear it. Root cause was a race: the SSE history
replay re-asserted a stale `running` status that beat the poll's settled
status, leaving the run showing "Running" and the settle error at once.

Server (runs.ts / runner.ts / index.ts):
- reconcile() on every read force-settles any 'running' run with no live
  runner, so the board self-heals on the next poll (≤3s) with no restart
  needed.
- forceSettle() emits a persisted `status` event so an open/reconnecting
  SSE stream replays the terminal state last, not a stale `running`.
- Startup orphan-reconciliation now also emits that event (this was the gap
  that let the replay re-assert `running` after a server restart).
- Idle watchdog (10min): a silent pi is settled as 'failed' instead of
  hanging forever; SIGKILL escalation (20s) reaps wedged processes.
- stop() now recovers: active→abort, orphaned-but-running→force-stop
  (the Stop button clears wedged runs instead of 409'ing).
- start() catch force-settles 'failed' so a spawn failure never orphans a
  half-created 'running' row.

Client (useOrchestrator.ts):
- patchRun refuses to un-settle a terminal run, dropping stale replayed
  status as a belt-and-suspenders guard against any such race.
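For reviewers, here is a minimal sketch of the two guards described above: server-side reconcile-on-read with a persisted terminal `status` event, and the client-side refusal to un-settle a run. The types, the in-memory `runs` / `liveRunners` / `eventLog` stores, the `listRuns` helper, and the terminal status names are assumptions made for illustration; the real code in runs.ts and useOrchestrator.ts is not shown on this page and may differ.

```typescript
// Hypothetical shapes; the actual types in runs.ts are not visible in this commit view.
type RunStatus = 'running' | 'done' | 'failed' | 'stopped';

interface Run {
  id: string;
  status: RunStatus;
  error?: string;
}

interface StatusEvent {
  runId: string;
  type: 'status';
  status: RunStatus;
}

// Assumed set of terminal (settled) statuses.
const TERMINAL: ReadonlySet<RunStatus> = new Set<RunStatus>(['done', 'failed', 'stopped']);

// In-memory stand-ins for the persisted run table, the live runner registry,
// and the persisted event log that the SSE stream replays.
const runs = new Map<string, Run>();
const liveRunners = new Set<string>();
const eventLog: StatusEvent[] = [];

/** Settle a run and append a `status` event so a replay ends on the terminal state. */
function forceSettle(run: Run, status: RunStatus, error?: string): void {
  run.status = status;
  if (error !== undefined) run.error = error;
  eventLog.push({ runId: run.id, type: 'status', status });
}

/** Force-settle every 'running' run that has no live runner behind it. */
function reconcile(): void {
  runs.forEach((run) => {
    if (run.status === 'running' && !liveRunners.has(run.id)) {
      forceSettle(run, 'failed', 'runner is no longer alive');
    }
  });
}

/** Read path: reconcile first, so each poll sees a self-healed board. */
function listRuns(): Run[] {
  reconcile();
  return Array.from(runs.values());
}

/** Client guard: a terminal run keeps its status even if a stale event says otherwise. */
function patchRun(current: Run, patch: Partial<Run>): Run {
  const wouldUnsettle =
    TERMINAL.has(current.status) &&
    patch.status !== undefined &&
    !TERMINAL.has(patch.status);
  if (wouldUnsettle) {
    // Apply the other fields, but drop the stale replayed status.
    return { ...current, ...patch, status: current.status };
  }
  return { ...current, ...patch };
}
```

Because the terminal event is appended to the same log the stream replays, it lands after any earlier `running` event; the client guard only matters if some other ordering slips through.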
@@ -87,7 +87,7 @@ orchestrator.post('/runs/:id/message', async (c) => {
   }
 });
 
-/** Stop an active run. */
+/** Stop an active or wedged run (force-settles a run with no live process). */
 orchestrator.post('/runs/:id/stop', (c) => {
   const id = c.req.param('id');
   try {
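The hunk above cuts off before the body of the stop route, so the following is only a sketch of the recovery order the commit message describes (active→abort, orphaned-but-running→force-stop). The `stopRun` function, the `Runner` interface, the result shape, and the choice to answer 404 for an unknown run and 409 for an already-settled one are assumptions, not the handler's actual code.

```typescript
// Hypothetical types; the real handler lives in index.ts and is not shown in full.
type RunStatus = 'running' | 'done' | 'failed' | 'stopped';

interface Run {
  id: string;
  status: RunStatus;
}

interface Runner {
  abort(): void;
}

type StopResult =
  | { httpStatus: 200; action: 'aborted' | 'force-stopped' }
  | { httpStatus: 404 | 409; action: 'none' };

/** Decide how to stop a run: abort a live runner, or force-settle an orphan. */
function stopRun(
  id: string,
  runs: Map<string, Run>,
  runners: Map<string, Runner>,
): StopResult {
  const run = runs.get(id);
  if (!run) return { httpStatus: 404, action: 'none' };

  const runner = runners.get(id);
  if (runner) {
    // Active run: ask the live runner to abort; it settles the run itself.
    runner.abort();
    return { httpStatus: 200, action: 'aborted' };
  }

  if (run.status === 'running') {
    // Orphaned but still marked running: no process to signal, so settle it here.
    run.status = 'stopped';
    return { httpStatus: 200, action: 'force-stopped' };
  }

  // Assumed behavior: an already-settled run has nothing left to stop.
  return { httpStatus: 409, action: 'none' };
}
```

Keeping the decision in a plain function, separate from the route, makes the three branches testable without spinning up the HTTP server.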