Files
Space-Game/PLAN-kanban-performance.md
francy51 d3ae1da3e2 docs(kanban): add board performance plan
Diagnosis and phased plan for the board's lag, grounded in the codebase:
the dominant cause is a render storm (orchestrator state hoisted at the
top, every SSE event re-renders all cards; KanbanCard not memoized), plus
a 3s background poll, an N+1 on /board, and per-card status scans.

Covers four phases — memoize (now landed in 292005e), calm the poll, batch
the /board query, and small wins — with a measurement plan and risk notes.
EOF && echo "" && git log --oneline -3
2026-06-16 20:32:57 -04:00

182 lines
11 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Implementation Board — Performance Plan
**Status:** Proposal · **Scope:** `apps/api` (SQLite backend) + `apps/docs` (board UI)
**Symptom:** The kanban board is "functional and good but laggy", especially while an agent run is in flight.
---
## TL;DR — root causes (in priority order)
| # | Cause | Where | Impact |
|---|-------|-------|--------|
| 1 | **State hoisting + zero memoization.** Orchestrator state (`runs`, `eventsByRun`, `bevyRunning`) lives in `KanbanBoardPage`. Every streamed SSE event → `setState` → the **entire board re-renders**. `KanbanCard` is not `React.memo`'d, and props that *would* enable memo (`orch`, `onOpen`) get a new identity every render. | `KanbanBoardPage.tsx`, `useOrchestrator.ts` | **High** — this is the dominant cause. During an active run, events stream several/sec → constant full-board re-renders of all 38 cards. |
| 2 | **3 s background poll** whenever any run is active (`anyRunning ? 3_000 : 10_000`). | `useOrchestrator.ts:103` | Medium — periodic re-renders even with no SSE; re-runs `listRuns` and re-creates the interval. |
| 3 | **N+1 queries** in `hydrateCard`: 3 prepared statements × 38 cards = **114+ round-trips per `/board`**. | `apps/api/src/routes/kanban.ts` (`hydrateCard`) | Medium — every board load/mutation reconcile pays this. |
| 4 | **O(cards × runs) scans** each render: each `KanbanCard` calls `orch.isRunning` + `orch.runForCard` + `orch.bevyIsRunning`, each `.filter`/`.find` over `runs`. | `KanbanCard.tsx`, `useOrchestrator.ts` | LowMedium — cheap individually, but 38 × per-render adds up under re-render storm. |
| 5 | Minor: `reload()` re-fetches the immutable doc-page registry on every reload/reconcile; inline style objects churn GC; `onOpen`/`orch` identities churn. | `useKanbanBoard.ts`, all kanban components | Low |
> **Evidence:** `sqlite3 apps/api/.data/kanban.db` → **38 cards, 71 references, 5 runs, 307 events.** The dataset is small, so the lag is **rendering**, not data size — which is why fixes #1#2 dominate.
---
## Phase 1 — Memoize the board cards so streaming events don't re-render them (the cure)
**Goal:** During an active agent run, streaming `text`/`tool_*` events must NOT re-render the 38 board cards. Re-rendering should be confined to the open card modal.
**Render trace (why the cards re-render today).** On every streamed event, `useOrchestrator` calls `setEventsByRun(...)` → that state lives **inside `KanbanBoardPage`**`KanbanBoardPage` re-renders → it maps 38 `<KanbanCard>`s. Each card is handed two props that are a **fresh object every render**, so `React.memo` could never bail:
```tsx
<KanbanCard orch={orch} // new object literal every render
onOpen={() => setOpenCardId(card.id)} /> // new closure every render
```
The things the card actually cares about — `card`, `pages`, `customPages`, and "is this card's run active?" — do **not** change on a `text` event; only the `orch` and `onOpen` identities churn. Make those identities stable and hand the card **primitives** instead of `orch`, and `React.memo` bails → all 38 cards stop re-rendering. That is the whole cure. (Moving the event log out of the top-level hook — the original "split state" idea — is demoted to Phase 1-optional; the parent's remaining per-event work is memoized/cheap.)
**Files touched: 3, all surgical, behavior identical.**
### 1a. `KanbanCard` — wrap in `React.memo`, take primitives instead of `orch`
The card body only *reads* run status (it never calls an orch mutator — only the modal does), so it needs two booleans, not the whole orchestrator object.
`apps/docs/src/components/kanban/KanbanCard.tsx`:
- Remove `orch: UseOrchestrator` from props; add `isRunning: boolean`, `bevyRunning: boolean`.
- Change `onOpen: () => void``onOpen: (id: string) => void`, and call `onOpen(card.id)`.
- Replace the three `orch.*` reads with the props.
- `export const KanbanCard = React.memo(function KanbanCard(props) { ... })`.
```tsx
export const KanbanCard = memo(function KanbanCard({
card, pages, customPages, isRunning, bevyRunning, onOpen,
}: Props) {
const active = isRunning || bevyRunning; // was: orch.isRunning(...) || orch.bevyIsRunning(...)
// …onClick={() => onOpen(card.id)}…
});
```
### 1b. `useOrchestrator` — expose one memoized `activeByCard` map
Add a single derived selector so each card reads its status in O(1) and the map is referentially stable unless the **active set** actually changes (not on every text event):
```ts
const activeByCard = useMemo(() => {
const m = new Map<string, { running: boolean; bevy: boolean; runId: string }>();
for (const r of runs) {
if (r.status === 'running') m.set(r.cardId, { running: true, bevy: bevyRunning.has(r.id), runId: r.id });
}
return m;
}, [runs, bevyRunning]);
// …return { …, activeByCard }
```
This also retires the per-render `.filter().find()` scans each card does today — fixes #4.
### 1c. `KanbanBoardPage` — pass stable props
```ts
const openById = useCallback((id: string) => setOpenCardId(id), []); // setOpenCardId is already stable
const active = orch.activeByCard; // stable across text events
// …in the columns map:
const a = active.get(card.id);
<KanbanCard key={card.id}
card={card} // stable ref from memoized byColumn (changes only when THIS card mutates)
pages={pages} // set only on reload()
customPages={customPages} // context value memoized in CustomPagesProvider
isRunning={!!a?.running}
bevyRunning={!!a?.bevy}
onOpen={openById} // stable useCallback
/>
```
With every prop `===` across a `text`-event re-render, `React.memo` bails → the 38 cards are skipped.
**Stable-ref claims verified:** `pages` is set only in `reload()` (`useKanbanBoard.ts`); `customPages` comes from `CustomPagesProvider`, whose context value is `useMemo`-ed on `[pages, …]` (`customPagesStore.tsx`). Neither changes on orchestrator events. ✓
### 1d. (trivial, include it) memoize the static siblings
Wrap `StatCard` in `React.memo`; extract the fully-static category legend to a module-scope constant or a memoized `<CategoryLegend />`. Removes the last ~22 small reconciliations the parent does per event. Cheap insurance.
### What this does NOT change
- Behavior is identical — same renders, same data flow, just fewer wasted ones.
- `CardModal`/`AgentRunBar` still re-render on events (intended — they show the live log).
- `eventsByRun` stays in `useOrchestrator` (no SSE/state split).
### Phase 1-optional (only if the profiler shows residual parent churn) — split the event log out
Move the `EventSource` subscription and the `eventsByRun` entry into a per-card `useRunEvents(runId)` used only by `AgentRunBar`; push `status`/`bevy`/`done` changes back up via registry mutators (`reflectRunStatus`, `reflectBevy`). Result: text events re-render only the modal; `KanbanBoardPage` itself renders only on lifecycle changes. **Analysis says this is unnecessary once 1a1d are in** (the parent's per-event work is memoized/cheap) — so it is gated behind the measurement checkpoint, not part of the default cure.
---
## Phase 2 — Calm the background poll (#2)
- **Raise the active interval to ~58 s** and make it **adaptive on visibility**: only poll while the board tab is visible (`document.visibilityState === 'visible'`). Pause on `visibilitychange` → hidden.
- **Skip the poll entirely when an SSE stream is open** for the only running run — the stream already delivers status/bevy/done events; polling is redundant then. (Keep the poll as a fallback only when no stream is attached.)
- Keep the existing signature guard (it's good) and extend it to short-circuit before `setLoading`/any state write.
> The SSE path is the source of truth while a modal is open; the poll should be a *liveness backstop*, not the primary updater.
---
## Phase 3 — Kill the N+1 on `/board` (#3)
Replace the per-card prepared-statement fan-out in `hydrateCard` with **three batched queries**, then stitch in JS:
```ts
kanban.get('/board', (c) => {
const cards = db.prepare('SELECT * FROM cards ORDER BY sort_order ASC').all() as CardRow[];
const ids = cards.map((c) => c.id);
// batch via a parameterized IN (...) built from `ids`
const refs = db.prepare(allRefsSql(ids)).all(...ids) as ReferenceRow[];
const tags = db.prepare(allTagsSql(ids)).all(...ids) as { card_id: string; tag: string }[];
const cmts = db.prepare(allCmtsSql(ids)).all(...ids) as CommentRow[];
const byCard = groupByCard(cards, refs, tags, cmts); // O(n) stitch
return c.json({ cards: cards.map((r) => byCard[r.id]), lastUpdated: ... });
});
```
Helper to build safe `IN (?, ?, ...)` lists for `better-sqlite3` (or bind a JSON/temp-table). Result: **4 queries total** instead of 3N+1, and `hydrateCard` collapses to a pure shape-mapper.
Indexes already exist (`idx_comments_card`, `idx_tags_card`, `idx_refs_card`) so the batched queries are cheap.
---
## Phase 4 — Small wins (#5)
- **Stop refetching pages on every reload.** `DOC_PAGES` is a static registry loaded at module init. Fetch it **once** (cache in the hook / a tiny module-level promise), and have `reload()` hit only `/board`. The explicit "Reload" button can keep refetching pages if desired.
- **`reload()` on mutation failure** currently re-fetches board *and* pages; drop pages there too.
- **Hoist static inline style objects** to module scope where they're already constants (a few are local `const`s inside components — promote them). This is GC-churn only, do last.
- Optional: if the board grows past ~80100 cards, add **windowing/virtualization** to each column (e.g. `@tanstack/react-virtual`). Not needed at 38 cards; document the threshold.
---
## Measurement plan (verify before/after)
1. **React Profiler** (React DevTools) on `KanbanBoardPage` while a run streams events:
- *Before:* every SSE event re-renders `KanbanBoardPage` + all 38 `KanbanCard`s.
- *After (1a1d):* SSE events commit `KanbanBoardPage` (cheap) + `CardModal`/`AgentRunBar`; **0 `KanbanCard` commits** while the modal is open.
2. **Count commits** during a 10 s active run: target `board commits ≈ 0` while only the modal is open.
3. **`/board` latency**: `curl -w '%{time_total}' localhost:3001/api/kanban/board` — expect a drop from the N+1 path (measure with `EXPLAIN QUERY PLAN` to confirm single scans).
4. **Network**: during an active run with the modal open, confirm no `/api/orchestrator/runs` poll requests fire (Phase 2).
---
## Suggested execution order & effort
| Phase | Effort | Risk | Expected impact |
|-------|--------|------|-----------------|
| **1a1d** (memoize + primitive props) | SmallMedium | Low (pure refactor, identical behavior) | **Cures the lag** during runs |
| **1-opt** (split event log out) | Medium | LowMed | Removes residual parent churn — gated on profiler |
| **2** (adaptive poll) | Small | Low | Removes idle re-renders / network |
| **3** (batch `/board`) | SmallMedium | Low | Faster loads & reconciles |
| **4** (small wins) | Small | Low | Polish |
Recommend landing **Phase 1 first** and re-testing — it alone should resolve the perceived lag. Phases 24 are reinforcing improvements.
---
## Out of scope (noted, not addressed)
- No schema/feature changes; behavior is preserved.
- SpacetimeDB persistence migration is unrelated.
- The "agent may never mark `done`" rule is untouched.