Files

francy51 d3ae1da3e2 docs(kanban): add board performance plan

Diagnosis and phased plan for the board's lag, grounded in the codebase:
the dominant cause is a render storm (orchestrator state hoisted at the
top, every SSE event re-renders all cards; KanbanCard not memoized), plus
a 3s background poll, an N+1 on /board, and per-card status scans.

Covers four phases — memoize (now landed in 292005e), calm the poll, batch
the /board query, and small wins — with a measurement plan and risk notes.
EOF && echo "" && git log --oneline -3

2026-06-16 20:32:57 -04:00

11 KiB

Raw Blame History

Implementation Board — Performance Plan

Status: Proposal · Scope: apps/api (SQLite backend) + apps/docs (board UI) Symptom: The kanban board is "functional and good but laggy", especially while an agent run is in flight.

TL;DR — root causes (in priority order)

#	Cause	Where	Impact
1	State hoisting + zero memoization. Orchestrator state (`runs`, `eventsByRun`, `bevyRunning`) lives in `KanbanBoardPage`. Every streamed SSE event → `setState` → the entire board re-renders. `KanbanCard` is not `React.memo`'d, and props that would enable memo (`orch`, `onOpen`) get a new identity every render.	`KanbanBoardPage.tsx`, `useOrchestrator.ts`	High — this is the dominant cause. During an active run, events stream several/sec → constant full-board re-renders of all 38 cards.
2	3 s background poll whenever any run is active (`anyRunning ? 3_000 : 10_000`).	`useOrchestrator.ts:103`	Medium — periodic re-renders even with no SSE; re-runs `listRuns` and re-creates the interval.
3	N+1 queries in `hydrateCard`: 3 prepared statements × 38 cards = 114+ round-trips per `/board`.	`apps/api/src/routes/kanban.ts` (`hydrateCard`)	Medium — every board load/mutation reconcile pays this.
4	O(cards × runs) scans each render: each `KanbanCard` calls `orch.isRunning` + `orch.runForCard` + `orch.bevyIsRunning`, each `.filter`/`.find` over `runs`.	`KanbanCard.tsx`, `useOrchestrator.ts`	Low–Medium — cheap individually, but 38 × per-render adds up under re-render storm.
5	Minor: `reload()` re-fetches the immutable doc-page registry on every reload/reconcile; inline style objects churn GC; `onOpen`/`orch` identities churn.	`useKanbanBoard.ts`, all kanban components	Low

Evidence: sqlite3 apps/api/.data/kanban.db → 38 cards, 71 references, 5 runs, 307 events. The dataset is small, so the lag is rendering, not data size — which is why fixes #1–#2 dominate.

Phase 1 — Memoize the board cards so streaming events don't re-render them (the cure)

Goal: During an active agent run, streaming text/tool_* events must NOT re-render the 38 board cards. Re-rendering should be confined to the open card modal.

Render trace (why the cards re-render today). On every streamed event, useOrchestrator calls setEventsByRun(...) → that state lives inside KanbanBoardPage → KanbanBoardPage re-renders → it maps 38 <KanbanCard>s. Each card is handed two props that are a fresh object every render, so React.memo could never bail:

<KanbanCard orch={orch}                          // new object literal every render
  onOpen={() => setOpenCardId(card.id)} />        // new closure every render

The things the card actually cares about — card, pages, customPages, and "is this card's run active?" — do not change on a text event; only the orch and onOpen identities churn. Make those identities stable and hand the card primitives instead of orch, and React.memo bails → all 38 cards stop re-rendering. That is the whole cure. (Moving the event log out of the top-level hook — the original "split state" idea — is demoted to Phase 1-optional; the parent's remaining per-event work is memoized/cheap.)

Files touched: 3, all surgical, behavior identical.

1a. `KanbanCard` — wrap in `React.memo`, take primitives instead of `orch`

The card body only reads run status (it never calls an orch mutator — only the modal does), so it needs two booleans, not the whole orchestrator object.

apps/docs/src/components/kanban/KanbanCard.tsx:

Remove orch: UseOrchestrator from props; add isRunning: boolean, bevyRunning: boolean.
Change onOpen: () => void → onOpen: (id: string) => void, and call onOpen(card.id).
Replace the three orch.* reads with the props.
export const KanbanCard = React.memo(function KanbanCard(props) { ... }).

export const KanbanCard = memo(function KanbanCard({
  card, pages, customPages, isRunning, bevyRunning, onOpen,
}: Props) {
  const active = isRunning || bevyRunning;   // was: orch.isRunning(...) || orch.bevyIsRunning(...)
  // …onClick={() => onOpen(card.id)}…
});

1b. `useOrchestrator` — expose one memoized `activeByCard` map

Add a single derived selector so each card reads its status in O(1) and the map is referentially stable unless the active set actually changes (not on every text event):

const activeByCard = useMemo(() => {
  const m = new Map<string, { running: boolean; bevy: boolean; runId: string }>();
  for (const r of runs) {
    if (r.status === 'running') m.set(r.cardId, { running: true, bevy: bevyRunning.has(r.id), runId: r.id });
  }
  return m;
}, [runs, bevyRunning]);
// …return { …, activeByCard }

This also retires the per-render .filter().find() scans each card does today — fixes #4.

1c. `KanbanBoardPage` — pass stable props

const openById = useCallback((id: string) => setOpenCardId(id), []); // setOpenCardId is already stable
const active = orch.activeByCard;                                     // stable across text events
// …in the columns map:
const a = active.get(card.id);
<KanbanCard key={card.id}
  card={card}                 // stable ref from memoized byColumn (changes only when THIS card mutates)
  pages={pages}               // set only on reload()
  customPages={customPages}   // context value memoized in CustomPagesProvider
  isRunning={!!a?.running}
  bevyRunning={!!a?.bevy}
  onOpen={openById}           // stable useCallback
/>

With every prop === across a text-event re-render, React.memo bails → the 38 cards are skipped.

Stable-ref claims verified: pages is set only in reload() (useKanbanBoard.ts); customPages comes from CustomPagesProvider, whose context value is useMemo-ed on [pages, …] (customPagesStore.tsx). Neither changes on orchestrator events. ✓

1d. (trivial, include it) memoize the static siblings

Wrap StatCard in React.memo; extract the fully-static category legend to a module-scope constant or a memoized <CategoryLegend />. Removes the last ~22 small reconciliations the parent does per event. Cheap insurance.

What this does NOT change

Behavior is identical — same renders, same data flow, just fewer wasted ones.
CardModal/AgentRunBar still re-render on events (intended — they show the live log).
eventsByRun stays in useOrchestrator (no SSE/state split).

Phase 1-optional (only if the profiler shows residual parent churn) — split the event log out

Move the EventSource subscription and the eventsByRun entry into a per-card useRunEvents(runId) used only by AgentRunBar; push status/bevy/done changes back up via registry mutators (reflectRunStatus, reflectBevy). Result: text events re-render only the modal; KanbanBoardPage itself renders only on lifecycle changes. Analysis says this is unnecessary once 1a–1d are in (the parent's per-event work is memoized/cheap) — so it is gated behind the measurement checkpoint, not part of the default cure.

Phase 2 — Calm the background poll (#2)

Raise the active interval to ~5–8 s and make it adaptive on visibility: only poll while the board tab is visible (document.visibilityState === 'visible'). Pause on visibilitychange → hidden.
Skip the poll entirely when an SSE stream is open for the only running run — the stream already delivers status/bevy/done events; polling is redundant then. (Keep the poll as a fallback only when no stream is attached.)
Keep the existing signature guard (it's good) and extend it to short-circuit before setLoading/any state write.

The SSE path is the source of truth while a modal is open; the poll should be a liveness backstop, not the primary updater.

Phase 3 — Kill the N+1 on `/board` (#3)

Replace the per-card prepared-statement fan-out in hydrateCard with three batched queries, then stitch in JS:

kanban.get('/board', (c) => {
  const cards = db.prepare('SELECT * FROM cards ORDER BY sort_order ASC').all() as CardRow[];
  const ids = cards.map((c) => c.id);
  // batch via a parameterized IN (...) built from `ids`
  const refs   = db.prepare(allRefsSql(ids)).all(...ids) as ReferenceRow[];
  const tags   = db.prepare(allTagsSql(ids)).all(...ids) as { card_id: string; tag: string }[];
  const cmts   = db.prepare(allCmtsSql(ids)).all(...ids) as CommentRow[];
  const byCard = groupByCard(cards, refs, tags, cmts); // O(n) stitch
  return c.json({ cards: cards.map((r) => byCard[r.id]), lastUpdated: ... });
});

Helper to build safe IN (?, ?, ...) lists for better-sqlite3 (or bind a JSON/temp-table). Result: 4 queries total instead of 3N+1, and hydrateCard collapses to a pure shape-mapper.

Indexes already exist (idx_comments_card, idx_tags_card, idx_refs_card) so the batched queries are cheap.

Phase 4 — Small wins (#5)

Stop refetching pages on every reload. DOC_PAGES is a static registry loaded at module init. Fetch it once (cache in the hook / a tiny module-level promise), and have reload() hit only /board. The explicit "Reload" button can keep refetching pages if desired.
reload() on mutation failure currently re-fetches board and pages; drop pages there too.
Hoist static inline style objects to module scope where they're already constants (a few are local consts inside components — promote them). This is GC-churn only, do last.
Optional: if the board grows past ~80–100 cards, add windowing/virtualization to each column (e.g. @tanstack/react-virtual). Not needed at 38 cards; document the threshold.

Measurement plan (verify before/after)

React Profiler (React DevTools) on KanbanBoardPage while a run streams events:
- Before: every SSE event re-renders KanbanBoardPage + all 38 KanbanCards.
- After (1a–1d): SSE events commit KanbanBoardPage (cheap) + CardModal/AgentRunBar; 0 KanbanCard commits while the modal is open.
Count commits during a 10 s active run: target board commits ≈ 0 while only the modal is open.
/board latency: curl -w '%{time_total}' localhost:3001/api/kanban/board — expect a drop from the N+1 path (measure with EXPLAIN QUERY PLAN to confirm single scans).
Network: during an active run with the modal open, confirm no /api/orchestrator/runs poll requests fire (Phase 2).

Suggested execution order & effort

Phase	Effort	Risk	Expected impact
1a–1d (memoize + primitive props)	Small–Medium	Low (pure refactor, identical behavior)	Cures the lag during runs
1-opt (split event log out)	Medium	Low–Med	Removes residual parent churn — gated on profiler
2 (adaptive poll)	Small	Low	Removes idle re-renders / network
3 (batch `/board`)	Small–Medium	Low	Faster loads & reconciles
4 (small wins)	Small	Low	Polish

Recommend landing Phase 1 first and re-testing — it alone should resolve the perceived lag. Phases 2–4 are reinforcing improvements.

Out of scope (noted, not addressed)

No schema/feature changes; behavior is preserved.
SpacetimeDB persistence migration is unrelated.
The "agent may never mark done" rule is untouched.

11 KiB Raw Blame History Unescape Escape