Diagnosis and phased plan for the board's lag, grounded in the codebase:
the dominant cause is a render storm (orchestrator state hoisted at the
top, every SSE event re-renders all cards; KanbanCard not memoized), plus
a 3s background poll, an N+1 on /board, and per-card status scans.
Covers four phases — memoize (now landed in 292005e), calm the poll, batch
the /board query, and small wins — with a measurement plan and risk notes.
EOF && echo "" && git log --oneline -3
182 lines
11 KiB
Markdown
182 lines
11 KiB
Markdown
# Implementation Board — Performance Plan
|
||
|
||
**Status:** Proposal · **Scope:** `apps/api` (SQLite backend) + `apps/docs` (board UI)
|
||
**Symptom:** The kanban board is "functional and good but laggy", especially while an agent run is in flight.
|
||
|
||
---
|
||
|
||
## TL;DR — root causes (in priority order)
|
||
|
||
| # | Cause | Where | Impact |
|
||
|---|-------|-------|--------|
|
||
| 1 | **State hoisting + zero memoization.** Orchestrator state (`runs`, `eventsByRun`, `bevyRunning`) lives in `KanbanBoardPage`. Every streamed SSE event → `setState` → the **entire board re-renders**. `KanbanCard` is not `React.memo`'d, and props that *would* enable memo (`orch`, `onOpen`) get a new identity every render. | `KanbanBoardPage.tsx`, `useOrchestrator.ts` | **High** — this is the dominant cause. During an active run, events stream several/sec → constant full-board re-renders of all 38 cards. |
|
||
| 2 | **3 s background poll** whenever any run is active (`anyRunning ? 3_000 : 10_000`). | `useOrchestrator.ts:103` | Medium — periodic re-renders even with no SSE; re-runs `listRuns` and re-creates the interval. |
|
||
| 3 | **N+1 queries** in `hydrateCard`: 3 prepared statements × 38 cards = **114+ round-trips per `/board`**. | `apps/api/src/routes/kanban.ts` (`hydrateCard`) | Medium — every board load/mutation reconcile pays this. |
|
||
| 4 | **O(cards × runs) scans** each render: each `KanbanCard` calls `orch.isRunning` + `orch.runForCard` + `orch.bevyIsRunning`, each `.filter`/`.find` over `runs`. | `KanbanCard.tsx`, `useOrchestrator.ts` | Low–Medium — cheap individually, but 38 × per-render adds up under re-render storm. |
|
||
| 5 | Minor: `reload()` re-fetches the immutable doc-page registry on every reload/reconcile; inline style objects churn GC; `onOpen`/`orch` identities churn. | `useKanbanBoard.ts`, all kanban components | Low |
|
||
|
||
> **Evidence:** `sqlite3 apps/api/.data/kanban.db` → **38 cards, 71 references, 5 runs, 307 events.** The dataset is small, so the lag is **rendering**, not data size — which is why fixes #1–#2 dominate.
|
||
|
||
---
|
||
|
||
## Phase 1 — Memoize the board cards so streaming events don't re-render them (the cure)
|
||
|
||
**Goal:** During an active agent run, streaming `text`/`tool_*` events must NOT re-render the 38 board cards. Re-rendering should be confined to the open card modal.
|
||
|
||
**Render trace (why the cards re-render today).** On every streamed event, `useOrchestrator` calls `setEventsByRun(...)` → that state lives **inside `KanbanBoardPage`** → `KanbanBoardPage` re-renders → it maps 38 `<KanbanCard>`s. Each card is handed two props that are a **fresh object every render**, so `React.memo` could never bail:
|
||
|
||
```tsx
|
||
<KanbanCard orch={orch} // new object literal every render
|
||
onOpen={() => setOpenCardId(card.id)} /> // new closure every render
|
||
```
|
||
|
||
The things the card actually cares about — `card`, `pages`, `customPages`, and "is this card's run active?" — do **not** change on a `text` event; only the `orch` and `onOpen` identities churn. Make those identities stable and hand the card **primitives** instead of `orch`, and `React.memo` bails → all 38 cards stop re-rendering. That is the whole cure. (Moving the event log out of the top-level hook — the original "split state" idea — is demoted to Phase 1-optional; the parent's remaining per-event work is memoized/cheap.)
|
||
|
||
**Files touched: 3, all surgical, behavior identical.**
|
||
|
||
### 1a. `KanbanCard` — wrap in `React.memo`, take primitives instead of `orch`
|
||
|
||
The card body only *reads* run status (it never calls an orch mutator — only the modal does), so it needs two booleans, not the whole orchestrator object.
|
||
|
||
`apps/docs/src/components/kanban/KanbanCard.tsx`:
|
||
- Remove `orch: UseOrchestrator` from props; add `isRunning: boolean`, `bevyRunning: boolean`.
|
||
- Change `onOpen: () => void` → `onOpen: (id: string) => void`, and call `onOpen(card.id)`.
|
||
- Replace the three `orch.*` reads with the props.
|
||
- `export const KanbanCard = React.memo(function KanbanCard(props) { ... })`.
|
||
|
||
```tsx
|
||
export const KanbanCard = memo(function KanbanCard({
|
||
card, pages, customPages, isRunning, bevyRunning, onOpen,
|
||
}: Props) {
|
||
const active = isRunning || bevyRunning; // was: orch.isRunning(...) || orch.bevyIsRunning(...)
|
||
// …onClick={() => onOpen(card.id)}…
|
||
});
|
||
```
|
||
|
||
### 1b. `useOrchestrator` — expose one memoized `activeByCard` map
|
||
|
||
Add a single derived selector so each card reads its status in O(1) and the map is referentially stable unless the **active set** actually changes (not on every text event):
|
||
|
||
```ts
|
||
const activeByCard = useMemo(() => {
|
||
const m = new Map<string, { running: boolean; bevy: boolean; runId: string }>();
|
||
for (const r of runs) {
|
||
if (r.status === 'running') m.set(r.cardId, { running: true, bevy: bevyRunning.has(r.id), runId: r.id });
|
||
}
|
||
return m;
|
||
}, [runs, bevyRunning]);
|
||
// …return { …, activeByCard }
|
||
```
|
||
|
||
This also retires the per-render `.filter().find()` scans each card does today — fixes #4.
|
||
|
||
### 1c. `KanbanBoardPage` — pass stable props
|
||
|
||
```ts
|
||
const openById = useCallback((id: string) => setOpenCardId(id), []); // setOpenCardId is already stable
|
||
const active = orch.activeByCard; // stable across text events
|
||
// …in the columns map:
|
||
const a = active.get(card.id);
|
||
<KanbanCard key={card.id}
|
||
card={card} // stable ref from memoized byColumn (changes only when THIS card mutates)
|
||
pages={pages} // set only on reload()
|
||
customPages={customPages} // context value memoized in CustomPagesProvider
|
||
isRunning={!!a?.running}
|
||
bevyRunning={!!a?.bevy}
|
||
onOpen={openById} // stable useCallback
|
||
/>
|
||
```
|
||
|
||
With every prop `===` across a `text`-event re-render, `React.memo` bails → the 38 cards are skipped.
|
||
|
||
**Stable-ref claims verified:** `pages` is set only in `reload()` (`useKanbanBoard.ts`); `customPages` comes from `CustomPagesProvider`, whose context value is `useMemo`-ed on `[pages, …]` (`customPagesStore.tsx`). Neither changes on orchestrator events. ✓
|
||
|
||
### 1d. (trivial, include it) memoize the static siblings
|
||
|
||
Wrap `StatCard` in `React.memo`; extract the fully-static category legend to a module-scope constant or a memoized `<CategoryLegend />`. Removes the last ~22 small reconciliations the parent does per event. Cheap insurance.
|
||
|
||
### What this does NOT change
|
||
|
||
- Behavior is identical — same renders, same data flow, just fewer wasted ones.
|
||
- `CardModal`/`AgentRunBar` still re-render on events (intended — they show the live log).
|
||
- `eventsByRun` stays in `useOrchestrator` (no SSE/state split).
|
||
|
||
### Phase 1-optional (only if the profiler shows residual parent churn) — split the event log out
|
||
|
||
Move the `EventSource` subscription and the `eventsByRun` entry into a per-card `useRunEvents(runId)` used only by `AgentRunBar`; push `status`/`bevy`/`done` changes back up via registry mutators (`reflectRunStatus`, `reflectBevy`). Result: text events re-render only the modal; `KanbanBoardPage` itself renders only on lifecycle changes. **Analysis says this is unnecessary once 1a–1d are in** (the parent's per-event work is memoized/cheap) — so it is gated behind the measurement checkpoint, not part of the default cure.
|
||
|
||
---
|
||
|
||
## Phase 2 — Calm the background poll (#2)
|
||
|
||
- **Raise the active interval to ~5–8 s** and make it **adaptive on visibility**: only poll while the board tab is visible (`document.visibilityState === 'visible'`). Pause on `visibilitychange` → hidden.
|
||
- **Skip the poll entirely when an SSE stream is open** for the only running run — the stream already delivers status/bevy/done events; polling is redundant then. (Keep the poll as a fallback only when no stream is attached.)
|
||
- Keep the existing signature guard (it's good) and extend it to short-circuit before `setLoading`/any state write.
|
||
|
||
> The SSE path is the source of truth while a modal is open; the poll should be a *liveness backstop*, not the primary updater.
|
||
|
||
---
|
||
|
||
## Phase 3 — Kill the N+1 on `/board` (#3)
|
||
|
||
Replace the per-card prepared-statement fan-out in `hydrateCard` with **three batched queries**, then stitch in JS:
|
||
|
||
```ts
|
||
kanban.get('/board', (c) => {
|
||
const cards = db.prepare('SELECT * FROM cards ORDER BY sort_order ASC').all() as CardRow[];
|
||
const ids = cards.map((c) => c.id);
|
||
// batch via a parameterized IN (...) built from `ids`
|
||
const refs = db.prepare(allRefsSql(ids)).all(...ids) as ReferenceRow[];
|
||
const tags = db.prepare(allTagsSql(ids)).all(...ids) as { card_id: string; tag: string }[];
|
||
const cmts = db.prepare(allCmtsSql(ids)).all(...ids) as CommentRow[];
|
||
const byCard = groupByCard(cards, refs, tags, cmts); // O(n) stitch
|
||
return c.json({ cards: cards.map((r) => byCard[r.id]), lastUpdated: ... });
|
||
});
|
||
```
|
||
|
||
Helper to build safe `IN (?, ?, ...)` lists for `better-sqlite3` (or bind a JSON/temp-table). Result: **4 queries total** instead of 3N+1, and `hydrateCard` collapses to a pure shape-mapper.
|
||
|
||
Indexes already exist (`idx_comments_card`, `idx_tags_card`, `idx_refs_card`) so the batched queries are cheap.
|
||
|
||
---
|
||
|
||
## Phase 4 — Small wins (#5)
|
||
|
||
- **Stop refetching pages on every reload.** `DOC_PAGES` is a static registry loaded at module init. Fetch it **once** (cache in the hook / a tiny module-level promise), and have `reload()` hit only `/board`. The explicit "Reload" button can keep refetching pages if desired.
|
||
- **`reload()` on mutation failure** currently re-fetches board *and* pages; drop pages there too.
|
||
- **Hoist static inline style objects** to module scope where they're already constants (a few are local `const`s inside components — promote them). This is GC-churn only, do last.
|
||
- Optional: if the board grows past ~80–100 cards, add **windowing/virtualization** to each column (e.g. `@tanstack/react-virtual`). Not needed at 38 cards; document the threshold.
|
||
|
||
---
|
||
|
||
## Measurement plan (verify before/after)
|
||
|
||
1. **React Profiler** (React DevTools) on `KanbanBoardPage` while a run streams events:
|
||
- *Before:* every SSE event re-renders `KanbanBoardPage` + all 38 `KanbanCard`s.
|
||
- *After (1a–1d):* SSE events commit `KanbanBoardPage` (cheap) + `CardModal`/`AgentRunBar`; **0 `KanbanCard` commits** while the modal is open.
|
||
2. **Count commits** during a 10 s active run: target `board commits ≈ 0` while only the modal is open.
|
||
3. **`/board` latency**: `curl -w '%{time_total}' localhost:3001/api/kanban/board` — expect a drop from the N+1 path (measure with `EXPLAIN QUERY PLAN` to confirm single scans).
|
||
4. **Network**: during an active run with the modal open, confirm no `/api/orchestrator/runs` poll requests fire (Phase 2).
|
||
|
||
---
|
||
|
||
## Suggested execution order & effort
|
||
|
||
| Phase | Effort | Risk | Expected impact |
|
||
|-------|--------|------|-----------------|
|
||
| **1a–1d** (memoize + primitive props) | Small–Medium | Low (pure refactor, identical behavior) | **Cures the lag** during runs |
|
||
| **1-opt** (split event log out) | Medium | Low–Med | Removes residual parent churn — gated on profiler |
|
||
| **2** (adaptive poll) | Small | Low | Removes idle re-renders / network |
|
||
| **3** (batch `/board`) | Small–Medium | Low | Faster loads & reconciles |
|
||
| **4** (small wins) | Small | Low | Polish |
|
||
|
||
Recommend landing **Phase 1 first** and re-testing — it alone should resolve the perceived lag. Phases 2–4 are reinforcing improvements.
|
||
|
||
---
|
||
|
||
## Out of scope (noted, not addressed)
|
||
|
||
- No schema/feature changes; behavior is preserved.
|
||
- SpacetimeDB persistence migration is unrelated.
|
||
- The "agent may never mark `done`" rule is untouched.
|