diff --git a/mosaiciq-performance-audit-optimization-plan.md b/mosaiciq-performance-audit-optimization-plan.md new file mode 100644 index 0000000..a08a3b7 --- /dev/null +++ b/mosaiciq-performance-audit-optimization-plan.md @@ -0,0 +1,438 @@ +# MosaicIQ Performance Audit & Optimization Plan + +## Scope + +This revision is based on the current code in `MosaicIQ/` and a fresh production build. + +Verified baseline: + +- The main Vite entry chunk is still large: `dist/assets/index-RcdHjVMm.js` is `825.72 kB` minified and `246.79 kB` gzipped. +- The app already lazy-loads `ResearchMode`, `SettingsPage`, and `ResearchGraph`. +- The research backend opens a new SQLite connection inside `spawn_blocking` for every repository call. +- The research UI reloads the full workspace projection after note, ghost, and workspace events. + +## Corrected Executive Summary + +The biggest problems are not “lack of React.memo” or “missing async SQLite” in the abstract. The real bottlenecks are: + +1. `research` and `news` repository calls pay per-call SQLite connection setup cost and serialize a lot of work through `spawn_blocking`. +2. Workspace reads are duplicated and amplified: the backend assembles projections sequentially, and the frontend re-fetches the entire projection after many small events. +3. Background job processing can race. The scheduler loop and `kick_job_processor()` both call `process_due_jobs()`, but claiming work is not atomic. +4. Several hot paths still do N+1 work: `list_sources_by_ids`, note-save loops after link inference, and repeated audit trail fetches for the same selection. +5. Bundle size is a real issue, but the fix is targeted dependency splitting and measurement, not generic `manualChunks` by component file. + +## P0: Fix Correctness + High-Leverage Latency First + +### 1. Make job claiming atomic before adding concurrency + +Why this is first: + +- `spawn_research_scheduler()` runs every 3 seconds. +- `kick_job_processor()` also spawns ad hoc processors on note capture, note update, and retry. +- `process_due_jobs()` reads due jobs, then marks them running in a separate step. + +That means two runners can observe the same queued job and both process it. + +Files: + +- `src-tauri/src/research/pipeline.rs` +- `src-tauri/src/research/service.rs` +- `src-tauri/src/research/repository.rs` + +Evidence: + +- `spawn_research_scheduler()` loops forever and calls `service.process_due_jobs().await`. +- `kick_job_processor()` also spawns `process_due_jobs()`. +- `list_due_jobs()` and `mark_running()` are separate operations. + +Recommended fix: + +1. Replace “list then mark” with a single transactional claim method in the repository. +2. Ensure only one background processor is active at a time, or gate concurrent processors with a mutex/semaphore. +3. Only consider parallel per-job execution after claim semantics are safe. + +Implementation sketch: + +```rust +pub async fn claim_due_jobs(&self, limit: usize) -> Result> { + self.with_connection(move |connection| { + let tx = connection.transaction()?; + + let jobs = { + let mut stmt = tx.prepare( + "SELECT entity_json + FROM pipeline_jobs + WHERE status IN (?1, ?2) + AND (next_attempt_at IS NULL OR next_attempt_at <= ?3) + ORDER BY updated_at ASC + LIMIT ?4" + )?; + // read rows here + }; + + for job in &jobs { + tx.execute( + "UPDATE pipeline_jobs + SET status = ?2, updated_at = ?3, entity_json = ?4 + WHERE id = ?1 AND status IN (?5, ?6)", + params![/* updated running job */], + )?; + } + + tx.commit()?; + Ok(jobs) + }).await +} +``` + +Success criteria: + +- A job ID is never processed twice in logs for one enqueue. +- `kick_job_processor()` no longer creates overlapping workers. + +### 2. Parallelize projection assembly and audit-trail reads + +Why this matters: + +- `get_workspace_projection()` currently does four independent repository reads sequentially. +- `get_note_audit_trail()` also does a sequence of independent reads and then calls an N+1 helper for sources. + +Files: + +- `src-tauri/src/research/service.rs` + +Recommended fix: + +1. Use `tokio::try_join!` in `get_workspace_projection()`. +2. Use `tokio::try_join!` in `get_note_audit_trail()` for links, ghosts, and audit events after loading the note. +3. Deduplicate source IDs before querying sources. + +Implementation sketch: + +```rust +let workspace_fut = self.repository.get_workspace(&request.workspace_id); +let notes_fut = self.repository.list_notes(&request.workspace_id, false, None); +let links_fut = self.repository.list_links(&request.workspace_id, None); +let ghosts_fut = self.repository.list_ghosts(&request.workspace_id, false); + +let (workspace, notes, links, ghosts) = + tokio::try_join!(workspace_fut, notes_fut, links_fut, ghosts_fut)?; +``` + +Success criteria: + +- `get_workspace_projection` latency drops materially under tracing. +- `get_note_audit_trail` no longer performs serial backend waits for independent reads. + +### 3. Stop reloading the full workspace projection on every small event + +Why this matters: + +- `useResearchProjection` schedules a full `getWorkspaceProjection` refetch on workspace, note, and ghost updates. +- Background jobs emit note and ghost updates, so one user action can trigger repeated full projection reloads. + +Files: + +- `src/hooks/useResearchProjection.ts` +- `src/components/Research/ResearchMode.tsx` +- `src/components/Research/ResearchInspector.tsx` + +Recommended fix: + +1. Patch local projection state from event payloads where possible instead of re-fetching the whole projection. +2. Keep full reloads for coarse invalidation only. +3. Share one audit-trail fetch path. Right now both `ResearchMode` and `ResearchInspector` fetch `getNoteAuditTrail()` for the selected note. + +Notes: + +- `ResearchMode` fetches audit trail on selection change. +- `ResearchInspector` fetches the same audit trail again when `note` changes. + +Success criteria: + +- Selecting a note results in one audit-trail request, not two. +- Background enrichment/linking no longer causes repeated full projection fetches for the same workspace state. + +### 4. Batch link-inference writes and avoid unnecessary rewrites + +Why this matters: + +- `process_infer_links()` recalculates links, replaces all links, then re-saves every note in the workspace one by one. +- This is expensive and also creates extra downstream event churn. + +Files: + +- `src-tauri/src/research/service.rs` +- `src-tauri/src/research/repository.rs` + +Recommended fix: + +1. Add a transactional `save_notes_batch`. +2. Only persist notes whose inferred-link set actually changed. +3. Consider diffing links before `replace_links_for_workspace()` to avoid full delete/reinsert when unchanged. + +Success criteria: + +- Large workspaces no longer perform `N` separate note saves after every link inference pass. +- No-op link inference produces minimal writes. + +## P1: Reduce Database Overhead Without Prematurely Rewriting the Stack + +### 5. Reuse SQLite connections instead of opening one per repository call + +What is true: + +- `with_connection()` in both repositories uses `spawn_blocking` and opens a fresh SQLite connection every time. + +What is not yet justified: + +- A full migration to `sqlx` should not be the first recommendation. SQLite “async” drivers still use background threads internally, and the current biggest cost is repeated connection setup plus query shape, not just the driver choice. + +Files: + +- `src-tauri/src/research/repository.rs` +- `src-tauri/src/news/repository.rs` + +Recommended fix order: + +1. Keep `rusqlite` initially. +2. Introduce a small connection pool or a dedicated DB worker with persistent connections. +3. Re-measure before considering a driver migration. + +Candidate approaches: + +- `r2d2_sqlite` +- `deadpool-sqlite` +- one long-lived DB thread per subsystem if contention remains low + +Success criteria: + +- Repository calls no longer pay `Connection::open()` and PRAGMA setup on every operation. + +### 6. Replace `list_sources_by_ids` N+1 lookup with a single query + +Why this matters: + +- `list_sources_by_ids()` loops over IDs and executes one query per source. +- `get_note_audit_trail()` and `process_refresh_source()` both depend on it. + +Files: + +- `src-tauri/src/research/repository.rs` + +Recommended fix: + +1. Deduplicate incoming IDs. +2. Build a single `IN (?, ?, ...)` query. +3. Preserve input order in memory if the caller depends on it. + +Success criteria: + +- Source lookup for audit trails becomes one DB round-trip instead of many. + +### 7. Add the indexes that are actually missing + +The draft overstated this area. The repository already creates several useful indexes for notes, links, ghosts, sources, and jobs. + +Real candidates: + +1. `audit_events(entity_id, created_at)` for note audit trails. +2. `audit_events(workspace_id, created_at)` for bundle export. +3. `research_notes(workspace_id, source_id, note_type)` for `find_source_reference_note()`. +4. Source checksum/accession should not rely on unindexed `json_extract(entity_json, ...)` lookups on a hot path. + +Files: + +- `src-tauri/src/research/repository.rs` + +Recommended fix: + +1. Add the missing audit and source-reference indexes. +2. Promote checksum and filing accession into indexed columns, or add generated columns if that fits the migration strategy. + +Success criteria: + +- Query plans for audit-trail and source-dedup queries stop full-scanning growing tables. + +### 8. Normalize time comparisons for due jobs + +Why this matters: + +- `next_attempt_at` is written as RFC 3339 text. +- `list_due_jobs()` compares it to `datetime('now')`. + +That string comparison is fragile because the formats do not match exactly. + +Files: + +- `src-tauri/src/research/repository.rs` +- `src-tauri/src/research/util.rs` + +Recommended fix: + +1. Store retry timestamps as integer epoch seconds, or +2. Store them in a SQLite-compatible normalized format consistently. + +Success criteria: + +- Retry timing is deterministic and easy to index. + +## P2: Frontend and Bundle Work Based On Measured Hotspots + +### 9. Shrink the main bundle with dependency-level splitting + +What is true: + +- The main entry chunk is still `825.72 kB`. +- That is large enough to justify work. + +What is already done: + +- `App.tsx` already lazy-loads `ResearchMode` and `SettingsPage`. +- `ResearchMode.tsx` already lazy-loads `ResearchGraph`. + +What needs correction: + +- `manualChunks` should split heavy dependencies, not component source paths. +- Do bundle analysis first so chunk rules map to real weight. + +Files: + +- `vite.config.ts` +- `src/App.tsx` +- heavy feature entry points in `src/components` + +Recommended fix: + +1. Add a bundle analyzer for one pass. +2. Split by heavy libraries that dominate the main chunk. +3. Lazy-load rarely used panels or dependencies if they currently land in `index`. + +Success criteria: + +- The main `index-*.js` chunk is materially smaller after analysis-driven chunking. + +### 10. Treat `React.memo` as a profiling tool, not a blanket rule + +The draft was too broad here. + +Current state: + +- There is already lazy loading and targeted `useMemo`/`useDeferredValue` usage. +- There is no evidence yet that “all components need `React.memo`”. + +Recommended fix: + +1. Profile note-heavy views first. +2. Memoize only components that are proven hot and receive stable props. +3. Prefer fixing projection refresh storms and duplicate fetches before adding memo wrappers. + +Possible targets after profiling: + +- `NoteCard` +- `GhostCard` +- large list/board containers + +### 11. Do not treat file length as a performance bug by itself + +Large files such as `CommandInput.tsx` and `ResearchMode.tsx` are maintainability concerns, but splitting them does not automatically improve runtime performance. + +Recommended posture: + +- Refactor them when it enables real wins such as better lazy-loading boundaries, simpler state ownership, or isolated expensive subtrees. + +### 12. Defer virtualization until real note-count profiling justifies it + +Virtualized lists may become necessary for very large workspaces, but current higher-order bottlenecks are above them: + +- full projection refetches +- duplicate audit fetches +- batch-save inefficiencies +- non-atomic background job processing + +Add virtualization only after measuring large note canvases/boards. + +## Items Removed Or Downgraded From The Original Draft + +These were either incorrect, overstated, or not yet justified: + +- “No React.memo on components” as a top-level diagnosis. +- “Switch to sqlx” as the default first fix. +- Blanket entity caching in the repository. +- “Missing indexes” as a generic claim. +- “Excessive startTransition” as a meaningful bottleneck without profiling. +- “Potential memory leaks in Arc references” based on a file path that does not exist in this repo. +- Redis, memcached, or GraphQL recommendations for a local-first desktop app. + +## Recommended Execution Order + +### Phase 1: Stabilize background processing and remove duplicate work + +1. Atomic job claim + single-runner guard. +2. Parallelize `get_workspace_projection()`. +3. Parallelize `get_note_audit_trail()` and replace `list_sources_by_ids()` N+1. +4. Remove duplicate audit-trail fetches between `ResearchMode` and `ResearchInspector`. +5. Reduce full projection refetches from research events. + +Expected impact: + +- Better correctness. +- Faster workspace and inspector loads. +- Less UI thrash during background processing. + +### Phase 2: Reduce SQLite overhead and write amplification + +1. Introduce persistent SQLite connections. +2. Batch note saves after link inference. +3. Avoid rewriting unchanged inferred links/notes. +4. Add missing indexes and normalize due-job timestamp storage. + +Expected impact: + +- Lower latency under repeated research activity. +- Better scalability as workspaces grow. + +### Phase 3: Bundle and rendering cleanup + +1. Analyze the main chunk. +2. Split heavy dependencies intentionally. +3. Memoize only proven hot components. +4. Consider virtualization only if measured note counts require it. + +Expected impact: + +- Faster startup. +- Less JS to parse on initial load. + +## Verification Plan + +Add measurement before and after each phase. + +Backend: + +- Log timing for `get_workspace_projection`, `get_note_audit_trail`, `process_due_jobs`, and link inference. +- Log claimed job IDs so duplicate processing is visible immediately. +- Use `EXPLAIN QUERY PLAN` for audit-event and source-dedup queries after schema changes. + +Frontend: + +- Count `getWorkspaceProjection` and `getNoteAuditTrail` invocations during a single note capture flow. +- Use React Profiler on research views before adding memoization. +- Keep bundle-size snapshots from `npm run build`. + +Success targets: + +- One audit-trail request per note selection. +- No duplicate job processing. +- Meaningfully fewer projection refetches during enrichment/linking. +- Smaller main bundle than the current `825.72 kB`. + +## Short Version + +If only a few changes get done, do these first: + +1. Make job claiming atomic and prevent overlapping processors. +2. Parallelize projection and audit-trail reads. +3. Stop full projection refetches and duplicate audit-trail fetches in the research UI. +4. Batch note writes after link inference. +5. Reuse SQLite connections before considering a driver migration. diff --git a/response-overflow-check.png b/response-overflow-check.png deleted file mode 100644 index 4531f37..0000000 Binary files a/response-overflow-check.png and /dev/null differ