Files
MosaicIQ/mosaiciq-performance-audit-optimization-plan.md

15 KiB

MosaicIQ Performance Audit & Optimization Plan

Scope

This revision is based on the current code in MosaicIQ/ and a fresh production build.

Verified baseline:

  • The main Vite entry chunk is still large: dist/assets/index-RcdHjVMm.js is 825.72 kB minified and 246.79 kB gzipped.
  • The app already lazy-loads ResearchMode, SettingsPage, and ResearchGraph.
  • The research backend opens a new SQLite connection inside spawn_blocking for every repository call.
  • The research UI reloads the full workspace projection after note, ghost, and workspace events.

Corrected Executive Summary

The biggest problems are not “lack of React.memo” or “missing async SQLite” in the abstract. The real bottlenecks are:

  1. research and news repository calls pay per-call SQLite connection setup cost and serialize a lot of work through spawn_blocking.
  2. Workspace reads are duplicated and amplified: the backend assembles projections sequentially, and the frontend re-fetches the entire projection after many small events.
  3. Background job processing can race. The scheduler loop and kick_job_processor() both call process_due_jobs(), but claiming work is not atomic.
  4. Several hot paths still do N+1 work: list_sources_by_ids, note-save loops after link inference, and repeated audit trail fetches for the same selection.
  5. Bundle size is a real issue, but the fix is targeted dependency splitting and measurement, not generic manualChunks by component file.

P0: Fix Correctness + High-Leverage Latency First

1. Make job claiming atomic before adding concurrency

Why this is first:

  • spawn_research_scheduler() runs every 3 seconds.
  • kick_job_processor() also spawns ad hoc processors on note capture, note update, and retry.
  • process_due_jobs() reads due jobs, then marks them running in a separate step.

That means two runners can observe the same queued job and both process it.

Files:

  • src-tauri/src/research/pipeline.rs
  • src-tauri/src/research/service.rs
  • src-tauri/src/research/repository.rs

Evidence:

  • spawn_research_scheduler() loops forever and calls service.process_due_jobs().await.
  • kick_job_processor() also spawns process_due_jobs().
  • list_due_jobs() and mark_running() are separate operations.

Recommended fix:

  1. Replace “list then mark” with a single transactional claim method in the repository.
  2. Ensure only one background processor is active at a time, or gate concurrent processors with a mutex/semaphore.
  3. Only consider parallel per-job execution after claim semantics are safe.

Implementation sketch:

pub async fn claim_due_jobs(&self, limit: usize) -> Result<Vec<PipelineJob>> {
    self.with_connection(move |connection| {
        let tx = connection.transaction()?;

        let jobs = {
            let mut stmt = tx.prepare(
                "SELECT entity_json
                 FROM pipeline_jobs
                 WHERE status IN (?1, ?2)
                   AND (next_attempt_at IS NULL OR next_attempt_at <= ?3)
                 ORDER BY updated_at ASC
                 LIMIT ?4"
            )?;
            // read rows here
        };

        for job in &jobs {
            tx.execute(
                "UPDATE pipeline_jobs
                 SET status = ?2, updated_at = ?3, entity_json = ?4
                 WHERE id = ?1 AND status IN (?5, ?6)",
                params![/* updated running job */],
            )?;
        }

        tx.commit()?;
        Ok(jobs)
    }).await
}

Success criteria:

  • A job ID is never processed twice in logs for one enqueue.
  • kick_job_processor() no longer creates overlapping workers.

2. Parallelize projection assembly and audit-trail reads

Why this matters:

  • get_workspace_projection() currently does four independent repository reads sequentially.
  • get_note_audit_trail() also does a sequence of independent reads and then calls an N+1 helper for sources.

Files:

  • src-tauri/src/research/service.rs

Recommended fix:

  1. Use tokio::try_join! in get_workspace_projection().
  2. Use tokio::try_join! in get_note_audit_trail() for links, ghosts, and audit events after loading the note.
  3. Deduplicate source IDs before querying sources.

Implementation sketch:

let workspace_fut = self.repository.get_workspace(&request.workspace_id);
let notes_fut = self.repository.list_notes(&request.workspace_id, false, None);
let links_fut = self.repository.list_links(&request.workspace_id, None);
let ghosts_fut = self.repository.list_ghosts(&request.workspace_id, false);

let (workspace, notes, links, ghosts) =
    tokio::try_join!(workspace_fut, notes_fut, links_fut, ghosts_fut)?;

Success criteria:

  • get_workspace_projection latency drops materially under tracing.
  • get_note_audit_trail no longer performs serial backend waits for independent reads.

3. Stop reloading the full workspace projection on every small event

Why this matters:

  • useResearchProjection schedules a full getWorkspaceProjection refetch on workspace, note, and ghost updates.
  • Background jobs emit note and ghost updates, so one user action can trigger repeated full projection reloads.

Files:

  • src/hooks/useResearchProjection.ts
  • src/components/Research/ResearchMode.tsx
  • src/components/Research/ResearchInspector.tsx

Recommended fix:

  1. Patch local projection state from event payloads where possible instead of re-fetching the whole projection.
  2. Keep full reloads for coarse invalidation only.
  3. Share one audit-trail fetch path. Right now both ResearchMode and ResearchInspector fetch getNoteAuditTrail() for the selected note.

Notes:

  • ResearchMode fetches audit trail on selection change.
  • ResearchInspector fetches the same audit trail again when note changes.

Success criteria:

  • Selecting a note results in one audit-trail request, not two.
  • Background enrichment/linking no longer causes repeated full projection fetches for the same workspace state.

Why this matters:

  • process_infer_links() recalculates links, replaces all links, then re-saves every note in the workspace one by one.
  • This is expensive and also creates extra downstream event churn.

Files:

  • src-tauri/src/research/service.rs
  • src-tauri/src/research/repository.rs

Recommended fix:

  1. Add a transactional save_notes_batch.
  2. Only persist notes whose inferred-link set actually changed.
  3. Consider diffing links before replace_links_for_workspace() to avoid full delete/reinsert when unchanged.

Success criteria:

  • Large workspaces no longer perform N separate note saves after every link inference pass.
  • No-op link inference produces minimal writes.

P1: Reduce Database Overhead Without Prematurely Rewriting the Stack

5. Reuse SQLite connections instead of opening one per repository call

What is true:

  • with_connection() in both repositories uses spawn_blocking and opens a fresh SQLite connection every time.

What is not yet justified:

  • A full migration to sqlx should not be the first recommendation. SQLite “async” drivers still use background threads internally, and the current biggest cost is repeated connection setup plus query shape, not just the driver choice.

Files:

  • src-tauri/src/research/repository.rs
  • src-tauri/src/news/repository.rs

Recommended fix order:

  1. Keep rusqlite initially.
  2. Introduce a small connection pool or a dedicated DB worker with persistent connections.
  3. Re-measure before considering a driver migration.

Candidate approaches:

  • r2d2_sqlite
  • deadpool-sqlite
  • one long-lived DB thread per subsystem if contention remains low

Success criteria:

  • Repository calls no longer pay Connection::open() and PRAGMA setup on every operation.

6. Replace list_sources_by_ids N+1 lookup with a single query

Why this matters:

  • list_sources_by_ids() loops over IDs and executes one query per source.
  • get_note_audit_trail() and process_refresh_source() both depend on it.

Files:

  • src-tauri/src/research/repository.rs

Recommended fix:

  1. Deduplicate incoming IDs.
  2. Build a single IN (?, ?, ...) query.
  3. Preserve input order in memory if the caller depends on it.

Success criteria:

  • Source lookup for audit trails becomes one DB round-trip instead of many.

7. Add the indexes that are actually missing

The draft overstated this area. The repository already creates several useful indexes for notes, links, ghosts, sources, and jobs.

Real candidates:

  1. audit_events(entity_id, created_at) for note audit trails.
  2. audit_events(workspace_id, created_at) for bundle export.
  3. research_notes(workspace_id, source_id, note_type) for find_source_reference_note().
  4. Source checksum/accession should not rely on unindexed json_extract(entity_json, ...) lookups on a hot path.

Files:

  • src-tauri/src/research/repository.rs

Recommended fix:

  1. Add the missing audit and source-reference indexes.
  2. Promote checksum and filing accession into indexed columns, or add generated columns if that fits the migration strategy.

Success criteria:

  • Query plans for audit-trail and source-dedup queries stop full-scanning growing tables.

8. Normalize time comparisons for due jobs

Why this matters:

  • next_attempt_at is written as RFC 3339 text.
  • list_due_jobs() compares it to datetime('now').

That string comparison is fragile because the formats do not match exactly.

Files:

  • src-tauri/src/research/repository.rs
  • src-tauri/src/research/util.rs

Recommended fix:

  1. Store retry timestamps as integer epoch seconds, or
  2. Store them in a SQLite-compatible normalized format consistently.

Success criteria:

  • Retry timing is deterministic and easy to index.

P2: Frontend and Bundle Work Based On Measured Hotspots

9. Shrink the main bundle with dependency-level splitting

What is true:

  • The main entry chunk is still 825.72 kB.
  • That is large enough to justify work.

What is already done:

  • App.tsx already lazy-loads ResearchMode and SettingsPage.
  • ResearchMode.tsx already lazy-loads ResearchGraph.

What needs correction:

  • manualChunks should split heavy dependencies, not component source paths.
  • Do bundle analysis first so chunk rules map to real weight.

Files:

  • vite.config.ts
  • src/App.tsx
  • heavy feature entry points in src/components

Recommended fix:

  1. Add a bundle analyzer for one pass.
  2. Split by heavy libraries that dominate the main chunk.
  3. Lazy-load rarely used panels or dependencies if they currently land in index.

Success criteria:

  • The main index-*.js chunk is materially smaller after analysis-driven chunking.

10. Treat React.memo as a profiling tool, not a blanket rule

The draft was too broad here.

Current state:

  • There is already lazy loading and targeted useMemo/useDeferredValue usage.
  • There is no evidence yet that “all components need React.memo”.

Recommended fix:

  1. Profile note-heavy views first.
  2. Memoize only components that are proven hot and receive stable props.
  3. Prefer fixing projection refresh storms and duplicate fetches before adding memo wrappers.

Possible targets after profiling:

  • NoteCard
  • GhostCard
  • large list/board containers

11. Do not treat file length as a performance bug by itself

Large files such as CommandInput.tsx and ResearchMode.tsx are maintainability concerns, but splitting them does not automatically improve runtime performance.

Recommended posture:

  • Refactor them when it enables real wins such as better lazy-loading boundaries, simpler state ownership, or isolated expensive subtrees.

12. Defer virtualization until real note-count profiling justifies it

Virtualized lists may become necessary for very large workspaces, but current higher-order bottlenecks are above them:

  • full projection refetches
  • duplicate audit fetches
  • batch-save inefficiencies
  • non-atomic background job processing

Add virtualization only after measuring large note canvases/boards.

Items Removed Or Downgraded From The Original Draft

These were either incorrect, overstated, or not yet justified:

  • “No React.memo on components” as a top-level diagnosis.
  • “Switch to sqlx” as the default first fix.
  • Blanket entity caching in the repository.
  • “Missing indexes” as a generic claim.
  • “Excessive startTransition” as a meaningful bottleneck without profiling.
  • “Potential memory leaks in Arc references” based on a file path that does not exist in this repo.
  • Redis, memcached, or GraphQL recommendations for a local-first desktop app.

Phase 1: Stabilize background processing and remove duplicate work

  1. Atomic job claim + single-runner guard.
  2. Parallelize get_workspace_projection().
  3. Parallelize get_note_audit_trail() and replace list_sources_by_ids() N+1.
  4. Remove duplicate audit-trail fetches between ResearchMode and ResearchInspector.
  5. Reduce full projection refetches from research events.

Expected impact:

  • Better correctness.
  • Faster workspace and inspector loads.
  • Less UI thrash during background processing.

Phase 2: Reduce SQLite overhead and write amplification

  1. Introduce persistent SQLite connections.
  2. Batch note saves after link inference.
  3. Avoid rewriting unchanged inferred links/notes.
  4. Add missing indexes and normalize due-job timestamp storage.

Expected impact:

  • Lower latency under repeated research activity.
  • Better scalability as workspaces grow.

Phase 3: Bundle and rendering cleanup

  1. Analyze the main chunk.
  2. Split heavy dependencies intentionally.
  3. Memoize only proven hot components.
  4. Consider virtualization only if measured note counts require it.

Expected impact:

  • Faster startup.
  • Less JS to parse on initial load.

Verification Plan

Add measurement before and after each phase.

Backend:

  • Log timing for get_workspace_projection, get_note_audit_trail, process_due_jobs, and link inference.
  • Log claimed job IDs so duplicate processing is visible immediately.
  • Use EXPLAIN QUERY PLAN for audit-event and source-dedup queries after schema changes.

Frontend:

  • Count getWorkspaceProjection and getNoteAuditTrail invocations during a single note capture flow.
  • Use React Profiler on research views before adding memoization.
  • Keep bundle-size snapshots from npm run build.

Success targets:

  • One audit-trail request per note selection.
  • No duplicate job processing.
  • Meaningfully fewer projection refetches during enrichment/linking.
  • Smaller main bundle than the current 825.72 kB.

Short Version

If only a few changes get done, do these first:

  1. Make job claiming atomic and prevent overlapping processors.
  2. Parallelize projection and audit-trail reads.
  3. Stop full projection refetches and duplicate audit-trail fetches in the research UI.
  4. Batch note writes after link inference.
  5. Reuse SQLite connections before considering a driver migration.