Cache and reuse filing HTML during KPI note extraction #18


Francy51 · 2026-03-15T01:25:59Z


Structured KPI extraction from filing notes repeatedly refetches and reparses the same filing HTML inside nested loops.

Why this is a problem:

- Needlessly amplifies SEC requests.
- Repeats expensive HTML parsing work.
- Increases latency and makes throttling more likely.
- Violates the performance-first priority.

Observed in:

- `lib/server/financials/kpi-notes.ts`
  - `extractStructuredKpisFromNotes()` loops over definitions and periods
  - each period/definition path can call `fetchHtml()` for the same filing again
  - each fetched document is reparsed with Cheerio
- `lib/server/financial-taxonomy.ts`
  - invokes note extraction in the taxonomy pipeline

Suggested direction:

- Fetch each filing HTML at most once per extraction run.
- Parse each filing HTML once and reuse the parsed representation.
- Restructure the algorithm to iterate filing -> parsed tables -> matching definitions, instead of definition -> period -> fetch.
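
A rough sketch of what this direction could look like, not the actual code in `kpi-notes.ts`. Every name here (`createFilingCache`, `extractKpis`, `KpiDefinition`, the `FetchHtml`/`ParseHtml` types) is hypothetical, and the fetcher and parser are injected so the real `fetchHtml()` and a Cheerio-based parser could be plugged in. The cache stores the in-flight promise, so concurrent callers for the same filing share one request.

```typescript
// Hypothetical signatures: the real fetchHtml() and parser in
// lib/server/financials/kpi-notes.ts may look different.
type FetchHtml = (url: string) => Promise<string>;
type ParseHtml<T> = (html: string) => T;

// Returns a lookup that fetches and parses each filing URL at most once
// for the lifetime of the returned function (intended: one extraction run).
function createFilingCache<T>(
  fetchHtml: FetchHtml,
  parseHtml: ParseHtml<T>,
): (url: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return (url: string): Promise<T> => {
    const cached = cache.get(url);
    if (cached) return cached;
    const pending = fetchHtml(url).then(parseHtml);
    cache.set(url, pending);
    // Evict failed entries so a later call in the same run can retry.
    pending.catch(() => {
      if (cache.get(url) === pending) cache.delete(url);
    });
    return pending;
  };
}

// Hypothetical definition shape: each definition inspects the parsed filing
// and returns a value, or undefined when nothing in the filing matches.
interface KpiDefinition<T> {
  key: string;
  extract: (parsed: T) => number | undefined;
}

interface KpiResult {
  url: string;
  key: string;
  value: number;
}

// Filing-first iteration: load each filing once, then run every
// definition against the already-parsed representation.
async function extractKpis<T>(
  filingUrls: string[],
  definitions: KpiDefinition<T>[],
  getParsedFiling: (url: string) => Promise<T>,
): Promise<KpiResult[]> {
  const results: KpiResult[] = [];
  for (const url of Array.from(new Set(filingUrls))) {
    const parsed = await getParsedFiling(url);
    for (const def of definitions) {
      const value = def.extract(parsed);
      if (value !== undefined) results.push({ url, key: def.key, value });
    }
  }
  return results;
}
```

Creating the cache inside the extraction entry point keeps its lifetime scoped to one run, so stale filings are not held across requests. Whether a longer-lived cache is worth it is a separate question this sketch does not address.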

Acceptance criteria:

- Each filing document is fetched at most once per extraction pass.
- HTML parsing is reused rather than repeated for each definition.
- Extraction latency and outbound SEC request volume are materially reduced.
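
One possible way to check the first criterion in a test is to wrap the fetcher with a per-URL counter and assert that no URL was requested more than once after a run. This is only a sketch. `countFetches` and `duplicateFetches` are hypothetical helpers, and the wrapped function stands in for whatever `fetchHtml()` actually looks like.

```typescript
// Hypothetical fetcher signature, standing in for the real fetchHtml().
type HtmlFetcher = (url: string) => Promise<string>;

// Wraps a fetcher and records how many times each URL is requested.
function countFetches(fetchHtml: HtmlFetcher): {
  counted: HtmlFetcher;
  counts: Map<string, number>;
} {
  const counts = new Map<string, number>();
  const counted: HtmlFetcher = (url: string) => {
    counts.set(url, (counts.get(url) || 0) + 1);
    return fetchHtml(url);
  };
  return { counted, counts };
}

// Returns the URLs that were fetched more than once; an empty array
// means the "at most once per extraction pass" criterion held.
function duplicateFetches(counts: Map<string, number>): string[] {
  return Array.from(counts.entries())
    .filter(([, n]) => n > 1)
    .map(([url]) => url);
}
```

A test would pass `counted` into the extraction pass in place of the real fetcher and then expect `duplicateFetches(counts)` to be empty. The sum of the values in `counts` also gives the outbound request volume for before/after comparison.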


Francy51 added the P1 label 2026-03-15 01:25:59 +00:00


Reference: Francy51/Neon-Desk#18