Neon-Desk/rust/fiscal-xbrl-core/OPERATING_STATEMENT_PARSER_SPEC.md

Operating Statement Parser Spec

Purpose

This document defines the backend-only parsing rules for operating statement hydration in fiscal-xbrl-core.

This pass is intentionally limited to Rust parser behavior. It must not change frontend files, frontend rendering logic, or API response shapes.

Hydration Order

  1. Generic compact surface mapping builds initial surface_rows, detail_rows, and unmapped residuals.
  2. Universal income parsing rewrites the income statement into canonical operating-statement rows.
  3. Canonical income parsing is authoritative for income provenance and must prune any consumed residual rows from detail_rows["income"]["unmapped"].

Canonical Precedence Rule

For income rows, canonical universal mappings take precedence over generic residual classification.

If an income concept is consumed by a canonical operating-statement row, it must not remain in unmapped.

Alias Flattening Rule

Multiple source aliases for the same canonical operating-statement concept must flatten into a single canonical surface row.

Examples:

  • us-gaap:OtherOperatingExpense
  • us-gaap:OtherOperatingExpenses
  • us-gaap:OtherCostAndExpenseOperating

These may differ by filer or period, but they still represent one canonical row such as other_operating_expense.
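A minimal sketch of alias flattening, assuming a static alias table. The function names `canonical_key_for` and `flatten_aliases` are hypothetical and are not taken from fiscal-xbrl-core; only the three concept names and the `other_operating_expense` key come from the examples above.

```rust
use std::collections::BTreeMap;

/// Hypothetical alias table: maps a source concept to its canonical row key.
/// Only the three aliases listed in this spec are included.
pub fn canonical_key_for(concept: &str) -> Option<&'static str> {
    match concept {
        "us-gaap:OtherOperatingExpense"
        | "us-gaap:OtherOperatingExpenses"
        | "us-gaap:OtherCostAndExpenseOperating" => Some("other_operating_expense"),
        _ => None,
    }
}

/// Groups source concepts under their canonical key, so that any number of
/// aliases yields exactly one entry (one surface row) per canonical concept.
/// Concepts without a canonical mapping are skipped here.
pub fn flatten_aliases<'a>(concepts: &[&'a str]) -> BTreeMap<&'static str, Vec<&'a str>> {
    let mut rows: BTreeMap<&'static str, Vec<&'a str>> = BTreeMap::new();
    for &concept in concepts {
        if let Some(key) = canonical_key_for(concept) {
            rows.entry(key).or_default().push(concept);
        }
    }
    rows
}
```

Keying the output map by canonical key is what makes the surface row unique: a second alias can only extend the existing entry, never create a sibling row.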

Per-Period Resolution Rule

Direct canonical matching is resolved per period, not by selecting one global winner for all periods.

For each canonical income row:

  1. Collect all direct statement-row matches.
  2. For each period, keep only candidates with a value in that period.
  3. Choose the best candidate for that period using existing ranking rules.
  4. Build one canonical row whose values and resolved_source_row_keys are assembled period-by-period.

The canonical row's provenance is the union of all consumed aliases, even if a different alias wins in different periods.
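The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the parser's actual code: `Candidate`, `CanonicalRow`, and `resolve_per_period` are hypothetical names, the `rank` field (lower wins) stands in for the existing ranking rules, and every direct match is assumed to count as consumed even if it never wins a period.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified stand-in for a direct statement-row match.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub row_key: String,
    /// Period id -> reported value. A missing period means "no value".
    pub values: BTreeMap<String, f64>,
    /// Placeholder for the existing ranking rules: lower rank wins.
    pub rank: u32,
}

/// Hypothetical canonical row assembled period by period.
#[derive(Debug, Default)]
pub struct CanonicalRow {
    pub values: BTreeMap<String, f64>,
    /// Period id -> row key of the candidate that won that period.
    pub resolved_source_row_keys: BTreeMap<String, String>,
    /// Union of all matched aliases (assumption: all direct matches are consumed).
    pub consumed_row_keys: BTreeSet<String>,
}

pub fn resolve_per_period(candidates: &[Candidate], periods: &[&str]) -> CanonicalRow {
    let mut row = CanonicalRow::default();
    // Step 1: every direct match contributes to provenance.
    for c in candidates {
        row.consumed_row_keys.insert(c.row_key.clone());
    }
    for &period in periods {
        // Step 2: keep only candidates with a value in this period.
        // Step 3: pick the best one; ties go to the earliest candidate.
        let best = candidates
            .iter()
            .filter(|c| c.values.contains_key(period))
            .min_by_key(|c| c.rank);
        // Step 4: record the winning value and its source for this period.
        if let Some(c) = best {
            row.values.insert(period.to_string(), c.values[period]);
            row.resolved_source_row_keys
                .insert(period.to_string(), c.row_key.clone());
        }
    }
    row
}
```

With two candidates where the better-ranked one reports only the latest period, the result takes the latest period from it and the earlier period from the other alias, while `consumed_row_keys` contains both.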

Residual Pruning Rule

After canonical income rows are resolved:

  • collect all consumed source row keys
  • collect all consumed concept keys
  • remove any residual income detail row from unmapped if its source row key or its concept key matches a consumed identifier

unmapped is a strict remainder set after income canonicalization.
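A minimal sketch of the pruning step, assuming a simplified residual row shape. `ResidualRow` and `prune_unmapped` are hypothetical names used for illustration; the real row type carries more fields.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified residual detail row.
#[derive(Debug, Clone, PartialEq)]
pub struct ResidualRow {
    pub row_key: String,
    pub concept_key: String,
}

/// Removes every residual row whose row key or concept key was consumed by a
/// canonical income row. Returns the number of rows removed, which a caller
/// could use to keep normalization counts in sync with the pruned state.
pub fn prune_unmapped(
    unmapped: &mut Vec<ResidualRow>,
    consumed_row_keys: &HashSet<String>,
    consumed_concept_keys: &HashSet<String>,
) -> usize {
    let before = unmapped.len();
    // A row survives only if neither identifier is in a consumed set.
    unmapped.retain(|row| {
        !consumed_row_keys.contains(&row.row_key)
            && !consumed_concept_keys.contains(&row.concept_key)
    });
    before - unmapped.len()
}
```

Each membership check is an expected constant-time `HashSet` lookup, so one pass over `unmapped` is linear in the number of residual rows.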

Synonym vs Aggregate Child Rule

Two cases must remain distinct:

Synonym aliases

Different concept names representing the same canonical meaning.

Behavior:

  • flatten into one canonical surface row
  • do not emit as detail rows
  • do not leave in unmapped

Aggregate child components

Rows that are true components of a higher-level canonical row, such as the following rows used to derive selling_general_and_administrative:

  • SalesAndMarketingExpense
  • GeneralAndAdministrativeExpense

Behavior:

  • may appear as detail rows under the canonical parent
  • must not also remain in unmapped once consumed by that canonical parent
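The distinction can be made explicit in code by tagging each consumed source with its role. The sketch below is illustrative only: `SourceRole` and `place_consumed` are hypothetical names, and the function models only where a consumed source lands, not how parent values are derived.

```rust
/// Hypothetical role of a consumed source relative to its canonical row.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceRole {
    /// Same meaning as the canonical row under a different concept name.
    Synonym,
    /// A true component of a higher-level canonical row.
    AggregateChild,
}

/// Splits consumed sources into (surface provenance, detail rows).
/// Synonyms contribute only to the surface row's provenance; aggregate
/// children may be emitted as detail rows under the canonical parent.
/// Neither output feeds `unmapped`: consumed sources never stay there.
pub fn place_consumed<'a>(sources: &[(&'a str, SourceRole)]) -> (Vec<&'a str>, Vec<&'a str>) {
    let mut surface_provenance = Vec::new();
    let mut detail_rows = Vec::new();
    for &(concept, role) in sources {
        match role {
            SourceRole::Synonym => surface_provenance.push(concept),
            SourceRole::AggregateChild => detail_rows.push(concept),
        }
    }
    (surface_provenance, detail_rows)
}
```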

Required Invariants

For income parsing, each source row may appear in exactly one of these places:

  • canonical surface provenance
  • canonical detail provenance
  • unmapped

It must never appear in more than one place at the same time.

Additional invariants:

  • canonical surface rows are unique by canonical key
  • aliases are flattened into one canonical row
  • resolved_source_row_keys are period-specific
  • normalization counts reflect the post-pruning state
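The exclusivity invariant lends itself to a debug-time or test-time check. The following is a sketch, assuming source row keys have already been collected per place as string slices; `find_placement_violations` is a hypothetical helper, not an existing function.

```rust
use std::collections::HashMap;

/// Returns, in sorted order, every source row key that appears in more than
/// one of the three places. Repeats within a single place (for example the
/// same key winning several periods of one surface row) are not violations.
pub fn find_placement_violations<'a>(
    surface: &[&'a str],
    detail: &[&'a str],
    unmapped: &[&'a str],
) -> Vec<&'a str> {
    // One bit per place: bit 0 = surface, bit 1 = detail, bit 2 = unmapped.
    let mut places: HashMap<&'a str, u8> = HashMap::new();
    for (bit, keys) in [surface, detail, unmapped].iter().enumerate() {
        for &key in keys.iter() {
            *places.entry(key).or_insert(0) |= 1u8 << bit;
        }
    }
    let mut violations: Vec<&'a str> = places
        .into_iter()
        .filter(|(_, mask)| mask.count_ones() > 1)
        .map(|(key, _)| key)
        .collect();
    violations.sort_unstable();
    violations
}
```

An empty result means the invariant holds for the inspected statement.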

Performance Constraints

  • Use HashSet membership for consumed-source pruning.
  • Build candidate collections once per canonical definition.
  • Avoid UI-side dedupe or post-processing.
  • Keep the parser close to linear in candidate volume per definition.

Test Matrix

The parser tests must cover:

  • direct alias dedupe for other_operating_expense
  • period-sparse alias merge into a single canonical row
  • pruning of canonically consumed aliases from income.unmapped
  • preservation of truly unrelated residual rows
  • pruning of formula-consumed component rows from income.unmapped

Learnings For Other Statements

The same backend rules should later be applied to balance sheet and cash flow:

  • canonical mapping must outrank residual classification
  • alias resolution should be per-period
  • consumed sources must be removed from unmapped
  • synonym aliases and aggregate child components must be treated differently

When balance sheet and cash flow are upgraded, they should adopt these invariants without changing frontend response shapes.