Operating Statement Parser Spec
Purpose
This document defines the backend-only parsing rules for operating statement hydration in fiscal-xbrl-core.
This pass is intentionally limited to Rust parser behavior. It must not change frontend files, frontend rendering logic, or API response shapes.
Hydration Order
- Generic compact surface mapping builds initial surface_rows, detail_rows, and unmapped residuals.
- Universal income parsing rewrites the income statement into canonical operating-statement rows.
- Canonical income parsing is authoritative for income provenance and must prune any consumed residual rows from detail_rows["income"]["unmapped"].
Canonical Precedence Rule
For income rows, canonical universal mappings take precedence over generic residual classification.
If an income concept is consumed by a canonical operating-statement row, it must not remain in unmapped.
Alias Flattening Rule
Multiple source aliases for the same canonical operating-statement concept must flatten into a single canonical surface row.
Examples:
- us-gaap:OtherOperatingExpense
- us-gaap:OtherOperatingExpenses
- us-gaap:OtherCostAndExpenseOperating
These may differ by filer or period, but they still represent one canonical row such as other_operating_expense.
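A minimal sketch of alias flattening, assuming a static lookup table from source concept to canonical key. The function names and the table shape are hypothetical; only the three aliases and the other_operating_expense key come from this spec.

```rust
use std::collections::HashMap;

/// Hypothetical alias table: source concept -> canonical row key.
/// Only the three aliases listed in this spec are included.
fn canonical_alias_table() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("us-gaap:OtherOperatingExpense", "other_operating_expense"),
        ("us-gaap:OtherOperatingExpenses", "other_operating_expense"),
        ("us-gaap:OtherCostAndExpenseOperating", "other_operating_expense"),
    ])
}

/// Groups source concepts by canonical key so that each canonical key
/// yields exactly one surface row, however many aliases fed it.
/// Concepts without a table entry are skipped here (they stay residual).
fn flatten_aliases(concepts: &[&str]) -> HashMap<String, Vec<String>> {
    let table = canonical_alias_table();
    let mut grouped: HashMap<String, Vec<String>> = HashMap::new();
    for concept in concepts {
        if let Some(canonical) = table.get(*concept) {
            grouped
                .entry(canonical.to_string())
                .or_default()
                .push(concept.to_string());
        }
    }
    grouped
}
```

Keying the result by canonical key is what makes surface rows unique by construction: two aliases can only ever land in the same entry.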
Per-Period Resolution Rule
Direct canonical matching is resolved per period, not by selecting one global winner for all periods.
For each canonical income row:
- Collect all direct statement-row matches.
- For each period, keep only candidates with a value in that period.
- Choose the best candidate for that period using existing ranking rules.
- Build one canonical row whose values and resolved_source_row_keys are assembled period-by-period.
The canonical row's provenance is the union of all consumed aliases, even if a different alias wins in different periods.
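The per-period steps above can be sketched as follows. This is an illustration under assumptions, not the parser's actual code: the Candidate and CanonicalRow types, the rank field (a stand-in for the existing ranking rules, lower wins), and the provenance_row_keys field are all hypothetical. The sketch records only winning aliases as provenance; whether losing candidates also count as consumed is not stated here.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical candidate: one statement row that directly matches
/// a canonical definition.
#[derive(Debug, Clone)]
struct Candidate {
    row_key: String,
    /// Stand-in for the existing ranking rules: lower rank wins.
    rank: u32,
    /// Period id -> reported value; a missing key means no value.
    values: BTreeMap<String, f64>,
}

/// Hypothetical canonical row assembled period-by-period.
#[derive(Debug, Default)]
struct CanonicalRow {
    values: BTreeMap<String, f64>,
    /// Period id -> row key of the alias that won that period.
    resolved_source_row_keys: BTreeMap<String, String>,
    /// Union of every alias that won at least one period (assumption).
    provenance_row_keys: BTreeSet<String>,
}

fn resolve_per_period(candidates: &[Candidate], periods: &[&str]) -> CanonicalRow {
    let mut row = CanonicalRow::default();
    for period in periods {
        // Keep only candidates with a value in this period,
        // then pick the best-ranked one (first wins on ties).
        let best = candidates
            .iter()
            .filter_map(|c| c.values.get(*period).map(|v| (c, *v)))
            .min_by_key(|(c, _)| c.rank);
        if let Some((winner, value)) = best {
            row.values.insert(period.to_string(), value);
            row.resolved_source_row_keys
                .insert(period.to_string(), winner.row_key.clone());
            row.provenance_row_keys.insert(winner.row_key.clone());
        }
    }
    row
}
```

A period in which no candidate reports a value is simply absent from the result, so a sparse alias never blocks a better-populated one in other periods.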
Residual Pruning Rule
After canonical income rows are resolved:
- collect all consumed source row keys
- collect all consumed concept keys
- remove any residual income detail row from unmapped if either identifier matches
unmapped is a strict remainder set after income canonicalization.
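One way to express the pruning step, assuming residual rows carry both a row key and a concept key. The DetailRow type and its field names are hypothetical; the rule itself (drop the row if either identifier was consumed) is the one stated above.

```rust
use std::collections::HashSet;

/// Hypothetical residual detail row; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct DetailRow {
    row_key: String,
    concept_key: String,
}

/// Drops every residual row whose row key OR concept key was consumed
/// by a canonical income row, leaving a strict remainder set.
fn prune_unmapped(
    unmapped: &mut Vec<DetailRow>,
    consumed_row_keys: &HashSet<String>,
    consumed_concept_keys: &HashSet<String>,
) {
    unmapped.retain(|row| {
        !consumed_row_keys.contains(&row.row_key)
            && !consumed_concept_keys.contains(&row.concept_key)
    });
}
```

Each retained-or-dropped decision is two hash lookups, so the pass stays linear in the number of residual rows.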
Synonym vs Aggregate Child Rule
Two cases must remain distinct:
Synonym aliases
Different concept names representing the same canonical meaning.
Behavior:
- flatten into one canonical surface row
- do not emit as detail rows
- do not leave in unmapped
Aggregate child components
Rows that are true components of a higher-level canonical row, such as:
- SalesAndMarketingExpense
- GeneralAndAdministrativeExpense
These are used to derive selling_general_and_administrative.
Behavior:
- may appear as detail rows under the canonical parent
- must not also remain in unmapped once consumed by that canonical parent
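The distinction between the two cases can be made explicit in the type system. The sketch below is illustrative only: SourceRole, Placement, and placement_for are hypothetical names, and the parser may encode this differently.

```rust
/// Hypothetical classification of a consumed income source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SourceRole {
    /// A different concept name with the same canonical meaning.
    SynonymAlias,
    /// A true component of a higher-level canonical row.
    AggregateChild,
}

/// Where a consumed source is allowed to surface after canonicalization.
#[derive(Debug, PartialEq, Eq)]
struct Placement {
    detail_row_allowed: bool,
    keep_in_unmapped: bool,
}

fn placement_for(role: SourceRole) -> Placement {
    match role {
        // Synonyms flatten into the surface row and appear nowhere else.
        SourceRole::SynonymAlias => Placement {
            detail_row_allowed: false,
            keep_in_unmapped: false,
        },
        // Components may show under the canonical parent, never in unmapped.
        SourceRole::AggregateChild => Placement {
            detail_row_allowed: true,
            keep_in_unmapped: false,
        },
    }
}
```

Both roles leave unmapped; they differ only in whether a detail row under the canonical parent is permitted.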
Required Invariants
For income parsing, a consumed source may appear in exactly one of these places:
- canonical surface provenance
- canonical detail provenance
- unmapped
It must never appear in more than one place at the same time.
Additional invariants:
- canonical surface rows are unique by canonical key
- aliases are flattened into one canonical row
- resolved_source_row_keys are period-specific
- normalization counts reflect the post-pruning state
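The exclusivity invariant lends itself to a test helper. The following is a sketch under the assumption that each of the three places can be reduced to a set of source keys; find_invariant_violations is a hypothetical name.

```rust
use std::collections::HashSet;

/// Returns the source keys that appear in more than one of the three
/// places (surface provenance, detail provenance, unmapped).
/// An empty result means the exclusivity invariant holds.
fn find_invariant_violations(
    surface_provenance: &HashSet<String>,
    detail_provenance: &HashSet<String>,
    unmapped: &HashSet<String>,
) -> Vec<String> {
    let mut violations: Vec<String> = surface_provenance
        .intersection(detail_provenance)
        .chain(surface_provenance.intersection(unmapped))
        .chain(detail_provenance.intersection(unmapped))
        .cloned()
        .collect();
    // A key present in all three places is reported once.
    violations.sort();
    violations.dedup();
    violations
}
```

Asserting that this returns an empty list after hydration would cover the "never in more than one place" requirement in a single check.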
Performance Constraints
- Use HashSet membership for consumed-source pruning.
- Build candidate collections once per canonical definition.
- Avoid UI-side dedupe or post-processing.
- Keep the parser close to linear in candidate volume per definition.
Test Matrix
The parser must cover:
- direct alias dedupe for other_operating_expense
- period-sparse alias merge into a single canonical row
- pruning of canonically consumed aliases from income.unmapped
- preservation of truly unrelated residual rows
- pruning of formula-consumed component rows from income.unmapped
Learnings For Other Statements
The same backend rules should later be applied to balance sheet and cash flow:
- canonical mapping must outrank residual classification
- alias resolution should be per-period
- consumed sources must be removed from unmapped
- synonym aliases and aggregate child components must be treated differently
When balance sheet and cash flow are upgraded, they should adopt these invariants without changing frontend response shapes.