# Operating Statement Parser Spec

## Purpose

This document defines the backend-only parsing rules for operating statement hydration in `fiscal-xbrl-core`.

This pass is intentionally limited to Rust parser behavior. It must not change frontend files, frontend rendering logic, or API response shapes.

## Hydration Order

1. Generic compact surface mapping builds initial `surface_rows`, `detail_rows`, and `unmapped` residuals.
2. Universal income parsing rewrites the income statement into canonical operating-statement rows.
3. Canonical income parsing is authoritative for income provenance and must prune any consumed residual rows from `detail_rows["income"]["unmapped"]`.

## Canonical Precedence Rule

For income rows, canonical universal mappings take precedence over generic residual classification.

If an income concept is consumed by a canonical operating-statement row, it must not remain in `unmapped`.

## Alias Flattening Rule

Multiple source aliases for the same canonical operating-statement concept must flatten into a single canonical surface row.

Examples:

- `us-gaap:OtherOperatingExpense`
- `us-gaap:OtherOperatingExpenses`
- `us-gaap:OtherCostAndExpenseOperating`

These may differ by filer or period, but they still represent one canonical row such as `other_operating_expense`.

## Per-Period Resolution Rule

Direct canonical matching is resolved per period, not by selecting one global winner for all periods.

For each canonical income row:

1. Collect all direct statement-row matches.
2. For each period, keep only candidates with a value in that period.
3. Choose the best candidate for that period using existing ranking rules.
4. Build one canonical row whose `values` and `resolved_source_row_keys` are assembled period-by-period.

The canonical row's provenance is the union of all consumed aliases, even if a different alias wins in different periods.
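
The four resolution steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `fiscal-xbrl-core` code: the `Candidate` and `CanonicalRow` types, the `rank` field (a stand-in for the existing ranking rules, lower wins), and the `resolve_per_period` function are all hypothetical names. The sketch also assumes that every direct match counts as consumed, including an alias that wins no period.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a statement row that directly matched
/// a canonical definition.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub row_key: String,
    pub concept_key: String,
    /// Placeholder for the existing ranking rules: lower rank wins.
    pub rank: u32,
    /// Period id -> reported value; `None` means no value in that period.
    pub values: BTreeMap<String, Option<f64>>,
}

/// Hypothetical canonical row assembled period-by-period.
#[derive(Debug, Default)]
pub struct CanonicalRow {
    pub key: String,
    pub values: BTreeMap<String, Option<f64>>,
    /// Period id -> row key of the alias that won that period.
    pub resolved_source_row_keys: BTreeMap<String, Option<String>>,
    /// Provenance: union of all aliases consumed by this row.
    pub consumed_row_keys: BTreeSet<String>,
    pub consumed_concept_keys: BTreeSet<String>,
}

pub fn resolve_per_period(
    key: &str,
    periods: &[&str],
    candidates: &[Candidate],
) -> CanonicalRow {
    let mut row = CanonicalRow {
        key: key.to_string(),
        ..Default::default()
    };

    for period in periods {
        // Step 2: keep only candidates that carry a value in this period.
        // Step 3: pick the best one; ties go to the earliest candidate.
        let winner = candidates
            .iter()
            .filter(|c| matches!(c.values.get(*period), Some(Some(_))))
            .min_by_key(|c| c.rank);

        // Step 4: assemble the value and its source for this period only.
        let value = winner.and_then(|c| c.values.get(*period).copied().flatten());
        row.values.insert((*period).to_string(), value);
        row.resolved_source_row_keys
            .insert((*period).to_string(), winner.map(|c| c.row_key.clone()));
    }

    // Assumption: every direct match is consumed, so provenance is the
    // union of all aliases regardless of which one won each period.
    for c in candidates {
        row.consumed_row_keys.insert(c.row_key.clone());
        row.consumed_concept_keys.insert(c.concept_key.clone());
    }

    row
}
```

With two aliases where the higher-ranked one has no value in an older period, the result is a single row whose newer period resolves to the first alias and whose older period resolves to the second, while both aliases appear in the consumed sets.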

## Residual Pruning Rule

After canonical income rows are resolved:

- collect all consumed source row keys
- collect all consumed concept keys
- remove any residual income detail row from `unmapped` if either identifier matches

`unmapped` is a strict remainder set after income canonicalization.

## Synonym vs Aggregate Child Rule

Two cases must remain distinct:

### Synonym aliases

Different concept names representing the same canonical meaning.

Behavior:

- flatten into one canonical surface row
- do not emit as detail rows
- do not leave in `unmapped`

### Aggregate child components

Rows that are true components of a higher-level canonical row, such as:

- `SalesAndMarketingExpense`
- `GeneralAndAdministrativeExpense`

used to derive `selling_general_and_administrative`

Behavior:

- may appear as detail rows under the canonical parent
- must not also remain in `unmapped` once consumed by that canonical parent

## Required Invariants

For income parsing, a consumed source may appear in exactly one of these places:

- canonical surface provenance
- canonical detail provenance
- `unmapped`

It must never appear in more than one place at the same time.

Additional invariants:

- canonical surface rows are unique by canonical key
- aliases are flattened into one canonical row
- `resolved_source_row_keys` are period-specific
- normalization counts reflect the post-pruning state

## Performance Constraints

- Use `HashSet` membership for consumed-source pruning.
- Build candidate collections once per canonical definition.
- Avoid UI-side dedupe or post-processing.
- Keep the parser close to linear in candidate volume per definition.
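
As a minimal sketch of the `HashSet`-based pruning described above: the `ResidualRow` type and the `prune_unmapped` function are hypothetical names chosen for illustration and may not match the real parser's types. The sketch assumes the consumed row keys and concept keys have already been collected once from the resolved canonical rows.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a residual income detail row in `unmapped`.
#[derive(Debug, Clone, PartialEq)]
pub struct ResidualRow {
    pub row_key: String,
    pub concept_key: String,
}

/// Removes every residual row whose row key OR concept key was consumed
/// by a canonical row, and returns how many rows were pruned so that
/// normalization counts can reflect the post-pruning state.
pub fn prune_unmapped(
    unmapped: &mut Vec<ResidualRow>,
    consumed_row_keys: &HashSet<String>,
    consumed_concept_keys: &HashSet<String>,
) -> usize {
    let before = unmapped.len();

    // One pass over the residuals with average O(1) set lookups,
    // so the cost stays linear in the number of residual rows.
    unmapped.retain(|row| {
        !consumed_row_keys.contains(&row.row_key)
            && !consumed_concept_keys.contains(&row.concept_key)
    });

    before - unmapped.len()
}
```

`Vec::retain` preserves the order of the surviving rows, so unrelated residuals stay exactly where they were. A row is pruned when either identifier matches, which covers an alias that reappears under a different row key but the same concept key.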

## Test Matrix

The parser must cover:

- direct alias dedupe for `other_operating_expense`
- period-sparse alias merge into a single canonical row
- pruning of canonically consumed aliases from `income.unmapped`
- preservation of truly unrelated residual rows
- pruning of formula-consumed component rows from `income.unmapped`

## Learnings For Other Statements

The same backend rules should later be applied to balance sheet and cash flow:

- canonical mapping must outrank residual classification
- alias resolution should be per-period
- consumed sources must be removed from `unmapped`
- synonym aliases and aggregate child components must be treated differently

When balance sheet and cash flow are upgraded, they should adopt these invariants without changing frontend response shapes.