Expand backend financial statement parsers
rust/fiscal-xbrl-core/OPERATING_STATEMENT_PARSER_SPEC.md
# Operating Statement Parser Spec

## Purpose
This document defines the backend-only parsing rules for operating statement hydration in `fiscal-xbrl-core`.

This pass is intentionally limited to Rust parser behavior. It must not change frontend files, frontend rendering logic, or API response shapes.

## Hydration Order
1. Generic compact surface mapping builds initial `surface_rows`, `detail_rows`, and `unmapped` residuals.
2. Universal income parsing rewrites the income statement into canonical operating-statement rows.
3. Canonical income parsing is authoritative for income provenance and must prune any consumed residual rows from `detail_rows["income"]["unmapped"]`.

## Canonical Precedence Rule
For income rows, canonical universal mappings take precedence over generic residual classification.

If an income concept is consumed by a canonical operating-statement row, it must not remain in `unmapped`.

## Alias Flattening Rule
Multiple source aliases for the same canonical operating-statement concept must flatten into a single canonical surface row.

Examples:
- `us-gaap:OtherOperatingExpense`
- `us-gaap:OtherOperatingExpenses`
- `us-gaap:OtherCostAndExpenseOperating`

These may differ by filer or period, but they still represent one canonical row such as `other_operating_expense`.

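As a minimal sketch of alias flattening, the lookup below maps the three example concepts to one canonical key. The function name `canonical_key` and the hard-coded `match` table are illustrative assumptions; the actual alias definitions in `fiscal-xbrl-core` may be stored and matched differently.

```rust
/// Hypothetical alias lookup: returns the canonical row key for a source
/// concept, or `None` when the concept has no canonical mapping here.
pub fn canonical_key(concept: &str) -> Option<&'static str> {
    // Drop an optional namespace prefix such as "us-gaap:".
    let local_name = concept.rsplit(':').next().unwrap_or(concept);
    match local_name {
        // All three aliases flatten into the same canonical surface row.
        "OtherOperatingExpense"
        | "OtherOperatingExpenses"
        | "OtherCostAndExpenseOperating" => Some("other_operating_expense"),
        _ => None,
    }
}
```

Because every alias resolves to the same key, grouping source rows by `canonical_key` yields at most one surface row per canonical concept.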
## Per-Period Resolution Rule
Direct canonical matching is resolved per period, not by selecting one global winner for all periods.

For each canonical income row:
1. Collect all direct statement-row matches.
2. For each period, keep only candidates with a value in that period.
3. Choose the best candidate for that period using the existing ranking rules.
4. Build one canonical row whose `values` and `resolved_source_row_keys` are assembled period-by-period.

The canonical row's provenance is the union of all consumed aliases, even if a different alias wins in different periods.

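The four steps above can be sketched as follows. This is a simplified illustration, not the parser's actual code: `SourceRow`, `CanonicalRow`, the numeric `rank` field (lower wins, standing in for the existing ranking rules), and the `consumed_source_row_keys` field are all assumed names. The sketch also assumes that every direct match counts as consumed, even one that never wins a period.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified statement row.
#[derive(Debug, Clone)]
pub struct SourceRow {
    pub key: String,
    /// Lower rank wins; a stand-in for the existing ranking rules.
    pub rank: u32,
    /// Period label -> value. A missing entry means no value in that period.
    pub values: BTreeMap<String, f64>,
}

/// Hypothetical canonical row assembled period by period.
#[derive(Debug, Default)]
pub struct CanonicalRow {
    pub values: BTreeMap<String, f64>,
    /// Period label -> key of the source row that won that period.
    pub resolved_source_row_keys: BTreeMap<String, String>,
    /// Union of all direct matches (assumption: losers are consumed too).
    pub consumed_source_row_keys: BTreeSet<String>,
}

pub fn resolve_per_period(candidates: &[SourceRow], periods: &[&str]) -> CanonicalRow {
    let mut row = CanonicalRow::default();
    row.consumed_source_row_keys = candidates.iter().map(|c| c.key.clone()).collect();

    for &period in periods {
        // Keep only candidates with a value in this period, then take the
        // best-ranked one. Ties go to the earliest candidate in the slice.
        let winner = candidates
            .iter()
            .filter_map(|c| c.values.get(period).map(|v| (c, *v)))
            .min_by_key(|(c, _)| c.rank);

        if let Some((source, value)) = winner {
            row.values.insert(period.to_string(), value);
            row.resolved_source_row_keys
                .insert(period.to_string(), source.key.clone());
        }
    }
    row
}
```

With two aliases where the preferred one reports only the latest period, the canonical row takes the preferred alias for that period and falls back to the other alias for earlier periods, which is the period-sparse merge case.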
## Residual Pruning Rule
After canonical income rows are resolved:
- collect all consumed source row keys
- collect all consumed concept keys
- remove any residual income detail row from `unmapped` if either identifier matches

`unmapped` is a strict remainder set after income canonicalization.

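A minimal sketch of the pruning step, assuming a simplified `DetailRow` with `row_key` and `concept_key` fields (both names are illustrative, not taken from the crate). It uses `HashSet` membership and a single `Vec::retain` pass, so the cost stays linear in the number of residual rows.

```rust
use std::collections::HashSet;

/// Hypothetical residual detail row.
#[derive(Debug, Clone, PartialEq)]
pub struct DetailRow {
    pub row_key: String,
    pub concept_key: String,
}

/// Removes every residual row whose row key or concept key was consumed by
/// a canonical income row. Returns the number of rows removed.
pub fn prune_unmapped(
    unmapped: &mut Vec<DetailRow>,
    consumed_row_keys: &HashSet<String>,
    consumed_concept_keys: &HashSet<String>,
) -> usize {
    let before = unmapped.len();
    // Keep a row only if neither identifier matches a consumed key.
    unmapped.retain(|row| {
        !consumed_row_keys.contains(&row.row_key)
            && !consumed_concept_keys.contains(&row.concept_key)
    });
    before - unmapped.len()
}
```

Returning the removed count is one way to keep normalization counts in step with the post-pruning state.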
## Synonym vs Aggregate Child Rule
Two cases must remain distinct:

### Synonym aliases
Different concept names representing the same canonical meaning.

Behavior:
- flatten into one canonical surface row
- do not emit as detail rows
- do not leave in `unmapped`

### Aggregate child components
Rows that are true components of a higher-level canonical row. For example, `selling_general_and_administrative` is derived from:
- `SalesAndMarketingExpense`
- `GeneralAndAdministrativeExpense`

Behavior:
- may appear as detail rows under the canonical parent
- must not also remain in `unmapped` once consumed by that canonical parent

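One way to keep the two cases distinct in code is to tag each consumed source with its role and derive the placement from that tag. The `SourceKind` and `Placement` types below are hypothetical and exist only to restate the behavior lists above in a checkable form.

```rust
/// Hypothetical role of a consumed source relative to its canonical row.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind {
    /// A different concept name with the same canonical meaning.
    SynonymAlias,
    /// A true component of a higher-level canonical row.
    AggregateChild,
}

/// Where a consumed source is allowed to show up after hydration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Placement {
    pub may_emit_detail_row: bool,
    pub may_remain_unmapped: bool,
}

pub fn placement_for(kind: SourceKind) -> Placement {
    match kind {
        // Synonyms vanish into the canonical surface row.
        SourceKind::SynonymAlias => Placement {
            may_emit_detail_row: false,
            may_remain_unmapped: false,
        },
        // Children may be shown under the parent, but never in `unmapped`.
        SourceKind::AggregateChild => Placement {
            may_emit_detail_row: true,
            may_remain_unmapped: false,
        },
    }
}
```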
## Required Invariants
For income parsing, a source row may appear in only one of these places:
- canonical surface provenance
- canonical detail provenance
- `unmapped`

It must never appear in more than one place at the same time.

Additional invariants:
- canonical surface rows are unique by canonical key
- aliases are flattened into one canonical row
- `resolved_source_row_keys` are period-specific
- normalization counts reflect the post-pruning state

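The one-place invariant lends itself to a small test helper. The sketch below is an assumption about how such a check could look: `find_overlaps` is a hypothetical function that takes the source keys found in each of the three places and returns any key that shows up in more than one of them.

```rust
use std::collections::{BTreeSet, HashSet};

/// Returns, in sorted order, every source key that appears in more than one
/// of the three places. An empty result means the invariant holds.
pub fn find_overlaps(
    surface_provenance: &[String],
    detail_provenance: &[String],
    unmapped: &[String],
) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut overlaps: BTreeSet<&str> = BTreeSet::new();

    for bucket in [surface_provenance, detail_provenance, unmapped].iter() {
        // Dedupe inside a bucket so repeats within one place are not flagged.
        let unique: HashSet<&str> = bucket.iter().map(String::as_str).collect();
        for key in unique {
            if !seen.insert(key) {
                overlaps.insert(key);
            }
        }
    }
    overlaps.into_iter().map(str::to_string).collect()
}
```

A parser test can then assert that `find_overlaps(...)` is empty after hydration.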
## Performance Constraints
- Use `HashSet` membership for consumed-source pruning.
- Build candidate collections once per canonical definition.
- Avoid UI-side dedupe or post-processing.
- Keep the parser close to linear in candidate volume per definition.

## Test Matrix
The parser tests must cover:
- direct alias dedupe for `other_operating_expense`
- period-sparse alias merge into a single canonical row
- pruning of canonically consumed aliases from `income.unmapped`
- preservation of truly unrelated residual rows
- pruning of formula-consumed component rows from `income.unmapped`

## Learnings For Other Statements
The same backend rules should later be applied to balance sheet and cash flow:
- canonical mapping must outrank residual classification
- alias resolution should be per-period
- consumed sources must be removed from `unmapped`
- synonym aliases and aggregate child components must be treated differently

When balance sheet and cash flow are upgraded, they should adopt these invariants without changing frontend response shapes.