Expand backend financial statement parsers
144
rust/fiscal-xbrl-core/BALANCE_SHEET_PARSER_SPEC.md
Normal file
@@ -0,0 +1,144 @@
# Balance Sheet Parser Spec

## Purpose
This document defines the backend-only balance-sheet parsing rules for `fiscal-xbrl-core`.

This pass is limited to Rust parser behavior and taxonomy packs. It must not modify frontend files, frontend rendering logic, or frontend response shapes.

## Hydration Order
1. Load the selected surface pack.
2. For non-core packs, merge in any core balance-sheet surfaces that the selected pack does not override.
3. Resolve direct canonical balance rows from statement rows.
4. Resolve aggregate-child rows from detail components when direct canonical rows are absent.
5. Resolve formula-backed balance rows from already-resolved canonical rows.
6. Emit `unmapped` only for rows not consumed by canonical balance parsing.
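
Step 2 of the order above can be sketched as follows. This is a minimal illustration, not the crate's actual code: `SurfaceDef`, its fields, and `merge_core_balance_surfaces` are hypothetical names, and "override" is assumed to mean "the selected pack defines a balance surface with the same key".

```rust
use std::collections::HashSet;

/// Hypothetical surface definition; the real pack schema may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct SurfaceDef {
    pub surface_key: String,
    pub statement: String,
}

/// Append every core balance surface that the selected pack does not override.
/// Assumption: a surface is overridden when the selected pack already defines
/// a balance surface with the same `surface_key`.
pub fn merge_core_balance_surfaces(
    selected: &[SurfaceDef],
    core: &[SurfaceDef],
) -> Vec<SurfaceDef> {
    let overridden: HashSet<&str> = selected
        .iter()
        .filter(|s| s.statement == "balance")
        .map(|s| s.surface_key.as_str())
        .collect();

    // Selected surfaces keep their order; missing core surfaces follow.
    let mut merged = selected.to_vec();
    merged.extend(
        core.iter()
            .filter(|s| s.statement == "balance")
            .filter(|s| !overridden.contains(s.surface_key.as_str()))
            .cloned(),
    );
    merged
}
```

Keeping the selected pack's surfaces first means a sector pack's own definitions always win, while core coverage fills the gaps.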

## Category Taxonomy
Balance rows use these backend category keys:
- `current_assets`
- `noncurrent_assets`
- `current_liabilities`
- `noncurrent_liabilities`
- `equity`
- `derived`
- `sector_specific`

Default rule:
- use economic placement first
- reserve `sector_specific` for rows that do not fit any of the economic categories above
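
One way to keep these keys consistent across the parser is a closed enum with a string round trip. The sketch below is illustrative only; `BalanceCategory` and its methods are hypothetical names, and only the key strings come from the list above.

```rust
/// Backend category keys for balance rows (type name is hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum BalanceCategory {
    CurrentAssets,
    NoncurrentAssets,
    CurrentLiabilities,
    NoncurrentLiabilities,
    Equity,
    Derived,
    SectorSpecific,
}

impl BalanceCategory {
    /// Every category, in the order the spec lists them.
    pub const ALL: [BalanceCategory; 7] = [
        BalanceCategory::CurrentAssets,
        BalanceCategory::NoncurrentAssets,
        BalanceCategory::CurrentLiabilities,
        BalanceCategory::NoncurrentLiabilities,
        BalanceCategory::Equity,
        BalanceCategory::Derived,
        BalanceCategory::SectorSpecific,
    ];

    /// The backend key string for this category.
    pub fn as_key(self) -> &'static str {
        match self {
            BalanceCategory::CurrentAssets => "current_assets",
            BalanceCategory::NoncurrentAssets => "noncurrent_assets",
            BalanceCategory::CurrentLiabilities => "current_liabilities",
            BalanceCategory::NoncurrentLiabilities => "noncurrent_liabilities",
            BalanceCategory::Equity => "equity",
            BalanceCategory::Derived => "derived",
            BalanceCategory::SectorSpecific => "sector_specific",
        }
    }

    /// Parse a backend key; unknown keys yield `None`.
    pub fn from_key(key: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|c| c.as_key() == key)
    }
}
```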

## Canonical Precedence Rule
Canonical balance mappings take precedence over residual classification.

If a statement row is consumed by a canonical balance row, it must not remain in `detail_rows["balance"]["unmapped"]`.

## Alias Flattening Rule
Synonymous balance concepts flatten into one canonical surface row.

Example:
- `AccountsReceivableNetCurrent`
- `ReceivablesNetCurrent`

These must become one `accounts_receivable` row with period-aware provenance.
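
The flattening in the example can be pictured as a lookup from reported concept to canonical key, followed by grouping. This is a sketch under assumptions: `canonical_key_for` and `group_by_canonical` are hypothetical helpers, and the alias table is hard-coded here with only the two concepts named above, whereas the real aliases live in the taxonomy packs.

```rust
use std::collections::BTreeMap;

/// Map a reported concept name to its canonical surface key.
/// Only the two aliases from the spec's example are listed.
pub fn canonical_key_for(concept: &str) -> Option<&'static str> {
    match concept {
        "AccountsReceivableNetCurrent" | "ReceivablesNetCurrent" => Some("accounts_receivable"),
        _ => None,
    }
}

/// Group reported concepts under their canonical key so that synonyms
/// collapse into a single bucket. Concepts with no alias entry are skipped.
pub fn group_by_canonical<'a>(concepts: &[&'a str]) -> BTreeMap<&'static str, Vec<&'a str>> {
    let mut groups: BTreeMap<&'static str, Vec<&'a str>> = BTreeMap::new();
    for &concept in concepts {
        if let Some(key) = canonical_key_for(concept) {
            groups.entry(key).or_default().push(concept);
        }
    }
    groups
}
```

Both aliases land in the same `accounts_receivable` bucket, which is what lets the parser emit one row while still recording each contributing concept.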

## Per-Period Resolution Rule
Direct balance matching is resolved per period, not by choosing one row globally.

For each canonical balance row:
1. Collect all direct candidates.
2. For each period, choose the best candidate with a value in that period.
3. Build one canonical row from those period-specific winners.
4. Preserve the union of all consumed aliases in `source_concepts`, `source_row_keys`, and `source_fact_ids`.
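
The four steps above might look roughly like the sketch below. It rests on several assumptions that the spec does not state: "best" is modeled as the lowest `rank`, only period winners are treated as consumed, and `source_fact_ids` is omitted because it would follow the same pattern as the other two provenance sets. `Candidate`, `ResolvedRow`, and `resolve_per_period` are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical direct candidate for one canonical balance row.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub concept: String,
    pub row_key: String,
    /// Assumption: lower rank means a better match.
    pub rank: u32,
    /// Period id -> reported value; `None` means no fact for that period.
    pub values: BTreeMap<String, Option<f64>>,
}

/// Canonical row assembled from period-specific winners.
#[derive(Debug, Default)]
pub struct ResolvedRow {
    pub values: BTreeMap<String, f64>,
    pub source_concepts: BTreeSet<String>,
    pub source_row_keys: BTreeSet<String>,
}

pub fn resolve_per_period(candidates: &[Candidate], periods: &[&str]) -> ResolvedRow {
    let mut row = ResolvedRow::default();
    for &period in periods {
        // Step 2: among candidates with a value in this period, take the
        // lowest rank (ties go to the earliest candidate).
        let winner = candidates
            .iter()
            .filter_map(|c| c.values.get(period).copied().flatten().map(|v| (c, v)))
            .min_by_key(|(c, _)| c.rank);

        if let Some((c, value)) = winner {
            // Step 3: the canonical row takes the winner's value.
            row.values.insert(period.to_string(), value);
            // Step 4: provenance is the union across all periods.
            row.source_concepts.insert(c.concept.clone());
            row.source_row_keys.insert(c.row_key.clone());
        }
    }
    row
}
```

With a period-sparse pair of aliases, each period can be won by a different candidate, and the resulting row lists both concepts as sources.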

## Formula Evaluation Rule
Structured formulas are evaluated only after their source surface rows have been resolved.

Supported operators:
- `sum`
- `subtract`

Formula rules:
- formulas operate period by period
- `sum` may treat nulls as zero when `treat_null_as_zero` is true
- `subtract` requires exactly two sources
- formula rows inherit provenance from the source surface rows they consume
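
A single-period evaluator for the two operators could be sketched like this. Several behaviors here are assumptions rather than rules from the spec: a `sum` whose sources are all null stays null, a `sum` without `treat_null_as_zero` is null if any source is null, `subtract` computes first minus second, and a `subtract` with the wrong number of sources or a null source yields null instead of an error. `FormulaOp` and `eval_period` are hypothetical names.

```rust
/// The two operators the spec supports (type name is hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FormulaOp {
    Sum { treat_null_as_zero: bool },
    Subtract,
}

/// Evaluate one formula for one period. `sources` holds the already-resolved
/// source values for that period, in formula order.
pub fn eval_period(op: FormulaOp, sources: &[Option<f64>]) -> Option<f64> {
    match op {
        FormulaOp::Sum { treat_null_as_zero } => {
            // Assumption: no data at all means no derived value.
            if sources.iter().all(Option::is_none) {
                return None;
            }
            if treat_null_as_zero {
                Some(sources.iter().map(|v| v.unwrap_or(0.0)).sum())
            } else {
                // Assumption: any null source makes the whole sum null.
                sources.iter().copied().sum()
            }
        }
        FormulaOp::Subtract => match sources {
            // Exactly two sources; assumed order is minuend, then subtrahend.
            [Some(a), Some(b)] => Some(a - b),
            _ => None,
        },
    }
}
```

Calling `eval_period` once per period keeps the formula logic free of any cross-period state, which matches the "period by period" rule.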

## Residual Pruning Rule
`balance.unmapped` is a strict remainder set.

A balance statement row must be excluded from `unmapped` when either of these is true:
- its row key was consumed by a canonical balance row
- its concept key was consumed by a canonical balance row
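
The two exclusion conditions amount to a filter over the statement rows. The sketch below assumes a simplified `StatementRow` with just the two keys; the struct and `prune_unmapped` are hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified statement row.
#[derive(Debug, Clone, PartialEq)]
pub struct StatementRow {
    pub row_key: String,
    pub concept_key: String,
}

/// Keep only rows whose row key AND concept key were both left unconsumed,
/// so a match on either key is enough to exclude a row from `unmapped`.
pub fn prune_unmapped(
    rows: Vec<StatementRow>,
    consumed_row_keys: &HashSet<String>,
    consumed_concept_keys: &HashSet<String>,
) -> Vec<StatementRow> {
    rows.into_iter()
        .filter(|r| {
            !consumed_row_keys.contains(&r.row_key)
                && !consumed_concept_keys.contains(&r.concept_key)
        })
        .collect()
}
```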

## Helper Surface Rule
Some balance rows are parser helpers rather than user-facing canonical output.

Current helper rows:
- `deferred_revenue_current`
- `deferred_revenue_noncurrent`
- `current_liabilities`
- `leases`

Behavior:
- they remain available to formulas
- they do not appear in emitted `surface_rows`
- they do not create emitted detail buckets
- they still consume matched backend sources so those rows do not leak into `unmapped`
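
A simple way to picture the emission side of this rule: helpers are resolved like any other surface, then filtered out just before output. In this sketch the helper list is hard-coded from the list above, and `is_helper_surface` and `emitted_surface_keys` are hypothetical names; the real parser may mark helpers in the pack definition instead.

```rust
/// Helper surface keys, copied from the spec's current list.
const HELPER_SURFACES: [&str; 4] = [
    "deferred_revenue_current",
    "deferred_revenue_noncurrent",
    "current_liabilities",
    "leases",
];

/// True when the surface is a backend-only helper.
pub fn is_helper_surface(key: &str) -> bool {
    HELPER_SURFACES.contains(&key)
}

/// Given every resolved surface key (helpers included, since formulas need
/// them), return only the keys that belong in emitted `surface_rows`.
pub fn emitted_surface_keys<'a>(resolved: &[&'a str]) -> Vec<&'a str> {
    resolved
        .iter()
        .copied()
        .filter(|k| !is_helper_surface(k))
        .collect()
}
```

Filtering at emission time, rather than skipping helpers during resolution, is what allows them to keep consuming sources and feeding formulas.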

## Synonym vs Aggregate Child Rule
Two cases must remain distinct.

### Synonym aliases
Different concept names for the same canonical balance meaning.

Behavior:
- flatten into one canonical surface row
- do not emit duplicate detail rows
- do not remain in `unmapped`

### Aggregate child components
Rows that legitimately roll into a subtotal or total.

Behavior:
- may remain as detail rows beneath the canonical parent when grouping is enabled
- must not remain in `unmapped` after being consumed

## Sector Placement Decisions
Sector rows stay inside the same economic taxonomy.

Mappings in this pass:
- `loans` -> `noncurrent_assets`
- `allowance_for_credit_losses` -> `noncurrent_assets`
- `deposits` -> `current_liabilities`
- `policy_liabilities` -> `noncurrent_liabilities`
- `deferred_acquisition_costs` -> `noncurrent_assets`
- `investment_property` -> `noncurrent_assets`

`sector_specific` remains unused by default in this pass.
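
Expressed as code, the placements above reduce to a small lookup. This is only an illustration: `sector_category` is a hypothetical function, and in practice the mapping presumably lives in the taxonomy packs, not in a hard-coded `match`.

```rust
/// Category key for each sector surface placed in this pass.
/// Returns `None` for keys this pass does not place.
pub fn sector_category(surface_key: &str) -> Option<&'static str> {
    match surface_key {
        "loans"
        | "allowance_for_credit_losses"
        | "deferred_acquisition_costs"
        | "investment_property" => Some("noncurrent_assets"),
        "deposits" => Some("current_liabilities"),
        "policy_liabilities" => Some("noncurrent_liabilities"),
        _ => None,
    }
}
```

None of the six mappings resolves to `sector_specific`, consistent with that category staying unused by default.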

## Required Invariants
- A consumed balance source must never remain in `balance.unmapped`.
- A synonym alias must never create more than one canonical balance row.
- Hidden helper surfaces may consume sources but must not appear in emitted `surface_rows`.
- Formula-derived rows inherit canonical provenance from their source surfaces.
- The frontend response shape remains unchanged.

## Test Matrix
The parser must cover:
- direct alias flattening for `accounts_receivable`
- period-sparse alias merges into one canonical row
- formula derivation for `total_cash_and_equivalents`
- formula derivation for `unearned_revenue`
- formula derivation for `total_debt`
- formula derivation for `net_cash_position`
- helper rows staying out of emitted balance surfaces
- residual pruning of canonically consumed balance rows
- sector packs receiving merged core balance coverage without changing frontend contracts

## Learnings Reusable For Other Statements
The same parser rules should later apply to cash flow:
- canonical mapping outranks residual classification
- direct aliases should resolve per period
- helper rows can exist backend-only when formulas need them
- consumed sources must be removed from `unmapped`
- sector packs should inherit common canonical coverage rather than duplicating it