Expand backend financial statement parsers

2026-03-12 21:15:54 -04:00
parent 33ce48f53c
commit 7a7a78340f
13 changed files with 4398 additions and 456 deletions


@@ -0,0 +1,144 @@
# Balance Sheet Parser Spec
## Purpose
This document defines the backend-only balance-sheet parsing rules for `fiscal-xbrl-core`.
This pass is limited to Rust parser behavior and taxonomy packs. It must not modify frontend files, frontend rendering logic, or frontend response shapes.
## Hydration Order
1. Load the selected surface pack.
2. For non-core packs, merge in any core balance-sheet surfaces that the selected pack does not override.
3. Resolve direct canonical balance rows from statement rows.
4. Resolve aggregate-child rows from detail components when direct canonical rows are absent.
5. Resolve formula-backed balance rows from already-resolved canonical rows.
6. Emit `unmapped` only for rows not consumed by canonical balance parsing.
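
Step 2 of the hydration order amounts to a key-based merge. The following is a minimal Rust sketch under stated assumptions. `SurfacePack`, `SurfaceDef`, `merge_core_balance`, and their fields are hypothetical stand-ins, because the spec does not show the pack schema. A pack is assumed to "override" a core surface by defining a surface with the same key. The XBRL concept lists are illustrative inputs only.

```rust
use std::collections::HashSet;

// Hypothetical, simplified surface definition; the real pack schema is not shown in the spec.
#[derive(Clone, Debug, PartialEq)]
struct SurfaceDef {
    key: String,
    concepts: Vec<String>,
}

// Hypothetical pack shape: only the fields needed for the merge step.
#[derive(Clone, Debug)]
struct SurfacePack {
    is_core: bool,
    balance_surfaces: Vec<SurfaceDef>,
}

fn surface(key: &str, concepts: &[&str]) -> SurfaceDef {
    SurfaceDef {
        key: key.to_string(),
        concepts: concepts.iter().map(|c| c.to_string()).collect(),
    }
}

/// Hydration step 2: a non-core pack keeps its own balance surfaces and inherits
/// every core balance surface whose key it does not override.
fn merge_core_balance(selected: &SurfacePack, core: &SurfacePack) -> Vec<SurfaceDef> {
    let mut merged = selected.balance_surfaces.clone();
    if selected.is_core {
        return merged; // the core pack needs no merge
    }
    // Keys defined by the selected pack count as overrides.
    let overridden: HashSet<&str> = selected
        .balance_surfaces
        .iter()
        .map(|s| s.key.as_str())
        .collect();
    merged.extend(
        core.balance_surfaces
            .iter()
            .filter(|s| !overridden.contains(s.key.as_str()))
            .cloned(),
    );
    merged
}

fn main() {
    let core = SurfacePack {
        is_core: true,
        balance_surfaces: vec![
            surface("accounts_receivable", &["AccountsReceivableNetCurrent"]),
            surface("total_debt", &["LongTermDebt"]),
        ],
    };
    let bank = SurfacePack {
        is_core: false,
        balance_surfaces: vec![
            surface("loans", &["LoansAndLeasesReceivableNetReportedAmount"]),
            surface("total_debt", &["LongTermDebt", "ShortTermBorrowings"]),
        ],
    };
    let merged = merge_core_balance(&bank, &core);
    let keys: Vec<&str> = merged.iter().map(|s| s.key.as_str()).collect();
    println!("{:?}", keys); // ["loans", "total_debt", "accounts_receivable"]
}
```

Placing the selected pack's surfaces first means an override wins without any separate precedence logic.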
## Category Taxonomy
Balance rows use these backend category keys:
- `current_assets`
- `noncurrent_assets`
- `current_liabilities`
- `noncurrent_liabilities`
- `equity`
- `derived`
- `sector_specific`

Default rule:
- use economic placement first
- reserve `sector_specific` for rows that cannot be placed in any economic category
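
The category keys and the default rule map naturally onto an enum. This is a minimal sketch, assuming a hypothetical `BalanceCategory` enum and a hypothetical `place` helper that receives an economic placement when one was found. Only the string keys come from the spec.

```rust
/// Backend balance category keys. The enum is an illustrative assumption;
/// only the string keys come from the spec.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BalanceCategory {
    CurrentAssets,
    NoncurrentAssets,
    CurrentLiabilities,
    NoncurrentLiabilities,
    Equity,
    Derived,
    SectorSpecific,
}

impl BalanceCategory {
    const ALL: [BalanceCategory; 7] = [
        BalanceCategory::CurrentAssets,
        BalanceCategory::NoncurrentAssets,
        BalanceCategory::CurrentLiabilities,
        BalanceCategory::NoncurrentLiabilities,
        BalanceCategory::Equity,
        BalanceCategory::Derived,
        BalanceCategory::SectorSpecific,
    ];

    /// Serialized backend key for this category.
    fn key(self) -> &'static str {
        match self {
            BalanceCategory::CurrentAssets => "current_assets",
            BalanceCategory::NoncurrentAssets => "noncurrent_assets",
            BalanceCategory::CurrentLiabilities => "current_liabilities",
            BalanceCategory::NoncurrentLiabilities => "noncurrent_liabilities",
            BalanceCategory::Equity => "equity",
            BalanceCategory::Derived => "derived",
            BalanceCategory::SectorSpecific => "sector_specific",
        }
    }

    /// Parse a backend key; unknown keys yield `None`.
    fn from_key(key: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|c| c.key() == key)
    }
}

/// Default rule (hypothetical helper): use the economic placement when one exists
/// and fall back to `sector_specific` only when it does not.
fn place(economic: Option<BalanceCategory>) -> BalanceCategory {
    economic.unwrap_or(BalanceCategory::SectorSpecific)
}

fn main() {
    println!("{}", place(Some(BalanceCategory::NoncurrentAssets)).key()); // noncurrent_assets
    println!("{}", place(None).key()); // sector_specific
    println!("{:?}", BalanceCategory::from_key("equity")); // Some(Equity)
}
```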
## Canonical Precedence Rule
Canonical balance mappings take precedence over residual classification.
If a statement row is consumed by a canonical balance row, it must not remain in `detail_rows["balance"]["unmapped"]`.
## Alias Flattening Rule
Synonymous balance concepts flatten into one canonical surface row.
Example:
- `AccountsReceivableNetCurrent`
- `ReceivablesNetCurrent`

These must become one `accounts_receivable` row with period-aware provenance.
## Per-Period Resolution Rule
Direct balance matching is resolved per period, not by choosing one row globally.
For each canonical balance row:
1. Collect all direct candidates.
2. For each period, choose the best candidate with a value in that period.
3. Build one canonical row from those period-specific winners.
4. Preserve the union of all consumed aliases in `source_concepts`, `source_row_keys`, and `source_fact_ids`.
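
The per-period steps can be sketched in Rust using the two `accounts_receivable` aliases from the flattening example. This is a sketch under assumptions, not the crate's implementation. `Candidate`, `CanonicalRow`, and `resolve_per_period` are hypothetical. The spec does not define "best", so candidates are assumed to be pre-sorted best-first. Every collected alias is treated as consumed for provenance, whether or not it won a period.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical direct candidate: one statement row matched by a canonical alias.
// `values` maps a period id to the value reported in that period.
struct Candidate {
    concept: &'static str,
    row_key: &'static str,
    fact_ids: Vec<&'static str>,
    values: BTreeMap<&'static str, f64>,
}

// Hypothetical canonical row with period-aware provenance.
#[derive(Debug, Default)]
struct CanonicalRow {
    values: BTreeMap<&'static str, f64>,
    winner_by_period: BTreeMap<&'static str, &'static str>, // period -> concept used
    source_concepts: BTreeSet<&'static str>,
    source_row_keys: BTreeSet<&'static str>,
    source_fact_ids: BTreeSet<&'static str>,
}

/// Resolve one canonical balance row period by period.
/// Assumes `candidates` is already ordered best-first (e.g. by alias order in the pack).
fn resolve_per_period(candidates: &[Candidate], periods: &[&'static str]) -> CanonicalRow {
    let mut row = CanonicalRow::default();
    for &period in periods {
        // The first candidate that has a value in this period wins that period.
        let winner = candidates
            .iter()
            .find_map(|c| c.values.get(period).map(|v| (c.concept, *v)));
        if let Some((concept, value)) = winner {
            row.values.insert(period, value);
            row.winner_by_period.insert(period, concept);
        }
    }
    // Provenance is the union over every collected alias, winners or not.
    for c in candidates {
        row.source_concepts.insert(c.concept);
        row.source_row_keys.insert(c.row_key);
        row.source_fact_ids.extend(c.fact_ids.iter().copied());
    }
    row
}

fn main() {
    let candidates = [
        Candidate {
            concept: "AccountsReceivableNetCurrent",
            row_key: "row_ar",
            fact_ids: vec!["f1"],
            values: BTreeMap::from([("FY2024", 100.0)]),
        },
        Candidate {
            concept: "ReceivablesNetCurrent",
            row_key: "row_rec",
            fact_ids: vec!["f2", "f3"],
            values: BTreeMap::from([("FY2023", 90.0), ("FY2024", 101.0)]),
        },
    ];
    let row = resolve_per_period(&candidates, &["FY2023", "FY2024"]);
    println!("{:?}", row.values); // {"FY2023": 90.0, "FY2024": 100.0}
    println!("{:?}", row.winner_by_period);
    println!("{:?}", row.source_concepts);
}
```

Because winners are chosen independently for each period, a filer that switched aliases between years still yields a single canonical row.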
## Formula Evaluation Rule
Structured formulas are evaluated only after their source surface rows have been resolved.
Supported operators:
- `sum`
- `subtract`

Formula rules:
- formulas operate period by period
- `sum` may treat nulls as zero when `treat_null_as_zero` is true
- `subtract` requires exactly two sources
- formula rows inherit provenance from the source surface rows they consume
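
A period-by-period evaluator for the two operators might look like the following minimal Rust sketch. `Formula`, `Op`, and `evaluate` are hypothetical names. Several behaviors are assumptions the spec does not state: `subtract` computes first source minus second; a `sum` period in which no source has any value stays null even with `treat_null_as_zero`; and an unresolved source row counts as null in every period. The inputs in `main` are illustrative and are not the pack's actual `total_cash_and_equivalents` formula.

```rust
use std::collections::{BTreeMap, HashMap};

// Period id -> value; a missing period means null. Illustrative representation only.
type Values = BTreeMap<&'static str, f64>;

#[derive(Clone, Copy, Debug)]
enum Op {
    Sum,
    Subtract,
}

// Hypothetical formula shape mirroring the spec's operator and flag names.
struct Formula {
    op: Op,
    sources: Vec<&'static str>,
    treat_null_as_zero: bool,
}

/// Evaluate a formula period by period over already-resolved surface rows.
fn evaluate(
    formula: &Formula,
    resolved: &HashMap<&'static str, Values>,
    periods: &[&'static str],
) -> Result<Values, String> {
    if matches!(formula.op, Op::Subtract) && formula.sources.len() != 2 {
        return Err(format!("subtract needs exactly 2 sources, got {}", formula.sources.len()));
    }
    let mut out = Values::new();
    for &period in periods {
        // Each source's value in this period (None = null or unresolved row).
        let vals: Vec<Option<f64>> = formula
            .sources
            .iter()
            .map(|s| resolved.get(s).and_then(|row| row.get(period)).copied())
            .collect();
        let result = match formula.op {
            Op::Sum => {
                if vals.iter().all(Option::is_none) {
                    None // assumption: a period with no reported source stays null
                } else if formula.treat_null_as_zero {
                    Some(vals.iter().map(|v| v.unwrap_or(0.0)).sum::<f64>())
                } else {
                    vals.iter().copied().sum::<Option<f64>>() // any null -> null
                }
            }
            // Assumption: first source minus second; both must be present.
            Op::Subtract => match (vals[0], vals[1]) {
                (Some(a), Some(b)) => Some(a - b),
                _ => None,
            },
        };
        if let Some(v) = result {
            out.insert(period, v);
        }
    }
    Ok(out)
}

fn main() {
    let mut resolved: HashMap<&'static str, Values> = HashMap::new();
    resolved.insert("cash", Values::from([("FY2024", 50.0)]));
    resolved.insert("short_term_investments", Values::from([("FY2023", 10.0), ("FY2024", 20.0)]));
    let formula = Formula {
        op: Op::Sum,
        sources: vec!["cash", "short_term_investments"],
        treat_null_as_zero: true,
    };
    let out = evaluate(&formula, &resolved, &["FY2023", "FY2024"]).unwrap();
    println!("{:?}", out); // {"FY2023": 10.0, "FY2024": 70.0}
}
```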
## Residual Pruning Rule
`balance.unmapped` is a strict remainder set.
A balance statement row must be excluded from `unmapped` when either of these is true:
- its row key was consumed by a canonical balance row
- its concept key was consumed by a canonical balance row
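
The pruning rule is a filter with two exclusion conditions. This is a minimal sketch; `StatementRow`, `Consumed`, `row`, and `residual_unmapped` are hypothetical names, and the consumed key sets are assumed to already include keys consumed by hidden helper rows.

```rust
use std::collections::HashSet;

// Hypothetical minimal view of a balance statement row.
#[derive(Clone, Debug, PartialEq)]
struct StatementRow {
    row_key: String,
    concept_key: String,
}

// Keys consumed by canonical balance rows, hidden helpers included (assumption).
#[derive(Default)]
struct Consumed {
    row_keys: HashSet<String>,
    concept_keys: HashSet<String>,
}

fn row(row_key: &str, concept_key: &str) -> StatementRow {
    StatementRow { row_key: row_key.to_string(), concept_key: concept_key.to_string() }
}

/// `balance.unmapped` as a strict remainder: a row is dropped when either its
/// row key or its concept key was consumed.
fn residual_unmapped(rows: &[StatementRow], consumed: &Consumed) -> Vec<StatementRow> {
    rows.iter()
        .filter(|r| {
            !consumed.row_keys.contains(&r.row_key)
                && !consumed.concept_keys.contains(&r.concept_key)
        })
        .cloned()
        .collect()
}

fn main() {
    let rows = vec![
        row("r1", "AccountsReceivableNetCurrent"),
        row("r2", "ReceivablesNetCurrent"),
        row("r3", "OtherAssetsNoncurrent"),
    ];
    let mut consumed = Consumed::default();
    consumed.row_keys.insert("r1".to_string()); // consumed by row key
    consumed.concept_keys.insert("ReceivablesNetCurrent".to_string()); // consumed by concept
    let unmapped = residual_unmapped(&rows, &consumed);
    let keys: Vec<&str> = unmapped.iter().map(|r| r.row_key.as_str()).collect();
    println!("{:?}", keys); // ["r3"]
}
```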
## Helper Surface Rule
Some balance rows are parser helpers rather than user-facing canonical output.
Current helper rows:
- `deferred_revenue_current`
- `deferred_revenue_noncurrent`
- `current_liabilities`
- `leases`

Behavior:
- they remain available to formulas
- they do not appear in emitted `surface_rows`
- they do not create emitted detail buckets
- they still consume matched backend sources so those rows do not leak into `unmapped`
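
The helper behavior can be sketched as one pass that records consumption for every resolved row but emits only non-helper rows. This is a minimal sketch with hypothetical names (`ResolvedRow`, `Split`, `split_helpers`). Hard-coding the four helper keys is an assumption; a real pack would more likely mark helpers with a per-surface flag.

```rust
use std::collections::{BTreeMap, HashSet};

// Helper keys listed in the spec. Hard-coding them here is an assumption.
const HELPER_SURFACES: [&str; 4] = [
    "deferred_revenue_current",
    "deferred_revenue_noncurrent",
    "current_liabilities",
    "leases",
];

// Hypothetical resolved surface row.
#[derive(Clone, Debug)]
struct ResolvedRow {
    key: &'static str,
    source_row_keys: Vec<&'static str>,
}

// Hypothetical split of the parser output.
struct Split {
    surface_rows: Vec<ResolvedRow>,                      // emitted, user-facing rows only
    formula_inputs: BTreeMap<&'static str, ResolvedRow>, // every resolved row, helpers included
    consumed_row_keys: HashSet<&'static str>,            // feeds `unmapped` pruning
}

fn split_helpers(resolved: Vec<ResolvedRow>) -> Split {
    let mut split = Split {
        surface_rows: Vec::new(),
        formula_inputs: BTreeMap::new(),
        consumed_row_keys: HashSet::new(),
    };
    for row in resolved {
        // Helpers still consume their sources, so those rows cannot leak into `unmapped`.
        split.consumed_row_keys.extend(row.source_row_keys.iter().copied());
        // Formulas can read every resolved row, helpers included.
        split.formula_inputs.insert(row.key, row.clone());
        // Only non-helper rows are emitted, so helpers never get detail buckets.
        if !HELPER_SURFACES.contains(&row.key) {
            split.surface_rows.push(row);
        }
    }
    split
}

fn main() {
    let split = split_helpers(vec![
        ResolvedRow { key: "deferred_revenue_current", source_row_keys: vec!["r10"] },
        ResolvedRow { key: "accounts_receivable", source_row_keys: vec!["r1", "r2"] },
    ]);
    let emitted: Vec<&str> = split.surface_rows.iter().map(|r| r.key).collect();
    println!("{:?}", emitted); // ["accounts_receivable"]
    println!("{}", split.formula_inputs.len()); // 2
    println!("{}", split.consumed_row_keys.contains("r10")); // true
}
```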
## Synonym vs Aggregate Child Rule
Two cases must remain distinct.
### Synonym aliases
Different concept names for the same canonical balance meaning.
Behavior:
- flatten into one canonical surface row
- do not emit duplicate detail rows
- do not remain in `unmapped`
### Aggregate child components
Rows that legitimately roll into a subtotal or total.
Behavior:
- may remain as detail rows beneath the canonical parent when grouping is enabled
- must not remain in `unmapped` after being consumed
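
The two cases differ only in whether a detail row may be emitted; both leave `unmapped`. This is a minimal sketch with hypothetical `MatchKind`, `Disposition`, and `disposition` names. It reads "may remain as detail rows" as "are kept whenever grouping is enabled", which is an assumption.

```rust
// Hypothetical classification of how a statement row matched a canonical balance row.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MatchKind {
    Synonym,
    AggregateChild,
}

// What happens to the matched statement row.
#[derive(Debug, PartialEq, Eq)]
struct Disposition {
    flatten_into_parent: bool,
    emit_detail_row: bool,
    keep_in_unmapped: bool,
}

fn disposition(kind: MatchKind, grouping_enabled: bool) -> Disposition {
    match kind {
        // Same meaning: one canonical row, no duplicate detail row.
        MatchKind::Synonym => Disposition {
            flatten_into_parent: true,
            emit_detail_row: false,
            keep_in_unmapped: false,
        },
        // Legitimate component: shown beneath the parent when grouping is on (assumption).
        MatchKind::AggregateChild => Disposition {
            flatten_into_parent: false,
            emit_detail_row: grouping_enabled,
            keep_in_unmapped: false,
        },
    }
}

fn main() {
    for kind in [MatchKind::Synonym, MatchKind::AggregateChild] {
        println!("{:?}: {:?}", kind, disposition(kind, true));
    }
}
```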
## Sector Placement Decisions
Sector rows stay inside the same economic taxonomy.
Mappings in this pass:
- `loans` -> `noncurrent_assets`
- `allowance_for_credit_losses` -> `noncurrent_assets`
- `deposits` -> `current_liabilities`
- `policy_liabilities` -> `noncurrent_liabilities`
- `deferred_acquisition_costs` -> `noncurrent_assets`
- `investment_property` -> `noncurrent_assets`

`sector_specific` remains unused by default in this pass.
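
The placement table above can be expressed as a single `match`. This is a minimal sketch; the `BalanceCategory` enum and `sector_category` function are hypothetical, and returning `None` for non-sector keys is an assumption about how callers would fall through to the regular mappings.

```rust
// Category enum mirroring the spec's backend keys (the enum itself is an assumption).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BalanceCategory {
    CurrentAssets,
    NoncurrentAssets,
    CurrentLiabilities,
    NoncurrentLiabilities,
    Equity,
    Derived,
    SectorSpecific,
}

/// Sector placements for this pass. Returns `None` for keys that are not sector
/// rows; none of the sector rows maps to `SectorSpecific`.
fn sector_category(row_key: &str) -> Option<BalanceCategory> {
    match row_key {
        "loans" | "allowance_for_credit_losses" | "deferred_acquisition_costs"
        | "investment_property" => Some(BalanceCategory::NoncurrentAssets),
        "deposits" => Some(BalanceCategory::CurrentLiabilities),
        "policy_liabilities" => Some(BalanceCategory::NoncurrentLiabilities),
        _ => None,
    }
}

fn main() {
    for key in ["loans", "deposits", "policy_liabilities", "goodwill"] {
        println!("{key} -> {:?}", sector_category(key));
    }
}
```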
## Required Invariants
- A consumed balance source must never remain in `balance.unmapped`.
- A synonym alias must never create more than one canonical balance row.
- Hidden helper surfaces may consume sources but must not appear in emitted `surface_rows`.
- Formula-derived rows inherit canonical provenance from their source surfaces.
- The frontend response shape remains unchanged.
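
The first three invariants are mechanically checkable on a parser result. This is a minimal sketch over a hypothetical, flattened `BalanceOutput`. The field names and `check_invariants` are illustrative, and the provenance and frontend-shape invariants are not covered. It assumes each emitted row lists each alias at most once.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, flattened view of one balance parse, used only for checking.
struct BalanceOutput {
    surface_keys: Vec<&'static str>,               // keys of emitted `surface_rows`
    consumed: HashSet<&'static str>,               // statement row keys consumed canonically
    unmapped: Vec<&'static str>,                   // row keys left in `balance.unmapped`
    alias_rows: Vec<(&'static str, &'static str)>, // (alias concept, canonical row key)
}

/// Check the first three invariants; provenance and frontend shape are not covered.
fn check_invariants(out: &BalanceOutput, helpers: &[&str]) -> Vec<String> {
    let mut violations = Vec::new();
    // 1. No consumed source may remain in `unmapped`.
    for key in &out.unmapped {
        if out.consumed.contains(key) {
            violations.push(format!("consumed row `{key}` is still unmapped"));
        }
    }
    // 2. A synonym alias may feed at most one canonical row.
    let mut rows_per_alias: HashMap<&str, usize> = HashMap::new();
    for &(alias, _) in &out.alias_rows {
        *rows_per_alias.entry(alias).or_insert(0) += 1;
    }
    let mut repeated: Vec<&str> = rows_per_alias
        .into_iter()
        .filter(|&(_, n)| n > 1)
        .map(|(alias, _)| alias)
        .collect();
    repeated.sort_unstable(); // deterministic message order
    for alias in repeated {
        violations.push(format!("alias `{alias}` created more than one canonical row"));
    }
    // 3. Helper surfaces must not be emitted.
    for key in &out.surface_keys {
        if helpers.contains(key) {
            violations.push(format!("helper surface `{key}` was emitted"));
        }
    }
    violations
}

fn main() {
    let out = BalanceOutput {
        surface_keys: vec!["accounts_receivable", "leases"],
        consumed: HashSet::from(["r1", "r2"]),
        unmapped: vec!["r2", "r9"],
        alias_rows: vec![
            ("AccountsReceivableNetCurrent", "accounts_receivable"),
            ("ReceivablesNetCurrent", "accounts_receivable"),
        ],
    };
    for v in check_invariants(&out, &["leases", "current_liabilities"]) {
        println!("{v}");
    }
}
```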
## Test Matrix
The parser must cover:
- direct alias flattening for `accounts_receivable`
- period-sparse alias merges into one canonical row
- formula derivation for `total_cash_and_equivalents`
- formula derivation for `unearned_revenue`
- formula derivation for `total_debt`
- formula derivation for `net_cash_position`
- helper rows staying out of emitted balance surfaces
- residual pruning of canonically consumed balance rows
- sector packs receiving merged core balance coverage without changing frontend contracts
## Learnings Reusable For Other Statements
The same parser rules should later apply to cash flow:
- canonical mapping outranks residual classification
- direct aliases should resolve per period
- helper rows can exist backend-only when formulas need them
- consumed sources must be removed from `unmapped`
- sector packs should inherit common canonical coverage rather than duplicating it