Expand backend financial statement parsers
rust/fiscal-xbrl-core/BALANCE_SHEET_PARSER_SPEC.md (new file, 144 lines)
@@ -0,0 +1,144 @@
# Balance Sheet Parser Spec

## Purpose

This document defines the backend-only balance-sheet parsing rules for `fiscal-xbrl-core`.

This pass is limited to Rust parser behavior and taxonomy packs. It must not modify frontend files, frontend rendering logic, or frontend response shapes.

## Hydration Order

1. Load the selected surface pack.
2. For non-core packs, merge in any core balance-sheet surfaces that the selected pack does not override.
3. Resolve direct canonical balance rows from statement rows.
4. Resolve aggregate-child rows from detail components when direct canonical rows are absent.
5. Resolve formula-backed balance rows from already-resolved canonical rows.
6. Emit `unmapped` only for rows not consumed by canonical balance parsing.
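A minimal sketch of step 2 (core-surface merging) in the hydration order above; the type and field names here are hypothetical, not the actual `fiscal-xbrl-core` API.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct SurfacePack {
    // canonical key -> accepted source concepts (illustrative shape)
    surfaces: HashMap<String, Vec<String>>,
}

/// Merge core balance-sheet surfaces into a sector pack without
/// overriding any key the sector pack already defines.
fn merge_core(mut pack: SurfacePack, core: &SurfacePack) -> SurfacePack {
    for (key, aliases) in &core.surfaces {
        pack.surfaces
            .entry(key.clone())
            .or_insert_with(|| aliases.clone());
    }
    pack
}

fn main() {
    let mut core = SurfacePack::default();
    core.surfaces
        .insert("cash".into(), vec!["CashAndCashEquivalents".into()]);
    core.surfaces
        .insert("loans".into(), vec!["LoansReceivable".into()]);

    let mut bank = SurfacePack::default();
    // The sector pack overrides `loans`; `cash` is inherited from core.
    bank.surfaces
        .insert("loans".into(), vec!["LoansAndLeasesReceivable".into()]);

    let merged = merge_core(bank, &core);
    assert_eq!(merged.surfaces["loans"], vec!["LoansAndLeasesReceivable".to_string()]);
    assert_eq!(merged.surfaces["cash"], vec!["CashAndCashEquivalents".to_string()]);
}
```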
## Category Taxonomy

Balance rows use these backend category keys:

- `current_assets`
- `noncurrent_assets`
- `current_liabilities`
- `noncurrent_liabilities`
- `equity`
- `derived`
- `sector_specific`

Default rule:

- use economic placement first
- reserve `sector_specific` for rows that cannot be expressed economically

## Canonical Precedence Rule

Canonical balance mappings take precedence over residual classification.

If a statement row is consumed by a canonical balance row, it must not remain in `detail_rows["balance"]["unmapped"]`.

## Alias Flattening Rule

Synonymous balance concepts flatten into one canonical surface row.

Example:

- `AccountsReceivableNetCurrent`
- `ReceivablesNetCurrent`

These must become one `accounts_receivable` row with period-aware provenance.

## Per-Period Resolution Rule

Direct balance matching is resolved per period, not by choosing one row globally.

For each canonical balance row:

1. Collect all direct candidates.
2. For each period, choose the best candidate with a value in that period.
3. Build one canonical row from those period-specific winners.
4. Preserve the union of all consumed aliases in `source_concepts`, `source_row_keys`, and `source_fact_ids`.
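The four steps above can be sketched as follows; `Candidate` is an illustrative type, and "best candidate" is simplified here to first-with-value, whereas the real parser applies richer ranking rules.

```rust
use std::collections::{BTreeMap, HashSet};

struct Candidate {
    concept: &'static str,
    // period label -> reported value
    values: BTreeMap<&'static str, f64>,
}

/// Build one canonical row by picking, per period, a candidate with a
/// value, and record the union of consumed concepts as provenance.
fn resolve_per_period(
    candidates: &[Candidate],
) -> (BTreeMap<&'static str, f64>, HashSet<&'static str>) {
    let mut values = BTreeMap::new();
    let mut source_concepts = HashSet::new();
    let periods: HashSet<_> = candidates
        .iter()
        .flat_map(|c| c.values.keys().copied())
        .collect();
    for period in periods {
        // "Best" is simplified to first-with-value in this sketch.
        if let Some(winner) = candidates.iter().find(|c| c.values.contains_key(period)) {
            values.insert(period, winner.values[period]);
            source_concepts.insert(winner.concept);
        }
    }
    (values, source_concepts)
}

fn main() {
    let a = Candidate {
        concept: "AccountsReceivableNetCurrent",
        values: BTreeMap::from([("FY2023", 120.0)]),
    };
    let b = Candidate {
        concept: "ReceivablesNetCurrent",
        values: BTreeMap::from([("FY2024", 135.0)]),
    };
    // Period-sparse aliases merge into one row with union provenance.
    let (values, sources) = resolve_per_period(&[a, b]);
    assert_eq!(values.len(), 2);
    assert!(sources.contains("AccountsReceivableNetCurrent"));
    assert!(sources.contains("ReceivablesNetCurrent"));
}
```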
## Formula Evaluation Rule

Structured formulas are evaluated only after their source surface rows have been resolved.

Supported operators:

- `sum`
- `subtract`

Formula rules:

- formulas operate period by period
- `sum` may treat nulls as zero when `treat_null_as_zero` is true
- `subtract` requires exactly two sources
- formula rows inherit provenance from the source surface rows they consume
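A hedged sketch of the operator semantics above, using illustrative types rather than the crate's actual formula engine:

```rust
use std::collections::BTreeMap;

// period label -> value (illustrative row shape)
type Row = BTreeMap<&'static str, f64>;

enum Op {
    Sum { treat_null_as_zero: bool },
    Subtract,
}

/// Evaluate a formula period by period over already-resolved rows.
fn eval(op: &Op, sources: &[&Row], periods: &[&'static str]) -> Row {
    let mut out = Row::new();
    for &p in periods {
        match op {
            Op::Sum { treat_null_as_zero } => {
                let vals: Vec<Option<f64>> =
                    sources.iter().map(|r| r.get(p).copied()).collect();
                // Nulls only become zero when the flag allows it.
                if *treat_null_as_zero || vals.iter().all(Option::is_some) {
                    out.insert(p, vals.into_iter().flatten().sum());
                }
            }
            Op::Subtract => {
                // `subtract` requires exactly two sources.
                assert_eq!(sources.len(), 2);
                if let (Some(a), Some(b)) = (sources[0].get(p), sources[1].get(p)) {
                    out.insert(p, a - b);
                }
            }
        }
    }
    out
}

fn main() {
    let cash = Row::from([("FY2024", 50.0)]);
    let short_term = Row::from([("FY2024", 20.0)]);
    let total = eval(
        &Op::Sum { treat_null_as_zero: true },
        &[&cash, &short_term],
        &["FY2024"],
    );
    assert_eq!(total["FY2024"], 70.0);
}
```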
## Residual Pruning Rule

`balance.unmapped` is a strict remainder set.

A balance statement row must be excluded from `unmapped` when either of these is true:

- its row key was consumed by a canonical balance row
- its concept key was consumed by a canonical balance row
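The remainder-set rule can be sketched with set-membership checks; the struct and concept names are illustrative.

```rust
use std::collections::HashSet;

struct StatementRow {
    row_key: &'static str,
    concept: &'static str,
}

/// `unmapped` is the strict remainder: drop any row whose row key OR
/// concept key was consumed by a canonical balance row.
fn prune(
    rows: Vec<StatementRow>,
    consumed_keys: &HashSet<&str>,
    consumed_concepts: &HashSet<&str>,
) -> Vec<StatementRow> {
    rows.into_iter()
        .filter(|r| {
            !consumed_keys.contains(r.row_key) && !consumed_concepts.contains(r.concept)
        })
        .collect()
}

fn main() {
    let rows = vec![
        StatementRow { row_key: "r1", concept: "ReceivablesNetCurrent" },
        StatementRow { row_key: "r2", concept: "SomeUnrelatedConcept" },
    ];
    let consumed_keys: HashSet<&str> = HashSet::new();
    let consumed_concepts: HashSet<&str> =
        ["ReceivablesNetCurrent"].into_iter().collect();
    // Only the truly unrelated row survives as residual.
    let unmapped = prune(rows, &consumed_keys, &consumed_concepts);
    assert_eq!(unmapped.len(), 1);
    assert_eq!(unmapped[0].row_key, "r2");
}
```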
## Helper Surface Rule

Some balance rows are parser helpers rather than user-facing canonical output.

Current helper rows:

- `deferred_revenue_current`
- `deferred_revenue_noncurrent`
- `current_liabilities`
- `leases`

Behavior:

- they remain available to formulas
- they do not appear in emitted `surface_rows`
- they do not create emitted detail buckets
- they still consume matched backend sources so those rows do not leak into `unmapped`

## Synonym vs Aggregate Child Rule

Two cases must remain distinct.

### Synonym aliases

Different concept names for the same canonical balance meaning.

Behavior:

- flatten into one canonical surface row
- do not emit duplicate detail rows
- do not remain in `unmapped`

### Aggregate child components

Rows that legitimately roll into a subtotal or total.

Behavior:

- may remain as detail rows beneath the canonical parent when grouping is enabled
- must not remain in `unmapped` after being consumed

## Sector Placement Decisions

Sector rows stay inside the same economic taxonomy.

Mappings in this pass:

- `loans` -> `noncurrent_assets`
- `allowance_for_credit_losses` -> `noncurrent_assets`
- `deposits` -> `current_liabilities`
- `policy_liabilities` -> `noncurrent_liabilities`
- `deferred_acquisition_costs` -> `noncurrent_assets`
- `investment_property` -> `noncurrent_assets`

`sector_specific` remains unused by default in this pass.

## Required Invariants

- A consumed balance source must never remain in `balance.unmapped`.
- A synonym alias must never create more than one canonical balance row.
- Hidden helper surfaces may consume sources but must not appear in emitted `surface_rows`.
- Formula-derived rows inherit canonical provenance from their source surfaces.
- The frontend response shape remains unchanged.

## Test Matrix

The parser must cover:

- direct alias flattening for `accounts_receivable`
- period-sparse alias merges into one canonical row
- formula derivation for `total_cash_and_equivalents`
- formula derivation for `unearned_revenue`
- formula derivation for `total_debt`
- formula derivation for `net_cash_position`
- helper rows staying out of emitted balance surfaces
- residual pruning of canonically consumed balance rows
- sector packs receiving merged core balance coverage without changing frontend contracts

## Learnings Reusable For Other Statements

The same parser rules should later apply to cash flow:

- canonical mapping outranks residual classification
- direct aliases should resolve per period
- helper rows can exist backend-only when formulas need them
- consumed sources must be removed from `unmapped`
- sector packs should inherit common canonical coverage rather than duplicating it
rust/fiscal-xbrl-core/CASH_FLOW_STATEMENT_PARSER_SPEC.md (new file, 155 lines)
@@ -0,0 +1,155 @@
# Cash Flow Statement Parser Spec

## Purpose

This document defines the backend-only cash-flow parsing rules for `fiscal-xbrl-core`.

This pass is limited to Rust parser behavior, taxonomy packs, and backend comparison tooling. It must not modify frontend files, frontend rendering logic, or frontend response shapes.

## Hydration Order

1. Load the selected surface pack.
2. For non-core packs, merge in any core balance-sheet and cash-flow surfaces that the selected pack does not override.
3. Resolve direct canonical cash-flow rows from statement rows.
4. Resolve aggregate-child cash-flow rows from matched detail components when direct canonical rows are absent.
5. Resolve formula-backed cash-flow rows from already-resolved canonical rows and helper rows.
6. Emit `unmapped` only for rows not consumed by canonical cash-flow parsing.

## Category Model

Cash-flow rows use these backend category keys:

- `operating`
- `investing`
- `financing`
- `free_cash_flow`
- `helper`

Rules:

- `helper` rows are backend-only and use `include_in_output: false`.
- Only `operating`, `investing`, `financing`, and `free_cash_flow` should appear in emitted `surface_rows`.
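A sketch of how the emission filter might look, assuming a `SurfaceRow` struct carrying the category and the `include_in_output` flag described above (both names are assumptions for illustration):

```rust
#[derive(Clone, Copy, PartialEq)]
enum Category {
    Operating,
    Investing,
    Financing,
    FreeCashFlow,
    Helper,
}

struct SurfaceRow {
    key: &'static str,
    category: Category,
    include_in_output: bool,
}

/// Only non-helper rows flagged for output reach emitted `surface_rows`.
fn emitted(rows: &[SurfaceRow]) -> Vec<&'static str> {
    rows.iter()
        .filter(|r| r.include_in_output && r.category != Category::Helper)
        .map(|r| r.key)
        .collect()
}

fn main() {
    let rows = [
        SurfaceRow {
            key: "operating_cash_flow",
            category: Category::Operating,
            include_in_output: true,
        },
        SurfaceRow {
            key: "contract_liability_incurred",
            category: Category::Helper,
            include_in_output: false,
        },
    ];
    // The helper row stays backend-only.
    assert_eq!(emitted(&rows), vec!["operating_cash_flow"]);
}
```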
## Canonical Precedence Rule

Canonical cash-flow mappings take precedence over residual classification.

If a statement row is consumed by a canonical cash-flow row, it must not remain in `detail_rows["cash_flow"]["unmapped"]`.

## Alias Flattening Rule

Synonymous cash-flow concepts flatten into one canonical surface row.

Example:

- `NetCashProvidedByUsedInOperatingActivities`
- `NetCashProvidedByUsedInOperatingActivitiesContinuingOperations`

These must become one `operating_cash_flow` row with period-aware provenance.

## Per-Period Resolution Rule

Direct cash-flow matching is resolved per period, not by choosing one row globally.

For each canonical cash-flow row:

1. Collect all direct candidates.
2. For each period, choose the best candidate with a value in that period.
3. Build one canonical row from those period-specific winners.
4. Preserve the union of all consumed aliases in `source_concepts`, `source_row_keys`, and `source_fact_ids`.

## Sign Normalization Rule

Some canonical cash-flow rows require sign normalization.

Supported transform:

- `invert`

Rule:

- sign transforms are applied after direct or aggregate resolution
- sign transforms are applied before formula evaluation consumes the row
- emitted detail rows inherit the same transform when they belong to the transformed canonical row
- provenance is preserved unchanged
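A minimal sketch of the `invert` transform and its ordering relative to formula evaluation; the row type is illustrative, and provenance (not modeled here) would be left untouched.

```rust
use std::collections::BTreeMap;

// period label -> value (illustrative row shape)
type Row = BTreeMap<&'static str, f64>;

/// Apply the `invert` transform after resolution, before formulas run.
fn invert(row: &mut Row) {
    for value in row.values_mut() {
        *value = -*value;
    }
}

fn main() {
    // Capex is typically reported as a positive outflow by filers.
    let mut capex = Row::from([("FY2024", 410.0)]);
    invert(&mut capex);
    assert_eq!(capex["FY2024"], -410.0);

    // Inversion happens before the formula consumes the row, so
    // free_cash_flow = operating_cash_flow + capital_expenditures works as a sum.
    let operating_cash_flow = 1200.0;
    assert_eq!(operating_cash_flow + capex["FY2024"], 790.0);
}
```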
## Formula Rule

Structured formulas are evaluated only after their source surface rows have been resolved.

Supported operators:

- `sum`
- `subtract`

Current formulas:

- `changes_unearned_revenue = contract_liability_incurred - contract_liability_recognized`
- `changes_other_operating_activities = changes_other_current_assets + changes_other_current_liabilities + changes_other_noncurrent_assets + changes_other_noncurrent_liabilities`
- `free_cash_flow = operating_cash_flow + capital_expenditures`
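The first formula above can be sketched period by period over the two helper rows; as a simplification of the engine's null handling, only periods where both sources resolved produce a value.

```rust
use std::collections::BTreeMap;

// period label -> value (illustrative row shape)
type Row = BTreeMap<&'static str, f64>;

/// changes_unearned_revenue = contract_liability_incurred
///                          - contract_liability_recognized,
/// evaluated period by period.
fn changes_unearned_revenue(incurred: &Row, recognized: &Row) -> Row {
    incurred
        .iter()
        .filter_map(|(&p, &inc)| recognized.get(p).map(|rec| (p, inc - rec)))
        .collect()
}

fn main() {
    let incurred = Row::from([("FY2023", 90.0), ("FY2024", 100.0)]);
    let recognized = Row::from([("FY2024", 80.0)]);
    let delta = changes_unearned_revenue(&incurred, &recognized);
    // Only FY2024 has both sources, so only FY2024 resolves.
    assert_eq!(delta.len(), 1);
    assert_eq!(delta["FY2024"], 20.0);
}
```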
## Helper Row Rule

Helper rows exist only to support formulas and canonical grouping.

Current helper rows:

- `contract_liability_incurred`
- `contract_liability_recognized`
- `changes_other_current_assets`
- `changes_other_current_liabilities`
- `changes_other_noncurrent_assets`
- `changes_other_noncurrent_liabilities`

Behavior:

- helper rows remain available for formula evaluation
- helper rows do not appear in emitted `surface_rows`
- helper rows do not create emitted detail buckets
- helper rows still consume matched backend sources so those rows do not leak into `unmapped`

## Residual Pruning Rule

`cash_flow.unmapped` is a strict remainder set.

A cash-flow statement row must be excluded from `unmapped` when either of these is true:

- its row key was consumed by a canonical cash-flow row
- its concept key was consumed by a canonical cash-flow row

## Sector Inheritance Rule

Sector packs inherit the core cash-flow taxonomy unless they provide an explicit cash-flow override.

Current behavior:

- bank/lender inherits core cash-flow rows
- broker/asset manager inherits core cash-flow rows
- insurance inherits core cash-flow rows
- REIT/real estate inherits core cash-flow rows

No first-pass sector-specific cash-flow overrides are required.

## Synonym vs Aggregate Child Rule

Two cases must remain distinct.

### Synonym aliases

Different concept names for the same canonical cash-flow meaning.

Behavior:

- flatten into one canonical surface row
- do not emit duplicate detail rows
- do not remain in `unmapped`

### Aggregate child components

Rows that legitimately roll into a subtotal or grouped adjustment row.

Behavior:

- may remain as detail rows beneath the canonical parent when grouping is enabled
- must not remain in `unmapped` after being consumed

## Required Invariants

- A consumed cash-flow source must never remain in `cash_flow.unmapped`.
- A synonym alias must never create more than one canonical cash-flow row.
- Hidden helper surfaces may consume sources but must not appear in emitted `surface_rows`.
- Formula-derived rows inherit canonical provenance from their source surfaces.
- The frontend response shape remains unchanged.

## Test Matrix

The parser must cover:

- direct sign inversion for `capital_expenditures`
- direct sign inversion for `debt_repaid`
- direct sign inversion for `share_repurchases`
- direct mapping for `operating_cash_flow`
- formula derivation for `changes_unearned_revenue`
- formula derivation for `changes_other_operating_activities`
- formula derivation for `free_cash_flow`
- helper rows staying out of emitted cash-flow surfaces
- residual pruning of canonically consumed cash-flow rows
- sector packs receiving merged core cash-flow coverage without changing frontend contracts
- fallback classification for fact-only cash-flow concepts such as `IncreaseDecreaseInAccountsReceivable` and `PaymentsOfDividends`

## Learnings Reusable For Other Statements

The same parser rules now apply consistently across income, balance, and cash flow:

- canonical mapping outranks residual classification
- direct aliases resolve per period
- helper rows may exist backend-only when formulas need them
- consumed sources must be removed from `unmapped`
- sector packs inherit common canonical coverage instead of duplicating it
rust/fiscal-xbrl-core/OPERATING_STATEMENT_PARSER_SPEC.md (new file, 103 lines)
@@ -0,0 +1,103 @@
# Operating Statement Parser Spec

## Purpose

This document defines the backend-only parsing rules for operating statement hydration in `fiscal-xbrl-core`.

This pass is intentionally limited to Rust parser behavior. It must not change frontend files, frontend rendering logic, or API response shapes.

## Hydration Order

1. Generic compact surface mapping builds initial `surface_rows`, `detail_rows`, and `unmapped` residuals.
2. Universal income parsing rewrites the income statement into canonical operating-statement rows.
3. Canonical income parsing is authoritative for income provenance and must prune any consumed residual rows from `detail_rows["income"]["unmapped"]`.

## Canonical Precedence Rule

For income rows, canonical universal mappings take precedence over generic residual classification.

If an income concept is consumed by a canonical operating-statement row, it must not remain in `unmapped`.

## Alias Flattening Rule

Multiple source aliases for the same canonical operating-statement concept must flatten into a single canonical surface row.

Examples:

- `us-gaap:OtherOperatingExpense`
- `us-gaap:OtherOperatingExpenses`
- `us-gaap:OtherCostAndExpenseOperating`

These may differ by filer or period, but they still represent one canonical row such as `other_operating_expense`.

## Per-Period Resolution Rule

Direct canonical matching is resolved per period, not by selecting one global winner for all periods.

For each canonical income row:

1. Collect all direct statement-row matches.
2. For each period, keep only candidates with a value in that period.
3. Choose the best candidate for that period using existing ranking rules.
4. Build one canonical row whose `values` and `resolved_source_row_keys` are assembled period by period.

The canonical row's provenance is the union of all consumed aliases, even if a different alias wins in different periods.

## Residual Pruning Rule

After canonical income rows are resolved:

- collect all consumed source row keys
- collect all consumed concept keys
- remove any residual income detail row from `unmapped` if either identifier matches

`unmapped` is a strict remainder set after income canonicalization.

## Synonym vs Aggregate Child Rule

Two cases must remain distinct:

### Synonym aliases

Different concept names representing the same canonical meaning.

Behavior:

- flatten into one canonical surface row
- do not emit as detail rows
- do not leave in `unmapped`

### Aggregate child components

Rows that are true components of a higher-level canonical row, such as:

- `SalesAndMarketingExpense`
- `GeneralAndAdministrativeExpense`

used to derive `selling_general_and_administrative`.

Behavior:

- may appear as detail rows under the canonical parent
- must not also remain in `unmapped` once consumed by that canonical parent

## Required Invariants

For income parsing, a consumed source may appear in exactly one of these places:

- canonical surface provenance
- canonical detail provenance
- `unmapped`

It must never appear in more than one place at the same time.

Additional invariants:

- canonical surface rows are unique by canonical key
- aliases are flattened into one canonical row
- `resolved_source_row_keys` are period-specific
- normalization counts reflect the post-pruning state
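The exactly-one-place rule can be checked as pairwise disjointness over the three provenance sets; this is an illustrative sketch with example concepts, not the crate's actual validation code.

```rust
use std::collections::HashSet;

/// A consumed source key may appear in surface provenance, detail
/// provenance, or `unmapped`, but never in more than one of them.
fn exclusive(
    surface: &HashSet<&str>,
    detail: &HashSet<&str>,
    unmapped: &HashSet<&str>,
) -> bool {
    surface.is_disjoint(detail)
        && surface.is_disjoint(unmapped)
        && detail.is_disjoint(unmapped)
}

fn main() {
    let surface: HashSet<&str> =
        ["us-gaap:OtherOperatingExpenses"].into_iter().collect();
    let detail: HashSet<&str> =
        ["us-gaap:SalesAndMarketingExpense"].into_iter().collect();
    let unmapped: HashSet<&str> =
        ["us-gaap:SomeResidualConcept"].into_iter().collect();
    assert!(exclusive(&surface, &detail, &unmapped));

    // A key appearing in two places violates the invariant.
    let leaked = surface.clone();
    assert!(!exclusive(&surface, &detail, &leaked));
}
```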
## Performance Constraints

- Use `HashSet` membership for consumed-source pruning.
- Build candidate collections once per canonical definition.
- Avoid UI-side dedupe or post-processing.
- Keep the parser close to linear in candidate volume per definition.

## Test Matrix

The parser must cover:

- direct alias dedupe for `other_operating_expense`
- period-sparse alias merge into a single canonical row
- pruning of canonically consumed aliases from `income.unmapped`
- preservation of truly unrelated residual rows
- pruning of formula-consumed component rows from `income.unmapped`

## Learnings For Other Statements

The same backend rules should later be applied to balance sheet and cash flow:

- canonical mapping must outrank residual classification
- alias resolution should be per-period
- consumed sources must be removed from `unmapped`
- synonym aliases and aggregate child components must be treated differently

When balance sheet and cash flow are upgraded, they should adopt these invariants without changing frontend response shapes.
@@ -37,10 +37,12 @@ static IDENTIFIER_RE: Lazy<Regex> = Lazy::new(|| {
|
|||||||
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?identifier\b[^>]*\bscheme=["']([^"']+)["'][^>]*>(.*?)</(?:[a-z0-9_\-]+:)?identifier>"#).unwrap()
|
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?identifier\b[^>]*\bscheme=["']([^"']+)["'][^>]*>(.*?)</(?:[a-z0-9_\-]+:)?identifier>"#).unwrap()
|
||||||
});
|
});
|
||||||
static SEGMENT_RE: Lazy<Regex> = Lazy::new(|| {
|
static SEGMENT_RE: Lazy<Regex> = Lazy::new(|| {
|
||||||
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?segment\b[^>]*>(.*?)</(?:[a-z0-9_\-]+:)?segment>"#).unwrap()
|
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?segment\b[^>]*>(.*?)</(?:[a-z0-9_\-]+:)?segment>"#)
|
||||||
|
.unwrap()
|
||||||
});
|
});
|
||||||
static SCENARIO_RE: Lazy<Regex> = Lazy::new(|| {
|
static SCENARIO_RE: Lazy<Regex> = Lazy::new(|| {
|
||||||
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?scenario\b[^>]*>(.*?)</(?:[a-z0-9_\-]+:)?scenario>"#).unwrap()
|
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?scenario\b[^>]*>(.*?)</(?:[a-z0-9_\-]+:)?scenario>"#)
|
||||||
|
.unwrap()
|
||||||
});
|
});
|
||||||
static START_DATE_RE: Lazy<Regex> = Lazy::new(|| {
|
static START_DATE_RE: Lazy<Regex> = Lazy::new(|| {
|
||||||
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?startDate>(.*?)</(?:[a-z0-9_\-]+:)?startDate>"#).unwrap()
|
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?startDate>(.*?)</(?:[a-z0-9_\-]+:)?startDate>"#).unwrap()
|
||||||
@@ -55,7 +57,8 @@ static MEASURE_RE: Lazy<Regex> = Lazy::new(|| {
|
|||||||
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?measure>(.*?)</(?:[a-z0-9_\-]+:)?measure>"#).unwrap()
|
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?measure>(.*?)</(?:[a-z0-9_\-]+:)?measure>"#).unwrap()
|
||||||
});
|
});
|
||||||
static LABEL_LINK_RE: Lazy<Regex> = Lazy::new(|| {
|
static LABEL_LINK_RE: Lazy<Regex> = Lazy::new(|| {
|
||||||
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?labelLink\b[^>]*>(.*?)</(?:[a-z0-9_\-]+:)?labelLink>"#).unwrap()
|
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?labelLink\b[^>]*>(.*?)</(?:[a-z0-9_\-]+:)?labelLink>"#)
|
||||||
|
.unwrap()
|
||||||
});
|
});
|
||||||
static PRESENTATION_LINK_RE: Lazy<Regex> = Lazy::new(|| {
|
static PRESENTATION_LINK_RE: Lazy<Regex> = Lazy::new(|| {
|
||||||
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?presentationLink\b([^>]*)>(.*?)</(?:[a-z0-9_\-]+:)?presentationLink>"#).unwrap()
|
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?presentationLink\b([^>]*)>(.*?)</(?:[a-z0-9_\-]+:)?presentationLink>"#).unwrap()
|
||||||
@@ -67,12 +70,14 @@ static LABEL_RESOURCE_RE: Lazy<Regex> = Lazy::new(|| {
|
|||||||
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?label\b([^>]*)>(.*?)</(?:[a-z0-9_\-]+:)?label>"#).unwrap()
|
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?label\b([^>]*)>(.*?)</(?:[a-z0-9_\-]+:)?label>"#).unwrap()
|
||||||
});
|
});
|
||||||
static LABEL_ARC_RE: Lazy<Regex> = Lazy::new(|| {
|
static LABEL_ARC_RE: Lazy<Regex> = Lazy::new(|| {
|
||||||
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?labelArc\b([^>]*)/?>(?:</(?:[a-z0-9_\-]+:)?labelArc>)?"#).unwrap()
|
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?labelArc\b([^>]*)/?>(?:</(?:[a-z0-9_\-]+:)?labelArc>)?"#)
|
||||||
|
.unwrap()
|
||||||
});
|
});
|
||||||
static PRESENTATION_ARC_RE: Lazy<Regex> = Lazy::new(|| {
|
static PRESENTATION_ARC_RE: Lazy<Regex> = Lazy::new(|| {
|
||||||
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?presentationArc\b([^>]*)/?>(?:</(?:[a-z0-9_\-]+:)?presentationArc>)?"#).unwrap()
|
Regex::new(r#"(?is)<(?:[a-z0-9_\-]+:)?presentationArc\b([^>]*)/?>(?:</(?:[a-z0-9_\-]+:)?presentationArc>)?"#).unwrap()
|
||||||
});
|
});
|
||||||
static ATTR_RE: Lazy<Regex> = Lazy::new(|| Regex::new(r#"([a-zA-Z0-9:_\-]+)=["']([^"']+)["']"#).unwrap());
|
static ATTR_RE: Lazy<Regex> =
|
||||||
|
Lazy::new(|| Regex::new(r#"([a-zA-Z0-9:_\-]+)=["']([^"']+)["']"#).unwrap());
|
||||||
|
|
||||||
#[derive(Debug, Deserialize)]
|
#[derive(Debug, Deserialize)]
|
||||||
#[serde(rename_all = "camelCase")]
|
#[serde(rename_all = "camelCase")]
|
||||||
@@ -451,7 +456,8 @@ pub fn hydrate_filing(input: HydrateFilingRequest) -> Result<HydrateFilingRespon
|
|||||||
});
|
});
|
||||||
};
|
};
|
||||||
|
|
||||||
let instance_text = fetch_text(&client, &instance_asset.url).context("fetch request failed for XBRL instance")?;
|
let instance_text = fetch_text(&client, &instance_asset.url)
|
||||||
|
.context("fetch request failed for XBRL instance")?;
|
||||||
let parsed_instance = parse_xbrl_instance(&instance_text, Some(instance_asset.name.clone()));
|
let parsed_instance = parse_xbrl_instance(&instance_text, Some(instance_asset.name.clone()));
|
||||||
|
|
||||||
let mut label_by_concept = HashMap::new();
|
let mut label_by_concept = HashMap::new();
|
||||||
@@ -459,11 +465,9 @@ pub fn hydrate_filing(input: HydrateFilingRequest) -> Result<HydrateFilingRespon
|
|||||||
let mut source = "xbrl_instance".to_string();
|
let mut source = "xbrl_instance".to_string();
|
||||||
let mut parse_error = None;
|
let mut parse_error = None;
|
||||||
|
|
||||||
for asset in discovered
|
for asset in discovered.assets.iter().filter(|asset| {
|
||||||
.assets
|
asset.is_selected && (asset.asset_type == "presentation" || asset.asset_type == "label")
|
||||||
.iter()
|
}) {
|
||||||
.filter(|asset| asset.is_selected && (asset.asset_type == "presentation" || asset.asset_type == "label"))
|
|
||||||
{
|
|
||||||
match fetch_text(&client, &asset.url) {
|
match fetch_text(&client, &asset.url) {
|
||||||
Ok(content) => {
|
Ok(content) => {
|
||||||
if asset.asset_type == "presentation" {
|
if asset.asset_type == "presentation" {
|
||||||
@@ -515,10 +519,15 @@ pub fn hydrate_filing(input: HydrateFilingRequest) -> Result<HydrateFilingRespon
|
|||||||
pack_selection.pack,
|
pack_selection.pack,
|
||||||
&mut compact_model,
|
&mut compact_model,
|
||||||
)?;
|
)?;
|
||||||
let kpi_result = kpi_mapper::build_taxonomy_kpis(&materialized.periods, &facts, pack_selection.pack)?;
|
let kpi_result =
|
||||||
|
kpi_mapper::build_taxonomy_kpis(&materialized.periods, &facts, pack_selection.pack)?;
|
||||||
compact_model.normalization_summary.kpi_row_count = kpi_result.rows.len();
|
compact_model.normalization_summary.kpi_row_count = kpi_result.rows.len();
|
||||||
for warning in kpi_result.warnings {
|
for warning in kpi_result.warnings {
|
||||||
if !compact_model.normalization_summary.warnings.contains(&warning) {
|
if !compact_model
|
||||||
|
.normalization_summary
|
||||||
|
.warnings
|
||||||
|
.contains(&warning)
|
||||||
|
{
|
||||||
compact_model.normalization_summary.warnings.push(warning);
|
compact_model.normalization_summary.warnings.push(warning);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -526,7 +535,11 @@ pub fn hydrate_filing(input: HydrateFilingRequest) -> Result<HydrateFilingRespon
|
|||||||
&mut compact_model.concept_mappings,
|
&mut compact_model.concept_mappings,
|
||||||
kpi_result.mapping_assignments,
|
kpi_result.mapping_assignments,
|
||||||
);
|
);
|
||||||
surface_mapper::apply_mapping_assignments(&mut concepts, &mut facts, &compact_model.concept_mappings);
|
surface_mapper::apply_mapping_assignments(
|
||||||
|
&mut concepts,
|
||||||
|
&mut facts,
|
||||||
|
&compact_model.concept_mappings,
|
||||||
|
);
|
||||||
|
|
||||||
let has_rows = materialized
|
let has_rows = materialized
|
||||||
.statement_rows
|
.statement_rows
|
||||||
@@ -572,7 +585,11 @@ pub fn hydrate_filing(input: HydrateFilingRequest) -> Result<HydrateFilingRespon
|
|||||||
concepts_count: concepts.len(),
|
concepts_count: concepts.len(),
|
||||||
dimensions_count: facts
|
dimensions_count: facts
|
||||||
.iter()
|
.iter()
|
||||||
.flat_map(|fact| fact.dimensions.iter().map(|dimension| format!("{}::{}", dimension.axis, dimension.member)))
|
.flat_map(|fact| {
|
||||||
|
fact.dimensions
|
||||||
|
.iter()
|
||||||
|
.map(|dimension| format!("{}::{}", dimension.axis, dimension.member))
|
||||||
|
})
|
||||||
.collect::<HashSet<_>>()
|
.collect::<HashSet<_>>()
|
||||||
.len(),
|
.len(),
|
||||||
assets: discovered.assets,
|
assets: discovered.assets,
|
||||||
@@ -622,7 +639,10 @@ struct DiscoveredAssets {
|
|||||||
assets: Vec<AssetOutput>,
|
assets: Vec<AssetOutput>,
|
||||||
}
|
}
|
||||||
|
|
||||||
fn discover_filing_assets(input: &HydrateFilingRequest, client: &Client) -> Result<DiscoveredAssets> {
|
fn discover_filing_assets(
|
||||||
|
input: &HydrateFilingRequest,
|
||||||
|
client: &Client,
|
||||||
|
) -> Result<DiscoveredAssets> {
|
||||||
let Some(directory_url) = resolve_filing_directory_url(
|
let Some(directory_url) = resolve_filing_directory_url(
|
||||||
input.filing_url.as_deref(),
|
input.filing_url.as_deref(),
|
||||||
&input.cik,
|
&input.cik,
|
||||||
@@ -631,12 +651,19 @@ fn discover_filing_assets(input: &HydrateFilingRequest, client: &Client) -> Resu
|
|||||||
return Ok(DiscoveredAssets { assets: vec![] });
|
return Ok(DiscoveredAssets { assets: vec![] });
|
||||||
};
|
};
|
||||||
|
|
||||||
let payload = fetch_json::<FilingDirectoryPayload>(client, &format!("{directory_url}index.json")).ok();
|
let payload =
|
||||||
|
fetch_json::<FilingDirectoryPayload>(client, &format!("{directory_url}index.json")).ok();
|
||||||
let mut discovered = Vec::new();
|
let mut discovered = Vec::new();
|
||||||
|
|
||||||
if let Some(items) = payload.and_then(|payload| payload.directory.and_then(|directory| directory.item)) {
|
if let Some(items) =
|
||||||
|
payload.and_then(|payload| payload.directory.and_then(|directory| directory.item))
|
||||||
|
{
|
||||||
for item in items {
|
for item in items {
|
||||||
let Some(name) = item.name.map(|name| name.trim().to_string()).filter(|name| !name.is_empty()) else {
|
let Some(name) = item
|
||||||
|
.name
|
||||||
|
.map(|name| name.trim().to_string())
|
||||||
|
.filter(|name| !name.is_empty())
|
||||||
|
else {
|
||||||
continue;
|
continue;
|
||||||
};
|
};
|
||||||
|
|
||||||
@@ -683,12 +710,19 @@ fn discover_filing_assets(input: &HydrateFilingRequest, client: &Client) -> Resu
|
|||||||
score_instance(&asset.name, input.primary_document.as_deref()),
|
score_instance(&asset.name, input.primary_document.as_deref()),
|
||||||
)
|
)
|
||||||
})
|
})
|
||||||
.max_by(|left, right| left.1.partial_cmp(&right.1).unwrap_or(std::cmp::Ordering::Equal))
|
.max_by(|left, right| {
|
||||||
|
left.1
|
||||||
|
.partial_cmp(&right.1)
|
||||||
|
.unwrap_or(std::cmp::Ordering::Equal)
|
||||||
|
})
|
||||||
.map(|entry| entry.0);
|
.map(|entry| entry.0);
|
||||||
|
|
||||||
for asset in &mut discovered {
|
for asset in &mut discovered {
|
||||||
asset.score = if asset.asset_type == "instance" {
|
asset.score = if asset.asset_type == "instance" {
|
||||||
Some(score_instance(&asset.name, input.primary_document.as_deref()))
|
Some(score_instance(
|
||||||
|
&asset.name,
|
||||||
|
input.primary_document.as_deref(),
|
||||||
|
))
|
||||||
} else if asset.asset_type == "pdf" {
|
} else if asset.asset_type == "pdf" {
|
||||||
Some(score_pdf(&asset.name, asset.size_bytes))
|
Some(score_pdf(&asset.name, asset.size_bytes))
|
||||||
} else {
|
} else {
|
||||||
@@ -708,7 +742,11 @@ fn discover_filing_assets(input: &HydrateFilingRequest, client: &Client) -> Resu
    Ok(DiscoveredAssets { assets: discovered })
}

fn resolve_filing_directory_url(
    filing_url: Option<&str>,
    cik: &str,
    accession_number: &str,
) -> Option<String> {
    if let Some(filing_url) = filing_url.map(str::trim).filter(|value| !value.is_empty()) {
        if let Some(last_slash) = filing_url.rfind('/') {
            if last_slash > "https://".len() {
@@ -725,7 +763,10 @@ fn resolve_filing_directory_url(filing_url: Option<&str>, cik: &str, accession_n
}

fn normalize_cik_for_path(value: &str) -> Option<String> {
    let digits = value
        .chars()
        .filter(|char| char.is_ascii_digit())
        .collect::<String>();
    if digits.is_empty() {
        return None;
    }
@@ -741,16 +782,25 @@ fn classify_asset_type(name: &str) -> &'static str {
        return "schema";
    }
    if lower.ends_with(".xml") {
        if lower.ends_with("_pre.xml")
            || lower.ends_with("-pre.xml")
            || lower.contains("presentation")
        {
            return "presentation";
        }
        if lower.ends_with("_lab.xml") || lower.ends_with("-lab.xml") || lower.contains("label") {
            return "label";
        }
        if lower.ends_with("_cal.xml")
            || lower.ends_with("-cal.xml")
            || lower.contains("calculation")
        {
            return "calculation";
        }
        if lower.ends_with("_def.xml")
            || lower.ends_with("-def.xml")
            || lower.contains("definition")
        {
            return "definition";
        }
        return "instance";
@@ -779,7 +829,11 @@ fn score_instance(name: &str, primary_document: Option<&str>) -> f64 {
            score += 5.0;
        }
    }
    if lower.contains("cal")
        || lower.contains("def")
        || lower.contains("lab")
        || lower.contains("pre")
    {
        score -= 3.0;
    }
    score
@@ -819,7 +873,9 @@ fn fetch_text(client: &Client, url: &str) -> Result<String> {
    if !response.status().is_success() {
        return Err(anyhow!("request failed for {url} ({})", response.status()));
    }
    response
        .text()
        .with_context(|| format!("unable to read response body for {url}"))
}

fn fetch_json<T: for<'de> Deserialize<'de>>(client: &Client, url: &str) -> Result<T> {
@@ -847,17 +903,36 @@ fn parse_xbrl_instance(raw: &str, source_file: Option<String>) -> ParsedInstance
    let mut facts = Vec::new();

    for captures in FACT_RE.captures_iter(raw) {
        let prefix = captures
            .get(1)
            .map(|value| value.as_str().trim())
            .unwrap_or_default();
        let local_name = captures
            .get(2)
            .map(|value| value.as_str().trim())
            .unwrap_or_default();
        let attrs = captures
            .get(3)
            .map(|value| value.as_str())
            .unwrap_or_default();
        let body = decode_xml_entities(
            captures
                .get(4)
                .map(|value| value.as_str())
                .unwrap_or_default()
                .trim(),
        );

        if prefix.is_empty() || local_name.is_empty() || is_xbrl_infrastructure_prefix(prefix) {
            continue;
        }

        let attr_map = parse_attrs(attrs);
        let Some(context_id) = attr_map
            .get("contextRef")
            .cloned()
            .or_else(|| attr_map.get("contextref").cloned())
        else {
            continue;
        };

@@ -870,7 +945,10 @@ fn parse_xbrl_instance(raw: &str, source_file: Option<String>) -> ParsedInstance
            .cloned()
            .unwrap_or_else(|| format!("urn:unknown:{prefix}"));
        let context = context_by_id.get(&context_id);
        let unit_ref = attr_map
            .get("unitRef")
            .cloned()
            .or_else(|| attr_map.get("unitref").cloned());
        let unit = unit_ref
            .as_ref()
            .and_then(|unit_ref| unit_by_id.get(unit_ref))
@@ -896,8 +974,12 @@ fn parse_xbrl_instance(raw: &str, source_file: Option<String>) -> ParsedInstance
            period_start: context.and_then(|value| value.period_start.clone()),
            period_end: context.and_then(|value| value.period_end.clone()),
            period_instant: context.and_then(|value| value.period_instant.clone()),
            dimensions: context
                .map(|value| value.dimensions.clone())
                .unwrap_or_default(),
            is_dimensionless: context
                .map(|value| value.dimensions.is_empty())
                .unwrap_or(true),
            source_file: source_file.clone(),
        });
    }
@@ -916,10 +998,7 @@ fn parse_xbrl_instance(raw: &str, source_file: Option<String>) -> ParsedInstance
        })
        .collect::<Vec<_>>();

    ParsedInstance { contexts, facts }
}

fn parse_namespace_map(raw: &str, root_tag_hint: &str) -> HashMap<String, String> {
@@ -935,7 +1014,10 @@ fn parse_namespace_map(raw: &str, root_tag_hint: &str) -> HashMap<String, String
        .captures_iter(&root_start)
    {
        if let (Some(prefix), Some(uri)) = (captures.get(1), captures.get(2)) {
            map.insert(
                prefix.as_str().trim().to_string(),
                uri.as_str().trim().to_string(),
            );
        }
    }

@@ -946,16 +1028,26 @@ fn parse_contexts(raw: &str) -> HashMap<String, ParsedContext> {
    let mut contexts = HashMap::new();

    for captures in CONTEXT_RE.captures_iter(raw) {
        let Some(context_id) = captures
            .get(1)
            .map(|value| value.as_str().trim().to_string())
        else {
            continue;
        };
        let block = captures
            .get(2)
            .map(|value| value.as_str())
            .unwrap_or_default();
        let (entity_identifier, entity_scheme) = IDENTIFIER_RE
            .captures(block)
            .map(|captures| {
                (
                    captures
                        .get(2)
                        .map(|value| decode_xml_entities(value.as_str().trim())),
                    captures
                        .get(1)
                        .map(|value| decode_xml_entities(value.as_str().trim())),
                )
            })
            .unwrap_or((None, None));
@@ -984,7 +1076,10 @@ fn parse_contexts(raw: &str) -> HashMap<String, ParsedContext> {

        let mut dimensions = Vec::new();
        if let Some(segment_value) = segment.as_ref() {
            if let Some(members) = segment_value
                .get("explicitMembers")
                .and_then(|value| value.as_array())
            {
                for member in members {
                    if let (Some(axis), Some(member_value)) = (
                        member.get("axis").and_then(|value| value.as_str()),
@@ -999,7 +1094,10 @@ fn parse_contexts(raw: &str) -> HashMap<String, ParsedContext> {
            }
        }
        if let Some(scenario_value) = scenario.as_ref() {
            if let Some(members) = scenario_value
                .get("explicitMembers")
                .and_then(|value| value.as_array())
            {
                for member in members {
                    if let (Some(axis), Some(member_value)) = (
                        member.get("axis").and_then(|value| value.as_str()),
@@ -1062,10 +1160,16 @@ fn parse_dimension_container(raw: &str) -> serde_json::Value {
fn parse_units(raw: &str) -> HashMap<String, ParsedUnit> {
    let mut units = HashMap::new();
    for captures in UNIT_RE.captures_iter(raw) {
        let Some(id) = captures
            .get(1)
            .map(|value| value.as_str().trim().to_string())
        else {
            continue;
        };
        let block = captures
            .get(2)
            .map(|value| value.as_str())
            .unwrap_or_default();
        let measures = MEASURE_RE
            .captures_iter(block)
            .filter_map(|captures| captures.get(1))
@@ -1097,7 +1201,10 @@ fn parse_attrs(raw: &str) -> HashMap<String, String> {
    let mut map = HashMap::new();
    for captures in ATTR_RE.captures_iter(raw) {
        if let (Some(name), Some(value)) = (captures.get(1), captures.get(2)) {
            map.insert(
                name.as_str().to_string(),
                decode_xml_entities(value.as_str()),
            );
        }
    }
    map
@@ -1138,12 +1245,20 @@ fn parse_label_linkbase(raw: &str) -> HashMap<String, String> {
    let mut preferred = HashMap::<String, (String, i64)>::new();

    for captures in LABEL_LINK_RE.captures_iter(raw) {
        let block = captures
            .get(1)
            .map(|value| value.as_str())
            .unwrap_or_default();
        let mut loc_by_label = HashMap::<String, String>::new();
        let mut resource_by_label = HashMap::<String, (String, Option<String>)>::new();

        for captures in LOC_RE.captures_iter(block) {
            let attrs = parse_attrs(
                captures
                    .get(1)
                    .map(|value| value.as_str())
                    .unwrap_or_default(),
            );
            let Some(label) = attrs.get("xlink:label").cloned() else {
                continue;
            };
@@ -1160,11 +1275,21 @@ fn parse_label_linkbase(raw: &str) -> HashMap<String, String> {
        }

        for captures in LABEL_RESOURCE_RE.captures_iter(block) {
            let attrs = parse_attrs(
                captures
                    .get(1)
                    .map(|value| value.as_str())
                    .unwrap_or_default(),
            );
            let Some(label) = attrs.get("xlink:label").cloned() else {
                continue;
            };
            let body = decode_xml_entities(
                captures
                    .get(2)
                    .map(|value| value.as_str())
                    .unwrap_or_default(),
            )
            .split_whitespace()
            .collect::<Vec<_>>()
            .join(" ");
@@ -1175,7 +1300,12 @@ fn parse_label_linkbase(raw: &str) -> HashMap<String, String> {
        }

        for captures in LABEL_ARC_RE.captures_iter(block) {
            let attrs = parse_attrs(
                captures
                    .get(1)
                    .map(|value| value.as_str())
                    .unwrap_or_default(),
            );
            let Some(from) = attrs.get("xlink:from").cloned() else {
                continue;
            };
@@ -1190,7 +1320,11 @@ fn parse_label_linkbase(raw: &str) -> HashMap<String, String> {
            };
            let priority = label_priority(role.as_deref());
            let current = preferred.get(concept_key).cloned();
            if current
                .as_ref()
                .map(|(_, current_priority)| priority > *current_priority)
                .unwrap_or(true)
            {
                preferred.insert(concept_key.clone(), (label.clone(), priority));
            }
        }
@@ -1207,18 +1341,31 @@ fn parse_presentation_linkbase(raw: &str) -> Vec<PresentationNode> {
    let mut rows = Vec::new();

    for captures in PRESENTATION_LINK_RE.captures_iter(raw) {
        let link_attrs = parse_attrs(
            captures
                .get(1)
                .map(|value| value.as_str())
                .unwrap_or_default(),
        );
        let Some(role_uri) = link_attrs.get("xlink:role").cloned() else {
            continue;
        };
        let block = captures
            .get(2)
            .map(|value| value.as_str())
            .unwrap_or_default();
        let mut loc_by_label = HashMap::<String, (String, String, bool)>::new();
        let mut children_by_label = HashMap::<String, Vec<(String, f64)>>::new();
        let mut incoming = HashSet::<String>::new();
        let mut all_referenced = HashSet::<String>::new();

        for captures in LOC_RE.captures_iter(block) {
            let attrs = parse_attrs(
                captures
                    .get(1)
                    .map(|value| value.as_str())
                    .unwrap_or_default(),
            );
            let Some(label) = attrs.get("xlink:label").cloned() else {
                continue;
            };
@@ -1228,14 +1375,27 @@ fn parse_presentation_linkbase(raw: &str) -> Vec<PresentationNode> {
            let Some(qname) = qname_from_href(&href) else {
                continue;
            };
            let Some((concept_key, qname, local_name)) = concept_from_qname(&qname, &namespaces)
            else {
                continue;
            };
            loc_by_label.insert(
                label,
                (
                    concept_key,
                    qname,
                    local_name.to_ascii_lowercase().contains("abstract"),
                ),
            );
        }

        for captures in PRESENTATION_ARC_RE.captures_iter(block) {
            let attrs = parse_attrs(
                captures
                    .get(1)
                    .map(|value| value.as_str())
                    .unwrap_or_default(),
            );
            let Some(from) = attrs.get("xlink:from").cloned() else {
                continue;
            };
@@ -1248,8 +1408,16 @@ fn parse_presentation_linkbase(raw: &str) -> Vec<PresentationNode> {
            let order = attrs
                .get("order")
                .and_then(|value| value.parse::<f64>().ok())
                .unwrap_or_else(|| {
                    children_by_label
                        .get(&from)
                        .map(|children| children.len() as f64 + 1.0)
                        .unwrap_or(1.0)
                });
            children_by_label
                .entry(from.clone())
                .or_default()
                .push((to.clone(), order));
            incoming.insert(to.clone());
            all_referenced.insert(from);
            all_referenced.insert(to);
@@ -1281,7 +1449,11 @@ fn parse_presentation_linkbase(raw: &str) -> Vec<PresentationNode> {
                return;
            }

            let parent_concept_key = parent_label.and_then(|parent| {
                loc_by_label
                    .get(parent)
                    .map(|(concept_key, _, _)| concept_key.clone())
            });
            rows.push(PresentationNode {
                concept_key: concept_key.clone(),
                role_uri: role_uri.to_string(),
@@ -1292,7 +1464,11 @@ fn parse_presentation_linkbase(raw: &str) -> Vec<PresentationNode> {
            });

            let mut children = children_by_label.get(label).cloned().unwrap_or_default();
            children.sort_by(|left, right| {
                left.1
                    .partial_cmp(&right.1)
                    .unwrap_or(std::cmp::Ordering::Equal)
            });
            for (index, (child_label, _)) in children.into_iter().enumerate() {
                dfs(
                    &child_label,
@@ -1400,7 +1576,10 @@ fn materialize_taxonomy_statements(
            .clone()
            .or_else(|| fact.period_instant.clone())
            .unwrap_or_else(|| filing_date.to_string());
        let id = format!(
            "{date}-{compact_accession}-{}",
            period_by_signature.len() + 1
        );
        let period_label = if fact.period_instant.is_some() && fact.period_start.is_none() {
            "Instant".to_string()
        } else if fact.period_start.is_some() && fact.period_end.is_some() {
@@ -1420,7 +1599,10 @@ fn materialize_taxonomy_statements(
                accession_number: accession_number.to_string(),
                filing_date: filing_date.to_string(),
                period_start: fact.period_start.clone(),
                period_end: fact
                    .period_end
                    .clone()
                    .or_else(|| fact.period_instant.clone()),
                filing_type: filing_type.to_string(),
                period_label,
            },
@@ -1429,9 +1611,17 @@ fn materialize_taxonomy_statements(

    let mut periods = period_by_signature.values().cloned().collect::<Vec<_>>();
    periods.sort_by(|left, right| {
        let left_key = left
            .period_end
            .clone()
            .unwrap_or_else(|| left.filing_date.clone());
        let right_key = right
            .period_end
            .clone()
            .unwrap_or_else(|| right.filing_date.clone());
        left_key
            .cmp(&right_key)
            .then_with(|| left.id.cmp(&right.id))
    });
    let period_id_by_signature = period_by_signature
        .iter()
@@ -1440,7 +1630,10 @@ fn materialize_taxonomy_statements(

    let mut presentation_by_concept = HashMap::<String, Vec<&PresentationNode>>::new();
    for node in presentation {
        presentation_by_concept
            .entry(node.concept_key.clone())
            .or_default()
            .push(node);
    }

    let mut grouped_by_statement = empty_parsed_fact_map();
@@ -1502,9 +1695,13 @@ fn materialize_taxonomy_statements(
    let mut concepts = Vec::<ConceptOutput>::new();

    for statement_kind in statement_keys() {
        let concept_groups = grouped_by_statement
            .remove(statement_kind)
            .unwrap_or_default();
        let mut concept_keys = HashSet::<String>::new();
        for node in presentation.iter().filter(|node| {
            classify_statement_role(&node.role_uri).as_deref() == Some(statement_kind)
        }) {
            concept_keys.insert(node.concept_key.clone());
        }
        for concept_key in concept_groups.keys() {
@@ -1516,12 +1713,21 @@ fn materialize_taxonomy_statements(
            .map(|concept_key| {
                let nodes = presentation
                    .iter()
                    .filter(|node| {
                        node.concept_key == concept_key
                            && classify_statement_role(&node.role_uri).as_deref()
                                == Some(statement_kind)
                    })
                    .collect::<Vec<_>>();
                let order = nodes
                    .iter()
                    .map(|node| node.order)
                    .fold(f64::INFINITY, f64::min);
                let depth = nodes.iter().map(|node| node.depth).min().unwrap_or(0);
                let role_uri = nodes.first().map(|node| node.role_uri.clone());
                let parent_concept_key = nodes
                    .first()
                    .and_then(|node| node.parent_concept_key.clone());
                (concept_key, order, depth, role_uri, parent_concept_key)
            })
            .collect::<Vec<_>>();
@@ -1532,8 +1738,13 @@ fn materialize_taxonomy_statements(
                .then_with(|| left.0.cmp(&right.0))
        });

        for (concept_key, presentation_order, depth, role_uri, parent_concept_key) in
            ordered_concepts
        {
            let fact_group = concept_groups
                .get(&concept_key)
                .cloned()
                .unwrap_or_default();
            let (namespace_uri, local_name) = split_concept_key(&concept_key);
            let qname = fact_group
                .first()
@@ -1672,7 +1883,13 @@ fn empty_detail_row_map() -> DetailRowStatementMap {
}

fn statement_keys() -> [&'static str; 5] {
    [
        "income",
        "balance",
        "cash_flow",
        "equity",
        "comprehensive_income",
    ]
}

fn statement_key_ref(value: &str) -> Option<&'static str> {
@@ -1709,7 +1926,13 @@ fn pick_preferred_fact(grouped_facts: &[(i64, ParsedFact)]) -> Option<&(i64, Par
                .unwrap_or_default();
            left_date.cmp(&right_date)
        })
        .then_with(|| {
            left.1
                .value
                .abs()
                .partial_cmp(&right.1.value.abs())
                .unwrap_or(std::cmp::Ordering::Equal)
        })
    })
}

@@ -1779,12 +2002,6 @@ fn classify_statement_role(role_uri: &str) -> Option<String> {

fn concept_statement_fallback(local_name: &str) -> Option<String> {
    let normalized = local_name.to_ascii_lowercase()
    if Regex::new(r#"equity|retainedearnings|additionalpaidincapital"#)
        .unwrap()
        .is_match(&normalized)
@@ -1794,6 +2011,22 @@ fn concept_statement_fallback(local_name: &str) -> Option<String> {
     if normalized.contains("comprehensiveincome") {
         return Some("comprehensive_income".to_string());
     }
+    if Regex::new(
+        r#"deferredpolicyacquisitioncosts(andvalueofbusinessacquired)?$|supplementaryinsuranceinformationdeferredpolicyacquisitioncosts$|deferredacquisitioncosts$"#,
+    )
+    .unwrap()
+    .is_match(&normalized)
+    {
+        return Some("balance".to_string());
+    }
+    if Regex::new(
+        r#"netcashprovidedbyusedin.*activities|increasedecreasein|paymentstoacquire|paymentsforcapitalimprovements$|paymentsfordepositsonrealestateacquisitions$|paymentsforrepurchase|paymentsofdividends|dividendscommonstockcash$|proceedsfrom|repaymentsofdebt|sharebasedcompensation$|allocatedsharebasedcompensationexpense$|depreciationdepletionandamortization$|depreciationamortizationandaccretionnet$|depreciationandamortization$|depreciationamortizationandother$|otheradjustmentstoreconcilenetincomelosstocashprovidedbyusedinoperatingactivities"#,
+    )
+    .unwrap()
+    .is_match(&normalized)
+    {
+        return Some("cash_flow".to_string());
+    }
     if Regex::new(
         r#"asset|liabilit|debt|financingreceivable|loansreceivable|deposits|allowanceforcreditloss|futurepolicybenefits|policyholderaccountbalances|unearnedpremiums|realestateinvestmentproperty|grossatcarryingvalue|investmentproperty"#,
     )
@@ -1967,7 +2200,10 @@ mod tests {
             vec![],
         )
         .expect("core pack should load and map");
-        let income_surface_rows = model.surface_rows.get("income").expect("income surface rows");
+        let income_surface_rows = model
+            .surface_rows
+            .get("income")
+            .expect("income surface rows");
         let op_expenses = income_surface_rows
             .iter()
             .find(|row| row.key == "operating_expenses")
@@ -1978,7 +2214,10 @@ mod tests {
             .expect("revenue surface row");
 
         assert_eq!(revenue.values.get("2025").copied().flatten(), Some(120.0));
-        assert_eq!(op_expenses.values.get("2024").copied().flatten(), Some(40.0));
+        assert_eq!(
+            op_expenses.values.get("2024").copied().flatten(),
+            Some(40.0)
+        );
         assert_eq!(op_expenses.detail_count, Some(2));
 
         let operating_expense_details = model
@@ -1987,8 +2226,12 @@ mod tests {
             .and_then(|groups| groups.get("operating_expenses"))
             .expect("operating expenses details");
         assert_eq!(operating_expense_details.len(), 2);
-        assert!(operating_expense_details.iter().any(|row| row.key == "sga-row"));
-        assert!(operating_expense_details.iter().any(|row| row.key == "rd-row"));
+        assert!(operating_expense_details
+            .iter()
+            .any(|row| row.key == "sga-row"));
+        assert!(operating_expense_details
+            .iter()
+            .any(|row| row.key == "rd-row"));
 
         let residual_rows = model
             .detail_rows
@@ -2003,17 +2246,26 @@ mod tests {
             .concept_mappings
             .get("http://fasb.org/us-gaap/2024#ResearchAndDevelopmentExpense")
             .expect("rd mapping");
-        assert_eq!(rd_mapping.detail_parent_surface_key.as_deref(), Some("operating_expenses"));
-        assert_eq!(rd_mapping.surface_key.as_deref(), Some("operating_expenses"));
+        assert_eq!(
+            rd_mapping.detail_parent_surface_key.as_deref(),
+            Some("operating_expenses")
+        );
+        assert_eq!(
+            rd_mapping.surface_key.as_deref(),
+            Some("operating_expenses")
+        );
 
         let residual_mapping = model
             .concept_mappings
             .get("urn:company#OtherOperatingCharges")
             .expect("residual mapping");
         assert!(residual_mapping.residual_flag);
-        assert_eq!(residual_mapping.detail_parent_surface_key.as_deref(), Some("unmapped"));
+        assert_eq!(
+            residual_mapping.detail_parent_surface_key.as_deref(),
+            Some("unmapped")
+        );
 
-        assert_eq!(model.normalization_summary.surface_row_count, 5);
+        assert_eq!(model.normalization_summary.surface_row_count, 6);
         assert_eq!(model.normalization_summary.detail_row_count, 3);
         assert_eq!(model.normalization_summary.unmapped_row_count, 1);
     }
@@ -2051,18 +2303,60 @@ mod tests {
     #[test]
     fn classifies_pack_specific_concepts_without_presentation_roles() {
         assert_eq!(
-            concept_statement_fallback("FinancingReceivableExcludingAccruedInterestAfterAllowanceForCreditLoss")
+            concept_statement_fallback(
+                "FinancingReceivableExcludingAccruedInterestAfterAllowanceForCreditLoss"
+            )
             .as_deref(),
             Some("balance")
         );
-        assert_eq!(concept_statement_fallback("Deposits").as_deref(), Some("balance"));
+        assert_eq!(
+            concept_statement_fallback("Deposits").as_deref(),
+            Some("balance")
+        );
         assert_eq!(
             concept_statement_fallback("RealEstateInvestmentPropertyNet").as_deref(),
             Some("balance")
         );
-        assert_eq!(concept_statement_fallback("LeaseIncome").as_deref(), Some("income"));
         assert_eq!(
-            concept_statement_fallback("DirectCostsOfLeasedAndRentedPropertyOrEquipment").as_deref(),
+            concept_statement_fallback("DeferredPolicyAcquisitionCosts").as_deref(),
+            Some("balance")
+        );
+        assert_eq!(
+            concept_statement_fallback("DeferredPolicyAcquisitionCostsAndValueOfBusinessAcquired")
+                .as_deref(),
+            Some("balance")
+        );
+        assert_eq!(
+            concept_statement_fallback("IncreaseDecreaseInAccountsReceivable").as_deref(),
+            Some("cash_flow")
+        );
+        assert_eq!(
+            concept_statement_fallback("PaymentsOfDividends").as_deref(),
+            Some("cash_flow")
+        );
+        assert_eq!(
+            concept_statement_fallback("RepaymentsOfDebt").as_deref(),
+            Some("cash_flow")
+        );
+        assert_eq!(
+            concept_statement_fallback("ShareBasedCompensation").as_deref(),
+            Some("cash_flow")
+        );
+        assert_eq!(
+            concept_statement_fallback("PaymentsForCapitalImprovements").as_deref(),
+            Some("cash_flow")
+        );
+        assert_eq!(
+            concept_statement_fallback("PaymentsForDepositsOnRealEstateAcquisitions").as_deref(),
+            Some("cash_flow")
+        );
+        assert_eq!(
+            concept_statement_fallback("LeaseIncome").as_deref(),
+            Some("income")
+        );
+        assert_eq!(
+            concept_statement_fallback("DirectCostsOfLeasedAndRentedPropertyOrEquipment")
+                .as_deref(),
             Some("income")
         );
     }
File diff suppressed because it is too large
@@ -1,12 +1,22 @@
 use anyhow::{anyhow, Context, Result};
 use serde::Deserialize;
+use std::collections::HashMap;
 use std::env;
 use std::fs;
-use std::collections::HashMap;
 use std::path::PathBuf;
 
 use crate::pack_selector::FiscalPack;
 
+fn default_include_in_output() -> bool {
+    true
+}
+
+#[derive(Debug, Deserialize, Clone, Copy, PartialEq, Eq)]
+#[serde(rename_all = "snake_case")]
+pub enum SurfaceSignTransform {
+    Invert,
+}
+
 #[derive(Debug, Deserialize, Clone)]
 pub struct SurfacePackFile {
     pub version: String,
@@ -25,9 +35,44 @@ pub struct SurfaceDefinition {
     pub rollup_policy: String,
     pub allowed_source_concepts: Vec<String>,
     pub allowed_authoritative_concepts: Vec<String>,
-    pub formula_fallback: Option<serde_json::Value>,
+    pub formula_fallback: Option<SurfaceFormulaFallback>,
     pub detail_grouping_policy: String,
     pub materiality_policy: String,
+    #[serde(default = "default_include_in_output")]
+    pub include_in_output: bool,
+    #[serde(default)]
+    pub sign_transform: Option<SurfaceSignTransform>,
+}
+
+#[derive(Debug, Deserialize, Clone)]
+#[serde(untagged)]
+pub enum SurfaceFormulaFallback {
+    LegacyString(#[allow(dead_code)] String),
+    Structured(SurfaceFormula),
+}
+
+impl SurfaceFormulaFallback {
+    pub fn structured(&self) -> Option<&SurfaceFormula> {
+        match self {
+            Self::Structured(formula) => Some(formula),
+            Self::LegacyString(_) => None,
+        }
+    }
+}
+
+#[derive(Debug, Deserialize, Clone)]
+pub struct SurfaceFormula {
+    pub op: SurfaceFormulaOp,
+    pub sources: Vec<String>,
+    #[serde(default)]
+    pub treat_null_as_zero: bool,
+}
+
+#[derive(Debug, Deserialize, Clone, Copy, PartialEq, Eq)]
+#[serde(rename_all = "snake_case")]
+pub enum SurfaceFormulaOp {
+    Sum,
+    Subtract,
 }
 
 #[derive(Debug, Deserialize, Clone)]
@@ -147,7 +192,9 @@ pub fn resolve_taxonomy_dir() -> Result<PathBuf> {
     candidates
         .into_iter()
        .find(|path| path.is_dir())
-        .ok_or_else(|| anyhow!("taxonomy resolution failed: unable to locate runtime taxonomy directory"))
+        .ok_or_else(|| {
+            anyhow!("taxonomy resolution failed: unable to locate runtime taxonomy directory")
+        })
 }
 
 pub fn load_surface_pack(pack: FiscalPack) -> Result<SurfacePackFile> {
@@ -156,14 +203,52 @@ pub fn load_surface_pack(pack: FiscalPack) -> Result<SurfacePackFile> {
         .join("fiscal")
         .join("v1")
         .join(format!("{}.surface.json", pack.as_str()));
-    let raw = fs::read_to_string(&path)
-        .with_context(|| format!("taxonomy resolution failed: unable to read {}", path.display()))?;
-    let file = serde_json::from_str::<SurfacePackFile>(&raw)
-        .with_context(|| format!("taxonomy resolution failed: unable to parse {}", path.display()))?;
+    let mut file = load_surface_pack_file(&path)?;
+
+    if !matches!(pack, FiscalPack::Core) {
+        let core_path = taxonomy_dir
+            .join("fiscal")
+            .join("v1")
+            .join("core.surface.json");
+        let core_file = load_surface_pack_file(&core_path)?;
+        let pack_inherited_keys = file
+            .surfaces
+            .iter()
+            .filter(|surface| surface.statement == "balance" || surface.statement == "cash_flow")
+            .map(|surface| (surface.statement.clone(), surface.surface_key.clone()))
+            .collect::<std::collections::HashSet<_>>();
+
+        file.surfaces.extend(
+            core_file
+                .surfaces
+                .into_iter()
+                .filter(|surface| surface.statement == "balance" || surface.statement == "cash_flow")
+                .filter(|surface| {
+                    !pack_inherited_keys
+                        .contains(&(surface.statement.clone(), surface.surface_key.clone()))
+                }),
+        );
+    }
+
     let _ = (&file.version, &file.pack);
     Ok(file)
 }
 
+fn load_surface_pack_file(path: &PathBuf) -> Result<SurfacePackFile> {
+    let raw = fs::read_to_string(path).with_context(|| {
+        format!(
+            "taxonomy resolution failed: unable to read {}",
+            path.display()
+        )
+    })?;
+    serde_json::from_str::<SurfacePackFile>(&raw).with_context(|| {
+        format!(
+            "taxonomy resolution failed: unable to parse {}",
+            path.display()
+        )
+    })
+}
+
 pub fn load_crosswalk(regime: &str) -> Result<Option<CrosswalkFile>> {
     let file_name = match regime {
         "us-gaap" => "us-gaap.json",
@@ -173,10 +258,18 @@ pub fn load_crosswalk(regime: &str) -> Result<Option<CrosswalkFile>> {
 
     let taxonomy_dir = resolve_taxonomy_dir()?;
     let path = taxonomy_dir.join("crosswalk").join(file_name);
-    let raw = fs::read_to_string(&path)
-        .with_context(|| format!("taxonomy resolution failed: unable to read {}", path.display()))?;
-    let file = serde_json::from_str::<CrosswalkFile>(&raw)
-        .with_context(|| format!("taxonomy resolution failed: unable to parse {}", path.display()))?;
+    let raw = fs::read_to_string(&path).with_context(|| {
+        format!(
+            "taxonomy resolution failed: unable to read {}",
+            path.display()
+        )
+    })?;
+    let file = serde_json::from_str::<CrosswalkFile>(&raw).with_context(|| {
+        format!(
+            "taxonomy resolution failed: unable to parse {}",
+            path.display()
+        )
+    })?;
     let _ = (&file.version, &file.regime);
     Ok(Some(file))
 }
@@ -188,10 +281,18 @@ pub fn load_kpi_pack(pack: FiscalPack) -> Result<KpiPackFile> {
         .join("v1")
         .join("kpis")
         .join(format!("{}.kpis.json", pack.as_str()));
-    let raw = fs::read_to_string(&path)
-        .with_context(|| format!("taxonomy resolution failed: unable to read {}", path.display()))?;
-    let file = serde_json::from_str::<KpiPackFile>(&raw)
-        .with_context(|| format!("taxonomy resolution failed: unable to parse {}", path.display()))?;
+    let raw = fs::read_to_string(&path).with_context(|| {
+        format!(
+            "taxonomy resolution failed: unable to read {}",
+            path.display()
+        )
+    })?;
+    let file = serde_json::from_str::<KpiPackFile>(&raw).with_context(|| {
+        format!(
+            "taxonomy resolution failed: unable to parse {}",
+            path.display()
+        )
+    })?;
     let _ = (&file.version, &file.pack);
     Ok(file)
 }
@@ -202,10 +303,18 @@ pub fn load_universal_income_definitions() -> Result<UniversalIncomeFile> {
         .join("fiscal")
         .join("v1")
         .join("universal_income.surface.json");
-    let raw = fs::read_to_string(&path)
-        .with_context(|| format!("taxonomy resolution failed: unable to read {}", path.display()))?;
-    let file = serde_json::from_str::<UniversalIncomeFile>(&raw)
-        .with_context(|| format!("taxonomy resolution failed: unable to parse {}", path.display()))?;
+    let raw = fs::read_to_string(&path).with_context(|| {
+        format!(
+            "taxonomy resolution failed: unable to read {}",
+            path.display()
+        )
+    })?;
+    let file = serde_json::from_str::<UniversalIncomeFile>(&raw).with_context(|| {
+        format!(
+            "taxonomy resolution failed: unable to parse {}",
+            path.display()
+        )
+    })?;
     let _ = &file.version;
     Ok(file)
 }
@@ -216,10 +325,18 @@ pub fn load_income_bridge(pack: FiscalPack) -> Result<IncomeBridgeFile> {
         .join("fiscal")
         .join("v1")
         .join(format!("{}.income-bridge.json", pack.as_str()));
-    let raw = fs::read_to_string(&path)
-        .with_context(|| format!("taxonomy resolution failed: unable to read {}", path.display()))?;
-    let file = serde_json::from_str::<IncomeBridgeFile>(&raw)
-        .with_context(|| format!("taxonomy resolution failed: unable to parse {}", path.display()))?;
+    let raw = fs::read_to_string(&path).with_context(|| {
+        format!(
+            "taxonomy resolution failed: unable to read {}",
+            path.display()
+        )
+    })?;
+    let file = serde_json::from_str::<IncomeBridgeFile>(&raw).with_context(|| {
+        format!(
+            "taxonomy resolution failed: unable to parse {}",
+            path.display()
+        )
+    })?;
     let _ = (&file.version, &file.pack);
     Ok(file)
 }
@@ -230,17 +347,20 @@ mod tests {
 
     #[test]
     fn resolves_taxonomy_dir_and_loads_core_pack() {
-        let taxonomy_dir = resolve_taxonomy_dir().expect("taxonomy dir should resolve during tests");
+        let taxonomy_dir =
+            resolve_taxonomy_dir().expect("taxonomy dir should resolve during tests");
         assert!(taxonomy_dir.exists());
 
-        let surface_pack = load_surface_pack(FiscalPack::Core).expect("core surface pack should load");
+        let surface_pack =
+            load_surface_pack(FiscalPack::Core).expect("core surface pack should load");
         assert_eq!(surface_pack.pack, "core");
         assert!(!surface_pack.surfaces.is_empty());
 
         let kpi_pack = load_kpi_pack(FiscalPack::Core).expect("core kpi pack should load");
         assert_eq!(kpi_pack.pack, "core");
 
-        let universal_income = load_universal_income_definitions().expect("universal income config should load");
+        let universal_income =
+            load_universal_income_definitions().expect("universal income config should load");
         assert!(!universal_income.rows.is_empty());
 
         let core_bridge = load_income_bridge(FiscalPack::Core).expect("core bridge should load");
File diff suppressed because it is too large
@@ -156,7 +156,7 @@
       "surface_key": "loans",
       "statement": "balance",
       "label": "Loans",
-      "category": "surface",
+      "category": "noncurrent_assets",
       "order": 30,
       "unit": "currency",
       "rollup_policy": "aggregate_children",
@@ -181,7 +181,7 @@
       "surface_key": "allowance_for_credit_losses",
       "statement": "balance",
       "label": "Allowance for Credit Losses",
-      "category": "surface",
+      "category": "noncurrent_assets",
       "order": 40,
       "unit": "currency",
       "rollup_policy": "aggregate_children",
@@ -201,7 +201,7 @@
       "surface_key": "deposits",
       "statement": "balance",
       "label": "Deposits",
-      "category": "surface",
+      "category": "current_liabilities",
       "order": 80,
       "unit": "currency",
       "rollup_policy": "aggregate_children",
@@ -215,7 +215,7 @@
       "surface_key": "total_assets",
       "statement": "balance",
       "label": "Total Assets",
-      "category": "surface",
+      "category": "derived",
       "order": 90,
       "unit": "currency",
       "rollup_policy": "direct_only",
@@ -229,7 +229,7 @@
       "surface_key": "total_liabilities",
       "statement": "balance",
       "label": "Total Liabilities",
-      "category": "surface",
+      "category": "derived",
       "order": 100,
       "unit": "currency",
       "rollup_policy": "direct_only",
@@ -243,7 +243,7 @@
       "surface_key": "total_equity",
       "statement": "balance",
       "label": "Total Equity",
-      "category": "surface",
+      "category": "equity",
       "order": 110,
       "unit": "currency",
       "rollup_policy": "direct_only",
@@ -63,7 +63,7 @@
       "surface_key": "total_assets",
       "statement": "balance",
       "label": "Total Assets",
-      "category": "surface",
+      "category": "derived",
       "order": 90,
       "unit": "currency",
       "rollup_policy": "direct_only",
@@ -77,7 +77,7 @@
       "surface_key": "total_liabilities",
       "statement": "balance",
       "label": "Total Liabilities",
-      "category": "surface",
+      "category": "derived",
       "order": 100,
       "unit": "currency",
       "rollup_policy": "direct_only",
@@ -91,7 +91,7 @@
       "surface_key": "total_equity",
       "statement": "balance",
       "label": "Total Equity",
-      "category": "surface",
+      "category": "equity",
       "order": 110,
       "unit": "currency",
       "rollup_policy": "direct_only",
File diff suppressed because it is too large
@@ -119,7 +119,7 @@
       "surface_key": "policy_liabilities",
       "statement": "balance",
       "label": "Policy Liabilities",
-      "category": "surface",
+      "category": "noncurrent_liabilities",
       "order": 80,
       "unit": "currency",
       "rollup_policy": "aggregate_children",
@@ -145,17 +145,19 @@
       "surface_key": "deferred_acquisition_costs",
       "statement": "balance",
       "label": "Deferred Acquisition Costs",
-      "category": "surface",
+      "category": "noncurrent_assets",
       "order": 90,
       "unit": "currency",
       "rollup_policy": "aggregate_children",
       "allowed_source_concepts": [
         "us-gaap:DeferredPolicyAcquisitionCosts",
-        "us-gaap:DeferredAcquisitionCosts"
+        "us-gaap:DeferredAcquisitionCosts",
+        "us-gaap:DeferredPolicyAcquisitionCostsAndValueOfBusinessAcquired"
       ],
       "allowed_authoritative_concepts": [
         "us-gaap:DeferredPolicyAcquisitionCosts",
-        "us-gaap:DeferredAcquisitionCosts"
+        "us-gaap:DeferredAcquisitionCosts",
+        "us-gaap:DeferredPolicyAcquisitionCostsAndValueOfBusinessAcquired"
       ],
       "formula_fallback": null,
       "detail_grouping_policy": "group_all_children",
@@ -165,7 +167,7 @@
       "surface_key": "total_assets",
       "statement": "balance",
       "label": "Total Assets",
-      "category": "surface",
+      "category": "derived",
       "order": 100,
       "unit": "currency",
       "rollup_policy": "direct_only",
@@ -179,7 +181,7 @@
       "surface_key": "total_liabilities",
       "statement": "balance",
       "label": "Total Liabilities",
-      "category": "surface",
+      "category": "derived",
       "order": 110,
       "unit": "currency",
       "rollup_policy": "direct_only",
@@ -193,7 +195,7 @@
       "surface_key": "total_equity",
       "statement": "balance",
       "label": "Total Equity",
-      "category": "surface",
+      "category": "equity",
       "order": 120,
       "unit": "currency",
       "rollup_policy": "direct_only",
@@ -78,7 +78,7 @@
       "surface_key": "investment_property",
       "statement": "balance",
       "label": "Investment Property",
-      "category": "surface",
+      "category": "noncurrent_assets",
       "order": 40,
       "unit": "currency",
       "rollup_policy": "aggregate_children",
@@ -99,7 +99,7 @@
       "surface_key": "total_assets",
       "statement": "balance",
       "label": "Total Assets",
-      "category": "surface",
+      "category": "derived",
       "order": 90,
       "unit": "currency",
       "rollup_policy": "direct_only",
@@ -113,7 +113,7 @@
       "surface_key": "total_liabilities",
       "statement": "balance",
       "label": "Total Liabilities",
-      "category": "surface",
+      "category": "derived",
       "order": 100,
       "unit": "currency",
       "rollup_policy": "direct_only",
@@ -127,7 +127,7 @@
       "surface_key": "total_equity",
       "statement": "balance",
       "label": "Total Equity",
-      "category": "surface",
+      "category": "equity",
       "order": 110,
       "unit": "currency",
       "rollup_policy": "direct_only",
@@ -136,6 +136,25 @@
       "formula_fallback": null,
       "detail_grouping_policy": "top_level_only",
       "materiality_policy": "balance_default"
+    },
+    {
+      "surface_key": "capital_expenditures",
+      "statement": "cash_flow",
+      "label": "Capital Expenditures",
+      "category": "investing",
+      "order": 130,
+      "unit": "currency",
+      "rollup_policy": "aggregate_children",
+      "allowed_source_concepts": [
+        "us-gaap:PaymentsToAcquireCommercialRealEstate",
+        "us-gaap:PaymentsForCapitalImprovements",
+        "us-gaap:PaymentsForDepositsOnRealEstateAcquisitions"
+      ],
+      "allowed_authoritative_concepts": [],
+      "formula_fallback": null,
+      "detail_grouping_policy": "group_all_children",
+      "materiality_policy": "cash_flow_default",
+      "sign_transform": "invert"
     }
   ]
 }
@@ -5,7 +5,7 @@ import { hydrateFilingTaxonomySnapshot } from '@/lib/server/taxonomy/engine';
 import type { TaxonomyHydrationInput, TaxonomyHydrationResult } from '@/lib/server/taxonomy/types';
 
 type ComparisonTarget = {
-  statement: Extract<FinancialStatementKind, 'income' | 'balance'>;
+  statement: Extract<FinancialStatementKind, 'income' | 'balance' | 'cash_flow'>;
   surfaceKey: string;
   fiscalAiLabels: string[];
   allowNotMeaningful?: boolean;
@@ -46,7 +46,7 @@ type FiscalAiTable = {
|
|||||||
};
|
};
|
||||||
|
|
||||||
type ComparisonRow = {
|
type ComparisonRow = {
|
||||||
statement: Extract<FinancialStatementKind, 'income' | 'balance'>;
|
statement: Extract<FinancialStatementKind, 'income' | 'balance' | 'cash_flow'>;
|
||||||
surfaceKey: string;
|
surfaceKey: string;
|
||||||
fiscalAiLabel: string | null;
|
fiscalAiLabel: string | null;
|
||||||
fiscalAiValueM: number | null;
|
fiscalAiValueM: number | null;
|
||||||
@@ -89,6 +89,11 @@ const CASES: CompanyCase[] = [
|
|||||||
surfaceKey: 'net_income',
|
surfaceKey: 'net_income',
|
||||||
fiscalAiLabels: ['Net Income Attributable to Common Shareholders', 'Consolidated Net Income', 'Net Income']
|
fiscalAiLabels: ['Net Income Attributable to Common Shareholders', 'Consolidated Net Income', 'Net Income']
|
||||||
},
|
},
|
||||||
|
{ statement: 'balance', surfaceKey: 'current_assets', fiscalAiLabels: ['Current Assets', 'Total Current Assets'] },
|
||||||
|
{ statement: 'balance', surfaceKey: 'total_assets', fiscalAiLabels: ['Total Assets'] },
|
||||||
|
{ statement: 'cash_flow', surfaceKey: 'operating_cash_flow', fiscalAiLabels: ['Cash from Operating Activities', 'Operating Cash Flow', 'Net Cash from Operations', 'Net Cash Provided by Operating'] },
|
||||||
|
{ statement: 'cash_flow', surfaceKey: 'capital_expenditures', fiscalAiLabels: ['Capital Expenditures', 'Capital Expenditure'] },
|
||||||
|
{ statement: 'cash_flow', surfaceKey: 'free_cash_flow', fiscalAiLabels: ['Free Cash Flow', 'Levered Free Cash Flow'] },
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -113,6 +118,11 @@ const CASES: CompanyCase[] = [
|
|||||||
surfaceKey: 'net_income',
|
surfaceKey: 'net_income',
|
||||||
fiscalAiLabels: ['Net Income to Common', 'Net Income Attributable to Common Shareholders', 'Net Income']
|
fiscalAiLabels: ['Net Income to Common', 'Net Income Attributable to Common Shareholders', 'Net Income']
|
||||||
},
|
},
|
||||||
|
{ statement: 'balance', surfaceKey: 'loans', fiscalAiLabels: ['Net Loans', 'Loans', 'Loans Receivable'] },
|
||||||
|
{ statement: 'balance', surfaceKey: 'total_assets', fiscalAiLabels: ['Total Assets'] },
|
||||||
|
{ statement: 'cash_flow', surfaceKey: 'operating_cash_flow', fiscalAiLabels: ['Cash from Operating Activities', 'Net Cash from Operating Activities', 'Net Cash Provided by Operating'] },
|
||||||
|
{ statement: 'cash_flow', surfaceKey: 'investing_cash_flow', fiscalAiLabels: ['Cash from Investing Activities', 'Net Cash from Investing Activities', 'Net Cash Provided by Investing'] },
|
||||||
|
{ statement: 'cash_flow', surfaceKey: 'financing_cash_flow', fiscalAiLabels: ['Cash from Financing Activities', 'Net Cash from Financing Activities', 'Net Cash Provided by Financing'] },
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -137,6 +147,18 @@ const CASES: CompanyCase[] = [
|
|||||||
surfaceKey: 'net_income',
|
surfaceKey: 'net_income',
|
||||||
fiscalAiLabels: ['Net Income Attributable to Common Shareholders', 'Consolidated Net Income', 'Net Income']
|
fiscalAiLabels: ['Net Income Attributable to Common Shareholders', 'Consolidated Net Income', 'Net Income']
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
statement: 'balance',
|
||||||
|
surfaceKey: 'deferred_acquisition_costs',
|
||||||
|
fiscalAiLabels: [
|
||||||
|
'Deferred Acquisition Costs',
|
||||||
|
'Deferred Policy Acquisition Costs',
|
||||||
|
'Deferred Policy Acquisition Costs and Value of Business Acquired'
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{ statement: 'balance', surfaceKey: 'total_assets', fiscalAiLabels: ['Total Assets'] },
|
||||||
|
{ statement: 'cash_flow', surfaceKey: 'operating_cash_flow', fiscalAiLabels: ['Cash from Operating Activities', 'Operating Cash Flow', 'Net Cash from Operations', 'Net Cash Provided by Operating'] },
|
||||||
|
{ statement: 'cash_flow', surfaceKey: 'free_cash_flow', fiscalAiLabels: ['Free Cash Flow', 'Levered Free Cash Flow'] },
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -154,7 +176,22 @@ const CASES: CompanyCase[] = [
|
|||||||
statement: 'income',
|
statement: 'income',
|
||||||
surfaceKey: 'net_income',
|
surfaceKey: 'net_income',
|
||||||
fiscalAiLabels: ['Net Income Attributable to Common Shareholders', 'Consolidated Net Income', 'Net Income']
|
fiscalAiLabels: ['Net Income Attributable to Common Shareholders', 'Consolidated Net Income', 'Net Income']
|
||||||
}
|
},
|
||||||
|
{
|
||||||
|
statement: 'balance',
|
||||||
|
surfaceKey: 'investment_property',
|
||||||
|
fiscalAiLabels: [
|
||||||
|
'Investment Property',
|
||||||
|
'Investment Properties',
|
||||||
|
'Real Estate Investment Property, Net',
|
||||||
|
'Real Estate Investment Property, at Cost',
|
||||||
|
'Total real estate held for investment, at cost'
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{ statement: 'balance', surfaceKey: 'total_assets', fiscalAiLabels: ['Total Assets'] },
|
||||||
|
{ statement: 'cash_flow', surfaceKey: 'operating_cash_flow', fiscalAiLabels: ['Cash from Operating Activities', 'Operating Cash Flow', 'Net Cash from Operations', 'Net Cash Provided by Operating'] },
|
||||||
|
{ statement: 'cash_flow', surfaceKey: 'capital_expenditures', fiscalAiLabels: ['Capital Expenditures', 'Capital Expenditure'] },
|
||||||
|
{ statement: 'cash_flow', surfaceKey: 'free_cash_flow', fiscalAiLabels: ['Free Cash Flow', 'Levered Free Cash Flow'] }
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -184,6 +221,9 @@ const CASES: CompanyCase[] = [
|
|||||||
];
|
];
|
||||||
|
|
||||||
function parseTickerFilter(argv: string[]) {
|
function parseTickerFilter(argv: string[]) {
|
||||||
|
let ticker: string | null = null;
|
||||||
|
let statement: Extract<FinancialStatementKind, 'income' | 'balance' | 'cash_flow'> | null = null;
|
||||||
|
|
||||||
for (const arg of argv) {
|
for (const arg of argv) {
|
||||||
if (arg === '--help' || arg === '-h') {
|
if (arg === '--help' || arg === '-h') {
|
||||||
console.log('Compare live Fiscal.ai standardized statement rows against local sidecar output.');
|
console.log('Compare live Fiscal.ai standardized statement rows against local sidecar output.');
|
||||||
@@ -191,16 +231,26 @@ function parseTickerFilter(argv: string[]) {
|
|||||||
console.log('Usage:');
|
console.log('Usage:');
|
||||||
console.log(' bun run scripts/compare-fiscal-ai-statements.ts');
|
console.log(' bun run scripts/compare-fiscal-ai-statements.ts');
|
||||||
console.log(' bun run scripts/compare-fiscal-ai-statements.ts --ticker=MSFT');
|
console.log(' bun run scripts/compare-fiscal-ai-statements.ts --ticker=MSFT');
|
||||||
|
console.log(' bun run scripts/compare-fiscal-ai-statements.ts --statement=balance');
|
||||||
|
console.log(' bun run scripts/compare-fiscal-ai-statements.ts --statement=cash_flow');
|
||||||
process.exit(0);
|
process.exit(0);
|
||||||
}
|
}
|
||||||
|
|
||||||
if (arg.startsWith('--ticker=')) {
|
if (arg.startsWith('--ticker=')) {
|
||||||
const value = arg.slice('--ticker='.length).trim().toUpperCase();
|
const value = arg.slice('--ticker='.length).trim().toUpperCase();
|
||||||
return value.length > 0 ? value : null;
|
ticker = value.length > 0 ? value : null;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (arg.startsWith('--statement=')) {
|
||||||
|
const value = arg.slice('--statement='.length).trim().toLowerCase().replace(/-/g, '_');
|
||||||
|
if (value === 'income' || value === 'balance' || value === 'cash_flow') {
|
||||||
|
statement = value;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
return null;
|
return { ticker, statement };
|
||||||
}
|
}
|
||||||
|
|
||||||
function normalizeLabel(value: string) {
|
function normalizeLabel(value: string) {
|
||||||
@@ -295,10 +345,98 @@ function chooseInstantPeriodId(result: TaxonomyHydrationResult) {
|
|||||||
return instantPeriods[0]?.id ?? null;
|
return instantPeriods[0]?.id ?? null;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function parseColumnLabelPeriodEnd(columnLabel: string) {
|
||||||
|
const match = columnLabel.match(/^([A-Za-z]{3})\s+'?(\d{2,4})$/);
|
||||||
|
if (!match) {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
const [, monthToken, yearToken] = match;
|
||||||
|
const monthMap: Record<string, number> = {
|
||||||
|
jan: 0,
|
||||||
|
feb: 1,
|
||||||
|
mar: 2,
|
||||||
|
apr: 3,
|
||||||
|
may: 4,
|
||||||
|
jun: 5,
|
||||||
|
jul: 6,
|
||||||
|
aug: 7,
|
||||||
|
sep: 8,
|
||||||
|
oct: 9,
|
||||||
|
nov: 10,
|
||||||
|
dec: 11
|
||||||
|
};
|
||||||
|
const month = monthMap[monthToken.toLowerCase()];
|
||||||
|
if (month === undefined) {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
const parsedYear = Number.parseInt(yearToken, 10);
|
||||||
|
if (!Number.isFinite(parsedYear)) {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
const year = yearToken.length === 2 ? 2000 + parsedYear : parsedYear;
|
||||||
|
return { month, year };
|
||||||
|
}
|
||||||
|
|
||||||
|
function choosePeriodIdForColumnLabel(
|
||||||
|
result: TaxonomyHydrationResult,
|
||||||
|
statement: Extract<FinancialStatementKind, 'income' | 'balance' | 'cash_flow'>,
|
||||||
|
columnLabel: string
|
||||||
|
) {
|
||||||
|
const parsed = parseColumnLabelPeriodEnd(columnLabel);
|
||||||
|
if (!parsed) {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
const matchingPeriods = result.periods
|
||||||
|
.filter((period): period is ResultPeriod => {
|
||||||
|
const end = periodEnd(period as ResultPeriod);
|
||||||
|
if (!end) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
const endDate = new Date(end);
|
||||||
|
if (Number.isNaN(endDate.getTime())) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
const periodMatchesStatement = statement === 'balance'
|
||||||
|
? !periodStart(period as ResultPeriod)
|
||||||
|
: Boolean(periodStart(period as ResultPeriod));
|
||||||
|
if (!periodMatchesStatement) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
return endDate.getUTCFullYear() === parsed.year && endDate.getUTCMonth() === parsed.month;
|
||||||
|
})
|
||||||
|
.sort((left, right) => {
|
||||||
|
if (statement !== 'balance') {
|
||||||
|
const leftStart = periodStart(left);
|
||||||
|
const rightStart = periodStart(right);
|
||||||
|
const leftDuration = leftStart
|
||||||
|
? Math.round((Date.parse(periodEnd(left) as string) - Date.parse(leftStart)) / (1000 * 60 * 60 * 24))
|
||||||
|
: -1;
|
||||||
|
const rightDuration = rightStart
|
||||||
|
? Math.round((Date.parse(periodEnd(right) as string) - Date.parse(rightStart)) / (1000 * 60 * 60 * 24))
|
||||||
|
: -1;
|
||||||
|
|
||||||
|
if (leftDuration !== rightDuration) {
|
||||||
|
return rightDuration - leftDuration;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return Date.parse(periodEnd(right) as string) - Date.parse(periodEnd(left) as string);
|
||||||
|
});
|
||||||
|
|
||||||
|
return matchingPeriods[0]?.id ?? null;
|
||||||
|
}
|
||||||
|
|
||||||
function findSurfaceValue(
|
function findSurfaceValue(
|
||||||
result: TaxonomyHydrationResult,
|
result: TaxonomyHydrationResult,
|
||||||
statement: Extract<FinancialStatementKind, 'income' | 'balance'>,
|
statement: Extract<FinancialStatementKind, 'income' | 'balance' | 'cash_flow'>,
|
||||||
surfaceKey: string
|
surfaceKey: string,
|
||||||
|
referenceColumnLabel?: string
|
||||||
) {
|
) {
|
||||||
const rows = result.surface_rows[statement] ?? [];
|
const rows = result.surface_rows[statement] ?? [];
|
||||||
const row = rows.find((entry) => entry.key === surfaceKey) ?? null;
|
const row = rows.find((entry) => entry.key === surfaceKey) ?? null;
|
||||||
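The new period matching hinges on `parseColumnLabelPeriodEnd`, which turns Fiscal.ai column headers like `Dec '24` into a month/year pair. A self-contained sketch of the same parsing logic (with an array lookup swapped in for the diff's `monthMap` object):

```typescript
// Standalone sketch of the "Mon 'YY" / "Mon YYYY" header parsing introduced above.
function parseColumnLabelPeriodEnd(columnLabel: string): { month: number; year: number } | null {
  // Three-letter month, whitespace, optional apostrophe, then a 2- or 4-digit year.
  const match = columnLabel.match(/^([A-Za-z]{3})\s+'?(\d{2,4})$/);
  if (!match) {
    return null;
  }
  const [, monthToken, yearToken] = match;
  const months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];
  const month = months.indexOf(monthToken.toLowerCase());
  if (month < 0) {
    return null;
  }
  const parsedYear = Number.parseInt(yearToken, 10);
  if (!Number.isFinite(parsedYear)) {
    return null;
  }
  // Two-digit years are assumed to be 20xx, matching the diff above.
  return { month, year: yearToken.length === 2 ? 2000 + parsedYear : parsedYear };
}

// parseColumnLabelPeriodEnd("Dec '24") → { month: 11, year: 2024 }
// parseColumnLabelPeriodEnd("FY 2024") → null (label shape not recognized)
```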
```diff
@@ -306,9 +444,11 @@ function findSurfaceValue(
     return { row: null, value: null };
   }
 
-  const periodId = statement === 'balance'
+  const periodId = (referenceColumnLabel
+    ? choosePeriodIdForColumnLabel(result, statement, referenceColumnLabel)
+    : null) ?? (statement === 'balance'
     ? chooseInstantPeriodId(result)
-    : chooseDurationPeriodId(result);
+    : chooseDurationPeriodId(result));
 
   if (periodId) {
     const directValue = row.values[periodId];
@@ -412,14 +552,24 @@ async function fetchLatestAnnualFiling(company: CompanyCase): Promise<TaxonomyHy
 async function scrapeFiscalAiTable(
   page: import('@playwright/test').Page,
   exchangeTicker: string,
-  statement: 'income' | 'balance'
+  statement: 'income' | 'balance' | 'cash_flow'
 ): Promise<FiscalAiTable> {
-  const pagePath = statement === 'income' ? 'income-statement' : 'balance-sheet';
+  const pagePath = statement === 'income'
+    ? 'income-statement'
+    : statement === 'balance'
+      ? 'balance-sheet'
+      : 'cash-flow-statement';
   const url = `https://fiscal.ai/company/${exchangeTicker}/financials/${pagePath}/annual/?templateType=standardized`;
 
   await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 120_000 });
   await page.waitForSelector('table', { timeout: 120_000 });
   await page.waitForTimeout(2_500);
+  await page.evaluate(async () => {
+    window.scrollTo(0, document.body.scrollHeight);
+    await new Promise((resolve) => setTimeout(resolve, 750));
+    window.scrollTo(0, 0);
+    await new Promise((resolve) => setTimeout(resolve, 250));
+  });
+
   return await page.evaluate(() => {
     function normalizeLabel(value: string) {
@@ -452,45 +602,52 @@ async function scrapeFiscalAiTable(
       return Number.isFinite(parsed) ? (negative ? -Math.abs(parsed) : parsed) : null;
     }
 
-    const table = document.querySelector('table');
-    if (!table) {
+    const tables = Array.from(document.querySelectorAll('table'));
+    if (tables.length === 0) {
       throw new Error('Fiscal.ai table not found');
     }
 
+    const rowsByLabel = new Map<string, FiscalAiTableRow>();
+    let columnLabel = 'unknown';
+
+    for (const table of tables) {
       const headerCells = Array.from(table.querySelectorAll('tr:first-child th, tr:first-child td'))
         .map((cell) => cell.textContent?.trim() ?? '')
         .filter((value) => value.length > 0);
 
       const annualColumnIndex = headerCells.findIndex((value, index) => index > 0 && value !== 'LTM');
       if (annualColumnIndex < 0) {
-        throw new Error(`Could not locate latest annual column in headers: ${headerCells.join(' | ')}`);
+        continue;
       }
 
-      const rows = Array.from(table.querySelectorAll('tr'))
-        .slice(1)
-        .map((row) => {
+      if (columnLabel === 'unknown') {
+        columnLabel = headerCells[annualColumnIndex] ?? 'unknown';
+      }
+
+      for (const row of Array.from(table.querySelectorAll('tr')).slice(1)) {
         const cells = Array.from(row.querySelectorAll('td'));
         if (cells.length <= annualColumnIndex) {
-          return null;
+          continue;
         }
 
         const label = cells[0]?.textContent?.trim() ?? '';
         const valueText = cells[annualColumnIndex]?.textContent?.trim() ?? '';
         if (!label) {
-          return null;
+          continue;
        }
 
-        return {
+        rowsByLabel.set(label, {
          label,
          normalizedLabel: normalizeLabel(label),
          valueText,
          value: parseDisplayedNumber(valueText)
-        };
-      })
-      .filter((entry): entry is FiscalAiTableRow => entry !== null);
+        });
+      }
+    }
 
+    const rows = Array.from(rowsByLabel.values());
+
     return {
-      columnLabel: headerCells[annualColumnIndex] ?? 'unknown',
+      columnLabel,
       rows
     };
   });
```
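The rewritten scraper merges rows from every `<table>` on the page into a label-keyed `Map`, so a label that appears in more than one table keeps its last-seen value. A reduced sketch of that de-duplication (simplified row type, not the script's `FiscalAiTableRow`):

```typescript
// Sketch of the Map-based row de-duplication introduced above, for pages
// where the statement is rendered across several <table> elements.
type ScrapedRow = { label: string; value: number | null };

function dedupeByLabel(tables: ScrapedRow[][]): ScrapedRow[] {
  const rowsByLabel = new Map<string, ScrapedRow>();
  for (const table of tables) {
    for (const row of table) {
      // Map.set overwrites, so later tables win for a repeated label.
      rowsByLabel.set(row.label, row);
    }
  }
  // Map preserves insertion order, so first-seen labels keep their position.
  return Array.from(rowsByLabel.values());
}

// dedupeByLabel([[{ label: 'Total Assets', value: 1 }], [{ label: 'Total Assets', value: 2 }]])
//   → [{ label: 'Total Assets', value: 2 }]
```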
```diff
@@ -536,7 +693,7 @@ function compareRow(
 ): ComparisonRow {
   const fiscalAiRow = findFiscalAiRow(fiscalAiTable.rows, target.fiscalAiLabels);
   const fiscalAiValueM = fiscalAiRow?.value ?? null;
-  const ourSurface = findSurfaceValue(result, target.statement, target.surfaceKey);
+  const ourSurface = findSurfaceValue(result, target.statement, target.surfaceKey, fiscalAiTable.columnLabel);
   const ourValueM = roundMillions(ourSurface.value);
   const absDiffM = absoluteDiff(ourValueM, fiscalAiValueM);
   const relDiffValue = relativeDiff(ourValueM, fiscalAiValueM);
@@ -587,17 +744,34 @@ async function compareCase(page: import('@playwright/test').Page, company: Compa
     throw new Error(`${company.ticker} parse_status=${result.parse_status}${result.parse_error ? ` parse_error=${result.parse_error}` : ''}`);
   }
 
-  const incomeTable = await scrapeFiscalAiTable(page, company.exchangeTicker, 'income');
-  const balanceTable = await scrapeFiscalAiTable(page, company.exchangeTicker, 'balance');
+  const statementKinds = new Set(company.comparisons.map((target) => target.statement));
+  const incomeTable = statementKinds.has('income')
+    ? await scrapeFiscalAiTable(page, company.exchangeTicker, 'income')
+    : null;
+  const balanceTable = statementKinds.has('balance')
+    ? await scrapeFiscalAiTable(page, company.exchangeTicker, 'balance')
+    : null;
+  const cashFlowTable = statementKinds.has('cash_flow')
+    ? await scrapeFiscalAiTable(page, company.exchangeTicker, 'cash_flow')
+    : null;
   const rows = company.comparisons.map((target) => {
-    const table = target.statement === 'income' ? incomeTable : balanceTable;
+    const table = target.statement === 'income'
+      ? incomeTable
+      : target.statement === 'balance'
+        ? balanceTable
+        : cashFlowTable;
+    if (!table) {
+      throw new Error(`Missing scraped table for ${target.statement}`);
+    }
     return compareRow(target, result, table);
   });
 
-  const failures = rows.filter((row) => row.status === 'fail' || row.status === 'missing_ours');
+  const failures = rows.filter(
+    (row) => row.status === 'fail' || row.status === 'missing_ours' || row.status === 'missing_reference'
+  );
 
   console.log(
-    `[compare-fiscal-ai] ${company.ticker} filing=${filing.accessionNumber} fiscal_pack=${result.fiscal_pack ?? 'null'} income_column="${incomeTable.columnLabel}" balance_column="${balanceTable.columnLabel}" pass=${rows.length - failures.length}/${rows.length}`
+    `[compare-fiscal-ai] ${company.ticker} filing=${filing.accessionNumber} fiscal_pack=${result.fiscal_pack ?? 'null'} income_column="${incomeTable?.columnLabel ?? 'n/a'}" balance_column="${balanceTable?.columnLabel ?? 'n/a'}" cash_flow_column="${cashFlowTable?.columnLabel ?? 'n/a'}" pass=${rows.length - failures.length}/${rows.length}`
   );
   for (const row of rows) {
     console.log(
@@ -625,18 +799,28 @@ async function compareCase(page: import('@playwright/test').Page, company: Compa
 
 async function main() {
   process.env.XBRL_ENGINE_TIMEOUT_MS = process.env.XBRL_ENGINE_TIMEOUT_MS ?? '180000';
-  const tickerFilter = parseTickerFilter(process.argv.slice(2));
-  const selectedCases = tickerFilter
-    ? CASES.filter((entry) => entry.ticker === tickerFilter)
-    : CASES;
+  const filters = parseTickerFilter(process.argv.slice(2));
+  const selectedCases = (filters.ticker
+    ? CASES.filter((entry) => entry.ticker === filters.ticker)
+    : CASES
+  )
+    .map((entry) => ({
+      ...entry,
+      comparisons: filters.statement
+        ? entry.comparisons.filter((target) => target.statement === filters.statement)
+        : entry.comparisons
+    }))
+    .filter((entry) => entry.comparisons.length > 0);
 
   if (selectedCases.length === 0) {
-    console.error(`[compare-fiscal-ai] unknown ticker: ${tickerFilter}`);
+    console.error(
+      `[compare-fiscal-ai] no matching cases for ticker=${filters.ticker ?? 'all'} statement=${filters.statement ?? 'all'}`
+    );
     process.exitCode = 1;
     return;
   }
 
-  const browser = await chromium.launch({ headless: false });
+  const browser = await chromium.launch({ headless: true });
   const page = await browser.newPage({
     userAgent: BROWSER_USER_AGENT
   });
```
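The `--statement` flag added to `parseTickerFilter` accepts hyphenated spellings by rewriting them to underscores before validating against the three known statement kinds. A standalone sketch of that normalization:

```typescript
// Sketch of the --statement flag normalization from parseTickerFilter above.
function normalizeStatementFlag(raw: string): 'income' | 'balance' | 'cash_flow' | null {
  // Lowercase and rewrite hyphens so "cash-flow" and "cash_flow" both match.
  const value = raw.trim().toLowerCase().replace(/-/g, '_');
  return value === 'income' || value === 'balance' || value === 'cash_flow' ? value : null;
}

// normalizeStatementFlag('cash-flow') → 'cash_flow'
// normalizeStatementFlag('Balance')   → 'balance'
// normalizeStatementFlag('cashflow')  → null (unrecognized, filter stays unset)
```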