Taxonomy Architecture
Overview
The taxonomy system defines all financial surfaces, computed ratios, and KPIs used throughout the application. The Rust JSON files in rust/taxonomy/ serve as the single source of truth for all financial definitions.
Data Flow
┌─────────────────────────────────────────────────────────────────┐
│                    rust/taxonomy/fiscal/v1/                     │
│                                                                 │
│  core.surface.json        - Income/Balance/Cash Flow surfaces   │
│  core.computed.json       - Ratio definitions                   │
│  core.kpis.json           - Sector-specific KPIs                │
│  core.income-bridge.json  - Income statement mapping rules      │
│                                                                 │
│  *.surface.json           - Core plus industry-specific packs   │
│  *.income-bridge.json     - Pack-specific universal income maps │
│  kpis/*.kpis.json         - Pack KPI bundles                    │
└──────────────────────────┬──────────────────────────────────────┘
                           │
         ┌─────────────────┼─────────────────┐
         │                 │                 │
         ▼                 ▼                 ▼
  ┌─────────────┐  ┌────────────────┐ ┌───────────────┐
  │ Rust Sidecar│  │ TS Generator   │ │ TypeScript    │
  │ fiscal-xbrl │  │ scripts/       │ │ Runtime       │
  │             │  │ generate-      │ │               │
  │ Parses XBRL │  │ taxonomy.ts    │ │ UI/API        │
  │ Maps to     │  │                │ │               │
  │ surfaces    │  │ Generates TS   │ │ Uses generated│
  │ Computes    │  │ types & consts │ │ definitions   │
  │ ratios      │  │                │ │               │
  └──────┬──────┘  └───────┬────────┘ └──────┬────────┘
         │                 │                 │
         │                 ▼                 │
         │          ┌──────────────┐         │
         │          │ lib/generated│         │
         │          │ (gitignored) │         │
         │          │              │         │
         │          │ surfaces/    │         │
         │          │ computed/    │         │
         │          │ kpis/        │         │
         │          └──────┬───────┘         │
         │                 │                 │
         ▼                 ▼                 ▼
┌─────────────────────────────────────────────────────────────┐
│                  lib/financial-metrics.ts                   │
│                                                             │
│  Thin wrapper that:                                         │
│  - Re-exports generated types                               │
│  - Provides UI-specific types (GraphableFinancialSurface)   │
│  - Transforms surfaces to metric definitions                │
└─────────────────────────────────────────────────────────────┘
File Structure
rust/taxonomy/fiscal/v1/
├── core.surface.json # Core financial surfaces
├── core.computed.json # Ratio definitions (32 ratios)
├── core.income-bridge.json # Income statement XBRL mapping
├── core.kpis.json # Core KPIs (mostly empty)
├── universal_income.surface.json
│
├── bank_lender.surface.json
├── insurance.surface.json
├── reit_real_estate.surface.json
├── broker_asset_manager.surface.json
├── agriculture.surface.json
├── contractors_construction.surface.json
├── contractors_federal_government.surface.json
├── development_stage.surface.json
├── entertainment_*.surface.json
├── extractive_mining.surface.json
├── mortgage_banking.surface.json
├── title_plant.surface.json
├── franchisors.surface.json
├── not_for_profit.surface.json
├── plan_defined_*.surface.json
├── plan_health_welfare.surface.json
├── real_estate_*.surface.json
├── software.surface.json
├── steamship.surface.json
└── kpis/
    └── *.kpis.json

lib/generated/                  # Auto-generated, gitignored
├── index.ts
├── types.ts
├── surfaces/
│   ├── index.ts
│   ├── income.ts
│   ├── balance.ts
│   └── cash_flow.ts
├── computed/
│   ├── index.ts
│   └── core.ts
└── kpis/
    ├── index.ts
    └── *.ts
Surface Definitions
Surfaces represent canonical financial line items. Each surface maps XBRL concepts to a standardized key. Generated TypeScript statement catalogs are built from the deduped union of core plus unique non-core surfaces, with core definitions winning for shared universal keys.
{
  "surface_key": "revenue",
  "statement": "income",
  "label": "Revenue",
  "category": "surface",
  "order": 10,
  "unit": "currency",
  "rollup_policy": "direct_or_formula",
  "allowed_source_concepts": [
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
    "us-gaap:SalesRevenueNet"
  ],
  "formula_fallback": null
}
Surface Fields
| Field | Type | Description |
|---|---|---|
| `surface_key` | string | Unique identifier (snake_case) |
| `statement` | enum | `income`, `balance`, `cash_flow`, `equity`, `comprehensive_income` |
| `label` | string | Human-readable label |
| `category` | string | Grouping category |
| `order` | number | Display order |
| `unit` | enum | `currency`, `percent`, `ratio`, `shares`, `count` |
| `rollup_policy` | string | How to aggregate: `direct_only`, `direct_or_formula`, `aggregate_children`, `formula_only` |
| `allowed_source_concepts` | string[] | XBRL concepts that map to this surface |
| `formula_fallback` | object | Optional formula when no direct mapping |
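The "deduped union, core wins" rule described under Surface Definitions can be sketched as follows. This is a minimal illustration and not the generator's actual code: the `SurfaceDefinition` interface is a trimmed, hypothetical mirror of the field table, and `buildStatementCatalog` and the sample labels are invented for the example.

```typescript
// Hypothetical, trimmed mirror of the surface fields; not the generated types.
type Statement = "income" | "balance" | "cash_flow" | "equity" | "comprehensive_income";

interface SurfaceDefinition {
  surface_key: string;
  statement: Statement;
  label: string;
  order: number;
}

// Build one statement's catalog from core plus any number of non-core packs.
// Core is visited first and the first definition seen for a key is kept,
// so a core definition is never replaced by a pack definition.
function buildStatementCatalog(
  statement: Statement,
  core: SurfaceDefinition[],
  packs: SurfaceDefinition[][],
): SurfaceDefinition[] {
  const byKey = new Map<string, SurfaceDefinition>();
  for (const list of [core, ...packs]) {
    for (const surface of list) {
      if (surface.statement !== statement) continue;
      if (!byKey.has(surface.surface_key)) byKey.set(surface.surface_key, surface);
    }
  }
  return Array.from(byKey.values()).sort((a, b) => a.order - b.order);
}

// Sample data (invented for illustration).
const coreSurfaces: SurfaceDefinition[] = [
  { surface_key: "revenue", statement: "income", label: "Revenue", order: 10 },
  { surface_key: "total_assets", statement: "balance", label: "Total Assets", order: 10 },
];
const bankSurfaces: SurfaceDefinition[] = [
  { surface_key: "revenue", statement: "income", label: "Total Revenue (bank)", order: 10 },
  { surface_key: "net_interest_income", statement: "income", label: "Net Interest Income", order: 20 },
];

const incomeCatalog = buildStatementCatalog("income", coreSurfaces, [bankSurfaces]);
console.log(incomeCatalog.map((s) => `${s.surface_key}: ${s.label}`));
```

Keeping the first definition seen, with core iterated first, is one simple way to make core win for shared keys; the real generator may resolve conflicts differently.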
Computed Definitions
Computed definitions describe ratios and derived metrics. They are split into two phases:
Phase 1: Filing-Derived (Rust computes)
Ratios computable from filing data alone:
- Margins: gross_margin, operating_margin, ebitda_margin, net_margin, fcf_margin
- Returns: roa, roe, roic, roce
- Financial Health: debt_to_equity, net_debt_to_ebitda, cash_to_debt, current_ratio
- Per-Share: revenue_per_share, fcf_per_share, book_value_per_share
- Growth: revenue_yoy, net_income_yoy, eps_yoy, fcf_yoy, *_cagr
Phase 2: Market-Derived (TypeScript computes)
Ratios requiring external price data:
- Valuation: marketcap, enterprise_value, price_to_earnings, price_to_fcf, price_to_book, ev_to*
{
  "key": "gross_margin",
  "label": "Gross Margin",
  "category": "margins",
  "order": 10,
  "unit": "percent",
  "computation": {
    "type": "ratio",
    "numerator": "gross_profit",
    "denominator": "revenue"
  }
}

{
  "key": "price_to_earnings",
  "label": "Price to Earnings",
  "category": "valuation",
  "order": 270,
  "unit": "ratio",
  "computation": {
    "type": "simple",
    "formula": "price / diluted_eps"
  },
  "requires_external_data": ["price"]
}
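The two-phase split can be read straight off the definitions: an entry that lists `requires_external_data` is market-derived, and everything else is filing-derived. The sketch below assumes exactly that rule; `ComputedDefinition` is a trimmed, hypothetical type and `splitByPhase` is an illustrative helper, not a function from the codebase.

```typescript
// Trimmed, hypothetical view of a computed definition.
interface ComputedDefinition {
  key: string;
  requires_external_data?: string[];
}

interface PhaseSplit {
  filingDerived: ComputedDefinition[];
  marketDerived: ComputedDefinition[];
}

// Assumption: a non-empty requires_external_data list marks a market-derived ratio.
function splitByPhase(defs: ComputedDefinition[]): PhaseSplit {
  const filingDerived: ComputedDefinition[] = [];
  const marketDerived: ComputedDefinition[] = [];
  for (const def of defs) {
    const needsExternal = (def.requires_external_data ?? []).length > 0;
    (needsExternal ? marketDerived : filingDerived).push(def);
  }
  return { filingDerived, marketDerived };
}

const phases = splitByPhase([
  { key: "gross_margin" },
  { key: "price_to_earnings", requires_external_data: ["price"] },
]);
console.log(phases.filingDerived.map((d) => d.key)); // keys for the Rust phase
console.log(phases.marketDerived.map((d) => d.key)); // keys for the TypeScript phase
```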
Computation Types
| Type | Fields | Description |
|---|---|---|
| `ratio` | `numerator`, `denominator` | Simple division |
| `yoy_growth` | `source` | Year-over-year percentage change |
| `cagr` | `source`, `years` | Compound annual growth rate |
| `per_share` | `source`, `shares_key` | Divide by share count |
| `simple` | `formula` | Custom formula expression |
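To make the computation types concrete, here is a minimal evaluator for the four structured types. It is a sketch under stated assumptions, not the sidecar's implementation: periods are assumed to be annual and ordered oldest to newest, results are returned as fractions (0.4 rather than 40), missing or zero denominators yield `null`, and the `simple` type is left out because the formula grammar is not specified here. The `Period` type and `evaluate` function are hypothetical names.

```typescript
type Computation =
  | { type: "ratio"; numerator: string; denominator: string }
  | { type: "yoy_growth"; source: string }
  | { type: "cagr"; source: string; years: number }
  | { type: "per_share"; source: string; shares_key: string };

// One fiscal period's values keyed by surface_key; null or absent means missing.
type Period = { [surfaceKey: string]: number | null | undefined };

function evaluate(comp: Computation, periods: Period[], index: number): number | null {
  // Safe lookup: out-of-range periods and absent keys both read as null.
  const at = (i: number, key: string): number | null => {
    const period = i >= 0 && i < periods.length ? periods[i] : undefined;
    if (!period) return null;
    return period[key] ?? null;
  };
  const divide = (a: number | null, b: number | null): number | null =>
    a === null || b === null || b === 0 ? null : a / b;

  switch (comp.type) {
    case "ratio":
      return divide(at(index, comp.numerator), at(index, comp.denominator));
    case "per_share":
      return divide(at(index, comp.source), at(index, comp.shares_key));
    case "yoy_growth": {
      const current = at(index, comp.source);
      const previous = at(index - 1, comp.source);
      if (current === null || previous === null || previous === 0) return null;
      return (current - previous) / Math.abs(previous);
    }
    case "cagr": {
      const end = at(index, comp.source);
      const start = at(index - comp.years, comp.source);
      // CAGR is only defined here for positive start and end values.
      if (end === null || start === null || start <= 0 || end <= 0) return null;
      return Math.pow(end / start, 1 / comp.years) - 1;
    }
  }
}

// Sample data (invented): three annual periods, oldest first.
const history: Period[] = [
  { revenue: 100, gross_profit: 40, diluted_shares: 10 },
  { revenue: 110, gross_profit: 44, diluted_shares: 10 },
  { revenue: 121, gross_profit: 50, diluted_shares: 10 },
];

const grossMargin = evaluate({ type: "ratio", numerator: "gross_profit", denominator: "revenue" }, history, 0);
const revenueYoy = evaluate({ type: "yoy_growth", source: "revenue" }, history, 1);
const revenueCagr = evaluate({ type: "cagr", source: "revenue", years: 2 }, history, 2);
const revenuePerShare = evaluate({ type: "per_share", source: "revenue", shares_key: "diluted_shares" }, history, 2);
console.log({ grossMargin, revenueYoy, revenueCagr, revenuePerShare });
```

Returning `null` instead of `NaN` or `Infinity` keeps missing data distinguishable from a computed zero, which is a design choice of this sketch rather than something the taxonomy mandates.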
Pack Inheritance
Non-core packs inherit balance and cash_flow surfaces from core:
// taxonomy_loader.rs
if !matches!(pack, FiscalPack::Core) {
    // Inherit balance + cash_flow from core
    // Override with pack-specific definitions
}
This ensures consistency across packs while allowing sector-specific income statements.
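The inheritance rule can be sketched outside Rust as a two-step merge: seed the result with core's balance and cash_flow surfaces, then let the pack's own definitions override or extend them. This is an illustration of the rule as stated above, not a port of `taxonomy_loader.rs`; `PackSurface` and `resolvePackSurfaces` are hypothetical names and the sample labels are invented.

```typescript
type PackStatement = "income" | "balance" | "cash_flow";

interface PackSurface {
  surface_key: string;
  statement: PackStatement;
  label: string;
}

function resolvePackSurfaces(core: PackSurface[], pack: PackSurface[]): PackSurface[] {
  const resolved = new Map<string, PackSurface>();
  const id = (s: PackSurface): string => `${s.statement}:${s.surface_key}`;
  // Step 1: inherit only balance and cash_flow from core.
  for (const s of core) {
    if (s.statement !== "income") resolved.set(id(s), s);
  }
  // Step 2: pack definitions override inherited ones and supply the income statement.
  for (const s of pack) resolved.set(id(s), s);
  return Array.from(resolved.values());
}

const coreDefs: PackSurface[] = [
  { surface_key: "revenue", statement: "income", label: "Revenue" },
  { surface_key: "total_assets", statement: "balance", label: "Total Assets" },
  { surface_key: "operating_cash_flow", statement: "cash_flow", label: "Operating Cash Flow" },
];
const bankDefs: PackSurface[] = [
  { surface_key: "net_interest_income", statement: "income", label: "Net Interest Income" },
  { surface_key: "total_assets", statement: "balance", label: "Total Assets (Bank)" },
];

const bankResolved = resolvePackSurfaces(coreDefs, bankDefs);
console.log(bankResolved.map((s) => `${s.statement}:${s.surface_key} -> ${s.label}`));
```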
Auto-classification remains conservative. Pack selection uses concept and role scoring, then falls back to core when the top match is weak or ambiguous.
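The conservative fallback can be illustrated with a small decision function. Everything here is an assumption made for the example: the scores are taken as given (how concept and role matches are weighted is not described in this document), and `MIN_SCORE` and `MIN_MARGIN` are hypothetical thresholds, not values from `pack_selector.rs`.

```typescript
interface PackScore {
  pack: string;
  score: number;
}

// Hypothetical thresholds, chosen only to make the example concrete.
const MIN_SCORE = 3; // below this, the top match counts as weak
const MIN_MARGIN = 1; // lead over the runner-up needed to avoid ambiguity

function selectPack(scores: PackScore[]): string {
  const ranked = scores.slice().sort((a, b) => b.score - a.score);
  const top = ranked[0];
  const runnerUp = ranked[1];
  if (!top || top.score < MIN_SCORE) return "core"; // weak or no match
  if (runnerUp && top.score - runnerUp.score < MIN_MARGIN) return "core"; // ambiguous
  return top.pack;
}

console.log(selectPack([{ pack: "bank_lender", score: 8 }, { pack: "insurance", score: 2 }]));
console.log(selectPack([{ pack: "bank_lender", score: 2 }]));
console.log(selectPack([{ pack: "bank_lender", score: 5 }, { pack: "mortgage_banking", score: 4.5 }]));
```

The first call selects `bank_lender`; the other two fall back to `core`, one because the top score is weak and one because the top two candidates are too close.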
Build Pipeline
# Generate TypeScript from Rust JSON
bun run generate
# Build Rust sidecar (includes taxonomy)
bun run build:sidecar
# Full build (generates + compiles)
bun run build
package.json Scripts
| Script | Description |
|---|---|
| `generate` | Run taxonomy generator |
| `build:sidecar` | Build Rust binary |
| `build` | Generate + Next.js build |
| `lint` | Generate + TypeScript check |
Validation
The generator validates:
- No duplicate surface keys within the same statement
- All ratio numerators/denominators reference existing surfaces
- Required fields present on all definitions
- Valid statement/unit/category values
Run validation:
bun run generate # Validates during generation
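As an illustration of the checks listed above, the sketch below validates duplicate keys per statement, ratio references, required keys, and statement/unit values. It is not the code in `scripts/generate-taxonomy.ts`: the types are trimmed and hypothetical, `validateTaxonomy` is an invented name, and the allowed values are copied from the Surface Fields table.

```typescript
interface SurfaceInput {
  surface_key: string;
  statement: string;
  unit: string;
}

interface RatioInput {
  key: string;
  computation: { type: "ratio"; numerator: string; denominator: string };
}

const VALID_STATEMENTS = ["income", "balance", "cash_flow", "equity", "comprehensive_income"];
const VALID_UNITS = ["currency", "percent", "ratio", "shares", "count"];

// Collect every problem instead of stopping at the first one.
function validateTaxonomy(surfaces: SurfaceInput[], ratios: RatioInput[]): string[] {
  const errors: string[] = [];
  const seenPerStatement = new Set<string>();
  const knownKeys = new Set<string>();

  for (const s of surfaces) {
    if (!s.surface_key) {
      errors.push("surface is missing surface_key");
      continue;
    }
    if (VALID_STATEMENTS.indexOf(s.statement) === -1) {
      errors.push(`${s.surface_key}: invalid statement "${s.statement}"`);
    }
    if (VALID_UNITS.indexOf(s.unit) === -1) {
      errors.push(`${s.surface_key}: invalid unit "${s.unit}"`);
    }
    const id = `${s.statement}:${s.surface_key}`;
    if (seenPerStatement.has(id)) {
      errors.push(`duplicate surface key "${s.surface_key}" in statement "${s.statement}"`);
    }
    seenPerStatement.add(id);
    knownKeys.add(s.surface_key);
  }

  for (const r of ratios) {
    for (const ref of [r.computation.numerator, r.computation.denominator]) {
      if (!knownKeys.has(ref)) errors.push(`${r.key}: unknown surface "${ref}"`);
    }
  }
  return errors;
}

const problems = validateTaxonomy(
  [
    { surface_key: "revenue", statement: "income", unit: "currency" },
    { surface_key: "revenue", statement: "income", unit: "currency" },
    { surface_key: "total_assets", statement: "balance", unit: "dollars" },
  ],
  [{ key: "gross_margin", computation: { type: "ratio", numerator: "gross_profit", denominator: "revenue" } }],
);
console.log(problems);
```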
Extending the Taxonomy
Adding a New Surface
- Edit `rust/taxonomy/fiscal/v1/core.surface.json`
- Add surface definition with unique key
- Run `bun run generate` to regenerate TypeScript
- Run `bun run build:sidecar` to rebuild Rust
Adding a New Ratio
- Edit `rust/taxonomy/fiscal/v1/core.computed.json`
- Add computed definition with computation spec
- If market-derived, add `requires_external_data`
- Run `bun run generate`
Adding a New Sector Pack
- Create `rust/taxonomy/fiscal/v1/<pack>.surface.json`
- Create `rust/taxonomy/fiscal/v1/<pack>.income-bridge.json`
- Create `rust/taxonomy/fiscal/v1/kpis/<pack>.kpis.json` (if needed)
- Add pack to `PACK_ORDER` in `scripts/generate-taxonomy.ts`
- Add pack to `FiscalPack` enum in `rust/fiscal-xbrl-core/src/pack_selector.rs`
- Run `bun run generate && bun run build:sidecar`
Design Decisions
Why Rust JSON as Source of Truth?
- Single definition: XBRL mapping and TypeScript use the same definitions
- Type safety: Rust validates JSON at compile time
- Performance: No runtime JSON parsing in TypeScript
- Consistency: Impossible for Rust and TypeScript to drift
Why Gitignore Generated Files?
- Single source of truth: Forces changes through Rust JSON
- No merge conflicts: Generated code never conflicts
- Smaller repo: No large generated files in history
- CI validation: CI regenerates and validates
Why Two-Phase Ratio Computation?
- Filing-derived ratios: Can be computed at parse time by Rust
- Market-derived ratios: Require real-time price data
- Separation of concerns: Rust handles XBRL, TypeScript handles market data
- Same definitions: Both phases use the same computation specs