Neon-Desk/docs/architecture/taxonomy.md
francy51 17de3dd72d Add history window controls and expand taxonomy pack support
- add 3Y/5Y/10Y financial history filtering and reorganize normalization details UI
- add new fiscal taxonomy surface/income bridge/KPI packs and update Rust taxonomy loading
- auto-detect Homebrew SQLite for native `sqlite-vec` in local dev/e2e with docs and env guidance
2026-03-18 23:40:28 -04:00

Taxonomy Architecture

Overview

The taxonomy system defines all financial surfaces, computed ratios, and KPIs used throughout the application. The JSON files under rust/taxonomy/ (the "Rust JSON") are the single source of truth for all financial definitions.

Data Flow

┌─────────────────────────────────────────────────────────────────┐
│                     rust/taxonomy/fiscal/v1/                     │
│                                                                  │
│  core.surface.json        - Income/Balance/Cash Flow surfaces    │
│  core.computed.json       - Ratio definitions                    │
│  core.kpis.json           - Core KPIs (mostly empty)             │
│  core.income-bridge.json  - Income statement mapping rules       │
│                                                                  │
│  *.surface.json            - Core plus industry-specific packs   │
│  *.income-bridge.json      - Pack-specific universal income maps │
│  kpis/*.kpis.json          - Pack KPI bundles                    │
└──────────────────────────┬──────────────────────────────────────┘
                           │
         ┌─────────────────┼─────────────────┐
         │                 │                 │
         ▼                 ▼                 ▼
┌─────────────┐   ┌────────────────┐   ┌────────────────┐
│ Rust Sidecar│   │ TS Generator   │   │ TypeScript     │
│ fiscal-xbrl │   │ scripts/       │   │ Runtime        │
│             │   │ generate-      │   │                │
│ Parses XBRL │   │ taxonomy.ts    │   │ UI/API         │
│ Maps to     │   │                │   │                │
│ surfaces    │   │ Generates TS   │   │ Uses generated │
│ Computes    │   │ types & consts │   │ definitions    │
│ ratios      │   │                │   │                │
└──────┬──────┘   └───────┬────────┘   └──────┬─────────┘
       │                  │                   │
       │                  ▼                   │
       │          ┌──────────────┐            │
       │          │ lib/generated│            │
       │          │ (gitignored) │            │
       │          │              │            │
       │          │ surfaces/    │            │
       │          │ computed/    │            │
       │          │ kpis/        │            │
       │          └──────┬───────┘            │
       │                 │                    │
       ▼                 ▼                    ▼
┌─────────────────────────────────────────────────────────────┐
│                    lib/financial-metrics.ts                  │
│                                                             │
│  Thin wrapper that:                                         │
│  - Re-exports generated types                               │
│  - Provides UI-specific types (GraphableFinancialSurface)   │
│  - Transforms surfaces to metric definitions                │
└─────────────────────────────────────────────────────────────┘

File Structure

rust/taxonomy/fiscal/v1/
├── core.surface.json          # Core financial surfaces
├── core.computed.json         # Ratio definitions (32 ratios)
├── core.income-bridge.json    # Income statement XBRL mapping
├── core.kpis.json             # Core KPIs (mostly empty)
├── universal_income.surface.json
│
├── bank_lender.surface.json
├── insurance.surface.json
├── reit_real_estate.surface.json
├── broker_asset_manager.surface.json
├── agriculture.surface.json
├── contractors_construction.surface.json
├── contractors_federal_government.surface.json
├── development_stage.surface.json
├── entertainment_*.surface.json
├── extractive_mining.surface.json
├── mortgage_banking.surface.json
├── title_plant.surface.json
├── franchisors.surface.json
├── not_for_profit.surface.json
├── plan_defined_*.surface.json
├── plan_health_welfare.surface.json
├── real_estate_*.surface.json
├── software.surface.json
├── steamship.surface.json
└── kpis/
    └── *.kpis.json

lib/generated/                  # Auto-generated, gitignored
├── index.ts
├── types.ts
├── surfaces/
│   ├── index.ts
│   ├── income.ts
│   ├── balance.ts
│   └── cash_flow.ts
├── computed/
│   ├── index.ts
│   └── core.ts
└── kpis/
    ├── index.ts
    └── *.ts

Surface Definitions

Surfaces represent canonical financial line items. Each surface maps XBRL concepts to a standardized key. Generated TypeScript statement catalogs are built from the deduped union of core plus unique non-core surfaces, with core definitions winning for shared universal keys.

{
  "surface_key": "revenue",
  "statement": "income",
  "label": "Revenue",
  "category": "surface",
  "order": 10,
  "unit": "currency",
  "rollup_policy": "direct_or_formula",
  "allowed_source_concepts": [
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
    "us-gaap:SalesRevenueNet"
  ],
  "formula_fallback": null
}
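
The deduplicated union described above can be sketched as follows. This is a minimal illustration, not the generator's actual code: `SurfaceLite` and `mergeSurfaces` are hypothetical names, the bank surfaces are made-up sample data, and the sketch assumes a surface is identified by its statement plus `surface_key`.

```typescript
// Hypothetical minimal shape; real definitions carry more fields.
interface SurfaceLite {
  surface_key: string;
  statement: string;
  label: string;
}

// Merge core surfaces with surfaces from non-core packs.
// Assumption: identity is (statement, surface_key). Core is visited first
// and later duplicates are skipped, so core definitions win.
function mergeSurfaces(core: SurfaceLite[], packs: SurfaceLite[][]): SurfaceLite[] {
  const seen: Record<string, boolean> = {};
  const merged: SurfaceLite[] = [];
  for (const group of [core].concat(packs)) {
    for (const surface of group) {
      const id = `${surface.statement}:${surface.surface_key}`;
      if (seen[id]) continue;
      seen[id] = true;
      merged.push(surface);
    }
  }
  return merged;
}

// Illustrative data only.
const coreSurfaces: SurfaceLite[] = [
  { surface_key: "revenue", statement: "income", label: "Revenue" },
];
const bankSurfaces: SurfaceLite[] = [
  { surface_key: "revenue", statement: "income", label: "Total Revenue (bank)" },
  { surface_key: "net_interest_income", statement: "income", label: "Net Interest Income" },
];
const catalog = mergeSurfaces(coreSurfaces, [bankSurfaces]);
```

Visiting core first keeps the rule "core wins" in one place: pack order then only matters among non-core packs.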

Surface Fields

| Field | Type | Description |
| --- | --- | --- |
| surface_key | string | Unique identifier (snake_case) |
| statement | enum | income, balance, cash_flow, equity, comprehensive_income |
| label | string | Human-readable label |
| category | string | Grouping category |
| order | number | Display order |
| unit | enum | currency, percent, ratio, shares, count |
| rollup_policy | string | How to aggregate: direct_only, direct_or_formula, aggregate_children, formula_only |
| allowed_source_concepts | string[] | XBRL concepts that map to this surface |
| formula_fallback | object | Optional formula when no direct mapping |
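
The fields above translate naturally into a TypeScript type. The sketch below is an assumption about how such a type could look, not the contents of the generated lib/generated/types.ts; `SurfaceDefinition`, `FormulaFallback`, and `isStatement` are illustrative names, and the shape of `formula_fallback` is left opaque because this document does not specify it.

```typescript
type Statement = "income" | "balance" | "cash_flow" | "equity" | "comprehensive_income";
type Unit = "currency" | "percent" | "ratio" | "shares" | "count";
type RollupPolicy = "direct_only" | "direct_or_formula" | "aggregate_children" | "formula_only";

// Placeholder: the structure of formula_fallback is not documented here.
type FormulaFallback = Record<string, unknown>;

interface SurfaceDefinition {
  surface_key: string;
  statement: Statement;
  label: string;
  category: string;
  order: number;
  unit: Unit;
  rollup_policy: RollupPolicy;
  allowed_source_concepts: string[];
  formula_fallback: FormulaFallback | null;
}

const STATEMENTS: Statement[] = ["income", "balance", "cash_flow", "equity", "comprehensive_income"];

// Runtime guard for values read from untyped JSON.
function isStatement(value: string): value is Statement {
  return (STATEMENTS as string[]).indexOf(value) !== -1;
}

// The revenue example from this section, typed.
const revenueSurface: SurfaceDefinition = {
  surface_key: "revenue",
  statement: "income",
  label: "Revenue",
  category: "surface",
  order: 10,
  unit: "currency",
  rollup_policy: "direct_or_formula",
  allowed_source_concepts: [
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
    "us-gaap:SalesRevenueNet",
  ],
  formula_fallback: null,
};
```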

Computed Definitions

Computed definitions describe ratios and derived metrics. They are split into two phases:

Phase 1: Filing-Derived (Rust computes)

Ratios computable from filing data alone:

  • Margins: gross_margin, operating_margin, ebitda_margin, net_margin, fcf_margin
  • Returns: roa, roe, roic, roce
  • Financial Health: debt_to_equity, net_debt_to_ebitda, cash_to_debt, current_ratio
  • Per-Share: revenue_per_share, fcf_per_share, book_value_per_share
  • Growth: revenue_yoy, net_income_yoy, eps_yoy, fcf_yoy, *_cagr

Phase 2: Market-Derived (TypeScript computes)

Ratios requiring external price data:

  • Valuation: marketcap, enterprise_value, price_to_earnings, price_to_fcf, price_to_book, ev_to*

{
  "key": "gross_margin",
  "label": "Gross Margin",
  "category": "margins",
  "order": 10,
  "unit": "percent",
  "computation": {
    "type": "ratio",
    "numerator": "gross_profit",
    "denominator": "revenue"
  }
}

{
  "key": "price_to_earnings",
  "label": "Price to Earnings",
  "category": "valuation",
  "order": 270,
  "unit": "ratio",
  "computation": {
    "type": "simple",
    "formula": "price / diluted_eps"
  },
  "requires_external_data": ["price"]
}
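
One way to route definitions between the two phases is to key off `requires_external_data`, as in the `price_to_earnings` example above. The sketch below assumes that an absent or empty array means filing-derived; `ComputedLite` and `splitByPhase` are hypothetical names, not part of the codebase.

```typescript
// Hypothetical minimal shape of a computed definition.
interface ComputedLite {
  key: string;
  requires_external_data?: string[];
}

// Partition definitions into filing-derived (phase 1) and market-derived (phase 2).
// Assumption: a definition is market-derived only if it lists external inputs.
function splitByPhase(defs: ComputedLite[]): { filing: ComputedLite[]; market: ComputedLite[] } {
  const filing: ComputedLite[] = [];
  const market: ComputedLite[] = [];
  for (const def of defs) {
    const needsExternal = (def.requires_external_data ?? []).length > 0;
    (needsExternal ? market : filing).push(def);
  }
  return { filing, market };
}

const phases = splitByPhase([
  { key: "gross_margin" },
  { key: "price_to_earnings", requires_external_data: ["price"] },
]);
```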

Computation Types

| Type | Fields | Description |
| --- | --- | --- |
| ratio | numerator, denominator | Simple division |
| yoy_growth | source | Year-over-year percentage change |
| cagr | source, years | Compound annual growth rate |
| per_share | source, shares_key | Divide by share count |
| simple | formula | Custom formula expression |
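
To make the computation types concrete, here is a hedged sketch of an evaluator. It is not the Rust or TypeScript implementation used by the application. It assumes each surface has a list of annual values ordered oldest to newest, returns fractions rather than percentages, divides year-over-year changes by the absolute prior value, and does not parse `simple` formulas. `Series`, `latest`, `divide`, and `evaluate` are hypothetical names.

```typescript
type Computation =
  | { type: "ratio"; numerator: string; denominator: string }
  | { type: "yoy_growth"; source: string }
  | { type: "cagr"; source: string; years: number }
  | { type: "per_share"; source: string; shares_key: string }
  | { type: "simple"; formula: string };

// Annual values per surface key, oldest first (assumption of this sketch).
type Series = Record<string, number[]>;

// Value `back` periods before the most recent one, or null if unavailable.
function latest(series: Series, key: string, back = 0): number | null {
  const values = series[key];
  if (!values || values.length <= back) return null;
  return values[values.length - 1 - back];
}

// Division that yields null for missing inputs or a zero denominator.
function divide(a: number | null, b: number | null): number | null {
  return a === null || b === null || b === 0 ? null : a / b;
}

function evaluate(c: Computation, s: Series): number | null {
  switch (c.type) {
    case "ratio":
      return divide(latest(s, c.numerator), latest(s, c.denominator));
    case "per_share":
      return divide(latest(s, c.source), latest(s, c.shares_key));
    case "yoy_growth": {
      const cur = latest(s, c.source);
      const prev = latest(s, c.source, 1);
      if (cur === null || prev === null || prev === 0) return null;
      // Sketch choice: absolute prior value keeps the sign meaningful.
      return (cur - prev) / Math.abs(prev);
    }
    case "cagr": {
      const end = latest(s, c.source);
      const start = latest(s, c.source, c.years);
      if (end === null || start === null || start <= 0 || end < 0 || c.years <= 0) return null;
      return Math.pow(end / start, 1 / c.years) - 1;
    }
    case "simple":
      return null; // formula parsing is out of scope for this sketch
  }
}

const sampleSeries: Series = { gross_profit: [40, 44, 48.4], revenue: [100, 110, 121] };
const grossMargin = evaluate(
  { type: "ratio", numerator: "gross_profit", denominator: "revenue" },
  sampleSeries,
);
```

Returning `null` instead of throwing lets a caller render a missing ratio as a blank cell when a filing lacks the required surfaces.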

Pack Inheritance

Non-core packs inherit balance and cash_flow surfaces from core:

// taxonomy_loader.rs
if !matches!(pack, FiscalPack::Core) {
    // Inherit balance + cash_flow from core
    // Override with pack-specific definitions
}

This ensures consistency across packs while allowing sector-specific income statements.
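
The inheritance rule can be illustrated outside Rust as well. The following TypeScript sketch is not the code in taxonomy_loader.rs. It assumes that a pack-specific definition replaces a core definition with the same `surface_key` on the same statement; `PackSurface` and `resolvePackSurfaces` are hypothetical names, and the sample surfaces are made up.

```typescript
interface PackSurface {
  surface_key: string;
  statement: string;
  label: string;
}

// Build the effective surface list for a non-core pack.
function resolvePackSurfaces(core: PackSurface[], pack: PackSurface[]): PackSurface[] {
  // Inherit only balance and cash_flow surfaces from core.
  const inherited = core.filter(
    (s) => s.statement === "balance" || s.statement === "cash_flow",
  );
  // Record which (statement, key) pairs the pack redefines.
  const overridden: Record<string, boolean> = {};
  for (const s of pack) overridden[`${s.statement}:${s.surface_key}`] = true;
  // Drop inherited entries the pack overrides, then append the pack's own.
  const kept = inherited.filter((s) => !overridden[`${s.statement}:${s.surface_key}`]);
  return kept.concat(pack);
}

// Illustrative data only.
const coreDefs: PackSurface[] = [
  { surface_key: "revenue", statement: "income", label: "Revenue" },
  { surface_key: "total_assets", statement: "balance", label: "Total Assets" },
  { surface_key: "operating_cash_flow", statement: "cash_flow", label: "Operating Cash Flow" },
];
const bankDefs: PackSurface[] = [
  { surface_key: "net_interest_income", statement: "income", label: "Net Interest Income" },
  { surface_key: "total_assets", statement: "balance", label: "Total Assets (bank)" },
];
const effective = resolvePackSurfaces(coreDefs, bankDefs);
```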

Auto-classification remains conservative. Pack selection uses concept and role scoring, then falls back to core when the top match is weak or ambiguous.
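
A conservative selection rule of this kind could look like the sketch below. The thresholds `MIN_SCORE` and `MIN_MARGIN`, the `PackScore` shape, and `selectPack` are all hypothetical; this document does not state how scores are computed or which cutoffs the Rust selector uses.

```typescript
interface PackScore {
  pack: string;
  score: number;
}

const MIN_SCORE = 0.6; // hypothetical: below this, the top match counts as weak
const MIN_MARGIN = 0.15; // hypothetical: a smaller lead over the runner-up counts as ambiguous

// Pick the best-scoring pack, falling back to "core" when the match is weak or ambiguous.
function selectPack(scores: PackScore[]): string {
  const ranked = scores.slice().sort((a, b) => b.score - a.score);
  if (ranked.length === 0) return "core";
  const top = ranked[0];
  const runnerUp = ranked.length > 1 ? ranked[1].score : 0;
  if (top.score < MIN_SCORE) return "core";
  if (top.score - runnerUp < MIN_MARGIN) return "core";
  return top.pack;
}
```

For example, scores of 0.9 for bank_lender and 0.3 for insurance would select bank_lender under these thresholds, while 0.7 against 0.65 would fall back to core because the lead is too small.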

Build Pipeline

# Generate TypeScript from Rust JSON
bun run generate

# Build Rust sidecar (includes taxonomy)
bun run build:sidecar

# Full build (generates + compiles)
bun run build

package.json Scripts

| Script | Description |
| --- | --- |
| generate | Run taxonomy generator |
| build:sidecar | Build Rust binary |
| build | Generate + Next.js build |
| lint | Generate + TypeScript check |

Validation

The generator validates:

  1. No duplicate surface keys within the same statement
  2. All ratio numerators/denominators reference existing surfaces
  3. Required fields present on all definitions
  4. Valid statement/unit/category values

Run validation:

bun run generate  # Validates during generation
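
The first two checks can be sketched as follows. This is an illustration under stated assumptions, not the code in scripts/generate-taxonomy.ts: `VSurface`, `VRatio`, and `validateTaxonomy` are hypothetical names, and the function collects error messages instead of throwing.

```typescript
interface VSurface {
  surface_key: string;
  statement: string;
}

interface VRatio {
  key: string;
  computation: { type: string; numerator?: string; denominator?: string };
}

// Returns a list of human-readable problems; an empty list means both checks passed.
function validateTaxonomy(surfaces: VSurface[], ratios: VRatio[]): string[] {
  const errors: string[] = [];
  const seenIds: string[] = [];
  const knownKeys: string[] = [];

  // Check 1: no duplicate surface keys within the same statement.
  for (const s of surfaces) {
    const id = `${s.statement}:${s.surface_key}`;
    if (seenIds.indexOf(id) !== -1) {
      errors.push(`duplicate surface key "${s.surface_key}" in statement "${s.statement}"`);
    }
    seenIds.push(id);
    knownKeys.push(s.surface_key);
  }

  // Check 2: ratio numerators and denominators reference existing surfaces.
  for (const r of ratios) {
    if (r.computation.type !== "ratio") continue;
    for (const ref of [r.computation.numerator, r.computation.denominator]) {
      if (!ref || knownKeys.indexOf(ref) === -1) {
        errors.push(`ratio "${r.key}" references unknown surface "${ref}"`);
      }
    }
  }
  return errors;
}

const problems = validateTaxonomy(
  [
    { surface_key: "revenue", statement: "income" },
    { surface_key: "revenue", statement: "income" },
    { surface_key: "gross_profit", statement: "income" },
  ],
  [
    { key: "gross_margin", computation: { type: "ratio", numerator: "gross_profit", denominator: "revenue" } },
    { key: "broken_ratio", computation: { type: "ratio", numerator: "missing_surface", denominator: "revenue" } },
  ],
);
```

Collecting every problem before failing lets a single generator run report all invalid definitions at once.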

Extending the Taxonomy

Adding a New Surface

  1. Edit rust/taxonomy/fiscal/v1/core.surface.json
  2. Add surface definition with unique key
  3. Run bun run generate to regenerate TypeScript
  4. Run bun run build:sidecar to rebuild Rust

Adding a New Ratio

  1. Edit rust/taxonomy/fiscal/v1/core.computed.json
  2. Add computed definition with computation spec
  3. If market-derived, add requires_external_data
  4. Run bun run generate

Adding a New Sector Pack

  1. Create rust/taxonomy/fiscal/v1/<pack>.surface.json
  2. Create rust/taxonomy/fiscal/v1/<pack>.income-bridge.json
  3. Create rust/taxonomy/fiscal/v1/kpis/<pack>.kpis.json (if needed)
  4. Add pack to PACK_ORDER in scripts/generate-taxonomy.ts
  5. Add pack to FiscalPack enum in rust/fiscal-xbrl-core/src/pack_selector.rs
  6. Run bun run generate && bun run build:sidecar

Design Decisions

Why Rust JSON as Source of Truth?

  1. Single definition: XBRL mapping and TypeScript use the same definitions
  2. Type safety: Rust validates JSON at compile time
  3. Performance: No runtime JSON parsing in TypeScript
  4. Consistency: Impossible for Rust and TypeScript to drift

Why Gitignore Generated Files?

  1. Single source of truth: Forces changes through Rust JSON
  2. No merge conflicts: Generated code never conflicts
  3. Smaller repo: No large generated files in history
  4. CI validation: CI regenerates and validates

Why Two-Phase Ratio Computation?

  1. Filing-derived ratios: Can be computed at parse time by Rust
  2. Market-derived ratios: Require real-time price data
  3. Separation of concerns: Rust handles XBRL, TypeScript handles market data
  4. Same definitions: Both phases use the same computation specs