# Taxonomy Architecture

## Overview

The taxonomy system defines all financial surfaces, computed ratios, and KPIs used throughout the application. The Rust JSON files in `rust/taxonomy/` serve as the **single source of truth** for all financial definitions.

## Data Flow

```
┌─────────────────────────────────────────────────────────────────┐
│                    rust/taxonomy/fiscal/v1/                     │
│                                                                 │
│  core.surface.json       - Income/Balance/Cash Flow/Equity      │
│  core.computed.json      - Ratio definitions                    │
│  core.kpis.json          - Sector-specific KPIs                 │
│  core.income-bridge.json - Income statement mapping rules       │
│                                                                 │
│  *.surface.json          - Core plus industry-specific packs    │
│  *.income-bridge.json    - Pack-specific universal income maps  │
│  kpis/*.kpis.json        - Pack KPI bundles                     │
└────────────────────────────────┬────────────────────────────────┘
                                 │
          ┌──────────────────────┼──────────────────────┐
          │                      │                      │
          ▼                      ▼                      ▼
┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐
│ Rust Sidecar      │  │ TS Generator      │  │ TypeScript        │
│ fiscal-xbrl       │  │ scripts/          │  │ Runtime           │
│                   │  │ generate-         │  │                   │
│ Parses XBRL       │  │ taxonomy.ts       │  │ UI/API            │
│ Maps to           │  │                   │  │                   │
│ surfaces          │  │ Generates TS      │  │ Uses generated    │
│ Computes          │  │ types & consts    │  │ definitions       │
│ ratios            │  │                   │  │                   │
└─────────┬─────────┘  └─────────┬─────────┘  └─────────┬─────────┘
          │                      │                      │
          │                      ▼                      │
          │            ┌───────────────────┐            │
          │            │ lib/generated     │            │
          │            │ (gitignored)      │            │
          │            │                   │            │
          │            │ surfaces/         │            │
          │            │ computed/         │            │
          │            │ kpis/             │            │
          │            └─────────┬─────────┘            │
          │                      │                      │
          ▼                      ▼                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                    lib/financial-metrics.ts                     │
│                                                                 │
│  Thin wrapper that:                                             │
│  - Re-exports generated types                                   │
│  - Provides UI-specific types (GraphableFinancialSurface)       │
│  - Transforms surfaces to metric definitions                    │
└─────────────────────────────────────────────────────────────────┘
```

## File Structure

```
rust/taxonomy/fiscal/v1/
├── core.surface.json              # Core financial surfaces
├── core.computed.json             # Ratio definitions (32 ratios)
├── core.income-bridge.json        # Income statement XBRL mapping
├── core.kpis.json                 # Core KPIs (mostly empty)
├── universal_income.surface.json
│
├── bank_lender.surface.json
├── insurance.surface.json
├── reit_real_estate.surface.json
├── broker_asset_manager.surface.json
├── agriculture.surface.json
├── contractors_construction.surface.json
├── contractors_federal_government.surface.json
├── development_stage.surface.json
├── entertainment_*.surface.json
├── extractive_mining.surface.json
├── mortgage_banking.surface.json
├── title_plant.surface.json
├── franchisors.surface.json
├── not_for_profit.surface.json
├── plan_defined_*.surface.json
├── plan_health_welfare.surface.json
├── real_estate_*.surface.json
├── software.surface.json
├── steamship.surface.json
└── kpis/
    └── *.kpis.json

lib/generated/                     # Auto-generated, gitignored
├── index.ts
├── types.ts
├── surfaces/
│   ├── index.ts
│   ├── income.ts
│   ├── balance.ts
│   └── cash_flow.ts
├── computed/
│   ├── index.ts
│   └── core.ts
└── kpis/
    ├── index.ts
    └── *.ts
```

## Surface Definitions

Surfaces represent canonical financial line items. Each surface maps XBRL concepts to a standardized key. Generated TypeScript statement catalogs are built from the deduped union of core plus unique non-core surfaces, with core definitions winning for shared universal keys.
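The deduplication rule above can be sketched in TypeScript as follows. This is a minimal illustration, not the code in `scripts/generate-taxonomy.ts`. The `SurfaceDefinition` shape and the `buildStatementCatalogs` name are hypothetical. The sketch also assumes two surfaces collide only when both `statement` and `surface_key` match, and that the first pack in the list wins a collision between two non-core packs.

```typescript
// Hypothetical minimal shape; the generated types carry more fields.
interface SurfaceDefinition {
  surface_key: string;
  statement: string;
  label: string;
  order: number;
}

// Sketch only: builds one catalog per statement from core plus pack surfaces.
function buildStatementCatalogs(
  core: SurfaceDefinition[],
  packs: SurfaceDefinition[][],
): Map<string, SurfaceDefinition[]> {
  const byId = new Map<string, SurfaceDefinition>();
  // Core is visited first, so core wins any (statement, surface_key) collision;
  // among packs, the earliest pack in `packs` wins (an assumption).
  for (const surface of core.concat(...packs)) {
    const id = `${surface.statement}:${surface.surface_key}`;
    if (!byId.has(id)) byId.set(id, surface);
  }

  // Group the deduplicated surfaces by statement and sort each catalog by display order.
  const catalogs = new Map<string, SurfaceDefinition[]>();
  byId.forEach((surface) => {
    const list = catalogs.get(surface.statement) ?? [];
    list.push(surface);
    catalogs.set(surface.statement, list);
  });
  catalogs.forEach((list) => list.sort((a, b) => a.order - b.order));
  return catalogs;
}
```

Because core entries are visited first, a pack that redefines a shared key such as `revenue` cannot change that surface's label or order in the generated catalog. Surfaces that exist only in a pack are still added.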
An example surface definition:

```json
{
  "surface_key": "revenue",
  "statement": "income",
  "label": "Revenue",
  "category": "surface",
  "order": 10,
  "unit": "currency",
  "rollup_policy": "direct_or_formula",
  "allowed_source_concepts": [
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
    "us-gaap:SalesRevenueNet"
  ],
  "formula_fallback": null
}
```

### Surface Fields

| Field                     | Type     | Description                                                                                |
| ------------------------- | -------- | ------------------------------------------------------------------------------------------ |
| `surface_key`             | string   | Unique identifier (snake_case)                                                             |
| `statement`               | enum     | `income`, `balance`, `cash_flow`, `equity`, `comprehensive_income`, `disclosure`           |
| `label`                   | string   | Human-readable label                                                                       |
| `category`                | string   | Grouping category                                                                          |
| `order`                   | number   | Display order                                                                              |
| `unit`                    | enum     | `currency`, `percent`, `ratio`, `shares`, `count`                                          |
| `rollup_policy`           | string   | How to aggregate: `direct_only`, `direct_or_formula`, `aggregate_children`, `formula_only` |
| `allowed_source_concepts` | string[] | XBRL concepts that map to this surface                                                     |
| `formula_fallback`        | object   | Optional formula used when no direct concept mapping exists (`null` when absent)           |

## Computed Definitions

Computed definitions describe ratios and derived metrics.
They are split into two phases:

### Phase 1: Filing-Derived (Rust computes)

Ratios computable from filing data alone:

- **Margins**: `gross_margin`, `operating_margin`, `ebitda_margin`, `net_margin`, `fcf_margin`
- **Returns**: `roa`, `roe`, `roic`, `roce`
- **Financial Health**: `debt_to_equity`, `net_debt_to_ebitda`, `cash_to_debt`, `current_ratio`
- **Per-Share**: `revenue_per_share`, `fcf_per_share`, `book_value_per_share`
- **Growth**: `revenue_yoy`, `net_income_yoy`, `eps_yoy`, `fcf_yoy`, `*_cagr`

### Phase 2: Market-Derived (TypeScript computes)

Ratios requiring external price data:

- **Valuation**: `market_cap`, `enterprise_value`, `price_to_earnings`, `price_to_fcf`, `price_to_book`, `ev_to_*`

```json
{
  "key": "gross_margin",
  "label": "Gross Margin",
  "category": "margins",
  "order": 10,
  "unit": "percent",
  "computation": {
    "type": "ratio",
    "numerator": "gross_profit",
    "denominator": "revenue"
  }
}
```

```json
{
  "key": "price_to_earnings",
  "label": "Price to Earnings",
  "category": "valuation",
  "order": 270,
  "unit": "ratio",
  "computation": {
    "type": "simple",
    "formula": "price / diluted_eps"
  },
  "requires_external_data": ["price"]
}
```

### Computation Types

| Type         | Fields                 | Description                      |
| ------------ | ---------------------- | -------------------------------- |
| `ratio`      | numerator, denominator | Simple division                  |
| `yoy_growth` | source                 | Year-over-year percentage change |
| `cagr`       | source, years          | Compound annual growth rate      |
| `per_share`  | source, shares_key     | Divide by share count            |
| `simple`     | formula                | Custom formula expression        |

## Pack Inheritance

Non-core packs inherit balance and cash_flow surfaces from core:

```rust
// taxonomy_loader.rs
if !matches!(pack, FiscalPack::Core) {
    // Inherit balance + cash_flow from core
    // Override with pack-specific definitions
}
```

This ensures consistency across packs while allowing sector-specific income statements.

Auto-classification remains conservative.
Pack selection uses concept and role scoring, then falls back to `core` when the top match is weak or ambiguous.

## Issuer Overlay Automation

Issuer overlays support a runtime, database-backed path in addition to checked-in JSON files. When a user explicitly submits a ticker, `POST /api/tickers/ensure` enqueues a filing sync. The sync task hydrates filings with the current overlay revision and generates additive issuer mappings from residual extension concepts. When a new overlay revision is published, it immediately rehydrates recent filings.

Automation is intentionally conservative:

- it only extends existing canonical surfaces
- it does not synthesize new surfaces
- it does not auto-delete prior mappings

Runtime overlay merge order is:

1. pack primary/disclosure
2. core primary/disclosure
3. static issuer overlay file
4. runtime issuer overlay

No-role statement admission is taxonomy-aware:

- primary statement admission is allowed only when a concept matches a primary statement surface
- disclosure-only concepts are excluded from surfaced primary statements
- explicit overlap handling exists for shared balance/equity concepts such as `StockholdersEquity` and `LiabilitiesAndStockholdersEquity`

## Build Pipeline

```bash
# Generate TypeScript from Rust JSON
bun run generate

# Build Rust sidecar (includes taxonomy)
bun run build:sidecar

# Full build (generates + compiles)
bun run build
```

### package.json Scripts

| Script          | Description                 |
| --------------- | --------------------------- |
| `generate`      | Run taxonomy generator      |
| `build:sidecar` | Build Rust binary           |
| `build`         | Generate + Next.js build    |
| `lint`          | Generate + TypeScript check |

## Validation

The generator validates:

1. No duplicate surface keys within the same statement
2. All ratio numerators/denominators reference existing surfaces
3. Required fields present on all definitions
4. Valid statement/unit/category values

Run validation:

```bash
bun run generate # Validates during generation
```

## Extending the Taxonomy

### Adding a New Surface

1. Edit `rust/taxonomy/fiscal/v1/core.surface.json`
2. Add a surface definition with a unique key
3. Run `bun run generate` to regenerate TypeScript
4. Run `bun run build:sidecar` to rebuild Rust

### Adding a New Ratio

1. Edit `rust/taxonomy/fiscal/v1/core.computed.json`
2. Add a computed definition with a computation spec
3. If the ratio is market-derived, add `requires_external_data`
4. Run `bun run generate`

### Adding a New Sector Pack

1. Create `rust/taxonomy/fiscal/v1/.surface.json`
2. Create `rust/taxonomy/fiscal/v1/.income-bridge.json`
3. Create `rust/taxonomy/fiscal/v1/.kpis.json` (if needed)
4. Add the pack to `PACK_ORDER` in `scripts/generate-taxonomy.ts`
5. Add the pack to the `FiscalPack` enum in `rust/fiscal-xbrl-core/src/pack_selector.rs`
6. Run `bun run generate && bun run build:sidecar`

## Design Decisions

### Why Rust JSON as Source of Truth?

1. **Single definition**: XBRL mapping and TypeScript use the same definitions
2. **Type safety**: Rust validates JSON at compile time
3. **Performance**: No runtime JSON parsing in TypeScript
4. **Consistency**: Rust and TypeScript cannot drift, because both are built from the same JSON

### Why Gitignore Generated Files?

1. **Single source of truth**: Forces changes through Rust JSON
2. **No merge conflicts**: Generated code never conflicts
3. **Smaller repo**: No large generated files in history
4. **CI validation**: CI regenerates and validates

### Why Two-Phase Ratio Computation?

1. **Filing-derived ratios**: Can be computed at parse time by Rust
2. **Market-derived ratios**: Require real-time price data
3. **Separation of concerns**: Rust handles XBRL, TypeScript handles market data
4. **Same definitions**: Both phases use the same computation specs
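A minimal TypeScript sketch shows how one spec format can serve both phases. This is not the project's evaluator. The `Computation` union, `PeriodValues`, `safeDivide`, and `evaluate` are hypothetical names. The sketch makes three further assumptions:

- Values are keyed by surface key, and periods are ordered oldest first.
- Results are fractions rather than percentages.
- The `simple` branch handles only a single `a / b` formula such as `price / diluted_eps`, not general expressions.

```typescript
// Hypothetical mirror of the JSON `computation` specs (not the generated types).
type Computation =
  | { type: "ratio"; numerator: string; denominator: string }
  | { type: "yoy_growth"; source: string }
  | { type: "cagr"; source: string; years: number }
  | { type: "per_share"; source: string; shares_key: string }
  | { type: "simple"; formula: string };

// One fiscal period's values, keyed by surface key or external key such as "price".
type PeriodValues = Record<string, number | null | undefined>;

function safeDivide(
  numerator: number | null | undefined,
  denominator: number | null | undefined,
): number | null {
  if (numerator == null || denominator == null || denominator === 0) return null;
  return numerator / denominator;
}

// Evaluates a spec against periods ordered oldest first; the last entry is the current period.
function evaluate(spec: Computation, periods: PeriodValues[]): number | null {
  const current = periods[periods.length - 1];
  if (current === undefined) return null;

  switch (spec.type) {
    case "ratio":
      return safeDivide(current[spec.numerator], current[spec.denominator]);
    case "per_share":
      return safeDivide(current[spec.source], current[spec.shares_key]);
    case "yoy_growth": {
      const now = current[spec.source];
      const before = periods[periods.length - 2]?.[spec.source];
      if (now == null || before == null || before === 0) return null;
      // Dividing by |before| keeps the sign of the change meaningful for negative bases.
      return (now - before) / Math.abs(before);
    }
    case "cagr": {
      if (spec.years <= 0) return null;
      const start = periods[periods.length - 1 - spec.years]?.[spec.source];
      const end = current[spec.source];
      // CAGR is undefined when either endpoint is missing or non-positive.
      if (start == null || end == null || start <= 0 || end <= 0) return null;
      return Math.pow(end / start, 1 / spec.years) - 1;
    }
    case "simple": {
      // Sketch limitation: only "left / right" with two identifiers is parsed.
      const match = /^\s*(\w+)\s*\/\s*(\w+)\s*$/.exec(spec.formula);
      if (match === null) return null;
      return safeDivide(current[match[1]], current[match[2]]);
    }
    default:
      return null;
  }
}
```

In Phase 1 the filing values alone are enough. In Phase 2 the same `evaluate` call runs after external data is merged into the current period, for example `evaluate(spec, [{ ...filingValues, price }])`. A spec whose `requires_external_data` keys are missing therefore yields `null` instead of a wrong number.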