Consolidate metric definitions with Rust JSON as single source of truth

- Add core.computed.json with 32 ratio definitions (filing + market derived)
- Add Rust types for ComputedDefinition and ComputationSpec
- Create generate-taxonomy.ts to generate TypeScript from Rust JSON
- Generate lib/generated/ (gitignored) with surfaces, computed, kpis
- Update financial-metrics.ts to use generated definitions
- Add build-time generation via 'bun run generate'
- Add taxonomy architecture documentation

Two-phase ratio computation:
- Filing-derived: margins, returns, per-share, growth (Rust computes)
- Market-derived: valuation ratios (TypeScript computes with price data)

All 32 ratios defined in core.computed.json:
- Margins: gross, operating, ebitda, net, fcf
- Returns: roa, roe, roic, roce
- Financial health: debt_to_equity, net_debt_to_ebitda, cash_to_debt, current_ratio
- Per-share: revenue, fcf, book_value
- Growth: yoy metrics + 3y/5y cagr
- Valuation: market_cap, ev, p/e, p/fcf, p/b, ev/sales, ev/ebitda, ev/fcf
2026-03-15 15:22:51 -04:00
parent ed4420b8db
commit 24aa8e33d4
11 changed files with 1453 additions and 123 deletions


@@ -0,0 +1,292 @@
# Taxonomy Architecture
## Overview
The taxonomy system defines all financial surfaces, computed ratios, and KPIs used throughout the application. The Rust JSON files in `rust/taxonomy/` serve as the **single source of truth** for all financial definitions.
## Data Flow
```
┌─────────────────────────────────────────────────────────────────┐
│                    rust/taxonomy/fiscal/v1/                     │
│                                                                 │
│  core.surface.json - Income/Balance/Cash Flow surfaces          │
│  core.computed.json - Ratio definitions                         │
│  core.kpis.json - Sector-specific KPIs                          │
│  core.income-bridge.json - Income statement mapping rules       │
│                                                                 │
│  bank_lender.surface.json - Bank-specific surfaces              │
│  insurance.surface.json - Insurance-specific surfaces           │
│  reit_real_estate.surface.json - REIT-specific surfaces         │
│  broker_asset_manager.surface.json - Asset manager surfaces     │
└──────────────────────────┬──────────────────────────────────────┘
         ┌─────────────────┼─────────────────┐
         │                 │                 │
         ▼                 ▼                 ▼
  ┌─────────────┐  ┌────────────────┐ ┌───────────────┐
  │ Rust Sidecar│  │ TS Generator   │ │ TypeScript    │
  │ fiscal-xbrl │  │ scripts/       │ │ Runtime       │
  │             │  │ generate-      │ │               │
  │ Parses XBRL │  │ taxonomy.ts    │ │ UI/API        │
  │ Maps to     │  │                │ │               │
  │ surfaces    │  │ Generates TS   │ │ Uses generated│
  │ Computes    │  │ types & consts │ │ definitions   │
  │ ratios      │  │                │ │               │
  └──────┬──────┘  └───────┬────────┘ └──────┬────────┘
         │                 │                 │
         │                 ▼                 │
         │          ┌──────────────┐         │
         │          │ lib/generated│         │
         │          │ (gitignored) │         │
         │          │              │         │
         │          │ surfaces/    │         │
         │          │ computed/    │         │
         │          │ kpis/        │         │
         │          └──────┬───────┘         │
         │                 │                 │
         ▼                 ▼                 ▼
  ┌─────────────────────────────────────────────────────────────┐
  │                  lib/financial-metrics.ts                   │
  │                                                             │
  │  Thin wrapper that:                                         │
  │  - Re-exports generated types                               │
  │  - Provides UI-specific types (GraphableFinancialSurface)   │
  │  - Transforms surfaces to metric definitions                │
  └─────────────────────────────────────────────────────────────┘
```
## File Structure
```
rust/taxonomy/fiscal/v1/
├── core.surface.json # Core financial surfaces
├── core.computed.json # Ratio definitions (32 ratios)
├── core.income-bridge.json # Income statement XBRL mapping
├── core.kpis.json # Core KPIs (mostly empty)
├── universal_income.surface.json
├── bank_lender.surface.json
├── bank_lender.income-bridge.json
├── bank_lender.kpis.json
├── insurance.surface.json
├── insurance.income-bridge.json
├── insurance.kpis.json
├── reit_real_estate.surface.json
├── reit_real_estate.income-bridge.json
├── reit_real_estate.kpis.json
├── broker_asset_manager.surface.json
├── broker_asset_manager.income-bridge.json
├── broker_asset_manager.kpis.json
└── kpis/
    └── *.kpis.json

lib/generated/ # Auto-generated, gitignored
├── index.ts
├── types.ts
├── surfaces/
│   ├── index.ts
│   ├── income.ts
│   ├── balance.ts
│   └── cash_flow.ts
├── computed/
│   ├── index.ts
│   └── core.ts
└── kpis/
    ├── index.ts
    └── *.ts
```
## Surface Definitions
Surfaces represent canonical financial line items. Each surface maps XBRL concepts to a standardized key.
```json
{
  "surface_key": "revenue",
  "statement": "income",
  "label": "Revenue",
  "category": "surface",
  "order": 10,
  "unit": "currency",
  "rollup_policy": "direct_or_formula",
  "allowed_source_concepts": [
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
    "us-gaap:SalesRevenueNet"
  ],
  "formula_fallback": null
}
```
### Surface Fields
| Field | Type | Description |
|-------|------|-------------|
| `surface_key` | string | Unique identifier (snake_case) |
| `statement` | enum | `income`, `balance`, `cash_flow`, `equity`, `comprehensive_income` |
| `label` | string | Human-readable label |
| `category` | string | Grouping category |
| `order` | number | Display order |
| `unit` | enum | `currency`, `percent`, `ratio`, `shares`, `count` |
| `rollup_policy` | string | How to aggregate: `direct_only`, `direct_or_formula`, `aggregate_children`, `formula_only` |
| `allowed_source_concepts` | string[] | XBRL concepts that map to this surface |
| `formula_fallback` | object | Optional formula when no direct mapping |
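
The field table above can be read as a TypeScript shape. The following is a minimal sketch, not the generator's actual output: the names `SurfaceDefinition` and `isSurfaceDefinition` are hypothetical, and because the structure of `formula_fallback` is not described here, it is kept as an opaque object.

```typescript
// Field names and allowed values are taken from the Surface Fields table.
// Type and function names are hypothetical, chosen for this sketch only.
const STATEMENTS = ["income", "balance", "cash_flow", "equity", "comprehensive_income"] as const;
const UNITS = ["currency", "percent", "ratio", "shares", "count"] as const;
const ROLLUP_POLICIES = ["direct_only", "direct_or_formula", "aggregate_children", "formula_only"] as const;

interface SurfaceDefinition {
  surface_key: string;
  statement: (typeof STATEMENTS)[number];
  label: string;
  category: string;
  order: number;
  unit: (typeof UNITS)[number];
  rollup_policy: (typeof ROLLUP_POLICIES)[number];
  allowed_source_concepts: string[];
  // Assumption: the fallback's inner shape is unknown here, so it stays opaque.
  formula_fallback?: Record<string, unknown> | null;
}

// Runtime check for a value parsed from a *.surface.json file.
function isSurfaceDefinition(value: unknown): value is SurfaceDefinition {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const oneOf = (allowed: readonly string[], x: unknown): boolean =>
    typeof x === "string" && allowed.indexOf(x) !== -1;
  return (
    typeof v.surface_key === "string" &&
    /^[a-z][a-z0-9_]*$/.test(v.surface_key) && // snake_case identifier
    oneOf(STATEMENTS, v.statement) &&
    typeof v.label === "string" &&
    typeof v.category === "string" &&
    typeof v.order === "number" &&
    oneOf(UNITS, v.unit) &&
    oneOf(ROLLUP_POLICIES, v.rollup_policy) &&
    Array.isArray(v.allowed_source_concepts) &&
    v.allowed_source_concepts.every((c) => typeof c === "string") &&
    (v.formula_fallback === undefined || typeof v.formula_fallback === "object")
  );
}
```

Deriving the union types from `as const` arrays keeps the compile-time type and the runtime check tied to one list of allowed values.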
## Computed Definitions
Computed definitions describe ratios and derived metrics. They are split into two phases:
### Phase 1: Filing-Derived (Rust computes)
Ratios computable from filing data alone:
- **Margins**: gross_margin, operating_margin, ebitda_margin, net_margin, fcf_margin
- **Returns**: roa, roe, roic, roce
- **Financial Health**: debt_to_equity, net_debt_to_ebitda, cash_to_debt, current_ratio
- **Per-Share**: revenue_per_share, fcf_per_share, book_value_per_share
- **Growth**: revenue_yoy, net_income_yoy, eps_yoy, fcf_yoy, *_cagr
### Phase 2: Market-Derived (TypeScript computes)
Ratios requiring external price data:
- **Valuation**: market_cap, enterprise_value, price_to_earnings, price_to_fcf, price_to_book, ev_to_*

Example definitions, first a filing-derived ratio and then a market-derived one:
```json
{
  "key": "gross_margin",
  "label": "Gross Margin",
  "category": "margins",
  "order": 10,
  "unit": "percent",
  "computation": {
    "type": "ratio",
    "numerator": "gross_profit",
    "denominator": "revenue"
  }
}
```
```json
{
  "key": "price_to_earnings",
  "label": "Price to Earnings",
  "category": "valuation",
  "order": 270,
  "unit": "ratio",
  "computation": {
    "type": "simple",
    "formula": "price / diluted_eps"
  },
  "requires_external_data": ["price"]
}
```
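
The two definitions above differ in `requires_external_data`, which marks a ratio as market-derived. One way the TypeScript side could separate the phases is sketched below. It assumes the presence of that field is the only phase marker; `splitByPhase` and `missingExternalData` are hypothetical helper names, not functions from the codebase.

```typescript
// Minimal view of a computed definition; only the fields this sketch reads.
interface PhasedDefinition {
  key: string;
  computation: { type: string };
  requires_external_data?: string[];
}

// Assumption: a definition is market-derived exactly when it lists external inputs.
function splitByPhase<T extends PhasedDefinition>(
  defs: readonly T[],
): { filingDerived: T[]; marketDerived: T[] } {
  const filingDerived: T[] = [];
  const marketDerived: T[] = [];
  for (const def of defs) {
    const needsExternal = (def.requires_external_data ?? []).length > 0;
    (needsExternal ? marketDerived : filingDerived).push(def);
  }
  return { filingDerived, marketDerived };
}

// Lists the external inputs a definition needs that are not yet available.
function missingExternalData(
  def: PhasedDefinition,
  available: Record<string, number | undefined>,
): string[] {
  return (def.requires_external_data ?? []).filter((name) => available[name] === undefined);
}
```

A caller could skip any market-derived ratio for which `missingExternalData` returns a non-empty list instead of computing it from partial inputs.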
### Computation Types
| Type | Fields | Description |
|------|--------|-------------|
| `ratio` | numerator, denominator | Simple division |
| `yoy_growth` | source | Year-over-year percentage change |
| `cagr` | source, years | Compound annual growth rate |
| `per_share` | source, shares_key | Divide by share count |
| `simple` | formula | Custom formula expression |
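
To make the table concrete, here is a hedged sketch of how the computation types could be modeled and evaluated. Several things are assumptions: results are returned as fractions (0.4 rather than 40), periods are ordered oldest to newest with one entry per fiscal year, year-over-year growth divides by the absolute prior value, and `simple` formulas are left unevaluated because their grammar is not specified here. The names `ComputationSpec` (as a TypeScript type), `PeriodValues`, and `compute` are illustrative.

```typescript
type ComputationSpec =
  | { type: "ratio"; numerator: string; denominator: string }
  | { type: "yoy_growth"; source: string }
  | { type: "cagr"; source: string; years: number }
  | { type: "per_share"; source: string; shares_key: string }
  | { type: "simple"; formula: string };

// Values for one fiscal period, keyed by surface key.
type PeriodValues = Record<string, number | undefined>;

function divide(a: number | undefined, b: number | undefined): number | null {
  if (a === undefined || b === undefined || b === 0) return null;
  return a / b;
}

// Evaluates a spec for the latest period; `periods` is ordered oldest to newest.
function compute(spec: ComputationSpec, periods: PeriodValues[]): number | null {
  const current = periods[periods.length - 1];
  if (!current) return null;
  switch (spec.type) {
    case "ratio":
      return divide(current[spec.numerator], current[spec.denominator]);
    case "per_share":
      return divide(current[spec.source], current[spec.shares_key]);
    case "yoy_growth": {
      const prev = periods[periods.length - 2]?.[spec.source];
      const now = current[spec.source];
      if (now === undefined || prev === undefined || prev === 0) return null;
      return (now - prev) / Math.abs(prev);
    }
    case "cagr": {
      const start = periods[periods.length - 1 - spec.years]?.[spec.source];
      const end = current[spec.source];
      // CAGR is undefined for missing or non-positive endpoints.
      if (start === undefined || end === undefined || start <= 0 || end <= 0) return null;
      return Math.pow(end / start, 1 / spec.years) - 1;
    }
    case "simple":
      return null; // Formula evaluation is out of scope for this sketch.
  }
}
```

Returning `null` rather than `NaN` or `Infinity` lets callers distinguish "not computable" from a real value without extra checks.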
## Pack Inheritance
Non-core packs inherit balance and cash_flow surfaces from core:
```rust
// taxonomy_loader.rs
if !matches!(pack, FiscalPack::Core) {
    // Inherit balance + cash_flow from core
    // Override with pack-specific definitions
}
```
This ensures consistency across packs while allowing sector-specific income statements.
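
The real inheritance logic lives in the Rust loader; the following TypeScript sketch only illustrates the rule as described. It assumes overrides are matched on the pair of statement and `surface_key`, and that core income surfaces are not inherited. `PackSurface` and `resolvePackSurfaces` are hypothetical names.

```typescript
interface PackSurface {
  surface_key: string;
  statement: string;
  label: string;
}

// Combines core balance and cash_flow surfaces with a sector pack's own surfaces.
function resolvePackSurfaces(core: PackSurface[], pack: PackSurface[]): PackSurface[] {
  const inherited = core.filter(
    (s) => s.statement === "balance" || s.statement === "cash_flow",
  );
  // Later entries win, so pack definitions override inherited ones with the same id.
  const merged = new Map<string, PackSurface>();
  for (const surface of inherited.concat(pack)) {
    merged.set(`${surface.statement}:${surface.surface_key}`, surface);
  }
  return Array.from(merged.values());
}
```

Because `Map` keeps the position of the first insertion for a key, an overridden surface stays in the slot of the core surface it replaces.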
## Build Pipeline
```bash
# Generate TypeScript from Rust JSON
bun run generate
# Build Rust sidecar (includes taxonomy)
bun run build:sidecar
# Full build (generates + compiles)
bun run build
```
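
The generation step can be pictured as rendering parsed JSON into a TypeScript module as text. This is a sketch of that idea only: the contents of `scripts/generate-taxonomy.ts` are not shown in this document, and `renderComputedModule`, `COMPUTED_DEFINITIONS`, and `ComputedKey` are hypothetical names.

```typescript
// Renders already-parsed computed definitions as the source text of a TS module.
function renderComputedModule(defs: ReadonlyArray<{ key: string }>): string {
  const body = JSON.stringify(defs, null, 2);
  return [
    "// AUTO-GENERATED. Do not edit; change the taxonomy JSON and regenerate.",
    `export const COMPUTED_DEFINITIONS = ${body} as const;`,
    'export type ComputedKey = (typeof COMPUTED_DEFINITIONS)[number]["key"];',
    "",
  ].join("\n");
}
```

Emitting the data as an `as const` literal means the application imports plain constants, and the set of valid keys becomes a compile-time union type.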
### package.json Scripts
| Script | Description |
|--------|-------------|
| `generate` | Run taxonomy generator |
| `build:sidecar` | Build Rust binary |
| `build` | Generate + Next.js build |
| `lint` | Generate + TypeScript check |
## Validation
The generator validates:
1. No duplicate surface keys within the same statement
2. All ratio numerators/denominators reference existing surfaces
3. Required fields present on all definitions
4. Valid statement/unit/category values
Run validation:
```bash
bun run generate # Validates during generation
```
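
The first two checks listed above could look roughly like this. It is a sketch under assumptions, not the generator's code: `validateTaxonomy` and the two interfaces are hypothetical, and ratio operands are assumed to reference surfaces by `surface_key` across all statements.

```typescript
interface ValidatedSurface {
  surface_key: string;
  statement: string;
}

interface ValidatedComputed {
  key: string;
  computation: { type: string; numerator?: string; denominator?: string };
}

// Returns human-readable problems; an empty array means both checks passed.
function validateTaxonomy(
  surfaces: ValidatedSurface[],
  computed: ValidatedComputed[],
): string[] {
  const errors: string[] = [];

  // Check 1: no duplicate surface keys within the same statement.
  const seen = new Set<string>();
  for (const s of surfaces) {
    const id = `${s.statement}:${s.surface_key}`;
    if (seen.has(id)) {
      errors.push(`duplicate surface "${s.surface_key}" in statement "${s.statement}"`);
    }
    seen.add(id);
  }

  // Check 2: ratio numerators and denominators reference existing surfaces.
  const knownKeys = new Set(surfaces.map((s) => s.surface_key));
  for (const def of computed) {
    if (def.computation.type !== "ratio") continue;
    for (const ref of [def.computation.numerator, def.computation.denominator]) {
      if (ref === undefined || !knownKeys.has(ref)) {
        errors.push(`ratio "${def.key}" references unknown surface "${String(ref)}"`);
      }
    }
  }
  return errors;
}
```

Collecting all errors before failing lets one `bun run generate` run report every problem instead of stopping at the first.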
## Extending the Taxonomy
### Adding a New Surface
1. Edit `rust/taxonomy/fiscal/v1/core.surface.json`
2. Add surface definition with unique key
3. Run `bun run generate` to regenerate TypeScript
4. Run `bun run build:sidecar` to rebuild Rust
### Adding a New Ratio
1. Edit `rust/taxonomy/fiscal/v1/core.computed.json`
2. Add computed definition with computation spec
3. If market-derived, add `requires_external_data`
4. Run `bun run generate`
### Adding a New Sector Pack
1. Create `rust/taxonomy/fiscal/v1/<pack>.surface.json`
2. Create `rust/taxonomy/fiscal/v1/<pack>.income-bridge.json`
3. Create `rust/taxonomy/fiscal/v1/<pack>.kpis.json` (if needed)
4. Add pack to `PACK_ORDER` in `scripts/generate-taxonomy.ts`
5. Add pack to `FiscalPack` enum in `rust/fiscal-xbrl-core/src/pack_selector.rs`
6. Run `bun run generate && bun run build:sidecar`
## Design Decisions
### Why Rust JSON as Source of Truth?
1. **Single definition**: XBRL mapping and TypeScript use the same definitions
2. **Type safety**: Rust validates JSON at compile time
3. **Performance**: No runtime JSON parsing in TypeScript
4. **Consistency**: Rust and TypeScript read the same definitions, so they cannot drift as long as the generated files are rebuilt from the JSON
### Why Gitignore Generated Files?
1. **Single source of truth**: Forces changes through Rust JSON
2. **No merge conflicts**: Generated code never conflicts
3. **Smaller repo**: No large generated files in history
4. **CI validation**: CI regenerates and validates
### Why Two-Phase Ratio Computation?
1. **Filing-derived ratios**: Can be computed at parse time by Rust
2. **Market-derived ratios**: Require real-time price data
3. **Separation of concerns**: Rust handles XBRL, TypeScript handles market data
4. **Same definitions**: Both phases use the same computation specs