# Taxonomy Architecture

## Overview

The taxonomy system defines all financial surfaces, computed ratios, and KPIs used throughout the application. The Rust JSON files in `rust/taxonomy/` serve as the **single source of truth** for all financial definitions.

## Data Flow

```
┌─────────────────────────────────────────────────────────────────┐
│                    rust/taxonomy/fiscal/v1/                     │
│                                                                 │
│  core.surface.json       - Income/Balance/Cash Flow/Equity      │
│  core.computed.json      - Ratio definitions                    │
│  core.kpis.json          - Sector-specific KPIs                 │
│  core.income-bridge.json - Income statement mapping rules       │
│                                                                 │
│  *.surface.json          - Core plus industry-specific packs    │
│  *.income-bridge.json    - Pack-specific universal income maps  │
│  kpis/*.kpis.json        - Pack KPI bundles                     │
└──────────────────────────┬──────────────────────────────────────┘
                           │
         ┌─────────────────┼───────────────────┐
         │                 │                   │
         ▼                 ▼                   ▼
  ┌─────────────┐  ┌────────────────┐  ┌────────────────┐
  │ Rust Sidecar│  │ TS Generator   │  │ TypeScript     │
  │ fiscal-xbrl │  │ scripts/       │  │ Runtime        │
  │             │  │ generate-      │  │                │
  │ Parses XBRL │  │ taxonomy.ts    │  │ UI/API         │
  │ Maps to     │  │                │  │                │
  │ surfaces    │  │ Generates TS   │  │ Uses generated │
  │ Computes    │  │ types & consts │  │ definitions    │
  │ ratios      │  │                │  │                │
  └──────┬──────┘  └───────┬────────┘  └───────┬────────┘
         │                 │                   │
         │                 ▼                   │
         │          ┌──────────────┐           │
         │          │ lib/generated│           │
         │          │ (gitignored) │           │
         │          │              │           │
         │          │ surfaces/    │           │
         │          │ computed/    │           │
         │          │ kpis/        │           │
         │          └──────┬───────┘           │
         │                 │                   │
         ▼                 ▼                   ▼
┌─────────────────────────────────────────────────────────────────┐
│                    lib/financial-metrics.ts                     │
│                                                                 │
│  Thin wrapper that:                                             │
│  - Re-exports generated types                                   │
│  - Provides UI-specific types (GraphableFinancialSurface)       │
│  - Transforms surfaces to metric definitions                    │
└─────────────────────────────────────────────────────────────────┘
```

## File Structure

```
rust/taxonomy/fiscal/v1/
├── core.surface.json              # Core financial surfaces
├── core.computed.json             # Ratio definitions (32 ratios)
├── core.income-bridge.json        # Income statement XBRL mapping
├── core.kpis.json                 # Core KPIs (mostly empty)
├── universal_income.surface.json
│
├── bank_lender.surface.json
├── insurance.surface.json
├── reit_real_estate.surface.json
├── broker_asset_manager.surface.json
├── agriculture.surface.json
├── contractors_construction.surface.json
├── contractors_federal_government.surface.json
├── development_stage.surface.json
├── entertainment_*.surface.json
├── extractive_mining.surface.json
├── mortgage_banking.surface.json
├── title_plant.surface.json
├── franchisors.surface.json
├── not_for_profit.surface.json
├── plan_defined_*.surface.json
├── plan_health_welfare.surface.json
├── real_estate_*.surface.json
├── software.surface.json
├── steamship.surface.json
└── kpis/
    └── *.kpis.json

lib/generated/                     # Auto-generated, gitignored
├── index.ts
├── types.ts
├── surfaces/
│   ├── index.ts
│   ├── income.ts
│   ├── balance.ts
│   └── cash_flow.ts
├── computed/
│   ├── index.ts
│   └── core.ts
└── kpis/
    ├── index.ts
    └── *.ts
```

## Surface Definitions

Surfaces represent canonical financial line items. Each surface maps XBRL concepts to a standardized key.

Generated TypeScript statement catalogs are built from the deduped union of core plus unique non-core surfaces, with core definitions winning for shared universal keys.

```json
{
  "surface_key": "revenue",
  "statement": "income",
  "label": "Revenue",
  "category": "surface",
  "order": 10,
  "unit": "currency",
  "rollup_policy": "direct_or_formula",
  "allowed_source_concepts": [
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
    "us-gaap:SalesRevenueNet"
  ],
  "formula_fallback": null
}
```

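The "core wins" deduplication rule described above can be sketched roughly as follows. This is a minimal illustration, not the generator's actual code: `SurfaceDef`, `buildStatementCatalog`, and the sample bank pack entries are hypothetical names, and resolving conflicts between two non-core packs by first occurrence is an assumption.

```typescript
// Hypothetical minimal shape; real surface definitions carry more fields.
interface SurfaceDef {
  surface_key: string;
  statement: string;
  label: string;
}

// Build the catalog for one statement. Core entries are visited first, and a
// key that is already present is never replaced, so core wins on shared keys.
function buildStatementCatalog(
  core: SurfaceDef[],
  packs: SurfaceDef[][],
  statement: string,
): SurfaceDef[] {
  const byKey = new Map<string, SurfaceDef>();
  for (const source of [core, ...packs]) {
    for (const def of source) {
      if (def.statement !== statement) continue;
      if (!byKey.has(def.surface_key)) byKey.set(def.surface_key, def);
    }
  }
  return Array.from(byKey.values());
}

// Illustrative data only.
const coreSurfaces: SurfaceDef[] = [
  { surface_key: "revenue", statement: "income", label: "Revenue" },
];
const bankSurfaces: SurfaceDef[] = [
  { surface_key: "revenue", statement: "income", label: "Total Revenue (bank)" },
  { surface_key: "net_interest_income", statement: "income", label: "Net Interest Income" },
];

const incomeCatalog = buildStatementCatalog(coreSurfaces, [bankSurfaces], "income");
console.log(incomeCatalog.map((s) => `${s.surface_key}: ${s.label}`));
```

In this example the catalog keeps the core `revenue` label and adds `net_interest_income` from the pack.
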
### Surface Fields

| Field                     | Type     | Description                                                                                |
| ------------------------- | -------- | ------------------------------------------------------------------------------------------ |
| `surface_key`             | string   | Unique identifier (snake_case)                                                             |
| `statement`               | enum     | `income`, `balance`, `cash_flow`, `equity`, `comprehensive_income`, `disclosure`           |
| `label`                   | string   | Human-readable label                                                                       |
| `category`                | string   | Grouping category                                                                          |
| `order`                   | number   | Display order                                                                              |
| `unit`                    | enum     | `currency`, `percent`, `ratio`, `shares`, `count`                                          |
| `rollup_policy`           | string   | How to aggregate: `direct_only`, `direct_or_formula`, `aggregate_children`, `formula_only` |
| `allowed_source_concepts` | string[] | XBRL concepts that map to this surface                                                     |
| `formula_fallback`        | object   | Optional formula when no direct mapping                                                    |

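For orientation, the field table could translate into TypeScript types along these lines. This is a sketch only: the type names are guesses and may not match what `lib/generated/types.ts` actually emits, the shape of `formula_fallback` is not documented here, and `pickDirectValue` is a hypothetical helper that treats the order of `allowed_source_concepts` as a priority order.

```typescript
// Unions copied from the field table; all type names are assumptions.
type StatementKind =
  | "income" | "balance" | "cash_flow"
  | "equity" | "comprehensive_income" | "disclosure";
type SurfaceUnit = "currency" | "percent" | "ratio" | "shares" | "count";
type RollupPolicy =
  | "direct_only" | "direct_or_formula" | "aggregate_children" | "formula_only";

interface SurfaceDefinition {
  surface_key: string;
  statement: StatementKind;
  label: string;
  category: string;
  order: number;
  unit: SurfaceUnit;
  rollup_policy: RollupPolicy;
  allowed_source_concepts: string[];
  formula_fallback: Record<string, unknown> | null; // shape not documented here
}

// Hypothetical helper: return the value of the first allowed concept that was
// reported. Treating array order as priority is an assumption of this sketch.
function pickDirectValue(
  def: SurfaceDefinition,
  reported: ReadonlyMap<string, number>,
): number | null {
  for (const concept of def.allowed_source_concepts) {
    const value = reported.get(concept);
    if (value !== undefined) return value;
  }
  return null;
}

const revenueSurface: SurfaceDefinition = {
  surface_key: "revenue",
  statement: "income",
  label: "Revenue",
  category: "surface",
  order: 10,
  unit: "currency",
  rollup_policy: "direct_or_formula",
  allowed_source_concepts: [
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
    "us-gaap:SalesRevenueNet",
  ],
  formula_fallback: null,
};

// Illustrative fact set: only the second concept was reported.
const reportedFacts = new Map<string, number>([["us-gaap:SalesRevenueNet", 1250]]);
console.log(pickDirectValue(revenueSurface, reportedFacts));
```
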
## Computed Definitions

Computed definitions describe ratios and derived metrics. They are split into two phases:

### Phase 1: Filing-Derived (Rust computes)

Ratios computable from filing data alone:

- **Margins**: `gross_margin`, `operating_margin`, `ebitda_margin`, `net_margin`, `fcf_margin`
- **Returns**: `roa`, `roe`, `roic`, `roce`
- **Financial Health**: `debt_to_equity`, `net_debt_to_ebitda`, `cash_to_debt`, `current_ratio`
- **Per-Share**: `revenue_per_share`, `fcf_per_share`, `book_value_per_share`
- **Growth**: `revenue_yoy`, `net_income_yoy`, `eps_yoy`, `fcf_yoy`, `*_cagr`

### Phase 2: Market-Derived (TypeScript computes)

Ratios requiring external price data:

- **Valuation**: `market_cap`, `enterprise_value`, `price_to_earnings`, `price_to_fcf`, `price_to_book`, `ev_to_*`

```json
{
  "key": "gross_margin",
  "label": "Gross Margin",
  "category": "margins",
  "order": 10,
  "unit": "percent",
  "computation": {
    "type": "ratio",
    "numerator": "gross_profit",
    "denominator": "revenue"
  }
}
```

```json
{
  "key": "price_to_earnings",
  "label": "Price to Earnings",
  "category": "valuation",
  "order": 270,
  "unit": "ratio",
  "computation": {
    "type": "simple",
    "formula": "price / diluted_eps"
  },
  "requires_external_data": ["price"]
}
```

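One way to route definitions to the two phases is to look at `requires_external_data`. The sketch below assumes that a missing or empty array means "filing-derived"; `ComputedDef` and `splitByPhase` are hypothetical names, not part of the generated API.

```typescript
// Hypothetical minimal shape of a computed definition.
interface ComputedDef {
  key: string;
  requires_external_data?: string[];
}

// Assumption: a definition is market-derived exactly when it lists at least
// one external input; everything else can be computed from the filing.
function splitByPhase(defs: ComputedDef[]): { filing: ComputedDef[]; market: ComputedDef[] } {
  const filing: ComputedDef[] = [];
  const market: ComputedDef[] = [];
  for (const def of defs) {
    const needsExternal = (def.requires_external_data ?? []).length > 0;
    (needsExternal ? market : filing).push(def);
  }
  return { filing, market };
}

const phases = splitByPhase([
  { key: "gross_margin" },
  { key: "price_to_earnings", requires_external_data: ["price"] },
]);
console.log(phases.filing.map((d) => d.key), phases.market.map((d) => d.key));
```
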
### Computation Types

| Type         | Fields                 | Description                      |
| ------------ | ---------------------- | -------------------------------- |
| `ratio`      | numerator, denominator | Simple division                  |
| `yoy_growth` | source                 | Year-over-year percentage change |
| `cagr`       | source, years          | Compound annual growth rate      |
| `per_share`  | source, shares_key     | Divide by share count            |
| `simple`     | formula                | Custom formula expression        |

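The computation types above map naturally onto a discriminated union. The evaluator below is a sketch under several assumptions, not the Rust or TypeScript implementation: values are held per surface key in oldest-to-newest order, results are returned as fractions (0.25 rather than 25), missing inputs and zero denominators yield `null`, and `simple` formulas are left unevaluated because the formula language is not specified here. All names are hypothetical.

```typescript
type Computation =
  | { type: "ratio"; numerator: string; denominator: string }
  | { type: "yoy_growth"; source: string }
  | { type: "cagr"; source: string; years: number }
  | { type: "per_share"; source: string; shares_key: string }
  | { type: "simple"; formula: string };

// Values per surface key, ordered oldest to newest fiscal period (assumption).
type PeriodSeries = Record<string, number[]>;

// Value `periodsBack` periods before the latest one, or null if unavailable.
function latestValue(series: PeriodSeries, key: string, periodsBack = 0): number | null {
  const values = series[key];
  if (!values) return null;
  const value = values[values.length - 1 - periodsBack];
  return value === undefined ? null : value;
}

function safeDivide(a: number | null, b: number | null): number | null {
  if (a === null || b === null || b === 0) return null;
  return a / b;
}

function evaluateComputation(c: Computation, series: PeriodSeries): number | null {
  switch (c.type) {
    case "ratio":
      return safeDivide(latestValue(series, c.numerator), latestValue(series, c.denominator));
    case "per_share":
      return safeDivide(latestValue(series, c.source), latestValue(series, c.shares_key));
    case "yoy_growth": {
      const current = latestValue(series, c.source);
      const previous = latestValue(series, c.source, 1);
      if (current === null || previous === null || previous === 0) return null;
      return (current - previous) / Math.abs(previous);
    }
    case "cagr": {
      if (c.years <= 0) return null;
      const end = latestValue(series, c.source);
      const start = latestValue(series, c.source, c.years);
      // CAGR is only defined here for positive start and end values.
      if (end === null || start === null || start <= 0 || end <= 0) return null;
      return Math.pow(end / start, 1 / c.years) - 1;
    }
    case "simple":
      return null; // formula parsing is out of scope for this sketch
  }
}

// Illustrative data: three fiscal years.
const sampleSeries: PeriodSeries = {
  revenue: [100, 110, 121],
  gross_profit: [40, 44, 60.5],
  diluted_shares: [10, 10, 11],
};
console.log(evaluateComputation({ type: "ratio", numerator: "gross_profit", denominator: "revenue" }, sampleSeries));
console.log(evaluateComputation({ type: "cagr", source: "revenue", years: 2 }, sampleSeries));
```

Returning `null` instead of throwing keeps a single missing input from failing the whole ratio pass; whether the real pipeline does the same is not stated in this document.
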
## Pack Inheritance

Non-core packs inherit balance and cash_flow surfaces from core:

```rust
// taxonomy_loader.rs
if !matches!(pack, FiscalPack::Core) {
    // Inherit balance + cash_flow from core
    // Override with pack-specific definitions
}
```

This ensures consistency across packs while allowing sector-specific income statements.

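The inherit-then-override step can be illustrated in TypeScript as below. This is not a translation of `taxonomy_loader.rs`; it only mirrors the two comments in the Rust excerpt. `resolvePackSurfaces` and the sample data are hypothetical, and keying overrides by statement plus `surface_key` is an assumption.

```typescript
// Hypothetical minimal shape; real surface definitions carry more fields.
interface SurfaceDef {
  surface_key: string;
  statement: string;
  label: string;
}

const INHERITED_STATEMENTS = new Set(["balance", "cash_flow"]);

function resolvePackSurfaces(
  packName: string,
  packDefs: SurfaceDef[],
  coreDefs: SurfaceDef[],
): SurfaceDef[] {
  if (packName === "core") return coreDefs.slice();

  const resolved = new Map<string, SurfaceDef>();
  // 1. Inherit balance + cash_flow from core.
  for (const def of coreDefs) {
    if (INHERITED_STATEMENTS.has(def.statement)) {
      resolved.set(`${def.statement}:${def.surface_key}`, def);
    }
  }
  // 2. Override with (or add) pack-specific definitions.
  for (const def of packDefs) {
    resolved.set(`${def.statement}:${def.surface_key}`, def);
  }
  return Array.from(resolved.values());
}

// Illustrative data only.
const coreDefs: SurfaceDef[] = [
  { surface_key: "revenue", statement: "income", label: "Revenue" },
  { surface_key: "cash", statement: "balance", label: "Cash" },
];
const bankDefs: SurfaceDef[] = [
  { surface_key: "net_interest_income", statement: "income", label: "Net Interest Income" },
  { surface_key: "cash", statement: "balance", label: "Cash and Due from Banks" },
];

const bankResolved = resolvePackSurfaces("bank_lender", bankDefs, coreDefs);
console.log(bankResolved.map((s) => `${s.statement}:${s.surface_key} (${s.label})`));
```

Here the bank pack keeps its own income statement, inherits nothing from core's income surfaces, and replaces the inherited `cash` label with its own.
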
Auto-classification remains conservative. Pack selection uses concept and role scoring, then falls back to `core` when the top match is weak or ambiguous.

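A conservative selection rule of this kind might look like the sketch below. The score scale, the `minScore` and `minMargin` thresholds, and the function name are all made up for illustration; the document does not state how the real scorer defines "weak" or "ambiguous".

```typescript
interface PackScore {
  pack: string;
  score: number; // assumed to be normalised to the range 0..1
}

// Hypothetical thresholds: below minScore the top match counts as weak, and a
// lead smaller than minMargin over the runner-up counts as ambiguous.
function selectPack(scores: PackScore[], minScore = 0.6, minMargin = 0.15): string {
  const ranked = scores.slice().sort((a, b) => b.score - a.score);
  const top = ranked[0];
  if (!top || top.score < minScore) return "core"; // weak or no match
  const runnerUp = ranked[1];
  if (runnerUp && top.score - runnerUp.score < minMargin) return "core"; // ambiguous
  return top.pack;
}

console.log(selectPack([{ pack: "bank_lender", score: 0.9 }, { pack: "insurance", score: 0.3 }]));
console.log(selectPack([{ pack: "bank_lender", score: 0.8 }, { pack: "insurance", score: 0.75 }]));
```

The first call selects `bank_lender`; the second falls back to `core` because the two candidates are too close to call.
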
## Issuer Overlay Automation

Issuer overlays now support a runtime, database-backed path in addition to checked-in JSON files. When a user explicitly submits a ticker, `POST /api/tickers/ensure` enqueues a filing sync. The sync task hydrates filings with the current overlay revision, generates additive issuer mappings from residual extension concepts, and immediately rehydrates recent filings when a new overlay revision is published.

Automation is intentionally conservative:

- it only extends existing canonical surfaces
- it does not synthesize new surfaces
- it does not auto-delete prior mappings

Runtime overlay merge order is:

1. pack primary/disclosure
2. core primary/disclosure
3. static issuer overlay file
4. runtime issuer overlay

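As a rough illustration of an additive merge in the order listed above, consider the sketch below. It rests on assumptions the document does not confirm: each layer is modelled as a plain map from surface key to source concepts, pack and core layers may introduce surfaces while the two overlay layers may only extend surfaces that already exist, and no layer ever removes a concept. `ConceptMap` and `mergeOverlayLayers` are hypothetical names.

```typescript
// surface_key -> XBRL concepts that map to it (simplified model).
type ConceptMap = Record<string, string[]>;

function mergeOverlayLayers(
  pack: ConceptMap,
  core: ConceptMap,
  staticOverlay: ConceptMap,
  runtimeOverlay: ConceptMap,
): Map<string, string[]> {
  const merged = new Map<string, string[]>();

  const apply = (layer: ConceptMap, mayCreateSurface: boolean): void => {
    for (const key of Object.keys(layer)) {
      let target = merged.get(key);
      if (!target) {
        if (!mayCreateSurface) continue; // overlays never synthesize surfaces
        target = [];
        merged.set(key, target);
      }
      for (const concept of layer[key] ?? []) {
        if (target.indexOf(concept) === -1) target.push(concept); // additive, deduped
      }
    }
  };

  apply(pack, true);            // 1. pack primary/disclosure
  apply(core, true);            // 2. core primary/disclosure
  apply(staticOverlay, false);  // 3. static issuer overlay file
  apply(runtimeOverlay, false); // 4. runtime issuer overlay
  return merged;
}

// Illustrative data; the "acme:" concepts are invented extension concepts.
const mergedConcepts = mergeOverlayLayers(
  { net_interest_income: ["us-gaap:InterestIncomeExpenseNet"] },
  { revenue: ["us-gaap:SalesRevenueNet"] },
  { revenue: ["acme:ProductRevenue"], made_up_surface: ["acme:Other"] },
  { revenue: ["acme:ProductRevenue", "acme:ServiceRevenue"] },
);
console.log(mergedConcepts.get("revenue"), mergedConcepts.has("made_up_surface"));
```
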
No-role statement admission is taxonomy-aware:

- primary statement admission is allowed only when a concept matches a primary statement surface
- disclosure-only concepts are excluded from surfaced primary statements
- explicit overlap handling exists for shared balance/equity concepts such as `StockholdersEquity` and `LiabilitiesAndStockholdersEquity`

## Build Pipeline

```bash
# Generate TypeScript from Rust JSON
bun run generate

# Build Rust sidecar (includes taxonomy)
bun run build:sidecar

# Full build (generates + compiles)
bun run build
```

### package.json Scripts

| Script          | Description                 |
| --------------- | --------------------------- |
| `generate`      | Run taxonomy generator      |
| `build:sidecar` | Build Rust binary           |
| `build`         | Generate + Next.js build    |
| `lint`          | Generate + TypeScript check |

## Validation

The generator validates:

1. No duplicate surface keys within the same statement
2. All ratio numerators/denominators reference existing surfaces
3. Required fields present on all definitions
4. Valid statement/unit/category values

Run validation:

```bash
bun run generate # Validates during generation
```

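The first two checks in the list above can be sketched as follows. This is an illustration of the rules, not the code in `scripts/generate-taxonomy.ts`; `validateTaxonomy`, the trimmed-down shapes, and the error wording are hypothetical.

```typescript
// Hypothetical trimmed-down shapes with only the fields these checks need.
interface SurfaceRef {
  surface_key: string;
  statement: string;
}
interface RatioDef {
  key: string;
  computation: { type: string; numerator?: string; denominator?: string };
}

function validateTaxonomy(surfaces: SurfaceRef[], ratios: RatioDef[]): string[] {
  const errors: string[] = [];
  const seen = new Set<string>();
  const knownKeys = new Set<string>();

  // Check 1: no duplicate surface keys within the same statement.
  for (const s of surfaces) {
    const id = `${s.statement}:${s.surface_key}`;
    if (seen.has(id)) {
      errors.push(`duplicate surface key "${s.surface_key}" in statement "${s.statement}"`);
    }
    seen.add(id);
    knownKeys.add(s.surface_key);
  }

  // Check 2: ratio numerators and denominators must reference known surfaces.
  for (const r of ratios) {
    if (r.computation.type !== "ratio") continue;
    for (const ref of [r.computation.numerator, r.computation.denominator]) {
      if (ref === undefined || !knownKeys.has(ref)) {
        errors.push(`ratio "${r.key}" references unknown surface "${String(ref)}"`);
      }
    }
  }
  return errors;
}

// Illustrative input with one duplicate and one dangling reference.
const validationErrors = validateTaxonomy(
  [
    { surface_key: "revenue", statement: "income" },
    { surface_key: "revenue", statement: "income" },
    { surface_key: "gross_profit", statement: "income" },
  ],
  [
    { key: "gross_margin", computation: { type: "ratio", numerator: "gross_profit", denominator: "revenue" } },
    { key: "bad_margin", computation: { type: "ratio", numerator: "not_a_surface", denominator: "revenue" } },
  ],
);
console.log(validationErrors);
```

Collecting every error before failing lets a single `bun run generate` run report all problems at once; whether the real generator does so is not stated here.
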
## Extending the Taxonomy

### Adding a New Surface

1. Edit `rust/taxonomy/fiscal/v1/core.surface.json`
2. Add a surface definition with a unique key
3. Run `bun run generate` to regenerate TypeScript
4. Run `bun run build:sidecar` to rebuild Rust

### Adding a New Ratio

1. Edit `rust/taxonomy/fiscal/v1/core.computed.json`
2. Add a computed definition with a computation spec
3. If market-derived, add `requires_external_data`
4. Run `bun run generate`

### Adding a New Sector Pack

1. Create `rust/taxonomy/fiscal/v1/<pack>.surface.json`
2. Create `rust/taxonomy/fiscal/v1/<pack>.income-bridge.json`
3. Create `rust/taxonomy/fiscal/v1/kpis/<pack>.kpis.json` (if needed)
4. Add pack to `PACK_ORDER` in `scripts/generate-taxonomy.ts`
5. Add pack to `FiscalPack` enum in `rust/fiscal-xbrl-core/src/pack_selector.rs`
6. Run `bun run generate && bun run build:sidecar`

## Design Decisions

### Why Rust JSON as Source of Truth?

1. **Single definition**: XBRL mapping and TypeScript use the same definitions
2. **Type safety**: Rust validates JSON at compile time
3. **Performance**: No runtime JSON parsing in TypeScript
4. **Consistency**: Impossible for Rust and TypeScript to drift

### Why Gitignore Generated Files?

1. **Single source of truth**: Forces changes through Rust JSON
2. **No merge conflicts**: Generated code never conflicts
3. **Smaller repo**: No large generated files in history
4. **CI validation**: CI regenerates and validates

### Why Two-Phase Ratio Computation?

1. **Filing-derived ratios**: Can be computed at parse time by Rust
2. **Market-derived ratios**: Require real-time price data
3. **Separation of concerns**: Rust handles XBRL, TypeScript handles market data
4. **Same definitions**: Both phases use the same computation specs