# crabrl 🦀
[![Crates.io](https://img.shields.io/crates/v/crabrl.svg)](https://crates.io/crates/crabrl)
[![CI Status](https://github.com/stefanoamorelli/crabrl/workflows/CI/badge.svg)](https://github.com/stefanoamorelli/crabrl/actions)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Rust Version](https://img.shields.io/badge/rust-1.75%2B-orange.svg)](https://www.rust-lang.org)
[![Downloads](https://img.shields.io/crates/d/crabrl.svg)](https://crates.io/crates/crabrl)
[![docs.rs](https://docs.rs/crabrl/badge.svg)](https://docs.rs/crabrl)

Lightning-fast XBRL parser, roughly **50-150x faster** than [Arelle](https://arelle.org/) in the benchmarks below, built for speed and accuracy when processing [SEC EDGAR](https://www.sec.gov/edgar) filings.
## Technical Architecture
crabrl is built on Rust's zero-cost abstractions and modern parsing techniques. While established parsers like [Arelle](https://arelle.org/) provide comprehensive XBRL specification support and extensive validation capabilities, crabrl focuses on high-performance parsing for scenarios where speed is critical.
### Implementation Details
| Optimization | Impact | Technology |
|-------------|---------|------------|
| **Zero-copy parsing** | -90% memory allocs | [`quick-xml`](https://github.com/tafia/quick-xml) with string slicing |
| **No garbage collection** | Predictable latency | Rust's ownership model |
| **Faster hashmaps** | 2x lookup speed | [`ahash`](https://github.com/tkaitchuck/aHash) instead of default hasher |
| **Compact strings** | -50% memory for small strings | [`compact_str`](https://github.com/ParkMyCar/compact_str) |
| **Parallelization** | 4-8x on multicore | [`rayon`](https://github.com/rayon-rs/rayon) work-stealing |
| **Memory mapping** | Zero-copy file I/O | [`memmap2`](https://github.com/RazrFalcon/memmap2-rs) |
| **Better allocator** | -25% allocation time | [`mimalloc`](https://github.com/microsoft/mimalloc) |

**Benchmark results:** 100,000 XBRL facts parsed in 57 ms (crabrl) vs 2,672 ms (Arelle) on identical hardware, about 47x faster.
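
To make the zero-copy row concrete, here is a minimal sketch using only the standard library. It is not crabrl's or `quick-xml`'s code: `FactRef` and `slice_fact` are hypothetical names, and the hand-rolled scanning only copes with a single simple element. The point it illustrates is that the returned struct borrows `&str` slices of the input buffer instead of allocating new `String`s.

```rust
/// Hypothetical borrowed view of one fact: both fields point into the
/// original input buffer, so no new `String` is allocated.
#[derive(Debug, PartialEq)]
pub struct FactRef<'a> {
    pub name: &'a str,
    pub value: &'a str,
}

/// Slice a single `<name ...>value</name>` element out of `input`.
/// Illustration only: real XML needs a real parser (entities, nesting, CDATA).
pub fn slice_fact(input: &str) -> Option<FactRef<'_>> {
    let open = input.find('<')?;
    let tag_end = open + input[open..].find('>')?;
    // The element name runs from after '<' to the first whitespace (or to '>').
    let tag = &input[open + 1..tag_end];
    let name = tag.split_whitespace().next()?;
    // The text value runs from after '>' to the next '<'.
    let rest = &input[tag_end + 1..];
    let value_end = rest.find('<')?;
    Some(FactRef { name, value: &rest[..value_end] })
}
```

Because `FactRef<'a>` carries the lifetime of the input, the compiler guarantees the buffer outlives every slice taken from it, which is what makes this pattern safe without copying.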
## XBRL Support Status
| Feature | Description | Status |
|---------|-------------|---------|
| **XBRL 2.1 Instance** | Parse facts, contexts, units from `.xml` files | ✅ Stable |
| **SEC Validation** | EDGAR-specific rules and checks | ✅ Stable |
| **Calculation Linkbase** | Validate arithmetic relationships | ✅ Stable |
| **Presentation Linkbase** | Extract display hierarchy | 🚧 Beta |
| **Label Linkbase** | Human-readable concept names | 🚧 Beta |
| **Definition Linkbase** | Dimensional relationships | 📋 Planned |
| **Formula Linkbase** | Business rules validation | 📋 Planned |
| **Inline XBRL (iXBRL)** | HTML-embedded XBRL | 📋 Planned |
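
The Calculation Linkbase row refers to checks of the form "a parent concept equals the weighted sum of its children". The sketch below shows that idea in isolation. It is not crabrl's validator: `CalcArc` and `calculation_holds` are hypothetical names, values are plain integer amounts, and the rounding, context, and unit matching that real calculation checks involve are left out.

```rust
use std::collections::HashMap;

/// Hypothetical calculation arc: a child concept and its weight (+1 or -1 here).
pub struct CalcArc<'a> {
    pub child: &'a str,
    pub weight: i64,
}

/// Check that `parent` equals the weighted sum of its children.
/// Returns `None` when the parent or any child has no reported value,
/// which this sketch treats as "cannot be checked".
pub fn calculation_holds(
    values: &HashMap<&str, i64>,
    parent: &str,
    arcs: &[CalcArc<'_>],
) -> Option<bool> {
    let expected = *values.get(parent)?;
    let mut sum = 0i64;
    for arc in arcs {
        sum += arc.weight * *values.get(arc.child)?;
    }
    Some(sum == expected)
}
```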
## Installation
### From crates.io
```bash
cargo install crabrl
```
### From Source
```bash
git clone https://github.com/stefanoamorelli/crabrl
cd crabrl
cargo build --release --features cli
```
### As Library Dependency
```toml
[dependencies]
crabrl = "0.1.0"
```
## Usage
### CLI
```bash
# Parse and display summary
crabrl parse filing.xml
# Parse with statistics (timing and throughput)
crabrl parse filing.xml --stats
# Validate with generic rules
crabrl validate filing.xml
# Validate with SEC EDGAR rules
crabrl validate filing.xml --profile sec-edgar
# Validate with strict mode (warnings as errors)
crabrl validate filing.xml --strict
# Benchmark performance
crabrl bench filing.xml --iterations 100
```
### Library
#### Basic Usage
```rust
use crabrl::Parser;
// Parse XBRL document
let parser = Parser::new();
let doc = parser.parse_file("filing.xml")?;
// Access parsed data
println!("Facts: {}", doc.facts.len());
println!("Contexts: {}", doc.contexts.len());
println!("Units: {}", doc.units.len());
```
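
In an XBRL 2.1 instance, each fact points at a context through `contextRef`, and numeric facts also point at a unit through `unitRef`. The sketch below shows how such references can be resolved through id-keyed maps. The `Fact` and `Document` types here are hypothetical and heavily simplified; they are assumptions for illustration and not crabrl's actual types, so check docs.rs for the real field layout.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified fact (not crabrl's actual type).
pub struct Fact {
    pub concept: String,
    pub value: String,
    pub context_ref: String,
    pub unit_ref: Option<String>, // non-numeric facts carry no unit
}

/// Hypothetical, simplified document with id-keyed lookup tables.
pub struct Document {
    pub facts: Vec<Fact>,
    pub contexts: HashMap<String, String>, // context id -> period label
    pub units: HashMap<String, String>,    // unit id -> measure, e.g. "iso4217:USD"
}

/// Render one fact with its context and unit resolved.
/// Returns `None` if the fact references an id that is not in the document.
pub fn describe(doc: &Document, fact: &Fact) -> Option<String> {
    let period = doc.contexts.get(&fact.context_ref)?;
    let mut out = format!("{} = {}", fact.concept, fact.value);
    if let Some(id) = &fact.unit_ref {
        out.push(' ');
        out.push_str(doc.units.get(id)?);
    }
    out.push_str(&format!(" [{}]", period));
    Some(out)
}
```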
#### Parse from Different Sources
```rust
use crabrl::Parser;

let parser = Parser::new();

// From file path
let doc = parser.parse_file("filing.xml")?;

// From bytes
let xml_bytes = std::fs::read("filing.xml")?;
let doc = parser.parse_bytes(&xml_bytes)?;
```
#### Validation
```rust
use crabrl::{Parser, Validator};

let parser = Parser::new();
let doc = parser.parse_file("filing.xml")?;

// Generic validation
let validator = Validator::new();
let result = validator.validate(&doc)?;
if result.is_valid {
    println!("Document is valid!");
} else {
    for error in &result.errors {
        eprintln!("Error: {}", error);
    }
}

// SEC EDGAR validation (stricter rules)
let sec_validator = Validator::sec_edgar();
let sec_result = sec_validator.validate(&doc)?;
```
## Performance Measurements
Performance comparison with [Arelle](https://arelle.org/) v2.17.4 (Python-based XBRL processor with full specification support):
### Synthetic Dataset Benchmarks
| File Size | Facts | crabrl | Arelle | Ratio |
|-----------|------:|-------:|-------:|------:|
| Tiny | 10 | 1.1 ms | 164 ms | 150x |
| Small | 100 | 1.4 ms | 168 ms | 119x |
| Medium | 1K | 1.7 ms | 184 ms | 108x |
| Large | 10K | 6.1 ms | 351 ms | 58x |
| Huge | 100K | 57 ms | 2,672 ms | 47x |
### SEC Filing Parse Times
| Company | Filing Type | File Size | Facts | Parse Time | Throughput |
|---------|-------------|-----------|-------|------------|------------|
| Apple | [10-K 2023](https://www.sec.gov/Archives/edgar/data/320193/000032019323000106/aapl-20230930_htm.xml) | 1.4 MB | 1,075 | 2.1 ms | 516K facts/sec |
| Microsoft | [10-Q 2023](https://www.sec.gov/Archives/edgar/data/789019/000095017023064280/msft-20230930_htm.xml) | 2.8 MB | 2,341 | 4.3 ms | 544K facts/sec |
| Tesla | [10-K 2023](https://www.sec.gov/Archives/edgar/data/1318605/000162828024002390/tsla-20231231_htm.xml) | 3.1 MB | 3,122 | 5.8 ms | 538K facts/sec |
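
The Ratio and Throughput columns are simple derived quantities, and the sketch below shows the arithmetic behind them. The function names are hypothetical helpers, not part of crabrl. Since the published times are rounded, a value recomputed from the table can differ slightly from the printed one.

```rust
/// Facts per second for a parse that handled `facts` facts in `parse_ms` milliseconds.
pub fn facts_per_sec(facts: u64, parse_ms: f64) -> f64 {
    facts as f64 / (parse_ms / 1000.0)
}

/// Speed ratio between a baseline timing and a candidate timing (same units).
pub fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}
```

For example, `speedup(2672.0, 57.0)` is about 46.9, which rounds to the 47x shown for the 100K-fact row, and `facts_per_sec(2341, 4.3)` is about 544,000, matching the Microsoft row.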
### Run Your Own Benchmarks
```bash
# Quick benchmark with Criterion
cargo bench
# Compare against Arelle
cd benchmarks && python compare_performance.py
# Test on real SEC filings
python scripts/download_fixtures.py # Download Apple, MSFT, Tesla, etc.
cargo run --release --bin crabrl -- bench fixtures/apple/aapl-20230930_htm.xml
```
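
For a quick sanity check without Criterion, a repeated-run timing loop can be sketched with the standard library alone. This is a rough illustration of what an `--iterations`-style benchmark measures, not the code behind `crabrl bench`; `mean_duration` is a hypothetical helper, and unlike Criterion it does no warm-up or statistical analysis.

```rust
use std::time::{Duration, Instant};

/// Run `work` `iterations` times and return the mean wall-clock time per run.
pub fn mean_duration<F: FnMut()>(iterations: u32, mut work: F) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        work();
    }
    // `Duration` supports division by `u32`, giving the per-iteration mean.
    start.elapsed() / iterations
}
```

The closure passed as `work` would wrap whatever is being measured, for example one parse of a fixture file. Build with `--release` before comparing numbers, since debug builds are much slower.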
## Resources & Links
### XBRL Standards
- [XBRL International](https://www.xbrl.org/) - Official XBRL specifications
- [XBRL 2.1 Specification](https://www.xbrl.org/Specification/XBRL-2.1/REC-2003-12-31/XBRL-2.1-REC-2003-12-31+corrected-errata-2013-02-20.html) - Core standard we implement
- [SEC EDGAR](https://www.sec.gov/edgar/searchedgar/companysearch) - Search real company filings
- [EDGAR Filer Manual](https://www.sec.gov/info/edgar/forms/edgform.pdf) - SEC filing requirements
### Dependencies We Use
| Crate | Purpose | Why We Chose It |
|-------|---------|-----------------|
| [`quick-xml`](https://github.com/tafia/quick-xml) | XML parsing | Zero-copy, fastest XML parser in Rust |
| [`ahash`](https://github.com/tkaitchuck/aHash) | HashMap hashing | 2x faster than default hasher |
| [`compact_str`](https://github.com/ParkMyCar/compact_str) | String storage | Small string optimization |
| [`rayon`](https://github.com/rayon-rs/rayon) | Parallelization | Work-stealing for automatic load balancing |
| [`mimalloc`](https://github.com/microsoft/mimalloc) | Memory allocator | Microsoft's high-performance allocator |
| [`criterion`](https://github.com/bheisler/criterion.rs) | Benchmarking | Statistical benchmarking with graphs |
### Alternative XBRL Parsers
- [Arelle](https://arelle.org/) - Complete XBRL processor with validation, formulas, and rendering (Python)
- [py-xbrl](https://github.com/manusimidt/py-xbrl) - Lightweight Python parser
- [xbrl-parser](https://www.npmjs.com/package/xbrl-parser) - JavaScript/Node.js
- [XBRL4j](https://github.com/br-data/xbrl-parser) - Java implementation
## License ⚖️
This open-source project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). This means:
- You can use, modify, and distribute this software
- If you modify and distribute it, you must release your changes under AGPL-3.0
- If you run a modified version on a server, you must provide the source code to users
- See the [LICENSE](LICENSE) file for full details

For commercial licensing options or other licensing inquiries, please contact stefano@amorelli.tech.

© 2025 Stefano Amorelli. Released under the GNU Affero General Public License v3.0. Enjoy! 🎉