Benchmarks
Measured performance across document formats
Processing Speed
Throughput comparison • Higher is better
Memory Usage
RAM consumption • Lower is better
Kreuzberg
Markitdown
Pymupdf4llm
Docling
Extraction Time
Pure processing time • Lower is better
Kreuzberg
Markitdown
Pymupdf4llm
Docling
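As a rough illustration of how per-document extraction time and memory could be recorded, here is a minimal sketch. It is not the harness behind these charts: the `measure` helper and the stand-in extractor are hypothetical, and `tracemalloc` only sees memory allocated through Python, so it would under-report frameworks that allocate in native code.

```python
import time
import tracemalloc
from typing import Any, Callable, Dict


def measure(extract: Callable[[bytes], Any], payload: bytes) -> Dict[str, Any]:
    """Run extract(payload) once; report wall-clock time and peak Python heap.

    Hypothetical helper for illustration only, not the benchmark's own code.
    """
    tracemalloc.start()
    start = time.perf_counter()
    result = extract(payload)
    elapsed = time.perf_counter() - start
    # get_traced_memory() returns (current, peak) in bytes since start().
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"result": result, "seconds": elapsed, "peak_bytes": peak_bytes}


if __name__ == "__main__":
    # Stand-in extractor: decodes bytes to text. A real run would call a framework here.
    stats = measure(lambda data: data.decode("utf-8"), b"hello world")
    print(f"{stats['seconds']:.6f} s, peak {stats['peak_bytes']} bytes")
```

Measuring resident set size from a separate monitoring process would be closer to "RAM consumption" as charted, at the cost of more setup.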
Extraction Accuracy (TF1)
Extraction accuracy (F1 score, TF1) • Higher is better
Kreuzberg
Markitdown
Docling
Pymupdf4llm
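The page does not say how TF1 is computed. Assuming it is a token-level F1 between extracted text and a ground-truth reference, the score could be sketched as below. The `token_f1` name and the lowercase, whitespace-split tokenization are assumptions made for illustration.

```python
from collections import Counter


def token_f1(extracted: str, reference: str) -> float:
    """Token-level F1 between extracted text and a reference text.

    Assumption: tokens are lowercase, whitespace-separated words; the
    benchmark's actual tokenization and normalization may differ.
    """
    extracted_tokens = extracted.lower().split()
    reference_tokens = reference.lower().split()
    if not extracted_tokens or not reference_tokens:
        # Both empty is treated as a perfect match; one empty as a total miss.
        return float(extracted_tokens == reference_tokens)
    # Multiset intersection: a shared token counts as often as it appears in both.
    overlap = sum((Counter(extracted_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(extracted_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("the quick brown fox", "the quick red fox"))
```

In this example three of the four tokens match on each side, so precision and recall are both 0.75.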
Structural Accuracy (SF1)
Structure preservation (SF1) • Higher is better
Kreuzberg
Docling
Pymupdf4llm
Markitdown
Success Rate
Success rate • Higher is better
Cold Start
Framework initialization time • Lower is better
Kreuzberg
Pdfminer
Pdfplumber
Pdftotext
Playa-pdf
Pypdf
Tika
Pandoc
Pymupdf4llm
Markitdown
Docling
Installation Footprint
Framework installation size • Lower is better
Kreuzberg
Pypdf
Playa-pdf
Pdfminer
Pymupdf4llm
Pdfplumber
Pdftotext
Pandoc
Markitdown
Tika
Docling
CPU Usage
Processing PDF files • Real-time Replay
[Charts: live CPU usage over a 25.0 s replay; average CPU usage (CPU %)]
Framework Capabilities
Feature support comparison across frameworks
| Framework | OCR support | Batch processing | Async support |
|---|---|---|---|
| Kreuzberg | | | |
| Docling | | | |
| Markitdown | | | |
| Pandoc | | | |
| Pdfminer | | | |
| Pdfplumber | | | |
| Pdftotext | | | |
| Playa-pdf | | | |
| Pymupdf4llm | | | |
| Pypdf | | | |
| Tika | | | |