API Reference
Complete API reference for the distillcore library.
Standalone Chunking
| Function | Description |
|---|---|
| chunk() | Split text into chunks using paragraph, sentence, fixed, or LLM strategy |
| achunk() | Async version of chunk() |
| estimate_tokens() | Estimate token count for a string |
See Chunking for details.
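
To make the paragraph strategy concrete, here is a minimal, self-contained sketch of greedy paragraph packing. It does not call distillcore: the names `estimate_tokens_sketch` and `chunk_paragraphs_sketch`, the `max_tokens` parameter, and the four-characters-per-token heuristic are all assumptions made for illustration, and the library's own behavior may differ.

```python
import re


def estimate_tokens_sketch(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumed heuristic)."""
    if not text:
        return 0
    return max(1, len(text) // 4)


def chunk_paragraphs_sketch(text: str, max_tokens: int = 50) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks.

    A paragraph that is larger than max_tokens on its own is kept whole
    rather than split; this sketch does not implement overlap.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        if current and estimate_tokens_sketch(candidate) > max_tokens:
            # Adding this paragraph would exceed the budget: flush and restart.
            chunks.append("\n\n".join(current))
            current = [para]
        else:
            current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Packing whole paragraphs keeps each chunk readable on its own, at the cost of uneven chunk sizes.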
Pipeline Entry Points
| Name | Description |
|---|---|
| process_document() | Process a file through the full pipeline |
| process_text() | Process raw text through the full pipeline |
| extract() | Extract text from a file |
| Store | SQLite storage with search |
Async Variants
| Function | Description |
|---|---|
| process_document_async() | Async version of process_document |
| process_text_async() | Async version of process_text |
| process_batch() | Process multiple files concurrently |
| process_batch_sync() | Synchronous batch processing |
See Async & Batch for details.
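
The relationship between a concurrent batch function and its synchronous wrapper can be sketched with the standard library alone. Everything below is hypothetical: `process_one_sketch` stands in for a per-file pipeline call, and the `max_concurrency` parameter is an assumption, not a documented distillcore argument.

```python
import asyncio


async def process_one_sketch(path: str) -> dict:
    """Hypothetical stand-in for processing a single file."""
    await asyncio.sleep(0)  # placeholder for real I/O
    return {"path": path, "ok": True}


async def process_batch_sketch(paths: list[str], max_concurrency: int = 4) -> list[dict]:
    """Process all paths concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(path: str) -> dict:
        async with semaphore:
            return await process_one_sketch(path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(guarded(p) for p in paths))


def process_batch_sync_sketch(paths: list[str], max_concurrency: int = 4) -> list[dict]:
    """Blocking wrapper: runs the async batch in a fresh event loop."""
    return asyncio.run(process_batch_sketch(paths, max_concurrency))
```

A wrapper built on `asyncio.run` cannot be called from code that is already running inside an event loop; in that situation the async function should be awaited directly.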
Configuration
| Class | Description |
|---|---|
| DistillConfig | Top-level configuration |
| ChunkConfig | Chunk size, overlap, and strategy |
| EmbeddingConfig | Embedding model selection |
| DomainConfig | Domain-specific LLM prompts |
See Configuration for details.
Utilities
| Function | Description |
|---|---|
| load_preset(name) | Load a domain preset |
| register_extractor(ext) | Register a custom extractor |
| compute_coverage(original, derived) | Word-level coverage metric |
| find_missing_segments(original, derived) | Find text segments lost during processing |
| safe_parse(raw) | Parse JSON with truncation repair fallback |
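
Two of the ideas in this table, a word-level coverage metric and JSON parsing with a truncation repair fallback, can be illustrated without the library. The sketch below is one plausible reading of those descriptions; the function names, the tokenization by `\w+`, and the repair approach of closing open strings and brackets are assumptions rather than distillcore's actual implementation.

```python
import json
import re


def word_coverage_sketch(original: str, derived: str) -> float:
    """Fraction of the original's words that also appear in the derived text."""
    original_words = re.findall(r"\w+", original.lower())
    if not original_words:
        return 1.0  # nothing to lose, so coverage is treated as complete
    derived_words = set(re.findall(r"\w+", derived.lower()))
    kept = sum(1 for word in original_words if word in derived_words)
    return kept / len(original_words)


def safe_parse_sketch(raw: str):
    """Parse JSON; if that fails, close open strings/brackets and retry once.

    Returns None when the repaired text still does not parse.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    closers = []  # closing brackets still owed, innermost last
    in_string = False
    escaped = False
    for ch in raw:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers:
            closers.pop()
    repaired = raw + ('"' if in_string else "") + "".join(reversed(closers))
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return None
```

For example, `safe_parse_sketch('{"a": [1, 2')` appends `]}` before retrying, while input cut off right after a colon cannot be repaired this way and yields `None`.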