# MCP Server
distillcore ships as a Model Context Protocol (MCP) server, letting AI assistants like Claude process documents, chunk text, and search your document store directly.
## Installation

```bash
pip install distillcore[mcp,openai]
```

For PDF/DOCX/HTML support:

```bash
pip install distillcore[mcp,all]
```

## Connecting to Claude Desktop
Add distillcore to your Claude Desktop MCP config at ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
```json
{
  "mcpServers": {
    "distillcore": {
      "command": "distillcore",
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

Restart Claude Desktop. You should see distillcore's 8 tools available in the tools menu.
## Connecting to Claude Code
Add to your Claude Code settings (`.claude/settings.json`, at the user or project level):
```json
{
  "mcpServers": {
    "distillcore": {
      "command": "distillcore",
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

Or, if you installed distillcore in a specific virtual environment:
```json
{
  "mcpServers": {
    "distillcore": {
      "command": "/path/to/venv/bin/distillcore",
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

## Using with uv
If you manage Python tools with uv, you can run distillcore without a global install:
```json
{
  "mcpServers": {
    "distillcore": {
      "command": "uv",
      "args": ["run", "--with", "distillcore[mcp,all]", "distillcore"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

## Environment Variables
| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key for LLM stages and embeddings | required for LLM features |
| `DISTILLCORE_STORE` | Path to the SQLite store file | `~/.distillcore/store.db` |
| `DISTILLCORE_TENANT_ID` | Tenant ID for multi-user isolation | none |
| `DISTILLCORE_ALLOWED_DIRS` | Colon-separated list of allowed directories | unrestricted |
| `DISTILLCORE_EMBEDDING_MODEL` | Embedding model used for search | `text-embedding-3-small` |
## Restricting file access
For security, restrict which directories the server can read:
```json
{
  "mcpServers": {
    "distillcore": {
      "command": "distillcore",
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "DISTILLCORE_ALLOWED_DIRS": "/Users/me/documents:/tmp/uploads"
      }
    }
  }
}
```

Any `distill_file` or `distill_batch` call outside these directories will be rejected.
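This kind of restriction amounts to a resolved-path prefix check. A minimal sketch of the idea (not distillcore's actual implementation; the helper name is hypothetical):

```python
import os

def is_path_allowed(file_path: str, allowed_dirs: list[str]) -> bool:
    """Return True if file_path resolves inside one of allowed_dirs.

    Resolving symlinks with realpath first blocks escapes like
    /Users/me/documents/../../etc/passwd.
    """
    real = os.path.realpath(file_path)
    for root in allowed_dirs:
        root_real = os.path.realpath(root)
        # commonpath guards against prefix tricks like /tmp/uploads-evil
        if os.path.commonpath([real, root_real]) == root_real:
            return True
    return False
```

Comparing resolved paths with `commonpath` rather than a raw string prefix matters: a naive `startswith` check would accept `/tmp/uploads-evil` when `/tmp/uploads` is allowed.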
## Tools

### distill_file
Process a document file through the full 7-stage pipeline.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `file_path` | string | required | Path to the document file |
| `format` | string | auto-detect | Format override (`"pdf"`, `"txt"`, etc.) |
| `domain` | string | `"generic"` | Preset name (`"generic"` or `"legal"`) |
| `embed` | boolean | `true` | Generate embeddings |
| `chunk_target_tokens` | integer | 500 | Target chunk size in tokens |
| `enrich` | boolean | `true` | Run LLM enrichment on chunks |
| `store` | boolean | `false` | Persist result for later search |
### distill_text
Process raw text through the pipeline (skips extraction).
| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | string | required | Text content to process |
| `domain` | string | `"generic"` | Preset name |
| `embed` | boolean | `true` | Generate embeddings |
| `chunk_target_tokens` | integer | 500 | Target chunk size |
| `enrich` | boolean | `true` | Run LLM enrichment |
| `store` | boolean | `false` | Persist result |
### distill_batch
Process multiple files concurrently.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `file_paths` | string[] | required | List of file paths |
| `domain` | string | `"generic"` | Preset name |
| `embed` | boolean | `true` | Generate embeddings |
| `chunk_target_tokens` | integer | 500 | Target chunk size |
| `enrich` | boolean | `true` | Run LLM enrichment |
| `store` | boolean | `false` | Persist results |
| `max_concurrent` | integer | 5 | Max concurrent pipelines |
Failed files don't crash the batch; each gets a result with `passed=false`.
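The concurrency cap and per-file failure handling can be sketched with an `asyncio` semaphore. This is illustrative only; the function and result keys are stand-ins, not distillcore's internals:

```python
import asyncio

async def process_batch(file_paths, max_concurrent=5):
    """Run a stubbed per-file pipeline, at most max_concurrent at once.

    Each failure becomes a result dict with passed=False instead of
    aborting the whole batch.
    """
    sem = asyncio.Semaphore(max_concurrent)

    async def run_one(path):
        async with sem:
            try:
                # stand-in for the real per-file pipeline
                if not path.endswith(".txt"):
                    raise ValueError(f"unsupported: {path}")
                return {"file": path, "passed": True}
            except Exception as exc:
                return {"file": path, "passed": False, "error": str(exc)}

    return await asyncio.gather(*(run_one(p) for p in file_paths))
```

Because each task catches its own exception, `gather` always returns one result per input file, in order.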
### distill_chunks_only
Chunk text without any LLM calls. No API key needed.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | string | required | Text to chunk |
| `chunk_target_tokens` | integer | 500 | Target chunk size |
| `overlap_tokens` | integer | 50 | Token overlap between chunks |
| `min_tokens` | integer | 0 | Merge chunks below this size |
| `strategy` | string | `"paragraph"` | `"paragraph"`, `"sentence"`, or `"fixed"` |
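To build intuition for the `"paragraph"` strategy, here is a rough sketch of greedy paragraph packing with a target size and small-chunk merging. It is not distillcore's actual algorithm: tokens are approximated as whitespace words and `overlap_tokens` is omitted for brevity:

```python
def chunk_paragraphs(text, target_tokens=500, min_tokens=0):
    """Greedy paragraph chunker (tokens approximated as words).

    Paragraphs are packed into a chunk until the next paragraph would
    exceed target_tokens; chunks smaller than min_tokens are merged
    into their predecessor.
    """
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, current_len = [], [], 0
    for p in paras:
        n = len(p.split())
        if current and current_len + n > target_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(p)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    # merge undersized chunks into the previous one
    merged = []
    for c in chunks:
        if merged and len(c.split()) < min_tokens:
            merged[-1] = merged[-1] + "\n\n" + c
        else:
            merged.append(c)
    return merged
```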
### distill_validate
Check coverage between original text and a set of chunks.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `original_text` | string | required | Source text |
| `chunk_texts` | string[] | required | Chunk strings to validate |
Returns coverage score (0–1) and any missing segments.
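One way to picture a coverage check like this: mark every character of the original that appears verbatim in some chunk, then report the covered fraction and the uncovered runs. A naive sketch under those assumptions (not distillcore's actual validator):

```python
def coverage(original_text, chunk_texts):
    """Return (score, missing): fraction of characters covered by the
    chunks, plus the non-whitespace segments no chunk covers."""
    covered = [False] * len(original_text)
    for chunk in chunk_texts:
        start = original_text.find(chunk)
        while start != -1:
            for i in range(start, start + len(chunk)):
                covered[i] = True
            start = original_text.find(chunk, start + 1)
    missing, i = [], 0
    while i < len(covered):
        if not covered[i] and not original_text[i].isspace():
            j = i
            while j < len(covered) and not covered[j]:
                j += 1
            missing.append(original_text[i:j])
            i = j
        else:
            i += 1
    score = sum(covered) / len(covered) if covered else 1.0
    return score, missing
```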
### distill_search
Semantic search across stored documents using cosine similarity.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string | required | Natural language search query |
| `top_k` | integer | 10 | Number of results |
| `document_type` | string | none | Filter by document type |
Requires documents stored with `store=true` and `embed=true`.
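At its core, this style of search embeds the query and ranks stored chunk embeddings by cosine similarity. A minimal self-contained sketch of the ranking step (illustrative only; `stored` is a hypothetical list of `(doc_id, embedding)` pairs):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec, stored, top_k=10):
    """Rank stored (doc_id, embedding) pairs by similarity to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in stored]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top_k]
```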
### distill_list_documents
List all documents in the store.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `document_type` | string | none | Filter by type |
| `limit` | integer | 50 | Max documents to return |
### distill_get_document
Get full details and chunks for a stored document.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `document_id` | string | required | Document UUID |
## Example Workflows

### Process and search documents
- Use `distill_file` with `store=true` to process and persist documents
- Use `distill_search` to find relevant chunks across all stored documents
- Use `distill_get_document` to retrieve full context for a specific document
### Quick chunking
Use `distill_chunks_only` when you just need to split text: no API key, no LLM calls, instant results.
### Legal document analysis
Use `distill_file` with `domain="legal"` to extract case numbers, attorneys, court orders, and transcript speaker turns automatically.