MCP Server

distillcore ships as a Model Context Protocol (MCP) server, letting AI assistants like Claude process documents, chunk text, and search your document store directly.

Installation

pip install "distillcore[mcp,openai]"

For PDF/DOCX/HTML support:

pip install "distillcore[mcp,all]"

Connecting to Claude Desktop

Add distillcore to your Claude Desktop MCP config at ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "distillcore": {
      "command": "distillcore",
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
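If the config file already lists other MCP servers, the distillcore entry has to be merged in rather than pasted over them. The following is a minimal sketch of that merge, not part of distillcore; the helper names `add_mcp_server` and `update_config_file` are made up for illustration.

```python
import json
from pathlib import Path


def add_mcp_server(config: dict, name: str, command: str, env: dict) -> dict:
    """Return a copy of `config` with one server entry added under mcpServers."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))  # keep existing servers
    servers[name] = {"command": command, "env": dict(env)}
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path, api_key: str) -> None:
    """Read the config if it exists, add distillcore, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    config = add_mcp_server(
        config, "distillcore", "distillcore", {"OPENAI_API_KEY": api_key}
    )
    path.write_text(json.dumps(config, indent=2) + "\n")
```

Calling `update_config_file` with the path for your platform and your key leaves any other entries under `mcpServers` untouched.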

Restart Claude Desktop. You should see distillcore’s 8 tools available in the tools menu.

Connecting to Claude Code

Add the server to your Claude Code settings (.claude/settings.json, or the project-level settings file):

{
  "mcpServers": {
    "distillcore": {
      "command": "distillcore",
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

If you installed distillcore in a virtual environment, point command at that environment's executable:

{
  "mcpServers": {
    "distillcore": {
      "command": "/path/to/venv/bin/distillcore",
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

Using with uv

If you manage Python tools with uv, you can run distillcore without a global install:

{
  "mcpServers": {
    "distillcore": {
      "command": "uv",
      "args": ["run", "--with", "distillcore[mcp,all]", "distillcore"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| OPENAI_API_KEY | OpenAI API key for LLM stages + embeddings | required for LLM features |
| DISTILLCORE_STORE | Path to SQLite store file | ~/.distillcore/store.db |
| DISTILLCORE_TENANT_ID | Tenant ID for multi-user isolation | none |
| DISTILLCORE_ALLOWED_DIRS | Colon-separated allowed file paths | unrestricted |
| DISTILLCORE_EMBEDDING_MODEL | Embedding model for search | text-embedding-3-small |
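To make the defaults concrete, here is a sketch of how a process could resolve these variables. It mirrors the documented names and defaults only; the `Settings` class and `load_settings` function are hypothetical and are not distillcore's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    openai_api_key: Optional[str]            # None: LLM features unavailable
    store_path: str
    tenant_id: Optional[str]
    allowed_dirs: Optional[Tuple[str, ...]]  # None means unrestricted
    embedding_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve the documented variables, falling back to the documented defaults."""
    dirs = tuple(d for d in env.get("DISTILLCORE_ALLOWED_DIRS", "").split(":") if d)
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY"),
        store_path=env.get("DISTILLCORE_STORE", "~/.distillcore/store.db"),
        tenant_id=env.get("DISTILLCORE_TENANT_ID"),
        allowed_dirs=dirs or None,
        embedding_model=env.get(
            "DISTILLCORE_EMBEDDING_MODEL", "text-embedding-3-small"
        ),
    )
```

Passing a plain dict instead of `os.environ` makes the resolution easy to check without touching the real environment.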

Restricting file access

For security, restrict which directories the server can read:

{
  "mcpServers": {
    "distillcore": {
      "command": "distillcore",
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "DISTILLCORE_ALLOWED_DIRS": "/Users/me/documents:/tmp/uploads"
      }
    }
  }
}

Any distill_file or distill_batch call that references a path outside these directories will be rejected.
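An allow-list check of this kind is usually done on resolved paths, so that `..` segments and symlinks cannot escape the permitted directories. The sketch below shows one way to do that on a POSIX system; it is an illustration, not distillcore's implementation, and `is_allowed` is a hypothetical name.

```python
from pathlib import Path


def is_allowed(file_path: str, allowed_dirs: str) -> bool:
    """Return True if file_path resolves inside one of the colon-separated dirs.

    An empty allowed_dirs string means no restriction.
    """
    roots = [Path(d).resolve() for d in allowed_dirs.split(":") if d]
    if not roots:
        return True
    # resolve() collapses ".." and follows symlinks before the comparison.
    target = Path(file_path).resolve()
    return any(target == root or root in target.parents for root in roots)
```

Comparing against `target.parents` rather than using a string prefix avoids treating /Users/me/documents-old as if it were inside /Users/me/documents.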

Tools

distill_file

Process a document file through the full 7-stage pipeline.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| file_path | string | required | Path to the document file |
| format | string | auto-detect | Format override ("pdf", "txt", etc.) |
| domain | string | "generic" | Preset name ("generic" or "legal") |
| embed | boolean | true | Generate embeddings |
| chunk_target_tokens | integer | 500 | Target chunk size in tokens |
| enrich | boolean | true | Run LLM enrichment on chunks |
| store | boolean | false | Persist result for later search |
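MCP clients invoke tools with a JSON-RPC `tools/call` request whose `params` carry the tool name and an `arguments` object. As a rough illustration of what a client sends for distill_file, the sketch below builds such a request; the `build_tool_call` helper and the file path are placeholders, and an assistant such as Claude normally constructs this message for you.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "distill_file",
    {
        "file_path": "/Users/me/documents/report.pdf",  # placeholder path
        "domain": "generic",
        "store": True,  # persist so the document can be searched later
    },
)
print(json.dumps(request, indent=2))
```

Parameters left out of `arguments` are expected to fall back to the defaults in the table above.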

distill_text

Process raw text through the pipeline (skips extraction).

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | required | Text content to process |
| domain | string | "generic" | Preset name |
| embed | boolean | true | Generate embeddings |
| chunk_target_tokens | integer | 500 | Target chunk size |
| enrich | boolean | true | Run LLM enrichment |
| store | boolean | false | Persist result |

distill_batch

Process multiple files concurrently.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| file_paths | string[] | required | List of file paths |
| domain | string | "generic" | Preset name |
| embed | boolean | true | Generate embeddings |
| chunk_target_tokens | integer | 500 | Target chunk size |
| enrich | boolean | true | Run LLM enrichment |
| store | boolean | false | Persist results |
| max_concurrent | integer | 5 | Max concurrent pipelines |

A failed file does not abort the batch; it gets its own result with passed=false, and the remaining files are still processed.
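The combination of a concurrency cap and per-file failure isolation can be sketched with an `asyncio.Semaphore`. This is an illustration of the pattern only, not distillcore's code: `run_batch` and `process_one` are hypothetical names, and every result field other than `passed` is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_batch(
    file_paths: List[str],
    process_one: Callable[[str], Awaitable[object]],
    max_concurrent: int = 5,
) -> List[dict]:
    """Process files concurrently, capped at max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(path: str) -> dict:
        async with semaphore:  # at most max_concurrent pipelines run at once
            try:
                result = await process_one(path)
                return {"file_path": path, "passed": True, "result": result}
            except Exception as exc:
                # One failure is recorded and does not cancel the others.
                return {"file_path": path, "passed": False, "error": str(exc)}

    # gather() returns results in the same order as file_paths.
    return await asyncio.gather(*(guarded(p) for p in file_paths))
```

Because each exception is caught inside `guarded`, `gather` never sees a failure and the batch always returns one entry per input file.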

distill_chunks_only

Chunk text without any LLM calls. No API key needed.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | required | Text to chunk |
| chunk_target_tokens | integer | 500 | Target chunk size |
| overlap_tokens | integer | 50 | Token overlap between chunks |
| min_tokens | integer | 0 | Merge chunks below this size |
| strategy | string | "paragraph" | "paragraph", "sentence", or "fixed" |
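To show how a target size and an overlap interact, here is a simplified sketch of paragraph-based chunking. It is not distillcore's algorithm: it approximates tokens with whitespace-separated words, covers only the target and overlap parameters (not min_tokens or the other strategies), and `chunk_paragraphs` is a hypothetical name.

```python
from typing import List


def chunk_paragraphs(
    text: str, chunk_target_tokens: int = 500, overlap_tokens: int = 50
) -> List[str]:
    """Greedily pack paragraphs into chunks of roughly chunk_target_tokens words."""
    paragraphs = [p.split() for p in text.split("\n\n") if p.strip()]
    chunks: List[List[str]] = []
    current: List[str] = []
    for words in paragraphs:
        # Close the current chunk if this paragraph would push it past the target.
        if current and len(current) + len(words) > chunk_target_tokens:
            chunks.append(current)
            # Seed the next chunk with the tail of the previous one.
            current = current[-overlap_tokens:] if overlap_tokens > 0 else []
        # A single paragraph longer than the target still becomes one chunk.
        current = current + words
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]
```

With three 3-word paragraphs, a target of 6, and an overlap of 1, the first two paragraphs form one chunk and the second chunk starts with the last word of the first.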

distill_validate

Check coverage between original text and a set of chunks.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| original_text | string | required | Source text |
| chunk_texts | string[] | required | Chunk strings to validate |

Returns coverage score (0–1) and any missing segments.
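One way to think about a coverage score is the fraction of the source's non-whitespace characters that appear in at least one chunk. The sketch below computes that using exact substring matching; this is an assumed definition for illustration, the `coverage` function is hypothetical, and distillcore's actual metric may differ (for example, if chunks are normalized before comparison).

```python
from typing import List, Tuple


def coverage(original_text: str, chunk_texts: List[str]) -> Tuple[float, List[str]]:
    """Return (score in 0..1, missing segments) for chunks against the source."""
    covered = [False] * len(original_text)
    for chunk in chunk_texts:
        if not chunk:
            continue
        start = original_text.find(chunk)
        while start != -1:  # mark every occurrence of the chunk
            for i in range(start, start + len(chunk)):
                covered[i] = True
            start = original_text.find(chunk, start + 1)

    # Whitespace never counts for or against the score.
    significant = [i for i, ch in enumerate(original_text) if not ch.isspace()]
    if not significant:
        return 1.0, []
    score = sum(covered[i] for i in significant) / len(significant)

    # Collect maximal uncovered runs as the missing segments.
    missing: List[str] = []
    run: List[str] = []
    for i, ch in enumerate(original_text):
        if not covered[i]:
            run.append(ch)
        elif run:
            missing.append("".join(run).strip())
            run = []
    if run:
        missing.append("".join(run).strip())
    return score, [segment for segment in missing if segment]
```

A score below 1 together with a non-empty list of segments points at text that a chunker dropped.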

distill_search

Semantic search across stored documents using cosine similarity.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | required | Natural language search query |
| top_k | integer | 10 | Number of results |
| document_type | string | none | Filter by document type |

Requires documents stored with store=true and embed=true.
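Cosine-similarity search ranks stored chunk embeddings by the angle between each one and the query embedding. The sketch below shows the ranking step on plain Python lists; it assumes the embeddings already exist, and the `search` function and its `(chunk_id, vector)` input shape are illustrative rather than distillcore's storage format.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: Sequence[float],
    stored: List[Tuple[str, Sequence[float]]],
    top_k: int = 10,
) -> List[Tuple[float, str]]:
    """Return the top_k (score, chunk_id) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id) for chunk_id, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

This also shows why embed=true is required at storage time: without a stored vector there is nothing to compare the query against.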

distill_list_documents

List all documents in the store.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| document_type | string | none | Filter by type |
| limit | integer | 50 | Max documents to return |

distill_get_document

Get full details and chunks for a stored document.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| document_id | string | required | Document UUID |

Example Workflows

Process and search documents

  1. Use distill_file with store=true to process and persist documents
  2. Use distill_search to find relevant chunks across all stored documents
  3. Use distill_get_document to retrieve full context for a specific document
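The three steps above can be sketched as a sequence of tool calls. The tool and parameter names come from this page, but everything else is an assumption: `call_tool` stands in for whatever transport your MCP client provides, and the response shape (a `results` list whose items carry a `document_id`) is hypothetical and may not match what the server returns.

```python
from typing import Callable, Optional


def process_and_search(
    call_tool: Callable[[str, dict], dict],
    file_path: str,
    query: str,
    top_k: int = 10,
) -> Optional[dict]:
    """Store a document, search the store, and fetch the top hit's document."""
    # Step 1: process and persist; embeddings are needed for search.
    call_tool("distill_file", {"file_path": file_path, "store": True, "embed": True})

    # Step 2: semantic search across everything stored so far.
    hits = call_tool("distill_search", {"query": query, "top_k": top_k})

    # Assumption: hits look like {"results": [{"document_id": ...}, ...]}.
    results = hits.get("results", [])
    if not results:
        return None

    # Step 3: retrieve full context for the best-matching document.
    return call_tool(
        "distill_get_document", {"document_id": results[0]["document_id"]}
    )
```

Injecting `call_tool` keeps the sketch independent of any particular client library.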

Quick chunking

Use distill_chunks_only when you only need to split text: it requires no API key and makes no LLM calls, so it runs locally with no network round trip.

Use distill_file with domain="legal" to extract case numbers, attorneys, court orders, and transcript speaker turns automatically.