Storage
The Store class provides SQLite-backed persistence with cosine similarity search and tenant isolation.
Quick Start
from distillcore.storage import Store
from distillcore import process_document
from distillcore.embedding import openai_embedder
# Process and embed
result = process_document(
    "report.pdf",
    embed=openai_embedder(api_key="sk-..."),
)
# Save
store = Store() # defaults to ~/.distillcore/store.db
doc_id = store.save(result)
Constructor
Store(path="~/.distillcore/store.db")
Creates or opens the SQLite database. The schema is created automatically.
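The docs do not say how the default path is resolved. As a minimal sketch, assuming the `~` is expanded to the home directory and the parent folder is created on demand, opening a store could look like this with the standard-library sqlite3 module. `open_database` and the single-table `SCHEMA` are hypothetical, not distillcore's code.

```python
import os
import sqlite3

# Hypothetical schema statement; the real Store's tables and columns may differ.
SCHEMA = "CREATE TABLE IF NOT EXISTS documents (id TEXT PRIMARY KEY, document_type TEXT)"


def open_database(path: str = "~/.distillcore/store.db") -> sqlite3.Connection:
    """Open the database at `path`, creating the file and its parent directory if needed."""
    full_path = os.path.expanduser(path)  # turn "~" into the user's home directory
    parent = os.path.dirname(full_path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # no error if the directory already exists
    conn = sqlite3.connect(full_path)  # creates the file if it does not exist yet
    conn.execute(SCHEMA)  # IF NOT EXISTS makes this safe to run on every open
    conn.commit()
    return conn
```

Because the statement uses CREATE TABLE IF NOT EXISTS, reopening an existing file leaves its tables and data untouched.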
Methods
save
doc_id = store.save(result, tenant_id=None)
Saves a ProcessingResult (the document and its chunks) and returns the document ID.
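A ProcessingResult bundles a document with its chunks, so saving it maps naturally onto one transaction. The sketch below is hypothetical: the `save` helper, the dict shape of `chunks`, the table columns, and the use of `uuid4` for IDs are illustrative assumptions, not distillcore's implementation. It shows the document row and chunk rows committed together, with embeddings serialized as JSON text as the Schema section describes.

```python
import json
import sqlite3
import uuid

# Hypothetical minimal schema, for illustration only.
DDL = """
CREATE TABLE IF NOT EXISTS documents (id TEXT PRIMARY KEY, document_type TEXT, tenant_id TEXT);
CREATE TABLE IF NOT EXISTS chunks (
    document_id TEXT NOT NULL,
    chunk_index INTEGER NOT NULL,
    text TEXT NOT NULL,
    embedding TEXT
);
"""


def save(conn, document_type, chunks, tenant_id=None):
    """Insert one document row plus its chunk rows atomically and return the new document ID."""
    doc_id = uuid.uuid4().hex  # assumption: the real ID format may differ
    rows = [
        (
            doc_id,
            index,
            chunk["text"],
            # Store embeddings as JSON text; chunks without one get NULL.
            json.dumps(chunk["embedding"]) if chunk.get("embedding") is not None else None,
        )
        for index, chunk in enumerate(chunks)
    ]
    with conn:  # commits both inserts together, or rolls both back on error
        conn.execute(
            "INSERT INTO documents (id, document_type, tenant_id) VALUES (?, ?, ?)",
            (doc_id, document_type, tenant_id),
        )
        conn.executemany(
            "INSERT INTO chunks (document_id, chunk_index, text, embedding) VALUES (?, ?, ?, ?)",
            rows,
        )
    return doc_id


# Usage against an in-memory database:
conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
doc_id = save(conn, "report", [{"text": "Intro", "embedding": [0.1, 0.2]}, {"text": "Body"}])
```

If either insert fails, the `with conn:` block rolls back both, so a document is never stored without its chunks.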
get_document
doc = store.get_document(document_id, tenant_id=None)
# Returns a dict with document metadata, or None
list_documents
docs = store.list_documents(
    document_type=None,  # filter by type
    limit=50,
    tenant_id=None,
)
get_chunks
chunks = store.get_chunks(document_id, tenant_id=None)
# Returns a list of chunk dicts, ordered by chunk_index
search
results = store.search(
    query_embedding,  # list[float]
    top_k=10,
    document_type=None,  # filter by type
    document_id=None,  # filter by document
    tenant_id=None,
)
Returns up to top_k chunks ranked by cosine similarity, each with a score field (higher means more similar).
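Cosine similarity is the dot product of two vectors divided by the product of their lengths, so for nonzero vectors it ranges from -1 to 1 and ignores vector magnitude. The sketch below ranks chunks by brute force in plain Python. It assumes chunks are dicts with an `embedding` list, and `cosine_similarity` and `rank_chunks` are hypothetical helpers; whether Store scores in Python or in SQL is not documented, and returning 0.0 for an all-zero vector is a choice made here.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_embedding, chunks, top_k=10):
    """Return up to top_k copies of the chunks, each with a 'score' key, best match first."""
    scored = [
        {**chunk, "score": cosine_similarity(query_embedding, chunk["embedding"])}
        for chunk in chunks
        if chunk.get("embedding") is not None  # skip chunks that were never embedded
    ]
    scored.sort(key=lambda c: c["score"], reverse=True)
    return scored[:top_k]
```

A linear scan like this is simple and exact; it touches every embedded chunk on each query, which is usually acceptable at the scale of a single local SQLite file.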
delete_document
deleted = store.delete_document(document_id, tenant_id=None)
# Returns True if deleted, False if not found
# Cascades to chunks
stats
stats = store.stats()
# {
# "documents": 42,
# "chunks": 1200,
# "chunks_with_embeddings": 1200,
# "searches": 15,
# "document_types": {"report": 30, "motion": 12}
# }
close
store.close()
Tenant Isolation
The save, get_document, list_documents, get_chunks, search, and delete_document methods accept an optional tenant_id. When it is provided, queries are scoped to that tenant:
store.save(result, tenant_id="org-123")
docs = store.list_documents(tenant_id="org-123")
# Only sees documents for org-123
Schema
The store uses three tables:
- documents: document metadata, type, source filename
- chunks: chunk text, embeddings (JSON), enrichment metadata
- search_log: query analytics
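The three tables, the chunk cascade mentioned under delete_document, and tenant scoping can be sketched together in SQLite. Only the table names come from these docs; every column, the `connect` helper, and this `delete_document` function are hypothetical. Note that SQLite enforces foreign keys, and therefore `ON DELETE CASCADE`, only after `PRAGMA foreign_keys = ON` is issued on each connection.

```python
import sqlite3

# Hypothetical column layout; only the three table names come from the docs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id TEXT PRIMARY KEY,
    tenant_id TEXT,
    document_type TEXT,
    source_filename TEXT
);
CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    document_id TEXT NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    chunk_index INTEGER NOT NULL,
    text TEXT NOT NULL,
    embedding TEXT,  -- JSON-encoded list of floats, or NULL
    metadata TEXT    -- JSON-encoded enrichment metadata
);
CREATE TABLE IF NOT EXISTS search_log (
    id INTEGER PRIMARY KEY,
    tenant_id TEXT,
    top_k INTEGER,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def connect(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(SCHEMA)
    return conn


def delete_document(conn, document_id, tenant_id=None):
    """Delete a document (and, via ON DELETE CASCADE, its chunks); True if a row was removed."""
    sql = "DELETE FROM documents WHERE id = ?"
    params = [document_id]
    if tenant_id is not None:  # scope the delete to one tenant when a tenant_id is given
        sql += " AND tenant_id = ?"
        params.append(tenant_id)
    with conn:
        cur = conn.execute(sql, params)
    return cur.rowcount > 0
```

Filtering on tenant_id inside the same WHERE clause means a caller from another tenant gets False, exactly as if the document did not exist.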
All operations are thread-safe via an internal lock.
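The docs do not describe the lock itself. One common pattern, sketched here with a hypothetical `LockedStore` class (not distillcore's code), shares a single sqlite3 connection across threads and serializes every call with a `threading.Lock`:

```python
import sqlite3
import threading


class LockedStore:
    """Hypothetical wrapper: one shared connection, every operation serialized by a lock."""

    def __init__(self, path=":memory:"):
        # check_same_thread=False lets other threads use this connection;
        # the lock below is what keeps that shared use safe.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        with self._lock:
            self._conn.execute("CREATE TABLE IF NOT EXISTS chunks (text TEXT)")

    def add_chunk(self, text):
        with self._lock:  # only one thread touches the connection at a time
            with self._conn:  # commit (or roll back) this write as one transaction
                self._conn.execute("INSERT INTO chunks (text) VALUES (?)", (text,))

    def count_chunks(self):
        with self._lock:
            return self._conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]

    def close(self):
        with self._lock:
            self._conn.close()
```

The trade-off is that concurrent calls wait on each other; in return, no two threads ever use the connection at the same time.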