Triage Agent
The Triage agent assesses incoming documents and configures the processing pipeline.
Tools
| Tool | API | Description |
|---|---|---|
| `preview_document` | `extract()` | Extract first page for assessment |
| `list_available_presets` | `list_presets()` | Get available domain presets |
Output: TriageDecision
class TriageDecision(BaseModel):
    source_filename: str
    page_count: int
    detected_format: str
    preset: str  # "generic" or "legal"
    needs_ocr: bool
    target_tokens: int  # recommended chunk size
    overlap_chars: int
    chunk_strategy: str  # "auto", "paragraph", "sentence", "fixed", "llm"
    min_tokens: int  # merge chunks below this threshold (0 = disabled)
    enable_enrichment: bool
    llm_page_window_size: int
    reasoning: str  # explanation of choices

How It Works
- Calls `preview_document` to extract the first page and get the format, page count, and character stats
- Calls `list_available_presets` to see which domain presets are available
- Analyzes the preview text to determine the document type
- Selects a chunking strategy based on content structure:
  - `"paragraph"` for documents with clear paragraph breaks (reports, briefs)
  - `"sentence"` for dense text without paragraph breaks (OCR output, old PDFs)
  - `"fixed"` for code, logs, or data where consistent window size matters
  - `"llm"` for high-value documents where semantic coherence is critical (extra LLM call)
  - `"auto"` (default) when unsure, which lets the pipeline decide
- Sets `min_tokens` to merge tiny chunks (e.g., 50 for scanned documents or tables)
- Returns a `TriageDecision` with the recommended configuration
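
The strategy selection described above can be sketched as a few plain heuristics. This is only an illustration: the agent itself makes this choice by analyzing the preview text, and the function names (`choose_chunk_strategy`, `choose_min_tokens`), the `high_value` flag, and every threshold below are assumptions, not part of the actual implementation.

```python
import re


def choose_chunk_strategy(preview_text: str, high_value: bool = False) -> str:
    """Pick a chunk_strategy value from first-page text (illustrative heuristics)."""
    if high_value:
        return "llm"  # caller accepts the extra LLM call for semantic coherence

    lines = [ln for ln in preview_text.splitlines() if ln.strip()]
    if not lines:
        return "auto"  # nothing to analyze, so let the pipeline decide

    # Code or logs: most lines start with a keyword, a brace, or a timestamp.
    code_like = sum(
        1
        for ln in lines
        if re.match(r"\s*(def |class |import |[{}]|\d{4}-\d{2}-\d{2}[ T]\d{2}:)", ln)
    )
    if code_like / len(lines) > 0.5:
        return "fixed"

    # Clear paragraph breaks: several blocks separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", preview_text) if p.strip()]
    if len(paragraphs) >= 3:
        return "paragraph"

    # Dense text: sentence punctuation but no paragraph structure.
    sentence_ends = re.findall(r"[.!?](?:\s|$)", preview_text)
    if len(sentence_ends) >= 5:
        return "sentence"

    return "auto"


def choose_min_tokens(needs_ocr: bool) -> int:
    """Merge tiny chunks for scanned documents; 0 disables merging."""
    return 50 if needs_ocr else 0
```

Keeping `"auto"` as the fallthrough mirrors the documented default: when none of the signals is strong, the decision is deferred to the pipeline.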
Example Output
{
  "source_filename": "motion_to_dismiss.pdf",
  "page_count": 12,
  "detected_format": "pdf",
  "preset": "legal",
  "needs_ocr": false,
  "target_tokens": 500,
  "overlap_chars": 200,
  "chunk_strategy": "paragraph",
  "min_tokens": 0,
  "enable_enrichment": true,
  "llm_page_window_size": 15,
  "reasoning": "Document contains case numbers, legal citations, and party names consistent with a legal motion."
}
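
A downstream consumer may want to check a decision like the one above before configuring the pipeline. The sketch below uses a standard-library dataclass as a stand-in for the `BaseModel` class so that it runs without dependencies. `TriageDecisionSketch`, `parse_decision`, and the specific validation rules are hypothetical and only mirror the field comments in the schema.

```python
import json
from dataclasses import dataclass, fields

# Values taken from the chunk_strategy comment in the schema above.
ALLOWED_STRATEGIES = {"auto", "paragraph", "sentence", "fixed", "llm"}


@dataclass
class TriageDecisionSketch:
    """Stand-in for TriageDecision with the same field names and types."""
    source_filename: str
    page_count: int
    detected_format: str
    preset: str
    needs_ocr: bool
    target_tokens: int
    overlap_chars: int
    chunk_strategy: str
    min_tokens: int
    enable_enrichment: bool
    llm_page_window_size: int
    reasoning: str


def parse_decision(raw: str) -> TriageDecisionSketch:
    """Parse a JSON decision and apply a few assumed sanity checks."""
    data = json.loads(raw)
    expected = {f.name for f in fields(TriageDecisionSketch)}
    if set(data) != expected:
        raise ValueError(f"unexpected or missing fields: {sorted(set(data) ^ expected)}")
    decision = TriageDecisionSketch(**data)
    if decision.chunk_strategy not in ALLOWED_STRATEGIES:
        raise ValueError(f"unknown chunk_strategy: {decision.chunk_strategy!r}")
    if decision.target_tokens <= 0 or decision.min_tokens < 0:
        raise ValueError("target_tokens must be positive and min_tokens non-negative")
    return decision


EXAMPLE = """
{
  "source_filename": "motion_to_dismiss.pdf",
  "page_count": 12,
  "detected_format": "pdf",
  "preset": "legal",
  "needs_ocr": false,
  "target_tokens": 500,
  "overlap_chars": 200,
  "chunk_strategy": "paragraph",
  "min_tokens": 0,
  "enable_enrichment": true,
  "llm_page_window_size": 15,
  "reasoning": "Document contains case numbers, legal citations, and party names consistent with a legal motion."
}
"""

decision = parse_decision(EXAMPLE)
print(decision.preset, decision.chunk_strategy, decision.target_tokens)
```

With the real model class, the same checks would more naturally live in the model's own validation rather than in a separate function.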