Evaluators are specialized nodes that score, grade, or analyze outputs produced by your graph. They run after upstream nodes complete, so you can measure quality, detect failures, or generate structured feedback. Every evaluator outputs the same two fields:
  • Score (float): A numeric score representing the quality of the evaluated output. The scale and meaning of the score depend on the specific evaluator.
  • Feedback (string): A textual explanation of the evaluation that gives context for the score, such as reasons for a low score or suggestions for improvement.
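
To make this shape concrete, here is a minimal Python sketch of an evaluator result, assuming a plain dataclass representation. The `EvaluationResult` name is hypothetical; only the two fields mirror the documentation above.

```python
from dataclasses import dataclass

@dataclass
class EvaluationResult:
    """Hypothetical container for the two fields every evaluator emits."""
    score: float   # Numeric quality score; the scale depends on the evaluator.
    feedback: str  # Textual explanation that gives context for the score.

# Example: an evaluator flagging an incomplete answer.
result = EvaluationResult(
    score=0.4,
    feedback="Answer is on-topic but omits the requested citation.",
)
print(result.score, result.feedback)
```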

What Evaluators Do

Evaluators help you:
  • Score outputs (e.g., correctness, relevance, tone).
  • Generate structured feedback (rubrics, classifications, labels).
  • Compare versions of a flow or model configuration (see the sketch after this list).
  • Track quality over time as agents evolve.
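
For example, comparing two versions of a flow reduces to scoring the same batch of outputs from each version and aggregating. The sketch below assumes a generic `evaluate(output)` callable that returns an `EvaluationResult` like the one sketched above; both names are illustrative, not a documented API.

```python
from statistics import mean

def compare_versions(evaluate, outputs_a: list[str], outputs_b: list[str]) -> str:
    """Score outputs from two flow versions and report which scores higher on average."""
    mean_a = mean(evaluate(o).score for o in outputs_a)
    mean_b = mean(evaluate(o).score for o in outputs_b)
    print(f"Version A: {mean_a:.2f} | Version B: {mean_b:.2f}")
    return "A" if mean_a >= mean_b else "B"
```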

When to Use an Evaluator

Add an evaluator when you want:
  • Automated quality checks
  • Regression detection (see the sketch after this list)
  • A quantitative score to rank outputs
  • A structured audit trail for later review
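
Regression detection, for instance, can be a simple threshold check against a stored baseline score. As above, the `evaluate` callable and the tolerance value are illustrative assumptions, not part of the product API.

```python
from statistics import mean

def has_regressed(evaluate, outputs: list[str], baseline: float, tolerance: float = 0.05) -> bool:
    """Flag a regression when the mean score drops more than `tolerance` below the baseline."""
    current = mean(evaluate(o).score for o in outputs)
    return current < baseline - tolerance
```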

Supported Evaluators

  • Annotation Evaluator: Use human-provided annotations on previous runs to evaluate current agent behavior.
  • Code Evaluator: Execute custom code to evaluate outputs based on your own logic (a sketch follows this list).
  • LLM Evaluator: Use a language model to assess output quality based on criteria you define.
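
To illustrate the Code Evaluator pattern, the toy function below scores an output by keyword coverage and returns the two standard fields, reusing the hypothetical `EvaluationResult` dataclass sketched earlier. The keyword check stands in for whatever custom logic you would write.

```python
def keyword_coverage_evaluator(output: str, expected_keywords: list[str]) -> EvaluationResult:
    """Toy custom-code evaluator: score is the fraction of expected keywords found."""
    hits = {kw for kw in expected_keywords if kw.lower() in output.lower()}
    score = len(hits) / len(expected_keywords) if expected_keywords else 0.0
    missing = sorted(set(expected_keywords) - hits)
    feedback = "All expected keywords present." if not missing else f"Missing keywords: {missing}"
    return EvaluationResult(score=score, feedback=feedback)
```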