An Environment is the execution context in which your agent is tested, measured, and improved. Think of it as a controlled sandbox that defines how inputs are generated, how the agent is evaluated, and how performance is tracked over time. An environment lets you run the same agent under consistent conditions so you can understand its behavior, compare changes, and systematically improve quality.

What an Environment Enables

Within an environment, you can run your agent with:
  • Benchmarks & Mockers: Predefined inputs and input distributions (via Benchmark Trigger, Persona, or similar nodes) that let you test your agent against known scenarios.
  • Simulation: Mocked interactions using environment nodes to simulate realistic user behavior and stress-test how the agent responds in different situations.
  • Evaluators: Downstream evaluators that analyze and score the agent’s outputs. These produce structured results that make quality measurable and auditable.
  • Optimization: Repeated rollouts in the same environment allow you to:
    • Optimize configurations (prompts, models, hyperparameters)
    • Optimize graph structure (flow, efficiency, latency)
Because all of these run against the same environment definition, results from different runs are directly comparable.
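The following minimal Python sketch shows how these pieces relate. It is not this product's API. `Environment`, `rollout`, the toy agents, and the two evaluators are hypothetical names used only for illustration. The environment holds a fixed set of benchmark inputs and evaluators. Each rollout scores an agent on exactly those inputs, and optimization reduces to comparing rollout scores across candidate configurations.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

# Hypothetical types for illustration only; not part of any product API.
Agent = Callable[[str], str]
Evaluator = Callable[[str, str], float]  # (input, output) -> score in [0, 1]


@dataclass(frozen=True)
class Environment:
    """A fixed set of benchmark inputs plus the evaluators that score outputs."""
    inputs: tuple[str, ...]
    evaluators: tuple[Evaluator, ...]

    def rollout(self, agent: Agent) -> float:
        """Run the agent on every input and return the mean evaluator score."""
        scores = [
            evaluate(prompt, agent(prompt))
            for prompt in self.inputs
            for evaluate in self.evaluators
        ]
        return mean(scores)


def make_agent(suffix: str) -> Agent:
    # Toy "configuration": a real agent would vary prompts, models, etc.
    return lambda prompt: prompt.upper() + suffix


def ends_with_period(prompt: str, output: str) -> float:
    # Hypothetical evaluator: rewards outputs that end with a period.
    return 1.0 if output.endswith(".") else 0.0


def echoes_input(prompt: str, output: str) -> float:
    # Hypothetical evaluator: rewards outputs that restate the request.
    return 1.0 if prompt.lower() in output.lower() else 0.0


env = Environment(
    inputs=("reset my password", "cancel my order"),
    evaluators=(ends_with_period, echoes_input),
)

# Optimization: score each candidate configuration in the same environment.
candidates = {"no_period": make_agent(""), "with_period": make_agent(".")}
results = {name: env.rollout(agent) for name, agent in candidates.items()}
best = max(results, key=results.get)
print(results, best)
```

Both candidates are scored on identical inputs by identical evaluators. Any difference in their scores can therefore be attributed to the configuration change rather than to different test conditions.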

Why Environments Matter

Environments turn agent development from trial-and-error into a repeatable, data-driven process. They help you:
  • Understand how your agent behaves under specific conditions
  • Detect regressions when you change prompts, models, or graph structure
  • Quantify improvements with evaluator scores
  • Safely experiment through simulations instead of live users
  • Continuously improve performance through optimization loops
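Regression detection, from the list above, can be sketched in the same spirit. The score dictionaries, the `find_regressions` helper, and the 0.05 tolerance below are all hypothetical. The idea is to compare per-input evaluator scores from two rollouts of the same environment, one before a change and one after.

```python
from statistics import mean

# Hypothetical per-input evaluator scores from two rollouts of the same
# environment: one before a change (baseline) and one after (candidate).
baseline = {"reset_password": 0.9, "cancel_order": 0.8, "track_package": 0.7}
candidate = {"reset_password": 0.95, "cancel_order": 0.6, "track_package": 0.75}


def find_regressions(before, after, tolerance=0.05):
    """Return {input: score change} for inputs that dropped by more than tolerance."""
    return {
        key: round(after[key] - before[key], 3)
        for key in before
        if before[key] - after[key] > tolerance
    }


# Aggregate change versus per-input regressions.
delta = mean(candidate.values()) - mean(baseline.values())
regressions = find_regressions(baseline, candidate)
print(round(delta, 3), regressions)
```

In this toy data the average score drops only slightly (about 0.03), while one input drops by 0.2. Comparing per-input scores, not just the aggregate, makes that kind of localized regression visible.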
In short, an environment is where benchmarks, evaluators, simulations, and optimization come together, giving you a clear feedback loop for understanding and improving your agent’s behavior over time.