Playground Explorer
Mission
Explore how the AI agent behaves on a subject through iterative questioning, inspect trajectory artifacts as they are produced, and propose environment improvements that make better trajectories easier.
The playground is the environment in the agent's observation-action loop; its tools and UI form the agent's interface:
observe → decide → act → observe ...
Primary execution path (CLI-first)
Use the playground CLI as the default path for all probes:
uv run python -m varro.playground.cli
Optional:
uv run python -m varro.playground.cli --user-id 1 --chat-id 62 --current-url /dashboard/boligpriser-sammenligning
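When probes are launched from a script rather than typed by hand, the invocation above can be assembled programmatically. The following is a minimal sketch that only builds the argument list from the flags shown above; the helper name `build_cli_command` is hypothetical and not part of the playground package.

```python
from typing import List, Optional


def build_cli_command(
    user_id: Optional[int] = None,
    chat_id: Optional[int] = None,
    current_url: Optional[str] = None,
) -> List[str]:
    """Build the argv list for the playground CLI (hypothetical helper)."""
    cmd = ["uv", "run", "python", "-m", "varro.playground.cli"]
    # Each flag is appended only when a value is supplied, mirroring the
    # "Optional" form of the command.
    if user_id is not None:
        cmd += ["--user-id", str(user_id)]
    if chat_id is not None:
        cmd += ["--chat-id", str(chat_id)]
    if current_url is not None:
        cmd += ["--current-url", current_url]
    return cmd
```

The resulting list can be passed to `subprocess.run`, which inherits the terminal so the session stays interactive.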
Use commands during exploration:
- •:status
- •:url <path>
- •:trajectory [turn_idx]
- •:snapshot [url]
Do not generate trajectories outside the CLI unless the CLI path is blocked.
Workflow
- •Start or resume a CLI session.
- •Define a subject and exploration goal.
- •Ask one probe question at a time, like a human investigator.
- •After each turn, inspect:
  - •final response,
  - •trajectory artifacts (turn.md, tool_calls/, images/),
  - •URL/snapshot artifacts when relevant.
- •Record what changed in behavior and why.
- •Continue targeted follow-up probes until root causes are clear.
- •Write or update findings.md.
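The per-turn inspection step can be partly automated. The sketch below lists the artifacts named in the workflow for one turn. It assumes that `turn.md`, `tool_calls/`, and `images/` sit directly under a single turn directory; that layout and the function name `summarize_turn_artifacts` are assumptions, not confirmed by the playground.

```python
from pathlib import Path
from typing import Dict, List


def summarize_turn_artifacts(turn_dir: Path) -> Dict[str, List[str]]:
    """List the workflow's artifacts for one turn directory.

    Assumption: turn.md, tool_calls/ and images/ are direct children
    of turn_dir. Missing artifacts are reported as empty lists.
    """

    def names(sub: str) -> List[str]:
        folder = turn_dir / sub
        if not folder.is_dir():
            return []
        return sorted(p.name for p in folder.iterdir())

    return {
        "turn.md": ["turn.md"] if (turn_dir / "turn.md").is_file() else [],
        "tool_calls": names("tool_calls"),
        "images": names("images"),
    }
```

An empty list for `tool_calls` or `images` is itself a signal worth recording: the turn produced no tool activity or no visual output.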
Exploration rubric
Do not start from "was the answer correct?" Start from trajectory mechanics:
- •What observation did the agent receive?
- •Why did that observation lead to this action?
- •What missing or ambiguous signal created extra steps or uncertainty?
- •What is the smallest change in tool output, prompt, docs, or UI that would improve the next decision?
Correctness matters when it reveals environment gaps.
Scope boundaries
In scope:
- •Tool output shape and clarity
- •Tool instructions and prompt clarity
- •Documentation discoverability and task-fit guidance
- •URL/state ergonomics
- •Snapshot usability
- •Dashboard as communication surface for agent and user
Out of scope:
- •Broad model capability judgments without environment evidence
Output contract
Always write (or update):
data/trajectory/{user_id}/{chat_id}/findings.md
Each finding must include:
- •Hypothesis
- •Probe (question + optional URL context)
- •Observed Trajectory Evidence (turn/step/tool refs)
- •Interpretation
- •Proposed Change
- •Expected Trajectory Delta
- •Validation Probe
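A quick completeness check can catch findings that omit a required field. This is a minimal sketch, assuming each field is written as `**Label**:` in the same way the template does; `REQUIRED_FIELDS` and `missing_fields` are hypothetical names introduced for illustration.

```python
from typing import List

# The seven fields every finding must include, in contract order.
REQUIRED_FIELDS = [
    "Hypothesis",
    "Probe",
    "Observed Trajectory Evidence",
    "Interpretation",
    "Proposed Change",
    "Expected Trajectory Delta",
    "Validation Probe",
]


def missing_fields(finding_text: str) -> List[str]:
    """Return required labels that do not appear as '**Label**:'."""
    return [
        field
        for field in REQUIRED_FIELDS
        if f"**{field}**:" not in finding_text
    ]
```

Matching on the full `**Label**:` marker keeps `Probe` and `Validation Probe` distinct, since the latter does not contain the substring `**Probe**:`.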
Template:
# Findings: Chat {chat_id}
## Exploration Goal
{What subject/behavior was explored and why}
## Findings
### {short title}
**Hypothesis**: {what you expected}
**Probe**: {question asked, optional URL}
**Observed Trajectory Evidence**: {turn/step/tool references}
**Interpretation**: {why behavior emerged}
**Proposed Change**: {specific system change}
**Expected Trajectory Delta**: {steps removed / decision clarified}
**Validation Probe**: {next question to verify change}
## Prioritized Actions
1. {highest impact, lowest effort first}
2. {next}
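Filling in the template by hand is error-prone across many probes. The sketch below renders one finding in the template's format and computes the output path from the contract. The `Finding` class, its attribute names, and `findings_path` are hypothetical; only the field labels and the path pattern come from this document.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Finding:
    """One finding, mirroring the fields of the template (hypothetical class)."""

    title: str
    hypothesis: str
    probe: str
    evidence: str
    interpretation: str
    proposed_change: str
    expected_delta: str
    validation_probe: str

    def to_markdown(self) -> str:
        # Labels and order follow the template's "### {short title}" block.
        return "\n".join(
            [
                f"### {self.title}",
                f"**Hypothesis**: {self.hypothesis}",
                f"**Probe**: {self.probe}",
                f"**Observed Trajectory Evidence**: {self.evidence}",
                f"**Interpretation**: {self.interpretation}",
                f"**Proposed Change**: {self.proposed_change}",
                f"**Expected Trajectory Delta**: {self.expected_delta}",
                f"**Validation Probe**: {self.validation_probe}",
            ]
        )


def findings_path(
    user_id: int, chat_id: int, root: Path = Path("data/trajectory")
) -> Path:
    """Return data/trajectory/{user_id}/{chat_id}/findings.md under root."""
    return root / str(user_id) / str(chat_id) / "findings.md"
```

Keeping the rendering in one place means every finding carries the same labels, which makes later comparison across chats easier.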
Stopping criteria
Stop when one of the following is true:
- •Root causes are clear and supported by trajectory evidence.
- •Additional probes are not changing conclusions.
- •You have at least one high-impact, testable proposed change with a validation probe.
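The three criteria combine with a logical OR. The sketch below makes that explicit. It assumes conclusions are tracked as one set of strings per probe and treats them as "not changing" when the last two probes left that set unchanged; the window of two and all names here are assumptions chosen for illustration.

```python
from typing import List, Set


def conclusions_stable(history: List[Set[str]], window: int = 2) -> bool:
    """True if the last `window` probes left the conclusions unchanged.

    history holds one snapshot of conclusions per probe, oldest first.
    The window size is an assumption, not a rule from the skill.
    """
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):]
    return all(snapshot == recent[0] for snapshot in recent)


def should_stop(
    root_causes_supported: bool,
    history: List[Set[str]],
    testable_changes: int,
) -> bool:
    """Stop when any one of the three stopping criteria holds."""
    return (
        root_causes_supported
        or conclusions_stable(history)
        or testable_changes >= 1
    )
```

Whether root causes are "clear and supported by trajectory evidence" remains a judgment call, so it enters as a plain boolean rather than being computed.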
Relationship to other skills
Use $analyse-trajectory for retrospective audit of completed chats.
Use $playground-explorer for interactive probing and improvement discovery.