
Run a Story

How to assign a work item to the agent and watch it run end-to-end.


Prerequisites

  • The daemon is running (start it with vlx daemon, or check the service with systemctl status vlx).
  • vlx.yaml is configured and all required env vars are set.
  • Your Telegram group chat has the bot added and you are in authorized_user_ids.

Assigning a story in ADO

  1. Open the ADO work item you want the agent to handle.
  2. Add the tag vlx to the work item.
  3. Assign the work item to the bot user (the email in ado.bot_assignee, or the PAT owner if bot_assignee is not set).
  4. Set the work item's state to Active, or to whichever state your project treats as actionable (by default, Active or New).
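The three conditions in the steps above can be expressed as a single eligibility check. This is a minimal TypeScript sketch, not vlx's actual code. WorkItem, ClaimConfig, isEligible and all field names are hypothetical, and the case-insensitive tag match is an assumption.

```typescript
// Hypothetical shapes for illustration; vlx's real types and field names may differ.
interface WorkItem {
  id: number;
  tags: string[];
  assignedTo: string; // assignee email
  state: string;
}

interface ClaimConfig {
  botAssignee?: string;        // mirrors ado.bot_assignee (optional)
  patOwner: string;            // email of the PAT owner, used when botAssignee is unset
  actionableStates?: string[]; // defaults to Active / New
}

const DEFAULT_ACTIONABLE_STATES = ["Active", "New"];

// True when a work item meets all three conditions: tagged, assigned to the bot, actionable state.
function isEligible(item: WorkItem, cfg: ClaimConfig): boolean {
  const assignee = (cfg.botAssignee ?? cfg.patOwner).toLowerCase();
  const states = cfg.actionableStates ?? DEFAULT_ACTIONABLE_STATES;
  const tagged = item.tags.some((t) => t.toLowerCase() === "vlx"); // assumes case-insensitive tags
  return tagged && item.assignedTo.toLowerCase() === assignee && states.includes(item.state);
}
```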

The daemon polls ADO on its configured interval and claims the next eligible item.

Do not run the daemon and the manual runbook against the same story at the same time.

docs/MANUAL-RUN-RUNBOOK.md describes the pre-daemon, manual path. Running both against the same story creates a race on the worktree and the queue.


What happens next

1. Think stage

The Architect actor reads the work item description, the project memory at <repo>/.vlx/memory/, and produces stories/<storyId>/spec.md — a problem statement, acceptance criteria, constraints, and open questions.

If the spec is ambiguous, the actor posts a clarification question to your Telegram thread. Reply in the thread to continue. The daemon resumes the same ACP session from where it left off.

2. Plan stage

The Architect actor (now in Plan mode) reads the frozen spec and produces stories/<storyId>/plan.md — a file list, AC traceability table (each AC mapped to an implementation file and a test file), docs-impact list, test approach, and rollback notes.
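The traceability table should leave no AC without both an implementation file and a test file. A reviewer or a script could check this with something like the following sketch. TraceRow and untracedAcs are hypothetical names, and this page does not say whether vlx runs such a check itself.

```typescript
// Hypothetical row shape for the plan's AC traceability table.
interface TraceRow {
  ac: string;       // acceptance-criterion id from spec.md, e.g. "AC1"
  implFile: string; // implementation file that satisfies it
  testFile: string; // test file that proves it
}

// Returns the ACs that lack a row with both an implementation file and a test file.
function untracedAcs(acIds: string[], table: TraceRow[]): string[] {
  const covered = new Set(
    table.filter((r) => r.implFile !== "" && r.testFile !== "").map((r) => r.ac),
  );
  return acIds.filter((id) => !covered.has(id));
}
```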

3. APPROVAL gate (optional — off by default)

The plan-approval gate is disabled by default, so a story runs unattended from Plan straight into Build. During setup, vlx init asks whether to enable it; if you do not answer within 60 seconds, the gate stays off. You can also set approval_gate.enabled in vlx.yaml. See Configure — approval_gate.

When the gate is enabled, this is the one mandatory human gate before code is written. The daemon posts the plan to your Telegram thread with Approve and Reject buttons. Tap Approve to proceed to Build; tap Reject to escalate — the daemon posts a notification and marks the run needs_review. Once approved, the Builder commits to the file list and AC traceability table in the plan; deviations are routed back to Plan for re-approval.
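The gate's routing reduces to a small decision function. This sketch follows the behavior described above. afterPlan and its return values are hypothetical labels, not vlx identifiers.

```typescript
type GateDecision = "approve" | "reject";
type AfterPlan = "build" | "await_human" | "needs_review";

// Hypothetical helper; gateEnabled mirrors approval_gate.enabled in vlx.yaml.
function afterPlan(gateEnabled: boolean, decision?: GateDecision): AfterPlan {
  if (!gateEnabled) return "build";                 // default: run unattended into Build
  if (decision === undefined) return "await_human"; // plan posted with Approve / Reject buttons
  return decision === "approve" ? "build" : "needs_review"; // Reject escalates
}
```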

4. Build stage

The Builder actor implements the plan — writes code, tests, and docs-site updates per the plan's docs-impact section — and commits atomically to branch vlx-bot/<storyId>.

If the Builder needs a credential (e.g., a read-only API token to investigate something), it calls RequestCredential via the MCP bridge, which posts to your Telegram thread. Your reply is delivered to the resumed session. See Security — Credential relay for important caveats.

5. Review stage

The Critic actor reads the diff and emits stories/<storyId>/review-findings.json — a structured list of findings categorised by severity (critical / major / minor / nit).

  • Critical: orchestrator escalates to Telegram.
  • Major: build fix-up loop (up to 2 retries), then escalate if still failing.
  • Minor / nit: proceed to Test.
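The severity routing can be sketched as a pure function. The Finding shape is an assumption because the real review-findings.json schema is not documented here. routeReview and its outcome names are hypothetical.

```typescript
type Severity = "critical" | "major" | "minor" | "nit";

// Hypothetical finding shape; the real review-findings.json schema may differ.
interface Finding {
  severity: Severity;
  message: string;
}

type ReviewRoute = "escalate" | "fix_up_build" | "proceed_to_test";

const MAX_FIXUP_RETRIES = 2;

// fixUpsSoFar: fix-up builds already run in this review loop.
function routeReview(findings: Finding[], fixUpsSoFar: number): ReviewRoute {
  if (findings.some((f) => f.severity === "critical")) return "escalate";
  if (findings.some((f) => f.severity === "major")) {
    return fixUpsSoFar < MAX_FIXUP_RETRIES ? "fix_up_build" : "escalate";
  }
  return "proceed_to_test"; // only minor / nit findings (or none) remain
}
```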

6. Test stage

The Inspector actor runs bun test, the linter, and the type checker inside the worktree via gstack /qa-only (report-only: it makes no source edits), and writes stories/<storyId>/qa-result.json.

  • Pass: proceed to Ship.
  • Fail: route back to Build (capped fix-up loop).
  • Incomplete / infrastructure failure: escalate to Telegram.

If the story is a UI story (the Architect set ui: true in spec.md), the Test stage also proves the UI in a real browser. The harness launches the app under test (your ui_test config), and the Inspector drives it with Playwright — recording video and screenshots per acceptance criterion — and compares the result against any mockups attached to the work item. The videos, screenshots, and mockup side-by-sides are posted to your Telegram thread and attached to the work item; the PR gains a per-AC verdict table and a UI-flows table. A mockup mismatch that breaks an AC sends the Builder back for one fix attempt, then ships flagged for your review. See Concepts — UI stories.
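The Test stage's three outcomes can be routed in a similar way. This page does not state the fix-up cap or what happens when it is reached, so maxFixUps is a parameter. The sketch assumes that exhausting the cap fails the stage, which would then fall under auto-escape. routeTest and the outcome names are hypothetical.

```typescript
type QaStatus = "pass" | "fail" | "incomplete";
type TestRoute = "ship" | "back_to_build" | "stage_failed" | "escalate";

// maxFixUps is a placeholder; the real cap is not documented on this page.
function routeTest(status: QaStatus, fixUpsSoFar: number, maxFixUps: number): TestRoute {
  switch (status) {
    case "pass":
      return "ship";
    case "fail":
      // Assumption: hitting the cap fails the stage, which then falls under auto-escape.
      return fixUpsSoFar < maxFixUps ? "back_to_build" : "stage_failed";
    case "incomplete":
      return "escalate"; // infrastructure problem, not a code problem
  }
}
```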

7. Ship stage

The orchestrator (deterministically — no LLM) pushes the branch and opens a PR via the SCM host adapter. The PR description is templated from the Critic's findings and the Inspector's QA results. The ADO work item is linked to the PR.

A Telegram status post is sent: PR opened: #<id> — <title>.
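Because Ship is deterministic, the PR description comes from plain templating over the two JSON artifacts. This is a minimal sketch with hypothetical input shapes and output layout. vlx's real template is not shown on this page.

```typescript
// Hypothetical input shapes; the real review-findings.json and qa-result.json schemas may differ.
interface Finding {
  severity: "critical" | "major" | "minor" | "nit";
  message: string;
}
interface QaCheck {
  name: string;
  passed: boolean;
}

// Builds a plain-text PR body: finding counts per severity, each finding, then QA checks.
function prDescription(storyId: string, findings: Finding[], checks: QaCheck[]): string {
  const counts = ["critical", "major", "minor", "nit"].map(
    (s) => `${s}: ${findings.filter((f) => f.severity === s).length}`,
  );
  const lines = [
    `Work item: ${storyId}`,
    "",
    `Critic findings (${counts.join(", ")})`,
    ...findings.map((f) => `- [${f.severity}] ${f.message}`),
    "",
    "Inspector QA results",
    ...checks.map((c) => `- ${c.name}: ${c.passed ? "pass" : "fail"}`),
  ];
  return lines.join("\n");
}
```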


Watching progress in Telegram

Each story gets a dedicated Telegram thread (forum topic) in the configured group. Every stage transition, human gate, and status update posts to that thread.

[think] Spec written — 3 ACs identified.
[plan] Plan ready — tap to approve.
  [Approve] [Reject]
[build] Building… (attempt 1)
[review] Critic: 0 critical, 2 major, 1 minor
[build] Fix-up build — addressing 2 major findings
[review] Critic: clean
[test] Inspector: all gates green — bun test ✓, lint ✓, types ✓
[ship] PR opened: #42 — feat(foo): add bar

The /history <storyId> command shows the last 30 events for a story's latest run.


After the PR is open

Per-comment responses

When a reviewer leaves a comment on the PR, the Responder actor posts a one-line classification in-thread:

  • Question: a factual answer.
  • Change request (agreed): "Agreed — will address in next respin."
  • Change request (disagreed): "Disagree — [reason]. Thread left open for your review."
  • Nit: silently skipped (no automated reply).

The Responder is capped at 3 automated replies per thread.
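Here is the Responder's per-comment policy as a sketch, including the 3-reply cap. The actor generates the reply text itself, so the sketch models only the chosen action. CommentKind, ReplyAction and responderAction are hypothetical names.

```typescript
type CommentKind = "question" | "change_agreed" | "change_disagreed" | "nit";
type ReplyAction = "answer" | "agree" | "disagree_leave_open" | "skip";

const MAX_REPLIES_PER_THREAD = 3;

// repliesSoFar: automated replies already posted in this PR comment thread.
function responderAction(kind: CommentKind, repliesSoFar: number): ReplyAction {
  if (repliesSoFar >= MAX_REPLIES_PER_THREAD) return "skip"; // cap reached
  switch (kind) {
    case "question":
      return "answer";
    case "change_agreed":
      return "agree";
    case "change_disagreed":
      return "disagree_leave_open";
    case "nit":
      return "skip"; // nits never get an automated reply
  }
}
```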

Auto-respin

Once reviewers have commented and 10 minutes pass with no new comments (the debounce window), the daemon automatically triggers a respin: a new Build → Review → Test pass that addresses all agreed change requests. The respin pushes additional commits to the existing PR branch.

To bypass the debounce and respin immediately:

/respin <storyId>
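The auto-respin behaves as a debounce. Each new reviewer comment restarts the 10-minute window, and /respin skips the wait. This minimal sketch takes the current time as an argument so that it is easy to test. RespinState and respinDue are hypothetical. The sketch also assumes that a forced respin still needs at least one pending comment.

```typescript
const QUIET_PERIOD_MS = 10 * 60 * 1000; // 10-minute debounce

interface RespinState {
  pendingComments: number;      // reviewer comments not yet addressed by a respin
  lastCommentAt: number | null; // epoch ms of the most recent reviewer comment
}

// forced models /respin <storyId>, which skips the quiet period.
function respinDue(state: RespinState, now: number, forced = false): boolean {
  if (state.pendingComments === 0 || state.lastCommentAt === null) return false;
  if (forced) return true;
  return now - state.lastCommentAt >= QUIET_PERIOD_MS;
}
```

Every new comment updates lastCommentAt. That update restarts the window, so a burst of review comments leads to one respin instead of several.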

Operator commands

Send commands as Telegram messages in any thread the bot can see, or in a direct message to the bot.

| Command | Description |
|---|---|
| /respin <storyId> | Trigger a respin for reviewer comments immediately, bypassing the 10-minute debounce. |
| /requeue <storyId> | Revive a failed story from the stage where it failed, with a fresh retry budget. |
| /restart <storyId> <stage> | Re-run from any pipeline stage (e.g., /restart 4042 plan). Valid stages: think, plan, build, review, test, ship, reflect. |
| /cancel <storyId> | Kill the active run and mark it failed. Cancelled runs are excluded from auto-escape. |
| /history <storyId> | Show the last 30 events of the story's latest run. |
| /vetomerge <storyId> | Block an in-progress auto-merge (send it before the cooldown expires). |
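The following sketch parses these commands and checks /restart stages against the list above. Command, parseCommand and isStage are hypothetical names. In groups, Telegram can address a command to a specific bot (e.g., /respin@yourbot 4042), so the sketch strips any @botname suffix. Whether vlx does the same is an assumption.

```typescript
const STAGES = ["think", "plan", "build", "review", "test", "ship", "reflect"] as const;
type Stage = (typeof STAGES)[number];

type Command =
  | { name: "respin" | "requeue" | "cancel" | "history" | "vetomerge"; storyId: string }
  | { name: "restart"; storyId: string; stage: Stage };

function isStage(s: string | undefined): s is Stage {
  return s !== undefined && (STAGES as readonly string[]).includes(s);
}

// Parses "/name <storyId> [stage]"; returns null for anything malformed.
function parseCommand(text: string): Command | null {
  const parts: (string | undefined)[] = text.trim().split(/\s+/);
  const [head, storyId, stage, ...rest] = parts;
  if (head === undefined || !head.startsWith("/") || storyId === undefined || rest.length > 0) {
    return null;
  }
  // Drop an optional "@botname" suffix, e.g. "/respin@yourbot" -> "respin".
  const name = head.slice(1).split("@")[0];
  switch (name) {
    case "respin":
    case "requeue":
    case "cancel":
    case "history":
    case "vetomerge":
      return stage === undefined ? { name, storyId } : null;
    case "restart":
      return isStage(stage) ? { name, storyId, stage } : null;
    default:
      return null;
  }
}
```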

Auto-escape

When a pre-Ship stage fails, the daemon automatically requeues the story, up to 2 times, before requiring operator intervention. The auto-escape count is tracked per story and accumulates across all of its runs.

At the cap, the run is marked needs_review and you receive a single Telegram notification. Use /requeue or /restart to resume, or /cancel to abandon.
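The auto-escape rule can be written as a small stateful tracker. It is keyed by story ID, so the count persists across runs. This is only a sketch. AutoEscapeTracker is hypothetical, and the in-memory Map stands in for whatever persistent store vlx uses. The sketch does not model how /requeue interacts with the count.

```typescript
const AUTO_ESCAPE_CAP = 2;

type FailureAction = "auto_requeue" | "needs_review" | "none";

// Hypothetical tracker; counts are keyed by story, so they accumulate across runs.
class AutoEscapeTracker {
  private readonly used = new Map<string, number>();

  onPreShipFailure(storyId: string, cancelled: boolean): FailureAction {
    if (cancelled) return "none"; // cancelled runs are excluded from auto-escape
    const n = this.used.get(storyId) ?? 0;
    if (n < AUTO_ESCAPE_CAP) {
      this.used.set(storyId, n + 1);
      return "auto_requeue";
    }
    return "needs_review"; // cap reached: notify the operator once and stop
  }
}
```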


Approving / reviewing the PR

The PR was opened by the agent; code review and merge are still human decisions. Review the diff in ADO (or GitHub) as you normally would. The agent's QA evidence (Critic findings, Inspector results) is in the PR description.

If you want the agent to address your review comments: leave them on the PR, then optionally use /respin <storyId> to trigger the respin immediately rather than waiting for the auto-respin debounce.

Auto-merge

Auto-merge is opt-in and off by default. See Concepts — Auto-merge and Configure — auto_merge.

Internal Veloxcore tool — not a public product.