Concepts
Core concepts behind VeloxSaarthi — the pipeline, actors, memory model, trust score, and auto-merge.
The pipeline
Every story runs through the same deterministic seven-stage pipeline. Stage transitions are pure TypeScript — no LLM decides "what next."
Think → Plan → [APPROVAL] → Build → Review → Test → Ship
↓
(async, triggered by PR outcome)
Reflect

| Stage | Actor | Human gate | Output |
|---|---|---|---|
| Think | Architect (Think mode) | Only if a clarification question fires | stories/<storyId>/spec.md |
| Plan | Architect (Plan mode) | No | stories/<storyId>/plan.md |
| APPROVAL | — (gate) | Configurable — off by default | Approval decision (skipped when the gate is off) |
| Build | Builder | Only if RequestCredential fires | Code + tests + docs committed on vlx-bot/<storyId> |
| Review | Critic | No | stories/<storyId>/review-findings.json |
| Test | Inspector | No | qa-result.json (+ browser evidence for UI stories) |
| Ship | — (orchestrator) | No | PR opened, ADO story linked, Telegram status posted |
| Reflect | Reflector | No | stories/<storyId>/reflection.json |
The APPROVAL gate is optional and disabled by default (approval_gate.enabled). When off, Plan routes straight to Build for fully unattended runs. When on, the daemon posts the plan to Telegram with Approve/Reject buttons and waits before any code is written. vlx init asks during setup (60-second timeout → off); see Configure — approval_gate.
Intake is the orchestrator's queue-claim step, not a pipeline stage.
Reflect runs asynchronously — it is triggered by the PR-status poller observing merged or abandoned, not by Ship completing. Ground-truth signals (CI results, PR review comments, merge/abandon outcome) are required for valid corrections. Running Reflect immediately after Ship would produce LLM self-assessment on a green build, which stores noise, not learning.
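The deterministic routing described in this section could be sketched as a pure function. This is a minimal illustration, not the orchestrator's actual code: the `Stage` type, `GateConfig`, and `nextStage` are hypothetical names, and only the happy path plus the approval gate is modelled.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the real orchestrator API.
type Stage = "think" | "plan" | "approval" | "build" | "review" | "test" | "ship";

interface GateConfig {
  // Mirrors approval_gate.enabled, which is off by default.
  approvalGateEnabled: boolean;
}

// Pure function: the next stage depends only on the current stage and the config.
// Returns null after Ship; Reflect is triggered later by the PR-status poller.
function nextStage(current: Stage, config: GateConfig): Stage | null {
  switch (current) {
    case "think":
      return "plan";
    case "plan":
      return config.approvalGateEnabled ? "approval" : "build";
    case "approval":
      return "build";
    case "build":
      return "review";
    case "review":
      return "test";
    case "test":
      return "ship";
    case "ship":
      return null;
  }
}
```

Because the function has no side effects and no model call, the same inputs always give the same route, which is the property the pipeline relies on.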
Stage retries and fix-up loops
- Build fix-up: if the Critic finds `major` findings, the orchestrator routes back to Build for a fix-up pass (capped at 2 retries per attempt). A third failure escalates.
- Coverage retry: if the Inspector finds missing test coverage, a separate coverage budget (not consuming the main `attemptNo`) sends the Builder back to add the missing tests.
- Mockup fix-up: if the Inspector finds the implemented UI breaks a story mockup at `ac_breaking` severity, the Builder gets one fix attempt on its own budget. If the UI still doesn't match after that, the run does not fail; it ships flagged (Telegram warning + ADO comment + a prominent PR-body section) for a human to judge. Cosmetic mockup drift is reported without gating.
- Plan re-route: if the Builder emits `plan_inconsistent`, the pipeline routes back to Plan (re-entering the APPROVAL gate when enabled). This is the agent adapting to new information, so the orchestrator posts a deviation notification to Telegram + ADO.
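As an illustration of the Build fix-up cap, the routing decision after Review might look like the sketch below. The `Finding` shape, the `ReviewRoute` values, and `routeAfterReview` are assumptions made for this example; only the `major` severity and the cap of 2 retries come from the text above.

```typescript
// Hypothetical sketch; the finding shape and route names are assumptions.
interface Finding {
  severity: string; // e.g. "major"
}

type ReviewRoute = "test" | "build-fix-up" | "escalate";

// "capped at 2 retries per attempt"
const MAX_FIX_UP_RETRIES = 2;

// fixUpsUsed counts fix-up passes already spent in the current attempt.
function routeAfterReview(findings: Finding[], fixUpsUsed: number): ReviewRoute {
  const hasMajor = findings.some((f) => f.severity === "major");
  if (!hasMajor) return "test";
  // Two fix-up passes are allowed; a third failing review escalates.
  return fixUpsUsed < MAX_FIX_UP_RETRIES ? "build-fix-up" : "escalate";
}
```

The coverage and mockup loops would keep their own counters in the same way, so spending one budget never consumes another.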
UI stories: browser-level proof
A story is a UI story when the Architect marks ui: true in spec.md's frontmatter (set for anything a user sees in a browser). For these, the Test stage proves the feature works rather than only running unit gates:
- The harness launches the app under test from the worktree (the `ui_test` config: `start_command` + `url`) and waits for it to respond.
- The Inspector authors and runs real Playwright flows per acceptance criterion, with video recording on plus screenshots at each step. A failed flow is a defect; it routes back to Build like any other test failure.
- If the story has mockups attached to the work item, the Inspector compares each against the implemented UI and records a structured verdict (see the mockup fix-up loop above).
- The evidence — videos, screenshots, and mockup side-by-sides — is posted to the story's Telegram topic and attached to the ADO work item. The PR body gains a per-AC verdict table and a UI-flows table.
A UI story with no ui_test config, or one that produces no flow evidence, escalates fail-closed — it never silently ships untested. See Configure — ui_test.
The harness never trusts the Inspector's word
Pass/fail is computed by the orchestrator from structured fields, never from the Inspector's prose verdict. The harness independently verifies that the required gate floor (bun test) is present, that every story acceptance criterion is accounted for in the evidence, and that every cited evidence file actually exists on disk. Any violation escalates fail-closed.
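The three independent checks could be sketched as follows. The `QaResult` shape here is hypothetical (the real `qa-result.json` schema is not shown in this document), and the file-existence check is injected as a function so the example stays self-contained.

```typescript
// Hypothetical shape for illustration only; not the real qa-result.json schema.
interface QaResult {
  gates: { name: string; passed: boolean }[];
  acCoverage: { acId: string; evidenceFiles: string[] }[];
}

interface Verdict {
  passed: boolean;
  violations: string[]; // any violation escalates fail-closed
}

function verifyQaResult(
  result: QaResult,
  storyAcIds: string[],
  fileExists: (path: string) => boolean,
  requiredGate = "bun test",
): Verdict {
  const violations: string[] = [];

  // 1. The required gate floor must be present in the structured results.
  if (!result.gates.some((g) => g.name === requiredGate)) {
    violations.push(`missing required gate: ${requiredGate}`);
  }

  // 2. Every story acceptance criterion must be accounted for.
  const covered = new Set(result.acCoverage.map((c) => c.acId));
  for (const acId of storyAcIds) {
    if (!covered.has(acId)) violations.push(`AC not accounted for: ${acId}`);
  }

  // 3. Every cited evidence file must actually exist.
  for (const entry of result.acCoverage) {
    for (const file of entry.evidenceFiles) {
      if (!fileExists(file)) violations.push(`evidence file missing: ${file}`);
    }
  }

  // Pass/fail comes from structured fields only, never from prose.
  const passed = violations.length === 0 && result.gates.every((g) => g.passed);
  return { passed, violations };
}
```

In a real harness `fileExists` would wrap a filesystem check; passing it in keeps the verification logic pure and easy to test.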
Six mandatory actors
Each pipeline stage is executed by a named actor — a persona defined in actors/<name>.md. The actor file is loaded as the system prompt for that stage's ACP session.
| Actor | Stage | Output |
|---|---|---|
| Architect (Think mode) | Think | spec.md — problem, AC, constraints, open questions |
| Architect (Plan mode) | Plan | plan.md — file list, AC traceability table, docs-impact list, test approach, rollback |
| Builder | Build | Code + tests + docs-site/ updates committed atomically |
| Critic | Review | review-findings.json — structured findings with severity + category |
| Inspector | Test | qa-result.json — gate results, AC coverage, and (UI stories) browser evidence |
| Reflector | Reflect (async) | reflection.json — ground-truth signals + proposed corrections |
Responder is a seventh actor that operates out-of-band: one read-only turn per new PR review comment, producing an in-thread reply. It is not a pipeline stage.
Architect is stage-parameterized
The same actors/architect.md file describes two stage modes. The orchestrator passes the mode in the initial prompt; the actor invokes different gstack skills and produces different artifacts depending on the mode.
Three conditional actors (Phase 5)
These exist as actor files but are invoked only when a story's tags or file-scope triggers fire:
| Actor | Triggered by | Job |
|---|---|---|
| Designer | Story tagged [ui] or UI file paths in scope | UI sketch + interaction notes + design-system compliance |
| Sentinel | Story tagged [security] or auth/crypto surface touched | OWASP top-10 review, secret scan, dependency CVE check |
| Tuner | Story tagged [perf] or hot-path files touched | Benchmark on hot paths, regression detection |
Memory model
VeloxSaarthi uses two independent memory tiers.
Project memory
Per-project learning committed alongside code in the client repository.
<client-repo>/.vlx/memory/
  conventions.md       naming, layout, gotchas
  dependencies.md      installed packages + why
  <date>-<slug>.md     story learnings, one file per entry

Written by actors during story execution. Committed with the story PR, so it is reviewed by the same PR process as code. Flat markdown, human-readable. Vinit (or any team member) can edit directly.
The Architect reads project memory at the start of Think and Plan. The Inspector extracts learnings after QA. The Builder notes dependency decisions after successful installs.
Agent brain
VeloxSaarthi's own behavioral memory. Lives in agent-brain/ in the VeloxSaarthi repo itself.
agent-brain/
  CLAUDE.md                self-corrections; copied into each run's worktree
  memory/corrections.md    mistake patterns + ground-truth fixes
  CATALOG.md               registry of reusable tools (skills + CLIs)
  bin/                     promoted CLI tools (vlx- prefix)
  skills/                  promoted gstack-style skills

How corrections reach the next session. At session-prime time, BrainSync copies agent-brain/CLAUDE.md to <worktree>/.claude/CLAUDE.md. Claude Code loads this file via its normal cwd walk-up. The worktree is disposable, so the copy is cleaned up with it.
How corrections are written. After a story is merged or abandoned, the Reflector produces reflection.json with proposed_corrections[] — each tied to a ground-truth signal (CI failure fixed, review comment incorporated, Inspector veto addressed). The deterministic extractor (src/core/brain/extract-corrections.ts) filters them and opens a [brain] PR. Vinit reviews + merges. No LLM self-assessment without a signal.
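The "no correction without a signal" rule might be expressed as a simple filter. This sketch does not reproduce src/core/brain/extract-corrections.ts; the `ProposedCorrection` shape, the `SignalKind` values, and the function name are assumptions chosen to mirror the three signal examples named above.

```typescript
// Hypothetical shapes: the real reflection.json schema is not shown in this document.
type SignalKind =
  | "ci_failure_fixed"
  | "review_comment_incorporated"
  | "inspector_veto_addressed";

interface ProposedCorrection {
  pattern: string; // the mistake pattern being corrected
  fix: string; // the ground-truth fix
  signal?: { kind: SignalKind; ref: string }; // evidence backing the correction
}

// Keep only corrections backed by a ground-truth signal with a non-empty reference.
function filterGroundedCorrections(
  proposed: ProposedCorrection[],
): ProposedCorrection[] {
  return proposed.filter(
    (c) => c.signal !== undefined && c.signal.ref.trim().length > 0,
  );
}
```

Anything the Reflector proposes without evidence is dropped before the [brain] PR is opened, so a human reviewer only sees corrections that trace back to an observable outcome.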
Trust score
The trust score is a per-project float (0.0–1.0) that quantifies how reliable the agent has been on this project. It is computed deterministically from the event log:
| Signal | Effect |
|---|---|
| Story ships without Critic `critical` findings | +points |
| Story ships without `major` findings | +points |
| No respin required (PR approved as-is) | +points |
| Respin required | -points |
| Critic found `critical` finding | -points |
| Inspector gate failed | -points |
Auto-merge compares this score against the auto_merge.min_trust_score threshold. The score accumulates across all stories for the project.
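A deterministic score of this kind could be computed by folding over the event log. The document only says "+points" and "-points", so every number below (the weights and the 0.5 baseline) is invented for illustration, as are the signal names and the choice to clamp once at the end.

```typescript
// Hypothetical signal names mapped from the table above.
type TrustSignal =
  | "shipped_without_critical"
  | "shipped_without_major"
  | "no_respin"
  | "respin_required"
  | "critic_critical_finding"
  | "inspector_gate_failed";

// ASSUMED weights: the real point values are not given in this document.
const WEIGHTS: Record<TrustSignal, number> = {
  shipped_without_critical: 0.02,
  shipped_without_major: 0.01,
  no_respin: 0.02,
  respin_required: -0.03,
  critic_critical_finding: -0.05,
  inspector_gate_failed: -0.04,
};

// Sum the weighted events onto an ASSUMED baseline, then clamp to [0, 1].
function trustScore(events: TrustSignal[], baseline = 0.5): number {
  const raw = events.reduce((score, e) => score + WEIGHTS[e], baseline);
  return Math.min(1, Math.max(0, raw));
}
```

Since the result depends only on the ordered event log, replaying the same log always yields the same score.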
Auto-merge
Auto-merge is off by default and must be explicitly enabled in vlx.yaml.
When enabled, a story PR is auto-merged if all of the following hold:
- The story is tagged `auto-merge-ok` in ADO.
- The project's trust score is ≥ `auto_merge.min_trust_score` (default: 0.80).
- The PR has been approved by a reviewer (`vote_approved`) with zero unresolved comments.
- The cooldown period has elapsed since the vote (`auto_merge.cooldown_minutes`, default: 240, i.e. 4 hours).
- No `/vetomerge <storyId>` command was received before the cooldown expired.
The auto-merge waits out the cooldown to give reviewers time to /vetomerge if they change their mind after approving. After cooldown, the orchestrator calls the SCM host's merge API deterministically.
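The conditions above can be read as a single predicate. The sketch below is one possible encoding; the `AutoMergeConfig` and `PrState` shapes and the `canAutoMerge` name are assumptions, while the tag name, defaults, and conditions follow the list above.

```typescript
// Hypothetical shapes for illustration; field names are assumptions.
interface AutoMergeConfig {
  enabled: boolean; // off by default
  minTrustScore: number; // auto_merge.min_trust_score, default 0.80
  cooldownMinutes: number; // auto_merge.cooldown_minutes, default 240
}

interface PrState {
  storyTags: string[];
  trustScore: number;
  voteApprovedAt: Date | null; // null until a reviewer approves
  unresolvedComments: number;
  vetoReceived: boolean; // a /vetomerge arrived before the cooldown expired
}

const DEFAULT_AUTO_MERGE: AutoMergeConfig = {
  enabled: false,
  minTrustScore: 0.8,
  cooldownMinutes: 240,
};

// All conditions must hold; any single failure blocks the merge.
function canAutoMerge(cfg: AutoMergeConfig, pr: PrState, now: Date): boolean {
  if (!cfg.enabled) return false;
  if (!pr.storyTags.includes("auto-merge-ok")) return false;
  if (pr.trustScore < cfg.minTrustScore) return false;
  if (pr.voteApprovedAt === null || pr.unresolvedComments > 0) return false;
  const cooldownMs = cfg.cooldownMinutes * 60 * 1000;
  if (now.getTime() - pr.voteApprovedAt.getTime() < cooldownMs) return false;
  return !pr.vetoReceived;
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the decision reproducible.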
WARNING
Auto-merge is a trust-tier feature. Enable it only after you are satisfied with the agent's track record (trust score ≥ 0.80, several clean consecutive stories). The /vetomerge escape hatch gives you a window to block any individual merge.
Pause / resume model
Mid-stage clarifications and credential asks use a deferred-tool-return pattern:
- Agent calls `mcp__vlx__AskUserQuestion` or `mcp__vlx__RequestCredential` via the MCP bridge.
- The bridge posts the question to Telegram and returns immediately with `{ "status": "deferred" }`.
- Agent emits a one-line acknowledgement and ends its turn.
- Orchestrator persists the session ID and terminates the ACP subprocess.
- On Telegram reply: orchestrator spawns a fresh ACP subprocess, loads the persisted session, and sends the reply as the next prompt.
- Stage continues — no context replay cost.
This design handles intentional pauses (waiting for a human) and unintentional restarts (OS update, daemon deploy) uniformly. Keeping a process blocked through hours of human delay is not viable.
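The bookkeeping behind the deferred-tool-return pattern might be sketched like this. `PauseStore` and its methods are hypothetical, and the in-memory `Map` stands in for whatever durable storage the orchestrator actually uses; a real implementation would have to survive a daemon restart.

```typescript
// Hypothetical sketch; an in-memory Map stands in for durable storage.
interface PausedRun {
  runId: string;
  sessionId: string; // persisted so a fresh ACP subprocess can load the session
  question: string;
}

class PauseStore {
  private paused = new Map<string, PausedRun>();

  // Called when the agent asks a question: record the pause and return
  // immediately instead of blocking on the human.
  defer(run: PausedRun): { status: "deferred" } {
    this.paused.set(run.runId, run);
    return { status: "deferred" };
  }

  // Called when the reply arrives: hand back the session to load and the
  // reply to send as the next prompt. Returns null for unknown runs.
  resume(runId: string, reply: string): { sessionId: string; prompt: string } | null {
    const run = this.paused.get(runId);
    if (run === undefined) return null;
    this.paused.delete(runId);
    return { sessionId: run.sessionId, prompt: reply };
  }
}
```

Because resuming needs only the stored session ID, the same path serves a reply that arrives minutes later and one that arrives after the daemon has been redeployed.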
Mermaid diagrams note
The docs site uses VitePress's built-in fenced code block rendering. Fenced ```mermaid ``` blocks are rendered as code (not as diagrams) without an additional plugin.
If you want to render a diagram from this documentation, paste the block into the Mermaid Live Editor. The decision not to add vitepress-plugin-mermaid is intentional — it adds mermaid + cytoscape as peer dependencies with known ESM compatibility issues, which is not worth the cost for an internal site with sparse diagram use.
Idempotency
Every external call (ADO status update, Telegram post, branch push, PR creation) carries an idempotency key derived from (run_id, stage, intent_hash). On daemon restart after a crash, reconcileIntents checks the external system's state for each "intended but unconfirmed" side effect and either records completion or retries idempotently. No duplicate posts or double PRs.
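One way to derive such a key and reconcile unconfirmed side effects is sketched below. The function names and the `Intent` shape are assumptions (this is not the actual `reconcileIntents`), and the key is a plain serialized tuple rather than whatever encoding the daemon really uses.

```typescript
// Hypothetical sketch; names and the key encoding are assumptions.
interface Intent {
  runId: string;
  stage: string;
  intentHash: string;
}

// Serializing the tuple as JSON avoids collisions from delimiter characters
// appearing inside any of the three parts.
function idempotencyKey(intent: Intent): string {
  return JSON.stringify([intent.runId, intent.stage, intent.intentHash]);
}

interface ReconcileResult {
  confirmed: string[]; // side effect already visible externally: record completion
  retry: string[]; // not visible: safe to retry with the same key
}

// existsExternally stands in for a lookup against ADO, Telegram, or the SCM host.
function reconcileUnconfirmed(
  unconfirmed: Intent[],
  existsExternally: (key: string) => boolean,
): ReconcileResult {
  const result: ReconcileResult = { confirmed: [], retry: [] };
  for (const intent of unconfirmed) {
    const key = idempotencyKey(intent);
    (existsExternally(key) ? result.confirmed : result.retry).push(key);
  }
  return result;
}
```

The same tuple always produces the same key, so a retry after a crash cannot be mistaken for a new side effect.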
Watchdog
A lightweight in-process supervisor runs every 30 seconds:
- Lease extension: extends queue claim leases for `waiting` runs so they are not reclaimed while awaiting human input.
- Clarification timeout: abandons `waiting` runs whose oldest pending clarification exceeds 72 hours (configurable). Posts "abandoned — reassign in ADO" to the story thread, updates ADO status.
- Stale-heartbeat detection: marks `active` runs whose latest ACP session heartbeat exceeds 30 minutes as `needs_review`. Catches hangs that survive the startup recovery pass.
The watchdog never calls an LLM and never makes routing decisions — it only surfaces problems for operator action.
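A single watchdog tick could be modelled as a pure function from run snapshots to actions. The `RunSnapshot` and `WatchdogAction` shapes are hypothetical, and letting the clarification timeout take precedence over lease extension for the same run is an assumption; the 72-hour and 30-minute thresholds come from the list above.

```typescript
// Hypothetical shapes; the real run record is not shown in this document.
interface RunSnapshot {
  id: string;
  status: "waiting" | "active";
  oldestClarificationAt?: Date; // for waiting runs
  lastHeartbeatAt?: Date; // for active runs
}

interface WatchdogAction {
  runId: string;
  kind: "extend_lease" | "abandon" | "mark_needs_review";
}

interface WatchdogLimits {
  clarificationTimeoutHours: number;
  staleHeartbeatMinutes: number;
}

const DEFAULT_LIMITS: WatchdogLimits = {
  clarificationTimeoutHours: 72,
  staleHeartbeatMinutes: 30,
};

// No LLM call and no routing: the tick only reports what needs attention.
function watchdogTick(
  runs: RunSnapshot[],
  now: Date,
  limits: WatchdogLimits = DEFAULT_LIMITS,
): WatchdogAction[] {
  const actions: WatchdogAction[] = [];
  for (const run of runs) {
    if (run.status === "waiting") {
      const waitedMs = run.oldestClarificationAt
        ? now.getTime() - run.oldestClarificationAt.getTime()
        : 0;
      const timedOut = waitedMs > limits.clarificationTimeoutHours * 60 * 60 * 1000;
      // ASSUMPTION: a timed-out run is abandoned rather than having its lease extended.
      actions.push({ runId: run.id, kind: timedOut ? "abandon" : "extend_lease" });
    } else if (run.lastHeartbeatAt) {
      const silentMs = now.getTime() - run.lastHeartbeatAt.getTime();
      if (silentMs > limits.staleHeartbeatMinutes * 60 * 1000) {
        actions.push({ runId: run.id, kind: "mark_needs_review" });
      }
    }
  }
  return actions;
}
```

A supervisor would call this every 30 seconds and apply the returned actions, leaving every routing decision to the orchestrator.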