Concepts

Core concepts behind VeloxSaarthi — the pipeline, actors, memory model, trust score, and auto-merge.

The pipeline

Every story runs through the same deterministic seven-stage pipeline. Stage transitions are pure TypeScript — no LLM decides "what next."

Think → Plan → [APPROVAL] → Build → Review → Test → Ship
                                                       ↓
                                    (async, triggered by PR outcome)
                                                    Reflect

| Stage | Actor | Human gate | Output |
| --- | --- | --- | --- |
| Think | Architect (Think mode) | Only if a clarification question fires | `stories/<storyId>/spec.md` |
| Plan | Architect (Plan mode) | No | `stories/<storyId>/plan.md` |
| APPROVAL | None (gate) | Configurable (off by default) | Approval decision (skipped when the gate is off) |
| Build | Builder | Only if `RequestCredential` fires | Code + tests + docs committed on `vlx-bot/<storyId>` |
| Review | Critic | No | `stories/<storyId>/review-findings.json` |
| Test | Inspector | No | `qa-result.json` (+ browser evidence for UI stories) |
| Ship | None (orchestrator) | No | PR opened, ADO story linked, Telegram status posted |
| Reflect | Reflector | No | `stories/<storyId>/reflection.json` |

The APPROVAL gate is optional and disabled by default (approval_gate.enabled). When off, Plan routes straight to Build for fully unattended runs. When on, the daemon posts the plan to Telegram with Approve/Reject buttons and waits for a decision before any code is written. vlx init asks about the gate during setup and leaves it off if the prompt goes unanswered for 60 seconds; see the approval_gate section of Configure.
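The happy-path stage order and the optional gate can be sketched as a pure function. This is an illustration only: the type and function names are hypothetical, and only the stage order and the approval_gate.enabled flag come from this page.

```typescript
// Hypothetical names; not the orchestrator's actual source.
type Stage = "Think" | "Plan" | "APPROVAL" | "Build" | "Review" | "Test" | "Ship";

interface PipelineConfig {
  approvalGateEnabled: boolean; // mirrors approval_gate.enabled (off by default)
}

// Happy-path successor of each stage. Ship has no synchronous successor:
// Reflect is started later by the PR-status poller, so this returns null.
function nextStage(current: Stage, config: PipelineConfig): Stage | null {
  switch (current) {
    case "Think":
      return "Plan";
    case "Plan":
      return config.approvalGateEnabled ? "APPROVAL" : "Build";
    case "APPROVAL":
      return "Build";
    case "Build":
      return "Review";
    case "Review":
      return "Test";
    case "Test":
      return "Ship";
    case "Ship":
      return null;
  }
}
```

Because the function takes only the current stage and the config, the same inputs always yield the same next stage, which is the property the pipeline relies on.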

Intake is the orchestrator's queue-claim step, not a pipeline stage.

Reflect runs asynchronously — it is triggered by the PR-status poller observing merged or abandoned, not by Ship completing. Ground-truth signals (CI results, PR review comments, merge/abandon outcome) are required for valid corrections. Running Reflect immediately after Ship would produce LLM self-assessment on a green build, which stores noise, not learning.

Stage retries and fix-up loops

Build fix-up: if the Critic reports major findings, the orchestrator routes back to Build for a fix-up pass (capped at 2 retries per attempt). A third failure escalates.
Coverage retry: if the Inspector finds missing test coverage, a separate coverage budget (not consuming the main attemptNo) sends the Builder back to add the missing tests.
Mockup fix-up: if the Inspector finds the implemented UI breaks a story mockup at ac_breaking severity, the Builder gets one fix attempt on its own budget. If the UI still doesn't match after that, the run does not fail — it ships flagged (Telegram warning + ADO comment + a prominent PR-body section) for a human to judge. Cosmetic mockup drift is reported without gating.
Plan re-route: if the Builder emits plan_inconsistent, the pipeline routes back to Plan (re-entering the APPROVAL gate when enabled). This is the agent adapting to new information — the orchestrator posts a deviation notification to Telegram + ADO.
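Two of the loops above, the Build fix-up cap and the mockup fix-up budget, can be sketched as small routing functions. The function names, parameter names, and route labels are hypothetical; the cap of 2 retries, the single mockup fix attempt, and the ac_breaking severity come from the list above.

```typescript
// Hypothetical route labels for illustration.
type ReviewRoute = "Test" | "Build" | "Escalate";
type MockupRoute = "Ship" | "Build" | "ShipFlagged";
type MockupSeverity = "none" | "cosmetic" | "ac_breaking";

const MAX_FIXUP_RETRIES = 2; // "capped at 2 retries per attempt"

// After Review: no major findings continues to Test; otherwise retry Build
// until the fix-up budget is spent, then escalate.
function routeAfterReview(hasMajorFindings: boolean, fixupsUsed: number): ReviewRoute {
  if (!hasMajorFindings) return "Test";
  return fixupsUsed < MAX_FIXUP_RETRIES ? "Build" : "Escalate";
}

// After the mockup comparison: only ac_breaking gates. The Builder gets one
// fix attempt; if that is already used, the run ships flagged instead of failing.
function routeAfterMockupCheck(severity: MockupSeverity, mockupFixUsed: boolean): MockupRoute {
  if (severity !== "ac_breaking") return "Ship";
  return mockupFixUsed ? "ShipFlagged" : "Build";
}
```

Keeping each budget as its own counter is what lets a coverage or mockup retry run without consuming the main attempt number.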

UI stories: browser-level proof

A story is a UI story when the Architect marks ui: true in spec.md's frontmatter (set for anything a user sees in a browser). For these, the Test stage proves the feature works rather than only running unit gates:

The harness launches the app under test from the worktree (the ui_test config: start_command + url) and waits for it to respond.
The Inspector authors and runs real Playwright flows per acceptance criterion, with video recording on plus screenshots at each step. A failed flow is a defect — it routes back to Build like any other test failure.
If the story has mockups attached to the work item, the Inspector compares each against the implemented UI and records a structured verdict (see the mockup fix-up loop above).
The evidence — videos, screenshots, and mockup side-by-sides — is posted to the story's Telegram topic and attached to the ADO work item. The PR body gains a per-AC verdict table and a UI-flows table.

A UI story with no ui_test config, or one that produces no flow evidence, escalates fail-closed — it never silently ships untested. See Configure — ui_test.

The harness never trusts the Inspector's word

Pass/fail is computed by the orchestrator from structured fields, never from the Inspector's prose verdict. The harness independently verifies that the required gate floor (bun test) is present, that every story acceptance criterion is accounted for in the evidence, and that every cited evidence file actually exists on disk. Any violation escalates fail-closed.
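The three structural checks can be sketched as follows. The QaResult shape and the function name are assumptions made for illustration (this page does not document the schema of qa-result.json); the checks themselves follow the paragraph above.

```typescript
// Hypothetical shape for the structured fields in qa-result.json.
interface QaResult {
  gates: { name: string; passed: boolean }[];
  acCoverage: { acId: string; evidenceFiles: string[] }[];
}

const REQUIRED_GATE = "bun test"; // the gate floor named in the text

// Returns every violation found; an empty list means the structural checks pass.
// fileExists is injected so the check can be exercised without touching disk.
function findViolations(
  result: QaResult,
  storyAcIds: string[],
  fileExists: (path: string) => boolean,
): string[] {
  const violations: string[] = [];
  if (!result.gates.some((g) => g.name === REQUIRED_GATE)) {
    violations.push(`missing required gate: ${REQUIRED_GATE}`);
  }
  const covered = new Set(result.acCoverage.map((c) => c.acId));
  for (const acId of storyAcIds) {
    if (!covered.has(acId)) violations.push(`acceptance criterion not covered: ${acId}`);
  }
  for (const entry of result.acCoverage) {
    for (const file of entry.evidenceFiles) {
      if (!fileExists(file)) violations.push(`evidence file not found: ${file}`);
    }
  }
  return violations;
}
```

In a real harness, fileExists could be existsSync from node:fs. Any non-empty result would escalate fail-closed, regardless of what the Inspector's prose verdict says.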

Six mandatory actors

Each pipeline stage is executed by a named actor — a persona defined in actors/<name>.md. The actor file is loaded as the system prompt for that stage's ACP session.

| Actor | Stage | Output |
| --- | --- | --- |
| Architect (Think mode) | Think | `spec.md`: problem, AC, constraints, open questions |
| Architect (Plan mode) | Plan | `plan.md`: file list, AC traceability table, docs-impact list, test approach, rollback |
| Builder | Build | Code + tests + `docs-site/` updates committed atomically |
| Critic | Review | `review-findings.json`: structured findings with severity + category |
| Inspector | Test | `qa-result.json`: gate results, AC coverage, and (UI stories) browser evidence |
| Reflector | Reflect (async) | `reflection.json`: ground-truth signals + proposed corrections |

Responder is a seventh actor that operates out-of-band: one read-only turn per new PR review comment, producing an in-thread reply. It is not a pipeline stage.

Architect is stage-parameterized

The same actors/architect.md file describes two stage modes. The orchestrator passes the mode in the initial prompt; the actor invokes different gstack skills and produces different artifacts depending on the mode.

Three conditional actors (Phase 5)

These exist as actor files but are invoked only when a story's tags or file-scope triggers fire:

| Actor | Triggered by | Job |
| --- | --- | --- |
| Designer | Story tagged `[ui]` or UI file paths in scope | UI sketch + interaction notes + design-system compliance |
| Sentinel | Story tagged `[security]` or auth/crypto surface touched | OWASP top-10 review, secret scan, dependency CVE check |
| Tuner | Story tagged `[perf]` or hot-path files touched | Benchmark on hot paths, regression detection |
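A trigger check of this kind could look like the sketch below. The tags are taken from the table; the path patterns are made-up placeholders, since this page does not say how UI, auth/crypto, or hot-path files are recognized, and the function name is hypothetical.

```typescript
type ConditionalActor = "Designer" | "Sentinel" | "Tuner";

interface Trigger {
  actor: ConditionalActor;
  tag: string;
  pathPattern: RegExp; // placeholder pattern, not the real file-scope rule
}

// Tags come from the table above; every path pattern is an assumption.
const TRIGGERS: Trigger[] = [
  { actor: "Designer", tag: "[ui]", pathPattern: /\.(tsx|css)$/ },
  { actor: "Sentinel", tag: "[security]", pathPattern: /(^|\/)(auth|crypto)\// },
  { actor: "Tuner", tag: "[perf]", pathPattern: /(^|\/)hot-path\// },
];

// An actor is selected when either its tag is present or any file in scope
// matches its pattern.
function selectConditionalActors(tags: string[], filesInScope: string[]): ConditionalActor[] {
  return TRIGGERS
    .filter((t) => tags.includes(t.tag) || filesInScope.some((f) => t.pathPattern.test(f)))
    .map((t) => t.actor);
}
```

For example, a story tagged `[perf]` that also touches `src/auth/login.ts` would select both Sentinel and Tuner under these placeholder patterns.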

Memory model

VeloxSaarthi uses two independent memory tiers.

Project memory

Per-project learning committed alongside code in the client repository.

<client-repo>/.vlx/memory/
  conventions.md          naming, layout, gotchas
  dependencies.md         installed packages + why
  <date>-<slug>.md        story learnings, one file per entry

Written by actors during story execution. Committed with the story PR — reviewed by the same PR process as code. Flat markdown, human-readable. Vinit (or any team member) can edit directly.

The Architect reads project memory at the start of Think and Plan. The Inspector extracts learnings after QA. The Builder notes dependency decisions after successful installs.

Agent brain

VeloxSaarthi's own behavioral memory. Lives in agent-brain/ in the VeloxSaarthi repo itself.

agent-brain/
  CLAUDE.md               self-corrections; copied into each run's worktree
  memory/corrections.md   mistake patterns + ground-truth fixes
  CATALOG.md              registry of reusable tools (skills + CLIs)
  bin/                    promoted CLI tools (vlx- prefix)
  skills/                 promoted gstack-style skills

How corrections reach the next session. At session-prime time, BrainSync copies agent-brain/CLAUDE.md to <worktree>/.claude/CLAUDE.md. Claude Code loads this file via its normal cwd walk-up. The worktree is disposable, so the copy is cleaned up with it.
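The copy step might be sketched like this, using Node's fs module. The function name and parameters are hypothetical; only the source and destination paths come from the paragraph above.

```typescript
import { copyFileSync, mkdirSync } from "node:fs";
import { join } from "node:path";

// Copies agent-brain/CLAUDE.md into <worktree>/.claude/CLAUDE.md and returns
// the destination path. brainDir and worktreeDir are hypothetical parameters.
function syncBrainToWorktree(brainDir: string, worktreeDir: string): string {
  const claudeDir = join(worktreeDir, ".claude");
  mkdirSync(claudeDir, { recursive: true }); // no error if it already exists
  const target = join(claudeDir, "CLAUDE.md");
  copyFileSync(join(brainDir, "CLAUDE.md"), target); // overwrites a stale copy
  return target;
}
```

No explicit cleanup is needed in this sketch, because the copy lives inside the disposable worktree and is removed with it.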

How corrections are written. After a story is merged or abandoned, the Reflector produces reflection.json with proposed_corrections[] — each tied to a ground-truth signal (CI failure fixed, review comment incorporated, Inspector veto addressed). The deterministic extractor (src/core/brain/extract-corrections.ts) filters them and opens a [brain] PR. Vinit reviews + merges. No LLM self-assessment without a signal.

Trust score

The trust score is a per-project float (0.0–1.0) that quantifies how reliable the agent has been on this project. It is computed deterministically from the event log:

| Signal | Effect |
| --- | --- |
| Story ships without Critic `critical` findings | +points |
| Story ships without `major` findings | +points |
| No respin required (PR approved as-is) | +points |
| Respin required | -points |
| Critic found `critical` finding | -points |
| Inspector gate failed | -points |

The score is compared against auto_merge.min_trust_score to gate auto-merge. It accumulates across all stories for the project.
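A deterministic fold over the event log could be sketched as below. The signal names, the weights, and the starting value of 0.5 are all invented for illustration, since this page gives only the direction of each effect, not its size.

```typescript
// Hypothetical signal names, one per row of the table above.
type TrustSignal =
  | "shipped_no_critical"
  | "shipped_no_major"
  | "no_respin"
  | "respin"
  | "critical_finding"
  | "inspector_gate_failed";

// HYPOTHETICAL weights: only the signs are taken from the table.
const WEIGHTS: Record<TrustSignal, number> = {
  shipped_no_critical: 0.02,
  shipped_no_major: 0.02,
  no_respin: 0.03,
  respin: -0.05,
  critical_finding: -0.1,
  inspector_gate_failed: -0.05,
};

// Folds the project's signals into a score clamped to the 0.0-1.0 range.
// The initial value is an assumption.
function computeTrustScore(events: TrustSignal[], initial = 0.5): number {
  const raw = events.reduce((score, e) => score + WEIGHTS[e], initial);
  return Math.min(1, Math.max(0, raw));
}
```

Since the function reads nothing but the event list, replaying the same log always reproduces the same score.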

Auto-merge

Auto-merge is off by default and must be explicitly enabled in vlx.yaml.

When enabled, a story PR is auto-merged if all of the following hold:

The story is tagged auto-merge-ok in ADO.
The project's trust score is ≥ auto_merge.min_trust_score (default: 0.80).
The PR has been approved by a reviewer (vote_approved) with zero unresolved comments.
The cooldown period has elapsed since the vote (auto_merge.cooldown_minutes, default: 240, i.e. 4 hours).
No /vetomerge <storyId> command was received before the cooldown expired.

Auto-merge waits out the cooldown so that reviewers have time to issue /vetomerge if they change their mind after approving. After the cooldown, the orchestrator calls the SCM host's merge API deterministically.
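The five conditions can be read as one predicate. The interface and field names below are hypothetical; the tag name, the defaults (0.80 and 240 minutes), and the conditions themselves come from the list above.

```typescript
interface AutoMergeConfig {
  enabled: boolean;        // off by default
  minTrustScore: number;   // auto_merge.min_trust_score, default 0.80
  cooldownMinutes: number; // auto_merge.cooldown_minutes, default 240
}

// Hypothetical snapshot of what the orchestrator knows about a PR.
interface PrSnapshot {
  storyTags: string[];
  trustScore: number;
  approvedAt: Date | null; // time of the approving vote, if any
  unresolvedComments: number;
  vetoReceived: boolean;   // a /vetomerge arrived for this story
}

const DEFAULT_AUTO_MERGE: AutoMergeConfig = { enabled: false, minTrustScore: 0.8, cooldownMinutes: 240 };

function canAutoMerge(cfg: AutoMergeConfig, pr: PrSnapshot, now: Date): boolean {
  if (!cfg.enabled) return false;
  if (!pr.storyTags.includes("auto-merge-ok")) return false;
  if (pr.trustScore < cfg.minTrustScore) return false;
  if (pr.approvedAt === null || pr.unresolvedComments > 0) return false;
  const cooldownEndsAt = pr.approvedAt.getTime() + cfg.cooldownMinutes * 60_000;
  if (now.getTime() < cooldownEndsAt) return false;
  return !pr.vetoReceived;
}
```

Passing `now` as a parameter rather than reading the clock inside the function keeps the decision reproducible.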

WARNING

Auto-merge is a trust-tier feature. Enable it only after you are satisfied with the agent's track record (trust score ≥ 0.80, several clean consecutive stories). The /vetomerge escape hatch gives you a window to block any individual merge.

Pause / resume model

Mid-stage clarifications and credential asks use a deferred-tool-return pattern:

Agent calls mcp__vlx__AskUserQuestion or mcp__vlx__RequestCredential via the MCP bridge.
The bridge posts the question to Telegram and returns immediately with { "status": "deferred" }.
Agent emits a one-line acknowledgement and ends its turn.
Orchestrator persists the session ID and terminates the ACP subprocess.
On Telegram reply: orchestrator spawns a fresh ACP subprocess, loads the persisted session, and sends the reply as the next prompt.
Stage continues — no context replay cost.

This design handles intentional pauses (waiting for a human) and unintentional restarts (OS update, daemon deploy) uniformly. Keeping a process blocked through hours of human delay is not viable.
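The bookkeeping behind the steps above can be sketched with a small registry. This is an in-memory illustration with hypothetical names; per the steps above, the real orchestrator persists the session ID so that a paused run survives a restart.

```typescript
interface PausedRun {
  sessionId: string;
  question: string;
}

interface DeferredResult {
  status: "deferred";
}

// In-memory stand-in for the orchestrator's persisted pause state.
class PauseRegistry {
  private readonly paused = new Map<string, PausedRun>();

  // The bridge records the pending question and returns immediately,
  // so the agent can end its turn and the subprocess can be terminated.
  defer(runId: string, sessionId: string, question: string): DeferredResult {
    this.paused.set(runId, { sessionId, question });
    return { status: "deferred" };
  }

  // On a reply: hand back the session to reload and the reply to send as
  // the next prompt. Returns null when the run is not paused.
  resume(runId: string, reply: string): { sessionId: string; nextPrompt: string } | null {
    const run = this.paused.get(runId);
    if (run === undefined) return null;
    this.paused.delete(runId);
    return { sessionId: run.sessionId, nextPrompt: reply };
  }
}
```

Nothing in this flow holds a process open while waiting, which is why a daemon restart and a human delay look the same to the orchestrator.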

Mermaid diagrams note

The docs site uses VitePress's built-in fenced code block rendering. Fenced ```mermaid ``` blocks are rendered as code (not as diagrams) without an additional plugin.

If you want to render a diagram from this documentation, paste the block into the Mermaid Live Editor. The decision not to add vitepress-plugin-mermaid is intentional — it adds mermaid + cytoscape as peer dependencies with known ESM compatibility issues, which is not worth the cost for an internal site with sparse diagram use.

Idempotency

Every external call (ADO status update, Telegram post, branch push, PR creation) carries an idempotency key derived from (run_id, stage, intent_hash). On daemon restart after a crash, reconcileIntents checks the external system's state for each "intended but unconfirmed" side effect and either records completion or retries idempotently. No duplicate posts or double PRs.
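One way to derive such a key is to hash the triple. The sketch below assumes SHA-256 over a JSON-encoded array; the page states only that the key is derived from (run_id, stage, intent_hash), so the hash function and encoding are assumptions, as is the function name.

```typescript
import { createHash } from "node:crypto";

// Derives a stable key from (run_id, stage, intent_hash). JSON-encoding the
// array keeps field boundaries unambiguous, so ("a", "bc") and ("ab", "c")
// cannot collide by simple concatenation.
function idempotencyKey(runId: string, stage: string, intentHash: string): string {
  return createHash("sha256")
    .update(JSON.stringify([runId, stage, intentHash]))
    .digest("hex");
}
```

Because the key depends only on its inputs, a retry after a crash produces the same key as the original attempt, which lets the external system or the reconciliation pass recognize the duplicate.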

Watchdog

A lightweight in-process supervisor runs every 30 seconds:

Lease extension: extends queue claim leases for waiting runs so they are not reclaimed while awaiting human input.
Clarification timeout: abandons waiting runs whose oldest pending clarification has been open for more than 72 hours (configurable). Posts "abandoned — reassign in ADO" to the story thread and updates the ADO status.
Stale-heartbeat detection: marks active runs whose latest ACP session heartbeat is more than 30 minutes old as needs_review. Catches hangs that survive the startup recovery pass.

The watchdog never calls an LLM and never makes routing decisions — it only surfaces problems for operator action.
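A single watchdog pass can be sketched as a pure function over run snapshots. The snapshot shape, action labels, and function name are hypothetical; the 72-hour and 30-minute thresholds come from the list above.

```typescript
// Hypothetical view of a run as the watchdog sees it.
interface RunSnapshot {
  id: string;
  state: "waiting" | "active";
  oldestPendingClarificationAt?: Date;
  lastHeartbeatAt?: Date;
}

type WatchdogAction = "extend_lease" | "abandon" | "needs_review";

const HOUR_MS = 3_600_000;
const STALE_HEARTBEAT_MS = 30 * 60_000; // 30 minutes

// One pass: decides an action per run from timestamps alone.
function watchdogTick(
  runs: RunSnapshot[],
  now: Date,
  clarificationTimeoutHours = 72, // configurable, per the text
): Map<string, WatchdogAction> {
  const actions = new Map<string, WatchdogAction>();
  for (const run of runs) {
    if (run.state === "waiting") {
      const asked = run.oldestPendingClarificationAt;
      const expired =
        asked !== undefined &&
        now.getTime() - asked.getTime() > clarificationTimeoutHours * HOUR_MS;
      actions.set(run.id, expired ? "abandon" : "extend_lease");
    } else {
      const beat = run.lastHeartbeatAt;
      if (beat !== undefined && now.getTime() - beat.getTime() > STALE_HEARTBEAT_MS) {
        actions.set(run.id, "needs_review");
      }
    }
  }
  return actions;
}
```

A caller could run this on a 30-second timer and apply the returned actions. The decisions are plain timestamp comparisons, with no model call and no routing logic.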
