Operations
Day-to-day tasks: debugging stuck runs, managing the database, memory reports, and recovery procedures.
Debug a stuck run
Step 1: Check active runs
bash
vlx status
Shows all active runs, their current stage, and time since last event. A run that has not advanced in > 30 minutes is likely stuck (the watchdog marks it needs_review at the 30-minute threshold).
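The 30-minute rule can also be applied by hand once you know when a run last emitted an event. The following is a minimal sketch, assuming the last event time is available as a Unix timestamp; `is_stuck` is a hypothetical helper and not part of the vlx CLI.

```bash
# Hypothetical helper, not part of the vlx CLI.
# Usage: is_stuck <last_event_epoch> [now_epoch]
# Succeeds (exit 0) when the last event is more than 30 minutes old.
is_stuck() {
  local last_event=$1
  local now=${2:-$(date +%s)}
  local threshold=$((30 * 60))   # 30 minutes, in seconds
  [ $((now - last_event)) -gt "$threshold" ]
}

# Example: a run whose last event was 45 minutes ago.
now=$(date +%s)
if is_stuck $((now - 45 * 60)) "$now"; then
  echo "likely stuck"
fi
```

The optional second argument exists only so the check can be exercised with fixed timestamps.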
Step 2: Check Telegram
The story's Telegram thread contains every status update. Look for:
- A clarification question that has not been answered.
- An APPROVAL request that has expired.
- An error message from the orchestrator.
Step 3: Get the event trail
bash
# In Telegram:
/history <storyId>
Prints the last 30 events for the story's latest run: stage transitions, clarifications asked/answered, approvals, gate failures, PR events.
Step 4: Check stage checkpoints
Per-stage checkpoint manifests are stored locally in the run's worktree:
.vlx/worktrees/<run-id>/.vlx/<storyId>/checkpoints/<stage>.json
These are read-only (orchestrator-owned, gitignored). Each manifest records the completed stage, its artifacts with SHA-256 hashes, and the commit SHA. If a manifest exists for a stage, that stage is considered complete.
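To confirm by hand that an artifact still matches the hash recorded for it, you can recompute its SHA-256. The manifest's JSON layout is not shown here, so this sketch takes the expected hash as an argument instead of parsing the manifest; `artifact_matches` is a hypothetical helper, and it assumes GNU coreutils `sha256sum` (on macOS, `shasum -a 256` prints the same format).

```bash
# Hypothetical helper, not part of the vlx CLI.
# Usage: artifact_matches <file> <expected_sha256>
# Succeeds (exit 0) when the file's SHA-256 equals the expected value.
artifact_matches() {
  local file=$1 expected=$2 actual
  # sha256sum prints "<hash>  <file>"; keep only the hash.
  actual=$(sha256sum "$file" 2>/dev/null | awk '{print $1}')
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

A missing file produces an empty hash, so the helper fails instead of reporting a match.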
Step 5: Check DB integrity
bash
vlx db check
Runs PRAGMA integrity_check on state.db. Any output other than ok indicates corruption; see Database restore below.
Step 6: Apply an operator command
| Symptom | Command |
|---|---|
| Run stuck / no progress | /requeue <storyId> — revive from the stage it died at |
| Need to re-run from a specific stage | /restart <storyId> <stage> |
| Run is irrecoverably broken | /cancel <storyId> — kill the run |
| Plan needs a full redo | /restart <storyId> think |
Database backup
The daemon takes a daily backup at startup. Take a manual snapshot before any risky operation:
bash
vlx db backup
Backups are stored in backups/ adjacent to state.db (or $VLX_DB_PATH).
Retention policy:
- Last 14 daily snapshots.
- Last 12 monthly snapshots.
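As an illustration of how a policy like this selects snapshots, the sketch below filters a list of snapshot dates. It is not the vlx implementation: it assumes "daily" means the 14 most recent snapshots and "monthly" means the newest snapshot in each of the 12 most recent months that have one. `snapshots_to_keep` is a hypothetical name.

```bash
# Hypothetical sketch, not the vlx implementation.
# stdin:  snapshot dates, one per line, formatted YYYY-MM-DD
# stdout: the dates a 14-daily / 12-monthly policy would keep, newest first
snapshots_to_keep() {
  sort -r -u | awk '
    {
      month = substr($0, 1, 7)            # YYYY-MM
      keep = 0
      if (NR <= 14) keep = 1              # the 14 newest snapshots
      if (!(month in seen)) {             # first (newest) snapshot seen for this month
        seen[month] = 1
        if (++months <= 12) keep = 1      # only for the 12 most recent months
      }
      if (keep) print
    }'
}
```

With twenty snapshots in one month and two in the previous month, this keeps the newest fourteen plus the newest snapshot of the previous month.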
Database restore
Destructive. Stop the daemon first.
bash
# 1. Stop the daemon
sudo systemctl stop vlx
# 2. Inspect available backups
ls -lh backups/
# 3. Check the backup before restoring
VLX_DB_PATH=backups/state-<timestamp>.db vlx db check
# 4. Restore
vlx db restore backups/state-<timestamp>.db --yes
# 5. Restart the daemon
sudo systemctl start vlx
There is no auto-rollback. If the restore makes things worse, restore from an earlier snapshot.
Database archive
Archive event logs for old terminal runs to keep the events table lean:
bash
vlx db archive
Moves event rows for runs in a terminal state (shipped, failed, abandoned, needs_review) older than 90 days to a separate archive file. Archived runs remain queryable via the archive file if needed.
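The selection rule (terminal state and older than 90 days) can be pictured as a simple filter. This sketch assumes a hypothetical three-column text input of run ID, state, and age in days; the real command works on state.db, whose schema is not described here.

```bash
# Hypothetical sketch of the archive selection rule.
# stdin:  "<run_id> <state> <age_days>" per line (illustrative format only)
# stdout: IDs of runs that are in a terminal state and older than 90 days
archivable_runs() {
  awk '
    BEGIN {
      n = split("shipped failed abandoned needs_review", t, " ")
      for (i = 1; i <= n; i++) terminal[t[i]] = 1
    }
    ($2 in terminal) && $3 > 90 { print $1 }'
}
```

A run that is exactly 90 days old is not selected, matching "older than 90 days".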
Memory hygiene report
bash
vlx memory report
Scans the project memory at <repo>/.vlx/memory/ and reports:
- Stale entries: files not updated in > 90 days (may contain outdated info).
- Large files: files > 10 KB (suggesting accumulated cruft rather than useful learnings).
- Potential duplicates: entries with high content similarity (could be merged).
No files are modified. The report is informational — act on it by editing or deleting the flagged files in a normal PR.
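For a quick cross-check without the CLI, the first two categories can be approximated with `find`. This is a rough sketch, not what `vlx memory report` runs: it uses filesystem modification times, which a fresh clone or checkout resets, and it does not attempt duplicate detection. The function names are hypothetical.

```bash
# Rough approximations of two report categories; not the vlx implementation.

# Files last modified more than 90 days ago (by filesystem mtime).
stale_entries() {
  find "$1" -type f -mtime +90
}

# Files larger than 10 KiB.
large_files() {
  find "$1" -type f -size +10k
}

# Usage, from the repository root:
#   stale_entries .vlx/memory
#   large_files .vlx/memory
```

Like the report itself, both functions only list files and change nothing.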
Recovery model
Daemon restarts
On startup, the orchestrator scans runs for active rows. For each active run:
- Checks the per-stage checkpoint manifests in the worktree. A completed stage with a valid manifest is not re-run.
- Runs reconcileIntents: checks the external system (ADO, SCM) for any side effects that were "intended but unconfirmed" in the event log (i.e., a crash between issuing the ADO update and recording the completion). Re-records or retries idempotently.
- Resumes the run from the last incomplete stage.
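The "intended but unconfirmed" check can be sketched as a scan for intents that never received a matching confirmation. The event type names `intent_recorded` and `intent_confirmed`, the three-column input, and the function name are all hypothetical, chosen only to illustrate the idea; the real event schema is not described here.

```bash
# Hypothetical sketch of finding "intended but unconfirmed" side effects.
# stdin:  "<seq> <type> <intent_id>" per line, in seq order (illustrative format)
# stdout: intent IDs that were recorded but never confirmed, in recording order
unconfirmed_intents() {
  awk '
    $2 == "intent_recorded"  { pending[$3] = 1; order[++n] = $3 }
    $2 == "intent_confirmed" { delete pending[$3] }
    END {
      for (i = 1; i <= n; i++)
        if (order[i] in pending) print order[i]
    }'
}
```

Each ID printed is a side effect whose outcome must be looked up in the external system before it is re-recorded or retried.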
If a run's worktree is missing, the run is marked needs_review — a missing worktree is itself a signal worth surfacing, not silently re-creating.
Corrupt event log (seq gap)
If (run_id, seq=N) exists but seq=N+1 is missing, the event log has a gap. The run is marked needs_review instead of attempting to resume on incomplete history.
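Gap detection itself is a check that consecutive sequence numbers differ by exactly one. A minimal sketch, assuming the seq values for one run are supplied one per line in ascending order; `find_seq_gaps` is a hypothetical helper, not a vlx command.

```bash
# Hypothetical helper, not part of the vlx CLI.
# stdin:  seq values for a single run, one per line, ascending
# stdout: one line per gap; empty output means the sequence is contiguous
find_seq_gaps() {
  awk '
    NR > 1 && $1 != prev + 1 {
      printf "gap after seq %d (next is %d)\n", prev, $1
    }
    { prev = $1 }'
}
```

This only reports holes between recorded events; it cannot tell whether events are missing after the last recorded seq.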
Audit the gap:
bash
# Query events table directly
VLX_DB_PATH=state.db sqlite3 state.db \
"SELECT seq, type FROM events WHERE run_id='<run-id>' ORDER BY seq"
needs_review state
Runs marked needs_review are surfaced in vlx status and via a Telegram notification. They do not auto-recover — use /requeue or /restart to resume, or /cancel to abandon.
Auto-escape
Pre-Ship failures land in queue.status = failed. The auto-escape sweep (runs every few minutes) automatically requeues such stories up to 2 times per story before requiring operator action. The cap counts auto_escape_triggered events across all runs for the story. Cancelled runs and stories with an open PR are excluded.
At the cap, the run is marked needs_review and one Telegram notification is sent.
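The cap logic amounts to counting auto_escape_triggered events for the story and comparing against 2. The sketch below assumes the event types for all of a story's runs are supplied one per line; `should_auto_requeue` is a hypothetical name, and the exclusions for cancelled runs and open PRs are not modelled.

```bash
# Hypothetical sketch of the auto-escape cap; exclusions are not modelled.
# stdin: event types across all runs of one story, one per line
# Succeeds (exit 0) while fewer than 2 auto_escape_triggered events exist.
should_auto_requeue() {
  local cap=2 count
  # grep -c prints 0 (and exits non-zero) when nothing matches; only the count is used.
  count=$(grep -c -x 'auto_escape_triggered')
  [ "$count" -lt "$cap" ]
}
```

Because the count spans every run of the story, a fresh run does not reset the cap.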
Worktree management
Per-run git worktrees live at:
<runtime.worktree_root>/<run-id>/
Default: .vlx/worktrees/<run-id>/.
Worktrees are created by the orchestrator and destroyed when:
- The run completes successfully (post-Ship cleanup).
- The run is cancelled.
- A /cancel command is issued.
If a worktree is left behind (daemon crash, manual kill), remove it manually:
bash
git worktree remove .vlx/worktrees/<run-id> --force
git branch -D vlx-bot/<storyId> # only if no longer needed
Do not delete the worktree while the daemon is running and a run is active: the orchestrator owns the worktree lifecycle.
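To find candidates for manual cleanup, list the worktrees git knows about and keep those under the vlx worktree root. This sketch parses the output of `git worktree list --porcelain` and assumes the default root `.vlx/worktrees/`; `vlx_worktrees` is a hypothetical helper. Compare the result against the active runs in `vlx status` before removing anything.

```bash
# Hypothetical helper, not part of the vlx CLI.
# stdin:  output of `git worktree list --porcelain`
# stdout: paths of worktrees under the default .vlx/worktrees/ root
vlx_worktrees() {
  awk '
    /^worktree / {
      path = substr($0, 10)                      # strip the "worktree " prefix
      if (index(path, "/.vlx/worktrees/")) print path
    }'
}

# Usage, from the repository root:
#   git worktree list --porcelain | vlx_worktrees
```

If a worktree directory was deleted by hand, `git worktree prune` clears the stale administrative entry that git still holds for it.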