Operations

Day-to-day tasks: debugging stuck runs, managing the database, generating memory reports, and following recovery procedures.

Debug a stuck run

Step 1: Check active runs

bash

vlx status

Shows all active runs, their current stage, and the time since each run's last event. A run that has not advanced in more than 30 minutes is likely stuck; the watchdog marks it needs_review at the 30-minute threshold.

Step 2: Check Telegram

The story's Telegram thread contains every status update. Look for:

A clarification question that has not been answered.
An APPROVAL request that has expired.
An error message from the orchestrator.

Step 3: Get the event trail

bash

# In Telegram:
/history <storyId>

Prints the last 30 events for the story's latest run — stage transitions, clarifications asked/answered, approvals, gate failures, PR events.

Step 4: Check stage checkpoints

Per-stage checkpoint manifests are stored locally in the run's worktree:

.vlx/worktrees/<run-id>/.vlx/<storyId>/checkpoints/<stage>.json

These are read-only (orchestrator-owned, gitignored). Each manifest records the completed stage, its artifacts with SHA-256 hashes, and the commit SHA. If a manifest exists for a stage, that stage is considered complete.
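To see at a glance which stages count as complete, the manifest filenames can be listed. The following is a minimal sketch that assumes only the path layout shown above. The function name `completed_stages` is illustrative and not part of vlx, and it checks only that a manifest file exists, not that its hashes are valid.

```shell
# completed_stages <checkpoint-dir>
# Prints one stage name per line for every <stage>.json manifest found.
# Hypothetical helper: it reads filenames only and never opens or edits a manifest.
completed_stages() {
  dir=$1
  [ -d "$dir" ] || return 1          # no checkpoint directory: nothing to report
  for manifest in "$dir"/*.json; do
    [ -e "$manifest" ] || continue   # the glob matched nothing
    basename "$manifest" .json
  done
}

# Example (placeholders as in the path above):
# completed_stages ".vlx/worktrees/<run-id>/.vlx/<storyId>/checkpoints"
```

A stage missing from the output has no manifest yet, so it has not been recorded as complete.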

Step 5: Check DB integrity

bash

vlx db check

Runs PRAGMA integrity_check on state.db. Any output other than ok indicates corruption; see Database restore below.

Step 6: Apply an operator command

Symptom	Command
Run stuck / no progress	`/requeue <storyId>` — revive from the stage it died at
Need to re-run from a specific stage	`/restart <storyId> <stage>`
Run is irrecoverably broken	`/cancel <storyId>` — kill the run
Plan needs a full redo	`/restart <storyId> think`

Database backup

The daemon takes a daily backup at startup. Take a manual snapshot before any risky operation:

bash

vlx db backup

Backups are stored in backups/ adjacent to state.db (or $VLX_DB_PATH).

Retention policy:

Last 14 daily snapshots.
Last 12 monthly snapshots.

Database restore

Destructive. Stop the daemon first.

bash

# 1. Stop the daemon
sudo systemctl stop vlx

# 2. Inspect available backups
ls -lh backups/

# 3. Check the backup before restoring
VLX_DB_PATH=backups/state-<timestamp>.db vlx db check

# 4. Restore
vlx db restore backups/state-<timestamp>.db --yes

# 5. Restart the daemon
sudo systemctl start vlx

There is no auto-rollback. If the restore makes things worse, restore from an earlier snapshot.

Database archive

Archive event logs for old terminal runs to keep the events table lean:

bash

vlx db archive

Moves event rows for runs in a terminal state (shipped, failed, abandoned, needs_review) older than 90 days to a separate archive file. Archived events remain queryable from that file.

Memory hygiene report

bash

vlx memory report

Scans the project memory at <repo>/.vlx/memory/ and reports:

Stale entries: files not updated in > 90 days (may contain outdated info).
Large files: files > 10 KB (suggesting accumulated cruft rather than useful learnings).
Potential duplicates: entries with high content similarity (could be merged).

No files are modified. The report is informational — act on it by editing or deleting the flagged files in a normal PR.
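The first two checks can be approximated by hand with `find`, which is handy for spot-checking a flagged file. This is only a sketch of the thresholds described above, not how `vlx memory report` is implemented. The function names are illustrative, and duplicate detection is left out because the similarity measure is not specified here.

```shell
# stale_entries <memory-dir>
# Lists files last modified more than 90 days ago
# (find counts age in whole 24-hour periods).
stale_entries() {
  find "$1" -type f -mtime +90
}

# large_entries <memory-dir>
# Lists files larger than 10 KiB, i.e. 10241 bytes or more.
large_entries() {
  find "$1" -type f -size +10k
}

# Example:
# stale_entries "<repo>/.vlx/memory"
# large_entries "<repo>/.vlx/memory"
```

Both functions are read-only, in keeping with the report itself.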

Recovery model

Daemon restarts

On startup, the orchestrator scans the runs table for active rows. For each active run:

Checks the per-stage checkpoint manifests in the worktree. A completed stage with a valid manifest is not re-run.
Runs reconcileIntents, which checks the external systems (ADO, SCM) for side effects that the event log records as intended but unconfirmed, for example when the daemon crashed between issuing an ADO update and recording its completion. Such side effects are re-recorded or retried idempotently.
Resumes the run from the last incomplete stage.

If a run's worktree is missing, the run is marked needs_review: a missing worktree is a signal worth surfacing, not something to re-create silently.
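The resume decision described above can be pictured as a walk over the stages in order, stopping at the first one without a manifest. This is an illustration, not the orchestrator's code. The function name `resume_stage` is hypothetical, the stage order is passed in by the caller because it is not listed here, and the sketch tests only whether a manifest file exists rather than validating it.

```shell
# resume_stage <worktree-dir> <storyId> <stage>...
# Prints the first stage, in the order given, that has no <stage>.json manifest.
# Prints "needs_review" if the worktree is missing and "complete" if every
# stage already has a manifest. Hypothetical helper for illustration only.
resume_stage() {
  worktree=$1
  story=$2
  shift 2
  if [ ! -d "$worktree" ]; then
    echo "needs_review"              # missing worktree: surface it, do not re-create it
    return 0
  fi
  checkpoints="$worktree/.vlx/$story/checkpoints"
  for stage in "$@"; do
    if [ ! -f "$checkpoints/$stage.json" ]; then
      echo "$stage"                  # first stage without a manifest
      return 0
    fi
  done
  echo "complete"
}
```

Stages that already have a manifest are skipped, which mirrors the rule that a completed stage is not re-run.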

Corrupt event log (seq gap)

If (run_id, seq=N) exists but seq=N+1 is missing, the event log has a gap. The run is marked needs_review instead of attempting to resume on incomplete history.
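A gap of this kind can be located mechanically from a list of seq values, which is less error-prone than reading a long listing by eye. The following is a sketch that assumes seq values are integers increasing by 1 within a run. `find_seq_gaps` is an illustrative name, not a vlx command.

```shell
# find_seq_gaps
# Reads one integer seq per line on stdin, in ascending order, and prints
# every value that is missing between consecutive lines.
find_seq_gaps() {
  awk 'NR > 1 { for (s = prev + 1; s < $1; s++) print s }
       { prev = $1 }'
}

# Example: feed it the seq column for one run from the events table.
# sqlite3 state.db "SELECT seq FROM events WHERE run_id='<run-id>' ORDER BY seq" | find_seq_gaps
```

Empty output means there is no interior gap. The filter cannot detect events missing from the end of the log, because nothing follows them to reveal the jump.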

Audit the gap:

bash

# Query the events table directly
sqlite3 "${VLX_DB_PATH:-state.db}" \
  "SELECT seq, type FROM events WHERE run_id='<run-id>' ORDER BY seq;"

`needs_review` state

Runs marked needs_review are surfaced in vlx status and via a Telegram notification. They do not auto-recover — use /requeue or /restart to resume, or /cancel to abandon.

Auto-escape

Pre-Ship failures land in queue.status = failed. The auto-escape sweep (runs every few minutes) automatically requeues such stories up to 2 times per story before requiring operator action. The cap counts auto_escape_triggered events across all runs for the story. Cancelled runs and stories with an open PR are excluded.

At the cap, the run is marked needs_review and a single Telegram notification is sent.

Worktree management

Per-run git worktrees live at:

<runtime.worktree_root>/<run-id>/

Default: .vlx/worktrees/<run-id>/.

Worktrees are created by the orchestrator and destroyed when:

The run completes successfully (post-Ship cleanup).
The run is cancelled.
A /cancel command is issued.

If a worktree is left behind (daemon crash, manual kill), remove it manually:

bash

git worktree remove .vlx/worktrees/<run-id> --force
git branch -D vlx-bot/<storyId>   # only if no longer needed

Do not delete the worktree while the daemon is running and a run is active — the orchestrator owns the worktree lifecycle.
