Verification and Testing
How to check that a local or self-hosted Clawback deployment is working.
Audience: Operators, evaluators, and contributors validating the current product.
Fastest Verification
Run:
pnpm smoke:public-try
That is the public entrypoint for the main verification flow. It runs the core ingress and review path checks in sequence.
For the browser-level worker-first path that now exists in the console UI, run:
pnpm --filter @clawback/console exec playwright test \
e2e/worker-demo-proof.e2e.ts
To target a hosted demo or deployed site instead of local dev, set
CONSOLE_E2E_BASE_URL and the relevant credentials:
# Evaluator login against the hosted demo:
CONSOLE_E2E_BASE_URL=https://demo.clawback.team \
CONSOLE_E2E_EMAIL=evaluator@hartwell.com \
CONSOLE_E2E_PASSWORD=publicdemo1 \
pnpm test:console:demo-evaluator:e2e

# Admin login against the hosted demo (supply real admin credentials):
CONSOLE_E2E_BASE_URL=https://demo.clawback.team \
CONSOLE_E2E_ADMIN_EMAIL=... \
CONSOLE_E2E_ADMIN_PASSWORD=... \
pnpm test:console:demo-admin:e2e
If the hosted admin login unexpectedly fails while the evaluator login still
works, reapply the demo seed on the host before treating it as a product
regression. The seeded Hartwell admin is dave@hartwell.com, and the normal
seed path restores that account's expected password hash.
There is also a manual GitHub Actions workflow for the same hosted-browser checks:
- workflow: Hosted Demo Browser Smoke
- required secrets: DEMO_EVALUATOR_EMAIL, DEMO_EVALUATOR_PASSWORD
- optional admin secrets: DEMO_ADMIN_EMAIL, DEMO_ADMIN_PASSWORD
5-Minute Smoke Test
1. Check process health
The control-plane port depends on how the stack was started: ./scripts/start-local.sh uses 3011, while pnpm dev uses 3001.
# start-local.sh (default):
curl -s http://localhost:3011/healthz
curl -s http://localhost:3011/readyz
# pnpm dev:
curl -s http://localhost:3001/healthz
curl -s http://localhost:3001/readyz
Expected:
- /healthz returns 200
- /readyz returns 200 once Postgres and PgBoss are available
- admin setup reports runtime readiness, including gateway reachability and the expected model-provider key state
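Because /readyz only returns 200 once Postgres and PgBoss are available, a single curl right after startup can fail spuriously. The sketch below polls instead of checking once. Both function names are hypothetical helpers, not scripts shipped with the repo; the ports are the two named above, and the retry count and delay are arbitrary defaults.

```shell
# Map how the stack was started to the control-plane base URL.
# Ports come from the text above: start-local.sh uses 3011, pnpm dev uses 3001.
control_plane_url() {
  case "$1" in
    start-local) echo "http://localhost:3011" ;;
    dev)         echo "http://localhost:3001" ;;
    *)           echo "unknown start mode: $1" >&2; return 1 ;;
  esac
}

# Poll a URL until it answers 200 or the attempts run out.
# wait_for_200 is a hypothetical helper; tries and delay are illustrative defaults.
wait_for_200() {
  local url="$1" tries="${2:-30}" delay="${3:-1}" i=1 code=""
  while [ "$i" -le "$tries" ]; do
    # -w '%{http_code}' prints only the status code; curl reports 000 if it cannot connect.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    if [ "$code" = "200" ]; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -le "$tries" ]; then
      sleep "$delay"
    fi
  done
  echo "gave up on $url (last status: ${code:-none})" >&2
  return 1
}

# Example (stack started with ./scripts/start-local.sh):
#   base="$(control_plane_url start-local)"
#   wait_for_200 "$base/healthz" && wait_for_200 "$base/readyz"
```

Keeping the port lookup separate from the polling loop means the same loop can be pointed at a deployed URL without changes.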
2. Seed demo data if you want a realistic workspace
pnpm db:seed
3. Log in as the demo admin
Adjust the port to match your stack (3011 for start-local.sh, 3001 for pnpm dev).
curl -s -c /tmp/clawback-cookies.txt \
-X POST http://localhost:3011/api/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"dave@hartwell.com","password":"demo1234"}'
4. Confirm the main workspace APIs respond
curl -s -b /tmp/clawback-cookies.txt http://localhost:3011/api/workspace/workers
curl -s -b /tmp/clawback-cookies.txt http://localhost:3011/api/workspace/inbox
curl -s -b /tmp/clawback-cookies.txt http://localhost:3011/api/workspace/work
curl -s -b /tmp/clawback-cookies.txt http://localhost:3011/api/workspace/activity
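The four calls above print response bodies, which makes it easy to miss an error status. A minimal sketch that reports only the HTTP status per endpoint might look like the following. The function name is hypothetical, and treating 200 as the healthy status is an assumption; this page only says the APIs should respond.

```shell
# Workspace endpoints from step 4 above.
WORKSPACE_ENDPOINTS="workers inbox work activity"

# Print "<endpoint> <status>" for each endpoint; return non-zero if any is not 200.
# check_workspace_apis is a hypothetical helper; the 200 expectation is an assumption.
check_workspace_apis() {
  local base="$1" jar="$2" failed=0 name code
  for name in $WORKSPACE_ENDPOINTS; do
    # Reuse the cookie jar written by the login request in step 3.
    code="$(curl -s -o /dev/null -w '%{http_code}' -b "$jar" "$base/api/workspace/$name" || true)"
    echo "$name $code"
    if [ "$code" != "200" ]; then
      failed=1
    fi
  done
  return "$failed"
}

# Example, after the login in step 3:
#   check_workspace_apis http://localhost:3011 /tmp/clawback-cookies.txt
```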
5. Run the full scripted verification
./scripts/public-try-verify.sh
The verifier treats the Gmail watched inbox as optional: if the seeded Gmail read-only connection is absent or not connected, that portion is skipped rather than counted as a public-try failure.
For a deployed stack rather than local dev mode, run:
CONTROL_PLANE_URL=https://clawback.example.com \
CLAWBACK_INBOUND_EMAIL_WEBHOOK_TOKEN=... \
./scripts/public-try-verify.sh
The inbound webhook token is required for the forward-email portion of the
public-try path. On a no-SMTP deployment, the verifier also skips approval if
the pending review is send_email; denial still runs so review resolution is
still proven.
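Since a missing token only surfaces partway through the run, it can help to check the environment first. The following is a sketch of such a preflight, assuming both variables from the example above should be set for a deployed run. require_env is a hypothetical helper, not part of the verifier.

```shell
# Fail early when any named environment variable is unset or empty.
# require_env is a hypothetical helper; pass only trusted variable names,
# because the indirect lookup uses eval.
require_env() {
  local missing=0 name value
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   require_env CONTROL_PLANE_URL CLAWBACK_INBOUND_EMAIL_WEBHOOK_TOKEN \
#     && ./scripts/public-try-verify.sh
```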
Main Flow Scripts
Seed verification
./scripts/verify-seed.sh
Checks that the main workspace resources are present.
Forward email
./scripts/test-forward-email.sh
Verifies:
- inbound Postmark-style webhook handling
- work item creation
- review creation
- idempotency on duplicate delivery
Watched inbox
./scripts/test-watched-inbox.sh
Verifies the Gmail watch-hook path and its idempotency behavior.
Review resolution
./scripts/test-approve-review.sh
./scripts/test-approve-review.sh deny
Verifies approved and denied review flows.
Reviewed send
./scripts/test-smtp-send.sh
This tests the full reviewed-send loop:
- check SMTP relay configuration status via the /smtp-status endpoint
- activate the seeded SMTP connection automatically when env vars are present
- create a review from forwarded email
- approve the specific review tied to that work item
- inspect exact post-approval execution truth for that work item
- assert scoped activity events (work_item_sent, send_failed, review_approved)
The script provides early feedback on SMTP readiness before attempting the send, and after resolution it checks the activity stream for specific outcome events so the operator knows whether delivery was confirmed, failed, or not yet recorded.
Current behavior:
- approval authorizes the action; delivery depends on the configured transport
- if SMTP is configured and reachable, execution should progress to completed and a work_item_sent activity event appears
- if SMTP is absent or unreachable, execution progresses to failed with an error classification (transient or permanent), a send_failed event is recorded, and the failure is visible in the UI
- failure after approval is recoverable: the UI exposes retry, and retry is safe (attempt counter increments, no double-send)
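The outcome rules above can be read as a small decision table over the execution state and the activity events recorded for a work item. The sketch below encodes that reading. The function names and output labels are hypothetical, and it takes the event names as a space-separated string instead of querying any API, since the response shape is not described here.

```shell
# Return success when event name $1 appears in the space-separated list $2.
has_event() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Classify a reviewed send from its execution state and recorded events.
# State and event names come from the text above; the output labels are illustrative.
classify_send_outcome() {
  local state="$1" events="$2"
  if [ "$state" = "completed" ] && has_event work_item_sent "$events"; then
    echo "confirmed"
  elif [ "$state" = "failed" ] && has_event send_failed "$events"; then
    echo "failed"
  else
    # Approval alone does not prove delivery.
    echo "not-yet-recorded"
  fi
}

# Example:
#   classify_send_outcome completed "review_approved work_item_sent"
```

Requiring both the state and the matching event reflects the point that approval authorizes the action but does not by itself confirm delivery.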
Retrieval and Connector Verification
These commands cover the current public retrieval checks:
pnpm smoke:connector-sync
pnpm smoke:incident-copilot
pnpm smoke:incident-copilot-action
What they cover:
- local-directory connectors can be created and synced
- retrieval-backed answers can be grounded in synced content
- a governed action can still run on top of that retrieval-backed worker flow
Read Known Limitations and Demo Walkthrough for the current public retrieval story and its limits.
Browser Paths
Worker-first admin path
- Sign in as dave@hartwell.com / demo1234
- Open /workspace/setup
- Click Run sample activity
- On the worker proof rail, either open the latest inbox/work item or run the sample activity button
- Confirm you land in real /workspace/inbox, /workspace/work/:id, or /workspace/activity state
That path shows the current worker-first setup flow is alive in the UI and not just in backend scripts.
Retrieval-first evaluator path
- Open /workspace/connectors
- Confirm the seeded Incident Copilot Demo connector has a completed sync
- Open /workspace/chat
- Use Incident Copilot
- Inspect the resulting review/work state in /workspace/inbox and /workspace/work
That path shows the no-Google retrieval story still works alongside the worker-first admin path.
Script Reference
| Script | Purpose |
|---|---|
| scripts/public-try-verify.sh | Main public verification entrypoint |
| scripts/pilot-verify.sh | Compatibility alias for the same verification flow |
| scripts/verify-seed.sh | Checks seeded demo data and workspace APIs |
| scripts/test-forward-email.sh | Tests the forward-email webhook path |
| scripts/test-watched-inbox.sh | Tests the Gmail watch path |
| scripts/test-approve-review.sh | Resolves a pending review |
| scripts/test-smtp-send.sh | Tests reviewed send and execution truth |
| scripts/test-deployed-stack.sh | Boots the supported prod Compose stack from scratch, seeds it, runs the public-try verifier, then tears it down |
| scripts/test-migration-proof.sh | Checks the Drizzle migration chain on a fresh Postgres instance |
Post-Deployment Checklist
After deploying with docker-compose.prod.yml, verify:
Platform
- docker compose ps shows healthy containers
- /healthz returns 200
- /readyz returns 200
- admin setup shows runtime readiness as ready, or any degraded/blocked state is understood and intentional
- the console loads in a browser
- login succeeds
Workspace
- workers are visible on /workspace/workers
- inbox items render on /workspace/inbox
- work items render on /workspace/work
- activity events render on /workspace/activity
Providers
- forward-email webhook works if configured
- Gmail setup card works if configured
- SMTP relay status is truthful if configured
Hosted browser checks
- pnpm test:console:demo-evaluator:e2e passes against the public demo URL
- pnpm test:console:demo-admin:e2e passes against the admin demo URL when admin creds are available
Docs / website
- pnpm check:demo-docs-sync passes before the site deploy
- the site only points at docs routes that return 200
Security / config
- COOKIE_SECRET is not the default
- CONSOLE_ORIGIN matches the public console URL
- CONTROL_PLANE_INTERNAL_URL points at the control-plane service from the console container
- Postgres is not exposed more broadly than intended
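The first two security items can be checked mechanically from the deployment environment. The sketch below is one way to do that. check_security_config is a hypothetical helper, and the default secret is passed in as an argument because this page does not state its value; look it up in your own environment template.

```shell
# Check COOKIE_SECRET and CONSOLE_ORIGIN from the current environment.
#   $1: the default COOKIE_SECRET value to reject (caller-supplied; not stated in these docs)
#   $2: the public console URL that CONSOLE_ORIGIN should match
check_security_config() {
  local default_secret="$1" public_url="$2" origin="${CONSOLE_ORIGIN:-}" failed=0
  if [ -z "${COOKIE_SECRET:-}" ] || [ "${COOKIE_SECRET:-}" = "$default_secret" ]; then
    echo "COOKIE_SECRET is unset or still the default" >&2
    failed=1
  fi
  # Strip one trailing slash from each side so https://host/ and https://host compare equal.
  if [ "${origin%/}" != "${public_url%/}" ]; then
    echo "CONSOLE_ORIGIN does not match $public_url" >&2
    failed=1
  fi
  return "$failed"
}

# Example (the placeholder stands for whatever default your env template ships):
#   check_security_config "<default-secret>" https://clawback.example.com
```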
Useful Test Commands
For current backend acceptance coverage:
pnpm --filter @clawback/control-plane exec vitest run \
src/e2e/http-acceptance.test.ts \
src/e2e/full-flows.test.ts \
src/hardening/api-boundaries.test.ts \
src/workspace-routes.test.ts
For build verification:
pnpm --filter @clawback/control-plane build
pnpm --filter @clawback/console build
For higher-signal whole-system checks that go beyond backend Vitest:
pnpm test:console
pnpm test:env
pnpm test:console:first-run:e2e
pnpm --filter @clawback/console exec playwright test e2e/worker-demo-proof.e2e.ts
pnpm --filter @clawback/db test
pnpm test:migration-proof
pnpm test:deployed-stack
What these add:
- pnpm test:console covers console rendering and route-adjacent client logic
- pnpm test:env covers environment parsing that the default root pnpm test currently skips
- pnpm test:console:first-run:e2e shows the seeded no-Google knowledge path is discoverable in the actual browser UI
- pnpm --filter @clawback/console exec playwright test e2e/worker-demo-proof.e2e.ts shows the setup page can reach the worker proof rail and open real product state from there
- pnpm --filter @clawback/db test statically checks the migration journal and catches duplicate-column and journal-integrity issues
- pnpm test:migration-proof checks the migration chain against a fresh throwaway Postgres instance
- pnpm test:deployed-stack shows the supported prod Compose path can boot, seed, and pass the public-try verifier end to end
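When running several of these checks back to back, it can be useful to keep going after a failure and see a summary at the end. The following is a minimal sketch of such a runner. run_checks is a hypothetical helper, not a script in the repo, and it discards each command's output, so rerun a failing command directly to see its logs.

```shell
# Run each command string in turn, print PASS or FAIL per command,
# and return non-zero if any command failed.
run_checks() {
  local failed=0 cmd
  for cmd in "$@"; do
    if sh -c "$cmd" >/dev/null 2>&1; then
      echo "PASS $cmd"
    else
      echo "FAIL $cmd"
      failed=1
    fi
  done
  return "$failed"
}

# Example, using commands from the list above:
#   run_checks "pnpm test:console" "pnpm test:env" "pnpm --filter @clawback/db test"
```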