Deployment Readiness¶
Use this checklist before exposing Open Cowork Cloud or the headless gateway to company users, customers, or public channel webhooks. The deployment should be the same product on every provider: configure adapters and infrastructure; do not add provider-specific branches to core app code.
Required Topology¶
Choose one of the first-class topology profiles before choosing a provider:
| Profile | Production boundary |
|---|---|
| `desktop-only` | local Desktop execution, no remote dependency |
| `gateway-only` | Standalone Gateway owns private OpenCode and Gateway Postgres |
| `cloud-only` | Cloud web/worker/scheduler own Cloud workspaces |
| `cloud-channel-gateway` | Gateway is a Cloud client and channel adapter |
| `desktop-gateway` | Desktop executes through outbound pairing only |
| `cloud-gateway-edge` | Cloud registers an external Gateway/edge authority explicitly |
| `full-hybrid` | every workspace declares one execution authority |
The topology profile contract lives in deploy/topologies/topology-profiles.json; the operator kit is deploy/topologies/README.md; the docs overview is Deployment Topologies.
After choosing a topology, apply the matching security gate from Hybrid Security Gates. The gate contract lives in deploy/security/hybrid-security-gates.json and defines the required auth, revocation, approval/question policy, audit events, quotas/rate limits, durability, backup/restore, redaction, and fail-closed checks for desktop-local, desktop-pairing, standalone-gateway, cloud-worker, cloud-channel-gateway, cloud-gateway-edge, and full-hybrid.
Use Setup and Health Center as the operator-facing bridge between topology selection and rollout evidence. Desktop exposes the same authority-aware states for local runtime readiness, workspace support, Cloud auth/sync, Gateway doctor/smoke checks, database migration posture, object store posture, backup posture, and pairing freshness.
Production Cloud deployments should run these processes separately:
- cloud `web`: stateless HTTP, browser dashboard, API, SSE, auth, and durable projections.
- cloud `worker`: OpenCode execution, command processing, checkpoints, and artifact generation.
- cloud `scheduler`: durable workflow claims and scheduled run creation.
- gateway: channel I/O, provider webhooks or polling, channel rendering, and delivery retries.
Local Compose may run all-in-one cloud for speed. Provider demos may use all-in-one for a focused pilot. Shared or hosted deployments should use split roles and shared Postgres/object storage.
Compose files in this repo are local/demo references. They intentionally ship loopback URLs, local MinIO, local Postgres, insecure auth overrides, fake/demo tokens, and build: blocks for fast validation. Production downstream overlays must pin OCI images by release tag or digest, replace every demo secret, use HTTPS public URLs, and move Postgres/object storage/secrets to the provider control plane.
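The image-pinning rule lends itself to a mechanical check before rollout. The following is a minimal sketch in POSIX shell; `image_is_pinned` is a hypothetical helper, not a script shipped in this repo, and it only inspects the reference string rather than querying a registry.

```shell
# Hypothetical helper: succeeds only when an OCI image reference is pinned
# by digest or by an explicit tag other than "latest".
image_is_pinned() {
  ref=$1
  case "$ref" in
    *@sha256:*) return 0 ;;   # pinned by digest
  esac
  last=${ref##*/}             # drop registry/path so a registry port is not read as a tag
  case "$last" in
    *:latest) return 1 ;;     # floating tag
    *:?*) return 0 ;;         # explicit release tag
    *) return 1 ;;            # no tag means an implicit "latest"
  esac
}
```

A release script could loop over the images in a downstream overlay and stop at the first reference for which this check fails.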
Deployer Config¶
- Keep downstream product policy in `open-cowork.config.json`; keep provider-specific wiring in Compose, Helm, Terraform, or platform manifests.
- Use `branding` for Desktop, `cloud.publicBranding` for Cloud Web, and `gateway.branding` for headless channel surfaces so all three clients expose the same downstream product name, logo, legal links, support links, and managed connection labels.
- The stock Helm chart intentionally omits `cloud.branding.theme`; Cloud Web inherits the shared Desktop-aligned dark theme unless a downstream deployment deliberately supplies its own public theme tokens.
- Use `cloudDesktop.preconfiguredConnections` for managed-org Desktop builds instead of hardcoding cloud URLs in renderer code.
- Use `gateway.providers[]` for channel provider bindings and credential refs. Gateway can read this section through `OPEN_COWORK_CONFIG_PATH`, `OPEN_COWORK_CONFIG_DIR`, or `OPEN_COWORK_DOWNSTREAM_ROOT`, with `OPEN_COWORK_GATEWAY_*` env vars as deployment overrides.
- Use `cloud.billing.provider=none` or `stub` for OSS self-host deployments. Stripe or future billing adapters are managed-hosting configuration, not a core runtime dependency.
- For managed BYOK private beta, use `docs/runbooks/private-beta-launch.md`, `docs/runbooks/private-beta-support.md`, and `deploy/private-beta/` to keep onboarding, support, plan placeholders, and OSS self-host boundaries explicit before inviting design partners.
- Run schema and semantic validation for downstream configs before rollout: `node --no-warnings --experimental-strip-types --test tests/config-schema-validation.test.ts`.
Production Checklist¶
Auth¶
- Public cloud uses OIDC or trusted `header` auth behind a signed, trusted identity proxy.
- `OPEN_COWORK_CLOUD_AUTH_MODE=none` is allowed only for local/demo installs with explicit insecure overrides.
- Header auth includes `OPEN_COWORK_CLOUD_HEADER_AUTH_SECRET` or `OPEN_COWORK_CLOUD_HEADER_AUTH_SECRET_REF`; identity headers from arbitrary clients are never trusted directly.
- Public dashboard traffic uses HTTPS and a stable `OPEN_COWORK_CLOUD_PUBLIC_URL` for OIDC and trusted-header deployments alike.
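An operator preflight can encode these auth rules before a public rollout. This is a sketch only: `check_cloud_auth` is a hypothetical function, and it assumes `OPEN_COWORK_CLOUD_AUTH_MODE` accepts `oidc` and `header` in addition to `none`, which this checklist does not spell out.

```shell
# Hypothetical preflight, not the project's validator.
# Assumption: valid public modes are "oidc" and "header".
check_cloud_auth() {
  mode=${OPEN_COWORK_CLOUD_AUTH_MODE:-}
  url=${OPEN_COWORK_CLOUD_PUBLIC_URL:-}
  case "$mode" in
    oidc) ;;
    header)
      # Trusted-header auth needs a signing secret or a secret ref.
      if [ -z "${OPEN_COWORK_CLOUD_HEADER_AUTH_SECRET:-}${OPEN_COWORK_CLOUD_HEADER_AUTH_SECRET_REF:-}" ]; then
        echo "header auth requires a header auth secret or secret ref" >&2
        return 1
      fi
      ;;
    *)
      echo "auth mode '$mode' is not acceptable for a public deployment" >&2
      return 1
      ;;
  esac
  case "$url" in
    https://?*) return 0 ;;
    *) echo "OPEN_COWORK_CLOUD_PUBLIC_URL must be an HTTPS origin" >&2; return 1 ;;
  esac
}
```

Running the function in the same environment as the web role gives an early failure for the most common misconfigurations; it does not replace the runtime's own startup checks.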
Cookie Secret¶
- Set a high-entropy cookie secret through `OPEN_COWORK_CLOUD_COOKIE_SECRET` or `OPEN_COWORK_CLOUD_COOKIE_SECRET_REF`.
- Keep `OPEN_COWORK_CLOUD_COOKIE_SECURE=true` for HTTPS deployments.
- Rotate the cookie secret during a maintenance window because browser sessions may be invalidated.
Postgres¶
- Use managed Postgres or a highly available cluster for the control plane.
- Enable automated backups and point-in-time recovery.
- Size connection limits for web replicas, workers, scheduler replicas, and dashboard/gateway API traffic.
- Run the real Postgres concurrency tests before changing schema, lease, command, delivery, or quota behavior.
Object Store¶
- Configure object storage for artifacts, uploads, exports, runtime checkpoints, workspace snapshots, and diagnostics bundles.
- Use provider-native object storage through adapter configuration: S3, GCS, Azure Blob, DigitalOcean Spaces, or compatible S3 endpoints.
- Do not rely on local filesystem object storage for scaled workers.
- Confirm object-store read/write with a smoke artifact or checkpoint-enabled session before enabling multiple workers.
- Multi-worker scale-out requires shared object storage for checkpoints and artifacts. Helm fails closed when `roles.worker.replicas > 1` is paired with filesystem object storage, missing buckets, disabled global checkpoints, or disabled worker checkpoints.
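The multi-worker guardrail can be restated as a small predicate, which is useful when reviewing an overlay by hand. This is a sketch of the rule as described in this section, not the chart's template logic; the function name and argument names are illustrative and are not Helm value names.

```shell
# Hypothetical mirror of the fail-closed rule.
# usage: worker_scale_allowed REPLICAS STORE_KIND BUCKET GLOBAL_CHECKPOINTS WORKER_CHECKPOINTS
worker_scale_allowed() {
  replicas=$1 store=$2 bucket=$3 global_cp=$4 worker_cp=$5
  # A single worker is not subject to the shared-storage requirement.
  [ "$replicas" -gt 1 ] || return 0
  [ "$store" != "filesystem" ] || return 1   # local filesystem storage is not shared
  [ -n "$bucket" ] || return 1               # a bucket must be configured
  [ "$global_cp" = "true" ] || return 1      # global checkpoints enabled
  [ "$worker_cp" = "true" ] || return 1      # worker checkpoints enabled
  return 0
}
```

For example, `worker_scale_allowed 3 s3 artifacts true true` succeeds, while the same call with `filesystem` as the store kind fails.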
Secret Adapter/KMS¶
- Store envelope keys, BYOK material, channel credentials, database URLs, object-store credentials, gateway service tokens, and billing secrets in a provider secret manager. Use cloud-provider KMS encryption underneath those secret-manager products or private deployment overlays until a first-class KMS decrypt adapter is added.
- Use `OPEN_COWORK_CLOUD_SECRET_KEY_REF` where possible: `gcp-sm://...`, `aws-sm://...`, `azure-kv://...`, or `env:...` for platform-injected secrets.
- `public_production` rejects weak inline envelope keys; hosted deployments should use managed refs or existing Kubernetes secrets rather than Helm literal values.
- BYOK plaintext is only revealed in the worker role and only long enough to build provider runtime config.
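To make the ref schemes concrete, the sketch below classifies a value by its prefix and refuses anything that is not a ref. Both function names are hypothetical, and the real resolver may accept or reject values differently; only the four prefixes come from this checklist.

```shell
# Hypothetical classifier for the ref schemes named above.
secret_ref_scheme() {
  case "$1" in
    gcp-sm://?*)   echo gcp-sm ;;
    aws-sm://?*)   echo aws-sm ;;
    azure-kv://?*) echo azure-kv ;;
    env:?*)        echo env ;;
    *)             echo inline ;;   # anything else is treated as a literal value
  esac
}

# Example gate: refuse inline key material for a hosted tier.
require_managed_ref() {
  if [ "$(secret_ref_scheme "$1")" = "inline" ]; then
    echo "secret key must be a managed ref, not an inline value" >&2
    return 1
  fi
}
```

A deploy script could call `require_managed_ref "$OPEN_COWORK_CLOUD_SECRET_KEY_REF"` before rendering manifests for a hosted environment.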
Public URL/HTTPS¶
- Set `OPEN_COWORK_CLOUD_PUBLIC_URL` to the canonical HTTPS origin.
- Set `OPEN_COWORK_GATEWAY_PUBLIC_URL` when providers require webhook callbacks.
- Terminate TLS at ingress, load balancer, Cloud Run/App Platform, or the service mesh; internal pod/service traffic may remain private.
- Do not send desktop bearer tokens, gateway service tokens, cookies, or BYOK setup requests over non-loopback HTTP.
- For generic webhook/bridge providers, keep outbound delivery on the default public policy: HTTPS only, no embedded credentials, host allowlists for managed bridges, DNS/private-address rejection, and circuit health visible in Gateway diagnostics. Enable private/internal delivery only in explicitly risk-accepted deployments.
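The default public delivery policy can be approximated with a textual pre-check on the target URL. The sketch below is an assumption-laden illustration, not the Gateway's implementation: `delivery_url_allowed` is a hypothetical name, it checks only literal hosts (no DNS resolution, no allowlist lookup), and it rejects IPv6 literals outright to stay conservative.

```shell
# Hypothetical pre-check for outbound delivery targets.
delivery_url_allowed() {
  url=$1
  case "$url" in
    https://?*) ;;
    *) return 1 ;;                 # HTTPS only
  esac
  rest=${url#https://}
  authority=${rest%%/*}            # host[:port], possibly with userinfo
  case "$authority" in
    *@*) return 1 ;;               # embedded credentials
  esac
  host=${authority%%:*}
  case "$host" in
    localhost|127.[0-9]*|10.[0-9]*|192.168.[0-9]*|169.254.[0-9]*) return 1 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 1 ;;
    ""|\[*) return 1 ;;            # empty host, or IPv6 literal (rejected conservatively)
  esac
  return 0
}
```

A hostname that resolves to a private address still passes this check, which is why the policy described above also calls for DNS-level rejection at delivery time.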
Cloud Web Workbench¶
- Treat the browser workbench as a release-critical client, not only as an admin convenience UI.
- Cloud Web is the browser build of the desktop renderer, so its UI (including browser E2E behavior, accessibility, and performance and scale) is covered by the renderer suite: run `pnpm test:renderer` before provider rollout.
- Run `pnpm test:cloud-continuation` before provider rollout to verify the cloud control-plane behavior the browser depends on: sessions, bounded pagination, cursor validation, SSE reconnect, and redaction.
- Run `pnpm cloud:smoke` to build the production cloud bundle, including the browser renderer, and prove it imports and serves.
- The workbench route at `GET /` must return the renderer entry document with the bootstrap JSON, hashed `/assets/*` module scripts, `cache-control: no-store` on the document, and a nonce-backed `Content-Security-Policy`.
- API bootstrap endpoints such as `GET /api/config` and `GET /api/workspace` must be reachable through the deployed origin and return either authenticated metadata or an expected auth error, never a proxy/static-asset failure.
- Validate signed-out, member, admin, policy-blocked, quota-blocked, and billing-blocked states. A disabled browser control is only an ergonomic mirror; the API remains the authorization boundary.
- Test laptop and tablet widths. Thread lists, admin tables, approval/question panels, and artifact controls must not overlap or depend on desktop-only viewport assumptions.
- Downstream branding smoke checks should load the workbench with the deployed product name, logo URL, theme tokens, and managed connection labels.
Deployment Tiers¶
- Set `OPEN_COWORK_CLOUD_DEPLOYMENT_TIER=local` for laptop demos and throwaway all-in-one experiments.
- Set `OPEN_COWORK_CLOUD_DEPLOYMENT_TIER=self_host_beta` or `private_beta` for downstream pilots where the operator understands the remaining launch evidence gaps.
- Set `OPEN_COWORK_CLOUD_DEPLOYMENT_TIER=public_production` only for split-role public deployments. This tier fails startup unless the control plane is durable Postgres, object storage is provider-backed, secret/cookie material is production-strength or resolved from a managed secret ref, auth is enabled, the web role has a canonical HTTPS public URL, web does not process commands inline, and workers have checkpoints enabled.
- Use `/livez` for process liveness and `/readyz` for dependency readiness. `/healthz` remains backward-compatible, but Kubernetes readiness probes should not use it for public production.
Worker/Scheduler Scaling¶
- Enable `OPEN_COWORK_CLOUD_CHECKPOINTS_ENABLED=true` before scaling worker replicas beyond one.
- Set `OPEN_COWORK_CLOUD_SHUTDOWN_GRACE_MS` and the platform termination grace so active command loops can finish after a drain request.
- Keep worker runtime roots ephemeral for horizontally scaled Kubernetes workers unless a single-worker persistent root is intentionally configured.
- Run at least one scheduler. Multiple schedulers are safe when they use database claims.
- For Kubernetes, add HPA or KEDA in the provider overlay that owns metrics.
Gateway Scaling And Operator Auth¶
- Run one gateway replica per channel-binding shard until stream ownership and cursors are externalized. For production, use one Helm release/deployment per shard with `replicaCount: 1`; the `gateway.experimentalDistributedOwnership=true` escape hatch is for lab deployments only and does not remove the need for an explicit shard ownership design.
- Configure `OPEN_COWORK_GATEWAY_ADMIN_TOKEN` for operator endpoints in every shared or public deployment. The loopback bypass is explicit local-only: `OPEN_COWORK_GATEWAY_ALLOW_LOOPBACK_OPERATOR_BYPASS=true` and a loopback bind. Runtime rejects loopback bypass when a public URL or proxy-forwarded request is present.
- Keep `OPEN_COWORK_GATEWAY_MAX_REQUEST_BODY_BYTES` aligned with provider advertised file limits. Generic bridge/email attachment limits default to the same request-body cap.
- Set bounded network deadlines: `OPEN_COWORK_GATEWAY_CLOUD_REQUEST_TIMEOUT_MS`, `OPEN_COWORK_GATEWAY_WEBHOOK_DELIVERY_TIMEOUT_MS`, `OPEN_COWORK_GATEWAY_SMTP_TIMEOUT_MS`, and `OPEN_COWORK_GATEWAY_SHUTDOWN_DRAIN_TIMEOUT_MS`. HPA is appropriate for web CPU/memory or worker CPU/memory capacity; KEDA is appropriate for command queue depth, backlog age, or provider-native queue metrics.
- Enable PodDisruptionBudgets for production web, worker, scheduler, and gateway workloads, then use topology spread constraints so replicas are distributed across nodes and zones.
- Monitor worker heartbeat age, scheduler heartbeat age, command latency, projection lag, and lease reclaim counts.
- Managed worker pools must follow the Managed Worker Service Plane contract before they are exposed as production capacity: explicit worker identity, scoped expiring credentials, lifecycle state, durable work claims, lease-token fencing, checkpoint/artifact ownership, recovery rules, quotas, and operator runbooks.
- Cloud-connected Standalone Gateway deployments must follow the Cloud Gateway Registration contract: `external_workspace` is redacted metadata only, `edge_worker` uses managed-worker lease fencing for Cloud-owned work, and customer-hosted edge workers against managed SaaS remain deferred.
- Use the public templates under `deploy/managed-workers/` for self-hosted and managed-worker pool deployment, release evidence, and restore drills.
- The first supported managed-worker mode is control-plane-owned worker pools. Do not connect customer-hosted workers to a separate managed SaaS control plane until a separate trust review, update policy, and data-residency model are implemented.
Gateway Service Token¶
- Run the gateway as a separate deployment with a scoped service/API token.
- Store `OPEN_COWORK_GATEWAY_SERVICE_TOKEN` in the platform secret manager.
- Rotate gateway tokens by issuing a new token, updating the deployment secret, restarting the gateway, then revoking the old token.
- The gateway token authenticates the gateway process only; inbound channel actor identity and approval authority are resolved separately by cloud.
- Cloud records the API-token id that last claimed each channel delivery. A gateway-scoped token can list, retry, or dead-letter only deliveries last claimed by that same token; channel admins retain broader Cloud recovery access. Run one gateway token per provider shard or deployment instance so retry/dead-letter ownership is auditable.
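The delivery-ownership rule can be read as a simple predicate. The sketch below is illustrative only: `can_manage_delivery` and the caller kinds `gateway` and `channel-admin` are hypothetical labels for the two cases described above, not identifiers from the Cloud API.

```shell
# Hypothetical authorization predicate for list/retry/dead-letter operations.
# usage: can_manage_delivery CALLER_TOKEN_ID CALLER_KIND LAST_CLAIMED_BY
can_manage_delivery() {
  caller_id=$1 caller_kind=$2 last_claimed_by=$3
  case "$caller_kind" in
    channel-admin) return 0 ;;     # broader Cloud recovery access
    gateway)
      # A gateway-scoped token may act only on deliveries it last claimed.
      if [ -n "$caller_id" ] && [ "$caller_id" = "$last_claimed_by" ]; then
        return 0
      fi
      return 1 ;;
    *) return 1 ;;                 # unknown caller kinds are denied
  esac
}
```

Because the check compares token ids, sharing one token across shards would make every shard able to retry every other shard's deliveries, which is the auditability problem that one token per shard avoids.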
Provider Webhook Signing¶
- Public webhook providers require provider signing secrets or timestamped HMAC signatures.
- Slack uses its signing secret, email uses an inbound shared secret, and the generic webhook provider signs the raw body with `OPEN_COWORK_GATEWAY_WEBHOOK_SHARED_SECRET` using `x-open-cowork-gateway-webhook-timestamp` and `x-open-cowork-gateway-webhook-signature`.
- The fake provider is local/demo-only. Public demo exposure requires the deliberate `OPEN_COWORK_GATEWAY_ALLOW_PUBLIC_FAKE_PROVIDER=true` override and must not be used for production traffic.
- Gateway metrics, diagnostics, and delivery operator endpoints require an admin token unless the process is explicitly running in local loopback bypass mode.
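For the generic webhook provider, a timestamped HMAC scheme typically looks like the sketch below. The exact canonical string and encoding are not specified in this checklist, so treat them as assumptions: the example signs `<timestamp>.<raw body>` with HMAC-SHA256 and emits lowercase hex. The function names are hypothetical, and the sketch relies on the `openssl` CLI being installed.

```shell
# Assumption: signed string is "<timestamp>.<raw body>", signature is hex HMAC-SHA256.
# Confirm the real format against the Gateway provider documentation.
hmac_sha256_hex() {
  # $1 = secret, $2 = message. Passing the secret as an argument exposes it to
  # local process listings; acceptable for a sketch, not for production.
  printf '%s' "$2" | openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

sign_webhook() {
  # $1 = unix timestamp, $2 = raw request body
  hmac_sha256_hex "$OPEN_COWORK_GATEWAY_WEBHOOK_SHARED_SECRET" "$1.$2"
}

timestamp_is_fresh() {
  # $1 = header timestamp, $2 = current time, $3 = allowed skew in seconds
  age=$(( $2 - $1 ))
  [ "$age" -ge "-$3" ] && [ "$age" -le "$3" ]
}

# Sender side (illustrative):
#   ts=$(date +%s); body='{"text":"hello"}'
#   printf 'x-open-cowork-gateway-webhook-timestamp: %s\n' "$ts"
#   printf 'x-open-cowork-gateway-webhook-signature: %s\n' "$(sign_webhook "$ts" "$body")"
```

Binding the timestamp into the signed string is what lets a receiver reject replays: a captured request stops verifying once its timestamp falls outside the allowed skew.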
Trusted Header Auth¶
- `cloud.auth.mode=header` is for deployments behind a trusted identity proxy.
- Public deployments require `OPEN_COWORK_CLOUD_HEADER_AUTH_SECRET` and signed timestamped identity headers. Unsigned header auth is only for local demos.
- Header-auth role headers must map to `owner`, `admin`, or `member`; unknown roles are rejected rather than treated as privileged users.
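The fail-closed role rule is small enough to show directly. This sketch uses a hypothetical `map_header_role` function and assumes role values are matched case-sensitively, which this checklist does not state.

```shell
# Hypothetical role mapping: only the three known roles pass through.
# Assumption: matching is case-sensitive.
map_header_role() {
  case "$1" in
    owner|admin|member) printf '%s\n' "$1" ;;
    *) echo "rejected role header value" >&2; return 1 ;;
  esac
}
```

The important property is that the default branch rejects: an unexpected value never falls through to a privileged or even a default role.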
Quotas/Rate Limits¶
- Configure per-org session, worker, prompt, API, and gateway delivery limits before public hosting.
- Keep billing disabled or stubbed for OSS self-host; self-hosted use should work with no billing provider or the stub billing provider.
- Hosted SaaS should gate new execution on subscription state while preserving read access and export paths.
- Rate limits should return clear 429 responses with `Retry-After`; billing gates should return clear 402 responses.
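From a client's point of view, the two status codes call for different handling: a 429 is retried after the advertised delay, while a 402 is not retried automatically. The sketch below uses a hypothetical `retry_delay_seconds` helper and assumes `Retry-After` carries a number of seconds; an HTTP-date value falls back to a caller-supplied default.

```shell
# Hypothetical client-side helper.
# usage: retry_delay_seconds STATUS RETRY_AFTER FALLBACK_SECONDS
# Prints the delay for a retriable rejection; returns 1 when no automatic retry applies.
retry_delay_seconds() {
  case "$1" in
    429)
      case "${2:-}" in
        ""|*[!0-9]*) echo "$3" ;;   # missing or non-numeric (for example an HTTP date)
        *) echo "$2" ;;
      esac ;;
    *) return 1 ;;                   # 402 and everything else: do not auto-retry
  esac
}
```

For example, `retry_delay_seconds 429 12 30` prints `12`, and `retry_delay_seconds 402 "" 30` returns a non-zero status so the caller can surface the billing state instead of looping.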
OTLP/Logging¶
- Use JSON logs in production.
- Configure `OPEN_COWORK_CLOUD_OTLP_ENDPOINT`, scrape authenticated `GET /api/metrics` for Cloud where Prometheus is used, and scrape Gateway `/metrics` with the Gateway admin token.
- Include request ids, org ids, session ids, run ids, worker ids, scheduler ids, and gateway delivery ids.
- Redact BYOK keys, API tokens, cookies, OAuth tokens, webhook secrets, database URLs, object-store signed URLs, and local paths.
- Keep deployable metric, dashboard, and alert assets under `deploy/observability/` in sync with the production SLOs.
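As an illustration of the redaction requirement, the sketch below masks two of the listed categories in a log line: bearer tokens and Postgres URLs. It is a hypothetical post-processing filter (`redact_line` is not a project function) and covers only these two patterns; the product's own redaction is expected to run before logs are emitted.

```shell
# Hypothetical log filter covering bearer tokens and Postgres URLs only.
redact_line() {
  printf '%s\n' "$1" | sed \
    -e 's#\(Bearer \)[A-Za-z0-9._~+/=-]*#\1[REDACTED]#g' \
    -e 's#postgres\(ql\)\{0,1\}://[^ ]*#[REDACTED_DATABASE_URL]#g'
}
```

A filter like this is mainly useful as a spot check on exported evidence: if it changes any line, something upstream leaked a value that should already have been redacted.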
Backups/Restore¶
- Back up Postgres and object storage on the same retention policy.
- For Standalone Gateway team and enterprise deployments, run Postgres with verified TLS and enable the lease-gated retention daemon for sessions, artifacts, audit events, and completed/dead jobs.
- Restore Postgres first, then object-store artifacts/checkpoints for the same point in time.
- Start web with workers at zero, verify projections and session lists, start one worker, run a smoke prompt, then start scheduler and gateway.
- Verify channel deliveries resume from durable cursors without duplicates.
- Follow `docs/runbooks/backup-restore.md` and keep the latest redacted drill evidence in `docs/runbooks/restore-drill-report.md` or a downstream private operations repository.
Deployment Validation¶
Run static and tool-backed configuration checks:
In CI or release qualification, require Docker and Helm:
The validator checks:
- Compose config for `docker-compose.cloud.yml`, `docker-compose.cloud.split.yml`, and `docker-compose.cloud-gateway.yml`.
- Helm lint/render for cloud and gateway charts.
- Helm fail-closed behavior for unsafe public cloud auth, public gateway metrics without admin auth, and generic webhook ingress without a shared secret.
- Helm image pinning, no-`latest` policy, and multi-worker checkpoint/object store guardrails.
- Presence of provider recipes, this checklist, and the managed BYOK SaaS runbook.
Load, Soak, And Launch Gates¶
Before local/self-host beta, private-beta, or public-beta rollout, define the exact launch profile and run the load/soak harness in strict mode. The committed target profiles are in deploy/load/launch-readiness-targets.json, and the current accepted public launch tier is recorded in deploy/load/launch-evidence-matrix.json:
- `local-self-host-beta` for OSS self-host and local reference deployments. This is the only launch tier the public repo currently claims.
- `private-beta` for design-partner and internal managed BYOK rollout.
- `public-beta` for the first broader hosted BYOK rollout.
- `enterprise-scale` for large downstream or managed org readiness after public-beta evidence is green.
Do not use public templates alone to claim private hosted beta, public hosted beta, general availability, or enterprise-scale readiness. Those tiers need environment-specific private operations evidence for load, soak, failover, restore, security, support, and cost/SLO behavior.
Set evidence metadata for every plan, load, soak, and failover drill run:
export OPEN_COWORK_EVIDENCE_COMMIT_SHA="$(git rev-parse HEAD)"
export OPEN_COWORK_EVIDENCE_CLOUD_IMAGE_DIGEST=sha256:REPLACE_WITH_CLOUD_IMAGE_DIGEST
export OPEN_COWORK_EVIDENCE_GATEWAY_IMAGE_DIGEST=sha256:REPLACE_WITH_GATEWAY_IMAGE_DIGEST
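Because the digest values above are placeholders, it is easy to start a run with one of them still unreplaced. A small format check catches that early. This is a sketch with a hypothetical `digest_is_valid` helper; it validates only the shape of the value (`sha256:` followed by 64 lowercase hex characters), not that the digest exists in a registry.

```shell
# Hypothetical format check for evidence digests.
digest_is_valid() {
  hex=${1#sha256:}
  [ "$hex" != "$1" ] || return 1     # the "sha256:" prefix must be present
  [ ${#hex} -eq 64 ] || return 1     # SHA-256 is 64 hex characters
  case "$hex" in
    *[!0-9a-f]*) return 1 ;;         # lowercase hex only
  esac
  return 0
}
```

Calling `digest_is_valid "$OPEN_COWORK_EVIDENCE_CLOUD_IMAGE_DIGEST"` before the plan step fails fast on a leftover `REPLACE_WITH_...` placeholder.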
Generate the planned route matrix:
OPEN_COWORK_LOAD_PROFILE=local-self-host-beta \
OPEN_COWORK_LOAD_CLOUD_TOKEN=... \
OPEN_COWORK_LOAD_GATEWAY_ADMIN_TOKEN=... \
OPEN_COWORK_LOAD_BYOK_PROVIDER=anthropic \
OPEN_COWORK_LOAD_INCLUDE_MUTATIONS=true \
OPEN_COWORK_LOAD_INCLUDE_SSE=true \
OPEN_COWORK_LOAD_OPERATOR_CHECKS=true \
OPEN_COWORK_LOAD_STRICT=true \
pnpm deploy:load:plan
Run the short load gate:
OPEN_COWORK_LOAD_CLOUD_URL=https://cowork.example.com \
OPEN_COWORK_LOAD_GATEWAY_URL=https://gateway.example.com \
OPEN_COWORK_LOAD_CLOUD_TOKEN=... \
OPEN_COWORK_LOAD_GATEWAY_ADMIN_TOKEN=... \
OPEN_COWORK_LOAD_BYOK_PROVIDER=anthropic \
OPEN_COWORK_LOAD_INCLUDE_MUTATIONS=true \
OPEN_COWORK_LOAD_INCLUDE_SSE=true \
OPEN_COWORK_LOAD_OPERATOR_CHECKS=true \
OPEN_COWORK_LOAD_PROFILE=local-self-host-beta \
pnpm deploy:load:strict
Run the long soak gate after the load gate is green:
OPEN_COWORK_LOAD_CLOUD_URL=https://cowork.example.com \
OPEN_COWORK_LOAD_GATEWAY_URL=https://gateway.example.com \
OPEN_COWORK_LOAD_CLOUD_TOKEN=... \
OPEN_COWORK_LOAD_GATEWAY_ADMIN_TOKEN=... \
OPEN_COWORK_LOAD_BYOK_PROVIDER=anthropic \
OPEN_COWORK_LOAD_INCLUDE_MUTATIONS=true \
OPEN_COWORK_LOAD_INCLUDE_SSE=true \
OPEN_COWORK_LOAD_OPERATOR_CHECKS=true \
OPEN_COWORK_LOAD_PROFILE=local-self-host-beta \
pnpm deploy:soak:strict
The harness writes JSON and Markdown reports under .open-cowork-test/launch-readiness/ by default. Attach those reports, dashboard evidence, cost notes, known limits, and final smoke results to docs/runbooks/launch-readiness-report.md or a downstream private operations repository. Each report records the command name, commit SHA, image digests, sanitized environment profile, dates, duration, and pass/fail or go/no-go status. Use pnpm deploy:launch:validate to verify the committed gate artifacts stay in sync.
After the ordinary load and soak gates pass with zero unexpected quota rejections, run a deliberate quota-pressure pass with OPEN_COWORK_LOAD_EXPECT_QUOTA_REJECTIONS=true and a low downstream quota overlay. That pass should produce 429/402-style rejections without 5xx spikes, worker crashes, or gateway delivery wedges.
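The pass criterion for the quota-pressure run can be expressed over the observed HTTP status codes. The sketch below is a simplified reading of that criterion: `quota_pressure_ok` is a hypothetical helper, and it treats any 5xx as a failure, which is stricter than "no 5xx spikes". Worker crashes and delivery wedges need separate evidence and are out of scope here.

```shell
# Hypothetical evaluation of a quota-pressure run.
# usage: quota_pressure_ok STATUS...
quota_pressure_ok() {
  rejections=0
  for status in "$@"; do
    case "$status" in
      429|402) rejections=$((rejections + 1)) ;;
      5[0-9][0-9]) return 1 ;;       # any 5xx fails the pass (stricter than "no spikes")
    esac
  done
  # The pass is only meaningful if quota rejections were actually produced.
  [ "$rejections" -gt 0 ]
}
```

For example, `quota_pressure_ok 200 429 200 402` succeeds, while a run with no rejections at all fails because the low quota overlay evidently was not exercised.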
Runtime Smoke Checks¶
Before promoting a build, run the OpenCode compatibility proof:
The proof validates the same compatibility registry exported in runtime diagnostics and fails closed on missing bundled OpenCode version metadata, unknown/private assumptions, source-version drift, missing proving tests, and undocumented shim or blocked-policy entries.
For a local Compose deployment:
For any already-running provider deployment:
OPEN_COWORK_SMOKE_CLOUD_URL=https://cowork.example.com \
OPEN_COWORK_SMOKE_GATEWAY_URL=https://gateway.example.com \
pnpm deploy:smoke
For the GCP reference deployment, run a read-only project/API preflight before rollout:
After rollout, the GCP infra smoke can combine the Cloud Web smoke with Cloud Storage and Secret Manager checks:
OPEN_COWORK_GCP_PROJECT=PROJECT \
OPEN_COWORK_GCP_BUCKET=OPEN_COWORK_BUCKET \
OPEN_COWORK_GCP_SECRET_REF=gcp-sm://projects/PROJECT/secrets/open-cowork-cloud-secret-key/versions/latest \
OPEN_COWORK_SMOKE_CLOUD_URL=https://cowork.example.com \
pnpm deploy:gcp:smoke
For the Desktop cloud-sync gate against the same deployed Cloud environment:
OPEN_COWORK_DESKTOP_SMOKE_CLOUD_URL=https://cowork.example.com \
OPEN_COWORK_DESKTOP_SMOKE_ADMIN_TOKEN=... \
pnpm deploy:desktop:smoke
This smoke uses the Desktop main-process cloud adapter and cache path, not a separate test-only client. It validates Desktop OIDC metadata when configured, bearer-auth HTTP/SSE, Desktop-to-Web and Web-to-Desktop session continuation, prompt/abort routing, read-only offline cache fallback, local workspace isolation, and ephemeral Desktop token revocation.
For the Gateway gate against the same deployed Cloud environment:
OPEN_COWORK_GATEWAY_SMOKE_CLOUD_URL=https://cowork.example.com \
OPEN_COWORK_GATEWAY_SMOKE_GATEWAY_URL=https://gateway.example.com \
OPEN_COWORK_GATEWAY_SMOKE_ADMIN_TOKEN=... \
OPEN_COWORK_GATEWAY_SMOKE_GATEWAY_ADMIN_TOKEN=... \
pnpm deploy:gateway:smoke
This smoke validates both managed and self-host Gateway paths. It checks the managed Gateway health/readiness and operator endpoint protection, creates temporary cloud channel state, proves a gateway-scoped token cannot administer channels or mint tokens, runs a loopback fake-provider Gateway against the deployed Cloud URL, verifies inbound prompt routing, session SSE rendering, approval interaction routing, async delivery, retry/dead-letter controls, and ephemeral token revocation.
The Gateway smoke should also confirm /diagnostics.deliveryOperator reports the enabled channelBindingIds, and that /deliveries is scoped to those local bindings. Provider-level dashboard and alert assets must include open_cowork_gateway_provider_state, open_cowork_gateway_provider_delivery_retries_total, and open_cowork_gateway_provider_delivery_dead_letters_total before public provider traffic is enabled.
For the full Web/Desktop/Gateway continuation parity gate:
OPEN_COWORK_CONTINUATION_SMOKE_CLOUD_URL=https://cowork.example.com \
OPEN_COWORK_CONTINUATION_SMOKE_ADMIN_TOKEN=... \
OPEN_COWORK_CONTINUATION_SMOKE_REQUIRE_RICH_PROJECTION=true \
pnpm deploy:continuation:smoke
This is the production promise gate for the synced product. It checks Cloud Web bootstrap and request correlation, creates short-lived Web/Desktop/Gateway tokens, proves Web-created, Desktop-created, and Gateway-created sessions can be continued by the other surfaces, verifies durable projection parity after reload/replay, resolves approval and question state across surfaces, checks artifact metadata, exercises concurrent prompts on one cloud thread, validates stale Desktop cursor hydration, verifies Gateway channel rendering, and revokes all smoke tokens.
For operator-only readiness checks:
OPEN_COWORK_SMOKE_OPERATOR_CHECKS=true \
OPEN_COWORK_SMOKE_CLOUD_TOKEN=... \
OPEN_COWORK_SMOKE_GATEWAY_ADMIN_TOKEN=... \
pnpm deploy:smoke
The smoke script validates cloud /healthz//livez, the Cloud Web Workbench at GET /, workbench CSP/bootstrap markers, cloud API bootstrap endpoint reachability, gateway /health, and gateway /ready. Operator mode also checks cloud runtime/heartbeat/metrics endpoints and gateway metrics, and now fails closed unless the required operator tokens are present.
For production release evidence, use the strict wrapper:
OPEN_COWORK_SMOKE_CLOUD_URL=https://cowork.example.com \
OPEN_COWORK_SMOKE_GATEWAY_URL=https://gateway.example.com \
OPEN_COWORK_SMOKE_ADMIN_TOKEN=... \
OPEN_COWORK_SMOKE_GATEWAY_ADMIN_TOKEN=... \
pnpm deploy:smoke:strict
Strict smoke requires HTTPS for non-loopback URLs, authenticated Cloud and Gateway operator checks, Cloud runtime status, worker heartbeat visibility, Desktop/Web mutation coverage with token revocation rejection, managed Gateway health/readiness plus operator coverage, Gateway mutation/retry/dead-letter coverage with token revocation rejection, and Continuation rich projection with all ephemeral tokens revoked.
For local release scenario evidence, run:
This writes redacted JSON and Markdown evidence under .open-cowork-test/live-scenarios/. The default suite lives in deploy/scenarios/local-desktop-scenarios.json and covers six stable local scenarios: runtime doctor/diagnostics, cloud projection fence contracts, capability bundle dry-run policy, desktop pairing command safety, and workspace sandbox portability policy and lifecycle planning, plus control-plane identity, permission, file-session, headless loopback, and semantic UI status/action contracts. Scenario evidence is additive to deterministic tests; it does not replace unit, projection, boundary, smoke, or deployment checks. Logs, command output, command argv, paths, and token-like values are redacted before evidence is written.
CI and release preflight run this stable suite so evidence paths are produced for every release-eligible build.
For OpenCode runtime portability evidence, run:
The proof starts two isolated app-managed OpenCode runtimes, creates a no-reply session, copies the portable runtime/workspace/artifact metadata, verifies the restored SDK view, and emits a redacted sandboxEnginePreflight result. A missing local Docker or Apple Container engine is reported as sandbox-runtime-engine-unavailable; it is not treated as a successful sandbox runtime start.
For a real sandboxed OpenCode session proof, configure an OpenCode runtime image and run:
OPEN_COWORK_SANDBOX_IMAGE=open-cowork/opencode:local \
pnpm proof:sandbox:opencode-session -- --json --strict --image-sha256 sha256:...
The proof mounts only a temp proof harness, workspace, and runtime home, starts OpenCode inside the sandbox, creates a no-reply session through the OpenCode HTTP API, verifies the prompt message was recorded, then exits. Without --strict, CI and release preflight record typed redacted evidence for missing engine/image prerequisites. sandbox-runtime-engine-unavailable, sandbox-runtime-image-not-configured, and sandbox-runtime-policy-blocked are not successful sandbox session proofs.
For local operator runtime checks without launching the Desktop renderer, run:
pnpm headless:host check
pnpm headless:host start
pnpm headless:host start --detached
pnpm headless:host status
pnpm headless:host doctor
The headless host command is loopback-only in the public repo. Remote, LAN, and tunnel behavior must fail closed until the matching topology authority, pairing, recovery, and audit evidence exists. start is foreground by default; start --detached returns only after a child process has written recoverable state, and status clears stale start state when that process no longer exists. stop signals a running loopback start process by recorded PID when needed and clears the redacted product state.
Provider Recipe Contract¶
Provider recipes under deploy/gcp, deploy/aws, deploy/azure, and deploy/digitalocean must stay thin. They should define:
- image repository plus immutable release tag or digest,
- public HTTPS origins,
- OIDC or trusted header auth,
- Postgres control-plane URL,
- object-store adapter settings,
- secret manager/KMS references,
- worker/scheduler replica counts,
- HPA or KEDA policy, PodDisruptionBudgets, and topology spread constraints,
- gateway service token and provider signing secrets,
- OTLP/logging endpoints,
- backup/restore ownership.
They must not require changes to session, runtime, projection, gateway, OpenCode SDK, billing, or BYOK core code.
Self-host OSS recipes must preserve a billing-free path: cloud.billing.enabled=false with cloud.billing.provider=none, or the stub provider when operators want visible billing states without payment-provider dependencies. Managed SaaS billing belongs in downstream hosting overlays, not in the self-host contract.