# Prebuilt Asset Workflow
Use this workflow when you want to build Nova storage assets once in CI and reuse them across jobs/repos in read-only mode.
## Quick setup checklist
- Pin the reusable workflow to a release tag or commit SHA (do not use `@master` in production).
- Pick one producer source:
    - an existing `manifest.json` (`manifest_path` or `manifest_uri`), or
    - generate the manifest in the workflow (`dbt_generate_manifest: true`).
- Choose a stable `storage_instance_id` (consumers must use the same value).
- Choose `models_distribution_mode`: `none` for local/pre-warmed models, `publish_only` for optional consumer opt-in, `publish_and_bootstrap` for bootstrap-driven model hydration.
- If publishing remotely, set `publish_targets` and matching auth secrets.
- Trigger the workflow and confirm success.
- Use the produced stable bootstrap alias in the consumer env (`DBT_NOVA_BOOTSTRAP_URI`).
## What this solves
- Removes repeated index builds on consumer jobs.
- Makes consumer startup deterministic.
- Enforces a strict contract (`storage_instance_id` + manifest content hash).
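The content-hash half of the contract can be pictured with a plain checksum. This is an illustration only: Nova computes its manifest content hash internally, and the exact scheme may differ from the sha256-over-bytes used here.

```bash
# Illustration only: Nova's internal manifest content hash may use a
# different scheme; this just shows the "same content, any path" idea.
manifest_hash() {
  sha256sum "$1" | awk '{print $1}'
}

# Identical content at different paths hashes the same, which is what
# the contract permits; any content difference changes the hash.
printf '{"nodes":{}}' > /tmp/producer-manifest.json
printf '{"nodes":{}}' > /tmp/consumer-manifest.json
if [ "$(manifest_hash /tmp/producer-manifest.json)" = "$(manifest_hash /tmp/consumer-manifest.json)" ]; then
  echo "manifest content matches"
fi
```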
## v1 boundaries
- Producer workflow + GitHub Artifacts are supported.
- Optional models distribution is controlled by `models_distribution_mode`.
- Consumers are read-only (`DBT_NOVA_STORAGE_READ_ONLY=true`) and do not fall back to rebuilding.
- Optional S3/GCS/DBFS publish targets are supported and disabled by default.
## Producer (build once)
Create a workflow in the downstream repo that calls Nova's reusable producer.
```yaml
name: Build Nova Assets

on:
  workflow_dispatch:

jobs:
  build_nova_assets:
    # Pin to a release tag or commit SHA
    uses: joe-broadhead/dbt-nova/.github/workflows/nova-build-assets.yml@v0.0.2
    with:
      manifest_path: target/manifest.json
      storage_instance_id: analytics-prod
      installer_ref: v0.0.2
      installer_install_mode: auto
      artifact_name_prefix: analytics-prod
      retention_days: 14
      models_distribution_mode: none
      publish_targets: ""
```
Alternative producer inputs:

- `manifest_uri` (instead of `manifest_path`)
- `dbt_generate_manifest: true` + structured invocation (`dbt_command_args_json`, optional `dbt_executable`, optional `dbt_allow_unsafe_executable`)
- `dbt_generate_manifest: true` + trusted shell invocation (`dbt_command`)
- `dbt_env_json` (JSON object of non-secret env vars exported before dbt invocation)
- `dbt_secret_env_map_json` (JSON object mapping env var names to secret names)
- `models_distribution_mode` (`none`, `publish_only`, `publish_and_bootstrap`)
- workflow_call secret `DBT_NOVA_SECRET_BUNDLE_JSON` (optional JSON object of `secret-name -> secret-value` entries for cross-owner reusable workflow calls)
Structured mode is the recommended default:

- `dbt_command_args_json` must be a JSON array of strings.
- The workflow executes `[dbt_executable, *dbt_command_args_json]` with no shell interpolation.
- `dbt_executable` defaults to `dbt`.
- By default `dbt_executable` must resolve to `dbt`/`dbt.exe`; set `dbt_allow_unsafe_executable: true` only in trusted contexts.
Trusted shell mode is still available for advanced cases:

- `dbt_command` is executed as `bash -lc "<command>"` inside the caller repository checkout.
- Treat it as a trusted command surface and only use it in trusted repos/branches.
- `dbt_command` and `dbt_command_args_json` are mutually exclusive.
When using `dbt_generate_manifest: true`, prefer this structured pattern so the workflow works across Databricks, BigQuery, DuckDB, and mixed profiles:
```yaml
jobs:
  build_nova_assets:
    uses: joe-broadhead/dbt-nova/.github/workflows/nova-build-assets.yml@v0.0.2
    with:
      dbt_generate_manifest: true
      dbt_command_args_json: >-
        ["compile","--target","prod"]
      dbt_env_json: >-
        {"DBT_TARGET":"prod","DBT_PROFILES_DIR":"./"}
      dbt_secret_env_map_json: >-
        {"DBT_ACCESS_TOKEN":"DBT_ACCESS_TOKEN","DBT_BIGQUERY_KEYFILE_JSON":"DBT_BIGQUERY_KEYFILE_JSON"}
      storage_instance_id: analytics-prod
    secrets:
      DBT_NOVA_SECRET_BUNDLE_JSON: ${{ secrets.DBT_NOVA_SECRET_BUNDLE_JSON }}
```
DBFS publish wrapper example:
```yaml
jobs:
  build_nova_assets:
    uses: joe-broadhead/dbt-nova/.github/workflows/nova-build-assets.yml@v0.0.2
    with:
      dbt_generate_manifest: true
      dbt_command_args_json: >-
        ["compile","--target","prod"]
      dbt_env_json: >-
        {"DBT_TARGET":"prod","DBT_PROFILES_DIR":"./","DATABRICKS_HOST":"https://<workspace>.cloud.databricks.com"}
      dbt_secret_env_map_json: >-
        {"DBT_ACCESS_TOKEN":"DBT_ACCESS_TOKEN","DATABRICKS_ACCESS_TOKEN":"DBT_ACCESS_TOKEN"}
      storage_instance_id: analytics-prod
      installer_ref: v0.0.2
      installer_install_mode: release
      publish_targets: dbfs
      publish_dbfs_prefix: dbfs:/FileStore/projects/my-project/nova-assets/prod
      publish_dry_run: false
      models_distribution_mode: none
    secrets:
      DBT_NOVA_SECRET_BUNDLE_JSON: ${{ secrets.DBT_NOVA_SECRET_BUNDLE_JSON }}
```
For DBFS publish with `publish_dry_run: false`, `DATABRICKS_HOST` and `DATABRICKS_ACCESS_TOKEN` must be present at publish time. The example above wires both through `dbt_env_json` + `dbt_secret_env_map_json`.
Notes:

- `dbt_env_json` values are plain strings.
- `dbt_secret_env_map_json` values are looked up in this order: 1) keys in `DBT_NOVA_SECRET_BUNDLE_JSON` (when provided), 2) inherited workflow secrets (`secrets: inherit`, same-owner/org calls).
- Missing mapped secrets fail fast before dbt invocation.
- Use trusted `dbt_command` only when you explicitly need shell semantics.
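One way to assemble the bundle payload is with `jq`. The sketch below is hypothetical: the two key names are examples, and the commented `gh secret set` step is one possible way to store the result, not something Nova requires.

```bash
# Hypothetical sketch: build the secret-name -> secret-value bundle from
# individual values, then store it as a single GitHub secret.
BUNDLE="$(jq -n \
  --arg token "$DBT_ACCESS_TOKEN" \
  --arg keyfile "$DBT_BIGQUERY_KEYFILE_JSON" \
  '{DBT_ACCESS_TOKEN: $token, DBT_BIGQUERY_KEYFILE_JSON: $keyfile}')"

# Store it once; dbt_secret_env_map_json then maps env var names to these keys.
# gh secret set DBT_NOVA_SECRET_BUNDLE_JSON --body "$BUNDLE"
echo "$BUNDLE" | jq -r 'keys[]'
```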
Secret setup patterns:

| Call pattern | How secrets are resolved |
|---|---|
| Same org/user, reusable workflow call | `secrets: inherit` can expose caller secrets directly |
| Cross-owner call or strict least-privilege | Pass one `DBT_NOVA_SECRET_BUNDLE_JSON` secret and map keys via `dbt_secret_env_map_json` |
| No dbt manifest generation | `dbt_secret_env_map_json` is not required |
Publish target auth requirements:

| Target | Required credentials |
|---|---|
| `s3` | Standard AWS auth env (`AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` / session token or OIDC role) |
| `gcs` | Token via one of `DBT_NOVA_GCP_ACCESS_TOKEN`, `DBT_NOVA_BIGQUERY_ACCESS_TOKEN`, `GCP_ACCESS_TOKEN`, `GOOGLE_OAUTH_ACCESS_TOKEN` |
| `dbfs` | `DATABRICKS_HOST` and `DATABRICKS_ACCESS_TOKEN` |
Common downstream pattern: keep a repo-local `workflow_dispatch` wrapper and call this reusable workflow from it. This lets each repo set its own target defaults, storage instance naming, and publish prefixes without forking Nova's workflow.
The producer emits:

- storage artifact (required)
- manifest artifact (`manifest.json`, always exported)
- metadata contract artifact (`nova-build-metadata.json`, required)
- models artifact (optional when `models_distribution_mode != none`)
- bootstrap contract artifact(s) when remote publish is configured
- publish summary artifact (`artifact_name_publish_summary`) with published URIs by target
When remote publish is enabled, Nova produces two bootstrap paths per target:

- versioned bootstrap: immutable, manifest-hash-specific, useful for rollback/debugging
- stable bootstrap alias: mutable latest pointer derived from `storage_instance_id`, recommended for consumers
Models distribution modes:

- `none`: do not package or publish the models artifact; bootstrap omits `models_artifact_uri`.
- `publish_only`: package/publish the models artifact, but bootstrap still omits `models_artifact_uri`.
- `publish_and_bootstrap`: package/publish the models artifact and include `models_artifact_uri` in bootstrap.
Consumer impact by mode:
| `models_distribution_mode` | Models archive published | Bootstrap includes models URI | Consumer expectation |
|---|---|---|---|
| `none` | No | No | Use local model cache (`DBT_NOVA_EMBEDDINGS_CACHE_DIR`) or on-demand model download |
| `publish_only` | Yes | No | Optional manual consumer opt-in via `DBT_NOVA_MODELS_ARTIFACT_URI` |
| `publish_and_bootstrap` | Yes | Yes | Bootstrap-native remote model hydration |
## Optional remote publish targets
Use this when consumers should pull artifacts from cloud storage instead of GitHub Actions artifacts.
Workflow inputs:

- `publish_targets`: comma-separated list from `s3`, `gcs`, `dbfs`
- `publish_s3_prefix`: e.g. `s3://my-bucket/nova-assets/prod`
- `publish_gcs_prefix`: e.g. `gs://my-bucket/nova-assets/prod`
- `publish_dbfs_prefix`: e.g. `dbfs:/FileStore/projects/my-project/nova-assets/prod`
- `publish_dry_run`: `true` to compute publish URIs without network uploads
- `installer_repository` / `installer_ref`: advanced override for which repo/ref is used to install `dbt-nova`; defaults to `joe-broadhead/dbt-nova` plus the resolved reusable-workflow ref when available
- `installer_install_mode`: `auto` (default; try the release binary, then fall back to a source build), `release` (release binary only), or `source` (always build from source)
Auth per target:

- `s3`: standard AWS env credentials used by the `aws` CLI
- `gcs`: one of `DBT_NOVA_GCP_ACCESS_TOKEN`, `DBT_NOVA_BIGQUERY_ACCESS_TOKEN`, `GCP_ACCESS_TOKEN`, `GOOGLE_OAUTH_ACCESS_TOKEN` (or a gcloud ADC token)
- `dbfs`: `DATABRICKS_HOST` and `DATABRICKS_ACCESS_TOKEN`
Installer mode guidance:

- Keep `installer_install_mode: auto` as the default.
- Use `installer_install_mode: release` with a release tag ref (for example `installer_ref: v0.0.2` or newer) to minimize runtime on compatible runners.
- Use `installer_install_mode: source` when you need an unreleased commit SHA or your runner image is incompatible with the prebuilt binary (for example older glibc environments).
Published object naming is deterministic:

- storage: `<prefix>/<artifact_name_storage>.tar.gz`
- manifest: `<prefix>/<artifact_name_manifest>.json`
- metadata: `<prefix>/<artifact_name_metadata>.json`
- bootstrap (versioned): `<prefix>/<artifact_name_bootstrap>.json`
- bootstrap (stable alias): `<prefix>/<storage_instance_id>-latest-bootstrap.json`
- models (optional): `<prefix>/<artifact_name_models>.tar.gz`
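Because the naming is deterministic, a consumer can derive the stable alias URI locally instead of parsing workflow outputs. A minimal sketch (the prefix and instance id below are the example values used throughout this page):

```bash
# Derive the stable bootstrap alias URI from the publish prefix and
# storage_instance_id, per the naming scheme above.
stable_bootstrap_uri() {
  printf '%s/%s-latest-bootstrap.json\n' "$1" "$2"
}

stable_bootstrap_uri "dbfs:/FileStore/projects/my-project/nova-assets/prod" "analytics-prod"
# -> dbfs:/FileStore/projects/my-project/nova-assets/prod/analytics-prod-latest-bootstrap.json
```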
Producer outputs include:

- `published_targets` (comma-separated successful targets)
- `published_bootstrap_latest_uris` (stable alias URIs keyed by target)
- `artifact_name_publish_summary` (artifact containing `published_*_uris` JSON payloads)
- legacy `published_*_uris` outputs are deprecated and return `{}` for compatibility
Example (extract the DBFS stable bootstrap alias from the publish summary artifact):

```bash
PUBLISH_SUMMARY_DIR="$(mktemp -d)"
gh run download "$GITHUB_RUN_ID" --name "$ARTIFACT_NAME_PUBLISH_SUMMARY" --dir "$PUBLISH_SUMMARY_DIR"
PUBLISH_SUMMARY_JSON="$(find "$PUBLISH_SUMMARY_DIR" -type f -name '*.json' | head -n 1)"
BOOTSTRAP_URI="$(jq -r '.published_bootstrap_latest_uris.dbfs' "$PUBLISH_SUMMARY_JSON")"
```
Example publish-summary JSON shape:
```json
{
  "published_storage_uris": {
    "dbfs": "dbfs:/FileStore/projects/my-project/nova-assets/prod/analytics-storage-<run>-<manifest>.tar.gz"
  },
  "published_manifest_uris": {
    "dbfs": "dbfs:/FileStore/projects/my-project/nova-assets/prod/analytics-manifest-<run>-<manifest>.json"
  },
  "published_metadata_uris": {
    "dbfs": "dbfs:/FileStore/projects/my-project/nova-assets/prod/analytics-storage-<run>-<manifest>-metadata.json"
  },
  "published_bootstrap_uris": {
    "dbfs": "dbfs:/FileStore/projects/my-project/nova-assets/prod/analytics-prod-storage-<run>-<manifest>-bootstrap.json"
  },
  "published_bootstrap_latest_uris": {
    "dbfs": "dbfs:/FileStore/projects/my-project/nova-assets/prod/analytics-prod-latest-bootstrap.json"
  },
  "published_models_uris": {}
}
```
Post-run verification checklist:

- Check the workflow summary for resolved artifact names and publish targets.
- Download `artifact_name_publish_summary` and confirm the expected target URI keys.
- Set `DBT_NOVA_BOOTSTRAP_URI` to the stable bootstrap alias and run `dbt-nova health check --json`.
- Confirm health reports `bootstrap.loaded=true` and `artifact_consumer.storage_materialized=true`.
- After a later publish, run `reload_manifest` to adopt the newer assets without editing MCP config.
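The two health fields from the checklist can gate a CI step with `jq`. The inline payload below is a stand-in for live `dbt-nova health check --json` output, so the sketch stays self-contained:

```bash
# Stand-in for: HEALTH_JSON="$(dbt-nova health check --json)"
HEALTH_JSON='{"bootstrap":{"loaded":true},"artifact_consumer":{"storage_materialized":true}}'

# jq -e exits non-zero when the expression is false/null, failing the step.
if echo "$HEALTH_JSON" | jq -e '.bootstrap.loaded and .artifact_consumer.storage_materialized' >/dev/null; then
  echo "nova assets healthy"
else
  echo "nova assets unhealthy" >&2
  exit 1
fi
```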
## Consumer (reuse in read-only mode)
Nova now supports native remote artifact consumption. Manual download/extract is optional.
### Option A (recommended): native remote artifact mode
Set these env vars:

- `DBT_NOVA_STORAGE_READ_ONLY=true`
- `DBT_NOVA_BOOTSTRAP_URI` (recommended one-URI setup)
- `DBT_NOVA_ARTIFACT_FETCH_POLICY=if_missing|always|never` (default `if_missing`)
Optional:

- `DBT_NOVA_STORAGE_INSTANCE_ID`, `DBT_NOVA_STORAGE_ARTIFACT_URI`, `DBT_NOVA_METADATA_ARTIFACT_URI`, `DBT_NOVA_MODELS_ARTIFACT_URI` (explicit mode if you do not use the bootstrap URI)
- `DBT_NOVA_ARTIFACTS_CACHE_DIR` (defaults to `<storage_root>/artifacts`)
- `DBT_NOVA_ARTIFACT_TIMEOUT_SECS` (default `300`)
- `DBT_NOVA_ARTIFACT_ALLOW_HTTP=true` (only for non-TLS artifact URIs; not recommended)
Local shell example:
```bash
export DBT_MANIFEST_PATH="$PWD/manifest.json"
export DBT_NOVA_STORAGE_DIR="$PWD/.dbt-nova"
export DBT_NOVA_STORAGE_READ_ONLY="true"
export DBT_NOVA_BOOTSTRAP_URI="s3://my-bucket/nova-assets/prod/analytics-prod-latest-bootstrap.json"
export DBT_NOVA_ARTIFACT_FETCH_POLICY="if_missing"
dbt-nova health check --json
```
CI shell example:
```bash
export DBT_NOVA_STORAGE_DIR="$RUNNER_TEMP/.dbt-nova"
export DBT_NOVA_STORAGE_READ_ONLY="true"
export DBT_NOVA_BOOTSTRAP_URI="$BOOTSTRAP_URI"
export DBT_NOVA_ARTIFACT_FETCH_POLICY="if_missing"
dbt-nova health check --json
```
MCP client env example:
```json
{
  "mcpServers": {
    "dbt-nova": {
      "command": "/path/to/dbt-nova",
      "env": {
        "DBT_NOVA_STORAGE_DIR": "/path/to/.dbt-nova",
        "DBT_NOVA_STORAGE_READ_ONLY": "true",
        "DBT_NOVA_BOOTSTRAP_URI": "s3://my-bucket/nova-assets/prod/analytics-prod-latest-bootstrap.json",
        "DBT_NOVA_ARTIFACT_FETCH_POLICY": "if_missing"
      }
    }
  }
}
```
Recommended consumer pattern:

- Point `DBT_NOVA_BOOTSTRAP_URI` at the stable alias (`<storage_instance_id>-latest-bootstrap.json`).
- Keep the versioned bootstrap URIs for rollback/debugging only.
- After producers publish a newer asset set, run `reload_manifest` so Nova re-fetches the stable bootstrap alias and adopts the new artifacts.
Use health to verify runtime decisions:

- `artifact_consumer.enabled`
- `artifact_consumer.fetch_policy`
- `artifact_consumer.metadata_validated`
- `artifact_consumer.storage_materialized`
- `artifact_consumer.models_materialized`
- `artifact_consumer.last_evaluated_at_ms`
- `artifact_consumer.last_materialized_at_ms`
- `bootstrap.enabled`
- `bootstrap.uri`
- `bootstrap.contract_version`
- `bootstrap.loaded`
- `bootstrap.validated`
- `bootstrap.applied_fields`
- `bootstrap.last_evaluated_at_ms`
### Option B: manual extraction fallback
If you prefer manual extraction, keep using pre-extracted artifacts with:

- `DBT_NOVA_STORAGE_DIR`
- `DBT_NOVA_STORAGE_INSTANCE_ID`
- `DBT_NOVA_STORAGE_READ_ONLY=true`

If you extracted models separately, set `DBT_NOVA_EMBEDDINGS_CACHE_DIR` to the extracted models directory.
## Manual retrieval examples
S3:

```bash
aws s3 cp s3://my-bucket/nova-assets/prod/<artifact_name_storage>.tar.gz .
tar -xzf <artifact_name_storage>.tar.gz
```

GCS:

```bash
gcloud storage cp gs://my-bucket/nova-assets/prod/<artifact_name_storage>.tar.gz .
tar -xzf <artifact_name_storage>.tar.gz
```

DBFS (Databricks CLI):

```bash
databricks fs cp dbfs:/FileStore/projects/my-project/nova-assets/prod/<artifact_name_storage>.tar.gz .
tar -xzf <artifact_name_storage>.tar.gz
```
## Compatibility guidance
- Keep producer and consumer on the same released Nova version when possible.
- `storage_instance_id` must match between producer and consumer.
- Consumer manifest content must match the producer-built manifest hash. Path differences are allowed; content differences are not.
- The metadata contract version must be compatible (`v1` currently).
## Failure modes and fixes
| Symptom | Likely cause | Fix |
|---|---|---|
| Storage is read-only and no reusable index is available | Missing storage files, mismatched `storage_instance_id`, or manifest content mismatch | Re-download artifacts, verify the instance id, verify the manifest is identical to the producer input |
| Metadata contract validation fails | Missing/corrupt `nova-build-metadata.json` or unsupported contract version | Re-run the producer and consume both the storage and metadata artifacts together |
| Health passes but embeddings are missing | Models artifact not provided in consumer setup | Native mode: set `DBT_NOVA_MODELS_ARTIFACT_URI` to the producer models artifact URI. Manual mode: extract the models artifact and set `DBT_NOVA_EMBEDDINGS_CACHE_DIR`. |