Operations & Troubleshooting¶
This page covers operational checks for manifest refresh, caches, and index storage.
Refresh & Cache Troubleshooting¶
First Check
Always start with the health tool when diagnosing issues. It shows status, manifest details, and refresh metrics in one call.
Quick health check¶
Use the health tool to verify readiness and see the active manifest details:
- status: loading | ready | refreshing | failed
  - loading: bootstrapping or rebuild in progress, no active index yet
  - ready: healthy active index serving traffic
  - refreshing: rebuilding in background while serving the existing active index
  - failed: initial manifest load failed (common after bad JSON, bad permissions, or temporary source failure)
- manifest.hash: content hash of the active manifest (ready/refreshing)
- manifest.version: version directory name (hash prefix)
- manifest.age_ms: age of the manifest file based on its modified_ms
- manifest.loaded_at_ms: when this index version was built
- manifest.loaded_age_ms: time since build
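As a rough illustration of how a client might act on these fields, the sketch below assumes the health response is a JSON object in which the dotted names map to nested keys (for example `manifest.hash` becomes `health["manifest"]["hash"]`). That payload shape and the helper name `describe_health` are assumptions for illustration, not part of Nova.

```python
# Statuses in which an active index is serving traffic, per the list above.
SERVING_STATUSES = {"ready", "refreshing"}


def describe_health(health: dict) -> str:
    """Turn a health payload (assumed nested dict) into a one-line summary."""
    status = health.get("status")
    manifest = health.get("manifest") or {}
    if status in SERVING_STATUSES:
        return f"serving manifest {manifest.get('hash')} (status={status})"
    if status == "loading":
        return "not serving yet: initial build in progress"
    if status == "failed":
        return "not serving: initial manifest load failed"
    return f"unknown status: {status!r}"
```

A monitoring script could alert only on the "not serving" and "unknown" summaries, since refreshing still serves the existing index.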
Refresh stats (if enabled):
- refresh.attempts: refresh attempts (manifest changed)
- refresh.successes: successful refresh swaps
- refresh.failures: failed refresh swaps
- refresh.last_attempt_ms: timestamp of last refresh attempt
- refresh.last_success_ms: timestamp of last success
- refresh.last_failure_ms: timestamp of last failure
- refresh.last_error: last error string (if any)
Tool metrics:
- tool_metrics.<tool>.calls: total calls per tool
- tool_metrics.<tool>.errors: total error responses per tool
- tool_metrics.<tool>.error_rate_bps: error ratio in basis points (10000 * errors / calls)
- tool_metrics.<tool>.total_ms: cumulative latency
- tool_metrics.<tool>.avg_ms: average duration in ms
- tool_metrics.<tool>.p95_ms: approximate p95 latency in ms
- tool_metrics.<tool>.p99_ms: approximate p99 latency in ms
- tool_metrics.<tool>.max_ms: maximum duration in ms
- tool_metrics.<tool>.buckets: latency buckets (<=5ms, <=10ms, <=50ms, <=100ms, <=500ms, <=1000ms, >1000ms)
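To make the derived metrics concrete, here is a minimal sketch of the basis-point formula and of one common way to approximate a percentile from fixed latency buckets. Two things are assumptions: truncating integer division for `error_rate_bps`, and reporting a bucket's upper bound as the percentile. The source does not say how Nova rounds or how it derives p95/p99.

```python
import math

# Upper bounds of the documented buckets; the last bucket (>1000ms) is unbounded.
BUCKET_UPPER_BOUNDS_MS = [5, 10, 50, 100, 500, 1000, math.inf]


def error_rate_bps(calls: int, errors: int) -> int:
    """Error ratio in basis points: 10000 * errors / calls (truncated, by assumption)."""
    if calls == 0:
        return 0
    return 10000 * errors // calls


def approx_percentile_ms(bucket_counts: list[int], percentile: int):
    """Return the upper bound of the bucket containing the requested percentile."""
    total = sum(bucket_counts)
    if total == 0:
        return None
    threshold = math.ceil(total * percentile / 100)
    cumulative = 0
    for bound, count in zip(BUCKET_UPPER_BOUNDS_MS, bucket_counts):
        cumulative += count
        if cumulative >= threshold:
            return bound
    return BUCKET_UPPER_BOUNDS_MS[-1]
```

Because bucket-based percentiles can only land on bucket boundaries, the "approximate" p95 and p99 values are best read as upper bounds rather than exact latencies.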
Search concurrency:
- search_concurrency.enabled: whether queue/concurrency controls are active
- search_concurrency.max_concurrent: configured slot limit
- search_concurrency.in_flight: active searches
- search_concurrency.saturated: true when no execution slots are free
- search_concurrency.max_queue: configured queue length
- search_concurrency.queued: searches waiting in queue
- search_concurrency.queue_saturated: true when slots and queue are both full
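The two saturation flags follow from the counters listed above. The sketch below restates those definitions as code; the function name and the exact comparisons (`>=`) are assumptions for illustration.

```python
def concurrency_flags(max_concurrent: int, in_flight: int,
                      max_queue: int, queued: int) -> dict[str, bool]:
    """Derive the saturation flags from the documented counters."""
    # saturated: no execution slots are free.
    saturated = in_flight >= max_concurrent
    # queue_saturated: slots and queue are both full.
    queue_saturated = saturated and queued >= max_queue
    return {"saturated": saturated, "queue_saturated": queue_saturated}
```

If `saturated` is frequently true while `queue_saturated` stays false, searches are waiting but not being turned away, which usually points at slot limits rather than queue length.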
Cache stats:
- manifest_cache.hits: number of times a cached manifest was used
- manifest_cache.misses: number of times a remote fetch was attempted
Artifact consumer stats (when prebuilt artifact mode is configured):
- artifact_consumer.enabled: whether remote artifact mode is active
- artifact_consumer.fetch_policy: if_missing | always | never
- artifact_consumer.metadata_validated: metadata contract was parsed and validated
- artifact_consumer.storage_materialized: storage archive was materialized in this load cycle
- artifact_consumer.models_materialized: models archive was materialized in this load cycle
- artifact_consumer.last_evaluated_at_ms: timestamp for last artifact-consumer decision
- artifact_consumer.last_materialized_at_ms: timestamp for last successful materialization (if any)
Rate limiting¶
Nova applies tool rate limits by default (search=60,execute_sql=20,default=120 calls per 60 seconds) and returns RATE_LIMITED when a tool exceeds its configured calls per window. To change the limits, set per-tool overrides (e.g., search=120,execute_sql=30,default=60), or set DBT_NOVA_TOOL_RATE_LIMITS= (an empty value) to disable rate limiting.
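The limit strings above follow a `tool=count` pattern separated by commas. The following sketch shows how such a string could be parsed and looked up. It is not Nova's parser, and `parse_rate_limits` and `limit_for` are hypothetical names. It assumes unknown tools fall back to the `default` entry and that an empty string yields no limits.

```python
from typing import Optional


def parse_rate_limits(spec: str) -> dict[str, int]:
    """Parse 'search=60,execute_sql=20,default=120' into a dict of calls per window."""
    limits: dict[str, int] = {}
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty input and trailing commas
        name, _, value = part.partition("=")
        limits[name.strip()] = int(value)
    return limits


def limit_for(tool: str, limits: dict[str, int]) -> Optional[int]:
    """Per-tool limit, falling back to 'default'; None means no limit applies."""
    return limits.get(tool, limits.get("default"))
```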
If status=refreshing, Nova is building new indexes in the background and serving the active version until the swap completes.
If status=failed during startup, Nova keeps retrying refresh attempts on the configured interval. Once the underlying manifest source becomes valid, status returns to ready automatically without a process restart.
Force a refresh (local manifest)¶
- Edit or replace the local manifest.json file.
- Ensure DBT_NOVA_MANIFEST_REFRESH_SECS is set (e.g. 5).
- Wait for the next refresh interval; confirm manifest.hash changes via health.
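The final confirmation step can be automated by polling. The sketch below is a hedged example: `fetch_health` is a hypothetical callable that you supply to invoke the health tool and return its payload as a dict, and the nested `manifest.hash` layout is assumed.

```python
import time
from typing import Callable, Optional


def wait_for_hash_change(fetch_health: Callable[[], dict],
                         old_hash: str,
                         timeout_s: float = 60.0,
                         poll_s: float = 1.0,
                         sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll health until status is ready with a new manifest hash, or time out."""
    deadline = time.monotonic() + timeout_s
    while True:
        health = fetch_health()
        new_hash = (health.get("manifest") or {}).get("hash")
        if health.get("status") == "ready" and new_hash and new_hash != old_hash:
            return new_hash
        if time.monotonic() >= deadline:
            return None  # no swap observed within the timeout
        sleep(poll_s)
```

Choose a `timeout_s` comfortably larger than DBT_NOVA_MANIFEST_REFRESH_SECS plus the expected rebuild time, since the swap only happens after the background build completes.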
Reload manifest source without restart¶
Use the reload_manifest tool to point Nova at a new manifest path/URI and rebuild indexes in the background.
Force a refresh (remote manifest)¶
If using http(s)://, s3://, gs://, or dbfs://:
- Update the remote manifest.
- Set a small refresh interval: DBT_NOVA_MANIFEST_REFRESH_SECS=5.
- If the remote doesn't expose modified metadata, clear the manifest cache (for example with scripts/clean_cache.sh --manifests --yes). Nova will refetch on the next refresh cycle.
Clear caches with script¶
Use the helper script to clean cache directories safely:
# Preview what would be removed
scripts/clean_cache.sh --dry-run
# Remove all cache directories (manifests + instances + embeddings)
scripts/clean_cache.sh --all --yes
# Remove only manifest cache
scripts/clean_cache.sh --manifests --yes
By default, the script uses DBT_NOVA_STORAGE_DIR (or .dbt-nova) as the storage root. Override paths with:
- --storage-root <path>
- --manifest-cache-dir <path>
- --embeddings-cache-dir <path>
Cache fallback behavior¶
If a remote fetch fails, Nova will fall back to the cached manifest (if available) and log a warning. This prevents downtime when remote storage is temporarily unavailable.
To test fallback behavior:
- Set DBT_NOVA_MANIFEST_URI to a remote source.
- Ensure a cached copy exists under <storage_root>/manifests/.
- Make the remote temporarily unavailable.
- Confirm resolve_manifest returns cached=true and health remains ready.
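The fallback behavior described in this section can be sketched as follows. This is a simplified model, not Nova's implementation: `load_with_fallback` and `fetch_remote` are hypothetical names, and a single cache file stands in for the hashed entries under `<storage_root>/manifests/`.

```python
import logging
from pathlib import Path
from typing import Callable

log = logging.getLogger(__name__)


def load_with_fallback(fetch_remote: Callable[[], bytes],
                       cache_path: Path) -> tuple[bytes, bool]:
    """Return (manifest_bytes, cached); cached is True when the fallback was used."""
    try:
        data = fetch_remote()
    except Exception as exc:  # broad on purpose: any fetch failure triggers fallback
        if cache_path.exists():
            log.warning("remote fetch failed (%s); using cached manifest", exc)
            return cache_path.read_bytes(), True
        raise  # no cached copy, so the failure must surface
    cache_path.write_bytes(data)  # refresh the cache after a successful fetch
    return data, False
```

The second element of the return value plays the role of the cached=true signal checked in the last step above.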
Storage Layout¶
<storage_root>/
  .fastembed_cache/
  manifests/
    <hash>.json
    <hash>.meta.json
  instances/
    <instance_id>/
      manifest.current.json
      versions/
        <manifest_hash>/
          entities.bin
          entities.idx
          entities.checksum.json
          index/
          indexes.rkyv
          embeddings.rkyv.zst
          sparse_embeddings.rkyv.zst
          manifest.signature.json
          .in_use.lock
          .build.lock
- manifest.current.json points at the active version.
- versions/ contains immutable, content-addressed index builds.
- .in_use.lock protects active versions from pruning.
Pruning Behavior¶
Nova prunes old versions when either:
- DBT_NOVA_STORAGE_MAX_INSTANCES is exceeded, or
- DBT_NOVA_STORAGE_MAX_BYTES is exceeded.
It always keeps at least DBT_NOVA_STORAGE_MIN_VERSIONS versions per instance, even if size limits are exceeded.
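To illustrate how a size limit and a minimum-versions floor interact, here is a simplified sketch for a single instance. It is not Nova's pruning algorithm. The oldest-first order, the `Version` record, and the handling of in-use versions are assumptions, and the DBT_NOVA_STORAGE_MAX_INSTANCES limit is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Version:
    name: str
    size_bytes: int
    loaded_at_ms: int
    in_use: bool = False  # stands in for the presence of .in_use.lock


def versions_to_prune(versions: list[Version], max_bytes: int,
                      min_versions: int) -> list[str]:
    """Pick versions to delete, oldest first, without going below min_versions."""
    ordered = sorted(versions, key=lambda v: v.loaded_at_ms)
    total = sum(v.size_bytes for v in ordered)
    remaining = len(ordered)
    pruned: list[str] = []
    for v in ordered:
        # Stop once under the size limit or at the minimum-versions floor.
        if total <= max_bytes or remaining <= min_versions:
            break
        if v.in_use:
            continue  # active versions are protected from pruning
        pruned.append(v.name)
        total -= v.size_bytes
        remaining -= 1
    return pruned
```

Note that the floor wins over the size limit: with `min_versions=2` the loop stops at two versions even when the total is still above `max_bytes`.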
Troubleshooting Checklist¶
Systematic Debugging
Work through this checklist in order. Most issues are resolved in steps 1-3.
- Check health for status, manifest.hash, and refresh.* counters.
- Confirm DBT_MANIFEST_PATH / DBT_NOVA_MANIFEST_URI points to the correct manifest.
- If remote, verify auth env vars for the chosen scheme (see Manifest Sources).
- Ensure the cache directory is writable and not full (<storage_root>/manifests).
- Clear cache if stale: rm -rf <storage_root>/manifests.
- Confirm storage pruning didn't remove the active version (look at .in_use.lock).
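The "writable and not full" check from the list above can be scripted with the Python standard library. This is a small sketch; the helper name `check_cache_dir` and the `min_free_bytes` threshold are illustrative assumptions, and the path you pass should be your actual `<storage_root>/manifests` directory.

```python
import os
import shutil
from pathlib import Path


def check_cache_dir(path: str, min_free_bytes: int = 1) -> list[str]:
    """Return a list of problems found with the cache directory (empty if none)."""
    p = Path(path)
    if not p.is_dir():
        return ["missing"]
    problems = []
    # Creating files in a directory needs both write and execute permission.
    if not os.access(p, os.W_OK | os.X_OK):
        problems.append("not writable")
    # disk_usage reports free space on the filesystem that holds the directory.
    if shutil.disk_usage(p).free < min_free_bytes:
        problems.append("disk full")
    return problems
```

An empty result means the directory exists, is writable by the current user, and has at least `min_free_bytes` available.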
See Also¶
- Configuration Reference - All environment variables
- Manifest Sources - Remote manifest authentication
- Performance - Optimization tips