Architecture¶
dbt-nova streams the dbt manifest, stores rkyv-encoded Entity records (with a raw JSON payload string for fidelity), and builds/persists indexes for fast queries.
Design Philosophy¶
- Store Everything – raw JSON payload preserved alongside typed Entity.
- Index Everything – build indexes at load time for O(1) lookups.
- Use What dbt Gives Us – leverage parent_map/child_map.
- Consolidate Tools – fewer powerful tools over many tiny ones.
- Consistent Responses – standard response envelope for all tools.
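As an illustration of "Index Everything" and "Use What dbt Gives Us", the sketch below shows how dbt's `parent_map` and `child_map` could back constant-time neighbor lookups and a downstream walk. It is a minimal sketch, not Nova's actual index: the `LineageIndex` type, its method names, and the node IDs used in examples are hypothetical, and only the two map names come from the dbt manifest.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical lineage index; the real Nova index layout is not shown here.
pub struct LineageIndex {
    parents: HashMap<String, Vec<String>>,
    children: HashMap<String, Vec<String>>,
}

impl LineageIndex {
    /// Takes the manifest's `parent_map` and `child_map` as already-parsed maps.
    pub fn new(
        parent_map: HashMap<String, Vec<String>>,
        child_map: HashMap<String, Vec<String>>,
    ) -> Self {
        Self { parents: parent_map, children: child_map }
    }

    /// Average-case O(1) lookup of the direct parents of a node.
    pub fn direct_parents(&self, id: &str) -> &[String] {
        match self.parents.get(id) {
            Some(list) => list.as_slice(),
            None => &[],
        }
    }

    /// Breadth-first walk over child edges: every node downstream of `id`,
    /// each reported once, in discovery order.
    pub fn downstream(&self, id: &str) -> Vec<String> {
        let mut seen: HashSet<&str> = HashSet::from([id]);
        let mut queue: VecDeque<&str> = VecDeque::from([id]);
        let mut out = Vec::new();
        while let Some(current) = queue.pop_front() {
            if let Some(kids) = self.children.get(current) {
                for kid in kids {
                    if seen.insert(kid.as_str()) {
                        out.push(kid.clone());
                        queue.push_back(kid.as_str());
                    }
                }
            }
        }
        out
    }
}
```

Because both maps are built once at load time, a lineage query only pays for the edges it actually visits rather than rescanning the manifest.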
Architecture Overview (Layers)¶
flowchart TD
A["MCP Clients"] --> B["DbtNovaServer"]
B --> C["Tool Router + Params/Responses"]
C --> D["ManifestSearch"]
D --> E["Indexes + Embeddings Cache"]
D --> F["EntityStore (rkyv + mmap)"]
G["Manifest Sources"] --> H["ManifestLoader"]
H --> D
I["SQL Provider"] --> J["Databricks API"]
Architecture Overview (Detailed)¶
Tool Map (26 MCP tools)¶
flowchart LR
B["DbtNovaServer<br/>server/mcp.rs"] --> TR["Tool Router"]
subgraph Discovery["Discovery + Ops"]
TR --> T_search["search"]
TR --> T_list_entities["list_entities"]
TR --> T_find_by_path["find_by_path"]
TR --> T_search_recipes["search_recipes"]
TR --> T_get_recipe["get_recipe"]
TR --> T_run_recipe["run_recipe"]
TR --> T_show_metadata["show_metadata"]
TR --> T_list_tags["list_tags"]
TR --> T_list_packages["list_packages"]
TR --> T_list_databases["list_databases"]
TR --> T_health["health"]
TR --> T_reload["reload_manifest"]
end
subgraph Entity["Entity Access"]
TR --> T_get_entity["get_entity"]
TR --> T_batch_get["batch_get_entities"]
TR --> T_get_context["get_context"]
end
subgraph Lineage["Lineage + Impact"]
TR --> T_get_lineage["get_lineage"]
TR --> T_get_column_lineage["get_column_lineage"]
TR --> T_get_impact["get_impact"]
end
subgraph SchemaSQL["Schema + SQL"]
TR --> T_get_sql["get_sql"]
TR --> T_get_columns["get_columns"]
TR --> T_diff_entities["diff_entities"]
TR --> T_execute_sql["execute_sql"]
end
subgraph QualityGov["Quality + Governance"]
TR --> T_get_test_coverage["get_test_coverage"]
TR --> T_get_undocumented["get_undocumented"]
TR --> T_validate_dag["validate_dag"]
TR --> T_metadata_score["get_metadata_score"]
end
Runtime + Storage Map¶
flowchart TD
M1["manifest.json"] --> L["ManifestLoader<br/>manifest/loader.rs + loader/{parse,runtime,storage}.rs"]
M2["manifest URI"] --> P["Manifest Providers"]
P --> PF[Local Files]
P --> PH[HTTP]
P --> PS[S3]
P --> PG[GCS]
P --> PD[DBFS]
PF --> L
PH --> L
PS --> L
PG --> L
PD --> L
L --> S["ManifestSearch<br/>manifest/search/core.rs + search/summary/*.rs"]
S --> IDX["Indexes<br/>rkyv cache"]
S --> TANT["Tantivy BM25 + n-gram + fuzzy"]
S --> VEC["Vector Search<br/>dense + sparse + rerank"]
S --> STORE["EntityStore<br/>rkyv + mmap"]
STORE --> BIN["entities.bin + entities.idx"]
IDX --> RK1["indexes.rkyv"]
VEC --> RK2["embeddings.rkyv.zst"]
VEC --> RK3["sparse_embeddings.rkyv.zst"]
VEC --> MODELS["fastembed model cache"]
SQLP["SQL Provider"] --> DBX["Databricks API"]
Implementation note:
- manifest/loader.rs orchestrates the load flow and delegates parsing, runtime helpers, and storage lifecycle functions to manifest/loader/.
- Persona and Nova metadata summary builders live under manifest/search/summary/ (persona.rs, nova.rs) to keep search response shaping isolated from core query orchestration.
Tool → Runtime Dependencies¶
| Tool | Primary components |
|---|---|
| search | Tantivy + Vector Search |
| list_entities | Indexes |
| find_by_path | Indexes |
| search_recipes | ManifestSearch + manifest analysis |
| get_recipe | ManifestSearch + manifest analysis |
| run_recipe | ManifestSearch + execute_sql |
| show_metadata | Indexes |
| list_tags | Indexes |
| list_packages | Indexes |
| list_databases | Indexes |
| health | ManifestSearch |
| reload_manifest | ManifestLoader |
| get_entity | EntityStore |
| batch_get_entities | EntityStore |
| get_context | EntityStore + Indexes |
| get_lineage | Indexes |
| get_column_lineage | Indexes |
| get_impact | Indexes |
| get_sql | EntityStore |
| get_columns | EntityStore |
| diff_entities | EntityStore |
| execute_sql | SQL Provider |
| get_test_coverage | Indexes |
| get_undocumented | Indexes |
| validate_dag | Indexes |
| get_metadata_score | EntityStore + Indexes |
Legend¶
| Node type | Meaning |
|---|---|
| Node | Runtime component or tool |
| Grouped subgraph | Logical layer (API, tools, core, storage, providers) |
| Arrow | Primary dependency or data flow |
Background Loading & Health¶
ManifestSearchHandle builds indexes in the background and transitions through Loading → Ready → Refreshing; it can also enter Failed when startup or refresh resolution fails. Tools return INDEX_BUILDING until a ready index exists. The health tool reports readiness and provides metadata about retrievers.
If a refresh swap fails while a ready manifest is active, Nova keeps serving the previous index. If startup fails completely, the status remains Failed until the next successful refresh cycle.
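The lifecycle described above can be modeled as a small state machine. The following is a minimal sketch under stated assumptions, not the actual ManifestSearchHandle: `SearchHandleSketch`, `LoadState`, `LoadEvent`, and `check_ready` are hypothetical names, and the choice to report Ready again after a failed refresh (because the previous index is still being served) is an assumption rather than documented behavior.

```rust
/// Hypothetical status values mirroring Loading, Ready, Refreshing, Failed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LoadState {
    Loading,
    Ready,
    Refreshing,
    Failed(String),
}

/// Hypothetical events emitted by the background loader.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LoadEvent {
    BuildSucceeded,
    BuildFailed(String),
    RefreshStarted,
}

pub struct SearchHandleSketch {
    state: LoadState,
    /// True once any index has been built successfully and can serve queries.
    has_ready_index: bool,
}

impl SearchHandleSketch {
    pub fn new() -> Self {
        Self { state: LoadState::Loading, has_ready_index: false }
    }

    pub fn state(&self) -> &LoadState {
        &self.state
    }

    pub fn apply(&mut self, event: LoadEvent) {
        let next = match (&self.state, event) {
            // Any successful build (startup or refresh) yields a ready index.
            (_, LoadEvent::BuildSucceeded) => LoadState::Ready,
            (LoadState::Ready, LoadEvent::RefreshStarted) => LoadState::Refreshing,
            // Assumption: a failed refresh keeps the previous index and status.
            (_, LoadEvent::BuildFailed(_)) if self.has_ready_index => LoadState::Ready,
            // No index to fall back on: stay Failed until a later success.
            (_, LoadEvent::BuildFailed(reason)) => LoadState::Failed(reason),
            // A refresh starting from any other state does not change status.
            (current, LoadEvent::RefreshStarted) => current.clone(),
        };
        if next == LoadState::Ready {
            self.has_ready_index = true;
        }
        self.state = next;
    }

    /// Gate for tool calls: an error code until a ready index exists.
    pub fn check_ready(&self) -> Result<(), &'static str> {
        if self.has_ready_index { Ok(()) } else { Err("INDEX_BUILDING") }
    }
}
```

Tracking `has_ready_index` separately from the status keeps the "serve the previous index" rule independent of whatever the most recent build attempt did.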
Storage & Concurrency¶
Nova stores all on-disk artifacts under <storage_root>/ with a shared embeddings cache (.fastembed_cache) and per-manifest instances:
<storage_root>/
.fastembed_cache/
manifests/
<hash>.json
<hash>.meta.json
instances/
manifest-<hash>/
manifest.current.json
versions/
<manifest_hash>/
entities.bin
entities.idx
entities.checksum.json
index/
indexes.rkyv
embeddings.rkyv.zst
sparse_embeddings.rkyv.zst
manifest.signature.json
.in_use.lock
.build.lock
Key behaviors:
- Manifest hash instance IDs: the same manifest path reuses the same instance dir.
- Build lock: only one process builds indexes; others wait (configurable) or reuse.
- In-use lock: active instances are protected from pruning/cleanup.
- Versioned swaps: refreshed manifests build into a new version directory and swap atomically.
- Pruning: old instances/versions are removed by count and/or size limits.
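The build lock and versioned swap can be illustrated with standard-library file operations. This is a hedged sketch, not Nova's implementation: the function names, the location of `.build.lock` and `manifest.current.json` directly under the instance directory, and the JSON body written to the pointer file are all assumptions. Exclusive file creation stands in for whatever locking Nova actually uses (a real implementation might prefer OS advisory locks, which also handle stale locks left by crashed processes).

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Try to become the single index builder for an instance directory.
/// Returns Ok(Some(path)) if the lock was taken, Ok(None) if another
/// process already holds it (the caller may then wait or reuse).
pub fn try_build_lock(instance_dir: &Path) -> io::Result<Option<PathBuf>> {
    let lock_path = instance_dir.join(".build.lock");
    // `create_new` fails with AlreadyExists if the file is already present.
    match OpenOptions::new().write(true).create_new(true).open(&lock_path) {
        Ok(_) => Ok(Some(lock_path)),
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(None),
        Err(e) => Err(e),
    }
}

/// Point the instance at a fully built version directory.
/// The new pointer is written to a temporary file and then renamed over the
/// old one, so readers see either the old or the new pointer, never a
/// partially written file (rename within one directory is atomic on POSIX
/// file systems).
pub fn swap_current(instance_dir: &Path, version: &str) -> io::Result<()> {
    let tmp = instance_dir.join("manifest.current.json.tmp");
    let dst = instance_dir.join("manifest.current.json");
    let mut file = fs::File::create(&tmp)?;
    // Hypothetical pointer format; the real file contents are not documented here.
    writeln!(file, "{{\"version\": \"{}\"}}", version)?;
    file.sync_all()?;
    fs::rename(&tmp, &dst)
}
```

A builder would typically take the lock, build into a fresh version directory, call `swap_current`, and then remove the lock file; if the build fails before the swap, the previous pointer is left untouched.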