Architecture

dbt-nova streams the dbt manifest, stores rkyv-encoded Entity records (each with a raw JSON payload string for fidelity), and builds and persists indexes for fast queries.
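As a rough illustration of that flow, the sketch below consumes entities one at a time, keeps the raw JSON next to a few typed fields, and updates hash indexes as each record is stored. It is a std-only, in-memory stand-in, not the real rkyv/mmap store: `Entity`'s fields, `MiniStore`, and its methods are all hypothetical names chosen for this example.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified entity record. The real Entity type and its
/// rkyv encoding are not shown in this document.
#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub unique_id: String,
    pub resource_type: String,
    pub tags: Vec<String>,
    /// Raw manifest JSON for this node, kept verbatim for fidelity.
    pub raw_json: String,
}

/// In-memory stand-in for the store plus two illustrative indexes.
#[derive(Default)]
pub struct MiniStore {
    entities: Vec<Entity>,
    by_id: HashMap<String, usize>,
    by_tag: HashMap<String, Vec<usize>>,
}

impl MiniStore {
    /// Consume entities one at a time (as a streaming loader would) and
    /// update the indexes as each record is stored.
    pub fn load<I: IntoIterator<Item = Entity>>(items: I) -> Self {
        let mut store = MiniStore::default();
        for entity in items {
            let slot = store.entities.len();
            store.by_id.insert(entity.unique_id.clone(), slot);
            for tag in &entity.tags {
                store.by_tag.entry(tag.clone()).or_default().push(slot);
            }
            store.entities.push(entity);
        }
        store
    }

    /// Average O(1) lookup by unique_id through the hash index.
    pub fn get(&self, unique_id: &str) -> Option<&Entity> {
        self.by_id.get(unique_id).map(|&slot| &self.entities[slot])
    }

    /// All entities carrying `tag`, in load order.
    pub fn with_tag(&self, tag: &str) -> Vec<&Entity> {
        self.by_tag
            .get(tag)
            .map(|slots| slots.iter().map(|&s| &self.entities[s]).collect())
            .unwrap_or_default()
    }
}
```

Building the indexes during the load pass means no query has to scan the entity list, at the cost of extra memory per index.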

Design Philosophy

Store Everything – raw JSON payload preserved alongside typed Entity.
Index Everything – build indexes at load time for O(1) lookups.
Use What dbt Gives Us – leverage parent_map/child_map.
Consolidate Tools – fewer powerful tools over many tiny ones.
Consistent Responses – standard response envelope for all tools.
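To make the "Use What dbt Gives Us" principle concrete: the dbt manifest already ships `parent_map` and `child_map`, each mapping a unique_id to its directly connected unique_ids, so lineage can be walked without re-deriving the graph. The following is a minimal sketch of such a walk, assuming the map has already been parsed into a `HashMap`. The `EdgeMap` alias and the `walk` function are hypothetical names, not Nova's actual API.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Adjacency map shaped like the manifest's `parent_map` (or `child_map`):
/// unique_id -> directly connected unique_ids.
pub type EdgeMap = HashMap<String, Vec<String>>;

/// Breadth-first walk from `start`, following `edges` for at most
/// `max_depth` hops. Returns every node reached, excluding `start`.
pub fn walk(edges: &EdgeMap, start: &str, max_depth: usize) -> BTreeSet<String> {
    let mut seen: BTreeSet<String> = BTreeSet::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    queue.push_back((start.to_string(), 0));

    while let Some((node, depth)) = queue.pop_front() {
        // Nodes at the depth limit are reported but not expanded further.
        if depth == max_depth {
            continue;
        }
        for next in edges.get(&node).into_iter().flatten() {
            // A dbt DAG has no cycles, but this guard keeps the walk finite
            // on malformed input and avoids revisiting shared ancestors.
            if next.as_str() != start && seen.insert(next.clone()) {
                queue.push_back((next.clone(), depth + 1));
            }
        }
    }
    seen
}
```

Passing `parent_map` yields upstream lineage and passing `child_map` yields downstream impact, so one traversal serves both directions.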

Architecture Overview (Layers)

flowchart TD
  A["MCP Clients"] --> B["DbtNovaServer"]
  B --> C["Tool Router + Params/Responses"]
  C --> D["ManifestSearch"]
  D --> E["Indexes + Embeddings Cache"]
  D --> F["EntityStore (rkyv + mmap)"]
  G["Manifest Sources"] --> H["ManifestLoader"]
  H --> D
  I["SQL Provider"] --> J["Databricks API"]

Architecture Overview (Detailed)

Tool Map (26 MCP tools)

flowchart LR
  B["DbtNovaServer<br/>server/mcp.rs"] --> TR["Tool Router"]

  subgraph Discovery["Discovery + Ops"]
    TR --> T_search["search"]
    TR --> T_list_entities["list_entities"]
    TR --> T_find_by_path["find_by_path"]
    TR --> T_search_recipes["search_recipes"]
    TR --> T_get_recipe["get_recipe"]
    TR --> T_run_recipe["run_recipe"]
    TR --> T_show_metadata["show_metadata"]
    TR --> T_list_tags["list_tags"]
    TR --> T_list_packages["list_packages"]
    TR --> T_list_databases["list_databases"]
    TR --> T_health["health"]
    TR --> T_reload["reload_manifest"]
  end

  subgraph Entity["Entity Access"]
    TR --> T_get_entity["get_entity"]
    TR --> T_batch_get["batch_get_entities"]
    TR --> T_get_context["get_context"]
  end

  subgraph Lineage["Lineage + Impact"]
    TR --> T_get_lineage["get_lineage"]
    TR --> T_get_column_lineage["get_column_lineage"]
    TR --> T_get_impact["get_impact"]
  end

  subgraph SchemaSQL["Schema + SQL"]
    TR --> T_get_sql["get_sql"]
    TR --> T_get_columns["get_columns"]
    TR --> T_diff_entities["diff_entities"]
    TR --> T_execute_sql["execute_sql"]
  end

  subgraph QualityGov["Quality + Governance"]
    TR --> T_get_test_coverage["get_test_coverage"]
    TR --> T_get_undocumented["get_undocumented"]
    TR --> T_validate_dag["validate_dag"]
    TR --> T_metadata_score["get_metadata_score"]
  end

Runtime + Storage Map

flowchart TD
  M1["manifest.json"] --> L["ManifestLoader<br/>manifest/loader.rs + loader/{parse,runtime,storage}.rs"]
  M2["manifest URI"] --> P["Manifest Providers"]
  P --> PF[Local Files]
  P --> PH[HTTP]
  P --> PS[S3]
  P --> PG[GCS]
  P --> PD[DBFS]
  PF --> L
  PH --> L
  PS --> L
  PG --> L
  PD --> L

  L --> S["ManifestSearch<br/>manifest/search/core.rs + search/summary/*.rs"]
  S --> IDX["Indexes<br/>rkyv cache"]
  S --> TANT["Tantivy BM25 + n-gram + fuzzy"]
  S --> VEC["Vector Search<br/>dense + sparse + rerank"]
  S --> STORE["EntityStore<br/>rkyv + mmap"]

  STORE --> BIN["entities.bin + entities.idx"]
  IDX --> RK1["indexes.rkyv"]
  VEC --> RK2["embeddings.rkyv.zst"]
  VEC --> RK3["sparse_embeddings.rkyv.zst"]
  VEC --> MODELS["fastembed model cache"]

  SQLP["SQL Provider"] --> DBX["Databricks API"]

Implementation notes:

- manifest/loader.rs orchestrates the load flow and delegates parsing, runtime helpers, and storage lifecycle functions to manifest/loader/.
- Persona and Nova metadata summary builders live under manifest/search/summary/ (persona.rs, nova.rs) to keep search response shaping isolated from core query orchestration.

Tool → Runtime Dependencies

Tool	Primary components
`search`	Tantivy + Vector Search
`list_entities`	Indexes
`find_by_path`	Indexes
`search_recipes`	ManifestSearch + manifest `analysis`
`get_recipe`	ManifestSearch + manifest `analysis`
`run_recipe`	ManifestSearch + execute_sql
`show_metadata`	Indexes
`list_tags`	Indexes
`list_packages`	Indexes
`list_databases`	Indexes
`health`	ManifestSearch
`reload_manifest`	ManifestLoader
`get_entity`	EntityStore
`batch_get_entities`	EntityStore
`get_context`	EntityStore + Indexes
`get_lineage`	Indexes
`get_column_lineage`	Indexes
`get_impact`	Indexes
`get_sql`	EntityStore
`get_columns`	EntityStore
`diff_entities`	EntityStore
`execute_sql`	SQL Provider
`get_test_coverage`	Indexes
`get_undocumented`	Indexes
`validate_dag`	Indexes
`get_metadata_score`	EntityStore + Indexes

Legend

Diagram element	Meaning
Node	Runtime component or tool
Grouped subgraph	Logical layer (API, tools, core, storage, providers)
Arrow	Primary dependency or data flow

Background Loading & Health

ManifestSearchHandle builds indexes in the background and moves through the states Loading → Ready → Refreshing; it can also enter Failed when startup or refresh resolution fails. Tools return INDEX_BUILDING until a ready index exists. The health tool reports readiness and provides metadata about retrievers.

If a refresh swap fails after a ready manifest is active, Nova keeps serving the previous index. If startup fails completely, the status remains Failed until the next successful refresh cycle.
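The lifecycle described above can be sketched as a small state machine. This is an illustration only: `Status`, `Handle`, and their methods are hypothetical and do not reflect the real ManifestSearchHandle API. Reporting Ready again after a failed refresh is an assumption; the text above only guarantees that the previous index keeps being served.

```rust
/// Hypothetical mirror of the handle's lifecycle states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Loading,
    Ready,
    Refreshing,
    Failed,
}

/// Minimal stand-in for a handle; `I` is whatever a built index is.
pub struct Handle<I> {
    status: Status,
    active: Option<I>,
}

impl<I> Handle<I> {
    pub fn new() -> Self {
        Handle { status: Status::Loading, active: None }
    }

    pub fn status(&self) -> Status {
        self.status
    }

    /// Mark that a background refresh has started. Only meaningful once
    /// an index is already active.
    pub fn begin_refresh(&mut self) {
        if self.active.is_some() {
            self.status = Status::Refreshing;
        }
    }

    /// Record the outcome of a startup or refresh build.
    pub fn finish_build(&mut self, result: Result<I, String>) {
        match result {
            Ok(index) => {
                self.active = Some(index);
                self.status = Status::Ready;
            }
            // Refresh failed but an older index exists: keep serving it.
            // (Assumption: the status goes back to Ready in this case.)
            Err(_) if self.active.is_some() => self.status = Status::Ready,
            // Nothing to fall back to: stay Failed until a later build succeeds.
            Err(_) => self.status = Status::Failed,
        }
    }

    /// What a tool call would see: the active index, or INDEX_BUILDING.
    pub fn index(&self) -> Result<&I, &'static str> {
        self.active.as_ref().ok_or("INDEX_BUILDING")
    }
}
```

Keeping the active index in an `Option` separate from the status is what lets a failed refresh leave query serving untouched.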

Storage & Concurrency

Nova stores all on-disk artifacts under <storage_root>/ with a shared embeddings cache (.fastembed_cache) and per-manifest instances:

<storage_root>/
  .fastembed_cache/
  manifests/
    <hash>.json
    <hash>.meta.json
  instances/
    manifest-<hash>/
      manifest.current.json
      versions/
        <manifest_hash>/
          entities.bin
          entities.idx
          entities.checksum.json
          index/
          indexes.rkyv
          embeddings.rkyv.zst
          sparse_embeddings.rkyv.zst
          manifest.signature.json
          .in_use.lock
      .build.lock

Key behaviors:

- Manifest hash instance IDs: the same manifest path reuses the same instance dir.
- Build lock: only one process builds indexes; others wait (configurable) or reuse.
- In-use lock: active instances are protected from pruning/cleanup.
- Versioned swaps: refreshed manifests build into a new version directory and swap atomically.
- Pruning: old instances/versions are removed by count and/or size limits.
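The pruning and in-use rules can be illustrated with a pure selection function. This is a sketch under stated assumptions, not Nova's actual policy: `VersionInfo`, its fields, `select_for_pruning`, and the oldest-first ordering are all hypothetical, and a real implementation would read sizes, timestamps, and lock state from disk.

```rust
/// Hypothetical summary of one on-disk version directory.
#[derive(Debug, Clone, PartialEq)]
pub struct VersionInfo {
    pub name: String,
    pub size_bytes: u64,
    /// Build time as a monotonically increasing number; larger is newer.
    pub built_at: u64,
    /// True while a `.in_use.lock` is held for this version.
    pub in_use: bool,
}

/// Choose versions to delete so that at most `max_count` versions and
/// `max_bytes` total remain, where possible. Oldest versions go first;
/// in-use versions are never selected, even if the limits stay exceeded.
pub fn select_for_pruning(
    versions: &[VersionInfo],
    max_count: usize,
    max_bytes: u64,
) -> Vec<String> {
    let mut kept = versions.len();
    let mut total: u64 = versions.iter().map(|v| v.size_bytes).sum();

    // Only versions without an in-use lock are eligible for removal.
    let mut candidates: Vec<&VersionInfo> =
        versions.iter().filter(|v| !v.in_use).collect();
    candidates.sort_by_key(|v| v.built_at); // oldest first

    let mut doomed = Vec::new();
    for v in candidates {
        // Stop as soon as both limits are satisfied.
        if kept <= max_count && total <= max_bytes {
            break;
        }
        kept -= 1;
        total -= v.size_bytes;
        doomed.push(v.name.clone());
    }
    doomed
}
```

Separating the decision (which names to remove) from the deletion itself keeps the policy easy to test and lets the caller re-check locks immediately before removing a directory.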

Project Structure (High Level)

dbt-nova/
├── src/
│   ├── main.rs
│   ├── lib.rs
│   ├── config/
│   ├── bin/
│   ├── dbt_types.rs
│   ├── error.rs
│   ├── params.rs
│   ├── responses.rs
│   ├── server/
│   ├── tools/
│   ├── manifest/
│   ├── utils/
│   └── warehouse/
├── tests/
├── docs/
└── examples/