Performance Tuning¶
Search Latency¶
Benchmark Profiles (Reproducible)¶
The table below is generated from the search_eval harness on the fixture manifest/qrels and should be treated as a profile reference, not a hard SLA.
Command:
```bash
DBT_NOVA_EVAL_ENABLE_HYBRID=1 \
DBT_NOVA_EVAL_ENABLE_LIFECYCLE=1 \
cargo test --locked --test search_eval compare_lexical_vs_hybrid_search_quality -- --ignored --nocapture
```
For CI smoke and other network-restricted runs:
```bash
DBT_NOVA_EVAL_ENABLE_HYBRID=0 \
DBT_NOVA_EVAL_ENABLE_LIFECYCLE=0 \
DBT_NOVA_EVAL_ALLOW_EMBEDDING_DOWNLOAD=0 \
cargo test --locked --test search_eval compare_lexical_vs_hybrid_search_quality -- --ignored --nocapture
```
Captured: 2026-02-07 (fixture workload, k=10, 10 evaluation queries)
| Profile | hit_rate | recall | mrr | ndcg | mean_ms | p95_ms |
|---|---|---|---|---|---|---|
| lexical_only | 1.0000 | 1.0000 | 0.8667 | 0.8808 | 15.35 | 22.61 |
| hybrid | 1.0000 | 1.0000 | 1.0000 | 0.9808 | 342.03 | 390.72 |
| delta (hybrid - lexical) | 0.0000 | 0.0000 | 0.1333 | 0.1000 | 326.68 | 368.11 |
Lifecycle timings from the same run:
| Profile | cold_start_ms | reload_swap_ms |
|---|---|---|
| lexical_only | 599.68 | 713.92 |
| hybrid | 5015.34 | 5927.24 |
| delta (hybrid - lexical) | 4415.66 | 5213.31 |
Reducing Latency¶
- Disable the reranker for interactive search.
- Reduce the number of vector candidates.
- Enable quantization for large manifests.
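As a sketch, these knobs might be exposed as environment variables or config keys. Every name below is a hypothetical placeholder, not a documented setting; consult the configuration reference for your deployment for the real names.

```bash
# All names below are hypothetical illustrations, not documented settings.
export DBT_NOVA_SEARCH_ENABLE_RERANKER=0     # assumption: skips the rerank pass for interactive queries
export DBT_NOVA_SEARCH_VECTOR_CANDIDATES=50  # assumption: fewer ANN candidates lowers latency at some recall cost
export DBT_NOVA_SEARCH_QUANTIZE_VECTORS=1    # assumption: quantized embeddings trade a little accuracy for speed
```

The vector stage is the natural first target: the ~327 ms mean-latency delta between the hybrid and lexical profiles in the table above is attributable to the embedding/rerank work that hybrid search adds.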
Memory Usage¶
Baseline Requirements¶
| Manifest Size | Memory |
|---|---|
| 1,000 entities | ~200MB |
| 10,000 entities | ~800MB |
| 50,000 entities | ~3GB |
Reducing Memory¶
- Disable unused features.
- Limit cache sizes.
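In the same spirit, a minimal sketch; the names are placeholders invented for illustration, not documented settings.

```bash
# Hypothetical names for illustration only; check your configuration reference.
export DBT_NOVA_FEATURES_COLUMN_LINEAGE=0   # assumption: features you do not query can be toggled off
export DBT_NOVA_CACHE_MAX_ENTRIES=10000     # assumption: bounds the in-memory result/index caches
```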
Notes¶
- The first load builds indexes and embeddings; subsequent runs reuse rkyv caches.
- Latency depends on manifest size, enabled features, and storage IO.
Measuring Embedding Uplift¶
Use the manual search evaluation harness to quantify quality and latency deltas between lexical-only and hybrid search.
Evaluate on fixture qrels¶
```bash
DBT_NOVA_EVAL_ALLOW_EMBEDDING_DOWNLOAD=1 \
DBT_NOVA_EVAL_REQUIRE_MODELS=1 \
cargo test --test search_eval -- --ignored --nocapture
```
The report includes:
- hit_rate@k
- recall@k
- mrr@k
- ndcg@k
- mean and p95 latency
- cold startup time
- reload swap time (from the reload trigger until the new manifest hash becomes active)
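Since the harness prints its report to stdout (that is what `--nocapture` is for), a simple way to keep a record for run-to-run comparison is to tee the output to a file; the file name here is arbitrary:

```bash
DBT_NOVA_EVAL_ALLOW_EMBEDDING_DOWNLOAD=1 \
cargo test --test search_eval -- --ignored --nocapture | tee "eval-report-$(date +%F).txt"
```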
Evaluate on your manifest/qrels¶
```bash
DBT_NOVA_EVAL_MANIFEST_PATH=/path/to/manifest.json \
DBT_NOVA_EVAL_QRELS_PATH=/path/to/qrels.json \
DBT_NOVA_EVAL_EMBEDDINGS_CACHE_DIR="$HOME/.dbt-nova-models" \
DBT_NOVA_EVAL_ALLOW_EMBEDDING_DOWNLOAD=1 \
DBT_NOVA_EVAL_TOP_K=10 \
DBT_NOVA_EVAL_MIN_QUERY_COUNT=25 \
cargo test --test search_eval -- --ignored --nocapture
```
For large manifests, you can increase the reload timing timeout.
Disable lifecycle timing (quality-only run):
```bash
DBT_NOVA_EVAL_ENABLE_LIFECYCLE=0 \
DBT_NOVA_EVAL_ALLOW_EMBEDDING_DOWNLOAD=1 \
cargo test --test search_eval -- --ignored --nocapture
```
Optional assertions (for CI gates)¶
```bash
DBT_NOVA_EVAL_ASSERT_HYBRID_NONDECREASING=1 \
DBT_NOVA_EVAL_ASSERT_MIN_DELTA_MRR=0.02 \
DBT_NOVA_EVAL_ASSERT_MIN_DELTA_RECALL=0.03 \
DBT_NOVA_EVAL_ASSERT_MAX_COLD_START_MS=60000 \
DBT_NOVA_EVAL_ASSERT_MAX_RELOAD_SWAP_MS=90000 \
DBT_NOVA_EVAL_ALLOW_EMBEDDING_DOWNLOAD=1 \
DBT_NOVA_EVAL_REQUIRE_MODELS=1 \
cargo test --test search_eval -- --ignored --nocapture
```
qrels format¶
tests/fixtures/search_eval_qrels.json is the reference shape. Each query defines expected relevant unique_ids with optional graded relevance.
Recommended for release evaluation:
- at least 25-50 queries
- no duplicate query IDs
- no duplicate relevant unique_id entries within a query
- positive relevance grades only
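A minimal qrels sketch, assuming the general shape described above; the field names here are illustrative guesses, and tests/fixtures/search_eval_qrels.json remains the authoritative reference:

```json
{
  "queries": [
    {
      "query_id": "q1",
      "query": "orders fact model",
      "relevant": [
        { "unique_id": "model.analytics.fct_orders", "grade": 2 },
        { "unique_id": "model.analytics.stg_orders", "grade": 1 }
      ]
    }
  ]
}
```

Grades are optional per the description above; when omitted, every listed unique_id is presumably treated as equally relevant.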