Nova Search Ranking (Design Spec)
This document defines how Nova meta is indexed and ranked to improve discoverability without increasing metadata maintenance overhead.
Goals
- Make canonical datasets and metrics surface first.
- Allow analysts to search by business terms (synonyms, use cases, domains).
- Support KPI discovery without creating duplicate per‑slice metric models.
Indexed Nova Fields
Nova meta is indexed into dedicated search fields:
| Meta location | Field | Purpose |
|---|---|---|
| meta.nova.synonyms | nova_synonyms | Business terms for the dataset |
| meta.nova.domains | nova_domains | Domain routing (e.g., web, ecommerce) |
| meta.nova.use_cases | nova_use_cases | Intent queries (e.g., weekly_report) |
| meta.nova.measures[].name + synonyms[] | nova_measures | Measure discovery (sessions, revenue) |
| meta.nova.metric(s).name + synonyms[] | nova_metric | KPI discovery (conversion_rate) |
| meta.nova.metric(s).description | nova_metric | Metric definition keywords |
| meta.nova.governance.sensitivity | nova_sensitivity | Governance discovery (none/public/internal/confidential/low/medium/high/restricted) |
| meta.nova.governance.pii | nova_pii | PII signal (pii + level) |
| meta.nova.governance.compliance[] | nova_compliance | Compliance frameworks (gdpr, soc2, hipaa) |
Column‑level meta.nova.synonyms, semantic_type, and example_values are appended to the columns search field to improve column discovery (e.g., “UK” → country_code).
Default Boosts (Config)
These values are tuned for high signal with minimal noise:
DBT_NOVA_ALIAS_BOOST=18.0
DBT_NOVA_NAME_BOOST=12.0
DBT_NOVA_DESCRIPTION_BOOST=6.0
DBT_NOVA_COLUMN_BOOST=4.0
DBT_NOVA_TAG_BOOST=3.0
DBT_NOVA_PATH_BOOST=2.0
DBT_NOVA_CODE_BOOST=1.5
DBT_NOVA_META_SYNONYMS_BOOST=7.0
DBT_NOVA_META_MEASURES_BOOST=8.0
DBT_NOVA_META_METRIC_BOOST=10.0
DBT_NOVA_META_SENSITIVITY_BOOST=6.0
DBT_NOVA_META_PII_BOOST=8.0
DBT_NOVA_META_COMPLIANCE_BOOST=6.0
DBT_NOVA_META_DOMAINS_BOOST=4.0
DBT_NOVA_META_USE_CASES_BOOST=4.0
DBT_NOVA_SEARCH_STAGING_DEBOOST_FACTOR=0.6
DBT_NOVA_SEARCH_MEASURE_MATCH_MULTIPLIER=1.15
DBT_NOVA_SEARCH_METRIC_MATCH_MULTIPLIER=1.20
DBT_NOVA_SEARCH_SYNONYM_MATCH_MULTIPLIER=1.20
DBT_NOVA_SEARCH_CANONICAL_MATCH_MULTIPLIER=1.08
DBT_NOVA_SEARCH_CANONICAL_META_MATCH_MULTIPLIER=1.35
DBT_NOVA_SEARCH_CANONICAL_META_MATCH_BONUS=2.5
DBT_NOVA_SEARCH_ENGINEER_EXACT_MATCH_MULTIPLIER=2.0
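As a rough illustration of how such settings could be consumed, the sketch below reads a subset of the boosts from the environment and falls back to the defaults listed above. The `load_boosts` helper and the fall-back-on-invalid-value behavior are assumptions made for this example, not a description of Nova's actual config loader.

```python
import os

# Subset of the defaults listed above; keys mirror the env var names.
DEFAULT_BOOSTS = {
    "DBT_NOVA_ALIAS_BOOST": 18.0,
    "DBT_NOVA_NAME_BOOST": 12.0,
    "DBT_NOVA_META_METRIC_BOOST": 10.0,
    "DBT_NOVA_META_SYNONYMS_BOOST": 7.0,
    "DBT_NOVA_SEARCH_STAGING_DEBOOST_FACTOR": 0.6,
}


def load_boosts(env=None):
    """Return boost values, letting environment variables override defaults.

    Hypothetical helper: unparseable values fall back to the default,
    which is an assumption rather than documented behavior.
    """
    env = os.environ if env is None else env
    boosts = {}
    for name, default in DEFAULT_BOOSTS.items():
        raw = env.get(name)
        try:
            boosts[name] = float(raw) if raw is not None else default
        except ValueError:
            boosts[name] = default
    return boosts
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in isolation, for example `load_boosts({"DBT_NOVA_NAME_BOOST": "15"})`.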
Ranking Behavior
1) Exact business terms (synonyms/metric names) score close to name matches.
2) Measures are next‑highest (sessions, revenue) so KPI‑driven queries surface the canonical model even without a metric model.
3) Domains/use_cases provide intent routing but stay below synonyms/measures.
4) description, columns, tags, path, and code remain as backstops.
5) Staging-layer models are de‑boosted so curated models rank higher. This deboost is layer-rule driven (DBT_NOVA_LAYER_RULES_JSON) and applies when the resolved layer is staging, stage, or stg.
6) Nova meta re‑ranking boosts canonical models when query tokens match meta.nova.measures, meta.nova.metric(s), or meta.nova.synonyms.
7) Governance terms (sensitivity/pii/compliance) get elevated when present.
8) Analyst semantic readiness adds a bounded multiplier for entities with metric/measure definitions, grain/time-field metadata, and query-aligned dimensions. Non-semantic entities receive a slight de-boost.
9) Analyst near-tie hinting emits analysis_hints when top candidates are within a small score gap, prompting get_entity/get_context validation before SQL generation.
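The staging de-boost and the canonical meta re-ranking described above can be sketched as follows. The constants reuse the default values from the config list; the function name, its parameters, and the order in which the multiplier and bonus are applied are assumptions for illustration only.

```python
# Layers treated as staging, per the layer-rule description above.
STAGING_LAYERS = {"staging", "stage", "stg"}

STAGING_DEBOOST_FACTOR = 0.6            # DBT_NOVA_SEARCH_STAGING_DEBOOST_FACTOR
CANONICAL_META_MATCH_MULTIPLIER = 1.35  # DBT_NOVA_SEARCH_CANONICAL_META_MATCH_MULTIPLIER
CANONICAL_META_MATCH_BONUS = 2.5        # DBT_NOVA_SEARCH_CANONICAL_META_MATCH_BONUS


def adjust_score(score, layer, is_canonical, query_tokens, meta_terms):
    """Apply a staging de-boost, then a canonical meta-match boost.

    Hypothetical helper: multiply-then-add is an assumed ordering.
    meta_terms stands in for measure, metric, and synonym terms.
    """
    if layer in STAGING_LAYERS:
        score *= STAGING_DEBOOST_FACTOR
    if is_canonical and set(query_tokens) & set(meta_terms):
        score = score * CANONICAL_META_MATCH_MULTIPLIER + CANONICAL_META_MATCH_BONUS
    return score
```

Under these assumptions, a staging model keeps 60% of its base score, while a canonical model whose measures include a query token is lifted above a non-canonical model with the same base score.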
Scoring Pipeline (Lexical + Vector + RRF + Reranker)
Nova fuses multiple retrieval channels into a single ranked list:
1) Lexical search uses Tantivy BM25 with field boosts.
2) Vector (dense) and sparse retrievers produce their own ranked lists.
3) RRF fusion combines all retrievers into a single score.
4) Reranker (if enabled) rescales the top‑N results using a cross‑encoder.
Weighted RRF formula
For each retriever list, Nova assigns a score based on rank position:
- weight is the persona‑specific weight for that retriever (e.g., vector, bm25, sparse).
- k is the RRF smoothing constant (DBT_NOVA_SEARCH_RRF_K).
- rank is 0‑based within that retriever list.
Final RRF scores are summed across retrievers to produce the fused ranking.
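The exact expression is not reproduced in this section. A minimal sketch, assuming the common weighted form `weight / (k + rank + 1)` (the `+ 1` compensates for the 0-based rank) and a placeholder `k=60`, might look like this. Both the formula shape and the value of `k` are assumptions; the real default for DBT_NOVA_SEARCH_RRF_K is not stated here.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, k=60):
    """Fuse per-retriever rankings into one list of (doc_id, score).

    ranked_lists: {retriever_name: [doc_id, ...]} in best-first order.
    weights: {retriever_name: weight}; missing retrievers default to 1.0.
    k: RRF smoothing constant (60 is an assumed placeholder).
    """
    fused = defaultdict(float)
    for retriever, doc_ids in ranked_lists.items():
        weight = weights.get(retriever, 1.0)
        for rank, doc_id in enumerate(doc_ids):  # rank is 0-based
            fused[doc_id] += weight / (k + rank + 1)
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

With the analyst weights from the persona table (bm25=1.0, vector=1.5), a document ranked first by the vector retriever and second by BM25 outranks one with the positions reversed, because the vector list contributes more per rank position.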
Reranker adjustments
If DBT_NOVA_SEARCH_ENABLE_RERANKER=true, Nova reranks the top‑N fused hits (DBT_NOVA_SEARCH_RERANK_TOP_N) using a cross‑encoder model. The reranker reorders the top‑N results while keeping the remainder of the list in fused order.
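The top‑N behavior can be sketched as below. Here `score_fn` is a hypothetical stand-in for the cross‑encoder's relevance score; no particular model or library is implied.

```python
def rerank_top_n(fused_hits, score_fn, top_n):
    """Reorder the first top_n hits by score_fn; leave the rest untouched.

    fused_hits: hits already sorted by fused (RRF) score, best first.
    score_fn: callable returning a relevance score for a hit
              (placeholder for a cross-encoder).
    """
    head, tail = fused_hits[:top_n], fused_hits[top_n:]
    # sorted() is stable, so ties keep their fused order.
    reranked = sorted(head, key=score_fn, reverse=True)
    return reranked + tail
```

A hit outside the top‑N window never moves, even if the cross‑encoder would have scored it highly, which bounds the cost of reranking to N model calls.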
Persona Weights
Persona-specific weights (analyst/engineer/governance) are applied after base scoring. They amplify different signals without changing the underlying Nova metadata fields.
Current defaults (configurable via the persona weight env vars below):
| Signal | Analyst | Engineer | Governance | Default |
|---|---|---|---|---|
| bm25 | 1.0 | 1.5 | 1.2 | 1.0 |
| ngram | 1.1 | 1.0 | 1.0 | 1.0 |
| fuzzy | 1.0 | 0.8 | 1.0 | 1.0 |
| vector | 1.5 | 0.8 | 1.0 | 1.0 |
| sparse | 1.2 | 1.0 | 1.3 | 1.0 |
| measures | 1.3 | 0.9 | 0.9 | 1.0 |
| metrics | 1.4 | 0.9 | 0.9 | 1.0 |
| synonyms | 1.2 | 0.9 | 1.0 | 1.0 |
| docs | 1.2 | 0.9 | 1.3 | 1.0 |
| tests | 1.2 | 1.0 | 1.4 | 1.0 |
| tags | 1.0 | 0.9 | 1.4 | 1.0 |
| path | 0.9 | 1.3 | 1.0 | 1.0 |
If you need different weightings, override via env:
DBT_NOVA_SEARCH_PERSONA_ANALYST_WEIGHTS="vector=1.6,docs=1.3"
DBT_NOVA_SEARCH_PERSONA_ENGINEER_WEIGHTS="bm25=1.6,path=1.4"
DBT_NOVA_SEARCH_PERSONA_GOVERNANCE_WEIGHTS="tags=1.6,tests=1.5"
DBT_NOVA_SEARCH_PERSONA_DEFAULT_WEIGHTS="bm25=1.0"
Any weight not listed remains at its default.
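The override format above is a comma-separated list of `signal=value` pairs. A minimal parsing sketch follows, using the analyst column of the table as defaults; the helper names are hypothetical, and raising on malformed pairs is an assumption about error handling.

```python
# Analyst defaults copied from the persona weight table above.
ANALYST_DEFAULTS = {
    "bm25": 1.0, "ngram": 1.1, "fuzzy": 1.0, "vector": 1.5,
    "sparse": 1.2, "measures": 1.3, "metrics": 1.4, "synonyms": 1.2,
    "docs": 1.2, "tests": 1.2, "tags": 1.0, "path": 0.9,
}


def parse_weight_overrides(raw):
    """Parse 'vector=1.6,docs=1.3' into {'vector': 1.6, 'docs': 1.3}.

    Malformed pairs raise ValueError (assumed behavior).
    """
    overrides = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue  # tolerate empty input and trailing commas
        key, _, value = pair.partition("=")
        overrides[key.strip()] = float(value)
    return overrides


def resolve_weights(defaults, raw_override):
    """Merge overrides on top of defaults; unlisted weights stay as-is."""
    return {**defaults, **parse_weight_overrides(raw_override or "")}
```

For example, `resolve_weights(ANALYST_DEFAULTS, "vector=1.6,docs=1.3")` changes only `vector` and `docs`; `bm25` stays at 1.0.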
Persona Resource-Type Priorities
After lexical/vector/sparse fusion, Nova applies persona-aware resource-type multipliers to reduce cross-persona noise:
- Analyst: boosts metric, semantic_model, saved_query, and curated model results; de-boosts test and macro.
- Engineer: boosts model, source, test, and macro; slightly de-boosts pure business artifacts (metric, semantic_model, saved_query).
- Governance: strongly boosts test, and also boosts model, source, and exposure; de-boosts macro.
These multipliers are deterministic defaults in code and complement persona retriever weights. The goal is to keep top-K results high-signal for each agent persona without requiring strict resource-type filters on every call.
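To make the mechanism concrete, the sketch below applies per-persona multipliers to fused hits. The numeric values are hypothetical placeholders chosen only to reflect the boost/de-boost directions listed above; the real defaults live in code and are not documented here. The curated-model boost for analysts is omitted because it depends on layer information the sketch does not model.

```python
# Hypothetical multipliers for illustration only; not Nova's real defaults.
RESOURCE_TYPE_MULTIPLIERS = {
    "analyst": {"metric": 1.2, "semantic_model": 1.2, "saved_query": 1.2,
                "test": 0.8, "macro": 0.8},
    "engineer": {"model": 1.2, "source": 1.2, "test": 1.2, "macro": 1.2,
                 "metric": 0.95, "semantic_model": 0.95, "saved_query": 0.95},
    "governance": {"test": 1.4, "model": 1.2, "source": 1.2, "exposure": 1.2,
                   "macro": 0.8},
}


def apply_resource_type_priority(hits, persona):
    """Rescale (doc_id, resource_type, score) hits and re-sort best first.

    Unknown personas and unlisted resource types use a neutral 1.0.
    """
    multipliers = RESOURCE_TYPE_MULTIPLIERS.get(persona, {})
    rescored = [
        (doc_id, rtype, score * multipliers.get(rtype, 1.0))
        for doc_id, rtype, score in hits
    ]
    return sorted(rescored, key=lambda hit: hit[2], reverse=True)
```

With these placeholder values, an analyst query that fuses a macro slightly ahead of a metric would see the metric move to the top after the multipliers are applied.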
Why This Avoids Metric Model Explosion
With nova_metric and nova_measures indexed:
- Query: “conversion rate” → matches metric(s).synonyms in canonical base model.
- Query: “sessions” → matches measure name/synonyms.
Analysts can then apply filters (web/app, country, device) at query time instead of having one metric model per slice.
When to Add Metric Models Anyway
Metric models are still recommended for:
- Cross‑model joins or complex logic.
- Executive or compliance KPIs that must be enforced centrally.
See Also
- Tools Reference - Search tool documentation
- Configuration Reference - Environment variables
- Search Defaults - Default search behavior
- Personas - Persona-specific search behavior