Nova Search Ranking (Design Spec)

This document defines how Nova meta is indexed and ranked to improve discoverability without increasing metadata maintenance overhead.

Goals

  • Make canonical datasets and metrics surface first.
  • Allow analysts to search by business terms (synonyms, use cases, domains).
  • Support KPI discovery without creating duplicate per‑slice metric models.

Indexed Nova Fields

Nova meta is indexed into dedicated search fields:

Meta location | Field | Purpose
meta.nova.synonyms | nova_synonyms | Business terms for the dataset
meta.nova.domains | nova_domains | Domain routing (e.g., web, ecommerce)
meta.nova.use_cases | nova_use_cases | Intent queries (e.g., weekly_report)
meta.nova.measures[].name + synonyms[] | nova_measures | Measure discovery (sessions, revenue)
meta.nova.metric(s).name + synonyms[] | nova_metric | KPI discovery (conversion_rate)
meta.nova.metric(s).description | nova_metric | Metric definition keywords
meta.nova.governance.sensitivity | nova_sensitivity | Governance discovery (none/public/internal/confidential/low/medium/high/restricted)
meta.nova.governance.pii | nova_pii | PII signal (pii + level)
meta.nova.governance.compliance[] | nova_compliance | Compliance frameworks (gdpr, soc2, hipaa)

Column‑level meta.nova.synonyms, semantic_type, and example_values are appended to the columns search field to improve column discovery (e.g., “UK” → country_code).
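To make the indexed fields concrete, here is an illustrative dbt schema fragment exercising them. The model name, values, and exact key shapes are hypothetical examples, not a prescribed schema:

```yaml
# Illustrative only: a model meta block touching each indexed Nova field.
models:
  - name: fct_web_sessions          # hypothetical canonical model
    meta:
      nova:
        synonyms: ["web traffic", "site visits"]
        domains: ["web"]
        use_cases: ["weekly_report"]
        measures:
          - name: sessions
            synonyms: ["visits"]
        metrics:
          - name: conversion_rate
            synonyms: ["cvr", "conversion rate"]
            description: "Orders divided by sessions."
        governance:
          sensitivity: internal
          pii: false
          compliance: ["gdpr"]
    columns:
      - name: country_code
        meta:
          nova:
            synonyms: ["country"]          # appended to the columns field
            semantic_type: geo_country
            example_values: ["UK", "US"]   # enables "UK" → country_code
```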

Default Boosts (Config)

These values are tuned for high signal with minimal noise:

DBT_NOVA_ALIAS_BOOST=18.0
DBT_NOVA_NAME_BOOST=12.0
DBT_NOVA_DESCRIPTION_BOOST=6.0
DBT_NOVA_COLUMN_BOOST=4.0
DBT_NOVA_TAG_BOOST=3.0
DBT_NOVA_PATH_BOOST=2.0
DBT_NOVA_CODE_BOOST=1.5

DBT_NOVA_META_SYNONYMS_BOOST=7.0
DBT_NOVA_META_MEASURES_BOOST=8.0
DBT_NOVA_META_METRIC_BOOST=10.0
DBT_NOVA_META_SENSITIVITY_BOOST=6.0
DBT_NOVA_META_PII_BOOST=8.0
DBT_NOVA_META_COMPLIANCE_BOOST=6.0
DBT_NOVA_META_DOMAINS_BOOST=4.0
DBT_NOVA_META_USE_CASES_BOOST=4.0
DBT_NOVA_SEARCH_STAGING_DEBOOST_FACTOR=0.6
DBT_NOVA_SEARCH_MEASURE_MATCH_MULTIPLIER=1.15
DBT_NOVA_SEARCH_METRIC_MATCH_MULTIPLIER=1.20
DBT_NOVA_SEARCH_SYNONYM_MATCH_MULTIPLIER=1.20
DBT_NOVA_SEARCH_CANONICAL_MATCH_MULTIPLIER=1.08
DBT_NOVA_SEARCH_CANONICAL_META_MATCH_MULTIPLIER=1.35
DBT_NOVA_SEARCH_CANONICAL_META_MATCH_BONUS=2.5
DBT_NOVA_SEARCH_ENGINEER_EXACT_MATCH_MULTIPLIER=2.0
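As a rough mental model for how the field boosts combine, the sketch below sums raw per-field match scores weighted by their configured boost. The simple weighted-sum scoring and the `boosted_score` helper are illustrative assumptions, not Nova's actual lexical scorer (which uses BM25):

```python
# Hypothetical sketch: combine per-field match scores using the boosts above.
# The weighted sum is an illustration; the real scorer is Tantivy BM25.
FIELD_BOOSTS = {
    "alias": 18.0,
    "name": 12.0,
    "description": 6.0,
    "columns": 4.0,
    "tags": 3.0,
    "path": 2.0,
    "code": 1.5,
    "nova_synonyms": 7.0,
    "nova_measures": 8.0,
    "nova_metric": 10.0,
}

def boosted_score(field_scores: dict[str, float]) -> float:
    """Sum raw per-field scores weighted by their configured boost."""
    return sum(FIELD_BOOSTS.get(field, 1.0) * score
               for field, score in field_scores.items())

hit = {"name": 0.9, "nova_metric": 0.7, "description": 0.2}
print(round(boosted_score(hit), 2))  # 12.0*0.9 + 10.0*0.7 + 6.0*0.2 = 19.0
```

Note how a strong `nova_metric` match (boost 10.0) contributes nearly as much as a name match, which is the intent of making business terms score close to names.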

Ranking Behavior

1) Exact business terms (synonyms/metric names) score close to name matches.
2) Measures are next‑highest (sessions, revenue) so KPI‑driven queries surface the canonical model even without a metric model.
3) Domains/use_cases provide intent routing but stay below synonyms/measures.
4) description, columns, tags, path, and code remain as backstops.
5) Staging-layer models are de‑boosted so curated models rank higher. This de‑boost is layer-rule driven (DBT_NOVA_LAYER_RULES_JSON) and applies when the resolved layer is staging, stage, or stg.
6) Nova meta re‑ranking boosts canonical models when query tokens match meta.nova.measures, meta.nova.metric(s), or meta.nova.synonyms.
7) Governance terms (sensitivity/pii/compliance) get elevated when present.
8) Analyst semantic readiness adds a bounded multiplier for entities with metric/measure definitions, grain/time-field metadata, and query-aligned dimensions. Non-semantic entities receive a slight de-boost.
9) Analyst near-tie hinting emits analysis_hints when top candidates are within a small score gap, prompting get_entity/get_context validation before SQL generation.

Scoring Pipeline (Lexical + Vector + RRF + Reranker)

Nova fuses multiple retrieval channels into a single ranked list:

1) Lexical search uses Tantivy BM25 with field boosts.
2) Vector (dense) and sparse retrievers produce their own ranked lists.
3) RRF fusion combines all retrievers into a single score.
4) Reranker (if enabled) rescales the top‑N results using a cross‑encoder.

Weighted RRF formula

For each retriever list, Nova assigns a score based on rank position:

rrf_score = weight / (k + rank + 1)
  • weight is the persona‑specific weight for that retriever (e.g., vector, bm25, sparse).
  • k is the RRF smoothing constant (DBT_NOVA_SEARCH_RRF_K).
  • rank is 0‑based within that retriever list.

Final RRF scores are summed across retrievers to produce the fused ranking.
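The formula above can be sketched directly. This is a generic weighted-RRF implementation under stated assumptions (retriever names and k=60 are illustrative; DBT_NOVA_SEARCH_RRF_K sets k in practice):

```python
# Hypothetical sketch of weighted RRF fusion across retriever result lists.
# Each retriever contributes weight / (k + rank + 1) per document it returns,
# and contributions are summed per document across retrievers.
from collections import defaultdict

def weighted_rrf(retriever_lists: dict[str, list[str]],
                 weights: dict[str, float],
                 k: float = 60.0) -> dict[str, float]:
    fused: dict[str, float] = defaultdict(float)
    for retriever, docs in retriever_lists.items():
        w = weights.get(retriever, 1.0)
        for rank, doc in enumerate(docs):  # rank is 0-based within this list
            fused[doc] += w / (k + rank + 1)
    return dict(fused)

scores = weighted_rrf(
    {"bm25": ["model_a", "model_b"], "vector": ["model_b", "model_a"]},
    weights={"bm25": 1.0, "vector": 1.5},
    k=60.0,
)
# model_b wins: the higher-weighted vector retriever ranked it first.
print(scores["model_b"] > scores["model_a"])  # True
```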

Reranker adjustments

If DBT_NOVA_SEARCH_ENABLE_RERANKER=true, Nova reranks the top‑N fused hits (DBT_NOVA_SEARCH_RERANK_TOP_N) using a cross‑encoder model. The reranker reorders the top‑N results while keeping the remainder of the list in fused order.
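The head/tail split can be sketched as below. The `cross_encoder` callable stands in for the real model's scoring function; `apply_reranker` and the sample scores are hypothetical:

```python
# Hypothetical sketch: rerank only the top-N fused hits with a cross-encoder
# relevance function, keeping the remainder of the list in fused order.
from typing import Callable

def apply_reranker(fused_hits: list[str],
                   cross_encoder: Callable[[str], float],
                   top_n: int) -> list[str]:
    head = sorted(fused_hits[:top_n], key=cross_encoder, reverse=True)
    return head + fused_hits[top_n:]  # tail stays in fused order

hits = ["a", "b", "c", "d"]
relevance = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.99}
# "d" scores highest but sits outside top_n=3, so it is not promoted.
print(apply_reranker(hits, relevance.get, top_n=3))  # ['b', 'c', 'a', 'd']
```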

Persona Weights

Persona-specific weights (analyst/engineer/governance) are applied after base scoring. They amplify different signals without changing the underlying Nova metadata fields.

Current defaults (configurable via the persona weight env vars below):

Signal | Analyst | Engineer | Governance | Default
bm25 | 1.0 | 1.5 | 1.2 | 1.0
ngram | 1.1 | 1.0 | 1.0 | 1.0
fuzzy | 1.0 | 0.8 | 1.0 | 1.0
vector | 1.5 | 0.8 | 1.0 | 1.0
sparse | 1.2 | 1.0 | 1.3 | 1.0
measures | 1.3 | 0.9 | 0.9 | 1.0
metrics | 1.4 | 0.9 | 0.9 | 1.0
synonyms | 1.2 | 0.9 | 1.0 | 1.0
docs | 1.2 | 0.9 | 1.3 | 1.0
tests | 1.2 | 1.0 | 1.4 | 1.0
tags | 1.0 | 0.9 | 1.4 | 1.0
path | 0.9 | 1.3 | 1.0 | 1.0

If you need different weightings, override via env:

DBT_NOVA_SEARCH_PERSONA_ANALYST_WEIGHTS="vector=1.6,docs=1.3"
DBT_NOVA_SEARCH_PERSONA_ENGINEER_WEIGHTS="bm25=1.6,path=1.4"
DBT_NOVA_SEARCH_PERSONA_GOVERNANCE_WEIGHTS="tags=1.6,tests=1.5"
DBT_NOVA_SEARCH_PERSONA_DEFAULT_WEIGHTS="bm25=1.0"

Any weight not listed remains at its default.
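A sketch of how such an override string might be parsed and merged over defaults. The `parse_weights` helper and the default subset shown are illustrative assumptions, not Nova's actual parser:

```python
# Hypothetical sketch: parse "vector=1.6,docs=1.3"-style env overrides and
# merge them over built-in defaults; unlisted weights keep their default.
DEFAULTS = {"bm25": 1.0, "vector": 1.5, "docs": 1.2}  # analyst subset, illustrative

def parse_weights(env_value: str, defaults: dict[str, float]) -> dict[str, float]:
    merged = dict(defaults)
    for pair in filter(None, env_value.split(",")):
        key, _, value = pair.partition("=")
        merged[key.strip()] = float(value)
    return merged

print(parse_weights("vector=1.6,docs=1.3", DEFAULTS))
# {'bm25': 1.0, 'vector': 1.6, 'docs': 1.3}
```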

Persona Resource-Type Priorities

After lexical/vector/sparse fusion, Nova applies persona-aware resource-type multipliers to reduce cross-persona noise:

  • Analyst: boosts metric, semantic_model, saved_query, and curated model results; de-boosts test and macro.
  • Engineer: boosts model, source, test, and macro; slightly de-boosts pure business artifacts (metric, semantic_model, saved_query).
  • Governance: strongly boosts test, and also boosts model, source, and exposure; de-boosts macro.

These multipliers are deterministic defaults in code and complement persona retriever weights. The goal is to keep top-K results high-signal for each agent persona without requiring strict resource-type filters on every call.
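The per-persona multipliers above could be applied as a simple post-fusion lookup. The numeric values below are illustrative placeholders chosen to match the described directions (boost vs. de-boost), not the actual in-code defaults:

```python
# Hypothetical sketch: persona-aware resource-type multipliers applied after
# fusion. Values are illustrative; only the boost/de-boost directions follow
# the spec text above.
PERSONA_TYPE_MULTIPLIERS = {
    "analyst": {"metric": 1.3, "semantic_model": 1.3, "saved_query": 1.2,
                "test": 0.7, "macro": 0.7},
    "engineer": {"model": 1.2, "source": 1.2, "test": 1.1, "macro": 1.1,
                 "metric": 0.9, "semantic_model": 0.9, "saved_query": 0.9},
    "governance": {"test": 1.4, "model": 1.1, "source": 1.1,
                   "exposure": 1.1, "macro": 0.8},
}

def apply_type_multiplier(score: float, persona: str, resource_type: str) -> float:
    """Scale a fused score by the persona's multiplier for this resource type."""
    return score * PERSONA_TYPE_MULTIPLIERS.get(persona, {}).get(resource_type, 1.0)

print(round(apply_type_multiplier(10.0, "governance", "test"), 2))  # 14.0
print(round(apply_type_multiplier(10.0, "analyst", "macro"), 2))    # 7.0
```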

Why This Avoids Metric Model Explosion

With nova_metric and nova_measures indexed:

  • Query: “conversion rate” → matches metric(s).synonyms in canonical base model.
  • Query: “sessions” → matches measure name/synonyms.

Analysts can then apply filters (web/app, country, device) at query time instead of having one metric model per slice.

When to Add Metric Models Anyway

Metric models are still recommended for:

  • Cross‑model joins or complex logic.
  • Executive or compliance KPIs that must be enforced centrally.

See Also