Data Connectors¶

SlideFlow supports six connector types for chart/replacement data sources:

csv
json
databricks
duckdb
dbt (composable, preferred)
databricks_dbt (legacy, still supported)

Use these in any data_source block for charts or replacements.

For a step-by-step migration from legacy databricks_dbt to composable dbt, see DBT Migration Guide.

Installation extras¶

Install connector dependencies based on the connectors you use:

# Base package
pip install slideflow-presentations

# Databricks SQL connector
pip install "slideflow-presentations[databricks]"

# dbt connectors (includes dbt-core adapter stack + Git clone support)
pip install "slideflow-presentations[dbt]"

# Optional dbt warehouse backends
pip install "slideflow-presentations[bigquery]"
pip install "slideflow-presentations[duckdb]"

Connector Matrix¶

Type	Best for	Requires network	Required env vars
`csv`	local tabular files	no	none
`json`	local API exports/events	no	none
`databricks`	direct warehouse SQL	yes	`DATABRICKS_HOST`, `DATABRICKS_HTTP_PATH`, `DATABRICKS_ACCESS_TOKEN`
`duckdb`	direct local/in-memory SQL	no	none
`dbt`	dbt model SQL executed on Databricks, BigQuery, or DuckDB (composable config)	yes/no (depends on warehouse and dbt repo)	Databricks env vars, BigQuery project/auth env vars, or none for local DuckDB (+ Git token env if needed)
`databricks_dbt`	dbt model SQL executed on Databricks	yes	same Databricks env vars (+ Git token env if needed)

CSV¶

data_source:
  type: "csv"
  name: "sales_csv"
  file_path: "./data/sales.csv"

Notes:

Uses pandas.read_csv with default parsing behavior.
Relative paths resolve against the current working directory at run time.
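To see where a relative file_path will point before a run, a small check like the following can help. This is a minimal sketch of ordinary working-directory resolution using only the standard library. The helper name resolve_data_path is hypothetical and is not part of SlideFlow.

```python
from pathlib import Path
from typing import Optional


def resolve_data_path(file_path: str, cwd: Optional[Path] = None) -> Path:
    """Return the absolute location a file_path value would refer to.

    Hypothetical helper: absolute paths are returned unchanged, relative
    paths are joined to the working directory (or to `cwd` when given).
    """
    base = cwd if cwd is not None else Path.cwd()
    path = Path(file_path)
    if path.is_absolute():
        return path
    return (base / path).resolve()


if __name__ == "__main__":
    # Shows which file "./data/sales.csv" means from the current directory.
    print(resolve_data_path("./data/sales.csv"))
```

Running this from the directory where you launch SlideFlow shows whether the path in your config matches the file you expect.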

JSON¶

data_source:
  type: "json"
  name: "events_json"
  file_path: "./data/events.json"
  orient: "records"

Supported orient values:

split
records
index
columns
values
table

If your JSON shape does not match orient, parsing fails.
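The difference between two of the shapes can be shown with plain Python. The sketch below assumes the usual pandas layouts: records is a list of row objects, and split is an object carrying columns and data keys. The helper split_to_records is hypothetical and only illustrates why a mismatched shape cannot be parsed.

```python
import json
from typing import Any, Dict, List


def split_to_records(payload: Any) -> List[Dict[str, Any]]:
    """Convert a 'split'-shaped document into 'records'-shaped rows.

    Hypothetical helper for illustration; raises ValueError when the
    document does not have the 'split' layout.
    """
    if not isinstance(payload, dict):
        raise ValueError("'split' expects a JSON object, not a list or scalar")
    missing = {"columns", "data"} - set(payload)
    if missing:
        raise ValueError(f"'split' document is missing keys: {sorted(missing)}")
    return [dict(zip(payload["columns"], row)) for row in payload["data"]]


# The same two rows written in each shape.
records_doc = json.loads(
    '[{"month": "Jan", "revenue": 10}, {"month": "Feb", "revenue": 12}]'
)
split_doc = json.loads(
    '{"columns": ["month", "revenue"], "data": [["Jan", 10], ["Feb", 12]]}'
)
```

Passing records_doc to split_to_records raises ValueError, which mirrors what happens when orient does not match the file.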

Databricks SQL¶

data_source:
  type: "databricks"
  name: "warehouse_query"
  query: |
    SELECT month, revenue, target
    FROM mart.revenue_summary
    WHERE fiscal_quarter = '{quarter}'
  # optional connector runtime overrides:
  # socket_timeout_s: 300
  # retry_max_attempts: 30
  # retry_max_duration_s: 900
  # retry_delay_min_s: 1
  # retry_delay_max_s: 60

Required environment:

export DATABRICKS_HOST="<workspace-hostname>"
export DATABRICKS_HTTP_PATH="<sql-warehouse-http-path>"
export DATABRICKS_ACCESS_TOKEN="<token>"
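A preflight check for these three variables can catch auth problems before a run starts. The following is a minimal sketch. The function name missing_databricks_env is hypothetical, and SlideFlow's own validation may behave differently.

```python
import os
from typing import List, Mapping, Optional

# The three variables listed as required above.
REQUIRED_DATABRICKS_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_HTTP_PATH",
    "DATABRICKS_ACCESS_TOKEN",
)


def missing_databricks_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_DATABRICKS_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_databricks_env()
    if missing:
        raise SystemExit("Missing Databricks env vars: " + ", ".join(missing))
```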

Optional environment variables for tuning the Databricks connector runtime:

SLIDEFLOW_DATABRICKS_SOCKET_TIMEOUT_S
SLIDEFLOW_DATABRICKS_RETRY_MAX_ATTEMPTS
SLIDEFLOW_DATABRICKS_RETRY_MAX_DURATION_S
SLIDEFLOW_DATABRICKS_RETRY_DELAY_MIN_S
SLIDEFLOW_DATABRICKS_RETRY_DELAY_MAX_S

Databricks request identification:

SlideFlow sets Databricks SQL user_agent_entry to Slideflow so warehouse query history can attribute sessions to SlideFlow runs.

Tips:

Keep SQL deterministic for reporting workflows.
Limit columns to what chart/replacement logic needs.
Prefer validated parameter substitution ({quarter} from batch params) over string concatenation.
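The last tip can be illustrated as follows. This is a sketch of the general principle, not SlideFlow's substitution mechanism, which is not shown here. The function render_query and the allowed-character rule are assumptions chosen for the example.

```python
import re
from typing import Any, Mapping

# Assumed rule for this example: letters, digits, underscore, and hyphen only.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_\-]+")


def render_query(template: str, params: Mapping[str, Any]) -> str:
    """Fill {placeholders} in a SQL template after validating each value.

    Hypothetical helper: rejects any value containing quotes, spaces,
    semicolons, or other characters outside the allowed set.
    """
    for key, value in params.items():
        if not _SAFE_VALUE.fullmatch(str(value)):
            raise ValueError(f"unsafe value for {key!r}: {value!r}")
    return template.format(**{k: str(v) for k, v in params.items()})


sql = render_query(
    "SELECT month, revenue FROM mart.revenue_summary WHERE fiscal_quarter = '{quarter}'",
    {"quarter": "Q1-2026"},
)
```

A value such as "Q1'; DROP TABLE x; --" raises ValueError instead of reaching the warehouse, and a placeholder with no matching parameter raises KeyError.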

DuckDB SQL¶

data_source:
  type: "duckdb"
  name: "local_duckdb_query"
  database: "/tmp/analytics.duckdb" # optional; defaults to ':memory:'
  read_only: true # optional; defaults to true
  file_search_path: # optional; used for relative file references in DuckDB
    - "/tmp/data"
    - "/tmp/snapshots"
  query: |
    SELECT * FROM sales_summary

Notes:

Install DuckDB runtime deps: pip install "slideflow-presentations[duckdb]".
file_search_path can be a list or a comma-separated string.
If file_search_path is omitted, DuckDB uses its default file search behavior.
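Because file_search_path accepts either form, both of the inputs below describe the same search path. The sketch shows one way such normalization could work. The function normalize_file_search_path is hypothetical, and the whitespace trimming is an assumption.

```python
from typing import Iterable, List, Optional, Union


def normalize_file_search_path(
    value: Optional[Union[str, Iterable[str]]]
) -> List[str]:
    """Turn a list or comma-separated string into a clean list of paths.

    Hypothetical helper: None yields an empty list, meaning no override.
    """
    if value is None:
        return []
    parts = value.split(",") if isinstance(value, str) else list(value)
    # Assumption: surrounding whitespace and empty entries are ignored.
    return [str(part).strip() for part in parts if str(part).strip()]


as_string = normalize_file_search_path("/tmp/data, /tmp/snapshots")
as_list = normalize_file_search_path(["/tmp/data", "/tmp/snapshots"])
```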

dbt on Databricks (`dbt`, preferred)¶

This connector compiles a dbt project, resolves a model's compiled SQL, then executes it on Databricks.

data_source:
  type: "dbt"
  name: "dbt_model"
  model_alias: "monthly_revenue_by_region"
  dbt:
    package_url: "https://$GIT_TOKEN@github.com/org/analytics-dbt.git"
    project_dir: "/tmp/dbt_project_workspace"
    branch: "main"
    target: "prod"
    vars:
      start_date: "2026-01-01"
      end_date: "2026-01-31"
    profiles_dir: "/path/to/profiles"
    profile_name: "analytics"
  warehouse:
    type: "databricks"

Behavior highlights:

Repositories are cloned under project_dir/.slideflow_dbt_clones/<key>.
project_dir is treated as a workspace root, not a direct clone target.
If package_url embeds $TOKEN_NAME, that env var must exist at runtime.
If profiles_dir is provided, SlideFlow copies profiles into the cloned dbt workspace and runs dbt with --profiles-dir <clone_dir>.
If profiles_dir is omitted but the cloned repo contains profiles.yml at project root, SlideFlow auto-uses that project-root profiles file.
Compile/dependency work for identical manifest cache keys is deduplicated across concurrent presentation threads in a single run.
If multiple dbt nodes share model_alias, set one of the following to avoid ambiguity errors:
model_unique_id
model_package_name
model_selector_name
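The alias ambiguity rule can be pictured with a small lookup over manifest-style nodes. This sketch assumes each node carries the alias, unique_id, and package_name fields found in a dbt manifest. The function resolve_model is hypothetical, and model_selector_name is left out because its matching rule is not described here.

```python
from typing import Any, Dict, List, Optional


def resolve_model(
    nodes: List[Dict[str, Any]],
    model_alias: str,
    model_unique_id: Optional[str] = None,
    model_package_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Pick exactly one node for an alias, narrowing with optional filters."""
    matches = [n for n in nodes if n["alias"] == model_alias]
    if model_unique_id is not None:
        matches = [n for n in matches if n["unique_id"] == model_unique_id]
    if model_package_name is not None:
        matches = [n for n in matches if n["package_name"] == model_package_name]
    if not matches:
        raise LookupError(f"no dbt model matches alias {model_alias!r}")
    if len(matches) > 1:
        ids = ", ".join(n["unique_id"] for n in matches)
        raise ValueError(f"alias {model_alias!r} is ambiguous: {ids}")
    return matches[0]


# Two packages that both define the same alias (illustrative data).
nodes = [
    {"alias": "monthly_revenue_by_region", "package_name": "analytics",
     "unique_id": "model.analytics.monthly_revenue_by_region"},
    {"alias": "monthly_revenue_by_region", "package_name": "finance",
     "unique_id": "model.finance.monthly_revenue_by_region"},
]
```

With this data, looking up the alias alone raises ValueError, while adding model_package_name="finance" returns a single node.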

dbt on BigQuery (`dbt`)¶

This connector shape compiles a dbt project, resolves a model's compiled SQL, then executes it on BigQuery.

data_source:
  type: "dbt"
  name: "dbt_model_bigquery"
  model_alias: "monthly_revenue_by_region"
  dbt:
    package_url: "https://$GIT_TOKEN@github.com/org/analytics-dbt.git"
    project_dir: "/tmp/dbt_project_workspace"
    branch: "main"
    target: "prod"
    vars:
      start_date: "2026-01-01"
      end_date: "2026-01-31"
    profiles_dir: "/path/to/profiles"
    profile_name: "analytics"
  warehouse:
    type: "bigquery"
    project_id: "my-gcp-project" # optional if BIGQUERY_PROJECT/GOOGLE_CLOUD_PROJECT set
    location: "US" # optional
    credentials_path: "/path/to/service-account.json" # optional
    # credentials_json: '{"type":"service_account",...}' # optional alternative

BigQuery runtime options:

Install BigQuery runtime dependencies: pip install "slideflow-presentations[bigquery]".
Set project id via:
warehouse.project_id, or
BIGQUERY_PROJECT, or
GOOGLE_CLOUD_PROJECT.
Auth options:
warehouse.credentials_path, or
warehouse.credentials_json, or
Application Default Credentials (for example GOOGLE_APPLICATION_CREDENTIALS).
SlideFlow initializes the BigQuery client with client_info.user_agent set to Slideflow for request attribution in Google-side telemetry.
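The three project id sources can be checked with a fallback chain. The sketch below assumes the precedence follows the order listed above (config first, then BIGQUERY_PROJECT, then GOOGLE_CLOUD_PROJECT). That ordering is an assumption, and the function resolve_bigquery_project is hypothetical.

```python
import os
from typing import Any, Mapping, Optional


def resolve_bigquery_project(
    warehouse: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the first project id found among the documented sources."""
    env = os.environ if env is None else env
    candidates = (
        warehouse.get("project_id"),        # explicit config value
        env.get("BIGQUERY_PROJECT"),        # assumed second in precedence
        env.get("GOOGLE_CLOUD_PROJECT"),    # assumed last in precedence
    )
    for candidate in candidates:
        if candidate:
            return candidate
    raise ValueError(
        "Set warehouse.project_id, BIGQUERY_PROJECT, or GOOGLE_CLOUD_PROJECT"
    )
```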

dbt on DuckDB (`dbt`)¶

This connector shape compiles a dbt project, resolves a model's compiled SQL, then executes it on DuckDB.

data_source:
  type: "dbt"
  name: "dbt_model_duckdb"
  model_alias: "monthly_revenue_by_region"
  dbt:
    package_url: "https://$GIT_TOKEN@github.com/org/analytics-dbt.git"
    project_dir: "/tmp/dbt_project_workspace"
    branch: "main"
    target: "prod"
  warehouse:
    type: "duckdb"
    database: "/tmp/warehouse.duckdb" # required for dbt+duckdb
    read_only: true # optional; defaults to true
    file_search_path: # optional
      - "/tmp/dbt_project_workspace"
      - "/tmp/data"

Legacy dbt on Databricks (`databricks_dbt`)¶

This legacy shape is still fully supported for backward compatibility.

data_source:
  type: "databricks_dbt"
  name: "dbt_model_legacy"
  model_alias: "monthly_revenue_by_region"
  package_url: "https://$GIT_TOKEN@github.com/org/analytics-dbt.git"
  project_dir: "/tmp/dbt_project_workspace"
  branch: "main"
  target: "prod"
  vars:
    start_date: "2026-01-01"
    end_date: "2026-01-31"

Caching and Execution¶

SlideFlow caches connector fetches by config identity, which helps when:

multiple charts use the same query/file in one run
multiple replacements reuse one source

Treat connectors as read-only sources during a run for predictable results.
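Caching by config identity can be pictured as keying each fetch on a stable hash of the data_source block. This is a minimal sketch of the idea, not SlideFlow's implementation. The names config_key and SourceCache are hypothetical, and which fields SlideFlow includes in its key is not specified here.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def config_key(config: Dict[str, Any]) -> str:
    """Hash a config so that key order does not change its identity."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SourceCache:
    """Fetch each distinct config once and reuse the result afterwards."""

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}

    def fetch(self, config: Dict[str, Any], loader: Callable[[Dict[str, Any]], Any]) -> Any:
        key = config_key(config)
        if key not in self._entries:
            # First request for this config: run the real fetch.
            self._entries[key] = loader(config)
        return self._entries[key]
```

Two charts that declare the same data_source would share one loader call in this model, which is also why changing the underlying data mid-run gives inconsistent results.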

Environment variables for cache and compile tuning:

SLIDEFLOW_DATA_CACHE_MAX_ENTRIES (global source cache cap)
SLIDEFLOW_DBT_CACHE_MAX_ENTRIES (default from built-in constants)
SLIDEFLOW_DBT_COMPILE_FAILURE_BACKOFF_S
SLIDEFLOW_DBT_FAILURE_CACHE_MAX_ENTRIES

Recommended Workflow¶

Start with local csv/json while designing charts and replacements.
Move to duckdb or databricks once schema and logic are stable.
Move to dbt when business logic should live in dbt models (databricks_dbt remains supported as legacy syntax).
Run slideflow validate before slideflow build in CI/CD.

Troubleshooting¶

File connector errors: check file existence and relative path assumptions.
Databricks auth errors: verify all three Databricks env vars.
dbt model not found: check model_alias, branch, and target.
dbt alias ambiguity: add model_unique_id, model_package_name, or model_selector_name.
dbt Git clone fails: verify token variable in package_url and repo access.
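For the Git clone case, a quick way to verify the token variable is to scan package_url for $NAME references and confirm each one is set. The sketch below assumes references use the $NAME form shown in the examples on this page. The function missing_url_tokens is hypothetical.

```python
import os
import re
from typing import List, Mapping, Optional

# Matches $NAME references such as $GIT_TOKEN inside a URL.
_TOKEN_REF = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def missing_url_tokens(
    package_url: str, env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return env var names referenced in package_url that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in _TOKEN_REF.findall(package_url) if not env.get(name)]


if __name__ == "__main__":
    url = "https://$GIT_TOKEN@github.com/org/analytics-dbt.git"
    missing = missing_url_tokens(url)
    if missing:
        raise SystemExit("Unset token variables: " + ", ".join(missing))
```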