dbt Integration

Your dbt run succeeded.
But the output has already drifted.

A source table changed upstream. Your model compiled fine. But the output has a new null pattern, a type shift, a missing column. Catch drift in transformed data before it reaches downstream consumers.

Free tier · 500K rows/month · no credit card required

Baselines your model schema on first run — every subsequent run is compared automatically for drift.

Common dbt data failures

  • dbt run succeeds but model output has drifted from baseline
  • A source table changed — model compiles fine but produces wrong results
  • Null rate crept up 2% per week — nobody noticed for a month
  • A field type changed upstream — downstream aggregations are silently wrong
  • dbt tests only catch what you wrote rules for — drift you didn't anticipate goes undetected
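The null-rate example above can be made concrete. Here is a minimal sketch in plain Python (no SDK) of why a fixed-threshold test misses gradual creep while a baseline comparison catches it; the 20% rule and the 1.5-point delta limit are illustrative assumptions, not the product's actual checks:

```python
# Weekly null rates for one column, creeping up ~2 points per week.
weekly_null_rates = [0.03, 0.05, 0.07, 0.09, 0.11]

FIXED_THRESHOLD = 0.20        # a typical hand-written dbt-style rule
BASELINE_DELTA_LIMIT = 0.015  # drift check: flag any >1.5-point jump vs. the prior run

# The fixed rule compares each run against an absolute limit.
fixed_alerts = [r for r in weekly_null_rates if r > FIXED_THRESHOLD]

# The drift check compares each run against the previous run's baseline.
drift_alerts = [
    (prev, curr)
    for prev, curr in zip(weekly_null_rates, weekly_null_rates[1:])
    if curr - prev > BASELINE_DELTA_LIMIT
]

print(fixed_alerts)       # []  (the fixed rule never fires)
print(len(drift_alerts))  # 4   (every week-over-week jump is flagged)
```

A month of steady degradation passes every hand-written rule; a baseline comparison flags it on the first jump.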
The problem
Before vs after
Before

Drift in transformed data goes undetected

A source table changes upstream. Your dbt model transforms it successfully — no compilation errors. But the output has a new null pattern, a type change, or a column that disappeared. dbt tests only catch what you wrote rules for.

dbt run ✓ → model output → downstream consumers
[ type changed in output — no alert ]
[ ML model training on bad features ]
After

Drifted output never reaches downstream consumers

DataScreenIQ screens the model output right after dbt run. A BLOCK fails the step instantly, so downstream jobs and consumers never see the drifted data, and the issue is alerted for review.

dbt run ✓ → screen output → PASS → downstream ✓
                       → BLOCK → alert + stop
Quick start

Add this after dbt run

python
import datascreeniq as dsiq
import pandas as pd

# conn is your existing warehouse connection (a DB-API connection or SQLAlchemy engine)
df = pd.read_sql("SELECT * FROM analytics.fct_orders LIMIT 50000", conn)
report = dsiq.Client().screen_dataframe(df, source="fct_orders")
report.raise_on_block()  # raises if data has drifted critically
Run this after every dbt run. Your model output is now screened before downstream consumers see it.
Setup guide
Get running in minutes

Install the SDK, drop in the integration, get PASS / WARN / BLOCK on every run.

01

Install the SDK

Add datascreeniq to your dbt project's Python environment.

02

Set your API key

Export DATASCREENIQ_API_KEY in your environment or add it to your secrets manager.
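If you load the key in Python, failing fast when it is missing gives a clearer error than an auth failure mid-pipeline. A small sketch; the helper name is ours, not part of the SDK:

```python
import os

def require_api_key() -> str:
    # Fail fast with a clear message instead of an auth error later in the run.
    key = os.environ.get("DATASCREENIQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DATASCREENIQ_API_KEY is not set; export it or add it to your secrets manager."
        )
    return key
```

Call this once at the top of your post-run script so misconfigured CI fails immediately.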

03

Add the post-run hook script

Create a Python script that reads model output from your warehouse and screens it. Run it after dbt run in your CI or orchestration step.

04

Set per-model thresholds

Use source="model_name" to track baselines per model. Set tighter thresholds for critical models like fct_revenue or dim_customers.
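One simple way to keep tighter limits for critical models is a small per-model config consulted before calling the SDK. A sketch under our own assumptions: the threshold key below is illustrative, not an SDK parameter.

```python
# Shared default for ordinary models; tighter overrides for critical ones.
DEFAULT_THRESHOLDS = {"max_null_rate_jump": 0.05}
PER_MODEL_THRESHOLDS = {
    "fct_revenue":   {"max_null_rate_jump": 0.01},  # critical: tightest limit
    "dim_customers": {"max_null_rate_jump": 0.02},
}

def thresholds_for(source: str) -> dict:
    # Merge: per-model values win, everything else falls back to the default.
    return {**DEFAULT_THRESHOLDS, **PER_MODEL_THRESHOLDS.get(source, {})}

print(thresholds_for("fct_revenue"))  # {'max_null_rate_jump': 0.01}
```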

scripts/dbt_quality_check.py
import datascreeniq as dsiq
import pandas as pd
from datascreeniq.exceptions import DataQualityError

# Models to screen after dbt run
MODELS = [
    ("analytics.fct_orders", "fct_orders"),
    ("analytics.fct_revenue", "fct_revenue"),
    ("analytics.dim_customers", "dim_customers"),
]

client = dsiq.Client()

for table, source in MODELS:
    # conn is your warehouse connection; alert_team is your own notification hook
    df = pd.read_sql(f"SELECT * FROM {table} LIMIT 50000", conn)
    try:
        report = client.screen_dataframe(df, source=source)
        print(f"{source}: {report.summary()}")
        report.raise_on_block()
    except DataQualityError as e:
        alert_team(f"dbt model {source} failed quality gate: {e}")
        raise  # re-raise to fail the CI step
Makefile / CI step
dbt run --select +fct_orders
python scripts/dbt_quality_check.py  # runs after dbt
Drift detection: After the first run, DataScreenIQ baselines each model's schema. Subsequent runs compare against that baseline — catching field additions, removals, type changes, and null rate spikes automatically.
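Conceptually, the schema side of that comparison looks like this minimal sketch (plain Python, not the SDK's internals): store the first run's column-to-type mapping as the baseline, then diff every later snapshot against it. Null-rate baselining works the same way, with numbers instead of type names.

```python
def diff_schema(baseline: dict, current: dict) -> dict:
    """Compare a current schema snapshot against a baseline.

    Schemas are {column_name: type_name} mappings; returns the three
    drift classes described above: added, removed, and retyped columns.
    """
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "type_changed": sorted(
            col for col in set(baseline) & set(current)
            if baseline[col] != current[col]
        ),
    }

baseline = {"order_id": "int", "amount": "float", "status": "str"}
current  = {"order_id": "int", "amount": "str", "channel": "str"}

print(diff_schema(baseline, current))
# {'added': ['channel'], 'removed': ['status'], 'type_changed': ['amount']}
```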
How it works
Every batch returns a verdict

DataScreenIQ runs 18 quality checks in a single pass — null rates, type mismatches, schema drift, outliers, duplicate rates, and more. The result is one of three verdicts.

PASS

Data is clean

All checks within thresholds. Pipeline proceeds to load. No action needed.

WARN

Issues detected

Quality degraded but above BLOCK threshold. Load proceeds, issue flagged for review.

BLOCK

Pipeline stopped

Critical quality issue detected. Row load prevented. Dead-letter queue or alert triggered.
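The three verdicts above map naturally onto a two-threshold rule. A toy sketch of that decision logic; the severity score and the 0.3/0.7 cutoffs are our illustration, not the service's actual internals:

```python
def verdict(worst_severity: float, warn_at: float = 0.3, block_at: float = 0.7) -> str:
    """Map the worst check severity (0.0 = clean, 1.0 = critical) to a verdict."""
    if worst_severity >= block_at:
        return "BLOCK"  # critical issue: stop the pipeline, alert
    if worst_severity >= warn_at:
        return "WARN"   # degraded: load proceeds, flagged for review
    return "PASS"       # all checks within thresholds

print(verdict(0.1), verdict(0.5), verdict(0.9))  # PASS WARN BLOCK
```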

More integrations
Works with your whole stack

DataScreenIQ drops into any pipeline that can make an HTTP call.

Start screening in minutes

Free tier: 500K rows/month. No credit card. API key in 30 seconds.

Get a free API key →