GitHub Action

Your PR passed review.
But the data file is already corrupt.

A CSV with a type mismatch, a null spike, a new unexpected column — it merges, it deploys, it breaks production. Block bad data before it reaches main.

Free tier · 500K rows/month · no credit card required

Screens CSV, JSON, and Excel files in CI — returns PASS/WARN/BLOCK before any merge is allowed.

Common CI/CD data failures

  • A data file merges with a type mismatch — production breaks after deploy
  • Seed data has null spikes nobody caught in review
  • Schema changed in a CSV — downstream models fail silently
  • A new column appeared — pipeline breaks on the next run
  • Bad data in a fixture gets into the test database and corrupts results
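The first failure mode is easy to reproduce: one stray non-numeric value in a numeric column reads fine in a diff but breaks the loader. A minimal stdlib illustration (the column names here are made up):

```python
import csv
import io

# A tiny CSV where one "amount" value is a stray string.
raw = "order_id,amount\n1001,19.99\n1002,N/A\n"

errors = []
for row in csv.DictReader(io.StringIO(raw)):
    try:
        float(row["amount"])            # the load step assumes numeric data
    except ValueError:
        errors.append(row["order_id"])  # 1002 passes review, fails at load

print(errors)  # → ['1002']
```

Human review sees a two-line diff; only the load step sees the type mismatch.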
The problem
Before vs after
Before

Bad data files merge silently

A CSV file gets updated in a PR. The data has a type mismatch, a null spike, or a new column nobody noticed. It merges. It seeds the database. The dashboard breaks in production.

PR opened → review → merge → deploy
[ bad data in main branch ]
[ production broken after deploy ]
After

Bad data never reaches main

DataScreenIQ screens the changed data files in the PR. A BLOCK fails the status check instantly, so the PR cannot merge until the data is fixed.

PR opened → data quality check → PASS → merge ✓
                               → BLOCK → PR blocked
Quick start

Add this to your workflow

- name: Screen data files
  env:
    DATASCREENIQ_API_KEY: ${{ secrets.DATASCREENIQ_API_KEY }}
  run: |
    pip install datascreeniq -q
    python -c "
import datascreeniq as dsiq
report = dsiq.Client().screen_file('data/orders.csv', source='orders')
report.raise_on_block()
    "
Non-zero exit blocks the merge. Bad data never reaches main.
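Conceptually, `raise_on_block()` turns a BLOCK verdict into an exception, and an uncaught exception ends the Python process with a non-zero exit status, which GitHub treats as a failed check. A toy model of that contract (not the real SDK):

```python
class DataQualityError(Exception):
    """Raised when a screened file gets a BLOCK verdict."""

class Report:
    def __init__(self, verdict: str):
        self.verdict = verdict

    def raise_on_block(self) -> None:
        # Uncaught, this exception exits with code 1, which fails
        # the GitHub status check and blocks the merge.
        if self.verdict == "BLOCK":
            raise DataQualityError("critical quality issue")

Report("PASS").raise_on_block()       # returns quietly, check stays green
try:
    Report("BLOCK").raise_on_block()  # in CI this would go uncaught
except DataQualityError as e:
    print(f"BLOCK: {e}")
```

PASS and WARN return normally; only BLOCK raises, so only BLOCK fails the check.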
Setup guide
Get running in minutes

Install the SDK, drop in the integration, get PASS / WARN / BLOCK on every run.

01

Add your API key as a secret

In your GitHub repo: Settings → Secrets and variables → Actions → New repository secret. Name it DATASCREENIQ_API_KEY.

02

Create the workflow file

Add .github/workflows/data-quality.yml to your repo with the configuration below.

03

Add the check script

Add scripts/check_data_quality.py to your repo. This runs the screen call against your data files.

04

Open a PR — see the check run

Every pull request now shows a data quality status check. BLOCK = PR cannot merge until fixed.

.github/workflows/data-quality.yml
name: Data Quality Gate

on: [pull_request]

jobs:
  screen:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install DataScreenIQ
        run: pip install datascreeniq
      - name: Screen data files
        env:
          DATASCREENIQ_API_KEY: ${{ secrets.DATASCREENIQ_API_KEY }}
        run: python scripts/check_data_quality.py
scripts/check_data_quality.py
import sys
from pathlib import Path

import datascreeniq as dsiq
from datascreeniq.exceptions import DataQualityError

client = dsiq.Client()  # reads DATASCREENIQ_API_KEY from the environment

# Screen all CSV/JSON files in the data/ directory
data_files = list(Path("data").glob("**/*.csv")) + list(Path("data").glob("**/*.json"))

for f in data_files:
    try:
        report = client.screen_file(f, source=f.stem)
        print(f"{f.name}: {report.summary()}")
        report.raise_on_block()  # raises DataQualityError on a BLOCK verdict
    except DataQualityError as e:
        print(f"BLOCKED {f.name}: {e}")
        sys.exit(1)  # non-zero exit = GitHub check fails = PR blocked

print("All data files passed quality checks.")
Branch protection: To make the check required, go to Settings → Branches → Branch protection rules → Require status checks to pass before merging → select screen.
How it works
Every batch returns a verdict

DataScreenIQ runs 18 quality checks in a single pass — null rates, type mismatches, schema drift, outliers, duplicate rates, and more. The result is one of three verdicts.

PASS

Data is clean

All checks within thresholds. The status check passes and the merge proceeds. No action needed.

WARN

Issues detected

Quality degraded but above the BLOCK threshold. The check passes; the issue is flagged for review.

BLOCK

Pipeline stopped

Critical quality issue detected. The check fails and the PR is blocked until the data is fixed.
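The three-verdict scheme amounts to a pair of thresholds per metric. A sketch of the mapping for a single null-rate check (the threshold values are illustrative, not DataScreenIQ's actual defaults):

```python
def null_rate_verdict(null_rate: float,
                      warn_at: float = 0.05,
                      block_at: float = 0.20) -> str:
    """Map a column's null rate to PASS / WARN / BLOCK."""
    if null_rate >= block_at:
        return "BLOCK"   # critical: fail the check
    if null_rate >= warn_at:
        return "WARN"    # degraded: proceed, but flag for review
    return "PASS"        # within thresholds

values = [10, None, 12, None, 14, 15, 16, 17, 18, 19]
rate = sum(v is None for v in values) / len(values)
print(rate, null_rate_verdict(rate))  # → 0.2 BLOCK
```

Each of the 18 checks contributes a verdict like this; the worst one wins.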

More integrations
Works with your whole stack

DataScreenIQ drops into any pipeline that can make an HTTP call.
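For stacks without the Python SDK, the gate is a plain HTTP call. A stdlib sketch that builds (but does not send) such a request; the endpoint URL, path, and payload shape are assumptions for illustration, so check the API reference for the real ones:

```python
import json
import urllib.request

API_KEY = "your-api-key"  # in CI, read from the DATASCREENIQ_API_KEY secret

payload = json.dumps({
    "source": "orders",
    "rows": [{"order_id": 1001, "amount": 19.99}],
}).encode()

req = urllib.request.Request(
    "https://api.datascreeniq.example/v1/screen",  # hypothetical endpoint
    data=payload,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would return the PASS/WARN/BLOCK report.
print(req.method, req.full_url)
```

The same request can be made from a shell step with curl, a Go service, or anywhere else an HTTP client runs.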

Start screening in minutes

Free tier: 500K rows/month. No credit card. API key in 30 seconds.

Get a free API key →