GitHub Action

Your PR passed review.
But the data file is already corrupt.

A CSV with a type mismatch, a null spike, a new unexpected column — it merges, it deploys, it breaks production. Block bad data before it reaches main.

Free tier · 500K rows/month · no credit card required

Screens CSV, JSON, and Excel files in CI — returns PASS/WARN/BLOCK before any merge is allowed.

Common CI/CD data failures

  • A data file merges with a type mismatch — production breaks after deploy
  • Seed data has null spikes nobody caught in review
  • Schema changed in a CSV — downstream models fail silently
  • A new column appeared — pipeline breaks on the next run
  • Bad data in a fixture gets into the test database and corrupts results
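The first failure mode is easy to reproduce: one stray non-numeric value in a numeric column reads fine in a diff but breaks the loader. A minimal stdlib illustration (the column names here are made up):

```python
import csv
import io

# A tiny CSV where one "amount" value is a stray string.
raw = "order_id,amount\n1001,19.99\n1002,N/A\n"

errors = []
for row in csv.DictReader(io.StringIO(raw)):
    try:
        float(row["amount"])            # the load step assumes numeric data
    except ValueError:
        errors.append(row["order_id"])  # 1002 passes review, fails at load

print(errors)  # → ['1002']
```

Human review sees a two-line diff; only the load step sees the type mismatch.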
The problem
Before vs after
Before

Bad data files merge silently

A CSV file gets updated in a PR. The data has a type mismatch, a null spike, or a new column nobody noticed. It merges. It seeds the database. The dashboard breaks in production.

PR opened → review → merge → deploy
[ bad data in main branch ]
[ production broken after deploy ]
After

Bad data never reaches main

DataScreenIQ screens the changed data files in the PR. A BLOCK fails the status check instantly, so the PR cannot merge until the data is fixed.

PR opened → data quality check → PASS → merge ✓
                               → BLOCK → PR blocked
Quick start

Add this to your workflow

- name: Screen data files
  env:
    DATASCREENIQ_API_KEY: ${{ secrets.DATASCREENIQ_API_KEY }}
  run: |
    pip install datascreeniq -q
    python -c "
import datascreeniq as dsiq
report = dsiq.Client().screen_file('data/orders.csv', source='orders')
report.raise_on_block()
    "
Non-zero exit blocks the merge. Bad data never reaches main.
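Conceptually, `raise_on_block()` turns a BLOCK verdict into an exception, and an uncaught exception ends the Python process with a non-zero exit status, which GitHub treats as a failed check. A toy model of that contract (not the real SDK):

```python
class DataQualityError(Exception):
    """Raised when a screened file gets a BLOCK verdict."""

class Report:
    def __init__(self, verdict: str):
        self.verdict = verdict

    def raise_on_block(self) -> None:
        # Uncaught, this exception exits with code 1, which fails
        # the GitHub status check and blocks the merge.
        if self.verdict == "BLOCK":
            raise DataQualityError("critical quality issue")

Report("PASS").raise_on_block()       # returns quietly, check stays green
try:
    Report("BLOCK").raise_on_block()  # in CI this would go uncaught
except DataQualityError as e:
    print(f"BLOCK: {e}")
```

PASS and WARN return normally; only BLOCK raises, so only BLOCK fails the check.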
Setup guide
Get running in minutes

Install the SDK, drop in the integration, get PASS / WARN / BLOCK on every run.

01

Add your API key as a secret

In your GitHub repo: Settings → Secrets and variables → Actions → New repository secret. Name it DATASCREENIQ_API_KEY.

02

Create the workflow file

Add .github/workflows/data-quality.yml to your repo with the configuration below.

03

Add the check script

Add scripts/check_data_quality.py to your repo. This runs the screen call against your data files.

04

Open a PR — see the check run

Every pull request now shows a data quality status check. BLOCK = PR cannot merge until fixed.

.github/workflows/data-quality.yml
name: Data Quality Gate

on: [pull_request]

jobs:
  screen:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install DataScreenIQ
        run: pip install datascreeniq
      - name: Screen data files
        env:
          DATASCREENIQ_API_KEY: ${{ secrets.DATASCREENIQ_API_KEY }}
        run: python scripts/check_data_quality.py
scripts/check_data_quality.py
import sys
from pathlib import Path

import datascreeniq as dsiq
from datascreeniq.exceptions import DataQualityError

client = dsiq.Client()  # reads DATASCREENIQ_API_KEY from the environment

# Screen all CSV/JSON files in the data/ directory
data_files = list(Path("data").glob("**/*.csv")) + list(Path("data").glob("**/*.json"))

for f in data_files:
    try:
        report = client.screen_file(f, source=f.stem)
        print(f"{f.name}: {report.summary()}")
        report.raise_on_block()  # raises DataQualityError on a BLOCK verdict
    except DataQualityError as e:
        print(f"BLOCKED {f.name}: {e}")
        sys.exit(1)  # non-zero exit = GitHub check fails = PR blocked

print("All data files passed quality checks.")
Branch protection: To make the check required, go to Settings → Branches → Branch protection rules → Require status checks to pass before merging → select screen.
How it works
Every batch returns a verdict

DataScreenIQ runs 18 quality checks in a single pass — null rates, type mismatches, schema drift, outliers, duplicate rates, and more. The result is one of three verdicts.

PASS

Data is clean

All checks within thresholds. The status check passes and the merge proceeds. No action needed.

WARN

Issues detected

Quality degraded but above the BLOCK threshold. The check passes; the issue is flagged for review.

BLOCK

Pipeline stopped

Critical quality issue detected. The check fails and the PR is blocked until the data is fixed.
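The three-verdict scheme amounts to a pair of thresholds per metric. A sketch of the mapping for a single null-rate check (the threshold values are illustrative, not DataScreenIQ's actual defaults):

```python
def null_rate_verdict(null_rate: float,
                      warn_at: float = 0.05,
                      block_at: float = 0.20) -> str:
    """Map a column's null rate to PASS / WARN / BLOCK."""
    if null_rate >= block_at:
        return "BLOCK"   # critical: fail the check
    if null_rate >= warn_at:
        return "WARN"    # degraded: proceed, but flag for review
    return "PASS"        # within thresholds

values = [10, None, 12, None, 14, 15, 16, 17, 18, 19]
rate = sum(v is None for v in values) / len(values)
print(rate, null_rate_verdict(rate))  # → 0.2 BLOCK
```

Each of the 18 checks contributes a verdict like this; the worst one wins.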

More integrations
Works with your whole stack

DataScreenIQ drops into any pipeline that can make an HTTP call.
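For stacks without the Python SDK, the gate is a plain HTTP call. A stdlib sketch that builds (but does not send) such a request; the endpoint URL, path, and payload shape are assumptions for illustration, so check the API reference for the real ones:

```python
import json
import urllib.request

API_KEY = "your-api-key"  # in CI, read from the DATASCREENIQ_API_KEY secret

payload = json.dumps({
    "source": "orders",
    "rows": [{"order_id": 1001, "amount": 19.99}],
}).encode()

req = urllib.request.Request(
    "https://api.datascreeniq.example/v1/screen",  # hypothetical endpoint
    data=payload,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would return the PASS/WARN/BLOCK report.
print(req.method, req.full_url)
```

The same request can be made from a shell step with curl, a Go service, or anywhere else an HTTP client runs.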

Start screening in minutes

Free tier: 500K rows/month. No credit card. API key in 30 seconds.

Get a free API key →