A CSV with a type mismatch, a null spike, or an unexpected new column: it merges, it deploys, it breaks production. Block bad data before it reaches main.
Screens CSV, JSON, and Excel files in CI — returns PASS/WARN/BLOCK before any merge is allowed.
A CSV file gets updated in a PR. The data has a type mismatch, a null spike, or a new column nobody noticed. It merges. It seeds the database. The dashboard breaks in production.
DataScreenIQ screens the payload before storage. A BLOCK stops the pipeline instantly — bad rows go to a dead-letter queue, not your database.
- name: Screen data files
  env:
    DATASCREENIQ_API_KEY: ${{ secrets.DATASCREENIQ_API_KEY }}
  run: |
    pip install datascreeniq -q
    python -c "
    import datascreeniq as dsiq
    report = dsiq.Client().screen_file('data/orders.csv', source='orders')
    report.raise_on_block()
    "
Install the SDK, drop in the integration, get PASS / WARN / BLOCK on every run.
In your GitHub repo: Settings → Secrets and variables → Actions → New repository secret. Name it DATASCREENIQ_API_KEY. (With the GitHub CLI: gh secret set DATASCREENIQ_API_KEY.)
Add .github/workflows/data-quality.yml to your repo with the configuration below.
Add scripts/check_data_quality.py to your repo. This runs the screen call against your data files.
Every pull request now shows a data quality status check. BLOCK = PR cannot merge until fixed. To enforce this, mark the check as required in the repository's branch protection rules.
name: Data Quality Gate
on: [pull_request]
jobs:
  screen:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install DataScreenIQ
        run: pip install datascreeniq
      - name: Screen data files
        env:
          DATASCREENIQ_API_KEY: ${{ secrets.DATASCREENIQ_API_KEY }}
        run: python scripts/check_data_quality.py
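If most pull requests never touch data files, the workflow can be limited to the ones that do. A minimal trigger filter using standard GitHub Actions `paths` syntax (adjust the glob to your repo layout):

```yaml
# Run the gate only when files under data/ change in the PR
on:
  pull_request:
    paths:
      - "data/**"
```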
import sys
from pathlib import Path

import datascreeniq as dsiq
from datascreeniq.exceptions import DataQualityError

client = dsiq.Client()  # reads DATASCREENIQ_API_KEY from the environment

# Screen all CSV/JSON files in the data/ directory
data_files = list(Path("data").glob("**/*.csv")) + list(Path("data").glob("**/*.json"))

for f in data_files:
    try:
        report = client.screen_file(f, source=f.stem)
        print(f"{f.name}: {report.summary()}")
        report.raise_on_block()  # raises DataQualityError on a BLOCK verdict
    except DataQualityError as e:
        print(f"BLOCKED {f.name}: {e}")
        sys.exit(1)  # non-zero exit = GitHub check fails = PR blocked

print("All data files passed quality checks.")
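On large repos, screening every file under data/ on every PR can be slow. One option is to screen only the files the PR actually changed. A sketch under that assumption; the helpers below are illustrative, not part of the SDK:

```python
import subprocess
from pathlib import Path


def changed_data_files(names):
    """Keep only CSV/JSON paths under data/ from a list of changed file names."""
    return [
        Path(n) for n in names
        if n.startswith("data/") and n.endswith((".csv", ".json"))
    ]


def pr_changed_files(base_ref="origin/main"):
    """List files changed relative to the base branch (needs git history in CI)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base_ref, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()
```

Feed the result of `changed_data_files(pr_changed_files())` into the screening loop above in place of the directory glob. Note that `actions/checkout` fetches a shallow clone by default, so the diff against the base branch may require `fetch-depth: 0`.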
DataScreenIQ runs 18 quality checks in a single pass: null rates, type mismatches, schema drift, outliers, duplicate rates, and more. The result is one of three verdicts.
PASS: All checks within thresholds. Pipeline proceeds to load. No action needed.
WARN: Quality degraded but above the BLOCK threshold. Load proceeds; the issue is flagged for review.
BLOCK: Critical quality issue detected. Row load prevented. Dead-letter queue or alert triggered.
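Downstream handling of the three verdicts can be centralized in a single dispatcher. A minimal sketch; the verdict string and the action names are assumptions for illustration, not the documented SDK surface:

```python
def handle_verdict(verdict: str, payload_id: str) -> str:
    """Map a screening verdict to a pipeline action (illustrative names)."""
    if verdict == "PASS":
        return f"load:{payload_id}"        # proceed to load, no action needed
    if verdict == "WARN":
        return f"load+flag:{payload_id}"   # load, but flag the issue for review
    if verdict == "BLOCK":
        return f"dead-letter:{payload_id}" # divert rows to the dead-letter queue
    raise ValueError(f"unknown verdict: {verdict!r}")
```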
DataScreenIQ drops into any pipeline that can make an HTTP call.
Quality gate between extract and load.
Catch schema drift in transformed data.
Quality gate flow with alerting on BLOCK.
Try DataScreenIQ in 60 seconds.
Free tier: 500K rows/month. No credit card. API key in 30 seconds.
Get a free API key →