Checking Data Quality with SQL — a Configurable Framework for Spotting Bad Data Generically

Bad data gives no warning. An age of 200 years, a duplicate customer number, a country code that doesn’t exist — in the source system nobody notices. Only when the ETL run tries to push the rows into the strictly modelled target layer does the load break: on a CHECK, on a UNIQUE index, on a foreign key. Checking … Read more

Design Pattern // The Architecture of an ETL Process — How to Isolate Bad Data Cleanly

A single date string that cannot be parsed, and the entire ETL run aborts. The design pattern for ETL process architecture presented here prevents exactly that: bad data is isolated, not passed along. TL;DR — what this article covers: Prerequisite. Basic familiarity with ETL processes. This is a conceptual article — not a step-by-step tutorial. Root of … Read more

Data Quality in an ETL Process — Catching Technical and Business Errors Before They Reach the Target System

A single value that cannot be converted — a date in the wrong format, a number with the wrong decimal separator — and the entire ETL run aborts. Data quality in an ETL process means catching such errors proactively: identifying, logging and isolating them before they reach the target system. This article is the entry point to … Read more

Design Pattern // Logging an ETL Process with T-SQL — How to Capture Run, Component and Action in Evaluable Log Tables

An ETL process finishes without an exception — but was everything really loaded that should have been? The mere fact that a process did not abort says nothing about whether it actually did what was expected of it. A readable, evaluable log is what turns a gut feeling into a defensible statement. This design pattern … Read more