A single value that won’t convert — a 25.5 in an integer column, an empty string, a date like 20240230 — and the ETL run aborts mid-import. Anyone who loads text data from upstream systems knows it: the delivery doesn’t honour the agreed interface, and a bare CONVERT throws an exception instead of cleanly logging the offending value.
This article describes a design pattern for safe type conversion: an approach that makes every conversion error individually identifiable without aborting the ETL process. It rests on three paradigms, derived in the next section.
What you’ll learn here:
- Materialization — why the intermediate results belong in a persisted table, so the result stays inspectable at any time: after an abort as well as after a successful run.
- Error identification — how a single
WHEREclause finds every failed value instead of killing the run. - Data-type subtleties — why
TRY_CONVERT(int, '')returns0and when that is functionally wrong. - Reusable UDFs —
fn_try_convert_*with empty-string-→-NULLhandling as a building block per target type.
Prerequisite: SQL Server / T-SQL and an ETL context where text data must be moved into typed columns. For the plain conversion functions CAST, CONVERT, TRY_CAST and TRY_CONVERT, see Type Conversion Basics with T-SQL — this article builds the pattern on top of them.
Contents
- The Three Paradigms
- Input and Output Values
- Materializing the Extracted Data
- Identifying Conversion Errors
- Conversion by Target Data Type
- Reusable Conversion Functions
- Critical Appraisal of the Approach
- FAQ
- Related Articles
The Three Paradigms
The approach rests on three paradigms:
NULLinstead of abort. The conversion function returnsNULLwhen the input value cannot be converted to the target data type — no runtime error, no ETL abort.- Materialize input and output value. The ETL process stores both the input value (text) and the converted output value in one table.
- Identify errors by comparison. Comparing the input and output value finds every failed value with a simple
WHEREclause.
SQL Server delivers the first paradigm out of the box with TRY_CONVERT: the function returns NULL on failure instead of throwing an exception. Applying it alone, however, does not guarantee a functionally correct conversion — for that you need to know the specifics of each target data type (see below). For some data types TRY_CONVERT can’t be applied sensibly at all: a yes/no value, for instance, arrives as text (J, N, Y, Yes, No, …), and date values often need pre-processing too. For these cases you write user-defined functions that satisfy the first paradigm (NULL on failure) — the section Reusable Conversion Functions shows them.
Robust, safe type conversion matters especially in data migration projects, where the data to be processed is delivered as files (Excel, CSV, XML, JSON, …).
Input and Output Values
Input values are data that has been extracted and stored in a database, in tables and columns of type nvarchar. Output values are data converted from the input values into the target data types. For every input value to be processed there is also an output value.
Materializing the Extracted Data
A pure in-memory approach tempts you not to persist intermediate results at all: extraction, type conversion and error identification then run in a single processing flow in memory (SSIS with its Control Flow is a well-known example). When values can’t be converted, extensive error handling is needed mid-flow — and if a record contains several errors, often only the first is handled and logged. Faulty records end up, at best, in a text file nobody ever looks at. As powerful as such tools are: comprehensive error handling for type conversion is rarely implemented consistently in practice.
It is better to separate the steps strictly and materialize the intermediate results in a database — regardless of which tool does the processing. The decisive gain is the persistence itself: the conversion result stays inspectable at any time — not only after an abort, but also after a successful run. You can look inside at any moment, trace individual errors and analyze their causes. That is exactly what the acronym ETL stands for: the data is first extracted into a database, then converted robustly, and in the last step error-free data is identified and processed further:
This figure shows an ETL process in which a separate database schema is created for each task to be performed:
| Schema | Meaning |
|---|---|
| E0 | Storing XML and JSON documents in the database |
| E1 | Extracting the values from the text files |
| T1 | Type conversion of the extracted values |
| T2 | Historization of error-free converted records |
| L1 | Structural transformation toward the target system |
| L2 | Storage of error-free, structurally transformed data |
The full derivation of this schema layering — from extraction (E0/E1) through transformation to loading (L1/L2) — is given in Design Pattern // The Architecture of an ETL Process. This article deepens the type-conversion step from schema E1 to T1 — so let’s focus on those two.
Schema E1
Extracted data is stored in tables of schema E1 in columns of type nvarchar. The length of the text fields should, if necessary, not be restricted; it must in any case be chosen so that all data can be extracted in full. Extraction can then only fail because of infrastructure problems. The extracted data is also called the input values.
Schema T1
For every table in schema E1 there is a table of the same name in schema T1. In the T1 tables the input-value columns are carried over with type nvarchar, and a second column is added per input value — this time with the target system’s data type. For pragmatic reasons these column pairs share the same column name, with the columns holding the input value receiving the suffix _E1. An example…
1: CREATE TABLE [T1].[Table]
2: (
3: [Id] int IDENTITY(1,1) NOT NULL
4: ,[PK_E1] nvarchar(256) NULL
5: ,[PK] int NULL
6: ,[Text_E1] nvarchar(256) NULL
7: ,[Text] nvarchar(3) NULL
8: ,[Integer_E1] nvarchar(256) NULL
9: ,[Integer] int NULL
10: ,[Date_E1] nvarchar(256) NULL
11: ,[Date] datetime NULL
12: );
In the table [T1].[Table] all columns except [Id] are declared nullable. This lets both the input values from an [E1].[Table] and the converted output values be stored — even in the presence of conversion problems. The prerequisite, however, is that all conversion functions return NULL on failure.
Identifying Conversion Errors
Let’s look at a data example for the [T1].[Table] above.
| Id | PK_E1 | PK | Text_E1 | Text | Integer_E1 | Integer | Date_E1 | Date |
|---|---|---|---|---|---|---|---|---|
| 1 | 1023 | 1023 | S01 | S01 | 25.5 | NULL | 20240218 | 2024-02-18 |
| 2 | 1024 | 1024 | S022 | S02 | 87 | 87 | 20240230 | NULL |
| 3 | 1025X | NULL | S03 | S03 | 65 | 65 | 20240219 | 2024-02-19 |
Problems converting input values into the output value’s data type can be identified with a simple WHERE clause. For output values of the general type non-text, the following WHERE clauses find the type-conversion problems — the input value is populated, but the converted output value is NULL:
1: WHERE [PK_E1] IS NOT NULL AND [PK] IS NULL
2: WHERE [Integer_E1] IS NOT NULL AND [Integer] IS NULL
3: WHERE [Date_E1] IS NOT NULL AND [Date] IS NULL
For output values of the general type text, you can query for inequality to identify values that are too long (truncated) and therefore problematic:
1: WHERE [Text_E1] <> [Text]
Why both values are persisted — and what the alternative costs. The pattern deliberately stores both the input value (_E1, text) and the converted output value side by side. In theory you could drop the output column and keep only the input values — but then the check routines become more complex. It comes down to where the conversion happens:
- Conversion materialized (this approach): the conversion result is written into the output column. The check is then a simple column comparison on the input/output pair — and you see directly in the row which specific column the error is in (in the example above: row 1 at
Integer, row 2 atDate, row 3 atPK). - Conversion only in the check: the check routine applies the actual conversion/validation logic to the input values at runtime and logs or counts the errors. That works — but the table no longer shows where the error is: you learn how many errors a record has, but not, by simply looking at the row, in which column. This very lack of visibility has caused confusion in practice. This rule-based route — applying check rules generically via dynamic SQL — is described in detail in Checking Data Quality with SQL.
Conversion by Target Data Type
SQL Server provides functions for converting data into a target data type. Apply them without examining exactly how they work, and you’ll get surprises. On closer inspection it turns out, for instance, that converting an empty string yields the number 0:
1: SELECT TRY_CONVERT(int, N'') -- 0
2: SELECT TRY_CONVERT(int, N' ') -- 0
That can be functionally correct. From a database developer’s point of view, however, no value was delivered — the value is unknown, and therefore NULL would be the correct conversion result. There are a number of such subtleties to consider for safe type conversion.
The following linked articles derive, per data type, how an input value is converted safely and correctly into the output value’s data type. Safe type conversion is derived for the following data types:
| Data type | Range | Bytes |
|---|---|---|
| char | Fixed-length string, 1 byte per character | n |
| nchar | Fixed-length string, 2 bytes per character | 2 * n |
| varchar | Variable-length string, 1 byte per character | variable |
| nvarchar | Variable-length string, 2 bytes per character | variable |
| bigint | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | 8 |
| int | -2,147,483,648 to 2,147,483,647 | 4 |
| smallint | -32,768 to 32,767 | 2 |
| tinyint | 0 to 255 | 1 |
| numeric [(p [, s])] / decimal [(p [, s])] | -10^38 +1 to 10^38 -1. The two data types are functionally identical. p = total number of digits, s = digits after the decimal point | 5-17 |
| money / smallmoney | For precision and because of money‘s special behaviour in calculations, decimal is recommended instead. | 8 / 4 |
| float[n] | n = number of bits used to store the mantissa (1-53) | 8 |
| real | Synonym for float(24) | 4 |
| bit | 0 or 1 | 1 |
| date | 0001-01-01 to 9999-12-31 (no time) | 3 |
| datetime | 1753-01-01 to 9999-12-31, with time; fractional-second precision rounded to ~3.33 ms | 8 |
| datetime2(n) | 0001-01-01 to 9999-12-31, with time; n = fractional-second precision | 6-8 |
| time(n) | 00:00:00 to 23:59:59.9999999; n = fractional-second precision | 5 |
The following articles in this series derive safe type conversion depending on the output value’s data type.
Reusable Conversion Functions
The first paradigm — NULL instead of an exception for a non-convertible value — repeats for every target type. Rather than spelling it out in every SELECT, you encapsulate it in a user-defined function fn_try_convert_<type>. It handles two things TRY_CONVERT alone does not: mapping empty strings and pure whitespace input to NULL (instead of the 0 trap above) and — for floating-point numbers — normalizing comma decimal notation.
For the integer types, the integer representative looks like this (the siblings fn_try_convert_bigint, fn_try_convert_smallint and fn_try_convert_tinyint differ only in the target type):
1: CREATE FUNCTION [dbo].[fn_try_convert_int] (@p_input AS nvarchar(256))
2: RETURNS int
3: AS
4: BEGIN
5: DECLARE @normalized AS nvarchar(256);
6:
7: SET @normalized = LTRIM(RTRIM(@p_input));
8:
9: -- empty string/whitespace is an unknown value, not 0
10: IF @normalized = N'' RETURN NULL;
11:
12: RETURN TRY_CONVERT(int, @normalized);
13: END;
For the floating-point types, comma-to-dot normalization is added so that a value written with a comma (e.g. 25,5, common in German-language source data) converts correctly (fn_try_convert_real is identical except for the target type):
1: CREATE FUNCTION [dbo].[fn_try_convert_float] (@p_input AS nvarchar(256))
2: RETURNS float
3: AS
4: BEGIN
5: DECLARE @normalized AS nvarchar(256);
6:
7: -- comma decimal notation: comma to dot
8: SET @normalized = REPLACE(LTRIM(RTRIM(@p_input)), N',', N'.');
9:
10: IF @normalized = N'' RETURN NULL;
11:
12: RETURN TRY_CONVERT(float, @normalized);
13: END;
With these functions, the conversion in schema T1 becomes a simple, abort-safe expression — [dbo].[fn_try_convert_int]([Integer_E1]) returns the typed value or NULL, never an exception. The data-type-specific subtleties (ranges, rounding for decimal, J/N mapping for bit, date formats) are derived per type in the linked articles of this series.
Critical Appraisal of the Approach
This article has laid out the basic approach to safe type conversion. Implementing it in an ETL process looks laborious at first: SELECT statements that read data from the schema E1 tables and store it typed in the schema T1 tables can become complex for tables with many columns.
It therefore makes sense to solve this task once with a generic, metadata-driven procedure: a procedure that generates the conversion SELECT dynamically from the T1 table structures reduces the development effort per table to essentially one line of code. This very pattern — configurable bad-data detection via dynamic SQL — is described in Checking Data Quality with SQL.
This places the article clearly: it is the practical rule guide for conversion error checking. The architectural frame — the schema layering E0–L2 — is provided by The Architecture of an ETL Process; the generalization to arbitrary data-quality rules by the framework just mentioned. This article covers the piece in between: how to concretely detect errors at the type-conversion step.
FAQ
Why not just use TRY_CONVERT directly in the SELECT?
TRY_CONVERT alone has two pitfalls: an empty string becomes 0 instead of NULL, and on failure the information about which value was non-convertible is lost. The pattern solves both — the fn_try_convert_* function maps empty values to NULL, and the E1/T1 materialization keeps the original value next to the conversion result.
What’s the difference from the “Type Conversion Basics” article?
The basics article compares the functions themselves — CAST, CONVERT, TRY_CAST and TRY_CONVERT. This article builds the design pattern on top: the ETL approach of materialization, error identification via WHERE clause and reusable UDFs. The basics are the tool, this is the method.
How do I find all failed conversions?
With a WHERE clause on the column pair: for non-text types the conversion failed if the input value is populated but the output value is NULL ([x_E1] IS NOT NULL AND [x] IS NULL). For text types, inequality ([x_E1] <> [x]) reveals truncated values.
How do I automate the conversion across many columns and tables?
With a generic, metadata-driven procedure that generates the conversion SELECT dynamically from the table structures. The pattern is worked out in Checking Data Quality with SQL — there as configurable bad-data detection via dynamic SQL.
Does the pattern also apply to PostgreSQL?
Conceptually yes. The three paradigms are engine-neutral. Postgres has no TRY_CONVERT, but the same idea can be implemented with a PL/pgSQL function that wraps the cast in a BEGIN … EXCEPTION WHEN others THEN RETURN NULL block. Especially relevant for data migration to PostgreSQL.
Related Articles
Basics:
- Data Quality // Type Conversion Basics with T-SQL — CAST, CONVERT, TRY_CAST and TRY_CONVERT compared.
Safe conversion by data type:
- TRY_CONVERT for bigint, int, smallint and tinyint
- TRY_CONVERT for decimal and numeric
- TRY_CONVERT for money and smallmoney
- TRY_CONVERT for float and real
- TRY_CONVERT for date, datetime, datetime2 and time
- TRY_CONVERT for bit — converting yes/no values
In the ETL, data-quality and migration context:
- The Architecture of an ETL Process — the schema layering
E0–L2into which this pattern places theE1→T1conversion step. - Data Quality in an ETL Process — the higher-level view of data quality in the process.
- Checking Data Quality with SQL — generic bad-data detection via dynamic SQL.
- Data Migration: SQL Server to PostgreSQL — the context where safe type conversion matters most.