Design Pattern // Logging an ETL process with T-SQL

How can you judge whether an ETL process has been successful? The mere fact that a process was not terminated with an exception does not necessarily mean that the process did what was expected of it. Readable and evaluable logging of an ETL process can help here. It enables a reliable assessment of the success of an ETL process.

This article presents a design pattern for logging an ETL process based on stored procedures that insert and update log records in log tables. The aim is to create a readable and evaluable protocol that provides answers to essential questions about the execution of an ETL process:

  • How long does the ETL process take in total?
  • How long does a stored procedure, SSIS package, or other component take to execute?
  • How long does it take to execute a specific SQL statement?
  • How many records were processed by an SQL statement?

Of course, we also want to know whether the process as a whole, a procedure or even a specific SQL statement was executed successfully.

The procedure is actually straightforward: At the beginning of each action, a log data record is written and after the action is completed, it is updated with either the status success or failure and, if necessary, other helpful information. That’s it! But it can still be a little more…

Content

The Result
Let’s start…
      Three-tier logging
            [LL].[Execution]
            [LL].[Component]
            [LL].[Trace]
            Schema LL
      Table descriptions, declaration and data model
            [LL].[Execution]
            [LL].[Component]
            [LL].[Trace]
            [LL].[Error]
      ETL process execution status
      Stored procedures for logging
            [LL].[spInsertTrace]
            [LL].[spUpdateTrace]
            [LL].[spUpdateTraceSuccess]
      ExceptionHandling
Example of logging and exception handling in an ETL process
      [T2].[spETLProcess]
      [T2].[spDo­Something_2_2]
Summary
Related Posts
Download

The result

Even if the creation of a protocol itself is not or should not be a big deal, there is still a lot to write about. But let’s start with the result and put the cart before the horse. The process logging presented here is three-tier and uses the following log tables:

  • [LL].[Execution]
  • [LL].[Component]
  • [LL].[Trace]

In these tables, an ETL process and the associated components and work steps are logged with increasing granularity – read from top to bottom. The ETL process, the components and the individual action are logged with exactly one row.

Of course, the logging of errors is also closely linked to the creation of the logging of the process. The logging approach presented here uses the following table for logging errors:

  • [LL].[Error]

The following images show the result of the three-stage logging of a simple, compact but complete ETL process.

In the [LL].[Trace] table, each action is recorded, among other things, with the name of the procedure, the target entity that is processed by the procedure, a short description of what was specifically done and other information, such as the number of affected Records and duration of execution are logged.

Tabelle [LL].[Trace]

The [LL].[Component] table logs the calls to procedures and SSIS packages or – generally speaking – components. Here too, a short description of the component’s task, the target entity and the duration of execution are logged.

Tabelle [LL].[Component]

At the top level, each execution of the ETL process is logged in the [LL].[Execution] table with exactly one row.

Tabelle [LL].[Execution]

All process logging tables contain two columns [State] and [Success], which store the success or failure of the execution of an action, component, or ETL process.

Let’s start…

After the brief introduction of the result, the following aspects of the procedure need to be clarified:

Three-tier logging

The procedure involves logging an ETL process in the three tables [LL].[Execution], [LL].[Component] and [LL].[Trace]. Each of the tables is intended for logging specific artifacts. This section introduces the use of each table, its columns, and the code to create it.

[LL].[Execution]

This table records the execution of an ETL process with exactly one record. An ETL process always has an entry function. This can be a stored procedure, an SSIS package, a Talend job, or a SQL Server Agent job. At the beginning of execution of the entry function, a log record is inserted into this table. After all tasks have been successfully processed, this data record is updated with the status success, or in the event of an error with the status error. This table therefore contains an overview of all executions of the ETL process and provides information such as the status and duration of the execution.

[LL].[Component]

In this table, the execution of a component is logged with exactly one row. A component can be a stored procedure, an SSIS package, a Talend job. A component is characterized by the fact that it controls and executes one or more data manipulations or other components. As with the table [LL].[Execution], a log entry is inserted into this table at the beginning of execution and updated with the status success or error upon completion. This table therefore contains a list of the components executed for each execution of the ETL process and provides information such as the status and duration of the execution for each component.

[LL].[Trace]

The name of this table already indicates that the table is intended for a detailed logging of actions of the ETL process. Which actions are logged is a design decision by the developer. However, it is recommended to log at least every INSERT, UPDATE and DELETE statement with its own log entry. In addition to the status fields mentioned above, the table also contains a column that provides the number of affected rows of a data manipulation task, thus giving the developer the opportunity to assess whether the statements did exactly what was expected.

[LL].[Error]

Errors that were detected in an ETL process must of course be logged. A distinction must be made between the logging of exceptions and data errors. The structure of thi

s table is designed for the logging of data errors and contains columns in which each data errors can be logged completely, readably and evaluably. This article focuses on logging an ETL process rather than logging data errors. The logging of the process is closely linked to explicit exception handling. A separate section is dedicated to exception handling and exception logging.

Schema LL

The schema name LL stands for Logging Layer. This schema stores all the log tables and the stored procedures used for logging.

Table descriptions, declaration and data model

The following sections describe the process logging tables and the error logging table. Finally, there is a diagram with the data model for these tables.

[LL].[Execution]

Columns

The table contains the following columns:

ColumnDescription
IdThe [Id] column is declared as IDENTITY and identifies a log record in this table.
ProcessThe logging tables cannot be used by just one ETL process. So that the log data records of different ETL processes can be distinguished from one another, a unique name of the ETL process must be specified in the [Process] column.
Start
End
These two columns indicate the start time and the end of the execution of an ETL process. When the ETL process starts, the current timestamp is saved in the [Start] column. The [End] column is initialized with a NULL at startup. The last command of an ETL process must update the state of the log entry ([State] and [Success] columns). When updated, the current timestamp is saved in the [End] column. A NULL in the [End] column indicates either that the process is still running or that it has aborted with an exception. Times are always stored as UTC time.
DeltaStart
DeltaEnd
These columns determine the time window for a delta load. [DeltaStart] sets the lower time limit and [DeltaEnd] sets the upper one. The time window is automatically determined by the stored procedure [LL].[spInsertExecution], which inserts the log entry into this table.
User
Machine
Version
These columns must indicate the context in which the ETL process is executed. The user specified in the [User] column is the user under which the process is executed. This is crucial for authorizations to databases, tables, file shares, etc. The user is determined by automatically using the T-SQL function SUSER_SNAME() when inserting the log entry.
The [Machine] column stores the name of the computer on which the ETL process is running. The name is also determined automatically when the log entry is inserted using the T-SQL function HOST_NAME().
The [Version] column takes the version number of the ETL process and may be helpful for troubleshooting after an incorrect deployment of an ETL process.
StateThe [State] column displays the current status of the ETL process. Possible values include processing, warning, success or error. Initially the ETL process is started with [State] = processing. If successful, [State] is updated with warning or success. In case of an error, [State] is updated with error.
SuccessThe [Success] column indicates whether the ETL process completed successfully with [State] = warning or success. Initially the ETL process is started with [Success] = 0. If successful, [Success] is updated with 1.
CreatedOn
CreatedBy
ModifiedOn
ModifiedBy
The [CreatedOn] and [ModifiedOn] columns contain a UTC timestamp. Initially, [CreatedOn] is assigned the current timestamp via a default value and [ModifiedOn] is assigned a NULL. [ModifiedOn] is updated with the current timestamp when the status is updated. The same system applies to the fields [CreatedBy] and [ModifiedBy]. The user under which the process is executed is stored here.

Declaration

The table is created using the following statement:

CREATE TABLE [LL].[Execution] 
(
    [Id]           bigint         IDENTITY (1, 1) NOT NULL
   ,[Process]      nvarchar (max)                 NOT NULL
   ,[Start]        datetime                       NOT NULL
   ,[End]          datetime                           NULL
   ,[DeltaStart]   datetime                           NULL
   ,[DeltaEnd]     datetime                           NULL
   ,[User]         nvarchar(128)                      NULL
   ,[Machine]      nvarchar(128)                      NULL
   ,[Version]      int                                NULL
   ,[State]        nvarchar(128)                      NULL
   ,[Success]      bit                                NULL
   ,[CreatedOn]    datetime
       CONSTRAINT [DF_LL_Execution_CreatedOn] 
       DEFAULT (GETUTCDATE())                     NOT NULL
   ,[CreatedBy]    nvarchar(100)
       CONSTRAINT [DF_LL_Execution_CreatedBy]
       DEFAULT (SUSER_SNAME())                    NOT NULL
   ,[ModifiedOn]   datetime                           NULL
   ,[ModifiedBy]   nvarchar(100)                      NULL
   ,CONSTRAINT [PK_LL_Execution] 
       PRIMARY KEY CLUSTERED ([Id] ASC)
);
GO

CREATE TRIGGER [LL].[TR_LL_Execution_Update]
ON [LL].[Execution]
FOR UPDATE
AS
BEGIN
SET NOCOUNT ON
   UPDATE [LL].[Execution]
      SET
          [ModifiedOn] = GETUTCDATE()
         ,[ModifiedBy] = SUSER_SNAME()
   FROM
      [LL].[Execution]
      INNER JOIN inserted
      ON
        inserted.[ID] = [LL].[Execution].[ID];
END;
GO

[LL].[Component]

Columns

The table contains the following columns:

ColumnDescritpion
IdThe [Id] column is declared as IDENTITY and identifies a log record in this table.
ExecutionIdUsing this foreign key on the table [LL].[Execution], log entries that are related to a specific execution of the ETL process can be identified.
SourceThe text in this column indicates the type of component that wrote the log record. Values used are, for example, T-SQL or SSIS.
ComponentThis column holds the name of the component that wrote the entry. For procedures, this is the procedure name, for other components, for example, the file name of the component.
VersionIf a version number is available for the component, it can and should be logged in this column.
EntityThe name of the target table or – to put it more generally – the entity that is processed by the component must be specified in this column. The name can be freely chosen. It is recommended to choose the name systematically in order to facilitate a systematic evaluation of the protocol.
StepA short description of the component’s task must be provided in the column [Step]. This description should also follow a strict system to enable easy evaluation of the protocol.
FileIdThe [FileId] column is used for the processing of files by the component. It is not important for describing the procedure presented here.
DescriptionAn additional description of the component’s task can be inserted in this column.
StateThe [State] column displays the current status of the component’s execution. Possible values include processing, warning, success or error. Initially the component is started with [State] equal to processing. If successful, [State] is updated with warning or success. In case of an error, [State] is updated with error.
SuccessThe column [Success] indicates whether the component ended successfully with [State] equal to warning or success. Initially the component is started with [Success] equal to 0. If successful, [Success] is updated with 1.
CreatedOn
CreatedBy
ModifiedOn
ModifiedBy
See table [LL].[Execution].

Declaration

The table is created using the following statement:

CREATE TABLE [LL].[Component] 
(
    [Id]           bigint         IDENTITY (1, 1) NOT NULL
   ,[ExecutionId]  bigint                         NOT NULL
   ,[Source]       nvarchar (5)                   NOT NULL
   ,[Component]    nvarchar (128)                 NOT NULL
   ,[Version]      int                                NULL
   ,[Entity]       nvarchar (128)                 NOT NULL
   ,[Step]         nvarchar (MAX)                 NOT NULL
   ,[Description]  nvarchar (MAX)                     NULL
   ,[FileId]       bigint                             NULL
   ,[State]        nvarchar (128)                     NULL
   ,[Success]      bit                                NULL
   ,[CreatedOn]    datetime
       CONSTRAINT [DF_LL_Execution_CreatedOn] 
       DEFAULT (GETUTCDATE())                     NOT NULL
   ,[CreatedBy]    nvarchar (100) 
       CONSTRAINT [DF_LL_Execution_CreatedBy] 
       DEFAULT (SUSER_SNAME())                    NOT NULL
   ,[ModifiedOn]   datetime                           NULL
   ,[ModifiedBy]   nvarchar (100)                     NULL
   ,CONSTRAINT [PK_LL_Component] 
       PRIMARY KEY CLUSTERED ([Id] ASC),
   ,CONSTRAINT [FK_LL_Component_ExecutionId] 
       FOREIGN KEY ([ExecutionId]) 
       REFERENCES [LL].[Execution] ([Id])
);
GO

CREATE TRIGGER [LL].[TR_LL_Component_Update]
ON [LL].[Component]
FOR UPDATE
AS
BEGIN
SET NOCOUNT ON

   UPDATE [LL].[Component]
      SET
          [ModifiedOn] = GETUTCDATE()
         ,[ModifiedBy] = SUSER_SNAME()
   FROM
      [LL].[Component]
      INNER JOIN inserted
      ON
        inserted.[ID] = [LL].[Component].[ID];
END;
GO

[LL].[Trace]

Columns

The table contains the following columns:

ColumnDescription
IdThe [Id] column is declared as IDENTITY and identifies a log record in this table.
ExecutionIdUsing this foreign key to table [LL].[Execution], log entries that are related to a specific execution of the ETL process can be identified.
ComponentIdUsing this foreign key to table [LL].[Component], log entries that are related to a specific execution of a component can be identified.
SourceThe text in this column indicates the type of component that wrote the log record. Values used are, for example, T-SQL or SSIS.
ComponentThis column holds the name of the component that wrote the entry. For procedures, this is the procedure name, for other components, for example, the file name of the component.
TaskThe stored procedures can also be used for logging in an SSIS package or other type of component. This column should store the name of the task in the component this log entry is related to.
EntityThe name of the target table or – to put it more generally – the entity that is processed by the component must be specified in this column. The name can be freely chosen. It is recommended to choose the name systematically in order to facilitate a systematic evaluation of the protocol.
StepA short description of the component’s task must be provided in the column [Step]. This description should also follow a strict system to enable easy evaluation of the protocol.
DescriptionAn additional description of the component’s task can be inserted in this column.
FileIdColumn [FileId] is used for the processing of files by the component. It is not important for the description of the procedure presented here.
ActionIn this column, the action carried out must be sepcified with a keyword. Frequently used texts include insert, update, delete, copy, etc. The same applies here: a strict system in the use of keywords makes it easier to evaluate the protocol.
AffectedRowsIf the action carried out inserts, updates or deletes data, the number of affected rows must be saved here. The number of affected rows can be queried, for example, using the T-SQL system function @@ROWCOUNT.
State
Success
CreatedOn
CreatedBy
ModifiedOn
ModifiedBy
See table [LL].[Execution].

Declaration

The table is created using the following statement:

CREATE TABLE [LL].[Trace] 
(
    [Id]           bigint         IDENTITY (1, 1) NOT NULL
   ,[ExecutionId]  bigint                         NOT NULL
   ,[ComponentId]  bigint                         NOT NULL
   ,[Source]       nvarchar(5)                    NOT NULL
   ,[Component]    nvarchar(128)                  NOT NULL
   ,[Task]         nvarchar(128)                      NULL
   ,[Entity]       nvarchar(128)                      NULL
   ,[Step]         nvarchar(MAX)                  NOT NULL
   ,[Description]  nvarchar(MAX)                      NULL
   ,[FileId]       bigint                             NULL
   ,[Action]       nvarchar(100)                      NULL
   ,[AffectedRows] int                                NULL
   ,[State]        nvarchar(100)                  NOT NULL
   ,[Success]      bit                            NOT NULL
   ,[CreatedOn]    datetime
       CONSTRAINT [DF_LL_Trace_CreatedOn]
       DEFAULT (GETUTCDATE())                     NOT NULL
   ,[CreatedBy]    nvarchar(100)
       CONSTRAINT [DF_LL_Trace_CreatedBy]
       DEFAULT (SUSER_SNAME())                    NOT NULL
   ,[ModifiedOn]   datetime                           NULL
   ,[ModifiedBy]   nvarchar(128)                      NULL
   ,CONSTRAINT [PK_LL_Trace] 
       PRIMARY KEY CLUSTERED ([Id] ASC)
   ,CONSTRAINT [FK_LL_Trace_ComponentId]
       FOREIGN KEY ([ComponentId])
       REFERENCES [LL].[Component] ([Id])
   ,CONSTRAINT [FK_LL_Trace_ExecutionId]
       FOREIGN KEY ([ExecutionId])
       REFERENCES [LL].[Execution] ([Id])
);
GO

CREATE TRIGGER [LL].[TR_LL_Trace_Update]
ON [LL].[Trace]
FOR UPDATE
AS
BEGIN
SET NOCOUNT ON

   UPDATE [LL].[Trace]
      SET
          [ModifiedOn] = GETUTCDATE()
         ,[ModifiedBy] = SUSER_SNAME()
   FROM
      [LL].[Trace]
      INNER JOIN inserted
      ON
        inserted.[Id] = [LL].[Trace].[Id];
END;
GO

[LL].[Error]

As mentioned above, the structure of this table is designed for logging data errors, which is outside the scope of this article. However, the table is also used for logging exceptions. The description of the columns only takes into account those columns that are required for logging an exception.

Columns

ColumnDescription
IdThe [Id] column is declared as IDENTITY and identifies a log record in this table.
ExecutionIdUsing this foreign key to table [LL].[Execution], log entries that are related to a specific execution of the ETL process can be identified.
ComponentIdUsing this foreign key to table [LL].[Component], log entries that are related to a specific execution of a component can be identified. However, the foreign key is not implemented. This is to allow logging entries in this table in absence of a log entry in table [LL].[Component].
TraceIdUsing this foreign key to table [LL].[Trace], log entries that are related to a specific execution of a component can be identified. However, the foreign key is not implemented. This is to allow logging entries in this table in absence of a log entry in table [LL].[Trace].
ErrorTypeThe severity of the error is marked in this column. Possible values in this column are: E = Error, W = Warning and I = Information.
Source
Component
Task
Für diese Spalten gelten die gleichen Ausführung wie bei der Tabelle [LL].[Trace].
Step
SchemaName
TableName
FileId
ID1Value
ID1ColumnName
ID2Value
ID2ColumnName
ID3Value
ID3ColumnName
ErrorValue
ErrorColumnName
FileName
These columns are not relevant for logging an exception.
DescriptionThe error message is to be saved in this column. In case of an exception thrown in a stored procedure, the error message can be queried using the T-SQL BuiltIn function ERROR_MESSAGE.
NumberThis column is used to be saved the error number. In case of an exception thrown in a stored procedure, the error number can be queried using the T-SQL BuiltIn function ERROR_NUMBER.
LineIn case of an exception thrown in a stored procedure, the line wihtin the script or stored procedure where the exception was thrown can be queried using the T-SQL BuiltIn function ERROR_LINE.
StateThe status code of the exception can be queried using the T-SQL BuiltIn function ERROR_STATE and should be written to this column..
CreatedOn
CreatedBy
The [CreatedOn] column contains a UTC timestamp. Initially, [CreatedOn] is assigned the current timestamp via a default value. The [CreatedBy] column stores the user under which the error was logged.

Declaration

CREATE TABLE [LL].[Error]
(
    [Id]              bigint        IDENTITY (1, 1) NOT NULL
   ,[ExecutionId]     bigint                        NOT NULL
   ,[ComponentId]     bigint                            NULL
   ,[TraceId]         bigint                            NULL
   ,[ErrorType]       char (1)                      NOT NULL
   ,[Source]          nvarchar(5)                   NOT NULL
   ,[Component]       nvarchar(128)                 NOT NULL
   ,[TaskName]        nvarchar(128)                     NULL
   ,[Entity]          nvarchar(128)                     NULL
   ,[Step]            nvarchar(max)                     NULL
   ,[SchemaName]      nvarchar(128)                     NULL
   ,[TableName]       nvarchar(128)                     NULL
   ,[FileId]          bigint                            NULL
   ,[ID1Value]        nvarchar(max)                     NULL
   ,[ID1ColumnName]   nvarchar(128)                     NULL
   ,[ID2Value]        nvarchar(max)                     NULL
   ,[ID2ColumnName]   nvarchar(128)                     NULL
   ,[ID3Value]        nvarchar(max)                     NULL
   ,[ID3ColumnName]   nvarchar(128)                     NULL
   ,[ErrorValue]      nvarchar(max)                     NULL
   ,[ErrorColumnName] nvarchar(128)                     NULL
   ,[FileName]        nvarchar(128)                     NULL
   ,[Description]     nvarchar(max)                     NULL
   ,[Number]          int                               NULL
   ,[Line]            int                               NULL
   ,[State]           nvarchar(max)                     NULL
   ,[CreatedOn]       datetime
       CONSTRAINT [DF_LL_Error_CreatedOn] 
       DEFAULT (GETUTCDATE() )                      NOT NULL
   ,[CreatedBy]       nvarchar(100) 
       CONSTRAINT [DF_LL_Error_CreatedBy] 
       DEFAULT (SUSER_SNAME())                      NOT NULL
   ,CONSTRAINT [PK_LL_Error] 
       PRIMARY KEY CLUSTERED ([Id] ASC)
   ,CONSTRAINT [FK_LL_Error_ExecutionId] 
       FOREIGN KEY ([ExecutionId]) 
       REFERENCES [LL].[Execution] ([Id])
);#
GO

Data model

ETL process execution status

All process logging tables have two columns [State] and [Success], which store the current status of the ETL process, the execution of a component or a concrete action – for example an INSERT, UPDTAE or DELETE.
The current status is stated in the [State] column with either processing, warning or error. The success is stated with [Success] = 1. The current status of the process can only be determined by combining both status values. The following combinations of status values are permitted:

StateSuccessDescription
processing0The process is currently running
– or –
The process terminated with an exception, without updating this column.
warning0The process is currently running and a warning has been logged in the error table [LL].[Error].
or –
The process terminated with an exception, without updating this column.
warning1The process was executed “successfully”. However, a warning was logged in the error table [LL].[Error].
success1The process was executed successfully.
error0The process has thrown an exception that caused the process to be ended. The exception was catched and logged in table [LL].[Error].
Combination of status values in [State] and [Success]

Other combinations of status values are not permitted. The procedures used for logging will throw an exception if an invalid combination is passed.

Stored procedures for logging

The following procedures (among others) are available for inserting log entries into the above tables:

TableStored Procedure
[LL].[Execution][LL].[spInsertExecution]
[LL].[spUpdateExecution]
[LL].[Component][LL].[spUpdateComponent]
[LL].[spUpdateComponent]
[LL].[spUpdateComponentSuccess]
[LL].[spUpdateComponentWarning]
[LL].[spUpdateComponentError]
[LL].[Trace][LL].[spInsertTrace]
[LL].[spUpdateTrace]
[LL].[spUpdateTraceSuccess]
[LL].[spUpdateTraceWarning]
[LL].[spUpdateTraceError]
[LL].[Error][LL].[spInsertError]
[LL].[spInsertErrorException]
Stored procedures for logging

Essentially, these procedures perform an INSERT or UPDATE on the log tables. The values to be logged are passed to the procedures as parameters. The procedures check the parameters passed and raise an exception in the case of invalid parameters.

The following three code examples show the procedures [LL].[spInsertTrace], [LL].[spUpdateTrace] and [LL].[spUpdateTraceSuccess]. The remaining procedures are developed in the same fassion. The code for all procedures can be downloaded via this link.

[LL].[spInsertTrace]

-- --------------------------------------------------------------------------------
-- Parameters
--    @p_executionId          AS int
--       Execution Id of the current execution
--    @p_componentId          AS int
--       Each procedure call gets a unique id that allows identifying all messages
--       that will be written by the procedure call
--    @p_traceId              AS int OUT
--       Returns the Id of the new row in [LL].[Trace]
--    @p_source               AS nvarchar(5)
--       Type of the calling source system (SSIS, T-SQL, ...)
--    @p_component            AS nvarchar(128)
--       Name of the calling SSIS-Package
--    @p_task                 AS nvarchar(128) = NULL
--       SSIS-Task name the log entry refers to.
--    @p_entity               AS nvarchar(128) = NULL
--       Entity name the log entry refers to.
--    @p_step                 AS nvarchar(max)
--       Description of the task in the calling object that will be logged
--    @p_description          AS nvarchar(max) = NULL
--       Additional description of the task in the calling object that will be
--       logged
--    @p_fileId               AS int = NULL
--       Foreign Key to [LL].[FileList].[Id]
--    @p_action               AS nvarchar(max) = NULL
--       Specifiy any action that will be logged by this procedure call like
--       Insert, Delete, Update, ...
--    @p_affectedRows         AS nvarchar(max) = NULL
--       Specifiy the number of rows/objects that were inserted, deleted, updated, ...
--    @p_state                AS nvarchar(100)
--       State of the current task (processing, success, error, warning)
--    @p_success              AS bit
--       Specifies whether the calling procedure succeded
--       0 = processing, warning, error
--       1 = success
-- --------------------------------------------------------------------------------
CREATE PROCEDURE [LL].[spInsertTrace]
    @p_executionId           AS int
   ,@p_componentId           AS int
   ,@p_traceId               AS int OUT
   ,@p_source                AS nvarchar(5)
   ,@p_component             AS nvarchar(128)
   ,@p_task                  AS nvarchar(128)
   ,@p_entity                AS nvarchar(128)
   ,@p_step                  AS nvarchar(max)
   ,@p_description           AS nvarchar(max)
   ,@p_fileId                AS bigint        = NULL
   ,@p_action                AS nvarchar(100) = NULL
   ,@p_affectedRows          AS int
   ,@p_state                 AS nvarchar(100)
   ,@p_success               AS bit
AS
BEGIN
   SET NOCOUNT ON;

   -- --------------------------------------------------------------------------------
   -- Declare variables
   -- --------------------------------------------------------------------------------
   DECLARE @component        AS nvarchar(128);
   DECLARE @table            AS table([Id] int);
   DECLARE @message          AS nvarchar(max);

   -- --------------------------------------------------------------------------------
   -- Initialize variables
   -- --------------------------------------------------------------------------------
   SET @component = OBJECT_SCHEMA_NAME(@@PROCID) + N'.' + OBJECT_NAME(@@PROCID);

   -- --------------------------------------------------------------------------------
   -- Workload
   -- --------------------------------------------------------------------------------
   BEGIN TRY
      -- --------------------------------------------------------------------------------
      -- Check parameters
      -- --------------------------------------------------------------------------------
      BEGIN
         -- --------------------------------------------------------------------------------
         -- Check @p_executionId
         -- --------------------------------------------------------------------------------
         IF (@p_executionId IS NULL)
            BEGIN
               EXEC [dbo].[spRaiseError] N'The parameter ''p_ExecutionId'' is NULL.', @component;
               RETURN 0;
            END;
         -- --------------------------------------------------------------------------------
         -- Check @p_componentId
         -- --------------------------------------------------------------------------------
         IF (@p_componentId IS NULL)
            BEGIN
               EXEC [dbo].[spRaiseError] N'The parameter ''p_ComponentId'' is NULL.', @component;
               RETURN 0;
            END;
         -- --------------------------------------------------------------------------------
         -- Check @p_source
         -- --------------------------------------------------------------------------------
         IF ([dbo].[fnIsNullOrEmpty](@p_source, 1) <> 0 )
            BEGIN
               EXEC [dbo].[spRaiseError] N'The parameter ''p_source'' is either NULL or an empty string.', @component;
               RETURN 1;
            END;
         -- --------------------------------------------------------------------------------
         -- Check @p_component
         -- --------------------------------------------------------------------------------
         IF ([dbo].[fnIsNullOrEmpty](@p_component, 1) <> 0 )
            BEGIN
               EXEC [dbo].[spRaiseError] N'The parameter ''p_component'' is either NULL or an empty string.', @component;
               RETURN 1;
            END;
         -- --------------------------------------------------------------------------------
         -- Check @p_step
         -- --------------------------------------------------------------------------------
         IF ([dbo].[fnIsNullOrEmpty](@p_step, 1) <> 0)
            BEGIN
               EXEC [dbo].[spRaiseError] N'The parameter ''p_step'' is either NULL or an empty string.', @component;
               RETURN 1;
            END;
         -- --------------------------------------------------------------------------------
         -- Check @p_state
         -- --------------------------------------------------------------------------------
         IF ([dbo].[fnIsNullOrEmpty](@p_state, 1) <> 0)
            BEGIN
               EXEC [dbo].[spRaiseError] N'The parameter ''p_state'' is either NULL or an empty string', @component;
               RETURN 1;
            END;
         -- --------------------------------------------------------------------------------
         -- Check @p_success
         -- --------------------------------------------------------------------------------
         IF @p_success IS NULL
            BEGIN
               EXEC [dbo].[spRaiseError] N'The parameter ''p_success'' is either NULL.', @component;
               RETURN 1;
            END;
         -- --------------------------------------------------------------------------------
         -- Check combination of @p_success and @p_state
         -- --------------------------------------------------------------------------------
         IF (@p_success = 0 AND @p_state IN ('success')) OR (@p_success = 1 AND @p_state IN ('processing', 'error'))
            BEGIN
               SET @message = CONCAT('Invalid state ''', @p_state, ''' for p_success = ''', CAST(@p_success AS nvarchar(100)),'''.');
               EXEC [dbo].[spRaiseError] @message, @component;
               RETURN 1;
            END;
      END;

      -- --------------------------------------------------------------------------------
      -- Insert trace into [LL].[Trace]
      -- --------------------------------------------------------------------------------
      INSERT INTO [LL].[Trace]
      (
          [ExecutionId]
         ,[ComponentId]
         ,[Source]
         ,[Component]
         ,[Task]
         ,[Entity]
         ,[Step]
         ,[Description]
         ,[FileId]
         ,[Action]
         ,[AffectedRows]
         ,[State]
         ,[Success]
      )
      OUTPUT Inserted.Id INTO @table
      VALUES
      (
          @p_executionId
         ,@p_componentId
         ,@p_source
         ,@p_component
         ,@p_task
         ,@p_entity
         ,@p_step
         ,CASE WHEN @p_description IS NULL OR DATALENGTH(@p_description) = 0  THEN NULL ELSE @p_description END
         ,@p_fileId
         ,@p_action
         ,@p_affectedRows
         ,@p_state
         ,@p_success
      );

      SELECT @p_traceId = [Id] FROM @table;

      RETURN 0;
   END TRY
   BEGIN CATCH
      THROW;
   END CATCH;
END;
-- [LL].[spInsertTrace]

-- DECLARE @ExecutionId  AS int;
-- DECLARE @ComponentId  AS int;
-- DECLARE @traceId      AS int;
-- DECLARE @source       AS nvarchar(5);
-- DECLARE @component    AS nvarchar(128);
-- DECLARE @entity       AS nvarchar(128);
-- DECLARE @step         AS nvarchar(max);
-- DECLARE @description  AS nvarchar(max);
-- DECLARE @fileId       AS bigint
-- DECLARE @action       AS nvarchar(100) = NULL
-- DECLARE @affectedRows AS int
-- DECLARE @state        AS nvarchar(100);
-- DECLARE @success      AS bit;
--
-- SET @ExecutionId  = 1;
-- SET @ComponentId  = 1;
-- SET @source       = 'T-SQL';
-- SET @component    = 'test script';
-- SET @entity       = '[LL].[spInsertTrace]';
-- SET @step         = '[LL].[spInsertTrace]';
-- SET @description  = 'none';
-- SET @fileId       = 456;
-- SET @action       = 'Insert';
-- SET @affectedRows = 55;
-- SET @state        = 'processing';
-- SET @success      = 0;
--
-- EXEC [LL].[spInsertTrace]
--     @p_executionId  = @ExecutionId
--    ,@p_componentId  = @ComponentId
--    ,@p_traceId      = @traceId OUTPUT
--    ,@p_source       = @source
--    ,@p_component    = @component
--    ,@p_entity       = @entity
--    ,@p_step         = @step
--    ,@p_description  = @description
--    ,@p_fileId       = @fileId
--    ,@p_action       = @action
--    ,@p_affectedRows = @affectedRows
--    ,@p_state        = @state
--    ,@p_success      = @success;
--
-- SELECT @traceId;
-- SELECT * FROM [LL].[Trace] WHERE [Id] = @traceId;

[LL].[spUpdateTrace]

-- --------------------------------------------------------------------------------
-- Parameters :
--    @p_traceId              AS int
--       ID of the row in [LL].[Trace], that is to be updated.
--   @p_description           AS nvarchar(max)
--       Additional description of the task in the calling object that will be logged
--    @p_action               AS nvarchar(max) = NULL
--       Specifiy any action that will be logged by this procedure call like Insert, Delete, Update, ...
--    @p_affectedRows         AS nvarchar(max) = NULL
--       Specifiy the number of rows/objects that were inserted, deleted, updated, ...
--    @p_state                AS nvarchar(100)
--       State of the current task (processing, success, error, warning)
--    @p_success              AS bit
--       Specifies the success state of the execution
--       0 = @p_state > processing, warning, error
--       1 = @p_state > success, warning
-- --------------------------------------------------------------------------------
ALTER PROCEDURE [LL].[spUpdateTrace]
    @p_traceId               AS int
   ,@p_description           AS nvarchar(max)
   ,@p_action                AS nvarchar(100) = NULL
   ,@p_affectedRows          AS int
   ,@p_state                 AS nvarchar(100)
   ,@p_success               AS bit
AS
BEGIN
   SET NOCOUNT ON;

   -- --------------------------------------------------------------------------------
   -- Declare variables
   -- --------------------------------------------------------------------------------
   DECLARE @component        AS nvarchar(128);
   DECLARE @tempuid          AS int;
   DECLARE @message          AS nvarchar(max);

   -- --------------------------------------------------------------------------------
   -- Initialize variables
   -- --------------------------------------------------------------------------------
   SET @component = OBJECT_SCHEMA_NAME(@@PROCID) + N'.' + OBJECT_NAME(@@PROCID);

   -- --------------------------------------------------------------------------------
   -- Workload
   -- --------------------------------------------------------------------------------
   BEGIN TRY
      -- --------------------------------------------------------------------------------
      -- Check parameters
      -- --------------------------------------------------------------------------------
      BEGIN
         -- --------------------------------------------------------------------------------
         -- Check @p_traceId
         -- --------------------------------------------------------------------------------
         IF (@p_traceId IS NULL)
            BEGIN
               EXEC [dbo].[spRaiseError] N'The parameter ''p_traceid'' is NULL.', @component;
               RETURN 1;
            END;
         -- --------------------------------------------------------------------------------
         -- Check @p_success
         -- --------------------------------------------------------------------------------
         IF (@p_success IS NULL)
            BEGIN
               SET @message = 'The parameter ''p_success'' is NULL.';
               EXEC [dbo].[spRaiseError] @message, @component;
               RETURN 1;
            END;
         -- --------------------------------------------------------------------------------
         -- Check combination of @p_success and @p_state
         -- --------------------------------------------------------------------------------
         IF (@p_success = 0 AND @p_state IN (N'success')) OR (@p_success = 1 AND @p_state IN (N'processing', N'error'))
            BEGIN
               SET @message = CONCAT(N'Invalid state ''', @p_state, N''' for p_success = ''', CAST(@p_success AS nvarchar(100)),'''.');
               EXEC [dbo].[spRaiseError] @message, @component;
               RETURN 1;
            END;
         -- --------------------------------------------------------------------------------
         -- Check whether an trace log does exist in [LL].[Trace] with
         -- [ID] = @p_traceId
         -- --------------------------------------------------------------------------------
         SELECT
             @tempuid = [Id]
         FROM
            [LL].[Trace]
         WHERE
            [Id] = @p_traceId;

         IF (@tempuid IS NULL)
            BEGIN
               SET @message = N'A record with [ID] = ''' + CAST(@p_traceId AS nvarchar(max)) + N''' could not be found.';
               EXEC [dbo].[spRaiseError] @message, @component;
               RETURN 1;
            END;
      END;

      -- --------------------------------------------------------------------------------
      -- Update trace log in [LL].[Trace]
      -- --------------------------------------------------------------------------------
      UPDATE [LL].[Trace]
         SET
             [Description]  = CASE WHEN (@p_description IS NULL OR DATALENGTH(@p_description) = 0) AND ([Description] IS NULL OR DATALENGTH([Description]) = 0) THEN NULL ELSE @p_description END
            ,[Action]       = @p_action
            ,[AffectedRows] = @p_affectedRows
            ,[State]        = @p_state
            ,[Success]      = @p_success
      WHERE
         [Id] = @p_traceId;

      RETURN 0;
   END TRY
   BEGIN CATCH
      THROW;
   END CATCH;
END;
-- [dbo].[spUpdateTrace]

-- EXEC [LL].[spUpdateTrace] 1, N'process successfully finished', N'Insert', 123, N'success', 1;

[LL].[spUpdateTraceSuccess]

-- --------------------------------------------------------------------------------
-- Parameters :
--    @p_traceId              AS int
--       ID of the row in [LL].[Trace], that is to be updated.
--    @p_description           AS nvarchar(max)
--       Additional description of the task in the calling object that will be logged
--    @p_action               AS nvarchar(max) = NULL
--       Specifiy any action that will be logged by this procedure call like Insert, Delete, Update, ...
--    @p_affectedRows         AS nvarchar(max) = NULL
--       Specifiy the number of rows/objects that were inserted, deleted, updated, ...
-- --------------------------------------------------------------------------------
ALTER PROCEDURE [LL].[spUpdateTraceSuccess]
    @p_traceId               AS int
   ,@p_description           AS nvarchar(max)
   ,@p_action                AS nvarchar(100) = NULL
   ,@p_affectedRows          AS int
AS
BEGIN
   SET NOCOUNT ON;

   EXEC [LL].[spUpdateTrace]
       @p_traceId
      ,@p_description
      ,@p_action
      ,@p_affectedRows
      ,'success'
      ,1;
END;
-- [LL].[spUpdateTraceSuccess]

-- EXEC [LL].[spUpdateTraceSuccess] 1, 'description', 'Insert', 11;

Exception Handling

To ensure that an ETL process terminates in an orderly manner and is not terminated without appropriate logging, explicit exception handling is required. In this case, orderly means that the process catches a thrown exception and updates the related log entries in the tables [LL].[Execution], [LL].[Component] and [LL].[Trace] – if applicable – with [Status] = error and [Success] = 0 and the exception is logged in the table [LL].[Error]. The following diagram shows the basic procedure of exception handling:

Exception-Handling

Exception handling is required at least at the top level of the execution of an ETL process, which is logged in the [LL].[Execution] table.

The following code example shows an exception handling including logging in the tables [LL].[Execution] and [LL].[Error] according to the diagram above:

-- Execution logging variables
DECLARE @executionId    AS int;
DECLARE @processName    AS varchar(max);
DECLARE @version        AS int;

-- Error logging variables
DECLARE @componentId    AS int;
DECLARE @traceId        AS int;
DECLARE @source         AS nvarchar(5);
DECLARE @component      AS nvarchar(128);
DECLARE @task           AS nvarchar(128);
DECLARE @entity         AS nvarchar(128);
DECLARE @step           AS nvarchar(128);
DECLARE @state          AS nvarchar(128);
DECLARE @success        AS bit;

-- Exception variables
DECLARE @error_message AS nvarchar(max);
DECLARE @error_number  AS int;
DECLARE @error_line    AS int;
DECLARE @error_state   AS nvarchar(max);

BEGIN TRY
    -- Initialize execution logging variables
    SET @processName = 'Name of ETL-Process';
    SET @version     = 123;

    -- Initialize error variables
    SET @componentId = NULL;
    SET @traceId     = NULL;
    SET @source      = 'sql';
    SET @component   = 'Procedure Name';
    SET @task        = NULL;
    SET @entity      = 'Any Entity';
    SET @step        = 'Do something';

    EXEC [LL].[spInsertExecution] @executionId OUTPUT, @processName, @version;

   -- Do something
   RAISERROR('Any Exception', 15, 1);

   SET @state        = 'success';
   SET @success      = 1;
   EXEC [LL].[spUpdateExecution] @executionId, @state, @success;

END TRY
BEGIN CATCH
   SET @error_message = ERROR_MESSAGE();
   SET @error_number  = ERROR_NUMBER();
   SET @error_line    = ERROR_LINE();
   SET @error_state   = ERROR_STATE();

   SET @state         = 'error';
   SET @success       = 0;

   IF @executionId IS NOT NULL
      BEGIN
         EXEC [LL].[spInsertErrorException] @executionId, @componentId, @traceId, @source, @component, @task, @entity, @step, @error_number, @error_message, @error_line, @error_state;
         EXEC [LL].[spUpdateExecution] @executionId, @state, @success;
      END;
   THROW;
END CATCH

Example of logging and exception handling in an ETL process

After these explanations, the following diagram shows the exception handling of a simple, compact but complete ETL process:

Beispiel für das Exception-Handling und Protokollierung eines Beispiel-ETL-Prozesses

This example shows logging an ETL process developed exclusively with and consisting of 5 stored procedures. The entry procedure [T2].[spETLProcess] calls the two procedures [T2].[spDoSomething_1] and [T2].[spDoSomething_2]. The first simulates and logs an INSERT, UPDATE and DELETE statement, while the second procedure executes two additional procedures [T2].[spDoSomething_1_1] and [T2].[spDoSomething_1_2]. These procedures also simulate and log an INSERT, UPDATE and DELETE statement. During the execution of the DELETE statement in the second procedure, an exception is thrown, which leads to an orderly termination of the ETL process with [State] = error and [Success] = 0.

Prozedur-Aufrufe des Beispiel-ETL-Prozesses

The example code for creating a database with the required stored procedures for logging and the ETL process procedures mentioned in the diagram from the T2 schema can be downloaded from this link.

Showing all procedures would lead to repetition. Therefore, only the entry procedure [T2].[spETLProcess] and the procedure [T2].[spDoSomething_2_2]  are listed here.

[T2].[spETLProcess]

CREATE PROCEDURE [T2].[spETLProcess]
AS
BEGIN
   SET NOCOUNT ON;   

   -- --------------------------------------------------------------------------------
   -- Declare variables
   -- --------------------------------------------------------------------------------
   -- Error Variables
   DECLARE @error_message    AS nvarchar(max);
   DECLARE @error_number     AS int;
   DECLARE @error_line       AS int;
   DECLARE @error_state      AS nvarchar(max);

   -- Logging Variables
   DECLARE @component        AS nvarchar(128);
   DECLARE @task             AS nvarchar(128);
   DECLARE @schema           AS nvarchar(128);
   DECLARE @table            AS nvarchar(128);

   DECLARE @source           AS nvarchar(5);
   DECLARE @step             AS nvarchar(max);
   DECLARE @entity           AS nvarchar(max);
   DECLARE @message          AS nvarchar(max);

   DECLARE @traceId          AS int; 
   DECLARE @componentId      AS int;
   DECLARE @executionId      AS int;

   DECLARE @description      AS nvarchar(max);
   DECLARE @affectedrows     AS int;
   DECLARE @action           AS varchar(100);
   DECLARE @state            AS varchar(100);
   DECLARE @success          AS int;

   -- --------------------------------------------------------------------------------
   -- Initialize variables
   -- --------------------------------------------------------------------------------
   -- Logging
   SET @message          = NULL;
   SET @description      = NULL;
   SET @affectedrows     = 0;

   SET @component        = OBJECT_SCHEMA_NAME(@@PROCID) + N'.' + OBJECT_NAME(@@PROCID);
   SET @source           = N'T-SQL';
   SET @entity           = N'[].[]';

   -- --------------------------------------------------------------------------------
   -- Workload
   -- --------------------------------------------------------------------------------
   BEGIN TRY
      -- --------------------------------------------------------------------------------
      -- Check parameters
      -- --------------------------------------------------------------------------------
      EXEC [LL].[spInsertExecution] @executionId OUTPUT, N'ETL process', 123;

      -- --------------------------------------------------------------------------------
      -- Insert component log
      -- --------------------------------------------------------------------------------
      SET @step        = N'Orchestrate ETL process';
      SET @description = '';
      EXEC [LL].[spInsertComponent] @executionId, @componentId OUTPUT, @source, @component, NULL, @entity, @step, @description;
      
      -- --------------------------------------------------------------------------------
      -- Execute Procedure [T2].[spDoSomething_1]
      -- --------------------------------------------------------------------------------
      SET @task        = NULL;
      SET @step        = N'Execute [T2].[spDoSomething_1]';
      SET @action      = 'execute';
      SET @description = NULL; 
      SET @state       = 'processing';
      SET @success     = 0;
      EXEC [LL].[spInsertTrace] @executionId, @componentId, @traceId OUTPUT, @source, @component, @task, @entity, @step, @description, NULL, @action, NULL, @state, @success;

      EXEC [T2].[spDoSomething_1] @executionId;

      EXEC [LL].[spUpdateTraceSuccess] @traceId, @description, @action, @@ROWCOUNT;

      -- --------------------------------------------------------------------------------
      -- Execute Procedure [T2].[spDoSomething_2]
      -- --------------------------------------------------------------------------------
      SET @task        = NULL;
      SET @step        = N'Execute [T2].[spDoSomething_2]';
      SET @action      = 'execute';
      SET @description = NULL; 
      SET @state       = 'processing';
      SET @success     = 0;
      EXEC [LL].[spInsertTrace] @executionId, @componentId, @traceId OUTPUT, @source, @component, @task, @entity, @step, @description, NULL, @action, NULL, @state, @success;

      EXEC [T2].[spDoSomething_2] @executionId;

      EXEC [LL].[spUpdateTraceSuccess] @traceId, @description, @action, @@ROWCOUNT;

      -- --------------------------------------------------------------------------------
      -- Execute another procedure 
      -- --------------------------------------------------------------------------------
      -- ...

      -- --------------------------------------------------------------------------------
      -- Update component log 
      -- --------------------------------------------------------------------------------
      EXEC [LL].[spUpdateComponentSuccess] @componentId, @description;

      -- --------------------------------------------------------------------------------
      -- Update execution log 
      -- --------------------------------------------------------------------------------
      SET @state       = 'success';
      SET @success     = 1;
      EXEC [LL].[spUpdateExecution] @executionId, @state, @success;

   END TRY
   BEGIN CATCH
      SET @error_message = ERROR_MESSAGE();
      SET @error_number  = ERROR_NUMBER();
      SET @error_line    = ERROR_LINE();
      SET @error_state   = ERROR_STATE();

      IF @executionId IS NOT NULL
         BEGIN
            EXEC [LL].[spInsertErrorException] @executionId, @componentId, @traceId, @source, @component, NULL, NULL, @step, @error_number, @error_message, @error_line, @error_state;

            IF @traceId IS NOT NULL     EXEC [LL].[spUpdateTraceError] @traceId, @description;
            IF @componentId IS NOT NULL EXEC [LL].[spUpdateComponentError] @componentId, @description;

            SET @state       = 'error';
            SET @success     = 0;
            EXEC [LL].[spUpdateExecution] @executionId, @state, @success;
         END;
   END CATCH; 
END;

[T2].[spDo­Something_2_2]

CREATE PROCEDURE [T2].[spDoSomething_2_2]
(   
    @p_executionId           AS int
)
AS
BEGIN
   SET NOCOUNT ON;   

   -- --------------------------------------------------------------------------------
   -- Declare variables
   -- --------------------------------------------------------------------------------
   -- Error Variables
   DECLARE @error_message    AS nvarchar(max);
   DECLARE @error_number     AS int;
   DECLARE @error_line       AS int;
   DECLARE @error_state      AS nvarchar(max);

   -- Logging Variables
   DECLARE @component        AS nvarchar(128);
   DECLARE @task             AS nvarchar(128);
   DECLARE @schema           AS nvarchar(128);
   DECLARE @table            AS nvarchar(128);

   DECLARE @source           AS nvarchar(5);
   DECLARE @step             AS nvarchar(max);
   DECLARE @entity           AS nvarchar(max);
   DECLARE @message          AS nvarchar(max);

   DECLARE @traceId          AS int; 
   DECLARE @componentId      AS int;

   DECLARE @description      AS nvarchar(max);
   DECLARE @affectedrows     AS int;
   DECLARE @action           AS varchar(100);
   DECLARE @state            AS varchar(100);
   DECLARE @success          AS int;

   -- --------------------------------------------------------------------------------
   -- Initialize variables
   -- --------------------------------------------------------------------------------
   -- Logging
   SET @message          = NULL;
   SET @description      = NULL;
   SET @affectedrows     = 0;

   SET @component        = OBJECT_SCHEMA_NAME(@@PROCID) + N'.' + OBJECT_NAME(@@PROCID);
   SET @source           = N'T-SQL';
   SET @entity           = N'[].[]';

   -- --------------------------------------------------------------------------------
   -- Workload
   -- --------------------------------------------------------------------------------
   BEGIN TRY
      -- --------------------------------------------------------------------------------
      -- Check parameters
      -- --------------------------------------------------------------------------------
      BEGIN
         IF (@p_executionId IS NULL)
            BEGIN
               SET @message = N'The parameter ''p_executionId'' is NULL.';
               EXEC [dbo].[spRaiseError] @message,  @component;
               RETURN -1;
            END;
      END;

      -- --------------------------------------------------------------------------------
      -- Insert component log
      -- --------------------------------------------------------------------------------
      SET @step        = N'Do something';
      SET @description = '';
      EXEC [LL].[spInsertComponent] @p_executionId, @componentId OUTPUT, @source, @component, NULL, @entity, @step, @description;
      
      -- --------------------------------------------------------------------------------
      -- Insert data
      -- --------------------------------------------------------------------------------
      SET @task        = NULL;
      SET @step        = N'Insert data';
      SET @action      = 'insert';
      SET @description = NULL; 
      SET @state       = 'processing';
      SET @success     = 0;
      EXEC [LL].[spInsertTrace] @p_executionId, @componentId, @traceId OUTPUT, @source, @component, @task, @entity, @step, @description, NULL, @action, NULL, @state, @success;

      -- Insert here a SQL Statement that inserts data into a table

      EXEC [LL].[spUpdateTraceSuccess] @traceId, @description, @action, @@ROWCOUNT;
   
      -- --------------------------------------------------------------------------------
      -- Update data
      -- --------------------------------------------------------------------------------
      SET @task        = NULL;
      SET @step        = N'Update data';
      SET @action      = 'update';
      SET @description = NULL; 
      SET @state       = 'processing';
      SET @success     = 0;
      EXEC [LL].[spInsertTrace] @p_executionId, @componentId, @traceId OUTPUT, @source, @component, @task, @entity, @step, @description, NULL, @action, NULL, @state, @success;

      -- Insert here a SQL Statement that updates data a table

      EXEC [LL].[spUpdateTraceSuccess] @traceId, @description, @action, @@ROWCOUNT;

      -- --------------------------------------------------------------------------------
      -- Delete data      
      -- --------------------------------------------------------------------------------
      SET @task        = NULL;
      SET @step        = N'Delete data';
      SET @action      = 'delete';
      SET @description = NULL; 
      SET @state       = 'processing';
      SET @success     = 0;
      EXEC [LL].[spInsertTrace] @p_executionId, @componentId, @traceId OUTPUT, @source, @component, @task, @entity, @step, @description, NULL, @action, NULL, @state, @success;

      -- Insert here a SQL Statement that deletes data from a table
      RAISERROR('Exception in [T2].[spDoSomething_2_2]', 15, 1);
      
      EXEC [LL].[spUpdateTraceSuccess] @traceId, @description, @action, @@ROWCOUNT;

      -- --------------------------------------------------------------------------------
      -- Update component log 
      -- --------------------------------------------------------------------------------
      EXEC [LL].[spUpdateComponentSuccess] @componentId, @description;

   END TRY
   BEGIN CATCH
      SET @error_message = ERROR_MESSAGE();
      SET @error_number  = ERROR_NUMBER();
      SET @error_line    = ERROR_LINE();
      SET @error_state   = ERROR_STATE();

      -- Write in Logging
      IF @p_executionId IS NOT NULL
         BEGIN
            EXEC [LL].[spInsertErrorException] @p_executionId, @componentId, @traceId, @source, @component, NULL, NULL, @step, @error_number, @error_message, @error_line, @error_state;

            IF @traceId IS NOT NULL     EXEC [LL].[spUpdateTraceError] @traceId, @description;
            IF @componentId IS NOT NULL EXEC [LL].[spUpdateComponentError] @componentId, @description;
         END;
      THROW;
   END CATCH; 
END;

The following script contains the commands for executing the ETL process. In order to obtain a compact process log, the date in the log tables is deleted beforehand. The final SELECT statement produces the result as shown at the beginning.

-- --------------------------------------------------------------------------------
-- 01: Delete data from logging tables
-- --------------------------------------------------------------------------------
TRUNCATE TABLE [LL].[Error]
TRUNCATE TABLE [LL].[Trace]
DELETE FROM [LL].[Component]
DELETE FROM [LL].[Execution]
DBCC CHECKIDENT (N'[LL].[Component]', RESEED, 0);
DBCC CHECKIDENT (N'[LL].[Execution]', RESEED, 0);

-- --------------------------------------------------------------------------------
-- 02: Execute ETL Process
-- --------------------------------------------------------------------------------
EXEC [T2].[spETLProcess];

-- --------------------------------------------------------------------------------
-- 03: Query logging tables
-- --------------------------------------------------------------------------------
SELECT * FROM [LL].[Execution];
SELECT * FROM [LL].[Component];
SELECT * FROM [LL].[Trace];

Summary

ETL processes are data-driven processes. If data is delivered that is not expected, then there is a high probability that either not all data or even incorrect data will be written to the target system. What is only mentioned casually in this article is that at the lowest level of logging, the number of affected data records is or can be logged. Knowing the number of records expected and processed is a good indication of the success or failure of an ETL process.

This approach therefore supports the development of robust ETL processes, which also ensure an assessment of data quality.

What is a must in software development is often neglected when developing ETL processes: explicit exception handling. The exception handling presented here initially ensures that a process is terminated in an orderly manner in the event of an error. But when is there an error? And does an error really always have to lead to the process being aborted? In conjunction with knowledge of the expected and actual number of records processed, real error handling can be implemented.

The approach presented here is, so to speak, an invitation to delve more into the data and the expected result. The protocol and in particular the knowledge of the data support the developer during the development of the ETL process in being able to assess the correctness of the development.

The procedures presented merely provide a toolbox for logging. When used correctly, it produces a readable and interpretable log that allows assessing the correctness of the execution of the ETL process and the data processed.

Related Posts

Download