Documentation of ETL Processes


When developing an ETL process, data from different data sources must often be processed, transformed, and loaded into the structure of the target system. The transformation is anything but trivial, since the data structures and master data of the sources may differ fundamentally from those of the target system. There are numerous data-driven challenges to consider:

  • Unknown identifiers
  • Poor data integrity
  • Incompatibility of data types
  • Business rules

Inadequate knowledge of these points usually leads to problems with the ETL process and, in the worst case, to bad data in the target system.

In the end, the only option is to take a closer look at the data itself and at the data structures in both the sources and the target. Only a thorough analysis of the data and proper documentation of the data structures on both sides will ensure the development of robust ETL processes. As trivial as this requirement sounds, it is difficult to put into practice.
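As a rough illustration of what such a closer look at the data can involve, the following minimal sketch counts a few of the problems listed above (missing or unknown identifiers and values that do not fit the expected data type) in a handful of source rows. The field names, the sample rows, and the set of known customer IDs are hypothetical and chosen only for this example; real profiling would run against the actual sources and target.

```python
from datetime import date

# Hypothetical source rows; the field names are illustrative only.
source_rows = [
    {"customer_id": "C001", "order_date": "2021-03-01", "amount": "19.90"},
    {"customer_id": "C999", "order_date": "2021-03-02", "amount": "5.00"},
    {"customer_id": None, "order_date": "2021-02-30", "amount": "n/a"},
]

# Hypothetical master data already present in the target system.
known_customer_ids = {"C001", "C002"}


def is_iso_date(value):
    """Return True if value is a valid ISO date string (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


def is_number(value):
    """Return True if value can be converted to a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def profile(rows, known_ids):
    """Count identifier and data type problems in the source rows."""
    report = {"missing_id": 0, "unknown_id": 0, "bad_date": 0, "bad_amount": 0}
    for row in rows:
        customer_id = row.get("customer_id")
        if customer_id is None:
            report["missing_id"] += 1
        elif customer_id not in known_ids:
            report["unknown_id"] += 1
        if not is_iso_date(row.get("order_date")):
            report["bad_date"] += 1
        if not is_number(row.get("amount")):
            report["bad_amount"] += 1
    return report


report = profile(source_rows, known_customer_ids)
print(report)
```

Even a simple report like this makes it visible how many rows would fail or load incorrectly before any transformation logic is written.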

With this series of articles, I would like to offer some suggestions for analyzing the data, for documenting or specifying the data structures, and for integrating the specification into the development process. The series covers the following topics:

  • Understanding the data source
  • Analysis of data
  • Documentation of data structures – Find the proper tool
  • The Kimball Sheet
  • Integration of the Kimball Sheets into the development process