SSIS vs. SQL: Source Code Management — Why SP Diffs Are Readable and `.dtsx` Diffs Are Not

Diff two versions of an SSIS package — even a trivial rename produces eight “changed regions” in the XML, and the diff doesn’t even locate the actual edit correctly. The same modification in a stored procedure shows three lines of diff, reviewable in 30 seconds. Source code management is a maintainability decision — not a tool question, but an artifact-format question.

What you’ll take away:

  • Comparing two versions of a T-SQL script in Visual Studio: a clearly readable inline diff.
  • Comparing two versions of an SSIS package (.dtsx): even renaming a script task produces eight “changed regions” in the XML, with incorrect diff localisation.
  • Complex example: the same hierarchy/ranking task on [DimEmployee] from AdventureWorks — versioned as a stored procedure it’s clearly diffable, as an SSIS package effectively not.
  • Source Code Management 2026: what has moved since the 2018 TFS state — Git, Liquibase/Flyway, sqlmesh.

Prerequisites: SQL Server 2017+ (for the examples), SSIS 2017+ (Visual Studio with SSDT), Git or Azure DevOps Server as VCS. Examples developed in 2018 with Visual Studio 2017 + TFS — the diff argument transfers 1:1 to the 2026 Git-centric stack.

Contents

Overview

SQL Server Integration Services (SSIS) is a powerful tool set for building ETL pipelines. There are plenty of good reasons to use SSIS, and plenty against. If we stay within the Microsoft product stack, the alternative for complex ETL pipelines is (essentially) Transact-SQL (T-SQL).

This article is part of a series of articles on the important decision criteria for choosing the right technology — SSIS and/or T-SQL.


Source code management enables source code to be stored in different versions. After a modification, the changed file can be saved as a new version. Along with the changed version, metadata such as the date and time of the change, the user ID of the person who made it, a reference to a change request, and so on are stored. Older versions of a document can be restored from these.

In addition, source code management systems offer features that are essential for multi-developer teamwork and for release management. In short: source code management is an essential building block of professional software development. In the Microsoft world, Git is today’s standard — either hosted on GitHub/GitLab/Bitbucket, or self-hosted via Azure DevOps Server (the successor to Team Foundation Server, TFS). Both integrate with the Visual Studio development environment.

Another important feature of source code management is comparing two versions of a file. Comparing two versions of the same file lets you identify the differences. One use case is the four-eyes principle: one developer edits a document, and another reviews the changes. Only the changes in the new version are relevant for the review, and those can be determined via version comparison.

A version comparison is only useful, however, when the meaning of the changes and ideally the reason for them are recognisable from the comparison. Comparing binary files is usually not helpful — neither the meaning nor the reason is visible. Comparing two versions of a text file has a high chance that both are recognisable. But this gets harder when the data in the text file is stored hierarchically and in a structured way. SSIS packages are stored in a hierarchically structured XML file with the extension *.dtsx, while SQL statements are usually stored as plain text files with the extension *.sql.

This article describes version comparison of SQL scripts and SSIS packages using three examples.

  • Comparing two versions of a SQL statement
  • Comparing two versions of a simple SSIS package
  • Comparing complex development artifacts

Comparing two versions of a SQL script

The following figure shows the result of comparing two versions of a SQL script in Visual Studio. Deviations are highlighted in colour.

The screenshot shows the SQL statement before the change on the left and the modified SQL statement on the right. In the modified statement, lines that contain a change are highlighted in light green. The changes themselves are highlighted in a stronger green. In the previous version on the left, the corresponding text is highlighted in red.

To the right of the vertical scroll bar, the changed regions are indicated throughout the document. A good description of using the file comparison can be found in Microsoft’s online documentation for Git in Azure DevOps.

Comparing two versions of a simple SSIS package

ETL pipelines in SSIS are developed as SSIS packages. SSIS packages are stored as XML documents. Microsoft itself wrote about the SSIS package format, among other things:

In the current release of Integration Services, significant changes were made to the package format (.dtsx file) to make it easier to read the format and to compare packages. You can also more reliably merge packages that don’t contain conflicting changes or changes stored in binary format.

This quote stems from the historical Microsoft documentation about the SSIS package format (a page specifically about the format overhaul in SQL Server 2014, removed from the live documentation in the meantime); the technical DTSX specification lives on as Open Spec MS-DTSX. So the format was evolved, among other things, to make packages easier to compare.

The two screenshots below show two versions of the control flow of an SSIS package, in which only the name of the second script task was changed from B SCT Skripttask to D SCT Skripttask. The screenshots come from an SSIS package where tasks were simply placed in the control flow, named freely, and connected — but not configured further. The example is simple and therefore only illustrative.

Version 1

Version 2

Version comparison

Comparing the two versions reveals something astonishing. Changing the name of a script task results in an incredible 8 changed regions of the SSIS package / XML document, shown to the right of the vertical scroll bar.

Lines 60 to 77 represent (among other things) the script task B SCT Skripttask in the previous version. According to the comparison, this task was renamed to C SCT Skripttask — not to D SCT Skripttask. Lines 78 to 95 represent (among other things) the script task C SCT Skripttask in the previous version. According to the comparison, this task was renamed to D SCT Skripttask.

To be clear: only the name of the script task B SCT Skripttask was changed to D SCT Skripttask.

The comparison simply produces a wrong result.

Comparing complex development artifacts

This section looks at a somewhat more complex example that could be found in a similar form in real-world practice.

Task

In this example, the ranking of employees in the table [AdventureWorksDW2017].[DimEmployee] is to be determined per hierarchy level along the vacation and sick-leave hours of the employees. Four key figures are to be calculated for each employee. The calculation uses the SQL Server aggregate functions (windowed functions) specified for each key figure:

  • Vacation ranking: NTILE(3) — NTILE(3) divides the set of employees (per hierarchy level) into three equally sized groups. The assignment of an employee to one of the three groups follows the ascending sort by vacation hours.
  • Vacation ranking: DENSE_RANK — DENSE_RANK assigns each employee (per hierarchy level) a position in a rank order. If two employees have the same number of vacation hours, both receive the same rank.
  • Sick-leave ranking: NTILE(3) — analogous to vacation.
  • Sick-leave ranking: DENSE_RANK — analogous to vacation.

This task is to be solved with both a SQL statement and an SSIS package. In a second part, the developed artifacts are to be modified so that the grouping criterion is no longer vacation and sick-leave hours, but the employee’s hire date and birth date.

There are two challenges in this task:

  • Determining the employee hierarchy
  • Determining the ranking

SQL statement

Transact-SQL offers easy-to-use mechanisms for both challenges:

When data is structured by a parent-child relationship — as in the table [DimEmployee] — it can be queried efficiently and recursively via a CTE:

  1: CREATE OR ALTER PROCEDURE [dbo].[sp_insert_employee_hierarchy_ranking]
  2: AS
  3: BEGIN
  4:    SET NOCOUNT ON;
  5: 
  6:    TRUNCATE TABLE [dbo].[fct_employee_hierarchy_ranking];
  7: 
  8:    WITH CTE_Employee AS
  9:    (
 10:       -- Anker der rekursiven CTE: der CEO als Top-Level-Mitarbeiter ohne
 11:       -- Vorgesetzten. [Level] = 1 markiert die Wurzel der Hierarchie.
 12:       SELECT
 13:           [EmployeeKey]
 14:          ,[FirstName]
 15:          ,[LastName]
 16:          ,[Title]
 17:          ,[ParentEmployeeKey]
 18:          ,[VacationHours]
 19:          ,[SickLeaveHours]
 20:          ,1 AS [Level]
 21:       FROM
 22:           [AdventureWorksDW2017].[dbo].[DimEmployee]
 23:       WHERE
 24:           [ParentEmployeeKey] IS NULL
 25: 
 26:       UNION ALL
 27: 
 28:       -- Rekursionsschritt: alle Mitarbeiter, deren [ParentEmployeeKey] auf
 29:       -- einen bereits in der CTE enthaltenen Mitarbeiter zeigt. [Level] wird
 30:       -- pro Tiefe um 1 erhöht. UNION ALL ist die einzige rekursions-erlaubte
 31:       -- Mengenoperation zwischen Anker und rekursivem Glied.
 32:       SELECT
 33:           T01.[EmployeeKey]
 34:          ,T01.[FirstName]
 35:          ,T01.[LastName]
 36:          ,T01.[Title]
 37:          ,T01.[ParentEmployeeKey]
 38:          ,T01.[VacationHours]
 39:          ,T01.[SickLeaveHours]
 40:          ,T00.[Level] + 1 AS [Level]
 41:       FROM
 42:          [AdventureWorksDW2017].[dbo].[DimEmployee] AS T01
 43:       INNER JOIN
 44:          CTE_Employee AS T00
 45:          ON
 46:            T01.[ParentEmployeeKey] = T00.[EmployeeKey]
 47:       WHERE
 48:          T01.[Status] = N'Current'
 49:    )
 50:    INSERT INTO [dbo].[fct_employee_hierarchy_ranking]
 51:    (
 52:        [ParentEmployeeKey]
 53:       ,[EmployeeKey]
 54:       ,[LastName]
 55:       ,[FirstName]
 56:       ,[Title]
 57:       ,[Level]
 58:       ,[VacationHours]
 59:       ,[SickLeaveHours]
 60:       ,[VacationHours_NTILE]
 61:       ,[VacationHours_DENSE_RANK]
 62:       ,[SickLeaveHours_NTILE]
 63:       ,[SickLeaveHours_DENSE_RANK]
 64:    )
 65:    SELECT
 66:        [ParentEmployeeKey]
 67:       ,[EmployeeKey]
 68:       ,[LastName]
 69:       ,[FirstName]
 70:       ,[Title]
 71:       ,[Level]
 72:       ,[VacationHours]
 73:       ,[SickLeaveHours]
 74:       ,NTILE(3)     OVER (PARTITION BY [Level] ORDER BY [VacationHours],  [EmployeeKey]) AS [VacationHours_NTILE]
 75:       ,DENSE_RANK() OVER (PARTITION BY [Level] ORDER BY [VacationHours],  [EmployeeKey]) AS [VacationHours_DENSE_RANK]
 76:       ,NTILE(3)     OVER (PARTITION BY [Level] ORDER BY [SickLeaveHours], [EmployeeKey]) AS [SickLeaveHours_NTILE]
 77:       ,DENSE_RANK() OVER (PARTITION BY [Level] ORDER BY [SickLeaveHours], [EmployeeKey]) AS [SickLeaveHours_DENSE_RANK]
 78:    FROM
 79:       CTE_Employee;
 80: END;
 81: GO

This statement provides each employee with the hierarchy level in column [Level] plus the four key figures along vacation and sick-leave hours. In total, the 254 employees are organised in 5 hierarchy levels.

A comparison of the statement shown above with a modified version (in which VacationHours is replaced by HireDate and SickLeaveHours by BirthDate) is exactly what the first diff example above illustrates — three column edits, one data-type change, the diff localises the ~15 affected lines clearly.

SSIS package

As easy as this task was to solve in T-SQL, developing the SSIS solution was hard and especially time-consuming — if you want to rely exclusively on SSIS-native tools. The following approach was chosen for the solution shown here:

  • Each task requires two tables in the database.
  • The Data Flow 1000 DFT Calculate Levels determines only the hierarchy.
  • The Data Flow 3000 DFT Calculate Ranking determines only the ranking and stores the result in a table.
  • The ranking for vacation and sick-leave hours is calculated in two script tasks each.

There may well be a much simpler solution.

Control Flow

The control flow of the SSIS package still looks reasonably simple with 4 tasks.

Data Flow 1000 DFT Calculate Levels

While T-SQL supports a recursive approach, SSIS (to my knowledge) does not. The mapping of the 5 hierarchy levels therefore has to be developed level by level.

Data Flow 3000 DFT Calculate Ranking

The actual ranking calculation happens in Script Tasks by comparing two consecutive records.

Modifying the package

After completing the package it was committed to Git (or, in an on-prem stack, to Azure DevOps Server) and then modified so that the ordering criterion for the ranking was no longer vacation/sick-leave hours but hire date/birth date. In essence three field names and one data type had to change. Despite the larger number of tasks involved, the scope of the modification stayed limited and was done within minutes. The result was committed as well.

Comparing the two versions

Comparing the two committed versions produced the following changes:

What matters for judging the scope of the change in the SSIS package is the right-hand vertical scroll bar. The hit locations of the modifications are highlighted in green and red along the scroll bar. Despite the rather minor scope of the actual change, the entire SSIS package has been turned upside down.

While the T-SQL solution can be developed quickly, developing the SSIS package is laborious — it cost me several hours. The main reasons were the poor readability of an SSIS package and the (surprising) realisation that there is no out-of-the-box SSIS solution for what looked like a simple task. At the start I experimented with a single formula expression to calculate the hierarchy level — after two or three hours I waved the white flag and chose the approach shown here, at the cost of the readability and flexibility of the SSIS package.

After finishing the package and committing the solution, only relatively few changes were required to switch the ordering criterion for the calculation. Those few changes resulted in numerous changes in the underlying XML document. A version comparison leaves the developer puzzled about what actually changed in the SSIS package. By contrast, with T-SQL the solution path is directly readable and changes are traceable.

Source Code Management 2026

The article’s core question — how diffable is a development artifact — is no longer a pure SQL-Server-/SSIS-only question in 2026. The Microsoft stack is just one option among others, and the diff pragmatics argument carries over across stacks.

The Microsoft stack today

Visual Studio database projects come in two flavours today — classic SSDT (.sqlproj, MSBuild legacy) and SDK-style (Microsoft.Build.Sql, in preview, compatible with the VS Code mssql extension); both remain supported. SSIS projects ship separately via the SSIS Projects Extension for Visual Studio 2022+ and are still .dtsx-XML-based, with the GUID-reordering problem unchanged. Microsoft itself has shifted the source-control standard from Team Foundation Server (TFS) to Git — Azure DevOps Server is the TFS successor and supports Git as the modern VCS backend (TFVC remains optionally available for legacy compatibility). The cloud move to Azure Data Factory + Synapse Pipelines introduces the same diff problem in JSON-defined pipelines instead of XML — the tool changes, the artifact-format question stays.

The Git world

Outside the Microsoft stack the world is more open and more SQL-centric. Three building blocks shape the 2026 version-control practice:

  • Git as default VCS. GitHub, GitLab, Bitbucket. SP diffs are plain-text diffs, trivially reviewable in any pull-request tool.
  • Schema migration tools. Liquibase and Flyway version schema changes as plain-SQL migration files. Each V001__add_employee_table.sql is diffable like any other SQL script.
  • sqlmesh. sqlmesh versions SQL pipelines with explicit state tracking — the .sql files are the versioned artifacts, and the maintainability lever is the same as for a stored procedure, just stack-agnostic (Postgres, Snowflake, BigQuery, Redshift, Databricks, ClickHouse, DuckDB, Microsoft Fabric, and more). Since March 2026 sqlmesh has been a Linux Foundation project (previously Tobiko Data, now part of Fivetran), placing it under community governance rather than single-vendor stewardship.

The question remains: what can be diffed?

Across stacks, the same point holds: it’s the artifact format that decides, not the tool. Plain SQL (stored procedure, Liquibase/Flyway migration, sqlmesh model) is trivially diffable. XML/JSON with GUID and position properties (.dtsx, ADF pipeline JSON) needs special tooling like BIDS Helper or structural-tree diff extensions.

Whoever wants to keep ETL logic versionable puts the substance into SQL artifacts. Whoever still has to version SSIS packages complements that with:

  • BI Developer Extensions (formerly BIDS Helper) or the SSIS Compare and Merge Tool from the Visual Studio Marketplace (structural-tree diff instead of XML diff — both are community tools; there is no official SSDT feature with a GUID-noise filter).
  • Screenshot diff of the Control Flow and Data Flow designers as a visual pragmatic approach.
  • Offload accompanying SQL scripts into stored procedures, so the .dtsx only contains orchestration and the content diff lives in the SQL script. That’s the strategic answer — and SSIS vs. SQL: Readability and Maintainability — how much SQL belongs in an SSIS package? makes the same point from the maintainability angle.

Take-away

  • SP diffs are readable, .dtsx diffs are not. That is a strategic architectural decision, not a tool question — and it carries across stacks to every XML- or JSON-based ETL format.
  • Trivial edit ≠ trivial diff. Renaming a script task produces eight XML region markers, and the diff does not localise the actual edit correctly. Whoever wants an audit trail has to pick a diff-friendly artifact format.
  • Versionability is an artifact-format question. Plain SQL wins: stored procedure, Liquibase/Flyway migration, sqlmesh model. XML/JSON artifacts are diffable only with special tooling.
  • Strategic answer: offload the ETL substance into SQL artifacts and keep .dtsx packages (or ADF pipelines) as pure orchestration. That keeps the audit trail in the SQL diff.

FAQ

Can I version .dtsx packages in Git in any meaningful way?

Yes — but the diff value is limited. You can commit and merge .dtsx files without trouble, but every edit makes SSIS reorder GUIDs and position properties in the XML. Practical answer: commit them, plus add a readable companion diff via BIDS Helper, a screenshot diff, or SQL offloading.

Which tool gives me a readable .dtsx diff?

Several approaches exist: (1) BI Developer Extensions (formerly BIDS Helper) for Visual Studio with Smart Diff (structural-tree diff instead of XML diff) — Marketplace builds are officially available for VS 2017/2019; no full VS 2022 build is published. (2) SSIS Compare and Merge Tool from the Visual Studio Marketplace — a community tool that covers the GUID-noise filter and also runs in VS 2022; SSDT itself doesn’t ship a .dtsx-specific diff. (3) Screenshot diff of the Control Flow and Data Flow designers is pragmatic — not a versioned diff, but visually directly graspable. Each of these three beats the naive XML diff.

How do I organise a mixed repository with stored procedures and SSIS packages?

In practice: keep SQL artifacts (SPs, views, functions) in a Visual Studio database project with *.sql files, and SSIS packages in a separate Visual Studio Integration Services project with *.dtsx files. Both projects land in the same Git repo, but pull-request reviews distinguish the artifact types: SQL diffs are reviewed as plain text, SSIS diffs need a structural-tree diff or a screenshot complement. Best practice: offload as much SQL into SPs as possible, which reduces the .dtsx diff problem to pure orchestration edits.

What does an SP diff look like in the typical pull-request workflow?

Three lines in a GitHub/GitLab/Azure DevOps PR view — the changed lines highlighted in light green, the deleted ones in red. The reviewer sees in 30 seconds that [VacationHours] was replaced by [HireDate] and the data type switched from float to date. This pattern is exactly what makes the audit trail of an ETL pipeline practicable in the first place.

SSIS-vs.-SQL cluster:

ETL context: