Formatting SQL Statements (Part 2)

Dieser Artikel in Deutsch…

Introduction

Developing SQL code follows a completely opposed paradigm compared to developing software code. While one single SQL statement may consist of hundreds of lines, software commands tend to be rather short and compact. You use many commands for achieving a goal in software development. In SQL server it is highly recommended to write one single statement as SQL is based on the mathematical set theory. Well known and accepted programming styles are therefor not applicable to SQL code.

Integrated development environments such as Microsoft Visual Studio supports developing  structured and formatted code with automatic source code formatting and code analysis features. SQL Server Management Studio (SSMS) proves to have a lack of comparable features when developing SQL artifacts.

With that it appears to much more challenging writing good SQL statements than software code.

SQL provides a rich vocabulary. The core of SQL, however, is querying data. At least it is that what the abbreviation SQL stands for: Structured Query Languages. A SELECT statement can be regarded as a synonym for SQL. In this article I have written down some best practice rules for structuring and formatting a SELECT statement. Of course, the rules mentioned here can also be applied to INSERT, UPDATE and DELETE clauses, as well as – for example – to the keywords for the control flow of statements: IF … ELSE, TRY … CATCH, WHILE, etc.

All best practices have in common that similar elements are left aligned and indented neatly. The indentation structures a statement, while a uniform alignment of similar language elements helps to organize sub elements in a table like manner. Both are important for a quick visual navigation even within complex statements. Of course, it is not possible to always meet both rules. This article should only be a rough suggestion for your own thoughts on the uniform structuring and formatting of SQL statements.

If all developers in a multi-developer project can agree on how to format and structure SQL statements, each developer and especially new team members will appreciate this very much after a certain time.

The following aspects of structuring SQL statements are considered in this article:

Basic thoughts in indentation

The format of a table of content (TOC) is not only a question of style but also a matter of clarity and with that readability. A common formatting style includes indentation of sub chapters as shown in the following example:

1. Chapter Level 1
   1.1. Chapter Level 2
      1.1.1. Chapter Level 3
      1.1.2. Chapter Level 3
   1.2. Chapter Level 2
      1.2.1. Chapter Level 3
      1.2.2. Chapter Level 3
      1.2.3. Chapter Level 3
      1.2.4. Chapter Level 3
   1.3. Chapter Level 2
      1.3.1. Chapter Level 3
      1.3.2. Chapter Level 3
      1.3.3. Chapter Level 3
2. Chapter Level 1
   2.1. Chapter Level 2
      2.1.1. Chapter Level 3
      2.1.2. Chapter Level 3
   2.2. Chapter Level 2
      2.2.1. Chapter Level 3
      2.2.2. Chapter Level 3

Compare the first example with the following TOC that left aligns all chapters and sub chapters in the same way:

1. Chapter Level 1
1.1. Chapter Level 2
1.1.1. Chapter Level 3
1.1.2. Chapter Level 3
1.2. Chapter Level 2
1.2.1. Chapter Level 3
1.2.2. Chapter Level 3
1.2.3. Chapter Level 3
1.2.4. Chapter Level 3
1.3. Chapter Level 2
1.3.1. Chapter Level 3
1.3.2. Chapter Level 3
1.3.3. Chapter Level 3
2. Chapter Level 1
2.1. Chapter Level 2
2.1.1. Chapter Level 3
2.1.2. Chapter Level 3
2.2. Chapter Level 2
2.2.1. Chapter Level 3
2.2.2. Chapter Level 3

The table of content of the first TOC appears to be much more comprehensible due to the indentation than the second one.  Of course, you will find also TOCs that are completely left aligned. Those TOCs often facilitate formatting features like bold letters and font size to increase the readability of the table of content.

Main language elements

The main language elements of a SELECT statement include the following clauses:

SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY

Considering theses keywords as the main elements of level one all language elements and keywords of the second level must be indented by the agreed amount of white spaces when following the above stated rule of indentation. As a result, a well-structured statement would look like as follows:

SELECT
   field list
FROM
   table//view//cte//sub-SELECT
WHERE
   constraint
GROUP BY
   field list
HAVING
   constraint
ORDER BY
   field list;

In any case, the main elements of a SQL statement should be noted in separate lines. Don’t mix them up with second level elements.

In opposite to the above example let’s have a look at two commonly found styles, where no clear rule concerning indentation was applied. Both styles force the reader of the statement to in fact read the statement at least in parts in order to identify the main elements.

Completely left aligned SQL statement

Occasionally, you may find top-level elements and next-level elements equally left aligned. The following example applies this rule among others to the FROM clause. All data sources are indented to the FROM keyword equally:

SELECT field 1, field 2, field 3
FROM table 1
LEFT JOIN table 2 ON [...]
LEFT JOIN table 3 ON [...]
WHERE constraint 1
OR constraint 2
OR constraint 3
GROUP BY field 1, field 2
HAVING constraint 1
OR constraint 2
OR constraint 3
ORDER BY field 1, field 2;

Keywords are right aligned

When asking Google for other formatting style guides, I came across with a quite curious best practice rule: Right align keywords in each line. For those key phrases consisting of two keywords, apply the rule to the first keyword. At the first glance the statement appears to be more or less structured. However, in more complex statements this rule does not serve readability and comprehensibility, because it does not allow identifying easily where e.g. a FROM clause starts and where it ends.

SELECT field 1, field 2, field 3
  FROM table 1
  LEFT JOIN table 2 ON [...]
  LEFT JOIN table 3 ON [...]
 WHERE constraint 1
    OR constraint 2
    OR constraint 3
 GROUP BY field 1, field 2
HAVING constraint 1
    OR constraint 2
    OR constraint 3
 ORDER BY field 1, field 2;

For rather simple statements, however, this rule may serve a good readability and comprehensibility: e.g. for a CREATE INDEX statement.

CREATE UNIQUE NONCLUSTERED 
 INDEX [IndexName] 
    ON [dbo].[FactInternetSalesReason]([SalesOrderNumber] ASC);

SELECT field list

The natural reading order (of an SQL statement) is left to right and top to bottom. Regarding the navigation options with a keyboard or the mouse, the vertical navigation is much easier than the horizontal navigation. The mouse wheel and the keys Page Up/Page Down allow for a fast vertical navigation even within long statements.

The number of lines of a SQL Statement can easily grow when writing every field of a field list in a separate row. In my opinion, a list of field names should occupy one line per field or expression with a prefixed comma (see Formatting SQL Statements (Part 1); section The comma). With that you get a neatly left aligned list of field names that should be indented by the agreed amount of white spaces.

SELECT
    field1
   ,field2
   ,field3
FROM [...]   
WHERE [...]
GROUP BY
    field1    
   ,field2
   ,field3 
HAVING [...]
ORDER BY [...]
    field1    
   ,field2
   ,field3

WHERE clause

The WHERE clause is explained next here, instead of continuing with the FROM clause. The following rules are best explained first on the base of a rather simple WHERE clause instead of a more complex FROM clause. However, the rules may be applied to both clauses and others.

A WHERE clause contains one or more constraints that are connected with logical operators. When formatting constraints in a WHERE clause, you should give a thought to two issues:

  • Alignment of operands and operators
  • Indentation of equivalent constraints

Allignment of operands and operators

A simple constraint usually consists of two operands and an operator that indicates how to compare the two operands: =, <, >, !=, IN, NOT IN, etc. You may find a compound WHERE clause more readable when aligning the operands and operators neatly regardless of the length of the left field names and the operands as in the following example:

    T01.[fieldl___1]    =  'something'
AND T01.[fieldl__2]     <> 1
AND T01.[fieldl_____3]  NOT IN (1, 2, 3)
AND T01.[field4]       = T02.[field5]

As a result, you will have a table-like notation of the compound WHERE clause. It allows the reader to quickly and visually safely navigate within the components of the WHERE clause.

Indentation of equivalent constraints

When using more than one constraint you must connect the constraints with logical binary operators AND or OR. In more complex expressions must even use parentheses for enforcing a certain evaluation order of constraints or expressions. Expressions, that must be evaluated as a whole, must be enclosed in parentheses. Nested expressions are common. With that you quickly achieve a complexity in WHERE clauses that needs attention concerning formatting rules. In order to clarify the evaluation-order you should indent all expressions that will be evaluated at the same time by the same amount of white spaces as in the following example:

[...]
WHERE 
   (
          (
                [operand01] = [operand02]
             OR [operand03] = [operand05]
             OR [operand05] = [operand06]
          )
      AND (
                [operand07] = [operand08]
             OR [operand09] = [operand10]
          )
      AND [operand11] = [operand12] 
   )
OR (
      [operand13] = [operand14]
   )

A screenshot of the same WHERE clause in Notepad ++ may help to understand the motivation for the indentation, as Notepad++ draws vertical a helper line after each tab within those areas of a line that are either occupied by whitespaces or tabs.

FROM clause

We of course can apply the above-mentioned concepts on the FROM clause, too. The FROM clause, basically, specifies the data source of a query: Tables, Views, Common Table Expressions and  Sub-Selects.

Annotation on Sub-Selects

Using Sub-Selects within a FROM clause is far from being beneficial for the readability. Sub-Selects should be written as Common Table Expressions (CTE) at the beginning of a SQL statement wherever possible. CTE’s do have many advantages over Sub-Selects.

A FROM clause can contain – among others – the following language elements:

  • Table
  • JOIN Operator
  • Keyword ON
  • JOIN Constraints

The following statement shows all language elements left aligned. It is hard for the reader to identify quickly the data sources and how the data sources are joined… one of the most important information of a SQL Statement:

SELECT
[...]
FROM
table1 T01
JOIN table2 T02
ON T01.[field1] = T02.[field1]
JOIN table3 T03 ON
T02.[field2] = T03.[field2] AND T02.[field3] = T03.[field3]
JOIN table4 T04
ON T03.[field4] = T04.[field4]
WHERE
[...]

Here are some ideas on how to improve the readability of a FROM clause:

  • Put all above mentioned language elements in separate lines.
  • There is one exception from the last rule: Joined tables, views or CTE’s should be written directly after the JOIN Operator.
  • I prefer writing the ON keyword in a separate line. With that I find it easier to identify where JOIN constraints start in a FROM clause.
  • For JOIN constraints I always try to apply the concepts of a WHERE clause.

When applying these rules to the above statement we get the following FROM clause:

SELECT
   [...]
FROM
   table1 T01
   JOIN table2 T02
   ON
     T01.[field1] = T02.[field1]
   JOIN table3 T03
   ON
         T02.[field2] = T03.[field2]
     AND T02.[field3] = T03.[field3]
   JOIN table4 T04
   ON
     T03.[field4] = T04.[field4]
WHERE
  [...]

GROUP BY, HAVING, ORDER BY clauses

The above mentioned concepts can easily be adapted for the remaining main language elements: GROUP BY, HAVING and ORDER BY clause. A GROUP BY clause and the ORDER BY clause usually contain a field list. So, apply here the concepts explained in the section SELECT field list. A HAVING clause is similar to the WHERE clause. So, aplly here the conscepts of the WHERE clause.