Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

SQL Optimization - Temporary Tables

The purpose of this documentation is to cover some basic SQL fundamentals and to understand temporary tables from another perspective. In this document we will explore how temporary tables, a popular tool for data engineers doing ETL work both on-prem and in cloud environments, can be interpreted as an anti-pattern (something commonly put forward as appropriate and effective but actually has more bad consequences than good ones).

SQL in a nutshell

SQL is the bread and butter of data engineering. The main responsibilities of the data engineer is to make data accessible and transform it for different use cases and SQL is very good at doing just that. It comes with it's own nuances, mainly it's characteristic as a declarative language. This means that you declare that you want SQL to do and it does it for you in it's own way.

SQL Core Components

To execute code, SQL uses 4 core components. The parser, optimizer, executer and the storage engine. In this document we won't go into too much detail about how these components work but we will discuss how they work and how temporary tables affect the process.

Temporary Tables

Temporary tables are a special type of table that is created and stored in a system's temporary database (for example tempdb in SQL Server). It's commonly used to create mediation results when executing a query or stored procedure. Temporary tables are automatically deleted when the session or transaction that created them ends. They are usually denoted by a # with no spaces in front of the table name.

CREATE TABLE #TempTable1 (col_id INT, value VARCHAR(25))

In terms of optimization, the use case for using temp tables is that the SQL engine doesn't have to rerun the same code over and over again (unlike a CTE). But does this mean they always go faster?

The trade off with Temporary Tables

Real world SQL code takes up some rather interesting patterns which lead to a degradation of performance. This ranges from inline spaghetti code, missing indexes and the fan favourite, multi-layer nested views. While the negative consequences of these techniques is evident, with temporary tables it's a little bit more subtle.

The main issues with temporary tables revolve around the trade off of performance with readability. when temp tables are created they do not retain indexes which can lead to more physical reads compared to a straight forward sql query. This also leads to the fact that the job of the optimizer is to create an optimal execution plan to make your code more performant. The temp tables bypasses this by telling the optimizer how it should do it's job. Therefore even though it's commonly used as best practice the use of temp tables is not always advised as for enterprise use cases the removal of indices and the lack of attention to the execution plan results in very poor sql performance.

Does this mean use temp tables in every situation? No, does this mean they're bad? No. But it's always useful to understand what your tool/technique can and can't do as the best case scenario in every case is to make sure you use the right tool for the job.

Key links: