With SQL Server 2016, Microsoft has presented to us a new feature of temporal, or system-versioned temporal tables. Even the name is a little bit confusing, it is all about two tables: one for actual, current data, and the other one for historical data.
A system-versioned temporal table is a user table that keeps a full history of data changes and the period of validity for each row is managed by the system. The main table that contains current versions of data is referred to as the current table or temporal table, while the table that stores all versions of data is referred to as the history table.
There are several cases for temporal tables and tracking history of data changes like:
- data auditing and data forensics
- slowly changing dimensions
- repairing and recovering record level corruptions
- reproducing reports and analyzing trends over time
- anomaly detection
- reconstructing the state of data as of any time in the past
HOW DOES IT WORK
As we mentioned earlier, system-versioning for a table is set as a pair of tables: a temporal table and a history table. The temporal table contains the current value for each row, and history table contains each previous value for each row with start and end time for a period for which it was valid. To change a regular table to system-versioned temporal table, we have to explicate defined two datetime2 columns. These columns are known as period columns and are used by the system to record the period of validity for each row. The first one is period start column in which the system records the start time for the row, and the other is period end column in which the system records the end time for the row. We can give any name we want to that period columns.
The history table is mirrored schema of the temporal table and it is referenced by the temporal table. The system automatically stores all changes made to the temporal table into the history table. The name of history table can be specified default by the system or by the user while creating the temporal table.
When using temporal tables, the system performs a consistency check to ensure that the schema complies with requirements and that the data is consistent. This check can be divided into two groups: schema checks and data checks. Schema check is checking that the schema is consistent like names, a number of columns and datatypes are same in both, temporal and history table, that there are no triggers, constraints or identity columns set on history table etc. Data check is performed when creating a link to an existing history table which contains data. Then it checks that period end time is greater or equal to period start time. This ensures that existing records do not overlap.
On the insert, the system records the begin time of the current transaction in period start column and the maximum value of ‘9999-12-31’ in period end column. On update, the system stores previous values of the row to the history table and records the begin time of current transaction in period end column, while in the current table the system sets a value for period start column to the begin time of a current transaction. On delete, the system stores previous values of the row to the history table and records the begin time of current transaction in period end column, while in the current table the row is removed.
Because all the times recorded in the period columns are based on the begin time of the transaction itself, all rows inserted or deleted or updated within a single transaction, will have the same UTC recorded.
HOW TO CREATE IT
We can create the system-versioned temporal tables on existing tables or create the entirely new tables with system versioning set to on. There are three different ways of creating system-versioned temporal tables which differs only by how the history table is specified:
- temporal table with an anonymous history table
- temporal table with a default history table
- temporal table with a user-defined history table
When creating a temporal table, we have to follow some rules:
- the system-versioned temporal table must have a primary key defined
- the system-versioned temporal table must have precisely one PERIOD FOR SYSTEM_TIME defined with two datetime2 columns, declared as GENERATED ALWAYS AS ROW START/END
- the period columns must be a non-nullable
- history table must be schema-aligned with temporal table
- the history table is created as a row store table with the ability of PAGE compression
- history table cannot have any of table constraints or triggers
- history table must be created in the same database as the temporal table
- the clustered index that contains the period columns is automatically generated on the history table if the temporal table is not created with a user-defined history table
- temporal and history tables cannot be FILETABLE and cannot have a FILESTREAM datatype
- ON DELETE CASCADE and ON UPDATE CASCADE are not permitted in SQL server 2016 but it is supported in SQL server 2017
Creating the temporal table with anonymous history table is a most straightforward and easiest way of creating it. We specify a schema of the temporal table and let the system to create a corresponding history table with an auto-generated name. This is a good way when we need to quickly create a temporal table for testing reasons. The syntax is as follows:
The backdown of this way of creation of temporal table is that the system will automatically create a name of history table. Even it is always created in the same schema as a temporal table, it also adds a suffix to the name of history table to ensure uniqueness of table name, and then it is harder to query a data in the history table because we have to put the whole name of the table in query:
Creating the temporal table with default history table is the excellent option because the system automatically builds a history table with all default configuration, but we can specify a name of that history table. The syntax is:
As we can see, the only difference in syntax is that when creating the temporal table with default history table, we must specify the name of history table within WITH clause. While creating a default history table, some special rules must be applied:
- the schema name is mandatory
- if the schema does not exist, the creation of the table will fail
- if there is already a table with the same name, it will be validated against the newly created temporal table.
Creating of the temporal table with user-defined history table is convenient when we want to specify a history table with specific indexes and storage options. We then have to first create a history table with indexes, and then we create a temporal table with specifying a user-defined history table.
ALTERING NON-TEMPORAL TABLE TO TEMPORAL
We can enable a system versioning on an existing table. There are a lot of benefits for that like less complexity, immutable history, better DML performance, minimal maintenance costs etc. When converting an existing table to system-versioned temporal table, we should always use the HIDDEN clause on period columns so that they do not appear in searches. That way we don’t have to change the application to work with new columns and we avoid possible errors.
The first thing when adding versioning to the non-temporal table is to add two-period columns to the existing table but with the default constraints. That default constraints must be carefully chosen and it has to be as UTC for start period column. Adding period columns will perform a data consistency check to ensure that periods defaults are valid. After that, we must provide a name for empty history table, or let that system generate the name on its own.
When we want to migrate tables with trigger-based data tracking to built-in temporal support, we have to drop triggers on the existing table and make sure that all period columns, in history and temporal tables, are set to be non-nullable. Then we add existing period columns to PERIOD FOR SYSTEM_TIME definition. That way we implicitly setting generated_always_type to as_row_start/end to those columns. To be sure that we have done all correctly, it is good to enforce data consistency check on existing data.
If we want to alter a temporal table by adding, removing or altering columns, we can do that using ALTER TABLE command. When we make a change to the temporal table, the same changes are applied to the history table. During altering table, the system holds a schema lock on both, temporal and history table. Also, if we are adding a non-nullable column, or changing existing one to non-nullable, then we must specify the default values for that column. After the new non-nullable column in history table is populated, it is a good practice to drop a default constraint on history table, because all columns are populated already automatically when there is a change in the temporal table.
There are also few types of columns that are not possible to add with alter table while the system versioning is set to on. For that types, we have first set system versioning to off and then add columns to the temporal and history table. These types are computed columns, identity columns, sparse columns, column_set, and rowguidcol column.
QUERYING TEMPORAL AND HISTORY DATA
The syntax is similar to the regular SELECT statement with additional FOR SYSTEM_TIME clause in FROM clause. We can use that syntax directly on the table, in the joins, on the views, CTEs, table-valued functions, and stored procedures. The FOR SYSTEM_TIME clause has five temporal-specific sub-clauses to query temporal and history tables: AS OF datetime, FROM starttime TO endtime, BETWEEN starttime AND endtime, CONTAINED IN (starttime, endtime), ALL. Each sub-clause is specific and returns the different set of data. When we want to get actual data, we query table same as the non-temporal table. If we do not want to show period columns we can use HIDDEN property on that columns so they do not show in results. If the HIDDEN property is already set on period columns, and we want them to show in the result, we must reference them specifically in the SELECT statement. Period columns are set as UTC and if we want to query period columns in local time we can use an AT TIME ZONE hint.
AS OF sub-clause returns a data with rows containing the values that were actual at the specified time. Query search for data that have a period start date equal or less than the search date and period end date greater than the search date. It’s the same as we use a WHERE clause PeriodStartTime <= search_date_time AND PeriodEndTime > search_date_time.
FROM TO sub-clause is used for querying a specified range time. With this sub-clause, we search for data that have a period start time less than search end date and period end time greater than search start time. Querying with WHERE clause would be PeriodStartTime > search_start_date_time AND PeriodEndTime < search_end_date_time.
BETWEEN AND sub-clause is same as FROM TO sub-clause except the query results includes rows that became active in the given time range. For this one, WHERE clause would be PeriodStartTime <= search_end_date_time AND PeriodEndTime > search_start_date_time.
CONTAINED IN sub-clause returns all the data that were opened and closed in the given time range. With WHERE clause: PeriodStartTime >= search_start_date_time AND PeriodEndTime <= search_end_date_time. This is a recommended clause when searching for history versions only.
ALL sub-clause returns all the data from history and temporal table.
FOR SYSTEM_TIME clause filters out rows that have PeriodStartTime = PeriodEndTime. These rows can happen when we have a multiple data manipulation on the same primary key within the same transaction because for period start and end times system uses the time of starting the transaction and not of the actual time of manipulation with data. If we need that data also, we have to query history table directly.
MODIFYING DATA IN A SYSTEM-VERSIONED TEMPORAL TABLES
Modifying data in a system-versioned temporal table is same as modifying data in any non-temporal table with differences that a period columns cannot be modified. When inserting data, we need to refer to the period columns if they are not hidden. If they are hidden, then we do not need to account them, because the system will automatically generate values for these columns. If the period columns are visible, then we have three different scenarios for inserting data. The first is when we specify the column list in insert statement, and we omit period columns. Then we do not have to specify values for period columns. The second is, if we do specify the period columns in columns list for insert statement, then we must specify default as values for period columns. The third is, if we do not specify column list in insert statement, then we also have to specify default as values for period columns.
Inserting data with PARTITION SWITCH is possible but the staging table that is used with a system-versioned temporal table must have SYSTEM_TIME PERIOD defined, even it does not have to be a system-versioned temporal table.
The update is performed with the regular update statement. We can also update data in the temporal table using data from the history table. But, we can not update a period columns and we can not update history table while system versioning is set to on. If we want to update the history table, we have first set system versioning to off and then make an update. But if we do that, the system can not guarantee us that the data is correct.
When using a delete statement, the data is deleted from the temporal table, but it will remain in history table with end time column populated with a begin time of the started transaction. Deleting data from the history table and truncate statement are not possible while system versioning is set to on.
HOW TO STOP SYSTEM-VERSIONING
We can stop system versioning at any time we want. It could be because of maintaining the temporal table or we do not need system versioning anymore or even if we have to update some data in the history table (although this is not recommended because then we do not have a correct data in history table anymore). The syntax is straightforward:
After that, we will have two independent tables in the database: current table with period definitions and history table as a regular table. It is good to know that there is no data loss when setting system versioning to off and the system will continue to update period columns in the current table if the period column is not dropped.
With system-versioned temporal tables, we must look out on the increase of database size. The broader history tables are, the higher the database size is. That leads to higher storage costs and lack of performance on temporal querying. For that reason is good practice to have a data retention policy. There are four approaches for managing historical data in the temporal history table: stretch database, table partitioning, custom cleanup script and retention policy.
Stretch database migrates merely data to Azure. We can stretch the entire history table if it is about data audit or stretch a portion of the history table if we often query historical data. When stretching a portion of the history table, we need to specify a predicate function to select the rows that will be migrated from the history table.
Using a table partition approach makes greater scalability and tables more manageable. With table partition, we can move out the oldest partition of the historical data from the history table and keep the size of the retained part constant concerning age.
When using a custom cleanup script to delete old history data, we have first set system versioning to off. We make a generic script that will run periodically as scheduled job and delete historical data. We should delete data from all temporal tables and in small chunks to prevent long blocking.
Applying retention policy is simple because it requires only one parameter to be set. It can be configured at the individual table level, so users can create flexible policies. To configure a retention policy, we first have to check if the temporal historical retention is enabled at the database level, and after that set history retention on the table.
MEMORY-OPTIMIZED SYSTEM-VERSIONED TEMPORAL TABLES
There are system-versioned temporal tables for memory-optimized tables that provide high transactional throughput, lock-free concurrency and large storage amount. It allows us to store current data in-memory and a full history of changes on disk at an effective cost. When creating the system-versioned temporal table with memory-optimized tables we have to consider the following rules:
- only durable memory-optimized tables can be system-versioned
- history table must be disk-based
- data from the internal memory-optimized staging table is regularly moved to the disk-based history table by the data flush task
- querying of historical data returns a union between in-memory staging buffer and disk-based table
Internal memory-optimized staging table is an object created by the system to optimize DML operations and it is not represented in Object Explorer of SSMS.
Data flush task is a task that periodically checks if any memory-optimized table meets a memory size-based condition for moving data. When it does, it deletes all records from an in-memory internal buffer that are older than the oldest currently running transaction to move these records to the disk-based history table.
Modifying data in system-versioned memory-optimized temporal tables are done through a natively compiled stored procedure. But we have to expect excellent performance impact on update and delete operations because the history table is updated automatically. Therefore, we should not perform massive deletes or updates at once.
System-versioned temporal tables are great solutions in most cases. It sure has a lot of advantages for keeping historical data against the old-school way. But, it has also a few restrictions that maybe are not implementable in all scenarios. One problem is that the period times are calculated as a time of the transaction, and not as a time of actual data modification. But regardless that, it is a much simpler way of keeping track of data versions, and it inevitably brings a lot easier database development.