Maintaining Table Data with Partitions

2017-10-18 1468 words 7 minutes

Contents

Recently I was reviewing the growth of some of our third party vendor databases and discovered that one vendor had these handy, enormous audit tables that hadn’t been touched since they were created. Records were constantly being added and as a double surprise (SURPRISE) there was no way via the application to ensure these tables were getting the cleanup they needed. After some back and forth (triple SURPRISE) it fell to us to manage this data. We met with the business owners and determined a reasonable retention policy and set out to create some (relatively) simple SQL agent jobs that would clean up the data nightly and keep things tidy.

The single biggest problem we ran into was that being audit tables they were heavily used and deleting records potentially caused the application to grind to a halt while SQL took out locks to remove records (especially in the early day as we tried to purge the really old records. Eventually we got most of the purging working through proper scheduling and batch deleting. But one table held out as a problem child. The solution that worked on all the other tables simple could not delete records faster that they were being created (without grinding the system to standstill). Enter our last and only hope: partition switching! We had to schedule some down time to rebuild the table but once the table was rebuilt we were able to remove records from the table months at a time to temp tables which we could drop as soon as we switched the data into it. And best of all the blocking (if any) caused by the schema lock from the partition switch was a once a month deal! Sample code follows.

DO NOT JUST RUN IT ON YOUR PRODUCTION SERVERS: STRANGE DBAs GOOFING ON THE INTERNET DISTRIBUTING CODE IS NO BASIS FOR A SYSTEM OF DATA MANAGEMENT.

*ahem* Now that that’s out the way:

First you’ll need to create a partition function and partition scheme to rebuild your table across. Build your partition function first and then your partition scheme. With partition functions make sure to create empty partitions on either side of the series that will hold no data (if at all possible). The partition scheme below keeps the table all on the same file group (primary) but can be altered if needed. In our case we are changing a vendor database as little as possible!

USE [VeryBusyDatabase];
GO

CREATE PARTITION FUNCTION [pf_MonthlyWindow] ( DATETIME2(7)) -- This data type must match your partitioning column data type exactly!
AS RANGE RIGHT FOR VALUES ( N'1900-01-01T00:00:00.000'
    , N'2016-08-01T00:00:00.000'
    , N'2016-09-01T00:00:00.000'
    , N'2016-10-01T00:00:00.000'
    , N'2016-11-01T00:00:00.000'
    , N'2016-12-01T00:00:00.000'
    , N'2017-01-01T00:00:00.000'
    , N'2017-02-01T00:00:00.000'
    , N'2017-03-01T00:00:00.000'
    , N'2017-04-01T00:00:00.000'
    , N'2017-05-01T00:00:00.000'
    , N'2017-06-01T00:00:00.000'
    , N'2017-07-01T00:00:00.000'
    , N'2017-08-01T00:00:00.000'
    , N'2017-09-01T00:00:00.000'
    , N'2017-10-01T00:00:00.000'
    , N'2017-11-01T00:00:00.000'
    , N'2017-12-01T00:00:00.000'
    , N'2018-01-01T00:00:00.000'
    , N'2115-01-01T00:00:00.000' );
GO

CREATE PARTITION SCHEME [ps_MonthlyWindow] AS PARTITION [pf_MonthlyWindow] TO ( [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY]
    , [PRIMARY] );
GO

Once your partition scheme and function are built it’s time to schedule a maintenance window: The table will be offline while you rebuild it across the partition. Additionally any other indexes will need to be rebuilt to include the partitioning column and built across the partition as well or partition switching will not work. Thankfully my super busy table only has a primary key. I’ll drop it and recreate it on the partition:

ALTER TABLE dbo.superbusytable DROP CONSTRAINT PK_BestKeyEver
GO

ALTER TABLE dbo.superbusytable
ADD CONSTRAINT PK_BestKeyEver
PRIMARY KEY CLUSTERED (id, insertdate ) ON ps_MonthlyWindow(insertdate); -- insertdate is an ever increasing date, we'll never be inserting "old" records
GO

Voila! Now we can bring the application back online and allow it to keep adding an ungodly number of audit records to the table. In the SQL Agent we add a job that has three job steps. The first adds future partitions to the partition function and partition scheme so we always have some empty partitions on that end of the structure. The second finds old partitions, switches them out and then drops the data. The third step removes old, empty partitions from the partition scheme and function. But before that you’ll need a to create a table with the same index structure as your partitioned table (sans partitioning). Every index on the partitioned table must be recreated on the secondary table where we’ll actually truncate the data. I cheat and create the table by selecting the top 1 records from my partitioned table, truncating it and then rebuild the needed indexes on top of that.

Step 1 dynamically adding new partitions:

/* Split upcoming partitions (work up to 3 months ahead of current month) */
DECLARE @nextPart DATE
    , @curDate DATE = DATEADD(MONTH, 3, GETDATE()); -- we'll extend our partitions out to this date

SELECT @nextPart = DATEADD(MONTH, 1, CAST(sprv.value AS DATETIME2))
FROM sys.partition_functions AS spf
  INNER JOIN sys.partition_range_values sprv ON sprv.function_id = spf.function_id
WHERE spf.name = N'pf_MonthlyWindow'
ORDER BY sprv.boundary_id DESC
OFFSET 1 ROWS FETCH NEXT 1 ROW ONLY;

/* ensure empty partitions exist to the right to
prevent long processing times to split partition range */
WHILE ( @nextPart <= @curDate )
BEGIN
  ALTER PARTITION SCHEME ps_MonthlyWindow NEXT USED [PRIMARY];

  ALTER PARTITION FUNCTION pf_MonthlyWindow() SPLIT RANGE (@nextPart);
  
  PRINT 'Added ' + CONVERT(VARCHAR(30), @nextPart, 120) + ' to pf_MonthlyWindow.'
  SELECT @nextPart = DATEADD(MONTH, 1, @nextPart);

END;

Step 2 Switch out old partitions:

/*
  Script to identify the oldest partition w/ data and swap it out

  1. identify partition
  2. script out target table (1 time?)
  3. ALTER TABLE audit.ServiceUserAction SWITCH PARTITION # TO
  4. truncate table
*/

DECLARE @part_num INT
    , @sqlcmd VARCHAR(MAX)
    , @rngValue VARCHAR(50);

WHILE EXISTS (  SELECT DISTINCT sp.partition_number
                    , rows
                    , CAST(sprv.value AS DATETIME2)
                FROM sys.partitions sp
                  INNER JOIN sys.indexes si ON si.index_id = sp.index_id
                      AND si.[object_id] = sp.[object_id]
                  INNER JOIN sys.data_spaces sds ON sds.data_space_id = si.data_space_id
                  INNER JOIN sys.partition_schemes sps ON sps.data_space_id = sds.data_space_id
                  INNER JOIN sys.partition_functions spf ON spf.function_id = sps.function_id
                  /* Join partition range values (RANGE Right means p number - 1 = boundary ID */
                  LEFT OUTER JOIN sys.partition_range_values sprv ON sprv.function_id = spf.function_id
                      AND sprv.boundary_id = sp.partition_number - 1
                WHERE spf.name = N'pf_MonthlyWindow'
                    AND sp.rows > 0
                    AND CAST(sprv.value AS DATETIME2) < DATEADD(MONTH, -13, GETDATE()))
BEGIN
    SELECT TOP (1) @part_num = sp.partition_number
        , @rngValue = CONVERT(VARCHAR(30)
        , CAST(sprv.value AS DATETIME2), 120)
    FROM sys.partitions sp
        INNER JOIN sys.indexes si ON si.index_id = sp.index_id
            AND si.[object_id] = sp.[object_id]
        INNER JOIN sys.data_spaces sds ON sds.data_space_id = si.data_space_id
        INNER JOIN sys.partition_schemes sps ON sps.data_space_id = sds.data_space_id
        INNER JOIN sys.partition_functions spf ON spf.function_id = sps.function_id
        /* Join partition range values (RANGE Right means p number - 1 = boundary ID */
        LEFT OUTER JOIN sys.partition_range_values sprv ON sprv.function_id = spf.function_id
            AND sprv.boundary_id = sp.partition_number - 1
    WHERE spf.name = N'pf_MonthlyWindow'
        AND sp.rows > 0
        AND CAST(sprv.value AS DATETIME2) < DATEADD(MONTH, -13, GETDATE()) -- 1 year was our agreed upon retention date
    ORDER BY sp.partition_number;
    
    SELECT @sqlcmd = 'ALTER TABLE dbo.verybusytable SWITCH PARTITION ' 
        + CAST(@part_num AS VARCHAR(3)) + ' TO dbo.truncatepartition;';
    
    PRINT 'Merged range value: ' + @rngValue + '.';
    
    EXEC (@sqlcmd);
    /* kill swapped out records: */
    TRUNCATE TABLE dbo.truncatepartition;
END;

And lastly removing old crusty partitions:

/*
Script to remove empty partitions older than 15 months (except partition 1 because we need at least one empty partition)
*/
DECLARE @mergedate DATETIME2
    , @sqlcmd VARCHAR(MAX);

CREATE TABLE #mergeDates ( m_date DATETIME2 );

INSERT INTO #mergeDates ( m_date )
SELECT CAST(sprv.value AS DATETIME2)
FROM sys.partition_functions AS spf
    INNER JOIN sys.partition_range_values sprv ON sprv.function_id = spf.function_id
WHERE spf.name = N'pf_MonthlyWindow'
    AND CAST(sprv.value AS DATETIME2) < DATEADD(MONTH, -15, GETDATE())
    AND sprv.boundary_id 1;

DECLARE curr_dates CURSOR FOR
    SELECT m_date
    FROM #mergeDates
    ORDER BY m_date ASC;

OPEN curr_dates;

FETCH NEXT FROM curr_dates INTO @mergedate;

WHILE @@FETCH_STATUS = 0
BEGIN
    /* merge old partitions */
    SELECT @sqlcmd = 'ALTER PARTITION FUNCTION pf_MonthlyWindow() MERGE RANGE ('''
        + CONVERT(VARCHAR(30), @mergedate, 120) + ''');';
    
    EXEC (@sqlcmd);
    
    FETCH NEXT FROM curr_dates INTO @mergedate;
END;

CLOSE curr_dates;
DEALLOCATE curr_dates;

DROP TABLE #mergeDates;

If you’ve got out of control tables that no one else is willing to tend to this will do the trick. Although remember that altering your vendor database without their consent may void your support contract and any number of other considerations. As with anything else you find on the internet: test it and see if it works for you before deploying to production. A huge hat tip to Kendra Little (b|t) whose introduction to partitioning made any of this possible for my own problem database. I hope that if nothing else I’ve added another avenue for some else to find her great break down of partitioning.