SQL Saturday Spokane (Part 1)

In 8 days Spokane will play host to it’s first ever SQL Saturday event (yay!) Right after I got my first DBA job I quickly discovered the PASS community (via Brent Ozar to be specific, the senior DBA was quite the fan and their 6 month JR DBA Training plan was invaluable in getting my sea legs). I quickly discovered a truly welcoming and helpful community that at its core is deeply invested in the success and growth of its members. I attended my first SQL Saturday in Portland in 2014 (and again in 2015 and Redmond in 2016). I learned about Summit which I first attended in 2015 (and again in 2016). After attending I was frustrated that there was no local user group to connect with (boo!) but as soon as I decided to take steps to create a Spokane PASS chapter (or join us on fb) another local DBA beat me to the punch (yay!). However I found myself running the group in the fall of 2016 and a small group of us have met regularly since then.

Something was missing still though: in a town of a quarter million surely we had more than 10 DBAs. I reached out to the wonderful Melody Zacharias (b|t) to get her thoughts about a SQL Saturday in our neck of the woods. She was encouraging and graciously agreed to help organize our own event. Today I am excited to report that our event is over 75% booked with more than half of those registered are new to SQL Saturday (and likely to the PASS community at large). I have high hopes for the success next weekend: if you are one of the many people who submitted sessions to present: thank you, just receiving all your submissions validated our idea. If you are one of the few that were selected to present and able to join us next weekend: thank you, I am incredibly excited to meet and learn from you. If you’ve registered to attend: thank you, I am excited to meet you and hopefully get the chance to tell you about how great PASS is and invite you to our monthly meet ups as well!

Goals for 2018

It’s rare that I ever look back at a year in review and think “Yeah, I nailed it.” 2017 maybe even less so. It found it difficult to maintain long term focus and a lot of little projects definitely fell through the cracks. That being said here are my goals for 2018 (both personal and professional):

  • More technical posts:
    • I should be more comfortable providing technical information to a wider audience. I might not post more at this point, but I like to post more helpful/specific articles.
  • Pass the 70-463 Exam:
    • This would complete my SQL MSCA and would be a nice milestone. I’ve been slowly working my way through test prep materials but I need to ramp it up and get it learned.
  • Continue learning to play the ukulele:
    • I know a few chords but it’d be nice to get an actual song or two into my skill set. Especially since my father has taken up building instruments and has nearly completed a tenor uke for me.
  • Pass my nidan test:
    • I’m an aikido instructor (evenings and weekends) and this last year my personal training has fallen off a bit. I need to buckle down and put the work in on my own progress.

What are your goals for next year?

Maintaining Table Data with Partitions

Recently I was reviewing the growth of some of our third party vendor databases and discovered that one vendor had these handy, enormous audit tables that hadn’t been touched since they were created. Records were constantly being added and as a double surprise (SURPRISE) there was no way via the application to ensure these tables were getting the cleanup they needed. After some back and forth (triple SURPRISE) it fell to us to manage this data. We met with the business owners and determined a reasonable retention policy and set out to create some (Relatively) simple SQL agent jobs that would clean up the data nightly and keep things tidy.

The single biggest problem we ran into was that being audit tables they were heavily used and deleting records potentially caused the application to grind to a halt while SQL took out locks to remove records (especially in the early day as we tried to purge the really old records. Eventually we got most of the purging working through proper scheduling and batch deleting. But one table held out as a problem child. The solution that worked on all the other tables simple could not delete records faster that they were being created (without grinding the system to standstill). Enter our last and only hope: partition switching! We had to schedule some down time to rebuild the table but once the table was rebuilt we were able to remove records from the table months at a time to temp tables which we could drop as soon as we switched the data into it. And best of all the blocking (if any) caused by the schema lock from the partition switch was a once a month deal! Sample code follows.


*ahem* Now that that’s out the way:

First you’ll need to create a partition function and partition scheme to rebuild your table across. Build your partition function first and then your partition scheme. With partition functions make sure to create empty partitions on either side of the series that will hold no data (if at all possible). The partition scheme below keeps the table all on the same file group (primary) but can be altered if needed. In our case we are changing a vendor database as little as possible!

Once your partition scheme and function are built it’s time to schedule a maintenance window: The table will be offline while you rebuild it across the partition. Additionally any other indexes will need to be rebuilt to include the partitioning column and built across the partition as well or partition switching will not work. Thankfully my super busy table only has a primary key. I’ll drop it and recreate it on the partition:

Viola! Now we can bring the application back online and allow it to keep adding an ungodly number of audit records to the table. In the SQL Agent we add a job that has three job steps. The first adds future partitions to the partition function and partition scheme so we always have some empty partitions on that end of the structure. The second finds old partitions, switches them out and then drops the data. The third step removes old, empty partitions from the partition scheme and function. But before that you’ll need a to create a table with the same index structure as your partitioned table (sans partitioning). Every index on the partitioned table must be recreated on the secondary table where we’ll actually truncate the data. I cheat and create the table by selecting the top 1 records from my partitioned table, truncating it and then rebuild the needed indexes on top of that.

Step 1 dynamically adding new partitions:

Step 2 Switch out old partitions:

And lastly removing old crusty partitions:

If you’ve got out of control tables that no one else is willing to tend to this will do the trick. Although remember that altering your vendor database without their consent may void your support contract and any number of other considerations. As with anything else you find on the internet: test it and see if it works for you before deploying to production. A huge hat tip to Kendra Little (b|t) whose introduction to partitioning made any of this possible for my own problem database. I hope that if nothing else I’ve added another avenue for some else to find her great break down of partitioning.

Why I’m Hoping I Can Attend Training with the SQL Skills Team

I’ve been eyeing the training programs at SQLSkills for some time now. I’ve managed to catch a couple of SQL Skills team’s sessions at Summit and they have been very helpful and very informative. I’ve only been doing this gig for 4 years now and every webinar, SQL Saturday or Summit has been a wealth of useful information that I’ve been able to take back and use to do my job better. I’m starting to reach that tipping point where I probably should start looking at what I know that I could share with others but frankly there is so much I still don’t know (at least not a level that satisfies me anyway) I’m not quite there yet. Which is one of the reasons the IEOTO1 has been on my radar (even when it was just IE1). I work primarily with a lot of 3rd party databases and I’ve found that even though I can do less with them most of my wins come from a better understanding of SQL internals which the SQLSkills team has in spades. I want to attend training because I want to be better at my job today, tomorrow and next year. SQL Server fundamentals to feel completely intuitive to me is just the very beginning of that path. I also need a chance to get an authentic Chicago Italian beef, but that would just be a bonus.

If you’d like to attend SQLSkills training they are running a competition now for a free seat at one of their training courses (IEPTO1 or IEPTO2) details are here.

Analyzing Tables for Compression in SQL Server

This is mostly because I can’t find the link to this Microsoft Technet Article easily enough. Like most things SQL server there is a lot nuance to compressing tables and indexes in your databases and I’m going to just touch on the first pass look I’m performing.

The first step is to look at the usage patterns on the table. The technet article splits this into two queries that I have combined into one single query below. What I am looking for first is how much the table/index is updated: more updates will mean more CPU overhead as pages/rows are decompressed before being updated. If the server doesn’t have many spare CPU cycles to spare I could be helping to create a bottleneck at the CPU. Secondly: how many scans happen on the table/index? More scans means it’s more likely that we’ll get a performance benefit when we go to read the the object from disk as we will do less physical reads.

This is important information but if you are looking at enabling compression there is probably a more immediate consideration as well: you need to conserve space on your data drives. The stored procedure below will give you an idea of the space savings benefit for either ROW or PAGE level compression. The data types in the index or table will have a large impact on how much SQL is able to squeeze data down. This ultimately (in tandem with the usage patterns determined above) drive your decision to compress your data. If you have spare CPU cycles to burn and you can reclaim a large amount of space in your database (or even disk drive if needed), then compression is going to an important tool in your toolbox. We recently reclaimed nearly 200 GB in our data warehouse by compressing our largest table (260 GB down to 56 GB). The table was rarely updated but scanned often so in addition to gaining back significant database space we’ve also sped up the ETL processes that access that table as a bonus!

The technet article is great and goes into greater depth on what’s happening under the hood when you compress tables as well as providing more detailed examples of compression decision scenarios and really is recommended reading if you are currently looking to implement compression.

Are Your Partitions on the Correct Filegroup?

Last summer we discovered that some records that had been migrated to a read-only filegroup via partitioning were still getting updated (here’s the sordid details on stack overflow). Working with the SQLCat team at PASS in October we discovered that while the partitioning scheme indicated the records should be on the read-only partition according to the allocation unit mapping showed the records will still resident on the active filegroup. For unknown reasons during partition maintenance the records hadn’t moved properly. Working with MS Support we weren’t able to replicate the issue or determine how it happened. Even more insidious is that there is no good way to detect if the issue is occurring unless you go specifically looking for it. It only came to our attention because we were setting some records to read only. If you are doing partition maintenance which moves records physically on the disk you might be interested in the following query which will detect when there is a disconnect between where your partition scheme thinks records will reside and where sys.allocation_units tells you those same records reside:

Partial Backups With Dell Litespeed

We currently use Dell’s Litespeed for SQL Server for our SQL backup compression and encryption needs. It has an intuitive GUI, manages log shipping and we are generally pretty happy with it. I’ve got a couple of problem databases where I don’t want the vanilla out of the box full backups. A couple of our data warehouse dbs are now over a terabyte in size and continuing to grow. We’re implementing partitioning to help with index and statistics maintenance as well as backups. The down side is that there is no GUI for partial backups with Litespeed so I’ve had to fumble through it with the help of their support. Here then are the working scripts for taking partial backups (and restores) of databases using the Litespeed stored procedures. Our database is in simple recovery mode and we’ll be backing up a read-only file group as well as all our active files.

First backup your read-only file group. You only need to back this up once. Or until an end user has you set it to read-write, and updates records. When that happens be sure to set the file group back to read-only and then back it up again.

The xp_backup_database also has a @read_write_filgroups parameter we’ll use later. It’s important to note that if you include it here with a value of 0 (which seems an intuitive thing to do) you will essentially take a full backup rather than a backup of just the intended filegroup. Including @returndetails is optional but will instruct Litespeed to return a row of information about the backup as it was processed.

Once you have your read-only files backed up you can back up the read-write file groups daily (or hourly or whatever works for your SLA) with the following code. Notice we’ve removed the specific filegroup parameter and added the @read_write_filegroups parameter.

Once you’ve taken a full backup you can do differential backups of the read-write file groups as well by including an additional @with:

Restore scripts look very similar to the backup scripts. If you do not have differential backups of your read-write file groups change ‘NORECOVERY’ to ‘RECOVERY.’

Restoring a read-write differential backup (RECOVERY should be changed to NO RECOVERY if there are multiple differentials to apply):

Finally read-only file groups can be restored after the database has been brought online after restoring all backups of read-write file groups:

Updating Tables with Read-Only Partitions

I learned an important thing today. I’ve started partitioning our data warehouse and we’ve moved into our testing phase. The partitions are in place, the archive is read-only and once a month the automated partition maintenance kicks off. It all seemed to be going swimmingly until…

A couple of times our data warehouse folks have found that an updated to a record on the read-write partition would fail with the following error:

Msg 652, Level 16, State 1, Line 1 The index “PK_STG_PHX_LN_ACCT_INT_OPT” for table “dbo.STG_PHX_LN_ACCT_INT_OPT” (RowsetId 72057594415022080) resides on a read-only filegroup (“ARCHIVE”), which cannot be modified.

Generated by this lovely ETL software generated SQL:

The table is partitioned on a column  (ETL_VERS_END_DTM)  that indicates if the record is current or not so *all* records being updated are expected to be on one end of the partitioned table. Since the partitioned column was not included in the WHERE clause of the update statement SQL generated an execution plan that indicated it would not be inconceivable the update statement could affect rows in read-only partitions of the table and then failed.

The fix is simple: include the partitioning column in the where clause. In our case we know that records being updated will always have a datetime value of ‘99990101 00:00:00.000’ (because consultants) and they are partitioned accordingly.

And the SQL is able to determine the query will not touch a read-only partition (and since the partition column value is not being updated there is no risk of movement to a read-only partition).

In a currently strange case of SQL Magic (otherwise know as SQL internals): setting  ETL_VER_END_DTM to the expected value (i.e. it’s current value) but not including it in the where clause also produces an execution plan that allows SQL to proceed and succeed despite not eliminating the possibility it exists in the read-only partition before it begins. I’ll update this later with a link to any explanation that I find as I try to wrap my head around it.

UPDATE: In the simplest terms (for folks like me) the query originally ended up with a shared rowset for the both the read and the update when it encountered the read-only partition SQL had to bail. Including the end date in the update clause prevents SQL from using a shared rowset under the covers so when it gets to the update portion its rowset includes only rows from the read-write partitions. A link to Paul White’s post on how rowset sharing causes the error is here.

Many thanks to Aaron Bertrand (b|t) and Paul White (b|t) for helping me to understand what was going on over on Stack Exchange. Not only was I able to wrap my head around some non-intuitive internal processes but it reaffirmed why I think the SQL Family is so great.

What’s In Your SLA?

Lets be honest SLAs are not particularly exciting. It’s a bunch of words you have to put together after (ugh) talking to end users who may not even know you exist. But hands down they are probably one of the most important documents you need to maintain (yes maintain, don’t just make it once and assume you are good). I recently checked in with my local user group and the results were mixed: some had SLAs but they existed for the application, not  for the database, some just had an agreed on RPO/RTO (these are the six letters that can get you fired), some had nothing. 99% of the databases in my environment are for third party vendors and I’ve found that I need more than just RPO & RTO. Here’s my recommendation for what you as a DBA should have on record for your all databases.

What is this and whose is it?

What application(s) are covered by the SLA? What SQL instance does the database (or databases) reside on? Pretty straight forward, this helps create a record of where you expect data to live. If someone comes to you with problems for the application for a database not listed in the SLA you’ve got a problem. Who is the product owner at your company? Susan from Loan Services? Bob from Marketing? This person should be in charge of the product for end users. They’ll know when and how it’s used and should have a good idea of what data is going into it. Who is the technical owner? If you aren’t in charge of the application there’s someone else out there in IT working to keep the rest of the application running. This person is going to have the contact information for support with the vendor. This is important particularly when you need to find out if that new service pack is supported by that vendor or if installing it will void your support contract.

Let’s about those six letters: RPO/RTO

Admittedly these are the drivers for the document. RPO (recovery point objective), this is how much data (measured in time) your business is willing or able to lose. This will drive your backup strategy for this system: do you need a daily backups? Differentials every 6 hours? Who knows? If you don’t have an agreed to RPO you can bet the product owner assumes nothing will be lost. RTO (recovery time objective), this is how long an outage the business is willing to tolerate. Do you need to be back up and running in 10 minutes? 30? Is it an operational service that can be down for a day? Just like RPO if you haven’t had that discussion with the product owner chances are they are going to assume you can just start it back up. This is the most important discussion to have so you can set realistic expectations for that product owner and make plans to budget for the solution that meets their expectations/needs going forward. Most important is to have the business document why the RPO/RTO are needed.

But wait that’s not all?

I was surprised to hear that most SLAs seem to stop there. There are number of other important questions you still need answered

What tier level is the application you are supporting? Is this a high or low importance system? This is a check on the requested RPO/RTO. If the system is not a tier 1 service but they can’t lose any data ever there is a disconnect somewhere in your organization and you need to get everyone on the same page.

Performance expectations. They’ll tell you fast but do you really need to spend all your limited budget dollars maxing the performance of a test environment that’s used once a quarter?

When is this application in use? Find out when it is not in use: What time of day are you able to do maintenance on the server/instance?

Backup retention? Identify how long you need to hang on to backups: Do you need just the most recent or will there every be a need to recover a backup from a specific day?

What data is stored in the database? Find out if there is personally identifiable information being kept in the database. You’re responsible for protecting it if there is. Do you need to ensure backups are encrypted? Do you need TDE?

Where does this application live? Is it external facing or internal? Do you need to take extra steps to reduce the surface area of your SQL Server?

Does the application access SQL databases for other applications? Which SLA is applicable in that case? Make notes explaining the impact of the database on your recovery plans.

And Finally:

SLAs are not set and forget. Mine are all stored in Sharepoint where they can be accessed by business and technical owners alike. Our friendly neighborhood Sharepoint admin set up a monthly alert sending me a list of SLAs that are over a year old. You should be revisiting these, reaching out to the people listed in the document to make sure they still think it’s a good plan. You gotta do it.