Why I’m Hoping I Can Attend Training with the SQL Skills Team

I’ve been eyeing the training programs at SQLSkills for some time now. I’ve managed to catch a couple of SQL Skills team’s sessions at Summit and they have been very helpful and very informative. I’ve only been doing this gig for 4 years now and every webinar, SQL Saturday or Summit has been a wealth of useful information that I’ve been able to take back and use to do my job better. I’m starting to reach that tipping point where I probably should start looking at what I know that I could share with others but frankly there is so much I still don’t know (at least not a level that satisfies me anyway) I’m not quite there yet. Which is one of the reasons the IEOTO1 has been on my radar (even when it was just IE1). I work primarily with a lot of 3rd party databases and I’ve found that even though I can do less with them most of my wins come from a better understanding of SQL internals which the SQLSkills team has in spades. I want to attend training because I want to be better at my job today, tomorrow and next year. SQL Server fundamentals to feel completely intuitive to me is just the very beginning of that path. I also need a chance to get an authentic Chicago Italian beef, but that would just be a bonus.

If you’d like to attend SQLSkills training they are running a competition now for a free seat at one of their training courses (IEPTO1 or IEPTO2) details are here.

Analyzing Tables for Compression in SQL Server

This is mostly because I can’t find the link to this Microsoft Technet Article easily enough. Like most things SQL server there is a lot nuance to compressing tables and indexes in your databases and I’m going to just touch on the first pass look I’m performing.

The first step is to look at the usage patterns on the table. The technet article splits this into two queries that I have combined into one single query below. What I am looking for first is how much the table/index is updated: more updates will mean more CPU overhead as pages/rows are decompressed before being updated. If the server doesn’t have many spare CPU cycles to spare I could be helping to create a bottleneck at the CPU. Secondly: how many scans happen on the table/index? More scans means it’s more likely that we’ll get a performance benefit when we go to read the the object from disk as we will do less physical reads.

This is important information but if you are looking at enabling compression there is probably a more immediate consideration as well: you need to conserve space on your data drives. The stored procedure below will give you an idea of the space savings benefit for either ROW or PAGE level compression. The data types in the index or table will have a large impact on how much SQL is able to squeeze data down. This ultimately (in tandem with the usage patterns determined above) drive your decision to compress your data. If you have spare CPU cycles to burn and you can reclaim a large amount of space in your database (or even disk drive if needed), then compression is going to an important tool in your toolbox. We recently reclaimed nearly 200 GB in our data warehouse by compressing our largest table (260 GB down to 56 GB). The table was rarely updated but scanned often so in addition to gaining back significant database space we’ve also sped up the ETL processes that access that table as a bonus!

The technet article is great and goes into greater depth on what’s happening under the hood when you compress tables as well as providing more detailed examples of compression decision scenarios and really is recommended reading if you are currently looking to implement compression.

Are Your Partitions on the Correct Filegroup?

Last summer we discovered that some records that had been migrated to a read-only filegroup via partitioning were still getting updated (here’s the sordid details on stack overflow). Working with the SQLCat team at PASS in October we discovered that while the partitioning scheme indicated the records should be on the read-only partition according to the allocation unit mapping showed the records will still resident on the active filegroup. For unknown reasons during partition maintenance the records hadn’t moved properly. Working with MS Support we weren’t able to replicate the issue or determine how it happened. Even more insidious is that there is no good way to detect if the issue is occurring unless you go specifically looking for it. It only came to our attention because we were setting some records to read only. If you are doing partition maintenance which moves records physically on the disk you might be interested in the following query which will detect when there is a disconnect between where your partition scheme thinks records will reside and where sys.allocation_units tells you those same records reside:

Partial Backups With Dell Litespeed

We currently use Dell’s Litespeed for SQL Server for our SQL backup compression and encryption needs. It has an intuitive GUI, manages log shipping and we are generally pretty happy with it. I’ve got a couple of problem databases where I don’t want the vanilla out of the box full backups. A couple of our data warehouse dbs are now over a terabyte in size and continuing to grow. We’re implementing partitioning to help with index and statistics maintenance as well as backups. The down side is that there is no GUI for partial backups with Litespeed so I’ve had to fumble through it with the help of their support. Here then are the working scripts for taking partial backups (and restores) of databases using the Litespeed stored procedures. Our database is in simple recovery mode and we’ll be backing up a read-only file group as well as all our active files.

First backup your read-only file group. You only need to back this up once. Or until an end user has you set it to read-write, and updates records. When that happens be sure to set the file group back to read-only and then back it up again.

The xp_backup_database also has a @read_write_filgroups parameter we’ll use later. It’s important to note that if you include it here with a value of 0 (which seems an intuitive thing to do) you will essentially take a full backup rather than a backup of just the intended filegroup. Including @returndetails is optional but will instruct Litespeed to return a row of information about the backup as it was processed.

Once you have your read-only files backed up you can back up the read-write file groups daily (or hourly or whatever works for your SLA) with the following code. Notice we’ve removed the specific filegroup parameter and added the @read_write_filegroups parameter.

Once you’ve taken a full backup you can do differential backups of the read-write file groups as well by including an additional @with:

Restore scripts look very similar to the backup scripts. If you do not have differential backups of your read-write file groups change ‘NORECOVERY’ to ‘RECOVERY.’

Restoring a read-write differential backup (RECOVERY should be changed to NO RECOVERY if there are multiple differentials to apply):

Finally read-only file groups can be restored after the database has been brought online after restoring all backups of read-write file groups:

Updating Tables with Read-Only Partitions

I learned an important thing today. I’ve started partitioning our data warehouse and we’ve moved into our testing phase. The partitions are in place, the archive is read-only and once a month the automated partition maintenance kicks off. It all seemed to be going swimmingly until…

A couple of times our data warehouse folks have found that an updated to a record on the read-write partition would fail with the following error:

Msg 652, Level 16, State 1, Line 1 The index “PK_STG_PHX_LN_ACCT_INT_OPT” for table “dbo.STG_PHX_LN_ACCT_INT_OPT” (RowsetId 72057594415022080) resides on a read-only filegroup (“ARCHIVE”), which cannot be modified.

Generated by this lovely ETL software generated SQL:

The table is partitioned on a column  (ETL_VERS_END_DTM)  that indicates if the record is current or not so *all* records being updated are expected to be on one end of the partitioned table. Since the partitioned column was not included in the WHERE clause of the update statement SQL generated an execution plan that indicated it would not be inconceivable the update statement could affect rows in read-only partitions of the table and then failed.

The fix is simple: include the partitioning column in the where clause. In our case we know that records being updated will always have a datetime value of ‘99990101 00:00:00.000’ (because consultants) and they are partitioned accordingly.

And the SQL is able to determine the query will not touch a read-only partition (and since the partition column value is not being updated there is no risk of movement to a read-only partition).

In a currently strange case of SQL Magic (otherwise know as SQL internals): setting  ETL_VER_END_DTM to the expected value (i.e. it’s current value) but not including it in the where clause also produces an execution plan that allows SQL to proceed and succeed despite not eliminating the possibility it exists in the read-only partition before it begins. I’ll update this later with a link to any explanation that I find as I try to wrap my head around it.

UPDATE: In the simplest terms (for folks like me) the query originally ended up with a shared rowset for the both the read and the update when it encountered the read-only partition SQL had to bail. Including the end date in the update clause prevents SQL from using a shared rowset under the covers so when it gets to the update portion its rowset includes only rows from the read-write partitions. A link to Paul White’s post on how rowset sharing causes the error is here.

Many thanks to Aaron Bertrand (b|t) and Paul White (b|t) for helping me to understand what was going on over on Stack Exchange. Not only was I able to wrap my head around some non-intuitive internal processes but it reaffirmed why I think the SQL Family is so great.

What’s In Your SLA?

Lets be honest SLAs are not particularly exciting. It’s a bunch of words you have to put together after (ugh) talking to end users who may not even know you exist. But hands down they are probably one of the most important documents you need to maintain (yes maintain, don’t just make it once and assume you are good). I recently checked in with my local user group and the results were mixed: some had SLAs but they existed for the application, not  for the database, some just had an agreed on RPO/RTO (these are the six letters that can get you fired), some had nothing. 99% of the databases in my environment are for third party vendors and I’ve found that I need more than just RPO & RTO. Here’s my recommendation for what you as a DBA should have on record for your all databases.

What is this and whose is it?

What application(s) are covered by the SLA? What SQL instance does the database (or databases) reside on? Pretty straight forward, this helps create a record of where you expect data to live. If someone comes to you with problems for the application for a database not listed in the SLA you’ve got a problem. Who is the product owner at your company? Susan from Loan Services? Bob from Marketing? This person should be in charge of the product for end users. They’ll know when and how it’s used and should have a good idea of what data is going into it. Who is the technical owner? If you aren’t in charge of the application there’s someone else out there in IT working to keep the rest of the application running. This person is going to have the contact information for support with the vendor. This is important particularly when you need to find out if that new service pack is supported by that vendor or if installing it will void your support contract.

Let’s about those six letters: RPO/RTO

Admittedly these are the drivers for the document. RPO (recovery point objective), this is how much data (measured in time) your business is willing or able to lose. This will drive your backup strategy for this system: do you need a daily backups? Differentials every 6 hours? Who knows? If you don’t have an agreed to RPO you can bet the product owner assumes nothing will be lost. RTO (recovery time objective), this is how long an outage the business is willing to tolerate. Do you need to be back up and running in 10 minutes? 30? Is it an operational service that can be down for a day? Just like RPO if you haven’t had that discussion with the product owner chances are they are going to assume you can just start it back up. This is the most important discussion to have so you can set realistic expectations for that product owner and make plans to budget for the solution that meets their expectations/needs going forward. Most important is to have the business document why the RPO/RTO are needed.

But wait that’s not all?

I was surprised to hear that most SLAs seem to stop there. There are number of other important questions you still need answered

What tier level is the application you are supporting? Is this a high or low importance system? This is a check on the requested RPO/RTO. If the system is not a tier 1 service but they can’t lose any data ever there is a disconnect somewhere in your organization and you need to get everyone on the same page.

Performance expectations. They’ll tell you fast but do you really need to spend all your limited budget dollars maxing the performance of a test environment that’s used once a quarter?

When is this application in use? Find out when it is not in use: What time of day are you able to do maintenance on the server/instance?

Backup retention? Identify how long you need to hang on to backups: Do you need just the most recent or will there every be a need to recover a backup from a specific day?

What data is stored in the database? Find out if there is personally identifiable information being kept in the database. You’re responsible for protecting it if there is. Do you need to ensure backups are encrypted? Do you need TDE?

Where does this application live? Is it external facing or internal? Do you need to take extra steps to reduce the surface area of your SQL Server?

Does the application access SQL databases for other applications? Which SLA is applicable in that case? Make notes explaining the impact of the database on your recovery plans.

And Finally:

SLAs are not set and forget. Mine are all stored in Sharepoint where they can be accessed by business and technical owners alike. Our friendly neighborhood Sharepoint admin set up a monthly alert sending me a list of SLAs that are over a year old. You should be revisiting these, reaching out to the people listed in the document to make sure they still think it’s a good plan. You gotta do it.

Starting Big

I’ve long imagined that my first blog would be about the importance and scheduling of backups (very) or maintaining SLAs (very important and they should be reviewed at least once a year). Instead I believe I am going to eat the proverbial elephant blogging my process for slicing and dicing my company’s shiny new data warehouse into maintainable pieces.

About a year ago our onsite consultants started to come online and build out our structure. I was only tangentially attached to the project in that they knew they’d need some SQL servers at some point. I started to do some research into VLDBs (while it’s started out “small” the new policy was to delete nothing ever again) and quickly discovered the usefulness of partitioning (both for maintenance tasks and ETL processes. I of course forwarded on the links and after a brief discussion was told that it could be addressed later. In my naivete of course this sounded reasonable and I continued on with the million other projects I was eyeball deep in.

Woe to you who is not sure enough to stand your ground! 9 months later when I learned that I’d be dissecting their largest databases table by table to get even a reasonable maintenance plan going. After a mere 9 months backups are already up to two hours and it was time to prove that a partitioned database could alleviate the inevitable backup and index maintenance apocalypse.

So that’s where this series is starting: me, my planning and then table by table through our two largest databases (ps I won’t be posting table by table).

Cartoon of a charging elephant

I am beginning to rethink my methodology for eating an elephant…

Summit 15 in the Rear View Mirror

Summit was intense. I paced myself pretty well I thought but I was still exhausted by the end (even cutting out half a day early). It is a fantastic opportunity not only to learn and grow as a SQL professional but to grow and connect with other DBAs. To that end I quickly realized that the sessions were almost secondary: the value of purchasing the sessions for download or on USB drive is quite high especially once you return and start reviewing your notes and realize how much of them are tied directly back to something that was happening in the session and they make barely any sense.

Once I made the decision to get the sessions to review later (or to make the decision between schedule conflicts easier) it was that much easier to start meeting and connecting with others. This was, by far, the most important piece of PASS for me: meeting other DBAs, asking them questions and (hopefully) making long term connections. If there is anything I’ve learned from my many career paths is that you are only ever as good as your network: those people you can turn to when you find yourself stumped on a particular problem. I went into all the events I attended last week actively looking to grow my personal network, to find connections I could turn to when stumped. I was also looking to connect with DBAs like me: newly minted and fresh on their journey into the SQL community. Suffice to say I did a lot of talking last week, it’s not my favorite thing (I tend towards introversion like much of our IT community) but well worth it. Highlights from the week for me included sessions on optimizing VMs for SQL with Denny Cherry, Paul Randal‘s DBA Mythbusters, meeting Marius from Norway (I can’t seem to locate his card currently), going over my current plan for DW partitioning (more on that in future posts) with the SQL Cat team and getting my plan both validated and improved, chatting with a DBA overseeing GIS data in Vancouver (her card is… somewhere around here), a great nearly shouted discussion about professional development during live band Karaoke at the Hard Rock Cafe, discussing the potential of RedGate’s Instaclone in our environment, and generally stalking the Red Gate folks in general (really they are the friendliest folks around). I’m currently waiting to hear back about getting registered for Summit 2016 (if you have a budget conscious manager perhaps the savings of early bird registration will tip the decision in your favor). It is a fantastic opportunity to learn amazing things, meet amazing people.

So fair warning next year: I’m the one on the right. If you see me coming I’m probably going to try and introduce myself.

Just two DBAs at Summit

Just two DBAs at Summit

SQL Saturday #446 & RedGate SQL in the City

So I’m missing out on the Summit 15 keynote this morning as I was sadly paged last night. Between that and the first timers meetup and welcome reception I was completely unable to get up at an appropriate time today. While my work colleague gets ready I wanted to jot down a few impressions from the last few days.

SQL Saturday #446: Portland, OR

This was my third SQL Saturday in Portland and the event continues to be fantastic for a number of reasons. It’s right before Summit so a number of speakers at summer present in Portland that Saturday before so you can open up your Summit schedule if you get in on the sessions early. It’s a great networking event (I think there was around 400 SQL Family that showed up?).  I attended a great session on auditing in SQL Server from Colleen Morrow (bt) The really short and skinny: in SQL 2008 Standard you can only audit at the instance level but in 2008 Enterprise and SQL 2012 and above you can drill down into specific databases with your auditing. It all makes use of extended events so it’s light weight. It’s configurable in filtering, retention and handling auditing errors (lose the audit info, cancel the transaction that triggered it etc). She also shared a plan for centralizing the collection and ingestion of the audit logs that were created in a central location for reporting purposes. Scripts and slide deck here.

SQL in the City

This was my second SQL in the City and it was even better than my previous experience as well: new products like ReadyRoll and InstantClone really speak to some pain points at our organization and I’m looking forward to see we can fully implement them to make it all better. Or at least mostly better. Or at least version or databases: I hear that’s a good start!

I’m super excited and it’s time to hop on a train to start hitting Summit sessions. If you found this post and found it useful let me know!