What's In Your SLA?
Lets be honest SLAs are not particularly exciting. It’s a bunch of words you have to put together after (ugh) talking to end users who may not even know you exist. But hands down they are probably one of the most important documents you need to maintain (yes maintain, don’t just make it once and assume you are good). I recently checked in with my local user group and the results were mixed: some had SLAs but they existed for the application, not for the database, some just had an agreed on RPO/RTO (these are the six letters that can get you fired, some had nothing. 99% of the databases in my environment are for third party vendors and I’ve found that I need more than just RPO & RTO. Here’s my recommendation for what you as a DBA should have on record for your all databases.
What is this and whose is it?
What application(s) are covered by the SLA? What SQL instance does the database (or databases) reside on? Pretty straight forward, this helps create a record of where you expect data to live. If someone comes to you with problems for the application for a database not listed in the SLA you’ve got a problem. Who is the product owner at your company? Susan from Loan Services? Bob from Marketing? This person should be in charge of the product for end users. They’ll know when and how it’s used and should have a good idea of what data is going into it. Who is the technical owner? If you aren’t in charge of the application there’s someone else out there in IT working to keep the rest of the application running. This person is going to have the contact information for support with the vendor. This is important particularly when you need to find out if that new service pack is supported by that vendor or if installing it will void your support contract.
Let’s Talk about those six letters: RPO/RTO
Admittedly these are the drivers for the document. RPO (recovery point objective), this is how much data (measured in time) your business is willing or able to lose. This will drive your backup strategy for this system: do you need a daily backups? Differentials every 6 hours? Who knows? If you don’t have an agreed to RPO you can bet the product owner assumes nothing will be lost. RTO (recovery time objective), this is how long an outage the business is willing to tolerate. Do you need to be back up and running in 10 minutes? 30? Is it an operational service that can be down for a day? Just like RPO if you haven’t had that discussion with the product owner chances are they are going to assume you can just start it back up. This is the most important discussion to have so you can set realistic expectations for that product owner and make plans to budget for the solution that meets their expectations/needs going forward. Most important is to have the business document why the RPO/RTO are needed.
But wait that’s not all?
I was surprised to hear that most SLAs seem to stop there. There are number of other important questions you still need answered
What tier level is the application you are supporting? Is this a high or low importance system? This is a check on the requested RPO/RTO. If the system is not a tier 1 service but they can’t lose any data ever there is a disconnect somewhere in your organization and you need to get everyone on the same page.
Performance expectations. They’ll tell you fast but do you really need to spend all your limited budget dollars maxing the performance of a test environment that’s used once a quarter?
When is this application in use? Find out when it is not in use: What time of day are you able to do maintenance on the server/instance?
Backup retention? Identify how long you need to hang on to backups: Do you need just the most recent or will there every be a need to recover a backup from a specific day?
What data is stored in the database? Find out if there is personally identifiable information being kept in the database. You’re responsible for protecting it if there is. Do you need to ensure backups are encrypted? Do you need TDE?
Where does this application live? Is it external facing or internal? Do you need to take extra steps to reduce the surface area of your SQL Server?
Does the application access SQL databases for other applications? Which SLA is applicable in that case? Make notes explaining the impact of the database on your recovery plans.
And Finally
SLAs are not set and forget. Mine are all stored in Sharepoint where they can be accessed by business and technical owners alike. Our friendly neighborhood Sharepoint admin set up a monthly alert sending me a list of SLAs that are over a year old. You should be revisiting these, reaching out to the people listed in the document to make sure they still think it’s a good plan. You gotta do it.