I’ve been taking notes hoping to share some lessons learned from our deployment of Red Gate’s SQL Clone but they kept it so deliciously simply and the documentation and worked examples for PowerShell are difficultto improve upon (also we aren’t doing anything too terrible fancy yet either). My only gotcha was learning that if you don’t use the redgate extension (.SQB) for backups that SQL Clone won’t recognize that it was compressed and encrypted with SQL Backup (we’ve wrapped an existing backup procedure around the SQL Backup proc that preserved our old naming conventions when we switched to the new backup tool). The work around was simple: replace the backup file extension prior to creating the image from the backup. If you think that you could benefit from cloning your databases to replicate (and mask) production data I encourage you to check it out. In the meantime I’ll see if I can find a use case or work around that needs some more documentation as we roll it out for our test group.
I was thinking more about numbers and wanted to provide some further breakdown on our SQL Saturday. It seemed like there was a large number of BI focused people who signed up/attended and that was validated by the numbers: 21% seemed to have a BI focus to their job with titles like BI consultant, BI Developer, Business Intelligence Analyst, Business Intelligence Team Lead, Data Analyst, Data Integration Developer, Information Analytics Manager, Information Delivery Manager, Information Systems Analyst, Manager of Predictive Analytics, Financial Analyst. Application developers were the next biggest group at 14%. This included Developers of multiple types, Software Architects and Enginners and Programmers. Closely on the developers heels were DBAs of different stripes at 12%. The last significant group were folks that were managers (or higher) at their organization at 10% including team leads and even one VP of Professional Services. 80% of our registrants were local (with a rather large circle of local considering we’re the only user group for eastern Washington and northern Idaho) which bodes well for our local PASS Chapter. The largest distance traveled (for a non speaker) was approx 1200 miles from Pasedena, CA. Some other numbers 4 of 22 speakers were part of our local group (19%) although we drafted a local professor into presenting as well upping our local speaker count to 5 (24%). 6 of our 22 speakers (27%) were women and while I’d like to see that percentage higher our first time out we didn’t fall completely down in that department. Running a quick informal peek at the attendee names 25 out of 93 attending (27%) seems to match up with our speaker break down as well.
Wow! The weekend was crazy but amazing as well. In my best dreams we got a little better than 50% buy in to our first SQL Saturday: I wasn’t sure where all our local DBAs were at and I expected that we’d end up rather student heavy. Instead we had 113 registered as of Saturday morning (including some middle of the night registrations) and 93 bodies through the door the day of: That’s an 82% attendance percentage! 72 of those folks (64%) were SQL Saturday first timers (and in an informal poll on Saturday many of them were new to PASS as well). 23 (20%) of those registered were students.
The event itself went rather smoothly: we were at a university in their classrooms so the technology aspect was well tested by daily use (we had an issue in one room and in the end just moved to a different open room). It’s been over a year since I had been able to make it a SQL Saturday or Summit and I’d forgotten how energizing it can be. I know that I’ve been struggling to be enthused about my job this winter but I have a new found energy that I can attribute directly to the chance to interact with such a great professional community. Even better we were able to really promote the local users group and I’ve already seen a better response to our next meeting that we’ve had in the past so I think we managed to find a lot more of our local DBAs here in Spokane. I’m looking forward to seeing them at our monthly meetings as well as SQL Saturday Spokane 2019. We’re definitely going to be doing this again and improving on our success this year.
We’ve been re-arranging some deck chairs at work (deploying a new backup tool in our SQL environment). As we’ve been working our way through that we’ve had to update or re-engineer some of our processes for automatic restores to use the new tool. We’re doing one restore though where no one could recall who actually used the restored database. After asking around for a month we didn’t have any better idea of whether the database was actually used or not. Being a risk-averse (not to mention polite) group we wanted to really be sure that no one was using it. Thus the inspiration for this post was born: we needed a way to audit database access confidently say one way or the other that our mystery database was in use or not.
We needed something quick and lightweight and After a quick spin through the SQLHelp channel on Slack I’d been pointed in the direction of extended events. SQL combined with XML my forever nemesis. Thanks to some help from Thomas Schutte (b) particularly his blog post on getting started with XML in SQL. So here is some useful code if you need to audit database use on a server.
First we need a simple extended event session. I picked the sql_batch_completed event so I knew I would catch users as they changed database context on the server and executed code. I kept the data just in the ring buffer since I’ll be checking it often to pull out relevant data and storing it in a table.
CREATE EVENT SESSION [UserContextAudit]
ADD EVENT sqlserver.sql_batch_completed
( ACTION ( sqlserver.database_id
, sqlserver.username )
WHERE sqlserver.session_id > 100 )
ADD TARGET package0.ring_buffer
WITH ( MAX_MEMORY = 1024KB
, STARTUP_STATE = ON );
ALTER EVENT SESSION [UserContextAudit]
ON SERVER STATE = START;
Generating extended events is easy peasy, especially when using the wizard in SSMS. The terrible (for me) part is parsing the results from XE which is stored as XML. I got about 65% of the way there but struggled to get to the data points I needed out of the XML. Thomas’s post above pointed me in the direction of the second cross apply I needed to get the XML nodes I was missing into my temp table. Once you have a good query to pull out the extract the data from the XE session we just dumb the results into our database. I’m running this script once a minute via a SQL Agent job to preserve the information.
SET QUOTED_IDENTIFIER ON;
DECLARE @XML AS XML;
SELECT @XML = ( SELECT TOP 1 CAST(xet.target_data AS XML) AS XEData
FROM sys.dm_xe_session_targets AS xet
JOIN sys.dm_xe_sessions AS xe ON ( xe.address = xet.event_session_address )
WHERE xe.name = 'UserContextAudit'
AND xet.target_name = 'ring_buffer' );
SELECT ce.event_node.value('@timestamp ', 'DATETIME2') AS EventDate
, ca.action_node.value('@name ', 'VARCHAR(200)') AS ActionName
, ca.action_node.value('(value/text())', 'VARCHAR(MAX)') AS ActionValue
FROM ( VALUES ( @XML )) xx ( XMLData )
CROSS APPLY xx.XMLData.nodes('//RingBufferTarget/event') ce(event_node)
CROSS APPLY ce.event_node.nodes('./action') ca(action_node);
SELECT DISTINCT a.EventDate
, a.ActionValue AS username
, b.ActionValue AS databasename
FROM #temp a
JOIN #temp b ON b.EventDate = a.EventDate
JOIN #temp c ON c.EventDate = b.EventDate
WHERE a.ActionName = 'username'
AND b.ActionName = 'database_name'
AND c.ActionName = 'sql_text';
SET lastdate = d.EventDate
FROM AuditDB.audit.usertracking upd
JOIN #temp2 d ON d.username = upd.username
AND upd.databasename = d.databasename
WHERE upd.lastdate < d.EventDate;
INSERT INTO SQLMgmtD.audit.usertracking ( username
, lastdate )
FROM #temp2 i
WHERE NOT EXISTS ( SELECT *
FROM Audit.audit.usertracking a
WHERE a.username = i.username
AND a.databasename = i.databasename )
GROUP BY username
DROP TABLE #temp;
DROP TABLE #temp2;
Run that XE Session and SQL Agent collector job for a month or so and you should have a solid idea of what databases are actually being utilized on your server. There is a non-zero chance that a database might only be referenced as cross-db query but if you think that’s possible the sql_text from the XE session could be analyzed to look for and pull apart 3 part names for databases referenced this wa
In 8 days Spokane will play host to it’s first ever SQL Saturday event (yay!) Right after I got my first DBA job I quickly discovered the PASS community (via Brent Ozar to be specific, the senior DBA was quite the fan and their 6 month JR DBA Training plan was invaluable in getting my sea legs). I quickly discovered a truly welcoming and helpful community that at its core is deeply invested in the success and growth of its members. I attended my first SQL Saturday in Portland in 2014 (and again in 2015 and Redmond in 2016). I learned about Summit which I first attended in 2015 (and again in 2016). After attending I was frustrated that there was no local user group to connect with (boo!) but as soon as I decided to take steps to create a Spokane PASS chapter (or join us on fb) another local DBA beat me to the punch (yay!). However I found myself running the group in the fall of 2016 and a small group of us have met regularly since then.
Something was missing still though: in a town of a quarter million surely we had more than 10 DBAs. I reached out to the wonderful Melody Zacharias (b|t) to get her thoughts about a SQL Saturday in our neck of the woods. She was encouraging and graciously agreed to help organize our own event. Today I am excited to report that our event is over 75% booked with more than half of those registered are new to SQL Saturday (and likely to the PASS community at large). I have high hopes for the success next weekend: if you are one of the many people who submitted sessions to present: thank you, just receiving all your submissions validated our idea. If you are one of the few that were selected to present and able to join us next weekend: thank you, I am incredibly excited to meet and learn from you. If you’ve registered to attend: thank you, I am excited to meet you and hopefully get the chance to tell you about how great PASS is and invite you to our monthly meet ups as well!
It’s rare that I ever look back at a year in review and think “Yeah, I nailed it.” 2017 maybe even less so. It found it difficult to maintain long term focus and a lot of little projects definitely fell through the cracks. That being said here are my goals for 2018 (both personal and professional):
- More technical posts:
- I should be more comfortable providing technical information to a wider audience. I might not post more at this point, but I like to post more helpful/specific articles.
- Pass the 70-463 Exam:
- This would complete my SQL MSCA and would be a nice milestone. I’ve been slowly working my way through test prep materials but I need to ramp it up and get it learned.
- Continue learning to play the ukulele:
- I know a few chords but it’d be nice to get an actual song or two into my skill set. Especially since my father has taken up building instruments and has nearly completed a tenor uke for me.
- Pass my nidan test:
- I’m an aikido instructor (evenings and weekends) and this last year my personal training has fallen off a bit. I need to buckle down and put the work in on my own progress.
What are your goals for next year?
Recently I was reviewing the growth of some of our third party vendor databases and discovered that one vendor had these handy, enormous audit tables that hadn’t been touched since they were created. Records were constantly being added and as a double surprise (SURPRISE) there was no way via the application to ensure these tables were getting the cleanup they needed. After some back and forth (triple SURPRISE) it fell to us to manage this data. We met with the business owners and determined a reasonable retention policy and set out to create some (Relatively) simple SQL agent jobs that would clean up the data nightly and keep things tidy.
The single biggest problem we ran into was that being audit tables they were heavily used and deleting records potentially caused the application to grind to a halt while SQL took out locks to remove records (especially in the early day as we tried to purge the really old records. Eventually we got most of the purging working through proper scheduling and batch deleting. But one table held out as a problem child. The solution that worked on all the other tables simple could not delete records faster that they were being created (without grinding the system to standstill). Enter our last and only hope: partition switching! We had to schedule some down time to rebuild the table but once the table was rebuilt we were able to remove records from the table months at a time to temp tables which we could drop as soon as we switched the data into it. And best of all the blocking (if any) caused by the schema lock from the partition switch was a once a month deal! Sample code follows.
DO NOT JUST RUN IT ON YOUR PRODUCTION SERVERS: STRANGE DBAs GOOFING ON THE INTERNET DISTRIBUTING CODE IS NO BASIS FOR A SYSTEM OF DATA MANAGEMENT.
*ahem* Now that that’s out the way:
First you’ll need to create a partition function and partition scheme to rebuild your table across. Build your partition function first and then your partition scheme. With partition functions make sure to create empty partitions on either side of the series that will hold no data (if at all possible). The partition scheme below keeps the table all on the same file group (primary) but can be altered if needed. In our case we are changing a vendor database as little as possible!
Once your partition scheme and function are built it’s time to schedule a maintenance window: The table will be offline while you rebuild it across the partition. Additionally any other indexes will need to be rebuilt to include the partitioning column and built across the partition as well or partition switching will not work. Thankfully my super busy table only has a primary key. I’ll drop it and recreate it on the partition:
Viola! Now we can bring the application back online and allow it to keep adding an ungodly number of audit records to the table. In the SQL Agent we add a job that has three job steps. The first adds future partitions to the partition function and partition scheme so we always have some empty partitions on that end of the structure. The second finds old partitions, switches them out and then drops the data. The third step removes old, empty partitions from the partition scheme and function. But before that you’ll need a to create a table with the same index structure as your partitioned table (sans partitioning). Every index on the partitioned table must be recreated on the secondary table where we’ll actually truncate the data. I cheat and create the table by selecting the top 1 records from my partitioned table, truncating it and then rebuild the needed indexes on top of that.
Step 1 dynamically adding new partitions:
Step 2 Switch out old partitions:
And lastly removing old crusty partitions:
If you’ve got out of control tables that no one else is willing to tend to this will do the trick. Although remember that altering your vendor database without their consent may void your support contract and any number of other considerations. As with anything else you find on the internet: test it and see if it works for you before deploying to production. A huge hat tip to Kendra Little (b|t) whose introduction to partitioning made any of this possible for my own problem database. I hope that if nothing else I’ve added another avenue for some else to find her great break down of partitioning.
I’ve been eyeing the training programs at SQLSkills for some time now. I’ve managed to catch a couple of SQL Skills team’s sessions at Summit and they have been very helpful and very informative. I’ve only been doing this gig for 4 years now and every webinar, SQL Saturday or Summit has been a wealth of useful information that I’ve been able to take back and use to do my job better. I’m starting to reach that tipping point where I probably should start looking at what I know that I could share with others but frankly there is so much I still don’t know (at least not a level that satisfies me anyway) I’m not quite there yet. Which is one of the reasons the IEOTO1 has been on my radar (even when it was just IE1). I work primarily with a lot of 3rd party databases and I’ve found that even though I can do less with them most of my wins come from a better understanding of SQL internals which the SQLSkills team has in spades. I want to attend training because I want to be better at my job today, tomorrow and next year. SQL Server fundamentals to feel completely intuitive to me is just the very beginning of that path. I also need a chance to get an authentic Chicago Italian beef, but that would just be a bonus.
If you’d like to attend SQLSkills training they are running a competition now for a free seat at one of their training courses (IEPTO1 or IEPTO2) details are here.
This is mostly because I can’t find the link to this Microsoft Technet Article easily enough. Like most things SQL server there is a lot nuance to compressing tables and indexes in your databases and I’m going to just touch on the first pass look I’m performing.
The first step is to look at the usage patterns on the table. The technet article splits this into two queries that I have combined into one single query below. What I am looking for first is how much the table/index is updated: more updates will mean more CPU overhead as pages/rows are decompressed before being updated. If the server doesn’t have many spare CPU cycles to spare I could be helping to create a bottleneck at the CPU. Secondly: how many scans happen on the table/index? More scans means it’s more likely that we’ll get a performance benefit when we go to read the the object from disk as we will do less physical reads.
This is important information but if you are looking at enabling compression there is probably a more immediate consideration as well: you need to conserve space on your data drives. The stored procedure below will give you an idea of the space savings benefit for either ROW or PAGE level compression. The data types in the index or table will have a large impact on how much SQL is able to squeeze data down. This ultimately (in tandem with the usage patterns determined above) drive your decision to compress your data. If you have spare CPU cycles to burn and you can reclaim a large amount of space in your database (or even disk drive if needed), then compression is going to an important tool in your toolbox. We recently reclaimed nearly 200 GB in our data warehouse by compressing our largest table (260 GB down to 56 GB). The table was rarely updated but scanned often so in addition to gaining back significant database space we’ve also sped up the ETL processes that access that table as a bonus!
The technet article is great and goes into greater depth on what’s happening under the hood when you compress tables as well as providing more detailed examples of compression decision scenarios and really is recommended reading if you are currently looking to implement compression.
Last summer we discovered that some records that had been migrated to a read-only filegroup via partitioning were still getting updated (here’s the sordid details on stack overflow). Working with the SQLCat team at PASS in October we discovered that while the partitioning scheme indicated the records should be on the read-only partition according to the allocation unit mapping showed the records will still resident on the active filegroup. For unknown reasons during partition maintenance the records hadn’t moved properly. Working with MS Support we weren’t able to replicate the issue or determine how it happened. Even more insidious is that there is no good way to detect if the issue is occurring unless you go specifically looking for it. It only came to our attention because we were setting some records to read only. If you are doing partition maintenance which moves records physically on the disk you might be interested in the following query which will detect when there is a disconnect between where your partition scheme thinks records will reside and where sys.allocation_units tells you those same records reside: