What’s a Data Catalog? Denver Edition

I had the wonderful opportunity to present an introduction to data catalogs to the Denver SQL Users Group yesterday. I am grateful for the chance to work on my presenting skills and to talk about the new challenges of identifying and protecting risky data. I’ve uploaded the slide deck and some sample PowerShell scripts for SQL Data Catalog to my git repo here. Additionally, there was a question about re-using Data Catalog work for data masking: Richard Macaskill’s blog post I mentioned is here. Thanks, Denver: I am looking forward to coming back and visiting again!

SQL 101: What Happens When…

I had an interesting question asked this last Saturday when I was giving my Intro to SQL session at SQL Saturday Redmond: “What happens if you cast an NVARCHAR string as VARCHAR?”

I had an educated guess but I didn’t have the time during the session to test it out. Casting NVARCHAR to VARCHAR won’t throw an error, but it will silently mangle any characters in the string that can’t be represented in the target code page. Try out the test below yourself!
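A quick sketch of the test (the exact replacement behavior depends on your collation’s code page):

-- Characters with no single-byte representation are silently
-- replaced (typically with '?'); no error is raised.
DECLARE @unicode NVARCHAR(20) = N'Héllo, 日本語';

SELECT @unicode AS original
, CAST(@unicode AS VARCHAR(20)) AS converted;
-- With a Latin1 collation this returns 'Héllo, ???' and the
-- Japanese characters are gone for good.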

The question was asked initially because she had found a place where she had to convert a uniqueidentifier to a string and was concerned that perhaps this could cause issues. The uniqueidentifier is a series of hexadecimal numbers concatenated together with hyphens (for example: 851CC794-E0CB-49DC-976F-1687F62D2188). They will always be 36 “characters” long. The good news is that according to BOL the uniqueidentifier is considered a character type for the purposes of conversion. Attempting to cast a string that is longer than 36 characters will truncate the original string to the first 36 before it attempts the conversion. Conversely, casting a uniqueidentifier to a string datatype will result in a 36-character string (unless you specify fewer than 36 characters, in which case SQL will throw an error).
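A quick illustration of both directions (my own sketch of the behavior BOL describes):

DECLARE @guid UNIQUEIDENTIFIER = '851CC794-E0CB-49DC-976F-1687F62D2188';

-- uniqueidentifier to string: always 36 characters
SELECT CAST(@guid AS VARCHAR(36));

-- a string longer than 36 characters is truncated to 36 first
SELECT CAST('851CC794-E0CB-49DC-976F-1687F62D2188xxxx' AS UNIQUEIDENTIFIER);

-- a target smaller than 36 throws "insufficient result space"
-- SELECT CAST(@guid AS VARCHAR(20));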

Friend of Redgate!

So I tried a very new thing (for me anyway): after a fair amount of dithering about it I told the little naysayer that lives in the back of my head to shut the hell up and applied for the Friend of Redgate program at the end of December.

I’m writing about it, so of course you may easily guess the outcome: I was accepted into the program! I have been a huge fan of their tools ever since I first found SQL Compare 5 years ago (almost to the month, now that I think about it) after a ridiculous manual deployment. In the time since, I’ve deployed and learned to use more of their tools. They have created a wonderful community to help support the SQL community at large and I am looking forward to working hard to contribute as well! Currently my biggest engagement is with their new SQL Data Catalog. As more legislation is created to protect the data in our control, it becomes ever more vital to know exactly what types of data we have so we can take concrete steps to ensure its protection: a data catalog is an integral tool for getting that understanding. I really love this problem space and I have lots of ideas and opinions about it!

Getting SSRS Details via Powershell

This is one of those posts I write so I never have to google the answer again (one hopes). Here is the PS code to pull back a set of details about every SSRS instance installed on a server, including the SSRS instance name, the first URL it is listening on, the service name, the name of the report server database, and so on.
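A sketch of the approach using the SSRS WMI provider (the v13 path segment is SSRS 2016; swap in v11/v12/v14 and so on for other versions):

# Enumerate SSRS instances on this server via the SSRS WMI provider
Get-CimInstance -Namespace 'root\Microsoft\SqlServer\ReportServer' -ClassName __NAMESPACE |
ForEach-Object {
    # v13 = SSRS 2016; adjust for your version
    $admin = "root\Microsoft\SqlServer\ReportServer\$($_.Name)\v13\Admin"
    $cfg   = Get-CimInstance -Namespace $admin -ClassName MSReportServer_ConfigurationSetting
    $urls  = (Invoke-CimMethod -InputObject $cfg -MethodName ListReservedUrls).UrlString

    [pscustomobject]@{
        Instance = $cfg.InstanceName
        Service  = $cfg.ServiceName
        DbServer = $cfg.DatabaseServerName
        Database = $cfg.DatabaseName
        FirstUrl = $urls | Select-Object -First 1
    }
}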

2018 in the Rearview, 2019 Looming Large

2018 was a challenging year for all sorts of reasons. I set some modest goals for myself and it was a mixed bag. My work goals got kind of demolished: I failed to post much last year (although I am happy I now have a post to return to every time I need to write a pivot) and after multiple attempts I failed to pass the MS 70-463 exam on building a data warehouse (I just couldn’t seem to get the last 50-100 points to push me over the top). I did okay on my personal goals though: I passed my nidan test back in September and I am about 1/3 of the way through learning to finger-pick Loch Lomond (the definitive version is Runrig’s live version) on my tenor uke. I would be tempted to call the year a failure; however, I did a lot (especially in the first 3 months):

  • Coordinated and ran the first SQL Saturday here in Spokane.
  • Presented at said SQL Saturday.
  • Managed to keep the local PASS chapter meeting every month (even while I was out of the country) with the assistance of my co-leader Elizabeth Hunt (t).
  • Attended SQL Sat Oregon and rode the SQL Train to PASS Summit, where I finally managed to make the meet-up with other chapter leaders and SQL Saturday organizers: so much learning and networking.
  • I started the process of creating a not-for-profit to handle the money for our SQL Saturday, since it was successful enough to warrant another one. We officially incorporated right before Christmas.
  • At work I finally got automated backup testing up and running. It’s probably a little over-engineered but it was a solid first effort.
  • I also began attending the data governance meetings at work so we could start incorporating our SQL data into their consideration and I got us enrolled in the EAP for Redgate’s Data Catalog.

This last one is by far one of my favorite things from this year. I’ve spent a lot of time thinking about how we treat our data, but beyond being sure it was encrypted and backed up I hadn’t been able to do much else concrete. In the last month of 2018, though, I was able to start cataloging data in our internally developed databases and get a start on one of our bigger vendor databases. Adding this metadata will provide us with a much better view of what kinds of data we are storing and where it is. I’m happy with this initial run at the problem space and I can already see where it starts to answer some questions we should all be asking about the data in our care.

2019 is looking to be another challenging and busy year: I need to refocus and look at my path to an MCSA/MCSE certification. I’m hoping I can take the 70-764 and 765 exams this year. I was hoping to get my 2012 MCSA and upgrade that, but I still don’t do enough SSIS work to get myself over the last point hurdle with the 70-463, and being certified on the upcoming versions is probably worth my time.

I’ll finish putting together the Inland Northwest Data Professionals Association in the next couple of months, just in time to run the second SQL Saturday Spokane. I also applied to the Friends of Redgate program and we’ll see what comes of it: that could lead to a whole host of community contributions that I can’t even foresee right now.

SQL 102: A Useful Pivot

I don’t find myself pivoting data often, but when I do I seem to always need the “complicated” pivot. You know, the one where you have a collection of unique values like email addresses, where the value isn’t the column name, but you want the 4 email addresses connected to one person in a single row rather than four rows. I can never remember how to do that, so here is my personal internet bookmark for that code. Thanks to the late Robert Davis, who gave the answer I needed back in 2015. Robert was a wonderful and generous contributor to the SQL Family; his site is full of useful information and is still available here.

The solution is to add the categories you want as eventual columns to your result set before pivoting (it sounds simple enough, but then again I can never remember it). So if you have, say, an account with multiple contacts:
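Something like this stands in for the real tables (table and column names invented for the example):

-- Stand-in sample data matching the results shown below
CREATE TABLE #contacts
( acctNo INT
, name VARCHAR(50)
, email VARCHAR(100) );

INSERT INTO #contacts ( acctNo, name, email )
VALUES ( 1001, 'Susan', 'susan@thesuze.com.edu' )
, ( 1001, 'Mark', 'mark@thesuze.com.edu' )
, ( 1002, 'Aoife', 'theboss@bagpipebattle.com.edu' );

SELECT acctNo, name, email FROM #contacts;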

This returns a list of results such as:

acctNo name email
1001 Susan susan@thesuze.com.edu
1001 Mark mark@thesuze.com.edu
1002 Aoife theboss@bagpipebattle.com.edu

The one simple trick is to create a value for each row that will be repeated, so it can be used as a column name when you pivot:
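Here is a sketch of the whole query against the sample table above (the numbering order within an account is arbitrary here; pick a meaningful ORDER BY column if it matters):

-- Number each contact within an account, build column names from the
-- counter, then pivot once per generated column set.
WITH numbered AS
( SELECT acctNo
, name
, email
, ROW_NUMBER() OVER ( PARTITION BY acctNo ORDER BY name ) AS seq
FROM #contacts )
SELECT acctNo
, MAX([Name1]) AS Name1
, MAX([Email1]) AS Email1
, MAX([Name2]) AS Name2
, MAX([Email2]) AS Email2
FROM ( SELECT acctNo
, 'Name' + CAST(seq AS VARCHAR(10)) AS nameCol
, 'Email' + CAST(seq AS VARCHAR(10)) AS emailCol
, name
, email
FROM numbered ) s
PIVOT ( MAX(name) FOR nameCol IN ( [Name1], [Name2] ) ) pn
PIVOT ( MAX(email) FOR emailCol IN ( [Email1], [Email2] ) ) pe
GROUP BY acctNo;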

Simply repeat the structure of creating a valid column name (‘String’ + CAST(counter AS VARCHAR)) and you can include a pivot across as many different columns as need to be included. Our short list above now becomes:

acctNo Name1 Email1 Name2 Email2
1001 Susan susan@thesuze.com.edu Mark mark@thesuze.com.edu
1002 Aoife theboss@bagpipebattle.com.edu NULL NULL

Hopefully in the future I’ll remember I wrote this and I won’t have to go searching for the solution (again).

Remarkably Honored

So this is a thing: I was nominated as an outstanding volunteer for PASS for April 2018. I really appreciate that someone thought I was doing something outstanding; I am bad at both accepting compliments and self-promotion. I really believe in PASS as a positive extension of the SQL community. I know that I wouldn’t be where I am today without the generous and helpful people who want to help me and others do and be our best in meeting the data needs of our respective employers/clients/organizations. It’s only natural to turn around and help those around you who want to do better for themselves as well. I’m excited for the growth of our users group and the success of our SQL Saturday: I could not have done it alone.

Finding Unused Databases on Your SQL Instance

We’ve been re-arranging some deck chairs at work (deploying a new backup tool in our SQL environment). As we’ve worked our way through that, we’ve had to update or re-engineer some of our processes for automatic restores to use the new tool. We’re doing one restore, though, where no one could recall who actually used the restored database. After asking around for a month we didn’t have any better idea of whether the database was actually used or not. Being a risk-averse (not to mention polite) group, we wanted to be really sure no one was using it. Thus the inspiration for this post was born: we needed a way to audit database access and confidently say, one way or the other, whether our mystery database was in use.

We needed something quick and lightweight, and after a quick spin through the SQLHelp channel on Slack I’d been pointed in the direction of extended events. SQL combined with XML: my forever nemesis. Thanks to some help from Thomas Schutte (b), particularly his blog post on getting started with XML in SQL, I got there. So here is some useful code if you need to audit database use on a server.

First we need a simple extended event session. I picked the sql_batch_completed event so I knew I would catch users as they changed database context on the server and executed code. I kept the data just in the ring buffer, since I’ll be checking it often to pull out the relevant data and store it in a table.

CREATE EVENT SESSION [UserContextAudit]
ON SERVER
ADD EVENT sqlserver.sql_batch_completed
( ACTION ( sqlserver.database_id
, sqlserver.database_name
, sqlserver.session_id
, sqlserver.session_nt_username
, sqlserver.sql_text
, sqlserver.username )
-- crude filter to skip low-numbered system/service sessions
WHERE sqlserver.session_id > 100 )
ADD TARGET package0.ring_buffer
WITH ( MAX_MEMORY = 1024 KB
, STARTUP_STATE = ON );
GO

ALTER EVENT SESSION [UserContextAudit]
ON SERVER STATE = START;

Generating extended events is easy peasy, especially when using the wizard in SSMS. The terrible (for me) part is parsing the results from XE, which are stored as XML. I got about 65% of the way there but struggled to extract the data points I needed from the XML. Thomas’s post above pointed me in the direction of the second CROSS APPLY I needed to get the XML nodes I was missing into my temp table. Once you have a good query to extract the data from the XE session, we just dump the results into our database. I’m running this script once a minute via a SQL Agent job to preserve the information.
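The script writes to a small tracking table; here is an assumed definition (not from the original post) with names matching the script below:

-- Assumed tracking table; the audit schema must already exist
USE AuditDB;
GO
CREATE TABLE audit.usertracking
( username VARCHAR(200) NOT NULL
, databasename VARCHAR(200) NOT NULL
, lastdate DATETIME2 NOT NULL
, CONSTRAINT PK_usertracking PRIMARY KEY CLUSTERED ( username, databasename ) );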

SET QUOTED_IDENTIFIER ON;

DECLARE @XML AS XML;

SELECT @XML = ( SELECT TOP 1 CAST(xet.target_data AS XML) AS XEData
FROM sys.dm_xe_session_targets AS xet
JOIN sys.dm_xe_sessions AS xe ON ( xe.address = xet.event_session_address )
WHERE xe.name = 'UserContextAudit'
AND xet.target_name = 'ring_buffer' );

SELECT ce.event_node.value('@timestamp [1]', 'DATETIME2') AS EventDate
, ca.action_node.value('@name [1]', 'VARCHAR(200)') AS ActionName
, ca.action_node.value('(value/text())[1]', 'VARCHAR(MAX)') AS ActionValue
INTO #temp
FROM ( VALUES ( @XML )) xx ( XMLData )
CROSS APPLY xx.XMLData.nodes('//RingBufferTarget/event') ce(event_node)
CROSS APPLY ce.event_node.nodes('./action') ca(action_node);

SELECT DISTINCT a.EventDate
, a.ActionValue AS username
, b.ActionValue AS databasename
INTO #temp2
FROM #temp a
JOIN #temp b ON b.EventDate = a.EventDate
JOIN #temp c ON c.EventDate = b.EventDate
WHERE a.ActionName = 'username'
AND b.ActionName = 'database_name'
AND c.ActionName = 'sql_text';

UPDATE upd
SET lastdate = d.EventDate
FROM AuditDB.audit.usertracking upd
JOIN #temp2 d ON d.username = upd.username
AND upd.databasename = d.databasename
WHERE upd.lastdate < d.EventDate;

INSERT INTO AuditDB.audit.usertracking ( username
, databasename
, lastdate )
SELECT username
, databasename
, MIN(EventDate)
FROM #temp2 i
WHERE NOT EXISTS ( SELECT *
FROM AuditDB.audit.usertracking a
WHERE a.username = i.username
AND a.databasename = i.databasename )
GROUP BY username
, databasename;

DROP TABLE #temp;
DROP TABLE #temp2;

Run that XE session and SQL Agent collector job for a month or so and you should have a solid idea of which databases are actually being utilized on your server. There is a non-zero chance that a database might only be referenced via cross-database queries, but if you think that’s possible, the sql_text from the XE session can be analyzed to look for and pull apart three-part names for databases referenced this way.

Goals for 2018

It’s rare that I ever look back at a year in review and think “Yeah, I nailed it,” and 2017 maybe even less so. I found it difficult to maintain long-term focus and a lot of little projects definitely fell through the cracks. That being said, here are my goals for 2018 (both personal and professional):

  • More technical posts:
    • I should be more comfortable providing technical information to a wider audience. I might not post more often at this point, but I’d like to post more helpful/specific articles.
  • Pass the 70-463 Exam:
    • This would complete my SQL MCSA and would be a nice milestone. I’ve been slowly working my way through test prep materials but I need to ramp it up and get it learned.
  • Continue learning to play the ukulele:
    • I know a few chords but it’d be nice to get an actual song or two into my skill set. Especially since my father has taken up building instruments and has nearly completed a tenor uke for me.
  • Pass my nidan test:
    • I’m an aikido instructor (evenings and weekends) and this last year my personal training has fallen off a bit. I need to buckle down and put the work in on my own progress.

What are your goals for next year?

Maintaining Table Data with Partitions

Recently I was reviewing the growth of some of our third-party vendor databases and discovered that one vendor had these handy, enormous audit tables that hadn’t been touched since they were created. Records were constantly being added, and as a double surprise (SURPRISE) there was no way via the application to ensure these tables were getting the cleanup they needed. After some back and forth (triple SURPRISE) it fell to us to manage this data. We met with the business owners, determined a reasonable retention policy, and set out to create some (relatively) simple SQL Agent jobs that would clean up the data nightly and keep things tidy.

The single biggest problem we ran into was that, being audit tables, they were heavily used, and deleting records potentially caused the application to grind to a halt while SQL took out locks to remove them (especially in the early days as we tried to purge the really old records). Eventually we got most of the purging working through proper scheduling and batch deleting. But one table held out as a problem child: the solution that worked on all the other tables simply could not delete records faster than they were being created (without grinding the system to a standstill). Enter our last and only hope: partition switching! We had to schedule some downtime to rebuild the table, but once it was rebuilt we were able to remove records months at a time into staging tables we could drop as soon as we switched the data into them. And best of all, the blocking (if any) caused by the schema lock from the partition switch was a once-a-month deal! Sample code follows.

DO NOT JUST RUN IT ON YOUR PRODUCTION SERVERS: STRANGE DBAs GOOFING ON THE INTERNET DISTRIBUTING CODE IS NO BASIS FOR A SYSTEM OF DATA MANAGEMENT.

*ahem* Now that that’s out of the way:

First you’ll need to create a partition function and partition scheme to rebuild your table across. Build your partition function first and then your partition scheme. With partition functions, make sure to create empty partitions on either side of the series that will hold no data (if at all possible). The partition scheme below keeps the table all on the same filegroup (PRIMARY) but can be altered if needed. In our case we are changing a vendor database as little as possible!
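Something along these lines (a sketch with invented names, assuming a monthly purge keyed on the audit table’s datetime column):

-- Monthly boundaries; the outermost partitions stay empty
CREATE PARTITION FUNCTION pf_AuditByMonth ( DATETIME2 )
AS RANGE RIGHT FOR VALUES ( '2017-11-01', '2017-12-01', '2018-01-01'
, '2018-02-01', '2018-03-01', '2018-04-01' );
GO

-- ALL TO ([PRIMARY]) keeps every partition on the same filegroup
CREATE PARTITION SCHEME ps_AuditByMonth
AS PARTITION pf_AuditByMonth ALL TO ( [PRIMARY] );
GO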

Once your partition scheme and function are built, it’s time to schedule a maintenance window: the table will be offline while you rebuild it across the partition scheme. Additionally, any other indexes will need to be rebuilt to include the partitioning column and built on the partition scheme as well, or partition switching will not work. Thankfully my super busy table only has a primary key. I’ll drop it and recreate it on the partition scheme:
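In my case that looked roughly like this (table, key, and column names invented for the example):

-- Drop the existing primary key and recreate it on the partition
-- scheme; the key must include the partitioning column (AuditDate).
ALTER TABLE dbo.AuditLog DROP CONSTRAINT PK_AuditLog;

ALTER TABLE dbo.AuditLog
ADD CONSTRAINT PK_AuditLog
PRIMARY KEY CLUSTERED ( AuditDate, AuditID )
ON ps_AuditByMonth ( AuditDate );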

Voilà! Now we can bring the application back online and allow it to keep adding an ungodly number of audit records to the table. In the SQL Agent we add a job with three steps. The first adds future partitions to the partition function and partition scheme so we always have some empty partitions on that end of the structure. The second finds old partitions, switches them out, and then drops the data. The third step removes old, empty partitions from the partition scheme and function. But before any of that you’ll need to create a table with the same index structure as your partitioned table (sans partitioning). Every index on the partitioned table must be recreated on this secondary table, where we’ll actually truncate the data. I cheat and create the table by selecting the top 1 record from my partitioned table, truncating it, and then rebuilding the needed indexes on top of that:
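The cheat, sketched out with the same invented names as above:

-- Clone the structure, empty it, then rebuild the matching indexes.
-- The switch target lives on the same filegroup but is NOT partitioned.
SELECT TOP ( 1 ) *
INTO dbo.AuditLog_Switch
FROM dbo.AuditLog;

TRUNCATE TABLE dbo.AuditLog_Switch;

ALTER TABLE dbo.AuditLog_Switch
ADD CONSTRAINT PK_AuditLog_Switch
PRIMARY KEY CLUSTERED ( AuditDate, AuditID );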

Step 1, dynamically adding new partitions:
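A sketch (the boundary math and guard logic are assumptions to adapt):

-- Add next month's boundary if it doesn't already exist, so empty
-- partitions always wait at the leading edge.
DECLARE @nextBoundary DATETIME2
= DATEADD(MONTH, 2, DATEFROMPARTS(YEAR(GETDATE()), MONTH(GETDATE()), 1));

IF NOT EXISTS ( SELECT 1
FROM sys.partition_range_values prv
JOIN sys.partition_functions pf ON pf.function_id = prv.function_id
WHERE pf.name = 'pf_AuditByMonth'
AND CAST(prv.value AS DATETIME2) >= @nextBoundary )
BEGIN
ALTER PARTITION SCHEME ps_AuditByMonth NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_AuditByMonth() SPLIT RANGE ( @nextBoundary );
END;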

Step 2, switching out old partitions:
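Sketched again (the retention window is an assumption; this removes one aged-out partition per run, which suits a monthly schedule):

-- Switch the oldest populated partition into the staging table, then
-- truncate it: dropping the data becomes a metadata-only operation.
DECLARE @cutoff DATETIME2 = DATEADD(MONTH, -6, GETDATE()); -- assumed retention
DECLARE @oldPartition INT = $PARTITION.pf_AuditByMonth(@cutoff) - 1;

IF @oldPartition > 1 -- leave the lowest (empty) partition alone
BEGIN
ALTER TABLE dbo.AuditLog SWITCH PARTITION @oldPartition TO dbo.AuditLog_Switch;
TRUNCATE TABLE dbo.AuditLog_Switch;
END;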

And lastly removing old crusty partitions:
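And one more sketch, assuming the switch step has already emptied the partition below the oldest boundary:

-- Merge away the oldest boundary once its partition is empty
DECLARE @oldestBoundary DATETIME2;

SELECT @oldestBoundary = MIN(CAST(prv.value AS DATETIME2))
FROM sys.partition_range_values prv
JOIN sys.partition_functions pf ON pf.function_id = prv.function_id
WHERE pf.name = 'pf_AuditByMonth';

IF @oldestBoundary < DATEADD(MONTH, -7, GETDATE()) -- safely past retention
ALTER PARTITION FUNCTION pf_AuditByMonth() MERGE RANGE ( @oldestBoundary );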

If you’ve got out-of-control tables that no one else is willing to tend, this will do the trick. Remember, though, that altering your vendor database without the vendor’s consent may void your support contract, among other considerations. As with anything else you find on the internet: test it and see if it works for you before deploying to production. A huge hat tip to Kendra Little (b|t), whose introduction to partitioning made any of this possible for my own problem database. I hope that, if nothing else, I’ve added another avenue for someone else to find her great breakdown of partitioning.