Adding SQL Docs to Jekyll

Back in August I talked about how our organization was using Jekyll to publish our internal docs. This fall I took it a step further and worked out how to publish content generated by Redgate’s SQL Doc. SQL Doc is a handy tool that allows you to easily add metadata to your database. It’s great: you essentially comment every table, column, view, schema etc. and then commit those changes to the code repository itself. Of course if you want to share that documentation with someone who doesn’t have access to the source code (or maybe doesn’t want to flip through SQL files looking for metadata updates even if they did) it needs to be extracted and made usable somewhere. SQL Doc has several report generation options, and since we were using markdown for the Jekyll site that seemed a good choice. The problem in this case is that the export wasn’t really built with the idea that I might want to insert it into another site. It simply takes all the objects from the instance you have it pointed at and creates an entire site/directory structure around that. So we need to (1) strip out all the directory references or other code we won’t be using and (2) insert any of the code needed by the Jekyll site to make it useful. The following PowerShell script mostly (I am sure there are still improvements to be made) does that job: just point it at a fresh export of the documentation and it will delete all the existing files and regenerate the docs for inclusion. (One note: it is pretty site specific in that it assumes the site is built on the Documentation theme.)

<#
Author: Josh Smith
Created: 2020-09-03
This script will parse through SQL Doc generated markdown files to create Jekyll friendly
db documentation.
Should be run from the root directory of the documentation repository.
#>
function createHeader { param ($subject, $link, $page)
$title = "$subject - $page - Documentation"
$summary = "Documentation for " + $page.ToLower() + " in $subject."
$header = "---
title: ""$title""
tags: ["
if ($page.IndexOf("index.html") -gt -1){
$header = $header + "data_dictionary, "
}
$header = $header + "$subject]
sidebar: db_sidebar
permalink: $link
last_updated: Oct 21, 2020
summary: $summary
---
"
return $header
}
function removeImageFolder { param ($imagelineIn)
$imgPos = $imagelineIn.IndexOf('Images') + 6
$imageLine = $imagelineIn.Substring(0, $imagelineIn.IndexOf('(..') + 1) + '/images/dbmodel'
if ($imageLineIn.IndexOf('Images', $imgPos) -gt -1){
$imageLine = $imageLine + (removeImageFolder $imageLineIn.Substring($imgPos))
}
else {
$imageLine = $imageLine + $imageLineIn.Substring($imgPos, $imageLineIn.IndexOf(".", $imgPos) - $imgPos).ToLower()
if ($imageLineIn.IndexOf('Images', $imgPos + 1) -lt 1){
$imageLine = $imageLine + $imageLineIn.Substring($imageLineIn.IndexOf(".", $imgPos))
}
}
if ($imageLine.IndexOf("Index.png") -gt 0){ $imageLine = $imageLine -replace "Index.png", "index.png" }
return $imageLine
}
function removeAnchor {param ($anchorlineIn)
$anchorLine = $anchorlineIn.Substring(0, $anchorlineIn.IndexOf("<a name="))
$anchorLine = $anchorLine + $anchorlineIn.Substring($anchorlineIn.IndexOf("a>") + 2, $anchorlineIn.Length - ($anchorlineIn.IndexOf("a>") + 2))
return $anchorLine
}
function insertLink {param ($linklineIn, $linkfolders)
$newLink = ''
foreach($f in $linkfolders){ $newLink = $newLink + $f + '/' }
$newLink = $newLink + $linkLineIn.Substring($linklineIn.LastIndexOf('(') + 1, $linklineIn.LastIndexOf('.md') - ($linklineIn.LastIndexOf('(') + 1))
$newLink = $newLink -replace '/', '_'
$linkLine = $linklineIn.Substring(0, $linklineIn.LastIndexOf('(') + 1) + $newLink + '.html)'
return $linkLine.toLower()
}
function processMDFile {param ($content, $database, $processFolderList, $fileName, $filler, $workingFolder)
$mdOutput = ''
$newLine = ''
$firstLine = $true
$prevBlank = $true
foreach($l in $content){
if ($l.IndexOf('####') -lt 0 -or $l.IndexOf('[Database]') -lt 0 -or $l.IndexOf('---') -ne 0){
if ($l.Trim().Length -eq 0 -and $prevBlank -eq $true){
# do nothing
elseif ($l.Trim().Length -eq 0){
$newLine = $l.Trim()
else {
$prevBlank = $false
$FirstLine = $false
$newLine = $l -replace $filler, ''
$newLine = processText $newLine
if ($newLine.IndexOf(".md") -ge 0) {
if ($newLine.Substring(0,9) -eq '[Project]'){
$newLine = projectNavigation $newLine $database $processFolderList
else {
if ($newLine.IndexOf("|") -eq 0){
$cells = $newLine.Split("|")
$newLine = ""
foreach($c in $cells){
$tempText = $c
if ($tempText.IndexOf(".md") -gt 0) {$tempText = insertLink $tempText $processFolderList}
if ($c.length -gt 0) {$newLine = $newLine + "| $temptext "}
$newLine = "$newLine |"
else {
$newLine = insertLink $newLine $processFolderList
if ($Firstline -eq $false -and $prevBlank -eq $false){
if ($newLine.Length -eq 0){$prevBlank = $true}
$mdOutput = $mdOutput + $newLine + "`r`n"
return $mdOutput
function projectNavigation{param ($projectLine, $projectDB, $sourceFolders)
$newPL = "> "
$p1 = $projectLine.IndexOf($projectDB)
do {
if ($p1 -gt -1){
$p2 = $projectLine.IndexOf("]", $p1)
# no link exists
if ($p1 -lt 0 ){
$p2 = $projectLine.LastIndexOf(')')
$newPL = $newPL + $projectLine.Substring($p2 + 3)
$p2 = -1
if ($p2 -gt -1){ $text = ($projectLine.Substring($p1 - 1, $p2 - ($p1 -2))).Trim() }
else { $text = ($projectLine.Substring($p1)).Trim() }
if ($text.Substring($text.Length - 1, 1) -eq ']'){
# potential link: parse and replace
$text = $text.substring(1) -replace ']', ''
$folderPosition = $sourceFolders.IndexOf(($text -replace " ", "_"))
$i = 0
$fPath = ''
do {
if ($i -gt 0){ $fPath = $fPath + '\' + $sourceFolders[$i] }
} while ($i -le $folderPosition)
$fileName = $projectLine.Substring($p2 + 2, $projectLine.IndexOf(")", $p2) - ($p2 + 2)) -replace '../', ''
$tPath = $startFolder.ToLower() + $fPath + '\' + $fileName.tolower()
if ((Test-Path $tPath) -eq $true){
$URL = $projectDB.toLower() + ($fPath.toLower() -replace '\\', '_') + '_' + ($fileName.ToLower() -replace ".md", ".html")
$newPL = $newPL + "[$text]($URL) > "
$tPath = $startFolder + $fPath + "\" + $sourceFolders[$i-1] + ".md"
if ((Test-Path $tPath) -eq $true){
$URL = $projectDB.toLower() + "_" + ($fPath.toLower() -replace '\\', '_') + '_' + ($sourceFolders[$i -1].toLower() -replace ".md", "") + ".html"
else { $newPL = $newPL + $text + " > " }
else{ $newPL = "$newPL $text" }
if ($p2 -gt -1){ $p1 = $projectLine.IndexOf("[", $p2) }
} while ($p2 -gt -1)
return $newPL
function processText {param($textInput)
if ($textInput.IndexOf("Images") -ge 0){ $textInput = removeImageFolder $textInput }
if ($textInput.IndexOf("<a name=""#") -ge 0){ $textInput = removeAnchor $textInput }
$textInput = $textInput -replace "#foreignkeys", "#foreign-keys" # fix foreign-key links
$textInput = $textInput -replace "_Dev", "" # remove _dev appended to db names
return $textInput
}
$databaseName = '' # the known name of the database
$sourceFolder = 'C:\' # path to the root of the SQL Doc export folder
$imageSource = $sourceFolder + 'Images'
$imageDestination = 'images\dbmodel'
$destination = 'pages\dbs\models\' + $databaseName
$fillString = 'JS_' # our development dbs have our initials appended to the front of the db name: we want to strip this out
# move image files
Get-ChildItem -Path $imageSource | Copy-Item -Destination $imageDestination -Recurse -Force
$files = Get-ChildItem $imageDestination
# jekyll is running in linux so case matters: lowercase all image file names
foreach($f in $files){
if ($f.Name -cne $f.Name.toLower()){
Rename-Item $f.FullName $f.Name.toLower()
}
}
#find root directory for just the database:
$startFolder = (Get-ChildItem -Path $sourceFolder -filter 'user_databases' -Recurse).FullName
$startFolder = (Get-ChildItem $startFolder | Where-Object {$_.Mode -eq 'd----'}).FullName
# get the files we will reformat
$files = Get-ChildItem -Path $startFolder -Recurse -Filter *.md
# clear out old files in case we are deleting pages from doc:
Get-ChildItem -Path $destination -Filter *.md | Remove-Item
# pull apart and rebuild files!
foreach ($f in $files){
$workingFile = $f.FullName
$file= $f.Name
$type = ($f.Directory).Name
$folderList = $workingFile.Substring($workingFile.ToUpper().LastIndexOf($databaseName.toUpper()), $workingFile.Length - $workingFile.toUpper().LastIndexOf($databaseName.toUpper()))
$folderList = $folderList -replace '.md', ''
$fileName = $folderList.Substring($folderlist.LastIndexOf('\') + 1).ToLower()
$folderList = $folderList.Substring(0, $folderList.LastIndexOf('\'))
$folderList = $folderList -replace '\\', ','
$folderList = $folderList -replace '_Dev', '' #account for possible _dev appended to db along with initials
$folderCollection = $folderList.Split(",")
$content = $null
$result = $null
# pull the content out of the working file:
$content = Get-Content -Path $workingFile
# process the content
$result = processMDFile $content $databaseName $folderCollection $fileName $fillString $startFolder
# ensure correct formatting of file names for linux
$link = ($folderList.ToLower() -replace ",", "_") + "_$fileName.html"
# set file 'topic'
if ($fileName -eq "index"){ $topic = $folderCollection[$folderCollection.Length - 1] }
else { $topic = $fileName }
# format the header for the new doc
$curHeader = createHeader $databaseName $link $topic
$result = $curHeader + $result
if ((Test-Path $destination) -eq $false){ New-Item -ItemType Directory -Force -Path $destination }
$outPath = $destination + "\" + ($link -replace ".html", ".md")
Set-Content -Path $outPath -Value $result -Force
}
#todo: update db_sidebar with link to documentation home
$sidebarOut = Get-Content _data/sidebars/db_sidebar.yml -Raw
$dbmodelList = Get-ChildItem pages/dbs/models/
$sidebarOut = $sidebarOut.Substring(0, $sidebarOut.LastIndexOf('folderitems:') + 12) + "`r`n`r`n"
foreach($i in $dbmodelList){
$dbName = $i.Name
$fileName = $i.Name.ToLower()
$newEntry = " - title: $dbName
url: /$fileName" + "_index.html
output: web, pdf`r`n"
$sidebarOut = $sidebarOut + $newEntry
}
$sidebarOut = $sidebarOut.Substring(0, $sidebarOut.Length - 1)
Set-Content -Path _data/sidebars/db_sidebar.yml -Value $sidebarOut -Force

Auditing Your SQL Instance on the Cheap

I had some great questions after my SQL 101 session at PASS Summit last week about auditing access and actions on SQL Server. I’ve recovered enough that I can now coherently share what I’ve set up for some free[1] rudimentary auditing of SQL Server.

We use Redgate’s SQL Monitor to keep tabs on our SQL environment. We started initially with this extended events session to detect database changes for a monitoring alert. We quickly realized that we probably wanted a record of what triggered the alert and made a few changes. Instead of just letting SQL Monitor query the extended events session we deployed tables to our local helper db to store any detected changes. We then updated the SQL Monitor alert to check the table itself rather than the extended events sessions directly for recent events. In addition to having a record of the changes stored for research (we clean up the records after a month or so) it also gave us the chance to tune them to be less noisy. For example we don’t particularly need to be alerted when our deployment process makes changes: we have a record of it if we need to determine what happened but generally we expect those changes and don’t need to hear about them (at least not from the monitoring process).
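
As a rough illustration, the alert check is little more than a query against that change-log table for recent rows, with the noisy sources filtered out. The table and column names below are invented for the example, not our actual schema:

/* Hypothetical sketch: count DDL changes captured in the last 10 minutes,
   ignoring the account our deployment process uses. */
SELECT COUNT(*) AS recentChanges
FROM dbo.DetectedDDLChanges
WHERE eventTime > DATEADD(MINUTE, -10, SYSDATETIME())
AND loginName <> 'DOMAIN\DeploymentSvc'; -- expected changes we don't need to hear about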

Once we’d done that it seemed natural to make the same modifications to a similar monitoring alert for security changes. Of course you don’t have to run a monitoring tool to use these (although it does generally make it easier). A local SQL agent job could be run to check the table and send email alerts etc. The other question that comes along with security events is “Who is supposed to have access?” So lastly we implemented a once a day poll of the instance and database principals that could serve as a point in time reference of who had access.

/* We collect the results daily and haven't been too concerned
about lost history if a particular instance fails occasionally */
CREATE TABLE ##userQuery (
serverName VARCHAR(255)
, databaseName VARCHAR(255)
, databaseRoleName VARCHAR(255)
, memberName VARCHAR(255)
, loginType VARCHAR(50)
, status VARCHAR(50)
);
/* dynamic SQL for collecting DB permissions for each instance using Aaron Bertrand's sp_foreachdb
but you'd be better off incorporating the new procedure: */
EXEC master.dbo.sp_foreachdb @command = N'USE [?];
INSERT INTO ##userQuery (serverName
, databaseName
, databaseRoleName
, memberName
, loginType
, status)
SELECT @@SERVERNAME AS serverName, DB_NAME() AS databaseName
, DP1.name AS DatabaseRoleName
, ISNULL(DP2.name, ''No members'') AS MemberName
, CASE DP2.type
WHEN ''U'' THEN ''WinLogin''
WHEN ''S'' THEN ''SQLLogin''
WHEN ''G'' THEN ''WinGroupLogin''
END AS LoginType
, CASE WHEN l.name IS NULL THEN ''Orphan'' ELSE ''N/A'' END AS [status]
FROM [?].sys.database_role_members AS DRM
RIGHT OUTER JOIN [?].sys.database_principals AS DP1 ON DRM.role_principal_id = DP1.principal_id
LEFT OUTER JOIN [?].sys.database_principals AS DP2 ON DRM.member_principal_id = DP2.principal_id
LEFT OUTER JOIN master.sys.syslogins l ON l.sid = DP2.sid
WHERE DP1.type = ''R''
AND DP2.name <> ''dbo''
AND DP2.type IN (''G'', ''U'', ''S'')'
, @exclude_list = N'tempdb,model'
, @print_dbname = 1;
/* union all the db permissions with the instance permissions
and stuff it into the local table */
DECLARE @theTime DATETIME2 = SYSDATETIME();
INSERT INTO dbo.secPoll (serverName
, databaseName
, databaseRoleName
, memberName
, loginType
, status
, dateDetected)
SELECT serverName
, databaseName
, databaseRoleName
, memberName
, loginType
, status
, @theTime AS dateDetected
FROM ##userQuery
WHERE memberName NOT LIKE '##%'
UNION ALL
SELECT @@SERVERNAME AS serverName
, 'N/A' AS databaseName
, CASE WHEN b.sysadmin = 1 THEN 'sysadmin'
WHEN b.securityadmin = 1 THEN 'securityadmin'
WHEN b.serveradmin = 1 THEN 'serveradmin'
WHEN b.setupadmin = 1 THEN 'setupadmin'
WHEN b.processadmin = 1 THEN 'processadmin'
WHEN b.diskadmin = 1 THEN 'diskadmin'
WHEN b.dbcreator = 1 THEN 'dbcreator'
WHEN b.bulkadmin = 1 THEN 'bulkadmin'
END AS ServerRole
, a.name AS MemberName
, CASE WHEN a.type_desc = 'SQL_LOGIN' THEN 'SQLLogin'
ELSE 'WinLogin'
END AS LoginType
, CASE WHEN a.is_disabled = 1 THEN 'Disabled'
WHEN a.is_disabled = 0 THEN 'Enabled'
WHEN a.is_disabled IS NULL THEN 'Unknown'
END AS Status
, @theTime AS dateDetected
FROM sys.server_principals AS a
JOIN master.dbo.syslogins AS b ON a.sid = b.sid
ORDER BY serverName
, memberName;
DROP TABLE ##userQuery;

The natural conclusion of gathering metrics is to drive informed action and decision making, so we started gathering up all the drift and security events via SSIS and plunking them into our central database so we could make them easily available for review by others. If we start to see an excessive amount of failed logins on a SQL instance we notify our info sec team. They can then review the security audit report for that SQL instance to get a sense of what is happening and whether they need to take any action (or if we need to tune our alert better). Similarly when they get requests to access data sources they can research who else has been granted permissions to the same instance or database, as well as check what other data sources the user might have access to.

This is all a good way to implement something with no budget but if auditing and/or security auditing specifically is something that is important to your organization then the next step is to start researching more robust tooling!

[1] You do get what you pay for.

Deploying the MSDTC in a Highly Secure Environment

This is just a fancy way of saying you need to be better friends with whoever is managing your enterprise firewall. I hadn’t had to touch the DTC until a recent vendor insisted their application wouldn’t work without it (despite their having only a single data store). The MSDTC was developed to coordinate transactions that span multiple machines and was originally introduced in SQL Server 2000.

In theory it’s not super complicated: just enable the DTC service/communication on the servers in question and turn on some built in firewall rules on the servers right? Almost.

First you need to enable the DTC service. This is located in the Component Services manager:

Screenshot of the Component Services Manager expanded out to show the Local DTC service.

Right click and open the Local DTC properties and select the Security tab:

Screenshot of the security properties tab for the Local DTC service.

We want to check the option Allow Network DTC Access as well as Allow Remote Clients. Under Transaction Manager Communication the best practice is to Require Mutual Authentication, as our machines should ideally be on the same domain and aware of each other. No Authentication Required is a holdover and should only be used if you meet certain (sad) conditions. Additionally you may want to use a service account to run the DTC. (In hindsight it may have made my enterprise firewall adventures less adventurous to have done this: it would have made creating the enterprise firewall rule much simpler: just allow the service account to do the things!)

Once that’s out of the way there are 3 inbound and 1 outbound Windows firewall rules that need to be enabled to allow DTC traffic. This can be done manually or with the following PowerShell command.

Enable-NetFirewallRule -DisplayGroup "Distributed Transaction Coordinator"

You’ll also need to set up a couple more rules to allow RPC calls to make it through the Windows Firewall. This page will walk you through those rules.

It’s likely that if you are in a highly secure environment you’ll need to ensure the DTC transactions run on predictable ports. This article walks you through the process of configuring the DTC service to run on a specific port range. You’ll need to ensure there are firewall rules in place (both local and at the enterprise level) to allow that traffic.

Once you think you have everything configured and your firewall rule request has been closed out by information security, it’s time to test that everything is working. Thankfully it’s PowerShell to the rescue again. The WinRM-based tooling provides a cmdlet for this (please note it must be run in a console running as admin):

Test-Dtc -LocalComputerName "$env:COMPUTERNAME" -RemoteComputerName "mytargetserver" -ResourceManagerPort [portnumber] -Verbose

Important notes from working my way through this: the WinRM service will be making HTTP calls on port 5985. Similarly the diagnostic tool starts a resource manager that defaults to running on port 0 unless you pass it a different option; I found it easiest to assign it a port at the end of the dynamic range I’d assigned to the DTC. The machines also need to be able to ping one another.

This week was not the most fun, but hopefully this saves someone (or even future me) some time down the road.

Other helpful links:

MS MSDTC documentation

Documentation of the Test-Dtc cmdlet

SQL 101: Query Basics

SQL is an acronym for Structured Query Language, so when getting started with it I think it’s best to think about how it was put together to be able to ask questions about data. With that in mind let’s take a look at some beginning SQL to understand how to write queries.

Queries can be very simple but they won’t answer very many questions in their most simple forms:

SELECT 'MyData';
/* Best practices of course would be to include the specific columns
in your select list so the SQL query engine can optimize the return
of the necessary data. Selecting * from the table is a lot like
eating an entire pizza because you wanted a single slice. */
SELECT *
FROM myTable;

While a table and filter are clearly not required to write a query, most useful queries will have 3[1] parts:

  • a SELECT-ion of things (columns)
  • FROM a collection of things (a table)
  • WHERE a criteria is met (filter(s))

I happen to have a database of movie data. Let’s explore some common filter operators by asking some questions about movies. Filters based on mathematical operators are the easiest to start with. We can ask for films with exactly 2 Oscar nominations or more than 4 or less than 6. We make these filters exclusive or inclusive by adding the “=” operator to the comparison. We can also exclude a specific number by using “<>” (Microsoft SQL Server will also support != but this is outside the ANSI standard so “<>” would be best practice as it will be usable in any version of SQL.)

/* films with exactly 2 oscar nominations */
SELECT name, releaseDate
FROM films
WHERE oscarNoms = 2;
/* films with more than 4 oscar nominations*/
SELECT name, releaseDate
FROM films
WHERE oscarNoms > 4;
/* films with 6 or less oscar nominations*/
SELECT name, releaseDate
FROM films
WHERE oscarNoms <= 6;
/* films with any number of oscar nominations except 3 */
SELECT name, releaseDate
FROM films
WHERE oscarNoms <> 3;

We can have multiple conditions in our WHERE clause and these are combined with AND or OR statements. We can ask for films with 1, 2 or 3 Oscar nominations (but we’ll also see we have a new operator IN that is cleaner and accomplishes the same goal). We can check for movies released on or after Jan 1, 2017 and were nominated for at least 1 Oscar. We can ask about movies released before the year 1950 that made more money than they spent.

/* What movies have been nominated for 1, 2 or 3 Oscars? */
SELECT name, releaseDate, oscarNoms
FROM films
/* if any one of the following conditions is true for a row it will be returned */
WHERE oscarNoms = 1
OR oscarNoms = 2
OR oscarNoms = 3;
/* These can more easily be written with the IN clause */
SELECT name, releaseDate
FROM films
WHERE oscarNoms IN (1, 2, 3); -- this is the same as the OR statement above
/* Oscar nominated movies released on or after Jan 1, 2017 */
SELECT name, releaseDate, oscarNoms
FROM films
/* both of the conditions on either side of the AND need to be true for the row to be valid */
WHERE releaseDate >= '2017-01-01' -- SQL will convert this string into a date YYYY-MM-DD is the easiest way to write this
AND OscarNoms >= 1;
/* we can even compare column values in rows to each other */
SELECT name, releaseDate
FROM films
WHERE releaseDate < '1950-01-01'
AND boxOffice > budget;
/* Bonus query!
You can combine and separate WHERE clauses with parentheses;
these can be used to build more complex statements for
returning rows: essentially you are working to return a true/false
statement for any given row (also called a tuple) */
SELECT name, releaseDate
FROM films
WHERE ( /* this outer statement is true if either of the two
parentheticals below are true */
(oscarNoms > 2) OR
(releaseDate > '1980-12-31' AND budget >= 100000 )
) /* if the previous was true AND the box office was less
than $500,000 then the row can be returned */
AND boxOffice < 500000;

If you’re just getting started with writing SQL queries I hope this was helpful! Next up I’ll cover a few tips for working with dates as well as specific operators for filtering strings.

[1] It’s important to note that SQL does not guarantee any sort of order in your results and thus the last part of a query is generally an ORDER BY statement that tells SQL what order to put the results in.
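
For example, tacking an ORDER BY onto one of the queries above looks like this (illustrative, using the same films table):

/* return the most recent films first */
SELECT name, releaseDate
FROM films
WHERE oscarNoms > 4
ORDER BY releaseDate DESC;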

Finding Unused Databases on Your SQL Instance

We’ve been re-arranging some deck chairs at work (deploying a new backup tool in our SQL environment). As we’ve worked our way through that we’ve had to update or re-engineer some of our processes for automatic restores to use the new tool. We’re doing one restore though where no one could recall who actually used the restored database. After asking around for a month we didn’t have any better idea of whether the database was actually used or not. Being a risk-averse (not to mention polite) group we wanted to be really sure that no one was using it. Thus the inspiration for this post was born: we needed a way to audit database access and confidently say one way or the other whether our mystery database was in use.

We needed something quick and lightweight, and after a quick spin through the SQLHelp channel on Slack I’d been pointed in the direction of extended events. SQL combined with XML: my forever nemesis. Thanks to some help from Thomas Schutte (b), particularly his blog post on getting started with XML in SQL, I got there. So here is some useful code if you need to audit database use on a server.

First we need a simple extended event session. I picked the sql_batch_completed event so I knew I would catch users as they changed database context on the server and executed code. I kept the data just in the ring buffer since I’ll be checking it often to pull out relevant data and storing it in a table.

CREATE EVENT SESSION [UserContextAudit] ON SERVER
ADD EVENT sqlserver.sql_batch_completed
( ACTION ( sqlserver.database_id
, sqlserver.database_name
, sqlserver.session_id
, sqlserver.session_nt_username
, sqlserver.sql_text
, sqlserver.username )
WHERE sqlserver.session_id > 100 )
ADD TARGET package0.ring_buffer;


Generating extended events is easy peasy, especially when using the wizard in SSMS. The terrible (for me) part is parsing the results from XE, which are stored as XML. I got about 65% of the way there but struggled to get the data points I needed out of the XML. Thomas’s post above pointed me in the direction of the second cross apply I needed to get the XML nodes I was missing into my temp table. Once you have a good query to extract the data from the XE session you just dump the results into a table in your database. I’m running this script once a minute via a SQL Agent job to preserve the information.



DECLARE @XML XML;

SELECT @XML = ( SELECT TOP 1 CAST(xet.target_data AS XML) AS XEData
FROM sys.dm_xe_session_targets AS xet
JOIN sys.dm_xe_sessions AS xe ON ( xe.address = xet.event_session_address )
WHERE xe.name = 'UserContextAudit'
AND xet.target_name = 'ring_buffer' );

SELECT ce.event_node.value('@timestamp [1]', 'DATETIME2') AS EventDate
, ca.action_node.value('@name [1]', 'VARCHAR(200)') AS ActionName
, ca.action_node.value('(value/text())[1]', 'VARCHAR(MAX)') AS ActionValue
INTO #temp
FROM ( VALUES ( @XML )) xx ( XMLData )
CROSS APPLY xx.XMLData.nodes('//RingBufferTarget/event') ce(event_node)
CROSS APPLY ce.event_node.nodes('./action') ca(action_node);

SELECT a.EventDate
, a.ActionValue AS username
, b.ActionValue AS databasename
INTO #temp2
FROM #temp a
JOIN #temp b ON b.EventDate = a.EventDate
JOIN #temp c ON c.EventDate = b.EventDate
WHERE a.ActionName = 'username'
AND b.ActionName = 'database_name'
AND c.ActionName = 'sql_text';

UPDATE upd
SET lastdate = d.EventDate
FROM AuditDB.audit.usertracking upd
JOIN #temp2 d ON d.username = upd.username
AND upd.databasename = d.databasename
WHERE upd.lastdate < d.EventDate;

INSERT INTO SQLMgmtD.audit.usertracking ( username
, databasename
, lastdate )
SELECT username
, databasename
, MIN(EventDate)
FROM #temp2 i
WHERE NOT EXISTS ( SELECT 1
FROM Audit.audit.usertracking a
WHERE a.username = i.username
AND a.databasename = i.databasename )
GROUP BY username
, databasename;

DROP TABLE #temp2;

Run that XE session and SQL Agent collector job for a month or so and you should have a solid idea of which databases are actually being utilized on your server. There is a non-zero chance that a database might only be referenced in cross-database queries, but if you think that’s possible the sql_text from the XE session could be analyzed to look for and pull apart three-part names for databases referenced this way.
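
If you want to go down that road, a quick-and-dirty approach (just a sketch, reusing the #temp table from the collector script above and assuming a database named MysteryDB) would be something like:

/* Look for three-part name references to a specific database in the captured batches. */
SELECT EventDate
, ActionValue AS sql_text
FROM #temp
WHERE ActionName = 'sql_text'
AND ActionValue LIKE '%MysteryDB.%'; -- e.g. MysteryDB.dbo.SomeTable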

Maintaining Table Data with Partitions

Recently I was reviewing the growth of some of our third party vendor databases and discovered that one vendor had these handy, enormous audit tables that hadn’t been touched since they were created. Records were constantly being added and as a double surprise (SURPRISE) there was no way via the application to ensure these tables were getting the cleanup they needed. After some back and forth (triple SURPRISE) it fell to us to manage this data. We met with the business owners and determined a reasonable retention policy and set out to create some (Relatively) simple SQL agent jobs that would clean up the data nightly and keep things tidy.
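
Those cleanup jobs were essentially scheduled batch deletes, roughly this shape (a sketch with made-up table, column and retention values, not the vendor's actual schema):

/* Delete in small batches so locks stay short and the log doesn't balloon. */
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (5000)
    FROM dbo.VendorAuditTable
    WHERE AuditDate < DATEADD(MONTH, -13, GETDATE()); -- agreed retention window
    SET @rows = @@ROWCOUNT;
END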

The single biggest problem we ran into was that, being audit tables, they were heavily used, and deleting records potentially caused the application to grind to a halt while SQL took out locks to remove records (especially in the early days as we tried to purge the really old records). Eventually we got most of the purging working through proper scheduling and batch deleting. But one table held out as a problem child. The solution that worked on all the other tables simply could not delete records faster than they were being created (without grinding the system to a standstill). Enter our last and only hope: partition switching! We had to schedule some downtime to rebuild the table, but once the table was rebuilt we were able to remove records from the table months at a time into staging tables which we could drop as soon as we switched the data into them. And best of all, the blocking (if any) caused by the schema lock from the partition switch was a once a month deal! Sample code follows.


*ahem* Now that that’s out the way:

First you’ll need to create a partition function and partition scheme to rebuild your table across. Build your partition function first and then your partition scheme. With partition functions make sure to create empty partitions on either side of the series that will hold no data (if at all possible). The partition scheme below keeps the table all on the same file group (primary) but can be altered if needed. In our case we are changing a vendor database as little as possible!

USE [VeryBusyDatabase];
CREATE PARTITION FUNCTION [pf_MonthlyWindow] ( DATETIME2(7)) -- This data type must match your partitioning column data type exactly!
AS RANGE RIGHT FOR VALUES ( N'1900-01-01T00:00:00.000'
, N'2016-08-01T00:00:00.000'
, N'2016-09-01T00:00:00.000'
, N'2016-10-01T00:00:00.000'
, N'2016-11-01T00:00:00.000'
, N'2016-12-01T00:00:00.000'
, N'2017-01-01T00:00:00.000'
, N'2017-02-01T00:00:00.000'
, N'2017-03-01T00:00:00.000'
, N'2017-04-01T00:00:00.000'
, N'2017-05-01T00:00:00.000'
, N'2017-06-01T00:00:00.000'
, N'2017-07-01T00:00:00.000'
, N'2017-08-01T00:00:00.000'
, N'2017-09-01T00:00:00.000'
, N'2017-10-01T00:00:00.000'
, N'2017-11-01T00:00:00.000'
, N'2017-12-01T00:00:00.000'
, N'2018-01-01T00:00:00.000'
, N'2115-01-01T00:00:00.000' );
CREATE PARTITION SCHEME [ps_MonthlyWindow]
AS PARTITION [pf_MonthlyWindow]
ALL TO ( [PRIMARY] );

Once your partition scheme and function are built it’s time to schedule a maintenance window: The table will be offline while you rebuild it across the partition. Additionally any other indexes will need to be rebuilt to include the partitioning column and built across the partition as well or partition switching will not work. Thankfully my super busy table only has a primary key. I’ll drop it and recreate it on the partition:

ALTER TABLE dbo.superbusytable DROP CONSTRAINT PK_BestKeyEver;
ALTER TABLE dbo.superbusytable
ADD CONSTRAINT PK_BestKeyEver
PRIMARY KEY CLUSTERED (id, insertdate ) ON ps_MonthlyWindow(insertdate); -- insertdate is an ever increasing date, we'll never be inserting "old" records
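
If your table has other nonclustered indexes they need the same treatment: add the partitioning column and rebuild them on the partition scheme so everything stays aligned. A hypothetical example (the index and column names are invented):

CREATE NONCLUSTERED INDEX IX_superbusytable_userid
ON dbo.superbusytable (userid, insertdate)
ON ps_MonthlyWindow(insertdate); -- aligned with the partition scheme so switching still works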

Voila! Now we can bring the application back online and allow it to keep adding an ungodly number of audit records to the table. In the SQL Agent we add a job that has three job steps. The first adds future partitions to the partition function and partition scheme so we always have some empty partitions on that end of the structure. The second finds old partitions, switches them out and then drops the data. The third step removes old, empty partitions from the partition scheme and function. But before that you’ll need to create a table with the same index structure as your partitioned table (sans partitioning). Every index on the partitioned table must be recreated on the secondary table where we’ll actually truncate the data. I cheat and create the table by selecting the top 1 records from my partitioned table, truncating it and then rebuilding the needed indexes on top of that.
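
That "cheat" looks roughly like this (sketch only; dbo.truncatepartition is the staging table the step 2 script below switches into, and the key columns match the primary key created above):

/* Clone the structure of the partitioned table, empty it, then add the matching primary key
   (on the same filegroup, but NOT on the partition scheme). */
SELECT TOP (1) *
INTO dbo.truncatepartition
FROM dbo.superbusytable;
TRUNCATE TABLE dbo.truncatepartition;
ALTER TABLE dbo.truncatepartition
ADD CONSTRAINT PK_truncatepartition PRIMARY KEY CLUSTERED (id, insertdate) ON [PRIMARY];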

Step 1 dynamically adding new partitions:

/* Split upcoming partitions (work up to 3 months ahead of current month) */
DECLARE @nextPart DATETIME2
, @curDate DATE = DATEADD(MONTH, 3, GETDATE()); -- we'll extend our partitions out to this date
SELECT TOP 1 @nextPart = DATEADD(MONTH, 1, CAST(sprv.value AS DATETIME2))
FROM sys.partition_functions AS spf
INNER JOIN sys.partition_range_values sprv ON sprv.function_id = spf.function_id
WHERE spf.name = N'pf_MonthlyWindow'
ORDER BY sprv.boundary_id DESC;
/* ensure empty partitions exist to the right to
prevent long processing times to split partition range */
WHILE ( @nextPart <= @curDate )
BEGIN
ALTER PARTITION SCHEME ps_MonthlyWindow NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_MonthlyWindow() SPLIT RANGE (@nextPart);
PRINT 'Added ' + CONVERT(VARCHAR(30), @nextPart, 120) + ' to pf_MonthlyWindow.';
SELECT @nextPart = DATEADD(MONTH, 1, @nextPart);
END

Step 2 Switch out old partitions:

/* Script to identify the oldest partition w/ data and swap it out
1. identify partition
2. script out target table (1 time?)
3. ALTER TABLE audit.ServiceUserAction SWITCH PARTITION # TO
4. truncate table */
DECLARE @part_num INT
, @sqlcmd VARCHAR(MAX)
, @rngValue VARCHAR(50);
WHILE EXISTS ( SELECT DISTINCT sp.partition_number
, rows
, CAST(sprv.value AS DATETIME2)
FROM sys.partitions sp
INNER JOIN sys.indexes si ON si.index_id = sp.index_id
AND si.[object_id] = sp.[object_id]
INNER JOIN sys.data_spaces sds ON sds.data_space_id = si.data_space_id
INNER JOIN sys.partition_schemes sps ON sps.data_space_id = sds.data_space_id
INNER JOIN sys.partition_functions spf ON spf.function_id = sps.function_id
/* Join partition range values (RANGE Right means p number - 1 = boundary ID */
LEFT OUTER JOIN sys.partition_range_values sprv ON sprv.function_id = spf.function_id
AND sprv.boundary_id = sp.partition_number - 1
WHERE spf.name = N'pf_MonthlyWindow'
AND sp.rows > 0 )
BEGIN
SELECT TOP 1 @part_num = sp.partition_number
, @rngValue = CONVERT(VARCHAR(30)
, CAST(sprv.value AS DATETIME2), 120)
FROM sys.partitions sp
INNER JOIN sys.indexes si ON si.index_id = sp.index_id
AND si.[object_id] = sp.[object_id]
INNER JOIN sys.data_spaces sds ON sds.data_space_id = si.data_space_id
INNER JOIN sys.partition_schemes sps ON sps.data_space_id = sds.data_space_id
INNER JOIN sys.partition_functions spf ON spf.function_id = sps.function_id
/* Join partition range values (RANGE Right means p number - 1 = boundary ID */
LEFT OUTER JOIN sys.partition_range_values sprv ON sprv.function_id = spf.function_id
AND sprv.boundary_id = sp.partition_number - 1
WHERE spf.name = N'pf_MonthlyWindow'
AND sp.rows > 0
AND CAST(sprv.value AS DATETIME2) < DATEADD(MONTH, -13, GETDATE()) -- 1 year was our agreed upon retention date
ORDER BY sp.partition_number;
SELECT @sqlcmd = 'ALTER TABLE dbo.verybusytable SWITCH PARTITION '
+ CAST(@part_num AS VARCHAR(3)) + ' TO dbo.truncatepartition;';
PRINT @sqlcmd;
PRINT 'Merged range value: ' + @rngValue + '.';
EXEC (@sqlcmd);
/* kill swapped out records: */
TRUNCATE TABLE dbo.truncatepartition;
END

And lastly removing old crusty partitions:

/* Script to remove empty partitions older than 15 months
(except partition 1 because we need at least one empty partition) */
DECLARE @mergedate DATETIME2
, @sqlcmd VARCHAR(MAX);
CREATE TABLE #mergedates ( m_date DATETIME2 );
INSERT INTO #mergedates ( m_date )
SELECT CAST(sprv.value AS DATETIME2)
FROM sys.partition_functions AS spf
INNER JOIN sys.partition_range_values sprv ON sprv.function_id = spf.function_id
WHERE spf.name = N'pf_MonthlyWindow'
AND sprv.boundary_id > 1
AND CAST(sprv.value AS DATETIME2) < DATEADD(MONTH, -15, GETDATE());
DECLARE curr_dates CURSOR FOR
SELECT m_date
FROM #mergedates
ORDER BY m_date ASC;
OPEN curr_dates;
FETCH NEXT FROM curr_dates INTO @mergedate;
WHILE @@FETCH_STATUS = 0
BEGIN
/* merge old partitions */
SELECT @sqlcmd = 'ALTER PARTITION FUNCTION pf_MonthlyWindow() MERGE RANGE ('''
+ CONVERT(VARCHAR(30), @mergedate, 120) + ''');';
PRINT @sqlcmd;
EXEC (@sqlcmd);
FETCH NEXT FROM curr_dates INTO @mergedate;
END
CLOSE curr_dates;
DEALLOCATE curr_dates;
DROP TABLE #mergedates;

If you’ve got out-of-control tables that no one else is willing to tend to, this will do the trick. Remember, though, that altering your vendor database without their consent may void your support contract, among any number of other considerations. As with anything else you find on the internet: test it and see if it works for you before deploying to production. A huge hat tip to Kendra Little (b|t) whose introduction to partitioning made any of this possible for my own problem database. I hope that if nothing else I’ve added another avenue for someone else to find her great breakdown of partitioning.

Partial Backups With Dell Litespeed

We currently use Dell’s Litespeed for SQL Server for our SQL backup compression and encryption needs. It has an intuitive GUI, manages log shipping and we are generally pretty happy with it. I’ve got a couple of problem databases where I don’t want the vanilla out of the box full backups. A couple of our data warehouse dbs are now over a terabyte in size and continuing to grow. We’re implementing partitioning to help with index and statistics maintenance as well as backups. The down side is that there is no GUI for partial backups with Litespeed so I’ve had to fumble through it with the help of their support. Here then are the working scripts for taking partial backups (and restores) of databases using the Litespeed stored procedures. Our database is in simple recovery mode and we’ll be backing up a read-only file group as well as all our active files.

First back up your read-only file group. You only need to back this up once. Or at least until an end user has you set it to read-write and updates records. When that happens, be sure to set the file group back to read-only and then back it up again.

EXEC master.dbo.xp_backup_database @database = '<database name>',
@filename = 'Y:\SQL_BACKUPS\<name of backup file>', @filegroup = '<filegroup to backup>'
, @compressionlevel = 4, @cryptlevel = 8, @encryptionkey = N'IMMAENCRYPTCHYU'
, @verify = 1, @returndetails = 1, @init = 1, @with = 'CHECKSUM';
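
For completeness, flipping the file group between read-write and read-only (before re-running the file group backup above) is plain T-SQL; a sketch with placeholder names:

/* Requires no other connections using the database while the state changes. */
ALTER DATABASE [<database name>] MODIFY FILEGROUP [<filegroup to backup>] READ_WRITE;
-- ... end user updates their records ...
ALTER DATABASE [<database name>] MODIFY FILEGROUP [<filegroup to backup>] READ_ONLY;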

The xp_backup_database also has a @read_write_filegroups parameter we’ll use later. It’s important to note that if you include it here with a value of 0 (which seems an intuitive thing to do) you will essentially take a full backup rather than a backup of just the intended filegroup. Including @returndetails is optional but will instruct Litespeed to return a row of information about the backup as it was processed.

Once you have your read-only files backed up you can back up the read-write file groups daily (or hourly or whatever works for your SLA) with the following code. Notice we’ve removed the specific filegroup parameter and added the @read_write_filegroups parameter.

EXEC master.dbo.xp_backup_database @database = '<database name>', @filename = 'Y:\SQL_BACKUPS\<name of backup file>'
, @compressionlevel = 4, @cryptlevel = 8, @encryptionkey = N'IMMAENCRYPTCHYU'
, @read_write_filegroups = 1, @verify = 1, @returndetails = 1, @init = 1, @with = 'CHECKSUM';

Once you’ve taken a full backup you can do differential backups of the read-write file groups as well by including an additional @with:

EXEC master.dbo.xp_backup_database @database = '', @filename = 'Y:\SQL_BACKUPS\'
, @compressionlevel = 4, @cryptlevel = 8, @encryptionkey = N'IMMAENCRYPTCHYU'
, @read_write_filegroups = 1, @verify = 1, @returndetails = 1
, @init = 1, @with = 'CHECKSUM', @with = 'DIFFERENTIAL';

Restore scripts look very similar to the backup scripts. If you do not have differential backups of your read-write file groups change ‘NORECOVERY’ to ‘RECOVERY.’

EXEC master.dbo.xp_restore_database @database = '', @filename = ''
, @encryptionkey = N'IMMAENCRYPTCHYU', @read_write_filegroups = 1
, @with = 'NORECOVERY', @returndetails = 1;

Restoring a read-write differential backup (RECOVERY should be changed to NORECOVERY if there are multiple differentials to apply):

EXEC master.dbo.xp_restore_database @database = '', @filename = ''
, @encryptionkey = N'IMMAENCRYPTCHYU', @read_write_filegroups = 1
, @with = 'RECOVERY', @returndetails = 1;

Finally read-only file groups can be restored after the database has been brought online after restoring all backups of read-write file groups:

EXEC master.dbo.xp_restore_database @database = '', @filename = ''
, @filegroup = '', @encryptionkey = N'IMMAENCRYPTCHYU', @with = 'RECOVERY'
, @with = 'MOVE '''' TO ''''', @returndetails = 1

Updating Tables with Read-Only Partitions

I learned an important thing today. I’ve started partitioning our data warehouse and we’ve moved into our testing phase. The partitions are in place, the archive is read-only and once a month the automated partition maintenance kicks off. It all seemed to be going swimmingly until…

A couple of times our data warehouse folks have found that an update to a record on the read-write partition would fail with the following error:

Msg 652, Level 16, State 1, Line 1 The index “PK_STG_PHX_LN_ACCT_INT_OPT” for table “dbo.STG_PHX_LN_ACCT_INT_OPT” (RowsetId 72057594415022080) resides on a read-only filegroup (“ARCHIVE”), which cannot be modified.

Generated by this lovely ETL software generated SQL:

UPDATE dbo.STG_PHX_LN_ACCT_INT_OPT
SET ETL_JOB_SEQ_NUM = 1027140 ,
ETL_IUD_CD = 'D' ,
ETL_UPD_DTM = CONVERT(DATETIME, '03/16/2016 01:01:17')
WHERE ETL_VERS_START_DTM = CONVERT(DATETIME, '12/27/2014 00:26:24.137')
AND ACCT_NO = '5001194157';

The table is partitioned on a column (ETL_VERS_END_DTM) that indicates whether the record is current, so *all* records being updated are expected to be on one end of the partitioned table. Since the partitioning column was not included in the WHERE clause of the update statement, SQL generated an execution plan indicating that the update could conceivably affect rows in read-only partitions of the table, and then failed.

The fix is simple: include the partitioning column in the where clause. In our case we know that records being updated will always have a datetime value of ‘99990101 00:00:00.000’ (because consultants) and they are partitioned accordingly.

UPDATE dbo.STG_PHX_LN_ACCT_INT_OPT
SET ETL_JOB_SEQ_NUM = 1027140 ,
ETL_IUD_CD = 'D' ,
ETL_UPD_DTM = CONVERT(DATETIME, '03/16/2016 01:01:17')
WHERE ETL_VERS_START_DTM = CONVERT(DATETIME, '12/27/2014 00:26:24.137')
AND ACCT_NO = '5001194157'
AND ETL_VERS_END_DTM > '2115-01-01';

And SQL is able to determine the query will not touch a read-only partition (and since the partitioning column value is not being updated there is no risk of a row moving to a read-only partition).

In a strange case of SQL magic (otherwise known as SQL internals): setting ETL_VERS_END_DTM to the expected value (i.e. its current value) but not including it in the where clause also produces an execution plan that allows SQL to proceed and succeed, despite not eliminating the possibility that the row exists in the read-only partition before it begins. I’ll update this later with a link to any explanation that I find as I try to wrap my head around it.

UPDATE: In the simplest terms (for folks like me): the query originally ended up with a shared rowset for both the read and the update, so when it encountered the read-only partition SQL had to bail. Including the end date in the WHERE clause prevents SQL from using a shared rowset under the covers, so when it gets to the update portion its rowset includes only rows from the read-write partitions. A link to Paul White’s post on how rowset sharing causes the error is here.

Many thanks to Aaron Bertrand (b|t) and Paul White (b|t) for helping me to understand what was going on over on Stack Exchange. Not only was I able to wrap my head around some non-intuitive internal processes but it reaffirmed why I think the SQL Family is so great.

Starting Big

I’ve long imagined that my first blog post would be about the importance and scheduling of backups (very) or maintaining SLAs (very important, and they should be reviewed at least once a year). Instead I believe I am going to eat the proverbial elephant by blogging my process for slicing and dicing my company’s shiny new data warehouse into maintainable pieces.

About a year ago our onsite consultants started to come online and build out our structure. I was only tangentially attached to the project in that they knew they’d need some SQL servers at some point. I started to do some research into VLDBs (while it started out “small,” the new policy was to delete nothing ever again) and quickly discovered the usefulness of partitioning (both for maintenance tasks and ETL processes). I of course forwarded on the links and after a brief discussion was told that it could be addressed later. In my naivete of course this sounded reasonable and I continued on with the million other projects I was eyeball deep in.

Woe to you who is not sure enough to stand your ground! Nine months later I learned that I’d be dissecting their largest databases table by table to get even a reasonable maintenance plan going. After a mere nine months backups were already up to two hours and it was time to prove that a partitioned database could alleviate the inevitable backup and index maintenance apocalypse.

So that’s where this series is starting: me, my planning and then table by table through our two largest databases (ps I won’t be posting table by table).

Cartoon of a charging elephant

I am beginning to rethink my methodology for eating an elephant…

SQL Saturday #446 & RedGate SQL in the City

So I’m missing out on the Summit 15 keynote this morning as I was sadly paged last night. Between that and the first timers meetup and welcome reception I was completely unable to get up at an appropriate time today. While my work colleague gets ready I wanted to jot down a few impressions from the last few days.

SQL Saturday #446: Portland, OR

This was my third SQL Saturday in Portland and the event continues to be fantastic for a number of reasons. It’s right before Summit, so a number of Summit speakers present in Portland the Saturday before, which means you can open up your Summit schedule if you get in on those sessions early. It’s a great networking event (I think there were around 400 SQL Family that showed up?). I attended a great session on auditing in SQL Server from Colleen Morrow (b|t). The really short and skinny: in SQL 2008 Standard you can only audit at the instance level, but in 2008 Enterprise and SQL 2012 and above you can drill down into specific databases with your auditing. It all makes use of extended events so it’s lightweight. It’s configurable in filtering, retention and handling of auditing errors (lose the audit info, cancel the transaction that triggered it, etc.). She also shared a plan for centralizing the collection and ingestion of the audit logs for reporting purposes. Scripts and slide deck here.

SQL in the City

This was my second SQL in the City and it was even better than my previous experience: new products like ReadyRoll and InstantClone really speak to some pain points at our organization and I’m looking forward to seeing whether we can fully implement them to make it all better. Or at least mostly better. Or at least version our databases: I hear that’s a good start!

I’m super excited and it’s time to hop on a train to start hitting Summit sessions. If you found this post and found it useful let me know!