What’s a Data Catalog? Denver Edition

I had the wonderful opportunity to present an introduction to data catalogs to the Denver SQL Users Group yesterday. I am grateful for the chance to work on my presenting skills and talk about the new challenges about identifying and protecting risky data. I’ve uploaded the slide deck and some sample PowerShell scripts for SQL Data Catalog to my git repo here. Additionally there was a question about re-using Data Catalog work for data masking: Richard Macaskill’s blog post I mentioned is here. Thanks Denver: I am looking forward to coming back and visiting again!

SQL 101: What Happens When…

I had an interesting question asked this last Saturday when I was giving my Intro to SQL session at SQL Saturday Redmond: “What happens if you cast a NVARCHAR string as  VARCHAR?”

I had an educated guess but I didn’t have the time during the session to test it out: Casting NVARCHAR to VARCHAR. It won’t throw an error but it will destroy any data in the string that isn’t VARCHAR compatible. Try out the test below yourself!

/* Preface the string with N or
SQL will pre-emptively destroy
the data before storing the string */
DECLARE @nv AS NVARCHAR(10) = N'╥«┘{║‼℈≠'
, @v AS VARCHAR(10);
/* And by reassigning we can see
the data is permanently lost */
SELECT @v = @nv;

The question was asked initially because she had found a place where she had to convert a uniqueID to a string and was concerned that perhaps this could cause issues. The uniqueidentifier is a series of hexadecimal numbers concantenated together with hyphens (for example: 851CC794-E0CB-49DC-976F-1687F62D2188). They will always be 36 “characters” long. The good news is that according to BOL the uniqueidentifier is consider a character type for the purposes of conversion. Attempting to cast a string that is longer than 36 characters will truncate the original string to the first 36 before it attempts the conversion. Conversely casting a uniqueidentifier to a string datatype will result in a 36 character string (unless you specify less than 36 characters in which case SQL will throw an error).

Presenting Away!

I’ll be presenting my introduction to SQL queries in just a few short days at SQL Saturday Redmond! If you know somebody that’s just getting started with SQL and is wants an introduction to how to query and join tables: have I got a session for you! (It’s this one.) This will be my first time presenting away from my home turf of Spokane so I’m only a little nervous about it: I hope I see you there!

SQL 101: Query Basics

SQL is an acronym for Structured Query Language so when getting started with it I think it’s best to think about how it was put together to be able to ask questions about data. With that in mind let’s a take a look at some beginning SQL to understand how to write queries.

Queries can be very simple but they won’t answer very many questions in their most simple forms:

SELECT 'MyData';
Best practices of course would be to include the specific columns
in your select list so the SQL query engine can optimize the return
of the necessary data. Selecting * from the table is a lot like
eating an entire pizza because you wanted a single slice.
FROM myTable;
view raw sqlquery1.sql hosted with ❤ by GitHub

Will not a table and filter are clearly not required to write a query, the most common use for writing a query will require 3[1] parts to be useful:

  • a SELECT-ion of things (columns)
  • FROM a a collection of things (a table)
  • WHERE a criteria is met (filter(s))

I happen to have a database of movie data. Let’s explore some common filter operators by asking some questions about movies. Filters based on mathematical operators are the easiest to start with. We can ask for films with exactly 2 Oscar nominations or more than 4 or less than 6. We make these filters exclusive or inclusive by adding the “=” operator to the comparison. We can also exclude a specific number by using “<>” (Microsoft SQL Server will also support != but this is outside the ANSI standard so “<>” would be best practice as it will be usable in any version of SQL.)

/* films with exactly 2 oscar nominations */
SELECT name, releaseDate
FROM films
WHERE oscarNoms = 2;
/* films with more than 4 oscar nominations*/
SELECT name, releaseDate
FROM films
WHERE oscarNoms > 4;
/* films with 6 or less oscar nominations*/
SELECT name, releaseDate
FROM films
WHERE oscarNoms <= 6;
/* films with any number of oscar nominations except 3 */
SELECT name, releaesDate
FROM films
WHERE oscarNoms <> 3;

We can have multiple conditions in our WHERE clause and these are combined with AND or OR statements. We can ask for films with 1, 2 or 3 Oscar nominations (but we’ll also see we have a new operator IN that is cleaner and accomplishes the same goal). We can check for movies released on or after Jan 1, 2017 and were nominated for at least 1 Oscar. We can ask about movies released before the year 1950 that made more money than they spent.

/* What movies have been nominated for 1, 2 or 3 Oscars? */
SELECT name, releaseDate, oscarNoms
FROM films
/* if any one of the following conditions is true for a row it will be returned */
WHERE oscarNoms = 1
OR oscarNoms = 2
OR oscarNoms = 3;
/* These can more easily be written with the IN clause */
SELECT name releasedDate
FROM films
WHERE oscarNoms IN (1, 2, 3); -- this is the same as the OR statement above
/* Oscar nominated moveis released on or after Jan 1, 2017 */
SELECT name, releaseDate, oscarNoms
FROM films
/* both of the conditions on either side of the AND need to be true for the row to be valid */
WHERE releaseDate >= '2017-01-01' -- SQL will convert this string into a date YYYY-MM-DD is the easiest way to write this
AND OscarNoms >= 1;
/* we can even compare column values in rows to each other */
SELECT name, releaseDate
FROM films
WHERE releaseDate < '1950-01-01'
AND boxOffice > budget;
/* Bonus query!
You can combine and seperate WHERE clauses with parenthesis
these can be used to have more complex statements for
returning rows: essentially you working to return a true/false
statment for returning any given row (also called tuples)
SELECT name, releaseDate
FROM films
WHERE ( /* this outer statement is true if either of the two
parentheticals below are true */
(oscarNoms > 2) OR
(releaseDate > '1980-12-31' AND budget >= 100000 )
) /* if the previous was true AND the box office was less
than $500,000 then the row can be returned */
AND boxOffice < 500000;
view raw sqlqueryorderby.sql hosted with ❤ by GitHub

If you’re just getting started with writing SQL queries I hope this was helpful! Next up I’ll cover a few tips for working with dates as well as specific operators for filtering strings.

[1] It’s important to note that SQL does not guarantee any sort of order in your results and thus the last part of a query is generally an ORDER BY statement that tells SQL what order to put the results in.

Data Privacy Is Coming For You

I’ve been thinking a lot about GDPR and similar legislation for the last year or so. I think that codifying how companies treat our data (and what they can and can’t do it with) is a huge step in the right direction. Late in 2018 I was happy to find myself working with Redgate on their SQL Data Catalog early access program as we started looking at ways to attach and track metadata about our data in our environment. Last month Washington state (where I currently reside and work) proposed it’s own GDPR like legislation. Since we aren’t likely to get any federal level legislation anytime soon it’s most likely the US DBAs will have to learn about a variety of laws and investigate (along with a legal team) what they actually mean and if they impact their organization.

In the mean time now we can still a jump start on collecting the data we need to meet what is sure the bare minimum requirement of these laws: getting a handle on what you actually have in a way that is accessible and reportable. Also reviewing your organizations practices for cleaning data in dev and test environments. This means generating test data or masking/altering production data to remove/replace sensitive information. If digging into market trends to see what other organizations are doing is your thing Redgate is making the 2018 Data Masking Market Guide available for your perusal here. I’m really enjoying my foray into these questions! Have you started trying to solve these problems in your org? What have you run into that you weren’t expecting when you started?

Friend of Redgate!

forg_2019_150So I tried a very new thing (for me anyway) and after a fair amount of dithering about it I told the little naysayer that lives in the back of my head to shut the hell up and applied for the Friend of Redgate program at the end of December.

I’m writing about it so of course you may easily guess the outcome: I was accepted into the program! I really am a huge fan of their tools ever since I first found SQL Compare 5 years ago (to the month almost now that I think about it) after a ridiculous manual deployment. In the time since I’ve deployed and learned to use more of their tools. They have created a wonderful community to help support the SQL community at large and I am looking forward to working hard to contribute as well! Currently my biggest engagement is with their new SQL Data Catalog. As more and more legislation is created to protect data in our control it’s more and more vital to be able to know exactly what types of data we have so we can take concrete steps to ensure its protection: a data catalog is an integral tool to get that understanding. I really love this problem space and I have lots of ideas and opinions about it!

Getting SSRS Details via Powershell

This is one of those posts so I never have to google this again (one hopes). Here is the PS code to pull back a set of details about every SSRS instance installed on a server including the SSRS instance name, & the first URL port it is running on, the service name and the name of the report server database etc.

$servername = 'myserver'
$key = "Software\\Microsoft\\Microsoft SQL Server"
$reg = [Microsoft.Win32.RegistryKey]::OpenRemoteBaseKey('LocalMachine', $server)
$regKey = $reg.OpenSubKey($key)
$keys = $regKey.GetSubKeyNames()
$v = 0
foreach($k in $keys)
if(( $k -match 'MSRS[\d\d].') -and ($k -notcontains '\@'))
$pv = $k.Substring(4, 2)
if ($v -le $pv)
$v = $pv
$rs = "RS_" + $k.Substring($k.IndexOf('.') + 1)
$nspace = "root\Microsoft\SQLServer\ReportServer\$rs\v$v\Admin"
$RSServers = Get-WmiObject -Namespace $nspace -class MSReportServer_ConfigurationSetting -ComputerName $servername -ErrorVariable perror -ErrorAction SilentlyContinue
foreach ($r in $RSServers)
$ssrsHost = $r.InstanceName
$ssrsVers = $r.version
$ssrsDB = $r.DatabaseName
$ssrsShare = $r.IsSharePointIntegrated
$ssrsService = $r.ServiceName
$vPath = $r.VirtualDirectoryReportServer
$urls = $r.ListReservedUrls()
$urls = $urls.UrlString[0]
$urls = $urls.Replace('+', $servername) + "/$vPath"
# do a thing with this specific SSRS instance details
view raw SSRSConfigObjs.ps1 hosted with ❤ by GitHub

2018 in the Rearview, 2019 Looming Large

2018 was a challenging year for all sorts of reasons. I set some modest goals for myself and it was mixed bag. My work goals got kind of demolished: I failed to post much last year (although I am happy I now have a post to return to every time I need to write a pivot) and after multiple attempts I failed to pass the MS 70-463 exam on building a data warehouse (I just couldn’t seem to get the last 50-100 points to push me over the top). My personal goals I did okay on though: I passed my nidan test back in September and I am about 1/3 of the way through learning to finger pick Loch Lomond (definitive version is Runrig’s live version) on my tenor uke. I would think the year was failure however I did a lot (especially in the first 3 months):

  • Coordinated and ran the first SQL Saturday here in Spokane.
  • Presented at said SQL Saturday.
  • Managed to keep the local PASS chapter meeting every month (even while I was out of the country with the assistance of my co-leader Elizabeth Hunt (t).
  • Attended SQL Sat Oregon, rode the SQL Train to PASS Summit. I finally managed to make the Summit meet-up with other chapter leaders and SQL Saturday organizers: so much learning and networking.
  • I started the process for creating a not for profit to handle the money for our SQL Saturday since it was successful enough to warrant another one. We officially incorporated right before Christmas.
  • At work I finally got automated backup testing up and running. It’s probably a little over-engineered but it was a solid first effort.
  • I also began attending the data governance meetings at work so we could start incorporating our SQL data into their consideration and I got us enrolled in the EAP for Redgate’s Data Catalog.

This last one is by far one of my favorite things this year. I’ve spent a lot of time thinking about how we treat our data but beyond being sure it was encrypted and backed up I haven’t been able to do much else concrete but in the last month of 2018 I was able to start cataloging data in our internally developed databases and get start on one of our bigger vendor databases. Adding this metadata will provide us with a much better view of what kinds of data we are storing and where it is.  I’m happy with this initial run at the problem space and I can already see where it starts to answer some questions we should all be asking about the data in our care.

2019 is looking to be another challenging and busy year: I need to refocus and look at my path to an MCSA/MCSE certification. I’m hoping I can take the 70-764 and 765 exams this year. I was hoping to get my 2012 MSCA and upgrade that but I still don’t do enough SSIS work to get myself over the last point hurdle with the 70-463. And being certified on the upcoming versions is probably worth my time.

I’ll finish putting together the Inland Northwest Data Professionals Association in the next couple of months, just in time to run the second SQL Saturday Spokane. I also applied to the Friends of Redgate program and we’ll see what comes of that: that could lead to a whole host of community contributions that I can’t even foresee right now.

SQL 102: A Useful Pivot

I don’t find myself pivoting data often but when I do I seem to always need to do the “complicated” pivot. You know, the one where you have a collection of unique values like email addresses where the value isn’t the column name but you want the 4 email addresses connected to one person in a single row rather than four rows. I can never remember how to do that so here is my personal internet bookmark for that code. Thanks to the late Robert Davis who gave the answer I needed back in 2015. Robert was a wonderful and generous contributor to the SQL Family his site is full of useful information and is still available here.

The solution is to add the categories you want as eventual columns to your result set before pivoting (it sounds simple enough but again then I can never remember it). So if you have say an account with multiple contacts:

/* A list of contacts */
SELECT a.acctNo, c.name, c.email
FROM contacts c JOIN accounts a ON a.acctNo = c.acctNo
ORDER BY a.acctNo, c.contactNo

This returns a list of results such as:

acctNo name email
1001 Susan susan@thesuze.com.edu
1001 Mark mark@thesuze.com.edu
1002 Aoife theboss@bagpipebattle.com.edu

The one simple trick is to create a value for each row that will be repeated it can be used as a column name when you pivot:

/* A pivoted list of contacts */
WITH contactsCTE AS (SELECT a.acctNo
, c.name
, c.email
FROM contacts c JOIN accounts a ON a.acctNo = c.acctNo)
, MIN(name1) AS Name1
, MIN(email1) AS Email1
, MIN(name2) AS Name2
, MIN(email2) AS Email2
FROM (SELECT cte.acctNo
, 'Name' + CAST(cte.ConNumber AS VARCHAR) AS NameNumber --<=== This creates a valid column name
, cte.name --<=== The actual values that will be attached to the named column
, 'Email' + CAST(cte.ConNumber AS VARCHAR) AS EmailNumber
, cte.email
FROM contactsCTE cte) AS pvt
PIVOT (MIN(name)
FOR NameNumber IN ([Name1], [Name2])) AS NamePvt --<== include as many col + # values in this set as want to ultimately return in the row
PIVOT (MIN(email)
FOR EmailNumber IN ([Email1], [Email2])) AS EmailPvt
GROUP BY acctNo;

Simply repeat the structure of creating a valid column name (‘String’ + CAST(counter as VARCHAR)) and and you can include a pivot across as many different columns as need to be included. Our short list above now becomes:

acctNo Name1 Email1 Name2 Email2
1001 Susan susan@thesuze.com.edu Mark mark@thesuze.com.edu
1002 Aoife theboss@bagpipebattle.com.edu NULL NULL

Hopefully now in the future I’ll remember I wrote this and I won’t have to go searching for the solution (again).

Remarkably Honored

So this is a thing: I was nominated as an outstanding volunteer for PASS for April 2018. I really appreciate that someone thought I was doing something outstanding. I am bad at both accepting compliments and self promotion. I really believe in the PASS as a positive extension of the SQL community. I know that I wouldn’t be where I am today without the generous and helpful people who want to help myself and others do and be their best meeting the data needs of our respective employers/clients/organizations. It’s only natural that you turn around and help those around you that want to do better for themselves as well. I’m excited for the growth of our users group and the success of our SQL Saturday: I could not have done it alone.