Deploying the MSDTC in a Highly Secure Environment

This is just a fancy way of saying you need to be better friends with whoever is managing your enterprise firewall. I hadn't had to touch the DTC until a recent vendor insisted their application wouldn't work without it (despite their only having a single data store). The MSDTC was developed to coordinate transactions that span multiple machines and has been around since the days of SQL Server 6.5.

In theory it's not super complicated: just enable the DTC service/communication on the servers in question and turn on some built-in firewall rules, right? Almost.

First you need to enable the DTC service. This is located in the Component Services manager (screenshot: the Component Services Manager expanded out to show the Local DTC service).

Right-click and open the Local DTC properties and select the Security tab (screenshot: the Security properties tab for the Local DTC service). We want to check the option Allow Network DTC Access as well as Allow Remote Clients. Under Transaction Manager Communication the best practice is to Require Mutual Authentication, as our machines should ideally be on the same domain and aware of each other. No Authentication Required is a holdover and should only be used if you meet certain (sad) conditions. Additionally you may want to use a service account to run the DTC. (In hindsight it may have made my enterprise firewall adventures less adventurous to have done this: it would have made creating the enterprise firewall rule much simpler: just allow the service account to do the things!)
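
If you'd rather script these settings than click through the GUI, the MsDtc PowerShell module can do the same thing. A minimal sketch (assuming the default local DTC instance and mutual authentication; run from an elevated console):

# Enable network DTC access for the local DTC with mutual authentication
Import-Module MsDtc
Set-DtcNetworkSetting -DtcName "Local" `
    -InboundTransactionsEnabled $true `
    -OutboundTransactionsEnabled $true `
    -RemoteClientAccessEnabled $true `
    -AuthenticationLevel Mutual `
    -Confirm:$false

# Double check the result
Get-DtcNetworkSetting -DtcName "Local"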

Once that's out of the way there are three inbound and one outbound Windows Firewall rules that need to be enabled to allow DTC traffic. This can be done manually or with the following PowerShell command:

Enable-NetFirewallRule -DisplayGroup "Distributed Transaction Coordinator"

You'll also need to set up a couple more rules to allow RPC calls to make it through the Windows Firewall. This page will walk you through those rules.
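
For reference, those RPC rules look something like this in PowerShell (the rule names below are my own, and you may want to scope them more tightly to the DTC service or to specific remote addresses):

# Allow the RPC Endpoint Mapper (TCP 135) and RPC dynamic ports inbound
New-NetFirewallRule -DisplayName "RPC Endpoint Mapper (DTC)" -Direction Inbound -Protocol TCP -LocalPort RPCEPMap -Action Allow
New-NetFirewallRule -DisplayName "RPC Dynamic Ports (DTC)" -Direction Inbound -Protocol TCP -LocalPort RPC -Action Allow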

It’s likely that if you are in a highly secure environment you’ll need to ensure the DTC transactions run on predictable ports. This article walks you through the process of configuring the DTC service to run on a specific port range. You’ll need to ensure there are firewall rules in place (both local and at the enterprise level) to allow that traffic.
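
The heart of that configuration is a handful of registry values under HKLM\SOFTWARE\Microsoft\Rpc\Internet. A sketch (the 5000-5200 range is only an example, pick a range that fits your environment, and note the server needs a restart before the change takes effect):

# Restrict RPC (and therefore DTC) dynamic ports to an example range of 5000-5200
$rpcKey = "HKLM:\SOFTWARE\Microsoft\Rpc\Internet"
if (-not (Test-Path $rpcKey)) { New-Item -Path $rpcKey | Out-Null }
New-ItemProperty -Path $rpcKey -Name "Ports" -PropertyType MultiString -Value @("5000-5200") -Force | Out-Null
New-ItemProperty -Path $rpcKey -Name "PortsInternetAvailable" -PropertyType String -Value "Y" -Force | Out-Null
New-ItemProperty -Path $rpcKey -Name "UseInternetPorts" -PropertyType String -Value "Y" -Force | Out-Null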

Once you think you have everything configured and your firewall rule request has been closed out by information security, it's time to test to be sure everything is working. Thankfully it's PowerShell to the rescue again. The MsDtc PowerShell module provides the Test-Dtc cmdlet for this (it relies on WinRM under the hood; please note it must be run in a console running as admin):

Test-Dtc -LocalComputerName "$env:COMPUTERNAME" -RemoteComputerName "mytargetserver" -ResourceManagerPort [portnumber] -Verbose

Important notes from working my way through this: the WinRM service will be making HTTP calls on port 5985. The diagnostic tool also starts a resource manager that defaults to running on port 0 unless you pass it a different option; I found it easiest to assign it a port at the end of the dynamic range I'd assigned to the DTC. The machines also need to be able to ping one another.
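
A couple of quick pre-flight checks I'd suggest before running Test-Dtc (the server name is a placeholder):

# Confirm the remote machine answers pings and that WinRM's HTTP port is reachable
Test-Connection -ComputerName "mytargetserver" -Count 2
Test-NetConnection -ComputerName "mytargetserver" -Port 5985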

This week was not the most fun, but hopefully this saves someone (or even future me) some time in the future.

Other helpful links:

  • MS MSDTC documentation
  • Documentation of the Test-Dtc cmdlet

Summit 2019 Post Mortem

Last week was a very excellent PASS Summit (made somewhat bittersweet by the fact that we don't know exactly when we'll return to Seattle) and I wanted to capture, from a high level, some of the things I brought back.

Big SQL News

A couple of announcements of note were made, the first of which was the promotion of SQL Server 2019 to GA. 2019 includes such fancy features as the ability to perform data classification via metadata tables (as opposed to attaching that information to a column's extended properties); this feature will inter-operate with auditing to allow you to target the info that needs the most attention in your database. Accelerated Database Recovery speeds up rollbacks and makes transaction log work more efficient. The ability to run R natively in SQL Server has been expanded to a much larger collection of languages via sp_execute_external_script. There's also better support for Docker and Kubernetes, and more.
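
If you haven't seen external script execution before, here's a quick sketch of what a call looks like (assuming Machine Learning Services is installed and 'external scripts enabled' is turned on; I've wrapped it in Invoke-Sqlcmd to keep everything in PowerShell, and the server name is a placeholder):

# A trivial sp_execute_external_script call that just echoes its input back
$query = @"
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'OutputDataSet = InputDataSet',
    @input_data_1 = N'SELECT 1 AS test_value';
"@
Invoke-Sqlcmd -ServerInstance "localhost" -Query $query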

Azure Arc (currently in preview): This is crazy to me (in a fantastic way) but we can now run Azure services on-prem! This makes infrastructure as code, or the ability to let teams self-service their server needs, that much easier. Plus you get the benefit of Azure threat analysis for those locally running services too. Currently it's limited to Azure SQL Database and Azure Database for PostgreSQL Hyperscale. (More here.)

Azure Data Warehouse is now Azure Synapse Analytics: Microsoft is working to extend the data warehouse by connecting it more to our data lakes and analytics to provide more bang for the buck. Deep integration with AI and BI allows that workload to sit immediately next to the data in a unified environment.

On the Classroom Side

I believe the learning pathways they tried out this year were a success; at every time slot I was forced to make hard choices about which session to attend. The new MS certifications for Azure Data Engineer are an excellent area for career growth, bringing together varied data wrangling skills into a new discipline that provides data for reporting and analysis in our organizations. The learning pathways started with a general understanding of this new role before spreading out to cover the various tools and Azure areas where Data Engineers will be spending their time. Similarly, the AI learning pathways had a number of good sessions, ranging from an introduction to AI and an overview of the process behind machine learning to how DBAs can better support their data scientists within the SQL Server stack.

I'm a privacy/data ethics nerd, so I attended a couple of sessions on data masking and data cataloging that were interesting and really drove home that very few of us are ready for GDPR (or GDPR-like legislation) even now that the GDPR is here and in effect. (Up to 50% of UK companies are still not GDPR compliant!) There were also a number of excellent professional development sessions on technical leadership and helping drive organizational change as part of IT.

My favorite part was how often certain things came up over and over again no matter what session I was in: things like PowerShell (and specifically dbatools) as well as Azure Data Studio (the ability to direct folks internally to a tool that isn't SSMS for data exploration has been a big driver at my org), especially the further development/integration of Jupyter Notebooks.

But my truly favorite part (as always) was the ability to connect with other data professionals face-to-face: to shake the hands of the people who work so hard on dbatools, give feedback to the Azure Data Studio team, or even just find another DBA facing similar challenges so we can cry in our beers together. PASS Summit has definitely been the biggest driver of my career learning since I started attending in 2014. The good news is you can register now for Summit 2020 in Houston: I hope to see you there!

Backing Up A Cosmos DB with the Cosmos DB Migrator Tool

Cosmos DB really is an amazing datastore and, even better (you might be thinking), Microsoft handles the backups for you. Which is true. They take backups every four hours and keep the last two. If you need anything recovered from the database you'd better hope that you notice within that window *and* get a ticket open with Microsoft to get it fixed. This being the case, Microsoft helpfully recommends that in addition to the default backups that come with the Cosmos DB service you export your data to a secondary location as needed to meet your organization's SLA. Data Factory to the rescue, right? Again, almost.

Unfortunately, if you are restricting access to your Cosmos DB service based on IP address (a reasonable security measure), then Data Factory won't work as of this writing: Azure Data Factory doesn't operate like a trusted Azure service and presents an IP address from somewhere in the data center where it is spun up. Thankfully they are working on this. In the meantime, the next best thing is to use the Cosmos DB migration tool (scripts below) to dump the contents to a location where they can be retained as long as needed. Be aware that in addition to the RU cost of reading the data, if you move these backups out of the data center where the Cosmos DB lives you'll also incur egress charges.

The script reads from a custom JSON file that contains the Cosmos DB service(s), as well as the databases and collections that need to be backed up. This file will have the read-only keys to your Cosmos DB services in it, so it should be encrypted on disk in order to limit the number of people who can access it.
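
A rough sketch of the shape such a config might take (account names, keys and paths are placeholders, and the exact schema is an assumption for illustration):

# Build an example backup config and write it out as JSON
$config = @{
    accounts = @(
        @{
            name        = "my-cosmos-account"
            endpoint    = "https://my-cosmos-account.documents.azure.com:443/"
            readOnlyKey = "<read-only key: keep this file encrypted>"
            databases   = @(
                @{ name = "MyDatabase"; collections = @("Collection1", "Collection2") }
            )
        }
    )
    backupRoot    = "D:\CosmosBackups"
    retentionDays = 30
}
$config | ConvertTo-Json -Depth 6 | Set-Content -Path "D:\Scripts\cosmosBackupConfig.json"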

Once the config file is in place, the following PowerShell will read the file and back up the appropriate services, databases and collections (and remove any old backups that are no longer needed).
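
A simplified sketch of what that loop might look like, shelling out to the migration tool's command line (the dt.exe path and the config schema above are assumptions):

# Read the config, dump each collection to a dated JSON file, then prune old backups
$config = Get-Content "D:\Scripts\cosmosBackupConfig.json" -Raw | ConvertFrom-Json
$stamp  = Get-Date -Format "yyyyMMdd_HHmmss"
$dtExe  = "C:\Tools\dt\dt.exe"   # path to the Cosmos DB migration tool executable

foreach ($account in $config.accounts) {
    foreach ($db in $account.databases) {
        foreach ($collection in $db.collections) {
            $connString = "AccountEndpoint=$($account.endpoint);AccountKey=$($account.readOnlyKey);Database=$($db.name)"
            $outFile    = Join-Path $config.backupRoot "$($account.name)_$($db.name)_$($collection)_$stamp.json"
            & $dtExe /s:DocumentDB "/s.ConnectionString:$connString" "/s.Collection:$collection" /t:JsonFile "/t.File:$outFile"
        }
    }
}

# Clean up anything older than the retention window
Get-ChildItem $config.backupRoot -Filter "*.json" |
    Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-$config.retentionDays) } |
    Remove-Item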

While this is not ideal, if you need to start backing up your Cosmos DBs immediately this will do the trick until Microsoft finishes incorporating Data Factory into their trusted services.

[Edited to add 10/3/2019:] Just yesterday it looks like MS updated their timeline for adding the needed functionality to ADF.

What’s a Data Catalog? Denver Edition

I had the wonderful opportunity to present an introduction to data catalogs to the Denver SQL Users Group yesterday. I am grateful for the chance to work on my presenting skills and talk about the new challenges of identifying and protecting risky data. I've uploaded the slide deck and some sample PowerShell scripts for SQL Data Catalog to my git repo here. Additionally, there was a question about re-using Data Catalog work for data masking: Richard Macaskill's blog post I mentioned is here. Thanks Denver: I am looking forward to coming back and visiting again!

SQL 101: What Happens When…

I had an interesting question asked this last Saturday when I was giving my Intro to SQL session at SQL Saturday Redmond: "What happens if you cast an NVARCHAR string as VARCHAR?"

I had an educated guess but I didn't have the time during the session to test it out. Casting NVARCHAR to VARCHAR won't throw an error, but it will destroy any data in the string that isn't VARCHAR compatible. Try out the test below yourself!
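
A quick sketch of such a test (wrapped in Invoke-Sqlcmd from the SqlServer module; the server name is a placeholder). Anything that can't be represented in the VARCHAR code page comes back as a question mark:

# Cast an NVARCHAR string to VARCHAR and watch the incompatible characters turn into '?'
$query = @"
DECLARE @unicode NVARCHAR(50) = N'Ünïcödé ♥ 漢字';
SELECT @unicode AS OriginalNvarchar,
       CAST(@unicode AS VARCHAR(50)) AS CastToVarchar;
"@
Invoke-Sqlcmd -ServerInstance "localhost" -Query $query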

The question was asked initially because she had found a place where she had to convert a uniqueidentifier to a string and was concerned that perhaps this could cause issues. The uniqueidentifier is a series of hexadecimal numbers concatenated together with hyphens (for example: 851CC794-E0CB-49DC-976F-1687F62D2188). They will always be 36 "characters" long. The good news is that according to BOL the uniqueidentifier is considered a character type for the purposes of conversion. Attempting to cast a string that is longer than 36 characters will truncate the original string to the first 36 before it attempts the conversion. Conversely, casting a uniqueidentifier to a string datatype will result in a 36 character string (unless you specify fewer than 36 characters, in which case SQL will throw an error).
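
And a similar sketch for the uniqueidentifier side of the question, if you want to see the truncation behavior for yourself:

# Convert a uniqueidentifier to a string, and cast an over-long string back to uniqueidentifier
$query = @"
DECLARE @id UNIQUEIDENTIFIER = '851CC794-E0CB-49DC-976F-1687F62D2188';
SELECT CAST(@id AS VARCHAR(36)) AS GuidAsString,
       CAST('851CC794-E0CB-49DC-976F-1687F62D2188 plus some extra text' AS UNIQUEIDENTIFIER) AS TruncatedThenConverted;
"@
Invoke-Sqlcmd -ServerInstance "localhost" -Query $query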

Presenting Away!

I'll be presenting my introduction to SQL queries in just a few short days at SQL Saturday Redmond! If you know somebody that's just getting started with SQL and wants an introduction to how to query and join tables: have I got a session for you! (It's this one.) This will be my first time presenting away from my home turf of Spokane, so I'm only a little nervous about it: I hope I see you there!

SQL 101: Query Basics

SQL is an acronym for Structured Query Language, so when getting started with it I think it's best to think about how it was put together to be able to ask questions about data. With that in mind let's take a look at some beginning SQL to understand how to write queries.

Queries can be very simple, but they won't answer very many questions in their simplest forms.

While a table and a filter are clearly not required to write a query, the most common queries will need 3[1] parts to be useful (there's a quick example after the list below):

  • a SELECT-ion of things (columns)
  • FROM a collection of things (a table)
  • WHERE a criteria is met (filter(s))
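
Putting those three parts together looks something like this (the dbo.Movies table and its columns are an assumed example schema, run through Invoke-Sqlcmd to stay in PowerShell):

# The simplest useful query: a SELECT-ion, FROM a table, WHERE a filter is met
$query = @"
SELECT Title, ReleaseDate, OscarNominations
FROM   dbo.Movies
WHERE  OscarNominations > 0;
"@
Invoke-Sqlcmd -ServerInstance "localhost" -Database "MovieData" -Query $query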

I happen to have a database of movie data, so let's explore some common filter operators by asking some questions about movies. Filters based on mathematical operators are the easiest to start with. We can ask for films with exactly 2 Oscar nominations, or more than 4, or fewer than 6. We make these filters exclusive or inclusive by adding the "=" operator to the comparison. We can also exclude a specific number by using "<>" (Microsoft SQL Server will also support != but this is outside the ANSI standard, so "<>" is best practice as it will be usable in any version of SQL).
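
A few examples of those comparison operators against the same (assumed) movies table:

# Comparison operators in the WHERE clause
$query = @"
SELECT Title FROM dbo.Movies WHERE OscarNominations = 2;   -- exactly 2 nominations
SELECT Title FROM dbo.Movies WHERE OscarNominations > 4;   -- more than 4 (exclusive)
SELECT Title FROM dbo.Movies WHERE OscarNominations <= 6;  -- 6 or fewer (inclusive)
SELECT Title FROM dbo.Movies WHERE OscarNominations <> 0;  -- anything except 0
"@
Invoke-Sqlcmd -ServerInstance "localhost" -Database "MovieData" -Query $query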

We can have multiple conditions in our WHERE clause, and these are combined with AND or OR statements. We can ask for films with 1, 2 or 3 Oscar nominations (but we'll also see we have a new operator, IN, that is cleaner and accomplishes the same goal). We can check for movies released on or after Jan 1, 2017 that were nominated for at least 1 Oscar. We can ask about movies released before the year 1950 that made more money than they spent.
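
And the same ideas with multiple conditions (column names like BoxOfficeGross and Budget are, again, assumptions about the example schema):

# AND/OR, the IN operator, and date comparisons
$query = @"
SELECT Title FROM dbo.Movies WHERE OscarNominations = 1 OR OscarNominations = 2 OR OscarNominations = 3;
SELECT Title FROM dbo.Movies WHERE OscarNominations IN (1, 2, 3);                        -- same result, cleaner
SELECT Title FROM dbo.Movies WHERE ReleaseDate >= '2017-01-01' AND OscarNominations >= 1;
SELECT Title FROM dbo.Movies WHERE ReleaseDate < '1950-01-01' AND BoxOfficeGross > Budget;
"@
Invoke-Sqlcmd -ServerInstance "localhost" -Database "MovieData" -Query $query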

If you’re just getting started with writing SQL queries I hope this was helpful! Next up I’ll cover a few tips for working with dates as well as specific operators for filtering strings.

[1] It’s important to note that SQL does not guarantee any sort of order in your results and thus the last part of a query is generally an ORDER BY statement that tells SQL what order to put the results in.

Data Privacy Is Coming For You

I've been thinking a lot about GDPR and similar legislation for the last year or so. I think that codifying how companies treat our data (and what they can and can't do with it) is a huge step in the right direction. Late in 2018 I was happy to find myself working with Redgate on their SQL Data Catalog early access program as we started looking at ways to attach and track metadata about the data in our environment. Last month Washington state (where I currently reside and work) proposed its own GDPR-like legislation. Since we aren't likely to get any federal-level legislation anytime soon, it's most likely that US DBAs will have to learn about a variety of laws and investigate (along with a legal team) what they actually mean and whether they impact their organization.

In the meantime we can still get a jump start on collecting the data we need to meet what is sure to be the bare minimum requirement of these laws: getting a handle on what you actually have in a way that is accessible and reportable. It's also worth reviewing your organization's practices for cleaning data in dev and test environments; this means generating test data or masking/altering production data to remove/replace sensitive information. If digging into market trends to see what other organizations are doing is your thing, Redgate is making the 2018 Data Masking Market Guide available for your perusal here. I'm really enjoying my foray into these questions! Have you started trying to solve these problems in your org? What have you run into that you weren't expecting when you started?

Friend of Redgate!

So I tried a very new thing (for me anyway) and after a fair amount of dithering about it I told the little naysayer that lives in the back of my head to shut the hell up and applied for the Friend of Redgate program at the end of December.

I'm writing about it, so of course you may easily guess the outcome: I was accepted into the program! I really am a huge fan of their tools, ever since I first found SQL Compare five years ago (almost to the month, now that I think about it) after a ridiculous manual deployment. In the time since, I've deployed and learned to use more of their tools. They have created a wonderful community to help support the SQL community at large and I am looking forward to working hard to contribute as well! Currently my biggest engagement is with their new SQL Data Catalog. As more and more legislation is created to protect data in our control, it's more and more vital to know exactly what types of data we have so we can take concrete steps to ensure its protection: a data catalog is an integral tool to get that understanding. I really love this problem space and I have lots of ideas and opinions about it!

Getting SSRS Details via Powershell

This is one of those posts I write so I never have to google this again (one hopes). Here is the PS code to pull back a set of details about every SSRS instance installed on a server, including the SSRS instance name, the first URL port it is running on, the service name, the name of the report server database, etc.
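
A sketch of one WMI-based way to get there (namespace, class and property names are from memory and can vary between SSRS versions, so treat it as a starting point rather than a finished script):

# Enumerate SSRS instances on the local server and pull back instance name,
# first reserved URL, service name and report server database
$root = "root\Microsoft\SqlServer\ReportServer"
foreach ($instanceNs in (Get-CimInstance -Namespace $root -ClassName __Namespace)) {
    # Instance namespaces look like RS_MSSQLSERVER; the child namespace is the version (e.g. v15)
    $versionNs = (Get-CimInstance -Namespace "$root\$($instanceNs.Name)" -ClassName __Namespace | Select-Object -First 1).Name
    $config = Get-CimInstance -Namespace "$root\$($instanceNs.Name)\$versionNs\Admin" -ClassName MSReportServer_ConfigurationSetting

    # ListReservedUrls returns the URL reservations (e.g. http://+:80)
    $urls = Invoke-CimMethod -InputObject $config -MethodName ListReservedUrls

    [pscustomobject]@{
        InstanceName   = $config.InstanceName
        ServiceName    = $config.ServiceName
        DatabaseServer = $config.DatabaseServerName
        DatabaseName   = $config.DatabaseName
        FirstUrl       = ($urls.UrlString | Select-Object -First 1)
    }
}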