DB Deployments Part 3: Time For A Pipeline!

All right, previously I configured TeamCity, created templates in the DBA Projects directory that will be used by any child projects and then set up Octopus Deploy. Today I’ll talk a little about deploying an actual project in TeamCity. Navigating to the page for the DBA Projects folder I can find the list of subprojects, which includes the handy “Create subproject” button. The project can be built out manually, but since I have added my GitHub information I can also simply browse my organization’s repositories to immediately link the TeamCity project to the appropriate source code. All I need to do after TeamCity verifies the repository is name the first build configuration. The build will inherit the build steps from the template I configured previously.

The first settings I want to enter are the specific VCS settings. In the sidebar for the project there is a link for VCS Roots, which indicates there is 1 VCS root configuration. I navigate to that tab and click Edit on the root that is there. TeamCity automatically appended #refs/heads/master to the VCS name and, being the tidy person I am, I remove that and regenerate the VCS root ID so they match. We’re going to edit a few other settings as well:

  • I want to ensure the default branch is “refs/heads/develop”
  • I also add “+:refs/heads/*” to Branch specification. This filters the branches the VCS root will see to those that match the pattern defined; in this case I am including any branch that starts with “refs/heads/”.

Lastly, it pulled in my GitHub account details to access the repository, but that should be changed to a dedicated GitHub account used solely by TeamCity. Depending on how your organization is set up there may be a little extra work for this step. Our company enforces two-factor authentication, which means for any automated account to access repos independently we need to use SSH rather than HTTPS to connect to the repo. It also means we need to create an SSH key to use. This article is a good walk-through of setting up an SSH key as well as adding that key to the repository (if you want to go that route), although we simply added the SSH key to the account for our TeamCity user so it’s valid across all the projects it will access. The Fetch URL needs to be updated to the SSH URL. Under the Authentication Settings the method should be changed from Password to one of the key options. In our setup we uploaded the key to the TeamCity server because, for one reason or another (we never really were able to figure out why), I was unable to use the Custom Private Key option. I added the key to the DBA Projects folder so that it could be used in all sub-projects. Once these settings are configured I can test them with the “Test connection” button and save if successful.
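If you haven’t generated a key for a service account before, it looks something like this (the file path and comment below are placeholders, not our actual values):

```powershell
# Generate an ed25519 key pair for the TeamCity service account.
# The public key (.pub) gets added to the GitHub account; the private key stays on the TeamCity server.
ssh-keygen -t ed25519 -C "teamcity-service@yourorg" -f "C:\Users\svc_teamcity\.ssh\id_ed25519"
```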

The template the dev build is based on is configured to be the development build and requires no changes, so I can start adding builds based on the templates and configuring them as needed. First up is the master build: on the project home page I click “Create build configuration.” This time I am going to select the option to do it manually. I give it a valid name and then select the appropriate template (in this case DBA Build Template) in the drop-down box for “Based on Template.” Once created we need to add a VCS root for the build: under the Version Control Settings we click “Attach VCS root” and select the existing VCS root used in the default dev build. Next we move down to triggers and update the trigger settings that were inherited. The only change to the trigger for this build is to change the branch filter to “+:master.” This will filter out any branches except the master branch. The last bit is to ensure the build steps are customized as needed. Thankfully nothing actually needs to change except our deploy step: remove the Deployment Environment(s) value and set the Release Channel value to Master (this will be configured on the Octopus Deploy side).

To set up a build for pull requests I need to make a few extra changes. I manually create one more build configuration and name it [database name] Merge Build. This time I use the Merge Build Template I created. I need to attach a VCS root, but in this case I am going to end up creating a duplicate: I start by selecting the same VCS root and then I’ll be making some changes. I leave the default branch alone but I change the branch specification to “+:refs/pull/(*/merge)”, which filters out everything except the temporary merge branches GitHub creates for pull requests. At the bottom near the save button TeamCity has a helpful warning that I am changing an object used by more than one build configuration and gives me the option to save it for just this build configuration (creating a copy of the VCS root). I update the name of the VCS root (appending “[Merge Build]”), regenerate the VCS root ID and save. I update the branch filter on the trigger to “+:*, -:<default>” (if I don’t add -:<default> the merge builds also seem to run when I push changes to my default branch, develop). Lastly, I don’t want pull requests to trigger deployment of code, so I disable steps 4 & 5 in the build steps to ensure we skip Octopus Deploy for this build.

And that’s my current setup. I’ll gladly tell you that the process was a bit frustrating initially, but with some help (so much thanks to Hamish Watson) I was able to get my head wrapped around it. As much as I knew it could shrink the turnaround time from development to deployment, I still found myself surprised at how quickly I was able to turn around fixes we discovered we needed the week after I had this set up. I am pretty sure there are things that I could have done better but I definitely have the time to focus on that now. If you’re an Ops person on the fence about getting started with deployment automation I hope this walk-through was helpful!

DB Deployments Part 1: Configure Team City

The first piece of the database DevOps puzzle is automating the build. TeamCity is a build server that developers use to compile and test their code. It consists of a central server and a number of agents. Agents are secondary machines that pull code from source control, run tests, compile binaries and assemble deployment packages. Depending on your environment you may only need one machine. Our deployments are currently infrequent enough (mostly because they’re not needed often, but also because they are manual so we avoid them unless absolutely necessary) that we only need one agent, and it runs on the same machine as the central server. If we get busier and notice that builds are stacking up in a queue we can add more machines to install agents on that will take on that work as needed.

You will need a service account for the build server and agents. You can create specific users or possibly even a single gMSA to apply to the appropriate services [1]. Installation is relatively simple: download the installation files from JetBrains (https://www.jetbrains.com/teamcity/). Do that and then come back.

[Image: A portion of the TeamCity administration navigation]

Ready? So the first thing that needs setting up is Authentication: we enabled Microsoft Windows domain authentication as well as the token-based authentication module (we like PowerShell on our team). We also kept the built-in authentication (we have admin user credentials stored in a secret server in case we need them). Once that’s sorted you can create some groups to manage access/permissions and then start adding the users that will access the build server. In our case it was the DBA team, the BI team and our manager. These are all folks that will want or need to either create new projects or just follow up on the results of a build. You’ll also want to configure the mail notifier so that users can be notified when their builds fail (eventually watching the builds progress will become less exciting, or at least that’s what they tell me). Lastly, you’ll want to visit the plugins page and click the “Browse plugins repository” button. This will allow you to browse for and install both the Octopus Deploy Integration and the Redgate SQL Change Automation plugins. We’ll be making use of both of these in our builds.

You’ll want to give some thought to how you are going to organize the space: I chose to create team folders in the root directory for the DBA and BI teams to hold our respective projects. There was a non-zero chance that we’d have different enough configurations that it made sense to separate them out. Thankfully, since 90% of our configurations will be the same, it is very easy to set up a template that can be rolled out and configured for each project. I only needed a single template in the parent directory DBA Projects to create the three distinct builds we wanted for our process:

  • builds based on changes to develop,
  • builds based on changes to master,
  • and builds based on pull requests (from develop into master).

The template consists of five steps and a trigger for running it:

  1. Generate Build Number: Our build processes all take place in SSMS, and build numbers and other artifacts that might be generated by Visual Studio or other IDEs are a mystery to me. We decided that our build numbers would be meaningful inasmuch as they would tell you when the code was built (for example 2020.01.21.1404 was a package that was built at 2:04 PM on January 21, 2020). The first step runs a simple PowerShell script that sets the build number that will be used by the rest of the TeamCity process (see the sketch after this list).
  2. Build Database: This step uses Redgate’s SQL Change Automation to ensure the database is deployable as written. A temporary database is created [3] from scratch by the TeamCity agent. This agent’s service account must have dbcreator rights (at a minimum), and if your database contains encrypted objects the service account will need SA so that SQL Compare can correctly compare the encrypted objects to your source code. If you are using DLM Dashboard you can include that information here and TeamCity will ensure it is updated. We’ve configured this step to point to our development instance and ensured that our unit testing objects are not considered part of the deployment by adding the SQL Compare option “IgnoretSQLt.” If any of our code doesn’t compile this step will fail and the person committing the code will be notified (based on our configured notification rules).
  3. Run tSQLt: Honestly the best part of automating all of this was the fact that we could roll out tSQLt unit tests (via the SQL Test plugin for SSMS) to provide extra validation of our code. This step builds the database again (this time from the NuGet package produced by the previous step) and then runs all the unit tests we’ve included in our source code. Again, we’ve configured it to use a real SQL Server rather than localdb.
  4. Push Package to Octopus Deploy: If the previous steps were successful Team City will toss the package over to Octopus Deploy. Octopus Deploy does need to be installed by this point as you will need to generate an API key for Team City to communicate with Octopus Deploy. We’ve configured this step to pass the package over and overwrite any existing packages on the Octopus side.
  5. Deploy Release: This automates the release creation on the Octopus Deploy server. For this to work for us the project name in Octopus must be the same as the one in TeamCity, as the step passes its own project name over to identify the project receiving the release, along with our custom build number. By default the template specifies “DBA Dev” as the Octopus Environment for the release. This safely enables some automation on the Octopus side (our template won’t ever accidentally deploy code to production).
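For reference, the build number script in step 1 is tiny. Here’s a minimal sketch of that kind of script (your formatting choices may differ): TeamCity watches stdout for service messages and uses this one as the build number for everything downstream.

```powershell
# Minimal sketch of the step 1 build number script, assuming the date-based numbering described above.
$buildNumber = (Get-Date).ToString('yyyy.MM.dd.HHmm')   # e.g. 2020.01.21.1404
Write-Host "##teamcity[buildNumber '$buildNumber']"
```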

Lastly the template trigger is configured to monitor the project’s VCS (version control system) root for changes that would indicate we should test a build. Ours is set to trigger on each check-in and include several check-ins in the build if they are from the same person. Additionally the default for the trigger in our template only looks at the develop branch and specifically ignores master.

Whew! That feels like a lot for just one article. The next post will cover the Octopus Deploy install and environment so they can work in tandem.

1. I believe a gMSA would be best here but your mileage may vary. Likely you’ll have to deploy with a traditional service account and then come back and update the service to use the MSA/gMSA. For our use case permissions are pretty binary (either the service(s) will have permissions or they won’t) [2].
2. Also please note that much of what follows is the path we took after some trial and error and may not be the single most efficient implementation. We’re all human and we’re doing our best: your mileage may vary!
3. Best practice is to deploy to a big boy SQL Server that matches the edition of SQL Server that will be deployed to in production. While you can use localdb, your builds could fail if your database code includes objects not supported by localdb, or worse, if the version of localdb doesn’t match your actual environment.

Summit 2019 Post Mortem

Last week was a very excellent PASS Summit (made somewhat bittersweet by the fact that we don’t know exactly when we’ll return to Seattle) and I wanted to capture some of the things I brought back, from a high level.

Big SQL News

A couple of announcements of note were made, the first of which was the promotion of SQL Server 2019 to GA. 2019 includes such fancy features as the ability to perform data classification via metadata tables (as opposed to attaching that information to a column’s extended properties); this feature will interoperate with auditing to allow you to target the info that needs the most attention in your database. Accelerated Database Recovery speeds up rollbacks and makes transaction log work more efficient. The ability to run R natively in SQL Server has been expanded to a much larger collection of languages via sp_execute_external_script. There’s also better support for Docker and Kubernetes, and more.

Azure Arc (currently in preview): This is crazy to me (in a fantastic way) but we can now run Azure services on-prem! This makes infrastructure as code, or the ability to allow teams to self-service their server needs, that much easier. Plus you get the benefit of Azure threat analysis for those locally running services too. Currently it’s limited to Azure SQL Database and Azure Database for PostgreSQL Hyperscale. (More here.)

Azure Data Warehouse is now Azure Synapse Analytics: Microsoft is working to extend the data warehouse by connecting it more to our data lakes and analytics to provide more bang for the buck. Deep integration with AI and BI allows that workload to sit immediately next to the data in a unified environment.

On the Classroom Side

I believe the learning pathways they tried out this year were a success, and at every time slot I was forced to make hard choices about which session to attend. The new MS certifications for Azure Data Engineer are an excellent area for career growth, bringing together varied data wrangling skills into a new discipline to provide data for reporting and analysis in our organizations. The learning pathways started with a general understanding of this new role before spreading out to cover the various tools and Azure areas where Data Engineers will be spending their time. Similarly, the AI learning pathways had a number of good sessions, ranging from an introduction to AI to an overview of the process behind machine learning and how DBAs can better support their data scientists within the SQL Server stack.

I’m a privacy/data ethics nerd so I attended a couple of sessions on data masking and data cataloging that were interesting and really drove home that very few of us are ready for GDPR (or GDPR-like legislation) even now that the GDPR is here and in effect. (Up to 50% of UK companies are still not GDPR compliant!) There were also a number of excellent professional development sessions on technical leadership and helping drive organizational change as part of IT.

My favorite part was how much certain things came up over and over again no matter what session I was in: things like PowerShell (and specifically dbatools) as well as Azure Data Studio (the ability to direct folks internally to a tool that isn’t SSMS for data exploration has been a big driver at my org), especially the further development/integration of Jupyter Notebooks.

But my truly favorite part (as always) was the ability to connect up with other data professionals, face-to-face. To be able to shake the hand of the people that work so hard on dbatools, give feedback to the Azure Data Studio team or even just find another DBA facing similar challenges so we can cry in our beers together. PASS Summit has definitely been the biggest driver of my career learning since I started attending in 2014. The good news is you can register now for Summit 2020 in Houston: I hope to see you there!

Backing Up A Cosmos DB with the Cosmos DB Migrator Tool

Cosmos DB really is an amazing datastore and even better (you might be thinking): Microsoft handles the backups for you. Which is true. They take backups every four hours and keep the last two. If you need anything recovered from the database you’d better hope that you notice within that window *and* get a ticket open with Microsoft to get it fixed. This being the case, Microsoft helpfully recommends that in addition to the default backups that come with the Cosmos DB service you export your data to a secondary location as needed to meet your organization’s SLA. Data Factory to the rescue, right? Again, almost.

Unfortunately, if you are restricting access to your Cosmos DB service based on IP address (a reasonable security measure) then Data Factory won’t work as of this writing, because Azure Data Factory doesn’t operate like a trusted Azure service and presents an IP address from somewhere in the data center where it is spun up. Thankfully they are working on this. In the meantime, however, the next best thing is to use the Cosmos DB migration tool (scripts below) to dump the contents to a location where they can be retained as long as needed. Be aware that, in addition to the RU cost of reading the data, if you bring these backups out of the data center where the Cosmos DB lives you’ll also incur egress charges on the data.

The script reads from a custom JSON file that contains the Cosmos DB service(s), as well as the databases and collections that need to be backed up. This file will have the read-only keys to your Cosmos DB services in it, so it should be encrypted on disk in order to limit the number of people who can access the file.

Once the config file is in place, the following PowerShell will read the file and back up the appropriate services, databases and collections (and remove any old backups that are no longer needed).
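At a high level the script looks something like the sketch below. The config layout, paths and property names here are placeholders rather than my exact script, and you should double-check the dt.exe switch names against the version of the Cosmos DB Data Migration Tool you download.

```powershell
# Rough sketch of the backup loop (config shape, paths and names are placeholders).
# Illustrative config shape:
# {
#   "backupRoot": "E:\\CosmosBackups",
#   "retentionDays": 14,
#   "accounts": [
#     { "name": "myaccount",
#       "connectionString": "AccountEndpoint=https://myaccount.documents.azure.com:443/;AccountKey=<read-only key>;Database=MyDb",
#       "collections": [ "Orders", "Customers" ] }
#   ]
# }
$config = Get-Content -Path 'C:\Scripts\cosmosBackupConfig.json' -Raw | ConvertFrom-Json
$stamp  = (Get-Date).ToString('yyyyMMdd_HHmm')

foreach ($account in $config.accounts) {
    foreach ($collection in $account.collections) {
        $outFile = Join-Path $config.backupRoot "$($account.name)_${collection}_${stamp}.json"

        # dt.exe is the Cosmos DB Data Migration Tool; verify these switches against your version.
        $dtArgs = @(
            '/s:DocumentDB'
            "/s.ConnectionString:$($account.connectionString)"
            "/s.Collection:$collection"
            '/t:JsonFile'
            "/t.File:$outFile"
            '/t.Overwrite'
        )
        & 'C:\Tools\dt\dt.exe' @dtArgs
    }
}

# Clear out backups older than the retention window.
Get-ChildItem -Path $config.backupRoot -Filter '*.json' |
    Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-$config.retentionDays) } |
    Remove-Item
```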

While this is not ideal, if you need to start backing up your Cosmos DBs immediately this will do the trick until Microsoft finishes incorporating Data Factory into their trusted services.

[Edited to add 10/3/2019:] Just yesterday it looks like MS updated their timeline for adding the needed functionality to ADF.

What’s a Data Catalog? Denver Edition

I had the wonderful opportunity to present an introduction to data catalogs to the Denver SQL Users Group yesterday. I am grateful for the chance to work on my presenting skills and talk about the new challenges of identifying and protecting risky data. I’ve uploaded the slide deck and some sample PowerShell scripts for SQL Data Catalog to my git repo here. Additionally, there was a question about re-using Data Catalog work for data masking: Richard Macaskill’s blog post I mentioned is here. Thanks Denver: I am looking forward to coming back and visiting again!

SQL 101: What Happens When…

I had an interesting question asked this last Saturday when I was giving my Intro to SQL session at SQL Saturday Redmond: “What happens if you cast an NVARCHAR string as VARCHAR?”

I had an educated guess but I didn’t have the time during the session to test it out: casting NVARCHAR to VARCHAR won’t throw an error, but it will destroy any data in the string that isn’t VARCHAR compatible. Try out the test below yourself!
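Something along these lines shows the behavior (I’m wrapping the T-SQL in Invoke-Sqlcmd purely for convenience; the instance name is a placeholder):

```powershell
# Characters outside the VARCHAR code page are silently replaced with '?' on the cast.
# Requires the SqlServer module for Invoke-Sqlcmd; point it at any test instance.
$query = @"
DECLARE @unicode NVARCHAR(50) = N'naïve ♥ データ';
SELECT @unicode                       AS OriginalNvarchar,
       CAST(@unicode AS VARCHAR(50))  AS CastToVarchar;  -- with a typical Latin collation, 'naïve' survives but '♥ データ' becomes ???
"@
Invoke-Sqlcmd -ServerInstance 'localhost' -Query $query
```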

The question was asked initially because she had found a place where she had to convert a uniqueidentifier to a string and was concerned that perhaps this could cause issues. The uniqueidentifier is a series of hexadecimal numbers concatenated together with hyphens (for example: 851CC794-E0CB-49DC-976F-1687F62D2188). They will always be 36 “characters” long. The good news is that according to BOL the uniqueidentifier is considered a character type for the purposes of conversion. Attempting to cast a string that is longer than 36 characters to a uniqueidentifier will truncate the original string to the first 36 characters before it attempts the conversion. Conversely, casting a uniqueidentifier to a string datatype will result in a 36-character string (unless you specify fewer than 36 characters, in which case SQL will throw an error).
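And a quick sketch of both directions of the uniqueidentifier conversion (again with a placeholder instance name):

```powershell
# Both directions of the uniqueidentifier/string conversion described above.
$query = @"
DECLARE @id UNIQUEIDENTIFIER = NEWID();

SELECT CAST(@id AS VARCHAR(36)) AS GuidAsString;   -- fine: exactly 36 characters

-- The string is truncated to its first 36 characters before the conversion is attempted:
SELECT CAST('851CC794-E0CB-49DC-976F-1687F62D2188 plus some extra text' AS UNIQUEIDENTIFIER) AS TruncatedThenConverted;

-- SELECT CAST(@id AS VARCHAR(20));                -- errors: insufficient result space
"@
Invoke-Sqlcmd -ServerInstance 'localhost' -Query $query
```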

Friend of Redgate!

So I tried a very new thing (for me anyway) and after a fair amount of dithering about it I told the little naysayer that lives in the back of my head to shut the hell up and applied for the Friend of Redgate program at the end of December.

I’m writing about it so of course you may easily guess the outcome: I was accepted into the program! I really am a huge fan of their tools ever since I first found SQL Compare 5 years ago (to the month almost now that I think about it) after a ridiculous manual deployment. In the time since I’ve deployed and learned to use more of their tools. They have created a wonderful community to help support the SQL community at large and I am looking forward to working hard to contribute as well! Currently my biggest engagement is with their new SQL Data Catalog. As more and more legislation is created to protect data in our control it’s more and more vital to be able to know exactly what types of data we have so we can take concrete steps to ensure its protection: a data catalog is an integral tool to get that understanding. I really love this problem space and I have lots of ideas and opinions about it!

Getting SSRS Details via Powershell

This is one of those posts I write so I never have to google this again (one hopes). Here is the PS code to pull back a set of details about every SSRS instance installed on a server, including the SSRS instance name, the first URL it is running on, the service name, the name of the report server database, etc.
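At a high level it looks something like this sketch against the SSRS WMI provider (the computer name is a placeholder, and the version namespace differs by release so it’s discovered rather than hard-coded; treat the class and method names as something to verify against your SSRS version):

```powershell
# Sketch: pull SSRS details from the WMI provider for every instance on a server.
$computer = $env:COMPUTERNAME   # or a remote server name

Get-CimInstance -ComputerName $computer -Namespace 'root\Microsoft\SqlServer\ReportServer' -ClassName __NAMESPACE |
    Where-Object { $_.Name -like 'RS_*' } |
    ForEach-Object {
        $rsNamespace = "root\Microsoft\SqlServer\ReportServer\$($_.Name)"
        # Each RS_<instance> namespace contains a version namespace (v13 = 2016, v14 = 2017, v15 = 2019).
        $version = (Get-CimInstance -ComputerName $computer -Namespace $rsNamespace -ClassName __NAMESPACE).Name

        $config = Get-CimInstance -ComputerName $computer `
                      -Namespace "$rsNamespace\$version\admin" `
                      -ClassName MSReportServer_ConfigurationSetting

        # ListReservedUrls returns parallel arrays of applications and reserved URLs.
        $urls = Invoke-CimMethod -InputObject $config -MethodName ListReservedUrls

        [pscustomobject]@{
            ComputerName   = $computer
            InstanceName   = $config.InstanceName
            ServiceName    = $config.ServiceName
            ReportServerDb = $config.DatabaseName
            DatabaseServer = $config.DatabaseServerName
            FirstUrl       = ($urls.UrlString | Select-Object -First 1)
        }
    }
```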

2018 in the Rearview, 2019 Looming Large

2018 was a challenging year for all sorts of reasons. I set some modest goals for myself and it was a mixed bag. My work goals got kind of demolished: I failed to post much last year (although I am happy I now have a post to return to every time I need to write a pivot) and after multiple attempts I failed to pass the MS 70-463 exam on building a data warehouse (I just couldn’t seem to get the last 50-100 points to push me over the top). My personal goals I did okay on though: I passed my nidan test back in September and I am about 1/3 of the way through learning to finger pick Loch Lomond (the definitive version is Runrig’s live version) on my tenor uke. I would think the year was a failure; however, I did a lot (especially in the first 3 months):

  • Coordinated and ran the first SQL Saturday here in Spokane.
  • Presented at said SQL Saturday.
  • Managed to keep the local PASS chapter meeting every month (even while I was out of the country) with the assistance of my co-leader Elizabeth Hunt (t).
  • Attended SQL Sat Oregon, rode the SQL Train to PASS Summit. I finally managed to make the Summit meet-up with other chapter leaders and SQL Saturday organizers: so much learning and networking.
  • I started the process of creating a not-for-profit to handle the money for our SQL Saturday since it was successful enough to warrant another one. We officially incorporated right before Christmas.
  • At work I finally got automated backup testing up and running. It’s probably a little over-engineered but it was a solid first effort.
  • I also began attending the data governance meetings at work so we could start incorporating our SQL data into their consideration and I got us enrolled in the EAP for Redgate’s Data Catalog.

This last one is by far one of my favorite things this year. I’ve spent a lot of time thinking about how we treat our data, but beyond being sure it was encrypted and backed up I haven’t been able to do much else concrete. In the last month of 2018, though, I was able to start cataloging data in our internally developed databases and get a start on one of our bigger vendor databases. Adding this metadata will provide us with a much better view of what kinds of data we are storing and where it is. I’m happy with this initial run at the problem space and I can already see where it starts to answer some questions we should all be asking about the data in our care.

2019 is looking to be another challenging and busy year: I need to refocus and look at my path to an MCSA/MCSE certification. I’m hoping I can take the 70-764 and 765 exams this year. I was hoping to get my 2012 MCSA and upgrade that, but I still don’t do enough SSIS work to get myself over the last point hurdle with the 70-463, and being certified on the upcoming versions is probably worth my time.

I’ll finish putting together the Inland Northwest Data Professionals Association in the next couple of months, just in time to run the second SQL Saturday Spokane. I also applied to the Friends of Redgate program and we’ll see what comes of that: that could lead to a whole host of community contributions that I can’t even foresee right now.

SQL 102: A Useful Pivot

I don’t find myself pivoting data often, but when I do I seem to always need to do the “complicated” pivot. You know, the one where you have a collection of unique values like email addresses where the value isn’t the column name, but you want the four email addresses connected to one person in a single row rather than four rows. I can never remember how to do that, so here is my personal internet bookmark for that code. Thanks to the late Robert Davis who gave the answer I needed back in 2015. Robert was a wonderful and generous contributor to the SQL Family; his site is full of useful information and is still available here.

The solution is to add the categories you want as eventual columns to your result set before pivoting (it sounds simple enough, but again I can never remember it). So if you have, say, an account with multiple contacts:
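Here’s a sketch of roughly equivalent T-SQL (the table and column names are stand-ins), run through Invoke-Sqlcmd:

```powershell
# Sample data standing in for an accounts/contacts query (names and emails are made up).
$query = @"
DECLARE @contacts TABLE (contactId INT, acctNo INT, name VARCHAR(50), email VARCHAR(100));
INSERT INTO @contacts (contactId, acctNo, name, email) VALUES
    (1, 1001, 'Susan', 'susan@thesuze.com.edu'),
    (2, 1001, 'Mark',  'mark@thesuze.com.edu'),
    (3, 1002, 'Aoife', 'theboss@bagpipebattle.com.edu');

SELECT acctNo, name, email
FROM   @contacts;
"@
Invoke-Sqlcmd -ServerInstance 'localhost' -Query $query
```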

This returns a list of results such as:

acctNo name email
1001 Susan susan@thesuze.com.edu
1001 Mark mark@thesuze.com.edu
1002 Aoife theboss@bagpipebattle.com.edu

The one simple trick is to create a value for each row that will be repeated, so it can be used as a column name when you pivot:
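Continuing the sketch (again with my stand-in names): ROW_NUMBER gives each contact a counter within its account, the counter is glued onto a label, and the PIVOT spreads the labelled values into columns.

```powershell
# Same sample data as above, then the counter + label trick and the pivot itself.
$query = @"
DECLARE @contacts TABLE (contactId INT, acctNo INT, name VARCHAR(50), email VARCHAR(100));
INSERT INTO @contacts (contactId, acctNo, name, email) VALUES
    (1, 1001, 'Susan', 'susan@thesuze.com.edu'),
    (2, 1001, 'Mark',  'mark@thesuze.com.edu'),
    (3, 1002, 'Aoife', 'theboss@bagpipebattle.com.edu');

WITH numbered AS (
    SELECT acctNo, name, email,
           CAST(ROW_NUMBER() OVER (PARTITION BY acctNo ORDER BY contactId) AS VARCHAR(10)) AS rn
    FROM   @contacts
), labelled AS (
    SELECT acctNo, 'Name'  + rn AS colName, name AS colValue FROM numbered
    UNION ALL
    SELECT acctNo, 'Email' + rn,            email            FROM numbered
)
SELECT acctNo, [Name1], [Email1], [Name2], [Email2]
FROM   labelled
PIVOT (MAX(colValue) FOR colName IN ([Name1], [Email1], [Name2], [Email2])) AS pvt;
"@
Invoke-Sqlcmd -ServerInstance 'localhost' -Query $query
```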

Simply repeat the structure of creating a valid column name (‘String’ + CAST(counter AS VARCHAR)) and you can include a pivot across as many different columns as need to be included. Our short list above now becomes:

acctNo Name1 Email1 Name2 Email2
1001 Susan susan@thesuze.com.edu Mark mark@thesuze.com.edu
1002 Aoife theboss@bagpipebattle.com.edu NULL NULL

Hopefully now in the future I’ll remember I wrote this and I won’t have to go searching for the solution (again).