T-SQL Tuesday #114 – The SQL Puzzle Party

T-SQL Tuesday logoThere were times when I tried to look for puzzles to solve, especially the T-SQL puzzles (what happened to the T-SQL Challenge site?). Now I don’t. Life is challenging as it is, especially if you work with SQL Server and really try to understand what’s going on.

So rather than coming up with some contrived problem for you to solve as part of this edition of T-SQL Tuesday (thank you Matthew McGiffen) I will share something that surprised me only last week. And yes, I have solved it already, and will be blogging more about it soon so no there is no big price for solving my production issue here 😉

Here is the scenario

There is a table that stores millions of records. It has a primary key, a date when a record was processed, a bit column indicating whether it was processed or not, and some text fields that are used for something, but in our example, it’s just data that takes space on pages.

There is also an application which is using nHibernate to generate a T-SQL query that retrieves one (just one at a time) records from that table where IsProcessed = 0. There are 10-50 records like that at peak times, in a table which holds tens of millions of records so making it very, very fast should be easy with a tiny little covering filtered index. Well… it turns out, SQL Server prefers to scan the clustered index instead.

Have a look

The challenge setup

use tempdb
go
drop table if exists dbo.LongProcessingTable
if not exists(select 1 from sys.tables where name = 'LongProcessingTable')
create table LongProcessingTable (
Id int not null identity primary key
,ProcessedOn datetime2 null
,IsProcessed bit null
,SomeData nvarchar(1024) not null
)

-- just some text to fill up the space on pages
declare @sometext nvarchar(1024) = (
select string_agg(convert(char(1),name), '')
from sys.all_objects
)

-- create just 100k records with some random date values
-- at this time all records are marked as processed
insert into dbo.LongProcessingTable(ProcessedOn, IsProcessed, SomeData)
select top(100000)
dateadd(second, -abs(checksum(a.object_id, b.object_id)%10000), getdate())
,1
,@sometext
from sys.all_objects a
cross join sys.all_objects b

-- now mark 10 rows as not processed
update d set IsProcessed = 0, ProcessedOn = null
from (
select top (10) *
from dbo.LongProcessingTable d
order by ProcessedOn desc
) d

Now the query:

declare @IsProcessed bit = 0

select top(1) Id, SomeData
from dbo.LongProcessingTable
where IsProcessed = @IsProcessed

The above query comes from the application and cannot be changed. It is what it is. And to help you start, here is the index I thought would work, but doesn’t.

create index IX_LongProcessingTable_NotProcessedYet
on dbo.LongProcessingTable(IsProcessed) include (SomeData)
where IsProcessed = 0

The index gets ignored and the server goes for the table scan instead.
Of course, there was somebody who discovered it earlier. I wasn’t all that surprised that Erik Darling blogged about it in 2015, 2017 and 2018 it turns out, he even says ‘IT IS KNOWN’… well, it wasn’t to me. But even know, with that knowledge, I still cannot change the query, so what can I do? How to make this query more efficient without changing it, and without creating a covering indexing on the whole table which can contain hundreds of GB of data just to get one row.

If you are still reading… well, enjoy the challenge. I will follow up with a few comments and a couple of my attempts at solving the problem later this month (hopefully).

 

gMSA and Docker – Lessons Learnt

In the last two posts (here and here) I have documented how I use gMSAs to connect services running in docker containers on Windows to SQL Server using the domain authentication. In the end it was very simple, but there are things I wish I knew when I started. It would save me a lot of time. Here is an attempt to document the lessons learnt.

Versions are important!

While in the end I was able to make it work on Windows Server 2016, 1803, 2019 and 1809 I wasted some time trying to make it work with docker 17.06. Unsuccessfully. Docker 18.09.1 and 18.09.2 worked every time.

There are some reports of intermittent problems with specific OS updates breaking stuff, like the one here but I wasn’t able to reproduce it. I wonder if the updates changes something else that it causing problems, in other words is it the problem with the update itself or the update process?

The Set-AdServiceAccount

From the beginning I set to try the gMSA authentication on multiple VMs following blog posts which all included some use of the Set-AdServiceAccount powershell command (from `RSAT-AD-PowerShell). I could not make it work on more than one VM at a time. I thought I was going mad! The problem (and the clue) is in the name. Set. It is not add, not modify. So when I was doing something like this

Set-AdServiceAccount -Identity MyService `
  -PrincipalsAllowedToRetrieveManagedPasswords DH2019A$ 

it was setting the principals allowed to retrieve the managed passwords for MyService to DH2019A VM. As expected. But not as expected removing the privilege from all the other VMs I granted that permission before. With no warnings.

It is probably the most worrying part about using gMSAs for the service authentication in production, as I plan to do it. All it will take is one sysadmin to run a command like that to break all the services, potentially on all docker hosts. To mitigate we have decided to grant the permissions through a domain group to which we will add docker hosts. That way there should be no need to run this command when scaling out.

Misleading Get- and Test-AdServiceAccount

Understanding the above problem with Set-AdServiceAccount was made much worse, by my misunderstanding how Get-AdServiceAccount and `Test-AdServiceAccount work.

If you are a domain admin the Get-AdServiceAccount will always return details of the gMSA if it exists. So it is of no use to check if the specific gMSA can be used on a given host.

If the gMSA was previously installed the Test-AdServiceAccount will return true regardless if the host account has permissions to retrieve the password or not. That permission is necessary for the gMSA authentication to work.

So with that in mind neither command is fit for checking if a specific host has permissions it needs to use a gMSA. I was not able to find anything better than attempting to install it again with Install-AdServiceAccount. It will either install it again, or display error message indicating that the computer has no permissions to retrieve the password.

Remove-AdServiceAccount

This does not remove previously installed gMSA from the local host. It removes the gMSA from the domain!

The SSPI context error.

If you try to use domain authentication from the service running on a docker host which has no permissions to retrieve the gMSA password you will get fairly generic error tell you that the SSPI context couldn’t be created.

There are scores of blog posts and msdn documents explaining how to troubleshoot many SSPI context errors. Not a single one I found mentions any problems with gMSA. I have learnt a lot about SSPI and how it really works, just to eventually realise that everything is fine, and I have to look for problems somewhere else.

There is no localhost

When you run a standalone container you can access it from the same host on the localhost. By default a nat network is used and it allows communication on the hosts IP. When moving to the swarm mode (using docker service create not just docker create) by default the ingress network is used and the localhost is not available. You have to use the public IP address of the docker swarm. There is a lot of blog posts how to define your own overlay or bridge networks. I suppose they all work on Linux, but on windows when using an overlay network you cannot use host IPs (so no localhost) and you cannot create bridge networks at all.

PS C:\> docker network create -d bridge bnet
Error response from daemon: could not find plugin bridge in v1 plugin registry: plugin not found

I was able to make it work when publishing the port directly on the host using --publish published=8001,target=80,mode=host instead of the shorter -p 8001:80 (ports obviously may be different) but I don’t think that’s a configuration I’d be using so… I simply gave up. I use public IP and everything works.

Security considerations

To install a gMSA on a host which has permission to read the gMSA’s password you don’t need to have any extra permissions. It appears that anybody with access to PowerShell on the host can do Install-AdServiceAccount -Identity MyService. There are no restrictions which credential spec file can be sued for which servce either. This means that if somebody has access to the docker host they can create a new service using any gMSA to which the host itself has permissions.

gMSA name lenght limit

The Group Managed Service Account’s name is limited to 15 characters. Not a big deal, but it messed up a carefully agreed naming strategy and in the end I have vowel-less service names.

New-CredentialSpec silently overwrites existing files.

Active Directory, Windows Containers in Swarm Mode and SQL Server

In my previous post I have explained how I was able to connect from windows containers running on docker to a SQL Server cluster on a network using domain authentication (with gMSAs) rather than SA logins and passwords.

gMSAs in docker swarm mode

After I got the containers using Group Managed Service Accounts working on a single Docker host I went on to try the same in the swarm mode. My plan was to simply replace docker run -d part of the command creating the container with docker service create, but it turns out that it is not that simple, especially if you don’t have a lot of experience with swarm mode. It is also worth noting, that I had a lower success rate than when I was experimenting with standalone containers. I was able to make it work on all the same Windows versions (2016, 1803, 2019, 1809) but only when using Docker 18.09 and not on the 17.06 which was on the image I used for 1803 tests).

Demo setup

Similarily to the previous post I have tested this on a range of operating systems and docker versions, but what I want to show here is how it worked on Windows 2019 and Docker 18.09.

gMSA_Docker_Service_1

To make it a bit more exciting (and because of how Docker Swarm works) this time I will be testing the service from a web browser. To start with it does’t work. That is because there is no service listening on port 8101.

gMSA_Docker_Service_1_web

Creating a service

To create a service docker service create command is used. When compared to docker create some parameters are different, for example there is no --security-opt used in the previous post and instead --credential-spec is used. But first things first. Let’s just create a service using michalporeba/sqlgmsatest:1809nano image with minimal configuration and see what happens.

docker service create -p 8101:80 michalporeba/sqlgmsatest:1809nano

gMSA_Docker_Service_2

The -p 8101:80 makes the service available on the 8101 port using the default ingress network. No errors, the service is running, it is converged, so let’s try to connect to it!

gMSA_Docker_Service_2_WebRequest

And here is the first surprise. The localhost doesn’t work. That’s a swarm thing and although it is possible to publish ports in host mode it is not how I would be running in production, so I will just open the ports and connect to the service externally using a web browser.

gMSA_Docker_Service_2_web

OK, so the api/info call was successful. The service from michalporeba/sqlgmsatest:1809nano image is running and responding. So the next task is to use it to query theTestDBdatabase on my test instanceDB.sqlgmsa.local`.

gMSA_Docker_Service_2_web2

Adding the gMSA

Not authorized! But who? The NT AUTHORITY\ANONYMOUS LOGON. That is because despite the docker host being member of the sqlgmsa.local domain, the container running the service is not. To fix it, exactly as in the case of standalone container a Group Managed Service Account has to be created, installed and a credential file created.

# Create gMSA
New-AdServiceAccount -Name MyService -DNSHostName sqlgmsa.local `
  -PrincipalsAllowedToRetrieveManagedPassword "Domain Controllers", "Domain Admins", "CN=DockerHosts,CN=Computers,DC=sqlgmsa,DC=local" `
  -KerberosEncryptionType AES128, AES256

# Install it
Install-AdServiceAccount -Identity MyService

# Import the module to manage Credential Specs
Import-Module .\PsModules\CredentialSpec.psm1

# And create a spec file for MyService
New-CredentialSpec -Name MyService -AccountName MyService `
-Domain (Get-AdDomain -Current LocalComputer)

gMSA_Docker_Service_3

Consistency is everything, isn’t it? -Name, -Identity, -AccountName on those commands above refer to the same thing, the gMSA name and have to match. The -Name parameter on the New-CredentialSpec command is used to control the name of the json file containing the credential spec. The filename can be anything, and it doesn’t need to match the account name, but I find it easier if it does. The existing credential spec files can be found in C:\ProgramData\docker\CredentialSpecs\ or by using Get-CredentialSpec command from the CredentialSpec.psm1 module.

The next step is to use the newly created credential spec file when creating the service. The --security-opt is not supported when created a service and --credential-spec has to be used instead.

docker service create -p 8102:80 `
  --credential-spec file://MyService.json `
  michalporeba/sqlgmsatest:1809nano

gMSA_Docker_Service_4

The new service now runs on port 8102 and should use the new MyService identity. Let’s see.

gMSA_Docker_Service_4_web

Almost there! Login failed for user SQLGMSA\MyService$ That’s good, that means the correct identity has been picked up, so the last thing to do is to create the login on the SQL Server.

gMSA_Docker_Service_5_sql

And now, as if by magic

gMSA_Docker_Service_5_web

The test web service, written in C#, using .net core is hosted in a Docker container running on windows host,and queries a SQL Server database using domain authentication.

Active Directory, Windows Containers and SQL Server

The problem

Everything appears to be in containers nowadays, even the SQL Server. But still there are mixed environments, people and companies wanting to try containers without going all in. So I was wondering how practical would it be to have .net core services on docker, running in Windows containers connecting to an external, old fashioned SQL Server instance? Also, as it is all in a Windows domain, I’d like to use domain authentication so I don’t have to worry about managing passwords.

Simple isn’t it? Well, it turns not that simple as not everything is on the domain. The SQL Server is, the docker hosts are, but the containers are not.

Additionally there are differences depending on whether you want to run as an independent container, or in docker swarm mode. This blog post focuses on standalone containers, and the swarm mode is covered in the follow-up post.

The quick answer

The good news is that it is not an unreasonable requirement and it has been done before. The solution is to use Group Managed Service Accounts (gMSA) and Credential Spec Files. A number of people have already documented their efforts. Some were more successful than others.

My story

My problem was that I wasn’t able to make it work just by following any single write-up. In fact, for a few days, I was not able to get it going at all. But eventually it happened and here is a step by step description of how I made it work on Windows Server 2016, 1803, 1809 and 2019 as the host OS and 2016, 1803 and 1809 in full and nano options as the container base image. Generally, it is very simple once you know what to do, and more importantly what not to do (more about it later).

Test setup

To test it I have set up a virtual lab environment on Azure with 6 VMs

  • DC – Windows Server 2016 Datacenter acting as a domain controller
  • DB – Windows Server 2016 Datacenter with SQL Server 2017 Developer edition installed
  • DH2016A – Windows Server 2016 Datacenter with Containers (Docker version 18.09.2)
  • DH1803A – Windows Server 1803 with Containers (Docker version 17.06.2-ee-18)
  • DH1809A – Windows Server 1809 with Containers (Docker version 18.09.0)
  • DH2019A – Windows Server 2019 with Containers (Docker version 18.09.1)

I have created a sqlgmsa.local domain and joined all the VMs to it. SQL Server was run using SQLGMSA\SqlServer Managed Service Account without any special permissions.

In the domain I have 2 service accounts SQLGMSA\ServiceA and SQLGMSA\ServiceB. Both have logins on the SQL Server instance. I will be setting some of my containers to connect to the SQL Server as ServiceA and some as ServiceB.

Initially I tested the connectivity from containers build with full base images (standard mrc.microsoft.com/windows/servercore) and using dbatools module to run queries from them to the DB.sqlgmsa.local server. To be able to test nano based images I created two test images containing simple .net core WebAPI service written in C# with two public methods. Calling api/info you can check if the service is running, what system is it running on. Calling api/query/ attempts to open connection to the specified database (or master if the db parameter is not provided) and returns information about the database, the original login and the current user. A simple query

select 
     @@version SqlServer
    ,db_name()  [Database]
    ,current_user CurrentUser
    ,original_login() OriginalLogin
for json path

The test images are available on docker hub and the source code here on github.

PowerShell and AD

In the example I am using PowerShell to manage my active directory. If the commands I use don’t work for you you may be missing the AD modules. To install them add the RSAT-AD-PowerShell windows feature by executing this PowerShell command

Add-WindowsFeature RSAT-AD-PowerShell

Group Managed Service Accounts (gMSA)

Managed Service Accounts where introduced some time ago to reduce overhead associated with managing passwords for service accounts. The Group Managed Service Accounts solve the same problem but unlike MSAs gMSAs can be used across multiple computers.

To start using gMSAs on a domain a KDS Root Key has to be created first. It is the key with which passwords shared between the computers on the domain are protected. If your domain has other MSAs already you will not need to do it again.

To create a KDS Root Key Run I used this PowerShell command on the domain controller

Add-KdsRootKey -EffectiveTime (Get-Date).AddHours(-10)

and then to verify that it has been created

Get-KdsRootKey

Now to create the test service accounts I used the following commands. I was doing it on the DC, but with the right permissions it should be possible to do from any computer on the domain.

New-AdServiceAccount -Name ServiceA -DNSHostName sqlgmsa.local `
   -PrincipalsAllowedToRetrieveManagedPassword "Domain Controllers", "Domain Admins", "CN=DockerHosts,CN=Computers,DC=sqlgmsa,DC=local" `
   -KerberosEncryptionType AES128, AES256

New-AdServiceAccount -Name ServiceB -DNSHostName sqlgmsa.local `
   -PrincipalsAllowedToRetrieveManagedPassword "Domain Controllers", "Domain Admins", "CN=DockerHosts,CN=Computers,DC=sqlgmsa,DC=local" `
   -KerberosEncryptionType AES128, AES256

Where ServiceA and ServiceB are the names of the accounts and "CN=DockerHosts,CN=Computers,DC=sqlgmsa,DC=local" is the distinguished name of the group I have created for the docker hosts.

If you don’t know the exact distinguished name running this command can help

Get-AdGroup -filter { name -like "yourgroupname" }

Now on every docker host all the specific service accounts (2 in my test case) have to be installed so that the host OS can access them.

Install-AdServiceAccount -Identity ServiceA
Install-AdServiceAccount -Identity ServiceB

If there is an error message like this, it means the permissions were not set correctly
Install-AdServiceAccount : Cannot install service account. Error Message: ‘{Access Denied}

Credential Spec file

Docker Credential Spec Files have been created specifically to solve the problem of passing gMSA to containers. They are plain json files with information about the service account. It is possible to create the files manually, but there is module for it. It is documented here but here is a short instruction of how to create get and import the module.

This part needs to be done on every docker host.

# 1. Set TLS1.2 support from PowerShell so the module can be downloaded from github. 
PS C:\Tmp> [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

# 2. Download the psm1 file using Invoke-WebRequest
PS C:\Tmp> Invoke-WebRequest "https://raw.githubusercontent.com/MicrosoftDocs/Virtualization-Documentation/live/windows-server-container-tools/ServiceAccounts/CredentialSpec.psm1" -OutFile "CredentialSpec.psm1"

# 3. Import the module
PS C:\Tmp> Import-Module .\CredentialSpec.psm1

With CredentialSpec module imported for each gMSA a credential spec file has to be created.

PS C:\> New-CredentialSpec -Name ServiceA`
  -AccountName ServiceA `
  -Domain $(Get-AdDomain -Current LocalComputer)

PS C:\> New-CredentialSpec -Name ServiceB`
  -AccountName ServiceA `
  -Domain $(Get-AdDomain -Current LocalComputer)

The list of existing files can be obtained with

PS C:\> Get-CredentialSpec

Name     Path
----     ----
ServiceA C:\ProgramData\docker\CredentialSpecs\ServiceA.json
ServiceB C:\ProgramData\docker\CredentialSpecs\ServiceB.json

And finally, run the containers passing the credential spec files with the --security-opt parameter. (This is an example from DH2019A using the 1809 nano base image).

docker run -d -it -p 8001:80 `
   --security-opt "credentialspec=file://ServiceA.json" `
   --name ServiceA `
   michalporeba/sqlgmsatest:1809nano

docker run -d -it -p 8002:80 `
   --security-opt "credentialspec=file://ServiceB.json" `
   --name ServiceB `
   michalporeba/sqlgmsatest:1809nano

The proof is in the pudding

After checking both containers are running with docker ps I can start testing. As the test is not focused on anything else but domain authentication I didn’t open any ports to the lab, and all I was doing was to either connect to the container and use dbatools to execute a query on the db server, or from the docker host connecting to the service listening on the published port. Here are the example calls using Invoke-WebRequest on DH2019A.

sqlgmsa.proof

PS C:\> $env:ComputerName
DH2019A
PS C:\> docker start ServiceA
ServiceA
PS C:\> docker start ServiceB
ServiceB
PS C:\> docker ps
CONTAINER ID        IMAGE                               COMMAND                  CREATED             STATUS              PORTS                           NAMES
c6382bb7d816        michalporeba/sqlgmsatest:1809nano   "dotnet TestService.…"   2 days ago          Up 4 seconds        443/tcp, 0.0.0.0:8002->80/tcp   ServiceB
02ece189cb74        michalporeba/sqlgmsatest:1809nano   "dotnet TestService.…"   2 days ago          Up 8 seconds        443/tcp, 0.0.0.0:8001->80/tcp   ServiceA
PS C:\> # Service A
PS C:\> (Invoke-WebRequest -UseBasicParsing http://localhost:8001/api/info).Content
["OS:  Microsoft Windows 10.0.17763 ","Framework: .NET Core 4.6.27317.07"]
PS C:\> (Invoke-WebRequest -UseBasicParsing http://localhost:8001/api/query/DB.sqlgmsa.local).Content
[{"Database":"master","CurrentUser":"guest","OriginalLogin":"SQLGMSA\\ServiceA$"}]
PS C:\> # Service B
PS C:\> (Invoke-WebRequest -UseBasicParsing http://localhost:8002/api/info).Content
["OS:  Microsoft Windows 10.0.17763 ","Framework: .NET Core 4.6.27317.07"]
PS C:\> (Invoke-WebRequest -UseBasicParsing http://localhost:8002/api/query/DB.sqlgmsa.local).Content
[{"Database":"master","CurrentUser":"guest","OriginalLogin":"SQLGMSA\\ServiceB$"}]
PS C:\>

Conclusions

The above setup really boils down to 5 steps. If you want to use windows authentication from windows containers on docker to a SQL Server instance (or a cluster you have to

  1. Create gMSAs for your services
  2. Create logins for the service accounts on the SQL Server
  3. Install gMSAs on the docker hosts
  4. Create credential spec files
  5. Create containers with --security-opt parameter pointing to the credential spec file.

That’s it. In my case I was able to make it work (for standalone containers, not in swarm mode) on different OS versions (2016, 1803, 1809, 2019) using full and nano base images and using docker 17.06 and 18.09. However, there can be surprises and pulling hair. In the week I spent trying to figure it out I had a number of moments when I thought I’ve got it, just to realise that what worked a moment ago, doesn’t any more.

The biggest lessons where

  • use AD Groups for managing access to gMSAs rather than individual computer accounts,
  • be very careful with Set-AdServiceAccount which I have seen in some of the posts out there,
  • SSPI context errors is not what it seems, and can be very annoying

More details about the lessons learnt can be found in the follow up post.

Trying to do the same but in a service run on Docker in swarm mode is similar, but not exactly the same. I have described it here.

This post is long enough as it is, so I will not go into the details of those lessons learnt here but instead include them in a follow up to which I will link here later.

T-SQL Tuesday #111 – Why?

T-SQL Tuesday logo

This post is a part of T-SQL Tuesday #111 that invites SQL Community to answer Andy Leonard’s question of Why? Why do we do, what we do? Here is my answer.

 

Why?

My job titles typically include words like ‘database’, ‘DBA’, ‘SQL’, ‘Server’ but that’s just a corporate technicality, really. But if we met and you asked me who I am, the most likely answer you would get is that I am an eXplorer.

An eXplorer! Perhaps it is not the most obvious answer, especially in a professional, an office setting. It took me quite a while to realise it myself. One day a colleague called me the most intrinsically curious person he has ever met. I’m not entirely sure how it was meant but I took it as a compliment. And then I realised that he might be onto something.

I want to know! How are things made? How do they work? Why this and not the other way? I want to know what are the limits and what is beyond them? What is out there?

And being like that, I suppose, I could be a mechanical engineer or a car mechanic. But those professions don’t appeal to me at all! Too obvious. Too practical. Too palpable. In a sense too easy while at the same time too much hard work! I’m drawn to problems that require building complex abstractions in mind, understanding rules, but at the same time being creative.

That curiosity and the need for the not-so-obvious has got me to where I am now. Professionally it started with games on ZX Spectrum. To understand how games are made I found an interest in programming. Programming led to an interest in how computers are made. Assembly programming back in the day was such fun and a way to explore the hardware. You cannot really see the registers in the CPU or the bits in memory but if you got it all right, it was possible to write your own keyboard drivers, or modifying the VGA memory and push the limits of what seemed possible. From there I got into networks, switches, routers, servers. Somehow bits, ones and zeros travel from a computer to computer and make stuff happen. It is all difficult to see so abstractions, mental models had to be created on every level of the OSI model. Database engines were the next obvious choice as they appear to be the things people I worked with found difficult and obscure while at the same time being the things where hardware and software, infrastructure and algorithms come together.

And then it stopped there. At least professionally. Sometimes I wonder if the Peter Principal applies to personal development too? Are the databases (and data they contain) a problem difficult enough for me? Have I stopped because I reached my own level of incompetence? Maybe. Or maybe not. Maybe it’s just the fact, that there are limits to the time and space, and what one can do within those constraints?

The time is an important factor as the same forces, the curiosity, the taste for mental gymnastics shaped my other interests too. I speak 4 languages and I continuously study more (not very successfully, I must say). I studied music. Bassoon to be exact. I still spend a lot of time playing all sorts of instruments, experimenting with algorithmic music, sometimes composing. On a day off I go caving or cave diving in tight, muddy Welsh sumps. Not for the pleasure of scuba diving. There is no pleasure in it. Being submerged in a freezing cold muddy water, finding my way mostly by touch in a complex three-dimensional space. No, ‘fun’ is definitely not the word, but that is one of the last frontiers that doesn’t involve space flights. There are still places no human has ever been to before on Earth. There are still places for which there are no maps, places which are beyond known limits. I go there to see what is out there, to eXplore, to be the first.

So perhaps, the real answer to the question of why do I do what I do, why am I a DBA or a data professional is that nobody I know is paying for being a cave diver, and while the consequences of getting things wrong are different, the risks and problems faced are almost exactly the same. Yes, the way I see them they are almost exactly the same. The same traits that make me a reasonable cave diver, make me a decent production DBA too, but that’s another story.

A DBA’s thoughts on ORMs

What do you think about using NHibernate with SQL Server? What would you say if we were to use Entity Framework on our next project? What is your opinion on ORM frameworks?

I am a sort of DBA who spends a lot of his time working with developers. Deep in the implementation trenches, cutting code, trying to prevent any future ‘server issues’ by influencing the design at early stages of development. I find it much more efficient than waiting for them to chuck some code over the fence to us when there is very little we can do, but complain about them and get upset that somehow indexes and statistics don’t solve the problem. And so I hear those sort of questions a lot and hardly every I have the time to answer them in any other way than just to say ‘it depends’.

So here is an attempt on answering this question.

The developer in me wants to say:
Of course, use an ORM! Go code first if you can. It saves the time, it deals with the Object-Relational Impedance Mismatch problem, it keeps your code clean as there is no need for those strange looking data queries. All the code is in one place, one solution, easy to find, read and understand. It is data engine agnostic too, so we can deploy it on MySQL, Oracle, PostgreSQL or SQL Server. On anything really.

But then the DBA in me wants to shout:
Are you mad? Of course not! Don’t use ORMs. Ever. They produce unreadable, inefficient queries that are difficult to understand or optimise. The code first approach typically leads to inefficient schema. New database engine features are ignored because cross vendor compatibility is more important than performance. And don’t you see how those leaky abstractions of generic repositories you are using? Really, passing IQueryable to the business layer? Maybe you have the ability to run it on multiple data engine, but now your business layer depends on your ORM framework and the data model.. Read the Clean Architecture by Uncle Bob, especially the part about keeping frameworks at arm’s length.

And so the developer responds:
OK. So I will be more specific with my repositories… perhaps. Fine. But I’m not going to write any SQL statements. I don’t want any magic strings in my code with no support from the IDE. And no, no stored procedures. We cannot have logic split into multiple layers. All code needs to be in the repo, all code needs to be tested. Don’t you see, stored procedures just don’t fit in the modern software development cycle. Besides, we have developers who can write LINQ and don’t need to know any SQL.

But the DBA with a smug look on his face says:
Ha! That idea of abstracting away technology, so that you don’t have to understand it has been tried before. Sometimes it works, sometimes it doesn’t. What happened to WebForms. Wasn’t it the idea to hide HTML and JavaScrip to make web development easier for existing windows developers. How did that go?

And that’s how it starts again and again, and the discussion in my head goes on and on. But eventually I come to similar sort of conclusion time after time, and here is what I actually do. (It is a compromise on which both the developer and the DBA in me agree on, allowing me to stay sane).

  • For Proof of Concept work I use ORMs and the code first approach. That saves a lot of time and effort, and the code will be a throwaway anyway. My ORM of choice is Entity Framework but it doesn’t really matter.
  • I don’t spend much time thinking about data types. In most cases string defaulting to nvarchar(255) is good enough for a PoC.
  • I prefer to use EF Core as it supports in memory storage for even faster PoC development and testing.
  • Just in case it is not thrown away (as it should), I keep my architecture clean. I make sure to use specific repositories for data access, and that the repository abstraction is not leaking any implementation details. A repository takes and returns business objects and is using ORM framework internally only.
  • On projects which will not be thrown away I start with Dapper (a micro ORM) and stored procedures. It is a bit more work but forces me to design the data structures better, and offers a lot benefits for the future (more about it later in this post).
  • While I agree that logic should be in one place, there are different types of logic, and those should be implemented independently. There is UI Logic, there is Business Logic and there is Persistence Logic which I implement in a repository or in stored procedures. A good example would be a soft delete functionality.
  • All SQL code is kept in the same solution, is tested and deployed through the normal CI/CD channels using DbUp project.

So my answer is

Use ORMs as long as they work for you, but architect your code in such a way, that you don’t depend on them, and ready to ditch them when they start to cause more problems then they solve. Consider micro ORMs. Try Dapper.

Here are a few more benefits of using Dapper with stored procedures

  • Dapper has much smaller footprint than NHibernate or Entity Framwork.
  • Dapper is faster, almost as fast as a DataReader, when compared to full ORM frameworks. According to this at the moment of writing this post it is 10 times faster than NHibernate.
  • While being small and fast Dapper still takes the ORM problem away.
  • Stored procedures add some extra code that needs to be written, but allow access to the latest database engine features. In case on SQL Server those can be Hekaton (In-Memory OLTP), JSON or XML data types, graph structures, temporal tables, windowing functions and much more.
  • Stored procedures make performance troubleshooting and reviews much easier. For DBAs it is much easier to understand which part of an application creates the load and therefore what it is trying to do with well named stored procedure rather than a lot of auto generated SQL statements.
  • The cooperation with Database Developers is much easier, as they can easily identify queries that need to be optimised, and then improve them without worrying (too much) about any non SQL code.
  • Even if you don’t have DBAs and DBDs just now, you might in the future. If the business is successful it might be that you suddenly need to get somebody with those skills to help you. Having good structure, with a separated data layer will make their life easier, your bill lower and everybody happier.

Hashing values in SQL Server

TL;DR

  • SQL Server can hash values using some of the common hashing algorithms like MD or SHA.
  • It is possible to use XQuery in addition to XPath in XML value() function to do things T-SQL cannot do on its own.

The Details

Hash values or (hash codes) is what we typically use to store_passwords in databases. We use salt values too. Definitely, we don’t store clear text passwords. Right?

Typically during a login to an application, a password is combined with a salt value stored somewhere in a database and a hash value is calculated which is then compared with the hash value previously stored in a database. But there are other options. It is possible to calculate hash values directly in a database using T-SQL. It could be useful if a bulk update needs to be performed if you want to generate a lot of test users with predefined passwords during a database migration, but it is also possible to overwrite a password hash and gain access to the application as any user.

SQL Server (starting with 2008) has hashbytes function which can be used to calculate hashes using a number of different algorithms. The algorithms supported differ from version to version. The latest 2017 supports MD2, MD4, MD5, SHA, SHA1, SHA2_256, SHA2_512.

Let’s have a look at how to use it

declare @password nvarchar(32) = 'Secret1234'

select 'MD5' Algorithm, hashbytes('md5', @password) Hash
union all select 'SHA' ,hashbytes('sha', @password)
union all select 'SHA2_256' ,hashbytes('sha2_256', @password)

which produces something like this

Hashing results

You can see that the results are of varbinary type. That’s OK if your application is storing the hash in this format, but from my experiance, most developers will not know that a byte array can be stored in a database and they will convert it into a Base64 string.

SQL Server does not do Base64 encoding (not as far as I know) but it does support XML and XQuery, and they do encoding. So let’s use it.

declare @password nvarchar(32) = 'Secret1234'

select Algorithm
,convert(xml, N'').value('
xs:base64Binary(xs:hexBinary(sql:column("Hash")))', 'varchar(max)'
) Base64Hash
from (
select 'MD5' Algorithm, hashbytes('md5', @password) Hash
union all select 'SHA' ,hashbytes('sha', @password)
union all select 'SHA2_256' ,hashbytes('sha2_256', @password)
) t

Here I’m converting an empty string N'' to an XML type which creates an empty XML object which allows me to use it’s value() method to execute XQuery which is a functional language. In most examples of XML in SQL Server the only thing you will see is XPath in value() but XQuery can be used too.

The results are more what you’d expect:

Hashes to Base64

One thing to note. The hashing algorithms operator on bytes so are not only case sensitive but type sensitive too.

declare @password nvarchar(32) = 'Secret1234'

select
'varchar' Type
,hashbytes('md5', convert(varchar(32), @password)) Hash
union all select
'nvarchar'
,hashbytes('md5', convert(nvarchar(32), @password))

Have a look at the results. Only because type used is different, the hash is different too.

Hash values using different types

If you want to make it work with .Net make sure to use nvarchar as in .Net strings are Unicode.

How can it be useful?

  • It is possible to push hash calculation and comparison to database which means the correct hash and salt value are never loaded to the application memory.
  • It is possible to generate hashes for test users in database seeding scripts avoiding application doing row by row processing.
  • It is possible to generate hashes of objects to detect changes (although the checksum function can good enough and faster too).
  • It is possible to use XQuery to do things T-SQL cannot do.

But it brings some risk too
* It is possible to gain access to an account by updating values in a database even when using hashing with salt.