DB Stuck in Single User Mode

Some Background

TLDR? Solution is at the bottom of this page.

The internet is full of examples how to close all active connections to a database by doing something like

use master
alter database somedb set single_user with rollback immediate
alter database somedb set multi_user 

When you try it on your development machine it works just fine, when you do it in QA it is typically OK too, but one day you realise that you have made a mistake, that the use master should have been use somedb and now you have a production database in a single user mode, there is only one connection allowed at the time and yours is not the one. Worse, there are hundreds of clients, all queueing to be the only user!

You cannot connect to the database

Msg 924, Level 14, State 1, Line 1
Database 'somedb' is already open and can only have one user at a time. 

Trying to do anything that starts with alter database ... will fail too

Msg 5064, Level 16, State 1, Line 1
Changes to the state or options of database 'somedb' cannot be made at this time. The database is in single-user mode, and a user is currently connected to it. 
Msg 5069, Level 16, State 1, Line 1
ALTER DATABASE statement failed. 

Trying to get some ‘authority’ over the server by connecting as the admin (DAC) is of no use either.

So what can be done? If you happen to have one sql login per database (or a handful) then you can revoke that logins permission to log in, but then you have to make sure you grant it back, to all of the ones you revoked it from. It would work but typically either there are too many logins that would have to have rights revoked or there are too few logins and revoking any rights would make the outage wider than it already is.

A fairly obvious solution is to shut down the server and bring it back up in the single user mode. The Sqlservr.exe -m option. Then only one admin user will be allowed to connect to the instance, which will allow you to change the permissions of the affected database. The only problem is, that we were talking about a problem which affects almost exclusively busy production server. Shutting it down to fix just one database sounds like an overkill and I bet there will be a lot of explaining to do afterwards.

Solution

What I found to work quite well is to open two sessions. In the first one I execute a query that will constantly attempt to connect to the database which is stuck in the single suer mode

use somedb
go 200

That ensures that I’ve got a session that constantly is trying to connect. Then in the other session I will look up the SPID of the session currently connected and kill it. Obviously that assumes that I know my system, and I know that the application will recover if I kill the session.

select spid from sys.sysprocesses where dbid = db_id('somedb')
kill <spid>

Do it a couple of times making sure you don’t kill the spid of the first session and your first session should be connected to the database. Now all that is left is to finally do

alter database somedb set multi_user

Automated Environmental Checks

Unit Test Your Environments

Infrastructure as Code has been around for quite some time now. There are many clever definitions one can easily find, but for me it is simply an extension of good coding (or perhaps DevOpsing) principals. You don’t go into production systems and keep making changes until it works. You write your code, you check it into your favourite repository, then there is some process that will automatically test it, and eventually it will make its way to production environment. That way if you make a mistake, you can see where you made it and if it happens to work just fine, you can easily do it again, and again, and again as you scale out.

It’s all ~~very~~ easy when you work with ‘cloud’ virtual environments that wait to ingest next config file and reconfigure themselves automatically. But what if you look after a more traditional setup? Exchange, Share Point, SQL Server clusters hosted on premises on Windows Servers, rather than Docker pods? What if you are not the only admin and people do go in there and make changes to live systems’ configuration, perhaps even for good reasons?

I’d say you do the same as any decent software developer would do when asked to take care of an old, perhaps unfashionable, non-microservice code base: When asked to change anything, start with writing unit tests so you know at the very least you will not make it any worse. That’s right, even if you cannot define your configuration with code, you can test it with it. PowerShell and Pester are great tools to do so. And if you happen to be responsible for SQL Servers, then there are unit tests already written for you, available as the dbachecks module. It’s MIT licensed. Go, install it and use it.

PowerShell and Pester

Pester is an open source test and mock framework for PowerShell. All modern Windows servers have PowerShell, so let’s use it. Jakub Jareš one of the contributors to the Pester project has written an excellent blog post about environmental checks, how to write them in Pester and why you should test you ~~tests~~ checks. That was my starting point when I took it on myself to improve dbachecks. If you haven’t yet, go and read it. I will try not to repeat what Jakub wrote there, but rather where I got from there while trying to find a way to structure checks in such a way, that they are both testable, and easily understood by non-developer sysadmins.

The Difference

Unit tests typically follow the AAA pattern.
* Arrange your context
* Act (as in perform the action you want to test), and finally
* Assert that the outcome matches your expectations

In unit testing the objective is to test the functionality detached (as much as practical) from the outside environment. Obviously, that’s something that cannot be done when doing environmental checks. After all, the environments is what we are testing, so the AAA becomes CCC
* Configure your context, get the configuration for the environment you are validating
* Collect the data about your environment
* Confirm the real life matches your expectations

Describing it like that stopped me going in circles thinking how do I test my tests. Now I’m testing my checks and to do that I follow the AAA pattern.
* I arrange my test by mocking my configure and collect functions
* I act by calling the confirm function passing it the mocked details
* I assert the outcome by checking if the confirm action behaved as expected

Ideally that would be it, write a Pester unit test and then a test for that unit test. In practice it is slightly more complex as we have to work with the limitations of PowerShell and Pester (which after all is primarly for unit test not environmental checks)

Code Examples

Typically, my testable check has 4 components in 3 files. Here is an example from the dbachecks’ Page Verify check.

First, we need to write the confirm function and its configuration
confirms\Database.PageVerify.ps1 which could be as simple as:

function Confirm-PageVerify {
    param (
        [parameter(Mandatory=$true,ValueFromPipeline=$true)]
        [object[]]$TestObject
    )
    process {
        $TestObject.PageVerify | Should -Be "CHECKSUM" -Because "we expect Page Verify to be CHECKSUM"
    }
}

But to be able to test configuration input and to make the Confirm function fit better with the rest of the framework we do

function Get-ConfigForPageVerifyCheck {
    $pageverifyValidValues = @("NONE", "TORN_PAGE_DETECTION", "CHECKSUM")
    $pageverify = Get-DbcConfigValue policy.pageverify
    if (!($pageverify -in $pageverifyValidValues)) {
        throw "The policy.pageverify is set to $pageverify. Valid values are ($($pageverifyValidValues.Join(", ")))"
    }
    return @{
        PageVerify = (Get-DbcConfigValue policy.pageverify)
    }
}

function Confirm-PageVerify {
    param (
        [parameter(Mandatory=$true,ValueFromPipeline=$true)]
        [object[]]$TestObject, 
        [parameter(Mandatory=$true)][Alias("With")]
        [object]$TestSettings,
        [string]$Because
    )
    process {
        $TestObject.PageVerify | Should -Be $TestSettings.PageVerify -Because $Because
    }
}

and to make sure it works as expected, we create unit tests for it
tests\checks\Database.PageVerify.Tests.ps1 like so

Describe "Testing Page Verify Confirms" {
    Context "Test configuration" {
        $cases = @(
            @{ Option = "CHECKSUM" },
            @{ Option = "TORN_PAGE_DETECTION" },
            @{ Option = "NONE" }
        )

        It "<Option> is acceptable as policy.pageverify value" -TestCases $cases {
            param($Option) 
            Mock Get-DbcConfigValue { return $Option } -ParameterFilter { $Name -like "policy.pageverify" }
            (Get-ConfigForPageVerifyCheck).PageVerify | Should -Be $Option
        }

        It "Throw exception when policy.pageverify is set to unsupported option" {
            Mock Get-DbcConfigValue { return "NOT_SUPPORTED_OPTION" } -ParameterFilter { $Name -like "policy.pageverify" }
            { Get-ConfigForPageVerifyCheck } | Should -Throw 
        }
    }

    Context "Test the confirm function" {
        Mock Get-DbcConfigValue { return "CHECKSUM" } -ParameterFilter { $Name -like "policy.pageverify" }

        $testConfig = Get-ConfigForPageVerifyCheck 

        It "The test should pass when the PageVerify is as configured" {
            @{
                PageVerify = "CHECKSUM"
            } | 
                Confirm-PageVerify -With $testConfig 
        }

        It "The test should fail when the PageVerify is not as configured" {
            {
                @{
                    PageVerify = "NONE"
                } | 
                    Confirm-PageVerify -With $testConfig
            } | Should -Throw 
        }
    }
}

And finally, when we know the configuration and confirm functions are working as expected, the checks\Database.Tests.ps1 which provides the definition of the check could be as simple as:

$config = Get-ConfigForPageVerifyCheck                  # Configure
@(Get-Instance).ForEach{                                # Collect on instance level
    @(Get-DatabaseInfo -SqlInstance $psitem).ForEach{   # Collect on database level
        It "$($psitem.Database) should have page verify set to $($config.PageVerify)" {
            # Confirm environmental details (in $psitem)
            $psitem | Confirm-PageVerify -With $config -Because "Page verify helps SQL Server to detect corruption"
        }
    }
}

But to make it display better, and to provide more information to the curious sysadmin who would like to know what the check does we do this instead:

Describe "Page Verify Settings Check" {
    $config = Get-SettingsForPageVerifyCheck                    # Configure
    @(Get-Instance).ForEach{                                    # Collect on instance level
        Context "Testing page verify setting on $psitem" {          
            @(Get-DatabaseInfo -SqlInstance $psitem).ForEach{   # Collect on database level
                It "$($psitem.Database) should have page verify set to $($config.PageVerify)" {
                    # Confirm environmental details (in $psitem)
                    $psitem | Confirm-PageVerify -With $config -Because "Page verify helps SQL Server to detect corruption"
                }
            }
        }
    }
}

If somebody is simply curious what the check checks, and why, all he has to do is to read the strings here or in the Pester output
* “Page Verify Settings Check”
* “Testing page verify setting on ”
* “Each should have page verify set to ”
* “Because page verify helps SQL Server to detect corruption”

The Small Print

To make sure the checks are easy to read and have as high as practical test coverage I came up with a number of guidelines

  • Try to unit test your checks. But it is better to have a manually tested check than don’t have one at all.
  • Prioritise readability in the checks’ file. Assume that is the file for non-developer to read.
  • Confirm-* functions should have -Because parameter which value is defined in the checks file, rather than have -Because in the Confirm-* function directly. That improves readability.
  • While Pester’s test cases might be useful for unit testing our checks, they should be avoided in the checks themselves. The code is less complex to read to an untrained eye.
  • The checks’ file should have as little logic as possible. The collection of data has to happen here, and it will include typically some looping and parameter passing. That is unavoidable, but try to avoid what you can, definitely the conditional bits like if and else should not be there. Put them in the Confirm-* function and validate it with unit tests.
  • If the test is not applicable for a resource in a certain condition, it is better to get that resource and skip that test from inside the Confirm-* function. That way we know nothing is missed, and the skipping logic can be unit tested as well. OK, that would be the ideal solution, but unfortunately it is currently impossible in Pester, so instead you can mark the test as inconclusive using Set-TestInconclusive. (Check the GitHub issue I raised about that extra functionality).

Next Steps

Go and write some environmental checks for your servers!
If you work with SQL Servers, get dbachecks and try them out.

VS Code and Pester

I have recently switched to Visual Studio Code for writing PowerShell scripts. (Here is why and how). At the same time I started using Pester to do TDD with PowerShell too. It works very well out of the box, at least at the beginning of a project. As you are writing your Pester tests file to cover the function you have just written you can simply hit F5 to execute all of the tests in that file. Easy.

It’s just that it is backwards. The tests should have already been written, and as you are writing your function that will satisfy the tests you try to press F5 and that function gets executed, not the tests. Even more so, if the function is part a module it gets executed out of context. Nothing works.

Trying to solve my problem I found a very good writeup on Debugging PowerShell in VS Code but it did not answer my question of how to do test driven development with pester and Visual Studio Code.

Eventually to solve my problem I have created workinprogress.ps1 script which I typically keep just outside of the repository and define the following VS Code configuration in my launch.json file.

{
  "version": "0.2.0",
  "configurations": [

  {
    "type": "PowerShell",
    "request": "launch",
    "name": "Test workinprogress",
    "script": "${workspaceFolder}/../workinprogress.ps1",
    "args": [ "${workspaceFolder}", "${workspaceFolder}/tests", "${file}" ]
  }
  ]
}

VS Code will pass the current workspace, the path to the unit tests (${workspaceFolder}/tests in the example), and the open file with the current line number. Based on that information the following will happen when you hit F5

  • if there is a module (.psm1) file in the workspace, the module will be reloaded
  • if the open file name ends with .Tests.ps1 the script will attempt to find the closest pester -Tag and execute only that tag from the current file. Failing that the whole pester file will be executed
  • if it is not a pester file but the matching (by name) test file exists next to the open file, it will be executed.
  • if there is a matching (by name) test file somewhere inside the tests folder (${workspaceFoder}/tests in this example) it will be executed.

A matching test file has the same name as the original file but ends with .Tests.ps1 instead of .ps1. So for a command file Get-MyValue.ps1 the matching tests file is Get-MyValue.Tests.ps1

The workinprogress.ps1 file is available on github here.