Performance of JSON in SQL Server

Have you heard claims that in SQL Server the new JSON data support performs 10 times better than the old XML storage? And all that despite using nvarchar type for storage instead of the specialised xml type. Claims like this one on I used to do a lot of XML in SQL Server and now I use JSON too so I decided to check how they compare when it comes to performance in SQL Server (2017).

Setting the Scene

XML Data Type has been supported by SQL Server since version 2005. It was implemented as a new built-in data type. It is possible to define a column as an XML type with, or without schema and then query the non-relational data together with relational data. It came with for xml clause and three main methods: modify, query and value defined on the XML type.

Let’s see an example of retrieving a value of a node from an XML variable:

declare @xml xml = N'<Test>123</Test>';
select @xml.value(N'(//Test/text())[1]', N'int');

JSON support was introduced in SQL Server 2016. That’s right, JSON support but not a JSON type. The data is stored as plain text, something that was possible before of course, but there is for json clause and 4 methods: isjson, json_modify, json_query and json_value.

An example of retrieving value from a JSON document can look something like this:

declare @json nvarchar(max) = N'{ "Test": "123" }';
select json_value(@json, '$.Test');

It would be difficult to compare performance of every single usecase so I have decided to focus on comparing single value retrieval from a document using .value and json_value for XML and JSON respectively.

Sample XML document

   <row key="ValueA" value="123" />
   <row key="ValueB" value="-432" />
   <row key="ValueZ" value="849" />

Sample JSON document

   "ValueA": "123",
   "ValueB": "-432",
   "ValueZ": 849"

I have tables created where there are only two columns, an integer primary key Id and a Data column of the test type. The tables are populated with 100,000 random records. The test consists of simply calculating a sum of a specific property from the document across all the rows in the table. To put things in perspective I will compare performance on the document types to aggregate on an int and an varchar type columns.

Test XML query

select sum(
   [Data].value('(//test/row[@key="ValueA"]/@value)[1]', 'int')
from dbo.XmlData

Test JSON query

select sum(
   convert(int, json_value([Data], '$.ValueA'))
from dbo.JsonData

There are no indexes or computed columns to help with the performance of the queries as the objective is to simply measure performance of parsing and document querying.

First Comparison Results

Json in SQL Server Performance Comparison

That can’t be right!? Can it? JSON query took on average of 0.43 seconds which is 36 times faster than the 16.79 seconds it took to sum the values out of the XML documents. Better (or worse) still when compared to an XML with schema which took 40.47 seconds the JSON type seems to be 94 times faster. So what happened to the 10 times faster claim? Well, the native VARCHAR column performed 10 times faster at 0.04 seconds than the JSON one. the INT column was still faster at 0.03 seconds.

It is also worth looking at the logical reads. Native INT and VARCHAR columns required just over 200 logical reads while both JSON and schema-less XML took around 12,000 logical reads, while the XML with schema required over 34,000.

Playing with storage types

The above were rather unexpected results. Both in terms of CPU and the logical reads. And that got me thinking, what if I could reduce the number of reads? The simplest way to do so would be to reduce the size of data by changing the types. I have created additional tables with the exact copy of the JSON test records but using VARCHAR(max), NVARCHAR(512) and VARCHAR(512) types in addition to the standard (as in most common in all the examples I have seen) NVARCHAR(max).

Performance of JSON data with different storage types

Another surprise! As expected using VARCHAR instead of unicode NVARCHAR halved the number of logical reads but that didn’t translate to improved CPU times. Using json_value on VARCHAR(MAX) was significantly (and consistently) worse than the same method on NVARCHAR(MAX). Specific maximum length types performed better than the the types with the max length but there too unicode required less CPU time despite having twice as many page reads.

At the beginning or at the end

So is it really possible that the old XML type is 36 times worse than on-the-fly JSON parsing? Perhaps there are some benefits to XML storage? I’d imagine the JSON parser is clever enough to stop parsing after the first matching node is found which probably means that the performance will depend on weather we are looking for a value of a property close to the beginning or close to the end of the document? And, perhaps, that’s where XML will perform better being pre-parsed type? I have modified the tests to look at the first and last values from the documents.

Performance depending on value position in a document

Here the results were more as expected. When using json_value on a NVARCHAR(max) data ValueZ being at the end of the document structure performed worse than ValueA which is at the beginning. In fact ValueZ queries took 10 times longer than ValueA. XML still has shown differences but ValueZ queries took only 30% longer than the ValueA ones. Still worst case JSON query was 4 times faster than the best case XML query!


So that shows that JSON on unicode types uses less CPU time but requires more logical reads which suggests more pressure on RAM and potentially more I/O. How does the storage requirements of all those types compare?

Data storage comparison

XML with Schema is the most expensive in terms of storage, by quite some margin. Despite being visually more verbose the XML type actually takes slightly less disk space than the unicode json which is twice as much as non-unicode json. As one would expect the standard INT columns require least.


JSON data in SQL Server 2017 really is faster than the old XML type. How much faster will depend on the position of the specific value in the document. That’s important, the structure and ordering of properties in the document matters. If you are in control of the JSON document structure consider putting properties more likely to be queried at the beginning of the document and those you know you will index at the end so that you don’t have to parse through them when working with properties which are not indexed. Types matter too. NVARCHAR is faster than VARCHAR but will use twice logical reads (and therefor RAM). Specific lengths perform better than using (max).

Command Line Tools for SQL Server

TLDR: To efficiently manage multiple SQL Server instances learn PowerShell and check out the dbatools. There is also the mssql-cli to look at too.

Since its introduction in 2005 the SQL Server Management Studio (SSMS) was the tool used for development and administration of SQL Server databases. Most of the common, day to day tasks one can perform by pointing and clicking without the need to remember (or at least type in) all those confusing commands. But I never liked mice much and the main reason I kept using SSMS was that the only real alternative was the sqlcmd command line utility which is a command line tool, but doesn’t make you any more productive than SSMS.

But the times are changing. I attended ~SQL~Data Relay earlier this week and it struck me that a lot of SQL Server demos where done without the SSMS. And as I started thinking about it I realised that in fact I don’t use it that frequently in my job either. With the power of PowerShell and the dbatools and dbachecks module I can no longer imagine managing SQL Servers with SSMS.


Imagine you’ve got 3 servers (just to make the example simple, but it could be 30 or 300, it doesn’t really matter). All we have to do is define a variable holding a list of those servers like so:

$s = server1,server2,server3

And now we can get to work. Here are a few example of tasks which in SSMS would involve too much clicking for my liking, and task I can perform with a single PowerShell command.

Find a Database

I’ve got hundreds of databases on each instance but I need to find the specific one. All I have to do is

Find-DbaDatabase -SqlInstance $s -Pattern "PartOfMyDatabaseName"

And it really can be just a part of the name, so I don’t have to remember the whole name, it doesn’t matter if I remember the beginning or the end of the name.

Check server logs

Let’s say I need to check all my servers error logs for a specific five minutes when our system had a wobble. With SSMS it’s hours of mouse abuse. With PowerShell it’s a single command again

Get-DbaSqlLog -SqlInstance $s -Afater '2018-10-10 10:00' -Before '2018-10-10 10:05`

It will collate all error logs for the time period between 10:00 and 10:05 on 10/10/2018. If the output is too long to read in the PowerShell console you can pipe it to Out-GridView. In fact you can do that with any powershell command returning data. Try adding | Out-GridView at the end of any command like so in the log example

Get-DbaSqlLog -SqlInstance $s -Afater '2018-10-10 10:00' -Before '2018-10-10 10:05` | Out-GridView

Find DB Growth Events

How to find when my databases' files have grown recently? It is possible with SSMS but why would you do it if you can do:

Find-DbaDbGrowthEvent -SqlInstance $s | Out-GridView

That’s it. That is how simple it is. The above command returns all the growth events from all of the servers previously defined in the $s variable.


I still use SSMS, especially when it comes to managing Availability Groups and the AG Dashboards. It is useful for report writing and some ad hoc querying although for those sort of tasks I started using ~SQL Server Operations Studio~ recently renamed to Azure Data Studio. It is based on VS Code and runs on Windows, Linux and macOS, and although it doesn’t have all the features of SQL Server Management Studio, it does have some gems of its own which are not available in SSMS. Another reason could be performance tuning and looking at execution plans, but for those reasons I have long switched to the free Sentry One product Plan Explorer.

For anything else I use PowerShell with dbatools. Looking at the content of SQL Relay demos I’m not alone on that trend.

Parsing Database Object Names

Problem: Parse database object identifiers from single name to 4 part name and extract individual elements. So for example Server1.MyDb.dbo.TableA is TableA in schema dbo in MyDb database on a linked server Server1.

I’m not looking for a solution that will validate the name is valid, that it meets all the requirements, or even if it is well formatted. I’m assuming that it is a reasonable name, I just want to extract all the components provided: Linked Server, Database, Schema and Object names.

Solution 1 – PowerShell and String Splitting

Define a function that splits the string by ‘.’ and assigns values based on number of elements in the array.

function Parse-SqlName {
    process {
        $output = [pscustomobject]@{ Server = ""; Database = ""; Schema = ""; Object = ""; }

        $names = $name.Split('.')
        if ($names.Length -ge 4) { $output.Server = $names[-4] }
        if ($names.Length -ge 3) { $output.Database = $names[-3] }
        if ($names.Length -ge 2) { $output.Schema = $names[-2] }
        if ($names.Length -ge 1) { $output.Object = $names[-1] }

        return $output 

To use it call

Parse-SqlName "Server1.MyDb.dbo.TableA"

This returns

Server Database Schema Object
Server1 MyDb dbo TableA

#Solution 2 – Regular Expression

Regular Expression is another way to parse strings so I set myself a challange to create one that will do the same as the above function and would work regardless how many parts of the identifier are present. Here is the result

[\[]?(?:(?:(?:(?<Server>[\w_&@$ -]+)[\.\[\]]+)?(?<Database>[\w_&@$ -]+)[\.\[\]]+)?(?<Schema>[\w_&@$ -]+)[\.\[\]]+)?(?<Object>[\w_&@$ -]+)\]?

After processing a string with it up to four groups will be available. It can be used in any language that supports regex, but here is an example using PowerShell again

function Parse-SqlName2 {
    process {
        if ($name -match '[\[]?(?:(?:(?:(?<Server>[\w_&@$ -]+)[\.\[\]]+)?(?<Database>[\w_&@$ -]+)[\.\[\]]+)?(?<Schema>[\w_&@$ -]+)[\.\[\]]+)?(?<Object>[\w_&@$ -]+)\]?') {
            return [pscustomobject]@{
                Server = $Matches.Server 
                Database = $Matches.Database 
                Schema = $Matches.Schema 
                Object = $Matches.Object 

Reseeding Identity Columns

Recently I have been asked to increase the current value of all identity columns in tens of databases. No, the point is not to argue if it was smart, or a necessary thing to do, it wasn’t production environment and it was helpful to people requesting it, so I focused on how to do it. This blog post is about how dbatools once again proved to be an invaluable tool.

The first instinct was, of course I can do it… just a little bit of dynamic SQL, perhaps a coursor or two and I’ll have it done. But the thing is, since I discovered dbatools I don’t like dynamic T-SQL any more, and I’m sure whatever the task at hand might be, somebody probably has implemented a dbatools function for that alraedy. In this instance it wasn’t not quite so, but there is Test-DbaIdentityUsage which proved to be helpful. With it, in no time, I had code looking like this:

    $db = $psitem.db
    (Test-DbaIdentityUsage -SqlInstance myinstance -Databases $db).ForEach{
        $newid = [int]($psitem.LastValue * 1.2)
        Write-Host "  processing [$db].[$($psitem.Schema)].[$($psitem.Table)] has max ID $($psitem.LastValue), setting it to $newid"
        Invoke-DbaSqlQuery -SqlInstance myinstance -Database $db -Query "dbcc checkident ('[$($psitem.Schema)].[$($psitem.Table)]', reseed, $newid)"

Not quite the one-liner I hoped for, but not all bad either. And after few minutes all the identity column next values were increased by 20%. Dbatools made me look smart again 😉

DB Stuck in Single User Mode

Some Background

TLDR? Solution is at the bottom of this page.

The internet is full of examples how to close all active connections to a database by doing something like

use master
alter database somedb set single_user with rollback immediate
alter database somedb set multi_user 

When you try it on your development machine it works just fine, when you do it in QA it is typically OK too, but one day you realise that you have made a mistake, that the use master should have been use somedb and now you have a production database in a single user mode, there is only one connection allowed at the time and yours is not the one. Worse, there are hundreds of clients, all queueing to be the only user!

You cannot connect to the database

Msg 924, Level 14, State 1, Line 1
Database 'somedb' is already open and can only have one user at a time. 

Trying to do anything that starts with alter database ... will fail too

Msg 5064, Level 16, State 1, Line 1
Changes to the state or options of database 'somedb' cannot be made at this time. The database is in single-user mode, and a user is currently connected to it. 
Msg 5069, Level 16, State 1, Line 1
ALTER DATABASE statement failed. 

Trying to get some ‘authority’ over the server by connecting as the admin (DAC) is of no use either.

So what can be done? If you happen to have one sql login per database (or a handful) then you can revoke that logins permission to log in, but then you have to make sure you grant it back, to all of the ones you revoked it from. It would work but typically either there are too many logins that would have to have rights revoked or there are too few logins and revoking any rights would make the outage wider than it already is.

A fairly obvious solution is to shut down the server and bring it back up in the single user mode. The Sqlservr.exe -m option. Then only one admin user will be allowed to connect to the instance, which will allow you to change the permissions of the affected database. The only problem is, that we were talking about a problem which affects almost exclusively busy production server. Shutting it down to fix just one database sounds like an overkill and I bet there will be a lot of explaining to do afterwards.


What I found to work quite well is to open two sessions. In the first one I execute a query that will constantly attempt to connect to the database which is stuck in the single suer mode

use somedb
go 200

That ensures that I’ve got a session that constantly is trying to connect. Then in the other session I will look up the SPID of the session currently connected and kill it. Obviously that assumes that I know my system, and I know that the application will recover if I kill the session.

select spid from sys.sysprocesses where dbid = db_id('somedb')
kill <spid>

Do it a couple of times making sure you don’t kill the spid of the first session and your first session should be connected to the database. Now all that is left is to finally do

alter database somedb set multi_user