T-SQL Tuesday #111 – Why?

T-SQL Tuesday logo

This post is a part of T-SQL Tuesday #111 that invites SQL Community to answer Andy Leonard’s question of Why? Why do we do, what we do? Here is my answer.

 

Why?

My job titles typically include words like ‘database’, ‘DBA’, ‘SQL’, ‘Server’ but that’s just a corporate technicality, really. But if we met and you asked me who I am, the most likely answer you would get is that I am an eXplorer.

An eXplorer! Perhaps it is not the most obvious answer, especially in a professional, an office setting. It took me quite a while to realise it myself. One day a colleague called me the most intrinsically curious person he has ever met. I’m not entirely sure how it was meant but I took it as a compliment. And then I realised that he might be onto something.

I want to know! How are things made? How do they work? Why this and not the other way? I want to know what are the limits and what is beyond them? What is out there?

And being like that, I suppose, I could be a mechanical engineer or a car mechanic. But those professions don’t appeal to me at all! Too obvious. Too practical. Too palpable. In a sense too easy while at the same time too much hard work! I’m drawn to problems that require building complex abstractions in mind, understanding rules, but at the same time being creative.

That curiosity and the need for the not-so-obvious has got me to where I am now. Professionally it started with games on ZX Spectrum. To understand how games are made I found an interest in programming. Programming led to an interest in how computers are made. Assembly programming back in the day was such fun and a way to explore the hardware. You cannot really see the registers in the CPU or the bits in memory but if you got it all right, it was possible to write your own keyboard drivers, or modifying the VGA memory and push the limits of what seemed possible. From there I got into networks, switches, routers, servers. Somehow bits, ones and zeros travel from a computer to computer and make stuff happen. It is all difficult to see so abstractions, mental models had to be created on every level of the OSI model. Database engines were the next obvious choice as they appear to be the things people I worked with found difficult and obscure while at the same time being the things where hardware and software, infrastructure and algorithms come together.

And then it stopped there. At least professionally. Sometimes I wonder if the Peter Principal applies to personal development too? Are the databases (and data they contain) a problem difficult enough for me? Have I stopped because I reached my own level of incompetence? Maybe. Or maybe not. Maybe it’s just the fact, that there are limits to the time and space, and what one can do within those constraints?

The time is an important factor as the same forces, the curiosity, the taste for mental gymnastics shaped my other interests too. I speak 4 languages and I continuously study more (not very successfully, I must say). I studied music. Bassoon to be exact. I still spend a lot of time playing all sorts of instruments, experimenting with algorithmic music, sometimes composing. On a day off I go caving or cave diving in tight, muddy Welsh sumps. Not for the pleasure of scuba diving. There is no pleasure in it. Being submerged in a freezing cold muddy water, finding my way mostly by touch in a complex three-dimensional space. No, ‘fun’ is definitely not the word, but that is one of the last frontiers that doesn’t involve space flights. There are still places no human has ever been to before on Earth. There are still places for which there are no maps, places which are beyond known limits. I go there to see what is out there, to eXplore, to be the first.

So perhaps, the real answer to the question of why do I do what I do, why am I a DBA or a data professional is that nobody I know is paying for being a cave diver, and while the consequences of getting things wrong are different, the risks and problems faced are almost exactly the same. Yes, the way I see them they are almost exactly the same. The same traits that make me a reasonable cave diver, make me a decent production DBA too, but that’s another story.

A DBA’s thoughts on ORMs

What do you think about using NHibernate with SQL Server? What would you say if we were to use Entity Framework on our next project? What is your opinion on ORM frameworks?

I am a sort of DBA who spends a lot of his time working with developers. Deep in the implementation trenches, cutting code, trying to prevent any future ‘server issues’ by influencing the design at early stages of development. I find it much more efficient than waiting for them to chuck some code over the fence to us when there is very little we can do, but complain about them and get upset that somehow indexes and statistics don’t solve the problem. And so I hear those sort of questions a lot and hardly every I have the time to answer them in any other way than just to say ‘it depends’.

So here is an attempt on answering this question.

The developer in me wants to say:
Of course, use an ORM! Go code first if you can. It saves the time, it deals with the Object-Relational Impedance Mismatch problem, it keeps your code clean as there is no need for those strange looking data queries. All the code is in one place, one solution, easy to find, read and understand. It is data engine agnostic too, so we can deploy it on MySQL, Oracle, PostgreSQL or SQL Server. On anything really.

But then the DBA in me wants to shout:
Are you mad? Of course not! Don’t use ORMs. Ever. They produce unreadable, inefficient queries that are difficult to understand or optimise. The code first approach typically leads to inefficient schema. New database engine features are ignored because cross vendor compatibility is more important than performance. And don’t you see how those leaky abstractions of generic repositories you are using? Really, passing IQueryable to the business layer? Maybe you have the ability to run it on multiple data engine, but now your business layer depends on your ORM framework and the data model.. Read the Clean Architecture by Uncle Bob, especially the part about keeping frameworks at arm’s length.

And so the developer responds:
OK. So I will be more specific with my repositories… perhaps. Fine. But I’m not going to write any SQL statements. I don’t want any magic strings in my code with no support from the IDE. And no, no stored procedures. We cannot have logic split into multiple layers. All code needs to be in the repo, all code needs to be tested. Don’t you see, stored procedures just don’t fit in the modern software development cycle. Besides, we have developers who can write LINQ and don’t need to know any SQL.

But the DBA with a smug look on his face says:
Ha! That idea of abstracting away technology, so that you don’t have to understand it has been tried before. Sometimes it works, sometimes it doesn’t. What happened to WebForms. Wasn’t it the idea to hide HTML and JavaScrip to make web development easier for existing windows developers. How did that go?

And that’s how it starts again and again, and the discussion in my head goes on and on. But eventually I come to similar sort of conclusion time after time, and here is what I actually do. (It is a compromise on which both the developer and the DBA in me agree on, allowing me to stay sane).

  • For Proof of Concept work I use ORMs and the code first approach. That saves a lot of time and effort, and the code will be a throwaway anyway. My ORM of choice is Entity Framework but it doesn’t really matter.
  • I don’t spend much time thinking about data types. In most cases string defaulting to nvarchar(255) is good enough for a PoC.
  • I prefer to use EF Core as it supports in memory storage for even faster PoC development and testing.
  • Just in case it is not thrown away (as it should), I keep my architecture clean. I make sure to use specific repositories for data access, and that the repository abstraction is not leaking any implementation details. A repository takes and returns business objects and is using ORM framework internally only.
  • On projects which will not be thrown away I start with Dapper (a micro ORM) and stored procedures. It is a bit more work but forces me to design the data structures better, and offers a lot benefits for the future (more about it later in this post).
  • While I agree that logic should be in one place, there are different types of logic, and those should be implemented independently. There is UI Logic, there is Business Logic and there is Persistence Logic which I implement in a repository or in stored procedures. A good example would be a soft delete functionality.
  • All SQL code is kept in the same solution, is tested and deployed through the normal CI/CD channels using DbUp project.

So my answer is

Use ORMs as long as they work for you, but architect your code in such a way, that you don’t depend on them, and ready to ditch them when they start to cause more problems then they solve. Consider micro ORMs. Try Dapper.

Here are a few more benefits of using Dapper with stored procedures

  • Dapper has much smaller footprint than NHibernate or Entity Framwork.
  • Dapper is faster, almost as fast as a DataReader, when compared to full ORM frameworks. According to this at the moment of writing this post it is 10 times faster than NHibernate.
  • While being small and fast Dapper still takes the ORM problem away.
  • Stored procedures add some extra code that needs to be written, but allow access to the latest database engine features. In case on SQL Server those can be Hekaton (In-Memory OLTP), JSON or XML data types, graph structures, temporal tables, windowing functions and much more.
  • Stored procedures make performance troubleshooting and reviews much easier. For DBAs it is much easier to understand which part of an application creates the load and therefore what it is trying to do with well named stored procedure rather than a lot of auto generated SQL statements.
  • The cooperation with Database Developers is much easier, as they can easily identify queries that need to be optimised, and then improve them without worrying (too much) about any non SQL code.
  • Even if you don’t have DBAs and DBDs just now, you might in the future. If the business is successful it might be that you suddenly need to get somebody with those skills to help you. Having good structure, with a separated data layer will make their life easier, your bill lower and everybody happier.

Hashing values in SQL Server

TL;DR

  • SQL Server can hash values using some of the common hashing algorithms like MD or SHA.
  • It is possible to use XQuery in addition to XPath in XML value() function to do things T-SQL cannot do on its own.

The Details

Hash values or (hash codes) is what we typically use to store_passwords in databases. We use salt values too. Definitely, we don’t store clear text passwords. Right?

Typically during a login to an application, a password is combined with a salt value stored somewhere in a database and a hash value is calculated which is then compared with the hash value previously stored in a database. But there are other options. It is possible to calculate hash values directly in a database using T-SQL. It could be useful if a bulk update needs to be performed if you want to generate a lot of test users with predefined passwords during a database migration, but it is also possible to overwrite a password hash and gain access to the application as any user.

SQL Server (starting with 2008) has hashbytes function which can be used to calculate hashes using a number of different algorithms. The algorithms supported differ from version to version. The latest 2017 supports MD2, MD4, MD5, SHA, SHA1, SHA2_256, SHA2_512.

Let’s have a look at how to use it

declare @password nvarchar(32) = 'Secret1234'

select 'MD5' Algorithm, hashbytes('md5', @password) Hash
union all select 'SHA' ,hashbytes('sha', @password)
union all select 'SHA2_256' ,hashbytes('sha2_256', @password)

which produces something like this

Hashing results

You can see that the results are of varbinary type. That’s OK if your application is storing the hash in this format, but from my experiance, most developers will not know that a byte array can be stored in a database and they will convert it into a Base64 string.

SQL Server does not do Base64 encoding (not as far as I know) but it does support XML and XQuery, and they do encoding. So let’s use it.

declare @password nvarchar(32) = 'Secret1234'

select Algorithm
,convert(xml, N'').value('
xs:base64Binary(xs:hexBinary(sql:column("Hash")))', 'varchar(max)'
) Base64Hash
from (
select 'MD5' Algorithm, hashbytes('md5', @password) Hash
union all select 'SHA' ,hashbytes('sha', @password)
union all select 'SHA2_256' ,hashbytes('sha2_256', @password)
) t

Here I’m converting an empty string N'' to an XML type which creates an empty XML object which allows me to use it’s value() method to execute XQuery which is a functional language. In most examples of XML in SQL Server the only thing you will see is XPath in value() but XQuery can be used too.

The results are more what you’d expect:

Hashes to Base64

One thing to note. The hashing algorithms operator on bytes so are not only case sensitive but type sensitive too.

declare @password nvarchar(32) = 'Secret1234'

select
'varchar' Type
,hashbytes('md5', convert(varchar(32), @password)) Hash
union all select
'nvarchar'
,hashbytes('md5', convert(nvarchar(32), @password))

Have a look at the results. Only because type used is different, the hash is different too.

Hash values using different types

If you want to make it work with .Net make sure to use nvarchar as in .Net strings are Unicode.

How can it be useful?

  • It is possible to push hash calculation and comparison to database which means the correct hash and salt value are never loaded to the application memory.
  • It is possible to generate hashes for test users in database seeding scripts avoiding application doing row by row processing.
  • It is possible to generate hashes of objects to detect changes (although the checksum function can good enough and faster too).
  • It is possible to use XQuery to do things T-SQL cannot do.

But it brings some risk too
* It is possible to gain access to an account by updating values in a database even when using hashing with salt.

Backing SQL on Linux to Windows Share

How to back up a database from SQL Server on Linux (perhaps in a Docker container) to a Windows Share already on the network?

If you want to know how to run SQL Server on Linux in a Docker container read my earlier post.

Container run-time privileges and Linux capabilities

Before we start it is important to note that by default the containers are run with very limited privileges and cannot do much. That’s by design, to make them more secure. However, that prevents them from being able to use CIFS protocol to mount to an SMB share, exactly what we are trying to do. If you just follow the steps in the next sections when you try to mount the share you will get

Unable to apply new capability set.

To avoid this problem the container needs to be created with two extra capabilities: SYS_ADMIN and DAC_READ_SEARCH. You can read more about it on the here. But to make the long story short you need to add two --cap-add parameters to your docker run command.

The full command from my earlier post becomes:

docker run -d `
   -e ACCEPT_EULA=Y -e SA_PASSWORD=Secret1234 `
   -p 14333:1433 --name sql1 `
   --cap-add SYS_ADMIN --cap-add DAC_READ_SEARCH ` 
   mcr.microsoft.com/mssql/server:latest

Create a container with those capabilities before continuing.

The Naive Approach

Let’s assume you have a network share already available and it is accessible using UNC \\FileShare1\SqlBackups\. Being used to Windows networking one would expect to simply take a backup like so, for example:

backup database [TestDb] 
   to disk='\\FileShare1\SqlBackups\TestDB.bak'

The command completes with no errors and yet there is no TestDB.bak file in \\FileShare1\SqlBackups. Stranger still it is possible to restore from that file which doesn’t seem to exist. Try:

restore filelistonly 
   from disk='\\FileShare1\SqlBackups\TestDB.bak'

It works, because Linux translates the UNC it knows nothing about (after all it’s a Windows thing) to a local file /FileShare/SqlBackups/TestDB.bak. If you remote to the container (or execute bash on it with docker exec -it sql bash) you will find /FileShare1/SqlBackups/TestDB.bak file.

Interesting, but not what we expected.

Getting into SMB and CIFS

To solve this problem a network share needs to be mounted to a node in the Linux file system. There are two ways to do it, one is temporary using the mount command only, or a more permanent involving editing the fstab file and then using mount. But before we can do it, one problem has to be solved first. The Windows file server shares the folder using the SMB protocol which is gibberish to Linux. Luckily Linux can be taught to speak CIFS protocol which is compatible with SMB. The way to do it is to install the cifs-utils package. On Ubuntu (SQL Server vNext in a container runs on Ubuntu so I use it as an example) is to install the package with apt-get. Other flavours of the OS will use their own package managers, but overall the process is the same.

First, you will have to Connect to SQL Server on Linux.

Then the CIFS protocol utilities have to be installed using apt-get on Ubuntu.

apt-get update
apt-get install cifs-utils

You will be asked if you want to really do it as it will take an extra 41 MB of disk space. Press Y to agree.

After this operation, 41.2 MB of additional disk space will be used.
Do you want to continue? [Y/n]

Now a directory where the backups will be mounted needs to be created. Typically the external mounts are all in /mnt. So let’s create a backups directory there.

mkdir /mnt/backups

Now it is time to mount.

mount -t cifs //FileShare/SqlBackups /mnt/backups \
   -o username=yourusername,domain=yourdomain,file_mode=0777,dir_mode=0777,rw,sec=ntlm

You will be asked for a password and if everything is correct you will get prompt again which means everything went well. It is important to note the change from \ to /.

It should not be possible to navigate to that directory and create a test text file.

cd /mnt/backups
touch test.txt

With that, the test.txt file should be now visible on \\FileShare1\Backups.

And to take the backup to that share becomes no different to taking any other backup to disk:

backup database [TestDb] 
   to disk='/mnt/backups/testdb.bak'

Using Shares to Interact with the Host

It is also possible to use this approach instead of using -v or --mount to share the file system between a container and its host. It is arguably a bit more effort but it allows to easily share backup and restore location between multiple containers without read/write issues associated with folder binding.

Connect to SQL Server on Linux

How to connect to SQL Server on Linux?
It really depends on what one means by ‘connect’.

Query SQL Server on Linux from Windows

To connect to the SQL Server instance running on Linux directly or in a container to run a query it is no different from connecting to one running on Windows. Just connect using the DNS name or an IP and a port number if it is not the default 1433. The only difference is that only SQL Authentication is supported so you will not be able to use Windows Authentication and your domain credentials.

Query SQL Server from Linux

Azure Data Studio (formerly known as SQL Server Operations Studio) can run on Linux offering the same user experience as on Windows. It is also possible to use the sqlcmd which works exactly the same as the one on windows. To install it on Ubuntu or any Linux flavour using apt-get package manager you can simply do

apt-get install mssql-tools

End then just use it the same way as you would normally do

sqlcmd -S localhost -U sa -P mysecretpassword

In the default SQL Server Docker image the mssql-tools are already installer but they are not added to the $PATH variable. You can either add it, or use the full path to execute it which changes the above command to

/opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P mysecretpassword

Remote to Linux remote server

When managing a SQL Server instance it is sometimes necessary to connect to the operating system on which it runs. This is very different on Linux to running SQL Server on Windows. There is no remote desktop, and PowerShell doesn’t really work either. That’s where the typical Linux admin tools become necessary.

In a way the Secure Shell or SSH is the Linux equivalent of the Remote Desktop. There are many ways to do it, especially from another Linux box. Most of us, SQL DBAs, will be typically on a Windows machine. In that case a common approach is to use PuTTy. But if you use PowerShell and keep up with updates it is now possible to SSH directly for a PowerShell terminal too.

ssh mysqlonlinux.mydomain.com

or

ssh 10.11.12.13

Simple as that. You will connect with a specific user and now most of the admin commands will have to start with sudo which allows to execute them with elevated permissions. It’s similar to Windows’ Run as Administrator.

Connect to Docker container from the host

If you want to connect to a container running your SQL Server on Linux from the host use docker exec to start a bash shell.

docker exec -it sql1 bash

where sql1 is the name of the container hosting SQL Server.

You will connect with superuser permissions and sudo will not be necessary. The prompt will look something like this:

root@2ff21ad83800:/#

SSH into a container running SQL Server on Linux

If you cannot or don’t want to remote onto the host first to connect to the container running your SQL Server instance SSH is the way to go, again. By default, the SQL Server image does not have SSH server so you will have to install it or a custom image will have to be created. Once that is done it is no different from connecting with the ssh (or PuTTy) to any other Linux server. It doesn’t matter that it runs in a container.

Using SQL Server on Docker (1)

It is yesterday’s news. SQL Server runs on Linux, and on Docker too. There are plenty of blog posts showing how to install, start and connect to it too.

During last Data Relay, I have seen Mark Pryce-Maher‘s talk on SQL Server on Linux (twice). It was a good talk, but it followed the same pattern so not surprisingly a common question from the audience appeared to be ‘why would you want to do it?’.

I mean, the biggest claim to fame appears to be that SQL Server actually works on Linux. It doesn’t bring any new features. In fact, some features are missing. A good part of docker demos appears to be an exercise in futility too. One starts a container with SQL Server, creates a database, inserts some data then the container restarts and as if by magic the data is gone, the database never existed. So, indeed, why would you want to run an SQL Server on Docker?

This post is my attempt to address this question by showing how I use SQL Server on Docker and how I deal with the persistence problem.

First: Get it up and running!

If you want to know how to run and connect to SQL Server on Docker read the Microsoft’s Quickstart Document, or Andrew Pruski’s blog post about running vNext on Docker.

Making the long story short, assuming you are running Windows, have Docker installed, configured to run Linux containers and that you don’t have an SQL Server instance locally installed:

docker pull mcr.microsoft.com/mssql/server:latest
docker run -d -e ACCEPT_EULA=Y -e SA_PASSWORD=Secret1234 -p 1433:1433 --name sql mcr.microsoft.com/mssql/server:latest

That’s all you need to do to have the latest production SQL Server (Developer Edition) instance running on localhost on port 1433.

Creating a database that survives container restarts

If you have the container from the previous example running let’s stop and remove it first.

docker stop sql
docker rm sql

There is a number of ways to persist container data. For my purposes the simples appears to be docker volumes. You can read more about docker persistence on the official documentation site but for now let’s just create a local volume called sqlvolume

docker volume create sqlvolume

and mount that volume on the container

docker run -d -e ACCEPT_EULA=Y -e SA_PASSWORD=Secret1234 -p 1433:1433 --mount source=sqlvolume,target=/data --name sql mcr.microsoft.com/mssql/server:latest

Now, if you create a database with files on the mounted volume like so:

create database [TestDb]
on primary (name = N'TestDb', filename = N'/Data/TestDb.mdf')
log on (name = N'TestDb_log', filename = N'/Data/TestDb.ldf')

and you stop and start the container

docker stop sql
docker start sql

as if by magic, the TestDb database survived the restart.

Why I find it useful

So with some considerable effort, I was able to create a database that survives a restart of a container or that of a host system. Nothing a standard SQL Server instance running on windows couldn’t do! So why do I do it?

I write my blog posts on my laptop and I don’t want to commit to installing SQL Server on it. I want to be able to try out vNext while still having the ability to run demos against 2017 and I’m definitely not installing two instances on my laptop. Luckily I don’t have to. With the above setup all I have to do is run docker start sql and within seconds I have a local instance of SQL Server ready to serve my queries. When I’m done I do docker stop sql and it is as if the SQL Server was never there.

Performance of JSON in SQL Server

Have you heard claims that in SQL Server the new JSON data support performs 10 times better than the old XML storage? And all that despite using nvarchar type for storage instead of the specialised xml type. Claims like this one on blogs.msdn.microsoft.com. I used to do a lot of XML in SQL Server and now I use JSON too so I decided to check how they compare when it comes to performance in SQL Server (2017).

Setting the Scene

XML Data Type has been supported by SQL Server since version 2005. It was implemented as a new built-in data type. It is possible to define a column as an XML type with, or without schema and then query the non-relational data together with relational data. It came with for xml clause and three main methods: modify, query and value defined on the XML type.

Let’s see an example of retrieving a value of a node from an XML variable:

declare @xml xml = N'<Test>123</Test>';
select @xml.value(N'(//Test/text())[1]', N'int');

JSON support was introduced in SQL Server 2016. That’s right, JSON support but not a JSON type. The data is stored as plain text, something that was possible before of course, but there is for json clause and 4 methods: isjson, json_modify, json_query and json_value.

An example of retrieving value from a JSON document can look something like this:

declare @json nvarchar(max) = N'{ "Test": "123" }';
select json_value(@json, '$.Test');

It would be difficult to compare performance of every single usecase so I have decided to focus on comparing single value retrieval from a document using .value and json_value for XML and JSON respectively.

Sample XML document

<test>
   <row key="ValueA" value="123" />
   <row key="ValueB" value="-432" />
   ...
   <row key="ValueZ" value="849" />
</test>

Sample JSON document

{
   "ValueA": "123",
   "ValueB": "-432",
   ...
   "ValueZ": 849"
}

I have tables created where there are only two columns, an integer primary key Id and a Data column of the test type. The tables are populated with 100,000 random records. The test consists of simply calculating a sum of a specific property from the document across all the rows in the table. To put things in perspective I will compare performance on the document types to aggregate on an int and an varchar type columns.

Test XML query

select sum(
   [Data].value('(//test/row[@key="ValueA"]/@value)[1]', 'int')
) 
from dbo.XmlData

Test JSON query

select sum(
   convert(int, json_value([Data], '$.ValueA'))
) 
from dbo.JsonData

There are no indexes or computed columns to help with the performance of the queries as the objective is to simply measure performance of parsing and document querying.

First Comparison Results

Json in SQL Server Performance Comparison

That can’t be right!? Can it? JSON query took on average of 0.43 seconds which is 36 times faster than the 16.79 seconds it took to sum the values out of the XML documents. Better (or worse) still when compared to an XML with schema which took 40.47 seconds the JSON type seems to be 94 times faster. So what happened to the 10 times faster claim? Well, the native VARCHAR column performed 10 times faster at 0.04 seconds than the JSON one. the INT column was still faster at 0.03 seconds.

It is also worth looking at the logical reads. Native INT and VARCHAR columns required just over 200 logical reads while both JSON and schema-less XML took around 12,000 logical reads, while the XML with schema required over 34,000.

Playing with storage types

The above were rather unexpected results. Both in terms of CPU and the logical reads. And that got me thinking, what if I could reduce the number of reads? The simplest way to do so would be to reduce the size of data by changing the types. I have created additional tables with the exact copy of the JSON test records but using VARCHAR(max), NVARCHAR(512) and VARCHAR(512) types in addition to the standard (as in most common in all the examples I have seen) NVARCHAR(max).

Performance of JSON data with different storage types

Another surprise! As expected using VARCHAR instead of unicode NVARCHAR halved the number of logical reads but that didn’t translate to improved CPU times. Using json_value on VARCHAR(MAX) was significantly (and consistently) worse than the same method on NVARCHAR(MAX). Specific maximum length types performed better than the the types with the max length but there too unicode required less CPU time despite having twice as many page reads.

At the beginning or at the end

So is it really possible that the old XML type is 36 times worse than on-the-fly JSON parsing? Perhaps there are some benefits to XML storage? I’d imagine the JSON parser is clever enough to stop parsing after the first matching node is found which probably means that the performance will depend on weather we are looking for a value of a property close to the beginning or close to the end of the document? And, perhaps, that’s where XML will perform better being pre-parsed type? I have modified the tests to look at the first and last values from the documents.

Performance depending on value position in a document

Here the results were more as expected. When using json_value on a NVARCHAR(max) data ValueZ being at the end of the document structure performed worse than ValueA which is at the beginning. In fact ValueZ queries took 10 times longer than ValueA. XML still has shown differences but ValueZ queries took only 30% longer than the ValueA ones. Still worst case JSON query was 4 times faster than the best case XML query!

Storage

So that shows that JSON on unicode types uses less CPU time but requires more logical reads which suggests more pressure on RAM and potentially more I/O. How does the storage requirements of all those types compare?

Data storage comparison

XML with Schema is the most expensive in terms of storage, by quite some margin. Despite being visually more verbose the XML type actually takes slightly less disk space than the unicode json which is twice as much as non-unicode json. As one would expect the standard INT columns require least.

Conclusions

JSON data in SQL Server 2017 really is faster than the old XML type. How much faster will depend on the position of the specific value in the document. That’s important, the structure and ordering of properties in the document matters. If you are in control of the JSON document structure consider putting properties more likely to be queried at the beginning of the document and those you know you will index at the end so that you don’t have to parse through them when working with properties which are not indexed. Types matter too. NVARCHAR is faster than VARCHAR but will use twice logical reads (and therefor RAM). Specific lengths perform better than using (max).