SQL Server 2016 : JSON Support

I was at MS Ignite last week, and attended the Foundational Keynote, entitled "The SQL Server Evolution." In that session they announced a new feature that will be available in the next version of SQL Server (and that you will get to play with in some capacity when CTP2 is released later this year): JSON.

Yes, you read that right: JSON.

I can't get into too many technical details just yet, and I'm not even sure how much functionality will actually be available in the forthcoming CTP, but I can share a little bit from what I've been able to play with so far. And I want to do that because, even though I don't have a direct need for JSON support myself, I've heard that as an argument against SQL Server a number of times (though it is often difficult to gauge how serious people are when they say that).

First of all, this is going to be standard functionality available in all editions, even Express. There is not going to be a JSON data type or any special kind of index; you will just store it as a traditional string (or extract relational data as a JSON string). But there will be extensions to T-SQL to support moving JSON data in and out of relational forms, much like we already have for XML: FOR JSON PATH, FOR JSON AUTO, ISJSON(), JSON_QUERY(), JSON_VALUE(), JSON_MODIFY(), and OPENJSON(). It will be case-sensitive and collation-unaware, there will be no strict schema support, and it will be substantially less elaborate than, say, PostgreSQL's JSON support. (Just remember that this is a V1 and, like many other V1 features, should get better as it matures.)
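
To give a flavor of those functions, here is a minimal sketch of a few of them (the sample JSON document is invented, and the syntax may well change before release):

DECLARE @json NVARCHAR(MAX) = N'{"Order":{"Qty":5,"Item":"widget"}}';

SELECT IsValid  = ISJSON(@json),                    -- 1 when the string is valid JSON
       Qty      = JSON_VALUE(@json, '$.Order.Qty'), -- extract a scalar value by path
       OrderObj = JSON_QUERY(@json, '$.Order');     -- extract an object or array by path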

JSON will be supported in Hekaton In-Memory OLTP (even in natively compiled stored procedures) right out of the gate, though when used there you will have the same no-LOB and 8,060-byte limits we've grown to know and love. Compression, encryption, and both types of Columnstore indexes will be supported. And you'll be able to persist and index computed columns based on values extracted from the stored JSON string. Almost all data types are supported, with the exception of complex CLR types (geometry, geography, and any custom CLR types you use). As a first glimpse of the syntax, and a demonstration of this limitation:

DECLARE @g GEOGRAPHY;
SELECT g = @g FOR JSON PATH;

Results:

Msg 13604, Level 16, State 1
FOR JSON does not support CLR types – cast CLR types explicitly into one of the supported types in FOR JSON queries.

However, this does not apply to all CLR types; hierarchyid, for example, works fine:

DECLARE @h HIERARCHYID = '/1/2/';
SELECT [My_first_JSON_query] = @h FOR JSON PATH;

Results (note that forward slashes are escaped using a backslash, and that the output column is given a new alias – "JSON_" with a GUID suffix):

JSON_F52E2B61-18A1-11d1-B105-00805F49916B
----------------------------------------------------------
{"My_first_JSON_query":"\/1\/2\/"}

Other characters are escaped as well: quotes, tabs, carriage returns, line feeds, and backslashes:

DECLARE @x NVARCHAR(256) = N'tab: ' + CHAR(9)
 + N'; cr: ' + CHAR(13)
 + N'; lf: ' + CHAR(10)
 + N'; forward slash: /'
 + N'; back: \'
 + N'; quote: "';
 
SELECT x = @x FOR JSON PATH;

Result:

JSON_F52E2B61-18A1-11d1-B105-00805F49916B
----------------------------------------------------------
{"x":"tab: \t; cr: \r; lf: \n; forward slash: \/; back: \\; quote: \""}

Your columns have to have an alias. If they don't:

SELECT 1, 'foo' FOR JSON PATH;

Result:

Msg 13605, Level 16, State 1
Unnamed tables cannot be used as JSON identifiers as well as unnamed columns cannot be used as key names. Add alias to the unnamed column/table.
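
Adding aliases resolves this; based on the behavior shown above, the output should look something like the following:

SELECT a = 1, b = 'foo' FOR JSON PATH;

-- {"a":1,"b":"foo"}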

You need to use a table structure of some kind to use FOR JSON AUTO. If you try:

SELECT f = 1 FOR JSON AUTO;

You'll get this:

Msg 13600, Level 16, State 1
FOR JSON AUTO requires at least one table for generating JSON objects. Use FOR JSON PATH or add a FROM clause with a table name.

This works:

SELECT f FROM (SELECT 'foo') AS x(f) FOR JSON AUTO;

Result:

JSON_8A6CFC08-C003-4E8F-A010-EF1E3AA7C16F
----------------------------------------------------------
[{"f":"foo"}]

You can't use SELECT INTO with either method.
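
For example, an attempt like this (the temp table name is just for illustration):

SELECT name INTO #dbs FROM sys.databases FOR JSON PATH;

Result: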

Msg 13602, Level 16, State 1
The FOR JSON clause is not allowed in a SELECT INTO statement.

There are some quirks involving string lengths – it seems that if your JSON string (including delimiters) is longer than 2033 characters, it will get spread over multiple rows in the output.

DECLARE @t TABLE(x NVARCHAR(4000));
INSERT @t(x) SELECT REPLICATE(N'a',2023); -- short enough that the JSON output stays within 2033 characters
SELECT x FROM @t FOR JSON PATH;

DECLARE @u TABLE(y NVARCHAR(4000));
INSERT @u(y) SELECT REPLICATE(N'b',2027); -- pushes the JSON output past 2033 characters
SELECT y FROM @u FOR JSON PATH;

Results:

(Screenshot: the first query's JSON string comes back in a single row, while the second is split across two rows.)

I'm not sure if that's a symptom of very early builds or how it is intended to work; if the latter, your consuming applications will need to account for the way the data is returned.
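
In the meantime, one workaround, suggested by Matt Psaltis in the comments below, is to wrap the query in an outer SELECT so that the entire JSON string comes back as a single scalar value; a minimal sketch, reusing the table variable from above:

-- Assigning the FOR JSON subquery to a single column avoids the row splitting:
SELECT Query = (SELECT x FROM @t FOR JSON PATH);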


Okay, let's get serious

Now, how about some tangible examples of extracting JSON strings from existing relational data? The whole point of embracing JSON is to provide an additional data exchange format in and out of SQL Server – whether you are interfacing with existing applications, other platforms, or even other Microsoft offerings (like DocumentDB). So, naturally, there are methods for turning relational data into JSON output, and for turning JSON data into a relational format. This is how the Microsoft presentation illustrated it, on a slide entitled "Data exchange with JSON" (just ignore the Customer vs. AccountNumber discrepancy):

(Slide: "Data exchange with JSON" – FOR JSON turns rowsets into JSON text, and OPENJSON turns JSON text back into rowsets.)

(Note that OPENJSON() is not available in current builds, so I'm going to focus on the top half of that slide – extracting relational data into JSON format. Also note that in current builds there are no carriage returns or indenting, so I'll be manually making the output of these examples a little more readable – these may not look exactly the same when you are able to try them yourself.)
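
(For reference, the announced shape of OPENJSON() looks something like the sketch below; since the function isn't in current builds I have no way to verify it, so treat both the syntax and the sample document as assumptions.)

DECLARE @json NVARCHAR(MAX) =
  N'[{"name":"master","database_id":1},{"name":"tempdb","database_id":2}]';

SELECT name, database_id
FROM OPENJSON(@json)
WITH (name NVARCHAR(128) '$.name', database_id INT '$.database_id');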

Let's take a very simple example, like a few columns from sys.databases. A normal query would look like this:

SELECT TOP (2) 
  name, database_id, source_database_id, create_date 
FROM sys.databases 
ORDER BY database_id;

Output:

name    database_id   source_database_id   create_date
------  -----------   ------------------   -----------------------
master  1             NULL                 2003-04-08 09:13:36.390
tempdb  2             NULL                 2015-04-30 09:00:57.587

As a refresher, and because some of the methodologies are surprisingly similar, let's look at a couple of different ways to output this data using existing XML methods:

SELECT TOP (2) 
  name, database_id, source_database_id, create_date 
FROM sys.databases 
ORDER BY database_id
FOR XML AUTO, ROOT('Databases');
 
SELECT TOP (2) 
  name, database_id, source_database_id, create_date 
FROM sys.databases 
ORDER BY database_id
FOR XML PATH('Database'), ROOT('Databases');

The following two XML fragments are rendered:

<Databases>
  <sys.databases name="master" database_id="1" create_date="2003-04-08T09:13:36.390" />
  <sys.databases name="tempdb" database_id="2" create_date="2015-04-30T09:00:57.587" />
</Databases>
 
<Databases>
  <Database>
    <name>master</name>
    <database_id>1</database_id>
    <create_date>2003-04-08T09:13:36.390</create_date>
  </Database>
  <Database>
    <name>tempdb</name>
    <database_id>2</database_id>
    <create_date>2015-04-30T09:00:57.587</create_date>
  </Database>
</Databases>

Notice that source_database_id is not present in any of the output. I included that column on purpose, to make you think about what happens to NULL values (which may or may not be desirable, depending on how the output is being used). One way to force them to be present is to add ELEMENTS XSINIL to the query:

SELECT TOP (2) 
  name, database_id, source_database_id, create_date 
FROM sys.databases 
ORDER BY database_id
FOR XML AUTO, ROOT('Databases'), ELEMENTS XSINIL;

This time the XML output is:

<Databases xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <sys.databases>
    <name>master</name>
    <database_id>1</database_id>
    <source_database_id xsi:nil="true" />
    <create_date>2003-04-08T09:13:36.390</create_date>
  </sys.databases>
  <sys.databases>
    <name>tempdb</name>
    <database_id>2</database_id>
    <source_database_id xsi:nil="true" />
    <create_date>2015-04-30T09:00:57.587</create_date>
  </sys.databases>
</Databases>

The source_database_id "data" is now included as an empty element with xsi:nil="true", which is one way for the consumer of this data to know that the entity exists but in this case did not contain any data.

Now, let's look at a couple of very similar queries using JSON:

SELECT TOP (2) 
  name, database_id, source_database_id, create_date 
FROM sys.databases 
ORDER BY database_id
FOR JSON AUTO;
 
SELECT TOP (2) 
  name, database_id, source_database_id, create_date 
FROM sys.databases 
ORDER BY database_id
FOR JSON PATH, ROOT('Databases');

The first query (FOR JSON AUTO) produces this JSON string:

JSON_F52E2B61-18A1-11d1-B105-00805F49916B
----------------------------------------------------------
[
  {
     "name":"master",
     "database_id":1,
     "create_date":"2003-04-08T09:13:36.390"
  },
  {
     "name":"tempdb",
     "database_id":2,
     "create_date":"2015-04-30T09:00:57.587"
  }
]

The second one, with FOR JSON PATH, ROOT('Databases'), produces the following, very similar output:

JSON_F52E2B61-18A1-11d1-B105-00805F49916B
----------------------------------------------------------
{
  "Databases":
  [
    {
      "name":"master",
      "database_id":1,
      "create_date":"2003-04-08T09:13:36.390"
    },
    {
      "name":"tempdb",
      "database_id":2,
      "create_date":"2015-04-30T09:00:57.587"
    }
  ]
}

Here, again, source_database_id is not included in the output. With JSON it is much easier to include NULL values in the output, using the INCLUDE_NULL_VALUES option. Adding that to the first example:

SELECT TOP (2) 
  name, database_id, source_database_id, create_date 
FROM sys.databases 
ORDER BY database_id
FOR JSON AUTO, INCLUDE_NULL_VALUES;

Will produce this output:

JSON_F52E2B61-18A1-11d1-B105-00805F49916B
----------------------------------------------------------
[
  {
     "name":"master",
     "database_id":1,
     "source_database_id":null,
     "create_date":"2003-04-08T09:13:36.390"
  },
  {
     "name":"tempdb",
     "database_id":2,
     "source_database_id":null,
     "create_date":"2015-04-30T09:00:57.587"
  }
]

A few other things that have been discussed revolve around how to integrate JSON data with relational data structures. One way is to store JSON strings in NVARCHAR columns, and then there are extensions to make that more useful. For example, you can have a check constraint that ensures the contents are a valid JSON document, using ISJSON(); you can also extract values into computed columns using JSON_VALUE(). The example they gave last week:

CREATE TABLE dbo.Orders
(
  OrderID INT PRIMARY KEY,
  OrderDetails NVARCHAR(4000),
  CONSTRAINT chk_OrderDetails_IsJSON CHECK (ISJSON(OrderDetails)=1),
  Quantity AS (CONVERT(INT, JSON_VALUE(OrderDetails, '$.Order.Qty')))
);
 
-- you could even index the computed column:
CREATE INDEX ix_Ord_Q ON dbo.Orders(Quantity);
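
A quick usage sketch against that table (the values are invented for illustration):

INSERT dbo.Orders (OrderID, OrderDetails)
VALUES (1, N'{"Order":{"Qty":10,"Item":"widget"}}');

-- The JSON value surfaces through the computed column, which the new index can support:
SELECT OrderID FROM dbo.Orders WHERE Quantity > 5;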


There is a lot more to it than this, but unfortunately this is about all I can share right now. The capabilities will be much more substantial later this year, and I will revisit with more extensive and tangible examples soon (with data exchange in both directions, I promise). You can also check out Jovan Popovic's blog post for more details.

Until then, happy JSONing!

Comments (10)

      • Barbara Smith says:

        Hi Aaron, thanks for this article. I understood some parts but not everything; does anybody know of a good executive overview? I am just starting to get into programming, so the code parts confused me a bit :)

      • Olive Williams says:

        Fantastic resources, Aaron! And excellent article!

      • Matt Psaltis says:

        Just wanted to add a quick comment to say thank you for a fantastic overview of these new features. I ran into the 2033 row splitting problem a couple of weeks ago and found that by wrapping the SQL queries in an additional select statement, it avoided this problem. E.g. SELECT (SELECT * FROM Table FOR JSON AUTO) As Query – Using this approach, I end up with a single row result set consisting of a single column named 'Query'.

        This approach seemed to handle several million records in a single result, deserializing the subsequent JSON string was another story entirely…

        Thanks again!

      • Aaron Bertrand says:

        Matt, that's interesting about the row splitting, thanks for the info!

      • John Corbett says:

        Matt/Aaron

        I am having this problem – the query takes a new line after 2033 characters rather than after each row is processed. As a result, the output will not work with my JavaScript app.

        Wrapping it in an additional select statement just means it stops after 2033 characters. Any ideas?

        I feel this should be relatively simple!

        Cheers
        John

      • John M. says:

        Matt, Thank you!!

        I spent some hours trying to figure out what was wrong with my query. In my case, for some reason, I was getting only one row – having an incomplete and invalid JSON string as a result.

        Do you have any idea the reason why I wasn't getting multiple rows?

        I think that your approach should be included in this article – it's simple and very helpful.

        Regards,

      • Riley Major says:

        Thanks for these details.

        As a minor note: your special characters example SQL and results don't match up. The results display has the characters separated out more than the SQL would generate.

      • Aaron Bertrand says:

        Hi Riley, thanks, I tried to make it obvious that I manipulated the output solely for readability:

        > so I'll be manually making the output of these examples a little more readable

      • Russ Thomas says:

        Very thorough, thanks for the report from the front. I say, awesome. There is a lot to like about JSON (especially when compared to XML): it's way less wordy, way easier to code against, and the de facto standard in modern disconnected communications. Completely aside from adding extended T-SQL to work with JSON data, I wish they'd allow their own internal semi-structured data (deadlocks, xevents, query plans, etc.) to be stored as JSON. Just think how much more data could fit in the same amount of space. Traces and event logs are already major disk hogs when special care isn't taken with how much is captured. I realize wholesale replacement doesn't really seem feasible, but if it were something configurable, where you could specify one format or the other for the internal processes to use, that would be cool.
