Practical .NET

Converting JSON Objects to Relational Tables, Part 2: Creating Related Tables

SQL Server's OpenJson function will let you dismantle JSON structures into relational tables, including tables with foreign/primary key relationships.

As I noted in an earlier column, JSON is great for moving data around but not so hot when you need to convert it into data stored in a relational database. In that earlier column, I looked at why you might want to do that and handled the simple case: Converting single-value properties in a JSON object into a row in a table (or multiple rows where you have multiple objects).

To do that, I used SQL Server's OpenJson function that I discussed in an earlier post but addressing specific real-world JSON object as my example:

Declare @json NVarChar(MAX) = N'
{
 {
  "customers" : 
  [
   {"id" : "A123",
    "createdOn" : "2021-05-19T15:10:48.852073Z",
    "salesToDate" : {
          "amount" : "2040.15",
          "currency" : "USD"
    },
    "addresses" : [
        {
            "type" : "Billing",
            "street" : "500 Ridout St."
        },
        {
            "type" : "Shipping",
            "street" : "300 Strathcona"
        }
    ]       
  },
  …more customer objects…
]}';

This post is going to look at moving the values in the salesToDate property and the array in the addresses property into separate tables. As I discussed in my previous post, my first step is to strip the array of customer objects out of the customers property that enclosed it and stuff it into a string variable (declared as NVarChar(Max) because the OpenJson function prefers that for any JSON-related processing):

Declare @CustomerHold nvarchar(MAX);
Select @CustomerHold = value from OpenJson(@json);

Creating a Related Table from a Simple Object
To move the salesToDate data into a different table, the code isn't much different from the previous post where I flattened the properties values into my main table. To move salesToDate into a separate table, all I need to do is define a Table variable, pick the salesToDate property out of the JSON object, and include the customer's id property in the new table so that I can link the tables together.

In my production table, I have two choices on how I could define that table: I could have the id column function as both a foreign and primary key pointing back to the Customers table I created in the previous post. Alternatively, I could set up an auto-generated primary key column in my salesToDate table and make the id column a foreign key column.

Here's the statement that creates my separate salesToDate Table variable and loads it with a row for each JSON object in my CustomerHold variable (I use the JSON path statements to map the amount and currency properties of salesToDate to my amount and currency columns):

Declare @tSales Table (id nvarchar(100), amount decimal(7,2), currency nvarchar(3) );
Insert into @tSales
  Select * From
    OpenJson(@CustomerHold) with (id nvarchar(100), 
                                  amount decimal(7,2) '$.salesToDate.amount', 
                                  currency nvarchar(3) '$.salesToDate.currency');

A more common scenario is to convert a JSON array into multiple rows in a related table. In my sample JSON object, my addresses property is a good candidate to be converted into rows in a new table with one row for each id/address combination.

It's certainly possible that I can do that conversion in one step but, for clarity's sake, I'll do it in two. First, I create a staging table that will have a row for each customer to hold the customer id with all the JSON from its corresponding addresses property. The following statement creates that table and, again, because the address column will be holding JSON, I declare the column as NVarChar(Max) (in a production system I might use a temporary table as this staging table):

Declare @tAddressesHold Table (id nvarchar(100), addr nvarchar(MAX));

Now I load my staging table with a row for each customer. To ensure that I can process the addresses property as JSON, I flag the address column with the As Json clause and, again, declare the column in the With class as NVarChar(Max). Here's the statement loads my staging table:

Insert into @tAddressesHold
  Select * From
    OpenJson(@CustomerHold) with (id nvarchar(100), address nvarchar(max) '$.addresses' As Json);

Now I'm ready to create create multiple rows out of each one of the rows in my staging table. Again, the first step is to prepare a Table variable to hold an id and the information for an address (I've broken the address out into type and street columns):

Declare @tAddresses Table (id nvarchar(100), type nvarchar(100), street nvarchar(100));

To expand the JSON array in the addresses column and generate rows combining values from the table's non-JSON columns (i.e. my id column) with each of the objects from the JSON array, I use SQL's Cross Apply clause with the OpenJson function. The OpenJson function pulls out the individual objects from the array and the Cross Apply does a join between those objects and the rest of the columns on the row. I use the OpenJson's With keyword to specify which properties I want out of each JSON object (and I'm falling back on OpenJson's default mapping of column names to property names rather than use JSON paths):

Insert into @tAddresses
  Select id, type, street  
    From @tAddressesHold1
     Cross Apply OPENJSON (addr) WITH (type NVARCHAR(100), street NVARCHAR(100));

Moving to Production
It's possible that, once saved to disk, these tables could be used "as is" in my data warehouse or transactional system. Alternatively, I could also use the Table variables I've created here in a SQL Merge statement that updates or inserts rows in my production tables.

However, to do the conversion I'm going to need a multiple SQL statements to:

  1. Load my initial string variable
  2. Create my primary table with any flattened properties
  3. Create a staging table with my primary key field and any arrays of objects
  4. Create related tables from each of the arrays in my staging table

Plainly, this is going to be a stored procedure and, in the absence of a more general solution, I'll be creating one of these stored procedures for every JSON document I want to convert. These stored procedures will accept a JSON document as a parameter (declared as NText so that I can accept a JSON document of up to two billion characters).

The start of the stored procedure for my case study would look like this:

Create PROC CustomerJSONToTables
     @json NText 
As

/* extract collection of orders */
Declare @CustomerHold nvarchar(MAX);
Select @CustomerHold = value from OpenJson(@json);
…rest of statements…

With that stored procedure in place, I can convert my document into tables on the fly by calling my stored procedure, passing a parameter that holds my JSON document. Inside another stored procedure, that would look like this:

Exec CustomerJSONToTables @json;

At this point, I'll chicken out and leave it to you to handle properties nested more than two levels deep (with or without arrays). The best of luck to you with that! And if you do come up with a solution, I'd appreciate it if you'd drop it into the comments.

About the Author

Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.

comments powered by Disqus

Featured

  • Full Stack Hands-On Development with .NET

    In the fast-paced realm of modern software development, proficiency across a full stack of technologies is not just beneficial, it's essential. Microsoft has an entire stack of open source development components in its .NET platform (formerly known as .NET Core) that can be used to build an end-to-end set of applications.

  • .NET-Centric Uno Platform Debuts 'Single Project' for 9 Targets

    "We've reduced the complexity of project files and eliminated the need for explicit NuGet package references, separate project libraries, or 'shared' projects."

  • Creating Reactive Applications in .NET

    In modern applications, data is being retrieved in asynchronous, real-time streams, as traditional pull requests where the clients asks for data from the server are becoming a thing of the past.

  • AI for GitHub Collaboration? Maybe Not So Much

    No doubt GitHub Copilot has been a boon for developers, but AI might not be the best tool for collaboration, according to developers weighing in on a recent social media post from the GitHub team.

  • Visual Studio 2022 Getting VS Code 'Command Palette' Equivalent

    As any Visual Studio Code user knows, the editor's command palette is a powerful tool for getting things done quickly, without having to navigate through menus and dialogs. Now, we learn how an equivalent is coming for Microsoft's flagship Visual Studio IDE, invoked by the same familiar Ctrl+Shift+P keyboard shortcut.

Subscribe on YouTube