Practical .NET

Converting JSON Objects to Relational Tables: Flattening the Object

Here's how you can use SQL Server's OpenJson function to dismantle JSON structures into relational tables targeting either transactional systems or data warehouses.

JSON is great for moving data around but not so hot when you need to convert it into data stored in a relational database -- it's a problem complex enough that you might consider using a document-oriented or other NoSQL database rather than using a relational database. If, however, you want to get your data into some sort of SQL database (either to support your transactional systems or as the first step to get your data into a data warehouse) and you're not interested in a full-fledged Extract, Transform and Load solution, SQL Server's OpenJson function might be all you need.

I looked at OpenJson in an earlier post, but I'm going to return to the topic with a more practical focus: to address a specific real-world case with a typical JSON object as my example. Specifically, this JSON sample with both simple and complex properties (and a complex property with repeating values):

Declare @json NVarChar(MAX) = N'
{
 {
  "customers" : 
  [
   {"id" : "A123",
    "createdOn" : "2021-05-19T15:10:48.852073Z",
    "salesToDate" : {
          "amount" : "2040.15",
          "currency" : "USD"
    },
    "addresses" : [
        {
            "type" : "Billing",
            "street" : "500 Ridout St."
        },
        {
            "type" : "Shipping",
            "street" : "300 Strathcona"
        }
    ]       
 
  },
  …more customer objects…
]}';

One solution to moving this structure to relational tables is just to flatten the JSON into a single table with columns like id and createdOn. In this scenario, the properties in salesToDate could, for example, be converted into columns called salesToDateAmount and salesToDateCurrency. That may make sense if you're targeting a data warehouse where you may not want to take your data to third or fourth normal form -- simply eliminating repeated fields might be sufficient for reporting purposes.

On the other hand, you might prefer to move salesToDate into a separate table. The collection of addresses probably gives you less leeway -- those addresses probably do belong in a related Address table to support the variety of addresses a customer might have (not only "shipping" and "billing" but, potentially, "contact" and "alternate shipping").

In this post, I'm going to look at flattening salesToDate into a row with simple properties in a Customers table. In an upcoming post, I'll look at setting up both salesToDate and addresses as separate tables from that Customers table .

Creating a Flattened Table
My first step is to strip the JSON collection of customer objects out of the customers property it's nested inside of and put it in a string variable to use in subsequent steps. OpenJson prefers JSON to be held in NVarChar(Max) variables so that's how I declare my string variable:

Declare @CustomerHold nvarchar(MAX);

By default, OpenJson will refer to the value of the array inside the customers property as "value" so this code declares a string variable called CustomerHold and then loads it with that property from my JSON object:

Select @CustomerHold = value from OpenJson(@json);

Next I need to deal with the simple properties on the JSON document by loading them into a Table variable. My first step, therefore, is to define a Table variable to hold those properties:

Declare @Customers Table (id nvarchar(100), createdOn Date);

In a production system, rather than use a Table variable, I could truncate an existing table on disk and load the rows from my JSON object into it; alternatively, I could skip truncating the existing table and just append the rows from my JSON object. In my examples, I'll use Insert Into which supports both scenarios.

The OpenJson's With clause allows me to select property names within my JSON document and return them as if they were columns in a table just by giving my columns names that match properties in my JSON object. Also in the With clause, I define the data types for those columns which, among other benefits, allows me to convert the string-value date in the object into a genuine SQL Server date.

This SQL statement extracts the id and createdOn properties from my JSON object (and creates a row with columns called id and createdOn) for every customer in the JSON array:

Insert into @Customers
   Select * From
       OpenJson(@CustomerHold) With (id nvarchar(100), 
                                     createdOn Date);

Another nice feature of the With clause is that if a property is missing in a particular object (something that's perfectly OK in JSON), OpenJson just returns NULL for that property rather than giving up completely.

Flattening Complex Properties
Flattening complex properties into my main row is also pretty straightforward. First, I have to add columns to hold the data to my Table variable:

Declare @Customers Table (id nvarchar(100),
                          createdOn Date, 
                          salesToDateAmount decimal(7,2), 
                          salesToDateCurrency nchar(3));

The next step is to extend my With clause by specifying the new properties I want to extract from the salesToDate. So far, I've been counting on OpenJson's ability to match the column names in my table to the properties in my JSON object. With these nested properties, however, I'll use a different approach.

Where I can't use column names to match property names (or where I want my columns to have different names than my property names), I can use JSON path expressions in OpenJson's With clause to tie a column to a JSON property. These path expressions begin with a $ (indicating the root of the object) followed by the names of the nested properties, separated by periods. To specify, for example, the amount property nested inside the salesToDate property, I use '$.salesToDate.amount' (the path must be enclosed in single quotes).

This statement grabs the amount and currency properties inside the salesToDate property so that my Select statement can return them in columns called salesToDateAmount and salesToDateCurrency:

Insert into @Customers
   Select * From
       OpenJson(@CustomerHold) With (id nvarchar(100), 
                                     createdOn Date, 
                                     salesToDateAmount decimal(7,2) '$.salesToDate.amount', 
                                     salesToDateCurrency nchar(3) '$.salesToDate.currency' );

That's the easy stuff out of the way. With my next post, I'll create related tables both for the salesToDate property and the array in the addresses property.

About the Author

Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.

comments powered by Disqus

Featured

Subscribe on YouTube