Blog
Using Holistics SQL Tips

How to unnest / extract nested JSON data in MySQL 8.0

In this article, we will demonstrate how to use the JSON_EXTRACT and JSON_TABLE functions in MySQL8.0 to convert JSON table into tabular data for analysis and reports, and also how to utilise it in Holistics for drag-and-drop reports.

How to unnest / extract nested JSON data in MySQL 8.0

In this article, we will demonstrate how to use the JSON_EXTRACT() and JSON_TABLE() functions in MySQL8.0 to convert JSON table into tabular data for analysis and reports, and also how to utilise it in Holistics for drag-and-drop reports.

The examples below use the ga_sessions sample dataset provided by Google. All queries are written in Holistics using MySQL 8.0.

How JSON data is stored in MySQL 8.0

JSON data in MySQL is treated as its own data type, a JSON string, and can appear in 2 main forms:

  • Key-value object: a single record which consists of multiple named or indexed fields (or keys) paired with values
  • Nested Array/Table: a table built with multiple key-value objects in a hierarchical format

What JSON objects look like

Let's look at the Totals column in the ga_session dataset.

{
  "hits": 2,
  "visits": 1,
  "bounces": null,
  "newVisits": 1,
  "pageviews": 2,
  "timeOnSite": 189,
  "screenviews": null,
  "timeOnScreen": null,
  "transactions": null,
  "sessionQualityDim": 1,
  "uniqueScreenviews": null,
  "transactionRevenue": null,
  "totalTransactionRevenue": null
}

As is shown, a single record is stored as an array of keys paired with corresponding values.

It can have other such objects stored within it, like in traffic_source.adwordsClickInfo :

{
  "medium": "(none)",
  "source": "(direct)",
  "keyword": null,
  "campaign": "(not set)",
  "adContent": null,
  "campaignCode": null,
  "isTrueDirect": null,
  "referralPath": null,
  "adwordsClickInfo": {
    "page": 1,
    "slot": "Top",
    "gclId": "CjwKCAjwzYDMBRA1EiwAwCv6JgtTqBxSwgf2SWlnlbhLqjlcFnBEGCYZYXniDhqV3YB5D-X1H-LVNxoC6_wQAvD_BwE",
    "adGroupId": null,
    "isVideoAd": false,
    "campaignId": null,
    "creativeId": null,
    "criteriaId": null,
    "customerId": null,
    "adNetworkType": "Google Search",
    "targetingCriteria": null,
    "criteriaParameters": "not available in demo dataset"
  }
}

Extracting from JSON objects using JSON_EXTRACT() / ->>

Let's get them ready for our model using Totals and traffic_source.  For simple JSON strings like these, we just need to use the JSON_EXTRACT() function.

Sample code

select
  data.visit_id as visit_id,
  json_extract(data.totals, '$.hits') as hits,
  data.totals ->> '$.pageviews' as pageviews,
  json_extract(data.traffic_source, '$.adwordsClickInfo.adNetworkType') as adNetworkType,
  data.traffic_source ->>'$.adwordsClickInfo.gclId' as gclId
from
  demo_nested_data as data

Output

Syntax

Here is how we used JSON_EXTRACT()/ ->>.
Note: ->> is shorthand for JSON_EXTRACT()

The syntax is as follows:

JSON_EXTRACT(table.column, '$.childcolumn') as alias,
table.column ->> '$.childcolumn' as alias

So to extract hits from totals, we used:

json_extract(data.totals, '$.hits') as hits

And to extract pageviews from totals, we used:

data.totals ->> '$.pageviews' as pageviews 

And to extract a child-object, simply append .nestedchildcolumn to the end of childcolumn like we did when extracting adNetworkType and gclId from deep within traffic_source like so:

json_extract(data.traffic_source, '$.adwordsClickInfo.adNetworkType') as adNetworkType,
data.traffic_source ->>'$.adwordsClickInfo.gclId' as gclId

One major thing to note is that the extracted data will have the JSON data type and you may have to nest it inside a CAST() function or create a custom dimension within Holistics to change it into the data type of your choice.

There are other arguments which you may need which are not used in our guide. Please refer to the official documentation for a full list of the arguments.

JSON_EXTRACT() in Holistics

In Holistics, you can easily create a transform model and use the same SQL code as above to create the model with values already extracted into fields, ready for reporting usage.

But, if you are working on a model that does not have the values extracted, you can create a custom dimension to extract it and then use it in drag and drop report building.

All you have to do is use the same syntax you would for your SELECT argument for the desired field. You can also nest it within a CAST() function to change it to the appropriate data type.

What JSON Tables look like

As we previously saw with traffic_source.adwordsClickInfo, the JSON data type can store data within data.  So if your nested child column is also a nested column, it can become much more complex, storing entire tables within a single cell. Take a look at the structure of an element in hits column. Each element is an array of key-value objects with 33 elements, of which 13 are nested objects, and of those 13, 1 has yet another level of nested objects.

While potentially powerful, if you store your data in this way, it can cause many issues down the line. Your queries become more complicated and can become nigh on impossible to optimize in some cases.

If you attempt to use nested JSON_EXTRACT() functions, your query might implode. Luckily, MySQL 8.0 has the JSON_TABLE() function which is capable of unnesting (or flattening) nested arrays.

Extracting from Nested JSON Arrays using JSON_TABLE()

Sample Code

select
  n.visit_id,
  h.*
from
  demo_nested_data n,
  JSON_TABLE(
    n.hits,
    '$[*]' COLUMNS(
      time INT PATH '$.time',
      isInteraction bool PATH '$.isInteraction',
      NESTED PATH '$.page' COLUMNS(
        pageTitle VARCHAR(100) PATH '$.pageTitle',
        pagePath VARCHAR(40) PATH '$.pagePath'
      )
    )
  ) as h

Output

Syntax

The first thing you should note is that unlike the JSON_EXTRACT() function which is a SELECT argument, JSON_TABLE() is a FROM argument. It transforms the column into a table with columns you can then call on in your SELECT argument.

Second, you have to call your source table in addition to the JSON_TABLE() function in your FROM argument. This also creates an implicit JOIN between the 2 tables, which is why you can call on fields located in your source table in the same query without having to write the JOIN condition yourself. (e.g. called [visit.id] from the source table and is correctly associated for each row)

Let's look at the usage in these few lines:

JSON_TABLE(
    n.hits,
    '$[*]' COLUMNS(
      time INT PATH '$.time',
      NESTED PATH '$.page' COLUMNS(pageTitle VARCHAR(100) PATH '$.pageTitle')
    )

JSON_TABLE( sourcejsoncolumn , index/element COLUMNS( NESTED PATH COLUMNS() alias datatype PATH '$.childcolumn')

Arguments as in the example:

n.hits → The data source, call the JSON column which you want to extract values from

'$[*]' → specific index/element you are extracting from. If you only want to extract data from some of the listed elements, modify this argument to '$.elementname'. Or if you want to only extract the element by its index number, replace ' *' with the index or indices of your choice. e.g. first element only would be '$[0]', first 3 elements would be '$[0,1,2]'. Otherwise, '$[*]' will extract data from all elements available.

COLUMNS() → list the columns you want to extract

  • 'time' / 'pageTitle' → desired alias
  • INT / VARCHAR(100) → desired data type
  • PATH '$.time' / PATH '$.pageTitle' → designates the child column to be extracted
  • NESTED PATH '$.page'→ If the child column you are calling is nested within another column, use this argument and designate the child column where your data is nested in, then nest another COLUMNS() argument within it.

There are other arguments which you may need which are not used in our guide. Please refer to the official documentation for a full list of the arguments.

JSON_TABLE() in Holistics

As mentioned above, the JSON_TABLE() function is called in your FROM argument. As such, you cannot create a custom dimension to extract it and it must be designated in your SQL in a data transform model from the get-go. Whatever you extract can then be used as fields to create reports with drag and drop.

Conclusion

MySQL8.0 is a powerful tool for reading and formatting JSON data. With a few steps, it can be ready for reporting and analysis within Holistics. We hope this guide was helpful. Please drop us a message if you think we missed an important use case!

If you are using BigQuery, refer to this article instead.