Splitting array/string into rows in Amazon Redshift or MySQL
We've worked with a lot of customers who writes SQL on a regular basis. And from time to time they come to us with SQL question - that we thought would be interesting to share with others!
Today, one of our customers came to us with a question - They were having problems unnesting a string/array into rows in Amazon Redshift.
Question - Splitting comma-delimited string into rows in Redshift
Suppose we have a books
table in Amazon Redshift that has these data:
tags (varchar)
-----
A
A, B
C, D
E
And we want to split/unnest/explode it so that it becomes like this
tag
-----
A
A
B
C
D
E
In Postgres, we can immediately use unnest
:
select TRIM( UNNEST( STRING_TO_ARRAY(tags, ',') ) ) from books;
But how do we do this in Amazon Redshift (that doesn't support unnest)?
String to Array in Amazon Redshift
The solution (or workaround) is trying to split the string into multiple part:
with NS AS (
select 1 as n union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10
)
select
TRIM(SPLIT_PART(B.tags, ',', NS.n)) AS tag
from NS
inner join books B ON NS.n <= REGEXP_COUNT(B.tags, ',') + 1
The above query will give you the exact results as using unnest
in Postgres.
Here, the NS (number sequence) is a CTE that returns a list of number from 1 to N, here we have to make sure that our max number is greater than the size of our maximum tags, so you can try adding more numbers to the list depending on your context.
Similar problem in MySQL
Similar approach can be applied if you're using MySQL, though you have to change up the syntax/functions a little bit:
select
TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(B.tags, ',', NS.n), ',', -1)) as tag
from (
select 1 as n union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10
) NS
inner join books B ON NS.n <= CHAR_LENGTH(B.tags) - CHAR_LENGTH(REPLACE(B.tags, ',', '')) + 1
A few notes:
- MySQL doesn't have CTE, so you have to write NS as a subquery
- MySQL doesn't have
SPLIT_PART
, so we use 2 nestedSUBSTRING_INDEX
to achieve the same thing - MySQL doesn't have
REGEXP_COUNT
, so we have to work around a bit to count the number of,
in the string.
What if I have an Redshift JSON array instead?
Okay, what if your tags
column is actually a string that contains a JSON string?
tags (string with json)
-----
['A']
['A', 'B']
['C', 'D']
['E']
One approach is you can just treat them as string, and apply the same string_split above with some string cleaning (remove quotes, square brackets). Or you can use some JSON array functions:
with NS AS (
select 1 as n union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10
)
select
TRIM(JSON_EXTRACT_ARRAY_ELEMENT_TEXT(B.tags, NS.n - 1)) AS val
from NS
inner join books B ON NS.n <= JSON_ARRAY_LENGTH(B.tags)
Conclusion
Working with array in Redshift and MySQL is usually a pain, but most of the times there's always creative workaround that helps you achieve the same purpose!
I've compiled the list of examples with sample data here for your reference (you can copy and paste directly into your SQL terminal)
— — — — — — — — — — — —
•
Having problems finding a simple and affordable data reporting system for your startups? Check us out at holistics.io.
What's happening in the BI world?
Join 30k+ people to get insights from BI practitioners around the globe. In your inbox. Every week. Learn more
No spam, ever. We respect your email privacy. Unsubscribe anytime.