Thursday, July 2, 2015

Redshift: Store and Querying JSON object in Redshift

Problem: How to store JSON objects in redshift and also apply filter, group by, order by etc on that object?

Solution: Suppose we have a sparse table, where we need to add many columns and it could be exceed the maximum column limit supported by redshift. In that situation we can store the data for a row in key:value pairs in a single JSON string. JSON uses UTF-8 encoded text string so we can store JSON string in redshift text or varchar type column. For demonstration purpose we have created a table and insert some dummy JSON object. Below is the script.

create table if not exists json_experiment
(
 id int primary key identity(1,1),
 jsonformatdata text
);

insert into json_experiment(jsonformatdata) 
values('{"firstname":"Abu","lastname":"Zafor","gender":"M"}'),
('{"firstname":"Abdul","lastname":"Mohitee","gender":"M"}'),
('{"firstname":"Nabiha","lastname":"Zafor","gender":"F"}'),
('{"firstname":"Shaikh","lastname":"Hasina","gender":"F"}'),
('{"firstname":"Ahsan","lastname":"Uddin","gender":"M"}');


We have created the JSON object as a key:value pair where keys were first name, last name and gender. Let’s discuss some query for manipulating our JSON object.

Query1: Write a query to extract and show the first name, last name and gender from JSON object.

We can use Redshift JSON_EXTRACT_PATH_TEXT function to extract the value from JSON object by supplying its key. Below is the query that extracts the first name, last name and gender.

select id,
 json_extract_path_text(jsonformatdata, 'firstname') as FirstName,
 json_extract_path_text(jsonformatdata, 'lastname') as LastName,
 json_extract_path_text(jsonformatdata, 'gender') as Gender
from json_experiment;


When we execute the query, we get the below result set.

Figure: Extract value from JSON object 


Query2: Write a query to return rows that contains text ‘Zafor’ in value of lastname key of JSON object.

Below is the query. The query first extract the lastname key from the JSON object then apply the filter condition.

select id,
 json_extract_path_text(jsonformatdata, 'firstname') as FirstName,
 json_extract_path_text(jsonformatdata, 'lastname') as LastName,
 json_extract_path_text(jsonformatdata, 'gender') as Gender
from json_experiment
where json_extract_path_text(jsonformatdata, 'lastname') like '%Zafor%'


Figure: Apply filter in JSON object


The value which is return by json_extract_path_text(jsonformatdata, 'lastname') function is case sensitive, so if we write the query like below (use zafor instead of Zafor) it will return zero rows.

select id,
 json_extract_path_text(jsonformatdata, 'firstname') as FirstName,
 json_extract_path_text(jsonformatdata, 'lastname') as LastName,
 json_extract_path_text(jsonformatdata, 'gender') as Gender
from json_experiment
where json_extract_path_text(jsonformatdata, 'lastname') like '%zafor%'



Query3: Write a query that return count of male and female person.

We know that we store gender information in the JSON objects gender key so that we can write the query like,

select 
 json_extract_path_text(jsonformatdata, 'gender') as Gender,
 count(json_extract_path_text(jsonformatdata, 'gender')) as NoOfOccurences
from json_experiment
group by json_extract_path_text(jsonformatdata, 'gender')
order by NoOfOccurences


When we execute the query, we get the below result set.

Figure: Apply group by and order by in JSON object 

No comments:

Post a Comment