Saturday, August 8, 2015

Traditional (Row-Oriented) Database Vs Columnar Database

What is the real difference of columnar database over traditional (row-oriented) database? 

Columnar database is a database system that stores data in a column-wise fashion while in traditional database data has stored in row-wise fashion. Let’s see an example so that we can visualize how data will be stored for these two approaches. Below image shows our example record set.

Figure: Raw dataset

In Traditional (Row-Oriented) Database System
In traditional database system each record stored in a disk block that means all of the column values of a row stored in a single block.

Figure: Data stored in row-oriented system

We can see from the above image, data block contains all the columns values for entire row. If the block size smaller than the record size, the record storage will extended to another block. If the block size is larger than the size of the record, the remaining storage of the block will be unused, which cause inefficient use of disk space.

Let’s see another fact of row-oriented database system. What will happen when we try to retrieve all the records last name and category? In traditional database system all the records should be scan from left to write and will load all the column value in the memory no matter we are interested or not with that column. See the below image.

Figure: Data access in traditional database system

So when we try to retrieve the record for huge number (several billions) of data, it will take time and performance will degrade obviously.

In Columnar Database System
In columnar database like Amazon Redshift, each disk block stores values of a single column for multiple rows. See the below image for visualize.

Figure: Data stored in columnar database system

Since each disk block stores only same kind of data, columnar data can be highly compressed and it shows the following advantages.

  1. Rapidly performs columnar operations – like MIN, MAX, COUNT, SUM and AVG.
  2. Reducing disk space and I/O operations.
Now consider other fact – retrieve data from columnar database. Let’s say we want to retrieve First name and category for all records in terms of columnar database. What will happen? Well, since columnar database store a single column data in an identical disk block, it will just load the specific disk block in the memory and ignore other block. See the below image.

Figure: Data access in columnar database system

Since columnar database only load the column values that we need, it will save memory space by retrieving blocks for columns that we need.

Columnar database is not a replacement of traditional database. For OLTP system, most transaction frequently reading and writing all of the values for entire record and row-oriented storage is optimal. On the other hand data warehouse queries read only few columns, so columnar database is suitable for data warehouse system.

No comments:

Post a Comment