What is the real difference of columnar database over traditional (row-oriented) database?
Columnar database is a database system that stores data in a column-wise fashion while in traditional database data has stored in row-wise fashion. Let’s see an example so that we can visualize how data will be stored for these two approaches. Below image shows our example record set.
Figure: Raw dataset
In Traditional (Row-Oriented) Database System
In traditional database system each record stored in a disk block that means all of the column values of a row stored in a single block.
Figure: Data stored in row-oriented system
We can see from the above image, data block contains all the columns values for entire row. If the block size smaller than the record size, the record storage will extended to another block. If the block size is larger than the size of the record, the remaining storage of the block will be unused, which cause inefficient use of disk space.
Let’s see another fact of row-oriented database system. What will happen when we try to retrieve all the records last name and category? In traditional database system all the records should be scan from left to write and will load all the column value in the memory no matter we are interested or not with that column. See the below image.
Figure: Data access in traditional database system
So when we try to retrieve the record for huge number (several billions) of data, it will take time and performance will degrade obviously.
In Columnar Database System
In columnar database like Amazon Redshift, each disk block stores values of a single column for multiple rows. See the below image for visualize.
Figure: Data stored in columnar database system
Advantages
Since each disk block stores only same kind of data, columnar data can be highly compressed and it shows the following advantages.
- Rapidly performs columnar operations – like MIN, MAX, COUNT, SUM and AVG.
- Reducing disk space and I/O operations.
Now consider other fact – retrieve data from columnar database. Let’s say we want to retrieve First name and category for all records in terms of columnar database. What will happen? Well, since columnar database store a single column data in an identical disk block, it will just load the specific disk block in the memory and ignore other block. See the below image.
Figure:
Data access in columnar database system
Advantages
Since columnar database only load the column values that we need, it
will save memory space by retrieving blocks for columns that we need.
Conclusion
Columnar database is not a replacement of
traditional database. For OLTP system, most transaction frequently reading and
writing all of the values for entire record and row-oriented storage is
optimal. On the other hand data warehouse queries read only few columns, so
columnar database is suitable for data warehouse system.
Thank you so much for providing and exploring more about SSIS and many other utilities to look up for how can one break down traditional practices.
ReplyDeleteSSIS Upsert