Tuesday, July 14, 2015

Redshift: How to alter column datatype in redshift

There is no direct way to change the column datatype in redshift. If we want to change the column name we can use redshift alter statement with rename keyword like,

alter table BDPlayers rename column category to grade;


But if we want to change the datatype of the column, we cannot do it easily with a single statement. Of course we can do it by following some approach. We have described three different approaches below. For demonstration purpose I have created a table with some dummy data. Below is the script.

create table BDPlayers(
 Id int identity(1,1) NOT NULL,
 FirstName varchar(100) NULL,
 LastName varchar(100) NULL,
 Category varchar(10) NULL,
 PlayingRole varchar(50) null
); 

insert into BDPlayers(firstname,lastname,category)
values('Tamim','Iqbal','A'),
 ('Somuya','Sharkar','B'),
 ('Liton','Das','B'),
 ('Mushfiq','Rahim','A'),
 ('Shakib','Al-Hasan','A'),
 ('Sabbir','Rahman','B'),
 ('Nasir','Hossain','B'),
 ('Mashrafee','Mortaza','A'),
 ('Taskin','Ahmed','C'),
 ('Rubel','Hossain','B'),
 ('Mostafizur','Rahman','C');


Now I want to alter the table to change the category column’s datatype to varchar(5) from varchar(10). Let’s see our different approaches.

Approach # 1

The easiest way to change the column datatype is to drop the existing column and add new column with proper datatype.

alter table bdplayers drop column category;
alter table bdplayers add column category varchar(5);

This approach is easier and effective if the target table is empty J. But if we apply this approach in a table that has records we will lose all data for the column. To prevent data lose we can apply approach #2.

Approach # 2

This approach will be follow the below step.

  1.  Alter the table and add a temporary column with proper datatype. [ Ex: alter table TableName add column TempColumn (proper datatype) ]
  2. Update the table and set the newly added column value from existing original column. [ Ex: update TableName set TempColumn = OriginalColumnName; ]
  3. Alter the table and drop the original column. [ Ex: alter table TableName drop column OriginalColumnName; ]
  4. Alter the table and rename the newly added column to original column name. [ Ex: alter table TableName rename column TempColumn to OriginalColumnName; ]

Let’s see our working query.

alter table BDPlayers add column category_temp varchar(5);
update BDPlayers set category_temp = category;
alter table BDPlayers drop column category;
alter table BDPlayers rename column category_temp to category;

The problem of the above two approach is that the schema will be change. The newly added column will be last of the table. This could be a problem for COPY command and wrong data will insert in wrong column if we do not explicitly specify the column name in the COPY command. If we do not want to change the schema then we can use approach #3.

Approach # 3

In this approach we can use create table as command. The steps will be like below.

  1. Alter table and rename it. [ EX: alter table tablename renameto tablename_temp ]
  2. Use create table as command to create new table from the tablename_temp table. Apply appropriate casting of the column (which you want to alter) in select statement.
Let’s see our working example:

alter table BDPlayers rename to BDPlayers_Temp;

create table BDPlayers as
SELECT id
       , firstname
       , lastname
       , cast(category as varchar(5))
       , playingrole
 FROM public.BDPlayers_Temp;

13 comments:

  1. CREATE TABLE AS won't preserve distkey and sortkey selections, as well as permissions.

    ReplyDelete
    Replies
    1. Correction you can specify these in create table as:

      CREATE TABLE BDPlayers distkey(id) sortkey(firstname) as ....

      Delete
  2. Hi Abu,

    Nice to read your blog and specially your personal thought of 'forget' ..:)

    I need some more lights by you, I have a scenario that table already exist with data and have to change the datasize of the field. But with the above approach the newly added column will be appended at the end. When we upload the data into it an error occur due to mismatch datatype. It simply means order of the field has changed.
    What would be the best possible way, either to create a dummy table with required order of the field or there is any other way in redshift to do it?

    Thanks in Advance for your suggestion...:)

    ReplyDelete
    Replies
    1. Hi MYTECH SKILLS,
      Thank you for reading my blog. You can do the following think
      1. Rename your current table
      2. Create new table with proper data type and size
      3. Perform insert into command that we insert data from old table to new table
      4. drop old table

      If your table contains big data then you can follow the below step.
      1. Create a data file by executing unload query to your current table
      2. Create new table with proper data type and size
      3. Perform copy command

      I think this will help you to think.

      Delete
  3. Hi Abu,

    Thanks for your response, I already followed similar way.
    One more help I would like to request.
    I have data with timestamp column in UTC and my local system is in EDT, what would the possible way to upload data into redshift in UTC format?

    Though I have created a table with column datatype as timestamptz

    Sample:
    04-09-2017 19:22:58 -0400
    04-09-2017 19:55:11 -0400
    04-09-2017 19:16:51 -0400

    When I uploaded that data with copy command (timeformat 'MM-DD-YYYY HH24:MI:SS') it shows in table as
    04-09-2017 19:22:58+00
    04-09-2017 19:55:11+00
    04-09-2017 19:16:51+00


    Thanks in advance...!!!

    ReplyDelete
  4. Hi,

    Could you please resolve my query:
    do I need to take lock on the table while performing below steps:
    1. Rename your current table
    2. Create new table with proper data type and size
    3. Perform insert into command that we insert data from old table to new table
    4. drop old table

    I think If table is not locked then other transactions may currept data.

    ReplyDelete
  5. RE approach #1: You cannot drop a column if its a sort/disk key

    ReplyDelete
  6. The Problem with all the above approaches is to drop and recreate all the dependent View in the DB.

    ReplyDelete