Sunday, October 18, 2015

Redshift: Upload Data to Redshift from SQL Server using Python Script

Download Executable Attachment

Sometimes we need to upload data to Redshift from SQL Server. But there is no straight forward way to do that. And we have to do the following task manually to import the data to redshift.
  1. Export data from SQL Server to a text file.
  2. Compressed the file to GZIP.
  3. Upload the file to S3.
  4. Execute copy command in redshift.


I have written a python script that do the above task. The script first read configuraion from a YML file, export the SQL server data to a text file using BCP command, compressed the text file, upload the compressed file to S3, truncate the redshift table and finally execute a copy command to load the data to redshift from that file.


You can download the full project from github. I have also create a distributable(exe) so that one can easily run the script by just change the configuration file. Download the attachement. After extracting the attachment you can find a folder name ImportDataToRedshift  and a configuration file config.yml. Change the config file by providing your own credential.

Figure: Snapshot of config.yml configuration structure.

Please give extra attention when edit config file and insert necessary space for every section. Open command promot and move to ImportDataToRedshift  directory then run the ImportDataToRedshift.exe file by providing the full path of the config file. See the below image.

Figure: Snapshot after running the executable file.

If you do not find your expected data in the redshift table you can execute the below query in redshift cluster and investigate if there is any data loading errors.

select * from stl_load_errors order by starttime desc limit 10

Using this script you can export data from any table or view of SQL Server. Just make sure that the SQL Server and Redshift table structure are identical. If you face any problem please let me know through comments.

No comments:

Post a Comment