Wednesday, September 9, 2015

SSIS: Perform upsert (Update/Insert) using SSIS Package

It is a common scenario, when we need a script, package or something else to perform insert and update in a table simultaneously. Sometimes the task seems easy and sometimes it takes so much time to perfectly synchronize the databases. In this blog post we will try to create a SSIS package that performs the synchronization task by performing upsert (Insert/Update) using SSIS lookup transformation.


Setup Environment
For demonstration purpose we have created two different databases with the same table. For simplicity we named the databases ‘SourceDatabase’ & ‘DestinationDatabase’. Both databases have one single table named Person. Below is the table script:

CREATE TABLE Person
(
 Id int IDENTITY(1,1) PRIMARY KEY NOT NULL,
 FirstName varchar(250) NOT NULL,
 LastName varchar(250) NOT NULL,
 EmailId varchar(200) NULL,
 PhoneNumber varchar(50) NULL,
 Address varchar(250) NULL
)
GO

Create Package
Let’s add a package in our SSIS project, add a DataFlow task, necessary Connection Manager that points to ‘SourceDatabase’ and ‘DestinationDatabase’. See this blog post to know, how to create package, connection manager etc. After adding the task, connection manager, Control Flow tab of our package looks like the below image.

Figure: Control Flow tab of the package. 


Add Transformations in Data Flow Task
To accomplish the synchronization between two databases, first we need to add a data source in the Data Flow task that must be point to the source database. To add a Data Source double click the Data Flow task added in the Control Flow tab. It will open the Data Flow tab, drag and drop an OLE DB Source from the SSIS toolbox. Double click the Data Source and provide the necessary configuration in Connection Manager and Columns tab, below image shows an overview of our OLE DB Source Editor window.





Figure: OLE DB Source Editor Window 


Now we need to identify the record that exists in source database but not in destination database. If we identify those records we can just insert those records in the destination database. We will use SSIS lookup transformation to identify the new record. The logic is simple; we will apply lookup transformation with the primary key of our Person table for source and destination database. The lookup transformation gives us two pipelines one is for matching record and another is for no matching record. No matching record means the records are exists in source database but not in destination database. So we just insert those records in destination database. Lest add a SSIS lookup transformation in our package.

Add Lookup Transformation
Drag and drop a lookup transformation from SSIS Toolbox to Data Flow tab, connect it with our data source, rename the transformation by a meaningful name. See the below image for overall picture.



Figure: Adding Lookup Transformation. 

Now we need to configure the Lookup Transformation. Double clicking the transformation will open the Lookup Transformation Editor window. In general tab choose the configuration option like the below image.




Figure: Lookup Transformation General configuration.

In connection tab specify destination data source in the OLE DB connection manager, if you not create destination data source yet, you can create it by clicking the new button. Below the connection manager there are two radio buttons. From the first one, we can select a table name; in this case the whole table will be load. From second option, we can write SQL query and fetch only the required column which we want to use for lookup match. We will use second option since we perform the lookup using primary key (Id in this case). We can use any column that uniquely identifies the record for lookup column. Below image depicts the overall picture.




Figure: Lookup Transformation Connection Configuration. 


In columns tab of the Lookup Transformation Editor window we need to map between the lookup columns. See the below image where we map the Id columns from Available Input Columns to Available Lookup Columns.

Figure:  Lookup Transformation Columns configuration. 


Finally close the Lookup Transformation Editor by clicking the ok button.

Insert New Record to Destination
To insert new record to the destination database, first we need to add data destination in the Data Flow tab. Drag an OLE DB Destination from SSIS Toolbox and drop it to Data Flow task tab and rename it.



Figure: Adding Data Destination.


Now we need to connect the output of the Lookup Transformation to the destination. When we drag and connect the pipeline it opens Input Output Selection window. In this window we must select the Lookup no match output, which will give us those records that are not in destination database.

Figure: Connect Lookup output to the data source.


Double click the destination and configure the Connection Manager and Mappings so that it connects to the destination database and table with required columns. After this step if we execute the package it will insert the record, which exists in source database but not in destination database, into destination database. Below is the snapshot of OLE DB Destination Editor Window.

Figure: Destination Configuration. 


Update Existing Record to Destination
In our Lookup Transformation we have another pipeline, Lookup Match Output. In this output we will get all the records that exists both source database and destination database. The records can have two states
  1. They are identical both in source and destination database.
  2. One or more columns of the record have been modified in the source database.
We don’t need to do anything with the identical record but we must update the destination database so that it will sync with source database with the updated records. For this reason we need another Lookup Transformation that will filter out the modified records.

We have added another Lookup Transformation to the Data Flow task and connect it with the previous Lookup Transformation. In this transformation we use those columns as lookup columns that are frequently updated. Let’s configure the Lookup transformation like below images.

Figure: Create the connection in Lookup Transformation.



Figure: Mapping between the columns in Lookup Transformation.


This Lookup Transformation has also two outputs. Lookup Match Output is identical so we do not need to do anything with that records. For update the changed records we need to add another transformation name OLE DB Command. We add the transformation and connect it to the Lookup No Match Output of the newly added lookup transformation.

To configure the OLE DB Command double click on it and it will open the Advanced Editor window. In Connection Managers tab change the Connection manager property.



Figure: Configure Command Connection of OLE DB Command.

Now we need an update query for updating the necessary columns. We have used below query for our Person table and the column value will be supplied from OLE DB Command parameters.

UPDATE dbo.Person
SET
    FirstName = ?, 
    LastName = ?, 
    EmailId = ?, 
    PhoneNumber = ?, 
    Address = ?
WHERE Id = ?

In Component Properties tab click on the ellipse of the Sql Command properties and it will open the editor. See the below image and insert the query.



Figure: Configure Component Properties of OLE DB Command.

Now we need to map the parameters from Column Mappings tab. See the below image and map the columns with parameter. The parameters are appeared here in the order that the order in the update query.



Figure: Configure Column Mappings of OLE DB Command.

Finally click OK and close the window. Our package is ready now. The complete package image looks like the below image.


Figure: The full package structure 

Test the Package
Initially the Person table of the ‘SourceDatabase’ has two records and table of the ‘DestinationDatabase’ has empty. Below is the insert script:


INSERT INTO dbo.Person
(
    FirstName,
    LastName,
    EmailId,
    PhoneNumber,
    Address
)
VALUES('Abu','Zafor','kahiruzzaman@gmail.com','01726269938','Dhaka'),
('Mahabub','Alam','mahabub@jaxara.com','01791123456','Dhaka')
GO

When we execute the package it inserts two records in destination database. Now we will insert another new record and update an existing record in source database and execute the package again and observe what will happen. Below is the script:


INSERT INTO dbo.Person
(
    FirstName,
    LastName,
    EmailId,
    PhoneNumber,
    Address
)
VALUES('Ahsan','Uddin','ahsan@gmail.com','01726123456','Dhaka')

UPDATE dbo.Person
SET
    Address = 'Uttara, Dhaka'
WHERE Id = 1

After executing the package we observe the following change in the destination database.


Figure: Final result after package execution.


Conclusion
In this article we saw that for insert and update operation we can use SSIS Lookup Transformation we can also use slowly changing dimension transformation. We can also use any kind of SSIS transformation like Derived Column, Data Conversion etc before load the data in destination table.

33 comments:

  1. Nice and informative! Keep it up!!

    ReplyDelete
  2. Thank you for this post, it has helped me with a project I am working on.

    ReplyDelete
  3. Thank you so much for this post. It helped me create my first SSIS package.

    ReplyDelete
  4. thank you ..this post is very much useful for me...i am new to SSIS ...for new learners this post is very much useful

    ReplyDelete
  5. Very good and clear example. Very helpful. Thanks!

    ReplyDelete
  6. Thanks, very helpful and efficient compared to merge join.

    ReplyDelete
  7. Thanks for this, I had been using SAP data services for the last 2 years and was a bit rusty with SSIS. This post really helped

    ReplyDelete
  8. I think this is a good first step in learning how to do simple ETL tasks, but could be sped up pretty significantly if OleDBCommand tasks are avoided. OleDBCommand acts on each individual record, and a much faster solution would be to store the updated records into a staging table and then merge the update table and target table using an ExecuteSQL task to perform a MERGE of the two tables outside of the Data Flow.

    Thanks for the post though, and happy coding!

    ReplyDelete
    Replies
    1. Thank you mrgnwatson for pointing me to another approach. Definitely I will try this approach also. Thanks again.

      Delete
  9. Thank you for a great explanation with steps and images.

    ReplyDelete
  10. Good Example i have used the same in my project but the problem is my update is taking long time is there any alternate transformation that works faster than lookup.

    ReplyDelete
  11. Thx! Great example nad nicely worked out with all the screenshots. Really helpful!

    ReplyDelete
  12. Great Post! Good Explanation. I like the way you described this useful content. Each Script Task is called only once within a Control Flow, unless it is in a looping control. The Script Component has to be higher octane because it is going to be called per row of data in the data stream. Compared to the Script Task, the Script Component has a steeper learning curve.
    Using the script component in SSIS

    ReplyDelete
  13. Thank you so much. It got me through my confusion.

    ReplyDelete
  14. The information you provided in this Blog is very useful.The information is worth and very useful for the beginners.SQL Server Integration Services (SSIS) is Microsoft’s business intelligence suite and ETL tool. It is a component of the Microsoft SQL Server database software that can be used to perform a broad range of data migration tasks Readmore..

    ReplyDelete
  15. First, absolutely, thank you for the hint. It helped me over the hump. But yes, it is a resource hog when approached that way. If I were dealing with millions instead of thousands of rows, this would be a no-go as some already indicated. But again, thank you so much for the help. It worked perfectly for me, albeit slow. Best, Raphael.

    ReplyDelete
  16. This comment has been removed by the author.

    ReplyDelete
  17. This blog is descriptive and very informative for people those who want to know about SSIS Update or Insert.I would love to record this post and I feel developers and techies will gain extra knowledge from this.

    ReplyDelete
  18. This is one amazing article for newbies' like me. Wanted to know how do we follow the same procedure if we have multiple tables to be updated from the staging table as source. Any insights is appreciated. Thank you.

    ReplyDelete

  19. If you want to get information about the job notification, exam and interview dates, and Sarkari Result. you will get many websites of the Government of India. You can easily find out the result for any Government examination.
    sarkariresult |
    sarkri result |
    serkari result |
    sarkari risult |
    sarakri result |
    sarkari resul |
    sarkari resut |
    sarakariresult |
    sarkari reasult |
    sharkari result |
    sarkai result |
    sarkari reslt |
    sarkariresult. |
    sarkari reslut |
    sarkari result. |
    sarkari result notification |
    sarkari resuilt |
    sarkariresul |
    srakari result |
    sarkari resutl |
    sarkari rijalt |
    sarari result |

    ReplyDelete
  20. Hehehe Nice Blog, But I have Much More Intreset to See Some Educational Stuff, If You Guy's Have Same Thinking Like Me I Have One Awesome blog Site For You https://zappysys.com/products/SSIS-powerpack/SSIS-upsert-destination HAHAHA.... It Was Awesome naa I Told You, Thank me later.

    ReplyDelete
  21. UnoGeeks Offers the best Oracle Fusion Financials Training in the market today. If you want to become Expert Fusion Financials Consultant, Enrol in the Oracle Fusion Financials Training

    ReplyDelete