Wednesday, September 9, 2015

SSIS: Perform upsert (Update/Insert) using SSIS Package

It is a common scenario, when we need a script, package or something else to perform insert and update in a table simultaneously. Sometimes the task seems easy and sometimes it takes so much time to perfectly synchronize the databases. In this blog post we will try to create a SSIS package that performs the synchronization task by performing upsert (Insert/Update) using SSIS lookup transformation.


Setup Environment
For demonstration purpose we have created two different databases with the same table. For simplicity we named the databases ‘SourceDatabase’ & ‘DestinationDatabase’. Both databases have one single table named Person. Below is the table script:

CREATE TABLE Person
(
 Id int IDENTITY(1,1) PRIMARY KEY NOT NULL,
 FirstName varchar(250) NOT NULL,
 LastName varchar(250) NOT NULL,
 EmailId varchar(200) NULL,
 PhoneNumber varchar(50) NULL,
 Address varchar(250) NULL
)
GO

Create Package
Let’s add a package in our SSIS project, add a DataFlow task, necessary Connection Manager that points to ‘SourceDatabase’ and ‘DestinationDatabase’. See this blog post to know, how to create package, connection manager etc. After adding the task, connection manager, Control Flow tab of our package looks like the below image.

Figure: Control Flow tab of the package. 


Add Transformations in Data Flow Task
To accomplish the synchronization between two databases, first we need to add a data source in the Data Flow task that must be point to the source database. To add a Data Source double click the Data Flow task added in the Control Flow tab. It will open the Data Flow tab, drag and drop an OLE DB Source from the SSIS toolbox. Double click the Data Source and provide the necessary configuration in Connection Manager and Columns tab, below image shows an overview of our OLE DB Source Editor window.





Figure: OLE DB Source Editor Window 


Now we need to identify the record that exists in source database but not in destination database. If we identify those records we can just insert those records in the destination database. We will use SSIS lookup transformation to identify the new record. The logic is simple; we will apply lookup transformation with the primary key of our Person table for source and destination database. The lookup transformation gives us two pipelines one is for matching record and another is for no matching record. No matching record means the records are exists in source database but not in destination database. So we just insert those records in destination database. Lest add a SSIS lookup transformation in our package.

Add Lookup Transformation
Drag and drop a lookup transformation from SSIS Toolbox to Data Flow tab, connect it with our data source, rename the transformation by a meaningful name. See the below image for overall picture.



Figure: Adding Lookup Transformation. 

Now we need to configure the Lookup Transformation. Double clicking the transformation will open the Lookup Transformation Editor window. In general tab choose the configuration option like the below image.




Figure: Lookup Transformation General configuration.

In connection tab specify destination data source in the OLE DB connection manager, if you not create destination data source yet, you can create it by clicking the new button. Below the connection manager there are two radio buttons. From the first one, we can select a table name; in this case the whole table will be load. From second option, we can write SQL query and fetch only the required column which we want to use for lookup match. We will use second option since we perform the lookup using primary key (Id in this case). We can use any column that uniquely identifies the record for lookup column. Below image depicts the overall picture.




Figure: Lookup Transformation Connection Configuration. 


In columns tab of the Lookup Transformation Editor window we need to map between the lookup columns. See the below image where we map the Id columns from Available Input Columns to Available Lookup Columns.

Figure:  Lookup Transformation Columns configuration. 


Finally close the Lookup Transformation Editor by clicking the ok button.

Insert New Record to Destination
To insert new record to the destination database, first we need to add data destination in the Data Flow tab. Drag an OLE DB Destination from SSIS Toolbox and drop it to Data Flow task tab and rename it.



Figure: Adding Data Destination.


Now we need to connect the output of the Lookup Transformation to the destination. When we drag and connect the pipeline it opens Input Output Selection window. In this window we must select the Lookup no match output, which will give us those records that are not in destination database.

Figure: Connect Lookup output to the data source.


Double click the destination and configure the Connection Manager and Mappings so that it connects to the destination database and table with required columns. After this step if we execute the package it will insert the record, which exists in source database but not in destination database, into destination database. Below is the snapshot of OLE DB Destination Editor Window.

Figure: Destination Configuration. 


Update Existing Record to Destination
In our Lookup Transformation we have another pipeline, Lookup Match Output. In this output we will get all the records that exists both source database and destination database. The records can have two states
  1. They are identical both in source and destination database.
  2. One or more columns of the record have been modified in the source database.
We don’t need to do anything with the identical record but we must update the destination database so that it will sync with source database with the updated records. For this reason we need another Lookup Transformation that will filter out the modified records.

We have added another Lookup Transformation to the Data Flow task and connect it with the previous Lookup Transformation. In this transformation we use those columns as lookup columns that are frequently updated. Let’s configure the Lookup transformation like below images.

Figure: Create the connection in Lookup Transformation.



Figure: Mapping between the columns in Lookup Transformation.


This Lookup Transformation has also two outputs. Lookup Match Output is identical so we do not need to do anything with that records. For update the changed records we need to add another transformation name OLE DB Command. We add the transformation and connect it to the Lookup No Match Output of the newly added lookup transformation.

To configure the OLE DB Command double click on it and it will open the Advanced Editor window. In Connection Managers tab change the Connection manager property.



Figure: Configure Command Connection of OLE DB Command.

Now we need an update query for updating the necessary columns. We have used below query for our Person table and the column value will be supplied from OLE DB Command parameters.

UPDATE dbo.Person
SET
    FirstName = ?, 
    LastName = ?, 
    EmailId = ?, 
    PhoneNumber = ?, 
    Address = ?
WHERE Id = ?

In Component Properties tab click on the ellipse of the Sql Command properties and it will open the editor. See the below image and insert the query.



Figure: Configure Component Properties of OLE DB Command.

Now we need to map the parameters from Column Mappings tab. See the below image and map the columns with parameter. The parameters are appeared here in the order that the order in the update query.



Figure: Configure Column Mappings of OLE DB Command.

Finally click OK and close the window. Our package is ready now. The complete package image looks like the below image.


Figure: The full package structure 

Test the Package
Initially the Person table of the ‘SourceDatabase’ has two records and table of the ‘DestinationDatabase’ has empty. Below is the insert script:


INSERT INTO dbo.Person
(
    FirstName,
    LastName,
    EmailId,
    PhoneNumber,
    Address
)
VALUES('Abu','Zafor','kahiruzzaman@gmail.com','01726269938','Dhaka'),
('Mahabub','Alam','mahabub@jaxara.com','01791123456','Dhaka')
GO

When we execute the package it inserts two records in destination database. Now we will insert another new record and update an existing record in source database and execute the package again and observe what will happen. Below is the script:


INSERT INTO dbo.Person
(
    FirstName,
    LastName,
    EmailId,
    PhoneNumber,
    Address
)
VALUES('Ahsan','Uddin','ahsan@gmail.com','01726123456','Dhaka')

UPDATE dbo.Person
SET
    Address = 'Uttara, Dhaka'
WHERE Id = 1

After executing the package we observe the following change in the destination database.


Figure: Final result after package execution.


Conclusion
In this article we saw that for insert and update operation we can use SSIS Lookup Transformation we can also use slowly changing dimension transformation. We can also use any kind of SSIS transformation like Derived Column, Data Conversion etc before load the data in destination table.

12 comments:

  1. Nice and informative! Keep it up!!

    ReplyDelete
  2. Thank you for this post, it has helped me with a project I am working on.

    ReplyDelete
  3. Thank you so much for this post. It helped me create my first SSIS package.

    ReplyDelete
  4. thank you ..this post is very much useful for me...i am new to SSIS ...for new learners this post is very much useful

    ReplyDelete
  5. Very good and clear example. Very helpful. Thanks!

    ReplyDelete
  6. Thanks, very helpful and efficient compared to merge join.

    ReplyDelete
  7. Thanks for this, I had been using SAP data services for the last 2 years and was a bit rusty with SSIS. This post really helped

    ReplyDelete
  8. I think this is a good first step in learning how to do simple ETL tasks, but could be sped up pretty significantly if OleDBCommand tasks are avoided. OleDBCommand acts on each individual record, and a much faster solution would be to store the updated records into a staging table and then merge the update table and target table using an ExecuteSQL task to perform a MERGE of the two tables outside of the Data Flow.

    Thanks for the post though, and happy coding!

    ReplyDelete
    Replies
    1. Thank you mrgnwatson for pointing me to another approach. Definitely I will try this approach also. Thanks again.

      Delete