BIML Part III – Using BIML Script and Custom Meta Data Tables for Dynamic ETL Frameworks

 
Using BIML, BIML Script and existing metadata, such as INFORMATION_SCHEMA.TABLES, to dynamically generate SSIS packages can be very useful (See my previous post).  But what can be even more useful is to create your own metadata tables which can be used as the drivers for packages created via BIML Script.
Scenario
Let’s say that you have a scenario where you know that you are going to have to continue making a large number of SSIS packages in the future that will perform similar, but slightly different functions.  Maybe they all use different source and
destination tables with different queries, but also some of them have a
transform or two while others don’t.  You know that in the future you’re going to have to make countless more of these packages for different feeds, but you don’t know the source queries or which exact transforms each one will need.
Setup
Let’s assume we currently know of three tables we need to populate, one of which needs a derived column. In the future, the other two might also need this derived column, and we know that more tables will follow this same pattern. We’ll start with 3 source tables and 3 destination tables with the “Dest” suffix.  The destination
tables have a couple column differences from the source tables.  We also have a table called BIMLMetaData which will function as the driver for our package generation similar to how we used INFORMATION_SCHEMA.TABLES in the previous post’s example. You can download the CREATE script here.
The script assumes an existing db called Sandbox, but change it as you like.  It will also insert a few records into the source tables and into our meta data table.
Custom Metadata Table
So now we should have 7 tables – Employee, EmployeeDest, Product, ProductDest, Store, StoreDest, and BIMLMetaData.  In BIMLMetaData, each row we will store the
information specific to each package which will be combined with the common
elements of all the packages that we will create in our BIML file.  In BIMLMetaData we have:
  • Id – simple Identity column
  • SourceTableName – The name of our source table
  • TargetTableName – The name of our target table
  • SourceSQLQuery – The full SQL query that we want
    to use as the source for our package
  • NeedsTransform – A bit field to determine if
    this package requires a transform (in this case a derived column)
     
  • IsActive – A bit field to determine if we need
    to generate a package for this row
Three rows should have been inserted into BIMLMetaData from the script above, one for Product, Employee, and Store.  Note that they are all set to IsActive = 1
and Employee is set to NeedsTransform = 1.
The SQLSourceQuery values are just simple select statements which will be used in our packages.
BIML Code
First in the BIML code we will set up our connections and some variables within C#.  This is very similar to my previous post, except you will notice that highlighted in blue is a query against our BIMLMetaData table instead of INFORMATION_SCHEMA tables as in our previous example.  Also, as in that example, we are going to begin a foreach loop over each row in the result set ‘MyDataTable’.  In the query we also add a WHERE clause for IsActive = 1, this would allow us to “turn off” BIML regenerating any packages we don’t it to.
<Biml
xmlns=http://schemas.varigence.com/biml.xsd>
       <Connections>
              <Connection Name=SandBox ConnectionString=Provider=SQLNCLI11.1; Data
Source=localhost;Initial Catalog=Sandbox;Persist Security Info=True;Auto
Translate=False; Integrated Security=SSPI
/>
       </Connections>
       <Packages>
              <#

 

              //Now we are in C#!
                    
              string MyConnectionString =”Provider=SQLNCLI11.1; Data Source=localhost;Initial
Catalog=Sandbox;Persist Security Info=True;Auto Translate=False; Integrated
Security=SSPI”;  

 

              string TargetTableName;
              string TableSchema;
              string SourceTableName;
              int NeedsTransform;

        string SourceSQLQuery;
                    
              DataTable MyDataTable;
                 

 

       //Populate Data Table with values from BIMLMetaData where IsActive =1                    
MyDataTable = ExternalDataAccess.GetDataTable(MyConnectionString,”SELECT
m.SourceTableName,m.TargetTableName,m.NeedsTransform,m.SourceSQLQuery FROM
dbo.BIMLMetaData m WHERE m.IsActive = 1
“);
                    
                     /*Loop over each target table row in MyDataTable */

 

              foreach(DataRow row in MyDataTable.Rows){
                     TargetTableName = row[“TargetTableName”].ToString();
                     TableSchema = “dbo”;
                     SourceTableName = row[“SourceTableName”].ToString();
                     NeedsTransform = Convert.ToInt32(row[“NeedsTransform”]);

               SourceSQLQuery = row[“SourceSQLQuery”].ToString();

              #>
The following code snippets are contained within the above foreach loop.  For every record in the BIMLMetaData table we are going to create 1 package, named based on the source table name, count the existing rows in the target table (*Note: we don’t do anything with this count in this demo), and populate the OleDBSource in the Data Flow with the SourceSQLQuery from our meta data table.
<Package Name=Extract <#=SourceTableName#>
ConstraintMode=”Linear” ProtectionLevel=”EncryptSensitiveWithUserKey”>
<!–ConstraintMode connects the
tasks in order
–>
                    
       <Tasks>
              <ExecuteSQL Name=Get initial count ConnectionName=SandBox>
                     <DirectInput>
                           SELECT
COUNT(1) AS Total FROM
<#=TargetTableName#>
                                        
                     </DirectInput>
                                 
              </ExecuteSQL>
                           
       <Dataflow Name =<#=SourceTableName#> Data Flow>
                                 
              <Transformations>
                     <OleDbSource ConnectionName=SandBox Name=<#=SourceTableName#> Source” >
                           <DirectInput>
                           <#=SourceSQLQuery#> <!–Use SourceSQLQuery column
from BIMLMetaData to populate the OleDbSource direct input query
–>
                           </DirectInput>
                                               
                     </OleDbSource>
Next, still contained within the foreach loop, we add another nugget of C# code, this time with an if control statement.  Using the value in the  BIMLMetaData.NeedsTransform column for each table, we can have BIML generate components or logic based on our settings in the meta data table.  
<# if (NeedsTransform ==1)
       {#>
       <DerivedColumns Name=Transform <#=SourceTableName#> Data”>
              <Columns>
                     <Column Name=CreateDate DataType=DateTime > <!–Can do multiple derived
columns.  Values are Expressions
–>
                           GETDATE()
                     </Column>
              </Columns>                       
       </DerivedColumns>                        
       <#}#>
Here I’m just adding a derived column with a getdate(); but you could use this technique for much more interesting things, such as which type of Slowly Changing Dimension logic to apply.  With this type of a setup, you can configure
options and regenerate packages within minutes if a component is added or
removed. Lastly, we just add the OleDB Destination, as I have covered
in previous posts, along with our directives at the bottom.
<OleDbDestination Name=Insert into Destination ConnectionName=SandBox>
                                               
       <ExternalTableOutput Table=<#=TableSchema#>.<#=TargetTableName#>“></ExternalTableOutput>
                                               
</OleDbDestination>
                                        
                                        
</Transformations>
                                 
</Dataflow>
                          
</Tasks>
                    
</Package>
              <#}#> <!–End for each MyDataTable loop–>
</Packages>
      
</Biml>
<!–Directives:–>
<#@ template language =C# tier =2 #>
<#@ import namespace=System.Data #>
<#@
import namespace=Varigence.Biml.CoreLowerer.SchemaManagement #>
Results
Once we have created our BIML file (full code sample available here), we right click and Generate SSIS Packages and should get one package per record in the BIMLMetaData table.
 
What you should notice is that of the three packages generated, only the ‘Extract Employee’ package has our derived column component since it was the only record in BIMLMetaData to have NeedsTransform = 1.
Summary
In this demo we took a look at how you can create your own meta data tables to use in conjunction with BIML to build flexible and configurable SSIS frameworks.  Here we only used a few flags to generate packages with or without a derived column
task, but the concept could easily be applied to much more complex logic – such
as SCD Type of dimensions or what behavior updated Fact table data should use.  The thing to keep in mind is that BIML should just be “one tool in your toolbox”.  You
want to be careful not to build a very flexible, yet overly complex, set of BIML files when it would be simpler to create a package the old fashioned way.
Sample Files
Table CREATE script – CREATE_MetaData_Example_Tables.sql
Finished BIML file – BIMLScript_MetaDataTable.biml
Erik

 

Leave a Reply

Your email address will not be published. Required fields are marked *