BIML Part III – Using BIML Script and Custom Meta Data Tables for Dynamic ETL Frameworks

 
Using BIML, BIML Script, and existing metadata such as INFORMATION_SCHEMA.TABLES to dynamically generate SSIS packages can be very useful (see my previous post).  But it can be even more useful to create your own metadata tables and use them as the drivers for packages created via BIML Script.
Scenario
Let’s say you know that you are going to have to keep making a large number of SSIS packages that perform similar, but slightly different, functions.  Maybe they all use different source and destination tables with different queries, and some of them need a transform or two while others don’t.  You know that in the future you’re going to have to make countless more of these packages for different feeds, but you don’t yet know the source queries or exactly which transforms each one will need.
Setup
Let’s assume we currently know of three tables we need to populate, one of which needs a derived column. In the future, the other two might also need this derived column, and we know that more tables will follow the same pattern. We’ll start with three source tables and three destination tables with the “Dest” suffix.  The destination tables have a couple of column differences from the source tables.  We also have a table called BIMLMetaData which will act as the driver for our package generation, much like INFORMATION_SCHEMA.TABLES did in the previous post’s example. You can download the CREATE script here.
The script assumes an existing db called Sandbox, but change it as you like.  It will also insert a few records into the source tables and into our meta data table.
Custom Metadata Table
So now we should have 7 tables – Employee, EmployeeDest, Product, ProductDest, Store, StoreDest, and BIMLMetaData.  Each row in BIMLMetaData will store the information specific to one package, which will be combined with the elements common to all of the packages we create in our BIML file.  In BIMLMetaData we have:
  • Id – simple Identity column
  • SourceTableName – The name of our source table
  • TargetTableName – The name of our target table
  • SourceSQLQuery – The full SQL query that we want to use as the source for our package
  • NeedsTransform – A bit field to determine if this package requires a transform (in this case a derived column)
  • IsActive – A bit field to determine if we need to generate a package for this row
Three rows should have been inserted into BIMLMetaData by the script above – one each for Product, Employee, and Store.  Note that all three are set to IsActive = 1 and only Employee is set to NeedsTransform = 1.
The SourceSQLQuery values are just simple SELECT statements that will be used as the sources in our packages.
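If you just want to follow along without downloading the script, a minimal sketch of what the table and its seed rows might look like is below (the column sizes and the sample queries here are my assumptions – the downloadable CREATE script is the authoritative version):
CREATE TABLE dbo.BIMLMetaData (
    Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- simple identity column
    SourceTableName NVARCHAR(128) NOT NULL,
    TargetTableName NVARCHAR(128) NOT NULL,
    SourceSQLQuery NVARCHAR(MAX) NOT NULL,     -- full source query for the package
    NeedsTransform BIT NOT NULL,               -- 1 = add the derived column
    IsActive BIT NOT NULL                      -- 1 = generate a package for this row
);
GO
-- One row per package we want BIML Script to generate (queries are illustrative)
INSERT INTO dbo.BIMLMetaData (SourceTableName, TargetTableName, SourceSQLQuery, NeedsTransform, IsActive)
VALUES ('Product',  'ProductDest',  'SELECT * FROM dbo.Product',  0, 1),
       ('Employee', 'EmployeeDest', 'SELECT * FROM dbo.Employee', 1, 1),
       ('Store',    'StoreDest',    'SELECT * FROM dbo.Store',    0, 1);
GO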
BIML Code
First, in the BIML code we will set up our connections and some variables within C#.  This is very similar to my previous post, except that the query in the GetDataTable call below now runs against our BIMLMetaData table instead of the INFORMATION_SCHEMA tables used in the previous example.  Also, as in that example, we begin a foreach loop over each row in the result set ‘MyDataTable’.  In the query we also add a WHERE clause for IsActive = 1, which allows us to “turn off” BIML regeneration for any packages we don’t want it to rebuild.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
       <Connections>
              <Connection Name="SandBox" ConnectionString="Provider=SQLNCLI11.1; Data Source=localhost;Initial Catalog=Sandbox;Persist Security Info=True;Auto Translate=False; Integrated Security=SSPI" />
       </Connections>
       <Packages>
              <#
              //Now we are in C#!

              string MyConnectionString = "Provider=SQLNCLI11.1; Data Source=localhost;Initial Catalog=Sandbox;Persist Security Info=True;Auto Translate=False; Integrated Security=SSPI";

              string TargetTableName;
              string TableSchema;
              string SourceTableName;
              int NeedsTransform;
              string SourceSQLQuery;

              DataTable MyDataTable;

              //Populate Data Table with values from BIMLMetaData where IsActive = 1
              MyDataTable = ExternalDataAccess.GetDataTable(MyConnectionString, "SELECT m.SourceTableName, m.TargetTableName, m.NeedsTransform, m.SourceSQLQuery FROM dbo.BIMLMetaData m WHERE m.IsActive = 1");

              /*Loop over each target table row in MyDataTable */
              foreach(DataRow row in MyDataTable.Rows){
                     TargetTableName = row["TargetTableName"].ToString();
                     TableSchema = "dbo";
                     SourceTableName = row["SourceTableName"].ToString();
                     NeedsTransform = Convert.ToInt32(row["NeedsTransform"]);
                     SourceSQLQuery = row["SourceSQLQuery"].ToString();
              #>
The following code snippets are contained within the above foreach loop.  For every record in the BIMLMetaData table we are going to create 1 package, named based on the source table name, count the existing rows in the target table (*Note: we don’t do anything with this count in this demo), and populate the OleDBSource in the Data Flow with the SourceSQLQuery from our meta data table.
              <Package Name="Extract <#=SourceTableName#>" ConstraintMode="Linear" ProtectionLevel="EncryptSensitiveWithUserKey">
              <!--ConstraintMode connects the tasks in order-->
                     <Tasks>
                            <ExecuteSQL Name="Get initial count" ConnectionName="SandBox">
                                   <DirectInput>
                                          SELECT COUNT(1) AS Total FROM <#=TargetTableName#>
                                   </DirectInput>
                            </ExecuteSQL>
                            <Dataflow Name="<#=SourceTableName#> Data Flow">
                                   <Transformations>
                                          <OleDbSource ConnectionName="SandBox" Name="<#=SourceTableName#> Source">
                                                 <DirectInput>
                                                        <#=SourceSQLQuery#> <!--Use SourceSQLQuery column from BIMLMetaData to populate the OleDbSource direct input query-->
                                                 </DirectInput>
                                          </OleDbSource>
Next, still contained within the foreach loop, we add another nugget of C# code, this time with an if control statement.  Using the value in the  BIMLMetaData.NeedsTransform column for each table, we can have BIML generate components or logic based on our settings in the meta data table.  
                                          <# if (NeedsTransform == 1)
                                          {#>
                                          <DerivedColumns Name="Transform <#=SourceTableName#> Data">
                                                 <Columns>
                                                        <Column Name="CreateDate" DataType="DateTime"> <!--Can do multiple derived columns.  Values are Expressions-->
                                                               GETDATE()
                                                        </Column>
                                                 </Columns>
                                          </DerivedColumns>
                                          <#}#>
Here I’m just adding a derived column with a GETDATE(), but you could use this technique for much more interesting things, such as choosing which type of Slowly Changing Dimension logic to apply.  With this type of setup, you can change the configuration and regenerate packages within minutes if a component is added or removed. Lastly, we just add the OleDB Destination, as I have covered in previous posts, along with our directives at the bottom.
                                          <OleDbDestination Name="Insert into Destination" ConnectionName="SandBox">
                                                 <ExternalTableOutput Table="<#=TableSchema#>.<#=TargetTableName#>"></ExternalTableOutput>
                                          </OleDbDestination>
                                   </Transformations>
                            </Dataflow>
                     </Tasks>
              </Package>
              <#}#> <!--End for each MyDataTable loop-->
       </Packages>
</Biml>
<!--Directives:-->
<#@ template language="C#" tier="2" #>
<#@ import namespace="System.Data" #>
<#@ import namespace="Varigence.Biml.CoreLowerer.SchemaManagement" #>
Results
Once we have created our BIML file (full code sample available here), we right-click it, select Generate SSIS Packages, and should get one package per record in the BIMLMetaData table.
 
What you should notice is that of the three packages generated, only the ‘Extract Employee’ package has our derived column component since it was the only record in BIMLMetaData to have NeedsTransform = 1.
Summary
In this demo we took a look at how you can create your own meta data tables to use in conjunction with BIML to build flexible and configurable SSIS frameworks.  Here we only used a few flags to generate packages with or without a derived column task, but the concept could easily be applied to much more complex logic – such as which SCD type a dimension should use, or how updated fact table data should be handled.  The thing to keep in mind is that BIML should just be “one tool in your toolbox”.  You want to be careful not to build a very flexible, yet overly complex, set of BIML files when it would be simpler to create a package the old-fashioned way.
Sample Files
Table CREATE script – CREATE_MetaData_Example_Tables.sql
Finished BIML file – BIMLScript_MetaDataTable.biml
Erik

 

BIML Part II – Adding BIML Script to Automate SSIS Package Generation

In my last post (http://www.erikhudzik.com/2016/08/09/introduction-to-business-intelligence-markup-language-biml/)
we went through a quick introduction to BIML – AKA Business Intelligence Markup Language.  We used it, and the help of BIDS Helper, to generate a very basic SSIS package simply from the XML based BIML code.  In this post, we are going to jump ahead to the really cool things that BIML can do by adding in the concept of BIML Script.  With BIML Script, we are going to very quickly create a separate extract and load package for all of Adventure Works DW tables just by using metadata.
To jump right ahead, all demo files and samples are at the end of the post.

 

BIML Script

 

Along with BIML proper, our friends at Varigence have also defined a scripting language that can be used within BIML files, called BIML Script.  BIML and BIML Script are somewhat like HTML and ASP.NET.  Using BIML, BIML Script, and BIDS Helper to generate the packages, we can take one BIML file and generate multiple SSIS packages for different meta data with the exact same design pattern.  You can probably already start to imagine the possibilities.

 

I’m going to jump right into the BIML Script part of things, so please see my previous post linked above to get caught up.  First some light housekeeping.  In order to  properly use BIML Script, we need to import some references just as we would in other languages.  We also need to define which .Net language we want to use – C# or VB.Net.  We will be using C# for this demo.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<!--Directives-->
<#@ template language="C#" #> <!--VB.Net or C#-->
<#@ import namespace="System.Data" #> <!--Imported namespaces to give us access to some BIML Script methods-->
<#@ import namespace="Varigence.Biml.CoreLowerer.SchemaManagement" #>
</Biml>
 

 

The <#@ #> tags are called “Directives” and actually can be included at the beginning or the end of our BIML file because the compiler finds them first wherever they are.  I
like to keep them at the bottom because I think it keeps our BIML files a little cleaner.

 

Now the magic of BIML Script is the ability to embed .Net code “nuggets” inside of the BIML.  We embed these code nuggets within <# #> tags as shown below.  As you can see, within the BIML file we can jump in and out of C#.  Below we jump to
C# and define some variables. Unfortunately, at this point Intellisense gets confused with the mixing of languages, so we lose a lot of syntax highlighting going forward.  I will try to indent and italicize the C# nuggets for clarity.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <Connection Name="EriksDB" ConnectionString="Data Source=localhost;Initial Catalog=AdventureWorks2014;Provider=SQLNCLI11.1;Persist Security Info=True;Auto Translate=False; Integrated Security=SSPI" />
    <Connection Name="EriksDestDB" ConnectionString="Data Source=localhost;Initial Catalog=AdventureWorks2014Destination;Provider=SQLNCLI11.1;Persist Security Info=True;Auto Translate=False; Integrated Security=SSPI" />
  </Connections>
  <Packages>
    <#
    //Now we are in C#!
    //Create some variables

    string MyConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorks2014;Provider=SQLNCLI11.1;Persist Security Info=True;Auto Translate=False; Integrated Security=SSPI";
    string DestinationTableName;
    string TableSchema;
    string SourceTableName;
    DataTable MyDataTable;
    #>
    <!--Back in BIML XML-->

  </Packages>
</Biml>

 

As you can see, we can jump back and forth between BIML/XML and BIML Script C# code just by using the <# #> tags.  The important takeaway here is that the embedded BIML Script will be “run” when you compile your BIML file and will be used in the expansion of the resulting BIML file – which is then compiled into your SSIS packages.
You could use BIDS Helper to compile the above BIML, but it would not produce anything yet, since no packages are defined.

 

Setting Up Our Demo Environment

 

Now that we have a basic idea of BIML Script, let’s use it to dynamically create a bunch of simple extract and load packages.  For this demo we will use the AdventureWorks 2014 DW sample, as well as a modified version of it for our destination db that removes constraints to make things easier.  Here is the original AdventureWorks 2014 DW db backup, and here is the modified version.  First restore the .bak file from CodePlex to create the AdventureWorks2014 db on your test server, then run the CREATE_AW2014Destination.sql file to create a new db called AW2014Destination with all empty tables.

 

Using BIML Script to Dynamically Create Multiple Packages

 

We will walk through this piece by piece, and then summarize at the end, so please bear with me on the code snippets.  Below is a continuation of the C# snippet from above, which is embedded within the BIML <Packages> tag.  What we added is a GetDataTable call that uses a SQL query to pull a list of tables from INFORMATION_SCHEMA.TABLES.  Some tables with hierarchy datatypes were filtered out for simplicity in this demo.

 

We then set up a foreach loop to loop over each row (i.e. each table record from the meta data).  Then, using the #> tag, we close out of the C# nugget and jump back to BIML XML with a <Package> tag.  Notice that within the <Package> tag, the Name attribute contains <#=SourceTableName#>; using the <#= … #> syntax, we can reference variables from previous C# BIML Script nuggets – in this case, the SourceTableName variable.

 

Eagle-eyed readers will notice that in the snippet below there is no closing “}” for the foreach loop. We’ll get to that shortly!

 

<Packages>
<#
//Now we are in C#!

string MyConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorks2014;Provider=SQLNCLI11.1;Persist Security Info=True;Auto Translate=False; Integrated Security=SSPI";

string DestinationTableName;
string TableSchema;
string SourceTableName;
DataTable MyDataTable;

//Populate Data Table with list of tables
MyDataTable = ExternalDataAccess.GetDataTable(MyConnectionString, "SELECT t.TABLE_CATALOG, t.TABLE_SCHEMA, t.TABLE_NAME FROM INFORMATION_SCHEMA.TABLES t WHERE t.TABLE_SCHEMA <> 'dbo' AND t.TABLE_TYPE = 'Base Table' AND t.TABLE_NAME NOT LIKE '%Document%' AND t.TABLE_NAME <> 'Employee'");

/*Loop over each target table row in MyDataTable */
foreach(DataRow row in MyDataTable.Rows){
        TableSchema = row["TABLE_SCHEMA"].ToString();
        SourceTableName = row["TABLE_NAME"].ToString();
#>

<Package ConstraintMode="Linear" Name="Extract <#=SourceTableName#>" ProtectionLevel="EncryptSensitiveWithUserKey"> <!--ConstraintMode connects the tasks in order-->

 

 

The key thing to realize with the above snippet is that everything after the foreach loop is STILL contained in the foreach loop – including the BIML XML.  This means that for each row in MyDataTable, when the BIML file is expanded, everything following the foreach { is repeated. The full listing below shows this: everything from the <Package> tag through </Package> is repeated for every row in MyDataTable, and you will finally notice the closing “}” bracket just after the </Package> tag.

 

<Packages>
<#
//Now we are in C#!

string MyConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorks2014;Provider=SQLNCLI11.1;Persist Security Info=True;Auto Translate=False; Integrated Security=SSPI";
string DestinationTableName;
string TableSchema;
string SourceTableName;
DataTable MyDataTable;

//Populate Data Table with list of tables
MyDataTable = ExternalDataAccess.GetDataTable(MyConnectionString, "SELECT t.TABLE_CATALOG, t.TABLE_SCHEMA, t.TABLE_NAME FROM INFORMATION_SCHEMA.TABLES t WHERE t.TABLE_SCHEMA <> 'dbo' AND t.TABLE_TYPE = 'Base Table' AND t.TABLE_NAME NOT LIKE '%Document%' AND t.TABLE_NAME <> 'Employee'");

/*Loop over each target table row in MyDataTable */
foreach(DataRow row in MyDataTable.Rows){
        TableSchema = row["TABLE_SCHEMA"].ToString();
        SourceTableName = row["TABLE_NAME"].ToString();
#>

<Package ConstraintMode="Linear" Name="Extract <#=SourceTableName#>" ProtectionLevel="EncryptSensitiveWithUserKey">
<!--ConstraintMode connects the tasks in order-->
  <Tasks>
    <ExecuteSQL Name="Truncate <#=SourceTableName#>" ConnectionName="EriksDestDB">
      <DirectInput>
        TRUNCATE TABLE <#=TableSchema#>.<#=SourceTableName#>
      </DirectInput>
    </ExecuteSQL>

    <Dataflow Name="DFT Extract">
      <Transformations>
        <OleDbSource ConnectionName="EriksDB" Name="<#=SourceTableName#> Source">
          <DirectInput>
            SELECT *
            FROM <#=TableSchema#>.<#=SourceTableName#>
          </DirectInput>
        </OleDbSource>
        <OleDbDestination Name="Insert <#=SourceTableName#>" ConnectionName="EriksDestDB" KeepIdentity="true">
          <ExternalTableOutput Table="<#=TableSchema#>.<#=SourceTableName#>"></ExternalTableOutput>
        </OleDbDestination>
      </Transformations>
    </Dataflow>
  </Tasks>
</Package>

<#}#> <!--End for each MyDataTable loop-->
</Packages>

 

The whole BIML file for this demo can be downloaded here.  As a very brief review (see my previous post for more) of what the BIML above is actually doing: it uses <Package></Package> tags to represent each package.   Within that we have <Tasks>, which are similar to the Control Flow.   Here we have an <ExecuteSQL> tag to TRUNCATE the current table in the loop.   Then lastly we have the <Dataflow> tag, which contains a source and destination component based on the current table name within the foreach loop.  The table names and schemas are represented as <#=TableSchema#>.<#=SourceTableName#> to access the C# variables as described above.

 

Now all we have to do is right-click our BIML file (in my screenshot, BimlScript_2.biml) and select Generate SSIS Packages (remember, you need the free BIDS Helper add-in for this). The results can be seen in the below screenshot – dozens of distinct SSIS packages, one for each Adventure Works table from the meta data.

 

 

The resulting .dtsx packages should look like the below –

 

 

Summary and Next Steps

 

In this example we built on the basics of BIML from my first post on the topic and introduced the concept of BIML Script – C# or VB.Net code nuggets embedded within the BIML XML itself. Using these embedded code snippets, we gain the extremely powerful ability to generate large numbers of ETL packages dynamically with very few lines of code.  This demo required less than 75 lines of code in the .biml file and only minutes to write, whereas manually creating all of these – even simple – SSIS packages would have taken a few hours.
 

 

In upcoming posts we will continue to build on the concept
of dynamic SSIS package generation and try to address some tips and common
pitfalls.

 

Sample Files:

 

Adventure Works 2014 DW .bak – Adventure Works DW 2014 Full Database Backup.zip

 

Modified AdventureWorks db create script for destination – CREATE_AW2014Destination.sql
Final BIML file – Biml_Script2.biml

 

Erik

 

 

 

 

Introduction to Business Intelligence Markup Language (BIML)

Business Intelligence Markup Language, or BIML, is a dialect of XML that describes Business Intelligence objects such as ETL Packages, Cubes, and even Facts and Dimensions (here we will focus mainly on ETL Packages).  This post will serve as an
intro to this very powerful (and mostly free!) BI development tool.  In this post we will just create a singular SSIS package, but once we dive deeper into BIML’s capabilities we will be able to build dynamic and automated ETL design patterns.  I hope to turn this into a series of posts on BIML.
Unless otherwise noted, we will assume SQL Server 2014 and VS 2012, although much of this applies at least back to SQL Server 2008.  
Getting Started
First, in order to use BIML, you will need a tool to translate the actual BIML code into the desired BI object.  There are two main options for compiling BIML code – the free Visual Studio add-on, BIDS Helper, and the proprietary IDE, Mist, by Varigence.  We will focus on FREE 🙂
  • Assuming you already have SQL Server and Visual Studio installed, download and install the awesome VS add-in BIDS Helper (while there, check out the other great features it adds).
  • Once BIDS Helper is installed, if you start a new SSIS project, you will now have a new option when you right-click on the project in the solution explorer.
Now you will get a new file called BimlScript.biml under the “Miscellaneous” folder in the solution explorer.  If you open the file, you’ll see the XML root node of your new BIML file.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
</Biml>
Before going any further, let’s set up our demo tables, Employee and EmployeeDest – both with identical columns – and insert a few rows.
/****** Object:  Table [dbo].[Employee]    Script Date: 11/1/2015 4:35:35 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Employee](
       [EmployeeId] [int] IDENTITY(1,1) NOT NULL,
       [FirstName] [nvarchar](50) NULL,
       [LastName] [nvarchar](100) NULL,
       [Notes] [nvarchar](255) NULL,
       [CreateDate] [datetime] NULL,
 CONSTRAINT [PK_Employee] PRIMARY KEY CLUSTERED
(
       [EmployeeId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object:  Table [dbo].[EmployeeDest]    Script Date: 11/1/2015 4:35:43 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[EmployeeDest](
       [EmployeeId] [int] IDENTITY(1,1) NOT NULL,
       [FirstName] [nvarchar](50) NULL,
       [LastName] [nvarchar](100) NULL,
       [Notes] [nvarchar](255) NULL,
       [CreateDate] [datetime] NULL,
 CONSTRAINT [PK_EmployeeDest] PRIMARY KEY CLUSTERED
(
       [EmployeeId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO [dbo].[Employee]
           ([FirstName]
           ,[LastName])
     VALUES
           ('Erik', 'H.'),
           ('Jeff', 'K.'),
           ('Ed', 'B.')
GO
Generate a Simple Package
We will start simple and just create a single package, but you can take a look at the vast array of possibilities at the Varigence website (which is unfortunately always painfully slow) – https://www.varigence.com/Documentation/Language/Element/AstPackageNode.
As you would with an SSIS package made via the standard UI, we will start by setting up our connection managers.  We do this with a <Connections> and <Connection> tag as seen below.  As you can see, thanks to BIDS Helper, we also have some intellisense to help us along.
  <Connections>
    <Connection Name="MyConnection" ConnectionString="Data Source=localhost;Initial Catalog=Sandbox;Provider=SQLNCLI11.1; Integrated Security = SSPI;" />
  </Connections>
With our connection defined, we now want to add the BIML to generate our actual package(s).  As with connections, we start with a <Packages> node, then <Package>, and within <Package> we have <Tasks>.  The <Tasks> tag loosely  translates to the Control Flow tab of an SSIS package in the UI.  So all of the nodes underneath a task, such as ExecuteSQL as shown below, would be Control Flow  objects.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <Connection Name="MyConnection" ConnectionString="Data Source=localhost;Initial Catalog=Sandbox;Provider=SQLNCLI11.1; Integrated Security = SSPI;" />
  </Connections>
  <Packages>
    <Package Name="MyFirstPackage" ConstraintMode="Linear"> <!--Name and attributes of package. Linear tells BIML to execute the components in sequence.-->
      <Tasks> <!--"Tasks" are roughly equivalent to components on the Control Flow-->
        <ExecuteSQL Name="Select Employees" ConnectionName="MyConnection"> <!--This would be on the Control Flow-->
          <DirectInput>
            SELECT COUNT(1) AS TotalEmployees FROM Employee
          </DirectInput>
        </ExecuteSQL>
      </Tasks>
    </Package>
  </Packages>
</Biml>
So if we take the above code and put it in our BimlScript.biml file (assuming we have a DB named Sandbox and the sample tables created), we can then generate our first package.  Right-click on the BIML file and select “Generate SSIS Packages” as shown below.
Assuming no errors, you should now have an SSIS package called  MyFirstPackage.dtsx which looks like the below:

 

 
Simple, and useless!
Adding a Data Flow
Now let’s add some Data Flow components to our BIML file.  As you might expect, within the <Tasks> tag, we now would add a <DataFlow> tag to add a Data Flow to our SSIS Control Flow.  In BIML, all of the components within a Data Flow are called “Transformations” and thus go in… a <Transformations> tag.  Here’s a quick review:
In order to add a Data Flow and an OleDbSource to our package, within the <Tasks>  tag, we add
<Dataflow Name="Extract Employee Data">
          <Transformations>
            <OleDbSource Name="Employee Source" ConnectionName="MyConnection">
              <DirectInput>
                SELECT [EmployeeId]
                ,[FirstName]
                ,[LastName]
                ,[Notes]
                ,[CreateDate]
                FROM [Sandbox].[dbo].[Employee]
              </DirectInput>
            </OleDbSource>
          </Transformations>
 </Dataflow>
after our <ExecuteSQL> tag.  You’ll notice that within the <OleDbSource> node there is another node for <DirectInput>.  This is exactly what would go in the OLE  DB Source Editor SQL Command box in the SSIS UI.  This is a good point to note that, as with all XML, BIML IS CASE SENSITIVE.
We could generate the package again at this point, and successfully re-create the MyFirstPackage.dtsx package (note that BIDS Helper helpfully will ask you if you want to overwrite an existing package when you re-generate it), but it wouldn’t do much yet.
Add a Derived Column and OLEDB Destination
For the last part of this demo, let’s add a simple Derived Column transform and then our destination.  So after our </OleDbSource> tag, we can add a <DerivedColumns> tag.  As you can see from the screenshot, the intellisense lists the wide variety of components that you can use with BIML.
Within the <DerivedColumns> tag, we can have <Columns> and then each individual <Column>, which contains the value that would go into the Derived Column component’s Expression box in the SSIS UI.  You can add multiple columns in one <DerivedColumns> tag.  Also note that in this case I set ReplaceExisting="true", just as you can in the UI. At this point, our full BimlScript file should look like this:
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <Connection Name="MyConnection" ConnectionString="Data Source=localhost;Initial Catalog=Sandbox;Provider=SQLNCLI11.1; Integrated Security = SSPI;" />
  </Connections>
  <Packages>
    <Package Name="MyFirstPackage" ConstraintMode="Linear"> <!--Name and attributes of package-->
      <Tasks> <!--"Tasks" are roughly equivalent to components on the Control Flow-->
        <ExecuteSQL Name="Count Employees" ConnectionName="MyConnection"> <!--This would be on the Control Flow-->
          <DirectInput>
            SELECT COUNT(1) AS TotalEmployees FROM Employee
          </DirectInput>
        </ExecuteSQL>
        <Dataflow Name="Extract Employee Data">
          <Transformations>
            <OleDbSource Name="Employee Source" ConnectionName="MyConnection">
              <DirectInput>
                SELECT [EmployeeId]
                ,[FirstName]
                ,[LastName]
                ,[Notes]
                ,[CreateDate]
                FROM [Sandbox].[dbo].[Employee]
              </DirectInput>
            </OleDbSource>
            <DerivedColumns Name="Set Columns">
              <Columns>
                <Column Name="CreateDate" DataType="DateTime" ReplaceExisting="true"> <!--The value inside the <Column> tags is what would be in the Expression box on a Derived Column component-->
                  GETDATE()
                </Column>
                <Column Name="Notes" DataType="String" ReplaceExisting="true">
                  "Added via ETL"
                </Column>
              </Columns>
            </DerivedColumns>
            <OleDbDestination Name="Insert Employees" ConnectionName="MyConnection">
              <ExternalTableOutput Table="dbo.EmployeeDest"></ExternalTableOutput> <!--Destination target table-->
            </OleDbDestination>
          </Transformations>
        </Dataflow>
      </Tasks>
    </Package>
  </Packages>
</Biml>
And if we re-generate the package, and run it, hopefully it’s all green for you also.
Summary and Next Steps
Obviously this post just scratches the surface of BIML and doesn’t even begin to touch on its applications for dynamic automated development.  But like with everything, with BIML it’s good to start small and build on what you’ve learned.  In this post we covered the basic syntax of BIML, and created a very simple SSIS package to move data from one table to another with a simple transformation.
One of the best things about BIML is that packages created with it are identical
to any package created in the SSIS UI – other than maybe straighter lines.  If you generate packages via BIML, a future developer does not need to know a thing about BIML to do maintenance work on your BIML generated package.  
In future posts, I will cover more components and uses for BIML and the added power that BIMLScript – .Net code embedded within BIML – gives us to generate
true dynamic packages.  
Erik

Have your SSIS Fast Load (Bulk Load) and Row by Row Error Messages too

 
SSIS OLEDB Destination
In the SSIS OLEDB Destination component there are several Data Access options to select from.  For this post we will only be discussing “Table or view” and “Table or view – fast load”.
The difference between these two options is that “Table or view” effectively executes one SQL command for each and every row that you are loading into your destination, while “Table or view – fast load” effectively uses a BULK INSERT command to insert the data from the pipeline.   For an under-the-hood look at the differences, please see Jamie Thomson’s post here – http://consultingblogs.emc.com/jamiethomson/archive/2006/08/14/SSIS_3A00_-Destination-Adapter-Comparison.aspx
 When you choose the fast load option, you are also given a few other options to set.  Rows per batch and Maximum insert commit size have always confused me because they sound similar, so I dug around a bit for the definitions.
Since fast load uses a BULK INSERT, Rows Per Batch and Maximum Insert Commit Size are actually arguments for the BULK INSERT operation.  Maximum Insert Commit Size actually equates to the BATCHSIZE argument and Rows Per Batch equals… ROWS_PER_BATCH.  See this post on MSDN for a bit more. Knowing that, we can define the OLEDB Destination fast load options as such from BOL:
Rows Per Batch (ROWS_PER_BATCH)
“Indicates the approximate number of rows of data in the data file. By default, all the data in the data file is sent to the server as a single transaction, and the number of rows in the batch is unknown to the query optimizer. If you specify ROWS_PER_BATCH (with a value > 0) the server uses this value to optimize the bulk-import operation. The value specified for ROWS_PER_BATCH should be approximately the same as the actual number of rows.”
Maximum Insert Commit Size (BATCHSIZE)
“Specifies the number of rows in a batch. Each batch is copied to the server as one transaction. If this fails, SQL Server commits or rolls back the transaction for every batch. By default, all data in the specified data file is one batch”
So by changing the Maximum Insert Commit Size, we can change the number of rows that will be committed at one time.  SSIS will continue to BULK INSERT batches of this size until all of the rows in the pipeline have been inserted (or failed).  And, if I understand the BOL entry correctly, Rows Per Batch is actually just an optimizer hint to get maximum performance out of the BULK INSERT operation.  That is a topic for another discussion!
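As a point of reference, here is a rough sketch of the two arguments in a stand-alone T-SQL BULK INSERT (the file path and format options are made up for illustration; SSIS builds the equivalent bulk operation for you when you pick fast load):
-- Hypothetical example: commit every 10,000 rows as its own transaction.
-- BATCHSIZE is what the OLEDB Destination exposes as Maximum Insert Commit Size.
BULK INSERT dbo.FactClaim
FROM 'C:\Loads\FactClaim.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', BATCHSIZE = 10000);

-- Hypothetical example: no batching, just the Rows Per Batch optimizer hint
-- telling the server roughly how many rows the file contains.
BULK INSERT dbo.FactClaim
FROM 'C:\Loads\FactClaim.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', ROWS_PER_BATCH = 1000000);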
Why would you ever NOT use fast load?
With the fast load option, if ANY row in a transaction fails (constraints, PK violations, etc.) that entire transaction will fail to commit.  This means that all of the rows that were part of that BULK INSERT operation – the number of which is set by the Maximum Insert Commit Size – will fail and/or be passed to the OLEDB Destination error row path. Unfortunately, this means that if you leave the default value for Maximum Insert Commit Size (2147483647) and try to insert 1 million records, and 1 record fails a constraint, all 1 million records will be sent to the error output.  And even worse, you can’t get an Error Description from the fast load/BULK INSERT operation.  So you would get 1 million records into your error pipeline, not know what the error was, and not know which row caused the error.
Getting the benefits of fast load while still capturing error rows and descriptions
To get the benefits of a fast load, but still get the detailed error information of the non-fast-load row-by-row insert, you can chain two OLEDB Destinations together and change the Maximum Insert Commit Size.
In the screen shot above, we are using the first OLEDB Destination with the fast load option into our FactClaim fact table and we have the Maximum Insert Commit Size set to 10,000 rows.  If 1 row in this batch fails, those 10,000 rows will go to the row by row insert – but if none of those 10,000 rows have errors, they will be committed using BULK INSERT via the fast load.  So if we were trying to load 1 million records, we would of course end up with 100 BULK INSERT transactions (assuming no errors), but depending on your required design this may be more acceptable than having the entire 1 million records fail or being forced to use non-fast load.  
A side benefit of changing the Maximum Insert Commit Size is that it can help to manage tempdb and log space because smaller batches are committed at one time, and thus the whole 1 million rows would not need to be rolled back if the last row fails.
Getting the exact error row and error description 
Once you send the smaller set of rows from the failed batch to the second OLEDB Destination with non-fast load, SSIS will try to insert those rows one at a time.  In this way, only the exact rows that fail the insert will be sent to this second OLEDB Destination’s error output, and only those rows go uninserted. We can then use a script component to add an ErrorDescription column and capture the actual error description with code similar to the below.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Look up the text of the error code that the OLEDB Destination attached to this row
    Row.ErrorDescription = ComponentMetaData.GetErrorDescription(Row.OLEDBDestination1ErrorCode);
}
Finally we can get a row count of the actual errors, and store those error rows, with error description, to a flat file or a table.
By using this method, we can get most of the benefit of using the fast load option (fast loading data with 0 errors in one BULK INSERT would likely be faster), but still be able to capture the exact error row and description.

First.

Intelligent Business Intelligence to come.