SSIS and MSSQL - Importing CSV with headings on 3rd row, While using 1st and 2nd row data as well

I have a bit of a conundrum on my hands. I have a package that imports csv files on a regular basis, now i have a added file to import, but the structure of this file is the challange.

With all the previous files, headers are all on line 1, and data below that no issues. The new file has 2 sets of Headers, Basically line 1 has 2 headers Level and Data, with Line 2 having the details for these 2 headers.

Line 3 has a new set of headers, and line 4 down has all the other data that is needed.

Example:

Month End       Level       
201501          CHEESE      
Region          Site        Pricing   Brand
Gauteng         Billys      100       Gouda
ECape           BeaconBay   150       Feta

Now the SSIS package i have loops through multiple foulders to find files, then imports into the correct tables, these new files will get a new table, but I am at a loss as to how to read them correctly. I need the Date, and Level to be pulled, and the data from row 4 down, with the headers from row 3.

My current package has a Loop for files to find all of them, adds the file path and name as variable (User::File) and its utilized for the dataflow.

But how do I process the file to allow reading of the file (Dynamically - The format will not change, and I can not get the source file adjusted before importing) and add the rows correctly with date and level to the table (Region_Data) within the DataFlow. I assume it must be something to setup on the Flat File Source? or is there another step to be included?

Thanks in advance for all and any assistance.

----With the help of @MiguelH I have this Setup currently. (Im using Visual Studio tools for applications (VB 2010)

And the Scripting used is:

#Region "Imports"
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
#End Region


<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent
    Dim StrMonthend As String
    Dim StrLevel As String
    Public Overrides Sub myInput_ProcessInputRow(ByVal Row As MyInputBuffer)
        do_output(Row.MyField)
    End Sub

    Public Sub do_output(ByRef data As String)
        Dim splitz() As String
        splitz = Split(data, ",")
        If splitz(0) <> "Month End" And splitz(0) <> "Region" Then
            With myoutputBuffer
                .AddRow()
                If UBound(splitz) < 2 Then
                    .MonthEnd = splitz(0)
                    .Level = splitz(1)
                    StrMonthend = splitz(0)
                    StrLevel = splitz(1)
                    .Brand = ""
                    .Description = ""
                    .Size = "0"
                    .VAT = "0"
                    .UnitsLY = "0"
                    .UnitsTY = "0"
                    .UnitsGrowth = "0"
                    .SalesInclLY = "0"
                    .SalesInclTY = "0"
                    .SalesInclGrowth = "0"
                    .SPInclLY = "0"
                    .SPInclTY = "0"
                    .SPInclGrowth = "0"
                    .Contrib = "0"
                    .BuyInd = "0"
                Else
                    .MonthEnd = StrMonthend
                    .Level = StrLevel
                    .Brand = splitz(0)
                    .Description = splitz(1)
                    .Size = splitz(2)
                    .VAT = splitz(3)
                    .UnitsLY = splitz(4)
                    .UnitsTY = splitz(5)
                    .UnitsGrowth = splitz(6)
                    .SalesInclLY = splitz(7)
                    .SalesInclTY = splitz(8)
                    .SalesInclGrowth = splitz(9)
                    .SPInclLY = splitz(10)
                    .SPInclTY = splitz(11)
                    .SPInclGrowth = splitz(12)
                    .Contrib = splitz(13)
                    .BuyInd = splitz(14)
                End If
            End With
        End If
    End Sub

    Public Overrides Sub PrimeOutput(ByVal Outputs As Integer, ByVal OutputIDs() As Integer, ByVal Buffers() As Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer)
        MyBase.PrimeOutput(Outputs, OutputIDs, Buffers)
    End Sub

    Public Overrides Sub CreateNewOutputRows()
    End Sub

End Class

If I Remove the CLSComplient line I get this: Error 01 With the CLSComplient line in or out I still get this error. Error 02

Answers


You could add another Connection Manager to that same file. Set it up to receive everything in the row as one column. Then do a conditional split on the rows (Maybe use a SUBSTRING to get 'Month End' or whatever would identify that header row you need.) So just the one row will go through. You can then write it into a object variable (Result set). To use it again you will need to run it in a For each loop just to assign those values to variables you can define.

So to give it in steps. 1. Create new connection manager(CM) using only one column. 2. Create new data flow(DF) task. 3. In DF create flat file source and use the new CM. 4. add conditional split just below. Split values by using header identifier. 5. write to result set destination. (should only receive the one row.) 6. Outside Data flow use For each container to assign values in the result set to variables you can use.

Hope you can make out what I mean.


Here's a simple scripting solution to your problem. The idea is to reformat the separate header and detail records into a single record. All files could be processed and output to an extendable text file. You would then read this output file to create the final data. Here's the script ..

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
Inherits UserComponent
Dim StrMonthend As String
Dim StrLevel As String
Public Overrides Sub myInput_ProcessInputRow(ByVal Row As myinputBuffer)
    '
    ' Add your code here
    '          
    do_output(Row.Myfield)
End Sub
Public Sub do_output(ByRef data As String)
    Dim splitz() As String
    splitz = Split(data, ",")
    If splitz(0) <> "Month End" And splitz(0) <> "Region" Then
        With myoutputBuffer
            .AddRow()
            If UBound(splitz) < 2 Then
                .MonthEnd = splitz(0)
                .Level = splitz(1)
                StrMonthend = splitz(0)
                StrLevel = splitz(1)
                .Region = ""
                .Site = ""
                .Pricing = "0"
                .Brand = ""
            Else
                '
                ' detail
                '
                .MonthEnd = StrMonthend
                .Level = StrLevel
                .Region = splitz(0)
                .Site = splitz(1)
                .Pricing = splitz(2)
                .Brand = splitz(3)
            End If
        End With
    End If
End Sub
Public Overrides Sub PrimeOutput(ByVal Outputs As Integer, ByVal OutputIDs() As Integer, ByVal Buffers() As Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer)
    MyBase.PrimeOutput(Outputs, OutputIDs, Buffers)
End Sub
Public Overrides Sub CreateNewOutputRows()
End Sub
End Class

You need to create an Output in the script transformation. I've set all output as STR. Note: Make sure "None" is in the SynchronoiusInputID"


********************* EXTRA BIT! *****************


Here are some extra screenshots to compare against your solution ... Note my column order is a little different to yours, but this is irrelevant!


Need Your Help

Building Protocol Buffers with Standalone NDK toolchain

android android-ndk protocol-buffers

I'm having trouble building Protocol Buffers for Android using a standalone NDK toolchain created with the $NDK/build/tools/make-standalone-toolchain.sh script. The problem I'm having is this:

How to avoid annoying error "declared and not used"

go

I'm learning Go but I feel it is a bit annoying that when compiling, I should not leave any variable or package unused.