SSIS vs Pentaho
Has anyone used both of these and can provide a good comparison? I am doing a school project, so the cost of SSIS isn't an issue; we already have a license for it.
Some background on what's going on: I will be downloading about 10 years of patent information into flat files. The result will be 2,080 delimited files. I want a way to load them into MS SQL Server all at once. Then I want to be able to append additional files to the DB as they are released.
Speed of the software doesn't bother me much as I can just let it run overnight. I am just looking for something with some flexibility, and more importantly fairly easy to use. I have never done a project like this before and will be learning how to do this from the boards.
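To make the "load everything once, then append new files as they are released" requirement concrete, here is a rough sketch of the bookkeeping involved, written in plain Python rather than in either tool. It is only an illustration under assumptions: the table names `patent_staging` and `loaded_files`, the `|` delimiter, the `*.txt` pattern, and the header row in each file are all hypothetical, and both tables are assumed to exist already. The connection can be any DB-API connection that uses `?` placeholders (pyodbc does, for SQL Server).

```python
import csv
from pathlib import Path


def load_new_files(conn, folder, columns, delimiter="|", pattern="*.txt"):
    """Load every file in `folder` that has not been loaded yet.

    Assumes two existing tables (hypothetical names):
      patent_staging - target table containing the given `columns`
      loaded_files   - one row per file already imported (file_name)
    Returns the list of file names loaded in this run.
    """
    cur = conn.cursor()

    # Find out which files earlier runs have already imported.
    cur.execute("SELECT file_name FROM loaded_files")
    already_loaded = {row[0] for row in cur.fetchall()}

    placeholders = ", ".join("?" for _ in columns)
    insert_sql = (
        f"INSERT INTO patent_staging ({', '.join(columns)}) "
        f"VALUES ({placeholders})"
    )

    loaded_now = []
    for path in sorted(Path(folder).glob(pattern)):
        if path.name in already_loaded:
            continue  # skip files from a previous run

        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle, delimiter=delimiter)
            next(reader, None)  # assumption: the first line is a header
            rows = [tuple(row) for row in reader if row]

        if rows:
            cur.executemany(insert_sql, rows)

        # Record the file and commit, so data and bookkeeping stay in step.
        cur.execute("INSERT INTO loaded_files (file_name) VALUES (?)", (path.name,))
        conn.commit()
        loaded_now.append(path.name)

    return loaded_now
```

Running the same function again after new files arrive only picks up the files it has not seen, which is the same idea you would build in either PDI or SSIS with a loop over a folder plus a log table.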
Have used both in real-life projects. I do prefer Pentaho (PDI) over SSIS because of its ease of use and flexibility. Do read a little on the subject before you start using it. There are a couple of excellent books on Kettle (PDI), or you could just read the Get Started guide in the Help menu of PDI. If you get stuck, the forum is a good place to ask, as is ##pentaho on IRC. What also helps a lot are the Samples that you can find in the Welcome Screen. I hope you enjoy it; I know I still do. Have been using it since 2006 and am always pissed when I have to use SSIS on some project :-)
PS: use the jTDS JDBC driver to connect to a SQL Server DB; it will save you some headaches.
Hope this helps.
After spending a couple of days developing an ETL package in both PDI and SSIS, I feel confident in saying that PDI is definitely more user friendly. The user interface alone is much cleaner, and it flows in a way that is very intuitive and therefore easy to use.