Thursday, June 21, 2007

Broken Promises of Drag-and-Drop Programming in SSIS

I'll come right out and say it: I am no fan of drag-and-drop programming. Perhaps I've been emotionally scarred by FrontPage 2000 or something? This doesn't mean that I don't appreciate the ability to drop a command button on a form, or create an HTML table with a pencil. But it does mean that I don't like (or appreciate) companies telling me that their drag-and-drop environments will 'jump start' me or 'do all the dirty work', because inevitably, I find myself in a position where I have to either (a) debug the generated code, (b) work around an undesired behavior, or (c) write the code myself anyway.

Take SSIS for example.

I am in the process of a large ETL project involving VFP and SQL Server 2005. We are taking data from SQL Server and populating VFP free tables (for re-distribution with our installed apps). Yes, you read right: SQL Server back to FoxPro.

So the company, long ago (and maybe in a galaxy far, far away), decided to use SSIS as the ETL tool of choice. Now, everything is in SSIS and a lot of time and effort has been put into the endeavor. With scores of packages extracting data from a variety of sources, SSIS has proved reasonably useful from a management and execution point-of-view. But I have to tell you, writing complex SSIS packages is no breeze -- and the whole 'drag-and-drop' approach is seriously getting in the (read: my) way.

I took an inventory of all the work I needed to do for this project. Here's the skinny:


The extracted data can have one or multiple sources (spanning multiple SQL Server databases) and destinations (each with a different FoxPro data schema and each potentially on different computers on the Network). I also need to be able to call packages from other packages. As a result, a lot of time was spent on setting up variables (and configuring the child packages to receive them from their parent). These variables are used extensively throughout the solution (in the connection manager and in various expressions for example -- and don't get me started on the expression builder!).


Here's were drag-and-drop programming really starts to fall short. My sources, almost every one of them (there are dozens of sources in the solution) use SQL statements -- instead of actual Tables (or views) -- to gather the data (using the OLE provider, not the native driver, by the way). This really isn't a big deal and in fact gives me a ton of flexibility, and I'm sure this approach is rather common.

For example, the FoxPro datasets do not support NULLs in any column (don't ask me why, and it's too late to fix it!). In the drag-and-drop world, one might drag a Derived Column tool, fight with the expression field, and 'replace' the NULLed column using an expression with some non-null value one at a time. Ugh. I tried this once and let me tell you: just build NULL elimination into the SQL statement using coalesce instead. Save yourself some aggravation and a headache. Also, while you're at it, do all your converts, trims, calculations in there as well. In my tests, performance was not affected by this approach (using a database with several million records). As I've mentioned many times before though, I'm not an expert on benchmarking. But you can write SQL much quicker than using that darned expression dialog box -- not to mention everything is right out in front of you in an SQL statement, and if you need to debug or experiment with the SQL, just copy and paste it into SSMS and off you go (try doing that with the Derived Column tool).


Why is it that, with the entire toolbox of transformations provided by SSIS, I am always forced into using the Script transformation, the Union All as a terminator (so my lookups don't complain about error output), and non-standard tools (such as the Lookup) to manipulate and filter data? I've already mentioned that Derived Columns and similar tools can be accomplished using good old fashioned SQL.


Sorry folks, I'm not going here. Forget it. I might be likely to say something I'll regret.


When looking at this integration project as a whole, I came to realize two important facts:

  • The core functionality of the package is hand-written code (the SQL, transformations, even setting up the variables);

  • And, that most of the time to put this together (about 60 hours of programming in total) was spent on getting SSIS to behave and 'flow'.

It turns out that drag-and-drop programming isn't really the answer IMO. Sure, it might get you off the ground, but I find that tools like SSIS just get in the way more than anything else.

So I have to ask: Why bother? If I had hand-coded this in VFP (or any other data-centric language), it would have been done last week, and it would have been just as easy to execute, maintain, and turn over. Especially if I had a framework to start off with. Hmmn...


Mr. Curious said...

Can you write the same package in .net?

Tod McKenna said...

I don't see why not, but my gut tells me it would be twice as difficult (I do a lot of work in C#, but I have no problem admitting that C# is not my 'first' language, or my second!).

I think a language like PHP could compete with VFP, mainly because cost and resource availability are non-issues.

Tod McKenna said...

Oh, and not to mention VFP tables are used for the destination. Using VFP avoids the need to use ODBC or other connection methods to write to the tables. It's all native... and FAST.