Monday, July 02, 2007

My Blog Has Moved! Visit blog.todmeansfox.com

I am closing this blogger account, and moving operations to a new domain: http://blog.todmeansfox.com/

So please, take a moment to update your bookmarks and to point your RSS reader at the new feed: http://blog.todmeansfox.com/feed

Feel free to contact me (todatgrengamadotcom) if you have any comments or questions!

Tuesday, June 26, 2007

SSIS Packages and setting up Jobs

I promise not to turn todmeansfox into an SSIS blog! But I thought that as I continued development on my SQL Server back to FoxPro project, I would share any acquired wisdom along the way.

Last post, I discussed some of the problems I faced with the drag-and-drop nature of SSIS. In a nutshell, I had to do a lot (and I mean A LOT) of extra non-drag-and-drop work to get the packages running. I have now thoroughly tested the entire solution locally on my machine and have uploaded the packages to the server. A couple of notes:

Upload Packages


The first thing I had to do was upload my packages to the server. Log into Integration Services, click 'Stored Packages' in the treeview, and add the package to the appropriate folder (in my case, MSDB). Do this by right-clicking the folder and clicking 'Import Package'. I had several packages to upload (a single general package that calls multiple specific packages). The Import Package dialog is easy to use, but takes a minute to get used to. There are some non-standard interface elements (the tiny '...' button next to the disabled 'Protection level' text box is easy to miss, but very important to click).

Once uploaded, I went ahead and created a job using SQL Server Agent.

SQL Server Agent


Connect to the server through SSMS where you want to execute your package. Click SQL Server Agent, and then right-click Jobs and click 'New Job...'. The New Job dialog will open.

There are quite a few important elements on this screen, all of which I'm sure I would do a poor job explaining. So instead, here are the basic things I did to get my job going:


  • On the General page, entered a name and description

  • On the Steps page, added a series of new steps designed to execute my main package in several different ways (passing in a variety of parameters for testing purposes):

    • Enter a Step name, and select the SQL Server Integration Services type

    • Select the Package source (SQL Server) and the Server name. Pick an uploaded package from the Package list

    • At this point, decide how you want the step to log its output and what should happen when the step fails (the step's Advanced page covers both)

    • To get things up and running, I clicked on the 'Set values' tab to enter all relevant parameters (see Some Snags below)



  • On the Notifications page, I set myself up to receive an email when the job finishes



Some Snags


While setting up the job, I ran into a few snags that I thought I would pass along.

First on the list: Be sure to define and assign variables properly:
Property Path: \Package.Variables[Somevariable]
Value: The literal parameter (do not use quotes for character strings)

When you are satisfied with your job and have entered all required information, save it (click 'OK') and create a script. Simply right-click the job in the SQL Server Agent / Jobs treeview and click Script Job As - Create To - New Query Editor Window. You can now use this script to re-create the job. I find it infinitely easier to make minor modifications to the generated script (which is just some T-SQL) and re-create the job than to use the bulky and slow interface (which I had to do). One thing, though, caught me by surprise:

If you added a variable value that ends in a backslash, you need to escape it. The generated T-SQL script will treat the single backslash as an escape for the quote mark that delimits the command-line parameters. For example, assume your value is (without the quotes): "\\machine\folder\temp\". This will result in an error when executed because the last backslash is not escaped and will cause a quote mismatch on the command line. Instead, use something like "\\machine\folder\temp\\"

Hopefully someone, somewhere finds this useful!

Thursday, June 21, 2007

Broken Promises of Drag-and-Drop Programming in SSIS

I'll come right out and say it: I am no fan of drag-and-drop programming. Perhaps I've been emotionally scarred by FrontPage 2000 or something? This doesn't mean that I don't appreciate the ability to drop a command button on a form, or create an HTML table with a pencil. But it does mean that I don't like (or appreciate) companies telling me that their drag-and-drop environments will 'jump start' me or 'do all the dirty work', because inevitably, I find myself in a position where I have to either (a) debug the generated code, (b) work around an undesired behavior, or (c) write the code myself anyway.

Take SSIS for example.

I am in the process of a large ETL project involving VFP and SQL Server 2005. We are taking data from SQL Server and populating VFP free tables (for re-distribution with our installed apps). Yes, you read right: SQL Server back to FoxPro.

So the company, long ago (and maybe in a galaxy far, far away), decided to use SSIS as the ETL tool of choice. Now, everything is in SSIS and a lot of time and effort has been put into the endeavor. With scores of packages extracting data from a variety of sources, SSIS has proved reasonably useful from a management and execution point-of-view. But I have to tell you, writing complex SSIS packages is no breeze -- and the whole 'drag-and-drop' approach is seriously getting in the (read: my) way.

I took an inventory of all the work I needed to do for this project. Here's the skinny:

Variables


The extracted data can have one or multiple sources (spanning multiple SQL Server databases) and destinations (each with a different FoxPro data schema and each potentially on different computers on the Network). I also need to be able to call packages from other packages. As a result, a lot of time was spent on setting up variables (and configuring the child packages to receive them from their parent). These variables are used extensively throughout the solution (in the connection manager and in various expressions for example -- and don't get me started on the expression builder!).

Sources


Here's where drag-and-drop programming really starts to fall short. My sources, almost every one of them (there are dozens of sources in the solution), use SQL statements -- instead of actual tables (or views) -- to gather the data (using the OLE DB provider, not the native driver, by the way). This really isn't a big deal and in fact gives me a ton of flexibility, and I'm sure this approach is rather common.

For example, the FoxPro datasets do not support NULLs in any column (don't ask me why, and it's too late to fix it!). In the drag-and-drop world, one might drag in a Derived Column tool, fight with the expression field, and 'replace' each nullable column, one at a time, with some non-NULL value. Ugh. I tried this once and let me tell you: just build the NULL elimination into the SQL statement using COALESCE instead. Save yourself some aggravation and a headache. Also, while you're at it, do all your converts, trims, and calculations in there as well. In my tests, performance was not affected by this approach (using a database with several million records). As I've mentioned many times before, though, I'm not an expert on benchmarking. But you can write SQL much quicker than you can fight that darned expression dialog box -- not to mention everything is right out in front of you in a single SQL statement, and if you need to debug or experiment with the SQL, just copy and paste it into SSMS and off you go (try doing that with the Derived Column tool).
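
To make that concrete, here is a sketch of the kind of source statement I mean (the table and column names are hypothetical), which you can also test from VFP with SQL pass-through (assuming an open connection handle, nHnd) before pasting it into the OLE DB source:

*-- All the NULL handling, trims, and converts live in the source SQL,
*-- so no Derived Column transformation is needed downstream.
cSQL = "SELECT c.cust_id, " + ;
"COALESCE(RTRIM(LTRIM(c.cust_name)), '') AS cust_name, " + ;
"COALESCE(c.cust_city, '') AS cust_city, " + ;
"COALESCE(c.cust_dob, '1900-01-01') AS cust_dob " + ;
"FROM customers c"
IF SQLEXEC(nHnd, cSQL, "crSource") > 0
SELECT crSource
BROWSE NORMAL
ENDIF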

Transformations


Why is it that, with the entire toolbox of transformations provided by SSIS, I am always forced into using the Script transformation, the Union All as a terminator (so my lookups don't complain about error output), and non-standard tools (such as the Lookup) to manipulate and filter data? I've already mentioned that the work of Derived Columns and similar tools can be accomplished using good old fashioned SQL.

Debugging


Sorry folks, I'm not going here. Forget it. I might be likely to say something I'll regret.

Sigh


When looking at this integration project as a whole, I came to realize two important facts:

  • The core functionality of the package is hand-written code (the SQL, transformations, even setting up the variables);

  • And, that most of the time to put this together (about 60 hours of programming in total) was spent on getting SSIS to behave and 'flow'.


It turns out that drag-and-drop programming isn't really the answer IMO. Sure, it might get you off the ground, but I find that tools like SSIS just get in the way more than anything else.

So I have to ask: Why bother? If I had hand-coded this in VFP (or any other data-centric language), it would have been done last week, and it would have been just as easy to execute, maintain, and turn over. Especially if I had a framework to start off with. Hmmn...

Monday, June 18, 2007

Just a Cool Day

Ever get stuck on a word? Here it is, only 9:30am, and I must have used the word 'cool' 4 times already. Once in front of my boss ("Ok, that's cool. Thanks"), another in front of our data operations director ("Cool, I like it."), yet another with a colleague (discussing his recent trip to Scotland), and finally again on the phone with my doctor (I have a sore throat). What's funny is just before I drove in I was thinking about Craig Bailey's post from last year "VFP: How to make Visual FoxPro cool".

But "cool" needs to take a back seat at the moment.

Now, for VFP, it's about being a survivor.

In September, I will be presenting at Fox Forward. My discussion will be about using VFP for Data Warehousing -- especially for the prototype/iterate phase of planning and as an ETL tool. There is nothing really 'cool' about using FoxPro for ETL (although, I'll try my best on FoxETL.com, a site I'm in the process of putting together for this very topic). After all, we're talking about lots of code, objects and design patterns, schedules, staging, shuffling data, etc. Nothing groundbreaking, nothing cool: But incredibly valuable.

When you think about how much it costs to build a full Business Intelligence solution (could be in the $-millions-$), you can really begin to lose sleep. Couple that with the fact that many small and medium organizations, who realize that their data is a valuable asset, simply do not have access to resources (DBAs, SQL Programmers, Business Objects gurus, etc.) to implement such a solution. FoxPro presents an interesting and powerful alternative to the big players, all for a fraction of the cost. Everything you can do in SSIS for example, you can do in FoxPro (What Tod? Are you serious? Yup. I am). And in my humble opinion, you can do it better.

I suppose -- come to think of it -- that this makes VFP pretty darn cool. No? And using VFP for these types of tasks not only fills a large gap in the market, but also gives us die-hard FoxPro programmers lots to do in the foreseeable future.

Wednesday, June 13, 2007

Web Analysis with VFP: Download, Parse, and Stage

My Web Analysis application is beginning to take shape. At the moment, I have created a little road map document (which I'll share shortly), defined all my base classes (the entire application will be made available in the coming months), and proved the general concept (see below).

To recap, this project is about building a tool to help me make better business decisions on and using the Internet. While there are a ton of canned products on the market, I feel that I can get what I need with VFP (and get it better). In general, I want to be able to download a variety of data from my server, analyze it, and use that data to make decisions (everything from what IPs to ban to how I might redesign a certain page). See my post "Business Intelligence Through Web Analysis" for more details.

The following zip file contains all the code necessary to prove the concept. It (a) establishes a database, (b) downloads the log files from the FTP server, and (c) parses the file, storing it in a stage directory, which I can then use to create a Dimensional Model for analysis purposes. Here are the programs:

web_analytics.zip

In the zip, you will find:

  • wa_main.prg (the setup program, check the path and run this once)

  • functions.prg (a place to store my functions, this will become a class soon)

  • download.prg (set your server information up and run, it calls load_logfile(), which lives in functions.prg)

  • ftp/ftp.prg (FTP services class written by Robert Abram many moons ago -- still works great, requires wininet.dll)


Now that the concept has been proved, and I can visually analyze my log files, I will start to build an application in VFP9 around it. Stay tuned!

Tuesday, June 05, 2007

Sorting IP Addresses in VFP

In my series, "Business Intelligence Through Web Analysis", I talk about using web stats to build a web analytics solution in FoxPro. I have created a webstats table, and have populated it with some data from my access logs. I have a requirement to be able to sort my webstats table by IP address (among other things).

The format of an IP address doesn't lend itself to proper indexing (it is a 32-bit numeric address written as four numbers (octets) separated by periods). To accommodate my requirement, my first thought was to break the IP address into 4 separate numeric fields when importing. This would be good normalization. After all, the rule is that atomic data is better than concatenated data. But rules are meant to be broken. And, more importantly, does it make sense to atomize an IP address? An IP address is not really 4 separate numbers; the entire address is meaningful, even though each piece holds some significance.

So, I decided to keep the IP address intact.

Still needing a good way to sort and group, I thought about converting the IP address to its binary representation. MySQL has a function, INET_ATON (Address TO Number), that does exactly that. FoxPro's CREATEBINARY function seems to do the trick at first. I can pass in the complete IP address like so:


? CREATEBINARY("128.0.0.0")
? CREATEBINARY("168.212.226.204")


But if you look at these outputs, you'll realize that they're not great for indexing (which is the only reason I want to convert them): CREATEBINARY returns values of varying length, which is not what I want for an index key.

Back to the drawing board. I could represent each octet as a string of 0s and 1s like this:


lnOct = 122
lcBin = ''
DO WHILE lnOct > 0
lcBin = STR(lnOct%2, 1) + lcBin
lnOct = INT(lnOct/2)
ENDDO
? PADL(lcBin,8,'0')


But this seems terribly inefficient (I admit, no benchmarking was done, but this code is overkill!).

With all this in mind, I think I have a viable solution. It doesn't require a custom function, nor does it require storing data in separate fields. I know I can easily extract each octet of the IP address by using GETWORDNUM. And, knowing a little something about IP addresses, each octet has 2^8 (256) possible values (0 to 255). Therefore, an IP address can be represented in a numeric form like so:


? VAL(GETWORDNUM(ip_address,1,'.')) * 256^3 + ;
VAL(GETWORDNUM(ip_address,2,'.')) * 256^2 + ;
VAL(GETWORDNUM(ip_address,3,'.')) * 256 + ;
VAL(GETWORDNUM(ip_address,4,'.'))


So my index for webstats.dbf would be:


SELECT webstats
INDEX ON BINTOC( ;
VAL(GETWORDNUM(ip_address,1,'.')) * 256^3 + ;
VAL(GETWORDNUM(ip_address,2,'.')) * 256^2 + ;
VAL(GETWORDNUM(ip_address,3,'.')) * 256 + ;
VAL(GETWORDNUM(ip_address,4,'.'));
,8) TAG IP
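
With that tag in place, looking up a single address is straightforward. Here is a quick sketch (the address is just an example value):

lcIP = "168.212.226.204"
SELECT webstats
SET ORDER TO IP
SEEK BINTOC( ;
VAL(GETWORDNUM(lcIP,1,'.')) * 256^3 + ;
VAL(GETWORDNUM(lcIP,2,'.')) * 256^2 + ;
VAL(GETWORDNUM(lcIP,3,'.')) * 256 + ;
VAL(GETWORDNUM(lcIP,4,'.'));
,8)
? IIF(FOUND(), "found " + lcIP, lcIP + " not in webstats")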


Your comments and feedback are welcome. Is there a better solution?

Thursday, May 31, 2007

Business Intelligence Through Web Analysis

I just received a surprising email regarding my previous post about Analyzing cPanel's Raw Access Logs in FoxPro. The commenter wrote “dude, you’re nuts. Why waste your time on this?!? I use Awstats and that works fine”.

I sat back in my chair quite puzzled. After all, I know that Awstats doesn’t even come close to giving me the answers I need to grow my business and website. Then I realized my folly: I jumped right into my example in my last post without fully explaining the goal of the project. I’ll try to redeem myself now.

Awstats and similar tools (in my case, provided with cPanel) are retrospective reporting tools. They give you nice charts and display some very interesting numbers and facts about the types of traffic generated on your site. If you’re good with numbers and can associate events to dates (in your head), then you may be able to notice some interesting patterns (like, “hey – it seems that whenever I post a new blog entry, my hits double!”).

But this isn’t good enough for more serious projects (though it isn’t a bad place to start).

When growing your business (whether you’re a blogger or selling widgets), this type of information is invaluable. You need good, consistent, scientific analysis to pull it all together (hunches and gut feelings don’t count). Pool raw data (from a variety of sources), integrate it, clean and add value to it, and compile it to create some incredibly useful and valuable information (Read: Business Intelligence). This information can help you make decisions like (a) how much to spend on advertising, (b) whether or not to sponsor an event (such as FoxForward), (c) how many blog posts to make per week to keep the interest of readers, (d) should I sign up with Google AdSense, or (e) what design elements and layout plans are making the greatest impact.

Data smog is a real issue, however. Too much data can not only waste your valuable analysis time, but the integration of this meaningless data will do nothing but eat away at your resources (and give you a headache). The key then is to do a little preparation before you begin a project like this (duh!). I think there are two important steps: (1) monetize all elements of your business, and (2) identify all key performance indicators (KPIs). Armed with this information, you will be able to build a dimensional model (in VFP of course!) with an incredibly rich fact table. Monetizing helps you assign value to all your tasks, and KPIs allow you to measure the benefits of these endeavors. Throughout the course of this project, I’ll describe these two steps in greater detail. As an example, for my blog todmeansfox, I’ve monetized the following items:


  • Posting a new blog entry: $90.00 / post (my time and effort to post based on my current rates, discounted)

  • Responding to posts: $10.00 / comment

  • Social networking: $90.00 / contact (includes setting up accounts on del.ico.us for example)

  • Advertising: $40.00 / free advertising (example, updating my fox.wiki profile)

  • Advertising: case by case (I have not done any advertising yet)

  • Sponsoring: case by case (I have not sponsored an event yet)

  • Updating CSS files: $60.00 / hour (how much does it ‘cost’ me to update my site’s layout and colors?)


Next, I tried to identify all important KPIs:

  • referrals

  • length of stay

  • out clicks

  • total number of hits and page views

  • total number of comments made

  • total number of emails received

  • Blackstone Providence leads

  • consulting job hours from website leads


Of course, as I do more data profiling, I may uncover some other very useful pieces of data that I can later integrate.

The goal of this project, therefore, is rather simple: make better business decisions. Using the Internet, I will gather the right data, integrate it in a meaningful way, and use OLAP to analyze and report on the findings. I’ll use data mining, profiling, and trend analysis to identify abuse and spam, as well as identify areas where improvements in layout, methodology, and content can make a greater impact. My hope is to generate more business by leveraging my current assets. On top of that, I want to do it in the open so others can benefit.

The first step in my process is data profiling, where I’ll gather the data I think I can use, analyze it, test it for quality, and prepare it for my dimensional model. My last blog post attempted to start that process by simply downloading and parsing the daily Apache raw access log file (which I should mention is in the NCSA combined/XLF/ELF log format, in case you were wondering).

As you can see, Awstats can only get me so far (actually, not that far at all).

Friday, May 25, 2007

Analyze Apache Raw Access Logs in FoxPro

I've been increasingly interested in examining the access logs on my server. For the past 6 or 7 years, I've used some of the cool little tools provided by cPanel located in their Web/FTP Statistics folder (like Webalizer and Awstats). While nice to look at, I've always felt a bit handicapped by the interface and general content. I have my own ideas on what I'd like to see and how I want to see it.

In this and subsequent articles, I'll build an Apache Raw Access Log analysis application that will (a) download the access logs once or twice a day, (b) dump the data into a database, (c) integrate this data with other information (perhaps some demographics info, error logs, site information, dates and times of blog posts, etc.), and (d) in the end, build a suite of reports and filters that can be used to identify things like abuse, spam, and positive traffic.

My hope is to do a good enough job with this that, in the end, it could end up in the Fox Open Source community if there's even a remote bit of interest. This should evolve in such a way that any access log from any server would be supported, but in the initial stages, I'll be using my Raw Access Logs from Apache. For those interested, here are the details of my server configuration:

Operating system: Linux
Apache version: 1.3.37 (Unix)
cPanel Build: 10.9.0-STABLE 9966
cPanel Pro: 1.0 (RC36)

The Log File



I started by looking at the raw files created by Awstats on my server (/tmp/awstats/). There is a ton of information in there which is already parsed and used by the various free analysis tools. But what I really need is a raw file that contains a line for each 'hit' to my server (including 200, 404, and 500 messages). The Raw Access Logs are perfect (in cPanel, you can manage them by opening the Raw Log Manager). They contain a line for each hit, some referral information, http status codes, etc.

The only problem with the file is its format (NCSA combined/XLF/ELF)! A single line looks like:

206.17.xxx.xxx - - [23/May/2007:07:59:21 -0500] "GET /Styles/color.css HTTP/1.1" 304 - "http://www.rabbitstewlounge.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)"

Fortunately, FoxPro can handle this sort of thing quite easily. With a log file containing potentially thousands of lines each day, you can see how reading this file without parsing can be nightmarish (not to mention highly inefficient). So, this is where I started. I downloaded today's access file from my site, and wrote the following code. As this project matures, I'll be sure to clean this up (hopefully with some reader's suggestions!) and provide the source in zip format. For now, follow along:

Get Started


First, download your raw access file. Load cPanel and click "Raw Access Logs". Select the raw access log for your domain and unzip it. In this example, I've also changed its name from "rabbitstewlounge.com" to "raw_log.txt".

The Basic Code


I left out some environmental settings, such as SET SAFETY and SET MEMOWIDTH for now. Eventually this will be part of a project, which will set all these things later. Also, as this code is merely used to test the theory, I haven't included any real error checking. I'd be interested in hearing some ideas on how to tweak this for better speed and reliability.

I did consider writing this using low-level file functions (FOPEN, FGETS, FREAD, etc), but in the end I felt that the format was too wacky, so I relied on STREXTRACT and SUBSTR instead.


CREATE DATABASE C:\webstats\webstats
OPEN DATABASE C:\webstats\webstats
CREATE TABLE C:\webstats\webstats.dbf (;
stat_id integer AUTOINC,;
ip_address character(15),;
date_time datetime,;
gmt_offset character(5),;
access_method character(4),;
location varchar(200),;
http_version character(10),;
http_code character(3),;
bytes i,;
referrer varchar(200),;
agent varchar(200),;
source_file varchar(60))


lcData = FILETOSTR("C:\webstats\raw_log.txt")

FOR n = 1 TO MEMLINES(lcData)

lcLine = MLINE(lcData,n)

*-- IP address is first.
lcIP = ALLTRIM(SUBSTR(lcLine,1,AT("- -",lcLine)-1))

*-- date time and GMT offset info. Use STREXTRACT to get the data between
*-- the square brackets on the line. Then break the datetime string
*-- into separate lines to use MLINE to build a proper datetime string
lcDateTimeGMT = STRTRAN(STREXTRACT(lcLine,"[","]")," ",CHR(13))
lcDT = STRTRAN(STRTRAN(MLINE(lcDateTimeGMT,1),"/",CHR(13)),":",CHR(13))
IF !EMPTY(lcDT)
ltDatetime = DATETIME(;
VAL(MLINE(lcDT,3)),;
get_monthnum(MLINE(lcDT,2)),;
VAL(MLINE(lcDT,1)),;
VAL(MLINE(lcDT,4)),;
VAL(MLINE(lcDT,5)),;
VAL(MLINE(lcDT,6)))
ELSE
ltDatetime = {}
ENDIF

lcGMTOffset = MLINE(lcDateTimeGMT,2)

*-- Request data. Just to 'switch-it-up' a bit, I'm using GETWORDNUM
*-- here because I need 3 chunks of data within the STREXTRACT, all
*-- conveniently separated by a space.
lcRequest = STREXTRACT(lcLine,'"','"')
lcMethod = GETWORDNUM(lcRequest,1)
lcLocation = GETWORDNUM(lcRequest,2)
lcHTTPVer = GETWORDNUM(lcRequest,3)

*-- HTTP Code and bytes returned.
lcHTTPCode = SUBSTR(lcLine, AT(lcRequest,lcLine)+LEN(lcRequest)+2,3)
lnBytes = VAL(SUBSTR(lcLine,AT(" " + lcHTTPCode + ;
" ",lcLine)+LEN(lcHTTPCode)+2,;
AT('"',lcLine,3) - (AT(" " + lcHTTPCode + " ",lcLine)+;
LEN(lcHTTPCode)+2)))

*-- referrer
lcReferer = STREXTRACT(lcLine,'"','"',3)

*-- User's agent
lcAgent = STREXTRACT(lcLine,'"','"',5)

INSERT INTO webstats (ip_address,date_time,gmt_offset,access_method,;
location,http_version,http_code,bytes,referrer,agent,source_file ) ;
VALUES (lcIP , ltDatetime , lcGMTOffset , lcMethod , lcLocation ,;
lcHTTPVer ,lcHTTPCode,lnBytes,lcReferer,lcAgent,"raw_log.txt" )

NEXT


FUNCTION get_monthnum
LPARAMETERS tcAbbr

DIMENSION aMonthList[12]
aMonthList[1] = "JAN"
aMonthList[2] = "FEB"
aMonthList[3] = "MAR"
aMonthList[4] = "APR"
aMonthList[5] = "MAY"
aMonthList[6] = "JUN"
aMonthList[7] = "JUL"
aMonthList[8] = "AUG"
aMonthList[9] = "SEP"
aMonthList[10] = "OCT"
aMonthList[11] = "NOV"
aMonthList[12] = "DEC"

RETURN ASCAN(aMonthList,LEFT(UPPER(ALLTRIM(tcAbbr)),3))
ENDFUNC


Next entry, I'll see about downloading this file automatically (might take some ingenuity) and building in some logic to do so periodically. After that, I'll write some algorithms to identify potential abuse situations, spam, and increased traffic, setting up email notifications in the process. Stay tuned!

Thursday, May 17, 2007

First FoxPro, now Jericho

It seems like all my favorites are getting canceled. I only watched one network show on television with any kind of regularity: Jericho on CBS. Now, after a mediocre second-half of the first season (CBS' fault, by the way), CBS opted to cancel it in favor of more reality TV (something about a Lord of the Flies rip-off with kids in Mexico). If you watch Jericho, or want to sign another petition, please do so at: petitiononline.com

If you're really into Jericho, please mail a bag of nuts to their studio (they'll know why you sent them -- it's a reference to the final episode in which Jake tells the invading town of New Bern "Nuts" when asked to surrender Jericho):

CBS TV
ATTN: President RE: JERICHO
51 West 52nd Street
New York New York 10019

But I'm not posting this here for Jericho support. I'm wasting my digital space today (and possibly your time!) to rant about how Corporations really run our lives. Soma Somasegar of Microsoft tells us (my words): 'Hey, no more FoxPro. I know it's hard to hear that you need to learn something else, but why don't you check out .NET!'. And CBS (Kelly Kahl, CBS’ chief scheduling executive in particular) is saying (my words): 'Hey, no more Jericho. I know it's hard to hear that you need to watch something else, but why don't you check out CSI?'.

Whether it's our livelihood or our down-time, we're being run by a handful of corporations that mold and shape us into their image. So much for the little guy. As FoxPro developers, we're outnumbered. The old saying that 'power exists in numbers' is very true. That's one of the reasons people like me like to hold on to the niche, the subculture, the atypical, exceptional, and extraordinary!

Jericho has a decent chance of being picked up by the Sci-fi Network, TNT, or others. I wish the same could be said for FoxPro.

Wednesday, May 16, 2007

Take the Survey!

Take a moment to fill out this quick survey (takes about 5 minutes) published by A List Apart. The purpose of this survey is to create some metrics that can be used to gauge the vitality of Web Development as a whole. From A List Apart's website:


Designers, developers, project managers. Writers and editors. Information architects and usability specialists. People who make websites have been at it for more than a dozen years, yet almost nothing is known, statistically, about our profession. Who are we? Where do we live? What are our titles, our skills, our educational backgrounds? Where and with whom do we work? What do we earn? What do we value?


Take the Survey here: http://alistapart.com/articles/webdesignsurvey

Thursday, May 10, 2007

What's faster? SQL or looping to return a set of records?

When you need to get a set of records (for example, to return a list of items from a lookup table, to prepare a report, or to fill a read-only grid), which method should you use?

Of course, the answer is 'it depends' and many factors are at play. It is part of our job as developers (or programmers, if you'd prefer) to pick the solution that best fits the scenario. As much has been written on this topic already, I won't be revealing anything new here: Just a re-package, if you will.

First, the basics. There are several ways to go through a table with the intention of producing a set of resulting records. You can SCAN, DO WHILE, use a FOR..NEXT loop, or use SQL syntax. The first three options are procedural and process one record at a time. The last (SQL) is set-based. To get an idea of which looping mechanism to use, take a look at Andy Kramek's blog entry from March 2006 called "Writing Better Code (Part 2)". In this entry, Andy does a great job of showing you the effects of indexes on the three looping mechanisms, and it should give you a good idea which one to use and when.

Using SQL syntax, you can return a result set of records as well. You can replace your looping syntax with SQL syntax and get the same results (using much less code). When gathering data for a report, for example, I have often used the loop method to populate a special report cursor. I was inclined to stick with the procedural approach in the past because (a) it was easy enough to write, (b) easy to debug, (c) allowed me to easily insert a custom progress bar if needed, and (d) because I thought it was best. As my SQL skills improved, I found that SQL syntax is actually (a) easier to write, (b) just as easy to debug, (c) capable of supporting a custom progress bar, and (d) may in fact be best under many circumstances. This is especially true if you are trying to write portable code (duh!).

To demonstrate my point, I first took a table with 125,000 records and no index (CDX) file. This first case will process each record in the table. My benchmarking methodology here would make the blood of anyone who has ever done benchmarking in a scientific way boil, but hey, I never claimed to be a scientist (I don't even play one on TV)!

First, are we all on the same page?


SET ANSI OFF
SET EXACT OFF
SET TALK OFF
SET DELETED ON
CLEAR


Carry on...


USE name_list IN 1 EXCL
nSecs = SECONDS()
CREATE CURSOR crName (name c(20))
SELECT name_list
LOCATE
DO WHILE NOT EOF()
INSERT INTO crName VALUES (name_list.name)
SKIP
ENDDO
? SECONDS() - nSecs && results: 0.313 seconds (10 runs)
USE IN name_list

*-----

USE name_list IN 1 EXCL
nSecs = SECONDS()
CREATE CURSOR crName (name c(20))
SELECT name_list
FOR nRec = 1 TO RECCOUNT()
GOTO nRec
INSERT INTO crName VALUES (name_list.name)
NEXT
? SECONDS() - nSecs && results: 0.281 seconds (10 runs)
USE IN name_list

*-----

USE name_list IN 1 EXCL
nSecs = SECONDS()
SELECT name FROM name_list INTO CURSOR crName
? SECONDS() - nSecs && results: 0.000 seconds (10 runs)
USE IN name_list


So what on earth is going on? The SQL statement processes that entire result in (literally) no time at all! The answer can be found in a comment made in a recent blog posting of mine. Sergey explained that, in cases like this, VFP doesn't actually run the SQL at all; all it does is create a filtered cursor on the underlying table. This is why RECCOUNT could be "wrong", as discussed in my blog entry. Check out that link to Sergey's site to read more about this. You can check what you got simply by issuing JUSTEXT(DBF()) (as Sergey does): "DBF" means a filtered cursor was created, while "TMP" means that a new cursor was created.
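
For example, here is a quick check you can tack on right after any of the SELECT ... INTO CURSOR crName lines above:

SELECT crName
? JUSTEXT(DBF()) && "DBF" = filtered cursor on name_list, "TMP" = a real cursor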

Additional Notes: When there is a CDX file (with no index set), the FOR loop seems to suffer a little: On average, the FOR loop takes an additional .020 seconds to run. The existence of a CDX file does not seem to bother the DO WHILE or SQL method in the above case.

Now, let's add a condition that the name was added to the table on or after 01/01/2005. The first run of this test is with no CDX file at all. If a CDX file exists, but no index set, then the results are a bit different (see notes after the code):


USE name_list IN 1 EXCL
nSecs = SECONDS()
CREATE CURSOR crName (name c(20))
SELECT name_list
LOCATE
DO WHILE NOT EOF()
IF created>={^2005-01-01}
INSERT INTO crName VALUES (name_list.name)
ENDIF
SKIP
ENDDO
? SECONDS() - nSecs && results: 0.234 seconds (10 runs)
USE IN name_list

*-----

USE name_list IN 1 EXCL
nSecs = SECONDS()
CREATE CURSOR crName (name c(20))
SELECT name_list
FOR nRec = 1 TO RECCOUNT()
GOTO nRec
IF created>={^2005-01-01}
INSERT INTO crName VALUES (name_list.name)
ENDIF
NEXT
? SECONDS() - nSecs && results: 0.235 seconds (10 runs)
USE IN name_list

*-----

USE name_list IN 1 EXCL
nSecs = SECONDS()
SELECT name FROM name_list WHERE created>={^2005-01-01} INTO CURSOR crName
? SECONDS() - nSecs && results: 0.125 seconds (10 runs)
USE IN name_list


Here, with no CDX file, VFP seems to actually process the SQL (it doesn't create a filtered cursor), and does so rather fast. If you add an INDEX on created date, FoxPro goes ahead and creates the filtered cursor (which again is processed in zero seconds).

Additional Notes: When there is a CDX file (even with no index set), the FOR loop seems to suffer a little: On average, the FOR loop takes an additional .020 seconds to run. The existence of a CDX file does not seem to bother the DO WHILE statement.

OK, so now what if I actually use the INDEX, and SET ORDER TO before all of this?


USE name_list IN 1 EXCL ORDER created
nSecs = SECONDS()
CREATE CURSOR crName (name c(20))
SELECT name_list
LOCATE
DO WHILE NOT EOF()
IF created>={^2005-01-01}
INSERT INTO crName VALUES (name_list.name)
ENDIF
SKIP
ENDDO
? SECONDS() - nSecs && results: 1.047 seconds (10 runs)
USE IN name_list

*-----

USE name_list IN 1 EXCL ORDER created
nSecs = SECONDS()
CREATE CURSOR crName (name c(20))
SELECT name_list
FOR nRec = 1 TO RECCOUNT()
GOTO nRec
IF created>={^2005-01-01}
INSERT INTO crName VALUES (name_list.name)
ENDIF
NEXT
? SECONDS() - nSecs && results: 0.235 seconds (10 runs)
USE IN name_list

*-----

USE name_list IN 1 EXCL ORDER created
nSecs = SECONDS()
SELECT name FROM name_list WHERE created>={^2005-01-01} INTO CURSOR crName
? SECONDS() - nSecs && results: 0.000 seconds (10 runs)
USE IN name_list



The DO WHILE really gets hit hard, as you can see. The FOR loop works as well as before, and the SQL once again gives us a filtered cursor.

You can really take advantage of this behavior when creating simple cursors with simple predicates. Even if VFP can't create the filtered cursor (for example, when applying an ORDER BY clause to the SQL statement), the results are still usually faster.

In a subsequent posting, I'll increase the complexity of these tests. In the meantime, I would be interested in any feedback to these cases.

Friday, May 04, 2007

Take That, Microsoft!

We just picked up 5 new licenses for VFP9, with more on the way. I am very happy about this decision (it was in management's hands, certainly not mine). We are in the process of migrating a VFP7 application, which has a few hundred installations around the country, to VFP9. The move was almost squashed thanks to MS's latest announcement. VFP9 SP1 will make our application run better, give the engineers here more features and tools, open more doors into the XML world, and allow us to bother MS for support a little while longer.

Tuesday, May 01, 2007

VFP9 bug with _TALLY, RECCOUNT, DELETED, and an INDEX

Maybe I'm missing something (and I hope I am). The following condition exists in VFP7 through to VFP9SP1 (and maybe earlier than 7). I don't remember ever reading anything about it either.

This one is hard to explain (and believe), so you may need to run the example code to see what I mean. And right off the bat: This issue has nothing to do with the RECCOUNT/_TALLY results we all know about regarding deleted records (RECCOUNT always returns the number of rows in the table regardless of the SET DELETED setting). No, this issue has something to do with an index and the current setting of DELETED. When an index exists on a column involved in the WHERE clause of an SQL statement and SET DELETED is OFF, RECCOUNT will return the total number of records from the 'FROM' table instead of the number of records in the resulting cursor! It seems as though the resulting cursor has a bunch of deleted records (the scrollbar in the browse window only goes half-way up (if you know what I mean)), but they can't be accessed regardless of the SET DELETED setting.

The problem is with code like this:


SET DELETED OFF
SELECT sometable
INDEX ON somefield TAG somefield && the tag can already exist in a CDX too
SELECT * FROM sometable WHERE somefield = somevalue INTO CURSOR crTemp
? RECCOUNT('crTemp') && wrong! RECCOUNT returns the wrong reccount!
? _TALLY && correct results


Here's a complete example, using a table called 'name_list', which has a column for name c(30) and some deleted records. In the first example, the table name_list does NOT have an index on 'name'. RECCOUNT and _TALLY return expected results.


USE name_list IN 0 EXCL

SET DELETED OFF

SELECT name FROM name_list WHERE name>='H' INTO CURSOR crName
? "WITH DELE OFF: "
?? _TALLY && 81,019
?? RECCOUNT('crName') && 81,019

SET DELETED ON

SELECT name FROM name_list WHERE name>='H' INTO CURSOR crName
? "WITH DELE ON: "
?? _TALLY && 80,951
?? RECCOUNT('crName') && 80,951


Now, add an index to the name column and be amazed:


USE name_list IN 0 EXCL
INDEX ON name TAG name

SET DELETED OFF

SELECT name FROM name_list WHERE name>='H' INTO CURSOR crName
? "WITH DELE OFF: "
?? _TALLY && 81,019
?? RECCOUNT('crName') && 126,607 -- the actual number of records in name_list!?!

SET DELETED ON

SELECT name FROM name_list WHERE name>='H' INTO CURSOR crName
? "WITH DELE ON: "
?? _TALLY && 80,951
?? RECCOUNT('crName') && 80,951


In the above example, it doesn't matter if the index is set to name (SET ORDER TO name) or not! Just the fact that the index exists throws off RECCOUNT(). If you do a count on crName with DELETED OFF, you get the expected results:


USE name_list IN 0 EXCL
INDEX ON name TAG name

SET DELETED OFF

SELECT name FROM name_list WHERE name>='H' INTO CURSOR crName
? "WITH DELE OFF: "
?? _TALLY && 81,019
?? RECCOUNT('crName') && 126,607 -- the actual number of records in name_list!?!
COUNT TO ln
?? ln && 81,019



The moral is not to use RECCOUNT() to get the total number of records from an SQL statement. Always use _TALLY. Unfortunately, this undocumented behavior has certainly crept into someone's code somewhere. Consider a situation where you want to test whether the SQL returned any records. You might do something like this:


SELECT * FROM sometable WHERE somefield = somevalue INTO CURSOR crTemp
IF RECCOUNT() > 0
...
ENDIF


This code is fatal if in fact there happens to be an index on somefield and SET DELETED is OFF! If anyone has some insight, I'd love to hear it...
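
The safe version of that test, per the moral above, is simply:

SELECT * FROM sometable WHERE somefield = somevalue INTO CURSOR crTemp
IF _TALLY > 0
...
ENDIF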

Monday, April 23, 2007

Use Beyond Compare to compare tables in VFP

In this post I'll show how you can use Beyond Compare's scripting ability to compare DBFs from VFP.

"Your true value depends entirely on what you are compared with."
Bob Wells


Occasionally there is a need to compare tables in two directories to make sure that they are the same (or different). You can use my Intersect example from my last post using VFP code, but this will only get you to the row. Additional code would be needed to identify the exact column that is different. Although this task is certainly possible in VFP, I thought that using Beyond Compare (BC) would be a nicer solution.

Not too long ago I found a way to 'integrate' BC from within VFP using BC's script capability. It's tough to find good documentation on how to do it, though. I'm all ears if anyone has a good reference. Anyway, the following code demonstrates how I was able to accomplish the task. The output is sent to an HTML file (which I thought was easiest to view/navigate).


LOCAL lcDirA, lcDirB, lcDisplayOption,;
lcFileA, lcFileB, cVar As String
LOCAL ARRAY aDirA[1,1], aDirB[1,1]
LOCAL llDiffOnly AS Boolean
LOCAL n AS Integer

*-- step 1, get the directories
lcDirA = GETDIR()
lcDirB = GETDIR()
IF !DIRECTORY(lcDirA + "BC_A")
MKDIR (lcDirA + "BC_A")
ENDIF
IF !DIRECTORY(lcDirB + "BC_B")
MKDIR (lcDirB + "BC_B")
ENDIF

*-- step 2, load the tables into the array
lnDirA = ADIR(aDirA,lcDirA+"*.DBF")
lnDirB = ADIR(aDirB,lcDirB+"*.DBF")

*-- step 3, some options
llDiffOnly = .t. && set to .f. to display all records
lcDisplayOption = IIF(llDiffOnly,"display-mismatches","display-all")

*-- step 4, loop through the files that appear in both (the intersect)
FOR n = 1 TO lnDirA
IF ASCAN(aDirB,aDirA[n,1]) > 0
*-- copy to CSV (you will have to handle memo fields differently!)
lcFileA = lcDirA+aDirA[n,1]
USE (lcFileA)
COPY TO (lcDirA + "BC_A\" + FORCEEXT(aDirA[n,1],"CSV")) TYPE CSV
USE IN (JUSTSTEM(aDirA[n,1]))
lcFileB = lcDirB+aDirA[n,1]
USE (lcFileB)
COPY TO (lcDirB + "BC_B\" + FORCEEXT(aDirA[n,1],"CSV")) TYPE CSV
USE IN (JUSTSTEM(aDirA[n,1]))
ENDIF
NEXT

*-- Step 5, set up the script file
lcResultFile = FORCEEXT("c:\temp\BC_" + DTOC(DATE(),1) ,"HTM")
TEXT TO cVar NOSHOW
load <<lcDirA+"BC_A\">> <<lcDirB+"BC_B\">>
expand all
select all.files
option confirm:no-to-all
file-report layout:side-by-side options:<<lcDisplayOption>> title:"<<_SCREEN.caption>>" output-to:<<lcResultFile>> output-options:html-color,wrap-none
ENDTEXT

STRTOFILE(TEXTMERGE(cVar),"c:\temp\bcvfp_script.txt")

*-- Step 6, Run Beyond Compare
RUN /N "C:\Program Files\Beyond Compare 2\bc2.exe" @c:\temp\bcvfp_script.txt




Please note that BC will open its own dialog that will display script, log, and error information. I usually check the box 'Close when finished' so I have an idea when BC actually completed the task. Then I just browse to the HTML file created.

Of course, there is plenty of room for improvement in the above code. I've started a project to combine this with several other methods to do an exhaustive comparison of tables from one directory to another (which would include index, structure, and memo field comparisons). I hope to publish some of that code in upcoming blog entries.
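
As a rough sketch of the kind of structure comparison I have in mind (this is not the project code, and the table name is hypothetical), something like this could sit alongside the CSV step above:

*-- lcDirA and lcDirB come from the GETDIR() calls above.
LOCAL lcTable, lnA, lnB, llSame, i
LOCAL ARRAY aStructA[1], aStructB[1]
lcTable = "customer.dbf"
USE (lcDirA + lcTable) IN 0 ALIAS structA
USE (lcDirB + lcTable) IN 0 ALIAS structB
lnA = AFIELDS(aStructA, "structA")
lnB = AFIELDS(aStructB, "structB")
llSame = (lnA = lnB)
IF llSame
FOR i = 1 TO lnA
*-- columns 1-4 of AFIELDS hold name, type, width, and decimals
IF !(aStructA[i,1] == aStructB[i,1]) OR !(aStructA[i,2] == aStructB[i,2]) ;
OR aStructA[i,3] # aStructB[i,3] OR aStructA[i,4] # aStructB[i,4]
llSame = .F.
EXIT
ENDIF
NEXT
ENDIF
? lcTable + ": " + IIF(llSame, "structures match", "structures differ")
USE IN structA
USE IN structB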

Beyond Compare
A general scripting reference for BC

*Update*
It seems I always get bit by this: In my code example above, the HTML parser removed several chunks of code because they were in between < and >. I need to get into the habit of using the ISO-8859-1 character set codes instead. Blogger doesn't convert these things for me... The code in the above example has been updated.

Tuesday, April 17, 2007

SQL implementations, database servers, and VFP

When connecting to a database server (perhaps using the ADO connection object or SQL Passthrough (SPT)), you need to keep in mind that some familiar VFP SQL syntax won't work against a database that doesn't support it. For a simple example, in SPT, you cannot do the following on a SQL Server 2005 database (for more info on connecting to SQL Server, see the article 'How to Connect to SQL Server from Visual FoxPro'):


cConn = ;
"driver=SQL Server;server=SERVERNAME;database=DATABASE;uid=;pwd="
nHnd = SQLSTRINGCONNECT(cConn,.f.)
IF nHnd <= 0
AERROR(aErrs)
=MESSAGEBOX('Connection failed. ' + aErrs[1,2], 16, 'SQL Error')
ELSE
*-- VFP allows abbreviation of most keywords to 4 letters,
*-- SQL Server does not. This code generates an error:
cSQL = "SELE TOP 50 * FROM store_transactions"
SQLEXEC(nHnd ,cSQL, "crTemp")
=SQLDISCONNECT(nHnd)
SELECT crTemp
BROWSE NORMAL
ENDIF


While this may be annoying for those of us who are trying to save a few miles in our carpal tunnels, it should be a reminder that the SQL we use is native to the database we've connected to. This is extremely important to understand because there may be some behavioral differences between VFP and other languages, which may not trigger a syntax error! I have yet to find a good resource that details these differences from a Visual FoxPro point of view (if anyone has a link, please pass it along!). Here is a site, prepared by Troels Arvin, called 'Comparison of different SQL implementations' that compares Standard ANSI SQL to PostgreSQL, DB2, MS SQL, MySQL, and Oracle. Troels has licensed this under Creative Commons, which means it might be worth adding the document to the fox wiki with a VFP category.

On the flip side, it opens several doors into the proprietary SQL language of the connected database. For example, SQL Server 2005 supports different types of table operators (like APPLY, PIVOT, and UNPIVOT) and some new SET operations (EXCEPT and INTERSECT). The OVER clause is also very handy (for analytical rankings such as row_number, rank, dense_rank, and ntile). While I'm no T-SQL guru, consider the following syntax as a healthy example of what I'm talking about. It uses the PIVOT operation (and a CASE statement for good measure) to create a silly little pivot table that lists how many registers are used just a few times, more than a few, and a lot of times.


cConn = ;
"driver=SQL Server;server=SERVERNAME;database=DATABASE;uid=;pwd="
nHnd = SQLSTRINGCONNECT(cConn,.f.)
IF nHnd <= 0
AERROR(aErrs)
=MESSAGEBOX('Connection failed. ' + aErrs[1,2], 16, 'SQL Error')
ELSE
cSQL = "SELECT store_id, afew, more, alot " + ;
"FROM ( " + ;
" SELECT store_id, store_register, " + ;
" CASE WHEN COUNT(tran_id) < 10 THEN 'afew' " + ;
" WHEN COUNT(tran_id) < 50 THEN 'more' " + ;
" WHEN COUNT(tran_id) >= 50 THEN 'alot' " + ;
" END AS totals " + ;
" FROM store_transactions " + ;
" GROUP BY store_id, store_register " + ;
") as tmp " + ;
"PIVOT( COUNT(store_register) " + ;
" FOR totals IN(afew,more,alot) ) AS tmp2"

SQLEXEC(nHnd ,cSQL, "crTemp")
=SQLDISCONNECT(nHnd)
SELECT crTemp
BROWSE NORMAL
ENDIF


Also note that using SQLEXEC, you can create stored procedures in the target database, among other things.
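
For example, here is a sketch (a made-up procedure name, the same store_transactions table as above, and an open connection handle nHnd) that creates a procedure on the server:

cSQL = "CREATE PROCEDURE usp_register_counts @store_id int AS " + ;
"SELECT store_register, COUNT(tran_id) AS trans " + ;
"FROM store_transactions " + ;
"WHERE store_id = @store_id " + ;
"GROUP BY store_register"
IF SQLEXEC(nHnd, cSQL) > 0
? "Procedure created on the server"
ENDIF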

With VFP9, there is greater support for subqueries and other SQL elements common to other languages. Because of this, it is easy to write VFP-style SQL code that can be interpreted by a database server such as SQL Server without modification. But be aware of the nuances between the various SQL implementations and certainly take advantage of the proprietary features (like PIVOT)!

Monday, April 16, 2007

Tod means Fox (b)

I decided to create a Windows Live Spaces account the other day in order to post a comment on a blog there. I was a little puzzled at what I could actually put in my new 'space'. So I decided to use it as a blog for posting interesting links that relate to VFP. The purpose is so that I can blog here on various topics and not clutter things with links to this-or-that. This may prove to be a dumb idea, but I'm going to give it a try!

Tod means Fox (b)

Monday, April 09, 2007

Warehouse Data with Microsoft Visual FoxPro

If you haven't already noticed, I have an article in the April/May issue of Fox Advisor magazine entitled "Warehouse Data with Microsoft Visual FoxPro". This is my first published article, although I've been writing functional and technical documents for quite a while. I'd like to thank Andy Kramek for his help during the early stages (his initial edit was as valuable to me as it will be for you!) and anyone else involved with getting this to print. Heather Petrig did an amazing job channeling me in all the right directions.

In the end, I hope you like what I had to say. Please post any comments, questions, and/or suggestions you may have either here or in the Advisor forums (I'll check them occasionally).

PS: I'd like to mention that a follow-up article is currently in 'edit mode' (as I like to call it) and might be out in June/July. This follow-up article discusses VFP as an ETL tool; I'll have a spiffy new website, foxetl.com to go along with it.

Wednesday, April 04, 2007

The 64-bit excuse

Old news, but it sets up my question at the end of this post: Developers petition Microsoft to reconsider FoxPro phase out

...and the response from Microsoft on April 3rd (copied from the above article):


"For Microsoft to continue to evolve the FoxPro base, we would need to look at creating a 64-bit development environment and that would involve an almost complete rewrite of the core product."


Does anyone else feel as though the 64-bit issue is just an excuse? I'm not pretending to know what's involved with creating a 64-bit development environment for VFP. And I'm not disputing that this is one reason for the decision. But I feel that this has much more to do with channeling more money into .NET and SQL Server than with the hardships Microsoft faces in a 64-bit redesign.

Am I on an island here? Or am I the last one off the boat?

Update!

OK, I must be the last one on the island. From the ComputerWorld article, "FoxPro users petition to keep database language alive":

"But FoxPro’s use of the open .dbf file format made it impossible for Microsoft to raise prices for the software. Even today, Visual FoxPro 9.0 lists for just $649. For no additional fee, developers can embed FoxPro in an unlimited number of their applications."

FoxPro, though wildly popular, became a burden and an opportunity cost for Microsoft. "Every time Microsoft sold a copy of FoxPro, I think Bill Gates thought about all the money they were losing from not being able to sell a copy of SQL Server," [Kevin] Cully [of Fox Forward] said.

VFP INTERSECTing SQL

For a long time, we have had UNION and UNION ALL to write SQL statements that act as an 'OR' operation between two SQL statements. The ORing grabs records from either the first or the second SQL. INTERSECT, though, acts as an AND, meaning that the results must exist in both sets. Unfortunately, there is no INTERSECT keyword in VFP for us to use.

I have found that the need for INTERSECT usually accompanies partitioned files, or when doing comparisons between two versions of files. For example, to find all the customers that have placed an order in 2004 and 2005, using two partitioned customer tables (partitioned by calendar years 2004 and 2005), you can use intersect set logic. In SQL Server 2005, the syntax is quite simple:


*-- In SQL Server 2005, you can:
SELECT cust_id, cust_name, cust_dob FROM sales_2004
INTERSECT
SELECT cust_id, cust_name, cust_dob FROM sales_2005


But in VFP9, and in prior versions of SQL Server, you have to do a little more work:


*-- IN VFP9, the following is equivalent
SELECT cust_id, cust_name, cust_dob ;
FROM (;
SELECT DISTINCT cust_id, cust_name, cust_dob FROM sales_2004 ;
UNION ALL ;
SELECT DISTINCT cust_id, cust_name, cust_dob FROM sales_2005 ;
) AS tmp1 ;
GROUP BY cust_id, cust_name, cust_dob ;
HAVING COUNT(*) = 2 ;
INTO CURSOR crIntersect


Please note these examples both rely on DISTINCT (SQL Server 2005 performs an implicit DISTINCT with INTERSECT, and the VFP version de-duplicates each side before counting). This means that they can only compare distinct records. What if the same customer ordered 12 products in 2004 and 8 in 2005? The above INTERSECT examples would still bring in the customer just once. Although I have never had a need to write such a query (an 'INTERSECT ALL', if you will), I have a few ideas on how it could be done. If anyone is bored or needs to burn 15 minutes, give it a try!

UPDATE

I failed to mention how the above code is better or different than simply using an inner join (as a colleague suggested). I had to think about this for a minute but then I recalled why using an inner join for intersecting isn't perfect: NULLS. Inner joins treat nulls as different, whereas the above syntax does not. To test this out for yourself, run the above SQL on two tables that have a column set to NULL (for example, a NULL DOB) for the same customer in both tables. You'll notice that the inner join refuses to recognize that NULL = NULL.
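
If you want to see the NULL behavior without touching real data, here is a small throwaway test (made-up cursors and values):

*-- Two one-row 'sales' cursors with a NULL date of birth for the same customer.
CREATE CURSOR sales_2004 (cust_id I, cust_name C(20), cust_dob D NULL)
CREATE CURSOR sales_2005 (cust_id I, cust_name C(20), cust_dob D NULL)
INSERT INTO sales_2004 VALUES (1, "Smith", NULL)
INSERT INTO sales_2005 VALUES (1, "Smith", NULL)

*-- The inner join misses the customer because NULL = NULL never evaluates to true...
SELECT a.cust_id FROM sales_2004 a ;
INNER JOIN sales_2005 b ON a.cust_id = b.cust_id ;
AND a.cust_name = b.cust_name AND a.cust_dob = b.cust_dob ;
INTO CURSOR crJoined
? _TALLY && 0

*-- ...while the UNION ALL / HAVING COUNT(*) = 2 approach finds it.
SELECT cust_id, cust_name, cust_dob FROM ( ;
SELECT DISTINCT cust_id, cust_name, cust_dob FROM sales_2004 ;
UNION ALL ;
SELECT DISTINCT cust_id, cust_name, cust_dob FROM sales_2005 ;
) AS tmp1 ;
GROUP BY cust_id, cust_name, cust_dob ;
HAVING COUNT(*) = 2 ;
INTO CURSOR crIntersect
? _TALLY && 1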

Tuesday, March 20, 2007

UNION oddity (or maybe it's just me)

I had a situation come up today where I needed to add the contents of one cursor (cursorA) to another (cursorB), without adding any duplicates in the process. Both cursors had the exact same structure. CursorA was going to be closed (a little later), while cursorB would persist (for another few processes). Immediately I turned to the UNION clause. My strategy was to select * from both tables, union them together, and put the results back into cursorB like this:

SELECT * FROM cursorB ;
UNION ;
SELECT * FROM cursorA ;
INTO CURSOR cursorB READWRITE

I was surprised to see the open dialog box pop up, basically asking me to locate cursorB! What happens here is that the INTO clause seems to strike first, essentially closing cursorB. I suppose I was a little surprised because I do stuff like this all the time:

SELECT * FROM cursorB WHERE .T. INTO CURSOR cursorB

I worked around this by modifying my SQL to the following:

SELECT * FROM (SELECT * FROM cursorB) as tmp ;
UNION ;
SELECT * FROM cursorA ;
INTO CURSOR cursorB READWRITE

I should mention that this only works in VFP9. VFP7 will complain that the syntax is wrong (I presume the same is true in VFP8 too). The only solution that I can think of in VFP7 would be done in two steps:

SELECT * FROM cursorB ;
UNION ;
SELECT * FROM cursorA ;
INTO CURSOR cursorC

SELECT * FROM cursorC INTO CURSOR cursorB READWRITE

Wednesday, March 14, 2007

Future of FoxPro

Yair Alan Griver (known as YAG) has posted on his blog "yag: Community and Architecture" a message for the VFP community: There will be no VFP10. I assume he means ever.

As other VFP bloggers will certainly do, I plan on talking more about this in the coming weeks. Especially in regards to CodePlex, and how VFP programmers might begin to approach the future. Make no mistake, VFP is a mature language and there are many VFP applications out and about across the world. It's not like this news is a shock, nor will it send VFP applications to the graveyard in my opinion. More to come...

Parsing City/State/Zip

Many applications today still carry the city, state, and zip in a composite field. The problem with this type of concatenation is that you can't easily query the data or use it for other things (such as building a report by state). There really is no benefit or good reason to keep them together. I have yet to hear a good 'excuse':

1.) This is how I got the data
2.) It makes it easier for the end-user to enter everything in a single field
3.) I don't have to worry about combining them on labels
4.) (insert your bad excuse here)

To add value to these legacy composite fields, you can write a program to split them up.

Usually, the data has some form such as "city, state zip" where the state is almost certainly abbreviated. The following program can be used to parse out city, state, and zip from a concatenated field into atomic parts. Two assumptions are made with this program (important ones, at that): that the state will be a 2-character abbreviation and that city names don't begin with a number. Another excellent approach would be to grab a zipcode file from the post office (or other data supplier) so that you could compare each segment to determine not only that it IS a zip code or city or state abbreviation (or full state name for that matter) but that it actually exists! I'll leave that part up to you! Here is my simple solution to get things atomic:


lcCity = ""
lcState = ""
lcZip = ""
lcCSZ= "Marlborough, MA 01752"
lcCSZ= "North Providence RI 02988"
lcCSZ= "Boston MA"

? parse_csz(lcCSZ , @lcCity, @lcState , @lcZip )
? "city: " + lcCity
? "state: " + lcState
? "zip: " + lcZip

FUNCTION parse_csz
LPARAMETERS tcFullString , rcCity , rcState , rcZip

LOCAL lcFullString AS String
LOCAL lnSegments AS Number
LOCAL x && loop counter

STORE "" TO rcCity , rcState , rcZip && reference parameters

*-- remove any commas and convert to spaces
lcFullString = STRTRAN( ALLTRIM(tcFullString ) , "," , " " )

*-- get the number of segments
lnSegments = GETWORDCOUNT( lcFullString )

*-- try to be smart and figure out what to do depending
*-- on the number of segments found
DO CASE
CASE lnSegments >= 3

rcZip = GETWORDNUM( lcFullString, lnSegments)
rcState = GETWORDNUM( lcFullString, lnSegments- 1)

FOR x = 1 TO lnSegments - 2
rcCity = rcCity + " " + GETWORDNUM( lcFullString, x )
NEXT

rcCity = ALLTRIM( rcCity )

CASE lnSegments = 2

IF ISDIGIT( GETWORDNUM(lcFullString, 2) )
rcZip = GETWORDNUM(lcFullString, 2)
*-- the first segment is either a city or a two-character state
IF LEN( GETWORDNUM(lcFullString, 1) ) > 2
rcCity = GETWORDNUM(lcFullString, 1)
ELSE
rcState = GETWORDNUM(lcFullString, 1)
ENDIF
ELSE
IF LEN( GETWORDNUM(lcFullString, 2) ) > 2
rcCity = GETWORDNUM(lcFullString, 1) + " " + ;
GETWORDNUM(lcFullString, 2)
ELSE
rcState = GETWORDNUM(lcFullString, 2)
rcCity = GETWORDNUM(lcFullString, 1)
ENDIF
ENDIF

CASE lnSegments = 1

IF ISDIGIT( lcFullString )
rcZip = lcFullString
ELSE
IF LEN( lcFullString ) > 2
rcCity = lcFullString
ELSE
rcState = lcFullString
ENDIF
ENDIF
ENDCASE

*-- Return in a mailing format
RETURN rcCity + " " + rcState + " " + rcZip

ENDFUNC
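
And if you do want to pursue the zip-code-file idea, here is a rough sketch of what the lookup could look like. Everything in it is hypothetical: the free table 'zipcodes', its columns, and the function name zip_exists are stand-ins for whatever your data supplier actually provides.

*-- Hypothetical sketch: validate a parsed zip against a lookup table
*-- Assumes a free table 'zipcodes' with columns zip c(5), city c(25), state c(2)
FUNCTION zip_exists
LPARAMETERS tcZip

LOCAL ARRAY laHits[1]
LOCAL llFound

IF NOT USED("zipcodes")
USE zipcodes IN 0 SHARED
ENDIF

SELECT COUNT(*) FROM zipcodes ;
WHERE zip == PADR(ALLTRIM(tcZip), 5) ;
INTO ARRAY laHits

llFound = (laHits[1] > 0)
RETURN llFound

ENDFUNC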

Friday, March 02, 2007

CoDe's Sedna Issue

I got my Sedna issue of CoDe magazine the other day, and just finished reading through it. I was a little excited when it came in, and was rather pleased with the content. This was their third Fox Focus issue (why not more?). Articles by Doug Hennig, Bo Durban, Rick Schummer, Rick Strahl and others dove right into some of Sedna's great new improvements. Most of the content wasn't "new", in that I, like most VFP developers, have known about these enhancements for some time, but it is great to see them in print, in an issue designed especially for Fox.

I do admit that I skimmed over the VFP to .net stuff. I'm just not interested in porting my VFP applications to .net. Whenever I read through these sorts of things I just find myself frustrated, puzzled, and dismayed all at once. First, moving to .net would require a complete rewrite. Second, I've developed a few .NET apps in my day and still feel that VFP (for many projects) is light years ahead of .net. Of course, I'm excited about the new C# (my .net language of choice), the cool features of Visual Studio, and LINQ. But for a Fox Focus issue, it would have been great to umm... focus more on Fox! But I digress...

Now that I have that off my chest, I'd like to comment on a few of the articles I found most interesting. First, I was really happy to see Rick's article "Visual FoxPro Web Services Revisited". Rick gives us the goods on the Web Service Client process, ASMX, Windows Communication Foundation (WCF), and the status of SOAP. Communicating with the Web has never been fully straightforward in VFP, but with a little guidance and a shove once in a while, future VFP development using Web Services looks bright.

Craig Boyd ("Welcome to the Future of Deployment") and John M. Miller ("Integrating VFP into VSTS Team Projects") show us how to use some of Visual Studio's coolest features with VFP. Craig discusses ClickOnce -- a deployment technology allowing you to do installs and upgrades on the Web -- and how you can use it with VFP. This is a must-read for anyone looking to update their distribution strategy. The company I recently started working for is just beginning to do Web installs (after years of mailing CDs). This is one of the first articles I'll be showing management.

Miller's article really took me by surprise. Visual Studio Team System (VSTS) is Microsoft's SDLC tool integrated into Visual Studio 2005. Although I haven't used it in the past, this article sparked my interest. VSTS promises to help manage the often difficult task of integrating resources on complex projects. The only problem is that VSTS is a .net thing, and really not designed to use non-dotnet tools out of the box. However, as John explains, you can use VSTS extensibility features to make it happen. Thanks John!

Lastly, this issue is packed with little sidebars and tidbits. Things like ClassBrowserX, Scrollable containers for VFP, and GDIPlus-X. I think this issue has also inspired me to join CodePlex and maybe jump on one of the VFP projects (at least as a tester).

Take a look for more: http://www.code-magazine.com/focus/index.aspx

Wednesday, February 07, 2007

Ragged Hierarchy Alert

So this week I was faced with writing yet another hierarchy script. I say this with affection, of course. I find that developing efficient hierarchies can sometimes be quite a challenge depending on the data set. When there are only a few records (a hundred or so) almost any method will work (SQL, procedures, or a mixture). As the data sets grow, the method becomes more and more important. Couple this with the hierarchy requirements (can a child have more than one parent? Is there a maximum depth? Will the result be used to populate a tree control? Do I need to process each branch now or can I just get the top level nodes? Etc.) and the tidy and efficient code you imagined is now becoming bloated and difficult to debug.

There are two ways to store hierarchies in a database: Using a recursive pointer or a bridge table. The recursive pointer is possibly the most widely used. Each record has a primary key and a foreign key that points to the primary key of another record in the same table (more on this in a bit). This approach is simple to maintain and intuitive. The bridge table provides much more flexibility (for example, a child can have more than one parent), but is more difficult to maintain (you do have another table to update, and there could be issues with time stamps and sequences if a child can have multiple parents). The discussion on bridge table hierarchies ends here. I may address it in a future blog, however.
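
For illustration, here is a bare-bones sketch of the two layouts; the table and column names are invented:

*-- Recursive pointer: each row carries a foreign key back into the same table
CREATE CURSOR category (cat_id i, cat_name c(25), parent_id i)

*-- Bridge table: the relationships live in their own table,
*-- so a child can point at more than one parent
CREATE CURSOR category2 (cat_id i, cat_name c(25))
CREATE CURSOR cat_bridge (parent_id i, child_id i)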

The key to ragged hierarchies is that any record in the table can participate at any depth or degree. Organizational hierarchies (which I will demonstrate in a moment) are very common and take on a ragged shape; they are also a perfect fit for the recursive pointer. The following figure will give you an idea of how a ragged hierarchy might look:



Like I said earlier in this post, there are lots of ways to snake through this diagram: from writing a recursive common table expression (CTE) in SQL Server 2005 to writing a procedural program in FoxPro, the possibilities are as varied as the structure. In my VFP applications, I always find that a mixture of SQL, some procedural code, and a recursive function provides the most flexibility and the best performance. To demonstrate, consider the following:


CREATE CURSOR Employee (emp_id i, emp_name c(25), sup_id i)
INSERT INTO Employee VALUES (1,"Martha Jones",0)
INSERT INTO Employee VALUES (2,"Dan Brown",1)
INSERT INTO Employee VALUES (3,"Ed Smith",2)
INSERT INTO Employee VALUES (4,"Sarah Parker",2)
INSERT INTO Employee VALUES (5,"Henry Johnson",1)
INSERT INTO Employee VALUES (6,"Samual Smyth",5)
INSERT INTO Employee VALUES (7,"Ali Jennah",1)
INSERT INTO Employee VALUES (8,"Tori Heart",7)
INSERT INTO Employee VALUES (9,"Bob Jones",7)
INSERT INTO Employee VALUES (10,"Kelly Robinson",7)
INSERT INTO Employee VALUES (11,"Manny Diaz",7)
INSERT INTO Employee VALUES (12,"Jack White",11)
INSERT INTO Employee VALUES (13,"Melinda Jo",11)

CREATE CURSOR crTree (;
top_level l,;
this_id i,;
description c(25),;
parent_id i,;
deep n(2))

*-- run the recursive function
SELECT Employee
LOCATE
snake(0,0) && start at the top

FUNCTION snake (tnParenID, tnDeep)

LOCAL lcAlias
lcAlias = "crTemp_" + TRANSFORM(tnParenID)
SELECT * FROM Employee WHERE sup_id = tnParenID INTO CURSOR (lcAlias)

tnDeep = tnDeep + 1

SELECT (lcAlias)
SCAN
put_record(&lcAlias..emp_id, &lcAlias..emp_name, &lcAlias..sup_id, tnDeep)
snake(&lcAlias..emp_id, tnDeep)
ENDSCAN
USE IN (lcAlias)

ENDFUNC

FUNCTION put_record (tnThis_id, tcDescription , tnParent_id, tnDeep)
INSERT INTO crTree (;
top_level ,;
this_id ,;
description,;
parent_id ,;
deep ) ;
VALUES (;
tnParent_id=0,;
tnThis_id ,;
tcDescription,;
tnParent_id ,;
tnDeep )
ENDFUNC




I start off by creating a dummy Employee table, followed by a tree cursor (crTree) to hold the results of my work. Next I call the recursive function 'snake', whose job it is to find each child of the passed-in parent, tracking its depth as it goes. The assumption in this program is that there is at least one record with a sup_id of zero marking the top-most level. This code works well for populating treeview controls (you could easily add a unique character sequence ID to each record). You can also output a text representation:


FUNCTION show_tree()
SELECT crTree
LOCATE
SCAN
? REPL("-",crTree.deep) + " " + ALLT(crTree.description)
ENDSCAN
ENDFUNC


There you have it! For 1,000 employees, the snake function takes about 0.8 seconds to run. For anything larger, you should consider populating the tree only to a certain level (perhaps all parents and one child deep) and filling in the rest as needed. Another approach would be to turn this cursor into a table, relate it to your Employee table, and update it as part of a save routine.
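
If you go the table route, the plumbing is short. Here is a rough sketch, assuming you are happy calling the new table tree_cache (a made-up name):

*-- Persist the tree cursor to a table
SELECT crTree
COPY TO tree_cache
USE tree_cache IN 0 EXCLUSIVE
SELECT tree_cache
INDEX ON this_id TAG this_id

*-- Relate each Employee record to its node in the saved tree
SELECT Employee
SET RELATION TO emp_id INTO tree_cache

From there, a save routine only has to refresh the branch that changed rather than rebuild the whole cursor.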