Wednesday, April 04, 2007

VFP INTERSECTing SQL

For a long time, we have had UNION and UNION ALL to write SQL statements that act as an 'OR' operation between two SQL statements. The ORing grabs records from either the first or the second SQL. INTERSECT, though, acts as an AND, meaning that the results must exist in both sets. Unfortunately, there is no INTERSECT keyword in VFP for us to use.

I have found that the need for INTERSECT usually accompanies partitioned files, or when doing comparisons between two versions of files. For example, to find all the customers that have placed an order in 2004 and 2005, using two partitioned customer tables (partitioned by calendar years 2004 and 2005), you can use intersect set logic. In SQL Server 2005, the syntax is quite simple:


*-- In SQL Server 2005, you can:
SELECT cust_id, cust_name, cust_dob FROM sales_2004
INTERSECT
SELECT cust_id, cust_name, cust_dob FROM sales_2005


But in VFP9, and in prior versions of SQL Server, you have to do a little more work:


*-- IN VFP9, the following is equivalent
SELECT cust_id, cust_name, cust_dob ;
FROM (;
SELECT cust_id, cust_name, cust_dob FROM sales_2004 ;
UNION ALL ;
SELECT cust_id, cust_name, cust_dob FROM sales_2005 ;
) AS tmp1 ;
GROUP BY cust_id, cust_name, cust_dob ;
HAVING COUNT(*) = 2 ;
INTO CURSOR crIntersect


Please note these examples both make use of DISTINCT (SQL Server 2005 performs an implicit DISTINCT with INTERSECT). This means that it can only compare distinct records. What if the same customer ordered 12 products in 2004 and 8 in 2005? The above INTERSECT example would bring in the customer just once for each year. Although I have never had a need to write such a query, I have a few ideas on how it could be done. If anyone is bored or needs to burn 15 minutes, give it a try!

UPDATE

I failed to mention how the above code is better or different than simply using an inner join (as a colleague suggested). I had to think about this for a minute but then I recalled why using an inner join for intersecting isn't perfect: NULLS. Inner joins treat nulls as different, whereas the above syntax does not. To test this out for yourself, run the above SQL on two tables that have a column set to NULL (for example, a NULL DOB) for the same customer in both tables. You'll notice that the inner join refuses to recognize that NULL = NULL.

No comments: