Tuesday, December 12, 2006

Sample Standard Deviation (and SQL)

The CALCULATE command contains a range of statistical operations that can be utilized in FoxPro. These commands work on fields in a table (or cursor) and can be filtered using a FOR clause and SCOPE operator. Calculating the mean, standard deviation, and variance is rather straightforward:

CALCULATE AVG() TO
CALCULATE STD() TO
CALCULATE VAR() TO

And some other useful functions:

CALCULATE MIN() TO
CALCULATE MAX() TO
CALCULATE CNT() TO

CALCULATE SUM() TO


In a previous blog entry I wrote a program to calculate the median from a set of data. VFP does not have a native median function.

Population vs. Sample Data.


Visual FoxPro doesn't have a built-in standard deviation formula for a sample. The functions CALCULATE STD() is a population function. The same is true of variance. The question then becomes, how can I get a sample standard deviation? I'm glad you asked! Here's how:


FUNCTION std_devs

LPARAMETERS tcTableField, tcPredicates

LOCAL nRet, nsavg , nscnt AS float
SELECT(JUSTSTEM(tcTableField))

IF !EMPTY(tcPredicates)
CALCULATE AVG(&tcTableField) FOR &tcPredicates TO nsavg
CALCULATE CNT(&tcTableField) FOR &tcPredicates TO nscnt
SELECT SUM( (&tcTableField- nsavg) ^ 2 ) as ssum ;
FROM JUSTSTEM(tcTableField)
WHERE &tcPredicates
INTO CURSOR crTmpSDEVS2
ELSE
CALCULATE AVG(&tcTableField) TO nsavg
CALCULATE CNT(&tcTableField) TO nscnt
SELECT SUM( (&tcTableField- nsavg) ^ 2 ) as ssum ;
FROM JUSTSTEM(tcTableField)
INTO CURSOR crTmpSDEVS2
ENDIF

nRet = SQRT(crTmpSDEVS2.ssum / (nscnt-1) )

USE IN crTmpSDEVS2
RETURN nRet

ENDFUNC



What about STD or VAR in an SQL statement? Again, glad you asked!


SELECT DISTINCT std_devp("crtmp.value") AS stddev ;
FROM crtmp INTO CURSOR cResult
SELECT DISTINCT std_devs("crtmp.value") AS stddev ;
FROM crtmp INTO CURSOR cResult

*-- here is the population version fit for SQL:
FUNCTION std_devp
LPARAMETERS tcTableField, tcPredicates
LOCAL nRet AS float

IF !EMPTY(tcPredicates)
CALCULATE STD(&tcTableField) ;
FOR &tcPredicates TO nRet IN JUSTSTEM(tcTableField)
ELSE
CALCULATE STD(&tcTableField) ;
TO nRet IN JUSTSTEM(tcTableField)
ENDIF

RETURN nRet

ENDFUNC


A big issue with a custom function in an SQL statement is performance. The sample standard deviation function I wrote above is certainly not going to win any sprints (but might just hold out during a marathon). Some environmental tests also need to be done in the above code (for example, resetting the alias back and checking for deleted would be nice improvements). For what it's worth, this should be a good stepping stone for those in need of a good sample standard deviation formula in pure FoxPro. In addition, using it as a function call in an SQL statement increases its usability and worth.

No comments: