Wednesday, March 14, 2007

Parsing City/State/Zip

Many applications today still carry the city, state, and zip in a composite field. The problem with this type of concatenation is that you can't easily query the data or use it for other things (such as building a report by state). There really is no benefit or good reason to keep them together. I have yet to hear a good 'excuse':

1.) This is how I got the data
2.) It makes it easier for the end-user to enter everything in a single field
3.) I don't have to worry about combining them on labels
4.) (insert your bad excuse here)

To add value to these legacy composite fields, you can write a program to split them up.

Usually, the data has some form such as "city, state zip" where the state is almost certainly abbreviated. The following program can be used to parse out city, state, and zip from a concatenated field into atomic parts. Two assumption are made with this program (important ones, at that): that state will be 2 digits and that city names don't begin with a number. Another excellent approach would be to grab a zipcode file from the post office (or other data supplier) so that you could compare each segment to determine not only that it IS a zip code or city or state abbreviation (or full state name for that matter) but that it actually exists! I'll leave that part up to you! Here is my simple solution to get things atomic:


lcCity = ""
lcState = ""
lcZip = ""
lcCSZ= "Marlborough, MA 01752"
lcCSZ= "North Providence RI 02988"
lcCSZ= "Boston MA"

? parse_csz(lcCSZ , @lcCity, @lcState , @lcZip )
? "city: " + lcCity
? "state: " + lcState
? "zip: " + lcZip

FUNCTION parse_csz
LPARAMETERS tcFullString , rcCity , rcState , rcZip

LOCAL lcFullString AS String
LOCAL lnSegments AS Number

STORE "" TO rcCity , rcState , rcZip && reference parameters

*-- remove any commas and convert to spaces
lcFullString = STRTRAN( ALLTRIM(tcFullString ) , "," , " " )

*-- get the number of segments
lnSegments = GETWORDCOUNT( lcFullString )

*-- try to be smart and figure out what to do depending
*-- on the number of segments found
DO CASE
CASE lnSegments >= 3

rcZip = GETWORDNUM( lcFullString, lnSegments)
rcState = GETWORDNUM( lcFullString, lnSegments- 1)

FOR x = 1 TO lnSegments - 2
rcCity = rcCity + " " + GETWORDNUM( lcFullString, x )
NEXT

rcCity = ALLTRIM( rcCity )

CASE lnSegments = 2

IF ISDIGIT( GETWORDNUM(lcFullString, 2) )
rcZip = GETWORDNUM(lcFullString, 2)
ELSE
IF LEN( GETWORDNUM(lcFullString, 2) ) > 2
rcCity = GETWORDNUM(lcFullString, 1) + " " + ;
GETWORDNUM(lcFullString, 2)
ELSE
rcState = GETWORDNUM(lcFullString, 2)
rcCity = GETWORDNUM(lcFullString, 1)
ENDIF
ENDIF

CASE lnSegments = 1

IF ISDIGIT( lcFullString )
rcZip = lcFullString
ELSE
IF LEN( lcFullString ) > 2
rcCity = lcFullString
ELSE
rcState = lcFullString
ENDIF
ENDIF
ENDCASE

*-- Return in a mailing format
RETURN rcCity + " " + rcState + " " + rcZip

1 comment:

Jamie Osborn said...

Using Outlook to do it for you is another possibility - http://www.civilsolutions.com.au/publications/parseaddress.htm