Thursday, May 31, 2007

Business Intelligence Through Web Analysis

I just received a surprising email regarding my previous post about Analyzing cPanel's Raw Access Logs in FoxPro. The commenter wrote “dude, you’re nuts. Why waste your time on this?!? I use Awstats and that works fine”.

I sat back in my chair quite puzzled. After all, I know that Awstats doesn’t even come close to giving me the answers I need to grow my business and website. Then I realized my folly: I jumped right into my example in my last post without fully explaining the goal of the project. I’ll try to redeem myself now.

Awstats and similar tools (in my case, provided with cPanel) are retrospective reporting tool. They give you nice charts and display some very interesting numbers and facts about the types of traffic generated on your site. If you’re good with numbers and can associate events to dates (in your head) then you may be able to notice some interesting patterns (like, “hey – it seems that whenever I post a new blog entry, my hits double!”).

But this isn’t good enough for more serious projects (but isn’t a bad place to start either).

When growing your business (whether you’re a blogger or selling widgets), this type of information is invaluable. You need good, consistent, scientific analysis to pull it all together (hunches and gut feelings don’t count). Pool raw data (from a variety of sources), integrate it, clean and add value to it, and compile it to create some incredibly useful and valuable information (Read: Business Intelligence). This information can help you make decisions like (a) how much to spend on advertising, (b) whether or not to sponsor an event (such as FoxForward), (c) how many blog posts to make per week to keep the interest of readers, (d) should I sign up with Google AdSense, or (e) what design elements and layout plans are making the greatest impact.

Data smog is a real issue, however. Too much data can not only waste your valuable analysis time, but the integration of this meaningless data will do nothing but eat away at your resources (and give you a headache). The key then is to do a little preparation before you begin a project like this (duh!). I think there are two important steps (1) monetize all elements of your business, and (2) identify all key performance indicators (KPIs). Armed with this information, you will be able to build a dimensional model (in VFP of course!) with an incredibly rich fact table. Monetizing helps you assign value to all your tasks, and KPIs allow you to measure the benefits of these endeavors. Throughout the course of this project, I’ll be itemizing these two items in greater detail. As an example, for my blog todmeansfox, I’ve monetized the following items:


  • Posting a new blog entry: $90.00 / post (my time and effort to post based on my current rates, discounted)

  • Responding to posts: $10.00 / comment

  • Social networking: $90.00 / contact (includes setting up accounts on del.ico.us for example)

  • Advertising: $40.00 / free advertising (example, updating my fox.wiki profile)

  • Advertising: case by case (I have not done any advertising yet)

  • Sponsoring: case by case (I have not sponsored an event yet)

  • Updating CSS files: $60.00 / hour (how much does it ‘cost’ me to update my site’s layout and colors?)


Next, I tried to identify all important KPIs:

  • referrals

  • length of stay

  • out clicks

  • total number of hits and page views

  • total number of comments made

  • total number of emails received

  • Blackstone Providence leads

  • consulting job hours from website leads


Of course, as I do more data profiling, I may uncover some other very useful pieces of data that I can later integrate.

The goal of this project, therefore, is rather simple: make better business decisions. Using the Internet, I will gather the right data, integrate it in a meaningful way, and use OLAP to analyze and report on the findings. I’ll use data mining, profiling, and trend analysis to identify abuse and spam, as well as identify areas were improvements in layout, methodology, and content can make a greater impact. My hope is to generate more business by leveraging my current assets. On top of that, I want to do it in the open so others can benefit.

The first step in my process is data profiling, where I’ll gather the data I think I can use, analyze it, test it for quality, and prepare it for my dimensional model. My last blog post attempted to start that process by simply downloading and parsing the daily Apache raw access log file (which I should mention is in the NCSA combined/XLF/ELF log format, in case you were wondering).

As you can see, Awstats can only get me so far (actually, not that far at all).

No comments: