September 2007


11 Sep 2007 12:51 am

Strong Russian word: Nyet [No]. By the end of this post I hope you'll agree. Worst case you'll have food for thought.

This in-depth post covers a complex topic that might not apply to everyone, but it covers an area where companies have struggled to try to show return on the investments made in skills, technology and time. The post promises clarity and guidance that hopefully will result in you saving tons of aggravation and yes even a nice chunk of change.

Data Mining and Predictive Analytics have promised the earth, the moon and the sun for some time now, in all channels we do business in. My personal point of view is that on the web they fall far short of even the most pessimistic promises. For now.

As someone who has grown up in the world of traditional decision support systems (massively large data warehouses, business intelligence systems and tools, ERP & CRM systems) I have had the opportunity to be on both the marketing / business side and the development and implementation side of things.

There is nothing cooler than imagining all the wonderful things that will come if you simply move beyond reporting, and even analysis, to doing true data mining and predictive analytics. It is hard but can be rewarding.

Lots of consultants (yes I realize the irony here) will sell you this very effectively.

On pure web data, though, sadly it does not work.

Much as you might desire it, much as you might will it to happen, your traditional data mining efforts and resources and $$$ spent on doing predictive analytics will yield very few actionable insights. Most of the time it will prove to be a suboptimal use of time and energy.

[I can see the smart analysts amongst you get off your chair and mutter obscenities under your breath.]

There are a few very powerful, and non-obvious, elements working against you when it comes to finding exploitable trends and patterns in your web data, the kind that you are used to in offline and ERP/CRM type environments. Before you decide to pour $$$ and systems and people into your web predictive analytics efforts please consider the rest of this post.

I recently had the great opportunity to present at the Bay Area ACM Data Mining Special Interest Group. Here is the last slide of my presentation:

[Slide: Data mining and predictive analytics challenge]

The slide, in my view, captures the essence of the challenge when it comes to doing Predictive Analytics with web data. Let me explain.

#1 Type of Data:

It is important to realize that web data for the most part is completely anonymous, usually incomplete and really really unstructured. When you want to do traditional data mining (and not just analysis) and predictive analytics all of these things are poison.

You are looking for larger complex trends and patterns in the data for people, products, outcomes, behavior over large enough periods of time so that you can find something insightful that can also be exploitable.

That is really hard to do when the core things you are relying on to capture data are anonymous cookies and javascript tags that can be very, shall we say, sensitive. And that's just the tip of the iceberg.

All this makes it much much harder to tie behavior of people to outcomes they might be driving (on any kind of website, ecommerce or not). Yes, if you capture login IDs and have connected them to an actual human's details from your offline system, and you do this for every single person who visits, this problem eases a bit (the anonymity part) but most of it is still there.


#2 Number of Variables:

People behave in crazy ways offline, they have multiple touch points and don't use perfect names and addresses etc. All this is much more insane in the online world.

We have discussed on this blog how it is not an online world or an offline world but rather it is a nonline world! This means people flow between channels and touch points and there could be an outcome (lead, purchase, problem resolution) at a completely different channel than where most of the interaction was. You can imagine how this will completely screw up your SAS or SPSS or Clementine or other homegrown solutions.

Here is another thing that lots of us underestimate. It is easier to Mine and then Predict when there is a certain amount of non-siloed existence. On the web Google is competing with a guy and his pony putting together a new search engine. Not only are there pretty much no barriers to entry but it is easy for your customers to flirt with your competitors and for your competitors to react to you in a massively efficient manner.

So are three visits to purchase typical? (What about two visits to a store in between?) Is $15 off to people from Florida the best strategy? (What happens to that when your competitors run aggressive PPC?) Is "Tony" and all visits attributed to Tony really Tony? (What about cookies and my wife and I and Damini all surfing Amazon on the same login?)

And here is what happens: by the time you control for the variables you can count and account for (while throwing away all that you can't) you are literally left with a glass of water (and you started with an ocean full of water) and your ability to predict anything scalable for massively actionable insights is deeply limited. It is just a glass of water after all. :)
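To make the ocean-to-glass effect concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the field names (`cookie`, `purpose`, `offline_touch`), the three filters and the five sample visits are made up, not taken from any real dataset or tool. It simply applies one "can we account for this?" filter after another and reports how many visits survive each step.

```python
from typing import Callable, Dict, List, Tuple

Visit = Dict[str, object]


def attrition(
    visits: List[Visit],
    filters: List[Tuple[str, Callable[[Visit], bool]]],
) -> List[Tuple[str, int]]:
    """Apply the filters in order and report how many visits survive each one."""
    report = [("all visits", len(visits))]
    remaining = visits
    for label, keep in filters:
        remaining = [v for v in remaining if keep(v)]
        report.append((label, len(remaining)))
    return report


# Hypothetical sample data: five visits with made-up fields.
visits = [
    {"cookie": "a1", "purpose": "buy", "offline_touch": False},
    {"cookie": None, "purpose": "buy", "offline_touch": False},
    {"cookie": "b2", "purpose": None, "offline_touch": False},
    {"cookie": "c3", "purpose": "support", "offline_touch": True},
    {"cookie": "d4", "purpose": "research", "offline_touch": False},
]

# Hypothetical filters: keep only what we can count and account for.
filters = [
    ("has a cookie", lambda v: v["cookie"] is not None),
    ("purpose known", lambda v: v["purpose"] is not None),
    ("no untracked offline touch", lambda v: not v["offline_touch"]),
]

report = attrition(visits, filters)
for label, count in report:
    print(f"{label}: {count}")
```

Each filter on its own looks harmless. It is the stacking that drains the ocean, which is why it is worth running this kind of count on your own data before committing to a modeling project.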


#3 Multiple Primary Purposes:

On the web this issue complicates things. We are trying to predict the outcome of our website, a complex being that exists to do lots (even things that your website was not created for).

So if it is unlike your other channels, where a visit and outcome are fairly easily identifiable at the highest level, then how do you Mine and Predict?

I have often stressed the importance of measuring Primary Purpose because of the power that comes from real understanding of why people visit the website. Two things connected to Primary Purpose mess up your Mining and Prediction efforts:

1) You don't know all of the primary purposes (click here to find out how you can find out).
2) It is incredibly difficult to take your massive collection of clicks and visits and then assign them into each primary purpose bucket and then predict on top of that.

3) See below.

#4 Multiple Visit Behavior:

This really screws things up. You can predict frames of mind (primary purpose) when you send people pieces of mail. You can predict what people want/think when they walk into your supermarket / store. You can make up even more examples of things we all analyze and Mine and Predict.

It is a pain to go to a store and then go there six more times. On the web this is trivial. Hardly any website converts in one visit.

It is also a pain to go to the store for every problem you have or every question you have. On the web this is trivial. You can have the same person come to your website as a different persona many times to solve a different issue.

The question as you get ready to analyze your multi terabyte database is: How can you isolate this behavior in your clicks? With how much confidence?

On paper it sounds easy but in practice it is incredibly hard to account for multiple visit behavior, even if you have nixed the problem of collecting data accurately for each person and for each of their visits.
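Here is what the "on paper" version looks like, as a minimal Python sketch. The function name, the `(visitor_id, timestamp, converted)` tuple layout and the sample rows are all hypothetical. The big assumption, and it is exactly the one this post questions, is that `visitor_id` reliably identifies one person with one purpose across visits.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def visits_to_first_conversion(
    visits: Iterable[Tuple[str, int, bool]],
) -> Dict[str, Optional[int]]:
    """Count visits up to and including each visitor's first conversion.

    Each visit is a (visitor_id, timestamp, converted) tuple.
    Visitors who never convert map to None.
    """
    by_visitor = defaultdict(list)
    for visitor_id, timestamp, converted in visits:
        by_visitor[visitor_id].append((timestamp, converted))

    result: Dict[str, Optional[int]] = {}
    for visitor_id, rows in by_visitor.items():
        rows.sort(key=lambda row: row[0])  # order each visitor's visits by time
        result[visitor_id] = None
        for n, (_, converted) in enumerate(rows, start=1):
            if converted:
                result[visitor_id] = n
                break
    return result


# Hypothetical sample: timestamps are plain integers for simplicity.
sample = [
    ("tony", 3, True),
    ("tony", 1, False),
    ("tony", 2, False),
    ("sam", 1, False),
    ("sam", 2, False),
    ("ana", 1, True),
]
counts = visits_to_first_conversion(sample)
print(counts)
```

The code is the easy part. The hard part is everything it takes for granted: that "tony" is one person, that his three visits were about the same purchase, and that nothing happened in a store in between.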


#5 Missing Primary Keys, Data Silos, Lack of Holistic Datasets:

One way to get better at prediction is to take your data out of the web analytics silos and merge it with other sets of customer data in your company (stores and supermarkets, phone channels, others). If you knew all the customer touch points and had merged the data then it gets much much easier to understand current behavior and predict future behavior and outcomes.

This nirvana scenario is crushed by a couple of rather rotten tomatoes.

We are all familiar with untagged campaigns and pages. We also know that the URL parameters don't always work in helping us collect data. The issue that causes more problems is the fact that most companies don't quite put in the forethought required to create the right "primary keys" that will allow data from different channels to be hooked up together.

There are even problems with names, addresses and phone numbers being collected and stored differently in different systems. That causes a data reconciliation nightmare and, more specific to this post, major challenges in analyzing outcomes.

For data mining and predictive analytics to yield positive ROI your company will have to put a lot of forethought into the process of data collection and storage across channels and in the deep bowels of your web / ERP / CRM systems. If that action item is not marked completed then it is optimal to focus on that first before cutting a cheque for tools / people to do Mining and Predictions.
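As a small illustration of what "the right primary keys" means in practice, here is a hedged Python sketch. The helper names (`normalize_phone`, `normalize_name`, `match_key`, `join_on_key`), the record fields and the sample rows are all hypothetical, and the phone rule assumes US-style numbers. Real-world matching is far messier than this (nicknames, typos, shared phones), so treat it as a picture of the idea rather than a recipe.

```python
import re
from typing import Dict, List, Tuple


def normalize_phone(raw: str) -> str:
    """Keep digits only. Assumption: US-style numbers, so an 11-digit
    number starting with 1 loses its country code."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def normalize_name(raw: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(raw.lower().split())


def match_key(name: str, phone: str) -> Tuple[str, str]:
    """Build the shared key used to hook two channels together."""
    return (normalize_name(name), normalize_phone(phone))


def join_on_key(web_rows: List[Dict], store_rows: List[Dict]) -> List[Dict]:
    """Inner-join web and store records on the normalized key."""
    store_index = {match_key(r["name"], r["phone"]): r for r in store_rows}
    joined = []
    for w in web_rows:
        s = store_index.get(match_key(w["name"], w["phone"]))
        if s is not None:
            joined.append({**w, **s})
    return joined


# Hypothetical records: the same person, stored two different ways.
web = [{"name": "Tony  Smith", "phone": "(555) 010-1234", "visits": 4}]
store = [{"name": "tony smith", "phone": "1-555-010-1234", "purchases": 2}]

joined = join_on_key(web, store)
print(joined)
```

Without the normalization step the two records above simply never meet, and the web visits and the store purchases stay in their separate silos. Agreeing on the key format at collection time is much cheaper than repairing it afterwards.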


#6 Massive Pace of Change on the Web:

Sure Google, Yahoo, CNN, Craigslist, Amazon, eBay, New York Times are always going to be there. It might even seem like things never change.

Unfortunately for you and me the game is not quite the same. The web is constantly changing. The way people experience it, the way people compete, the way people read and recommend and buy, the way everything happens.

Doing mining and predictive analytics on past behavior requires a certain amount of "stability" about your future (customers, business, outcomes etc etc). But if the "environment" changes too much, or even enough, then your predictions on past behavior will have only tiny chances of success.

For now this is perhaps one of the biggest challenges to Analysts and Statisticians who are working hard to get some of the traditional mining and predictive algorithms to work on our web data.


The Wikipedia article on Predictive Analytics ends with this statement:

"Predictive analytics adds great value to a businesses decision making capabilities by allowing it to formulate smart policies on the basis of predictions of future outcomes. A broad range of tools and techniques are available for this type of analysis and their selection is determined by the analytical maturity of the firm as well as the specific requirements of the problem being solved."

I'll leave that thought with you and stress that you consider:

1] maturity of your firm

2] requirements of the problem you are solving

3] the six items mentioned in this post, and

4] whether you have fixed all the "low hanging fruit"

OK, now it's your turn.

What do you all think? Do you agree this is hard? Perhaps you have already subdued this tough problem? Perhaps there is a flaw in my hypothesis?

Please share your tips, tricks, war stories, critique, brickbats via comments.

[Like this post? For more posts like this please click here, if it might be of interest please check out my book: Web Analytics: An Hour A Day.]

05 Sep 2007 12:45 am

A reader of the blog had an interesting question that made me think about the value of experience, or the value of "having been around for a while", vs the value of pure passion and excitement and moldability.

I get lots of wonderful email every day with delightful questions; this one made me think harder.

Here is the actual excerpted question…..

….with a quandary: Is it better to hire and train a really bright, freshly minted college grad, or does the extra value returned by someone who's been a web business analyst for several years merit the extra expense?

The answer is of course: It depends.

In many industries experience trumps everything. "You have operated a lathe / the Space Shuttle / a school bus for 15 years? Congratulations Ian you have the job – Jack could you show new graduate Avinash from Ohio State the door please, do let him know we appreciate his passion."

The web in some ways is unique, at the moment.

It is young, it is vibrant, it is evolving at a rapid pace, everything new is old quickly (and yes sometimes it seems the old is "new" again).

This complex organism demands a stunning amount of flexibility from people whose job it is to analyze it. It requires an atypical ability to let go of past experiences and learned behaviors quickly so as to understand the new in a new way, rather than taking the old known square pegs and trying to fit them into new round holes.

On the web, specifically for analysis of this interesting medium, experience counts for something. But in the grand scheme not as much as it used to.


On the web here is what counts:

    1) You actually "get" the web. I mean in your blood you are a web being, you marvel at its beauty, you use it, you love it, you "get" it (very critical if you are ever to be able to "get" your website visitors and make sense of all the clicks you have – no "get" web, no "get" insights).

    2) You are an inherently flexible being and you are open to new things, in fact you have experience proving that at every new job you ditched the old junk and moved your employer to the latest optimal mindset, not technology but mindset (very critical for someone to see evolution of the web and understand newest measurement opportunities – clickstream or otherwise). Entrenched mindsets will not win the war when it comes to Web Analytics.

    3) Change will not kill you. If you think about it for a moment, this is different from #2. This is critical because human beings love the known, most fear change, and a few can't see future opportunity because they can't or don't want to change. Yet for the foreseeable future the only constant in the web measurement space is change – as you build out a team / skills you want to be ready for that.

    4) Critical thinking. From Wikipedia: "Critical thinking consists of mental processes of discernment, analyzing and evaluating. It includes all possible processes of reflecting upon a tangible or intangible item in order to form a solid judgment that reconciles scientific evidence with common sense."
    You want an Analyst, right? Not a Report Writer? You can find critical thinking in a guy flipping burgers at McDonald's or doing advanced statistical analysis. Look for it.

Obviously I am only addressing attitudinal areas above, yet when I look to hire an employee those are the things that I look for. New or Experienced. Young or old. Web newbie or old hand.

I can teach anyone where to press buttons in Omniture or WebTrends. I can teach anyone the definition of bounce rate in HBX or Visual Sciences. It takes a couple of weeks but I can teach you how to create labels in ClickTracks or filter data in Google Analytics.

I can't "teach" you any of the above four requirements.

I have been a Practitioner / Manager / Director on the "do side" of things (vs sitting outside) for ten years, the last few on the web, and I have found people are "hard wired" for the above four. This could be purely because I am not good at teaching, I am not taking that off the table. You tell me what your experience is.

Peer beyond the "fifteen years of WebTrends experience" line in the resume. Look beyond the statement "I have been doing this for twelve years (undertone – I know everything there is to know, now show me where your log files are!)" or "I am new out of college and just twittered my friends and this interview is being live streamed on my myspace page and could you please repeat the question about javascript, I am not sure I understand".


The making of your own expert: A webbie who is flexible, who is not afraid to change and is a critical thinker.

So what was my advice to the blog reader? Is it better to pay more for an experienced person or hire a freshly minted grad (and I am assuming cheaper)?

    "It depends on one important criterion. Do you have a strong web analytics program in your company or a person who can provide mentoring / thought leadership?

    If the answer is yes then get the smart graduate and teach them the new mindset around analytics and how they should approach looking at numbers on the web. They are impressionable, teach them web analytics is not just clickstream but includes qualitative analysis and experimentation and testing and soon BT. Oh and you have Omniture, they'll learn in a week where the buttons are.

    If you don't have someone in your company who can provide true / current thought leadership on web analytics then you are better off paying more and getting someone with the latest thinking and experience. This person can get a grip on what you have and start executing and pointing your company in the right direction. When they have established credibility they can perhaps hire young folks.

    If you don't have someone who can point the ship and provide guidance then any passionate fresh grads you hire will wither."

OK, your turn.

What do you think? Would you have given different advice? What has your own experience been (especially if you have, like me, tried to build out a team of stars who can execute without you)?

Please share your tips, tricks, war stories, critique, brickbats via comments.

