Build A Great Web Experimentation & Testing Program

It is a crime against our customers if we don’t have a robust web Experimentation & Testing program in our respective companies. That is a bold statement but a good testing program is truly game changing in multiple ways.

(If you have not read the Experimentation and Testing Primer post I recommend that as foundational material for this post.)

The good news is that Experimentation and Testing is increasingly being accepted as something any decent web program should constantly be engaged in. The wonderful thing is that now technology makes it increasingly easy to test your ideas and insights, cheaply and quickly.

We are not limited by the long release cycles on our websites or “IT” or “Marketing” or other such usual hurdles. In the simplest model a simple JavaScript tag is placed on the site and then, using your vendor’s system, you can run the tests you want and measure success without needing extra work by your developers or IT staff.

Yet in many companies testing is not quite ingrained in the culture or we are stuck in the simplistic A/B or bare bones Multivariate tests (see the link above for details of what these are).

So what does it take? Here, IMHO, are seven recommendations to build a great experimentation & testing program:

# 7 First get over your own opinions:

It really sucks but this is of paramount importance. If you are running the program it is important that first you get over yourself. If you are going to convince everyone else that testing and validating opinions should be a way of life, you should first truly drink the kool-aid. (IMHO the more entrenched the opinion in a company the more likely it is that it is wrong.)

When I go out and talk about testing I make the case for it by sharing the most recent three or four times testing has proven me to be disastrously wrong. I share the worst stories, the big losers. The point I am making is that we don’t really represent the customers and validating opinions is great.

Bottom-line: You will get a receptive audience and change minds much faster, because you are willing to “open the kimono.”

# 6 State a hypothesis, not a test scenario:

Most often people will come to you and say: I want to run a test of different box shots, can you swap this image with text, we should try different promotions, etc.

The golden rule is: Always start with a hypothesis, not test details or a test scenario. Turn to the person and ask, “What is your hypothesis?”

It is amazing how many times people are taken aback by that. Mostly because we as humans don’t want to put that much thought into anything.

The magic of this question is that it forces people to take a step back and think. They might come back to you and say “my hypothesis is that images of people are much more powerful at making a connection than current box shots hence we will have a higher engagement score (or sales or whatever)” or “my hypothesis is that visitors to the site are more interested in user generated content than our company propaganda”.

Bottom-line: Two great outcomes: 1) You can now contribute to the creation of the test, rather than just starting with an “I want you to do this.” 2) In every well crafted hypothesis is a clear success measurement (how we’ll know which test version wins). If you don’t see a success measurement in the hypothesis then you don’t have a well thought out hypothesis.

# 5 Create goals/decisions beforehand:

Another big mistake that is often made is that even if the success metric is known we don’t bother to set parameters to judge the “victory” by.

Decide what the success metrics for the test are before you launch and don’t forget to create a goal for those metrics. So you are launching a test to improve conversion rate. Great. By how much do you think you’ll improve the conversion rate?

Frequently that thought has not been put in up front but it is extremely critical for these two reasons:

1) It forces you to think, to do some research as to what the current trends in those success metrics are and go through a goal creation exercise for your test.
2) The awesomely cool outcome of this is that you’ll be able to judge if you should do the test in the first place.
So if testing dancing monkeys on your home page will only improve conversion rate by 0.001% (your goal) then maybe that test is not worthwhile and you should think of something more powerful. On the other hand, if you do $10 billion in sales on your website then even a 0.001% lift will endear you to your company leader / VP / CEO / guy-gal with bigger title than yours.
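As a rough illustration of that arithmetic, here is a minimal Python sketch. It assumes the 0.001% is a relative lift and that revenue scales in proportion to conversion rate; both are simplifying assumptions of this sketch, and the function name and the $1 million comparison site are hypothetical.

```python
def incremental_revenue(annual_revenue: float, relative_lift_pct: float) -> float:
    """Extra revenue from a relative lift, assuming revenue scales
    linearly with conversion rate (a simplifying assumption)."""
    return annual_revenue * relative_lift_pct / 100.0


# The $10 billion site from the text versus a hypothetical $1 million site.
for label, revenue in (("$10 billion site", 10_000_000_000), ("$1 million site", 1_000_000)):
    gain = incremental_revenue(revenue, 0.001)
    print(f"{label}: a 0.001% lift is worth about ${gain:,.0f}")
```

Under these assumptions the same 0.001% goal is worth about $100,000 on the large site and about $10 on the small one, which is why the goal has to be judged against your own numbers before the test is launched.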

Bottom-line: This will push the thought envelope and at the same time encourage creation of tests that will yield more powerful customer experience improvements on your site.

# 4 Always test and validate for “multi-goal”:

Almost all testing is “single goal based”, especially the current swath of multivariate testing companies. Put this JavaScript box on this and that page, put this JavaScript tag on the goal page, crank the lever, sing happy birthday, eat cake and here are the results.

Life and customer experiences are significantly more complex. Visitors come to your website for multiple purposes, so if you use the current multivariate or other such testing tools, work to integrate other tools that will allow you to measure the impact of your test on all those other purposes.

Simple example of multiple purposes: Visitors come to your home page to buy, to find jobs, to print product information, to read your founder’s bio, to find your tech support phone number etc. If you only solve for conversion rate you might be majorly and negatively impacting your customers. Do you know whether you are, for the tests you are running?

Simple example of tools integration: You’ll get statistical significance and single goal success from your testing vendor. But with a little smarts you can integrate your testing parameters with your survey tool – say ForeSee (so you can measure conversion rate and Customer Satisfaction and get open ended customer feedback in each test version) or you can integrate testing parameters with your clickstream tool – say ClickTracks (so you can measure conversion rate and Customer Satisfaction and Click Density and Funnel Analysis for each test version). [See Disclaimers & Disclosures.]

Bottom-line: In the first few months measuring “single goal” will work and happiness will prevail. Only by moving to truly multi-goal will you be able to make the most optimal decisions, and in turn create a program in your company that will sustain and be a long term competitive advantage.
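To make the multi-goal idea concrete, here is a minimal sketch in Python. The goal names, the `VariantResult` type, the `winners_by_goal` helper and all the counts are hypothetical, invented purely for illustration; they do not come from any vendor's tool. It only compares raw completion rates per goal and says nothing about statistical significance.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VariantResult:
    name: str
    visitors: int
    goal_counts: Dict[str, int]  # goal name -> visitors who completed that goal

    def rate(self, goal: str) -> float:
        """Share of this variant's visitors who completed the goal."""
        return self.goal_counts.get(goal, 0) / self.visitors


def winners_by_goal(variants: List[VariantResult]) -> Dict[str, str]:
    """For each goal, name the variant with the highest raw completion rate."""
    goals = sorted({goal for v in variants for goal in v.goal_counts})
    return {goal: max(variants, key=lambda v: v.rate(goal)).name for goal in goals}


# Hypothetical counts, made up for this example.
variants = [
    VariantResult("A", 10_000, {"purchase": 200, "support_lookup": 900, "job_search": 300}),
    VariantResult("B", 10_000, {"purchase": 260, "support_lookup": 500, "job_search": 310}),
]

for goal, winner in winners_by_goal(variants).items():
    print(f"{goal}: best variant is {winner}")
```

In this made-up data version B wins on purchases while version A is clearly better for visitors looking up support information, which is exactly the kind of trade-off a single-goal report would hide.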

# 3 Accept “simple” / “silly” initial tests, but remember “if you pay peanuts you’ll get monkeys”:

Testing programs are run by people who are really really smart, you : ). Usually though they have not truly followed #7 above. When business users come to them with simple tests those are scoffed at and scenarios of deeply complex “Albert Einstein” tests are provided. This is the wrong thing to do in early stages of the program.

For the first little while we want to win over fans, we want to achieve mindset shifts. The best way to accomplish this is to accept and run the first few simple tests that originated from the minds of your users. You will thus get them involved in testing their ideas and, win or lose, you have shown them the power of the tool / methodology.

There is an important thing to remember though: your end goal should be to have the mindset shift proceed towards the direction of doing complicated “big” tests, ones that put a lot bigger things on the line and not just play with the hero image on the home page.

Bottom-line: In early stages test the ideas of your users, no matter how small or big, but drive your program to doing more complex and fundamental tests over time.

# 2 Create a fun environment:

We often forget that this is supposed to be fun. What other situations can you think of where you can “play” with your customers, and they never find out? That is awesome. Testing should be fun, it should be enjoyable, it should stretch our brains.

One really simple recommendation is to get everyone involved in the test to bet on the outcome (only in US states where betting is legal). Everyone loves betting and since these might be their tests they might really like the odds of winning. Stay with something small, one dollar for every prediction of the success metric or which version of the test will win.

What happens in reality is throughout the test duration people will keep checking which version is winning, thus learning complex measurements. After the test winner is declared you can bet the “losers” will want to pore over every success metric and its definition and computation until they realize that they really did lose, but they just learned so much more than you could have taught them in any other way. : )

Bottom-line: Learning should be fun. (Now where else have I read that? : )

# 1 Two absolutes you need: Evangelism & Expertise:

The number one recommendation is that you need two key people in any successful program, one Testing / Experimentation Evangelist and one Testing Expert.

Most people don’t yet get the testing religion; to convert them you will need an evangelist. Not just someone who “gets it” but someone who, through their communication skills, pure love, business understanding and position, can go out there and preach and articulate the value proposition. If this person does not know what “r squared” is that is ok.

To run your program and actually execute much of the above you need a Testing Expert. Someone who is steeped in metrics and data and complex computations and has enough business expertise to look at tests and provide good feedback and even help generate great value add new ideas (and push back politely on the non value add ones over time). This person should meet #2, #4 & #5 Criteria from the Great Analyst post.

This position is important because the testing world is young on the Web and, even as the vendors try to convince you of how much better their tool is than the other guys’, the biggest challenges are great ideas to test and accurate success measurement.

There are many good vendors (we use Offermatica for multivariate testing). Differences between tools will not be your limitation for quite some time (testing ideas, culture, program sophistication, implementation etc will be). So if you like Dave Morgan because he posted on your blog then go with SiteSpect : ). But don’t forget your Expert.

Agree? Disagree? Have alternative recommendations? Got questions? Please share your feedback / critique via comments.


11 thoughts on “Build A Great Web Experimentation & Testing Program”

Sanjay Smith
July 12, 2006 at 12:50
Excellent post as usual Avinash, thanks for sharing these tips. After doing a whole lot of “cool tests” our company is struggling to find what value has been added to the bottom line from all the money that has been spent. As I read your post I can already see key steps that we missed in our execution. Your recommendations are insightful, if a few months late for us :).
Edward O'Meara
July 13, 2006 at 08:07
Avinash,
Yes, yes, yes, yes, yes, yes, and yes. Great thoughts. We must wonder why the web metrics “companies” and web research “leaders” only understand #2 and #3…
I sense it is because the blogosphere is full of “Type II” evangelists motivated by their web 1.0 scars and emboldened by their Public Relations degrees!
Dave Morgan @ SiteSpect
July 13, 2006 at 15:46
Hey Avinash,
A great post all around. There’s been a lot written about the mechanics of testing and “things to try”, etc., but this is the first solid article I’ve seen about the cultural side. So kudos to you! :) And, of course, the feedback:
Decide what the success metrics for the test are before you launch and don’t forget to create a goal for those metrics. So you are launching a test to improve conversion rate. Great. By how much do you think you’ll improve the conversion rate?
This also provides an opportunity for estimating the sample sizes required to reach a statistically significant result. Why do you care? Because it’ll guide you as to what tests may be feasible to run within a time constraint vs. those that can’t.
For example, if your goal is to go from 2.0% to 2.5% CTR from a landing page (a 25% lift) you’ll need a sample of ~30,000 visitors. If you run that test to 30k visitors and don’t reach the goal, stop the test and move on.
But perhaps counter-intuitively, the higher the goal (change in response rate), the smaller the required sample. So if our goal is instead to go from 2% to 3% (a 50% lift) we actually only need ~8,000 visitors! Hmm. So we can validate that hypothesis in about 1/4 of the time.
So just ask yourself – what is the smallest change that would have a material impact on our business goals? Is it 25%? 50%? more? Answering these questions will help you decide which tests are feasible given the available time — and when to “stop” a test because the success criteria wasn’t met.
[n.b. a sample size calculator ought to be part of everyone’s testing toolkit – if your testing solution doesn’t have one, just google for “sample size calculator” for some free web-based tools.]
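For readers who would rather script the calculation, here is a minimal sketch of one common sample size formula for comparing two proportions, using only the Python standard library. The comment does not say which settings produced its figures, so the two-sided 5% significance level, 80% power, two equal-size test versions and the normal approximation are all assumptions of this sketch, and `visitors_per_variant` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(p_base: float, p_goal: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in EACH test version to detect a change
    from p_base to p_goal (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # critical value for power
    variance = p_base * (1 - p_base) + p_goal * (1 - p_goal)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_goal - p_base) ** 2)


# The two goals discussed above: 2.0% -> 2.5% and 2.0% -> 3.0%.
for goal in (0.025, 0.03):
    n = visitors_per_variant(0.02, goal)
    print(f"2.0% -> {goal:.1%}: {n} per version, {2 * n} in total")
```

Under these assumed settings the totals come out at roughly 27,600 and 7,600 visitors, in the same ballpark as the ~30,000 and ~8,000 quoted above, and they show the same pattern: the bigger the lift you are aiming for, the smaller the sample you need. Different significance or power settings will give different numbers.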
Almost all testing is “single goal based”, especially the current swath of multivariate testing companies …. Life and customer experiences are significantly more complex …. If you only solve for conversion rate you might be majorly and negatively impacting your customers.
Absolutely critical! It’s too easy to get caught up with “one step at a time” improvement. Behavior needs to be measured across multiple response metrics, across multiple visits (where applicable.)
Here’s an example… a user of SiteSpect recently ran a test where he improved landing page click-throughs by 91%. Not bad! But what did visitors do after the click when they saw the actual registration page? Turns out that the “best” landing page actually yielded only minor improvements in registration. Instead, a worse-performing landing page (worse for CTR) yielded the highest lift in the registration form.
So what’s this mean? :) Track multiple behavior and measure multiple processes/goals. Learn how each site element contributes to each goal or process (the multivariate piece). But know that some changes may improve certain response behavior while hindering others.
your end goal should be to have the mindset shift proceed towards the direction of doing complicated “big” tests, ones that put a lot bigger things on the line and not just play with the hero image on the home page.
I agree that testing is an incremental learning process. Start simple and increase your sophistication with each test. But I don’t think it’s a one-way linear procession from simple to complex… as in “once you’ve gone MVT you’ll never go back” ;)
Surely people’s sites change over time, requiring certain areas to be revisited and retested. My own opinion is that sometimes an A/B test is fine (like when you need to choose between several promotions), and you don’t need to redefine the problem (making it more complicated) just because you have the ability to run a multivariate test.
I do think web analysts need to consider testing things that they wouldn’t ordinarily think of, and perhaps this is your gist of “big” tests. I’m talking not just the usual testing fodder of headlines, images, copy, etc. Think outside the box and challenge deeply-embedded site elements. Page layout, navigational structure, style elements (CSS), etc. The sky’s the limit.
# 2 Create a fun environment:
Great! :)
cheers
Dave
Avinash Kaushik
July 14, 2006 at 09:06
Dave: Thanks for an *awesome* comment, you have truly added value to the conversation. It is greatly appreciated by me and our dear blog readers.
just google for “sample size calculator” for some free web-based tools
I did and for benefit of our readers a couple that show up high are:
- UCLA Dept of Statistics: Sample size calculator
- Creative Research Systems: Sample size calculator
But I don’t think it’s a one-way linear procession from simple to complex… as in “once you’ve gone MVT you’ll never go back” ;)
……
I do think web analysts need to consider testing things that they would not ordinarily think of, and perhaps this is your gist of “big” tests.
I completely agree with you, once you go MVT (I would say testing overall) you’ll never go back.
For reasons that are complex people start in the MVT world, try some tests with images and content and then just stay there. Or repeat those kinds of tests on other pages etc.
The thrust of my recommendation was, think bigger and think different and think crazy. Only then will we/you get disproportionately high impact on customer experience and company bottom line.
Thanks again so much Dave for sharing your wisdom.
zyxo
January 29, 2009 at 06:56
Avinash,
your idea of letting people bet on the outcome is great !