Experimentation and Testing: A Primer

This post is a primer on the delightful world of testing and experimentation (A/B, Multivariate, and a new term from me: Experience Testing). There is a lot on the web about A/B or Multivariate testing, but my hope in this post is to give you some rationale for why testing matters, and then a point of view on each methodology along with some tips.

I covered Experimentation and Testing during my recent speech at the Emetrics Summit and here is the text from that slide:

    Experiment or Go Home:

    • Customers yell about their problems (when they call or see us), they bitch, they rarely provide solutions
    • Our bosses always think they represent site users and they want to do site design (which all of us promptly implement!!)
    • The most phenomenal site experience today is stale tomorrow
    • 80% of the time you/we are wrong about what a customer wants / expects from our site experience

That last one is hard to swallow because, after all, we are quite full of ourselves. But the reality is that we are not our site's customers; we are too close to the company, its products and its websites. Experimentation and testing help us figure out where we are wrong, quickly and repeatedly, and if you think about it that is a great thing for our customers, and for our employers.

Experimentation and testing in the long run will replace most traditional ways of collecting qualitative data on our site experiences, such as Lab Usability. Usability (in a lab, in a home, or remotely) is great, but if our customers like to surf our websites in their underwear, then would it not be great if we could do usability on them while they are in their underwear?

It is important to realize that experimentation and testing might sound big and complex, but it is not. We are lucky to live at a time when there are options available that allow us to get as deep and as broad as we want to be, and the cost is not prohibitive. Three types of testing are most prevalent; the first two are the most common.

A/B Testing: This is usually an all-encompassing category that seems to represent all kinds of testing, but A/B testing essentially means testing more than one version of a web page. Each version of the page is usually uniquely created and stands alone. The goal is to try, for example, three versions of the home page or product page or support FAQ page and see which version works better. Almost always in A/B testing you are measuring one outcome (click-throughs to the next page, conversion, etc). If you do nothing else, you should do A/B testing.

How to do A/B Testing: You can simply have your designers/developers create versions of the page and, depending on the complexity of your web platform, put the pages up and measure. If you can't test them at the same time, put them up one week after the other and try to control for external factors as best you can.
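
The split itself can be sketched in a few lines (the function name and hashing choice are mine, not from any vendor): hash the visitor's ID so a returning visitor always lands in the same group and does not pollute both versions.

```python
import hashlib

def assign_version(visitor_id, versions=("A", "B", "C")):
    """Deterministically bucket a visitor so repeat visits get the same page."""
    digest = hashlib.md5(visitor_id.encode("utf-8")).hexdigest()
    return versions[int(digest, 16) % len(versions)]

print(assign_version("visitor-123"))  # always the same version for this visitor
```

Random assignment spread evenly across versions, plus stickiness per visitor, is what makes the eventual comparison fair.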

    Pros of doing A/B Testing:

    • This is perhaps the cheapest way of doing testing since you will use your existing resources and tools
    • If you don't do any testing this is a great way to just get going and energize your organization and really have some fun
    • My tip: the first few times you do this, have people place bets (where legal) and pick winners; you'll be surprised


    Cons of doing A/B Testing:

    • It is difficult to control all the external factors (campaigns, search traffic, press releases, seasonality) and so you won't be 100% confident of the results (put 70% confidence in the results and make decisions)
    • It limits the kinds of things you can test to fairly simple stuff, and it is usually hard to discern correlations between the elements you are testing
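
On the confidence point: a rough way to put a number on it is a two-proportion z-test on the conversion counts of the two versions. A minimal sketch (the function name and the example numbers are hypothetical):

```python
from math import erf, sqrt

def ab_significance(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-proportion z-test: how sure are we that B really differs from A?"""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

z, p = ab_significance(100, 5000, 130, 5000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Even a test like this cannot account for uncontrolled external factors (campaigns, seasonality), which is exactly why decisions should be made with a healthy margin rather than treating the numbers as gospel.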

Multivariate Testing: Currently the cool kid on the block; lots of hype, lots of buzz. In A/B testing above you had to create three pages. Now imagine "modularizing" your page (breaking it up into chunks) and being able to have just one page but dynamically change which modules show up on the page, where they show up, and to which traffic. Then imagine being able to feed that into a complex mathematical engine that will tell you not only which version of the page worked but the correlations as well.

For example, for my blog I can create "modules" / "containers" for the core page content, the top header, and each element of the right navigation (pages, categories, links, search, etc). In a multivariate test I could move each piece around and see which combination worked best.
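
To see why an engine is needed, count the combinations. Even this small, hypothetical set of module variants multiplies out quickly:

```python
from itertools import product

# Hypothetical variants for three page modules.
modules = {
    "header": ["original", "compact"],
    "content": ["full-width", "two-column"],
    "right_nav": ["categories-first", "search-first", "links-first"],
}

combinations = list(product(*modules.values()))
print(len(combinations))  # 2 * 2 * 3 = 12 distinct page versions
```

With plain A/B testing you would have to hand-build all 12 pages; the multivariate engine assembles them on the fly and attributes the outcome back to each module.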

    Pros of doing Multivariate Testing:

    • Doing Multivariate turbocharges your ability to do a lot very quickly for a couple of reasons
      • There are free tools like the Google Website Optimizer, or paid tools like Offermatica, Optimost, and SiteSpect, that can help you get going very quickly by hosting all the functionality (content, test attributes, analytics, statistics) remotely (think ASP model).
      • You don't have to rely on your IT/Development team. All they have to do is put a few lines of javascript on the page and they are done. This is an awesome benefit because most of the time that is a huge hurdle.
    • It can be a continuous learning methodology

    Cons of doing Multivariate Testing:

    • The old computer adage applies, be careful of GIGO (garbage in, garbage out). You still need a clean pool of ideas that are sourced from known customer pain points or strategic business objectives. It is easy to optimize crap quickly.
    • Website experiences for most sites are complex multi-page affairs. For an e-commerce website it is typical for the path from entry to a successful purchase to be around 12 to 18 pages; for a support site even more (as we thrash around to find an answer!). With Multivariate you are only optimizing one page, and no matter how optimized it is, it cannot play an outsized role in the final outcome, just the first step or two.

Most definitely do Multivariate but be aware of its limitations (and yes the vendors will tell you that they can change all kinds of things throughout the site experience, take it with a grain of salt and take time to understand what exactly that means).

Experience Testing: A new term that I have coined to represent the kind of testing where you have the ability to change the entire site experience of the visitor using the capabilities of your site platform (say ATG, Blue Martini, etc). You can change not just things on one page, or say the left navigation, or a piece of text on each page; you can change everything about the entire experience on your website.

For example, let's say you sell computer hardware on your website. With this methodology you can create one experience where your site is segmented by Windows and Macintosh versions of products, another where the site is segmented by Current customers and New customers, and another where the site is purple with white font, no left navigation, and smiling babies instead of product box shots.

With experience testing you don't actually have to create three or four websites; rather, using your site platform you can easily create two or three persistent experiences on your website and see which one your customers react to best. Since whichever analytics tool you use collects data for all three, the analysis is the same as what you do currently.

    Pros of doing Experience Testing:

    • This is Nirvana. You have an ability to test on your customers in their native environment (think underwear) and collect data that is most closely reflective of their true thoughts.
    • If your qualitative methods are integrated you can literally read their thoughts about each experience.
    • You will get five to ten times more powerful results than with any other methodology

    Cons of doing Experience Testing:

    • You need to have a website platform that supports experience testing, (for example ATG supports this)
    • It takes longer than the other two methodologies
    • It most definitely takes more brain power

Experience testing is very aspirational, but companies are getting into it, and sooner rather than later the current crop of vendors will start to expand into that space as well.

Agree? Disagree? Counter claims? Please share your feedback via comments.

Comments

  1.

    Pretty good post; I commented on it at Webmetricsguru.com. Wondering how Analytics will need to adapt to Personalization.

    We're coming to the long tail of search where eventually everyone will see different search results for the same query (based on who they are and where they're located).

    In Web Analytics, if a site shows a different page (one created for the customer on the fly) based on who they are – how will the Analytics represent those variations?

  2.

    Marshall: Thanks for your comments….

    In Web Analytics, if a site shows a different page (one created for the customer on the fly) based on who they are – how will the Analytics represent those variations?

    Many different analytics tools can now handle this quite easily. The challenge is that we should have the foresight to know what we want to track. Typically you can set either cookie values or url parameters that an analytics tool will automatically pick up, and then you can analyze.

    For example you can come to http://www.kaushik.net/avinash and based on the keyword you came on (or campaign) you could see:
    http://www.kaushik.net/avinash?key=marshall
    http://www.kaushik.net/avinash?key=marshall-is-great

    Now you might see different content on the fly depending on what "key" you came in on, and my analytics tool can measure every KPI imaginable now that it has captured the key value.
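
The parameter capture described here is a one-liner in most environments; a sketch in Python (the function name is mine):

```python
from urllib.parse import parse_qs, urlparse

def extract_key(url):
    """Pull out the tracking 'key' parameter an analytics tool would record."""
    params = parse_qs(urlparse(url).query)
    return params.get("key", [None])[0]

print(extract_key("http://www.kaushik.net/avinash?key=marshall-is-great"))
```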

  3.

    Very fine post, Avinash. Thank you for the thorough explanations.

    As you've asked for some feedback, here goes:

    Experimentation and testing in the long run will replace most traditional ways of collecting qualitative data on our site experiences such as Lab Usability. Usability (in a lab or in a home or remotely) is great but if our customers like to surf our websites in their underwear then would it not be great if we could do usability on them when they are in their underwear?

    Both lab testing (i.e. for usability) and transparent web testing (A/B, multivariate, etc.) have their places. Lab testing has many benefits, among which is the ability to reveal UI/usability issues that are difficult to quantify with web analytics. Likewise, A/B and multivariate testing are very strong quantitative tools that reveal user preferences that simply cannot be measured in a testing lab.

    For example, asking a user in a lab test to decide which promotion is more appealing is unreliable for many reasons (the Hawthorne Effect, for one). But ask users "in the wild" by presenting offers in an A/B test where they don't know they're being tested, and let them vote with their wallet. This is the crux of what marketing experiments reveal that qualitative usability research cannot.

    It is difficult to control all the external factors (campaigns, search traffic, press releases, seasonality) and so you won’t be 100% confident of the results (put 70% confidence in the results and make decisions).

    There's nothing inherently more or less difficult to control with an A/B test vs. multivariate test. The key thing is to randomly assign visitors to the A group vs. the B group.

    In A/B above you had to create three pages. Now imagine “modularizing” your page (break it up into chunks) and being able to just have one page but change dynamically what modules show up on the page, where they show up and to which traffic.

    Another way to look at this: if A/B testing focuses on one site element (i.e. a product image), then multivariate testing focuses on multiple elements (product image plus headline). And a quick terminology note… in the realm of experimental design, these are commonly called multifactor tests. So when you see "multivariate" or "multivariable", or "multifactor" in the context of web testing vendors, it's useful to note that they essentially all describe the same process.

    You don’t have to rely on your IT/Development team. All they have to do is put a few lines of javascript on the page and they are done. This is a awesome benefit because most of the times that is a huge hurdle.

    This has nothing to do with a test being multivariate, and everything to do with the innovations provided by the vendors you've mentioned. More to the point: the hardest thing about A/B or multivariate testing is switching content – showing version A1B1 of a page to one user, while simultaneously showing version A1B2 of that same page to another user. The more factors you're simultaneously testing (varying), the more crucial the content switching becomes.

    You don’t have to rely on your IT/Development team. All they have to do is put a few lines of javascript on the page and they are done.

    Well, yes and no. In some organizations, instrumenting pages with javascript each time you want to test a new area requires IT/development involvement. [Note that I'll freely admit my bias here: my company's product is the only one that truly takes IT out of the equation because it doesn't require javascript tagging.]

    Finally, regarding Experience Testing, what you are articulating sounds a lot like personalization with an experimental component thrown in. Besides ATG Dynamo, Microsoft Commerce Server and IBM WebLogic also support simplistic forms of this.

    Thanks again Avinash, and welcome to blog-o-sphere! :)

    Dave @ SiteSpect

  4.

    Dave,

    Thank you for vastly enriching the value of my original post by adding your comments. I have learned more, as I am sure have other readers.

    I have personally and actively observed the Hawthorne Effect and hence I am biased towards testing and specifically Experience Testing (which is not so much personalization as putting randomly assigned people into two or more "controlled and different" experiences on the website and seeing which experience performs better against preset goals).

    It is not perfect, but we can measure not only conversion and revenue type stuff but also qualitative measures like Task Completion and Satisfaction for these experiences. In my mind it is a great tradeoff between bringing someone in and giving them $250 to run through a site vs. doing it without the participant knowing.

    By no means is Lab Usability over, not by any stretch of the imagination, but as you can see I am giddy at the possibilities of Experience Testing and the learnings that can come from that.

  5.

    Sorry, just noticed my mistake… in my very last paragraph I should have said "IBM WebSphere", not WebLogic (sorry BEA).

    Dave

  6.
    Mike Samec says:

    Don't forget that with A/B testing you can test specific elements of a page rather than an entire page. For example, test two variations of a headline: send a % of people to version A and divert a % of traffic to version B. The rest of your traffic is your control group.
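
That percentage split can be sketched in a few lines (the variation names and weights here are hypothetical):

```python
import random

SHARES = {"headline_a": 0.2, "headline_b": 0.2, "control": 0.6}

def pick_variation(shares=SHARES):
    """Divert a fixed share of incoming traffic to each headline variation."""
    r = random.random()
    cumulative = 0.0
    for variation, share in shares.items():
        cumulative += share
        if r < cumulative:
            return variation
    return "control"  # guard against floating-point rounding

counts = {name: 0 for name in SHARES}
for _ in range(10000):
    counts[pick_variation()] += 1
print(counts)  # roughly 2000 / 2000 / 6000
```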

  7.

    Avinash,

    Wonderful post. Thank you for your insights. The tools for analysis and observation have improved greatly over the last several years. What is available now does not compare to the rotators, Javascript trackers and numerous spreadsheets that I was using years ago.

    In doing this conversion optimization work, we found that the segmentation that you speak of is a subset of "noise reduction" in terms of test design.

    As the tester observes and refines the "conversion conversation", the natural extension is to create a user-defined channel that serves the visitor's concerns and needs. When that match is closely correlated, the user's actions become more predictable.

    Your term "Experience Testing" is a perfect articulation of what I termed the "conversion conversation".

    I am enjoying your Blog and thinking.

    David

  8.

    Avinash,

    You should talk about the limitations of Javascript and how only an installed solution like the one that Memetrics offers can test more than a few attributes on a simple page. Why not test Paid Search, Direct Mail, or anything with a dependent variable?

  9.
    Shashank says:

    I am facing problems setting up experiments on Google Website Optimizer. :(

    Even a very small conversion rate optimization test is proving difficult to execute.

    Previously I created around 5-6 experiments for our website using Website Optimizer, but for the last week I have not been getting the required combinations when setting up experiments.

    I add the scripts to the test and conversion pages and also create the variations, but at the last step, when I preview, I don't find the desired combinations.

    Please help; I was setting up experiments successfully until a week ago, but now, doing the same things I always did, I am not getting the desired combinations.

    Thanks in advance.

    Regards,

    Shashank

  10.
    Matt Gershoff says:

    Good Blog,

    What is the difference between A/B testing and what you are calling Experience testing? It sounds like for the experience test you are creating a mega-variable, let's call it 'site', which is a bundle of attributes (pages, images, etc.). I don't see how the actual testing is different from A/B.
    For the multivariate testing we are looking at attributes that sit in several dimensions (think of the corner points in a hypercube). What we are concerned about is that there will be interaction effects over the variables we want to test – so that we need to know all of the values of each variable when determining the results.
    One unmentioned ‘con’ of this approach is that it is very costly from a data perspective – we need lots of observations to ‘fill up’ that hypercube to make robust estimates. One way around this is to work with fractional factorial design testing – where we make assumptions about how many variables we need to include at any one time – this has the effect of collapsing or aggregating some of the corners of our cube so that we need fewer estimates.

    Thanks again

    Matt Gershoff
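
Matt's fractional factorial point is worth a small illustration. In the classic 2^(k-1) half-fraction you enumerate all settings of k-1 two-level factors and derive the last factor from their product (the defining relation), halving the number of page versions you need to serve. A sketch, not taken from any testing tool:

```python
from itertools import product

def half_fraction(k):
    """2^(k-1) half-fraction design: the last factor is the product of the others."""
    runs = []
    for levels in product([-1, 1], repeat=k - 1):
        last = 1
        for level in levels:
            last *= level
        runs.append(levels + (last,))
    return runs

design = half_fraction(3)
print(len(design))  # 4 runs instead of the full factorial's 8
```

The price, as Matt notes, is an assumption: in this design main effects are aliased with two-factor interactions, so you trade observations for the ability to separate every interaction.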

  11.

    Hi Avinash.

    Was looking for a quick and dirty description on multivariate testing. This was exactly what I was looking for :) Thank You.

    Vivek

  12.
    apageor2 says:

    Avinash,

    I came across this web site through another link and have been reading for the past 20 minutes through the information and advice you offer on building web sites and pages. I have been working solo for the past 10 years while gaining my degrees; not an easy thing to do, but possible.

    As a Unix programmer, I had to follow a certain protocol and test code before it was permitted to go live. At the time I was writing code in Unix or MUMPS; however, it was still a necessary matter. I still follow the same habits with my web pages before sending them to my clients.

    The company which I was working for used multivariate I believe. Now that I am working for myself, I am following experience testing. Great post! Best wishes for 2009!

    Sue

  13.
    Edu Barredo says:

    Hello Avinash, I consider myself a reader of your blog but I'm always discovering your "old" amazing posts.

    Can you give more info about the "Experience Testing" concept or include some related links?

    Thanks Avinash, and keep up with your great work!

  14.
    Jay L. says:

    At our agency we're doing some really interesting A/B/N and multivariate testing of landing pages, ecommerce flows, and banner creative, among other digital media.

    One of the ongoing debates when we execute a split test is whether or not to keep content elements the same. Meaning, vary up the design concepts dramatically but use the exact same headlines, products, imagery, etc. to isolate the design variable.

    Is this a necessary practice, or can a winning design be deemed the winner regardless of whether the elements are consistent? My belief is that the greatest change promotes the greatest opportunity for results.

  15.

    Jay: I think you might want to use Multivariate tests for the goal you have set for yourself, rather than A/B tests.

    In multivariate tests you are able to change anything about a page (headlines, product images, layout, etc).

    As the test runs the math and regressions that the tool will do (say Google's Website Optimizer or Test & Target or Sitespect) will compute the impact on the outcome of each element you are trying.

    In the end not only will you know which page works the best, it will tell you the contribution of each element (and, this is cool, it does happen that sometimes a great headline that by itself scored well might not even be on the winning combination!).

    You can of course do this with A/B/N tests, it will just take you far too long.

    Avinash.
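
The per-element contribution mentioned above can be estimated, at its simplest, by comparing average conversion with an element on versus off. A toy simulation with made-up effect sizes (nothing here comes from a real tool or dataset):

```python
import random

random.seed(0)

# Simulate a test of two page elements, each coded 0/1: headline and image.
rows = []
for _ in range(20000):
    headline = random.randint(0, 1)
    image = random.randint(0, 1)
    p = 0.05 + 0.01 * headline + 0.03 * image  # invented effect sizes
    rows.append((headline, image, random.random() < p))

def element_lift(rows, index):
    """Average conversion with the element on, minus with it off."""
    on = [converted for *elements, converted in rows if elements[index]]
    off = [converted for *elements, converted in rows if not elements[index]]
    return sum(on) / len(on) - sum(off) / len(off)

print(element_lift(rows, 0))  # headline lift, close to 0.01
print(element_lift(rows, 1))  # image lift, close to 0.03
```

Real multivariate engines do this with regressions so that interactions and shared traffic are handled properly, but the idea is the same: attribute the outcome back to each element.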

  16.
    muirskate says:

    Great article!

    I'm a UI designer for a website and A/B split testing has given me such great insight into how our customers think online. You may be right saying that 80% of the time what I think when designing the layout is based on my own experience and what I want people to think or what I expect people to think.

    For example, here's a simple situation where it proved my thinking faulty. When someone is viewing a longboard deck they have the option of buying the deck or buying a complete board with all the components (trucks, wheels). When designing the button I thought "the button should have a call to action like Build a Complete Skateboard". We tested this thinking with a simple button that says "Complete Longboard". In the end the button without the call to action got more conversions.

    I was surprised, and thought that I could learn a formula from this, but in reality I can't pretend to figure people out. So for sure A/B testing is going to take websites to a new level for a lot of people, and it's free (Google's product).

    Maybe one day Experience Testing will be an option, but as you said, that requires a lot more.

  17.

    It's been a long time since I even paid attention to A/B split testing. I am glad your course has reminded me of something so valuable which I have overlooked for a couple years now. This will let the customer make the decision. I use this method in Google Adwords by making 2 ads and let them compete against each other. Once I have a winner, I delete the loser and make another ad to compete with the winner. In the end A/B split testing will let the customer decide which ad or webpage is better.

  18.
    Savitha A says:

    Avinash,

    I read through the blog and comments. I have one question, hope you can clarify it.

    I'm running a campaign for my product to incentivize people to renew their license. I have two flows for the same campaign/offer. However, flow A has one design theme (with a black background) and flow B has another design theme (with a white background). I am also keeping the content and layout different in both flows.

    Ultimately, I want to know which of the two design themes and flows are working better in terms of renewal rates and why?

    Should I be doing experience testing or multivariate testing in this case?

    Please help based on your past experience.

    Thanks,
    Savitha

    •

      Savitha: The scenario you are describing is a standard multivariate experiment.

      If it is too complicated to tease out the causality factors, you can start with an A/B test, which will just tell you which one works. Then over time you can work with other tests to tease out causality for individual changes.

      -Avinash.

Trackbacks

  1. Experimentation and Testing: A Primer by Avinash Kaushik…

    Avinash Kaushik wrote a very good post on the different types of testing you can do to pull metrics from a website.   Here's my take -  Avinash thinks 80% of the time we are wrong about what a customer wants……

  2. [...] Before we begin, you might want to refer to this very enlightening article on Experimentation and Testing. [...]

  3. Web Analytics…

    Phoenix.edu Currently Tracking Weekly visitor counts (including users passing through to eCampus) Daily phoenix.edu lead counts Daily total web lead counts Daily share (phoenix…….

  4. Web Design and the Scientific Method…

    Some of the Science Fair experiments effectively demonstrated outcomes that were non-intuitive; in the same way, the analysis of user behavior can highlight for us the sometimes unexpected and non-intuitive impact of design choices.

    A good scientist assumes nothing and tests everything! Should a good web designer do any less?

    —————-

    Update: One of my favorite blogs on this topic, Occam's Razor by Avinash Kaushik, has a wonderful post on the topic of web analytics and testing: Experimentation and Testing: A Primer . I highly recommend reading this post if you're interested in this area.

  5. [...] So, what are some good testing resources? How about Avinash's blog. He's got a great post entitled Experimentation and Testing: A Primer. Start there and then head over to FutureNow. They've got some great books about how to actually test. You can find two great starter guides at their online store. If you're already familiar with the testing process then check out the Website Optimizer help section and start reading. [...]

  6. Jim Sterne: Ask And Ye Shall Receive…

    ….

    I believe this. It's in my bones. Sure, customers are not going to invent new, breakthrough stuff. They don't know they need an iPod until everybody else has one. But what about the 99.999% of the rest of it? They do know how they like to buy. They do know how they like to shop. They know how they like to compare products and how they like to return products.

    Avinash Kaushik is one of the most insightful and intelligent web analysts I've ever met. On his excellent blog, Occam's Razor, Avinash said it best: "80% of the time you/we are wrong about what a customer wants / expects from our site experience."

    Avinash describes his work at Intuit as dealing with website experience, behavior and outcomes. Outcomes are the goals the company sets – selling software. Behavior is all about the clicks. But, says Avinash, if he only had one of the three to work with, it would have to be the customers' direct feedback and customer satisfaction.

    This is from his post Overview & Importance of Qualitative Metrics:

    ….

  7. [...] A news article on Ajaxian led me to this post about this article (PDF) from Microsoft. It talks about A/B testing and how important it is to success on the web. The article even cites the legendary Avinash Kaushik whom I had the pleasure of working with at Intuit. The best line in the article is: "The fewer the facts, the stronger the opinion" — Arnold Glasow. Avinash is credited with the term HiPO, which stands for Highest Paid Opinion. [...]

  8. Metrics and Reporting…

    What do we want to answer? What are the Media Group's reporting needs?……

  9. [...]
    Using Web Analytics To Further Identify Site Conversion Improvements

    Although I'm a big fan of using web analytic data to uncover a vast range of improvement possibilities with site content and conversions, for this post the most useful thing I can do is direct you to the following pages on the excellent blog of Avinash Kaushik dealing with all things analytics:

    * Experimentation and Testing: A Primer
    * Excellent Analytics Tip#5: Conversion Rate Basics & Best Practices
    * Excellent Analytics Tip #8: Measure the Real Conversion Rate & “Opportunity Pie”
    [...]

  10. [...] The new division's going to be headed up by our own Russ Glass, and we're actively hiring for his team (which will be located in San Francisco and Waltham – yes, yes, I know, beautiful, bucolic, slightly post-industrial Waltham.  Try and contain yourselves.)  Sooo, if you're interested, we're immediately looking for a vice president of advertising, a senior product manager for our ad products, a product manager to focus on multivariate testing (which is actually pretty cool stuff – check this out if you're curious).  We'll be looking to add some advertising sales reps fairly soon (no job description yet, but if you're in the field you probably get it), and Ad Ops Manager, etc, etc…. [...]

  11. [...] LinksIan Ayers highly persuasive Supercrunchers is a book that attests to the power of split testing, and I thoroughly recommend it. Avinash Kaushik is a prominent analytics author and blogger, and has a good primer on experimental testing. [...]

  12. [...]
    3. Landing Page Quality:

    Your referrals are relevant and the visitor expectations are matching with what you offer. The quality of your landing page has to be questioned then.

    There are several methods to study landing page quality; A/B testing and multivariate testing are examples. You can also get support from heatmap tools such as Crazy Egg, or the site overlay option in Google Analytics, for the same study.
    [...]

  13. [...] At PBworks, we take our data seriously.  So it should be no surprise to learn that we use A/B testing techniques to aid our product and website development decisions.  Having a web-based product means that we can quickly learn what our customers like and what they don't like and make changes accordingly.  If you're not familiar with A/B testing, Avinash Kaushik has a great primer. [...]

  14. [...] Concluding, both types of testing have their own advantages and disadvantages. Each can be a perfect technique, depending on the needs of the website. They should always go hand-in-hand, using one to test completely different designs and the other to optimize the current design. The important thing is to understand that testing is not a one-time effort. It is an ongoing exercise that should be part of the mindset of an organization. As Avinash Kaushik once wrote in his blog, Experiment or go home! [...]

  15. [...]
    However, analytical marketers look at this investment differently. Their first step would be to create a test, budget and monitor the results of the advertising. They would identify the cost per new user and then compare it against other marketing channels.

    Does online advertising work as well or better than other marketing channels? If it does, then it’s worth increasing the investment. Otherwise allocate the resources elsewhere.
    [...]

  16. [...]
    Avinash Kaushik – Experimentation and Testing: A Primer
    [...]

  17. [...]
    Came across a nice post about experiment and test in UX.
    Experimentation and Testing: A Primer
    [...]

  18. [...]
    Avinash Kaushik, Googler and author of Web Analytics: An Hour a Day, wrote in his Experimentation and Testing primer that “80% of the time you/we are wrong about what a customer wants.”
    [...]
