Excellent Analytics Tip#1: Compute Statistical Significance

We all wish that our key internal partners, business decision makers, would use Web Analytics data a lot more to make effective decisions. How do we make recommendations / decisions with confidence? How can we drive action rather than pushing data? The challenge is how to separate Signal from Noise and make it easy to communicate that distinction.

This is where Excellent Analytics Tip #1, the first in a recurring series, comes in: leverage the power of Statistics.

Consider this scenario (A):

Offer One Responses: 5,300. Orders: 46. Hence Conversion Rate: 0.87%
Offer Two Responses: 5,200. Orders: 55. Hence Conversion Rate: 1.06%

Is Offer Two better than Offer One? It does have a “better” conversion rate, by 0.19 percentage points. Can you decide which of the two is better with only about 50 orders each? We got 9 more orders from 100 fewer visitors.

Applying statistics tells us that the two conversion rates are just 0.995 standard deviations apart and the difference is not statistically significant. This means it is quite likely that noise is causing the difference in conversion rates.

Consider this scenario (B):

Offer One Responses: 5,300. Orders: 46. Hence Conversion Rate: 0.87%
Offer Two Responses: 5,200. Orders: 63. Hence Conversion Rate: 1.21%

Applying statistics now tells us that the two numbers are 1.74 standard deviations apart and the results are statistically significant at the 95% level (on a one-tailed test, where the cutoff is about 1.65 standard deviations). 95% significance is a very strong signal. Based on this, with a sample of only about 5k visitors and sixty-odd orders, we can confidently predict success.
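If you would rather see the arithmetic than trust a black box, here is a minimal sketch of one common way to get numbers like the ones in scenarios (A) and (B): a pooled two-proportion z-test. This is an assumption about the method, not necessarily the exact calculation the spreadsheet performs, and the function name two_proportion_z is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(orders_a, n_a, orders_b, n_b):
    """Return (z, one_tailed_p, two_tailed_p) for offer B versus offer A."""
    rate_a = orders_a / n_a
    rate_b = orders_b / n_b
    # Pooled rate under the null hypothesis that both offers convert equally.
    pooled = (orders_a + orders_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    one_tailed = 1 - NormalDist().cdf(z)             # is B better than A?
    two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))  # is B different from A?
    return z, one_tailed, two_tailed


# Offer One is the same in both scenarios; only Offer Two's orders change.
for label, orders_two in (("A", 55), ("B", 63)):
    z, p_one, p_two = two_proportion_z(46, 5300, orders_two, 5200)
    print(f"Scenario {label}: z = {z:.2f}, "
          f"one-tailed p = {p_one:.3f}, two-tailed p = {p_two:.3f}")
```

Under these assumptions scenario (A) lands at roughly one standard deviation and scenario (B) at roughly 1.74. Scenario (B) clears the 95% bar on the one-tailed reading but not on the two-tailed one, so it is worth deciding which question you are asking before you run the numbers.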

Powerful benefits to presenting Statistical Significance rather than simply Conversion Rate:

You are taking yourself out of the equation; it is awesome to say “according to the Gods of Statistics, here are the results…”
Focusing on quality of Signal means that we appear smarter than people give us Analysts credit for.
You take the thinking and questions out of the equation. Either something is Statistically Significant and we take action, or it is not Significant and we try something else. No reporting, just actionable insights.

Is this really hard to do?

No! Simply use the spreadsheet below, which comes to us via the exceedingly kind Rags Srinivasan:

Statistical Significance Calculator

[Note: March, 2013: This is the updated version of the calculator. For a bit more context on version 2, please see this comment by Rags.]

In the spreadsheet you get even more bang for your buck. On sheet number one you can apply the 1-tailed or 2-tailed test to your statistical significance calculations. Here are the steps: Choose the test from the drop-down in cell D7. Complete cells B13, C13, B14 and C14 (essentially, how many participants or visitors there were and how many conversions you got). In cell C18 you’ll see whether the results were statistically significant.

In sheet number two, for those of you who are a bit advanced, you can apply the chi-squared test. This test is better suited to very small conversion rates (not unusual on the web). It is a more skeptical test with a higher threshold for differences. The benefit is that small statistical anomalies don’t look like real differences.

When in doubt go with sheet number two, the chi-squared test.
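For the curious, here is a hedged sketch of what a chi-squared test looks like in code. It is the textbook Pearson test on a 2x2 table (converted / did not convert, for each offer) with one degree of freedom and no continuity correction. The spreadsheet's chi-squared sheet may set the problem up differently, so treat this as an illustration of the idea rather than a replica; chi_squared_2x2 is a name invented here.

```python
from math import erfc, sqrt


def chi_squared_2x2(orders_a, n_a, orders_b, n_b):
    """Pearson chi-squared statistic and p-value for two offers."""
    observed = [
        [orders_a, n_a - orders_a],  # offer A: converted, did not convert
        [orders_b, n_b - orders_b],  # offer B: converted, did not convert
    ]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [orders_a + orders_b, total - orders_a - orders_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if both offers shared one conversion rate.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = chi_squared_2x2(46, 5300, 63, 5200)  # scenario (B)
print(f"chi-squared = {chi2:.2f}, p = {p_value:.3f}")
```

Run on scenario (B) from the top of the post, this version gives a p-value above 0.05, which is a concrete illustration of the test being the more skeptical of the two.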

Two small tips:

As a best practice, aim for 95% or higher Confidence. It is not always required, but it is recommended.
“Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital.” –Aaron Levenstein

Agree? Disagree? Not really an Excellent Analytics Tip? Please share your feedback via comments.

84 thoughts on “Excellent Analytics Tip#1: Compute Statistical Significance”

Mark McLaren
May 17, 2006 at 05:41
Thank you for starting your excellent blog. I found you via Robbin Steif’s LunaMetrics Blog.
Regarding the use of standard deviation as a means of interpreting order results. In general, I completely agree about the importance of removing bias from test results as much as possible.
What else do we need to know about the groups involved in the test?
Are they essentially the same group or are they two completely different groups? (I’m assuming you would want to send offers to as many people as possible; hence, they are the same group – less 100 people in the second case.)
How were members of the group(s) selected? Do you need a random sample in order to apply principles of standard deviation? 5,000+ is a good size group from which to draw conclusions, but I take it group members were not chosen randomly.
Reply
Jeff Leong
May 17, 2006 at 09:17
Dear Avinash,
Thank you for sharing your blog. This has been long awaited and definitely worth the time to read – subscribe to.
I’m wondering now, based on this article, how this model plays in with UX decisions when applying A/B tests. We often find minimal differences and often base our decision on the winner between the two.
Is there a way or method of filtering noise when the difference is minimal? Assuming all multivariate elements are correctly in place?
Congratulations on the great blog, the industry will soon be catching on!
Jeff
Reply
June Li
May 17, 2006 at 10:40
Hi Avinash,
I had the pleasure of hearing you speak at the 2005 eMetrics. I’m very happy that you’ve decided to blog. I too found your blog through Robbin Steif’s.
It’s excellent that you are giving us real examples of how statistics can be used, and providing tool references. I look forward to additional case studies and discussions.
Will you also be posting about monitoring and managing outside influences? Sometimes the Noise dampens the signal or deflects the signal.
Thanks,
Web: http://www.clickinsight.ca
Blog: clickinsight.blogspot.com
Reply
Avinash Kaushik
May 17, 2006 at 22:32
Mark McLaren: Thanks for your kind words about the post, I am glad you found it helpful.
What else do we need to know about the groups involved in the test?
Are they essentially the same group or are they two completely different groups? (I’m assuming you would want to send offers to as many people as possible; hence, they are the same group – less 100 people in the second case.)
In the specific example I used, and the spreadsheet, you usually control for one thing. You can have as many groups as you want. For example you can send one offer to people who live in CA and NY and FL and OR and OH and plug that into the spreadsheet against a control and know which works best.
Alternatively you could try 5, 6, 10, whatever number of different offers on a bunch of folks and see which one converts best.
The problem comes when you want to test different offers on different groups (or many different pieces of content in different locations on the same page). Now you are in the world of multivariate and need to apply advanced statistics (think Taguchi).
Doing multivariate is awesomely powerful and yields great results, but beyond my humble spreadsheet.
Do you need a random sample in order to apply principles of standard deviation? 5,000+ is a good size group from which to draw conclusions,
The beauty of using statistics is that the standard deviations required, and amount of Statistical Significance (my suggestion of 95% or higher), will drive how big a sample you need. There is no fixed number (like 5k).
Hope this is the kind of information you were looking for.
Reply
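To make the "significance drives sample size" point above concrete, here is a rough sketch using the standard sample size formula for comparing two proportions. The 80% power default and the two-tailed cutoff are assumptions added here, not something stated in the post, and visitors_per_offer is a made-up name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_offer(base_rate, target_rate, confidence=0.95, power=0.80):
    """Approximate visitors needed per offer to detect base_rate vs target_rate."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed cutoff
    z_beta = NormalDist().inv_cdf(power)
    avg = (base_rate + target_rate) / 2
    numerator = (
        z_alpha * sqrt(2 * avg * (1 - avg))
        + z_beta * sqrt(base_rate * (1 - base_rate)
                        + target_rate * (1 - target_rate))
    ) ** 2
    return ceil(numerator / (target_rate - base_rate) ** 2)


# The two lifts from the post: 0.87% -> 1.06% and 0.87% -> 1.21%.
print(visitors_per_offer(0.0087, 0.0106))
print(visitors_per_offer(0.0087, 0.0121))
```

With these assumptions the smaller lift needs on the order of 40,000 visitors per offer, and the larger lift needs far fewer. The smaller the difference you want to detect, the bigger the sample you need.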
Avinash Kaushik
May 17, 2006 at 22:45
Jeff: Glad to see your post…
I’m wondering now, based on this article, how this model plays in with UX decisions when applying A/B tests. We often find minimal differences and often base our decision on the winner between the two.
If we are doing A/B testing (assuming the Success Goal is clearly articulated and measurable and that it is not “impact on brand”) then it would be a sin not to use the spreadsheet in the post above to separate Signal from Noise. Simply looking at the difference in Conversion Rate (or a similar metric) is very dangerous because of exactly what you say: how much is enough to be confident?
The great news is that most current A/B testing solutions (at least the ones that do “page testing”) already include statistical computations to help us make better decisions.
If you don’t see at least 90% statistical confidence, take the results with a grain of salt.
Reply
Jaimie Scott
May 18, 2006 at 09:34
Hi Avinash,
I too am very happy to see your blog. I found it through Clint Ivy’s blog and I am enjoying reading your posts very much. I find them to be quite informative.
You say above:
“You can easily adapt the spreadsheet, as we have, to compute statistical difference between absolute numbers (say you want to know if the difference Page Views Per Visitor or Average Time on Site between segment One and Two is Significant)”
It’s not obvious to me how to do this. Can you elaborate?
Thanks.
Reply
Aurélie
May 21, 2006 at 13:37
Hi Avinash,
Good to find you blogging, sharing thoughts and experiences. It’s quite interesting stuff and I hope you enjoy the experience.
I read your different posts on Saturday morning and your thoughts stayed with me for the entire week-end. Thank you.
Yes, statistical significance. I totally join you in the idea and would only add that tests that do not render truly significant results should not be communicated. I remember in my first job having warned of the non-significance of a test, only to find it had heavily influenced a commercial strategy. I vowed never again!
Another Pavlovian reaction was to consider that any number of responses under 200 should not be taken into account, as it holds a high probability of not being representative. I usually follow this first rule and adapt the variables in order to remain loyal to the statistical representativeness of a sample. Quite Pavlovian, I agree.
And the last thing is that I’ll bear statistical significance in mind but would like to suggest another possible subject: correlation between conversion rates.
Siegert suggested this formulation for a client yesterday:
“Is a visitor engaging into A but not engaging into B, converted easier into a lead, than someone engaged into C and B?”
In other words, you’ve got kind of low level conversion events that influence or not higher goals.
I’m having difficulty formulating this, sorry.
Hope it made sense, keep up the good work, cheers from expensive Brussels ;-)
Aurélie
Reply
Avinash Kaushik
May 21, 2006 at 23:06
Aurélie: Thanks for the thoughtful comment. I am sorry to have spoilt your entire weekend with my posts; after all, there are so many more beautiful things in life. :)
I completely agree with the care around communicating anything that is not of significance; there is always a danger that, in spite of your warning, they will jump into the lake.
Another Pavlovian reaction was to consider that any number of responses under 200 should not be taken into account, as it holds a high probability of not being representative.
(For our readers here is something on pavlovian reaction.)
In the world of Multivariate we can detect a strong signal even with small samples. We use something like This Page to calculate the sample size.
“Is a visitor engaging into A but not engaging into B, converted easier into a lead, than someone engaged into C and B?”
Correlations are important, very, and of course my simple little spreadsheet won’t account for that. Especially for complex web interactions, it is important to understand how the lower level conversion events might influence higher level (ultimate) goals.
Reply
Kerry Kim
May 30, 2006 at 17:37
Hi Avinash, my thanks also to you for sharing. Any additional insights you might have about the key drivers of adoption you’ve experienced would be greatly appreciated.
Regarding statistical significance, it appears that the reference in your post used a one tailed z test for testing whether there is a significant difference between two sample proportions. Wouldn’t it have been more precise to use a two tailed test? If not, why?
Reply
Avinash Kaushik
May 31, 2006 at 00:03
Kerry: The example used was quite a simple one to show that we can accomplish much applying statistics to our standard KPI’s with very little stress.
Wouldn’t it have been more precise to use a two tailed test? If not, why?
You are right, one can get quite sophisticated and get ever better results. The emphasis of the article was how to detect statistical significance in a simple case. I hope to blog more about how we can apply advanced methodologies in testing (to build on my experimentation and testing post).
Thanks for taking the time to post a comment.
Reply
Vicky Brock
June 1, 2006 at 08:51
Hi Avinash,
I so much agree on the importance of taking into account statistical significance – an essential part of the “so what” factor!
This is a neat chi square tool to test for statistical significance:
https://www.georgetown.edu/faculty/ballc/webtools/web_chi.html
I do love your blog, bye, Vicky
Reply
Hakim Aly
April 23, 2007 at 21:27
Although a 95% CL seems to be common (other than in a medical/pharmaceutical context), in a marketing context a lower CL may be quite appropriate. As you know, the choice of significance level (or its complement, Confidence Level) depends on the cost of being wrong.
A 5% significance level (95% CL) means accepting a 5% probability of declaring a difference when none exists. This is Type I error, i.e., concluding that one RR% is higher than another (statistically significant) when in fact it is not. Acting on this wrong conclusion may result in incurring costs that do not yield revenue or profit to offset the costs.
Type II error is when one does not reject the null hypothesis when in fact it is false. In this case, the cost associated with the decision to not roll out a marketing tactic is the foregone revenue/profit that would otherwise have been generated.
In many situations, the cost of Type II error exceeds the cost of Type I error. Clearly, a trade-off is involved, but a lower CL of 90% or even 80% may not be out of line.
Ultimately, each business needs to decide for itself what an appropriate CL is for purposes of assessing test results.
Would be interested in your thoughts. Hakim
Reply
Hakim Aly
April 23, 2007 at 21:35
Regarding the question of 1-tail vs. 2-tail test, the former is appropriate when one wants to determine whether one RR% is statistically HIGHER (or LOWER) than another. The latter is appropriate if one wants to know if a RR% is DIFFERENT FROM another.
I would suggest that in most marketing situations, we are more interested in the former (higher than) than the latter (different from).
In a few cases, we may want to know whether a proposed course of action may harm the response rate, in which case a 2-tail test would be appropriate.
Reply
Curtis
August 16, 2007 at 23:44
Thanks for the great insight. Do you have more details on exactly how the statistical significance is calculated? I’m curious how you derived the std deviations in the scenarios above with just the sample size and order counts.
Reply
pabitra chatterjee
November 4, 2007 at 23:11
this is to continue where mr hakim aly left. you may find this little piece at my blog, https://directindia.blogspot.com/2007/10/no-beta-yes-risk.html, interesting.
i’ve also given links for templates in the piece.
pac
Reply
Philip
February 18, 2008 at 10:38
The Analytical Group link for a free spreadsheet download appears to have changed. It’s now https://www.analyticalgroup.com/sigtest.html (with html instead of htm).
Thanks for all your great articles, Avinash!
Note : Thanks very much for the correction Philip! -Avinash.
Reply
Jbuser
February 20, 2008 at 13:56
Avinash,
As a stats guy, I am a little concerned with the assumptions behind the model. I downloaded it and the first thing I noticed was that it makes some pretty large assumptions about confidence levels (anything with a z-score, I am assuming, between 1.65 and 2.33 = 95%). I understand that this is probably there to make things “easy”, but I think it can be misleading. Also of note: IF there is a z-score assumption (which I am unsure of), there are some other assumptions under the covers (which I couldn’t get to), and z-scores are only for known population means and standard deviations. Do you know what Brian is using? Is it possible to get this information?
Finally, one concern with the “plug and chug” nature of the spreadsheet is what you always must be wary of, and that is making statistical significance a badge of honor. All it tells you is given the values you have, is there a difference between the two. What you must do, more than anything else, is make sure your testing methods are solid BEFORE the test. Otherwise, you are going to be putting in values after values and getting significance or not and making some very important decisions when the whole test could be wrong. Practical vs. Statistical.
Reply
michael choe
March 27, 2008 at 14:05
all –
i share similar concerns as jbuser…
for example, 1.74 standard deviations is not 95% significance. 1.96 standard deviations is 95% confidence.
also, for computing standard deviation (s) of 2 or more proportions (in this case, conversion rate), i think it’s a good practice to assume the largest margin of error. in this case, margin of error = 2 * sqrt(0.5^2/n), where n is the size of your smallest sample. this is what pollsters such as zogby do when communicating poll results about hillary/obama, etc.
Reply
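The worst-case margin of error formula in the comment above is easy to try out. This is a small sketch of exactly that formula, 2 * sqrt(0.5^2 / n), with a made-up function name; it uses the smaller sample from the post (5,200 visitors).

```python
from math import sqrt


def conservative_margin_of_error(n):
    """Worst-case (p = 0.5) margin of error at roughly 95% confidence."""
    return 2 * sqrt(0.5 ** 2 / n)


margin = conservative_margin_of_error(5200)
print(f"Margin of error: {margin:.2%}")
```

For 5,200 visitors this comes out to a bit under 1.4 percentage points, which is larger than the conversion rates in the post. Assuming p = 0.5 is sensible for polls, where shares are near 50%, but it is very conservative for conversion rates around 1%.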
Barbara
April 13, 2008 at 06:15
Hi Avinash,
Thanks a million for your great posts. I just got into web analytics and have found both this blog and your book extremely helpful. You mentioned utilizing the statcalc.xls spreadsheet to measure significance between pageviews. Can you kindly advise on how to do this? Your response is eagerly awaited. :)
Reply
Andrew Blank
July 16, 2008 at 07:45
The link for more advanced stats – https://www.mwrms.com/wwwRMS/DirectMarketing/MarketingCalc2.asp does not seem to have anything to do with stats anymore. Unfortunately I couldn’t find a live replacement.
Reply
Lea SP
February 18, 2009 at 13:17
What an insightful post Avinash.
For someone who is just approaching web analytics from a statistical perspective, I’m constantly asked whether a particular report figure is “statistically significant” in terms of sample size. I readily understand the statcalc worksheet purpose of seeing whether the difference between two metrics is stat. significant, but how do you know if the metrics have enough sample size to gauge effectively?
For example, I have two conversion rates: 0.7% and 8.8%. The sheet shows that there is a 99% confidence that these are statistically different which is a good start. But my rates are based off of A(4,155 clicks & 6 conversions) and B(80 clicks and 7 conversions). Are the conversion samples too small to be effective indicators in this case? What is an acceptable threshold here, and how do you find it on a case-by-case basis?
Thank you!
Reply
Garry Przyklenk
May 6, 2009 at 05:17
Avinash, this post is giving me nightmares and heart palpitations.
My problem is similar to Lea SP’s, but on steroids. I have two sample populations:
Pop A: 200,000 participants, 800,000 conversions
Pop B: 1,000 participants, 8,000 conversions
Unfortunately, conversions in this case don’t generate direct cash, or else I’d be buying an island somewhere, and not worry about calculating significance!
But obviously, there should be something to be said of Pop B having way lower sample size and not being representative or even warranting comparison to Pop A, right?
Regards,
Garry
Reply
Joshua Daniel Egan
June 1, 2009 at 06:07
Since I am new to SEO, I want to implement statistics in SEO. I thought that statistical techniques could not be applied to SEO or to the data given by Google Analytics. Can you suggest a statistical technique, with an example, by email? It would be useful for me and I would be grateful to you.
Reply
Adrian Palacios
November 29, 2009 at 21:33
I am on my way to becoming an analysis ninja. My two roadblocks thus far are JavaScript and statistics (KPI’s? check. Segmentation? double check.)
I am wondering if you could recommend any good entry-level books that could help teach me to do statistical analysis? Anything that uses web analytics scenarios is a *huge* plus.
Thanks!
Reply
Avinash Kaushik
November 29, 2009 at 23:34
Adrian: I am afraid I don’t know any book on Statistics that covers web analytics scenarios. Though if you understood statistics I don’t think anything would stand in your way in terms of applying it to challenges you face in your web analytics job.
Taking a Statistics 101 or 201 course at your local university might just do the trick.
This might seem odd but one of the best books I have read on Statistics is this one:
Cartoon Guide to Statistics by Larry Gonick, Woollcott Smith
Simple and effective.
Not necessarily just about statistics but this is an awesome book if you want to be a great analyst:
Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets
by Nassim Nicholas Taleb
Hope this helps a bit.
Avinash.
Reply
Dave Rekuc
November 1, 2010 at 08:18
Thank you for the easy to understand description above. I’ve found it very useful, however, I work for an ecommerce site that has a price range of anywhere from $3 an item to $299 an item. So, I feel like in some situations only looking at conversion rate is looking at 1 piece of the puzzle.
I’ve often used sales/session or tried to factor in AOV when looking at conversion, but I’ve had a lot of trouble coming up with a statistical method to ensure my tests’ relevance. I can check to see if both conversion and AOV pass a null hypothesis test, but in the case that they both do, I’m back at square one.
Can anyone recommend a statistical method for this scenario?
Thanks Avinash for the article, love your blog (and books)!
Reply
Rags Srinivasan
November 1, 2010 at 20:55
Dave,
You are correct in stating that looking at conversion rate alone is looking at one part of the puzzle.
When you have items that vary in price, like you said from $3 to $299, your test for statistical significance of difference between conversion rates assumes an implicit hypothesis that is treated as given.
A1: The difference in conversion rates does not differ across price ranges.
and your null hypothesis (same, just added for completeness)
H0: Any difference between the conversion rates is due to randomness
When your data tells you that H0 cannot be rejected, it is conditioned on the implicit assumption A1 being true.
But what if A1 is false? Either you explicitly test this assumption first or as simpler option, segment your data and test each segment for statistical significance. Since you have a range of price points I recommend you test over 4-5 price ranges.
This is same as the case when you are A/B testing simple conversion rates and treat the population as the same (no male/female difference, no Geo specific difference etc).
Hope this helps.
-rags
Reply
Dave Rekuc
November 2, 2010 at 06:01
Thank you Rags, very helpful. I’ll use the segmentation method in my next test. Unfortunately, this means waiting for a larger sample size than non-segmented data. However, I suppose it’s worth it. Thanks!
Reply
LisaS
October 31, 2011 at 09:02
Hi,
What do you think about using Fisher’s Exact Test over the Chi-squared test? The 2nd link below suggests that using Fisher’s Exact Test is the thing to do.
https://www.langsrud.com/fisher.htm
https://www.graphpad.com/faq/viewfaq.cfm?faq=
Thanks,
Lisa
Reply
1. LisaS
  October 31, 2011 at 14:32
  The 2nd link got cut off, there’s a 1114 after the equal sign:
  https://graphpad.com/faq/viewfaq.cfm?faq=1114
  Reply
2. Rags Srinivasan
  October 31, 2011 at 18:24
  Lisa
  First, I see that you are from Boulder! Go Boulder! My hometown forever!
  Yes, Fisher’s Exact Test is a candidate here, but it is needed only when we are seeing conversion numbers as tiny as 5 or lower. With Fisher’s Exact Test, when you are doing the calculations with a handheld calculator the permutations can get ugly for large numbers, but with Excel we don’t have that issue. Yet for larger numbers the Chi-square test does well, eliminating the need for an exact test.
  On the other hand, if the conversion rates are so low, one should not even be wasting time over statistical anomalies (there may be statistical significance, but is there economic significance?).
  Chi-square test is more skeptical than t-test and Fisher’s Exact test is even more skeptical than Chi-square test. For most split tests that test the hypothesis that the two groups are the same, Chi-square will do.
  All that said I should admit that I am biased in picking a candidate test that I can explain to others in simple terms. It is relatively easier for me to explain t-test and chi-square test to those who are not statistically inclined (see here: https://iterativepath.wordpress.com/2011/07/03/a-closer-look-at-ab-testing/ ) than it is to explain Fisher’s test.
  Regards
  -Rags
  Reply
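For readers who want to see Fisher's Exact Test without the handheld calculator, here is a minimal pure-Python sketch. It uses the tiny conversion counts from Lea SP's earlier comment (6 conversions from 4,155 clicks versus 7 from 80) as the example. The function name is invented, and the two-sided rule used (sum every table no more likely than the observed one) is one common convention among several.

```python
from math import comb


def fisher_exact_two_sided(conv_a, n_a, conv_b, n_b):
    """Two-sided Fisher's exact p-value for conversions in two groups."""
    total = n_a + n_b
    total_conv = conv_a + conv_b

    def prob(k):
        # Hypergeometric probability that group B holds k of all conversions.
        return (comb(total_conv, k) * comb(total - total_conv, n_b - k)
                / comb(total, n_b))

    observed = prob(conv_b)
    lowest = max(0, total_conv - n_a)
    highest = min(total_conv, n_b)
    # Add up every outcome at least as unlikely as the one we saw
    # (the small factor guards against floating point ties).
    return sum(p for p in (prob(k) for k in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))


p_value = fisher_exact_two_sided(6, 4155, 7, 80)
print(f"p = {p_value:.2e}")
```

The exact approach enumerates outcomes, so it is best kept for small conversion counts like these; for counts in the thousands the Chi-square test is the practical choice, as noted above.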
  1. LisaS
    November 29, 2011 at 12:53
    Thanks so much Rags for your reply. And yay for Boulder! I’m using your great spreadsheet tool. Thanks! So if I get a YES for statistical significance using the t-test, but not when using the chi-squared test. Would you still say YES, the test was significant? Is the t-test good enough in your opinion?
    Numbers for A are: 33548 conversions out of 212460 participants
    Numbers for B are: 33371 conversions out of 208143 participants
    Reply
    1. Rags Srinivasan
      November 29, 2011 at 14:47
      Lisa
      You need to decide whether or not you want to do Chi-square test or t-test.
      Chi-square test is more skeptical and it likely will more often find the difference to be not statistically significant.
      Since it operates on just the conversions, the procedure that uses the Chi-square test also eliminates other hidden hypotheses that are prevalent with a procedure that uses the t-test.
      One thing you should not do is use the Chi-square test, see it fail, then use the 2-tailed t-test, see it fail, and then use the 1-tailed t-test and see it pass.
      https://iterativepath.wordpress.com/2010/06/01/is-your-ab-testing-tool-finding-differences-that-arent/
      -Rags
      Reply
      1. LisaS
        April 3, 2012 at 12:56
        Hi Rags,
        Me again! You say that the Chi-square test is more skeptical than the t-test. I’m confused because I’m finding that the 2-tail t-test is more skeptical than the Chi-square test. Any thoughts on that?
        I’m using your spreadsheet, thanks for making it available!!
        Control – 10164 participants, 584 conversions
        Treatment – 18928 participants 1121 conversions
        Is it working right if I see that the 2-tailed test does not pass statistical significance, but the Chi-square test does indicate statistical significance? Perhaps the 2-tailed t-test is more skeptical than the Chi-square test?
        thanks!
        Lisa
Denisse Gomez
February 1, 2013 at 12:04
Hi,
I have results based on a sample size of X , the results fall into two buckets, how do I figure out if those individual buckets are statistical significant to draw a conclusion.
Thanks,
Denisse
Reply
Bryan
March 18, 2013 at 10:54
Just downloaded the spreadsheet, and the chi-sqr tab isn’t taking into account the number of test participants in each variation. That can’t be right.
Should “Expected value” be a weighted average rather than a straight average of conversions, or something more complex?
Reply
1. Rags Srinivasan
  March 30, 2013 at 06:56
  Bryan
  You are correct that the chi-square tab does not use number of test participants. I made a simplifying assumption that your A/B test evenly distributes (50-50) between two groups. Hence you see Row 24 calculating Expected value as 50% of total conversions.
  To use test participants, fix the Row 24 formula as follows (find the Expected value for Control and for Treatment, adding another column):
  Expected value for Control B24 = (B15+C15) * $B14/($B14+$C14)
  Expected value for Treatment C24 = (B15+C15) * $C14/($B14+$C14)
  You’ll notice the two will be same if the test participants are evenly split.
  Then you’ll change Row 25 as
  =((B24-B15)^2)/B24 + ((C24-C15)^2)/C24
  -Rags
  Reply
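Here is a hedged sketch of the fix described above, outside the spreadsheet: a chi-squared test on conversions only, with the expected values either split 50-50 or weighted by participants. It uses the uneven split from LisaS's earlier comment (10,164 control participants, 18,928 treatment) to show why the weighting matters. The function and parameter names are made up for illustration.

```python
from math import erfc, sqrt


def conversions_chi_squared(conv_control, n_control,
                            conv_treatment, n_treatment, weighted=True):
    """Chi-squared test on conversions, 1 degree of freedom."""
    total_conv = conv_control + conv_treatment
    if weighted:
        # Expected conversions in proportion to each group's participants.
        share_control = n_control / (n_control + n_treatment)
    else:
        # Simplifying assumption: traffic was split evenly.
        share_control = 0.5
    expected_control = total_conv * share_control
    expected_treatment = total_conv - expected_control
    chi2 = ((conv_control - expected_control) ** 2 / expected_control
            + (conv_treatment - expected_treatment) ** 2 / expected_treatment)
    return chi2, erfc(sqrt(chi2 / 2))


for weighted in (False, True):
    chi2, p_value = conversions_chi_squared(584, 10164, 1121, 18928, weighted)
    print(f"weighted={weighted}: chi-squared = {chi2:.2f}, p = {p_value:.3f}")
```

With the even-split assumption, the difference in traffic alone makes the result look wildly significant. With expected values weighted by participants the statistic drops well below 3.84, the usual 95% cutoff for one degree of freedom.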
  1. Lisa S
    March 30, 2013 at 08:05
    Thanks Rags and Bryan.
    That helps clarify the issue I was seeing when comparing the t-test to the chi-square test.
    Reply
  2. Bryan
    April 1, 2013 at 07:14
    Rags, thanks for the update. Avinash, thanks for posting the new file. Lisa, glad to help.
    Can you guys tell me if this is appropriate to use in cases where conversions *exceed* visitors? I’m testing a “related articles” component that appears on article pages on a media site, and using page views as the conversion metric.
    Thanks!
    Reply
    1. Andrew Blank
      May 14, 2014 at 06:07
      I’ve run into this as well. Did you ever get an answer? I don’t see it in this section.
      Reply
  3. Aaron Adamson
    November 11, 2019 at 17:25
    Hi Rags, I was just noticing that your Chi-sq formula in the Chi-sqr part of your Statistical Significance Calculator appears to contain an error:
    Yours: ((B24-B15)^2)/B24 + ((C24-C15)^2)/C24
    which means expected minus observed, squared, divided by expected.
    But according to this source the formula is ‘observed minus expected’ which would be this:
    ((B15-B24)^2)/B24 + ((C15-C24)^2)/C24
    Source: https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/chi-square/
    Reply
S M
April 9, 2013 at 03:17
Hi Avinash..
I liked the blog very much. I am in the process of developing more statistical insights and it would be great if you could suggest ways that can help me, such as an informative blog or a statistical bible for web analytics. Your input would be much appreciated.
Reply
1. Avinash Kaushik
  April 9, 2013 at 11:04
  S M: If you are serious about this then a course at a local university (or online) is a good idea. Even something that just goes into Statistics 101 and perhaps the 201 level would be great.
  There are a couple of books that are a great (and a lot of fun) intro as well.
  You can download, free, How To Lie With Statistics.
  I also very much like this one: Cartoon Guide to Statistics by Larry Gonick and Woollcott Smith.
  Finally, for something that moves beyond statistics but is still in the rough general neighbourhood, check out Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets by Nassim Nicholas Taleb.
  -Avinash.
  Reply
Diego
February 10, 2014 at 01:11
Thanks Rags for this post.
Everything I can find regarding significance is based on conversion, but the final goal for an ecommerce site is to increase sales. Increasing conversion is just a means to that end, and sometimes it can be misleading: you can increase conversion, but if your AOV goes down, your change could be hurting your sales.
So my question is, why do we use conversion instead of sales ($)? Could the significance be calculated based on sales? How could this be done?
Thanks,
Diego
Reply
1. Andrew Blank
  May 14, 2014 at 06:11
  This makes complete sense. I’ve seen others just add up the dollar value based on the conversion type. The problem is that they aren’t necessarily significant. I’ve read that it takes additional statistics to see if an alternative value wins. However I haven’t seen a step by step approach to show how.
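One common way to approach the sales question above is to treat revenue per visitor as a mean (with zeros for visitors who did not buy) and compare the two means with a Welch-style statistic. The sketch below is one possible approach, not the method used by the calculator in the post. The function name `welch_z_test` and the sample data are hypothetical, and the p-value uses a normal approximation that is only reasonable with large samples.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_z_test(sample_a, sample_b):
    """Compare two means without assuming equal variances.

    Returns (statistic, two_sided_p). The p-value comes from a normal
    approximation, so it should only be trusted for large samples.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Standard error of the difference in means (unequal variances)
    se = sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    stat = (mean(sample_b) - mean(sample_a)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p_value


# Hypothetical revenue per visitor: one entry per visitor, 0 = no order
baseline = [0] * 95 + [20, 30, 40, 50, 60]
variant = [0] * 93 + [20, 30, 40, 50, 60, 70, 80]

stat, p_value = welch_z_test(baseline, variant)
```

Revenue data is usually heavily skewed, so with small samples a resampling method may be a safer choice than this approximation.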
  Reply
Geo
April 8, 2014 at 03:47
Hi Avinash,
You’ve been my inspiration for quite a few years now, and your posts always bring wisdom to web analytics. Thank you for that!
I’d like to note that the Excel sheet calculator that you link to is actually pretty much useless for practical purposes. Yes, it will do in a Baseline vs Test scenario, but how many times does a marketer face this situation in the real world? Not very often.
Usually we have a Baseline vs Test 1 vs Test 2 vs Test 3 and so on. Depending on what you are testing (ad copy, landing pages, etc.) the number of variants you’ll be testing against can quickly go way above the trivial 1 or 2 tests.
Then you might say – but can’t I simply do a pairwise comparison: each Test vs the Baseline and then take the best one? The short answer is no. The slightly longer answer is “Because of the Multiple Comparisons Problem in statistics”, about which plenty has been written all over the web.
Thus you need a calculator that corrects for that in an intelligent way (too strict and you’ll be correcting for things very rarely required/encountered in the real world, e.g. the Bonferroni correction).
I’ve devised and published a calculator that handles many test variants easily (bulk input as well, copy paste from an excel table) and corrects for the multiple-comparison error intelligently (the Benjamini-Hochberg-Yekutieli method). I believe it to be the most suitable calculator on the market right now. It’s available here – analytics-toolkit.com/statistical-calculators/ and you can sign up for free to use it (no newsletters and shit!). Feel free to give it a try and let me know what you think :-)
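To make the correction mentioned above concrete, here is a minimal Python sketch of the Benjamini-Yekutieli step-up procedure applied to a list of p-values, one per Test vs Baseline comparison. It is an illustration only: the function name and the p-values are hypothetical, and whether the linked calculator implements the procedure in exactly this form is not confirmed here.

```python
def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-up procedure: sort the p-values, find the largest rank k with
    p_(k) <= k * alpha / (m * c_m), where c_m = 1 + 1/2 + ... + 1/m,
    then reject the k smallest p-values.
    """
    m = len(p_values)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Indices of the p-values, from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for Test 1..4, each compared against the Baseline
p_values = [0.001, 0.04, 0.30, 0.008]
significant = benjamini_yekutieli(p_values, alpha=0.05)
```

With these inputs, Test 1 and Test 4 remain significant after the correction, while Test 2 (p = 0.04) does not, even though it would pass an uncorrected 0.05 threshold.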
Reply
davea0511
May 14, 2014 at 06:57
*Very* Helpful, Avinash.
I was having a hard time visualizing confidence in A/B tests until I read this. The key is to mentally overlay the graph for A and B by knowing their stddev. Your page clued me into that.
Thanks!
Reply
Dan Grainger
August 5, 2014 at 06:52
Dug out this article for a re-read given that I’m currently redesigning some Excel based testing tools. I’ve always been a big fan of statistical confidence, indeed it’s one of the mandatory criteria in my book before making a call on an AB test.
While it’s simple enough to calculate stat confidence for conversion rates, there are (imho) other more important metrics which I add into the mix of decision making factors, such as revenue per visit, average order value and margin per order. So my question is this – given that these metrics aren’t percentages (i.e. they can be greater than 1 or less than 0), what’s the appropriate method to calculate statistical significance for them? I’m sure I have it somewhere, but wanted to draw on the experiences of you and your readers when encountering this!
Reply
soumya
June 24, 2015 at 17:20
Thank you for this awesome post!
I understand the concept and application of the A/B test and significance for a B2C ecommerce website, which normally has a lot of visitors. Could you please advise whether the statistical significance calculator can be applied universally? For example, for an online content website whose goal is to get more lead gen forms filled in with variant B (new form) vs the old form.
My question is whether, and how, one can apply the concept when a content website has relatively few visits and even fewer conversions compared to an ecommerce site or other content sites. Any suggestions for this scenario? Appreciate it.
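For low-traffic cases like the one described above, an exact test is a common alternative to normal-approximation calculators, since it does not rely on large counts. The sketch below computes a two-sided Fisher exact p-value for a 2x2 table of conversions vs non-conversions. It is an illustration under assumptions: the function name and the form-fill counts are hypothetical, and this is not the method used by the calculator in the post.

```python
from math import comb


def fisher_exact_two_sided(conv_a, n_a, conv_b, n_b):
    """Two-sided Fisher exact p-value for conversions in two variants.

    conv_a of n_a visitors converted on variant A, conv_b of n_b on B.
    """
    total_conv = conv_a + conv_b
    denom = comb(n_a + n_b, total_conv)

    def prob(k):
        # Chance that A gets exactly k of the total conversions
        # if the variant had no effect (hypergeometric probability)
        return comb(n_a, k) * comb(n_b, total_conv - k) / denom

    observed = prob(conv_a)
    lowest = max(0, total_conv - n_b)
    highest = min(n_a, total_conv)
    p_value = 0.0
    for k in range(lowest, highest + 1):
        p_k = prob(k)
        # Count every outcome at least as unlikely as the observed one
        # (small tolerance guards against floating-point ties)
        if p_k <= observed * (1 + 1e-9):
            p_value += p_k
    return p_value


# Hypothetical lead-gen counts: old form 4 of 120, new form 11 of 115
p_value = fisher_exact_two_sided(4, 120, 11, 115)
```

A small p-value here suggests the difference in form fills is unlikely to be noise, but with so few conversions the test has little power, so a non-significant result does not show the forms perform the same.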
Reply