<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Excellent Analytics Tip #9: Leverage Statistical Control Limits</title>
	<atom:link href="http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html/feed" rel="self" type="application/rss+xml" />
	<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html</link>
	<description>Pluralitas non est ponenda sine necessitate.</description>
	<pubDate>Mon, 08 Sep 2008 12:56:24 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.1</generator>
		<item>
		<title>By: Hexaware Blog Central- Business Intelligence</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-469962</link>
		<dc:creator>Hexaware Blog Central- Business Intelligence</dc:creator>
		<pubDate>Wed, 13 Aug 2008 10:43:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-469962</guid>
		<description>[...] To summarize, Six Sigma needs an improvement opportunity as the starting point for it to unleash its power to improve processes. BI generates a lot of these opportunities with its DW/Reporting/Analytics components but does not enforce the process implementation rigor. I feel that there is a lot of synergy in bringing both together – Six Sigma, the left hand, and BI, the right hand, when brought together can earn a lot of claps in the quest to create learning, performing organizations. Just to sample the power of Six Sigma techniques, please take a look at the following link: http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html, which illustrates the use of control charts (one of Six Sigma’s potent tools) in metrics / KPI management. Fascinating! [...]</description>
		<content:encoded><![CDATA[<p>[...] To summarize, Six Sigma needs an improvement opportunity as the starting point for it to unleash its power to improve processes. BI generates a lot of these opportunities with its DW/Reporting/Analytics components but does not enforce the process implementation rigor. I feel that there is a lot of synergy in bringing both together – Six Sigma, the left hand, and BI, the right hand, when brought together can earn a lot of claps in the quest to create learning, performing organizations. Just to sample the power of Six Sigma techniques, please take a look at the following link: <a href="http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html" rel="nofollow">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html</a>, which illustrates the use of control charts (one of Six Sigma’s potent tools) in metrics / KPI management. Fascinating! [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shikha</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-432467</link>
		<dc:creator>Shikha</dc:creator>
		<pubDate>Mon, 10 Mar 2008 19:13:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-432467</guid>
		<description>Hi Avinash, I love these. Can I connect with you? I have questions on X-bar and R charts and was looking for 1:1 mentoring.</description>
		<content:encoded><![CDATA[<p>Hi Avinash, I love these. Can I connect with you? I have questions on X-bar and R charts and was looking for 1:1 mentoring.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Patrick</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-382663</link>
		<dc:creator>Patrick</dc:creator>
		<pubDate>Sat, 01 Dec 2007 21:32:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-382663</guid>
		<description>Sorry to stir up this old thread Avinash, but I just can't resist b/c I ran into a situation where I thought this would be useful.

I read the part about this in your book and this blog post (including all comments) again and basically had/have three more questions:

1) Somebody mentioned using the slope or the most recent periods to filter out the effect of growth. Obviously if you take the data from the last 10 years the average might not be a good idea, but the data from the last year would be more insightful - that is if there's enough data during that period...thus choosing only the last periods over choosing all the historical data is mostly a balancing act?

2) The use of standard deviations seems to be exactly the same way as when they're used with other metrics. 3 SD's = 98% of the data, 2 SD's 95% of the data...and 1 SD would be roughly 68% of the data? Did I understand this correctly?

3) You mention that having a small sample size can pose a problem (as usual). Would it be right to assume that the control limits would still be "correct" no matter how many data points you have, but if you don't have enough, they're just not very insightful, as the standard deviation would be very big and thus the control limits would be very wide, too?

I hope my first 2 questions can be replied to with a simple yes (I hope!), but the last one really got me wondering. If the sample size is too small/too few data points, what exactly would happen that would make it harder to take action?

And is there a way to find out how many data points one would need for these control limits to be helpful? Something like a statistical significance test?!

Or would something like a statistical significance for the control limits have to be computed?!</description>
		<content:encoded><![CDATA[<p>Sorry to stir up this old thread Avinash, but I just can&#8217;t resist b/c I ran into a situation where I thought this would be useful.</p>
<p>I read the part about this in your book and this blog post (including all comments) again and basically had/have three more questions:</p>
<p>1) Somebody mentioned using the slope or the most recent periods to filter out the effect of growth. Obviously if you take the data from the last 10 years the average might not be a good idea, but the data from the last year would be more insightful - that is if there&#8217;s enough data during that period&#8230;thus choosing only the last periods over choosing all the historical data is mostly a balancing act?</p>
<p>2) The use of standard deviations seems to be exactly the same way as when they&#8217;re used with other metrics. 3 SD&#8217;s = 98% of the data, 2 SD&#8217;s 95% of the data&#8230;and 1 SD would be roughly 68% of the data? Did I understand this correctly?</p>
<p>3) You mention that having a small sample size can pose a problem (as usual). Would it be right to assume that the control limits would still be &#8220;correct&#8221; no matter how many data points you have, but if you don&#8217;t have enough, they&#8217;re just not very insightful, as the standard deviation would be very big and thus the control limits would be very wide, too?</p>
<p>I hope my first 2 questions can be replied to with a simple yes (I hope!), but the last one really got me wondering. If the sample size is too small/too few data points, what exactly would happen that would make it harder to take action?</p>
<p>And is there a way to find out how many data points one would need for these control limits to be helpful? Something like a statistical significance test?!</p>
<p>Or would something like a statistical significance for the control limits have to be computed?!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tech_admin</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-186263</link>
		<dc:creator>Tech_admin</dc:creator>
		<pubDate>Wed, 01 Aug 2007 18:12:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-186263</guid>
		<description>Dilip, to add to your response: if there are a lot of fluctuations and they have a genuine reason, then this fluctuation is considered to be "normal". The control limits are calculated based on the distribution of the data around the mean. In a lot of instances the data is not perfectly normal but multimodal, and in most such cases it is a mix of smaller "normal" distributions. The process owner will usually be able to identify these triggers easily, which should help regroup the data into separate distributions that can each be monitored with control charts. For example, the day of the week or the time of day can be an indicator for hits on a website.

One shortcoming of fitting a straight line is that it normalizes the data in a very rough manner; a second-degree polynomial might work better.

--- This is an answer to some questions regarding getting box plots etc. out of the box: try using JMP, a statistical tool made by SAS. It is very user-friendly, with a lot of built-in features.</description>
		<content:encoded><![CDATA[<p>Dilip, to add to your response: if there are a lot of fluctuations and they have a genuine reason, then this fluctuation is considered to be &#8220;normal&#8221;. The control limits are calculated based on the distribution of the data around the mean. In a lot of instances the data is not perfectly normal but multimodal, and in most such cases it is a mix of smaller &#8220;normal&#8221; distributions. The process owner will usually be able to identify these triggers easily, which should help regroup the data into separate distributions that can each be monitored with control charts. For example, the day of the week or the time of day can be an indicator for hits on a website.</p>
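<p>A minimal sketch of the regrouping idea, assuming each observation has already been tagged with a group label such as "weekday" or "weekend". The function name limits_by_group, the labels, and the numbers are illustrative assumptions, not part of the original post.</p>

```python
from collections import defaultdict
from statistics import mean, stdev


def limits_by_group(observations, k=3.0):
    """Compute (lcl, ucl) separately for each group label.

    `observations` is an iterable of (label, value) pairs. Groups with
    fewer than two values are skipped because a sample standard
    deviation cannot be computed for them.
    """
    groups = defaultdict(list)
    for label, value in observations:
        groups[label].append(value)

    limits = {}
    for label, values in groups.items():
        if len(values) < 2:
            continue
        center, spread = mean(values), stdev(values)
        limits[label] = (center - k * spread, center + k * spread)
    return limits


# Made-up daily hit counts, tagged by type of day.
hits = [
    ("weekday", 100), ("weekday", 110), ("weekday", 90),
    ("weekend", 40), ("weekend", 50), ("weekend", 45),
]
for label, (lcl, ucl) in limits_by_group(hits).items():
    print(f"{label}: LCL={lcl:.1f}, UCL={ucl:.1f}")
```

<p>Each group gets its own limits, so a quiet weekend is judged against other weekends rather than against busy weekdays.</p>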
<p>One shortcoming of fitting a straight line is that it normalizes the data in a very rough manner; a second-degree polynomial might work better.</p>
<p>&#8212; This is an answer to some questions regarding getting box plots etc. out of the box: try using JMP, a statistical tool made by SAS. It is very user-friendly, with a lot of built-in features.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dilip</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-132160</link>
		<dc:creator>Dilip</dc:creator>
		<pubDate>Tue, 12 Jun 2007 18:29:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-132160</guid>
		<description>An interesting article.

However, most data in the real world don't adhere to mean-related statistical techniques since they fluctuate a lot for very genuine reasons, which the mean cannot capture since it's a very basic indicator.

An alternative method would be to fit the data onto a straight line using a least-squares fit and use the slope, rather than the mean, to serve as the basis for control limits. This would take into account the growth/slowdown of products too.

Also, 3-sigma control limits are for eliminating outliers that fall in the outside 0.3% (meaning the control limits contain about 99.7% of normally distributed data between them). Applying a 2-sigma limit eliminates roughly the outside 5% (meaning the control limits contain about 95% of the data between them).

For those interested in or knowledgeable about statistical distributions, the formulae are based on the assumption that the data follows a normal distribution (again, this is not the case in most real-world data, though they can be massaged to fit into that assumption).</description>
		<content:encoded><![CDATA[<p>An interesting article.</p>
<p>However, most data in the real world don&#8217;t adhere to mean-related statistical techniques since they fluctuate a lot for very genuine reasons, which the mean cannot capture since it&#8217;s a very basic indicator.</p>
<p>An alternative method would be to fit the data onto a straight line using a least-squares fit and use the slope, rather than the mean, to serve as the basis for control limits. This would take into account the growth/slowdown of products too.</p>
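<p>One possible reading of this suggestion, sketched in Python: fit a least-squares line through the series, use the fitted line as the center, and place the limits k standard deviations of the residuals above and below it. The function name trend_limits and the sample data are assumptions for illustration, and using the plain sample standard deviation of the residuals is a simplification.</p>

```python
from statistics import mean, stdev


def trend_limits(values, k=3.0):
    """Fit y = intercept + slope * x by least squares (x = 0, 1, 2, ...).

    Returns (slope, intercept, bands), where bands is a list of
    (lcl, center, ucl) tuples, one per observation. The spread is the
    sample standard deviation of the residuals around the fitted line.
    """
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(values, fitted)]
    spread = stdev(residuals)
    bands = [(f - k * spread, f, f + k * spread) for f in fitted]
    return slope, intercept, bands


# Made-up conversion rates (percent) with a gentle upward trend.
rates = [2.0, 2.1, 2.3, 2.2, 2.5, 2.4, 2.7, 2.6, 2.9, 3.0]
slope, intercept, bands = trend_limits(rates)
for rate, (lcl, center, ucl) in zip(rates, bands):
    status = "ok" if lcl <= rate <= ucl else "investigate"
    print(f"{rate:.1f}  center={center:.2f}  [{lcl:.2f}, {ucl:.2f}]  {status}")
```

<p>Because the center line rises with the trend, steady growth on its own does not push points outside the limits.</p>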
<p>Also, 3-sigma control limits are for eliminating outliers that fall in the outside 0.3% (meaning the control limits contain about 99.7% of normally distributed data between them). Applying a 2-sigma limit eliminates roughly the outside 5% (meaning the control limits contain about 95% of the data between them).</p>
<p>For those interested in or knowledgeable about statistical distributions, the formulae are based on the assumption that the data follows a normal distribution (again, this is not the case in most real-world data, though they can be massaged to fit into that assumption).</p>
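<p>Under that normal-distribution assumption, the share of data that falls within k standard deviations of the mean can be checked directly. This is a small sketch using only the Python standard library; the function name normal_coverage is an illustrative assumption.</p>

```python
from math import erf, sqrt


def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"{k} sigma: {normal_coverage(k):.2%}")
# 1 sigma: 68.27%
# 2 sigma: 95.45%
# 3 sigma: 99.73%
```

<p>These figures hold only when the data really is close to normal; for skewed or multimodal data the actual coverage can differ noticeably.</p>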
]]></content:encoded>
	</item>
	<item>
		<title>By: Ravindra</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-119509</link>
		<dc:creator>Ravindra</dc:creator>
		<pubDate>Thu, 31 May 2007 09:58:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-119509</guid>
		<description>Very interesting article. We are implementing CMMI and one of the findings is: "Specification limits for sub processes based on natural control limits can be identified". Can somebody help?</description>
		<content:encoded><![CDATA[<p>Very interesting article. We are implementing CMMI and one of the findings is: &#8220;Specification limits for sub processes based on natural control limits can be identified&#8221;. Can somebody help?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Search Engine Optimization Blog &#187; Show Me A Drunk And I&#8217;ll Show You A Statistic</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-112255</link>
		<dc:creator>Search Engine Optimization Blog &#187; Show Me A Drunk And I&#8217;ll Show You A Statistic</dc:creator>
		<pubDate>Thu, 24 May 2007 16:24:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-112255</guid>
		<description>[...] What do these statistics actually reveal? A lot. Or maybe nothing. As they say, torture numbers and they&#8217;ll confess to anything. It is up to you to decide what the key performance indicators for your community are. Once you have these KPIs, my best advice is to set up those nifty upper and lower control limits to filter out the statistical noise from the signal. [...]</description>
		<content:encoded><![CDATA[<p>[...] What do these statistics actually reveal? A lot. Or maybe nothing. As they say, torture numbers and they&#8217;ll confess to anything. It is up to you to decide what the key performance indicators for your community are. Once you have these KPIs, my best advice is to set up those nifty upper and lower control limits to filter out the statistical noise from the signal. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Emily Bruce</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-71887</link>
		<dc:creator>Emily Bruce</dc:creator>
		<pubDate>Thu, 29 Mar 2007 14:58:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-71887</guid>
		<description>Hi - 
I'm a relative newbie to the Analytics world but an avid fan of this blog. I'm soaking up the knowledge! Thank you to Avinash, et al.

I have tried this application of upper and lower control limits on the rate of change of email opt-in rates per week. I found I needed to eliminate the upper and lower 10% to get a reasonable standard deviation by which to examine a week's change from the previous week, but otherwise it seems to work out pretty neatly.

Does anyone see an issue with using the control limits for this particular KPI? I'd love some feedback before I put this before my boss. Don't want to get the cross-eyes from him!

Thanks,
Emily</description>
		<content:encoded><![CDATA[<p>Hi -<br />
I&#8217;m a relative newbie to the Analytics world but an avid fan of this blog. I&#8217;m soaking up the knowledge! Thank you to Avinash, et al.</p>
<p>I have tried this application of upper and lower control limits on the rate of change of email opt-in rates per week. I found I needed to eliminate the upper and lower 10% to get a reasonable standard deviation by which to examine a week&#8217;s change from the previous week, but otherwise it seems to work out pretty neatly.</p>
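<p>A rough sketch of what this could look like in Python, assuming "rate of change" means the simple difference between one week and the previous week, and that the trimming drops the lowest and highest 10% of those differences before the standard deviation is computed. The names trimmed and change_limits and the sample rates are hypothetical.</p>

```python
from statistics import mean, stdev


def trimmed(values, fraction=0.10):
    """Drop the lowest and highest `fraction` of the values (rounded down)."""
    ordered = sorted(values)
    cut = int(len(ordered) * fraction)
    return ordered[cut:len(ordered) - cut]


def change_limits(rates, k=3.0, fraction=0.10):
    """Control limits for week-over-week changes, based on trimmed data.

    Returns (changes, lcl, ucl). The limits are computed from the
    trimmed changes, but every change is returned so each one can
    still be checked against them.
    """
    changes = [current - previous for previous, current in zip(rates, rates[1:])]
    core = trimmed(changes, fraction)
    center, spread = mean(core), stdev(core)
    return changes, center - k * spread, center + k * spread


# Made-up weekly opt-in rates (percent).
opt_in = [4.0, 4.2, 4.1, 4.3, 4.2, 6.5, 4.4, 4.3, 4.5, 4.4, 4.6]
changes, lcl, ucl = change_limits(opt_in)
for week, change in enumerate(changes, start=2):
    flag = "" if lcl <= change <= ucl else "  <- outside limits"
    print(f"week {week}: {change:+.1f}{flag}")
```

<p>One thing to keep in mind with this design: trimming narrows the limits, so more weeks will be flagged than with untrimmed data.</p>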
<p>Does anyone see an issue with using the control limits for this particular KPI? I&#8217;d love some feedback before I put this before my boss. Don&#8217;t want to get the cross-eyes from him!</p>
<p>Thanks,<br />
Emily</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: G Parris</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-49952</link>
		<dc:creator>G Parris</dc:creator>
		<pubDate>Mon, 26 Feb 2007 02:21:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-49952</guid>
		<description>Avinash, thanks for the web forum. As a user of TQM methods for ten years, I find it helpful to eliminate outliers in the data set by applying the UCL and LCL, removing the data beyond 3 sigma, recalculating the limits, and repeating until no outlier remains.
All special cause events need to be understood, as you state. Control chart rules are many but are statistically sound.</description>
		<content:encoded><![CDATA[<p>Avinash, thanks for the web forum. As a user of TQM methods for ten years, I find it helpful to eliminate outliers in the data set by applying the UCL and LCL, removing the data beyond 3 sigma, recalculating the limits, and repeating until no outlier remains.<br />
All special cause events need to be understood, as you state. Control chart rules are many but are statistically sound.</p>
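<p>The iterative procedure described above could be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function name iterative_limits and the sample data are made up, and removed points are returned so they can still be investigated rather than silently dropped.</p>

```python
from statistics import mean, stdev


def iterative_limits(values, k=3.0):
    """Recompute control limits until no remaining point falls outside them.

    Returns (lcl, ucl, removed). Stops early if removing the current
    outliers would leave fewer than two points, since a sample
    standard deviation needs at least two.
    """
    kept = list(values)
    removed = []
    while True:
        center, spread = mean(kept), stdev(kept)
        lcl, ucl = center - k * spread, center + k * spread
        outliers = [v for v in kept if not lcl <= v <= ucl]
        if not outliers or len(kept) - len(outliers) < 2:
            return lcl, ucl, removed
        removed.extend(outliers)
        kept = [v for v in kept if lcl <= v <= ucl]


# Made-up daily values with one obvious spike.
data = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10] * 2 + [100]
lcl, ucl, removed = iterative_limits(data)
print(f"LCL={lcl:.2f}  UCL={ucl:.2f}  removed={removed}")
```

<p>With only a handful of points, a single value can never sit more than 3 sample standard deviations from the mean, so this loop only has an effect on reasonably long series.</p>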
]]></content:encoded>
	</item>
	<item>
		<title>By: S.Hamel</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-33664</link>
		<dc:creator>S.Hamel</dc:creator>
		<pubDate>Wed, 31 Jan 2007 05:26:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-33664</guid>
		<description>I've been playing with Box Plots in Excel 2007 and I just posted a step by step technique on my blog.

Check it out at http://immeria.net/2007/01/box-plot-and-whisker-plots-in-excel.html

S.Hamel
http://immeria.net</description>
		<content:encoded><![CDATA[<p>I&#8217;ve been playing with Box Plots in Excel 2007 and I just posted a step by step technique on my blog.</p>
<p>Check it out at <a href="http://immeria.net/2007/01/box-plot-and-whisker-plots-in-excel.html" rel="nofollow">http://immeria.net/2007/01/box-plot-and-whisker-plots-in-excel.html</a></p>
<p>S.Hamel<br />
<a href="http://immeria.net" rel="nofollow">http://immeria.net</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Adelino de Almeida</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-29826</link>
		<dc:creator>Adelino de Almeida</dc:creator>
		<pubDate>Mon, 22 Jan 2007 16:15:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-29826</guid>
		<description>Avinash:

I applaud your effort in bringing some statistical rigor to the analysis of website performance; Six Sigma is a great tool chest to begin with.
The only aspect that I stress is that these tools provide great insights as long as they are applied correctly.
For instance, I've just posted about correlation and how it is normally misinterpreted. Another great advantage of statistical/stochastic methods is that we no longer run our businesses out of averages but, rather, from expected ranges.
In any case: I am a great fan of your blog and I think this post is a great step towards bringing rigor into performance analysis.

Adelino 
adelino.typepad.com</description>
		<content:encoded><![CDATA[<p>Avinash:</p>
<p>I applaud your effort in bringing some statistical rigor to the analysis of website performance; Six Sigma is a great tool chest to begin with.<br />
The only aspect that I stress is that these tools provide great insights as long as they are applied correctly.<br />
For instance, I&#8217;ve just posted about correlation and how it is normally misinterpreted. Another great advantage of statistical/stochastic methods is that we no longer run our businesses out of averages but, rather, from expected ranges.<br />
In any case: I am a great fan of your blog and I think this post is a great step towards bringing rigor into performance analysis.</p>
<p>Adelino<br />
adelino.typepad.com</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sushant Ajmani</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-29177</link>
		<dc:creator>Sushant Ajmani</dc:creator>
		<pubDate>Fri, 19 Jan 2007 22:22:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-29177</guid>
		<description>Avinash, this is really a thoughtful post and I really admire your research in this area.

In my opinion, this concept would be perfect for LEADING KPIs instead of LAGGING ones. Most of the management dashboards I have seen over the past few years lacked LEADING INDICATORS, and in their absence it's very hard to make proactive decisions by looking at the trend.

Most of the companies I have worked with simply rely on a MOVING AVERAGE or, to some extent, a WEIGHTED AVERAGE to compute the TREND, but the picture is still not clear because they are not aware of the BASELINE figures. Unless you have MIN and MAX MEAN values, it's hard to separate the SIGNAL from the NOISE.

CONTROL CHARTS are worth trying and I will definitely play with them.</description>
		<content:encoded><![CDATA[<p>Avinash, this is really a thoughtful post and I really admire your research in this area.</p>
<p>In my opinion, this concept would be perfect for LEADING KPIs instead of LAGGING ones. Most of the management dashboards I have seen over the past few years lacked LEADING INDICATORS, and in their absence it&#8217;s very hard to make proactive decisions by looking at the trend.</p>
<p>Most of the companies I have worked with simply rely on a MOVING AVERAGE or, to some extent, a WEIGHTED AVERAGE to compute the TREND, but the picture is still not clear because they are not aware of the BASELINE figures. Unless you have MIN and MAX MEAN values, it&#8217;s hard to separate the SIGNAL from the NOISE.</p>
<p>CONTROL CHARTS are worth trying and I will definitely play with them.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Avinash Kaushik</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-29173</link>
		<dc:creator>Avinash Kaushik</dc:creator>
		<pubDate>Fri, 19 Jan 2007 21:59:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-29173</guid>
		<description>Adelino: I concur that a general application might be imprudent. In this case, the conversion rate example, it is not that we are setting an "upper limit" in the same way as we set goals. We are simply using a statistically generated "line" to help us understand metrics behavior. It is certainly not the upper limit of where we want the conversion rate to be (that would be 100%!); it is not an objective / company goal.

My hope is that people will take this recommendation and use it as a mechanism to trigger deeper analysis when they swim in a sea of data that all fluctuates all the time. That's it, nothing more - nothing less. Though I can totally see how it could be misapplied.

I am a fan of your blog and I appreciate your feedback very much. Thanks.

-Avinash.</description>
		<content:encoded><![CDATA[<p>Adelino: I concur that a general application might be imprudent. In this case, the conversion rate example, it is not that we are setting an &#8220;upper limit&#8221; in the same way as we set goals. We are simply using a statistically generated &#8220;line&#8221; to help us understand metrics behavior. It is certainly not the upper limit of where we want the conversion rate to be (that would be 100%!); it is not an objective / company goal.</p>
<p>My hope is that people will take this recommendation and use it as a mechanism to trigger deeper analysis when they swim in a sea of data that all fluctuates all the time. That&#8217;s it, nothing more - nothing less. Though I can totally see how it could be misapplied.</p>
<p>I am a fan of your blog and I appreciate your feedback very much. Thanks.</p>
<p>-Avinash.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Clint</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-29171</link>
		<dc:creator>Clint</dc:creator>
		<pubDate>Fri, 19 Jan 2007 21:57:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-29171</guid>
		<description>Adelino,
I think that if you are slavish to the Six Sigma framework and TQM then yes, that's a valid point, because what you are trying to do is manage a process within specific limits - where an outlier, either positive or negative, is a bad thing.

The interesting thing that Avinash has done is to take some statistical tools from a very statistically mature framework (Six Sigma) and apply them to web analytics data.

Because the UCL and LCL are calculated on the fly, with the entire data set, I don't see how it limits or ignores growth. What it does is pinpoint unusual events that need your attention.

Furthermore, if you are concerned about the impact of historical trends on the control limits, then you could limit your calculation of the standard deviation to the last N periods instead of the entire data set.

My 2 cents worth...

-Clint</description>
		<content:encoded><![CDATA[<p>Adelino,<br />
I think that if you are slavish to the Six Sigma framework and TQM then yes, that&#8217;s a valid point, because what you are trying to do is manage a process within specific limits - where an outlier, either positive or negative, is a bad thing.</p>
<p>The interesting thing that Avinash has done is to take some statistical tools from a very statistically mature framework (Six Sigma) and apply them to web analytics data.</p>
<p>Because the UCL and LCL are calculated on the fly, with the entire data set, I don&#8217;t see how it limits or ignores growth. What it does is pinpoint unusual events that need your attention.</p>
<p>Furthermore, if you are concerned about the impact of historical trends on the control limits, then you could limit your calculation of the standard deviation to the last N periods instead of the entire data set.</p>
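<p>A minimal sketch of the "last N periods" variant, assuming each point is judged against limits computed from the N periods immediately before it. The function names rolling_limits and flag_out_of_control, the window size, and the data are illustrative assumptions.</p>

```python
from statistics import mean, stdev


def rolling_limits(values, window, k=3.0):
    """Yield (index, lcl, ucl) for every point that has `window` prior points.

    The limits for point i use only values[i - window:i], so older
    history does not influence them. `window` must be at least 2.
    """
    for i in range(window, len(values)):
        recent = values[i - window:i]
        center, spread = mean(recent), stdev(recent)
        yield i, center - k * spread, center + k * spread


def flag_out_of_control(values, window, k=3.0):
    """Return the indices of points outside their rolling limits."""
    return [i for i, lcl, ucl in rolling_limits(values, window, k)
            if not lcl <= values[i] <= ucl]


# Made-up weekly figures: stable, then one unusual week at the end.
weekly = [10, 11, 9, 10, 11, 9, 10, 30]
print(flag_out_of_control(weekly, window=6))
```

<p>A shorter window follows growth more closely but makes the limits noisier, so the choice of N is a trade-off rather than a fixed rule.</p>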
<p>My 2 cents worth&#8230;</p>
<p>-Clint</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Adelino de Almeida</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-29162</link>
		<dc:creator>Adelino de Almeida</dc:creator>
		<pubDate>Fri, 19 Jan 2007 21:30:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-29162</guid>
		<description>There is a problem with your example and with the whole six sigma in general: it does not easily account for growth. For example, in your last example you have a trended conversion rate, that is, a simple linear regression would tell you that the conversion rate is increasing with time. By setting an upper limit you are effectively stifling growth.
I've had rather negative experiences with people generalizing six sigma approaches beyond what they should.

Adelino de Almeida
adelino.typepad.com</description>
		<content:encoded><![CDATA[<p>There is a problem with your example and with the whole six sigma in general: it does not easily account for growth. For example, in your last example you have a trended conversion rate, that is, a simple linear regression would tell you that the conversion rate is increasing with time. By setting an upper limit you are effectively stifling growth.<br />
I&#8217;ve had rather negative experiences with people generalizing six sigma approaches beyond what they should.</p>
<p>Adelino de Almeida<br />
adelino.typepad.com</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: S.Hamel</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-28827</link>
		<dc:creator>S.Hamel</dc:creator>
		<pubDate>Fri, 19 Jan 2007 00:08:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-28827</guid>
		<description>I would add to Wendi's comment about quartiles, IQR and outliers. A box plot sometimes communicates the most important numbers very easily: median, ranges, extremes, outliers, trends (by being skewed one way or the other), etc. The only problem is that no web tool provides this kind of graphing out of the box, and we have to play around manually with the data.

Check out http://en.wikipedia.org/wiki/Box_plot

S.Hamel
http://immeria.net</description>
		<content:encoded><![CDATA[<p>I would add to Wendi&#8217;s comment about quartiles, IQR and outliers. A box plot sometimes communicates the most important numbers very easily: median, ranges, extremes, outliers, trends (by being skewed one way or the other), etc. The only problem is that no web tool provides this kind of graphing out of the box, and we have to play around manually with the data.</p>
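<p>For anyone computing these numbers by hand, here is a small sketch of the figures a box plot is built from. It assumes the common convention of flagging points more than 1.5 times the IQR beyond the quartiles; the function name box_plot_summary and the sample data are made up, and other quartile conventions give slightly different values.</p>

```python
from statistics import quantiles


def box_plot_summary(values, whisker=1.5):
    """Return the quartiles, IQR, fences, and outliers for a list of numbers."""
    q1, q2, q3 = quantiles(values, n=4)  # q2 is the median
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "low_fence": low_fence,
        "high_fence": high_fence,
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Made-up page views per visit, with one extreme session.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
for name, value in summary.items():
    print(f"{name}: {value}")
```

<p>Unlike mean-and-sigma limits, the quartiles barely move when a single extreme value is present, which is part of the appeal of this view.</p>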
<p>Check out <a href="http://en.wikipedia.org/wiki/Box_plot" rel="nofollow">http://en.wikipedia.org/wiki/Box_plot</a></p>
<p>S.Hamel<br />
<a href="http://immeria.net" rel="nofollow">http://immeria.net</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Clint</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-28824</link>
		<dc:creator>Clint</dc:creator>
		<pubDate>Fri, 19 Jan 2007 00:06:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-28824</guid>
		<description>Anonymous,
the spreadsheet that I posted only uses one Standard Deviation because using 3 would have set the LCL to 0.1 - an impossibility for page views per visit. And even using 2 standard deviations put the LCL below 1 which should also be an impossibility.

Therefore, I only used one std dev. so that my LCL was the next minimum above the floor (1).

To your comment about negative LCLs: you could easily wrap the formulae in an IF statement so that zero is returned if the LCL is negative.

-Clint</description>
		<content:encoded><![CDATA[<p>Anonymous,<br />
the spreadsheet that I posted only uses one Standard Deviation because using 3 would have set the LCL to 0.1 - an impossibility for page views per visit. And even using 2 standard deviations put the LCL below 1 which should also be an impossibility.</p>
<p>Therefore, I only used one std dev. so that my LCL was the next minimum above the floor (1).</p>
<p>To your comment about negative LCLs: you could easily wrap the formulae in an IF statement so that zero is returned if the LCL is negative.</p>
<p>-Clint</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anonymous</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-28745</link>
		<dc:creator>Anonymous</dc:creator>
		<pubDate>Thu, 18 Jan 2007 20:15:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-28745</guid>
		<description>First, excellent post.  In my opinion, the biggest problem in web analytics right now is an understanding of how to interpret the data and what actions to take.  I think this sort of representation goes a long way to help the former.

While I'm not that familiar with TQM, I have been thinking about how to apply statistical techniques to web data.  I had actually started down the path of creating a standard-deviation-based graph.

One note: I think the Excel spreadsheet that you posted a link to contains an error.  It has the formula for the UCL and the LCL as only one standard deviation from the average.  I believe the correct formula uses 3 times the standard deviation.  That said, I like the idea of showing one and two standard deviations in the graph as well as the UCL and LCL.  Another thing the spreadsheet should take into account is that if the LCL is a negative value then it should be set to zero.</description>
		<content:encoded><![CDATA[<p>First, excellent post.  In my opinion, the biggest problem in web analytics right now is an understanding of how to interpret the data and what actions to take.  I think this sort of representation goes a long way to help the former.</p>
<p>While I&#8217;m not that familiar with TQM, I have been thinking about how to apply statistical techniques to web data.  I had actually started down the path of creating a standard-deviation-based graph.</p>
<p>One note: I think the Excel spreadsheet that you posted a link to contains an error.  It has the formula for the UCL and the LCL as only one standard deviation from the average.  I believe the correct formula uses 3 times the standard deviation.  That said, I like the idea of showing one and two standard deviations in the graph as well as the UCL and LCL.  Another thing the spreadsheet should take into account is that if the LCL is a negative value then it should be set to zero.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wendi Malley</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-28740</link>
		<dc:creator>Wendi Malley</dc:creator>
		<pubDate>Thu, 18 Jan 2007 19:37:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-28740</guid>
		<description>Another suggestion for a more robust calculation of upper and lower control limits is to use quartiles:
Q1 = 25th Percentile
Q2 = 50th Percentile, a.k.a. Median
Q3 = 75th Percentile  
LCL = Q1 - 1.5*IQR   (IQR stands for Interquartile Range, IQR = Q3 - Q1)
UCL = Q3 + 1.5*IQR
For &lt;b&gt;extreme&lt;/b&gt; outliers you can replace the 1.5 IQR mild outlier multiplier with a 3.   
http://en.wikipedia.org/wiki/Outlier  

Thanks for a great post, Avinash.</description>
		<content:encoded><![CDATA[<p>Another suggestion for a more robust calculation of upper and lower control limits is to use quartiles:<br />
Q1 = 25th Percentile<br />
Q2 = 50th Percentile, a.k.a. Median<br />
Q3 = 75th Percentile<br />
LCL = Q1 - 1.5*IQR   (IQR stands for Interquartile Range, IQR = Q3 - Q1)<br />
UCL = Q3 + 1.5*IQR<br />
For <b>extreme</b> outliers you can replace the 1.5 IQR mild outlier multiplier with a 3.<br />
<a href="http://en.wikipedia.org/wiki/Outlier" rel="nofollow">http://en.wikipedia.org/wiki/Outlier</a>  </p>
<p>Thanks for a great post, Avinash.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Avinash Kaushik</title>
		<link>http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-28729</link>
		<dc:creator>Avinash Kaushik</dc:creator>
		<pubDate>Thu, 18 Jan 2007 18:49:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2007/01/excellent-analytics-tip-9-leverage-statistical-control-limits.html#comment-28729</guid>
		<description>&lt;strong&gt;&lt;u&gt;Brian&lt;/u&gt;&lt;/strong&gt;: Control Charts were originally popularized by the Six Sigma efforts and applied towards reducing defects in manufacturing processes. As such, they are not right for everything (hence my stress on isolating for multitudes of causes and using them for more "process" oriented metrics such as Cart &#038; Checkout Abandonment rates).

I am not sure what top secret NASA data you are pumping into this :) but if you have segmented the data and it still explains "all variability" then in the past I have tried to put in more data points to see if it helps. I am not sure if it will help in your case. Thanks though for trying it, it is great to hear feedback from others.

&lt;strong&gt;&lt;u&gt;Anonymous&lt;/u&gt;&lt;/strong&gt;: As sketched in the post, it would accommodate all the data points, including the outliers. But when that is observed in a metric / process, I have typically done what the wonderful Mr. Morgan suggests in his comment directly above. If relevant in your case, please try that. It is not just a matter of whether or not to ignore the top/bottom percentiles; I would encourage you to think of that decision in the context of the business process / Metric / KPI that you are applying the control limits to, i.e. what you are measuring and what the business context of the outliers is.

&lt;strong&gt;&lt;u&gt;Dave&lt;/u&gt;&lt;/strong&gt;: As always thanks for your comment, it is wonderfully helpful and adds to the quality of the conversation. Gracias!

-Avinash.</description>
		<content:encoded><![CDATA[<p><strong><u>Brian</u></strong>: Control Charts were originally popularized by the Six Sigma efforts and applied towards reducing defects in manufacturing processes. As such, they are not right for everything (hence my stress on isolating for multitudes of causes and using them for more &#8220;process&#8221; oriented metrics such as Cart &#038; Checkout Abandonment rates).</p>
<p>I am not sure what top secret NASA data you are pumping into this :) but if you have segmented the data and it still explains &#8220;all variability&#8221; then in the past I have tried to put in more data points to see if it helps. I am not sure if it will help in your case. Thanks though for trying it, it is great to hear feedback from others.</p>
<p><strong><u>Anonymous</u></strong>: As sketched in the post, it would accommodate all the data points, including the outliers. But when that is observed in a metric / process, I have typically done what the wonderful Mr. Morgan suggests in his comment directly above. If relevant in your case, please try that. It is not just a matter of whether or not to ignore the top/bottom percentiles; I would encourage you to think of that decision in the context of the business process / Metric / KPI that you are applying the control limits to, i.e. what you are measuring and what the business context of the outliers is.</p>
<p><strong><u>Dave</u></strong>: As always thanks for your comment, it is wonderfully helpful and adds to the quality of the conversation. Gracias!</p>
<p>-Avinash.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
