Arthur C. Clarke said:
"Any sufficiently advanced technology is indistinguishable from magic."
That quote comes to mind when I think of a new feature in Google Analytics that carries the unassuming name of Weighted Sort. It is an advanced implementation of technology (mathematical algorithms in this case) and when used it very much feels like magic!
In this blog post I want to share with you why I am so incredibly excited about this feature, how it works and how going forward you will reject every tool that does not come built in with this feature (ok so maybe that's a stretch, but I promise you this is so cool that at least for a few minutes you'll think other tools are lame by comparison!).
Let's take a couple of steps back and get some context before we dive in.
We have a very long tail of data in web analytics. Tens of thousands of rows of keywords in the Search Report (even for this small blog!). Hundreds and hundreds of referring urls and campaigns and page names and so on and so forth.
Yet because we are humans we tend to look at just the top ten or twenty rows to try and find insights. The problem? The top ten of anything rarely changes (except in rare circumstances like a sale or on a pure content – think news – site).
Hence I have persistently evangelized the need for true Analysis Ninjas to move beyond the top ten rows of data to find insights.
How? Advanced table filters, tag clouds and keyword trees are a good start.
But we need more.
One more problem though.
As if the massive data we have were not enough of a problem, we also rely on Averages, Percentages, Ratios and Compound/Calculated Metrics in a profoundly suboptimal way, as a drunken man uses lamp-posts – for support rather than for illumination.
Take a percentage, for example Bounce Rates. The top ten won't change.
Hmmm. What to do, what to do?
You know what? I'll try to find the keywords with the highest bounce rates and fix them! After all I don't want to have all those visitors say: "I came. I puked. I left!"
Ok analytics tool: Sort descending!
See all those single visits? Would improving these bounce rates have a huge impact?
Ok maybe I should learn from keywords with low bounce rates so I can perhaps take the lessons from my awesomeness and apply them to others. Tool: Sort ascending!
Arrrrrh! Again! Useless.
What could I possibly improve by focusing on these keywords with so few visits? Nothing.
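To make the problem concrete, here is a minimal Python sketch of what a plain ascending or descending sort does to a percentage metric. The keywords and counts are made up for illustration (they are not from my reports); the point is only that rows with one or two visits float to the top either way.

```python
# Hypothetical keyword rows: (keyword, visits, bounces). Not real data.
rows = [
    ("brand name", 5000, 2000),
    ("web analytics tips", 1200, 900),
    ("obscure typo query", 1, 1),
    ("one-off long tail phrase", 1, 0),
    ("another rare query", 2, 2),
]


def bounce_rate(row):
    """Plain bounce rate: bounces divided by visits."""
    _, visits, bounces = row
    return bounces / visits


# The two "silly" sorts: highest bounce rate first, lowest bounce rate first.
descending = sorted(rows, key=bounce_rate, reverse=True)
ascending = sorted(rows, key=bounce_rate)

for label, table in (("descending", descending), ("ascending", ascending)):
    print(label)
    for keyword, visits, bounces in table[:2]:
        print(f"  {keyword:26s} visits={visits:5d} bounce={bounces / visits:6.1%}")
```

In both directions the first rows are the ones with a single visit or two, which is exactly the useless result described above.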
So to recap:
- We tend to only understand the top ten rows of data, because that's what is easily visible.
- Gold exists beyond the top ten rows.
- Using percentages and averages suboptimally makes it impossible to find the Gold!
Yet gold I must find if I want to improve the outcomes for my web business (for profit or, as in the above example, non-profit).
The Google Analytics team has built an innovative and mathematically intelligent new feature called Weighted Sort to solve precisely this problem.
Now when you sort the data off a percentage or a ratio, like in the above case, you'll see this on top of the table.
When you press this unassuming checkbox something magical happens. Google Analytics brings back for me the rows of data I should analyze further to have the highest possible impact on my business.
It looks like this. . .
Notice that the Visits for these keywords are sorted in an "odd" manner, as are the bounce rates.
That is the magic.
Now you don't have to go through wild gyrations (or worse guesses) to figure out the best places to focus your attention on. You can skip combing through the, in this case, 5,777 rows of data. The algorithm will do that for you!
The "magic button" will sort your data from:
"focus here because something very important is going on here, and if you focus here chances are your improvements in reducing bounce rates will have a very high ROI for your business"
to:
"rows/keywords where your efforts might not quite yield big ROI improvements"
Translation: Sort by "interestingness". What are the most interesting keywords with high bounce rates? [Where things are going "wrong".]
You can reverse sort the table, keeping the Weighted Sort checkbox on, and you'll find the most interesting keywords with the lowest bounce rates [where things are going right].
No more using silly ascending and descending sorting. No more worrying about if you are focused on the right places. Less worrying if you are prioritizing things right.
Save time. Do less data puking. Be happier!
Awesome right? Go try it on your own Google Analytics data!
How Does Weighted Sort, aka The Magic, Actually Work?
Good question. It is also the reason for the Arthur C. Clarke quote.
Of course there is no magic, it is all the beauty of some wonderful math and ingenuity.
But it is complicated.
Let me try to explain it as best I can using some visualizations and formulas.
What powers weighted sort?
This simple hypothesis:
The reported value of a metric (bounce rate, conversion rate, time on site etc) for dimension values with few participants (visits, in our case) will be an imprecise reflection of its true value.
English: If the dimension you are looking at is referring urls and if only five visits this month originated from Bing then a conversion rate of 80% (or a conversion rate of 20%) is not reflective of the "true" conversion rate.
There are too many unknown variables, or irreplicable events, that could have contributed to that number (80 or 20) making it incredibly difficult to make any decisions based on just 5 visits.
You saw this problem when I sorted descending or ascending for the metric bounce rate above.
So how do you address this problem?
The fearless developers were given this amazing goal:
Compute the "expected true value" for each row on the table.
It is a difficult problem to solve. But since the actual values are not very useful, applying some logic and mathematical intelligence to figure out what the true value is can brilliantly help identify "interesting" data (aka where to focus).
Google Analytics computes the expected true value (in our case above "expected true bounce rate") and then sorts the data using the expected true value (ETV) giving you the most interesting data to look at.
The expected true value (ETV) is not shown in the UI (as it would simply be distracting).
How exactly do you compute the "expected true value"?
That is a good question.
Think of a scale. On one end there is a dimensional value (keywords, countries, referring urls, product names etc) with zero visits, and on the other end one with "a lot" of visits.
Let's assume we are analyzing the dimension countries and the metric bounce rate.
Remember our hypothesis above? The reported value of a metric does not reflect its true value when it comes to small samples (visits in our case).
So if there was one visit from South Africa its actual bounce rate reported in the tool is not a precise reflection of what the true value might be. But if there were A Lot of visits from South Africa then the actual value is reflective of the true value.
Put another way. . . I request you to pay attention. . . .
For values at the very far left of the scale we set the expected true value (ETV) equal to the site average. A very safe bet.
For values at the very far right of the scale (i.e. "a lot" of visits) it is quite likely that the ETV will be equal to the actual value. Makes sense right?
All other points between the left and the right will have ETVs that are a blend of the site average and the actual value.
Hence when computing ETV. . .
Those closer to the left (fewer visits) will have a higher blend of site average compared to actual values.
Those closer to the right (many many visits) will have a higher blend of actual value compared to site average.
Here's an image that explains this very critical concept clearly. . .
Crystal clear on how ETVs are computed?
The quest is to figure out the expected true value (ETV) for any metric for a given dimensional value (keyword, referrer, campaign, display ad, social media strategy).
NOTE: Numbered values (0.01, 0.99, 0.5 etc) are for illustrative purposes, just to explain how weighted sort works. Actual values used in your report are intelligently and automatically computed in context of your data.
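If you prefer code to scales, the blend described above can be sketched in a few lines of Python. The function name, the parameter names, and the numbers in the usage loop (a 60% site average and a row with a 100% actual bounce rate) are my own illustrative assumptions, not anything exposed by Google Analytics; only the idea of blending the site average with the actual value comes from the description above.

```python
def expected_true_value(site_average, actual, weight_on_average):
    """Blend the site average with a row's actual value.

    weight_on_average is between 0 and 1: close to 1 for rows at the far
    left of the scale (almost no visits), close to 0 for rows at the far
    right (a lot of visits).
    """
    if not 0.0 <= weight_on_average <= 1.0:
        raise ValueError("weight_on_average must be between 0 and 1")
    return weight_on_average * site_average + (1.0 - weight_on_average) * actual


# Hypothetical row: actual bounce rate 100%, site average 60%.
# The three weights mirror the illustrative 0.99 / 0.5 / 0.01 points on the scale.
for weight in (0.99, 0.5, 0.01):
    etv = expected_true_value(60.0, 100.0, weight)
    print(f"weight on site average = {weight:4.2f} -> ETV = {etv:.1f}")
```

With almost all the weight on the site average the ETV stays near 60; with almost none it stays near the row's own 100.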
Can you give me a specific example of ETV computation?
Let's say you are a multi billion dollar multi country multi people corporation with multiple products and services.
The next step in your world domination plan is to figure out how best to move beyond your current list of country domination (United States, Brazil, UK, India, Spain).
What do you do?
You'll look at where your traffic comes from and look at bounce rates, to figure out how you can retain even more people who land on your website. You are confident that if you just retain them beyond one page, engage them beyond your 200 mb flash intro, then you know you'll suck them into your business. Then world domination is but 15 minutes away!
So you log into your web analytics tool and you'll probably see a report like this in Google Analytics, or Omniture / Adobe or CoreMetrics / Unica / IBM or WebTrends or. . .
And you let out a little sarcastic: Just great.
The report has confirmed what you already knew from staring at the same top ten rows. You very quickly went nowhere.
But you are in luck, you are using Google Analytics! (At least in my imagination. :)
You click on the Bounce Rate column to sort and then check the Weighted Sort checkbox and. . . bam!
Something useful. . . .
Sorted by "interestingness"!
You are now looking at an intelligently sorted list of countries where if you focus on improving your bounce rates (i.e. lower them) you'll have the best bang for your recession-hit buck!
Segment the traffic from Argentina, Peru, Spain, Colombia, Chile and Denmark and you are on your way to the aforementioned world domination.
But how did Argentina rank #1 (4k visits), Peru #2 (1.5k visits), Spain #3 (8.8k visits)?
Analytics used the, again, aforementioned formula to compute the expected true value (ETV), by leveraging the Average Bounce Rate (64%) and the Actual Bounce Rate for each country (last column above) and assigning contextual weights based on Visits from each country.
Let us see how the ranking worked by reverse engineering it. Here is what happened:
Argentina Bounce Rate ETV = (0.01 * avg BR) + (0.99 * actual BR)
Argentina Bounce Rate ETV = (0.01 * 63.49) + (0.99 * 79.53) = 79.37
Peru Bounce Rate ETV = (0.1 * 63.49) + (0.9 * 80.24) = 78.57
Spain Bounce Rate ETV = (0.001 * 63.49) + (0.999 * 77.76) = 77.75
Now you can see how the countries, even though their visits are very different, were sorted #1, #2, #3. By interestingness, by computing the ETV for each.
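The reverse engineering above is easy to check yourself. This small Python sketch uses the bounce rates and the illustrative weights from this post (the weights are not the real ones Analytics computes), and the variable and function names are mine.

```python
SITE_AVERAGE_BOUNCE_RATE = 63.49  # site-wide bounce rate used above, in percent

# (country, actual bounce rate in percent, illustrative weight on the site average)
countries = [
    ("Spain", 77.76, 0.001),
    ("Peru", 80.24, 0.1),
    ("Argentina", 79.53, 0.01),
]


def etv(actual, weight):
    """Blend of site average and actual value, as in the formulas above."""
    return weight * SITE_AVERAGE_BOUNCE_RATE + (1 - weight) * actual


# Sort by ETV, highest first, instead of by the actual bounce rate.
ranked = sorted(countries, key=lambda c: etv(c[1], c[2]), reverse=True)

for position, (country, actual, weight) in enumerate(ranked, start=1):
    print(f"#{position} {country:10s} actual={actual:.2f} ETV={etv(actual, weight):.3f}")
```

Notice that Peru has the highest actual bounce rate of the three, yet it lands at #2 because more of its ETV is pulled toward the site average.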
Where did the numbers in red (the weights, like 0.01 and 0.99) come from? You were not paying attention!!
Remember the scale? (If not see picture with scale above.)
The numbers in red are:
1. just for illustrative purposes in this blog post
2. a function of where Visits by a country would fit, closer to the Zero (Peru) or closer to A LOT (Spain), hence the name weighted sort
3. always computed uniquely for your website data based on an intelligent mathematical formulation (which is patent pending and I can't reveal to you!)
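Since I can't show you the real formulation, here is a stand-in you can play with in Python. To be loud about it: the weight function k / (k + visits) and the constant k are my assumptions, a common way to shrink small samples toward an average, and NOT the formula Google Analytics uses. It will not reproduce the illustrative weights above. The keyword rows are made up too.

```python
def weight_on_site_average(visits, k=100.0):
    """Hypothetical weight: 1.0 at zero visits, approaching 0 as visits grow.

    k is an assumed tuning constant: the number of visits at which the
    site average and the actual value are blended 50/50.
    """
    if visits < 0:
        raise ValueError("visits cannot be negative")
    return k / (k + visits)


def weighted_sort(rows, site_average, k=100.0, descending=True):
    """Sort (name, visits, actual_value) rows by their blended ETV."""
    def etv(row):
        _, visits, actual = row
        w = weight_on_site_average(visits, k)
        return w * site_average + (1.0 - w) * actual
    return sorted(rows, key=etv, reverse=descending)


# Hypothetical rows: (keyword, visits, bounce rate in percent).
example_rows = [
    ("single visit keyword", 1, 100.0),
    ("big keyword", 4000, 80.0),
    ("medium keyword", 50, 90.0),
]

worst_first = weighted_sort(example_rows, site_average=60.0)
best_first = weighted_sort(example_rows, site_average=60.0, descending=False)
print([name for name, _, _ in worst_first])
print([name for name, _, _ in best_first])
```

The single visit keyword has the worst actual bounce rate, but its ETV sits close to the site average, so it drops to the bottom of the "where things are going wrong" list. Flipping `descending` is the equivalent of reverse sorting the table with the checkbox still on.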
You now understand how weighted sort works! Yea!
What if you wanted to discover which are the most interesting countries to focus on, where bounce rates are already low, and deepen your world domination?
Reverse sort the table. . .
Examples Of Weighted Sort Analysis You Should Try.
I wanted to close this post by highlighting other places you can use weighted sort and some other types of analysis you could do.
Focus your efforts for attracting New Visitors to your site.
Weighted Sort also works with % of New Visits. So let's say you are a newspaper and up against the "newspapers" of Fox. To survive you must find new countries (or Cities or Regions) from which to attract lots more new visitors.
Well just sort by % of New Visits and you'll have the answer. . . .
Now you know where to focus.
[Remember that for a newspaper Repeat Visits are also great! :)]
How about looking at the most interesting countries from where the % of New Visits is already high? Just reverse sort the above column.
You might then want to segment that data to go see if over time Visitor Loyalty for those countries is also increasing, or these are just fly-by-night visitors.
Valuable analysis right?
Understand audience preferences, improve $$, for a non-ecommerce site!
I don't have ads or promotions on this website. But like any good Analysis Ninja I have identified my goals (I have six) and then identified values for each goal. The values define revenue that does not come to me directly, on this site, but rather comes to me in other ways as a result of the work I do on the blog (multi channel impact baby!).
The benefit of Goals and Goal Values is that it helps me do "financial analysis" for all the traffic I get (you all!). That means I can focus on what works for you and what works for me.
The metric I use is $ Index. It is the average value a given page or a set of pages adds to the overall pie.
The analysis I want to do is to understand what pages / content I should focus on to create the highest possible impact.
I am not going to look at the normal table found in Google Analytics or Site Catalyst or Yahoo! Web Analytics.
I am going to look at the table with Weighted Sort turned on to identify the rows with "interestingness". . .
Who would have thunk that my public speaking engagements page was of so much interest and creating so much value for me, with just 469 page views! Certainly not me.
Some of the other rows of data were also unexpected (I need to do more videos, podcasts!) and others were just plain gratifying (I love killing useless metrics, and so do you!).
But there was also heartbreak.
When I reverse sort the data, to find which blog posts / topics are not generating enough $ Index (value), I was sad to see this was the #1 post. . .
I was really sad because I was a manager and a senior manager and a director of web research and analytics teams. The above post distills my little wisdom.
More people should read this post (and similar by others) because day in and day out I see wrong people leading analytics teams causing problems for the company and sucking the life out of the Analysis Ninjas. And I hate that.
See why my heart is broken with that low $0.11 value?
But at least I know!
Money, Money, Honey Bunny!
Can't close a blog post without an example of conversion rates right?
Traffic comes to your website from many sources. We typically tend to look at silos and rarely compare across acquisition channels.
Hence I recommend that you look at one of my favorite reports: All Traffic Sources.
Let's suppose you are an Analysis Ninja called Nico Weber. Now at a glance you can compare direct traffic with referral traffic with paid search with organic search with campaigns with. . . everything! Make it your new favorite.
When you report to your Sr. Leader now you can look across ALL traffic channels and tell her/him which ones are most interesting for the company. . .
Did anything in web analytics look more delightful? [Maybe the Intelligence Reports. :)]
The above table helps you prioritize where your most interesting sources of traffic are, not by conversion rate only but rather by using an intelligent mathematical algorithm that weights against Visits while computing the expected true value of the conversion rate.
Oh and don't forget to reverse sort and find the "loser" traffic source prioritized by interestingness!
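If you want to try the same idea on conversion rates outside the tool, here is a rough Python sketch. Everything in it is an assumption for illustration: the traffic sources and counts are made up, and the k / (k + visits) weight with K = 200 is my stand-in, not the patent pending formulation Analytics uses.

```python
# Hypothetical traffic sources: (source, visits, conversions). Not real data.
sources = [
    ("google / organic", 20000, 400),
    ("(direct)", 8000, 240),
    ("newsletter / email", 900, 63),
    ("tiny-blog.example / referral", 3, 1),
    ("partner.example / referral", 1500, 6),
]

K = 200.0  # assumed constant: visits at which the blend is 50/50

# Site-wide conversion rate across all sources.
site_rate = sum(c for _, _, c in sources) / sum(v for _, v, _ in sources)


def conversion_rate(row):
    _, visits, conversions = row
    return conversions / visits


def etv(row):
    """Blend the site-wide rate with the source's own rate, weighted by visits."""
    _, visits, _ = row
    w = K / (K + visits)
    return w * site_rate + (1 - w) * conversion_rate(row)


naive_top = max(sources, key=conversion_rate)  # plain descending sort winner
winners = sorted(sources, key=etv, reverse=True)  # weighted, best first
losers = sorted(sources, key=etv)  # weighted, reverse sorted

print("naive #1:   ", naive_top[0])
print("weighted #1:", winners[0][0])
print("loser #1:   ", losers[0][0])
```

The plain sort crowns the three-visit referral; the weighted version puts the newsletter on top and the partner referral at the head of the "loser" list, which is where you would actually want to spend your time.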
That's weighted sort.
It's a simple feature, a great addition to the portfolio of techniques that Analysis Ninjas will use to find insights faster and focus on what's important.
It is my fondest hope that web analytics vendors like Adobe, IBM, Yahoo! will take a step back from this constant quest to collect ever more data and just puke it out. I hope they'll take mercy on the Reporting Squirrels and Analysis Ninjas of the world and spend 10% of their vendor resources on making tools smarter, a bit more intelligent. We deserve at least that much.
I hope the Google Analytics team also continues to do so.
Ok your turn now.
What do you think of this small feature in Analytics? Do you understand how it works? Do you use it in your job already? What do you think the team at Google did right with this feature? What could they have done better? Are there other techniques you use to move from Data to Insights faster?
Please share your feedback, tips, critique, words of praise, and all else via comments.