Data analysis is not easy. It takes years to get good at it, and once you get good at it you realize how much more there is to learn. That is part of the joy. You are always learning. You are always growing.
This blogpost is a collection of tips I share with my friends who are just starting out. Each tip is a "simple" mistake that is easily avoided. My hope is that you'll skip them if you are aware of them, and move on to making more important, valuable mistakes. :)
My plan is to wrap each tip with additional observations, context that will be of value even to those who have been at this game for a very long time.
Ready for a can of concentrated compressed energy?
Let's do this.
1. Never Compare Apples to Watermelons.
There are some things that are quite promising about this graph.
I love that the analyst is segmenting the data rather than showing the aggregate trend ("all data in aggregate is essentially crap" – me). I also like that the analyst is showing a six month trend.
But there is something fundamentally wrong about this analysis. Before you jump to my reveal below this graphic, can you guess what's wrong with this data? Try it.
Found the problem?
Four different segments are being compared (yea!), but they are calibrated wrong (boo!).
On the surface this is hard to detect.
The clean part is that there is very little overlap between Search Traffic and Referral Traffic. If you use Omniture's SiteCatalyst or Google Analytics or whatever, they do a good job of collecting clean data into those two segments. But Mobile is a platform. That traffic (or those conversions, in this case) is most likely in both Referrals and Search. So it is unclear what to make of that orange stacked bar. Is that good? Is that bad? Additionally, it is showing conversions already included in Search and Referral (double counting), and because you have no idea what it represents, it is impossible to know what action to take. [The analyst recommended a higher investment in Mobile based on this graph!]
Ditto for Social Media. It is likely that the Social Media conversions are already included in Referrals and, of course, in Mobile. Making that green graph useless. [The analyst recommended a massive increase in investment in Social Media as well. An imprecise conclusion.]
Ensure that you always calibrate the "altitude" of your segments. Always.
So if you want to analyze Mobile performance then you want to compare Mobile and Desktop segments. Very easy to create. For bonus points you can analyze Mobile Search traffic performance with Mobile Non-Search traffic performance. You can analyze Mobile Search performance with Mobile Referring traffic information. Then compare those two to Desktop Search and Desktop Referring traffic. So on and so forth.
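To make the double-counting concrete, here is a minimal sketch in Python with pandas (all channel names, devices and values below are invented for illustration): a platform-level segment like Mobile overlaps channel-level segments like Search and Referral, so their totals inflate, while mutually exclusive device-by-channel segments account for every conversion exactly once.

```python
import pandas as pd

# Hypothetical visit-level data (all names and values invented):
visits = pd.DataFrame({
    "channel":   ["search", "referral", "search", "referral", "search"],
    "device":    ["mobile", "mobile", "desktop", "desktop", "mobile"],
    "converted": [1, 1, 0, 1, 1],
})

# Wrong "altitude": Mobile is a platform, so it overlaps Search and
# Referral, and the segment totals double-count conversions.
overlapping = {
    "Search":   visits.loc[visits.channel == "search", "converted"].sum(),
    "Referral": visits.loc[visits.channel == "referral", "converted"].sum(),
    "Mobile":   visits.loc[visits.device == "mobile", "converted"].sum(),
}
print(sum(overlapping.values()))  # 7, but there are only 4 real conversions

# Right "altitude": mutually exclusive device x channel segments.
clean = visits.groupby(["device", "channel"])["converted"].sum()
print(clean.sum())  # 4, every conversion counted exactly once
```

If your segment totals add up to more than your overall total, that is the smell of mis-calibrated altitude.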
Nice clean segments that will help you find nice clean answers (as good or as stinky as they might turn out to be :).
For Social Media you can compare it to Search (with no other changes to that segment, use the Default in GA/SC/WT/YWA), and for Referring Traffic make sure you create a new segment where you take out Referrers such as Facebook.com, Twitter.com, plus.Google.com, Stumbleupon.com etc., etc. So you'll be comparing clean buckets of Social Media, Search and Referring Traffic with no social referrals included.
Nice clean segments that provide you nice clean answers.
Always pause and ask yourself: "Are my segments all at the right 'altitude?' Are they individually unpolluted by the other?"
Then go analyze and confidently make recommendations based on what you find.
2. Don't Alarm HiPPOs and Sr. Leaders Unnecessarily.
Creating graphs is easy, and I could fill five blog posts with all the nonsense one can accomplish by playing with the axes. Yes it is a pet peeve of mine.
What do you think is wrong with this commonly available graph?
Look at it carefully. Found it?
It artificially inflates the importance of a change in the metric that might not be all that important. In this case, for my data, it is not statistically significant (more on that later in this post), but there is no way you would know that (or not know that) just from the data in front of you. Yet the scale used for the y-axis implies that something huge has happened.
I am going to go out on a limb…. unless you are performing surgery and the above graph is showing the heart rate or blood pressure, try and avoid being so melodramatic in your data presentation. It causes people to read things into the performance that they should most likely not read.
You don't always have to have the y-axis start at zero. But over-dramatizing this 1.5 point difference is a waste of everyone's time. And you know what happened to the boy who cried wolf one too many times, right?
Another important thing.
Label your x-axis. Please.
What time period does this graph cover? The last x hours? The last y weeks? The last z months? Depending on what you choose the data is completely ignorable or deserving of insane additional analytical love. (Assuming of course that you fix the y-axis first.)
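Both fixes take one line each in most charting tools. Here is a matplotlib sketch (the weekly conversion-rate numbers are invented for illustration): pick a y-axis scale that lets a 1.5 point wiggle look like a 1.5 point wiggle, and say on the x-axis exactly what period the graph covers.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Hypothetical weekly conversion rates hovering around 36% (invented).
weeks = list(range(1, 9))
conv_rate = [36.0, 36.5, 35.8, 36.2, 37.0, 36.4, 36.1, 36.7]

fig, ax = plt.subplots()
ax.plot(weeks, conv_rate, marker="o")
ax.set_ylim(0, 50)  # an honest scale: a ~1 point wiggle stays a wiggle
ax.set_xlabel("Week (last 8 weeks)")  # say what period the graph covers
ax.set_ylabel("Conversion rate (%)")
fig.savefig("conversion_trend.png")
```

The default auto-scaled axis would have stretched those 1.5 points across the whole chart height, and that is exactly the melodrama to avoid.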
As the analyst you hold a lot of power in your hands when it comes to visualizing data. Use that power with caution, and great responsibility.
3. Calibrate Your Time Series Optimally.
I am positive that many of you, including my friends who are just getting started, will have taken this screen shot out of Google Analytics and included it in a dashboard or presentation of some kind.
Don't scroll down yet.
Look at it carefully…. what's wrong with it?
It is a chart that shows nine months of performance… by day! The "trend" is completely useless.
But because this is the default view in Google Analytics, everyone uses it. [Arrrrrhhh!] The uselessness comes from the fact that when you look at individual days over such a long time period you are effectively hiding insights / important changes.
It is impossible to find anything of value above.
Let's switch to looking at the exact same time period but by week.
Much better right? No more puke of squiggly lines that mean nothing, show nothing. You can kind of sort of see some kind of trend above, especially towards the end of the graph (even this simple thing was essentially hidden before).
Here's the amazing thing… when looking at long time periods you can do better!
The best practice I recommend in Web Analytics 2.0 is that if you are looking at four weeks of data then you can look at the daily trend and still find interesting insights.
If you look at three months of data (one quarter) then you should switch from the day view to week view. The macro trends won't get masked/hidden in the daily noise.
If you look at time periods longer than that, then it is optimal to look at the monthly view of the data.
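If your data lives in pandas, that recalibration is one line of resampling per view. A minimal sketch (the nine months of daily visit counts below are simulated, purely for illustration):

```python
import numpy as np
import pandas as pd

# Nine months of simulated daily visits (values invented for illustration).
days = pd.date_range("2011-01-01", "2011-09-30", freq="D")
rng = np.random.default_rng(42)
daily = pd.Series(rng.poisson(1000, len(days)), index=days)

# Recalibrate the time series to the window you are analyzing:
# ~4 weeks -> keep daily, ~1 quarter -> weekly, longer -> monthly.
weekly = daily.resample("W").sum()
monthly = daily.resample("MS").sum()

print(len(daily), len(weekly), len(monthly))  # 273 vs 40 vs 9 data points
```

Nine monthly points make the macro trend jump out; 273 daily squiggles bury it.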
In our case this is what that would look like….
You can clearly see the dip from Jan to Feb. You can see the nice consistent dip through July. Then something magical happened (What! What! What!) that has traffic rising to record levels.
All of this was nearly impossible to see in the daily graph, and most of it was hard to see in the weekly graph.
Do remember this really important point: When you look at lots of data, nine months in this case, you are usually not looking for tactical bits, you are trying to find big hairy things… calibrate your time series accordingly.
And if you calibrate your segments optimally you can quickly start doing deep dive analysis looking for some answers. What happened post July? What caused the funk between March and July? Why did x or y or z not happen? All the right good questions that otherwise might have been hidden in plain sight.
Simple best practice. Use it.
4. Always, Always, Always Make Your Point Clearly! (Oh, and Colors Matter.)
Every one of you will present decks with 95 slides. Or at least 55. : )
When you are doing that data regurgitation it is important to try to make life for the person at the other end (typically your boss, or worse your boss's boss) as easy as possible.
At some point in the data tsunami you unleash eyes glaze over and life becomes boring.
So try to… ok, what do you think the two colors in the below graph represent? Don't look at the legend.
Bonus, what do you think the data is telling you? Don't scroll, think for just five seconds.
My first thought was: how come only 29 percent of the organizations have more than one person?! That is bad.
Wait. That did not make sense.
I went back to read the question. Then the graph. Then the legend. Then back to the question. Then the legend.
Problem one is that "red" denotes "good" in this case and "green" represents "bad."
Here's something very, very simple you should understand and slavishly follow: Red is bad and Green is good. Always. Don't try to be cute. People will instinctively think that. We have been patterned that way. So show "good" in green and "bad" in red. It will communicate your point faster.
Problem two, much worse, and perhaps only for me, was that it was harder than it should be to understand what this data says. First stacked bar above: "Yes 71 percent of the organizations Yes, more than one person." Too many yesses.
And what is the 29 percent? If the question is how many people are directly responsible for improving conversion rates, and 71 percent have more than one person, then are the 29 percent those that have one person, or less than one person, or no one at all? Unclear (and frustrating).
[Third bar above] And if 62 percent of the people said "Yes we have no one responsible for improving conversion rates," then what does the 38 percent in green mean? Is it: "No, No we have someone responsible for conversion rate improvement?"
This graph actually comes from a source I deeply respect, an organization with really great analysts. But I'm afraid I completely failed to grasp the point. Do you understand it?
Sometimes you just want to skip the graph.
I don't understand the data above so I'm going to make some numbers up, but would a table like the one below have worked much better to communicate the point?
Why do the graph at all?
Okay so sometimes the application of something humorous might not work (I do always try :). But the rest of the table? Effective?
And if you have data for the last two years then perhaps this table is even more valuable…
Much, much better with context. I love context dearly. Amazingly so does your boss.
Or perhaps if you want to show it to very senior executives then maybe the numbers themselves are less than useful. You could go with something like this…
Scroll back up.
Look at the graph.
Now look at the table above.
I'm riding a horse! No, not really. What do you think?
I love graphs as much as all of you. But above all, what I crave is simple and effective communication. I want to make the point as fast as I can so that we can begin the politics and hard work of taking action. That is after all what pays our salary right?
5. Statistical Significance is Your BFF.
Okay I gave this one away with the title. We all (novices and experts) make this mistake all the time.
We create a table like the one below. (Mercifully the segments are calibrated right, hurray!) We create a "heat map" in the table highlighting where the conversion rate is good. We declare Organic to be the winner, Direct is close behind. Then the other two. And we recommend doing more SEO.
What's the problem with that?
It is possible that none of this data is significant – as in, the fact that the numbers seem to be so different might not mean anything. [Looking at July...] It is entirely possible that it is completely immaterial that Direct is 34% and Email is 10%, or that Referral is 7%.
One simple fix (covered in more detail in this post: 4 Not Useful KPI Measurement Techniques ) is to share the raw numbers to see if the percentage is meaningful at all. For example, all the data in the Direct row could represent conversions out of 10 visits while all the Referral data represents conversions out of 1,000,000 visits each month.
The better, much, much better thing to do would be to compute statistical significance to identify which comparison sets we can be confident are different, and in which cases we simply don't have enough confidence.
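For two conversion rates, the standard tool is a two-proportion z-test, easy to sketch in plain Python (the visit and conversion counts below are hypothetical, chosen to show both outcomes):

```python
from math import erf, sqrt

def two_proportion_z_test(conversions_a, visits_a, conversions_b, visits_b):
    """Two-tailed z-test: are two conversion rates confidently different?"""
    p_a = conversions_a / visits_a
    p_b = conversions_b / visits_b
    pooled = (conversions_a + conversions_b) / (visits_a + visits_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    z = (p_a - p_b) / se
    # two-tailed p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# The same impressive-looking 8% vs 5% gap, on two sample sizes:
_, p_small = two_proportion_z_test(4, 50, 10, 200)          # tiny samples
_, p_large = two_proportion_z_test(800, 10000, 500, 10000)  # big samples

print(p_small > 0.05)  # True: could easily be noise, collect more data
print(p_large < 0.05)  # True: confidently different, act on it
```

Same percentages, opposite conclusions. That is the whole point of checking significance before recommending action.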
I have something special for you. I've just uploaded a brand spanking new Statistical Significance Calculator to my old post on that topic. It does 1-tail and 2-tail tests and the even more beloved chi-square test. Download it. Adapt it for your use. Ecstasy will follow.
One of the most common complaints of our Sr. Leaders is that we engage in massive data puking (true!) and never help them identify with any degree of certainty whether an action we are recommending will produce results. Well, this is our chance. If you check that the results you are seeing are statistically significant, you can make recommendations of action knowing that they will produce the results you want (all other things held constant).
Remember ecstasy awaits!
Update: Bonus: If you use Google Analytics the always wonderful Michael Whitaker has created something delightful (triggered by our discussion in comments below). A Z-Test calculation that you can embed directly into Google Analytics!
Here is a mini-tutorial on how to use this delightful feature:
1. Visit Michael's blog and drag the bookmarklet into your browser's bookmarks bar. Stats calculator for Google Analytics.
2. Go to any report in Google Analytics and switch to a Goal tab or the Ecommerce tab.
3. Click Z-Test bookmarklet in your bookmarks bar.
4. At the bottom of your GA report table you'll see a new button called Z-test.
5. Check the box next to the two dimensions for which you would like to check statistical significance (apply the Z-test).
6. Press the button at the bottom of the table, Z-test, and boom (!) you have your answer. Green is good, red (lower than 95%) means you need to collect more data before you decide.
The conversion rates of our two main PPC keywords are 1.33% and 1.94%. Is that difference statistically significant? Should we go ahead and invest more in Calico Critters (if we are using fixed budgets or there is more inventory)? Let's check…
Why yes of course we can!
Twitter sends 5,546 Visits and has (on a non-ecommerce website) a Goal Conversion Rate of 5.27%. Facebook sadly only sends a fraction of that traffic and has a lower conversion rate of 4.71%. Stop spending money/time on Facebook based on this data? Deprioritize it at least? Let's check….
No! See how that saved your goat, you were just about to plunk down a million dollars into Twitter! :)
7. Celebrate your newfound awesomeness!
Currently this only works for the Ecommerce Conversion Rate and Goal Conversion Rate key performance indicators.
For computing significance ("are the two conversion rates different enough that you can confidently take action") on Ecommerce Conversion Rates you can use this with no thought. (Ok always apply some thought!) But for using it to compute significance for Goal Conversion Rate you should be a little more careful. Unlike Ecommerce Conversion Rate, it is possible for a person to have more than one unique Goal Conversion during a visit in Google Analytics. So when you apply the Z-test you'll be comparing "rotten apples to rotten apples," i.e. measuring the same way for all dimensions. In the most ideal scenario you would apply the Z-test to each goal by itself. I still believe it is of value to use the Z-test for Goal Conversion Rates, but be aware of the nuances.
One more important caveat. Z-test / statistical computations are most optimally applied to results of controlled experiments and not to observational data because in the latter there could be other, uncontrolled, variables at play. So this is not "pure" in some sense. But (as I mention below in comments) we are better off being aware of this purity and still using this test because the insight delivered is better than just "eyeballing" the number to figure out when to take action.
Many thanks to Michael for doing this. No more going to Excel (at least for GA); we can be a little smarter, quicker, directly in our web analytics tools. Makes me wonder why web analytics vendors are so enamored with data puking and can't build all this stuff natively to make more of us Analysis Ninjas!
6. There is Such a Thing as Too Little Data!
A variation on the above "simple" mistake.
I know we all get excited about having data, especially if we are new at this. And we get our tables and charts together and we start reporting data and having a lot of fun.
This, dear reader, is very dangerous. You see there is such a thing as too little data.
You don't want to wait until you've collected millions of rows of data to make any decision, but the table on the left is nearly useless. Recommending doubling down on Facebook (as the Analyst did) this early in your evolution would be a profound mistake.
Things can change so much in just a few days (and they will for you!).
So you can't do anything with data like this?
But what you can do is look at this report to see if places you've invested time in earning links from are sending you traffic (or not). Look for surprises, places you did not invest money, and see why they linked to you. You can get a tiny bit of understanding of your initial marketing strategy.
Do other useful things.
Look at your search keyword reports. Do you see a few people coming in on keywords you SEOed the site for? Better still, go into Webmaster Tools and look to see if your site is well indexed. Look at the keywords for which your site is showing up in Google search results. Are they the ones you were expecting?
Even better… spend time with competitive intelligence tools like Compete / Trends for Websites, Insights for Search, Ad Planner and others to seek clues from your competitors and your industry ecosystem. At this stage you can learn a lot more from their data than your data!
We all tend to read too much into data sometimes. A good analyst knows when there's just not much there and volunteers her/his time on helping run a Task Completion Rate survey or creating new/better Inbound Marketing programs. Go get traffic!
7. Pie Charts Are Evil.
Okay maybe not evil. They are useful on rare occasions. See "Enchanting Analysis: Rule 2: Establish Macro Importance" in this post: Mate Custom Reports With Advanced Segments!
But most of the time they are an active hindrance to communicating anything of value.
Examples of horrible pie charts abound. But let me share this really simple one that I am sure you've seen or perhaps created yourself. :)
Take a moment to breathe it into your brain. What do you think?
The 3D effect does not help. Trust me on that.
This set of charts very cleverly hides any available insights because it makes your executive do these operations for every segment they want to understand: Look left, find the interesting slice. Commit the color and number to memory. Go right. Find the same color and segment and commit the new number to memory. Now subtract the first number from the second. Decide if the result is good or bad.
Repeat five more times.
Remember to remember only the interesting bits.
When the chart was created, did you think you were going to torture your executive today? Would it be surprising then that every atom in this universe thinks "omg, numbers are so haaaarrrrd!"?
Why torture people who are so critical to your financial well being?
Just use a table (as we did in #4 above).
Much easier, right?
At the very least, you don't have to dart your eyes from left to right all the time and commit numbers to memory to understand what's happening.
And since you the Ninja-in-making are not being paid to just data puke, why even show things that might not be material?
Just go with this…
Would the discussion with your management team be much more focused now? And faster?
Oh and… you've already put so much effort into collecting and analyzing the data. Why not use your intelligence (and the statistical significance calculator) to filter data and just show what's most relevant?
It is easy to make things hard to understand. Working hard to make them easy to understand is what brings glory. Sustained glory.
So do that.
Okay it is your turn now.
What are the simple mistakes that you've learned to avoid? Would you recommend a different strategy to follow for one of the mistakes above? Got a better picture to submit? The mistake that most sets you off in the field of web analytics? How did you learn not to make these mistakes?
Please share your feedback, pictures, complaints, mistakes via comments.