Like a vast majority on planet Earth, I love data visualizations. Ok, so perhaps as the author of two bestselling books on analytics I love them a little bit more!
There is something magical about taking an incredible amount of complexity and presenting it as simply as we possibly can with the goal of letting the cogently presented insight drive action. Magical.
A day-to-day manifestation of this love is on my Google+ or Facebook profiles where 75% of my posts are related to my quick analysis and learnings from a visualization. Be it looking at 1.1 million FCC net neutrality comments, things people around the world identify as their biggest threat, water consumption of a burger patty vs. daily cooking, the religious gap on spanking children, or a simple graph that raises profound questions about where we donate vs. diseases that kill us.
Data visualized is data understood. Better. Faster. More useful. It delivers world peace!
I'm exaggerating a tiny bit. (As is clear from the discussion on the preference of guns over knowledge in 37 US states. But at least we're talking.)
In this post, I want to share some examples of data visualization I was excited about recently. In each case the creator did something interesting that made me wonder how I can use their strategy in my daily efforts in service of digital marketing and analytics.
We will look at six short stories. You are welcome to read them all at once (warning: once you start you won't be able to stop!), or you can consume them one at a time. For four of the examples, I'll also share how the visualization inspired me to apply the lessons to my web analytics data. In the other two, I'll ask for your help in how you might connect the inspiration to your work as a Marketer/Analyst.
Short story #1: Treemaps, Sunbursts, Packed Trees, Oh My!
Short story #2: Predictive Modeling, Quantifying Cost of Inaction.
Short story #3: Streamgraphs, Data Trends Diving Made Simple!
Short story #4: Multi-dimensional Slicing and Dicing!
Short story #5: Segmented Stacked Square Charts.
Short story #6: Conditional Formatting, Simple Strategies To Drive A Big Focus!
Six stories, a total of eleven different data visualization techniques to inspire you to think different at work when you play with data. Ready?
Our lives are dominated by columns and rows. [And sometimes they are indeed optimal: 7 Data Presentation Tips: Think, Simplify, Calibrate, Visualize. You'll also see examples below.]
So a table like this one is par for the course for you.
Table comes in. You do your best to understand what is going on. Yes you see the numbers, there are lots of them. You scroll up and down. Nice. Some countries use a lot of oil. That's just the top 12 rows, there are another 196 rows of data. Sure, sure, sure. Long tail. Yippie!
But how can you be expected to understand it all? How can you understand enough to at least pick directions you want to go down?
The table above is from Stats Monkey. Their approach is to actually present the data using a Treemap (they call it a squaretree for some reason).
It is so much better!
You can suddenly see the forest and the trees. (Get it?)
The few dominating countries (USA! USA! USA!) are more clearly visible, and you get a much stronger sense of proportions. Yes, you could see in the table that the US was bigger than China, but the Treemap really brings the comparison home. You start to see weird things like Russia and India are the same. Yes, it was in the table. But for a visual person like me, this is the ah-ha moment.
While you can't see the smaller consumers all that easily, you can hover your mouse and see the details.
Additionally, you can go down to the little ones, now that you have the ability to easily do that, and point and hover.
Three cool benefits: 1. Treemaps are a great way to visualize a lot of information. 2. They are really good at showing the differences in the big head and the long tail. 3. They can form the foundation of allowing data consumers to drilldown into the represented segments.
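To make the proportionality point concrete, here is a minimal Python sketch of the simplest possible treemap layout: a single-level "slice" layout that cuts a rectangle into strips whose areas match each value. The country names and numbers are made-up placeholders, not the Stats Monkey data, and real tools typically use more refined layouts (squarified treemaps, nested levels).

```python
def slice_layout(values, width=100.0, height=60.0):
    """Split a width x height rectangle into vertical strips.

    Each strip's area is proportional to its value. Returns
    {label: (x, y, strip_width, strip_height)}.
    """
    total = sum(values.values())
    rects = {}
    x = 0.0
    # Largest first, so the "big head" sits on the left.
    for label, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        strip_width = width * value / total
        rects[label] = (x, 0.0, strip_width, height)
        x += strip_width
    return rects


# Hypothetical numbers, for illustration only.
consumption = {"Country A": 50, "Country B": 30, "Country C": 15, "Country D": 5}
layout = slice_layout(consumption)
for label, (x, y, w, h) in layout.items():
    print(f"{label}: x={x:.1f}, width={w:.1f}, area={w * h:.0f}")
```

Because area is the only thing encoding the value, the head dominates the picture and the long tail shrinks to slivers, which is exactly the effect described above.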
At a glance you can see all the big clusters of sources (close to the channels view in Google Analytics).
You can hover over each box to get a sense of the key metrics. Number of visits, percentage of share of total visits and the percentage change (which you can discern from the color of each box, in that sense the Compete Treemap does not use color just for decoration).
If you are interested in any particular channel, Miscellaneous as an example, you can click on it and… boom!
You see the big ones named; the hidden, mysterious ones you can unmask with a mouse hover.
It would be nice to see all the sites named, but it is kind of nice that it forces you to internalize the big ones, likely where you can have the biggest impact, and then look at the small ones.
Net, net. A delightful way to take your 198 row table and present it in a manner that aids stronger understanding of performance.
Let's go back to our table, and global oil consumption.
Stats Monkey also presents that table using the Sunburst visualization…
Perhaps compared to the Treemap, this visual shows fewer countries and fewer actual numbers of oil consumption due to space limitation.
You can still hover your mouse and get the details of each country. Additionally you can click on any country and just look at that one. Better than the table, but perhaps less optimal than the Treemap.
I want to use the above visual to share with you how much I adore the Sunburst visualization. I believe it is best at describing sequences of events. It is best demonstrated in the example below, which illustrates the path followed by a group of people on a website.
You get a confusing little thing, but the visualization is interactive. You simply move your mouse and it illuminates the journey and how many people follow a particular path.
You can configure what the end is, in my case the end is people who converted. Now I can quite literally follow the path to every conversion. I can find the biggest pools of customers who share a behavior and go back and optimize my campaign strategy, my content strategy and indeed my overall digital strategy.
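The data behind a sequence Sunburst is just a count of how many visits share each path prefix. Here is a minimal Python sketch of that aggregation, assuming each visit is available as an ordered list of page names; the visits and the "convert" end marker are hypothetical.

```python
from collections import Counter


def prefix_counts(paths):
    """Count how many visits pass through each path prefix.

    Each prefix is one arc of the sunburst: the first page is the inner
    ring, the second page the next ring out, and so on.
    """
    counts = Counter()
    for path in paths:
        for depth in range(1, len(path) + 1):
            counts[tuple(path[:depth])] += 1
    return counts


# Hypothetical visits; "convert" marks the end state we care about.
visits = [
    ["home", "product", "convert"],
    ["home", "product", "convert"],
    ["home", "search", "product", "convert"],
    ["blog", "product"],
    ["home", "product"],
]
arcs = prefix_counts(visits)

# Biggest pools of visitors who share the same complete path to conversion.
converting = Counter(tuple(p) for p in visits if p[-1] == "convert")
print(arcs[("home",)], arcs[("home", "product")])
print(converting.most_common(1))
```

The arc sizes come from `arcs`; filtering on the end state, as `converting` does, is what lets you follow the path to every conversion.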
I've used Sunbursts to do the same with keyword portfolios. No better way to optimize for all of search behavior, rather than the absolutely silly obsession with a few keywords (it is fatal when applied to single session conversion scenarios!).
The Sunburst visual of our oil consumption is nice. But you can see how much more powerful Sunbursts can be. Learn how to use them from this tutorial, which is linked off my most beloved data visualization source d3js.org.
One last nice visual from our friends at Stats Monkey. This time around using a Packedcircle…
Pretty mesmerizing, right? It does serve practical value as well.
You can visualize the sizes a bit better. You have the ability to still get the details when you hover your mouse.
For the Packedcircle they also provide a list of countries in the table, now you can choose the one you want from the right side and go to the one you are most interested in.
You can see the country you choose zoomed in context of the others that are around that same level of consumption.
The Treemap, Sunburst and Packedcircle demonstrate three possible paths you can take to go from a table to something much more understandable and much more interactive. They make understanding data incrementally better, and they make drilling down and exploration much easier than the table does.
You've seen the application of content consumption and keyword analysis using the Sunburst above, and the use of Treemaps by Compete. I was inspired by the above work to apply the Treemap to our day-to-day work, let me share that with you.
There are many ways to create Treemaps online. I used infogr.am to create mine below. They have a free option, you can try it yourself.
This Treemap illustrates the traffic sources and the number of Visitors. It is created using the All Traffic Sources report in Google Analytics, and clicking Source (rather than the default Source/Medium).
You can have a simple table that shows the visitors, but this is so much better in being able to show so much more data, much more easily. It is also so much nicer in being able to illustrate the proportional differences between each source.
As in all cases above, you can hover your mouse and get the specific number of Visitors.
When I'm creating a dashboard for a high level view, I would take the Treemap above and combine it with the one below that illustrates the amount of Goal Value delivered by each source.
I am sure you have noticed that the size of each source in the Goal Value Treemap is different from the Visitors one. This allows for very quick understanding of site performance and the asking of very good questions very, very quickly.
I also want you to appreciate that you can't actually show this in a table. You would sort the table by count of Visitors, in which case some of the rows in the second Treemap would disappear, or you would sort it by Goal Value, in which case some of the ones in the first one would disappear.
You can definitely have two different tables with this data. In my case, and this may vary, it is not as easy to connect the dots (both on proportionality and deltas).
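The comparison the two Treemaps invite, share of Visitors against share of Goal Value for each source, can be sketched in a few lines of Python. The sources and numbers below are hypothetical, not from my Google Analytics account.

```python
def shares(metric):
    """Turn raw values into fractions of the total."""
    total = sum(metric.values())
    return {source: value / total for source, value in metric.items()}


# Hypothetical per-source numbers, for illustration only.
visitors = {"google": 6000, "direct": 2500, "newsletter": 1000, "twitter": 500}
goal_value = {"google": 9000, "direct": 6000, "newsletter": 4500, "twitter": 500}

visitor_share = shares(visitors)
value_share = shares(goal_value)

# Positive delta: the source delivers more value than its traffic share suggests.
for source in sorted(visitors, key=lambda s: value_share[s] - visitor_share[s], reverse=True):
    delta = value_share[source] - visitor_share[source]
    print(f"{source:<12}{visitor_share[source]:>6.0%}{value_share[source]:>6.0%}{delta:>+7.0%}")
```

The delta column is the thing your eyes compute for free when the two Treemaps sit side by side.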
And that, my dear friends, is the power of simple visualization.
Let's look at some more.
This example is about the very sad reality of the Ebola epidemic and the sadder still inaction by governments (like ours). The work of the New York Times team inspired me to do some predictive modeling for inaction in our world of digital marketing.
Ebola is an extremely serious topic, and I do not mean to trivialize it in any way by using it in the context of learning a digital analytics lesson. If this is upsetting to you, I do apologize sincerely in advance.
I found the NYT interactive visualization to be extremely illuminating: How the Speed of Response Defined the Ebola Crisis.
It shows the low and high estimates of infections from this terrible disease. It also allows us to predict what would happen if we delay action, by moving the blue dot on the graph.
Ignore the black line for a moment (it shows the actual reported cases of infection). The graph below shows the predictions of what would have happened if aggressive intervention started in June 2014. The high and low estimates of cases, as you can see below, would have been much, much lower than reality.
Countries with the money, resources and knowledge to deal with an Ebola type epidemic did not come to the rescue of the African countries as fast as you might have expected. Empty words of support and urgency were delivered (along with calls from the ill-informed to shut down flights etc.).
At that time these countries had models to predict the cost in human lives of inaction. Just move the blue dot. Let's say to August.
Approximately 12k additional deaths. Very close to what happened in reality.
Large-scale intervention did not start until August, but thank goodness it did.
It is important to note that the high estimate includes deaths experts believe have been underreported.
At that time we could also have moved the slider further ahead to model out the impact of inaction.
You can see that moving in August did have an impact: the black line, reported cases, is less bad than it could have been. That holds even assuming that 12k is a low number (lots of people don't report the disease). It also does not include the other countries beyond Liberia and Sierra Leone where we know infections and deaths have occurred.
If you want to see scary, move the blue dot to October.
I'm sure that information like this played a key role in getting our government, and likely others, to jump and take action when they did. Thank goodness for predictive models.
If you wonder why sometimes governments move so slowly, among the many other reasons, attribute some of the blame to the power of communication. The data used in the interactive visualization above comes from the Centers for Disease Control and Prevention. This is one of the fourteen tabs in their spreadsheet explaining all this stuff…
You can imagine how difficult it is to communicate what is going on – even after you give them credit for the fact that they are likely rushed and are trying to do a lot more in the spreadsheet. If you have an opportunity to do volunteer work for government agencies, please take the opportunity. Meanwhile, if you want to play with the Ebola dataset, you can download it here: Generic Ebola Response Modeling.
This example got me thinking about applying the spirit of the visualization, without any of the technical resources to create it, to the world of digital.
The inaction that upsets me the most is senior executives in companies brazenly disregarding our recommendations for taking actions based on our web data analysis. It. Makes. Me. Mad!
Here's a simple predictive model (though that might be too pompous of a word to use here) to get them to take action faster. Or at least think a lot harder about not investing the little amount of money to take big action.
The core performance of the current website looks like this…
While we are applying it to a B2B case, it could just as easily be applied to a B2C / Ecommerce scenario.
We've dug deep into the data and we've found some inefficiencies/suckiness in the digital experience. We know the optimization that is required. Some of it is straightforward, in the stop-the-bleeding category, and can be fixed without much thought. A couple of other things, including the lead-gen form itself, we would test to improve performance.
We need to invest a small amount of company time/resource and an additional small cost with our Conversion Optimization Agency.
You present a verbose Word document/email with your recommendations. You wear a Superwoman suit and, as an agency, present black slides with light grey text and deep shades of blue graphs to make the case.
You did not make the whole thing painful enough. People respond to pain. No pain = inaction.
Do this… Create a table with the future Conversion Rate, resulting leads, use the value of each lead to compute incremental value to the company… all extremely straightforward columns…
Then add the last column, the impact of not taking action for three months!
With the first, just stop sucking, we are predicting that the conversion rate can be moved to 2.5%. The cost to get that done is $200k. It sounded scary. Now, the $4 mil in incremental value eases the pain. And the leader, who by the way is smart, can look at the last column and understand the cost of not starting on the project right away!
The second bold conversion rate is what we predict we can get to with just stop sucking and our first two A/B tests on the product overview pages.
The third bold conversion rate is for what we predict after the package of changes, including multi-variate testing on the lead-gen page, will deliver.
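The arithmetic behind each row of such a table is simple enough to sketch in Python. This is a minimal sketch, not the spreadsheet itself: the annual visits and current conversion rate are hypothetical placeholders (so the output will not match the $4 mil figure), and it assumes value accrues evenly through the year, so a three-month delay forfeits a quarter of the incremental annual value.

```python
def cost_of_inaction(annual_visits, current_rate, predicted_rate,
                     value_per_lead, delay_months=3):
    """Return (incremental annual value, value forgone by delaying).

    Assumes traffic and lead value are spread evenly over the year, so a
    delay of N months forfeits N/12 of the incremental annual value.
    """
    extra_leads = annual_visits * (predicted_rate - current_rate)
    incremental_value = extra_leads * value_per_lead
    forgone = incremental_value * delay_months / 12
    return incremental_value, forgone


# annual_visits and current_rate are hypothetical placeholders; the 2.5%
# target and the $238 lead value are the figures quoted in the post.
value, forgone = cost_of_inaction(
    annual_visits=1_200_000,
    current_rate=0.015,
    predicted_rate=0.025,
    value_per_lead=238,
)
print(f"Incremental annual value: ${value:,.0f}")
print(f"Cost of waiting three months: ${forgone:,.0f}")
```

The second number is the pain column. Everything that feeds it is an assumption you can argue about around the table, which is the point.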
With this simple predictive model the need for your Superwoman costume and black slides is reduced. Everyone can come together around a small table and discuss assumptions that went into creating it, argue about which changes to start first, and who to assign various parts of the project.
The big challenge in creating this model rests on your ability to compute the business impact of the changes you are recommending. We as an ecosystem are not very good at this. But you can see how incredibly valuable it is.
You can make small improvements to make the table better (remember, tables with lots of numbers don't work as well as you might have assumed).
Highlight the rows, click on Conditional Formatting in Excel and choose Data Bars. I choose red to imply red ink in our accounting system from not doing what we are proposing!
You know where your eyes are going to go. : )
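If your numbers live outside Excel, the Data Bars idea is easy to approximate: draw a bar next to each number, scaled so the biggest value fills the available width. A minimal text-only Python sketch, with hypothetical row labels and cost-of-delay figures (this imitates the effect, it is not Excel's feature):

```python
def data_bars(rows, width=30, mark="#"):
    """Render one text bar per row, scaled so the largest value fills `width`."""
    largest = max(value for _, value in rows)
    lines = []
    for label, value in rows:
        bar = mark * round(width * value / largest)
        lines.append(f"{label:<22}{value:>12,}  {bar}")
    return lines


# Hypothetical cost-of-delay figures, for illustration only.
cost_of_delay = [
    ("Just stop sucking", 700_000),
    ("+ product page tests", 1_100_000),
    ("+ lead-gen form MVT", 1_500_000),
]
for line in data_bars(cost_of_delay):
    print(line)
```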
You can play with the formatting options to get the one you like the most. I'm partial to using the Color Scales in the Conditional Formatting section.
When we apply that, change the font color a smidgen, this is the resulting impact… a small improvement…
Once you have the initial predictive model, you can start to play with other scenarios and model them out as well.
For example, what would happen if we focused not only on improving the conversion rate through the just stop sucking fixes, the product pages and the lead-gen pages, but also on increasing the value of each lead inquiry?
What if it moved from $238 currently to $275? This…
Even stronger sense of urgency around action, and hopefully an even higher willingness on behalf of the company to pay the Agency more, incentivize the internal teams with bonuses to take action even faster because actual company profits are on the line so clearly!
Since you have the model, one more thing you could add… the cost of delaying action for six months…
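Once the model exists, the what-ifs are just a loop over its inputs. A minimal Python sketch of that scenario grid, using the $238 and $275 lead values from above and three- and six-month delays; the visits and conversion uplift are hypothetical placeholders, and value is again assumed to accrue evenly through the year.

```python
from itertools import product

# Hypothetical placeholders: yearly visits and the conversion rate uplift.
ANNUAL_VISITS = 1_200_000
RATE_UPLIFT = 0.01  # e.g. 1.5% -> 2.5%


def forgone_value(value_per_lead, delay_months):
    """Value lost by waiting, assuming value accrues evenly through the year."""
    incremental_annual = ANNUAL_VISITS * RATE_UPLIFT * value_per_lead
    return incremental_annual * delay_months / 12


# Lead values from the post ($238 today, $275 in the what-if), two delays.
scenarios = {
    (lead, months): forgone_value(lead, months)
    for lead, months in product((238, 275), (3, 6))
}
for (lead, months), lost in scenarios.items():
    print(f"${lead} per lead, {months}-month delay: ${lost:,.0f} forgone")
```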
Can you imagine anyone in your company saying no to your recommendations based on your data analysis? I honestly can't imagine even the biggest HiPPO saying no.
The onus is on you though to first compute business impact and wrap it into a predictive model.
[All the computations above are quite straightforward, but if you would like to have a copy of the above spreadsheet just send me an email.]
I'm giving the punchline away with my section titles, but stick with me. Pretend you did not read it.
Here is a lovely straightforward visualization. Worldwide Smartphone Sales by Operating System, in thousands. Nothing complicated here. BlackBerry is close to not much, even if the red line seems to be moving up (one challenge with this type of graph). So sad about Symbian.
The raw numbers used for sales make the above graph less insightful. The total number of smart phones has exploded to such a degree that the decimation of other platforms, and their sad lost opportunity even with an early start, is hard to see. All you can see is that Android is big, iOS is doing wonderfully.
The fix is not that difficult though. The creators provide a lovely option called Expanded, click, boom!
Better, much better at seeing the trends.
Not only can you see more clearly how Android and iOS are doing, the massive scale of Nokia's missed opportunity is also more clear. Ditto for BlackBerry. Windows Phone, an early starter, is also visible now.
Additionally, you see that Android seems to be going through its almost predictable dip every x months at this time with iOS predictably taking that share.
Each graph has its purpose, in my words above you can see the kind of insights that I was looking for. That drives the graph I find most useful.
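The switch from raw numbers to the share view is a one-step normalization: divide each platform's number by that period's total. A minimal Python sketch, with hypothetical platforms and unit sales rather than the real smartphone data:

```python
def to_shares(series_by_platform):
    """Convert raw per-period sales into each platform's share of that period."""
    periods = len(next(iter(series_by_platform.values())))
    totals = [sum(series[i] for series in series_by_platform.values())
              for i in range(periods)]
    return {
        platform: [value / total for value, total in zip(series, totals)]
        for platform, series in series_by_platform.items()
    }


# Hypothetical unit sales (thousands) for three periods, not the real data.
sales = {
    "Platform A": [10, 40, 150],
    "Platform B": [20, 30, 45],
    "Platform C": [70, 30, 5],
}
expanded = to_shares(sales)
for platform, series in expanded.items():
    print(platform, [f"{share:.0%}" for share in series])
```

Note what happens to Platform B: its raw sales grow every period, yet its share barely moves and then falls. That is the kind of lost opportunity the raw graph hides.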
While the above graph was pretty lovely, it was the third option that I loved the most. Stream…
Now isn't that awesome?
I love the combination of two insights, one related to the raw growth of the entire space (so much better than the Stacked option above) and the evolving shares of each player (incrementally better than the Expanded option above).
You also have the capability of hovering your mouse at any time period and getting the actual numbers, should you need them.
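What separates a stream from a plain stacked graph is the baseline: the layers are stacked around a moving center instead of sitting on the x-axis. Which baseline the chart above uses is not stated; the Python sketch below assumes the centered "silhouette" baseline, one common choice, and uses hypothetical values.

```python
def silhouette_layers(series):
    """Stack series around a centered baseline (the "silhouette" offset).

    Returns one list of (lower, upper) boundaries per series. At each time
    step the baseline is minus half the total, so the stream is symmetric
    around zero instead of sitting on the x-axis.
    """
    steps = len(series[0])
    layers = [[] for _ in series]
    for t in range(steps):
        lower = -sum(s[t] for s in series) / 2
        for i, s in enumerate(series):
            upper = lower + s[t]
            layers[i].append((lower, upper))
            lower = upper
    return layers


# Hypothetical values for three series over four time steps.
data = [
    [1, 2, 4, 8],
    [3, 3, 2, 1],
    [2, 1, 2, 1],
]
bands = silhouette_layers(data)
print(bands[0])   # lowest band
print(bands[-1])  # top band
```

The overall thickness of the stream still shows the raw growth of the space, while each band's thickness shows one player, which is the two-insights-in-one effect.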
The above example inspired me to share with you my use of Streamgraph as a visualization.
This example illustrates the presence of key concepts in my social activity on Twitter…
Rather than looking at a flat table, or lines or some such ungodly thing, I can see the concepts that become more or less important to me over time. For an analytics person my interest in Data seems to have a real ebb and flow. And why did I not care at all end of June and July? What caused me to really care in early Nov? All great questions, from a simple visualization.
I can of course focus in on just a certain time period…
Or obsess about just one of the concepts and follow its journey over time.
In the streams are also concepts that became very important at a certain period of time and then died, some came back again later. The Streamgraph gives me a great ability to truly explore lots of data in a visual that literally fits my small laptop screen. #awesomeness
If you are interested in exploring simple Streamgraph examples, our lovely friends at Microsoft Research have made it easy. Please visit their site on data visualization apps for Office, and download the app. They also have an app for Treemaps, and it includes using color, as in the Compete case, to represent a metric (say Conversion Rate)!
Another excellent resource is the Google Chart Gallery. Trust me, you'll be impressed with what you can do very quickly. Candlesticks, Scatter Plots, and Sankey charts! Don't forget the Sankey!!
In the next two stories, I want to share three examples of visuals that made me think. I'm hoping to spark some ideas in your head as to how they might inspire you to do something different. Please share your ideas via comments and help all of us learn from you. In the last story, we'll go through another visualization exercise and end with a bang.
This example is an interactive visualization on Luxury and Foreign Travelers in Rome.
On the x-axis you see the hotel stars (1-5) where the tourists stayed, and on the y-axis you see the distance of their home country from Rome. It is pretty nice.
It is easy to see the outliers, countries that are bunched together, and wonder if you need to find a job in Iceland as they can afford four-star hotels much more than others!
You can hover the mouse over any country and get a bit more detail about why they hold the position they do along the trend line.
What is cool about this is that we can switch the dimensions we are looking at quite easily to arrive at a more sophisticated understanding of what is going on.
Let's look at Length of Stay and Income Per Capita. Do people from richer countries stay longer in Rome?
Korea (blue, red yin-yang flag on the left) has an income per capita of $31,000 per year, but people stay a lot less than the wonderful Russians who make a lot less. And the Koreans stay for a shorter duration.
You can find the Greeks and the Swiss very close together on the chart now. You can see implications on marketing to individual countries if you are the Rome Tourism Board.
I like the option to slice by inequality index rating of the country (GINI rating).
It is a little surprising at first glance that countries with very high inequality move to the right. China, South Africa, Egypt. Or, maybe it is not surprising (the outliers from those countries would stay at nicer hotels I suppose).
I like the last option the best. How many tourists?
If you were the Rome Tourism Board, now all the sexy fun starts! You can understand your tourists better, you can make more precise decisions about your media spend, the color of the red carpets you might want to roll out, the languages you need to teach translators or wait staff at hotels. And so much more.
You can possibly send this type of a self-contained dataset to the VPs and the C-Suite of a company. But it is perhaps best created for the Directors and people who are making quarterly, six-monthly horizon decisions. It is not great for day-to-day tactical analysts/optimizers/marketers.
You can see its power for executives under the C-Suite to slice and dice, bring their own business knowledge and context and help make more informed decisions.
So. If you could create this type of an environment for your digital existence, what dimensions would you use? What would you like to plot? If you were an SEO, or ran all Ecommerce for your company, or were responsible for consumer experience?
My initial thought was to have Channels as the plotted dimension (where you see Country). The metrics to switch between would be Average Order Value, Assisted Conversions, Bounce Rates and % New Visits. The last choice (How many tourists) of course would be Unique Visitors (for the same reason).
What do you think? What would you do?
Another example from my beloved New York Times. This time it was about the elderly, and the challenges of people who live longer in bad shape. Bracing for the Falls of an Aging Nation.
[Regardless of your age, this is extremely well worth reading – for the benefit of your parents today and yourself in the future.]
There was something mesmerizing about this graph that was in the article…
I admit it took a few minutes to figure out what was going on. The non-normal placement of the starting point of the graph might have been it. Or the shades in the legend. But it did take a few minutes.
I came to like it very much.
It shows some obvious things. The older you are, the more likely it is that you will have more falls and visits to ER. The fact that we had this step change suddenly in 7+ in 2008 gives you pause. Then you can see that what likely happened is that the 6 became 7+. What happened? (The article shares some ideas.)
Overall everything is getting worse and, sadly, these incidents are happening earlier and earlier.
I do like this as a nice stacked bar graph that includes a very relevant segmentation, broken into small squares for a nice effect.
If you had to use something like this at work, what would you plot?
I think people don't worry about multi-session conversion enough, they don't obsess about people enough. It is so silly but they are still obsessed about single session conversions! Makes me mad. [See: Multi-Channel Attribution Modeling] Hence, I would plot calendar quarters on the top, channels on the y-axis (not a number), and assisted conversions as the squares for each channel.
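A rough text-only Python sketch of that idea: one row per channel, one column per quarter, and one square per fixed number of assisted conversions. The channels, counts and the 50-conversions-per-square unit are all hypothetical.

```python
def square_rows(conversions_by_channel, per_square=50, mark="#"):
    """One cell of squares per channel and quarter; each square = `per_square` conversions."""
    rows = {}
    for channel, by_quarter in conversions_by_channel.items():
        rows[channel] = [mark * round(count / per_square) for count in by_quarter]
    return rows


# Hypothetical assisted conversions per quarter, for illustration only.
assisted = {
    "Organic search": [200, 250, 300, 400],
    "Email": [100, 100, 150, 150],
    "Display": [50, 100, 50, 150],
}
chart = square_rows(assisted)

quarters = ["Q1", "Q2", "Q3", "Q4"]
print(" " * 16 + "".join(f"{q:<10}" for q in quarters))
for channel, cells in chart.items():
    print(f"{channel:<16}" + "".join(f"{cell:<10}" for cell in cells))
```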
What do you think? Good idea? Crazy? What would you do?
Let's close on a let me inspire you to really do this so that you will rock a lot note.
We started with a table, global oil consumption, let's end with a table too. The magnificent Information is Beautiful site recently shared this lovely infographic: What is the world's biggest cash crop?
The right answer to any such question is…. It depends. Always, it depends.
The first visual shared on the site ran off this table.
[Among many other things, one thing I deeply appreciate about the site is that they almost always share their data in a Google Docs file. This means that I, and you, can always download new real world datasets to play with to perfect our own skills.]
Experts that they are, this is their wonderful visualization of the above data…
Depending on your definition of success (planted, yield, production or revenue), your answer might be different. Most people might be surprised that cannabis/marijuana is the grand champion (astonishing considering it is barely visible in the first three columns!). Poor cocaine, so roundly defeated! And good old rice, not very far behind. Go, rice, go!
So many interesting things going on here, look at Sugar Cane.
But that is not why I wanted to close with this story.
The point I wanted to make is that when we see the work of such amazing artists like David McCandless and his team, it might seem like they are working at an unattainable level. Ok, yes they are. But while you and I, normal people, can't get to their level, we can do more than we might otherwise imagine.
We should try. We should find inspiration from them and we should see what we can do with the tools we have access to.
In the above case, thanks to the table quickly dumped into Excel, we can use the same strategy of using Conditional Formatting to create our version of the above graphic.
Not too bad for ten minutes worth of work. We even made the column titles smaller! :)
We can experiment with different options at our disposal. This is purely a matter of taste, but I thought the Data Bars lead to a better visual. The green intensity drags our eyes to the handful of cells we need to look at first. Wheat. Sugar cane. Marijuana!
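The green intensity that pulls the eye is just a mapping from each cell's value to a shade: small values stay near white, the largest gets the deepest green. A minimal Python sketch of such a linear scale, assuming a white-to-green ramp of my own choosing (not Excel's exact palette) and hypothetical crop values:

```python
def green_scale(value, low, high):
    """Linearly map value in [low, high] to a hex color from white to green."""
    fraction = 0.0 if high == low else (value - low) / (high - low)
    fraction = min(1.0, max(0.0, fraction))
    # White is (255, 255, 255); the darkest green used here is (0, 128, 0).
    red = round(255 * (1 - fraction))
    green = round(255 - (255 - 128) * fraction)
    blue = round(255 * (1 - fraction))
    return f"#{red:02x}{green:02x}{blue:02x}"


# Hypothetical values for one column of the table, for illustration only.
column = {"Crop A": 5, "Crop B": 40, "Crop C": 100}
low, high = min(column.values()), max(column.values())
colors = {crop: green_scale(value, low, high) for crop, value in column.items()}
for crop, color in colors.items():
    print(f"{crop}: {column[crop]:>4} -> {color}")
```

Apply the scale per column rather than across the whole table, otherwise one huge column washes out all the others.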
All of the other numbers are there, but they are invisible, even as you can see them!
I see you are complaining that you don't like my table with all the white space and lovely row sizes. And you don't like black font. And you want it all to fit in one page. Happy birthday!
David and his team actually took this a couple notches higher and created a nice bubble chart with Revenue. This is the end-state, and it is so nice…
It is harder for us to create this with our normal tools or do it quickly. But you and I can do a lot more than we might believe. Yes the infographic will be shared a lot more in social channels. For us though, the table might work just fine.
I encourage you to go the extra mile, not give in to the default outputs in our digital analytics solutions, to find inspiration outside our space, as I did in all examples above, and get better every day at communicating our ideas more effectively.
We've moved beyond our obsession of data capture, escaped the time suck of data reporting, we are getting better every day at data analysis. This, data visualization, is our last frontier. The last thing between us and the glorious glory to be achieved by driving intelligent, fast action based on our insights.
As always, it is your turn now.
First do please share with all of us your ideas for what you would do with examples four and five. And then… How much time do you, or your team, spend on data visualization on a day-to-day basis? Does your company allow you to have the time to think and try various techniques? What are some of your favourite data visualizations for digital marketing and analytics? Are there resources you use to learn that you would like to share with all of us?
I would love to hear your ideas, critique, life lessons, specific tips and inspiration from this post.