A Big Data Imperative: Driving Big Action

Is there anything in the analytics space that is more full of promise and hype and sexiness and possible awesomeness than "big data"? I don't think so.

So what is big data really? No one quite knows.

As I interpret it, big data is the collection of massive databases of structured and unstructured data. The data sources include traditional (now considered puny) sources like corporate ERP/CRM systems and non-traditional (massive) sources like every technical ping from every human or mechanical sensor, all web behavior by everyone across the entire Internet, increasingly digital data from analog sources like hospitals or the atmosphere, and (good lord!) our collective tweeted wisdom.

That is a lot, right?

Because so much of the big data talk is focused on the promise of zettabytes of data, big data also tends to be about massively parallel computing, fantastic storage systems, the "cloud," Hadoop and MapReduce and other such deeply technical delights.

That explains why so much of big data talk comes from Oracle, IBM, Microsoft, SAP and other vendors. And not so much from practitioners, yet.

I believe in the promise of big data and the awesomeness of the insights that can come from it. But that should not come as a surprise. All the way back in 2007, I was evangelizing the value of moving away from the "small data" world of clickstream data to the "bigger data" world of using multiple data sources to make smarter decisions on the web. Clickstream + qualitative data + rigorous statistical analysis of outcomes + deep mining of data from competitive intelligence sources + rapid experiments + more.

Here's the "bigger web analytics data" picture from 2007… Multiplicity!

[Image: multiplicity, web analytics]

The big data we are dealing with today puts the 2007 picture to shame. We have even more types of data, becoming ever more complex, distributed across multiple existences, and we are left with the task of parsing out terabytes of noise to get to a megabyte of signal.

That last part is what I love to focus on, what I worry about, what I think everyone should focus on. It is great that we have big data. It is greater that we have such amazing promise in that big data. It is sucky that almost no one knows what to do with it in the context of driving actual business value.

Hence my interest in big data is not about the zettabytes or Hadoop or unstructured variables or one of the n technical things that seem to dominate big data conversations.

My interest is deeply and passionately rooted in trying to figure out how to ride big data all the way to the bank (or world peace). How to find insights? How to structure organizations that will use this data to ensure that they get timely value from it? How to drive action? How to find frameworks that force a different type of thinking so we don't make the mistakes we so brilliantly have made in the world of small data?

If we don't answer all those hows, big data will be a big disappointment.

Avoiding big disappointment and the hows were on my mind as I prepared my keynote for the Strata 2012 Big Data conference. My goal was to take my TED-ish 15-minute timeslot to present my perspective on why driving big action was the big imperative for big data.

It was an incredible challenge, thanks to Strata co-chairs Edd Dumbill and Alistair Croll. In this post, I want to share the result with you.

I'd structured my keynote into three big pieces:

00:00 – 01:15 Intro. My new favorite data quote by Zack Matere, a Kenyan farmer.

01:15 – 04:05 Part 1. The current flawed data org structure, its challenges, and the new optimal org structure to truly bring big action to big data.

04:05 – 06:20 Part 2. A framework, inspired by Donald Rumsfeld, for big data vendors to think about when creating solutions and the unique space in which big data analysts should actually play (only the "unknown unknowns!").

06:20 – 10:25 Part 3A. My first, tactical, example: How to auto magically solve the problem of having millions of rows of data, and not knowing how to find the 15 valuable rows that could have a huge business impact. Leveraging interestingness!

10:25 – 15:00 Part 3B. My second, strategic, example: Leveraging predict, mine, correlate to shift away from data puking to, even more auto magically, find trends in the data that truly are the unknown unknowns and identify causal factors for those trends so that we can move from data to action at light speed.

Here's the keynote…

[You can also watch this video on YouTube. You're also welcome to Like, Share, Tweet, Facebook, +1 it on YouTube as well.]

It is not my hope to encourage you to copy/paste the strategy outlined, or to use the tools shown.

My hope is to simply inspire you to think a little differently about organization design, share a framework to influence the focus of your analysis, and find the types of practical solutions that will really spark profitability from all this big data.

I welcome your feedback and thoughts on the video and the solutions via comments. Please also share your experience with big data. Any big or small success you've had would be inspiring to all of us.

Preparing for my keynote also got me thinking about all the implications of big data and my own longish career in trying to create superb decision support systems. The database has moved from my floppy disk (true story) to an infinite storage cloud, yet, amazingly, some of the biggest challenges have remained the same.

So big data revolutionaries…

Six Rules That Should Govern Your Big Data Existence.

Here are some rules from my experience in the small data world that I've come to believe also apply to the big data world, perhaps even more so. As you go about your big data journey you'll meet with even more immense success if you consider these valuable life lessons:

1. Don't buy the hype of big data and throw millions of dollars away. But don't stand still.

Take 15% of your decision making budget and give it to one really, really smart person (Ninja! OK, Data Scientist) and give that person the freedom to experiment in the cloud with big data possibilities for your company.

It is cheap. You can do dirty data warehousing pretty darn fast. You can find all the ugly warts and problems. You can be much smarter when you start to mainstream big data into your company, while preserving the data awesomeness that already exists in your company.

Structure your big data efforts, at least initially, to fail faster while failing forward. Don't build the biggest, baddest big data environment over 32 months, only to realize it was your biggest, baddest mistake.

2. Big thinking about what big data should be solving for is supremely important.

I can't think of any other time in our lives where we could literally swim endlessly in an ocean of data, without having anything to show for it. Big data is that world. If you don't know where you are going, you will get there and you'll be miserable (if your company has not fired you already, in which case you'll be miserable and sad).

I've championed the need to leverage frameworks like the Digital Marketing & Measurement Model, in the web context, to ensure that the analysis we do is deeply and powerfully grounded in what's important to the business. You have to have that one page, even if it is roughly defined by your Sr. Management. Have something.

If your management refuses, or is not visionary enough to provide you with even basic starting points, then build one by yourself. All it takes is a little business analysis. Here's my post: Five Steps to Finding a Purpose for your Analysis.

When you have access to all this data, the answers you find will be surprising, the insights you deliver will be brilliant, and your impact on the business will be huge. But that can only happen if there is a model that defines the purpose of your sweet big data adventures.

3. The 10/90 rule for magnificent data success still holds true.

For every $100 you have available to invest in making smart decisions, invest $10 in tools and vendor services, and invest $90 in big brains (aka people, aka analysis ninjas, aka you!).

I will admit that Oracle and IBM and SAS and solid state drives are very expensive. Nine times that to invest in big brains might seem egregious. Perhaps it is. Let the 10/90 rule be an inspiration to simply over-invest (way over-invest) in people, because without that investment big data will absolutely, positively, be a big disappointment for your company.

Computers and artificial intelligence are simply not there yet. Hence your BFF is natural intelligence. :)

4. Shoot for right time data, not real time data.

Real time data is almost insane to shoot for because even for the smallest decisions, you'll have to do a lot of analysis first (5 hours), then present it to your superior (1 hour), who will add two bullet items and send it to a team of people (20 hours), who will in turn argue about priorities and how much the data is wrong (16 days), but ultimately come to an agreement because the deadline to make the decision passed 7 days ago (20 seconds), and send the data to the big boss who'll read just the first part of the executive summary (3 days), and decide that the data is telling her something counter to what she has always known works, and she'll make a decision based on her gut feel (5 seconds), and some action will be taken (14 days).

Total up those numbers. Was the real time data of any real value?

Ok so that is way over the top. But every company has a complex decision making structure that is time consuming and therefore unable to react in real time. If you can't react in real time, why do you need real time data?
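If you would rather not total up those numbers by hand, here is a minimal sketch that does it. The stage labels are my own shorthand for the steps in the scenario above (an assumption, not official names); the durations are exactly the ones listed.

```python
from datetime import timedelta

# Durations copied from the scenario above; the stage labels are
# illustrative shorthand, not terms from any real process.
stages = [
    ("do the analysis", timedelta(hours=5)),
    ("present to superior", timedelta(hours=1)),
    ("superior adds bullets and forwards", timedelta(hours=20)),
    ("team argues about priorities", timedelta(days=16)),
    ("team reaches agreement", timedelta(seconds=20)),
    ("big boss reads the summary", timedelta(days=3)),
    ("gut feel decision", timedelta(seconds=5)),
    ("action is taken", timedelta(days=14)),
]

# sum() needs a timedelta start value, since the default start is the int 0.
total = sum((duration for _, duration in stages), timedelta())

print(total)  # 34 days, 2:00:25
```

A little over 34 days from data to action. Against that, shaving the data latency from hours to seconds changes almost nothing.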

Understand when is the right time for data in your organization. Shoot for systems and processes that match delivery of data (better still, insights) to that time frame. You'll have less stress. You'll focus on big, important, strategic things (real time data is really good at driving the best companies to do tactical silly things). And you'll save a lot of money, because real time everything is really expensive!

Here's one way to check if you really need real time data: Does a human have to be involved from data receipt to taking action? If the answer is yes, then you don't need real time data, you need right time data. If the answer is no (say you have intelligence/rules driven automated systems), then you need real time data.
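That check is simple enough to write down as a one-line rule. The sketch below is only an illustration; the function name `data_cadence` and its parameter are hypothetical names I made up for this example.

```python
def data_cadence(human_in_the_loop: bool) -> str:
    """Apply the check above: a human between data receipt and action
    means right time data; a fully automated path means real time data."""
    return "right time" if human_in_the_loop else "real time"


# A weekly merchandising review has people in the loop.
print(data_cadence(human_in_the_loop=True))   # right time

# A rules-driven automated system does not.
print(data_cadence(human_in_the_loop=False))  # real time
```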

5. "Data quality sucks, just get over it."

That is the title of my post from June 2006. And look how far we've come. :)

The core thrust of my post was that data on the web will never get to 95% clean and it will have big holes and it will be sparse in some areas. We should aim to collect, process and store data as cleanly as humanly possible, but after that we should move on to using the data, because we will still have more data about the web than what God's blessed any other channel with. Let's not become the type of people who continue to waste time on quality beyond the point of diminishing returns. Let's not become persistent javascript hackers and sprop variable tweakers at the cost of delivering value from data now.

Multiply all of that a million times when it comes to big data. We will have dirty data. We will have no idea what to do with videos or spoken text or (omg!) social media overload. We will be missing primary keys. We will suffer from a lack of clean meta data (or sometimes any meta data!). We will realize the shallow limits of sentiment analysis. We will cry from the pain of the painful business process fixes that usually result in good data.

And yet, we are standing on a mountain of gold.

Do the best you can in terms of collecting, processing, and storing data of the cleanest possible quality. Know when to shift to data analysis. Start making decisions. Make small ones at first. (Remember, even they will be revolutionary, as these datasets have never come together!) Make bigger ones over time, as you understand the limitations of what you are dealing with.

Here's the kiss of death: Big data implementation projects where the first touch of an Analyst will come 18 months after the project was first conceived. You see, the world would have changed so dramatically in 18 months that nothing you possibly spec'ed for is relevant any more.

Think smart. Move fast. Slowly become Godlike over time.

6. Eliminating noise is even more important than finding a signal.

This might be a little controversial. But stay with me.

Thus far in the history of data analysis the objective for our queries has been trying to find the signal amongst all the noise in the data. That has worked very well. We had clean business questions. The data size was smaller and the data set was more complete and we often knew what we were looking for. Known knowns and known unknowns. (See video above.)

With big data, it is so much more important to be magnificent at knowing what to ignore. You must know how to separate out all the noise in the disparate huge datasets to even have a fighting chance to start to look for the signal.

It is amazing but true. If you are not magnificent at knowing what to ignore, you'll never get a chance to pay attention to the stuff to which you should be paying attention.

Your business savvy. Your analytical gut instinct. Tuning your algorithms to first ignore and then hunt for insights. That is what will have a material impact.
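To make "first ignore and then hunt" a little more concrete, here is a rough sketch of one way it could look. It is not the method from the keynote and not how any particular tool works. The `Row` type, the `ignore_then_hunt` function, the `min_visits` threshold, and the scoring rule are all assumptions of mine, and the rows are made-up illustrative data.

```python
from dataclasses import dataclass


@dataclass
class Row:
    label: str
    visits: int
    bounce_rate: float  # fraction between 0 and 1


def ignore_then_hunt(rows, min_visits, top_n):
    """Drop low-volume rows first, then rank what is left by how much
    each row's volume times its deviation from the baseline stands out."""
    # Step 1: ignore. Rows with tiny volume produce extreme rates by chance.
    kept = [r for r in rows if r.visits >= min_visits]
    if not kept:
        return []

    # Baseline: volume-weighted average bounce rate of the kept rows.
    total_visits = sum(r.visits for r in kept)
    baseline = sum(r.visits * r.bounce_rate for r in kept) / total_visits

    # Step 2: hunt. Score = visits * absolute deviation from the baseline.
    ranked = sorted(
        kept,
        key=lambda r: r.visits * abs(r.bounce_rate - baseline),
        reverse=True,
    )
    return ranked[:top_n]


# Made-up rows for illustration only.
rows = [
    Row("a", 3, 1.00),      # extreme rate, but only 3 visits: noise
    Row("b", 5000, 0.40),
    Row("c", 4000, 0.85),
    Row("d", 12, 0.00),     # extreme rate, but only 12 visits: noise
    Row("e", 6000, 0.42),
]

top = ignore_then_hunt(rows, min_visits=100, top_n=2)
print([r.label for r in top])  # ['c', 'e']
```

A plain sort by bounce rate would put rows "a" and "d" at the top and bottom of the list. Ignoring them first is what lets row "c" surface.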

Six simple rules for you revolutionaries to follow to ensure, well, revolutionary success.

Notice, none of them have to do with hardware or Hadoop. One important reason is that I'm solving for the CEO and not the CIO/CTO, so it is a matter of perspective. The second (main) reason is that we do face some big data technology challenges for now, but the things that will determine if big data will deliver big value have nothing to do with technology. They have to do with the six rules above.

If you are really thinking big data value, think CEO and not CIO/CTO. It will dramatically change the focus of your work, in a good way.

As always, it's your turn now.

Did you find the keynote to be of value? Did you find the framework to be of value? Will it drive you to change your approach to big data? With regards to the rules above … is there one rule above that is your favorite? Is there one that should have been there but is missing? What is the biggest big data advice you would share from your experience?

Please share your wisdom, recommendations, and feedback via comments.

Thank you.


  1. 1

    As always, fantastic post. I had a similar discussion with Addison Snell from Intersect360 about some of these issues last year at a trade show.

    As a tech geek, I very much get absorbed into all of the hardware and software application details when it comes to big data. It's easy to get lost in all of that.

    As a marketer, I have to remember to look for the value, the WHY and HOW this information is useful.

    But sometimes it's difficult to stay focused on that when you realize just how amazing it is that we have that data in the first place. It's very awe-inspiring.

    Thank you for taking the time to share!

  2. 2

    The area of unknown unknowns is ambiguous. Pulling in big data sets often requires you to make assumptions and smooth over inconsistencies between different data.

    It seems likely that the more data you pull in, the more noise there may be and, as a result, there may be more chance for folks to misinterpret the data or interpret it differently to support opposing viewpoints.

    That said, do you feel that big data can more easily lead folks astray than smaller, more contiguous data sets? How do you control for the differences?

    • 3

      Josh: Let me try to tease out some of the threads in your valuable comment…

      First, any data can lead the unaware astray. I don't think that is exclusive to big or small data. Suboptimal thinking in, suboptimal results out. :)

      Second, you are right that significantly higher data literacy will be required in our Marketing, Sales, Finance, HR peers, and in us Analysts. (I'd mentioned at the start of the video why that is so important.) Many organizations are not there yet. The time to carpe diem and start to evolve is now!

      Finally, connecting more data sets will lead to additional connectivity and interpretation challenges, but without them there is no juicy fruit to eat, no magnificent progress to be made. We can't, and likely won't, stay with a small data set or just one source because that likely means we stay stuck making small and perhaps incomplete decisions.

      Thank you as always for sharing your feedback, I appreciate that so very much.


  3. 4

    That is a great post and presentation Avinash. It reminds me of what Peter Fader said in an interview:

    "There are companies that just feel: if we throw enough money at it, if we hire enough smart people they will figure out what to do with the data. No, that's not the way to go."

  4. 5
    Craig Burgess says


    Great post! I especially liked two paragraphs:

    1) "Let's not become the type of people who continue to waste time on quality beyond the point of diminishing returns. Let's not become persistent javascript hackers and sprop variable tweakers at the cost of delivering value from data now."

    This is a constant struggle with so much data available. This adage shares that idea: "You are defined more by what you don't do, than by what you do." And it leads to another insight I loved:

    2) "If you are not magnificent at knowing what to ignore, you'll never get a chance to pay attention to the stuff to which you should be paying attention."

    The issue also lies with what you are charged with doing vs. doing what you know charges you! Does it match up with what your boss(es)/CIO/CEO thinks you should be doing? Often not. This dissonance is like an annoying visual distraction that makes it even tougher to reflect and find those nuggets you KNOW are buried just below the surface!

    Thanks for writing and sharing, I always get good somethings out of it. Now onto the video….


  5. 6
    Tom Kluth says

    You are so right on point #6.

    You must have a framework, and there are wonderful tools that help get away from the noise, from some wonderful Six Sigma process improvement-type tools to statistical modeling tools that calculate relationship values (like Information Values) or correlations (principal component analyses).

    The most important thing is to make the business link between the data-driven insight and the marketing action.

  6. 7

    This is a very serious post and hence I can't resist sharing this picture, from @kimwatkins…

    Starting small with big data!

    Big data is going to be big. It is important to start them small! :)

    I'm not sure if this is optimal for your kids, but there is a message there about it never being too early to start learning.

    Thank you Kim!


  7. 8

    I was working for a catalog company in 2009, and we were 'attempting' to integrate some online data into our marketing database. The six rules mentioned above are all good pointers – especially looking back.

    Omniture offered quite a list of raw data that can be 'dumped' out of their system – but we started with very small/selective data points. We already had 'ways of doing business' per se. Being a catalog company and all, we were particularly interested in how this would affect our match-backs. Having this 'focused' objective allowed the project to move forward – not stuck in a giant spiral of decision making points after points.

    I left the company shortly after the inception of the project, so I'm not sure how all these good intentions played out. However, I felt that it was extremely important that we had good people working on the project who understood not to bite off more than we could even attempt to chew.

    IMO, deciding what/how to utilize the findings will always be a big challenge. Especially at the enterprise level, change isn't something that happens 'overnight'. It can…and in a lot of cases it should…but it just doesn't :)

    • 9

      Visitor: I normally reply to every comment I get on this blog, but you did not leave your real email address. But I publicly wanted to thank you for taking time to share your valuable experience in the comment above.

      It is immensely helpful to hear from people who've been out there and tried to change the world. :)

      Thank you,


  8. 10
    Urvashi Pitre says

    I must admit to being somewhat surprised at all the sudden hype about "Big data" when so many of us have been talking about–and using and driving change with–"big data" for so many years. The difference may well have been that we didn't have a catchy title for it. Saying multichannel data, or integrated data sources may not have been as sexy as "beeg beeg data!"

    IMHO, the whole push needs to continue to be about answering important business questions, driving profitability, and using data to drive decision making. Big data is just the raw material for how that gets done. What should drive changes is not the sudden availability of big data–mainly because it's not that sudden–but rather whether or not people have access to information that helps them make decisions.

    my 2 cents–would love to hear thoughts from others who have been in the data trenches for years.

    • 11

      Urvashi: Having been in the trenches starting with a 0.6 mb database in Access :), working up to a few hundred gigs in Sybase to now crazy amounts in the cloud, I can completely empathize with your perspective.

      It seems like we've been doing this forever.

      I do think that there is a lot of new stuff in "big data." The types of data we deal with. The complexity of analysis. The approaches we take to storing data (and throwing it away). The type of questions we can answer, ones we likely could never answer before.

      All that we have done on the IT and business side gets us well set up to take advantage of this new opportunity, even as our horrible enemies (six outlined in this post) seem to be our BFFs from the old world. :)

      Thanks so much for adding your perspective.


    • 12


      "…when so many of us have been talking about–and using and driving change with–"big data" for so many years…"

      This certainly is true. IMHO though, one of the major implications that is worth noting is the 'time stamp'. From a direct mail perspective, it was extremely difficult to time stamp your campaigns – unless you sent out a little goblin to stalk the mailman!!!

      This particularly poses an issue when 'traditional' marketing data analysis was 'predicting' the likely curve for the campaigns. With online, there is no 'prediction' as campaigns are clearly time stamped (well…in most cases that is :P) – i.e. ESP data, raw data dump from analytic data warehouse, etc.

      While real-time data can be powerful (IMO most of us are not there yet, resource/bandwidth wise, to utilize it), the availability of a time stamp allows a more accurate/realistic curve for our marketing efforts via integration with the 'traditional' marketing database. From a 'multichannel' direct marketing perspective, this has been a major change, IMHO.

  9. 13
    Richard Hren says


    Thanks for the succinct summary of big data issues in addition to an impassioned performance captured on video.

    Two points really resonated with me: right time data and signal/noise issues.

    Real time reactive data use can be very tactical and beneficial but rarely, if ever, does it really provide great insight beyond its operational boundaries. It's easy to pop up the next recommended book or song but your organization doesn't really learn anything from that. Valuable analysis takes a bit more time to create, more time to digest and internalize, and more time to execute against. Let's not confuse business rules or simple pattern matching with analysis.

    And the promise of big data is that there is a wealth of little data meandering around inside struggling to be set free. Most of the extremely granular data that we can now collect has little to no informational value. Being able to remove as much noise as possible is the secret of liberating those nuggets of truth. I sometimes think that we actually have the same amount of important data now as we did years ago, only now we have surrounded it with more junk.

    Great post, fun video.



  10. 14

    How can you get the weighted sort turned on? I have read that it's not active on all GA accounts.

    I tried to set up your report from the video and the weighted sort option does not come on. Even after clicking the bounce rate sort.

    Also, I went to help in GA and did what it said there and it still does not show up.

    Is my data set too small?

    • 15

      Jeff: Weighted Sort (like exporting to pdf and various other things) is only available in Google Analytics v4 for now. The team at Google is slowly releasing everything into v5 so should be there in the near future.

      If you want to play with the feature just click on the link called "Old Version" in the header.

      Here's a blog post that outlines how to use the feature in case you wanted to learn more:

      ~ End of Dumb Tables in Web Analytics! Hello: Weighted Sort


      • 16
        Cristiano Siqueira says

        Thanks Avinash!

        I'm using Weighted Sort in my analysis. I'm following your advice about Big Data!

        Congratulations for the post!

  11. 17

    #6 hits home – as this growing sea of data (maybe universe of data?) continues to keep growing, we have to be able to focus on the important segments and then zero in on the elusive signal.

    Great post as always Avinash!

  12. 18

    In part 1 of the video you make a good case for addressing the org structure prior to embarking on the big data journey. I humbly propose a Rule 0:

    0. Ensure that everyone is empowered to make evidence-based decisions

    Smash any process that requires decisions to be referred to a central command-and-control authority. If that means smashing the central command-and-control authority then grab your pitchfork now.

    If you're just the data guy and someone else decides about decisions, get out of there now! Your company is about to be out-competed by an upstart without hierarchy issues.

  13. 19


    Your keynotes are always provocative and wildly entertaining to me. Maslow would deeply appreciate your language as it relates to higher needs. Sometimes I think that your insights and strategies should be taught to the new young up and comers. Precisely because too many of the hippos will wallow in their own worlds of yesterday and self-belief.

    What are the things educationally that can be found to educate the up and comers and some of us older guys to massively disrupt and prosper?

    I know you have your startup Market Motive, but can you suggest some educational programs that you like as well, ones that can remold future minds in your vision? Or was Market Motive a response to your feeling that not enough of this is being taught?

    • 20

      Rob: I appreciate the kind words, thank you.

      Market Motive was most definitely founded because we felt there was a distinct lack of structured curriculum out there that helped create current generation analysts – both from an analytical knowledge perspective and from an optimal thought process perspective. I only half jokingly tell my students each quarter: "If by the end of this course I've taught you how to use the data, but not how to apply the right mental model to it, then I would have failed!" :)

      Of course life is a great teacher of how to massively disrupt and prosper. One just has to have enough courage to stand up, and take the occasional career beheading. I've always found that to change minds at the very top of companies these strategies work very well:

      1. I have to be willing to do all the hard work required to show the value of the "new world." Far too often we just go preach. I think walking the talk, showing a rough prototype, a deep alternative analysis, something, is of great value. Because it is concrete.

      2. I love framing things from a customer perspective. "Here is the magnificent delight we can deliver." "This is how we will revolutionize their experience." "Here is why the benefits we deliver will deliver additional glory."

      3. Competitors. I shamelessly leverage the current/impending success of direct competitors to make concrete why change is mandatory. No CEO wants that on their ego. :)

      Hope this helps a little bit.


  14. 21
    Ned Kumar says

    As always, enjoyed your post and the keynote. I also loved your rules – the two that I really felt folks should take note were "Shoot for the right time data, not real time data" and "Eliminating noise is even more important than finding a signal". While all your rules should be followed, I think even just focusing on these two can provide the firm with tons of actionable insights that can beef up their bottom line. (Especially on the "noise" rule, sometimes analyzing the noise as the 'signal' can provide wonderful insights on the whys and how of an event happening).

    Just a couple of more thoughts. I know there has been a lot of hype about Big Data including 'defining' it in terms of the 4Vs (volume, velocity, variance, and variability). In my mind Big Data is not as much about the volume aspect (even though that is a factor). For one, by Moore's law what is Big data today might not be Big a few years from now as our computing power increases. Volume in my mind is a byproduct – part of the complexity comes from the increased number of sources and how that exponentially increases the number of ways the data from those sources can interact or be integrated.

    Big data or not, I think the key is how we intend to use the data from various channels/sources to drive our business vision & goals (and why I liked your post & talk). Just because data is available need not always mean one has to use it (imo); trying to force a relationship with data often sidetracks the very purpose of analysis. [I know everybody might not agree with me on this :-) ] I mention this because so many folks feel left out if they are not looking at unstructured data, text data etc. My first question to them always has been "Is that data & analysis of that data necessary given your business vision and objectives?".

    And lastly, I love the quote you had. Here is another one I like by Guy Laurence, CEO of Vodafone “Data on its own is impotent” :-)


  15. 22

    "Shoot for right time data, but not real time data" was good info, especially the examples stated, which reflect a big reality.

    Also the 10/90 rule, which is very obvious, but few firms understand that it's we humans who make the difference, not tools or data.

    Also good to have your latest videos, which I look forward to on youtube.

    Ranjan Jena

  16. 23

    Excellent stuff.

    I have one comment, though. You're a bit too hard on "realtime." I think realtime data has a role in exploration: Interactivity.

    Data science happens in two big ways. There's exploratory, what-if, ad-hoc discovery (think dragging columns and rows into a Pivot Table in Excel to find the "unknown unknowns" that emerge.) And there's reporting, here-are-the-results analysis, which is what often makes its way into the boardroom.

    If the boardroom can't react on the report in realtime—and by definition it can't, unless the boardroom is a bunch of software rather than flesh-based, error-prone executives and HIPPOs—there's no need for the data to be realtime. All that does is make people feel they're flying a fighter jet when they should be building a business.

    But speeding the human-machine interface dramatically increases the performance and productivity of an analyst. Think changing pivot tables when playing "what if." As you point out, 90% of your budget should go to smart humans. Making those humans as efficient and effective as possible is essential if you're going to reap something from that investment. Every time a precious analyst watches an hourglass or a spinning beachball, or goes to get a coffee while a report runs, you're squandering her.

    Real-time reporting isn't that useful except as an early warning system for error detection (sales are down a lot today so maybe the site is broken.) But realtime interactivity is more important than ever because the data is unstructured and the analyst's time is precious.

  17. 24

    Great Keynote! Congratulations …

    The focus on people rather than tools and the focus on actionable insight is always important.

    I have only one thing to add: In my experience real time analytics is important. Some businesses rely heavily on real time analytics. 'Old school mass media', for example, try to get the latest buzz online and include it in their newspapers or TV shows.

    You know it always depends …

    • 25

      Ulrich: I want to completely underscore your last point: It always depends!

      There are certainly some scenarios where real time can be of value. Especially, as I mentioned, if automation is involved. The scenario you're describing with "old school media" is a good one: many 100% automated tools, with a very light editorial touch, where real time decision making works very well.

      A different example of this is how engines like Google or Bing will use 100s of signals and information being published in now time, and be able to show the most relevant answer.

      But in almost all other scenarios I've not had the privilege of seeing companies do much with real time data (even after pumping millions of dollars into getting that data – it felt good to have it, rarely did the business any good).


      • 26

        Hello Avinash,

        Thanks a lot for your reply. I always appreciate your feedback. I had the luck to see a handful of cases where companies did great things with the insights gathered from real time data.

        But you are right – this is rarely the case. Too often people put their money in ineffective projects.

        The 'feel good to have data' problem is a common one; it is costly and could be solved by choosing the right people.

  18. 27
    Ned Kumar says

    @Alistair – I hear your underlying thought on wanting real-time data and agree there can be certain benefits to it (if handled correctly and for the right reason).

    However, (imo) we should separate the efficiency of tools, servers, and the human-machine interface from the need for high velocity, high frequency data input. Irrespective of the data being real-time or not, if the analyst is not equipped with the right tools or if the processing capabilities are not scaled enough there is bound to be a lot of 'wastage' of man-hours watching the hourglass. And in this I completely agree with you that the firm/HIPPO should ensure that they don't stop at hiring a brilliant mind but also provide the right environment & tools for that mind to do the best it is capable of.

    The way I interpreted Avinash's rule was that given your current business context, is it still worth it to go for real-time data? Here again I agree with you that if real-time data is readily available or available with minimal effort and there are no processing constraints then yes, the analyst can definitely do exploratory analysis and even subject that data to some of the cool methodologies to see if any insights can be gained (but the question still remains – how long before those insights can be or will be used for any decision making?).

    However, if real-time data is not readily available or if it would take significant effort to make it available & processed then in most cases I am with Avinash that it might not be worth it. Mainly for the reason he mentions. We can do all the analysis on real-time data but firms (especially large firms) are not structured to make real-time decisions.

    Real-time data without [near] 'real-time' decision capability has no ROI. Insights by themselves are a whole lot less valuable than 'actionable insights' that have been acted upon :-)

  19. 28

    Hello Avinash.

    I found your post quite interesting. I wrote an article that, on the surface, might be against your viewpoint (see the link in the website box). But on deeper reflection I think we agree on several points.

    Couple of things

    1. Real time data – as many others have pointed out here, it has its uses. Think of all the electricity grids, power plants, manufacturing plants, UPS, FedEx, airlines, banks, trading platforms, traffic control systems, etc. A lot of stuff functions on decisions made on realtime data. And this is not just about control. When you have data streaming from thousands of sensors, the decision loop that you mention, the one that takes 14 days, has to be compressed. So spending some time on developing the right kind of analytics/algorithms that can take the raw data about multiple parameters and present it as actionable insight has enormous value. There is also a place for what is typically called "historical data" in these situations (data over 24 hours old to as far back as it goes). That can be analyzed too, but real time data has realtime value.

    2. Typically when companies know the analysis they are looking for, they have found it. Is it the best possible one? No. Is it reasonably good? Yes. With all the progress of the last two decades, I don't think there is any defined problem that has not been solved to some extent. Which leaves managers and executives with a vague longing that they are missing out on something big, and in comes the IBM-type consultant, who doesn't know much about the industry but is big on promises.

    3. I agree with your 90/10 rule. Unfortunately, the human component is seen as an ever-repeating expense, whereas most software (even that bought as a service) is capitalized and depreciated. Also, there is little or no effort exerted by the actual recipients of analysis to learn the tools provided. The attitude is more like "Bring it to me and I'll eat". The true revolution in data analytics will come when industry knowledge and data analytics are merged. This will not happen when external providers are interested in providing solutions. It'll happen when the tools are simplified enough that an average manager or executive with limited time can do their own analysis without recourse to a ninja or a data poobah. The "analyst" role has to die out. I would restate your thesis as 10% for the software and 90% towards, not the right "data guru", but training the actual principals in finding their own insights. The difference is the same as between going on a guided tour and exploring on your own with a GPS.

    4. Last, but not least, decisions and recommendations have to be made "defensibly". Anyone who does not understand this does not know how the real world works. Consultants can present recommendations and then leave, but for people working for an organization the stakes are different, esp. with decisions that have a big import. Hence the meta data and quality of data are critically important to decision making. When these are lacking, in an organizational politics context, those opposed to you can easily derive the opposite conclusion or just plain pull out a bunch of erroneous data and go "you're making a decision based on this????". Data analysis based on shaky data might yield insights, but not decisions.

    • 29

      Deja Vu: I also think there is broad agreement between us.

      1. I'm not saying real time is completely useless. I'm saying that investing in eliminating humans is a worthy cause, it is perhaps the only way to ever make real time work.

      2. I'm sorry I don't think I understand exactly what you are saying here. But I do agree that hiring the IBM consultant might not be the right answer. :)

      3. Amen!

      I don't think that the Analyst role will die out. I think it will move away from analysts being "glorified data pukers" to actually doing big strategic hard analysis to power big important "unknown unknown" hard decisions.

      4. GIGO very much still rules. But in a quest to avoid that we have to balance the purity of data we seek with the timeliness with which decisions can be made. It is not an easy call, but it is critically important that decision makers get good at making those calls over time.


  20. 30
    Anonymous says

    Hi Avinash,

    Thank you for encouraging people to spend money on intelligent people.

    Having been a 'middle manager' in the past, it is really hard to get upper management to understand that. Then they are always mad at us middle managers because the good people are quitting and we're paying a ton of money for a tool, why can't we get better analysis?!?! Because I'm spending all my time training the new people, getting them to a point where they can finally start to run on their own, get offered double or triple pay elsewhere, quit, and the cycle continues.

    Tools are only as good as the people working in them.


  21. 31

    I have recently discovered your blog and I am an instant fan! I can totally identify with the view that eliminating noise is more important than finding the signal. As a practitioner of advanced analytics myself, I too am overwhelmed by the mindshare that technology has grabbed in this domain, with few success stories, if any.

    I have a couple of questions/comments and would like to hear from you:

    1. Are there any advances in the data mining techniques that lend themselves especially well to big data? All talk about statistical significance became somewhat of a moot point as "data-mining" started dealing with extremely large (by the standards of the time) sample sizes. Are we about to say goodbye to some other closely held analytical techniques?

    2. Where do you think the biggest probability of success might be for big data? Are there underdogs that might run with the ball and benefit the most – e.g. manufacturing, logistics, etc.?

    3. Are you aware of any success stories, big or small (no pun intended), with big data?

    • 32

      Mukul: Quick answers to your questions….

      1. Data mining techniques are becoming more sophisticated with every passing day. Some of the coolest advances are in using artificial intelligence. I'm quite optimistic at the coolness that is in front of us.

      2. Everywhere. I don't think there is an industry I would randomly pick (mostly because I only know what I know). There is an incredible opportunity to extract value if there is an intersection between incredible human skills available, a unique opportunity and management vision.

      3. Randomly go to any big data vendor, or do a Google query, and you'll find them pimping many many success stories. Take them with a tiny grain of salt, find loads of inspiration.


  22. 33

    Most refreshing read, including the comments that follow.

    On real time, is the assumption being made about processing volumes coming out of innumerable sources in great variety and at a greater velocity, ending up in the palm of a decision maker's hand, to sip from that ocean, as if swallowing it all?

    Real-time could also be about analysis that 'auto magically' triggers other systems or, in-stream, determines if a visitor should see page A or page B or page Z31. Real time could be that sort of sub-second response relying on in-stream processing or in-memory computation, or it could be about reducing a business task that took weeks into hours, such that it alters the fundamental business model to be able to serve responses on the same business day. You figure this one out.

    • 34

      Sridhar: My perspective on real time leaves aside any minor processing, storing, and, this might be crazy, reporting issues with real time. We have bigger, badder systems with every passing day capable of sifting through millions of rows in real time.

      Data collection, storing and processing is no longer the issue.

      The challenge is all about the time/process/bureaucracy/people/begging between "look here's some data" to "ok do x with the insight" to "done, it's implemented."

      As I mention in the post, as you mention above, if you can automate the steps from "look, insight x, action" by eliminating human beings then you can do a lot with real time data.

      Also please see Alistair's wonderful comment on this thread. I agree with his valuable perspective.


      • 35

        You are the Guru! Seriously! I entirely agree with your main submission, i.e. system>human>think-think>sit a bit>act maybe/do something else. This is the enemy of real time, making it literally a waste of time. Thank you for pointing out Alistair's comment. It rightly highlights the importance of analyst time getting wasted sifting through troves of unstructured data, which is one aspect of Big Data.

        I should have been clearer in the first place that I meant real time in the same vein as 'on-demand.' Aside from obvious use cases that are system-triggered alerts, like the super job folks at Splunk are doing, I was referring to an underlying/subliminal change, where business processes (transactions, workflows) could happen faster simply due to speedier processing, so that T+2 potentially becomes T+1.

        A system my company is currently working on requires validating an entity within 300 million+ numbers. We reduced the process from close to half a day of validation and reporting time to minutes. This is not exactly real time as in sub-second response times, but what it does is alter the business process and how activities are chained. One 'real time activity', or a bunch of them across a business process, could as a whole alter the business process. I am guilty of putting excess faith in technology and in the idea that technology is by itself apolitical (refer to the excerpt from Langdon Winner's "The Whale and the Reactor: A Search for Limits in an Age of High Technology"). My exuberance apart, the impact of technology (read 'real time processing') can be delinked from the status quo of how business folks conduct themselves, until several changes force their hand to offer products or services at a lower price and of higher quality. I am not just a 'real time' fan boy, but a technology one, hopefully for positive change.

        Apologies for my attempt at stoking the embers of this discussion when people have already moved on. Avinash, you are my man!

  23. 36
    Jason Luis says


    I always enjoy your enthusiasm and insight.

    One quick comment – I just did a control+F, and couldn't find the word "creativity" anywhere.

    Creativity + Intelligence + Diligence will help us find the diamond in the rough, see the potential where others don't and help us deliver on the promise of big data. Smart people with frameworks will deliver expected results inside the box. Creative people will pick the lock and open up the opportunities! =)


    • 37

      Jason: That is great feedback, thank you.

      I concur with you about the three elements (C+I+D). There are perhaps a couple more we could add to that list (if we were thinking of Analysts). In this post with the emphasis on the 10/90 rule I wanted to stress at a macro level the importance of having "big smart brains" deployed against this complex task.

      My assumption was that if companies do hire, they'll hire the right people. But perhaps that is a flawed assumption. :)


  24. 38

    Working for a company where data is our lifeline, this was AWE INSPIRING.

    I immediately shared this with some colleagues, many of whom have also been pondering how we drive more and more insight leading to meaningful action from the non-stop barrage of data we deal in, day in, day out. This just fuels the fires of inspiration.

  25. 39

    Hi Avinash,

    I am an amateur to the Professional World and trying to come to terms with it. Your blog has narrowed down the way I look at things and it is a complete Paradigm Shift. I owe you for the Insights. Cheers!


  26. 40

    Hi Avinash, One of the companies where I worked had the following approach to tackling their data integrity issues:

    Step 1) Go back to weblog files and get the activity data for the known user, pull the data from the web log files into a database, then using a visualization tool bring in the activity data from the database for reporting. Their activity KPIs were very unique and specific to the industry.

    Step 2) Pull out all the 300 to 350 users out of the analytics platform. Provide login to only a handful of users to the analytics platform who will then monitor the clickstream data for the purpose of base lining the dashboards and for plugging the holes in data.

    I have a few questions:

    1) Does it make sense to revert back to weblog data?

    2) Is it a wise decision to cut off access to the company wide users of an analytics platform to establish data integrity/ uniformity in reporting?

    3) What data should we look at to get a consolidated big picture of the user?

  27. 42
    Mamun Mahdeeb says

    Hi Avinash,

    Thank you for this nice post. It will be really helpful for us. You give us powerful information about how to use data from a real-world application perspective.

    Thanks again.


  1. […]
    “Information is powerful. But it is how we use it that will define us. Understand when is the right time for data in your organization. Know what to ignore. Move from data to action at light speed.” – Avinash Kaushik, Driving Big Action.

  2. […]
    A Big Data Imperative: Driving Big Action, http://www.kaushik.net

  3. […]
    The vision of “data democracy” will come true and everybody in the organization will create and consume big data. Data science fundamentals will be thoroughly integrated in all levels of management education.  Mobile, easy-to-use big data analytics tools will follow the adoption curve of Excel and PowerPoint, in a market dominated by one or two large IT vendors. This dominant (tablet-based?) big data “platform” (or platforms) will also be heavily used by consumers at home to make sense of their personal big data.  Data science will not be a specific discipline or job category.

  4. […]
    It all brings us back to a post Avinash has put together on the way to go forward with big data where he looks at the things that you need to think about when implementing big data. The suggestion that you spend a small amount of money doing a small data integration is a good one (although the realities of business suggest that companies try to shoe horn bigger systems onto prototypes of smaller ones when they should start from scratch). The other thing that is often missed out is the 90/10 rule.

  5. […]
    Avinash Kaushik posts “A Big Data Imperative: Driving Big Action” with his usual acumen at Occam’s Razor.

  6. […]
    On Occam's Razor, Avinash Kaushik drives home the point of avoiding data for data's sake and looking into finding courses of action based on good analysis of that data: A Big Data Imperative: Driving Big Action

  7. […]
    Whether you hire subject experts, grow your own, or outsource the problem through the application, data only becomes "unreasonably effective" through the conversation that takes place after the numbers have been crunched. At his Strata keynote, Avinash Kaushik (@avinash) revisited Donald Rumsfeld's statement about known knowns, known unknowns, and unknown unknowns, and argued that the "unknown unknowns" are where the most interesting and important results lie. That's the territory we're entering here: data-driven results we would never have expected. We can only take our inexplicable results at face value if we're just going to use them and put them away. Nobody uses data that way. To push through to the next, even more interesting result, we need to understand what our results mean; our second- and third-order results will only be useful when we understand the foundations on which they're based. And that's the real value of a subject matter expert: not just asking the right questions, but understanding the results and finding the story that the data wants to tell. Results are good, but we can't forget that data is ultimately about insight, and insight is inextricably tied to the stories we build from the data. And those stories are going to be ever more essential as we use data to build increasingly complex systems.

  8. […]
    #architecture – ‘Big Data Imperative – Driving Big Action’ – 6 principles, including –

  9. […]
    What are you doing with your data?
    Are you driving closer to your goals?

  10. […]
    I happened to run across this article by Avinash Kaushik today: https://www.kaushik.net/avinash/b
    I think it fits well for this question, and the evolution of 'big data' as it seems to become more of a sexy buzzword and lose some meaning.

  11. […]
    According to LinkedIn Founder Reid Hoffman, Web 3.0 will center around data. A recent article in WIRED glosses over the point. But I found it interesting because I had also read a blog post by Google’s Digital Marketing Evangelist Avinash Kaushik which made the same claim. Big Data Mo data, mo problems. There are several things keeping data down. In the Web 1.0 and 2.0 architectures, databases were singular, proprietary, formatted differently, and/or nonexistent. Now, the database-driven web envisioned in 2009 by world wide web founder Tim Berners-Lee is gaining some serious traction.

  12. […]
    Turns out he is the father of the term “unknown unknowns” – things we do not know we don’t know – popularized by former secretary of defense Donald Rumsfeld and later by Avinash Kaushik as “the unique space in which big data analysts should actually play.”

  13. […]
    Possible successes — if you will call it that: Web analytics (where data is so easily grown because of the ease of data collection and the number of people and actions that can be measured) and politics (which was, actually, driven quite a bit by web analytics).

  14. […]
    To get the most out of smart data, behavioural information should be combined with consumer survey data and a social science perspective to generate information that guides the flow in understanding the pattern of consumer behavior. Sure that sounds easy but how does it work? I’ve chanced across a comprehensive set of Six Rules That Should Govern Your Big Data Existence:

  15. […]
    Anyone entering the realm of digital marketing and analytics will soon recognize Avinash Kaushik as a key thought leader in the industry. On his blog, Occam’s Razor, Kaushik has written extensively on how to use big data to find insights that drive action with timely value. In his blog post, “A Big Data Imperative: Driving Big Action”, Kaushik acknowledges the potential and the challenges posed by big data.
