Everyone seems to be jumping on the “Big Data” bandwagon these days, especially the marketing folks. Recently I read Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier, and it opened my eyes to the true meaning of the term and how it is far more than just more data.
Here are a few of the main points that resonated with me.
Big Data is not just more data, it’s ALL the data.
Something that I hadn’t considered was the impact that Big Data has on statistics. Many statistical methods exist because working with large amounts of data used to be impractical, so techniques were developed to get insight from smaller quantities (subsets).
You take a representative (and random) sample of the population and extrapolate to draw some conclusion about the whole. This was done because access to the entire set of data was impossible or the computational power for such analysis was not available.
With decreasing storage costs and increasing processing power, this is no longer the case. With Big Data, you aim to collect all the data, so you never need to extrapolate from a smaller sample.
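To make the difference concrete, here is a small Python sketch of my own (it is not from the book). The "population" is made-up purchase data, and the sample size of 500 is an arbitrary assumption. It compares an estimate extrapolated from a random sample, with its margin of error, against the figure you get by simply computing over everything.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical "population": 100,000 made-up purchase amounts.
population = [random.gauss(50.0, 15.0) for _ in range(100_000)]

# Traditional approach: measure a random sample and extrapolate.
sample = random.sample(population, 500)
sample_mean = statistics.fmean(sample)
# Standard error: roughly how far the sample mean tends to sit from the truth.
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

# "All the data" approach: compute the figure directly, nothing to extrapolate.
population_mean = statistics.fmean(population)

print(f"sample estimate: {sample_mean:.2f} +/- {standard_error:.2f}")
print(f"full data set:   {population_mean:.2f}")
```

The sample gives an answer with an uncertainty attached; the full data set just gives the answer.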
Causation matters less than Correlation.
As humans, we are hard-wired for causation. We want to know why something happens as much as, if not more than, what happened. This can cause unnecessary biases and missed opportunities.
The traditional method of data analysis is to create a “hypothesis” and then analyse the data to prove or disprove it. We would start with a cause and then look for the effect.
The “Big Data” world, however, has no need for causality. If you have all of the data and a correlation exists, asking why may be interesting but it doesn’t change the result. There is a growing discipline and category of software around “Correlation Analysis”. You feed in data sets and let the software find the correlations.
One of the examples provided in the book was from Walmart. By analysing product sales combined with other data sets, their system determined that sales of Pop-Tarts correlated closely with regional hurricane warnings. They used this information to instruct store managers to place Pop-Tarts closer to checkout counters during these events, and sales soared.
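As a rough idea of what "feed in data sets and let the software find the correlations" could look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the three daily series are invented (they are not Walmart's data), and `rank_correlations` is a hypothetical helper, not a function from any real product. It computes the Pearson correlation for every pair of series and lists the strongest first, without any hypothesis about why.

```python
import itertools
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_correlations(datasets):
    """Correlate every pair of series and sort by strength (hypothetical helper)."""
    pairs = [
        (name_a, name_b, pearson(xs, ys))
        for (name_a, xs), (name_b, ys) in itertools.combinations(datasets.items(), 2)
    ]
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)

random.seed(7)
days = 200
# Invented data: a warning on every tenth day, and snack sales that jump on those days.
storm_warning = [1.0 if day % 10 == 0 else 0.0 for day in range(days)]
datasets = {
    "storm_warning": storm_warning,
    "snack_sales": [100 + 80 * w + random.gauss(0, 10) for w in storm_warning],
    "umbrella_sales": [20 + random.gauss(0, 5) for _ in range(days)],
}

ranking = rank_correlations(datasets)
for name_a, name_b, r in ranking:
    print(f"{name_a} vs {name_b}: r = {r:+.2f}")
```

The scan surfaces the storm/snack link on its own, and at no point does it need to know the reason behind it.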
This is one of my new buzzwords. When you have all or nearly all of the data, the precision of the data required to make decisions decreases; there is a point where “good enough” can be effective and indicative.
For example, if you were measuring soil moisture levels in a crop field, you could invest in expensive testing equipment and place it in a few representative areas around your land. Alternatively, you could spend your money on a larger number of slightly less accurate sensors and place them everywhere on your land. Each cheaper sensor might be less accurate on its own, but having far more readings is likely to be more useful: random errors in individual sensors tend to average out, and you no longer have to guess at the areas you didn’t measure (quantity trumps quality with “Big Data”).
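Here is a quick simulation of that trade-off, as a sketch under my own assumptions rather than anything from the book: the field, the sensor counts (5 versus 200) and the noise levels (0.5 versus 3.0 percentage points) are all made up. It also assumes sensor errors are random rather than consistently biased in one direction; a bias like that would not average out.

```python
import random
import statistics

random.seed(1)

# Hypothetical field: 1,000 plots whose true moisture (%) varies from plot to plot.
field = [random.gauss(30.0, 8.0) for _ in range(1_000)]
true_mean = statistics.fmean(field)

def estimate(sensor_count, sensor_noise):
    """Average the readings of sensors placed on randomly chosen plots."""
    plots = random.sample(field, sensor_count)
    readings = [moisture + random.gauss(0.0, sensor_noise) for moisture in plots]
    return statistics.fmean(readings)

def typical_error(sensor_count, sensor_noise, trials=500):
    """Root-mean-square error of the field estimate over repeated placements."""
    errors = [estimate(sensor_count, sensor_noise) - true_mean for _ in range(trials)]
    return (sum(e * e for e in errors) / trials) ** 0.5

# A few expensive, precise sensors versus many cheap, noisier ones.
precise_error = typical_error(sensor_count=5, sensor_noise=0.5)
cheap_error = typical_error(sensor_count=200, sensor_noise=3.0)

print(f"5 precise sensors: typical error {precise_error:.2f}")
print(f"200 cheap sensors: typical error {cheap_error:.2f}")
```

With these numbers the handful of precise sensors loses, because its error comes mostly from the parts of the field it never sees, not from the instruments themselves.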
“Datafication” was my favourite new term from this book. We recently went through a period of “digitisation”, that is, digitising information that was primarily “analog”: music, books, photos, paintings, etc. We are now going through a period of “datafication”, turning everything we can into data so that we can track it, analyse it and start looking for correlations.
We’re seeing this in the trend known as the “Quantified Self”: collecting and storing information about ourselves as we go about our day, such as how many steps we take, our blood pressure, our weight and what we eat.
The logic is that we should capture and store everything we can. We can then start to look for correlations. This is especially true when we are able to aggregate this data into larger sets. Data is quickly becoming a resource and an asset – especially for business.
There’s obviously far more discussed in the book than these few points, but it’s important to understand that this “Big Data” thing is more than a marketing buzzword. It is a revolution that would have been unimaginable just five years ago. Incredible.