How to handle Big Data

The modern mantra in almost every company seems to be: we need lots of data, and the more the better.

Companies large and small believe that understanding and interpreting the terabytes of data they have accumulated will provide insights into their customers, predictions about the future and a leg up on their competitors.

This belief persists partly because research has so far mostly treated big data as a binary construct: present or absent. This is a simplistic view, and companies should move beyond the dichotomy.

What many CIOs, data scientists and heads of analytics departments seem to miss is that data has at least three flavours, not just one. The three flavours of data I talk about are: volume, variety and truthfulness, or veracity. In fact, a fourth one could be added, which is velocity. These represent the four V’s of big data.

However, most seem to be interested only in the first one, volume, without understanding that focusing on volume alone may be counterproductive to firm performance.

A 2021 article by Cappa, Oriani, Peruffo and McCarthy [1] focuses on exactly this issue.

They analyse the impact of volume, variety and veracity on firm performance and conclude that big data volume alone can have a negative impact on firm performance. This is due to the costs and risks associated with collecting and storing the data.

It is when variety is large that firms can create value from big data.

“Variety works with volume to produce benefits that outweigh the costs, and positively affect firm performance”, they state.

Furthermore, high values for big data veracity allow firms to capture value by producing insights able to improve business performance.

Since the authors’ focus is on data coming from mobile applications, they do not examine velocity: because collection is almost instantaneous, velocity is unlikely to be a differentiating factor. While velocity is not used in this study, it remains an important dimension to consider in general.

It is important for companies to move beyond the idea that volume is the only dimension of big data worth attention and to focus more on the other ones as well. Data should not be approached as a monolithic asset, but rather as a multi-faceted asset that comprises several dimensions.

The authors, as mentioned, focus on data produced by mobile applications. As an example of the negative impact of a lack of data variety, they cite an application that Boston city authorities developed for citizens to report street potholes.

“The application produced a wealth of data -however- neighborhoods with a high percentage of older citizens -reported- fewer potholes resulting in wrong conclusions about which streets most urgently needed maintenance”.

This example illustrates how a lack of variety, defined as the assortment of data collected per observation, can produce bias and have a negative impact. Rather, the value of data “resides both in the abundance and the spectrum of information from which valuable insights and better decision-making are derived”.

Veracity could be even more important than variety. Clearly, inaccurate, incomplete and, at times, incorrect data will not allow a firm to produce insights on which it can rely. Veracity is defined by the reliability, relevancy and timeliness of the data. Some studies (Acharya and Zhaoxia, 2017; LaValle, Lesser, Shockley, Hopkins, and Kruschwitz, 2010) report that one third of business leaders do not trust the big data information they use to make business decisions.

Data scientists often need to spend the majority of their time cleaning and interpreting the data, which adds to the negative impact that a lack of veracity has on firm performance.

It is when all dimensions of the data are combined, then, that the data can produce a positive net impact on firm performance, and recent research has focused increasingly on data veracity.

According to Gartner’s Data Quality Market Survey [2], poor data quality is hitting organisations “to the tune of $15 million as the average annual financial cost in 2017” but, despite this, “nearly 60% of organizations don’t measure the annual financial cost of poor quality data”.

A 2010 study by IBM found that poor data quality costs the US economy $3.1 trillion every year [3].

Another study, by Dun & Bradstreet [4], puts the cost of preventing a data issue at $1 per record, but estimates the cost of correcting the same record at $100. It states: “It is far more cost-efficient to prevent data issues than to resolve them. If a company has 500,000 records and 30% are inaccurate, then it would need to spend $15 million to correct the issues versus $150,000 to prevent them”.
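The arithmetic behind the quoted figures can be reproduced with a short sketch. The function name and its parameters are illustrative assumptions, not something taken from the study; only the per-record costs ($1 to prevent, $100 to correct), the record count and the error rate come from the text above.

```python
def data_quality_costs(total_records, inaccurate_share,
                       prevent_cost=1, correct_cost=100):
    """Return (inaccurate_records, prevention_cost, correction_cost).

    Costs are in dollars. The defaults are the per-record figures
    quoted above; the function itself is a hypothetical helper.
    """
    # Number of records affected by data issues
    inaccurate = round(total_records * inaccurate_share)
    # Cost if the issues are prevented vs. corrected afterwards
    return inaccurate, inaccurate * prevent_cost, inaccurate * correct_cost


inaccurate, prevent, correct = data_quality_costs(500_000, 0.30)
print(f"{inaccurate:,} inaccurate records")
print(f"prevent: ${prevent:,}  correct: ${correct:,}")
```

With 500,000 records and a 30% error rate, 150,000 records are affected, which gives the $150,000 prevention cost and the $15 million correction cost cited in the quote.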

These costs can be divided into three broad categories [5]:

1. Operational: actual costs caused by low-quality data

2. Tactical: costs to assess if incorrect outcomes are due to bad data

3. Strategic: costs derived from incorrect business decisions made due to poor data

Through the increased digitalisation of the business environment and the growing amount of customer data generated via mobile devices, companies have been able to make better business decisions by analysing the trove of information they have been collecting. However, for the data to provide value, variety is important to create insights and prevent biases, while veracity is fundamental to produce results that are correct and strategically relevant in the long term. Together with velocity and volume, these dimensions make up the four V’s on which companies should focus to improve their performance and their ability to generate valuable information and insights. Volume alone cannot deliver that improvement and, by itself, can in fact have a negative linear impact on firm performance.






