Big Data or Fast Data – Does Big Really Matter?

After years of hype about big data, this question remains: Does big data really matter? A recent CIO.com article – Study reveals that most companies are failing at big data – raises further questions. A recurring theme that I see at conferences – Strata, Enterprise Data World, TDWI, etc. – is that many people and many organizations are seeking the key that will unlock the magic of big data. Perhaps I’m a skeptic, but I doubt that the key can be found. I’m not convinced that magic can be derived from the bigness of data.

Big data is often described as having three distinguishing characteristics (the 3 V’s) of volume, variety, and velocity. But we give most of the attention to volume, as if the size of the data is its predominant characteristic. I think that is a confused perspective. The magic of data depends entirely on a 4th V – value – that has little to do with the size of the data and much to do with the ability to gain insight and drive innovation. That means having the right data, at the right speed, woven into the fabric of day-to-day business activities. Business decision makers don’t care about big data; they care about fast data. Meeting their expectations depends in part on using the right technologies, in the right ways, to manage today’s complex data ecosystems. See more about this in the Eckerson Group research report Big Data Management Software for the Data-Driven Enterprise.

When we rethink the V’s, the focus on volume looks like a barrier to value creation. We’ve been barking up the wrong tree … carrying water in a sieve. It’s time to turn attention to fast data and to prioritize the V’s meaningfully: Velocity first, to meet the challenges of fast data; Variety second, to capitalize on the abundant opportunities of unstructured data; Volume last, because datasets are only sometimes so large that they require specialized techniques and technologies.

Velocity matters for all data – not just for big data. There is a small body of writing about fast data today, but it focuses largely on streaming data. Data streams are data in motion – data arriving at high velocity and demanding high-speed ingestion. But fast data should not stop at ingestion; that is only the beginning. Fast data should apply to all data, big or small, streaming or stored. The purpose is to move data quickly to where it is needed, not simply to move it quickly into databases where traditional low-speed query, reporting, and analysis processes take over.

Fast data moves data quickly through an ecosystem of data warehouses, data marts, operational data stores, master data repositories, data lakes, analytic sandboxes, and so on – persisting data where needed but, more importantly, threading it throughout business processes. A fast data environment must address each of the following (see the sketch after this list):

  • High-speed ingestion of data streams
  • Rapid ingestion of all data sources
  • Fast data transformation and data preparation
    (For more about data preparation read Barry Devlin’s 4-part series Automating Data Preparation in a Digitalized World)
  • Fast execution of data pipelines
  • Rapid data pipeline development
  • Rapid response to data source changes
  • Rapid response to changes in application requirements.
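
To make the ideas in the list above concrete, here is a minimal, hypothetical sketch in Python of the ingest-transform-route pattern it describes. The queue-based stream, the transform, and the two sinks are illustrative stand-ins, not any particular product’s API; the point is simply that a fast data pipeline delivers each record both to persistent storage and directly into a business process, rather than parking it in a database for later low-speed reporting.

```python
# A minimal sketch (not a reference implementation) of a fast data pipeline:
# ingest records from a stream, transform them in-flight, and route each record
# both to persistent storage and directly into a business process.
# The stream source, transform, and sinks are hypothetical stand-ins.

import json
import queue
import threading
import time


def ingest(source: queue.Queue, pipeline: queue.Queue) -> None:
    """High-speed ingestion: pull raw events off the stream as they arrive."""
    while True:
        raw = source.get()
        if raw is None:          # sentinel: stream closed
            pipeline.put(None)
            return
        pipeline.put(raw)


def transform(record: dict) -> dict:
    """Fast, in-flight data preparation (here, just a trivial enrichment)."""
    record["processed_at"] = time.time()
    return record


def run_pipeline(pipeline: queue.Queue, sinks: list) -> None:
    """Fast pipeline execution: transform each record and fan it out to every sink."""
    while True:
        raw = pipeline.get()
        if raw is None:
            return
        record = transform(json.loads(raw))
        for sink in sinks:       # persist *and* feed the business process
            sink(record)


def warehouse_sink(record: dict) -> None:
    print("persisted to warehouse:", record)


def operational_sink(record: dict) -> None:
    print("pushed into business process:", record)


if __name__ == "__main__":
    source, pipeline = queue.Queue(), queue.Queue()
    threading.Thread(target=ingest, args=(source, pipeline), daemon=True).start()

    # Simulate a few arriving stream events, then close the stream.
    for i in range(3):
        source.put(json.dumps({"order_id": i, "amount": 10.0 * i}))
    source.put(None)

    run_pipeline(pipeline, sinks=[warehouse_sink, operational_sink])
```

In a real environment the queues would be replaced by a streaming platform and the sinks by warehouse loaders and operational applications, but the routing principle – every record flows to storage and to the business process that needs it – is the same.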

Fast data encompasses more than computer processes operating on data. It also includes the human processes of development, operations, and change management. Putting all of these pieces together will increase the value derived from data and reduce the reports of big data failures. The real key is to give more attention to fast data than to big data.

Dave Wells

Dave Wells is an advisory consultant, educator, and industry analyst dedicated to building meaningful connections throughout the path from data to business value. He works at the intersection of information...
