Analyst Series - Data Fabric: The Next Step in the Evolution of Data Architectures
Summary
Dan and Jay discussed the concept of Data Fabric, an automated and AI-driven approach to managing modern data environments. They also compared it to another new data architecture called Data Mesh, which focuses on distributing the responsibility for data to different functional domains within a business.
Jay and Dan discussed the differences between Data Mesh and Data Fabric. Data Mesh emphasizes a domain-oriented organization of data and a granular, distributed approach to managing data products, while Data Fabric relies on a centralized approach and abstract data objects to de-emphasize the location and format of data.
Jay and Dan discussed how artificial intelligence and machine learning are core to the data fabric, as they help automate functions such as identifying sensitive data and pipeline preparation. They also talked about how abstracted data objects can be created either as virtualized or persistent objects to manage the volume and velocity of data.
Jay and Dan discussed the benefits and risks of implementing a data fabric, which involves integrating different products to work together in a coordinated way. Jay emphasized that data fabric is not a product, but rather a matter of integration, although there are now product suites available from different companies.
Transcript:
Dan O'Brien
I'm Dan O'Brien, a research analyst at Eckerson Group. Today I'm speaking with a fellow Eckerson research analyst, Jay Piscioneri, about his most recent report, “Data Fabric: The Next Step in the Evolution of Data Architectures.” Jay has over 25 years of experience in data technology, including data warehousing, business intelligence, data quality, and data governance.
Hi Jay, thanks for speaking with me today.
Jay Piscioneri
Hey Dan, a pleasure to be with you.
Dan O'Brien
Alright, let's hop into it. So what is Data Fabric? And do you know when the term was coined?
Jay Piscioneri
Yeah, well Data Fabric is an architectural approach for managing modern data environments, which are vast and diverse in both the format and the location of data, as well as the volume and velocity of data coming in.
The old approaches of managing these environments manually are no longer viable. So data fabric is a way of managing those challenges through automation, metadata, and AI- and ML-driven ways of managing the data.
The term was first coined somewhere in the 2016 to 2017 timeframe. It's, I think, most commonly associated with a Forrester researcher, but it's sort of taken on a life of its own since then.
Dan O'Brien
In your report, I appreciated your discussion of what data fabric is not. There are a lot of terms thrown around these days. Can I ask you to compare data fabric with another new data architecture: data mesh?
Jay Piscioneri
Well, sure. So here's the thing. All the architectures that have been tried over the many years of dealing with data are fundamentally aimed at a couple of different common objectives, such as making data easier for people to use, often called democratizing data, and processing it faster. Data mesh and data fabric share these objectives.
One of the main differences between them is more organizational than architectural. Data Mesh talks about distributing the responsibility for data to business domains or functional domains. By that I mean different areas of the business, such as marketing and sales.
In Data Mesh, the assertion is that data should be managed by those who are closest to the data, closest to the functional applications that create the data, and closest to the people who need to understand what's happening in their functional world. In other words, the data sits closest to those who have the greatest need for it.
So that's one aspect of the difference, because Data Fabric doesn't make any statements about how you should organize yourself as a company.
The other significant difference is the implementation approach. Along with this domain-oriented organization of data in Data Mesh comes a much more granular and distributed way of managing the data itself.
Data Mesh talks about the concept of a data product, a data product being something that delivers value with data. And we've had data products for a long time, but the thinking of applying product management techniques to data is relatively new. In contrast, the data fabric doesn't necessarily have a position on this. When you implement a data product in Data Mesh, you're implementing it as a self-contained unit.
Data Mesh says that the data product should be responsible for the pipeline aspects of pulling the product together, the metadata, the quality, and making the data product discoverable and trustworthy for the data-consuming audience that it's trying to reach. And that means that it's on the shoulders of those in these functional domains, as opposed to a M&F product, to make sure it does what it says it's going to do and that the users can count on it to continue to work over time. Even though the functional systems under the covers may be changing, it's that domain team's responsibility to shield those changes as much as possible from the data consumers who rely on it.
Again, data fabric doesn't take a position on this, but by contrast data fabric is much more of a centralized approach where it's working within the paradigm of a centralized data team and centralized IT and so forth to produce the systems that link data together into a fabric.
The irony of these two terms is that, well, a fabric is a form of mesh, right? And a mesh is a form of fabric. So, they have that in common too along with shared objectives.
Dan O'Brien
You said data fabric relies on metadata and this also involves abstract data objects. It sounds like they could be very much like a data product, in the sense that they are a kind of data packaging. Could you describe for me an abstract data object and its purpose?
Jay Piscioneri
Let's step back a moment to ask what we mean by abstraction. Abstraction is a principle of computer science that goes back to the beginning. It simply means elevating certain aspects, in this case, of data and de-emphasizing others that are not relevant to the context in which the data is being used.
So specifically, an abstracted data object is trying to de-emphasize the location and the format of the data so that the consumer doesn't have to worry about that. The consumer sees an abstracted data object, which could be a raw data set from a stream, or it could be a table from a database, or it could be a set of Parquet files, or whatever the nature of the data is, but they don't know anything about what's under the covers.
What they're seeing is a set of data that they can access and use in combination. And so that's the abstraction. So you can have abstracted data objects which are sort of mirror copies of a raw data set. Or you can have abstracted data objects that are pre-combined, which is more in the way we think of things traditionally, like a data warehouse, which takes many data sets and conforms them into one. For example, a conformed customer dimension or a sales fact that takes sales transactions from a number of different sources.
The data fabric abstracted object concept allows you to do that either by creating virtualized objects or persistent objects. You probably have the need to do both.
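Editor's illustration: the distinction between virtualized and persistent abstracted objects can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not how any particular data fabric product works. The class name `AbstractDataObject`, its `loader` and `persistent` parameters, and the sample sales loaders are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]


@dataclass
class AbstractDataObject:
    """A named data set whose location and format are hidden from the consumer."""
    name: str
    loader: Callable[[], List[Row]]  # hypothetical: knows where and how the data is stored
    persistent: bool = False         # False = virtualized, True = materialized copy
    _cache: Optional[List[Row]] = field(default=None, repr=False)

    def rows(self) -> List[Row]:
        if not self.persistent:
            return self.loader()         # virtualized: read the source on every access
        if self._cache is None:
            self._cache = self.loader()  # persistent: copy once, then serve the copy
        return self._cache


# Stand-in sources; in practice these might be a stream, a table, or Parquet files.
def load_web_sales() -> List[Row]:
    return [{"customer_id": 1, "amount": 20.0}]


def load_store_sales() -> List[Row]:
    return [{"customer_id": 2, "amount": 35.5}]


def load_combined_sales() -> List[Row]:
    # A pre-combined object, similar in spirit to a sales fact built from several sources.
    return load_web_sales() + load_store_sales()


web_sales = AbstractDataObject("web_sales", load_web_sales)  # virtualized mirror of one source
sales_fact = AbstractDataObject("sales_fact", load_combined_sales, persistent=True)

# The consumer only calls rows(); it never sees where the data lives.
total = sum(row["amount"] for row in sales_fact.rows())
```

In this sketch the consumer's code is identical for both kinds of object; only the `persistent` flag decides whether the data is read from the source each time or served from a stored copy.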
Dan O'Brien
Artificial intelligence is sweeping the data analytics world. Artificial intelligence and machine learning techniques, are they core to data fabric or an accessory, something that can be built on top of it?
Jay Piscioneri
They are core to the data fabric in the sense that if one of your primary challenges is keeping up with the volume and velocity of data, then managing it manually is just infeasible, right?
So how do you automate those functions? A couple of examples are identifying potentially sensitive data: PII data, PHI, PCI. That can be done now through automated processes of pattern matching and automated profiling and so forth. For example, let's say you're bringing onboard a new data source from a new application. You've already got a policy that says, we're going to mask email addresses for everybody except for certain privileged roles. That's the kind of thing that can be automated, whereas before that it had to be manually implemented either through code or built-in tools.
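Editor's illustration: the email-masking policy Jay describes could look roughly like the following Python sketch. It uses a simple regular expression and a match-rate threshold as a stand-in for the automated profiling he mentions; real tools may use far more sophisticated detection. The role name `compliance_officer`, the 0.8 threshold, the masking format, and all function names are assumptions made for this example.

```python
import re
from typing import Dict, List, Set

# Deliberately simple email pattern, good enough for profiling in this sketch.
EMAIL_PATTERN = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")
PRIVILEGED_ROLES: Set[str] = {"compliance_officer"}  # hypothetical role name


def find_email_columns(rows: List[Dict[str, str]], threshold: float = 0.8) -> Set[str]:
    """Profile each column and flag those whose values mostly look like email addresses."""
    if not rows:
        return set()
    flagged: Set[str] = set()
    for column in rows[0]:
        values = [row[column] for row in rows if row.get(column)]
        matches = sum(1 for value in values if EMAIL_PATTERN.match(value))
        if values and matches / len(values) >= threshold:
            flagged.add(column)
    return flagged


def mask_email(value: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    if "@" not in value:
        return "***"
    local, _, domain = value.partition("@")
    return f"{local[:1]}***@{domain}"


def apply_policy(rows: List[Dict[str, str]], role: str) -> List[Dict[str, str]]:
    """Return rows unchanged for privileged roles, otherwise mask detected email columns."""
    if role in PRIVILEGED_ROLES:
        return rows
    sensitive = find_email_columns(rows)
    return [
        {key: mask_email(value) if key in sensitive else value for key, value in row.items()}
        for row in rows
    ]


# A newly onboarded source; nobody has labeled "contact" as sensitive by hand.
new_source = [
    {"name": "Ana", "contact": "ana@example.com"},
    {"name": "Bo", "contact": "bo.li@example.org"},
]
masked = apply_policy(new_source, role="analyst")
unmasked = apply_policy(new_source, role="compliance_officer")
```

The point of the sketch is that the policy is written once and applied to any new source whose columns are detected as email addresses, rather than being hand-coded per source.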
Another case is in pipeline preparation, right? If you want to empower an analyst in the marketing department to build their own data product, then they need a tool that can help them navigate the various data sources that they're going to pull together. This is again where AI and ML come in. Intelligent systems can evaluate the data, figure out how they join together and even write the code that can then be executed. If the user is sophisticated enough, they can review the code and understand what's going on under the covers and modify it accordingly. This could be through a development environment or directly in the code itself.
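Editor's illustration: as a rough sketch of "figure out how they join together and even write the code," the Python below scores column pairs by value overlap and emits a SQL statement for the best pair. This is a plain heuristic (Jaccard similarity of column values), not machine learning, and it is not taken from any specific product. The table names, column names, and functions `suggest_join` and `generate_sql` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Table = List[Dict[str, object]]


def suggest_join(left: Table, right: Table) -> Optional[Tuple[str, str, float]]:
    """Return the (left column, right column, score) pair with the highest value overlap."""
    if not left or not right:
        return None
    best: Optional[Tuple[str, str, float]] = None
    for lcol, rcol in product(left[0].keys(), right[0].keys()):
        lvals = {row[lcol] for row in left}
        rvals = {row[rcol] for row in right}
        score = len(lvals & rvals) / len(lvals | rvals)  # Jaccard similarity
        if score > 0 and (best is None or score > best[2]):
            best = (lcol, rcol, score)
    return best


def generate_sql(left_name: str, right_name: str, lcol: str, rcol: str) -> str:
    """Write join code that a sufficiently technical user can review and modify."""
    return (
        f"SELECT * FROM {left_name} JOIN {right_name} "
        f"ON {left_name}.{lcol} = {right_name}.{rcol}"
    )


campaigns = [
    {"campaign_id": 10, "customer": "C1"},
    {"campaign_id": 11, "customer": "C2"},
]
orders = [
    {"order_id": 100, "cust_id": "C1", "total": 50},
    {"order_id": 101, "cust_id": "C2", "total": 75},
    {"order_id": 102, "cust_id": "C3", "total": 20},
]

suggestion = suggest_join(campaigns, orders)
if suggestion is not None:
    left_col, right_col, _score = suggestion
    sql = generate_sql("campaigns", "orders", left_col, right_col)
```

Note that the differently named columns `customer` and `cust_id` are matched on their values alone, and the generated SQL remains visible for the analyst to inspect and change.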
Dan O'Brien
So there are some clear benefits of a data fabric. Would you say there are many risks or weaknesses with data fabric?
Jay Piscioneri
I think the hard part of data fabric is the integration of many different tool sets, right? And the risks associated with that are: do you have the technical skill within your company to figure that out? How do you integrate with tools you already have?
And then one of the most common risks for just about any IT initiative is biting off more than you can chew. So start small. Experiment and iterate and then expand to more use cases, bringing on more data sources.
Dan O'Brien
And that leads into the next question. You emphasize that data fabric is not a product. Can you say why that is? And is it possible that it will be offered as a product in the future?
Jay Piscioneri
So that statement may not be as true as it once was. And I can think of at least one vendor that would argue that point with gusto.
The reason why we say that it's not a product is because it's really a matter of integrating a number of different products, right? You've got functionality to (1) connect to the data, (2) retrieve the data, like a query engine, (3) cataloging, and (4) pipeline development. These are all classes of products. And in a data fabric, you're basically making them work together in a coordinated way. That's the work or the product of the fabric.
Now there are product suites from different companies that traverse those different layers of the architecture. But if you're taking a best-of-breed approach, or especially if you've already got a number of different products implemented within the company, then that integration becomes the heavy lifting of implementation.
Dan O'Brien
Well, fantastic. Thank you for speaking with me today, Jay. I really appreciate it. If you want to learn more, you can read Jay's report “Data Fabric: The Next Step in the Evolution of Data Architectures” at our website, www.EckersonGroup.com. And Jay, what is the best way for listeners to keep up with your work?
Jay Piscioneri
So I think the Eckerson Group website, EckersonGroup.com. And if you look under the blogs section, you'll see my blog. That lists blogs I write on these subjects plus reports and webinars that we do.
And then also on LinkedIn. I'm always posting on LinkedIn, chiming in on different conversations and posting questions. I'm interested in learning other people's opinions and experiences as well.