Data Fabric’s Use of Abstraction and Metadata
ABSTRACT: Data fabric may be a buzzword. But it’s shorthand for data issues that we must manage. Read on to learn what data fabric is and why it’s important.
This blog is sponsored by InterSystems.
Data fabric is one of those buzzwords that’s used so much and in so many ways that it often elicits an eyeroll—undeservedly so. The phrase is shorthand for a complex and important set of issues that we’re all working to manage. In this article we’ll review what data fabric is and why it’s important. We’ll focus on how it uses abstraction and metadata to address the challenges of modern data.
Data fabric’s objective is to increase the value organizations derive from their data in light of exploding data volumes, sources, formats, and use cases. It’s an architectural approach that uses metadata, machine learning, and automation to weave together data of any format in any location and make it easy for people and systems to find and consume. It unifies the separate functions of data management—integration, preparation, cataloging, security, and discovery—into a cohesive process through intelligent automation.
Business teams need to make fast data-driven decisions and data teams need new ways to keep up with the demand for data. The data fabric addresses these challenges by offering the following benefits:
Faster time to insight. Time to insight is a measure of data management effectiveness: the amount of time it takes to use data to reach an “aha” moment that informs business action. Data fabric seeks to significantly reduce time to insight. In addition to providing intelligent automation and enhanced discoverability, a data fabric hides the complexities of enterprise data so that consumers don’t have to know where data is stored or what format it’s in. Removing those details lets data consumers focus on conducting their analyses rather than wrangling the data they need.
Reduced data management workload. Data fabric uses AI and machine learning to automate many data management tasks, such as cataloging new data sources, improving discoverability through natural language search, and assisting data preparation. Automation can eliminate time-consuming manual steps that make it difficult, if not impossible, for data teams to keep up with demand for data.
More effective data discovery and access. A fully populated and up-to-date data catalog, combined with advanced search capabilities, enables data consumers to know what data is available to them. It’s like having Google for enterprise data. Keyword search functions, along with rich documentation about data assets, enable business analysts, data scientists, and data teams to find data resources and evaluate their value for a given use case. Discovery ultimately leads to data access: data consumers need to connect to the data resources they’ve found and put them to use. Data fabric enables direct access to data or the ability to request access from the data owner.
Abstraction is key to how data fabric delivers these benefits. First, let’s define the term. Abstraction is the process of removing details that are not relevant in a certain context in order to emphasize other details that are relevant for a given purpose. Data fabric uses abstraction in a variety of ways. As a first step, it removes the details of source data format and location, thus emphasizing the content of the data that the source provides (see figure 1).
Figure 1. Data Fabric Creates Abstracted Data Objects
Data Sources to Abstracted Data Objects
The abstraction layer delivers data in either physical or virtual form. Data pipelines create physical abstracted data objects (ADOs), while data virtualization techniques create virtual ADOs. Virtualization is preferable because it reduces cost and risk by not making copies of data. However, sometimes performance issues or security constraints require data to be copied and moved. So, a data fabric needs to do both.
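The split between physical and virtual ADOs can be made concrete with a small sketch. This is an illustrative Python example under assumed names, not a real fabric API: `AbstractedDataObject`, `PhysicalADO`, and `VirtualADO` are hypothetical, and a production fabric would sit on pipelines and virtualization engines rather than in-memory rows.

```python
from abc import ABC, abstractmethod

class AbstractedDataObject(ABC):
    """Uniform interface: consumers see rows, not formats or locations."""
    @abstractmethod
    def read(self) -> list[dict]: ...

class PhysicalADO(AbstractedDataObject):
    """Materialized by a pipeline: a copy of the data lives in the fabric."""
    def __init__(self, rows: list[dict]):
        self._rows = rows          # snapshot taken at pipeline run time
    def read(self) -> list[dict]:
        return self._rows

class VirtualADO(AbstractedDataObject):
    """Virtualized: fetches from the source on each read; no copy is kept."""
    def __init__(self, fetch):
        self._fetch = fetch        # callable that queries the live source
    def read(self) -> list[dict]:
        return self._fetch()
```

Consumer code calls `read()` on either kind of ADO without knowing whether a copy was made, which is exactly the detail the abstraction layer removes.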
ADOs and Derived Data Views
Abstracted data supports a wide range of analytic uses. ADOs that mirror source objects create a logical data lake that supports the data science need for flexible data exploration and mash-up. In addition, power users can access ADOs through the preparation function to create derived data views that are optimized for specific analytic uses. These derived data views can support multiple scenarios. For example, they can serve as conformed data sets for business intelligence, or serve as prepared data inputs for predictive models (see figure 2).
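As a rough illustration of a derived data view, the sketch below conforms inconsistent regional sales records into a BI-ready aggregate. The data and function names are invented for this example; a real preparation function would run inside the fabric’s tooling rather than over a Python list.

```python
# Hypothetical ADO rows, exposed as plain records by the abstraction layer.
raw_ado = [
    {"region": "emea", "amt": "120.5"},
    {"region": "EMEA", "amt": "79.5"},
    {"region": "amer", "amt": "200.0"},
]

def conformed_sales_view(rows):
    """Derive a BI-ready view: normalize codes, cast types, aggregate."""
    totals: dict[str, float] = {}
    for row in rows:
        region = row["region"].upper()    # conform inconsistent region codes
        totals[region] = totals.get(region, 0.0) + float(row["amt"])
    return totals

print(conformed_sales_view(raw_ado))      # {'EMEA': 200.0, 'AMER': 200.0}
```

The same ADO could feed a different derived view, such as prepared inputs for a predictive model, without touching the source data.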
Figure 2. Derived Data Views
Metadata serves as the foundation of the data fabric. Data fabric uses metadata to create its abstraction layers. While abstraction hides the details of data’s location and format, metadata keeps track of those details for use in functions such as connecting to sources, retrieving data, and making ADOs discoverable. Data fabric also uses metadata to power AI-driven automation of tasks such as data classification and preparation.
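A minimal sketch can show this dual role of metadata: the catalog records the location and format details that abstraction hides from consumers, and discovery queries run against names and tags rather than storage paths. All identifiers below are invented for illustration.

```python
# Hypothetical metadata catalog: the fabric knows where data lives and
# in what format, even though consumers never see these details.
catalog = {
    "customer_orders": {
        "location": "s3://sales-bucket/orders/",   # hidden from consumers
        "format": "parquet",
        "tags": ["sales", "orders", "transactions"],
    },
    "support_tickets": {
        "location": "postgres://crm-db/tickets",
        "format": "table",
        "tags": ["support", "customers"],
    },
}

def find_assets(keyword: str) -> list[str]:
    """Discovery: match assets by name or tag, never by storage details."""
    return [name for name, meta in catalog.items()
            if keyword in name or keyword in meta["tags"]]

print(find_assets("sales"))   # ['customer_orders']
```

The same metadata that answers the discovery query also holds everything the fabric needs to connect to the source and retrieve the data.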
Data Fabric Approaches
There are many approaches to designing a data fabric. The challenges and opportunities your organization faces determine which aspects of data fabric should be emphasized. Here are three approaches, each of which focuses on different business problems:
Empowering businesspeople. Empowering businesspeople with access to data so that they can answer questions and make informed decisions is an important objective of data fabric. If this is your primary concern, then the ease of discovery and preparation functions are critical. For example, natural language processing (NLP), a form of artificial intelligence, makes finding data easier. It takes regular sentences or phrases and intelligently parses the words to find relevant data resources.
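The parsing idea can be sketched in a few lines: break a plain-English request into keywords and score catalog entries by overlap. This is a deliberately simplified stand-in for NLP; real fabrics use richer language models, and every name and tag below is hypothetical.

```python
# Hypothetical catalog entries and their descriptive tags.
CATALOG = {
    "quarterly_revenue": {"sales", "revenue", "finance", "quarter"},
    "web_traffic_logs": {"web", "traffic", "clicks", "marketing"},
}
STOPWORDS = {"show", "me", "the", "for", "last", "data", "a", "an"}

def search(question: str) -> list[str]:
    """Parse a plain-English question and rank assets by keyword overlap."""
    terms = {w for w in question.lower().split() if w not in STOPWORDS}
    scored = [(len(terms & tags), name) for name, tags in CATALOG.items()]
    return [name for score, name in sorted(scored, reverse=True) if score]

print(search("show me revenue for the last quarter"))  # ['quarterly_revenue']
```

An analyst types a sentence, not a query language, and still lands on the relevant data asset.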
All-purpose analytics. The data fabric supports data science, embedded analytics, and streamlined data management. Data scientists need the unaltered data of ADOs (see figure 1 above) to organize and refine data for AI/ML models and for real-time analytics embedded in applications. Companies with large, complex data environments have a greater need for automated data management functions to keep up with cataloging, data quality, and compliance.
Operational data management. Up to now, we’ve been discussing how a data fabric supports analytics use cases. But data fabric can also play a role in the transactional world of operations. Managing real-time integration between applications to bridge data silos, and reaching the edge of the data landscape where IoT devices live, are examples of transactional use cases that require the most advanced approach to data fabric. For example, the InterSystems IRIS data platform uses a distributed architecture with high-performance transactional-analytic database management, integration, and analytics capabilities. The platform enables high-volume, real-time processing with embedded analytics for AI/ML, business intelligence, NLP, business rules, and business-user self-service, powering a broad range of low-latency operational data management use cases.
Data Fabric and Beyond
Data fabric is a technology-driven approach for democratizing data in today’s vast and complex data landscape. It uses abstraction, metadata, and AI/ML automation to reduce data management workloads and help teams keep up with never-ending demand. Through advanced discoverability and access, it enables consumers to find and use the data they need. Data fabric removes the complications of disparate and distributed enterprise data, allowing users to focus on deriving insights and gaining business value rather than wrangling data.
As data fabric continues to evolve, its scope will likely expand to encompass other aspects of data management, such as DataOps and observability. We can also expect to see a blending of the organizational innovations of data mesh—domain-oriented data ownership and data as a product—with the architectural patterns of data fabric. In future articles, we'll explore the cross-pollination of data fabric and data mesh ideas.
To learn more about data fabric, see my full report, Data Fabric: The Next Step in the Evolution of Data Architectures.