Data Fabric - Hype, Hope, or Here Today?
What is Data Fabric?
Data fabric is a combination of architecture and technology designed to ease the complexities of managing many different kinds of data, using multiple database management systems, deployed across a variety of platforms. A typical data management organization today has data deployed in on-premises data centers and multiple cloud environments. It has data in flat files, tagged files, relational databases, document stores, graph databases, and more. Processing spans technologies from batch ETL to change data capture, stream processing, and complex event processing. The variety of tools, technologies, platforms, and data types makes it difficult to manage processing, access, security, and integration across multiple platforms. Data fabric provides a consolidated data management platform: a single platform to manage disparate data and divergent technologies deployed across multiple data centers, both cloud and on-premises.
Why Data Fabric?
The complexities of modern data management expand rapidly as new technologies, new kinds of data, and new platforms are introduced. As data becomes increasingly distributed across in-house and cloud deployments, the work of moving, storing, protecting, and accessing data becomes fragmented, with different practices depending on data locations and technologies. Changing and bolstering data management methods with each technological shift is difficult and disruptive, and as technology innovation accelerates this approach will quickly become unsustainable. Data fabric can minimize disruption by creating a highly adaptable data management environment that can quickly be adjusted as technology evolves. A data fabric platform should have features for:
- Unified data management: Providing a single framework to manage data across multiple and disparate deployments reduces the complexity of data management.
- Unified data access: Providing a single and seamless point of access to all data regardless of structure, database technology, and deployment platform creates a cohesive analytics experience working across data storage silos.
- Consolidated data protection: Data security, backup, and disaster recovery methods are built into the data fabric framework. They are applied consistently across the infrastructure for all data whether in cloud, multi-cloud, hybrid, or on-premises deployments.
- Centralized service level management: Service levels related to responsiveness, availability, reliability, and risk containment can be measured, monitored, and managed with a common process for all types of data and all deployment options.
- Cloud mobility and portability: Minimizing the technical differences that lead to cloud service lock-in and enabling quick migration from one cloud platform to another supports the goal of a true hybrid cloud environment.
- Infrastructure resilience: Decoupling data management processes and practices from specific deployment technologies makes for a more resilient infrastructure. Whether adopting edge computing, GPU databases, or not yet known technology innovations, the data fabric’s management framework offers a degree of “future proofing” that reduces the disruptions of new technologies. New infrastructure end-points are connected to the data fabric without impact to existing infrastructure and deployments.
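To make the "unified data access" idea above concrete, here is a minimal sketch of a single access facade that routes requests to disparate backends, a relational store and a document store, so consumers work through one point of access instead of backend-specific APIs. The names (`DataFabric`, `register`, `query`) are hypothetical illustrations for this article, not any vendor's actual interface.

```python
# Hypothetical sketch: one facade over heterogeneous data stores.
# Real data fabric platforms add metadata, security, and policy layers;
# this only illustrates the single-point-of-access pattern.
import sqlite3


class DataFabric:
    """Single point of access over heterogeneous data stores."""

    def __init__(self):
        self._stores = {}  # logical name -> (kind, handle)

    def register(self, name, kind, handle):
        # A real fabric would also capture schema, location, and policy
        # metadata here; this sketch records only the connection handle.
        self._stores[name] = (kind, handle)

    def query(self, name, request):
        # Dispatch on store type so callers never touch backend APIs.
        kind, handle = self._stores[name]
        if kind == "relational":
            return handle.execute(request).fetchall()
        if kind == "document":
            # For the document store, 'request' is a filter predicate.
            return [doc for doc in handle if request(doc)]
        raise ValueError(f"unknown store kind: {kind}")


# One relational store (in-memory SQLite) and one document store
# (a plain list standing in for a document database).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
db.execute("INSERT INTO orders VALUES (1, 9.5), (2, 20.0)")
docs = [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}]

fabric = DataFabric()
fabric.register("orders", "relational", db)
fabric.register("customers", "document", docs)

rows = fabric.query("orders", "SELECT id, amount FROM orders WHERE amount > 10")
eu = fabric.query("customers", lambda d: d["region"] == "EU")
```

The design choice worth noting is the decoupling: because callers address stores by logical name, a backend can be swapped (say, migrated from one cloud to another) by re-registering the name, without changes to consuming code, which is the resilience argument made in the list above.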
Hype or Reality?
The promise of data fabric is sure to generate interest. But we’ve seen the waves of technology “silver bullets” many times in the past. Will data fabric live up to the promises? That’s a hard question to answer without a crystal ball. Data fabric is still emerging and evolving but the positive signs are abundant and clear. Many technology vendors have embraced the data fabric concept and are developing and delivering a variety of product offerings:
- Denodo, an innovator and leader in data virtualization, is moving rapidly toward a data fabric platform.
- Oracle Coherence, a large-scale data grid, has the defining characteristics of data fabric.
- Paxata is stretching the AI and machine learning capabilities of their data preparation technology in ways that begin to shape a data fabric offering.
- Talend has defined a data fabric functional architecture in which many of the Talend technologies fill roles to provide a data fabric platform.
- Cambridge Semantics’ Anzo Smart Data Lake technology has full reach across multi-cloud, hybrid cloud, and on-premises data deployments.
- Cloudera and Hortonworks are both evolving data fabric solutions that extend their management solutions for the Hadoop ecosystem.
- Syncsort is moving quickly toward a data fabric offering as they expand and evolve their AI and machine learning capabilities.
- Trifacta continues to enhance their data wrangling technology with advanced algorithms and smart automated processing that reaches across multiple data platforms.
- Informatica continues their long-standing practice of innovation, extending their data management technologies to reach cloud, multi-cloud, and hybrid cloud environments.
You can be sure that I’ve missed some software vendors with this list. It is intended to be illustrative, not exhaustive. But I think it is safe to say, with ten or more vendors of this magnitude reaching for data fabric, that it is real—still developing and still evolving, but very real.
Hype? No, it is certainly more than that. Hope? Yes, it does offer hope to ease the pain of multi-platform data management. Here today? Well … not fully mature yet, but coming fast. Data fabric will quickly become a centerpiece of smart data management.