The Yin and Yang of Data Architecture

ABSTRACT: Most data architecture discussions are heavily biased toward managing data for analytics. Well-designed data architecture must give equal attention to managing operational data.

Data management architecture is a continuously evolving field. As the world of data continues to change, we experience corresponding changes in how we think about managing data. Today’s data architecture trends and discussions are heavily biased toward managing data for analytics, with lots of attention to big data, scalability, cloud, and cross-platform data management.


Modern data architecture is heavily biased toward analytical data.

Failure to address operational data management is a sure path to technical debt.

We need to acknowledge this architectural bias and consciously avoid it!


We need to acknowledge this architectural bias and make a conscious effort to avoid it. Well-designed data architecture must give equal attention to managing operational data. Failure to address operational data management is a sure path to technical debt and future data management difficulties.

This is what I think of as the yin and yang of data management. In ancient Chinese philosophy, the concept of yin and yang describes how “opposite or contrary forces may be complementary, interconnected, and interdependent.”1 This duality of forces applies directly to the world of business, and subsequently to the world of data management. Analytical data provides information that supports the planning and decision-making activities of business management. Operational data captures and applies information that supports the day-to-day activities of running the business. Analytics is the yang: ambitious and goal-oriented. Operations is the yin: focused and task-oriented. The two dimensions of business (and of data) are clearly complementary, interconnected, and interdependent. It is not practical to operate a business without active management, and management without operations has no purpose.

Master data is one example of applying the yin and yang concept to data. All master data begins as operational data. It is operational processes, not analytical systems, that create instances of customers, products, accounts, and so on. But much of the value of master data is realized in analytical systems, where it supplies the conformed dimensions for data analysis. Data analysis, in turn, provides feedback that informs the operational processes and activities that create and update master data.

Let’s be clear about the terms operational data and analytical data. It can be argued that there is no difference: the same data is used for operational and analytical processes. From one perspective that is true. Data about transactions with customers, for example, is used both in operations and in analytics. Yet the transaction data created and stored by operational processes is most commonly copied, modified, and stored separately (in data warehouses, data lakes, etc.) for use by analytic processes. Ideally, at some point in the future we will end the practice of copying data, then making copies of copies, then copies of copies of copies, and so on. The promise of zero-copy integration is enticing, but we’re not there yet. Most of the original data that we work with is operational data, and most of the copies that we create are analytical data. But those distinctions are less important than the focus on operational data management and analytical data management, concepts that will continue to be important when zero-copy integration is a mainstream practice.
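
To make the distinction concrete, here is a minimal sketch in Python of the same transaction data serving both roles: an operational process creates and updates records, and an analytical copy is reshaped and stored separately for analysis. All names (sales_orders, order_facts) are hypothetical, invented for illustration.

```python
# Hypothetical illustration: the same transaction data in operational
# and analytical roles. Names are invented for this sketch.

# Operational data: created and updated by day-to-day business processes.
sales_orders = {
    1001: {"customer_id": "C-17", "amount": 250.00, "status": "open"},
}

def close_order(order_id: int) -> None:
    # An operational update that supports running the business.
    sales_orders[order_id]["status"] = "closed"

close_order(1001)

# Analytical data: a reshaped *copy*, stored separately (a warehouse
# fact table, for example) to support planning and decision making.
order_facts = [
    {"order_id": oid, "customer_id": o["customer_id"], "amount": o["amount"]}
    for oid, o in sales_orders.items()
]

total_revenue = sum(f["amount"] for f in order_facts)
```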

Legacy Data Management Architecture

Typical data management architecture tends to isolate analytical and operational data management, treating them as separate and independent things. (See figure 1.) Although referred to as “legacy” data architecture, these practices are still very much in place in many organizations today. 

Figure 1. Legacy Data Management Architecture

Legacy Operational Data Management

In many ways, operational data management in legacy architecture is treated as a subset of operational systems management. A variety of data management methods are embedded in the individual systems that create and maintain the data—OLTP systems, automation systems, and IoT systems. Disparate data is partially, and perhaps inadequately, integrated with an Operational Data Store (ODS) and/or Master Data Management (MDM) system. 

OLTP systems are the primary means to create and manage operational data. Data silos are created because individual systems typically work with their own unique semantic models and database schemas. The mix of SaaS applications, ERP systems, custom development, and decades-old legacy systems creates barriers to shared data models and data harmonization.
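
As a simple illustration of the silo problem, consider the same real-world customer represented in two systems with different field names and semantics. The sketch below is hypothetical (the field names and mappings are invented); real harmonization involves far more than field mapping, but mapping to a shared model is the essential idea.

```python
# Hypothetical records for the same customer in two siloed systems.
crm_customer = {"cust_no": "C-17", "full_name": "Acme Corp", "st": "WA"}
erp_customer = {"customer_id": 17, "name": "ACME CORPORATION", "state_code": "WA"}

def to_shared_model(record: dict, mapping: dict) -> dict:
    # Translate a system-specific record into the shared semantic model.
    return {shared: record[local] for shared, local in mapping.items()}

# Without explicit mappings like these, each schema remains a silo.
crm_mapping = {"customer_key": "cust_no", "customer_name": "full_name", "state": "st"}
erp_mapping = {"customer_key": "customer_id", "customer_name": "name", "state": "state_code"}

print(to_shared_model(crm_customer, crm_mapping))
print(to_shared_model(erp_customer, erp_mapping))
```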

Automation systems such as those for workflow automation and manufacturing automation create operational data—and additional data silos. Similarly, commercial and industrial IoT systems may add to operational data volume, variety, and disparity. These systems are not transactional in nature, but they are operational and they create and use operational data. 

An ODS is sometimes used to perform limited operational data integration, designed primarily to support cross-application reporting needs. The heavy lifting of data integration is treated as a responsibility of data warehousing—an analytical data management process.


It is important to remember that master data originates from business operations.
It is operational data, not analytical data.


MDM is responsible for data integration in some specific domains, such as customer and product. We often treat MDM as if it were analytical data management. This is not surprising when we view master data as the dimensions by which data is analyzed, but it is important to remember that master data originates from business operations—it is operational data. Instances of customer, product, location, account, etc. are created by operational systems, not by analytical systems.

Legacy Analytical Data Management

Legacy architecture for analytical data management consists primarily of data warehousing and data lake management. Data warehousing is the older of the two approaches, preceding the advent of big data. Despite many declarations that the data warehouse is dead, data warehousing continues to be a common practice and a core component of analytical data management. We continue to manage and maintain data warehouses because they meet some business data needs. Yet they are problematic because data warehouses have proliferated. Polls that I have conducted consistently show approximately 60% of respondents with 2 to 5 data warehouses, and less than 10% with only 1 data warehouse. That means around 30% with more than 5 legacy data warehouses. Additionally, data warehousing struggles to support needs for scalability, real-time data, non-relational data, and other modern data management requirements.


Today we see data lakes as a new generation of data management problems.


More recently came the data lake. A decade ago it was hyped as the solution to the data management problems of the data warehouse—scalability, big data, real time, etc. Today, we see data lakes more realistically as a new generation of data management problems. Multiple and complex data pipelines, many moving parts in data management, many dependencies among those parts, cloud and multi-cloud deployments, cross-platform data and processing demands, and much more—these are the technical difficulties at the center of today’s data management struggles. In addition to technical difficulties, we wrestle with organizational and cultural challenges—autonomy and self-service vs. rigorous governance, agile development and DevOps at odds with quality control practices, enterprise data management vs. domain data management, and more.

And so continues the evolution of data management. Today’s discussions center around new thinking in data management architecture. Top-of-mind topics for data architects include data fabric and data mesh. It is in these new directions that we’ll find the yin and yang of data management. 

New Directions in Data Management

Rethinking data management architecture begins with the assertion put forth at the beginning of this article: Today’s data architecture trends and discussions are heavily biased toward managing data for analytics. We need to acknowledge this architectural bias and make a conscious effort to avoid it. Well-designed data architecture must give equal attention to managing operational data.

A single, consolidated architecture to manage both operational and analytical data (see figure 2) is a logical next step in the continuing evolution of data management. 

Figure 2. New Directions in Data Management

Note the big difference from the legacy data management architecture shown in figure 1. It is a single view of data management instead of two loosely connected views. This architecture addresses both operational and analytical data, but it manages them together. They are complementary, interconnected, and interdependent—the yin and yang of data management. A look inside shows more shifts in architectural thinking.

New Directions in Operational Data Management

Operational data management continues to work with data from OLTP, automation, and IoT systems. Data silos will persist in this new architecture, but they can be broken down by connecting data through a shared semantic model implemented with a knowledge-graph-enabled data catalog.
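
A knowledge graph can express that shared semantic model as connections between silo fields and shared business concepts. The sketch below is a deliberately simplified illustration with hypothetical names; a real implementation would live inside a knowledge-graph-enabled data catalog, not in application code.

```python
# Hypothetical semantic model as (subject, predicate, object) triples.
# Field and concept names are invented for this sketch.
catalog_graph = [
    ("crm.cust_no",        "maps_to", "shared:CustomerKey"),
    ("erp.customer_id",    "maps_to", "shared:CustomerKey"),
    ("shared:CustomerKey", "part_of", "shared:Customer"),
]

def fields_for_concept(concept: str) -> list[str]:
    # Traverse the graph to find every silo field bound to a shared concept.
    return [s for s, p, o in catalog_graph if p == "maps_to" and o == concept]

print(fields_for_concept("shared:CustomerKey"))
# ['crm.cust_no', 'erp.customer_id'] -- the silos are now connected.
```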

ODH – An operational data hub replaces the legacy integration approach of the operational data store. Recall that the ODS is designed primarily to support cross-application reporting needs. The ODH goes beyond reporting to support cross-application data exchange. The tangled web of point-to-point data feeds and interfaces is replaced by a data hub that works well with publish-and-subscribe interfaces. Cutting-edge ODH implementations contain data products instead of simple datasets.
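
The essential pattern here is publish-and-subscribe: a producer publishes a change once, and every interested application receives it, so point-to-point feeds disappear. Below is a minimal in-process sketch with hypothetical topic and field names; a production hub would be built on a messaging or data hub platform rather than local callbacks.

```python
from collections import defaultdict
from typing import Callable

class OperationalDataHub:
    # A toy stand-in for a real hub, to show the pub/sub shape.
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, record: dict) -> None:
        # One publish reaches every subscriber; no point-to-point feeds.
        for handler in self._subscribers[topic]:
            handler(record)

hub = OperationalDataHub()
hub.subscribe("customer.updated", lambda r: print("billing sees", r))
hub.subscribe("customer.updated", lambda r: print("shipping sees", r))
hub.publish("customer.updated", {"customer_key": "C-17", "state": "WA"})
```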

MDM/RDM – Master data management and reference data management take the place of basic MDM. Sharing reference data among operational systems is one step toward integration and reduction of data disparity. Note that the MDM/RDM component has shifted from mostly operational to a balance of operational and analytical. Master data instances—customers, products, etc.—continue to be created and updated by operational systems. But they are enriched by connecting with web, social media, and other external and big data sources.
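
Sharing reference data can be as simple as translating each system's local codes to a shared code set. The sketch below uses hypothetical code values to show the idea.

```python
# Hypothetical shared reference data and local code translations.
shared_country_codes = {"US": "United States", "DE": "Germany"}

# Each system carries its own local codes; the shared set reconciles them.
local_to_shared = {
    "crm": {"USA": "US", "GER": "DE"},
    "erp": {"840": "US", "276": "DE"},
}

def shared_country(system: str, local_code: str) -> str:
    # Translate a system-local code to the shared reference code.
    return local_to_shared[system][local_code]

# Two different local codes resolve to the same shared value.
assert shared_country("crm", "USA") == shared_country("erp", "840") == "US"
```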

New Directions in Analytical Data Management

Analytical data management is still built on data warehousing and data lake management. The architectural shift is in the relationship of data lake and data warehouse. In legacy architecture they are separate and independent things. Rethinking addresses the need for data lake and data warehouse cohesion—the data lake and the data warehouse must work together. For modern data warehousing, it makes sense that the data warehouse is a distinct zone within the data lake. (Positioning of legacy data warehouses is a complex topic that is too long to include in this article, but there are practical ways to connect them with modern architectures—perhaps a topic for a future blog.)
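
One way to picture the warehouse-as-a-zone idea is as a set of zones within a single lake storage platform. The zone names and paths below are hypothetical conventions for illustration, not a standard.

```python
# Hypothetical zone layout: one storage platform, distinct zones,
# with the warehouse as a zone inside the lake.
LAKE_ZONES = {
    "raw":       "s3://corp-lake/raw/",        # data as ingested, unmodified
    "curated":   "s3://corp-lake/curated/",    # cleansed, harmonized data
    "warehouse": "s3://corp-lake/warehouse/",  # modeled, query-ready tables
}

def zone_path(zone: str, dataset: str) -> str:
    # The warehouse is not a separate platform, just a governed zone.
    return LAKE_ZONES[zone] + dataset

print(zone_path("warehouse", "order_facts"))
```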

One very significant difference between this view and the legacy data management view is the positioning of the data lake and data warehouse. They are bounded by and operate within data fabric and/or data mesh architectural constructs.

Data fabric provides a single, unified platform for data management across multiple technologies and deployment platforms. Data fabric has three specific high-level objectives: 

  • to resolve the complexities of cross-platform data management, 

  • to manage multiple and complex data pipelines, and

  • to provide frictionless access to data wherever it is stored.

Data fabric is designed to address many of the technical difficulties of legacy data management.
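
The frictionless-access objective, in particular, implies a layer that lets consumers request data by logical name while the fabric resolves location and platform. Here is a minimal sketch, with hypothetical dataset names and locations.

```python
# Hypothetical catalog of where datasets live across platforms.
dataset_locations = {
    "orders":      {"platform": "postgres", "address": "ops-db/orders"},
    "order_facts": {"platform": "s3", "address": "s3://corp-lake/warehouse/order_facts"},
}

def resolve(dataset: str) -> dict:
    # Hide platform details behind a single logical lookup.
    return dataset_locations[dataset]

# A consumer asks by name and never handles cross-platform plumbing.
print(resolve("order_facts")["address"])
```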

Data mesh is an emerging data management architecture based on a big philosophical shift away from centralized enterprise data management. Domain data management places data ownership and data management close to those who are the data subject experts and who work most closely with the data. Data sharing is achieved through data products published by the domains. Data governance constraints, interoperability standards, and shared infrastructure support connections and dependencies among independent domains.
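
In code terms, a data product is more than a dataset: it couples the data with domain ownership and an interoperability contract for consumers. The sketch below is a hypothetical, deliberately simplified shape for such a product.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    # A toy data product shape; real products add SLAs, lineage, etc.
    name: str          # discoverable product name
    owner_domain: str  # the domain accountable for the data
    schema: dict       # interoperability contract for consumers
    records: list = field(default_factory=list)

    def read(self) -> list:
        # Consumers use the published interface, not domain internals.
        return list(self.records)

orders_product = DataProduct(
    name="orders.daily",
    owner_domain="sales",
    schema={"order_id": "int", "amount": "float"},
    records=[{"order_id": 1001, "amount": 250.00}],
)
print(orders_product.read())
```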

Extending fabric and mesh into operational data management is a practical thing to do. Although figure 2 shows them as contained in analytical data management, both can be configured to address some operational data management needs. Data mesh, for example, works when domain principles are applied to application-centric data management. The data fabric principles of frictionless data access can be applied to operational data as readily as to analytical data. 

Blending Data Fabric and Data Mesh

Modern data architecture trends (and the corresponding hype) have many people asking: Data Fabric or Data Mesh – which way to go? Both architectural approaches promise to solve some of today’s data management challenges. Don’t get hung up on the “fabric or mesh” question. Instead, think about fabric and mesh. Data fabric and data mesh are not alternatives. They are not mutually exclusive. They are complementary, and they can and should work well together.


Data fabric is data pipeline oriented. Data mesh is data product oriented.


Designing a hybrid fabric/mesh architecture to meet your data management needs makes a lot of sense. Data fabric is pipeline oriented. Data mesh is data product oriented. Both are important to data management. Data fabric focuses on moving data from point-of-access to point-of-use, often transforming the data along the way. If we can’t deliver data to points of use, then the data becomes useless. This is especially important with today’s sprawl of data across multi-cloud and hybrid deployment platforms. Data mesh focuses on data products: assemblies that can be built using data as raw materials. A data product approach increases the value that can be derived from data. Store data to make it persistent and accessible. Move data through pipelines to deliver it to where it is needed. Build data products to tightly bind data with purpose.
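
Put together, a hybrid design uses fabric-style pipelines to move and transform data, and mesh-style products to bind the delivered data to a purpose. Here is a minimal sketch with hypothetical names, reusing the spirit of the earlier examples.

```python
# Fabric concern: move data from point-of-access to point-of-use,
# transforming it along the way. Field names are hypothetical.
def pipeline(source_rows: list[dict]) -> list[dict]:
    return [{"order_id": r["id"], "amount": round(r["amt"], 2)} for r in source_rows]

# Mesh concern: bind the delivered data to a purpose as a product.
def publish_product(name: str, rows: list[dict]) -> dict:
    return {"name": name, "schema": {"order_id": "int", "amount": "float"}, "rows": rows}

# The pipeline delivers; the product gives the delivery a purpose.
product = publish_product("orders.daily", pipeline([{"id": 1001, "amt": 250.0}]))
print(product["name"], len(product["rows"]), "rows")
```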

On May 25 and 26, I will be teaching a live, online, and highly interactive 2-day workshop, Blending Data Fabric and Data Mesh – A Pragmatic Approach to Data Architecture, that is specifically designed to take you into this next generation of data management architecture. Please join us if you’d like to dive deeper into these exciting trends. 

Dave Wells

Dave Wells is an advisory consultant, educator, and industry analyst dedicated to building meaningful connections throughout the path from data to business value. He works at the intersection of information...
