Operational Data Architecture
ABSTRACT: For many years data architecture efforts have focused on analytics. Now it is time to give attention to operational data architecture.
Over the past 20 years or more, data architects and data architecture practices have focused almost exclusively on managing data for analytics, with operational data considered simply as data sources. Operational data is much more than source data for analytics. It is the data used to run the business, while analytical data is the data used to observe the business. Day-to-day operation of the business doesn’t work without operational data, yet we’ve largely ignored it as a critical part of data management architecture.
Operational systems are a prime source of technical debt, and operational data is a significant contributor to many data management challenges. Operational systems and data are especially significant for data sprawl, data silos, and data disparity. Data architecture that ignores operational data compounds these problems.
The Nature of Operational Systems
Over the decades that we’ve disregarded operational data, many changes have occurred in the world of operational systems and data. What were once primarily transactional systems have evolved to encompass transactions, workflow automation, process automation, commercial IoT, and industrial IoT. Here is a summary of those operational systems and the key architectural questions they pose to data teams.
Operational systems now encompass transactions, workflow automation, process automation, commercial IoT, and industrial IoT
Transactional Systems record business interactions as they occur in day-to-day operation of the business. Purchases, sales, payments, hiring, termination, and many more kinds of transactions occur with data captured as a historical record and audit trail of business activity. Transactions may be recorded in real time or with some latency. Transactional data typically flows to analytics systems as source data for data warehousing and data lakes, often with the latency inherent in batch processing. Does data architecture need to rethink this?
Workflow Automation streamlines business processes, coordinating actions across systems, data, and teams. Workflow systems support workflow and document configuration, task management, document flow, authorization and signatures, and policy compliance tracking. Workflow automation tools standardize forms and processes, pass data between systems, and collect data about the state of tasks and forms. These systems collect and store data in the form of documents as well as records of document routing and processing. Should the documents, or the document flow records, flow to a data lake?
Manufacturing Automation systems integrate machines, software, and data. Manufacturing processes run autonomously or with a reduced workforce. Manufacturing automation includes robotics, machine control, machine performance monitoring, preventive maintenance, programmatic customization of products, and quality control. Machines receive data to control their operations and collect data to monitor conditions and behaviors. These systems rely heavily on real-time and very low latency data. Operationally, much of this data flows in real time. Should it also flow to a warehouse or data lake as history of machine operations and machine performance?
Commercial IoT Systems integrate software, networked devices, and real-time data to automate management processes for commercial buildings such as office buildings, stores, shopping malls, hotels, etc. These systems monitor and adjust environmental conditions such as temperature and air quality, economize utility costs and energy consumption, manage and monitor building access, and provide other smart building capabilities. Operationally, the data flows in real time. Should it also flow to a data lake to record the history of smart building operations?
Industrial IoT (IIoT) Systems interconnect sensors, instruments, machines, devices, and computer applications in areas such as manufacturing and energy management. These interconnected components collect and exchange data to achieve productivity and efficiency gains. IIoT systems consist of devices capturing sensor data, networks connecting devices, services processing data, and content delivery of IIoT metrics. Operationally, the data flows in real time. Should it also flow to a data lake to record the history of industrial operations?
The Challenges of Operational Systems
Operational systems include debt-heavy legacy systems and a steadily increasing proportion of purchased applications (ERP, SaaS, etc.), each with unique semantics, data models, and proprietary data architecture. The result is continuing accumulation of technical debt that inhibits data sharing, data interfaces, data quality, and even data meaning. These issues create architectural problems of two kinds: data sharing and data integration among operational systems, and interoperability between analytic and operational data architectures.
Operational systems pose two architectural problems: data sharing and integration, and analytics-operations interoperability
Data Sharing and Data Integration Among Operational Systems
Data sharing challenges are ubiquitous in operational systems. Technologically the systems and databases typically operate as silos, but the business processes that they support are interconnected and interdependent. The result is many point-to-point system interfaces and data feeds—i.e., a tangled mess of data flows that is a barrier to change and an ongoing maintenance problem. (See figure 1.) The overarching problem of operational data integration is semantic differences among the many operational applications. Operational data sharing and data integration are difficult when systems operate as data silos without a unifying semantic model.
Figure 1. Point-to-Point System Interfaces
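The maintenance burden of point-to-point integration grows quadratically: fully interconnecting N systems requires N(N−1)/2 distinct interfaces, while a hub-based approach needs only N connections (one per system). A small illustrative calculation makes the contrast concrete:

```python
def point_to_point_interfaces(n_systems: int) -> int:
    """Number of distinct pairwise interfaces among n systems."""
    return n_systems * (n_systems - 1) // 2


def hub_interfaces(n_systems: int) -> int:
    """Each system connects once to a shared hub."""
    return n_systems


for n in (5, 10, 20):
    print(f"{n} systems: {point_to_point_interfaces(n)} point-to-point "
          f"interfaces vs. {hub_interfaces(n)} hub connections")
```

At 20 systems, the tangle in figure 1 implies up to 190 interfaces to maintain, versus 20 with a hub.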
Semantic Integration of Operational Data
Semantic integration (not physical data integration) is fundamental to resolving both data sharing and data integration difficulties. A common semantic model is the foundation of many practical architectural solutions including operational data store, operational data hub, zero-copy data network, and data products.
Common Semantic Model
Mapping data silos to a semantic model is the first step to breaking down the silos. Figure 2 shows an example of an ontology as a semantic model. The example illustrates concepts (what data modelers know as entities) and relationships among concepts. To work well as a foundation for operational data sharing and integration, the model must be extended to include properties (what data modelers know as attributes).
At this level of detail, the ontology is represented as a graph data model. It can be developed using any of the many graph modeling tools that are available. Visual representation may be different depending on the tool of choice, but the components of the model are consistent. These include nodes to represent concepts or entities, edges (arrows) to represent relationships, and lists to represent properties.
Figure 2. Semantic Model Example
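The three components described above can be sketched in plain Python. This is a minimal illustration, not a modeling tool; the concept, relationship, and property names are hypothetical examples, not taken from figure 2:

```python
# Nodes represent concepts (entities), each carrying a list of
# properties (attributes). Names here are illustrative only.
ontology_nodes = {
    "Customer": {"properties": ["customer_id", "name", "email"]},
    "Order":    {"properties": ["order_id", "order_date", "total"]},
    "Product":  {"properties": ["product_id", "description", "unit_price"]},
}

# Edges (arrows) represent relationships as (source, relationship, target).
ontology_edges = [
    ("Customer", "places",   "Order"),
    ("Order",    "contains", "Product"),
]


def relationships_of(concept: str):
    """Return every relationship in which a concept participates."""
    return [e for e in ontology_edges if concept in (e[0], e[2])]
```

A graph modeling tool would render the same structure visually, but the underlying components are identical: nodes, edges, and property lists.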
This model is the control structure to connect the dots among data silos. Mapping the data stores of each independent application and database to the semantic model is the means to identify overlaps, common characteristics, and differences among the databases. You’ll find applications that store the same data using different terminology, applications that store different data using similar terminology, applications that store related data with different meanings, and applications that store similar data with different formats and encoding. Ontology becomes the common denominator and controlled vocabulary used to understand data silos through a single lens. How ontology is applied varies for each of several architecture patterns—operational data store, operational data hub, zero-copy data network, and data products—that can be used to resolve data silos.
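The mapping exercise described above can be sketched as a simple lookup from each application's local field names to the ontology's controlled vocabulary. The application and field names below are hypothetical; the point is that grouping local fields by the ontology term they map to exposes where silos store the same data under different terminology:

```python
# Hypothetical local schemas mapped to a shared ontology vocabulary.
# Keys are (application, local_field); values are "Concept.property"
# terms drawn from the semantic model.
semantic_map = {
    ("crm",     "cust_no"):    "Customer.customer_id",
    ("billing", "account_id"): "Customer.customer_id",  # same data, different term
    ("crm",     "cust_name"):  "Customer.name",
    ("billing", "name"):       "Customer.name",
}


def find_overlaps(mapping):
    """Group local fields by ontology term, keeping only terms that
    appear in more than one silo -- the overlaps to reconcile."""
    groups = {}
    for (app, local_field), term in mapping.items():
        groups.setdefault(term, []).append(f"{app}.{local_field}")
    return {term: fields for term, fields in groups.items() if len(fields) > 1}
```

Running `find_overlaps(semantic_map)` reveals that `crm.cust_no` and `billing.account_id` are the same data viewed through the single lens of the ontology.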
Operational Data Architectural Styles
Several architectural styles or design patterns can be applied to meet the challenges of operational data management. Operational data store (ODS) and master data management (MDM) are relatively mature patterns targeting specific data management needs. ODS is an approach to fixed-schema data integration most commonly applied for enterprise reporting, and not particularly effective for sharing data among operational systems. MDM is effective for data integration and data sharing. It is limited to master data, which is a small subset of the full scope of operational data.
More recent architectural patterns include operational data hub (ODH), zero-copy data network, and data products for data sharing. A quick search of the term “operational data hub” yields several definitions and descriptions with little consistency among them.
The operational data hub as illustrated in figure 3 is a messaging hub where applications share information about data events using a publish-and-subscribe model.
Figure 3. Operational Data Hub
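The publish-and-subscribe model behind the operational data hub can be sketched in a few lines. This is a minimal in-process illustration of the pattern, not a production messaging system; the topic name and event fields are hypothetical:

```python
from collections import defaultdict


class OperationalDataHub:
    """Minimal publish-and-subscribe hub: applications subscribe to
    data-event topics and are notified when events are published."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, callback):
        """Register a callback to receive events on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, event: dict):
        """Deliver an event to every subscriber of its topic."""
        for callback in self._subscribers[topic]:
            callback(event)


# Usage: a billing application reacts to customer updates from the CRM.
hub = OperationalDataHub()
received = []
hub.subscribe("customer.updated", received.append)
hub.publish("customer.updated", {"customer_id": 42, "email": "new@example.com"})
```

The key design point is decoupling: the publishing application knows nothing about its subscribers, so systems can be added or removed without rewiring point-to-point interfaces.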
The zero-copy data network is an emerging pattern, with supporting technology still evolving, that links data based on a common semantic model. (See Figure 4.) To learn more about zero-copy data management visit the Data Collaboration Alliance, and see Cinchy to learn about supporting technology.
Figure 4. Zero Copy Data Network
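The core idea of zero-copy, as a rough sketch, is that applications hold live references to a single authoritative record (keyed by terms from the common semantic model) rather than maintaining their own copies. The class and term names below are hypothetical illustrations of the principle, not the design of any specific product:

```python
class ZeroCopyNetwork:
    """Sketch of the zero-copy idea: one authoritative store of records,
    with applications holding live references instead of copies."""

    def __init__(self):
        self._records = {}  # ontology term -> authoritative value

    def set(self, term: str, value):
        """Update the single authoritative value for a term."""
        self._records[term] = value

    def ref(self, term: str):
        """Return a zero-argument accessor: a live reference, not a copy."""
        return lambda: self._records[term]


network = ZeroCopyNetwork()
network.set("Customer.email", "old@example.com")
crm_view = network.ref("Customer.email")       # CRM holds a reference
network.set("Customer.email", "new@example.com")
# crm_view() now yields the updated value without any data movement.
```

Because no application keeps a private copy, there is nothing to synchronize and no drift between silos.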
Product-based data sharing is based on the domain data management principles popularized as a core concept of data mesh architecture. (See figure 5.) Operational systems are grouped as data management domains based on business process affinity. Each domain is responsible for publishing data products that are accessible to other domains.
Figure 5. Data Sharing with Data Products
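The publish-and-discover relationship between domains can be sketched as a product catalog. This is a minimal illustration of the pattern; the product name, owning domain, and schema fields are hypothetical, and a real data product would also carry documentation, quality guarantees, and access controls:

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A published, discoverable dataset owned by one domain."""
    name: str
    owner_domain: str
    schema: dict

    def read(self):
        # In practice this would query the owning domain's storage;
        # stubbed out here for illustration.
        return []


@dataclass
class DomainCatalog:
    """Registry through which domains publish products for other domains."""
    products: dict = field(default_factory=dict)

    def publish(self, product: DataProduct):
        self.products[product.name] = product

    def discover(self, name: str) -> DataProduct:
        return self.products[name]


# The sales domain publishes; any other domain can discover and read.
catalog = DomainCatalog()
catalog.publish(DataProduct("orders.daily", "sales",
                            {"order_id": "str", "total": "float"}))
```

Ownership stays with the publishing domain; consumers depend on the product's published name and schema, not on the domain's internal systems.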
Beyond this Introduction
There is much more to operational data architecture than I can express in a single 1,500-word article. I have barely introduced the three styles of ODH, zero-copy, and data products. In future articles, I will explore each of those in greater depth. In the near term, you can join me for an Operational Data Architecture tutorial at Enterprise Data World Digital on March 27.