Data Observability: the Path to Healthier Analytics and AI Operations

For years, enterprise data pipelines served a stable, predictable world. Then data demand and supply exploded, and things got complicated.

A growing population of enterprise data consumers now tackles new use cases with new algorithms and new data from new sources, often in real time. This forces data teams to adopt new tools and run analytics workloads on new platforms. The complexity, as described in my earlier blog, boils down to a simple problem: we spend too many calories managing and analyzing data rather than deriving value from it.

Data observability, or observability for short, proposes a systemic solution that takes a fresh approach compared with previous generations of application performance monitoring (APM), DataOps and ITOps tools. This blog defines observability and examines how key stakeholders can use it to meet business requirements for production analytics and AI workloads. It explains how data observability applies to three overlapping segments, namely on-premises Hadoop data lakes, hybrid cloud and multi-cloud environments, to help data analytics leaders understand how best to regain control of all those pipelines.

Scan the forest, and inspect the trees

Observability software seeks to monitor, automatically detect, predict, prevent and resolve the issues that break production analytics and AI workloads. It correlates thousands of data workload events across multiple servers, nodes, clusters, containers and applications. With observability, business owners, analysts, architects and engineers can speak a common language. They share intuitive multi-layered and cross-sectional views to understand causes and inter-dependencies across the application, data and infrastructure layers of the stack.

Here is a quick look at how data observability builds on previous generation technologies:

  • APM. Traditional APM tools monitor operational applications and their underlying resources to diagnose and help remediate issues. Data observability applies these familiar APM functions to analytics and AI workloads.
  • DataOps. The emerging discipline of DataOps improves data pipelines with continuous integration and deployment (CI/CD), workflow orchestration, testing and monitoring. Observability addresses the monitoring aspect of DataOps for data quality, lineage and performance.
  • ITOps. For years IT operations tools provisioned, managed, monitored and tuned infrastructure. Now the AIOps subset of ITOps sharpens IT monitoring, diagnosis and remediation—as does observability for your data pipelines.

Data observability glues together these pieces to provide end-to-end views and correlate analytics workload events across the layers. Data engineers, for example, can isolate a slow ETL job at the data layer, then work with platform or site reliability engineers (SREs) to isolate the storage configuration error that is causing it. They can resolve data quality issues more rapidly, for example by pinpointing the root cause in a complex ETL job. They can prioritize machine learning algorithms by execution time, then drill down into the problematic areas for inspection and debugging. Observability also helps analysts and business owners control the efficiency and cost of their enterprise analytics and AI stacks.
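To make the cross-layer correlation described above concrete, here is a minimal sketch in Python. All event names, node IDs and the five-minute window are invented for illustration; this is not Acceldata's API, just the general idea of pairing a degraded job at the data layer with infrastructure alerts that fired nearby in time:

```python
from datetime import datetime, timedelta

# Hypothetical, simplified event records from two layers of the stack.
job_events = [
    {"job": "etl_orders", "status": "slow", "ts": datetime(2024, 1, 5, 2, 14)},
]
infra_events = [
    {"node": "dn-07", "alert": "disk_io_saturated", "ts": datetime(2024, 1, 5, 2, 12)},
    {"node": "dn-03", "alert": "ok", "ts": datetime(2024, 1, 5, 1, 0)},
]

def correlate(job_events, infra_events, window=timedelta(minutes=5)):
    """Pair each degraded job with infrastructure alerts that fired within the window."""
    findings = []
    for job in job_events:
        if job["status"] != "slow":
            continue
        for infra in infra_events:
            if infra["alert"] != "ok" and abs(infra["ts"] - job["ts"]) <= window:
                findings.append((job["job"], infra["node"], infra["alert"]))
    return findings

print(correlate(job_events, infra_events))
# [('etl_orders', 'dn-07', 'disk_io_saturated')]
```

A real observability platform does this at the scale of thousands of events and adds lineage and configuration context, but the core move, joining events across layers by time and topology, is the same.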

With data observability:

  • Platform engineers and site reliability engineers ensure infrastructure performance, efficiency and capacity.
  • Architects and data engineers improve data access, data lineage and data quality. 
  • Analysts and business owners make smarter decisions, plan their analytics processes more effectively, and control costs.

Forests to Observe

Observability provides the greatest value for three overlapping types of environments: on-premises Hadoop data lakes, hybrid cloud, and multi-cloud. Each brings a different mix of challenges and requirements.

Hadoop on-premises. Observability helps data teams manage the Hadoop data lakes that persist in large on-premises environments. Enterprises planted some analytics data and workloads in Hadoop 5-10 years ago, and many have been slow to abandon the investment altogether even as more manageable, cost-effective alternatives arise on the cloud. As a result, many data teams still run analytics on Hadoop, and need help maintaining performance levels across myriad Apache open source components. Spark accelerates batch processing on Hadoop compared with MapReduce, but often still needs careful monitoring, troubleshooting and debugging to meet production analytics latency and throughput requirements. Data observability can assist.
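As a hedged sketch of the kind of latency monitoring involved, the snippet below flags Spark batch runs that breach a latency target. The run records and the 900-second SLA are invented for illustration; in practice the durations would come from a source like the Spark History Server's REST API rather than a hard-coded list:

```python
# Hypothetical batch-run durations (seconds) for one Spark pipeline.
runs = [
    {"run_id": "r1", "duration_s": 540},
    {"run_id": "r2", "duration_s": 1260},
    {"run_id": "r3", "duration_s": 610},
]

SLA_SECONDS = 900  # assumed latency requirement for this pipeline

def breaches(runs, sla=SLA_SECONDS):
    """Return run IDs whose duration exceeded the SLA, for triage and debugging."""
    return [r["run_id"] for r in runs if r["duration_s"] > sla]

print(breaches(runs))
# ['r2']
```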

Hybrid cloud. Data warehouses and data lakes are converging on a common set of functions in the cloud, pairing high-performance SQL query structures with efficient, elastic object storage. Enterprises adopt these cloud data platforms, such as Microsoft Azure Synapse Analytics, Databricks and Snowflake, to reduce administrative hassle and consolidate data workloads. But data teams still need observability to keep a close eye on cloud platform performance, for example to meet BI query latency requirements. They also need to reduce the risk of compute cost overruns—and maintain all the necessary linkages back to legacy on-premises systems.
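The cost-overrun risk above can be reduced with simple guardrails. Here is an illustrative sketch; the warehouse names, credit figures and budgets are hypothetical, and in a real deployment the usage series would come from the cloud platform's billing or usage views:

```python
# Hypothetical daily compute-credit usage per warehouse.
daily_credits = {
    "bi_warehouse": [42.0, 48.5, 95.0],
    "etl_warehouse": [110.0, 102.0, 98.0],
}

def over_budget(daily_credits, daily_budget):
    """Flag warehouses whose most recent day of usage exceeds its daily budget."""
    return {wh for wh, series in daily_credits.items()
            if series[-1] > daily_budget.get(wh, float("inf"))}

print(over_budget(daily_credits, {"bi_warehouse": 60.0, "etl_warehouse": 120.0}))
# {'bi_warehouse'}
```

An observability platform layers alerting, forecasting and chargeback on top of checks like this, but the principle is the same: compare observed spend against an agreed budget before the overrun compounds.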

Hybrid multi-cloud. As enterprise data teams gain experience on the cloud, they seek to optimize workloads and meet specialized requirements by shopping around for new AI tools, cloud data platforms, etc. As a result, many enterprise environments now include two or even three Cloud Service Providers (such as AWS, Microsoft Azure, Google Cloud) in addition to legacy on-premises systems. Data observability helps them discover assets in a complex data landscape, understand these distributed topologies and monitor data flows into and between clouds. It also helps business owners compare the efficiency and cost of different clouds, to optimize and rebalance workloads.

Start Observing

Like its predecessors APM, DataOps, ITOps and AIOps, data observability is a critical capability because it addresses an indisputable technology problem, namely complexity, that will persist for years to come. Data analytics leaders can use data observability to simplify how they manage data pipelines, and forge a path to healthier AI & analytics operations.

To learn more about Acceldata’s data observability solution, click here.

Kevin Petrie

Kevin is the VP of Research at BARC US, where he writes and speaks about the intersection of AI, analytics, and data management. For nearly three decades Kevin has deciphered...
