Apache Spark

An in-memory, open-source engine for processing large volumes of data that supports data science, data engineering, and SQL workloads on single nodes or clusters.

Added Perspectives

The Spark platform prepares the data in micro-batches to be consumed by the HDInsight data lake, SQL data warehouse, and various other internal and external subscribers. These targets subscribe to topics that are categorized by source tables. With this CDC-based architecture, StartupBackers is now efficiently supporting real-time analysis without affecting production operations.

- Kevin Petrie in "Best Practices for Real Time Data Pipelines with Change Data Capture and Spark," August 8, 2018

(Blog)
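
The flow described in the excerpt above (change events grouped into micro-batches, then published to topics categorized by source table) can be sketched in a few lines. This is a plain-Python illustration only, not Spark code and not the architecture's actual implementation: the event shape, the `cdc.` topic prefix, and the helper names `micro_batches` and `route_by_table` are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import islice
from typing import Dict, Iterable, Iterator, List

# Hypothetical change event shape: {"table": ..., "op": ..., "row": ...}
ChangeEvent = Dict[str, object]


def micro_batches(events: Iterable[ChangeEvent], size: int) -> Iterator[List[ChangeEvent]]:
    """Yield consecutive lists of at most `size` events."""
    it = iter(events)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def route_by_table(batch: List[ChangeEvent]) -> Dict[str, List[ChangeEvent]]:
    """Group one micro-batch into topics named after each event's source table."""
    topics: Dict[str, List[ChangeEvent]] = defaultdict(list)
    for event in batch:
        # The "cdc." prefix is an assumed naming convention, not from the source.
        topics[f"cdc.{event['table']}"].append(event)
    return dict(topics)


# Sample change events, invented for illustration.
events = [
    {"table": "orders", "op": "insert", "row": {"id": 1}},
    {"table": "customers", "op": "update", "row": {"id": 7}},
    {"table": "orders", "op": "delete", "row": {"id": 1}},
]

# Each element is one micro-batch, already split into per-table topics.
published = [route_by_table(batch) for batch in micro_batches(events, size=2)]
```

In a real deployment, Spark's streaming APIs and a message broker would likely take the place of these two helpers. The sketch only shows why a subscriber interested in one source table can read a single topic without touching the production database.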