Architecting and Automating Data Pipelines: A Guide to Efficient Data Engineering for BI and Data Science
Explosive growth in data volumes, users, and use cases causes pain for all data stakeholders, and especially for architects and data engineers. Demand for analytics-ready data far outstrips the capacity of data engineering groups to build and support data pipelines. This supply-and-demand imbalance puts pressure on everyone who works with data and gives rise to a multitude of data management challenges.
Architecture is an essential first step along the path to data pipeline automation. Data pipeline architecture is the framework that establishes standards and conventions for data pipelines and supports the definition of patterns and templates that achieve consistency and accelerate pipeline development. Well-architected data pipelines lead to marked improvements in data access, analytics value, data engineer and data analyst productivity, and adaptability to changing business, regulatory, and technical environments.
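To make the idea of patterns and templates concrete, here is a minimal sketch in Python, with hypothetical names, of how a pipeline template might standardize the extract, transform, and load steps so that individual pipelines supply only their source-specific logic.

```python
# A minimal sketch of a reusable pipeline template (hypothetical names).
# The template enforces a standard step sequence and conventions; each
# concrete pipeline supplies only its own configuration and step functions.
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class PipelineTemplate:
    name: str
    extract: Callable[[], Iterable[Any]]      # pull raw records from a source
    transform: Callable[[Any], Any]           # clean and standardize one record
    load: Callable[[Iterable[Any]], None]     # write analytics-ready output

    def run(self) -> None:
        raw = self.extract()
        cleaned = (self.transform(record) for record in raw)
        self.load(cleaned)


# A concrete pipeline built from the template: only source-specific logic varies.
orders_pipeline = PipelineTemplate(
    name="orders_daily",
    extract=lambda: [{"order_id": 1, "amount": "42.50"}],
    transform=lambda r: {**r, "amount": float(r["amount"])},
    load=lambda rows: print(list(rows)),
)
orders_pipeline.run()
```

Because every pipeline built this way shares the same structure, new pipelines are faster to develop and easier for other engineers to understand and maintain.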
Despite the common use of the term data engineering, most modern data pipelines are handcrafted with little attention to frameworks, standards, reusable components, or repeatable processes. Data pipeline automation shifts pipeline development away from handcrafting and toward a true engineering discipline. It applies technology to gain efficiency and improve effectiveness in data pipeline development and operations, and it goes beyond simply automating the development process to encompass all aspects of pipeline engineering and operations, including design, development, testing, deployment, orchestration, and change management.
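To illustrate how automation reaches beyond development into testing and deployment, the following sketch, again with hypothetical names, shows the kind of automated check on a transform step that a continuous integration process might run before a pipeline change is deployed.

```python
# A minimal sketch (hypothetical names) of automated testing for a pipeline step,
# the kind of check a CI/CD process would run before deploying a pipeline change.
def transform_order(record: dict) -> dict:
    """Standardize one raw order record into its analytics-ready form."""
    return {**record, "amount": float(record["amount"])}


def test_transform_order_parses_amount() -> None:
    raw = {"order_id": 1, "amount": "42.50"}
    assert transform_order(raw)["amount"] == 42.50


if __name__ == "__main__":
    test_transform_order_parses_amount()
    print("transform tests passed")
```

Checks like this, run automatically on every change, are one small piece of the broader automation of testing, deployment, orchestration, and change management described above.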