Good Data Architecture - Anti-Monolith

ABSTRACT: A fresh perspective on data architecture, advocating for an 'anti-monolith' approach inspired by engineering best practices.

System Design

Although I’ve been tackling data architecture challenges for nearly 3 decades, I started my career as a mechanical engineer. Among the many things I appreciated about this mature engineering discipline were the concepts of component and system engineering. At some level of complexity, components become a useful way of breaking down and parallelizing design tasks for more complex systems. A factory is not designed as a monolith, but rather broken down into independently designed and constructed components, interfaced through well-defined system requirements.

Over the past decade, software engineering has also started to emerge as a mature engineering discipline. Among the many facets of this movement, microservices have emerged as an established pattern for systems of significant complexity. As in the factory example, microservices enable large systems, such as an ecommerce platform, to be broken down into smaller independent components, integrated through well-defined contracts. This not only enables design and build efficiencies, but also creates a more resilient and more evolutionary system design.

Evolutionary architecture is perhaps one of the most powerful of these enablements. In tightly coupled, monolithic systems the idea of changing or upgrading even just one component of the system seems daunting, or perhaps even impossible. In microservice architecture, a component can be changed or wholly rewritten with relatively low risk, so long as it upholds its contract. These changes can be largely invisible to downstream components of the system.
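To make the contract idea concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the `PriceService` contract, both implementations, and the `order_total_cents` consumer. It only shows the shape of the pattern, in which the downstream function depends on the contract and never on a specific implementation.

```python
from decimal import Decimal
from typing import Protocol


class PriceService(Protocol):
    """Hypothetical contract: consumers rely only on this signature."""

    def price_cents(self, sku: str) -> int: ...


class TablePriceService:
    """Original implementation: prices stored as integer cents."""

    _PRICES = {"widget": 250, "gadget": 1000}

    def price_cents(self, sku: str) -> int:
        return self._PRICES[sku]


class DecimalPriceService:
    """A full rewrite with different internals that still honors the contract."""

    _PRICES = {"widget": Decimal("2.50"), "gadget": Decimal("10.00")}

    def price_cents(self, sku: str) -> int:
        # Internals changed (dollars as Decimal), but the output is still cents.
        return int(self._PRICES[sku] * 100)


def order_total_cents(prices: PriceService, skus: list[str]) -> int:
    """Downstream component: unaware of which implementation it receives."""
    return sum(prices.price_cents(sku) for sku in skus)
```

Swapping `TablePriceService` for `DecimalPriceService` requires no change to `order_total_cents`, which is the low-risk replacement property described above.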

Data Engineering is Decades Behind

My career has been divided, in near equal parts, between data engineering and software engineering. Data is where my heart is, and it should be an important part of any technology system build. However, having been on both sides, I’ve been frustrated that my chosen specialty remains decades behind software engineering.

Most data professionals have used the phrase “data is just different” as a crutch when explaining why tools and processes have remained so primitive. Having worked so long in this world, I get it: there are many challenges, some self-inflicted and others imposed by larger industry and cultural constraints.

Our philosophy is centered around a “single version of the truth”. Although this is a noble goal, it has resulted in monolithic architecture. The majority of data platforms consist of a single complex code base and a large central database. Worse yet, in the largest organizations even a single monolith is infeasible, resulting in multiple siloed monoliths with no interoperability.

Organizations have adopted large, centralized teams and ownership models. This has created the impossible job of knowing everything about both the technology and the business, and it causes cross-team friction for almost any initiative.

Finally, there is the technology and tooling problem within data engineering. We establish our contracts at the database level instead of through APIs, which are more capable of graceful evolution. There is minimal adoption of continuous integration, continuous deployment, automated testing, and observability, all established standards in software engineering.
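As a rough illustration of why an interface-level contract evolves more gracefully than a shared table, consider the sketch below. The `CustomerRecord` contract, the `to_contract` adapter, and the physical column names (`cust_id`, `email_addr`, `tier`) are all assumptions made up for this example, not a reference to any particular platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CustomerRecord:
    """Hypothetical published contract: consumers see these fields only."""

    customer_id: int
    email: str
    # Added in a later revision; the default keeps existing consumers working.
    loyalty_tier: Optional[str] = None


def to_contract(row: dict) -> CustomerRecord:
    """Map a physical storage row (assumed column names) onto the contract."""
    return CustomerRecord(
        customer_id=row["cust_id"],
        email=row["email_addr"],
        loyalty_tier=row.get("tier"),  # tolerate rows written before the column existed
    )
```

If the storage team later renames `email_addr`, only `to_contract` changes. Consumers that read the table directly would all break; consumers of `CustomerRecord` would not notice.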

Positive Progress on the Horizon 

The good news is that there has been progress and discourse on the maturation of data architecture over the past few years. From a tooling perspective, we have seen CI/CD and observability become more mainstream. For most of my career, I’ve been screaming about automated data quality checks and integration testing, and writing my own tooling. Now, there is a diverse ecosystem of open-source and commercial tooling and emerging quality standards.
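For readers who have not seen one, an automated data quality check can be very small. The following is a hand-rolled sketch, not the API of any of the tools alluded to above; the function names and the `order_id` column are hypothetical. A CI job could run something like this against a sample of each new batch and fail the build when the result is non-empty.

```python
def check_not_null(rows, column):
    """Return the indexes of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def check_unique(rows, column):
    """Return the indexes of rows whose `column` value was already seen."""
    seen = set()
    duplicates = []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value in seen:
            duplicates.append(i)
        seen.add(value)
    return duplicates


def run_checks(rows, key_column):
    """Run both checks and return only the ones that failed."""
    results = {
        "not_null": check_not_null(rows, key_column),
        "unique": check_unique(rows, key_column),
    }
    return {name: bad for name, bad in results.items() if bad}
```

For example, `run_checks(rows, "order_id")` returns an empty dictionary for a clean batch and a mapping from check name to offending row indexes otherwise.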

On the anti-monolith front, we have seen amazing and thought-provoking discussions around Data Mesh design. Data Mesh is essentially a sociotechnical approach to applying microservice patterns to data architecture. Although it is largely conceptual, there are practical technical patterns that we should already be incorporating into our data architecture and engineering implementation. In the coming years, we will likely see more of these concepts refined for adoption.

dbt (data build tool) has also emerged as a great solution to many of these challenges. It addresses many of the aforementioned technical aspects, and it provides practical Data Mesh support in the form of data contracts and cross-project references.

How to Anti-Monolith?

My grandfather had a saying, “It’s better to do something than nothing”, which is a philosophy I try to apply to many complex problems. As mentioned above, there are practical patterns and technologies that we can immediately apply to our work as data professionals. We’ll continue to provide thought leadership and resources to support the community in doing “something” about anti-monolith architecture and modernization.   

In the meantime, let’s think twice before saying, “data is just different.”

Elliott Cordo

Elliott is the Founder of Datafutures, an expert in data and software engineering, cloud architecture, and technology innovation with a passion for helping transform data into a powerful organizational...
