Data Architecture-as-a-Service: Liberation for Data Users
ABSTRACT: Data architecture-as-a-service, or DaaS, is a new self-service paradigm that empowers local data owners to create architecturally compliant data repositories.
We are about to make a giant leap forward in self-service data and analytics. We’ve developed self-service tools for reporting, analysis, dashboarding, data set preparation, and even data science (e.g., AutoML). What we haven’t yet delivered are no-code tools that enable business users to create their own data repositories without IT assistance.
Versus data silos. Of course, we’ve always had data silos. Business users create them all the time in Excel, easy-to-use databases like Microsoft Access or SQL Server, or data preparation tools like Alteryx or Tableau. What we need are “modern data silos” that enforce architectural integrity and data consistency using common dimensions, definitions, and logic. These self-service structures provide the speed and agility of data silos without the harmful consequences. These “non-siloed data silos” are the essence of what I call “data architecture-as-a-service,” or DaaS.
Business-built data domains. Data architecture-as-a-service enables business users to build local data domains or repositories without undermining enterprise data consistency and trustworthiness. It is the culmination of self-service, where business units liberate themselves almost entirely from enterprise IT. If done right, DaaS reduces data bottlenecks, eases the burden on enterprise data teams, and empowers local domains to serve their own data needs. It’s also a key ingredient in the data mesh, an emerging distributed architecture for data ownership and management.
How is this possible? Here’s the challenge: we can’t expect data analysts to do the work of data architects or data engineers. They don’t know how to design, model, and implement robust, scalable data environments, or how to build data pipelines that reuse standard data flows and naming conventions. We’ve seen what happens when they try: they create brittle, high-risk data silos and pipelines that don’t scale or perform well. With DaaS, however, we bake architectural requirements into self-service data engineering tools so business users can create their own repositories without undermining data consistency and trustworthiness.
Software building blocks. In our consulting practice, we’ve seen enterprise data architects create data “building blocks” that departmental analysts use to create extensions to an enterprise data warehouse. The blocks contain governance guardrails that enable analysts to create their own data marts without deep knowledge of SQL, data structures, query logic, or schemas.
Unfortunately, it’s a heavy lift for most enterprise data teams to create a self-service data infrastructure given competing demands for their time. Fortunately, some vendors have recognized an opportunity and now offer architecture-as-a-service tools. These products come in a variety of shapes and forms.
Extensible data models. For instance, cloud data analytics vendors, such as Domo and Infor Birst, provide multi-tenant data environments with extensible data models. This enables primary tenants to propagate a global model to sub-tenants who can extend that model by adding new columns and tables to support local requirements. The global model rolls down to sub-tenants, while local data and model extensions stay local. This hub-and-spoke approach is ideal for supporting retail and manufacturing distribution networks but can be applied in almost any data environment.
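To make the hub-and-spoke idea concrete, here is a minimal Python sketch, assuming a data model can be represented as a mapping of table names to column definitions. The names GlobalModel, SubTenant, extend, and effective_model are hypothetical illustrations; this is not the Domo or Infor Birst API. The sketch shows the three properties described above: the global model rolls down to each sub-tenant, local columns and tables stay local, and sub-tenants cannot redefine columns owned by the global model.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalModel:
    """Enterprise model owned by the primary tenant (hypothetical structure)."""
    tables: dict[str, dict[str, str]]  # table name -> {column name: data type}


@dataclass
class SubTenant:
    """A sub-tenant that inherits the global model and adds local extensions."""
    name: str
    parent: GlobalModel
    # Local extensions only: new tables, or new columns on global tables.
    local: dict[str, dict[str, str]] = field(default_factory=dict)

    def extend(self, table: str, column: str, dtype: str) -> None:
        """Add a local column; redefining a column owned by the global model is rejected."""
        if column in self.parent.tables.get(table, {}):
            raise ValueError(f"{table}.{column} is owned by the global model")
        self.local.setdefault(table, {})[column] = dtype

    def effective_model(self) -> dict[str, dict[str, str]]:
        """Global model rolls down; local extensions are layered on top of a copy."""
        merged = {t: dict(cols) for t, cols in self.parent.tables.items()}
        for table, cols in self.local.items():
            merged.setdefault(table, {}).update(cols)
        return merged
```

For example, a regional sub-tenant could add a promo_zone column to the shared store table and a local_events table of its own. Both appear in its effective model, while the primary tenant's model is unchanged. Because effective_model reads the parent at call time, any later change to the global model also rolls down automatically.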
Self-service data engineering. More recently, data engineering vendors such as Coalesce and Fivetran have begun offering multi-code, template-driven toolkits that make it easy for data analysts or domain data owners to create repositories that align with enterprise governance and schema requirements. Most of these tools are cloud-based variants of data integration, data transformation, or data warehouse automation tools.
For example, Coalesce, which launched last month, is a data transformation vendor that offers a more modern version of dbt, a popular open source data transformation toolkit. Founded by ex-WhereScape employees, Coalesce offers both GUI- and code-based development environments, a column-aware architecture that supports full data lineage, and built-in automation functions. What I like best about this new product, however, is that it allows data architects to build architectural guardrails into the GUI-based development environment, via templates and other techniques, so that business analysts can build architecturally compliant data repositories and pipelines.
Similarly, Fivetran is a data integration vendor that offers a more automated approach to centralizing cloud application data. This makes it possible for a data analyst, rather than a data engineer, to build architecturally compliant data pipelines that move data from a cloud application into a target database and run pre-built transformation processes to harmonize that data into a common schema. Both Coalesce and Fivetran are harbingers of what we expect will be a booming market for DaaS tools.
In transition. Today, a highly motivated data analyst might be able to use any number of GUI-based data engineering tools to build a data pipeline, but there is little chance the result will comply with architectural guidelines or governance standards. That still requires a trained data engineer whose work is reviewed by an enterprise data architect. In a data architecture-as-a-service paradigm, by contrast, a data architect configures a DaaS-ready tool to adhere to enterprise data standards and structures, so that a data analyst rather than a data engineer can build compliant data pipelines.
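The division of labor described above, where the architect configures rules once and the analyst designs freely within them, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Coalesce, Fivetran, or any other product implements guardrails: ArchitectTemplate, ProposedTable, and validate are hypothetical names, and the three example rules (a table naming pattern, reuse of conformed dimension keys, and required audit columns) are assumed for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchitectTemplate:
    """Rules a data architect configures once (hypothetical example rules)."""
    table_pattern: str                    # naming convention for new tables (regex)
    conformed_dimensions: frozenset[str]  # shared dimension keys analysts must reuse
    required_columns: frozenset[str]      # e.g., audit or lineage columns


@dataclass
class ProposedTable:
    """A table an analyst designs in a GUI tool (hypothetical structure)."""
    name: str
    columns: list[str]
    dimension_keys: list[str]


def validate(table: ProposedTable, template: ArchitectTemplate) -> list[str]:
    """Return guardrail violations; an empty list means the design is compliant."""
    problems = []
    # Rule 1: table names must follow the enterprise naming convention.
    if not re.fullmatch(template.table_pattern, table.name):
        problems.append(f"name '{table.name}' breaks naming convention")
    # Rule 2: joins must use conformed dimensions, not locally invented keys.
    for key in table.dimension_keys:
        if key not in template.conformed_dimensions:
            problems.append(f"'{key}' is not a conformed dimension")
    # Rule 3: every table must carry the required audit columns.
    for col in sorted(template.required_columns - set(table.columns)):
        problems.append(f"missing required column '{col}'")
    return problems
```

As a usage example, with a template that requires fact tables to match fct_[a-z_]+, reuse customer_key and date_key, and carry load_ts and source_system, an analyst's fct_returns table that follows those rules returns an empty list. A table named MyReturns keyed on a homegrown cust_id returns four violations. In a real DaaS-ready tool, checks like these would presumably run inside the GUI as the analyst builds, rather than as a separate review step.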
Conclusion. Data architecture-as-a-service is a verbal twist on cloud processing environments, such as software-as-a-service or platform-as-a-service. This moniker conveys that it’s possible to abstract architecture and build it into easy-to-use, customer-facing tools. When we abstract data architecture, we solve the most enduring data pain point: the proliferation of data silos that wreak havoc on data consistency and trustworthiness.