The Universal Semantic Layer: More Than Enough?

ABSTRACT: A "universal" semantic layer delivers uniformly defined business metrics across the data stack, but a BI solution may offer a cost-effective alternative.

According to a 2022 analysis of large manufacturing and consumer packaged goods (CPG) companies, 97 percent of employees are not “data practitioners.” Yet many of these workers must make decisions based on data they can (or want to) access themselves.

A semantic layer can help this majority of less-technical workers gain immediate and consistent value from data. 

What is a semantic layer? What is its value?

A semantic layer is a logical representation of underlying data that defines commonly used business metrics in terminology that can be easily and consistently understood among all users, especially non-technical users. Conceptually, the semantic layer resides between data sources and the presentation layer, the latter of which may include one or more business intelligence tools, notebooks, data apps, or data science tools. However, semantic layer functionality may either be included as a feature of a cloud data warehouse or a BI platform, for example, or exist as a standalone, “universal” solution.

To illustrate the semantic layer concept, consider a calculated business metric: monthly sales commissions per region. A semantic layer could let business users quickly retrieve this figure through a BI tool’s query interface, even though it is calculated according to a specific formula using data from multiple sources that identify sale location, employee, transaction amount, currency, time of transaction, and so forth. Once the business defines the metric in the semantic layer, monthly sales commissions per region becomes a reliable and quickly accessible metric for use across the business.
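A minimal Python sketch can make the commissions example concrete. It is an illustration only, not any vendor's implementation: the `Sale` record, the exchange rates, the commission rates, and the function name are all hypothetical. The point is that the formula lives in one shared definition that every consuming tool would call.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sale:
    """Hypothetical transaction record drawn from a sales system."""
    region: str
    employee: str
    amount: float
    currency: str
    sold_on: date


# Assumed reference data that would normally come from other sources.
USD_PER_UNIT = {"USD": 1.0, "EUR": 1.10}          # illustrative exchange rates
COMMISSION_RATE = {"alice": 0.05, "bruno": 0.04}  # illustrative per-employee rates


def monthly_commissions_per_region(sales):
    """Single shared definition: commission in USD keyed by (year-month, region)."""
    totals = defaultdict(float)
    for sale in sales:
        amount_usd = sale.amount * USD_PER_UNIT[sale.currency]
        key = (sale.sold_on.strftime("%Y-%m"), sale.region)
        totals[key] += amount_usd * COMMISSION_RATE[sale.employee]
    return dict(totals)


sales = [
    Sale("EMEA", "bruno", 1000.0, "EUR", date(2023, 3, 2)),
    Sale("EMEA", "bruno", 500.0, "EUR", date(2023, 3, 20)),
    Sale("AMER", "alice", 2000.0, "USD", date(2023, 3, 15)),
]
result = monthly_commissions_per_region(sales)
for (month, region), commission in sorted(result.items()):
    print(month, region, round(commission, 2))
```

A business user never sees the currency conversion or the rate lookup; they ask for the metric by name and get the same answer regardless of which tool issues the request.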

A semantic layer shields the end user from the form, location, and complexity of data to deliver the following benefits.

  • Enable self-service and collaboration by business users.

  • Reduce dependency on IT and data support resources.

  • Minimize organization-wide inconsistency and duplication of effort in calculating business metrics.

  • Speed time-to-insight. 

Depending on the semantic layer’s location and capabilities, it can also improve analytic query performance and reduce cloud-data egress fees by automatically precomputing and storing frequently used business metrics. 
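The precomputation idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: `PrecomputedMetrics`, `run_warehouse_query`, and the metric names and values are hypothetical, and a real product would also handle invalidation and refresh schedules. The counter shows how stored results avoid repeat trips to the warehouse.

```python
class PrecomputedMetrics:
    """Stores metric results so repeat requests skip the warehouse."""

    def __init__(self, run_warehouse_query):
        self._run = run_warehouse_query  # the expensive call being avoided
        self._store = {}
        self.warehouse_queries = 0       # counts trips to the warehouse

    def _fetch(self, metric, period):
        self.warehouse_queries += 1
        return self._run(metric, period)

    def precompute(self, requests):
        """Compute and store frequently used metrics ahead of time."""
        for metric, period in requests:
            self._store[(metric, period)] = self._fetch(metric, period)

    def get(self, metric, period):
        """Serve from the store when possible; otherwise query once and keep the result."""
        key = (metric, period)
        if key not in self._store:
            self._store[key] = self._fetch(metric, period)
        return self._store[key]


# Stand-in for a cloud data warehouse; the values are made up.
FAKE_WAREHOUSE = {
    ("commissions_emea", "2023-03"): 66.0,
    ("commissions_amer", "2023-03"): 100.0,
}


def run_warehouse_query(metric, period):
    return FAKE_WAREHOUSE[(metric, period)]


layer = PrecomputedMetrics(run_warehouse_query)
layer.precompute([("commissions_emea", "2023-03")])
first = layer.get("commissions_emea", "2023-03")   # served from the store
second = layer.get("commissions_emea", "2023-03")  # served from the store
third = layer.get("commissions_amer", "2023-03")   # one new warehouse query
print(layer.warehouse_queries)
```

Three requests from users cost two warehouse queries in total, one of which ran ahead of time; fewer queries against cloud storage is where the performance and egress savings would come from.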

How is the semantic layer different from similar data engineering concepts?

As with many IT terms, the precise definition and usage of “semantic layer” are not monolithic. Some IT practitioners and writers use the terms semantic layer and metrics layer (or metrics store) interchangeably. At minimum, a semantic layer is a metrics layer that contains reusable, commonly used data incorporating measures and dimensions (also known as categories). 

In addition to metrics definitions, various IT vendors and influencers add capabilities to the concept of semantic layer, which may include the following.

  • Caching of frequently used metrics.

  • Metadata that supports data lineage, data quality, and data definitions.

  • Access control based on permissions granted to users and roles.

  • APIs (SQL, REST, etc.) for querying data and modifying semantic definitions.
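Two of these add-on capabilities, metadata and role-based access control, can be sketched together in Python. The class names, roles, and table names below are hypothetical and chosen only for illustration; actual products expose these features through their own configuration formats and APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """A metric plus the metadata a semantic layer might keep about it."""
    name: str
    description: str          # business-friendly data definition
    source_tables: tuple      # simple lineage: where the inputs come from
    allowed_roles: frozenset  # roles permitted to read the metric


class MetricRegistry:
    """Hypothetical registry that checks a caller's role before answering."""

    def __init__(self):
        self._definitions = {}

    def register(self, definition):
        self._definitions[definition.name] = definition

    def describe(self, name, role):
        definition = self._definitions[name]
        if role not in definition.allowed_roles:
            raise PermissionError(f"role {role!r} may not read metric {name!r}")
        return definition

    def visible_to(self, role):
        return sorted(
            name
            for name, definition in self._definitions.items()
            if role in definition.allowed_roles
        )


registry = MetricRegistry()
registry.register(MetricDefinition(
    name="monthly_commissions_per_region",
    description="Commission in USD by month and region",
    source_tables=("sales.transactions", "hr.commission_rates"),
    allowed_roles=frozenset({"finance", "sales_ops"}),
))
registry.register(MetricDefinition(
    name="headcount",
    description="Active employees at month end",
    source_tables=("hr.employees",),
    allowed_roles=frozenset({"hr"}),
))

print(registry.visible_to("finance"))
print(registry.describe("headcount", "hr").source_tables)
```

Keeping permissions and lineage next to the metric definition means every consuming tool inherits the same rules instead of reimplementing them.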

Though practical in value, the semantic layer is an abstract concept. For clarity, we should distinguish it from the data catalog. Like a semantic layer, a data catalog centralizes business metrics. However, it emphasizes the organization and retrieval of physical data assets rather than metrics and their logical data associations. Further, and unlike a semantic layer, a data catalog includes search capability for accessing various assets, such as data models and dashboards.


The semantic layer should be understood differently from data virtualization and data federation, as well. Like a semantic layer, virtualization and federation are abstractions from source data. Unlike a semantic layer, they seek to create a representation of all underlying data rather than convert particular data elements into understandable and uniformly determined business metrics.

Why is the semantic layer attracting greater interest today — especially a “universal” semantic layer?

The semantic layer is not a new concept. Prior to its acquisition by SAP in 2007, BI platform vendor Business Objects introduced the concept of the semantic layer. SAP later named this semantic layer functionality the BusinessObjects Universe.

OLAP analytical data engines, such as SAP Business Warehouse and Microsoft SQL Server Analysis Services, offered a semantic layer in their cube structures in which measures, dimensions, relations, and calculated measures were defined. 

In the past decade, BI tools such as Tableau introduced their own integrated semantic layers; Google Looker’s LookML especially popularized the use of the semantic layer. Such semantic layers have customarily been limited to their proprietary BI platforms, though we will return to this point.

In 2021, interest in the semantic layer concept intensified when Mode founder Benn Stancil observed that an environment-wide or “universal” semantic layer represents the missing element in the modern data stack.

Based on a diagram by Benn Stancil

The market entry of such tools as Cube, MetricFlow, MetriQL, and AtScale suggests a burgeoning interest in the benefits of a universal semantic layer. Such offerings apply the DRY (“don’t repeat yourself”) principle to business metrics: each metric is defined once and reused everywhere. This approach delivers the benefits of the semantic layer across the organization by consolidating into “one version of the truth” the business metrics that disparate presentation layer tools previously siloed (see diagram, above).

Spurring additional interest in the semantic layer in 2021, Airbnb published a detailed blog post to describe its internal semantic layer, Minerva. In 2022, dbt Labs launched its semantic layer and then acquired Transform Data in 2023 to enhance it. 

Both the data storage layer and the data presentation layer of business IT environments have become increasingly complicated through a proliferation of data types and data consumption tools. According to a 2020 study, the average organization has almost four business intelligence solutions in use. Because of ever-growing IT complexity and the historical scarcity of semantic layers, market interest in the semantic layer — especially a universal semantic layer that centrally defines all business metrics and eliminates redundancies — has strengthened in recent years. 

Finally, the semantic layer, especially the universal semantic layer, attracts greater interest today because it enables the modern data stack. The semantic layer facilitates self-service in a distributed data mesh architecture. A universal semantic layer, in particular, bolsters the implementation of a data fabric, as both the universal semantic layer and the data fabric seek to insulate end users from backend complexity while giving them access to all available data in understandable terminology and formats.

Should every organization implement a universal semantic layer?

As mentioned, a universal semantic layer offers the advantage of unambiguously and uniformly defined business metrics accessible across the data stack and the entire organization. 

However, there are potential drawbacks. The time and expense requirements for implementing, integrating, and maintaining an added layer of the entire data stack are non-trivial. Organizations may need to hire and train a dedicated analytics engineer to sustain a continually updated universal semantic layer.


These potential challenges must be weighed against the alternative disadvantages of maintaining isolated and duplicative semantic layers native in each of several tools. 

Standardizing on a semantic layer built into an existing BI platform may be a sufficient approach that proves simpler and less expensive than a universal semantic layer, especially if a particular BI solution is used predominantly in the organization. Several major BI platforms have actually opened their semantic layers for use by other BI platforms to one extent or another. 

  • MicroStrategy’s Federated Analytics lets users of Microsoft Power BI, Tableau, and Qlik leverage MicroStrategy’s semantic layer. 

  • As analytics practitioner Aurimas Račas argues, Power BI can now serve as a semantic layer for other data visualization tools, including Tableau, that connect via XML for Analysis (XMLA).

  • Finally, in March 2023 Google Looker announced Looker Modeler, a standalone semantic layer that defines and stores metrics for consumption in Power BI, Tableau, ThoughtSpot, and other platforms.

As universal semantic layer offerings gain awareness and customers, they prompt their older, BI-based rivals to take a more open approach to serving the full breadth of the data stack. This development unlocks additional options for organizations seeking cost-effective ways of realizing the benefits of a universal semantic layer.

Recommendations

A universal semantic layer promises significant advantages to business users and those supporting them. IT and data teams should consider the following factors in evaluating its potential.

  • The extent to which a production BI platform or other tool can function as a “sufficiently universal” semantic layer.

  • The risks, opportunity costs, and outlay costs of maintaining disparate semantic layers tethered to separate tools — and likewise the risks, opportunity costs, and outlay costs in software and staffing required to support a universal semantic layer. Considerations include the organization’s size, complexity of its underlying data, and scale of internal demand for consistently defined business metrics that can be accessed through a variety of presentation-layer tools.

Most vendors are eager to support proof-of-concept projects; these can realistically test assumptions and clarify the net benefits of adopting a universal semantic layer tool. As with most promising technologies, take seriously the potential value of a universal semantic layer, start small, and scale up in proportion to foreseeable benefits.

Author bio: 

Jeff Smith is a freelance industry analyst and marketing consultant in IT and technical business-to-business offerings. Jeff's experience at Accenture, IBM, SAS Institute, and Pyramid Analytics, and as a self-employed consultant, includes competitor and market research, product marketing, pricing, software engineering, and sales.