Supporting Self-Service Analytics with a Unified Platform
It’s trendy these days to call data “the new oil.” By this, writers typically mean data is a precious commodity tied to enormous economic growth, but I would argue data shares another defining characteristic with oil—its value only comes from what you do with it. Oil is just black gunk. We give it value by refining it into fuel, making it into plastic, or using it as lubricant. In the same way, data is merely blips in a computer. Unless we use the data to make a decision, it’s worthless. That means we have to get data into the hands of business users.
In every state but New Jersey, I can go to a gas station and fill up my own tank whenever I need to. Ideally self-service analytics works the same way. Business users who need data can get it themselves without help from a dedicated technician. But just because I can pump gas doesn’t mean I know how to refine oil or set up a gas station. While an element of self-service exists, other specialists have created the infrastructure and laid the groundwork, so I can get what I need quickly and move on.
To build the appropriate infrastructure, you have to understand the end user. In his report, A Reference Architecture for Self-Service Analytics, my colleague Wayne Eckerson divides users into four general categories: data consumers, data explorers, data analysts, and data scientists. While these folks make up the visible part of self-service, under the surface, technical teams of engineers, architects, and data curators build and maintain the systems that enable them. (See figure 1.)
Figure 1. The Self-Service Analytic Hierarchy
Although each user type serves themselves to some degree, each requires a different level of refinement in their data solution, just as different cars require different grades of gasoline. Once, meeting these disparate needs required separate tools. Now, thanks to the trend of feature consolidation, more solutions exist that support multiple user types in a single platform. Teams can set up one system that provides different experiences for different users in the same way one gas pump provides multiple nozzles for different kinds of vehicles.
Data consumers represent by far the largest portion of self-service users. They rely on tools such as dashboards to find answers to business questions. They can interpret visualizations and put data in context but don’t build any data products for themselves. Executives and decision makers often fall in this group. For them, time comes at a premium, so they need clean, easily digested visualizations that help them find what they need to know fast.
I recommend drawing on data storytelling best practices in designing solutions to serve data consumers. Keep things simple, and don’t obscure the important insights with too much extra detail. Make sure you understand the kinds of questions your data consumers regularly have and work with them to ensure you deliver the relevant key performance indicators (KPIs).
Consolidated tools benefit this group by reducing the time it takes support teams to deliver the self-service solution. When the platform an analyst uses to connect to data and build dashboards is the same platform a manager uses to view them, it removes distribution steps that can slow the flow of information. Good examples of this include products such as the Domo Business Cloud and Infor Birst that lead with business intelligence (BI) but also provide complete functionality for the data side of the stack.
Data explorers go a step beyond the data consumer. Although still casual self-service users, they differ by having the time and expertise to augment their own dashboards. They know BI tools well enough to make their own visualizations from connected data. They don’t need as much hand-holding on the design side, but they still depend on data or IT teams to deliver relevant, quality data.
Because this group engages more directly with data, the names of rows and columns become more important. When designing the tables that will power dashboards for a data explorer, be sure to use standard business terms, so they will immediately know what a given value represents. Using a unified tool that integrates a business glossary and allows sharing standard business calculations helps prevent data explorers from accidentally creating silos within their department.
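To make the glossary idea concrete, here is a minimal Python sketch of renaming raw column names to standard business terms before a table reaches a dashboard. The glossary entries, the column names, and the apply_glossary helper are all hypothetical and invented for illustration; a real platform would manage this mapping inside its own glossary feature.

```python
# Hypothetical glossary: raw warehouse column name -> standard business term.
BUSINESS_GLOSSARY = {
    "cust_id": "Customer ID",
    "ord_dt": "Order Date",
    "net_rev_usd": "Net Revenue (USD)",
}


def apply_glossary(rows, glossary):
    """Return copies of rows with column names replaced by glossary terms.

    Columns missing from the glossary keep their raw name and are reported
    separately, so a data curator can decide whether to add them.
    """
    unmapped = set()
    renamed = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if column not in glossary:
                unmapped.add(column)
            new_row[glossary.get(column, column)] = value
        renamed.append(new_row)
    return renamed, sorted(unmapped)


# Illustrative data only.
raw_rows = [
    {"cust_id": 101, "ord_dt": "2021-03-01", "net_rev_usd": 250.0, "src_sys": "crm"},
]
renamed_rows, unmapped_columns = apply_glossary(raw_rows, BUSINESS_GLOSSARY)
```

Reporting the unmapped columns, rather than silently passing them through, is one way to keep every department working from the same vocabulary.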
Data analysts are true data professionals. They build ad hoc analyses and connect to new data sources. At the same time, this group has the largest range in technical ability. Some may write SQL queries to retrieve data while others require a no-code environment. In either case, they are comfortable sorting through data sets to find what they need and can build reports or dashboards from scratch.
Lots of tools target this persona and can make their jobs easier. Chief among these are data catalogs and data marts. These solutions facilitate finding data. A catalog helps users search through a company’s assets to identify the data needed for a given project. Official certifications and peer ratings guide analysts toward better quality data, while statistical information tells them if a set meets their technical requirements. Data marts help analysts retrieve data faster by storing related data in smaller, topical repositories, so users don’t have to query the enterprise data warehouse.
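As a rough sketch of how a catalog search might combine these signals, the Python below filters data sets by keyword and a simple statistical requirement, then ranks certified sets first and higher peer ratings next. The CatalogEntry fields, the sample entries, and the ranking rule are assumptions made for illustration, not the behavior of any particular catalog product.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    name: str
    description: str
    certified: bool     # official certification flag (assumed field)
    peer_rating: float  # average peer rating from 0.0 to 5.0 (assumed field)
    row_count: int      # one example of statistical metadata (assumed field)


def search_catalog(entries, keyword, min_rows=0):
    """Find entries matching a keyword that meet a minimum row count.

    Results are ordered with certified data sets first, then by
    descending peer rating.
    """
    keyword = keyword.lower()
    matches = [
        entry for entry in entries
        if (keyword in entry.name.lower() or keyword in entry.description.lower())
        and entry.row_count >= min_rows
    ]
    return sorted(matches, key=lambda e: (not e.certified, -e.peer_rating))


# Illustrative entries only.
catalog = [
    CatalogEntry("sales_orders_scratch", "Analyst sandbox copy of orders", False, 4.8, 900_000),
    CatalogEntry("web_clicks", "Raw clickstream events", False, 3.1, 50_000_000),
    CatalogEntry("orders_sample", "Small sample of orders for demos", True, 3.5, 500),
    CatalogEntry("sales_orders", "Confirmed customer orders", True, 4.2, 1_200_000),
]

results = search_catalog(catalog, "orders", min_rows=1_000)
result_names = [entry.name for entry in results]
```

In this sketch the certified sales_orders set outranks the better-rated scratch copy, and the small sample set drops out because it fails the row-count requirement.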
Over the last couple of years, a new category of tool—the self-service analytics workbench—has also emerged to better meet the needs of analysts by combining aspects of data catalogs, query engines, and BI tools. Exemplified by products like Promethium, these tools center their workflows around the business question. They accelerate time to insight by consolidating all of the functionality an analyst needs to answer specific questions within a single tool.
Finally, the most technically savvy self-service users are data scientists. They have an extremely high level of data literacy and are typically fluent in multiple coding languages, which they use to build machine learning algorithms and statistical models. Much of their work involves surfacing insights from big data and previously undervalued data. As a result, they need direct access to data that might not be prioritized in existing data warehouse structures.
Like data analysts, data scientists benefit from a data catalog. They can make particular use of the technical metadata most catalogs provide about data assets. Data lakes are another piece of infrastructure that helps facilitate self-service for data scientists. Data lakes make all data, structured and unstructured, available in one place, allowing data scientists to pull what they need even if it’s tangential to the current operation of the business.
At first, data scientists may be reluctant to use a shared platform. They might feel their work is too sophisticated to fit within the same tool a data consumer uses to explore dashboards, but, increasingly, platforms that provide consumer-friendly BI also provide the ability to embed Python or R notebooks into data pipelines. In fact, bringing data scientists and business users together means the complex models data scientists build are more likely to reach a decision maker’s dashboard and have an impact on the business.
So, who provides each of these users with the resources they need for “self-service”? Why, the very data and IT teams who now have to spend less time fielding requests for data. Data architects design the data lakes, data marts, and data warehouses from which power users draw their data; data engineers build pipelines to model data and deliver it to those repositories; and data curators manage catalogs to ensure proper data governance and to maintain company standards. They too benefit from platform unification. Instead of building parallel infrastructure for different user types, a shared platform lets them reuse components.
Far from eliminating the need for IT and technical data teams, the self-service analytics paradigm shifts their role. Now, instead of spending time finding, modeling, and delivering data on an as-needed basis, they build data infrastructure to anticipate business users’ desire for data. Five years ago, this required numerous tools designed for specific use cases. Now, the proliferation of unified platforms makes it easier to support multiple user types on a shared infrastructure.
In any case, meeting the needs of self-service users requires knowing their needs. The four categories in this article provide a loose guide, but the population of self-service users in your organization may look different. When designing a strategy to enable self-service, make sure to reach out to all stakeholders, so you know how best to meet the unique requirements of your organization. The last thing you want to do is build a gas station only to discover everyone in the neighborhood drives an electric car.