Self-Service Triumvirate: The New Data Analyst Workbench
For decades, data analysts were left to their own devices to find, massage, analyze, and visualize data. Most relied on tools such as spreadsheets (e.g., Excel) and desktop databases (e.g., Access) to handle these tasks. Consequently, data analysts spent 60% of their time preparing data, 20% building reports, and, if they were lucky, 20% analyzing the data.
Today, these ratios are changing thanks to the advent of a triumvirate of specialized, self-service technologies designed explicitly for data analysts: data visualization, data preparation, and data catalogs. Together, these three technologies, along with an underlying data connectivity layer, form the basis of an emerging self-service data analyst workbench. For now, companies purchase these products separately and stitch them together. But soon, they will be able to buy an integrated solution from a single vendor that supports the entire data analyst workflow.
The Data Analyst Workbench
Data Visualization. About a decade ago, self-service visualization tools debuted, enabling data analysts to create interactive reports and dashboards without IT assistance. Tableau led the charge here, but it now has dozens of close competitors. Data visualization by itself, however, leaves data analysts stranded: they can create reports, but they still depend on IT for data. (This assumes that existing data warehouses or data marts don’t have all the data analysts need—and they rarely do!)
Data Catalog. To address this problem, vendors recently began shipping data catalogs. These products scan corporate databases and data lakes to create an inventory of available data sets, data views, data workflows, and, sometimes, reports, code, and other artifacts. With a data catalog, data analysts can search for relevant data, profile that data, understand its lineage and quality, and examine who is using it elsewhere in the organization for what purpose and what they think about it. Basically, a catalog creates a data inventory—or data marketplace—of all relevant and available data for analysis purposes. Many companies now use catalogs to curate and govern data as well.
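The inventory-and-search behavior described above can be sketched in a few lines of Python. Everything here is illustrative, not any vendor's API: the `DatasetEntry` fields and `DataCatalog` methods are hypothetical names. Note that the catalog stores only metadata; the physical data never enters it.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    """Metadata only -- the actual data stays in its source system."""
    name: str
    source: str                                   # e.g. "warehouse", "data_lake"
    owner: str
    tags: list = field(default_factory=list)
    upstream: list = field(default_factory=list)  # lineage: parent dataset names

class DataCatalog:
    """Minimal inventory: register entries, search by keyword, trace lineage."""
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def search(self, keyword):
        kw = keyword.lower()
        return [e.name for e in self._entries.values()
                if kw in e.name.lower() or any(kw in t.lower() for t in e.tags)]

    def lineage(self, name):
        return list(self._entries[name].upstream)

catalog = DataCatalog()
catalog.register(DatasetEntry("raw_orders", "data_lake", "etl-team",
                              tags=["sales"]))
catalog.register(DatasetEntry("orders_clean", "warehouse", "analytics",
                              tags=["sales", "curated"], upstream=["raw_orders"]))

print(catalog.search("sales"))          # both datasets carry the "sales" tag
print(catalog.lineage("orders_clean"))  # ['raw_orders']
```

A real catalog adds profiling, quality scores, and usage statistics on top of this skeleton, but the core idea is the same: search over metadata, not over the data itself.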
Data Preparation. Although a data catalog makes it easy to find relevant data, it doesn’t help data analysts clean, format, combine, derive, or aggregate that data. That’s the job of the data preparation tool, the last component of the data analyst workbench. Data preparation tools go well beyond what Excel offers; most are designed to handle big data that consists of large volumes of multi-structured data with many concurrent users. Most importantly, data preparation tools keep a visual audit trail of data manipulations, allowing data analysts to edit or reuse an existing analysis that they or a colleague created.
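The audit-trail idea can be made concrete with a minimal sketch in plain Python (the `PrepPipeline` class is hypothetical, standing in for a real data preparation tool): each step is recorded alongside the function that performs it, so the recipe can be inspected, edited, or replayed against new data.

```python
class PrepPipeline:
    """Records every transformation, giving an auditable, replayable recipe."""
    def __init__(self):
        self.steps = []                      # (description, function) pairs

    def add_step(self, description, fn):
        self.steps.append((description, fn))
        return self                          # allow chaining

    def run(self, rows):
        for _, fn in self.steps:             # replay the recipe in order
            rows = fn(rows)
        return rows

    def audit_trail(self):
        return [desc for desc, _ in self.steps]

# Recipe: clean (drop bad rows), derive a column, then aggregate.
pipeline = (PrepPipeline()
    .add_step("drop rows with missing amount",
              lambda rows: [r for r in rows if r.get("amount") is not None])
    .add_step("derive amount in dollars from cents",
              lambda rows: [{**r, "dollars": r["amount"] / 100} for r in rows])
    .add_step("aggregate total dollars per region",
              lambda rows: {r["region"]: round(sum(x["dollars"] for x in rows
                            if x["region"] == r["region"]), 2) for r in rows}))

raw = [{"region": "east", "amount": 1250},
       {"region": "east", "amount": None},
       {"region": "west", "amount": 400}]

print(pipeline.run(raw))       # {'east': 12.5, 'west': 4.0}
print(pipeline.audit_trail())  # the "visual audit trail", as text
```

Because the recipe is data, a colleague can rerun it next month against fresh rows, or edit one step, which is precisely the reuse that spreadsheets make so hard.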
Some vendors have now shipped products that combine two of the three technologies described above. For example, many business intelligence (BI) tools now incorporate data preparation features. In addition, some data catalogs provide built-in data preparation capabilities and vice versa. Soon, we’ll see a plethora of products that incorporate all three capabilities.
Openness. At the same time, a converged data analyst workbench will support open APIs, providing interoperability with third-party tools. Although vendors will supply all-in-one tools, they recognize that most customers have heterogeneous requirements. For example, a company that standardizes on a data catalog may have multiple BI and data preparation tools that need to integrate with it. A company with an enterprise data preparation tool may want to standardize on a data catalog or BI tool from a different vendor. So whether vendors ship a point product or an integrated workbench, it will interoperate with other products.
The three applications described above form the basis of the data analyst workbench. They provide the user interface and functionality that data analysts use on a daily basis. However, below the surface lies another key ingredient that makes the data analyst workbench work: a data access layer. A data catalog only surfaces metadata about data and perhaps a small sample set. The actual data remains tucked away in a data lake, data warehouse, application, or external file. To get that data, a data analyst must leave the data catalog, open another tool, remember where the data lives and what they wanted to download, find the data, and then query or download it into a data preparation tool.
Obviously, this context switch is cumbersome and error-prone. This is where a data access layer comes in. Some data catalogs and data preparation tools provide connectivity to various databases and applications, as do most BI tools. But as data analyst workbenches emerge, we will also see the advent of a universal data connectivity layer built into the workbench that connects users to data sets that they find in a data catalog or use within a data preparation tool or BI tool.
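A minimal sketch of such a connectivity layer, with hypothetical names throughout: connectors for each source type register under a URI scheme, and the workbench issues one `read()` call regardless of where the data physically lives.

```python
class DataAccessLayer:
    """One read() call, dispatched to per-source connectors by URI scheme."""
    def __init__(self):
        self._connectors = {}

    def register(self, scheme, connector):
        self._connectors[scheme] = connector   # connector: path -> rows

    def read(self, uri):
        scheme, _, path = uri.partition("://")
        if scheme not in self._connectors:
            raise ValueError(f"no connector for source type '{scheme}'")
        return self._connectors[scheme](path)

# Hypothetical backends, represented here as in-memory dicts.
warehouse = {"sales/orders": [{"id": 1, "amount": 100}]}
lake = {"raw/events": [{"event": "click"}]}

dal = DataAccessLayer()
dal.register("warehouse", lambda path: warehouse[path])
dal.register("lake", lambda path: lake[path])

print(dal.read("warehouse://sales/orders"))  # [{'id': 1, 'amount': 100}]
print(dal.read("lake://raw/events"))         # [{'event': 'click'}]
```

In a real workbench each connector would wrap a database driver or application API, but the point of the layer is the same: the catalog, prep tool, and BI tool all pull data through one interface instead of each re-implementing connectivity.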
The best data access layer will offer comprehensive support for all data sources, applications, and file systems. This degree of connectivity generally exceeds what data catalog and data preparation tools offer. BI tools have more robust connectivity, but most are SQL-centric and don’t work well with data lakes or hierarchical data formats, such as JSON.
Data Virtualization. Data virtualization tools might offer a better data access foundation for a data analyst workbench. These tools specialize in providing dynamic access to any data source or application in the relational or big data world. They also provide a semantic view of data that simplifies data access for business users. The view masks the physical location of data, so users can query a view to obtain data that comes from multiple systems in real time. Finally, data virtualization tools can enforce security policies by controlling data access at the data set, row, and column level.
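The mechanism can be sketched as follows (an illustration only; real data virtualization products do far more, such as query pushdown and caching): a view federates rows from multiple physical sources and applies a column-level security policy before anything reaches the user.

```python
class VirtualView:
    """A logical view over several physical sources. Callers never see where
    rows physically live, and disallowed columns are masked on the way out."""
    def __init__(self, sources, allowed_columns):
        self.sources = sources               # callables returning row dicts
        self.allowed = set(allowed_columns)  # column-level security policy

    def query(self):
        rows = []
        for fetch in self.sources:           # pull from each system on demand
            for row in fetch():
                rows.append({k: v for k, v in row.items() if k in self.allowed})
        return rows

# Two hypothetical backends holding customer rows.
def crm():
    return [{"name": "Acme", "ssn": "123-45-6789", "region": "east"}]

def erp():
    return [{"name": "Globex", "ssn": "987-65-4321", "region": "west"}]

customers = VirtualView([crm, erp], allowed_columns=["name", "region"])
print(customers.query())   # ssn is stripped from every row before it surfaces
```

The analyst queries `customers` as if it were one table; the fact that its rows come from two systems, and that a sensitive column exists at all, stays hidden behind the view.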
A data analyst workbench that integrates data catalog, data preparation, and data analysis functionality is inevitable. Data analysts don’t want to jump from tool to tool when executing a workflow that is both linear and iterative: find data, prepare data, analyze data, and visualize data. It’s iterative because once users analyze data, they might see that they need more or different data, and when they visualize data, they may recognize that they have to format or group the data differently to do the analysis they need.
So expect to see the advent of a data analyst workbench soon. These workbenches could be all-in-one tools from a single vendor with organically grown capabilities. Or they could be a best-of-breed integration of tools from multiple vendors, each with different graphical interfaces and functionality. Or a combination of the two in companies that have an enterprise standard as well as various tools in different departments.
Whatever shape it takes, the data analyst workbench will make data analysts more productive. These workbenches will seamlessly support analysts’ natural workflow and enable them to collaborate more closely with other data analysts across the organization, fostering higher levels of reuse and data literacy.