What to Expect in 2022: Ten Predictions to Ring in the New Year
Warning: 2022 might cause whiplash.
As we enter the third year of the 2020s, COVID and economic volatility continue to force enterprises to digitize their businesses at breakneck speed. They also must beef up their analytics strategies to take faster, smarter actions as they engage customers and optimize operations. This accelerates the adoption and convergence of innovative data tools and platforms; forces business and data teams to acquire new skills; and invites new regulatory scrutiny.
Eckerson Group’s team of experts offers the following predictions for 2022.
Prediction #1. Data governance platforms emerge (Wayne Eckerson)
A year ago, organizations that wanted to automate governance processes had to buy tools from multiple vendors: data catalogs, business glossaries, data lineage, data quality, data access control, masking, classification, privacy, security, and master data management.
Today, vendors are assimilating adjacent functionality, turning their point products into data governance platforms. Ataccama, Atlan, Alex Solutions, and OvalEdge were early to the game, but larger vendors are catching up, including Alation, Collibra, Informatica, Talend, Precisely, Zaloni, and Hitachi Vantara. Meanwhile, smaller vendors, such as Octopai, BigID, and OneTrust, now encompass additional data governance functionality.
Will these products deliver on their promise? Eckerson Group will host an online event dedicated to data governance platforms in April 2022. Stay tuned!
Prediction #2. Machine learning projects find a home in cloud data platforms (Kevin Petrie)
Organizations run machine learning projects through a full lifecycle: data and feature engineering, model development and training, and model production and governance. Data science platforms from vendors such as DataRobot, H2O, and Dataiku help data scientists manage this lifecycle by controlling the necessary data inputs, models, code, and APIs—and tapping an open ecosystem of tools, libraries, and notebooks. (See my blog series, “The Machine Learning Lifecycle and MLOps: Building and Operationalizing ML Models,” Eckerson Group 2021.)
In the coming year, organizations will manage more of these projects, start to finish, within cloud data platforms that contain integrated data science platforms. They already can do this within cloud data platforms such as Vertica, a heritage data warehouse, and Databricks, a heritage data lake. Snowflake, meanwhile, recently partnered with Anaconda to give its users access to open-source Python packages as they build ML models in the Snowflake Data Cloud. Organizations will embrace options like these as they migrate more and more ML-friendly data to the cloud.
Prediction #3. Data engineers don new hats (Kevin Petrie)
Data engineers serve as the single “throat to choke” for business and IT leaders when it comes to data delivery. Data engineers ensure data pipelines deliver timely, high-quality data to business intelligence (BI) and data science initiatives. To keep pace with booming supply and demand—and increasingly heterogeneous environments—data engineers need to learn new domains. Look for them to increase their responsibilities in the following areas in 2022.
Observability: Data engineers will help optimize the performance of storage and compute infrastructure that underlies data pipelines. This means playing the role of site reliability engineer, platform engineer, and ITOps engineer.
Data science: Data engineers will transform data to prepare it for feature engineering, and monitor AI/ML models for accuracy. This assists the data scientist, ML engineer, and governance officer.
Self-service: As data and business analysts use no-code pipeline tools to prepare their own datasets, they will run into trouble and create messes. Data engineers, the experts on data preparation, will help them clean up.
Prediction #4. The European Union will pass new AI/ML governance regulations, with global implications (Joe Hilleary)
In April 2021, the European Commission proposed a comprehensive, ground-breaking AI law. This law requires ML and other types of AI models to be explainable in human terms, and bans models deemed to be biased or otherwise likely to inflict physical or psychological harm. Since then, legislators in the European Parliament have debated the proposal and worked to find common ground between the member states. Based on its current momentum, the law seems likely to go into effect sometime next year.
Like the General Data Protection Regulation (GDPR), the new AI law will have an enormous impact on companies that work in or interact with Europe. It will also set a precedent for other countries to follow in passing their own AI regulations. Despite outcry from businesses that the initial proposal went too far and would hobble the West’s ability to compete with China, the law appears to be growing even more restrictive. A draft circulated at the end of November expands the prohibitions against social scoring algorithms to private businesses. It also places restrictions on the use of AI for estimating insurance premiums.
Prediction #5. Data marketplaces explode (Joe Hilleary)
Data marketplaces provide a platform for companies to consume and supply data for a fee. Two factors will drive the growth of data marketplaces next year: demand and supply.
Demand. Business appetite for third-party data continues to rise. The unpredictable business environment created by the pandemic forces organizations to consume external data, including health statistics, weather updates, and updates on border closures. Acquiring and using third-party data now has a much lower opportunity cost and a clearer value proposition.
Supply. This increased demand for external data is matched by growth in the number of data providers. Digital transformation initiatives in the last decade resulted in companies across a range of industries generating more data than they know what to do with. CDOs and CIOs are hungry to monetize those data streams. Now, thanks to the increased maturity of data exchange platforms, it’s relatively easy to distribute that data as a product.
Data marketplaces feed off both trends by serving as a match-maker. Look for existing general data marketplaces to grow in size, and new marketplaces to spring up to serve specific industry sectors.
Prediction #6. Architecture-as-a-service platforms enable data mesh (Wayne Eckerson)
A key to data mesh and distributed architectures is the existence of a self-service data platform that makes it easy for business domain owners to publish data sets for themselves and the rest of the organization. However, if every domain uses different technology, semantics, and interfaces to publish data, the distributed enterprise becomes a Tower of Babel where everyone speaks and no one communicates.
Given the interest in data mesh, we expect new “architecture-as-a-service” technologies to emerge that make it easy for domain owners (i.e., non-IT people) to publish data products for universal consumption. There are promising developments, such as no-code data pipelining tools and data exchanges that make it easy for non-technical people to create and share data products (see below). But these tools must also embed enterprise-defined guardrails that ensure that domain data products conform with enterprise governance, semantic, and API standards.
Prediction #7. Data exchanges ensnare data meshes (Wayne Eckerson)
Data meshes were the hot data architectural innovation of 2021, even if no one was clear on how to design and implement them. One major impediment to a data mesh is the lack of a robust self-service data platform that enables domain specialists to create data stores that serve their own needs plus the needs of business users across the enterprise. (See above.) Vendors with low-code or no-code data pipeline tools, such as Informatica, Talend, Alteryx, and even Tableau, have the inside track. However, I think there is an easier solution: data exchange platforms. (See “The Rise of Data Exchanges: Frictionless Integration of Third-Party Data,” Eckerson Group, 2020.)
Data exchange platforms emerged recently to facilitate the exchange of data between data suppliers and data consumers. While geared to the consumption of external data, data exchanges can also facilitate internal data sharing among divisions or departments. In fact, they’re ideal for it. These platforms make it easy for data suppliers to create and publish data products and specify who can view them, for how long, and by what means—either in the exchange or outside of it via file transfer. They provide data consumers with an easy way to browse, ingest, and integrate relevant data products. Many use AI to recommend data products based on consumer preferences, activity, and keyword searches. Finally, many offer add-on services, such as data quality and de-duplication services. Look for forward-looking companies to use data exchanges to implement a data mesh architecture in the coming year.
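The publishing controls described above (who can view a data product, for how long, and by what means) can be pictured as a small access model. The following is only an illustrative sketch: the names DataProduct, AccessGrant, and Delivery are hypothetical and do not correspond to any particular vendor's API.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Delivery(Enum):
    """How a consumer may receive the data (hypothetical options)."""
    IN_EXCHANGE = "in_exchange"      # viewed inside the exchange
    FILE_TRANSFER = "file_transfer"  # delivered outside it as a file


@dataclass(frozen=True)
class AccessGrant:
    """One supplier-defined rule: who, until when, and by what means."""
    consumer: str
    expires: date
    delivery: Delivery


@dataclass
class DataProduct:
    """A published data set plus the grants its supplier has issued."""
    name: str
    grants: list = field(default_factory=list)

    def can_access(self, consumer: str, on: date, delivery: Delivery) -> bool:
        # Access is allowed only if some grant matches the consumer,
        # has not expired on the given date, and permits this delivery.
        return any(
            g.consumer == consumer
            and on <= g.expires
            and g.delivery == delivery
            for g in self.grants
        )


# Usage: the finance domain shares a product with marketing until year end.
product = DataProduct(
    name="quarterly_revenue",
    grants=[AccessGrant("marketing", date(2022, 12, 31), Delivery.IN_EXCHANGE)],
)
allowed = product.can_access("marketing", date(2022, 6, 1), Delivery.IN_EXCHANGE)
```

In a data mesh setting, each domain would publish its own products this way while the exchange enforces the grants centrally.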
Prediction #8. Supply chain analytics become a strategic imperative (Rich Fox)
Supply chain management, traditionally a tactical logistics function, has become a strategic boardroom imperative thanks to acute profit pressure from COVID-19 disruptions. Many companies have already imposed significant price increases, and customers may not be willing to accept more.
Supply chain-dependent enterprises now rely on careful digital tracking of all the ships, planes, trains, and trucks that transport physical goods. This means adding sensors, analyzing interdependencies, and building contingency plans, all to get products to store shelves on time. Solution providers such as Palantir will help companies manage the supply chain through better analytics, leveraging machine learning and AI.
Prediction #9. Self-service tools focus on workflows (Wayne Eckerson)
An emerging breed of self-service platform places business questions at the center of the analysis process, rather than queries, dashboards, or models. It enables business users to ask business questions in natural language and get answers culled from prior results or dynamically generated queries. The answers display as simple charts and text that business users can understand. Promethium and ThoughtSpot are at the forefront of this trend.
But natural language queries are just the start. When no answers are available and the user hits the limits of keyword or semantic searches (or their own ability or willingness to search), the platform kicks off a workflow. It sends the question to a data analyst who combs through authorized data sets to find the answer and return it to the business user. If data doesn’t exist, the analyst initiates a workflow with a data engineer to create the required data set. The platform manages communication between these roles using chat and alerts. It also provides each role with all the functionality required to do their job efficiently. By collaborating on a common platform, business users get answers more quickly and data analysts and data engineers boost their productivity tenfold.
In the coming year, look for enterprises to adopt tools and practices that address questions in this fashion.
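The escalation path described above, from search to data analyst to data engineer, can be sketched as a simple routing rule. This is a minimal illustration under stated assumptions, not how any named product works: QuestionRouter, Stage, and the two lookup stores are hypothetical, and real platforms would match questions semantically rather than by exact text.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Where a business question ends up (hypothetical labels)."""
    ANSWERED = "answered"
    WITH_ANALYST = "with_analyst"
    WITH_ENGINEER = "with_engineer"


@dataclass
class QuestionRouter:
    # Hypothetical stores: prior answers keyed by question text, and
    # the topics covered by data sets an analyst is authorized to use.
    known_answers: dict = field(default_factory=dict)
    available_topics: set = field(default_factory=set)

    def route(self, question: str, topic: str) -> Stage:
        # Step 1: answer from prior results when possible.
        if question in self.known_answers:
            return Stage.ANSWERED
        # Step 2: no stored answer, so hand the question to a data
        # analyst if an authorized data set covers the topic.
        if topic in self.available_topics:
            return Stage.WITH_ANALYST
        # Step 3: the data does not exist yet, so a data engineer
        # is asked to create the required data set.
        return Stage.WITH_ENGINEER


# Usage: one cached answer, one topic with existing data.
router = QuestionRouter(
    known_answers={"What were Q3 sales?": "$4.2M"},
    available_topics={"sales"},
)
first = router.route("What were Q3 sales?", "sales")
second = router.route("Which region grew fastest?", "sales")
third = router.route("What is our churn by cohort?", "churn")
```

The point of the sketch is that each hand-off is triggered by a specific gap (no answer, then no data), which is what lets the platform manage communication between the roles.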
Prediction #10. Data literacy becomes an HR priority (Dave Wells)
Data skills, both individual and collective, are essential to derive value from and create business impact with data. Corporate leaders recognize the importance of data literacy as a key contributor to success as a data-driven organization. (See “Building a Data Literacy Program: What, Why, and How,” Eckerson Group, 2021.)
As we move into the new year, proactive leaders will turn their attention to developing and retaining a highly data-literate workforce. Acknowledging that data literacy has both cultural and capability dimensions, they will assign Human Resources (HR) departments primary responsibility to assess, motivate, cultivate, and drive literacy growth. Position descriptions and hiring practices will include specifically itemized data skills. Recruiting and retention processes and practices will expand to consider data talents, as will performance management and review criteria. Training and development activities will include data coaching and data mentoring programs.
In short, data literacy will become an HR priority, weaving essential data skills into workforce culture throughout the organization.