What to Expect in 2021: Ten Predictions to Ring in the New Year

(To view or download this article in an infographic format, click here.)

Every December, Eckerson Group fulfills its industry obligation to summon its collective knowledge and insights about data and analytics and speculate about what might happen in the coming year. The diversity of predictions from our research analysts and consultants exemplifies the breadth of their research and consulting experiences and the depth of their thinking. Predictions from Kevin Petrie, Joe Hilleary, Dave Wells, Andrew Sohn, and Sean Hewitt range from data and privacy governance to artificial intelligence with stops along the way for DataOps, data observability, data ethics, cloud platforms, and intelligent robotic automation. 

Lessons Learned from 2020

But first, let me ponder the lessons we’ve learned from the extraordinary events of 2020 and speculate how those lessons will shape our future. One thing we know is that the pandemic and concomitant recession will slowly fade into the dustbin of history, but not without fundamentally restructuring the way we work, shop, socialize, and value what’s important. If nothing else, the past nine months have taught us to hold our loved ones close, value our friends and neighbors, and give thanks for the joys and conveniences of modern life that we often take for granted, such as the ability to travel where and when we want, dine out with friends and family, collaborate with colleagues face-to-face at work, and revel in the communal experience of attending live concerts, plays, athletic games, and other events.


Hopefully, we’ve also recognized the importance of nursing our planet back to health. Rising carbon dioxide emissions are causing volatile and destructive shifts in weather, violent storms, rising sea levels, and unseasonably hot and dry temperatures that are fueling record-breaking fires in Australia and the western United States. My daughter, who found her calling as a wine educator in Napa Valley, which she adored for its beauty, culture, and climate, is now a Napa Valley refugee. She fled east this fall to escape the perpetual fires, power outages, rising temperatures, and drought conditions that she believes will eventually spell the end of wine country. She is mourning her once chosen path.

Data for Good. So what does all this have to do with data and analytics? Well, nothing and everything. Can we use data to improve our society and planet? On an individual basis, maybe not, unless we perform pro bono work for non-profits, something that a few of us here at Eckerson Group do. But at the corporate level, if we pool our resources, we can accomplish amazing things. Check out Kevin Petrie’s new report “Data for Good: Corporate Initiatives to Improve Social and Economic Well-Being.” The report examines various initiatives from data analytics vendors and data-savvy enterprises to harness data for public good. Your colleagues in this dynamic profession are using their work hours to make the world a better place.

#1. Work-Life Balance 

So my wish for 2021—I hesitate to call it a prediction—is that once the pandemic subsides, we think twice before plunging headlong into our old ways of working and living. Do we really need to hop on an airplane to attend an event or meeting? Do we really want to sacrifice family time in favor of a long commute each day? Can we allocate more time to our physical health and spiritual and emotional well-being? Can we devote several hours each month to a local cause, as we at Eckerson Group strive to do? It’s my wish that we take charge of our personal and professional lives and establish a proper balance between the two. The pandemic has enabled us to hit the reset button. (See “The Time Has Come: A Manifesto for Change.”) Let’s not waste this opportunity to make sure we put the things we cherish at the center of our lives. (Wayne Eckerson.)

#2. AI Marketplaces Drive AI Adoption and Monetization

As enterprises apply artificial intelligence (AI) to a rising portion of their business processes, they will look outside their organizations for the right algorithms. AI creators, meanwhile, will look to monetize their handiwork. These forces of supply and demand will find their equilibrium in AI marketplaces. Unlike open-source communities and libraries, AI marketplaces will enable enterprises and individuals to exchange AI models for profit. Business managers will find and buy them; data scientists can create and sell them; developers will use and integrate them with their applications. AI marketplaces, while in their infancy, make sense because one AI algorithm can be adapted to address many business scenarios. Startups such as Bonseyes and SeeMei.ai, and established vendors such as IBM and C3, will raise their profile – and prompt Cloud Service Providers to get in on the action. (Kevin Petrie.)

#3. Intelligent Process Automation (IPA) Takes Root

The first generation of robotic process automation (RPA) focused on automating discrete and straightforward, albeit time-consuming, tasks. Companies automated simple “if-then-else” rules to eliminate or scale data entry and other non-value-added manual work. Enterprises are now looking beyond these piecemeal implementations to automate entire business processes, not just those components that can be done with simple logic. To do this, RPA will morph into IPA and be coupled with AI and ML capabilities, such as natural language processing (NLP), text and image recognition, and predictive analytics. These cognitive capabilities are being added or integrated into products from the major vendors, such as Blue Prism, UiPath, and Automation Anywhere. While classic RPA functionality can perform simple actions, the new cognitive functions will enable more complex and flexible business logic. Low-code vendors, such as Appian and OutSystems, will integrate or partner with the classic RPA vendors to complete the end-to-end business process capabilities. (Andrew Sohn).

#4. Privacy Management Becomes Embedded in Analytics Tools and Data Pipelines

The complexity and resource challenges of privacy management are increasing as the volume and diversity of data increase. To help address these challenges, data pipelines will need to be architected not only to comply with data protection standards but also to enable increased visibility into personal data. Privacy management depends on transparency in data collection, processing, consumption, and sharing to ensure that activities are legitimate and reasonable and that data is sufficiently protected. This need will increase the demand for analytics workbench platforms that simplify privacy management activities. Data pipelines will be supported with machine learning to automate privacy management tasks and initiate risk management workflows to manage quality, access, and usage. (Sean Hewitt).

#5. Data Ethics Becomes a Data Governance Imperative

The time for data ethics as a mainstream practice of data governance has finally come. (See my article “Data Ethics: The New Data Governance Challenge.”) The topic of data ethics has been around for at least two decades, but it has been given more lip service than serious consideration. Continuously expanding methods of collecting personal data, combined with public awareness of data privacy concerns, bring us to the point where regulations such as GDPR and CCPA aren’t enough. Responsible organizations will recognize that doing the right things with data must be internalized, not treated as the responsibility of external regulators. CDOs will begin to see data ethics as a priority, and data governance organizations will be charged with socializing ethical data principles and practices and framing a data ethics code of conduct. (Dave Wells).

#6. More Companies Hire Full-Time DataOps Engineers

In the last few years, an increasing number of organizations have started to adopt the DataOps approach to data pipeline development. Like DevOps in the software world, DataOps improves product quality and reduces development times through a focus on the processes and technologies that go into creating new solutions. In order for the methodology to be effective, however, it’s important to have technical staff devoted to implementing it. Currently, the responsibility for DataOps is often spread across several members of a data team, including data engineers and data architects. In the future, we will see more companies hiring dedicated DataOps engineers to oversee data development processes. This consolidation mirrors the earlier emergence of the DevOps engineer, and it is probable that many of the first DataOps engineers will come from a software development or DevOps background. (Check out my article “What is a DataOps Engineer?” for more on the kinds of folks who will fill this role). (Joe Hilleary).

#7. Cloud Data Platforms Swallow the Data Warehouse and Data Lake

As enterprises migrate their data to cloud infrastructure, they will embrace offerings that support BI reporting, machine learning workloads and collaborative data science – all on the same platform. Apache Spark-driven data lakes, cloud-native data warehouses and SQL query engines are converging on functionality that supports both analysts and data scientists, business intelligence and advanced analytics, structured and unstructured data. These cloud data platforms pair high-performance data warehouse structures with efficient, elastic cloud-native object storage. They will entice enterprises to replace legacy data warehouses and data lakes, and discard old product categories. (Kevin Petrie).

#8. Quality Standards Begin to Emerge for Machine Learning (ML)

Many organizations are recognizing the limits and quality deficiencies in the current state of ML. Model results in testing and results in the real world can differ widely. Some models are never deployed. But more concerning are the models that are deployed and should not be. Data drift (the tendency of input data changes to degrade model performance) and under-specification (failure to use all important predictor variables) are the primary contributors to behavioral differences as models move from training/testing to real-world deployments. These are ML quality issues that need attention to avoid deploying models that do real harm, and they should be the initial focus of ML quality standards. This will not be the year that solves the problem, but the year when first steps are taken. If ML quality standards come to the attention of the International Organization for Standardization (ISO) in 2021, we may see ISO standards by 2024 or 2025, as it typically takes about three years from initial proposal to publication for a set of ISO standards. (Dave Wells).
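To make data drift concrete, here is a minimal sketch of one common way to measure it: the Population Stability Index (PSI), which compares how a feature was distributed at training time with how it is distributed in production. This is an illustration, not a method proposed in this prediction. The function name, the sample data, and the bin count are all assumptions, and the frequently cited cutoffs (roughly 0.1 for "watch" and 0.25 for "investigate") are rules of thumb rather than a standard.

```python
import math
import random


def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a new sample.

    Bins are equal-width and derived from the baseline (training) sample.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical data: one feature at training time vs. two production samples.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
stable = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # same distribution
shifted = [rng.gauss(1.5, 1.0) for _ in range(5000)]  # mean has drifted

print("PSI, stable sample: ", round(psi(train, stable), 4))
print("PSI, shifted sample:", round(psi(train, shifted), 4))
```

A check like this only flags that inputs have moved. It says nothing about under-specification, which has to be addressed when the model is designed and validated.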

#9. Companies Invest in Data Governance Technologies, Staff, and Certification Programs

In 2020, we started to see regulatory bodies reinforce the relationship between data governance and safe, reliable data and compliance. Citibank was fined $400 million in October for deficiencies in programs that include data governance. In November, the European Union released a draft of data governance requirements for data sharing. Organizations will be forced to pay more attention to the effectiveness of their data governance programs and be prepared to prove that they have the right level of rigor for the data they collect and process. Ineffective "hollow shell" programs with no substance will be seen as a liability and will need to be improved to pass scrutiny. This push for demonstrable data governance will result in a demand for certification programs and translate to more investments in technology and staff to support data governance programs. Staff and technology are required to reach optimal effectiveness, while certification programs will be seen as a way to proactively demonstrate that steps have been taken to ensure sufficient governance is maintained. (Sean Hewitt)

#10. Data Pipeline Observability Goes Mainstream

To support the rigorous demands of production analytics and AI, enterprises will seek a more comprehensive approach to monitoring and managing data pipelines, a technology approach now called “observability.” Explosive data growth creates complexity on every dimension. New data consumers, use cases, sources, tools, and platforms make it impossible for enterprise data teams to monitor and control data pipelines with traditional IT monitoring tools. In contrast, new observability tools monitor, detect, predict, and resolve myriad issues that cascade from source to consumption across a variety of enterprise systems and applications. Data pipeline observability solutions from Acceldata, New Relic, Pepperdata, and others can dramatically improve the performance, reliability, and availability of data pipelines by detecting and correlating issues across increasingly complex data infrastructures. This will be good news for site reliability engineers, platform engineers, data engineers, and architects – as well as the business owners they serve. (Kevin Petrie)

There you have it. Ten predictions for 2021. To understand whether any of these predictions are worth the electronic ink they’re written with, check out Joe Hilleary’s article this month that reviews how we fared with last year’s predictions. It seems that we’re decent prognosticators!

Wayne Eckerson

Wayne Eckerson is an internationally recognized thought leader in the business intelligence and analytics field. He is a sought-after consultant and noted speaker who thinks critically, writes clearly and presents...
