Operationalizing Data Science: 13 Challenges

Data science is at a critical crossroads.
It is a dangerous intersection with lots of crazy drivers whizzing by. 

Here’s what the data science vendors think their customers are saying: “I wish I could unleash the power of AI and data science within my organization.”

Here’s what their customers are actually saying: “Data science is pretty complicated. Right now I just want to drag and drop some dimensions and measures and get my report.”

There is a disconnect (not surprisingly) between the hype in the media and the reality of what mainstream businesses actually request and consume. Data science is at a critical crossroads. It is a dangerous intersection with lots of crazy drivers whizzing by. Getting through it will take some luck and some discipline to avoid a crash.

Many technologies have crashed before. In the 1980s (the last time Artificial Intelligence was at the peak of its hype cycle) rule-based expert systems were the dominant AI lifeform.

Rule-based expert systems were a sophisticated technology in which the knowledge of an expert (a doctor, for example) was encoded into rules that could then replicate that expert’s decisions and behaviors. The systems were very successful in fields like medicine and manufacturing, but the tools and technologies were very complex.

There was also a dearth of ‘knowledge workers’ who had the skills to understand the complexity. They were in high demand and commanded extraordinarily high compensation. They built very complicated solutions for very important problems, and those solutions were mostly successful. But the solutions were surprisingly difficult to maintain and sometimes resulted in unanticipated behaviors that led to large failures.

Rule-based expert systems were complex and difficult to control. Despite being enormously powerful they never transitioned from ‘rocket science’ to an operationalized technology that was safe, repeatable and predictable. No one talks about rule-based expert systems anymore.

Now I know that AI is nearing the top of its next hype cycle and all the caveats from AI’s last hype cycle can be re-applied. But also try replacing ‘rule-based expert system’ in the paragraphs above with ‘data science’ and you’ll find many similarities.

Expert systems never successfully navigated to the other side of that dangerous intersection to become part of the day-to-day operations of an organization. Though very powerful, they ended up going the way of the brontosaurus.

Data science is at grave risk from the same challenges. Complexity, unpredictable behaviors and large, embarrassing failures from misuse are just as much hallmarks of data science as predicting the future or growing profits and revenue with little increase in investment.

So how do we avoid a possible data science crash at this important crossroads?

We, as a field, and I mean academics, scientists, product developers, data scientists, consultants … everybody … need to redirect our efforts towards operationalizing data science. We as practitioners can unleash the power of data science only when we make it safe and find a way to fit it into normal business processes.

As product providers we need to stop battling it out in Red Ocean with product-feature wars and instead try to really understand our customers’ needs … at the highest levels. We need to recognize that data science is already too complex to be sustainable or to have long-term business impact, except for one-time, heroic, special-purpose projects … delivered by superhero data scientists.

If folks are having trouble finding and hiring ‘data scientists’, that is an indication that something is wrong. Perhaps we are asking too much of the role of the ‘data scientist’ and need to distribute that role among different people with more limited and more focused responsibilities, who then all work together in a multi-disciplinary agile team.

As business users of data science we need to look for ways to create processes that make data science valuable, repeatable and systemic to our organization. We need to spend the time to build those multi-disciplinary teams and look for the tools that help to keep the team organized and productive. It will not be easy. What looks like a data and technology problem is really a political and business process problem.

I truly believe that data science (including predictive analytics, machine learning, AI, data mining and statistics) is a game changer. We are definitely swimming in Blue Ocean. But we are behaving as if it were Red Ocean and acting like sharks in a feeding frenzy. We have only begun to see returns on investment from data science, and so much more is possible.

To unleash the potential value of data science we, as vendors and practitioners, need guidance on where to focus our efforts. To that end, I am currently conducting research to prioritize 8 key success factors (whether best practices or product features) that lead to the operationalization of data science.

What do you think are the most critical challenges that stand in the way of operationalizing data science?

What do you think of my current list?

  1. Homogeneous teams that don’t have business representation
  2. Lack of a business goal or business plan for data science impact
  3. Model loses efficacy over time
  4. Temporal leakage
  5. No ability to rollback to any point in time for a checkpoint on data and model
  6. Single predictor is too highly correlated with target variable
  7. Privacy violations or regulatory violations
  8. Test and control are not well delineated
  9. An otherwise good variable goes bad over time
  10. Misunderstanding the meaning of a predictor field
  11. Good model results in bad business decisions
  12. No collaboration or sharing of knowledge of models that work
  13. Workflow not integrated and models are not deployed

 

EXPERT INSIGHTS

Many thanks to Benjamin Baer of FICO, Ted Fischer of IBM, Susara van den Heever of IBM, and James Serra of Microsoft for speaking to me and sharing their expert insights on this topic.

Stephen J. Smith

Stephen Smith is a well-respected expert in the fields of data science, predictive analytics and their application in the education, pharmaceutical, healthcare, telecom and finance...
