The Future of ML Governance and Data Management with Kevin Petrie

Originally published on MLOps Weekly Podcast.

For episode 21 of the MLOps Weekly Podcast, Simba Khadder and Kevin Petrie, VP of Research at Eckerson Group, delve into the challenges and opportunities of integrating MLOps in traditional enterprises. They discuss strategies to overcome technical debt in implementation, the pivotal role of data in the success of ML projects, navigating regulatory compliance in machine learning, and the future of AI governance.

Transcript:

Simba Khadder 

Hey, everyone, Simba Khadder here, and you're listening to the MLOps Weekly Podcast.

Simba Khadder 

Today, I'm speaking with Kevin Petrie. Kevin is the VP of Research at Eckerson Group, where he manages their research agenda and writes about topics such as data integration, data observability, and machine learning. For 25 years, Kevin has deciphered what technology means to practitioners. He's been an industry analyst, an instructor, a marketer, a services leader, and a tech journalist.

Simba Khadder 

He launched a data analytics services team for EMC Pivotal in the Americas and in EMEA, and ran field training at the data integration software provider Attunity, which is now part of Qlik. He's a frequent speaker and the co-author of two books about data management. He also loves helping startups educate their communities about emerging technologies.

Simba Khadder 

Kevin, so great to have you today. I just gave a quick intro on you, but I'd love to get your story of how you got to the position you are in today. 

Kevin Petrie 

Simba, really appreciate the opportunity to be here. I run the research group here at Eckerson Group. We're a boutique research and consulting firm focused on data analytics. I came from the vendor side, among other things.

Kevin Petrie 

I was with EMC for about 10 years. During that time, I ran an analytics services team with EMC Pivotal. Then I was a client of Wayne Eckerson for a number of years when I was with a data integration vendor called Attunity, which got acquired by Qlik. I really enjoyed reading Wayne Eckerson's reports. We're both folks that love the written word and love to simplify complex technology and teach it to business people.

Kevin Petrie 

That's fundamentally what I love about what I do now, looking at bleeding-edge opportunities, trying to translate that to IT team leaders and business leaders, help them understand what it means to them so that we're technically deep, but we're also helping folks understand the high level. That's a little bit about what makes me tick. A circuitous route, but I really enjoy where I landed here about three and a half years ago.

Simba Khadder 

When we think of enterprises, I'm curious to understand or get your take on... Everyone's talking about AI  and ML. What makes it specifically hard for an enterprise? There's a lot of things, but I'd love to get your take on the pillars that specifically make it hard to get to the bleeding edge in AI and ML if you are a large enterprise. 

Kevin Petrie 

Great question. I think that there are a few elements that make this stuff hard. One is that, as I mentioned,  we have two sides of the house here at Eckerson Group. We have research, where we're working with innovative companies to talk about the very latest bleeding-edge tools and opportunities. Then we've got consulting. 

Kevin Petrie 

We noticed about a five to seven-year gap between what cutting-edge vendors are talking about and what the average traditional enterprise, meaning an enterprise born before the cloud boom, say before  2010, what they're actually doing. There are a lot of stubborn, hard problems that have persisted for a  long time. They're not sexy. They're not as interesting to talk about on a keynote stage, but they boil down to silos, fiefdoms, a lot of technical debt. 

Kevin Petrie 

You've got this long tail of old stuff that, because of data gravity, sovereignty requirements, migration complexity, and cost, you're not going to be able to move to the cloud. That's one thing. You've got these hybrid, complex environments.

Kevin Petrie 

Another is that you have folks that have built up a fair amount of habits within a certain business unit that's different from the rest of the organisation, and it's hard to reconcile that. It's hard to retrain people.  There's also process. Process can get ingrained. 

Kevin Petrie 

Struggling with that technical debt related to people, and process, and technology, I think is the fundamental thing that makes any new technology initiative hard. That's certainly true with AI/ML, because you can take very cutting-edge tools and you can empower a smaller business unit to do some great things. Fantastic.

Kevin Petrie 

But you want to think of the second-order implications: getting something just slightly wrong on a fraud prevention ML algorithm could have some pretty serious downstream implications. You want to think about making certain decisions in terms of data governance for one business unit and how that actually might cascade to or create tougher regulatory compliance processes for the rest of the business.

Kevin Petrie 

I think, in a nutshell, it's really dealing with history. It's the fact that you don't have a fresh start if you're a company that was born before the cloud came around.

Simba Khadder 

I love that split you had of the people, process, and technologies. We sometimes, ourselves, when we're selling, we'll say organisational problems and technology problems, and we create that distinction. One,  all these things, how deeply tied are they? When you think about, let's say, the weights of those problems, if I'm like, "Hey, I can focus on getting new technologies in, or I can focus on process," is there one you could focus on? Is it even possible to focus on one of them? 

Kevin Petrie 

I think that it helps to start with the people, and make sure that you have the right people, and that you're educating and motivating them in the right way to make sure that they can take advantage of the latest technology and that they can adapt their process. If you start with the people, you can address, I'd say,  more than a third of the problem, half or more of the problem related to taking advantage of new technology opportunities. 

Kevin Petrie 

Then it's time to go to process and figure out, "What needs to change?" Then you go to technology and  say, "What are the very latest tools that can help us achieve the business objectives that we have filtering  down to IT from executive leadership?" That's one way to look at it. 

Kevin Petrie 

There is a tendency, of course, because we're technology people and we get excited, rightfully, about new stuff, to start with the technology and start with the latest bells and whistles, start with advanced algorithms that can predict customer actions, recommend customer purchases, prevent fraud, do all sorts of things, predict prices and personalise content on the web. Those are very exciting bells and whistles, and those are pretty powerful algorithms that are now available off the shelf from a lot of public libraries.

Kevin Petrie 

The bigger question is, do you have the right data to feed that and do you have the right people and process to support it? 

Simba Khadder 

You mentioned you're almost fighting against your company's built-up debt. When I'm in that situation, on one hand, I see the explosion of innovation coming from AI and ML, and there are so many almost obvious use cases to empower the business. At the same time, I'm dragging along potentially even hundreds of years of debt, both technical and organisational, et cetera, trying to make this happen. What's the pattern you've seen work to bring innovation to a larger organisation that's, let's say, fighting against a lot of existing technical debt?

Kevin Petrie 

What works, I think, is starting with something that is bite-size. Start with a problem that is demonstrable, find a group that's in pain, and then spin up a tiger team, a group of innovative, forward-thinking folks who have fewer dependencies process-wise on the rest of the business, and see what they can cook up. It might be that you've got a website or part of your website focused on a new customer segment or a new offering, and you're less tied down to the rest of the business. You can create some pretty cool content personalisation algorithms or customer recommendation algorithms.

Kevin Petrie 

If you can start to innovate in a modular way, and if you can demonstrate some quick success to the rest of the business, that can create confidence, it can give you a learning curve, it can help you demonstrate business results to the rest of the business and get the right political support, the right momentum to go broader, and start to roll out some of the things you learned in terms of people, process, and technology to the rest of the business.

Kevin Petrie 

I think starting small, looking for a quick win, those are common, almost cliché phrases, but they're very appropriate here because we all have a tendency to fixate on the very latest cutting-edge tools, rightfully. There's some really incredible, powerful stuff out there, but looking for a quick win is a good way to get started to make sure you don't take on too much at the outset.

Simba Khadder 

That makes a lot of sense. It's almost this idea of creating momentum and building off of that momentum to get where you need to get. It's because there's always going to be roadblocks. There's always going to be hurdles. There's always going to be some level of organisational debt to just fight your way through. The more momentum you have, the easier it is to just push through all that.

Simba Khadder 

One piece of the equation, which I know comes up a ton, is the data. The data is almost, in some way, like a projection of a lot of the debt that gets created. First broadly, if you're working on ML or AI, it all starts with the data. If you don't have good data, there's not really much you can do. What are the common pain points and pillars that you see around the data part of the ML/AI process?

Kevin Petrie 

I'll start by agreeing with you wholeheartedly that data is 90% of the problem and 90% of the opportunity because even with the very latest large language models, you look at what Databricks said when they came out with Dolly. They said, "You know what? We can take older, more rudimentary models, but if we  apply it to a relatively small but clean set of inputs, we can generate some pretty stunning results." 

Kevin Petrie 

It just shows that all the innovation and algorithms, that's not the big problem right now. The big problem is the data. If you look at the data, there are some fundamental recurring themes here. There are silos. 

Kevin Petrie 

If you look at structured data, we'll start with that, there are siloed, sometimes conflicting or disparate views of entities such as customers, such as suppliers, such as employees. There's a need for basic master data management there. There's a need for consolidation or at least reconciliation so that you have one version of the truth wherever possible as opposed to multiple. There's also a need to look at unstructured data and figure out how to enrich your insights on business opportunities, on entities, and so forth by extracting insights from the unstructured data and co-mingling it with the structured data.

Kevin Petrie 

I think those are some fundamental issues related to, broadly speaking, data quality that persist time and time again. The notion of data-centric AI, I think, really makes sense because philosophically, it starts to really get at that problem of making sure that you have clean, well-organised input. Labeling of inputs goes a long way as well.

Simba Khadder 

I totally agree. Data-centric AI is really the only kind. I almost feel like that's all we do. Everything is more about making the data usable. The models are getting to the point where we're almost coming to straight-up APIs. It's more about orchestrating the data, moving the data, getting to the right data, cleaning it up, et cetera, much more so than having the most cutting-edge model, which almost feels like it's being commoditised.

Simba Khadder 

On the data side, I noticed almost two trends that, in a funny way, almost seem to counteract each other.  One is this huge centralisation of data. Let's get all of our data into this data lake. At the same time, I'm seeing this concept. It's like loosely data mesh, but it's very much this idea of, "Hey, all these teams are  maintaining their own databases, which allows them to move faster, allows them to iterate faster." 

Simba Khadder 

Then the goal becomes unifying an abstraction, or almost like an interface, or a protocol over the disparate data sets. Is that fair? Do you see that, too? Do you see those things fundamentally rubbing against each other, or can they both exist in the same organisation?

Kevin Petrie 

Really good question. It gets to a lot of the conversations we have within our research team here. There is definitely a pendulum effect over time. If we look at the rush to the cloud during, say, 2015 to 2020, there was a desire to consolidate onto a cloud data lake or a cloud data warehouse. Now, the notion of a combined, whatever we want to call it, the lakehouse, certainly works: this common pattern of SQL on top of object stores.

Kevin Petrie 

There was a desire to consolidate as much as you can of analytical data in one repository. But the reality is that data consolidation, I want to say it failed. We still have very decentralised data sets. 

Kevin Petrie 

The reason is that there's the long tail of old stuff that because of data gravity, inertia, what have you, it might be a fraction of what you have, but it's going to stay in mainframe on-premise, older stuff. It's going to be there for a while. You've also got the desire to... 

Kevin Petrie 

Most companies now work with multiple cloud service providers. They're trying to optimise certain  workloads on one cloud versus another, maybe gain some pricing leverage, maybe take advantage of certain offerings in certain regions, so that multi-cloud trend is continuing. 

Kevin Petrie

There are things that limit what you can really achieve with data consolidation. That recognition of data decentralisation is part of what has created this enthusiasm about the notion of a data mesh, because now you can say it's okay that we have these islands of data. Let's start to empower people out in the business units to own and provide it to the rest of the business. 

Kevin Petrie 

But I think the shift now is people say, "That's fine. If we accept the data is decentralised, somewhat, we  need to have a common semantic layer, a common management plane on top." That's where you see, I  think, some more investment now in order to figure out how to manage things across decentralised environments in somewhat of a uniform way. 

Simba Khadder 

I totally agree. I think that's what we've seen. We've even structured our own product. We call it the virtual feature store just because it's almost like, what would happen if you could virtualise this application layer without having to forcibly centralise the data? 

Simba Khadder 

We've seen a lot of the uptake comes from companies that are on-prem and in cloud, where it just doesn't make sense. It's just not possible, especially when you're dealing with really sensitive data. The work you have to do to get that and lift it off of your mainframe onto wherever is just enormous. 

Simba Khadder 

I want to talk a bit about that, too, because there's also this component of regulatory risk and governance and all the components that come around that aspect. First, I'd love to just get almost your lay of the land.  If you are a large organisation enterprise, you have financial data, you have a variety of different user data. What is that like? What regulations come up a ton? What things should people in those positions be thinking about as they're working with this data? 

Kevin Petrie 

The regulatory environment continues to evolve. I think there's some pretty common patterns. If you're a global organisation, you've been dealing with GDPR in the European Union for some time now, and that's going to force you to make sure that you're only taking actions with customer data that they have explicitly authorised. 

Kevin Petrie

The strong corollary to that is the California Consumer Privacy Act, the CCPA. We've got several other state versions cropping up in the United States that are similar. Broadly speaking, they have the same principles. 

Kevin Petrie 

If you get down to the regulatory fine print, it is hard for companies now because they've got to figure out, "I'm complying with CCPA, does that mean I've got it licked for all the United States? Does that mean I'm okay with GDPR?"

Kevin Petrie 

I was at an event in January with the CEO of L.L.Bean, and he was lamenting the fact that they had to have multiple compliance teams solving multiple compliance requirements. He said, "Why don't we get a  universal standard here, at least for the United States?" Globally would be great. 

Kevin Petrie 

The first trend is that consumer privacy trend, making sure that you're only taking authorised actions with data. That applies to both business intelligence and new analytics workloads such as data science. That's one broad set of activities that's important. 

Kevin Petrie 

The next one is figuring out, "What are we going to do with artificial intelligence?" Because now we need to make sure that we have some visibility. We need to get past the black box problem to understand we actually know what actions we're taking with the customer data, and then we can explain it to people. 

Kevin Petrie 

The European Union has some draft legislation looking at data science, looking at artificial intelligence. It'll take a while for that thing to shape up. But I think that what companies are taking a much harder look at are more basic questions. This explosion of interest in large language models since ChatGPT came onto the scene in November has, I think, stunned the world and stunned enterprises, who look at their teams and find that, according to our surveys, nearly half of data engineers are already using ChatGPT to help them do their jobs.

Kevin Petrie 

They're starting to say, "Whoa, great. If you can get some productivity benefits, it's fantastic." But this is also the Wild West because with ungoverned inputs, you're going to get ungoverned outputs, and it creates a lot of risk from a compliance perspective, from an accuracy perspective.

Kevin Petrie 

I think the next wave of innovation is going to focus on AI/ML governance and looking at some basic questions, which is, do I know why this algorithm told me to do something, and do I trust it? Those are easy questions to ask, very hard questions to answer. I think there's going to be a lot of focus on it from companies and from vendors. 

Simba Khadder 

When we work with certain types of companies, I find that a lot of problems around governance are handled via, essentially, meetings and committees. Before I get a feature in production, I need to talk to this team and this team, et cetera. You mentioned there's always productivity gains. There's always business opportunities that you can take advantage of if you are using this new technology.

Simba Khadder 

First, what's the state of the world now? How is governance in practice actually being applied at these enterprises? One, and then two, I'd love to get a sense of the utopia state. What should it look like? What would be in theory, the best way that this would look in the future? 

Kevin Petrie 

I think broadly speaking, the companies that we work with... There are some very common trends across companies. One is to get much more serious about cataloging their data and assembling the right metadata of all their different data sets throughout the organisation. It's hard to do, like everything, easier said than done. That's one effort that's underway. 

Kevin Petrie 

Another is to modernise and move as much as you can onto a common cloud-based repository for analytics. Cataloging and cloud consolidation are big trends. There's also a desire now to invest quite a bit in training, in fostering best practices. There's a focus on data literacy. There's also a focus on enablement, making sure that, at the grassroots level, the people in your organisation who are starting to use ChatGPT or Bard or Bloom, whatever it is, are doing so in consistent ways, and that they've been trained on some easy-to-understand guidelines about what's in bounds and what's out of bounds.

Kevin Petrie 

We find that creating a centre of excellence to help foster consistent best practices for using both new and established technologies is the right way to get that going.

Kevin Petrie

Those are some of the big trends that we see in terms of getting your arms around this governance problem. 

Simba Khadder 

It's funny, it comes back to your same people, process, technology. The part that almost feels skipped over, because it's not that sexy, is the people part. How do we get people in our org? Even if we don't necessarily have this perfect technology, how can we get it to a point where people just get it? They're well trained, they understand how we think about things, and they can make the right decision even given the mix of ambiguity that exists. Do you buy that? Is that fair of how you think about it?

Kevin Petrie 

I think that we're all still a little bit in shock about the capabilities, and I keep going back to large language models, but it's forced us to ask some hard questions about AI overall. We're all still in shock about how powerful it is, but I think that's starting to settle down. 

Kevin Petrie 

One of the common patterns that's starting to emerge is that vendors and companies are going to create smaller language models, SLMs, that have curated, governed inputs, and they're solving small, tactical, focused problems. It's not a big Wild West. Rather, it's something that a company working closely with a vendor can make sure they're solving in a governed way.

Kevin Petrie 

I think things will settle down on that front. It's going to take some time, and it'll probably take some missteps by a lot of companies in the meantime as well.

Simba Khadder 

I have a deck that has a slide where I defined an SLM, a small language model.

Kevin Petrie 

There you go. You beat me to it. I was thinking of coining the term. I'm sure it's appearing right on board with you. 

Simba Khadder 

I was joking. I'm like, "Well, if GPT is an LLM, does that make Bard an SLM?" Just maybe starting to string together the traditional ML and the more AI-focused new wave, the new paradigm that's been emerging in the last six months to a year.

Simba Khadder 

I'm curious to ask about that. In the enterprise, we're seeing GPT provide this new paradigm. The idea of a prompt as a data scientist, it wasn't really a concept before. The idea of thinking about your prompt,  constructing prompts, that was never something we'd ever think about just because the models just worked very differently. 

Simba Khadder 

Now, if I am a team, and there's a lot of companies that have chatbots, they have chatbot teams. The chatbot teams are working in what's called more traditional methods of NLP. Is ChatGPT going to wipe out a lot of traditional ML use cases? Is everything going to get replaced with GPT or similar LLMs? How do you think about that? Where are the lines and distinctions between where it makes sense and where it doesn't make sense?

Kevin Petrie 

My guess is that there will be a profound transformation, but it'll take time to really shake out, and the implications are hard to predict right now. GPT got there first but also had some of the most interesting examples of ungoverned outputs. I don't know that they're necessarily going to win the war. Google is doing quite a bit, and Google, I think, has a very high interest in making sure that these are governed inputs and outputs. 

Kevin Petrie 

I think that what's more likely to happen is that you will have SLMs, small language models, in governed rollouts using commercial technology from vendors, using homegrown stuff from open-source communities or a mixture thereof, where companies start to roll out much more focused language models. That'll probably become the norm because people have higher confidence in it than at least the current versions of ChatGPT.

Simba Khadder 

It's really interesting. It makes a ton of sense. I've actually not heard this take as strongly. I think it's very much, in my head, it's the most clear path forward for an enterprise. If you're a large bank, figuring out how to make LLMs work is a lot harder than trying to figure out how to make an SLM work. 

Simba Khadder 

I want to come back to the COEs. Obviously, a lot of larger enterprises have COEs, and I've seen many different ways of trying to implement them. I'm curious, from your perspective, what makes a centre of excellence successful? How can you successfully show and nurture these best practices across a large org? If you think SLM is the way, how do you build a team that really can show that, one, and two, disseminate it to the rest of the organisation?

Kevin Petrie 

Great question, because it's real easy to spin up something cross-functional that involves dotted lines and voluntary time on top of a day job, and it just goes away into the ether after a few months. But if you start by saying, "We need executive buy-in and we need executive commitment of dedicated time at the top to actually foster this community centre of excellence."

Kevin Petrie 

Then we need, within the different business units and different IT organisations, line managers that are told, "Your people are going to dedicate X% of time to this." Then you've created enough time, enough motivation and buy-in that people give it the calories it needs. You need to balance all of this: the need for centralisation and control with the need for innovation and decentralisation.

Kevin Petrie 

Start with broad brush guiding principles, and start to boil down to specific policies and best practices,  and find people at the individual contributor level or the team manager level who are having success putting those guidelines to work, celebrate that success, help them share those best practices with their peers, and start to foster a community where people have a vested interest in learning from one another because ultimately, they're now helping one another be more productive in their day jobs. 

Kevin Petrie 

I think those are some of the common practices that we see. What's interesting about it is that none of it is distinct to AI/ML or to other specific technologies. Rather, this is the art of human management that is always elusive. There's some timeless principles that are still hard to master.

Simba Khadder 

I love how you frame it. It's almost like making innovation a process, something that you do. I still have so many questions. I feel like we could continue to talk about this all day.

Simba Khadder 

I would love just to leave off, for someone who wants to follow you, follow your research, I know you write a lot and share a lot of work about this, where should they look for you? 

Kevin Petrie

I am on LinkedIn quite a bit. I post each day and love to engage community members there. I find that some of our best research comes from people that we find on LinkedIn through polls, through messaging,  and so forth. I encourage folks to look for me on LinkedIn. 

Kevin Petrie 

If you think I'm wrong with something I say, tell me. Let's engage. I would love to have those conversations. Pro and con, I find that it really enriches our research process and ultimately the value that we can provide to our community of practitioners. 

Simba Khadder 

Awesome. We'll include all those links in the description. This has been such a pleasure. Thank you so much for hopping on and sharing all these insights with us.
