The Machine Learning Lifecycle and MLOps: Building and Operationalizing ML Models - Part IV
Read - The Machine Learning Lifecycle and MLOps: Building and Operationalizing ML Models - Part I
Read - The Machine Learning Lifecycle and MLOps: Building and Operationalizing ML Models - Part II
Read - The Machine Learning Lifecycle and MLOps: Building and Operationalizing ML Models - Part III
When you computerize human cognition, you actually create a lot of new work for humans.
Machine learning presents enterprises with the opportunity to win new customers, increase margins, and improve productivity by automating human decisions. But building and operationalizing machine learning models also introduces complexities. Enterprises can achieve the right results by carefully dividing labor among their stakeholders: business owners, data scientists, data engineers, machine learning engineers, and DevOps engineers. Those stakeholders must help one another lead, support, and learn the many tasks of a machine learning initiative.
First, some basics. The job of a machine learning (ML) algorithm is to discover patterns in data. These patterns help people or applications predict, classify, and prescribe future outcomes, such as customer actions, market prices, or fraud. A fully developed and “trained” algorithm becomes a model, which is essentially an equation that defines the relationship between key data inputs (your “features”) and outcomes (also known as “labels”). ML applies various techniques to train and thereby create this model. Techniques span supervised learning, which studies known prior outcomes, and unsupervised learning, which finds patterns without knowing outcomes beforehand.
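To make the idea of a model as "an equation" concrete, here is a minimal sketch of supervised learning in plain Python. The data, the feature (monthly site visits), and the label (monthly spend) are hypothetical and chosen only for illustration. Ordinary least squares stands in here for whatever technique a real team would select.

```python
def fit_line(features, labels):
    """Train: find slope and intercept for label = slope * feature + intercept."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(labels) / n
    # Ordinary least squares for a single feature.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, labels))
    var = sum((x - mean_x) ** 2 for x in features)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, feature):
    """Apply the trained equation to a new input."""
    slope, intercept = model
    return slope * feature + intercept


# Hypothetical historical data: known features paired with known labels.
visits = [1.0, 2.0, 3.0, 4.0]
spend = [3.0, 5.0, 7.0, 9.0]

model = fit_line(visits, spend)      # the "trained" model is just two numbers
forecast = predict(model, 5.0)       # predicted spend for an unseen input
```

The trained model is nothing more than the pair of numbers returned by `fit_line`. Production models have many more parameters, but the relationship between features, labels, and the resulting equation is the same.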
Enterprises build and deploy ML models in a lifecycle that comprises three phases: (1) data and feature engineering, (2) model development, and (3) model operations. The first three blogs in our series described these phases:
1: Data and feature engineering. Ingest and transform your input data and label the historical outcomes. Then derive the features from your input data that best predict historical outcomes.
2: Model development. Select your ML technique, and “train” that algorithm on historical data (containing features and labels) until it becomes a production-ready ML model.
3: Model operations. Implement your model in production workflows. Monitor the operational performance, accuracy, and cost of this model, and organize your models in a governed catalog.
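The three phases above can be sketched as three small functions. This is an illustrative toy under stated assumptions, not a reference implementation: the fraud scenario, the field names (`amount`, `account_age_days`, `was_fraud`), the ratio feature, and the midpoint-threshold "model" are all hypothetical.

```python
from statistics import mean


def ratio_feature(order):
    """Hypothetical feature: order amount relative to account age."""
    return order["amount"] / max(order["account_age_days"], 1)


# Phase 1: data and feature engineering.
def engineer_features(raw_orders):
    """Turn raw historical records into (feature, label) pairs."""
    return [(ratio_feature(o), o["was_fraud"]) for o in raw_orders]


# Phase 2: model development.
def train_threshold_model(rows):
    """Pick a threshold midway between the average feature of each class."""
    fraud = [f for f, label in rows if label]
    legit = [f for f, label in rows if not label]
    return (mean(fraud) + mean(legit)) / 2


# Phase 3: model operations.
def score(threshold, order, prediction_log):
    """Score a live order and log the prediction for later monitoring."""
    feature = ratio_feature(order)
    flagged = feature > threshold
    prediction_log.append({"feature": feature, "flagged": flagged})
    return flagged


history = [
    {"amount": 500.0, "account_age_days": 2, "was_fraud": True},
    {"amount": 900.0, "account_age_days": 3, "was_fraud": True},
    {"amount": 100.0, "account_age_days": 100, "was_fraud": False},
    {"amount": 60.0, "account_age_days": 20, "was_fraud": False},
]

threshold = train_threshold_model(engineer_features(history))
prediction_log = []
risky = score(threshold, {"amount": 400.0, "account_age_days": 1}, prediction_log)
routine = score(threshold, {"amount": 50.0, "account_age_days": 200}, prediction_log)
```

Training and scoring both call the same `ratio_feature` function. Sharing the feature logic is one simple way to keep features computed in production aligned with those used in training.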
You can expect to repeat the phases of the lifecycle. If your ML technique doesn’t fit your training data, you go back and select a new technique. If your initial training results disappoint, you go back to change your input data or define new features within that data. And once the model is in production, its results will drift over time, forcing you to repeat those steps.
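One simple way to notice drift is to compare recent accuracy on live data against the accuracy measured when the model was released. The sketch below assumes you can eventually observe the true outcomes. The function names and the 0.05 tolerance are hypothetical, and real monitoring would likely track more than one metric.

```python
def accuracy(predictions, outcomes):
    """Fraction of predictions that matched the observed outcomes."""
    correct = sum(1 for p, o in zip(predictions, outcomes) if p == o)
    return correct / len(outcomes)


def needs_retraining(baseline_accuracy, recent_predictions, recent_outcomes,
                     tolerance=0.05):
    """Flag the model when recent accuracy falls too far below the baseline."""
    recent = accuracy(recent_predictions, recent_outcomes)
    return (baseline_accuracy - recent) > tolerance


# Hypothetical example: the model scored 0.9 at release time.
recent_predictions = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
recent_outcomes = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]   # 7 of 10 correct

recent_accuracy = accuracy(recent_predictions, recent_outcomes)
retrain = needs_retraining(0.9, recent_predictions, recent_outcomes)
```

When the check fires, the team returns to an earlier phase: new features, new training data, or a different technique.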
“The model is never right the first time. Ever,” Jim Swanson, CIO of Johnson & Johnson, observed in a recent webinar with Domino Data Lab. “Your data is not right, your hypothesis may be askew… You have to build an iterative learning approach.” To help make this happen, Johnson & Johnson created a data science council that fosters collaboration and knowledge-sharing across the enterprise. In a similar vein, the food delivery platform DoorDash created a machine learning council to democratize ML adoption. Hien Luu, head of ML infrastructure at DoorDash, described their council as the “glue” for cross-functional collaboration and knowledge sharing when speaking recently at Tecton’s apply() conference.
Collaboration and knowledge sharing matter because ML initiatives take the primary stakeholders—business owners, data scientists, data engineers, ML engineers, and DevOps engineers—out of their comfort zones. They must help each other and teach each other. Here are some guidelines about what activities stakeholders should lead and what activities they should support. We also define stakeholders’ comfort zones, and the skills and domain knowledge they should learn to deliver results.
Stakeholder Roles in the Machine Learning Lifecycle
Business owner
Comfort zone: business domain and spreadsheets.
What they support: The business owner plays a supporting role, but it is a critical one that extends across the entire lifecycle. They provide the business objectives and questions for an ML initiative to address. They consult with the data scientist, and potentially the ML engineer, to ensure the ML initiative remains focused on those objectives and questions. They also provide invaluable domain knowledge, sponsor the initiative, and help the data scientist communicate with executives.
What they learn: The business owner must learn fundamental principles of ML and key aspects of their ML initiative. These include attributes of the input data, steps in each ML phase, and ML techniques the data scientist uses. The business owner also must understand how the ML model fits into operational workflows, and the interdependencies and risks that introduces.
Data scientist
Comfort zone: mathematics and programming.
What they lead: The data scientist serves as quarterback of the ML lifecycle from start to finish. They take the guiding objectives and questions from the business owner, study available datasets, then devise an ML model to answer those questions. They lead the data and feature engineering phase jointly with the data engineer who best understands the data. They lead the model development phase on their own, then partner with the ML engineer to jointly lead the tricky effort of bringing a model into production.
What they learn: The data scientist must learn how to explain statistical principles and ML techniques—and their business implications—in simple terms to business owners. They also must learn basic aspects of data pipelines to collaborate effectively with data engineers. They likely already have the necessary programming expertise to collaborate effectively with ML engineers and DevOps engineers during the ML operations phase.
Data engineer
Comfort zone: data pipelines.
What they lead: The data engineer jointly leads the data and feature engineering phase with the data scientist. They design, configure, execute, and monitor pipelines that ingest and transform data from various sources into a usable format for the ML algorithm to process during the training phase.
What they support: The data engineer supports the data scientist during the model development phase by managing the pipelines that deliver training data. They support the data scientist and ML engineer during the ML operations phase by integrating, managing, and monitoring the pipelines that deliver live data to production models. They also might help data scientists and ML engineers monitor the performance and accuracy of production models.
What they learn: The data engineer must learn fundamental principles of ML to understand how models consume data.
Machine learning engineer
Comfort zone: ML and programming.
What they lead: The ML engineer serves as the running back who takes the handoff from the quarterback and runs the ball down the field. That is, they take the trained model from the data scientist and put that model into production by integrating it with operational workflows, all under the oversight of the data scientist. For example, they might convert a model from R to the Java code of the production application. Then they monitor and help govern the model to ensure it meets performance, accuracy, cost, and compliance requirements. Finally, the ML engineer huddles with the data scientist to jointly decide when it is time to re-engineer features or retrain the model.
What they support: The ML engineer supports the data scientist and data engineer during the data and feature engineering phase by ensuring their data pipelines, features, and labels align with the production environment. They support the data scientist during the model development phase by helping them understand production requirements.
What they learn: The ML engineer must learn basic aspects of data pipelines to support data engineers effectively. They must learn the fundamentals of DevOps to understand the requirements of their production environment.
DevOps engineer
Comfort zone: programming and IT operations.
What they support: The DevOps engineer supports the data scientist and ML engineer during the model operations phase. They apply continuous integration and continuous delivery (CI/CD) methods to operationalize and rapidly update models while maintaining quality standards. They help the ML engineer write, test, debug, and release software code that integrates the model with production applications and workflows.
What they learn: The DevOps engineer must learn fundamental principles of ML to understand how models can best integrate with production workflows.
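As a rough illustration of how a release pipeline can update models "while maintaining quality standards," the sketch below shows an automated gate that a pipeline step might run before promoting a candidate model. The metric names, thresholds, and the `release_gate` function are hypothetical and not tied to any specific CI/CD tool.

```python
def release_gate(candidate, production, min_accuracy=0.85, max_latency_ms=100.0):
    """Return a list of reasons to block the release; empty means promote."""
    failures = []
    if candidate["accuracy"] < min_accuracy:
        failures.append("accuracy below minimum")
    if candidate["latency_ms"] > max_latency_ms:
        failures.append("latency above budget")
    if candidate["accuracy"] < production["accuracy"]:
        failures.append("accuracy worse than production")
    return failures


# Hypothetical metrics gathered by earlier pipeline steps.
production = {"accuracy": 0.88, "latency_ms": 45.0}
good_candidate = {"accuracy": 0.91, "latency_ms": 40.0}
bad_candidate = {"accuracy": 0.80, "latency_ms": 150.0}

good_result = release_gate(good_candidate, production)
bad_result = release_gate(bad_candidate, production)
```

Returning the reasons rather than a bare pass or fail gives the ML engineer and data scientist something concrete to act on when a release is blocked.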
Many early adopters are fostering the right level of collaboration and knowledge sharing across their teams by assigning roles in this fashion. Our next blog, the last in this series, will chart the landscape of tools that data scientists and their colleagues can use as they lead, learn, and support the key phases of the ML lifecycle.