AutoML and Declarative Machine Learning: Comparing Use Cases
ABSTRACT: AutoML and the emerging approach of declarative ML help simplify the process of creating and refining ML models.
Sponsored by Predibase
A lot of human work goes into the development of a machine learning (ML) model. The data scientist identifies a business problem, develops a hypothesis, and trains the first model. But that’s just the beginning. From there, they evaluate the results, and experiment with different features and parameters to land on the best outcome in terms of time, cost, and accuracy. With luck, the outputs generate business value.
AutoML and the emerging approach of declarative ML help simplify the process of creating and refining ML models. This blog compares both approaches and the use cases they support. It introduces three criteria—ease of use, sophistication, and flexibility—to help enterprise data teams choose the right approach for their environment and requirements. It builds on an earlier blog that defined declarative ML and explained how it aims to simplify the lifecycle of creating, managing and deploying ML models.
AutoML software arose in the last decade as a method of abstracting and automating the ML lifecycle, including data and feature engineering; model training and development; and model production and monitoring. Enterprises use AutoML tools to “democratize” machine learning by helping non-data scientists make sense of complex data sets and define their features, then build and operationalize simple models that derive insights. Data scientists, meanwhile, use AutoML to standardize and accelerate their work. Typical AutoML tools offer a graphical user interface (GUI), support for simpler use cases, and a structured ML lifecycle.
Declarative ML software emerged inside leading tech companies like Uber and Apple this decade as a way to achieve the abstraction of AutoML while still giving experts granular visibility and control when they want it. Declarative ML enables data scientists, engineers, or analysts to define what they want to predict, classify, or recommend, then let the software figure out how to do it. They enter declarative commands without needing to specify the code, rules, or other elements that execute those commands.
This new approach to machine learning, while early in the adoption curve, builds on the tradition of mainstream declarative languages such as Hypertext Markup Language (HTML) and Structured Query Language (SQL). It also builds on recent declarative platforms such as dbt for data transformation and Terraform for cloud infrastructure management. Examples of declarative ML include the Overton project at Apple, the Looper project at Meta, and the open-source Ludwig project originally developed at Uber. Vendors such as Predibase also offer commercial platforms for declarative ML.
A declarative ML system suggests default features, parameters, and algorithms to train the ML model(s) on historical data. Based on a simple configuration file that defines the model and data, declarative ML allows data scientists and engineers alike to train models without low-level knowledge of frameworks such as PyTorch or TensorFlow. They can accept the default selections—or adjust them to meet custom, dynamic requirements—then implement and monitor the models.
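As a rough illustration of how a declarative system might fill in defaults, the Python sketch below expands a minimal configuration into a complete one. The key names (`input_features`, `output_features`, `encoder`, `trainer`) are loosely modeled on Ludwig-style configurations. The column names, the default values, and the `resolve_config` helper are hypothetical and chosen only for illustration.

```python
import copy

# Hypothetical defaults a declarative system might apply per feature type.
DEFAULT_ENCODERS = {"text": "embed", "number": "passthrough", "category": "onehot"}
DEFAULT_TRAINER = {"epochs": 10, "learning_rate": 0.001}


def resolve_config(user_config):
    """Return a full config: user choices win, gaps are filled with defaults."""
    config = copy.deepcopy(user_config)
    for feature in config["input_features"]:
        # Only suggest an encoder when the user did not pick one.
        feature.setdefault("encoder", DEFAULT_ENCODERS[feature["type"]])
    # Trainer settings: start from defaults, then apply any user overrides.
    config["trainer"] = {**DEFAULT_TRAINER, **config.get("trainer", {})}
    return config


# The user declares only what to predict and from which columns.
minimal = {
    "input_features": [
        {"name": "review_text", "type": "text"},
        {"name": "price", "type": "number"},
    ],
    "output_features": [{"name": "churned", "type": "category"}],
    "trainer": {"epochs": 3},  # one explicit override
}

full = resolve_config(minimal)
print(full["input_features"][0]["encoder"])  # embed
print(full["trainer"])  # {'epochs': 3, 'learning_rate': 0.001}
```

The user-supplied `epochs` value survives, while the learning rate and encoders come from the defaults. This mirrors the choice between accepting default selections and adjusting them.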
Sizing up your options
AutoML and declarative ML serve overlapping but distinct requirements. Eckerson Group recommends three criteria to evaluate your options: ease of use, sophistication, and flexibility.
Ease of use
Evaluation criteria start with the question of “who?” In this case, the fundamental question is “which stakeholders will manage this ML project, and what type of interface will make them most productive?” Stakeholders break into two camps as they answer this question.
- Graphical user interface (GUI). Many business analysts and data analysts favor a graphical interface and its no-code approach given their heritage with BI tools. They know what they need to query but lack the skills to enter text instructions in a command line. This camp benefits from AutoML tools that guide users through the ML lifecycle with intuitive prompts, graphical displays, and explanatory popup windows. A data analyst might drag and drop a file to upload it, then perform exploratory data analysis by scrolling over various columns to review the automatic histogram that appears for each. They can click to select and refine features, pull up visual data selections, and start training the model.
- Command-line interface (CLI) or software development kit (SDK). Many data scientists and engineers, in contrast, tend to favor a programmatic interface. They understand how to script tasks in SQL and Python, and even prefer scripting given their training and the flexibility and precision the CLI offers. This camp benefits from declarative ML tools that include a CLI and take a configuration-driven approach to model development, meaning that users can control model behavior without changing the code itself. A data scientist or engineer might start simple by specifying a class of input features and type of predicted output. For example, to predict customer actions they might specify purchase history for inputs and purchase intent for output. Then they can use their programming knowledge to add commands that examine and change features, parameters, or outputs, all within a simplified YAML file.
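The purchase-intent example can be sketched as a starting configuration plus a later override. The sketch uses a Python dict in place of a YAML file. The column names, the `deep_merge` helper, and the override values are assumptions for illustration, and the key layout only loosely follows Ludwig-style configurations.

```python
import copy


def deep_merge(base, overrides):
    """Recursively apply overrides to base without mutating either dict."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Step 1: start simple, declaring only inputs and the predicted output.
base_config = {
    "input_features": [{"name": "purchase_history", "type": "sequence"}],
    "output_features": [{"name": "purchase_intent", "type": "binary"}],
}

# Step 2: later, adjust training parameters by editing configuration only.
overrides = {"trainer": {"epochs": 20, "learning_rate": 0.0005}}

tuned_config = deep_merge(base_config, overrides)
print(sorted(tuned_config))  # ['input_features', 'output_features', 'trainer']
```

Only the configuration changes between the two steps. Whatever code consumes it would stay the same, which is the point of a configuration-driven approach.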
Sophistication
Data leaders also need to ask, “How sophisticated is my use case?” The answers to this question run along a spectrum, from simple to complex. They relate to data volume, variety, or velocity; the number of features; or the number or type of modeling techniques involved.
- Simpler use cases. In most cases, analytics teams tackle simpler use cases first. They might need to prototype a model to validate the concept before starting more rigorous training and testing. A data science team might seek a quick win with low-risk use cases such as basic content personalization or demand forecasting. Simpler use cases tend to involve smaller tabular datasets, fewer features, less complex model architectures, and common pre-built models found in public libraries. AutoML tools suit the simple end of the spectrum because they reduce the time needed to explore data, identify features, and train models. They find basic features, then guide users to develop models based on quick assessments of each feature’s impact on predicted variables.
- More complex use cases. Many data science projects end up tackling more complex use cases such as customer service automation, content moderation, and sentiment analysis. Before putting revenue or profit on the line, most enterprises need to take a hard look at the many variables and elements involved. These include larger, multi-modal datasets, a greater number of ML models, and deep learning applied to tasks such as natural language processing and image classification. Declarative ML suits this end of the spectrum because it gives expert users the ability to derive and correlate features from both structured and unstructured data, then build layers of ML models that generate outputs from the complex interactions of these inputs.
Flexibility
To handle complex use cases, data teams need the flexibility to make changes as they go. This brings us to the third big question: “How can my team inspect and adjust model parameters to drive the best possible results?”
- Less flexibility. AutoML tools will explain certain model assumptions, features, and parameters, as well as data lineage. But they typically do not explain all of these elements or give users the ability to adjust them to meet custom requirements for a given environment or use case. The result might be a black box that users cannot fully understand, explain, or adjust.
- More flexibility. To get the necessary visibility and flexibility, data teams can turn to declarative ML, which enables data scientists, engineers, and SQL-oriented data analysts to inspect all the angles and elements of ML models—and even multiple ML models that go into a neural network for deep learning. They can tweak, re-train, and re-test until they arrive at the best outcome. Once in production, they can compare, promote, or demote ML models based on measurement of the results. The alternative to declarative ML is a full-fledged ML framework such as PyTorch or TensorFlow, although that might require expert users to go through a complex ML development process.
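The compare, promote, and demote step could look something like the sketch below. It is a simplified illustration and does not describe how any particular platform works. The `ModelResult` class, the `pick_champion` helper, the model names, the accuracy figures, and the `min_gain` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str
    accuracy: float  # measured on held-out or production data
    in_production: bool = False


def pick_champion(results, min_gain=0.01):
    """Promote the best challenger only if it beats the champion by min_gain.

    Assumes exactly one entry in results is currently in production.
    """
    champion = next(r for r in results if r.in_production)
    best = max(results, key=lambda r: r.accuracy)
    if best is not champion and best.accuracy - champion.accuracy >= min_gain:
        champion.in_production = False  # demote the old model
        best.in_production = True  # promote the challenger
        return best
    return champion


results = [
    ModelResult("baseline", 0.81, in_production=True),
    ModelResult("tuned_v2", 0.86),
    ModelResult("tuned_v3", 0.815),
]
winner = pick_champion(results)
print(winner.name)  # tuned_v2
```

The minimum-gain threshold is one possible design choice: it keeps a challenger from replacing the production model on a difference too small to matter.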
As with many technologies, AutoML and declarative ML are not mutually exclusive. AutoML helps democratize ML projects by enabling business-oriented analysts to try out concepts with prototype models. Declarative ML, meanwhile, uses AutoML as a starting point. It helps data scientists and engineers inspect and navigate the inevitable complexities. Predibase offers a declarative ML platform, based on open-source Ludwig, that seeks to support the sophistication and flexibility described here. It also offers a low-code GUI for the beginner and a CLI for more advanced users.
Many data science projects will start with AutoML and proceed to declarative ML. The best way to decide on one approach—or both—is to consider the criteria outlined here: ease of use, sophistication, and flexibility.