Automating Predictive Analytics Through the Evolution of Models
The Power to Create and Test Billions of Models
The holy grail of predictive analytics is to give business users direct access to its power without exposing them to unnecessary risk. That risk arises when business users don’t fully understand the powerful tools they are wielding and then act on inaccurate or non-robust predictions.
Several companies are navigating these dangerous but bountiful waters by applying principles of biological evolution. Their systems create variants of ‘pretty good’ models and then test them to see whether any variant beats the original. If so, they then try variants of the improved models, and the cycle repeats until new variants no longer improve accuracy.
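To make the create, test, and keep cycle concrete, here is a minimal Python sketch of an evolutionary search on a deliberately tiny problem: fitting the two coefficients of the line y = a*x + b. It is not Nutonian’s algorithm. The functions `mse` and `evolve`, the Gaussian mutations, the rule of shrinking the mutation size when no variant improves, and the synthetic data are all illustrative assumptions.

```python
import random

def mse(params, xs, ys):
    """Mean squared error of the straight-line model y = a*x + b."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def evolve(xs, ys, start=(0.0, 0.0), n_variants=20, step=0.5,
           shrink=0.8, min_step=1e-6, max_generations=5000, seed=0):
    """Illustrative mutate-and-select loop (hypothetical, not Nutonian's method).

    Each generation perturbs the current best model, tests every variant,
    and keeps the best variant only if it beats the current model.
    """
    rng = random.Random(seed)
    best, best_err = start, mse(start, xs, ys)
    for _ in range(max_generations):
        # Create variants of the current "pretty good" model.
        variants = [tuple(p + rng.gauss(0.0, step) for p in best)
                    for _ in range(n_variants)]
        # Test them and pick the most accurate one.
        candidate = min(variants, key=lambda v: mse(v, xs, ys))
        cand_err = mse(candidate, xs, ys)
        if cand_err < best_err:
            best, best_err = candidate, cand_err
        else:
            # No variant helped: search more locally, and stop once the
            # mutations are too small to matter.
            step *= shrink
            if step < min_step:
                break
    return best, best_err

xs = [-2, -1, 0, 1, 2]
ys = [3 * x + 2 for x in xs]  # synthetic data generated from y = 3x + 2
(a, b), err = evolve(xs, ys)
print(f"a={a:.3f}, b={b:.3f}, mse={err:.2e}")
```

Genetic programming proper mutates and recombines the structure of equations, not just their coefficients, but the keep-the-better-variant loop is the same idea.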
This is not easy to do, but some companies have made great strides in this direction. Eckerson Group recently sat down with Nutonian’s top executives, including CEO Scott Howser and founder and CTO Michael Schmidt, to discuss the company’s pioneering offering, Eureqa.
Nutonian offers an artificial intelligence (AI)-powered modeling engine that is particularly adept at using time-series data to create and test billions of mathematical equations in search of the most accurate and simplest model. The product, aptly named Eureqa, greatly improves the productivity of data scientists and helps business users make better planning decisions, sometimes simply by providing a dashboard of recommended actions based on its predictions.
The underlying technology was developed by founder Michael Schmidt while he was a PhD student in computational biology at Cornell University. It is a variant of genetic programming, a branch of machine learning popularized by John Koza in the 1980s and 1990s. Nutonian applies parallel processing to genetic programming to make it fast and scalable enough for a commercial product. It also enforces a strict ‘simplicity’ constraint on the models that survive the search, which helps ensure that the chosen model is robust when applied to new situations. Finally, it automatically validates models on data held out from fitting (e.g., through cross-validation). This keeps both business users and PhD data scientists safe in case they forget that important step of model building.
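The two safeguards described above, validation on held-out data and a preference for simpler models, can be illustrated independently of the equation search itself. The sketch below is hypothetical: it uses polynomials ranked by degree as a stand-in for equation complexity, 5-fold cross-validation, and an assumed rule that picks the simplest model whose cross-validated error is within 10% of the best. `fit_poly`, `cv_error`, and `select_simplest` are illustrative names, not Eureqa’s API.

```python
import random

def fit_poly(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... via the normal equations."""
    n = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    r = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting (adequate for small degrees).
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for k in range(col + 1, n):
            f = A[k][col] / A[col][col]
            for j in range(col, n):
                A[k][j] -= f * A[col][j]
            r[k] -= f * r[col]
    # Back substitution on the upper-triangular system.
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (r[i] - sum(A[i][j] * c[j] for j in range(i + 1, n))) / A[i][i]
    return c

def predict(c, x):
    return sum(ci * x ** i for i, ci in enumerate(c))

def cv_error(xs, ys, degree, k=5):
    """k-fold cross-validation: fit on k-1 folds, score on the held-out fold."""
    total = 0.0
    for fold in range(k):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k == fold]
        c = fit_poly([x for x, _ in train], [y for _, y in train], degree)
        total += sum((predict(c, x) - y) ** 2 for x, y in test)
    return total / len(xs)  # every point is held out exactly once

def select_simplest(xs, ys, max_degree=5, tolerance=0.10):
    """Hypothetical simplicity rule: lowest degree within `tolerance` of the best CV error."""
    errors = {d: cv_error(xs, ys, d) for d in range(max_degree + 1)}
    best = min(errors.values())
    chosen = min(d for d, e in errors.items() if e <= best * (1 + tolerance))
    return chosen, errors

rng = random.Random(1)
xs = [i / 4 for i in range(-12, 13)]  # 25 points on [-3, 3]
ys = [1 + 2 * x - 0.5 * x ** 2 + rng.gauss(0, 0.3) for x in xs]  # synthetic data
chosen, errors = select_simplest(xs, ys)
print("chosen degree:", chosen)
```

Higher-degree candidates can match the training data more closely, but on held-out points they rarely beat the true quadratic by a meaningful margin, so the simplicity rule settles on a low-complexity model.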
Eureqa boosts data scientists’ productivity while generating simple, highly accurate models. For example, a large drugstore chain wanted to predict drug inventory and pharmacy staffing needs for each of its stores. Its initial model covered the entire United States. But each store was unique in small or large ways (e.g., consider the difference in predicting inventory of medications for the elderly between a retirement community in Florida and a young neighborhood in NYC), and the national model did not always fit individual stores well.
The chain’s business intelligence and data science teams were using this single model to forecast sales for every store in every location. With only one model for the entire chain, however, forecasts were suboptimal, since stores in Austin, Texas, and Billings, Montana, sell very different products during the winter. The data scientists knew that more targeted models would perform better, but their existing tools did not let them create and test that many models.
As a result, predictive accuracy suffered because it simply wasn’t possible to build as many models as were needed. With Nutonian, the chain built thousands of models, one for each store, in less time than it had previously taken to create a single model for the entire chain. Overall forecast accuracy increased dramatically without the need to hire additional data scientists.
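Why one model per store can beat a single chain-wide model is easy to see with toy numbers. In the hypothetical sketch below, the store names, the sales figures, and the deliberately simple model (average sales per calendar month) are all assumptions. Two stores have opposite winter patterns, so the national average fits neither of them in winter. The data is noise-free and the errors are measured on the same history the models were fitted to, purely for illustration; a real comparison would score held-out periods.

```python
from collections import defaultdict

def fit_monthly_means(rows):
    """Hypothetical model: average units sold per calendar month over the given rows."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _store, month, sales in rows:
        sums[month] += sales
        counts[month] += 1
    return {m: sums[m] / counts[m] for m in sums}

def mse(rows, predict):
    """Mean squared error of predict(store, month) against the recorded sales."""
    return sum((predict(store, month) - sales) ** 2
               for store, month, sales in rows) / len(rows)

# Synthetic history: (store, month, units sold), three years per store.
history = []
for year in range(3):
    for month in range(1, 13):
        winter = month in (12, 1, 2)
        history.append(("austin", month, 80 if winter else 100))
        history.append(("billings", month, 150 if winter else 100))

# One chain-wide model versus one model per store.
national = fit_monthly_means(history)
per_store = {s: fit_monthly_means([r for r in history if r[0] == s])
             for s in {r[0] for r in history}}

national_err = mse(history, lambda s, m: national[m])
per_store_err = mse(history, lambda s, m: per_store[s][m])
print(f"national MSE={national_err:.2f}, per-store MSE={per_store_err:.2f}")
```

The national model predicts 115 units for every store in winter, overshooting Austin by 35 and undershooting Billings by 35, while each per-store model captures its own store’s pattern.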
With its ability to find, among billions of candidate equations, the model that best balances simplicity and accuracy, Nutonian is poised to become a go-to forecasting tool when data scientists are overwhelmed by the need to build many models.
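One common way to frame “simplicity and accuracy at the same time” is a Pareto front: keep every candidate that no other candidate beats on both complexity and error. The sketch below assumes that framing. The function `pareto_front` and the candidate names, complexities, and errors are hypothetical, not output from Eureqa.

```python
def pareto_front(models):
    """Return the models not dominated on (complexity, error); lower is better for both.

    A model is dominated if another model is at least as simple and at least
    as accurate, and strictly better on one of the two.
    """
    front = []
    for name, complexity, error in models:
        dominated = any(
            c <= complexity and e <= error and (c < complexity or e < error)
            for _, c, e in models
        )
        if not dominated:
            front.append((name, complexity, error))
    return sorted(front, key=lambda m: m[1])  # simplest first

# Hypothetical candidates: (name, complexity, validation error).
candidates = [
    ("a", 1, 0.40),
    ("b", 3, 0.12),
    ("c", 4, 0.15),  # dominated by b: more complex and less accurate
    ("d", 7, 0.05),
    ("e", 9, 0.05),  # dominated by d: same error, more complex
]
print(pareto_front(candidates))
```

The surviving models form a short menu of trade-offs; choosing among them (for instance, the simplest one that is accurate enough) is a separate, business-driven decision.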
The simplicity of Eureqa’s models also helps business users more easily understand, interpret, and act on their data. In addition, the capability of producing an immense number of models makes Eureqa scalable for companies that wish to create models for many different applications.
Nutonian is working to make forecasting as simple as creating an equation in a spreadsheet. Just as you might subtract cost from revenue to calculate profit, you can now get a forecast from Nutonian plugged into your spreadsheet as a new column of data. As a business user, you can then decide whether to use the forecast. Initially, the prediction may be treated simply as interesting additional data and not acted upon. Over time, however, as business users grow more comfortable with it, these automated forecasts can become increasingly integrated into business decisions.
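In spreadsheet terms, a forecast column sits right next to an ordinary formula column. The sketch below mimics that with a list of rows in Python. The column names, the figures, and the straight-line trend model `fit_trend` are hypothetical stand-ins; they are not Nutonian’s spreadsheet integration or its modeling method.

```python
def fit_trend(ts, ys):
    """Ordinary least-squares line y = slope*t + intercept.

    A stand-in for whatever forecasting model actually produces the column.
    """
    n = len(ts)
    t_mean, y_mean = sum(ts) / n, sum(ys) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
             / sum((t - t_mean) ** 2 for t in ts))
    return slope, y_mean - slope * t_mean

# Hypothetical monthly sheet: month index, revenue, cost.
sheet = [
    {"month": 1, "revenue": 100.0, "cost": 70.0},
    {"month": 2, "revenue": 110.0, "cost": 75.0},
    {"month": 3, "revenue": 120.0, "cost": 80.0},
]

slope, intercept = fit_trend([r["month"] for r in sheet],
                             [r["revenue"] for r in sheet])
for row in sheet:
    # Ordinary formula column, like =revenue-cost in a spreadsheet.
    row["profit"] = row["revenue"] - row["cost"]
    # Model-driven column: forecast revenue for the following month.
    row["next_month_revenue_forecast"] = slope * (row["month"] + 1) + intercept

for row in sheet:
    print(row)
```

The business user can read the forecast column alongside the profit column and decide how much weight to give it.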
If a business user is considering using one of these products but does not currently have a data scientist in-house, they should work with a consultant to get up to speed. Such self-service and automated predictive analytics products are tolerant of the novice user and can often protect them from generating models that don’t work well. However, most organizations would do well to continue working with a data scientist as a consultant on a regular basis.