A person with formal training in modern statistical methods who develops AI models, especially machine learning models, that help predict, recommend, or categorize outcomes. Data scientists typically work in languages such as R or Python.
Added Perspectives
What they lead: The data scientist serves as quarterback of the ML lifecycle from start to finish. They take the guiding objectives and questions from the business owner, study the available datasets, then devise an ML model to answer those questions. They lead the data and feature engineering phase jointly with the data engineer who best understands the data. They lead the model development phase on their own, then partner with the ML engineer to jointly lead the tricky effort of bringing a model into production.
What they learn: The data scientist must learn how to explain statistical principles and ML techniques, along with their business implications, in simple terms to business owners. They must also learn the basics of data pipelines to collaborate effectively with data engineers. They likely already have the programming expertise needed to collaborate effectively with ML engineers and DevOps engineers during the ML operations phase.
Finally, the most technically savvy self-service users are data scientists. They have an extremely high level of data literacy and are typically fluent in multiple coding languages, which they use to build machine learning algorithms and statistical models. Much of their work involves surfacing insights from big data and previously undervalued data. As a result, they need direct access to data that might not be prioritized in existing data warehouse structures.
There are three types of data scientists: research data scientists, applied data scientists, and citizen data scientists. Research data scientists focus on discovering and applying methods to generate new algorithms. Applied data scientists apply well-established models to business problems, configuring them with open-source libraries or tools such as SAS. An applied data scientist wears many hats, managing machine learning engineering and ETL while also thinking about how to deliver business value. Once a product is industrialized, its reach can be amplified under the supervision of citizen data scientists. Citizen data scientists are also better placed to deal with the business, because they are less likely to use data science jargon.