Generative AI for Data Management: Why, Where, and How Language Models Assist Data Teams
Data teams view language models (LMs) such as ChatGPT from OpenAI or Bard from Google with a mix of excitement and fear. Their fast, articulate answers to expert questions can help data teams discover datasets, write and debug code, document procedures, and learn new techniques as they build data pipelines. Exciting! But the fear is also justified. LMs can derail projects and undermine governance programs by giving answers that contain errors, sensitive data, or bias.
Given this ambivalence, data leaders and their team members need reliable guidance about where and how it makes sense to use LMs as part of a governed program. This eBook explores the emergence of LMs and LM-based tools from data vendors as well as their implications for the discipline of data engineering. It will define the technologies involved, assess emerging approaches and tools, and recommend ways to realize the productivity benefits of LMs while minimizing risk.
> Chapter One defines this market segment and examines adoption trends and use cases for combining LMs with data management.
> Chapter Two explores governance strategies to handle the inherent risks of LMs.
> Chapter Three describes the emergence of domain-specific language models that support specialized data management use cases in a more governed fashion.
> Chapter Four recommends guiding principles for the successful use of language models in data management.