The Opportunity and Risk of Generative AI Part I: A Nuclear Explosion
ABSTRACT: Generative AI brings a promise to improve lives in a blistering innovation race, but also a threat to people, corporations, and even nations.
Generative AI has exploded into the public arena, especially since the release of OpenAI’s ChatGPT in November 2022. One hundred million people used the product monthly after just two months, reportedly the fastest adoption of any consumer application in history. This series focuses on generative AI, rather than AI broadly, because data and analytics companies are quickly adopting the technology to enhance their products.
This blog is the first in a series: The Opportunity and Risk of Generative AI. The goal for the series is to help data analytics leaders understand the darker side of generative AI as they consider using it in their own enterprise. The first blog will provide an overview of generative AI and its risks. The second and third blogs will use the Responsible AI framework to examine the regulatory and ethical issues of generative AI. The final blog will offer recommendations and best practices for implementing generative AI in data analytics projects.
In this blog you will learn what generative AI is and the common structures of generative AI systems. Then you’ll get an overview of the types of risks that generative AI poses to companies. Finally, I will offer two visions of a generative AI future to show how the governance decisions we make now will determine which vision becomes real.
What is generative AI?
Let’s start with the basics. Artificial intelligence (AI) is the ability of a machine to complete a human task reasonably well, with the goal of outperforming humans. Computers have long been able to do simple tasks like computation much faster and more accurately than humans. But recent advancements in hardware (like graphics cards), algorithms (like artificial neural networks), and data collection (we just have way, way more data these days) now enable AI to take off.
Generative AI is a type of artificial intelligence that creates some type of digital media: text (in many languages), computer code, images, video, audio, synthetic data, and 3D models. The allure of generative AI is that it supports a huge number of use cases, from speeding up mundane tasks to helping brainstorm ideas. In the data analytics sphere, the most promising use cases include generating metadata and documentation, serving as an advanced digital assistant, and generating synthetic data for training models.
There are three major types of generative AI models:
Generative Adversarial Networks (GANs): In this design, two adversarial neural networks compete to thwart each other. The generator network creates synthetic content, typically from random input, attempting to mimic real content from an external source. The discriminator network receives real and generated content at random and tries to determine whether each piece is real. Both models train in tandem, getting better at generating and discriminating content until a benchmark level is achieved. GANs were the first major class of generative AI, created in 2014 by Ian Goodfellow et al., and remain in use today, particularly for image manipulation and generation.
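To make the adversarial loop concrete, here is a deliberately tiny numeric sketch, not the architecture from the 2014 paper: the generator is a two-parameter affine map of noise, the discriminator is logistic regression, and each side improves by gradient descent on its own loss. All values and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, theta):
    # theta = (scale, shift): turns raw noise into "fake" samples
    return theta[0] * z + theta[1]

def discriminator(x, w):
    # Logistic regression: estimated probability that x is real
    return 1.0 / (1.0 + np.exp(-(w[0] * x + w[1])))

def d_loss(w, real, fake):
    # Binary cross-entropy: push D(real) toward 1 and D(fake) toward 0
    eps = 1e-9
    return (-np.mean(np.log(discriminator(real, w) + eps))
            - np.mean(np.log(1.0 - discriminator(fake, w) + eps)))

def g_loss(theta, w, z):
    # The generator wants the discriminator to label its output as real
    eps = 1e-9
    return -np.mean(np.log(discriminator(generator(z, theta), w) + eps))

def num_grad(f, p, h=1e-5):
    # Numerical gradient, to keep the sketch free of autodiff machinery
    g = np.zeros_like(p)
    for i in range(len(p)):
        up, dn = p.copy(), p.copy()
        up[i] += h
        dn[i] -= h
        g[i] = (f(up) - f(dn)) / (2 * h)
    return g

real = rng.normal(3.0, 0.5, size=256)   # "real" content: samples near 3
theta = np.array([1.0, 0.0])            # generator starts producing samples near 0
w = np.array([0.0, 0.0])                # discriminator starts with no opinion

z = rng.normal(size=256)
d_before = d_loss(w, real, generator(z, theta))
for _ in range(50):                     # discriminator turn: learn real vs. fake
    w -= 0.2 * num_grad(lambda p: d_loss(p, real, generator(z, theta)), w)
d_after = d_loss(w, real, generator(z, theta))

g_before = g_loss(theta, w, z)
for _ in range(100):                    # generator turn: fool the trained discriminator
    theta -= 0.05 * num_grad(lambda p: g_loss(p, w, z), theta)
g_after = g_loss(theta, w, z)

# Alternating these two turns is the "competition" described above
print(d_after < d_before, g_after < g_before)
```

Real GANs replace the affine map and logistic regression with deep networks, but the alternating improve-the-generator, improve-the-discriminator structure is the same.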
Transformer-Based Models: Transformers are neural networks that process sequential data, usually sentences of text, and then output a transformed version of that data. They power large language models (LLMs) such as ChatGPT and Google’s Bard. Transformers use an encoder-decoder system: the encoder translates input into vectors representing the meaning. The decoder receives these vectors and produces meaningful content based on each component of the input and its context. The best applications for transformers include translation, summarization, and other text-related tasks.
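The core operation inside a transformer is attention: each position in the sequence builds its output as a weighted mix of every other position, so context shapes meaning. The sketch below shows scaled dot-product attention only; the learned projections, multiple heads, and stacked layers of a real transformer are omitted, and all shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Each output row is a weighted mix of all value vectors V,
    # weighted by how well that row's query matches every key.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row is a distribution over positions
    return weights @ V, weights

# Toy "sentence" of 4 tokens, each a random 8-dimensional embedding stand-in
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))

# In a real transformer, Q, K, and V come from learned projections of X;
# here we use X directly to keep the sketch minimal.
out, weights = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (4, 8): one context-aware vector per token
```

The encoder and decoder described above are built by stacking layers of this operation (plus feed-forward sublayers) on top of each other.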
Variational Autoencoder (VAE): VAEs learn the distribution of input data and then generate new data from that distribution. VAEs also use an encoder and decoder system, although they have different tasks than in a transformer. The VAE encoder tries to efficiently translate data into a compressed, organized form. The VAE decoder then generates data points from the encoder’s output. Developers often use VAEs to create synthetic data from a small dataset.
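The data flow of a VAE can be sketched in a few lines. This skeleton is untrained and uses random linear maps purely to show the encode, sample, decode pipeline; real VAEs use deep networks trained to reconstruct their inputs, and all dimensions here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

data_dim, latent_dim = 6, 2
W_enc = rng.normal(scale=0.1, size=(data_dim, 2 * latent_dim))  # -> mean and log-variance
W_dec = rng.normal(scale=0.1, size=(latent_dim, data_dim))

def encode(x):
    # Compress each record into the parameters of a latent distribution
    h = x @ W_enc
    return h[:, :latent_dim], h[:, latent_dim:]  # mean, log-variance

def reparameterize(mean, log_var):
    # Sample a latent point from that distribution (done differentiably in a real VAE)
    eps = rng.normal(size=mean.shape)
    return mean + np.exp(0.5 * log_var) * eps

def decode(z):
    # Generate a new data point from the latent sample
    return z @ W_dec

x = rng.normal(size=(5, data_dim))   # 5 "real" records
mean, log_var = encode(x)
z = reparameterize(mean, log_var)
synthetic = decode(z)                # 5 generated records
print(synthetic.shape)  # (5, 6)
```

Because the decoder samples from a distribution rather than copying inputs, a trained VAE can produce many distinct synthetic records from a small original dataset.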
These designs all share similar characteristics: a large amount of training data, a semi-automated training process, and an undetermined end structure. These characteristics help define the unique risks of generative AI models that we explore in the next section.
Risks of Generative AI
Generative AI models include two significant components: the training data and the model. Typical BI and analytics risks grow larger when using generative AI, and the model itself adds a new, second layer of risk (see Figure 1).
Figure 1. Risks of Traditional Analytics and Generative AI
Security: Bad actors can attack data storage and transfer processes, looking to sell that data, use it for their own benefit, or simply hold the data system for ransom. Training a generative AI system can add another point of weakness to your system, especially if it connects to an outside network.
Privacy: Generative AI systems have two modes of data access: first, access to training data; and second, access to data as part of production tasks such as database querying. Data engineers must clearly define data access permissions for an AI system so that users cannot gain access to private information via an AI assistant. This can be difficult to enforce if there are many user types or use cases or if the model was trained on a sensitive dataset.
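One common mitigation is to check every AI-generated query against the requesting user's permissions before it runs. The sketch below is hypothetical: the role names, table names, and `authorize` helper are all illustrative, and a production system would parse SQL properly rather than matching substrings.

```python
# Hypothetical permission gate between an AI assistant and the database.
# Roles, tables, and function names are illustrative, not a real API.
USER_PERMISSIONS = {
    "analyst": {"sales", "products"},
    "hr_admin": {"sales", "products", "salaries"},
}

KNOWN_TABLES = {"sales", "products", "salaries"}

def tables_referenced(query: str) -> set:
    # Naive extraction for the sketch; real systems should use a SQL parser
    q = query.lower()
    return {t for t in KNOWN_TABLES if t in q}

def authorize(user_role: str, query: str) -> bool:
    # Allow the AI-generated query only if every table it touches
    # is within the requesting user's own permissions
    allowed = USER_PERMISSIONS.get(user_role, set())
    return tables_referenced(query) <= allowed

print(authorize("analyst", "SELECT * FROM salaries"))   # False: blocked
print(authorize("hr_admin", "SELECT * FROM salaries"))  # True
```

The key design point is that the AI system inherits the user's permissions rather than holding broad permissions of its own.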
Data quality: Poor data quality influences all decisions and processes that depend on that data. Bad training data will result in a bad model (“garbage in, garbage out”). Because an AI system operates at scale, it can create a lot more garbage, amplifying the negative effects of poor data. Any form of bias in the data, like a history of discriminatory hiring practices, imprints into the behavior of a model trained on that data. Generative AI models that create synthetic data from bad data can then spread their poor data to other models.
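A toy simulation makes the bias-amplification point concrete. Suppose a trivial "generator" simply samples from the empirical distribution of its training data, here an invented hiring history that favors one group; the synthetic output reproduces the skew at whatever scale we generate. The scenario and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented biased history: 90% of past hires come from group A
training = ["A"] * 90 + ["B"] * 10

# A trivial generator that resamples the training distribution at scale
synthetic = rng.choice(training, size=10_000)

share_a = float(np.mean(synthetic == "A"))
print(share_a)  # ~0.9: the 90/10 skew carries straight into 10,000 synthetic records
```

Any model trained on this synthetic data inherits the same skew, which is how one biased dataset can quietly contaminate downstream models.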
In addition, users of generative AI can mistakenly believe its outputs are true. Take, for example, the lawyer who used fake case law generated by ChatGPT in a federal court filing. Misunderstanding how generative AI outputs are created, and erroneous assumptions about their veracity, add a new dimension of data quality risk.
Regulatory Compliance: Data-related laws and regulations vary based on jurisdiction and type of data. Generative AI models must comply with such requirements. Additionally, political bodies are actively creating new laws to try to manage generative AI. From outright bans to minimal regulation, a fast-changing and varied legal landscape only increases the risk of non-compliance and penalties.
Generative AI introduces its own model-related risks on top of data-related risks:
Decision-making and responsibility. Data engineers can give AI systems different degrees of power to act. This is why we want them: so they can do things for us. However, engineers or executives might be tempted to delegate too much to an AI system. From the wrong use case to bad training data, a generative AI system can do the wrong thing, and do it at scale. Generative AI models are by design unpredictable, so practitioners face unavoidable risk when delegating tasks to them. A related issue arises when assigning responsibility for an AI system and its mistakes. Because an AI system cannot be sued (yet), responsibility might fall on the system’s developers, anyone involved in data curation, higher-ups in the company, or even the end users. If you think it is hard to predict how an advanced AI agent will make a mistake, try choosing who takes responsibility for that mistake.
Cyber attacks. Generative AI has become so effective at creating realistic, convincing content that it is already being used to scam people. There are cases of people’s voices being cloned to request money or a ransom. The stock market briefly dipped when a fake image of smoke rising near the Pentagon circulated online. As generative AI becomes more powerful, bad actors will likely use it in more sophisticated, ambitious plots.
Explainability. Generative AI models develop through an automated process. This means that developers do not decide the architecture of the model beforehand and so must study the model afterward to understand how it works. This is easier said than done when dealing with a model with millions of parameters, the modifiable values that determine how the model translates input to output. If developers cannot explain how a generative AI model works, they cannot foresee and manage issues like hallucinations, bias, or illogical methods of reasoning.
Lawmakers across the world have taken a variety of steps to regulate AI models. Check out the second blog in this series covering responsible AI and regulatory compliance to learn more.
Utopia or Dystopia?
J. Robert Oppenheimer, the father of the atomic bomb, directed its development at the Los Alamos Laboratory during World War II. He made the following observation:
“When you see something that is technically sweet, you go ahead and do it and you argue about what to do about it only after you have had your technical success. That is the way it was with the atomic bomb.”
And that is the way it is now with Generative AI. Developers have created incredible technology without widespread regulation of what to do with it. Top tech leaders requested a pause in the development of super-powerful AI systems in an open letter, but nobody paused. The speed of development is incredible to witness, but we must make sure we do not lose control, both collectively and within our own enterprises.
Generative AI conjures two visions of the future: a utopia in which humans and machines work together in harmony, and a dystopia in which machines exploit and control humans. Consider the nuclear energy analogy. On the positive side, nuclear power generation has produced an incredible amount of sustainable, relatively clean power. But when humans fail to manage the risks carefully, as at Chernobyl, catastrophe can result. We can foresee similar failures with poorly managed generative AI systems. Even a system intended for good use could discriminate, even just a little, at massive scale and cause significant harm to certain groups.
On the darker side, scientists used nuclear energy to create an incredibly destructive atomic bomb that killed hundreds of thousands of people and has shaped politics and war ever since. Generative AI, and other AI systems, hold a similar threat of massive harm, including fraud, cyber attacks, and propaganda. These threats could significantly affect individuals, corporations, and society as a whole. Data engineers and data scientists can create and deploy AI systems far more easily than nuclear scientists can build a bomb or a reactor. The resources (data, compute power, and algorithms) are available to nearly everyone, and anyone can deploy a system over the internet. This makes generative AI harder to control than nuclear arms, an important point when considering a global governing body like the one OpenAI’s CEO Sam Altman has suggested.
How data leaders manage generative AI now determines whether the technology will push us toward utopia or dystopia. We must manage both intentional and unintentional harm in a unified, powerful way to ensure beneficial use of generative AI.
Generative AI development is a relatively unregulated arms race involving both good and bad actors. The benefits of generative AI are too good for most organizations to pass up, but adopters must understand and control these systems effectively to avoid harm. Generative AI amplifies existing data risks like security, privacy, and data quality while introducing new risks, including autonomous decision-making, cyber attacks, and explainability.
The next two blogs in this series will cover the Responsible AI framework for managing AI development and deployment. Blog 2 in our series will explore the regulatory landscape related to generative AI, including the EU’s GDPR and new legislation now in the works. Blog 3 in our series will explore the value-based risk management system of Responsible AI and suggested practices.