AI, Artificial Intelligence, Design Thinking, Education, Engineering, Entrepreneurial Mindset Learning, Innovation, Technology

Looking “Under the Hood” of Generative AI

Is using AI like driving a car? You don’t have to know how to design a car in order to drive it around town. Back in the day, we had a whole class called “Driver’s Ed,” taken during junior high school. In that class, we learned the rules of the road and got to sit behind the wheel for the first time and drive. My Dad, on the other hand, not only knew how to drive a car, but he would buy old cars (because we couldn’t afford a new one), take out the old engine, rebuild it, and put it back in. Using AI or machine learning algorithms can be similar to driving a car. Diving deeper into them would be more like repairing or designing a car. You don’t have to know how to repair or design a car just to drive it.

Driving “Artificial Intelligence” Programs, or “Driving the Car”

Learning and using artificial intelligence is like driving a car. Nowadays, you don’t need to know how to create the algorithms yourself; you can just “drive” them. Like driving a car, you still need to know the rules of the road, how to evaluate where you are and where you are going, and how to stay safe. But you don’t need to know all the intricacies of how an internal combustion engine or an electric motor works to be able to drive the car. Back when I first started studying AI during my master’s in 1991, we had to either write our own AI code or find open-source software that we could add to. Today, with platforms like AWS SageMaker, the algorithms are already coded. You can access and use them if you know which ones to use and how to string them together sequentially in Python code. You just need to read up or take a class on how to use them. Thankfully, for educators at Community Colleges, HBCUs, Minority-Serving Institutions, and some Primarily Undergraduate Institutions (PUIs), AWS has set up an AI Educator Enablement program. I’m happy that five of our faculty have begun taking the AI Bootcamps offered in conjunction with The Coding School. This semester, I’m teaching a class modeled after AWS Machine Learning University’s Machine Learning Through Application course.
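
To make the “driving” concrete, here is a minimal sketch of what calling one of SageMaker’s pre-built algorithms can look like in Python. This is illustrative only: it assumes the SageMaker Python SDK, an AWS account with a SageMaker execution role, and training data already uploaded to S3; the bucket and role names are hypothetical placeholders.

    # Minimal sketch: training with a pre-built SageMaker algorithm (XGBoost).
    # Assumes an AWS account, a SageMaker execution role, and CSV training
    # data already in S3. The bucket and role names below are hypothetical.
    import sagemaker
    from sagemaker import image_uris
    from sagemaker.inputs import TrainingInput

    session = sagemaker.Session()
    role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # hypothetical

    # Retrieve the container image for the built-in XGBoost algorithm.
    container = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

    estimator = sagemaker.estimator.Estimator(
        image_uri=container,
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://my-bucket/models/",  # hypothetical bucket
        sagemaker_session=session,
    )
    estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

    # "Drive the car": launch training without writing the algorithm yourself.
    estimator.fit({"train": TrainingInput("s3://my-bucket/train.csv", content_type="text/csv")})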

Designing “Artificial Intelligence” Programs, or “Designing the Car”

Learning artificial intelligence can also be like learning to design a car. Many of us are now familiar with ChatGPT, a conversational generative pre-trained transformer that can generate new text for us based on the prompts we type in. Recently, I took the DeepLearning.AI and Amazon Web Services (AWS) Coursera course on Generative AI with Large Language Models and earned the certificate of successful completion at the end. This course was more closely related to getting “under the hood” of the car and seeing what makes it work. I thoroughly enjoyed the course. Below, I share a few things I liked.

Understanding the Generative AI Product Lifecycle

The GenAI course described the entire GenAI lifecycle, including defining the scope, selecting the LLM to use, adapting and aligning the model, and integrating the application. The first part of this lifecycle is understanding the use cases in higher education. We have to start with where it makes sense to use an LLM in an engineering course, for example. I’m excited that tomorrow we have our “Safely Exploring Generative AI for Faculty and Student Learning” design thinking session, supported by the Kern Family Foundation and our new virtual Center for Artificial Intelligence, Algorithmic Integrity, and Autonomy Innovation (AI3). We have faculty from all five of our Schools (Engineering, Business, Humanities, Math and Science, and Education) attending, representing about fifteen departments across campus. It’s imperative that all of our faculty begin thinking about the impact of GenAI on education generally and, for engineering specifically, how it will change the way we prepare our engineers.

Pre-Training a Large Language Model

In the early days of AI, intelligent agents, or AI-enabled computer programs, were designed to “reason” symbolically using logic and inference engines. Today, the “reasoning” and “learning” in AI are done using statistical methods. In the course, LLMs are described as statistical calculators. Pre-training takes in large amounts of unstructured data, for example from the internet, passes the data through a data-quality filter, and then builds the LLM by running the pre-training algorithm on GPUs, iteratively updating the LLM’s weights.
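
To see what “updating the LLM’s weights” means, here is a toy sketch of one pre-training step in Python with PyTorch: guess the next token, measure how wrong the guess was, and nudge the weights to be less wrong. The tiny two-layer model is a stand-in for a real transformer, and the random token IDs stand in for real text.

    # Toy sketch of one pre-training step: next-token prediction scored with
    # cross-entropy loss. The tiny model is a stand-in for a real transformer;
    # the point is the statistics, not the architecture.
    import torch
    import torch.nn as nn

    vocab_size, embed_dim = 1000, 64
    model = nn.Sequential(
        nn.Embedding(vocab_size, embed_dim),         # token IDs -> vectors
        nn.Linear(embed_dim, vocab_size),            # vectors -> a score per token
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    tokens = torch.randint(0, vocab_size, (8, 33))   # stand-in for real text
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # target = the next token

    logits = model(inputs)                           # (batch, seq, vocab) scores
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab_size), targets.reshape(-1)
    )
    optimizer.zero_grad()
    loss.backward()     # measure how wrong each next-token guess was
    optimizer.step()    # update the weights to be slightly less wrong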

How Pre-Training Works

LLMs are essentially trained to guess the next word in a text using a Transformer architecture, introduced in the paper “Attention Is All You Need.” A transformer consists of an encoder and a decoder. Depending on the purpose of the task, you can have encoder-only models, encoder-decoder models, or decoder-only models, as described in the list below and sketched in code after it.

  • Autoencoder models, or encoder-only models, take the input words, or tokens, and learn to guess masked tokens using a bi-directional context. They are good at tasks such as sentiment analysis, recognizing named entities, and classifying words. Example models are BERT and RoBERTa.
  • Autoregressive models are decoder-only models and attempt to predict the next token in a text using a one-directional context. They are good for generating text and other tasks. This is the type of model GPT is. Example models are GPT and BLOOM.
  • Sequence-to-sequence models mask, or hide, random spans of input tokens through the encoder. The decoder then tries to reconstruct the span, or sequence of tokens, autoregressively. They are good for summarizing text, question answering, and translating text. Example models include T5 and BART.
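
As a rough illustration of the three families, the Hugging Face transformers library offers small, public checkpoints of each type. This sketch assumes that library (and a backend such as PyTorch) is installed; the models download automatically on first run.

    # Sketch: one small public checkpoint from each transformer family.
    # Assumes `pip install transformers` plus a backend such as PyTorch.
    from transformers import pipeline

    # Encoder-only (autoencoding, e.g., BERT): fill in a masked token.
    fill = pipeline("fill-mask", model="bert-base-uncased")
    print(fill("Engineering is the art of [MASK].")[0]["token_str"])

    # Decoder-only (autoregressive, e.g., GPT): generate the next tokens.
    generate = pipeline("text-generation", model="gpt2")
    print(generate("Generative AI in education", max_new_tokens=20)[0]["generated_text"])

    # Encoder-decoder (sequence-to-sequence, e.g., T5): summarize text.
    summarize = pipeline("summarization", model="t5-small")
    text = ("Large language models are pre-trained on vast text corpora to "
            "predict tokens and can then be adapted to many downstream tasks.")
    print(summarize(text)[0]["summary_text"])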

Tasks LLMs Can Do Well

Existing LLMs can do many tasks relatively well. These tasks include:

  • Essay writing
  • Language translation
  • Document summarization
  • Information retrieval
  • Action calls to external applications

Prompt Engineering Won’t Always Improve an LLM’s Results

Those who are “driving the car” of LLMs know that they can specify what result they want to see using a prompt. You can also configure the LLM for the amount of randomness or the length of a response by modifying inference parameters, including top-k, top-p, temperature, and max tokens. Writing a more complex prompt using a basic knowledge of how the LLM works can also improve the results. This is called in-context learning and can involve giving examples of the prompt and the desired result. Giving no extra examples is called zero-shot inference, and giving one is called one-shot inference. Again, these topics are covered in the DeepLearning.AI and AWS course, but I thought I’d mention them. When we start diving into these theories, we are getting “under the hood” rather than just “driving the car.”
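
Here is a rough sketch of both ideas: setting the inference parameters and moving from zero-shot to one-shot prompting. It uses the Hugging Face transformers library with GPT-2 purely as a small, public stand-in model; the parameter names match that library’s generate() API.

    # Sketch: inference parameters and one-shot prompting, using GPT-2 as a
    # small public stand-in model. Assumes `pip install transformers` plus PyTorch.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Zero-shot: just the task. One-shot: prepend a single worked example.
    zero_shot = "Review: 'The course was great.'\nSentiment:"
    one_shot = (
        "Review: 'I hated the textbook.'\nSentiment: negative\n"
        "Review: 'The course was great.'\nSentiment:"
    )

    inputs = tokenizer(one_shot, return_tensors="pt")  # try zero_shot to compare
    outputs = model.generate(
        **inputs,
        max_new_tokens=5,    # max tokens: cap the length of the response
        do_sample=True,      # sample rather than always take the likeliest token
        temperature=0.7,     # temperature: lower means less random
        top_k=50,            # top k: sample only among the 50 likeliest tokens
        top_p=0.9,           # top p: ...that together cover 90% of the probability
        pad_token_id=tokenizer.eos_token_id,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))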

The Computational Costs for LLMs

Another aspect of the course that I liked is that it delved into a straightforward explanation of how much computing cost is involved. Those familiar with machine learning and cloud computing know that Nvidia GPUs are the hardware engines that do the compute processing required to train LLMs. The course helps us realize that ML algorithms in general, and LLMs specifically, require lots of computational processing power. A business or a higher ed institution conducting research will have to factor in these costs.
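
The back-of-the-envelope math is easy to reproduce. As a rough sketch using the course’s rule-of-thumb figures: a model weight stored at 32-bit full precision takes about 4 bytes, and training (gradients, optimizer states, and activations) multiplies the memory needed by roughly 20.

    # Back-of-the-envelope GPU memory estimates, using rough rule-of-thumb
    # figures: ~4 bytes per weight at 32-bit full precision, and roughly 20x
    # that to train, once gradients, optimizer states, and activations count.
    def memory_gb(num_params, bytes_per_param=4, overhead=1):
        return num_params * bytes_per_param * overhead / 1e9

    billion = 1_000_000_000
    print(f"Loading a 1B-parameter model:  ~{memory_gb(billion):.0f} GB")               # ~4 GB
    print(f"Training a 1B-parameter model: ~{memory_gb(billion, overhead=20):.0f} GB")  # ~80 GB

At roughly 80 GB just to train a one-billion-parameter model at full precision, it is easy to see why larger models are trained across many GPUs, and why the bill matters.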

Techniques for Fine-Tuning the LLM

The course covers the methods used to fine-tune an LLM so that it can perform better at specific types of tasks, including parameter-efficient techniques that update only a small fraction of the model’s weights. Although most casual LLM users are familiar only with the GPT models, many other models exist and can be used.
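
To give a flavor without diving too deep, here is a minimal sketch of one such parameter-efficient approach, LoRA, using the Hugging Face peft library. It wraps a small T5 checkpoint and trains low-rank adapter matrices instead of all of the model’s weights; the rank and target modules below are illustrative choices, not the course’s exact recipe.

    # Sketch: parameter-efficient fine-tuning (PEFT) with LoRA, wrapping a
    # small T5 model. Assumes `pip install transformers peft` plus PyTorch.
    from transformers import AutoModelForSeq2SeqLM
    from peft import LoraConfig, get_peft_model, TaskType

    base_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    lora_config = LoraConfig(
        task_type=TaskType.SEQ_2_SEQ_LM,
        r=8,                        # rank of the low-rank adapter matrices
        lora_alpha=32,              # scaling factor for the adapter updates
        target_modules=["q", "v"],  # adapt the query/value attention projections
        lora_dropout=0.05,
    )

    model = get_peft_model(base_model, lora_config)
    model.print_trainable_parameters()  # a small fraction of the full model

I just noticed that this blog post is getting long, so I’ll end here.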

Be Happy to “Drive AI” but Be Willing to Dive Deeper

In order to use AI, machine learning, or generative AI models like LLMs, you don’t need to know everything under the hood. But learning how these models work will be helpful. Many people complain that GPTs aren’t good at math. If you understand the architecture, you can see that they aren’t built for that. But they can be tied in with other applications that can do those things. I am hoping that, as engineering educators, we can bring more understanding of AI, ML, and GenAI to the general public while also training others to design and build the next generation of AI algorithms.
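
As a small, hypothetical sketch of that idea: rather than asking the model to do arithmetic, an application can detect the math in a request and route it to a tool built for exact calculation, reserving the LLM for the language parts. The routing rule and the call_llm placeholder below are illustrative, not a real API.

    # Hypothetical sketch: route math to a real calculator instead of the LLM.
    # The routing rule and call_llm placeholder are illustrative, not a real API.
    import ast
    import operator

    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def calculate(expression):
        """Safely evaluate simple arithmetic like '1234 * 5678'."""
        def walk(node):
            if isinstance(node, ast.BinOp):
                return OPS[type(node.op)](walk(node.left), walk(node.right))
            if isinstance(node, ast.Constant):
                return node.value
            raise ValueError("unsupported expression")
        return walk(ast.parse(expression, mode="eval").body)

    def answer(question):
        # Toy router: arithmetic goes to the calculator; everything else
        # goes to the LLM (call_llm is a placeholder, not a real function).
        if any(op in question for op in "+-*/"):
            return calculate(question)
        return call_llm(question)  # hypothetical LLM call

    print(answer("1234 * 5678"))  # exact answer: 7006652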

Picture: Participants in a recent “Safely Exploring Generative AI for Faculty and Student Learning – Using Design Thinking and Entrepreneurial Mindset” session, sponsored by The Kern Family Foundation.

© 2023 Andrew B. Williams

About the Author: Andrew B. Williams is Dean of Engineering and Louis S. LeTellier Chair for The Citadel School of Engineering. He was recently named one of Business Insider’s Cloudverse 100 and humbly holds the designation of AWS Education Champion. He sits on the AWS Machine Learning Advisory Board and is a certified AWS Cloud Practitioner. He is proud to have recently received a Generative AI with Large Language Models certificate from DeepLearning.AI and AWS. Andrew has also held positions at Spelman College, University of Kansas, University of Iowa, Marquette University, Apple, GE, and Allied Signal Aerospace Company. He is author of the book, Out of the Box: Building Robots, Transforming Lives.
