Welcome back everyone to this lecture on how ChatGPT works. Keep in mind that this is an optional lecture, but we highly recommend you watch it so you can build an intuition of what’s happening under the hood when you’re working with ChatGPT.
ChatGPT is a large language model that has gone through a special process known as reinforcement learning from human feedback (RLHF). In this lecture, we’re going to explore what these terms actually mean and get a high-level overview of how ChatGPT works.
Let’s begin building our intuition by asking a simple question: are certain words more likely to appear after a particular word? To answer this, we can start with individual words and see whether we can think of words in terms of probabilities. For example, we can take all the published books that Google Books has scanned and count how often each word shows up on its own. Doing this gives us a baseline probability for each word, before we consider any surrounding context.
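To make this concrete, here’s a minimal sketch in Python of turning raw word counts into probabilities. The tiny corpus here is made up for illustration; the real analysis would run over billions of words:

```python
from collections import Counter

# A tiny stand-in corpus; the real analysis would use billions of words.
corpus = "the cat sat on the mat the dog sat on the rug".split()

counts = Counter(corpus)
total = len(corpus)

# Probability of each word = its count divided by the total word count.
for word, count in counts.most_common(3):
    print(f"P({word!r}) = {count}/{total} = {count / total:.3f}")
```

Running this prints, for example, `P('the') = 4/12 = 0.333`: one word in three in this toy corpus is “the”.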
Now, let’s expand this idea to sequences. Text is really just a sequence of words, so it’s important to look at how words appear in succession: given a previous word, some words are far more likely to follow than others. By analyzing enough text, we can estimate the probability of each word given the word that came before it.
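Continuing the sketch above, we can count pairs of adjacent words to estimate the probability of a word given the one before it. This is the classic bigram model, and again the corpus is made up purely for illustration:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the dog sat on the rug".split()

# Count how often each word follows each previous word.
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

# P(next | prev) = count(prev, next) / count(prev followed by anything)
prev = "the"
total = sum(followers[prev].values())
for nxt, count in followers[prev].most_common():
    print(f"P({nxt!r} | {prev!r}) = {count}/{total} = {count / total:.2f}")
```

In this toy corpus, “the” is followed by “cat”, “mat”, “dog”, and “rug” once each, so each gets a conditional probability of 0.25. With a real corpus, the distribution becomes much more informative.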
ChatGPT takes this idea further with a text completion model: given some input text, it predicts the most likely word to come next. This ability comes from a training process in which the model is fed an enormous amount of text data and learns the probabilities of words and sequences of words. Rather than working with words directly, the model converts the input text into tokens, numerical IDs that represent words or pieces of words. From these tokens it computes a probability for every possible next token, and by repeatedly appending a likely token it generates the most probable continuation of the input text.
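The real model is a large neural network rather than a lookup table, but the generation loop has the same shape: repeatedly ask “what is the most likely next token?” and append it. Here’s a toy version that reuses the bigram counts from the sketch above in place of the network; everything here is illustrative, not OpenAI’s actual code:

```python
def generate(start, followers, max_words=5):
    """Greedy text completion: always pick the most probable next word."""
    words = [start]
    for _ in range(max_words):
        options = followers.get(words[-1])
        if not options:
            break  # no known continuation for this word
        # The real model scores every token in its vocabulary here;
        # our toy model just looks up the most frequent follower.
        next_word = options.most_common(1)[0][0]
        words.append(next_word)
    return " ".join(words)

print(generate("the", followers))  # -> "the cat sat on the cat"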
To convert the text completion model into a chat model, reinforcement learning from human feedback is used. First, demonstration data is collected and used to fine-tune the model with supervised learning. Next, comparison data, in which humans rank several candidate responses to the same prompt, is used to train a reward model that scores responses. Finally, the model is optimized against this reward model using reinforcement learning.
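One concrete piece of this pipeline is the reward model’s training objective. A common formulation is a pairwise ranking loss: given two responses to the same prompt, one of which a human labeled as better, the loss pushes the reward of the preferred response above the other. A minimal PyTorch sketch, assuming `reward_chosen` and `reward_rejected` are scalar scores produced by the reward model:

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(reward_chosen: torch.Tensor,
                        reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss for a reward model.

    Minimizing -log(sigmoid(r_chosen - r_rejected)) raises the reward
    of the human-preferred response relative to the rejected one.
    """
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy example: the model currently scores the rejected answer higher,
# so the loss is large and training will push the scores apart.
chosen = torch.tensor([0.2])
rejected = torch.tensor([1.1])
print(reward_ranking_loss(chosen, rejected))  # ~1.24
```

Once the reward model is trained, the chat model is updated so that its responses earn higher rewards, which is the reinforcement learning step.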
By going through this process, ChatGPT becomes a conversational agent that can provide human-like responses. It can be trained to adopt different tones and formats, allowing it to take on different personas. Because the training loop keeps humans in it, the model can be steered toward the responses people actually want.
In this course, we will explore different prompts and techniques to get the most out of ChatGPT. We will also learn how to use ChatGPT to generate code and run it in an environment. With this understanding, you’ll see that ChatGPT is not magic, but rather a result of matrix multiplications and training processes.
Thank you for joining this lecture on understanding ChatGPT. Stay tuned for the next lecture on prompt engineering.