ChatGPT is based on a machine learning model known as the Transformer architecture. It is composed of millions or even billions of small components called neurons, which are organized into layers. Let’s walk through ChatGPT’s main moving parts.
The first step is the input layer. This is where the text you type gets converted into numbers. Each word or piece of a word is mapped to a specific numerical ID through a process called tokenization.
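To make this concrete, here is a deliberately tiny sketch of tokenization. Real tokenizers (such as byte-pair encoding) split text into subword pieces and use vocabularies of tens of thousands of entries; the toy vocabulary below is an assumption just for illustration.

```python
# Toy vocabulary: in a real model this would contain tens of thousands
# of subword pieces, not whole lowercase words.
vocab = {"i": 0, "need": 1, "a": 2, "list": 3, "of": 4,
         "popular": 5, "stephen": 6, "king": 7, "novels": 8}

def tokenize(text):
    """Map each lowercase word to its integer ID in the toy vocabulary."""
    return [vocab[word] for word in text.lower().split()]

print(tokenize("I need a list of popular Stephen King novels"))
# [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The key idea is simply that text becomes a sequence of integers before anything else happens.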
Let’s look at an example. Imagine you walk into a library and ask the librarian a question. Your words are like books that are barcoded, or converted into numbers, as they are handed to the librarian. So, what happens when you ask, ‘I need a list of popular Stephen King novels’?
The second layer is called the embedding layer. The token IDs from the input layer then go through this embedding layer, which is like a dictionary translating words into a language the model understands: each ID is mapped to a vector of numbers. If we tie this back to our example, the librarian checks a master catalog that helps her understand the context of each book that Stephen King has written. This is similar to embedding, where tokens get translated into a format the model can understand.
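In code, an embedding layer is just a big lookup table: each token ID indexes one row of a weight matrix. The sizes and random values below are assumptions for the sketch; in a real model the table is learned during training.

```python
import numpy as np

# Toy embedding layer: each token ID selects a row of the table,
# turning integer IDs into dense vectors the model can do math on.
rng = np.random.default_rng(0)
vocab_size, embed_dim = 9, 4          # assumed toy sizes
embedding_table = rng.normal(size=(vocab_size, embed_dim))

def embed(token_ids):
    return embedding_table[token_ids]  # shape: (len(token_ids), embed_dim)

vectors = embed([6, 7])                # IDs for "stephen", "king"
print(vectors.shape)                   # (2, 4)
```

Words with related meanings end up with nearby vectors once the table is trained, which is what lets the model reason about meaning rather than raw spelling.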
The information then passes through a third stage: a stack of Transformer layers, often described as encoder layers. This stage actually has two components that we’re going to talk about next. But before we do that, let’s look at what our librarian is doing for this stage. The librarian goes through aisles, or, in other words, layers in the library, to gather books to help answer your question.
Each of these layers has two main parts. Number one is the self-attention mechanism. This allows the model to weigh the importance of different words when considering each word. So, if you say, ‘I need a list of popular Stephen King novels,’ it understands that ‘Stephen’ and ‘King’ are more closely related to each other than to words like ‘the,’ ‘need,’ or ‘list.’
Number two is the feed-forward neural network. This is a mini network inside each layer that transforms the data coming out of the attention step.
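That mini network is typically just two linear transformations with a non-linearity in between, applied to each position independently. The sizes below are assumptions, though the 4x expansion is a common convention in Transformer designs.

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    """Position-wise feed-forward block: expand, apply ReLU, project back."""
    hidden = np.maximum(0, x @ W1 + b1)  # ReLU non-linearity
    return hidden @ W2 + b2

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32                    # hidden size expands ~4x (assumed toy sizes)
x = rng.normal(size=(5, d_model))        # 5 token vectors from the attention step
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
print(feed_forward(x, W1, b1, W2, b2).shape)  # (5, 8)
```

Notice the output has the same shape as the input, so many of these layers can be stacked one after another, like the librarian walking aisle after aisle.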
The fourth layer is called the output layer. Finally, the model uses what it’s learned to guess the next word in the sequence. It does this by calculating a probability for every possible next word, and the word with the highest probability gets selected.
In our example, after gathering the books, the librarian writes a summary, the model’s output, based on the most relevant information found in those books. She chooses each word in the summary carefully, based on which one seems to fit best next in the sentence.
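Turning scores into a chosen word can be sketched in a few lines. The vocabulary and scores below are made up for illustration; a real model scores every word in a vocabulary of tens of thousands, and often samples from the distribution rather than always picking the top word.

```python
import numpy as np

# Toy output layer: raw scores (logits) become probabilities via softmax,
# then the highest-probability word is selected.
vocab = ["the", "shining", "it", "carrie", "misery"]   # assumed toy vocabulary
logits = np.array([0.5, 2.3, 1.1, 0.2, -0.4])          # made-up scores

probs = np.exp(logits) / np.exp(logits).sum()          # probabilities sum to 1
next_word = vocab[int(np.argmax(probs))]
print(next_word)  # shining
```

Softmax guarantees the scores form a valid probability distribution, which is what makes "the word with the highest probability" a meaningful statement.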
The final step in the whole process loops back to the input. The new word from the output layer is added to the original query, and the process starts over until the complete answer is formed. In our example, if more detail is needed, the librarian returns to the library to gather more books, refine the summary, and make it more complete.
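This loop, predict a word, append it, predict again, is the heart of text generation, and it can be sketched in a few lines. Here `predict_next_word` is a hypothetical stand-in for the whole stack of layers described above, and the stop token and fake predictor are assumptions for the demo.

```python
def generate(prompt_tokens, predict_next_word, max_new_tokens=20, stop_token="<end>"):
    """Autoregressive loop: feed the growing sequence back in until done."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        word = predict_next_word(tokens)  # one full pass through the model
        if word == stop_token:
            break
        tokens.append(word)               # the new word becomes part of the input
    return tokens

# A fake predictor that always "completes" with a fixed answer (for demo only).
answer = iter(["The", "Shining", "<end>"])
print(generate(["list", "king", "novels"], lambda toks: next(answer)))
# ['list', 'king', 'novels', 'The', 'Shining']
```

The important point is that the model only ever predicts one word at a time; long answers emerge from running this loop over and over.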
All these parts work together using very complex math to come up with the answers that you see. It’s like a super advanced calculator for text. In this way, each component of ChatGPT has its own specialized role, working in tandem to provide a comprehensive and relevant answer to your question.
That’s all for this episode. Please like and subscribe to the channel if you look forward to more videos like this. Until next time!