Lesson 1: Terminology
Two years ago, in 2021, I wrote a post on the “Powers and Perils of Language Technologies”. Today, I look back and chuckle. I was so wrong. The incorporation of the transformer architecture, attention mechanisms, and the rise in available compute power have drastically unlocked the potential of language models.
In this series of blog posts, I am not trying to educate but to share this learning journey with all of you. I believe that LLMs have vast potential and have unlocked a new form of intelligence that can be harnessed to build products with massive impact in almost every industry.
Let's start with a few terms.
GPT: “Generative Pre-trained Transformer”
GPT was built by OpenAI, which proposed the first GPT model in 2018.
LLMs: Large Language Models
LLMs are next-word prediction engines: essentially probabilistic models that repeatedly predict the next word based on a probability distribution over possible continuations.
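To make that concrete, here is a minimal sketch of the next-word prediction loop, assuming the open GPT-2 model from the Hugging Face transformers library (and PyTorch) is available. It simply picks the most probable next token over and over:

```python
# Minimal next-word prediction loop with GPT-2 (assumes `transformers` and `torch` are installed).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "Large language models are"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Each step is just "pick a likely next token" and append it to the sequence.
for _ in range(10):
    with torch.no_grad():
        logits = model(input_ids).logits              # shape: (1, seq_len, vocab_size)
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    next_token = torch.argmax(next_token_probs)       # greedy: take the most probable token
    input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

In practice, sampling strategies (temperature, top-k, top-p) are used instead of always taking the single most probable token, but the core idea is the same.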
GPT-2, GPT-3, GPT-4, …:
GPT:
GPT was trained on BookCorpus, which contains about 7,000 unpublished books. The model has 117 million parameters.
GPT-2:
With GPT-2, OpenAI proposed an even bigger model containing 1.5 billion parameters. It was trained on an undisclosed corpus called WebText, which is about 10 times larger than BookCorpus (according to the paper describing GPT-2).
GPT-3:
With 175 billion parameters, it was an even bigger jump from GPT-2 than GPT-2 was from the first GPT.
GPT-3.5:
OpenAI denotes this class of instruction-tuned GPT models as “InstructGPT”.
GPT-4: Generative Pre-trained Transformer 4, a multimodal large language model developed by OpenAI and released on March 14, 2023.
Prompt-Engineering:
The process of carefully designing prompts to generate desired outputs from LLMs. It leverages the model’s in-context learning ability. Well-constructed prompts significantly influence the quality and relevance of the generated text, allowing users to achieve specific results.
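As a small illustration, here is a sketch of a few-shot prompt for sentiment classification. The `call_llm` helper is hypothetical and stands in for whichever model or API you use; the point is how the in-context examples steer the output:

```python
# Few-shot prompt engineering sketch. `call_llm` is a hypothetical helper,
# not a real library function; replace it with your LLM of choice.
def call_llm(prompt: str) -> str:
    ...  # send the prompt to an LLM and return its completion

few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day and the screen is gorgeous."
Sentiment: Positive

Review: "It stopped working after a week and support never replied."
Sentiment: Negative

Review: "Setup was painless and it just works."
Sentiment:"""

# The examples in the prompt nudge the model toward answering "Positive" here.
print(call_llm(few_shot_prompt))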
Fine-Tuning:
A process commonly used in machine learning and deep learning to adapt a pre-trained model to perform better on specific tasks or domains. It involves adjusting the weights of the model’s neural network layers to make it more suited for a particular task.
It follows the initial training on a large dataset and leverages transfer learning, where knowledge from a pre-trained model is applied to a related task with a smaller dataset.
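A minimal fine-tuning sketch, assuming the Hugging Face transformers and datasets libraries are installed: it adapts a pre-trained BERT model to sentiment classification on a small slice of the IMDB dataset.

```python
# Fine-tuning sketch: adapt a pre-trained model to a downstream task
# (assumes `transformers`, `datasets`, and `torch` are installed).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Use a small subset of IMDB just to illustrate the workflow.
dataset = load_dataset("imdb", split="train[:1000]")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)

args = TrainingArguments(output_dir="finetuned-bert", num_train_epochs=1,
                         per_device_train_batch_size=8)

# Trainer adjusts the pre-trained weights on the new task (transfer learning).
Trainer(model=model, args=args, train_dataset=dataset).train()
```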
Prompt-Tuning:
A more efficient method that tunes continuous prompts while keeping the language model frozen. This approach reduces storage and memory usage during training while adapting the model to new tasks.
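A minimal sketch, assuming the Hugging Face peft library: the base model's weights stay frozen and only a small set of continuous “virtual token” embeddings is trained.

```python
# Prompt-tuning sketch with the `peft` library (assumes `peft` and `transformers` are installed).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=8,   # only these 8 continuous prompt embeddings receive gradients
)
model = get_peft_model(base, config)

# The trainable parameters are a tiny fraction of the full model's weights.
model.print_trainable_parameters()
```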
Recommended Resources:
- ChatWithYourData course: learn step by step how to build your own chatbot. https://learn.deeplearning.ai/langchain-chat-with-your-data/lesson/1/introduction
- Generative AI with LLMs: for a technical understanding of LLM technology, building intuition for how different models work, and a practical + theoretical understanding of different training techniques (prompt tuning, prompt engineering, RLHF). https://www.coursera.org/learn/generative-ai-with-llms
Papers: (If you really want to get to the source of it all)
1. Attention Is All You Need
https://arxiv.org/abs/1706.03762