Posts

How GPT Models Work

Introduction

It was 2021 when I wrote my first few lines of code using a GPT model, and that was the moment I realized that text generation had reached an inflection point. Prior to that, I had written language models from scratch in grad school, and I had experience working with other text generation systems, so I knew just how difficult it was to get them to produce useful results. I was fortunate to get early access to GPT-3 as part of my work on the announcement of its release within the Azure OpenAI Service, and I tried it out in preparation for its launch. I asked GPT-3 to summarize a long document and experimented with few-shot prompts. I could see that the results were far more advanced than those of prior models, making me excited about the technology and eager to learn how it’s implemented. And now that the follow-on GPT-3.5, ChatGPT, and GPT-4 models are rapidly gaining wide adoption, more people in the field are also curious about how they work. While the details of their

Machine Learning is Fun