AI Models: Does Size Matter?

17 February 2023

Explore the impact of AI model size and specialization, from GPT-3 to ChatGPT. Can smaller, more focused models outperform their larger counterparts? Discover the crucial role of data focus and training procedures in achieving optimal performance. What lies ahead for generative AI?

When explaining SoMin’s ad copy and banner generation features, I’m often asked whether we have replaced GPT-3 with ChatGPT or whether we are still running outdated models. “We haven’t, and we are not going to,” is usually my reply, which often leaves a look of surprise on our customers’ faces. Below, let me shed some light on why I give that answer despite the booming popularity of OpenAI’s chatbot, ChatGPT.

The Seats At The Table

First and foremost, GPT-2, GPT-3, ChatGPT and, very likely, GPT-4 all belong to the same family of AI models: transformers. This means that, unlike the previous generation of machine learning models, they are trained to solve broad, unified tasks (a.k.a. meta-learning), so they do not require retraining for every specific task to produce viable results. The latter explains their giant sizes (175 billion parameters in the case of GPT-3): a model needs to “remember the whole Internet” in order to be flexible enough to “switch” between the different pieces of data it has learned based on the user input. Then, when a user specifies an input query describing the task along with a couple of examples (just as you would explain to a librarian what kind of books you are interested in), the model produces the result: voilà. This approach is called “few-shot learning,” and it has become the standard way of prompting modern transformer models.
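To make the idea concrete, here is a minimal sketch of how a few-shot prompt might be assembled before being sent to a model like GPT-3. The task, the examples, and the helper function are all hypothetical illustrations, not SoMin's actual pipeline:

```python
def build_few_shot_prompt(task_description, examples, query):
    """Assemble a few-shot prompt: a task description, a couple of
    worked examples (the "shots"), and finally the new query."""
    lines = [task_description, ""]
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")
    # The model is expected to continue the text after the final "Output:".
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

# Hypothetical sentiment-labelling task with two example shots.
prompt = build_few_shot_prompt(
    "Classify the sentiment of each sentence as Positive or Negative.",
    [("I loved this product!", "Positive"),
     ("The delivery was late and the box was damaged.", "Negative")],
    "The ad copy really resonated with our audience.",
)
print(prompt)
```

The point is that no retraining happens: the examples live entirely in the input text, and the model infers the task from them.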

But do we always need the knowledge of the whole internet to solve the task at hand? Certainly not. In many cases, as with ChatGPT, we instead need a bunch (well, a few million) of task-specific data samples to kick off the “Reinforcement Learning from Human Feedback (RLHF)” procedure. RLHF, in turn, spins up a collaborative training process between AI and humans that further tunes the model to produce engaging, human-like conversations. As a result, we have ChatGPT impressing us not just with excellent performance in the chatbot scenario but also by helping us write short-form content, like poems or song lyrics, and long-form content such as full papers (disclaimer: this article is written 100% by me, a human); explaining complex topics in simple terms, or providing in-depth knowledge when we need a quick answer; brainstorming new topics and ideas, which helps in the creative process; and supporting sales departments with personalized communication, such as email response generation.

I have purposely not mentioned applications like daily news summarization or weather forecasting. While large transformer models could technically attempt such tasks, ChatGPT, and even GPT-4, is unlikely to handle them well. The reason is that ChatGPT and OpenAI’s other transformers are pretrained models, so their knowledge of world events is very limited: their data is not refreshed often enough because retraining carries extremely heavy computational requirements. This is probably the biggest disadvantage of all the pretrained models that OpenAI (and anyone else) has produced so far. A further issue is more specific to ChatGPT: unlike GPT-3, it was trained on a very focused conversational dataset, so it will surpass its older brother only in conversational tasks while being less capable in other human productivity tasks.

A Growing Family

Alright, now we understand that ChatGPT is just a smaller, more specialized version of GPT-3. But does that mean we will see more such models emerge in the near future: MarGPT for marketing, AdGPT for digital ads, MedGPT for medical question answering?

I think it is likely, and here is why: When SoMin applied for access to the GPT-3 beta, in addition to filling out a lengthy application form with a detailed explanation of the software we planned to build, we were asked to agree to provide feedback on how we were using the model day to day and on the results we obtained. OpenAI did this for a reason: being mainly a research organization, they needed business insight into the best applications of their models, and they crowdsourced it in exchange for a chance to be part of this great AI revolution story. The chatbot application was apparently one of the most popular, so ChatGPT came out first. ChatGPT is not just smaller (20 billion vs. 175 billion parameters) and therefore faster than GPT-3; it is also more accurate than GPT-3 on conversational tasks, a perfect business case for a lower-cost, better-quality AI product.

So, is bigger better when it comes to generative AI? The answer is that it depends. When we are building a universal model capable of many tasks, then yes: bigger appears to be better, as proven by the superiority of GPT-3 over GPT-2 and its other predecessors. But when we want to do one particular task really well, as ChatGPT does with conversation, then data focus and an appropriate training procedure matter far more than model and data size. That is why, at SoMin, we are not using ChatGPT for copy and banner generation but rather leveraging specific digital advertising-related data to guide GPT-3 to produce better content for the new ads the world is yet to see.
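Guiding a general model with task-specific data usually means curating prompt/completion pairs for fine-tuning. A minimal sketch of preparing such pairs in the JSONL format that OpenAI's (early-2023) fine-tuning pipeline accepted is shown below; the ad-copy samples are invented for illustration and are not SoMin's actual training data:

```python
import json

# Hypothetical ad-copy training pairs; in practice these would come
# from real campaign data, not hand-written strings.
samples = [
    {"prompt": "Product: running shoes. Audience: marathoners. Ad copy:",
     "completion": " Go the distance in shoes built for mile 26."},
    {"prompt": "Product: cold brew coffee. Audience: students. Ad copy:",
     "completion": " All-nighter fuel, minus the jitters."},
]

# Serialize to JSONL: one JSON object per line, one training example each.
jsonl = "\n".join(json.dumps(s) for s in samples)
print(jsonl)
```

A few thousand well-curated, domain-focused pairs like these can shape a model's output style far more effectively than simply scaling up the parameter count.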

Then what is the future of generative AI, one might ask? Multi-modality is one of the inevitable progressions we will see in the upcoming GPT-4, as OpenAI CEO Sam Altman mentioned in his speech. At the same time, Altman dispelled the rumor that the model will have 100 trillion parameters. We all know bigger isn’t always better.


Prof. Aleks Farseev, PhD

Aleks Farseev is a machine learning wizard who can teach a computer to sing "Bohemian Rhapsody" in binary code. He loves conjuring up new creations and is on a quest to figure out how machine learning can make the world a better place. When not tinkering with technology, Aleks can be found serenading his friends with his accordion skills, which he claims are only slightly less impressive than his machine learning prowess.
