How large language models are revolutionizing AI

Alex, June 9, 2023

Imagine for a moment you’re having a conversation with a friend. You’re listening, responding, and sharing ideas. The flow is natural, the dialogue is coherent, and the interaction is engaging. Now, what if I told you that this conversation wasn’t with a person but with an artificial intelligence model? Would you believe me?

Let’s dive into the fascinating world of language models. By the end of this article, not only will you believe it, but you’ll also understand how it’s possible and why it’s becoming increasingly relevant in our technological world.

A beginner’s guide: What is a language model and how does it work?

In simple terms, a language model is an AI system that’s been trained to understand, generate, and engage with human language. Think of it as a digital entity that’s learned to talk, read, write, and respond like a human.

How do they work? These models learn from vast amounts of text data. They are fed countless sentences and phrases and are trained to predict what comes next in a given string of words. It’s like a very complex and advanced game of ‘fill in the blanks’. For instance, if you gave a language model the phrase “The sky is…”, it might predict the next word to be “blue” or “clear”. Over time, and with enough training, these models can generate remarkably human-like text.
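To make that ‘fill in the blanks’ idea concrete, here is a minimal sketch of the statistical version of next-word prediction: a bigram model that simply counts which word tends to follow which in a tiny, made-up corpus. The corpus and the function name are invented for illustration; real language models replace this lookup table with a neural network, but the prediction task is the same.

```python
from collections import Counter, defaultdict

# A tiny illustrative corpus; real models train on billions of sentences.
corpus = [
    "the sky is blue",
    "the sky is clear",
    "the sky is blue today",
    "the grass is green",
]

# Count how often each word follows each other word (a bigram model).
follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1

def predict_next(word):
    """Return the most frequent follower of `word` seen in training."""
    followers = follow_counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

print(predict_next("is"))   # -> 'blue' (seen twice, vs. 'clear' and 'green' once each)
print(predict_next("sky"))  # -> 'is'
```

Ask it what follows “is” and it answers “blue”, purely because that continuation was the most frequent one in its training data. Neural language models do the same job probabilistically, with learned parameters that let them generalize to word sequences they have never seen.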
Unveiling the titans: Top large language models

Now that we have a basic understanding, let’s introduce some of the leading stars in the language model universe.

1. GPT-3 by OpenAI

Meet GPT-3, the darling of the language model world. Developed by OpenAI, this titan has a whopping 175 billion parameters. It uses deep learning, a subfield of machine learning, to produce human-like text. GPT-3 has shown incredible versatility: it has helped a theatre group in the U.K. write a play, generated phishing emails for a cybersecurity study, and even powered AI Dungeon, a text-based adventure game. Despite some issues with the content it generates, this wide range of applications showcases its flexibility.

2. Bloom by Hugging Face and BigScience

Bloom is the new kid on the block, boasting 176 billion parameters. What sets it apart is its multilingual capacity: it can generate text in 46 natural languages and 13 programming languages – a giant leap for inclusivity in the AI world. It’s also open source, which means developers can access and use it freely. Bloom’s unveiling has sparked excitement, with experts suggesting it could democratize the research and development of large language models.

3. ESMFold by Meta AI

ESMFold, developed by Meta AI, comes in with a comparatively modest 15 billion parameters. But don’t let its size fool you. This model is an expert in a very specific domain: predicting full atomic-level protein structures from a single protein sequence. This groundbreaking capability could revolutionize drug discovery and other fields of biochemistry.

4. Gato by DeepMind

Gato is another interesting player, with versions ranging from 79 million to 1.18 billion parameters. It’s a “general purpose” system capable of tackling a wide array of tasks, from playing Atari games to captioning images and chatting. Despite some weaknesses, such as generating superficial or incorrect responses, its ability to perform diverse tasks is a stepping stone towards artificial general intelligence.

5. WuDao 2.0 by Beijing Academy of Artificial Intelligence

Here’s the heavyweight champion of the world, WuDao 2.0. With an incredible 1.75 trillion parameters, it is the world’s largest language model. Its capabilities range from simulating conversational speech and writing poems to understanding images. Its enormous size, however, doesn’t necessarily equate to superior quality.

6. MT-NLG by Nvidia and Microsoft

The Megatron-Turing Natural Language Generation model (MT-NLG) is a collaboration between Nvidia and Microsoft, coming in with a hefty 530 billion parameters. It can perform a range of natural language tasks, making it a powerful tool for many applications.

7. LaMDA by Google

We talked about LaMDA and how this LLM powers Google’s AI Bard in a past article. Google’s contribution to the language model race is LaMDA, packing 137 billion parameters. What’s unique about LaMDA is that it was trained on dialogue, allowing it to generate more natural, open-ended conversations than traditional models.

Introducing the behemoth: GPT-4 by OpenAI

With all the impressive models we’ve covered, you might be wondering: what could possibly top them? Enter GPT-4, the most advanced language model developed by OpenAI to date. Its capabilities are astounding, but what’s really grabbed the headlines is its mind-boggling size in terms of parameters. You’ve seen us mention ‘parameters’ quite a lot so far, so let’s take a moment to explain what parameters are and why they’re crucial in large language models like GPT-4.

The heart of the model: What are parameters?

Parameters are the numerical values that dictate how a neural network, like GPT-4, processes input data to generate output data. They’re the fundamental components that encode the model’s knowledge and skills.

To understand parameters, picture a giant web of interconnected nodes, where each node represents a concept or feature the model has learned. The parameters are the weights assigned to the connections between these nodes, dictating how the model processes and relates these concepts.

During training, the model learns these parameters from the input data. It’s a continuous process of adjusting the weights to minimize the difference between the model’s predictions and the actual data. The more parameters a model has, the more complex and expressive it can be, allowing it to handle more data and detail.
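To see what ‘learning parameters’ means in the smallest possible setting, here is a sketch of a model with exactly two parameters, a weight and a bias, trained by nudging both values to shrink its prediction error. The data, learning rate, and variable names are all invented for illustration; GPT-4 applies the same principle to, reportedly, trillions of parameters.

```python
# A two-parameter "model": prediction = weight * x + bias.
# Training data chosen so the ideal parameters are weight=2, bias=1.
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

weight, bias = 0.0, 0.0   # the model's parameters, before training
learning_rate = 0.01

for step in range(2000):
    for x, target in data:
        prediction = weight * x + bias
        error = prediction - target
        # Adjust each parameter slightly in the direction that reduces the
        # squared error (gradient descent, the same principle used for LLMs).
        weight -= learning_rate * 2 * error * x
        bias   -= learning_rate * 2 * error

print(round(weight, 2), round(bias, 2))  # -> approximately 2.0 and 1.0
```

After training, the two parameters settle near 2.0 and 1.0, the values that best explain the data. That is all a parameter is: a number the training process has tuned to reduce prediction error.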
The power of parameters: How many does GPT-4 have?

Now, onto the burning question: how many parameters does GPT-4 have? Some sources claim that GPT-4 boasts a staggering 170 trillion parameters. To put this into perspective, that would be more than 100,000 times larger than GPT-2 and nearly 1,000 times larger than GPT-3, which had 1.5 billion and 175 billion parameters respectively.

However, it’s important to note that OpenAI hasn’t confirmed these numbers officially. Other sources suggest GPT-4 could have 100 trillion or even just 1 trillion parameters. While this uncertainty remains, what we can confirm is that GPT-4 is significantly larger than its predecessors.

Why parameters matter: The benefits and challenges of having more parameters

You might be thinking, “Great, GPT-4 has a lot of parameters. But why does this matter?” The number of parameters can significantly affect a language model’s capabilities and performance. Here’s what a trillion or so parameters bring to the table for GPT-4:

Multimodal data processing: GPT-4 can handle not just text but also images as inputs, expanding the range of tasks it can tackle. For instance, it can describe images, summarize screenshots, or even answer questions based on diagrams.

Complex task handling: More parameters mean broader general knowledge and better problem-solving abilities. GPT-4 has proven this by passing a simulated bar exam with a score around the top 10% of test takers, a significant improvement over its predecessor.

Coherent texts: GPT-4 can generate longer and more consistent texts thanks to its much larger ‘context window’ – the amount of input it can consider at once. GPT-4’s context window is a whopping 32,768 tokens, compared to GPT-3’s 2,049 tokens.

Human-like intelligence: GPT-4 can generate, edit, and iterate on creative and technical writing tasks. It can follow nuanced instructions, such as adjusting its tone of voice or output format, which brings it a step closer to exhibiting human-like intelligence.

However, having more parameters isn’t all sunshine and rainbows. It also brings its fair share of challenges, such as:

Computing costs: Training a model with so many parameters requires substantial computational resources. OpenAI had to redesign its entire deep learning stack and develop a supercomputer specifically to handle GPT-4’s demands. The cost of training GPT-4 has been estimated at more than $100 million!

Training time: Training a model of GPT-4’s size takes an extensive amount of time. Although OpenAI hasn’t revealed the exact training time for GPT-4, we can safely assume it’s far longer than for its predecessors.

Alignment with human values: With more parameters, aligning the model’s outputs with human values becomes increasingly complex. To ensure GPT-4’s outputs are safe and beneficial, OpenAI incorporated more human feedback into its training process.

The unveiling of GPT-4 is a testament to the tremendous potential of large language models, and it emphasizes the critical role parameters play in these models. While more parameters can enhance a model’s abilities, they also bring significant challenges. With each advancement, however, we’re edging closer to a future where AI can seamlessly integrate into our daily lives, helping us in ways we can’t yet fully imagine.

How training large language models works

Training these language models is a computational heavyweight task. Developers feed the models vast amounts of text data, typically sourced from the internet, books, articles, and other digital text sources. The model then learns to predict the next word in a sentence based on the words it has seen before. For instance, after seeing the phrase “I like to eat” followed by “apples” thousands of times in its training data, the model learns that “apples” is a likely word to follow “I like to eat”. The parameters of the model, which can number from millions to trillions, represent the learned relationships between different words and phrases.
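As a hedged sketch of what that looks like in code, here is a toy next-token training loop written in PyTorch. Everything here is an invented, scaled-down stand-in: the corpus is one repeated sentence, the tokens are single characters rather than subwords, and the model (called TinyLM here) is just an embedding plus a linear layer instead of a deep transformer. The recipe, though, is the same one large models use: predict the next token, measure the error, adjust the parameters.

```python
import torch
import torch.nn as nn

# A toy character-level corpus; real LLMs train on terabytes of subword tokens.
text = "i like to eat apples. i like to eat apples. i like to read books."
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}

data = torch.tensor([stoi[ch] for ch in text])
inputs, targets = data[:-1], data[1:]  # at each position, predict the next character

class TinyLM(nn.Module):
    """A minimal 'language model': an embedding layer plus a linear layer.
    Real models stack many transformer layers between these two."""
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, idx):
        return self.head(self.embed(idx))  # logits over the vocabulary

model = TinyLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(300):
    logits = model(inputs)            # shape: (sequence_length, vocab_size)
    loss = loss_fn(logits, targets)   # penalty for wrong next-character guesses
    optimizer.zero_grad()
    loss.backward()                   # compute how to adjust every parameter
    optimizer.step()                  # nudge all parameters to reduce the loss

print(sum(p.numel() for p in model.parameters()), "parameters")
```

The values being adjusted by `optimizer.step()` here number fewer than a thousand; in GPT-3 the same role is played by 175 billion of them, which is why the training run needs a supercomputer instead of a laptop.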
Are there small language models too?

In a sense, yes: there are smaller language models too. Language models range in size and complexity depending on the amount of training data used and the number of parameters they contain. They are typically categorized based on their size – small, medium, large, and extra-large – though the specific terminology can vary.

The “smaller” models can be anything from simple rule-based systems or statistical models like n-gram models (the bigram predictor sketched earlier is a minimal n-gram model) to more sophisticated architectures like Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks. Early transformer models like the original Transformer, and smaller versions of BERT (like BERT-Base) and GPT (like GPT-1), can also be considered small compared to their successors.

BERT-Base, for instance, has 110 million parameters, which is considered ‘small’ compared to models like BERT-Large, GPT-3, or GPT-4. Similarly, DistilBERT is a smaller, faster, and lighter version of BERT, created through a process called distillation. Even within the GPT series, there are smaller versions: GPT-1 and GPT-2 have 117 million and 1.5 billion parameters respectively. The modest size of these models makes them more computationally efficient and easier to run on limited hardware, though they generally offer lower performance than larger models.

A comparison: How do these large language models measure up?

When comparing these large language models, it’s important to consider not only the number of parameters but also their applications and constraints. For instance, GPT-4’s availability across diverse applications and its substantial training in various domains make it a very versatile model, despite Microsoft’s exclusive license to its underlying model. On the other hand, Bloom’s edge lies in its multilingual capacity, making it a strong contender in a globalized world.

ESMFold stands out with its unique specialization in predicting protein structures. Gato, despite its smaller parameter count, shows promise with its general-purpose abilities. WuDao 2.0, although the largest, faces criticism for its lack of transparency regarding its training datasets and specific applications. MT-NLG showcases collaborative strength, and LaMDA shines in its ability to handle more natural, open-ended conversations.

In the end, the “best” model depends on the specific needs of the application. What is clear, however, is that these large language models are paving the way towards a future where humans and AI interact seamlessly in ever more complex and meaningful ways. So, next time you chat with a customer service bot or use a text prediction tool, remember: you might be engaging with a language model trained with billions of parameters!