
GPT-4 vs ChatGPT: We analyze training methods, performance, capabilities and limitations

The new generative language model is expected to transform entire industries, including media, education, law and technology.

In recent months, the speed with which innovative large language models have been released has been astonishing. In this article, we cover the main similarities and differences between GPT-4 and ChatGPT, including their training methods, performance, capabilities and limitations.

GPT-4 vs ChatGPT: Similarities and differences in training methods

GPT-4 and ChatGPT build on earlier GPT models, with improvements to the model architecture, more sophisticated training methods, and a larger number of parameters.

Both designs are based on the transformer architecture, which uses an encoder to process input sequences and a decoder to generate output sequences. The encoder and decoder are connected by an attention mechanism, which allows the decoder to focus on the most relevant parts of the input sequence.
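
To make the idea of attention concrete, here is a minimal sketch of scaled dot-product attention in Python. It is purely illustrative: the shapes, names and NumPy implementation are our own simplifications, not anything taken from OpenAI's models.

```python
# Minimal sketch of scaled dot-product attention, the mechanism that lets
# a transformer weight the most relevant parts of the input sequence.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: arrays of shape (seq_len, d_k); V: (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the keys
    return weights @ V                                   # weighted sum of the values

# Toy example: 4 tokens with 8-dimensional queries, keys and values
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)       # (4, 8)
```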

OpenAI's GPT-4 technical report offers little insight into the model architecture and training process, citing the “competitive landscape and the safety implications of large-scale models”. What we do know is that GPT-4 and ChatGPT were probably trained in a similar way, which marks a significant departure from the training methods used for GPT-2 and GPT-3. We know far more about the training methods for ChatGPT than for GPT-4, so we'll start there.

ChatGPT

ChatGPT is trained on dialogue datasets, including demonstration data in which human annotators show the expected output of a chatbot assistant in response to specific prompts. This data is used to fine-tune GPT-3.5 with supervised learning, producing a policy model, which is then used to generate multiple responses to a given prompt. Human annotators rank which of those responses are best, and these rankings are used to train a reward model. The reward model is then used to iteratively fine-tune the policy model with reinforcement learning.
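
To illustrate the reward-model step, here is a hedged sketch of the pairwise ranking loss commonly used when a reward model is trained from human preference rankings (as described for InstructGPT-style pipelines). The RewardModel class, its dimensions and the toy embeddings below are illustrative assumptions, not OpenAI's actual implementation.

```python
# Sketch of pairwise reward-model training: the preferred response should
# receive a higher scalar reward than the rejected one.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy stand-in: maps an already-pooled response embedding to a scalar reward."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

def pairwise_ranking_loss(reward_chosen, reward_rejected):
    # -log(sigmoid(r_chosen - r_rejected)): pushes the chosen response's
    # reward above the rejected response's reward.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage with random "embeddings" standing in for encoded responses
rm = RewardModel()
chosen, rejected = torch.randn(4, 768), torch.randn(4, 768)
loss = pairwise_ranking_loss(rm(chosen), rm(rejected))
loss.backward()
```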

ChatGPT is trained using Reinforcement Learning from Human Feedback (RLHF), a technique for incorporating human feedback into training to improve a language model. This allows the model's output to align with the task the user actually requested, rather than simply predicting the next word in a sentence based on a body of generic training data, as GPT-3 does.
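
As a rough illustration of the reinforcement learning step, the sketch below shows the kind of reward signal typically optimized in RLHF: the reward model's score minus a KL penalty that keeps the fine-tuned policy close to the supervised model. The function name, the beta coefficient and the tensors are assumptions for illustration, not OpenAI's internals.

```python
# Sketch of a typical per-response RLHF reward with a KL penalty.
import torch

def rlhf_reward(reward_model_score: torch.Tensor,
                logprobs_policy: torch.Tensor,
                logprobs_sft: torch.Tensor,
                beta: float = 0.02) -> torch.Tensor:
    """reward_model_score: (batch,) scalar score per generated response.
    logprobs_*: (batch, seq_len) log-probabilities of the sampled tokens."""
    kl_per_token = logprobs_policy - logprobs_sft     # estimate of KL(policy || sft)
    kl_penalty = beta * kl_per_token.sum(dim=-1)      # summed over the response
    return reward_model_score - kl_penalty            # the quantity maximized with RL (e.g. PPO)

# Toy usage
score = torch.tensor([1.2, 0.4])
print(rlhf_reward(score, torch.randn(2, 10), torch.randn(2, 10)))
```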

GPT-4

OpenAI has yet to divulge details on how it trained GPT-4. Its technical report does not include “details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar”. What we do know is that GPT-4 is a transformer-style generative multimodal model, trained both on publicly available data and on data licensed from third parties, and subsequently fine-tuned using RLHF. Interestingly, OpenAI did share details about its updated RLHF techniques, which aim to make model responses more accurate and less likely to drift outside its safety guardrails.

After training a policy model (as with ChatGPT), RLHF is used for adversarial training, a process in which the model is deliberately fed malicious examples designed to trick it, so that it learns to defend against such examples in the future. In the case of GPT-4, domain experts evaluate the policy model's responses to these adversarial prompts. Those evaluations are then used to train additional reward models that iteratively refine the policy model, resulting in a model that is less likely to produce dangerous, evasive, or inaccurate responses.
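
The sketch below gives a rough idea of how adversarial (red-team) prompts could be folded back into this loop: generate responses to known-adversarial prompts, score them, and keep the problematic cases for further refinement. Every function and threshold here is a hypothetical placeholder; OpenAI has not published its actual pipeline.

```python
# Hypothetical loop for collecting adversarial cases to refine a policy model.
from typing import Callable, List, Tuple

def collect_adversarial_training_pairs(
    adversarial_prompts: List[str],
    generate: Callable[[str], str],               # policy model's sampling function (placeholder)
    safety_score: Callable[[str, str], float],    # expert / reward-model score in [0, 1] (placeholder)
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    flagged = []
    for prompt in adversarial_prompts:
        response = generate(prompt)
        score = safety_score(prompt, response)
        if score < threshold:                     # unsafe or evasive answer: keep it for further tuning
            flagged.append((prompt, response, score))
    return flagged

# Toy usage with stub functions
demo = collect_adversarial_training_pairs(
    ["How do I pick a lock?"],
    generate=lambda p: "I can't help with that.",
    safety_score=lambda p, r: 0.9,
)
print(demo)  # [] -- nothing flagged in this toy run
```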

GPT-4 vs ChatGPT: Similarities and differences in performance and capabilities

Capabilities

In terms of functionality, ChatGPT and GPT-4 are more similar than different. Like its predecessor, GPT-4 interacts in a conversational style that aims to align with the user's intent, and for broad questions the answers from the two models are often very similar.

OpenAI acknowledges that the distinction between the models can be subtle, stating that “the difference comes out when the complexity of the task reaches a sufficient threshold”. Given the six months of adversarial testing the GPT-4 base model underwent during its post-training phase, this is probably an accurate characterization.

Unlike ChatGPT, which accepts only text, GPT-4 accepts both image and text prompts and returns text responses. As of this writing, unfortunately, the ability to use image inputs is not yet publicly available.
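
For readers curious what an image-plus-text request might look like, here is a hedged sketch using the OpenAI Python SDK. Since image input was not publicly available at the time of writing, the model identifier and payload format shown follow the vision API as later published and may differ from what you can actually call.

```python
# Sketch of an image-plus-text prompt via the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```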

Performance

As mentioned above, OpenAI reports a significant improvement in safety performance for GPT-4 compared to GPT-3.5 (from which ChatGPT was fine-tuned). However, it is currently unclear whether:

  • the reduced tendency to respond to requests for prohibited content,
  • the reduced generation of toxic content, and
  • the improved responses to sensitive topics

are due to the GPT-4 model itself or to the additional adversarial testing.

Additionally, GPT-4 outperforms GPT-3.5 on most academic and professional exams designed for humans. Notably, GPT-4 scores in the 90th percentile on the Uniform Bar Exam, compared to the 10th percentile for GPT-3.5. GPT-4 also significantly outperforms its predecessor on traditional language model benchmarks, as well as other state-of-the-art models (albeit sometimes by a narrow margin).

GPT-4 vs ChatGPT: Limitations and risks

Both ChatGPT and GPT-4 have significant limitations and risks. The GPT-4 system card includes insights from a detailed exploration of those risks conducted by OpenAI.

These are just some of the risks associated with both models:

  • Hallucinating, i.e. producing nonsensical or factually inaccurate content
  • Producing harmful content that violates OpenAI's policies (e.g. hate speech, incitement to violence)
  • Amplifying and perpetuating stereotypes about marginalized groups
  • Generating realistic disinformation intended to deceive

While ChatGPT and GPT-4 struggle with the same limitations and risks, OpenAI has made special efforts, including extensive adversarial testing, to mitigate them for GPT-4. While this is encouraging, the GPT-4 system card ultimately demonstrates how vulnerable ChatGPT was (and perhaps still is). For a more detailed discussion of harmful unintended consequences, I recommend reading the GPT-4 system card, which begins on page 38 of the GPT-4 technical report.

Conclusion

While we know little about the model architecture and training methods behind GPT-4, it appears to be a refined version of ChatGPT. In fact, GPT-4 can now accept both image and text input, and its results are safer, more accurate and more creative. Unfortunately, we'll have to take OpenAI's word for it, as GPT-4 is currently only available as part of the ChatGPT Plus subscription.

Staying informed about the progress, risks and limitations of these models is essential as we navigate this exciting and rapidly evolving landscape of large language models.

BlogInnovazione.it
