In recent years, a new paradigm has emerged around language models: neural networks that simply predict the next word in a sentence given the previous words.
After being trained on a large body of unlabeled text, language models can be "prompted" to perform arbitrary tasks by phrasing them as next-word prediction. For example, the task of translating an English phrase into Swahili can be rephrased as predicting the next word: "The Swahili translation of 'artificial intelligence' is ..."
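To make the idea concrete, here is a minimal sketch of next-word prediction using a toy bigram model: it counts, for each word in a tiny corpus, which word most often follows it. This is only an illustration of the prediction objective; real language models are neural networks trained on vastly larger corpora, and the corpus and translations here are made up for the example.

```python
from collections import Counter, defaultdict

# Toy training text (illustrative only, not real training data).
corpus = (
    "the swahili translation of artificial intelligence is akili bandia . "
    "the swahili translation of computer is kompyuta ."
).split()

# Count, for each word, which words were observed following it.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently observed after `word`."""
    return following[word].most_common(1)[0][0]

# "Prompting" here is just asking for the most likely continuation.
print(predict_next("of"))
```

The same mechanism, scaled up to billions of parameters and a neural architecture, is what lets a prompt like "The Swahili translation of 'artificial intelligence' is ..." elicit a translation.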
This new paradigm represents a shift from task-specific models, trained to perform a single task, to task-general models, which can perform many different tasks. Moreover, task-general models can also perform new tasks that were not explicitly included in their training data. For example, GPT-3 showed that language models can successfully multiply two-digit numbers, even though they were never explicitly trained to do so. However, this ability to perform new tasks appeared only in models with a sufficient number of parameters trained on a sufficiently large dataset.
The idea that quantitative changes in a system can lead to new behavior is known as emergence, a concept popularized by Nobel laureate Philip Anderson's 1972 essay "More Is Different". Emergent phenomena have been observed in complex systems across many disciplines, such as physics, biology, economics, and computer science.
In a recent article published in Transactions on Machine Learning Research, researchers affiliated with Stanford HAI define emergent abilities in large language models as follows:
An ability is emergent if it is not present in smaller models but is present in larger models.
To characterize emergent abilities, the article aggregated findings from the many models and approaches that have appeared in the two years since the release of GPT-3. The paper examined research analyzing the influence of scale: models of different sizes trained with different amounts of computation. For many tasks, model performance either improves predictably with scale or jumps unpredictably from random-level performance to above-random performance at a specific scale threshold.
To learn more, read the article on emergent abilities in language models.
Jason Wei is a research scientist at Google Brain. Rishi Bommasani is a second-year doctoral student in Stanford's Department of Computer Science who helped launch the Stanford Center for Research on Foundation Models (CRFM). Read their study "Emergent Abilities of Large Language Models", written in collaboration with scholars from Google Research, Stanford University, UNC Chapel Hill, and DeepMind.