NSF Stories

Verbal nonsense reveals limitations of AI chatbots

Researchers tracked how current language models such as ChatGPT mistake nonsense sentences for meaningful ones

October 24, 2023

Artificial intelligence chatbots use large language models to generate responses that seemingly mimic the way humans use and understand language, but a new study shows that these models remain vulnerable to mistaking nonsense for natural language. For the team of researchers at Columbia University who conducted the study, this flaw could point toward ways to improve chatbot performance and help reveal how humans process language.

The U.S. National Science Foundation funded the research, and the paper is published in Nature Machine Intelligence. The scientists presented hundreds of pairs of sentences to people who participated in the study, asking them which sentence in each pair they thought was more likely to be read or heard in everyday life. The researchers then presented the same sentence pairs to nine different language models to see how the models would rate each pair.

In head-to-head tests, more sophisticated AI language models tended to perform better than simpler recurrent neural network models and statistical models that just tally the frequency of word pairs found on the internet or in online databases. But all the models made mistakes, sometimes choosing sentences that sound like nonsense to a human ear.

"That some of the large language models perform as well as they do suggests that they capture something important that the simpler models are missing," said Nikolaus Kriegeskorte, a co-author of the paper. "That even the best models we studied still can be fooled by nonsense sentences shows that their computations are missing something about the way humans process language."

"Every model exhibited blind spots," added senior author Christopher Baldassano, a Columbia psychologist. "That should give us pause about the extent to which we want AI systems making important decisions, at least for now."

A key question for the research team is whether the computations in AI chatbots can inspire new scientific questions and hypotheses that could guide neuroscientists toward a better understanding of the human brain.

"Ultimately, we are interested in understanding how people think," said Tal Golan, the paper's corresponding author. "Comparing [the models'] language understanding to ours gives us a new approach to thinking about how we think."