New research raises serious concerns about the reliability of artificial intelligence, particularly large language models (LLMs) such as ChatGPT. A study by JV Roig at Kamiwaza AI found that as the amount of input text grows, these models become more likely to "hallucinate," that is, to generate incorrect or fabricated information. With a 32,000-word input, for example, the GLM 4.5 model from China's Zhipu AI erred only 1.2% of the time; at 128,000 words the error rate jumped to 3.2%, and at 200,000 words some models failed outright, hallucinating answers in most instances.
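To make the kind of measurement described above concrete, the sketch below shows one way such error rates could be tallied from per-question evaluation results. This is a minimal illustration, not the study's actual code: the record layout, the model label, the sample values, and the hallucination_rates helper are all invented for the example.

```python
# Hypothetical illustration (not the study's code): given per-question results
# recorded at several input lengths, tally how often a model's answer was wrong
# at each length.
from collections import defaultdict

# Each record: (input_length_in_words, model_name, answer_was_correct)
results = [
    (32_000, "glm-4.5", True),
    (32_000, "glm-4.5", False),
    (128_000, "glm-4.5", False),
    (128_000, "glm-4.5", True),
    # ... a real evaluation would include many trials per model and length
]

def hallucination_rates(records):
    """Return {(model, input_length): fraction of incorrect answers}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for length, model, correct in records:
        key = (model, length)
        totals[key] += 1
        if not correct:
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}

for (model, length), rate in sorted(hallucination_rates(results).items()):
    print(f"{model} @ {length:,} words: {rate:.1%} hallucination rate")
```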
The problem does not appear to stem from simple programming errors. A paper from Tsinghua University suggests that the flawed neurons responsible for hallucinations form during initial training, which makes them difficult to remove later. LLMs operate probabilistically, prioritizing fluency over factual accuracy, and that fundamental trait poses risks in high-stakes applications such as legal and financial work. A recent New York Times investigation, for instance, found that LLMs could introduce errors on tax forms that might amount to tax evasion, with serious legal ramifications.
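The "fluency over accuracy" point can be illustrated with a toy example of how a model picks its next output purely by probability. The candidate sentences and probabilities below are invented for illustration; they are not drawn from any real model or from the studies cited above.

```python
# Toy sketch: a language model scores candidate continuations by how probable
# (fluent) they look, with no notion of truth, so a smooth-sounding but
# fabricated answer can beat an accurate hedge. All values here are invented.
candidates = {
    "The firm's 2023 tax liability was $1.2 million.": 0.46,  # fluent, fabricated figure
    "The firm's 2023 tax liability was $0.9 million.": 0.38,  # the (assumed) correct figure
    "I cannot verify the figure from the provided documents.": 0.16,  # accurate but low-probability hedge
}

# Greedy decoding: take the highest-probability continuation.
best = max(candidates, key=candidates.get)
print(f"Model output: {best!r} (p={candidates[best]:.2f})")
```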
Despite the potential for LLMs to assist with everyday tasks, the article cautions investors and executives about their limitations. Current models may suffice for low-stakes work, but relying on them for complex functions such as accounting can expose companies to significant risk. The industry is exploring alternative approaches to curb hallucinations, but those solutions may take years to mature.