A recent paper published in Ethics and Information Technology has sparked debate within the AI research community by proposing that the outputs of large language models (LLMs) such as ChatGPT should not be called “hallucinations” but rather “bullshit” in the philosophical sense explored by Harry Frankfurt. The paper, authored by Michael Townsen Hicks, James Humphries, and Joe Slater of the University of Glasgow, argues that LLMs produce text without any genuine concern for the truth, which makes the “hallucination” framing a misleading description of their behavior.
The term “AI hallucination” is commonly used to describe instances in which AI models generate false or misleading information. Hicks and his colleagues, however, argue that this metaphor misleads: it implies that the AI is somehow misperceiving reality or making an innocent error in its understanding. In their view, LLMs like ChatGPT are not designed to understand or convey truth in the first place. They are built to generate text that appears coherent and contextually appropriate, without any underlying concern for accuracy or factual correctness.
According to the researchers, the outputs of LLMs are more accurately described as “bullshit” in the Frankfurtian sense—speech or text produced without regard for its truth value. Frankfurt, a renowned philosopher, explored this concept in his 2005 book On Bullshit, where he distinguished bullshit from lying: a liar is concerned with concealing the truth, whereas a bullshitter is indifferent to the truth altogether. Hicks and his colleagues draw a parallel between this concept and the behavior of LLMs, whose outputs they likewise describe as indifferent to the truth.
The paper makes a distinction between two types of bullshit: “hard” and “soft.” Hard bullshit involves a deliberate attempt to deceive the audience about the nature of the enterprise, while soft bullshit simply reflects a lack of concern for truth. The authors argue that at a minimum, LLMs produce soft bullshit, as they generate text without any intent to mislead about their truthfulness. However, they also raise the possibility that LLMs could produce hard bullshit if they are seen as intentionally designed to give the impression of accuracy and truthfulness.
The implications of this research are significant for both policymakers and the public. The authors warn that the current use of the term “AI hallucination” could distort people’s understanding of AI technology and its limitations. They argue that recognizing LLM outputs as bullshit rather than hallucinations would lead to a more accurate and useful discussion of how these systems behave and what risks they pose.
Hicks, Humphries, and Slater also highlight the dangers of relying on LLMs for tasks where accuracy is critical, such as legal work or medical advice. They point to recent incidents in which AI-generated content included entirely fabricated information, causing serious problems for users who relied on it. In one case, a lawyer used ChatGPT to draft a legal brief, only to discover that many of the cited cases did not exist, leading to professional embarrassment and potential legal consequences.
The paper concludes by calling for a re-evaluation of how AI-generated content is discussed and understood, particularly in high-stakes situations. The authors emphasize the importance of developing a more nuanced understanding of the limitations of LLMs and the potential consequences of their widespread use.
This research is expected to influence ongoing debates about the ethical implications of AI technology and may prompt a shift in how AI outputs are framed in both academic and public discourse.
Credits: This article is based on research conducted by Michael Townsen Hicks, James Humphries, and Joe Slater from the University of Glasgow. The paper was published in Ethics and Information Technology on June 8, 2024.