To bullshit is to utter (seemingly) meaningful words or sentences without regard for the truth, perhaps with the intent to disguise what one’s up to. Is this what Large Language Models are doing?
Or are they “hallucinating”? Large Language Models sometimes generate falsehoods. Some of these falsehoods are not mere repetitions of false sentences found on the internet, but completely novel products of the LLM.
Do LLMs hallucinate or bullshit?
Key Points:
- The term “hallucination” has been used to describe certain falsehoods from LLMs, but this term has been criticised.
- LLMs generate sentences by predicting the most likely next token, but this process does not check for truth, which makes them similar to bullshitters.
- One way to deny that LLMs bullshit is to argue that bullshitting requires intention. On that view, so long as LLMs do not have intentions, they can’t be bullshitters.
- Even if bullshitting doesn’t require intention, it might require that the producer have certain properties that LLMs lack.
Large Language Models have a very simple function: to approximate human utterances. To do this, they operate a probabilistic model in which the most likely next “token” is selected given the tokens that have come before it. The probabilities in the model represent what someone is likely to say or write, rather than what’s likely to be true. Given the prompt “summarise the plot of Dr Zhivago”, it would be very unlikely for an LLM to précis Wrath of Khan, because it has been trained on datasets in which it is extremely improbable that any summary of Dr Zhivago includes details like “the Kobayashi Maru test is designed to present Starfleet cadets with an unwinnable situation”. So in most cases asking an LLM to produce utterances on a given topic is quite likely to produce something accurate.
If you were asked to complete my sentence “I would like a cup of…”, you could make a pretty good guess. Based on what you’ve heard people say before, it probably seems likely that “tea” or “coffee” is the next word. But even if you’re really good at guessing these things, you’ll sometimes get it wrong. Maybe, for whatever reason, what I want now is a cup of cement! There is a divergence between the most likely next word in a sentence and what is actually true. The LLM is always guessing what the next word will be, so it will sometimes get things wrong.
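To make this concrete, here is a minimal sketch of next-word prediction for that prompt. Everything in it is invented for illustration: the word list, the counts, and the helper functions are not how any real LLM is built (a real model computes a distribution over tens of thousands of tokens with a large neural network), but the selection step works in the same spirit: it tracks what people are likely to say, not what is true.

```python
# A toy next-word predictor for the prompt "I would like a cup of ...".
# The counts are made up: they stand in for how often each word followed
# this phrase in some imagined training text.
next_word_counts = {
    "tea": 480,
    "coffee": 430,
    "water": 70,
    "sugar": 15,
    "cement": 1,   # people almost never say this
}

def next_word_probabilities(counts):
    """Turn raw counts into a probability distribution over next words."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def complete(prompt, counts):
    """Append the most probable continuation to the prompt."""
    probs = next_word_probabilities(counts)
    best = max(probs, key=probs.get)
    return f"{prompt} {best}", probs

sentence, probs = complete("I would like a cup of", next_word_counts)
print(sentence)                                # I would like a cup of tea
print(f"P('cement') = {probs['cement']:.4f}")  # about 0.0010

# Nothing in this procedure asks what the speaker actually wants. If what
# I really want is a cup of cement, the model still completes the sentence
# with "tea", because it tracks what people typically say, not what is true.
```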
In any case, the rules and data given to an LLM may not be “correct” in the sense of producing a model of what’s likely to be true; consequently these systems often say things that are false, sometimes humorously so, e.g., that “President Obama does not have a prime number of friends because he is not a prime number”. It has become common to refer to such falsehoods as “hallucinations”.
“Hallucinations” versus “Bullshit”
A hallucination is standardly thought of as the misfiring of a usually reliable perceptual system; however, this does not seem to be what’s happening with LLMs. Think of it this way: when I really see a tree outside, there is some interaction between the world and my sensory processing system. When I hallucinate seeing a tree, there is no such interaction at all: perhaps I’ve taken acid and have a visual experience of a tree even though there is no tree there.
This does not seem to be the case for LLMs. The process does not differ between true and false outputs: whatever the program says, it has gone about producing it in precisely the same way, namely by attempting to generate something that looks like what a human would say. So there is a better description available for what LLMs do, when they get things right as well as when they get things wrong: they are engaged in bullshitting.
The first and still predominant account of bullshit is Harry Frankfurt’s: a statement is bullshit if it is uttered by someone who has a “lack of concern with the truth, or an indifference to the way things really are.” Contrast this with the liar, who knows what the truth is and deliberately attempts to hide it, or with the person who at least believes they know what the truth is and is attempting to convey it: bullshit may in fact be true, but if so it is only by accident. The crucial feature is that its truth or falsity is irrelevant to the utterer, whose aim is simply to produce something that achieves some desired goal (think of the politician mouthing platitudes about education, or the advertiser linking masculinity with a particular brand of cigarettes).
This seems to map nicely on to what LLMs do. They have a goal: to produce a human-like utterance. They are unconcerned with the truth of their utterances. In fact, there is reason to think that they cannot be so concerned, since they lack the appropriate connection between what is true and what is uttered (though of course it’s quite likely that utterances that are true will be more convincing than those that are false). In other words, LLMs appear to be engaged in exactly the sort of activity Frankfurt characterised as bullshitting.
LLMs, Bullshit and Intentions
One response to the above is to suggest that even if LLMs don’t hallucinate, they also don’t bullshit, because bullshitting requires some positive intention that LLMs lack, such as the intention to deceive an audience about the nature of the utterance or enterprise (as in Frankfurt’s view). Insofar as this requires a commitment to one or another view of intentionality (see Intention), it may not be possible to settle the question “internally”. But it’s worth noting that, given that the designers of LLMs do have intentions, and do intend for the models to produce human-like utterances, there is a case to be made that the output is bullshit even if the model itself is not engaged in bullshitting.
Another response has to do with internal representations: the human bullshitter has a representation of the world and says things with an indifference to that representation. We might think that really bullshitting requires having this kind of representation and ignoring it. On that view, just as LLMs can’t hallucinate because they lack a representation of the world (see Representation), they also can’t bullshit.
Why Does This Matter?
It is increasingly common for LLMs to be used in both academic and journalistic writing and research. Whether or not this is wise in itself, there is a particular danger attached to thinking of the outputs of such models as aiming at truth and going awry in the way suggested by “hallucination”. It may be that we simply can’t avoid using metaphorical language when referring to the activities of LLMs; if so, the most appropriate metaphor for what they’re doing is that they’re bullshitting.
