AI Chatbots and Meaning

Emma Borg

A sign (e.g. a word, sentence, picture) has meaning when it conveys information or stands for something.

Do AI chatbots really understand what they’re saying, or are they just good at putting words together?

Key Points:

  • AI chatbots (such as ChatGPT or Gemini) count as grasping or processing meaning on some accounts of what makes a sign meaningful, but not on all.

  • Knowing whether an AI system simply outputs meaningful signs or is sensitive to the meaning of those signs matters for how we interact with these systems. 

  • AI performance against behavioural benchmarks for grasp of meaning is continually improving, but the possibility remains that these systems are ‘gaming the tests’ (passing behavioural criteria but by the use of strategies that do not warrant ascription of richer properties).

Imagine a friend, Ada, who says “There is milk in the fridge”. What she utters is a meaningful sentence of English, but it is a further question whether Ada herself meant what she said. This difference is important because, if I am to rely on Ada’s utterance, it matters that she produced that sentence because of what it means (if she was just humming or idly muttering song lyrics, rather than trying to convey a meaning, I shouldn’t use her utterance to direct my search for milk).

So it is important, when thinking about AI, to distinguish between the question of whether a given sign has meaning and the question of whether the AI system understands that meaning (see Understanding natural language). AI chatbots output meaningful sentences but do they produce those outputs on the basis of what the sentences mean? Answers differ, but there are reasons to be sceptical:

1. Syntax vs semantics

Syntactic properties are the formal, structural features of a language (the order words come in, subject/object relations, etc.). Syntax and semantics can come apart: a system may register the formal properties of signs without registering their meaning. This point lies at the heart of the argument that AI chatbots are “stochastic parrots”, i.e. that they are sensitive only to the formal (statistical/distributional) properties of words, not to their meaning.
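To see what sensitivity to distributional properties alone amounts to, here is a deliberately minimal sketch (in Python, using an invented toy corpus): a bigram model that predicts the next word purely from co-occurrence counts, with nothing anywhere in the procedure standing for what any word is about.

```python
from collections import Counter, defaultdict

# Invented toy corpus: the only "knowledge" available is which words follow which.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Record bigram frequencies -- a purely formal, distributional summary of the text.
bigrams = defaultdict(Counter)
for left, right in zip(corpus, corpus[1:]):
    bigrams[left][right] += 1

def predict_next(word):
    """Return the statistically most frequent continuation of `word`.

    The choice is driven entirely by co-occurrence counts; nothing here
    stands for cats, mats, or anything else the words are about.
    """
    followers = bigrams.get(word)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("sat"))  # 'on' -- chosen by frequency, not by meaning
```

Modern chatbots are incomparably more sophisticated than this, but the stochastic parrot worry is that their sophistication remains sophistication of the same broadly distributional kind.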

2. The Turing Test vs The Chinese Room

The Turing Test (or ‘Imitation Game’), devised by mathematician and code-breaker Alan Turing, imagines a human subject conversing, via text, with two other systems – one human and one artificial. If the subject cannot tell which is which then the artificial system is judged to pass the imitation game and hence to count as a thinking thing. The Turing Test involves a behavioural criterion: if a system behaves enough like a human, it should be credited with the same capacities as those which underpin the behaviour in the human case. So, if the linguistic performance of a chatbot is as good as a human’s, both should be judged meaningful, semantic systems.

This contrasts with the Chinese Room thought experiment: philosopher John Searle asked us to imagine a person in a sealed room receiving input in a language they do not understand (the language in the original example was Chinese). The subject looks up incoming symbols in a table which matches them to other symbols, which the subject then outputs. Searle contends that if the look-up table were comprehensive enough, the system as a whole could receive questions in Chinese and output appropriate answers, yet the person in the room would not understand anything about what the symbols meant. Searle took this to parallel the behaviour of digital computers, showing them to be syntactic and not semantic engines.
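The procedure Searle describes can be made vivid with a deliberately tiny sketch (the table entries below are invented for illustration): the program pairs incoming symbol strings with outgoing ones, and at no point does anything in it involve knowing what the symbols mean.

```python
# A toy "Chinese Room": incoming symbol strings are matched to outgoing ones
# via a look-up table. The entries are invented; the point is only that the
# procedure is pure symbol manipulation, with no grasp of what is being said.
rule_book = {
    "你好吗？": "我很好，谢谢。",            # "How are you?" -> "I'm fine, thanks."
    "今天天气怎么样？": "今天天气很好。",      # "How is the weather today?" -> "It is very nice."
}

def room(incoming: str) -> str:
    # Pure matching: look the input up and hand back whatever the table pairs it with.
    return rule_book.get(incoming, "对不起，我不明白。")  # fallback: "Sorry, I don't understand."

print(room("你好吗？"))
```

A comprehensive enough table would let the room converse fluently, which is precisely why Searle thinks fluent output alone cannot settle whether meaning is grasped.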

The jury remains out on which of these approaches is correct for AI chatbots: behavioural tests are good as they give us a concrete, objective way to assess grasp of meaning, but Searle’s suspicion that behavioural tests can always be ‘gamed’ (by crunching facts about the statistical distribution of words across a big enough data set) is hard to escape.

3. Different approaches to meaning

Externalist (or referential) theories take meaning to reside in the connection between words and the world (‘dog’ refers to dogs). Internalist theories take meaning to reside in intra-linguistic relations, such as inferential or conceptual network properties (“dog” means what it does because users infer from “Fido is a dog” that “Fido is an animal, probably barks…”). Unless they are embedded in multimodal systems (perhaps ones capable of interacting with the world themselves, such as robots), AI systems lack direct connection to the world, so many people argue that, while AI chatbots might capture internalist (conceptual role) features of meaning, they do not capture externalist aspects of semantic content.
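A rough sketch of why purely intra-linguistic structure might still count as capturing internalist meaning: word embeddings locate words by their relations to other words. The vectors below are hand-made toy values (real systems learn them from text), and nothing in them connects ‘dog’ to any actual dog.

```python
import math

# Hand-made toy "embeddings": each word is just a vector whose significance
# lies entirely in its relations to the other vectors (values are invented).
vectors = {
    "dog":         [0.90, 0.80, 0.10],
    "cat":         [0.85, 0.75, 0.15],
    "animal":      [0.80, 0.90, 0.20],
    "carburettor": [0.10, 0.05, 0.95],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

# 'dog' comes out closer to 'cat' and 'animal' than to 'carburettor': an
# intra-linguistic (internalist) pattern, with no reference to real dogs.
print(round(cosine(vectors["dog"], vectors["animal"]), 3))
print(round(cosine(vectors["dog"], vectors["carburettor"]), 3))
```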

4. Derived meaning

The meaning of linguistic and other external signs (like pictures) is derived – it is only because speakers have a practice of using ‘dog’ to refer to dogs that the word means what it does. Human thoughts, however, have been argued to have original meaning (what philosophers term ‘original intentionality’) – humans can introduce meaning de novo. When thinking about AI meaning, ask whether the outputs merely derive their meaning from us, or whether the system itself grasps and processes that meaning, or even coins new meanings.

5. AI Hallucinations

AI chatbots sometimes produce sentences that are false. They may also go on to produce further false claims in support (e.g. a system which outputs the false claim that “Virginia Woolf won the Nobel Prize for Literature” might go on to appeal to a fictional biography of Woolf where this claim is supposedly made). This has become known as AI hallucination, and it points to a fundamental difference between chatbots and ordinary speakers: we assume that other people are (in general) concerned with the truth or falsity of what they say, but chatbots are primarily concerned with statistical properties (looking for the most likely combination of words in a given context). Roughly, they are driven by a search for linguistic plausibility, not truth. Companies are trying to find ways to mitigate this issue, but it is unlikely to be resolved in the near future, so users should exercise caution when relying on AI-generated content.
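A deliberately crude sketch (with made-up candidate continuations and probabilities) of why plausibility-driven selection can favour a fluent falsehood: the ‘model’ below simply returns the highest-probability continuation, and nothing checks the result against the world.

```python
# Toy, invented probabilities for continuations of a prompt. Selection is by
# likelihood alone; truth never enters into the procedure, so a plausible-
# sounding falsehood can win outright.
continuations = {
    "Virginia Woolf won the": [
        ("Nobel Prize for Literature.", 0.6),             # fluent but false
        ("admiration of many modernist writers.", 0.4),   # less "expected"
    ],
}

def generate(prompt: str) -> str:
    options = continuations[prompt]
    best, _probability = max(options, key=lambda pair: pair[1])
    return f"{prompt} {best}"

print(generate("Virginia Woolf won the"))
```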

6. Semantics vs pragmatics

‘Semantic content’ is the standing, conventional meaning of simple and complex expressions, which can differ from ‘pragmatic content’, i.e. the meanings that get produced on the fly, where the contribution of particular expressions can vary wildly across contexts. Compare the literal (semantic) meaning of “There is milk in the fridge” (where the sentence is true even if there is only a puddle of milk at the bottom of the fridge) to its contextual (pragmatic) meaning (where the utterance might convey “there is milk for tea in the fridge”). For AI outputs, ask whether they are restricted to literal meaning or not (this matters when considering how the public should interpret them).

Recent advances in AI have led to systems with incredible linguistic capacities: ask an AI chatbot a question and you are extremely likely to receive a well-formed, contextually appropriate, and articulate answer. But at the time of writing it is simply unclear what skills really lie behind this impressive level of performance, and whether AI systems should be understood as genuinely grasping and manipulating meaning or simply as statistical predictors par excellence.
