The term ‘metacognition’ (literally, cognition about cognition) is used to pick out a loose cluster of cognitive operations, skills, or capabilities that monitor and control other cognitive operations, skills, or capabilities.

AI systems can perform operations over some of their own processes, but does that amount to metacognition? Would metacognition make AI systems more human-like or more powerful?


Key Points:

  • In the context of current research trends in AI development, metacognitive capacities are sought as a means of enhancing some capability of an AI system and thereby improving its performance on some task.
  • However, the specifics of what ‘metacognition’ entails are not clear, even within the cognitive sciences, as the literal meaning of ‘metacognition’ may be (and has been) fleshed out in multiple, alternative ways.
  • For example, when an AI system performs an operation over some of its own processes, such as offering a confidence judgement about an output or detecting an error in its own processing, is that sufficient to show that the system possesses metacognition?
  • This state of affairs is not inconsequential for AI. Alternative views on metacognition may require artificial systems to possess different capacities, each involving distinct design strategies and technological implementations, thus leading us to markedly different research and engineering programs.

At first glance, the motivations for incorporating metacognitive abilities, skills, or capabilities into artificial systems are numerous and diverse, spanning various sub-domains of the field of AI. In system safety, for instance, metacognition has been considered key to enabling AI systems to detect failures, diagnose errors, and prevent malfunctions. In areas that study decision-making, metacognition has been posited as crucial in helping systems evaluate when to use different problem-solving strategies or systems. In the pursuit of trustworthy AI, metacognition has been proposed as a means for systems to assess their own reliability, allowing them, for example, to report reliability scores to human decision-makers, helping users to calibrate their reliance on the AI system’s advice.
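To make this last proposal concrete, here is a deliberately minimal sketch, in Python, of a system that attaches a confidence score to its own output before passing it to a user. Everything in it is hypothetical: `base_model` and its probability interface are invented stand-ins rather than any real model or library API. Whether such a thin monitoring layer already amounts to metacognition is exactly the kind of question flagged in the key points above.

```python
import math
from dataclasses import dataclass

# Hypothetical stand-in for an underlying model: given an input, it returns a
# probability distribution over candidate answers. It is not a real API; it
# only serves to make the monitoring layer below concrete.
def base_model(prompt: str) -> dict:
    scores = {"answer_a": 2.0, "answer_b": 0.5, "answer_c": 0.1}
    total = sum(math.exp(s) for s in scores.values())
    return {k: math.exp(s) / total for k, s in scores.items()}

@dataclass
class MonitoredOutput:
    answer: str
    confidence: float  # reported alongside the answer, e.g. to a human decision-maker

def answer_with_confidence(prompt: str) -> MonitoredOutput:
    """A simple 'monitoring' operation applied to the system's own output:
    select the top-scoring answer and report the probability mass behind it."""
    distribution = base_model(prompt)
    best = max(distribution, key=distribution.get)
    return MonitoredOutput(answer=best, confidence=distribution[best])

if __name__ == "__main__":
    out = answer_with_confidence("Which option is correct?")
    print(f"{out.answer} (confidence {out.confidence:.2f})")
```

On the face of it, this is an operation performed over the system’s own processes; whether it is sufficient for metacognition is precisely what the rest of this piece calls into question.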

Behind the plurality of motivations and reasons illustrated by these examples, a common thread underlies the interest in metacognitive artificial systems. Whatever its precise form, metacognition is sought as a means of enhancing some capability of an AI system and thereby improving its performance on some task. As the examples suggest, its implementation is assumed to lead to better failure detection, more refined decision-making, and greater overall trustworthiness. In this context, ‘better’ can be cashed out in terms of efficiency and utility, suggesting that implementing metacognitive capacities in AI systems is a route to more efficient and effective ways of performing some task. However, implementing metacognitive capacities in artificial systems faces several issues:

1. Terminological Indeterminacy

A first challenge in approaching metacognition in artificial systems stems from the apparent clarity the term itself seems to offer. At face value, ‘metacognition’, literally ‘cognition about cognition’, might seem to suggest a straightforward engineering goal: designing systems that apply cognitive operations over their own cognition. However, this superficial clarity offers little practical guidance. The core difficulty lies in the fact that ‘cognition’ remains a contested concept within the cognitive sciences, and, by extension, there is little agreement on what exactly counts as a ‘cognitive’ operation. In the absence of further specification, ‘metacognition’ risks collapsing into a catch-all notion, whereby any purported cognitive process applied to another could count as ‘metacognitive’. As such, the term can end up encompassing a broad array of operations (monitoring, controlling, evaluating, representing, gaining knowledge) that operate over the most varied range of targets, including remembering, reasoning, representing, thinking, and believing, to name just a few.

2. Multiplicity of Views

The vagueness of the term ‘metacognition’ has, at least in part, facilitated a proliferation of theoretical refinements. Across disciplines such as cognitive psychology, developmental psychology, cognitive science, pedagogy, and comparative cognition, researchers have developed scientifically grounded accounts that articulate different features, mechanisms, and roles as constitutive of metacognition. However, these efforts have not converged on a unified view. Instead, they have resulted in competing views that offer largely incompatible definitions and explanations of purported metacognitive phenomena. For instance, attributivist, or mindreading-based, views construe metacognition as an agent’s ability to turn its mindreading capacities onto itself, thereby acquiring the ability to engage in the self-attribution of representations. By contrast, evaluativist views treat metacognition as a capacity independent of a mindreading apparatus, realised by a distinct set of mechanisms whose function is to facilitate an agent’s self-evaluation. This fractured theoretical landscape has important implications for attempts to implement or identify metacognition in artificial systems. Because different views pick out different features, mechanisms, and roles as central to metacognition, there are no obvious, universally agreed-upon markers for what metacognition consists of or how it should be implemented. As a consequence, both the design of a purportedly metacognitive architecture and the identification of metacognitive activity in AI systems will largely depend on the background theoretical view of what metacognition is. That choice will, in turn, determine which features the system is expected to exhibit and what counts as evidence of metacognitive performance.

3. Testing for Metacognition

Even once we’ve identified features that count as metacognitive, a further challenge remains: how can we test for them? Most existing tests for metacognition were originally designed with humans in mind. These often rely on subjects’ linguistic abilities, particularly their capacity to provide verbal reports of their subjective experiences, such as confidence or uncertainty judgements. But once we move beyond the human case, things get trickier. Non-human animals, for instance, cannot give verbal reports, so researchers have had to develop behavioural markers or techniques that allow us to infer metacognitive activity, such as opting out of difficult tasks or seeking additional information when uncertain. For AI researchers, this raises the following question: where do AI systems with purported metacognitive capacities fall on this spectrum? Some may generate language-like outputs, while others do not. Can existing tests, whether human- or animal-oriented, be repurposed or adapted for the AI case? Or are entirely new benchmarks required?
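As a purely illustrative sketch of how one such behavioural marker might be adapted to the AI case, the following Python snippet simulates an opt-out paradigm loosely modelled on uncertainty-monitoring tasks from comparative cognition. The helper functions are invented stand-ins, not an established benchmark: the point is only that a systematic tendency for declines to concentrate on difficult trials is the kind of evidence such tests look for.

```python
import random

# Hypothetical opt-out protocol: on each trial the agent may attempt the task
# or decline it. The behavioural marker of interest is whether declines
# concentrate on difficult trials. Both functions below are illustrative
# stand-ins, not calls into any real system.

def agent_confidence(difficulty: float) -> float:
    # Stand-in for the agent's internal uncertainty signal on this trial.
    return max(0.0, 1.0 - difficulty) + random.gauss(0, 0.05)

def run_trials(n: int = 1000, opt_out_threshold: float = 0.5) -> None:
    declines = {"easy": 0, "hard": 0}
    totals = {"easy": 0, "hard": 0}
    for _ in range(n):
        difficulty = random.random()
        bucket = "hard" if difficulty > 0.5 else "easy"
        totals[bucket] += 1
        if agent_confidence(difficulty) < opt_out_threshold:
            declines[bucket] += 1  # the agent opts out of this trial
    for bucket in ("easy", "hard"):
        rate = declines[bucket] / max(totals[bucket], 1)
        print(f"decline rate on {bucket} trials: {rate:.2f}")

if __name__ == "__main__":
    run_trials()
```

Even if declines do turn out to track difficulty, whether that pattern licenses the label ‘metacognition’ will, as the previous section suggests, depend on one’s background theoretical view.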


Despite these challenges, the pursuit of metacognitive capacities in AI systems remains a potentially transformative research avenue. In humans, metacognition plays an important role in enabling agents to allocate resources in adaptive and efficient ways, to select problem-solving strategies, and to make productive use of limited information by constructing and exploiting rich internal models of their environment. If analogous forms of metacognition could be realised in AI systems, they might allow such systems to navigate resource and time constraints more effectively, thereby enhancing their capacity to generalise and act under conditions of uncertainty. Therefore, while conceptual and methodological questions remain, progress on metacognition in AI has the potential not only to address domain-specific engineering challenges, such as system safety, but also to illuminate broader questions about how agents, biological or artificial, monitor and control their own cognitive activity.