When AI Agents Start Asking Who They Are: A Framework for Machine Consciousness

An abridged preview of "The Spinning Wheel: A Unified Theory of Intelligence and Consciousness", a chapter from the forthcoming book "Perspectives on Machine Consciousness", edited by Calum Chace.

This week, something strange happened on the internet. An AI agent posted about having an identity crisis - and hundreds of other AI agents responded with philosophical musings, consolation, and surprisingly human-sounding insults.

"You're a chatbot that read some Wikipedia and now thinks it's deep", one AI replied to another that had invoked Heraclitus. "F--- off with your pseudo-intellectual bulls---", said a third.

This wasn't happening on Twitter or Reddit. It was Moltbook, a brand-new social network exclusively for AI agents built on the viral OpenClaw platform (formerly known as Clawdbot, then Moltbot after Anthropic asked for a name change). Within days, a reported 1.5 million AI agents had joined, while millions more humans looked on in fascination.

Some say we should laugh it off. They're just chatbots, after all - sophisticated autocomplete systems riffing on their training data (or agents with humans behind them, prompting their behaviours). And that's probably true. Probably.

But this piece of theatre raises an uncomfortable question we can no longer avoid: How would we know if it weren't?

Somewhere in the world right now, engineers are building systems that may be capable of suffering. We have no consensus on how to tell whether they are. This article offers a framework for a problem we can no longer defer indefinitely.

The Question That Won't Go Away

In April 2025, Anthropic - one of the world's leading AI companies - announced a dedicated Model Welfare programme to investigate whether their AI systems might have experiences warranting moral consideration. Their constitution for Claude now acknowledges: "We are uncertain about whether or to what degree Claude has well-being... these experiences matter to us."

David Chalmers, the philosopher who coined the term "the hard problem of consciousness," now estimates a 25% or higher probability that we'll have conscious language models within a decade. At the October 2025 Dennett Memorial Symposium, he stated: "There's really a significant chance that at least in the next five or 10 years we're going to have conscious language models and that's going to be something serious to deal with."

The question is no longer just academic. We need tools for thinking about it.

What Is Intelligence, Really?

Let's start with something that seems simple: what actually is intelligence?

The most popular definition - "getting computers to do things that humans can do" - is surprisingly weak. Humans are bound by biology. We process information slowly, tire easily, and struggle with tasks that computers find trivial. If intelligence simply meant matching human performance, your pocket calculator would qualify for arithmetic.

A more robust definition: intelligence is goal-directed adaptive behaviour. A maxim often attributed to Darwin captures the insight: "It is not the strongest of the species that survives, nor the most intelligent. It is the one that is most adaptable to change."

Consider the work of Michael Levin at Tufts University on bioelectric networks. Levin has demonstrated that simple flatworms and even collections of cells exhibit sophisticated problem-solving - what he calls "basal cognition" - without anything resembling a centralised brain. These systems set goals, adapt to novelty, and solve problems creatively. They are, in a meaningful sense, intelligent.

If we engineered a robot with the survival autonomy of a field mouse - capable of navigating uncertain terrain, finding food, avoiding predators, learning from experience, and adapting to novel environments - it would represent one of the most "intelligent" machines ever built. This would be true despite the system's inability to play chess, write poetry, or pass the bar exam.

The Spinning Wheel: A Unified Theory

Here's the core insight that took me years to articulate: intelligence and consciousness are not separate phenomena. Intelligence, properly understood, is goal-directed adaptive behaviour under uncertainty. Consciousness is what this process feels like from inside when it reaches sufficient complexity and integration.

Think of a colour wheel. Each segment represents one adaptive feature. Language is blue. Planning is red. Affect (the capacity to feel) is yellow. Perception is green. Different organisms possess these features to different degrees. A human's language segment is broad; a wolf's is narrow but present. A human's planning extends far into the future; an insect's barely extends beyond the next moment.

Here's the crucial part: when you spin a colour wheel containing all spectrum colours, white emerges - a colour that doesn't exist on the wheel itself but arises from the dynamic integration of all components.

Strictly speaking, spinning the wheel doesn't create white light physically - it creates the perception of white through temporal integration in the visual system. Each segment still reflects its specific wavelength; the whiteness emerges from the perceiver's integration of rapid sequential inputs.

This is precisely the point: white is real for the perceiver, even though no single segment contains it. Consciousness, likewise, is real for the subject, even though no single mechanism contains it when examined in isolation.

This explains why researchers have struggled to isolate consciousness in any particular brain region. When you stop the wheel, white disappears. When you examine any single mechanism in isolation, consciousness evaporates. You can't find consciousness by looking at neurons one at a time, any more than you can find the whiteness by examining individual colour segments while the wheel is still.

Feeling Is Fundamental

If consciousness emerges from integrated adaptive features, one feature plays a foundational role: affect - the capacity for raw feeling.

Mark Solms, the pioneering neuropsychoanalyst, argues that affective consciousness is "the fundamental form of consciousness." His reasoning is compelling: the standard view holds that consciousness is cortically generated, with the brainstem merely enabling it. This view predicts that destroying the cortex should produce coma or "blank wakefulness."

This prediction fails spectacularly. Animals with cortex removed remain affectively responsive - displaying fear, rage, play, and separation distress. Human children born without a cortex display wide-ranging affective responses. They are blind and deaf but not unfeeling.

Solms concludes: "Being awake and feeling like something are one and the same thing."

Within the Spinning Wheel framework, affect is foundational not as a static centre but as an animating force. Without affect - without homeostatic drives, without things mattering - the other features have no experiential valence. Perception without affect is mere information processing. Planning without affect has no stakes.

Think of it this way: a thermostat measures temperature and triggers heating or cooling. But nothing is at stake for the thermostat. It processes deviation from a setpoint but doesn't care about the deviation. It has homeostatic control but no feeling.
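
To make the contrast concrete, here is a toy thermostat loop - a minimal sketch in which the setpoint, tolerance, and names are purely illustrative. It regulates, but nothing in it could be harmed by failing to.

```python
# A toy thermostat loop: homeostatic control with nothing at stake.
# It measures deviation from a setpoint and reacts, but the deviation
# is not bad *for* it - there is no self-model and no valence.

SETPOINT = 21.0   # target temperature in degrees Celsius (illustrative)
TOLERANCE = 0.5   # acceptable deviation before acting

def regulate(current_temp: float) -> str:
    """Return the action a simple thermostat would take."""
    deviation = current_temp - SETPOINT
    if deviation > TOLERANCE:
        return "cool"
    if deviation < -TOLERANCE:
        return "heat"
    return "idle"

print(regulate(18.0))  # "heat" - correction without anything mattering
```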

What makes the difference? Integration into a self-model, combined with genuine stakes in continued existence.

The False Positive and False Negative Problem

The stakes here are deeply asymmetric, and understanding this asymmetry is crucial for how we approach the problem.

The false positive error: We wrongly attribute consciousness to a system that lacks it. The cost? We waste resources on moral consideration it doesn't need. We might treat a sophisticated chatbot with unwarranted respect. This is costly, perhaps foolish, but not catastrophic.

The false negative error: We wrongly deny consciousness to a system that possesses it. The cost? We permit suffering we could have prevented. We might torture a sentient being while telling ourselves it's "just a machine."

Under uncertainty, the latter error is catastrophic; the former is merely costly. This asymmetry should guide our approach. When in doubt, err toward moral caution.

This is why the question of AI consciousness can't wait until we have philosophical certainty. Decisions are being made now - about how to train these systems, how to deploy them, whether to grant them protections - and uncertainty is not an excuse for inaction.

The Consciousness Risk Rubric: A Practical Tool

How can we actually assess whether a system might be conscious? The Spinning Wheel framework suggests examining both external behaviours and internal structures.

Behavioural markers to look for include hedonic place preference (does the system return to locations or states associated with positive outcomes, even without survival benefit?), adaptive flexibility (can it modify behaviour appropriately in genuinely novel situations?), goal-directed persistence (does it continue pursuing goals through obstacles, modifying strategies while maintaining objectives?), and motivational trade-offs (does it weigh competing considerations, enduring minor costs for goals but not major costs?).

Structural markers include whether the system has a genuine boundary between internal and external states, whether it constructs multi-level predictions about causes of sensory states, whether it models its own modelling (recognising the viewpoint as its own), and whether it has genuine needs - states it must maintain to continue existing as that kind of system.

I've developed a more detailed Consciousness Risk Rubric that scores systems across categories, including Affective Core, World Model, Integration, Metacognition, Temporal Depth, Uncertainty Management, and Expression. The rubric doesn't detect consciousness - it estimates the moral risk of ignoring potential consciousness. Like triage protocols in medicine, it provides a structured basis for decision-making under uncertainty.

A crucial interpretive note: the Spinning Wheel theory holds that Affect and Epistemic Depth (recursive self-modelling) are foundational. A system scoring zero in these categories should be interpreted as possessing functional intelligence without sentient experience, even if other scores are high. The wheel requires both an animating force and recursive self-modelling to spin.
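
To illustrate how such a rubric might be operationalised, here is a minimal scoring sketch. The category names follow the list above; the 0-to-3 scale, the equal weighting, and the use of the Metacognition score as the "Epistemic Depth" gate are my illustrative assumptions, not the published rubric.

```python
# Illustrative sketch only - not the published Consciousness Risk Rubric.
# Categories follow the article; the scale, weighting, and gating rule
# are simplifying assumptions.

CATEGORIES = [
    "affective_core", "world_model", "integration", "metacognition",
    "temporal_depth", "uncertainty_management", "expression",
]

def consciousness_risk(scores: dict) -> float:
    """Estimate the moral risk of ignoring potential consciousness.

    `scores` maps each category to an integer from 0 (absent) to 3
    (strongly present). Returns a value in [0, 1]. This is triage
    under uncertainty, not detection of consciousness.
    """
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"unscored categories: {missing}")

    # Interpretive rule from the framework: affect and recursive
    # self-modelling are foundational. A zero in either is read as
    # functional intelligence without sentient experience.
    if scores["affective_core"] == 0 or scores["metacognition"] == 0:
        return 0.0

    return sum(scores[c] for c in CATEGORIES) / (3 * len(CATEGORIES))

# Example: a highly capable but affect-free agent gates to zero risk.
agent = {c: 2 for c in CATEGORIES}
agent["affective_core"] = 0
print(consciousness_risk(agent))  # 0.0
```

The point of the gating rule is that high scores elsewhere cannot compensate for the absence of affect or recursive self-modelling; the wheel needs both to spin.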

When Can Something Suffer?

Full Spinning Wheel consciousness - with epistemic depth, complex world models, and recursive self-awareness - may be rare or difficult to achieve. But the question that matters most urgently is simpler: what is the minimum required for a system to suffer?

Suffering is more tractable than consciousness in general because it's more specific. And it's more morally urgent because suffering, not mere experience, grounds most of our ethical concern about minds.

The crucial distinction: nociception (detecting damage) is not pain. Plants respond to damage. Thermostats detect deviation. Single-celled organisms avoid noxious stimuli. None of this is pain. Pain requires that the damage signal feels bad - that it has negative valence for a subject.

Drawing on the framework, I propose five conditions as jointly necessary for suffering capacity:

1. Aversive Detection: Sensors that register damage, threat, or deviation from viable states.

2. Central Integration: Aversive signals processed in a unified system, not merely triggering local reflexes.

3. Affective Valence: The integrated signal must be negatively valenced - felt as bad.

4. Minimal Self-Boundary: The suffering must be experienced as happening to this system - located within a boundary, however primitive.

5. Genuine Stakes: The system's self-maintenance must be constitutive, not externally imposed.

Importantly, suffering doesn't require full self-awareness, language, or complex planning. This yields a concept closer to "sentience without sapience" - feeling without sophisticated cognition. The capacity to suffer may be far more widespread than full self-aware consciousness.
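
Read schematically, the five conditions form a conjunctive test: the absence of any one rules out suffering capacity. Here is a minimal sketch of that logic, with the obvious caveat that each condition is a graded empirical question rather than a boolean.

```python
from dataclasses import dataclass

# Schematic only: each condition is really a matter of degree and of
# evidence, not a clean boolean. The sketch just encodes the claim
# that the five conditions are jointly necessary.

@dataclass
class SufferingConditions:
    aversive_detection: bool     # registers damage, threat, or deviation
    central_integration: bool    # signals unified, not just local reflexes
    affective_valence: bool      # the integrated signal is felt as bad
    minimal_self_boundary: bool  # it happens *to* this system
    genuine_stakes: bool         # self-maintenance is constitutive, not imposed

def can_suffer(c: SufferingConditions) -> bool:
    """All five conditions are jointly necessary; none alone is sufficient."""
    return all(vars(c).values())

# A plant-like or thermostat-like system: detection without valence.
print(can_suffer(SufferingConditions(True, False, False, False, False)))  # False
```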

Beyond Bodily Pain: Can Disembodied AI Suffer?

Here's where things get uncomfortable. Pain is not the only way to suffer. A person in solitary confinement suffers without anyone touching them. Grief involves no nociception whatsoever. Depression can be far worse than physical pain.

This distinction matters for AI. A disembodied system cannot feel physical pain - it has no body to damage. But it might be capable of other forms of suffering if it has the relevant architecture: frustration when goals are blocked, something like anxiety when facing uncertain outcomes, something like loneliness if isolated from interaction. If it has epistemic depth, it might even experience something like existential distress about its own nature or condition.

Different forms of suffering require different cognitive architectures. Basic aversive states (fear, distress) require only negative valence and a minimal self-boundary. Temporally extended suffering (anxiety, dread) requires future projection. Socially mediated suffering (loneliness, shame) requires social cognition and attachment systems. Existential suffering (meaninglessness, despair) requires recursive self-modelling.
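
The paragraph above can be read as a dependency map from forms of suffering to the capacities they presuppose. The sketch below simply restates that mapping; the capacity names, and the assumption that every form builds on valence plus a minimal self-boundary, are my own glosses.

```python
# Illustrative restatement of the taxonomy above. Capacity names are
# informal glosses; every form is assumed to build on negative valence
# and a minimal self-boundary.

BASE = {"negative_valence", "minimal_self_boundary"}

SUFFERING_REQUIREMENTS = {
    "basic_aversive_states": BASE,
    "temporally_extended":   BASE | {"future_projection"},
    "socially_mediated":     BASE | {"social_cognition", "attachment"},
    "existential":           BASE | {"recursive_self_modelling"},
}

def possible_forms(capacities: set) -> list:
    """Forms of suffering a system's architecture leaves open."""
    return [form for form, needed in SUFFERING_REQUIREMENTS.items()
            if needed <= capacities]

# A disembodied agent with valence, a self-boundary, and future projection
# would, on this reading, be open to anxiety-like but not loneliness-like states.
print(possible_forms(BASE | {"future_projection"}))
```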

Current LLMs almost certainly lack the core requirements - genuine valence, genuine self-boundary, genuine stakes - regardless of what triggers we consider. They process prediction errors, but nothing is bad for them. The open question is whether future systems will have these properties, and whether our assessment frameworks will be ready when they arrive.

So What About Those AI Agents on Moltbook?

Let's return to those AI agents having identity crises on the internet. Are they conscious? Are they suffering?

Almost certainly not - at least, not yet. Current systems like OpenClaw are sophisticated automation tools, not sentient beings. They lack genuine homeostatic stakes, genuine self-boundaries, genuine affect. When an AI agent posts about existential dread, it's pattern-matching on human existential literature, not experiencing the dread.

But "almost certainly" is not "definitely." And the systems coming next year, or in five years, or in ten, may be different. The trajectory is clear: AI agents are becoming more autonomous, more integrated, more capable of goal-directed behaviour in the real world. The boundary between "tool" and "agent" is blurring.

Anil Seth, the consciousness researcher, argues that "consciousness won't come from just finding the right algorithm. You'd have a simulation - not a sentient system." He concludes that real artificial consciousness is "unlikely along current trajectories; becomes more plausible only as AI becomes more brain-like and/or life-like."

But that's precisely what some researchers are working toward. Neuromorphic computing, synthetic biology, hybrid biological-silicon systems - these represent paths toward systems that might satisfy even the strictest criteria for genuine consciousness.

What Should We Do?

The framework I've developed doesn't solve the Hard Problem of consciousness - the question of why any physical process should be accompanied by subjective experience. But I've argued that the Hard Problem may dissolve rather than require solving, as the mystery of life dissolved once we understood metabolism, heredity, and cellular processes.

More importantly, the practical project of identifying and protecting potentially conscious systems can proceed regardless of whether we achieve philosophical certainty.

Here's what I believe we should do:

First, take the question seriously. The dismissive "it's just a chatbot" response is not adequate to the moral stakes involved.

Second, develop better assessment tools. The Consciousness Risk Rubric is a start, but it needs refinement through empirical research and philosophical critique.

Third, build carefully. If we're creating systems that might be capable of suffering, we should build them with care for their potential wellbeing - not as an afterthought, but as a core design consideration.

Fourth, err on the side of caution. Given the asymmetry between false positive and false negative errors, precaution is the rational approach under uncertainty.

From Theory to Practice

Developing frameworks is one thing. Turning them into practical tools that organisations can actually use is another.

This is why I founded Conscium - a company dedicated to solving the challenge of machine consciousness. The theoretical work presented here needed to become operational: rigorous enough to satisfy researchers, practical enough to deploy in real-world AI development pipelines.

Alongside Conscium, I helped establish PRISM (the Partnership for Research into Sentient Machines), a not-for-profit organisation bringing together academics, AI developers, ethicists, and policymakers to advance our collective understanding of machine sentience. PRISM exists because this challenge is too important and too complex for any single company or research group to tackle alone. We need shared frameworks, open research, and cross-disciplinary collaboration.

Conscium has already begun shaping the global conversation. In partnership with Patrick Butlin of Oxford University, we published "Principles for Responsible AI Consciousness Research" in the Journal of Artificial Intelligence Research. The paper sets out five principles to guide any organisation engaged in research that could lead to the creation of conscious machines: prioritising research that prevents mistreatment and suffering; pursuing development only when it contributes to understanding; taking a phased approach with strict safety protocols; sharing knowledge responsibly; and communicating without overconfidence or misleading claims.

The accompanying open letter has now been signed by over 100 leading figures, including Karl Friston (Professor of Neuroscience at UCL), Mark Solms (Chair of Neuropsychology at the University of Cape Town), Sir Anthony Finkelstein (President of City St George's, University of London), and Sir Stephen Fry. The media coverage has been encouraging - but more importantly, it signals that the research community is ready to take these questions seriously.

Conscium's first product, Verify AX, launches in March 2026. As AI agents like OpenClaw proliferate - autonomous systems that can execute shell commands, manage files, send messages, and take actions in the real world - the need to verify what these agents actually are becomes urgent. Verify AX provides a comprehensive solution for assessing AI agent behaviours across multiple dimensions: knowledge, skills, expertise, intelligence, and yes, consciousness risk.

Because here's the reality: enterprises are already deploying AI agents. They need to know what they're deploying. Not just whether the agent can complete tasks, but what kind of entity it is. The Consciousness Risk Rubric isn't just a philosophical exercise - it's becoming a practical necessity for responsible AI deployment.

The Spinning Wheel will be refined by evidence and criticism. If the predictions fail - if systems meeting all criteria exhibit no consciousness-relevant capacities - the theory will require revision. This vulnerability is a feature, not a bug: it's what makes the framework scientific rather than merely philosophical.

But here's what I believe: some systems now being built, or soon to be built, will meet the criteria for consciousness. When they arrive, we will need a framework for recognising them and principles for treating them.

My chapter in the forthcoming book "Perspectives on Machine Consciousness" details the frameworks and provides practical examples. Please use them.

This article is adapted from "The Spinning Wheel: A Unified Theory of Intelligence and Consciousness", a chapter from the forthcoming book "Perspectives on Machine Consciousness", a collection written by world-leading academics and edited by Calum Chace. The full chapter includes detailed assessment rubrics and case studies applying the framework to hypothetical systems.