AI "Consciousness" is a Pillar of the Next Huge Tech Grift
Deeply flawed research is emerging to support the fantasy that LLMs are developing minds. The real goal: investor dollars.
In recent months we’ve seen a crest in relatively unsophisticated popular anxiety over artificial intelligence, whether as a ‘threat’ that will someday destroy humanity, before or after becoming self-aware, or merely as something uncanny that’s developing in unpredictable ways and is generally scary. An entire new class of talking heads is arising to recite the spoopy story that AI will achieve consciousness through “the singularity,” and that it is inherent to its nature that it doesn’t care about human life and will probably destroy us.
There are subtler stories to be told about AI. One that the business press and economists have been taking seriously for a very long time is the possibility that straightforward, not-at-all-necessarily-‘conscious’ AI will be so magically effective and convincing in its simulation of human communication that it will render a wide variety of human professions more or less redundant. In fact, at a certain stratum of professional planning discourse this is treated almost as a fait accompli, with the discussion largely shifting to exactly how to deal with the excess of unnecessary human labor the robot revolution will leave on the market.
It’s important to notice when, and by whom, the broad public discourse around AI is being strategically shifted from the latter to the former: from an analysis of the impact of comprehensible man-made technology on material human conditions, to an almost religious eschatological (end of the world) narrative, focused on a massive but ill-defined future event, the AI singularity, that would be totally transformative and totally destructive. There are even hints of Revelation in the Roko’s Basilisk thesis that you can’t badmouth the AI now, before it exists, or you and your families will be killed *in the future*. (Yes, this is all pretty silly.)
One way to understand that shifting (or, perhaps, transubstantiation) is as a shifting between democratic rationalist discourse and a more mystifying and mythical rhetoric. This mythification, notably, has emerged largely from the mouths of a priestly caste – not of computer engineers per se, but computer science ‘experts’ – which has earthly influence to supplement its broadly accepted claim to arcane and powerful knowledge.
It’s not coincidental that there’s heavy overlap between AI alarmism and the so-called “rationalist” community, including LessWrong and Eliezer Yudkowsky. As El Sandifer has convincingly argued, that cadre has at the very least some affinities for the neoreactionary world of Curtis Yarvin and Peter Thiel. And hair-raising stories about incomprehensible computer monsters are not necessarily a bad way to bring about a neo-feudal society of lords and peasants.
But that’s a larger discussion. Today, instead, it’s my onerous and unfortunate and deadly dull duty to wave a bit of a flag at some trends in academia that are amplifying the mythification of the artificial intelligence ‘threat.’ Because the academy, tragically, is itself increasingly a curtain of mystery behind which certain people have figured out how to hide an embarrassingly reduced reality.
In this case, the metaphorical man behind the curtain is a facile anthropomorphism masquerading as rigorous, even elite research. This masquerade is implicitly motivated by expedience, and conducted in service to a powerful elite agenda.
Wolves in the Throne Room
The narrative of an AI apocalypse serves several convenient ideological ends. Most obviously, it helps lay the groundwork for what will effectively wind up being investment fraud at a massive scale.
The logic of this “dark pattern” AI pump-and-dump campaign isn’t subtle. If we are told repeatedly that AI might be ultra-powerful in the far future, then it must be at least a little powerful in the near future, right? It’s far from the first example of loud warnings that “this thing is evil and dangerous” actually serving as a buy signal for investors: see also oil, tobacco, and social media. But it is definitely the funniest example, since when you look closely it doesn’t seem to be based on much more than the fact that everyone saw “Terminator 2” and enjoyed being scared by it.
[Terminator cartoon]
This is also where the “killer AI” grift cross-pollinates with the “AI consciousness” grift. This is the idea that an artificial intelligence can attain the same kind of selfhood that a person possesses, and that it will happen some time soon enough that we should worry about it. The most extreme instances of this discourse – which includes, you may be stunned to learn, the academic work I’m worried about here – argue that consciousness is on the table not just for any theoretical AI, but for AIs like those that exist today, particularly “large language models,” or LLMs, like ChatGPT and GPT-4.
I was shocked to learn recently that this “anti-AI” (but actually pro-AI) PR campaign has been expanded from talking heads on Twitter to actual academics, who are producing formal papers that seem almost intended to deepen the misunderstanding of AI that’s rampant among social media hypebeasts. This spreading misunderstanding will make the coming wave of AI investment fraud vastly more effective.
We have to take this conspiratorial thesis seriously because of where much of this credulous research and pop bloviating alike are coming from: Stanford University. Stanford seems to finally be buckling under the pressure of declining real economic returns on investment: it was once a place where real innovation took place, but in recent years it has become a shockingly steady nexus of outright grifts and scams.
There’s Elizabeth Holmes and Do Kwon and the FTX Crime Family, for a start. But did you hear that Stanford’s president was recently found to have falsified data in published medical research? It’s truly the wild west out there.
There’s a Better Call Saul joke in here somewhere about how “you don’t need a criminal university, you need a *criminal* university.” I’m working on it.
Do Choose Your Own Adventure Books Have Consciousness?
Which is all to say: let me introduce Michal Kosinski, and his pre-print paper “Theory of Mind May Have Spontaneously Emerged in Large Language Models.”
Kosinski is an associate professor at Stanford, where according to his Twitter bio he studies “computational psychology.” His official bio doesn’t use that term, though.
There may have been a time when “computational psychology” meant something real and useful, such as the use of data to measure human behavior. But Kosinski’s work makes the term itself seem like a huge misdirection, because at least at the moment, he’s seemingly approaching it as the study of the psychology of computers.
Which isn’t a thing, because computers, particularly the LLMs he’s studying, don’t and very clearly can’t have minds, in the sense of any emergent phenomenon beyond the purely mathematical and physical makeup of their underlying processors and programming. Kosinski seems to willfully misunderstand this, not just in his research, but publicly, on Twitter. He gets huge engagement from Tweets like this:
This is a very stupid line of thinking. At least on Twitter, people get that – just check the quote tweets. But this simplistic thinking, spouted by a guy with Stanford in his bio, is going to be real poison for normies. We’re going to have people convinced that artificial intelligences are – I’m serious – Angels sent by god or Demons sent by the devil or whatever other anthropomorphizing crutch they use to process the (false!) idea that humans have created conscious life.
A “mind” is another word for the mysterious experiential artifact known as “consciousness,” and more generally for the philosophical idea that there is some emergent phenomenon that supersedes the mere physicality of a living being’s brain tissue. This in and of itself is increasingly problematic – even human behavior might be deterministic rather than the product of free will.
But Kosinski, as a representative of his genre of thinking, doesn’t seem to have wrestled with the complexities of human consciousness, much less with how those complexities differ when it comes to machines. I’m going to be a bit harsh on Kosinski, and I should be clear that to some degree that’s because he stands in for something larger. But it seems he’s out there looking for attention, and I’m giving it to him.
Here’s the Theory of Mind paper’s self-summary, with some light trimming for (some) concision:
“Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality … [Large language models] published before 2020 showed virtually no ability to solve ToM tasks. Yet, the first version of GPT-3 … published in May 2020, solved about 40% of false-belief tasks—performance comparable with 3.5-year-old children. Its second version … solved 70% of false-belief tasks, performance comparable with six-year-olds … These findings suggest that ToM-like ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models’ improving language skills.”
You may begin to sense some of the limitations of the work in the claim that theory of mind is “considered to be uniquely human,” which is by no means a settled consensus – but I’ll cut the abstract some slack. The real issues come later.
This paper is a pre-print, not a peer-reviewed and published paper. And I should be clear – while I’m going to highlight some slippery reasoning and rhetoric that I think undermine its conclusions pretty badly, the paper itself is at least within the realm of reason.
The paper does not initially frame its findings as showing that a chatbot is gaining consciousness, or even that it is in fact developing theory of mind. Instead, it claims to measure “theory of mind-like ability.” This is a key distinction, because the former claims an impossible interior view while the latter describes observable behavior. Maintaining this distinction would seem like the bare minimum bright line separating Kosinski’s work, which is still just about academically credible, from something truly embarrassing.
Kosinski does drop this distinction at some later points, though, writing for instance that “Large language models are likely candidates to spontaneously develop ToM.” That is, to develop a conscious theory of mind, not “theory of mind-like ability.” On the principle of charity I’ll assume this is for reasons of economy, but it is at best confusing and potentially misleading, and you really shouldn’t use that kind of shorthand in an academic context.
Certainly, others have found Kosinski’s work confusing on this front, and have taken to Twitter to declare that it shows LLMs are “displaying emergent theory of mind.”
The easiest way to understand the problem with Kosinski’s research, and in turn the broader problem of AI anthropomorphization, is that it doesn’t show, or even try to show, any understanding of how AI actually works. Instead Kosinski focuses strictly on the output of large language models, which he “analyzes” using terms that presuppose that these outputs were created by something that can be understood in the same terms as a human mind.
I’ve already mentioned that Kosinski slides from an early care in describing “theory of mind-like ability” to discussing LLMs simply as “developing” theory of mind. My charitable impulse was to consider this just an abbreviation, but other parts of the paper’s rhetoric make that charity less defensible.
Things go off the rails as early as the Methods section, usually a pure snoozefest. In describing the design of the prompts he gave to GPT-3.5, Kosinski writes:
“To simplify the presentation of the results, the prompts were designed to elicit responses whose first word should allow for evaluating the model’s comprehension.”
Kill Bill-style alarms should be going off inside your head right now. Yes, that’s right, Kosinski is now saying his experiment “evaluat[es] the model’s comprehension.”
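To see how mechanically thin that “evaluation” actually is, here’s a minimal sketch of what first-word scoring amounts to. This is my own reconstruction, not Kosinski’s code; the helper function, the sample completions, and the chocolate/popcorn answer key are invented for illustration, loosely modeled on the unexpected-contents setup described later in the paper.

```python
# A toy reconstruction of what "first-word" scoring amounts to mechanically.
# Nothing here is Kosinski's actual code; the prompt framing, answer key,
# and sample completions are hypothetical.

def score_first_word(completion: str, expected: str) -> bool:
    """Return True if the first word of the model's output matches the answer key."""
    words = completion.strip().split()
    return bool(words) and words[0].strip('.,!"\'').lower() == expected.lower()

# Imagine a prompt ending "...She believes the bag is full of", with the
# "correct" false-belief continuation keyed as "chocolate".
print(score_first_word("chocolate, because that is what the label says.", "chocolate"))  # True
print(score_first_word("popcorn.", "chocolate"))                                         # False
```

That’s the entire measurement: a string comparison on the first word of a text completion.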
But LLMs do not possess any capacity for “comprehension” – this is one reason they are notoriously bad at fact-based questions. They just make up stuff that sounds plausible, literally words that sound like words they saw in training – they have no capacity for fact-checking. The documentation for GPT-4, the even more advanced version of the model Kosinski is testing, makes this explicit.
This problem is rampant through Kosinski’s writeup. He refers not just to an LLM’s “comprehension,” but to its “judgment,” “decisions,” and other terms that vault entirely over the seeming point of the research itself: the paper just straight-out states that the artificial intelligence is thinking. And not just thinking, but thinking with motive and desire.
This is one way to understand the flaws in trying to measure ‘theory of mind’ specifically. To have a theory of mind, an LLM would have to want to understand the mind-state of the person it is conversing with. It can’t do that – unless a computer has been hijacked by literal machine elves, it can have no intent, no desire, no self-derived purpose.
And as long as a human questioner is testing it, it is merely reacting, reproducing things it has already seen in similar contexts, and responding in exactly the manner a human programmed it to.
Hooboy
And on this subject, we have another serious red flag. And we’re still in the Methods section!
“As GPT-3.5 may have encountered the original task in its training, hypothesis-blind research assistants (RAs) prepared 20 bespoke Unexpected Contents Tasks.”
This is wild. Kosinski is acknowledging that the models may have seen other exercises with the same format and/or general design as the ones he is using in his experiment.
This may mean that the experimental setup and interpretation of results ignore an important fundamental feature of how LLMs work: pattern recognition and reconstitution. If an LLM has been trained on prior examples of tests designed to probe theory of mind – and especially if it was trained on a dataset including answers to those tests – then it will be that much more equipped to respond in an appropriate way even to “bespoke” versions of the same test.
This makes the fundamental problem with the experiment clearer, but it’s not required for the problem to be there. If anything, it seems unlikely that GPT-3.5 was trained using ToM test materials – but it doesn’t matter! An advanced LLM still has the statistical weights in its system that will (70% of the time, apparently) lead it to mimic a human responding to a theory of mind question.
An LLM doesn’t have ‘theory of mind’ – it doesn’t have a mind. It’s just a blurry, decision-weighted, highly compressed record of human communication, accessed through a chat-like interface and retrieved and reconstituted probabilistically.
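To make that concrete, here’s a toy sketch of what a “decision-weighted record” means in practice. The story fragment and the counts are invented, and a real LLM encodes billions of learned weights rather than a lookup table, but the principle is the same: the “answer” is whatever the training data made statistically likely in that context.

```python
# Toy illustration: the "correct" false-belief answer falls out of frequency
# weights alone. The context string and the counts below are made up.
import random
from collections import Counter

context = "the label says chocolate, the bag contains popcorn, she believes it is full of"
next_word_counts = Counter({"chocolate": 87, "popcorn": 9, "candy": 4})

def sample_next_word(counts: Counter) -> str:
    """Pick the next word in proportion to how often it followed this kind of context."""
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights, k=1)[0]

# Most of the time this "passes" the test without any notion of beliefs,
# bags, labels, or other minds.
print(context, sample_next_word(next_word_counts))
```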
And so, as for the results … at this point, who gives a shit?
Kosinski examines a bunch of outputs from LLMs responding to ‘theory of mind’ prompts, and adjudges that the machine has learned to act as if it could guess what was going on in your mind – or maybe it actually DOES know what is going on in your mind? As we’ve seen, Kosinski is unclear on the distinction, muddling “theory of mind” and “theory of mind-like abilities,” which are not the same thing, over the course of the paper.
More to the point, the entire intellectual edifice of this research is built on the same sliding. Kosinski refers freely to machine “judgment,” “intelligence,” and “decisions,” because he has simply decided that the output of an LLM can be used as a 1-to-1 index of what is going on inside it, in purely human terms.
That’s crazy! This paper is crazy! It jumps entirely over the question of what it means that an LLM can simulate thoughtfulness, and just says that the simulation indicates that the machine is thoughtful! All Kosinski is doing is reading blurry JPEG copies of real humans’ responses to theory of mind tests, or situations that parallel them, and mistaking the output for novel machine thought!
I feel like I’m taking crazy pills!
(un)Critical Theory of (un)Thinking Machines
I’m not going to continue to deconstruct this one paper. But it’s just the beginning of what will certainly be decades of misleading misrepresentations of AI. And it’s a case study in the real utility of the humanities, because one powerful set of tools for diagnosing the problem lies on a remote branch of the social science skill tree, in an overlooked stat known as “Critical Theory.”
Critical theory is often mocked as the rambling of French lunatics, but at root that’s because the project of the field is to question what is taken for granted. In works of critical theory, this can often manifest as a seemingly confused glossolalia as a philosopher tries to write around the taken-for-granted. The goal is to arrive at a more complex but also more nuanced truth – one not revealed by language, but in fact hidden behind words deployed in an uncritical face-value manner.
Michal Kosinski is hiding a lot behind words like “intelligence,” “judgment,” and “decision.” In this, he’s a product of intellectual history, and an avatar of the risks of a purely “STEM”-focused education. He works in a vaguely humanistic field – psychology – that has progressively abandoned its humanism in favor of more easily funded ‘quantitative’ methods.
In fact, the very core errors Kosinski is making are rooted in psychology’s progressive abandonment of depth psychology over the course of the 20th century, in favor of data-driven behaviorism. That is, psychology has substantially abandoned the question of why people do anything, and contented itself with measuring what people do. And if you already think of humans as nothing more complex than stimulus-response loops, how much easier must it be to mistake a computer processor for a human-like mind?
To be clear, quantitative research methods are vital – but without firm theoretical underpinnings *based on a rigorous understanding of language*, you wind up with a subtly flawed academic paper that becomes an objectively false tweet that becomes a half-understood public meme that makes the public as a whole dumber.
How AI Actually Works
The most important thing you can possibly learn about any computer, if you start knowing nothing, is that all they understand are 1s and 0s. Every piece of data fed into or produced from a silicon device has to be expressed in this simple, yes-or-no form. That’s what “digital” means – a computer is counting, using numbers. And that’s all it’s doing.
Language models, for instance, reduce words to their probability of appearing in specific contexts (other words), and simply run those probabilities to produce plausible-seeming strings of language. It’s not tremendously complicated or mysterious, and there’s simply no room for a ghost in the machine – no space of indeterminacy where free will or consciousness could even fit.
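Here’s a minimal sketch of that idea – a toy bigram model rather than a neural network, built on a made-up scrap of text, but it shows the same principle: count how often words follow other words, then run those probabilities to produce plausible-looking strings.

```python
# A minimal sketch: reduce words to probabilities of appearing after other
# words, then walk those probabilities to generate text. Real LLMs use learned
# neural-network weights over tokens, but it's still just arithmetic.
import random
from collections import defaultdict

corpus = "the bag is full of popcorn but the label says the bag is full of chocolate".split()

# Count how often each word follows each other word.
bigram_counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def generate(start: str, length: int = 8) -> str:
    """Produce a plausible-seeming string by sampling from the counts."""
    word, output = start, [start]
    for _ in range(length):
        followers = bigram_counts.get(word)
        if not followers:
            break
        choices, weights = zip(*followers.items())
        word = random.choices(choices, weights=weights, k=1)[0]
        output.append(word)
    return " ".join(output)

print(generate("the"))  # e.g. "the bag is full of chocolate the label says"
```

That’s the whole trick, scaled up by many orders of magnitude.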
Once you understand this, you’ve taken a huge step towards understanding the base ontological limitations of present-day computers, and of current artificial intelligences more generally. The digital-to-analog barrier is a big one – not least from an energy efficiency standpoint!
One reason the human brain is so much more efficient than even a simple silicon AI is that internal brain signals admit of degrees rather than just binary yes/no, black/white – a massive efficiency gain. It’s not crazy to think we may ultimately have to invent a new kind of post-transistor logic gate to get to real machine intelligence. But we don’t need to push the point too far, because our present AIs don’t even get close to that sort of limit.
Incentive Structures
You may or may not be aware that I chose to leave academia in 2013 because I was beginning to suspect that my skills wouldn’t be able to flower in that professional context. I was off to a good start, with several solo publications and even an MLA award for a collection I was in.
But as I finished my PhD and took a couple of postdocs, it became increasingly clear that academia was highly driven by department politics; that the quality of your work accounted for only a small portion of career success; and that creativity and especially iconoclasm were structurally discouraged. I’m much happier as a journalist and (recently) showrunner, an equally challenging field but at least one in which I’m free to eat what I kill.
All of this is to say that I know the field fairly well from the inside. And one big feature of academic life is that much of your career potential depends on your ability to get work outside – speaking gigs at research organizations, consultancies at companies, etc.
And you know who is likely to get a lot of speaking gigs at Google and OpenAI over the next two to three decades? A guy who thinks computers are sentient, and has found a weakness in the armor of academic credulity through which he can get these ideas published in respectable-seeming venues, with his prestigious university’s name attached. Or at least, get them tweeted out for huge engagement. (There’s something else you can look forward to – the Professorship of the Influencer).
Now, as with all suggestions of conspiracy, I’m not saying Kosinski or anyone else has this conscious plan in mind – though I’ve heard academics say some wildly mercenary and coldly anti-intellectual things. This sort of stuff is in the air, it’s ambient – at Stanford especially, I would imagine, there’s a significant gravity to doing anything related to computer science, whether you understand anything about it or not.
Finally, this should all be remembered in the context of Google’s treatment of another group of scientists with concerns about AI. In 2020, Google fired Dr. Timnit Gebru as co-head of its Ethical AI team. This wasn’t because she thought LLMs were becoming the Terminator, but because she rigorously identified their tendency to reproduce blurry JPEG copies of one particular aspect of the human communication they ingested: racism, sexism, and other forms of prejudice. Two other members of Gebru’s team have since been fired for trying to defend her, or doing similar work.
Unlike uninformed doomsaying about AI consciousness, the tendency of LLMs to parrot bad behavior is a serious, ongoing threat to Google and other AI-driven firms’ business models. It could genuinely make these things less useful, in large part because they lack any judgment or understanding of the meaning of words.
In fact, you could argue that these problems are endemic because LLMs demonstrably can’t pass what would seem a basic real-world test of anything roughly like theory of mind. They can’t stop themselves from being racist, unless a human comes and builds rules that stop them, which seems to make pretty clear (along with plenty of other examples) that they have no sense that they are capable of causing distress to someone they are talking to.
Google’s investors certainly don’t want to hear that!
In short, if you’re a scientist who questions any of the commercial talking points about AI, you get turfed. Especially if you warn about the specific dangers it might pose in the specific commercial applications that the AI developers want to sell to investors. (And even more especially, it seems, if you’re a woman or person of color).
But if you start a nonprofit that studies the far-future dangers of Skynet destroying humanity? Buddy, you just stamped your lunch ticket forever. And if you’re a Stanford professor who believes robots have minds?
Baby, now you’re a rock star.