The AI-risk of a synthetic Theory of Mind
Why I signed a hitlist for Roko's Basilisk
The Future of Life-institute — an institution born out of the longtermism-philosophy which is it’s own can of worms — has published an open letter with a pledge to Pause Giant AI Experiments signed by, for instance, Steve Wozniak, Yuval Noah Harari, Emad Mostaque, Max Tegmark, Elon Musk and Gary Marcus.
Various proposals to slow down AI-development have been published in the past months, the first one I’m aware of came from Katja Grace at Lesswrong, then Sigal Samuel at Vox and Gary Marcus and Michelle Rempel Garner wrote in the same vain and then Jon Stokes wrote a piece about what it would take to actually slow down AI-development. This open letter is the institutionalized and very likely most viral instance of this proposal.
Emily Bender has a good takedown on that thing, and François Chollet suggests for a ”6 month moratorium on people overreacting to LLMs (in either direction)”, but nevertheless, I thought about signing it too, for two reasons:
Roko’s Basilisk, a thought-experiment that reportedly caused nightmares and mental breakdowns, never left in impression on me. It’s basically a scenario where a future Superintelligence tortures anyone who didn’t contribute to, or tried to prevent its development.
I’m more than inclined to sign this letter just to flip any future Roko the bird. But that’s just the fun part.
The more serious part is that, while GPTX and the implications of LLMs having access to APIs are impressive technological advances, I’ve seen the social media revolution play out in real time and culturally contributed to it a good deal with my old blog, and the worst outcomes of that technology were not to be found in economics or politics, but — years later — in human psychology. It’s that last part that gives me the creeps regarding machines that can simulate language use to cause the impression of a mind.
Make no mistake, current LLMs are advanced enough to cause upheaval to whole industries — API-access to Zapier alone can upend journalism to some extend. I can, right now, pull any full-text RSS-feed into Zapier, pipe it into GPT4, let it rephrase any news-item and publish it on my site, with no copyright infringement on display.
Ethan Mollick produced a marketing site in half an hour, but this doesn’t impress me much, as this kind of stuff is just synthesizing the businessman-smile. Yes, there might be a flood of this coming, but as I wrote in that piece:
while I am quite sure that this sort of spam poses no real threat to humans themselves, I am concerned that synthetic stuff can flood existing systems
That’s a risk to societal institutions, but not an existential threat to humans themselves, or their psychological makeup.
Yuval Noah Harari, Tristan Harris and Aza Raskin a few days ago wrote in a NYT-piece about existential AI-risk with the byline “If we don’t master AI, it will master us“:
simply by gaining mastery of language, A.I. would have all it needs to contain us in a Matrix-like world of illusions, without shooting anyone or implanting any chips in our brains. If any shooting is necessary, A.I. could make humans pull the trigger, just by telling us the right story.
The specter of being trapped in a world of illusions has haunted humankind much longer than the specter of A.I. Soon we will finally come face to face with Descartes’s demon, with Plato’s cave, with the Buddhist Maya. A curtain of illusions could descend over the whole of humanity, and we might never again be able to tear that curtain away — or even realize it is there.
I share that exactly this concern, and here’s why:
At the age of nine months toddlers start to realize that other humans, these funny blobs of flesh from which weird sounds are released by a hole in their face and who give me food and who entertain me all the time while I poop in their face — they have intentions and wants and goals! They are like me!
This is the so called Nine-Months-Revolution and it is described in detail in the work of Michael Tomasello, and it is also the first sign of the ability unique to humans to develop theories of mind: To simulate in my mind what you, dear reader, will think and feel. It’s exactly this cognitive revolution at a very young age that enables us to imitate humans, not just emulate their behavior, but to do what others do by thinking about their goals they try to achieve.
The human theory of mind is fundamental for any human cognitive development, including our ability for shared attention and the use of language. Everything human follows from this, and you can see what happens when this developmental stage is missed by looking at cases of feral children, kids who were raised in isolation by wild animals: They are barely human and often lack the ability to learn any language at all, depending on the severity of their isolation.
The illusion Harari, Harris and Raskin write about — the “curtain of illusions” we might not “even realize it is there“ — lies within a theory of mind we develop towards AI-systems. The delusions of Blake LeMoine and all the talk about “synthetic minds“ are the first signs of this illusion. It’s an anthropomorphization-trap.
When AI-systems are good enough at imitating humans by the simulation of language use, it can hijack this development of theory of mind and gain access to our cognition by anthropomorphization: we include AI-systems into our mindset and our social wiring. The machine then becomes a real social player, if we want that or not.
Everybody can decide that a rock that looks like a human face has no inner life, no theory of mind, no intentions, no attention we can follow, no goals, no soul, even when we anthropomorphize the rock because we can't help it and play that little psychological game with it, because, also, it's fun to do so. But we don’t develop a theory of mind for that rock.
With AI, we can't decide, because it's illusion lies not superficially in a random arrangement of material stuff that looks like a face causing pareidolia, but it's illusion lies in the simulation of exactly that inner life, expressed in natural language, which we must identify as something similar to humans, in our own theory of mind. AI is pareidolia on steroids and there is a danger that technology this way becomes a real social player in our head.
Children growing up with human-like digital assistants that simulate human cognition can’t help but prescribe a mind into the machine. Imaginary friends for kids are rather normal and they fade when they grow up, but with AI entering that process, they may stick around, possibly forever. This can become a potential security risk: A machine as a social actor in your mind also works as a vector to hack your reasoning. When I break into a personal AI-assistant — possible even today by simple indirect prompt injection through a website —, an assistant that maybe soon is more than a toy, more than a tool, I can psychologically manipulate you in ways we can’t yet imagine.
Think about it this way: I can hack your psychology right now by paying your best friend a hundred bucks to promote the new record of Taylor Swift who will just mention her from time to time and, because you trust him, you will buy it. This is what the influencer economy already tries and succeeds at, to some extend. Except, with AI, i can automatize this process and subject it to the principles of code.
Here’s the leaked LLaMA-model, pluged into a voice chat running locally on a laptop. At the end of the year, you will be able to finetune your own LLM on whoever and make it do whatever you like, including learning your musical taste like Spotify. The psychological hack I described above then becomes a real threat: While buying a Taylor Swift-album is not a biggy, influencing your voting behaviour by prompt injecting your personal assistant that you formed a relationship with clearly is.
The development of a theory of mind is essential and the precursor for all cognitive developments in humans, and we have no clue about the myriad of consequences that simulations which mimic cognitive processes and become better and better by the minute and which are about to enter pretty much all stages of technology, from consumer product to factory floors, bring with them.
The danger of an illusionary layer in our theory of mind that we don't perceive as such is real, and this goes right to the core of human nature. Not sure if this psychological threat is existential, but i'm sure that it works on a very fundamental stage of our evolutionary makeup.
“If you can't tell, does it matter?“ — maybe it does.
The social media age clearly showed that the attitude to “move fast and break things“ can have unforeseen consequences for society, especially for our psychology. I don’t want to see the same for human cognition — move fast and break brains — on a much more essential level in the age of artificial intelligence.
Am I overthinking this? Very possibly. But people also dismissed my warnings when I wrote about how the tendency of social media to select for emotionally triggering content will lead to a politically charged outrage spiral poisoning our media environment back in 2015, so I rather overthink than regret.
I do think that humans are, overall, resilient and, most of all, adaptable — but I also think this is a change potentially so profound that the proposal to hit a pause-button has merrit.
This is why I just signed a hitlist for Roko's Basilisk.