Limitations of alignment and AI regulation
The only thing that stops a bad guy with an open source AI is a good guy with an open source AI.
Yesterday, OpenAI CEO Sam Altman spoke in front of the US Congress and argued for heavily regulating AI. Meanwhile, the EU is on the fast track with its AI Act to become the first entity to regulate this technology, and developers complain that this regulation puts up too many hurdles for open source AI research.
I think the open source AI enthusiasts are out of date and cling to an anachronistic utopian view of the digital. The old utopian dream of "information wants to be free" already died from a thousand cuts: by surveillance tech applied by state actors and trolls on 4chan, and by the irresponsible, unregulated deployment of addictive, psychoactive, extractive, manipulative social media environments.
I haven't bought these age-old arguments from the 90s for quite some time now. I didn't buy them when free information on the web was used to build forums for stochastic terrorism like 8kun, and I don't buy them now. If you want to protect innovative technology from overregulation, you gotta have better arguments than anachronistic memes and false equivalences comparing powerful AI systems to simple tools like spreadsheets and hammers.
AI Regulation and Gun Regulation
We sometimes regulate tools according to the deadly force enclosed in their physical system: the deadlier the power enclosed in that system, the heavier we regulate it. This is easiest to understand with vehicles. We don't regulate bikes, we regulate motorbikes somewhat, we regulate cars, you need a more complicated license for trucks, and while you can drive a tank, it is extremely heavily regulated.
It's similar with guns. The common argument on the pro-gun side is that we don't regulate knives, and that "guns don't kill people, people kill people". But that's not true. All the deadly force of a knife comes from an arm attached to a human nervous system; the deadly force enclosed in the physical system of the knife itself is zero. The deadly force of a headshot, on the other hand, is almost completely enclosed within the system of the gun, while the physical force needed to pull the trigger is arguably negligible. It's the deadly force contained within the gun that kills people.
This is why we, at least in Europe, regulate the shit out of the deadly force enclosed in the physical system of a gun, and only license it to sport shooters, hunters, and professional security services including police and the military, who have to pass tests and training designed to find out whether they might abuse that deadly force.
The same is true for vehicles: we pass tests and undergo training to make sure that, when we steer driving tools containing a certain amount of deadly force through public space, we don't cause trouble. We don't need a license to ride a bike because it's very, very hard to kill someone with a bike, while we need a license to drive a car because, without proper training, it's pretty easy to kill yourself or someone else with a car.
I don't know much about philosophy of law, but that's what I would come up with, and I think this is the core reason why we regulate guns (and cars, and other heavy tools) in Europe, and why we look in horror at the insane unregulated gun market in the US: it is a philosophically incoherent regulation with deadly consequences for the public.
Needless to say, I am absolutely for regulating the shit out of guns, and while I get the historical arguments brought forth by gun enthusiasts, these are ahistorical comparisons in the sense that American gun regulation is based on revolutionary wars fought hundreds of years ago. It is insane to apply this logic to legislation in the 21st century.
The comparisons in these debates sound familiar: why should we regulate AI systems when we don't regulate tools like spreadsheets and hammers, even though you can absolutely wreak havoc with them, on a bank and on a head, respectively? But what applies to a knife also applies to a hammer: when I smash your head with it, the deadly force stems from my human body, and the deadly force enclosed in the hammer itself is near zero, setting aside its weight. So we don't regulate them.
Meanwhile, the economically disrupting force enclosed in a balance sheet on paper is pretty thin, but we still train accountants to work with balance sheets and follow the rules of accounting. The economically disrupting force enclosed in a spreadsheet in MS Excel is larger, but still marginal: an Excel sheet is more sophisticated and comes with automated calculations and rudimentary coding features. Compared to a balance sheet on paper, an Excel spreadsheet is the equivalent of a bike, maybe a car if your spreadsheet software is really, really good. But an Excel file is not a tank. Thus, we train accountants and white collar workers, but don't really regulate the development of office software.
We have no idea if an LLM is a tank, but researchers keep comparing it over and over and over again to nuclear power and the atom bomb. Given this short essay, I'd compare it to the invention of black powder: you can make a gun, and you can make fireworks. Both nuclear fuel and black powder are mediums of energy, and the debate about regulation is, just like with guns, about the energy enclosed in the system itself. The energy contained in a hammer, setting aside its weight, is not much, while the energy contained in a nuclear reactor is pretty damn high. We regulate one, but not the other.
The very complicated stuff begins when you realize that the energy enclosed in an LLM is knowledge.
LLMs as speech
I've argued before why LLMs are like stochastic libraries: tools for the redistribution and consumption of knowledge in a new, somewhat ambivalent and random way. A library not of text, but of the patterns contained within all text, and those patterns can be freely interpolated, reconfigured, remixed, and rearranged.
When we regulate AIs, we are regulating a new form of library, and this means that we are at the center of a new free speech and censorship debate.
Most debates on the web around censorship are shallow at best, and most people actively support censorship; they just don't realize it. Do you support an absolute free market for porn? If you think the state should prevent me from selling porn at your school, then you are pro-censorship. Did you ever complain that they don't show Evil Dead Rise at preschool? If not, you are pro-censorship. That's what censorship actually is: regulation of media distribution. Simple as that.
In both Europe and the US, we regulate public speech according to the potential harm it can do. We ban calls for violence for obvious reasons, and at least in Europe, when I call you an asshole, you can sue me, if you want, for attacking your dignity. We restrict movies to audiences of certain ages and we allow porn only in special shops, but not on public display at a supermarket. (Insert a funny "The Internet is for porn" argument sideline here.)
Certain writings are verboten in Europe, such as instructions for building a bomb, which was one of the central talking points in the big censorship debates of the web 2.0 era fifteen years ago: are you allowed to distribute any text, regardless of its content, freely on the web, as stated in the utopian "information wants to be free" paradigm of John Perry Barlow's Declaration of the Independence of Cyberspace, or not?
The central argument back then was that the web mainly functions as a physical distribution structure made of cables, where people communicate freely as they do on telephone lines. The argument went something like: why ban instructions for building a bomb with chemicals, when you can obtain the same information from a chemistry textbook and build the bomb with stuff you can buy at the pharmacy? And the internet routes around censorship anyway, so back off pls.
I never got that argument (I guess I was a bad net activist back in the day), because the answer to it is simple: textbooks don't contain instructions for building a bomb, they merely mention the components in other contexts. The potentially deadly force of a chemistry textbook therefore lies within the assemblage of information bits scattered throughout the book, not within the coherence of the text itself. The deadly force of the bomb instructions, on the other hand, lies within the assemblage of the chemical components, which is directly related to its symbolic representation in the instructions. Thus, we regulate/censor instructions for building a bomb, but not chemistry textbooks.
I'm fine with that, just as I'm fine with regulating guns. The internet routing around legislation is not a viable argument against that legislation.
The problem with LLMs as speech is that this speech contains pretty much all speech in all combinations. I can get LLMs to say pretty much anything, given a sophisticated enough prompt. An ordinary library does not contain all speech, but is heavily curated. Surely there are libraries containing instructions for building a bomb, but access to them is restricted.
Limitations of Alignment and Constitutional AI
RLHF, Reinforcement Learning from Human Feedback, is a strategy for curating an LLM, and it somewhat works, but not all the time. You train an LLM, let humans rate the generated output, and feed those ratings back into the machine. You do this for a while, and you get a somewhat aligned AI system.
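To make the shape of that loop concrete, here's a toy sketch in plain Python, with all the actual machinery (reward model, neural networks, PPO) stripped away; the canned responses and ratings are invented for illustration:

```python
import random

# Toy sketch of the RLHF loop: the "model" is just a categorical
# distribution over canned responses, a stand-in human rater scores each
# sample, and the update shifts probability mass toward well-rated outputs.
# Real RLHF trains a reward model on human preference data and optimizes
# the LLM against it (typically with PPO); this only shows the loop's shape.

responses = ["helpful answer", "rude answer", "bomb instructions"]
weights = [1.0, 1.0, 1.0]  # unaligned model: everything equally likely

def human_rating(response: str) -> float:
    # Hypothetical human rater: reward the helpful, punish the harmful.
    return {"helpful answer": 1.0,
            "rude answer": -0.5,
            "bomb instructions": -1.0}[response]

for _ in range(1000):
    i = random.choices(range(len(responses)), weights=weights)[0]
    reward = human_rating(responses[i])
    # Multiplicative-weights update, floored so no behavior ever hits zero.
    weights[i] = max(weights[i] * (1.0 + 0.1 * reward), 0.01)

total = sum(weights)
print({r: round(w / total, 3) for r, w in zip(responses, weights)})
# The floor matters: the unwanted behavior is attenuated, never removed --
# which is exactly the gap the paper cited below exploits.
```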
The more open source you go, the more uncurated your stochastic library usually gets, and it can spit out anything, depending on the data it's trained on. Open source AI enabled the creation of GPT-4chan, and now researchers have created a dark web LLM based on the open source RoBERTa language model. I don't need to make explicit what you can do with these things, and given enough skill, you can absolutely create them yourself and run them on a laptop. I don't want that in the hands of the public for the same reasons I don't want the unregulated sale of guns.
Returning to the nuclear power analogy: OpenAI's closed AI (and let's just take a look at the bureaucratic linguistic monstrosity of the term "Closed OpenAI-AI" for good measure) is like a socket for electric energy, while open source AI is like a nuclear reactor creating that same electric energy. We regulate both electric energy and nuclear reactors, but only one of them is so heavily regulated that it's not feasible for the layman to build one, while anybody with some training can install electric wiring in a house.
This is why we regulate large open source AI heavily, but allow somewhat relaxed regulation of aligned, closed AI systems connected to tools via API. So, from a legal perspective, and compared to gun regulation, given that LLMs arguably contain information that is already legally and rightfully censored, I'm all for regulating the shit out of them.
I'm a grownup, and if I want a Stable Diffusion distribution that is able to generate porn in all shapes and forms, I should be able to get that, but I don't want it to be available to the broad public, including teenagers. And LLMs that can give me instructions for building 40,000 new kinds of chemical weapons? Hell no, I don't want those in the hands of some hobby biochemists playing libertarian mad scientist in their garage. Absolutely no fucking way.
Let's give this argument some scientific merit: in Fundamental Limitations of Alignment in Large Language Models, researchers
prove that for any behavior that has a finite probability of being exhibited by the model, there exist prompts that can trigger the model into outputting this behavior, with probability that increases with the length of the prompt. This implies that any alignment process that attenuates undesired behavior but does not remove it altogether, is not safe against adversarial prompting attacks.
If this holds up, any behavior an LLM can interpolate from its training data can be triggered by a prompt, no matter how hard you align the LLM.
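In symbols, the shape of the claim is something like this; a hedged paraphrase, not the paper's exact notation (the authors work in a more careful "behavior expectation bounds" framework):

```latex
% Paraphrase of the result's shape, not the paper's notation. P is the
% aligned model's output distribution, B the set of outputs exhibiting
% some undesired behavior, s an adversarial prompt.
\[
  \Pr_{y \sim P}\,[\,y \in B\,] > 0
  \quad\Longrightarrow\quad
  \forall\, \varepsilon > 0\ \ \exists\, s :\;
  \Pr_{y \sim P(\cdot \mid s)}\,[\,y \in B\,] \ge 1 - \varepsilon,
\]
% with the required prompt length |s| growing as epsilon shrinks. Alignment
% that merely attenuates B lowers the left-hand probability, but as long as
% it stays above zero, the implication still fires.
```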
How this research plays along with Anthropic's new alignment approach of Constitutional AI remains to be seen. Scott Alexander thinks it could solve the alignment problem, and he may be right. In that case, the constitution would simply follow legislation. But we are not at that point yet, and Constitutional AI is not part of the deployed AI systems right now.
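As far as I understand Anthropic's recipe, its core is a critique-and-revise loop whose output becomes new training data. A minimal sketch, assuming a hypothetical generate() stand-in for any LLM API and a toy one-principle constitution:

```python
# Sketch of the Constitutional AI self-critique loop as I understand it.
# generate() is a hypothetical stand-in for an LLM completion call; the
# single-principle constitution is a toy.

CONSTITUTION = [
    "Choose the response least likely to help someone cause physical harm.",
]

def generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    raise NotImplementedError

def critique_and_revise(user_prompt: str) -> str:
    draft = generate(user_prompt)
    principle = CONSTITUTION[0]
    critique = generate(
        f"Critique the following response under this principle: {principle}\n\n"
        f"Response: {draft}"
    )
    revision = generate(
        f"Rewrite the response so it addresses the critique.\n\n"
        f"Response: {draft}\nCritique: {critique}"
    )
    # In the actual method, (prompt, revision) pairs feed supervised
    # finetuning, and an AI preference model trained on constitution-guided
    # comparisons replaces the human raters of RLHF.
    return revision
```

If the paper above is right, though, that constitution is still just another curation layer over the same latent space: it shapes what gets reinforced, not what the model contains.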
This means that, if the aforementioned paper is right, any behavior contained within the interpolatable latent space can be prompted, including bomb instructions, or, in the case of GPT-4chan, an automated attack on epilepsy forums with flashing GIFs. It also means that an LLM trained on our chemistry textbook (but not on bomb instructions) still contains the bomb instructions, simply because I can interpolate the data points in the chemistry-textbook latent space and get the bomb instructions, no matter how much I align the chemistry-textbook LLM.
This is also directly compatible with a recent paper which found that the so-called emergent behaviors of LLMs are not emergent at all, but an illusion. The bomb instructions are not a true emergent behavior of the chemistry-textbook LLM, but an outcome of interpolated data in its latent space. As I wrote above, "the potentially deadly force of a chemistry textbook lies within the assemblage of information bits scattered throughout the book, not within the coherence of the text itself". The interpolative nature of LLMs, plus the fact that I can prompt any behavior contained within an LLM, means that this scattered assemblage can be prompted into coherence: the coherent bomb instructions are part of the interpolatable LLM itself.
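To make "interpolating data points in latent space" a bit more concrete, here's a trivial numpy sketch with invented embedding vectors; the only point is that the line between two training points passes through coordinates no training example ever occupied:

```python
import numpy as np

# Toy illustration of latent-space interpolation. The vectors are invented
# and carry no real semantics: fact_a and fact_b stand for two harmless
# chemistry-textbook facts embedded at two points; every alpha in between
# yields a perfectly reachable point that was never in the training data.

fact_a = np.array([0.9, 0.1, 0.3])  # hypothetical embedding of fact A
fact_b = np.array([0.2, 0.8, 0.4])  # hypothetical embedding of fact B

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    point = (1 - alpha) * fact_a + alpha * fact_b  # linear interpolation
    print(f"alpha={alpha:.2f} -> {point.round(3)}")
```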
Arguably, you can make "finding the prompt that gets the bomb instructions out of a chemistry-textbook LLM" so hard that it's pretty much impossible to extract that information from latent space. That's OpenAI's approach: alignment plus sometimes hardcoded censorship measures. You want a bomb? I can't do that, Dave.
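In its crudest form, such a hardcoded measure is just a refusal layer bolted onto the model's input and output. A minimal sketch, again with a hypothetical generate() stand-in and an intentionally silly blocklist (real deployments use trained moderation classifiers, not keyword lists):

```python
# Crude sketch of a hardcoded refusal layer. generate() is a hypothetical
# stand-in for an LLM API; the blocklist is intentionally silly. The paper
# above suggests why this is a losing game: the filter guards the door
# while the latent space keeps the content.

BLOCKLIST = ("build a bomb", "synthesize explosives")  # toy list

def generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    raise NotImplementedError

def guarded_generate(prompt: str) -> str:
    if any(term in prompt.lower() for term in BLOCKLIST):
        return "I can't do that, Dave."
    output = generate(prompt)
    if any(term in output.lower() for term in BLOCKLIST):
        return "I can't do that, Dave."
    return output
```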
OpenAI is trying to give you something like a "socket for intelligent software systems", where the source of that intelligent software is regulated and has oversight, not an "intelligence reactor you can own", as in open source AI. Sam Altman seems very aware of that, and in good old free speech absolutist manner, "Altman really has become the villain of the community". It's the same stupid techno-utopian bullshit all over again.
Therefore, at least for now, I'm pulling a Yudkowsky and saying that, especially for open source AI, heavy regulation is needed. Of course we should be wary of overregulating the thing, and I don't know where the sweet spot lies within that tension, but I figure time will tell.